<|startoftext|>
Introduction and preliminaries
The focus of this paper is decompositions of (k, `)-sparse graphs into edge-disjoint subgraphs
that certify sparsity. We use graph to mean a multigraph, possibly with loops. We say that a
graph is (k, `)-sparse if no subset of n′ vertices spans more than kn′− ` edges in the graph; a
(k, `)-sparse graph with kn′− ` edges is (k, `)-tight. We call the range k ≤ `≤ 2k−1 the upper
range of sparse graphs and 0≤ `≤ k the lower range.
In this paper, we present efficient algorithms for finding decompositions that certify sparsity
in the upper range of `. Our algorithms also apply in the lower range, which was already ad-
dressed by [3, 4, 5, 6, 19]. A decomposition certifies the sparsity of a graph if the sparse graphs
and graphs admitting the decomposition coincide.
Our algorithms are based on a new characterization of sparse graphs, which we call the
pebble game with colors. The pebble game with colors is a simple graph construction rule that
produces a sparse graph along with a sparsity-certifying decomposition.
We define and study a canonical class of pebble game constructions, which correspond to
previously studied decompositions of sparse graphs into edge disjoint trees. Our results provide
a unifying framework for all the previously known special cases, including Nash-Williams-
Tutte and [7, 24]. Indeed, in the lower range, canonical pebble game constructions capture the
properties of the augmenting paths used in matroid union and intersection algorithms[5, 6].
Since the sparse graphs in the upper range are not known to be unions or intersections of the
matroids for which there are efficient augmenting path algorithms, these do not easily apply in
∗ Research of both authors funded by the NSF under grants NSF CCF-0430990 and NSF-DARPA CARGO
CCR-0310661 to the first author.
2 Ileana Streinu, Louis Theran
Term Meaning
Sparse graph G Every non-empty subgraph on n′ vertices has ≤ kn′− ` edges
Tight graph G G = (V,E) is sparse and |V |= n, |E|= kn− `
Block H in G G is sparse, and H is a tight subgraph
Component H of G G is sparse and H is a maximal block
Map-graph Graph that admits an out-degree-exactly-one orientation
(k, `)-maps-and-trees Edge-disjoint union of ` trees and (k− `) map-grpahs
`Tk Union of ` trees, each vertex is in exactly k of them
Set of tree-pieces of an `Tk induced on V ′ ⊂V Pieces of trees in the `Tk spanned by E(V ′)
Proper `Tk Every V ′ ⊂V contains ≥ ` pieces of trees from the `Tk
Table 1. Sparse graph and decomposition terminology used in this paper.
the upper range. Pebble game with colors constructions may thus be considered a strengthening
of augmenting paths to the upper range of matroidal sparse graphs.
1.1. Sparse graphs
A graph is (k, `)-sparse if for any non-empty subgraph with m′ edges and n′ vertices, m′ ≤
kn′− `. We observe that this condition implies that 0 ≤ ` ≤ 2k− 1, and from now on in this
paper we will make this assumption. A sparse graph that has n vertices and exactly kn−` edges
is called tight.
For a graph G = (V,E), and V ′ ⊂ V , we use the notation span(V ′) for the number of edges
in the subgraph induced by V ′. In a directed graph, out(V ′) is the number of edges with the tail
in V ′ and the head in V −V ′; for a subgraph induced by V ′, we call such an edge an out-edge.
There are two important types of subgraphs of sparse graphs. A block is a tight subgraph of
a sparse graph. A component is a maximal block.
Table 1 summarizes the sparse graph terminology used in this paper.
1.2. Sparsity-certifying decompositions
A k-arborescence is a graph that admits a decomposition into k edge-disjoint spanning trees.
Figure 1(a) shows an example of a 3-arborescence. The k-arborescent graphs are described
by the well-known theorems of Tutte [23] and Nash-Williams [17] as exactly the (k,k)-tight
graphs.
A map-graph is a graph that admits an orientation such that the out-degree of each vertex is
exactly one. A k-map-graph is a graph that admits a decomposition into k edge-disjoint map-
graphs. Figure 1(b) shows an example of a 2-map-graphs; the edges are oriented in one possible
configuration certifying that each color forms a map-graph. Map-graphs may be equivalently
defined (see, e.g., [18]) as having exactly one cycle per connected component.1
A (k, `)-maps-and-trees is a graph that admits a decomposition into k− ` edge-disjoint
map-graphs and ` spanning trees.
Another characterization of map-graphs, which we will use extensively in this paper, is as
the (1,0)-tight graphs [8, 24]. The k-map-graphs are evidently (k,0)-tight, and [8, 24] show that
the converse holds as well.
1 Our terminology follows Lovász in [16]. In the matroid literature map-graphs are sometimes known as bases
of the bicycle matroid or spanning pseudoforests.
Sparsity-certifying Graph Decompositions 3
Fig. 1. Examples of sparsity-certifying decompositions: (a) a 3-arborescence; (b) a 2-map-graph; (c) a
(2,1)-maps-and-trees. Edges with the same line style belong to the same subgraph. The 2-map-graph is
shown with a certifying orientation.
A `Tk is a decomposition into ` edge-disjoint (not necessarily spanning) trees such that each
vertex is in exactly k of them. Figure 2(a) shows an example of a 3T2.
Given a subgraph G′ of a `Tk graph G, the set of tree-pieces in G′ is the collection of the
components of the trees in G induced by G′ (since G′ is a subgraph each tree may contribute
multiple pieces to the set of tree-pieces in G′). We observe that these tree-pieces may come
from the same tree or be single-vertex “empty trees.” It is also helpful to note that the definition
of a tree-piece is relative to a specific subgraph. An `Tk decomposition is proper if the set of
tree-pieces in any subgraph G′ has size at least `.
Figure 2(a) shows a graph with a 3T2 decomposition; we note that one of the trees is an
isolated vertex in the bottom-right corner. The subgraph in Figure 2(b) has three black tree-
pieces and one gray tree-piece: an isolated vertex at the top-right corner, and two single edges.
These count as three tree-pieces, even though they come from the same back tree when the
whole graph in considered. Figure 2(c) shows another subgraph; in this case there are three
gray tree-pieces and one black one.
Table 1 contains the decomposition terminology used in this paper.
The decomposition problem. We define the decomposition problem for sparse graphs as tak-
ing a graph as its input and producing as output, a decomposition that can be used to certify spar-
sity. In this paper, we will study three kinds of outputs: maps-and-trees; proper `Tk decompositions;
and the pebble-game-with-colors decomposition, which is defined in the next section.
2. Historical background
The well-known theorems of Tutte [23] and Nash-Williams [17] relate the (k,k)-tight graphs to
the existence of decompositions into edge-disjoint spanning trees. Taking a matroidal viewpoint,
4 Ileana Streinu, Louis Theran
Fig. 2. (a) A graph with a 3T2 decomposition; one of the three trees is a single vertex in the bottom right
corner. (b) The highlighted subgraph inside the dashed countour has three black tree-pieces and one gray
tree-piece. (c) The highlighted subgraph inside the dashed countour has three gray tree-pieces (one is a
single vertex) and one black tree-piece.
Edmonds [3, 4] gave another proof of this result using matroid unions. The equivalence of maps-
and-trees graphs and tight graphs in the lower range is shown using matroid unions in [24], and
matroid augmenting paths are the basis of the algorithms for the lower range of [5, 6, 19].
In rigidity theory a foundational theorem of Laman [11] shows that (2,3)-tight (Laman)
graphs correspond to generically minimally rigid bar-and-joint frameworks in the plane. Tay
[21] proved an analogous result for body-bar frameworks in any dimension using (k,k)-tight
graphs. Rigidity by counts motivated interest in the upper range, and Crapo [2] proved the
equivalence of Laman graphs and proper 3T2 graphs. Tay [22] used this condition to give a
direct proof of Laman’s theorem and generalized the 3T2 condition to all `Tk for k≤ `≤ 2k−1.
Haas [7] studied `Tk decompositions in detail and proved the equivalence of tight graphs and
proper `Tk graphs for the general upper range. We observe that aside from our new pebble-
game-with-colors decomposition, all the combinatorial characterizations of the upper range of
sparse graphs, including the counts, have a geometric interpretation [11, 21, 22, 24].
A pebble game algorithm was first proposed in [10] as an elegant alternative to Hendrick-
son’s Laman graph algorithms [9]. Berg and Jordan [1], provided the formal analysis of the
pebble game of [10] and introduced the idea of playing the game on a directed graph. Lee and
Streinu [12] generalized the pebble game to the entire range of parameters 0≤ `≤ 2k−1, and
left as an open problem using the pebble game to find sparsity certifying decompositions.
3. The pebble game with colors
Our pebble game with colors is a set of rules for constructing graphs indexed by nonnegative
integers k and `. We will use the pebble game with colors as the basis of an efficient algorithm
for the decomposition problem later in this paper. Since the phrase “with colors” is necessary
only for comparison to [12], we will omit it in the rest of the paper when the context is clear.
Sparsity-certifying Graph Decompositions 5
We now present the pebble game with colors. The game is played by a single player on a
fixed finite set of vertices. The player makes a finite sequence of moves; a move consists in the
addition and/or orientation of an edge. At any moment of time, the state of the game is captured
by a directed graph H, with colored pebbles on vertices and edges. The edges of H are colored
by the pebbles on them. While playing the pebble game all edges are directed, and we use the
notation vw to indicate a directed edge from v to w.
We describe the pebble game with colors in terms of its initial configuration and the allowed
moves.
Fig. 3. Examples of pebble game with colors moves: (a) add-edge. (b) pebble-slide. Pebbles on vertices
are shown as black or gray dots. Edges are colored with the color of the pebble on them.
Initialization: In the beginning of the pebble game, H has n vertices and no edges. We start
by placing k pebbles on each vertex of H, one of each color ci, for i = 1,2, . . . ,k.
Add-edge-with-colors: Let v and w be vertices with at least `+1 pebbles on them. Assume
(w.l.o.g.) that v has at least one pebble on it. Pick up a pebble from v, add the oriented edge vw
to E(H) and put the pebble picked up from v on the new edge.
Figure 3(a) shows examples of the add-edge move.
Pebble-slide: Let w be a vertex with a pebble p on it, and let vw be an edge in H. Replace
vw with wv in E(H); put the pebble that was on vw on v; and put p on wv.
Note that the color of an edge can change with a pebble-slide move. Figure 3(b) shows
examples. The convention in these figures, and throughout this paper, is that pebbles on vertices
are represented as colored dots, and that edges are shown in the color of the pebble on them.
From the definition of the pebble-slide move, it is easy to see that a particular pebble is
always either on the vertex where it started or on an edge that has this vertex as the tail. However,
when making a sequence of pebble-slide moves that reverse the orientation of a path in H, it is
sometimes convenient to think of this path reversal sequence as bringing a pebble from the end
of the path to the beginning.
The output of playing the pebble game is its complete configuration.
Output: At the end of the game, we obtain the directed graph H, along with the location
and colors of the pebbles. Observe that since each edge has exactly one pebble on it, the pebble
game configuration colors the edges.
We say that the underlying undirected graph G of H is constructed by the (k, `)-pebble game
or that H is a pebble-game graph.
Since each edge of H has exactly one pebble on it, the pebble game’s configuration partitions
the edges of H, and thus G, into k different colors. We call this decomposition of H a pebble-
game-with-colors decomposition. Figure 4(a) shows an example of a (2,2)-tight graph with a
pebble-game decomposition.
Let G = (V,E) be pebble-game graph with the coloring induced by the pebbles on the edges,
and let G′ be a subgraph of G. Then the coloring of G induces a set of monochromatic con-
6 Ileana Streinu, Louis Theran
(a) (b) (c)
Fig. 4. A (2,2)-tight graph with one possible pebble-game decomposition. The edges are oriented to
show (1,0)-sparsity for each color. (a) The graph K4 with a pebble-game decomposition. There is an
empty black tree at the center vertex and a gray spanning tree. (b) The highlighted subgraph has two
black trees and a gray tree; the black edges are part of a larger cycle but contribute a tree to the subgraph.
(c) The highlighted subgraph (with a light gray background) has three empty gray trees; the black edges
contain a cycle and do not contribute a piece of tree to the subgraph.
Notation Meaning
span(V ′) Number of edges spanned in H by V ′ ⊂V ; i.e. |EH(V ′)|
peb(V ′) Number of pebbles on V ′ ⊂V
out(V ′) Number of edges vw in H with v ∈V ′ and w ∈V −V ′
pebi(v) Number of pebbles of color ci on v ∈V
outi(v) Number of edges vw colored ci for v ∈V
Table 2. Pebble game notation used in this paper.
nected subgraphs of G′ (there may be more than one of the same color). Such a monochromatic
subgraph is called a map-graph-piece of G′ if it contains a cycle (in G′) and a tree-piece of G′
otherwise. The set of tree-pieces of G′ is the collection of tree-pieces induced by G′. As with
the corresponding definition for `Tk s, the set of tree-pieces is defined relative to a specific sub-
graph; in particular a tree-piece may be part of a larger cycle that includes edges not spanned
by G′.
The properties of pebble-game decompositions are studied in Section 6, and Theorem 2
shows that each color must be (1,0)-sparse. The orientation of the edges in Figure 4(a) shows
this.
For example Figure 4(a) shows a (2,2)-tight graph with one possible pebble-game decom-
position. The whole graph contains a gray tree-piece and a black tree-piece that is an isolated
vertex. The subgraph in Figure 4(b) has a black tree and a gray tree, with the edges of the black
tree coming from a cycle in the larger graph. In Figure 4(c), however, the black cycle does not
contribute a tree-piece. All three tree-pieces in this subgraph are single-vertex gray trees.
In the following discussion, we use the notation peb(v) for the number of pebbles on v and
pebi(v) to indicate the number of pebbles of colors i on v.
Table 2 lists the pebble game notation used in this paper.
4. Our Results
We describe our results in this section. The rest of the paper provides the proofs.
Sparsity-certifying Graph Decompositions 7
Our first result is a strengthening of the pebble games of [12] to include colors. It says
that sparse graphs are exactly pebble game graphs. Recall that from now on, all pebble games
discussed in this paper are our pebble game with colors unless noted explicitly.
Theorem 1 (Sparse graphs and pebble-game graphs coincide). A graph G is (k, `)-sparse
with 0≤ `≤ 2k−1 if and only if G is a pebble-game graph.
Next we consider pebble-game decompositions, showing that they are a generalization of
proper `Tk decompositions that extend to the entire matroidal range of sparse graphs.
Theorem 2 (The pebble-game-with-colors decomposition). A graph G is a pebble-game
graph if and only if it admits a decomposition into k edge-disjoint subgraphs such that each
is (1,0)-sparse and every subgraph of G contains at least ` tree-pieces of the (1,0)-sparse
graphs in the decomposition.
The (1,0)-sparse subgraphs in the statement of Theorem 2 are the colors of the pebbles; thus
Theorem 2 gives a characterization of the pebble-game-with-colors decompositions obtained
by playing the pebble game defined in the previous section. Notice the similarity between the
requirement that the set of tree-pieces have size at least ` in Theorem 2 and the definition of a
proper `Tk .
Our next results show that for any pebble-game graph, we can specialize its pebble game
construction to generate a decomposition that is a maps-and-trees or proper `Tk . We call these
specialized pebble game constructions canonical, and using canonical pebble game construc-
tions, we obtain new direct proofs of existing arboricity results.
We observe Theorem 2 that maps-and-trees are special cases of the pebble-game decompo-
sition: both spanning trees and spanning map-graphs are (1,0)-sparse, and each of the spanning
trees contributes at least one piece of tree to every subgraph.
The case of proper `Tk graphs is more subtle; if each color in a pebble-game decomposition
is a forest, then we have found a proper `Tk , but this class is a subset of all possible proper
`Tk decompositions of a tight graph. We show that this class of proper `Tk decompositions is
sufficient to certify sparsity.
We now state the main theorem for the upper and lower range.
Theorem 3 (Main Theorem (Lower Range): Maps-and-trees coincide with pebble-game
graphs). Let 0 ≤ ` ≤ k. A graph G is a tight pebble-game graph if and only if G is a (k, `)-
maps-and-trees.
Theorem 4 (Main Theorem (Upper Range): Proper `Tk graphs coincide with pebble-game
graphs). Let k≤ `≤ 2k−1. A graph G is a tight pebble-game graph if and only if it is a proper
`Tk with kn− ` edges.
As corollaries, we obtain the existing decomposition results for sparse graphs.
Corollary 5 (Nash-Williams [17], Tutte [23], White and Whiteley [24]). Let `≤ k. A graph
G is tight if and only if has a (k, `)-maps-and-trees decomposition.
Corollary 6 (Crapo [2], Haas [7]). Let k ≤ `≤ 2k−1. A graph G is tight if and only if it is a
proper `Tk .
Efficiently finding canonical pebble game constructions. The proofs of Theorem 3 and Theo-
rem 4 lead to an obvious algorithm with O(n3) running time for the decomposition problem.
Our last result improves on this, showing that a canonical pebble game construction, and thus
8 Ileana Streinu, Louis Theran
a maps-and-trees or proper `Tk decomposition can be found using a pebble game algorithm in
O(n2) time and space.
These time and space bounds mean that our algorithm can be combined with those of [12]
without any change in complexity.
5. Pebble game graphs
In this section we prove Theorem 1, a strengthening of results from [12] to the pebble game
with colors. Since many of the relevant properties of the pebble game with colors carry over
directly from the pebble games of [12], we refer the reader there for the proofs.
We begin by establishing some invariants that hold during the execution of the pebble game.
Lemma 7 (Pebble game invariants). During the execution of the pebble game, the following
invariants are maintained in H:
(I1) There are at least ` pebbles on V . [12]
(I2) For each vertex v, span(v)+out(v)+peb(v) = k. [12]
(I3) For each V ′ ⊂V , span(V ′)+out(V ′)+peb(V ′) = kn′. [12]
(I4) For every vertex v ∈V , outi(v)+pebi(v) = 1.
(I5) Every maximal path consisting only of edges with color ci ends in either the first vertex with
a pebble of color ci or a cycle.
Proof. (I1), (I2), and (I3) come directly from [12].
(I4) This invariant clearly holds at the initialization phase of the pebble game with colors.
That add-edge and pebble-slide moves preserve (I4) is clear from inspection.
(I5) By (I4), a monochromatic path of edges is forced to end only at a vertex with a pebble of
the same color on it. If there is no pebble of that color reachable, then the path must eventually
visit some vertex twice.
From these invariants, we can show that the pebble game constructible graphs are sparse.
Lemma 8 (Pebble-game graphs are sparse [12]). Let H be a graph constructed with the
pebble game. Then H is sparse. If there are exactly ` pebbles on V (H), then H is tight.
The main step in proving that every sparse graph is a pebble-game graph is the following.
Recall that by bringing a pebble to v we mean reorienting H with pebble-slide moves to reduce
the out degree of v by one.
Lemma 9 (The `+1 pebble condition [12]). Let vw be an edge such that H + vw is sparse. If
peb({v,w}) < `+1, then a pebble not on {v,w} can be brought to either v or w.
It follows that any sparse graph has a pebble game construction.
Theorem 1 (Sparse graphs and pebble-game graphs coincide). A graph G is (k, `)-sparse
with 0≤ `≤ 2k−1 if and only if G is a pebble-game graph.
6. The pebble-game-with-colors decomposition
In this section we prove Theorem 2, which characterizes all pebble-game decompositions. We
start with the following lemmas about the structure of monochromatic connected components
in H, the directed graph maintained during the pebble game.
Sparsity-certifying Graph Decompositions 9
Lemma 10 (Monochromatic pebble game subgraphs are (1,0)-sparse). Let Hi be the sub-
graph of H induced by edges with pebbles of color ci on them. Then Hi is (1,0)-sparse, for
i = 1, . . . ,k.
Proof. By (I4) Hi is a set of edges with out degree at most one for every vertex.
Lemma 11 (Tree-pieces in a pebble-game graph). Every subgraph of the directed graph H
in a pebble game construction contains at least ` monochromatic tree-pieces, and each of these
is rooted at either a vertex with a pebble on it or a vertex that is the tail of an out-edge.
Recall that an out-edge from a subgraph H ′ = (V ′,E ′) is an edge vw with v∈V ′ and vw /∈ E ′.
Proof. Let H ′ = (V ′,E ′) be a non-empty subgraph of H, and assume without loss of generality
that H ′ is induced by V ′. By (I3), out(V ′)+ peb(V ′) ≥ `. We will show that each pebble and
out-edge tail is the root of a tree-piece.
Consider a vertex v ∈ V ′ and a color ci. By (I4) there is a unique monochromatic directed
path of color ci starting at v. By (I5), if this path ends at a pebble, it does not have a cycle.
Similarly, if this path reaches a vertex that is the tail of an out-edge also in color ci (i.e., if the
monochromatic path from v leaves V ′), then the path cannot have a cycle in H ′.
Since this argument works for any vertex in any color, for each color there is a partitioning
of the vertices into those that can reach each pebble, out-edge tail, or cycle. It follows that each
pebble and out-edge tail is the root of a monochromatic tree, as desired.
Applied to the whole graph Lemma 11 gives us the following.
Lemma 12 (Pebbles are the roots of trees). In any pebble game configuration, each pebble of
color ci is the root of a (possibly empty) monochromatic tree-piece of color ci.
Remark: Haas showed in [7] that in a `Tk , a subgraph induced by n′ ≥ 2 vertices with m′
edges has exactly kn′−m′ tree-pieces in it. Lemma 11 strengthens Haas’ result by extending it
to the lower range and giving a construction that finds the tree-pieces, showing the connection
between the `+1 pebble condition and the hereditary condition on proper `Tk .
We conclude our investigation of arbitrary pebble game constructions with a description of
the decomposition induced by the pebble game with colors.
Theorem 2 (The pebble-game-with-colors decomposition). A graph G is a pebble-game
graph if and only if it admits a decomposition into k edge-disjoint subgraphs such that each
is (1,0)-sparse and every subgraph of G contains at least ` tree-pieces of the (1,0)-sparse
graphs in the decomposition.
Proof. Let G be a pebble-game graph. The existence of the k edge-disjoint (1,0)-sparse sub-
graphs was shown in Lemma 10, and Lemma 11 proves the condition on subgraphs.
For the other direction, we observe that a color ci with ti tree-pieces in a given subgraph can
span at most n− ti edges; summing over all the colors shows that a graph with a pebble-game
decomposition must be sparse. Apply Theorem 1 to complete the proof.
Remark: We observe that a pebble-game decomposition for a Laman graph may be read out
of the bipartite matching used in Hendrickson’s Laman graph extraction algorithm [9]. Indeed,
pebble game orientations have a natural correspondence with the bipartite matchings used in
10 Ileana Streinu, Louis Theran
Maps-and-trees are a special case of pebble-game decompositions for tight graphs: if there
are no cycles in ` of the colors, then the trees rooted at the corresponding ` pebbles must be
spanning, since they have n− 1 edges. Also, if each color forms a forest in an upper range
pebble-game decomposition, then the tree-pieces condition ensures that the pebble-game de-
composition is a proper `Tk .
In the next section, we show that the pebble game can be specialized to correspond to maps-
and-trees and proper `Tk decompositions.
7. Canonical Pebble Game Constructions
In this section we prove the main theorems (Theorem 3 and Theorem 4), continuing the inves-
tigation of decompositions induced by pebble game constructions by studying the case where a
minimum number of monochromatic cycles are created. The main idea, captured in Lemma 15
and illustrated in Figure 6, is to avoid creating cycles while collecting pebbles. We show that
this is always possible, implying that monochromatic map-graphs are created only when we
add more than k(n′−1) edges to some set of n′ vertices. For the lower range, this implies that
every color is a forest. Every decomposition characterization of tight graphs discussed above
follows immediately from the main theorem, giving new proofs of the previous results in a
unified framework.
In the proof, we will use two specializations of the pebble game moves. The first is a modi-
fication of the add-edge move.
Canonical add-edge: When performing an add-edge move, cover the new edge with a color
that is on both vertices if possible. If not, then take the highest numbered color present.
The second is a restriction on which pebble-slide moves we allow.
Canonical pebble-slide: A pebble-slide move is allowed only when it does not create a
monochromatic cycle.
We call a pebble game construction that uses only these moves canonical. In this section
we will show that every pebble-game graph has a canonical pebble game construction (Lemma
14 and Lemma 15) and that canonical pebble game constructions correspond to proper `Tk and
maps-and-trees decompositions (Theorem 3 and Theorem 4).
We begin with a technical lemma that motivates the definition of canonical pebble game
constructions. It shows that the situations disallowed by the canonical moves are all the ways
for cycles to form in the lowest ` colors.
Lemma 13 (Monochromatic cycle creation). Let v ∈ V have a pebble p of color ci on it and
let w be a vertex in the same tree of color ci as v. A monochromatic cycle colored ci is created
in exactly one of the following ways:
(M1) The edge vw is added with an add-edge move.
(M2) The edge wv is reversed by a pebble-slide move and the pebble p is used to cover the reverse
edge vw.
Proof. Observe that the preconditions in the statement of the lemma are implied by Lemma 7.
By Lemma 12 monochromatic cycles form when the last pebble of color ci is removed from a
connected monochromatic subgraph. (M1) and (M2) are the only ways to do this in a pebble
game construction, since the color of an edge only changes when it is inserted the first time or
a new pebble is put on it by a pebble-slide move.
Sparsity-certifying Graph Decompositions 11
vw vw
Fig. 5. Creating monochromatic cycles in a (2,0)-pebble game. (a) A type (M1) move creates a cycle by
adding a black edge. (b) A type (M2) move creates a cycle with a pebble-slide move. The vertices are
labeled according to their role in the definition of the moves.
Figure 5(a) and Figure 5(b) show examples of (M1) and (M2) map-graph creation moves,
respectively, in a (2,0)-pebble game construction.
We next show that if a graph has a pebble game construction, then it has a canonical peb-
ble game construction. This is done in two steps, considering the cases (M1) and (M2) sepa-
rately. The proof gives two constructions that implement the canonical add-edge and canonical
pebble-slide moves.
Lemma 14 (The canonical add-edge move). Let G be a graph with a pebble game construc-
tion. Cycle creation steps of type (M1) can be eliminated in colors ci for 1 ≤ i ≤ `′, where
`′ = min{k, `}.
Proof. For add-edge moves, cover the edge with a color present on both v and w if possible. If
this is not possible, then there are `+1 distinct colors present. Use the highest numbered color
to cover the new edge.
Remark: We note that in the upper range, there is always a repeated color, so no canonical
add-edge moves create cycles in the upper range.
The canonical pebble-slide move is defined by a global condition. To prove that we obtain
the same class of graphs using only canonical pebble-slide moves, we need to extend Lemma
9 to only canonical moves. The main step is to show that if there is any sequence of moves that
reorients a path from v to w, then there is a sequence of canonical moves that does the same
thing.
Lemma 15 (The canonical pebble-slide move). Any sequence of pebble-slide moves leading
to an add-edge move can be replaced with one that has no (M2) steps and allows the same
add-edge move.
In other words, if it is possible to collect `+ 1 pebbles on the ends of an edge to be added,
then it is possible to do this without creating any monochromatic cycles.
12 Ileana Streinu, Louis Theran
Figure 7 and Figure 8 illustrate the construction used in the proof of Lemma 15. We call this
the shortcut construction by analogy to matroid union and intersection augmenting paths used
in previous work on the lower range.
Figure 6 shows the structure of the proof. The shortcut construction removes an (M2) step
at the beginning of a sequence that reorients a path from v to w with pebble-slides. Since one
application of the shortcut construction reorients a simple path from a vertex w′ to w, and a
path from v to w′ is preserved, the shortcut construction can be applied inductively to find the
sequence of moves we want.
Fig. 6. Outline of the shortcut construction: (a) An arbitrary simple path from v to w with curved lines
indicating simple paths. (b) An (M2) step. The black edge, about to be flipped, would create a cycle,
shown in dashed and solid gray, of the (unique) gray tree rooted at w. The solid gray edges were part
of the original path from (a). (c) The shortened path to the gray pebble; the new path follows the gray
tree all the way from the first time the original path touched the gray tree at w′. The path from v to w′ is
simple, and the shortcut construction can be applied inductively to it.
Proof. Without loss of generality, we can assume that our sequence of moves reorients a simple
path in H, and that the first move (the end of the path) is (M2). The (M2) step moves a pebble
of color ci from a vertex w onto the edge vw, which is reversed. Because the move is (M2), v
and w are contained in a maximal monochromatic tree of color ci. Call this tree H ′i , and observe
that it is rooted at w.
Now consider the edges reversed in our sequence of moves. As noted above, before we make
any of the moves, these sketch out a simple path in H ending at w. Let z be the first vertex on
this path in H ′i . We modify our sequence of moves as follows: delete, from the beginning, every
move before the one that reverses some edge yz; prepend onto what is left a sequence of moves
that moves the pebble on w to z in H ′i .
Sparsity-certifying Graph Decompositions 13
Fig. 7. Eliminating (M2) moves: (a) an (M2) move; (b) avoiding the (M2) by moving along another path.
The path where the pebbles move is indicated by doubled lines.
Fig. 8. Eliminating (M2) moves: (a) the first step to move the black pebble along the doubled path is
(M2); (b) avoiding the (M2) and simplifying the path.
Since no edges change color in the beginning of the new sequence, we have eliminated
the (M2) move. Because our construction does not change any of the edges involved in the
remaining tail of the original sequence, the part of the original path that is left in the new
sequence will still be a simple path in H, meeting our initial hypothesis.
The rest of the lemma follows by induction.
Together Lemma 14 and Lemma 15 prove the following.
Lemma 16. If G is a pebble-game graph, then G has a canonical pebble game construction.
Using canonical pebble game constructions, we can identify the tight pebble-game graphs
with maps-and-trees and `Tk graphs.
14 Ileana Streinu, Louis Theran
Theorem 3 (Main Theorem (Lower Range): Maps-and-trees coincide with pebble-game
graphs). Let 0 ≤ ` ≤ k. A graph G is a tight pebble-game graph if and only if G is a (k, `)-
maps-and-trees.
Proof. As observed above, a maps-and-trees decomposition is a special case of the pebble game
decomposition. Applying Theorem 2, we see that any maps-and-trees must be a pebble-game
graph.
For the reverse direction, consider a canonical pebble game construction of a tight graph.
From Lemma 8, we see that there are ` pebbles left on G at the end of the construction. The
definition of the canonical add-edge move implies that there must be at least one pebble of
each ci for i = 1,2, . . . , `. It follows that there is exactly one of each of these colors. By Lemma
12, each of these pebbles is the root of a monochromatic tree-piece with n− 1 edges, yielding
the required ` edge-disjoint spanning trees.
Corollary 5 (Nash-Williams [17], Tutte [23], White and Whiteley [24]). Let `≤ k. A graph
G is tight if and only if has a (k, `)-maps-and-trees decomposition.
We next consider the decompositions induced by canonical pebble game constructions when
`≥ k +1.
Theorem 4 (Main Theorem (Upper Range): Proper Trees-and-trees coincide with peb-
ble-game graphs). Let k≤ `≤ 2k−1. A graph G is a tight pebble-game graph if and only if it
is a proper `Tk with kn− ` edges.
Proof. As observed above, a proper `Tk decomposition must be sparse. What we need to show
is that a canonical pebble game construction of a tight graph produces a proper `Tk .
By Theorem 2 and Lemma 16, we already have the condition on tree-pieces and the decom-
position into ` edge-disjoint trees. Finally, an application of (I4), shows that every vertex must
in in exactly k of the trees, as required.
Corollary 6 (Crapo [2], Haas [7]). Let k ≤ `≤ 2k−1. A graph G is tight if and only if it is a
proper `Tk .
8. Pebble game algorithms for finding decompositions
A naı̈ve implementation of the constructions in the previous section leads to an algorithm re-
quiring Θ(n2) time to collect each pebble in a canonical construction: in the worst case Θ(n)
applications of the construction in Lemma 15 requiring Θ(n) time each, giving a total running
time of Θ(n3) for the decomposition problem.
In this section, we describe algorithms for the decomposition problem that run in time
O(n2). We begin with the overall structure of the algorithm.
Algorithm 17 (The canonical pebble game with colors).
Input: A graph G.
Output: A pebble-game graph H.
Method:
– Set V (H) = V (G) and place one pebble of each color on the vertices of H.
– For each edge vw ∈ E(G) try to collect at least `+1 pebbles on v and w using pebble-slide
moves as described by Lemma 15.
Sparsity-certifying Graph Decompositions 15
– If at least `+1 pebbles can be collected, add vw to H using an add-edge move as in Lemma
14, otherwise discard vw.
– Finally, return H, and the locations of the pebbles.
Correctness. Theorem 1 and the result from [24] that the sparse graphs are the independent
sets of a matroid show that H is a maximum sized sparse subgraph of G. Since the construction
found is canonical, the main theorem shows that the coloring of the edges in H gives a maps-
and-trees or proper `Tk decomposition.
Complexity. We start by observing that the running time of Algorithm 17 is the time taken to
process O(n) edges added to H and O(m) edges not added to H. We first consider the cost of an
edge of G that is added to H.
Each of the pebble game moves can be implemented in constant time. What remains is to
describe an efficient way to find and move the pebbles. We use the following algorithm as a
subroutine of Algorithm 17 to do this.
Algorithm 18 (Finding a canonical path to a pebble.).
Input: Vertices v and w, and a pebble game configuration on a directed graph H.
Output: If a pebble was found, ‘yes’, and ‘no’ otherwise. The configuration of H is updated.
Method:
– Start by doing a depth-first search from from v in H. If no pebble not on w is found, stop and
return ‘no.’
– Otherwise a pebble was found. We now have a path v = v1,e1, . . . ,ep−1,vp = u, where the vi
are vertices and ei is the edge vivi+1. Let c[ei] be the color of the pebble on ei. We will use
the array c[] to keep track of the colors of pebbles on vertices and edges after we move them
and the array s[] to sketch out a canonical path from v to u by finding a successor for each
edge.
– Set s[u] = ‘end′ and set c[u] to the color of an arbitrary pebble on u. We walk on the path in
reverse order: vp,ep−1,ep−2, . . . ,e1,v1. For each i, check to see if c[vi] is set; if so, go on to
the next i. Otherwise, check to see if c[vi+1] = c[ei].
– If it is, set s[vi] = ei and set c[vi] = c[ei], and go on to the next edge.
– Otherwise c[vi+1] 6= c[ei], try to find a monochromatic path in color c[vi+1] from vi to vi+1. If
a vertex x is encountered for which c[x] is set, we have a path vi = x1, f1,x2, . . . , fq−1,xq = x
that is monochromatic in the color of the edges; set c[xi] = c[ fi] and s[xi] = fi for i =
1,2, . . . ,q−1. If c[x] = c[ fq−1], stop. Otherwise, recursively check that there is not a monochro-
matic c[x] path from xq−1 to x using this same procedure.
– Finally, slide pebbles along the path from the original endpoints v to u specified by the
successor array s[v], s[s[v]], . . .
The correctness of Algorithm 18 comes from the fact that it is implementing the shortcut
construction. Efficiency comes from the fact that instead of potentially moving the pebble back
and forth, Algorithm 18 pre-computes a canonical path crossing each edge of H at most three
times: once in the initial depth-first search, and twice while converting the initial path to a
canonical one. It follows that each accepted edges takes O(n) time, for a total of O(n2) time
spent processing edges in H.
Although we have not discussed this explicity, for the algorithm to be efficient we need to
maintain components as in [12]. After each accepted edge, the components of H can be updated
in time O(n). Finally, the results of [12, 13] show that the rejected edges take an amortized O(1)
time each.
16 Ileana Streinu, Louis Theran
Summarizing, we have shown that the canonical pebble game with colors solves the decom-
position problem in time O(n2).
9. An important special case: Rigidity in dimension 2 and slider-pinning
In this short section we present a new application for the special case of practical importance,
k = 2, ` = 3. As discussed in the introduction, Laman’s theorem [11] characterizes minimally
rigid graphs as the (2,3)-tight graphs. In recent work on slider pinning, developed after the
current paper was submitted, we introduced the slider-pinning model of rigidity [15, 20]. Com-
binatorially, we model the bar-slider frameworks as simple graphs together with some loops
placed on their vertices in such a way that there are no more than 2 loops per vertex, one of each
color.
We characterize the minimally rigid bar-slider graphs [20] as graphs that are:
1. (2,3)-sparse for subgraphs containing no loops.
2. (2,0)-tight when loops are included.
We call these graphs (2,0,3)-graded-tight, and they are a special case of the graded-sparse
graphs studied in our paper [14].
The connection with the pebble games in this paper is the following.
Corollary 19 (Pebble games and slider-pinning). In any (2,3)-pebble game graph, if we
replace pebbles by loops, we obtain a (2,0,3)-graded-tight graph.
Proof. Follows from invariant (I3) of Lemma 7.
In [15], we study a special case of slider pinning where every slider is either vertical or
horizontal. We model the sliders as pre-colored loops, with the color indicating x or y direction.
For this axis parallel slider case, the minimally rigid graphs are characterized by:
1. (2,3)-sparse for subgraphs containing no loops.
2. Admit a 2-coloring of the edges so that each color is a forest (i.e., has no cycles), and each
monochromatic tree spans exactly one loop of its color.
This also has an interpretation in terms of colored pebble games.
Corollary 20 (The pebble game with colors and slider-pinning). In any canonical (2,3)-
pebble-game-with-colors graph, if we replace pebbles by loops of the same color, we obtain the
graph of a minimally pinned axis-parallel bar-slider framework.
Proof. Follows from Theorem 4, and Lemma 12.
10. Conclusions and open problems
We presented a new characterization of (k, `)-sparse graphs, the pebble game with colors, and
used it to give an efficient algorithm for finding decompositions of sparse graphs into edge-
disjoint trees. Our algorithm finds such sparsity-certifying decompositions in the upper range
and runs in time O(n2), which is as fast as the algorithms for recognizing sparse graphs in the
upper range from [12].
We also used the pebble game with colors to describe a new sparsity-certifying decomposi-
tion that applies to the entire matroidal range of sparse graphs.
Sparsity-certifying Graph Decompositions 17
We defined and studied a class of canonical pebble game constructions that correspond to
either a maps-and-trees or proper `Tk decomposition. This gives a new proof of the Tutte-Nash-
Williams arboricity theorem and a unified proof of the previously studied decomposition cer-
tificates of sparsity. Canonical pebble game constructions also show the relationship between
the `+1 pebble condition, which applies to the upper range of `, to matroid union augmenting
paths, which do not apply in the upper range.
Algorithmic consequences and open problems. In [6], Gabow and Westermann give an O(n3/2)
algorithm for recognizing sparse graphs in the lower range and extracting sparse subgraphs from
dense ones. Their technique is based on efficiently finding matroid union augmenting paths,
which extend a maps-and-trees decomposition. The O(n3/2) algorithm uses two subroutines to
find augmenting paths: cyclic scanning, which finds augmenting paths one at a time, and batch
scanning, which finds groups of disjoint augmenting paths.
We observe that Algorithm 17 can be used to replace cyclic scanning in Gabow and Wester-
mann’s algorithm without changing the running time. The data structures used in the implemen-
tation of the pebble game, detailed in [12, 13] are simpler and easier to implement than those
used to support cyclic scanning.
The two major open algorithmic problems related to the pebble game are then:
Problem 1. Develop a pebble game algorithm with the properties of batch scanning and obtain
an implementable O(n3/2) algorithm for the lower range.
Problem 2. Extend batch scanning to the `+1 pebble condition and derive an O(n3/2) pebble
game algorithm for the upper range.
In particular, it would be of practical importance to find an implementable O(n3/2) algorithm
for decompositions into edge-disjoint spanning trees.
References
1. Berg, A.R., Jordán, T.: Algorithms for graph rigidity and scene analysis. In: Proc. 11th
European Symposium on Algorithms (ESA ’03), LNCS, vol. 2832, pp. 78–89. (2003)
2. Crapo, H.: On the generic rigidity of plane frameworks. Tech. Rep. 1278, Institut de
recherche d’informatique et d’automatique (1988)
3. Edmonds, J.: Minimum partition of a matroid into independent sets. J. Res. Nat. Bur.
Standards Sect. B 69B, 67–72 (1965)
4. Edmonds, J.: Submodular functions, matroids, and certain polyhedra. In: Combinatorial
Optimization—Eureka, You Shrink!, no. 2570 in LNCS, pp. 11–26. Springer (2003)
5. Gabow, H.N.: A matroid approach to finding edge connectivity and packing arborescences.
Journal of Computer and System Sciences 50, 259–273 (1995)
6. Gabow, H.N., Westermann, H.H.: Forests, frames, and games: Algorithms for matroid sums
and applications. Algorithmica 7(1), 465–497 (1992)
7. Haas, R.: Characterizations of arboricity of graphs. Ars Combinatorica 63, 129–137 (2002)
8. Haas, R., Lee, A., Streinu, I., Theran, L.: Characterizing sparse graphs by map decompo-
sitions. Journal of Combinatorial Mathematics and Combinatorial Computing 62, 3–11
(2007)
9. Hendrickson, B.: Conditions for unique graph realizations. SIAM Journal on Computing
21(1), 65–84 (1992)
18 Ileana Streinu, Louis Theran
10. Jacobs, D.J., Hendrickson, B.: An algorithm for two-dimensional rigidity percolation: the
pebble game. Journal of Computational Physics 137, 346–365 (1997)
11. Laman, G.: On graphs and rigidity of plane skeletal structures. Journal of Engineering
Mathematics 4, 331–340 (1970)
12. Lee, A., Streinu, I.: Pebble game algorihms and sparse graphs. Discrete Mathematics
308(8), 1425–1437 (2008)
13. Lee, A., Streinu, I., Theran, L.: Finding and maintaining rigid components. In: Proc. Cana-
dian Conference of Computational Geometry. Windsor, Ontario (2005). http://cccg.
cs.uwindsor.ca/papers/72.pdf
14. Lee, A., Streinu, I., Theran, L.: Graded sparse graphs and matroids. Journal of Universal
Computer Science 13(10) (2007)
15. Lee, A., Streinu, I., Theran, L.: The slider-pinning problem. In: Proceedings of the 19th
Canadian Conference on Computational Geometry (CCCG’07) (2007)
16. Lovász, L.: Combinatorial Problems and Exercises. Akademiai Kiado and North-Holland,
Amsterdam (1979)
17. Nash-Williams, C.S.A.: Decomposition of finite graphs into forests. Journal of the London
Mathematical Society 39, 12 (1964)
18. Oxley, J.G.: Matroid theory. The Clarendon Press, Oxford University Press, New York
(1992)
19. Roskind, J., Tarjan, R.E.: A note on finding minimum cost edge disjoint spanning trees.
Mathematics of Operations Research 10(4), 701–708 (1985)
20. Streinu, I., Theran, L.: Combinatorial genericity and minimal rigidity. In: SCG ’08: Pro-
ceedings of the twenty-fourth annual Symposium on Computational Geometry, pp. 365–
374. ACM, New York, NY, USA (2008).
21. Tay, T.S.: Rigidity of multigraphs I: linking rigid bodies in n-space. Journal of Combinato-
rial Theory, Series B 26, 95–112 (1984)
22. Tay, T.S.: A new proof of Laman’s theorem. Graphs and Combinatorics 9, 365–370 (1993)
23. Tutte, W.T.: On the problem of decomposing a graph into n connected factors. Journal of
the London Mathematical Society 142, 221–230 (1961)
24. Whiteley, W.: The union of matroids and the rigidity of frameworks. SIAM Journal on
Discrete Mathematics 1(2), 237–255 (1988)
http://cccg.cs.uwindsor.ca/papers/72.pdf
http://cccg.cs.uwindsor.ca/papers/72.pdf
	Introduction and preliminaries
	Historical background
	The pebble game with colors
	Our Results
	Pebble game graphs
	The pebble-game-with-colors decomposition
	Canonical Pebble Game Constructions
	Pebble game algorithms for finding decompositions
	An important special case: Rigidity in dimension 2 and slider-pinning
	Conclusions and open problems
ABSTRACT
  We describe a new algorithm, the $(k,\ell)$-pebble game with colors, and use
it obtain a characterization of the family of $(k,\ell)$-sparse graphs and
algorithmic solutions to a family of problems concerning tree decompositions of
graphs. Special instances of sparse graphs appear in rigidity theory and have
received increased attention in recent years. In particular, our colored
pebbles generalize and strengthen the previous results of Lee and Streinu and
give a new proof of the Tutte-Nash-Williams characterization of arboricity. We
also present a new decomposition that certifies sparsity based on the
$(k,\ell)$-pebble game with colors. Our work also exposes connections between
pebble game algorithms and previous sparse graph algorithms by Gabow, Gabow and
Westermann and Hendrickson.

<|endoftext|><|startoftext|>
Introduction 
The popularly accepted theory for the formation of the Earth-Moon system is that 
the Moon was formed from debris of a strong impact by a giant planetesimal with the 
Earth at the close of the planet-forming period (Hartmann and Davis 1975). Since the 
formation of the Earth-Moon system, it has been evolving at all time scale. It is well 
known that the Moon is receding from us and both the Earth’s rotation and Moon’s 
rotation are slowing. The popular theory is that the tidal friction causes all those changes 
based on the conservation of the angular momentum of the Earth-Moon system. The 
situation becomes complicated in describing the past evolution of the Earth-Moon 
system. Because the Moon is moving away from us and the Earth rotation is slowing, this 
means that the Moon was closer and the Earth rotation was faster in the past.  Creationists 
argue that based on the tidal friction theory, the tidal friction should be stronger and the 
recessional rate of the Moon should be greater in the past, the distance of the Moon 
would quickly fall inside the Roche's limit (for earth, 15500 km) in which the Moon 
would be torn apart by gravity in 1 to 2 billion years ago. However, geological evidence 
indicates that the recession of the Moon in the past was slower than the present rate, i. e., 
the recession has been accelerating with time. Therefore, it must be concluded that tidal 
friction was very much less in the remote past than we would deduce on the basis of 
present-day observations (Stacey 1977). This was called “geological time scale 
difficulty” or “Lunar crisis” and is one of the main arguments by creationists against the 
tidal friction theory (Brush 1983).  
But we have to consider the case carefully in various aspects. One possible 
scenario is that the Earth has been undergoing dynamic evolution at all time scale since 
its inception, the geological and physical conditions (such as the continent positions and 
drifting, the crust, surface temperature fluctuation like the glacial/snowball effect, etc) at 
remote past could be substantially different from currently, in which the tidal friction 
could be much less; therefore, the receding rate of the Moon could be slower. Various 
tidal friction models were proposed in the past to describe the evolution of the Earth-
Moon system to avoid such difficulty or crisis and put the Moon at quite a comfortable 
distance from Earth at 4.5 billion years ago (Hansen 1982, Kagan and Maslova 1994, Ray 
et al. 1999, Finch 1981, Slichter 1963). The tidal friction theories explain that the present 
rate of tidal dissipation is anomalously high because the tidal force is close to a resonance 
in the response function of ocean (Brush 1983). Kagan gave a detailed review about those 
tidal friction models (Kagan 1997). Those models are based on many assumptions about 
geological (continental position and drifting) and physical conditions in the past, and 
many parameters (such as phase lag angle, multi-mode approximation with time 
dependent frequencies of the resonance modes, etc.) have to be introduced and carefully 
adjusted to make their predictions close to the geological evidence. However, those 
assumptions and parameters are still challenged, to certain extent, as concoction.  
The second possible scenario is that another mechanism could dominate the 
evolution of the Earth-Moon system and the role of the tidal friction is not significant. In 
the Meeting of Division of Particle and Field 2004, American Physical Society, 
University of California at Riverside, the author proposed a dark matter field fluid model 
(Pan 2005) with a non-Newtonian approach, the current Moon and Earth data agree with 
this model very well. This paper will demonstrate that the past evolution of Moon-Earth 
system can be described by the dark matter field fluid model without any assumptions 
about past geological and physical conditions. Although the subject of the evolution of 
the Earth-Moon system has been extensively studied analytically or numerically, to the 
author’s knowledge, there are no theories similar or equivalent to this model.  
2.  Invisible matter 
In modern cosmology, it was proposed that the visible matter in the universe is 
about 2 ~ 10 % of the total matter and about 90 ~ 98% of total matter is currently 
invisible which is called dark matter and dark energy, such invisible matter has an anti-
gravity property to make the universe expanding faster and faster.   
If the ratio of the matter components of the universe is close to this hypothesis, 
then, the evolution of the universe should be dominated by the physical mechanism of 
such invisible matter, such physical mechanism could be far beyond the current 
Newtonian physics and Einsteinian physics, and the Newtonian physics and Einsteinian 
physics could reflect only a corner of the iceberg of the greater physics.  
If the ratio of the matter components of the universe is close to this hypothesis, 
then, it should be more reasonable to think that such dominant invisible matter spreads in 
everywhere of the universe (the density of the invisible matter may vary from place to 
place); in other words, all visible matter objects should be surrounded by such invisible 
matter and the motion of the visible matter objects should be affected by the invisible 
matter if there are  interactions between the visible matter and the invisible matter.  
If the ratio of the matter components of the universe is close to this hypothesis, 
then, the size of the particles of the invisible matter should be very small and below the 
detection limit of the current technology; otherwise, it would be detected long time ago 
with such dominant amount. 
With such invisible matter in mind, we move to the next section to develop the 
dark matter field fluid model with non-Newtonian approach. For simplicity, all invisible 
matter (dark matter, dark energy and possible other terms) is called dark matter here. 
3. The dark matter field fluid model 
In this proposed model, it is assumed that: 
1.  A celestial body rotates and moves in the space, which, for simplicity, is uniformly 
filled with the dark matter which is in quiescent state relative to the motion of the 
celestial body. The dark matter possesses a field property and a fluid property; it can 
interact with the celestial body with its fluid and field properties; therefore, it can have 
energy exchange with the celestial body, and affect the motion of the celestial body. 
2.  The fluid property follows the general principle of fluid mechanics. The dark matter 
field fluid particles may be so small that they can easily permeate into ordinary 
“baryonic” matter;  i. e., ordinary matter objects could be saturated with such dark matter 
field fluid. Thus, the whole celestial body interacts with the dark matter field fluid, in the 
manner of a sponge moving thru water. The nature of the field property of the dark matter 
field fluid is unknown. It is here assumed that the interaction of the field associated with 
the dark matter field fluid with the celestial body is proportional to the mass of the 
celestial body. The dark matter field fluid is assumed to have a repulsive force against the 
gravitational force towards baryonic matter. The nature and mechanism of such repulsive 
force is unknown. 
With the assumptions above, one can study how the dark matter field fluid may 
influence the motion of a celestial body and compare the results with observations. The 
common shape of celestial bodies is spherical. According to Stokes's law, a rigid non-
permeable sphere moving through a quiescent fluid with a sufficiently low Reynolds 
number experiences a resistance force F 
rvF πμ6−=       (1) 
where v is the moving velocity, r is the radius of the sphere, and μ is the fluid viscosity 
constant. The direction of the resistance force F in Eq. 1 is opposite to the direction of the 
velocity v. For a rigid sphere moving through the dark matter field fluid, due to the dual 
properties of the dark matter field fluid and its permeation into the sphere, the force F 
may not be proportional to the radius of the sphere. Also, F may be proportional to the 
mass of the sphere due to the field interaction. Therefore, with the combined effects of 
both fluid and field, the force exerted on the sphere by the dark matter field fluid is 
assumed to be of the scaled form 
        (2) mvrF n−−= 16πη
where n is a parameter arising from saturation by dark matter field fluid, the r1-n can be 
viewed as the effective radius with the same unit as r, m is the mass of the sphere, and η 
is the dark matter field fluid constant, which is equivalent to μ. The direction of the 
resistance force F in Eq. 2 is opposite to the direction of the velocity v. The force 
described by Eq. 2 is velocity-dependent and causes negative acceleration. According to 
Newton's second law of motion, the equation of motion for the sphere is 
   mvr
m n−−= 16πη      (3) 
Then  
       (4) )6exp( 10 vtrvv
n−−= πη
where v0 is the initial velocity (t = 0) of the sphere. If the sphere revolves around a 
massive gravitational center, there are three forces in the line between the sphere and the 
gravitational center: (1) the gravitational force, (2) the centripetal acceleration force; and 
(3) the repulsive force of the dark matter field fluid.  The drag force in Eq. 3 reduces the 
orbital velocity and causes the sphere to move inward to the gravitational center. 
However, if the sum of the centripetal acceleration force and the repulsive force is 
stronger than the gravitational force, then, the sphere will move outward and recede from 
the gravitational center. This is the case of interest here. If the velocity change in Eq. 3 is 
sufficiently slow and the repulsive force is small compared to the gravitational force and 
centripetal acceleration force, then the rate of receding will be accordingly relatively 
slow. Therefore, the gravitational force and the centripetal acceleration force can be 
approximately treated in equilibrium at any time. The pseudo equilibrium equation is 
GMm 2
2 =       (5) 
where G is the gravitational constant, M is the mass of the gravitational center, and R is 
the radius of the orbit.  Inserting v of Eq.  4 into Eq. 5 yields  
   )12exp( 1
R n−= πη     (6) 
       (7) )12exp( 10 trRR
n−= πη
where 
R =       (8) 
R0 is the initial distance to the gravitational center.  Note that R exponentially increases 
with time. The increase of orbital energy with the receding comes from the repulsive 
force of dark matter field fluid.  The recessional rate of the sphere is 
dR n−= 112πη      (9) 
The acceleration of the recession is 
   ( Rr
Rd n 21
12 −= πη ) .     (10) 
The recessional acceleration is positive and proportional to its distance to the 
gravitational center, so the recession is faster and faster. 
According to the mechanics of fluids, for a rigid non-permeable sphere rotating 
about its central axis in the quiescent fluid, the torque T exerted by the fluid on the sphere 
ωπμ 38 rT −=       (11) 
where ω is the angular velocity of the sphere.  The direction of the torque in Eq. 11 is 
opposite to the direction of the rotation. In the case of a sphere rotating in the quiescent 
dark matter field fluid with angular velocity ω, similar to Eq. 2, the proposed T exerted 
on the sphere is 
( ) ωπη mrT n 318 −−=      (12) 
The direction of the torque in Eq. 12 is opposite to the direction of the rotation. The 
torque causes the negative angular acceleration 
=       (13) 
where I is the moment of inertia of the sphere in the dark matter field fluid 
   ( )21
2 nrmI −=       (14) 
Therefore, the equation of rotation for the sphere in the dark matter field fluid is 
   ωπη
d −−= 120      (15) 
Solving this equation yields 
       (16) )20exp( 10 tr
n−−= πηωω
where ω0 is the initial angular velocity.  One can see that the angular velocity of the 
sphere exponentially decreases with time and the angular deceleration is proportional to 
its angular velocity.   
For the same celestial sphere, combining Eq. 9 and Eq. 15 yields 
     (17) 
The significance of Eq. 17 is that it contains only observed data without assumptions and 
undetermined parameters; therefore, it is a critical test for this model. 
For two different celestial spheres in the same system, combining Eq. 9 and Eq. 
15 yields 
   67.1
1 −=−=⎟⎟
   (18) 
This is another critical test for this model. 
4. The current behavior of the Earth-Moon system agrees with the model 
 The Moon-Earth system is the simplest gravitational system. The solar system is 
complex, the Earth and the Moon experience not only the interaction of the Sun but also 
interactions of other planets. Let us consider the local Earth-Moon gravitational system as 
an isolated local gravitational system, i.e., the influence from the Sun and other planets 
on the rotation and orbital motion of the Moon and on the rotation of the Earth is 
assumed negligible compared to the forces exerted by the moon and earth on each other. 
In addition, the eccentricity of the Moon's orbit is small enough to be ignored. The data 
about the Moon and the Earth from references (Dickey et .al., 1994,  and Lang, 1992) are 
listed below for the readers' convenience to verify the calculation because the data may 
vary slightly with different data sources. 
Moon: 
    Mean radius:    r  = 1738.0 km 
    Mass:       m = 7.3483 × 1025 gram 
    Rotation period = 27.321661 days 
    Angular velocity of Moon = 2.6617 × 10-6 rad s-1 
    Mean distance to Earth Rm= 384400 km 
    Mean orbital velocity v = 1.023 km s-1 
    Orbit eccentricity    e = 0.0549 
    Angular rotation acceleration rate = -25.88 ±  0.5 arcsec century-2 
       = (-1.255 ± 0.024) × 10-4 rad century-2
       = (-1.260 ± 0.024) × 10-23 rad s-2
    Receding rate from Earth = 3.82 ±  0.07 cm year-1 = (1.21 ±  0.02) × 10-9 m s-1 
Earth: 
    Mean radius:    r  = 6371.0 km 
    Mass:       m = 5.9742 × 1027 gram 
    Rotation period = 23 h 56m 04.098904s = 86164.098904s 
    Angular velocity of rotation = 7.292115 × 10-5 rad s-1 
    Mean distance to the Sun Rm= 149,597,870.61 km 
    Mean orbital velocity v = 29.78 km s-1 
    Angular acceleration of Earth = (-5.5 ± 0.5) × 10-22 rad s-2
The Moon's angular rotation acceleration rate and increase in mean distance to the Earth 
(receding rate) were obtained from the lunar laser ranging of the Apollo Program (Dickey 
et .al., 1994). By inserting the data of the Moon's rotation and recession into Eq.  17, the 
result is 
 039.054.1
10662.21021.1
1092509.31026.1
   (19) 
The distance R in Eq. 19 is from the Moon's center to the Earth's center and the number 
384400 km is assumed to be the distance from the Moon's surface to the Earth's surface. 
Eq. 19 is in good agreement with the theoretical value of -1.67. The result is in accord 
with the model used here. The difference (about 7.8%) between the values of -1.54 and -
1.67 may come from several sources: 
1. Moon's orbital is not a perfect circle 
2. Moon is not a perfect rigid sphere. 
3. The effect from Sun and other planets. 
4. Errors in data. 
5. Possible other unknown reasons. 
The two parameters n and η in Eq. 9 and Eq. 15 can be determined with two data 
sets.  The third data set can be used to further test the model.  If this model correctly 
describes the situation at hand, it should give consistent results for different motions. The 
values of n and η calculated from three different data sets are listed below (Note, the 
mean distance of the Moon to the Earth and mean radii of the Moon and the Earth are 
used in the calculation). 
    The value of n:          n = 0.64 
    From the Moon's rotation:   η  = 4.27 ×  10-22  s-1 m-1
    From the Earth's rotation:     η  = 4.26 × 10-22  s-1 m-1
    From the Moon's recession:  η = 4.64 × 10-22  s-1 m-1
One can see that the three values of η are consistent within the range of error in the data. 
    The average value of  η:   η = (4.39 ± 0.22) × 10-22 s-1 m-1
By inserting the data of the Earth's rotation, the Moon’s recession and the value of n into 
Eq. 18, the result is 
14.053.1
6371000
1738000
1021.11029.7
1092509.3105.5
)64.01(
  (20) 
This is also in accord with the model used here. 
The dragging force exerted on the Moon's orbital motion by the dark matter field 
fluid is -1.11 × 108 N, this is negligibly small compared to the gravitational force between 
the Moon and the Earth ~ 1.90 × 1020 N;  and the torque exerted by the dark matter field 
fluid on the Earth’s and the Moon's rotations is T = -5.49 × 1016 Nm and -1.15 × 1012 Nm, 
respectively.  
5. The evolution of Earth-Moon system 
Sonett  et al. found that the length of the terrestrial day 900 million years ago was 
about 19.2 hours based on the laminated tidal sediments on the Earth (Sonett et al., 
1996). According to the model presented here, back in that time, the length of the day 
was about 19.2 hours, this agrees very well with Sonett et al.'s result.   
Another critical aspect of modeling the evolution of the Earth-Moon system is to  
give a reasonable estimate of the closest distance of the Moon to the Earth when the 
system was established at 4.5 billion years ago. Based on the dark matter field fluid 
model, and the above result, the closest distance of the Moon to the Earth was about 
259000 km (center to center) or 250900 km (surface to surface) at 4.5 billion years ago, 
this is far beyond the Roche's limit. In the modern astronomy textbook by Chaisson and 
McMillan (Chaisson and McMillan, 1993, p.173), the estimated distance at 4.5 billion 
years ago was 250000 km, this is probably the most reasonable number that most 
astronomers believe and it agrees excellently with the result of this model. The closest 
distance of the Moon to the Earth by Hansen’s models was about 38 Earth radii or 
242000 km (Hansen, 1982).  
According to this model, the length of day of the Earth was about 8 hours at 4.5 
billion years ago. Fig. 1 shows the evolution of the distance of Moon to the Earth and the 
length of day of the Earth with the age of the Earth-Moon system described by this model 
along with data from Kvale et al. (1999), Sonett et al. (1996) and Scrutton (1978). One 
can see that those data fit this model very well in their time range.  
Fig. 2 shows the geological data of solar days year-1 from Wells (1963) and from 
Sonett et al. (1996) and the description (solid line) by this dark matter field fluid model 
for past 900 million years. One can see that the model agrees with the geological and 
fossil data beautifully.  
The important difference of this model with early models in describing the early 
evolution of the Earth-Moon system is that this model is based only on current data of the 
Moon-Earth system and there are no assumptions about the conditions of earlier Earth 
rotation and continental drifting. Based on this model, the Earth-Moon system has been 
smoothly evolving to the current position since it was established and the recessional rate 
of the Moon has been gradually increasing, however, this description does not take it into 
account that there might be special events happened in the past to cause the suddenly 
significant changes in the motions of the Earth and the Moon, such as strong impacts by 
giant asteroids and comets, etc, because those impacts are very common in the universe. 
The general pattern of the evolution of the Moon-Earth system described by this model 
agrees with geological evidence. Based on Eq. 9, the recessional rate exponentially 
increases with time. One may then imagine that the recessional rate will quickly become 
very large.  The increase is in fact extremely slow. The Moon's recessional rate will be 
3.04 × 10-9 m s-1 after 10 billion years and 7.64 × 10-9 m s-1 after 20 billion years. 
However, whether the Moon's recession will continue or at some time later another 
mechanism will take over is not known. It should be understood that the tidal friction 
does affect the evolution of the Earth itself such as the surface crust structure, continental 
drifting and evolution of bio-system, etc; it may also play a role in slowing the Earth’s 
rotation, however, such role is not a dominant mechanism.  
Unfortunately, there is no data available for the changes of the Earth's orbital 
motion and all other members of solar system. According to this model and above results, 
the recessional rate of the Earth should be 6.86 × 10-7 m s-1 = 21.6 m year-1 = 2.16 km 
century-1, the length of a year increases about 6.8 ms and the change of the temperature is 
-1.8 × 10-8 K year-1 with constant radiation level of the Sun and the stable environment on 
the Earth. The length of a year at 1 billion years ago would be 80% of the current length 
of the year. However, much evidence (growth-bands of corals and shellfish as well as 
some other evidences) suggest that there has been no apparent change in the length of the 
year over the billion years and the Earth's orbital motion is more stable than its rotation.  
This suggests that dark matter field fluid is circulating around Sun with the same 
direction and similar speed of Earth (at least in the Earth's orbital range). Therefore, the 
Earth's orbital motion experiences very little or no dragging force from the dark matter 
field fluid. However, this is a conjecture, extensive research has to be conducted to verify 
if this is the case.  
6. Skeptical description of the evolution of the Mars 
The Moon does not have liquid fluid on its surface, even there is no air, therefore, 
there is no ocean-like tidal friction force to slow its rotation; however, the rotation of the 
Moon is still slowing at significant rate of (-1.260 ± 0.024) × 10-23 rad s-2,  which agrees 
with the model very well. Based on this, one may reasonably think that the Mars’s 
rotation should be slowing also. 
The Mars is our nearest neighbor which has attracted human’s great attention 
since ancient time. The exploration of the Mars has been heating up in recent decades. 
NASA, Russian and Europe Space Agency sent many space crafts to the Mars to collect 
data and study this mysterious planet. So far there is still not enough data about the 
history of this planet to describe its evolution. Same as the Earth, the Mars rotates about 
its central axis and revolves around the Sun, however, the Mars does not have a massive 
moon circulating it (Mars has two small satellites: Phobos and Deimos) and there is no 
liquid fluid on its surface, therefore, there is no apparent ocean-like tidal friction force to 
slow its rotation by tidal friction theories. Based on the above result and current the 
Mars's data, this model predicts that the angular acceleration of the Mars should be about 
-4.38 × 10-22 rad s-2. Figure 3 describes the possible evolution of the length of day and the 
solar days/Mars year, the vertical dash line marks the current age of the Mars with 
assumption that the Mars was formed in a similar time period of the Earth formation. 
Such description was not given before according to the author’s knowledge and is 
completely skeptical due to lack of reliable data.  However, with further expansion of the 
research and exploration on the Mars, we shall feel confident that the reliable data about 
the angular rotation acceleration of the Mars will be available in the near future which 
will provide a vital test for the prediction of this model. There are also other factors 
which may affect the Mars’s rotation rate such as mass redistribution due to season 
change, winds, possible volcano eruptions and Mars quakes. Therefore, the data has to be 
carefully analyzed. 
7.  Discussion about the model 
From the above results, one can see that the current Earth-Moon data and the 
geological and fossil data agree with the model very well and the past evolution of the 
Earth-Moon system can be described by the model without introducing any additional 
parameters; this model reveals the interesting relationship between the rotation and 
receding (Eq.  17 and Eq.  18) of the same celestial body or different celestial bodies in 
the same gravitational system, such relationship is not known before. Such success can 
not be explained by “coincidence” or “luck” because of so many data involved (current 
Earth’s and Moon’s data and geological and fossil data) if one thinks that this is just a 
“ad hoc” or a wrong model, although the chance for the natural happening of such 
“coincidence” or “luck” could be greater than wining a jackpot lottery; the future Mars’s 
data will clarify this; otherwise, a new theory from different approach can be developed 
to give the same or better description as this model does. It is certain that this model is 
not perfect and may have defects, further development may be conducted. 
James Clark Maxwell said in the 1873 “ The vast interplanetary and interstellar 
regions will no longer be regarded as waste places in the universe, which the Creator has 
not seen fit to fill with the symbols of the manifold order of His kingdom. We shall find 
them to be already full of this wonderful medium; so full, that no human power can 
remove it from the smallest portion of space, or produce the slightest flaw in its infinite 
continuity. It extends unbroken from star to star ….” The medium that Maxwell talked 
about is the aether which was proposed as the carrier of light wave propagation. The 
Michelson-Morley experiment only proved that the light wave propagation does not 
depend on such medium and did not reject the existence of the medium in the interstellar 
space. In fact, the concept of the interstellar medium has been developed dramatically 
recently such as the dark matter, dark energy, cosmic fluid, etc. The dark matter field 
fluid is just a part of such wonderful medium and “precisely” described by Maxwell. 
7. Conclusion 
The evolution of the Earth-Moon system can be described by the dark matter field 
fluid model with non-Newtonian approach and the current data of the Earth and the Moon 
fits this model very well. At 4.5 billion years ago, the closest distance of the Moon to the 
Earth could be about 259000 km, which is far beyond the Roche’s limit and the length of 
day was about 8 hours. The general pattern of the evolution of the Moon-Earth system 
described by this model agrees with geological and fossil evidence. The tidal friction may 
not be the primary cause for the evolution of the Earth-Moon system. The Mars’s rotation 
is also slowing with the angular acceleration rate about -4.38 × 10-22 rad s-2. 
References 
S. G. Brush, 1983.   L. R. Godfrey (editor), Ghost from the Nineteenth century: 
Creationist Arguments for a young Earth. Scientists confront creationism. W. W. 
Norton & Company, New York, London, pp 49. 
E. Chaisson and S. McMillan. 1993. Astronomy Today,  Prentice Hall, Englewood 
Cliffs, NJ 07632. 
J. O. Dickey,  et al., 1994. Science, 265, 482. 
D. G. Finch, 1981. Earth, Moon, and Planets, 26(1), 109. 
K. S. Hansen, 1982. Rev. Geophys. and Space Phys. 20(3), 457. 
W. K. Hartmann, D. R. Davis, 1975. Icarus, 24, 504. 
B. A. Kagan, N. B. Maslova, 1994. Earth, Moon and Planets 66, 173.  
B. A. Kagan, 1997. Prog. Oceanog. 40, 109. 
E. P. Kvale,  H. W. Johnson, C. O. Sonett, A. W. Archer, and  A. Zawistoski, 1999, J. 
Sediment. Res. 69(6), 1154. 
K. Lang, 1992. Astrophysical Data: Planets and Stars, Springer-Verlag, New York. 
H. Pan, 2005.  Internat. J. Modern Phys. A, 20(14), 3135. 
R. D. Ray, B. G. Bills, B. F. Chao, 1999. J. Geophys. Res. 104(B8), 17653. 
C. T. Scrutton, 1978.  P. Brosche, J. Sundermann, (Editors.), Tidal Friction and the 
Earth’s Rotation. Springer-Verlag, Berlin, pp. 154. 
L. B. Slichter, 1963. J. Geophys. Res.  68, 14. 
C. P. Sonett, E. P. Kvale, M. A. Chan, T. M. Demko,  1996. Science, 273, 100. 
F. D. Stacey, 1977.  Physics of the Earth, second edition. John Willey & Sons. 
J. W. Wells,  1963. Nature, 197, 948. 
Caption 
Figure 1, the evolution of Moon’s distance and the length of day of the earth with 
the age of the Earth-Moon system. Solid lines are calculated according to the dark matter 
field fluid model. Data sources: the Moon distances are from Kvale and et al. and for the 
length of day: (a and b) are from Scrutton ( page 186, fig. 8), c is from Sonett and et al.  
The dash line marks the current age of the Earth-Moon system.  
Figure 2, the evolution of Solar days of year with the age of the Earth-Moon 
system. The solid line is calculated according to dark matter field fluid model. The data 
are from Wells (3.9 ~ 4.435 billion years range), Sonett (3.6 billion years) and current 
age (4.5 billion years). 
Figure 3, the skeptical description of the evolution of Mars’s length of day and the 
solar days/Mars year with the age of the Mars (assuming that the Mars’s age is about 4.5 
billion years). The vertical dash line marks the current age of Mars. 
Figure 1, Moon's distance and the length of day of Earth 
change with the age of Earth-Moon system
The age of Earth-Moon system (109 years)
0 1 2 3 4 5
Distance
Length of day
Roche's limit
Hansen's result
Figure 2,  the solar days / year vs. the age of the Earth
The age of the Earth (109 years)
3.5 3.6 3.7 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6
ABSTRACT
  The evolution of Earth-Moon system is described by the dark matter field
fluid model proposed in the Meeting of Division of Particle and Field 2004,
American Physical Society. The current behavior of the Earth-Moon system agrees
with this model very well and the general pattern of the evolution of the
Moon-Earth system described by this model agrees with geological and fossil
evidence. The closest distance of the Moon to Earth was about 259000 km at 4.5
billion years ago, which is far beyond the Roche's limit. The result suggests
that the tidal friction may not be the primary cause for the evolution of the
Earth-Moon system. The average dark matter field fluid constant derived from
Earth-Moon system data is 4.39 x 10^(-22) s^(-1)m^(-1). This model predicts
that the Mars's rotation is also slowing with the angular acceleration rate
about -4.38 x 10^(-22) rad s^(-2).

<|endoftext|><|startoftext|>
Introduction The chief purpose of this paper is to show bijectively that
a determinant of Stirling cycle numbers counts unlabeled acyclic single-source automata.
Specifically, let Ak(n) denote the kn × kn matrix with (i, j) entry
[ ⌊ i−1
⌊ i−1
⌋+1+i−j
, where
is the Stirling cycle number, the number of permutations on [i] with j cycles. For
example,
A2(5) =
1 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0
0 1 3 2 0 0 0 0 0 0
0 0 1 3 2 0 0 0 0 0
0 0 0 1 6 11 6 0 0 0
0 0 0 0 1 6 11 6 0 0
0 0 0 0 0 1 10 35 50 24
0 0 0 0 0 0 1 10 35 50
0 0 0 0 0 0 0 1 15 85
0 0 0 0 0 0 0 0 1 15
http://arxiv.org/abs/0704.0004v1
As evident in the example, Ak(n) is formed from k copies of each of rows 2 through n+1
of the Stirling cycle triangle, arranged so that the first nonzero entry in each row is a 1
and, after the first row, this 1 occurs just before the main diagonal; in other words, Ak(n)
is a Hessenberg matrix with 1s on the infra-diagonal. We will show
Main Theorem. The determinant of Ak(n) is the number of unlabeled acyclic single-
source automata with n transient states on a (k + 1)-letter input alphabet.
Section 2 reviews basic terminology for automata and recurrence relations to count
finite acyclic automata. Section 3 introduces column-marked subdiagonal paths, which
play an intermediate role, and a way to code them. Section 4 presents a bijection from
these column-marked subdiagonal paths to unlabeled acyclic single-source automata. Fi-
nally, Section 5 evaluates detAk(n) using a sign-reversing involution and shows that the
determinant counts the codes for column-marked subdiagonal paths.
2 Automata
A (complete, deterministic) automaton consists of a set of states and an input alphabet
whose letters transform the states among themselves: a letter and a state produce another
state (possibly the same one). A finite automaton (finite set of states, finite input alphabet
of, say, k letters) can be represented as a k-regular directed multigraph with ordered edges:
the vertices represent the states and the first, second, . . . edge from a vertex give the effect
of the first, second, . . . alphabet letter on that state. A finite automaton cannot be acyclic
in the usual sense of no cycles: pick a vertex and follow any path from it. This path must
ultimately hit a previously encountered vertex, thereby creating a cycle. So the term
acyclic is used in the looser sense that only one vertex, called the sink, is involved in
cycles. This means that all edges from the sink loop back to itself (and may safely be
omitted) and all other paths feed into the sink.
A non-sink state is called transient. The size of an acyclic automaton is the number of
transient states. An acyclic automaton of size n thus has transient states which we label
1, 2, . . . , n and a sink, labeled n + 1. Liskovets [1] uses the inclusion-exclusion principle
(more about this below) to obtain the following recurrence relation for the number ak(n)
of acyclic automata of size n on a k-letter input alphabet (k ≥ 1):
ak(0) = 1; ak(n) =
(−1)n−j−1
(j + 1)k(n−j)ak(j), n ≥ 1.
A source is a vertex with no incoming edges. A finite acyclic automaton has at least
one source because a path traversed backward v1 ← v2 ← v3 ← . . . must have distinct
vertices and so cannot continue indefinitely. An automaton is single-source (or initially
connected) if it has only one source. Let Bk(n) denote the set of single-source acyclic
finite (SAF) automata on a k-letter input alphabet with vertices 1, 2, . . . , n + 1 where 1
is the source and n + 1 is the sink, and set bk(n) = | Bk(n) |. The two-line representation
of an automaton in Bk(n) is the 2× kn matrix whose columns list the edges in order. For
example,
1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
2 4 6 6 6 6 6 6 6 3 5 3 2 2 6
is in B3(5) and the source-to-sink paths in B include 1
→ 6, 1
→ 6, 1
→ 6, where the alphabet is {a, b, c}.
Proposition 1. The number bk(n) of SAF automata of size n on a k-letter input alphabet
(n, k ≥ 1) is given by
bk(n) =
(−1)n−i
(i+ 1)k(n−i)ak(i)
Remark This formula is a bit more succinct than the the recurrence in [1, Theorem
3.2].
Proof Consider the setA of acyclic automata with transient vertices [n] = {1, 2, . . . , n}
in which 1 is a source. Call 2, 3, . . . , n the interior vertices. For X ⊆ [2, n], let
f(X) = # automata in A whose set of interior vertices includes X,
g(X) = # automata in A whose set of interior vertices is precisely X.
Then f(X) =
Y :X⊆Y⊆[2,n] g(Y ) and by Möbius inversion [2] on the lattice of subsets of
[2, n], g(X) =
Y :X⊆Y⊆[2,n] µ(X, Y )f(Y ) where µ(X, Y ) is the Möbius function for this
lattice. Since µ(X, Y ) = (−1)|Y |−|X| if X ⊆ Y , we have in particular that
g(∅) =
Y⊆[2,n]
(−1)| Y |f(Y ). (1)
Let | Y | = n − i so that 1 ≤ i ≤ n. When Y consists entirely of sources, the vertices
in [n+ 1]\Y and their incident edges form a subautomaton with i transient states; there
are ak(i) such. Also, all edges from the n − i vertices comprising Y go directly into
[n + 1]\Y : (i + 1)k(n−i) choices. Thus f(Y ) = (i + 1)k(n−i)ak(i). By definition, g(∅) is
the number of automata in A for which 1 is the only source, that is, g(∅) = bk(n) and the
Proposition now follows from (1).
An unlabeled SAF automaton is an equivalence class of SAF automata under relabeling
of the interior vertices. Liskovets notes [1] (and we prove below) that Bk(n) has no
nontrivial automorphisms, that is, each of the (n− 1)! relabelings of the interior vertices
of B ∈ Bk(n) produces a different automaton. So unlabeled SAF automata of size n on
a k-letter alphabet are counted by 1
(n−1)!
bk(n). The next result establishes a canonical
representative in each relabeling class.
Proposition 2. Each equivalence class in Bk(n) under relabeling of interior vertices has
size (n− 1)! and contains exactly one SAF automaton with the “last occurrences increas-
ing” property: the last occurrences of the interior vertices—2, 3, . . . , n—in the bottom row
of its two-line representation occur in that order.
Proof The first assertion follows from the fact that the interior vertices of an au-
tomatonB ∈ bk(n) can be distinguished intrinsically, that is, independent of their labeling.
To see this, first mark the source, namely 1, with a mark (new label) v1 and observe that
there exists at least one interior vertex whose only incoming edge(s) are from the source
(the only currently marked vertex) for otherwise a cycle would be present. For each such
interior vertex v, choose the last edge from the marked vertex to v using the built-in
ordering of these edges. This determines an order on these vertices; mark them in order
v2, v3, . . . , vj (j ≥ 2). If there still remain unmarked interior vertices, at least one of them
has incoming edges only from a marked vertex or again a cycle would be present. For
each such vertex, use the last incoming edge from a marked vertex, where now edges are
arranged in order of initial vertex vi with the built-in order breaking ties, to order and
mark these vertices vj+1, vj+2, . . .. Proceed similarly until all interior vertices are marked.
For example, for
1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
2 4 6 6 6 6 6 6 6 3 5 3 2 2 6
v1 = 1 and there is just one interior vertex, namely 4, whose only incoming edge is from
the source, and so v2 = 4 and 4 becomes a marked vertex. Now all incoming edges to
both 3 and 5 are from marked vertices and the last such edges (built-in order comes into
play) are 4
→ 5 and 4
→ 3 putting vertices 3, 5 in the order 5, 3. So v3 = 5 and v4 = 3.
Finally, v5 = 2. This proves the first assertion. By construction of the vs, relabeling each
interior vertex i with the subscript of its corresponding v produces an automaton in Bk(n)
with the “last occurrences increasing” property and is the only relabeling that does so.
The example yields
1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
5 2 6 4 3 4 5 5 6 6 6 6 6 6 6
Now let Ck(n) denote the set of canonical SAF automata in Bk(n) representing un-
labeled automata; thus | Ck(n) | =
(n−1)!
bk(n). Henceforth, we identify an unlabeled au-
tomaton with its canonical representative.
3 Column-Marked Subdiagonal Paths
A subdiagonal (k, n, p)-path is a lattice path of steps E = (1, 0) and N = (0, 1), E for
east and N for north, from (0, 0) to (kn, p) that never rise above the line y = 1
x. Let
Ck(n, p) denote the set of such paths.For k ≥ 1, it is clear that Ck(n, p) is nonempty only
for 0 ≤ p ≤ n and it is known (generalized ballot theorem) that
|Ck(n, p) | =
kn− kp+ 1
kn+ p+ 1
kn+ p + 1
A path P in Ck(n, n) can be coded by the heights of its E steps above the line y = −1;
this gives a a sequence (bi)
i=1 subject to the restrictions 1 ≤ b1 ≤ b2 ≤ . . . ≤ bkn and
bi ≤ ⌈i/k⌉ for all i.
A column-marked subdiagonal (k, n, p)-path is one in which, for each i ∈ [1, kn], one of
the lattice squares below the ith E step and above the horizontal line y = −1 is marked,
say with a ‘ ∗ ’. Let C
k(n, p) denote the set of such marked paths.
b b b
b b b b
b b b b
∗ ∗ ∗
(0,0)
(8,4)
y = −1
y = 1
A path in C
2(4, 3)
A marked path P ∗ in C
k(n, n) can be coded by a sequence of pairs
(ai, bi)
where
i=1 is the code for the underlying path P and ai ∈ [1, bi] gives the position of the ∗ in the
ith column. The example is coded by (1, 1), (1, 1), (1, 2), (2, 2), (1, 2), (3, 3), (1, 3), (2, 3).
An explicit sum for |C
k(n, n) | is
k(n, n) | =
1≤b1≤b2≤...≤bkn,
bi ≤ ⌈i/k⌉ for all i
b1b2 . . . bkn,
because the summand b1b2 . . . bkn is the number of ways to insert the ‘ ∗ ’s in the underlying
path coded by (bi)
It is also possible to obtain a recurrence for |C
k(n, p) |, and then, using Prop. 1, to
show analytically that |C
k(n, n) | = | Ck+1(n) |. However, it is much more pleasant to
give a bijection and in the next section we will do so. In particular, the number of SAF
automata on a 2-letter alphabet is
| C2(n) | = |C
1(n, n) | =
1≤b1≤b2≤...≤bn
bi ≤ i for all i
b1b2 . . . bn = (1, 3, 16, 127, 1363, . . .)n≥1,
sequence A082161 in [3].
4 Bijection from Paths to Automata
In this section we exhibit a bijection from C
k(n, n) to Ck+1(n). Using the illustrated
path as a working example with k = 2 and n = 4,
b b b
b b b b
b b b b
∗ ∗ ∗
(0,0)
(8,4)
y = −1
y = 1
first construct the top row of a two-line representation consisting of k + 1 each 1s, 2s,
. . . ,n s and number them left to right:
The last step in the path is necessarily anN step. For the second last, third last,. . .N steps
in the path, count the number of steps following it. This gives a sequence i1, i2, . . . , in−1
satisfying 1 ≤ i1 < i2 < . . . < in−1 and ij ≤ (k + 1)j for all j. Circle the positions
i1, i2, . . . , in−1 in the two-line representation and then insert (in boldface) 2, 3, . . . , n in
the second row in the circled positions:
2 3 4
These will be the last occurrences of 2, 3, . . . , n in the second row. Working from the last
column in the path back to the first, fill in the blanks in the second row left to right as
follows. Count the number of squares from the ∗ up to the path (including the ∗ square)
http://www.research.att.com:80/cgi-bin/access.cgi/as/njas/sequences/eisA.cgi?Anum=A082161
and add this number to the nearest boldface number to the left of the current blank entry
(if there are no boldface numbers to the left, add this number to 1) and insert the result
in the current blank square. In the example the numbers of squares are 2,3,1,2,1,2,1,1
yielding
2 4 5 3 3 5 4 5 4 5 5
This will fill all blank entries except the last. Note that ∗ s in the bottom row correspond
to sink (that is, n+1) labels in the second row. Finally, insert n+1 into the last remaining
blank space to give the image automaton:
1 1 1 2 2 2 3 3 3 4 4 4
2 4 5 3 3 5 4 5 4 5 5 5
This process is fully reversible and the map is a bijection.
5 Evaluation of detAk(n)
For simplicity, we treat the case k = 1, leaving the generalization to arbitrary k
as a not-too-difficult exercise for the interested reader. Write A(n) for A1(n). Thus
A(n) =
1≤i,j≤n
. From the definition of detA(n) as a sum of signed products, we
show that detA(n) is the total weight of certain lists of permutations, each list carrying
weight ±1. Then a weight-reversing involution cancels all −1 weights and reduces the
problem to counting the surviving lists. These surviving lists are essentially the codes for
paths in C
1(n, p), and the Main Theorem follows from §4.
To describe the permutations giving a nonzero contribution to detA(n) =
σ sgn σ×
i=1 ai,σ(i), define the code of a permutation σ on [n] to be the list c = (ci)
i=1 with
ci = σ(i)−(i−1). Since the (i, j) entry of A(n),
, is 0 unless j ≥ i−1, we must have
σ(i) ≥ i−1 for all i. It is well known that there are 2n−1 such permutations, corresponding
to compositions of n, with codes characterized by the following four conditions: (i) ci ≥ 0
for all i, (ii) c1 ≥ 1, (iii) each ci ≥ 1 is immediately followed by ci − 1 zeros in the list,
i=1 ci = n. Let us call such a list a padded composition of n: deleting the zeros
is a bijection to ordinary compositions of n. For example, (3, 0, 0, 1, 2, 0) is a padded
composition of 6. For a permutation σ with padded composition code c, the nonzero
entries in c give the cycle lengths of σ. Hence sgnσ, which is the parity of “n−#cycles
in σ”, is given by (−1)#0s in c.
We have detA(n) =
σ sgn σ
i=1 ai,σ(i) =
σ sgn σ
2i−σ(i)
, and so
detA(n) =
(−1)#0s in c
i+ 1− ci
where the sum is restricted to padded compositions c of n with ci ≤ i for all i (A002083)
because
i+1−ci
= 0 unless ci ≤ i.
Henceforth, let us write all permutations in standard cycle form whereby the smallest
entry occurs first in each cycle and these smallest entries increase left to right. Thus,
with dashes separating cycles, 154-2-36 is the standard cycle form of the permutation
( 1 2 3 4 5 65 2 6 1 4 3 ). We define a nonfirst entry to be one that does not start a cycle. Thus the
preceding permutation has 3 nonfirst entries: 5,4,6. Note that the number of nonfirst
entries is 0 only for the identity permutation. We denote an identity permutation (of any
size) by ǫ.
By definition of Stirling cycle number, the product in (2) counts lists (πi)
i=1 of permu-
tations where πi is a permutation on [i+1] with i+1− ci cycles, equivalently, with ci ≤ i
nonfirst entries. So define Ln to be the set all lists of permutations π = (πi)
i=1 where πi
is a permutation on [i + 1], #nonfirst entries in πi is ≤ i, π1 is the transposition (1,2),
each nonidentity permutation πi is immediately followed by ci − 1 ǫ’s where ci ≥ 1 is the
number of nonfirst entries in πi (so the total number of nonfirst entries is n). Assign a
weight to π ∈ Ln by wt(π) = (−1)
# ǫ’s in π. Then
detA(n) =
wt(π).
We now define a weight-reversing involution on (most of) Ln. Given π ∈ Ln, scan the
list of its component permutations π1 = (1, 2), π2, π3, . . . left to right. Stop at the first
one that either (i) has more than one nonfirst entry, or (ii) has only one nonfirst entry, b
say, and b > maximum nonfirst entry m of the next permutation in the list. Say πk is the
permutation where we stop.
http://www.research.att.com:80/cgi-bin/access.cgi/as/njas/sequences/eisA.cgi?Anum=A002083
In case (i) decrement (i.e. decrease by 1) the number of ǫ’s in the list by splitting πk
into two nonidentity permutations as follows. Let m be the largest nonfirst entry of πk
and let ℓ be its predecessor. Replace πk and its successor in the list (necessarily an ǫ) by
the following two permutations: first the transposition (ℓ,m) and second the permutation
obtained from πk by erasing m from its cycle and turning it into a singleton. Here are
two examples of this case (recall permutations are in standard cycle form and, for clarity,
singleton cycles are not shown).
i 1 2 3 4 5 6
πi 12 13 23 14-253 ǫ ǫ
i 1 2 3 4 5 6
πi 12 13 23 25 14-23 ǫ
i 1 2 3 4 5 6
πi 12 23 14 13-24 ǫ 23
i 1 2 3 4 5 6
πi 12 23 14 24 13 23
The reader may readily check that this sends case (i) to case (ii).
In case (ii), πk is a transposition (a, b) with b > maximum nonfirst entry m of πk+1. In
this case, increment the number of ǫ’s in the list by combining πk and πk+1 into a single
permutation followed by an ǫ: in πk+1, b is a singleton; delete this singleton and insert b
immediately after a in πk+1 (in the same cycle). The reader may check that this reverses
the result in the two examples above and, in general, sends case (ii) to case (i). Since the
map alters the number of ǫ’s in the list by 1, it is clearly weight-reversing. The map fails
only for lists that both consist entirely of transpositions and have the form
(a1, b1), (a2, b2), . . . , (an, bn) with b1 ≤ b2 ≤ . . . ≤ bn.
Such lists have weight 1. Hence detA(n) is the number of lists
(ai, bi)
satisfying
1 ≤ ai < bi ≤ i+ 1 for 1 ≤ i ≤ n, and b1 ≤ b2 ≤ . . . ≤ bn. After subtracting 1 from each
bi, these lists code the paths in C
1(n, n) and, using §4, detA(n) = |C
1(n, n) | = | C2(n) |.
References
[1] Valery A. Liskovets, Exact enumeration of acyclic deterministic au-
tomata, Disc. Appl. Math., in press, 2006. Earlier version available at
http://www.i3s.unice.fr/fpsac/FPSAC03/articles.html
http://www.i3s.unice.fr/fpsac/FPSAC03/articles.html
[2] J. H. van Lint and R. M. Wilson, A Course in Combinatorics, 2nd ed., Cambridge
University Press, NY, 2001.
[3] Neil J. Sloane (founder and maintainer), The On-Line Encyclopedia of Integer Se-
quences http://www.research.att.com:80/ njas/sequences/index.html?blank=1
http://www.research.att.com:80/~njas/sequences/index.html?blank=1
ABSTRACT
  We show that a determinant of Stirling cycle numbers counts unlabeled acyclic
single-source automata. The proof involves a bijection from these automata to
certain marked lattice paths and a sign-reversing involution to evaluate the
determinant.

<|endoftext|><|startoftext|>
FROM DYADIC Λα TO Λα
WAEL ABU-SHAMMALA AND ALBERTO TORCHINSKY
Abstract. In this paper we show how to compute the Λα norm , α ≥ 0,
using the dyadic grid. This result is a consequence of the description of
the Hardy spaces Hp(RN ) in terms of dyadic and special atoms.
Recently, several novel methods for computing the BMO norm of a function
f in two dimensions were discussed in [9]. Given its importance, it is also of
interest to explore the possibility of computing the norm of a BMO function,
or more generally a function in the Lipschitz class Λα, using the dyadic grid
in RN . It turns out that the BMO question is closely related to that of
approximating functions in the Hardy space H1(RN ) by the Haar system.
The approximation in H1(RN ) by affine systems was proved in [2], but this
result does not apply to the Haar system. Now, if HA(R) denotes the closure
of the Haar system in H1(R), it is not hard to see that the distance d(f,HA)
of f ∈ H1(R) to HA is ∼
f(x) dx
∣, see [1]. Thus, neither dyadic atoms
suffice to describe the Hardy spaces, nor the evaluation of the norm in BMO
can be reduced to a straightforward computation using the dyadic intervals.
In this paper we address both of these issues. First, we give a characterization
of the Hardy spaces Hp(RN ) in terms of dyadic and special atoms, and then,
by a duality argument, we show how to compute the norm in Λα(R
N ), α ≥ 0,
using the dyadic grid.
We begin by introducing some notations. Let J denote a family of cubes
Q in RN , and Pd the collection of polynomials in R
N of degree less than or
equal to d. Given α ≥ 0, Q ∈ J , and a locally integrable function g, let pQ(g)
denote the unique polynomial in P[α] such that [g − pQ(g)]χQ has vanishing
moments up to order [α].
For a locally square-integrable function g, we consider the maximal function
α,J g(x) given by
α,J g(x) = sup
x∈Q,Q∈J
|Q|α/N
|g(y)− pQ(g)(y)|
1991 Mathematics Subject Classification. 42B30,42B35.
http://arxiv.org/abs/0704.0005v1
2 WAEL ABU-SHAMMALA AND ALBERTO TORCHINSKY
The Lipschitz space Λα,J consists of those functions g such that M
α,J g is
in L∞, ‖g‖Λα,J = ‖M
α,J g‖∞; when the family in question contains all cubes
in RN , we simply omit the subscript J . Of course, Λ0 = BMO.
Two other families, of dyadic nature, are of interest to us. Intervals in R of
the form In,k = [ (k−1)2
n, k2n], where k and n are arbitrary integers, positive,
negative or 0, are said to be dyadic. In RN , cubes which are the product of
dyadic intervals of the same length, i.e., of the form Qn,k = In,k1 ×· · ·×In,kN ,
are called dyadic, and the collection of all such cubes is denoted D.
There is also the family D0. Let I
n,k = [(k− 1)2
n, (k+ 1)2n], where k and
n are arbitrary integers. Clearly I ′n,k is dyadic if k is odd, but not if k is even.
Now, the collection {I ′n,k : n, k integers} contains all dyadic intervals as well
as the shifts [(k − 1)2n + 2n−1, k 2n + 2n−1] of the dyadic intervals by their
half length. In RN , put D0 = {Q
n,k : Q
n,k = I
× · · · × I ′n,kN }; Q
n,k is
called a special cube. Note that D0 contains D properly.
Finally, given I ′n,k, let I
n,k = [(k − 1)2
n, k2n], and I
n,k = [k2
n, (k + 1)2n].
The 2N subcubes of Q′n,k = I
× · · · × I ′n,kN of the form I
× · · · × I
Sj = L or R, 1 ≤ j ≤ N , are called the dyadic subcubes of Q
Let Q0 denote the special cube [−1, 1]
N . Given α ≥ 0, we construct a
family Sα of piecewise polynomial splines in L
2(Q0) that will be useful in
characterizing Λα. Let A be the subspace of L
2(Q0) consisting of all functions
with vanishing moments up to order [α] which coincide with a polynomial
in P[α] on each of the 2
N dyadic subcubes of Q0. A is a finite dimensional
subspace of L2(Q0), and, therefore, by the Graham-Schmidt orthogonalization
process, say, A has an orthonormal basis in L2(Q0) consisting of functions
p1, . . . , pM with vanishing moments up to order [α], which coincide with a
polynomial in P[α] on each dyadic subinterval of Q0. Together with each p
we also consider all dyadic dilations and integer translations given by
pLn,k,α(x) = 2
n(N+α)pL(2nx1 + k1, . . . , 2
nxN + kN ) , 1 ≤ L ≤ M ,
and let
Sα = {p
n,k,α : n, k integers, 1 ≤ L ≤ M} .
Our first result shows how the dyadic grid can be used to compute the
norm in Λα.
Theorem A. Let g be a locally square-integrable function and α ≥ 0. Then,
g ∈ Λα if, and only if, g ∈ Λα,D and Aα(g) = supp∈Sα
∣〈g, p〉
∣ < ∞. Moreover,
‖g‖Λα ∼ ‖g‖Λα,D +Aα(g) .
Furthermore, it is also true, and the proof is given in Proposition 2.1 be-
low, that ‖g‖Λα ∼ ‖g‖Λα,D0 . However, in this simpler formulation, the tree
structure of the cubes in D has been lost.
FROM DYADIC Λα TO Λα 3
The proof of Theorem A relies on a close investigation of the predual of
Λα, namely, the Hardy space H
p(RN ) with 0 < p = (α + N)/N ≤ 1. In the
process we characterize Hp in terms of simpler subspaces: H
, or dyadic Hp,
and H
, the space generated by the special atoms in Sα. Specifically, we
Theorem B. Let 0 < p ≤ 1, and α = N(1/p− 1). We then have
Hp = H
where the sum is understood in the sense of quasinormed Banach spaces.
The paper is organized as follows. In Section 1 we show that individual
Hp atoms can be written as a superposition of dyadic and special atoms;
this fact may be thought of as an extension of the one-dimensional result of
Fridli concerning L∞ 1- atoms, see [5] and [1]. Then, we prove Theorem B.
In Section 2 we discuss how to pass from Λα,D, and Λα,D0 , to the Lipschitz
space Λα.
1. Characterization of the Hardy spaces Hp
We adopt the atomic definition of the Hardy spaces Hp, 0 < p ≤ 1, see
[6] and [10]. Recall that a compactly supported function a with [N(1/p− 1)]
vanishing moments is an L2 p -atom with defining cube Q if supp(a) ⊆ Q, and
|Q|1/p
| a(x) |2dx
≤ 1 .
The Hardy space Hp(RN ) = Hp consists of those distributions f that can be
written as f =
λjaj , where the aj ’s are H
p atoms,
|λj |
p < ∞, and the
convergence is in the sense of distributions as well as in Hp. Furthermore,
‖f‖Hp ∼ inf
|λj |
where the infimum is taken over all possible atomic decompositions of f . This
last expression has traditionally been called the atomic Hp norm of f .
Collections of atoms with special properties can be used to gain a better
understanding of the Hardy spaces. Formally, let A be a non-empty subset
of L2 p -atoms in the unit ball of Hp. The atomic space H
spanned by A
consists of those ϕ in Hp of the form
λjaj , aj ∈ A ,
|λj |
p < ∞ .
It is readily seen that, endowed with the atomic norm
‖ϕ‖Hp
= inf
|λj |
: ϕ =
λj aj , aj ∈ A
becomes a complete quasinormed space. Clearly, H
⊆ Hp, and, for
f ∈ H
, ‖f‖Hp ≤ ‖f‖Hp
4 WAEL ABU-SHAMMALA AND ALBERTO TORCHINSKY
Two families are of particular interest to us. When A is the collection
of all L2 p -atoms whose defining cube is dyadic, the resulting space is H
or dyadic Hp. Now, although ‖f‖Hp ≤ ‖f‖Hp
, the two quasinorms are not
equivalent on H
. Indeed, for p = 1 and N = 1, the functions
fn(x) = 2
n[χ[1−2−n,1](x) − χ[1,1+2−n](x)] ,
satisfy ‖fn‖H1 = 1, but ‖fn‖H1
∼ |n| tends to infinity with n.
Next, when Sα is the family of piecewise polynomial splines constructed
above with α = N(1/p − 1), in analogy with the one-dimensional results in
[4] and [1], H
is referred to as the space generated by special atoms.
We are now ready to describe Hp atoms as a superposition of dyadic and
special atoms.
Lemma 1.1. Let a be an L2 p -atom with defining cube Q, 0 < p ≤ 1,
and α = N(1/p − 1). Then a can be written as a linear combination of 2N
dyadic atoms ai, each supported in one of the dyadic subcubes of the smallest
special cube Qn,k containing Q, and a special atom b in Sα. More precisely,
a(x) =
i=1 di ai(x) +
L=1 cL p
−n,−k,α(x), with |di| , |cL| ≤ c.
Proof. Suppose first that the defining cube of a is Q0, and let Q1, . . . , Q2N
denote the dyadic subcubes of Q0. Furthermore, let {e
i , . . . , e
i } denote an
orthonormal basis of the subspace Ai of L
2(Qi) consisting of polynomials in
P[α], 1 ≤ i ≤ 2
N . Put
αi(x) = a(x)χQi (x)−
〈aχQi , e
j(x) , 1 ≤ i ≤ 2
and observe that 〈αi, e
j〉 = 0 for 1 ≤ j ≤ M . Therefore, αi has [α] vanishing
moments, is supported in Qi, and
‖αi‖2 ≤ ‖aχQi‖2 +
‖aχQi‖2 ≤ (M + 1) ‖aχQi‖2 .
ai(x) =
2N(1/2−1/p)
M + 1
αi(x) , 1 ≤ i ≤ N ,
is an L2 p - dyadic atom. Finally, put
b(x) = a(x) −
M + 1
2N(1/2−1/p)
ai(x) .
FROM DYADIC Λα TO Λα 5
Clearly b has [α] vanishing moments, is supported in Q0, coincides with a
polynomial in P[α] on each dyadic subcube of Q0, and
‖b‖22 ≤
|〈aχQi , e
2 ≤ M ‖a‖22 .
So, b ∈ A, and, consequently, b(x) =
L=1 cL p
L(x), where
|cL| = |〈b, p
L〉| ≤ c , 1 ≤ L ≤ M .
In the general case, let Q be the defining cube of a, side-length Q = ℓ, and
let n and k = (k1, . . . , kN ) be chosen so that 2
n−1 ≤ ℓ < 2n, and
Q ⊂ [(k1 − 1)2
n, (k1 + 1)2
n]× · · · × [(kN − 1)2
n, (kN + 1)2
Then, (1/2)N ≤ |Q|/2nN < 1.
Now, given x ∈ Q0, let a
′ be the translation and dilation of a given by
a′(x) = 2nN/pa(2nx1 − k1, . . . , 2
nxN − kN ) .
Clearly, [α] moments of a′ vanish, and
‖a′‖2 = 2
nN/p 2−nN/2‖a‖2 ≤ c |Q|
1/p|Q|−1/2‖a‖2 ≤ c .
Thus, a′ is a multiple of an atom with defining cube Q0. By the first part of
the proof,
a′(x) =
i(x) +
L(x) , x ∈ Q0 .
The support of each a′i is contained in one of the dyadic subcubes of Q0, and,
consequently, there is a k such that
ai(x) = 2
−nN/pa′i(2
−nx1 − k1, . . . , 2
−nxN − kN )
ai is an L
2p -atom supported in one of the dyadic subcubes of Q. Similarly
for the pL’s. Thus,
a(x) =
di ai(x) +
−n,−k,N(1/p−1)(x) ,
and we have finished. �
Theorem B follows readily from Lemma 1.1. Clearly, H
→֒ Hp.
Conversely, let f =
j λj aj be in H
p. By Lemma 1.1 each aj can be written
as a sum of dyadic and special atoms, and, by distributing the sum, we can
write f = fd + fs, with fd in H
, fs in H
, and
‖fd‖Hp
, ‖fs‖Hp
|λj |
Taking the infimum over the decompositions of f we get ‖f‖Hp
c ‖f‖Hp , and H
p →֒ H
. This completes the proof.
6 WAEL ABU-SHAMMALA AND ALBERTO TORCHINSKY
The meaning of this decomposition is the following. Cubes in D are con-
tained in one of the 2N non-overlapping quadrants of RN . To allow for the
information carried by a dyadic cube to be transmitted to an adjacent dyadic
cube, they must be connected. The pLn,k,α’s channel information across ad-
jacent dyadic cubes which would otherwise remain disconnected. The reader
will have no difficulty in proving the quantitative version of this observation:
Let T be a linear mapping defined on Hp, 0 < p ≤ 1, that assumes values in
a quasinormed Banach space X . Then, T is continuous if, and only if, the
restrictions of T to H
and H
are continuous.
2. Characterizations of Λα
Theorem A describes how to pass from Λα,D to Λα, and we prove it next.
Since (Hp)∗ = Λα and (H
)∗ = Λα,D, from Theorem B it follows readily that
Λα = Λα,D ∩ (H
)∗, so it only remains to show that (H
)∗ is characterized
by the condition Aα(g) < ∞.
First note that if g is a locally square-integrable function with Aα(g) < ∞
and f =
j,L cj,L p
nj ,kj ,α
, since 0 < p ≤ 1,
|〈g, f〉| ≤
|cj,L| |〈g, p
nj ,kj ,α
≤ Aα(g)
|cj,L|
and, consequently, taking the infimum over all atomic decompositions of f in
, we get g ∈ (H
)∗ and ‖g‖(Hp
)∗ ≤ Aα(g).
To prove the converse we proceed as in [3]. Let Qn = [−2
n, 2n]N . We begin
by observing that functions f in L2(Qn) that have vanishing moments up to
order [α] and coincide with polynomials of degree [α] on the dyadic subcubes
of Qn belong to H
‖f‖Hp
≤ |Qn|
1/p−1/2‖f‖2 .
Given ℓ ∈ (H
)∗, for a fixed n let us consider the restriction of ℓ to the space
of L2 functions f with [α] vanishing moments that are supported in Qn. Since
|ℓ(f)| ≤ ‖ℓ‖ ‖f‖Hp
≤ ‖ℓ‖ |Qn|
1/p−1/2‖f‖2 ,
this restriction is continuous with respect to the norm in L2 and, consequently,
it can be extended to a continuous linear functional in L2 and represented as
ℓ(f) =
f(x) gn(x) dx ,
FROM DYADIC Λα TO Λα 7
where gn ∈ L
2(Qn) and satisfies ‖gn‖2 ≤ ‖ℓ‖ |Qn|
1/p−1/2. Clearly, gn is
uniquely determined in Qn up to a polynomial pn in P[α]. Therefore,
gn(x) − pn(x) = gm(x)− pm(x) , a.e. x ∈ Qmin(n,m) .
Consequently, if
g(x) = gn(x)− pn(x) , x ∈ Qn ,
g(x) is well defined a.e. and, if f ∈ L2 has [α] vanishing moments and is
supported in Qn, we have
ℓ(f) =
f(x) gn(x) dx
f(x) [gn(x)− pn(x)] dx
f(x) g(x) dx .
Moreover, since each 2nN/ppL(2n ·+k) is an L2 p-atom, 1 ≤ L ≤ M , it readily
follows that
Aα(g) = sup
1≤L≤M
n,k∈Z
|〈g, 2−n/ppL(2n ·+k)〉|
≤ ‖ℓ‖ sup
‖pL‖Hp ≤ ‖ℓ‖ ,
and, consequently, Aα(g) ≤ ‖ℓ‖ , and (H
)∗ is the desired space. �
The reader will have no difficulty in showing that this result implies the
following: Let T be a bounded linear operator from a quasinormed space X
into Λα,D. Then, T is bounded from X into Λα if, and only if, Aα(Tx) ≤
c ‖x‖X for every x ∈ X .
The process of averaging the translates of dyadic BMO functions leads to
BMO, and is an important tool in obtaining results in BMO once they are
known to be true in its dyadic counterpart, BMOd, see [7]. It is also known
that BMO can be obtained as the intersection of BMOd and one of its shifted
counterparts, see [8]. These results motivate our next proposition, which
essentially says that g ∈ Λα if, and only if, g ∈ Λα,D and g is in the Lipschitz
class obtained from the shifted dyadic grid. Note that the shifts involved in
this class are in all directions parallel to the coordinate axis and depend on
the side-length of the cube.
Proposition 2.1. Λα = Λα,D0 , and ‖g‖Λα ∼ ‖g‖Λα,D0 .
Proof. It is obvious that ‖g‖Λα,D0 ≤ ‖g‖Λα . To show the other inequality we
invoke Theorem A. Since D ⊂ D0, it suffices to estimate Aα(g), or, equiva-
lently, |〈g, p〉| for p ∈ Sα, α = N(1/p − 1). So, pick p = p
n,k,α in Sα. The
defining cube Q of pLn,k,α is in D0, and, since p
n,k,α has [α] vanishing moments,
8 WAEL ABU-SHAMMALA AND ALBERTO TORCHINSKY
〈pLn,k,α, pQ(g)〉 = 0. Therefore,
|〈g, pLn,k,α〉| = |〈g − pQ(g), p
n,k,α〉|
≤ ‖pLn,k,α‖2 ‖g − pQ(g)‖L2(Q)
≤ |Q|α/N |Q|1/2‖pLn,k,α‖2 ‖g‖Λα,D0 .
Now, a simple change of variables gives |Q|α/N |Q|1/2‖pLn,k,α‖2 ≤ 1, and, con-
sequently, also Aα(g) ≤ ‖g‖Λα,D0 . �
References
[1] W. Abu-Shammala, J.-L. Shiu, and A. Torchinsky, Characterizations of the Hardy
space H1 and BMO, preprint.
[2] H.-Q. Bui and R. S. Laugesen, Approximation and spanning in the Hardy space, by
affine systems, Constr. Approx., to appear.
[3] A. P. Calderón and A. Torchinsky, Parabolic maximal functions associated with a
distibution, II, Advances in Math., 24 (1977), 101–171.
[4] G. S. de Souza, Spaces formed by special atoms, I, Rocky Mountain J. Math. 14 (1984),
no. 2, 423–431.
[5] S. Fridli, Transition from the dyadic to the real nonperiodic Hardy space, Acta Math.
Acad. Paedagog. Niházi (N.S.) 16 (2000), 1–8, (electronic).
[6] J. Garćıa-Cuerva and J. L. Rubio de Francia, Weighted norm inequalities and related
topics, Notas de Matemática 116, North Holland, Amsterdam, 1985.
[7] J. Garnett and P. Jones, BMO from dyadic BMO, Pacific J. Math. 99 (1982), no. 2,
351–371.
[8] T. Mei, BMO is the intersection of two translates of dyadic BMO, C. R. Math. Acad.
Sci. Paris 336 (2003), no. 12, 1003–1006.
[9] T. M. Le and L. A. Vese, Image decomposition using total variation and div( BMO)∗,
Multiscale Model. Simul. 4, (2005), no. 2, 390–423.
[10] A. Torchinsky, Real-variable methods in harmonic analysis, Dover Publications, Inc.,
Mineola, NY, 2004.
Department of Mathematics, Indiana University, Bloomington IN 47405
E-mail address: wabusham@indiana.edu
Department of Mathematics, Indiana University, Bloomington IN 47405
E-mail address: torchins@indiana.edu
	1. Characterization of the Hardy spaces Hp
	2. Characterizations of 
	References
ABSTRACT
  In this paper we show how to compute the $\Lambda_{\alpha}$ norm, $\alpha\ge
0$, using the dyadic grid. This result is a consequence of the description of
the Hardy spaces $H^p(R^N)$ in terms of dyadic and special atoms.

<|endoftext|><|startoftext|>
Polymer Quantum Mechanics and its Continuum Limit
Alejandro Corichi,1, 2, 3, ∗ Tatjana Vukašinac,4, † and José A. Zapata1, ‡
Instituto de Matemáticas, Unidad Morelia, Universidad Nacional Autónoma de México,
UNAM-Campus Morelia, A. Postal 61-3, Morelia, Michoacán 58090, Mexico
Instituto de Ciencias Nucleares, Universidad Nacional Autónoma de México,
A. Postal 70-543, México D.F. 04510, Mexico
Institute for Gravitational Physics and Geometry, Physics Department,
Pennsylvania State University, University Park PA 16802, USA
Facultad de Ingenieŕıa Civil, Universidad Michoacana de San Nicolas de Hidalgo,
Morelia, Michoacán 58000, Mexico
A rather non-standard quantum representation of the canonical commutation relations of quan-
tum mechanics systems, known as the polymer representation has gained some attention in recent
years, due to its possible relation with Planck scale physics. In particular, this approach has been
followed in a symmetric sector of loop quantum gravity known as loop quantum cosmology. Here we
explore different aspects of the relation between the ordinary Schrödinger theory and the polymer
description. The paper has two parts. In the first one, we derive the polymer quantum mechanics
starting from the ordinary Schrödinger theory and show that the polymer description arises as an
appropriate limit. In the second part we consider the continuum limit of this theory, namely, the
reverse process in which one starts from the discrete theory and tries to recover back the ordinary
Schrödinger quantum mechanics. We consider several examples of interest, including the harmonic
oscillator, the free particle and a simple cosmological model.
PACS numbers: 04.60.Pp, 04.60.Ds, 04.60.Nc 11.10.Gh.
I. INTRODUCTION
The so-called polymer quantum mechanics, a non-
regular and somewhat ‘exotic’ representation of the
canonical commutation relations (CCR) [1], has been
used to explore both mathematical and physical issues in
background independent theories such as quantum grav-
ity [2, 3]. A notable example of this type of quantization,
when applied to minisuperspace models has given way to
what is known as loop quantum cosmology [4, 5]. As in
any toy model situation, one hopes to learn about the
subtle technical and conceptual issues that are present
in full quantum gravity by means of simple, finite di-
mensional examples. This formalism is not an exception
in this regard. Apart from this motivation coming from
physics at the Planck scale, one can independently ask
for the relation between the standard continuous repre-
sentations and their polymer cousins at the level of math-
ematical physics. A deeper understanding of this relation
becomes important on its own.
The polymer quantization is made of several steps.
The first one is to build a representation of the
Heisenberg-Weyl algebra on a Kinematical Hilbert space
that is “background independent”, and that is sometimes
referred to as the polymeric Hilbert space Hpoly. The
second and most important part, the implementation of
dynamics, deals with the definition of a Hamiltonian (or
Hamiltonian constraint) on this space. In the examples
∗Electronic address: corichi@matmor.unam.mx
†Electronic address: tatjana@shi.matmor.unam.mx
‡Electronic address: zapata@matmor.unam.mx
studied so far, the first part is fairly well understood,
yielding the kinematical Hilbert space Hpoly that is, how-
ever, non-separable. For the second step, a natural im-
plementation of the dynamics has proved to be a bit more
difficult, given that a direct definition of the Hamiltonian
Ĥ of, say, a particle on a potential on the space Hpoly is
not possible since one of the main features of this repre-
sentation is that the operators q̂ and p̂ cannot be both
simultaneously defined (nor their analogues in theories
involving more elaborate variables). Thus, any operator
that involves (powers of) the not defined variable has to
be regulated by a well defined operator which normally
involves introducing some extra structure on the configu-
ration (or momentum) space, namely a lattice. However,
this new structure that plays the role of a regulator can
not be removed when working in Hpoly and one is left
with the ambiguity that is present in any regularization.
The freedom in choosing it can be sometimes associated
with a length scale (the lattice spacing). For ordinary
quantum systems such as a simple harmonic oscillator,
that has been studied in detail from the polymer view-
point, it has been argued that if this length scale is taken
to be ‘sufficiently small’, one can arbitrarily approximate
standard Schrödinger quantum mechanics [2, 3]. In the
case of loop quantum cosmology, the minimum area gap
A0 of the full quantum gravity theory imposes such a
scale, that is then taken to be fundamental [4].
A natural question is to ask what happens when we
change this scale and go to even smaller ‘distances’, that
is, when we refine the lattice on which the dynamics of
the theory is defined. Can we define consistency con-
ditions between these scales? Or even better, can we
take the limit and find thus a continuum limit? As it
http://arxiv.org/abs/0704.0007v2
mailto:corichi@matmor.unam.mx
mailto:tatjana@shi.matmor.unam.mx
mailto:zapata@matmor.unam.mx
has been shown recently in detail, the answer to both
questions is in the affirmative [6]. There, an appropriate
notion of scale was defined in such a way that one could
define refinements of the theory and pose in a precise
fashion the question of the continuum limit of the theory.
These results could also be seen as handing a procedure
to remove the regulator when working on the appropri-
ate space. The purpose of this paper is to further explore
different aspects of the relation between the continuum
and the polymer representation. In particular in the first
part we put forward a novel way of deriving the polymer
representation from the ordinary Schrödinger represen-
tation as an appropriate limit. In Sec. II we derive two
versions of the polymer representation as different lim-
its of the Schrödinger theory. In Sec. III we show that
these two versions can be seen as different polarizations
of the ‘abstract’ polymer representation. These results,
to the best of our knowledge, are new and have not been
reported elsewhere. In Sec. IV we pose the problem of
implementing the dynamics on the polymer representa-
tion. In Sec. V we motivate further the question of the
continuum limit (i.e. the proper removal of the regulator)
and recall the basic constructions of [6]. Several exam-
ples are considered in Sec. VI. In particular a simple
harmonic oscillator, the polymer free particle and a sim-
ple quantum cosmology model are considered. The free
particle and the cosmological model represent a general-
ization of the results obtained in [6] where only systems
with a discrete and non-degenerate spectrum where con-
sidered. We end the paper with a discussion in Sec. VII.
In order to make the paper self-contained, we will keep
the level of rigor in the presentation to that found in the
standard theoretical physics literature.
II. QUANTIZATION AND POLYMER
REPRESENTATION
In this section we derive the so called polymer repre-
sentation of quantum mechanics starting from a specific
reformulation of the ordinary Schrödinger representation.
Our starting point will be the simplest of all possible
phase spaces, namely Γ = R2 corresponding to a particle
living on the real line R. Let us choose coordinates (q, p)
thereon. As a first step we shall consider the quantization
of this system that leads to the standard quantum theory
in the Schrödinger description. A convenient route is to
introduce the necessary structure to define the Fock rep-
resentation of such system. From this perspective, the
passage to the polymeric case becomes clearest. Roughly
speaking by a quantization one means a passage from the
classical algebraic bracket, the Poisson bracket,
{q, p} = 1 (1)
to a quantum bracket given by the commutator of the
corresponding operators,
[ q̂, p̂] = i~ 1̂ (2)
These relations, known as the canonical commutation re-
lation (CCR) become the most common corner stone of
the (kinematics of the) quantum theory; they should be
satisfied by the quantum system, when represented on a
Hilbert space H.
There are alternative points of departure for quantum
kinematics. Here we consider the algebra generated by
the exponentiated versions of q̂ and p̂ that are denoted
U(α) = ei(α q̂)/~ ; V (β) = ei(β p̂)/~
where α and β have dimensions of momentum and length,
respectively. The CCR now become
U(α) · V (β) = e(−iα β)/~V (β) · U(α) (3)
and the rest of the product is
U(α1)·U(α2) = U(α1+α2) ; V (β1)·V (β2) = V (β1+β2)
The Weyl algebra W is generated by taking finite linear
combinations of the generators U(αi) and V (βi) where
the product (3) is extended by linearity,
(Ai U(αi) +Bi V (βi))
From this perspective, quantization means finding an
unitary representation of the Weyl algebra W on a
Hilbert space H′ (that could be different from the ordi-
nary Schrödinger representation). At first it might look
weird to attempt this approach given that we know how
to quantize such a simple system; what do we need such
a complicated object as W for? It is infinite dimensional,
whereas the set S = {1̂, q̂, p̂}, the starting point of the
ordinary Dirac quantization, is rather simple. It is in
the quantization of field systems that the advantages of
the Weyl approach can be fully appreciated, but it is
also useful for introducing the polymer quantization and
comparing it to the standard quantization. This is the
strategy that we follow.
A question that one can ask is whether there is any
freedom in quantizing the system to obtain the ordinary
Schrödinger representation. On a first sight it might seem
that there is none given the Stone-Von Neumann unique-
ness theorem. Let us review what would be the argument
for the standard construction. Let us ask that the repre-
sentation we want to build up is of the Schrödinger type,
namely, where states are wave functions of configuration
space ψ(q). There are two ingredients to the construction
of the representation, namely the specification of how the
basic operators (q̂, p̂) will act, and the nature of the space
of functions that ψ belongs to, that is normally fixed by
the choice of inner product on H, or measure µ on R.
The standard choice is to select the Hilbert space to be,
H = L2(R, dq)
the space of square-integrable functions with respect to
the Lebesgue measure dq (invariant under constant trans-
lations) on R. The operators are then represented as,
q̂ · ψ(q) = (q ψ)(q) and p̂ · ψ(q) = −i ~ ∂
ψ(q) (4)
Is it possible to find other representations? In order to
appreciate this freedom we go to the Weyl algebra and
build the quantum theory thereon. The representation
of the Weyl algebra that can be called of the ‘Fock type’
involves the definition of an extra structure on the phase
space Γ: a complex structure J . That is, a linear map-
ping from Γ to itself such that J2 = −1. In 2 dimen-
sions, all the freedom in the choice of J is contained in
the choice of a parameter d with dimensions of length. It
is also convenient to define: k = p/~ that has dimensions
of 1/L. We have then,
Jd : (q, k) 7→ (−d2 k, q/d2)
This object together with the symplectic structure:
Ω((q, p); (q′, p′)) = q p′ − p q′ define an inner product on
Γ by the formula gd(· ; ·) = Ω(· ; Jd ·) such that:
gd((q, p); (q
′, p′)) =
q q′ +
which is dimension-less and positive definite. Note that
with this quantities one can define complex coordinates
(ζ, ζ̄) as usual:
q + i
p ; ζ̄ =
q − i d
from which one can build the standard Fock representa-
tion. Thus, one can alternatively view the introduction
of the length parameter d as the quantity needed to de-
fine (dimensionless) complex coordinates on the phase
space. But what is the relevance of this object (J or
d)? The definition of complex coordinates is useful for
the construction of the Fock space since from them one
can define, in a natural way, creation and annihilation
operators. But for the Schrödinger representation we are
interested here, it is a bit more subtle. The subtlety is
that within this approach one uses the algebraic prop-
erties of W to construct the Hilbert space via what is
known as the Gel’fand-Naimark-Segal (GNS) construc-
tion. This implies that the measure in the Schrödinger
representation becomes non trivial and thus the momen-
tum operator acquires an extra term in order to render
the operator self-adjoint. The representation of the Weyl
algebra is then, when acting on functions φ(q) [7]:
Û(α) · φ(q) := (eiα q/~ φ)(q)
V̂ (β) · φ(q) := e
(q−β/2)
φ(q − β)
The Hilbert space structure is introduced by the defini-
tion of an algebraic state (a positive linear functional)
ωd : W → C, that must coincide with the expectation
value in the Hilbert space taken on a special state ref-
ered to as the vacuum: ωd(a) = 〈â〉vac, for all a ∈ W .
In our case this specification of J induces such a unique
state ωd that yields,
〈Û(α)〉vac = e−
d2 α2
~2 (5)
〈V̂ (β)〉vac = e−
d2 (6)
Note that the exponents in the vacuum expectation
values correspond to the metric constructed out of J :
d2 α2
= gd((0, α); (0, α)) and
= gd((β, 0); (β, 0)).
Wave functions belong to the space L2(R, dµd), where
the measure that dictates the inner product in this rep-
resentation is given by,
dµd =
d2 dq
In this representation, the vacuum is given by the iden-
tity function φ0(q) = 1 that is, just as any plane wave,
normalized. Note that for each value of d > 0, the rep-
resentation is well defined and continuous in α and β.
Note also that there is an equivalence between the q-
representation defined by d and the k-representation de-
fined by 1/d.
How can we recover then the standard representation
in which the measure is given by the Lebesgue measure
and the operators are represented as in (4)? It is easy to
see that there is an isometric isomorphism K that maps
the d-representation in Hd to the standard Schrödinger
representation in Hschr by:
ψ(q) = K · φ(q) = e
d1/2π1/4
φ(q) ∈ Hschr = L2(R, dq)
Thus we see that all d-representations are unitarily equiv-
alent. This was to be expected in view of the Stone-Von
Neumann uniqueness result. Note also that the vacuum
now becomes
ψ0(q) =
d1/2π1/4
2 d2 ,
so even when there is no information about the param-
eter d in the representation itself, it is contained in the
vacuum state. This procedure for constructing the GNS-
Schrödinger representation for quantum mechanics has
also been generalized to scalar fields on arbitrary curved
space in [8]. Note, however that so far the treatment has
all been kinematical, without any knowledge of a Hamil-
tonian. For the Simple Harmonic Oscillator of mass m
and frequency ω, there is a natural choice compatible
with the dynamics given by d =
, in which some
calculations simplify (for instance for coherent states),
but in principle one can use any value of d.
Our study will be simplified by focusing on the funda-
mental entities in the Hilbert Space Hd , namely those
states generated by acting with Û(α) on the vacuum
φ0(q) = 1. Let us denote those states by,
φα(q) = Û(α) · φ0(q) = ei
The inner product between two such states is given by
〈φα, φλ〉d =
dµd e
~ = e−
(λ−α)2 d2
4 ~2 (7)
Note incidentally that, contrary to some common belief,
the ‘plane waves’ in this GNS Hilbert space are indeed
normalizable.
Let us now consider the polymer representation. For
that, it is important to note that there are two possible
limiting cases for the parameter d: i) The limit 1/d 7→ 0
and ii) The case d 7→ 0. In both cases, we have ex-
pressions that become ill defined in the representation or
measure, so one needs to be careful.
A. The 1/d 7→ 0 case.
The first observation is that from the expressions (5) and
(6) for the algebraic state ωd, we see that the limiting
cases are indeed well defined. In our case we get, ωA :=
lim1/d→0 ωd such that,
ωA(Û(α)) = δα,0 and ωA(V̂ (β)) = 1 (8)
From this, we can indeed construct the representation
by means of the GNS construction. In order to do that
and to show how this is obtained we shall consider several
expressions. One has to be careful though, since the limit
has to be taken with care. Let us consider the measure
on the representation that behaves as:
dµd =
d2 dq 7→ 1
so the measures tends to an homogeneous measure but
whose ‘normalization constant’ goes to zero, so the limit
becomes somewhat subtle. We shall return to this point
later.
Let us now see what happens to the inner product
between the fundamental entities in the Hilbert Space Hd
given by (7). It is immediate to see that in the 1/d 7→ 0
limit the inner product becomes,
〈φα, φλ〉d 7→ δα,λ (9)
with δα,λ being Kronecker’s delta. We see then that the
plane waves φα(q) become an orthonormal basis for the
new Hilbert space. Therefore, there is a delicate interplay
between the two terms that contribute to the measure in
order to maintain the normalizability of these functions;
we need the measure to become damped (by 1/d) in order
to avoid that the plane waves acquire an infinite norm
(as happens with the standard Lebesgue measure), but
on the other hand the measure, that for any finite value
of d is a Gaussian, becomes more and more spread.
It is important to note that, in this limit, the operators
Û(α) become discontinuous with respect to α, given that
for any given α1 and α2 (different), its action on a given
basis vector ψλ(q) yields orthogonal vectors. Since the
continuity of these operators is one of the hypotesis of
the Stone-Von Neumann theorem, the uniqueness result
does not apply here. The representation is inequivalent
to the standard one.
Let us now analyze the other operator, namely the
action of the operator V̂ (β) on the basis φα(q):
V̂ (β) · φα(q) = e−
~ e(β/d
2+iα/~)q
which in the limit 1/d 7→ 0 goes to,
V̂ (β) · φα(q) 7→ ei
~ φα(q)
that is continuous on β. Thus, in the limit, the operator
p̂ = −i~∂q is well defined. Also, note that in this limit
the operator p̂ has φα(q) as its eigenstate with eigenvalue
given by α:
p̂ · φα(q) 7→ αφα(q)
To summarize, the resulting theory obtained by taking
the limit 1/d 7→ 0 of the ordinary Schrödinger descrip-
tion, that we shall call the ‘polymer representation of
type A’, has the following features: the operators U(α)
are well defined but not continuous in α, so there is no
generator (no operator associated to q). The basis vec-
tors φα are orthonormal (for α taking values on a contin-
uous set) and are eigenvectors of the operator p̂ that is
well defined. The resulting Hilbert space HA will be the
(A-version of the) polymer representation. Let us now
consider the other case, namely, the limit when d 7→ 0.
B. The d 7→ 0 case
Let us now explore the other limiting case of the
Schrödinger/Fock representations labelled by the param-
eter d. Just as in the previous case, the limiting algebraic
state becomes, ωB := limd→0 ωd such that,
ωB(Û(α)) = 1 and ωB(V̂ (β)) = δβ,0 (10)
From this positive linear function, one can indeed con-
struct the representation using the GNS construction.
First let us note that the measure, even when the limit
has to be taken with due care, behaves as:
dµd =
d2 dq 7→ δ(q) dq
That is, as Dirac’s delta distribution. It is immediate to
see that, in the d 7→ 0 limit, the inner product between
the fundamental states φα(q) becomes,
〈φα, φλ〉d 7→ 1 (11)
This in fact means that the vector ξ = φα − φλ belongs
to the Kernel of the limiting inner product, so one has to
mod out by these (and all) zero norm states in order to
get the Hilbert space.
Let us now analyze the other operator, namely the
action of the operator V̂ (β) on the vacuum φ0(q) = 1,
which for arbitrary d has the form,
φ̃β := V̂ (β) · φ0(q) = e
(q−β/2)
The inner product between two such states is given by
〈φ̃α, φ̃β〉d = e−
(α−β)2
In the limit d → 0, 〈φ̃α, φ̃β〉d → δα,β. We can see then
that it is these functions that become the orthonormal,
‘discrete basis’ in the theory. However, the function φ̃β(q)
in this limit becomes ill defined. For example, for β > 0,
it grows unboundedly for q > β/2, is equal to one if
q = β/2 and zero otherwise. In order to overcome these
difficulties and make more transparent the resulting the-
ory, we shall consider the other form of the representation
in which the measure is incorporated into the states (and
the resulting Hilbert space is L2(R, dq)). Thus the new
state
ψβ(q) := K · (V̂ (β) · φ0(q)) =
(q−β)2
We can now take the limit and what we get is
d 7→0
ψβ(q) := δ
1/2(q, β)
where by δ1/2(q, β) we mean something like ‘the square
root of the Dirac distribution’. What we really mean is
an object that satisfies the following property:
δ1/2(q, β) · δ1/2(q, α) = δ(q, β) δβ,α
That is, if α = β then it is just the ordinary delta, other-
wise it is zero. In a sense these object can be regarded as
half-densities that can not be integrated by themselves,
but whose product can. We conclude then that the inner
product is,
〈ψβ , ψα〉 =
dq ψβ(q)ψα(q) =
dq δ(q, α) δβ,α = δβ,α
which is just what we expected. Note that in this repre-
sentation, the vacuum state becomes ψ0(q) := δ
1/2(q, 0),
namely, the half-delta with support in the origin. It is
important to note that we are arriving in a natural way to
states as half-densities, whose squares can be integrated
without the need of a nontrivial measure on the configu-
ration space. Diffeomorphism invariance arises then in a
natural but subtle manner.
Note that as the end result we recover the Kronecker
delta inner product for the new fundamental states:
χβ(q) := δ
1/2(q, β).
Thus, in this new B-polymer representation, the Hilbert
space HB is the completion with respect to the inner
product (13) of the states generated by taking (finite)
linear combinations of basis elements of the form χβ :
Ψ(q) =
bi χβi(q) (14)
Let us now introduce an equivalent description of this
Hilbert space. Instead of having the basis elements be
half-deltas as elements of the Hilbert space where the
inner product is given by the ordinary Lebesgue measure
dq, we redefine both the basis and the measure. We
could consider, instead of a half-delta with support β, a
Kronecker delta or characteristic function with support
on β:
χ′β(q) := δq,β
These functions have a similar behavior with respect to
the product as the half-deltas, namely: χ′β(q) · χ′α(q) =
δβ,α. The main difference is that neither χ
′ nor their
squares are integrable with respect to the Lebesgue mea-
sure (having zero norm). In order to fix that problem we
have to change the measure so that we recover the basic
inner product (13) with our new basis. The needed mea-
sure turns out to be the discrete counting measure on R.
Thus any state in the ‘half density basis’ can be written
(using the same expression) in terms of the ‘Kronecker
basis’. For more details and further motivation see the
next section.
Note that in this B-polymer representation, both Û
and V̂ have their roles interchanged with that of the
A-polymer representation: while U(α) is discontinuous
and thus q̂ is not defined in the A-representation, we
have that it is V (β) in the B-representation that has this
property. In this case, it is the operator p̂ that can not
be defined. We see then that given a physical system for
which the configuration space has a well defined physi-
cal meaning, within the possible representation in which
wave-functions are functions of the configuration variable
q, the A and B polymer representations are radically dif-
ferent and inequivalent.
Having said this, it is also true that the A and B
representations are equivalent in a different sense, by
means of the duality between q and p representations
and the d↔ 1/d duality: The A-polymer representation
in the “q-representation” is equivalent to the B-polymer
representation in the “p-representation”, and conversely.
When studying a problem, it is important to decide from
the beginning which polymer representation (if any) one
should be using (for instance in the q-polarization). This
has as a consequence an implication on which variable is
naturally “quantized” (even if continuous): p for A and q
for B. There could be for instance a physical criteria for
this choice. For example a fundamental symmetry could
suggest that one representation is more natural than an-
other one. This indeed has been recently noted by Chiou
in [10], where the Galileo group is investigated and where
it is shown that the B representation is better behaved.
In the other polarization, namely for wavefunctions
of p, the picture gets reversed: q is discrete for the A-
representation, while p is for the B-case. Let us end this
section by noting that the procedure of obtaining the
polymer quantization by means of an appropriate limit
of Fock-Schrödinger representations might prove useful in
more general settings in field theory or quantum gravity.
III. POLYMER QUANTUM MECHANICS:
KINEMATICS
In previous sections we have derived what we have
called the A and B polymer representations (in the q-
polarization) as limiting cases of ordinary Fock repre-
sentations. In this section, we shall describe, without
any reference to the Schrödinger representation, the ‘ab-
stract’ polymer representation and then make contact
with its two possible realizations, closely related to the A
and B cases studied before. What we will see is that one
of them (the A case) will correspond to the p-polarization
while the other one corresponds to the q−representation,
when a choice is made about the physical significance of
the variables.
We can start by defining abstract kets |µ〉 labelled by
a real number µ. These shall belong to the Hilbert space
Hpoly. From these states, we define a generic ‘cylinder
states’ that correspond to a choice of a finite collection of
numbers µi ∈ R with i = 1, 2, . . . , N . Associated to this
choice, there are N vectors |µi〉, so we can take a linear
combination of them
|ψ〉 =
ai |µi〉 (15)
The polymer inner product between the fundamental kets
is given by,
〈ν|µ〉 = δν,µ (16)
That is, the kets are orthogonal to each other (when ν 6=
µ) and they are normalized (〈µ|µ〉 = 1). Immediately,
this implies that, given any two vectors |φ〉 =
j=1 bj |νj〉
and |ψ〉 =
i=1 ai |µi〉, the inner product between them
is given by,
〈φ|ψ〉 =
b̄j ai 〈νj |µi〉 =
b̄k ak
where the sum is over k that labels the intersection points
between the set of labels {νj} and {µi}. The Hilbert
space Hpoly is the Cauchy completion of finite linear com-
bination of the form (15) with respect to the inner prod-
uct (16). Hpoly is non-separable. There are two basic
operators on this Hilbert space: the ‘label operator’ ε̂:
ε̂ |µ〉 := µ |µ〉
and the displacement operator ŝ (λ),
ŝ (λ) |µ〉 := |µ+ λ〉
The operator ε̂ is symmetric and the operator(s) ŝ(λ)
defines a one-parameter family of unitary operators on
Hpoly, where its adjoint is given by ŝ† (λ) = ŝ (−λ). This
action is however, discontinuous with respect to λ given
that |µ〉 and |µ + λ〉 are always orthogonal, no matter
how small is λ. Thus, there is no (Hermitian) operator
that could generate ŝ (λ) by exponentiation.
So far we have given the abstract characterization of
the Hilbert space, but one would like to make contact
with concrete realizations as wave functions, or by iden-
tifying the abstract operators ε̂ and ŝ with physical op-
erators.
Suppose we have a system with a configuration space
with coordinate given by q, and p denotes its canonical
conjugate momenta. Suppose also that for physical rea-
sons we decide that the configuration coordinate q will
have some “discrete character” (for instance, if it is to
be identified with position, one could say that there is
an underlying discreteness in position at a small scale).
How can we implement such requirements by means of
the polymer representation? There are two possibilities,
depending on the choice of ‘polarizations’ for the wave-
functions, namely whether they will be functions of con-
figuration q or momenta p. Let us the divide the discus-
sion into two parts.
A. Momentum polarization
In this polarization, states will be denoted by,
ψ(p) = 〈p|ψ〉
where
ψµ(p) = 〈p|µ〉 = ei
How are then the operators ε̂ and ŝ represented? Note
that if we associate the multiplicative operator
V̂ (λ) · ψµ(p) = ei
~ = ei
(µ+λ)
p = ψ(µ+λ)(p)
we see then that the operator V̂ (λ) corresponds precisely
to the shift operator ŝ (λ). Thus we can also conclude
that the operator p̂ does not exist. It is now easy to
identify the operator q̂ with:
q̂ · ψµ(p) = −i~
ψµ(p) = µ e
~ = µψµ(p)
namely, with the abstract operator ε̂. The reason we
say that q̂ is discrete is because this operator has as its
eigenvalue the label µ of the elementary state ψµ(p), and
this label, even when it can take value in a continuum
of possible values, is to be understood as a discrete set,
given that the states are orthonormal for all values of
µ. Given that states are now functions of p, the inner
product (16) should be defined by a measure µ on the
space on which the wave-functions are defined. In order
to know what these two objects are, namely, the quan-
tum “configuration” space C and the measure thereon1,
we have to make use of the tools available to us from
the theory of C∗-algebras. If we consider the operators
V̂ (λ), together with their natural product and ∗-relation
given by V̂ ∗(λ) = V̂ (−λ), they have the structure of
an Abelian C∗-algebra (with unit) A. We know from
the representation theory of such objects that A is iso-
morphic to the space of continuous functions C0(∆) on a
compact space ∆, the spectrum of A. Any representation
of A on a Hilbert space as multiplication operator will be
on spaces of the form L2(∆, dµ). That is, our quantum
configuration space is the spectrum of the algebra, which
in our case corresponds to the Bohr compactification Rb
of the real line [11]. This space is a compact group and
there is a natural probability measure defined on it, the
Haar measure µH. Thus, our Hilbert space Hpoly will be
isomorphic to the space,
Hpoly,p = L2(Rb, dµH) (17)
In terms of ‘quasi periodic functions’ generated by ψµ(p),
the inner product takes the form
〈ψµ|ψλ〉 :=
dµH ψµ(p)ψλ(p) :=
= lim
L 7→∞
dpψµ(p)ψλ(p) = δµ,λ (18)
note that in the p-polarization, this characterization cor-
responds to the ‘A-version’ of the polymer representation
of Sec. II (where p and q are interchanged).
B. q-polarization
Let us now consider the other polarization in which wave
functions will depend on the configuration coordinate q:
ψ(q) = 〈q|ψ〉
The basic functions, that now will be called ψ̃µ(q), should
be, in a sense, the dual of the functions ψµ(p) of the
previous subsection. We can try to define them via a
‘Fourier transform’:
ψ̃µ(q) := 〈q|µ〉 = 〈q|
dµH|p〉〈p|µ〉
which is given by
ψ̃µ(q) :=
dµH〈q|p〉ψµ(p) =
dµH e
−i p q
~ = δq,µ (19)
1 here we use the standard terminology of ‘configuration space’ to
denote the domain of the wave function even when, in this case,
it corresponds to the physical momenta p.
That is, the basic objects in this representation are Kro-
necker deltas. This is precisely what we had found in
Sec. II for the B-type representation. How are now the
basic operators represented and what is the form of the
inner product? Regarding the operators, we expect that
they are represented in the opposite manner as in the
previous p-polarization case, but that they preserve the
same features: p̂ does not exist (the derivative of the Kro-
necker delta is ill defined), but its exponentiated version
V̂ (λ) does:
V̂ (λ) · ψ(q) = ψ(q + λ)
and the operator q̂ that now acts as multiplication has
as its eigenstates, the functions ψ̃ν(q) = δν,q:
q̂ · ψ̃µ(q) := µ ψ̃µ(q)
What is now the nature of the quantum configurations
space Q? And what is the measure thereon dµq? that
defines the inner product we should have:
〈ψ̃µ(q), ψ̃λ(q)〉 = δµ,λ
The answer comes from one of the characterizations of
the Bohr compactification: we know that it is, in a precise
sense, dual to the real line but when equipped with the
discrete topology Rd. Furthermore, the measure on Rd
will be the ‘counting measure’. In this way we recover the
same properties we had for the previous characterization
of the polymer Hilbert space. We can thus write:
Hpoly,x := L2(Rd, dµc) (20)
This completes a precise construction of the B-type poly-
mer representation sketched in the previous section. Note
that if we had chosen the opposite physical situation,
namely that q, the configuration observable, be the quan-
tity that does not have a corresponding operator, then
we would have had the opposite realization: In the q-
polarization we would have had the type-A polymer rep-
resentation and the type-B for the p-polarization. As
we shall see both scenarios have been considered in the
literature.
Up to now we have only focused our discussion on the
kinematical aspects of the quantization process. Let us
now consider in the following section the issue of dynam-
ics and recall the approach that had been adopted in the
literature, before the issue of the removal of the regulator
was reexamined in [6].
IV. POLYMER QUANTUM MECHANICS:
DYNAMICS
As we have seen the construction of the polymer
representation is rather natural and leads to a quan-
tum theory with different properties than the usual
Schrödinger counterpart such as its non-separability, the
non-existence of certain operators and the existence of
normalized eigen-vectors that yield a precise value for
one of the phase space coordinates. This has been done
without any regard for a Hamiltonian that endows the
system with a dynamics, energy and so on.
First let us consider the simplest case of a particle of
mass m in a potential V (q), in which the Hamiltonian H
takes the form,
p2 + V (q)
Suppose furthermore that the potential is given by a non-
periodic function, such as a polynomial or a rational func-
tion. We can immediately see that a direct implementa-
tion of the Hamiltonian is out of our reach, for the simple
reason that, as we have seen, in the polymer representa-
tion we can either represent q or p, but not both! What
has been done so far in the literature? The simplest
thing possible: approximate the non-existing term by a
well defined function that can be quantized and hope for
the best. As we shall see in next sections, there is indeed
more that one can do.
At this point there is also an important decision to be
made: which variable q or p should be regarded as “dis-
crete”? Once this choice is made, then it implies that
the other variable will not exist: if q is regarded as dis-
crete, then p will not exist and we need to approximate
the kinetic term p2/2m by something else; if p is to be
the discrete quantity, then q will not be defined and then
we need to approximate the potential V (q). What hap-
pens with a periodic potential? In this case one would
be modelling, for instance, a particle on a regular lattice
such as a phonon living on a crystal, and then the natural
choice is to have q not well defined. Furthermore, the po-
tential will be well defined and there is no approximation
needed.
In the literature both scenarios have been considered.
For instance, when considering a quantum mechanical
system in [2], the position was chosen to be discrete,
so p does not exist, and one is then in the A type for
the momentum polarization (or the type B for the q-
polarization). With this choice, it is the kinetic term the
one that has to be approximated, so once one has done
this, then it is immediate to consider any potential that
will thus be well defined. On the other hand, when con-
sidering loop quantum cosmology (LQC), the standard
choice is that the configuration variable is not defined
[4]. This choice is made given that LQC is regarded as
the symmetric sector of full loop quantum gravity where
the connection (that is regarded as the configuration vari-
able) can not be promoted to an operator and one can
only define its exponentiated version, namely, the holon-
omy. In that case, the canonically conjugate variable,
closely related to the volume, becomes ‘discrete’, just as
in the full theory. This case is however, different from the
particle in a potential example. First we could mention
that the functional form of the Hamiltonian constraint
that implements dynamics has a different structure, but
the more important difference lies in that the system is
constrained.
Let us return to the case of the particle in a po-
tential and for definiteness, let us start with the aux-
iliary kinematical framework in which: q is discrete, p
can not be promoted and thus we have to approximate
the kinetic term p̂2/2m. How is this done? The stan-
dard prescription is to define, on the configuration space
C, a regular ‘graph’ γµ0 . This consists of a numerable
set of points, equidistant, and characterized by a pa-
rameter µ0 that is the (constant) separation between
points. The simplest example would be to consider the
set γµ0 = {q ∈ R | q = nµ0 , ∀ n ∈ Z}.
This means that the basic kets that will be considered
|µn〉 will correspond precisely to labels µn belonging to
the graph γµ0 , that is, µn = nµ0. Thus, we shall only
consider states of the form,
|ψ〉 =
bn |µn〉 . (21)
This ‘small’ Hilbert space Hγµ0 , the graph Hilbert space,
is a subspace of the ‘large’ polymer Hilbert space Hpoly
but it is separable. The condition for a state of the form
(21) to belong to the Hilbert space Hγµ0 is that the co-
efficients bn satisfy:
n |bn|2 <∞.
Let us now consider the kinetic term p̂2/2m. We have
to approximate it by means of trigonometric functions,
that can be built out of the functions of the form eiλ p/~.
As we have seen in previous sections, these functions can
indeed be promoted to operators and act as translation
operators on the kets |µ〉. If we want to remain in the
graph γ, and not create ‘new points’, then one is con-
strained to considering operators that displace the kets
by just the right amount. That is, we want the basic
shift operator V̂ (λ) to be such that it maps the ket with
label |µn〉 to the next ket, namely |µn+1〉. This can in-
deed achieved by fixing, once and for all, the value of the
allowed parameter λ to be λ = µ0. We have then,
V̂ (µ0) · |µn〉 = |µn + µ0〉 = |µn+1〉
which is what we wanted. This basic ‘shift operator’ will
be the building block for approximating any (polynomial)
function of p. In order to do that we notice that the
function p can be approximated by,
p ≈ ~
(µ0 p
~ − e−i
where the approximation is good for p << ~/µ0. Thus,
one can define a regulated operator p̂µ0 that depends on
the ‘scale’ µ0 as:
p̂µ0 · |µn〉 :=
[V (µ0) − V (−µ0)] · |µn〉 =
(|µn+1〉 − |µn−1〉) (22)
In order to regulate the operator p̂2, there are (at least)
two possibilities, namely to compose the operator p̂µ0
with itself or to define a new approximation. The oper-
ator p̂µ0 · p̂µ0 has the feature that shifts the states two
steps in the graph to both sides. There is however an-
other operator that only involves shifting once:
p̂2µ0 · |νn〉 :=
[2 − V̂ (µ0) − V̂ (−µ0)] · |νn〉 =
(2|νn〉 − |νn+1〉 − |νn−1〉) (23)
which corresponds to the approximation p2 ≈ 2~
cos(µ0 p/~)), valid also in the regime p << ~/µ0. With
these considerations, one can define the operator Ĥµ0 ,
the Hamiltonian at scale µ0, that in practice ‘lives’ on
the space Hγµ0 as,
Ĥµ0 :=
p̂2µ0 + V̂ (q) , (24)
that is a well defined, symmetric operator on Hγµ0 . No-
tice that the operator is also defined on Hpoly, but there
its physical interpretation is problematic. For example,
it turns out that the expectation value of the kinetic term
calculated on most states (states which are not tailored
to the exact value of the parameter µ0) is zero. Even
if one takes a state that gives “reasonable“ expectation
values of the µ0-kinetic term and uses it to calculate the
expectation value of the kinetic term corresponding to
a slight perturbation of the parameter µ0 one would get
zero. This problem, and others that arise when working
on Hpoly, forces one to assign a physical interpretation
to the Hamiltonian Ĥµ0 only when its action is restricted
to the subspace Hγµ0 .
Let us now explore the form that the Hamiltonian takes
in the two possible polarizations. In the q-polarization,
the basis, labelled by n is given by the functions χn(q) =
δq,µn . That is, the wave functions will only have sup-
port on the set γµ0 . Alternatively, one can think of a
state as completely characterized by the ‘Fourier coeffi-
cients’ an: ψ(q) ↔ an, which is the value that the wave
function ψ(q) takes at the point q = µn = nµ0. Thus,
the Hamiltonian takes the form of a difference equation
when acting on a general state ψ(q). Solving the time
independent Schrödinger equation Ĥ · ψ = E ψ amounts
to solving the difference equation for the coefficients an.
The momentum polarization has a different structure.
In this case, the operator p̂2µ0 acts as a multiplication
operator,
p̂2µ0 · ψ(p) =
1 − cos
(µ0 p
ψ(p) (25)
The operator corresponding to q will be represented as a
derivative operator
q̂ · ψ(p) := i~ ∂p ψ(p).
For a generic potential V (q), it has to be defined by
means of spectral theory defined now on a circle. Why
on a circle? For the simple reason that by restricting
ourselves to a regular graph γµ0 , the functions of p that
preserve it (when acting as shift operators) are of the
form e(i m µ0 p/~) for m integer. That is, what we have
are Fourier modes, labelled by m, of period 2π ~/µ0 in p.
Can we pretend then that the phase space variable p is
now compactified? The answer is in the affirmative. The
inner product on periodic functions ψµ0(p) of p coming
from the full Hilbert space Hpoly and given by
〈φ(p)|ψ(p)〉poly = lim
L 7→∞
dp φ(p)ψ(p)
is precisely equivalent to the inner product on the circle
given by the uniform measure
〈φ(p)|ψ(p)〉µ0 =
∫ π~/µ0
−π~/µ0
dp φ(p)ψ(p)
with p ∈ (−π~/µ0, π~/µ0). As long as one restricts at-
tention to the graph γµ0 , one can work in this separable
Hilbert space Hγµ0 of square integrable functions on S
Immediately, one can see the limitations of this descrip-
tion. If the mechanical system to be quantized is such
that its orbits have values of the momenta p that are
not small compared with π~/µ0 then the approximation
taken will be very poor, and we don’t expect neither the
effective classical description nor its quantization to be
close to the standard one. If, on the other hand, one is al-
ways within the region in which the approximation can be
regarded as reliable, then both classical and quantum de-
scriptions should approximate the standard description.
What does ‘close to the standard description’ exactly
mean needs, of course, some further clarification. In
particular one is assuming the existence of the usual
Schrödinger representation in which the system has a be-
havior that is also consistent with observations. If this is
the case, the natural question is: How can we approxi-
mate such description from the polymer picture? Is there
a fine enough graph γµ0 that will approximate the system
in such a way that all observations are indistinguishable?
Or even better, can we define a procedure, that involves
a refinement of the graph γµ0 such that one recovers the
standard picture?
It could also happen that a continuum limit can be de-
fined but does not coincide with the ‘expected one’. But
there might be also physical systems for which there is
no standard description, or it just does not make sense.
Can in those cases the polymer representation, if it ex-
ists, provide the correct physical description of the sys-
tem under consideration? For instance, if there exists a
physical limitation to the minimum scale set by µ0, as
could be the case for a quantum theory of gravity, then
the polymer description would provide a true physical
bound on the value of certain quantities, such as p in
our example. This could be the case for loop quantum
cosmology, where there is a minimum value for physical
volume (coming from the full theory), and phase space
points near the ‘singularity’ lie at the region where the
approximation induced by the scale µ0 departs from the
standard classical description. If in that case the poly-
mer quantum system is regarded as more fundamental
than the classical system (or its standard Wheeler-De
Witt quantization), then one would interpret this dis-
crepancies in the behavior as a signal of the breakdown
of classical description (or its ‘naive’ quantization).
In the next section we present a method to remove
the regulator µ0 which was introduced as an intermedi-
ate step to construct the dynamics. More precisely, we
shall consider the construction of a continuum limit of
the polymer description by means of a renormalization
procedure.
V. THE CONTINUUM LIMIT
This section has two parts. In the first one we motivate
the need for a precise notion of the continuum limit of
the polymeric representation, explaining why the most
direct, and naive approach does not work. In the sec-
ond part, we shall present the main ideas and results of
the paper [6], where the Hamiltonian and the physical
Hilbert space in polymer quantum mechanics are con-
structed as a continuum limit of effective theories, follow-
ing Wilson’s renormalization group ideas. The resulting
physical Hilbert space turns out to be unitarily isomor-
phic to the ordinary Hs = L2(R, dq) of the Schrödinger
theory.
Before describing the results of [6] we should discuss
the precise meaning of reaching a theory in the contin-
uum. Let us for concreteness consider the B-type repre-
sentation in the q-polarization. That is, states are func-
tions of q and the orthonormal basis χµ(q) is given by
characteristic functions with support on q = µ. Let us
now suppose we have a Schrödinger state Ψ(q) ∈ Hs =
L2(R, dq). What is the relation between Ψ(q) and a state
in Hpoly,x? We are also interested in the opposite ques-
tion, that is, we would like to know if there is a preferred
state in Hs that is approximated by an arbitrary state
ψ(q) in Hpoly,x. The first obvious observation is that a
Schödinger state Ψ(q) does not belong to Hpoly,x since it
would have an infinite norm. To see that note that even
when the would-be state can be formally expanded in the
χµ basis as,
Ψ(q) =
Ψ(µ) χµ(q)
where the sum is over the parameter µ ∈ R. Its associ-
ated norm in Hpoly,x would be:
|Ψ(q)|2poly =
|Ψ(µ)|2 → ∞
which blows up. Note that in order to define a mapping
P : Hs → Hpoly,x, there is a huge ambiguity since the
values of the function Ψ(q) are needed in order to expand
the polymer wave function. Thus we can only define a
mapping in a dense subset D of Hs where the values of the
functions are well defined (recall that in Hs the value of
functions at a given point has no meaning since states are
equivalence classes of functions). We could for instance
ask that the mapping be defined for representatives of the
equivalence classes in Hs that are piecewise continuous.
From now on, when we refer to an element of the space
Hs we shall be refereeing to one of those representatives.
Notice then that an element of Hs does define an element
of Cyl∗γ , the dual to the space Cylγ , that is, the space
of cylinder functions with support on the (finite) lattice
γ = {µ1, µ2, . . . , µN}, in the following way:
Ψ(q) : Cylγ −→ C
such that
Ψ(q)[ψ(q)] = (Ψ|ψ〉 :=
Ψ(µ) 〈χµ|
ψi χµi〉polyγ
Ψ(µi)ψi < ∞ (26)
Note that this mapping could be seen as consisting of two
parts: First, a projection Pγ : Cyl
∗ → Cylγ such that
Pγ(Ψ) = Ψγ(q) :=
i Ψ(µi)χµi(q) ∈ Cylγ . The state
Ψγ is sometimes refereed to as the ‘shadow of Ψ(q) on
the lattice γ’. The second step is then to take the inner
product between the shadow Ψγ(q) and the state ψ(q)
with respect to the polymer inner product 〈Ψγ |ψ〉polyγ .
Now this inner product is well defined. Notice that for
any given lattice γ the corresponding projector Pγ can be
intuitively interpreted as some kind of ‘coarse graining
map’ from the continuum to the lattice γ. In terms of
functions of q the projection is replacing a continuous
function defined on R with a function over the lattice
γ ⊂ R which is a discrete set simply by restricting Ψ to
γ. The finer the lattice the more points that we have
on the curve. As we shall see in the second part of this
section, there is indeed a precise notion of coarse graining
that implements this intuitive idea in a concrete fashion.
In particular, we shall need to replace the lattice γ with
a decomposition of the real line in intervals (having the
lattice points as end points).
Let us now consider a system in the polymer represen-
tation in which a particular lattice γ0 was chosen, say
with points of the form {qk ∈ R |qk = ka0 , ∀ k ∈ Z},
namely a uniform lattice with spacing equal to a0. In this
case, any Schrödinger wave function (of the type that we
consider) will have a unique shadow on the lattice γ0. If
we refine the lattice γ 7→ γn by dividing each interval in
2n new intervals of length an = a0/2
n we have new shad-
ows that have more and more points on the curve. Intu-
itively, by refining infinitely the graph we would recover
the original function Ψ(q). Even when at each finite step
the corresponding shadow has a finite norm in the poly-
mer Hilbert space, the norm grows unboundedly and the
limit can not be taken, precisely because we can not em-
bed Hs into Hpoly. Suppose now that we are interested
in the reverse process, namely starting from a polymer
theory on a lattice and asking for the ‘continuum wave
function’ that is best approximated by a wave function
over a graph. Suppose furthermore that we want to con-
sider the limit of the graph becoming finer. In order
to give precise answers to these (and other) questions we
need to introduce some new technology that will allow us
to overcome these apparent difficulties. In the remaining
of this section we shall recall these constructions for the
benefit of the reader. Details can be found in [6] (which
is an application of the general formalism discussed in
[9]).
The starting point in this construction is the concept
of a scale C, which allows us to define the effective the-
ories and the concept of continuum limit. In our case a
scale is a decomposition of the real line in the union of
closed-open intervals, that cover the whole line and do
not intersect. Intuitively, we are shifting the emphasis
from the lattice points to the intervals defined by the
same points with the objective of approximating con-
tinuous functions defined on R with functions that are
constant on the intervals defined by the lattice. To be
precise, we define an embedding, for each scale Cn from
Hpoly to Hs by means of a step function:
Ψ(man) χman(q) →
Ψ(man) χαm(q) ∈ Hs
with χαn(q) a characteristic function on the interval
αm = [man, (m + 1)an). Thus, the shadows (living on
the lattice) were just an intermediate step in the con-
struction of the approximating function; this function is
piece-wise constant and can be written as a linear com-
bination of step functions with the coefficients provided
by the shadows.
The challenge now is to define in an appropriate sense
how one can approximate all the aspects of the theory
by means of this constant by pieces functions. Then the
strategy is that, for any given scale, one can define an
effective theory by approximating the kinetic operator
by a combination of the translation operators that shift
between the vertices of the given decomposition, in other
words by a periodic function in p. As a result one has a
set of effective theories at given scales which are mutually
related by coarse graining maps. This framework was
developed in [6]. For the convenience of the reader we
briefly recall part of that framework.
Let us denote the kinematic polymer Hilbert space at
the scale Cn as HCn , and its basis elements as eαi,Cn ,
where αi = [ian, (i + 1)an) ∈ Cn. By construction this
basis is orthonormal. The basis elements in the dual
Hilbert space H∗Cn are denoted by ωαi,Cn ; they are also
orthonormal. The states ωαi,Cn have a simple action on
Cyl, ωαi,Cn(δx0,q) = χαi,Cn(x0). That is, if x0 is in the
interval αi of Cn the result is one and it is zero if it is
not there.
Given any m ≤ n, we define d∗m,n : H∗Cn → H
as the ‘coarse graining’ map between the dual Hilbert
spaces, that sends the part of the elements of the dual
basis to zero while keeping the information of the rest:
d∗m,n(ωαi,Cn) = ωβj ,Cm if i = j2
n−m, in the opposite case
d∗m,n(ωαi,Cn) = 0.
At every scale the corresponding effective theory is
given by the hamiltonian Hn. These Hamiltonians will
be treated as quadratic forms, hn : HCn → R, given by
hn(ψ) = λ
(ψ,Hnψ) , (27)
where λ2Cn is a normalizaton factor. We will see later
that this rescaling of the inner product is necessary in
order to guarantee the convergence of the renormalized
theory. The completely renormalized theory at this scale
is obtained as
hrenm := lim
d⋆m,nhn. (28)
and the renormalized Hamiltonians are compatible with
each other, in the sense that
d⋆m,nh
n = h
In order to analyze the conditions for the convergence
in (28) let us express the Hamiltonian in terms of its
eigen-covectors end eigenvalues. We will work with effec-
tive Hamiltonians that have a purely discrete spectrum
(labelled by ν) Hn · Ψν,Cn = Eν,Cn Ψν,Cn . We shall also
introduce, as an intermediate step, a cut-off in the energy
levels. The origin of this cut-off is in the approximation
of the Hamiltonian of our system at a given scale with
a Hamiltonian of a periodic system in a regime of small
energies, as we explained earlier. Thus, we can write
hνcut−offm =
νcut−off
Eν,CmΨν,Cm ⊗ Ψν,Cm , (29)
where the eigen covectors Ψν,Cm are normalized accord-
ing to the inner product rescaled by 1
, and the cut-
off can vary up to a scale dependent bound, νcut−off ≤
νmax(Cm). The Hilbert space of covectors together with
such inner product will be called H⋆renCm .
In the presence of a cut-off, the convergence of the
microscopically corrected Hamiltonians, equation (28) is
equivalent to the existence of the following two limits.
The first one is the convergence of the energy levels,
Eν,Cn = E
ν . (30)
Second is the existence of the completely renormalized
eigen covectors,
d⋆m,n Ψν,Cn = Ψ
∈ H⋆renCm ⊂ Cyl
⋆ . (31)
We clarify that the existence of the above limit means
that Ψrenν,Cm(δx0,q) is well defined for any δx0,q ∈ Cyl. No-
tice that this point-wise convergence, if it can take place
at all, will require the tuning of the normalization factors
λ2Cn .
Now we turn to the question of the continuum limit
of the renormalized covectors. First we can ask for the
existence of the limit
Ψrenν,Cn(δx0,q) (32)
for any δx0,q ∈ Cyl. When this limits exists there is
a natural action of the eigen covectors in the continuum
limit. Below we consider another notion of the continuum
limit of the renormalized eigen covectors.
When the completely renormalized eigen covectors
exist, they form a collection that is d⋆-compatible,
d⋆m,nΨ
= Ψrenν,Cm . A sequence of d
⋆-compatible nor-
malizable covectors define an element of
, which is
the projective limit of the renormalized spaces of covec-
H⋆renCn . (33)
The inner product in this space is defined by
({ΨCn}, {ΦCn})renR := lim
(ΨCn ,ΦCn)
The natural inclusion of C∞0 in
is by an antilinear
map which assigns to any Ψ ∈ C∞0 the d⋆-compatible
collection ΨshadCn :=
ωαiΨ̄(L(αi)) ∈ H⋆renCn ⊂ Cyl
ΨshadCn will be called the shadow of Ψ at scale Cn and acts
in Cyl as a piecewise constant function. Clearly other
types of test functions like Schwartz functions are also
naturally included in
. In this context a shadow is
a state of the effective theory that approximates a state
in the continuum theory.
Since the inner product in
is degenerate, the
physical Hilbert space is defined as
H⋆phys :=
/ ker(·, ·)ren
Hphys := H⋆⋆phys
The nature of the physical Hilbert space, whether it is
isomorphic to the Schrödinger Hilber space, Hs, or not, is
determined by the normalization factors λ2Cn which can
be obtained from the conditions asking for compatibil-
ity of the dynamics of the effective theories at different
scales. The dynamics of the system under consideration
selects the continuum limit.
Let us now return to the definition of the Hamilto-
nian in the continuum limit. First consider the contin-
uum limit of the Hamiltonian (with cut-off) in the sense
of its point-wise convergence as a quadratic form. It
turns out that if the limit of equation (32) exists for
all the eigencovectors allowed by the cut-off, we have
νcut−off ren
: Hpoly,x → R defined by
νcut−off ren
(δx0,q) := lim
hνcut−off renn ([δx0,q]Cn). (34)
This Hamiltonian quadratic form in the continuum can
be coarse grained to any scale and, as can be ex-
pected, it yields the completely renormalized Hamilto-
nian quadratic forms at that scale. However, this is not
a completely satisfactory continuum limit because we can
not remove the auxiliary cut-off νcut−off . If we tried, as
we include more and more eigencovectors in the Hamilto-
nian the calculations done at a given scale would diverge
and doing them in the continuum is just as divergent.
Below we explore a more successful path.
We can use the renormalized inner product to induce
an action of the cut–off Hamiltonians on
νcut−off ren
({ΨCn}) := lim
hνcut−off renn ((ΨCn , ·)renCn ),
where we have used the fact that (ΨCn , ·)renCn ∈ HCn . The
existence of this limit is trivial because the renormalized
Hamiltonians are finite sums and the limit exists term by
term.
These cut-off Hamiltonians descend to the physical
Hilbert space
νcut−off ren
([{ΨCn}]) := h
νcut−off ren
({ΨCn})
for any representative {ΨCn} ∈ [{ΨCn}] ∈ H⋆phys.
Finally we can address the issue of removal of the cut-
off. The Hamiltonian hren
→ R is defined by the
limit
:= lim
νcut−off→∞
νcut−off ren
when the limit exists. Its corresponding Hermitian form
in Hphys is defined whenever the above limit exists. This
concludes our presentation of the main results of [6]. Let
us now consider several examples of systems for which
the continuum limit can be investigated.
VI. EXAMPLES
In this section we shall develop several examples of
systems that have been treated with the polymer quanti-
zation. These examples are simple quantum mechanical
systems, such as the simple harmonic oscillator and the
free particle, as well as a quantum cosmological model
known as loop quantum cosmology.
A. The Simple Harmonic Oscillator
In this part, let us consider the example of a Simple Har-
monic Oscillator (SHO) with parameters m and ω, clas-
sically described by the following Hamiltonian
mω2 x2.
Recall that from these parameters one can define a length
scale D =
~/mω. In the standard treatment one uses
this scale to define a complex structure JD (and an in-
ner product from it), as we have described in detail that
uniquely selects the standard Schrödinger representation.
At scale Cn we have an effective Hamiltonian for the
Simple Harmonic Oscillator (SHO) given by
HCn =
1 − cos anp
mω2x2 . (35)
If we interchange position and momentum, this Hamilto-
nian is exactly that of a pendulum of mass m, length l
and subject to a constant gravitational field g:
ĤCn = −
+mgl(1 − cos θ)
where those quantities are related to our system by,
mω an
, g =
, θ =
That is, we are approximating, for each scale Cn the
SHO by a pendulum. There is, however, an important
difference. From our knowledge of the pendulum system,
we know that the quantum system will have a spectrum
for the energy that has two different asymptotic behav-
iors, the SHO for low energies and the planar rotor in
the higher end, corresponding to oscillating and rotating
solutions respectively2. As we refine our scale and both
the length of the pendulum and the height of the periodic
potential increase, we expect to have an increasing num-
ber of oscillating states (for a given pendulum system,
there is only a finite number of such states). Thus, it
is justified to consider the cut-off in the energy eigenval-
ues, as discussed in the last section, given that we only
expect a finite number of states of the pendulum to ap-
proximate SHO eigenstates. With these consideration in
mind, the relevant question is whether the conditions for
the continuum limit to exist are satisfied. This question
has been answered in the affirmative in [6]. What was
shown there was that the eigen-values and eigen func-
tions of the discrete systems, which represent a discrete
and non-degenerate set, approximate those of the contin-
uum, namely, of the standard harmonic oscillator when
the inner product is renormalized by a factor λ2Cn = 1/2
This convergence implies that the continuum limit exists
as we understand it. Let us now consider the simplest
possible system, a free particle, that has nevertheless the
particular feature that the spectrum of the energy is con-
tinuous.
2 Note that both types of solutions are, in the phase space, closed.
This is the reason behind the purely discrete spectrum. The
distinction we are making is between those solutions inside the
separatrix, that we call oscillating, and those that are above it
that we call rotating.
B. Free Polymer Particle
In the limit ω → 0, the Hamiltonian of the Simple
Harmonic oscillator (35) goes to the Hamiltonian of a
free particle and the corresponding time independent
Schrödinger equation, in the p−polarization, is given by
(1 − cos anp
) − ECn
ψ̃(p) = 0
where we now have that p ∈ S1, with p ∈ (−π~
Thus, we have
ECn =
1 − cos
≤ ECn,max ≡ 2
. (36)
At each scale the energy of the particle we can describe
is bounded from above and the bound depends on the
scale. Note that in this case the spectrum is continu-
ous, which implies that the ordinary eigenfunctions of
the Hilbert are not normalizable. This imposes an upper
bound in the value that the energy of the particle can
have, in addition to the bound in the momentum due to
its “compactification”.
Let us first look for eigen-solutions to the time inde-
pendent Schrödinger equation, that is, for energy eigen-
states. In the case of the ordinary free particle, these
correspond to constant momentum plane waves of the
form e±(
) and such that the ordinary dispersion re-
lation p2/2m = E is satisfied. These plane waves are
not square integrable and do not belong to the ordinary
Hilbert space of the Schrödinger theory but they are still
useful for extracting information about the system. For
the polymer free particle we have,
ψ̃Cn(p) = c1δ(p− PCn) + c2δ(p+ PCn)
where PCn is a solution of the previous equation consid-
ering a fixed value of ECn . That is,
PCn = P (ECn) =
arccos
1 − ma
The inverse Fourier transform yields, in the ‘x represen-
tation’,
ψCn(xj) =
∫ π~/an
−π~/an
ψ̃(p) e
p j dp =
ixjPCn /~ + c2e
−ixjPCn /~
.(37)
with xj = an j for j ∈ Z. Note that the eigenfunctions
are still delta functions (in the p representation) and thus
not (square) normalizable with respect to the polymer
inner product, that in the p polarization is just given
by the ordinary Haar measure on S1, and there is no
quantization of the momentum (its spectrum is still truly
continuous).
Let us now consider the time dependent Schrödinger
equation,
i~ ∂t Ψ̃(p, t) = Ĥ · Ψ̃(p, t).
Which now takes the form,
Ψ̃(p, t) =
(1 − cos (an p/~)) Ψ̃(p, t)
that has as its solution,
Ψ̃(p, t) = e−
(1−cos (an p/~)) t ψ̃(p) = e(−iECn /~) t ψ̃(p)
for any initial function ψ̃(p), where ECn satisfy the dis-
persion relation (36). The wave function Ψ(xj , t), the
xj-representation of the wave function, can be obtained
for any given time t by Fourier transforming with (37)
the wave function Ψ̃(p, t).
In order to check out the convergence of the micro-
scopically corrected Hamiltonians we should analyze the
convergence of the energy levels and of the proper cov-
ectors. In the limit n → ∞, ECn → E = p2/2m so
we can be certain that the eigen-values for the energy
converge (when fixing the value of p). Let us write the
proper covector as ΨCn = (ψCn , ·)renCn ∈ H
. Then we
can bring microscopic corrections to scale Cm and look
for convergence of such corrections
ΨrenCm
= lim
d⋆m,nΨCn .
It is easy to see that given any basis vector eαi ∈ HCm
the following limit
ΨrenCm(eαi,Cm) = limCn→∞
ΨCn(dn,m(eαi,Cm))
exists and is equal to
ΨshadCm (eαi,Cm) = [d
⋆ΨSchr](eαi,Cm) = Ψ
Schr(iam)
where ΨshadCm is calculated using the free particle Hamilto-
nian in the Schrödinger representation. This expression
defines the completely renormalized proper covector at
the scale Cm.
C. Polymer Quantum Cosmology
In this section we shall present a version of quantum
cosmology that we call polymer quantum cosmology. The
idea behind this name is that the main input in the quan-
tization of the corresponding mini-superspace model is
the use of a polymer representation as here understood.
Another important input is the choice of fundamental
variables to be used and the definition of the Hamiltonian
constraint. Different research groups have made differ-
ent choices. We shall take here a simple model that has
received much attention recently, namely an isotropic,
homogeneous FRW cosmology with k = 0 and coupled
to a massless scalar field ϕ. As we shall see, a proper
treatment of the continuum limit of this system requires
new tools under development that are beyond the scope
of this work. We will thus restrict ourselves to the intro-
duction of the system and the problems that need to be
solved.
The system to be quantized corresponds to the phase
space of cosmological spacetimes that are homogeneous
and isotropic and for which the homogeneous spatial
slices have a flat intrinsic geometry (k = 0 condition).
The only matter content is a mass-less scalar field ϕ. In
this case the spacetime geometry is given by metrics of
the form:
ds2 = −dt2 + a2(t) (dx2 + dy2 + dz2)
where the function a(t) carries all the information and
degrees of freedom of the gravity part. In terms of the
coordinates (a, pa, ϕ, pϕ) for the phase space Γ of the the-
ory, all the dynamics is captured in the Hamiltonian con-
straint
C := −3
+ 8πG
2|a|3
The first step is to define the constraint on the kine-
matical Hilbert space to find physical states and then a
physical inner product to construct the physical Hilbert
space. First note that one can rewrite the equation as:
p2a a
2 = 8πG
If, as is normally done, one chooses ϕ to act as an in-
ternal time, the right hand side would be promoted, in
the quantum theory, to a second derivative. The left
hand side is, furthermore, symmetric in a and pa. At
this point we have the freedom in choosing the variable
that will be quantized and the variable that will not be
well defined in the polymer representation. The standard
choice is that pa is not well defined and thus, a and any
geometrical quantity derived from it, is quantized. Fur-
thermore, we have the choice of polarization on the wave
function. In this respect the standard choice is to select
the a-polarization, in which a acts as multiplication and
the approximation of pa, namely sin(λ pa)/λ acts as a
difference operator on wave functions of a. For details of
this particular choice see [5]. Here we shall adopt the op-
posite polarization, that is, we shall have wave functions
Ψ(pa, ϕ).
Just as we did in the previous cases, in order to gain
intuition about the behavior of the polymer quantized
theory, it is convenient to look at the equivalent prob-
lem in the classical theory, namely the classical system
we would get be approximating the non-well defined ob-
servable (pa in our present case) by a well defined object
(made of trigonometric functions). Let us for simplicity
choose to replace pa 7→ sin(λ pa)/λ. With this choice
we get an effective classical Hamiltonian constraint that
depends on λ:
Cλ := −
sin(λ pa)
λ2|a|
+ 8πG
2|a|3
We can now compute effective equations of motion by
means of the equations: Ḟ := {F, Cλ}, for any observable
F ∈ C∞(Γ), and where we are using the effective (first
order) action:
dτ(pa ȧ+ pϕ ϕ̇−N Cλ)
with the choice N = 1. The first thing to notice is that
the quantity pϕ is a constant of the motion, given that
the variable ϕ is cyclic. The second observation is that
ϕ̇ = 8πG
has the same sign as pϕ and never vanishes.
Thus ϕ can be used as a (n internal) time variable. The
next observation is that the equation for
, namely
the effective Friedman equation, will have a zero for a
non-zero value of a given by
λ2p2ϕ.
This is the value at which there will be bounce if the
trajectory started with a large value of a and was con-
tracting. Note that the ‘size’ of the universe when the
bounce occurs depends on both the constant pϕ (that
dictates the matter density) and the value of the lattice
size λ. Here it is important to stress that for any value
of pϕ (that uniquely fixes the trajectory in the (a, pa)
plane), there will be a bounce. In the original description
in terms of Einstein’s equations (without the approxima-
tion that depends on λ), there in no such bounce. If
ȧ < 0 initially, it will remain negative and the universe
collapses, reaching the singularity in a finite proper time.
What happens within the effective description if we re-
fine the lattice and go from λ to λn := λ/2
n? The only
thing that changes, for the same classical orbit labelled
by pϕ, is that the bounce occurs at a ‘later time’ and for
a smaller value of a∗ but the qualitative picture remains
the same.
This is the main difference with the systems considered
before. In those cases, one could have classical trajecto-
ries that remained, for a given choice of parameter λ,
within the region where sin(λp)/λ is a good approxima-
tion to p. Of course there were also classical trajectories
that were outside this region but we could then refine the
lattice and find a new value λ′ for which the new clas-
sical trajectory is well approximated. In the case of the
polymer cosmology, this is never the case: Every classical
trajectory will pass from a region where the approxima-
tion is good to a region where it is not; this is precisely
where the ‘quantum corrections’ kick in and the universes
bounces.
Given that in the classical description, the ‘original’
and the ‘corrected’ descriptions are so different we expect
that, upon quantization, the corresponding quantum the-
ories, namely the polymeric and the Wheeler-DeWitt will
be related in a non-trivial way (if at all).
In this case, with the choice of polarization and for a
particular factor ordering we have,
sin(λpa)
· Ψ(pa, ϕ) = 0
as the Polymer Wheeler-DeWitt equation.
In order to approach the problem of the continuum
limit of this quantum theory, we have to realize that the
task is now somewhat different than before. This is so
given that the system is now a constrained system with
a constraint operator rather than a regular non-singular
system with an ordinary Hamiltonian evolution. Fortu-
nately for the system under consideration, the fact that
the variable ϕ can be regarded as an internal time allows
us to interpret the quantum constraint as a generalized
Klein-Gordon equation of the form
Ψ = Θλ · Ψ
where the operator Θλ is ‘time independent’. This al-
lows us to split the space of solutions into ‘positive and
negative frequency’, introduce a physical inner product
on the positive frequency solutions of this equation and
a set of physical observables in terms of which to de-
scribe the system. That is, one reduces in practice the
system to one very similar to the Schrödinger case by
taking the positive square root of the previous equation:
Θλ · Ψ. The question we are interested is
whether the continuum limit of these theories (labelled
by λ) exists and whether it corresponds to the Wheeler-
DeWitt theory. A complete treatment of this problem
lies, unfortunately, outside the scope of this work and
will be reported elsewhere [12].
VII. DISCUSSION
Let us summarize our results. In the first part of the
article we showed that the polymer representation of the
canonical commutation relations can be obtained as the
limiting case of the ordinary Fock-Schrödinger represen-
tation in terms of the algebraic state that defines the
representation. These limiting cases can also be inter-
preted in terms of the naturally defined coherent states
associated to each representation labelled by the param-
eter d, when they become infinitely ‘squeezed’. The two
possible limits of squeezing lead to two different polymer
descriptions that can nevertheless be identified, as we
have also shown, with the two possible polarizations for
an abstract polymer representation. This resulting the-
ory has, however, very different behavior as the standard
one: The Hilbert space is non-separable, the representa-
tion is unitarily inequivalent to the Schrödinger one, and
natural operators such as p̂ are no longer well defined.
This particular limiting construction of the polymer the-
ory can shed some light for more complicated systems
such as field theories and gravity.
In the regular treatments of dynamics within the poly-
mer representation, one needs to introduce some extra
structure, such as a lattice on configuration space, to con-
struct a Hamiltonian and implement the dynamics for the
system via a regularization procedure. How does this re-
sulting theory compare to the original continuum theory
one had from the beginning? Can one hope to remove
the regulator in the polymer description? As they stand
there is no direct relation or mapping from the polymer
to a continuum theory (in case there is one defined). As
we have shown, one can indeed construct in a systematic
fashion such relation by means of some appropriate no-
tions related to the definition of a scale, closely related
to the lattice one had to introduce in the regularization.
With this important shift in perspective, and an appro-
priate renormalization of the polymer inner product at
each scale one can, subject to some consistency condi-
tions, define a procedure to remove the regulator, and
arrive to a Hamiltonian and a Hilbert space.
As we have seen, for some simple examples such as
a free particle and the harmonic oscillator one indeed
recovers the Schrödinger description back. For other sys-
tems, such as quantum cosmological models, the answer
is not as clear, since the structure of the space of classi-
cal solutions is such that the ‘effective description’ intro-
duced by the polymer regularization at different scales
is qualitatively different from the original dynamics. A
proper treatment of these class of systems is underway
and will be reported elsewhere [12].
Perhaps the most important lesson that we have
learned here is that there indeed exists a rich inter-
play between the polymer description and the ordinary
Schrödinger representation. The full structure of such re-
lation still needs to be unravelled. We can only hope that
a full understanding of these issues will shed some light
in the ultimate goal of treating the quantum dynamics
of background independent field systems such as general
relativity.
Acknowledgments
We thank A. Ashtekar, G. Hossain, T. Pawlowski and P.
Singh for discussions. This work was in part supported
by CONACyT U47857-F and 40035-F grants, by NSF
PHY04-56913, by the Eberly Research Funds of Penn
State, by the AMC-FUMEC exchange program and by
funds of the CIC-Universidad Michoacana de San Nicolás
de Hidalgo.
[1] R. Beaume, J. Manuceau, A. Pellet and M. Sirugue,
“Translation Invariant States In Quantum Mechanics,”
Commun. Math. Phys. 38, 29 (1974); W. E. Thirring and
H. Narnhofer, “Covariant QED without indefinite met-
ric,” Rev. Math. Phys. 4, 197 (1992); F. Acerbi, G. Mor-
chio and F. Strocchi, “Infrared singular fields and non-
regular representations of canonical commutation rela-
tion algebras”, J. Math. Phys. 34, 899 (1993); F. Cav-
allaro, G. Morchio and F. Strocchi, “A generalization of
the Stone-von Neumann theorem to non-regular repre-
sentations of the CCR-algebra”, Lett. Math. Phys. 47
307 (1999); H. Halvorson, “Complementarity of Repre-
sentations in quantum mechanics”, Studies in History
and Philosophy of Modern Physics 35 45 (2004).
[2] A. Ashtekar, S. Fairhurst and J.L. Willis, “Quantum
gravity, shadow states, and quantum mechanics”, Class.
Quant. Grav. 20 1031 (2003) [arXiv:gr-qc/0207106].
[3] K. Fredenhagen and F. Reszewski, “Polymer state ap-
proximations of Schrödinger wave functions”, Class.
Quant. Grav. 23 6577 (2006) [arXiv:gr-qc/0606090].
[4] M. Bojowald, “Loop quantum cosmology”, Living Rev.
Rel. 8, 11 (2005) [arXiv:gr-qc/0601085]; A. Ashtekar,
M. Bojowald and J. Lewandowski, “Mathematical struc-
ture of loop quantum cosmology”, Adv. Theor. Math.
Phys. 7 233 (2003) [arXiv:gr-qc/0304074]; A. Ashtekar,
T. Pawlowski and P. Singh, “Quantum nature of the
big bang: Improved dynamics” Phys. Rev. D 74 084003
(2006) [arXiv:gr-qc/0607039]
[5] V. Husain and O. Winkler, “Semiclassical states for
quantum cosmology” Phys. Rev. D 75 024014 (2007)
[arXiv:gr-qc/0607097]; V. Husain V and O. Winkler, “On
singularity resolution in quantum gravity”, Phys. Rev. D
69 084016 (2004). [arXiv:gr-qc/0312094].
[6] A. Corichi, T. Vukasinac and J.A. Zapata. “Hamil-
tonian and physical Hilbert space in polymer quan-
tum mechanics”, Class. Quant. Grav. 24 1495 (2007)
[arXiv:gr-qc/0610072]
[7] A. Corichi and J. Cortez, “Canonical quantization from
an algebraic perspective” (preprint)
[8] A. Corichi, J. Cortez and H. Quevedo, “Schrödinger
and Fock Representations for a Field Theory on
Curved Spacetime”, Annals Phys. (NY) 313 446 (2004)
[arXiv:hep-th/0202070].
[9] E. Manrique, R. Oeckl, A. Weber and J.A. Zapata, “Loop
quantization as a continuum limit” Class. Quant. Grav.
23 3393 (2006) [arXiv:hep-th/0511222]; E. Manrique,
R. Oeckl, A. Weber and J.A. Zapata, “Effective theo-
ries and continuum limit for canonical loop quantization”
(preprint)
[10] D.W. Chiou, “Galileo symmetries in polymer particle
representation”, Class. Quant. Grav. 24, 2603 (2007)
[arXiv:gr-qc/0612155].
[11] W. Rudin, Fourier analysis on groups, (Interscience, New
York, 1962)
[12] A. Ashtekar, A. Corichi, P. Singh, “Contrasting LQC
and WDW using an exactly soluble model” (preprint);
A. Corichi, T. Vukasinac, and J.A. Zapata, “Continuum
limit for quantum constrained system” (preprint).
http://arxiv.org/abs/gr-qc/0207106
http://arxiv.org/abs/gr-qc/0606090
http://arxiv.org/abs/gr-qc/0601085
http://arxiv.org/abs/gr-qc/0304074
http://arxiv.org/abs/gr-qc/0607039
http://arxiv.org/abs/gr-qc/0607097
http://arxiv.org/abs/gr-qc/0312094
http://arxiv.org/abs/gr-qc/0610072
http://arxiv.org/abs/hep-th/0202070
http://arxiv.org/abs/hep-th/0511222
http://arxiv.org/abs/gr-qc/0612155
ABSTRACT
  A rather non-standard quantum representation of the canonical commutation
relations of quantum mechanics systems, known as the polymer representation has
gained some attention in recent years, due to its possible relation with Planck
scale physics. In particular, this approach has been followed in a symmetric
sector of loop quantum gravity known as loop quantum cosmology. Here we explore
different aspects of the relation between the ordinary Schroedinger theory and
the polymer description. The paper has two parts. In the first one, we derive
the polymer quantum mechanics starting from the ordinary Schroedinger theory
and show that the polymer description arises as an appropriate limit. In the
second part we consider the continuum limit of this theory, namely, the reverse
process in which one starts from the discrete theory and tries to recover back
the ordinary Schroedinger quantum mechanics. We consider several examples of
interest, including the harmonic oscillator, the free particle and a simple
cosmological model.

<|endoftext|><|startoftext|>
Introduction
	Conceptual structure for material properties
	Idealized one-dimensional loading
	Ramp compression
	Shock compression
	Accuracy: application to air
	Complex behavior of condensed matter
	Temperature
	Density-temperature equations of state
	Temperature model for mechanical equations of state
	Strength
	Preferred representation of isotropic strength
	Beryllium
	Phase changes
	Composite loading paths
	Conclusions
	Acknowledgments
	References
	References
	List of figures
ABSTRACT
  A general formulation was developed to represent material models for
applications in dynamic loading. Numerical methods were devised to calculate
response to shock and ramp compression, and ramp decompression, generalizing
previous solutions for scalar equations of state. The numerical methods were
found to be flexible and robust, and matched analytic results to a high
accuracy. The basic ramp and shock solution methods were coupled to solve for
composite deformation paths, such as shock-induced impacts, and shock
interactions with a planar interface between different materials. These
calculations capture much of the physics of typical material dynamics
experiments, without requiring spatially-resolving simulations. Example
calculations were made of loading histories in metals, illustrating the effects
of plastic work on the temperatures induced in quasi-isentropic and
shock-release experiments, and the effect of a phase transition.

<|endoftext|><|startoftext|>
Introduction
A hypercube H(X) on a set X is a graph which vertices are the finite subsets
of X ; two vertices are joined by an edge if they differ by a singleton. A partial
cube is a graph that can be isometrically embedded into a hypercube.
There are three general graph-theoretical structures that play a prominent
role in the theory of partial cubes; namely, semicubes, Djoković’s relation θ, and
Winkler’s relation Θ. We use these structures, in particular, to characterize bi-
partite graphs and partial cubes. The characterization problem for partial cubes
was considered as an important one and many characterizations are known.
We list contributions in the chronological order: Djoković [9] (1973), Avis [2]
(1981), Winkler [20] (1984), Roth and Winkler [18] (1986), Chepoi [6, 7] (1988
and 1994). In the paper, we present new proofs for the results of Djoković [9],
Winkler [20], and Chepoi [6], and obtain two more characterizations of partial
cubes.
http://arxiv.org/abs/0704.0010v1
The paper is also concerned with some ways of constructing new partial
cubes from old ones. Properties of subcubes, the Cartesian product of partial
cubes, and expansion and contraction of a partial cube are investigated. We
introduce a construction based on pasting two graphs together and show how
new partial cubes can be obtained from old ones by pasting them together.
The paper is organized as follows.
Hypercubes and partial cubes are introduced in Section 2 together with
two basic examples of infinite partial cubes. Vertex sets of partial cubes are
described in terms of well graded families of finite sets.
In Section 3 we introduce the concepts of a semicube, Djoković’s θ and Win-
kler’s Θ relations, and establish some of their properties. Bipartite graphs and
partial cubes are characterized by means of these structures. One more charac-
terization of partial cubes is obtained in Section 4, where so-called fundamental
sets in a graph are introduced.
The rest of the paper is devoted to constructions: subcubes and the Carte-
sian product (Section 6), pasting (Section 7), and expansions and contractions
(Section 8). We show that these constructions produce new partial cubes from
old ones. Isometric and lattice dimensions of new partial cubes are calculated.
These dimensions are introduced in Section 5.
Few words about conventions used in the paper are in order. The sum
(disjoint union) A+B of two sets A and B is the union
({1} ×A) ∪ ({2} ×B).
All graphs in the paper are simple undirected graphs. In the notation G =
(V,E), the symbol V stands for the set of vertices of the graph G and E stands
for its set of edges. By abuse of language, we often write ab for an edge in a
graph; if this is the case, ab is an unordered pair of distinct vertices. We denote
〈U〉 the graph induced by the set of vertices U ⊆ V . If G is a connected graph,
then dG(a, b) stands for the distance between two vertices a and b of the graph
G. Wherever it is clear from the context which graph is under consideration, we
drop the subscript G in dG(a, b). A subgraph H ⊆ G is an isometric subgraph
if dH(a, b) = dG(a, b) for all vertices a and b of H ; it is convex if any shortest
path in G between vertices of H belongs to H .
2 Hypercubes and partial cubes
Let X be a set. We denote Pf (X) the set of all finite subsets of X .
Definition 2.1. A graph H(X) has the set Pf (X) as the set of its vertices; a
pair of vertices PQ is an edge of H(X) if the symmetric difference P∆Q is a
singleton. The graph H(X) is called the hypercube on X [9]. If X is a finite
set of cardinality n, then the graph H(X) is the n-cube Qn. The dimension of
the hypercube H(X) is the cardinality of the set X .
The shortest path distance d(P,Q) on the hypercube H(X) is the Hamming
distance between sets P and Q:
d(P,Q) = |P∆Q| for P,Q ∈ Pf . (2.1)
The set Pf (X) is a metric space with the metric d.
Definition 2.2. A graph G is a partial cube if it can be isometrically embedded
into a hypercube H(X) for some set X . We often identify G with its isometric
image in the hypercube H(X), and say that G is a partial cube on the set X .
Figure 2.1: A graph and its isometric embedding into Q3.
An example of a partial cube and its isometric embedding into the cube Q3
is shown in Figure 2.1.
Clearly, a family F of finite subsets of X induces a partial cube on X if and
only if for any two distinct subsets P,Q ∈ F there is a sequence
R0 = P,R1, . . . , Rn = Q
of sets in F such that
d(Ri, Ri+1) = 1 for all 0 ≤ i < n, and d(P,Q) = n. (2.2)
The families of sets satisfying condition (2.2) are known as well graded fam-
ilies of sets [10]. Note that a sequence (Ri) satisfying (2.2) is a shortest path
from P to Q in H(X) (and in the subgraph induced by F).
Definition 2.3. A family F of arbitrary subsets ofX is a wg-family (well graded
family of sets) if, for any two distinct subsets P,Q ∈ F, the set P∆Q is finite
and there is a sequence
R0 = P,R1, . . . , Rn = Q
of sets in F such that |Ri∆Ri+1| = 1 for all 0 ≤ i < n and |P∆Q| = n.
Example 2.1. The induced graph can be a partial cube on a different set if
the family F is not well graded. Consider, for instance, the family
F = {∅, {a}, {a, b}, {a, b, c}, {b, c}}
of subsets of X = {a, b, c}. The graph induced by this family is a path of length
4 in the cube Q3 (cf. Figure 2.2). Clearly, F is not well graded. On the other
hand, as it can be easily seen, any path is a partial cube.
Figure 2.2: A nonisometric path in the cube Q3.
Any family F of subsets of X defines a graph GF = (F, EF), where
EF = {{P,Q} ⊆ F : |P∆Q| = 1}.
Theorem 2.1. The graph GF defined by a family F of subsets of a set X is
isomorphic to a partial cube on X if and only if the family F is well graded.
Proof. We need to prove sufficiency only. Let S be a fixed set in F. We define
a mapping f : F → Pf (X) by f(R) = R∆S for R ∈ F. Then
d(f(R), f(T )) = |(R∆S)∆(T∆S)| = |R∆T |.
Thus f is an isometric embedding of F into Pf (X). Let (Ri) be a sequence of
sets in F such that R0 = P , Rn = Q, |P∆Q| = n, and |Ri∆Ri+1| = 1 for all
0 ≤ i < n. Then the sequence (f(Ri)) satisfies conditions (2.2). The result
follows.
A set R ∈ Pf (X) is said to be lattice between sets P,Q ∈ Pf (X) if
P ∩Q ⊆ R ⊆ P ∪Q.
It is metrically between P and Q if
d(P,R) + d(R,Q) = d(P,Q).
The following theorem is a well-known result about these two betweenness re-
lations on Pf (X) (see, for instance, [3]).
Theorem 2.2. Lattice and metric betweenness relations coincide on Pf (X).
Let F be a family of finite subsets of X . The set of all R ∈ F that are
between P,Q ∈ F is the interval I(P,Q) between P and Q in F. Thus,
I(P,Q) = F ∩ [P ∩Q,P ∪Q],
where [P ∩Q,P ∪Q] is the usual interval in the lattice Pf .
Two distinct sets P,Q ∈ F are adjacent in F if J(P,Q) = {P,Q}. If sets P
and Q form an edge in the graph induced by F, then P and Q are adjacent in
F, but, generally speaking, not vice versa. For instance, in Example 2.1, the
vertices ∅ and {b, c} are adjacent in F but do not define an edge in the induced
graph (cf. Figure 2.2).
The following theorem is a ‘local’ characterization of wg-families of sets.
Theorem 2.3. A family F ⊆ Pf (X) is well graded if and only if d(P,Q) = 1
for any two sets P and Q that are adjacent in F.
Proof. (Necessity.) Let F be a wg-family of sets. Suppose that P and Q are
adjacent in F. There is a sequence R0 = P,R1, . . . , Rn = Q that satisfies
conditions (2.2). Since the sequence (Ri) is a shortest path in F, we have
d(P, Pi) + d(Pi, Q) = d(P,Q) for all 0 ≤ i ≤ n.
Thus, Pi ∈ I(P,Q) = {P,Q}. It follows that d(P,Q) = n = 1.
(Sufficiency.) Let P and Q be two distinct sets in F. We prove by induction
on n = d(P,Q) that there is a sequence (Ri) ∈ F satisfying conditions (2.2).
The statement is trivial for n = 1. Suppose that n > 1 and that the
statement is true for all k < n. Let P and Q be two sets in F such that
d(P,Q) = n. Since d(P,Q) > 1, the sets P and Q are not adjacent in F.
Therefore there exists R ∈ F that lies between P and Q and is distinct from
these two sets. Then d(P,R) + d(R,Q) = d(P,Q) and both distances d(P,R)
and d(R,Q) are less than n. By the induction hypothesis, there is a sequence
(Ri) ∈ F such that
P = R0, R = Rj , Q = Rn for some 0 < j < n,
satisfying conditions (2.2) for 0 ≤ i < j and j ≤ i < n. It follows that F is a
wg-family of sets.
We conclude this section with two examples of infinite partial cubes (more
examples are found in [17]).
Example 2.2. Let Z be the graph on the set Z of integers with edges defined
by pairs of consecutive integers. This graph is a partial cube since its vertex set
is isometric to the wg-family of intervals {(−∞,m) : m ∈ Z} in Z.
Example 2.3. Let us consider Zn as a metric space with respect to the ℓ1-
metric. The graph Zn has Zn as the vertex set; two vertices in Zn are connected
if they are on the unit distance from each other. We will show in Section 6
(Corollary 6.1) that Zn is a partial cube.
3 Characterizations
Only connected graphs are considered in this section.
Definition 3.1. Let G = (V,E) be a graph and d be its distance function. For
any two adjacent vertices a, b ∈ V let Wab be the set of vertices that are closer
to a than to b:
Wab = {w ∈ V : d(w, a) < d(w, b)}.
Following [11], we call the sets Wab and induced subgraphs 〈Wab〉 semicubes of
the graph G. The semicubes Wab and Wba are called opposite semicubes.
Remark 3.1. The subscript ab in Wab stands for an ordered pair of vertices,
not for an edge of G. In his original paper [9], Djoković uses notation G(a, b)
(cf. [8]). We use the notation from [15].
Clearly, two opposite semicubes are disjoint. They can be used to charac-
terize bipartite graphs as follows.
Theorem 3.1. A graph G = (V,E) is bipartite if and only if the semicubes Wab
and Wba form a partition of V for any edge ab ∈ E.
Proof. Let us recall that a connected graph G is bipartite if and only if for every
vertex x there is no edge ab with d(x, a) = d(x, b) (see, for instance, [1]). For
any edge ab ∈ E and vertex x ∈ V we clearly have
d(x, a) = d(x, b) ⇔ x /∈ Wab ∪Wba.
The result follows.
The following lemma is instrumental and will be used frequently in the rest
of the paper.
Lemma 3.1. Let G = (V,E) be a graph and w ∈ Wab for some edge ab ∈ E.
d(w, b) = d(w, a) + 1.
Accordingly,
Wab = {w ∈ V : d(w, b) = d(w, a) + 1}.
Proof. By the triangle inequality, we have
d(w, a) < d(w, b) ≤ d(w, a) + d(a, b) = d(w, a) + 1.
The result follows, since d takes values in N.
There are two binary relations on the set of edges of a graph that play a
central role in characterizing partial cubes.
Definition 3.2. Let G = (V,E) be a graph and e = xy and f = uv be two
edges of G.
(i) (Djoković [9]) The relation θ on E is defined by
e θf ⇔ f joins a vertex in Wxy with a vertex in Wyx.
The notation can be chosen such that u ∈Wxy and v ∈ Wyx.
(ii) (Winkler [20]) The relation Θ on E is defined by
eΘf ⇔ d(x, u) + d(y, v) 6= d(x, v) + d(y, u).
It is clear that both relations θ and Θ are reflexive and Θ is symmetric.
Lemma 3.2. The relation θ is a symmetric relation on E.
Proof. Suppose that xy θ uv with u ∈ Wxy and v ∈ Wyx. By Lemma 3.1 and
the triangle inequality, we have
d(u, x) = d(u, y)− 1 ≤ d(u, v) + d(v, y)− 1 = d(v, y) =
= d(v, x)− 1 ≤ d(v, u) + d(u, x) − 1 = d(u, x).
Hence, d(u, x) = d(v, x) − 1 and d(v, y) = d(u, y)− 1. Therefore, x ∈ Wuv and
y ∈ Wvu. It follows that uv θ xy.
Lemma 3.3. θ ⊆ Θ.
Proof. Suppose that xy θ uv with u ∈Wxy, v ∈ Wyx. By Lemma 3.1,
d(x, u) + d(y, v) = d(x, v) − 1 + d(y, u)− 1 6= d(x, v) + d(y, u).
Hence, xyΘ uv.
Example 3.1. It is easy to verify that θ is the identity relation on the set of
edges of the cycle C3. On the other hand, any two edges of C3 stand in the
relation Θ. Thus, θ 6= Θ in this case.
Bipartite graphs can be characterized in terms of relations θ and Θ as follows.
Theorem 3.2. A graph G = (V,E) is bipartite if and only if θ = Θ.
Proof. (Necessity.) Suppose that G is a bipartite graph, two edges xy and uv
stand in the relation Θ, that is,
d(x, u) + d(y, v) 6= d(x, v) + d(y, u),
and that edges xy and uv do not stand in the relation θ. By Theorem 3.1, we
may assume that u, v ∈ Wxy. By Lemma 3.1, we have
d(x, u) + d(y, v) = d(y, u)− 1 + d(x, v) + 1 = d(x, v) + d(y, u),
a contradiction. It follows that Θ ⊆ θ. By Lemma 3.3, θ = Θ.
(Sufficiency.) Suppose that G is not bipartite. By Theorem 3.1, there is an
edge xy such that Wxy ∪Wyx is a proper subset of V . Since G is connected,
there is an edge uv with u /∈ Wxy ∪Wyx and v ∈ Wxy ∪Wyx. Clearly, uv does
not stand in the relation θ to xy. On the other hand,
d(x, u) + d(y, v) 6= d(x, v) + d(y, u),
since u /∈ Wxy ∪Wyx and v ∈ Wxy ∪Wyx. Thus, xyΘ uv, a contradiction, since
we assumed that θ = Θ.
By Theorem 3.2, the relations θ and Θ coincide on bipartite graphs. For
this reason we use the relation θ in the rest of the paper.
Lemma 3.4. Let G = (V,E) be a bipartite graph such that all its semicubes are
convex sets. Then two edges xy and uv stand in the relation θ if and only if the
corresponding pairs of mutually opposite semicubes form equal partitions of V :
xy θ uv ⇔ {Wxy,Wyx} = {Wuv,Wvu}.
Proof. (Necessity) We assume that the notation is chosen such that u ∈ Wxy
and v ∈ Wyx. Let z ∈ Wxy ∩Wvu. By Lemma 3.1, d(z, u) = d(z, v) + d(v, u).
Since z, u ∈ Wxy and Wxy is convex, we have v ∈ Wxy, a contradiction to the
assumption that v ∈Wyx. Thus Wxy ∩Wvu = ∅. Since two opposite semicubes
in a bipartite graph form a partition of V , we haveWuv =Wxy andWvu =Wyx.
A similar argument shows that Wuv = Wyx and Wvu = Wxy, if u ∈ Wyx
and v ∈ Wxy.
(Sufficiency.) Follows from the definition of the relation θ.
We need another general property of the relation θ (cf. Lemma 2.2 in [15]).
Lemma 3.5. Let P be a shortest path in a graph G. Then no two distinct edges
of P stand in the relation θ.
Proof. Let i < j and xixi+1 and xjxj+1 be two edges in a shortest path P from
x0 to xn. Then
d(xi, xj) < d(xi, xj+1) and d(xi+1, xj) < d(xi+1, xj+1),
so xi, xi+1 ∈ Wxjxj+1 . It follows that edges xixi+1 and xjxj+1 do not stand in
the relation θ.
The converse statement is true for bipartite graphs (we omit the proof); a
counterexample is the cycle C5 which is not bipartite.
Lemma 3.6. Let G = (V,E) be a bipartite graph. The following statements are
equivalent
(i) All semicubes of G are convex.
(ii) The relation θ is an equivalence relation on E.
Proof. (i) ⇒ (ii). Follows from Lemma 3.4.
(ii) ⇒ (i). Suppose that θ is transitive and there is a nonconvex semicube
Wab. Then there are two vertices u, v ∈ Wab and a shortest path P from u to
v that intersects Wba. This path contains two distinct edges e and f joining
vertices of semicubes Wab and Wba. The edges e and f stand in the relation θ
to the edge ab. By transitivity of θ, we have e θf . This contradicts the result
of Lemma 3.5. Thus all semicubes of G are convex.
We now establish some basic properties of partial cubes.
Theorem 3.3. Let G = (V,E) be a partial cube. Then
(i) G is a bipartite graph.
(ii) Each pair of opposite semicubes form a partition of V .
(iii) All semicubes are convex subsets of V .
(iv) θ is an equivalence relation on E.
Proof. We may assume that G is an isometric subgraph of some hypercube
H(X), that is, G = (F, EF) for a wg-family F of finite subsets of X .
(i) It suffices to note that if two sets in H(X) are connected by an edge then
they have different parity. Thus, H(X) is a bipartite graph and so is G.
(ii) Follows from (i) and Theorem 3.1.
(iii) LetWAB be a semicube of G. By Lemma 3.1 and Theorem 2.2, we have
WAB = {S ∈ F : S ∩B ⊆ A ⊆ S ∪B}.
Let Q,R ∈WAB and P be a vertex of G such that
d(Q,P ) + d(P,R) = d(Q,R).
By Theorem 2.2,
Q ∩R ⊆ P ⊆ Q ∪R.
Since Q,R ∈WAB , we have
Q ∩B ⊆ A ⊆ Q ∪B and R ∩B ⊆ A ⊆ R ∪B,
which implies
P ∩B ⊆ (Q ∪R) ∩B ⊆ A ⊆ (Q ∩R) ∪B ⊆ S ∪B.
Hence, P ∈ WAB, and the result follows.
(iv) Follows from (iii) and Lemma 3.6.
Remark 3.2. Since semicubes of a partial cube G = (V,E) are convex subsets
of the metric space V , they are half-spaces in V [19]. This terminology is used
in [6, 7].
The following theorem presents four characterizations of partial cubes. The
first two are due to Djoković [9] and Winkler [20] (cf. Theorem 2.10 in [15]).
Theorem 3.4. Let G = (V,E) be a connected graph. The following statements
are equivalent:
(i) G is a partial cube.
(ii) G is bipartite and all semicubes of G are convex.
(iii) G is bipartite and θ is an equivalence relation.
(iv) G is bipartite and, for all xy, uv ∈ E,
xy θ uv ⇒ {Wxy,Wyx} = {Wuv,Wvu}. (3.1)
(v) G is bipartite and, for any pair of adjacent vertices of G, there is a unique
pair of opposite semicubes separating these two vertices.
Proof. By Lemma 3.6, the statements (ii) and (iii) are equivalent and, by The-
orem 3.3, (i) implies both (ii) and (iii).
(iii) ⇒ (i). By Theorem 3.1, each pair {Wab,Wba} of opposite semicubes of
G form a partition of V . We orient these partitions by calling, in an arbitrary
way, one of the two opposite semicubes in each partition a positive semicube.
Let us assign to each x ∈ V the set W+(x) of all positive semicubes containing
x. In the next paragraph we prove that the family F = {W+(x)}x∈V is well
graded and that the assignment x 7→ W+(x) is an isometry between V and F.
Let x and y be two distinct vertices of G. We say that a positive semicube
Wab separates x and y if either x ∈ Wab, y ∈ Wba or x ∈ Wba, y ∈ Wab. It is
clear that Wab separates x and Y if and only if Wab ∈ W
+(x)∆W+(y). Let P
be a shortest path x0 = x, x1, . . . , xn = y from x to y. By Lemma 3.5, no two
distinct edges of P stand in the relation θ. By Lemma 3.4, distinct edges of P
define distinct positive semicubes; clearly, these semicubes separate x and y. Let
Wab be a positive semicube separating x and y, and, say, x ∈Wab and y ∈Wba.
There is an edge f ∈ P that joins vertices in Wab and Wba. Hence, f stands in
the relation θ to ab and, by Lemma 3.4, Wab is defined by f . It follows that any
semicube inW+(x)∆W+(y) is defined by a unique edge in P and any edge in P
defines a semicube in W+(x)∆W+(y). Therefore, d(W+(x),W+(y)) = d(x, y),
that is x 7→W+(x) is an isometry. Clearly, F is a wg-family of sets.
By Theorem 2.1, the family F is isometric to a wg-family of finite sets.
Hence, G is a partial cube.
(iv) ⇒ (ii). Suppose that there exist an edge ab such that semicube Wba is
not convex. Let p and q be two vertices in Wba such that there is a shortest
path P from p to q that intersects Wab. There are two distinct edges xy and uv
in P such that x, u ∈ Wab and y, v ∈ Wba. Since ab θ xy and ab θ uv, we have,
by (3.1),
Wab =Wxy =Wuv.
Hence, u ∈ Wxy and v ∈Wyx. By Lemma 3.1,
d(x, u) = d(x, v) − 1 = 1 + d(v, y)− 1 = d(v, y),
a contradiction, since P is a shortest path from p to q.
(ii) ⇒ (iv). Follows from Lemma 3.4.
It is clear that (iv) and (v) are equivalent.
4 Fundamental sets in partial cubes
Semicubes played an important role in the previous section. In this section we
introduce three more classes of useful subsets of graphs. We also establish one
more characterization of partial cubes.
Let G = (V,E) be a connected graph. For a given edge e = ab ∈ E, we
define the following sets (cf. [15, 16]):
Fab = {f ∈ E : e θf} = {uv ∈ E : u ∈Wab, v ∈Wba},
Uab = {w ∈Wab : w is adjacent to a vertex in Wba},
Uba = {w ∈Wba : w is adjacent to a vertex in Wab}.
The five sets are schematically shown in Figure 4.1.
Figure 4.1: Fundamental sets in a partial cube.
Remark 4.1. In the case of a partial cube G = (V,E), the semicubes Wab and
Wba are complementary half-spaces in the metric space V (cf. Remark 3.2).
Then the set Fab can be regarded as a ‘hyperplane’ separating these half-spaces
(see [17] where this analogy is formalized in the context of hyperplane arrange-
ments).
The following theorem generalizes the result obtained in [16] for median
graphs (see also [15]).
Theorem 4.1. Let ab be an edge of a connected bipartite graph G. If the
semicubes Wab and Wba are convex, then the set Fab is a matching and induces
an isomorphism between the graphs 〈Uab〉 and 〈Uba〉.
Proof. Suppose that Fab is not a matching. Then there are distinct edges xu
and xv with, say, x ∈ Uab and u, v ∈ Uba. By the triangle inequality, d(u, v) ≤ 2.
Since G does not have triangles, d(u, v) 6= 1. Hence, d(u, v) = 2, which implies
that x lies between u and v. This contradicts convexity of Wba, since x ∈ Wab.
Therefore Fab is a matching.
To show that Fab induces an isomorphism, let xy, uv ∈ Fab and xu ∈ E,
where x, u ∈ Uab and y, v ∈ Uba. Since G does not have odd cycles, d(v, y) 6= 2.
By the triangle inequality,
d(v, y) ≤ d(v, u) + d(u, x) + d(x, y) = 3.
Since Wba is convex, d(v, y) 6= 3. Thus d(v, y) = 1, that is, vy is an edge. The
result follows by symmetry.
By Theorem 3.4(ii), we have the following corollary.
Corollary 4.1. Let G = (V,E) be a partial cube. For any edge ab the set Fab
is a matching and induces an isomorphism between induced graphs 〈Uab〉 and
〈Uba〉.
Figure 4.2: Graph G.
Example 4.1. Let G be the graph depicted in Figure 4.2. The set
Fab = {ab, xu, yv}
is a matching and defines an isomorphism between the graphs induced by subsets
Uab = {a, x, y} and Uba = {b, u, v}. The set Wba is not convex, so G is not a
partial cube. Thus the converse of Corollary 4.1 does not hold.
We now establish another characterization of partial cubes that utilizes a
geometric property of families Fab.
Theorem 4.2. For a connected graph G the following statements are equivalent:
(i) G is a partial cube.
(ii) G is bipartite and
d(x, u) = d(y, v) and d(x, v) = d(y, u), (4.1)
for any ab ∈ E and xy, uv ∈ Fab.
Proof. (i)⇒(ii). We may assume that x, u ∈ Wab and y, v ∈ Wba. Since θ is an
equivalence relation, we have xy θ uv θab. By Lemma 3.4, Wuv = Wxy = Wab.
By Lemma 3.1,
d(x, u) = d(x, v) − 1 = d(v, y) + 1− 1 = d(y, v).
We also have
d(x, v) = d(y, v) + 1 = d(y, u),
by the same lemma.
(ii)⇒(i). Suppose that G is not a partial cube. Then, by Theorem 3.4, there
exist an edge ab such that, say, semicube Wba is not convex. Let p and q be two
vertices in Wba such that there is a shortest path P from p to q that intersects
Wab. Let uv be the first edge in P which belongs to Fab and xy be the last edge
in P with the same property (see Figure 4.3).
Figure 4.3: An illustration to the proof of theorem 4.2.
Since P is a shortest path, we have
d(v, y) = d(v, u) + d(u, x) + d(x, y) 6= d(x, u),
which contradicts condition (4.1). Thus all semicubes of G are convex. By
Theorem 3.4, G is a partial cube.
Remark 4.2. One can say that four vertices satisfying conditions (4.1) define
a rectangle in G. Then Theorem 4.2 states that a connected graph is a partial
cube if and only if it is bipartite and for any edge ab pairs of edges in Fab define
rectangles in G.
5 Dimensions of partial cubes
There are many different ways in which a given partial cube can be isometrically
embedded into a hypercube. For instance, the graph K2 can be isometrically
embedded in different ways into any hypercube H(X) with |X | > 2.
Following Djoković [9] (see also [8]), we define the isometric dimension,
dimI(G), of a partial cube G as the minimum possible dimension of a hypercube
H(X) in which G is isometrically embeddable. Recall (see Section 2) that the
dimension of H(X) is the cardinality of the set X .
Theorem 5.1. (Theorem 2 in [9].) Let G = (V,E) be a partial cube. Then
dimI(G) = |E/θ|, (5.1)
where θ is Djoković’s equivalence relation on E and E/θ is the set of its equiv-
alence classes (the quotient-set).
The quotient-set E/θ can be identified with the family of all distinct sets Fab
(see Section 4). If G is a finite partial cube, we may consider it as an isometric
subgraph of some hypercube Qn. Then the edges in each family Fab are parallel
edges in Qn (cf. Theorem 4.2). This observation essentially proves (5.1) in the
finite case.
Let G be a partial cube on a set X . The vertex set of G is a wg-family F of
finite subsets of X (see Section 2). We define the retraction of F as a family F′
of subsets of X ′ = ∪F \ ∩F consisting of the intersections of sets in F with X ′.
It is clear that F′ satisfies conditions
∩ F′ = ∅ and ∪ F′ = X ′. (5.2)
Proposition 5.1. The partial cubes induced by a wg-family F and its retraction
F′ are isomorphic.
Proof. It suffices to prove that metric spaces F and F′ are isometric. Clearly,
α : P 7→ P ∩X ′ is a mapping from F onto F′. For P,Q ∈ F, we have
(P ∩X ′)∆(Q ∩X ′) = (P∆Q) ∩X ′ = (P∆Q) ∩ (∪F \ ∩F) = P∆Q.
Thus, d(α(P ), α(Q)) = d(P,Q). Consequently, α is an isometry.
Let G be a partial cube on some set X induced by a wg-family F satisfying
conditions (5.2), and let PQ be an edge of G. By definition, there is x ∈ X such
that P∆Q = {x}. The following two lemmas are instrumental.
Lemma 5.1. Let PQ be an edge of a partial cube G on X and let P∆Q = {x}.
The two sets
{R ∈ F : x ∈ R} and {R ∈ F : x /∈ R}
form the same bipartition of the family F as semicubes WPQ and WQP .
Proof. We may assume that Q = P + {x}. Then, for any R ∈ F,
R∆Q = R∆(P + {x}) =
(R∆P ) + {x}, if x ∈ R,
R∆P, if x /∈ R.
Hence, |R∆P | < |R∆Q| if and only if x ∈ R. It follows that
WPQ = {R ∈ F : x ∈ R}.
A similar argument shows that WQP = {R ∈ F : x /∈ R}.
Lemma 5.2. If F is a wg-family of sets satisfying conditions (5.2), then for
any x ∈ X there are sets P,Q ∈ F such that P∆Q = {x}.
Proof. By conditions 5.2, for a given x ∈ X there are sets S and T in F such
that x ∈ S and x /∈ T . Let R0 = S,R1, . . . , Rn = T be a sequence of sets in
F satisfying conditions (2.2). It is clear that there is i such that x ∈ Ri and
x /∈ Ri+1. Hence, Ri∆Ri+1 = {x}, so we can choose P = Ri and Q = Ri+1.
By Lemmas 5.1 and 5.2, there is one-to-one correspondence between the set
X and the quotient-set E/θ. From Theorem 5.1 we obtain the following result.
Theorem 5.2. Let F be a wg-family of finite subsets of a set X such that
∩F = ∅ and ∪F = X, and let G be a partial cube on X induced by F. Then
dimI(G) = |X |.
Clearly, a graph which is isometrically embeddable into a partial cube is a
partial cube itself. We will show in Section 6 (Corollary 6.1) that the integer
lattice Zn is a partial cube. Thus a graph which is isometrically embeddable
into an integer lattice is a partial cube. It follows that a finite graph is a partial
cube if and only if it is embeddable in some integer lattice. Examples of infinite
partial cubes isometrically embeddable into a finite dimensional integer lattice
are found in [17].
We call the minimum possible dimension n of an integer lattice Zn, in which
a given graph G is isometrically embeddable, its lattice dimension and denote
it dimZ(G). The lattice dimension of a partial cube can be expressed in terms
of maximum matchings in so-called semicube graphs [11].
Definition 5.1. The semicube graph Sc(G) has all semicubes in G as the set
of its vertices. Two vertices Wab and Wcd are connected in Sc(G) if
Wab ∪Wcd = V and Wab ∩Wcd 6= ∅. (5.3)
If G is a partial cube, then condition (5.3) is equivalent to each of the two
equivalent conditions:
Wba ⊂Wcd ⇔ Wdc ⊂Wab, (5.4)
where ⊂ stands for the proper inclusion.
Theorem 5.3. (Theorem 1 in [11].) Let G be a finite partial cube. Then
dimZ(G) = dimI(G) − |M |,
where M is a maximum matching in the semicube graph Sc(G).
Example 5.1. Let G be the graph shown in Figure 2.1. It is easy to see that
dimI(G) = 3 and dimZ(G) = 2.
Example 5.2. Let T be a tree with n edges and m leaves. Then
dimI(T ) = n and dimZ(T ) = ⌈m/2⌉
(cf. [8] and [14], respectively).
Example 5.3. For the cycle C6 we have (see Figure 8.2)
dimI(C6) = dimZ(C6) = 3.
6 Subcubes and Cartesian products
Let G be a partial cube. We say that G′ is a subcube of G if it is an isometric
subgraph of G.
Clearly, a subcube is itself a partial cube. The converse does not hold; a
subgraph of a graph G can be a partial cube but not an isometric subgraph of
G (cf. Example 2.1).
If G′ is a subcube of a partial cube G, then dimI(G
′) ≤ dimI(G) and
dimZ(G
′) ≤ dimZ(G). In general, the two inequalities are not strict. For
instance, the cycle C6 is an isometric subgraph of the cube Q3 (see Figure 8.2)
dimI(C6) = dimZ(C6) = dimI(Q3) = dimZ(Q3) = 3.
Semicubes of a partial cube are examples of subcubes. Indeed, by Theo-
rem 3.4, semicubes are convex subgraphs and therefore isometric. In general,
the converse is not true; a path connecting two opposite vertices in C6 is an
isometric subgraph but not a convex one.
Another common way of constructing new partial cubes from old ones is by
forming their Cartesian products (see [15] for details and proofs).
Definition 6.1. Given two graphs G1 = (V1, E1) and G2 = (V2, E2), their
Cartesian product
G = G1�G2
has vertex set V = V1 × V2; a vertex u = (u1, u2) is adjacent to a vertex
v = (v1, v2) if and only if u1v1 ∈ E1 and u2 = v2, or u1 = v1 and u2v2 ∈ E2.
The operation � is associative, so we can write
G = G1� · · ·�Gn =
for the Cartesian product of graphs G1, . . . , Gn. A Cartesian product
i=1Gi
is connected if and only if the factors are connected. Then we have
dG(u, v) =
dGi(ui, vi). (6.1)
Example 6.1. Let {Xi}
i=1 be a family of sets and Y =
i=1 be their sum.
Then the Cartesian product of the hypercubes H(Xi) is isomorphic to the hy-
percube H(Y ). The isomorphism is established by the mapping
f : (P1, . . . , Pn) 7→
Formula (6.1) yields immediately the following results.
Proposition 6.1. Let Hi be isometric subgraphs of graphs Gi for all 1 ≤ i ≤ n.
Then the Cartesian product
i=1Hi is an isometric subgraph of the Cartesian
product
i=1Gi.
Corollary 6.1. The Cartesian product of a finite family of partial cubes is a
partial cube. In particular, the integer lattice Zn (cf. Examples 2.2 and 2.3) is
a partial cube.
The results of the next two theorems can be easily extended to arbitrary
finite products of finite partial cubes.
Theorem 6.1. Let G = G1�G2 be the Cartesian product of two finite partial
cubes. Then
dimI(G) = dimI(G1) + dimI(G2).
Proof. We may assume that G1 (resp. G2) is induced by a wg-family F1 (resp.
F2) of subsets of a finite set X1 (resp. X2) such that ∩F1 = ∅ and ∪F1 = X1
(resp. ∩F2 = ∅ and ∪F2 = X1) (see Section 5). By Theorem 5.2,
dimI(G1) = |X1| and dimI(G2) = |X2|.
It is clear that the graph G is induced by the wg-family F = F1 +F2 of subsets
of the set X = X1 + X2 (cf. Example 6.1) with ∩F = ∅, ∪F = X . By
Theorem 5.2,
dimI(G) = |X | = |X1|+ |X2| = dimI(G1) + dimI(G2).
Theorem 6.2. Let G = (V,E) be the Cartesian product of two finite partial
cubes G1 = (V1, E1) and G2 = (V2, E2). Then
dimZ(G) = dimZ(G1) + dimZ(G2).
Proof. Let W(a,b)(c,d) be a semicube of the graph G. There are two possible
cases:
(i) c = a, bd ∈ E2. Let (x, y) be a vertex of G. Then, by (6.1),
dG((x, y), (a, b)) = dG1(x, a) + dG2(y, b)
dG((x, y), (c, d)) = dG1(x, c) + dG2(y, d).
Hence,
dG((x, y), (a, b)) < dG((x, y), (c, d)) ⇔ dG2(y, b) < dG2(y, d).
It follows that
W(a,b)(c,d) = V1 ×Wbd. (6.2)
(ii) d = b, ac ∈ E1. Like in (i), we have
W(a,b)(c,d) =Wac × V2. (6.3)
Clearly, two semicubes given by (6.2) form an edge in the semicube graph
Sc(G) if and only if their second factors form an edge in the semicube graph
Sc(G2). The same is true for semicubes in the form (6.3) with respect to their
first factors. It is also clear that semicubes in the form (6.2) and in the form (6.3)
are not connected by an edge in Sc(G). Therefore the semicube graph Sc(G) is
isomorphic to the disjoint union of semicube graphs Sc(G1) and Sc(G2). If M1
is a maximum matching in Sc(G1) and M2 is a maximum matching in Sc(G2),
then M =M1 ∪M2 is a maximum matching in Sc(G). The result follows from
theorems 5.3 and 6.1.
Remark 6.1. The result of Corollary 6.1 does not hold for infinite Cartesian
products of partial cubes, as these products are disconnected. On the other
hand, it can be shown that arbitrary weak Cartesian products (connected com-
ponents of Cartesian products [15]) of partial cubes are partial cubes.
7 Pasting partial cubes
In this section we use the set pasting technique [5, ch.I, §2.5] to build new partial
cubes from old ones.
Let G1 = (V1, E1) and G2 = (V2, E2) be two graphs, H1 = (U1, F1) and
H2 = (U2, F2) be two isomorphic subgraphs of G1 and G2, respectively, and
ψ : U1 → U2 be a bijection defining an isomorphism between H1 and H2. The
bijection ψ defines an equivalence relation R on the sum V1+V2 as follows: any
element in (V1 \U1)∪ (V2 \U2) is equivalent to itself only and elements u1 ∈ U1
and u2 ∈ U2 are equivalent if and only if u2 = ψ(u1). We say that the quotient
set V = (V1 + V2)/R is obtained by pasting together the sets V1 and V2 along
the subsets U1 and U2. Since the graphs H1 and H2 are isomorphic, the pasting
of the sets V1 and V2 can be naturally extended to a pasting of sets of edges
E1 and E2 resulting in the set E of edges joining vertices in V . We say that
the graph G = (E, V ) is obtained by pasting together the graphs G1 and G2
along the isomorphic subgraphs H1 and H2. The pasting construction allows
for identifying in a natural way the graphs G1 and G2 with subgraphs of G, and
the isomorphic graphs H1 and H2 with a common subgraph H of both graphs
G1 and G2. We often follow this convention below.
Remark 7.1. Note that in the above construction the resulting graph G de-
pends not only on graphs G1 and G2 and their isomorphic subgraphs H1 and
H2 but also on the bijection ψ defining an isomorphism from H1 onto H2 (see
the drawings in Figures 7.1 and 7.2).
Figure 7.1: Pasting of two trees.
Figure 7.2: Another pasting of the same trees.
In general, pasting of two partial cubes G1 and G2 along two isomorphic
subgraphs H1 and H2 does not produce a partial cube even under strong as-
sumptions about these subgraphs as the next example illustrates.
Figure 7.3: Pasting partial cubes G1 and G2.
Example 7.1. Pasting of two partial cubes G1 = C6 and G2 = C6 along
subgraphs H1 and H2 is shown in Figure 7.3. The resulting graph G is not a
partial cube. Indeed, the semicubeWab is not a convex set. Note that subgraphs
H1 and H2 are convex subgraphs of the respective partial cubes.
In this section we study two simple pastings of connected graphs together,
the vertex-pasting and the edge-pasting, and show that these pastings produce
partial cubes from partial cubes. We also compute the isometric and lattice
dimensions of the resulting graphs.
Let G1 = (V1, E1) and G2 = (V2, E2) be two connected graphs, a1 ∈ V1,
a2 ∈ V2, and H1 = ({a1},∅), H2 = ({a2},∅). Let G be the graph obtained
by pasting G1 and G2 along subgraphs H1 and H2. In this case we say that
the graph G is obtained from graphs G1 and G2 by vertex-pasting. We also say
that G is obtained from G1 and G2 by identifying vertices a1 and a2. Figure 7.4
illustrates this construction. Note that the vertex a = {a1, a2} is a cut vertex
of G, since G1 ∪ G2 = G and G1 ∩ G2 = {a}. (We follow our convention and
identify graphs G1 and G2 with subgraphs of G.)
Figure 7.4: An example of vertex-pasting.
In what follows we use superscripts to distinguish subgraphs of the graphs
G1 and G2. For instance, W
stands for the semicube of G2 defined by two
adjacent vertices a, b ∈ V2.
Theorem 7.1. A graph G = (V,E) obtained by vertex-pasting from partial
cubes G1 = (V1, E1) and G2 = (V2, E2) is a partial cube.
Proof. We denote a = {a1, a2} the vertex of G obtained by identifying vertices
a1 ∈ V1 and a2 ∈ V2. Clearly, G is a bipartite graph. Let xy be an edge of G.
Without loss of generality we may assume that xy ∈ E1 and a ∈ Wxy. Note
that any path between vertices in V1 and V2 must go through a. Since a ∈Wxy,
we have, for any v ∈ V2,
d(v, x) = d(v, a) + d(a, x) < d(v, a) + d(a, y) = d(v, y),
which implies V2 ⊆ Wxy and Wyx ⊆ V1. It follows that Wxy = W
xy ∪ V2 and
Wyx = W
yx . The sets W
xy , W
yx and V2 are convex subsets of V . Since
xy ∩ V2 = {a}, the set Wxy = W
xy ∪ V2 is also convex. By Theorem 3.4(ii),
the graph G is a partial cube.
The vertex-pasting construction introduced above can be generalized as
follows. Let G = {Gi = (Vi, Ei)}i∈J be a family of connected graphs and
A = {ai ∈ Gi}i∈J be a family of distinguished vertices of these graphs. Let G
be the graph obtained from the graphs Gi by identifying vertices in the set A.
We say that G is obtained by vertex-pasting together the graphs Gi (along the
set A).
Example 7.2. Let J = {1, . . . , n} with n ≥ 2,
G = {Gi = ({ai, bi}, {aibi})}i∈J , and A = {ai}i∈J .
Clearly, each Gi is K2. By vertex-pasting these graphs along A, we obtain the
n-star graph K1,n.
Since the star K1,n is a tree it can be also obtained from K1 by successive
vertex-pasting as in Example 7.3.
Example 7.3. Let G1 be a tree and G2 = K2. By vertex-pasting these graphs
we obtain a new tree. Conversely, let G be a tree and v be its leaf. Let G1 be
a tree obtained from G by deleting the leaf v. Clearly, G can be obtained by
vertex-pasting G1 and K2. It follows that any tree can obtained from the graph
K1 by successive vertex-pasting of copies of K2 (cf. Theorem 2.3(e) in [12]).
Any connected graph G can be constructed by successive vertex-pasting of
its blocks using its block cut-vertex tree [4] structure. Let G1 be an endblock of
G with a cut vertex v and G2 be the union of the remaining blocks of G. Then
G can be obtained from G1 and G2 by vertex-pasting along the vertex v. It
follows that any connected graph can be obtained from its blocks by successive
vertex-pastings.
Let G = (V,E) be a partial cube. We recall that the isometric dimension
dimI(G) of G is the cardinality of the quotient set E/θ, where θ is Djoković’s
equivalence relation on the set E (cf. formula (5.1)).
Theorem 7.2. Let G = (V,E) be a partial cube obtained by vertex-pasting
together partial cubes G1 = (V1, E1) and G2 = (V2, E2). Then
dimI(G) = dimI(G1) + dimI(G2).
Proof. It suffices to prove that there are no edges xy ∈ E1 and uv ∈ E2 which
are in Djoković’s relation θ with each other. Suppose that G1 and G2 are
vertex-pasted along vertices a1 ∈ E1 and a2 ∈ E2 and let a = {a1, a2} ∈ E. Let
xy ∈ E1 and uv ∈ E2 be two edges in E. We may assume that u ∈ Wxy. Since
a is a cut-vertex of G and u ∈Wxy, we have
d(u, a) + d(a, x) = d(u, x) < d(u, y) = d(u, a) + d(a, y).
Hence, d(a, x) < d(a, y), which implies
d(v, x) = d(v, a) + d(a, x) < d(v, a) + d(a, y) = d(v, y).
It follows that v ∈ Wxy. Therefore the edge xy does not stand in the relation θ
to the vertex uv.
The next result follows immediately from the previous theorem. Note that
blocks of a partial cube are partial cubes themselves.
Corollary 7.1. Let G be a partial cube and {G1, . . . , Gn} be the family of its
blocks. Then
dimI(G) =
dimI(Gi).
In the case of the lattice dimension of a partial cube we can claim only much
weaker result than one stated in Theorem 7.2 for the isometric dimension. We
omit the proof.
Theorem 7.3. Let G be a partial cube obtained by vertex-pasting together partial
cubes G1 and G2. Then
max{dimZ(G1), dimZ(G2)} ≤ dimZ(G) ≤ dimZ(G1) + dimZ(G2).
The following example illustrate possible cases for inequalities in Theo-
rem 7.3. Let us recall that the lattice dimension of a tree with m leaves is
⌈m/2⌉ (cf. [14]).
Example 7.4. The star K1,6 can be obtained from the stars K1,2 and K1,4 by
vertex-pasting these two stars along their centers. Clearly,
max{dimZ(K1,2), dimZ(K1,4)} < dimZ(K1,6) = dimZ(K1,2) + dimZ(K1,4).
The same star K1,6 is obtained from two copies of the star K1,3 by vertex-
pasting along their centers. We have dimZ(K1,3) = 2, dimZ(K1,6) = 3, so
max{dimZ(K1,3), dimZ(K1,3)} < dimZ(K1,6) < dimZ(K1,3) + dimZ(K1,3).
Let us vertex-paste two stars K1,3 along their two leaves. The resulting
graph T is a tree with four vertices. Therefore,
max{dimZ(K1,3), dimZ(K1,3)} = dimZ(T ) < dimZ(K1,3) + dimZ(K1,3).
We now consider another simple way of pasting two graphs together.
Let G1 = (V1, E1) and G2 = (V2, E2) be two connected graphs, a1b1 ∈ E1,
a2b2 ∈ E2, and H1 = ({a1, b1}, {a1b1}), H2 = ({a2, b2}, {a2b2}). Let G be the
graph obtained by pasting G1 and G2 along subgraphs H1 and H2. In this case
we say that the graph G is obtained from graphs G1 and G2 by edge-pasting.
Figures 7.1, 7.2, and 7.5 illustrate this construction.
Figure 7.5: An example of edge-pasting.
As before, we identify the graphs G1 and G2 with subgraphs of the graph
G and denote a = {a1, a2}, b = {b1, b2} the two vertices obtained by pasting
together vertices a1 and a2 and, respectively, b1 and b2. The edge ab ∈ E is
obtained by pasting together edges a1b1 ∈ E1 and a2b2 ∈ E2 (cf. Figure 7.5).
Then G = G1∪G2, V1∩V2 = {a, b} and E1∩E2 = {ab}. We use these notations
in the rest of this section.
Proposition 7.1. A graph G obtained by edge-pasting together bipartite graphs
G1 and G2 is bipartite.
Proof. Let C be a cycle in G. If C ⊆ G1 or C ⊆ G2, then the length of C is
even, since the graphs G1 and G2 are bipartite. Otherwise, the vertices a and
b separate C into two paths each of odd length. Therefore C is a cycle of even
length. The result follows.
The following lemma is instrumental; it describes the semicubes of the graph
G in terms of semicubes of graphs G1 and G2.
Lemma 7.1. Let uv be an edge of G. Then
(i) For uv ∈ E1, a, b ∈ Wuv ⇒ Wuv =W
uv ∪ V2, Wvu =W
(ii) For uv ∈ E2, a, b ∈ Wuv ⇒ Wuv =W
uv ∪ V1, Wvu =W
(iii) a ∈ Wuv, b ∈Wvu ⇒ Wuv =Wab.
Figure 7.6: Edge-pasting of graphs G1 and G2.
Proof. We prove parts (i) and (iii) (see Figure 7.6).
(i) Since any path from w ∈ V2 to u or v contains a or b and a, b ∈Wuv, we
have w ∈Wuv. Hence, Wuv =W
uv ∪ V2 and Wvu =W
(iii) Since ab θ uv in G1, we have W
uv = W
, by Theorem 3.4(iv). Let w
be a vertex in W
uv . Then, by the triangle inequality,
d(w, u) < d(w, v) ≤ d(w, b) + d(b, v) < d(w, b) + d(b, u).
Since any shortest path from w to u contains a or b, we have
d(w, a) + d(a, u) = d(w, u).
Therefore,
d(w, a) + d(a, u) < d(w, b) + d(b, u).
Since ab θ uv in G1, we have d(a, u) = d(b, v), by Theorem 4.2. It follows that
d(w, a) < d(w, b), that is, w ∈ W
. We proved that W
uv ⊆ W
symmetry, W
vu ⊆ W
. Since two opposite semicubes form a partition of V2,
we have W
uv =W
. The result follows.
Theorem 7.4. A graph G obtained by edge-pasting together partial cubes G1
and G2 is a partial cube.
Proof. By Theorem 3.4(ii) and Proposition 7.1, we need to show that for any
edge uv of G the semicube Wuv is a convex subset of V . There are two possible
cases.
(i) uv = ab. The semicube Wab is the union of semicubes W
and W
which are convex subsets of V1 and V2, respectively. It is clear that any shortest
path connecting a vertex in W
with a vertex in W
contains vertex a and
therefore is contained in Wab. Hence, Wab is a convex set. A similar argument
proves that the set Wba is convex.
(ii) uv 6= ab. We may assume that uv ∈ E1. To prove that the semicube
Wuv is a convex set, we consider two cases.
(a) a, b ∈ Wuv. (The case when a, b ∈ Wvu is treated similarly.) By
Lemma 7.1(i), the semicube Wuv is the union of the semicube W
uv and the
set V2 which are both convex sets. Any shortest path P from a vertex in V2 to
a vertex in W
uv contains either a or b. It follows that P ⊆ W
uv ∪ V2 = Wuv.
Therefore the semicube Wuv is convex.
(b) a ∈ Wuv, b ∈ Wvu. (The case when b ∈ Wuv , a ∈ Wvu is treated
similarly.) By Lemma 7.1(ii), Wuv = Wab. The result follows from part (i) of
the proof.
Theorem 7.5. Let G be a graph obtained by edge-pasting together finite partial
cubes G1 and G2. Then
dimI(G) = dimI(G1) + dimI(G2)− 1.
Proof. Let θ, θ1, and θ2 be Djoković’s relations on E, E1, and E2, respectively.
By Lemma 7.1, for uv, xy ∈ E1 (resp. uv, xy ∈ E2) we have
uv θ xy ⇔ uv θ1xy (resp. uv θ xy ⇔ uv θ2xy).
Let uv ∈ E1, xy ∈ E2, and uv θ xy. Suppose that (uv, ab) /∈ θ. We may
assume that a, b ∈ Wuv . By Lemma 7.1(i), V2 ⊂ Wuv, a contradiction, since
xy ∈ E2. Hence, uv θ xy θ ab. It follows that each equivalence class of the
relation θ is either an equivalence class of θ1, an equivalence class of θ2 or the
class containing the edge ab. Therefore
|E/θ| = |E1/θ1|+ |E2/θ2| − 1.
The result follows, since the isometric dimension of a partial cube is equal to the
cardinality of the set of equivalence classes of Djoković’s relation (formula (5.1)).
We need some results about semicube graphs in order to prove an analog of
Theorem 7.3 for a partial cube obtained by edge-pasting of two partial cubes.
Lemma 7.2. Let G be a partial cube and WpqWuv , WqpWxy be two edges in
the graph Sc(G). Then WxyWuv is an edge in Sc(G).
Proof. By condition (5.4), Wqp ⊂ Wuv and Wyx ⊂ Wqp. Hence, Wyx ⊂ Wuv.
By the same condition, WxyWuv ∈ Sc(G).
As before, we identify partial cubes G1 and G2 with subgraphs of the partial
cube G. Then G1 ∪G2 = G and G1 ∩G2 = ({a, b}, {ab}) = K2 (cf. Figure 7.6).
Lemma 7.3. Let G be a partial cube obtained by edge-pasting together partial
cubes G1 and G2. Let W
xy (resp. W
xy ) be an edge in the semicube
Sc(G1) (resp. Sc(G2)). Then WuvWxy is an edge in Sc(G).
Figure 7.7: Semicubes forming an edge in Sc(G1).
Proof. It suffices to consider the case of Sc(G1) (see Figure 7.7). By condi-
tion (5.4),W
vu ⊂W
xy andW
yx ⊂W
uv . Suppose that a ∈ W
vu and b ∈W
(the case when b ∈ W
vu and a ∈ W
yx is treated similarly). Then ab θ1xy and
ab θ1uv. By transitivity of θ1, we have uv θ1xy, a contradiction, since semicubes
uv and W
xy are distinct. Therefore we may assume that, say, a, b ∈ W
Then, by Lemma 7.1, Wvu = W
vu ⊂ V1. Since W
vu ⊂ W
xy ⊆ Wxy, we have
Wvu ⊂Wxy. By condition (5.4), WuvWxy is an edge in Sc(G).
Lemma 7.4. LetM1 andM2 be matchings in graphs Sc(G1) and Sc(G2). There
is a matching M in Sc(G) such that
|M | ≥ |M1|+ |M2| − 1.
Proof. By Lemma 7.3, M1 and M2 induce matchings in Sc(G) which we denote
by the same symbols. The intersection M1 ∩M2 is either empty or a subgraph
of the empty graph with vertices Wab and Wba.
If M1 ∩M2 is empty, then M = M1 ∪M2 is a matching in Sc(G) and the
result follows.
If M1 ∩M2 is an empty graph with a single vertex, say, in M1, we remove
fromM1 the edge that has this vertex as its end vertex, resulting in the matching
M ′1. Clearly, M =M
1 ∪M2 is a matching in Sc(G) and |M | = |M1|+ |M2| − 1.
Suppose now that M1 ∩M2 is the empty graph with vertices Wab and Wba.
Let WabWuv, WbaWpq (resp. WabWxy, WbaWrs) be edges in M1 (resp. M2).
By Lemma 7.2, WxyWrs is an edge in Sc(G2). Let us replace edgesWabWxy and
WbaWrs in M2 by a single edge WxyWrs, resulting in the matching M
2. Then
M =M1 ∪M
2 is a matching in Sc(G) and |M | = |M1|+ |M2| − 1.
Corollary 7.2. Let M1 and M2 be maximum matchings in Sc(G1) and Sc(G2),
respectively, and M be a maximum matching in Sc(G). Then
|M | ≥ |M1|+ |M2| − 1. (7.1)
By Theorem 5.3, we have
dimI(G1) = dimZ(G1) + |M1|, dimI(G2) = dimZ(G2) + |M2|,
dimI(G) = dimZ(G) + |M |,
where M1 and M2 are maximum matchings in Sc(G1) and Sc(G2), respectively,
and M is a maximum matching in Sc(G). Therefore, by Theorem 7.5 and (7.1),
we have the following result (cf. Theorem 7.3).
Theorem 7.6. Let G be a partial cube obtained by edge-pasting from partial
cubes G1 and G2. Then
max{dimZ(G1), dimZ(G2)} ≤ dimZ(G) ≤ dimZ(G1) + dimZ(G2).
Example 7.5. Let us consider two edge-pastings of the stars G1 = K1,3 and
G2 = K1,3 of lattice dimension 2 shown in figures 7.1 and 7.2. In the first case
the resulting graph is the star G = K1,5 of lattice dimension 3. Then we have
max{dimZ(G1), dimZ(G2)} < dimZ(G) < dimZ(G1) + dimZ(G2).
In the second case the resulting graph is a tree with 4 leaves. Therefore,
max{dimZ(G1), dimZ(G2)} = dimZ(G) < dimZ(G1) + dimZ(G2).
Let c1a1 and c2a2 be edges of stars G1 = K1,4 and G2 = K1,4 (each of
which has lattice dimension 2), where c1 and c2 are centers of the respective
stars. Let us edge-paste these two graphs by identifying c1 with c2 and a1 with
a2, respectively. The resulting graph G is the star K1,7 of lattice dimension 4.
Thus,
max{dimZ(G1), dimZ(G2)} ≤ dimZ(G) = dimZ(G1) + dimZ(G2).
8 Expansions and contractions of partial cubes
The graph expansion procedure was introduced by Mulder in [16], where it is
shown that a graph is a median graph if and only if it can be obtained from
K1 by a sequence of convex expansions (see also [15]). A similar result for
partial cubes was established in [6] (see also [7]) as a corollary to a more general
result concerning isometric embeddability into Hamming graphs; it was also
established in [13] in the framework of oriented matroids theory.
In this section we investigate properties of (isometric) expansion and con-
traction operations and, in particular, prove in two different ways that a graph is
a partial cube if and only if it can be obtained from the graph K1 by a sequence
of expansions.
A remark about notations is in order. In the product {1, 2} × (V1 ∪ V2), we
denote V ′i = {i} × Vi and x
i = (i, x) for x ∈ Vi, where i, j = 1, 2.
Definition 8.1. Let G = (V,E) be a connected graph, and let G1 = (V1, E1)
and G2 = (V2, E2) be two isometric subgraphs of G such that G = G1 ∪ G2.
The expansion of G with respect to G1 and G2 is the graph G
′ = (V ′, E′)
constructed as follows from G (see Figure 8.1):
(i) V ′ = V1 + V2 = V
1 ∪ V
(ii) E′ = E1 + E2 +M , where M is the matching
x∈V1∩V2
{x1x2}.
In this case, we also say that G is a contraction of G′.
Figure 8.1: Expansion/contraction processes.
It is clear that the graphs G1 and 〈V
1〉 are isomorphic, as well as the graphs
G2 and 〈V
We define a projection p : V ′ → V by p(xi) = x for x ∈ V . Clearly, the
restriction of p to V ′1 is a bijection p1 : V
1 → V1 and its restriction to V
2 is a
bijection p2 : V
2 → V2. These bijections define isomorphisms 〈V
1〉 → G1 and
〈V ′2〉 → G2.
Let P ′ be a path in G′. The vertices of G obtained from the vertices in P ′
under the projection p define a walk P in G; we call this walk P the projection
of the path P ′. It is clear that
ℓ(P ) = ℓ(P ′), if P ′ ⊆ 〈V ′1〉 or P
′ ⊆ 〈V ′2〉. (8.1)
In this case, P is a path in G and either P = p1(P
′) or P = p2(P
′). On the
other hand,
ℓ(P ) < ℓ(P ′), if P ′ ∩ 〈V ′1〉 6= ∅ and P
′ ∩ 〈V ′2 〉 6= ∅, (8.2)
and P is not necessarily a path.
We will frequently use the results of the following lemma in this section.
Lemma 8.1. (i) For u1, v1 ∈ V ′1 , any shortest path Pu1v1 in G
′ belongs to 〈V ′1 〉
and its projection Puv = p1(Pu1v1) is a shortest path in G. Accordingly,
dG′(u
1, v1) = dG(u, v)
and 〈V ′1〉 is a convex subgraph of G
′. A similar statement holds for u2, v2 ∈ V ′2 .
(ii) For u1 ∈ V ′1 and v
2 ∈ V ′2 ,
dG′(u
1, v2) = dG(u, v) + 1.
Let Pu1v2 be a shortest path in G
′. There is a unique edge x1x2 ∈M such that
x1, x2 ∈ Pu1v2 and the sections Pu1x1 and Px2v2 of the path Pu1v2 are shortest
paths in 〈V ′1 〉 and 〈V
2 〉, respectively. The projection Puv of Pu1v2 in G
′ is a
shortest path in G.
Proof. (i) Let Pu1v1 be a path in G
′ that intersects V ′2 . Since 〈V1〉 is an isometric
subgraph of G, there is a path Puv in G that belongs to 〈V1〉. Then p
1 (Puv) is
a path in 〈V ′1 〉 of the same length as Puv. By (8.1) and (8.2),
ℓ(p−11 (Puv)) < ℓ(Pu1v1).
Therefore any shortest path Pu1v1 in G
′ belongs to 〈V ′1 〉. The result follows.
(ii) Let Pu1v2 be a shortest path in G
′ and Puv be its projection to V .
By (8.2),
dG′(u
1, v2) = ℓ(Pu1v2) > ℓ(Puv) ≥ dG(u, v).
Since there is no edge of G joining vertices in V1 \ V2 and V2 \ V1, a shortest
path in G from u to v must contain a vertex x ∈ V1 ∩ V2. Since G1 and G2
are isometric subgraphs, there are shortest paths Pux in G1 and Pxv in G2 such
that their union is a shortest path from u to v. Then, by the triangle inequality
and part (i) of the proof, we have (cf. Figure 8.1)
dG′(u
1, v2) ≤ dG′(u
1, x1) + dG′(x
1, x2) + dG′(x
2, v2) = dG(u, v) + 1.
The last two displayed formulas imply dG′(u
1, v2) = dG(u, v) + 1.
Since u1 ∈ V ′1 and v
2 ∈ V ′2 the path Pu1v2 must contain an edge, say x
1x2, in
M . Since this path is a shortest path in G′, this edge is unique. Then the sec-
tions Pu1x1 and Px2v2 of Pu1v2 are shortest paths in 〈V
1 〉 and 〈V
2〉, respectively.
Clearly, Puv is a shortest path in G.
Let a1a2 be an edge in the matchingM = ∪x∈V1∩V2{x
1x2}. This edge defines
five fundamental sets (cf. Section 4): the semicubes Wa1a2 and Wa2a1 , the sets
of vertices Ua1a2 and Ua2a1 , and the set of edges Fa1a2 . The next theorem
follows immediately from Lemma 8.1. It gives a hint to a connection between
the expansion process and partial cubes.
Theorem 8.1. Let G′ be an expansion of a connected graph G and notations
are chosen as above. Then
(i) Wa1a2 = V
1 and Wa2a1 = V
2 are convex semicubes of G
(ii) Fa1a2 =M defines an isomorphism between induced subgraphs 〈Ua1a2〉 and
〈Ua2a1〉, which are isomorphic to the subgraph G1 ∩G2.
The result of Theorem 8.1 justifies the following constructive definition of
the contraction process.
Definition 8.2. Let ab be an edge of a connected graph G′ = (V ′, E′) such
(i) semicubes Wab and Wba are convex and form a partition of V
(ii) the set Fab is a matching and defines an isomorphism between subgraphs
〈Uab〉 and 〈Uba〉.
A graph G obtained from the graphs 〈Wab〉 and 〈Wba〉 by pasting them along
subgraphs 〈Uab〉 and 〈Uba〉 is said to be a contraction of the graph G
Remark 8.1. If G′ is bipartite, then semicubesWab andWba form a partition of
its vertex set. Then, by Theorem 4.1, condition (i) implies condition (ii). Thus
any pair of opposite convex semicubes in a connected bipartite graph defines a
contraction of this graph.
By Theorem 8.1, a graph is a contraction of its expansion. It is not difficult
to see that any connected graph is also an expansion of its contraction.
The following three examples give geometric illustrations for the expansion
and contraction procedures.
Example 8.1. Let a and b be two opposite vertices in the graph G = C4.
Clearly, the two distinct paths P1 and P2 from a to b are isometric subgraphs
of G defining an expansion G′ = C6 of G (see Figure 8.2). Note that P1 and P2
are not convex subsets of V .
Example 8.2. Another isometric expansion of the graph G = C4 is shown
in Figure 8.3. Here, the path P1 is the same as in the previous example and
G2 = G.
Example 8.3. Lemma 8.1 claims, in particular, that the projection of a shortest
path in an extension G′ of a graphG is a shortest path in G. Generally speaking,
Figure 8.2: An expansion of the cycle C4.
Figure 8.3: Another isometric expansion of the cycle C4.
the converse is not true. Consider the graph G shown in Figure 8.4 and two
paths in G:
V1 = abcef and V2 = bde.
The graph G′ in Figure 8.4 is the convex expansion of G with respect to V1 and
V2. The path abdef is a shortest path in G; it is not a projection of a shortest
path in G′.
Figure 8.4: A shortest path which is not a projection of a shortest path.
One can say that, in the case of finite partial cubes, the contraction procedure
is defined by an orthogonal projection of a hypercube onto one of its facets.
By Theorem 8.1, the sets V ′1 and V
2 are opposite semicubes of the graph G
defined by edges in M . Their projections are the sets V1 and V2 which are not
necessarily semicubes of G. For other semicubes in G′ we have the following
result.
Lemma 8.2. For any two adjacent vertices u, v ∈ V ,
Wuivi = p
−1(Wuv) for u, v ∈ Vi and i = 1, 2.
Proof. By Lemma 8.1,
dG′(x
j , ui) < dG′(x
j , vi) ⇔ dG(x, u) < dG(x, v)
for x ∈ V and i, j = 1, 2. The result follows.
Corollary 8.1. If uv is an edge of G1 ∩G2, then Wu1v1 =Wu2v2 .
The following lemma is an immediate consequence of Lemma 8.1. We shall
use it implicitly in our arguments later.
Lemma 8.3. Let u, v ∈ V1 and x ∈ V1 ∩ V2. Then
x1 ∈Wu1v1 ⇔ x
2 ∈Wu1v1 .
The same result holds for semicubes in the form Wu2v2 .
Generally speaking, the projection of a convex subgraph of G′ is not a con-
vex subgraph of G. For instance, the projection of the convex path b2d2e2 in
Figure 8.4 is the path bde which is not a convex subgraph of G. On the other
hand, we have the following result.
Theorem 8.2. Let G′ = (V ′, E′) be an expansion of a graph G = (V,E) with
respect to subgraphs G1 = (V1, E1) and G2 = (V2, E2). The projection of a
convex semicube of G′ different from 〈V ′1〉 and 〈V
2 〉 is a convex semicube of G.
Proof. It suffices to consider the case when Wuv = p(Wu1v1) for u, v ∈ V1 (cf.
Theorem 8.2). Let x, y ∈Wuv and z ∈ V be a vertex such that
dG(x, z) + dG(z, y) = dG(x, y).
We need to show that z ∈Wuv.
Figure 8.5: A shortest path from x to y.
(i) x, y ∈ V1 (the case when x, y ∈ V2 is treated similarly). Suppose that
z ∈ V1. Then x
1, y1, z1 ∈ V ′1 and, by Lemma 8.1,
dG′(x
1, z1) + dG′(z
1, y1) = dG′(z
1, y1).
Since x1, y1 ∈ Wu1v1 and Wu1v1 is convex, z
1 ∈ Wu1v1 . Hence, z ∈Wuv.
Suppose now that z ∈ V2 \ V1. Consider a shortest path Pxy in G from x
to y containing z. This path contains vertices x′, y′ ∈ V1 ∩ V2 such that (see
Figure 8.5)
dG(x, x
′) + dG(x
′, z) = dG(x, z) and dG(y, y
′) + dG(y
′, z) = dG(y, z).
Since Pxy is a shortest path in G, we have
dG(x, x
′) + dG(x
′, y) = dG(x, y), dG(x, y
′) + dG(y
′, y) = dG(x, y),
′, z) + dG(z, y
′) = dG(x
′, y′).
Since x, x′, y ∈ V1, we have x
1, x′1, y1 ∈ V ′1 . Because x
1, y1 ∈ Wu1v1 and Wu1v1
is convex, x′1 ∈ Wu1v1 . Hence, x
′ ∈ Wuv and, similarly, y
′ ∈ Wuv. Since
x′2, y′2, z2 ∈ V ′2 and Wu1v1 is convex, z
2 ∈Wu1v1 . Hence, z ∈Wuv.
(ii) x ∈ V1 \V2 and y ∈ V2 \V1. We may assume that z ∈ V1. By Lemma 8.1,
dG′(x
1, y2) = dG(x, y) + 1 = dG(x, z) + dG(z, y) + 1
= dG′(x
1, z1) + dG′(z
1, y2).
Since x1, y2 ∈ Wu1v1 and Wu1v1 is convex, z
1 ∈ Wu1v1 . Hence, z ∈Wuv.
By using the results of Lemma 8.1, it is not difficult to show that the class
of connected bipartite graphs is closed under the expansion and contraction
operations. The next theorem establishes this result for the class of partial
cubes.
Theorem 8.3. (i) An expansion G′ of a partial cube G is a partial cube.
(ii) A contraction G of a partial cube G′ is a partial cube.
Proof. (i) Let G = (V,E) be a partial cube and G′ = (V ′, E′) be its expansion
with respect to isometric subgraphs G1 = (V1, E1) and G2 = (V2, E2). By
Theorem 3.4(ii), it suffices to show that the semicubes of G′ are convex.
By Lemma 8.1, the semicubes 〈V ′1〉 and 〈V
2〉 are convex, so we consider a
semicube in the formWu1v1 where uv ∈ E1 (the other case is treated similarly).
Let Px′y′ be a shortest path connecting two vertices in Wu1v1 and Pxy be its
projection to G. By Theorem 8.2, x, y ∈ Wuv and, by Lemma 8.1, Pxy is a
shortest path in G. Since Wuv is convex, Pxy belongs to Wuv. Let z
′ be a
vertex in Px′y′ and z = p(z
′) ∈ Pxy. By Lemma 8.1,
dG(z, u) < dG(z, v) ⇒ dG′(z
′, u1) ≤ dG′(z
′, v1).
Since G′ is a bipartite graph, dG′(z
′, u1) < dG′(z
′, v1). Hence, Px′y′ ⊆ Wu1v1 ,
so Wu1v1 is convex.
(ii) Let G = (V,E) be a contraction of a partial cube G′ = (V ′, E′). By
Theorem 3.4, we need to show that the semicubes of G are convex. By The-
orem 8.2, all semicubes of G are projections of semicubes of G′ distinct from
〈V ′1〉 and 〈V
2〉. By Theorem 8.2, the semicubes of G are convex.
Corollary 8.2. (i) A finite connected graph is a partial cube if and only if it
can be obtained from K1 by a sequence of expansions.
(ii) The number of expansions needed to produce a partial cube G from K1
is dimI(G).
Proof. (i) Follows immediately from Theorem 8.3.
(ii) Follows from theorems 8.2 and 5.1 (see the discussion in Section 5 just
before Theorem 5.2 ).
The processes of expansion and contraction admit useful descriptions in the
case of partial cubes on a set. Let G = (V,E) be a partial cube on a set X ,
that is an isometric subgraph of the hypercube H(X). Then it is induced by
some wg-family F of finite subsets of X (cf. Theorem 2.1). We may assume (see
Section 5) that ∩F = ∅ and ∪F = X .
In what follows we present proofs of the results of Theorem 8.3 and Corol-
lary 8.2 given in terms of wg-families of sets.
The expansion process for a partial cube G on X can be described as follows:
Let F1 and F2 be wg-families of finite subsets of X such that F1 ∩ F2 6= ∅,
F1∪F2 = F, and the distance between any two sets P ∈ F1 \F2 and Q ∈ F2 \F1
is greater than one. Note that 〈F1〉 and 〈F2〉 are partial cubes, 〈F1〉∩ 〈F2〉 6= ∅,
and 〈F1〉 ∪ 〈F2〉 = 〈F〉 = G. Let X
′ = X + {p}, where p /∈ X , and
2 = {Q+ {p} : Q ∈ F2}, F
′ = F1 ∪ F
It is quite clear that the graphs 〈F′2〉 and 〈F2〉 are isomorphic and the graph
G′ = 〈F′〉 is an isometric expansion of the graph G.
Theorem 8.4. An expansion of a partial cube is a partial cube.
Proof. We need to verify that F′ is a wg-family of finite subsets of X ′. By
Theorem 2.3, it suffices to show that the distance between any two adjacent
sets in F′ is 1. It is obvious if each of these two sets belong to one of the families
F1 or F
2. Suppose that P ∈ F1 and Q+ {p} ∈ F
2 are adjacent, that is, for any
S ∈ F′ we have
P ∩ (Q+ {p}) ⊆ S ⊆ P ∪ (Q+ {p}) ⇒ S = P or S = Q+ {p}. (8.3)
If Q ∈ F1, then
P ∩ (Q + {p}) ⊆ Q ⊆ P ∪ (Q+ {p}),
since p /∈ P . By (8.3), Q = P implying d(P,Q + {p}) = 1.
If Q ∈ F2 \ F1, there is R ∈ F1 ∩ F2 such that
d(P,R) + d(R,Q) = d(P,Q),
since F is well graded. By Theorem 2.2,
P ∩Q ⊆ R ⊆ P ∪Q,
which implies
P ∩ (Q + {p}) ⊆ R+ {p} ⊆ P ∪ (Q+ {p}).
By (8.3), R + {p} = Q+ {p}, a contradiction.
It is easy to recognize the fundamental sets (cf. Section 4) in an isometric
expansion G′ of a partial cube G = 〈F〉. Let P ∈ F1∩F2 and Q = P +{p} ∈ F
be two vertices defining an edge in G′ according to Definition 8.1(ii). Clearly,
the families F1 and F
2 are the semicubes WPQ and WQP of the graph G
′ (cf.
Lemma 5.1) and therefore are convex subsets of F′. The set FPQ is the set
of edges defined by p as in Lemma 5.1. In addition, UPQ = F1 ∩ F2 and
UQP = {R+ {p} : R ∈ F1 ∩ F2}.
Let G be a partial cube induced by a wg-family F of finite subsets of a set
X . As before, we assume that ∩F = ∅ and ∪F = X . Let PQ be an edge of G.
We may assume that Q = P + {p} for some p /∈ P . Then (see Lemma 5.1)
WPQ = {R ∈ F : p /∈ R} and WQP = {R ∈ F : p ∈ R}.
Let X ′ = X \ {p} and F′ = {R \ {p} : R ∈ F}. It is clear that the graph G′
induced by the family F′ is isomorphic to the contraction of G defined by the
edge PQ. Geometrically, the graph G′ is the orthogonal projection of the graph
G along the edge PQ (cf. figures 8.2 and 8.3).
Theorem 8.5. (i) A contraction G′ of a partial cube G is a partial cube.
(ii) If G is finite, then dimI(G
′) = dimI(G)− 1.
Proof. (i) For p ∈ X we define F1 = {R ∈ F : p /∈ R}, F2 = {R ∈ F : p ∈ R},
and F′2 = {R \ {p} ∈ F : p ∈ R}. Note that F1 and F2 are semicubes of G
and F′2 is isometric to F2. Hence, F1 and F
2 are wg-families of finite subsets
of X ′. We need to prove that F′ = F1 ∪ F
2 is a wg-family. By Theorem 2.3,
it suffices to show that d(P,Q) = 1 for any two adjacent sets P,Q ∈ F′. This
is true if P,Q ∈ F1 or P,Q ∈ F
2, since these two families are well graded. For
P ∈ F1 \ F
2 and Q ∈ F
2 \ F1, the sets P and Q + {p} are not adjacent in F,
since F is well graded and Q /∈ F. Hence there is R ∈ F1 such that
P ∩ (Q+ {p}) ⊆ R ⊆ P ∪ (Q + {p})
and R 6= P . Since p /∈ R, we have
P ∩Q ⊆ R ⊆ P ∪Q.
Since R 6= P and R 6= Q, the sets P and Q are not adjacent in F′. The result
follows.
(ii) If G is a finite partial cube, then, by Theorem 5.2,
dimI(G
′) = |X ′| = |X | − 1 = dimI(G)− 1.
9 Conclusion
The paper focuses on two themes of a rather general mathematical nature.
1. The characterization problem. It is a common practice in mathematics
to characterize a particular class of object in different terms. We present new
characterizations of the classes of bipartite graphs and partial cubes, and give
new proofs for known characterization results.
2. Constructions. The problem of constructing new objects from old ones
is a standard topic in many branches of mathematics. For the class of partial
cubes, we discuss operations of forming the Cartesian product, expansion and
contraction, and pasting. It is shown that the class of partial cubes is closed
under these operations.
Because partial cubes are defined as graphs isometrically embeddable into
hypercubes, the theory of partial cubes has a distinctive geometric flavor. The
three main structures on a graph—semicubes and Djoković’s and Winkler’s
relations—are defined in terms of the metric structure on a graph. One can say
that this theory is a branch of discrete metric geometry. Not surprisingly, geo-
metric structures play an important role in our treatment of the characterization
and construction problems.
References
[1] A.S. Asratian, T.M.J. Denley, and R. Häggkvist, Bipartite Graphs and
their Applications, Cambridge University Press, 1998.
[2] D. Avis, Hypermetric spaces and the Hamming cone, Canadian Journal of
Mathematics 33 (1981) 795–802.
[3] L. Blumenthal, Theory and Applications of Distance Geometry, Oxford
University Press, London, Great Britain, 1953.
[4] J.A. Bondy, Basic graph theory: Paths and circuits, in: R.L. Graham,
M. Grötshel, and L. Lovász (Eds.), Handbook of Combinatorics, The MIT
Press, Cambridge, Massachusetts, 1995, pp. 3–110.
[5] N. Bourbaki, General Topology, Addison-Wesley Publ. Co., 1966.
[6] V. Chepoi, Isometric subgraphs of Hamming graphs and d-convexity, Con-
trol and Cybernetics 24 (1988) 6–11.
[7] V. Chepoi, Separation of two convex sets in convexity structures, Journal
of Geometry 50 (1994) 30–51.
[8] M.M. Deza and M. Laurent, Geometry of Cuts and Metrics, Springer, 1997.
[9] D.Ž. Djoković, Distance preserving subgraphs of hypercubes, J. Combin.
Theory Ser. B 14 (1973) 263–267.
[10] J.-P. Doignon and J.-Cl. Falmagne, Well-graded families of relations, Dis-
crete Math. 173 (1997) 35–44.
[11] D. Eppstein, The lattice dimension of a graph, European J. Combinatorics
26 (2005) 585–592, doi: 10.1016/j.ejc.2004.05.001.
[12] A. Frank, Connectivity and network flows, in: R.L. Graham, M. Grötshel,
and L. Lovász (Eds.), Handbook of Combinatorics, The MIT Press, Cam-
bridge, Massachusetts, 1995, pp. 111–177.
[13] K. Fukuda and K. Handa, Antipodal graphs and oriented matroids, Dis-
crete Mathematics 111 (1993) 245–256.
[14] F. Hadlock and F. Hoffman, Manhattan trees, Util. Math. 13 (1978) 55–67.
[15] W. Imrich and S. Klavžar, Product Graphs, John Wiley & Sons, 2000.
[16] H.M. Mulder, The Interval Function of a Graph, Mathematical Centre
Tracts 132, Mathematisch Centrum, Amsterdam, 1980.
[17] S. Ovchinnikov, Media theory: representations and examples, Discrete Ap-
plied Mathematics, (in review, e-print available at
http://arxiv.org/abs/math.CO/0512282).
[18] R.I. Roth and P.M. Winkler, Collapse of the metric hierarchy for bipartite
graphs, European Journal of Combinatorics 7 (1986) 371–375.
[19] M.L.J. van de Vel, Theory of Convex Structures, Elsevier, The Netherlands,
1993.
[20] P.M. Winkler, Isometric embedding in products of complete graphs, Dis-
crete Appl. Math. 8 (1984) 209–212.
http://arxiv.org/abs/math.CO/0512282
	Introduction
	Hypercubes and partial cubes
	Characterizations
	Fundamental sets in partial cubes
	Dimensions of partial cubes
	Subcubes and Cartesian products
	Pasting partial cubes
	Expansions and contractions of partial cubes
	Conclusion
ABSTRACT
  Partial cubes are isometric subgraphs of hypercubes. Structures on a graph
defined by means of semicubes, and Djokovi\'{c}'s and Winkler's relations play
an important role in the theory of partial cubes. These structures are employed
in the paper to characterize bipartite graphs and partial cubes of arbitrary
dimension. New characterizations are established and new proofs of some known
results are given.
  The operations of Cartesian product and pasting, and expansion and
contraction processes are utilized in the paper to construct new partial cubes
from old ones. In particular, the isometric and lattice dimensions of finite
partial cubes obtained by means of these operations are calculated.

<|endoftext|><|startoftext|>
Introduction
Let F be a real quadratic field of narrow class number one and let B be the
unique (up to isomorphism) quaternion algebra over F which is ramified at
both archimedean places of F and unramified everywhere else. Let GU2(B)
be the unitary similitude group of B⊕2. This is the set of Q-rational points
of an algebraic group GB defined over Q. The group GB is an inner form of
G := ResF/Q(GSp4) such that G
B(R) is compact modulo its centre. (These
notions are reviewed at the beginning of Section 1.)
In this paper we develop an algorithm which computes automorphic forms
on GB in the following sense: given an idealN inOF and an integer k greater
than 2, the algorithm returns the Hecke eigensystems of all automorphic
forms f of level N and parallel weight k. More precisely, given a prime p in
OF , the algorithm returns the Hecke eigenvalues of f at p, and hence the
Euler factor Lp(f, s), for each eigenform f of level N and parallel weight k.
The algorithm is a generalization of the one developed in [D1 2005] to the
genus 2 case. Although we have only described the algorithm in the case of
a real quadratic field in this paper, it should be clear from our presentation
that it can be adapted to any totally real number field of narrow class
number one.
The Jacquet-Langlands Correspondence of the title refers to the conjec-
tural map JL : Π(GB) → Π(G) from automorphic representations of GB to
automorphic representations of G, which is injective, matches L-functions
and enjoys other properties compatible with the principle of functoriality;
Date: October 29, 2018.
1991 Mathematics Subject Classification. Primary: 11F41 (Hilbert and Hilbert-Siegel
modular forms).
Key words and phrases. Hilbert-Siegel modular forms, Jacquet-Langlands Correspon-
dence, Brandt matrices, Satake parameters.
http://arxiv.org/abs/0704.0011v3
2 CLIFTON CUNNINGHAM AND LASSINA DEMBÉLÉ
in particular, the image of the Jacquet-Langlands Correspondence is to be
contained in the space of holomorphic automorphic representations. If we
admit this conjecture, then the algorithm above provides a way to produce
examples of cuspidal Hilbert-Siegel modular forms of genus 2 over F and
allows us to compute the L-factors of the corresponding automorphic repre-
sentations for arbitrary finite primes p of F .
In fact, we are also able to use these calculations to provide evidence for
the Jacquet-Langlands Correspondence itself by comparing the Euler factors
we find with those of known Hilbert-Siegel modular forms obtained by lifting.
This we do in the final section of the paper where we observe that some of
the Euler factors we compute match those of lifts of Hilbert modular forms,
for the primes we computed. Although this does not definitively establish
that these Hilbert-Siegel modular forms are indeed lifts, in principle one can
establish equality in this way, using an analogue of the Sturm bound.
The first systematic approach to Siegel modular forms from a computa-
tional viewpoint is due to Skoruppa [Sk 1992] who used Jacobi symbols to
generate spaces of such forms. His algorithm, which has been extensively
exploited by Ryan [R 2006], applies only to the case of full level structure.
More recently, Faber and van der Geer [FvdG1 2004] and [FvdG2 2004]
also produced examples of Siegel modular forms by counting points on hy-
perelliptic curves of genus 2; again their results are available only in the
full level structure case. The most substantial progress toward the com-
putation of Siegel modular forms for proper level structure is by Gunnells
[Gu 2000] who extended the theory of modular symbols to the symplectic
group Sp4/Q. However, this work does not see the cuspidal cohomology,
which is the only part of the cohomology which is relevant to arithmetic
geometric applications. To the best of our knowledge, there are no numer-
ical examples of Hilbert-Siegel modular forms for proper level structure in
the literature, with the exception of those produced from liftings of Hilbert
modular forms.
The outline of the paper is as follows. In Section 1 we recall the basic
properties of Hilbert-Siegel modular forms and algebraic automorphic forms
together with the Jacquet-Langlands Correspondence. In Section 2 we give
a detailed description of our algorithm. Finally, in Section 3 we present
numerical results for the quadratic field Q(
Acknowledgements. During the course of the preparation of this paper,
the second author had helpful email exchanges with several people includ-
ing Alexandru Ghitza, David Helm, Marc-Hubert Nicole, David Pollack,
Jacques Tilouine and Eric Urban. The authors wish to thank them all.
Also, we would like to thank William Stein for allowing us to use the SAGE
computer cluster at the University of Washington. And finally, the sec-
ond author would like to thank the PIMS institute for their postdoctoral
fellowship support, and the University of Calgary for its hospitality.
COMPUTING HILBERT-SIEGEL MODULAR FORMS 3
1. Hilbert-Siegel modular forms and the Jacquet-Langlands
correspondence
Throughout this paper, F denotes a real quadratic field of narrow class
number one. The two archimedean places of F and the real embeddings of
F will both be denoted v0 and v1. For every a ∈ F , we write a0 (resp. a1)
for the image of a under v0 (resp. v1). The ring of integers of F is denoted
by OF . For every prime ideal p in OF , the completion of F and OF at p
will be denoted by Fp and OFp , respectively.
Let B be the unique (up to isomorphism) totally definite quaternion al-
gebra over F which is unramified at all finite primes of F . We fix a maximal
order OB of B. Also, we choose a splitting field K/F of B that is Ga-
lois over Q and such that there exists an isomorphism j : OB ⊗Z OK ∼=
M2(OK)⊕M2(OK), where M2(A) denotes the ring of 2× 2-matrices with
entries from a ring A. For every finite prime p in F , we fix an isomorphism
Bp ∼= M2(Fp) which restricts to an isomorphism from OB, p onto M2(OFp ).
The algebraic group G = ResF/Q(GSp4) is defined as follows. For any
Q-algebra A, the set of A-rational points of G is given by
G(A) =
γ ∈ GL4(A⊗Q F )
t = νG(γ)J2
νG(γ) ∈ (A⊗Q F )×
where
−12 0
This group admits an integral model with A-rational points for every Z-
algebra A given by
GZ(A) =
γ ∈ GL4(A⊗Z OF )
t = νG(γ)J2
νG(γ) ∈ (A⊗Z OF )×
For any Q-algebra A, the conjugation on B extends in a natural way to the
matrix algebra M2(B ⊗Q A).
The algebraic group GB/Q is defined as follows. For any Q-algebra A,
the set of A-rational points of GB is given by
GB(A) =
γ ∈ M2(B ⊗Q A)
γγ̄t = νGB(γ)12
νGB (γ) ∈ (A⊗Q F )×
This group also admits an integral model with A-rational points for every
Z-algebra given by
GBZ (A) =
γ ∈ M2(OB ⊗Z A)
γγ̄t = νGB(γ)12
νGB (γ) ∈ (A⊗Z OF )×
The group GB/Q is an inner form of G/Q such that GB(R) is compact
modulo its center. Combining the isomorphism j (see above) with con-
jugation by a permutation matrix, we obtain an isomorphism GBZ (OK) ∼=
4 CLIFTON CUNNINGHAM AND LASSINA DEMBÉLÉ
GZ(OK), which we fix from now on. For every prime ideal p in F , the split-
ting of GB at p amounts to the splitting of the quaternion algebra B at p;
we refer to [D1 2005] for further details.
By the choice of the quaternion algebra B, we have GB(Q̂) ∼= G(Q̂). (We
denote the finite adèles of Q (resp. Z) by Q̂ (resp. Ẑ)).
1.1. Hilbert-Siegel modular forms. We fix an integer k ≥ 3 and, for
simplicity, we restrict ourselves to Hilbert-Siegel modular forms of parallel
weight k. The real embeddings v0 and v1 of F extend to G(Q) = GSp4(F )
in a natural way. We denote by GSp+4 (F ) the subgroup of elements γ with
totally positive similitude factor νG(γ). We recall that the Siegel upper-half
plane of genus 2 is defined by
H2 = {γ ∈ GL2(C)
∣ γt = γ and Im(γ) is positive definite }.
We also recall that GSp+4 (F ) acts on H
(τ0, τ1) :=
(a0τ0 + b0)(c0τ0 + d0)
−1, (a1τ1 + b1)(c1τ1 + d1)
This induces an action on the space of functions f : H22 → C by
, f |kγ(τ) =
νG(γi)
det(ciτi + di)k
f(τ).
Let N be an ideal in OF and set
Γ0(N) =
∈ GSp+4 (OF )
∣ c ≡ 0(N)
A Hilbert-Siegel modular form of level N and parallel weight k is a
holomorphic function f : H22 → C such that
∀γ ∈ Γ0(N), f |kγ = f.
The space of Hilbert-Siegel modular forms of parallel weight k and level N
is denoted Mk(N). Each f ∈Mk(N) admits a Fourier expansion, which by
the Koecher principle takes the form
∀τ ∈ H22, f(τ) =
{Q}∪{0}
2πiTr(Qτ),
where Q ∈ M2(F ) runs over all symmetric totally positive and semi-definite
matrices. A Hilbert-Siegel modular forms f is a cusp form if, for all γ ∈
4 (F ), the constant term in the Fourier expansion of f |kγ is zero. The
space of Hilbert-Siegel cusp forms is denoted Sk(N).
COMPUTING HILBERT-SIEGEL MODULAR FORMS 5
1.2. The Hecke algebra. The space Sk(N) comes equipped with a Hecke
action, which we now recall. Take u ∈ GSp+4 (F ) ∩M4(OF ), and write the
finite disjoint union
Γ0(N)uΓ0(N) =
Γ0(N)ui.
Then the Hecke operator [Γ0(N)uΓ0(N)] on Sk(N) is given by
[Γ0(N)uΓ0(N)]f =
f |kui.
Let p be a prime ideal in OF and let πp be a totally positive generator of
p; let T1(p) and T2(p) be the Hecke operators corresponding to the double
Γ0(N)-cosets of the symplectic similitude matrices
1 0 0 0
0 1 0 0
0 0 πp 0
0 0 0 πp
1 0 0 0
0 πp 0 0
0 0 π2p 0
0 0 0 πp
respectively. (We remind the reader of the symplectic form J2 fixed at
the beginning of Section 1.) The Hecke algebra Tk(N) is the Z-algebra
generated by the operators T1(p) and T2(p), where p runs over all primes
not dividing N .
1.3. Algebraic Hilbert-Siegel autormorphic forms. We only consider
level structure of Siegel type. Namely, we define the compact open subgroup
U0(N) of G(Q̂) by
U0(N) =
GSp4(OFp )×
ep ),
where N =
p|N p
ep and
ep ) :=
∈ GSp4(OFp )
∣ c ≡ 0 mod pep
The weight representation is defined as follows. Let Lk be the repre-
sentation of GSp4(C) of highest weight (k− 3, k− 3). We let Vk = Lk ⊗Lk
and define the complex representation (ρk, Vk) by
ρk : G
B(R) −→ GL(Vk),
where the action on the first factor is via v0, and the action on the second
one is via v1.
The space of algebraic Hilbert-Siegel modular forms of weight k and
level N is given by
MBk (N) :=
f : GB(Q̂)/U0(N) → Vk
∣ ∀γ ∈ GB(Q), f |kγ = f
6 CLIFTON CUNNINGHAM AND LASSINA DEMBÉLÉ
where f |kγ(x) = f(γx)γ, for all x ∈ GB(Q̂)/U0(N). When k = 3, we let
IBk (N) :=
f : GB(Q)\GB(Q̂)/U0(N) → C
∣ f is constant
Then, the space of algebraic Hilbert-Siegel cusp forms of weight k and
level N is defined by
SBk (N) :=
MBk (N) if k > 3,
MBk (N)/I
k (N) if k = 3.
The action of the Hecke algebra on SBk (N) is given as follows. For any
u ∈ G(Q̂), write the finite disjoint union
U0(N)uU0(N) =
uiU0(N),
and define
[U0(N)uU0(N)] : S
k (N) → SBk (N)
f 7→ f |k[U0(N)uU0(N)],
f |k[U0(N)uU0(N)](x) =
f(xui), x ∈ G(Q̂).
For any prime p ∤ N , let ̟p be a local uniformizer at p. The local Hecke alge-
bra at p is generated by the Hecke operators T1(p) and T2(p) corresponding
to the double U0(N)-cosets ∆1(p) and ∆2(p) of the matrices
1 0 0 0
0 1 0 0
0 0 ̟p 0
0 0 0 ̟p
1 0 0 0
0 ̟p 0 0
0 0 ̟2
0 0 0 ̟p
respectively. We let TBk (N) be the Hecke algebra generated by T1(p) and
T2(p) for all primes p ∤ N .
1.4. The Jacquet-Langlands Correspondence. The Hecke modules Sk(N)
and SBk (N) are related by the following conjecture known as the Jacquet-
Langlands Correspondence for symplectic similitude groups.
Conjecture 1. The Hecke algebras Tk(N) and T
k (N) are isomorphic and
there is a compatible isomorphism of Hecke modules
Sk(N)
∼−→ SBk (N).
It is common, but perhaps not entirely accurate, to attribute this con-
jecture to Jacquet-Langlands. To the best of our knowledge, the correspon-
dence in this form was first discussed by Ihara [Ih 1964] in the case F = Q. In
[Ib 1984], Ibukiyama provided some numerical evidence. On the other hand,
it is appropriate to refer to Conjecture 1 as the Jacquet-Langlands Corre-
spondence (for GSp(4)) since it is an analogue of the Jacquet-Langlands
COMPUTING HILBERT-SIEGEL MODULAR FORMS 7
Correspondence (for GL(2)) which relates automorphic representations of
the multiplicative group of a quaternion algebra with certain automorphic
representations of GL(2) (see [JL 1970]). Both correspondences are, in turn,
special consequences of the principle of functoriality, as expounded by Lang-
lands. Finally, it appears that Conjecture 1 may soon be a theorem due to
the work of [So 2008] and the forthcoming book by James Arthur on auto-
morphic representations of classical groups.
2. The Algorithm
In this section, we present the algorithm we used in order to compute
the Hecke module of (algebraic) Hilbert-Siegel modular forms. The main
assumption in this section is that the class number of the principal genus
of GB is 1. (We refer to [D3 2007] to see how one can relax this condition
on the class number.) We recall that since B is totally definite, GB satis-
fies Proposition 1.4 in Gross [Gr 1999]. Thus the group GB(R) is compact
modulo its centre, and Γ = GB(Z)/O×F is finite.
For any prime p in F , let Fp = OF /p be the residue field at p and define
the reduction map
M2(OB, p) → M4(Fp)
g 7→ g̃,
where we use the splitting of OB,p that was fixed at the beginning of Sec-
tion 1. Now, choose a totally positive generator πp of p and put
Θ1(p) := Γ\
u ∈ M2(OB)
∣ uūt = πp12and rank(g̃) = 2
Θ2(p) := Γ\
u ∈ M2(OB)
∣ uūt = π2
12 and rank(g̃) = 1
We let H20(N) = G(Ẑ)/U0(N). Then the group Γ acts on H20(N), thus on
the space of functions f : H20(N) → Vk by
∀x ∈ H20(N),∀γ ∈ Γ, f |kγ(x) := f(γx)γ.
Theorem 2. There is an isomorphism of Hecke modules
MBk (N)
f : H20(N) → Vk
∣ f |kγ = f, γ ∈ Γ
where the Hecke action on the right hand side is given by
f |kT1(p) =
u∈Θ1(p)
f |ku,
f |kT2(p) =
u∈Θ2(p)
f |ku.
Proof. The canonical map
φ : GB(Z)\GB(Ẑ)/U0(N) → GB(Q)\GB(Q̂)/U0(N)
is an injection. Making use of the fact that the class number in the principal
genus of GB is one (GB(Q̂) = GB(Q)GBZ (Ẑ)), we see that φ is in fact a
8 CLIFTON CUNNINGHAM AND LASSINA DEMBÉLÉ
bijection. Since each element f ∈ MBk (N) is determined by its values on a
set of coset representatives of GB(Q)\GB(Q̂)/U0(N), the map φ induces an
isomorphism of complex vector spaces
MBk (N)
f : H20(N) → Vk
∣ f |kγ = f, γ ∈ Γ
f 7−→ f ◦ φ.
We make this into a Hecke module isomorphism by defining the Hecke action
on the right hand side as indicated in the statement of the theorem. �
In the rest of this section, we explain the main steps of the algorithm
provided by Theorem 2.
2.1. The quotient H20(N). Keeping the notations of the previous section,
we recall that N =
p|N p
ep . Let p be a prime dividing N and consider the
rank 4 free
OFp/pep
-module L =
OFp/pep
endowed with the symplectic
pairing 〈 , 〉 given by the matrix
−12 0
where 12 is the identity matrix in M2(OFp/pep ). Let M be a rank 2
OFp/pep
-submodule which is a direct factor in L. We say that M is
isotropic if 〈u, v〉 = 0 for all u, v ∈ M . We recall that GSp4(OFp ) acts
transitively on the set of rank 2, isotropic
OFp/pep
-submodules of L and
that the stabilizer of the submodule generated by e1 = (1, 0, 0, 0)
T and
e2 = (0, 1, 0, 0)
T is U0(p
ep ). The quotient H20(pep ) = GSp4(OFp )/U0(pep )
is the set of rank 2, isotropic
OFp/pep
-submodules of L. Via the reduction
map ÔF → OF /N , the quotient GZ(Ẑ)/U0(N) can be identified with the
product
H20(N) =
H20(pep ).
The cardinality of H20(N) is extremely useful and is determined using the
following lemma.
Lemma 1. Let p be a prime in F and ep ≥ 1 an integer. Then, the cardi-
nality of the set H20(pep ) is given by
#H20(pep ) = N(p)3(ep−1)(N(p) + 1)(N(p)2 + 1).
Proof. For ep = 1, the cardinality of the Lagrange variety over the finite
field Fp = OF /p is given by (N(p) + 1)(N(p)2 + 1). Proceed by induction
on ep. �
We have more to say about elements of H20(pep ) in Subsection 2.5.
COMPUTING HILBERT-SIEGEL MODULAR FORMS 9
2.2. Brandt matrices. Let F = {x1, . . . , xh} be a fundamental domain
for the action of Γ on H20(N) and, for each i, let Γi be the stabilizer of xi.
Then, every element in MBk (N) is completely determined by its values on
F . Thus, there is an isomorphism of complex spaces
MBk (N) →
f 7→ (f(xi)),
where V
is the subspace of Γi-invariants in Vk.
For any x, y ∈ H20(N), we let
Θ1(x, y, p) :=
u ∈ Θ1(p)
∣ ∃γ ∈ Γ, ux = γy
Θ2(x, y, p) :=
u ∈ Θ2(p)
∣ ∃γ ∈ Γ, ux = γy
Proposition 3. The actions of the Hecke operators Ts(p), s = 1, 2, are
given by the Brandt matrices Bs(p) = (bsij(p)), where
bsji(p) : V
k → V
v 7→ v ·
u∈Θs(xi, xj ,p)
γ−1u u
Proof. The proof of Proposition 3 follows the lines of [D1 2005, §3]. �
2.3. Computing the group GB(Z). It is enough to compute the subgroup
Γ consisting of the elements in GB(Z) with similitude factor 1. But it is easy
to see that
u, v ∈ O1B
u, v ∈ O1B
where O1B is the group of norm 1 elements.
2.4. Computing the sets Θ1(p) and Θ2(p). Let us consider the quadratic
form on the vector space V = B2 given by
V → F
(a, b) 7→ ||(a, b)|| := nr(a) + nr(b),
where nr is the reduced norm on B. This determines an inner form
V × V → F
(u, v) 7→ 〈u, v〉.
An element of Θ1(p) (resp. Θ2(p)) is a unitary matrix γ ∈ M2(OB) with
respect to this inner form such that the norm of each row is πp (resp. π
and the rank of the reduced matrix is 1). So we first start by computing
all the vectors u = (a, b) ∈ O2B such that ||u|| = πp (resp. ||u|| = π2p). And
for each such vector u, we compute the vectors v = (c, d) ∈ O2B of the same
10 CLIFTON CUNNINGHAM AND LASSINA DEMBÉLÉ
norm such that 〈u, v〉 = 0. The corresponding matrix γ =
belongs
to Θ1(p) (resp. Θ2(p)) when its reduction mod p has the appropriate rank.
We list all these matrices up to equivalence and stop when we reach the
right cardinality.
2.5. The implementation of the algorithm. The implementation of the
algorithm is similar to that of [D1 2005]. However, it is important to note
how we represent elements in H20(N) so that we can retrieve them easily
once stored. As in [D1 2005] we choose to work with the product
H20(N) =
H20(pep ).
Using Plucker’s coordinates, we can view H20(pep ) as a closed subspace of
P5(OFp/pep ). We then represent each element in H20(pep ) by choosing a
point x = (a0 : · · · : a5) = [u ∧ v] ∈ P5(OFp/pep ) such that the submodule
M generated by u and v is a Lagrange submodule, and the first invertible
coordinate is scaled to 1.
Remark 1. In [LP 2002], Lansky and Pollack describe an algorithm which
computes algebraic modular forms on the same inner form of GSp4/Q that
we use. We would like to note that there are some differences between the
two algorithms. Although [LP 2002] also uses the flag variety H20(N) in
order to determine the double coset space GB(Q)\GB(Q̂)/U0(N), it later
returns to the adelic setting in order to compute the Brandt matrices. In
contrast, Theorem 2 and Proposition 3 allow us to avoid that unnecessary
step by describing the Hecke action on the flag variety H20(N) directly. As
a result, we get an algorithm that is more efficient.
3. Numerical examples: F = Q(
5) and B =
−1,−1
In this section, we provide some numerical examples using the quadratic
field F = Q(
5). It is proven in K. Hashimoto and T. Ibukiyama [HI 1980]
that, for the Hamilton quaternion algebra B over F , the class number of
the principal genus of GB is one. We use our algorithm to compute all the
systems of Hecke eigenvalues of Hilbert-Siegel cusp forms of weight 3 and
level N that are defined over real quadratic fields, where N runs over all
prime ideals of norm less than 50. We then determine which of the forms
we obtained are possible lifts of Hilbert cusp forms by comparing the Hecke
eigenvalues for those primes.
3.1. Tables of Hilbert-Siegel cusp forms of parallel weight 3. In
Table 1 we list all the systems of eigenvalues of Hilbert-Siegel cusp forms of
weight 3 and level N that are defined over real quadratic fields, where N
runs over all prime ideals in F of norm less than 50. Here are the conventions
we use in the tables.
COMPUTING HILBERT-SIEGEL MODULAR FORMS 11
(1) For a quadratic field K of discriminant D, we let ωD be a generator
of the ring of integers OK of K.
(2) The first row contains the level N , given in the format (Norm(N), α)
for some generator α ∈ F of N , and the dimensions of the relevant
spaces.
(3) The second row lists the Hecke operators that have been computed.
(4) For each eigenform f , the Hecke eigenvalues are given in a row, and
the last entry of that row indicates if the form f is a probable lift.
(5) The levels and the eigenforms are both listed up to Galois conjuga-
tion.
For an eigenform f and a given prime p ∤ N , let a1(p, f) and a2(p, f) be the
eigenvalues of the Hecke operators T1(p) and T2(p), respectively. Then the
Euler factor Lp(f, s) is given (for example, in [AS 2001, §3.4]) by
Lp(f, s) = Qp(q
−s)−1,
where
Qp(x) = 1− a1(p, f)x+ b1(p, f)x2 − a1(p, f)q2k−3x3 + q4k−6x4,
b1(p, f) = a1(p, f)
2 − a2(p, f)− q2k−4,
q = N(p).
3.2. Tables of Hilbert cusp forms of parellel weight 4. In Table 2, we
list all the Hilbert cusp forms of parallel weight 4 and level N that are defined
over real quadratic fields, with N running over all prime ideals of norm less
than 50. (They are computed by using the algorithm in [D1 2005]). We use
this data in order to determine the forms in Table 1 that are possible lifts
from GL2.
3.3. Lifts. There are two types of lifts from GL2 to GSp4. The first one
corresponds to the homomorphism of L-groups determined by the long root
embedding into GSp4, and the second one by the short root embedding.
(See [LP 2002] for more details). Let f be a Hilbert cusp form of parallel
weight k and level N with Hecke eigenvalues a(p, f), where p is a prime not
dividing N . Let φ be the lift of f to GSp4 via the long root, and ψ the one
via the short root. Then the Hecke eigenvalues of φ are given by
a1(p, φ) = a(p, f) N(p)
2 +N(p)2 +N(p)
a2(p, φ) = a(p, f) N(p)
2 (N(p) + 1) +N(p)2 − 1,
and the Hecke eigenvalues of ψ are given by
a1(p, ψ) = a(p, f)
2 − 2 a(p, f) N(p)
a2(p, ψ) = a(p, f)
N(p)4−2k − 3 a(p, f)2 N(p)3−k +N(p)2 − 1.
The second lift ψ is the so-called symmetric cube lifting.
12 CLIFTON CUNNINGHAM AND LASSINA DEMBÉLÉ
N = (4, 2) : dimMB
(N) = 2, dimSB
(N) = 1
T1(2) T2(2) T1(
5) T2(
5) T1(3) T2(3) Lift?
f1 −4 0 20 −36 140 580 yes
N = (5, 2 + ω5) : dimM
(N) = 2, dimSB
(N) = 1
T1(2) T2(2) T1(
5) T2(
5) T1(3) T2(3) Lift?
f1 20 15 −5 0 40 −420 yes
N = (9, 3) : dimMB
(N) = 3, dimSB
(N) = 2
T1(2) T2(2) T1(
5) T2(
5) T1(3) T2(3) Lift?
f1 25− 3ω41 40− 15ω41 30 + 6ω41 24 + 36ω41 −9 0 yes
N = (11, 3 + ω5) : dimM
(N) = 3, dimSB
(N) = 2
T1(2) T2(2) T1(
5) T2(
5) T1(3) T2(3) Lift?
f1 24 35 34 48 88 60 yes
f2 −20 35 −10 4 0 60 no
N = (19, 4 + ω5) : dimM
(N) = 5, dimSB
(N) = 4
T1(2) T2(2) T1(
5) T2(
5) T1(3) T2(3) Lift?
f1 4 11 −20 28 6 76 no
f2 7 −50 15 −66 73 −90 yes
f3 24 + ω161 35 + 5ω161 36− ω161 60− 6ω161 98− 3ω161 160− 30ω161 yes
N = (29, 5 + ω5) : dimM
(N) = 9, dimSB
(N) = 8
T1(2) T2(2) T1(
5) T2(
5) T1(3) T2(3) Lift?
f1 −4 11 10 20 30 60 no
f2 8 −45 30 24 50 −320 yes
f3 17 0 9 −102 86 40 yes
N = (31, 5 + 2ω5) : dimM
(N) = 12, dimSB
(N) = 11
T1(2) T2(2) T1(
5) T2(
5) T1(3) T2(3) Lift?
f1 13 −20 20 −36 76 −60 yes
N = (41, 6 + ω5) : dimM
(N) = 19, dimSB
(N) = 18
T1(2) T2(2) T1(
5) T2(
5) T1(3) T2(3) Lift?
f1 10 20 −10 29 30 −20 no
f2 −1 1 5 14 −2 −56 no
f3 27 50 40 84 124 420 yes
f4 −12 19 30 65 0 0 no
f5 16− 2ω21 −5− 10ω21 21 + 4ω21 −30 + 24ω21 72− 2ω21 −100− 20ω21 yes
f6 2− 6ω5 11− 2ω5 8 + 4ω5 11− 4ω5 −12 + 54ω5 160 + 40ω5 no
N = (49, 7) : dimMB
(N) = 26, dimSB
(N) = 25
T1(2) T2(2) T1(
5) T2(
5) T1(3) T2(3) Lift?
f1 5 −60 46 120 40 −420 yes
f2 4 + 4ω65 32 + 3ω65 12− 4ω65 44− 4ω65 −6− 12ω65 145 + 8ω65 no
Table 1. Hilbert-Siegel eigenforms of weight 3
COMPUTING HILBERT-SIEGEL MODULAR FORMS 13
N (4, 2) (5, 2 + ω5) (9, 3) (11, 3 + ω5)
N(p) p a(p, f1) a(p, f1) a(p, f1) a(p, f1)
4 2 −4 0 5− 3ω41 4
5 2 + ω5 −10 −5 6ω41 4
9 3 50 −50 −9 −2
11 3 + 2ω5 −28 32 −18− 6ω41 −10
11 3 + ω5 −28 32 −18− 6ω41 −11
19 4 + 3ω5 60 100 −40 + 24ω41 −94
19 4 + ω5 60 100 −40 + 24ω41 28
N (19, 4 + ω5) (29, 5 + ω5)
N(p) p a(p, f1) a(p, f2) a(p, f1) a(p, f2)
4 2 −13 5− ω161 −12 −3
5 2 + ω5 −15 5 + ω161 0 −21
9 3 −17 5 + 3ω161 −40 −4
11 3 + 2ω5 −6 2 + 8ω161 −68 37
11 3 + ω5 33 7− 7ω161 30 −66
19 4 + 3ω5 −139 −15− 9ω161 −28 −40
19 4 + ω5 19 −19 84 −9
N (31, 5 + 2ω5) (41, 6 + ω5)
N(p) p a(p, f1) a(p, f1) a(p, f2)
4 2 −7 7 −4− 2ω21
5 2 + ω5 −10 10 −9 + 4ω21
9 3 −14 34 −18− 2ω21
11 3 + 2ω5 −20 −60 −19
11 3 + ω5 −28 −2 −24− 4ω21
19 4 + 3ω5 −12 74 4− 50ω21
19 4 + ω5 28 16 −29 + 44ω21
N (49, 7)
N(p) p a(p, f1) a(p, f2)
4 2 −15 −2
5 2 + ω5 16 −10
9 3 −50 −11
11 3 + 2ω5 −8 −7− 28ω13
11 3 + ω5 −8 −35 + 28ω13
19 4 + 3ω5 −110 −26 + 14ω13
19 4 + ω5 −110 −12− 14ω13
Table 2. Hilbert eigenforms of weight 4
Remark 2. So far, our algorithm has been implemented only for congruence
subgroups of Siegel type. We intend to improve the implementation in the
near future so as to include more additional level structures such as the
Klingen type. Indeed, Ramakrishnan and Shahidi [RS 2007] recently showed
the existence of symmetric cube lifts for non-CM elliptic curves E/Q to
GSp4/Q. And their result should hold for other totally real number fields,
with the level structures of the lifts being of Klingen type. Unfortunately,
14 CLIFTON CUNNINGHAM AND LASSINA DEMBÉLÉ
those lifts cannot be seen in our current tables. For example, there are
modular elliptic curves over Q(
5) whose conductors have norm 31, 41 and
49, but the corresponding symmetric cubic lifts do not appear in Table 1.
We would like to remedy that in our next implementation.
References
[D1 2005] L. Dembélé, Explicit computations of Hilbert modular forms on Q(
5). Exper-
iment. Math. 14 (2005), no. 4, 457–466.
[D2 2007] L. Dembélé, Quaternionic M -symbols, Brandt matrices and Hilbert modular
forms. Math. Comp. 76, no 258, (2007), 1039-1057. Also available electronically.
[D3 2007] L. Dembélé, On the computation of algebraic modular forms (submitted).
[AS 2001] Mahdi Asgari and Ralf Schmidt, Siegel modular forms and representations,
Manuscripta Math. 104 (2001), 173–200.
[FvdG1 2004] Carel Faber and Gerard van der Geer, Sur la cohomologie des systèmes
locaux sur les espaces de modules des courbes de genre 2 et des surfaces abéliennes.
I, C. R. Math. Acad. Sci. Paris 338 (2004), no. 5, 381–384.
[FvdG2 2004] Carel Faber and Gerard van der Geer, Sur la cohomologie des systèmes
locaux sur les espaces de modules des courbes de genre 2 et des surfaces abéliennes.
II, C. R. Math. Acad. Sci. Paris 338 (2004), no. 6, 467–470.
[JL 1970] Hervé Jacquet and Robert Langlands, Automorphic forms on GL(2), Lecture
notes in mathematics 114 and 278, 1970.
[Gr 1999] Benedict H. Gross, Algebraic modular forms. Israel J. Math. 113 (1999), 61–93.
[Gu 2000] P. Gunnells, Symplectic modular symbols, Duke Math. J. 102 (2000), no. 2,
329-350.
[HI 1980] K. Hashimoto and T. Ibukiyama, On the class numbers of positive definite
binary quaternion hermitian forms. J. Fac. Sci. Univ. Tokyo Sect. IA Math. 27 (1980),
549-601.
[Ib 1984] T. Ibukiyama, On symplectic Euler factors of genus 2. J. Fac. Sci. Univ. Tokyo
30 (1984), 587614.
[Ih 1964] Y. Ihara, On certain Dirichlet series, J. Math. Soc. Japan 16 (1964), 214-225.
[LP 2002] J. Lansky and D. Pollack, Hecke algebras and automorphic forms. Compositio
Math. 130 (2002), no. 1, 21–48.
[RS 2007] Dinakar Ramakrishnan and Freydoon Shahidi, Siegel modular forms of genus
2 attached to elliptic curves (preprint). Available at www.math.arxiv.
[R 2006] N. C. Ryan, Computing the Satake p-parameters of Siegel modular forms. (sub-
mitted).
[Sk 1992] Nils-Peter Skoruppa, Computations of Siegel modular forms of genus two. Math.
Comp. 58 (1992), no. 197, 381–398.
[So 2008] Claus M. Sorensen, Potential level-lowering for GSp(4), arXive:0804.0588v1.
Department of Mathematics, University of Calgary
E-mail address: cunning@math.ucalgary.ca
Institut für Experimentelle Mathematik, Universität Duisburg-Essen
E-mail address: lassina.dembele@uni-duisburg-essen.de
	Introduction
	1. Hilbert-Siegel modular forms and the Jacquet-Langlands correspondence
	1.1. Hilbert-Siegel modular forms
	1.2. The Hecke algebra
	1.3. Algebraic Hilbert-Siegel autormorphic forms
	1.4. The Jacquet-Langlands Correspondence
	2. The Algorithm
	2.1. The quotient H02(N)
	2.2. Brandt matrices
	2.3. Computing the group GB(Z)
	2.4. Computing the sets 1(p) and 2(p)
	2.5. The implementation of the algorithm
	3. Numerical examples: F=Q(5) and B=(-1,-1F)
	3.1. Tables of Hilbert-Siegel cusp forms of parallel weight 3
	3.2. Tables of Hilbert cusp forms of parellel weight 4
	3.3. Lifts
	References
ABSTRACT
  In this paper we present an algorithm for computing Hecke eigensystems of
Hilbert-Siegel cusp forms over real quadratic fields of narrow class number
one. We give some illustrative examples using the quadratic field
$\Q(\sqrt{5})$. In those examples, we identify Hilbert-Siegel eigenforms that
are possible lifts from Hilbert eigenforms.

<|endoftext|><|startoftext|>
Introduction and Results
Let Mλ+ 1
(Γ0(N), χ) and Sλ+ 1
(Γ0(N), χ) be the spaces, respectively, of modular forms
and cusp forms of weight λ + 1
on Γ0(N) with a Dirichlet character χ whose conductor
divides N . If f(z) ∈Mλ+ 1
(Γ0(N), χ), then f(z) has the form
f(z) =
a(n)qn,
where q := e2πiz. It is well-known that the coefficients of f are related to interesting
objects in number theory such as the special values of L-function, class number, traces of
singular moduli and so on. In this paper, we study congruence properties of the Fourier
coefficients of f(z) ∈Mλ+ 1
(Γ0(N), χ) ∩ Z[[q]] and their applications.
Recently, Bruinier and Ono proved in [3] that g(z) ∈ Sλ+ 1
(Γ0(N), χ) ∩ Z[[q]] has a
special form (see (2.1)) by modulo p when p is an odd prime and the coefficients of f(z)
do not satisfy the following property for p:
Property A. IfM is a positive integer, we say that a sequence α(n) ∈ Z satisfies Property
A for M if for every integer r
♯{ 1 ≤ n ≤ X | α(n) ≡ r (mod M) and gcd(M,n) = 1}
if r 6≡ 0 (mod M),
X if r ≡ 0 (mod M).
2000 Mathematics Subject Classification. 11F11,11F33.
Key words and phrases. Modular forms, Congruences.
http://arxiv.org/abs/0704.0012v1
2 D. CHOI
θ(f(z)) :=
f(z) =
n · a(n)qn.
Using Rankin-Cohen Bracket (see (2.3)), we prove that there exists
f̃(z) ∈ Sλ+p+1+ 1
(Γ0(4N), χ) ∩ Z[[q]]
such that θ(f(z)) ≡ f̃(z) (mod p). We extend the results in [3] to modular forms of half
integral weight.
Theorem 1. Let λ be a non-negative integer. We assume that f(z) =
n=0 a(n)q
Mλ+ 1
(Γ0(4N), χ) ∩ Z[[q]], where χ is a real Dirichlet character. If p ≥ 5 is a prime and
there exists a positive integer n for which gcd(a(n), p) = 1 and gcd(n, p) = 1, then at least
one of the following is true:
(1) The coefficients of θp−1(f(z)) satisfies Property A for p.
(2) There are finitely many square-free integers n1, n2, · · · , nt for which
(1.1) θp−1(f(z)) ≡
a(nim
2)qnim
(mod p).
Moreover, if gcd(4N, p) = 1 and an odd prime ℓ divides some ni, then
p|(ℓ− 1)ℓ(ℓ+ 1)N or ℓ | N.
Remark 1.1. Note that for every odd prime p ≥ 5,
θp−1(f(z)) ≡
a(n)qn (mod p).
As an applications of Theorem 1, we study the distribution of traces of singular moduli
modulo primes p ≥ 5. Let j(z) be the usual j-invariant function. We denote by Fd the
set of positive definite binary quadratic forms
F (x, y) = ax2 + bxy + cy2 = [a, b, c]
with discriminant −d = b2−4ac. For each F (x, y), let αF be the unique complex number
in the complex upper half plane, which is a root of F (x, 1). We define ωF ∈ {1, 2, 3} as
ωF :=
2 if F ∼Γ [a, 0, a],
3 if F ∼Γ [a, a, a],
1 otherwise,
where Γ := SL2(Z). Here, F ∼Γ [a, b, c] denotes that F (x, y) is equivalent to [a, b, c].
From these notations, we define the Hecke trace of singular moduli.
DISTRIBUTION OF INTEGRAL FOURIER COEFFICIENTS MODULO PRIMES 3
Definition 1.2. If m ≥ 1, then we define the mth Hecke trace of the singular moduli of
discriminant −d as
tm(d) :=
F∈Fd/Γ
jm(αF )
where Fd/Γ denotes a set of Γ−equivalence classes of Fd and
jm(z) := j(z)|T0(m) =
az + b
Here, T0(m) denotes the normalized mth weight zero Hecke operator.
Note that t1(d) = t(d), where
t(d) :=
F∈Fd/Γ
j(αF )− 744
is the usual trace of singular moduli. Let
h(z) :=
η(z)2
η(2z)
· E4(4z)
η(4z)6
and Bm(1, d) denote the coefficient of q
d in h(z)|T (m2, 1, χ0), where
E4(z) := 1 + 240
d3qn, η(z) := q
(1− qn) ,
and χ0 is a trivial character. Here, T (m
2, λ, χ) denotes the mth Hecke operator of weight
λ + 1
with a Dirichlet chracter χ (see VI. §3. in [5] or (2.5)). Zagier proved in [11] that
for all m and d
(1.2) tm(d) = −Bm(1, d).
Using these generating functions, Ahlgren and Ono studied the divisibility properties
of traces and Hecke traces of singular moduli in terms of the factorization of primes in
imaginary quadratic fields (see [2]). For example, they proved that a positive proportion
of the primes ℓ has the property that tm(ℓ
3n) ≡ 0 (mod ps) for every positive integer n
coprime to ℓ such that p is inert or ramified in Q
. Here, p is an odd prime, and
s and m are integers with p ∤ m. In the following theorem, we give the distribution of
traces and Hecke traces of singular moduli modulo primes p.
4 D. CHOI
Theorem 2. Suppose that p ≥ 5 is a prime such that p ≡ 2 (mod 3).
(1) Then, for every integer r, p ∤ r,
♯{ 1 ≤ n ≤ X | t1(n) ≡ r (mod p)} ≫r,p
if r 6≡ 0 (mod p)
X if r ≡ 0 (mod p).
(2) Then, a positive proportion of the primes ℓ has the property that
♯{ 1 ≤ n ≤ X | tℓ(n) ≡ r (mod p)} ≫r,p
if r 6≡ 0 (mod p)
X if r ≡ 0 (mod p).
for every integer r, p ∤ r.
As another application we study the distribution of Hurwitz class number modulo
primes p ≥ 5. The Hurwitz class number H(−N) is defined as follows: the class number
of quadratic forms of the discriminant −N where each class C is counted with multiplicity
Aut(C)
. The following theorem gives the distribution of Hurwitz class number modulo
primes p ≥ 5.
Theorem 3. Suppose that p ≥ 5 is a prime. Then, for every integer r
♯{ 1 ≤ n ≤ X | H(n) ≡ r (mod p)} ≫r,p
if r 6≡ 0 (mod p),
X if r ≡ 0 (mod p).
We also use the main theorem to study an analogue of Newman’s conjecture for overpar-
titions. Newman’s conjecture concerns the distribution of the ordinary partition function
modulo primes p.
Newman’s Conjecture. Let P (n) be an ordinary partition function. If M is a positive
integer, then for every integer r there are infinitely many nonnegative integer n for which
P (n) ≡ r (mod M).
This conjecture was already studied by many mathematicians (see Chapter 5. in [8]).
The overpartition of a natural number n is a partition of n in which the first occurrence
of a number may be overlined. Let P̄ (n) be the number of the overpartition of an integer
n. As an analogue of Newman’s conjecture, the following theorem gives a distribution
property of P̄ (n) modulo odd primes p.
Theorem 4. Suppose that p ≥ 5 is a prime such that p ≡ 2 (mod 3). Then, for every
integer r,
♯{ 1 ≤ n ≤ X | P̄ (n) ≡ r (mod p)} ≫r,p
if r 6≡ 0 (mod p),
X if r ≡ 0 (mod p).
Remark 1.3. When r ≡ 0 (mod p), Theorem 2, 3 and 4 were proved in [2] and [10].
DISTRIBUTION OF INTEGRAL FOURIER COEFFICIENTS MODULO PRIMES 5
Next sections are detailed proofs of theorems: Section 2 gives a proof of Theorem 1. In
Section 3, we give the proofs of Theorem 2, 3, and 4.
2. Proof of Theorem 1
We begin by stating the following theorem proved in [3].
Theorem 2.1 ([3]). Let λ be a non-negative integer. Suppose that g(z) =
n=0 ag(n)q
Sλ+ 1
(Γ0(4N), χ) ∩ Z[[q]], where χ is a real Dirichlet character. If p is an odd prime and
a positive integer n exists for which gcd(ag(n), p) = 1, then at least one of the following
is true:
(1) If 0 ≤ r < p, then
♯{ 1 ≤ n ≤ X | ag(n) ≡ r (mod p)} ≫r,M
if r 6≡ 0 (mod p),
X if r ≡ 0 (mod p).
(2) There are finitely many square-free integers n1, n2, · · · , nt for which
(2.1) g(z) ≡
ag(nim
2)qnim
(mod p).
Moreover, if gcd(p, 4N) = 1, ǫ ∈ {±1}, and ℓ ∤ 4Np is a prime with
∈ {0, ǫ}
for 1 ≤ i ≤ t, then (ℓ−1)g(z) is an eigenform modulo p of the half-integral weight
Hecke operator T (ℓ2, λ, χ). In particular, we have
(2.2) (ℓ− 1)g(z)|T (ℓ2, λ, χ) ≡ ǫχ(p)
(−1)λ
ℓλ + ℓλ−1
(ℓ− 1)g(z) (mod p).
Recall that f(z) =
a(n)qn ∈ Mλ+ 1
(Γ0(4N), χ) ∩ Z[[q]]. Thus, to apply Theorem
2.1, we show that there exists a cusp form f̃(z) such that f̃(z) ≡ θp−1(f(z)) (mod p) for
a prime p ≥ 5.
Lemma 2.2. Suppose that p ≥ 5 is a prime and
f(z) =
a(n)qn ∈Mλ+ 1
(Γ0(N), χ) ∩ Z[[q]].
Then, there exists a cusp form f̃(z) ∈ Sλ+(p+1)(p−1)+ 1
(Γ0(N), χ) ∩ Z[[q]] such that
f̃(z) ≡ θp−1(f(z)) (mod p).
Proof of Lemma 2.2. For F (z) ∈Mk1
(Γ0(N), χ1) and G(z) ∈Mk2
(Γ0(N), χ2), let
(2.3) [F (z), G(z)]1 :=
θ(F (z)) ·G(z)−
F (z) · θ(G(z)).
This operator is referred to as a Rankin-Cohen 1-bracket, and it was proved in [4] that
[F (z), G(z)]1 ∈ S k1+k2
(Γ0(N), χ1χ2χ
6 D. CHOI
where χ′ = 1 if k1
and k2
∈ Z, χ′(d) =
2 if ki
∈ Z and k3−i
+ Z, and
χ′(d) =
) k1+k2
2 if k1
and k2
For even k ≥ 4, let
Ek(z) := 1−
dk−1qn
be the usual normalized Eisenstein series of weight k. Here, the number Bk denotes the
kth Bernoulli number. The function Ek(z) is a modular form of weight k on SL2(Z), and
(2.4) Ep−1(z) ≡ 1 (mod p)
(see [6]). From (2.3) and (2.4), we have
[Ep−1(z), f(z)]1 ≡ θ(f(z)) (mod p)
and [Ep−1(z), f(z)]1 ∈ Sλ+p+1+ 1
(Γ0(N), χ). Repeating this method p− 1 times, we com-
plete the proof. �
Using the following lemma, we can deal with the divisibility of ag(n) for positive integers
n, p ∤ n, where g(z) =
n=1 ag(n)q
n ∈ Sλ+ 1
(Γ0(N), χ) ∩ Z[[q]].
Lemma 2.3 (see Chapter 3 in [8]). Suppose that g(z) =
n=1 ag(n)q
n ∈ Sλ+ 1
(Γ0(N), χ)
has coefficients in OK , the algebraic integers of some number field K. Furthermore,
suppose that λ ≥ 1 and that m ⊂ OK is an ideal norm M .
(1) Then, a positive proportion of the primes Q ≡ −1 (mod 4MN) has the property
g(z)|T (Q2, λ, χ) ≡ 0 (mod m).
(2) Then a positive proportion of the primes Q ≡ 1 (mod 4MN) has the property that
g(z)|T (Q2, λ, χ) ≡ 2g(z) (mod m).
We can now prove Theorem 1.
Proof of Theorem 1. From Lemma 2.2, there exists a cusp form
f̃(z) ∈ Sλ+(p+1)(p−1)+ 1
(Γ0(N), χ) ∩ Z[[q]]
such that
f̃(z) ≡ θp−1(f(z)) (mod p).
Note that, for F (z) =
n=0 aF (n)q
n ∈ Mk+ 1
(Γ0(N), χ) and each prime Q ∤ N , the
half-integral weight Hecke operator T (Q2, λ, χ) is defined as
(2.5)
F (z)|T (Q2, k, χ)
aF (Q
2n) + χ∗(Q)
Qk−1aF (n) + χ
∗(Q2)Q2k−1aF (n/Q
DISTRIBUTION OF INTEGRAL FOURIER COEFFICIENTS MODULO PRIMES 7
where χ∗(n) := χ∗(n)
(−1)k
and aF (n/Q
2) = 0 if Q2 ∤ n. If F (z)|T (Q2, k, χ) ≡ 0
(mod p) for a prime Q ∤ N , then we have
aF (Q
2 ·Qn) + χ∗(Q)
Qk−1aF (Qn) + χ
∗(Q2)Q2k−1aF
Qn/Q2
≡ aF (Q3n) ≡ 0 (mod p)
for every positive integer n such that gcd(Q, n) = 1. Thus, we have the following by
Lemma 2.3-(1):
♯{ 1 ≤ n ≤ X | a(n) ≡ 0 (mod p) and gcd(p, n) = 1} ≫ X.
We apply Theorem 2.1 with f̃(z). Then the purpose of the remaining part of the proof
is to show the following: if gcd(p, 4N) = 1, an odd prime ℓ divides some ni, and
(2.6) θp−1(f(z)) ≡
a(nim
2)qnim
(mod p),
then p|(ℓ− 1)ℓ(ℓ+ 1)N or ℓ | N . We assume that there exists a prime ℓ1 such that ℓ1|n1,
p ∤ (ℓ1 − 1)ℓ1(ℓ1 + 1)N and ℓ | N . We also assume that nt = 1 and that ni ∤ n1 for every
i, 2 ≤ i ≤ t − 1. Then, we can take a prime ℓi for each i, 2 ≤ i ≤ t − 1, such that ℓi|ni
and ℓi ∤ n1. For convention, we define
(−1)(n−1)2/8 if n is odd,
0 otherwise,
and χQ(d) :=
for a prime Q. Let ψ(d) :=
i=2 χℓi(d). We take a prime β such that
ψ(n1)χβ(n1) = −1. If we denote the ψ-twist of f̃(z) by f̃ψ(z) and the ψχβ-twist of f̃(z)
by f̃ψχβ(z), then
f̃ψχ2
(z)− f̃ψχβ(z) ≡ 2
gcd(m,β
ℓj)=1
a(n1m
2)qn1m
(mod p)
and f̃ψχβ(z) ∈ Sλ+(p+1)(p−1)+ 1
(Γ0(Nα
2β2), χ) ∩ Z[[q]] (see Chapter 3 in [8]). Note that
gcd(Nα2β2, p) = gcd(Nα2β2, ℓ1) = 1.
Thus, (f̃ψ(z)− f̃ψχβ(z))|T (ℓ21, λ+ (p+ 1)(p− 1), χ) satisfies the formula (2.2) of Theorem
2.1 for both of ǫ = 1 and ǫ = −1. This results in a contradiction since
(f̃ψ(z)− f̃ψχβ(z))|T (ℓ
1, λ+ (p+ 1)(p− 1), χ) 6≡ 0 (mod p)
and p ≥ 5. Thus, we complete the proof. �
8 D. CHOI
3. Proofs of Theorem 2, 3, and 4
3.1. Proof of Theorem 2. Note that h(z) =
η(z)2
η(2z)
·E4(4z)
η(4z)6
is a meromorphic modular form.
In [2] it was obtained a holomorphic modular form on Γ0(4p
2) whose Fourier coefficients
generate traces of singular moduli modulo p (see the formula (3.1) and (3.2)). Since the
level of this modular form is not relatively prime to p, we need the following proposition.
Proposition 3.1 ([1]). Suppose that p ≥ 5 is a prime. Also, suppose that p ∤ N , j ≥ 1 is
an integer, and
g(z) =
a(n)qn ∈ Sλ+ 1
(Γ0(Np
j)) ∩ Z[[q]].
Then, there exists a cusp form G(z) ∈ Sλ′+ 1
(Γ0(N)) ∩ Z[[q]] such that
G(z) ≡ g(z) (mod p),
where λ′ + 1
= (λ+ 1
)pj + pe(p− 1) for a sufficiently large e ∈ N.
Using Theorem 1 and Proposition 3.1, we give the proof of Theorem 2.
Proof of Theorem 2. Let
(3.1) h1,p(z) := h(z)−
hχp(z),
where hχp(z) is the χp-twist of h(z). From (1.2), we have
h1,p(z) := −2 −
0<d≡0,3 (mod 4)
t1(d)q
d − 2
0<d≡0,3 (mod 4)
(−dp )=−1
t1(d)q
hm,p(z) := h1,p(z)|T (m2, 1, χ0)
= −2 −
0<d≡0,3 (mod 4)
tm(d)q
d − 2
0<d≡0,3 (mod 4)
(−dp )=−1
tm(d)q
for every positive integer m. Let
Fp(z) :=
η(4z)p
η(4pz)
It was proved in [2] that if α is a sufficiently large positive integer, then h1,p(z)Fp(z)
(Γ0(4p
2)) and
(3.2) h1,p(z)Fp(z)
α ≡ h1,p(z) (mod p),
DISTRIBUTION OF INTEGRAL FOURIER COEFFICIENTS MODULO PRIMES 9
where k0 = α · p
. Lemma 2.2 and Proposition 3.1 imply that there exists f1,p(z) ∈
Sλ′+ 1
(Γ0(4)) ∩ Z[[q]] such that
f1,p(z) ≡ −2
0<d≡0,3 (mod 4)
(−dp )=−1
tm(d)q
d (mod p),
where λ′ = (k0 + 1 + (p− 1)(p+ 1) + 12)p
2 + pe(p− 1) for a sufficiently large e ∈ N.
We assume that the coefficients of f1,p(z) do not satisfy Property A for an odd prime
p ≡ 2 (mod 3). Note that
= −1 and that p ∤ (3−1)3(3+1). So, Theorem 1 implies
2t1(3) ≡ 0 (mod p).
This results in a contradiction since 2t1(3) = 2
4 ·31. Thus, we obtain a proof when m = 1.
For every odd prime ℓ, we have
f1,p(z)|T (ℓ2, λ′, χ0) ≡ θp−1(h1,p(z))|T (ℓ2, λ′, χ0)
≡ θp−1(h1,p(z)|T (ℓ2, 1, χ0)) ≡ θp−1(hℓ,p(z)) (mod p).
Moreover, Lemma 2.3 implies that a positive proportion of the primes ℓ satisfies the
property
f1,p(z)|T (ℓ2, λ′, χ0) ≡ 2f1,p (mod p).
This completes the proof. �
3.2. Proofs of Theorem 3. The following theorem gives the formula for the Hurwitz
class number in terms of the Fourier coefficients of a modular form of half integral weight.
Theorem 3.2. Let T (z) := 1 + 2
n=1 q
n2. If integers r3(n) are defined as
r3(n)q
n := T (z)3,
r(n) =


12H(−4n) if n ≡ 1, 2 (mod 4),
24H(−n) if n ≡ 3 (mod 8),
r(n/4) if n ≡ 0 (mod 4),
0 if n ≡ 7 (mod 8).
Note that T (z) is a half integral weight modular form of weight 1
on Γ0(4). Combining
Theorem 1 and Theorem 3.2, we derive the proof of Theorem 3.
Proof of Theorem 3. Let G(z) be the
-twist of T (z)3. Then, from Theorem 3.2, we
G(z) = 1 +
n≡1 (mod 4)
12H(−4n)qn +
n≡3 (mod 8)
24H(−n)qn
10 D. CHOI
and G(z) ∈ M 3
(Γ0(16)). Note that 24H(−3) = 8. This gives the complete proof by
Theorem 1. �
3.3. Proofs of Theorem 4. In the following, we prove Theorem 4.
Proof of Theorem 4. Let
W (z) :=
η(2z)
η(z)2
It is known that
W (z) =
P̄ (n)qn
and that W (z) is a weakly holomorphic modular form on Γ0(16). Let
G(z) :=
W (z)−
Wχp(z)
Fp(z)
where Fp(z) =
η(4z)p
η(4p2z)
and β are positive integers. Then we have
G(z) ≡ 2
(−np )=−1
P̄ (n)qn +
P̄ (n)qn (mod p).
We claim that there exists a positive integer β such that G(z) is a holomorphic modular
form of half integral weight on Γ0(16p
2). To prove our claim, we follow the arguments
of Ahlgren and Ono ([1], Lemma 4.2). Note that, by a well-known criterion, Fp(z) is a
holomorphic modular form on Γ0(4p
2) that vanishes at each cusp a
∈ Q for which p2 ∤ c
(see [7]). This implies that G(z) is a weakly holomorphic modular form on Γ0(16p
2). If β
is sufficiently large, then G(z) is holomorphic except at each cusp a
for which p2|c′.
Thus, we prove that G(z) is holomorphic at 1
for 0 ≤ m ≤ 3. Let, for odd d,
ǫd :=
1 if d ≡ 1 (mod 4),
i if d ≡ 3 (mod 4).
If f(z) is a function on the complex upper half plane, λ ∈ Z, and γ = ( a bc d ) ∈ Γ0(4), then
we define the usual slash operator by
f(z) |λ+ 1
)2λ+1
ǫ−1−2λd (cz + d)
−λ− 1
az + b
cz + d
Let g :=
e2πiv/p be the usual Gauss sum. Note that
Wχp(z) =
W (z)|− 1
1 −v/p
Choose an integer kv satisfying
16kv ≡ 15v (mod p).
DISTRIBUTION OF INTEGRAL FOURIER COEFFICIENTS MODULO PRIMES 11
Then, we have
(3.3)
2mp2 1
= γv,m
2mp2 1
1 −16v
+ 16kv
where
γv,m =
1− 2m+4p(v + kv + 2mv2p− 2mvkvp) 1p(15v − 16kv − 2
m+4(v2p+ vkvp))
22mp2(−16vp+ 16kvp) 2m+4vp− 2m+4kvp+ 1
Note that W (z) has its only pole at z ∼ 0 up to Γ0(16). Since γv,m ∈ Γ0(16), the
formula (3.3) implies that Wχp(z) is holomorphic at 2
mp2 for 1 ≤ m ≤ 3. Thus, G(z) is
holomorphic at 2mp2 for 1 ≤ m ≤ 3.
If m = 0, then we have
W (z)|− 1
γv,0 =
−16vp3 + 16kvp3
16vp− 16kvp+ 1
W (z) =
p2(−vp+ kvp)
16vp− 16kvp + 1
W (z) = W (z).
Note that
(3.4) W (z)|− 1
= α · q−
16 +O(1),
where α is a nonzero complex number. The q-expansion of Wχp(z) at
is given by
(3.5) Wχp(z)|− 1
Using (3.3) and (3.4), the only term in (3.5) with a negative exponent on q is the term
(v−kv).
If N is defined by 16N ≡ 1 (mod p), then we have
(v−kv) =
Thus, we have that
(W (z)−Wχp(z))|− 1
= O(1).
This implies that G(z) is a holomorphic modular form of half integral weight on Γ0(16p
Noting that
P̄ (3) = 8,
the remaining part of the proof is similar to that in Theorem 3. Thus, it is omitted. �
12 D. CHOI
References
[1] S. Ahlgren and M. Boylan, Central Critical Values of Modular L-functions and Coeffients of Half
Integral Weight Modular Forms Modulo ℓ, to appear in Amer. J. Math.
[2] S. Ahlgren and K. Ono, Arithmetic of singular moduli and class polynomials, Compos. Math. 141
(2005), no. 2, 293–312.
[3] J. H. Bruinier and K. Ono, Coefficients of half-integral weight modular forms, J. Number Theory 99
(2003), no. 1, 164–179.
[4] H. Cohen, Sums involving the values at negative integers of L-functions of quadratic characters,
Math. Ann. 217 (1975), no. 3, 271–285.
[5] N. Koblitz, Introduction to elliptic curves and modular forms, Springer-Verlag New York, GTM 97,
1993.
[6] S. Lang, Introduction to Modular Forms, Grundl. d. Math. Wiss. no. 222, Springer: Berlin Heidelberg
New York, 1976 Berlin, 1995.
[7] B. Gordon and K. Hughes, Multiplicative properties of eta-product, Cont. Math. 143 (1993), 415-430.
[8] K. Ono, The web of modularity: arithmetic of the coefficients of modular forms and q-series, Amer.
Math. Soc., CBMS Regional Conf. Series in Math., vol. 102, 2004.
[9] J.-P. Serre, Divisibilite de certaines fonctions arithmetiques, Enseignement Math. (2) 22 (1976), no.
3-4, 227–260.
[10] S. Treneer, Congruences for the Coefficients of Weakly Holomorphic Modular Forms, to appear in
the Proceedings of the London Mathematical Society.
[11] D. Zagier, Traces of singular moduli, Motives, polylogarithms and Hodge theory, Part I, Int. Press
Lect. Ser., 3, I, Int. Press, Somerville, MA, 2002, pp.211-244.
School of Mathematics, KIAS, 207-43 Cheongnyangni 2-dong 130-722, Korea
E-mail address : choija@postech.ac.kr
	1. Introduction and Results
	2. Proof of Theorem ??
	3. Proofs of Theorem ??, ??, and ??
	3.1. Proof of Theorem ??
	3.2. Proofs of Theorem ??
	3.3. Proofs of Theorem ??
	References
ABSTRACT
  Recently, Bruinier and Ono classified cusp forms $f(z) := \sum_{n=0}^{\infty}
a_f(n)q ^n \in S_{\lambda+1/2}(\Gamma_0(N),\chi)\cap \mathbb{Z}[[q]]$ that does
not satisfy a certain distribution property for modulo odd primes $p$. In this
paper, using Rankin-Cohen Bracket, we extend this result to modular forms of
half integral weight for primes $p \geq 5$. As applications of our main theorem
we derive distribution properties, for modulo primes $p\geq5$, of traces of
singular moduli and Hurwitz class number. We also study an analogue of Newman's
conjecture for overpartitions.

<|endoftext|><|startoftext|>
Introduction and Statement of Main Results
Serre obtained the p-adic limits of the integral Fourier coefficients of modular forms on
SL2(Z) for p = 2, 3, 5, 7 (see Théorème 7 and Lemma 8 in [20]). In this paper, we extend
the result of Serre to weakly holomorphic modular forms of half integral weight on Γ0(4N)
forN = 1, 2, 4. The proof is based on linear relations among Fourier coefficients of modular
forms of half integral weight. As applications of our main result, we obtain congruences
for various modular objects, such as those for Borcherds exponents, for Fourier coefficients
of quotients of Eisentein series and for Fourier coefficients of Siegel modular forms on the
Maass Space.
For odd d, let
:= γtΓ0(4N)tγ
where γt = (
c d ) ∈ Γ(1) and γt(t) = ∞. We denote the q-expansion of a modular form
f ∈Mλ+ 1
(Γ0(4N)) at each cusp t of Γ0(4N) by
(1.1) (f |λ+ 1
γt)(z) = (cz + d)
−λ− 1
az + b
cz + d
atf (n)q
t , qt := q
where
(1.2) r(t) ∈
2000 Mathematics Subject Classification. 11F11,11F33.
Key words and phrases. modular forms, p-adic limit, Borcherds exponents, Maass space .
This work was partially supported by KOSEF R01-2003-00011596-0 , ITRC and BRSI-POSTECH.
http://arxiv.org/abs/0704.0013v2
2 D. CHOI AND Y. CHOIE
When t ∼ ∞, we denote atf (n) by af (n). Note that the number r(t) is independent of the
choice of f ∈Mλ+ 1
(Γ0(4N)) and λ. We call t a regular cusp if r(t) = 0 (see Chapter IV.
§1. of [15] for a more general definition of a λ-regular cusp ).
Remark 1.1. Our definition of a regular cusp is different from the usual one.
Let U4N := {t1, · · · , tν(4N)} be the set of all inequivalent regular cusps of Γ0(4N). Note
that the genus of Γ0(4N) is zero if and only if 1 ≤ N ≤ 4. LetMλ+ 1
(Γ0(4N)) be the space
of weakly holomorphic modular forms of weight λ + 1
on Γ0(4N) and let M0λ+ 1
(Γ0(N))
denote the set of f(z) ∈ Mλ+ 1
(Γ0(N)) such that the constant term of its q-expansion at
each cusp is zero. Let Up be the operator defined by
(f |Up)(z) :=
af(pn)q
Let OL be the ring of integers of a number field L with a prime ideal p ⊂ OL. For
f(z) :=
af(n)q
n and g(z) :=
ag(n)q
n ∈ L[[q−1, q]] we write
f(z) ≡ g(z) (mod p)
if and only if af (n)− ag(n) ∈ p for every integer n.
With these notations we state the following theorem.
Theorem 1. For N = 1, 2, 4 consider
f(z) :=
af (n)q
n ∈ M0
(Γ0(4N)) ∩ L[[q−1, q]].
Suppose that p ⊂ OL is any prime ideal such that p|p, p prime, and that af(n) is p-integral
for every integer n ≥ n0.
(1) If p = 2 and af (0) = 0, then there exists a positive integer b such that
(f |(Up)b)(z) ≡ 0 (mod pj) for each j ∈ N.
(2) If p ≥ 3 and f(z) ∈ M0
(Γ0(4N)) with λ ≡ 2 or 2+
(mod p−1
), then there
exists a positive integer b such that
(f |(Up)b)(z) ≡ 0 (mod pj) for each j ∈ N.
Remark 1.2. The p-adic limit of a sum of Fourier coefficients of f ∈ M 3
(Γ0(4N)) was
studied in [13].
Our method only allows to prove a weaker result if f(z) 6∈ M0
(Γ0(4N)).
THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 3
Theorem 2. For N = 1, 2 or 4, let
f(z) :=
af (n)q
n ∈ Mλ+ 1
(Γ0(4N)) ∩ L[[q−1, q]].
Suppose that p ⊂ OL is any prime ideal with p|p, p prime, p ≥ 5, and that af (n) is
p-integral for every integer n ≥ n0. If λ ≡ 2 or 2 +
(mod p−1
), then there exists a
positive integer b0 such that
p2b−m(p:λ)
t∈U4N
∆4N,3−α(p:λ)(z)
R4N (z)
e·ω(4N)
(0)atf (0) (mod p)
for every positive integer b > b0 (see Section 3 for detailed notation ).
Example 1.3. Recall that the generating function of the overpartition P̄ (n) of n(see [11])
P̄ (n)qn =
η(2z)
η(z)2
is in M− 1
(Γ0(16)), where η(z) := q
n=1(1− qn). Therefore, theorem 2 implies that
P̄ (52b) ≡ 1 (mod 5), ∀b ∈ N.
2. Applications: More Congruences
In this section, we study congruences for various modular objects such as those for
Borcherds exponents and for quotients of Eisenstein series.
2.1. p-adic Limits of Borcherds Exponents. Let MH denote the set of meromorphic
modular forms of integral weight on SL2(Z) with Heegner divisor, integer coefficients and
leading coefficient 1. Let
(Γ0(4)) := {f(z) =
af(n)q
n ∈ M 1
(Γ0(4)) | a(n) = 0 for n ≡ 2, 3 (mod 4)}.
If f(z) =
af(n)q
n ∈ M+1
(Γ0(4)), then define Ψ(f(z)) by
Ψ(f(z)) := q−h
(1− qn)af (n2),
where h = − 1
af(0) +
1<n≡0,1 (mod 4) af (−n)H(−n). Here H(−n) denotes the usual
Hurwitz class number of discriminant −n. The following was proved by Borcherds.
Theorem 2.1 ([4]). The map Ψ is an isomorphism from M+1
(Γ0(4)) to MH , and the
weight of Ψ(f(z)) is af (0).
4 D. CHOI AND Y. CHOIE
Let j(z) be the usual j-invariant function with the product expansion
j(z) = q−1
(1− qn)A(n).
Let F (z) := q−h
n=1(1 − qn)c(n) be a meromorphic modular form of weight k in MH .
The p-adic limit of
d|n d · c(d) was studied in [5] for p = 2, 3, 5, 7. Here we obtain the
p-adic limit of c(d) for p = 2, 3, 5, 7.
Theorem 3. Let F (z) := q−h
n=1(1− qn)c(n) be a meromorphic modular form of weight
k in MH .
(1) If p = 2, then for each j ∈ N there exists a positive integer b such that
c(mpb) ≡ 2k (mod pj)
for every positive integer m.
(2) If p ∈ {3, 5, 7}, then, for each j ∈ N there exists a positive integer b such that
5c(mpb)−̟(F )A(mpb) ≡ 10k (mod pj)
for every positive integer m. Here, ̟(F ) is a constant determined by the constant
term of the q-expansion of Ψ−1(F ) at 0.
2.2. Sums of n-Squares. For u ∈ Z>0, let
rn(u) := ♯{(s1, · · · , sn) ∈ Zn : s21 + · · ·+ s2n = u}.
Theorem 4. Suppose that p ≥ 5 is a prime. If λ ≡ 2 or 3 (mod p−1
), then there exists
a positive integer C0 such that
r2λ+1
p2b−m(p:λ)
≡ − (14− 4α (p : λ)) + 16
)[ λp−1 ]+α(p:λ)m(p:λ)
(mod p),
for every b > C0.
Remark 2.2. As for an example, if λ ≡ 2 (mod p− 1) and p is an odd prime, then there
exists a positive integer C0 such that
r2λ+1
≡ 10 (mod p), ∀b > C0
2.3. Quotients of Eisenstein Series. Congruences for the coefficients of quotients of
elliptic Eisenstein series have been studied in [3]. Let us consider the Cohen Eisenstein
series Hr+ 1
(z) :=
N=0H(r,N)q
n of weight r+ 1
, r ≥ 2 (see [7]). We derive congruences
for the coefficients of quotients of Hr+ 1
(z) and Eisenstein series.
THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 5
Theorem 5. Let
F (z) :=
E4(z)
aF (n)q
G(z) :=
E6(z)
aG(n)q
W (z) :=
E6(z)
aW (n)q
Then there exists a positive integer C0 such that
aF (11
2b+1) ≡ 1 (mod 11),
aG(11
2b+1) ≡ 6 (mod 11),
aW (11
2b+1) ≡ 2 (mod 11)
for every integer b > C0.
2.4. The Maass Space. Next we deal with congruences for the Fourier coefficients of
a Siegel modular form in the Maass space. To define the Maass space, let us introduce
notations given in [17]: let T ∈ M2g(Q) be a rational, half-integral, symmetric, non-
degenerate matrix of size 2g with discriminant
DT := (−1)g det(2T ).
Let DT = DT,0f
T , where DT,0 is the corresponding fundamental discriminant. Further-
more, let
G8 :=

2 0 −1 0 0 0 0 0
0 2 0 −1 0 0 0 0
−1 0 2 −1 0 0 0 0
0 −1 −1 2 −1 0 0 0
0 0 0 −1 2 −1 0 0
0 0 0 0 −1 2 −1 0
0 0 0 0 0 −1 2 −1
0 0 0 0 0 0 −1 2

and G7 be the upper (7, 7)-submatrix of G8. Define
Sg :=
(g−1)/8
2, if g ≡ 1 (mod 8),
(g−7)/8
G7, if g ≡ −1 (mod 8).
6 D. CHOI AND Y. CHOIE
For each m ∈ N such that (−1)gm ≡ 0, 1 (mod 4), define a rational, half-integral, sym-
metric, positive definite matrix Tm of size 2g by
Tm :=


0 m/4
, if m ≡ 0 (mod 4),
e2g−1
e′2g−1 [m+ 2 + (−1)n]/4
, if m ≡ (−1)g (mod 4)
Here e2g−1 ∈ Z(2n−1,1) is the standard column vector and e′2g−1 is its transpose.
Definition 2.3. (The Maass Space) Take g, k ∈ N such that g ≡ 0, 1 (mod 4) and
g ≡ k (mod 2). Let
SMaassk+g (Γ2g)
F (Z) =
A(T )qtr(TZ) ∈ Sk+g(Γ2g)
∣∣∣∣∣∣
A(T ) =
ak−1φ(a;T )A(T|DT |/a2)
(see (6.2) for details). This space is called the Maass space of genus 2g and weight g + k.
In [17] it was proved that the Maass space is the same as the image of the Ikeda lifting
when g ≡ 0, 1 (mod 4). Using this fact together with Theorem 1, we derive the following
congruences for the Fourier coefficients of F (Z) in SMaassk+g (Γ2g).
Theorem 6. For g ≡ 0, 1 (mod 4), let
F (Z) :=
A(T )qtr(TZ) ∈ SMaassk+g (Γ2g)
with integral coefficients A(T ), T > 0. If k ≡ 2 or 3 (mod p−1
) for some prime p, then,
for each j ∈ N, there exists a positive integer b for which
A(T ) ≡ 0 (mod pj)
for every T > 0, det(2T ) ≡ 0 (mod pb).
This paper is organized as follows. Section 3 gives a linear relation among Fourier
coefficients of modular forms of half integral weight. The remaining sections contain
detailed proofs of the main theorems.
3. Linear Relation among Fourier Coefficients of modular forms of Half
Integral Weight
Let V (N ; k, n) be the subspace of Cn generated by the first n coefficients of the q-
expansion of f at ∞ for f ∈ Sk(Γ0(N)), where Sk(Γ0(N)) denotes the space of cusp forms
of weight k ∈ Z on Γ0(N). Let L(N ; k, n) be the orthogonal complement of V (N ; k, n)
THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 7
in Cn with the usual inner product of Cn. The vector space L(1; k, d(k) + 1), d(k) =
dim(Sk(Γ(1))), was studied by Siegel to evaluate the value of the Dedekind zeta function
at a certain point. The vector space L(1; k, n) is explicitly described in terms of the
principal part of negative weight modular forms in [9]. These results were extended in [8]
to the groups Γ0(N) of genus zero. For 1 ≤ N ≤ 4, let
4N, λ+
at1f (0), · · · , a
tν(4N)
f (0), af(1), · · · , af(n)
∈ Cn+ν(4n)
∣∣∣ f ∈Mλ+ 1
(Γ0(4N))
where U4N := {t1, · · · , tν(4N)} is the set of all inequivalent regular cusps of Γ0(4N). We
define EL(4N, λ+ 1
;n) to be the orthogonal complement of EV (4N, λ+ 1
;n) in Cn+ν(4N).
Let ∆4N,λ := q
δλ(4N)+O(qδλ(4N)+1) be inMλ+ 1
(Γ0(4N) with the maximum order at ∞,
that is, its order at ∞ is bigger than that of any other modular form of the same level
and weight. Furthermore, let
R4(z) :=
η(4z)8
η(2z)4
, R8(z) :=
η(8z)8
η(4z)4
R12(z) :=
η(12z)12η(2z)2
η(6z)6η(4z)4
and R16(z) :=
η(16z)8
η(8z)4
For ℓ, n ∈ N, define
m(ℓ : n) :=
≡ 0 (mod 2)
≡ 1 (mod 2)
α(ℓ : n) := n− ℓ− 1
Let ω(4N) be the order of zero of R4N (z) at ∞. Note that R4N (z) ∈ M2(Γ0(4N)) has
its only zero at ∞. So, using the definition of η(z) = q 124
n=1(1− qn), we find that
(3.1) ω(4) = 1, ω(8) = 2, ω(12) = 4, ω(16) = 4.
For each g ∈Mr+ 1
(Γ0(4N)) and e ∈ N, let
(3.2)
R4N (z)e
e·ω(4N)∑
b(4N, e, g; ν)q−ν +O(1) at ∞.
With these notations we state the following theorem:
Theorem 3.1. Suppose that λ ≥ 0 is an integer and 1 ≤ N ≤ 4. For each e ∈ N
such that e ≥ λ
− 1, take r = 2e − λ + 1. The linear map Φr,e(4N) : Mr+ 1
(Γ0(4N)) →
8 D. CHOI AND Y. CHOIE
EL(4N, λ+ 1
; e · ω(4N)), defined by
Φr,e(4N)(g)
R4N (z)
(0), · · · , htν(4N)a
tν(4N)
R4N (z)
(0), b(4N, e, g; 1), · · · , b(4N, e, g; e · ω(4N))
is an isomorphism.
Proof of Theorem 3.1. Suppose that G(z) is a meromorphic modular form of weight 2 on
Γ0(4N). For τ ∈ H∪C4N , let Dτ be the image of τ under the canonical map from H∪C4N
to a compact Riemann surface X0(4N). Here H is the usual complex upper half plane,
and C4N denotes the set of all inequivalent cusps of Γ0(4N). The residue ResDτGdz of
G(z) at Dτ ∈ X0(4N) is well-defined since we have a canonical correspondence between a
meromorphic modular form of weight 2 on Γ0(4N) and a meromorphic 1-form of X0(4N).
If ResτG denotes the residue of G at τ on H, then
ResDτGdz =
ResτG.
Here lτ is the order of the isotropy group at τ . The residue of G at each cusp t ∈ C4N is
(3.3) ResDtGdz = ht ·
atG(0)
Now we give a proof of Theorem 3.1.
To prove Theorem 3.1, take
G(z) =
R4N (z)e
f(z),
where g ∈Mr+ 1
(Γ0(4N)) and f(z) =
n=1 af(n)q
n ∈Mλ+ 1
(Γ0(4N)). Note that G(z) is
holomorphic on H. Since g(z), R4N (z) and f(z) are holomorphic and R4N (z) has no zero
on H, it is enough to compute the residues of G(z) only at all inequivalent cusps to apply
the Residue Theorem. The q-expansion of
R4N (z)
ef(z) at ∞ is
R4N(z)e
f(z) =
e·ω(4N)∑
b(4N, e, g; ν)q−ν + a g(z)
R4N (z)
(0) +O(q)
af(n)q
Since R4N (z) has no zero at t ≁ ∞, we have
R4N (z)e
γt = a
R4N (z)
(0)af(0) +O(qt).
Further note that, for an irregular cusp t,
at g(z)
R4N (z)
(0)af(0) = 0.
THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 9
So the Residue Theorem and (3.3) imply that
(3.4)
t∈U4N
e·ω(4N)
(0)atf(0) +
e·ω(4N)∑
b(4N, e, g; ν)af(ν) = 0.
This shows that Φr,e(4N) is well-defined. The linearity of the map Φr,e(4N) is clear.
It remains to check that Φr,e(4N) is an isomorphism. Since there exists no holomorphic
modular form of negative weight except the zero function, we obtain the injectivity of
Φr,e(4N). Note that for e ≥ λ−12 ,
4N ;λ+
, e · ω(4N)
= e · ω(4N) + ν(4N)− dimC
Mλ+ 1
(Γ0(4N))
However, the set C4N , 1 ≤ N ≤ 4, of all inequivalent cusps of Γ0(4N) are
∞, 0, 1
∞, 0, 1
C12 =
∞, 0, 1
C16 =
∞, 0, 1
and it can be checked that
(3.5) ν(4) = 2, ν(8) = 3, ν(12) = 4, ν(16) = 6
(see §1 of Chapter 4. in [15] for details). The dimension formula of Mλ+ 1
(Γ0(4N)) (see
Table 1) together with the results in (3.1) and (3.5), implies that
4N, λ+
; e · ω(N)
= dimC(Mr+ 1
(Γ0(4N)))
since r = 2e− λ+ 1.
Table 1. Dimension Formula for Mk(Γ0(4N))
N k = 2n + 1
k = 2n+ 3
k = 2n
N = 1 n + 1 n + 1 n + 1
N = 2 2n+ 1 2n+ 2 2n+ 1
N = 3 4n+ 1 4n+ 3 4n+ 1
N = 4 4n+ 2 4n+ 4 4n+ 1
So Φr,e(4N) is surjective since the map Φr,e(4N) is injective. This completes our claim.
10 D. CHOI AND Y. CHOIE
4. Proofs of Theorem 1 and 2
4.1. Proof of Theorem 1. First, we obtain linear relations among Fourier coefficients
of modular forms of half integral weight modulo p. Let
Op := {α ∈ L | α is p-integral}.
M̃λ+ 1
, p(Γ0(4N)) := {H(z) =
aH(n)q
n ∈ Op/pOp[[q−1, q]] |
H ≡ h (mod p) for some h ∈ Op[[q−1, q]] ∩Mλ+ 1
(Γ0(4N))}.
S̃λ+ 1
, p(Γ0(4N)) := {H(z) =
aH(n)q
n ∈ Op/pOp[[q−1, q]] |
H ≡ h (mod p) for some h ∈ Op[[q−1, q]] ∩ Sλ+ 1
(Γ0(4N))}.
The following lemma gives the dimension of M̃λ+ 1
, p(Γ0(4N)).
Lemma 4.1. Take λ ∈ N, 1 ≤ N ≤ 4 and a prime p such that
p ≥ 3 if N = 1, 2, 4,
p ≥ 5 if N = 3.
Now take any prime ideal p ⊂ OL, p|p. Then
dim M̃λ+ 1
, p(Γ0(4N)) = dimMλ+ 1
(Γ0(4N))
dim S̃λ+ 1
, p(Γ0(4N)) = dimSλ+ 1
(Γ0(4N)).
Proof. Let
j4N (z) = q
−1 +O(q)
be a meromorphic modular function with a pole only at ∞. Explicitly, these functions
j4(z) =
η(z)8
η(4z)8
+ 8, j8(z) =
η(4z)12
η(2z)4η(8z)8
j12(z) =
η(4z)4η(6z)2
η(2z)2η(12z)4
, j16(z) =
η2(z)η(8z)
η(2z)η2(16z)
Since the Fourier coefficients of η(z) and 1
are integral, the q-expansion of j4N (z) has
integral coefficients.
Recall that ∆4N,λ = q
δλ(4N) + O(qδλ(4N)+1) is the modular form of weight λ + 1
Γ0(4N) such that the order of its zero at ∞ is higher than that of any other modular form
THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 11
of the same level and weight. Denote the order of zero of ∆4N,λ at ∞ by δλ(4N). Then
the basis of Mλ+ 1
(Γ0(4N)) can be chosen as
(4.1) {∆4N,λ(z)j4N (z)e | 0 ≤ e ≤ δλ(4N)} .
If ∆4N,λ(z) is p-integral, then {∆4N,λ(z)j4N (z)e | 0 ≤ e ≤ δλ(4N)} also forms a basis of
M̃λ+ 1
,p(Γ0(4N)). Note that δλ(4N) = dimMλ+ 1
(Γ0(4N))− 1. So from Table 1 we have
(4.2) ∆4N,λ(z) = ∆4N,j(z)R4N (z)
where λ ≡ j (mod 2), j ∈ {0, 1}. More precisely, one can choose ∆4N,j(z) as followings:
∆4,0(z) = θ(z), ∆4,1(z) = θ(z)
∆8,0(z) = θ(z), ∆8,1(z) =
(θ(z)3 − θ(z)θ(2z)2) ,
∆12,0(z) = θ(z), ∆12,1(z) =
x,y,z∈Z q
3x2+2(y2+z2+yz) −
x,y,z∈Z q
3x2+4y2+4z2+4yz
∆16,0(z) =
(θ(z)− θ(4z)) , ∆16,1(z) = 18 (θ(z)
3 − 3θ(z)2θ(4z) + 3θ(z)θ(4z)2 − θ(4z)3) .
Since θ(z) = 1+ 2
n=1 q
n, the coefficients of the q-expansion of ∆4N,j(z), j ∈ {0, 1}, are
p-integral. This completes the proof. �
Remark 4.2. The proof of Lemma 4.1 implies that the spaces of Mλ+ 1
(Γ0(4N)) for
N = 1, 2, 4 are generated by eta-quotients since θ(z) =
η(2z)5
η(z)2η(4z)2
For 1 ≤ N ≤ 4 set
4N, λ+
(af(1), · · · , af(n)) ∈ Fnp | f ∈ S̃λ+ 1
(Γ0(4N))
,Fp := Op/pOp.
We define L̃S(4N, λ +
;n) to be the orthogonal complement of ṼS(4N, λ +
;n) in Fn
Using Lemma 4.1, we obtain the following proposition.
Proposition 4.3. Suppose that λ is a positive integer and 1 ≤ N ≤ 4. For each e ∈ N,
e ≥ λ
− 1, take r = 2e−λ+1. The linear map ψ̃r,e(4N) : M̃r+ 1
,p(Γ0(4N)) → L̃S(4N, λ+
; e · ω(4N)), defined by
ψ̃r,e(4N)(g) = (b(4N, e, g; 1), · · · , b(N, e, g; e · ω(4N))) ,
is an isomorphism. Here b(4N, e, g; ν) is defined in (3.2).
Proof. Note that dimS 3
(4N) = 0 and that
dimSλ+ 1
(4N) +N + 1 +
= dimMλ+ 1
(see [10]). So, from Lemma 4.1 and Table 1, it is enough to show that ψr,e(4N) is injective.
If g is in the kernel of ψr,e(4N), then
R4N (z)
e · R4N (z)e ≡ 0 (mod p) by Sturm’s formula
(see [21]). So we have g(z) ≡ 0 (mod p) since R4N(z)e 6≡ 0 (mod p). This completes the
proof. �
12 D. CHOI AND Y. CHOIE
Theorem 4.4. Take a prime p,N = 1, 2, 4 and
f(z) :=
af(n)q
n ∈ Sλ+ 1
(Γ0(4N)) ∩ L[[q]].
Suppose that p ⊂ OL is any prime ideal with p|p and that af (n) is p-integral for every
integer n ≥ n0. If λ ≡ 2 or 2 +
(mod p−1
) or p = 2, then there exists a positive
integer b such that
≡ 0 (mod p), ∀n ∈ N.
Proof of Theorem 4.4. i) First, suppose that p ≥ 3: Take positive integers ℓ and b such
(4.3)
3− 2α(p : λ)
p2b +
pm(p:λ) + ℓ(p− 1) = 2.
Note that if b is large enough, that is, b > logp
3−2α(p:λ)
pm(p:λ) − 2
, then there
exists a positive integer ℓ satisfying (4.3). Also note that atf(0) = 0 for every cusp t of
Γ0(4N) since f(z) is a cusp form. So, if r = 2e− α(p : λ) + 1, then Theorem 3.1 implies
that, for g(z) ∈ M̃r+ 1
(Γ0(4N)),
e·ω(4N)∑
b(4N, e, g; ν)af(νp
2b−m(p:λ)) ≡ 0 (mod p),
since
R4N (z)e
f(z)p
m(p:λ)
Eℓp−1(z)
e·ω(4N)∑
b(4N, e, g; ν)q−νp
+ a g(z)
R4N (z)
(0) +
a g(z)
R4N (z)
(n)qnp
af(n)q
npm(p:λ)
(mod p).
So Proposition 4.3 implies that
p2b−m(p:λ)
2p2b−m(p:λ)
, · · · , a
e · ω(4N)p2b−m(p:λ)
∈ ṼS
4N,α(p : λ) + 1
If α(p : λ) = 2 or 2 +
, then
dimSα(p:λ)+ 1
(Γ0(4N)) = dim ṼS
4N,α(p : λ) +
THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 13
ii) p = 2: Note that
∆4N,1(z)
R4N (z)
= q−1+O(1) for N = 1, 2, 4. So, there exists a polynomial
F (X) ∈ Z[X ] such that
F (j4N(z))
∆4N,1(z)
R4N (z)
= q−n +O(1).
For an integer b, 22
> λ+ 2, let
G(z) :=
F (j4N(z))
∆4N,1(z)
R4N(z)
f(z)θ(z)2
1+2b−2λ+3.
Since θ(z) ≡ 1 (mod 2), Theorem 3.1 implies that af(2b · n) ≡ 0 (mod p). �
To apply Theorem 4.4, we need the following two propositions.
Proposition 4.5 (Proposition 3.2 in [22]). Suppose that p is an odd prime, k and N are
integers with (N, p) = 1. Let
f(z) =
a(n)qn ∈ Mλ+ 1
(Γ0(4N)).
Suppose that ξ :=
cp2 d
, with ac > 0. Then there exist n0, h0 ∈ N with h0|N, a sequence
{a0(n)}n≥n0 and r0 ∈ {0, 1, 2, 3} such that
(f |Upm|λ+ 1
ξ)(z) =
4n+r0≡0 (mod p
a0(n)q
4n+r0
m , ∀m ≥ 1.
Proposition 4.6 (Proposition 5.1 in [1]). Suppose that p is an odd prime such that p ∤ N
and consider
g(z) =
a(n)qn ∈ Sλ+ 1
(Γ0(4Np
j)) ∩ L[[q]], for each j ∈ N.
Suppose further that p ⊂ OL is any prime ideal with p|p and that a(n) is p-integral for
every integer n ≥ 1. Then there exists G(z) ∈ Sλ′+ 1
(Γ0(4N)) ∩OL[[q]] such that
G(z) ≡ g(z) (mod p),
where λ′ + 1
= (λ+ 1
)pj + pe(p− 1) with eN large.
Remark 4.7. Proposition 4.6 was proved for p ≥ 5 in [1]. One can check that this holds
also for p = 3.
Now we prove Theorem 1.
Proof of Theorem 1. Take
Gp(z) :=
η(8z)48
η(16z)24
∈M12(Γ0(16)) if p = 2,
η(z)27
η(9z)3
∈M12(Γ0(9)) if p = 3,
η(4z)p
η(4p2z)
∈M p2−1
(Γ0(p
2)) if p ≥ 5.
14 D. CHOI AND Y. CHOIE
Using properties of eta-quotients (see [12]), note that Gp(z) vanishes at every cusp of
Γ0(16) except ∞ if p = 2, and vanishes at every cusp ac of Γ0(4Np
2) with p2 ∤ N if p ≥ 3.
Thus, Proposition 4.5 implies that there exist positive integers ℓ,m, k such that
(f |Upm)(z)Gp(z)ℓ ∈ Sk+ 1
(Γ0(16)) if p = 2,
(f |Upm)(z)Gp(z)ℓ ∈ Sk+ 1
(Γ0(4p
2N)) if p ≥ 3.
Note that k ≡ λ (mod p− 1). Using Proposition 4.6, we can find
F (z) ∈ Sk′+ 1
(Γ0(4N))
such that F (z) ≡ (f(z)|Upm)Gp(z)ℓ ≡ (f |Upm)(z) (mod p) and k′ ≡ k (mod p − 1).
Theorem 4.4 implies that there exists a positive integer b such that (F |Up2b)(z) ≡ 0
(mod p). Thus, we have shown so far that if ρ ∈ p \ p2, all the Fourier coefficients of
· F (z)|Upm+2b are p-integral. Repeat this argument to complete our claim. �
4.2. Proof of Theorem 2. Theorem 2 can be derived from Theorem 3.1 by taking a
special modular form.
Proof of Theorem 2. Take a positive integer ℓ and a positive even integer u such that
3− 2α(p : λ)
pm(p:λ) + ℓ(p− 1) = 2.
Let F (z) :=
∆4N,3−α(p:λ)(z)
R4N (z)
and G(z) := Ep−1(z)
ℓf(z)p
m(p:λ)
. Since Ep−1(z) ≡ 1
(mod p), we have
F (z)G(z) ≡
a∆4N,3−α(p:λ)(z)
R4N (z)
(n)qnp
af(n)q
nm(p:λ)
(mod p).
If Fourier coefficients of f(z) at each cusp are p-integral, then
((F ·G)|2γt) (z) ≡
atF (n)q
atG(n)q
atf (n)q
at∆4N,3−α(p:λ)(z)
R4N (z)
(mod p)
for t ≁ ∞. Since
aF (z)G(z)(0) ≡ a∆4N,3−α(p:λ)(z)
R4N (z)
(0)af (0) + af (p
u−m(p:λ)) (mod p) ,
F (z)G(z)
(0) ≡ at∆4N,3−α(p:λ)(z)
R4N (z)
(0)atf (0) (mod p) for t ≁ ∞,
for large u, the Residue Theorem implies Theorem 2 by letting u = 2b. Therefore it
is enough to check a p-integral property of Fourier coefficients of f(z) at each cusp:
take a positive integer e such that ∆(z)ef(z) is a holomorphic modular form, where
THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 15
∆(z) := q
n=1(1− qn)24. Note that the q-expansions of j4N (z) and ∆4N,12e+λ(z) at each
cusp are p-integral. Thus (4.1) implies that
∆(z)ef(z) =
δ12e+λ(4N)∑
cnj4N (z)
n∆4N,12e+λ(z).
Moreover, cn is p-integral since
j4N (z)
n∆4N,12e+λ(z) = q
δ12e+λ(4N)−n +O
qδ12e+λ(4N)−n+1
and f(z) ∈ OL[[q, q−1]]. Note that p ∤ 4N since 1 ≤ N ≤ 4 and p ≥ 5 is a prime.
So Fourier coefficients of j4N (z), ∆N,12e+λ(z) and
at each cusp are p-integral. This
completes our claim. �
5. Proof of Theorem 3
Theorem 3 follows from Theorem 1 and Theorem 2.1.
Proof of Theorem 3. Note that j(z) ∈ MH . Let
g(z) := Ψ−1(j(z)) and f(z) := Ψ−1(F (z)) =
af (n)q
It is known (see §14 in [4]) that
g(z) =
(θ(z))E10(4z)
4πi∆(4z)
θ(z) d
(E10(4z))
80πi∆(4z)
θ(z).
Since the constant terms of the q-expansions at ∞ of f(z), θ(z) and g(z) are 0, a0
(0) =
and a0g(0) =
· 456
, respectively, we have
f(z)− kθ(z)−
a0f(0) + k(1− i)/2
a0g(0)
g(z) ∈ M01
(Γ0(4)).
Applying Theorem 1, one obtains the result. �
6. Proofs of Theorem 4 and 5
We begin with the following proposition.
Proposition 6.1. Let p be an odd prime and
f(z) :=
af (n)q
n ∈Mλ+ 1
(Γ0(4)) ∩ Zp[[q]].
If λ ≡ 2 or 3 (mod p−1
), then
p2b−m(p:λ)
≡ −(14 − 4α(p : λ))af(0) + 28
2−1 − 2−1i
)pb(7−2α(p:λ))
a0f (0) (mod p)
16 D. CHOI AND Y. CHOIE
for every integer b > logp
2α(p:λ)−3
pm(p:λ) + 2
Proof of Proposition 6.1. For ν ∈ Z≥0,
pm(p:λ) := ν · (p− 1) + α(p : λ) + 1
For an integer b with
3− 2α(p : λ)
pm(p:λ) − 2
there exists an ℓ ∈ N such that
3− 2α(p : λ)
p2b +
pm(p:λ) + ℓ(p− 1) = 2,
since
3− 2α(p : λ)
p2b +
pm(p:λ) − 2 = 3− 2α(p : λ)
(p2b − 1) + ν(p− 1).
We have
F (z) ≡
n=0 af (n)q
npm(p:λ) (mod p),
G(z) ≡ q−pb + 14− 4α(p : λ) + aG(1)q + · · · (mod p).
Note that aG(n) is p-integral for every integer n. Moreover, we obtain
F (z)G(z)|2 ( 0 −11 0 ) ≡
a0f (0) + · · ·
−26pb
)pb(7−2α(p:λ))
+ · · ·
(mod p),
where a0f (0) is given in (1.1). Note that
∞, 0, 1
is the set of cusps of Γ0(4), so Theorem
2 implies that
af (p
2b−m(p:n)) + (14− 4α(p : λ))af(0)− 28a0f (0)
)pb(7−2α(p:λ))
≡ 0 (mod p).
This proves Proposition 6.1. �
6.1. Proof of Theorem 4. Now we prove Theorem 4.
Proof of Theorem 4. Take
f(z) := θ2λ+1(z) = 1 +
r2λ+1(ℓ)q
af(n)q
Note that f(z) ∈Mλ+ 1
(Γ0(4)). Since (θ| 1
( 0 −11 0 ))(z) =
, we obtain
af(0) = 1 and a
f (0) =
)2λ+1
THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 17
Since λ ≡ 2, 3 (mod p−1
) and
, we have
)p2u(7−2α(p:λ))
a0f (0)
pm(p:λ)
)p2u(7−2α(p:λ)) (
)pm(p:λ)(2α(p:λ)+(p−1)(2[ λp−1 ]+m(p:λ))+1)
)(7−2α(p:λ))(p2u−1)(
)8+2(p−1)[ λp−1 ]+m(p:λ)pm(p:λ)(p−1)+(pm(p:λ)−1)(1+2α(p:λ))
)8+2[ λp−1 ](p−1)+2α(p:λ)(pm(p:λ)−1)
)[ λp−1 ]+α(p:λ)m(p:λ)
(mod p),
for some u ∈ N. Applying Proposition 6.1, we obtain the result. �
6.2. Proof of Theorem 5. Consider the Cohen Eisenstein seriesHr+ 1
(z) :=
N=0H(r,N)q
of weight r + 1
, where r ≥ 2 is an integer. If (−1)rN ≡ 0, 1 (mod 4), then H(r,N) = 0.
If N = 0, then H(r, 0) = −B2r
. If N is a positive integer and Df 2 = (−1)rN , where D is
a fundamental discriminant, then
(6.1) H(r,N) = L(1− r, χD)
µ(d)χD(d)d
r−1σ2r−1(f/d).
Here µ(d) is the Möbius function. The following theorem implies that the Fourier coeffi-
cients of Hr+ 1
(z) are p-integral if p−1
Theorem 6.2 ([6]). Let D be a fundamental discriminant. If D is divisible by at least two
different primes, then L(1−n, χD) is an integer for every positive integer n. If D = p, p >
2, then L(1−n, χD) is an integer for every positive integer n unless gcd(p, 1−χD(g)gn) 6= 1,
where g is a primitive root (mod p).
Proof of Theorem 5. Note that E10(z) = E4(z)E6(z). So, E10(z)F (z), E10(z)G(z) and
E10(z)W (z) are modular forms of weights, 8 · 12 , 7 ·
and 8 · 1
respectively. Moreover, the
Fourier coefficients of those modular forms are 11-integral, since the Fourier coefficients
of H 5
(z), H 7
(z) and H 9
(z) are 11-integral by Theorem 6.2. We have
E10(z)F (z) =
+O(q),
E10(z)F (z)| 17
( 0 −11 0 ) =
(1 + i)(2i)−5 +O
E10(z)G(z) =
+O(q),
E10(z)G(z)| 15
( 0 −11 0 ) =
(1− i)(2i)−7 +O
E10(z)W (z) =
+O(q),
E10(z)W (z)| 17
( 0 −11 0 ) =
(1 + i)(2i)−9 +O
18 D. CHOI AND Y. CHOIE
where B2r is the 2rth Bernoulli number. The conclusion now follows from Proposition
6.1. �
6.3. Proof of Theorem 6. We begin by introducing some notations (see [17]). Let
V := (F2np , Q) be the quadratic space over Fp, where Q is the quadratic form obtained
from a quadratic form x 7→ T [x](x ∈ Z2np ) by reducing modulo p. We denote by < x, y >:=
Q(x, y)−Q(x)−Q(y), x, y ∈ F2np , the associated bilinear form and let
R(V ) := {x ∈ F2np : < x, y >= 0, ∀y ∈ F2np , Q(x) = 0}
be the radical of R(V ). Following [14], define a polynomial
Hn,p(T ;X) :=
1 if sp = 0,∏[(sp−1)/2]
j=1 (1− p2j−1X2) if sp > 0, sp odd,
(1 + λp(T )p
(sp−1)/2X)
∏[(sp−1)/2]
j=1 (1− p2j−1X2) if sp > 0, sp even,
where for even sp we denote
λp(T ) :=
1 if W is a hyperbolic space or sp = 2n,
−1 otherwise.
Following [16], for a nonnegative integer µ, define ρT (p
µ) by
ρT (p
µ)Xµ :=
(1−X2)Hn,p(T ;X), if p|fT ,
1 otherwise.
We extend the functions ρT multiplicatively to natural numbers N by defining
ρT (p
µ)X−µ :=
((1−X2)Hn,p(T ;X)).
D(T ) := GL2n(Z) \ {G ∈M2n(Z) ∩GL2n(Q) : T [G−1] half-integral},
where GL2n(Z) operates by left-multiplication and T [G
−1] = T ′G−1T . Then D(T ) is
finite. For a ∈ N with a|fT , let
(6.2) φ(a;T ) :=
G∈D(T ),|det(G)|=d
ρT [G−1](a/d
Note that φ(a;T ) ∈ Z for all a. With these notations we state the following theorem:
Theorem 6.3 ([17]). Suppose that g ≡ 0, 1 (mod 4) and let k ∈ N with g ≡ k (mod 2).
A Siegel modular form F is in SMaassk+n (Γ2g) if and only if there exists a modular form
f(z) =
c(n)qn ∈ Sk+ 1
(Γ0(4))
THE p-ADIC LIMIT OF WEAKLY HOLOMORPHIC MODULAR FORMS 19
such that A(T ) =
ak−1φ(a;T )c
|DT |
for all T . Here,
DT := (−1)g · det(2T )
and DT = DT,0f
T with DT,0 the corresponding fundamental discriminant and fT ∈ N.
Remark 6.4. A proof of Theorem 6.3 given in [17] implies that if A(T ) ∈ Z for all T ,
then c(m) ∈ Z for all m ∈ N.
Proof of Theorem 6. From Theorem 6.3 we can take
f(z) =
c(n)qn ∈ Sk+ 1
(Γ0(4)) ∩ Zp[[q]]
such that
F (Z) =
A(T )qtr(TZ) =
ak−1φ(a;T )c
|DT |
qtr(TZ).
By Theorem 1, there exists a positive integer b such that, for every positive integer m,
c(pbm) ≡ 0 (mod pj),
since k ≡ 2 or 3 (mod p−1
). Suppose that pb+2j ||DT |. If pj|a and a|fT , then
ak−1φ(a;T )c
|DT |
≡ 0 (mod pj).
If pj ∤ a and a|fT , then pb
∣∣∣ |DT |a2 and a
k−1φ(a;T )c
|DT |
≡ 0 (mod pj). �
Acknowledgement
We thank the referee for many helpful comments which have improved our exposition.
References
[1] S. Ahlgren and M. Boylan Central Critical Values of Modular L-functions and Coeffients of Half
Integral Weight Modular Forms Modulo ℓ, Amer. J. Math. 129 (2007), no. 2, 429–454.
[2] A. Balog, H. Darmon, K. Ono, Congruences for Fourier coefficients of half-integer weight modu-
lar forms and special values of L-functions, Analytic Number Theory, 105–128. Progr. Math. 138
Birkhauser, 1996.
[3] B. Berndt and A. Yee, Congruences for the coefficients of quotients of Eisenstein series, Acta Arith.
104 (2002), no. 3, 297–308.
[4] R. E. Borcherds, Automorphic forms on Os+2,2(R) and infinite products, Invent. Math. 120 (1995)
161–213.
[5] J. H. Bruinier, K. Ono, The arithmetic of Borcherds’ exponents, Math. Ann. 327 (2003), no. 2,
293–303.
[6] L. Carlitz, Arithmetic properties of generalized Bernoulli numbers, J. Reine Angew. Math. 202 1959
174–182.
20 D. CHOI AND Y. CHOIE
[7] H. Cohen, Sums involving the values at negative integers of L-functions of quadratic characters,
Math. Ann. 217 (1975), no. 3, 271–285.
[8] D. Choi and Y. Choie, Linear Relations among the Fourier Coefficients of Modular Forms on Groups
Γ0(N) of Genus Zero and Their Applications, to appear in J. Math. Anal. Appl. 326 (2007), no. 1,
655–666.
[9] Y. Choie, W. Kohnen, K. Ono, Linear relations between modular form coefficients and non-ordinary
primes, Bull. London Math. Soc. 37 (2005), no. 3, 335–341.
[10] H. Cohen and J. Oesterle, Dimensions des espaces de formes modulaires, Lecture Notes in Mathe-
matics, 627 (1977), 69–78.
[11] S. Corteel and J. Lovejoy, Overpartitions, Trans. Amer. Math. Soc. 356 (2004) 1623–1635.
[12] B. Gordon and K. Hughes, Multiplicative properties of eta-product, Cont. Math. 143 (1993), 415-430.
[13] P. Guerzhoy, The Borcherds-Zagier isomorphism and a p-adic version of the Kohnen-Shimura map,
Int. Math. Res. Not. 2005, no. 13, 799–814.
[14] Y. Kitaoka, Dirichlet series in the theory of Siegel modular forms, Nagoya Math. J. 95 (1984), 73–84.
[15] N. Koblitz, Introduction to elliptic curves and modular forms, Graduate Texts in Mathematics, 97.
Springer-Verlag, New York, 1993
[16] W. Kohnen, Lifting modular forms of half-integral weight to Siegel modular forms of even genus,
Math. Ann. 322 (2002), 787–809.
[17] W. Kohnen and H. Kojima, A Maass space in higher genus, Compos. Math. 141 (2005), no. 2,
313–322.
[18] P. Jenkins and K. Ono, Divisibility criteria for class numbers of imaginary quadratic fields, Acta
Arith. 125 (2006), no. 3, 285–289.
[19] T. Miyake, Modular forms, Translated from the Japanese by Yoshitaka Maeda, Springer-Verlag,
Berlin, 1989
[20] J.-P. Serre, Formes modulaires et fonctions zeta p-adiques, Lecture Notes in Math. 350, Modular
Functions of One Variable III. Springer, Berlin Heidelberg, 1973, pp. 191–268.
[21] J. Sturm, On the congruence of modular forms, Number theory (New York, 1984–1985), 275–280,
Lecture Notes in Math., 1240, Springer, Berlin, 1987.
[22] S. Treneer, Congruences for the Coefficients of Weakly Holomorphic Modular Forms, to appear in
the Proceedings of the London Mathematical Society.
[23] D. Zagier, Traces of singular moduli, Motives, polylogarithms and Hodge theory, Part I, Int. Press
Lect. Ser., 3, I, Int. Press, Somerville, MA, 2002, pp.211–244.
School of Liberal Arts and Sciences, Korea Aerospace University, 200-1, Hwajeon-
dong, Goyang, Gyeonggi, 412-791, Korea
E-mail address : choija@postech.ac.kr
Department of Mathematics and Pohang Mathematical Institute, POSTECH, Pohang,
790–784, Korea
E-mail address : yjc@postech.ac.kr
	1. Introduction and Statement of Main Results
	2. Applications: More Congruences 
	2.1. p-adic Limits of Borcherds Exponents
	2.2. Sums of n-Squares
	2.3. Quotients of Eisenstein Series
	2.4. The Maass Space
	3. Linear Relation among Fourier Coefficients of modular forms of Half Integral Weight
	4. Proofs of Theorem ?? and ??
	4.1. Proof of Theorem ??
	4.2. Proof of Theorem ??
	5. Proof of Theorem ??
	6. Proofs of Theorem ?? and ??
	6.1. Proof of Theorem ??
	6.2. Proof of Theorem ??
	6.3. Proof of Theorem ??
	Acknowledgement
	References
ABSTRACT
  Serre obtained the p-adic limit of the integral Fourier coefficient of
modular forms on $SL_2(\mathbb{Z})$ for $p=2,3,5,7$. In this paper, we extend
the result of Serre to weakly holomorphic modular forms of half integral weight
on $\Gamma_{0}(4N)$ for $N=1,2,4$. A proof is based on linear relations among
Fourier coefficients of modular forms of half integral weight. As applications
we obtain congruences of Borcherds exponents, congruences of quotient of
Eisentein series and congruences of values of $L$-functions at a certain point
are also studied. Furthermore, the congruences of the Fourier coefficients of
Siegel modular forms on Maass Space are obtained using Ikeda lifting.

<|endoftext|><|startoftext|>
Introduction
The purpose of this paper is to describe string topology from the viewpoint of
Chen’s iterated integrals. Let M be a compact closed oriented d-manifold and
LM be the free loop space ofM , the set of unbased smooth maps from S1 toM .
Let H∗(LM) be the homology of the free loop space shifted by the dimension of
the manifold i.e. H∗(LM) = H∗+d(LM). Chas and Sullivan found the product
on H∗(LM) which they called loop product [1]:
Hp(LM)⊗Hq(LM)→ Hp+q(LM).
They showed that this product makes H∗(LM) an associative, commutative
algebra.
Merkulov constructed a model for this product based on the theory of iter-
ated integrals, especially of the formal power series connection [10]. He showed
that there is an isomorphism of algebras
H∗(LM) ∼= H∗(ΛM ⊗ R
where ΛM is the de Rham differential graded algebra of M and R
the formal completion of the free graded associative algebra generated by some
noncommutative indeterminates.
On the other hand, Chen showed that the cohomology of the free loop space
of the simply-connected manifold is isomorphic to the cohomology of the cyclic
bar complex of differential forms via Chen’s iterated integrals (see [5] or [8]):
H∗(LM) ∼= H
∗(C(ΛM)).
In this paper, we construct a model for the loop product based on the the-
ory of the cyclic bar complex. We define a complex Hom(B(ΛM),ΛM) and
its subcomplex Hom(B(ΛM),ΛM) so that the Poincaré duality induces the
isomorphism of vector spaces
H∗(Hom(C(ΛM),R)) ∼= H∗−d(Hom(B(ΛM),ΛM)).
We can define a product on Hom(B(ΛM),ΛM) which realizes the loop product.
http://arxiv.org/abs/0704.0014v1
Theorem 1.1. Let M be a compact closed oriented simply-connected manifold.
Assume that H∗(M) is of finite type. Let A be a differential graded subalge-
bra of ΛM such that H∗(A) ∼= H∗(ΛM) by the inclusion. Then there is an
isomorphism of associative, commutative algebras
H∗(LM) ∼= H∗(Hom(B(A), A)).
The product defined on H∗(Hom(B(A), A)) corresponds to the loop product un-
der the isomorphism.
The paper is organized in the following way. In section 2, we briefly review
Chen’s iterated integrals. In section 3, we give a construction of a complex
Hom(B(A), A), and discuss its properties. In section 4, we give a proof of
theorem 1.1. In section 5, we study the iterated integrals on the free loop space
of the non-simply-connected manifolds. In section 6, we describe a relation
between the product on Hom(B(A), A) and the Goldman bracket. In this paper,
all the homologies have their coefficients in the field of real numbers.
Acknowledgement: The author would like to thank Professor Toshitake Kohno
much for helpful comments and gentle support.
2 Chen’s iterated integrals
We briefly review Chen’s iterated integrals (see [5], or [8]). Let M be a finite
dimensional smooth manifold and let LM be the free loop space of M , that is
the space of all smooth maps from S1 to M . Let ∆k be the k-simplex
{(t1, · · · , tk) ∈ R
k | 0 ≤ t1 ≤ · · · ≤ tk ≤ 1}.
We have an evaluation map
Φk : ∆k × LM →M
defined by
Φk(t1, · · · , tk; γ) = (γ(t1), · · · , γ(tk)).
Then define Pk to be the composition
(Λ∗M)⊗k → Λ∗Mk
→ Λ∗(∆k × LM)
→ Λ∗−kLM
where p∗ is the integration along the fiber of the projection p : ∆k×LM → LM .
Given ω1, · · ·ωk ∈ Λ
∗M , the iterated integral
ω1 · · ·ωk
is a differential form on LM of total degree |ω1| + · · · |ωk| − k, defined by the
formula
ω1 · · ·ωk = (−1)
(k−1)|ω1|+(k−2)|ω2|+···+|ωk−1|+k(k−1)/2Pk(ω1, · · · , ωk).
3 Preliminaries
In this section, we give a construction of some complexes. Let A be an arbitrary
differential graded algebra in this section. Let A∨ denote the dual of A. The
bar complex of A, (B(A), dB), is defined by
B(A) = ⊕r≥0 ⊗
r sA,
dB(ω1, · · · , ωr) = −(−1)
(ω1, · · · , ωi−1, dωi, ωi+1, · · · , ωr)
−(−1)εi
(ω1, · · · , ωi−1, ωi ∧ ωi+1, ωi+2, · · · , ωr).
Here (sA)q = Aq+1 or Aq according as 0 ≤ q or 0 < q, and εi = deg(ω1, · · · , ωi).
We denote the totality of degree n elements by B(A)n. The coproductH
∗(B(A))
→ H∗(B(A)) ⊗H∗(B(A)) is defined by
(ω1, · · · , ωn) 7→
(ω1, · · · , ωi)⊗ (ωi+1, · · · , ωn).
Chen proved the following theorem.
Theorem 3.1 (Chen [5]). Let M be a simply-connected manifold and H∗(M)
be of finite type. Let A be a differential graded algebra of ΛM such that A0 =
R and H∗(A) ∼= H∗(ΛM) by the inclusion. Then there is an isomorphism of
coalgebras
H∗(B(A)) ∼= H
∗(ΩM)
given by
(ω1, · · · , ωn) 7→
ω1 · · ·ωn.
Let F pB(A) be a filtration of B(A) such that
F pB(A) = ⊕0≤r≤p ⊗
r sA.
Let Hom(B(A), A∨)n =
p+q=n Hom(B(A)p, A
q∨) and Hom(B(A), A∨) =
n Hom(B(A), A
∨)n. Its boundary is defined by
δϕ(ω1, · · · , ωr)(ω)
= ϕ(ω1, · · · , ωr)(dω) + (−1)
|ω|ϕ(dB(ω1, · · · , ωr))(ω)
− (−1)|ω|ϕ(ω2, · · · , ωr)(ω ∧ ω1)
+(−1)|ω|+εr−1(|ωr|+1)ϕ(ω1, · · · , ωr−1)(ω ∧ ωr).
Let us define the subcomplex of Hom(B(A), A∨), Hom(B(A), A∨), according
to the Chen’s normalization of the cyclic bar complex (see [4] or [8]). We define
Hom(B(A), A∨) to be the set of elements in Hom(B(A), A∨) which satisfy the
following equations for any ω, ωi ∈ A
>0 and f ∈ A0:
−ϕ(· · ·ωi−2, fωi−1, ωi, · · · )(ω) + ϕ(· · · , ωi−1, fωi, ωi+1, · · · )(ω)
+ϕ(· · · , ωi−1, df, ωi, · · · )(ω) = 0, 1 ≤ i ≤ r − 1,
−ϕ(ω1, · · · , ωr)(fω) + ϕ(fω1, · · · , ωr)(ω) + ϕ(df, ω1, · · · , ωr)(ω) = 0,
−ϕ(ω1, · · · , fwr)(ω) + ϕ(ω1, · · · , ωr)(fω) + ϕ(ω1, · · · , ωr, df)(ω) = 0.
It can be easily seen that it is isomorphic to the dual of the normalized cyclic
bar complex of A:
Hom(B(A), A∨) ∼= C(A)
Similarly, let Hom(B(A), A)n =
p−q=n Hom(B(A)p, A
q) and Hom(B(A), A)
n Hom(B(A), A)n. Its boundary is defined by
δϕ(ω1, · · · , ωr)
= (−1)|ϕ|−εrdϕ(ω1, · · · , ωr)− (−1)
|ϕ|−εrϕ(dB(ω1, · · · , ωr))
+(−1)|ϕ|−εrω1 ∧ ϕ(ω2, · · · , ωr)
−(−1)(|ωr|+1)(|ϕ|+1)ϕ(ω1 · · · , ωr−1) ∧ ωr.
We define Hom(B(A), A) to be the set of elements in Hom(B(A), A) which
satisfy the following equations for any ω, ωi ∈ A
>0 and f ∈ A0:
−ϕ(· · ·ωi−2, fωi−1, ωi, · · · ) + ϕ(· · · , ωi−1, fωi, ωi+1, · · · )
+ϕ(· · · , ωi−1, df, ωi, · · · ) = 0, 1 ≤ i ≤ r − 1,
−f ∧ ϕ(ω1, · · · , ωr) + ϕ(fω1, · · · , ωr) + ϕ(df, ω1, · · · , ωr) = 0,
−ϕ(ω1, · · · , fwr) + ϕ(ω1, · · · , ωr) ∧ f + ϕ(ω1, · · · , ωr, df) = 0.
The cup product on Hom(B(A), A) is defined by
ϕ1 ∪ ϕ2(ω1, · · · , ωr)
0≤i≤r
(−1)|ϕ1|(|ϕ2|+εr−εi)ϕ1(ω1, · · · , ωi) ∧ ϕ2(ωi+1, · · · , ωr).
Since δ(ϕ1 ∪ ϕ2) = δϕ1 ∪ ϕ2 + (−1)
|ϕ1|ϕ1 ∪ δϕ2, H∗(Hom(B(A), A)) becomes
an algebra. This product can be induced on H∗(Hom(B(A), A)).
The E1-term of their spectral sequences associated with the filtration F
pB(A)
can be calculated from the cohomology of A.
Proposition 3.2. There is an isomorphism of vector spaces
H∗(Hom(F
pB(A)/F p−1B(A), A∨)) ∼= Hom(⊗
psH(A), H(A)∨)
Proof. Let A be a differential graded subalgebra of A such that A
= Ap for p
> 1, A
= R and
A1 = dA0 ⊕A
There is an isomorphism of vector spaces
Hom(F qB(A)/F q−1B(A), A∨) ∼= Hom(F
qB(A)/F q−1B(A), A
Since A
= R, there is an isomorphism
H0(Hom(F
qB(A)/F q−1B(A), A
)) ∼= Hom(⊗sH(A), H(A)
Therefore we obtain the proposition.
4 Proof of Theorem 1.1
We give the proof of theorem 1.1 in this section. There is a differential graded
subalgebra of A, A, such that A
= R and H(A) ∼= H(A) by the inclusion. Then
we obtain the isomorphism of algebras
H∗(Hom(B(A), A)) ∼= H∗(Hom(B(A), A))
by proposition 3.2. Therefore it suffices to verify the theorem in the case A0 = R.
The following result is due to Chen.
Theorem 4.1 (Chen [5]). H∗(LM) ∼= H∗(Hom(B(A), A
Proof. We define ψ : C∗(LM)→ Hom(B(A), A
∨) by
ψ(σ)(ω1, · · · , ωn)(ω) =
π∗ω ∧
ω1 · · ·ωn.
Let FpC∗(LM) be a filtration of C∗(LM) such that
FpCr(LM) = { σ : ∆
r → LM | π ◦ σ = σ′ ◦ π′ for some σ′ ∈ Cq(M),
q ≤ p, π′ : ∆r → ∆q } .
Let {Erp,q} be the associated spectral sequence. Define a filtration of Hom(B(A), A
FpHom(B(A), A) = {f ∈ Hom(B(A), A
∨) | f(ω1, · · · , ωn)(ω) = 0, ∀ω ∈ A
≥p+1}.
It can be easily shown that ψ preserves the filtrations of C∗(LM) and Hom(B(A), A
On E2-level, the map
ψ : Hp(M)⊗Hq(ΩM)→ Hp(A
∨)⊗Hq(B(A)
is given by
σ1 ⊗ σ2 7−→
(ω1, · · · , ωn 7→
ω1 · · ·ωn)
Theorem 3.1 asserts that this is an isomorphism. Therefore we obtain the
theorem.
Lemma 4.2. H∗(Hom(B(A), A)) ∼= H∗−d(Hom(B(A), A
Proof. We define a chain map P : Hom(B(A), A)→ Hom(B(A), A∨) by
P (ϕ)(ω1, · · · , ωn)(ω) =
ω ∧ ϕ(ω1, · · · , ωn).
Define a filtration of Hom(B(A), A) by
FpHom(B(A), A) = {ϕ ∈ Hom(B(A), A) | ϕ(ω1, · · · , ωn) ∈ A
≥d−p}.
The map P preserves those filtrations. On E2-level, the map
P : Hd−p(A)⊗Hq(B(A)
∨)→ Hp(A
∨)⊗Hq(B(A)
is given by
ω ⊗ ϕ 7−→
ω ∧ τ
This is isomorphic and we obtain the lemma.
Proof of theorem 1.1. We can verify that H∗(LM) is isomorphic to
H∗(Hom(B(A), A)) as vector spaces by composing the maps in theorem 4.1 and
lemma 4.2. We can also verify that there is an isomorphism of associative,
commutative algebras. Indeed, the cup product of Hom(B(A), A) on E2-level
Hd−p(A)⊗Hq(B(A)
∨)⊗Hd−s(A)⊗Ht(B(A)
∨)→ H2d−p−s(A)⊗Hq+t(B(A)
is given by
a⊗ g ⊗ b⊗ h 7→ (−1)(d−p+q)(d−s)a ∧ b⊗ g · h,
where g · h satisfies
g · h(ω1, · · · , ωn) =
g(ω1, · · · , ωi)h(ωi+1, · · · , ωn).
Then the following theorem asserts that the loop product and the cup product
coincide on E2-level.
Theorem 4.3 (Cohen-Jones-Yan [6]). Let M be a simply-connected manifold.
Then {Erp,q} becomes an algebra and converges to H∗(LM) as algebras. On
E2-level, the product
µ : Hp(M ;Hq(LM))⊗Hs(M ;Ht(LM))→ Hp+q−d(M ;Hs+t(LM))
is given by
µ((a⊗ g)⊗ (b ⊗ h)) = (−1)(d−s)(p+q−d)(a · b)⊗ (gh)
where a ∈ Hp(M), b ∈ Hs(M), g ∈ Hq(ΩM), h ∈ Ht(ΩM), a · b is the intersec-
tion product and gh is the Pontryagin product.
Therefore we obtain the theorem.
5 The conjugacy classes of fundamental groups
Let π denote a fundamental group of a smooth manifold M and J denote an
augmentation ideal of the group ring of π, Rπ. Chen showed that the completion
of the fundamental group with respect to the powers of its augmentation ideal is
isomorphic to the dual of the 0-th cohomology of the bar complex of differential
forms via iterated integrals [3]:
Rπ/Jp ∼= H
0(B(A))∨
where A is a differential graded subalgebra of ΛM such that A0 = R and
H∗(A) ∼= H∗(M).
Based on this work, we study iterated integrals on the free loop space of the
non-simply-connected manifold. Let π̃ denote the set of conjugacy classes of π
and J̃p denote pr(Jp) where pr is the projection of Rπ onto Rπ̃.
Theorem 5.1. Let M be a smooth manifold and H∗(M) is of finite type. Let A
be a differential graded subalgebra of ΛM such that the map Hq(A)→ Hq(ΛM)
induced by the inclusion is isomorphic if q = 0, 1 and injective if q = 2. Then
there is an isomorphism of vector spaces
Rπ̃/J̃p ∼= H0(Hom(B(A), A
We give the proof of this theorem in this section. Let ∗ be a fixed point
in S1. In this section, let LM be a set of smooth maps from S1 to M which
are constant maps near ∗. Let ΩxM be a subspace of LM whose elements
send ∗ to x ∈ M . Let Diff(S1, ∗) denote diffeomorphisms of S1 which coincide
with identity map near ∗. We define α, β : ∆q → LM to be equivalent by a
reparameterization iff there is a smooth map τ : ∆q → Diff(S1, ∗) such that
β(ξ)(t) = α(ξ)(τ(t, ξ)), ∀(t, ξ) ∈ S1 ×∆q.
Let C∗(LM) be a chain complex having as a basis the totality of equiva-
lence classes of smooth simplexes of LM . Let C∗(ΩxM) be a chain complex
having as a basis the totality of equivalence classes of smooth simplexes of ΩxM .
C∗(ΩxM) becomes a noncommutative associative algebra as follows. The prod-
uct of σ1 and σ2 in C∗(ΩxM) is defined to be the path product or 0 according
as degσ1+degσ2 ≤ 1 or > 1. The augmentation ε : C∗(ΩxM) → R is given by
εσ = 1 or 0 according as degσ = 0 or > 0.
Let σ be a smooth simplex of M . Define for each σ
Cq(LM)(σ) = {
niτi ∈ Cq(LM) | π♯τi = σ}.
Cq(LM)(σ) becomes a noncommutative associative algebra. Let ε(σ) denote
the augmentation of Cq(LM)(σ), given by
niτi 7→
ni. Define a filtration
of Cq(LM)(σ) by
FpCq(LM) = (kerε)
p ⊕ (⊕σ:∆q→M (kerε(σ))
Proposition 5.2. The map ψp : FpCq(LM) → Hom(F
p−1B(A), A∨) given by
(ω1, · · · , ωp) 7→
π∗ω ∧
ω1 · · ·ωp
is well-defined, chain map and FpCq(LM) ⊂ kerψp.
Proof. The well-definedness can be verified by the following lemma which can
be verified as in proposition 1.5, proposition 4.1.1 [2], and in proposition 1.5.3
Lemma 5.3 (Chen). (1) If α and β ∈ C∗(LM) are equivalent by a reparame-
terization, then
ω1 · · ·ωn = β
ω1 · · ·ωn.
(2) If τ1, τ2 ∈ Cq(LM)(σ), then
(τ1 · τ2)
ω1 · · ·ωn =
ω1 · · ·ωi ∧ τ
ωi+1 · · ·ωn.
(3) If f ∈ Λ0M , then for any i
ω1 · · · fωi−1 · · ·ωn +
ω1 · · · fωi · · ·ωn +
ω1 · · ·ωi−1df ωi · · ·ωn = 0.
To verify FpCq(LM) ⊂ kerψp, it suffices to show (kerε(σ))
p ⊂ kerψp. Let
s denote the section of π, which sends points of M to the constant map. Take
(σ1 − s♯σ) · (σ2 − s♯σ) · · · · ·(σp − s♯σ) ∈ (kerε(σ))
p, where σ ∈ Cq(M) and
σi ∈ Cq(LM)(σ). Then
(σ1 − s♯σ) · (σ2 − sσ) · · · · · (σp − s♯σ)
π∗ω ∧
ω1 · · ·ωp−1
σ∗ω ∧ (σ1 − s♯σ)
ω1 · · · (σk − s♯σ)
∗1 · · · ∧ (σp − s♯σ)
Therefore we obtain the proposition.
Let C∗(M,x) denote a set of smooth simplexes ofM neighborhood of whose
vertices are at x in M . We define
C ⊗ sC⊗p = C∗(M,x)⊗ sC∗(M,x)
Here (sC∗(M,x))q = Cq+1(M,x) or 0 according as q > 0 or q ≤ 0. Its boundary
is given by the sum of the boundary on each complex. Let us construct a chain
map Φ : C ⊗ sC⊗p → FpC∗(LM)/Fp+1C∗(LM) considering the following three
cases:
case 1: If (σ1, · · · , σp) ∈
sC(M,x)⊗p
, then
Φ : (σ1, · · · , σp) 7−→ (σ1 − x) · (σ2 − x) · · · · · (σp − x)
where x is regarded as a constant map.
case 2: If (σ1, · · · , σp) ∈
sC(M,x)⊗p
, then
Φ : (σ1, · · · , σp) 7−→ (σ1 − x) · (σ2 − x) · · ·σi · · · (σp − x)
where σi : ∆
1 ∋ ξ 7→ σi(ξ)(t) ∈ ΩxM is
σi(ξ)(t)
σi((1 − ξ)((1 − t)v0 + tv2) + ξ(1 − 2t)v0 + 2ξtv1), if 0 ≤ t ≤ 1/2
σi((1 − ξ)((1 − t)v0 + tv2) + ξ(2 − 2t)v1 + ξ(2t− 1)v2), if 1/2 ≤ t ≤ 1
Here v0, v1, v2 are the vertices of the standard simplex ∆
case 3: If (γ, σ1, · · · , σp) ∈ C1(M,x)⊗
sC(M,x)⊗p
, then
Φ : (γ, σ1, · · · , σp) 7−→ γ
t (σ1 − x)γt · · · γ
t (σp − x)γt
where γt : [0, 1] ∋ s 7→ γ(st) ∈ M , t ∈ ∆
Lemma 5.4. The following diagram commutes:
C ⊗ sC⊗p
−−−−→ FpC1(LM)/Fp+1C1(LM)
C ⊗ sC⊗p
−−−−→ FpC0(LM)/Fp+1C0(LM)
Proof. For case 2,
∂′Φ(σ1, · · · , σp)− Φ∂(σ1, · · · , σp)
= (σ1 − x) · · · (σ
i · σ
i − σ
i − σ
i + σ
i − σ
i + x) · · · (σp − x)
= (σ1 − x) · · · (σ
i − x) · (σ
i − x) · · · (σp − x) ∈ Fp+1C0(LM)
where σ
i , σ
i , σ
i are the faces of σi.
For case 3,
∂′Φ(γ, σ1, · · · , σp)− Φ∂
′(γ, σ1, · · · , σp)
= γ−1 · (σ1 − x) · γ · · · γ
−1 · (σp − x) · γ − (σ1 − x) · · · (σp − x)
∈ Fp+1C0(LM).
Therefore we obtain the lemma.
Proposition 5.2 gives the map
Hq(FpC(LM)/Fp−1C(LM))→ Hq(Hom(F
pB(A)/F p−1B(A), A∨)).
Lemma 5.5. For q = 0, the following map is isomorphic:
H0(FpC(LM)/Fp+1C(LM)) ∼= H0(Hom(F
pB(A)/F p−1B(A), A∨)).
Proof. We obtain the following surjection by lemma 5.4.
Φ : H0(C ⊗ sC
⊗p) ։ H0(FpC(LM)/Fp+1C(LM)).
Composing with the isomorphism ⊗pH1(M) ∼= H0(C ⊗ sC
⊗p), the map
⊗pH1(M) ։ H0(FpC(LM)/Fp+1C(LM))→ Hom(⊗
pH1(A),R)
is given by
(σ1, · · · , σn) 7→
(ω1, · · · , ωp) 7→
ω1 · · ·
This is isomorphic and we obtain the lemma.
Lemma 5.6. For q = 1, the following map surjective:
H1(FpC(LM)/Fp+1C(LM)) ։ H1(Hom(F
pB(A)/F p−1B(A), A∨)).
Proof. It suffices to show that the following map obtained by lemma 5.4 is
surjective.
ker∂ → H1(FpC(LM)/Fp+1C(LM))→ Hom(⊗
psH(A), H(A)∨)1
If (γ, σ1, · · · , σp) ∈ ker∂ ∩
C0(M,x)⊗
sC(M,x)⊗p
, then
(γ, σ1, · · · , σp) 7→
(ω1, · · · , ωp) 7→
ω1 · · ·
ωp, if deg ω = 0
0, otherwise
through the above map.
If (γ, σ1, · · · , σp) ∈ ker∂ ∩
C1(M,x)⊗
sC(M,x)⊗p
, then
(γ, σ1, · · · , σp) 7→
(ω1, · · · , ωp) 7→
ω1 · · ·
when deg ω = 1. Then we can verify the surjectivity and obtain the lemma.
Proof of theorem 1.1. Consider the spectral sequences ofC(LM)/FpC(LM) and
Hom(F p−1B(A), A∨) associated with FqC(LM) and Hom(F
qB(A), A∨), re-
spectively. Lemma 5.5 asserts that ψp is isomorphic on E1-level at degree 0:
H0(FqC(LM)/Fq+1C(LM)) ∼= H0(Hom(F
qB(A)/F q−1B(A), A∨)).
Lemma 5.6 asserts that ψp is surjective on E1-level at degree 1:
H1(FqC(LM)/Fq+1C(LM)) ։ H1(Hom(F
qB(A)/F q−1B(A), A∨)).
Then there is an isomorphism on Er-level at degree 0 for r ≥ 1. We have
Rπ̃/J̃p ∼= H0(C(LM)/FpC(LM)) ∼= H0(Hom(F
pB(A), A∨)).
Therefore we obtain the theorem.
6 The Goldman bracket
This section is devoted to the proof of the following theorem.
Theorem 6.1. Let M be a compact closed oriented surface with genus g. Then
the Goldman bracket induces a Lie algebra structure on lim
Rπ̃/J̃pand there is
an isomorphism of Lie algebras
Rπ̃/J̃p ∼= H0(Hom(B(H
∗(M)), H∗(M)∨)).
Goldman showed that the vector space spanned by the free homotopy classes
of closed curves on a closed oriented surface has a Lie algebra structure [9]. This
work led Chas and Sullivan to the string topology. We would verify that this
structure makes lim
Rπ̃/J̃p a Lie algebra. On the other hand, we can construct
a bracket on H0(Hom(B(H
∗(M)), H∗(M)∨)) by the cup product defined in
section 3 and the Connes’s operator. Here we regard H∗(M) as a differential
graded algebra with a trivial differential. Theorem 6.1 asserts that those two
Lie algebras are isomorphic.
First we describe a relation between this bracket and the augmentation ideal
of the group ring of the surface group to induce a Lie algebra structure on
Rπ̃/J̃p. Then we construct a bracket on H0(Hom(B(A), A
∨)) and verify
the isomorphism of Lie algebras
Rπ̃/J̃p ∼= H0(Hom(B(A), A
Finally we verify the isomorphism
H0(Hom(B(A), A
∨)) ∼= H0(Hom(B(H
∗(M)), H∗(M)∨).
The following proposition makes lim
Rπ̃/J̃p a Lie algebra.
Proposition 6.2. (1) If p ≥ 1 and q ≥ 2, then [J̃p, J̃q] ⊂ J̃p+q−2.
(2) If p ≥ 2 , then [J̃p,Rπ̃] ⊂ J̃p−1.
Proof. We give a proof of (1). Take (σ1−x) · · · (σp−x) ∈ J̃p, (τ1−y) · · · (τq−y)
∈ J̃q, where σi ∈ ΩxM and τi ∈ ΩyM . Assume that all curves are immersions
and σi τj intersect transversally for any i, j. Let {σi♯τj} denote the set of
intersection points of σi and τj . Also assume that all the intersection points are
distinct i.e. {σi♯τj} ∩ {σk♯τl} = φ if i 6= k or j 6= l. Then,
[σ, τ ] =
s∈σi♯τj
{ε(s;σi, τj)γs,x · (σi − x) · · · (σp − x)(σ1 − x) · · ·
·(σi−1 − x) · γ
s,x · ·γs,y · (τj − y) · · · (τq − y)(τ1 − y) · · · (τj−1 − y) · γ
−γs,x · (σi+1 − x) · · · (σp − x)(σ1 − x) · · · (σi−1 − x) · γ
s,x ·
·γs,y · (τj+1 − y) · · · (τq − y)(τ1 − y) · · · (τj−1 − y) · γ
∈ J̃p+q−2.
Here γs,x is a path from s to x along σi and γs,y is a path from s to y along τj .
The proof of (2) can be verified in the same way.
Let A be a differential graded subalgebra of ΛM such thatH∗(A) ∼= H∗(ΛM)
by the inclusion.
Proposition 6.3. There is an isomorphism of vector spaces
H∗(Hom(F
pB(A), A)) ∼= H∗−2(Hom(F
pB(A), A∨)).
Proof. We define P : H∗−2(Hom(F
pB(A), A))→ H∗(Hom(F
pB(A), A∨)) by
P (ϕ)(ω1, · · · , ωp)(ω) =
ω ∧ ϕ(ω1, · · · , ωp).
This map preserves the filtrations. On E1-level, the map
Hom(⊗qH(A), H(A))→ Hom(⊗qH(A), H(A)∨)
is isomorphic. Therefore we obtain the proposition.
Now we construct a bracket on H0(Hom(B(A), A
∨)). First, we define the
Connes’s operator B : H∗(Hom(F
pB(A), A∨)) → H∗+1(Hom(F
p−1B(A), A∨))
B(ϕ)(ω1, · · · , ωp−1)(ω)
0≤k≤p−1
(−1)(εk+1)(εp−1−εk)ϕ(ωk+1, · · · , ωp−1, ω, ω1, · · ·ωk)(1).
Composing these maps and the cup product, we can define a bracket on
H0(Hom(F
pB(A), A∨)) by
[ϕ1, ϕ2] = −P (P
−1Bϕ1 ∪ P
−1Bϕ2) ∈ H0(Hom(F
p−1B(A), A∨)).
Take 2g closed 1-forms on M , α1, · · · , αg, β1, · · ·βg, such that
αi ∧ βj =
δij . Let {E
p.q} denote the spectral sequence of Hom(B(A), A
∨) associated with
F pB(A). Notice that the cyclic group Z/pZ acts on E
∼= Hom(⊗pH1(A),R)
ιϕ(ω1, · · · , ωp) = ϕ(ω2, · · · , ωp, ω1)
where ι is a generator of Z/pZ. The bracket [ , ] : E
p,−p⊗E
q,−q → E
p+q−2,−p−q+2
[ϕ1, ϕ2](ω1, · · · , ωp+q−2)
i,m,n
ιmϕ1(αi, ω1, · · · , ωp−1)̺
nϕ2(βi, ωp, · · · , ωp+q−2)
−ιmϕ1(βi, ω1, · · · , ωp−1)̺
nϕ2(αi, ωp, · · · , ωp+q−2)
where ι and ̺ are generators of Z/pZ and Z/qZ, respectively.
Proposition 6.4. The following diagram commutes for p, q ≥ 1:
J̃p/ ˜Jp+1 ⊗ J̃q/ ˜Jq+1 −−−−→ E
∞ ⊗ E
[ , ]
[ , ]
J̃p+q−2/J̃p+q−1 −−−−→ E
p+q−2,−p−q+2
Proof. Take σ = (σ1 − x) · · · (σp − x) ∈ FpC0(LM), τ = (τ1 − y) · · · (τq − y) ∈
FqC0(LM). Take 2g curves in M , ai, bi, as in Figure 1. Assume that σi and τj ,
ak, or bk, intersect transversally for any i, j, k. Also assume that τj and ak, or
bk, intersect transversally for any j, k.
Assume that all the intersection points are distinct. Then for any i, j, k, we
can take each tubular neighborhoods of ai and bi so that it does not include some
neighborhoods of intersection points of σj and τk. We fix such neighborhoods
of intersection points and denote them by Up for each p. We can also take
a tubular neighborhood of the diagonal map from M to M×M outside those
neighborhoods of intersection points of σi and τj for any i, j i.e.
S1 \ ∪pσ
i (Up)
S1 \ ∪pτ
j (Up)
= φ, ∀i, j.
Here N∆ denotes the tubular neighborhood of the diagonal map. Thom class Φ
of this tubular neighborhood satisfies
Φ = −ε(p;σi, τj),
where ε(p;σi, τj) is the intersecion number of σi and τj at p.
Fig. 1
・ ・ ・
Define e♯ : C0(LM) → C1(LM) by e♯γ(ξ)(t) = γ(ξ + t). Let ωk, 1 ≤
k ≤ n, be differential forms on M which has its support inside the tubular
neighborhoods of ai and bi. Then
[σi,τj]
ω1 · · ·ωn
p∈σi♯τj,k
ε(p;σi, τj)
(σi)p
ω1 · · ·ωk
(τj)p
ωk+1 · · ·ωn
p∈σi♯τj ,k
×τj |
(σi)p
ω1 · · ·ωk
(τj)p
ωk+1 · · ·ωn
e♯σi×e♯τj
π∗Φ ∧ p∗1
ω1 · · ·ωk ∧ p
ωk+1 · · ·ωn.
Here p1, p2 : LM×LM → LM are the projections. The last equality is obtained
by the following lemma.
Lemma 6.5. If p ∈ σi♯τj and p
′ ∈ Up ∩ σi([0, 1]), then
(σi)p
ω1 · · ·ωn =
(σi)p′
ω1 · · ·ωn.
Proof. F Let γ be the curve from p to p′ along σi inside Up. If γ and σ are in
the same direction, then
(σi)p′
ω1 · · ·ωn =
γ·(σi)p′
ω1 · · ·ωn =
(σ)p·γ
ω1 · · ·ωn
ω1 · · ·ωn.
We can also verify the case where γ is in the direction opposite to σ in the same
We have the equality
e♯σ×e♯τ
π∗Φ ∧ p∗1
ω1 · · ·ωk ∧ p
ωk+1 · · ·ωp+q−2
e♯σ×e♯τ
− p∗1(α1 ∧ β1)− p
2(α1 ∧ β1) + p
1αj ∧ p
2βj − p
1βj ∧ p
ω1 · · ·ωk ∧ p
ωk+1 · · ·ωp+q−2
In fact, if η ∈ Λ(M ×M) then
(−1)|η|+1
e♯σ×e♯τ
π∗dη ∧ p∗1
ω1 · · ·ωk ∧ p
ωk+1 · · ·ωp+q−2
e♯σ×e♯τ
π∗η ∧ d
ω1 · · ·ωk ∧ p
ωk+1 · · ·ωp+q−2
+(e♯σ)
ω1 · · ·ωk
(e♯τ)
ωk+1 · · ·ωj ∧ ωj+1 · · ·ωp+q−2
The last equality is obtained by the following lemma.
Lemma 6.6. If σ ∈ FpC0(LM), then
(e♯σ)
ω1 · · ·ωp−2 = 0.
Proof. It suffices to show the case σ = (τ1 − x) · · · (τp − x) where x ∈ M and τi
∈ ΩxM . We define τ̄i ∈ ΩxM by
τ̄i(t) =
τi(pt), if (i − 1)/p ≤ t ≤ i/p
0, otherwise.
Let σ̄ denote (τ̄1 − x) · · · (τ̄p − x). It can be shown that e♯σ̄ restricted on [(i −
1)/p, i/p] is contained in Fp−1C1(LM) for any i. Therefore
(e♯σ)
ω1 · · ·ωp−2 = (e♯σ̄)
ω1 · · ·ωp−2 = 0.
Jones, Geztler, and Petrack describes the map e♯ in terms of iterated inte-
grals by the following theorem.
Theorem 6.7 (Geztler-Jones-Petrack [8]). If σ ∈ C0(LM) and ω, ωi ∈ Λ
1 ≤ i ≤ p, then
π∗ω ∧
ω1 · · ·ωp =
ωk · · ·ωpωω1 · · ·ωk−1.
This theorem asserts the equality
e♯σ×e♯τ
− p∗1(α1 ∧ β1)− p
2(α1 ∧ β1) + p
1αj ∧ p
2βj − p
1βj ∧ p
ω1 · · ·ωk ∧ p
ωk+1 · · ·ωn
j,k,l
ωk+1 · · ·ωp−1αjω1 · · ·ωk
ωl+1 · · ·ωp+q−2βjωp · · ·ωl
ωk+1 · · ·ωp−1βjω1 · · ·ωk
ωl+1 · · ·ωp+q−2αjωp · · ·ωl
Finally we obtain the equality
[σ,τ ]
ω1 · · ·ωp+q−2
j,k,l
ωk+1 · · ·ωp−1αjω1 · · ·ωk
ωl+1 · · ·ωp+q−2βjωp · · ·ωl
ωk+1 · · ·ωp−1βjω1 · · ·ωk
ωl+1 · · ·ωp+q−2αjωp · · ·ωl
Since we can take ωi ∈ H
1(M), 1 ≤ i ≤ p + q − 2, so that their support are
inside the tubular neighborhoods of aj and bj, we obtain the proposition.
Proof of theorem 6.1. We obtain the following isomorphism of Lie algebras by
proposition 6.4.
Rπ̃/J̃p ∼= H0(Hom(B(A), A
To obtain the isomorphism of Lie algebras
H0(Hom(B(A), A
∨) ∼= H0(Hom(B(H
∗(M)), H∗(M)∨),
we introduce the following lemma, which asserts the formality of the compact
Kähler manifolds.
Lemma 6.8 (ddcLemma, Deligne-Griffiths-Morgan-Sullivan [7]). Let X be a
compact Kähler manifold and dc = J−1dJ where J gives the complex structure
in the cotangent bundle. If α is a differential form on X such that dα = 0 and
dcα = 0, and such that α = dγ, then α = ddcβ for some β.
Cor. There are quasi-isomorphisms of differential graded algebras
(ΛX, d)← (kerdc, d)→ (H∗dc(X), 0).
Notice that a closed oriented surface endowed with a complex structure become
a Kähler manifolds for the dimensional reason. Therefore the following lemma
completes the proof of the theorem.
Lemma 6.9. If f : A1 → A2 is a quasi-isomorphism of differential graded
algebras, then the map induced by f
H0(Hom(B(A1), A
1 )→ H0(Hom(B(A2), A
is an isomorphism.
Proof. It suffices to verify that the map induced by f
f : H0(Hom(F
pB(A1), A
1 )→ H0(Hom(F
pB(A2), A
is an isomorphism for any p. On E1-level, the map induced by f
Hom(⊗sH(A1), H(A1)
∨)→ Hom(⊗sH(A2), H(A2)
is an isomorphism because f is quasi-isomorphism. Therefore we obtain the
lemma.
Therefore we obtain the theorem.
References
[1] M. Chas and D. Sullivan, String topology, preprint, 1999, http://arXiv.org
/abs/math.GT/9911159.
[2] K.T. Chen, Iterated integrals of differential forms and loop space homology,
Ann. of Math. (2) 97(1973), 217-246.
[3] K.T. Chen, Iterated integrals, fundamental groups and covering spaces,
Trans. Amer. Math. Soc. 206 (1975), 83-98.
[4] K.T. Chen, Reduced bar constructions on de Rham complexes, in:A.Haller
and M.Tierney (eds), (Algebra, topology and category theory, 1977, pp. 19-
[5] K.T. Chen, Iterated path integrals, Bull. Amer. Math. Soc. 83 (1977), no.5,
831-879.
[6] R.L. Cohen, J.D.S. Jones and J. Yan, The loop homology algebra of spheres
and projective spaces, Categorical Decomposition Techniques in Algebraic
Topology (Isle of Skye, 2001), Progr. Math., vol. 215. Birkhäuser, Basel,
2004, pp.77-92.
[7] P. Deligne, P. Griffiths, J. Morgan and D. Sullivan, Real homotopy theory
of Kähler manifolds, Invent. Math. 29 (1975), 245-274.
[8] E. Getzler, J.D.S. Jones and S. Petrack Differential forms on loop spaces
and the cyclic bar complex, Topology 30 (1991), no.3, 339-371.
http://arXiv.org
[9] W.M. Goldman, Invariant functions on Lie groups and Hamlitonian flows
of surface group representation, Invent. Math. 85 (1986), no.2, 263-302.
[10] S.A. Merkulov, De Rham Model for String Topology, International Mathe-
matics Research Notices 55 (2004), 2955-2981.
	Introduction
	Chen's iterated integrals
	Preliminaries
	Proof of Theorem 1.1
	The conjugacy classes of fundamental groups
	The Goldman bracket
ABSTRACT
  In this article we discuss a relation between the string topology and
differential forms based on the theory of Chen's iterated integrals and the
cyclic bar complex.

<|endoftext|><|startoftext|>
Introduction 1
2. Zero mode integration 2
2.1 Symmetry considerations and tensorial formulae 3
2.2 A spinorial formula 5
2.3 Component-based approach 7
3. One-loop amplitudes 7
3.1 Review: four bosons 8
3.2 Four fermions 10
3.3 Two bosons, two fermions 10
4. Two-loop amplitudes 12
4.1 Review: four bosons 13
4.2 Four fermions 14
4.3 Two bosons, two fermions 15
5. Discussion 16
A. Reduction to kinematic bases 17
A.1 Four bosons 17
A.2 Four fermions 18
A.3 Two bosons, two fermions 20
B. A gamma matrix representation 21
1. Introduction
The quantisation of the ten-dimensional superstring using pure spinors as world-sheet
ghosts [1] has overcome many difficulties encountered in the Green-Schwarz (GS) and
Ramond-Neveu-Schwarz (RNS) formalisms. Most notably, by maintaining manifest space-
time supersymmetry, the pure spinor formalism has yielded super-Poincaré covariant multi-
loop amplitudes, leading to new insights into perturbative finiteness of superstring theory
[2, 3].
Counting fermionic zero modes is a powerful technique in the computation of loop
amplitudes in the pure spinor formalism and has for example been used to show that
at least four external states are needed for a non-vanishing massless loop amplitude [2].
Furthermore, the structure of massless four-point amplitudes is relatively simple because all
– 1 –
fermionic worldsheet variables contribute only through their zero modes. In the expressions
derived for the one-loop [2] and two-loop [4] amplitudes, supersymmetry was kept manifest
by expressing the kinematic factors as integrals over pure spinor superspace [5] involving
three pure spinors λ and five fermionic superspace coordinates θ,
K1-loop =
(λA)(λγmW )(λγnW )Fmn
K2-loop =
(λγmnpqrλ)(λγsW )FmnFpqFrs
(1.1)
where the pure spinor superspace integration is denoted by 〈. . . 〉, and Aα(x, θ), Wα(x, θ)
and Fmn(x, θ) are the superfields of ten-dimensional Yang-Mills theory.
The kinematic factors in (1.1) have been explicitly evaluated for Neveu-Schwarz states
at two loops [6] and one loop [7], and were found to match the amplitudes derived in
the RNS formalism [8]. This provided important consistency checks in establishing the
validity of the pure spinor amplitude prescriptions. (Related one-loop calculations had
been reported in [9].)
In this paper, it will be shown how to compute the kinematic factors in (1.1) when
the superfields are allowed to contribute fermionic fields, as is relevant for the scattering
of fermionic closed string states as well as Ramond/Ramond bosons. It turns out that
the calculation of fermionic amplitudes presents no additional difficulties, making (1.1)
a good practical starting point for the computation of four-point loop amplitudes in a
unified fashion. This practical aspect of the supersymmetric pure spinor amplitudes was
also emphasised in [10], where the tree-level amplitudes were used to construct the fermion
and Ramond/Ramond form contributions to the four-point effective action of the type II
theories.
This paper is organised as follows. In section 2, different methods to compute pure
spinor superspace integrals are explored. These methods are then applied to the explicit
evaluation of the kinematic factors of massless four-point amplitudes at the one-loop level
in section 3, and at the two-loop level in section 4. In both these sections, the bosonic calcu-
lations are briefly reviewed before separately considering the cases of two and four Ramond
states. Particular attention will be paid to the constraints imposed by simple exchange
symmetries. An appendix contains algorithms which were used to reduce intermediate
expressions encountered in the amplitude calculations to a canonical form.
2. Zero mode integration
The calculation of scattering amplitudes in the pure spinor formalism leads to integrals over
zero modes of the fermionic worldsheet variables λ and θ. Both θ and λ are 16-component
Weyl spinors, the λ are commuting and the θ anticommuting, and λ is subject to the pure
spinor constraint (λγmλ) = 0. The amplitude prescriptions [1, 2] require three zero modes
of λ and five zero modes of θ to be present, and a Lorentz covariant object
T̄ αβγ,δ1...δ5 ≡
λαλβλγθδ1 . . . θδ5
= T̄ (αβγ),[δ1...δ5] (2.1)
was constructed such that the Yang-Mills antighost vertex operator
V = (λγmθ)(λγnθ)(λγpθ)(θγmnpθ) has
= 1 . (2.2)
– 2 –
In this section, different methods of computing such “pure superspace integrals” are ex-
plored. As an example, a typical correlator encountered in the two-loop calculations of
section 4 is considered:
F (ki, ui) = k
(λγmnpq[rλ)(λγs]u1)(θγn
abθ)(θγbu2)(θγqu3)(θγsu4)
(2.3)
Here, ki and ui are the momenta and spinor wavefunctions of the four external particles.
2.1 Symmetry considerations and tensorial formulae
One systematic approach to evaluate the zero mode integrals is to find expressions for all
tensors that can be formed from (2.1). By Fierz transformations, one can always write the
product of two θ spinors as (θγ[3]θ), where γ[k] denotes the antisymmetrised product of k
gamma matrices. Due to the pure spinor constraint, the only bilinear in λ is (λγ[5]λ), and
it is thus sufficient to consider the three cases
(λγ[5]λ)(λ{γ[1] or γ[3] or γ[5]}θ)(θγ[3]θ)(θγ[3]θ)
. (2.4)
Lorentz invariance then implies that it must be possible to express these tensors as sums of
suitably symmetrised products of metric tensors, resulting in a parity-even expression, plus
a parity-odd part made up from terms which in addition contain an epsilon tensor. The
parity-even parts may be constructed [6] starting from the most general ansatz compatible
with the symmetries of the correlator and then using spinor identities along with the
normalisation (2.2) to determine all coefficients in the ansatz. Duality properties of the
spinor bilinears can be used to determine the parity-odd part [7]. An extensive (and almost
exhaustive) list of correlators is found in [11], including the (λγ[1]θ) and (λγ[3]θ) cases of
the above list:
(λγmnpqrλ)(λγuθ)(θγfghθ)(θγjklθ)
= − 4
mnpqr
m̄n̄p̄q̄r̄ +
εmnpqrm̄n̄p̄q̄r̄
δm̄n̄fg δ
(δr̄l δ
u + δ
u − δr̄uδhl )
[fgh][jkl]
(2.5)
(λγmnpqrλ)(λγstuθ)(θγfghθ)(θγjklθ)
= −24
mnpqr
m̄n̄p̄q̄r̄ +
εmnpqrm̄n̄p̄q̄r̄
δm̄j δ
δq̄sδ
u − δkhδr̄u)
[fgh][jkl](fgh↔jkl)
(2.6)
(Here, the brackets (fgh↔ jkl) denote symmetrisation under simultaneous interchange of
fgh with ijk, with weight one.) The remaining correlator with the (λγ[5]θ) factor can be
derived in the same way, using an ansatz consisting of six parity-even structures. Taking
a trace between the two γ[5] factors and noting that
(λγmnpqrλ)(λγabcdeθ) . . .
(λγmnpq[bλ)(λγcde]θ) . . .
one finds a relation to (2.6). This is sufficient to determine all coefficients in the ansatz,
and the result is
(λγmnpqrλ)(λγabcdeθ)(θγfghθ)(θγjklθ)
mnpqr
m̄n̄p̄q̄r̄ +
εmnpqrm̄n̄p̄q̄r̄
m̄n̄p̄
(−δehδr̄l + 2δel δr̄h) + δm̄n̄ab δcdfgδ
(δehδ
l − 3δel δr̄h)
[abcde][fgh][jkl](fgh↔jkl)
(2.7)
– 3 –
One may find it surprising that the derivation of these tensorial expressions only made
use of properties of (pure) spinors, and of the normalisation condition (2.2). However, it
can be seen from representation theory that the correlator (2.1) is uniquely characterised,
up to normalisation, by its symmetry. To see this, note that [12] the spinor products λ3
and θ5 transform in
λ(αλβλγ) : Sym3 S+ = [00003] ⊕ [10001]
θ[δ1 . . . θδ5] : Alt5 S+ = [00030] ⊕ [11010] .
(2.8)
(Here, λ and θ are taken to be in the S+ irrep of SO(1,9), with Dynkin label [00001].) The
tensor product of these contains only one copy of the trivial representation. This applies
to any spinors λ, which means that the pure spinor property cannot be essential to the
derivation of the tensorial identities. The use of the pure spinor constraint merely allows
for simpler derivations of the same identities.
As an illustration of this approach, consider the correlator of eq. (2.3). Leaving the
momenta aside for the moment by setting F = k2ak
r F̃ , the task is to compute
(λγmnpq[rλ)(λγs]u1)(θγn
abθ)(θγbu2)(θγqu3)(θγsu4)
After applying two Fierz transformations,
(λγmnpq[r|λ)(λγcθ)(θγn
abθ)(θγjklθ)
|s]γcγbu2)
3!·16
(λγmnpq[r|λ)(λγcdeθ)(θγn
abθ)(θγjklθ)
|s]γcdeγbu2)
2·5!·16
(λγmnpq[r|λ)(λγcdefgθ)(θγn
abθ)(θγjklθ)
|s]γcdefgγbu2)
3!·16(u3γqγjklγsu4) ,
one obtains a combination of the fundamental correlators listed in (2.5), (2.6) and (2.7).
A reliable evaluation of the numerous index symmetrisations is made possible by the use of
a computer algebra program. In doing these calculations with Mathematica, an essential
tool is the GAMMA package [13], expanding the products of gamma matrices in a γ[k] basis.
The result consists of two parts, F̃ = F̃ (δ) + F̃ (ε), with
F̃ (δ) = 1
mpru2)(u3γ
au4) +
r (u1γ
iu2)(u3γiu4) + . . .
ai1i2u2)(u3γ
i1i2u4) (92 terms) (2.9)
F̃ (ε) = − 1
1209600
εi1...i7
mpr(u1γ
i1...i7u2)(u3γ
au4) + . . .
604800
εampri1...i6(u1γ
i3...i9u2)(u3γ
i7i8i9u4) (34 terms) (2.10)
The epsilon tensors in the second part can be eliminated using the fact that the ui are
chiral spinors: If all the indices on γ[k]ui are contracted into an epsilon tensor, one uses
εi1...ik′j1...jkγ
j1...jkγ11 = (−)
k(k+1)
k! γi1...ik′ , (2.11)
where γ11 = 1
εi0...i9γ
i0...i9 . More generally, if all but r indices of γ[k]ui are contracted,
εi1...ik′ j1...jkγ
p1...prj1...jkγ11 = (−)
k(k+1)
(k′ − r)!
pr ...p1
[i1...ir
γir+1...i′k]
. (2.12)
– 4 –
The result of these manipulations is
F̃ (ε) =− 1
mpru2)(u3γ
au4)− 1280δ
r (u1γ
amiu2)(u3γiu4) + . . .
11200
i1i2i3u2)(u3γ
i1i2i3u4) (53 terms) (2.13)
(Note that while the epsilon terms in the basic correlator formulae were easily obtained from
the delta terms by using Poincaré duality, this cannot be done here in any obvious way.)
The last step in the evaluation of (2.3) is to contract with the momenta, F = k2ak
r F̃ ,
and to simplify the expressions using the on-shell identities
i ki = 0, k
i = 0, /kiui = 0. It
is shown in appendix A.2 that there are only ten independent scalars, denoted by B1 . . . B10,
that can be formed from four momenta and the four spinors u1 . . . u4. With respect to this
basis, the result is
F (δ) = 1
48·10080
695s12(u1/k3u2)(u3/k1u4) + · · ·+ 233s213(u1γau2)(u3γau4)
(7 terms)
48·10080 (695, 775, 0,−80, 356, 356, 0, 233, 233, 0)B1 ...B10 ,
F (ε) = 1
48·10080 (−23,−7, 0,−16, 28, 28, 0, 7, 7, 0)B1 ...B10 ,
F = 1
10080
(14, 16, 0,−2, 8, 8, 0, 5, 5, 0)B1 ...B10 , (2.14)
where sij = ki · kj .
2.2 A spinorial formula
While the derivation of tensorial identities for correlators of the form (2.4) is relatively
straightforward and elegant, it may be a tedious task to transform the expressions encoun-
tered in amplitude calculations to match this pattern. As seen in the example calculated
above, this is particularly true if additional spinors are involved, making it necessary to ap-
ply Fierz transformations. It is therefore desirable to use a covariant correlator expression
with open spinor indices. Such an expression was given in [1, 2]:
T̄αβγ,δ1...δ5 = N−1
(γm)αδ1(γn)βδ2(γp)γδ3(γmnp)
(αβγ)[δ1...δ5]
, (2.15)
where N is a normalisation constant and the brackets ()[] denote (anti-)symmetrisation
with weight one. (Note that the right hand side is automatically gamma-matrix traceless:
any gamma-trace
(γr)αβ × (γm)α[δ1|(γn)β|δ2|(γp)γ|δ3(γmnp)δ4δ5] = −(γmnr)[δ1δ2(γmnp)δ3δ4(γp)δ5]γ = 0
vanishes due to the double-trace identity (γabθ)
α(θγabcθ) = 0, which follows from the
fact that the tensor product (Alt3 S+)⊗ S− does not contain a vector representation and
therefore the vector (ψγabθ)(θγ
abcθ) has to vanish for all spinors ψ, and can also be shown
by applying a Fierz transformation.) This prescription was originally motivated [2] by the
fermionic expansion of the Yang-Mills antighost vertex operator V ,
V = Tαβγ,δ1...δ5λ
αλβλγθδ1 . . . θδ5 (2.16)
Tαβγ,δ1...δ5 =
(γm)αδ1(γ
n)βδ2(γ
p)γδ3(γmnp)δ4δ5
(αβγ)[δ1...δ5]
– 5 –
where T is related to T̄ by a parity transformation, up to the overall constant N . (Since
T̄ is uniquely determined by its symmetries, any covariant expression will be proportional
to T̄ , after symmetrisation of the spinor indices, and this is merely the simplest choice.)
Equation (2.15) immediately yields an algorithm to convert any correlator into traces
of gamma matrices or, if additional spinors are involved, bilinears in those spinors. It
is, however, already very tiresome to determine the normalisation constant N by hand.
The main advantage of this approach is that it clearly lends itself to implementation on
a computer algebra system, which can easily carry out the spinor index symmetrisations,
simplify the gamma products (again using the GAMMA package), and compute the traces.
For example,
N〈V 〉 =
(γm)αδ1(γn)βδ2(γp)γδ3(γmnp)
(αβγ)[δ1...δ5]
(γx)αδ1(γy)βδ2(γz)γδ3(γ
xyz)δ4δ5
= − 1
Tr(γxγ
m)Tr(γyγ
n)Tr(γzγ
p)Tr(γxyzγpnm) + . . .
Tr(γzγpnmγ
zyxγnγxγ
p) (60 terms)
= 5160960 .
The correct normalisation is therefore obtained by setting N = 5160960.
Returning to the example correlator (2.3), one finds that the calculation is by far
simpler than with the previous method. After carrying out the symmetrisations (αβγ)[δi],
one obtains
NF̃ = 1
Tr(γxγ
mnpq[r|)(u3γqγ
xyzγsu4)(u1γ
|s]γzγbu2) + . . .
(u2γbγ
xyzγqu3)(u1γsγyγ
mnpq[rγzγ
s]u4) , (24 terms)
where elementary index re-sorting has reduced the number of terms from 60 to 24. Ex-
panding the gamma products leads to
NF̃ = 476
δpr (u1γ
mu4)(u2γ
au3) + · · ·+ 815(u1γ
ai1i2i3i4u2)(u3γ
i1i2i3i4u4) , (294 terms)
which, in contrast to (2.10), contains no epsilon terms as there are not enough free indices
present. Note that this intermediate result contains terms with with u1 paired with u3
or u4, so it is not possible to directly compare to eqs. (2.9) and (2.13). However, after
contracting with the momenta k2ak
r and decomposing the result in the basis B1 . . . B10,
one again obtains
F = 1
10080
(14, 16, 0,−2, 8, 8, 0, 5, 5, 0)B1 ...B10 , (2.17)
in agreement with (2.14).
The algorithm just outlined will be the method of choice for all correlator calculations
in the later sections of this paper and can easily be applied to a wider range of problems.
The only limitation is that the larger the number of gamma matrices and open indices
of the correlator, the slower the computer evaluation will be. For example, the correlator
considered in eq. (5.2) of [11],
mnm1n1...m4n4
(λγpγm1n1θ)(λγqγm2n2θ)(λγrγm3n3θ)(θγmγnγpqrγ
m4n4θ)
= − 2
m1n1...m4n4
εmnm1n1...m4n4
, (2.18)
can still be verified with this method but this already requires substantial runtime.
– 6 –
2.3 Component-based approach
A third method to evaluate the zero mode integrals consists of choosing a gamma matrix
representation, expanding the integrand as a polynomial in spinor components, and then
applying (2.15) to the individual monomials. This procedure seems particularly appealing
if at some stage of the calculation one works with a matrix representation anyhow, in
order to reduce the results to a canonical form (e.g. as outlined in appendix A). An
efficient decomposition algorithm (of k4u1u2u3u4 scalars, say) only needs a few non-zero
momentum and spinor wavefunction components to distinguish all independent scalars, and
therefore k and u can be replaced by sparse vectors. Furthermore, a trivial observation
allows for a much quicker numeric evaluation of correlator components than a naive use
of (2.15): In view of (2.16), one can equivalently compute the components of the parity-
transformed expression V̄ = (λ̄γmθ̄)(λ̄γnθ̄)(λ̄γpθ̄)(θ̄γmnpθ̄), where λ̄ and θ̄ are spinors of
chirality opposite to that of λ, θ. In the representation given in appendix B, V̄ coincides
with V |λ→λ̄,θ→θ̄, and
V = 192λ9λ9λ9θ1θ2θ3θ4θ9 + · · ·+ 480λ1λ2λ3θ1θ9θ10θ13θ15 + . . . (100352 terms)
The monomials in the fermionic expansion of V̄ then correspond to the arguments of
non-zero correlators, and the coefficients of those monomials are, up to normalisation and
symmetry factors, the correlator values.
Unfortunately, it turns out that the complexity of typical correlators (e.g. the one
given in (2.3)) makes it difficult to carry out the expansion in fermionic components in
any straightforward way and limits this method to special applications. For example, the
coefficients in (2.18) can be checked relatively easily by choosing particular index values,
such as
(λγpγ12θ)(λγqγ21θ)(λγrγ34θ)(θγ0γ0γpqrγ
12λ1λ1λ1θ1θ9θ10θ11θ12 + · · ·+ 12λ16λ16λ16θ5θ6θ7θ8θ16
(For fixed values of pqr, one gets no more than about 105 monomials of the form λ3θ5).
This approach may thus still be helpful in situations where the result has been narrowed
down to a simple ansatz.
3. One-loop amplitudes
The amplitude for the scattering of four massless states of the type IIB superstring was
computed [2] in the pure spinor formalism as
A = KK̄
(Im τ)5
G(zi, zj)
ki·kj , (3.1)
where G(zi, zj) is the scalar Green’s function, and the kinematic factor is given by the
product KK̄ of left- and right-moving open superstring expressions,
K1-loop =
(λA1)(λγ
mW2)(λγ
nW3)F4,mn
cycl(234)
. (3.2)
– 7 –
Here the indices 1 . . . 4 label the external states and “· · ·+
cycl(234)
” denotes the addition
of two other terms obtained by cyclic permutation of the indices 234. The spinor super-
field Aα and its supercovariant derivatives, the vector gauge superfield Am =
m DαAβ
as well as the spinor and vector field strengths Wα = 1
(γm)αβ(DβAm − ∂mAβ) and
Fmn = 18(γmn)
β = 2∂[mAn], describe ten-dimensional super-Yang-Mills theory.
The physical fields of this theory, a gauge boson and a gaugino, are found in the leading
components Am| = ζm and Wα| = ûα and correspond to the Neveu-Schwarz and Ramond
superstring states.
The superfields Aα and W
α as well as the gaugino field ûα are anticommuting.1 To
facilitate computer calculations involving polynomials in the spinor components, and for
easier comparison with the literature, it will be more convenient to work with commuting
fermion wavefunctions uα. Fortunately, as the kinematic factors with fermionic external
states are multilinear functions of the distinctly labelled spinors ûi, it is straightforward to
translate between the two conventions: Any monomial expression in û1 . . . û4 (and possibly
fermionic coordinates θ) corresponds to the same expression in u1 . . . u4, multiplied by the
signature of the permutation sorting the ûi (and any θ variables) into some fixed order,
such as (θ · · · θ)ûα11 û
Choosing a gauge where θαAα = 0, the on-shell identities
2D(αAβ) = γ
αβAm , DαW
β = 1
(γmn)α
have been used to derive recursive relations [10, 14, 15] for the fermionic expansion
A(n)α =
(γmθ)αA
(n−1)
m , A
(θγmW
(n−1)) , Wα(n) = − 1
(γmnθ)α∂mA
(n−1)
where f (n) = 1
θαn · · · θα1(Dα1 · · ·Dαnf)|. These recursion relations were explicitly solved
in [10], reducing the fermionic expansion to a simple repeated application of the derivative
operator Omq = 12 (θγm
qpθ)∂p:
A(2k)m =
(2k)!
[Ok]mqζq ,
A(2k+1)m =
(2k+1)!
[Ok]mq(θγqû) .
(3.3)
With this solution at hand, one has all ingredients to evaluate the kinematic factor (3.2)
for the three cases of zero, two, or four fermionic states.
3.1 Review: four bosons
The kinematic factor involving four bosons was considered in [7] and this calculation will
now be reviewed briefly. First, note that the outcome is not fixed by symmetry: The result
must be gauge invariant [2] and therefore expressible in terms of the field strengths F1 . . . F4.
The cyclic symmetrisation in (3.2) yields expressions symmetric in F2, F3, F4, and acting
on scalars constructed from the Fi only, the (234) symmetrisation is equivalent to complete
symmetrisation in all labels (1234). Thus the result must be a linear combination of the
1Thanks to Carlos Mafra for pointing this out.
– 8 –
two gauge invariant symmetric F 4 scalars, namely the single trace Tr(F(1F2F3F4)) and
double trace Tr(F(1F2)Tr(F3F4)), leaving one relative coefficient to be determined.
Since all four states are of the same kind, one may first evaluate the correlator for one
labelling and then carry out the cyclic symmetrisation:
1-loop =
(λA1)(λγ
mW2)(λγ
nW3)F4,mn
cycl (234)
The different ways to saturate θ5 result in a sum of terms of the form
XABCD =
1 )(λγ
2 )(λγ
(3.4)
with A+B +C +D = 5 and A, B, C odd, D even:
(λA1)(λγ
mW2)(λγ
nW3)F4,mn
= X3110 +X1310 +X1130 +X1112 .
Note that X1310 and X1130 are related by exchange of the labels 2 and 3. This exchange
can be carried out after computing the correlator, an operation which will in the following
be denoted by π23. Using (3.3) for the superfield expansions and replacing ∂m → ikm, one
obtains
X3110 = − 1512F
tuX̃3110 , X̃3110 =
(λγ[t|γpqθ)(λγ|u]γrsθ)(λγaθ)(θγ
amnθ)
X1112 = − 1128 ik
tuX̃1112 , X̃1112 =
(λγ[m|γpqθ)(λγ|a]γrsθ)(λγnθ)(θγa
X1310 = − 1384 ik
tuX̃1310 , X̃1310 =
(λγ[t|γmaθ)(λγ|u]γrsθ)(λγnθ)(θγa
The method outlined in section 2.2 is readily applicable to these correlators. For example,
for X3111, the trace evaluation yields
X̃3110 = N
Tr(γaγ
z)Tr(γxyzγ
anm)Tr(γxγqpγ
[t|)Tr(γyγsrγ
|u]) + · · ·
· · ·+ 1
Tr(γ[u|γrsγzyxγqpγ
|t]γxγaγ
yγmnaγz)
(60 terms)
δmprs δ
tu − 1315δ
rs − 145δ
δmnpr δ
[mn][pq][rs][tu](pq↔rs)
Upon contracting with the field strengths, momenta and polarisations, and symmetrising
over the cyclic permutations (234) (with weight 3), one finds that all three contributions
are separately gauge invariant:
X3110 +
cycl(234)
= − 11
13440
Tr(F(1F2F3F4)) +
Tr(F(1F2)Tr(F3F4))
X1112 +
cycl(234)
= − 19
53760
Tr(F(1F2F3F4)) +
215040
Tr(F(1F2)Tr(F3F4))
(1 + π23)X1310 +
cycl(234)
= − 1
10240
4Tr(F(1F2F3F4))− Tr(F(1F2)Tr(F3F4))
The sum X3110 +X1112 has the right ratio of single- and double-trace terms to be propor-
tional to the well-known result t8F
4, and the last line exhibits the right ratio by itself. The
overall kinematic factor is therefore
K4B1-loop = − 12560
4Tr(F(1F2F3F4))− Tr(F(1F2)Tr(F3F4))
= − 1
15360
4 , (3.5)
in agreement with the expressions derived in the RNS [16] and Green-Schwarz [17] for-
malisms.
– 9 –
3.2 Four fermions
The four-fermion kinematic factor could be evaluated in the same way as in the four-boson
case by summing up all terms XABCD, A + B + C + D = 5, now with A, B, C even
and D odd. Note however that this time, the outcome is fixed by symmetry: The cyclic
symmetrisation in (3.2) leads to a completely symmetric dependence on û2, û3, û4, and
therefore to a completely antisymmetric dependence on u2, u3, u4. Acting on scalars of
the form k2u1u2u3u4, antisymmetrising over [234] is equivalent to antisymmetrising over
[1234], and there is only one completely antisymmetric k2u1u2u3u4 scalar. Without further
calculation, one can infer that the kinematic factor is proportional to that scalar,
K4F1-loop = const ·
(u1/k3u2)(u3/k1u4)− (u1/k2u3)(u2/k1u4) + (u1/k2u4)(u2/k1u3)
which of course agrees with the RNS amplitude (see e.g. [16], eq. (3.67)).
3.3 Two bosons, two fermions
In evaluating (3.2) for two bosons and two fermions, the cyclic symmetrisations affect
whether the W and F superfields contribute bosons or fermions. Only the label of the Aα
superfield stays unaffected, and one has to choose whether it should contribute a boson
or a fermion. Since its fermionic expansion starts with the bosonic polarisation vector,
A1,α ∼ (/ζ1θ)α, the calculation can be simplified by choosing a labelling where particle 1 is
a fermion. (Of course, the final result must be independent of this choice.) The assignment
of the other three labels is then irrelevant and will be chosen as f1f2b3b4. Writing out
the cyclic permutations, two of the three terms are essentially the same because they are
related by interchange of the labels 3 and 4. The kinematic factor is then
K2B2F1-loop(f1f2b3b4) = (1 + π34)
(even)
1 )(λγ
(even)
2 )(λγ
(odd)
(even)
(even)
1 )(λγ
(odd)
3 )(λγ
(odd)
(odd)
Unlike in the four-fermion calculation, the result is not fixed by symmetry. There are five
independent ku1u2F3F4 scalars (see appendix A, eq. (A.6)), denoted by C1 . . . C5, and there
are two independent combinations of these scalars with the required [12](34) symmetry.
Expanding the superfields and collecting terms with θ5, the first line yields a combination
of terms XABCD with A, B, D odd and C even. There is only one θ
5 combination coming
from the second line, which will be denoted by X ′2111 ≡ (−π24)X2111:
K2B2F1-loop = (1 + π34) (X4010 +X2210 +X2030 +X2012) +X
2111 ,
with the correlators
X4010 =
ζ3c k
nX̃4010 , X̃4010 =
(λγaθ)(θγa
pqθ)(θγpu1)(λγ
[mu2)(λγ
n]γbcθ)
X2210 = − i12k
nX̃2210 , X̃2210 =
(λγaθ)(θγau1)(λγ
[m|γbcθ)(θγcu2)(λγ
|n]γdeθ)
X2030 = − i36k
nX̃2030 , X̃2030 =
(λγaθ)(θγau1)(λγ
[mu2)(λγ
n]γbcθ)(θγc
X2012 = − i12k
ζ3c k
ζ4e X̃2012 , X̃2012 =
(λγaθ)(θγau1)(λγ
[mu2)(λγ
n]γbcθ)(θγn
X ′2111 =
ζ3c k
2111 , X̃
2111 =
(λγaθ)(θγau1)(λγ
[m|γbcθ)(λγ|n]γdeθ)(θγnu2)
– 10 –
(The numerical coefficient in X ′2111 includes a sign coming from the θ, û ordering: there
is an odd number of θs between u1 and u2.) Evaluating these expressions as outlined in
section 2.2, the spinor wavefunctions ui present no complication. The last part takes the
simplest form: One finds
(λγaθ)(θγau1)(λγ
mγbcθ)(λγnγdeθ)(θγnu2)
= − 1
(2δbcm[d(u1γe]u2) + δ
m(u1γ
c]deu2))
and therefore
X̃ ′2111 = − 1480
δ[bm(u1γ
c]γdeu2) + δ
m(u1γ
e]γbcu2)
The result for X̃4010 is
X̃4010 =
δbqmn(u1γ
cu2)− 190δ
mq(u1γ
nu2) +
δbcmn(u1γ
qu2)− 12520δ
q (u1γ
bcnu2)
δbq(u1γ
cmnu2) +
δbm(u1γ
cnqu2) +
bcmnqu2)
[bc][mn]
For the evaluation of X̃2210, it is useful to consider the more general correlator
(λγaθ)(θγau1)(λγ
[m|γbcθ)(λγ|n]γdeθ)(θγxu2)
mn(u1γ
cu2) + . . .
201600
δmx (u1γ
bcdenu2) + · · · − 11403200 (u1γ
bcdemnxu2)
[mn][bc][de]
(27 terms)
9676800
εbcdemni1i2i3i4(u1γ
i1i2i3i4xu2)− 12419200εbcdemnxi1i2i3(u1γ
i1i2i3u2) .
This time, even using the method of section 2.2, there are sufficiently many open indices
and long enough traces for epsilon tensors to appear. Using eqs. (2.11) and (2.12), they
can be re-written into γ[5,7] terms:
(λγaθ)(θγau1)(λγ
[m|γbcθ)(λγ|n]γdeθ)(θγxu2)
mn(u1γ
cu2) + . . .
16800
δmx (u1γ
bcdenu2) + · · · − 133600 (u1γ
bcdemnxu2)
[mn][bc][de]
(27 terms)
A good check on the sign of the epsilon contributions is that X̃ ′2111 is recovered when
contracting with ηnx, involving a cancellation of all γ
[5] terms. To obtain X̃2210, one
multiplies by −ηcx:
X̃2210 =
δdemn(u1γ
bu2) +
δbdmn(u1γ
eu2) +
δbmde (u1γ
nu2) +
20160
δdm(u1γ
benu2)
δbm(u1γ
denu2) +
20160
δbd(u1γ
emnu2) +
bdemnu2)
[de][mn]
For the calculation of X2030 and X2012, one may first evaluate a more general correlator
〈(λγaθ)(θγau1)(λγ[mu2)(λγn]γbcθ)(θγxγdeθ)〉 and then contract with ηcx and ηnx, respec-
tively. The results are
X̃2030 =
δdemn(u1γ
bu2) +
δbdmn(u1γ
eu2)− 11440δ
de (u1γ
nu2)− 1710080δ
m(u1γ
benu2)
10080
δbm(u1γ
denu2)− 11440δ
d(u1γ
emnu2) +
bdemnu2)
[mn][de]
X̃2012 =
δdebm(u1γ
cu2) +
δbcdm(u1γ
eu2)− 11440δ
de(u1γ
mu2) +
δdm(u1γ
bceu2)
10080
δbm(u1γ
cdeu2) +
10080
δbd(u1γ
cemu2)− 13360 (u1γ
bcdemu2)
[bc][de]
– 11 –
After multiplication with the momenta and polarisations, all individual contributions are
gauge invariant and can be expanded in the basis C1 . . . C5 listed in (A.6):
(1 + π34)X4010 =
483840
(−6,−16,−40, 6, 0)C1 ...C5
(1 + π34)X2210 =
483840
(−18,−104,−176, 18, 0)C1 ...C5
(1 + π34)X2030 =
483840
(−21, 42,−42, 21, 0)C1 ...C5
(1 + π34)X2012 =
483840
(−39, 78,−78, 39, 0)C1 ...C5
X ′2111 = − i11520 (1, 0, 4,−1, 0)C1 ...C5
The sum can be written as
K2B2F1-loop = X
2111 = − i3840 (1, 0, 4,−1, 0)C1 ...C5
= − i
s13(u2/ζ3(/k2 + /k3)/ζ4u1) + s23(u2/ζ4(/k2 + /k4)/ζ3u1)
(3.6)
and again agrees with the amplitude computed in the RNS result, see [16] eq. (3.37).
4. Two-loop amplitudes
The pure spinor formalism was used in [4, 2] to compute the two-loop type-IIB amplitude
involving four massless states,
d2Ω11d
2Ω12d
i,j ki · kj G(zi, zj)
(det ImΩ)5
K2-loop(ki, zi) ,
where Ω is the genus-two period matrix, and the integration over fermionic zero modes is
encapsulated in
K2-loop = ∆12∆34
(λγmnpqrλ)(λγsW1)F2,mnF3,pqF4,rs
perm(1234)
(4.1)
≡ ∆12∆34K12 +∆13∆24K13 +∆14∆23K14 . (4.2)
The kinematic factors K12, K13, K14 are accompanied by the basic antisymmetric biholo-
morphic 1-form ∆, which is related to a canonical basis ω1, ω2 of holomorphic differentials
via ∆ij = ∆(zi, zj) = ω1(zi)ω2(zj) − ω2(zi)ω1(zj). The superfields Wαi and Fi,mn are the
spinor and vector field strengths of the i-th external state, as in section 3. One encounters
superspace integrals of the form
Y (abcd) =
(λγmnpqrλ)(λγsWa)Fb,mnFc,pqFd,rs
. (4.3)
The symmetries of the λ3 combination [4] in this correlator include the obvious symmetry
under mn↔ pq, and also (λγ[mnpqrλ)(λγs])α = 0 (this holds for pure spinors λ and can be
seen by dualising, and holds for unconstrained spinors λ as part of a λ3θ5 scalar, as seen
from the representation content (2.8)), and allow one to shuffle the F factors:
Y (abcd) = Y (acbd) , Y (abcd) + Y (acdb) + Y (adbc) = 0 . (4.4)
– 12 –
4.1 Review: four bosons
The case of four Neveu-Schwarz states was considered in [6] and will be briefly reviewed
here. As all three kinematic factors K12, K13 and K14 are equivalent, it is sufficient to
consider K12 in detail. With all external states being identical, the symmetrisations of
(4.1) can be carried out at the end of the calculation:
K4B12 = 4
W[1F2]F[3F4]
W[3F4]F[1F2]
= (1− π12)(1− π34)(1 + π13π24)
W1F2F3F4
Expanding the superfields and adopting the notation
YABCD(abcd) =
(λγmnpqrλ)(λγsW (A)a )F
F (C)c,pqF
the Neveu-Schwarz states come from terms of the form YABCD ≡ YABCD(1234) with A odd
and B, C, D even. Using the shuffling identities (4.4) to simplify, one obtains
W1F2F3F4
= Y5000 + Y1400 + Y1040 + Y1004 + Y3200 + Y3020 + Y3002 + Y1220 + Y1202 + Y1022
= (1 + π23)(1− π24)
Y5000 + Y1400 + Y3200 + Y1022
and therefore K4B12 can be written as the image of a symmetrisation operator S4B:
K4B12 = S4B
Y5000 + Y1400 + Y3200 + Y1022
S4B = (1− π12)(1− π34)(1 + π13π24)(1 + π23)(1− π24)
It is worth noting at this point that, on the sixteen-dimensional space of Lorentz scalars
built from the four field strengths Fi and two momenta, the symmetriser S4B has rank four.
The correlators were computed in [6], using the method outlined in section 2.1. Two are
zero, Y5000 = Y1400 = 0, and the remaining ones are
Y3200 =
(λγmnpqrλ)(λγsγabθ)(θγb
cdθ)(θγn
Y1022 =
F 1abF
(λγmnpq[rλ)(λγs]γabθ)(θγq
cdθ)(θγs
In reducing those two contributions to a set of independent scalars, one finds that they
both are not just sums of (k · k)F 4 terms but also contain terms of the form k · F terms.
The latter are projected out by the symmetriser S4B, and the result is
K4B12 = S4B(Y3200 + Y1022) = 1120 (s13 − s23)
4Tr(F(1F2F3F4))− Tr(F(1F2)Tr(F3F4))
(s13 − s23)t8F 4 .
By trivial index exchange, one obtains K13 and K14, and the total is
K4B2-loop =
(s13 − s23)∆12∆34 + (s12 − s23)∆13∆24 + (s12 − s13)∆14∆23
4 , (4.5)
a product of the completely symmetric one-loop kinematic factor t8F
4 and a completely
symmetric combination of the momenta and the ∆ij.
– 13 –
4.2 Four fermions
The calculation involving four Ramond states is very similar to the bosonic one. Focussing
on the K12 part, the symmetrisations in (4.1) can again be rewritten as action of sym-
metrisation operators on the correlator of superfields with one particular labelling:
K4F12 (ûi) = (1− π12)(1 − π34)(1 + π13π24)
W1F2F3F4
û1û2û3û4
= 4(1− π12)
W1F2F3F4
û1û2û3û4
The last step follows from the fact that all scalars of the form k4u4 (see appendix A.2),
and therefore all k4û4 scalars, are invariant under π13π24 and have π12 = π34. This time,
on expanding the superfields, one collects the terms YABCD with A even and B, C, D odd.
After using (4.4) to simplify,
W1F2F3F4
û1û2û3û4
= Y2111 + Y0311 + Y0131 + Y0113
= (1 + π23)(1− π24)
Y2111 + Y0311
and after translating to commuting wavefunctions ui, which multiplies every permutation
operator with its signature, one obtains
K4F12 (ui) = S4F
Y2111(ui) + Y0311(ui)
, S4F = 4(1 + π12)(1− π23)(1 + π24) .
This symmetriser has rank three, and the result is again not determined by symmetry.
Two correlators have to be computed:
Y2111(ui) = (−2)k1ak2mk3pk4r
(λγmnpq[rλ)(λγs]γabθ)(θγbu1)(θγnu2)(θγqu3)(θγsu4)
Y0311(ui) = (−23)k
(λγmnpq[rλ)(λγs]u1)(θγn
abθ)(θγbu2)(θγqu3)(θγsu4)
With four fermions present, the method of section 2.2 is preferred as it does not involve re-
arranging the fermions using Fierz identities. The first correlator was covered as an example
in that section, and the second one can be evaluated in the same fashion. Expressed in the
basis listed in (A.5), the results are
Y2111(ui) =
(−19,−21, 21, 19,−17,−17, 0, 0, 0, 0)B1 ...B10 ,
Y0311(ui) =
15120
(−14,−16, 0, 2,−8,−8, 0,−5,−5, 0)B1 ...B10 .
After acting with the symmetriser S4F, one obtains the same u4 scalar encountered in the
one-loop amplitude,
K4F12 (ui) = S4F(13Y2111(ui) + Y0311(ui)) =
(−1,−2, 1, 2,−1,−2, 0, 0, 0, 0)B1 ...B10
(s23 − s13)
(u1/k3u2)(u3/k1u4)− (u1/k2u3)(u2/k1u4) + (u1/k2u4)(u2/k1u3)
The K13 and K14 parts again follow by index exchange, and the total result
K4F2-loop(ui) =
(s23 − s13)∆12∆34 + (s23 − s12)∆13∆24 + (s13 − s12)∆14∆23
(u1/k3u2)(u3/k1u4)− (u1/k2u3)(u2/k1u4) + (u1/k2u4)(u2/k1u3)
(4.6)
is again a simple product of the one-loop kinematic factor and a combination of the ∆ij
and momenta.
– 14 –
4.3 Two bosons, two fermions
As in the one-loop calculation of section 3.3, in the mixed case one has to pay some attention
to the permutations in (4.1) since they affect which superfields contribute fermionic fields.
The complete symmetrisation makes it irrelevant which labels are assigned to the two
fermions, and the convention f1f2b3b4 will be used here. The kinematic factor K
12 is then
distinguished from the other two, K2B2F13 and K
14 . Carrying out the symmetrisations in
(4.1) and using the identities (4.4), one finds
K12(û1, û2, ζ3, ζ4) = (1− π12)(1− π34)K̃ ,
K13(û1, û2, ζ3, ζ4) = (2 · 1+ π12 + π34 + 2π12π34)K̃ ,
K14(û1, û2, ζ3, ζ4) = (1+ 2π12 + 2π34 + π12π34)K̃ ,
where, schematically,
(even)
(odd)
(even)
(even)
(odd)
(even)
(odd)
(odd)
. (4.7)
In translating to commuting variables u1 and u2, the permutation operator π12 changes
sign, and therefore2
K12(u1, u2, ζ3, ζ4) = (1+ π12)(1− π34)K̃ ,
K13(u1, u2, ζ3, ζ4) = (2 · 1− π12 + π34 − 2π12π34)K̃ ,
K14(u1, u2, ζ3, ζ4) = (1− 2π12 + 2π34 − π12π34)K̃ .
Expanding the superfields, the contributions to K̃ are:
Y4100 = − i48k
(λγmnpqrλ)(λγsγabθ)(θγbγ
cdθ)(θγcu1)(θγnu2)
Y0500 =
(λγmnpqrλ)(λγsu1)(θγn
abθ)(θγb
cdθ)(θγdu2)
Y0140 =
(λγmnpqrλ)(λγsu1)(θγnu2)(θγq
abθ)(θγb
Y0104 =
(λγmnpqrλ)(λγsu1)(θγnu2)(θγ|s]
abθ)(θγb
Y2300 =
(λγmnpqrλ)(λγsγabθ)(θγbu1)(θγn
cdθ)(θγeu2)
Y2120 =
(λγmnpqrλ)(λγsγabθ)(θγbu1)(θγnu2)(θγq
Y2102 =
(λγmnpqrλ)(λγsγabθ)(θγbu1)(θγnu2)(θγ|s]
Y0320 =
(λγmnpqrλ)(λγsu1)(θγn
abθ)(θγbu2)(θγq
Y0302 =
(λγmnpqrλ)(λγsu1)(θγn
abθ)(θγbu2)(θγ|s]
Y0122 =
(λγmnpqrλ)(λγsu1)(θγnu2)(θγq
abθ)(θγs]
Y3011 =
(λγmnpqrλ)(λγsγabθ)(θγb
cdθ)(θγcu1)(θγnu2)
Y1211 =
F 3abk
(λγmnpqrλ)(λγsγabθ)(θγn
cdθ)(θγqu1)(θγ|s]u2)
Y1031 =
F 3abF
(λγmnpqrλ)(λγsγabθ)(θγq
cdθ)(θγdu1)(θγ|s]u2)
Y1013 =
F 3abF
(λγmnpqrλ)(λγsγabθ)(θγqu1)(θγ|s]
cdθ)(θγdu2)
2This sign change is crucial to avoid the erroneous conclusion that the two-boson, two-fermion kinematic
factor cannot be of the same product form as in the four-boson or four-fermion cases, which would be in
contradiction to the supersymmetric identities derived in [18].
– 15 –
These correlators can be evaluated exactly as described in section 3.3. One finds that
Y0500 = Y0140 = Y0104 = 0, and the sum of the remaining terms reduces to
K̃ = Y4100 + Y2300 + Y2120 + Y2102 + Y0320 + Y0302 + Y0122 + Y3011 + Y1211 + Y1031 + Y1013
(s12 + s13)× (1, 0, 4,−1, 0)C1 ...C5 .
After applying the symmetrisation operators,
(1+ π12)(1− π34)K̃ = i180 (s12 + 2s13)× (1, 0, 4,−1, 0)C1 ...C5 ,
(2 · 1− π12 + π34 − 2π12π34)K̃ = i180 (2s12 + s13)× (1, 0, 4,−1, 0)C1 ...C5 ,
(1− 2π12 + 2π34 − π12π34)K̃ = i180 (s12 − s13)× (1, 0, 4,−1, 0)C1 ...C5 ,
the total kinematic factor is seen to be
K2-loop(u1, u2, ζ3, ζ4) = − i180
(s23−s13)∆12∆34+(s23−s12)∆12∆34+(s13−s12)∆12∆34
× (1, 0, 4,−1, 0)C1 ...C5 (4.8)
and displays the same simple product form as in the four-boson and four-fermion case.
5. Discussion
In this paper, different methods were discussed to efficiently evaluate the superspace inte-
grals appearing in multiloop amplitudes derived in the pure spinor formalism. Extending
previous calculations [6, 7] restricted to Neveu-Schwarz states, it was then shown how the
treatment of Ramond states poses no additional difficulties.
While the bosonic calculations of [6, 7] have, in conjunction with supersymmetry,
already established the equivalence of the massless four-point amplitudes derived in the
pure spinor and RNS formalisms, it would be interesting to make contact between the
results of sections 4.2 / 4.3 and two-loop amplitudes involving Ramond states as computed
in the RNS formalism (see for example [19]).
The assistance of a computer algebra system seems indispensible in explicitly evaluat-
ing pure spinor superspace integrals. To avoid excessive use of custom-made algorithms,
it would be desirable to implement these calculations in a wider computational framework
particular adapted to field theory calculations [20].
The methods outlined in this paper should be easily applicable to future higher-loop
amplitude expressions derived from the pure spinor formalism, and, it is hoped, to other
superspace integrals.
Acknowledgements
The author would like to thank Louise Dolan for discussions, and Carlos Mafra for valuable
correspondence. This work is supported by the U.S. Department of Energy, grant no. DE-
FG01-06ER06-01, Task A.
– 16 –
A. Reduction to kinematic bases
In calculating scattering amplitudes one encounters kinematic factors which are Lorentz
invariant polynomials in the momenta, polarisations and/or spinor wavefunctions of the
scattered particles. It can be a non-trivial task to simplify such expressions, taking into
account the on-shell identities
i ki = 0, k
i = 0, ki · ζi = 0, /kiui = 0, and, in the case of
fermions, re-arrangements stemming from Fierz identities.
More generally, one would like to know how many independent combinations of some
given fields (subject to on-shell identitites) there are, and how to reduce an arbitrary expres-
sion with respect to some chosen basis. This appendix outlines methods to address these
problems, with an emphasis on algorithms which can easily be transferred to a computer
algebra system. These methods are not limited to dealing with pure spinor calculations
but the scope will be restricted to amplitudes of four massless vector or spinor particles in
ten dimensions.
A.1 Four bosons
It is not difficult to reduce polynomials in the momenta and polarisations to a canonical
form. The momentum conservation constraint
i ki = 0 is solved by eliminating one
momentum (for example k4), all k
i are set to zero, and one of the two remaining quadratic
combinations of momenta is eliminated (for example s23 → −s12− s13, where sij ≡ ki ·kj).
Then all products ki · ζi are set to zero, and one extra k · ζ product is replaced (when
eliminating k4, the replacement is k3 · ζ4 → (−k1 − k2) · ζ4). The remaining monomials are
then independent. (This is at least the case with the low powers of momenta encountered
in the calculations of sections 3 and 4, where there are enough spatial directions for all
momenta/polarisations to be linearly independent.)
The implementation of these reduction rules on a computer is straightforward. The
easiest way to obtain scalars which are also invariant under the gauge symmetry ki → ζi
is to start with expressions constructed from the field strengths F abi = 2∂
i . For the
one-loop calculations of section 3.1, the relevant basis consists of gauge invariant scalars
containing only the four field strengths F1 . . . F4. One finds six independent combinations,
Tr(F1F2F3F4)
Tr(F1F2F4F3)
Tr(F1F3F2F4)
Tr(F1F2)Tr(F3F4)
Tr(F1F3)Tr(F2F4)
Tr(F1F4)Tr(F2F3)
In the two-loop calculations of section 4.1, all monomials have two more momenta. There
are sixteen independent gauge invariant scalars of the form kkF1F2F3F4, and twelve of
them may be constructed from the previous basis by multiplication with s12 and s13:
A1 = s12 Tr(F1F2F3F4), A2 = s13 Tr(F1F2F3F4), etc. One choice for the additional four is
A13 = k3 · F1 · F2 · k3 Tr(F3F4) A15 = k3 · F1 · F4 · k2 Tr(F2F3)
A14 = k4 · F1 · F3 · k2 Tr(F2F4) A16 = k4 · F2 · F3 · k4 Tr(F1F4) .
– 17 –
As an example application of the computer algorithms, one may check that the symmetri-
sation operator of section 4.1,
S4B = (1− π12)(1− π34)(1 + π13π24)(1 + π23)(1− π24) ,
acts as
S4BA1 = 8A1 + 4A2 − 4A3 + 4A4 + 8A5 + 16A6
. . .
S4BA16 = −6A1 + 6A3 − 6A5 − 12A6 + 32A7 + 3A8 +
A9 + 3A10 +
A11 + 3A12
and has rank four.
A.2 Four fermions
In dealing with the spinor wavefunctions ui one has to face two issues: Fierz identities, and
the Dirac equation. Fierz identities not only allow one to change the order of the spinors
but also give rise to relations between different expressions in one spinor order. The Dirac
equation often simplifies terms with momenta contracted into (uiγ
[n]uj) bilinears.
In this section it is shown how to construct bases for terms of the form (k2 or k4) ×
u1u2u3u4. A significant simplification comes from noting that the Dirac equation allows
one to rewrite (uiγ
[n]uj) bilinears into terms with lower n if more than one momentum is
contracted into the γ[n]. A good first step is therefore to disregard the momenta temporarily
and find all independent scalars and two-index tensors built from u1, . . . , u4. From the
SO(10) representation content,
(S+)⊗4 = 2 · 1+ 6 · + 3 ·˜+ (tensors with rank > 2) ,
one expects two scalars and nine 2-tensors. The scalars are easily found by considering, as
in [21],
T1(1234) = (u1γ
au2)(u3γau4) ,
T3(1234) = (u1γ
abcu2)(u3γabcu4) .
and similarly for the other two inequivalent orders of the four spinors. (Note there is no T5
because of self-duality of the γ[5].) From Fierz transformations, one learns that all T3 terms
can be reduced to T1 by T3(1234) = −12T1(1234)− 24T1(1324) and permutations, and the
identity (γa)(αβ(γ
a)γ)δ = 0 implies that T1(1234) + T1(1324) + T1(1423) = 0, leaving for
example T1(1234) and T1(1324) as independent scalars.
Generalising this approach to two-index tensors, it turns out that it is sufficient to
start with
T11(1234) = (u1γ
mu2)(u3γ
nu4) ,
T31(1234) = (u1γ
aγmγnu2)(u3γau4) ,
T33(1234) = (u1γ
abγmu2)(u3γabγ
nu4) ,
– 18 –
and permutations of the spinor labels. It would be very tiresome to systematically apply
a variety of Fierz transformations by hand and to find an independent set. Fortunately,
by choosing a gamma matrix representation (such as the one listed in appendix B) and
reducing all expressions to polynomials in the independent spinor components u1i , . . . , u
this problem can be solved with computer help. As expected, one finds that the Tij(abcd)
span a nine-dimensional space, and a basis can be chosen as
T11(1234), T11(1324), T11(1423), T11(3412), T11(2413), T11(2314),
T31(1234), T31(1324), T31(2314) . (A.1)
A typical relation reducing the other Tij(abcd) to this basis is
T31(3412) = 2T11(1234) − 2T11(3412) + T31(1324) + T31(2314) + 2ηmnT1(1234) . (A.2)
Having solved the first step, it is now easy to include the two or four momenta, taking
the Dirac equation into account. Consider first the case of two momenta. Starting from
the two-tensors in (A.1), one gets the three independent scalars
(u1/k3u2)(u3/k1u4) , (u1/k2u3)(u2/k1u4) , (u1/k2u4)(u2/k1u3) .
In addition, there are four products of the two independent scalars T1(1234) and T1(1324)
with the two independent momentum invariants s12 and s13. By contracting (A.2) with
momenta, one can show that
s12T1(1324) − s13T1(1234)
= −(u1/k3u2)(u3/k1u4) + (u1/k2u3)(u2/k1u4)− (u1/k2u4)(u2/k1u3) , (A.3)
and this relation can be used to eliminate s12T1(1324). (It will become clear later that
there are no independent other relations like this one.) There are thus six independent
k2u1 · · · u4 scalars:
(u1/k3u2)(u3/k1u4) s12 T1(1234)
(u1/k2u3)(u2/k1u4) s13 T1(1234) (A.4)
(u1/k2u4)(u2/k1u3) s13 T1(1324)
Note that there is only one completely antisymmetric combination of those, given by the
right hand side of (A.3). Similarly, in the case of four momenta, one finds ten independent
k4u1 · · · u4 scalars:
B1 = s12 (u1/k3u2)(u3/k1u4) B2 = s13 (u1/k3u2)(u3/k1u4)
B3 = s12 (u1/k2u3)(u2/k1u4) B4 = s13 (u1/k2u3)(u2/k1u4)
B5 = s12 (u1/k2u4)(u2/k1u3) B6 = s13 (u1/k2u4)(u2/k1u3) (A.5)
B7 = s
12 T1(1234) B8 = s12s13 T1(1234)
B9 = s
13 T1(1234) B10 = s
13 T1(1324)
– 19 –
Working in a gamma matrix representation, it is again simple to construct a computer
algorithm which reduces any given k2u1 · · · u4 or k4u1 · · · u4 scalar into polynomials of the
spinor and momentum components. The Dirac equation can then be solved by breaking
up the sixteen-component spinors ui into eight-dimensional chiral spinors u
i and u
i , as in
eq. (B.1). One obtains polynomials in the momentum components kai and the independent
spinor components (uci )
1...8. However, a great disadvantage of this procedure is that it
breaks manifest Lorentz invariance. For example, one encounters expressions which contain
subsets of terms proportional to the square of a single momentum and are therefore equal
to zero, but it is difficult to recognise this with a simple algorithm. The easiest solution
is to choose several sets of particular vectors ki satisfying k
i = 0 and
i ki = 0 and to
evaluate all expressions on these vectors. (By choosing integer arithmetic, one easily avoids
issues of numerical accuracy.) Substituting these sets of momentum vectors in the bases
(A.4) and (A.5) gives full rank six and ten respectively, showing they are indeed linearly
independent.
Equipped with a computer algorithm for these basis decompositions, one finds, for
example, that the symmetriser S4F of section 4.2,
S4F = 4(1 + π12)(1 − π23)(1 + π24) ,
acts on the B1 . . . B10 basis as
S4FB1 = −12B4 + 12B5 + 12B6 ,
. . .
S4FB10 = 8B1 + 16B2 − 8B3 − 16B4 + 8B5 + 16B6 − 24B7 − 24B8 − 24B9
and has rank three.
A.3 Two bosons, two fermions
The combined methods of the last two sections can easily be extended to the mixed case
of two bosons and two fermions. In the one-loop calculation of section 3.3, one encounters
scalars of the form ku1u2F3F4. A basis of such objects is given by
C1 = (u1γ
au2)k
C2 = (u1γ
au2)F
C3 = (u1γ
au2)F
c (A.6)
C4 = (u1γ
abcu2)F
C5 = (u1γ
abcu2)F
There are two combinations antisymmetric in [12] and symmetric in (34):
−C1 + 4C2 +C4 and C2 + C3 .
Finally, there are ten independent scalars of the form k3u1u2F3F4 (relevant to the two-loop
calculation of section 4.3), and they can all be obtained by multiplication of C1 . . . C5 with
the two momentum invariants s12 and s13.
– 20 –
B. A gamma matrix representation
A convenient representation of the SO(1,9) gamma matrices is given by the 32×32 matrices
0 (γa)αβ
(γa)αβ 0
where
(γ0)αβ = 116 = (γ
0)αβ ,
(γ9)αβ =
−18 0
= −(γ9)αβ ,
and (γa)αβ = −(γa)αβ, a = 1 . . . 8, is a real, symmetric 16×16 representation for the SO(8)
Clifford algebra,
(γa)αβ =
(σa)T 0
, a = 1 . . . 8 ,
as given in appendix 5.B of [21]. The matrices Γa satisfy the SO(1,9) Clifford algebra
relations,
{Γa,Γb} = 2ηab132 , ηab = (+−− · · · −) ,
and bilinears of chiral spinors (with, say, positive chirality) are constructed as
(uΓ[a1...ak]v) = (uγ[a1...ak ]v) = uα(γ[a1)αβ(γ
a2)βγ . . . (γak ])γδv
This representation is particularly suitable for the calculations outlined in appendix A
because it allows a simple decomposition of SO(1,9) spinors into SO(8) spinors due to its
block structure:
Γ0 · · ·Γ9 =
116 0
0 −116
, Γ1 · · ·Γ8 =
18 0 0 0
0 −18 0 0
0 0 18 0
0 0 0 −18
Therefore, the Dirac equation for a chiral 16-component spinor u,
(γa)αβ∂au
α = 0 ,
can be solved by splitting u into two chiral eight-component spinors of SO(8),
with γ1...8
One obtains the coupled equations
(∂0 + ∂9)u
s − (σ · ∂)uc = 0
(∂0 − ∂9)uc − (σT · ∂)us = 0
(with eight-dimensional dot products). These can be solved for us in terms of uc:
(σ · ∂)uc =
(σ · k)uc , (B.1)
where k+ = −i∂+ = −i√
(∂0 + ∂9).
– 21 –
References
[1] N. Berkovits, Super-Poincaré covariant quantization of the superstring, J. High Energy Phys.
04 (2000) 018 [hep-th/0001035].
[2] N. Berkovits, Multiloop amplitudes and vanishing theorems using the pure spinor formalism
for the superstring, J. High Energy Phys. 09 (2004) 047 [hep-th/0406055].
[3] N. Berkovits, New higher-derivative R4 theorems [hep-th/0609006].
[4] N. Berkovits, Super-Poincaré covariant two-loop superstring amplitudes, J. High Energy
Phys. 01 (2006) 005 [hep-th/0503197].
[5] N. Berkovits, Explaining pure spinor superspace [hep-th/0612021].
[6] N. Berkovits and C.R. Mafra, Equivalence of two-loop superstring amplitudes in the pure
spinor and RNS formalisms, Phys. Rev. Lett. 96 (2006) 011602 [hep-th/0509234].
[7] C.R. Mafra, Four-point one-loop amplitude computation in the pure spinor formalism, J.
High Energy Phys. 01 (2006) 075 [hep-th/0512052].
[8] E. D’Hoker and D.H. Phong, Two loop superstrings, 1. Main formulas, Phys. Lett. B 529
(2002) 241, [hep-th/0110247].
[9] L. Anguelova, P.A. Grassi and P. Vanhove, Covariant one-loop amplitudes in D = 11, Nucl.
Phys. B 702 (2004) 269 [hep-th/0408171].
[10] G. Policastro and D. Tsimpis, R4, purified, Class. and Quant. Grav. 23 (2006) 4753
[hep-th/0603165].
[11] N. Berkovits and C.R. Mafra, Some superstring amplitude computations with the
non-minimal pure spinor formalism, J. High Energy Phys. 11 (2006) 079 [hep-th/0607187].
[12] A. Cohen, M. van Leeuwen and B. Lisser, LiE: A Computer algebra package for Lie group
computations, v. 2.2 (1998), http://wwwmathlabo.univ-poitiers.fr/~maavl/LiE/
[13] U. Gran, GAMMA: A Mathematica package for performing gamma-matrix algebra and Fierz
transformations in arbitrary dimensions [hep-th/010508].
[14] H. Ooguri, J. Rahmfeld, H. Robins, J. Tannenhauser, Holography in superspace, J. High
Energy Phys. 07 (2000) 045 [hep-th/0007104].
[15] P.A. Grassi, L. Tamassia, Vertex operators for closed superstrings, J. High Energy Phys. 07
(2004) 071 [hep-th/0405072].
[16] J.J. Atick and A. Sen, Covariant one loop fermion emission amplitudes in closed string
theories, Nucl. Phys. B 293 (1987) 317.
[17] M. B. Green and J. H. Schwarz, Supersymmetrical dual string theory. 3. Loops and
renormalisation, Nucl. Phys. B 198 (1982) 441.
[18] C. R. Mafra, Pure Spinor Superspace Identities for Massless Four-point Kinematic Factors,
[arXiv:0801.0580 [hep-th]].
[19] C.-J. Zhu, Covariant two-loop fermion emission amplitude in closed superstring theories,
Nucl. Phys. B 327 (1989) 744.
[20] K. Peeters, Introducing Cadabra: A symbolic computer algebra system for field theory
problems [hep-th/0701238].
[21] M.B. Green, J.H. Schwarz and E. Witten, Superstring theory. Vol. 1: Introduction,
Cambridge University Press, 1987.
– 22 –
http://jhep.sissa.it/stdsearch?paper=04%282000%29018
http://jhep.sissa.it/stdsearch?paper=04%282000%29018
http://xxx.lanl.gov/abs/hep-th/0001035
http://jhep.sissa.it/stdsearch?paper=09%282004%29047
http://xxx.lanl.gov/abs/hep-th/0406055
http://xxx.lanl.gov/abs/hep-th/0609006
http://jhep.sissa.it/stdsearch?paper=01%282006%29005
http://jhep.sissa.it/stdsearch?paper=01%282006%29005
http://xxx.lanl.gov/abs/hep-th/0503197
http://xxx.lanl.gov/abs/hep-th/0612021
http://www-spires.slac.stanford.edu/spires/find/hep/www?j=PRLTA%2C96%2C011602
http://xxx.lanl.gov/abs/hep-th/0509234
http://jhep.sissa.it/stdsearch?paper=01%282006%29075
http://jhep.sissa.it/stdsearch?paper=01%282006%29075
http://xxx.lanl.gov/abs/hep-th/0512052
http://www-spires.slac.stanford.edu/spires/find/hep/www?j=PHLTA%2CB529%2C241
http://www-spires.slac.stanford.edu/spires/find/hep/www?j=PHLTA%2CB529%2C241
http://xxx.lanl.gov/abs/hep-th/0110247
http://www-spires.slac.stanford.edu/spires/find/hep/www?j=NUPHA%2CB702%2C269
http://www-spires.slac.stanford.edu/spires/find/hep/www?j=NUPHA%2CB702%2C269
http://xxx.lanl.gov/abs/hep-th/0408171
http://www-spires.slac.stanford.edu/spires/find/hep/www?j=CQGRD%2C23%2C4753
http://xxx.lanl.gov/abs/hep-th/0603165
http://jhep.sissa.it/stdsearch?paper=11%282006%29079
http://xxx.lanl.gov/abs/hep-th/0607187
http://wwwmathlabo.univ-poitiers.fr/~maavl/LiE/
http://xxx.lanl.gov/abs/hep-th/010508
http://jhep.sissa.it/stdsearch?paper=07%282000%29045
http://jhep.sissa.it/stdsearch?paper=07%282000%29045
http://xxx.lanl.gov/abs/hep-th/0007104
http://jhep.sissa.it/stdsearch?paper=07%282004%29071
http://jhep.sissa.it/stdsearch?paper=07%282004%29071
http://xxx.lanl.gov/abs/hep-th/0405072
http://www-spires.slac.stanford.edu/spires/find/hep/www?j=NUPHA%2CB293%2C317
http://www-spires.slac.stanford.edu/spires/find/hep/www?j=NUPHA%2CB198%2C441
http://www-spires.slac.stanford.edu/spires/find/hep/www?j=NUPHA%2CB327%2C744
http://xxx.lanl.gov/abs/hep-th/0701238
ABSTRACT
  The pure spinor formulation of the ten-dimensional superstring leads to
manifestly supersymmetric loop amplitudes, expressed as integrals in pure
spinor superspace. This paper explores different methods to evaluate these
integrals and then uses them to calculate the kinematic factors of the one-loop
and two-loop massless four-point amplitudes involving two and four Ramond
states.

<|endoftext|><|startoftext|>
Introduction
	Formulation for Lifetimes of cc+, cc++, cc+ 
	Spectator Contribution to Lifetimes of cc+, cc++, cc+ 
	Non-spectator Contributions to Inclusive Decays of cc+, cc++, cc+
	The hadronic matrix elements
	Input parameters and Numerical results
	Conclusion and Discussion
	References
ABSTRACT
  In this work, we evaluate the lifetimes of the doubly charmed baryons
$\Xi_{cc}^{+}$, $\Xi_{cc}^{++}$ and $\Omega_{cc}^{+}$. We carefully calculate
the non-spectator contributions at the quark level where the Cabibbo-suppressed
diagrams are also included. The hadronic matrix elements are evaluated in the
simple non-relativistic harmonic oscillator model. Our numerical results are
generally consistent with that obtained by other authors who used the diquark
model. However, all the theoretical predictions on the lifetimes are one order
larger than the upper limit set by the recent SELEX measurement. This
discrepancy would be clarified by the future experiment, if more accurate
experiment still confirms the value of the SELEX collaboration, there must be
some unknown mechanism to be explored.

<|endoftext|><|startoftext|>
Introduction
	Observations And Data Reduction
	1991 Observations
	2001 Observations
	The Radial Velocities
	Period Searches
	Orbital Variations of the Emission Lines
	Orbital Tomograms and Trailed Spectra
	Spin Variations of the Emission lines
	The Spin Radial Velocity Curve
	Spin Tomograms and Trailed Spectra
	Discussion of the Orbital and Spin Data
	White dwarf and secondary masses
	The revised model of EX Hya
	Summary
	References
ABSTRACT
  Results from spectroscopic observations of the Intermediate Polar (IP) EX Hya
in quiescence during 1991 and 2001 are presented. Spin-modulated radial
velocities consistent with an outer disc origin were detected for the first
time in an IP. The spin pulsation was modulated with velocities near ~500-600
km/s. These velocities are consistent with those of material circulating at the
outer edge of the accretion disc, suggesting corotation of the accretion
curtain with material near the Roche lobe radius. Furthermore, spin Doppler
tomograms have revealed evidence of the accretion curtain emission extending
from velocities of ~500 km/s to ~1000 km/s. These findings have confirmed the
theoretical model predictions of King & Wynn (1999), Belle et al. (2002) and
Norton et al. (2004) for EX Hya, which predict large accretion curtains that
extend to a distance close to the Roche lobe radius in this system. Evidence
for overflow stream of material falling onto the magnetosphere was observed,
confirming the result of Belle et al. (2005) that disc overflow in EX Hya is
present during quiescence as well as outburst. It appears that the hbeta and
hgamma spin radial velocities originated from the rotation of the funnel at the
outer disc edge, while those of halpha were produced due to the flow of
material along the field lines far from the white dwarf (narrow component) and
close to the white dwarf (broad-base component), in agreement with the
accretion curtain model.

<|endoftext|><|startoftext|>
Introduction
We do not know what six-dimensional (2, 0) theory really is. It is believed
that it can sustain solitonic self-dual strings [1], although no one today knows
what a (non-Abelian) self-dual string really is. But if we break the gauge group
maximally to U(1)r, then we should be able to define the charges of these
mysterious self-dual strings by the asymptotic behaviour of the U(1) gauge
fields. One should expect these asymptotic U(1) fields to be (at least isomorphic
with) a copy of the familiar abelian two-form gauge potentials (with self-dual
field strengths).
It now seems to make sense to ask a question like, what is the dimension of
the moduli space of self-dual strings of a given charge?
If the gauge group is SU(2) and is broken to U(1) by the Higgs vacuum
expectation value (that should also determine the tension of the string), then the
intuitive answer to this question is 4N where N is the U(1) charge in a suitable
normalization, such that N = 1 corresponds to one self-dual string. One may
argue that half the supersymmetry is broken by the string. Therefore one string
should sustain 4 fermionic zero modes. Since some (half) of the supersymmety
is unbroken there should also be 4 corresponding bosonic zero modes. These
are naturally identified with the translational zero modes associated with the
four transverse directions to the string. Furthermore, the strings being BPS,
should be possible to separate at no cost of energy (thus staying in the moduli
space approximation). If we take them far from each other, one may suspect
that we can just add 4 bosonic zero modes from each string, to get 4N bosonic
zero modes in total in a configuration of N strings [2].
It would of course be nice to have a proof of this conjecture. Could it be
proven if one had some index theorem? We will not provide a full solution to
this problem in this Letter. But we will make it plausible that the problem can
indeed be solved by computing the index of a certain Dirac operator in loop
space.
To address our index problem, we think that one can lend the methods
that Callias [3] used to prove his index theorem in odd-dimensional spaces. In
our case we have an even number of dimensions (namely the four transverse
direction) so it is apparent that we would have to construct a new type of
index. This we do in section 3.
In section 2 we recall the Callias method [3] to address index problems
in open spaces, though we will modify Callias’ regularization, using the more
convergent exponential function to obtain the index, as the limit
γe−sD
, (1)
(here D2 > 0 and γ =diag (1,−1)) rather than
D2 +M2
, (2)
which is the regularization that Callias used. We think that using the more
convergent regularization of an exponential function is interesting in itself, as
it could possibly extend the Callias index theorem to a wider class of index
problems. Therefore we will devote the first part of this Letter on this subject.
But let us at once say that our regulator probably has no advantages when
attacking these old problems. It does not provide us with a solution for how to
count the number of zero modes in a multimonopole configuration with a non-
maximally broken gauge group, where the index can not be reliable computed
due to a contribution from the continuum portion of the spectrum. What we
hope though, is that our regulatization can be useful when attacking our new
index problem associated with the moduli space of self-dual strings.
In section 2 we obtain the index in one and three dimensions. In three
dimensions we apply this on the multimonopole moduli space and re-derive the
result in [4]. A recent review article on monopoles and supersymmetry is [5].
The one and three-dimensional index problems have also been studied in [6].
We then indicate how our method manages to reproduce the correct results in
any odd dimensions. In section 3 we show how one at least in principle should
be able to compute the dimension of the moduli space of N self-dual strings by
computing a certain index.
2 Computing the Callias index in odd-dimensional
spaces
For Dirac operators on open n− 1-dimensional space where n− 1 is odd, there
is an index theorem by Callias [3]. This applies to Dirac equations of the form
Dψ = 0 (3)
where the Dirac operator D is of the form
D = γiiDi + γnφ. (4)
Here i = 1, ..., n− 1 and γµ ≡ (γi, γn) denote the Dirac gamma matrices,
{γµ, γν} = 2δµν . (5)
We define the gauge covariant derivative as iDis = i∂is +Ais and all our fields
are hermitian. If n− 1 is odd, the gamma matrices can be represented as
One may use the n-dimesional notation Aµ = (Ai, φ), D = γµiDµ, but one must
then remember that space is really n− 1 dimensional.
If n − 1 is even there is no Weyl representation of the gamma matrices
(because of the inclusion of the ‘gamma-five’), and no index theorem of this
form exists.
We define the ‘gamma-five’ for even n as
γ ≡ −i−
2 γ1···n (7)
which then is hermitian, and we define the projectors
(1∓ γ) . (8)
In odd dimensions n− 1, the Dirac operator splits into two Weyl operators
D ≡ P+DP−
D† ≡ P−DP+ (9)
Because P± andD are all hermitian, it follows thatD† is the hermitian conjugate
of D. Also, because D is already of an off-block diagonal form, it suffices to
include just one of the projectors, so we can just as well write this as
D = P+D = DP−
D† = P−D = DP+ (10)
The index can now be defined as
dimkerD − dimkerD† (11)
Since kerD = ker
and kerD† = ker
we can express this as2
dimker
− dimker
= dimker
. (12)
where we have noted that γ = P− − P+.
Callias, Weinberg and others used the regulator
I(M2) = Tr
D2 +M2
to obtain the index as the limit M2 → 0. In this Letter we will be slightly more
general. We define
Ji(x, y) ≡ tr 〈x |γγif(D)| y〉 , (14)
for any function f (and of course D is not dimensionless, so D has to be accom-
panied by M in a suitable way). Then we notice that
W (x, y) ≡ (iγi∂xi + γµAµ(x) +M) 〈x |f (D)| y〉
= 〈x |f (D)| y〉
−iγi∂yi + γµAµ(y) +M
where (manifestly)
W (x, y) = 〈x |(D +M)f (D)| y〉 . (16)
From this, we obtain the following identity
∂xi + ∂yi
Ji(x, y) = 2tr 〈x |γDf(D)| y〉
+tr (Aµ(y)−Aµ(x)) 〈x |γγµf(D)| y〉 (17)
In odd dimensions, the second term in the right hand side vanishes as x ap-
proaches y. This can be seen as being equivalent to the statement that there is
no chiral anomaly in odd dimensions (by using point-splitting and inserting a
Wilson line). So we get
i∂iJi(x, x) = 2tr 〈x |γDf(D)|x〉 (18)
2To see this that kerD = kerD†D we apply the definition of hermitian conjugate
with respect to the inner product (ψ, χ) =
dxψ†χ and the property of the norm, to
0 = (ψ,D†Dψ) = (Dψ,Dψ).
If we wish to compute the index as in Eq (13), then we can take
f(D) =
D2 +M2
(however there is no unique choice of Ji). We then get
Ji(x, y) = tr
D2 +M2
−D2 +D2 +M2
D2 +M2
= −tr
D2 +M2
. (20)
provided
= 0 (21)
We will see in the next few paragraphs how one can achieve this by using a
principal value prescription.
The virtue of expressing Eq (13) as a total divergence, is that we then can
compute the index as a boundary integral over an (n− 2)-sphere at infinity as
I(M2) =
dΩn−2r
n−2x̂iJi(x, x). (22)
where r is the radius of the sphere and dΩn−2 denotes the volume element of
the unit sphere.
If instead we wish to compute the index as the limit of
I(s) = Tr
γe−sD
. (23)
as s→ ∞, then we get
Ji(x, y) = tr
. (24)
It might seem confusing that we can have a plus sign here, when we have a
minus sign in Eq (20). These peculiar signs seem to be correct though. Why we
can have opposite signs should be a reflection of the fact that these expressions
can not be continuously connected with each other, at least not in any obvious
way (like taking M to zero and s to zero. In fact s should be taken to plus
infinity as M goes to zero). We will now illustrate how one can use this Ji to
compute the index in odd dimensions.
One dimension
We choose our gamma matrices as
, γ2 =
and we have
γ = iγ1γ2 =
. (26)
The Dirac operator reads
D = iγ1∂ + γ2φ (27)
We need the square of the Dirac operator,
D2 = −∂2 + φ2 + γ∂φ. (28)
We make the choice
J1(x, y) = −tr
D2 +M2
We assume that φ(x) converges towards some constant values at x = −∞ and
x = +∞. That means that we may ignore ∂φ(x) for sufficiently large |x|, where
we then get
J1(x, x) = −tr (γγ1γ2)
k2 + φ2 +M2
φ2 +M2
The index is now given by
(J1(+∞)− J1(−∞)) = ±1 (31)
if φ flips the sign an odd number of times when going from −∞ to +∞, and 0
otherwise.
If instead we choose
J(x, y) = tr
then we get
J(x, x) = tr (γγ1γ2)
k2 + φ2
e−s(k
2+φ2) (33)
If we compute the integral over k in the most natural way, then we get a result
that vanishes in the limit s → ∞. Could there be another way of defining this
integral, such that we do not get zero as the result? We notice that the integral
A(s) ≡
e−s(k
k2 + 1
for s > 0 is convergent only if we integrate k along a line in the complex plane
which is such that it asymptotically is such that −π
< θ < π
where k = |k|eiθ.
Integrating along any such line in the complex plane, we get the same value of
this integral. If on the other hand we integrate over a line that asymptotically
lies outside this cone, then we get a divergent integral for s > 0. But we get
a convergent integral for s < 0. We then define the value of the integral for
s > 0 as the analytic continuation of the same integral for s < 0. It remains to
compute this convergent integral. Replacing k by ik and s by −s, we get the
integral
A(−s) = −i
e−s(k
k2 − 1 (35)
We can compute its derivative
′(−s) = −i
−s(k2−1) = −i
s (36)
The right-hand side can obviously be analytically continued to −s, and that is
how we will define A(s) where the integral representation does not converge.
We can then integrate up A′(s),
A(∞) = A(0)−
e−s = A(0)−
= A(0)− π (37)
and we then need to compute
A(0) = i
k2 − 1
We define this as the principal value. This is ad hoc – we have no argument
why one should define it like this. But if we accept this, then we get A(0) = 0.
We conclude that we could just as well define the integral that we had, as
e−s(k
k2 + 1
= −π. (39)
But this requires us to perform the integration of k in the cone where it diverges
for s > 0, and then define this integral by analytic continuation. This seem to
be rather ad hoc. We have three rather week arguments why one should Wick
rotate. First, if we keep x − y as a small number, then we get the factor
eik(x−y) and this can act as a convergence factor only if we Wick rotate. (We
illustrate this in the Appendix where we compute the corresponding integral in
any complex number of dimensions.) Second, it seems to be the only way that
we could produce a non-trivial answer. Third, with this prescription we will
manage to reproduce the right answer in any odd number of dimensions, where
we can check our result against the safer regularization used by Callias.
If we compute the integral by this prescription, then we get
J(x, x) = tr (γγ1γ2) lim
k2 + φ2
e−s(k
2+φ2) = i
and we see that we indeed get the right answer.
Three dimensions and magnetic monopoles
The physics problem that we will consider in three dimensions, is to compute
number of zero modes of the Bogomolnyi equation
Fij = ǫijkDkφ (41)
We choose the convention that our fields are hermitian. It is convenient to group
the fields into ‘gauge potential’
Aµ = (Ai, φ) (42)
We define Dµ = (Di, φ) such that iDµ = i∂µ + Aµ and we let Gµν = i[Dµ, Dν ]
be the associated ‘field strength’. Then the Bogomolnyi equation reads
Gµν =
ǫµνρσGρσ . (43)
Linearizing this, we get
DµδAν =
ǫµνρσDρδAσ (44)
Contracting with γµν , we get
(1 + γ)γµνDµδAν = 0 (45)
and if we impose the background gauge condition
DµδAµ = 0 (46)
which is to say that zero modes are orthogonal to gauge variations with respect
to the moduli space metric, then we can write this linearized equation as a Dirac
equation
Dψ ≡ γµDµψ = 0 (47)
where
ψ := (1 + γ)γµδAµ. (48)
We compute
D2 = −D2i + φ2 +
iγµνGµν (49)
Inserting the Bogomolnyi configuration we can write this, thus using the fact
that Gµν is selfdual,
2 = −D2i + φ2 +
(1 + γ)iγµνGµν . (50)
and get a vanishing theorem. Namely, dimkerDD† = 0 as DD† > 0 is strictly
postive. Hence we can compute the dimension of the moduli space dim kerD ≡
dimkerD†D just by computing the index of D. To compute the index, we now
wish to compute
Ji(x, x) = tr
γγiγkDk
We assume that asymptotically φ approaches a constant value at infinity. This
corresponds to a gauge choice where we have a Dirac string singularity. Some
further examination reveals that we get a non-negligible contribution to Ji, for
a sufficiently large two-sphere, only from the term
Ji(x, x) = tr
γγiγ4φ
(2π)3
k2 + φ2 + 1
iγµνGµν
−s(k2+φ2+ 1
iγµνGµν)
We thus need to perform an integral of the form
A(s) =
k2 + 1
e−s(k
2+1) (53)
If we choose the same prescription as we did in one dimension, then we get the
result
A(+∞) = π. (54)
For details of such a computation we refer to appendix A.
If we apply this result to the integral that we had, we get
Ji(x, x) =
γγiγ4φ
iγµνGµν
We expand the square root,
iγµνGµν = φ+
iγµνGµν + ... (56)
In the far distance, in a charge Q monopole configuration, we find that
γµνGµν = 2γkγ4(1− γ)
Q (57)
and so when we trace over the gamma matrices, we get
Ji(x, x) =
. (58)
If we now for instance assume SU(2) gauge group, broken to U(1), then if we
integrate i
Ji over S
2, we get the index 2Q. The number of bosonic zero modes
is twice the index, i.e. −4Q in our conventions [4, 5].
(2m+ 1) dimensions
In 2m+ 1 dimensions we get the integral
A(µ) ≡ lim
k2 + µ2
e−s(k
2+µ2) (59)
if we use our regulator. Here
µ2 ≡ v2 +G (60)
(and G is an abbreviation for 1
iγµνGµν .) This should be compared to the
integral
B(µ) ≡ − lim
(−1)m
(k2 + v2 +M2)
Gm (61)
that we get using the Callias regulator. 3 In order to compare these integrals,
we rewrite them as
A(µ) = µ2m−1a
B(µ) = v−1bGm (63)
where
a = lim
ξ2 + 1
−s̃(ξ2+1)
b = − lim
(−1)m
ξ2 + 1 + M̃2
We compute a according the prescription introduced above in one and three
dimensions, that is by Wick rotating ξ and continue analytically in s. (Details
are in appendix A.) We can compute b using residue calculus (introducing a
regulator so that we can close the contour on a semi-circle at infinity). The
result is
a = −(−1)mπ
b = (−1)m 1
) (65)
We next expand
vA(µ) = v
v2 +G
)m− 1
= v2ma+ ...+
m + ...
vB(µ) = bGm (66)
and we find that the coefficient of Gm becomes equal to
−(−1)m
) π (67)
if one uses our regularization, and equal to
(−1)m 1
) π (68)
3This integral comes from expanding
k2 + v2 +G+M2
k2 + v2 +M2
+ ... (62)
in powers of G as a geometric series [4].
if one uses the Callias regularization. We see that the two expressions coincide
for all m.
We have now showed that if we use our prescription of Wick rotating k to
compute the integrals over the exponential, then we get the right answer for all
cases that can be safely computed using a regulator that is less convergent. We
are inclined to think that our prescription for how to compute the integral, will
also work for index problems where the Callias regulator diverges. But we have
no proof. It is perhaps not so obvious that more general index problems can be
formulated. In the next section we will give one example of a more general type
of index problem.
3 Four dimensions and self-dual strings
To introduce the notation, we first consider the free Abelian tensor multiplet
theory in 1 + 5 dimensions. The on-shell field content is a two-form gauge
potential Bµν , five scalar fields φ
A and corresponding Weyl fermions ψ. The
field strength Hµνρ = ∂µBνρ + ∂ρBµν + ∂νBρµ is selfdual. The supersymmetry
variation of the Weyl fermions is
ΓµνρHµνρ + Γ
µΓA∂µφ
ǫ (69)
where we use eleven-dimensional gamma matrices splitted into SO(1, 5)×SO(5),
so that in particular
{Γµ,ΓA} = 0. (70)
In a static and x5 independent field configuration, in which only φ5 =: φ is
non-zero, we find the SUSY variation
Γ0i5H0i5 + Γ
iΓA=5∂iφ
ǫ (71)
If we assume that the classical bosonic field configuration is such that
∂iφ = H0i5 (72)
then the SUSY variation reduces to
δψ = ∂iφΓ
Γ05 + ΓA=5
ǫ (73)
and we find the condition for unbroken SUSY as
1 + Γ05ΓA=5
ǫ = 0 (74)
If we use the Weyl condition
Γǫ = −ǫ (75)
of the (2, 0) supersymmetry parameter ǫ, then we can also write this as
1 + Γ1234ΓA=5
ǫ = 0. (76)
We may represent the gamma matrices as
Γµ = (Γ0,Γi,Γ5) =
1⊗ iσ2 ⊗ 1, γi ⊗ σ1 ⊗ 1, γ ⊗ σ1 ⊗ 1
ΓA = 1⊗ iσ2 ⊗ σA (77)
where σ1,2,3 are the Pauli sigma matrices, γ = γ1234. Then the condition for
unbroken SUSY is
(1 + γ ⊗ σ) ǫ = 0 (78)
where σ = σ1234 = σA=5.
We have found that if
Hijk = ǫijkl∂lφ (79)
then half SUSY is unbroken. This equation is the Bogomolnyi equation for self-
dual strings [1]. We are interested in finding the number of parameters needed
to describe solutions of this equation. We can linearize it and get the equation
γi∂iχ = 0 (80)
for the bosonic zero modes, that we have gathered into a matrix
χ ≡ γijδBij + γδφ. (81)
For this to work we must also assume the background gauge condition
∂iBij = 0. (82)
Now this linearized equation Eq (80) does not make any reference to the gauge
field. So there is no way that we could count the number of parameters of a
multi-string configuration just using this equation. This should of course not be
a surprise. The strings that we have in the Abelian theory are not solutions of
the field equations. They have to be inserted by hand, that is we need to insert
delta function sources by hand, in the same spirit as for Dirac monopoles.
To be able to count the number of zero modes, we must consider some
interacting theory which (at the classical level) has solitonic string solutions.
To pass to non-Abelian theory we begin by rewriting the Abelian theory
in loop space. Loop space consists of parametrized loops C: s 7→ Cµ(s). We
introduce the Abelian ‘loop fields’ [7]
Aµs = Bµν(C(s))Ċ
ν (s)
φµs = φ(C(s))Ċµ(s)
µs = ψ(C(s))Ċµ(s) (83)
With these definitions, a short computation reveals that Aµs transforms as a vec-
tor and φµs a contra-variant vector under diffeomorphisms in loop space induced
by diffeomorphisms in space-time. One may then extend these transformation
properties to any diffeomorphism in loop space. Space-time diffeomorphism and
reparametrizations of the loops then get unified and are both diffemorphisms
in loop space. The only thing to remember is what is kept fixed under the
variation. If it is the parameter of the loop, or the loop itself.
The field strength becomes
Fµs,νt = Hµνρ(C(s))Ċ
ρ(s)δ(s− t) (84)
In terms of these fields, the Bogomolnyi equation will read4
Fis,jt = ǫijkl∂k(sφlt). (85)
We pass to the non-Abelian theory by letting these loop fields become non-
Abelian, in the sense that Aµs = A
a(s) where λa(s) are generators of a loop
algebra associated to the gauge group [7]. We introduce a covariant derivative
Dµs = ∂µs +Aµs. (86)
Local gauge transformations act as
δΛAµs = DµsΛ
µs = [φµs,Λ]. (87)
Given a loop C, we automatically get a tangent vector Ċµ(s) that makes no
reference to space-time. We can therefore impose the loop space constraints
Ċµ(s)Aµs = 0 (88)
for each s, and also
φµs = Ċµ(s)φ(s;C) (89)
for some subtle field φ(s;C) on loop space. As a consequence, we find that
µs = 0. (90)
These constraints are covariant under diffeomorphisms of space-time and reparametriza-
tions of loops. They are invariant also under local gauge transformations, pro-
vided that the gauge parameter is subject to the condition
µ(s)∂µsΛ = 0 (91)
which is the condition of reparametrization invariance. With the assumption
made that λa(s) are generators of a loop algebra, we find that the constraint
can also be written as
[Aµs, φ
µt] = 0 (92)
A local gauge variation of this constraint is
[DµsΛ, φ
µt] + [Aµs, [φ
µt,Λ]] = [∂µsΛ, φ
µt] + [[Aµs,Λ], φ
µt] + [Aµs, [φ
µt,Λ]]
= [∂µsΛ, φ
µt] + [Λ, [φµt, Aµs]] (93)
The last term vanishes by the constraint. The first term gives us the constraint
Eq (92) that we must impose on the gauge parameter
dsΛa(s, C)λa(s). (94)
4We denote by ∂is the usual functional derivative with respect to C
µ(s).
We have now introduced non-local non-Abelian fields with infinitely many
components. It is also likely that consisteny of the theory requires an infinite
set of constraints on these fields. Maybe then, it could be that we may in the
end descend to a finite degrees of freedom. But this is just a speculation. The
problem appears to be difficult and ill-defined – How should one define a degree
of freedom in a strongly coupled non-local theory?
The non-Abelian generalization of the Bogomolnyi equation should be given
by [7]
Fis,jt = ±ǫijklDk(sφlt). (95)
This equation is gauge invariant and invariant under the residual SO(4) Lorentz
group that is preserved by the strings. We can not think of any reasonable
modification of this equation that would preserve these symmetries, so on this
grounds alone one could suspect this equation to be correct. Of course this is not
the only requirement that the BPS condition imposes. We also get conditions
on the 0s and the 5s components. But these BPS equations will be of no interest
to us right now.
We will show below that the linearized Bogomolnyi equation can be written
Di(s + σφi(s
χt) = 0 (96)
We will also see below that we (presumably) can actually drop the symmetriza-
tion in s and t in this equation. The fields transform in the adjoint represen-
tation of the loop algebra, by which we mean that φisχt = [φis, χt]. We define
the Dirac operator
Ds = γi (Dis + σφis) (97)
and the projectors
(1∓ γσ) , (98)
We can now formulate an index problem, in an even-dimensional (loop-)space.
The even-dimensional space in this case is given by the 4-dimensional transverse
space to the strings, and the index is given by
dimkerDs − dimkerD†s (99)
where
Ds = P+Ds = DsP−
D†s = P−Ds = DsP+. (100)
Since Ds and P± are hermitian, it is manifest that D†s defined this way will be
the hermitian conjugate of Ds, thus justifying the notation.
Computing the index alone is not sufficient in order to obtain the dimension
of the moduli space of self-dual strings. We also need a vanishing theorem that
says that dimkerD†s = 0.
Linearizing the Bogomolnyi equation, we get
2D[isδAjt] = ±ǫijkl (Dksδφlt + φksδAlt) (101)
Contracting by γij , we get
γijD̃isχjt = 0 (102)
where we have defined
D̃is ≡ Dis ∓ γφis
χis ≡ δAis ∓ γδφis (103)
To see that the linearized BPS equation can be written like this, one must use
the constraint
γijφisδφjt = 0. (104)
We can avoid having explicit ± signs by introducing the other chiraly matrix
at our disposal, namely σ that lives in a different vector space than γ. We can
then hide the ± signs in the tensor product
γ ⊗ σ = ±1 (105)
which amounts to
D̃is ≡ Dis + σφis
χis ≡ δAis + σδφis (106)
without any ±.5 If we define
χs ≡ γiχis (108)
then we can write the zero mode equation as
γiD̃isχt + D̃
sχit = 0. (109)
Let us analyze the second term in this equation. It is given by
DisδAit + φ
sδφit
φisδAit +D
sδφit
(110)
We should not count variations that are gauge variations as bosonic zero modes.
We can insure this by demanding the zero modes to be orthogonal to gauge
variations, with respect to the metric on the moduli space,
(δΛAis, δAit) + (δΛφis, δφjt) = 0 (111)
This leads to the background gauge condition
sδAit + φ
sδφit = 0. (112)
5To really understand what is going on, one should apply (1± γσ) on everything, on ψs
and on Ds. Then one notices that
∓γ (1∓ γσ) = σ (1∓ γσ) . (107)
That is, we can trade ∓γ for σ, once we apply (1± γσ) on everything. This is what we really
should do, but to keep the notation simple, we do not spell this out.
This condition implies that the gauge variation of the zero modes vanishes,
δΛδAis = 0 = δΛδφis (113)
To see this, we make a gauge variation δΛδAis = DisΛ, δΛφis = φisΛ, and
ask which gauge parameters Λ will respect the background gauge condition.
Inserting this gauge variation into the background gauge condition, we get
sDit + φ
Λ = 0. (114)
For this to work nicely, it seems that we must constrain the non-locality of our
loop field such that ∂i
∂it) < 0. Then the only solution to this equation is Λ = 0.
In other words all gauge variations of the zero modes have to vanish.
Furthermore we want the variation to preserve the orthogonality between
Ais and φis,
(Ais, δφit) + (δAis, φit) = 0 (115)
If we make a gauge variation of this, then we get the condition
(δΛAis, δφit) + (δAis, δΛφit) = 0 (116)
which amounts to
φisδAit +D
sδφit = 0. (117)
We conclude that the zero mode equation can be written as
Dsχt = 0 (118)
where
Ds = γi (Dis + σφis) (119)
We are interested in counting the number of such modes in a background of k
BPS strings. We compute
D2 = (Dis)
2 + (φis)
γij (Fis,js + γσǫijklDksφls) (120)
(Here D2 ≡ DsDs ≡
DsDs, and analogously for the other fields or opera-
tors.) In a BPS configuration, we get is
2 = (Dis)
2 + (φis)
ij (1 + γσ)Fis,js (121)
Furthermore, in the subspace where 1 + γσ = 0, we find that
D2 = (Dis)
2 + (φis)
2 (122)
is a strictly negative operator, hence has no zero modes. This means that we
have a vanishing theorem, dimkerD† = 0.
A small comment
The zero mode equation was really
D(sχt) = 0 (123)
where we should symmetrize in s and t. That means that we should rather
consider
DsD(sχt) =
(DsDsχt +DsDtχs)
(DsDsχt +DtDsχs + [Ds, Dt]χs) . (124)
If now D[sDt] = 0 and Dsχs = 0, then we get
DsDsχt = 0 (125)
The latter condition, Dsχs = 0 is of course a consequence of D(sχt) = 0 with
s = t. The former condition reads
0 = D[sDt]
= Di[sDit] + φi[sφit] + σDi[sφit] (126)
which we would like to impose as a constraint. Restricting to the abelian case
this is condition is of course true as 0 ≡ ∂i[s∂|i|t]. If we can impose this as a
constraint on the non-abelian fields, then we have now seen that the zero mode
equation Eq (123) implies that
dsD†sDsχt = 0 (127)
because Ds is anti-self-adjoint with respect to the inner product
(ψs, χt) =
ψ†s(C)χt(C)
(128)
on loop space. We can also go in the opposite direction. Assuming that Eq
(127) holds, we get
χt, D
sDsχt
= (Dsχt, Dsχt) (129)
and we conclude that (123) implies
Dsχt = 0 (130)
with no symmetrization in s, t.
How to compute the index
We should now be able to compute an index associated to self-dual strings, as
the limit
I(s) = Tr
(131)
when s→ ∞. We define the quantity
Jis(C,C
′) = tr
γσγiγk (Dks + σφks)
(132)
(it should be clear that the two s’s involved in this formula are totally unrelated)
and find that
I(s) =
DC∂isJ is(C,C) (133)
We can separate the functional integral over parametrized loops C into several
pieces. We can keep a point on the loops C(s) = x fixed, and separate it as
DxC (134)
Then we can write I(s) as an integral over a large three-sphere at spatial infinity,
∂Jis(C)
∂Ci(s)
dΩ3x̂
DxCJis(C,C) (135)
where thus x = C(s).
If we assume that the gauge group is maximally broken to a product of
U(1)’s by the Higgs vacuum expectation values, then we should have U(1) loop
fields at spatial infinity.
If we assume that the gauge group is SU(2) and that it is broken to U(1),
then we need only the asymptotic form of the U(1) fields at spatial infinity,
Fis,jt = Hijl(x)Ċ
l(s)δ(s− t)
φks = vĊk(s) (136)
Without doing any computations, we can guess what the outcome of the
index calculation should be. A term like
ǫijkl
DxCtr (Fis,jt(C)Fks,jt(C)) (137)
could certainly arise somewhere (in odd dimensions a corresponding term van-
ished since there is no chiral anomaly in odd dimensions). In our case this term
vanishes identically by the Bogomolnyi equation and the constraint6
Fis,jtDisφjt = 0. (139)
Then there can be a term
ǫijkl
DxCtr
Fis,jtφks
(140)
6For U(1) fields this would read
Fis,jt∂isφjt ∼ Hijk(C(s))∂iφ(C(s))Ċ
k(s)Ċj(s)δ(s − t)2 ≡ 0. (138)
that should arise in a very similar way as the corresponding term arose for
monopoles. If we insert the asymptotic U(1) fields, this term becomes propor-
tional to
ǫijklHijk(x) (141)
That means that the index should be given by some numerical constant, times
the magnetic charge
H. (142)
A Integrals over the exponential
The integral we will analyze here is
a(s) =
k2 + 1
−s(k2+1)eiǫk (143)
for any complex number ζ. (The ǫ > 0, say, will be taken towards zero. It
arose from ǫ = x− y and we keep it here just as a convergence factor.) We first
compute
a(0) =
k2 + 1
eiǫk (144)
In order to make this integral converge for any ζ, we should Wick rotate k to
ik, and henceforth we will always mean by i the branch eiπ/2, and by −1 we
mean eiπ . Then we get
a(0) = −i2ζ+1
k2 − 1
e−ǫk (145)
and this integral we evaluate as a principal value. That means to evaluate the
residues along the real axis and multiply them not by 2πi, but by half of it, that
is, by πi. We get
a(0) = (−1)ζπ 1− (−1)
. (146)
Next we turn to our integral a(s). It is easier to first compute the derivative.
We should still work with the Wick rotated integral. Making the substitution
ξ = k2 we can put it on the form of two gamma functions. The result is that
′(−s) = −eiπ(ζ+
1 + (−1)2ζ
−ζ− 1
s (147)
which we can trivially continue analytically to +s, and then integrate up. The
result is
a(+∞) = −π 1 + (−1)
cos(πζ)
+ π(−1)ζ 1− (−1)
. (148)
References
[1] P. S. Howe, N. D. Lambert and P. C. West, “The self-dual string soliton,”
Nucl. Phys. B 515, 203 (1998) [arXiv:hep-th/9709014].
[2] D. S. Berman and J. A. Harvey, “The self-dual string and anomalies in
the M5-brane,” JHEP 0411, 015 (2004) [arXiv:hep-th/0408198].
[3] C. Callias, “Index Theorems On Open Spaces,” Commun. Math. Phys.
62, 213 (1978).
[4] E. J. Weinberg, “Parameter Counting For Multi - Monopole Solutions,”
Phys. Rev. D 20, 936 (1979).
[5] E. J. Weinberg and P. Yi, “Magnetic monopole dynamics, supersymmetry,
and duality,” Phys. Rept. 43, 65 (2007) [arXiv:hep-th/0609055].
[6] M. Hirayama, “Supersymmetric Quantum Mechanics And Index Theo-
rem,” Prog. Theor. Phys. 70, 1444 (1983).
[7] A. Gustavsson, “A reparametrization invariant surface ordering,” JHEP
0511, 035 (2005) [arXiv:hep-th/0508243].
A. Gustavsson, “The non-Abelian tensor multiplet in loop space,” JHEP
0601, 165 (2006) [arXiv:hep-th/0512341].
ABSTRACT
  We give a prescription for how to compute the Callias index, using as
regulator an exponential function. We find agreement with old results in all
odd dimensions. We show that the problem of computing the dimension of the
moduli space of self-dual strings can be formulated as an index problem in
even-dimensional (loop-)space. We think that the regulator used in this Letter
can be applied to this index problem.

<|endoftext|><|startoftext|>
Introduction
Let X = {0, 1}Zd denote a configuration space, where Zd is the d-dimensional
integer lattices. The contact process {ηt : t ≥ 0} is an X-valued continuous-
time Markov process. The model was introduced by Harris in 1974 [1] and
is considered as a simple model for the spread of a disease with the infection
rate λ. In this setting, an individual at x ∈ Zd for a configuration η ∈ X is
infected if η(x) = 1 and healthy if η(x) = 0. The formal generator is given
Ωf(η) =
c(x, η)[f(ηx)− f(η)],
where ηx ∈ X is defined by ηx(y) = η(y) (y 6= x), and ηx(x) = 1−η(x). Here
for each x ∈ Zd and η ∈ X, the transition rate is
c(x, η) = (1− η(x))× λ
y:|y−x|=1
η(y) + η(x),
http://arxiv.org/abs/0704.0019v2
with |x| = |x1|+ · · ·+ |xd|. In particular, the one-dimensional contact process
001 → 011 at rate λ,
100 → 110 at rate λ,
101 → 111 at rate 2λ,
1 → 0 at rate 1.
Let Y = {A ⊂ Zd : |A| < ∞}, where |A| is the number of elements in A.
Let ξAt (⊂ Zd) denote the state at time t of the contact process with ξA0 = A.
There is a one-to-one correspondence between ξAt (⊂ Zd) and ηt ∈ X such
that x ∈ ξAt if and only if ηt(x) = 1. For any A ∈ Y , we define the extinction
probability of A by limt→∞ P (ξ
t = ∅). Define νλ(A) = νλ{η : η(x) = 0
for any x ∈ A}, where νλ is an invariant measure of the process starting
from a configuration: η(x) = 1 (x ∈ Zd) and is called the upper invariant
measure. In other words, let δ1S(t) denote the probability measure at time t
for initial probability measure δi which is the pointmass η ≡ i(i = 0, 1). Then
νλ = limt→∞ δ1S(t). Then self-duality of the process implies that νλ(A) =
limt→∞ P (ξ
t = ∅). The correlation identities for νλ(A) can be obtained as
follows:
Theorem 1.1 For any A ∈ Y ,
y:|y−x|=1
νλ(A ∪ {y})− νλ(A)
νλ(A \ {x})− νλ(A)
From now on we consider the one-dimensional case. We introduce the fol-
lowing notation:
νλ(◦) = νλ({0}), νλ(◦◦) = νλ({0, 1}), νλ(◦ × ◦) = νλ({0, 2}), . . . .
By Theorem 1.1, we obtain
Corollary 1.2
2λνλ(◦◦)− (2λ+ 1)νλ(◦) + 1 = 0,(1)
λνλ(◦ ◦ ◦)− (λ+ 1)νλ(◦◦) + νλ(◦) = 0,(2)
2λνλ(◦ ◦ ◦◦) + νλ(◦ × ◦)− (2λ+ 3)νλ(◦ ◦ ◦) + 2νλ(◦◦) = 0,(3)
λνλ(◦ ◦ ×◦)− (2λ+ 1)νλ(◦ × ◦) + λνλ(◦ ◦ ◦) + νλ(◦) = 0.(4)
The detailed discussion concerning results in this section can be seen in
Konno [2, 3]. If we regard λ, νλ(◦), νλ(◦◦), νλ(◦ ◦ ◦), . . . as variables, then
the left hand sides of the correlation identities by Theorem 1.1 are polyno-
mials of degree at most two. In the next section, we give a new procedure
for getting a series of approximations for extinction probabilities based on
the Gröbner basis by using Corollary 1.2. As for the Gröbner basis, see [4],
for example.
2 Our results
Put x = νλ(◦), y = νλ(◦◦), z = νλ(◦ ◦ ◦), w = νλ(◦ × ◦), s = νλ(◦ ◦
◦◦), u = νλ(◦ ◦ ×◦). Let ≺ denote the lexicographic order with λ ≺ x ≺
y ≺ w ≺ z ≺ u ≺ s. For m = 1, 2, 3, let Im be the ideals of a polynomial
ring R[x1, x2, . . . , xn(m)] over R as defined below. Here x1 = λ, x2 = x, x3 =
y, x4 = z, x5 = w, x6 = s, x7 = u and n(1) = 3, n(2) = 4, n(3) = 7.
2.1 First approximation
We consider the following ideal based on Corollary 1.2 (1):
I1 = 〈 2λy − 2λx− x+ 1, y − x2 〉 ⊂ R[λ, x, y].(5)
Here y−x2 corresponds to the first (or mean-field) approximation: ν(1)
(◦◦) =
λ (◦))2. Then
G1 = {(x− 1)(2λx− 1), y − x2}(6)
is the reduced Gröbner basis for I1 with respect to ≺. Therefore the solution
except a trivial one x(= y) = 1 is x = ν
(◦) = 1/(2λ). Remark that the
trivial solution means that the invariant measure is δ0. From this, we obtain
the first approximation of the density of the particle, ρλ = Eνλ(η(x)), as
follows:
= 1− ν(1)
(◦) = 2λ− 1
for any λ ≥ 1/2. This result gives the first lower bound λ(1)c of the critical
value λc of the one-dimensional contact process, that is, λ
c = 1/2 ≤ λc.
However it should be noted that the inequality is not proved in our approach.
The estimated value of λc is about 1.649.
2.2 Second approximation
Consider the following ideal based on Corollary 1.2 (1) and (2):
I2 = 〈 2λy − 2λx− x+ 1, λz − λy − y + x, xz − y2 〉 ⊂ R[λ, x, y, z].
Here xz−y2 corresponds to the second (or pair) approximation: ν(2)
(◦)ν(2)
◦) = (ν(2)λ (◦◦))2. Then
G2 = {(x− 1)((2λ− 1)x− 1), 1 + 2λ(y − x)− x,
−y − yx+ 2x2,−z − y(2 + y) + 4x2}
is the reduced Gröbner basis for I2 with respect to ≺. Therefore the solution
except a trivial one x(= y = z) = 1 is x = ν
(◦) = 1/(2λ − 1). As in a
similar way of the first approxaimation, we get the second approximation of
the density of the particle:
2(λ− 1)
2λ− 1 ,
for any λ ≥ 1. This result implies the second lower bound λ(2)c = 1. We
should remark that if we take
I ′2 = 〈 2λy − 2λx− x+ 1, λz − λy − y + x, y − x2, z − x3 〉 ⊂ R[λ, x, y, z],
then we have
G′2 = {z − 1, y − 1, x− 1}
is the reduced Gröbner basis for I ′2 with respect to ≺. Here y−x2 and z−x3
correspond to an approximation: ν
(◦◦) = (ν(2
(◦))2 and ν(2
(◦ ◦ ◦) =
(◦))3, respectively. Then we have only trivial solution: x = y = z = 1.
2.3 Third approximation
Consider the following ideal based on Corollary 1.2 (1)–(4):
I3 = 〈 2λy − 2λx− x+ 1, λz − λy − y + x,
2λs+ w − (2λ+ 3)z + 2y, λu− (2λ+ 1)w + λz + x,
ys− z2, xu− yw 〉 ⊂ R[λ, x, y, z, w, s, u].
Here ys−z2 and xu−yw correspond to the third approximation: ν(3)
(◦◦)ν(3)
◦◦) = (ν(3)
(◦ ◦ ◦))2 and ν(3)
(◦)ν(3)
(◦ ◦×◦) = ν(3)
(◦◦)ν(3)
(◦× ◦), respectively.
G3 = {(x− 1)((12λ3 − 5λ− 1)x2 − 2λ(2λ+ 3)x− λ+ 1), . . .}
is the reduced Gröbner basis for I3 with respect to ≺. Therefore the solution
except a trivial one x = 1 is x = ν
λ (◦) = (λ(2λ+3)+
D)/(12λ3−5λ−1),
where D = 16λ4 + 4λ2 + 4λ+ 1. Then we obtain the third approximation of
the density of the particle:
4λ(3λ2 − λ− 3)
12λ3 − 2λ2 − 8λ− 1 +
for any λ ≥ (1 +
37)/6. This result corresponds to the third lower bound
c = (1 +
37)/6 ≈ 1.180.
3 Summary
We obtain the first, second, and third approximations for the extinction
probability, the density of the particle, and the lower bound of the one-
dimensional contact process by using the Gröbner basis with respect to a
suitable term order. These results coincide with results given by the Harris
lemma (more precisely, the Katori-Konno method, see [3]) or the BFKL
inequality [5] (see also [3]). As we saw, the generators of Im in Section 2
have degree at most two in x1, x2, . . ., such as 2λy − 2λx− x+ 1, ys− z2 in
the case of I3. We expect that this property will lead to get the higher order
approximations of the process (and other interacting particle systems having
a similar property) effectively.
Acknowledgment. The author thanks Takeshi Kajiwara for valuable dis-
cussions and comments.
References
[1] T. E. Harris, Contact interactions on a lattice, Ann. Probab. 2: 969–988
(1974).
[2] N. Konno, Phase Transitions on Interacting Particle Systems, World
Scientific, Singapore (1994).
[3] N. Konno, Lecture Notes on Interacting Particle Systems,
Rokko Lectures in Mathematics, Kobe University, No.3 (1997),
http://www.math.kobe-u.ac.jp/publications/rlm03.pdf.
[4] D. A. Cox, J. B. Little, and D. O’Shea, Ideals, Varieties, And Al-
gorithms: An Introduction to Computational Algebraic Geometry And
Commutative Algebra, 3rd edition, Undergraduate Texts in Mathemat-
ics, Springer Verlag (2007).
[5] V. Belitsky, P. A. Ferrari, N. Konno, and T. M. Liggett, A strong corre-
lation inequality for contact processes and oriented percolation, Stochas-
tic. Process. Appl. 67: 213–225 (1997).
http://www.math.kobe-u.ac.jp/publications/rlm03.pdf
	Introduction
	Our results
	First approximation
	Second approximation
	Third approximation
	Summary
ABSTRACT
  In this note we give a new method for getting a series of approximations for
the extinction probability of the one-dimensional contact process by using the
Gr\"obner basis.

<|endoftext|><|startoftext|>
Introduction
	The f+(q2) hadronic form factor
	Form factor parameterizations
	Taylor expansion
	Model-dependent parameterizations
	Quantitative expectations
	Quark Models
	QCD sum rules
	Lattice QCD
	Analyzed parameterizations
	The BABAR Detector and Dataset
	Signal reconstruction
	Signal selection
	Background rejection
	q2 measurement
	Results on the q2 dependence of the hadronic form factor 
	Systematic Uncertainties
	c-quark hadronization tuning
	Reconstruction algorithm
	Resolution on q2
	Particle identification
	Background estimate
	Fitting procedure and radiative events
	Control of the statistical accuracy in the SVD approach
	Summary of systematic errors
	Comparison with expectations and with other measurements
	Branching fraction measurement
	Selection of candidate signal events
	Efficiency corrections
	Systematic uncertainties on RD
	Correlated systematic uncertainties
	Selection requirement on the Fisher discriminant
	D*+ counting in D0K- +
	Decay rate measurement
	Summary
	Acknowledgments
	References
ABSTRACT
  The shape of the hadronic form factor f+(q2) in the decay D0 --> K- e+ nue
has been measured in a model independent analysis and compared with theoretical
calculations. We use 75 fb(-1) of data recorded by the BABAR detector at the
PEPII electron-positron collider. The corresponding decay branching fraction,
relative to the decay D0 --> K- pi+, has also been measured to be RD = BR(D0
--> K- e+ nue)/BR(D0 --> K- pi+) = 0.927 +/- 0.007 +/- 0.012. From these
results, and using the present world average value for BR(D0 --> K- pi+), the
normalization of the form factor at q2=0 is determined to be f+(0)=0.727 +/-
0.007 +/- 0.005 +/- 0.007 where the uncertainties are statistical, systematic,
and from external inputs, respectively.

<|endoftext|><|startoftext|>
Molecular Synchronization Waves in Arrays of Allosterically Regulated Enzymes
Vanessa Casagrande,1 Yuichi Togashi,2, ∗ and Alexander S. Mikhailov2, †
1Hahn-Meitner-Institut, Glienicker Straße 100, 14109 Berlin, Germany
2Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, 14195 Berlin, Germany
Spatiotemporal pattern formation in a product-activated enzymic reaction at high enzyme con-
centrations is investigated. Stochastic simulations show that catalytic turnover cycles of individual
enzymes can become coherent and that complex wave patterns of molecular synchronization can
develop. The analysis based on the mean-field approximation indicates that the observed patterns
result from the presence of Hopf and wave bifurcations in the considered system.
PACS numbers: 82.40.Ck, 87.18.Pj, 82.39.Fk, 05.45.Xt
Molecular machines, such as molecular motors, ion
pumps and some enzymes, play a fundamental role in
biological cells and can be also used in the emerging soft-
matter nanotechnology [1]. A protein machine is a cyclic
device, where each cycle consists of conformational mo-
tions initiated by binding of an energy-bringing ligand
[2, 3]. In motors, such internal motions generate me-
chanical work [4], while in enzymes they enable or facil-
itate chemical reaction events (see, e.g., [5, 6]). Much
attention has been attracted to studies of biomembranes
with ion pumps and molecular motors, where membrane
instabilities and synchronization effects have been ana-
lyzed [7, 8, 9]. Here, a different class of distributed ac-
tive molecular systems — formed by enzymes — is con-
sidered. The catalytic activity of an allosteric enzyme
protein is activated or inhibited by binding of small reg-
ulatory molecules; the role of such regulatory molecules
can be played by products of the same reaction [10]. Pre-
vious investigations of simple product-regulated enzymic
systems [11, 12] and enzymic networks [13] in small spa-
tial volume with full diffusional mixing have shown that
spontaneous synchronization of molecular turnover cycles
can take place there. External molecular synchroniza-
tion of enzymes of the photosensitive P-450 dependent
monooxygenase system by periodic optical forcing has
been experimentally demonstrated [14].
In this Letter, spatiotemporal pattern formation in en-
zymic arrays is investigated. In such systems, immobile
enzymes are attached to a solid planar support immersed
into a solution through which fresh substrate is supplied
and product molecules are continuously removed. Prod-
uct molecules released by an enzyme diffuse through the
solution and activate catalytic turnover cycles of neigh-
bouring enzymes in the array.
A simple stochastic model [12] of an enzyme as a cyclic
machine (a stochastic phase oscillator), shown in Fig. 1,
is used. Binding of a substrate molecule to an enzyme i
initiates an ordered internal conformational motion, de-
scribed by the conformational phase coordinate φi. The
initial state corresponds to the phase φi = 0. The cat-
alytic conversion event takes place and the product is
released at the state φp inside the cycle. After that, the
substrate
enzyme
regulatory
molecule
feedback
product
FIG. 1: (Color online) A sketch of the model.
conformational motion continues until the equilibrium
state of the enzyme (φi = 1) is finally reached. Initi-
ation of a turnover cycle is a random event, occurring
at a certain probability rate. We assume that substrate
is present in abundance, and its concentration is not af-
fected by the reactions. Conformational motion inside
the cycle is modeled as a stochastic diffusional drift pro-
cess, described by equation
φi = v+ ηi(t), where v is the
mean drift velocity and ηi(t) is an internal white noise
with 〈ηi(t)ηj(t
′)〉 = 2σδijδ(t− t
′) where σ specifies inten-
sity of intramolecular fluctuations.
Allosterically activated enzymes possess a site on their
surface where regulatory molecules can become bound.
Binding of a regulatory molecule leads to conformational
change that enhances catalytic activity of the enzyme. A
regulatory molecule binds to an enzyme with rate con-
stant β and dissociate from it with rate constant κ. Bind-
ing of a regulatory molecule at an enzyme raises its prob-
ability to start a cycle from α0 to α1. We assume that
a regulatory molecule can bind to an enzyme only in its
rest state and this molecule is released when the cycle
is started. The role of regulatory molecules is played
by product molecules of the same reaction. Immobile
enzymes are randomly distributed in space with concen-
tration c. Product diffuses at diffusion constant D and
undergoes decay at rate constant γ. The characteristic
diffusion length of product molecules is ldiff =
In our stochastic 2D simulations, the medium was dis-
cretized into spatial cells (up to 256 × 256), each con-
http://arxiv.org/abs/0704.0021v2
FIG. 2: Stochastic (a,b) and mean-field (c,d) simulations of
2D wave patterns; (a) τp = 0.14, c = 1, and β = 300, (b)
τp = 0.25, c = 10, and β = 10, (c) τp = 0.14, c = 1, and
β = 300, (d) τp = 0.34, c = 100, and β = 1.42. Other
parameters are α0 = 1, α1 = 1000, κ = 10, γ = 10, σ = 0,
D = 100. The linear size of the shown area is L = 40 ldiff in
all panels.
taining a number of enzyme molecules. The cells were
so small that diffusional mixing of product molecules in
a cell within the shortest characteristic time of the reac-
tion could always take place. Each enzyme was described
by the stochastic model given above; diffusion of product
molecules was modeled as a random walk over a discrete
cell lattice. The mean cycle time τ = 1/v was chosen as
the time unit (τ = 1). Systems including up to 655 360
enzymes were used in the simulations.
Figure 2a,b (see also Videos 1 and 2 in ref. [15]) shows
two typical examples of stochastic 2D simulations. Here,
spatial distributions of product molecules are displayed.
Waves of product concentration are propagating through
the medium. In a peak of a wave, many locally present
enzymes are simultaneously releasing product molecules.
Since product release can take place only at a certain
stage inside the cycle, this means that the cycles of en-
zymes are locally synchronized. Not only regular wave
structures, such as rotating spiral waves or target pat-
terns (Fig. 2a), but also complex regimes of wave turbu-
lence (Fig. 2b) have been observed.
To understand and interpret stochastic simulation re-
sults, an analytical study of the system in the mean-field
approximation, which holds in the limit of high enzyme
concentrations, has been performed. In this approxima-
tion, the system is characterised by three continuous vari-
ables n0(r, t), n1(r, t) and m(r, t) which represent local
concentrations of enzymes in the rest state without or
with regulatory molecules attached (n0 and n1) and local
concentration of the product (m). For simplicity, internal
fluctuations in enzymes are neglected (σ = 0). Thus, all
enzymes which have started their cycles at some time t
would release their products at a definite time t+τp (with
τp = φp/v) and finish their cycles, returning to the rest
state, at time t + τ . Therefore, the system is described
by a set of three reaction-diffusion equations with time
delays,
= βmn0 − κn1 − α1n1 (1a)
= −βmn0 + κn1 − α0n0 + α0n0(t− τ)
+α1n1(t− τ) (1b)
= −βmn0 + κn1 + α1n1 − γm+ α0n0(t− τp)
+α1n1(t− τp) +D∇
2m. (1c)
The system always has a uniform stationary state
with certain concentrations n0, n1 and m, which can be
found as solutions of the respective algebraic equations.
This state corresponds to the absence of synchronization.
However, it may become unstable if allosteric activation
is strong enough. To analyze stability, small perturba-
tions δn0, δn1 and δm are added to the stationary state,
equations (1) are linearized and their solutions are sought
as δn0 ∼ δn1 ∼ δm ∼ exp (λqt− iqx) with λq = µq+iωq.
Thus, each spatial mode with wavevector q is character-
ized by its frequency ωq and its rate of growth µq. The
properties µq and ωq are given by the roots of a charac-
teristic equation which is determined by the linearization
matrix of equations (1). The steady state becomes unsta-
ble when at least one spatial mode with some wavenum-
ber q0 starts to grow (µq0 > 0).
As the bifurcation parameter, coefficient β can be cho-
sen. If regulatory molecules cannot bind to enzymes
(β = 0), feedback is absent and instabilities are not pos-
sible. On the other hand, allosteric activation becomes
strong if regulatory molecules can easily bind and, in this
case, emergence of oscillations and wave patterns can be
expected. Our bifurcation analysis reveals that, depend-
ing on the parameters of the system, it can exhibit ei-
ther a Hopf or a wave bifurcation [16]. As a result of
the Hopf bifurcation, uniform oscillations with q = 0 de-
velop. Because of the presence of delays in equations (1),
the characteristic equation is nonpolynomial in terms of
λ and, generally, a number of oscillatory solutions with
different frequencies ω are possible. Physically, such so-
lutions correspond to formation of several synchronous
enzymic groups. This effect has been previously exten-
sively investigated for similar systems in small spatial
volumes with full diffusional mixing [11] and we shall
not further discuss it here. The most robust uniform
oscillations, which we consider, are characterized by the
frequency ω ≈ 2π/τ and correspond to the single-group
synchronization. As the result of a wave bifurcation (also
known as the Hopf bifurcation with a finite wave number
[17]), the first unstable modes are traveling waves with
a certain wavenumber q0. Figure 3 shows the bifurca-
tion diagram in the parameter plane (τp, β). Note the
presence of a codimension-2 bifurcation point where the
boundaries of the Hopf and the wave bifurcations join.
To investigate nonlinear dynamics of the system, nu-
merical simulations of equations (1) have been performed
0 0.1 0.2 0.3 0.4
oscillations
codimension−2
wave−Hopf bifurcation 
uniform
ripples
pacemakers/waves
higher frequency/
mixed modes
waves
standing−
traveling
standing waves
FIG. 3: Phase diagram (α0 = 1, α1 = 1000, κ = 10, γ = 10,
c = 100, D = 1000). The Hopf bifurcation (solid line) and the
wave bifurcation (dash-dotted line) boundaries are displayed.
Gray lines show instability of the stationary state with respect
to development of uniform oscillations with two (dashed) and
three (dotted) groups in the well-mixed case. Lines separating
parameter domains with different kinds of patterns are hand-
drawn, based on numerical simulations.
[16]. The explicit Euler integration method has been
used; no-flux boundary conditions were applied. Results
of 1D simulations are summarized in Fig. 3 and examples
of typical observed patterns are shown in Fig. 4. Stand-
ing waves (Fig. 4a) develop when the boundary of the
wave bifurcation (dash-dotted curve) is crossed and uni-
form oscillations are observed above the boundary of the
Hopf bifurcation. Near the codimension-2 point, more
complex behavior was found. This included rippled os-
cillations (Fig. 4b), self-organized pacemakers (Fig. 4c)
and modulated traveling waves (Fig. 4d). The observed
patterns are similar to those previously found in reaction-
diffusion systems with the wave bifurcation [18]. In the
right upper corner of the diagram in Fig. 3, higher fre-
quency oscillations with several synchronous groups take
place.
Two-dimensional simulations of reaction-diffusion
equations (1) with time delay have been performed for
selected parameter values. In 2D simulations, sponta-
neously developing concentric waves (target patterns)
and spiral waves have been observed; target patterns
were however unstable and evolved into pairs of rotat-
ing spiral waves (Fig. 2c and Video 3 [15]). Complex
wave regimes, which can be qualitatively characterized
as turbulence of standing waves, have also been observed
(Fig. 2d and Video 4 [15]).
The mean-field approximation is based on neglect-
ing statistical fluctuations in concentrations of reacting
species [11] and, therefore, it should hold in the high
concentration limit. In Fig. 4, two upper panel rows
display spatiotemporal patterns which are observed in
FIG. 4: Spatiotemporal patterns in a 1D system (in each
panel, the vertical axis is time, running down, and the hor-
izontal axis is the coordinate). The upper two rows are
stochastic simulations (σ = 0) with concentrations c = 1 and
c = 10, the bottom row shows mean-field simulations with
c = 100. (a) τp = 0.3, β = 95/c, (b) τp = 0.14, β = 260/c, (c)
τp = 0.22, β = 600/c, and (d) τp = 0.16, β = 300/c. Other
parameters as in Fig. 3; the system size shown is L = 51
ldiff .
stochastic simulations with parameter values correspond-
ing to the respective mean-field simulations. To compare
mean-field simulations with different enzyme densities,
the following property of equations (1) can be used: in-
troducing relative concentrations ñ0 = n0/c, ñ1 = n1/c
and m̃ = m/c, it can be noticed that they obey the same
equations, but with a rescaled coefficient β̃ = βc. Thus,
essentially the same patterns are observed as long as
the parameter combination βc remains constant. In the
stochastic simulations in Fig. 4, the coefficient β has been
increased to compensate for a decrease in the enzyme
concentration. For larger enzyme concentrations, good
agreement between mean-field predictions and stochas-
tic simulations has been found. In the mean-field equa-
tions (1), intramolecular fluctuations are not taken into
account (σ = 0 and therefore each turnover cycle has
the same fixed duration τ). Stochastic simulations have
been, however, also performed when such fluctuations
were present. Synchronization waves could still be found
even at internal noise levels which corresponded to the
mean relative dispersion ξ of turnover times of about 10%
(with ξ =
/τ ≃ (2στ)
Although the emphasis in this Letter is on the phenom-
ena in two-dimensional enzymic arrays, analogous effects
should be expected for three-dimensional systems repre-
senting aqueous enzymic solutions. The linear stability
analysis, yielding Hopf and wave bifurcation boundaries
(see Fig. 3), is valid also for the 3D geometry. We have
performed preliminary stochastic simulations for thin so-
lution layers with high enzyme concentrations and could
observe synchronization patterns similar to those found
for the enzymic arrays.
A product molecule, released by an enzyme, diffuses
in the solution until it either binds, as a regulatory
molecule, to another enzyme or undergoes a decay. Here,
it should be taken into account that a regulatory molecule
can bind to an allosteric enzyme only at a certain bind-
ing site of characteristic radius R. Using the theory of
diffusion-controlled reactions, the average time ttransit
after which a regulatory product would find a binding
site of one of the enzymes can be roughly estimated
[11] as ttransit = 1/cDR, if enzymes are uniformly dis-
tributed inside the reaction volume with concentration
c. Therefore, binding typically occurs within the dis-
tance Lcorr = (Dttransit)
= (cR)
from the point
where a molecule is released. Obviously, it can only
take place if the product molecule has not undergone
decay until that moment, i.e. if γttransit < 1. This
condition puts a restriction on the enzyme concentration
c, which must be higher than the critical concentration
c∗ = γ/DR. Choosing γ = 103 s−1, D = 10−5 cm2s−1
and R = 10−7 cm, the critical enzyme concentration is
c∗ = 1015 cm−3 = 10−6 M. A similar estimate can be
obtained when enzymes are immobilized on a plane im-
mersed into a reactive solution; in this case the mean dis-
tance between the enzymes on the plane should be less
than lc = (Rldiff)
[22]. Although the required en-
zyme concentrations are relatively large, they are within
the range characteristic for biological cells (glycolytic en-
zymes are present [19] in a cell at even higher concentra-
tion of more than 10−5 M). The characteristic temporal
period of developing patterns is determined by the en-
zyme turnover time τ , which typically varies from mil-
liseconds to seconds. The characteristic length scale of
developing wave patterns is determined by the diffusion
length ldiff , which can vary under these conditions from
a fraction of a micrometer to tens of micrometers.
Our analysis shows that spontaneous molecular syn-
chronization of allosteric product-activated enzymes can
be observed in enzymic arrays. Artificial arrays formed
by immobilized protein machines (molecular motors) are
already used in experiments on active nanoscale trans-
port (see [20]). Many enzymes in biological cells are
membrane-bound, thus forming natural enzymic arrays.
Similar phenomena are possible in dense enzyme solu-
tions. In the study by Petty et al. [21], traveling waves
of NAD(P)H and proton concentrations with the wave-
length of about a micrometer were observed inside neu-
trophil cells. These metabolic waves had the temporal
period of about 300 ms, which is by two orders of magni-
tude shorter than the characteristic period of glycolytic
oscillations in the cells and lies closer to the time scales
of turnover cycles of individual enzymes. An intriguing
question, requiring further detailed analysis, is whether
molecular synchronization waves may have already been
seen in these experiments.
Molecular synchronization waves are principally dif-
ferent from classical concentration waves in reaction-
diffusion systems. Under synchronization conditions,
internal conformational states of individual enzyme
molecules in their turnover cycles become strongly cor-
related. In optics, a similar situation is found when a
transition to coherent laser generation has taken place.
Our theoretical analysis may open a way to the investiga-
tions of a new class of spatio-temporal pattern formation
in chemically active molecular systems.
The authors are grateful to M. Falcke and P. Stange for
valuable discussions. Financial support of Japan Society
for the Promotion of Science through a fellowship for
research abroad (Y. T.) is acknowledged.
∗ Present address: Nanobiology Laboratories, Graduate
School of Frontier Biosciences, Osaka University, 1-3 Ya-
madaoka, Suita, Osaka 565-0871, Japan; Electronic ad-
dress: togashi@phys1.med.osaka-u.ac.jp
† Electronic address: mikhailov@fhi-berlin.mpg.de
[1] K. Kinbara, T. Aida, Chem. Rev. 105, 1377 (2005).
[2] L. A. Blumenfeld, A. N. Tikhonov, Biophysical Thermo-
dynamics of Intracellular Processes: Molecular Machines
of the Living Cell (Springer, Berlin 1994).
[3] M. Gerstein, A. M. Lesk, C. Chothia, Biochemistry 33,
6739 (1994).
[4] F. Jülicher, A. Ajdari, J. Prost, Rev. Mod. Phys. 69, 1269
(1997).
[5] H.-Ph. Lerch, A. S. Mikhailov, B. Hess, Proc. Natl. Acad.
Sci. (USA) 99, 15410 (2002).
[6] H.-Ph. Lerch, R. Rigler, A. S. Mikhailov, Proc. Natl.
Acad. Sci. (USA) 102, 10807 (2005).
[7] S. Ramaswamy, J. Toner, J. Prost, Phys. Rev. Lett. 84,
3494 (2000).
[8] P. Lenz, J.-F. Joanny, F. Jülicher, J. Prost, Phys. Rev.
Lett. 91, 108104 (2003).
[9] H.-Y. Chen, Phys. Rev. Lett. 92, 168101 (2004).
[10] A. Goldbeter, Biochemical Oscillations and Cellular
Rhythms (Cambridge University Press, Cambridge 1996).
[11] P. Stange, A. S. Mikhailov, B. Hess, J. Phys. Chem. B
102, 6273 (1998).
[12] P. Stange, A. S. Mikhailov, B. Hess, J. Phys. Chem. B
103, 6111 (1999).
[13] K. Sun, Q. Ouyang, Phys. Rev. E 64, 026111 (2001).
[14] M. Schienbein, H. Gruler, Phys. Rev. E 56, 7116 (1997).
[15] See EPAPS Document No. E-PRLTAO-99-041730
for dynamical evolutions in the 2D simula-
tions. For more information on EPAPS, see
http://www.aip.org/pubservs/epaps.html .
[16] V. Casagrande, Doctoral thesis, Technical University,
Berlin (2006),
http://opus.kobv.de/tuberlin/volltexte/2006/1273/ .
[17] D. Walgraef, Spatio-Temporal Pattern Formation
(Springer, Berlin 1997).
[18] A. M. Zhabotinsky, M. Dolnik, I. R. Epstein, J. Chem.
Phys. 103, 10306 (1995).
[19] B. Hess, A. Boiteux, J. Krüger, Adv. Enzyme Regul. 7,
mailto:togashi@phys1.med.osaka-u.ac.jp
mailto:mikhailov@fhi-berlin.mpg.de
http://www.aip.org/pubservs/epaps.html
http://opus.kobv.de/tuberlin/volltexte/2006/1273/
149 (1969).
[20] H. Hess, G. D. Bachand, Materials Today 8 (12, Suppl.
1), 22 (2005).
[21] H. R. Petty, R. G. Worth, A. L. Kindzelskii, Phys. Rev.
Lett. 84, 2754 (2000).
[22] Diffusion perpendicular to the plane is considered as di-
lution within a layer of effective thickness ≃ ldiff .
ABSTRACT
  Spatiotemporal pattern formation in a product-activated enzymic reaction at
high enzyme concentrations is investigated. Stochastic simulations show that
catalytic turnover cycles of individual enzymes can become coherent and that
complex wave patterns of molecular synchronization can develop. The analysis
based on the mean-field approximation indicates that the observed patterns
result from the presence of Hopf and wave bifurcations in the considered
system.

<|endoftext|><|startoftext|>
Introduction. We are interested in designing Lie group numerical schemes
for the strong approximation of nonlinear Stratonovich stochastic differential equa-
tions of the form
yt = y0 +
Vi(yτ , τ) dW
τ . (1.1)
HereW 1, . . . ,W d are d independent scalar Wiener processes andW 0t ≡ t. We suppose
that the solution y evolves on a smooth n-dimensional submanifold M of RN with
n ≤ N and Vi : M × R+ → TM, i = 0, 1, . . . , d, are smooth vector fields which in
local coordinates are Vi =
j=1 V
i ∂yj . The flow-map ϕt : M → M of the integral
equation (1.1) is defined as the map taking the initial data y0 to the solution yt at
time t, i.e. yt = ϕt ◦ y0.
Our goal in this paper is to show how the Lie group integration methods developed
by Munthe-Kaas and co-authors can be extended to stochastic differential equations
on smooth manifolds (see Crouch and Grossman [8] and Munthe-Kaas [40]). Suppose
we know that the exact solution of a given system of stochastic differential equations
evolves on a smooth manifold M (see Malliavin [36] or Emery [14]), but we can only
find the solution pathwise numerically. How can we ensure that our approximate
numerical solution also lies in the manifold?
Suppose we are given a finite dimensional Lie group G and Lie group action Λy0
that generates transport across the manifold M from the starting point y0 ∈ M via
elements of G. Then with any given elements ξ in the Lie algebra g corresponding to
the Lie group G, we can associate the infinitesimal action λξ using the Lie group action
Λy0 . The map ξ 7→ λξ is a Lie algebra homomorphism from g to X(M), the Lie algebra
∗Maxwell Institute for Mathematical Sciences and School of Mathematical and Computer Sciences,
Heriot-Watt University, Edinburgh EH14 4AS, UK. (S.J.Malham@ma.hw.ac.uk, A.Wiese@hw.ac.uk).
(16/10/2007)
http://arxiv.org/abs/0704.0022v2
2 Malham and Wiese
of vector fields over the manifold M. Further the Lie subalgebra {λξ ∈ X(M) : ξ ∈ g}
is isomorphic to a finite dimensional Lie algebra with the same structure constants
(see Olver [42], p. 56).
Conversely, suppose we know that the Lie algebra generated by the set of govern-
ing vector fields Vi, i = 0, 1, . . . , d, on M is finite dimensional, call this XF (M). Then
we know there exists a finite dimensional Lie group G whose Lie algebra g has the
same structure constants as XF (M) relative to some basis, and there is a Lie group
action Λy0 such that Vi = λξi , i = 0, 1, . . . , d, for some ξi ∈ g (see Olver [42], p. 56 or
Kunita [30], p. 194). The choice of group and action is not unique.
In this paper we assume that there is a finite dimensional Lie group G and action
Λy0 such that our set of governing vector fields Vi, i = 0, 1, . . . , d, are each infinitesimal
Lie group actions generated by some element in g via Λy0 , i.e. Vi = λξi for some ξi ∈ g,
i = 0, 1, . . . , d. They are said to be fundamental vector fields. This means that we can
write down the set of governing vector fields Xξi for a system of stochastic differential
equations on the Lie group G that, via the Lie group action Λy0, generates the flow
governed by the set of vector fields Vi on the manifold. The vector fields Vi on M are
simply the push forward of the vector fields Xξi on G via the Lie group action Λy0 .
Typically the flow on the Lie group also needs to be computed numerically. We thus
want the approximation to remain in the Lie group so that the Lie group action takes
us back to the manifold.
To achieve this, we pull back the set of governing vector fields Xξi on G to the set
of governing vector fields vξi on g, via the exponential map ‘exp’ from g to G. Thus
the stochastic flow generated on g by the vector fields vξi generates the stochastic flow
on G generated by the Xξi . The set of governing vector fields on g are for each σ ∈ g:
vξi ◦ σ ≡
(adσ)
k ◦ ξi , (1.2)
where Bk is the kth Bernoulli number and the adjoint operator adσ is a closed operator
on g, in fact adσ ◦ ζ = [σ, ζ], the Lie bracket on g. Now the essential point is that
ξi ∈ g and so the series on the right or any truncation of it is closed in g. Hence
if we construct an approximation to our stochastic differential equation on g using
the vector fields vξi or an approximation of them achieved by truncating the series
representation, then that approximation must reside in the Lie algebra g. We can then
push the approximation in the Lie algebra forward onto the Lie group and then onto
the manifold. Provided we compute the exponential map and action appropriately, our
approximate solution lies in the manifold (to within machine accuracy). In summary,
for a given ξ ∈ g and any y0 ∈ M we have the following commutative diagram:
∗−−−−→ X(G)
(Λy0 )∗−−−−→ X(M)
exp−−−−→ G
Λy0−−−−→ M
We have implicitly separated the governing set of vector fields Vi, i = 0, 1, . . . , d,
from the driving path process w ≡ (W 1, . . . ,W d). Together they generate the unique
solution process y ∈ M to the stochastic differential equation (1.1). When there
is only one driving Wiener process (d = 1) the Itô map w 7→ y is continuous in
the topology of uniform convergence. When there are two or more driving processes
Stochastic Lie group integrators 3
(d ≥ 2) the Universal Limit Theorem tells us that the Itô map w 7→ y is continuous
in the p-variation topology, in particular for 2 ≤ p < 3 (see Lyons [32], Lyons and
Qian [33] and Malliavin [36]). A Wiener path with d ≥ 2 has finite p-variation for
p > 2. This means that from a pathwise perspective, approximations to y constructed
using successively refined approximations to w are only guaranteed to converge to
the correct solution y, if we include information about the Lévy chordal areas of
the driving path process. Note however that the L2-norm of the 2-variation of a
Wiener process is finite. In the Lie group integration procedure prescribed above we
must solve a stochastic differential system on the Lie algebra g defined by the set
of governing vector fields vξi and the driving path process w ≡ (W 1, . . . ,W d). In
light of the Universal Limit Theorem and with stepsize adaptivity in mind in future
(see Gaines and Lyons [20]), we for instance use in our examples order 1 stochastic
numerical methods—that include the Lévy chordal area—to solve for the flow on the
Lie algebra g.
We have thus explained the idea behind Munthe-Kaas methods and how they can
be generalized to the stochastic setting. The first half of this paper formalizes this
procedure.
In the second half of this paper, we consider autonomous vector fields and con-
struct stochastic Lie group integration schemes using Castell–Gaines methods. This
approach proceeds as follows. We truncate the stochastic exponential Lie series expan-
sion corresponding to the flow ϕt of the solution process y to the stochastic differential
equation (1.1). We then approximate the driving path process w ≡ (W 1, . . . ,W d) by
replacing it by a suitable nearby piecewise smooth path in the appropriate variation
topology. An approximation to the solution yt requires the exponentiation of the
approximate truncated exponential Lie series. This can be achieved by solving the
system of ordinary differential equations driven by the vector field that is the approx-
imate truncated exponential Lie series. If we use ordinary Munthe-Kaas methods as
the underlying ordinary differential integrator the Castell–Gaines method becomes a
stochastic Lie group integrator.
Further, based on the Castell–Gaines approach we then present uniformly accurate
exponential Lie series integrators that are globally more accurate than their stochastic
Taylor counterpart schemes (these are investigated in detail in Lord, Malham and
Wiese [31] for linear stochastic differential equations). They require the assumption
that a sufficiently accurate underlying ordinary differential integrator is used; that
integrator could for example be an ordinary Lie group Munthe-Kaas method. In
the case of two driving Wiener processes we derive the order 1/2, and in the case
of one driving Wiener process the order 1 uniformly accurate exponential Lie series
integrators. As a consequence we confirm the asymptotic efficiency properties for
both these schemes proved by Castell and Gaines [8] (see Newton [41] for more details
on the concept of asymptotic efficiency). We also present in the case of one driving
Wiener process a new order 3/2 uniformly accurate exponential Lie series integrator
(also see Lord, Malham and Wiese [31]).
We present two physical applications that demonstrate the advantage of using
stochastic Munthe-Kaas methods. First we consider a free rigid body which for ex-
ample could model the dynamics of a satellite. We suppose that it is perturbed by
two independent multiplicative stochastic noise processes. The governing vector fields
are non-commutative and the corresponding exact stochastic flow evolves on the unit
sphere. We show that the stochastic Munthe-Kaas method, with an order 1 stochastic
Taylor integrator used to progress along the corresponding Lie algebra, preserves the
4 Malham and Wiese
approximate solution in the unit sphere manifold to within machine error. However
when an order 1 stochastic Taylor integrator is used directly, the solution leaves the
unit sphere. The contrast between these two methods is more emphatically demon-
strated in our second application. Here we consider an autonomous underwater vehicle
that is also perturbed by two independent multiplicative stochastic noise processes.
The exact stochastic flow evolves on the manifold which is the dual of the Euclidean
Lie algebra se(3); two independent Casimirs are conserved by the exact flow. Again
the stochastic Munthe-Kass method preserves the Casimirs to within machine error.
However the order 1 stochastic Taylor integrator is not only unstable for large step-
sizes, but the approximation drifts off the manifold and makes a dramatic excursion
off to infinity in the embedding space R6.
Preserving the approximate flow on the manifold of the exact dynamics may be a
required property for physical or financial systems driven by smooth or rough paths—
for general references see Iserles, Munthe-Kaas, Nørsett and Zanna [25], Hairer, Lubich
and Wanner [22], Elworthy [13], Lyons and Qian [33] and Milstein and Tretyakov [38].
Stochastic Lie group integrators in the form of Magnus integrators for linear stochastic
differential equations were investigated by Burrage and Burrage [5]. They were also
used in the guise of Möbius schemes (see Schiff and Shnider [43]) to solve stochastic
Riccati equations by Lord, Malham and Wiese [31] where they outperformed direct
stochastic Taylor methods. Further applications where they might be applied include:
backward stochastic Riccati equations arising in optimal stochastic linear-quadratic
control (Kohlmann and Tang [28]); jump diffusion processes on matrix Lie groups
for Bayesian inference (Srivastava, Miller and Grenander [44]); fractional Brownian
motions on Lie groups (Baudoin and Coutin [3]) and stochastic dynamics triggered
by DNA damage (Chickarmane, Ray, Sauro and Nadim [10]).
Our paper is outlined as follows. In Section 2 we present the basic geometric setup,
sans stochasticity. In particular we present a generalized right translation vector field
on a Lie group that forms the basis of our subsequent transformation from the Lie
group to the manifold. Using a Lie group action, this vector field pushes forward
to an infinitesimal Lie group action vector field that generates a flow on the smooth
manifold. In Section 3 we specialize to the case of a matrix Lie group and using
the exponential map, derive the pullback of the generalized right translation vector
field on the Lie group to the corresponding vector field on the Lie algebra. To help
give some context to our overall scheme, we provide in Section 4 illustrative examples
of manifolds and natural choices for associated Lie groups and actions that generate
flows on those manifolds. Then in Section 5 we show how a flow on a smooth manifold
corresponding to a stochastic differential equation can be generated by a stochastic
flow on a Lie algebra via a Lie algebra action. We explicitly present stochastic Munthe-
Kaas Lie group integration methods in Section 6. We start the second half of our
paper by reviewing the exponential Lie series for stochastic differential equations in
Section 7. We show in Section 8 how to construct geometric stochastic Castell–Gaines
numerical methods. In particular we also present uniformly accurate exponential Lie
series numerical schemes that not only can be used as geometric stochastic integrators,
but also are always more accurate than stochastic Taylor numerical schemes of the
corresponding order. In Section 9 we present our concrete numerical examples. Finally
in Section 10 we conclude and present some further future applications and directions.
2. Lie group actions. SupposeM is a smooth finite n-dimensional submanifold
of RN with n ≤ N . We use X(M) to denote the Lie algebra of vector fields on the
manifold M, equipped with the Lie–Jacobi bracket [U, V ] ≡ U · ∇V − V · ∇U , for all
Stochastic Lie group integrators 5
U, V ∈ X(M). Let G denote a finite dimensional Lie group.
Definition 2.1 (Lie group action). A left Lie group action of a Lie group G on a
manifold M is a smooth map Λ: G ×M → M satisfying for all y ∈ M and R,S ∈ G:
(1) Λ(id, y) = y; (2) Λ(R,Λ(S, y)) = Λ(RS, y). We denote Λy ◦ S ≡ Λ(S, y).
Hereafter we suppose y0 ∈ M is fixed and focus on the action map Λy0 : G→M.
We assume that the Lie group action Λ is transitive, i.e. transport across the manifold
from any point y0 ∈ M to any other point y ∈ M can always be achieved via a group
element S ∈ G with y = Λy0 ◦ S (Marsden and Ratiu [37], p. 310).
We define the Lie algebra g associated with the Lie group G to be the vector space
of all right invariant vector fields on G. By standard construction this is isomorphic
to the tangent space to G at the identity id ≡ idG (see Olver [42], p. 48 or Marsden
and Ratiu [37], p. 269).
Definition 2.2 (Generalized right translation vector field). Suppose we are
given a smooth map ξ : M→g. With each such map ξ we associate a vector field
Xξ : G → X(G) defined as follows
Xξ ◦ S ≡ ∂τ exp
τ ξ(Λy0 ◦ S)
for S ∈ G, where ‘exp’ is the usual local diffeomorphism exp: g → G from a neigh-
bourhood of the zero element o ∈ g to a neighbourhood of id ∈ G.
Definition 2.3 (Infinitesimal Lie group action). We associate with each vector
field Xξ : G→X(G) a vector field λξ : M→X(M) as the push forward of Xξ from G to
M by Λy0, i.e. λξ ≡
Xξ, so that if S ∈ G and y = Λy0 ◦ S ∈ M, then
λξ ◦ y ≡ ∂τΛy0 ◦ γ(τ)|τ=0 ,
where γ(t) ∈ G, γ(0) = S and ∂τγ(τ) = Xξ ◦ γ(τ) (the flow generated on G by the
vector field Xξ starting at S ∈ G). Naturally, as a vector field λξ is linear, and also
λξ ◦ y ≡ LXξ ◦ Λy0 ◦ S ,
the Lie derivative of Λy0 along Xξ at S ∈ G.
Remarks.
1. The map Λ(S) : M→M defined by y 7→ Λ(S) ◦ y ≡ Λy ◦ S represents a
flow on M. Hence if y = Λ(S) ◦ y0, the push forward of λξ by Λ(S) is given by
λξ ≡ λAdSξ (Marsden and Ratiu [37], p. 317).
2. We define the isotropy subgroup at y0 ∈ M by Gy0 ≡ {S ∈ G : Λy0◦S = y0}; it
is a closed subgroup of G (see Helgason [23], p. 121 or Warner [48], p. 123). We define
the global isotropy subgroup by GM ≡ ∩y0∈MGy0 ≡ {S ∈ G : Λy0 ◦ S = y0, ∀y0 ∈ M};
it is a normal subgroup of G (see Olver [42], p. 38).
3. A Lie group action is said to be is effective/faithful if the map S 7→ Λ(S) from
G to Diff(M), the group of diffeomorphisms on M, is one-to-one. This is equivalent
to the condition that different group elements have different actions, i.e. GM ≡ {idG}.
A Lie group action is said to be free if Gy0 = {idG} for all y0 ∈ M, i.e. Λy0 is a
diffeomorphism from G to M. For more details see Marsden and Ratiu [37], p. 310
and Olver [42], p. 38.
4. The map γ : G/Gy0→M defined by γ : S · Gy0 7→ Λy0 ◦ S is a diffeomorphism,
i.e. M ∼= G/Gy0 for any y0 ∈ M (a manifold M with a Lie group action Λ: G×M→M
defined over it is thus diffeomorphic to a homogeneous manifold ; see Warner [48],
p. 123 or Olver [42], p. 40). Further, the induced action of G/GM on M is effective.
Hence if Λ is not an effective action of G, we can replace it (without loss of generality)
by the induced action of G/GM (see Olver [42], p. 38).
6 Malham and Wiese
5. Our definition for the generalized right translation vector field Xξ on G is
motivated by the standard right translation vector field used to identify g, the vector
space of right invariant vector fields on G, with TidG, the tangent space to G at the
identity. When ξ ∈ g is constant, Xξ ∈ X(G) is right invariant and a Lie bracket
on TidG can be defined via right extension by the corresponding Lie–Jacobi bracket
for the vector fields Xξ on X(G). Unless ξ ∈ g is constant, Xξ is not in general
right invariant. For further details see Varadarajan [47], Olver [42], or Marsden and
Ratiu [37].
6. The infinitesimal generator map ξ 7→ λξ from g to X(M) is a Lie algebra
homomorphism. If we identify g as the vector space of left invariant vector fields on G
this map becomes an anti-homomorphism. The Lie–Jacobi bracket as defined above
gives the right (rather than left) Lie algebra stucture over the group of diffeomorphisms
on M. If in addition we take the Lie–Jacobi bracket to be minus that defined above—
associated with the left Lie algebra structure—then the infinitesimal generator map
becomes a homomorphism again. See for example Marsden and Ratiu [37], p. 324 or
Munthe-Kaas [40].
7. The image of g under the infinitesimal generator map ξ 7→ λξ forms a finite
dimensional Lie algebra of vector fields on M which is isomorphic to the Lie algebra of
the effectively acting quotient group G/GM (see Olver [42], p. 56). Thus the tangent
space to M at any point is g and M inherents a connection from G/GM. Connections
are necessary to define martingales on manifolds, but not for defining semimartingales
(our focus here); see Malliavin [36] and Emery [14].
8. A comprehensive study of the systematic construction of symmetry Lie groups
from given vector fields can be found in Olver [42].
9. We assumed above that the vector fieldsXξ and λξ are autonomous. However
all results in this and subsequent sections up to Section 7 can be straightforwardly
extended to non-autonomous vector fields generated by ξ : M × R→g with (y, t) 7→
ξ(y, t) for all y ∈ M and t ∈ R.
10. For full generality we want to suspend reference to embedding spaces as far as
possible. However in subsequent sections to be concise we will more explicitly reclaim
this context.
3. Pull back to the Lie algebra. For ease of presentation, we will assume in
this section that G is a matrix Lie group. Recall that the exponential map exp: g → G
is a local diffeomorphism from a neighbourhood of o ∈ g to a neighbourhood of id ∈ G.
Let vξ : g→g be the pull back of the vector field Xξ : G→X(G) from G to g via the
exponential mapping exp: g→G, i.e. vξ ◦ σ ≡ exp∗Xξ ◦ σ. If σ ∈ g then
vξ ◦ σ = dexp−1σ ◦ ξ
Λy0 ◦ expσ
. (3.1)
Here dexp−1σ : g→g is the inverse of the right-trivialized tangent map of the exponential
dexpσ : g→g defined as follows. If β(τ) is a curve in g such that β(0) = σ and
β′(0) = η ∈ g then dexp: g× g→g is the local smooth map (Varadarajan [47], p. 108)
dexpσ ◦ η ≡ ∂τ expβ(τ)|τ=0 exp(−σ)
exp(adσ)− id
◦ η .
Note that as a tangent map dexpσ : g→g is linear. The inverse operator dexp
σ is the
operator series (1.2) generated by considering the reciprocal of dexpσ.
Stochastic Lie group integrators 7
To show that (3.1) is true, if exp: g→G with σ 7→ S = expσ, and β(τ) ∈ g with
β(0) = σ and ∂τβ(τ) = vξ ◦ β(τ), then:
exp∗ vξ ◦ S = ∂τ expβ(τ)|τ=0
dexpσ ◦ vξ ◦ σ
exp(σ)
≡ Xξ ◦ S .
Since ‘exp’ is a diffeomorphism in a neighbourhood of o ∈ g, this push forward calcu-
lation establishes the pull back (3.1) for all σ ∈ g in that neighbourhood.
4. Illustrative examples. Suppose the vector field V : M× R→X(M) gener-
ates a flow solution yt ∈ M starting from y0 ∈ M. Then assume there exists a:
1. Lie group G with corresponding Lie algebra g;
2. Lie group action Λy0 : G→M for which a starting point y0 ∈ M is fixed;
3. Vector field λξ : M× R→X(M) such that: V ≡ λξ, i.e. V is a fundamental
vector field corresponding to the action Λy0 .
Let us suppose G is a matrix Lie group (or can be embedded into a matrix Lie
group, for example the Euclidean group SE(3) is naturally embedded into the special
linear group SL(4;R)). We have for all S ∈ G and t ∈ R,
Xξ(S, t) ≡ ξ
Λy0(S), t
S . (4.1)
If V = λξ for some ξ : M→g, some Lie group G and corresponding action Λy0 , then
the flow generated by Xξ on G drives the flow generated by V on M. In each of the
examples below, given the manifold M, we present a natural Lie group and action
associated with the manifold structure, and identify vector fields which generate flows
on the manifold via the Lie group.
Stiefel manifold Vn,k. Suppose M = Vn,k ≡ {y ∈ Rn×k : yTy = I}. Take
G = SO(n), the special orthogonal group, and Λy0(S) ≡ Sy0, the action of left
multiplication. The corresponding Lie algebra g = so(n). Then by direct calculation
λξ(y) = ξ(y, t) y. Hence if the given vector field V (y, t) = ξ(y, t) y, then the push
forward of the flow generated by Xξ(S, t) on G in (4.1) is the flow generated by V on
M. Note that the unit sphere S2 ∼= V3,1, i.e. S2 is just a particular Stiefel manifold.
In Section 9 as an application, we consider rigid body dynamics evolving on S2.
Isospectral manifold Sn. Suppose M = Sn = {y ∈ Rn×n : yT = y}, the set of
n× n real symmetric matrices. Take G = O(n), the orthogonal group and Λy0(S) ≡
T, which is an isospectral action (Munthe-Kaas [40]). The corresponding Lie
algebra is g = so(n). Again, by direct calculation λξ(y) = ξ(y, t) y − y ξ(y, t). Hence
if the given vector field V (y, t) = ξ(y, t) y−y ξ(y, t), then the push forward of the flow
generated by Xξ(S, t) on G in (4.1) is the flow generated by V on M.
Dual of the Euclidean algebra se(3)∗. Suppose M = se(3)∗ ∼= R3, the dual
of the Euclidean algebra se(3) of the Euclidean group SE(3) =
(s, ρ) ∈ SE(3) : s ∈
SO(3), ρ ∈ R3
. Take G = SE(3) so g = se(3) and Λ ≡ Ad∗ : G × g∗→g∗, the
coadjoint action of G on g∗. Then by direct calculation λξ(y) = −ad∗ξ(y). Since
λξ(y) in linear in ξ and −λξ(y) ≡ λ−ξ(y), it follows that if V (y) = ad∗ξ(y), then
the push forward of the flow generated by X−ξ(S, t) = −ξ
Λy0(S), t
S on G is the
flow generated by V on M. For more details see Section 9 where we investigate the
dynamics of an autonomous underwater vehicle evolving on se(3)∗.
8 Malham and Wiese
Grassmannian manifold Gr(k, n). The Grassmannian manifold M = Gr(k, n)
is the space of k-dimensional subspaces of Rn. Take G = GL(n), the general linear
matrix group, where if S ∈ GL(n), we identify
where the block matrices α, β, γ and δ are sizes k × k, k × (n − k), (n− k) × k and
(n − k) × (n − k), respectively (see Schiff and Shnider [43]; Munthe-Kaas [40]). We
choose the action of GL(n) on Gr(k, n) to be the generalized Möbius transformation
Λy0(S) = (αy0 + β)(γy0 + δ)
−1. Hence if
ξ(t) =
a(t) b(t)
c(t) d(t)
then direct calculation reveals that λξ(y) = a(t)y+ b(t)− yc(t)y− yd(t). Hence if the
given vector field V (y) = a(t)y + b(t) − yc(t)y − yd(t), then the push forward of the
flow generated by Xξ(S, t) = ξ(t)S on G is the flow generated by V on Gr(k, n).
5. Stochastic Lie group integration. We show that if a Lie group action
Λ: G ×M→M exists, then for y0 ∈ M fixed, the Lie algebra action Λy0 ◦ exp: g→M
carries a flow on g to a flow on M.
Theorem 5.1. Suppose there exists a Lie group action Λ: G ×M→M. Then if
there exists a process σ ∈ g and a stopping time T∗ such that on [0, T∗), σ satisfies
the Stratonovich stochastic differential equation
vξi ◦ στ dW iτ , (5.1)
then the process y = Λy0 ◦ expσ ∈ M satisfies the Stratonovich stochastic differential
equation on [0, T∗):
yt = y0 +
λξi ◦ yτ dW iτ . (5.2)
Proof. Using Itô’s lemma, if σt ∈ g satisfies (5.1) then Λy0 ◦ expσt satisfies
Λy0 ◦ expσt = Λy0 ◦ exp o+
Lvξi ◦ Λy0 ◦ expστ dW
Now recall that for each i = 0, 1, . . . , d, Xξi is the push forward of vξi from g to G via
the exponential map, and that λξi is the push forward of Xξi from G to M via Λy0
and so the Lie derivative
Lvξi ◦ Λy0 ◦ expσt ≡ λξi ◦ yt .
Then since yt = Λy0 ◦ expσt, we conclude that y ∈ M is a process satisfying the
stochastic differential equation (5.2).
Corollary 5.2. Suppose that for each i = 0, 1, . . . , d there exists ξi : M→g such
that the vector field Vi : M→X(M) and λξi : M→X(M) can be identified, i.e.
Vi ≡ λξi . (5.3)
Stochastic Lie group integrators 9
Then the push forward by ‘Λy0◦exp’ of the flow on the Lie algebra manifold g generated
by the stochastic differential equation (5.1) is the flow on the smooth manifold M
generated by the stochastic differential equation (5.2), whose solution can be expressed
in the form yt = Λy0 ◦ expσt.
Remark. If the action is free then ‘Λy0 ◦ exp’ is a diffeomorphism from a neigh-
bourhood of o ∈ g to a neighbourhood of y0 ∈ M.
6. Stochastic Munthe-Kaas methods. Assuming that the vector fields in our
original stochastic differential equation (1.1) are fundamental and satisfy (5.3), then
stochastic Munthe-Kaas methods are constructed as follows:
1. Subdivide the global interval of integration [0, T ] into subintervals [tn, tn+1].
2. Starting with t0 = 0, repeat the next two steps over successive intervals
[tn, tn+1] until tn+1 = T .
3. Compute an approximate solution σ̂tn,tn+1 to (5.1) across [tn, tn+1] using a
stochastic Taylor, stochastic Runge–Kutta or Castell–Gaines method.
4. Compute the approximate solution ytn+1 ≈ Λytn ◦ exp σ̂tn,tn+1 .
Note that by construction σ̂tn,tn+1 ∈ g because the stochastic differential equa-
tion (5.1) (or any stochastic Taylor or other sensible approximation) evolves the so-
lution locally on the Lie algebra g via the vector fields vξi : g→g. Suitable methods
for approximating the exponential map to ensure it maps g to G appropriately can be
found in Iserles and Zanna [26]. Then by construction ytn+1 ∈ M.
For example, with two Wiener processes and autonomous vector fields vξi ◦ σ, an
order 1 stochastic Taylor Munthe-Kaas method is based on
σ̂tn,tn+1 =
J0vξ0 +J1vξ1 +J2vξ2 +
+J12vξ1vξ2 +J21vξ2vξ1 +
◦o , (6.1)
evaluated at the zero element o ∈ g. Typically ‘dexp−1σ ’ is truncated to only include
the necessary low order terms to maintain the order of the numerical scheme.
Remark. It is natural to invoke Ado’s Theorem (see for example Olver [42] p. 54):
any finite dimensional Lie algebra is isomorphic to a Lie subalgebra of gl(n) (the
general linear algebra) for some n ∈ N. However as Munthe-Kaas [40] points out,
directly using a matrix representation for the given Lie group might not lead to the
optimal computational implementation (other data structures might do so).
7. Exponential Lie series. The stochastic Taylor series is known in different
contexts as the Neumann series, Peano–Baker series or Feynman–Dyson path ordered
exponential. If the vector fields in the stochastic differential equation (1.1) are au-
tonomous (which we assume henceforth), i.e. for all i = 0, 1, . . . , d, Vi = Vi(y) only,
then the stochastic Taylor series for the flow is
Jα1···αm(t)Vα1 · · ·Vαm .
Here Pm is the set of all combinations of multi-indices α = (α1, . . . , αm) of length m
with αi ∈ {0, 1, . . . , d} and
Jα1···αm(t) ≡
· · ·
∫ τm−1
dWα1τm · · · dW
are multiple Stratonovich integrals.
10 Malham and Wiese
The logarithm of ϕt is the exponential Lie series, Magnus expansion (Magnus [34])
or Chen–Strichartz formula (Chen [9], Strichartz [45]). In other words we can express
the flow map in the form ϕt = expψt, where
Ji(t)Vi +
j>i=0
(Jij − Jji)(t)[Vi, Vj ] + · · ·
is the exponential Lie series for our system, and [· , ·] is the Lie–Jacobi bracket on
X(M). See Yamato [49], Kunita [29], Ben Arous [1] and Castell [7] for the derivation
and convergence of the exponential Lie series expansion in the stochastic context;
Strichartz [45] for the full explicit expansion; Sussmann [46] for a related product
expansion and Lyons [32] for extensions to rough paths.
Let us denote the truncated exponential Lie series by
ψ̂t =
Jα cα , (7.1)
where Qm denotes the finite set of multi-indices α for which ‖Jα‖L2 is of order up to
and including tm, where m = 1/2, 1, 3/2, . . .. The terms cα are linear combinations
of finitely many (length α) products of the smooth vector fields Vi, i = 0, 1, . . . , d.
The following asymptotic convergence result can be established along the lines of the
proof for linear stochastic differential equations in Lord, Malham and Wiese [31]; we
provide a proof in Appendix A.
Theorem 7.1. Assume the vector fields Vi have 2m+1 uniformly bounded deriva-
tives, for all i = 0, 1, . . . , d. Then for t ≤ 1, the flow exp ψ̂t ◦ y0 is square-integrable,
where ψ̂t is the truncated Lie series (7.1). Further, if y is the solution of the stochastic
differential equation (1.1), there exists a constant C
m, ‖y0‖2
such that
∥yt − exp ψ̂t ◦ y0
m, ‖y0‖2
tm+1/2 . (7.2)
8. Geometric Castell–Gaines methods. Consider the truncated exponential
Lie series ψ̂tn,tn+1 across the interval [tn, tn+1]. We approximate higher order multiple
Stratonovich integrals across each time-step by their expectations conditioned on the
increments of the Wiener processes on suitable subdivisions (Gaines and Lyons [20]).
An approximation to the solution of the stochastic differential equation (1.1) across
the interval [tn, tn+1] is given by the flow generated by the truncated and conditioned
exponential Lie series ψ̂tn,tn+1 via
ytn+1 ≈ exp
ψ̂tn,tn+1
◦ ytn .
Hence the solution to the stochastic differential equation (1.1) can be approximately
computed by solving the ordinary differential system (see Castell and Gaines [8];
Misawa [39])
u′(τ) = ψ̂tn,tn+1 ◦ u(τ) (8.1)
across the interval τ ∈ [0, 1]. Then if u(0) = ytn we will get u(1) ≈ ytn+1. We
must choose a sufficiently accurate ordinary differential integrator to solve (8.1)—we
implicitly assume this henceforth.
Stochastic Lie group integrators 11
The set of governing vector fields Vi, i = 0, 1, . . . , d, prescribes a map from the
driving path process w ≡ (W 1, . . . ,W d) to the unique solution process y ∈ M to the
stochastic differential equation (1.1). The map w 7→ y is called the Itô map. Recall
that we assume the vector fields are smooth. When there is only one driving Wiener
process (d = 1) the Itô map is continuous in the topology of uniform convergence
(Theorem 1.1.1. in Lyons and Qian [33]). When there are two or more driving pro-
cesses (d ≥ 2) the Universal Limit Theorem (Theorem 6.2.2. in Lyons and Qian [33])
tells us that the Itô map is continuous in the p-variation topology, in particular for
2 ≤ p < 3. A Wiener path with d ≥ 2 has p-variation with p > 2, and the p-variation
metric in this case includes information about the Lévy chordal areas of the path
(Lyons [32]). Hence we must choose suitable piecewise smooth approximations to the
driving path process w. The following result follows from the corresponding result
for ordinary differential equations in Hairer, Lubich and Wanner [22] (p. 112) as well
as directly from Chapter VIII in Malliavin [36] on the Transfer Principle (see also
Emery [15]).
Lemma 8.1. A necessary and sufficient condition for the solution to the stochastic
differential equation (1.1) to evolve on a smooth n-dimensional submanifold M of
RN (n ≤ N) up to a stopping time T∗ is that Vi(y, t) ∈ TyM for all y ∈ M, i =
0, 1, . . . , d.
Hence the stochastic Taylor expansion for the flow ϕt is a diffeomorphism on M.
However a truncated version of the stochastic Taylor expansion for the flow ϕ̂t will not
in general keep you on the manifold, i.e. if y0 ∈ M then ϕ̂t ◦ y0 need not necessarily
lie in M. On the other hand, the exponential Lie series ψt, or any truncation ψ̂t of
it, lies in X(M). By Lemma 8.1 this is a necessary and sufficient condition for the
corresponding flow-map exp ψ̂t to be a diffeomorphism on M. Hence if u(0) = ytn ∈
M, then ytn+1 ≈ u(1) ∈ M. When solving the ordinary differential equation (8.1),
classical geometric integration methods, for example Lie group integrators such as
Runge–Kutta Munthe-Kaas methods, over the interval τ ∈ [0, 1] will numerically
ensure ytn+1 stays in M. Additionally, as the following result reveals, numerical
methods constructed using the Castell–Gaines Lie series approach can also be more
accurate (a proof is provided in Appendix B). We define the strong global error at
time T associated with an approximate solution ŷT as E ≡ ‖yT − ŷT ‖L2 .
Theorem 8.2. In the case of two independent Wiener processes and under the
assumptions of Theorem 7.1, for any initial condition y0 ∈ M and a sufficiently
small fixed stepsize h = tn+1 − tn, the order 1/2 Lie series integrator is globally more
accurate in L2 than the order 1/2 stochastic Taylor integrator. In addition, in the
case of one Wiener process, the order 1 and 3/2 uniformly accurate exponential Lie
series integrators generated by ψ̂
tn,tn+1
= J0V0 + J1V1 +
[V1, [V1, V0]]
(3/2)
tn,tn+1
= J0V0 + J1V1 +
(J01 − J10)[V0, V1] + h
[V1, [V1, V0]]
respectively, are globally more accurate in L2 than their corresponding stochastic Tay-
lor integrators. In other words, if E lsm denotes the global error of the exponential Lie
series integrators of order m above, and Estm is the global error of the stochastic Taylor
integrators of the corresponding order, then E lsm ≤ Estm for m = 1/2, 1, 3/2.
Remarks.
1. The result for ψ̂(3/2) is new. That the order-1/2 Lie series integrator (for two
Wiener processes) and the order 1 integrator generated by ψ̂(1) are uniformly more
accurate confirms the asymptotically efficient properties of these schemes proved by
12 Malham and Wiese
Castell and Gaines [8]. The proof follows along the lines of an analogous result for
linear stochastic systems considered in Lord, Malham and Wiese [31].
2. Consider the order 1/2 exponential Lie series with no vector field commu-
tations. Solving the ordinary differential equation (8.1) using an (ordinary) Euler
Munthe-Kaas method and approximating dexp
σ ≈ id is equivalent to the order 1/2
stochastic Taylor Munthe-Kaas method (for the same Lie group and action).
9. Numerical examples.
9.1. Rigid body. We consider the dynamics of a rigid body such as a satellite
(see Marsden and Ratiu [37]). We will suppose that the rigid body is perturbed by two
independent multiplicative stochastic processes W 1 and W 2 with the corresponding
vector fields Vi(y) ≡ ξi(y) y, for i = 0, 1, 2, with ξi ∈ so(3). If we normalize the initial
data y0 so that |y0| = 1 then the dynamics evolves on M = S2. We naturally suppose
G = SO(3), and Λy0(S) ≡ Sy0 so that λξi(y) = ξi(y) y, and we can pull back the
flow generated by V on M to the flow on G generated by Xξi(S, t) = ξi
Λy0(S)
i = 0, 1, 2. We use the following matrix representation for the ξi(y) ∈ so(3):
ξi(y) =
0 −y3/αi,3 y2/αi,2
y3/αi,3 0 −y1/αi,1
−y2/αi,2 y1/αi,1 0
where the constants αi,j for j = 1, 2, 3 are chosen so that the vector fields Vi and
matrices ξi do not commute for i = 0, 1, 2: α0,1 = 3, α0,2 = 1, α0,3 = 2, α1,1 = 1,
α1,2 = 1/2, α1,3 = 3/2, α2,1 = 1/4, α2,2 = 1, α2,3 = 1/2. The vector fields Vi satisfy
the conditions of Theorem 7.1 since the manifold is compact in this case.
We will numerically solve (1.1) using three different order 1 methods: stochastic
Taylor, stochastic Taylor Munthe-Kaas based on (6.1) and Castell–Gaines (a stan-
dard non-geometric Runge–Kutta method is used to solve the ordinary differential
equation (8.1)). The vector field compositions ViVj needed for the stochastic Taylor
and Castell–Gaines methods are readily computed. For the Munthe-Kaas method we
note that we have vξi ◦ o = ξi(y0) and
vξivξj ◦ o = Â(y0, y0;αi, αj)− 12 [ξi(y0), ξj(y0)] .
Here o ∈ so(3) is the zero element on the Lie algebra, and for all y, z ∈ R3 we define
A(y, z;α, β) ≡
− y3z2
− y1z3
− y2z1
and ˆ : R3→so(3) denotes the vector space isomorphism σ 7→ σ̂ where
0 −σ3 σ2
σ3 0 −σ1
−σ2 σ1 0
Note that ŷ z ≡ y ∧ z (see Marsden and Ratiu [37]). Note also since σ ∈ so(3),
expσ ∈ SO(3) can be conveniently and cheaply computed using Rodrigues’ formula
(see Marsden and Ratiu [37] or Iserles et al. [25]).
In Figure 9.1 we show the distance from the manifold S2 of each the three approx-
imations; we start with initial data y0 = (
2, 0)T. The stochastic Taylor Munthe-
Kaas method can be seen to preserve the solution in the unit sphere to within machine
Stochastic Lie group integrators 13
0 1 2 3 4 5 6 7 8 9 10
Stochastic Taylor
Castell−Gaines
Munthe−Kaas
Fig. 9.1. Rigid body: We show the log-distance of the approximate solution to the unit sphere
as a function of time for each of the methods. Below we show the approximate solutions as a function
of time for the stochastic Taylor (blue) and Munthe-Kaas methods (magenta). The trajectory starts
at the top right and eventually drifts over the left horizon.
14 Malham and Wiese
error. We also see that the stochastic Taylor method clearly drifts off the sphere as
the integration time progresses, as does the non-geometric Castell-Gaines method—
which does however remain markedly closer to the manifold than the stochastic Taylor
scheme.
0 0.2 0.4 0.6 0.8 1 1.2 1.4
Stochastic Taylor
Castell−Gaines
Munthe−Kaas
−3.2 −3.1 −3 −2.9 −2.8 −2.7 −2.6 −2.5 −2.4 −2.3 −2.2
(stepsize)
Number of sampled paths=100
Stochastic Taylor
Castell−Gaines
Munthe−Kaas
Fig. 9.2. Autonomous underwater vehicle: We show the log-distance of the approximate solution
to the two Casimirs C1 = π ·p (dotted line) and C2 = |p|
2 (solid line) as a function of time for each
of the methods. Below, we also show the global error as a function of stepsize.
Stochastic Lie group integrators 15
9.2. Autonomous underwater vehicle. The dynamics of an ellipsoidal au-
tonomous underwater vehicle is prescribed by the state y = (π, p) ∈ se(3)∗ where
π ∈ so(3)∗ is its angular momentum and p ∈ (R3)∗ its linear momentum (see Holmes,
Jenkins and Leonard [24], Egeland, Dalsmo and Sørdalen [12] and Marsden and
Ratiu [37]). We suppose that the vehicle is perturbed by two independent multi-
plicative stochastic processes. The governing vector fields are for i = 0, 1, 2:
Vi(y) = ad
◦ y .
Here ξi(y) =
ωi(y), ui(y)
∈ se(3) where ωi(y) = I−1i π and ui(y) = M
i p are the
angular and linear velocity, and Ii = diag(αi,1, αi,2, αi,3) andMi = diag(βi,1, βi,2, βi,3)
are the constant moment of inertia and mass matrices, respectively. Explicitly for
ξ ∈ se(3) we have
ad∗ξ ◦ y ≡ (π ∧ ω + p ∧ u, p ∧ ω) .
The system of vector fields Vi, i = 0, 1, 2 represents the Lie–Poisson dynamics on
M = se(3)∗ (Marsden and Ratiu [37]). There are two independent Casimir functions
Ck : se(3)
∗→R, k = 1, 2, namely C1 = π · p and C2 = |p|2; these are conserved by the
flow on se(3)∗. Note that the Hamiltonian, i.e. total kinetic energy 1
(π · ω + p · u),
is also exactly conserved (and helpful for establishing the sufficiency conditions in
Theorem 7.1), but that is not our focus here.
If G = SE(3) ∼= SO(3) × R3, then the coadjoint action of SE(3) on se(3)∗,
: SE(3) × se(3)∗→se(3)∗ is defined for all S = (s, ρ) ∈ SE(3), where s ∈ SO(3)
and ρ ∈ R3, and y ∈ se(3)∗ by: Λy ◦ S = Ad∗S−1 ◦ y ≡
sπ + ρ ∧ (sp), sp
. The
corresponding infinitesimal action λ : se(3)× se(3)∗→se(3)∗ for all ξ ∈ se(3) and y ∈
se(3)∗ is given by (see Marsden and Ratiu [37], p. 477)
λξ ◦ y = −ad∗ξ ◦ y .
Since ad∗ξ(y) = −λξ(y) = λ−ξ(y) the governing set of vector fields on se(3)∗ are
Vi(y) = λ−ξi ◦ y .
We can now pull back this flow on se(3)∗ to a flow on SE(3) via Λy0 . The correspond-
ing flow on SE(3) is generated by the governing set of vector fields for i = 0, 1, 2:
X−ξi ◦ S = −
ωi(y) ∧ s, ωi(y) ∧ ρ+ ui(y)
with y = Λy0(S).
To aid implementation note that SE(3) =
(s, ρ) ∈ SE(3) : s ∈ SO(3), ρ ∈ R3
embeds into SL(4;R) via the map
S = (s, ρ) 7→
where O is the three-vector of zeros. Also se(3) is isomorphic to a Lie subalgebra of
sl(4;R) with elements of the form
σ = (θ, ζ) 7→
16 Malham and Wiese
Hence the governing vector fields on SE(3) are of the form Xξi = −ξi(y)S, where
ξi(y) =
ω̂i(π) ui(p)
The governing vector fields on se(3) are vξi(σ) = −dexpσ ◦ ξi
Λy0(expσ)
. Again the
vector field compositions ViVj needed for the stochastic Taylor and Castell–Gaines
methods can be computed straightforwardly. Direct calculation also reveals that in
block matrix form
vξivξj◦o =
Â(π0, π0;αi, αj) + Â(p0, p0;βi, αj) A(π0, p0;αi, βj)
[ξi(y0), ξj(y0)] .
Here A(y, z;α, β) is defined as for the rigid body example. Note that the exponential
map exp
se(3) : se(3)→SE(3) is defined for all σ = (θ, ζ) ∈ se(3) by
se(3) σ =
so(3) θ̂ f(θ)ζ
where exp
so(3) is the exponential map from so(3) to SO(3) which can be computed
using Rodrigues’ formula and (see Bullo and Murray [4], p. 5)
f(θ) = I3×3 + (1 − cos ‖θ‖)θ̂/‖θ‖2 +
1− (sin ‖θ‖)/‖θ‖
θ̂2/‖θ‖2 .
In Figure 9.2 we show the distance from the manifold se(3)∗ of each the three ap-
proximations; in particular how far the individual trajectories stray from the Casimirs
C1 = π · p and C2 = |p|2. We start with the initial data y0 = (
2, 0, 0,
As before the stochastic Taylor Munthe-Kaas method can be seen to preserve the
Casimirs to within machine error. We also see that the stochastic Taylor method
clearly drifts off the manifold as the integration time progresses and at a particular
time depending on the Wiener path shoots off very rapidly away from the manifold.
Note also that for large stepsizes the stochastic Taylor method is unstable. However
the non-geometric Castell–Gaines and stochastic Munthe-Kaas methods still give reli-
able results in that regime. Lastly, although the the stochastic Munthe-Kaas method
adheres to the manifold to within machine error, the error of the non-geometric
Castell–Gaines method is actually smaller.
10. Conclusions. We have established and implemented stochastic Lie group
integrators based on stochastic Munthe-Kaas methods and also derived geometric
Castell–Gaines methods. We have also revealed several aspects of these integrators
that require further investigation.
1. We could construct a stochastic nonlinear Magnus method by approximating
the solution to the stochastic differential equation (5.1) on the Lie algebra using Picard
iterations (see Casas and Iserles [6]).
2. We would like to develop a practical procedure for implementing ordinary
Munthe-Kaas methods for higher order Castell–Gaines integrators. We need to de-
termine the element ξ : M→g so that in (8.1) we have ψ̂ = λξ.
3. We need to determine the properties of the local and global errors for the
stochastic Munthe-Kaas methods. Also a thorough investigation of the stability prop-
erties of the stochastic Munthe-Kaas and Castell–Gaines methods is required. For
the autonomous underwater vehicle simulations they were both superior to the direct
stochastic Taylor method, especially for larger stepsizes. We also need to compare the
relative efficiency of the methods concerned, in particular to compare an optimally
efficient geometric Castell–Gaines method with the stochastic Munthe-Kaas method.
Stochastic Lie group integrators 17
4. Although we have chiefly confined ourselves to driving paths that are Wiener
processes, we can extend Munthe-Kaas and Castell–Gaines methods to rougher driv-
ing paths (Lyons and Qian [33], Friz [18], Friz and Victoir [19]). Further, what hap-
pens when we consider processes involving jumps? For example Srivastava, Miller and
Grenander [44] consider jump diffusion processes on matrix Lie groups for Bayesian
inference. Or what if we consider fractional Brownian driving paths; Baudoin and
Coutin [3] investigate fractional Brownian motions on Lie groups?
5. Schiff and Shnider [43] have used Lie group methods to derive Möbius schemes
for numerically integrating deterministic Riccati systems beyond finite time removable
singularities and numerical instabilities. They integrate a linear system of equations
on the general linear group GL(n) which corresponds to a Riccati flow on the Grass-
mannian manifold Gr(k, n) via the Möbius action map. Lord, Malham and Wiese [31]
implemented stochastic Möbius schemes and show that they can be more accurate and
cost effective than directly solving stochastic Riccati systems using stochastic Taylor
methods. We would like to investigate further their effectiveness for stochastic Ric-
cati equations arising in Kalman filtering (Kloeden and Platen [27]) and to backward
stochastic Riccati equations arising in optimal stochastic linear-quadratic control (see
for example Kohlmann and Tang [28] and Estrade and Pontier [16]).
6. Other areas of potential application of the methods we have presented in this
paper are for example: term-structure interest rate models evolving on finite dimen-
sional invariant manifolds (see Filipovic and Teichmann [17]); stochastic dynamics
triggered by DNA damage (Chickarmane, Ray, Sauro and Nadim [10]) and stochastic
symplectic integrators for which the gradient of the solution evolves on the symplectic
Lie group (see Milstein and Tretyakov [38]).
Acknowledgments. We thank Alex Dragt, Peter Friz, Anders Hansen, Terry
Lyons, Per-Christian Moan and Hans Munthe–Kaas for stimulating discussions. We
also thank the anonymous referees, whose suggestions and encouragement improved
the original manuscript significantly. SJAM would like to acknowledge the invalu-
able facilities of the Isaac Newton Institute where some of the final touches to this
manuscript were completed.
Appendix A. Proof of Theorem 7.1. We follow the proof for linear stochastic
differential equations in Lord, Malham and Wiese [31] (where further technical details
on estimates for multiple Stratonovich integrals can be found). Suppose ψ̂t ≡ ψ̂t(m)
is the truncated Lie series (7.1). First we show that exp ψ̂t ◦ y0 ∈ L2. We see that
for any number k,
)k ◦ y0 is a sum of |Qm|k terms, each of which is a k-multiple
product of terms Jα cα ◦ y0. It follows that
)k ◦ y0
‖cα ◦ y0‖
αi∈Qm
i=1,...,k
‖Jα1Jα2 · · · Jαk‖L2 . (A.1)
Note that the maximum of the norm of the compositions of vector fields cα◦y0 is taken
over a finite set. Repeated application of the product rule reveals that for i = 1, . . . , k,
each term ‘Jα1Jα2 · · · Jαk ’ in (A.1) is the sum of at most 22mk−1 Stratonovich integrals
Jβ , where for t ≤ 1, ‖Jβ‖L2 ≤ 24mk−1 tk/2. Since the right hand side of equation (A.1)
consists of |Qm|k 22mk−1 Stratonovich integrals Jβ , we conclude that,
)k ◦ y0
‖cα ◦ y0‖ · |Qm| · 26m · t1/2
18 Malham and Wiese
Hence exp ψ̂t ◦ y0 is square-integrable.
Second we prove (7.2). Let ŷt denote the stochastic Taylor series solution, trun-
cated to included terms of order up to and including tm. We have
∥yt − exp ψ̂t ◦ y0
∥yt − ŷt
∥ŷt − exp ψ̂t ◦ y0
We know yt ∈ L2—see Lemma III.2.1 in Gihman and Skorohod [21]. Note that
the assumptions there are fulfilled, since the uniform boundedness of the derivatives
implies uniform Lipschitz continuity of the vector fields by the mean value theorem,
and uniform Lipschitz continuity in turn implies a linear growth condition for the
vector fields since they are autonomous. Note that ŷt is a strong approximation to
yt up to and including terms of order t
m, with the remainder consisting of O(tm+1/2)
terms (see Proposition 5.9.1 in Kloeden and Platen [27]). It follows from the definition
of the exponential Lie series as the logarithm of the stochastic Taylor series, that the
terms of order up to and including tm in exp ψ̂t ◦ y0 correspond with ŷt; the error
consists of O(tm+1/2) terms.
Appendix B. Proof of Theorem 8.2. Our proof follows along the lines of that
for uniformly accurate Magnus integrators for linear constant coefficient systems (see
Lord, Malham & Wiese [31] and Malham and Wiese [35]). Let ϕtn,tn+1 and ϕ̂tn,tn+1
denote the exact and approximate flow-maps constructed on the interval [tn, tn+1] of
length h. We define the local flow remainder as
Rtn,tn+1 ≡ ϕtn,tn+1 − ϕ̂tn,tn+1 ,
and so the local remainder is Rtn,tn+1 ◦ ytn . Let Rls and Rst denote the local flow
remainders corresponding to the exponential Lie series and stochastic Taylor approx-
imations, respectively.
B.1. Order 1/2 integrator: two Wiener processes. For the global order 1/2
integrators we have to leading order Rls = 1
(J12 − J21)[V1, V2] and Rst = J12V1V2 +
J21V2V1. Note that we have included the terms J11V
1 and J22V
2 in the integrators.
A direct calculation reveals that
(Rst ◦ y0)TRst ◦ y0
(Rls ◦ y0)TRls ◦ y0
+ h2mUTBU +O
. (B.1)
Here m = 1/2 (for the order 1/2 integrators), U = (V1V2 ◦ y0, V2V1 ◦ y0)T ∈ R2n, and
B ∈ R2n×2n consists of n× n diagonal blocks of the form bijIn×n where
b = 1
and In×n is the n×n identity matrix. Since b is positive semi-definite, the matrix B =
b⊗In×n is positive semi-definite. Hence the order 1/2 exponential Lie series integrator
is locally more accurate than the corresponding stochastic Taylor integrator.
B.2. Order 1 integrator: one Wiener process. For the global order 1 in-
tegrators we have to leading order Rls = 1
(J01 − J10)[V0, V1] and Rst = J01V0V1 +
J10V1V0 + J111V
h2(V0V
1 + V
1 V0). The terms of order h
2 shown are significant
when we consider the global error in Section B.4 below. The estimate (B.1) also
applies in this case with m = 1 and U = (V0V1 ◦ y0, V1V0 ◦ y0, V 31 ◦ y0)T ∈ R3n; and
B ∈ R3n×3n consists of n× n diagonal blocks of the form bijIn×n where
b = 1
3 3 3
3 3 3
3 3 5
Stochastic Lie group integrators 19
Since b is positive semi-definite, the matrix B = b ⊗ In×n is positive semi-definite.
Hence the order 1 exponential Lie series integrator is locally more accurate than the
corresponding stochastic Taylor integrator.
B.3. Order 3/2 integrator: one Wiener process. The local flow remainders
are Rls = 1
J110−2J101+J011− 12h
[V1, [V1, V0]] and R
st = J011V0V
1 +J101V1V0V1+
J110V
1 V0 + J1111V
1 − 14h
2(V0V
1 + V
1 V0 +
V 41 ). The terms of order h
2 shown are
significant when we consider the global error—but for a different reason this time—see
Section B.4 below. Again, the estimate (B.1) applies in this case with m = 3/2 and
U = (V0V
1 ◦ y0, V1V0V1 ◦ y0, V 21 V0 ◦ y0, V 41 ◦ y0)T ∈ R4n; and B ∈ R4n×4n consists of
n× n diagonal blocks of the form bijIn×n where
b = 1
11 8 5 12
8 8 8 12
5 8 11 12
12 12 12 24
Again, B is positive semi-definite and the order 3/2 exponential Lie series integrator
is locally more accurate than the corresponding stochastic Taylor integrator.
B.4. Global error. Recall that we define the strong global error at time T
associated with an approximate solution ŷT as E ≡ ‖yT − ŷT ‖L2. The exact and
approximate solutions can be constructed by successively applying the exact and
approximate flow maps ϕtn,tn+1 and ϕ̂tn,tn+1 on the successive intervals [tn, tn+1] to
the initial data y0. A straightforward calculation shows for a small fixed stepsize h,
E2 = E (R ◦ y0)TR ◦ y0 , (B.2)
up to higher order terms, where R ≡
n=0 ϕtn+1,tN ◦Rtn,tn+1 ◦ϕt0,tn is the standard
accumulated local error contribution to the global error. The important conclusion is
that when we construct the global error (B.2), the terms of leading order in the local
flow remainders Rls or Rst with zero expectation lose only a half order of convergence
in this accumulation effect. Hence in the local flow remainders shown above, for the
terms of zero expectation, the local superior accuracy for the Lie series integrators
transfers to the corresponding global errors (see Lord, Malham and Wiese [31] for
more details). Terms of non-zero expectation however behave like deterministic er-
ror terms losing a whole order (in the local to global convergence); they contribute
to the global error through their expectations. Hence we include such terms of or-
der h2 in the order 3/2 integrators above and they appear as the terms subtracted
from the remainders shown. For the order 1 integrators we do not need to include
the order h2 terms in the integrator to obtain the correct mean-square convergence.
However to guarantee that the global error for the exponential Lie series integrator
is always smaller than that for the stochastic Taylor scheme, we include this term in
the integrator.
REFERENCES
[1] G. Ben Arous, Flots et series de Taylor stochastiques, Probab. Theory Related Fields, 81
(1989), pp. 29–77.
[2] F. Baudoin, An introduction to the geometry of stochastic flows, Imperial College Press, 2004.
[3] F.Baudoin and L. Coutin, Self-similarity and fractional Brownian motions on Lie groups,
arXiv:math.PR/0603199 v1, 2006.
http://arxiv.org/abs/math/0603199
20 Malham and Wiese
[4] F. Bullo and R. M. Murray, Proportional derivative (PD) control on the Euclidean group,
CDS Technical Report 95-010, 1995.
[5] K. Burrage and P. M. Burrage, High strong order methods for non-commutative stochas-
tic ordinary differential equation systems and the Magnus formula, Phys. D, 133 (1999),
pp. 34–48.
[6] F. Casas and A. Iserles, Explicit Magnus expansions for nonlinear equations, Cambridge NA
reports, 2005.
[7] F. Castell, Asymptotic expansion of stochastic flows, Probab. Theory Related Fields, 96
(1993), pp. 225–239.
[8] F. Castell and J. Gaines, An efficient approximation method for stochastic differential equa-
tions by means of the exponential Lie series, Math. Comp. Simulation, 38 (1995), pp. 13–19.
[9] K. T. Chen, Integration of paths, geometric invariants and a generalized Baker–Hausdorff
formula, Annals of Mathematics, 65(1) (1957), pp. 163–178.
[10] V. Chickarmane, A. Ray, H. M. Sauro and A. Nadim, A model for p53 dynamics triggered
by DNA damage, SIAM J. Applied Dynamical Systems, 6(1) (2007), pp.61–78.
[11] P. E. Crouch and R. Grossman, Numerical integration of ordinary differential equations on
manifolds, J. Nonlinear Sci., 3 (1993), pp. 1–33.
[12] O. Egeland, M. Dalsmo and O. J. Sørdalen, Feedback control of a nonholonomic under-
water vehicle with a constant desired configuration, The International Journal of Robotics
Research, 15(1) (1996), pp. 24–35.
[13] K. D. Elworthy, Stochastic differential equations on manifolds, London Mathematical Society
Lecture Note Series 70, Cambridge University Press, 1982.
[14] M. Emery, Stochastic Calculus on manifolds, Universitext, Springer–Verlag, 1989.
[15] , On two transfer principles in stochastic differential geometry, Séminaire de probabilités
(Strasbourg), 24 (1990), pp. 407–441.
[16] A. Estrade and M. Pontier, Backward stochastic differential equations in a Lie group,
Séminaire de probabilités (Strasbourg), 35 (2001), pp. 241–259.
[17] D. Filipović and J. Teichmann, On the geometry of the term structure of interest rates, Proc.
R. Soc. Lond. A, 460 (2004), pp. 129–167.
[18] P. Friz, Continuity of the Itô-map for Hölder rough paths with applications to the support
theorem in Hölder norm, arXiv:math.PR/0304501 v2, 2003.
[19] P. Friz and N. Victoir, Euler estimates for rough differential equations, Preprint, 2007.
[20] J. G. Gaines and T. J. Lyons, Variable step size control in the numerical solution of stochastic
differential equations, SIAM J. Appl. Math., 57(5) (1997), pp. 1455–1484.
[21] I. I. Gihman, and A. V. Skorohod, The theory of stochastic processes III, Springer, 1979.
[22] E. Hairer, C. Lubich and G. Wanner, Geometric Numerical Integration, Springer Series
in Computational Mathematics, 2002.
[23] S. Helgason, Differential geometry, Lie groups, and symmetric spaces, Academic Press, 1978.
[24] P. Holmes, J. Jenkins and N. E. Leonard, Dynamics of the Kirchoff Equations I: coincident
centers of gravity and bouyancy, Phys. D, 118 (1998), pp. 311–342.
[25] A. Iserles, H. Z. Munthe-Kaas, S. P. Nørsett, and A. Zanna, Lie-group methods, Acta
Numer., (2000), pp. 215–365.
[26] A. Iserles and A. Zanna, Efficient computation of the matrix exponential by generalized polar
decompositions, SIAM J. Numer. Anal., 42(5) (2005), pp. 2218–2256.
[27] P. E. Kloeden and E. Platen, Numerical solution of stochastic differential equations,
Springer, 1999.
[28] M. Kohlmann and S. Tang, Multidimensional backward stochastic Riccati equations and ap-
plications, SIAM J. Control Optim., 41(6) (2003), pp. 1696–1721.
[29] H. Kunita, On the representation of solutions of stochastic differential equations, LNM 784,
Springer–Verlag, 1980, pp. 282–304.
[30] , Stochastic flows and stochastic differential equations, Cambridge University Press, 1990.
[31] G. Lord, S. J.A. Malham and A. Wiese, Efficient strong integrators for linear stochastic
systems, 2006, Submitted.
[32] T. Lyons, Differential equations driven by rough signals, Rev. Mat. Iberoamericana, 14(2)
(1998), pp. 215–310.
[33] T. Lyons and Z. Qian, System control and rough paths, Oxford University Press, 2002.
[34] W. Magnus, On the exponential solution of differential equations for a linear operator, Comm.
Pure Appl. Math., 7 (1954), pp. 649–673.
[35] S. J.A. Malham and A. Wiese, Universal optimal stochastic expansions, 2007, Preprint.
[36] P. Malliavin, Stochastic analysis, Grundlehren der mathematischen Wissenschaften 313,
Springer, 1997.
[37] J. E. Marsden and T. S. Ratiu, Introduction to mechanics and symmetry, Second edition,
http://arxiv.org/abs/math/0304501
Stochastic Lie group integrators 21
Springer, 1999.
[38] G. N. Milstein and M. V. Tretyakov, Stochastic numerics for mathematical physics,
Springer, 2004.
[39] T. Misawa, A Lie algebraic approach to numerical integration of stochastic differential equa-
tions, SIAM J. Sci. Comput., 23(3) (2001), pp. 866–890.
[40] H. Munthe-Kaas, High order Runge–Kutta methods on manifolds, Appl. Numer. Math., 29
(1999), pp. 115–127.
[41] N. J. Newton, Asymptotically efficient Runge–Kutta methods for a class of Itô and
Stratonovich equations, SIAM J. Appl. Math., 51 (1991), pp. 542–567.
[42] P. J. Olver, Equivalence, invariants, and symmetry, Cambridge University Press, 1995.
[43] J. Schiff and S. Shnider, A natural approach to the numerical integration of Riccati differ-
ential equations, SIAM J. Numer. Anal., 36(5) (1999), pp. 1392–1413.
[44] A. Srivastava, M. I. Miller and U. Grenander, Jump-diffusion processes on matrix Lie
groups for Bayesian inference, preprint, 2000.
[45] R. S. Strichartz, The Campbell–Baker–Hausdorff–Dynkin formula and solutions of differen-
tial equations, J. Funct. Anal., 72 (1987), pp. 320–345.
[46] H. J. Sussmann, Product expansions of exponential Lie series and the discretization of stochas-
tic differential equations, in Stochastic Differential Systems, Stochastic Control Theory,
and Applications, W. Fleming and J. Lions, eds., Springer IMA Series, Vol. 10 (1988),
pp. 563–582.
[47] V. S. Varadarajan, Lie groups, Lie algebras, and their representations, Springer, 1984.
[48] F. W. Warner, Foundations of differentiable manifolds and Lie groups, Graduate Texts in
Mathematics, Springer–Verlag, 1983.
[49] Y. Yamato, Stochastic differential equations and nilpotent Lie algebras, Z. Wahrsch. Verw.
Gebiete, 47(2) (1979), pp 213–229.
ABSTRACT
  We present Lie group integrators for nonlinear stochastic differential
equations with non-commutative vector fields whose solution evolves on a smooth
finite dimensional manifold. Given a Lie group action that generates transport
along the manifold, we pull back the stochastic flow on the manifold to the Lie
group via the action, and subsequently pull back the flow to the corresponding
Lie algebra via the exponential map. We construct an approximation to the
stochastic flow in the Lie algebra via closed operations and then push back to
the Lie group and then to the manifold, thus ensuring our approximation lies in
the manifold. We call such schemes stochastic Munthe-Kaas methods after their
deterministic counterparts. We also present stochastic Lie group integration
schemes based on Castell--Gaines methods. These involve using an underlying
ordinary differential integrator to approximate the flow generated by a
truncated stochastic exponential Lie series. They become stochastic Lie group
integrator schemes if we use Munthe-Kaas methods as the underlying ordinary
differential integrator. Further, we show that some Castell--Gaines methods are
uniformly more accurate than the corresponding stochastic Taylor schemes.
Lastly we demonstrate our methods by simulating the dynamics of a free rigid
body such as a satellite and an autonomous underwater vehicle both perturbed by
two independent multiplicative stochastic noise processes.

<|endoftext|><|startoftext|>
Introduction
The chromosphere remains the least understood layer of
the solar atmosphere, with the very basics of its struc-
ture being hotly debated: is it better described by the
classical picture of a steady temperature rise as a func-
tion of height, with superposed weak oscillations (e.g.
semi empirical models of Vernazza et al. [8], Fontenla
et al. [5]), or does the temperature keep dropping out-
wards, with very hot shocks producing strong localized
heating (radiation hydrodynamic simulations of Carls-
son & Stein [3], [4], and Wedemeyer et al. [9])? The
latter concept is consistent with the IR observations of
carbon monoxide, which require cool gas to be present
at chromospheric heights (see, e.g. Ayres [1]).
Thus, existing models cannot provide a complete de-
scription of the solar chromosphere. Consequently now-
adays two alternative pictures of the chromosphere co-
exist and the role played by chromospheric dynamics in
the structuring of this atmospheric layer is a subject of
intense scientific debate.
One reason for conflicting models is that they are
based either on atomic chromospheric lines and con-
tinua in the UV or on molecular lines in the IR, since
UV observations are practically blind to cool gas in a
dynamic chromosphere, while the IR observations sam-
ple only the cool part of the chromosphere. Improved
and more sensitive diagnostics of the chromospheric
structure and dynamics, that sample both the hot and
http://arxiv.org/abs/0704.0023v1
MDI TRACE 1600
CaII K BIMA 3.5 mm
Fig. 1 Portrait of the solar chromosphere at the center of the Sun’s disk at 4 different wavelengths on May 18, 2004. From top left to
bottom right: MDI longitudinal photospheric magnetogram, UV 1600 A image from TRACE, CaII K line center image from BBSO
and BIMA image at 3.5 mm.
the cool gas and should distinguish between the ri-
val models, are provided by observations at millime-
ter wavelengths with an acceptable spatial resolution
as was proposed by Loukitcheva et al. [6]. In this con-
tribution we review the unique chromospheric obser-
vations at 3.5 mm with the Berkeley-Illinois-Maryland
Array and the analysis of the intensity variations ex-
pected from the model of Carlsson & Stein for mm
wavelengths. We postulate the requirements for mm ob-
servations with the future instruments, with emphasis
on spatial and temporal resolution. Finally we discuss
the prospects for chromospheric studies with ALMA.
2 Results
2.1 Analysis of the BIMA observations at 3.5 mm
The Berkeley-Illinois-Maryland Array (BIMA) operat-
ing at a wavelength of 3.5 mm (frequency of 85 GHz)
has been the only interferometer in the mm range fre-
quently used for solar observations. The BIMA tele-
scopes are now part of the CARMA array which will
also carry out such observations. With the BIMA data
obtained in the years 2003 and 2004 we have constructed
two-dimensional maps of the solar chromosphere with a
resolution of 12′′, which represents the highest spatial
resolution achieved so far at this wavelength for non-
flare solar observations. The BIMA images have led
to new insights in to chromospheric structure and to
the detection of spatially-resolved chromospheric oscil-
lations at mm wavelengths. The details of the restora-
tion procedure and extensive tests of the sensitivity of
the BIMA data to the detection of dynamic signatures
can be found in White et al. [11].
With the currently available resolution the contrast
of the brightness structures is evaluated to be up to
30% of the quiet-sun brightness at 3.5 mm (White et
al. [11]). However, the similarity of brightness struc-
tures, derived from the mm images and seen in other
chromospheric emissions (Fig.1), in spite of the differ-
ence in resolution of the images (1-2′′ resolution of the
UV images), implies that the BIMA resolution is not
enough to resolve the millimeter fine structure and ob-
servations with spatial resolution much higher than 12′′
are required. A detailed analysis of the relations be-
tween the millimeter emission, magnetic field and other
chromospheric diagnostics is in preparation.
In the millimeter brightness we detected intensity
oscillations with typical amplitudes of 50-150 K in the
range of periods from 120 to 700 seconds (frequency
range 1.5-8 mHz). We found a tendency toward short
period oscillations in internetwork and longer periods in
network regions in the quiet Sun, which is in good agree-
ment with the results obtained at other wavelengths. At
3 mm the inner parts of the chromospheric cells exhibit
a behavior typical of the internetwork with the maxi-
mum of the Fourier power in the 3-minute range, how-
ever, most of the oscillations are quasi-periodic, show-
ing up in wave trains of finite duration lasting for typi-
cally 1-3 wave periods (see also Loukitcheva et al. [7]).
2.2 Analysis of the CS model millimeter spectrum
The response of the submillimeter and millimeter ra-
diation to a time-series generated by Carlsson & Stein
(CS) was computed under the assumption of thermal
free-free radiation by Loukitcheva et al. [6]. The results
are depicted in Fig. 2 as the excess intensity as a func-
tion of wavelength and time.
400      720    1040    1360    1680    2000    2320    2640    2960    3280    3600
time(s)
Fig. 2 Evolution of the Carlsson & Stein model millimeter spec-
trum with time. Negative grey scale representing excess intensity
as a function of time and wavelength.
Wave periods of approximately 3 min can be clearly
distinguished in the intensity at all considered wave-
lengths. Though the dominant frequency of the oscilla-
tions changes slightly with wavelength, for all mm wave-
lengths it lies in the range of 3 minutes. The difference
from one period of time to another can be explained
by the presence of merging shocks during certain time
intervals. The differences in the light curves at differ-
ent wavelengths are caused primarily by the difference
in the formation heights of the emitted radiation. In
general the amplitudes of the oscillations compared to
the radiation temperature are large, in this sense mm
wavelength radiation combines the advantages of the
CO lines, which mainly see the cool gas, with those of
atomic lines and UV continua, which mainly sample the
hot gas.
On the whole, the brightness temperatures are ex-
tremely time-dependent at millimeter wavelengths, fol-
lowing changes in the atmospheric parameters. With
increasing wavelength the amplitude of the brightness
oscillations grows significantly, reaches its maximum
value at 2.2 mm (expected to be 15% of the quiet-Sun
brightness temperature), and decreases rapidly towards
longer wavelengths. Thus we can identify the range 0.8-
5.0 mm as the appropriate range of mm wavelengths at
which one can expect the clearest signatures of dynamic
effects. A careful look at the mm brightness spectrum
as a function of time (see Fig. 2) reveals a time delay
between the oscillations at long and short millimeter
wavelengths. Hence, it is possible to study wave modes
traveling in the chromosphere by comparing sub-mm
with mm observations.
3 Discussion
The CS model predicts that spatially and temporally
resolved observations should clearly exhibit the signa-
tures of the strong shock waves. However, a direct com-
parison of the observational data products (RMS val-
ues, histogram skewness, Fourier and wavelet spectra,
etc.), referring to regions with weak magnetic field like
the quiet Sun internetwork, with the corresponding prod-
ucts expected from the simulations of Carlsson & Stein
exhibits large differences. In particular, the RMS of the
brightness temperature is nearly an order of magnitude
larger in the model (800 K at 3 mm) than in the ob-
servations (100 K). Another difference is the absence of
longer periods in the model power spectrum. But these
discrepancies do not rule out the CS models. On the
one hand the model is one dimensional and hence does
not predict a coherence length of the oscillations, while
on the other hand we are not able to resolve individual
oscillating elements due to the limited spatial resolution
of the observations.
Consequently we estimated the influence of the spa-
tial smearing on the model parameters of chromospheric
dynamics and on the observed oscillatory power. Thus
we confirmed that the very limited spatial resolution
currently available hinders a clean separation between
cells and network and typically both network and in-
ternetwork areas contribute to the recorded BIMA ra-
diation. From the analysis of the observational data it
was found that power in all frequency ranges increases
significantly with improving resolution. Consistency be-
tween the power predicted by the CS model and the
observed power is obtained if the coherence length of
oscillating elements is on the order of 1′′.
Our results are consistent with Wedemeyer et al.
[10], who computed the millimeter wave signature re-
sulting from the 3-D simulations of Wedemeyer et al.
[9]. Although the 3-D simulations suffer from the fact
that the radiative transfer of energy is computed en-
tirely in LTE, which becomes a poor assumption at
chromospheric heights, the authors believe that the chro-
mospheric pattern and its temporal evolution is repre-
sentative of the non-magnetic internetwork regions of
the solar chromosphere. The simulations display a com-
plex 3D structure of the chromospheric layers, which is
highly dynamical on temporal scales of 20-25 s and on
spatial scales comparable to solar granulation, which is
in good agreement with the 1′′ size of oscillating ele-
ments that we deduced. According to Wedemeyer et al.
[9] the chromospheric temperature structure is charac-
terized by a pattern of hot shock waves, which originate
from convective motions, and cool gas lying between the
shocks. The intensity distribution at mm wavelengths
follows the pattern of the shocks in the chromosphere
with a sub arcsecond size of the features associated with
the shocks. All this complex and dynamic 3D structure
can be deduced from observations at mm wavelengths
with a sufficiently high spatial resolution of better than
4 Summary
Simultaneous mm-submm observations at different wave-
lengths can be used for the tomography of the solar
atmosphere, as radiation at the different wavelengths
originates from different layers, with the average for-
mation height increasing with wavelength. Such obser-
vations also provide a strong test of present and future
models. However, observations that might be able to
uncover the nature of the chromosphere should meet
the following requirements:
– multiband observations in mm-submm domain (0.8-
5.0 mm) to address shock waves and chromospheric
oscillation modes
– arcsecond spatial resolution to resolve fine structure
– temporal resolution better than a few seconds to
follow its evolution in time
– FOV size of order of 1′
– accurate absolute calibration of the observations (Bas-
tian [2])
These requirements look very similar to the techni-
cal specification of the continuum observations with the
Atacama Large Millimeter Array (ALMA), which rep-
resents an enormous advance over existing instrumenta-
tion operating at mm-submm wavelengths. ALMA will
produce images of the highest resolution available for
the foreseeable future (although the technical problem
of sampling both large and small spatial scales simulta-
neously, required for high–quality imaging of the chro-
mosphere, will remain a challenge) and will be the most
sensitive instrument operating at submm-mm wavelengths.
To summarize, ALMA will be an extraordinarily pow-
erful instrument for studying the solar chromosphere. It
will finally allow the mapping of the three-dimensional
thermal structure of the solar chromosphere which will
be a real breakthrough in solar studies.
Acknowledgements The use of BIMA for scientific research
carried out at the University of Maryland is supported by NSF
grant AST–0028963. Solar research at the University of Maryland
is supported by NSF grant ATM 99-90809 and NASA grants NAG
5-8192, NAG 5-10175, NAG 5-12860 and NAG 5-11872.
References
1. Ayres, T.R.: Does the Sun Have a Full-Time COmosphere?
Ap. J. 575, 1104-1115 (2002)
2. Bastian, T. S.: ALMA and the Sun. Astronomische
Nachrichten 323, 271-276 (2002)
3. Carlsson, M., & Stein, R.F.: Does a nonmagnetic solar chro-
mosphere exist? Ap. J. 440, L29-L32 (1995)
4. Carlsson, M., & Stein, R.F.: Dynamic Hydrogen Ionization.
Ap. J. 572, 626-635 (2002)
5. Fontenla, J. M.; Avrett, E. H.; Loeser, R.: Energy balance in
the solar transition region. III - Helium emission in hydrostatic,
constant-abundance models with diffusion. Ap. J. 406, 319-345
(1990)
6. Loukitcheva, M., Solanki, S.K., Carlsson, M., Stein, R.F.: Mil-
limeter observations and chromospheric dynamics. A&A 419,
747-756 (2004)
7. Loukitcheva, M., Solanki, S.K., White, S.: The dynamics
of the solar chromosphere: comparison of model predictions
with millimeter-interferometer observations. A&A 456, 713-723
(2006)
8. Vernazza, J. E., Avrett, E. H., Loeser, R.: Structure of the
solar chromosphere. III - Models of the EUV brightness com-
ponents of the quiet-sun. Ap. J. Suppl. 45, 635-725 (1981)
9. Wedemeyer, S., Freytag, B., Steffen, M., Ludwig, H.-G., Hol-
weger, H.: Numerical simulation of the three-dimensional struc-
ture and dynamics of the non-magnetic solar chromosphere.
A&A 414, 1121-1137 (2004)
10. Wedemeyer-Böhm, S., Ludwig, H.-G., Steffen, M., Freytag,
B., Holweger, H.: The shock-patterned solar chromosphere in
the light of ALMA. In: Favata et al. (eds.) Proceedings of
”The 13th Cambridge Workshop on Cool Stars, Stellar Systems
and the Sun” Hamburg, Germany, ESA SP-560, pp. 1035-1038
(2005)
11. White, S., Loukitcheva, M., & Solanki, S.K.: High-resolution
millimeter-interferometer observations of the solar chromo-
sphere. A&A 456, 697-711 (2006)
	Introduction
	Results
	Discussion
	Summary
ABSTRACT
  The very nature of the solar chromosphere, its structuring and dynamics,
remains far from being properly understood, in spite of intensive research.
Here we point out the potential of chromospheric observations at millimeter
wavelengths to resolve this long-standing problem. Computations carried out
with a sophisticated dynamic model of the solar chromosphere due to Carlsson
and Stein demonstrate that millimeter emission is extremely sensitive to
dynamic processes in the chromosphere and the appropriate wavelengths to look
for dynamic signatures are in the range 0.8-5.0 mm. The model also suggests
that high resolution observations at mm wavelengths, as will be provided by
ALMA, will have the unique property of reacting to both the hot and the cool
gas, and thus will have the potential of distinguishing between rival models of
the solar atmosphere. Thus, initial results obtained from the observations of
the quiet Sun at 3.5 mm with the BIMA array (resolution of 12 arcsec) reveal
significant oscillations with amplitudes of 50-150 K and frequencies of 1.5-8
mHz with a tendency toward short-period oscillations in internetwork and longer
periods in network regions. However higher spatial resolution, such as that
provided by ALMA, is required for a clean separation between the features
within the solar atmosphere and for an adequate comparison with the output of
the comprehensive dynamic simulations.

<|endoftext|><|startoftext|>
Formation of quasi-solitons in transverse confined ferromagnetic
film media
A.A. Serga 1
Technische Universität Kaiserslautern, Department of Physics and
Forschungsschwerpunkt MINAS, D - 67663 Kaiserslautern, Germany
M. Kostylev 2
School of Physics, The University of Western Australia, 35 Stirling
Highway, Crawley WA 6009, Australia
St.Petersburg Electrotechnical University, 197376, St.Petersburg, Russia
B. Hillebrands
Technische Universität Kaiserslautern, Department of Physics and
Forschungsschwerpunkt MINAS, D - 67663 Kaiserslautern, Germany
Abstract The formation of quasi-2D spin-wave waveforms in longitudinally
magnetized stripes of ferrimagnetic film was observed by using time- and space-
resolved Brillouin light scattering technique. In the linear regime it was found
that the confinement decreases the amplitude of dynamic magnetization near the
lateral stripe edges. Thus, the so-called effective dipolar pinning of dynamic mag-
netization takes place at the edges.
In the nonlinear regime a new stable spin wave packet propagating along a
waveguide structure, for which both transversal instability and interaction with
the side walls of the waveguide are important was observed. The experiments and
a numerical simulation of the pulse evolution show that the shape of the formed
waveforms and their behavior are strongly influenced by the confinement.
We report on the observation of a new type of a stable, two-dimensional
nonlinear spin wave packet propagating in a magnetic waveguide structure
and suggest a theoretical description of our experimental findings. Stable
two-dimensional spin wave packets, so-called spin wave bullets, were previ-
ously observed, however solely in long and wide samples of a thin ferrimag-
netic film of yttrium-iron-garnet (YIG) [1, 2, 3], that were practically un-
1Email address: serha@rhrk.uni-kl.de
2Email address: kostylev@cyllene.uwa.edu.au
bounded in both in-plane directions compared to the lateral size of the spin
wave packets and the wavelength of the carrier spin wave. In a waveguide
structure, where the transverse dimension is comparable to the wavelength,
up to day only quasi one-dimensional nonlinear spin wave objects were ob-
served, which are spin wave envelope solitons. Here a typical system is a
narrow (' 1-2mm) stripe of a YIG ferrite film [4, 5]. Both for solitons and bul-
lets the spreading in dispersion is compensated by the longitudinal nonlinear
compression. Concerning the transverse dimension, solitons have a cosine-
like amplitude distribution due to the lateral confinement in the waveguide,
whereas bullets show a transverse nonlinear instability compensating pulse
widening due to diffraction and leading to transverse confinement.
Here we report on the observation of a new stable spin wave packet prop-
agating along a waveguide structure, for which both transversal instability
and interaction with the side walls of the waveguide are important.
The experiments were carried out using a longitudinally magnetized long
YIG film stripe of 2.5mm width and 7µm thickness. The magnetizing field
was 1831Oe. The spin waves were excited by a microwave magnetic field
created with a microstrip antenna of 25µm width placed across the stripe
and driven by electromagnetic pulses of 20ns duration at a carrier frequency
of 7.125GHz. As is well known the backward volume magnetostatic spin
wave (BVMSW) [6] excited in the given experimental configuration is able to
form both envelope solitons and bullets [4], depending on the geometry. The
spatio-temporal behavior of the traveling BVMSW packets was investigated
by means of space- and time-resolved Brillouin light scattering spectroscopy
The obtained results are demonstrated in Fig. 1 where the spatial distri-
butions of the intensity of the spin wave packets are shown for given moments
of time. The spin wave packets propagate here from left to right and decay
in the course of their propagation along the waveguide because of magnetic
loss. The left set of diagrams corresponds to the linear case. The power of
the driving electromagnetic wave is 20mW. The right set of diagrams corre-
sponding to the nonlinear case was collected for a driving power of 376mW.
Differences between these two cases are clearly observed. First of all the
linear spin wave packet is characterized by a cosine-like lateral profile while
the cross section of the nonlinear pulse is sharply modified relative to the
linear case and has a pronounced bell-like shape. Second, the intensity of the
linear packet decays monotonically with time while the intensity of the non-
linear packet initially increases because of its strong transversal compression
(see the second diagram from the top in Fig. 1).
Both of these nonlinear features provide clear evidence for the develop-
Figure 1: Bullet formation in the transversally confined yttrium-iron-garnet
film.
ment of a transversal instability and bullet formation. It is interesting that
the bell-like cross-section shape survives even at the end of the propaga-
tion distance when the pulse intensity decreases more than ten times and
the nonlinear contribution to the spin wave dynamics should considerably
diminish.
In order to interpret the experimental result we have assumed that the de-
velopment of nonlinear instabilities in a laterally confined medium is strongly
modified by a quantization of the spin wave spectrum. That is why we have
transformed the two-dimensional Nonlinear Schrödinger Equation tradition-
ally used for the analysis of bullet dynamics [4] into a system of coupled
equations for amplitudes of the spin wave width modes. The specific form of
the discrete set of these orthogonal modes is defined by the actual boundary
conditions at the lateral edges of the stripe. We developed a two-dimensional
theory of linear spin-wave dynamics in magnetic stripes. As an important
outcome we found that the Guslienko-Slavins effective boundary condition [8]
for dynamic magnetization at the stripe lateral edges, being initially derived
for spin waves with vanishing longitudinal wavenumbers, is also valid in the
case of propagating width modes with non-vanishing longitudinal wavenum-
bers [9] . The effective boundary condition shows that the magnetization
vector at the lateral stripe edges is highly pinned, that means that the am-
plitude of dynamic magnetization practically vanishes at the edges. For
simplicity it is even possible to consider the stripe width modes to be totally
pinned at the stripe lateral edges. As seen from Fig. 1 this conclusion is in a
good agreement with the experiment.
The analysis of the system of nonlinear equations derived from the Non-
linear Schrödinger Equation shows that the formation of the two-dimensional
waveform can be considered as an enrichment of the spectrum of the width
modes. The partial waveforms carried by the modes have the same carrier
frequencies equal to that of the initial signal and the carrier wave numbers
which satisfy the dispersion relations for the modes. In the linear regime all
the modes are orthogonal to each other and do not interact. In the nonlinear
(high amplitude) regime the width modes become intercoupled by the four-
wave nonlinear interaction, resulting in an intermodal energy transfer and
the mode spectrum enrichment.
As the spin wave input antenna effectively generates only the lowest width
mode, the initial waveform launched in the stripe is determined by it solely.
Therefore to understand the underlaying physics of quasi-bullet formation it
is necessary to consider the nonlinear interaction of higherorder width modes
with it.
Our theoretical analysis shows that the interaction of the lowest width
mode (n = 1) with higherorder modes is different for odd and even higher or-
der modes. While interacting with even modes, the lowest width mode plays
the role of the pumping wave. This parametrically transfers its energy to the
higher width modes. The interaction is purely parametric and therefore a
threshold process. It needs an initial signal to start the process. This signal
usually is a thermally excited mode. Therefore the amplified waveform needs
a large distance of propagation and a group velocity equal to the velocity of
the lowest width mode in order to reach the soliton amplitude level. If there
is a damping of the pumped wave, even modes will never reach an amplitude
comparable with that of the lowest mode. As a result they can contribute to
the nonlinear waveform profile only, if the amplitude of the initial waveform
is far beyond the threshold of soliton formation.
Interaction of modes of the same type of symmetry are described by a
parametric term as well as by an additional pseudo-linear (tri-linear) exci-
tation term, playing the role of an external source of excitation. Such a
pseudo-linear excitation is a threshold-free process. In contrast to paramet-
ric processes it does not need an initial amplitude value to start the the
process. The pseudo-linear excitation is possible only due to the effective
dipolar pinning of the magnetization at the stripe edges. If the edge spins
were unpinned, the interaction of all the width modes would be purely para-
metric.
The purely parametric mechanism of developing a transversal instability
is typical for the process of bullet formation from a plane-wave waveform in
an unconfined medium, which distinguishes it from the process of soliton and
bullet formation in the waveguide structures.
In contrast, the transverse instability of a wave packet in a confined
medium starts as a pseudolinear excitation of higher-order width modes.
This mechanism ensures a rapid growth of the symmetric n = 3 mode up
to the level where the parametric mechanism starts to work. After that the
main mode together with the n = 3 mode are capable to rapidly generate
a large set of yet higher modes through both pseudo-linear and parametric
mechanisms.
Our theory shows that the efficiency of both nonlinear interaction mech-
anisms (parametric and tri-linear) strongly depends on the group velocity
difference of modes and the initial length of the nonlinear pulse. In larger
stripes the group velocities of modes are closer to each other. As a result
the nonlinearly generated higher-order modes longer remain within the pump
pulse. If the pulse is long enough, they reach significant amplitudes and a
bullet-like waveform is formed. In narrower stripes the group velocity differ-
ence is larger, and consequently the nonlinearly generated highorder wave-
forms leave faster the pumping area. As a result, for the same pulse length,
they do not reach significant amplitudes. The nonlinear steepening results
Figure 2: Lateral shapes of the nonlinear SW packets. 1 and 2 – theoretical
results calculated for the ferrite stripes of width of 2.5mm and 1mm , respec-
tively. 3 and 4 – experimental profiles observed in YIG waveguides of width
of 2.5mm and 1mm, respectively. 1 and 3: bullets. 2 and 4: solitons.
in the transformation of the lowest mode into a soliton.
The results of our calculations of the lateral shapes of the nonlinear spin
wave packets in wide (2.5mm) and narrow 1mm ferrite stripes are shown in
Fig. 2. The excellent correspondence with the experimental data provides
good evidence for the validity of the developed theory.
Support by the Deutsche Forschungsgemeinschaft, the Australian Re-
search Council, and Russian Foundation for Basic Research is gratefully ac-
knowledged.
References
[1] O. Büttner, M. Bauer, S.O. Demokritov, B. Hillebrands, Yu.S. Kivshar,
V. Grimalsky, Yu. Rapoport, A.N. Slavin, 61, 11576 (2000).
[2] A.A. Serga, B. Hillebrands, S.O. Demokritov, A.N. Slavin, 92, 117203
(2004).
[3] A.A. Serga, B. Hillebrands, S.O. Demokritov, A.N. Slavin, P. Wierzbicki,
V. Vasyuchka, O. Dzyapko, A. Chumak, 94, 167202 (2005).
[4] A.N. Slavin, O. Büttner, M. Bauer, S.O. Demokritov, B. Hillebrands,
M.P. Kostylev, B.A. Kalinikos, V. Grimalsky, Yu. Rapoport, Chaos 13,
693 (2003).
[5] M. Chen, M.A. Tsankov, J.M. Nash, C.E. Patton, 49, 12773 (1994).
[6] F.R. Morgenthaler, Proceedings of the IEEE 76, 138 (1988).
[7] S.O. Demokritov, B. Hillebrands, A.N. Slavin, Phys. Rep. 348, 441
(2001).
[8] K.Y.Guslienko, S.O.Demokritov, B.Hillebrands, and A.N.Slavin, 66,
132402 (2002).
[9] M.Kostylev, J.-G. Hu, and R.L.Stamps, 90, 012507 (2007).
ABSTRACT
  The formation of quasi-2D spin-wave waveforms in longitudinally magnetized
stripes of ferrimagnetic film was observed by using time- and space-resolved
Brillouin light scattering technique. In the linear regime it was found that
the confinement decreases the amplitude of dynamic magnetization near the
lateral stripe edges. Thus, the so-called effective dipolar pinning of dynamic
magnetization takes place at the edges.
  In the nonlinear regime a new stable spin wave packet propagating along a
waveguide structure, for which both transversal instability and interaction
with the side walls of the waveguide are important was observed. The
experiments and a numerical simulation of the pulse evolution show that the
shape of the formed waveforms and their behavior are strongly influenced by the
confinement.

<|endoftext|><|startoftext|>
Introduction
Theoretical study of polarons in the strongly correlated system is like an at-
tempt to view contents of a Pandora box embedded into another, even more
sinister and obscure, container of riddles, enigmas and mysteries. This des-
perate situation occurs because solution is not known even for the simplest
polaron problem, i.e. when a perfectly stable quasiparticle (QP) with mo-
mentum as a single quantum number interacts with a well defined bath of
bosonic elementary excitations. To the contrary, the definition of the strongly
correlated system implies that QPs might be highly unstable and the very
notion of QPs, both in electronic and bosonic subsystems, is under question.
Thus, one faces the problem of an interplay between ill defined objects and
it is crucial to solve the problem without approximations. Further difficulty,
pertinent to realistic systems, is an interplay of the momentum and other
quantum numbers characterizing internal states of a QP.
The problem of polaron originally emerged as that of an electron coupled
to phonons (see [1, 2]). In the initial formulation a structureless QP is char-
acterized by the only quantum number, momentum, which changes due to
interaction of the QP with phonons [3, 4]. Later, depending on what can be
called “particle” and “environment”, and how they interact with each other,
the polaron concept was related to extreme diversity of physical phenomena.
There are many other objects which, having nothing to do with phonons, are
isomorphic to simple polaron [5], as, e.g. an exciton-polaron in the intraband
scattering approximation [6, 7, 8, 9]. Another example is the problem of a hole
in the antiferromagnet which is closely related to polaron since hole movement
is accompanied by the spin flips which, in the spin wave approximation, are
equivalent to creation and annihilation of magnons [10, 11].
http://arxiv.org/abs/0704.0025v1
2 A. S. Mishchenko and N. Nagaosa
The concept of polaron was further generalized to include internal degrees
of freedom which, interacting with environment, change their quantum num-
bers. Example of a complex QP is the Jahn-Teller polaron, where electron-
phonon interaction (EPI) changes quantum numbers of degenerate electronic
states [12, 13, 14]. This generalization is important due to it’s relevance to
the colossal magnetoresistance phenomena in the manganese oxides [15, 16].
Another example is the pseudo Jahn-Teller polaron, where EPI is inelastic
and leads to transitions between close in energy electronic levels of a QP
[17, 18, 19]. Further generalization is a system of several QPs which interact
both with each other and environment. For example, effective interaction of
two electrons through exchange by phonons can overcome the Coulomb repul-
sion and form a bound state, bipolaron [20, 21, 22, 23, 24]. On the other hand,
coupling of attracting hole and electron to the lattice vibrations [25, 26, 27]
can create a lot of qualitatively different objects: localized exciton, weakly
bound pair of localized hole and localized electron, etc. [28, 7]. Scattering by
impurities introduces additional complexity to the polaron problem because
interference of impurity potential with lattice distortion, which accompanies
the polaron movement, can contribute either constructively or destructively
to the localization of a QP on impurity [29, 30, 7].
In addition, a bare QP and bosonic bath can not be considered as well
defined in the correlated systems. Angle Resolved Photoemission Spectra
(ARPES), revealing the Lehmann Function (LF) of quasiparticle, demonstrate
broad peaks in many correlated systems: cooper oxide high-temperature su-
perconductors [31, 32, 33], colossal magnetoresistive manganites [34, 35, 36],
quasi-one-dimensional Peierls conductors [37, 38], and Verwey magnetites [39].
Besides, phonons are also broadened in many correlated systems, e.g. in high-
temperature semiconductors [40] and mixed-valent materials [41, 42]. One of
possible reasons for these broadenings is the interaction of the QPs with the
lattice degrees of freedom. However, in many realistic cases other subsystems,
not explicitly included into the polaron Hamiltonian, are responsible for the
decay of QP and phonons, e.g., another electronic bands, phonon anharmonic-
ity, interaction with nuclear spins, etc. Then, if this auxiliary broadening is
known in some approximation, one can formulate an ambitious goal to study
spectral response when “bare” quasiparticle with known damping interacts
with “broadened” bosonic excitations.
No one of traditional numerical methods, to say nothing of analytical ones,
can give approximation free results for measurable quantities of polaron, such
as optical conductivity or angle resolved photoemission spectra, for in macro-
scopic system of arbitrary dimension. Besides, we are not aware of any numer-
ical method which can incorporate in an approximation free way the informa-
tion on the damping of QP and bosonic bath. Below we describe basics of re-
cently developed Diagrammatic Monte Carlo (DMC) method for numerically
exact computation of Green functions and correlation functions in imaginary
time for few polarons in a macroscopic system [43, 44, 45, 46, 47, 48, 49, 50, 51].
Analytic continuation of imaginary time functions to real frequencies is per-
Spectroscopic Properties of Polarons by Exact Monte Carlo 3
formed by a novel approximation free approach of stochastic optimization
(SO) [45, 50, 51], circumventing difficulties of popular Maximal Enthropy
method. Finally we focus on results of application of the DMC-SO machinery
to various problems [52, 53, 54, 55, 56, 57]
The basic models, related to the polaronic objects in correlated systems,
which can be solved by DMC-SO methods, are stated in the next Sect. It
is followed in Sect. 1.2 by description of stumbling blocks encountered by
analytic methods. Sect. 2 concerns the basics of DMC-SO methods. However,
those who are not interested in the details of the methods can briefly look
through the definitions in the introduction for Sect. 2 and turn to Sect. 3
where LF and optical conductivity of Fröhlich polaron are discussed (see also
[58]). Results of studies of the self-trapping phenomenon are presented in Sect.
4 and application of DMC-SO methods to the exciton problem can be found
in Sect. 5. The chapter is completed by Sect. 6 devoted to studies of ARPES
of high temperature superconductors.
1.1 Formulation of a General Model with Interacting Polarons
In general terms, the simplest problem of a complex polaronic object, where
center-of-mass motion does not separate from the rest of degrees of freedom,
is introduced as system of two QPs
εa(k)a
εh(k)hkh
(ak and hk are annihilation operators, and εa(k) and εh(k) are dispersions of
QPs), which interact with each other
Ĥa-h = −N−1
U(p,k,k′)a†
p−khp−k′ap+k′ . (2)
(N is the number of lattice sites) through the instantaneous Coulomb potential
and the scattering by bosons
Ĥpar-bos = i
(b†q,κ − b−q,κ)
γaa,κ(k,q)a
k−qak + γhh,κ(k,q)h
k−qhk + γah,κ(k,q)h
k−qak
+ h.c. (3)
(γ[aa,ah,hh],κ are interaction constants) where quanta of Q different branches
of bosonic excitations are created or annihilated, which are described by
Ĥbos =
ωq,κb
q,κbq,κ . (4)
In general, each QP can be a composite one with internal degree of freedom
represented by T different states
4 A. S. Mishchenko and N. Nagaosa
ĤPJT0 =
ǫi(k)a
i,kai,k, (5)
which quantum numbers can be also changed due to nondiagonal part of
particle-boson interaction
Ĥpar-bos = i
i,j=1
γij,κ(k,q)(b
q,κ − b−q,κ)a
i,k−qaj,k + h.c. (6)
Complicated model (1)-(6) is still too far from the cases encountered in
strongly correlated systems. Due to coupling of QPs (1) and (5) and bosonic
fields (4) to additional degrees of freedom, these excitations are not well de-
fined from the onset. Namely, the dispersion relation of the QP spectrum ǫ(k)
in realistic system is ill-defined. One can speak of a Lehmann Function (LF)
[59, 60, 61] of a QP
Lk(ω) =
δ(ω − Eν(k)) |〈ν|a†k|vac〉|
2 (7)
,which is normalized to unity
dωLk(ω) = 1 and can be interpreted as
a probability that a QP has momentum k and energy ω. (Here {|ν〉} is a
complete set of eigenstates of Hamiltonian Ĥ in a sector of given momentum
k: H |ν(k)〉 = Eν(k) |ν(k)〉.) Only for noninteracting system the LF reduces
to delta function LNONINTk (ω) = δ(ω − ǫ(k)) and, thus, sets up dispersion
relation ω = ǫ(k).
Specific cases of model (1)-(6) describe enormous variety of physical prob-
lems. Hamiltonians (1) and (2), in case of attractive potential U(p,k,k′) > 0,
describe an exciton with static screening [62, 63]. Besides, expressions (1)-(4)
describe bipolaron for repulsive interaction [20, 21, 22, 23, 24] U(p,k,k′) < 0
and exciton-polaron otherwise [25, 26, 27]. The simplest model for exciton-
phonon interaction, when only two (T = 2) lowest states of relative electron-
hole motion are relevant (e.g. in one-dimensional charge-transfer exciton
[64, 65, 66]), is defined by Hamiltonians (4)-(6)). The same relations (4)-(6)
describe the problems of Jahn-Teller [all ǫi in Hamiltonian (5) are the same]
and pseudo Jahn-Teller polaron. The problem of a hole in an antiferromagnet
in spin-wave approximation is expressed in terms of Hamiltonians (4)-(6) with
Q = 1 and T = 1. When hole also interacts with phonons, one has to take into
account one more bosonic branch and set Q = 2 in (4) and (6). Finally, the
simplest nontrivial problem of a polaron, i.e. of a structureless QP interacting
with one phonon branch, is described by noninteracting Hamiltonians of QP
Ĥpar and phonons Ĥph
Ĥ0 =
ǫ(k)a
qbq , (8)
and interaction term
Spectroscopic Properties of Polarons by Exact Monte Carlo 5
Ĥint =
V (k,q)(b†q − b−q)a
k−qak + h.c. . (9)
The simplest polaron problem, in turn, can be subdivided into continuous and
lattice polaron models.
1.2 Limitations of Analytic Methods in Problem of Polarons
Analytic solution for the problem of exciton in a rigid lattice is available
only for small radius Frenkel regime [67] and large radius Wannier regime
[68]. However, even limits of validity for these approximations are not known.
Random phase approximation approaches [62, 63], are capable of obtaining
some qualitative conclusions for intermediate radius regime though its’ quan-
titative results are not reliable due to uncontrolled errors. The situation is
similar with the problem of structureless polaron, where analytic solutions
are known only in the weak and strong coupling regimes. Besides, reliable
results for these regimes are available only for ground state properties.
Although several novel methods, capable of obtaining properties of excited
states, were developed recently, variational coherent-states expansion [69] and
free propagator momentum average summation [70] as a few examples, all of
them, to provide reliable data in a specific regime, need either comparison
with exact sum rules [71, 72] or with exact numerical results.
Application of variational methods to study of excitations is a tricky is-
sue since, strictly speaking, they are valid only for the ground state. As an
example for the importance of sum rules in variational treatment, we refer
to the problem of the optical conductivity of the Fröchlich polaron. Possibil-
ity of existence of Relaxed Excited State (RES), which is a metastable state
where lattice deformation has adjusted to the electronic excitation rendering
stability and narrow linewidth of the spectroscopic response, was briefly men-
tioned by S. I. Pekar in early 50’s. Then, conception of RES was rigorously
formulated by J. T. Devreese with coworkers and has been a subject of ex-
tensive investigations for years [5, 73, 74, 75, 76, 77, 48, 57]. Calculations of
impedance [75] in the framework of technique [78] supported the existence of
a narrow stable peak in the optical conductivity. However, even the authors
of [75] were skeptical about the fact that the width of RES in the strong
coupling regime appeared to be more narrow than the phonon frequency, i.e.
inverse time which is, according to the Heisenberg uncertainty principle, is
required for the lattice readaptation. In consequent paper [77] they realized
the importance of many-phonon processes and studied two-phonon contri-
bution to optical conductivity. Importance of many phonon processes was
confirmed when variational results [75] were compared with exact DMC sim-
ulations [48]. Variational result well reproduced the position of the peak in
exact data though failed in description of the peak width in the strong cou-
pling regime [48]. Finally, when approach [75] was modified and several sum
rules were accurately introduced into variational model [57], both position
6 A. S. Mishchenko and N. Nagaosa
and width of the peak were quantitatively reproduced. Studies [57] (see Sect.
3.1), do not address rather philosophical question whether RES exists or not,
though inevitably prove that, in contrast to the foregoing beliefs, there in no
stable excited state of the Fröhlich polaron in the strong coupling regime.
Note that sometimes excited states can not be handled by analytic methods
even for weak couplings: perturbation theory expression for LF of the Fröhlich
polaron model diverges at the phonon energy ωph [See (34) in Sect. 3.1.] and
more elaborate treatment is necessary.
Difficulties of semianalytic methods enhance in the intermediate coupling
regime where results are sometimes wrong even for ground state properties.
For example, the variatioanl approach [79], which has been considered as an
intermediate coupling theory, appeared to be valid only in the weak coupling
limit [45]. Special interest to the methods, giving reliable information on ex-
cited states, is triggered by the self-trapping phenomenon which occurs just
in the intermediate coupling regime. This phenomenon is a dramatic trans-
formation of QP properties when system parameters are slightly changed
[3, 7, 9, 80]. In the intermediate coupling regime “trapped” QP state with
strong lattice deformation around it and “free” state with weakly perturbed
lattice may hybridize and resonate because of close energies at some critical
value of electron-lattice interaction γc. It is clear that, to study self-trapping,
one has to apply a method giving reliable information on excited states in the
intermediate coupling regime.
2 Diagrammatic Monte Carlo and Stochastic
Optimization Methods
In this section we introduce definitions of exciton-polaron properties which
can be evaluated by DMC and SO methods. An idea of DMC approach for
numerically exact calculation of Green functions (GFs) in imaginary times
is presented in Sect. 2.1, and a short description of SO method, which is
capable of making unbiased analytic continuation from imaginary times to
real frequencies, is given in Sect. 2.2. Using combination of DMC and SO,
one can often circumvent difficulties of analytic and traditional numerical
methods. Therefore, a brief comparative analysis of advantages and drawbacks
of DMC-SO machinery is given in Sect. 2.3.
To obtain information on QPs it is necessary to calculate Matsubara GF
in imaginary time representation and make analytic continuation to the real
frequencies [60]. For the two-particle problem (1)-(4) the relevant quantity is
the two-particle GF [46, 47]
(τ) = 〈vac | ak+p′(τ)hk−p′(τ)h†k−pa
k+p | vac〉 . (10)
(Here hk−p(τ) = e
Ĥτhk−pe
−Ĥτ , τ > 0.) In the case of exciton-polaron, vac-
uum state | vac〉 is the state with filled valence and empty conduction bands.
Spectroscopic Properties of Polarons by Exact Monte Carlo 7
For the bipolaron problem it is a system without particles. In the simpler case
of a QP with two-level internal structure described by (4)-(6) the relevant
quantity is the one-particle matrix GF [52, 47]
Gk,ij(τ) = 〈vac | ai,k(τ)a†j,k | vac〉, i, j = 1, 2. (11)
For a structureless polaron the matrix (11) reduces to one-particle scalar GF
Gk(τ) = 〈vac | ak(τ)a†k | vac〉 . (12)
Information on the response to an external weak perturbation (e.g. optical
absorption) is contained in the current-current correlation function 〈Jβ(τ)Jδ〉
(β/δ are Cartesian indexes).
Lehmann spectral representation of Gk(τ) [60, 61] at zero temperature
Gk(τ) =
dω Lk(ω) e
−ωτ , (13)
with the Lehmann function (LF) Lk(ω) given in (7), reveals information on
the ground and excited states. Here {|ν〉} is a complete set of eigenstates of
Hamiltonian Ĥ in a sector of given momentum k: H |ν(k)〉 = Eν(k) |ν(k)〉.
The LF Lk(ω) has poles (sharp peaks) on the energies of stable (metastable)
states of particle. For example, if there is a stable state at energy E(k), the LF
reads Lk(ω) = Z
(k) δ(ω − E(k)) + . . ., and the state with the lowest energy
Eg.s.(k) in a sector of a given momentum k is highlighted by asymptotic
behavior of GF
Gk(τ ≫ max
ω−1q,κ
) → Z(k) exp[−Eg.s.(k)τ ] , (14)
where Z(k)-factor is the weight of the state. Analyzing the asymptotic behavior
of similar n-phonon GFs [45, 52]
Gk(n, τ ; q1, . . . ,qn) =
〈vac| bqn(τ) · · · bq1(τ) ap(τ)a
· · · b†qn |vac〉 , p = k−
j=1 qj .
one obtains detailed information about lowest state. For example, important
characteristics of the lowest state wave function
Ψg.s.(k) =
q1...qn
θi(k;q1, ...,qn)c
i,k−q1...−qnb
...b†qn | vac〉 (16)
are partial n-phonon contribution
Z(k)(n) ≡
q1...qn
| θi(k;q1, ...,qn) |2 (17)
which is normalized to unity
n=0 Z
(k)(n) ≡ 1, and the average number of
phonons
8 A. S. Mishchenko and N. Nagaosa
〈N〉 ≡ 〈Ψg.s.(k) |
b†qbq | Ψg.s.(k)〉 =
nZ(k)(n) (18)
in polaronic cloud. Another example is the wave function of relative electron-
hole motion of exciton in the lowest state in the sector of given momentum
Ψg.s.(k) =
ξk p(g.s.)a
| vac〉 . (19)
The amplitudes ξk p(g.s.) of this wave function can be obtained [46] from
asymptotic behavior of the following GF (10)
(τ → ∞) =| ξk p(g.s.) |2 e−Eg.s.(k)τ . (20)
Information on the excited states is obtained by the analytic continuation
of imaginary time GF to real frequencies which requires to solve the Fredholm
equation Gk(τ) = F̂ [Lk(ω)] (13)
Lk(ω) = F̂−1ω [Gk(τ)] . (21)
The equation (13) is a rather general relation between imaginary time GF/cor-
relator and spectral properties of the system. For example, the absorption
coefficient of light by excitons I(ω) is obtained as solution of the same equation
I(ω) = F̂−1ω
k=0(τ)
 . (22)
Besides, the real part of the optical conductivity σβδ(ω) is expressed [48] in
terms of current-current correlation function 〈Jβ(τ)Jδ〉 by relation
σβδ(ω) = πF̂−1ω [〈Jβ(τ)Jδ〉] /ω . (23)
2.1 Diagrammatic Monte Carlo Method
DMC Method is an algorithm which calculates GF (10)-(12) without any
systematic errors. This algorithm is described below for the simplest case of
structureless polaron [45], and generalizations to more complex cases can be
found in consequent references4. DMC is based on the Feynman expansion of
the Matsubara GF in imaginary time in the interaction representation
4 Generalization of described below technique to the case of exciton (1-2) is given
in [46] and its modification for pseudo-Jahn-Teller polaron (4-6) is developed in
[52, 47]. Method for evaluation of current-current correlation function can be
found in [48] and a case of a polaron interacting with two kinds of bosonic fields
is considered in [49].
Spectroscopic Properties of Polarons by Exact Monte Carlo 9
Gk(τ) =
∣∣∣∣Tτ
ak(τ)a
(0) exp
Ĥint(τ
′)dτ ′
}]∣∣∣∣ vac
; τ > 0 .
Here Tτ is the imaginary time ordering operator, |vac〉 is a vacuum state with-
out particle and phonons, Ĥint is the interaction Hamiltonian in (9). Symbol of
exponent denotes Taylor expansion which results in multiple integration over
internal variables {τ ′1, τ ′2, . . .}. Operators are in the interaction representation
Â(τ) = exp[τ(Ĥpar + Ĥph)]Â exp[−τ(Ĥpar + Ĥph)]. Index “con” means that
expansion contains only connected terms where no one integral over internal
time variables {τ ′1, τ ′2, . . .} can be factorized.
Vick theorem expresses matrix element of time-ordered operators as a
sum of terms, each is a factor of matrix elements of pairs of operators, and
expansion (24) becomes an infinite series of integrals with an ever increasing
number of integration variables
Gk(τ) =
m=0,2,4...
dx′1 · · · dx′m D(ξm)m (τ ; {x′1, . . . , x′m}) . (25)
Here index ξm stands for different Feynman diagrams (FDs) of the same
order m. Term with m = 0 is the GF of the noninteracting QP G
Function D(ξm)m (τ ; {x′1, . . . , x′m}) of any order m can be expressed as a fac-
tor of GFs of noninteracting quasiparticle, GFs of phonons, and interaction
vortexes V (k,q). For the simplest case of Hamiltonian system expressions
for GFs of QP G
(τ2 − τ1) = exp [−ǫ(k)(τ2 − τ1)] (τ2 > τ1) and phonons
q (τ2 − τ1) = exp [−ωq(τ2 − τ1)] (τ2 > τ1) are well known.
An important feature of the DMC method, which is distinct from the
row of other exact numerical approaches, is the explicit possibility to include
renormalized GFs into exact expansion without any change of the algorithm.
For example, if a damping of QP, caused by some interactions not included
in the Hamiltonian, is known, i.e. retarded self-energy of QP Σret(k, ω) is
available, renormalized GF
(τ) =
dωe−ωτ
ImΣret(k, ω)
[ω − ǫ(k)−ReΣret(k, ω)]2 + [ImΣret(k, ω]2
can be introduced instead of bare GF G
(τ). Explicit rules for evaluation of
D(ξm)m do not depend on the order and topology of FD. GFs of noninteracting
(τ2−τ1) (or G̃(0)k (τ2−τ1)) with corresponding times and momenta are
ascribed to horizontal lines and noninteracting GFs of phonon D
q (τ2 − τ1)
(multiplied by the factor of corresponding vortexes V (k′,q)V ∗(k′′,q)) are
attributed to phonon propagator arch (see Fig. 1a). Then, D(ξm)m is the factor
of all GSs. For example, expression for the weight of the second order term
(Fig. 1b) is the following
10 A. S. Mishchenko and N. Nagaosa
D2(τ ; {τ ′2, τ ′1,q}) =
|V (k,q)|2D(0)q (τ ′2 − τ ′1)G
(τ ′1)G
k−q(τ
2 − τ ′1)G
(τ − τ ′2) . (27)
τ’2τ’1
k k-q k
τ0 τ’2τ’4τ’1 τ’3
k k-q-q’ k-q k
Fig. 1. (a) Typical FD contributing into expansion (25). (b) FD of the second
order and (c) forth order.
The DMC process is a numerical procedure which, basing on the Metropo-
lis principle [81, 82], samples different FDs in the parameter space (τ,m, ξm, {x′m})
and collects statistics of external variable τ in a way that the result of this
statistics converges to exact GF Gk(τ). Although sampling of the internal
parameters of one term in (25) and switch between different orders is per-
formed within the the framework of one and the same numerical process,
it is instructive to start with the procedure of evaluation of a specific term
D(ξm)m (τ ; {x′1, . . . , x′m}).
Starting from a set {τ ; {x′1, . . . , x′m}}, an update x
(old)
l → x
(new)
l of an
arbitrary chosen parameter is suggested. This update is accepted or rejected
according to Metropolis principle. After many steps, altering all variables,
statistics of external variable converges to exact dependence of the term on
τ . Suggestion for new value of parameter x
(new)
l = Ŝ
−1(R) is generated by
random number R ∈ [0, 1] with a normalized distribution function W (xl)
in a range x
(min)
l < xl < x
(max)
l . There are only two restrictions for this
otherwise arbitrary function. First, new parameters x
(new)
l must not violate
FD topology, i.e., for example, internal time τ ′1 in Fig. 1c must be in the range
[x(min) = 0, x(max) = τ ′3]. Second, the distribution must be nonzero for the
whole, allowed by FD topology, domain. This ergodicity property is crucial
since it is necessary to sample the whole domain for convergence to exact
answer. At each step, update x
(old)
l → x
(new)
l is accepted with probability
Pacc = M (if M < 1) and always otherwise. The ratio M has the following
D(ξm)m (τ ; {x′1, . . . , x
(new)
l , . . . , x
m})/W (x
(new)
D(ξm)m (τ ; {x′1, . . . , x
(old)
l , . . . , x
m})/W (x
(old)
. (28)
For uniform distribution W = const =
(max)
l − x
(max)
, the probability
of any combination of parameters is proportional to the weight function D.
Spectroscopic Properties of Polarons by Exact Monte Carlo 11
However, for better convergence the distributionW (xnewl ) must be as close as
possible to the actual distribution given by function D(ξm)m ({. . . , x(new)l , . . . , }).
For sampling over FDs of all orders and topologies it is enough to introduce
two complimentary updates. Update A transforms FD D(ξm)m (τ ; {x′1, . . . , x′m})
into higher order FD D(ξm+2)m+2 (τ ; {x′1, . . . , x′m; q′, τ ′3, τ ′4}) with extra phonon
arch, connecting some time points τ ′3 and τ
4 by phonon propagator with mo-
mentum q′ (Fig. 1c). Note that the ratio of weights D(ξm+2)m+2 /D
m is not
dimensionless. Dimensionless Metropolis ratio
D(ξm+2)m+2 (τ ; {x′1, . . . , x′m; q′, τ ′, τ ′′})
D(ξm)m (τ ; {x′1, . . . , x′m})W (q′, τ ′, τ ′′)
. (29)
contains normalized probability function W (q′, τ ′, τ ′′), which is used for gen-
erating of new parameters5. Complementary update B, removing the phonon
propagator, uses ratio M−1 [45].
Note that all updates are local, i.e. do not depend on the structure of the
whole FD. Neither rules nor CPU time, needed for update, depends on the
FD order. DMC method does not imply any explicit truncation of FDs order
due to finite size of computer memory. Ever for strong coupling, where typical
number of phonon propagators Nph, contributing to result, is large, influence
of finite size of memory is not essential. Really, according to Central Limit
Theorem, number of phonon propagators obeys Gauss distribution centered
at N̄ph with half width of the order of
N̄ph [83]. Hence, if a memory for at
least 2N̄ph propagators is reserved, diagram order hardly surpasses this limit.
2.2 Stochastic Optimization Method
The problem of inverting of integral equation (13) is an ill posed problem.
Due to incomplete noisy information about GF Gk(τ), which is known with
statistic errors on a finite number of imaginary times in a finite range [0, τmax],
there is infinite number of approximate solutions which reproduce GF within
some range of deviations and the problem is to chose “the best one”. Another
problem, which is a stumbling block for decades, is the saw tooth noise insta-
bility. It occurs when solution is obtained by a naive method, e.g. by using
least-squares approach for minimizing deviation measure
D[L̃k(ω)] =
∫ τmax
∣∣∣Gk(τ) − G̃k(τ)
∣∣∣G−1k (τ)dτ . (30)
Here G̃k(τ) is obtained from approximate LF L̃k(ω) by applying of integral
operator G̃k(τ) = F
L̃k(ω)
in (13). Saw tooth instability corrupts LF in the
ranges where actual LF is smooth. Fast fluctuations of the solution L̃k(ω) often
5 The factor pA/pB depends on the probability to address add/remove processes.
12 A. S. Mishchenko and N. Nagaosa
have much larger amplitude than the value of actual LF Lk(ω). Standard tools
for saw tooth noise suppression are based on the early 60-es idea of Fillips-
Tikhonov regularization method [84, 85, 86, 87]. A nonlinear functional, which
suppresses large derivatives of approximate solution L̃k(ω), is added to the
linear deviation measure (30). Most popular variant of regularization methods
is the Maximal Entropy Method [61].
However, typical LF of a QP in a boson field consists of δ-functional peaks
and smooth incoherent continuum with a sharp border [45, 54]. Hence, sup-
pression of high derivatives, as a general strategy of the regularization method,
fails. Moreover, any specific implementation of the regularization method uses
predefined mesh in the ω space, which could be absolutely unacceptable for
the case of sharp peaks. If the actual location of a sharp peak is between
predefined discrete points, the rest of spectral density can be distorted be-
yond recognition. Finally, regularization Maximal Entropy approach requires
assumption of Gauss distribution of statistic errors in Gk(τ), which might be
invalid in some cases [61].
Recently, a Stochastic Optimization (SO) method, which circumvents
abovementioned difficulties, was developed [45]. The idea of the SO method
is to generate a large enough number M of statistically independent nonreg-
ularized solutions {L̃(s)
(ω)}, s = 1, ...,M , which deviation measures D(s) are
smaller than some upper limit Du, depending of the statistic noise of the GF
Gk(τ). Then, using linearity of expressions (13), (30), the final solution is
found as the average of particular solutions {L̃(s)
Lk(ω) = M
(ω) . (31)
Particular solution L̃
(ω) is parameterized in terms of sum
(ω) =
χ{Pt}(ω) (32)
of rectangles {Pt} = {ht, wt, ct} with height ht > 0, width wt > 0, and center
ct. Configuration
C = {{Pt}, t = 1, ...,K} , (33)
which satisfies normalization condition
t=1 htwt = 1, defines function
G̃k(τ). The procedure of generating particular solution starts from stochastic
choice of initial configuration Cinits . Then, deviation measure is optimized by
a randomly chosen consequence of updates until deviation is less than Du. In
addition to updates, which do not change number of terms in the sum (32),
there are updates which increase or decrease number K. Hence, since the
number of elements K is not fixed, any spectral function can be reproduced
with desired accuracy.
Spectroscopic Properties of Polarons by Exact Monte Carlo 13
Although each particular solution L̃
(ω) suffers from saw tooth noise at
the area of smooth LF, statistical independence of each solution leads to self
averaging of this noise in the sum (32). Note that suppression of noise happens
without suppression of high derivatives and, hence, sharp peaks and edges
are not smeared out in contrast to regularization approaches. Therefore, saw
tooth noise instability is defeated without corruption of sharp peaks and edges.
Moreover, continuous parameterization (32) does not need predefined mesh in
ω-space. Besides, since the Hilbert space of solution is sampled directly, any
assumption about distribution of statistical errors is not necessary.
SO method was successfully applied to restore LF of Fröhlich polaron
[45], Rashba-Pekar exciton-polaron [54], hole-polaron in t-J-model [53, 49],
and many-particle spin system [88]. Calculation of the optical conductivity of
polaron by SO method can be found in [48]. SO method appeared to be helpful
in cases when GF’s asymptotic limit, giving information about ground state,
can not be reached. For example, sign fluctuations of the terms in expansion
(25) for a hole in the t-J-model lead to poor statistics at large times [53],
though, SO method is capable of recovering energy and Z-factor even from
GF known only at small imaginary times [53].
2.3 Advantages and Drawbacks of DMC-SO Machinery
Among numerical methods, capable of obtaining quantitative results in the
problem of exciton (1) and (2), one can list time-dependent density func-
tional theory [89], Hanke-Sham technique of correcting particle-hole excita-
tion energy [90, 91], and approaches directly solving Bethe-Salpeter equation
[92, 93, 94]. The latter ones provide rather accurate information on the two-
particle GF. However, usage of finite mesh in direct/reciprocal space, which
is avoided in DMC method, leads to its’ failure in Wannier regime [93].
In contrast to DMC method, none of the traditional numeric methods
can give reliable results for measurable properties of excited states of polaron
at arbitrary range of electron-phonon interaction for the macroscopic system
in the thermodynamic limit. Exact diagonalization method [95, 96, 97, 98]
can study excited states though only on rather small finite size systems and
results of this method are not even justified in the variational sense in the
thermodynamic limit [99]. There is a batch of rather effective variational “ex-
act translation” methods [99, 100, 101, 102, 103] where basis is chosen in
the momentum space and, hence, the variational principle is applied in the
thermodynamic limit. Although these methods can reveal few discrete excited
states, its fail for long-range interaction and for dispersive, especially acoustic
phonons due to catastrophic growth of variational basis. A non perturbative
theory, which is able to give information about spectral properties in the ther-
modynamic limit at least for one electron, is Dynamical Mean Field Theory
[104, 105, 106, 107]. However it gives an exact solution only in the case of
infinite dimension which does not correspond to a realistic system and can be
considered only as a guide for extrapolation to finite dimensions [108].
14 A. S. Mishchenko and N. Nagaosa
Recently developed cluster perturbation theory, where exact diagonaliza-
tion of a cluster is further improved by taking into account inter-cluster inter-
action [109, 110, 111, 112, 113], is applicable for study of the excited states,
but limited to one-dimensional lattices or two-dimensional systems with short-
range interaction. Traditional density-matrix renormalization group method
[114, 115, 116, 117, 118] is very effective though mostly limited to one-
dimensional systems and ladders. Finally, recently developed path integral
quantum Monte Carlo algorithm [119, 120, 121, 122] is valid for any dimen-
sion and properly takes into account quasi long-range interactions [123]. Path
integral method is capable of obtaining the density of states [119, 120] and
isotope exponents [121, 124]. However calculations of measurable characteris-
tics of excited states, such as ARPES or optical conductivity, by this method
were never reported.
In conclusion, none of methods, except DMC-SO combination, can obtain
at the moment approximation-free results for measurable physical quantities
for a few QPs interacting with a macroscopic bosonic bath in the thermody-
namic limit. Indeed, there are limitations of the DMC and SO methods. DMC
method does not work in many-fermion systems due to sign problem and SO
method fails at high temperatures, comparable to energies of dominant spec-
tral peaks, because even very small statistical noise of GFs turns Fredholm
equation (13) into essentially “ill defined” problem [84].
3 Spectral Properties of the Fröhlich Polaron
Before development of DMC-SO methods, the information on the excited
states of polaron models, especially the Fröhlich one, was very limited. Knowl-
edge of LF was based on results of infinite-dimensions approximation [125],
exact diagonalization [126, 96, 97, 97], or strong coupling expansion [127]. No
one of the above techniques was capable of obtaining the LF of polaron with-
out approximations, especially for long-range interaction where difficulties of
traditional numerical methods dramatically increase. In a similar way, optical
conductivity (OC) of Fröhlich model was known only in strong coupling ex-
pansion approximation [128], within the framework of the perturbation theory
[129], or was based on the variational Feynman path integral technique [75].
In this sect. we consider exact DMC-SO results on LF [45] and OC [48, 57] of
Fröhlich polaron model.
3.1 Lehmann Function of the Fröhlich Polaron
The perturbation theory expression for the high-energy part (ω > 0) of the
LF for arbitrary interaction potential V (| q |) reads [45] (frequency of the
optical phonon ωph is set to unity)
Lk=0(ω > 0) =
ω − 1
| V (
2(ω − 1)) |2 θ(ω − 1) . (34)
Spectroscopic Properties of Polarons by Exact Monte Carlo 15
Low-energy part of the LF for the short-range interaction V (| q |) =
0 2 4 6
0.000
0.002
0.004
0.006
0 2 4 6
2 4 6
2 4 6
2 4 6
1.0 1.2
L (a)
Fig. 2. Comparison of the numerical results (solid lines) and the perturbation
theory (dashed lines) for the LFs of the Fröhlich model with α = 0.05 (a) and the
short-range interaction model with α = 0.05 and κ = 1 (b). LFs of Fröhlich polaron
for α = 0.5 (c), α = 1 (d) and α = 2 (e). Energy is measured from that of the
ground state of the polaron. The initial fragment of the LF for α = 1 is shown in
the inset (f).
(q2 + κ2)−1/2 , reducing to the Fröhlich one when κ→ 0, is
Lk=0(ω < 0) =
ω + α
. (35)
Comparison of low-energy parts of the LF of the Fröhlich model, obtained
by DMC-SO and taken from (35), shows perfect agreement for α = 0.05: the
accuracy for the polaron energy and Z-factor is about 10−4. On the other hand,
high-energy part of numeric result (Fig. 2) significantly deviates from that of
the analytic expression (35). This is not surprising since for Fröhlich polaron
the perturbation theory expression is diverging as ω → ωph and, therefore
the perturbation theory breaks down. When perturbation theory is obviously
valid, e.g. for the case of finite κ = 1, there is a perfect agreement between
analytic expression and DMC-SO results (Fig. 2b). Note that the high-energy
part of Lk=0(ω) is successfully restored by SO method despite the fact that
the total weight of the feature for α = 0.05 is less than 10−2.
The main deviation of the actual LF from the perturbation theory result
is the extra broad peak in the actual LF at ω ∼ 3.5. To study this feature
Lk=0(ω) was calculated for α = 0.5, α = 1, and α = 2 (Fig. 2c-e). The peak
16 A. S. Mishchenko and N. Nagaosa
0 4 8 12
0 4 8 12
0 4 8 12
=4 =6
Fig. 3. Evolution of spectral density with α in the cross-over region from interme-
diate to strong couplings. The polaron ground state peak is shown only for α = 8.
Note that the spectral analysis still resolves it, despite its very small weight < 10−3.
is seen for higher values of the interaction constant and its weight grows with
α. Near the threshold, ω = 1, LF demonstrates the square-root dependence
ω − 1 (Fig. 2f).
To trace the evolution of the peak at higher values of α the LF was cal-
culated [45] for α = 4, α = 6, and α = 8 (Fig. 3). At α = 4 the peak at
ω ∼ 4 already dominates. Moreover, a distinct high-energy shoulder appears
at α = 4, which transforms into a broad peak at ω ∼ 8.5 in the LF for α = 6.
The LF for α = 8 demonstrates further redistribution of the spectral weight
between different maxima without significant shift of the peak positions.
3.2 Optical Conductivity of the Fröhlich Polaron: Validity of the
Franck-Condon Principle in the Optical Spectroscopy
The FC principle [130, 131] and its validity have been widely discussed in stud-
ies of optical transitions in atoms, molecules [132, 133], and solids [134, 9].
Generally, the FC principle means that if only one of two coupled subsys-
tems, e.g. an electronic subsystem, is affected by an external perturbation,
the second subsystem, e.g., the lattice, is not fast enough to follow the re-
construction of the electronic configuration. It is clear that the justification
for the FC principle is the short characteristic time of the measurement pro-
cess τmp ≪ τic, where τmp is related to the energy gap between the initial
and final states, ∆E, through the uncertainty principle: τmp ≃ h̄/(∆E) and
τic is the time necessary to adjust the lattice when the electronic component
is perturbed. Then, the spectroscopic response considerably depends on the
value of the ratio τmp/τic For example, in mixed valence systems, where the
ionic valence fluctuates between configurations f5 and f6 with characteristic
time τic ≈ 10−13s, spectra of fast and slow experiments are dramatically dif-
ferent [135, 136]. Photoemission experiments with short characteristic times
τmp ≈ 10−16s (FC regime), reveal two lines, corresponding to f5 and f6
Spectroscopic Properties of Polarons by Exact Monte Carlo 17
states. On the other hand slow Mössbauer isomer shift measurements with
τmp ≈ 10−9s show a single broad peak with mean frequency lying between
signals from pure f5 and f6 shells. Finally, according to paradigm of mea-
surement process time, magnetic neutron scattering with τmp ≈ τic revealed
both coherent lines with all subsystems dynamically adjusted and broad inco-
herent remnants of strongly damped excitation of f5 and f6 shells [137, 138].
Actually, the meaning of the times τic and τmp varies with the system and
with the measurement process.
To study the interplay between measurement process time τmp and ad-
justment time τic, the OC of the Fröhlich polaron was studied in [57] from
the weak to the strong coupling regime by three methods. DMC method gives
numerically exact answer which is compared with memory function formalism
(MFF), which is able to take dynamical lattice relaxation into account, and
strong coupling expansion (SCE) which assumes FC approach. It was found
that near critical coupling αc ≈ 8.5 a dramatic change of the OC spectrum
occurs: dominating peak of OC splits into two satellites. In this critical regime
the upper (lower) one quickly decreases (increases) it’s spectral weight as the
value of coupling constant increases. Besides, while OC follows prediction of
MFF at α < αc, its dependence switches to that predicted by SCE for larger
couplings. It was concluded that, for the OC measurement of polaron, the
adjustment time τic ≈ h̄/D is set by typical nonadiabatic energy D. Nona-
diabaticity destroys FC classification at α < αc while FC principle rapidly
regains its validity at large couplings due to fast growth of energy separation
between initial and final states of optical transitions.
Comparison of exact DMC-SO data for OC with existing results of ap-
proximate methods showed [48] that the Feynman path integral technique
[75] of Devreese, De Sitter, and Goovaerts, where OC is calculated starting
from the Feynman variational model [139], is the only successfully describing
evolution of the energy of the main peak in OC with coupling constant α (see
[58]). However, starting from the intermediate coupling regime this approach
fails to reproduce the peak width. Subsequently, the path integral approach
was rewritten in terms of MFF [140]. Then, in [57] the extended MFF for-
malism, which introduces dissipation processes fixed by exact sum rules, was
developed [141].
As shown in Fig. 4a, in the weak coupling regime, the MFF, with or with-
out dissipation, is in very good agreement with DMC data, showing significant
improvement with respect to weak coupling perturbation approach [129] which
provides a good description of OC spectra only for very small values of α. For
1 ≤ α ≤ 8, where standard MFF fails to reproduce peak width (Fig. 4b-d)
and even the peak position (Fig. 4c), the damping, introduced to extended
MFF scheme, becomes crucial. Results of extended MFF are accurate for the
peak energy and quite satisfactory for the peak width (Fig. 4b-e). Note that
the broadening of the peak in DMC data is not a consequence of poor quality
of analytic continuation procedure since DMC-SO methods is capable of re-
vealing such fine features as 2- and 3-phonon thresholds of emission (Fig. 4b).
18 A. S. Mishchenko and N. Nagaosa
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8 10 12
0 5 10 15
0 5 10 15
(a) (b)
=5.25
Fig. 4. Comparison of the optical conductivity calculated by DMC method (circles),
extended MFF (solid line), and DSG [75, 140] (dashed line) for different values of
α. The slanted arrows indicate 2- and 3-phonon thresholds of absorption.
0 5 10 15 20
0 10 20 30
0 10 20 30 40
0 5 10 15
0 5 10 15
(a) (b)
Fig. 5. (a)-(c) Comparison of the optical conductivity calculated within the DMC
method (circles), the extended MFF (solid line), and SCE (dotted line) for different
values of α. (d) The energy of lower- and higher-frequency features (circles and tri-
angles, respectively) compared with the FC transition energy with the SCE (dashed
line) and with the energy of the peak obtained from the extended MFF (solid line).
In the inset, the weights of FC and adiabatically connected transitions are shown as
a function of α (for η = 1.3.)
However, a dramatic change of OC occurs around critical coupling strength
αc ≈ 8.5. The dominating peak of OC splits into two ones, the energy of lover
one corresponding to the predictions of SCR expansion and that of upper
one obeying extended MFF value (Fig. 5a). The shoulder, corresponding to
dynamical extended MFF contribution, rapidly decreases it’s intensity with
increase of α and at large α (Fig. 5b-c) the OC is in good agreement with
strong coupling expansion, assuming FC scheme. Finally, comparing energies
of the peaks, obtained by DMC, extended MFF and FC strong coupling ex-
pansion (Fig. 5d), we conclude that at critical coupling αc ≈ 8.5 the spectral
properties rapidly switch from dynamic, when lattice relaxes at transition, to
FC regime, where nuclei are frozen in initial configuration.
In order get an idea of the FC breakdown authors of [57] consider the fol-
lowing arguments. The approximate adiabatic states are not exact eigenstates
of the system. These states are mixed by nondiagonal matrix elements of the
Spectroscopic Properties of Polarons by Exact Monte Carlo 19
nonadiabatic operator D and exact eigenstates are linear combinations of the
adiabatic wavefunctions. Being interested in the properties of transition from
ground (g) to an excited (ex) state, whose energy correspond to that of the OC
peak, it is necessary to consider mixing of only these states and express exact
wavefunctions as a linear combinations [142, 143] of ground and excited adi-
abatic states. The coefficients of superposition are determined from standard
techniques [142, 143] where nondiagonal matrix elements of the nonadiabatic
operator [142] are expressed in terms of matrix elements of the kinetic energy
operator M , the gap between excited and ground state ∆E = Eex − Eg and
the number nβ of phonons in adiabatic state:
D± =M(∆E)−1
nβ + 1/2± 1/2 +M2(∆E)−2. (36)
The extent to which lattice can follow transition between electronic states,
depends on the degree of mixing between initial and final exact eigenstates
through the nonadiabatic interaction. If initial and ground states are strongly
mixed, the adiabatic classification has no sense and, therefore, the FC pro-
cesses have no place and lattice is adjusted to the change of electronic states
during the transition. In the opposite limit adiabatic approximation is valid
and FC processes dominate. The estimation for the weight of FC component
IFC [57] is equal to unity in the case of zero mixing and zero in the case
of maximal mixing. The weight of adiabatically connected (AC) transition
IAC = 1− IFC is defined accordingly. Non-diagonal matrix element M is pro-
portional to the root square of α with a coefficient η of the order of unity.
In the strong coupling regime, assuming that ∆E ≈ γα2 (γ ≈ 0.1 from MC
data), and nβ ≈ ∆E (nβ ≫ 1), one gets
IFC =
1 + 4(τmp/τic)
, (37)
where τmp = 1/∆E and τic = 1/D. For η of the order of unity one obtains
qualitative description of a rather fast transition from AC- to FC-dominated
transition, when IFC and IAC exchange half of their weights in the range
of α from 7 to 9. The physical reason for such quick change is the faster
growth of energy separation ∆E ∼ α2 compared to that of the matrix ele-
ment M ∼ α1/2. Finally, for large couplings, initial and final states become
adiabatically disconnected. The rapid AC-FC switch has nothing to do with
the self-trapping phenomenon where crossing and hybridization of the ground
and an excited states occurs. This phenomenon is a property of transition
between different states and related to the choice whether lattice can or can
not follow adiabatically the change of electronic state at the transition.
4 Self-Trapping
In this section we consider the self-trapping (ST) phenomenon which, due to
essential importance of many-particle interaction of QP with bosonic bath of
20 A. S. Mishchenko and N. Nagaosa
macroscopic system, was never addressed by exact method before. We start
with a basic definition of the ST phenomenon and introduce the adopted
criterion for it’s existence. Then, generic features of ST are demonstrated
on a simple model of Rashba-Pekar exciton-polaron in Sect. 4.1. It is shown
in Sect. 4.2 that the criterion is not a dogma since even in one dimensional
system, where ST is forbidden by criterion of existence, one can observe all
main features of ST due to peculiar nature of electronic states.
In general terms [7, 80], ST is a dramatic transformation of a QP properties
when system parameters are slightly changed. The physical reason of ST is a
quantum resonance, which happens at some critical interaction constant γc,
between “trapped” (T) state of QP with strong lattice deformation around
it and “free” (F) state. Naturally, ST transition is not abrupt because of
nonadiabatic interaction between T and F states and all properties of the QP
are analytic in γ [144]. At small γ < γc, ground state is an F state which is
weakly coupled to phonons while excited states are T states and have a large
lattice deformation. At critical couplings γ ≈ γc a crossover and hybridization
of these states occurs. Then, for γ > γc the roles of the states exchange. The
lowest state is a T state while the upper one is an F state.
First, and up to now the only quantitative criterion for ST existence was
given in terms of the ground state properties in the adiabatic approximation.
This criterion considers stability of the delocalized state in undistorted lattice
∆ = 0 with respect to the energy gain due to lattice distortion ∆′ 6= 0. ST
phenomenon occurs when completely delocalized state with∆ = 0 is separated
from distorted state with ∆′ 6= 0 by a barrier of adiabatic potential. One of
these states is stable while another one is meta-stable. The criterion of barrier
existence is defined in terms of the stability index
s = d− 2(1 + l) , (38)
where d is the system dimensionality. Index l determines the range of the force
limq→0 ψ(q) ∼ q−l, where ψ(R) is the kernel of interaction U(Rn) = ψ(Rn −
Rn′)ν(Rn′) connecting potential U(Rn) with generalized lattice distortion
ν(Rn′) [7]. The barrier exists for s > 0 and does not exist for s < 0. The
discontinuous change of the polaron state, i.e. ST, occurs in the former case
while does not happen in the latter case. When s = 0, this scaling argument
alone can not conclude the presence or absence of the ST and more detailed
discussion for each model is needed.
4.1 Typical Example of the Self-Trapping: Rasba-Pekar
Exciton-Polaron
Classical example of a system with ST phenomenon is the three dimensional
continuous Rasba-Pekar exciton-polaron in the approximation of intraband
scattering, i.e. when polar electron-phonon interaction (EPI) with dispersion-
less optical phonons ωph = 1 does not change the wave function of internal
Spectroscopic Properties of Polarons by Exact Monte Carlo 21
0 5 10 15 20
0 5 10 15 20
0 5 10 15 20
0 10 20 30 40
Fig. 6. The ground-state energy (a), effective mass (b), and average number of
phonons as function of coupling constant (c). Partial weights of n-phonon states (d)
in the polaron ground state (k = 0) at γ = 18 (circles), γ = 18.35 (squares), and
γ = 19 (diamonds). Dotted line in panel (a) is the result of strong coupling limit
and dashed line is the result of perturbation theory.
electron-hole motion. System is defined as a structureless QP with dispersion
ǫ(k) = k2/2 and short range coupling to phonons [54, 7]. General criterion of
the existence of ST is satisfied for three dimensional system with short range
interaction [54, 7, 50] and, thus, one expects to observe typical features of the
phenomenon.
It is shown [54] that in the vicinity of the critical coupling γc ≈ 18 the
average number of phonons 〈N〉 in (18) and effective mass m∗ quickly in-
crease in the ground state by several orders of magnitude (Fig. 6b-c). Besides,
a quantum resonance between polaronic phonon clouds of F and T state is
demonstrated. Distribution of partial n-phonon contributions Z(k=0)(n) in
(17) has one maximum at n = 0 in the weak coupling regime, which cor-
responds to weak deformation, and one maximum at n ≫ 1 in the strong
coupling regime, which is the consequence of a strong lattice distortion. How-
ever, due to F-T resonance there are two distinct peaks at n = 0 and n ≫ 1
for γ ≈ γc (Fig. 6d).
Near the critical coupling γc the LF of polaron has several stable states
(Fig. 7 a-b) below the threshold of incoherent continuum Egs+ωph. Any state
above the threshold is unstable because emission of a phonon with transition
to the ground state at k = 0 with energy Egs is allowed. On the other hand,
decay is forbidden by conservation laws for states below the threshold. De-
pendence of the energies of ground and excited resonances on the interaction
constant resembles a picture of crossing of several states interacting with each
other (Fig. 7c).
According to the general picture of the ST phenomenon, lowest F state in
the weak coupling regime at k = 0 has small effective mass m∗ ≈ m of the
order of the bare QP mass m. To the contrary, the effective mass of excited
state m∗ ≫ m is large. Hence, below the critical coupling the energy of the
F state, which is lowest at k = 0, has to reach a flat band of T state at
some momentum. Then, F and T state have to hybridize and exchange in
22 A. S. Mishchenko and N. Nagaosa
0 1 2 3
17 18 19
0 2 4
Fig. 7. LF L(k=0)(ω) at critical coupling γ = γc (a) and for γ > γc (b). Energy is
counted from the polaron ground state. (c) Dependence of energy of ground state
(squares) and stable excited states (circles, diamonds, and triangles) on the coupling
constant. Dashed line is the threshold of the incoherent continuum. Dependence
of energy (d) and average number of phonons (e) on the wave vector at γ < γc
(circles and rectangles). Dashed line is the effective mass approximation E(k) =
Egs + k
2/2m∗ for parameters Egs = −3.7946 and m∗ = 2.258, obtained by DMC
estimators for given value of γ. Dotted line is a parabolic dispersion law which is
fitted to last 4 points of energy dispersion curve with parameters E1 = −3.5273 and
m∗1 = 195. Empty square is the energy of first excited stable state at zero momentum
obtained by SO method.
energy. DMC data visualize this picture (Fig. 7 d-e). After F state crosses the
flat band of excited T state, the average number of phonons increases and
dispersion becomes flat.
It is natural to assume that above the critical coupling the situation is
opposite: ground state is the T state with large effective mass while excited
F state has small, nearly bare, effective mass. Indeed, this assumption was
confirmed in the framework of another model which is considered in Sect. 6.1.
Moreover, it was shown that in the strong coupling regime excited resonance
inherits not only bare effective mass around k = 0 but the whole dispersion
law of the bare QP [49].
4.2 Degeneracy Driven Self-Trapping
According to the criterion (38), ST phenomenon in one-dimensional sys-
tem does not occur. Although this statement is probably valid for the
case of single band in relevant energy range, it is not the case for the
generic multi-band cases. This fact has been unnoticed for many years,
Spectroscopic Properties of Polarons by Exact Monte Carlo 23
which prevented the proper explanation of puzzling physics of quasi-one-
dimensional compound Anthracene-PMDA, although it’s optical properties
[65, 145, 146, 147, 66, 148] directly suggested resonance of T and F states.
The reason is that in Anthracene-PMDA, in contrast to conditions at which
criterion (38) is obtained, there are two, nearly degenerate exciton bands.
Then, one can consider quasi-degenerate self-trapping mechanism when ST
phenomenon is driven by nondiagonal interaction of phonons with quaside-
generate exciton levels [52]. Such mechanism was already suggested for expla-
nation of properties of mixed valence systems [143] though it’s relevance was
never proved by an exact approach.
0.0 0.5 1.0 1.5
0.0 0.5 1.0 1.5
0 10 20
Fig. 8. Dependence of energy (a) and average number of phonons (b) on the non-
diagonal coupling constant λ12 at λ11 = 0 and λ22 = 0.25. Phonon distributions in
polaron cloud below ST point at λ12 = 1.0125 (c), at ST point at λ12 = 1.0435 (d),
and above ST coupling at λ12 = 1.0625 (e).
The minimal model to demonstrate the mechanism of quasi-degenerate
self-trapping involves one optical phonon branch with frequency ωph = 0.1
and two exciton branches with energies ǫ1,2(q) = ∆1,2 + 2[1 − cos(q)], where
∆1 = 0 and ∆2 = 1. Presence of short range diagonal γ22 and nondiagonal
γ12 interactions (with corresponding dimensionless constants λ22 = γ
22/(2ω)
and λ12 = γ
12/(2ω)) leads to classical self-trapping behavior even in one-
dimensional system [52] (see Fig. 8).
5 Exciton
Despite numerous efforts over the years, there has been no rigorous tech-
nique to solve for exciton properties even for the simplest model (1)-(2) which
treats electron-electron interactions as a static renormalized Coulomb poten-
tial with averaged dynamical screening. The only solvable cases are the Frenkel
small-radius limit [67] and the Wannier large-radius limit [68] which describe
molecular crystals and wide gap insulators with large dielectric constant, re-
spectively. Meanwhile, even the accurate data for the limits of validity of the
24 A. S. Mishchenko and N. Nagaosa
Wannier and Frenkel approximations have not been available. As discussed in
Sects. 1.2 and 2.3, semianalytic approaches has little to add to problem when
quantitative results are needed whereas traditional numerical methods fail to
reproduce them even in the Wannier regime. To the contrary, DMC results
do not contain any approximation.
0.0 0.5 1.0 1.5 2.0
Bandwidth
0.0 20.0 40.0 60.0 80.0
Bandwidth
0 5 10 15 20 25
Coordinate sphere
−0.05
0 200 400 600 800 1000
Electron−hole distance
0 2 4 6 8 10
Coordinate sphere
0 1 2 3 4
Coordinate sphere
Fig. 9. Panel (a): dependence of the exciton binding energy on the bandwidth
Ec = Ev for conduction and valence bands. The dashed line corresponds to the
Wannier model. The solid line is the cubic spline, the derivatives at the right and left
ends being fixed by the Wannier limit and perturbation theory, respectively. Inset
in panel (a): the initial part of the plot. Panel (b): the wave function of internal
motion in real space for the optically forbidden monopolar exciton. Panels (c)-(e):
the wave function of internal motion in real space: (c) Wannier [Ec = Ev = 60]; (d)
intermediate [Ec = Ev = 10]; (e) near-Frenkel [Ec = Ev = 0.4] regimes. The solid
line in the panel (c) is the Wannier model result while solid lines in other panels are
to guide the eyes only.
To study conditions of validity of limiting regimes by DMC method,
electron-hole spectrum of three dimensional system was chosen in the form
of symmetric valence and conduction bands with width Ec and direct gap Eg
Spectroscopic Properties of Polarons by Exact Monte Carlo 25
at zero momentum [46]. For large ratio W = Ec/Eg, when W > 30, exci-
ton binding energy is in good agreement with Wannier approximation results
(Fig. 9a) and probability density of relative electron-hole motion corresponds
(Fig. 9c) to hydrogen-like result. The striking result is the requirement of
rather large valence and conduction bandwidths (W > 20) for applicability of
Wannier approximation. For smaller values ofW the binding energy and wave
function of relative motion (Fig. 9d) deviate from large radius results. In the
similar way, conditions of validity of Frenkel approach are rather restricted
too. Moreover, even strong localization of wave function does not guarantee
good agreement between exact and Frenkel approximation result for binding
energy. At 1 < W < 10 the wave function is already strongly localized though
binding energy considerably differs from Frenkel approximation result. For
example, at W = 0.4 relative motion is well localized (Fig. 9e) whereas the
binding energy of Frenkel approximation is two times larger than exact result
(Inset in Fig. 9a).
A study of conditions necessary for formation of charge transfer exciton in
three dimensional systems is crucial to finalize protracted discussion of numer-
ous models concerning properties of mixed valence semiconductors [149]. A
decade ago unusual properties of SmS and SmB6 were explained by invoking
the excitonic instability mechanism assuming charge-transfer nature of the
optically forbidden exciton [150, 151]. Although this model explained quanti-
tatively the phonon spectra [152, 153], optical properties [154, 155], and mag-
netic neutron scattering data [138], it’s basic assumption has been criticized
as being groundless [156, 157]. To study excitonic wavefunction, dispersions
of valence and conduction bands were chosen as it is typical for mixed valence
materials: almost flat valence band is separated from broad conduction band,
having maximum in the centre and minimum at the border of Brillouin zone
[46]. Results presented in Fig. 9b support assumption of [150, 151] since wave
function of relative motion has almost zero on-site component and maximal
charge density at near neighbors.
6 Polarons in Undoped High Temperature
Superconductors
It is now well established that the physics of high temperature superconduc-
tors is that of hole doping a Mott insulator [158, 159, 160]. Even a single
hole in a Mott insulator, i.e. a hole in an antiferromagnet in case of infinite
Hubbard repulsion U , is substantially influenced by many-body effects [10] be-
cause it’s jump to a neighboring site disturbs antiferromagnetic arrangement
of spins. Hence a thorough understanding of the dynamics of doped holes in
Mott insulators has attracted a great deal of recent interest. The two major
interactions relevant to the electrons in solids are electron-electron interac-
tions (EEI) and electron-phonon interactions (EPI). The importance of the
former at low doping is no doubt essential since the Mott insulator is driven
26 A. S. Mishchenko and N. Nagaosa
by strong Hubbard repulsion, while the latter was considered to be largely
irrelevant to superconductivity based on the observations of a small isotope
effect on the optimal Tc [161] and an absence of a phonon contribution to the
resistivity (for review see [162]).
On the other hand, there are now accumulating evidences that the EPI
plays an important role in the physics of cuprates such as (i) an isotope effect
on superfluid density ρs and Tc away from optimal doping [163], (ii) neutron
and Raman scattering [164, 165, 166] experiments showing strong phonon soft-
ening with both temperature and hole doping, indicating that EPI is strong
[167, 168]. Furthermore, the recent studies of cuprates by the angle resolved
photoemission spectroscopy (ARPES), which spectra are proportional to the
LF (7) [32], resulted in the discovery of the dispersion ”kinks” at around 40-
70meV measured from the Fermi energy, in the correct range of the relevant
oxygen related phonons [169, 170, 171]. These particular phonons - oxygen
buckling and half-breathing modes are known to soften with doping [172, 164]
and with temperature [170, 171, 172, 164, 165, 166] indicating strong cou-
pling. The quick change of the velocity can be predicted by any interaction of
a quasiparticle with a bosonic mode, either with a phonon [170, 171] or with
a collective magnetic resonance mode [173, 174, 175]. However, the recently
discovered “universality” of the kink energy for LSCO over the entire doping
range [176] casts doubts on the validity of the latter scenario as the energy
scale of the magnetic excitation changes strongly with doping.
Besides, measured in undoped high Tc materials ARPES revealed appar-
ent contradiction between momentum dependence of the energy and linewidth
of the QP peak. On the one hand the experimental energy dispersion of the
broad peak in many underdoped compounds [31, 177] obeys the theoretical
predictions [178, 179], whereas the experimental peak width is comparable
with the bandwidth and orders of magnitude larger than that obtained from
theory of Mott insulator [53]. Early attempts to interpret this anomalously
short lifetime of a hole by an interaction with additional nonmagnetic bosonic
excitations, e.g. phonons [180], faced generic question: is it possible that in-
teraction with media leaves the energy dispersion absolutely unrenormalized,
while, induces a decay which inverse life-time is comparable or even larger
than the QP energy dispersion? A possibility of an extrinsic origin of this
width can be ruled out since the doping induces further disorder, while a
sharper peak is observed in the overdoped region.
In order to understand whether phonons can be responsible for peculiar
shape of the ARPES in the undoped cuprates, the LF of an interacting with
phonons hole in Mott insulator was studied by DMC-SO [49]. The case of the
LF of a single hole corresponds to the ARPES in an undoped compound. For
a system with large Hubbard repulsion U , when U is much larger than the
typical bandwidth W of noninteracting QP, the problem reduces to the t-J
model [181, 182, 158, 11]
Spectroscopic Properties of Polarons by Exact Monte Carlo 27
Ĥt-J = −t
〈ij〉s
iscjs + J
(SiSj − ninj/4) . (39)
Here cjσ is projected (to avoid double occupancy) fermion annihilation op-
erator, ni (< 2) is the occupation number, Si is spin 1/2 operator, J is an
exchange integral, and 〈ij〉 denotes nearest-neighbor sites in two dimensional
square lattice. Different theoretical approaches revealed [158, 183, 53] basic
properties of the LF. The LF has a sharp peak in the low energy part of the
spectrum which disperses with a bandwidth WJ/t ∼ 2J and, therefore, the
large QP width in experiment can not be explained. More complicated tt′t′′-J
model takes into account hoppings to the second t′ and third t′′ nearest neigh-
bors and, hence, dispersion of the hole changes [184, 185, 186, 178, 179, 32].
However, for parameters, which are necessary for description of dispersion
in realistic high Tc superconductors [31, 178], peak in the low energy part
remains sharp and well defined for all momenta [187].
After expressing spin operators in terms of Holstein-Primakoff spin wave
operators and diagonalizing the spin part of Hamiltonian (39) by Fourier
and Bogoliubov transformations [188, 10, 189, 190], tt′t′′-J Hamiltonian is
reduced to the boson-holon model, where hole (annihilation operator is hk)
with dispersion ε(k) = 4t′ cos(kx) cos(ky)+2t
′′[cos(2kx)+cos(2ky)] propagates
in the magnon (annihilation operator is αk) bath
Ĥ0t-J =
ε(k)h
αk (40)
with magnon dispersion ωk = 2J
1− γ2
, where γk = (cos kx + cos ky)/2.
The hole is scattered by magnons as described by
Ĥh-mt-J = N
hk−qαk + h.c.
with the scattering vertex Mk,q. Parameters t, t
′ and t′′ are hopping ampli-
tudes to the first, second and third near neighbors, respectively. If hopping
integrals t′ and t′′ are set to zero and bare hole has no dispersion, the problem
(40-41) corresponds to t-J model.
Short range interaction of a hole with dispersionless optical phonons
Ĥe-ph = Ω0
bk of the frequency Ω0 is introduced by Holstein Hamil-
tonian
Ĥe-ph = N−1/2
hk−qbq + h.c.
, (42)
where σ is the momentum and isotope independent coupling constant, M is
the mass of the vibrating lattice ions, and Ω0 is the frequency of dispersionless
phonon. The coefficient in front of square brackets is the standard Holstein in-
teraction constant γ = σ/
(2MΩ0). In the following we characterize strength
of EPI in terms of dimensionless coupling constant λ = γ2/4tΩ0. Note, if in-
teraction with magnetic subsystem (41) is neglected and hole dispersion ε(k)
28 A. S. Mishchenko and N. Nagaosa
is chosen in the form ε(k) = 2t[cos(kx) + cos(ky)], the problem (40), (42) cor-
responds to standard Holstein model where hole with near neighbor hopping
amplitude t interacts with dispersionless phonons.
We consider the evolution of ARPES of a single hole in t-J-Holstein model
(40)-(42) from the weak to the strong coupling regime and dispersion of the LF
in the strong coupling regime in Sect. 6.1. It occurs that properties of the LF in
the strong coupling regime of the EPI explain the puzzle of broad lineshape
in ARPES in underdoped high Tc superconductors. Therefore, in order to
suggest a crucial test for the mechanism of phonon-induced broadening, we
present calculations of the effect of the isotope substitution on the ARPES in
Sect. 6.2.
6.1 Spectral Function of a Hole Interacting with Phonons in the
t-J Model: Self-Trapping and Momentum Dependence
Previously, the LF of t-J-Holstein model was studied by exact diagonalization
method on small clusters [191] and in the non-crossing approximation (NCA)6
for both phonons and magnons [192, 193]. However, the small system size in
exact diagonalization method implies a discrete spectrum and, therefore, the
problem of lineshape could not be addressed. The latter method omits the
FDs with mutual crossing of phonon propagators and, hence, is an invalid
approximation for phonons in strong and intermediate couplings of EPI. This
statement was demonstrated by DMC, which can sum all FDs for Holstein
model both exactly and in the NCA [49]. Exact results and those of NCA are
in good agreement for small values λ ≤ 0.4 and drastically different for λ > 1.
For example, for Ω0/t = 0.1 exact result shows a sharp crossover to strong
coupling regime for λ > λcH ≈ 1.2 whereas NCA result does not undergo such
crossover even at λ = 100. On the other hand, NCA is valid for interaction of a
hole with magnons since spin S=1/2 can not flip more than once and number
of magnons in the polaronic cloud can not be large. Note that the t-J-Holstein
model is reduced to problem of polaron which interacts with several bosonic
fields (3)-(4).
DMC expansion in [49] takes into account mutual crossing of phonon prop-
agators and, in the framework of partial NCA, neglects mutual crossing of
magnon propagators, to avoid sign problem. NCA for magnons is justified for
J/t ≤ 0.4 by good agreement of results of NCA and exact diagonalization on
small clusters [188, 10, 194, 195, 190]. Recently results of exact diagonalization
were compared in the limit of small EPI for t-J-Holstein model, boson-holon
model (40-42) without NCA, and boson-holon model with NCA [196]. Al-
though agreement is not so good as for pure t-J model, it was concluded that
NCA for magnons is still good enough to suggest that one can use NCA for a
qualitative description of the t-J-Holstein model.
6 NCA is equivalent to self-consistent Born approximation (SCBA)
Spectroscopic Properties of Polarons by Exact Monte Carlo 29
0.0 0.2 0.4 0.6
0.0 0.2 0.4
0.0 0.2 0.4
-2.50 -2.25 -2.00
-2 0 2 4
Fig. 10. (a) The LF of a hole in the ground state k = (π/2, π, 2) at J/t = 0.3 and
λ = 0. Low energy part of the LF of a hole in the ground state k = (π/2, π, 2) at
J/t = 0.3: (b) λ = 0; (c) λ = 0.3; (d) λ = 0.4; (e) λ = 0.46. Dependence on coupling
strength λ at J/t = 0.3: (f) energies of lowest LF resonances; (g) Z-factor of lowest
peak; (h) average number of phonons 〈N〉.
Figures 10a-e show low energy part of LF in the ground state at k =
(π/2, π/2) in the weak, intermediate, and strong coupling regimes of inter-
action with phonons. Dependence on the coupling constant of energies of
resonances (Fig. 10f), Zk=(π/2,π/2)-factor of lowest peak (Fig. 10g), and aver-
age number of phonons in the polaronic cloud 〈N〉 (Fig. 10h) demonstrates a
picture which is typical for ST (see [80, 54] and Sect. 4). Two states cross and
hybridize in the vicinity of critical coupling constant λct-J ≈ 0.38, Zk=(π/2,π/2)-
factor of lowest resonance sharply drops and average number of phonons in
polaronic cloud quickly rises. According to the general understanding of the
ST phenomenon, above the critical couplings λ > λct-J one expects that the
lowest state is dispersionless while the upper one has small effective mass.
This assumption is supported by the momentum dependence of the LF in
the strong coupling regime (Fig. 11a-e). Dispersion of upper broad shake-off
Franck-Condon peak nearly perfectly obeys relation
εk = εmin+WJ/t/5{[coskx+cos ky]2+[cos(kx+ky)+cos(kx−ky)]2/4}, (43)
which describes dispersion of the pure t-J model in the broad range of ex-
change constant 0.1 < J/t < 0.9 [194] (Fig. 11f). Note that this property of
the shake-off peak is general for the whole strong coupling regime (Fig. 11f).
Momentum dependence of the shake-off peak, reproducing that of the free
particle, is the direct consequence of the adiabatic regime. Actually, phonon
frequency Ω0 is much smaller than the coherent bandwidth 2J of the t-J
30 A. S. Mishchenko and N. Nagaosa
-2 0 2
-2.5 -2.0
-2.75
-2.50
-2.25
-2.00
-1.75
-10 -5 0 5 10
k=( /2, /2)
/t /t
k=( /4, /4)
k=(0, /4)
k=(0, )
k=( /2, /2)
Fig. 11. The LF of a hole at J/t = 0.3 and λ = 0.46: (a) full energy range
for k = (π/2, π/2); (b–e) low energy part for different momenta. Slanted arrows
show broad peaks which can be interpreted in ARPES spectra as coherent (C) and
incoherent (I) part. Vertical arrows in panels (b)–(e) indicate position of “invisible”
lowest resonance. (f) Dispersion of resonances energies at J/t = 0.3: broad resonance
(filled circles) and lowest polaron pole (filled squares) at λ = 0.46; broad resonance
(open circles) and lowest polaron pole (open squares) at λ = 0.4. The solid curves
are dispersions (43) of a hole in pure t-J model at J/t = 0.3 (WJ/t=0.3 = 0.6):
εmin = −2.396 (εmin = −2.52) for dotted (solid) line. Panel (g) shows ground state
potential Q2/2 (solid line), excited state potential without relaxation D + Q2/2
(dashed line), and relaxed excited state potential D + (Q − λ)2/2 − λ2/2 (dotted
line).
model, giving the adiabatic ratio Ω/2J = 1/6 ≪ 1. Besides, as experience
with the OC of the Fröhlich polaron (Sect. 3.2) shows, there is one more
important parameter in the strong coupling limit. Namely, the ratio between
measurement process time τmp = h̄/∆E where ∆E is the energy separation of
shake-off hump from the ground state pole, and that of characteristic lattice
time τ ≈ 1/Ω0 is much less than unity. Hence, fast photoemission probe sees
the ions frozen in one of possible configurations [197]. The LF in the FC limit
is a sum of transitions between a lower Elow(Q) and an upper Eup(Q) sheets
of adiabatic potential, weighted by the adiabatic wave function of the lower
sheet | ψlow(Q) |2 [198]. If EPI is absent both in initial Elow(Q) = Q2/2
and final Eup(Q) = D + Q2/2 states, the LF is peaked at the energy D.
Then, if there is EPI ∆Eup(Q) = −λQ only in the final state, i.e. when hole
is removed from the Mott insulator, the upper sheet of adiabatic potential
Eup(Q) = D− λ2/2 + (Q− λ)2/2 has the same energy D at Q = 0. Since the
probability function | ψlow(Q) |2 has maximum at Q = 0, the peak of the LF
broadens but it’s energy does not shift [198] (Fig. 11g).
Spectroscopic Properties of Polarons by Exact Monte Carlo 31
Behavior of the LF is the same as observed in the ARPES of undoped
cuprates. The LF consists of a broad peak and a high energy incoherent con-
tinuum (see Fig. 11a). Besides, dispersion of the broad peak “c” in Figs. 11
reproduces that of sharp peak in pure t-J model (Fig. 11b-f). The lowest dis-
persionless peak, corresponding to small radius polaron, has very small weight
and, hence, can not be seen in experiment. On the other hand, according to ex-
periment, momentum dependence of spectral weight Z(k)
of broad resonance
exactly reproduces dispersion of Z(k)-factor of pure t-J model. The reason
for such perfect mapping is that in adiabatic case Ω0/2J ≪ 1 all weight of
the sharp resonance in t-J model without EPI is transformed at strong EPI
into the broad peak. This picture implies that the chemical potential in the
heavily underdoped cuprates is not connected with the broad resonance but
pinned to the real quasiparticle pole with small Z-factor. This conclusion was
recently confirmed experimentally [177].
Comparing the critical EPI for a hole in the t-J-Holstein model (40-42)
λct-J ≈ 0.38 and that for Holstein model λcH ≈ 1.2 with the same value of
hopping t, we conclude that spin-hole interaction accelerates transition into
the strong coupling regime. The reason for enhancement of the role of EPI is
found in [196]. Comparison of the EPI driven renormalization of the effective
mass in t-J-Holstein and Holstein model shows that large effective mass in the
t-J model is responsible for this effect. The enhancement of the role of EPI
by EEI takes place at least for a single hole at the bottom of the t-J band.
Had the comparison been made with half-filled model, the result would have
been smaller enhancement or no enhancement at all [199]. On the other hand,
coupling constant of half-breathing phonon is increased by correlations [200].
Finally, we conclude that effect of enhancement of the effective EPI by EEI is
not unambiguous and depends on details of interaction and filling. However,
this effect is present for small filling in the t-J-Holstein model.
6.2 Isotope Effect on ARPES in Underdoped High-Temperature
Superconductors
The magnetic resonance mode and the phonon modes are the two major
candidates to explain the “kink” structure of the electron energy dispersion
around 40-70 meV below the Fermi energy, and the isotope effect (IE) on
ARPES should be the smoking-gun experiment to distinguish between these
two. Gweon et al. [201] performed the ARPES experiment on O18-replaced
Bi2212 at optimal doping and found an appreciable IE, which however can
not be explained within the conventional weak-coupling Migdal-Eliashberg
theory. Namely the change of the spectral function due to O18-replacement
has been observed at higher energy region beyond the phonon energy (∼
60meV). This is in sharp contrast to the weak coupling theory prediction, i.e.,
the IE should occur only near the phonon energy. Hence the IE in optimal
Bi2212 remains still a puzzle. On the other hand, the ARPES in undoped
materials, as described in Sect. 6.1, has recently been understood in terms of
32 A. S. Mishchenko and N. Nagaosa
the small polaron formation [49, 202, 198]. Therefore, it is essential to compare
experiment in undoped systems with presented in this Sect. DMC-SO data,
where theory can offer quantitative results.
In addition to high-Tc problem, strong EPI mechanism of ARPES spec-
tra broadening was considered as one of alternative scenarios for diatomic
molecules [203], colossal magnetoresistive manganites [34], quasi-one-dimensi-
onal Peierls conductors [37, 38], and Verwey magnetites [39]. Therefore, exact
analysis of the IE on ARPES at strong EPI is of general interest for conclusive
experiments in a broad variety of compound classes.
Dimensionless coupling constant λ = γ2/4tΩ in (42) is an invariant quan-
tity for the simplest case of IE. Indeed, assuming natural relation Ω ∼ 1/
between phonon frequency and mass, we find that λ does not depend on
the isotope factor κiso = Ω/Ω0 =
M0/M , which is defined as the ratio of
phonon frequency in isotope substituted (Ω) and normal (Ω0) systems. We
chose adopted parameters of the tt′t′′-J model which reproduce the experi-
mental dispersion of ARPES [178]: J/t = 0.4, t′/t = −0.34, and t′′/t = 0.23 .
The frequency of the relevant phonon [32] is set to Ω0/t = 0.2 and the isotope
factor κiso =
16/18 corresponds to substitution of O18 isotope for O16.
To sweep aside any doubts of possible instabilities of analytic continuation,
we calculate the LF for normal compound (κnor = 1), isotope substituted
(κiso =
16/18) and “anti-isotope” substituted (κant =
18/16) compounds.
Monotonic dependence of LF on κ ensures stability of analytic continuation
and gives possibility to evaluate the error-bars of a quantity A using quantities
Aiso −Anor, Anor −Aant, and (Aiso −Aant)/2.
Since LF is sensitive to strengths of EPI only for low frequencies [55], we
concentrate on the low energy part of the spectrum. Figure 12 shows IE on the
hole LF for different couplings in nodal and antinodal points, respectively. The
general trend is a shift of all spectral features to larger energies with increase
of the isotope mass (κ < 1). One can also note that the shift of broad FCP is
much larger than that of narrow real-QP peak. Moreover, for large couplings
λ the shift of QP energy approaches zero and only decrease of QP spectral
weight Z is observed for larger isotope mass. On the other hand, the shift
of FCP is not suppressed for larger couplings. Except for the LF in nodal
point at λ = 0.62 (Fig. 12a, b), where LF still has significant weight of QP
δ-functional peak, there is one more notable feature of the IE. With increase
of the isotope mass the height of FCP increases. Taking into account the
conservation law for LF
−∞ Lk(ω) = 1 and insensitivity of high energy part
of LF to EPI strength [55], the narrowing of the FCP for larger isotope mass
can be concluded. To understand the trends of the IE in the strong coupling
regime we analyze the exactly solvable independent oscillators model (IOM)
[60]. The LF in IOM is the Poisson distribution
L(ω) = exp[−ξ0/κ]
[ξ0/κ]
Gκ,l(ω) , (44)
Spectroscopic Properties of Polarons by Exact Monte Carlo 33
Fig. 12. Low energy part of hole LFs: normal compound (solid line), isotope sub-
stituted compound (dotted line) and “antiisotope” substituted compound (dashed
line). LFs at different couplings in the nodal (a, c, e) and antinodal (g, i, l) points.
Insets (b, d, f, h, k) show low energy peak of real QP.
where ξ0 = γ
0 = 4tλ/Ω0 is dimensionless coupling constant for normal
system and Gκ,l(ω) = δ[ω+4tλ−Ω0κl] is the δ-function. The properties of the
Poisson distribution quantitatively explain many features of the IE on LF7.
The energy ωQP = −4tλ of the zero-phonon line l = 0 in (44) depends
only on isotope independent quantities which explains very weak isotope de-
pendence of QP peak energy in insets of Fig. 12. Besides, change of the zero-
phonon line weight Z(0) obeys relation Z
iso /Z
nor = exp [−ξ0(1− κ)/κ] in
IOM. These IOM estimates agree with DMC data within 15% in the nodal
point and within 25% in the antinodal one. IE on FCP in the strong cou-
pling regime follows from the properties of zero M0 =
−∞ L(ω)dω = 1, first
−∞ ωL(ω)dω = 0, and second M2 =
2L(ω)dω = κξ0Ω
0 mo-
ments of shifted Poisson distribution (44). Moments M0 and M2 establish
relation D = hFCPiso /hFCPnor = 1/
κ ≈ 1.03 between heights of FCP in normal
and substituted compounds. DMC data in the antinodal point perfectly agree
with the above estimate for all couplings. This is consistent with the idea that
the anti-nodal region remains in the strong coupling regime even though the
nodal region is in the crossover region. In the nodal point DMC data well
agree with IOM estimate for λ = 0.75 (D ≈ 1.025) whereas at λ = 0.69 and
7 Cautions should be made about approximate form of EPI (42). Strictly speaking,
actual momentum dependence of the interaction constant σ [204, 205] can slightly
change the obtained differences between nodal and antinodal points though the
general trends have to be left intact because ST is caused solely by the short
range part of EPI [80].
34 A. S. Mishchenko and N. Nagaosa
0.65 0.70 0.75
0.65 0.70 0.75
0.65 0.70 0.75
0.5 0.6 0.7
k=( /2, /2)
Fig. 13. (a) Energies of ground state and broad peaks for normal (triangles),
isotope substituted (circles) and “antiisotope” substituted (diamonds) compounds.
Comparison of IOM estimates (lines) with DMC data in the nodal (squares) and
antinodal (diamonds) points: (b) shift of the FCP top, (c) FCP leading edge at 1/2
of height, and (d) FCP leading edge at 1/3 of height.
λ = 0.62 influence of the ST point leads to anomalous values of D: D ≈ 1.07
and D ≈ 0.98, respectively. Shift of the low energy edge at half maximum
∆1/2 must be proportional to change of the root square of second moment
∆√M2 =
ξ0Ω0[1 −
κ]. As we found in numeric simulations of (44) with
Gaussian functions8 Gκ,l(ω), relation ∆1/2 ≈ ∆√M2/2 is accurate to 10% for
0.62 < λ < 0.75. Also, simulations show that the shift of the edge at one
third of maximum ∆1/3 obeys relation ∆1/3 ≈ ∆√M2 . DMC data with IOM
estimates are in good agreement for strong EPI λ = 0.75 (Fig. 13). How-
ever, shift of the FCP top ∆p and ∆1/2 are considerably enhanced in the
self-trapping (ST) transition region. The physical reason for enhancement of
IE in this region is a general property regardless of the QP dispersion, range
of EPI, etc. The influence of nonadiabatic matrix element, mixing excited
and ground states, on the energies of resonances essentially depends on the
phonon frequency. While in the adiabatic approximation ST transition is sud-
den and nonanalytic in λ [80], nonadiabatic matrix elements turn it to smooth
crossover [144]. Thus, as illustrated in Fig. 13a, the smaller the frequency the
sharper the kink in the dependence of excited state energy on the interaction
constant
In the undoped case the present results can be directly compared with
the experiments. It is found that the IE on the ARPES lineshape of a sin-
gle hole is anomalously enhanced in the intermediate coupling regime while
can be described by the simple independent oscillators model in the strong
coupling regime. The shift of FCP top and change of the FCP height are rele-
vant quantities to pursue experimentally in the intermediate coupling regime
since IE on these characteristics is enhanced near the self trapping point. In
8 Results are almost independent on the parameter η of the Gaussian distribution
Gκ,l(ω) = 1/(η
2π) exp(−[ω + 4tλ−Ω0κl]/(2η2)) in the range [0.12, 0.2].
Spectroscopic Properties of Polarons by Exact Monte Carlo 35
contrast, shift of the leading edge of the broad peak is the relevant quantity
in the strong coupling regime since this value increases with coupling as
These conclusions, depending on the fact whether self trapping phenomenon
is encountered in specific case, can be applied fully or partially to another
compounds with strong EPI [34, 37, 38, 39].
6.3 Conclusions and Perspectives
In this article, we have focused mainly on the polaron problem in strongly
correlated systems. This offers an approach from the limit of low carrier con-
centration doped into the (Mott) insulator, which is complementary to the
conventional Eliashberg-Migdal approach for the EPI in metals. In the latter
case, we have the Fermi energy εF as a relevant energy scale, which is usually
much larger than the phonon frequency Ω0. In this case, the adiabatic Migdal
approximation is valid and the vertex corrections, which correspond to the
multi-phonon cloud and are essential to the self-trapping phenomenon, are
suppressed by the ratio Ω0/εF . Therefore an important issue is the crossover
from the strong coupling polaronic picture to the weak coupling Eliashberg-
Migdal picture. This occurs as one increases the carrier doping into the insu-
lator. As is observed by ARPES experiments in high temperature supercon-
ductors, the polaronic states continue to survive even at finite doping [177].
This suggests a novel polaronic metallic state in underdoped cuprates, which
is common also in CMR manganites [36] and is most probably universal in
transition metal oxides. In the optimal and overdoped region, the Eliashberg-
Migdal picture becomes appropriate [170, 171], but still a nontrivial feature of
the EPI is its strong momentum dependence leading to the dichotomy between
the nodal and anti-nodal regions. It is an interesting observation that the high-
est superconducting transition temperature is attained at the crossover region
between the two pictures above, which suggests that both the itinerancy and
strong coupling to the phonons are essential to the quantum coherence. It
should be noted that this crossover occurs in a nontrivial way also in the mo-
mentum space, i.e., the nodal and anti-nodal regions behave quite differently
as discussed in Sect. 6.2. However, the relevance of the EPI to the high Tc
superconductivity is still left for future investigations.
We hope that this article convinces the readers the vital role of ARPES
experiments and numerically exact solutions to the EPI problem, the com-
bination of them offers a powerful tool for the momentum-energy resolved
analysis of these rather complicated strongly correlated electronic systems.
This will pave a new path to the deeper understanding of the many-body
electronic systems.
We thank Y. Toyozawa, Z. X. Shen, T. Cuk, T. Devereaux, J. Zaanen, S.
Ishihara, A. Sakamoto, N. V. Prokofev, B. V. Svistunov, E. A. Burovski, J. T.
Devreese, G. de Filippis, V. Cataudella, P. E. Kornilovitch, O. Gunnarsson,
N. M. Plakida, and K. A. Kikoin, for collaborations and discussions.
36 A. S. Mishchenko and N. Nagaosa
References
1. J. Appel: Solid State Physics, Vol. 21, ed by H. Ehrenreich, F. Seitz and D.
Turnbull (Academic, New York 1968).
2. S. I. Pekar: Untersuchungen über die Elektronentheorie der Kristalle,
(Akademie Verlag, Berlin 1954)
3. L. D. Landau: Sow. Phys. 3, 664 (1933).
4. H. Fröhlich, H. Pelzer, S. Zienau: Philos. Mag. 41, 221 (1950)
5. J. T. Devreese: Encyclopedia of Applied Physics Vol. 14, ed by G. L. Trigg
(VCH, New York 1996), p. 383
6. A. I. Anselm, Yu. A. Firsov: J. Exp. Theor. Phys. 28, 151 (1955); ibid. 30, 719
(1956)
7. M. Ueta, H. Kanzaki, K. Kobayashi, Y. Toyozawa, E. Hanamura: Excitonic
Processes in Solids, (Springer-Verlag, Berlin 1986)
8. Y. Toyozawa: Progr. Theor. Phys. 20 53 (1958).
9. Y. Toyozawa: Optical Processes in Solids, (University Press, Cambridge 2003)
10. C. L. Kane, P. A. Lee, N. Read: Phys. Rev. B 39, 6880 (1989)
11. Yu. A. Izymov: Usp. Fiz. Nauk 167, 465 (1997) [Physics-Uspekhi 40, 445
(1997)]
12. J. Kanamori: Appl. Phys. 31, S14 (1960).
13. A. Abragam, B. Bleaney: Electron Paramagnetic Resonance of Transition Ions,
(Clarendon Press, Oxford 1970)
14. K. I. Kugel, D. I. Khomskii: Sov. Phys. Usp. 25, 231 (1982)
15. A. J. Millis, P. B. Littlewood, B. I. Shraiman: Phys. Rev. Lett. 74, 5144 (1995)
16. A. S. Alexandrov, A. M. Bratkovsky: Phys. Rev. Lett. 82, 141 (1999)
17. E. I. Rashba: Sov. Phys. JETP 23, 708 (1966)
18. Y. Toyozawa, J. Hermanson: Phys. Rev. Lett. 21, 1637 (1968)
19. I. B. Bersuker: The Jahn-Teller Effect, (IFI/Plenum, New York 1983)
20. V. L. Vinetskii: Zh. Exp. Teor. Fiz 40, 1459 (1961) [Sov. Phys. - JETP 13,
1023 (1961)]
21. P. W. Anderson: Phys. Rev. Lett. 34 953 (1975)
22. H. Hiramoto, Y. Toyozawa: J. Phys. Soc. Jpn. 54, 245 (1985)
23. A. Alexandrov, and J. Ranninger: Phys. Rev. B 23 1796 (1981)
24. A. Alexandrov, and J. Ranninger: Phys. Rev. B 24 1164 (1981)
25. H. Haken: Il Nuovo Cimento 3, 1230 (1956)
26. F. Bassani, G. Pastori Parravicini: Electronic States and Optical Transitions
in Solids, (Pergamon, Oxford 1975)
27. J. Pollman, H. Büttner: Phys. Rev. B 16, 4480 (1977)
28. A. Sumi: J. Phys. Soc. Jpn. 43, 1286 (1977)
29. Y. Shinozuka, Y. Toyozawa: J. Phys. Soc. Jpn. 46, 505 (1979)
30. Y. Toyozawa: Physica 116B, 7 (1983)
31. B. O. Wells, Z.-X. Shen, A. Matsuura et al: Phys. Rev. Lett. 74, 964 (1995)
32. A. Danmascelli, Z.-X. Shen, and Z. Hussain: Rev. Mod. Phys. 75, 473 (2003)
33. X. J. Zhou, T. Yoshida, D.-H. Lee et al: Phys. Rev. Lett. 92, 187001 (2004)
34. D. S. Dessau, T. Saitoh, C.-H. Park et al: Phys. Rev. Lett. 81, 192 (1998);
35. N. Mannella, A. Rosenhahn, C. H. Booth et al: Phys. Rev. Lett. 92, 166401
(2004)
36. N. Mannella, W. L. Yang, X. J. Zhou et al: Nature 438, 474 (2005)
37. L. Perfetti, H. Berger, A. Reginelli et al: Phys. Rev. Lett. 87, 216404 (2001)
Spectroscopic Properties of Polarons by Exact Monte Carlo 37
38. L. Perfetti, S. Mitrovic, G. Margaritondo et al: Phys. Rev. B 66, 075107 (2002)
39. D. Schrupp, M. Sing, M. Tsunekawa et al: Eur. Phys. Lett. 70, 789 (2005)
40. R. J. Mc Queeney, T. Egami, G. Shirane and Y. Endoh: Phys. Rev. B 54 R9689
(1996)
41. H. A. Mook, R. M. Nicklow: Phys. Rev. B 20 1656 (1979)
42. H. A. Mook, D. B. McWhan, F. Holtzberg: Phys. Rev. B 25 4321 (1982)
43. N. V. Prokof’ev, B. V. Svistunov, I. S. Tupitsyn: J. Exp. Theor. Phys. 114,
570 (1998) [Sov. Phys. - JETP 87, 310 (1998)]
44. N. V. Prokof’ev, B. V. Svistunov: Phys. Rev. Lett. 81, 2514 (1998)
45. A. S. Mishchenko, N. V. Prokof’ev, A. Sakamoto, B. V. Svistunov: Phys. Rev.
B 62, 6317 (2000)
46. E. A. Burovski, A. S. Mishchenko, N. V. Prokof’ev, B. V. Svistunov: Phys.
Rev. Lett. 87, 186402 (2001)
47. A. S. Mishchenko, N. Nagaosa, N. V. Prokof’ev, B. V. Svistunov, E. A.
Burovski: Nonlinear Optics 29, 257 (2002)
48. A. S. Mishchenko, N. Nagaosa, N. V. Prokof’ev, A. Sakamoto, B. V. Svistunov:
Phys. Rev. Lett. 91, 236401 (2003)
49. A. S. Mishchenko, N. Nagaosa: Phys. Rev. Lett. 93, 036402 (2004)
50. A. S. Mishchenko: Usp. Phys. Nauk 175, 925 (2005) [Physics-Uspekhi 48, 887
(2005)]
51. A. S. Mishchenko, N. Nagaosa: J. Phys. Soc. J. 75, 011003 (2006)
52. A. S. Mishchenko, N. Nagaosa: Phys. Rev. Lett. 86, 4624 (2001)
53. A. S. Mishchenko, N. V. Prokof’ev, B. V. Svistunov: Phys. Rev. B 64, 033101
(2001)
54. A. S. Mishchenko, N. Nagaosa, N. V. Prokof’ev, A. Sakamoto, B. V. Svistunov:
Phys. Rev. B 66 020301 (2002)
55. A. S. Mishchenko, N. Nagaosa: Phys. Rev. B 73, 092502 (2006)
56. A. S. Mishchenko, N. Nagaosa: J. Phys. Chem. Solids 67, 259 (2006)
57. G. De Fillipis, V. Cataudella, A. S. Mishchenko, J. T. Devreese, C. A. Perroni:
Phys. Rev. Lett. 96, 136405 (2006)
58. J. T. Devreese: Optical Properties of Few and Many Fröhlich Polarons from
3D to 0D, contribution to the present book.
59. A. A. Abrikosov, L. P. Gor’kov, I. E. Dzyaloshinskii: Quantum field theoretical
method in statistical physics (Pergamon Press, Oxford 1965)
60. G. D. Mahan: Many particle physics (Plenum Press, Plenum Press 2000)
61. M. Jarrell, J. Gubernatis: Phys. Rep. 269, 133 (1996)
62. R. Knox: Theory of Excitons, (Academic Press, New York 1963)
63. I. Egri: Phys. Rep 119, 364 (1985)
64. D. Haarer: Chem. Phys. Lett 31, 192 (1975)
65. D. Haarer, M. R. Philpot, H. Morawitz: J. Chem. Phys 63, 5238 (1975)
66. A. Elscner, G. Weiser: Chem. Phys 98 465 (1985)
67. J. I. Frenkel: Phys. Rev. 17, 17 (1931)
68. J. H. Wannier: Phys. Rev. 52, 191 (1937)
69. G. De Filippis, V. Cataudella, V. Marigliano Ramaglia, C. A. Perroni: Phys.
Rev. B 72, 014307 (2005)
70. M. Berciu: cond-mat/0602195
71. J. T. Devreese, L. F. Lemmens, J. Van Royen: Phys. Rev. B 15, 1212 (1977)
72. P. E. Kornilovitch: Europhys. Lett. 59, 735 (2002)
73. J. Devreese, R. Evrard: Phys. Lett. 11, 278 (1966)
38 A. S. Mishchenko and N. Nagaosa
74. E. Kartheuzer, R. Evrard, J. Devreese: Phys. Rev. Lett. 22, 94 (1969)
75. J. Devreese, J. De Sitter, M. Goovaerts: Phys. Rev. B 5, 2367 (1972)
76. J. T. Devreese: Internal structure of free Fröhlich polarons, optical absorption
and cyclotron resonance. In Polarons in Ionic crystals and Polar Semiconduc-
tors (North Holland, Amsterdam 1972) pp 83–159
77. M. J. Goovaerts, J. M. De Sitter, J. T. Devreese: Phys. Rev. B 7, 2639 (1973)
78. R. Feynman, R. Hellwarth, C. Iddings, and P. Platzman: Phys. Rev. 127, 1004
(1962)
79. T. D. Lee, F. E. Low, D. Pines: Phys. Rev. 90, 297 (1953)
80. E. I. Rashba: Self-trapping of excitons. In Modern Problems in Condensed
Matter Sciences, vol. 2, ed by V. M. Agranovich and A. A. Maradudin (Notrh
Holland, Amsterdam 1982) pp 543–602
81. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. M. Teller and E.
Teller: J. Chem. Phys. 21, 1087 (1953)
82. D. P. Landau, K. Binder: A Guide to Monte Carlo Simulations in Statistical
Physics, (University Press, Cambridge 2000)
83. A. W. Sandvik, J. Kurkijärvi: Phys. Rev. B 43, 5950 (1991)
84. A. N. Tikhonov, V. Y. Arsenin: Solutions of Ill-Posed Problems, (Winston,
Washington 1977)
85. E. Perchik: math-ph/0302045
86. D. L. Phillips: J. Assoc. Comut. Mach. 9 84 (1962)
87. A. N. Tikhonov: DAN USSR 151 501 (1963)
88. S. S. Aplesnin: J. Exp. Theor. Phys 97 969 (2003)
89. G. Onida, L. Reining, A. Rubio: Rev. Mod. Phys. 74, 601 (2002)
90. L. J. Sham, T. M. Rice: Phys. Rev. 144, 708 (1965).
91. L. X. Benedict, E. L. Shirley, R. B. Bohn: Phys. Rev. Lett. 80, 4514 (1998)
92. S. Albrecht, L. Reining, R. Del Sole, G. Onida: Phys. Rev. Lett. 80, 4510
(1998)
93. M. Rohlfing, S. G. Louie: Phys. Rev. Lett. 81, 2312 (1998)
94. A. Marini, R. Del Sole: Phys. Rev. Lett. 91, 176402 (2003)
95. W. Stephan: Phys. Rev. B 54, 8981 (1996)
96. G. Wellein, H. Fehske: Phys. Rev. B 56, 4513 (1997)
97. H. Fehske, J. Loos, G. Wellein: Z. Phys. B 104, 619 (1997)
98. H. Fehske, J. Loos, G. Wellein: Phys. Rev. B 61, 8016 (2000)
99. J. Bonča, S. A. Trugman, I. Batistić: Phys. Rev. B 60, 1633 (1999)
100. L.-C. Ku, S. A. Trugman, J. Bonča: Phys. Rev. B 65, 174306 (2002)
101. S. E. Shawish, J. Bonča, L.-C. Ku, S. A. Trugman: Phys. Rev. B 67, 014301
(2003)
102. O. S. Barisic: Phys. Rev. B 65, 144301 (2002)
103. O. S. Barisic: Phys. Rev. B 69, 064302 (2004)
104. A. Georges, G. Kotliar: Phys. Rev. B 45, 647 (1992)
105. M. Jarrel: Phys. Rev. Lett. 69, 168 (1992)
106. P. G. J. van Dongen, D. Vollhardt: Phys. Rev. Lett. 65, 1663 (1990)
107. A. Georges, G. Kotliar, W. Krauth, M. J. Rozenberg: Rev. Mod. Phys. 68, 13
(1996)
108. S. Ciuchi, F. de Pasquale, S. Fratini, D. Feinberg: Phys. Rev. B 56 4494 (1997)
109. D. Sénéchal, D. Perez, M. Pioro-Landriére: Phys. Rev. Lett. 84, 522 (2000)
110. D. Sénéchal, D. Perez, M. Plouffe: Phys. Rev. B 66, 075129 (2002)
111. M. Hohenadler, M. Aichhorn, W. von der Linden: Phys. Rev. B 68, 18430
(2003)
Spectroscopic Properties of Polarons by Exact Monte Carlo 39
112. M. Hohenadler, M. Aichhorn, W. von der Linden: Phys. Rev. B 71, 014302
(2005)
113. M. Hohenadler, D. Neuber, W. von der Linden, G. Wellein, J. Loos, H. Fehske:
ibid. 71, 245111 (2005)
114. S. R. White: Phys. Rev. Lett. 69, 2863 (1992)
115. S. R. White: Phys. Rev. B 48, 10345 (1993)
116. S. R. White: Phys. Rev. Lett. 77, 363 (1996)
117. E. Jeckelmann, S. R. White: Phys. Rev. B 57, 6376 (1998)
118. G. Hager, G. Wellein, E.Jeckelmann, H. Fehske: Phys. Rev. B 71, 075108 (2005)
119. P. E. Kornilovitch: Phys. Rev. Lett. 81, 5382 (1998)
120. P. E. Kornilovitch: Phys. Rev. B 60, 3237 (1999)
121. P. E. Spenser, J. H. Samson, P. E. Kornilovitch, A. S. Alexandrov: Phys. Rev.
B 71, 184310 (2005)
122. J. P. Hague, P. E. Kornilovitch, A. S. Alexandrov, J. H. Samson: Phys. Rev.
B 73, 054303 (2006)
123. A. S. Alexandrov, P. E. Kornilovitch: Phys. Rev. Lett. 82, 807 (1999)
124. A. S. Alexandrov, P. E. Kornilovitch: Phys. Rev. B 70, 224511 (2004)
125. S. Ciuchi, F. de Pasquale, D. Feinberg: Europhys. Lett. 30, 151 (1995)
126. A. S. Alexandrov, V. V. Labanov, D. K. Ray: Phys. Rev. B 49, 9915 (1994)
127. A. S. Alexandrov, J. Ranninger: Phys. Rev. B 45, 13109 (1992)
128. L. D. Landau, S. I. Pekar: Zh. Eksp. Teor. Fiz. 18, 419 (1948) [Sov. Phys.
JETP 18, 341 (1948)]
129. V. L. Gurevich, I. G. Lang, Yu. A. Firsov: Fiz. Tverd. Tela (Leningrad) 4, 1252
(1962) [Sov. Phys. Solid State 4, 918 (1962)]
130. J. Franck, E. G. Dymond: Trans. Faraday Soc. 21, 536 (1926)
131. E. U. Condon: Phys. Rev. 32, 858 (1928)
132. D. N. Bertran, J. J. Hopfield: J. Chem. Phys. 81, 5753 (1984)
133. X. Urbain, B. Fabre, E. M. Staice-Casagrande et al: Phys. Rev. Lett. 92, 163004
(2004)
134. M. Lax: J. Chem. Phys. 20, 1752 (1952)
135. D. I. Khomskii: Usp. Fiz. Nauk 129, 443 (1979) [Sov. Phys. Usp. 22, 879
(1979)]
136. C. E. T. Goncalves da Silva, L. M. Falicov: Phys. Rev. B 13, 3948 (1976).
137. P. A. Alekseev, J. M. Mignot, J. Rossat-Mignot: J. Phys.: Condens. Matter 7,
289 (1995)
138. K. A. Kikoin, A. S. Mishchenko: J. Phys.: Condens. Matter 7, 307 (1995)
139. R. Feynman: Phys. Rev. 97, 660 (1955)
140. F. M. Peeters, J. T. Devreese: Phys. Rev. B 28, 6051 (1983)
141. V. Cataudella, G. De. Filippis, C. A. Perroni: Single polaron properties in
different electron phonon models, contribution to the present book.
142. E. G. Brovman, Yu. Kagan: Zh. Eksp. Teor. Fiz. 52, 557 (1967) [Sov. Phys.
JETP 25, 365 (1967)]
143. K. A. Kikoin, A. S. Mishchenko: Zh. Eksp. Teor. Fiz. 104, 3810 (1993) [Sov.
Phys. JETP 77, 828 (1993)]
144. B. Gerlach, H. L”owen: Rev. Mod. Phys. 63, 63 (1991)
145. A. Brillante, M. R. Philpott: J. Chem. Phys. 72, 4019 (1980)
146. D. Haarer: Chem. Phys. Lett. 27, 91 (1974)
147. D. Haarer: J. Chem. Phys. 67, 4076 (1977)
148. M. Kuwata-Gonokami, N. Peyghambarian, K. Meissner et al: Nature 367, 47
(1994)
40 A. S. Mishchenko and N. Nagaosa
149. S. Curnoe, K. A. Kikoin: Phys. Rev. B 61, 15714 (2000)
150. K. A. Kikoin, A. S. Mishchenko: Zh. Eksp. Teor. Fiz. 94, 237 (1988) [Sov.
Phys. JETP 67, 2309 (1988)]
151. K. A. Kikoin, A. S. Mishchenko: J. Phys.: Condens. Matter 2, 6491 (1990)
152. P. A. Alekseev, A. S. Ivanov, B. Dorner et al: Europhys. Lett. 10, (1989) 457.
153. A. S. Mishchenko, K. A. Kikoin: J. Phys.: Condens. Matter 3, 5937 (1991).
154. G. Trawaglini P. Wachter: Phys. Rev. B 29, 893 (1984)
155. P. Lemmens, A. Hoffman, A. S. Mishchenko et al: Physica B 206&207, 371
(1995)
156. T. Kasuya: Europhys. Lett. 26, 277 (1994)
157. T. Kasuya: Europhys. Lett. 26, 283 (1994)
158. E. Manousakis: Rev. Mod. Phys. 63, 1 (1991)
159. E. Dagotto: Rev. Mod. Phys. 66, 763 (1994)
160. P. A. Lee, N. Nagaosa, X. G. Wen: Rev. Mod. Phys. 78, 17 (2006)
161. B. Batlogg, R. J. Cava, A. Jayaraman et al: Phys. Rev. Lett. 58, 2333 (1987)
162. O. Gunnarsson, M. Calandra, J. E. Han: Rev. Mod. Phys. 75, 1085 (2003)
163. R. Khasanov, D. G. Eshchenko, H. Luetkens et.al: Phys. Rev. Lett. 92, 057602
(2004)
164. L. Pintschovius, M. Braden: Phys. Rev. B, 60, R15039 (1999).
165. C. Thomsen, M. Cardona, B. Gegenheimer et. al: Phys. Rev. B 37, 9860 (1988)
166. V. G. Hadjiev, X. Zhou, T. Strohm, et. al: Phys. Rev. B 58, 1043 (1998)
167. G. Khaliullin, P. Horsch: Physica C 282-287, 1751 (1997)
168. O. Rösch, O. Gunnarsson: Phys. Rev. Lett. 93, 237001 (2004)
169. A. Lanzara, P. V. Bogdanov, X. J. Zhou et al: Nature 412, 510 (2001)
170. T. Cuk, F. Baumberger, D. H. Lu et al: Phys. Rev. Lett. 93, 117003 (2004)
171. T. P. Devereaux, T. Cuk, Z.-X. Shen, N. Nagaosa: Phys. Rev. Lett. 93, 117004
(2004)
172. R. J. McQueeney, Y. Petrov, T. Egami et al: Phys. Rev. Lett. 82, 628 (1999)
173. A. V. Chubukov, M. R. Norman: Phys. Rev. B 70, 174505 (2004)
174. M. Eschrig, M. R. Norman: Phys. Rev. Lett. 85, 3261 (2000)
175. M. Eschrig, M. R. Norman: Phys. Rev. B 67, 144503 (2003)
176. X. J. Zhou, T. Yoshida, A. Lanzara et al: Nature 423, 398 (2003)
177. K. M. Shen, F. Ronnig, D. H. Lu et al: Phys. Rev. Lett. 93, 267002 (2004)
178. T. Xiang, J. M. Wheatley: Phys. Rev. B 54, R12653 (1996)
179. B. Kyung, R. A. Ferrell: Phys. Rev. B 54, 10125 (1996)
180. J. J. M. Pothuizen1, R. Eder1, N. T. Hien et al: Phys. Rev. Lett. 78, 717 (1997)
181. K. A. Chao, J. Spalek, A. M. Oles: J. Phys. C 10, L271 (1977)
182. C. Gross, R. Joynt, T. M. Rice: Phys. Rev. B 36, 381 (1987)
183. M. Brunner, F. F. Assaad, A. Muramatsu: Phys. Rev. B 62, 15480 (2000).
184. V. I. Belinicher, A. L. Chernyshev, V. A. Shubin: Phys. Rev. B 53, 335 (1996)
185. V. I. Belinicher, A. L. Chernyshev, V. A. Shubin: Phys. Rev. B 54, 14914
(1996)
186. T. Tohyama, S. Maekawa: Superconductors Science and Technology 13, R17
(2000)
187. J. Ba la, A. M. Oleś, J. Zaanen: Phys. Rev. B 52 4597 (1995)
188. S. Schmitt-Rink, C. M. Varma, A. E. Ruckenstein: Phys. Rev. Lett. 60, 2793
(1988)
189. Z. Liu, E. Manousakis: Phys. Rev. B 44, 2414 (1991)
190. Z. Liu, E. Manousakis: Phys. Rev. B 45, 2425 (1992)
Spectroscopic Properties of Polarons by Exact Monte Carlo 41
191. B. Bauml, G. Wellein, H. Fehske: Phys. Rev. B 58, 3663 (1998)
192. A. Ramšak, P. Horsch, P. Fulde: Phys. Rev. B 46, 14305 (1992)
193. B. Kyung, S. I. Mukhin, V. N. Kostur, R. A. Ferrell: Phys. Rev. B 54, 13167
(1996)
194. F. Marsiglio F, A. E. Ruckenstein, S. Schmitt-Rink, C. Varma: Phys. Rev. B
43, 10882 (1991)
195. G. Martinez, P. Horsch: Phys. Rev. B 44, 317 (1991)
196. O. Rösch, O. Gunnarsson: Phys. Rev. B 73, 174521 (2006)
197. A. S. Mishchenko: Pis’ma Zh. Eksp. Teor. Fiz. 66, 460 (1997) [JETP Lett. 66,
487 (1997)]
198. O. Rösch, O. Gunnarsson: Europhys. Phys. J. B 43, 11 (2005)
199. G. Sangiovanni, O. Gunnarsson, E. Koch, C. Castellani, M. Capone: cond-
mat/0602606.
200. O. Rösch, O. Gunnarsson: Phys. Rev. B 70, 224518 (2004).
201. G.-H. Gweon, T. Sasagawa, S. Y. Zhou et al: Nature 430, 187 (2004)
202. O. Rösch, O. Gunnarsson, X. J. Zhou et al: Phys. Rev. Lett. 95, 227002 (2005)
203. G. A. Sawatzky: Nature (London) 342B, 480 (1989)
204. O. Rösch, O. Gunnarsson: Phys. Rev. Lett. 92, 146403 (2004)
205. S. Ishihara , N. Nagaosa: Phys. Rev. B 69, 144520 (2004)
ABSTRACT
  We present recent advances in understanding of the ground and excited states
of the electron-phonon coupled systems obtained by novel methods of
Diagrammatic Monte Carlo and Stochastic Optimization, which enable the
approximation-free calculation of Matsubara Green function in imaginary times
and perform unbiased analytic continuation to real frequencies. We present
exact numeric results on the ground state properties, Lehmann spectral function
and optical conductivity of different strongly correlated systems: Frohlich
polaron, Rashba-Pekar exciton-polaron, pseudo Jahn-Teller polaron, exciton, and
interacting with phonons hole in the t-J model.

<|endoftext|><|startoftext|>
Introduction By Way of Reprise: From Box-Kites
to ETs
The creation of 2N-dimensional analogues of Complex Numbers (and it was not a
trivial insight of 19th Century algebra that legitimate analogs always have dimen-
sion a power of 2) is handled by a now well-known algorithm called the Cayley-
Dickson Process (CDP). Its name suggests a compressed account of its history:
for Arthur Cayley – simultaneously with, but independently of, John Graves –
∗Email address: rdemarrais@alum.mit.edu
http://arxiv.org/abs/0704.0026v3
jumped on Hamilton’s initial generalization of the 2-D Imaginaries to the 4-D
Quaternions within weeks of its announcement, producing – by the method later
streamlined into Leonard Dickson’s close-to-modern “cookie-cutter” procedure –
the 8-D Octonions. The hope, voiced by no less than Gauss, had been that an
infinity of new forms of Number were lurking out there, with wondrous proper-
ties just awaiting discovery, whose magical utility would more than compensate
for the loss of things long taken for granted as their seekers ascended into higher
dimensions. But such fantasies were quashed quite abruptly by Adolph Hurwitz’s
proof, just a few years before the 20th Century loomed, that it only took four
dimension-doublings past the Real Number Line to find trouble: the 16-D Sede-
nions had zero-divisors, which meant division algebra itself broke down, which
meant researchers were so at a loss to find anything good to say about such Num-
bers that nobody bothered to even give their 32-D immediate successors a name,
much less investigate them seriously.
But it is with these 32-D “Pathions” (for short for “pathological,” which we’ll
call them from now on) that our own account will pick up in this second part
of our study of “placeholder substructures” (i.e., “zero divisors”) For, due to a
phenomenon we dubbed carrybit overflow in the first installment, strange yet pre-
dictable things are found to be afoot in the ZD equivalent of a “Cayley Table.” As
we’ll see shortly, this is a listing, in a square array, of the ZD “emanations” (or
lack of same) of all ZD “elements” with each other – all, that is, sharing mem-
bership in an ensemble defined not by a shared “identity element,” but a common
strut constant.
What we’ll see is that the lacks are of the essence: for each doubling of N, the
Emanation Table (ET) for the 2N+1-ions of same strut-constant will contain that
of its predecessor, leading to an infinite “boxes-within-boxes” deployment whose
empty cells define, as N grows ever larger, an unmistakable fractal limit. The full
algorithmic analysis of such Matrioshka-doll-like “meta-fractal” aspects – by the
simple rules of what we’ll call “recipe theory” (after the R, C, and P values related
to the Row label, Column label, and their cell-specific Products in such Tables) –
must await our third and last installment. But the colored-quilt-like graphics can
be viewed by any interested readers at their leisure, in the Powerpoint slide-show
online at Wolfram Science from our mid-June presentation at NKS 2006.[1] (The
slide-show’s title is almost identical to that of this monograph, as this latter is
meant to be the “theorem/proof” exposition of that iconic, hence largely intuitive
and empirically driven narration.)
What we’ll need to undertake this voyage is a quick reprise of the results
from Part I [2]. As the hardest part (as a hundred years of denial would imply)
is finding the right way to think about the phenomenology of zero-division, not
understanding its basic workings once they’re hit upon, such a summary can be
much more brief and easy to follow than the proofs required to produce and justify
it. We need but grasp 3 rather simple things. First, we must internalize the path
and vertex structure of an Octahedron – for, properly annotated and storyboarded,
this will provide us with the Box-Kite representation that completely catalogs ZDs
in the 16-D arena where they first emerged (and, as we’ll see in our Roundabout
Theorem herein, underwrites all higher-dimensional ZD emergences as well).
Second, instead of the cumbersome apparatus of CDP that one finds in algebra
texts and the occasional software treatment, we offer two easy algebraic one-liners
which (inspired by Dr. Seuss’s “Thing 1” and “Thing 2”), we simply call “Rule
1” and “Rule 2” – which operate, in almost Pythagorean earnest, on triplets of
integers (indices of associative triplets among our Hypercomplex Units, as we’ll
learn), and which, by so doing, accomplish everything the usual CDP tactics do,
but without the all-too-frequent obfuscation. (There is also a very useful, albeit
quite trivial, “Rule 0,” which merely states that any integer-triple serving to index
an associative triplet for one power of N will continue to do so for all higher pow-
ers. What makes this useful is its allowing us to recursively take triplet “givens”
for lower-level 2N-ions than those of current interest and toss them into the central
circle of the third thing we must grasp.)
We’ll need, that is, to be able to draw the simplest finite projective group’s
7-line, 7-node representation, the so-called PSL(2,7) triangle. The Rules, plus
the Triangle, applied to Box-Kite edge-tracings and nodal indices, are all we’ll
need. Indeed, the Box-Kite itself can be readily derived from the Triangle, by
suppressing the central node, and then recognizing four correspondences. First,
see the Triangle’s 3 triple-noded sides – two vertices plus midpoint – as the sources
of the Box-Kite’s trio of “filled-in” triangles dubbed Trefoil Sails. Second, link
the 1 triple-noded circle (which is a projective line, after all), wrapped around the
suppressed center and threading the midpoints, as the 4th such triangle, the quite
special “Zigzag Sail.” Third, envision the 3 lines from midpoints to angles as
underwriting the ZD-challenged part of the diagram (because ZDs housed at the
midpoint node cannot mutually zero-divide any housed at the opposite, vertex,
node), the struts (whence strut constants). Fourth and last, imagine the other four
triangles of the Box-Kite (meeting, as with the first four, each to each, at corners
only, like same-colored checkerboard squares) as the vents where the wind blows.
They keep the kite afloat, letting the four prettily colored jib-shaped Sails show
off, while the trio of wooden or plastic dowels that form the struts thanklessly
provide the structural stability that makes the kite able to fly in the first place.
As Euclid knew well, 3 points determine a Triangle as well as a Circle – which
is how we can glibly switch gears between representations based on these projec-
tive lines. But the easy convertibility of lines to circles is what projective means
here – and is, as well, at the very heart of linking the above geometrical images
to Imaginary Numbers. From Argand’s diagram to Riemann’s Sphere, this has
been the essence of Complex geometry. On the latter image only, place a sphere
on a flat tabletop, call the point of contact S (for “South”), and then direct rays
from its polar opposite point N. Rays through the equator intersect the table in
a circle whose radius we ascribe an absolute value of 1, with center S = 0. This
circle is just the trace of the usual ei·2π·θ exponential-orbit equation, with the i in
the exponent, of course, being the standard Imaginary. Any diameter through this
circle, extended indefinitely in either direction, is clearly a “projective pencil” of
a circular motion in the plane containing both it and N, and centered on the latter.
What each “line,” then, in the PSL(2,7) triangle represents is a coherent sys-
tem interrelating 3 distinct imaginaries, one per nodal point: that is, a “Quater-
nion copy” sans the Reals (which latter, like our N,S polar axis in the above, must
stand “outside” the Number Space itself, since 3-D visualization is all used up by
the nodes’ dimensional requirements). Hence, the 7 lines are the 7 interconnected
Quaternion copies which constitute the 8-D Octonions. And what makes this espe-
cially rich for our purposes is the built-in recursiveness of this Octonion-labeling
scheme for higher-dimensional isomorphs, embedded in the sorts of ensembles
we’ll be needing ETs to investigate more thoroughly.
To see how this relates to actual integers, take the prototype of the 7 lines
in the Triangle, and consider the Quaternions strictly from the vantage of CDP’s
Rule 1. The first task in studying any system of 2N-ions is generating its units, so
start with N = 0. Treat this singleton as the index of the Real axis: i0, that is, is
identically 1. Add a unit whose index = 20 = 1 and we have the complex plane.
Now, add in a unit whose index is the next available power of 2 – with N = 1, this
is 2 itself. Call this unit and its index G for Generator, and declare this inductive
rule: the index of the product of any two units is always the XOR of the indices of
the units being multiplied; but, for any unit with index u < G, the product of said
unit, written on the left (right), with the Generator written on its right (left), has
index equal to their indices’ simple sum, and sign equal (opposite) to the product
of the signs of their units’: i1 · i2 =+i3, but i2 · i1 =−i3. But this is just a standard
way of summarizing Quaternion multiplication.
Now, set N = 2, making G = 4. Applying the same logic, but slightly general-
ized, we get three more triplets of indices. Dispensing with the tedious overhead
of explicitly writing the indices as subscripts to explicit copies of the letter i, these
are written in cyclical positive order (CPO) as follows: (1,4,5);(2,4,6);(3,4,7).
(CPO is not mysterious: it just means read the triplet listing in left-right order,
and so long as we multiply any unit with any such index by the unit whose index
is to the right of it, the third term will result with signing as specified above: e.g.,
i4 · i5 = +i1; i4 · i3 = −i7.) We now have 4 of the Octonions’ 7 triplets, forming
labels on the nodes of 4 of PSL(2,7)’s lines. Call the central circle spanning the
medians the Rule 0 line (the Quaternions’ “starter kit” we just fed into our Rule 1
induction machine). Putting G = 4 in the center, the 3 lines through it are our Rule
1 triplets. If we further array the Quaternion index-set (1,2,3) in clockwise order
around the 4, starting from the left slope’s midpoint at 10 o’clock, these lines are
all oriented pointing into the angles. Now, with “Rule 2,” let’s construct the lines
along the Triangle’s sides.
Here’s all that Rule 2 says: given an associative index-triplet (henceforth, trip)
like the Quaternions’ (1,2,3), fix any one among them, then take its two CPO
successors and add G to them. Swap the order of the resulting two new units,
and you have a new trip. Hence, fixing 1, 2, and 3 in turn, in that order, Rule 2
gives us these 3 triplets: (1,7,6);(2,5,7);(3,6,5). If you’ve drawn PSL(2,7) with
the Octonion labels per the instructions in the last paragraph, you’ve already seen
these 3 trips are the answers ... and now you know how and why they’re oriented,
too. (Clockwise, in parallel with the Rule 0 circle).
We’ve now laid out all the ingredients we need to do a basic run-through of
Box-Kite properties. We’ll merely state and describe them, rather than prove them
(but we’ll give the Roman numerals of the theorem numbers from last installment,
for those who want to follow them). The first feature in need of elucidating,
which should have those who’ve been reading attentively scratching their heads
just about now, is this: the relations between the indices at the nodes of PSL(2,7)
qua Octonion labeling scheme are clear enough; but how can these same labeled
nodes serve to underwrite the 16-D Sedenion framework that Box-Kites reside in?
The answer has two parts.
First part: since all Imaginaries have negative Reals as squares, Imaginaries
whose products are zero must have different indices – meaning that the simple
case (which we call “primitive” ZDs) will always involve products of pairs of
differently-indexed units, whose respective planes share no points other than 0
[IV]. Second part: given any such ZD dyad, neither index can ever equal G [II];
and, one must have index > G, while the other has index < G [I, III]. The Oc-
tonion labeling scheme maps to the four Sails of a legitimate Sedenion Box-Kite
[V], because it only provides the low-index labels at each of the 6 Octahedral
vertices.
The 4 in the center of our example, meanwhile, is no longer the G for this
setup, since that role in now played by 8 (the next power of 2 in the CDP induc-
tion). In the context of the Box-Kite scheme, it is now represented by a different
letter: S, for strut constant – the only Octonion index not on a Box-Kite vertex.
Which is why, from one vantage, there are 7 distinct (but isomorphic) Box-
Kites in Sedenion space: because we’ve 7 choices of which Octonion to suppress!
6 vertices times 7 gives us the 42 Assessors of our first ZD paper [3], a term
we’ll use interchangeably with dyad throughout. We can, in fact, tug on the net-
work of interconnected lines “wok-cooking” style, stirring things into and out of
the hot oil in the center of the Box-Kite. (S as ”Stir-fry constant”?) To find the
“Octonion copy” labeling low indices on Box-Kite vertices where the 5, say, is
suppressed, trace the line containing it and the 4, and “rotate”: the 1 now goes
from the left slope’s midpoint to the bottom right angle, to be replaced by the 4
while the 5 heads for the middle, with CPO order (and hence, orientation of the
line) remaining unchanged. Of the other 2 trips the 5 belongs to, only one will
preserve midpoint-to-angle orientation along the 6 o’clock-to-midnight vertical:
(2,5,7), as one can check in an instance. (The two possibilities must orient oppo-
sitely when placed along the same line, since one is Rule 1, the other Rule 2.)
From this point, everything is forced. This is obviously a procedure that is
trivial to automate, for any “Octonion copy,” regardless of the ambient dimen-
sionality the Box-Kite it underwrites might float in. This simple insight will be
the basis, in fact, of our proof method, both in this paper and its sequel. Another
simple insight will tell us how to find the high-index term for any vertex’s dyad.
Two indices per vertex leaves 4 that are suppressed: 0 (for the Reals), G and S,
and the XOR (and also simple sum) of the latter two, which we’ll shorthand X.
These four clearly form a Quaternion copy – one, in fact, which has no involve-
ment whatsoever in its containing Box-Kite’s zero-divisions. Putting the index of
the one among these which is itself an L-unit center stage gives us the full array of
L-index sets (trips composed of those indices of a Sail’s 3 vertices <G) associated
with the 4 Sails. Putting in G or X, then, must give us the full array of U-index
sets (“U” as in “upper”).
Since each node belongs to 3 lines in PSL(2,7), the strut constant belongs to
3 trips, each containing one term from the Rule 0 Zigzag Sail’s L-index set, and
one from the Vent which resides opposite it on the Box-Kite’s octahedral frame.
In the Sedenions, three simple rules govern interactions of the Vent and Zigzag
dyads sharing a strut. Writing the U- and L- index terms in upper and lower case
respectively, we can symbolize their dyads as (V, v) and (Z, z) respectively. The
“Three Viziers” (derived as side-effects of [VII], with one for each non-0 member
of our ZD-free index set) read as follows:
VZ1: v · z =V ·Z = S
VZ2: Z · v =V · z = G
VZ3: V · v = z ·Z = X.
The First Vizier motivates the term strut constant: for the same pattern obtains
for it, regardless of the strut being investigated. The Second Vizier shows us
that G connects strut opposites, always by Rule 1 logic. But clearly, the Third
Vizier gives us the simplest way to answer any questions concerning the relations
between indices within a dyad: the L- and U- indices of any dyad belong to the
same trip as X, with CPO ordering determined by whether or not the dyad belongs
to the Zigzag proper or the Vent opposite it.
Beyond the Sedenions, VZ2 is universally true, but VZ1 and VZ3 are only so
up to sign: e.g., the VZ1 L-trip for an arbitrary strut can read (z,c,S) in certain
higher-dimensional contexts. This is ultimately a side-effect of the same “carrybit
overflow” that creates the phenomenon of most interest to us here, the “missing
box-kites” in all 2N-ions, N at least 5, for S > 8 and not a power of 2. Correlated
with such ZD-free structures are “Type II” box-kites with S < 8 (or, more gen-
erally, ¡ G/2), indistinguishable from the standard “Type I” variety but for strut
orientations (with exactly 2 of a “Type II”’s 3 struts always being reversed: see
Appendix B). Their “twist products” (operating similarly on parallel sides of each
of the 3 orthogonal squares or “catamarans” of a box-kite’s orthogonal wire-frame,
as opposed to the 4 triangular “sails” which are our sole focus in this monograph)
let them act as middlemen between the normal and ZD-free structures. Our ar-
guments here will make no use of such “twist product” subtleties (on which, see
Theorem 6 in Part I and the caveat that follows it, and the more developed re-
marks and diagrams in [4]). Indeed, their phenomenology falls “under the radar”
of our Sail-based analysis: strut-opposite Assessors, after all, do not mutually
zero-divide.
Given our limited purposes here, therefore, our toolkit, once the Viziers are
dropped in it, is complete for all our later proofs. (We must simply remember
that invocations of VZ1 and VZ3 implicitly concern sign-free relations between
Vent and Zigzag terms – that is, indices of XOR products only.) What’s left to do
still: get our hands messy with the plumbing, and then clean up with a last grand
construct. Let’s start with the plumbing, and add some notation. Label the Zigzag
dyads with the letters A, B, C; label their strut-opposite terms in the Vent F, E, D
respectively. Specify the diagonal lines containing all and only ZDs in any such
dyad K as (K, /) and (K, \) – for c · (iK + ik) and c · (iK − ik) respectively, c an
arbitrary real scalar. The twelve edges of the octahedral grid are so many pipes,
through which course the two-way streets of edge-currents: for the 3 edges of the
Zigzag (and the 3 defining the opposite Vent), currents joining arbitrary vertices
M and N are called negative, since they have this form:
(M,/) ·(N, \) = (M, \) · (N,/) = 0
Tracing the perimeter of the Zigzag with one’s finger, performing ZD products
in natural sequence – (A, /)·(B, \), followed by the latter times (C, /), then
this times (A, \) and so forth – one should quickly see how the Zigzag’s name
was suggested. Suppressing all letters, one is left with just this cyclically repeating
sequence: /\/\/\.
Currents along all 6 edges joining Zigzag and Vent dyads, on the contrary, con-
nect similarly sloping diagonals, hence are called positive, yielding the shorthand
sequence ///\\\ for Trefoil sail traversals:
(Z,/) ·(V, /)= (Z, \) ·(V,\)= 0
Consider the chain of ZD multiplications one can make along the Zigzag, be-
tween A and B, then B and C, then C and A, for S = 4. The first term of this
6-cycle of zero products, once fully expanded, is writable thus:
(A, /) ·(B, \) = (i1 + i13) · (i2− i14) = (i3 − i15 + i15 − i3) =
(C, /)−(C, /)= (C, \)−(C, \)= 0
We can readily see here where the notion of emanation arises: traversing the
edge between any two vertices in a Sail yields a balance-pan pairing of oppositely
signed instances of the terms at the Sail’s third vertex ... the 0 being, then, an
instance of “balanced bookkeeping” (whence the term “Assessor,” our synonym
for “dyad”). This suggests the spontaneous emanation of particle/anti-particle
pairings from the quantum vacuum, rather than true “emptiness.”
Finally, a side-effect of such “Sail dynamics” is this astonishing phenomenon:
each Sail is an interlacing of 4 associative triplets. For the Zigzag, these are the L-
index (a,b,c), plus the 3 U-index trips obtained by replacing all but one of these
lowercase letters with their uppercase partners: ergo, (a,B,C); (A,b,C); (A,B,c).
Ultimately this tells us that ZDs are extreme preservers of order, since they main-
tain associativity in rigorous lock-step patterns, for all 2N-ions, no matter how
close to ∞ their N might become. Put another way, the century-long aversion re-
action experienced by virtually all mathematicians faced with zero-divisors was
profoundly misguided.
2 Emanation Tables: Conventions for Construction
Theorem 7 guaranteed the simple structure of ETs: because any Assessor’s up-
percase index iU is strictly determined by G and S, once we are given these two
values, the table need only track interactions among the lowercase indices iL. This
will only lead to ambiguities in the very place these are meaningful: in the recur-
sive articulation of a boxes-within-boxes tabulation of meta-fractal or Sky behav-
iors. In such cases, the overlaying will be as rich in significance as the multiplicity
of sheets of a Riemann surface in complex analysis.
An ET does for ZD interactivity what a Cayley Table does for abstract groups:
it makes things visible we otherwise could not see – and in a similar way. Each
Assessor’s L-index is entered (in a manner we’ll soon specify) as a row (R) or
column (C) value, with XOR products (P values) among them being placed in the
“spreadsheet cell” (r,c) uniquely fixed by R and C. We’ve noted such values only
get entered if P is the L-index of a legitimate emanation: that is, the Assessor it
represents mutually zero-divides (forms DMZs with, for “divisors making zero”)
both the Assessors represented by the R and C labels of its cell. (As already
suggested, the natural use of the letters R, C, P here inspired calling the study of
NKS-like “simple rules” for cooking fractals from their bit-strings recipe theory.)
Four conventions are used in building ETs: first, their labeling scheme obeys
the same nested-parentheses ordering we’ve already used in designating Assessors
A through F, with D, E, F the strut opposites of A, B, C in reverse of the order just
written. The L-indices, then, are entered as labels running across the top and down
the left. The label of the lowest L-index is placed flush left (abutting the ceiling),
with the corresponding label of its strut opposite being entered flush right (atop
the floor). As there will always be G− 2 (hence, an even number of) indices to
enter, repeating this procedure after each pair has been copied to horizontal and
vertical labels will completely exhaust them all.
Second convention: As the point of an ET is to display all legitimate DMZs,
any cell whose R and C do not mutually zero-divide is left blank – even if, in
fact, there is a well-defined XOR value. Hence, if R and C reference the same
Assessor, the XOR of their L-indices will be 0; if they reference strut opposites,
the XOR will be S. But in both cases, the cell (hence, the P value) is left blank.
All “normal” ETs, then, will have both long diagonals populated by blank cells,
while all other cells are filled.
Third convention: the two ZD diagonals associated with any Assessor are not
distinguished in the ET, although various protocols are possible that would make
doing so easy. The reasons are parsimony and redundancy: rather than create
longer, or twice as many, entries, we assume both entries for the same Box-Kite
edge will contain the positive-sloping diagonal when the lower L-index appears as
the row label, else the negative-sloping diagonal when the higher L-index appears
first instead. Such niceties won’t concern us much here: the key thing is that, in
fact, all 24 filled cells of a Box-Kite’s ET entries can be mapped one-to-one to its
ZD diagonals. Recall, per Theorem 3, that both ZD diagonals of an Assessor form
DMZs with the same Assessor, according to the same edge-sign logic. This leads
us to the . . .
Fourth convention: Although they are superfluous for many purposes, edge
signs provide critical information for others, and so are indicated in all ETs pro-
vided here. Each of a Box-Kite’s 12 edges conducts two currents – one per ZD
diagonal – and does so according to one or the other orientational option. ZD di-
agonals are conventionally inscribed so that the horizontal axis of their Assessor
plane is the L-indexed unit, while the vertical is the U-indexed unit. But even if
this convention were reversed, the diagonal leading from lower left quadrant to up-
per right would still correspond to the state of synchrony implied by ±k(iL + iU):
for some Assessor U, we write (U,/). Conversely, the orthogonal diagonal in-
dicative of anti-synchrony is written (U,\). If DMZs formed by the Assessors
bounding an edge are both of same kind, then we call the edge blue or notate it
[+]; if Assessors U and V only form DMZs from oppositely oriented ZD diag-
onals – (U,/) · (V,\) = 0 ⇔ (U,\) · (V,/) = 0 – then we call the edge red or
notate it [-]. However, for ET purposes, since the red edges are the most infor-
mative (all-red-edged Zigzags providing the stable basis of Box-Kite structure,
while all-red-edged DEF Vents play a key role in twist-product interpreting – a
deep topic touched upon in Part I, which won’t concern us further here), we leave
them unmarked. The six blue edges bounding the hexagonal view of the Box-Kite,
however, are preceded by an extra mark (best interpreted as a dash, rather than a
minus sign). This has the pragmatic advantage that when zoomed, a large ET will
have its entries with an extra mark become unreadable in many software systems
(e.g., one sees only asterisks) – and so we want the unmarked entries to be those
likely to be of most interest.
Since, given X (or, alternatively, G or N, and S), we can reconstruct a Box-
Kite from just its Zigzag’s L-index trip, gleaning this information from an ET is
worth explaining. If a given row contains the indices of any such Zigzag L-trips,
they will appear as the row label itself, plus two unmarked cell entries, with the
column label of the one appearing as the content of the other. (If either cell in such
a complementary set be marked with a dash, then we are dealing with a DEF Vent
index.) Each Zigzag L-trip will also appear 3 times in an ET, once in each row
whose label is one of its indices, its 2 non-label indices appearing in un-dashed
cell entries each time.
Here is a readily interpreted emanation table. Having 6 = 23 − 2 rows and
columns, G = 8, so N = 4, making this a Sedenion ET (encoding, thereby, a
single Box-Kite). And, since 2⊻ 3 = 4⊻ 5 = 6⊻ 7 = 1, the Strut Constant S = 1
as well. A scan of the first row shows 6 and 5 unmarked, under headings 4 and
7 respectively; however, these two labels appear as cell values which are marked,
making these edges that connect Assessors in the D, E, F Vent. In the fourth row
of entries, though, column labels 5 and 3 contain cell values 3 and 5 respectively,
both unmarked. With their row label 6, then, these form the Zigzag L-index set
(3,6,5), which hence must map to Assessors (A,B,C). Using the mirror-opposite
logic of the labeling scheme to determine strut opposites, it is clear that the six row
and column headings (2,4,6,7,5,3) correspond, in that order, to the Assessors
(F,D,B,E,C,A). (The unmarked contents 6 and 5 in the first row, having labels
(2,4) and (2,7), thereby map to edges FD and FE, connecting DEF Vent Assessors
as claimed.) Finally, the long diagonals are all empty: those cells in the diagonal
beginning at the upper left all have identical row and column labels; those in the
mirror-opposite slots, meanwhile, have labels which are strut-opposites. By our
second convention, all these cells are left blank.
2 4 6 7 5 3
2 6 −4 5 −7
4 6 −2 3 −7
6 −4 −2 3 5
7 5 3 −2 −4
5 −7 3 −2 6
3 −7 5 −4 6
Before beginning an in-depth study of emanation tables by type, there is one
general result that applies to them all – and whose proof will give us the chance
to put the Three Viziers to good use. While seemingly quite concrete, we will use
it in roundabout ways to simplify some otherwise quite complicated arguments,
beginning with next section’s Theorem 9. This Roundabout Theorem is our
Theorem 8. The number of filled cells in any emanation table is a multiple of 24.
Proof. Since 24 is the number of filled cells in a Sedenion Box-Kite, this is equiv-
alent to claiming that CDP zero-divisors come in clusters no smaller than Box-
Kites. We have already seen, in Theorem 5, that the existence of a DMZ implies
the 3-Assessor system of a Sail, which further (as Theorem 7 spelled out) entails a
system of 4 interlocking trips: the Sail’s L-trip, plus 3 trips comprising each L-trip
index plus the U-indices of its Assessor’s 2 “sailing partners.” Since we have an
ET, we have a fixed S and fixed G. Hence, if we suppose our DMZ corresponds
to a Zigzag edge-current, we immediately can derive its L-trip by Theorem 5, and
all 3 Zigzag strut-opposites’ L-indices by VZ 1, and all 6 U-indices by VZ 3. We
then can test whether the Trefoil Sails’ edge-currents are all DMZs as follows. As
we wrote in Theorem 7, (u,v,w) maps to the Zigzag L-trip in CPO, but not neces-
sarily in (a,b,c), order: hence, (uopp,wopp,v) is an L-trip, and can be mapped to
any of the Trefoils. In other words, given the Zigzag’s 3-fold rotational symmetry,
proving the truth of the following arithmetical result proves the DMZ status of all
Trefoil edges. Yet we can avail ourselves of all 3 Zigzag U-trips in proving it.
(wopp −Wopp)
(uopp +Uopp)
−V − v
+v +V
The left bottom result is a given of the trip we started with. The result to its
right is a three-step deduction from one of the Zigzag U-trips: use (uopp,w,vopp);
Rule 2 gives (uopp,vopp+G,w+G); the Second Vizier tells us this is (uopp,V,Wopp);
but the negative inner sign on the upper dyad reverses the sign this trip implies,
yielding +V for the answer.
The top results are derived similarly: find which of the 4 Zigzag trips un-
derwrites the Vizier-derived “harmonic” which contains the pair of terms being
multiplied, and flip signs as necessary. Hence, the top left uses (u,wopp,vopp),
then applies Rule 2 and the Second Vizier to get (−V ), while the top right uses
the Zigzag L-trip itself: (u,v,w)→ (w+G,v,u+G) → (Wopp,v,Uopp) – which,
multiplied by (−1), yields (−v). �
Remark. The implication that, regardless of how large N grows, ZDs only increase
in their interconnectedness, rather than see their basic structures atrophy, flies in
the face of a century’s intuition based on the Hurwitz Proof. That there are no
standalone edge-currents, nor even standalone Sails, bespeaks an astonishing (and
hitherto quite unsuspected) stability in the realm of ZDs.
Corollary. An easy calculation makes it clear that the maximum number of filled
cells in any ET for any 2N-ions is just the square of a row or column’s length in
cells, minus twice the same number (to remove all the blanks in long diagonals):
that is, (2N−1 −2)(2N−1 −2)−2 · 2N−1 +4 = (22N−2 −6 · 2N−1 +8) = (2N−1 −
4)(2N−1 − 2) = 4 · (2N−2 − 1)(2N−2 − 2). By Roundabout, we now know this
number is divisible by 24, hence indicates an integer number of Box-Kites. But
two dozen into this number is just (2N−2 − 1)(2N−2 − 2)/6 – the trip count for
the 2N−2-ions! (See Section 2 of Part I.) We have, then, the very important Trip-
Count Two-Step: The maximum number of Box-Kites that can fill a 2N-ion ET =
TripN−2. We will see just how important this corollary is next section.
3 ETs for N > 4 and S ≤ 7
One of the immediate corollaries of our CDP Rules for creating new triplets from
old ones is something we might call the Zero-Padding Lemma: if two k-bit-long
bitstring representations of two integers R and C being XORed are stuffed with
the same number n of 0s between bits j and j+1, 0 ≤ j ≤ k, their XOR will, but
for the extra n bits of 0s in the same positions, be unchanged – and so will the
sign of the product P of CDP-derived imaginary units with these three bit-strings
representing their respective indices.
Examples. (1,2,3)→ (2,4,6)→ (4,8,12) [Add 1, then 2, 0s to the right of
each bitstring]
(1,2,3)→ (1,4,5)→ (1,8,9) [Add 1, then 2, 0s just before the rightmost bit
in each bitstring]
(3,4,7)→ (3,8,11)→ (3,16,19) [Add 1, then 2, 0s just after the leftmost bit
in each bitstring]
Proof. Rule 1 will create a new unit of index G+L from any unit of index
L < G, regardless of what power of 2 G might be. Rule 2, meanwhile, uses
any power of 2 which exceeds all indices of the trip it would operate on, then
adds this G to two of the members of the trip, creating a new trip with reversed
orientation – one of an infinite series of such, differing only in the power of 2
(hence, position of the leftmost bit) used to construct them. The lemma, then, is
an obvious restatement of the fundamental implications of the CDP Rules.
But creation of U-indices associated with L-indices in Assessor dyads is the
direct result of creating new triplets with G+S as their middle term. Hence, if
we call the current generator g and that of the next higher 2N-ions G (= 2 · g),
then if Assessors with L-indices u and v form DMZs in the Sedenions for a given
strut constant S, their U-indices will increment by g in the Pathions, and zero
division will remain unaffected. By induction, the emanation table contents of the
Sedenion (R,C,P) entries will remain unchanged for all N, for all fixed S≤ 7. This
leads us to
Theorem 9. All non-long-diagonal cell entries in all ETs for all N, for all fixed
S ≤ 7, will be filled.
Proof. Keeping the same notation, the 2N-ions will have g more Assessors than
their predecessors, with indices ranging from g itself to 2g−1 (=G−1). Consider
first some arbitrary Zigzag Assessor with L-index z < g, whose U-index is G+
z ·S. (If it were a Vent Assessor, or a Zigzag on a reversed-orientation strut in a
“Type II” box-kite, the second part of the expression would be reversed: S · z, per
the First Vizier. This effects triplet orientation, but not absolute value of the index,
however, and it is only the latter which matters at the moment.) Now consider the
Assessor whose L-index is the lowest of those new to the 2N-ions, g. We know
it is a Vent Assessor, in all Box-Kites with S < g, of which there are 7 per each
such S in the Pathions, 35 in the 64-D 26-ions, and so on: for it belongs to the trip
(S,g,g+S) (Rule 1), so that its U-index appears on its immediate left in the triplet
(G+g+S,g,G+S) (Rule 2 and last parentheses). Its U-index, then, is G+(g⊻
S), or (recall Rule 1) just G+ g+S. We claim these Assessors form DMZs; or,
writing out the arithmetic, that the following term-by-term multiplication is true:
+g+(G+g+S)
+z+(G+ z·S)
−(G+g+ z ·S)− (z+g)
+(z+g)+(G+g+ z ·S)
Because one Assessor is assumed a Zigzag, while the other is proven a Vent,
the inner signs will be the same. (Simple sign reversals, akin to those involving our
frequently invoked binary variable sg, will let us generalize our proof to include
the Vent-times-Vent case later.) Let’s examine the terms one at a time, starting
with the bottom line. Its left term is an obvious application of Rule 1, as z < g,
the latter being the Generator of the prior CDP level which also contained z as an
L-index. The term on bottom right we derive as follows: we know that z and its
U-index partner in the 2N−1-ions belong to the triplet mediated by g+S: (z,g+
z ·S,g+S). Supplementing this CPO expression by adding G to the right-hand
terms (Rule 2), we get the triplet containing both multiplicands of the bottom-
right quantity: (z,G+g+S,G+g+ z ·S). The multiplicands appear in this trip
in their order of application in forming the product; therefore, their resultant is a
plus-signed copy of the trip’s third term, as shown above.
Moving to the left-hand term of the top line, what trip do the multiplicands
belong to? Within the prior generation, Rule 1 tells us that z’s strut opposite, z ·S,
multiplies g on the left to yield g+ z ·S. Application of Rule 2 to the terms 6= g
reverses order and gives us this: (G+g+z ·S,g,G+z ·S). But what we’ve written
above is the product of multiplying the third and second terms of the trip together,
in CPO-reversed order; hence, the negative sign is correct. Finally, we get the
negative of (z+g) by similar tactics: the term is the U-index of z’s strut-opposite
Assessor in the prior CDP generation, hence belongs to the trip with this CPO
expression: (g+S,z+g,z ·S). Rule 2 gives us (G+g+S,G+z ·S,z+g). Hence,
the product written above is properly signed.
Now, what effect does our initial assumption that z is the L-index of a Zigzag
Assessor have on the argument? The lower-left term is obviously unaffected. But
the upper-left term, perhaps less obviously, also is unchanged: while it seems to
depend on z ·S, in fact this is only used to define the L-index of z’s strut opposite,
which multiplies g on the left to precisely the same effect as z itself, both being less
than it. The two terms on the right, just as clearly, do have their signs changed, for
in both, the order relations of L- and U- indices vis à vis G+S or X are necessarily
invoked. But both signs on the right can be re-reversed to obtain the desired result
if we change the inner sign of the topmost expression – which is to say, we have an
effect analogous to that achieved in earlier arguments by use of the binary variable
sg, as claimed.
Since one CDP level’s G is the g of the next level up, the above demonstration
clearly obtains, by the obvious induction, for all 2N-ions including and beyond the
Pathions. But what if one or both L-indices in a candidate DMZ pairing exceed g?
Rather than answer directly, we use the Roundabout Theorem of last section.
Given a DMZ involving Assessors with L-indices u < g and g, we are assured a
full Box-Kite exists with a Trefoil L-trip (u,g,g+ u). The remaining Assessors,
being their strut opposites, then have L-indices uopp,g+ S, and g+ u · S. As u
varies from 1 to 7, skipping S < 8, zero-padding assures us that all DMZs from
prior CDP generations exist for higher N, for all L-indices u,v < 8. Only those
Box-Kites created by zero-padding from prior-generation Box-Kites (of which
there can be but 1 inherited per fixed S among the 7 found in the Pathions, for
instance) will have all L-indices < g. For all others, the model shown with those
having g as an L-index must obtain. Hence, only one strut will have L-indices
< g, the rest being comprised of some w with L-index ≥ 8, the others deriving
their L-indices from the XOR of w with the strut just mentioned, or with S.
But what will guarantee that any edge-currents will exist between arbitrary
Assessors with L-indices u < g and g+ k,0 < k < g, since there is not even one
DMZ to be found among Assessors with L-indices ≤ g in the candidate Box-
Kite they would share? We can now narrow the focus of our original question
considerably, by making use of the curious computational fact we called the Trip-
Count Two-Step.
In Part I’s preliminary arguments concerning CDP, we showed that the number
of associative triplets in a given generation of 2N-ions, or TripN , can be derived
from a simple combinatoric formula. Call the count of complete Box-Kites in
an ET BKN,S. For S < 8, BKN,S = TripN−2, provided all L-indices g+ k,0 <
k < g, form DMZs in the candidate Box-Kites implied. To begin an induction,
let us consider a new construction along familiar lines, which will provide us an
easy way to comprehend the Pathion trip-systems of all S < 8. Beginning with
N = 5, we designate TripN−2 trips for each S < 8 as type Rule 0, in the manner
the singleton 22-ion trip (1,2,3) was used in our introduction’s ”wok-cooking”
discussion (which Part I, Section 5, used as the basis of its “slipcover proofs”).
But now, instead of putting the Octonions’ G = 4 in the center of the PSL(2,7)
triangle, we put the Sedenions’ 8.
For consistency of examples, we continue to assume S= 1, so we’ll begin with
(3,6,5), the Zigzag L-trip for S = 1 in the Sedenions, and also, by zero-padding,
an L-trip Zigzag for 1 of the 7 Box-Kites with S = 1 among the Pathions. Ex-
tending rays from the (3,6,5) midpoints through the center creates Rule 1 trips
which end in 11,14,13: (a,b,c) get sent to (F,E,D) respectively. The Rule 2
trips along the sides, in order of Zigzag L-index inclusion, then correspond to
Trefoil U-trips, all oriented clockwise. They read symbolically (literally) as fol-
lows: EaD (14,3,13);DbF (13,6,11);FcE (11,5,14). We claim each of these 7
lines, when its nodes are attached to their strut opposites, map 1-to-1 to an S = 1
Pathion Box-Kite. We have this as a given for the Rule 0 trip; we need to ex-
plain this for the Rule 1 trips (which Roundabout already tells us are Box-Kites);
and, we need to prove it for the Rule 2 trips that make the sides. (And, once
we do prove it, and frame the suitable induction for all higher N, the task which
originally motivated us will be done: for these U-trips house the Assessors with
L-indices > g, whose candidate Box-Kites don’t include g.)
The Rule 1 trips, in all instances within this example, correspond to Asses-
sor L-indices (a,d,e). With g = 8 at d, the Third Vizier tells us c = 8+ S =
Sedenion X. (a,b,c) thereby reads, within the Sedenions, as (a,A,X). But in the
Pathions, all 3 terms are less than G, hence can comprise an L-index trip for a Sail
– and specifically, a Zigzag (else the order of A and X would be reversed). Simi-
larly, the old Sedenion ( f ,F) are the new Pathion ( f ,e), with the new trip ( f ,c,e)
being the Third Vizier’s way of saying ( f ,X ,F) from the Sedenions’ vantage.
For the Rule 2 trips, we prove one relation in one of them a DMZ, which
Roundabout tells us implies the whole Box-Kite, while symmetry allows us to
assume the same of the other two. Consider, then, the aDE Trefoil U-trip, in-
stantiated by (3,13,14) in our example; specifically, compute the product of the
Assessors containing a and D = c+ g as L-indices. Their U-indices within the
Pathions must be (G+ a⊻S) = (G+ f ), and (G+ g+ c⊻S) = (G+ g+ d) re-
spectively. We write their dyads when multiplying with opposite inner signs, as
we assume their DMZ is an edge in a Zigzag. We claim the truth of this arithmetic:
+(c+g)− (G+g+d)
+a + (G+ f )
+(G+g+ e)− (b+g)
+(b+g)− (G+g+ e)
Bottom left: (a,b,c)→ (a,c+g,b+g) (Rule 2, with N = 4.)
Bottom right: (a,d,e)→ (a,g+ e,g+d)→ (a,G+g+d,G+g+ e) (Rule 2
twice, N = 4, then N = 5.) Upper dyad’s inner sign reverses that of product.
Top left: ( f ,c,e)→ (e+g,c+g, f )→ (G+ f ,c+g,G+e+g) (Rule 2 twice,
N = 4, then N = 5.)
Top right: ( f ,d,b) → (b+ g,d + g, f ) → (b+ g,G+ f ,G+ g+ d) (Rule 2
twice, N = 4, then N −5.) Upper dyad’s inner sign reverses that of product.
A similar brief exercise with either DMZ formed with the emanated Assessor
will show it, too, has a negative inner sign with respect to a positive in its DMZ
partner. Two negative edge-signs in one Sail means Zigzag (means three negative
edge-signs, in fact). Our proof up through the Pathions is complete; we need only
indicate the existence of a constructive mechanism for pursuing this same strategy
as N grows arbitrarily large.
Consider now the same PSL(2,7) triangle, but in its center put a 16 (= g=G/2
for the 64-D Chingons, after the 64 Hexagrams of the I Ching, to give them a
name). Then, put all 7 of the Pathions’ S = 1 Zigzag L-trips into the Rule 0 circle.
One gets 3 · 7 = 21 Rule 2 Zigzag L-trips, and the 10 integers < g found in them
and the 7 Rule 0 Zigzag L-trips implies there are 10 Rule 1 Trefoil L-trips, each
associated with a distinct Box-Kite. But that would make for 7+ 21+ 10 = 38
Zigzag L-trips, when we know there can only be 35. The extra 3 indicate there’s
some double-duty occurring: specifically, 3 of the Rule 1 Trefoil L-trips in fact
designate not the standard (a,d,e), but ( f ,d,b), with d = g = 16 in each instance.
When (5,14,11) is fed into our “trip machine” as Rule 0 circle, both (11,16,27)
and (14,16,30) map to ( f ,d,b) trips tied to Rule 0 Zigzag L-indices (10,27,17)
and (15,30,17), whose (a,d,e) trips appear as rays on triangles for (3,10,9) and
(3,13,14) respectively. (11,16,27) also shows as an ( f ,d,b) with Rule 0 trip
(6,11,13). (Readers are encouraged to use the code in the appendix to [4], to
generate ETs for low S and N. Trip-machining details for our S = 1 example are
in Appendix A.) For N = 7, use the 35 just-derived S = 1 L-trips as Rule 0 circles
with a central 32, and so on. �
4 The Number Hub Theorem (S = 2N−2) for 2N-ions
Given the lengths required to prove the fullness of ETs for S < 8, it might be
surprising to realize that the infinite number of cases for S = 2N−2 for all 2N-ions
are so simple to handle that they almost prove themselves. Yet the proof of this
Number Hub Theorem, while technically trivial, has far-reaching implications.
Theorem 10. For all 2N-ions with ZDs (N > 3), and S = g = G/2, all non-long-
diagonal entries in the emanation table are filled; more, each such filled cell in
the ET’s upper left quadrant is unmarked (indeed, indicates an edge-current in a
Zigzag); further, the row, column, and cell entries are isomorphic to those found
in an unsigned, CDP-generated, multiplication table for the 2N−2-ions; finally, the
TripN−2 Zigzag L-index sets which underwrite its Box-Kites are precisely all and
only those trips contained in said 2N−2-ions, the ET effectively serving as their
high-level atlas.
Proof. As the largest L-index of any Assessor is 2g− 1, and each S in the ETs
in question is precisely g, then the row (column) labels will ascend from 1 to
g− 1 in simple increments from top to bottom (left to right) in the upper left
quadrant, making its square of filled cells isomorphic to unsigned entries in the
corresponding 2N−2-ion multiplication table. Also, all these filled cells of the ET
will only contain XORs of indices < g. Hence, all and only L-index trips will have
the edges of their (necessarily Zigzag) Sails residing in said quadrant. All non-
long-diagonal cells in the ET are meanwhile filled, since all candidate Assessors
have form M = (m,G+ g+m), and for any CPO triplet (a,b,c) whose row and
column labels plus cell entry are contained in the upper left quadrant, it is easy to
show that the following arithmetic is true:
+b− (G+g+b)
+a + (G+g+a)
+(G+g+ c)− c
+c− (G+g+ c)
Therefore, the TripN−2 Box-Kites, the Zigzag L-index set of each of which is
one of the TripN−2 trips contained in the 2
N−2-ions, all have this simple form:
(a,b,c,d,e, f ) = (a,b,c,g+ c,g+b,g+a) �
Remarks. As will become ever more evident, powers of 2 – which is to say,
singleton 1-bits in indefinitely long binary bitstrings – play a role in ZD number
theory most readily analogized to that of primes in traditional studies. And while
integer triples (from Pythagoras to Fermat) play a central role in prime-factor-
based traditional studies, all XOR triplets at two CDP generations’ remove from
the power of 2 in question are collected by its ET in this new approach. All other
integers sufficiently large (meaning > 8) are meanwhile associated with fractal
signatures, to each of which is linked a unique infinite-dimensional space spanned
by ZD diagonals. But can such a vantage truly be called Number Theory at all?
We say indeed it can: that it is, in fact, the “new kind of number theory” that must
accompany Stephen Wolfram’s New Kind of Science. In his massive 2002 book,
he tells us that, common wisdom to the contrary, complex behavior can be derived
from the simplest arithmetical behavior. The obstacle to seeing this resides in the
common wisdom itself [5, p. 116]:
· · · traditional mathematics makes a fundamental idealization: it as-
sumes that numbers are elementary objects whose only relevant at-
tribute is their size. But in a computer, numbers are not elementary
objects. Instead, they must be represented explicitly, typically by giv-
ing a sequence of digits.
But that ultimately implies strings of 0’s and 1’s, where the matter of impor-
tance becomes which places in the string are held, and which are vacant: the orig-
inal meaning of our decimal notation’s sense of itself as placeholder arithmetic.
The study of zero divisors – placeholder substructures – then becomes the natural
way to investigate the composite characteristics of Numbers qua bitstrings. When
we discover, in what follows, that composite integers (meaning those requiring
multiple bits to be represented) are inherently linked, when seen as strut-constant
bit-strings, with infinite-dimensional meta-fractals, the continuation of the quote
on the following page should ring true:
In traditional mathematics, the details of how operations performed
on numbers affect sequences of digits are usually considered quite
irrelevant. But · · · precisely by looking at such details, we will be
able to see more clearly how complexity develops in systems based
on numbers.
5 The Sand Mandala Flip-Book (8 < S < 16, N = 5)
In the first concrete exploration of ZD phenomenology beyond the Sedenions [6,
pp. 13-19], a startling set of patterns were discovered in the ETs for values of S
beyond the “Bott limit”: that is, for 8 < S < 16 (the upper bound being the G of
the 32-D Pathions), the filled cells sufficed to define not 7, but only 3, Box-Kites
for N = 5; more, the primary geometric figures in each such ET transformed into
each other with each integer increment of S, in a manner exactly reminiscent of the
flip-books which anticipated cartoon animation. While these seemed perplexing
in mid-2002 when they were found, their logic is in fact profoundly simple.
First, each such ET’s S is just the X of one already seen in the Sedenions. We
continue our convention of using g to indicate the G of the prior CDP genera-
tion, employ s for said generation’s S, and reference all prior Assessor indices by
suffixing their letters with asterisks. Then, since S = g+ s, the trip (s,g,g+ s)
mandates, by the First Vizier (whose signed version we invoke due to the direct
derivation from the Sedenions), that g must belong to the Zigzag Sail if it’s to be
an Assessor L-index at all.
Note that this is not a truly legitimate argument, as we’ll see shortly, albeit the
results are correct, as shown by other means in [6]: this is because “Type II” box-
kites first emerge in this current context – but are not among the 3 x 7 “flip-book”
denizens of immediate interest. We will assume, for simplicity of presentation,
that the First Vizier does obtain here: proving that it does, however, requires a
background argument concerning “Type II” box-kites: their S values must be less
than g, hence none of our flip-book candidates can qualify. (But they are just as
numerous as the flip-book box-kites, there being 3 for each of the seven values of
S < g. For their listing, and theoretical framing of “Type II” phenomenology, see
Appendix B.)
We will content ourselves here with giving this as an empirical result, and
assume, therefore, the validity of the signed version of the First Vizier in the
case at hand. Based on this assumption, we can further claim that the Sedenion
Vent L-indices, f∗,e∗,d∗, must also be associated with Zigzag Assessors. By an
argument exactly akin to that of last section, we then have 3 candidate Box-Kites
to consider: since the 3 Vent L-indices are all less than g, they must be mapped to
the 3 Assessors A, g = 8 must adhere to B (and s = 1 to E), while the L-indices of
the C Assessors associated with f∗,e∗,d∗ must be A*, B*, C* respectively. The
proof is easy: taking the new A, C Assessors = ( f∗,G+g+a∗) and (g,G+ s) in
that order as readily generalizable representatives, we do the arithmetic.
+g− (G+ s)
+ f ∗ + (G+g+a∗)
+(G+a∗) − ( f ∗+g)
+( f ∗+g)− (G+a∗)
The bottom left is just Rule 1. For the bottom right, start with the First Vizier:
( f∗,a∗,s)→ ( f∗,G+ s,G+a∗)→ ( f∗) · (−(G+ s)) =−(G+a∗). The top left
is derived thus: (a∗, g, g+a∗)→ (g, G+a∗, G+g+a∗)→ (G+g+a∗) · g =
+(G+a∗). Finally, (a∗,s, f∗)→ (g+a∗,g+ f∗,s)→ (G+g+a∗,G+s,g+ f∗),
but the negative inner sign of the top dyad reverses sign as shown.
The 3 Box-Kites thus derived are the only among the 7 candidates to be viable:
for the Zigzag L-index of the S = 1 Sedenion Box-Kite does not underwrite a Sail;
hence, by what lawyers would call a “fruit of the poisoned tree” argument, neither
do the 3 U-trips associated with the same failed Zigzag. Using A* and B*, then
invoking the Roundabout Theorem, we see this readily:
+b∗+(G+g+ e∗)
+a∗ + (G+g+ f∗)
−(G+g+d∗)− c∗
+c∗ − (G+g+d∗)
NOT ZERO (only c*’s cancel)
With the appending of two successive bits to the left, the bottom-left and top-
right products are identical to those obtaining without the (G+g) being included.
Similarly, the top-left product uses Rule 2 twice, to similar effect, but with (G+g)
included in the outcome: since ( f∗,d∗,b∗) is CPO, we then get −(G+g+d∗).
For the top-right result, meanwhile, the two high bits induce a double reversal,
then are killed by XOR, leaving the product the same as if they hadn’t been there:
( f∗,c∗,e∗)→ (g+e∗,c∗,g+ f∗)→ (G+g+ f∗,c∗,G+g+e∗), hence −c∗. We
have an argument reminiscent of Theorem 2: depending on the inner sign of the
upper dyad, one pair of products cancels or the other, but not both.
We see, then, that the construction given without explanation at the end of
Part I is correct. The arguments given there concerning the vital relationship of
a Box-Kite’s non-ZD structures to semiotic modeling suggest that this “offing”
(to use the appropriately binary slang linked to Mafia hitmen) of a Zigzag’s 4
triplets should have a similarly significant role to play in such modeling. This has
bearing not just on semiotic, but physical models, since the key dynamic fact im-
plicit in the Zigzag L- and U- trips (or just Z-trips henceforth) is their similarity of
orientation: since (a,b,c);(a,B,C);(A,b,C);(A,B,c) are all CPO as written, we
are effectively allowed to do pairwise swaps of upper- and lower- case lettering
among them without inducing anything a physicist might deem observable (e.g.,
a 180◦ reversal or “spin quantum”). This condition of trip sync breaks down as
soon as we attempt to allow similar swapping between Z-trips and their Trefoil
compatriots: in particular, those 2 which don’t share an Assessor with the Zigzag.
The toy model of [7] would use these features to designate the basis of a “Cre-
ation Pressure” that leads to the output of the string theorist’s E8 ×E8 symmetry.
This symmetry, as discussed there, breaks in the standard models when one of the
primordial E8’s decays into an E6 – which has 72 roots to parallel the 72 filled
cells of our Sand Mandalas. For present purposes, the key aspect of this corre-
spondence is that, in ZD theory at least, the explosion of a singleton Box-Kite
into a Sand Mandalic trinity throws the off-switch on the source of the dynamics:
the Z-trips which underwrite trip sync no longer even underwrite Box-Kites. The
whole scenario suggests nothing so much as those boxes which, when opened by
pushing an external lever, emit an arm which pulls up on the same lever, forcing
the box to close and the arm to return to its hiding place inside it.
Let’s turn now to the ET graphics of the flip-book sequence, so suggestive of
cellular automata. For each of the 7 ET’s in question, all labels < g are monotoni-
cally increasing, since S, and hence their strut opposites, exceed them all. But the
only filled (but for long-diagonal crossings) rows and columns will be those with
labels equal to S−g = s and its strut-opposite g, for these L-indices reside at E
and B respectively in all 3 Box-Kites in the ensemble, hence either dyad contain-
ing one of them makes DMZs within each of the trio’s (a,d,e) and ( f ,d,b) Sails,
filling all 12 (= 24−2, minus 2 for diagonals) fillable cells in each row or column
tagged with these Assessors’ label. Thus, as s is incremented, two parallel sets of
perpendicular lines of ET cells start off defining a square missing its corners, then
these parallels move in unit increments toward each other, until they form a 2-ply
crossbar once s = 7 (S = 15). 24 cells each have row label R or column label C
= s; 24 reside in lines with label = g; and 24 more have their contents P = s or
g: these last have an orderliness that is less obvious, but by the last ET in the flip-
book, they have arrayed themselves to form the edges of a diamond, orthogonal to
the long diagonals and meeting up with the crossbar at its four corners, with s = 7
values filling the upward-pointing edges, and g = 8’s those sloping down.
The graphics for the flip-book first appeared in [6, p. 15]; they were recy-
cled on p. 13 of [8]; larger, easily-read versions of these ETs were then included
(along with numerous other Chingon-based flip-books and other graphics we’ll
discuss later) as Slides 25-31 of the Powerpoint presentation comprising [1], de-
livered at Wolfram Science’s June 15-18, 2006, NKS conference in Washington,
D.C. All three of these resources are available online, and the reader is especially
encouraged to explore the last, whose 78 slides can be thought of as the visual
accompaniment to this monograph. (Henceforth, references to numbered Slides
will be to those contained and indexed in it.)
6 64-D Spectrography: 3 Ingredients for “Recipe
Theory”
In a manner clearly related to Bott periodicity, strut constants fall into types de-
marcated by multiples of 8. But unlike the familiar modulo 8 categorization of
types demonstrated, perhaps most familiarly, in the Clifford algebras of various
dimensions, the situation with zero-divisors concerns not typology (which keeps
producing new patterns at all dimensions), but granularity. As we shall see, em-
anation tables for S > 8 (and not a power of 2), aside from diagonally aligned
cells in otherwise empty stretches, display checkerboard layouts of parallel and
perpendicular near-solid lines (NSLs), whose cells all have emanations save for
a pair of long-diagonal crossings, and whose visual rhythms are strictly governed
by S and 8 or the latter’s higher multiples.
The rule we found in the 32-D Pathions for the Sand Mandalas indicates that
the basic pattern (and BK5, S for 8 < S < 16) is “essentially the same” for all of
them. We put the qualifying phrase in quotes, as it is an open question at this
point what features, residing at what depth, are indeed “the same,” and which are
different. For the moment, we will invoke the term spectrographic equivalence as
a sort of promissory note, hoping to stuff ever more elements into its grab-bag of
properties, beginning with two. First is something at once intuitively obvious but
not readily proven. (We will include a corollary to a later theorem when we have
done so). Since the first 8 possible strut-constant values all display maximally-
filled ETs, and since anomalies displayed by higher values are strictly side-effects
of bits to the left of the 8-bit (which are, of course, its multiples), it is natural to
assume that any recursive induction upon simpler forms will echo this “octave”
structure: that each time S passes a new multiple of 8, it participates in a new type.
(As with the Sand Mandalas, we will see this means that BKN, S for the new 7- or
8-element spectral band of new forms will differ from that found in its predecessor
band.) This will lead, in the most clear-cut cases – S = 15, or a multiple of 8 not
a power of 2, say – to grids composed of 8 x 8 boxes some or all of whose borders
are NSLs.
How we determine which cases are clear-cut, meanwhile, and why and how
we might want or need to privilege them, leads to our second property to include
up-front in our grab-bag. In a manner reminiscent of the various tricks – like
minors and cofactors – used in classical matrix theory to prove two matrices are
equivalent, we can transform members of a spectral band into each other by cer-
tain formal methods of hand-waving. With the Sand Mandalas, for instance, we
could replace concrete indices in the row and column labels with abstract desig-
nations referencing the (a,b,c) values of each of their 3 Box-Kites, listed in one
of a number of predetermined orders: by least-first CPO ordering of such (a,b,c)
triplets, in a sequence determined by the Zigzag L-trip of the Sedenion Box-Kite
we can derive them from, for instance (which is equivalent to the 3 sand-mandalic
Box-Kites’ d values, as we’ve seen).
Since which cells are filled is strictly determined by S and G, such desig-
nations eliminate all individuality among the ETs in question. Hence, if certain
display features of one of them seem convenient, we can convert its “tone row”
of indices populating its row and column labels into an abstract layout, governed
by which index is associated with which Assessor, in the manner sketched last
paragraph. We could then use this layout as the template for re-writes of all other
ETs in the same spectral band, knowing that results obtained using the specific
instantiation of the band could thereby be converted into exactly analogous ones
for the other band-members.
We will, in fact, implicitly adopt this tactic by using S = 1 as an exemplary
“for instance” in numerous arguments, while employing the highest-valued S
found among the Sand Mandalas, 15, to simplify the visualizing (and calculat-
ing) of recursive pattern creation for fixed-S, growing N sequences. (S = 15 is
chosen because it has all its low bits filled, hence all XORs are derived by simple
subtraction, leaving carrybit overflow to show itself only in what matters most to
us: the turning off of 4 candidate Box-Kites in the Pathions, and – as we will show
two sections hence – 16 in the Chingons, and 4N−4 in all higher 2N-ions.) Where
we termed, for reasons already explained, the fixed-N, growing S sequences flip-
books, we designate these new displays (for reasons we’ll justify shortly) balloon-
rides.
While there is but one abstract type for the Sedenions, with one Box-Kite for
each of the 7 possible S values, a second spectral band emerges in the Pathions
to include the Sand Mandalas, and two more are added for the 64-D Chingons.
By induction from the universally shared first band for all N > 3, where there
are TripN−2 Box-Kites in each ET, for each S ≤ 8, the first new spectrographic
addition includes the upper multiple of 8 that bounds it, since it is not a power
of 2: 16 < S ≤ 24. The second new range, though, is bounded by G, hence does
not include it, as it is tautologically a power of 2 (which powers, as we saw two
sections ago, comply with a type all their own, with the same Box-Kite-count
formula as for the lowest spectral band): 24 < S < 32.
Each of these two new bands displays a distinctive feature which underwrites
one of the three key ingredients for the recipe theory we are ultimately aiming for.
We call these, for S ascending, (s,g)-modularity and hide/fill involution respec-
tively. The third key ingredient, meanwhile, resides in the band that first emerges
in the Pathions – and whose echo in the Chingons has recapitulative features suffi-
ciently rich as to merit the name of recursivity. We will be devoting Part III’s first
post-introductory section to a thorough treatment of the simplest instance of this
third ingredient, showing how to ascend into the meta-fractal we call the Whor-
fian Sky (named for the great theorist of linguistics, Benjamin Lee Whorf, whose
last-ever lecture on “Language, mind and reality” described the layering of mean-
ing in language in a manner strongly suggesting something akin to it). Among
many visionary passages in his descriptions of a future cross-disciplinary science,
the following seems most apt to serve as the lead-in quote for the third and final
sweep of our argument [9]:
Patterns form wholes, akin to the Gestalten of psychology, which are
embraced in larger wholes in continual progression. Thus the cos-
mic picture has a serial or hierarchical character, that of a progression
of planes or levels. Lacking recognition of such serial order, differ-
ent sciences chop segments, as it were, out of the world, segments
which perhaps cut across the direction of the natural levels, or stop
short when, upon reaching a major change of level, the phenomena
become of quite different type, or pass out of the ken of the older ob-
servational methods. But · · · the facts of the linguistic domain compel
recognition of serial planes, each explicitly given by an order of pat-
terning observed. It is as if, looking at a wall covered with fine tracery
of lacelike design, we found that this tracery served as the ground for
a bolder pattern, yet still delicate, of tiny flowers, and that upon be-
coming aware of this floral expanse we saw that multitudes of gaps
in it made another pattern like scrollwork, and that groups of scrolls
made letters, the letters if followed in a proper sequence made words,
the words were aligned in columns which listed and classified enti-
ties, and so on in continual cross-patterning until we found this wall
to be – a great book of wisdom! [10, p. 248]
Appendix A: Genealogy of S = 1 Box-Kites
N = 4: Unique Quaternion L-index set (1,2,3) fed as Rule 0 circle into PSL(2,7)
with central g = 4, yielding 7 Octonions trips, each with a different S. For S = 1,
have (3,6,5), which becomes singleton Rule 0 for next level.
N = 5: (3,6,5) fed as Rule 0 circle into PSL(2,7) with central g= 8 yields 3 Rule 2
L-trips as triangle’s sides, which (upon affixing their strut opposites as L-indices)
generate (along with zero-padded (3,6,5) ) 4 Box-Kites with X = G+1 = 17.
Triangle’s medians become (a,d,e) Trefoil L-index sets of 3 Rule 1 S = 1 Box-
Kites, making 7 in all. These Zigzag L-index sets become Rule 0 trips for the next
level, and are:
Rule 0: (3,6,5)
Rule 1: (3,10,9); (6,15,9); (5,12,9)
Rule 2: (3,13,14); (6,11,13); (5,14,11)
N = 6: The 7 N = 5 Zigzag L-index sets just listed are fed as Rule 0 circles into
PSL(2,7) triangles with central g = 16, and are Zigzag L-index sets in their own
right for Box-Kites with X = G+1 = 33.
10 Rule 1 medians, 3 redundant (as they generate ( f ,d,b)’s where (a,d,e)’s
are also given: (14,16,30)* and (11,16,27)** in (5,14,11)’s triangle, the latter
also in (6,11,13)’s). They are associated with these 7 Zigzag L-index sets:
(3,18,17); (5,20,17); (6,23,17); (9,24,17);
(10,27,17)∗; (12,29,17); (15,30,17)∗∗
Rule 2 sides: 3 per each Rule 0 trip, as follows:
(3,6,5)→ (3,21,22); (6,19,21); (5,22,19)
(3,10,9)→ (3,25,26); (10,19,25); (9,26,19)
(6,15,9)→ (6,25,31); (15,22,25); (9,31,22)
(5,12,9)→ (5,25,28); (12,21,25); (9,28,21)
(3,13,14)→ (3,30,29); (13,19,30); (14,29,19)
(6,11,13)→ (6,29,27); (11,22,29); (13,27,22)
(5,14,11)→ (5,27,30); (14,21,27); (11,30,21)
N = 7: Feed the just-listed 35 Zigzag L-index sets to PSL(2,7)’s with g = 32, as
Rule 0 circles, thereby generating the 155 S = 1 Zigzags found in the 27-ions, or
Routions – named for the site of the Internet Bubble’s once-famed “Massachusetts
Miracle,” Route 128 – and so on.
Appendix B: A Brief Intro to “Type II” Box-Kites
The recursive generation of Zigzag L-sets just presented calls for some close at-
tention when the box-kites involved are Type II, since they then have the diagonals
of their PSL(2,7) triangles oriented differently: instead of all 3 leading from mid-
points of the Rule 2 sides to the corners, only 1 of these will preserve orientation
for a Type II (with the other two having “reversed VZ1” rules in evidence). We
first give a construction for producing all the Type II box-kites in the Pathions, and
then indicate the manner in which their workings are intimately connected with
the phenomenology of twist products broached in Part I’s Theorem 6.
The construction was presented with different framing in [8], where we de-
ployed a “stereo Fano” representation using side-by-side triangles, the left being a
proper PSL(2,7). Within the Pathions, there are 7 distinct box-kites for each S ex-
cept for the “flip-book” trios, one for each S> 8. And for S= 8 exactly, we saw in
our discussion of the Number Hub Theorem that we can build all 7 by placing 8 in
the center of the standard Fano (what we’ll call PSL(2,7) henceforth), then taking
the Zigzag L-trip for each Sedenion S and placing its units at the sides’ midpoints,
in the usual CPO order (in left, right, and bottom sides respectively). Each of
these 7 lines then generates a new box-kite in the Pathions for the Sedenion S in
question.
If we re-inscribe the starter-kit L-trip, but change G to the Pathion’s 16, ap-
plying VZ2 gives us new U-index terms, but the L-index terms for all 6 Assessors
remain the same as for the Sedenion box-kite: we call this “Rule 0” instance the
zero-padded box-kite (or just ZP) for the S value in question.
If we take the 3 “Rule 1” triplets along the struts, and place them not at the A,
B, C positions of our new Pathion box-kites, but instead at A, D, E (with 8 always
winding up at D), we generate 3 more standard (Type I) box-kites. For S = 1, the
Sedenion Zigzag L-trip is just (3,6,5), and each of its units becomes the low-index
‘A’ for a new Pathion box-kite, with L-indices written in “nested parentheses”
order (that is, A, B, C, D, E, F) as follows: (3,10,9,8,11,2); (6,15,9,8,14,7);
(5,12,9,8,13,4).
But if we take the 3 “Rule 2” triplets along the edges, mapping the Zigzag unit
at the center of each to the low-index ‘A’ of a new box-kite, the 8 doesn’t show
at any Assessor, and two of the three struts will have orientations reversed. These
“Type II” box-kites, again for S = 1, written per the same convention just used for
their 8-bearing siblings, read like this: (3,13,14,15,12,2); (6,11,13,12,10,7);
(5,14,11,10,15,4). Since the A and F low-index terms are the same as in the
same-S Sedenion box-kite, the strut they make obviously has the standard orienta-
tion. (But note that there is nothing essential about the (A,F) strut here: the placing
of the lowest-indexed unit of the Zigzag L-trip at A is a convenient convention,
and its employment in the Pathions suffices to induce this effect; however, it no
longer suffices in higher dimensions, where S can exceed 8 yet still be less than
That their being Type II is an immediate side-effect of “Rule 2” in this method
of deriving them should be obvious. What is less obvious is their special relation-
ship with twist products. Here, we review some of the basics: in the Sedenions,
whenever two Assessors bound an edge, we can swap a pair of corresponding
terms (either L- or U- indices) and then switch the sign joining the L- and U- in-
dices in the resultant pairing, and get an Assessor in another box-kite as a result.
Such “twist products,” then, reverse the edge-sign of a given line of ZDs as we
move between containing box-kites. Moreover, such twists are naturally investi-
gated in the context of the squares, not the triangles, of the octahedral vertex figure
we write Assessors on: the three orthogonal Catamarans, then, instead of the four
touching-only-at-the-vertices Sails. That’s because opposite sides of a Catamaran
twist to Assessors in the same box-kite, so that each Catamaran lets one twist to
two different box-kites – with the terminal Catamaran, in each case, being further
twistable into the box-kite you didn’t twist to in the first instance. As shown in the
“Twisted Sister” and “Royal Hunt” diagrams of [4], these triple transforms can be
represented in their own Fano planes, with the indices placed on their loci now
corresponding to the strut constants of a septet of box-kites.
Each Catamaran comprises the pathways connecting 4 Assessors – meaning
it doesn’t connect up with either term of the third strut in its box-kite. It is not
hard to see that the strut constant of the box-kite one twists to is equal to the strut-
opposite of the term which completes the L-trip of the edge being twisted in the
first place. Hence, any L-index term on a Sedenion box-kite corresponds to the
strut constant of another such box-kite one can twist to. This suggests expanding
the meaning of “twist product” to embrace pairings which share a strut rather than
an edge. For, if we allow this, we can then treat the third strut orthogonal to the
square hosting twists as the “mast” of the Catamaran, giving us an expanded sense
of this latter term which allows us a major simplification: instead of thinking of
the Sedenions’ ZDs as distributed among 7 distinct box-kites, we can see them all
included in one “embroidered” box-kite diagram, which we call a brocade. Each
of the 12 box-kite edges allows twists to a pair of different Assessors – let’s say
(A, b) and (B, a), in the box-kite with S = copp = d. More, the (S,X) pair – which
we can think of as in the box-kite’s center – can be “twisted” with all 6 Assessors
in the original box-kite to yield 12 more. We therefore have 6 + 24 + 12 = the
total set of “42 Assessors” in the Sedenions, all representable, on any one of the 7
component box-kites, as a unitary “brocade.”
It would be nice to be able to generalize the “brocade” notion so as to reduce
the number of basic structures in higher-order contexts: in the Pathions, for in-
stance, there are 77 box-kites, all but 21 of which are “Type I,” with 21 of those
coming in sand-mandala triples, 7 forming the S = 8 “Atlas,” plus 7 ZP’s and 3 ·7
“strongboxes” (so called, because these low-S box-kites contain “pieces of 8”)
completing the collection. But if we also count in the 4 · 7 = 28 “missing” box-
kites for high S, we can collapse our head count from 105 box-kite-like structures
to 15 brocades. Miming the Sedenion situation, the 7 ZP’s form the simplest; the
7 Sand-Mandala trios intermingle with the Atlas septet and the 21 strongboxes to
make 7 more brocades; and the 21 Type II box-kites twist into each other (to fill
out one Catamaran in each) and into the “hidden box-kites” linked with high S
(filling out two more Catamarans per Type II instance), yielding up the final set
of 7 brocades. (We note that the Type II situation is not as mysterious as it might
appear, once we recall the “slipcover proof” logic of Part I, Section 5: with 2 of
3 strut triplets being reversed, “tugging” on a Type II’s Fano will tend to send a
reversed arrow onto an edge 4 times out of 6 – meaning that, in all such cases, the
corollary to Theorem 7, and hence the theorem itself, will fail, thereby explaining
the “why” of “missing” box-kites!)
We gain the generalized “brocade” simplification at a very small price: relax-
ing the notion of “twist product” to embrace source and target L- and U- index
pairs which aren’t necessarily zero-divisors within the context of the G at hand.
But this is an investment which pays dividends, since it allows us to use Type II
structures as “middlemen” to facilitate studying the “hidden box-kite” substruc-
tures of the meta-fractal “white space” in high-S ET’s. Given the semiotic and
semantic importance of “ZD-free” structures (recall that our transcription of Pe-
titot’s analysis of Greimas’ “Semiotic Square” into zero-divisor theory is based
on ZD-free strut opposites), we can expect a richness of results based on Catama-
ran study that should at least equal that we are conducting based on Sails. (For
a “coming attraction,” interested readers should see the online Powerpoint slide-
show linked with our NKS 2007 presentation [11], which will play a role with
respect to our forthcoming and similarly named monograph, “Voyage by Catama-
ran,” akin to that our NKS 2006 slide-show did for the theorem/proof exposition
you are currently reading.)
References
[1] Robert P. C. de Marrais, “Placeholder Substructures: The Road from NKS
to Small-World, Scale-Free Networks Is Paved with Zero-Divisors,” http://
wolframscience.com/conference/2006/ presentations/materials/demarrais.ppt
(Note: the author’s surname is listed under “M,” not “D.”)
[2] Robert P. C. de Marrais, “Placeholder Substructures I: The Road From NKS
to Scale-Free Networks is Paved with Zero Divisors,” Complex Systems, 17
(2007), 125-142; arXiv:math.RA/0703745.
[3] Robert P. C. de Marrais, “The 42 Assessors and the Box-Kites They Fly,”
arXiv:math.GM/0011260.
[4] Robert P. C. de Marrais, “Presto! Digitization,” arXiv:math.RA/0603281
[5] Stephen Wolfram, A New Kind of Science, (Wolfram Media, Champaign IL,
2002). Electronic version at http://www.wolframscience.com/nksonline.
[6] Robert P. C. de Marrais, “Flying Higher Than A Box-Kite,”
arXiv:math.RA/0207003.
http://arxiv.org/abs/math/0703745
http://arxiv.org/abs/math/0011260
http://arxiv.org/abs/math/0603281
http://www.wolframscience.com/nksonline
http://arxiv.org/abs/math/0207003
[7] Robert P. C. de Marrais, “The Marriage of Nothing and All: Zero-Divisor
Box-Kites in a ‘TOE’ Sky”, in Proceedings of the 26th International Col-
loquium on Group Theoretical Methods in Physics, The Graduate Center
of the City University of New York, June 26-30, 2006, forthcoming from
Springer–Verlag.
[8] Robert P. C. de Marrais, “The ‘Something From Nothing’ Insertion Point”,
http://www.wolframscience.com/conference/2004/presentations/materials/
rdemarrais.pdf
[9] Robert P. C. de Marrais, “Placeholder Substructures III: A Bit-String-Driven
‘Recipe Theory’ for Infinite-Dimensional Zero-Divisor Spaces,”
arXiv:0704.0112 [math.RA])
[10] Benjamin Lee Whorf, Language, Thought, and Reality, edited by John B.
Carroll (M.I.T. Press, Cambridge MA, 1956).
[11] Robert P. C. de Marrais, “Voyage by Catamaran: Long-Distance Seman-
tic Navigation, from Myth Logic to Semantic Web, Can Be Effected by
Infinite-Dimensional Zero-Divisor Ensembles,” wolframscience.com/
conference/2007/presentations/materials/demarrais.ppt (Note: the author’s
surname is listed this time American style, under “D,” not “M.”)
http://www.wolframscience.com/conference/2004/presentations/materials/
http://arxiv.org/abs/0704.0112
	Introduction By Way of Reprise: From Box-Kites to ETs
	Emanation Tables: Conventions for Construction
	ETs for N > 4 and S 7
	The Number Hub Theorem (S = 2N - 2) for 2N-ions
	 The Sand Mandala Flip-Book (8 < S < 16,  N = 5)
	64-D Spectrography: 3 Ingredients for ``Recipe Theory''
ABSTRACT
  Zero-divisors (ZDs) derived by Cayley-Dickson Process (CDP) from
N-dimensional hypercomplex numbers (N a power of 2, at least 4) can represent
singularities and, as N approaches infinite, fractals -- and thereby,scale-free
networks. Any integer greater than 8 and not a power of 2 generates a
meta-fractal or "Sky" when it is interpreted as the "strut constant" (S) of an
ensemble of octahedral vertex figures called "Box-Kites" (the fundamental
building blocks of ZDs). Remarkably simple bit-manipulation rules or "recipes"
provide tools for transforming one fractal genus into others within the context
of Wolfram's Class 4 complexity.

<|endoftext|><|startoftext|>
Filling-Factor-Dependent Magnetophonon Resonance in Graphene
M. O. Goerbig,1 J.-N. Fuchs,1 K. Kechedzhi,2 and Vladimir I. Fal’ko2
Laboratoire de Physique des Solides, Univ. Paris-Sud, CNRS UMR 8502, F-91405 Orsay, France and
Department of Physics, Lancaster University, Lancaster, LA1 4YB, United Kingdom
(Dated: October 23, 2018)
We describe a peculiar fine structure acquired by the in-plane optical phonon at the Γ-point
in graphene when it is brought into resonance with one of the inter-Landau-level transitions in
this material. The effect is most pronounced when this lattice mode (associated with the G-band
in graphene Raman spectrum) is in resonance with inter-Landau-level transitions 0 ⇒ +, 1 and
−, 1 ⇒ 0, at a magnetic field B0 ≃ 30T. It can be used to measure the strength of the electron-
phonon coupling directly, and its filling-factor dependence can be used experimentally to detect
circularly polarized lattice vibrations.
PACS numbers: 78.30.Na, 73.43.-f, 81.05.Uw
In metals and semiconductors the spectra of phonons
are renormalized by their interaction with electrons.
Some of the best known examples include the Kohn
anomaly [1] in the phonon dispersion, which originates
from the excitation/de-excitation of electrons across the
Fermi level upon the propagation of a phonon through
the bulk of a metal and a shift in the longitudinal opti-
cal phonon frequency in heavily doped polar semiconduc-
tors [2]. However, despite the transparency of theoretical
models the observation of such effects is often obscured
by the difficulty to change the electron density in a mate-
rial, whereas in semiconductor structures containing two-
dimensional (2D) electrons the density of which can be
varied, the influence of the latter on the phonon modes is
weak due to a negligibly small volume fraction occupied
by the electron gas. In this context, a unique opportunity
arises in graphene-based field-effect transistors [3], where
the density of carriers in an atomically thin film (mono-
layer [4, 5, 6] or a bilayer [7]) can be continuously varied
from 1013cm−2 p-type to 1013cm−2 n-type. Several Ra-
man experiments have already been reported [8, 9] where
the variation of carrier density in graphene changes the
optical phonon frequency, in agreement with theoretical
expectations [10, 11, 12].
When graphene is exposed to a quantizing magnetic
field, its electronic spectrum quenches into discrete Lan-
dau levels (LLs) [13]. Then, the optical phonon energy in
graphene may coincide with the energy of one of the inter-
LL transitions, a condition known as magnetophonon
resonance [14, 15]. Recently, Ando has suggested [16]
that in undoped graphene the magnetophonon resonance
enhances the effect of the electron-phonon coupling on
a spectrum of the in-plane optical phonons - the E2g
modes attributed to the G-band in the Raman spectra
in Refs. [8, 9, 17, 18, 19]. In this paper, we investigate
a rich structure of the anti-crossing experienced by such
lattice modes when a magnetic field makes their energy
equal to the energy of one of the valley-antisymmetric
interband magnetoexcitons [20]. Most saliently, the dif-
ference between circular polarization of various inter-LL
transitions [21, 22] makes the magnetophonon resonance
distinguishable for lattice vibrations of different circular
polarization, which makes the number of split lines in
the fine structure acquired by a phonon and the value of
splitting dependent on the electronic filling factor, ν.
The in-plane optical phonons in graphene [relative dis-
placement u = (ux, uy) of sublattices A and B] have
the energy ω ≈ 0.2eV at the Γ-point (in the center of
the Brillouin zone). These phonons and their coupling to
electrons can be described using the Hamiltonian [10, 11],
Hph =
ωb†µ,qbµ,q + g
2Mω(σxuy − σyux), (1)
u(r) =
2NucMω
bµ,q + b
eµ,qe
−iq·r,
where b
µ,q are annihilation (creation) operators of a
phonon with polarisation eµ,q, M is the mass of a car-
bon atom, and Nuc is the number of unit cells. Here
and below, we use units ~ ≡ 1. Also, we shall uti-
lize a double degeneracy of the E2g mode at the Γ-point
(at q = 0) and describe the in-plane optical phonon in
terms of a degenerate pair of circularly polarized modes,
u	 = (ux+iuy)/
2 and u� = u
	. The constant g in Eq.
(1) characterizes the electron-phonon coupling [23]. This
coupling has the form of the only invariant linear in u per-
mitted by the symmetry group of the honeycomb crystal.
It is constructed using Pauli matrices σ = (σx, σy) acting
in the space of sublattice components of the Bloch func-
tions, [φK+A, φK+B] and [φK−B, φK−A] which describe
electron states in the valleys K± (two opposite corners of
the hexagonal Brillouin zone) and obey the Hamiltonian,
in terms of the electron charge −e < 0 [24],
Hel = ξvσ · p, p =− i∇+ eA, ∂xAy − ∂yAx = B.
Here, ξ = ± distinguishes between K±, and momentum
p is calculated with respect to the center of the corre-
sponding valley. This Hamiltonian represents the dom-
inant term of the next-neighbor tight-binding model of
graphene [25, 26, 27], and the electron-phonon coupling
http://arxiv.org/abs/0704.0027v4
:   B  sublattice
:   A  sublattice
(a) (b)
−,(n+1)
+,(n+1)
FIG. 1: (a) Optical phonons are lattice vibrations with an out-
off-phase oscillation of the two sublattices. (b) Interband electron-
hole excitations coupling to phonon modes with different circular
polarization.
in Eq. (1) takes into account the change in the A − B
hopping elements due to the sublattice displacement [28].
In a perpendicular magnetic field, Hel determines [13]
a spectrum of 4-fold (spin and valley) degenerate LLs,
εα=±n = α
2nvλ−1B in the valence band (ε
n>0), con-
duction band (ε+n>0), and at zero energy (ε0 = 0, ex-
actly at the Dirac point in the electron spectrum), in
terms of the magnetic length λB = 1/
eB. Such a
spectrum has been confirmed by recent quantum Hall
effect measurements [4, 5, 6]. In each of the two val-
leys, the LL basis is given by two-component states
1 + δn,0φn,m, iξα(1−δn,0)φn−1,m], where φn,m are
the LL wave functions described by the quantum num-
bers n and m, the latter being related to the guiding
center degree of freedom. Here, we neglect the Zeeman
effect, and simply take into account the two-fold spin de-
generacy.
Excitations of electrons between LLs can be described
in terms of magnetoexcitons (see Fig. 1). Those relevant
for the magnetophonon resonance are
	(n, ξ) =
1 + δn,0
+,n,m;ξc−,(n+1),m;ξ,
�(n, ξ) =
1 + δn,0
+,(n+1),m;ξ
c−,n,m;ξ, (2)
where the index A =	,� characterizes the angular mo-
mentum of the excitation and the operators c
α,n,m;ξ
annihilate (create) an electron in the state α, n,m in
the valley Kξ. The normalization factors N	n =
[(1 + δn,0)NB(ν̄−,(n+1) − ν̄+,n)]1/2 and N�n = [(1 +
δn,0)NB(ν̄−,n − ν̄+,(n+1))]1/2 are used to ensure the
bosonic commutation relations of the exciton operators,
[ψA(n, ξ), ψ
′, ξ′)] = δA,A′δξ,ξ′δn,n′ , where NB is the
total number of states per LL in a sample, including
the two-fold spin-degeneracy. These commutation rela-
tions are obtained within the mean-field approximation
with 〈c†α,n,m;ξcα′,n′,m′;ξ′〉 = δξ,ξ′δα,α′δn,n′δm,m′(δα,− +
δα,+ν̄α,n), where 0 ≤ ν̄α,n ≤ 1 is the partial filling fac-
tor of the n-th LL. Similarly to magneto-optical selec-
tion rules in graphene [20, 21, 22], α, n ⇒ α′, n ± 1, 	-
polarized phonons are coupled to electronic transitions
with −, (n + 1) ⇒ +, n, and �-polarized phonons to
−, n ⇒ +, (n + 1) magneto-excitons, at the same en-
ergy Ωn ≡
2(v/λB)(
n+ 1) (Fig. 1), which
follows directly from the composition of the LL in
graphene and the form of the electron-phonon coupling
in Eq. (1). In contrast to photons that couple to
the valley-symmetric mode ψA,s(n) = [ψA(n,K+) +
ψA(n,K−)]/
2, electron-phonon interaction in Eq.(1)
couples phonons to the valley-antisymmetric magnetoex-
citon ψA,as(n) = [ψA(n,K+)− ψA(n,K−)]/
In terms of magnetoexcitons we can, now, rewrite the
electron-phonon Hamiltonian in a bosonized form, as
τ=s,as
A,τ (n)ψA,τ (n) +
AbA (3)
gA(n)
ψA;as(n) + bAψ
A;as(n)
g	(n) = g
(1 + δn,0)γ
ν̄−,(n+1) − ν̄+,n,
g�(n) = g
(1 + δn,0)γ
ν̄−,n − ν̄+,(n+1),
where gA are the effective coupling constants, with γ =
3a2/2πλ2B and a = 1.4Å (distance between neighbor-
ing carbon atoms). In the Hamiltonian (3), we have omit-
ted electronic excitations with a higher angular momen-
tum which do not couple to the in-plane optical phonon
modes (e.g., n ⇒ n′, with n′ 6= n ± 1). The dressed
phonon operator corresponding to the Hamiltonian (3)
is obtained by solving Dyson’s equation. The pole of the
propagator gives the antisymmetric coupled mode fre-
quencies ω̃A,
ω̃2A − ω2 = 4ω
n=nF+1
ω̃2A − Ω2n
∆nF g
A(nF )
ω̃2A −∆2nF
, (4)
where nF stands for the number of the highest fully occu-
pied LL in the spectrum, and ∆n =
2(v/λB)(
n+ 1−√
n). In Eq. (4), the sum (extended up to the high-
energy cut-off N ∼ (λB/a)2 above which the electronic
dispersion is no longer linear) takes into account inter-
band magnetoexcitons, and the last term gives a small
correction due to an intraband magnetoexciton. In the
small-field limit and large doping (nF ≫ 1), solution of
Eq. (4) reproduces the zero-field result [10, 11] if one
replaces the sum by an integral,
n=0 →
dn, ap-
proximates
n+ 1 ≈ 2
n and ∆nF ≈ 0, and,
then, linearizes Eq. (4) by replacing ω̃A by ω in the de-
nominator,
ω̃ ≃ ω̃0 + λ
ω + 2
2nFv/λB
ω − 2
2nFv/λB
ω̃0 ≃ ω + 2
ω2 − Ω2n
where λ = (2/
3π)(g/t)2 ≃ 3.3× 10−3 is the same as in
Refs. [10, 16] (t = 2v/3a ∼ 3eV is the A−B hopping am-
plitude), and ω̃0 is the renormalized phonon frequency in
an undoped graphene sheet at B = 0. The only variation
arises at high fields, ω̃0 &
2v/λB, where for nF = 0 the
linearized Eq. (4) yields
ω̃ ≃ ω̃0 −
g2(0)
(ω̃0λB/
2v)2 − 1
The strongest effect of the phonon coupling to elec-
tron modes occurs when the frequency of the former
coincides with the frequency Ωn of one of the magne-
toexcitons ψA,as(n). In such a case, the sum on the
right-hand-side of the eigenvalue equation (4) is domi-
nated by the resonance term and may be approximated
by 2ωg2A(n)/ (ω̃A − Ωn). This results in a fine structure
of mixed phonon-magnetoexciton modes, ψA,as(n) cos θ+
bA sin θ with frequency ω̃
A and ψA,as(n) sin θ − bA cos θ
with frequency ω̃−A [where cot 2θ = (Ωn−ω̃0)/2gA], which
are determined for each polarisation (A =	,� ) sepa-
rately,
(n) = 1
(Ωn + ω̃0)∓
(Ωn − ω̃0)2 + g2A(n). (5)
A generic form of the phonon-magnetoexciton anti-
crossing and formation of coupled modes, ω±
(n) in un-
doped graphene (i.e., ν = 0) is illustrated in Fig. 2(a).
Such an anticrossing and mode mixing is simlar to that
described by Ando [16]. It can manifest itself in Raman
spectroscopy: in a fine structure acquired by the G-line
(earlier attributed [8, 9, 17, 18, 19] to the in-plane op-
tical phonon at the Γ-point, E2g mode) at the magneto-
phonon resonance conditions. The effect is the strongest
for the resonance Ωn=0 ≈ ω̃0 between the phonon and
magnetoexciton based upon −, 1 ⇒ 0 and 0 ⇒ +, 1 tran-
sitions. When approaching the resonance (by sweeping a
magnetic field), the phonon line becomes accompanied by
a weak satellite moving towards it and increasing its in-
tensity. Exactly at the magnetophonon resonance, where
both the upper mode [ω̃+A(n)] and the lower mode [ω̃
A(n)]
consist of an equal-weight superposition of the phonon
and the resonant exciton, with cos θ = sin θ = 1/
the G-band in graphene would appear as two lines. For
Ωn=0 =
2v/λB ≈ 36
B[T] meV (see [16, 24]) and
ω̃0 ≃ 200 meV, this resonance occurs in an experimen-
tally accessible field range, B0 ≃ 30 T. For the filling
factor ν = 0, the central LL (n = 0) is always half-filled.
Then, coupling and, therefore, splitting of the �- and 	-
polarized modes coincide, g� = g	, thus, giving rise to a
pair of peaks at the energies ω̃± = ω̃0 ± g� sketched
in part I in Fig. 2(b). For the magnetic field value
B0 ≃ 30 T and g ≃ 0.28eV [12], we estimate this splitting
as 2gA ∼ 16meV (∼ 130cm−1), which largely exceeds the
G-band width observed in Refs. [8, 9, 17, 18, 19].
Doping of graphene changes the strength of the cou-
pling constants g� and g	, as shown in Fig. 2(c). This
5 10 15 20 25 30 35
Magnetic Field [T]
3010 20 40
ν = 0
0 < |ν| < 2
|ν| = 2
2g 2g
mode splitting
−6 −4 −2 0
4n−2 4n+2 4n+6
B=Bn>02γ
FIG. 2: (a) Coupled phonon and magneto-excitons as a function
of the magnetic field. Energies are in units of the bare phonon
energy ω. Dashed lines indicate the uncoupled valley-symmetric
modes, with gA = 0. (b) Mode splitting as a function of the filling
factor, as may be seen in Raman spectroscopy, with the resonance
condition Ωn=0 ≈ ω̃0, for ν = 0 in (I), 0 < |ν| < 2 (in II), and
ν = ±2 (in III). The absolute intensity of the modes is in arbitrary
units, but the height and the width reflect the expected relative
intensities. (c) Mode splitting for n = 0, as a function of the filling
factor ν. (d) Same as in (c) for n ≥ 1.
is because a higher (lower) occupancy of the n = 0 LL
reduces (enhances) the oscillator strength of the 	 polar-
ized transition due to the availability of filled and empty
states in the involved LLs, whereas the same change in
the electron density has the opposite effect on g�. As
a result, for an arbitrary filling factor −2 < ν < 2,
we predict that, in the vicinity of magnetophonon reso-
nance, the phonon mode (and, therefore, G-band in Ra-
man spectrum) should split into four lines [part II in Fig.
2(b)], with ω̃±� = ω̃±g� for �-polarized and ω̃±	 = ω̃±g	
for 	-polarized phonons. In the quantum Hall state at
filling factor ν = 2, the transition −, 1 ⇒ 0 becomes suc-
cessively blocked and no longer affects the frequency of a
	-polarized phonon, whereas the transition 0 ⇒ +, 1 ac-
quires the maximum strength, thus, increasing the cou-
pling parameter g�. This leads to the magnetophonon
resonance fine structure consisting of three peaks, with
an even larger splitting between side lines, as sketched in
part III in Fig. 2(b). Interestingly, this may enable one
to directly observe lattice modes with a definite circu-
lar polarization. A further increase of the electron filling
factor reduces the side-line splitting which should com-
pletely disappear at ν = 6, after the transition 0 ⇒ +, 1
becomes blocked by a complete filling of the +, 1 LL [Fig.
2(c)]. The same arguments hold for p-doped graphene,
though in this case the roles of �- and 	-polarized modes
are interchanged.
Magnetophonon resonances with other possible inter-
LL transitions n ⇒ n+ 1 occur at much lower magnetic
fields, Bn = B0/(
n+ 1)2. For example, a resonant
phonon coupling with the magnetoexciton ψA;as(1) is ex-
pected to occur at B1 ≈ 5T. Its description remains qual-
itatively similar, though for n > 0 the mode splitting is
less pronounced because of the B-field dependence of the
coupling constants in Eq. (3). One finds that g� = g	
for |ν| < 2(2n − 1). At ν = 2(2n − 1), filling of the n-
th LL starts changing, which reduces splitting of the 	-
polarized mode and gives rise to the four-peak structure.
At ν = 2(2n+1), where the +, n LL becomes completely
filled, splitting of the 	-polarized phonon vanishes, thus,
resulting in the three-peak fine structure [part III in Fig.
2(b)] that would persist up to ν = 2(2n + 3). This is
because the splitting of the �-polarized modes remains
constant up to the filling factor ν = 2(2n + 1), above
which population of the +, (n+ 1) LL starts to suppress
the value of g�, until the latter vanishes at ν = 2(2n+3)
[see Fig. 2(d)].
In conclusion, we have predicted a filling-factor depen-
dence of the fine structure acquired by the in-plane (E2g)
optical phonon in graphene when the latter is in reso-
nance with one of the inter-LL transitions in this ma-
terial. The effect is expected to be most pronounced
when the phonon is resonantly coupled to the 0 ⇒ +, 1
and −, 1 ⇒ 0 transitions, which requires a magnetic field
B0 ≃ 30T. The predicted mode splitting may be used
to measure directly the strength of the electron-phonon
coupling, and also to distinguish between circularly (left-
and right- hand) polarized lattice modes.
We thank D. Abergel, A. Ferrari, P. Lederer, and A.
Pinczuk for useful discussions. This work was suported
by Agence Nationale de la Recherche Grant ANR-06-
NANO-019-03 and EPSRC-Lancaster Portfolio Partner-
ship EP/C511743. We thank the MPI-PKS workshop
‘Dynamics and Relaxation in Complex Quantum and
Classical Systems and Nanostructures’ and the Kavli
Institute for Theoretical Physics, UCSB (NSF PHY99-
07949) for hospitality.
[1] W. Kohn, Phys. Rev. Lett. 2, 393 (1959).
[2] G.D. Mahan, Many-Particle Physics, Kluwer Academic,
New York 2000.
[3] K. Novoselov et al., Science 306, 666 (2004).
[4] K. Novoselov et al., Nature 438, 197 (2005).
[5] Y. Zhang et al., Nature 438, 201 (2005).
[6] Y. Zhang et al., Phys. Rev. Lett. 96, 136806 (2006).
[7] K. Novoselov et al., Nature Phys. 2, 177 (2006).
[8] S. Pisana et al., Nat. Mater. 6, 198 (2007).
[9] J. Yan, Y. Zhang, P. Kim, and A. Pinczuk, Phys. Rev.
Lett. 98, 166802 (2007).
[10] T. Ando, J. Phys. Soc. Jpn. 75, 124701 (2006).
[11] A.H. Castro Neto and F. Guinea, Phys. Rev. B 75,
045404 (2007).
[12] M. Lazzeri and F. Mauri, Phys. Rev. Lett. 97, 266407
(2006).
[13] J.W. McClure, Phys. Rev. 104, 666 (1956).
[14] J.P. Maneval, A. Zylberzstejn, and H.F. Budd, Phys.
Rev. Lett. 23, 848 (1969); G. Bauer and H. Kahlert, Phys.
Rev. B 5, 566 (1972).
[15] R.J. Nicholas, S.J. Sessions, and J.C. Portal, Appl. Phys.
Lett. 37, 178 (1980); T.A. Vaughan et al., Phys. Rev. B
53, 16481 (1996).
[16] T. Ando, J. Phys. Soc. Jpn 76, 024712 (2007).
[17] A.C. Ferrari et al., Phys. Rev. Lett. 97, 187401 (2006).
[18] A. Gupta et al., Nano Lett. 6, 2667 (2006).
[19] D. Graf et al., Nano Lett. 7, 238 (2007).
[20] A. Iyengar et al., Phys. Rev. B 75, 125430 (2007).
[21] M.L. Sadowski et al., Phys. Rev. Lett. 97, 266405 (2006).
[22] D.S.L. Abergel and V. I. Fal’ko, Phys. Rev. B 75, 155430
(2007).
[23] Numerical results yield g =
〉F ≃ 0.28eV; S. Pis-
canec et al., Phys. Rev. Lett. 93, 185503 (2004).
[24] We use the reported value v = 108cm/s; A.K. Geim and
K.S. Novoselov, Nat. Mater. 6, 183 (2007).
[25] P.R. Wallace, Phys. Rev. 71, 622 (1947).
[26] R. Saito, G. Dresselhaus, M.S. Dresselhaus, Physical
Properties of Carbon Nanotubes, Imperial College Press,
London 1998.
[27] T. Ando, J. Phys. Soc. Jpn. 74, 777 (2005).
[28] The electron-phonon coupling is off-diagonal because a
lattice distortion affects the bond length and thus the
nearest-neighbor hopping between the two different sub-
lattices [10, 11].
Erratum
In the previous version (v3) of this Letter, we have
underestimated the numerical value of the mode split-
ting of the magnetophonon resonance [see paragraph af-
ter Eq. (5)] by a factor of 2 (the text above takes into
account the corrected parameters). This is a result of
two mistakes. First, there is a factor of
2, which finds
its origin in an erroneous normalization of the circular
polarized phonons. They should indeed be defined as
u	 = (ux + iuy)/
2 and u� = (ux − iuy)/
2 [and not
as u	 = ux + iuy and u� = ux − iuy as incorrectly
assumed on page 1, second column], such that the asso-
ciated phonon operators bA obey the usual commutation
relations [bA, b
] = δA,A′ , with A =	,�. This yields
a factor of
2 in the definition of the effective coupling
constants [Eq. (3)], which read in the corrected form
g	(n) = g
(1 + δn,0)γ
ν̄−,(n+1) − ν̄+,n ,
g�(n) = g
(1 + δn,0)γ
ν̄−,n − ν̄+,(n+1) .
As a consequence, the zero-field dimensionless coupling
constant λ [defined in the first column page 3 of our
Letter] is multiplied by a factor of 2 and becomes λ =
3π)(g/t)2.
Second, we also underestimated the numerical value of
the electron-phonon coupling constant g by a factor of√
2. Indeed, g defined in our work [see Eq. (1)] is related
to 〈g2Γ〉F ≃ 0.0405 eV2 computed by Piscanec et al. [2]
as g =
2〈g2Γ〉F ≃ 0.28 eV and not as g =
〈g2Γ〉F ≃ 0.2
eV as incorrectly assumed in our Letter. In addition,
there is a substantial uncertainty in the precise value of
the constant g. In a tight-binding model, the latter may
be related to the derivative of the hopping amplitude t
as a function of the carbon-carbon distance a as g =
(−dt/da)× 3/(2
Mω) [1]. Harrison’s phenomenological
law t ∝ 1/a2 then implies that g ≃ 0.26 eV. Experiments
in graphene [3] and [4] in zero magnetic field give for the
dimensionless coupling constant λ the values 4.4 × 10−3
and 5.3×10−3 respectively. This determines g in between
0.3 eV and 0.36 eV, where we take into account that the
value of t lies between 2.7 and 3 eV. In the end, we have
to take g in the range between 0.26 and 0.36 eV [instead
of g ≃ 0.2 eV] and therefore the dimensionless coupling
constant becomes λ ≃ (2.8 to 5.3)× 10−3 [instead of λ ≃
10−3].
As a result of the two factors of
2, the numerical
estimate for the mode splitting 2gA at ν = 0 and B ≃ 30
T [at the discussed resonance −, 1 ⇒ 0 and 0 ⇒ +, 1, see
second column of page 3] becomes 2gA ∼ 15 meV (∼ 120
cm−1), for g ≃ 0.26 eV and 2gA ∼ 20 meV (∼ 160 cm−1)
for g ≃ 0.36 eV [instead of 2gA ∼ 8 meV]. The effect
is therefore twice larger than initially predicted. The
conclusions of our work remain unaltered.
We would like to thank C. Faugeras and M. Potemski
for having drawn our attention on the underestimated
value of the mode splitting. See also their recent preprint
where they measure the magnetophonon resonance [5].
[1] T. Ando, J. Phys. Soc. Jpn 75, 124701 (2006); ibid 76,
024712 (2007).
[2] S. Piscanec, M. Lazzeri, F. Mauri, A. C. Ferrari, and J.
Robertson, Phys. Rev. Lett. 93, 185503 (2004).
[3] S. Pisana, M. Lazzeri, C. Casiraghi, K. S. Novoselov, A.
K. Geim, A. C. Ferrari, and F. Mauri, Nature Materials
6, 198 (2007).
[4] J. Yan, Y. Zhang, P. Kim, and A. Pinczuk, Phys. Rev.
Lett. 98, 166802 (2007).
[5] C. Faugeras, M. Amado, P. Kossacki, M. Orlita, M.
Sprinkle, C. Berger, W.A. de Heer and M. Potemski,
arXiv:0907.5498.
http://arxiv.org/abs/0907.5498
ABSTRACT
  We describe a peculiar fine structure acquired by the in-plane optical phonon
at the Gamma-point in graphene when it is brought into resonance with one of
the inter-Landau-level transitions in this material. The effect is most
pronounced when this lattice mode (associated with the G-band in graphene Raman
spectrum) is in resonance with inter-Landau-level transitions 0 -> (+,1) and
(-,1) -> 0, at a magnetic field B_0 ~ 30 T. It can be used to measure the
strength of the electron-phonon coupling directly, and its filling-factor
dependence can be used experimentally to detect circularly polarized lattice
modes.

<|endoftext|><|startoftext|>
Pfa�ans, hafnians and produ
ts of real linear
fun
tionals
Péter E. Frenkel
Alfréd Rényi Institute of Mathemati
s
Hungarian A
ademy of S
ien
es
P.O.B. 127, 1364 Budapest, Hungary
frenkelp�renyi.hu
Abstra
t
We prove pfa�an and hafnian versions of Lieb's inequalities on deter-
minants and permanents of positive semi-de�nite matri
es. We use the
hafnian inequality to improve the lower bound of Révész and Sarantopou-
los on the norm of a produ
t of linear fun
tionals on a real Eu
lidean
spa
e (this subje
t is sometimes 
alled the `real linear polarization 
on-
stant' problem).
Mathemati
s Subje
t Classi�
ation: 46C05, 15A15
Keywords: polarization 
onstant, real Eu
lidean spa
e, hafnian, pfaf-
�an, positive semi-de�nite matrix
-1. Introdu
tion
The 
ontents of this paper are as follows. In Se
tion 0, we sket
h one part of
the histori
 ba
kground: 
lassi
al inequalities on determinants and permanents
of positive semi-de�nite matri
es. In Se
tion 1, we prove pfa�an and hafnian
versions of these inequalities, and we formulate Conje
ture 1.5, another hafnian
inequality. In Se
tion 2, we apply the hafnian inequality of Theorem 1.4 to
our main goal: improving the lower bound of Révész and Sarantopoulos on the
norm of a produ
t of linear fun
tionals on a real Eu
lidean spa
e (this subje
t
is sometimes 
alled the `real linear polarization 
onstant' problem, its history is
sket
hed at the end of the paper). This is a
hieved in Theorem 2.3. We point
out that Conje
ture 1.5 would be su�
ient to 
ompletely settle the real linear
polarization 
onstant problem.
Partially supported by OTKA grants T 046365, K 61116 and NK 72523.
http://arxiv.org/abs/0704.0028v2
0. Old inequalities on determinants and perma-
nents
Re
all that the determinant and the permanent of an n× n matrix A = (ai,j)
are de�ned by
detA =
(−1)π
ai,π(i), per A =
ai,π(i),
where Sn is the symmetri
 group on n elements. Throughout this se
tion, we
assume that A is a positive semi-de�nite Hermitian n × n matrix (we write
A ≥ 0). For su
h A, Hadamard proved that
detA ≤
ai,i,
with equality if and only if A has a zero row or is a diagonal matrix. Fis
her
generalized this to
detA ≤ detA′ · detA′′
B∗ A′′
≥ 0, (1)
with equality if and only if detA′ · detA′′ · B = 0.
Con
erning the permanent of a positive semi-de�nite matrix, Mar
us [Mar1,
Mar2℄ proved that
per A ≥
ai,i, (2)
with equality if and only if A has a zero row or is a diagonal matrix. Lieb [L℄
generalized this to
per A ≥ per A′ · per A′′ (3)
for A as in (1), with equality if and only if A has a zero row or B = 0. Moreover,
he proved that in the polynomial P (λ) of degree n′ (=size of A′) de�ned by
P (λ) = per
λA′ B
B∗ A′′
all 
oe�
ients ct are real and non-negative. This is indeed a stronger theorem
sin
e it implies
per A = P (1) =
ct ≥ cn′ = per A′ · per A′′.
�okovi¢ [D, Mi℄ gave a simple proof of Lieb's inequalities, and showed also that if
A′ and A′′ are positive de�nite then cn′−t = 0 if and only if all subpermanents of
B of order t vanish. Lieb [L℄ also states an analogous (and analogously provable)
theorem for determinants: for A as in (1), let
D(λ) = det
λA′ B
B∗ A′′
If detA′ · detA′′ = 0, then D(λ) = 0. If A′ and A′′ are positive de�nite, then
(−1)tdn′−t is positive for t ≤ rk B and is zero for t > rk B.
Remark. In all of Lieb's inequalities mentioned above, the 
ondition that
the matrix A is positive semi-de�nite 
an be repla
ed by the weaker 
ondition
that the diagonal blo
ks A′ and A′′ are positive semi-de�nite. The proof goes
through virtually un
hanged. Alternatively, this stronger form of the inequali-
ties 
an be easily dedu
ed from the seemingly weaker form above.
1 New inequalities on pfa�ans and hafnians
For an n × n matrix A = (ai,j) and subsets S, T of N := {1, . . . , n}, we write
AS,T := (ai,j)i∈S,j∈T . If |T | = 2t is even, we write
(−1)T := (−1)t+
1.1 Pfa�ans
As far as the appli
ations in Se
tion 2 are 
on
erned, this subse
tion may be
skipped.
Re
all that the pfa�an of a 2n × 2n antisymmetri
 matrix C = (ci,j) is
de�ned by
pf C =
π∈S2n
(−1)πcπ(1),π(2) · · · cπ(2n−1),π(2n).
We have (pf C)
= detC.
For antisymmetri
 A and symmetri
 B, both of size n× n, we 
onsider the
polynomial
(−1)⌊n/2⌋pf
−λA B
⌊n/2⌋
Theorem 1.1 Let A and B be real n×n matri
es with A antisymmetri
 and B
symmetri
. If B is positive semi-de�nite, then pt ≥ 0 for all t. If B is positive
de�nite, then pt > 0 for t ≤ (rk A)/2 and pt = 0 for t > (rk A)/2.
Proof. If B = (bi,j) is positive semi-de�nite, then there exist ve
tors x1, . . . , xn
in a real Eu
lidean spa
e V su
h that (xi, xj) = bi,j. Re
all that in the exterior
tensor algebra
V a positive de�nite inner produ
t (and the 
orresponding
Eu
lidean norm) is de�ned by
:= det((vi, wj)).
We have
|S|=2t
|T |=2t
(−1)S(−1)Tpf AS,S · pf AT,T · detBN\S,N\T =
|S|=2t
|T |=2t
(−1)Spf AS,S ·
xi, (−1)Tpf AT,T ·
j 6∈T
|S|=2t
(−1)Spf AS,S ·
Assume that B is positive de�nite. Then the ve
tors xi are linearly independent.
It follows that the tensors
i6∈S xi are also linearly independent as S runs over
the subsets of N . Thus pt = 0 if and only if pf AS,S = 0 for all |S| = 2t, i.e., if
and only if 2t > rk A. �
Theorem 1.2 Let A and B be real n × n matri
es with A antisymmetri
 and
B symmetri
. Let λ ≥ 0. If B is positive semi-de�nite, then
(−1)⌊n/2⌋pf
−λA B
≥ detB.
If B is positive de�nite, then equality o

urs if and only if λA = 0.
Proof. The left hand side is
p0 + p1λ+ · · ·+ p⌊n/2⌋λ⌊n/2⌋.
The right hand side is p0. �
I am grateful to the anonymous referee of this paper for the idea of the
following alternative proof of Theorems 1.1 and 1.2. We may assume B > 0,
sin
e every positive semi-de�nite matrix is a limit of positive de�nite ones.
The matrix B−1/2AB−1/2 being real and antisymmetri
, there exists a unitary
matrix U su
h that D := U−1B−1/2AB−1/2U is diagonal with purely imaginary
eigenvalues a1
−1, . . . , an
−1. The real multiset {a1, . . . , an} is invariant
under a ↔ −a. We have
= det
−λA B
= det
BUDU−1
BUDU−1
= det
−λD 1
0 U−1
= det
−1 ai
= detB2 ·
(1 + a2iλ).
Extra
ting square roots, and 
hoosing the sign in a

ordan
e with p0 = +detB,
we get
t = (−1)⌊n/2⌋pf
−λA B
= detB ·
(1 + a2iλ),
when
e both theorems immediately follow, sin
e detB > 0.
1.2 Hafnians
Re
all that the hafnian of a 2n× 2n symmetri
 matrix C = (ci,j) is de�ned by
haf C =
π∈S2n
cπ(1),π(2) · · · cπ(2n−1),π(2n).
For symmetri
 A and B, both of size n× n, we 
onsider the polynomial
⌊n/2⌋
Theorem 1.3 Let A and B be symmetri
 real n× n matri
es. If B is positive
semi-de�nite, then ht ≥ 0 for all t. If B is positive de�nite, then ht = 0 if and
only if all 2t× 2t subhafnians of A vanish.
Proof. If B = (bi,j) is positive semi-de�nite, then there exist ve
tors x1, . . . , xn
in a real Eu
lidean spa
e V su
h that (xi, xj) = bi,j . Re
all [Mar1, Mar2, MN,
Mi℄ that in the symmetri
 tensor algebra SV a positive de�nite inner produ
t
(and the 
orresponding Eu
lidean norm) is de�ned by
:= per ((vi, wj)).
We have
|S|=2t
|T |=2t
haf AS,S · haf AT,T · per BN\S,N\T =
|S|=2t
haf AS,S ·
Assume that B is positive de�nite. Then the ve
tors xi are linearly independent.
It follows that the tensors
i6∈S xi are also linearly independent as S runs over
the subsets of N . Thus ht = 0 if and only if haf AS,S = 0 for all |S| = 2t. �
Theorem 1.4 Let A and B be symmetri
 real n× n matri
es. Let λ ≥ 0. If B
is positive semi-de�nite, then
≥ per B.
If B is positive de�nite, then equality o

urs if and only if A is a diagonal matrix
or λ = 0.
Proof. The left hand side is
h0 + h1λ+ · · ·+ h⌊n/2⌋λ⌊n/2⌋.
The right hand side is h0. �
Setting A = B and λ = 1, and 
ombining with Mar
us's inequality (2), we
arrive at 
ase p = 1 of
Conje
ture 1.5 If A = (ai,j) is a positive semi-de�nite symmetri
 real n× n
matrix, then the hafnian of the 2pn× 2pn matrix 
onsisting of 2p× 2p blo
ks A
is at least (2p− 1)!!n
i,i, with equality if and only if A has a zero row or is
a diagonal matrix.
2 Produ
ts of real linear fun
tionals
In this se
tion, we apply Theorem 1.4 to produ
ts of jointly normal random
variables and then to produ
ts of real linear fun
tionals, whi
h was the main
motivation for this work. The ideas in this se
tion are analogous to those that
Arias-de-Reyna [A℄ used in the 
omplex 
ase.
Let ξ1, . . . , ξd denote independent random variables with standard Gaussian
distribution, i.e., with joint density fun
tion (2π)−d/2 exp(−|ξ|2/2), where |ξ|2 =
ξ2k.We write Ef(ξ) for the expe
tation of a fun
tion f = f(ξ) = f(ξ1, . . . , ξd).
Re
all that
k = (2p− 1)!! = (2p− 1)(2p− 3) · · · 3 · 1
for k = 1, . . . , d (easy indu
tive proof via integration by parts), and thus
(2pk − 1)!!.
, we write (·, ·) for the standard Eu
lidean inner produ
t. We re
all
the well-known [B2, G, S, Z℄
Wi
k formula. Let x1, . . . , xn be ve
tors in R
with Gram matrix A =
((xi, xj)). Then
(xi, ξ) = haf A. (4)
(For odd n, we de�ne haf A = 0.)
Proof. Both sides are multilinear in the xi, so we may assume that ea
h xi is
an element of the standard orthonormal basis e1, . . . , ed. If there is an ek that
o

urs an odd number of times among the xi, then both sides are zero. If ea
h
ek o

urs 2pk times, then the left hand side is E
k=1 ξ
k , and the right hand
side is
k=1(2pk − 1)!!, whi
h are equal. �
The following theorems are easy 
orollaries of Theorem 1.4 together with
the Wi
k formula (4) and Mar
us's theorem (2).
Theorem 2.1 If X1, . . . , Xn are jointly normal random variables with zero
expe
tation, then
X21 · · ·X2n
≥ EX21 · · ·EX2n.
Equality holds if and only if they are independent or at least one of them is
almost surely zero.
Proof. The variables 
an be written as Xi = (xi, ξ) with ξ of standard normal
distribution and the xi 
onstant ve
tors with a positive semi-de�nite Gram
matrix A = (ai,j) = ((xi, xj)). Then
X2i = E
(xi, ξ)
= haf
≥ per A ≥
ai,i =
E(xi, ξ)
EX2i ,
with equality if and only if A is a diagonal matrix or has a zero row, i.e., the xi
are pairwise orthogonal or at least one of them is zero. �
The generalization of Theorem 2.1 to an arbitrary even exponent 2p is equiv-
alent to Conje
ture 1.5.
Theorem 2.2 For any x1, . . . , xn ∈ Rd, |xi| = 1, the average of
(xi, ξ)
the unit sphere {ξ ∈ Rd : |ξ| = 1} is at least
Γ(d/2)
2nΓ(d/2 + n)
(d− 2)!!
(d+ 2n− 2)!!
d(d+ 2)(d+ 4) . . . (d+ 2n− 2)
with equality if and only if the ve
tors xi are pairwise orthogonal.
Proof. The average on the unit sphere is the 
onstant in the theorem times
the expe
tation w.r.t. the standard Gaussian measure (see e.g. [B1℄). By The-
orem 2.1, the latter expe
tation is minimal if and only if the xi are pairwise
orthogonal, in whi
h 
ase it is 1. �
Theorem 2.3 For real linear fun
tionals fi on a real Eu
lidean spa
e,
||f1 · · · fn|| ≥
||f1|| · · · ||fn||
n(n+ 2)(n+ 4) · · · (3n− 2)
Here || · || means supremum of the absolute value on the unit sphere. In the
in�nite-dimensional 
ase, fun
tionals with in�nite norm may be allowed. Then
the 
onvention 0 · ∞ = 0 should be used on the right hand side.
Proof. We may assume that the spa
e is R
with d ≤ n, and the fun
tionals are
given by fi(ξ) = (xi, ξ) with ||fi|| = |xi| = 1. Then ||f1 · · · fn||2 is at least the
average of
f2i (ξ) =
(xi, ξ)
on the unit sphere, whi
h by Theorem 2.2 and
d ≤ n is at least 1/(n(n+ 2)(n+ 4) · · · (3n− 2)). �
It is an unsolved problem, raised by Benítez, Sarantopoulos and Tonge [BST℄
(1998), whether Theorem 2.3 is true with nn under the square root sign in the
denominator on the right hand side. This is 
alled the `real linear polarization

onstant' problem. In the 
omplex 
ase, the a�rmative answer was proved by
Arias-de-Reyna [A℄ in 1998, based on the 
omplex analog of the Wi
k formula
[A, B2, G℄ and on Lieb's inequality (3).
Keith Ball [Ball℄ gave another proof
of the a�rmative answer in the 
omplex 
ase by solving the 
omplex plank
problem.
In the real 
ase, the a�rmative answer for n ≤ 5 was proved by Pappas and
Révész [PR℄ in 2004. For general n, the best estimate known before the present
paper was that of Révész and Sarantopoulos [RS℄ (2004), based on results of
[MST℄, with (2n)n/4 under the square root sign. See [Mat1, Mat2, MM, R℄ for
a

ounts on this and related questions. Note that
n(n+ 2)(n+ 4) · · · (3n− 2) =
= exp (logn+ log(n+ 2) + log(n+ 4) + · · ·+ log(3n− 2)) <
< exp
log u · du
= exp
[u(log u− 1)]3nn /2
= exp((3n log 3n− 3n− n logn+ n)/2) =
= exp
n(2 logn+ 3 log 3− 2)
and 3
3/e < 3 · 1.8/2.7 = 2, so Theorem 2.3 is an improvement. Note also that
the statement with nn under the square root sign would follow from Conje
-
ture 1.5.
A
knowledgements
I am grateful to Péter Major, Máté Matol
si and Szilárd Révész for helpful
dis
ussions, and to the anonymous referee for useful 
omments.
Referen
es
[A℄ J. Arias-de-Reyna, Gaussian variables, polynomials and permanents, Lin.
Alg. Appl. 285 (1998), 107�114.
The referee of the present paper 
alled my attention to the fa
t that Arias-de-Reyna used
only the spe
ial 
ase of (3) where the matrix A
is of rank 1. This is mu
h simpler than (3) in
general, it 
an be proved essentially by the argument Mar
us used in [Mar1, Mar2℄ to prove
the even more spe
ial 
ase n
= 1, whi
h still implies inequality (2).
[Ball℄ K. M. Ball, The 
omplex plank problem, Bull. London. Math. So
. 33
(2001), 433�442.
[B1℄ A. Barvinok, Estimating L∞ norms by L2k norms for fun
tions on orbits,
Found. Comput. Math. 2 (2002), 393�412.
[B2℄ A. Barvinok, Integration and optimization of multivariate polynomials by
restri
tion onto a random subspa
e, arXiv preprint: math.OC/0502298
[BST℄ C. Benítez, Y. Sarantopoulos, A. Tonge, Lower bounds for norms of
produ
ts of polynomials, Math. Pro
. Camb. Phil. So
. 124 (1998), 395�408.
[D℄ D. �. �okovi¢, Simple proof of a theorem on permanents, Glasgow Math. J.
10 (1969), 52�54.
[G℄ L. Gurvits, Classi
al 
omplexity and quantum entanglement, J. Comput.
System S
i. 69 (2004), no. 3, 448�484.
[L℄ E. H. Lieb, Proofs of some 
onje
tures on permanents, J. Math. Me
h. 16
(1966), 127�134.
[Mar1℄ M. Mar
us, The permanent analogue of the Hadamard determinant the-
orem, Bull. Amer. Math. So
. 69 (1963), 494�496.
[Mar2℄ M. Mar
us, The Hadamard theorem for permanents, Pro
. Amer. Math.
So
. 15 (1964), 967�973.
[MN℄ M. Mar
us, M. Newman, The permanent fun
tion as an inner produ
t,
Bull. Amer. Math. So
. 67 (1961), 223�224.
[Mat1℄ M. Matol
si, A geometri
 estimate on the norm of produ
t of fun
tionals,
Lin. Alg. Appl. 405 (2005), 304�310.
[Mat2℄ M. Matol
si, The linear polarization 
onstant of R
, A
ta Math. Hungar.
108 (2005), no. 1-2, 129�136.
[MM℄ M. Matol
si, G. A. Muñoz, On the real linear polarization 
onstant prob-
lem, Math. Inequal. Appl. 9 (2006), no. 3, 485�494.
[Mi℄ H. Min
, Permanents, En
y
lopedia of Mathemati
s and its Appli
ations,
Addison-Wesley, 1978
[MST℄ G. A. Muñoz, Y. Sarantopoulos, A. Tonge, Complexi�
ations of real
Bana
h spa
es, polynomials and multilinear maps, Studia Math. 134 (1999),
no. 1, 1�33.
http://arxiv.org/abs/math/0502298
[PR℄ A. Pappas, Sz. Révész, Linear polarization 
onstants..., J. Math. Anal.
Appl. 300 (2004), 129�146.
[R℄ Sz. Gy. Révész, Inequalities for multivariate polynomials, Annals of the
Marie Curie Fellowships 4 (2006), http://www.marie
urie.org/annals/, arXiv
preprint: math.CA/0703387
[RS℄ Sz. Gy. Révész, Y. Sarantopoulos, Plank problems, polarization and Cheby-
shev 
onstants, J. Korean Math. So
. 41 (2004) 157�174.
[S℄ B. Simon, The P(φ)2 Eu
lidean (Quantum) Field Theory, Prin
eton Series
in Physi
s, Prin
eton University Press, 1974
[Z℄ A. Zvonkin, Matrix integrals and map enumeration: an a

esible introdu
-
tion, Combinatori
s and physi
s (Marseille, 1995), Math. Comput. Modelling
26 (1997), 281�304.
http://arxiv.org/abs/math/0703387
	New inequalities on pfaffians and hafnians
	Pfaffians
	Hafnians
	Products of real linear functionals
ABSTRACT
  We prove pfaffian and hafnian versions of Lieb's inequalities on determinants
and permanents of positive semi-definite matrices. We use the hafnian
inequality to improve the lower bound of R\'ev\'esz and Sarantopoulos on the
norm of a product of linear functionals on a real Euclidean space (this subject
is sometimes called the `real linear polarization constant' problem).

<|endoftext|><|startoftext|>
Understanding the Flavor Symmetry Breaking and Nucleon
Flavor-Spin Structure within Chiral Quark Model
Zhan Shu, Xiao-Lin Chen, and Wei-Zhen Deng∗
Department of Physics, Peking University, Beijing 100871, China
Abstract
In χQM, a quark can emit Goldstone bosons. The flavor symmetry breaking in the Goldstone
boson emission process is used to intepret the nucleon flavor-spin structure. In this paper, we
study the inner structure of constituent quarks implied in χQM caused by the Goldstone boson
emission process in nucleon. From a simplified model Hamiltonian derived from χQM, the intrinsic
wave functions of constituent quarks are determined. Then the obtained transition probabilities
of the emission of Goldstone boson from a quark can give a reasonable interpretation to the flavor
symmetry breaking in nucleon flavor-spin structure.
PACS numbers: 12.39.-x, 12.39.Fe, 14.20.Dh
∗Electronic address: dwz@th.phy.pku.edu.cn
http://arxiv.org/abs/0704.0029v2
mailto:dwz@th.phy.pku.edu.cn
I. INTRODUCTION
The measurements of the polarized structure functions of the nucleon in deep inelastic
scattering(DIS) experiments[1, 2, 3, 4] show the complication in proton spin structure. Only
a portion of the proton spin is carried by valence quarks. Moreover, several experiments[5, 6,
7] clearly indicate the ū-d̄ asymmetry as well as the existence of the strange quark content s̄ in
the proton sea. Also the distribution of strange quark in the proton sea is polarized negative.
The DIS results deviate significantly from the näıve quark model (NQM) expectation.
NQM gives many fairly good descriptions of hadron properties. Why does NQM work?
It is a puzzle that the quarks inside a hadron could be treated as non-relativistic particles
in NQM. The chiral quark model (χQM) tries to bridge between QCD and NQM. It was
originated by Weinberg[8] and formulated by Manohar and Georgi[9]. Between the QCD
confinement scale (ΛQCD ≃200MeV) and a chiral symmetry breaking scale (ΛχSB ≃1GeV),
the strong interaction is described by an effective Lagrangian of quarks q, gluons g and
Numbu-Goldstone bosons Π. An important feature of the χQM is that, betweetn ΛQCD and
ΛχSB, the internal gluon effects in a hadron can be small compared to the internal Goldstone
bosons Π and quarks q, so the effective degrees of freedom in this region can be q and Π.
It is interesting that χQM can also be used to explain why NQM does not work in the
above DIS experiments. By the emission of Goldstone boson, χQM allows the fluctuation
of a quark q into a recoiling quark plus a Goldstone boson q → q′Π . The q′Π system then
further splits to generate quark sea through
• the helicity-flipping process
q↑ −→ Π+ q′↓ −→ (qq̄′) + q′↓ (1)
• and the helicity-non-flipping process
q↑ −→ Π+ q′↑ −→ (qq̄′) + q′↑ (2)
where the subscript indicates the helicity of quark. In both the process, q′Π is in a relative P-
wave state. In the helicity-flipping process (1), the orbital angular momentum along helicity
direction must be 〈lz〉 = +1. In the helicity-non-flipping process (2), 〈lz〉 = 0. The process
cause a modification of the spin content of the nucleon because a quark changes its helicity
in (1). Also it causes a modification of the flavor content because the generated quark sea
from Π is flavor dependent[10, 11].
χQM was first used to explain the nucleon sea flavor asymmetry and the smallness of
the quark spin fraction by Eichten, Hinchliffe and Quigg[10]. The flavor asymmetry of sea
quark distribution arises from the mass differences in different quark flavors and in different
Goldstone bosons. Only the lightest Goldstone Boson π was considered since its contribution
dominates. From a perturbation calculation, the probability for an up quark to emit a π+
was estimated to be a = 0.083. This would induce a flavor asymmetry in parton distributions
of nucleon and other hadrons.
However, the estimated transition probability is not enough to full account the flavor
asymmetry in DIS experiments. Contribution from other Π’s and even η′ was considered by
Cheng and Li[11]. Explicit SUf (3) breaking in the transition probabilities was later intro-
duced in refs. 12, 13 and further used by several authors[14, 15, 16, 17, 18, 19]. Nevertheless,
in all these calculations, the transition probabilities were put into model by hand. To fit the
experimental data, the probability of an up quark emitting π+ needs to be set to a >∼ 0.1,
which is about 20% larger than the perturbation calculation. Although the probability of π
emission can be enlarged by using a higher momentum cut off Λ > ΛχQM in the perturbation
calculation [20], however, the chiral quark model is no longer valid at arbitrary high energies
Λ ≫ ΛχQM.
We should not be surprised by this discrepancy since the χQM works in a region right
above the QCD confinement scale ΛQCD. There one may expect the confinement effect is
important and the perturbative calculation of QCD may contain large error. However, there
is another essential difference between the above model calculations and the perturbation
calcultion. In the perturbation calculation, the emitted Goldstone bosons are virtual par-
ticles. In the above model calculations which are closely related to NQM, however, the
Goldstone bosons are close to mass shell under the non-relativistic approximation.
Since χQM can be a bridge between NQM and QCD, it is interesting to explore χQM
from NQM side where we use the wave function method. This will give the above model cal-
culations a concrete foundation in NQM and help us further understand the flavor symmetry
breaking mechanism.
In this paper, we will use wave function method to investigate the flavor symmetry break-
ing in χQM. In a conventional quark model[21], a hadron consists of confined constituent
quarks and its wave function is constructed in the configuration space of the constituent
quarks. To incorporate the transition process of emitting Goldstone boson of χQM into the
quark model, the constituent quarks will have intrinsic wave functions within the configu-
ration q + q′Π.
In Sec. II, we first present the composite wave function of constituent quarks including
components of q′Π. The wave functions and the transition probabilities of q → q′Π are
determined from a simplified χQM Hamiltionian. In Sec. III and Sec. IV, the obtained
transition probabilities are used to calculate nucleon flavor-spin structure and baryon octet
magnetic moments respectively. The numerical results and a brief summary are presented
in Sec. V.
II. THE WAVE FUNCTION OF A CONSTITUENT QUARK
In χQM, the effective Lagrangian below the chiral symmetry breaking scale ΛχQM involves
quarks, gluons, and Goldstone bosons. The first few terms in this Lagrangian are[9]:
LχQM = ψ̄(iDµ + Vµ)γµψ + igAψ̄Aµγµγ5ψ
−mψ̄ψ +
f 2πtr∂
µΣ†∂µΣ+ ... (3)
where Dµ = ∂µ + igGµ is the gauge-covariant derivative of QCD, Gµ the gluon field and
g the strong coupling constant. The dimensionless axial-vector coupling gA = 0.7524 is
determined from the axial charge of the nucleon. m represents the constituent quark masses
due to chiral symmetry breaking. The pseudoscalar decay constant is fπ ≈ 93MeV. The Σ
field, vector currents Vµ and axial-vector currents Aµ are given in terms of the Goldstone
boson fields Φ
π0 + 1√
η π+ K+
π− − 1√
π0 + 1√
K− K̄0 − 2√
, (4)
Σ = exp(i
), (5)
(ξ†∂µξ ± ξ∂µξ†), (6)
ξ = exp(i
). (7)
An expansion of the currents in powers of Φ/fπ yields the effective interaction between Π
and q[10]
LI = −
ψ̄∂µΦγ
µγ5ψ. (8)
This allows the fluctuation of a quark into a recoil quark plus a Goldstone boson q → q′Π.
In quark model, a hadron is built with constituent quarks. In accordance with χQM,
we should treat a constituent quark as a composite particle including such components q′Π.
Here we denote the wave function of a composite constituent quark as |q〉〉. At rest,
|q〉〉 = zq|q〉+
q′Π|q
′Π〉. (9)
In our paper, the state normalization relation is always taken as
〈p|p′〉 = δ3(p− p′). (10)
The above wave function is of essential importance in our work. The square of the mod-
ulus of the coefficient of each q′Π configuration is just the probability for the corresponding
Π emission process
Pq→q′Π = |xqq′Π|
2, (11)
|zq|2 = (1−
Pq→q′Π)
is the probability of no Π emission.
To determine the wave function (9), we first construct a simplified Hamiltonian in the
degrees of freedom q and Π,
H = H0 +HB +HI . (12)
H0 represents the kinetic energies of q and Π. It reads
ψ̄(iα · ∇+m)ψ +
Tr[Φ̇2 + (∇Φ)2] +
m2Π(Φ
, (13)
where mΠ is the physical mass of Π which is nonzero and nondegenerate.
HI = −
d3xLI , (14)
is the χQM interaction. HB is an accessary interaction which is needed to bind the q
together. In our simplified Hamiltonian, we will not disscuss the explicit formalism of HB.
Instead, we will put some physical restriction conditions on it later in this section, which is
sufficient to our calculation.
FromH0, we can expand free fields ψ and Π in terms of creation and annihilation operators
ψq(x) =
(2π)3/2
q(p, s)e−ip·x + bq†
ps(t)v
q(p, s)eip·x
, (15)
ΦΠ(x) =
(2π)3/2
e−ip·x + cΠ†
eip·x
p0=EΠ
, (16)
where
p2 +m2q
is the quark energy of flavor q,
p2 +m2Π
is the energy of Goldstone boson Π. aq†
ps and b
pr are the creation operators of quark q and
anti-quark q̄
pr, a
p′s} = {b
pr, b
p′s} = δ
(3)(p− p′)δrs. (17)
is the creation operator of Π
] = δ(3)(p− p′). (18)
Next, we will replace the field ψ and Φ in the Hamiltonian (12) with the free field of (15)
and (16). Then we can express the Hamiltonian in creation and annihilation operators, for
example
d3p Eq
ps + b
ps] +
d3p EΠ
. (19)
In all the model calculations [11, 12, 13, 14, 15, 16, 17, 18, 19], the emitted Π is assumed
bound to the quark source. To represent that q′Π are bound, we use the well known SHO
function as their spatial wave function
|qΠ〉 =
d3p|p|e−
2λ2 [Y1(θ, φ) c
]1/2 |0〉, (20)
|qΠ ↑〉 = 1√
d3p|p|e−
2λ2 Y11(θ, φ) c
p↓ |0〉
d3p|p|e−
2λ2 Y10(θ, φ) c
p↑ |0〉, (21)
where λ is the “characteristic radius” parameter in Gaussian function. 1/
N is the nor-
malization factor,
dp p4 e
πλ5. (22)
However, we need a binding interaction HB in the Hamiltonian. Yet we do not know how
to write out the explicit form of HB. However, HB should provide enough binding energy.
That is, for the q′Π system, we must have
〈qΠ|H0 +HB|qΠ〉 ≤ mq +mΠ. (23)
That is
EB = 〈qΠ|HB|qΠ〉 ≤ mq +mΠ − 〈qΠ|H0|qΠ〉 = mq − Eq +mΠ − EΠ. (24)
As a rough estimation, we will take the mininum value of EB
EB = −max
{Eq −mq + EΠ −mΠ} = −(Eu −mu + Eπ −mπ). (25)
Then the wave function of a composite constituent quark is determined from Schrödinger
equation
H|q〉〉 =Mq|q〉〉. (26)
After taking the above simplification, we need only solve a matrix eigen-value problem
 =Mq
 , (27)
where
aδ3(0) = 〈q|H|q〉,
Bq′Πδ
3(0) = 〈q|H|q′Π〉,
Cq′Π;q′′Π′δ
3(0) = 〈q′Π|H|q′′Π′〉,
q′Π = x
For example, let us consider the process u emitting Π. There are four possible |q′Π〉 states
generated by the fluctuations of a u quark: |uπ0〉, |uη〉, |dπ+〉 and |sK+〉. Thus
|u〉〉 = zu|u〉+ xuuπ0 |uπ0〉+ xuuη|uη〉+ xudπ+ |dπ+〉+ xusK+|sK+〉. (28)
Taking these wave functions as basis, we can calculate the matrix of the Hamiltonian in
(27).
a = mu. (29)
C is diagonalized. Its diagonal matrix elements are calculated from H0
Cuπ0;uπ0 =
dp p4 e
p2 +m2u +
) + EB, (30)
Cuη;uη =
dp p4 e
p2 +m2u +
p+m2η) + EB, (31)
Cdπ+;dπ+ =
dp p4 e
p2 +m2d +
) + EB, (32)
CsK+;sK+ =
dp p4 e
p2 +m2s +
) + EB. (33)
B is calculated from HI
Buπ0 = −
dp p4e
, (34)
Buη = −
dp p4e
, (35)
Bdπ+ = −
dp p4e
, (36)
BsK+ = −
dp p4e
. (37)
By diagonalizing this Hamiltonian matrix, we will obtain a new mass of the constituent u
quark Mu and its composite wave function. The constituent masses and wave functions of
d and s quarks can be obtained similarly. We have
|d〉〉 = zd|d〉+ xddπ0 |dπ0〉+ xddη|dη〉+ xduπ−|uπ−〉+ xdsK0|sK0〉, (38)
|s〉〉 = zs|s〉+ xssη|sη〉+ xsdK̄0 |dK̄
0〉+ xsuK−|uK−〉. (39)
From isospin symmetry, mu = md, we have
zd = zu; xddπ0 = −xuuπ0 ; xduπ− = xudπ+ ; ... (40)
However, since mu 6= ms, one should notice that
zs 6= zu; xsdK̄0 6= x
sK0; x
uK− 6= xusK+. (41)
After the diagonalization, the Goldstone bosons Π are separated from quarks q approx-
imately. With only degrees of freedom q one can rebuild the quark model and so Mu, Md,
Ms should be regarded as the constituent quark masses in quark model.
III. FLAVOR AND SPIN STRUCTURE OF PROTON
Having known the wave functions of constituent quark q and the transition amplitudes
of q emitting each Goldstone bosons Π, we are able to calculate the quark distribution in
a constituent quark following refs. 11, 12, 13. In χQM, Π will further split into a quark-
antiquark pair. By substituting the quark contents of Π into wave functions (28), (38) and
(39), we can rewrite the wave functions of constituent quark q as
|u〉〉 = zu|u〉+
xuuη√
|u(uū)〉+
xuuη√
|u(dd̄)〉
2xuuη√
|u(ss̄)〉+ xudπ+ |d(ud̄)〉+ xusK+|s(us̄)〉, (42)
|d〉〉 = zu|d〉+
xuuη√
|d(uū)〉+
xuuη√
|d(dd̄)〉
2xuuη√
|d(ss̄)〉+ xudπ+ |u(dū)〉+ xusK+|s(ds̄)〉, (43)
|s〉〉 = zs|s〉+
xssη√
|s(uū)〉+
xssη√
|s(dd̄)〉 −
2xssη√
|s(ss̄)〉
|d(sd̄)〉+ xsuK−|u(sū)〉. (44)
Then the antiquark and quark flavor contents of the proton (uud) are
ū = 2
xuuη√
xuuη√
+ |xudπ+ |2, u = ū+ 2, (45)
xuuη√
xuuη√
+ 2|xudπ+ |2, d = d̄+ 1, (46)
s̄ = 2|xuuη|2 + 3|xusK+|2, s = s̄. (47)
Some important quantities depending on the above quark distribution are: the Gottfried
sum rule IG =
(ū − d̄) whose deviation indicates the ū-d̄ asymmetry in proton sea;
ū/d̄ measured through the ratio of muon pair production cross sections; and the fractions
of quark flavors in proton fq =
Σ(q+q̄)
, f3 = fu − fd and f8 = fu + fd − 2fs.
We can further calculate the spin structure of proton. Here one should consider the effects
of configuration mixing generated by spin-spin forces[22]. We take the baryon wave functions
from the quark model calculation[23, 24, 25]. The proton wave function for example, is
expressed as
= 0.90|P 28SS〉 − 0.34|P 28S ′S〉 − 0.27|P 28SM〉 (48)
where the baryon SU(6)⊗O(3) wave functions are denoted as |B2S+1N Lσ〉, N is SU(3) mul-
tiplicity. S, L are the total spin and total orbital angular momentum while σ = S,M,A
denotes the permutation symmetry of SU(6). The spin polarization functions will be re-
markably affected by configuration mixing. Following refs. 15, 17, we define the number
operator by
N̂ = nu↑u↑ + nu↓u↓ + nd↑d↑ + nd↓d↓ + ns↑s↑ + ns↓s↓,
where nq↑, nq↓ are the number of q↑, q↓ quarks. The spin structure of the “mixed” proton is
given by
= (0.902 + 0.342)
+ 0.272
. (49)
The spin structure after considering Π-emission is obtained by replacing for every quark in
eq. (49) by
q↑,↓ −→ (1− ΣPi)q↑,↓ + Pflipping(q↑,↓) + Pnon−flipping(q↑,↓), (50)
where Pflipping(q↑,↓) and Pnon−flipping(q↑,↓)| are the probabilities of quark helicity flipping and
non-flipping for q↑,↓ respectively. For example, in the case of u↑ quark we have,
Pflipping(u↑) =
(|xuuπ0 |2 + |xuuη|2)u↓ + |xudπ+ |2d↓ + |xusK+|2s↓
Pnon−flipping(u↑) =
(|xuuπ0 |2 + |xuuη|2)u↑ + |xudπ+ |2d↑ + |xusK+|2s↑
Finally the spin polarization functions defined as ∆q = q↑ − q↓ are
∆u = (0.902 + 0.342)
114|xu
|2 + 48|xuuη|2 + 36|xusK+|2
+ 0.272
66|xu
|2 + 24|xuuη|2 + 18|xusK+|2
, (51)
∆d = (0.902 + 0.342)
|2 + 12|xuuη|2 + 9|xusK+|2
+ 0.272
42|xu
|2 + 12|xuuη|2 + 9|xusK+|2
, (52)
∆s = −
. (53)
There are several measured quantities which can be expressed in terms of the above spin
polarization functions. The quantities usually calculated are ∆3 = ∆u−∆d and ∆8 = ∆u+
∆d−2∆s, obtained from the neutron β-decay and the weak decays of hyperons respectively.
Another important quantity is the flavor singlet component of the total quark spin content
defined as 2∆Σ = ∆u + ∆d + ∆s . We also calculate some weak axial-vector form factors
which are also related to the spin polarization functions, (GA/GV )Λ→p =
(2∆u−∆d−∆s),
(GA/GV )Σ−→n = ∆d−∆s, and (GA/GV )Ξ−→Λ = 13(∆u+∆d− 2∆s).
IV. BARYON OCTET MAGNETIC MOMENTS
Considering the relative angular momentum between quark and Goldstone boson Π, the
magnetic moment operator of a qΠ system is
µ̂qΠ =
p2q +m
p2Π +m
p2q +m
p2Π +m
p2Π +m
p2q +m
p2q +m
p2Π +m
l̂ (54)
where eq and eΠ are the electric charges carried by q and Π respectively, ŝ the quark spin
operator and l̂ the relative angular momentum bewteen q and Π. The first term in Eq(54)
is the intrinsic magnetic moment of quark and the other two terms are the contribution of
the orbital angular momentum. Here we have to consider the relativistic effect since the
relative momentum of q or Π are comparable to their masses in the qΠ system
pq,Π ∼ Λ ∼ mq,Π.
With the SHO wave functions of (20), the magnetic moment of qΠ system (54) can be
readily calculated. Then we can recalculate the magnetic moments of constituent quarks
taking into account of the relativistic effect. For example, the magnetic moments of the u
quark is
µu = |zu|2〈u↑|µ̂|u↑〉+ Pu→uπ0〈uπ0|µ̂|uπ0〉+ Pu→uη〈uη|µ̂|uη〉
+ Pu→dπ+〈dπ+|µ̂|dπ+〉+ Pu→sK+〈sK+|µ̂|sK+〉, (55)
where
〈u↑|µ̂|u↑〉 =
, (56)
and the contribution from qΠ systems are
〈uπ0|µ̂|uπ0〉 = − eu
p2 +m2π
p2 +m2u +
p2 +m2π
p2 +m2u
λ2 , (57)
〈uη|µ̂|uη〉 = − eu
p2 +m2η
p2 +m2u +
p2 +m2η
p2 +m2u
λ2 , (58)
〈dπ+|µ̂|dπ+〉 = −
p2 +m2π
p2 +m2d +
p2 +m2π
p2 +m2d
p2 +m2d
p2 +m2d +
p2 +m2π
p2 +m2π
λ2 , (59)
〈sK+|µ̂|sK+〉 = −
p2 +m2K
p2 +m2s +
p2 +m2K
p2 +m2s
p2 +m2s
p2 +m2s +
p2 +m2K
p2 +m2K
λ2 . (60)
The magnetic moments of d and s quarks can be calculated similarly.
One can easily obtain the octet baryon magnetic moments by replacing the valence quarks
inside the baryons with the corresponding constituent quarks. Again we take proton as an
example,
µp = (0.90
2 + 0.342)
+ 0.272
. (61)
If we replace the µq by (55), µp can be further expressed as the baryon magnetic moment in
conventional quark model plus the contribution from the Goldstone boson emission process
[26]. The magnetic moments for other octet baryons can be calculated similarly.
V. NUMERICAL RESULTS AND CONCLUSIONS
In the numerical calculation, most of the parameters can be taken from the experimental
data or the chiral quark model. We collect these fixed input parameters of our calculation
in Table I. Here we have used the the physical masses of Goldstone bosons[27].
TABLE I: The fixed input parameters from chiral quark model and experimental data.
gA fπ(MeV) mπ(MeV) mK(MeV) mη(MeV)
0.7524 93 135 494 548
For the quark masses, since our work focuses on the inner context of the constituent quarks
in quark model, naturally we will refer to the quark masses from quark model, instead of
the chiral quark model values. Here we will use the quark mass values from the widely
accepted Isgur’s quark model[21] as shown in Table II. However, one should be cautious
that, in our model, it is the quark with the Goldstone boson mixing which corresponds to
the constituent quark in quark model. That is, mass values Mq after the diagonalization
process should be set to the quark masses in Isgur’s model. Our strategy is to adjust the
quark masses mq in the model Hamiltonian to fit the Mq values.
Finally we are left only with one free parameter λ which describes the confinement of the
emitted Goldstone boson in our model. An overall fit to the experimental data of nucleon
flavor-spin structure and octet baryon magnetic moments shows that the best value should
be λ=152MeV. With this value of λ and a minimun binding energe EB = −218MeV, the
“bare” values of quark masses mq without Goldstone boson mixing are shown also in Table
Transition probabilities of the light and strange quarks to various q′Π systems are given
in Table III and IV respectively. The probability of a u quark emitting a π+ P (u →
TABLE II: The quark masses with vs. without Goldstone boson mixing.
λ EB(MeV) mu,d(MeV) ms(MeV) Mu,d(MeV) Ms(MeV)
152 −218 288 474 220 419
d + π+)=0.145 is significantly larger than the perturbation calculation a=0.083. One may
notice that the λ parameter value 152MeV in our wave function, which is below ΛQCD, is
rather small than another energy scale ΛχQM in chiral quark model. Surely this will weaken
the interaction between q and q′Π. However, one should also notice that the binding energy
EB = −218MeV will make the energy of a qΠ system much close to the single quark energy.
This will enhance the mixing of q′Π components in a constituent quark.
Also, we notice that the asymmetry between the probabilities of u(d) → s + K and
s → u(d) + K̄. Whether this asymmetry leads to any observable consequence in hadron
structure needs further investigation.
TABLE III: Transition probabilities of a u quark to various q′Π systems and the mass of constituent
u quark.
u → u+ π0 u → u+ η u → d+ π+ u → s+K+ no GB-emission Mu
0.072 0.003 0.145 0.010 0.770 220MeV
TABLE IV: Transition probabilities of a s quark to various q′Π systems and the mass of constituent
s quark.
s → s+ η s → u+K− s → d+ K̄0 no GB-emission Ms
0.012 0.071 0.071 0.846 419MeV
Next, we will compare our calculate results with the experimental data. Since our em-
phasis is on the substructure of a constituent quark in NQM, here we also quote the results
from NQM. In Table V, the calculated flavor and spin structures of the proton are shown.
It should be mentioned that the quark spin polarization functions can be further corrected
by the gluon anomaly[13, 15, 17, 28, 29, 30] through
∆q(Q2) = ∆q − αs(Q
∆g(Q2), (62)
and the flavor singlet component of the total helicity is modified accordingly as
∆Σ(Q2) = ∆Σ− 3αs(Q
∆g(Q2), (63)
where ∆q(Q2) and ∆Σ(Q2) are the experimentally measured quantities, ∆q and ∆Σ corre-
spond to the calculated quantities without gluon correction. Using the experimental data
Σ(Q2 = 5GeV2) = 0.19 ± 0.02[2], αs(Q2 = 5GeV2) = 0.285 ± 0.013[27], and our result
∆Σ=0.346, the gluon polarization ∆g(Q2) is estimated to be 2.293. Both the results with
and without gluon polarization corrections are presented in Table V. The inclusion of gluon
polarization leads to a better agreement with experimental data for the spin structure.
The calculated magnetic moments of octet baryons are given in Table VI. Although the
deviation is somewhat around 30% in the case of Ξ−, our overall fit to octet baryon magnetic
moments is in good agreement with experiments. Also it should be mentioned that even in
the case of Ξ− the fit can perhaps be improved if corrections due to pion loops are taken
into account[32, 33].
In the model calculations [11, 12, 13, 14, 15, 16, 17, 18, 19], the Goldstone boson sector in
χQM is usually extended to include the η′ meson with U(3) symmetry. According to Cheng
and Li[11], in the large Nc limit of QCD, there are nine Goldstone bosons including the usual
octet and the singlet η′. Thus an constituent quark can also transit to a quark-η′ system. We
have also made an U(3) calculation. With the inclusion of η′, we find that the probabilities
for η′-emission from light and strange quarks P (u → u + η′)=P (d → d + η′)=0.0021 and
P (s → s + η′)=0.0018 which are negligibly small as compared to those of octet Goldstone
boson emissions. We therefore conclude that the contribution of η′ is not important, due to
the obvious axial U(1) symmetry breaking in meson mass spectra mη′ > mK,η.
To summarize, the χQM builds a bridge between the QCD and low-energy quark model.
This allows us to understand the mechanism of flavor symmetry breaking and nucleon flavor-
spin structure in NQM through the consideration of the sea quark and Goldstone bosons
in the substructure of constituent quarks. Using the simple SHO wave function, we have
modeled the wave functions of the composite constituent quarks and thus estimated the
transition probabilities for Goldstone boson emissions. These transition probabilities indeed
reflect the flavor SU(3) symmetry breaking in χQM from the differences in quark masses
ms > mu,d and differences in Goldstone bosons masses mK,η > mπ and they are roughly
in agreement with the parametrizations of other model calculations [11, 12, 13, 14, 15, 16,
TABLE V: The calculated values for the quark flavor distribution functions and spin polarization
functions in proton, as compared with experimental data and NQM results.
Data NQM Our Model
With ∆g Without ∆g
∆u 0.85 ± 0.05[2] 1.33 0.864 0.968
∆d −0.41± 0.05[2] −0.33 −0.377 −0.274
∆s −0.07± 0.05[2] 0 −0.107 −0.003
∆3 = (GA/GV )n→p 1.270 ± 0.003[27] 1.67 1.242 1.242
(GA/GV )Λ→p 0.718 ± 0.015[27] 1 0.737 0.737
(GA/GV )Σ→n −0.340 ± 0.017[27] −0.33 −0.270 −0.270
(GA/GV )Ξ→Λ 0.25 ± 0.05[27] 0.33 0.234 0.234
∆8 0.58 ± 0.025[2] 1 0.701 0.701
∆Σ 0.19 ± 0.02[2] 0.5 0.190 0.346
ū − 0.264
d̄ − 0.392
s̄ − 0.036
ū− d̄ −0.118 ± 0.015[6] 0 −0.128
ū/d̄ 0.67 ± 0.06[6] 1 0.674
IG 0.254 ± 0.005[6] 0.33 0.248
fu − 0.577
fd − 0.407
fs 0.10 ± 0.06[31] 0 0.017
f3 − 0.170
f8 − 0.950
f3/f8 0.21 ± 0.05[14] 0.33 0.179
17, 18, 19]. The fit to both the flavor-spin structure of nucleon and octet baryon magnetic
moments are in good agreement with experiments.
TABLE VI: The caculated octet baryon magnetic moments in nuclear magneton, as compared with
experiments and the results of NQM.
Octet baryons Data[27] NQM[34] Our model
p 2.79± 0.00 2.72 2.73
n −1.91 ± 0.00 -1.81 −1.91
Σ− −1.16 ± 0.025 -1.01 −1.23
Σ+ 2.46± 0.01 2.61 2.67
Ξ0 −1.25± 0.0014 −1.41 −1.36
Ξ− −0.65 ± 0.002 −0.50 −0.44
Λ −0.61 ± 0.004 −0.59 −0.56
ΣΛ 1.61± 0.08 1.51 1.63
Acknowledgments
Zhan Shu would like to thank Fan-Yong Zou and Yan-Rui Liu for useful discussions. This
work was supported by the National Natural Science Foundation of China under Grants
10675008.
[1] J. Ashman et al. (European Muon), Phys. Lett. B206, 364 (1988); Nucl. Phys. B328, 1
(1990).
[2] B. Adeva et al. (Spin Muon), Phys. Lett. B302, 533 (1993); P. Adams et al. (Spin Muon),
Phys. Rev. D56, 5330 (1997).
[3] P. L. Anthony et al. (E142), Phys. Rev. Lett. 71, 959 (1993).
[4] K. Abe et al. (E143), Phys. Rev. Lett. 74, 346 (1995).
[5] P. Amaudruz et al. (New Muon), Phys. Rev. Lett. 66, 2712 (1991); M. Arneodo et al. (New
Muon), Phys. Rev. D50, R1 (1994).
[6] E. A. Hawker et al. (E866/NuSea), Phys. Rev. Lett. 80, 3715 (1998); J. C. Peng et al.
(E866/NuSea), Phys. Rev. D58, 092004 (1998); R. S. Towell et al. (E866/NuSea), ibid. D64,
052002 (2001).
[7] A. Baldit et al. (NA51), Phys. Lett. B332, 244 (1994).
[8] S. Weinberg, Physica A96, 327 (1979).
[9] A. Manohar and H. Georgi, Nucl. Phys. B234, 327 (1984).
[10] E. J. Eichten, I. Hinchliffe, and C. Quigg, Phys. Rev. D45, 2269 (1992).
[11] T. P. Cheng and L.-F. Li, Phys. Rev. Lett. 74, 2872 (1995).
[12] T. P. Cheng and L.-F. Li, Phys. Rev. D57, 344 (1998).
[13] X. Song, J. S. McCarthy, and H. J. Weber, Phys. Rev. D55, 2624 (1997); X. Song, ibid., D57,
4114 (1998).
[14] T. P. Cheng and L.-F. Li, Phys. Rev. Lett. 80, 2789 (1998).
[15] J. Linde, T. Ohlsson, and H. Snellman, Phys. Rev. D57, 452 (1998); T. Ohlsson and H. Snell-
man, Eur. Phys. J. C7, 501 (1999).
[16] H. Dahiya and M. Gupta, Phys. Rev. D64, 014013 (2001).
[17] H. Dahiya and M. Gupta, Phys. Rev. D66, 051501(R) (2002); D67, 114015 (2003).
[18] H. Dahiya, M. Gupta, and J. M. S. Rana, Int. J. Mod. Phys. A21, 4255 (2006).
[19] L. Yu, X.-L. Chen, W.-Z. Deng, and S.-L. Zhu, Phys. Rev. D73, 114001 (2006).
[20] S. Baumgartner, H. J. Pirner, K. C. Konigsmann, and B. Povh, Z. Phys. A353, 397 (1996).
[21] S. Godfrey and N. Isgur, Phys. Rev D32, 189 (1985); S. Capstick and N. Isgur, Phys. Rev.
D34, 2809 (1986).
[22] A. De Rujula, H. Georgi, and S. L. Glashow, Phys. Rev. D12, 147 (1975).
[23] N. Isgur and G. Karl, Phys. Rev. D18, 4187 (1978).
[24] R. Koniuk and N. Isgur, Phys. Rev. D21, 1868 (1980)
[25] N. Isgur, G. Karl, and R. Koniuk, Phys. Rev. Lett. 41, 1269 (1978); N. Isgur and G. Karl,
Phys. Rev. D21, 3175 (1980).
[26] J. Franklin, Phys. Rev. D66, 033010 (2002).
[27] W. M. Yao et al. (Particle Data Group), J. Phys. G33, 1 (2006).
[28] G. Altarelli, G. G. Ross, Phys. Lett. B212, 391 (1988).
[29] R. D. Carlitz, J. D. Collins, and A. H. Mueller, Phys. Lett. B214, 229 (1988).
[30] A. V. Efremov, O. V. Teryaev, Dubna Report No. JIN-E2-88-287, 1998.
[31] J. Grasser, H. Leutwyler, and M. E. Saino, Phys. Lett. B253, 252 (1991); A. O. Bazarko et
al. (CCFR), Z. Phys. C65, 189 (1995).
[32] S. Theberge and A. W. Thomas, Phys. Rev. D25, 284 (1982).
[33] J. Cohen and H. J. Weber, Phys. Lett. B165, 229 (1985).
[34] G. Karl, Phys. Rev. D45, 247 (1992).
	INTRODUCTION
	The Wave Function of a Constituent quark
	FLAVOR AND SPIN STRUCTURE OF PROTON
	BARYON OCTET MAGNETIC MOMENTS
	NUMERICAL RESULTS AND CONCLUSIONS
	Acknowledgments
	References
ABSTRACT
  In $\XQM$, a quark can emit Goldstone bosons. The flavor symmetry breaking in
the Goldstone boson emission process is used to intepret the nucleon
flavor-spin structure. In this paper, we study the inner structure of
constituent quarks implied in $\XQM$ caused by the Goldstone boson emission
process in nucleon. From a simplified model Hamiltonian derived from $\XQM$,
the intrinsic wave functions of constituent quarks are determined. Then the
obtained transition probabilities of the emission of Goldstone boson from a
quark can give a reasonable interpretation to the flavor symmetry breaking in
nucleon flavor-spin structure.

<|endoftext|><|startoftext|>
Introduction
Mounting experimental evidence from high-Tc cuprates 1, nickelates 2,
manganites 3,4 and other interesting materials suggests that large electron-
phonon interactions may play a more important role in the physics of strongly
correlated electron systems than previously thought. Migdal-Eliashberg and
BCS theories have proved extremely successful in describing the effects of
phonons in many materials. However, if the coupling between electrons and
the underlying lattice is large, and/or the phonons can not be treated within
an adiabatic approximation, conventional approaches fail.
The Holstein model contains most of the fundamental physics of the
electron-phonon problem 5. Tight-binding electrons are coupled to the lat-
tice through a local interaction with Einstein modes. For large phonon
frequencies, electrons interact with a strongly correlated Hubbard-like at-
traction, while for small phonon frequencies the lattice gives rise to a static
http://arxiv.org/abs/0704.0030v1
potential which is essentially uncorrelated. Between these two extreme lim-
its of correlated and uncorrelated behavior, levels of correlation are tuned
by the size of the phonon frequency and novel physics is expected. In par-
ticular, it is normally the strength of interaction which is said to tune the
correlation in e.g. the Hubbard model, whereas in the Holstein model, it can
be seen that both interaction strength and phonon frequency may compete
with each other to play this role.
The dynamical mean-field theory (DMFT) approach has proved suc-
cessful in treating the Holstein and other models 6,7,8. DMFT treats the
self-energy as a momentum-independent quantity and is accurate as long as
the variation across the Brillouin zone is small. For many aspects of the
electron-phonon problem in 3D, correlations are short ranged and DMFT
can be successfully applied. The weak coupling phase diagram was studied
by Ciuchi et al. where competing charge-order (CO) and superconducting
states were found 7. Freericks et al. developed a quantum Monte-Carlo
(QMC) algorithm 8,9 and examined the applicability of several perturba-
tion theory based techniques to the electron-phonon problem 10,11,12. The
prediction of measurable quantities away from certain well-defined limits is
severely restricted owing to difficulties inherent in the analytic continuation.
Dynamic properties such as spectral functions can be computed in the case
of static phonons 13, and close to the static limit 14. Alternatively, the limit
of high phonon frequency (attractive Hubbard model) has been studied with
a QMC algorithm 15.
In the current study we are concerned with the behavior of dynamical
properties that could be measured directly with experiment. We use the iter-
ated perturbation theory approximation, which has been demonstrated to be
accurate for the Hubbard model, and use maximum entropy to analytically
continue the results. We compare the resulting single-particle spectral func-
tions over a wide range of electron-phonon coupling strengths and phonon
frequencies. The results obtained using iterated perturbation theory (IPT)
are promising and capture generic weak and strong coupling behavior for
all phonon frequencies. At intermediate phonon frequencies, we find that
electron-phonon interactions produce a spectral function which is simulta-
neously characteristic of both uncorrelated band (static) and strongly cor-
related Mott/Hubbard regimes. We also find that the competition between
band-like and correlated states causes unusual structures in the optical con-
ductivity and resistivity. Provided a material with high enough phonon
frequency can be identified, it is possible that such a state could be observed
experimentally.
This paper is organized as follows. First, we introduce the Holstein
model, the dynamical mean-field theory and analytic continuation techniques
(1a) (2a)
(2b) (2c)
Fig. 1. Second order contributions to the self-energy. Straight lines repre-
sent electron Green’s functions of the host and wavy lines phonon Green’s
functions.
(section 2.). In section 3., we use IPT to determine the spectral functions of
the Holstein model. We compare IPT with exactly known results in the static
limit. This, in conjunction with the conclusions of Ref. 11 leads us to argue
that IPT is a reasonable approximation for the calculation of dynamical
properties in the intermediate phonon frequency regime. We compute the
density of states, optical conductivity and resistivity, and give a heuristic
explanation for their behavior.
2. Formalism
The Holstein Hamiltonian is written as,
H = −t
<ij>σ
iσcjσ +
(gxi − µ)niσ +
Mω20x
The first term in this Hamiltonian represents a tight binding model with
hopping parameter t. The second term couples the local ion displacement,
xi to the local electron density. The final term can be identified as the
non-interacting phonon Hamiltonian. c
i (ci) create (annihilate) electrons at
site i, pi is the ion momentum, M the ion mass, µ the chemical potential
and g the electron-phonon coupling. The phonons are dispersionless with
frequency ω0.
The perturbation theory of this model may be written down in terms
of electrons interacting via phonons with the effective interaction,
U(iωs) = −
M(ω2s + ω
Here, ωs = 2πsT are the Matsubara frequencies for bosons and s is an
integer. Taking the limit ω0 → ∞, g → ∞, while keeping the ratio g/ω0
finite, leads to an attractive Hubbard model with a non-retarded on-site
interaction U = −g2/Mω20 . Iterated-perturbation theory (IPT) is known to
be a reasonable approximation to the half-filled Hubbard model 16,17. Taking
the opposite limit (ω0 → 0, M → ∞, keeping Mω20 ≡ κ finite) the phonon
kinetic energy term vanishes, and the phonons depend on a static variable
xi. As such, the model may be considered as uncorrelated.
We solve the Holstein model using dynamical mean-field theory (DMFT).
DMFT freezes spatial fluctuations, leading to a theory which is completely
momentum independent, while fully including dynamical effects of excita-
tions. In spite of this simplification, DMFT predicts non-trivial (correlated)
physics and may be used as an approximation to 3D models 18. As discussed
in Ref. 6, DMFT involves the solution of a set of coupled equations which
are solved self-consistently. The Green’s function for the single site problem,
G(iωn) can be written in terms of the self-energy Σ(iωn) as,
G−1(iωn) = G−10 (iωn)− Σ(iωn), (3)
where Σ is a functional of G0, the Green’s function for the host of a single
impurity model. Here ωn = 2πT (n+ 1/2) are the usual Matsubara frequen-
cies. The assumptions of DMFT are equivalent to taking the self-energy of
the original lattice problem to be local, hence G is also given by,
G(iωn) =
dǫD(ǫk)
iωn + µ− Σ(iωn)− ǫk
where D(ǫ) the density of states (DOS) of the non-interacting problem (in
our case g = 0). We work with a Gaussian DOS which corresponds to a
hypercubic lattice 18, D(ǫ) = exp(−ǫ2/2t2)/t
2π. Equations (3) and (4)
are solved according to the following self-consistent procedure: Compute
the Green’s function from equation (4) and the host Green’s function of the
effective impurity problem, G0, from equation (3); then calculate a new self-
energy from the host or full Green’s functions. In the following we will take
the hopping parameter t = 0.5, which sets the energy scale.
Once the algorithm has converged, and after analytically continuing to
the real axis, response functions can be computed. We use the MAXENT
method for the determination of spectral functions from Matsubara axis
data. MAXENT treats the analytic continuation as an inverse problem 19.
The Green’s function, G(z), is given by the integral transform,
G(z) =
z − x
dx (5)
where ρ(x) is the spectral function (ρ(ω) = Im[G(ω + iη)]/π). The problem
of finding ρ is therefore one of inverting the integral transform. Since the
data for Gn are incomplete and noisy for any finite set of Matsubara fre-
quencies, the inversion of the kernel of the discretised problem is ill-defined.
The MAXENT method selects the distribution ρ(x) which assumes the least
structure consistent with the calculated or measured data. These methods
have been extensively reviewed in the context of the inversion of the kernel
in Refs. 19,20. The applicability to the current problem has been thoroughly
tested, and is found to be accurate.
Within the DMFT formalism, many response functions follow directly
from the one-electron spectral function and the electron self-energy (essen-
tially because of the neglect of all connected higher point functions apart
from G0). Here we will be interested in the conductivity 6:
Re[σ(ω)] =
dǫD(ǫ)
dν ρ(ǫ, ν)ρ(ǫ, ν + ω)[f(ν)− f(ν + ω)] (6)
where f(x) is the Fermi-Dirac distribution. Taking the limit, ω → 0, leads
to the DC conductivity. (The conductivity is in units of e2V/ha2, where a
is the lattice spacing, V the volume of the unit cell and e and h are the
electron charge and Planck’s constant respectively.)
3. Results
In this section, we examine the validity of an approximation to the self-
energy constructed from only first and second order terms with respect to the
spectral functions calculated at very high and very low phonon frequencies.
Finally, we calculate the optical conductivity and resistivity.
Spectral functions are shown in figures 2 and 3. The perturbation theory
is carried out in the host Green’s function (i.e. both electrons and phonons
are bare). All diagrams in fig. 1 are considered,
Σ1a(iωn) = −UT
G0(iωn − iωs)D0(iωs) (7)
Σ2a(iωn) = −2U2T 2
D20(iωn−m)G0(iωm)G0(iωs)G0(iωn−m+s) (8)
Σ2b(iωn) = U
D0(iωm−s)D0(iωn−m)G0(iωm)G0(iωs)G0(iωn−m+s)
Σ2c(iωn) = U
D0(iωn−m)D0(iωm−s)G20(iωm)G0(iωs) (10)
This also gives the correct weak coupling limit for the electronic Green’s
function.
We consider first the calculation of spectral functions close to the static
and Hubbard limits. In the instantaneous limit the perturbation theories
for the Holstein and Hubbard models are equivalent. It is well known that
second order perturbation theory in the host Green’s function provides a
good approximation to the Hubbard model 21. In the static limit, the exact
solution can easily be calculated 13. Figure 2 shows spectral functions from
the exact solution, computed for a hypercubic lattice and spectral functions,
computed using 2nd order perturbation theory at a temperature of T = 0.08.
The phonon frequency ω0 = T/20 was chosen so that the effects of the
phonon kinetic energy are negligible compared to thermal fluctuations. This
allows a direct comparison to be made between the exact and approximate
results. The comparison shows that the widths and positions of the major
features are closely related.
The results in Figure 2 for the static limit (ω0 → 0), together with the
fact that second order IPT is known to give reasonable results in the instan-
taneous limit (ω0 → ∞), suggest that the calculation of spectral functions
should also be reliable at intermediate frequencies. We note that Freericks et
al. also find a reasonable agreement between the IPT and QMC self-energies
at half-filling, and that this should lead to a good agreement in the Matsub-
ara axis Green’s function. We have therefore solved the IPT equations for
the spectral functions at intermediate frequencies. We show the results in
Fig 3.
The results of the IPT calculations in the regime of intermediate cou-
pling (Fig 3) are consistent with known results for the limiting cases. For
frequencies ω ≫ ω0, the system has the qualitative behavior of the static
limit: The original unperturbed density of states splits into two sub-bands
centered around ±U/2. For small frequencies (ω ≪ U,ω0) and interaction
strength, U , less than some critical value, the system behaves as an in-
teracting electron (Hubbard) model, since the retarded interaction between
particles U(iωs) (see equation 2) is effectively constant for ωs ≪ ω0. There is
then a narrow quasiparticle band at the Fermi energy with density of states
at the Fermi energy pinned at its non-interacting value 6. We also note that
the results for small coupling and small frequencies are in good agreement
with those calculated using ME theory in the metallic phase 22.
The recent renormalization group (NRG) calculations of Meyer et al.
23,24 also report the spectral function in the intermediate regime. The NRG
is in principle an exact method for solving the impurity problem onto which
the DMFT equations map. Our results are largely consistent with theirs
adding further support to the use of IPT in the intermediate regime. When
comparing with the results of Meyer et al 23, one should note that the Hamil-
tonian (1) is exactly the one considered in Ref. 23 but with the quantity
2Mω0 =
Uω0/2 denoted by g in Ref. 23 and with energies measured
in terms of the full bandwidth (instead of the half-bandwidth used here).
In this paper, we work with the Gaussian density of states for the non-
interacting electron DOS, whereas reference 23 uses the semi-elliptic DOS.
In general, we would expect the critical values for the opening of a gap to
be larger for the Gaussian case than for the semi-elliptic case. The critical
coupling for the parameters in Figure 3(c) lies just above U = 2.0, corre-
sponding in the units used in Ref. 23 to g = 1, compared with the critical
value found for the semi-elliptic DOS of g = 0.69 (note that because of the
different energy scales ω0 = 2 in our results corresponds to ω0 = 1 in the
units of 23). However, the shapes of the spectral functions are similar in
both cases, with a five-peaked structure below and four peaked structure
above the transition. The peaks are narrower in the IPT results than in the
NRG results and there is less weight in the high energy peaks. This may
reflect the different DOS, or inaccuracies in the NRG method at frequencies
far from the Fermi energy resulting from the logarithmic discretisation, but
more likely the limitations of the IPT method.
Using the method outlined in section 2. it is possible to calculate the
real-axis self-energy. The temperature evolution of the imaginary part of the
self-energy may be seen in figure 4 for U = 2 and ω0 = 2 The self-energy
at low temperatures and small frequencies shows a quadratic (Fermi-liquid
like) behavior consistent with the narrow quasiparticle peak seen in the spec-
tral function (Fig 3) and develops to a broad central peak at higher tem-
peratures. There are also peaks corresponding to the Hubbard sub-bands.
With increasing temperature these phonon-induced peaks move together and
merge into the single central maximum associated with incoherent on-site
scattering. This peak is naturally characterized within the framework of
the self-consistent impurity model formulation of the DMFT equations 6 in
terms of a Kondo resonance. In this formulation, the dynamical mean field
G0(ω) is written in terms of a hybridization ∆(ω) between the site orbital and
a bath of conduction electrons and is therefore equivalent to an Anderson
impurity model with the added complication that ∆ is frequency-dependent
and needs to be computed self-consistently. However, many of the properties
in the metallic state are similar to those of the Anderson impurity model.
In particular the central peak in the spectral function can be viewed as the
Kondo resonance of the impurity model.
As all connected point-functions with order higher than G0 are neglected
within DMFT, the computation of q = 0 response functions is straightfor-
ward. As an example we show in fig 5 the (real part of the) optical con-
ductivity for various temperatures with U = 2 and ω0 = 2. The structure
seen in the curves reflects the structure of the density of states. There is a
strong response at low frequencies as particle-hole pairs are excited within
the ‘Kondo-like’ quasiparticle resonance at the Fermi energy. The second
peak at frequencies ω ∼ 1 arises from excitations between the quasiparti-
cle resonance and the large satellite (Hubbard band), while the third peak
around ω ∼ 5.0 involves excitations between the satellites. The first dip
at ω = 0.5 is the signature of the small Mott bands close to the Kondo
resonance and is the feature most likely to be observable experimentally.
Also calculated is the resistivity as a function of temperature (figure 6).
The curves reflect the structure of the self-energy shown in Figure 4: At low
temperatures the resistivity rises quadratically as expected for interacting
electrons. The temperature scale is given by the quasiparticle bandwidth
(‘Kondo temperature’). Above this temperature the resistivity drops as the
on-site (Kondo) scattering amplitude for electrons reduces. There is a slight
second peak at higher temperatures. The structures in ρ can be traced back
to the behavior seen in the self-energy. This second peak is the result of an
increase in scattering off the phonons: these soften slowly with increasing
temperature and, around the second peak in the resistivity curve, outweigh
the reduction in Kondo-like scattering as the temperature increases. This
effect clearly involves a partial cancellation between two effects and hence
may be sensitive to the accuracy of the analytic continuation, which at higher
temperatures starts from reduced information (since the majority of Mat-
subara points simply show asymptotic behaviour).
4. Summary
We have discussed the result of changing the ratio of electron and
phonon energies as a method for tuning the amount of correlation in a model
of electron-phonon interactions. We use approximate schemes to solve for
the spectral functions of the Holstein model. On the basis that second-order
iterated perturbation theory predicts the correct qualitative behavior at a
range of couplings in the static limit as well as describing correctly the limit
of infinite phonon frequency, we have computed the spectral function at in-
termediate frequencies and couplings. We have used an adaptation of the
standard maximum entropy scheme to obtain the spectral function, the self-
energy and the conductivity of the model by analytic continuation. These
quantities had not previously been studied.
The results for the intermediate frequency regime are consistent with
what might be expected on the basis of the limiting cases (high and low
frequencies). At energy scales smaller than ω0, the system shows behav-
ior similar to that of the Hubbard model found in the instantaneous limit
ω0 → ∞: there is a narrow central ‘Kondo-resonance’ or quasiparticle band.
At large energies the model behaves as it does in the static regime with a
well-defined band splitting. At intermediate frequencies the picture is com-
plicated by the interplay of the loss of coherence in the quasiparticle band
and the effective renormalization of the phonon frequency as a function of
coupling and temperature. We suggest that if systems with anomalously
large phonon frequencies and couplings exist, then the optical conductivity
should bear the hallmark of the correlation tuned regime.
5. Acknowledgements
The authors would like to thank F.Essler and F.Gebhard for useful
discussions.
REFERENCES
1. A.Lanzara, P.V.Bogdanov, X.J.Zhou, S.A.Kellar, D.L.Feng, E.D.Lu,
T.Yoshida, H.Eisaki, A.Fujimori, K.Kishio, J.-I.Shimoyama, T.Noda, S.Uchida,
Z.Hussa, and Z.-X.Shen. Nature, 412:6846, 2001.
2. J.M.Tranquada, K.Nakajima, M.Braden, L.Pintschovius, and R.J.McQueeney.
Bond-stretching-phonon anomalies in stripe-ordered la1.69sr0.31nio4. Phys.
Rev. Lett., 88:075505, 2002.
3. G.M.Zhao, K.Conder, H.Keller, and K.A.Müller. Nature, 381:676, 1996.
4. A.J.Millis, R.Mueller, and B.I.Shraiman. Phys. Rev. B, 54:5405–5417, 1996.
5. T.Holstein. Ann. Phys., 8:325–342, 1959.
6. A.Georges, G.Kotliar, W.Krauth, and M.Rozenburg. Rev. Mod. Phys., 68:13,
1996.
7. S. Ciuchi, F.de Pasquale, C.Masciovecchio, and D.Feinberg. Europhys. Lett.,
24:575–580, 1993.
8. J.K.Freericks, M.Jarrell, and D.J.Scalapino. Phys. Rev. B, 48:6302–6314, 1993.
9. J.K.Freericks, M.Jarrell, and D.J.Scalapino. Europhys. Lett., 25:37–42, 1994.
10. J.K.Freericks. Phys. Rev. B, 50:403–417, 1994.
11. J.K.Freericks and M.Jarrell. Phys. Rev. B, 50:6939–6952, 1994.
12. J.K.Freericks, V.Zlatić, W.Chung, and M.Jarrell. Phys. Rev. B, 58:11613–
11623, 1998.
13. A.J.Millis, R.Mueller, and B.I.Shraiman. Phys. Rev. B, 54:5389–5404, 1996.
14. P.Benedetti and R.Zeyher. Phys. Rev. B, 58:14320–14334, 1998.
15. M.Keller, W.Metzner, and U.Schollwock. Dynamical mean-field theory for the
normal phase of the attractive hubbard model. J. Low. Temp. Phys, 126:961,
2002.
16. A.Georges and G.Kotliar. Phys. Rev. B, 45:6479, 1992.
17. M.J.Rozenberg, X.Y.Zhang, and G.Kotliar. Phys. Rev. Lett., 69:1236, 1992.
18. W.Metzner and D.Vollhardt. Phys. Rev. Lett., 62:324, 1989.
19. J.E.Gubernatis, M.Jarrell, R.N.Silver, and D.S.Sivia. Phys. Rev. B, 44:6011,
1991.
20. H.Touchette and D.Poulin. Aspects numériques des simulations du modèle de
hubbard – monte carlo quantique et méthode d’entropie maximum. Technical
report, Université de Sherbrooke, 2000.
21. X.Y.Zhang, M.J.Rozenberg, and G.Kotliar. Phys. Rev. Lett., 70:1666, 1993.
22. J.P.Hague and N.d’Ambrumenil. cond-mat/0106355, 2001.
23. D.Meyer, A.C.Hewson, and R.Bulla. Gap formation and soft phonon mode in
the holstein model. Phys. Rev. Lett., 89:196401, 2002.
24. A.C.Hewson and D.Meyer. Numerical renormalization group study of the
anderson-holstein impurity model. J. Phys. Condens. Matt, 14(3):427, 2002.
-6 -4 -2 0 2 4 6
(a) Static
U=0.33
U=1.17
U=2.00
U=4.50
-6 -4 -2 0 2 4 6
(b) IPT
U=0.33
U=1.17
U=2.00
U=4.50
-6 -4 -2  0  2  4  6
(c) Conserving
U=0.33
U=1.17
U=2.00
U=4.50
Fig. 2. The spectral function in the static limit of the half-filled Holstein
model computed at temperature T = 0.08 (a) using the exact solution and
(b) using 2nd order IPT at a low frequency, ω0 = 0.004. The IPT solution
at this small non-zero frequency is quite close to the exact solution in the
static limit. In particular, the band splitting and the positions of the maxima
agree. To contrast, panel (c) shows the results of the approximation using
the full Green’s function (Diagram 2c from figure 1 is not included to avoid
overcounting)
-6 -4 -2 0 2 4 6
(a) ω0=0.056
U=0.33
U=1.17
U=2.00
U=4.50
-6 -4 -2 0 2 4 6
(b) ω0=0.500
U=0.33
U=2.00
U=4.50
-6 -4 -2 0 2 4 6
(c) ω0=2.000
U=0.33
U=2.00
U=4.50
Fig. 3. Spectral functions of the half-filled Holstein model for various
electron-phonon couplings U , approximated using 2nd order perturbation
theory at T = 0.02 and ω0 = 0.056 (top), ω0 = 0.5 (center) and ω0 = 2
(bottom). In the low frequency limit (ω0 = 0.125), the spectral functions
are similar to those in the static limit shown in Fig. 2, with only a small
effect from the non-zero phonon frequency. As the temperature is lower than
the phonon frequency, the central quasiparticle peak is clearly resolved for
U ≤ 2. For the intermediate frequencies (central panel) the peak around
ω = 0 is again clear and has a width ∼ ω0 at low coupling. In the gapped
phase at large couplings two band-splittings are visible. For ω ≫ ω0 the
band splits just as in the static limit, while for ω ≪ U there is a peak at
a renormalized phonon frequency (which is less than the bare phonon fre-
quency). In the ungapped phases for ω0 = 0.5 and 2, the low energy behavior
is similar to that found in the Hubbard model with a narrow quasiparticle
band forming near the Fermi energy with the value at the Fermi energy
pinned to its value in the non-interacting case.
-6 -4 -2 0 2 4 6
T=0.08
T=0.16
T=0.32
Fig. 4. Imaginary part of the self-energy of the half-filled Holstein model
when U = 2 and ω0 = 2 computed using IPT and analytically continued
using MAXENT. At low temperatures the low frequency behavior is Fermi-
liquid like (quadratic dependence on ω) down to quite low frequencies (at
very low frequencies and low temperatures there are some inaccuracies as-
sociated with the truncation in Matsubara frequencies). There are peaks at
the frequencies associated with the phonon energy and with U. As the tem-
perature increases the minimum at the Fermi energy (ω = 0) increases as
incoherent on-site scattering in the corresponding local impurity increases
(see text). At temperatures above the characteristic (Kondo-like) energy
scale the central peak subsides and disappears.
0 1 2 3 4 5 6
T=0.08
T=0.16
T=0.32
Fig. 5. The real part of the optical conductivity for a system with U = 2.0
and ω0 = 2.0 for a range of temperatures. The structure of the spectrum
reflects that in the density of states (see fig 3. At low frequencies, electrons
may be excited within the quasiparticle resonance. The second peak at
ω ∼ 2.0 represents excitations from the Kondo resonance to the large satellite
(Hubbard band), and the peak at ω ∼ 5.0 represents excitations between the
satellites.
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45
U=2.00
U=2.13
U=2.28
Fig. 6. The resistivity as a function of temperature for the Holstein model
for ω0 = 2 for varying electron-phonon coupling strengths. The resistivity is
in units of e2V/ha2 with V the unit cell volume and a the lattice cell spacing.
The behavior reflects what is seen in the self-energy. At low temperatures
the behavior is similar to that in a Kondo lattice. The resistivity rises
sharply with temperature for temperatures smaller than the quasiparticle
bandwidth. The resistivity then drops for temperatures larger than this
lattice coherence temperature. A simple logarithmic decay with temperature
is not visible because, in addition to the Kondo-like scattering processes,
the electrons are scattered from thermally excited phonons whose spectral
weight broadens and shifts towards lower frequencies as the temperature
rises. This leads to a second peak. In contrast, the second peak is not
visible for the Hubbard model, and indicates the presence of two energy
scales in the Holstein model.
	Introduction
	Formalism
	Results
	Summary
	Acknowledgements
ABSTRACT
  We investigate the effect of tuning the phonon energy on the correlation
effects in models of electron-phonon interactions using DMFT. In the regime
where itinerant electrons, instantaneous electron-phonon driven correlations
and static distortions compete on similar energy scales, we find several
interesting results including (1) A crossover from band to Mott behavior in the
spectral function, leading to hybrid band/Mott features in the spectral
function for phonon frequencies slightly larger than the band width. (2) Since
the optical conductivity depends sensitively on the form of the spectral
function, we show that such a regime should be observable through the low
frequency form of the optical conductivity. (3) The resistivity has a double
kondo peak arrangement

<|endoftext|><|startoftext|>
Introduction
A particle entering the crystal lattice parallel to a major crystallographic direction can be
captured and channeled by the lattice along a crystal axis or plane [1]. For instance, a positive
particle can be channeled between adjacent atomic planes. In a bent crystal, the channeled
particles can follow the bend [2]. This led to elegant technique of beam steering by bent
channeling crystals [3] now experimentally explored over six decades in energy from low MeV
[4] to 1 TeV [5]. The technique is used on permanent basis in IHEP Protvino where crystal
systems extract protons from 70 GeV main ring with efficiency of 85% at intensity up to 4×1012
protons using Si crystals just 2 mm along the beam [6]. Bent crystals channel in good agreement
with predictions up to the highest energies [6-9].
Crystal applications at the 7-TeV Large Hadron Collider are considered for beam collimation
and extraction [10] and in situ calibration of CMS and ATLAS calorimeters [11]. In another
proposal, crystal could capture the particles emerging from the interaction point (IP) with small
angles and channel them out of the beam [12]. This could help to improve on measurement of
small angle elastic and “quasi-elastic” scattering in CMS and ATLAS where lower momentum
transfers might become available for pp elastics scattering and lower proton momentum losses
for diffractive physics [12]. Groups in both CMS (with TOTEM) and ATLAS would like to add
very forward proton detectors, 420 m downstream on both sides, a project FP420 [13]. By
detecting protons that have lost less than 1% of their longitudinal momentum, a rich QCD,
electroweak, Higgs and BSM program becomes accessible, with the potential to make
measurements which are unique at LHC, and difficult even at a future linear collider [13].
The measurement of the displacement x and angle x’ (in the horizontal plane) of the outgoing
protons relative to the beam allows the momentum loss ξ=∆p/p and transverse momentum of the
scattered protons to be reconstructed. Protons emerging from diffractive scattering at LHC have
very small emission angles (10-150 µrad) and fractional momentum loss (ξ = 10-8 – 0.1). Hence
they are very close to the beam and can only be detected in the Roman Pots downstream if their
displacement at the detector location is large enough to escape the beam halo [13].
2. Crystal efficiency
As practice shows, crystal can go into a very limited space and get particles from there [6].
Most efficient crystal applications are based on so called “multipass” mode where particles can
encounter a crystal many times in the ring [6,14]. There are also successful experimental
demonstrations of highly efficient channeling in a single pass, with efficiency up to 60% at
CERN SPS [15]. Throughout this paper we consider only a single-pass channeling.
We show with simulations that at the LHC a crystal can efficiently channel forward protons.
For channeling simulations we apply a Monte Carlo code CATCH [16] successfully used for
prediction of experiments at CERN SPS [9], IHEP U-70 [6], Tevatron [7], RHIC [8] and KEK
[17] and crystal applications at the LHC [10,11].
Crystal capture is very selective in angle. The critical angle θC within which the capture is
possible is as small as about ±5 µrad / E1/2(TeV) at a high energy E in Silicon (110) planes.
Proton divergence of 150 µrad is almost 100 times θC at 7 TeV. Therefore it is not possible to
capture all these protons by a plain crystal. However, we can suggest an efficient solution
benefiting from the fact that all diffractively scattered protons originate from a small region at
the IP.
For standard LHC optics with beta function value at the IP β*=0.55 m, the beam size at the IP
is σbeam≈16 µm rms. The spread in the transverse position of the vertex point where outgoing
protons originate from is determined by the rms spread of the beam and equals σbeam/√2 [13]. As
protons emerge from diffractive scattering at LHC with emission angles up to ≈150 µrad and
interaction width σbeam/√2, the emittance of the beam to be trapped by a crystal is ≤ 2π µrad-mm
only. This corresponds to the acceptance of a Si crystal of ~1 mm transverse size. The match of
the diffractive-protons emittance to the crystal acceptance means that the particles could be
                                                                                                                                                            
♦ http://mail.ihep.ru/~biryukov/
trapped and channeled efficiently. To realize this, one has to match the crystal design and
location to the application. For the LHC scenario with high luminosity, we find the most efficient
design to be a crystal with a point-to-parallel focusing entry face. Focusing crystal proposed by
A.I Smirnov in 1985 has a face shaped so that the tangents to the crystal planes cross at a focus
line at some distance LF from the crystal [3]. This kind of crystal was successfully tested at IHEP
where it efficiently trapped a beam of ±2 mrad divergence (or ~100 times θC) [18]. The crystal
traps protons emerging from the focus line uniformly from all the angular range if the entry face
has a proper shape [18].
We find that for most efficient channeling in the LHC the focus distance LF of this crystal
should be equal to the effective distance Leff between the crystal location and the IP. In a drift
space, Leff is geometrical distance. In accelerator lattice, Leff =(β*βC)1/2sin∆ψ, where βC is β value
at the crystal and ∆ψ is the phase advance from the IP to the crystal. A plain crystal with a flat
entry face has LF = ∞.
In simulations with the low β* optics settings, a focusing crystal shows best efficiency if
installed at a location with effective distance Leff ≥ 15 m from the IP. It can be a Si (110) or (111)
crystal of ≈(0.15 mrad)×Leff ≥ 2.5 mm transverse size in order to capture efficiently all diffractive
protons. Simulation predicts that a focusing Si(110) crystal with LF=Leff traps 90% of 7 TeV
protons emerging from the IP in the angular range of 150 µrad width into channeling mode. The
efficiency figure is almost independent of the crystal location provided Leff ≥ 15 m.
The reason for high efficiency at high Leff is that, at a distance Leff from the IP, any point at
the crystal entry face sees the beam source of σ size (at the IP) at an angle of σ/Leff. Channeling
efficiency reduces by a factor of about (1-(σ/LeffθC)2)1/2 [3]. The reduction in efficiency by a
factor of ≈1-σ2/2L2effθC2 becomes negligible for L2eff >> σ2/2θC2 =β*ε/4θC2 where ε is beam
emittance. With β*=0.55 m and Si(110) crystal, channeling efficiency saturates for L2eff >> (4
One more idea for efficient channeling of forward protons in the LHC is that a crystal can be
installed with planes parallel to either x’ or y’ plane. For application, it is not critical whether
crystal bends protons in horizontal or vertical plane to produce an offset at the detector. But the
distance Leff in accelerator lattice from the IP to the crystal can be very different in x and y
planes. Then channeling efficiency is very different, whether crystal traps protons in x’ or y’
plane. We suggest in this case to install crystal for channeling in the plane with larger Leff.
For instance, on the location 200 m downstream of the IP5 (CMS) and some 20 m ahead of
the Roman Pot station at 220 m where crystal could be installed, Leff ≈6 m in x’ and ≈20 m in y’
plane. According to the analysis above, channeling efficiency in y’ plane should be great while in
x’ plane moderate. Indeed, our simulations for this location show channeling efficiency of ≈87%
in y’ and only ≈60% in x’ plane, for β*=0.55 m and optimal crystals of Si(110). A plain Si crystal
has channeling efficiency in x’ plane of just 3.5% or 17 times lower than a Si focusing crystal
adapted to the LHC lattice.
For the run-in phase of the LHC with β*=2 m we find that channeling efficiency of ≥ 85% can
be achieved if crystal is located at Leff ≥ 30 m downstream of the IP. The nominal, high
luminosity optics of the LHC is not optimized for forward proton detection. Therefore a
possibility to use a channeling crystal can be very helpful as it offers opportunities for diffractive
physics studies otherwise inaccessible in the nominal LHC settings.
The LHC options with a high β* (1540 and 90 m) are devised for the studies of diffractive
physics. With β*=1540 m, the emittance of diffractively scattered protons increases to ≈50π
µrad-mm. This corresponds to the acceptance of a Si crystal of ≥ 30 mm transverse size. Such a
crystal is not out of question, however the problem is where to fit it in the LHC. In terms of Leff,
good channeling efficiency requires a location with L2eff >> (130 m)2 in this optics.
We simulated channeling on the location 200 m from the IP5. In β*=1540 m optics, a 10-cm
Si(110) crystal trapped and bent protons 0.5 mrad in x’ plane with efficiency of 41%. A Ge(110)
crystal shows there 48% efficiency, i.e. comparable to Si. All figures assumed a perfect match
LF=Leff in crystal. A plain Si crystal gives efficiency of <<1%, or 300 times lower than a Si
focusing crystal adapted to the LHC lattice on this location.
In β*=90 m option on the same location, the choice of plane is important because Leff ≈10 m in
x’ and ≈170 m in y’ plane. Preferred location should have L2eff >> (60 m)2 so we expect very
different efficiencies in x’ and y’ planes. Our simulations give crystal efficiency of 72% in y’ and
only 7% in x’ plane for β*=90 m. Here, crystal application is feasible only with bending in
vertical plane.
Low efficiency may exclude a crystal use for double-pomeron-exchange events (pp→pXp)
with double-arm reconstruction, because the probability to have channeling in both arms in
coincidence becomes small, e.g. (41%)2≈17%. For reconstruction of single-diffraction events
(pp→pX) more detailed studies are required before the benefits (or their absence) from a crystal
use with high β* options can be understood.
In this paper we suggest the use of a single crystal for proton extraction from halo and
delivery to the detector. The use of a 2-stage crystal system [12], first crystal for extracting a
proton and second one for bending it a big angle, would reduce the overall efficiency by a factor
of ~0.6 (ideally) or less. The 2nd crystal traps only part of the protons channeled in the 1st one.
Finally, we notice that one can filter diffraction events with a crystal. Instead of trapping all
forward protons, crystal acceptance can be made smaller and sample e.g. only the most forward
protons emerging from the IP with the angles of a few µrad.
3. Precise transmission in a single (x, x’) plane
Whereas protons are physically delivered from the IP to detector with good efficiency, the
essential question is whether the information on phase space (x, x', y, y', E) distribution of
particles is lost or corrupted while the particles are captured and transmitted in crystal. The
success of experiments on measuring forward high momentum protons at the LHC depends on
the angular precision of proton track reconstruction. A plain crystal would destroy the phase
space information first by selecting particles from just a single direction and then disturbing the
exit angle of particle by coherent and incoherent scattering in crystal. Plain crystal acceptance is
±θC and crystal accuracy in angle transmission is again ±θC. That means, a plain crystal traps and
delivers about zero bit information on angle distribution. In this paper we design a crystal with
the acceptance of ~100 θC and angle transmission accuracy of ~0.1 θC, although it sounds against
the nature of crystal channeling.
Suppose particles are coming with a distribution over (x, x', y, y', E). Ideally, we would like
the crystal to trap all coming particles and preserve their distribution over (x, x', y, y', E), and
then shift an angle of θ each particle towards a physical setup where this distribution can be
analyzed in detail.
One should solve two problems. One problem was to trap and bend a beam with a divergence
much greater than the critical angle. A focusing crystal adapted to the LHC optics solves this
problem. In simulations, a focusing crystal traps with 90% efficiency all protons emerging from
the IP with the angular distribution ~100 times θC. Notice that the trapped particles fully preserve
also their distribution over the angle in the plane orthogonal to the plane of channeling. In such a
crystal, particles are trapped uniformly from a very broad distribution over x’ and y’.
A bent crystal would transform (x, x', y, y') at the entrance into (x, x'+θ, y, y') at the exit. To
do so, each trapped particle has to be channeled over the same distance in crystal. Therefore, the
shape of the crystal exit face must match the entry face. Then in a bent crystal each channeled
particle receives the same bending angle.
Although the crystal described above can solve the idea of sampling a broad distribution of
particles and delivering it to a required destination, second problem is how to preserve the
sampled distribution (x, x', y, y') “frozen” on transmission through the crystal lattice as precise as
possible. The coordinates (x, y) of particles are obviously preserved in crystal, so one should take
care of the accuracy in transmission of angles x’, y’ only.
The protons channeled between atomic planes in crystal are disturbed by (1) oscillations in
the channeling plane with an amplitude up to θC and by (2) scattering on a rarefied electronic gas
(mostly valence electrons) in both planes, x’ and y’. Notice that nuclear scattering will not
disturb the sample of transmitted channeled particles as this process is strongly suppressed for
channeled positive particles. Simply saying, any particle nuclear scattered would be dechanneled
and thus not present in the sample of bent particles.
That gives us the first idea that partially solves the problem of transmission accuracy. The
idea is that the information on crystal-captured particles is very well preserved in one plane, e.g.
(x, x'), while the particles are trapped and bent in another plane, e.g. (y, y’). Notice that particle
distribution in the plane orthogonal to channeling is favored twice. Firstly, they are easily
trapped with a broad angular distribution; secondly they are transmitted with a very little
scattering. Information in this plane will be best preserved. The opportunity to have perfect data
on just one plane is interesting for applications. The reconstruction of the Higgs boson mass in
reaction pp→p+H+p requires (x, x’) data in horizontal plane only [13].
Figure 1 The difference in proton angles, x’ and y’, before and after a Si(110) crystal.
Oscillations in the channeling plane on the atomic coherent potential are a greater problem.
Fig. 1 shows a distribution of the difference in proton angle in x’ and y’ planes before and after a
channeling in crystal, (x’OUT – x’IN) and (y’OUT – y’IN – 0.1 mrad), as obtained in simulations for a
Si(110) bent crystal channeling in y’ plane. The accuracy in x’ transmission in crystal is very
good indeed, ~0.1 µrad rms. The width of (y’OUT – y’IN – 0.1 mrad) distribution is much greater
due to oscillations in the potential of Si (110) planes.
4. Precise transmission in both planes
To solve the problem of accuracy in the other plane, i.e. the plane of channeling, one solution
is to use a channel with a lower critical angle, for instance Si(100) instead of (110) or (111). A
more universal solution is to use a strongly bent crystal. The critical angle θC is gradually
reduced to zero when the crystal curvature approaches a critical value. The strong focusing of a
strongly bent crystal suppresses channeling oscillations to any low level needed in the
application.
Fig. 2 shows the difference in proton angle in x’ and y’ planes before and after channeling in a
crystal, (x’OUT – x’IN– 0.1 mrad) and (y’OUT – y’IN), as obtained in simulations for a 2 mm Si(100)
crystal bent 0.1 mrad. The protons were channeled in x’ plane. The rms value of angle smearing
found in simulations is 0.2 µrad both for x’ and y’.
Figure 2 The same as in Fig. 1 but for a strongly bent Si (100) crystal.
This accuracy should be compared to the angular resolution of the detectors downstream of
the crystal. Measuring proton coordinates with ~10 µm resolution [13] over a base of ~8 m as
allowed by a drift space would give an angular accuracy of ~1.4×10µm/8m=1.8 µrad. Addition
(quadratic) of crystal transmission accuracies doesn’t change this resolution. That would be
perfect for crystal. With a much better resolution on the detector side, down to 0.5 µrad, the
overall resolution becomes ~0.55 µrad, i.e. just slightly disturbed. Crystal transmission in both
planes, x’ and y’, is still almost perfect.
Because of scattering on electronic gas, the protons loose energy in crystal. In simulations, the
energy loss and its fluctuations in crystal are ∆E/E ≈10-7–10-6, i.e. much smaller than even the
nominal energy spread in the LHC beam, 1.1×10-4. Energy losses in bent crystals were studied in
experiments at CERN SPS with protons of 450 GeV and Pb ions of 33 TeV where CATCH
predictions were also validated [9]. The diffractively scattered protons would have energy spread
on the order of 100 GeV, or ∆E/E ≈1.5%, at the crystal entrance. In simulations with β*=0.55-2
m optics, channeling efficiency was completely independent of energy even for ∆E/E≈10%. In
high β* options, crystal efficiency was uniform within ≈0.7% for ∆E/E ≈1.5%. One can say that a
phase space distribution (x, y, x', y', E) can be perfectly preserved in crystal and no information is
lost on transmission in crystal.
Figure 3 An example of beam space (x’, y’) at the entrance to the crystal (a) and at the exit (b).
Fig. 3 shows an example of a (x’, y’) plot at the entrance to the crystal (a) and at the exit of it
(b) where we tried to show how accurately a crystal can transmit a signature in angular space
(semicircle chosen as a probe). The resolution of the image transmitted by a crystal is ~0.2 µrad
in both planes. In terms of the critical channeling angle θC, the obtained resolution is an order of
magnitude finer than θC while the size of the trapped and channeled area can be some orders of
magnitude greater than θC.
In the applications there is no point to have a crystal transmission too perfect. It should match
the other sources of inaccuracy like a multiple scattering in the detectors and vacuum chambers,
etc. By tuning crystal parameters, in principle, one could very much improve in precision of the
beam image downstream of the crystal but loose in brightness of the image, i.e. in statistics rate,
as the efficiency of crystal transmission could be affected.
Finally, we suggest another idea for the channeling plane (a “microscope idea”) that improves
not only the crystal accuracy but even the detector resolution in that plane. Crystal can magnify
beam image in one plane, e.g. transform the entrance values (x, x', y, y') into exit values (x, Nx'
+θ, y, y'). The magnification factor N can be as big as 2 or 10 or even 100, and serve the purpose
to increase strongly the overall angular resolution in x’. In the above examples, the overall
resolution was ~1 µrad defined by detector resolution. With magnification optics, the overall
inaccuracy in x’ would be effectively reduced by factor N, bringing it below 0.1 µrad rms.
Magnification is realized by making the shape of the crystal exit face different from the entry
face. With a magnification factor of 10, e.g., the entry angular opening of 50 µrad would
correspond to the exit opening of 500 µrad.
5. Conclusion
We have shown in simulations that crystal lattice can trap with 90% efficiency a beam with a
(x', y') distribution much broader than a critical angle θC. To achieve that, one has to match the
crystal focus length to the effective length between the particle source and the crystal in the
accelerator lattice. Crystal adaptation to accelerator lattice improved channeling efficiency up to
300-fold. Crystal can transmit the trapped particles in channeled states with the phase space (x,
x', y, y', E) distribution preserved with accuracy an order of magnitude finer than θC. Several
solutions were proposed and supported by simulations for achieving a fine resolution in crystal
transmission.
This may give a beam instrument for collision products in colliders. Usually, accelerator beam
instruments prepare particles for collision: by cooling them, bending, focusing, etc. Detectors
sort out the results of collision. We change this a bit by introducing crystal optics between the
collision point and detectors.
A crystal adapted to the LHC lattice can trap with 90% efficiency all protons emerging from
the IP with divergence of 150 µrad or ~100θC. The trapped protons can be channeled to detectors
with precision down to 0.1 µrad rms. This makes feasible a crystal application for the
measurement of diffractive scattering in CMS and ATLAS at the LHC. While we showed the
physical capabilities of crystal channeling, its actual application in the LHC environment has to
take into account many technical considerations to fit into existing infrastructure of accelerator
and detectors.
Crystal channeling of LHC forward protons can improve proton acceptance in momentum
loss ξ and four-momentum transfer t both in TOTEM and FP420 and allow to reach the smallest
possible value of the scattering angle [9]. Now the sensitive detector area starts at ~12-15σ from
the LHC beam [10]. Crystal can be placed at ~6σ from the LHC beam as it is very small, ~cm Si,
and does not provoke beam instability. Such a crystal can trap and deliver a very useful
information on most forward high momentum “quasi-elastic” and elastic protons at LHC,
unavailable otherwise.
There are practical benefits as well. Crystal would relax tough requirements on β* needed for
TOTEM. Crystal may allow TOTEM to run at the early start of the LHC, possibly running in
parallel to other experiments. Thanks to crystal, FP420 detectors could possibly reside out of the
cold region. The detectors don’t need to be edgeless. Crystal works best with low β*, where
FP420 is interested most. If detectors can be more distanced from the beam, background
conditions may improve. For injection, the active areas of the detectors must be kept away from
the beams and then moved back; instead, one can move a crystal. Crystal can be introduced to
experiment on a later stage in an attempt to expand the horizons of the physics program.
References
[1] D.S. Gemmel, Rev. Mod. Phys. 46, 1 (1974)
[2] E.N. Tsyganov, FNAL TM-682 (1976). A.S. Vodopianov et al., JETP Lett. 30, 474 (1979)
[3] V.M. Biryukov, Yu.A. Chesnokov and V.I. Kotov, Crystal Channeling and its Application at High
Energy Accelerators. Berlin: Springer (1997)
[4] M.B.H. Breese, Nucl. Instr. and Meth. B 132, 540 (1997)
[5] R.A. Carrigan et al., Phys. Rev. ST AB 5, 043501 (2002)
[6] A.G. Afonin et al., Nucl. Instr. and Meth. B 234, 14 (2005); Phys. Lett. B 435, 240 (1998); JETP
Lett. 67, 781 (1998)
[7] R.A. Carrigan et al., Phys. Rev. ST AB 1, 022801 (1998); V. Biryukov. Phys. Rev. E 52, 6818
(1995)
[8] R.P. Fliller et al. Phys. Rev. ST AB 9, 013501 (2006); Nucl. Instr. Meth. B 234, 47 (2005); AIP
Conf. Proc. 693, 192 (2004)
[9] S.P. Moller et al. Phys. Rev. A 64, 032902 (2001); S.P. Moller et al., Nucl. Instr. and Meth. B 84,
434 (1994); V. Biryukov, Nucl. Instr. and Meth. B 117, 357 (1996).
[10] E. Uggerhoj and U.I. Uggerhoj, Nucl. Instr. and Meth. B 234, 31 (2005); V.M. Biryukov et al.,
Nucl. Instr. and Meth. B 234, 23 (2005); arXiv:physics/0307027
[11] V.M. Biryukov and S. Bellucci, Nucl. Instr. and Meth. B 252, 7 (2006); arXiv:hep-ex/0504021
[12] K. Eggert and P. Grafstrom. Presented at CARE-HHH-APD Mini-Workshop on Crystal
Collimation (CC-2005), Geneva, 2005. M. Albrow. Talk given at CERN (2006).
[13] M. Albrow et al., CERN/LHCC 2006-039/G-124.
[14] V. Biryukov, Nucl. Instrum. and Meth. B 53, 202 (1991); A. Taratin et al., Nucl. Instrum. and
Meth. B 58, 103 (1991); V.M. Biryukov, Nucl. Instr. and Meth. B 117, 463 (1996)
[15] A. Baurichter et al. Nucl. Instr. and Meth. B 164-165, 27 (2000)
[16] V. Biryukov. Phys. Rev. E 51, 3522 (1995); CERN SL/Note 93-74 AP (1993).
[17] S. Strokov et al., submitted to J. Phys. Soc. Jap.
[18] V.I. Baranov et al., Nucl. Instr. and Meth. B 95, 449 (1995).
ABSTRACT
  We show that crystal can trap a broad (x, x', y, y', E) distribution of
particles and channel it preserved with a high precision. This sampled-and-hold
distribution can be steered by a bent crystal for analysis downstream. In
simulations for the 7 TeV Large Hadron Collider, a crystal adapted to the
accelerator lattice traps 90% of diffractively scattered protons emerging from
the interaction point with a divergence 100 times the critical angle. We set
the criterion for crystal adaptation improving efficiency ~100-fold. Proton
angles are preserved in crystal transmission with accuracy down to 0.1
microrad. This makes feasible a crystal application for measuring very forward
protons at the LHC.

<|endoftext|><|startoftext|>
IFIC/07-03
Probing non-standard neutrino interactions with supernova neutrinos
A. Esteban-Pretel, R. Tomàs and J. W. F. Valle1
1AHEP Group, Institut de F́ısica Corpuscular - C.S.I.C/Universitat de València
Edifici Instituts d’Investigació, Apt. 22085, E-46071 València, Spain
(Dated: November 4, 2018)
We analyze the possibility of probing non-standard neutrino interactions (NSI, for short) through
the detection of neutrinos produced in a future galactic supernova (SN). We consider the effect of
NSI on the neutrino propagation through the SN envelope within a three-neutrino framework, paying
special attention to the inclusion of NSI-induced resonant conversions, which may take place in the
most deleptonised inner layers. We study the possibility of detecting NSI effects in a Megaton water
Cherenkov detector, either through modulation effects in the ν̄e spectrum due to (i) the passage
of shock waves through the SN envelope, (ii) the time dependence of the electron fraction and (iii)
the Earth matter effects; or, finally, through the possible detectability of the neutronization νe
burst. We find that the ν̄e spectrum can exhibit dramatic features due to the internal NSI-induced
resonant conversion. This occurs for non-universal NSI strengths of a few %, and for very small
flavor-changing NSI above a few×10−5.
PACS numbers: 13.15.+g, 14.60.Lm, 14.60.Pq, 14.60.St, 97.60.Bw
I. INTRODUCTION
The very first data of the KamLAND collaboration [1]
have been enough to isolate neutrino oscillations as the
correct mechanism explaining the solar neutrino prob-
lem [2, 3], indicating also that large mixing angle (LMA)
was the right solution. The 766.3 ton-yr KamLAND data
sample further strengthens the validity of the LMA os-
cillation interpretation of the data [4].
Current data imply that neutrino have mass. For an
updated review of the current status of neutrino oscil-
lations see [5]. Theories of neutrino mass [6, 7] typ-
ically require that neutrinos have non-standard prop-
erties such as neutrino electromagnetic transition mo-
ments [8, 9, 10] or non-standard four-Fermi interactions
(NSI, for short) [11, 12, 13]. The expected magnitude of
the NSI effects is rather model-dependent.
Seesaw-type models lead to a non-trivial structure of
the lepton mixing matrix characterizing the charged and
neutral current weak interactions [6]. The NSI which
are induced by the charged and neutral current gauge
interactions may be sizeable [14, 15, 16, 17, 18]. Alter-
natively, non-standard neutrino interactions may arise in
models where neutrinos masses are radiatively “calcula-
ble” [19, 20]. Finally, in some supersymmetric unified
models, the strength of non-standard neutrino interac-
tions may arise from renormalization and/or threshold
effects [21].
We stress that non-standard interactions strengths are
highly model-dependent. In some models NSI strengths
are too small to be relevant for neutrino propagation,
because they are either suppressed by some large mass
scale or restricted by limits on neutrino masses, or both.
However, this need not be the case, and there are many
theoretically attractive scenarios where moderately large
NSI strengths are possible and consistent with the small-
ness of neutrino masses. In fact one can show that
NSI may exist even in the limit of massless neutri-
nos [14, 15, 16, 17, 18]. Such may also occur in the
context of fully unified models like SO(10) [22].
We argue that, in addition to the precision determi-
nation of the oscillation parameters, it is necessary to
test for sub-leading non-oscillation effects that could arise
from non-standard neutrino interactions. These are nat-
ural outcome of many neutrino mass models and can be of
two types: flavor-changing (FC) and non-universal (NU).
These are constrained by existing experiments (see be-
low) and, with neutrino experiments now entering a pre-
cision phase [23], an improved determination of neutrino
parameters and their theoretical impact constitute an im-
portant goal in astroparticle and high energy physics [5].
Here we concentrate on the impact of non-standard
http://arxiv.org/abs/0704.0032v1
neutrino interactions on supernova physics. We show
how complementary information on the NSI parame-
ters could be inferred from the detection of core-collapse
supernova neutrinos. The motivation for the study is
twofold. First, if a future SN event takes place in our
Galaxy the number of neutrino events expected in the
current or planned neutrino detectors would be enor-
mous, O(104 − 105) [24]. Moreover, the extreme con-
ditions under which neutrinos have to travel since they
are created in the SN core, in strongly deleptonised re-
gions at nuclear densities, until they reach the Earth,
lead to strong matter effects. In particular the effect of
small values of the NSI parameters can be dramatically
enhanced, possibly leading to observable consequences.
This paper is planned as follows. In Sec. II we summa-
rize the current observational bounds on the parameters
describing the NSI, including previous works on NSI in
SNe. In Sec. III we describe the neutrino propagation
formalism as well as the SN profiles which will be used.
In Sec. IV we analyze the effect of NSI on the ν propaga-
tion in the inner regions near the neutrinosphere and in
the outer regions of the SN envelope. In Sec. V we discuss
the possibility of using various observables to probe the
presence of NSI in the neutrino signal of a future galactic
SN. Finally in Sec. VI we present our conclusions.
II. PRELIMINARIES
A large class of non-standard interactions may be
parametrized with the effective low-energy four-fermion
operator:
LNSI = −εfPαβ 2
2GF (ν̄αγµLνβ)(f̄γ
µPf) , (1)
where P = L, R and f is a first generation fermion:
e, u, d. The coefficients ε
αβ denote the strength of the
NSI between the neutrinos of flavors α and β and the
P−handed component of the fermion f .
Current constraints on ε
αβ come from a variety of dif-
ferent sources, which we now briefly list.
A. Laboratory
Neutrino scattering experiments [25, 26, 27, 28,
29] provide the following bounds, |εfPµµ | . 10−3 −
10−2, |εfPee | . 10−1 − 1, |εfPµτ | . 0.05, |εfPeτ | . 0.5 at
90 % C.L [30, 31, 32]. On the other hand the analysis
of the e+e− → νν̄γ cross section measured at LEP II
leads to a bound on |εePττ | . 0.5 [33]. Future prospects
to improve the current limits imply the measurement of
sin2 ϑW leptonically in the scattering off electrons in the
target, as well as in neutrino deep inelastic scattering in
a future neutrino factory. The main improvement would
be in the case of |εfPee | and |εfPeτ |, where values as small
as 10−3 and 0.02, respectively, could be reached [31].
The search for flavor violating processes involving
charged leptons is expected to restrict corresponding neu-
trino interactions, to the extent that the SU(2) gauge
symmetry is assumed. However, this can at most give
indicative order-of-magnitude restrictions, since we know
SU(2) is not a good symmetry of nature. Using radiative
corrections it has been argued that, for example, µ − e
conversion on nuclei like in the case of µ−T i also con-
strains |εqPµe | . 7.7× 10−4 [31].
Non-standard interactions can also affect neutrino
propagation through matter, probed in current neutrino
oscillation experiments. The bounds so obtained apply to
the vector coupling constant of the NSI, ε
αβ = ε
since only this appears in neutrino propagation in mat-
ter [91].
B. Solar and reactor
The role of neutrino NSIs as subleading effects on the
solar neutrino oscillations and KamLAND has been re-
cently considered in Ref. [34, 35, 36] with the following
bounds at 90 % CL for ε ≡ − sinϑ23εdVeτ with the al-
lowed range −0.93 . ε . 0.30, while for the diagonal
term ε′ ≡ sin2 ϑ23εdVττ − εdVee , the only forbidden region is
[0.20, 0.78] [36]. Only in the ideal case of infinitely pre-
cise solar neutrino oscillation parameters determination,
the allowed range would “close from the left” for negative
NSI parameter values, at −0.6 for ε and −0.7 for ε′.
C. Atmospheric and accelerator neutrinos
Non-standard interactions involving muon neutrinos
can be constrained by atmospheric neutrino experiments
as well as accelerator neutrino oscillation searches at
K2K and MINOS. In Ref. [37] Super-Kamiokande and
MACRO observations of atmospheric neutrinos were con-
sidered in the framework of two neutrinos. The limits ob-
tained were −0.05 . εdVµτ < 0.04 and |εdVττ − εdVµµ | . 0.17
at 99 % CL. The same data set together with K2K were
recently considered in Refs. [38, 39] to study the nonstan-
dard neutrino interactions in a three generation scheme
under the assumption εeµ = εµµ = εµτ = 0. The al-
lowed region of εττ obtained for values of εeτ smaller
than O(10−1) becomes Σf=u,d,eεfVαβNf/Ne . 0.2 [39] ,
where Nf stands for the fermion number density.
D. Cosmology
If non-standard interactions with electrons were large
they might also lead to important cosmological and as-
trophysical implications. For instance, neutrinos could
be kept in thermal contact with electrons and positrons
longer than in the standard case, hence they would share
a larger fraction of the entropy release from e± annihi-
lations. This would affect the predicted features of the
cosmic background of neutrinos. As recently pointed out
in Ref [40] required couplings are, though, larger than
the current laboratory bounds.
E. NSI in Supernovae
According to the currently accepted supernova (SN)
paradigm, neutrinos are expected to play a crucial role
in SN dynamics. As a result, SN physics provides
a laboratory to probe neutrino properties. Moreover,
many future large neutrino detectors are currently be-
ing discussed [41]. The enormous number of events,
O(104 − 105) that would be “seen” in these detectors in-
dicates that a future SN in our Galaxy would provide a
very sensitive probe of non-standard neutrino interaction
effects.
The presence of NSI can lead to important conse-
quences for the SN neutrino physics both in the highly
dense core as well as in the envelope where neutrinos
basically freely stream.
The role of non-forward neutrino scattering processes
on heavy nuclei and free nucleons giving rise to flavor
change within the SN core has been recently analyzed in
Ref. [42, 43]. The main effect found was a reduction in
the core electron fraction Ye during core collapse. A lower
Ye would lead to a lower homologous core mass, a lower
shock energy, and a greater nuclear photon-disintegration
burden for the shock wave. By allowing a maximum
∆Ye = −0.02 it has been claimed that εeα . 10−3, where
α = µ, τ [43].
On the other hand it has been noted since long ago
that the existence of NSI plays an important role in the
propagation of SN neutrinos through the envelope lead-
ing to the possibility of a new resonant conversion. In
contrast to the well known MSW effect [44, 45] it would
take place even for massless neutrinos [13]. Two basic
ingredients are necessary: universal and flavor changing
NSI. In the original scheme neutrinos were mixed in the
leptonic charged current and universality was violated
thanks to the effect of mixing with heavy gauge singlet
leptons [6, 14]. Such resonance would induce strong neu-
trino flavor conversion both for neutrinos and antineutri-
nos simultaneously, possibly affecting the neutrino sig-
nal of the SN1987A as well as the possibility of having
r−process nucleosynthesis. This was first quantitatively
considered within a two-flavor νe−ντ scheme, and bounds
on the relevant NSI parameters were obtained using both
arguments [46].
One of the main features of the such “internal” or
“massless” resonant conversion mechanism is that it re-
quires the violation of universality, its position being
determined only by the matter chemical composition,
namely the value of the electron fraction Ye, and not by
the density. In view of the experimental upper bounds
on the NSI parameters such new resonance can only take
place in the inner layers of the supernova, near the neu-
trinosphere, where Ye takes its minimum values. In this
region the values of Ye are small enough to allow for
resonance conversions to take place in agreement with
existing bounds on the strengths of non-universal NSI
parameters.
The SN physics implications of another type of NSI
present in supersymmetric R-parity violating models
have also been studied in Ref. [47], again for a system
of two neutrinos. For definiteness NSI on d−quarks were
considered, in two cases: (i) massless neutrinos without
mixing in the presence of flavor-changing (FC) and non-
universal (NU) NSIs, and (ii) neutrinos with eV masses
and FC NSI. Different arguments have been used in
order to constrain the parameters describing the NSI,
namely, the SN1987A signal, the possibility to get suc-
cessful r−process nucleosynthesis, and the possible en-
hancement of the energy deposition behind the shock
wave to reactivate it.
On the other hand several subsequent articles [48, 49,
50] considered the effects of NSI on the neutrino propa-
gation in a three–neutrino mixing scenario for the case
Ye > 0.4, typical for the outer SN envelope. Together
with the assumption that εdVαβ . 10
−2 this prevents the
appearance of internal resonances in contrast to previous
references.
Motivated by supersymmetric theories without R par-
ity, in Ref. [48] the authors considered the effects of
small-strength NSI with d−quarks. Following the for-
malism developed in Refs. [51, 52] they studied the cor-
rections that such NSI would have on the expressions
for the survival probabilities in the standard resonances
MSW-H and MSW-L. A similar analysis was performed
in Ref. [49] assuming Z-induced NSI interactions orig-
inated by additional heavy neutrinos. A phenomeno-
logical generalization of these results was carried out in
Ref. [50]. The authors found an analytical compact ex-
pression for the survival probabilities in which the main
effects of the NSI can be embedded through shifts of the
mixing angles ϑ12 and ϑ13. In contrast to similar expres-
sions found previously these directly apply to all mixing
angles, and in the case with Earth matter effects. The
main phenomenological consequence was the identifica-
tion of a degeneracy between ϑ13 and εeα, similar to the
analogous “confusion” between ϑ13 and the correspond-
ing NSI parameter noted to exist in the context of long-
baseline neutrino oscillations [53, 54].
We have now re-considered the general three–neutrino
mixing scenario with NSI. In contrast to previous
work [48, 49, 50], we have not restricted ourselves to
large values of Ye, discussing also small values present
in the inner layers. This way our generalized descrip-
tion includes both the possibility of neutrinos having the
“massless” NSI-induced resonant conversions in the in-
ner layers of the SN envelope [13, 46, 47], as well as the
“outer” oscillation-induced conversions [48, 49, 50] [92].
III. NEUTRINO EVOLUTION
In this section we describe the main ingredients of our
analysis. Our emphasis will be on the use of astrophys-
ically realistic SN matter and Ye profiles, characterizing
its density and the matter composition. Their details,
in particular their time dependence, are crucial in deter-
mining the way the non-standard neutrino interactions
affect the propagation of neutrinos in the SN medium.
A. Evolution Equation
As discussed in Sec. II in an unpolarized medium the
neutrino propagation in matter will be affected by the
vector coupling constant of the NSI, ε
αβ = ε
αβ [93].
The way the neutral current NSI modifies the neu-
trino evolution will be parametrized phenomenologically
through the effective low-energy four-fermion operator
described in Eq. (1). We also assume ε
αβ ∈ ℜ, neglect-
ing possible CP violation in the new interactions.
Under these assumptions the Hamiltonian describing
the SN neutrino evolution in the presence of NSI can be
cast in the following form [94]
να = (Hkin +Hint)αβ νβ , (2)
where Hkin stands for the kinetic term
Hkin = U
U † , (3)
with M2 = diag(m21,m
3), and U the three-neutrino
lepton mixing matrix [6] in the PDG convention [55] and
with no CP phases.
The second term of the Hamiltonian accounts for the
interaction of neutrinos with matter and can be split into
two pieces,
Hint = H
int +H
int . (4)
The first term, Hstd
describes the standard interaction
with matter and can be written asHstd
= diag (VCC , 0, 0)
up to one loop corrections due to different masses of the
muon and tau leptons [56]. The standard matter poten-
tial for neutrinos is given by
VCC =
2GFNe = V0ρYe , (5)
where V0 ≈ 7.6×10−14 eV, the density is given in g/cm3,
and Ye stands for the relative number of electrons with
respect to baryons. For antineutrinos the potential is
identical but with the sign changed.
The term in the Hamiltonian describing the non-
standard neutrino interactions with a fermion f can be
expressed as,
(Hnsiint )αβ =
f=e,u,d
)αβ , (6)
with (V
)αβ ≡
2GFNfε
αβ. For definiteness and mo-
tivated by actual models, for example, those with broken
R parity supersymmetry we take for f the down-type
quark. However, an analogous treatment would apply
to the case of NSI on up-type quarks, the existence of
NSI with electrons brings no drastic qualitative differ-
ences with respect to the pure oscillation case (see be-
low). Therefore the NSI potential can be expressed as
follows,
(V dnsi)αβ = ε
αβV0ρ(2− Ye) . (7)
From now on we will not explicitely write the superindex
d. In order to further simplify the problem we will rede-
fine the diagonal NSI parameters so that εµµ = 0, as one
can easily see that subtracting a matrix proportional to
the identity leaves the physics involved in the neutrino
oscillation unaffected.
B. Supernova matter profiles
Neutrino propagation depends on the supernova mat-
ter and chemical profile through the effective potential.
This profile exhibits an important time dependence dur-
ing the explosion. Fig. 1 shows the density ρ(t, r) and the
electron fraction Ye(t, r) profiles for the SN progenitor as
well as at different times post-bounce.
Progenitor density profiles can be roughly
parametrized by a power-law function
ρ(r) = ρ0
, (8)
where ρ0 ∼ 104 g/cm3, R0 ∼ 109 cm, and n ∼ 3. The
electron fraction profile varies depending on the matter
composition of the different layers. For instance, typical
values of Ye between 0.42 and 0.45 in the inner regions
are found in stellar evolution simulations [57]. In the in-
termediate regions, where the MSW H and L-resonances
take place Ye ≈ 0.5. This value can further increase in
the most outer layers of the SN envelope due to the pres-
ence of hydrogen.
After the SN core bounce the matter profile is affected
in several ways. First note that a front shock wave starts
to propagate outwards and eventually ejects the SN enve-
lope. The evolution of the shock wave will strongly mod-
ify the density profile and therefore the neutrino propa-
gation [58, 59]. Following Ref. [60] we shall assume that
the structure of the shock wave is more complicated and
an additional “reverse wave” appears due to the collision
of the neutrino-driven wind and the slowly moving mate-
rial behind the forward shock, as seen in the upper panel
of Fig. 1 [95].
On the other hand, the electron fraction is also affected
by the time evolution as the SN explosion proceeds. Once
the collapse starts the core density grows so that the neu-
trinos become eventually effectively trapped within the
so called “neutrinosphere”. At this point the trapped
electron fraction has decreased until values of the order of
0.33 [61]. When the inner core reaches the nuclear density
it can not contract any further and bounces. As a con-
sequence a shock wave forms in the inner core and starts
propagating outwards. When the newly formed super-
nova shock reaches densities low enough for the initially
trapped neutrinos to begin streaming faster than the
shock propagates [62], a breakout pulse of νe is launched.
In the shock-heated matter, which is still rich of elec-
trons and completely disintegrated into free neutrons and
protons, a large number of νe are rapidly produced by
electron captures on protons. They follow the shock on
its way out until they are released in a very luminous
flash, the breakout burst, at about the moment when
the shock penetrates the neutrinosphere and the neutri-
nos can escape essentially unhindered. As a consequence,
the lepton number in the layer around the neutrinosphere
decreases strongly and the matter neutronizes [63]. The
value of Ye steadily decreases in these layers until val-
ues of the order of O(10−2). Outside the neutrinosphere
there is a steep rise until Ye ≈ 0.5. This is a robust
feature of the neutrino-driven baryonic wind. Neutrino
heating drives the wind mass loss and causes Ye to rise
within a few 10 km from low to high values, between 0.45
and 0.55 [64], see bottom panel of Fig. 1. Inspired in the
numerical results of Ref. [60] we have parametrized the
behavior of the electron fraction near the neutrinosphere
phenomenologically as,
Ye = a+ b arctan[(r − r0)/rs] , (9)
where a ≈ 0.23− 0.26 and b ≈ 0.16− 0.20. The param-
eters r0 and rs describe where the rise takes place and
how steep it is, respectively. As can be seen in Fig. 1
both decrease with time.
FIG. 1: Density (upper panel) and electron fraction (bottom
panel) profiles for the SN progenitor and at different instants
after the core bounce, from Ref. [60]. The regions where the
H (yellow) and the L (cyan) resonance take place are also
indicated, as well as the NSI-induced I (gray) resonance for
the parameters εee = 0, εττ . 0.07 and |εµτ | . 0.05
IV. THE TWO REGIMES
In order to study the neutrino propagation through
the SN envelope we will split the problem into two differ-
ent regions: the inner envelope, defined by the condition
VCC ≫ ∆m2atm/(2E) with ∆m2atm ≡ m23 − m22, and the
outer one, where ∆m2atm/(2E) & VCC . From the upper
panel of Fig. 1 one can see how the boundary roughly
varies between r ≈ 108 cm and 109 cm, depending on the
time considered. This way one can fully characterize all
resonances that can take place in the propagation of su-
pernova neutrinos, both the outer resonant conversions
related to neutrino masses and indicated as the upper
bands in Fig. 1, and the inner resonances that follow
from the presence of non-standard neutrino interactions,
indicated by the band at the bottom of the same figure.
Here we pay special attention to the use of realistic mat-
ter and chemical supernova profiles and three-neutrino
flavors thus generalising previous studies.
A. Neutrino Evolution in the Inner Regions
Let us first write the Hamiltonian in the inner layers,
where Hint ≫ Hkin. In this case the Hamiltonian can be
written as
H ≈ Hint = V0ρ(2− Ye)
+ εee εeµ εeτ
εeµ 0 εµτ
εeτ εµτ εττ
When the value of the εαβ is of the same order as the
electron fraction Ye internal resonances can arise [13].
Taking into account the current constraints on the ε’s
discussed in Sec. II one sees that small values of Ye are
required [46, 47]. As a result, these can only take place
in the most deleptonised inner layers, close to the neu-
trinosphere, where the kinetic terms of the Hamiltonian
are negligible.
Given the large number of free parameters εαβ in-
volved we consider one particular case where |εeµ| and
|εeτ | are small enough to neglect a possible initial mixing
between νe and νµ or ντ . Barring fine tuning, this basi-
cally amounts to |εeµ|, |εeτ | ≪ 10−2. According to the
discussion of Sec. II εeµ automatically satisfies the condi-
tion, whereas one expects that the window |εeτ | & 10−2
will eventually be probed in future experiments.
Since the initial fluxes of νµ and ντ are expected to be
basically identical, it is convenient to redefine the weak
basis by performing a rotation in the µ− τ sector:
= U(ϑ′23)
1 0 0
0 c23′ s23′
0 −s23′ c23′
where c23′ and s23′ stand for cos(ϑ
23) and sin(ϑ
23), re-
spectively. The angle ϑ′23 can be written as
tan(2ϑ′23) ≈
. (12)
The Hamiltonian becomes in the new basis
H ′αβ = U
†(ϑ′23)HαβU(ϑ
23) (13)
= V0ρ(2− Ye)
+ εee ε
ε′eµ ε
ε′eτ 0 ε
,(14)
where
ε′eµ = εeµc23′ − εeτs23′ (15)
ε′eτ = εeµs23′ + εeτ c23′ (16)
ε′µµ = (εττ −
ε2ττ + 4ε
µτ )/2 (17)
ε′ττ = (εττ +
ε2ττ + 4ε
µτ )/2 . (18)
With our initial assumptions on εeα one notices that
the new basis ν′α basically diagonalizes the Hamiltonian,
and therefore coincides roughly with the matter eigen-
state basis. A novel resonance can arise if the condition
H ′ee = H
ττ is satisfied, we call this I-resonance, I stand-
ing for “internal” [96]. The corresponding resonance con-
dition can be written as
Y Ie =
1 + εI
, (19)
where εI is defined as ε′ττ − εee. In Fig. 2 we represent
the range of εee and ε
ττ leading to the I-resonance for
an electron fraction profile between different Y mine ’s and
Y maxe = 0.5. It is important to notice that the value
of Y mine depends on time. Right before the collapse the
minimum value of the electron fraction is around 0.4.
Hence the window of NSI parameters that would lead
to a resonance would be relatively narrow, as indicated
by the shaded (yellow) band in Fig. 2. As time goes on
Y mine decreases to values of the order of a few %, and
as a result the region of parameters giving rise to the I-
resonance significantly widens. For example, in the range
|εee| ≤ 10−3 possibly accessible to future experiments one
sees that the I-resonance can take place for values of ε′ττ
of the order of O(10−2). This indicates that the potential
sensitivity on NSI parameters that can be achieved in su-
pernova studies is better than that of the current limits.
FIG. 2: Contours of Y Ie as function of εee and ε
ττ accord-
ing to Eq. (19) for different values of Ye. The region in yel-
low represents the region of parameters that gives rise to I-
resonance before the collapse. The arrows indicate how this
region widens with time.
As seen in Fig. 1 in order to fulfill the I-resonance con-
dition for such small values of the NSI parameters the
values of Ye must indeed lie, as already stated, in the
inner layers.
Several comments are in order: First, in contrast to
the standard H and L-resonances, related to the kinetic
term, the density itself does not explicitly enter into the
resonance condition, provided that the density is high
enough to neglect the kinetic terms. Analogously the en-
ergy plays no role in the resonance condition, which is
determined only by the electron fraction Ye. Moreover,
in contrast to the standard resonances, the I-resonance
occurs for both neutrinos and antineutrinos simultane-
ously [13]. Finally, as indicated in Fig. 3 the νe’s (ν̄e) are
not created as the heaviest (lightest) state but as the in-
termediate state, therefore the flavor composition of the
neutrinos arriving at the H-resonance is exactly the op-
posite of the case without NSI. As we show in Sec. V, this
fact can lead to important observational consequences.
In order to calculate the hopping probability between
matter eigenstates at the I-resonance we use the Landau-
ν m2 ν
FIG. 3: Level-crossing schemes, first panel is for the case of
normal hierarchy (oscillations only), the second includes the
NSI effect. The two lower panels correspond to the inverse hi-
erarchy, oscillations only and oscillations + NSI, respectively.
Zener approximation for two flavors
P ILZ ≈ e−
γI , (20)
where γI stands for the adiabaticity parameter, which
can be generally written as
Em2 − Em1
, (21)
where ϑ̇m ≡ dϑm/dr. If one applies this for-
mula to the e − τ ′ box of Eq. (14) assuming that
tan 2ϑmI = 2H
eτ/(H
ττ − Hee) and Em2 − Em1 =
(H ′ττ −Hee)2 + 4H ′eτ
one gets
4H ′2eτ
(Ḣ ′ττ − Ḣee)
16V0ρε
(1 + εI)3Ẏe
≈ 4× 109rs,5ρ11ε′2eτf(εI) , (22)
where the parametrization of the Ye profile has been de-
fined as in Eq. (9) with b = 0.16. The density ρ11 rep-
resents the density in units of 1011 g/cm3, rs,5 stands
for rs in units of 10
5 cm, and f(εI) is a function whose
value is of the order O(1) in the range of parameters we
are interested in. Taking all these factors into account
it follows that the internal resonance will be adiabatic
provided that ε′eτ & 10
−5, well below the current limits,
in full numerical agreement with, e. g., Ref. [47].
In Fig. 4 we show the resonance condition as well as
the adiabaticity in terms of εττ and εeτ assuming the
other εαβ = 0. In order to illustrate the dependence on
time we consider profiles inspired in the numerical profiles
of Fig. 1 at t = 2 s (upper panel) and 15.7 s (bottom
panel). For definiteness we take Y mine as the electron
fraction at which the density has value of 5× 1011g/cm3.
For comparison with Fig. 2 we have assumed Y mine =
10−2 in the case of 15.7 s. We observe how the border
of adiabaticity depends on εττ through the value of the
density at rI which in turn depends on time.
Before moving to the discussion of the outer resonances
a comment is in order, namely, how does the formalism
change for other non-standard interaction models. First
note that the whole treatment presented above also ap-
plies to the case of NSI on up-type quarks, except that
the position of the internal resonance shifts with respect
to the down-quark case. Indeed, in this case the NSI
potential
(V unsi)αβ = ε
αβV0ρ(1 + Ye) , (23)
 0.001  0.01  0.1  1
 0.001  0.01  0.1  1
FIG. 4: Contours of constant jump probability at the I-
resonance in terms of εττ and εeτ for two profiles correspond-
ing to Fig. 1 at 2 s with a = 0.235 and b = 0.175 (upper panel)
and 15.7 s with a = 0.26 and b = 0.195 (bottom panel). For
simplicity the other ε’s have been set to zero.
would induce a similar internal resonance for the condi-
tion Ye = ε
I/(1− εI).
In contrast, for the case of NSI with electrons, the
NSI potential is proportional to the electron fraction, and
therefore no internal resonance would appear.
B. Neutrino Evolution in the Outer Regions
In the outer layers of the SN envelope neutrinos can un-
dergo important flavor transitions at those points where
the matter induced potential equals the kinetic terms.
In absence of NSI this condition can be expressed as
VCC ≈ ∆m2/(2E). Neutrino oscillation experiments in-
dicate two mass scales, ∆m2atm and ∆m
⊙ ≡ m22−m21 [5],
hence two different resonance layers arise, the so-called
H-resonance and the L-resonance, respectively.
The presence of NSI with values of |εαβ| . 10−2 modi-
fies the properties of the H and L transitions [48, 49, 50].
In particular one finds that the effects of the NSI can be
described as in the standard case by embedding the ε’s
into effective mixing angles [50]. An analogous “confu-
sion” between sinϑ13 and the corresponding NSI param-
eter εeτ has been pointed out in the context of long-
baseline neutrino oscillations in Refs. [53, 54].
In this section we perform a more general and com-
plementary study for slightly higher values of the NSI
parameters: |εαβ | & few 10−2, still allowed by current
limits, and for which the I-resonance could occur.
The phenomenological assumption of hierarchical
squared mass differences, |∆m2atm| ≫ ∆m2⊙, allows, for
not too large ε’s, a factorization of the 3ν dynamics
into two 2ν subsystems roughly decoupled for the H
and L transitions [65]. To isolate the dynamics of the
H transition, one usually rotates the neutrino flavor ba-
sis by U †(ϑ23), and extracts the submatrix with indices
(1,3) [48, 50]. Whereas this method works perfectly for
small values of εαβ it can be dangerous for values above
10−2. In order to analyze how much our case deviates
from the simplest approximation we have performed a
rotation with the angle ϑ′′23 ≡ ϑ23 − α instead of just
ϑ23. By requiring that the new rotation diagonalizes the
submatrix (2,3) at the H-resonance layer one obtains the
following expression for the correction angle α
tan(2α) =
∆⊙s212s13 + V
ττ s223 − 2V NSIµτ c223
(∆atm +
∆⊙c212(−3 + c213)
+V NSIττ c223 + 2V
µτ s223
, (24)
where ∆atm ≡ ∆m2atm/(2E) and ∆⊙ ≡ ∆m2⊙/(2E). In
our notation sij and s2ij represent sinϑij and sin(2ϑij),
respectively. The parameters cij and c2ij are analogously
defined. In the absence of NSI α is just a small correction
to ϑ23 [97],
tan(2α) ≈ ∆⊙s212s13/∆atmc213 . O(10−3) . (25)
In order to calculate α we need to know the H-
resonance point. To calculate it one can proceed as in
the case without NSI, namely, make the ϑ′′23 rotation and
analyze the submatrix (1, 3). The new Hamiltonian H ′′αβ
has now the form
H ′′ee = V0ρ[Ye + εee(2− Ye)] + ∆atms213
+∆⊙(c
12 + s
13) ,
H ′′ττ = V0ρ(2− Ye)ε′′ττ +∆atmc213c2α
c213c
α + (sαc12 + cαs12s13)
H ′′eτ = V0ρ(2− Ye)ε′′eτ +
∆atms213cα
∆⊙(−c13sαs212 + c212cαs213) . (26)
We have defined ε′′ττ = εττc
23−α + εµτs223−α, and ε
εeτ c23−α+εeµs23−α, where s23−α ≡ sin(ϑ23−α), c23−α ≡
cos(ϑ23 − α), and s223−α ≡ sin(2ϑ23 − 2α), c223−α ≡
cos(2ϑ23 − 2α). The resonance condition for the H tran-
sition, H ′′ee = H
ττ can be then written as
H [Y He + (εee − ε′′ττ )(2− Y He )] = ∆atm(c213c2α − s213)
+∆⊙[c
13 − c2αs213)− s2αs212 + 12s2αs212s13] .(27)
It can be easily checked how in the limit of εαβ → 0 one
recovers the standard resonance condition,
HY He ≈ ∆atmc213 . (28)
In the region where the H-resonance occurs Y He ≈ 0.5.
Taking into account Eqs. (24) and (27) one can already
estimate how the value of α changes with the NSI param-
eters. In Fig. 5 we show the dependence of α on the εττ
after fixing the value of the other NSI parameters. One
can see how for εττ & 10
−2 the approximation of neglect-
ing α significantly worsens. Assuming ϑ23 = π/4 and a
fixed value of εµτ one can easily see that εττ basically
affects the numerator in Eq. (24). Therefore one expects
a rise of α as the value of εττ increases, as seen in Fig. 5.
The dependence of α on εµτ is correlated to the rela-
tive sign of the mass hierarchy and εµτ . For instance,
for normal mass hierarchy and positive values of εµτ the
dependence is inverse, namely, higher values of εµτ lead
to a suppression of α. Apart from this general behav-
ior, α also depends on the diagonal term εee as seen in
Fig. 5. This effect occurs by shifting the resonance point
through the resonance condition in Eq. (27).
One can now calculate the jump probability be-
tween matter eigenstates in analogy to the I-resonance
by means of the Landau-Zener approximation, see
Eqs. (20), (21), and 22,
PHLZ ≈ e−
γH , (29)
where γH represents the adiabaticity parameter at the
FIG. 5: Angle α as function of εττ for different values of εee
and εµτ , in the case of neutrinos of energy 10 MeV, with
normal mass hierarchy, and s213 = 10
−5. The other NSI pa-
rameters take the following values: εeµ = 0 and εeτ = 10
H-resonance, which can be written as
4H ′′2eτ
(Ḣ ′′ττ − Ḣ ′′ee)
, (30)
where the expressions for H ′′αβ are given in Eqs (26).
Let us first consider the case |εαβ| . 10−2. In this case
α ≈ 0 and one can rewrite the adiabaticity parameter as
∆atm sin
cos(2ϑ
)|d ln V/dr|rH
, (31)
where
= ϑ13 + ε
eτ (2− Ye)/Ye (32)
in agreement with Ref. [50]. For slightly larger ε’s there
can be significant differences. In Fig. 6 we show PHLZ in
the εeτ -εττ plane for antineutrinos with energy 10 MeV
in the case of inverse mass hierarchy, using Eq. (29) with
(upper panel) and without (bottom panel) the α cor-
rection. The values of ϑ13 and εeτ have been chosen so
that the jump probability lies in the transition regime be-
tween adiabatic and strongly non adiabatic. In the limit
of small εττ , α becomes negligible and therefore both re-
sults coincide. From Eq. (31) one sees how as the value of
εeτ increases γH gets larger and therefore the transition
becomes more and more adiabatic. For negative values
of εeτ there can be a cancellation between εeτ and ϑ13,
and as a result the transition becomes non-adiabatic.
An additional consequence of Eq. (32) is that a degen-
eracy between εeτ and ϑ13 arises. This is seen in Fig. 7,
which gives the contours of PH
in terms of εeτ and ϑ13
for εττ = 10
−4. One sees clearly that the same Landau-
Zener hopping probability is obtained for different com-
binations of εeτ and ϑ13. This leads to an intrinsic “con-
fusion” between the mixing angle and the corresponding
NSI parameter, which can not be disentangled only in
the context of SN neutrinos, as noted in Ref. [50].
We now turn to the case of |εττ | ≥ 10−2. As |εττ |
increases the role of α becomes relevant. Whereas in the
bottom panel PHLZ remains basically independent of εττ ,
one can see how in the upper panel PHLZ becomes strongly
sensitive to εττ for |εττ | ≥ 10−2.
One sees that for positive values of εττ it tends to adi-
abaticity whereas for negative values to non-adiabaticity.
This follows from the dependence of H ′′eτ on α, essen-
tially through the term −∆⊙c13sαs212, see Eq. (26). For
|εττ | ≥ 10−2 one sees that sinα starts being important,
and as a result this term eventually becomes of the same
order as the others in H ′′eτ . At this point the sign of
εττ , and so the sign of sinα, is crucial since it may con-
tribute to the enhancement or reduction of H ′′eτ . This
directly translates into a trend towards adiabaticity or
non-adiabaticity, seen in Fig. 6. Thus, for the range of
εττ relevant for the NSI-induced internal resonance the
adiabaticity of the outer H resonance can be affected in
a non-trivial way.
Turning to the case of the L transition a similar expres-
sion can be obtained by rotating the original Hamiltonian
by U(ϑ13)
†U(ϑ23)
† [48, 50]. However, in contrast to the
case of the H-resonance, where the mixing angle ϑ13 is
still unknown, in the case of the L transition the angle
ϑ12 has been shown by solar and reactor neutrino exper-
iments to be large [5]. As a result, for the mass scale ∆⊙
this transition will always be adiabatic irrespective of the
values of εαβ, and will affect only neutrinos.
FIG. 6: Landau-Zener jump probability isocontours at the H-
resonance in terms of εeτ and εττ for 10 MeV antineutrinos
in the case of inverted mass hierarchy. Upper panel: α given
by Eq. (24). Bottom panel: α set to zero. The remaining
parameters take the following values: sin2 ϑ13 = 10
, εeτ =
10−3, εee = εeµ = 0. See text.
V. OBSERVABLES AND SENSITIVITY
As mentioned in the introduction one of the major mo-
tivations to study NSI using the neutrinos emitted in a
SN is the enhancement of the NSI effects on the neutrino
propagation through the SN envelope due to the specific
extreme matter conditions that characterize it. In this
section we analyze how these effects translate into ob-
servable effects in the case of a future galactic SN.
Schematically, the neutrino emission by a SN can be di-
vided into four stages: Infall phase, neutronization burst,
accretion, and Kelvin-Helmholtz cooling phase. During
the infall phase and neutronization burst only νe’s are
emitted, while the bulk of neutrino emission is released
in all flavors in the last two phases. Whereas the neutrino
emission characteristics of the two initial stages are basi-
FIG. 7: Landau-Zener jump probability isocontours at the
H-resonance in terms of εeτ and ϑ13 for εττ = 10
−4. An-
tineutrinos with energy 10 MeV and inverted mass hierarchy
has been assumed.
cally independent of the features of the progenitor, such
as the core mass or equation of state (EoS), the details
of the neutrino spectra and luminosity during the ac-
cretion and cooling phases may significantly change for
different progenitor models. As a result, a straightfor-
ward extraction of oscillation parameters from the bulk
of the SN neutrino signal seems hopeless. Only features
in the detected neutrino spectra which are independent
of unknown SN parameters should be used in such an
analysis [66].
The question then arises as to how can one obtain in-
formation about the NSI parameters. Taking into ac-
count that the main effect of NSI is to generate new in-
ternal neutrino flavor transitions, one possibility is to in-
voke theoretical arguments that involve different aspects
of the SN internal dynamics.
In Ref. [47] it was argued that such an internal flavor
conversion during the first second after the core bounce
might play a positive role in the so-called SN shock re-
heating problem. It is observed in numerical simula-
tions [67, 68, 69, 70] that as the shock wave propa-
gates it loses energy until it gets stalled at a few hun-
dred km. It is currently believed that after neutrinos
escape the SN core they can to some extent deposit en-
ergy right behind and help the shock wave continue out-
wards. On the other hand it is also believed that due to
the composition in matter of the protoneutronstar (PNS)
the mean energies of the different neutrino spectra obey
〈Eνe〉 < 〈Eν̄e〉 < 〈Eνµ,ντ 〉. This means that a reso-
nant conversion between νe(ν̄e) and νµ,τ (ν̄µ,τ ) between
the neutrinosphere and the position of the stalled shock
wave would make the νe(ν̄e) spectra harder, and there-
fore the energy deposition would be larger, giving rise to
a shock wave regeneration effect.
Another argument used in the literature was the pos-
sibility that the r−process nucleosynthesis, responsible
for synthesizing about half of the heavy elements with
mass number A > 70 in nature, could occur in the region
above the neutrinosphere in SNe [71, 72]. A necessary
condition is Ye < 0.5 in the nucleosynthesis region. The
value of the electron fraction depends on the neutrino
absorption rates, which are determined in turn by the
νe(ν̄e) luminosities and energy distribution. These can
be altered by flavor conversion in the inner layers due to
the presence of NSI. Therefore by requiring the electron
fraction be below 0.5 one can get information about the
values of the NSI parameters.
While it is commonly accepted that neutrinos will play
a crucial role in both the shock wave re-heating as well
as the r−process nucleosynthesis, there are still other as-
trophysical factors that can affect both. While the issue
remains under debate we prefer to stick to arguments
directly related to physical observables in a large water
Cherenkov detector. There are several possibilities.
(A) the modulations in the ν̄e spectra due to the pas-
sage of shock waves through the supernova [58, 59,
(B) the modulation in the ν̄e spectra due to the time
dependence of the electron fraction, induced by the
I-resonance
(C) the modulations in the ν̄e spectra due to the Earth
matter [73, 74, 75, 76]
(D) detectability of the neutronization νe burst [77, 78]
Three of these observables, 1, 3 and 4 have already been
considered in the literature in the context of neutrino
oscillations. Here we discuss the potential of the above
promising observables in providing information about the
Scheme Hierarchy sin2 ϑ13 NSI Psurv P̄surv
A normal & 10−4 No 0 cos2 ϑ12
B inverted & 10−4 No sin2 ϑ12 0
C any . 10−6 No sin2 ϑ12 cos
AI normal & 10−4 Yes sin2 ϑ12 sin
BI inverted & 10−4 Yes cos2 ϑ12 cos
CIa normal . 10−6 Yes 0 sin2 ϑ12
CIb inverted . 10−6 Yes cos2 ϑ12 0
TABLE I: Definition of the neutrino schemes considered in
terms of the hierarchy, the value of ϑ13, and the presence
of NSI, as described in the text. The values of the survival
probabilities for νe (Psurv) and ν̄e (P̄surv) for each case are
also indicated.
NSI parameters. It is important to pay attention to the
possible ocurrence of the internal I-resonance and to its
effect in the external H and L-resonances. The first can
induce a genuinely new observable effect, item 2 above.
Here we concentrate on neutral current-type non-
standard interactions, hence there will be not effect in the
main reaction in water Cherenkov and scintillator detec-
tors, namely the inverse beta decay, ν̄e+p → e++n [98].
For definiteness we take NSI with d (down) quarks, in
which case the NSI effects will be confined to the neu-
trino evolution inside the SN and the Earth, through the
vector component of the interaction.
From all possible combinations of NSI parameters we
will concentrate on those for which the internal I tran-
sition does take place, namely |εI | & 10−2, see Fig. 2.
Concerning the FC NSI parameters we will consider |ε′eτ |
between few × 10−5 and 10−2, range in which the I-
resonance is adiabatic, see Fig. 4. In the following discus-
sion we will focus on the extreme cases defined in Table I.
One of the motivations for considering these cases is the
fact that the resonances involved become either adiabatic
or strongly non adiabatic, and hence the survival prob-
abilities in the absence of Earth effects or shock wave
passage, become energy independent. This assumption
simplifies the task of relating the observables with the
neutrino schemes.
A. Shock wave propagation
During approximately the first two seconds after the
core bounce, the neutrino survival probabilities are con-
stant in time and in energy for all cases mentioned in Ta-
ble I. Only the Earth effects could introduce an energy
dependence. However, at t ≈ 2 s the H-resonance layer is
reached by the outgoing shock wave, see Fig. 1. The way
the shock wave passage affects the neutrino propagation
strongly depends on the neutrino mixing scenario. In the
absence of NSI cases A and C will not show any evidence
of shock wave propagation in the observed ν̄e spectrum,
either because there is no resonance in the antineutrino
channel as in scenario A, or because the H-resonance is
always strongly non-adiabatic as in scenario C. How-
ever, in scenario B, the sudden change in density breaks
the adiabaticity of the resonance, leading to a time and
energy dependence of the electron antineutrino survival
probability P̄surv(E, t). In the upper panel of Fig. 8 we
show P̄surv(E, t) in the particular case that two shock
waves are present, one forward and a reverse one [60].
The presence of the shocks results in the appearance of
bumps in survival probability at those energies for which
the resonance region is passed by the shock waves. All
these structures move in time towards higher energies, as
the shock waves reach regions with lower density, leading
to observable consequences in the ν̄e spectrum.
We now turn to the case where NSI are present, which
opens the possibility of internal resonances. When such
I-resonance is adiabatic the situation will be similar to
the case without NSI. For normal mass hierarchy, AI and
CIa, ν̄e will not feel the H-resonance and therefore the
adiabaticity-breaking effect will not basically alter their
propagation. In contrast, for inverted mass hierarchy
and large ϑ13, case BI, the H-resonance occurs in the
antineutrino channel and therefore ν̄e will feel the shock
wave passage. However, in contrast to case B now ν̄e will
reach the H-resonance in a different matter eigenstate:
ν̄m1 instead of ν̄
3 , see Fig. 3. That means that before
the shock wave reaches the H-resonance the ν̄e survival
probability will be P̄surv ≈ cos2 ϑ12 ≈ 0.7. Once the
adiabaticity of the H-resonance is broken by the shock
wave then ν̄e will partly leave as ν̄
3 and therefore the
survival probability will decrease. As a consequence one
expects a pattern in time and energy for the survival
FIG. 8: Survival probability P̄surv(E, t) for ν̄e as function
of energy at different times averaged in energies with the en-
ergy resolution of Super-Kamiokande; for the profile shown in
Fig. 1. Upper panel: case B is assumed for sin2 ϑ13 = 10
Bottom panel: case BI , with εττ = 0.07, εeτ = 10
−4 and the
rest of NSI parameters put to zero.
probability in the case BI to be roughly opposite than in
the case B, see bottom panel of Fig. 8. The position of
the peaks and dips en each panel do not exactly coincide
as the value of εττ roughly shifts the position of the H-
resonance.
In the left panels of Fig. 9 we represent in light-shaded
(yellow) the range of εeτ and εττ for which this opposite
shock wave imprint would be observable. In the upper
panels we have assumed a minimum value of the electron
fraction of 0.06, based on the numerical profiles at t =
2 s of Fig. 1. In the bottom panels Y mine is set to 0.01,
inspired in the profiles at t = 15.7 s. It can be seen how as
time goes on the range of εττ ’s for which the I-resonance
takes place widens towards to smaller and smaller values.
This is a direct consequence of the steady deleptonization
of the inner layers.
For smaller ϑ13, case CIb, the situation is different.
Except for relatively large εeτ values theH-resonance will
be strongly non-adiabatic, as in case C. Therefore the
passage of the shock waves will not significantly change
the ν̄e survival probability and will not lead to any ob-
servable effect. In the right panels of Fig. 9 we show
the same as in the left panels but for sin2 ϑ13 = 10
Whereas for large values of ϑ13, left panels, the H-
resonance is always adiabatic and one has only to ensure
the adiabaticity of the I-resonance, for smaller values of
ϑ13 the adiabaticity of the H-resonance strongly depends
on the values of εeτ and εττ , as discussed in Sec. IVB.
This can be seen as a significant reduction of the yel-
low area. Only large values of either εeτ or εττ would
still allow for a clear identification of the opposite shock
wave effects. In dark-shaded (cyan) we show the region
of parameters for which PH lies in the transition region
between adiabatic and strongly non-adiabatic, and there-
fore could still lead to some effect.
A useful observable to detect effects of the shock prop-
agation is the average of the measured positron energies,
〈Ee〉, produced in inverse beta decays. In Fig. 10, we
show 〈Ee〉 together with the one sigma errors expected for
a Megaton water Cherenkov detector and a SN at 10 kpc
distance, with a time binning of 0.5 s, for different neu-
trino schemes: caseB and caseBI with different values of
εττ . For the neutrino fluxes we assumed the parametriza-
tion given by Refs. [79, 80] with 〈E0(ν̄e)〉 = 15 MeV and
〈E0(ν̄µ,τ )〉 = 18 MeV and the following ratio of the total
neutrino fluxes Φ0(ν̄e)/Φ0(ν̄µ,τ ) = 0.8 [99].
One can see how the features of the average positron
energy are a direct consequence of the shape of the sur-
vival probability, where dips have to be translated into
bumps and vice-versa.
Thus, it is important to stress that whereas in case
B one expects the presence of one or two dips (depend-
ing on the structure of the shock wave, see Ref [60]), or
nothing in the other cases, one or two bumps are ex-
pected in case BI, as seen in the upper left panel of
Fig. 10. As discussed in Ref. [60] the details of the
dips/bump will depend on the exact shape of the neu-
trino fluxes, but as long as general reasonable assump-
tions like 〈Eν̄e〉 . 〈Eν̄µ,τ 〉 are considered the dips/bumps
should be observed.
B. Time variation of Ye
We have just seen how the distorsion of the density
profile due to the shock wave passage through the outer
FIG. 9: Range of εττ and εeτ for which the effect of the shock
wave will be observed. In the upper panels a minimum value
of Y mine = 0.06 based on the numerical profiles at t = 2 s
has been assumed, see Fig. 1. In the lower panels we have
considered a case with Y mine = 0.01 inspired in the profile
at t = 15.7 s. The value of sin2 ϑ13 has been assumed to
be 10−2 and 10−7 in the left and right panels, respectively.
We have also superimposed isocontours of constant hopping
probability 0.1 (blue) and 0.9 (red) in the I (solid lines) and
H (dashed lines) resonances for inverted mass hierarchy and
E = 10 MeV and antineutrinos. The area in yellow represents
the parameter space where both resonances will be adiabatic.
In the cyan area the I-resonance is assumed to be adiabatic
whereas H lies in the transition region.
SN envelope can induce a time-dependent modulation in
the ν̄e spectrum in cases B and BI. However the time
dependence of the electron fraction Ye can also reveal the
presence of NSI leaving a clear imprint in the observed
ν̄e spectrum, as we now explain.
As discussed in Sec. IVA the region of NSI parame-
ters leading to I-resonance is basically determined by the
minimum and maximum values of the electron fraction,
Y mine and Y
e . The crucial point is that as the delep-
tonization of the proto-neutron star goes on, the value of
Y mine steadily decreases with time. As a result, the range
of NSI strengths for which the I-resonance takes place
FIG. 10: The average energy of ν̄p → ne+ events binned in
time for case B (dashed blue) and BI (solid red). In each
panel different values of εττ have been assumed. The error
bars represent 1 σ errors in any bin. εeτ = 10
increases with time, as can be seen in Fig. 2.
Let us first discuss the observational consequences of
the time dependence of the electron fraction in case BI.
If εττ (ε
I in general) is large enough the I-resonance will
take place right after the core bounce. In this case, as
seen in the upper left panel of Fig. 10 the two bumps we
have just discussed in Sec. VA would be clearly observed.
However for smaller NSI parameter values it could hap-
pen that the I-resonance occurs only after several sec-
onds. In particular for the specific Ye profile considered
we show how this delay could be of roughly 2, 4 or 9 sec
for values of εττ of 0.025, 0.02 or 0.015, respectively, see
last three panels Fig. 10. As can be inferred from the
figure this delay effect can lead to misidentification of
the pure NSI effect. So, for instance, in the upper right
panel, one sees how the two bumps might also be inter-
preted as two dips, given the astrophysical uncertainties.
This subtle degeneracy can only be solved by extra in-
formation on, for example, the time dependence of the
spectra or the velocity of the shock wave. Given the su-
pernova model, however, the time structure of the signal
could eventually not only point out the presence of NSI
but even potentially indicate a range of NSI parameters.
Let us now turn to the normal mass hierarchy sce-
nario (cases AI and CIa). In analogy to the BI case,
if εI is relatively large the onset of the I-resonance will
take place early on. As can be inferred from Fig. 3 that
implies that ν̄e will escape the SN as ν̄2. For smaller
values, though, it may happen that the I-resonance be-
comes effective only after a few seconds. This means
that during the first seconds of the neutrino signal ν̄e
would leave the star as ν̄1 (cases A and C). Then,
after some point, the electron fraction would be low
enough to switch on the I-resonance, and consequently
ν̄e would enter the Earth as ν̄2. This would result in
a transition in the electron antineutrino survival prob-
ability from P̄surv ≈ cos2 ϑ12 = 0.7 to sin2 ϑ12 = 0.3.
Given the expected hierarchy in the average neutrino en-
ergies 〈Eν̄e〉 . 〈Eν̄µ,τ 〉, it follows that the change in Ye
would lead to a hardening of the observed positron spec-
trum. The effect is quantified in Fig. 11 for different
values of εττ . The figure shows the average energy of
the ν̄p → ne+ events for the case of a Megaton water
Cherenkov detector exactly as in Fig. 10, but for scenar-
ios AI and CIa. One can see how for εττ = 0.07 the I-
resonance condition is always fulfilled and therefore there
is no time dependence. However for smaller values one
can see a rise at a certain moment which depends on the
magnitude of εττ . A similar effect would occur in case
C. Earth matter effects
Before the shock wave reaches the H-resonance layer
the dependence of the neutrino survival probability in the
cases we are considering, on the neutrino energy E is very
weak. However, if neutrinos cross the Earth before reach-
ing the detector, the conversion probabilities may become
energy-dependent, inducing modulations in the neutrino
energy spectrum. These modulations may be observed
in the form of local peaks and valleys in the spectrum of
the event rate σFDν̄e plotted as a function of 1/E. These
modulations arise in the antineutrino channel only when
ν̄e leave the SN as ν̄1 or ν̄2. In the absence of NSI this
happens in cases A and C, where ν̄e leave the star as ν̄1.
FIG. 11: The average energy of ν̄p → ne+ events binned in
time for case AI and CIa and different values of εττ . The
error bars represent 1 σ errors in any bin. εeτ = 10
In the presence of NSI ν̄e will arrive at the Earth as ν̄1
in cases BI, and as ν̄2 in case AI and CIa. Therefore
its observation would exclude cases B and CIb. This
distortion in the spectra could be measured by compar-
ing the neutrino signal at two or more different detectors
such that the neutrinos travel different distances through
the Earth before reaching them [73, 74]. However these
Earth matter effects can be also identified in a single de-
tector [75, 76].
By analyzing the power spectrum of the detected neu-
trino events one can identify the presence of peaks located
at the frequencies characterizing the modulation. These
do not dependend on the primary neutrino spectra, and
can be determined to a good accuracy from the knowl-
edge of the solar oscillation parameters, the Earth matter
density, and the position of the SN in the sky [76]. The
latter can be determined with sufficient precision even if
the SN is optically obscured using the pointing capability
of water Cherenkov neutrino detectors [81].
This method turns out to be powerful in detecting the
modulations in the spectra due to Earth matter effects,
and thus in ruling out cases B and CIb. However, the po-
sition of the peaks does not depend on how ν̄e enters the
Earth, as ν̄1 or ν̄2. Hence it is not useful to discriminate
case AI and CIa from the cases A, C, and BI.
The time dependence of Ye, however, can transform
case B into BI, and C with inverse hierarchy into CIb,
leading respectively to an appearance and disappearance
of these Earth matter effects. In case BI the presence of
the shock wave modulation can spoil a clear identification
of the Earth matter effects. Nevertheless, the disappear-
ance of the Earth matter effects in the transition from
case C to CIb allows us to pin down case CIb.
D. Neutronization burst
The prompt neutronization burst takes place during
the first ∼ 25 ms after the core bounce with a typical
full width half maximum of 5–7ms and a peak luminos-
ity of 3.3–3.5×1053 erg s−1. The striking similarity of the
neutrino emission characteristics despite the variability
in the properties of the pre-collapse cores is caused by
a regulation mechanism between electron number frac-
tion and target abundances for electron capture. This
effectively establishes similar electron fractions in the in-
ner core during collapse, leading to a convergence of the
structure of the central part of the collapsing cores, with
only small differences in the evolution of different pro-
genitors until shock breakout [77, 78].
Taking into account that the SN will be likely to be
obscured by dust and a good estimation of the distance
will not be possible, the time structure of the detected
neutrino signal should be used as signature for the neu-
tronization burst. In Ref. [78] it was shown that such a
time structure can be in principle cleanly seen in the
case of a Megaton water Cherenkov detector. It was
also shown how the time evolution of the signal depends
strongly on the neutrino mixing scheme. In the absence
of NSI the νe peak could be observed provided that the
νe survival probability Pνeνe is not zero. As can be seen
in Table I this happens for cases B and C. However
for case A (normal mass hierarchy and “large” ϑ13), νe
leaves the SN as ν3. This leads to a survival probability
Pνeνe ≈ sin2 ϑ13 . 10−1, and therefore the peak remains
hidden.
Let us now consider the situation where NSI are
prensent. For normal mass hierarchy νe, which is born
as νm2 passes through three different resonances, I, H
and L. Whereas I and L will be adiabatic, the fate of
H will depend on the value of ϑ13. For “large” values,
case AI, the H-resonance will also be adiabatic. This
implies that νe’s will leave as ν2, the survival probability
will be Pνeνe ≈ sin2 ϑ12 ≈ 0.3, and therefore the peak
will be seen, as in cases B and C. If ϑ13 happens to
be very small, case CIa, then H will be strongly non-
adiabatic and therefore νe will leave the star as ν3. As a
consequence the neutronization peak will not be seen.
For inverse mass hierarchy, νe is born as ν
1 and tra-
verses adiabatically I and L. This implies that they will
leave the star as ν1 and therefore the peak will also be
observed. However now the survival probability will be
larger, Pνeνe ≈ cos2 ϑ12 ≈ 0.7. Thus for a given known
normalization, i.e. the distance to the SN, one expects a
larger number of events during the neutronization peak
in this case. In Fig. 12 we show the expected number
of events per time bin in a water Cherenkov detector in
the case of a SN exploding at 10 kpc, for two different
neutrino schemes, C and BI, and for different SN pro-
genitor masses. One can see how the difference due to
the larger survival probability is bigger than the typi-
cal error bars, associated to the lack of knowledge of the
progenitor mass.
Two comments are in order. The neutronization νe
burst takes place during the first milliseconds, before
strong deleptonization takes place. As a result, in con-
trast to other observables we have considered in this pa-
per, here the I-resonance will only occur for εI & 10−1.
On the other hand in the presence of additional NSI with
electrons this would significantly affect the ν − e cross
sections, and consequently the results presented here.
VI. SUMMARY
We have analyzed the possibility of observing clear sig-
natures of non-standard neutrino interactions from the
detection of neutrinos produced in a future galactic su-
pernova.
In Secs. III and IV we have re-considered effect of ν−d
non-standard interactions on the neutrino propagation
through the SN envelope within a three-neutrino frame-
work. In contrast to previous works we have analyzed
the neutrino evolution in both the more deleptonized in-
FIG. 12: Number of events from the elastic scattering on elec-
trons, per time bin in a Megaton water Cherenkov detector for
a SN at 10 kpc for cases C (dashed lines) and BI (solid lines).
Different progenitor masses have been assumed: 13 M⊙ (n13)
in red, 15 M⊙ (s15s7b2) in black, and 25 M⊙ (s25a28) in blue.
1-sigma errors are also shown for the 15 M⊙ case.
ner layers and the outer regions of the SN envelope. We
have also taken into account the time dependence of the
SN density and electron fraction profiles.
First we have found that the small values of the elec-
tron fraction typical of the former allows for internal NSI-
induced resonant conversions, in addition to the standard
MSW-H and MSW-L resonances of the outer envelope.
These new flavor conversions take place for a relatively
large range of NSI parameters, namely |εαα| between
10−2 − 10−1, and |εeτ | & few × 10−5, currently allowed
by experiment. For this range of strengths, in particu-
lar εττ , non-standard interactions can significantly affect
the adiabaticity of the H-resonance. On the other hand
the NSI-induced resonant conversions may also lead to
the modulation of the ν̄e spectra as a result of the time
dependence of the electron fraction.
In Sec. V we have studied the possibility of detecting
NSI effects in a Megaton water Cherenkov detector us-
ing the modulation effects in the ν̄e spectrum due to (i)
the passage of shock waves through the SN envelope, (ii)
the time dependence of the electron fraction and (iii) the
Earth matter effects; and, finally, through the possible
detectability of the neutronization νe burst. Note that
observable (ii) turns out to be complementary to the ob-
servation of the shock wave passage, (i), and offers the
possibility to probe NSI effects also for normal hierarchy
neutrino spectra.
In Table II we summarize the results obtained for dif-
ferent neutrino schemes. We have found that observable
(i) can clearly indicate the existence of NSI in the case
of inverse mass hierarchy and large ϑ13 (case BI). On
the other hand, observable (ii) allows for an identification
of NSI effects in the other cases, normal mass hierarchy
(cases AI and CIa) and inverse mass hierarchy and small
ϑ13 (case CIb). Therefore a positive signal of either ob-
servable (i) or (ii) would establish the existence of NSI. In
the latter case this would, however, leave a degeneracy
among cases AI, CIa, and CIb. Such degeneracy can
be broken with the help of observables (iii) and the ob-
servation of the neutronization νe burst. The detection
of Earth matter effects during the whole supernova neu-
trino signal would rule out case CIb since, as discussed in
Sec. VC, a disappearance of Earth matter effects would
take place due to a transition from C to CIb. Finally,
the (non) observation of the neutronization burst can be
used to distinguish between cases AI and CIa.
Similarly, other degeneracies in Table II may be lifted
by suitably combining different observables. For exam-
ple, a negative of observable (ii) could mean either neg-
ligible NSI strengths or (NU) NSI parameter values so
large that the internal resonance is always present. In
this case one could use the observation of the neutron-
ization burst in order to establish the presence of NSI for
the case of inverse mass hierarchy. In addition the ob-
servation of the shock wave imprint in the ν̄e spectrum
would provide additional information on ϑ13.
In conclusion, by suitably combining all observables
one may establish not only the presence of NSI, but also
the mass hierarchy and probe the magnitude of ϑ13.
Acknowledgments
The authors wish to thank H-Th. Janka, O. Miranda,
S. Pastor, Th. Schwetz, and M. Tórtola for fruitful discus-
sions. Work supported by the Spanish grant FPA2005-
Scheme Hierarchy sin2 ϑ13 NSI shock Ye Earth νe burst
A normal & 10−4 No No No Yes No
B inverted & 10−4 No Yes No No Yes
C any . 10−6 No No No Yes Yes
AI normal & 10−4 Yes No Yes Yes Yes
BI inverted & 10−4 Yes Yes⋆ No Yes Yes⋆
CIa normal . 10−6 Yes No Yes Yes No
CIb inverted . 10−6 Yes No Yes No Yes⋆
TABLE II: Expectations for the observables discussed in the
text: modulation of the ν̄e spectrum due to the shock wave
passage, the time variation of Ye, the Earth effect, and the
observation of the νe burst within various neutrino schemes.
Asterisks indicate that the effect differs from that expected
in the absence of NSI. See text.
01269 and European Network of Theoretical Astroparti-
cle Physics ILIAS/N6 under contract number RII3-CT-
2004-506222. A. E. has been supported by a FPU grant
from the Spanish Government. R. T. has been supported
by the Juan de la Cierva program from the Spanish Gov-
ernment and by an ERG from the European Commission.
References
[1] KamLAND collaboration, K. Eguchi et al., Phys. Rev.
Lett. 90, 021802 (2003), [hep-ex/0212021].
[2] S. Pakvasa and J. W. F. Valle, hep-ph/0301061, Proc. of
the Indian National Academy of Sciences on Neutrinos,
Vol. 70A, No.1, p.189 - 222 (2004), Eds. D. Indumathi,
M.V.N. Murthy and G. Rajasekaran.
[3] V. Barger, D. Marfatia and K. Whisnant,
hep-ph/0308123.
[4] KamLAND collaboration, T. Araki et al., Phys. Rev.
Lett. 94, 081801 (2004).
[5] M. Maltoni, T. Schwetz, M. A. Tortola and J. W. F.
Valle, New J. Phys. 6, 122 (2004), Appendix C in
hep-ph/0405172 (v5) provides updated neutrino oscilla-
tion results taking into account new SSM, new SNO salt
data, latest K2K and MINOS data; previous works by
other groups are referenced therein.
[6] J. Schechter and J. W. F. Valle, Phys. Rev. D22, 2227
(1980).
[7] J. W. F. Valle, J. Phys. Conf. Ser. 53, 473 (2006),
[hep-ph/0608101], Review based on lectures at the Corfu
Summer Institute on Elementary Particle Physics in
September 2005.
[8] J. Schechter and J. W. F. Valle, Phys. Rev. D24, 1883
(1981), Err. D25, 283 (1982).
[9] C.-S. Lim and W. J. Marciano, Phys. Rev. D37, 1368
(1988).
[10] E. K. Akhmedov, Phys. Lett. B213, 64 (1988).
[11] L. Wolfenstein, Phys. Rev. D17, 2369 (1978).
[12] Mikheev, S. P. and Smirnov, A. Yu., (Editions Frontières,
Gif-sur-Yvette, 1986, p.355.), 86 Massive Neutrinos in
Astrophysics and Particle Physics, Proceedings of the
Sixth Moriond Workshop, ed. by Fackler, O. and Tran
Thanh Van, J.
[13] J. W. F. Valle, Phys. Lett. B199, 432 (1987).
[14] R. N. Mohapatra and J. W. F. Valle, Phys. Rev. D34,
1642 (1986).
[15] J. Bernabeu et al., Phys. Lett. B187, 303 (1987).
[16] G. C. Branco, M. N. Rebelo and J. W. F. Valle, Phys.
Lett. B225, 385 (1989).
[17] N. Rius and J. W. F. Valle, Phys. Lett.B246, 249 (1990).
[18] F. Deppisch and J. W. F. Valle, Phys. Rev. D72, 036001
(2005), [hep-ph/0406040].
[19] A. Zee, Phys. Lett. B93, 389 (1980).
[20] K. S. Babu, Phys. Lett. B203, 132 (1988).
[21] L. J. Hall, V. A. Kostelecky and S. Raby, Nucl. Phys.
B267, 415 (1986).
[22] M. Malinsky, J. C. Romao and J. W. F. Valle, Phys.
Rev. Lett. 95, 161801 (2005), [hep-ph/0506296].
[23] A. B. McDonald, astro-ph/0406253.
[24] K. Scholberg, astro-ph/0701081.
[25] LSND, L. B. Auerbach et al., Phys. Rev. D63, 112001
(2001), [hep-ex/0101039].
[26] MUNU, Z. Daraktchieva et al., Phys. Lett. B564, 190
(2003), [hep-ex/0304011].
[27] CHARM, J. Dorenbosch et al., Phys. Lett. B180, 303
(1986).
[28] CHARM-II, P. Vilain et al., Phys. Lett. B335, 246
(1994).
[29] NuTeV, G. P. Zeller et al., Phys. Rev. Lett. 88, 091802
(2002), [hep-ex/0110059].
[30] V. D. Barger, R. J. N. Phillips and K. Whisnant, Phys.
Rev. D44, 1629 (1991).
[31] S. Davidson, C. Pena-Garay, N. Rius and A. Santamaria,
JHEP 03, 011 (2003), [hep-ph/0302093].
http://arxiv.org/abs/hep-ex/0212021
http://arxiv.org/abs/hep-ph/0301061
http://arxiv.org/abs/hep-ph/0308123
http://arxiv.org/abs/hep-ph/0405172
http://arxiv.org/abs/hep-ph/0608101
http://arxiv.org/abs/hep-ph/0406040
http://arxiv.org/abs/hep-ph/0506296
http://arxiv.org/abs/astro-ph/0406253
http://arxiv.org/abs/astro-ph/0701081
http://arxiv.org/abs/hep-ex/0101039
http://arxiv.org/abs/hep-ex/0304011
http://arxiv.org/abs/hep-ex/0110059
http://arxiv.org/abs/hep-ph/0302093
[32] J. Barranco, O. G. Miranda, C. A. Moura and J. W. F.
Valle, Phys. Rev.D73, 113001 (2006), [hep-ph/0512195].
[33] Z. Berezhiani and A. Rossi, Phys. Lett. B535, 207
(2002), [hep-ph/0111137].
[34] A. Friedland, C. Lunardini and C. Pena-Garay, Phys.
Lett. B594, 347 (2004), [hep-ph/0402266].
[35] M. M. Guzzo, P. C. de Holanda and O. L. G. Peres, Phys.
Lett. B591, 1 (2004), [hep-ph/0403134].
[36] O. G. Miranda, M. A. Tortola and J. W. F. Valle, JHEP
10, 008 (2006), [hep-ph/0406280].
[37] N. Fornengo et al., Phys. Rev. D65, 013010 (2002),
[hep-ph/0108043].
[38] A. Friedland, C. Lunardini and M. Maltoni, Phys. Rev.
D70, 111301 (2004), [hep-ph/0408264].
[39] A. Friedland and C. Lunardini, Phys. Rev. D72, 053009
(2005), [hep-ph/0506143].
[40] G. Mangano et al., Nucl. Phys. B756, 100 (2006),
[hep-ph/0607267].
[41] S. K. Katsanevas, talk at Workshop on Neutrino Oscil-
lation Physics (NOW 2006), Otranto, Lecce, Italy, 9-16
Sep 2006.
[42] P. S. Amanik, G. M. Fuller and B. Grinstein, Astropart.
Phys. 24, 160 (2005), [hep-ph/0407130].
[43] P. S. Amanik and G. M. Fuller, astro-ph/0606607.
[44] S. P. Mikheev and A. Y. Smirnov, Sov. J. Nucl. Phys.
42, 913 (1985).
[45] S. P. Mikheev and A. Y. Smirnov, Nuovo Cim. C9, 17
(1986).
[46] H. Nunokawa, Y. Z. Qian, A. Rossi and J. W. F. Valle,
Phys. Rev. D54, 4356 (1996), [hep-ph/9605301].
[47] H. Nunokawa, A. Rossi and J. W. F. Valle, Nucl. Phys.
B482, 481 (1996), [hep-ph/9606445].
[48] S. Mansour and T.-K. Kuo, Phys. Rev. D58, 013012
(1998), [hep-ph/9711424].
[49] S. Bergmann and A. Kagan, Nucl. Phys. B538, 368
(1999), [hep-ph/9803305].
[50] G. L. Fogli, E. Lisi, A. Mirizzi and D. Montanino, Phys.
Rev. D66, 013009 (2002), [hep-ph/0202269].
[51] T.-K. Kuo and J. T. Pantaleone, Phys. Rev. D37, 298
(1988).
[52] S. Bergmann, Nucl. Phys. B515, 363 (1998),
[hep-ph/9707398].
[53] P. Huber, T. Schwetz and J. W. F. Valle, Phys. Rev.
Lett. 88, 101804 (2002), [hep-ph/0111224].
[54] P. Huber, T. Schwetz and J. W. F. Valle, Phys. Rev.
D66, 013006 (2002), [hep-ph/0202048].
[55] Particle Data Group, W. M. Yao et al., J. Phys. G33, 1
(2006).
[56] F. J. Botella, C. S. Lim and W. J. Marciano, Phys. Rev.
D35, 896 (1987).
[57] S. E. Woosley, A. Heger and T. A. Weaver, Reviews of
Modern Physics 74, 1015 (2002).
[58] R. C. Schirato, G. M. Fuller, . U. . LANL), UCSD and
LANL), astro-ph/0205390.
[59] G. L. Fogli, E. Lisi, D. Montanino and A. Mirizzi, Phys.
Rev. D68, 033005 (2003), [hep-ph/0304056].
[60] R. Tomas et al., JCAP 0409, 015 (2004),
[astro-ph/0407132].
[61] C. Y. Cardall, astro-ph/0701831.
[62] H. A. Bethe, J. H. Applegate and G. E. Brown, Astro-
phys. J. 241, 343 (1980).
[63] A. Burrows and T. J. Mazurek. Astrophys. J. 259, 330
(1982).
[64] H. Th. Janka, private communication.
[65] T.-K. Kuo and J. T. Pantaleone, Rev. Mod. Phys. 61,
937 (1989).
[66] M. Kachelriess and R. Tomas, hep-ph/0412100.
[67] M. Liebendoerfer et al., Phys. Rev. D63, 103004 (2001),
[astro-ph/0006418].
[68] M. Rampp and H. T. Janka, Astron. Astrophys. 396,
361 (2002), [astro-ph/0203101].
[69] T. A. Thompson, A. Burrows and P. A. Pinto, Astrophys.
J. 592, 434 (2003), [astro-ph/0211194].
[70] K. Sumiyoshi et al., Astrophys. J. 629, 922 (2005),
[astro-ph/0506620].
[71] Y.-Z. Qian, Prog. Part. Nucl. Phys. 50, 153 (2003),
[astro-ph/0301422].
[72] J. Pruet, S. E. Woosley, R. Buras, H.-T. Janka and
R. D. Hoffman, Astrophys. J. 623, 325 (2005),
[astro-ph/0409446].
[73] C. Lunardini and A. Y. Smirnov, Nucl. Phys. B616, 307
(2001), [hep-ph/0106149].
[74] A. S. Dighe, M. T. Keil and G. G. Raffelt, JCAP 0306,
005 (2003), [hep-ph/0303210].
[75] A. S. Dighe, M. T. Keil and G. G. Raffelt, JCAP 0306,
006 (2003), [hep-ph/0304150].
[76] A. S. Dighe, M. Kachelriess, G. G. Raffelt and R. Tomas,
JCAP 0401, 004 (2004), [hep-ph/0311172].
[77] K. Takahashi, K. Sato, A. Burrows and T. A. Thompson,
Phys. Rev. D68, 113009 (2003), [hep-ph/0306056].
[78] M. Kachelriess et al., Phys. Rev. D71, 063003 (2005),
[astro-ph/0412082].
[79] M. T. Keil, PhD thesis TU München 2003
[astro-ph/0308228].
[80] M. T. Keil, G. G. Raffelt and H. T. Janka, Astrophys. J.
590 (2003) 971 [astro-ph/0208035].
[81] R. Tomas, D. Semikoz, G. G. Raffelt, M. Kachelriess
and A. S. Dighe, Phys. Rev. D68, 093013 (2003),
[hep-ph/0307050].
[82] H. Nunokawa, V. B. Semikoz, A. Y. Smirnov and J. W. F.
Valle, Nucl. Phys. B501, 17 (1997), [hep-ph/9701420].
[83] H. Duan, G. M. Fuller, J. Carlson and Y.-Z. Qian, Phys.
Rev. D74, 105014 (2006), [astro-ph/0606616].
[84] H. Duan, G. M. Fuller, J. Carlson and Y.-Z. Qian, Phys.
Rev. Lett. 97, 241101 (2006), [astro-ph/0608050].
[85] S. Hannestad, G. G. Raffelt, G. Sigl and Y. Y. Y. Wong,
http://arxiv.org/abs/hep-ph/0512195
http://arxiv.org/abs/hep-ph/0111137
http://arxiv.org/abs/hep-ph/0402266
http://arxiv.org/abs/hep-ph/0403134
http://arxiv.org/abs/hep-ph/0406280
http://arxiv.org/abs/hep-ph/0108043
http://arxiv.org/abs/hep-ph/0408264
http://arxiv.org/abs/hep-ph/0506143
http://arxiv.org/abs/hep-ph/0607267
http://arxiv.org/abs/hep-ph/0407130
http://arxiv.org/abs/astro-ph/0606607
http://arxiv.org/abs/hep-ph/9605301
http://arxiv.org/abs/hep-ph/9606445
http://arxiv.org/abs/hep-ph/9711424
http://arxiv.org/abs/hep-ph/9803305
http://arxiv.org/abs/hep-ph/0202269
http://arxiv.org/abs/hep-ph/9707398
http://arxiv.org/abs/hep-ph/0111224
http://arxiv.org/abs/hep-ph/0202048
http://arxiv.org/abs/astro-ph/0205390
http://arxiv.org/abs/hep-ph/0304056
http://arxiv.org/abs/astro-ph/0407132
http://arxiv.org/abs/astro-ph/0701831
http://arxiv.org/abs/hep-ph/0412100
http://arxiv.org/abs/astro-ph/0006418
http://arxiv.org/abs/astro-ph/0203101
http://arxiv.org/abs/astro-ph/0211194
http://arxiv.org/abs/astro-ph/0506620
http://arxiv.org/abs/astro-ph/0301422
http://arxiv.org/abs/astro-ph/0409446
http://arxiv.org/abs/hep-ph/0106149
http://arxiv.org/abs/hep-ph/0303210
http://arxiv.org/abs/hep-ph/0304150
http://arxiv.org/abs/hep-ph/0311172
http://arxiv.org/abs/hep-ph/0306056
http://arxiv.org/abs/astro-ph/0412082
http://arxiv.org/abs/astro-ph/0308228
http://arxiv.org/abs/astro-ph/0208035
http://arxiv.org/abs/hep-ph/0307050
http://arxiv.org/abs/hep-ph/9701420
http://arxiv.org/abs/astro-ph/0606616
http://arxiv.org/abs/astro-ph/0608050
Phys. Rev. D74, 105010 (2006), [astro-ph/0608695].
[86] G. G. Raffelt and G. G. R. Sigl, hep-ph/0701182.
[87] A. B. Balantekin, J. M. Fetter and F. N. Loreti, Phys.
Rev. D54, 3941 (1996), [astro-ph/9604061].
[88] H. Nunokawa, A. Rossi, V. B. Semikoz and J. W. F.
Valle, Nucl. Phys. B472, 495 (1996), [hep-ph/9602307].
[89] G. L. Fogli, E. Lisi, A. Mirizzi and D. Montanino, JCAP
0606, 012 (2006), [hep-ph/0603033].
[90] A. Friedland and A. Gruzinov, astro-ph/0607244.
[91] Axial couplings would affect neutrino propagation in po-
larized media, see Ref. [82].
[92] However we have confined ourselves to values of εeα small
enough not to lead to drastic consequences during the
core collapse.
[93] For the sake of simplicity we will omit the superindex V .
[94] The importance of collective flavor neutrino conversions
driven by neutrino-neutrino interactions has been re-
cently noted in Refs. [83, 84, 85, 86]. Here we consider
only the case where the effective potential felt by neutri-
nos comes from their interactions with electrons, protons
and neutrons. In a future work we plan to include this
effect and have a complete picture of the neutrino prop-
agation.
[95] Here we neglect the possible effects of density fluctua-
tions [87, 88] taking place during the shock wave prop-
agation. For a detailed study of the phenomenological
consequences see Refs. [89, 90].
[96] The alternative condition H ′ee = H
µµ would give rise to
another internal resonance which can be studied using
the same method. For brevity, we will not pursue this in
this paper.
[97] Note that, in the limit of high densities one recovers
the rotation angle obtained for the internal I-resonance
23 → ϑ
23 after neglecting the kinetic terms.
[98] For the case of NSI with electrons both the vector and
axial components of εeαβ will contribute to the ν−e cross
section.
[99] We assume that for the values of the NSI parameters con-
sidered the initial neutrino spectra do not significantly
change.
http://arxiv.org/abs/astro-ph/0608695
http://arxiv.org/abs/hep-ph/0701182
http://arxiv.org/abs/astro-ph/9604061
http://arxiv.org/abs/hep-ph/9602307
http://arxiv.org/abs/hep-ph/0603033
http://arxiv.org/abs/astro-ph/0607244
ABSTRACT
  We analyze the possibility of probing non-standard neutrino interactions
(NSI, for short) through the detection of neutrinos produced in a future
galactic supernova (SN).We consider the effect of NSI on the neutrino
propagation through the SN envelope within a three-neutrino framework, paying
special attention to the inclusion of NSI-induced resonant conversions, which
may take place in the most deleptonised inner layers. We study the possibility
of detecting NSI effects in a Megaton water Cherenkov detector, either through
modulation effects in the $\bar\nu_e$ spectrum due to (i) the passage of shock
waves through the SN envelope, (ii) the time dependence of the electron
fraction and (iii) the Earth matter effects; or, finally, through the possible
detectability of the neutronization $\nu_e$ burst. We find that the $\bar\nu_e$
spectrum can exhibit dramatic features due to the internal NSI-induced resonant
conversion. This occurs for non-universal NSI strengths of a few %, and for
very small flavor-changing NSI above a few$\times 10^{-5}$.

<|endoftext|><|startoftext|>
Introduction 
The discrete dipole approximation (DDA) is a well-known method to solve the light 
scattering problem for arbitrary shaped particles. Since its introduction by Purcell and 
Pennypacker1 it has been improved constantly. The formulation of DDA summarized by 
Draine and Flatau2 more than 10 years ago is still most widely used for many applications,3 
partly due to the publicly available high-quality and user-friendly code DDSCAT.4 Although 
modern improvements of DDA (as discussed in detail in Section 2.F) exist, they are still in the 
research stage because they are not widely used in real applications. 
DDA directly discretizes the volume of the scatterer and hence is applicable to arbitrary 
shaped particles. However, the drawback of this discretization is the extreme computational 
complexity of DDA, although it is significantly decreased by advanced numerical 
techniques.2,5 That is why the usual application strategy for DDA is “single computation”, 
where a discretization is chosen based on available computational resources and some 
empirical estimates of the expected errors.3,4 These error estimates are based on a limited 
number of benchmark calculations3 and hence are external to the light scattering problem 
under investigation. Such error estimates have evident drawbacks, however no better 
alternative is available. Some results of analytical analysis of errors in computational 
electromagnetics are known, e.g. 6,7, however they typically consider the surface integral 
equations. To the best of our knowledge, such analysis has not been done for volume integral 
equations (such as DDA). 
Usually errors in DDA are studied as a function of the size parameter of the scatterer x 
(at a constant or few different total numbers of dipoles N), e.g. 2,8. Only a small number of 
papers directly present errors versus discretization parameter (e.g. d – the size of a single 
dipole).9-17 The range of d typically studied in those papers is limited to a 5 times difference 
between minimum and maximum values, with the exception of two papers11,12 where it is 15 
times. Those plots of errors versus discretization parameter are always used to illustrate the 
performance of a new DDA formulation and compare it with others. No conclusions about the 
convergence properties of DDA, as a function of d, have been made based on these plots. To 
our knowledge, no theoretical analysis of DDA convergence has been performed, but only a 
few limited empirical studies have appeared in the literature. 
In this paper we perform a theoretical analysis of DDA convergence when refining the 
discretization (Section 2). We derive rigorous theoretical bounds on the error in any measured 
quantity for any scatterer. In Section 3 we present extensive numerical results of DDA 
computations for 5 different scatterers using many different discretizations. These results are 
discussed in Section 4 to support conclusions of the theoretical analysis. We formulate the 
conclusions of the paper in Section 5. In a follow-up paper18 (which from now on we refer to 
as Paper 2) the theoretical convergence results are used for an extrapolation technique to 
increase the accuracy of DDA computations.  
2. Theoretical analysis 
In this section we analyze theoretically the errors of DDA computations. We formulate the 
volume integral equation for the internal electric field and its operator counterpart in 
Section 2.A and its discretization in Section 2.B. Section 2.C contains integral and discretized 
formulae for measured quantities that are the final goal of any light scattering simulation. We 
derive the main results in Section 2.D, where we consider errors of the traditional DDA 
formulation2 without shape errors, which are considered separately in Section 2.E. Finally in 
Section 2.F we discuss some recent DDA improvements from the viewpoint of our 
convergence theory. 
A.Integral Equation 
Throughout this paper we assume the )iexp( tω−  time dependence of all fields. The scatterer 
is assumed dielectric but not magnetic (magnetic permittivity 1=μ ), and the electric 
permittivity is assumed isotropic (non-isotropic permittivity will significantly complicate the 
derivations but will not principally change the main conclusion of Section 2 – Eqs. (70) and 
(87)). 
The general form of the integral equation governing the electric field inside the 
dielectric scatterer is the following:19,20 
)()(),(),()()(),(d)()( 00
rErrLrMrErrrGrErE χχ VVr
∂−+′′′′+= ∫ , (1)
where Einc(r), E(r) are the incident and total electric field at location r; πεχ 4)1)(()( −= rr  
is the susceptibility of the medium at point r (ε(r) – relative permittivity). V is the volume of 
the particle (more general – the volume, which contains all points where the susceptibility is 
not zero), V0 is a smaller volume such that , VV ⊂0 00 \ VV ∂∈r . ),( rrG ′  is the free space 
dyadic Green’s function, defined as 
−=∇∇+=′
)()(ˆˆ),(
kRgRgk IIIrrG , (2)
where I  is the identity dyadic, ck ω=  – free space wave vector, rrR ′−= , R=R , and 
 is a dyadic defined as  (μ, ν are Cartesian components of the vector or 
tensor), and g(R) is the scalar Green’s function 
RR ˆˆ νμμν RRRR =ˆˆ
iexp(
)( = . (3)
M is the following integral associated with the finiteness of the exclusion volume V0
( )∫ ′−′′′′=
)()(),()()(),(d),( s30
rV rErrrGrErrrGrM χχ , (4)
where ),(s rrG ′  is the static limit ( ) of 0→k ),( rrG ′ : 
−−=∇∇=′ 23
11ˆˆ),(
IrrG . (5)
L  is the so-called self-term dyadic: 
rV rL , (6)
where  is an external (as viewed from r) normal to the surface ∂Vn′ˆ 0 at point r'. 
Eq. (1) can be rewritten in operator form as follows 
inc~~~ EEA =⋅ , (7)
where ( 311 )~ CE →=∈ VLH  – functions from V to C3 that have finite L1-norm, 2inc~ H∈E  – 
subspace of H1 containing all functions that satisfy Maxwell equations in free space. A
 is a 
linear operator . Although the Sobolev norm is physically more sound (based on 
the finiteness of energy of the electric field),
21: HH →
6,21 we use the L1-norm. A detailed discussion of 
all assumptions made for the electric field is performed in Section 2.D. 
B.Discretization 
To solve Eq. (1) numerically a discretization is done in the following way.20 Let , 
 for . N denotes the number of subvolumes (dipoles). Assuming  and 
choosing , Eq. 
/0=ji VV I ji ≠ iV∈r
iVV =0 (1) can be rewritten as 
)()(),(),()()(),(d)()( 3inc rErrLrMrErrrGrErE χχ ii
∂−+′′′′+= ∑ ∫
. (8)
The set of Eq. (8) (for all i) is exact. Further one fixed point ri inside each Vi (its center) is 
chosen and  is set. irr =
The usual approximation20 is considering E and χ constant inside each subvolume: 
iiiii V∈==== rrrErErE for)()(,)()( χχχ . (9)
Eq. (8) can then be rewritten as 
( ) iiii
jjjijii V ELMEGEE χχ −++= ∑
inc , (10)
where , )(incinc ii rEE = ),( iii V rLL ∂= , 
( )∫ ′−′′=
iii r ),(),(d
s3 rrGrrGM , (11)
∫ ′′=
ij rV
1 3 rrGG . (12)
A further approximation, which is used in almost all formulations of DDA, is 
),()0( jiij rrGG = . (13)
This assumption is made implicitly by all formulations that start by replacing the scatterer 
with a set of point dipoles, as was done originally by Purcell and Pennypacker.1 For a cubical 
(as well as spherical) cell Vi with ri located at the center of the cell, iL  can be calculated 
analytically yielding22 
=i . (14)
Eq. (10) together with Eqs. (13) and (14) and completely neglecting iM  is equivalent to 
the original DDA by Purcell and Pennypacker (PP).1 The diagonal terms in Eq. (10) are then 
equivalent to the well-known Clausius-Mossotti (CM) polarizability for point dipoles. 
Modifications introduced by other DDA prescriptions are discussed in Section 2.F. 
In matrix notation Eq. (10) reads 
ddd ,incEEA = , (15)
where Ed, Einc,d are elements of  (vectors of size N where each element is a 3D complex 
vector) and 
( )N3C
dA  is a N×N matrix where each element is a 3×3 tensor. d is the size of one 
dipole. In operator notation Eq. (8) (for irr = ) is as follows 
( ) diii ,incinc )(~)(~~ ErErEA == , (16)
We define the discretization error function as 
( ) ( )iddidi ,0)(~~ EArEAh −= , (17)
where E0,d is the exact field at the centers of the dipoles – )(
i rEE = , in contrast to E
d that 
is only an approximation obtained from solution of Eq. (15) (here we neglect the numerical 
error that appears from the solution of Eq. (15) itself, which is acceptable if this error is 
controlled to be much less than other errors). Comparing Eqs. (15) and (17) one can 
immediately obtain the error in internal fields due to discretization δEd: 
( ) ddddd hAEEE 1,0δ −−=−= . (18)
C.Measured quantities 
After having determined the internal electric fields, scattered fields and cross sections can be 
calculated. Scattered fields are obtained by taking the limit ∞→r  of the integral in Eq. (1) 
(see e.g. 23) 
)iexp(
)(sca nFrE
= , (19)
where rrn =  is the unit vector in the scattering direction, and F is the scattering amplitude: 
∑∫ ′′⋅′−′−−=
krnnk )()()iexp(d)ˆˆ(i)( 33 rErnrInF χ . (20)
All other differential scattering properties, such as the amplitude and Mueller scattering 
matrices, and asymmetry parameter >< θcos  can be easily derived from F(n), calculated for 
two incident polarizations.24 We consider an incident polarized plane wave: 
)iexp()( 0inc rkerE ⋅= , (21)
where , a is direction of incidence, and ak k= 10 =e  is assumed. The scattering and 
extinction cross sections (Csca, Cext) are derived from the scattering amplitude:23 
∫ Ω= nFkC , (22)
( )∗⋅= 02ext )(Re
, (23)
where * denote complex conjugation. The expression for absorption cross section (Cabs) 
directly uses the internal fields:23 
( )∑ ∫ ′′′=
abs )()(Imd4 rErχπ , (24)
Since only values of the internal field in the centers of dipoles are known, Eqs. (20) and 
(24) are approximated by (PP) 
∑ ⋅−χ−−=
iii kVnnk )iexp()ˆˆ(i)(
3 nrEInF , (25)
iiiVkC
abs )Im(4 Eχπ . (26)
Corrections to Eq. (26) are discussed in Section 2.F. 
Both Eqs. (20) (for each component) and (24) can be generalized as ( )E~~φ  (a functional 
that is not necessarily linear), which is approximated as: 
( ) ( ) ddd φφφ δ~~ += EE , (27)
where ( )dd Eφ  corresponds to Eqs. (25) or (26) respectively, and the error δφ d consists of  two 
parts: 
( ) ( )[ ] ( ) ( )[ ]ddddddd EEEE φφφφφ −+−= ,0,0~~δ . (28)
The first one comes from discretization (similar to Eq. (17)), and the second from errors in the 
internal fields. 
D.Error analysis 
In this section we perform error analysis for the PP formulation of DDA. Improvements of 
DDA are further discussed in Section 2.F. 
We assume cubical subvolumes with size d. We also assume that the shape of the 
particle is exactly described by these cubical subvolumes (we call this cubically shaped 
scatterer). Moreover, χ is a smooth function inside V (exact assumptions on χ are formulated 
below). An extension of the theory to shapes that do not satisfy these conditions is presented 
in Section 2.E. If there are several regions with different values of χ (smooth inside each 
region), the analysis is still valid but interfaces inside V should be considered the same way as 
the outer boundary of V. We further fix the geometry of the scattering problem and incident 
field. Therefore we will be interested only in variation of discretization (which is 
characterized by the single parameter – d); for reasons that will become clear in the sequel, we 
assume that  (this bound is not limiting since otherwise DDA is generally 
inapplicable
We switch to dimensionless parameters by assuming 1=k , which is equivalent to 
measuring all the distances in units of k1 . The unit of electric field can be chosen arbitrary 
but constant. In all further derivations we will use two sets of constants: γi and ci. γ1-γ13 are 
basic constants that do not depend on the discretization d, but do depend directly upon all 
other problem parameters – size parameter eqkRx =  (Req – volume-equivalent radius), m, 
shape, and incident field – or some of them. On the contrary, c1-c94 are auxiliary values that 
either are numerical constants or can be derived in terms of constants γi. Although the 
dependencies of ci on γi are not explicitly derived in this paper, one can easily obtain them 
following the derivations of this section. That is the main motivation for using such vast 
amount of constants instead of an “order of magnitude” formalism. However, such explicit 
derivation has limited application because, as we will see further, constants in the final result 
depend upon almost all basic constants. Qualitative analysis of these dependencies will be 
performed in the end of this section. It should be noted that the main theoretical results 
concerning DDA convergence (boundedness of errors by a quadratic function, cf. Eq (70)) 
can be formulated and applied without consideration of any constants (which is simpler). 
However our full derivation enables us to make additional conclusions related to the behavior 
of specific error terms. 
The total number of dipoles used to discretize the scatterer is 
−= dN γ . (29)
We assume that the internal field E
 is at least four times differentiable and all these 
derivatives are bounded inside V 
65432 )(,)(,)(,)(,)( γγγγγ τρνμρνμνμμ ≤∂∂∂∂≤∂∂∂≤∂∂≤∂≤ rErErErErE  
for V∈r  and τρνμ ,,,∀ . 
This assumption is acceptable since there are no interfaces inside V, therefore E
 should be a 
smooth function. .  denotes the Euclidian (L2) norm, which is used for all 3D objects: vectors 
and tensors. We use the L1-norm, 
. , for N-dimensional vectors and matrices as well as for 
functions and operators. Eq. (30) immediately implies that . We require that χ 
satisfies Eq. 
~ 1 VL∈E
(30) with constants γ7-γ11. Further we will state an estimate for the norm of 
)(RG  and its derivatives. One can easily obtain from Eq. (2) that for 1>R  )(RG  satisfies 
Eq. (30) (with constants c1-c5), while for 2≤R  
6 )(,)(,)(,)(
−−−− ≤∂∂∂≤∂∂≤∂≤ RcRcRcRc RGRGRGRG ρνμνμμ , 
−≤∂∂∂∂ RcRGτρνμ  for τρνμ ,,,∀ . 
Next we state two auxiliary facts that will be used later. Let Vc be a cube with size d and 
with its center at the origin and f(r) a four times differentiable function inside Vc. Then 
)(max)()(d
fdcffr
d cVV
∈,∫ , (32)
( ) )(max)(
)()(d
d cVV
∂∂∂∂+∇≤−
=∫ . (33)
Eqs. (32) and (33) are the corollary of expanding f into Taylor series. Odd orders of the Taylor 
expansion vanish because of cubical symmetry. 
Our first goal is to estimate 
dh . Starting from Eq. (17) we write  as dih
),()(),(d )0(33 ii
i Vdr
rMPGrPrrGh +
−′′′= ∑ ∫
, (34)
where we have introduced the polarization vector for conciseness 
)()()( rErrP χ= , )( ii rPP = . (35)
It is evident that  also satisfies Eq. )(rP (30) (with constants c13-c17). We start by estimating 
),( iiV rM . Substituting a Taylor expansion of  )(rP
( ) ( )( )∑∑ ∂∂+∂+=
ρρ τρ ),,(
)()()( RrP0P0PRP RRR , (36)
where , into Eq μμ Rr ≤≤
~0 (4) gives 
( ) ( )(∫ ∑∫ ∂∂+−=
ii RRRRV
τρτρ τρ ),,(
)()(d),( 3s3 RrPRGPRGRGrM ) . (37)
The norms of these two terms can be estimated as 
( ) 2183s3 )(d3
)()(d dcRRgR
i ≤=− ∫∫ PIPRGRG , (38)
( ) 21923153 )(d3)),,(~()(d dcRRcRRR
≤≤∂∂ ∫∫ ∑ RGRrPRG
τρτρ τρ . (39)
Eq. (38) follows directly from the definitions in Eqs. (2), (5). To derive Eq. (39) we used 
Eq. (31) and the fact that 23RRR ≤∑
τρ . Finally, Eqs. (37)-(39) lead to 
20),( dcV ii ≤rM . (40)
To estimate the sum in Eq. (34) we consider separately three cases: 1) dipole j lies in a 
complete shell of dipole i (we define the shell below); 2) j lies in a distant shell of dipole i – 
1>−= ijijR rr ; 3) all j that fall between the first two cases (see Fig. 1). We define the first 
shell (S1(i)) of a cubical dipole as a set of dipoles that touch it (including touching in one point 
only). The second shell (S2(i)) is a set of dipoles that touch the outer surface of the first shell, 
and so on. The l-th shell (Sl(i)) is then a set of all dipoles that lie on the boundary of the cube 
with size  and center coinciding with the center of the original dipole. We call a shell 
complete if all its elements lie inside the volume of the scatterer V. A shell is called a distant 
dl )12( +
Kmax K(i) 
(2) (3) 
vacuum scatterer 
Fig. 1. Partition of the scatterer’s volume into three regions relative to dipole i. 
shell if all its elements satisfy , i.e. if its order 1>ijR [ ]dKl 1max => . Let K(i) be the order of 
the first incomplete shell, which is an indicator of how close dipole i is to the surface. We 
demand  to separate cases (1) and (2) described above. All j that fall in the third 
case satisfy  (the exact value of this constant – slightly larger than 
max)( KiK ≤
2<ijR 3  – depends on 
d). The number of dipoles in a shell Sl (which can be incomplete) – ns(l) – can be estimated as 
33 )12()12()( lclllns ≤−−+≤ . (41)
The sum of the error over all dipoles that lie in complete shells is then 
∑ ∑ ∫
)0(33 )(),(d
l iSj
dr PGrPrrG , (42)
Since each shell in Eq. (42) is complete it can be divided into pairs of dipoles that are 
symmetric over the center of the shell (j and –j). For convenience we set . The inner 
sum in Eq. 
0r =i
(42) can then be rewritten as 
( ) ( )∑ ∫
+−′−+′′′
)0(33 )()()(d
dr PPGrPrPrG , (43)
Further we introduce the auxiliary function 
( ) )()()(
)( 0PrPrPru −′−+′=′ , (44)
which satisfies the following inequalities (follows from Eq. (30) for P(r) and Taylor series) 
22 )(,)(,)( crcrc ≤∂∂≤∂≤ rururu νμμ  for νμ,∀ . (45)
Then Eq. (43) is equivalent to 
V l jl j
drdr PGrGuGrurG ∑ ∫∑ ∫
)0(33
)0(33 )(d)()(d , (46)
where . To estimate the first term we apply Eq. )( jj ruu = (32) to the whole function under 
the integral. Using Eqs. (31) and (45) one may obtain 
( ) 325)()(max −
≤′′∂∂ ij
, (47)
and hence 
)0(33 )()(d
V ll j
ldcRdcdr uGrurG , (48)
where we have used Eq. (41) and  for ldRij ≥ )(iSj l∈ . 
It is straightforward to show that 
∑ ∫∑ ∫
′′=′′
3 )(d
iSj ViSj V l jl j
rgrr IrG , (49)
)0( )(
RgIG . (50)
The derivation is based upon Eq. (2) and the equivalence I
 in all sums and integrals 
that satisfy cubical symmetry. Then second part of Eq. (46) is transformed to 
)0(33 )()(d)(d −
+≤−′′≤
−′′ ∑ ∫∑ ∫ ldcldcRgdrgrcdr
V l jl j
PGrG , (51)
where we apply Eq. (33) to derive the second inequality and use the identity  
and the following inequalities 
)()(2 rgrg −=∇
( ) 532131 )(, −− ≤∂∂∂∂≤ RcRgRcRg τρνμ  for τρνμ ,,,∀ . (52)
Substituting Eqs. (48) and (51) into Eq. (42) one can obtain 
( ) 23433
)0(33 )(ln)(),(d diKccdr
l iSj
−′′′∑ ∑ ∫
PGrPrrG , (53)
using the fact that . 1)( ≤diK
We now consider the second part of the sum in Eq. (34) (where ). We first apply 
1>ijR
(32), then use Eq. (30) for P(r) and )(rG , and finally invoke Eq. (29): 
)0(33 )(),(d dcdNcdcdr
ijij j
i ≤≤≤⎟
−′′′ ∑∑ ∫
PGrPrrG . (54)
To analyze the third part of the sum in Eq. (34) we again sum over shells, however since 
they are incomplete we cannot use symmetry considerations. We apply Eq. (33) to the whole 
function under the integral and proceed analogous to the derivation of Eq. (51). Using the 
identity 
)()(2 rGrG −=∇ , (55)
(since we have assumed ) we obtain 1=k
( ) 4372 )()( −= ≤∇ ijRcijRrrPrG , (56)
( ) 738)()(max −
≤′′∂∂∂∂ ij
, (57)
which leads to 
)0(33 )(),(d −−
−′′′∑ ∫ lcdlcdr
PGrPrrG , (58)
and then analogous to Eq. (53): 
)()()(),(d 442
)( )(
)0(33
iKcidKcdr
iKl iSj
−′′′∑ ∑ ∫ PGrPrrG . (59)
Collecting Eqs. (40), (53), (54), (59) we finally obtain 
( ) 24443442141 )(ln)()( diKcciKcidKcdi +++≤ −−h . (60)
Then 
( ) ( )442141
max4443
)(ln −−
+++≤= ∑∑ KcdKcKnNdKcc
d hh , (61)
where n(K) is the number of dipoles whose order of the first incomplete shell is equal to K. It 
is clear that 
NdnKn 12)1()( γ≤≤ , (62)
where γ12 is surface to volume ratio of the scatterer. Finally we obtain 
( )[ ]dcddccNd 46245431 ln +−≤h . (63)
The last term in Eq. (63) is mostly determined by dipoles that lie on the surface (or few 
dipoles deep) because it comes from the K-4 term in Eq. (61) (which rapidly decreases when 
moving from surface). We define surface errors as those associated with the linear term in 
Eq. (63). Our numerical simulation (see Section 0) show that this term is small compared to 
other terms for “typical” values of d, however it is always significant for small enough values 
of d. 
From Eq. (18) we directly obtain 
δ ddd hAE
≤ . (64)
We assume that a bounded solution of Eq. (7) uniquely exists for any , moreover we 
assume that if 
inc~ H∈E
inc =E  then 131
γ≤E . These assumptions are equivalent to the fact that 
1~ −A  exists and is finite (the operator 1
~ −A  is bounded). Because dA  is a discretization of A
one would expect that 
( ) 131
lim γ== −
. (65)
Although Eq. (65) seems intuitively correct, its rigorous prove, even if feasible, lies outside 
the scope of this paper. For an intuitive understanding one may consult the paper by Rahola,25 
where he studied the spectrum of the discretized operator (for scattering by a sphere) and 
showed that it does converge to the spectrum of the integral operator with decreasing d. It 
should however be noted, that convergence of the spectrum only implies the convergence of 
the spectral (L2) norm of the operator and not necessarily the convergence of the L1-norm. 
Therefore Eq. (65) should be considered as an assumption. It implies that there exists a d0 
such that 
for 0dd <  ( ) 47
A , (66)
where c47 is an arbitrary constant larger then γ13 (although d0 depends on its choice). For 
example 1347 2γ=c  should lead to a rather large d0 (a rigorous estimate of d0 does not seem 
feasible). Therefore 
δ dE  satisfies the same constrain as 
dh  (Eq. (63)) but with constants 
c48-c50. 
Next we estimate the errors in the measured quantities and start with the discretization 
error (first part in Eq. (28)). Examining Eqs. (20) and (24) one can see that Eq. (32) may be 
directly applied leading to 
( ) ( ) 252551,0~~ dcdc
dd ≤≤− ∑EE φφ . (67)
The second part in Eq. (28) is estimated as 
( ) ( ) ( ) dcddccdcdc d
55541
,0 lnδδ +−≤≤≤− ∑ EEEE φφ , (68)
where we used Eq. (29). The estimation of the error for Cabs additionally uses the fact 
i c EE δδ 57
c Eδmax
57 . 
By combining Eqs. (67) and (68) we obtain the final result of this section: 
( ) dcddccd 5625558 lnδ +−≤φ . (69)
It is important to remember that the derivation was performed for constant x, m, shape, and 
incident field. There are 13 basic constants (γ1-γ13). γ1 (Eq. (29)) characterizes the total 
volume of the scatterer, hence it depends only on x. γ7-γ11 (Eq. (30) for χ(r)) can be easily 
obtained given the function χ(r), moreover it is completely trivial in the common case of 
homogenous scatterers. γ12 (surface to volume ratio, Eq. (62)) depends on the shape of the 
scatterer and is inversely proportional to x. It is not feasible (except for certain simple shapes) 
to obtain the values of constants γ2-γ6 (Eq. (30)), since it requires an exact solution for the 
internal fields. These constants definitely depend upon all the parameters of the scattering 
problem. Moreover, these dependencies can be rapidly varying, especially near the resonance 
regions. The same is true for γ13 (L1-norm of the inverse of the integral operator, Eq. (65)). 
Finally, there is the important constant d0 that also depends on all the parameters, however 
one may expect it to be large enough (e.g. ) for most of the problems – then its 
variation can be neglected. 
20 ≥d
Before proceeding we introduce the discretization parameter kdmy = . We employ the 
commonly used formula as proposed by Draine,8 however the exact dependence on m is not 
important because all the conclusions are still valid for constant m. Replacing d by y does not 
significantly change the dependence of the constants in Eq. (69) since they all already depend 
on m through the basic constants γ2-γ11, γ13. This leads to 
( ) ycyyccy 6126059 lnδ +−≤φ . (70)
It is not feasible to make any rigorous conclusions about the variation of the constants in 
Eq. (70) with varying parameters because all these constants depend on γ2-γ6, γ13 that in turn 
depend in a complex way upon the parameters of the scattering problem. However we can 
make one conclusion about the general trend of this dependency. 
Following the derivation of the Eq. (70) one can observe that c61 is proportional to γ12, 
while c59 and c60 do not directly depend on it (at least part of the contributions to them are 
independent of γ12). Therefore the general trend will be a decrease of the ratio 5961 cc  with 
increasing x (when all other parameters are fixed). This is a mathematical justification of the 
intuitively evident fact that surface errors are less significant for larger particles. 
In the analysis of the results of the numerical simulations (Section 0) we will neglect the 
variation of the logarithm. Eq. (70) then states that error is bounded by a quadratic function of 
y (for ). However, keep in mind that our derivation does not lead to an optimal error 
estimation, i.e. it overestimates the error and can be improved. For example, the constants γ
0dd ≤
γ6 are usually largest inside a small volume fraction of the scatterer (near the surface or some 
internal resonance regions), while in the rest of the scatterer the internal electric field and its 
derivatives are bounded by significantly smaller constants. However the order of the error is 
estimated correctly, as we will see in the numerical simulations. 
It is important to note that Eq. (70) does not imply that δφ y (which is a signed value) 
actually depends on y as a quadratic function, but we will see later that it is the case for small 
enough y (Section 0, see detailed discussion in Paper 2). Moreover, the coefficients of linear 
and quadratic terms for δφ y may have different signs, which may lead to zero error for non-
zero y (however, this y, if it exists, is unfortunately different for each measured quantity). 
E.Shape errors 
In this section we extend the error analysis as presented in Section 2.D to shapes that cannot 
be described exactly by a set of cubical subvolumes. We perform the discretization the same 
way as in Section 2.B but some of the Vi are not cubical (for Vi ∂∈ , which denotes that 
dipole i lies on the boundary of the volume V). We set ri to be still in the center of the cube 
(circumscribing Vi) not to break the regularity of the lattice. The standard PP prescription uses 
equal volumes ( ) in Eqs. 3dVi = (10), (14), (25), and (26), i.e. the discretization changes the 
shape of the particle a little bit. We will estimate the errors introduced by these boundary 
dipoles. These errors should then be added to those obtained in Section 2.D. We start by 
estimating 
dh . First we consider  for dih Vi ∂∉  
−′′′=
PGrPrrGh )0(33 )(),(d , (71)
which is just a reduction of Eq. (34). For Vi ∂∈   is the same plus the error in the self-term dih
iiiiii
i VVdr
EΙrLrMPGrPrrGh χ
−′′′= ∑ ∫
),(),()(),(d )0(33 . (72)
Let us define 
iij dr
PGrPrrGh )0(33sh )(),(d −′′′= ∫ , (73)
iiiiiiii VV EΙrLrMh χ
⎛ −∂−=
),(),(sh . (74)
We estimate each of the terms in Eq. (73) separately (since there is actually no significant 
cancellation, and the error is of the same order of magnitude as the values themselves) using 
Eq. (30) for P(r) and )(rG  and Eq. (31). This leads to 
h  (75)
To estimate  we assume that the surface of the scatterer is a plane on the scale of the size 
of the dipole. A finite radius of curvature only changes the constants in the following 
expressions. We will prove that 
sh cii ≤h , (76)
therefore we do not need to consider the third term in Eq. (74) (coming from the unity tensor) 
at all, since it is bounded by a constant. 
( ) ( )∫∫ −′′′+′′−′′=
iiii rrV )()(),(d)(),(),(d),(
s3s3 rPrPrrGrPrrGrrGrM . (77)
−′ ic rrThe function in the first integral is always bounded by 
. If  the same is true 
for the second integral and hence 
ii V∈r
dcV ii 66),( ≤rM . (78)
If ii V∉r  we introduce an auxilia  r ′′ry point  that is symmetric to ri over the particle surface 
and apply the identity 
( ) ( ))()()()()()( ii rPrPrPrPrPrP −′′+′′−′=−′  (79)
to the second integral in Eq. (77). Using Taylor expansion of P near  and the fact that  r ′′
irrrr −′≤′′−′  for iV∈′r  one can show that 
∫ ′′+≤ii cdcV ),( s36867r
ir ),(d rrGM , (80)
where the remaining integral can be proven to be equal to ),( iiV rL ∂− . The last prove left 
(see Eqs. (74) and (80)) is to demonstrate that ),( iiV rL ∂  is bounded by a constant. The only 
potential problem may come from the subsurface of iV∂  th  the particle surface 
(because it may be close to r
at is part of
ce is i). This subsurfa med planar. We will calculate the 
integral in Eq. 
(6) over the infinite plane rρrr +=−′ i uch that 0 s =⋅rρ . Then ρρn ±=′  
( ) 223
2d),plane.inf(
mm == ∫irL
+∂V r
r , (81)
which is bounded. The rest of the integral (over the part of the cube surface) is bounded by a 
constant, which is a manifestation of a more general fact that (by its definition) ), ir   ( iVL ∂
does not depend on the size but only on the shape of the volume. Finally we have 
69),( cV ii ≤∂ rL , 
which together with Eqs. (74), (78), and (80) prove Eq. (76). 
Using Eqs. (75) and (76) we obtain 
)ln()( 70
cllnc
+≤+≤ ∑ ∑∑∑ ∑ −hhh 7271
dccNd
, (83)
where we have changed the order of the summation in the double sum and split the 
summation over cubical shells for 
⎝∂∈∂∈∂∈
maxKl ≤  and . Then we have grouped everything 
otal esti
maxKl >
into one sum over boundary dipoles. Eqs. (41) and (62) were used in the last inequality. 
Combining Eqs. (63) and (83) one can obtain the t mate of the 
dh  for any scatterer: 
( ) ( )[ ]ddccddccNd lnln 7273245431 −+−≤h . (84)
Using Eq. (66) we immediately obtain the sam
δ de estimate for E . 
The derivation of the errors in the measured quantities is slightly modified compared to 
 (68) e changed to Section 2.D, by the presence of the shape errors. Eqs. (67) and  ar
( ) ( )~~ dcdcdcdc
,0 +≤+≤− ∑∑
EE φφ , (85)
( ) )( ( ) ( )ddccddccdddd lnln 2,0 −+−≤−Eφ . dcd δ 7655541
53≤ EEφ (86)
The second term in Eq. (85) comes from surface dipoles for which errors are the same order 
as the values themselves. Finally the generalization of Eq. (70) is 
( ) ( )yyccyyccy lnlnδ 797826059 −+−≤φ . (87)
The shape errors “reinforce” the surface errors (the linear term of discretization error), and 
although they both generally decrease with increasing size parameter x one may expect the 
linear term in Eq. (87) to be significant up to higher values of y than in Eq. (70). 
All the derivations in this section can in principle be extended to interfaces inside the 
particle, i.e. when a surface, which cannot be described exactly as a surface of a set of cubes, 
separates two regions where χ(r) varies smoothly. Two parts of the cubical dipole on the 
interface should be considered separately the same way as it was done above. This will 
however not change the main conclusion of this section – Eq.(87) – but only the constants. 
F.Different DDA formulations 
In this section we discuss how different DDA formulations modify the error estimates derived 
in Sections 2.D and 2.E. 
Most of the improvements of PP proposed in the literature are concerned with the self-
term – . They are the Radiative Reaction correction (RR),),( iiV rM
8 the Digitized Green’s 
Function (DGF),23 the formulation by Lakhtakia (LAK),26,27 the a1-term method,28,29 the 
Lattice Dispersion Relation (LDR),30 the formulation by Peltoniemi (PEL),31 and the 
Corrected LDR (CLDR).32 All of them provide an expression for  that is of order d),( iiV rM
(except for RR that is of order d3). For instance LDR is equivalent to 
( )[ ] iii ddSmbmbbV PrM 3223221 i)32(),( +++= , (88)
(remember that we assumed ) where b1=k 1, b2, b3 are numerical constants and S is a 
constant that depends only on the propagation and polarization vectors of the incident field. 
However, none of these formulations can exactly evaluate the integral in Eq. (39), because the 
variation of the electric field is not known beforehand (PEL solves this problem, but only for 
a spherical dipole). Therefore they (hopefully) decrease the constant in Eq. (40), thus 
decreasing the overall error in the measured quantities. However, these formulations are not 
expected to change the order of the error from d2 to some higher order. 
We do not analyze the improvements by Rahmani, Chaumet, and Bryant (RCB)33,34 and 
Surface Corrected LDR (SCLDR),17 as they are limited to certain particle shapes. 
There exist two improvements of the interaction term in PP: Filtered Coupled Dipoles 
(FCD)12 and Integration of Green’s Tensor (IT).35 A rigorous analysis of FCD errors is 
beyond the scope of this paper, but it seems that FCD is not designed to reduce the linear term 
in Eq. (63) that comes from the incomplete (non-symmetric) shells. This is because FCD 
employs sampling theory to improve the accuracy of the overall discretization for regular 
cubical grids. FCD does not improve the accuracy of a single ijG  calculation (approximation 
of an integral over one subvolume). 
IT, which numerically evaluates the integral in Eq. (12), has a more pronounced effect 
on the error estimate. Consider dipole j from l-th shell (incomplete) of dipole i, then 
.),(d),(maxd
)(),(d)(),(d
≤′′′+′∂′′≤
−′′′=−′′′
dlcrrcrrc
jijij
rrGrrG
PrPrrGPGrPrrG
 (89)
Here we have used Eq. (36) and Taylor expansion of Green’s tensor up to the first order. 
Eq. (89) states that the second term in Eq. (58) is completely eliminated and so is the linear 
term in Eqs. (69) and (70) (surface errors). Therefore convergence of DDA with IT for 
cubically shaped scatterers is expected to be purely quadratic (neglecting the logarithm). 
However, for non cubically shaped scatterers the linear term reappears, due to the shape 
errors. Both IT and FCD also modify the self-term, however the effect is basically the same as 
for the other formulations. 
Several papers aimed to reduce shape errors.10,11,36 The first one – Generalized Semi-
Analytical (GSA) method10 – modifies the whole DDA scheme, while the other two propose 
averaging of the susceptibility over the boundary dipoles. We will analyze here Weighted 
Discretization (WD) by Piller11 as probably the most advanced method to reduce shape errors 
available today. 
WD modifies the susceptibility and self-term of the boundary subvolume. We slightly 
modify the definition of the boundary subvolume used in Sections 2.B and 2.E to 
automatically take into account interfaces inside the scatterer. We define Vi to be always 
cubical, but with a possible interface inside. The particle surface, crossing the subvolume Vi, 
is assumed planar and divides the subvolume into two parts: the principal volume  
(containing the center) and the secondary volume  with susceptibilities ,  and 
electric fields ,  respectively. The electric fields are considered constant inside 
each part and related to each other via the boundary condition tensor 
iV ii χχ ≡
ii EE ≡
iT : 
iii ETE =
s . (90)
In WD the susceptibility of the boundary subvolume is replaced by an effective one, defined 
( ) 3ssppe dVV iiiiii TI χχχ += , (91)
which gives the correct total polarization of the cubical dipole. The effective self-term is 
directly evaluated starting from Eq. (4), considering χ and E constant inside each part, 
( ) ( ) iii
rrV ETrrGrrGrrGrrGrM
′−′′+′−′′= ∫∫ ss3ps3
),(),(d),(),(d),( χχ . (92)
Piller evaluated the integrals in Eq. (92) numerically.11 
To take a smooth variation of the electric field and susceptibility into account we define 
 ( r  is defined in Section )(s r ′′= χχ i ′′ 2.E) and iT  is calculated at the surface between ri and 
.  and r ′′ ii PP ≡
iiiiii ETEP
ssss χχ == . Then 
c rrPrP
−≤−′′
min)( 83
s , (93)
where we have assumed that Eq. (30) for χ(r) and E(r) is also valid in . siV
We start estimating errors of WD with  (cf. Eq. shijh (73)) 
( ) ( )∫∫ −′′′+−′′′=
s)0(3p)0(3sh )(),(d)(),(d
jijiij rr PGrPrrGPGrPrrGh , (94)
Using Taylor expansions of  near r)(rP ′ i and r ′′  in  and  correspondingly and Eq. piV
iV (93) 
one may obtain that the main contribution comes from the derivative of Green’s tensor, 
leading to (cf. Eq. (75)) 
h  (95)
iih  is the following (cf. Eq. (74)) 
( ) ( )
( ) ( ) ( )
.),(),(
),(d)(),(d)(),(d
),(),(),(d),(),(d
),(),(
pss3s3p3
ess3ps3
iiiiiii
iiiiiii
PrLErL
PPrrGPrPrrGPrPrrG
ErLPrrGrrGPrrGrrG
PrLrMh
−′′+−′′′+−′′′=
∂−′−′′+′−′′−
 (96)
The first two integrals can be easily shown to be dс85≤  (cf. Eq. (77)) and third one is 
transformed to L  the same way as in Eq. (80), thus 
iiiiiiiiiiii VVVdc ErLPrLPrLh
esspp
sh ),(),(),( χ∂−∂+∂+≤ , (97)
where the second term comes from the fact that averaged PL  is not the same as L  times 
averaged P. This error depends on the geometry of the interface inside Vi and generally is of 
order unity. For example, if the plane interface is described as ε+= izz , taking limit 0→ε  
gives the error ( )zii sp2 PP −π  (using Eq. (81)). Therefore WD does not principally improve 
the error estimate of  given by Eq. shiih (76), although it may significantly decrease the 
constant. On the other hand, since ),( p iiV rL ∂  and ),(
iiV rL ∂  can be (analytically) evaluated 
for a cube intersected by a plane, WD can be further improved to reduce the error in  to 
linear in d, which is a subject of future research. 
Proceeding analogous to the derivation of Eq. (83) one can obtain 
Ndcdccllnc
898887
)( ≤⎟⎟
++≤ ∑ ∑
h . (98)
It can be shown that for the scattering amplitude (Eq. (25)) the error estimate given by 
Eq. (85) can be improved, since WD correctly evaluates the zeroth order of value for the 
boundary dipoles, leading to 
( ) ( ) 291490551,0~~ dcdcdc
dd ≤+≤− ∑∑
EE φφ . (99)
In his original paper11 Piller did not specify the expression that should be used for Cabs. Direct 
application of the susceptibility provided by WD into Eq. (26) does not reduce the order of 
error when compared with the exact Eq. (24) (except when ), since they are not linear 
functions of the electric field. However, if we consider separately  and  (which is 
equivalent to replacing 
0s =iχ
( )( )∗⋅ iiiiV EEeIm χ  by 
2ss2pp )Im()Im( iiiiiii VV ETE χχ + ) the same 
estimate as in Eq. (99) can be derived for Cabs. 
Using Eqs. (98), (99), and the first part of Eq. (86) one can derive the final error 
estimate for WD: 
( ) ycyyccy 9429392 lnδ +−≤φ , (100)
where the constant before the linear term, as compared to Eq. (87), does not contain a 
logarithm and is expected to be significantly smaller, because several factors contributing to it 
are eliminated in WD. Although WD has a potential for improving, it does not seem feasible 
to completely eliminate the linear term in the shape error. The accuracy of evaluation of the 
interaction term over the boundary dipole (cf. Eq. (94)) can be improved by integration of 
Green’s tensor over  and  separately but that would ruin the block-Toeplitz structure of 
the interaction matrix and hinder the FFT-based algorithm for the solution of linear 
0 30 60 90 120 150 180
Scattering angle θ, deg
 cube kD=8
 discretized sphere kD=10
 sphere kD=3 (x10)
 sphere kD=10
 sphere kD=30
Fig. 2. S11(θ ) for all 5 test cases in logarithmic scale. The result for the kD = 3 sphere is multiplied by 
10 for convenience. 
equations.5 Since there is no comparable alternative to FFT nowadays, this method seems 
inapplicable. 
Minor modifications of the expression for Cabs are possible. Draine8 proposed a 
modification of Eq. (26) that was widely used afterwards and which was further modified by 
Chaumet et al.35 However, for many cases these expressions are equivalent and, even when 
they are not, the difference is of order d3, which is neglected in our error analysis. 
3. Numerical simulations 
A.Discrete Dipole Approximation 
The basics of the DDA method were summarized by Draine and Flatau.2 In this paper we use 
the LDR prescription for dipole polarizability,30 which is most widely used nowadays, e.g. in 
the publicly available code DDSCAT 6.1.4 We also employ dipole size correction8 for non-
cubically shaped scatterers to ensure that the cubical approximation of the scatterer has the 
correct volume; this is believed to diminish shape errors, especially for small scatterers.2 We 
use a standard discretization scheme as described in Section 2.E, without any improvements 
for boundary dipoles. It is important to note that all the conclusions are valid for any DDA 
implementation, but with a few changes for specific improvements as discussed in 
Section 2.F. 
Our code – Amsterdam DDA (ADDA) – is capable of running on a cluster of computers 
(parallelizing a single DDA computation), which allows us to use practically unlimited 
number of dipoles, since we are not limited by the memory of a single computer.37,38 We used 
a relative error of residual  as a stopping criterion for the iterative solution of the DDA 
linear system. Tests suggest that the relative error of the measured quantities due to the 
iterative solver is then  (data not shown) and hence can be neglected (total relative 
errors in our simulations are  – see Section 
810−<
710−<
56 1010 −− ÷> 0). More details about our code can 
be found in Paper 2. All DDA simulations were carried out on the Dutch national compute 
cluster LISA.39 
Table 1. Exact values of Qext for the 5 test cases. 
Particle Qext
kD = 8 cube 4.490 
discretized kD = 10 sphere 3.916 
kD = 3 sphere 0.753 
kD = 10 sphere 3.928 
kD = 30 sphere 1.985 
0.1 1
 slope = 0.77
 y = kd·m
 maximum
 θ = 0°
 θ = 45°
 θ = 90°
 θ = 135°
 θ = 180°
 slope = 0.95
Fig. 3. Relative errors of S11 at different angles θ and maximum over all θ versus y for (a) the kD = 8 
cube, (b) the cubical discretization of kD = 10 sphere. A log-log scale is used. A linear fit of maximum 
over θ errors is shown. (m = 1.5). 
B.Results 
We study five test cases: one cube with 8=kD , three spheres with , and a 
particle obtained by a cubical discretization of the 
30,10,3=kD
10=kD  sphere using 16 dipoles per D 
(total 2176 dipoles, x equal to that of a sphere; see detailed description in Paper 2). By D we 
denote the diameter of a sphere or the edge size of a cube. All scatterers are homogenous with 
. Although DDA errors significantly depend on m (see e.g. 5.1=m 14), we limit ourselves to 
one single value and study effects of size and shape of the scatterer. 
The maximum number of dipoles per D (nD) was 256. The values of nD that we used are 
of the form  (p is an integer), except for the discretized sphere, where all np2}7,6,5,4{ ⋅ D are 
0.1 1
0.1 1
0.01 0.1
y = kd·m
 slope 2.29 (c)
 slope 0.91
 maximum
 θ = 0°
 θ = 45°
 θ = 90°
 θ = 135°
 θ = 180°
 slope 1.05
Fig. 4. Same as Fig. 3 but for (a) kD = 3, (b) kD = 10, and (c) kD = 30 spheres. 
multiples of 16 (this is required to exactly describe the shape of the particle composed from a 
number of cubes). The minimum values for nD were 8 for the 3=kD  sphere, 16 for the cube, 
the  sphere, and the discretized sphere, and 40 for the 10=kD 30=kD  sphere. 
All the computations use a direction of incidence parallel to one of the principal axes of 
the cubical dipoles. The scattering plane is parallel to one of the face of the cubical dipoles. In 
this paper we show results only for the extinction efficiency Qext (for incident light polarized 
parallel to one of the principal axes of the cubical dipoles) and phase function S11(θ ) as the 
0.01 0.1 1
 slope = 0.89
 cube kD=8
 discretized sphere kD=10
 sphere kD=3
 sphere kD=10
 sphere kD=30
y=kd·m  
Fig. 5. Relative errors of Qext versus y for all 5 test cases. A log-log scale is used. A linear fit through 5 
finest discretizations of kD = 3 sphere is shown. 
most commonly used in applications. However, the theory applies to any measured quantity. 
For instance, we have also confirmed it for other Mueller matrix elements (data not shown). 
Exact results of S11(θ ) for all 5 test cases are shown in Fig. 2. For spheres this is the 
result of Mie theory (the relative accuracy of the code we used24 is at least ) and for the 
cube and discretized sphere an extrapolation over the 5 finest discretizations (the 
extrapolation technique is presented in Paper 2, together with all details of obtaining these 
results, including their estimated errors). We use such ‘exact’ results because analytical theory 
is unavailable for these shapes and because errors of the best discretization are larger than that 
of the extrapolation. Their use as references for computing real errors (difference between the 
computed and the exact value) of single DDA calculations is justified because all these real 
errors are significantly larger than the errors of the references themselves (see Paper 2; in 
general, real errors obtained this way have an uncertainty of reference error). Exact values of 
610−<
ext for all test cases are presented in Table 1. 
In the following we show the results of DDA convergence. Fig. 3 and Fig. 4 present 
relative errors (absolute values) of S11 at different angles θ and maximum error over all θ 
versus y in log-log scale. In many cases the maximum errors are reached at exact 
backscattering direction, then these two sets of points overlap. Deep minima that happen at 
intermediate values of y for some values of θ (and also sometimes for Qext – Fig. 5) are due to 
the fact that the differences between simulated and reference values change sign near these 
values of y (see Paper 2 for detailed description of this behavior). The solid lines are linear fits 
to all or some points of maximum error. The slopes of these lines are depicted in the figures. 
Fig. 5 shows relative errors of Qext for all 5 studied cases in log-log scale. A linear fit through 
the 5 finest discretizations of the 3=kD  sphere is shown. More results of these numerical 
simulations are presented in Paper 2. 
4. Discussion 
Convergence of DDA for cubically shaped particles (Fig. 3) shows the following trends. All 
curves have linear and quadratic parts (the non-monotonic behavior of errors for some θ are 
also a manifestation of the fact that signed difference can be approximated by a sum of linear 
and quadratic terms that have different signs). The transition between these two regimes 
occurs at different y (which indicates the relative importance of linear and quadratic 
coefficients). While for maximum errors that are close to those of the backscattering direction 
the linear term is significant for larger y, it is much smaller and not significant in the whole 
range of y studied for side scattering ( °= 90θ ). R
for l
esults of DDA convergence for spheres (Fig. 
4) show a different behavior for different sizes. Errors for the small ( 3=kD ) sphere converge 
purely linear (except for small deviation of errors of )90(11 °S   values of y). Similar 
results are obtained for the 10=kD  sphere, but with significant oscillations superimposed 
upon the general trend. Convergence for the large ( 30
=kD ) sphere is quadratic or even faster 
in the range of y studied, also with significant oscillations. 
Comparing Fig. 3 and Fig. 4 (especially Fig. 3(b) and Fig. 4(b) showing results for 
almost the same particles) one can deduce the following differences in DDA convergence for 
cubically and non-cubically shaped scatterers. The linear term for cubically shaped scatterers 
is significantly smaller, resulting in smaller total errors, especially for small y. All these 
conclusions, together with the size dependence of the significance of the linear term in the 
total errors, are in perfect agreement with the theoretical predictions made in Sections 2.D and 
2.E. Errors for non-cubically shaped particles exhibit quasi-random oscillations that are not 
present for cubically shaped particles. This can be explained by the sharp variations of shape 
errors with changing y (discussed in details in Paper 2). Oscillations for the  sphere 
Fig. 4(a)) are very small (but still clearly present), which is due to the small size of the 
particle and hence featurelessness of its light scattering pattern – the surface structure is not 
that important and one may expect rather small shape errors. Results for Qext (Fig. 5) fully 
support the conclusions. Errors of Qext for the large sphere at small values of y are 
unexpectedly smaller than for smaller spheres. This feature requires further study before 
making any firm conclusions, however there is definitely no similar tendency for S11(θ ) (cf. 
Fig. 4). 
We have also studied a  porous cube that was obtained by dividing a cube into 
27 smaller cubes and then removing randomly 9 of them. All the conclusions are the same as 
those reported for the cube, but with slightly higher overall errors (data not shown). 
In this paper we have used a traditional DDA formulation2 for numerical simulations. 
However, as we showed in Section 2.F several modern improvements of DDA (namely IT 
and WD) should significantly change its convergence behavior. IT should completely 
eliminate the linear term for cubically shaped scatterers, which should improve the accuracy 
especially for small y. WD should significantly decrease shape and hence total errors for non-
cubically shaped particles, moreover it should significantly decrease the amplitude of quasi-
random error oscillations because it takes into account the location of the interface inside the 
boundary dipoles. Numerical testing of DDA convergence using IT and WD is a subject of a 
future study. 
5. Conclusion 
To the best of our knowledge, we conducted for the first time a rigorous theoretical 
convergence analysis of DDA. In the range of DDA applicability ( 2<kd ) errors are bounded 
by a sum of a linear and quadratic term in the discretization parameter y; the linear term is 
significantly smaller for cubically than for non-cubically shaped scatterers. Therefore for 
small y errors for cubically shaped particles are much smaller than for non-cubically shaped. 
The relative importance of the linear term decreases with increasing size, hence convergence 
of DDA for large enough scatterers is quadratic in the common range of y. All these 
conclusions were verified by extensive numerical simulations. 
Moreover, these simulations showed that errors are not only bounded by a quadratic 
function (as predicted in Section 2), but actually can be (with good accuracy) described by a 
quadratic function of y. This fact provides a basis for the extrapolation technique presented in 
Paper 2. 
Our theory predicts that modern DDA improvements (namely IT and WD) should 
significantly change the convergence of DDA, however numerical testing of these predictions 
is left for future research. 
Acknowledgements
We thank Gorden Videen and Michiel Min for valuable comments on earlier version of this 
manuscript and anonymous reviewer for helpful suggestions. Our research is supported by the 
NATO Science for Peace program through grant SfP 977976. 
References 
 1.  E. M. Purcell and C. R. Pennypacker, "Scattering and adsorption of light by nonspherical dielectric 
grains," Astrophys. J. 186, 705-714 (1973). 
 2.  B. T. Draine and P. J. Flatau, "Discrete-dipole approximation for scattering calculations," J. Opt. Soc. 
Am. A 11, 1491-1499 (1994). 
 3.  B. T. Draine, "The discrete dipole approximation for light scattering by irregular targets," in Light 
Scattering by Nonspherical Particles, Theory, Measurements, and Applications, M. I. Mishchenko, J. W. 
Hovenier, and L. D. Travis, eds. (Academic Press, New York, 2000), pp. 131-145. 
 4.  B. T. Draine and P. J. Flatau, "User guide for the discrete dipole approximation code DDSCAT 6.1," 
http://xxx.arxiv.org/abs/astro-ph/0409262 (2004). 
 5.  J. J. Goodman, B. T. Draine, and P. J. Flatau, "Application of fast-Fourier-transform techniques to the 
discrete-dipole approximation," Opt. Lett. 16, 1198-1200 (1991). 
 6.  G. C. Hsiao and R. E. Kleinman, "Mathematical foundations for error estimation in numerical solutions of 
integral equations in electromagnetics," IEEE Trans. Ant. Propag. 45, 316-328 (1997). 
 7.  K. F. Warnick and W. C. Chew, "Error analysis of the Moment Method," IEEE Ant. Prop. Mag. 46, 38-53 
(2004). 
 8.  B. T. Draine, "The discrete-dipole approximation and its application to interstellar graphite grains," 
Astrophys. J. 333, 848-872 (1988). 
 9.  J. I. Hage, J. M. Greenberg, and R. T. Wang, "Scattering from arbitrarily shaped particles - theory and 
experiment," Appl. Opt. 30, 1141-1152 (1991). 
 10.  F. Rouleau and P. G. Martin, "A new method to calculate the extinction properties of irregularly shaped 
particles," Astrophys. J. 414, 803-814 (1993). 
 11.  N. B. Piller, "Influence of the edge meshes on the accuracy of the coupled-dipole approximation," Opt. 
Lett. 22, 1674-1676 (1997). 
 12.  N. B. Piller and O. J. F. Martin, "Increasing the performance of the coupled-dipole approximation: A 
spectral approach," IEEE Trans. Ant. Propag. 46, 1126-1137 (1998). 
 13.  N. B. Piller, "Coupled-dipole approximation for high permittivity materials," Opt. Comm. 160, 10-14 
(1999). 
 14.  A. G. Hoekstra, J. Rahola, and P. M. A. Sloot, "Accuracy of internal fields in volume integral equation 
simulations of light scattering," Appl. Opt. 37, 8482-8497 (1998). 
 15.  S. D. Druger and B. V. Bronk, "Internal and scattered electric fields in the discrete dipole approximation," 
J. Opt. Soc. Am. B 16, 2239-2246 (1999). 
 16.  Y. L. Xu and B. A. S. Gustafson, "Comparison between multisphere light-scattering calculations: 
Rigorous solution and discrete-dipole approximation," Astrophys. J. 513, 894-909 (1999). 
 17.  M. J. Collinge and B. T. Draine, "Discrete-dipole approximation with polarizabilities that account for both 
finite wavelength and target geometry," J. Opt. Soc. Am. A 21, 2023-2028 (2004). 
 18.  M. A. Yurkin, V. P. Maltsev, and A. G. Hoekstra, "Convergence of the discrete dipole approximation. II. 
An extrapolation technique to increase the accuracy," J. Opt. Soc. Am. A 23, 2592-2601 (2006). 
 19.  A. Lakhtakia, "Strong and weak forms of the method of moments and the coupled dipole method for 
scattering of time-harmonic electromagnetic-fields," Int. J. Mod. Phys. C 3, 583-603 (1992). 
 20.  F. M. Kahnert, "Numerical methods in electromagnetic scattering theory," J. Quant. Spectrosc. Radiat. 
Transf. 79, 775-824 (2003). 
 21.  C. P. Davis and K. F. Warnick, "On the physical interpretation of the Sobolev norm in error estimation," 
Appl. Comp. ElectroMagn. Soc. J. 20, 144-150 (2005). 
http://xxx.arxiv.org/abs/astro-ph/0409262
 22.  A. D. Yanghjian, "Electric dyadic Green's function in the source region," IEEE Proc. 68, 248-263 (1980). 
 23.  G. H. Goedecke and S. G. O'Brien, "Scattering by irregular inhomogeneous particles via the digitized 
Green's function algorithm," Appl. Opt. 27, 2431-2438 (1988). 
 24.  C. F. Bohren and D. R. Huffman, Absorption and scattering of Light by Small Particles, (Wiley, New 
York, 1983). 
 25.  J. Rahola, "On the eigenvalues of the volume integral operator of electromagnetic scattering," SIAM J. 
Sci. Comp. 21, 1740-1754 (2000). 
 26.  A. Lakhtakia and G. W. Mulholland, "On 2 numerical techniques for light-scattering by dielectric 
agglomerated structures," J. Res. Nat. Inst. Stand. Technol. 98, 699-716 (1993). 
 27.  J. I. Hage and J. M. Greenberg, "A model for the optical-properties of porous grains," Astrophys. J. 361, 
251-259 (1990). 
 28.  C. E. Dungey and C. F. Bohren, "Light-scattering by nonspherical particles - a refinement to the coupled-
dipole method," J. Opt. Soc. Am. A 8, 81-87 (1991). 
 29.  H. Okamoto, "Light scattering by clusters: the a1-term method," Opt. Rev. 2, 407-412 (1995). 
 30.  B. T. Draine and J. J. Goodman, "Beyond clausius-mossotti - wave-propagation on a polarizable point 
lattice and the discrete dipole approximation," Astrophys. J. 405, 685-697 (1993). 
 31.  J. I. Peltoniemi, "Variational volume integral equation method for electromagnetic scattering by irregular 
grains," J. Quant. Spectrosc. Radiat. Transf. 55, 637-647 (1996). 
 32.  D. Gutkowicz-Krusin and B. T. Draine, "Propagation of electromagnetic waves on a rectangular lattice of 
polarizable points," http://xxx.arxiv.org/abs/astro-ph/0403082 (2004). 
 33.  A. Rahmani, P. C. Chaumet, and G. W. Bryant, "Coupled dipole method with an exact long-wavelength 
limit and improved accuracy at finite frequencies," Opt. Lett. 27, 2118-2120 (2002). 
 34.  A. Rahmani, P. C. Chaumet, and G. W. Bryant, "On the importance of local-field corrections for 
polarizable particles on a finite lattice: Application to the discrete dipole approximation," Astrophys. J. 
607, 873-878 (2004). 
 35.  P. C. Chaumet, A. Sentenac, and A. Rahmani, "Coupled dipole method for scatterers with large 
permittivity," Phys. Rev. E 70, 036606 (2004). 
 36.  K. F. Evans and G. L. Stephens, "Microwave radiative-transfer through clouds composed of realistically 
shaped ice crystals .1. Single scattering properties," J. Atmos. Sci. 52, 2041-2057 (1995). 
 37.  A. G. Hoekstra, M. D. Grimminck, and P. M. A. Sloot, "Large scale simulations of elastic light scattering 
by a fast discrete dipole approximation," Int. J. Mod. Phys. C 9, 87-102 (1998). 
 38.  M. A. Yurkin, K. A. Semyanov, P. A. Tarasov, A. V. Chernyshev, A. G. Hoekstra, and V. P. Maltsev, 
"Experimental and theoretical study of light scattering by individual mature red blood cells with scanning 
flow cytometry and discrete dipole approximation," Appl. Opt. 44, 5249-5256 (2005). 
 39.  "Description of the national compute cluster Lisa," http://www.sara.nl/userinfo/lisa/description/ (2005). 
http://xxx.arxiv.org/abs/astro-ph/0403082
http://xxx.arxiv.org/abs/astro-ph/0409262
  /ASCII85EncodePages false
  /AllowTransparency false
  /AutoPositionEPSFiles true
  /AutoRotatePages /None
  /Binding /Left
  /CalGrayProfile (Dot Gain 20%)
  /CalRGBProfile (sRGB IEC61966-2.1)
  /CalCMYKProfile (U.S. Web Coated \050SWOP\051 v2)
  /sRGBProfile (sRGB IEC61966-2.1)
  /CannotEmbedFontPolicy /Error
  /CompatibilityLevel 1.4
  /CompressObjects /Tags
  /CompressPages true
  /ConvertImagesToIndexed true
  /PassThroughJPEGImages true
  /CreateJDFFile false
  /CreateJobTicket false
  /DefaultRenderingIntent /Default
  /DetectBlends true
  /DetectCurves 0.0000
  /ColorConversionStrategy /CMYK
  /DoThumbnails false
  /EmbedAllFonts true
  /EmbedOpenType false
  /ParseICCProfilesInComments true
  /EmbedJobOptions true
  /DSCReportingLevel 0
  /EmitDSCWarnings false
  /EndPage -1
  /ImageMemory 1048576
  /LockDistillerParams false
  /MaxSubsetPct 100
  /Optimize true
  /OPM 1
  /ParseDSCComments true
  /ParseDSCCommentsForDocInfo true
  /PreserveCopyPage true
  /PreserveDICMYKValues true
  /PreserveEPSInfo true
  /PreserveFlatness true
  /PreserveHalftoneInfo false
  /PreserveOPIComments true
  /PreserveOverprintSettings true
  /StartPage 1
  /SubsetFonts true
  /TransferFunctionInfo /Apply
  /UCRandBGInfo /Preserve
  /UsePrologue false
  /ColorSettingsFile ()
  /AlwaysEmbed [ true
  /NeverEmbed [ true
  /AntiAliasColorImages false
  /CropColorImages true
  /ColorImageMinResolution 300
  /ColorImageMinResolutionPolicy /OK
  /DownsampleColorImages true
  /ColorImageDownsampleType /Bicubic
  /ColorImageResolution 300
  /ColorImageDepth -1
  /ColorImageMinDownsampleDepth 1
  /ColorImageDownsampleThreshold 1.50000
  /EncodeColorImages true
  /ColorImageFilter /DCTEncode
  /AutoFilterColorImages true
  /ColorImageAutoFilterStrategy /JPEG
  /ColorACSImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /ColorImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /JPEG2000ColorACSImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /JPEG2000ColorImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /AntiAliasGrayImages false
  /CropGrayImages true
  /GrayImageMinResolution 300
  /GrayImageMinResolutionPolicy /OK
  /DownsampleGrayImages true
  /GrayImageDownsampleType /Bicubic
  /GrayImageResolution 300
  /GrayImageDepth -1
  /GrayImageMinDownsampleDepth 2
  /GrayImageDownsampleThreshold 1.50000
  /EncodeGrayImages true
  /GrayImageFilter /DCTEncode
  /AutoFilterGrayImages true
  /GrayImageAutoFilterStrategy /JPEG
  /GrayACSImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /GrayImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /JPEG2000GrayACSImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /JPEG2000GrayImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /AntiAliasMonoImages false
  /CropMonoImages true
  /MonoImageMinResolution 1200
  /MonoImageMinResolutionPolicy /OK
  /DownsampleMonoImages true
  /MonoImageDownsampleType /Bicubic
  /MonoImageResolution 1200
  /MonoImageDepth -1
  /MonoImageDownsampleThreshold 1.50000
  /EncodeMonoImages true
  /MonoImageFilter /CCITTFaxEncode
  /MonoImageDict <<
    /K -1
  /AllowPSXObjects false
  /CheckCompliance [
    /None
  /PDFX1aCheck false
  /PDFX3Check false
  /PDFXCompliantPDFOnly false
  /PDFXNoTrimBoxError true
  /PDFXTrimBoxToMediaBoxOffset [
    0.00000
    0.00000
    0.00000
    0.00000
  /PDFXSetBleedBoxToMediaBox true
  /PDFXBleedBoxToTrimBoxOffset [
    0.00000
    0.00000
    0.00000
    0.00000
  /PDFXOutputIntentProfile ()
  /PDFXOutputConditionIdentifier ()
  /PDFXOutputCondition ()
  /PDFXRegistryName ()
  /PDFXTrapped /False
  /Description <<
    /CHS <FEFF4f7f75288fd94e9b8bbe5b9a521b5efa7684002000410064006f006200650020005000440046002065876863900275284e8e9ad88d2891cf76845370524d53705237300260a853ef4ee54f7f75280020004100630072006f0062006100740020548c002000410064006f00620065002000520065006100640065007200200035002e003000204ee553ca66f49ad87248672c676562535f00521b5efa768400200050004400460020658768633002>
    /CHT <FEFF4f7f752890194e9b8a2d7f6e5efa7acb7684002000410064006f006200650020005000440046002065874ef69069752865bc9ad854c18cea76845370524d5370523786557406300260a853ef4ee54f7f75280020004100630072006f0062006100740020548c002000410064006f00620065002000520065006100640065007200200035002e003000204ee553ca66f49ad87248672c4f86958b555f5df25efa7acb76840020005000440046002065874ef63002>
    /DAN <FEFF004200720075006700200069006e0064007300740069006c006c0069006e006700650072006e0065002000740069006c0020006100740020006f007000720065007400740065002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e007400650072002c0020006400650072002000620065006400730074002000650067006e006500720020007300690067002000740069006c002000700072006500700072006500730073002d007500640073006b007200690076006e0069006e00670020006100660020006800f8006a0020006b00760061006c0069007400650074002e0020004400650020006f007000720065007400740065006400650020005000440046002d0064006f006b0075006d0065006e0074006500720020006b0061006e002000e50062006e00650073002000690020004100630072006f00620061007400200065006c006c006500720020004100630072006f006200610074002000520065006100640065007200200035002e00300020006f00670020006e0079006500720065002e>
    /DEU <FEFF00560065007200770065006e00640065006e0020005300690065002000640069006500730065002000450069006e007300740065006c006c0075006e00670065006e0020007a0075006d002000450072007300740065006c006c0065006e00200076006f006e002000410064006f006200650020005000440046002d0044006f006b0075006d0065006e00740065006e002c00200076006f006e002000640065006e0065006e002000530069006500200068006f006300680077006500720074006900670065002000500072006500700072006500730073002d0044007200750063006b0065002000650072007a0065007500670065006e0020006d00f60063006800740065006e002e002000450072007300740065006c006c007400650020005000440046002d0044006f006b0075006d0065006e007400650020006b00f6006e006e0065006e0020006d006900740020004100630072006f00620061007400200075006e0064002000410064006f00620065002000520065006100640065007200200035002e00300020006f0064006500720020006800f600680065007200200067006500f600660066006e00650074002000770065007200640065006e002e>
    /ESP <FEFF005500740069006c0069006300650020006500730074006100200063006f006e0066006900670075007200610063006900f3006e0020007000610072006100200063007200650061007200200064006f00630075006d0065006e0074006f00730020005000440046002000640065002000410064006f0062006500200061006400650063007500610064006f00730020007000610072006100200069006d0070007200650073006900f3006e0020007000720065002d0065006400690074006f007200690061006c00200064006500200061006c00740061002000630061006c0069006400610064002e002000530065002000700075006500640065006e00200061006200720069007200200064006f00630075006d0065006e0074006f00730020005000440046002000630072006500610064006f007300200063006f006e0020004100630072006f006200610074002c002000410064006f00620065002000520065006100640065007200200035002e003000200079002000760065007200730069006f006e0065007300200070006f00730074006500720069006f007200650073002e>
    /FRA <FEFF005500740069006c006900730065007a00200063006500730020006f007000740069006f006e00730020006100660069006e00200064006500200063007200e900650072002000640065007300200064006f00630075006d0065006e00740073002000410064006f00620065002000500044004600200070006f0075007200200075006e00650020007100750061006c0069007400e90020006400270069006d007000720065007300730069006f006e00200070007200e9007000720065007300730065002e0020004c0065007300200064006f00630075006d0065006e00740073002000500044004600200063007200e900e90073002000700065007500760065006e0074002000ea0074007200650020006f007500760065007200740073002000640061006e00730020004100630072006f006200610074002c002000610069006e00730069002000710075002700410064006f00620065002000520065006100640065007200200035002e0030002000650074002000760065007200730069006f006e007300200075006c007400e90072006900650075007200650073002e>
    /ITA <FEFF005500740069006c0069007a007a006100720065002000710075006500730074006500200069006d0070006f007300740061007a0069006f006e00690020007000650072002000630072006500610072006500200064006f00630075006d0065006e00740069002000410064006f00620065002000500044004600200070006900f900200061006400610074007400690020006100200075006e00610020007000720065007300740061006d0070006100200064006900200061006c007400610020007100750061006c0069007400e0002e0020004900200064006f00630075006d0065006e007400690020005000440046002000630072006500610074006900200070006f00730073006f006e006f0020006500730073006500720065002000610070006500720074006900200063006f006e0020004100630072006f00620061007400200065002000410064006f00620065002000520065006100640065007200200035002e003000200065002000760065007200730069006f006e006900200073007500630063006500730073006900760065002e>
    /JPN <FEFF9ad854c18cea306a30d730ea30d730ec30b951fa529b7528002000410064006f0062006500200050004400460020658766f8306e4f5c6210306b4f7f75283057307e305930023053306e8a2d5b9a30674f5c62103055308c305f0020005000440046002030d530a130a430eb306f3001004100630072006f0062006100740020304a30883073002000410064006f00620065002000520065006100640065007200200035002e003000204ee5964d3067958b304f30533068304c3067304d307e305930023053306e8a2d5b9a306b306f30d530a930f330c8306e57cb30818fbc307f304c5fc59808306730593002>
    /KOR <FEFFc7740020c124c815c7440020c0acc6a9d558c5ec0020ace0d488c9c80020c2dcd5d80020c778c1c4c5d00020ac00c7a50020c801d569d55c002000410064006f0062006500200050004400460020bb38c11cb97c0020c791c131d569b2c8b2e4002e0020c774b807ac8c0020c791c131b41c00200050004400460020bb38c11cb2940020004100630072006f0062006100740020bc0f002000410064006f00620065002000520065006100640065007200200035002e00300020c774c0c1c5d0c11c0020c5f40020c2180020c788c2b5b2c8b2e4002e>
    /NLD (Gebruik deze instellingen om Adobe PDF-documenten te maken die zijn geoptimaliseerd voor prepress-afdrukken van hoge kwaliteit. De gemaakte PDF-documenten kunnen worden geopend met Acrobat en Adobe Reader 5.0 en hoger.)
    /NOR <FEFF004200720075006b00200064006900730073006500200069006e006e007300740069006c006c0069006e00670065006e0065002000740069006c002000e50020006f0070007000720065007400740065002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e00740065007200200073006f006d00200065007200200062006500730074002000650067006e0065007400200066006f00720020006600f80072007400720079006b006b0073007500740073006b00720069006600740020006100760020006800f800790020006b00760061006c0069007400650074002e0020005000440046002d0064006f006b0075006d0065006e00740065006e00650020006b0061006e002000e50070006e00650073002000690020004100630072006f00620061007400200065006c006c00650072002000410064006f00620065002000520065006100640065007200200035002e003000200065006c006c00650072002000730065006e006500720065002e>
    /PTB <FEFF005500740069006c0069007a006500200065007300730061007300200063006f006e00660069006700750072006100e700f50065007300200064006500200066006f0072006d00610020006100200063007200690061007200200064006f00630075006d0065006e0074006f0073002000410064006f0062006500200050004400460020006d00610069007300200061006400650071007500610064006f00730020007000610072006100200070007200e9002d0069006d0070007200650073007300f50065007300200064006500200061006c007400610020007100750061006c00690064006100640065002e0020004f007300200064006f00630075006d0065006e0074006f00730020005000440046002000630072006900610064006f007300200070006f00640065006d0020007300650072002000610062006500720074006f007300200063006f006d0020006f0020004100630072006f006200610074002000650020006f002000410064006f00620065002000520065006100640065007200200035002e0030002000650020007600650072007300f50065007300200070006f00730074006500720069006f007200650073002e>
    /SUO <FEFF004b00e40079007400e40020006e00e40069007400e4002000610073006500740075006b007300690061002c0020006b0075006e0020006c0075006f00740020006c00e400680069006e006e00e4002000760061006100740069007600610061006e0020007000610069006e006100740075006b00730065006e002000760061006c006d0069007300740065006c00750074007900f6006800f6006e00200073006f00700069007600690061002000410064006f0062006500200050004400460020002d0064006f006b0075006d0065006e007400740065006a0061002e0020004c0075006f0064007500740020005000440046002d0064006f006b0075006d0065006e00740069007400200076006f0069006400610061006e0020006100760061007400610020004100630072006f0062006100740069006c006c00610020006a0061002000410064006f00620065002000520065006100640065007200200035002e0030003a006c006c00610020006a006100200075007500640065006d006d0069006c006c0061002e>
    /SVE <FEFF0041006e007600e4006e00640020006400650020006800e4007200200069006e0073007400e4006c006c006e0069006e006700610072006e00610020006f006d002000640075002000760069006c006c00200073006b006100700061002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e007400200073006f006d002000e400720020006c00e4006d0070006c0069006700610020006600f60072002000700072006500700072006500730073002d007500740073006b00720069006600740020006d006500640020006800f600670020006b00760061006c0069007400650074002e002000200053006b006100700061006400650020005000440046002d0064006f006b0075006d0065006e00740020006b0061006e002000f600700070006e00610073002000690020004100630072006f0062006100740020006f00630068002000410064006f00620065002000520065006100640065007200200035002e00300020006f00630068002000730065006e006100720065002e>
    /ENU (Use these settings to create Adobe PDF documents best suited for high-quality prepress printing.  Created PDF documents can be opened with Acrobat and Adobe Reader 5.0 and later.)
  /Namespace [
    (Adobe)
    (Common)
    (1.0)
  /OtherNamespaces [
    <<
      /AsReaderSpreads false
      /CropImagesToFrames true
      /ErrorControl /WarnAndContinue
      /FlattenerIgnoreSpreadOverrides false
      /IncludeGuidesGrids false
      /IncludeNonPrinting false
      /IncludeSlug false
      /Namespace [
        (Adobe)
        (InDesign)
        (4.0)
      ]
      /OmitPlacedBitmaps false
      /OmitPlacedEPS false
      /OmitPlacedPDF false
      /SimulateOverprint /Legacy
    >>
    <<
      /AddBleedMarks false
      /AddColorBars false
      /AddCropMarks false
      /AddPageInfo false
      /AddRegMarks false
      /ConvertColors /ConvertToCMYK
      /DestinationProfileName ()
      /DestinationProfileSelector /DocumentCMYK
      /Downsample16BitImages true
      /FlattenerPreset <<
        /PresetSelector /MediumResolution
      >>
      /FormElements false
      /GenerateStructure false
      /IncludeBookmarks false
      /IncludeHyperlinks false
      /IncludeInteractive false
      /IncludeLayers false
      /IncludeProfiles false
      /MultimediaHandling /UseObjectSettings
      /Namespace [
        (Adobe)
        (CreativeSuite)
        (2.0)
      ]
      /PDFXOutputIntentProfileSelector /DocumentCMYK
      /PreserveEditing true
      /UntaggedCMYKHandling /LeaveUntagged
      /UntaggedRGBHandling /UseDocumentProfile
      /UseDocumentBleed false
    >>
>> setdistillerparams
  /HWResolution [2400 2400]
  /PageSize [612.000 792.000]
>> setpagedevice
ABSTRACT
  We performed a rigorous theoretical convergence analysis of the discrete
dipole approximation (DDA). We prove that errors in any measured quantity are
bounded by a sum of a linear and quadratic term in the size of a dipole d, when
the latter is in the range of DDA applicability. Moreover, the linear term is
significantly smaller for cubically than for non-cubically shaped scatterers.
Therefore, for small d errors for cubically shaped particles are much smaller
than for non-cubically shaped. The relative importance of the linear term
decreases with increasing size, hence convergence of DDA for large enough
scatterers is quadratic in the common range of d. Extensive numerical
simulations were carried out for a wide range of d. Finally we discuss a number
of new developments in DDA and their consequences for convergence.

<|endoftext|><|startoftext|>
Microsoft Word - NatureCorrespondJasmitedit.rtf
Sent to Nature in 1990 
Abstract 
This is a supplement to the paper arXiv:q-bio/0701050, containing the text of correspondence  
sent to Nature in 1990.  
Origin of adaptive mutants: a quantum measurement? 
 Sir, - Several recent works described non-random induction of adaptive mutations by 
environmental stimuli 1-3. The most obvious explanation of this striking phenomenon would be 
that activation of gene expression leads to the enhancement of its mutation rate4. However, this 
does not work with the lacZ mutations described by Cairns and co-workers as the true inducer of 
the lac-operon is not lactose as such, but allolactose, a by-product of the β-galactosidase 
reaction5. So, in lacZ mutants the operon is not induced by lactose6. Besides, induction of 
respective genes would not explain the high fraction, among the revertants, of suppressor 
mutations in tRNA genes1,7 
  Other explanations suggest some special mechanisms for the "acceleration of adaptive 
evolution", like selection of "useful" protein coupled to specific reverse transcription1. However, 
any mechanism of this type also should have emerged in evolution. I propose that, to explain the 
adaptive mutation phenomenon, there is no need for any new ad hoc mechanism. The only thing 
that is necessary is to return to the old discussion of the role of quantum concepts in our 
understanding of life. This alone will allow the explanation of this manifestly Lamarckian 
phenomenon by Darwinian selection, occurring not in a population of organisms as usual, but in a 
"population" of virtual, in the direct quantum theory sense, states of each distinct cell. Thus, this 
hypothesis may be called "selection of virtual mutations". Detailed substantiation of this concept 
will be presented in a special publication; below I briefly show how this explanation might work.  
  It has been shown by the Cairns group that the mutations ensuring cell growth begin to 
accumulate not immediately after plating, but only after conditions are created under which such 
mutations become "useful", as if the mutations are induced by these conditions1. I suggest that, to 
explain this phenomenon, we should change our ideas about what a cell is, and consider not 
actual but virtual mutations. An important distinction of virtual mutations is that they do not 
accumulate with time in stationary cell, whereas the number of actual mutants would grow 
linearly from the moment of plating, and this would yield drastically different results in 
experiments like those shown in Fig. 3 of Ref. 1. Virtual mutations produce "delocalization" of 
the cell among different states, similarly to the delocalization of electron in physical space. 
However, for a virtual mutation to become an actual one, certain conditions are necessary, 
namely the possibility to grow, leading the system away irreversibly from the initial state. Such 
conditions arise when, for example, lactose is added to a plate with lacZ bacteria. Briefly, this is 
the essence of the proposed explanation.  
  What is a virtual mutation? The main cause of usual spontaneous mutations is the well-
known base tautomerization8 (having the in vitro frequency of about 10-4). Thus could we reduce 
‘virtual mutation’ to such tautomerization? I believe that this view is not consistent with 
experiments, as it implies that the same rare tautomeric form should work both in transcription 
and in replication. If these processes are considered independent, we logically arrive to the leaky 
mutant, which was refuted by Cairns and coworkers1. Thus we need to postulate a correlation 
between the recognition of the tautomeric forms in transcription and in replication, making us to 
define "virtual mutation" as a certain state of the cell as a whole. Analogous reasoning is 
applicable to the "adaptive transpositions" discussed by Cairns. In other words, we consider the 
whole cell as a quantum system, with non-negligible nonlocality inherent in such systems. Most 
of all it resembles the systems of "generalized rigidity"9, such as superfluid or superconducing 
states of matter, whose behavior is linked to quantum correlations; and I believe, similar 
correlations take place in the cell too.  
  I would like to emphasize that the proposed approach does not require detalization of 
molecular processes in the cell. Its main focus is the behavior of the cell as a whole. Similarly, to 
explain gyroscopic precession there is no need to consider interactions between elements inside 
the gyroscope; it’s enough to know some motion invariants, defined by space-time symmetries.  
  Starting from this general view, one may express the above ideas using the operator 
formalism, and considering experiments conducted by Cairns as measurement of the cells’ 
capability to propagate under given conditions. I suggest that the trait "ability to reproduce on 
lactose" (as an example) can be represented by an operator which one may designate "Lac". 
Importantly, this new operator will act on the state Ψ of the whole cell because the ability to 
reproduce is a property of the cell as a whole, and not of any part of it. Generally, "Lac " will 
decompose this Ψ into a superposition of some eigenfunctions. The components of this 
superposition are those functions that do not change upon the action of this operator, but are only 
multiplied by a constant. It reflects the essence of operator formalism in quantum theory, which 
chooses states compatible with given experimental conditions. There are three such 
eigenfunctions (I intentionally simplify the situation): ψ1 corresponds to cell death, ψ2 to the 
stationary state, and ψ3 to the self-reproduction (that is the virtual mutation, in our case). Each 
function will enter the decomposition of Ψ with a coefficient ci related to the probability of this 
or that outcome, i.e.:  
  Ψ = c1 ψ1 + c2 ψ2 + c3 ψ3, where Σ| ci |2 = 1 
  By plating the cells on lactose agar we, in fact, begin to measure their ability to grow 
under these particular conditions. The rate of accumulation of lac revertants, i.e. the probability to 
obtain a cell in the mutant state, will correspond to |c3|2, being a small, but finite quantity, 
appearing, for example, due to base tautomerization. Here, the role of cell growth is dual: on the 
one hand, it is a factor of irreversibility amplifying the "quantum fluctuation", and on the other 
hand, it is a selection criterion, as each kind of virtual mutants capable of growth under these 
conditions can lead to colony formation. Another situation, i.e. glucose/valine agar, will be 
represented by another operator (Valr), which will decompose the same Ψ function according to 
another basis, and Valr mutants will be obtained with certain rate. In fact, this is the essence of 
adaptive mutation phenomenon, where a particular condition induces emergence of respective 
mutants. 
  Thus, the proposed change of our view on the cell suggests that, in accord with quantum 
concepts, we are not dealing with the probability for a cell to mutate by itself, independent of 
experimental conditions. Rather, we are dealing with the probability to observe the cell in the 
mutant state by plating it on lactose. We are certainly simplifying situation, as spontaneous 
mutations that accumulate during cell growth before plating, make our ensemble ‘mixed’. 
However, this complication does not change the essence of the explanation, according to which 
adaptive mutations emerge by measurement of ‘pure’ state. This resembles the passage of a 
polarized photon through a polarizer turned under some angle to the photon polarization. It will 
be incorrect to say that the polarization of the photon could turn by the necessary angle by 
chance, prior to interaction. It is the specific experimental situation that makes us to decompose 
the state vector according to the respective basis states, and to evaluate the fraction of the 
component that will pass through polarizer. On the other hand, one may speak about "adaptation" 
of photon polarization by selection of "fit" eigenstate, and consider this case as the model for our 
phenomenon.  
  How are all these ideas applicable to the living bacterial cell? Discussion of the possible 
role of quantum concepts in biology has a rather long history, initiated by Niels Bohr (‘the 
complementarity principle’). Briefly, one might reduce the essence of this discussion to the 
principal impossibility to predict precisely the fate of an individual cell. For example, any attempt 
to determine, whether it is able to reproduce under certain conditions, will lead to irreversible 
change of the state of the cell, even to its death. This is reminiscent of the two-slit diffraction 
experiment, where an attempt to determine through which of the two slits the electron actually 
passes will lead to disappearance of the interference. The two trajectories of the electron can be 
made physically discernable only by the cost of changing the experimental situation. Similarly, 
the notorious phenomenon of the "wholeness" of the living organism can be formally expressed 
according to the Feynman rules of calculating probabilities: different indiscernible (in the given 
experimental conditions) variants should be included in the pure state (i.e. their amplitudes, and 
not probabilities, should be summed, leading to interference and other quantum effects). Thus, as 
long as a whole cell exists and is alive, we are obligated to treat its different indiscernible states 
in this way. Such consideration of operational limitations allows us to explain the adaptive 
mutation phenomenon (and hopefully other adaptations too) as the consequence of unavoidable 
quantum scatter in measurement of the cell's capability to propagate under given conditions.  
  In spite of its apparent formal character, this hypothesis allows us to make some 
predictions of applied (in particular, medical) interest. It predicts that in processes involving 
somatic mutations (e.g. oncogenesis, or specific antibody generation) the mutations may be 
induced by conditions allowing the cell that happened to be in the state of virtual mutation to 
proliferate irreversibly. I believe, this possibility can be tested experimentally.  
References 
1. Cairns,J., Overbaugh,J., Miller,S. Nature 335, 142-145 (1988) 
 2. Shapiro,J.A. Molec. Gen. Genet. 194, 79-90 (1984) 
 3. Hall,B.J. Genetics 120, 887-897 (1988) 
 4. Devis,B.D. Proc.Natl.Acad.Sci.USA 86, 5005-5009 (1989) 
 5. Lewin,B., Genes, p.236 ( J.Wiley & Sons,1985) 
 6. Burstein,C., Cohn,M., Kepes,A., Monod,J. Bioch.Bioph.Acta 95,   634 (1965) 
 7. Savic,D.J.& Kanazir,D.T. Molec. Gen. Genet. 137, 143-150 (1975)  
 8. Topal,M., Fresco,J. Nature 263, 285-289 (1976) 
 9. Anderson,P.W.,Stein,D.L. in Self-Organizing Systems, ed. by   F.E.Yates, pp.451-452 
(Plenum Press, 1987) 
Comments:  
This text was written in 1990. The author translated it to English with the kind help of Dr. 
Eugene Koonin (current affiliation: National Center for Biotechnology Information, National 
Library of Medicine, National Institutes of Health, Bethesda MD, USA.) 
The English version of the text was sent to Nature in 1990 and rejected. At the same time it was 
also sent to the following correspondents : 
1. JOHN CAIRNS 
Department of Cancer Biology, Harvard School of Public Health, Boston, Massachusetts 02115. 
2. BARRY HALL 
Department of Molecular and Cell Biology, University of Connecticut, Storrs 06269. 
3. BERNARD DAVIS 
Bacterial Physiology Unit, Harvard Medical School, Boston, MA 02115. 
4. KOICHIRO MATSUNO 
Department of BioEngineering, Nagaoka University of Technology, Japan. 
5. KONSTANTIN CHUMAKOV 
Center for Biologics Evaluation and Research, Food and Drug Administration, Rockville, 
Maryland 20852, USA. 
6. MIKHAIL V. IVANOV  
Institute of Microbiology, Russian Academy of Sciences, pr. 60-letiya Oktyabrya 7, k. 2, 
Moscow, 117811 Russia. 
, as well as to all participants of the discussion ‘Origin of mutants disputed’ (Nature 336, 525 - 
526 (08 December 1988)) : 
1. D. CHARLESWORTH, B. CHARLESWORTH & J. J. BULL 
Department of Ecology and Evolution, University of Chicago, 915 East 57th Street, Chicago, 
Illinois 60637, USA 
Department of Zoology, University of Texas, Austin, Texas 78712, USA 
2. ALAN GRAFEN 
Animal Behaviour Research Group, Zoology Department, Oxford University, Oxford OX1 3PS, 
3. R. HOLLIDAY & R. F. ROSENBERGER 
CSIRO Laboratory for Molecular Biology, North Ryde, Sydney, Australia 
Genetics Division, National Institute for Medical Research, Mill Hill, London NW7 1AA, UK 
4. LEIGH M. VAN VALEN 
Department of Ecology and Evolution, University of Chicago, 915 East 57Street, Chicago, 
Illinois 60637, USA 
5. ANTOINE DANCHIN 
Institut Pasteur, 28 Rue Dr. Roux, 75724 Paris, Cedex 15, France 
6. IRWIN TESSMAN 
Departments of Biiological Sciences, Purdue University, West Lafayette, 
Indiana 47907, USA
ABSTRACT
  This is a supplement to the paper arXiv:q-bio/0701050, containing the text of
correspondence sent to Nature in 1990.

<|endoftext|><|startoftext|>
Introduction 
The discrete dipole approximation (DDA) is a well-known method to solve the light 
scattering problem for arbitrary shaped particles. Since its introduction by Purcell and 
Pennypacker1 it has been improved constantly. The formulation of DDA summarized by 
Draine and Flatau2 more than 10 years ago is still most widely used for different applications,3 
partly due to the publicly available high-quality and user-friendly code DDSCAT.4 
DDA directly discretizes the volume of the scatterer and hence is applicable to arbitrary 
shaped particles. However, the drawback of this discretization is the extreme computational 
complexity of DDA of O(N 2), where N is the number of dipoles. This complexity is 
decreased to O(NlogN) by advanced numerical techniques.2,5 Still the usual application 
strategy for DDA is “single computation”, where a discretization is chosen based on available 
computational resources and some empirical estimates of the expected errors.3,4 These error 
estimates are based on a limited number of benchmark calculations3 and hence are external to 
the light scattering problem under investigation. Such error estimates have evident drawbacks, 
however no better alternative is available. 
Usually errors in DDA are studied as a function of the size parameter of the scatterer x 
(at a constant or few different values of N), e.g. 2,6. Only several papers directly present errors 
versus discretization parameter (e.g. d – the size of a single dipole).7-15 The range of d 
typically studied in those papers is limited to a 5 times difference between minimum and 
maximum values (with the exception of two papers9,10 where it is 15 times). Only two 
papers7,15 use extrapolation (to zero d) to get an exact result of some measured quantity, 
however they use the simplest linear extrapolation without any theoretical foundation nor 
discussion of its capabilities. 
It is acknowledged for a long time that DDA errors are due to two different factors: 
shape (it is not always possible to describe the particle shape exactly by a collection of 
cubical cells) and discretization (finite size of each cell).6 However, the question which of 
them is more important in different cases is still open. A discussion on this issue spanned 
through several papers16-20 that have not reached any definite conclusions yet. The uncertainty 
is due to the indirect methods used that have inherent interpretation problems. 
In accompanying paper,21 that from now on we will refer to as Paper 1, we performed a 
theoretical analysis of DDA convergence when refining the discretization. It provides the 
basis for this paper, where an extrapolation technique is introduced (Section 2) to improve the 
accuracy of DDA computations. We thoroughly discuss all free parameters that influence 
extrapolation performance and provide a step-by-step prescription, which can be used with 
any existing DDA code without any modifications. It is important to note that although 
Paper 1 provides a firm theoretical background, it is not necessary to go through all 
theoretical details to understand and apply the extrapolation technique that we introduce here. 
In Section 3 we present extensive numerical results of DDA computations for 5 different 
scatterers using many different discretizations. These results are discussed in Section 4 to 
evaluate the performance of the extrapolation technique. We also propose a new method to 
directly separate shape and discretization errors of DDA (described and illustrated in 
Section 3.B). The results and possible applications are discussed in Section 4. We formulate 
the conclusions of the paper in Section 5. 
2. Extrapolation 
In this section we describe a straightforward technique to significantly increase the accuracy 
of a DDA simulation with a relatively small increase of computation time. This technique 
does not require any modification of a DDA program but only postprocessing of computed 
data. Therefore it can be easily implemented in any existing DDA code. 
In Paper 1 we have proven that the error of any measured quantity is bounded by a 
quadratic function of the discretization parameter mkdy =  (k – free space wave vector, m – 
refractive index of the scatterer): 
( ) ( )yybayybay lnlnδ 11222 φφφφφ −+−≤ , (1)
where φ y is some measured quantity (e.g. extinction efficiency Qext, Mueller matrix elements 
at some scattering angle Sij(θ ), etc.) and δφ y its error (difference between a result of the 
numerical simulation and an exact value). ,  are constants (independent on y), which 
are described in detail in Paper 1. 
Here we proceed and assume that for sufficiently small y, δφ y can in fact be 
approximated by a quadratic function of y (taking the logarithmic term as a constant). The 
applicability of this assumption will be tested empirically in Section 3.B. Introduction of 
higher-order terms is possible but not necessary (contrary to the quadratic term), and we avoid 
it in order to keep our technique as simple and robust as possible. We can now write: 
yy yayaa ζφ +++= 2210 , (2)
where a0, a1, a2 are constants that are chosen such that ζ y – the error of the approximation – is 
minimized. a0 is then an estimate for the exact value of the measured quantity φ 0. A 
procedure to determine a0 is basically fitting of a quadratic function over several points 
, which are obtained by a standard DDA simulation. In the ideal case of  one 
can use any three values of y to obtain the exact value of φ
},{ yy φ 0=yζ
 0. However, in practice different 
fits will always give different results. We limit ourselves to the usual least-square polynomial 
fit of the data. There are three question one should answer before conducting such a fit: 
1) how many and which values of y to use? 
2) how to weight the influence of different calculated values used in the fitting, i.e. what is 
the behavior of expected errors ζ y? (Note that in the polynomial fit we minimize χ2, the 
summation of the squared difference between computed values and the fitting function 
weighted by the inverse of the expected error ζ y.) 
3) how to estimate the difference between a0 and φ 0, i.e. the error of the final result? 
It is important to note that, although there are some theoretical hints, answers to these 
questions are mainly empirical and should be tested. Our approach is based on the test cases 
presented in Section 3.B. These may not be representative for all scattering problems, but they 
do show the potential power of our approach. We do not attempt to choose the most suitable 
fit options, but merely demonstrate the applicability of the technique. 
We start by analyzing the second question, i.e. what is the expected deviation from the 
quadratic model, i.e. what is the functional dependence of ζ y on y, to be used as weighting 
function in the polynomial fitting procedure? For cubically shaped particles, defined in 
Paper 1 as particles whose shape can be exactly discretized using cubical subvolumes, one 
expects a smooth variation of the function φ y, and the error can be attributed as a model error, 
i.e. coming mainly from neglecting higher order terms in the convergence analysis of Paper 1. 
In that case the error ζ y is expected to be a cubical function of y. We have tried cubical, 
quadratic and linear error functions when fitting results for cubically shaped particles and 
found that, although the differences are small, cubical errors generally lead to the best fits 
(data not shown). 
Shape errors, which are present for non-cubically shaped particles, are expected to be 
very sensitive to y, because they depend upon the position of the particle surface inside the 
boundary dipole that changes considerably by a small variation of y (for details see Paper 1). 
Therefore shape errors can be viewed as random noise superimposed upon a smooth variation 
of φ y. The asymptotic behavior of shape errors is linear in y (see Paper 1). Indeed, in certain 
cases we found that using linear errors ζ y results in significantly better fits than when using 
cubical errors. However in other cases linear errors performed significantly worse. In our 
experience, using a cubical error function is in general always more reliable, even in the 
presence of shape errors, because it decreases the influence of points with high values of y, 
where the error is larger and less predictable. Since we want the procedure to be as robust as 
possible and not to use more complex error functions than strictly needed (e.g. polynomial), 
we take a cubical dependence of the error ζ y, both for cubically- and non-cubically shaped 
scatterers. 
The choice of values of y for computation can be described by the interval [ymin,ymax], 
the number of points and their spacing. ymin is usually determined by available computer 
hardware (time or memory bounds), that is the best discretization that can be computed for a 
given resource. The goal of the extrapolation procedure is to increase the accuracy beyond 
this “single DDA boundary”. We will show in Section 3.B that the overall performance of this 
technique strongly depends upon ymin. 
The choice of ymax is governed by two notions: a larger interval of data points generally 
leads to better extrapolation but errors for high values of y are more random and their 
significance is anyway much smaller (since we use a cubical error function). We have found 
that for cubically shaped scatterers a good choice is minmax 2yy = , while for non-cubically 
shaped scatterers increasing the interval to minmax 4yy =  does improve the fits. Probably that 
is due to the fact that the quality of fit for non-cubically shaped scatterers is determined by 
quasi-random shape errors and increasing the range leads to larger statistical significance of 
the result. We will also demand that ymax is less than 1, since otherwise DDA is definitely far 
from its asymptotic behavior. 
Spacing of the sample points depends partly on the problem, especially for cubically 
shaped scatterers (in that case an arbitrary number of dipoles cannot be used). We space 
computational points approximately uniform on a logarithmic scale, acknowledging the fact 
that a relative difference in y is more significant than an absolute. The total number of points 
should be large enough for statistical significance. However, a large number of points 
increases computational time. We have used 5 points for cubically shaped particles (ratio of 
y1  values is 8:7:6:5:4) and 9 points for non-cubically shaped particles (ratio of y1  values is 
16:14:12:10:8:7:6:5:4) or less if minmax 4yy < . 
The estimation of the error of the final result is difficult since this error is due to model 
imperfection and not to some kind of random noise. The standard least-square fitting 
technique22 provides a standard error (SE) for the parameter a0, which we use as a starting 
point. Numerical simulations (Section 3.B) show that for spheres (the only non-cubical shape 
we studied) real errors are less than 2×SE in most cases. That is what one would expect if ζ y 
is considered completely random (which is similar to the expected behavior of the shape 
errors). For cubical shapes, on the contrary, we have to estimate the error as 10×SE to reliably 
describe the real errors. It is important to note that an error estimate based on the SE is the 
simplest one can use. Its drawback is that we have to use a large multiplier (based on the real 
errors obtained in some of our simulations), which may lead to significant overestimation of 
real errors in certain cases. 
We can now formulate the step-by-step extrapolation technique. We use abbreviations 
(c) and (nc) for cubically and non-cubically shaped scatterers respectively. 
1) Select ymin based on your computational resources. 
2) Take ymax to be 2 (c) or 4 (nc) times ymin but not larger than 1. 
3) Choose 5 (c) or 9 (nc) points over the interval [ymin,ymax] approximately uniformly 
spaced on a logarithmic scale. 
4) Perform DDA computations for each y. 
5) Fit the quadratic function (Eq. (2)) over the points  using y},{ yy φ 3 as errors of data 
points; a0 is then the estimate of φ 0. 
Multiply SE of a0 by 10 (c) or 2 (nc) to obtain an estimate of the extrapolation error. 
Results of using this procedure are presented in Section 3, together with computational costs. 
The extrapolation procedure is similar to a Romberg integration method,22 which is 
adaptive. The error estimate, obtained by extrapolation, is an internal accuracy indicator of 
DDA computations that is just as important as the increase in the accuracy itself. Our error 
estimate opens the way to adaptive DDA, i.e. a code that will reach a required accuracy, using 
minimum computational resources. 
3. Numerical simulations 
A.Discrete Dipole Approximation 
The basics of the DDA method were summarized by Draine and Flatau.2 In this paper we use 
the LDR prescription for dipole polarizability,23 which is most widely used nowadays, e.g. in 
the publicly available code DDSCAT 6.1.4 We also employ dipole size correction6 for non-
cubically shaped scatterers to ensure that the cubical approximation of the scatterer has the 
correct volume; this is believed to diminish shape errors, especially for small scatterers.2 We 
use a standard discretization scheme without any improvements for boundary dipoles. 
The main numerical challenge of DDA is to solve a large system of 3N linear equations. 
This is done iteratively using some Krylov-subspace method,22 while the matrix-vector 
products are computed using an FFT-based algorithm.5 Our code – Amsterdam DDA 
(ADDA) – is capable of running on a cluster of computers (parallelizing a single DDA 
computation), which allows us to use practically an unlimited number of dipoles, since we are 
not limited by the memory of a single computer.24,25 We used a relative error of residual 
 as a stopping criterion. Tests suggest that the relative error of the measured quantities 
due to the iterative solver is then  (data not shown) and hence can be neglected (total 
relative errors in our simulations are  – see Section 
810−<
710−<
56 1010 −− ÷> 3.B). All DDA simulations 
were carried out on the Dutch national compute cluster LISA.26 
The execution time of one iteration depends solely on N, it consists of an arithmetic part 
which scales linearly with N and an FFT part which scales as NlnN. The number of iterations 
only slightly depends on the discretization parameter y for fixed geometry of the scatterer. 
Rahola proved this theoretically for any Krylov-subspace method,27 and our own experience 
agrees with this conclusion. Therefore the total computational time scales linearly with N 
( ) or slightly faster (considering logarithm and imperfect optimization), which is 
consistent with our timing results (data not shown). 
3−∝ y
We can now estimate the computational overhead of the extrapolation technique 
compared to a single DDA computation for ymin (time – t(ymin)). Considering the spacing of 
points we used (described in Section 2) the execution time needed for 5 points computation is 
 and for the 9 points computation – )(5.2 min5 ytt < )(7.2 min9 ytt < . Memory requirements are 
the same as for a single computation. For comparison one should note that an 8 times increase 
in computational time and memory requirements (for single DDA computation with 
Fig. 1. Cubical discretization of a sphere using 16 dipoles per diameter (total 2176 dipoles). 
2minyy = ) gives only a 2 to 4 times increase in accuracy (depending in which error regime – 
linear or quadratic – ymin is located). 
B.Results 
We study five test cases: one cube with 8=kD , three spheres with , and a 
particle obtained by a cubical discretization of the 
30,10,3=kD
10=kD  sphere using 16 dipoles per D 
(total 2176 dipoles, see Fig. 1, x equal to that of a sphere). By D we denote the diameter of a 
sphere or the edge size of a cube. All scatterers are homogenous with . Although 
DDA errors significantly depend on m (see e.g. 
5.1=m
12), we limit ourselves to one single value and 
study the effects of size and shape of the scatterer. 
The maximum number of dipoles per D (nD) was 256. The values of nD that we used are 
of the form  (p is an integer), except for the discretized sphere, where all np2}7,6,5,4{ ⋅ D are 
multiples of 16 (this is required to exactly describe the shape of the particle composed from a 
number of cubes – see Fig. 1). The minimum values for nD were 8 for the  sphere, 16 
for the cube, the  sphere, and the discretized sphere, and 40 for the  sphere. 
10=kD 30=kD
Typical computation time for the finest discretization (for the cube with , 
resulting in ) currently is 2.5 hours on a cluster of 64 P4-3.4 GHz processors. We 
expect that it can be improved by an order of magnitude by using modern FFT routines (e.g. 
fastest Fourier transform in the West – FFTW
047.0=y
7107.1 ⋅=N
28) and a faster iterative solvers (bi-conjugate 
gradient stabilized or quasi-minimal residual that were shown to be clearly superior to 
CGNR29,30 that we still use). We are currently improving our code along these lines. 
All computations use a direction of incidence parallel to one of the principal axes of the 
cubical dipoles. The scattering plane is parallel to one of the faces of the cubical dipoles. In 
this paper we show results only for the extinction efficiency Qext (for incident light polarized 
parallel to one of the principal axes of the cubical dipoles) and phase function S11(θ ) as the 
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.
most commonly used in applications. However, the extrapolation technique is equally 
applicable to any measured quantity. For instance, we have also applied it for other Mueller 
matrix elements (data not shown). 
Reference (exact) results of S11(θ ) and Qext for spheres are obtained by Mie theory (the 
relative accuracy of the code we use31 is at least ). Unfortunately, no analytical theory 
is available for the cube and the discretized sphere, which could provide us with exact results. 
Instead, we use extrapolation over the 5 finest discretizations as reference results for these 
shapes. 
610−<
To justify this choice we discuss, as an example, simulation results of Qext for the cube. 
Instead of showing values of Qext itself, we show in Fig. 2a ( )10ext −aQ , with a0 obtained 
through fitting the 5 finest discretizations. The extrapolation through these 5 best points 
( , ) is also shown. The deviation of the fit from the five best points 
(that overlap on 
047.0min =y 094.0max =y
Fig. 2(a)) is very small indeed. This is also characterized by a small estimate 
of the extrapolation error  (see 6108.1 −× Table 1). In Paper 1 we proved that DDA converges 
to the exact solution, therefore the result of the best extrapolation should be close to the exact 
result. The relative difference between the best discretization and the best extrapolation is 
only , therefore it does not make a big difference which one to use as a reference 
when evaluating, for instance, the error of the extrapolation through the 5 worst 
5100.9 −×
 cube kD=8
 fit (5 best points)
 discretized sphere kD=10
 fit (5 best points)
(b) sphere kD=3
 fit (9 best points)
 sphere kD=10
 fit (9 best points)
 sphere kD=30
 fit (9 best points)
y=kd·m  
Fig. 2. Signed relative errors of Qext versus y and their fits by quadratic functions for (a) kD = 8 cube 
and discretized kD = 10 sphere, (b) 3 spheres. 5 and 9 best points are used for fits in (a) and (b) 
respectively. 
Table 1. Extrapolation errors of Qext. Estimate of the extrapolation errors is 10×SE for first two 
particles and 2×SE for spheres. 
Extrapolation ymin ymax Points Error for ymin Estimate Real 
kD = 8 cube 
0.047 0.094 5 9.0×10-5 1.8×10-6 –––– 
0.094 0.19 5 1.6×10-4 6.6×10-6 4.6×10-6
0.19 0.38 5 2.2×10-4 5.3×10-5 4.0×10-5
0.38 0.75 5 1.1×10-4 3.7×10-4 3.2×10-4
Discretized kD = 10 sphere 
0.058 0.12 5 1.0×10-4 2.4×10-5 –––– 
0.12 0.23 5 2.0×10-4 9.0×10-6 7.9×10-6
0.23 0.93 4 4.3×10-4 1.2×10-3 5.9×10-4
kD = 3 sphere 
0.018 0.070 9 2.2×10-4 1.0×10-5 4.1×10-6
0.035 0.14 9 4.0×10-4 5.9×10-5 4.8×10-5
0.070 0.28 9 6.8×10-4 8.7×10-5 5.7×10-6
0.14 0.54 9 9.0×10-4 3.7×10-4 7.0×10-4
0.28 0.54 5 2.4×10-4 4.3×10-3 1.8×10-3
kD = 10 sphere 
0.059 0.23 9 2.7×10-4 2.0×10-4 2.7×10-5
0.12 0.47 9 5.5×10-4 5.5×10-4 3.7×10-4
0.23 0.93 9 1.5×10-3 3.1×10-3 2.1×10-3
kD = 30 sphere 
0.18 0.70 9 3.8×10-4 1.3×10-3 1.4×10-3
0.18 0.35 5 3.8×10-4 3.3×10-3 6.9×10-4
Table 2. Comparison of shape and discretization errors of Qext for kD = 10 sphere discretized with 
y = 0.93. All errors are relative to the best extrapolation result for the discretized sphere. 
 Shape Discretization Total 
Error 3.1×10-3 8.3×10-3 5.2×10-3
discretizations ( 38.0min =y , 75.0max =y ). Hence all conclusions with respect to the 
reliability of the error estimates (as discussed in Section 4) do not depend on the choice of 
reference if ymin is large enough. We also apply this reasoning to smaller ymin and assume that 
using the reference value obtained by extrapolation of the finest discretizations is a good 
enough estimate of the exact value. 
The same justification is valid for the discretized sphere (see Table 1 for Qext results). 
Comparison of errors of different extrapolations results of S11(θ ) (shown in Fig. 3 and Fig. 4) 
is even more convincing. Reference results themselves (both of Qext and S11(θ )) can be found 
in Paper 1. 
Next we show the results obtained by the extrapolation technique. The dependence of 
the signed relative errors of Qext on y for all 5 test cases are shown in Fig. 2. Fig. 2(a) depicts 
results for the cube and the discretized sphere. The 5 best points for each scatterer are fitted 
by a quadratic function, using the method described in Section 2. Fig. 2(b) depicts 
extrapolation results for spheres, using the 9 best points for each of them (cf. Section 2). Since 
the exact Mie solution is available, intersection of a fit with a vertical axis is a measure of the 
accuracy of extrapolation result. Table 1 summarizes the parameters (ymin, ymax, number of 
points) of all the extrapolations, which were carried out, and their performance for Qext. 
0 30 60 90 120 150 180
Scattering angle θ, deg
 y = 0.75
 y = 0.38
 extrapolation
 estimate
 y = 0.19
 y = 0.094
 extrapolation
 estimate
 y = 0.094
 y = 0.047
 extrapolation
          (estimate)
Fig. 3. Errors of S11(θ ) in logarithmic scale for extrapolation using 5 values of y in the intervals (a) 
[0.047,0.094], (b) [0.094,0.19], and (c) [0.38,0.75] for kD = 8 cube. Estimate of the extrapolation error 
is 10×SE. 
Next we present some of the extrapolations results for S11(θ ). Results for the cube are 
shown in Fig. 3. Each subfigure shows real (compared to the best extrapolation – reference) 
and estimated errors together with the errors of the finest and crudest discretizations used. 
Only the estimate of the error is shown for the best extrapolation – Fig. 3(a). Fig. 3(b) and (c) 
show extrapolation results using 5 points in the intervals [0.094,0.19] and [0.38,0.75] 
respectively. The performance of the extrapolation for the discretized sphere is shown in Fig. 
4: (a) – best extrapolation, (b) and (c) – results for extrapolation using 5 and 4 points in the 
intervals [0.12,0.23] and [0.23,0.93] respectively. The broad spacing of points for 
0 30 60 90 120 150 180
Scattering angle θ, deg
 y = 0.93
 y = 0.23
 extrapolation
 estimate
 y = 0.23
 y = 0.12
 extrapolation
 estimate
 y = 0.12
 y = 0.058
 extrapolation
          (estimate)
Fig. 4. Errors of S11(θ ) in logarithmic scale for extrapolation using 5 values of y in the intervals (a) 
[0.058,0.12], (b) [0.12,0.23] ((c): 4 values of y in the interval [0.23,0.93]) for the discretized kD = 10 
sphere. Estimate of the extrapolation error is 10×SE. 
extrapolation depicted in Fig. 4(c) is, as was noted above, due to the complex shape of the 
discretized sphere that limits possible values of y to be 0.93 divided by an integer (total time 
for computing these 4 points is ). It is important to note once more that we use 
10×SE as an estimate of extrapolation error for the cube and discretized sphere and 2×SE for 
spheres (cf. Section 
)(6.1 minyt<
Extrapolation results for the 3=kD  sphere are summarized in Fig. 5: (a) shows the best 
extrapolation (using 9 points in the interval [0.018,0.070]), and (b) shows the worst, but still 
satisfactory result, i.e. one that shows definite improvement of accuracy over most of the θ 
0 30 60 90 120 150 180
Scattering angle θ, deg
 y = 0.55
 y = 0.14
 extrapolation
 estimate
 y = 0.070
 y = 0.018
 extrapolation
 estimate
Fig. 5. Errors of S11(θ ) in logarithmic scale for extrapolation using 9 values of y in the intervals (a) 
[0.018,0.070], (b) [0.14,0.55] for kD = 3 sphere. Estimate of the extrapolation error is 2×SE. 
range. The extrapolation using 5 points from the interval [0.28,0.54] is no longer satisfactory 
(data not shown). Errors of the two best extrapolations for the 10=kD  sphere (using 9 points 
from the intervals [0.059,0.23] and [0.12,0.47]) are shown in Fig. 6(a) and (b) respectively. A 
third extrapolation for  sphere is not satisfactory (data not shown). Both 
extrapolations for the  sphere show similar controversial results, only one of them (9 
points from the interval [0.18,0.70]) that is overall slightly better is shown in 
10=kD
30=kD
Fig. 7. The 
estimate of the extrapolation error is overall slightly higher than the real errors of the 
extrapolation (data not shown). 
Results of S11(θ ) for all extrapolations (see Table 1) support the following trend: the 
quality of the extrapolation (defined as decrease of error compared to a single DDA 
computation for ymin) rapidly degrades with increasing ymin. The ratio of estimated to real 
errors increase with increasing ymin (that can be considered as a degradation of the estimate 
quality). 
Computation of exact results for both the 10=kD  sphere and its cubical discretization 
( ) allows us for the first time to directly separate and compare shape and 
discretization error of single DDA computations. The shape error is the difference between 
some measured quantity for a discretized sphere (calculated to a high accuracy) and that for 
the exact sphere. The discretization error is difference between calculation using a limited 
number of dipoles (2176) and exact (very accurate) solution for the cubical discretization of 
the sphere (first curve in 
93.0=y
Fig. 4(c)). The total error is just the sum of the two. These three 
0 30 60 90 120 150 180
Scattering angle θ, deg
 y = 0.47
 y = 0.12
 extrapolation
 estimate
 y = 0.23
 y = 0.059
 extrapolation
 estimate
Fig. 6. Errors of S11(θ ) in logarithmic scale for extrapolation using 9 values of y in the intervals (a) 
[0.059,0.23], (b) [0.12,0.47] for kD = 10 sphere. Estimate of the extrapolation error is 2×SE. 
types of errors for S11(θ ) are shown in Fig. 8, all relative to the exact value for discretized 
sphere. Errors of Qext are shown in Table 2. 
4. Discussion 
In their review Draine and Flatau2 gave the condition 1<y  for applicability of DDA. Usually 
 (10 dipoles per wavelength in the medium) is used in applications.6.0=y 3 Smaller y are 
used only in studies of DDA errors2,12,13 or of light scattering by particles much smaller than a 
wavelength (then d is determined by a shape of a scatterer, and y, being proportional to 
scatterer size, can be arbitrarily small).32 However, if one wishes to achieve better (than usual) 
accuracy of a DDA simulation, smaller y must be used. 
The best extrapolation for the cube (Fig. 3(a)) shows a large improvement compared to 
the best single DDA calculation (it should be noted, however, that this result is based on the 
empiric error estimate). Maximum errors are decreased more than 2 orders of magnitude. This 
would be impossible to reach by a single DDA calculation as it will require over 6 orders of 
magnitude increase in execution time and memory, since there is only linear convergence for 
such small y. Even for  the extrapolation can be called satisfactory because the 
maximum error is decreased almost two times when considering the estimate of the error (the 
real errors are even less). It is important to note that an estimate of the error is important by 
38.0min =y
0 30 60 90 120 150 180
Scattering angle θ, deg
 y = 0.18
 extrapolation
Fig. 7. Errors of S11(θ ) in logarithmic scale for extrapolation using 9 values of y in the interval 
[0.18,0.70] for kD = 30 sphere. 
0 30 60 90 120 150 180
Scattering angle θ, deg
 discretization
 shape
 total
Fig. 8. Comparison of discretization and shape errors of S11(θ ) for kD = 10 sphere discretized using 16 
dipoles per D ( y = 0.93). 
itself (even when it is not less than the error of a single DDA computation) because it does not 
require an exact solution (that is usually unavailable in real applications). In general, the 
extrapolation decreases large errors better than those that are already small, i.e. it may 
significantly decrease maximum errors but prove less satisfactory for certain measured 
quantity (e.g. S11 for certain θ). This conclusion holds true for all the extrapolations we 
performed (Fig. 3 – Fig. 7 and those not shown). 
Extrapolation results for the discretized sphere (Fig. 4) are similar to those for the cube. 
Extrapolations for  and 0.12 are very good (more than an order of magnitude 
decrease of maximum errors), while for 
058.0min =y
23.0min =y  it is on the edge of being satisfactory. 
The latter is strongly influenced by the fact that only 4 points in a broad interval are used 
(hence it does not fully comply with the procedure specified in Section 2). 
The best extrapolation for the 3=kD  sphere (Fig. 5(a)) shows results comparable to 
cubically shaped scatterers, however it uses an extremely small 018.0min =y . Already for 
 (14.0min =y Fig. 5(b)) it only decreased the maximum errors by a factor of two. A similar 
boundary value of ymin for satisfactory extrapolation is observed for  sphere (10=kD Fig. 
6(b)), while the best extrapolation (Fig. 6(a)) does show good results (4 times decrease of 
maximum error), although significantly worse than the analogous results for cubically shaped 
scatterers. Unfortunately we are currently not able to reach sufficiently small y for the 
 sphere and the best extrapolation (30=kD Fig. 7) uses rather large , resulting in 
almost negligible improvement of accuracy. 
18.0min =y
We have also studied a  porous cube that was obtained by dividing a cube into 
27 smaller cubes and then removing randomly 9 of them. All the conclusions are the same as 
those reported for the cube, but with slightly higher overall errors (data not shown). 
Extrapolation of Qext (Table 1) shows similar results as discussed above, however the 
improvement of accuracy is generally less than for maximum errors in S11(θ ) (which is in 
agreement with what we stated above, since errors in Qext are already small). Moreover, one 
should take into account that errors of a single DDA calculation for some ymin are 
unexpectedly small (e.g. the last extrapolations for the cube and the 3=kD  sphere), but these 
are just “lucky hits” near the points where the function  crosses the horizontal axis 
(cf. 
)(δ ext yQ
Fig. 2). 
Summarizing all results we can conclude that shape errors significantly degrade the 
extrapolation performance, because of its abrupt behavior, and therefore the extrapolation 
technique is much more suited for cubically shaped particles. One may expect satisfactory 
extrapolation for non-cubically shaped particles only when 15.0min <y , while for cubically 
shaped particles the condition is 4.0min <y . It is important to note though that extrapolation 
can be used for any ymin. The estimate of the error coming from the fitting procedure (SE) can 
then be used to decide whether this extrapolation was satisfactory or not. The quality of the 
extrapolation significantly increases with decreasing ymin, hence extrapolation is of biggest 
value for obtaining (very accurate) benchmark results. The size of the particle for which the 
extrapolation technique provides significant improvement is mainly determined by available 
computational resources that are required to reach small enough ymin. However, further testing 
is required to evaluate the quality of extrapolation for scatterers large compared to the 
wavelength. 
It is important to note that the linear extrapolation that was applied in two papers7,15 may 
lead to completely erroneous results (e.g. if points on the right branch of the parabolas for the 
cube and  sphere in 3=kD Fig. 2 are used). Quadratic extrapolation, as proposed in this 
paper, is much more reliable. 
Throughout all the extrapolations we have used error estimates as specified in Section 2: 
10×SE and 2×SE for cubically and non-cubically shaped scatterers respectively. All the 
results show that these estimates are reliable, i.e. in most cases real errors are less than the 
estimates. There are only two exceptions, both for the 3=kD  sphere: the fourth extrapolation 
of Qext (Table 1) – real error 1.8 times larger than estimate – and second of S11 – real error 
1.5-2 times larger than estimate for broad range of θ (data not shown). The existence of such 
exceptions is acceptable since the estimates have a statistical nature of a confidence interval. 
However, these estimates, though reliable, are definitely not optimal, i.e. they often 
significantly overestimate the real errors (e.g. Fig. 5(a)). It also seems to be sensitive to the 
spacing of y values used for extrapolation – cf. Fig. 4(c), where unusually broad spacing was 
used. Generally this overestimation increases with increasing ymin. We can conclude that the 
error estimate should be improved, and this is subject of future research. However, the current 
estimate is already suitable for practical applications since they mainly require reliability of 
the error estimate, which is demonstrated empirically in this paper. 
It is important to note that we limited ourselves to a single value of m. While bounds of 
ymin to obtain satisfactory extrapolation definitely dependent on m, other conclusions, such as 
the reliability of the error estimate, are expected to hold true for a broad range of m. This can 
be easily tested for specific values of m of interest using the methodology put forward in this 
paper. 
Finally we discuss the results presented in Fig. 8. One cannot conclude that shape errors 
dominate over discretization errors (or the other way around): for some θ shape errors are 
much larger than discretization, for others – vice versa. However, maximum errors occurring 
in backscattering directions are definitely due to shape errors (ratio of maximum shape to 
maximum discretization errors is about 4). Errors in Qext (Table 2) are, on the contrary, mostly 
due to discretization (although they are almost two orders of magnitude smaller than 
maximum errors of S11). One may expect shape errors to become even more important for 
smaller values of y, since the linear component of discretization errors is significantly smaller 
than that of shape errors (hence for large values of y shape errors scale linearly and 
discretization – almost quadratically). Our single result principally shows different angle 
dependence of shape and discretization errors of S11: shape errors have a clear tendency to 
significantly increase towards backscattering, while the general trend of discretization errors 
is uniform over the whole θ range. 
We have presented a simple method to directly separate shape and discretization errors 
and only one result for illustration. All previous comparisons of shape and discretization 
errors had significant inherent interpretation problems that caused a lot of discussions about 
their conclusions.16-20 Our method is free of such problems and therefore can be used for 
rigorous study of shape errors in DDA. For instance, it can help to directly evaluate the 
performance of different techniques to reduce such errors, e.g. weighted discretization (WD).9 
Discretization errors are then the limit one can achieve by drastically reducing shape errors. 
We have used a traditional DDA formulation2 to show that the extrapolation technique 
can be used with current DDA codes (e.g. DDSCAT4) without any modifications. However, 
as we showed in Paper 1 several modern improvements of DDA (namely integration of 
Green’s tensor (IT)33 and WD) should significantly change the convergence behavior of DDA 
computations and hence influence the performance of the extrapolation technique. IT should 
completely eliminate the linear term for cubically shaped scatterers. This will improve the 
accuracy especially for small y, and probably also improve the quality of the extrapolation for 
such scatterers. WD should significantly decrease shape and hence total errors for non-
cubically shaped particles, moreover it should significantly decrease the amplitude of quasi-
random error oscillations because it takes into account the location of the interface inside the 
boundary dipoles. Therefore WD should improve the quality of the extrapolation for non-
cubically shaped scatterers. Testing of extrapolation performance of DDA using IT and WD is 
a subject of a future study. 
5. Conclusion 
Based on the theoretical convergence analysis as presented in Paper 1, we proposed an 
extrapolation technique together with a step-by-step prescription, which allows accuracy 
improvement of DDA computations. The performance of this technique was studied 
empirically and we showed that it significantly suppresses maximum errors of S11(θ ) when 
 and 0.15 for cubically and non-cubically shaped scatterers respectively (for 
). The quality of the extrapolation improves with decreasing y
4.0min <y
5.1=m min reaching 
extraordinary performance especially for cubically shaped particles – more than two order of 
magnitude decrease of error when 05.0min ≈y  for wavelength-sized scatterers with 5.1=m  
(total computational time for extrapolation is less than 2.7 times that for a single DDA 
computation). 
The proposed estimates of the extrapolation error were proven to be reliable, although 
they can be improved to decrease overestimation of the errors in some cases. This error 
estimate is completely internal, and hence can be used to create adaptive DDA – a code that 
will automatically refine discretization to reach a required accuracy. 
We also proposed a simple method to directly separate shape and discretization errors. 
Maximum errors of S11(θ ) for the 10=kD  sphere with 5.1=m , discretized using 16 dipoles 
per diameter ( ) are mostly due to shape errors, however the same is not true for all 
measured quantities. This method can be employed to rigorously study fundamental 
properties of these two types of errors and to directly evaluate the performance of different 
techniques aimed at reducing shape errors. 
93.0=y
Our theory predicts that modern DDA improvements (namely IT and WD) should 
significantly change the performance of the extrapolation technique, however numerical 
testing of these predictions is left for future research. 
Acknowledgements
We thank Gorden Videen and Michiel Min for valuable comments on earlier version of this 
manuscript and Denis Shamonin for help with 3D graphics. Our research is supported by the 
NATO Science for Peace program through grant SfP 977976. 
References 
 1.  E. M. Purcell and C. R. Pennypacker, "Scattering and adsorption of light by nonspherical dielectric 
grains," Astrophys. J. 186, 705-714 (1973). 
 2.  B. T. Draine and P. J. Flatau, "Discrete-dipole approximation for scattering calculations," J. Opt. Soc. 
Am. A 11, 1491-1499 (1994). 
 3.  B. T. Draine, "The discrete dipole approximation for light scattering by irregular targets," in Light 
Scattering by Nonspherical Particles, Theory, Measurements, and Applications, M. I. Mishchenko, J. W. 
Hovenier, and L. D. Travis, eds. (Academic Press, New York, 2000), pp. 131-145. 
 4.  B. T. Draine and P. J. Flatau, "User guide for the discrete dipole approximation code DDSCAT 6.1," 
http://xxx.arxiv.org/abs/astro-ph/0409262 (2004). 
 5.  J. J. Goodman, B. T. Draine, and P. J. Flatau, "Application of fast-Fourier-transform techniques to the 
discrete-dipole approximation," Opt. Lett. 16, 1198-1200 (1991). 
 6.  B. T. Draine, "The discrete-dipole approximation and its application to interstellar graphite grains," 
Astrophys. J. 333, 848-872 (1988). 
 7.  J. I. Hage, J. M. Greenberg, and R. T. Wang, "Scattering from arbitrarily shaped particles - theory and 
experiment," Appl. Opt. 30, 1141-1152 (1991). 
 8.  F. Rouleau and P. G. Martin, "A new method to calculate the extinction properties of irregularly shaped 
particles," Astrophys. J. 414, 803-814 (1993). 
 9.  N. B. Piller, "Influence of the edge meshes on the accuracy of the coupled-dipole approximation," Opt. 
Lett. 22, 1674-1676 (1997). 
 10.  N. B. Piller and O. J. F. Martin, "Increasing the performance of the coupled-dipole approximation: A 
spectral approach," IEEE Trans. Ant. Propag. 46, 1126-1137 (1998). 
 11.  N. B. Piller, "Coupled-dipole approximation for high permittivity materials," Opt. Comm. 160, 10-14 
(1999). 
 12.  A. G. Hoekstra, J. Rahola, and P. M. A. Sloot, "Accuracy of internal fields in volume integral equation 
simulations of light scattering," Appl. Opt. 37, 8482-8497 (1998). 
 13.  S. D. Druger and B. V. Bronk, "Internal and scattered electric fields in the discrete dipole approximation," 
J. Opt. Soc. Am. B 16, 2239-2246 (1999). 
 14.  Y. L. Xu and B. A. S. Gustafson, "Comparison between multisphere light-scattering calculations: 
Rigorous solution and discrete-dipole approximation," Astrophys. J. 513, 894-909 (1999). 
 15.  M. J. Collinge and B. T. Draine, "Discrete-dipole approximation with polarizabilities that account for both 
finite wavelength and target geometry," J. Opt. Soc. Am. A 21, 2023-2028 (2004). 
 16.  K. F. Evans and G. L. Stephens, "Microwave radiative-transfer through clouds composed of realistically 
shaped ice crystals .1. Single scattering properties," J. Atmos. Sci. 52, 2041-2057 (1995). 
http://xxx.arxiv.org/abs/astro-ph/0409262
 17.  H. Okamoto, A. Macke, M. Quante, and E. Raschke, "Modeling of backscattering by non-spherical ice 
particles for the interpretation of cloud radar signals at 94 GHz. An error analysis," Contrib. Atmos. Phys. 
68, 319-334 (1995). 
 18.  C. L. Liu and A. J. Illingworth, "Error analysis of backscatter from discrete dipole approximation for 
different ice particle shapes," Atmos. Res. 44, 231-241 (1997). 
 19.  H. Lemke, H. Okamoto, and M. Quante, "Comment on error analysis of backscatter from discrete dipole 
approximation for different ice particle shapes [ Liu, C.-L., Illingworth, A.J., 1997, Atmos. Res. 44, 231-
241.]," Atmos. Res. 49, 189-197 (1998). 
 20.  C. L. Liu and A. J. Illingworth, "Reply to comment by Lemke, Okamoto and Quante on 'Error analysis of 
backscatter from discrete dipole approximation for different ice particle shapes'," Atmos. Res. 50, 1-2 
(1999). 
 21.  M. A. Yurkin, V. P. Maltsev, and A. G. Hoekstra, "Convergence of the discrete dipole approximation. I. 
Theoretical analysis," J. Opt. Soc. Am. A 23, 2578-2591 (2006). 
 22.  W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C. The Art of 
Scientific Computing, (Cambridge University Press, New York, 1990). 
 23.  B. T. Draine and J. J. Goodman, "Beyond clausius-mossotti - wave-propagation on a polarizable point 
lattice and the discrete dipole approximation," Astrophys. J. 405, 685-697 (1993). 
 24.  A. G. Hoekstra, M. D. Grimminck, and P. M. A. Sloot, "Large scale simulations of elastic light scattering 
by a fast discrete dipole approximation," Int. J. Mod. Phys. C 9, 87-102 (1998). 
 25.  M. A. Yurkin, K. A. Semyanov, P. A. Tarasov, A. V. Chernyshev, A. G. Hoekstra, and V. P. Maltsev, 
"Experimental and theoretical study of light scattering by individual mature red blood cells with scanning 
flow cytometry and discrete dipole approximation," Appl. Opt. 44, 5249-5256 (2005). 
 26.  "Description of the national compute cluster Lisa," http://www.sara.nl/userinfo/lisa/description/ (2005). 
 27.  J. Rahola, "On the eigenvalues of the volume integral operator of electromagnetic scattering," SIAM J. 
Sci. Comp. 21, 1740-1754 (2000). 
 28.  M. Frigo and S. G. Johnson, "FFTW: an adaptive software architecture for the FFT," Proc. ICASSP 3, 
1381-1384 (1998). 
 29.  P. J. Flatau, "Improvements in the discrete-dipole approximation method of computing scattering and 
absorption," Opt. Lett. 22, 1205-1207 (1997). 
 30.  J. Rahola, "Solution of dense systems of linear equations in the discrete-dipole approximation," SIAM J. 
Sci. Comp. 17, 78-89 (1996). 
 31.  C. F. Bohren and D. R. Huffman, Absorption and scattering of Light by Small Particles, (Wiley, New 
York, 1983). 
 32.  M. Min, J. W. Hovenier, A. Dominik, A. de Koter, and M. A. Yurkin, "Absorption and scattering 
properties of arbitrary shaped particles in the Rayleigh domain: A rapid computational method and a 
theoretical foundation for the statistical approach," J. Quant. Spectrosc. Radiat. Transf. 97, 161-180 
(2006). 
 33.  P. C. Chaumet, A. Sentenac, and A. Rahmani, "Coupled dipole method for scatterers with large 
permittivity," Phys. Rev. E 70, 036606 (2004). 
http://www.sara.nl/userinfo/lisa/description/
  /ASCII85EncodePages false
  /AllowTransparency false
  /AutoPositionEPSFiles true
  /AutoRotatePages /None
  /Binding /Left
  /CalGrayProfile (Dot Gain 20%)
  /CalRGBProfile (sRGB IEC61966-2.1)
  /CalCMYKProfile (U.S. Web Coated \050SWOP\051 v2)
  /sRGBProfile (sRGB IEC61966-2.1)
  /CannotEmbedFontPolicy /Error
  /CompatibilityLevel 1.4
  /CompressObjects /Tags
  /CompressPages true
  /ConvertImagesToIndexed true
  /PassThroughJPEGImages true
  /CreateJDFFile false
  /CreateJobTicket false
  /DefaultRenderingIntent /Default
  /DetectBlends true
  /DetectCurves 0.0000
  /ColorConversionStrategy /CMYK
  /DoThumbnails false
  /EmbedAllFonts true
  /EmbedOpenType false
  /ParseICCProfilesInComments true
  /EmbedJobOptions true
  /DSCReportingLevel 0
  /EmitDSCWarnings false
  /EndPage -1
  /ImageMemory 1048576
  /LockDistillerParams false
  /MaxSubsetPct 100
  /Optimize true
  /OPM 1
  /ParseDSCComments true
  /ParseDSCCommentsForDocInfo true
  /PreserveCopyPage true
  /PreserveDICMYKValues true
  /PreserveEPSInfo true
  /PreserveFlatness true
  /PreserveHalftoneInfo false
  /PreserveOPIComments true
  /PreserveOverprintSettings true
  /StartPage 1
  /SubsetFonts true
  /TransferFunctionInfo /Apply
  /UCRandBGInfo /Preserve
  /UsePrologue false
  /ColorSettingsFile ()
  /AlwaysEmbed [ true
  /NeverEmbed [ true
  /AntiAliasColorImages false
  /CropColorImages true
  /ColorImageMinResolution 300
  /ColorImageMinResolutionPolicy /OK
  /DownsampleColorImages true
  /ColorImageDownsampleType /Bicubic
  /ColorImageResolution 300
  /ColorImageDepth -1
  /ColorImageMinDownsampleDepth 1
  /ColorImageDownsampleThreshold 1.50000
  /EncodeColorImages true
  /ColorImageFilter /DCTEncode
  /AutoFilterColorImages true
  /ColorImageAutoFilterStrategy /JPEG
  /ColorACSImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /ColorImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /JPEG2000ColorACSImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /JPEG2000ColorImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /AntiAliasGrayImages false
  /CropGrayImages true
  /GrayImageMinResolution 300
  /GrayImageMinResolutionPolicy /OK
  /DownsampleGrayImages true
  /GrayImageDownsampleType /Bicubic
  /GrayImageResolution 300
  /GrayImageDepth -1
  /GrayImageMinDownsampleDepth 2
  /GrayImageDownsampleThreshold 1.50000
  /EncodeGrayImages true
  /GrayImageFilter /DCTEncode
  /AutoFilterGrayImages true
  /GrayImageAutoFilterStrategy /JPEG
  /GrayACSImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /GrayImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /JPEG2000GrayACSImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /JPEG2000GrayImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /AntiAliasMonoImages false
  /CropMonoImages true
  /MonoImageMinResolution 1200
  /MonoImageMinResolutionPolicy /OK
  /DownsampleMonoImages true
  /MonoImageDownsampleType /Bicubic
  /MonoImageResolution 1200
  /MonoImageDepth -1
  /MonoImageDownsampleThreshold 1.50000
  /EncodeMonoImages true
  /MonoImageFilter /CCITTFaxEncode
  /MonoImageDict <<
    /K -1
  /AllowPSXObjects false
  /CheckCompliance [
    /None
  /PDFX1aCheck false
  /PDFX3Check false
  /PDFXCompliantPDFOnly false
  /PDFXNoTrimBoxError true
  /PDFXTrimBoxToMediaBoxOffset [
    0.00000
    0.00000
    0.00000
    0.00000
  /PDFXSetBleedBoxToMediaBox true
  /PDFXBleedBoxToTrimBoxOffset [
    0.00000
    0.00000
    0.00000
    0.00000
  /PDFXOutputIntentProfile ()
  /PDFXOutputConditionIdentifier ()
  /PDFXOutputCondition ()
  /PDFXRegistryName ()
  /PDFXTrapped /False
  /Description <<
    /CHS <FEFF4f7f75288fd94e9b8bbe5b9a521b5efa7684002000410064006f006200650020005000440046002065876863900275284e8e9ad88d2891cf76845370524d53705237300260a853ef4ee54f7f75280020004100630072006f0062006100740020548c002000410064006f00620065002000520065006100640065007200200035002e003000204ee553ca66f49ad87248672c676562535f00521b5efa768400200050004400460020658768633002>
    /CHT <FEFF4f7f752890194e9b8a2d7f6e5efa7acb7684002000410064006f006200650020005000440046002065874ef69069752865bc9ad854c18cea76845370524d5370523786557406300260a853ef4ee54f7f75280020004100630072006f0062006100740020548c002000410064006f00620065002000520065006100640065007200200035002e003000204ee553ca66f49ad87248672c4f86958b555f5df25efa7acb76840020005000440046002065874ef63002>
    /DAN <FEFF004200720075006700200069006e0064007300740069006c006c0069006e006700650072006e0065002000740069006c0020006100740020006f007000720065007400740065002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e007400650072002c0020006400650072002000620065006400730074002000650067006e006500720020007300690067002000740069006c002000700072006500700072006500730073002d007500640073006b007200690076006e0069006e00670020006100660020006800f8006a0020006b00760061006c0069007400650074002e0020004400650020006f007000720065007400740065006400650020005000440046002d0064006f006b0075006d0065006e0074006500720020006b0061006e002000e50062006e00650073002000690020004100630072006f00620061007400200065006c006c006500720020004100630072006f006200610074002000520065006100640065007200200035002e00300020006f00670020006e0079006500720065002e>
    /DEU <FEFF00560065007200770065006e00640065006e0020005300690065002000640069006500730065002000450069006e007300740065006c006c0075006e00670065006e0020007a0075006d002000450072007300740065006c006c0065006e00200076006f006e002000410064006f006200650020005000440046002d0044006f006b0075006d0065006e00740065006e002c00200076006f006e002000640065006e0065006e002000530069006500200068006f006300680077006500720074006900670065002000500072006500700072006500730073002d0044007200750063006b0065002000650072007a0065007500670065006e0020006d00f60063006800740065006e002e002000450072007300740065006c006c007400650020005000440046002d0044006f006b0075006d0065006e007400650020006b00f6006e006e0065006e0020006d006900740020004100630072006f00620061007400200075006e0064002000410064006f00620065002000520065006100640065007200200035002e00300020006f0064006500720020006800f600680065007200200067006500f600660066006e00650074002000770065007200640065006e002e>
    /ESP <FEFF005500740069006c0069006300650020006500730074006100200063006f006e0066006900670075007200610063006900f3006e0020007000610072006100200063007200650061007200200064006f00630075006d0065006e0074006f00730020005000440046002000640065002000410064006f0062006500200061006400650063007500610064006f00730020007000610072006100200069006d0070007200650073006900f3006e0020007000720065002d0065006400690074006f007200690061006c00200064006500200061006c00740061002000630061006c0069006400610064002e002000530065002000700075006500640065006e00200061006200720069007200200064006f00630075006d0065006e0074006f00730020005000440046002000630072006500610064006f007300200063006f006e0020004100630072006f006200610074002c002000410064006f00620065002000520065006100640065007200200035002e003000200079002000760065007200730069006f006e0065007300200070006f00730074006500720069006f007200650073002e>
    /FRA <FEFF005500740069006c006900730065007a00200063006500730020006f007000740069006f006e00730020006100660069006e00200064006500200063007200e900650072002000640065007300200064006f00630075006d0065006e00740073002000410064006f00620065002000500044004600200070006f0075007200200075006e00650020007100750061006c0069007400e90020006400270069006d007000720065007300730069006f006e00200070007200e9007000720065007300730065002e0020004c0065007300200064006f00630075006d0065006e00740073002000500044004600200063007200e900e90073002000700065007500760065006e0074002000ea0074007200650020006f007500760065007200740073002000640061006e00730020004100630072006f006200610074002c002000610069006e00730069002000710075002700410064006f00620065002000520065006100640065007200200035002e0030002000650074002000760065007200730069006f006e007300200075006c007400e90072006900650075007200650073002e>
    /ITA <FEFF005500740069006c0069007a007a006100720065002000710075006500730074006500200069006d0070006f007300740061007a0069006f006e00690020007000650072002000630072006500610072006500200064006f00630075006d0065006e00740069002000410064006f00620065002000500044004600200070006900f900200061006400610074007400690020006100200075006e00610020007000720065007300740061006d0070006100200064006900200061006c007400610020007100750061006c0069007400e0002e0020004900200064006f00630075006d0065006e007400690020005000440046002000630072006500610074006900200070006f00730073006f006e006f0020006500730073006500720065002000610070006500720074006900200063006f006e0020004100630072006f00620061007400200065002000410064006f00620065002000520065006100640065007200200035002e003000200065002000760065007200730069006f006e006900200073007500630063006500730073006900760065002e>
    /JPN <FEFF9ad854c18cea306a30d730ea30d730ec30b951fa529b7528002000410064006f0062006500200050004400460020658766f8306e4f5c6210306b4f7f75283057307e305930023053306e8a2d5b9a30674f5c62103055308c305f0020005000440046002030d530a130a430eb306f3001004100630072006f0062006100740020304a30883073002000410064006f00620065002000520065006100640065007200200035002e003000204ee5964d3067958b304f30533068304c3067304d307e305930023053306e8a2d5b9a306b306f30d530a930f330c8306e57cb30818fbc307f304c5fc59808306730593002>
    /KOR <FEFFc7740020c124c815c7440020c0acc6a9d558c5ec0020ace0d488c9c80020c2dcd5d80020c778c1c4c5d00020ac00c7a50020c801d569d55c002000410064006f0062006500200050004400460020bb38c11cb97c0020c791c131d569b2c8b2e4002e0020c774b807ac8c0020c791c131b41c00200050004400460020bb38c11cb2940020004100630072006f0062006100740020bc0f002000410064006f00620065002000520065006100640065007200200035002e00300020c774c0c1c5d0c11c0020c5f40020c2180020c788c2b5b2c8b2e4002e>
    /NLD (Gebruik deze instellingen om Adobe PDF-documenten te maken die zijn geoptimaliseerd voor prepress-afdrukken van hoge kwaliteit. De gemaakte PDF-documenten kunnen worden geopend met Acrobat en Adobe Reader 5.0 en hoger.)
    /NOR <FEFF004200720075006b00200064006900730073006500200069006e006e007300740069006c006c0069006e00670065006e0065002000740069006c002000e50020006f0070007000720065007400740065002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e00740065007200200073006f006d00200065007200200062006500730074002000650067006e0065007400200066006f00720020006600f80072007400720079006b006b0073007500740073006b00720069006600740020006100760020006800f800790020006b00760061006c0069007400650074002e0020005000440046002d0064006f006b0075006d0065006e00740065006e00650020006b0061006e002000e50070006e00650073002000690020004100630072006f00620061007400200065006c006c00650072002000410064006f00620065002000520065006100640065007200200035002e003000200065006c006c00650072002000730065006e006500720065002e>
    /PTB <FEFF005500740069006c0069007a006500200065007300730061007300200063006f006e00660069006700750072006100e700f50065007300200064006500200066006f0072006d00610020006100200063007200690061007200200064006f00630075006d0065006e0074006f0073002000410064006f0062006500200050004400460020006d00610069007300200061006400650071007500610064006f00730020007000610072006100200070007200e9002d0069006d0070007200650073007300f50065007300200064006500200061006c007400610020007100750061006c00690064006100640065002e0020004f007300200064006f00630075006d0065006e0074006f00730020005000440046002000630072006900610064006f007300200070006f00640065006d0020007300650072002000610062006500720074006f007300200063006f006d0020006f0020004100630072006f006200610074002000650020006f002000410064006f00620065002000520065006100640065007200200035002e0030002000650020007600650072007300f50065007300200070006f00730074006500720069006f007200650073002e>
    /SUO <FEFF004b00e40079007400e40020006e00e40069007400e4002000610073006500740075006b007300690061002c0020006b0075006e0020006c0075006f00740020006c00e400680069006e006e00e4002000760061006100740069007600610061006e0020007000610069006e006100740075006b00730065006e002000760061006c006d0069007300740065006c00750074007900f6006800f6006e00200073006f00700069007600690061002000410064006f0062006500200050004400460020002d0064006f006b0075006d0065006e007400740065006a0061002e0020004c0075006f0064007500740020005000440046002d0064006f006b0075006d0065006e00740069007400200076006f0069006400610061006e0020006100760061007400610020004100630072006f0062006100740069006c006c00610020006a0061002000410064006f00620065002000520065006100640065007200200035002e0030003a006c006c00610020006a006100200075007500640065006d006d0069006c006c0061002e>
    /SVE <FEFF0041006e007600e4006e00640020006400650020006800e4007200200069006e0073007400e4006c006c006e0069006e006700610072006e00610020006f006d002000640075002000760069006c006c00200073006b006100700061002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e007400200073006f006d002000e400720020006c00e4006d0070006c0069006700610020006600f60072002000700072006500700072006500730073002d007500740073006b00720069006600740020006d006500640020006800f600670020006b00760061006c0069007400650074002e002000200053006b006100700061006400650020005000440046002d0064006f006b0075006d0065006e00740020006b0061006e002000f600700070006e00610073002000690020004100630072006f0062006100740020006f00630068002000410064006f00620065002000520065006100640065007200200035002e00300020006f00630068002000730065006e006100720065002e>
    /ENU (Use these settings to create Adobe PDF documents best suited for high-quality prepress printing.  Created PDF documents can be opened with Acrobat and Adobe Reader 5.0 and later.)
  /Namespace [
    (Adobe)
    (Common)
    (1.0)
  /OtherNamespaces [
    <<
      /AsReaderSpreads false
      /CropImagesToFrames true
      /ErrorControl /WarnAndContinue
      /FlattenerIgnoreSpreadOverrides false
      /IncludeGuidesGrids false
      /IncludeNonPrinting false
      /IncludeSlug false
      /Namespace [
        (Adobe)
        (InDesign)
        (4.0)
      ]
      /OmitPlacedBitmaps false
      /OmitPlacedEPS false
      /OmitPlacedPDF false
      /SimulateOverprint /Legacy
    >>
    <<
      /AddBleedMarks false
      /AddColorBars false
      /AddCropMarks false
      /AddPageInfo false
      /AddRegMarks false
      /ConvertColors /ConvertToCMYK
      /DestinationProfileName ()
      /DestinationProfileSelector /DocumentCMYK
      /Downsample16BitImages true
      /FlattenerPreset <<
        /PresetSelector /MediumResolution
      >>
      /FormElements false
      /GenerateStructure false
      /IncludeBookmarks false
      /IncludeHyperlinks false
      /IncludeInteractive false
      /IncludeLayers false
      /IncludeProfiles false
      /MultimediaHandling /UseObjectSettings
      /Namespace [
        (Adobe)
        (CreativeSuite)
        (2.0)
      ]
      /PDFXOutputIntentProfileSelector /DocumentCMYK
      /PreserveEditing true
      /UntaggedCMYKHandling /LeaveUntagged
      /UntaggedRGBHandling /UseDocumentProfile
      /UseDocumentBleed false
    >>
>> setdistillerparams
  /HWResolution [2400 2400]
  /PageSize [612.000 792.000]
>> setpagedevice
ABSTRACT
  We propose an extrapolation technique that allows accuracy improvement of the
discrete dipole approximation computations. The performance of this technique
was studied empirically based on extensive simulations for 5 test cases using
many different discretizations. The quality of the extrapolation improves with
refining discretization reaching extraordinary performance especially for
cubically shaped particles. A two order of magnitude decrease of error was
demonstrated. We also propose estimates of the extrapolation error, which were
proven to be reliable. Finally we propose a simple method to directly separate
shape and discretization errors and illustrated this for one test case.

<|endoftext|><|startoftext|>
Introduction
A promising approach to handling the complexity of cell signaling pathways is to decompose pathways into
small motifs, and analyze the individual motifs. One particular motif that has attracted much attention in
recent years is the cycle formed by two or more inter-convertible forms of one protein. The protein, denoted
here by S0, is ultimately converted into a product, denoted here by Sn, through a cascade of “activation”
reactions triggered or facilitated by an enzyme E; conversely, Sn is transformed back (or “deactivated”)
into the original S0, helped on by the action of a second enzyme F . See Figure 1.
S S0 2S1
SSS nn−1n−2
Figure 1: A futile cycle of size n.
Such structures, often called “futile cycles” (also called substrate cycles, enzymatic cycles, or enzymatic
inter-conversions, see [1]), serve as basic blocks in cellular signaling pathways and have pivotal impact on
the signaling dynamics. Futile cycles underlie signaling processes such as GTPase cycles [2], bacterial
two-component systems and phosphorelays [3, 4] actin treadmilling [5]), and glucose mobilization [6], as
well as metabolic control [7] and cell division and apoptosis [8] and cell-cycle checkpoint control [9]. One
very important instance is that of Mitogen-Activated Protein Kinase (“MAPK”) cascades, which regulate
http://arxiv.org/abs/0704.0036v2
primary cellular activities such as proliferation, differentiation, and apoptosis [10–13] in eukaryotes from
yeast to humans.
MAPK cascades usually consist of three tiers of similar structures with multiple feedbacks [14–16].
Each individual level of the MAPK cascades is a futile cycle as depicted in Figure 1 with n = 2. Markevich
et al.’s paper [17] was the first to demonstrate the possibility of multistationarity at a single cascade
level, and motivated the need for analytical studies of the number of steady states. Conradi et al. studied
the existence of multistationarity in their paper [19], employing algorithms based on Feinberg’s chemical
reaction network theory (CRNT). (For more details on CRNT, see [31,32].) The CRNT algorithm confirms
multistationarity in a single level of MAPK cascades, and provides a set of kinetic constants which can
give rise to multistationarity. However, the CRNT algorithm only tests for the existence of multiple steady
states, and does not provide information regarding the precise number of steady states.
In [18], Gunawardena proposed a novel approach to the study of steady states of futile cycles. His
approach, which was focused in the question of determining the proportion of maximally phosphorylated
substrate, was developed under the simplifying quasi-steady state assumption that substrate is in excess.
Nonetheless, our study of multistationarity uses in a key manner the basic formalism in [18], even for the
case when substrate is not in excess.
In Section 2, we state our basic assumptions regarding the model. The basic formalism and background
for the approach is provided in Section 3. The main focus of this paper is on Section 4, where we derive
various bounds on the number of steady states of futile cycles of size n. The first result is a the lower
bound for the number of steady states. Currently available results on lower bounds, as in [29], can only
handle the case when quasi-steady state assumptions are valid; we substantially extend these results to the
fully general case by means of a perturbation argument which allows one to get around these restricted
assumptions. Another novel feature of our results in this paper is the derivation of an upper bound
of 2n − 1, valid for all kinetic constants. Models in molecular cell biology are characterized by a high
degree of uncertainly in parameters, hence such results valid over the entire parameter space are of special
significance. However, when more information of the parameters are available, sharper upper bounds can
obtained, see Theorems 4 and 5. We finally conclude our paper in Section 5 with a conjecture of an n+ 1
upper bound.
We remark that the results given here complement our work dealing with the dynamical behavior
of futile cycles. For the case n = 2, [25] showed that the model exhibits generic convergence to steady
states but no more complicated behavior, at least within restricted parameter ranges, while [27] showed a
persistence property (no species tends to be eliminated) for any possible parameter values. These papers
did not address the question of estimating the number of steady states. (An exception is the case n = 1,
for which uniqueness of steady states can be proved in several ways, and for which global convergence to
these unique equilibria holds [27].)
2 Model assumptions
Before presenting mathematical details, let us first discuss the basic biochemical assumptions that go into
the model. In general, phosphorylation and dephosphorylation can follow either distributive or processive
mechanism. In the processive mechanism, the kinase (phosphatase) facilitates two or more phosphorylations
(dephosphorylations) before the final product is released, whereas in the distributive mechanism, the kinase
(phosphatase) facilitates at most one phosphorylation (dephosphorylation) in each molecular encounter.
In the case of n = 2, a futile cycle that follows the processive mechanism can be represented by reactions
as follows:
S0 + E ←→ ES0 ←→ ES1 −→ S2 + E
S2 + F ←→ FS2 ←→ FS1 −→ S0 + F ;
and the distributive mechanism can be represented by reactions:
S0 + E ←→ ES0 −→ S1 + E ←→ ES1 −→ S2 + E
S2 + F ←→ FS2 −→ S1 + F ←→ FS1 −→ S0 + F.
Biological experiments have demonstrated that both dual phosphorylation and dephosphorylation in MAPK
are distributive, see [14–16]. In their paper [19], Conradi et al. showed mathematically that if either phos-
phorylation or dephosphorylation follows a processive mechanism, the steady state will be unique, which,
it is argued in [19], contradicts experimental observations. So, to get more interesting results, we assume
that both phosphorylations and dephosphorylations in the futile cycles follow the distributive mechanism.
Our structure of futile cycles in Figure 1 also implicitly assumes a sequential instead of a random
mechanism. By a sequential mechanism, we mean that the kinase phosphorylates the substrates in a
specific order, and the phosphatase works in the reversed order. This assumption dramatically reduces the
number of different phospho-forms and simplifies our analysis. In a special case when the kinetic constants
of each phosphorylation are the same and the kinetic constants of each dephosphorylation are the same,
the random mechanism can be easily included in the sequential case. Biologically, there are systems, for
instance the auto-phosphorylation of FGF-receptor-1, that have been experimentally shown to follow a
sequential mechanism [33].
To model the reactions, we assume mass action kinetics, which is standard in mathematical modeling
of molecular events in biology.
3 Mathematical formalism
In this section, we set up a mathematical framework for studying the steady states of futile cycles. Let us
first write down all the elementary chemical reactions in Figure 1:
S0 + E
koff0
kcat0
→ S1 + E
Sn−1 + E
konn−1
koffn−1
kcatn−1
→ Sn + E
S1 + F
loff0
lcat0
→ S0 + F
Sn + F
lonn−1
loffn−1
lcatn−1
→ Sn−1 + F
where kon0 , etc., are kinetic parameters for binding and unbinding, ES0 denotes the complex consisting of
the enzyme E and the substrate S0, and so forth. These reactions can be modeled by 3n + 3 differential-
algebraic equations according to mass action kinetics:
= −kon0s0e+ koff0c0 + lcat0d1
= −konisie+ koffi
ci + kcati−1ci−1 − loni−1sif + loffi−1
di + lcatidi+1, i = 1, . . . , n− 1
= konjsje− (koffj
+ kcatj )cj , j = 0, . . . , n− 1 (1)
= lonk−1skf − (loffk−1
+ lcatk−1)dk, k = 1, . . . , n,
together with the algebraic “conservation equations”:
Etot = e+
Ftot = f +
di, (2)
Stot =
The variables s0, . . . , sn, c0, . . . , cn−1, d1, . . . , dn, e, f stand for the concentrations of
S0, . . . , Sn, ES0, . . . , ESn−1, FS1, . . . , FSn, E, F
respectively. For each positive vector
κ =(kon0 , . . . , konn−1 , koff0
, . . . , koffn−1
, kcat0 , . . . , kcatn−1 ,
lon0 , . . . , lonn−1 , loff0
, . . . , loffn−1
, lcat0 , . . . , lcatn−1) ∈ R
(of “kinetic constants”) and each positive triple C = (Etot, Ftot, Stot), we have a different system Σ(κ, C).
Let us write the coordinates of a vector x ∈ R3n+3+ as:
x = (s0, . . . , sn, c0, . . . , cn−1, d1, . . . , dn, e, f),
and define a mapping
Φ : R3n+3+ × R
+ × R
+ −→ R
with components Φ1, . . . ,Φ3n+3 where the first 3n components are
Φ1(x, κ, C) = −kon0s0e+ koff0c0 + lcat0d1,
and so forth, listing the right hand sides of the equations (1), Φ3n+1 is
ci − Etot,
and similarly for Φ3n+2 and Φ3n+3, we use the remaining equations in (2).
For each κ, C, let us define a set
Z(κ, C) = {x |Φ(x, κ, C) = 0}.
Observe that, by definition, given x ∈ R3n+3+ , x is a positive steady state of Σ(κ, C) if and only if x ∈ Z(κ, C).
So, the mathematical statement of the central problem in this paper is to count the number of elements
in Z(κ, C). Our analysis will be greatly simplified by a preprocessing. Let us introduce a function
Ψ : R3n+3+ × R
+ × R
+ −→ R
with components Ψ1, . . . ,Ψ3n+3 defined as
Ψ1 = Φ1 +Φn+1
Ψi = Φi +Φn+i +Φ2n+i−1 +Ψi−1, i = 2, . . . , n
Ψj = Φj, j = n+ 1, . . . , 3n + 3.
It is easy to see that
Z(κ, C) = {x |Ψ(x, κ, C) = 0},
but now the first 3n equations are:
Ψi = lcati−1di − kcati−1ci−1 = 0, i = 1, . . . , n,
Ψn+1+j = konjsje− (koffj
+ kcatj )cj = 0, j = 0, . . . , n− 1
Ψ2n+k = lonk−1skf − (loffk−1
+ lcatk−1)dk = 0, k = 1, . . . , n,
and can be easily solved as:
si+1 = λi(e/f)si, (3)
di+1 =
fsi+1
, (5)
where
kcatiLMi
KMi lcati
, KMi =
kcati + koffi
, LMi =
lcati + loffi
, i = 0, . . . , n− 1. (6)
We may now express
0 si,
0 ci and
1 di in terms of s0, κ, e and f :
si = s0
1 + λ0
+ λ0λ1
+ · · ·+ λ0 · · ·λn−1
:= s0ϕ
ci = es0
+ · · ·+
λ0 · · ·λn−2
KMn−1
:= es0ϕ
, (7)
di = fs0
+ · · · +
λ0 · · ·λn−1
LMn−1
:= fs0ϕ
Although the equation Ψ = 0 represents 3n+3 equations with 3n+3 unknowns, next we will show that
it can be reduced to two equations with two unknowns, which have the same number of positive solutions
as Ψ = 0. Let us first define a set
S(κ, C) = {(u, v) ∈ R+ × R+ |G
1 (u, v) = 0, G
2 (u, v) = 0},
where G
1 , G
2 : R
+ −→ R are given by
1 (u, v) = v (uϕ
1(u)− ϕ
2(u)Etot/Ftot)− Etot/Ftot + u,
2 (u, v) = ϕ
0(u)ϕ
2 (u)v
2 + (ϕκ0 (u)− Stotϕ
2 (u) + Ftotuϕ
1 (u) + Ftotϕ
2 (u)) v − Stot.
The precise statement is as follows:
Lemma 1 There exists a mapping δ : R3n+3 −→ R2 such that, for each κ, C, the map δ restricted to
Z(κ, C) is a bijection between the sets Z(κ, C) and S(κ, C).
Proof. Let us define the mapping δ : R3n+3 −→ R2 as δ(x) = (e/f, s0), where
x = (s0, . . . , sn, c0, . . . , cn−1, d1, . . . , dn, e, f).
If we can show that δ induces a bijection between Z(κ, C) and S(κ, C), we are done.
First, we claim that δ(Z(κ, C)) ⊆ S(κ, C). Pick any x ∈ Z(κ, C), we have that x satisfies (3)-(5).
Moreover, Φ3n+2(x, κ, C) = 0 yields
Etot = e+ es0ϕ
and thus
1 + s0ϕ
1(e/f)
. (8)
Using Φ3n+1(x, κ, C) = 0 and Φ3n+2(x, κ, C) = 0, we get:
e(1 + s0ϕ
1(e/f))
f(1 + s0ϕ
2(e/f))
, (9)
which is G
1 (e/f, s0) = 0 after multiplying by 1 + s0ϕ
2(e/f) and rearranging terms.
To check that G
2 (e/f, s0) = 0, we start with Φ3n+3(x, κ, C) = 0, i.e.
Stot =
Using (7) and (8), this expression becomes
Stot = s0ϕ
Etots0ϕ
1(e/f)
1 + s0ϕ
1 (e/f)
Ftots0ϕ
2(e/f)
1 + s0ϕ
2(e/f)
= s0ϕ
eFtots0ϕ
1(e/f)
f(1 + s0ϕ
2(e/f))
Ftots0ϕ
2(e/f)
1 + s0ϕ
2(e/f)
where the last equality comes from (9).
After multiplying by 1 + s0ϕ
2 (e/f), and simplifying, we get
ϕκ0 (
)ϕκ2 (
)s20 +
)− Stotϕ
Ftotϕ
) + Ftotϕ
s0 − Stot = 0,
that is, G
2 (e/f, s0) = 0. since both G
1 (e/f, s0) and G
2 (e/f, s0) are zero, δ(x) ∈ S(κ, C).
Next, we will show that S(κ, C) ⊆ δ(Z(κ, C)). For any y = (u, v) ∈ S(κ, C), let the coordinates of x be
defined as:
s0 = v
si+1 = λiusi
1 + s0ϕ
1 (u)
di+1 =
fsi+1
for i = 0, . . . , n − 1. It is easy to see that the vector x = (s0, . . . , sn, c0, . . . , cn−1, d1, . . . , dn, e, f) satisfies
Φ1(x, κ, C) = 0, . . . ,Φ3n+1(x, κ, C) = 0. If Φ3n+2(x, κ, C) and Φ3n+3(x, κ, C) are also zero, then x is an
element of Z(κ, C) with δ(x) = y. Given the condition that G
i (u, v) = 0 (i = 1, 2) and u = e/f, v = s0,
we have G
1 (e/f, s0) = 0, and therefore (9) holds. Since
1 + s0ϕ
1(e/f)
in our construction, we have
Ftot = f(1 + s0ϕ
2(e/f)) = f +
To check Φ3n+3(x, κ, C) = 0, we use
2 (e/f, s0)
1 + s0ϕ
2(e/f)
2 (e/f, s0) = 0 and 1 + s0ϕ
2(e/f) > 0. Applying (7)-(9), we have
di = s0ϕ
0(e/f) +
eFtots0ϕ
1(e/f)
f(1 + s0ϕ
2 (e/f))
Ftots0ϕ
2(e/f)
1 + s0ϕ
2(e/f)
= Stot.
It remains for us to show that the map δ is one to one on Z(κ, C). Suppose that δ(x1) = δ(x2) = (u, v),
where
xi = (si0, . . . , s
0, . . . , c
n−1, d
1, . . . , d
i, f i), i = 1, 2.
By the definition of δ, we know that s10 = s
0 and e
1/f1 = e2/f2. Therefore, s1i = s
i for i = 0, . . . , n.
Equation (8) gives
1 + vϕκ1 (u)
= e2.
Thus, f1 = f2, and c1i = c
i , d
i+1 = d
i+1 for i = 0, . . . , n − 1 because of (3)-(5). Therefore, x
1 = x2, and δ
is one to one.
The above lemma ensures that the two sets Z(κ, C) and S(κ, C) have the same number of elements. From
now on, we will focus on S(κ, C), the set of positive solutions of equations G
1 (u, v) = 0, G
2 (u, v) = 0,
1 (u, v) = v (uϕ
1(u)− ϕ
2(u)Etot/Ftot)− Etot/Ftot + u = 0, (10)
2 (u, v) = ϕ
0(u)ϕ
2 (u)v
2 + (ϕκ0(u)− Stotϕ
2 (u) + Ftotuϕ
1 (u) + Ftotϕ
2(u)) v − Stot = 0. (11)
4 Number of positive steady states
4.1 Lower bound on the number of positive steady states
One approach to solving (10)-(11) is to view G
2 (u, v) as a quadratic polynomial in v. Since G
2 (u, 0) < 0,
equation (11) has a unique positive root, namely
−Hκ,C(u) +
Hκ,C(u)2 + 4Stotϕ
0(u)ϕ
2 (u)
2ϕκ0 (u)ϕ
2 (u)
, (12)
where
Hκ,C(u) = ϕκ0(u)− Stotϕ
2(u) + Ftotuϕ
1(u) + Ftotϕ
2(u). (13)
Substituting this expression for v into (10), and multiplying by ϕκ0 (u), we get
F κ,C(u) :=
−H̃κ,C(u) +
H̃κ,C(u)2 + 4Stotϕ
0 (u)ϕ
2 (u)
2ϕκ2 (u)
uϕκ1(u)−
ϕκ2(u)
ϕκ0(u)+uϕ
0 (u) = 0.
So, any (u, v) ∈ S(κ, C) should satisfy (12) and (14). On the other hand, any positive solution u of (14)
(notice that ϕκ0(u) > 0) and v given by (12) (always positive) provide a positive a solution of (10)-(11),
that is, (u, v) is an element in S(κ, C). Therefore, the number of positive solutions of (10)-(11) is the same
as the number of positive solutions of (12) and (14). But v is uniquely determined by u in (12), which
further simplifies the problem to one equation (14) with one unknown u. Based on this observation, we
have:
Theorem 1 For each positive numbers Stot, γ, there exist ε0 > 0 and κ ∈ R
+ such that the following
property holds. Pick any Etot, Ftot such that
Ftot = Etot/γ < ε0Stot/γ, (15)
then the system Σ(κ, C) with C = (Etot, Ftot, Stot) has at least n + 1 (n) positive steady states when n is
even (odd).
Proof. For each κ, γ, Stot, let us define two functions R+ × R+ −→ R as follows:
κ,γ,Stot(ε, u) = H
κ,(εStot,εStot/γ,Stot)(u) (16)
= ϕκ0(u)− Stotϕ
2(u) + ε
uϕκ1(u) + ε
ϕκ2(u),
κ,γ,Stot(ε, u) = F
κ,(εStot,εStot/γ,Stot)(u) (17)
κ,γ,Stot(ε, u) +
κ,γ,Stot(ε, u)2 + 4Stotϕ
0 (u)ϕ
2 (u)
2ϕκ2 (u)
(uϕκ1(u)− γϕ
2 (u))
− γϕκ0(u) + uϕ
0(u).
By Lemma 1 and the argument before this theorem, it is enough to show that there exist ε0 > 0 and κ ∈
+ such that for all ε ∈ (0, ε0), the equation F̃
κ,γ,Stot(ε, u) = 0 has at least n+1 (n) positive solutions
when n is even (odd). (Then, given Stot, γ, Etot, and Ftot satisfying (15), we let ε = Etot/Stot < ε0,
and apply the result.)
A straightforward computation shows that when ε = 0,
κ,γ,Stot(0, u) = Stot (uϕ
1(u)− γϕ
2(u))− γϕ
0 (u) + uϕ
= λ0 · · ·λn−1u
n+1 + λ0 · · ·λn−2
KMn−1
(1− γβn−1)− γλn−1
+ · · ·+ λ0 · · ·λi−2
KMi−1
(1− γβi−1)− γλi−1
ui + · · · (18)
(1− γβ0)− γλ0
u− γ,
where the λi’s and KMi ’s are defined as in (6), and βi = kcati/lcati . The polynomial F̃
κ,γ,Stot(0, u)
is of degree n + 1, so there are at most n + 1 positive roots. Notice that u = 0 is not a root because
κ,γ,Stot(0, u) = −γ < 0, which also implies that when n is odd, there can not be n + 1 positive roots.
Now fix any Stot and γ. We will construct a vector κ such that F̃
κ,γ,Stot(0, u) has n+ 1 distinct positive
roots when n is even.
Let us pick any n+ 1 positive real numbers u1 < · · · < un+1, such that their product is γ, and assume
(u− u1) · · · (u− un+1) = u
n+1 + anu
n + · · · + a1u+ a0, (19)
where a0 = −γ < 0 keeping in mind that ai’s are given. Our goal is to find a vector κ ∈ R
+ such that
(18) and (19) are the same. For each i = 0, . . . , n − 1, we pick λi = 1. Comparing the coefficients of u
in (18) and (19), we have:
(1 + a0βi) = ai+1 − a0 − 1. (20)
Let us pick KMi > 0 such that
(ai+1 − a0 − 1)− 1 < 0, then take
(ai+1 − a0 − 1)− 1
in order to satisfy (20). From the given
λ0, . . . , λn−1,KM0 , . . . ,KMn−1 , β0, . . . , βn−1,
we will find a vector
κ =(kon0 , . . . , konn−1 , koff0
, . . . , koffn−1
, kcat0 , . . . , kcatn−1 ,
lon0 , . . . , lonn−1 , loff0
, . . . , loffn−1
, lcat0 , . . . , lcatn−1) ∈ R
such that βi = kcati/lcati , i = 0, . . . , n− 1, and (6) holds. This vector κ will guarantee that F̃
κ,γ,Stot(0, u)
has n + 1 positive distinct roots. When n is odd, a similar construction will give a vector κ such that
κ,γ,Stot(0, u) has n positive roots and one negative root.
One construction of κ (given λi,KMi , βi, i = 0, . . . , n − 1) is as follows. For each i = 0, . . . , n − 1, we
start by defining:
LMi =
λiKMi
consistently with the definitions in (6). Then, we take
koni = 1, loni = 1,
koffi
= αiKMi , kcati = (1− αi)KMi , lcati =
1− αi
KMi , loffi
= LMi − lcati ,
where αi ∈ (0, 1) is chosen such that
loffi
= LMi −
1− αi
KMi > 0.
This κ satisfies βi = kcati/lcati , i = 0, . . . , n− 1, and (6).
In order to apply the Implicit Function Theorem, we now view the functions defined by formulas in
(16) and (17) as defined also for ε ≤ 0, i.e. as functions R×R+ −→ R. It is easy to see that F̃
κ,γ,Stot(ε, u)
is C1 on R × R+ because the polynomial under the square root sign in F̃
κ,γ,Stot(ε, u) is never zero. On
the other hand, since F̃
κ,γ,Stot(0, u) is a polynomial in u with distinct roots, ∂F̃
κ,γ,Stot
(0, ui) 6= 0. By the
Implicit Function Theorem, for each i = 1, . . . , n+ 1, there exist open intervals Ei containing 0, and open
intervals Ui containing ui, and a differentiable function
αi : Ei → Ui
such that αi(0) = ui, F̃
κ,γ,Stot(ε, αi(ε)) = 0 for all ε ∈ Ei, and the images αi(Ei)’s are non-overlapping. If
we take
(0, ε0) :=
(0,+∞),
then for any ε ∈ (0, ε0), we have {αi(ε)} as n+ 1 distinct positive roots of F̃
κ,γ,Stot(ε, u). The case when
n is odd can be proved similarly.
The above theorem shows that when Etot/Ftot is sufficiently small, it is always possible for the futile
cycle to have n + 1 (n) steady states when n is even (odd), by choosing appropriate kinetic constants κ.
We should notice that for arbitrary κ, the derivative of F̃ at each positive root may become zero, which
breaks down the perturbation argument. Here is an example to show that more conditions are needed:
n = 2, λ0 = 1, λ1 = 3, γ = 6, β0 = β1 = 1/12, K0 = 1/8, K1 = 1/2, Stot = 5,
we have that
κ,γ,Stot(0, u) = 3u3 − 12u2 + 15u− 6 = 3(u− 1)2(u− 2)
has a double root at u = 1. In this case, even for ε = 0.01, there is only one positive root of F̃
κ,γ,Stot(ε, u),
see Figure 2.
1 2 3
Figure 2: The plot of the function F̃
κ,γ,Stot(0.01, u) on [0, 3]. There is a unique positive real solution
around u = 2.14, the double root u = 1 of F̃
κ,γ,Stot(0, u) bifurcates to two complex roots with non-zero
imaginary parts.
However, the following lemma provides a sufficient condition for ∂F
κ,γ,Stot
(0, ū) 6= 0, for any positive
ū such that F̃
κ,γ,Stot(0, ū) = 0.
Lemma 2 For each positive numbers Stot, γ, and vector κ ∈ R
+ , if
1− γβj
holds for all j = 1, · · · , n − 1, then ∂F̃
κ,γ,Stot
(0, ū) 6= 0.
See Appendix for the proof.
Theorem 2 For each positive numbers Stot, γ, and vector κ ∈ R
+ satisfying condition (21), there exists
ε1 > 0 such that for any Ftot, Etot satisfying Ftot = Etot/γ < ε1Stot/γ, the number of positive steady
states of system Σ(κ, C) is greater or equal to the number of (positive) roots of F̃
κ,γ,Stot(0, u).
Proof. Suppose that F̃
κ,γ,Stot(0, u) has m roots: ū1, . . . , ūm. Applying Lemma 2, we have
κ,γ,Stot
(0, ūk) 6= 0, k = 1, . . . ,m.
By the perturbation arguments as in Theorem 1, we have that there exists ε1 > 0 such that F̃
κ,γ,Stot(ε, u)
has at least m roots for all 0 < ε < ε1.
The above result depends heavily on a perturbation argument, which only works when Etot/Ftot is
sufficiently small. In the next section, we will give an upper bound of the number of steady states with no
restrictions on Etot/Ftot, and independent of κ and C.
4.2 Upper bound on the number of steady states
Theorem 3 For each κ, C, the system Σ(κ, C) has at most 2n− 1 positive steady states.
Proof. An alternative approach to solving (10)-(11) is to first eliminate v from (10) instead of from (11),
Etot/Ftot − u
uϕκ1(u)− (Etot/Ftot)ϕ
, (22)
when uϕκ1(u) − (Etot/Ftot)ϕ
2 (u) 6= 0. Then, we substitute (22) into (11), and multiply by (uϕ
1(u) −
(Etot/Ftot)ϕ
2 (u))
2 to get:
P κ,C(u) := ϕκ0ϕ
+ (ϕκ0 − Stotϕ
2 + Ftotuϕ
1 + Ftotϕ
uϕκ1 −
− Stot
uϕκ1 −
= 0. (23)
Therefore, if uϕκ1(u) − (Etot/Ftot)ϕ
2 (u) 6= 0, the number of positive solutions of (10)-(11) is no greater
than the number of positive roots of P κ,C(u).
In the special case when uϕκ1(u) − (Etot/Ftot)ϕ
2(u) = 0, by (10), we must have u = Etot/Ftot, and
thus ϕκ1 (Etot/Ftot) = ϕ
2 (Etot/Ftot). Substituting into (11), we get a unique v defined as in (12) with
u = Etot/Ftot. But notice that in this case u = Etot/Ftot is also a root of P
κ,C(u), so also in this case
the number of positive solutions to (10)-(11) is no greater than the number of positive roots of P κ,C(u).
It is easy to see that P κ,C(u) is divisible by u. Consider the polynomial Qκ,C(u) := P κ,C(u)/u of
degree 2n + 1. We will first show that Qκ,C(u) has no more than 2n positive roots, then we will prove by
contradiction that 2n distinct positive roots can not be achieved.
It is easy to see that the coefficient of u2n+1 is
(λ0 · · · λn−1)
LMn−1
and the constant term is
FtotKM0
So the polynomial Qκ,C(u) has at least one negative root, and thus has no more than 2n positive roots.
Suppose that S(κ, C) has cardinality 2n, then Qκ,C(u) must have 2n distinct positive roots, and each of
them has multiplicity one. Let us denote the roots as u1, . . . , u2n in ascending order. We claim that none
of them equals Etot/Ftot. If so, we would have ϕ
1(Etot/Ftot) = ϕ
2 (Etot/Ftot), and Etot/Ftot would
be a double root of Qκ,C(u), contradiction.
Since Qκ,C(0) > 0, Qκ,C(u) is positive on intervals
I0 = (0, u1), I1 = (u2, u3), . . . , In−1 = (u2n−2, u2n−1), In = (u2n,∞),
and negative on intervals
J1 = (u1, u2), . . . , Jn = (u2n−1, u2n).
As remarked earlier, ϕκ1 (Etot/Ftot) 6= ϕ
2 (Etot/Ftot), the polynomial Q
κ,C(u) evaluated at Etot/Ftot
is negative, and therefore, Etot/Ftot belongs to one of the J intervals, say Js = (u2s−1, u2s), for some
s ∈ {1, . . . , n} .
On the other hand, the denominator of v in (22), denoted as B(u), is a polynomial of degree n and
divisible by u. If B(u) has no positive root, then it does not change sign on the positive axis of u. But v
changes sign when u passes Etot/Ftot, thus v2s−1 and v2s have opposite signs, and one of (u2s−1, v2s−1)
and (u2s, v2s) is not a solution to (10)-(11), which contradicts the fact that both are in S(κ, C).
Otherwise, there exists a positive root ū of B(u) such that there is no other positive root of B(u)
between ū and Etot/Ftot. Plugging ū into Q
κ,C(u), we see that Qκ,C(ū) is always positive, therefore, ū
belongs to one of the I intervals, say It = (u2t, u2t+1) for some t ∈ {0, . . . , n}. There are two cases:
1. Etot/Ftot < ū. We have
u2s−1 < Etot/Ftot < u2t < ū.
Notice that v changes sign when u passes Etot/Ftot, so the corresponding v2s−1 and v2t have different
signs, and either (u2s−1, v2s−1) /∈ S(κ, C) or (u2t, v2t) /∈ S(κ, C), contradiction.
2. Etot/Ftot > ū. We have
ū < u2t+1 < Etot/Ftot < u2s.
Since v changes sign when u passes Etot/Ftot, so the corresponding v2t+1 and v2s have different
signs, and either (u2t+1, v2t+1) /∈ S(κ, C) or (u2s, v2s) /∈ S(κ, C), contradiction.
Therefore, Σ(κ, C) has at most 2n− 1 steady states.
4.3 Fine-tuned upper bounds
In the previous section, we have seen that any (u, v) ∈ S(κ, C), u 6= Etot/Ftot must satisfy (22)-(23), but
not all solutions of (22)-(23) are elements in S(κ, C). Suppose that (u, v) is a solution of (22)-(23), it is
in S(κ, C) if and only if u, v > 0. In some special cases, for example, when the enzyme is in excess, or the
substrate is in excess, we could count the number of solutions of (22)-(23) which are not in S(κ, C) to get
a better upper bound.
The following is a standard result on continuity of roots; see for instance Lemma A.4.1 in [30]:
Lemma 3 Let g(z) = zn + a1z
n−1 + · · ·+ an be a polynomial of degree n and complex coefficients having
distinct roots
λ1, . . . , λq,
with multiplicities
n1 + · · ·+ nq = n,
respectively. Given any small enough δ > 0 there exists a ε > 0 so that, if
h(z) = zn + b1z
n−1 + · · ·+ bn, |ai − bi| < ε for i = 1, . . . , n,
then h has precisely ni roots in Bδ(λi) for each i = 1, . . . , q.
Theorem 4 For each γ > 0 and κ ∈ R6n−6+ such that ϕ
1 (γ) 6= ϕ
2 (γ), and each Stot > 0, there exists
ε2 > 0 such that for all positive numbers Etot, Ftot satisfying Ftot = Etot/γ < ε2Stot/γ, the system
Σ(κ, C) has at most n+ 1 positive steady states.
Proof. Let us define a function R+ × C −→ C as follows:
κ,γ,Stot(ε, u) = Q
κ,(εStot,εStot/γ,Stot)(u),
and a set B
κ,γ,Stot(ε) consisting of the roots of Q̃
κ,γ,Stot(ε, u) which are not positive or the corresponding
v’s determined by u’s as in (22) are not positive, Since Q̃
κ,γ,Stot(ε, u) is a polynomial of degree 2n + 1,
if we can show that there exists ε2 > 0 such that for any ε ∈ (0, ε2), Q̃
κ,γ,Stot(ε, u) has at least n roots
counting multiplicities that are in B
κ,γ,Stot(ε), then we are done.
In order to apply Lemma 3, we regard the function Q̃
κ,γ,Stot as defined on R× C. At ε = 0:
κ,γ,Stot(0, u) = [ϕκ0ϕ
2(γ − u)
2 + (ϕκ0 − Stotϕ
2 )(uϕ
1 − γϕ
2)(γ − u)− Stot(uϕ
1 − γϕ
= [ϕκ0ϕ
2(γ − u)
2 + ϕκ0(uϕ
1 − γϕ
2)(γ − u)− Stotϕ
1 − γϕ
2)(γ − u)− Stot(uϕ
1 − γϕ
= [ϕκ0 (γ − u)u(ϕ
1 − ϕ
2) + Stotu(uϕ
1 − γϕ
2 )(ϕ
2 − ϕ
1)]/u
= (ϕκ2 − ϕ
1)(uϕ
0 + Stot(uϕ
1 − γϕ
2 )− γϕ
= (ϕκ2 − ϕ
κ,γ,Stot(0, u)
Let us denote the distinct roots of Q̃
κ,γ,Stot(0, u)/u as
u1, . . . , uq,
with multiplicities
n1 + · · ·+ nq = 2n+ 1,
and the roots of ϕκ1 − ϕ
u1, . . . , up, p ≤ q,
with multiplicities
m1 + · · ·+mp = n, ni ≥ mi, for i = 1, . . . , p.
For each i = 1, . . . , p, if ui is real and positive, then there are two cases (ui 6= γ as ϕ
1(γ) 6= ϕ
2(γ)):
1. ui > γ. We have
1(ui)− γϕ
2 (ui) > γ(ϕ
1 (ui)− ϕ
2(ui)) = 0.
2. ui < γ. We have
1(ui)− γϕ
2 (ui) < γ(ϕ
1 (ui)− ϕ
2(ui)) = 0.
In both cases, uiϕ
1(ui)− γϕ
2 (ui) and γ − ui have opposite signs, i.e.
1 (ui)− γϕ
2(ui))(γ − ui) < 0.
Let us pick δ > 0 small enough such that the following conditions hold:
1. For all i = 1, . . . , p, if ui is not real, then Bδ(ui) has no intersection with the real axis.
2. For all i = 1, . . . , p, if ui is real and positive, the following inequality holds for any real u ∈ Bδ(ui):
(uϕκ1(u)− γϕ
2 (u))(γ − u) < 0. (24)
3. For all i = 1, . . . , p, if ui is real and negative, then Bδ(ui) has no intersection with the imaginary
axis.
4. Bδ(uj)
Bδ(uk) = ∅ for all j 6= k = 1, . . . , q.
By Lemma 3, there exists ε3 > 0 such that for all ε ∈ (0, ε3), the polynomial Q̃
κ,γ,Stot(ε, u)/u has exactly
nj roots in each Bδ(uj), j = 1, . . . , q, denoted by u
j (ε), k = 1, . . . , nj .
We pick one such ε, and we claim that none of the roots in Bδ(ui), i = 1, . . . , p with the v defined as
in (22) will be an element in S. If so, we are done, since there are
1 ni ≥
1 mi = n such roots, of
κ,γ,Stot(ε, u) which are in B
κ,γ,Stot(ε).
For each i = 1, . . . , p, there are two cases:
1. ui is not real. Then condition 1 guarantees that u
i (ε) is not real for each k = 1, . . . , ni, and thus is
κ,γ,Stot(ε).
2. ui is real and positive. Pick any root u
i (ε) ∈ Bδ(ui), k = 1, . . . , ni, the corresponding v
i (ε) equals
γ − uki (ε)
uki (ε)ϕ
i (ε)) − γϕ
i (ε))
) < 0
followed from (24). So (uki (ε), v
i (ε)) /∈ S(κ, C), and u
i (ε) ∈ B
κ,γ,Stot(ε).
3. ui is real and negative. By condition 1 and 3, u
i (ε) is not positive for all k = 1, . . . , ni.
The next theorem considers the case when enzyme is in excess:
Theorem 5 For each γ > 0, κ ∈ R6n−6+ such that ϕ
1 (γ) 6= ϕ
2(γ), and each Etot > 0, there exists ε3 > 0
such that for all positive numbers Ftot, Stot satisfying Ftot = Etot/γ > Stot/(ε3γ), the system Σ(κ, C) has
at most one positive steady state.
Proof. For each γ > 0, κ ∈ R6n−6+ such that ϕ
1 (γ) 6= ϕ
2 (γ), and each Etot > 0, we define a function
R+ × C −→ C as follows:
κ,γ,Etot(ε, u) = Q
κ,(Etot,Etot/γ,εEtot)(u).
Let us define the set C
κ,γ,Etot(ε) as the set of roots of Q̄
κ,γ,Etot(ε, u) which are not positive or the
corresponding v’s determined by u’s as in (22) are not positive. If we can show that there exists ε3 > 0
such that for any ε ∈ (0, ε3) there is at most one positive root of Q̄
κ,γ,Etot(ε, u) that is not in C
κ,γ,Etot(ε),
we are done.
In order to apply Lemma 3, we now view the function Q̄
κ,γ,Etot as defined on R× C. At ε = 0:
κ,γ,Etot(0, u) = (γ − u)
(γ − u)ϕκ0ϕ
ϕκ0 +
uϕκ1 +
(uϕκ1 − γϕ
:= (γ − u)R
κ,γ,Etot(u).
Let us denote the distinct roots of Q̄
κ,γ,Etot(0, u)/u as
u1(= γ), u2, . . . , uq,
with multiplicities
n1 + · · ·+ nq = 2n+ 1,
and u2, . . . , uq are the roots of R
κ,γ,Etot(u) other than γ.
Since ϕκ1(γ) 6= ϕ
2(γ), R
κ,γ,Etot(u) is not divisible by u− γ, and thus n1 = 1.
For each i = 2, . . . , q, we have
(γ − ui)ϕ
0(ui)ϕ
2 (ui) = −
ϕκ0(ui) +
1(ui) +
ϕκ2(ui)
1 (ui)− γϕ
2(ui)) .
If ui > 0, then ϕ
0(ui)ϕ
2 (ui) and ϕ
0 (ui) +
1 (ui) +
ϕκ2 (ui) are both positive. Since uiϕ
1(ui)−
γϕκ2(ui) and γ − ui are non zero, uiϕ
1 (ui)− γϕ
2(ui) and γ − ui must have opposite signs, that is
1 (ui)− γϕ
2(ui))(γ − ui) < 0.
Let us pick δ > 0 small enough such that the following conditions hold for all i = 2, . . . , q:
1. If ui is not real, then Bδ(ui) has no intersection with the real axis.
2. If ui is real and positive, then for any real u ∈ Bδ(ui), the following inequality holds:
(uϕκ1(u)− γϕ
2 (u))(γ − u) < 0. (25)
3. If ui is real and negative, then Bδ(ui) has no intersection with the imaginary axis.
4. Bδ(uj)
Bδ(uk) = ∅ for all i 6= k = 2, . . . , q.
By Lemma 3, there exists ε3 > 0 such that for all ε ∈ (0, ε3), the polynomial Q̄
κ,γ,Etot(ε, u) has exactly
nj roots in each Bδ(uj), j = 1, . . . , q, denoted by u
j (ε), k = 1, . . . , nj .
We pick one such ε, and if we can show that all of the roots in Bδ(ui), i = 2, . . . , q are in C
κ,γ,Etot(ε),
then we are done, since the only roots that may not be in C
κ,γ,Etot(ε) are the roots in Bδ(u1), and there
is one root in Bδ(u1).
For each i = 2, . . . , p, there are three cases:
1. ui is not real. Then condition 1 guarantees that u
i (ε) is not real for all k = 1, . . . , ni.
2. ui is real and positive. Pick any root u
i (ε), k = 1, . . . , ni, the corresponding v
i (ε) equals
γ − uki (ε)
uki (ε)ϕ
i (ε))− γϕ
i (ε))
So, uki (ε) is in C
κ,γ,Etot(ε).
3. ui is real and negative. By conditions 1 and 3, u
i (ε) is not positive for all k = 1, . . . , ni.
5 Conclusions and discussions
Here we have set up a mathematical model for multisite phosphorylation-dephosphorylation cycles of size
n, and studied the number of positive steady states based on this model. We reformulated the question
of number of positive steady states to question of the number of positive roots of certain polynomials,
through which we also applied perturbation techniques. Our theoretical results depend on the assumption
of mass action kinetics and distributive sequential mechanism, which are customary in the study of multisite
phosphorylation and dephosphorylation.
An upper bound of 2n−1 steady states is obtained for arbitrary parameter combinations. Biologically,
when the substrate concentration greatly exceeds that of the enzyme, there are at most n + 1 (n) steady
states if n is even (odd). And this upper bound can be achieved under proper kinetic conditions, see
Theorem 1 for the construction. On the other extreme, when the enzyme is in excess, there is a unique
steady state.
As a special case of n = 2, which can be applied to a single level of MAPK cascades. Our results
guarantees that there are no more than three steady states, consistent with numerical simulations in [17].
We notice that there is an apparent gap between the upper bound 2n−1 and the upper bound of n+1
(n) if n is even (odd) when the substrate is in excess. If we think the ratio Etot/Ftot as a parameter ε,
then when ε≪ 1, there are at most n+1 (n) steady states when n is even (odd), which coincides with the
largest possible lower bound. When ε ≫ 1, there is a unique steady state. If the number of steady states
changes “continuously” with respect to ε, then we do not expect the number of steady states to exceed
n + 1 (n) if n is even (odd). So a natural conjecture would be that the number of steady states never
exceed n+ 1 under any conditions.
6 Acknowledgment
We thank Jeremy Gunawardena for very helpful discussions.
7 Appendix
proof of Lemma 2: Recall that (dropping the u’s in ϕκi , i = 0, 1, 2)
κ,γ,Stot(0, u) = uϕκ0 + Stot(uϕ
1 − γϕ
2)− γϕ
κ,γ,Stot
(0, u) = ϕκ0 + Stot(uϕ
1 − γϕ
′ − (γ − u)(ϕκ0 )
Since F̃
κ,γ,Stot(0, ū) = 0,
Stot(ūϕ
1 − γϕ
2 ) = (γ − ū)ϕ
that is,
γ − ū =
Stot(ūϕ
1 − γϕ
Therefore,
κ,γ,Stot
(0, ū) = ϕκ0 + Stot(uϕ
1 − γϕ
Stot(ūϕ
1 − γϕ
(ϕκ0)
= ϕκ0 +
ϕκ0(uϕ
1 − γϕ
′ − (ūϕκ1 − γϕ
= ϕκ0 +
((1 + λ0ū+ λ0λ1ū
2 + · · · + λ0 · · ·λn−1ū
(1− γβ0) + 2
(1− γβ1)ū+ · · ·+ n
λ0 · · · λn−2
KMn−1
(1− γβn−1)ū
λ0 + 2λ0λ1ū+ · · · + nλ0 · · ·λn−1ū
(1− γβ0)ū+
(1− γβ1)ū
2 + · · ·+
λ0 · · ·λn−2
KMn−1
(1− γβn−1)ū
= ϕκ0 +
λ0 · · ·λi−1ū
(j + 1− i)
λ0 · · ·λj−1
(1− γβj)ū
λ0 · · ·λi−1ū
λ0 · · ·λj−1ū
+ Stot
λ0 · · ·λi−1ū
(j + 1− i)
λ0 · · ·λj−1
(1− γβj)ū
λ0 · · ·λi−1ū
λ0 · · ·λn−1ū
λ0 · · ·λj−1ū
1 + Stot(j + 1− i)
1 − γβj
where the product λ0 · · ·λ−1 is defined to be 1 for the convenience of notation.
Because of (21),
(j + 1− i)
1− γβj
so we have ∂F̃
κ,γ,Stot
(0, ū) > 0.
References
[1] M. Samoilov, S. Plyasunov, and A.P. Arkin. Stochastic amplification and signaling in enzymatic futile
cycles through noise-induced bistability with oscillations. Proc Natl Acad Sci USA, 102:2310–2315,
2005.
[2] S. Donovan, K.M. Shannon, and G. Bollag. GTPase activating proteins: critical regulators of intra-
cellular signaling. Biochim. Biophys Acta, 1602:23–45, 2002.
[3] J.J. Bijlsma and E.A. Groisman. Making informed decisions: regulatory interactions between two-
component systems. Trends Microbiol, 11:359–366, 2003.
[4] A.D. Grossman. Genetic networks controlling the initiation of sporulation and the development of
genetic competence in bacillus subtilis. Annu Rev Genet., 29:477–508, 1995.
[5] H. Chen, B.W. Bernstein, and J.R. Bamburg. Regulating actin filament dynamics in vivo. Trends
Biochem. Sci., 25:19–23, 2000.
[6] G. Karp. Cell and Molecular Biology. Wiley, 2002.
[7] L. Stryer. Biochemistry. Freeman, 1995.
[8] M.L. Sulis and R. Parsons. PTEN: from pathology to biology. Trends Cell Biol., 13:478–483, 2003.
[9] D.J. Lew and D.J. Burke. The spindle assembly and spindle position checkpoints. Annu Rev Genet.,
37:251–282, 2003.
[10] A.R. Asthagiri and D.A. Lauffenburger. A computational study of feedback effects on signal dynamics
in a mitogen-activated protein kinase (MAPK) pathway model. Biotechnol. Prog., 17:227–239, 2001.
[11] L. Chang and M. Karin. Mammalian MAP kinase signaling cascades. Nature, 410:37–40, 2001.
[12] C-Y.F. Huang and J.E. Ferrell Jr. Ultrasensitivity in the mitogen-activated protein kinase cascade.
Proc. Natl. Acad. Sci. USA, 93:10078–10083, 1996.
[13] C. Widmann, G. Spencer, M.B. Jarpe, and G.L. Johnson. Mitogen-activated protein kinase: Conser-
vation of a three-kinase module from yeast to human. Physiol. Rev., 79:143–180, 1999.
[14] W.R. Burack and T.W. Sturgill. The activating dual phosphorylation of MAPK by MEK is nonpro-
cessive. Biochemistry, 36:5929–5933, 1997.
[15] J.E. Ferrell and R.R. Bhatt. Mechanistic studies of the dual phosphorylation of mitogen-activated
protein kinase. J. Biol. Chem., 272:19008–19016, 1997.
[16] Y. Zhao and Z.Y. Zhang. The mechanism of dephosphorylation of extracellular signal-regulated kinase
2 by mitogen-activated protein kinase phosphatase 3. J. Biol. Chem., 276:32382–32391, 2001.
[17] N.I. Markevich, J.B. Hoek, and B.N. Kholodenko. Signaling switches and bistability arising from
multisite phosphorylation in protein kinase cascades. J. Cell Biol., 164:353–359, 2004.
[18] J. Gunawardena. Multisite protein phosphorylation makes a good threshold but can be a poor switch.
Proc. Natl. Acad. Sci., 102:14617–14622, 2005.
[19] C. Conradi, J. Saez-Rodriguez, E.-D. Gilles, and J. Raisch. Using chemical reaction network theory
to discard a kinetic mechanism hypothesis. In Proc. FOSBE 2005 (Foundations of Systems Biology
in Engineering), Santa Barbara, Aug. 2005, pages 325–328. 2005.
[20] T.S. Gardner, C.R. Cantor, and J.J. Collins. Construction of a genetic toggle switch in Escherichia
coli. Nature, 403:339–342, 2000.
[21] D. Angeli, J. E. Ferrell, and E.D. Sontag. Detection of multistability, bifurcations, and hysteresis in a
large class of biological positive-feedback systems. Proc Natl Acad Sci USA, 101(7):1822–1827, 2004.
[22] E.E. Sel’kov. Stabilization of energy charge, generation of oscillation and multiple steady states in
energy metabolism as a result of purely stoichiometric regulation. Eur. J. Biochem, 59(1):151–157,
1975.
[23] W. Sha, J. Moore, K. Chen, A.D. Lassaletta, C.S. Yi, J.J. Tyson, and J.C. Sible. Hysteresis drives
cell-cycle transitions in Xenopus laevis egg extracts. Proc. Natl. Acad. Sci., 100:975–980, 2003.
[24] F. Ortega, J. Garcés, F. Mas, B.N. Kholodenko, and M. Cascante. Bistability from double phos-
phorylation in signal transduction: Kinetic and structural requirements. FEBS J, 273:3915–3926,
2006.
[25] L. Wang and E.D. Sontag. Singularly perturbed monotone systems and an application to double
phosphorylation cycles. (Submitted to IEEE Transactions Autom. Control, Special Issue on Systems
Biology, January 2007, Preprint version in arXiv math.OC/0701575, 20 Jan 2007), 2007.
[26] L. Wang and E.D. Sontag. Almost global convergence in singular perturbations of strongly monotone
systems. In Positive Systems, pages 415–422. Springer-Verlag, Berlin/Heidelberg, 2006. (Lecture Notes
in Control and Information Sciences Volume 341, Proceedings of the second Multidisciplinary Inter-
national Symposium on Positive Systems: Theory and Applications (POSTA 06) Grenoble, France).
[27] D. Angeli, P. de Leenheer, and E.D. Sontag. A Petri net approach to the study of persistence in
chemical reaction networks. (Submitted to Mathematical Biosciences, also arXiv q-bio.MN/068019v2,
10 Aug 2006), 2007.
[28] D. Angeli and E.D. Sontag. Translation-invariant monotone systems, and a global convergence result
for enzymatic futile cycles. Nonlinear Analysis Series B: Real World Applications, to appear, 2007.
[29] M Thompson and J. Gunawardena. Multi-bit information storage by multisite phosphorylation. Sub-
mitted, 2007.
[30] E.D. Sontag. Mathematical Control Theory. Deterministic Finite-Dimensional Systems, volume 6 of
Texts in Applied Mathematics. Springer-Verlag, New York, second edition, 1998.
[31] M. Feinberg. Chemical reaction network structure and the stability of complex isothermal reactors:
II. Multiple steady states for networks of deficiency one. Chem. Eng. Sci., 43,1–25, 1988.
[32] P. Ellison, M. Feinberg. How catalytic mechanisms reveal themselves in multiple steady-state data: I.
Basic principles. J. Symbolic Comput., 33, 275–305, 2002.
[33] C.M. Furdui, E.D. Lew, J. Schlessinger, K.S. Anderson. Autophosphorylation of FGFR1 kinase is
mediated by a sequential and precisely ordered reaction. Molecular Cell, 21, 711–717, 2006.
http://arxiv.org/abs/math/0701575
	Introduction
	Model assumptions
	Mathematical formalism
	Number of positive steady states
	Lower bound on the number of positive steady states
	Upper bound on the number of steady states
	Fine-tuned upper bounds
	Conclusions and discussions
	Acknowledgment
	Appendix
ABSTRACT
  The multisite phosphorylation-dephosphorylation cycle is a motif repeatedly
used in cell signaling. This motif itself can generate a variety of dynamic
behaviors like bistability and ultrasensitivity without direct positive
feedbacks. In this paper, we study the number of positive steady states of a
general multisite phosphorylation-dephosphorylation cycle, and how the number
of positive steady states varies by changing the biological parameters. We show
analytically that (1) for some parameter ranges, there are at least n+1 (if n
is even) or n (if n is odd) steady states; (2) there never are more than 2n-1
steady states (in particular, this implies that for n=2, including single
levels of MAPK cascades, there are at most three steady states); (3) for
parameters near the standard Michaelis-Menten quasi-steady state conditions,
there are at most n+1 steady states; and (4) for parameters far from the
standard Michaelis-Menten quasi-steady state conditions, there is at most one
steady state.

<|endoftext|><|startoftext|>
Introduction 
The discrete dipole approximation (DDA) is a general method to calculate scattering and 
absorption of electromagnetic waves by particles of arbitrary geometry and composition. The 
DDA was first proposed by Purcell and Pennypacker [1] and was reviewed by Draine and 
Flatau in 1994 [2]. A recent review [3] describes the current state of the DDA and its 
historical development. It also explains the equivalence of the DDA and methods based on the 
volume integral equation formulation. The reader is referred to this review for an in-depth 
discussion of the DDA. 
There are a number of computer programs based on the DDA, some of which were 
recently compared by Penttila et al. [4]. The most popular among them is DDSCAT [5], 
which has been widely used by many researchers for more than 10 years. In this paper we 
present a new program, Amsterdam DDA (ADDA), which recently has been put in the public 
domain.1 Its main distinctive feature is the ability to parallelize a single DDA simulation over 
a cluster of computers, which allows simulation of light scattering by very large particles. 
This is demonstrated for a number of test cases in this manuscript. Validation of ADDA by 
simulating light scattering by wavelength-sized particles and comparing it with other DDA 
programs was reported elsewhere [4]. 
Section 2 describes in detail the ADDA computer code, showing its advantages 
compared to other codes. A number of numerical tests are shown in Section 3, demonstrating 
that DDA is actually capable processing large particles, and showing the current capabilities 
of ADDA. Results of these simulations are discussed in Section 4; the errors are compared 
with previous results for much smaller particles. Section 5 concludes the manuscript and 
discusses possible future work. 
2 ADDA computer code 
ADDA has been developed over a period of more than 10 years at the University of 
Amsterdam [6-8]. Its main feature (distinctive from other DDA codes) has always been the 
capability of running on a cluster of computers, parallelizing a single DDA computation, in 
contrast with e.g. DDSCAT [5] that allows farming several instantiations of a DDA 
simulation to different processors. This allows using a practically unlimited number of 
dipoles, since ADDA is not limited by the memory of a single computer [8,9]. Recently the 
overall performance of the code has been improved significantly, together with some 
optimizations specifically for single-processor mode. ADDA's source code and 
documentation is freely available.
Most of ADDA is written in ANSI C, which ensures wide portability on the source-code 
level. The code is fully operational under Linux and, in sequential mode, on Windows based 
systems. The parallelization over multiple processors is based on a geometric decomposition 
of the particle and the single-program-multiple-data paradigm of parallel computing. The 
code is written for distributed memory systems using the message passing interface (MPI).2 
Note that ADDA should in principle also run on shared memory computers, but so far this 
was not explicitly tested. The fast Fourier transform (FFT) used for the matrix-vector products 
in the iterative solver is performed either using routines by Temperton [10] or the more 
advanced package “Fastest Fourier transform in the West” (FFTW) [11]. The latter is 
generally considerably faster but requires a separate package installation. 
ADDA has four options implemented for dipole polarizabilities: Clausius-Mossotti [1], 
radiative reaction correction [12], lattice dispersion relation (LDR) [13], and corrected LDR 
[14]. It includes four iterative methods: conjugate gradient applied to normalized equation 
with minimization of residual norm (CGNR) [15], Bi-CG stabilized (Bi-CGSTAB) [15], Bi-
                                                 
1 http://www.science.uva.nl/research/scs/Software/adda/
2 http://www.mpi-forum.org
http://www.science.uva.nl/research/scs/Software/adda/
http://www.mpi-forum.org/
CG [16], and quasi minimal residual (QMR) [16]. The last two iterative methods employ the 
complex-symmetric property of the DDA interaction matrix to halve the calculation time [16]. 
The default stopping criterion of the iterative method in ADDA is the relative norm of the 
residual ε, which must be . 510−<
The usual formulation of DDA can be written as [2,3]:  
jijii EPGP =− ∑
−α , (1)
where iα  is the tensor of dipole polarizability,  is incident electric field, 
iE ijG  is the free-
space Green’s tensor (complex symmetric), and Pi is the unknown dipole polarization. If the 
polarizability tensor is diagonal for all dipoles then there always exists a iβ  such that 
iii αββ = , i.e. ii αβ = . Moreover, iβ  is then complex symmetric, and so is the matrix with 
elements 
A , (2)
where I  is an identity tensor. A  is the interaction matrix that is used in ADDA, i.e. the 
following system of linear equations is solved: 
jjijii
jij ExGxxA βββ =−= ∑∑
, (3)
where iii Px
1−= β  is a new unknown vector. Eq. (3) is equivalent to the use of Jacobi-
preconditioning [15] together with keeping the interaction matrix complex-symmetric (for any 
distribution of refractive index inside the scatterer and for any of the supported polarization 
prescriptions). We have not studied, however, whether this Jacobi-preconditioning improves 
the convergence of the iterative solver. Flatau showed [17] that in some test cases it helps, 
while in others there is no improvement. It is important to note also that DDA is not limited to 
diagonal or symmetric polarizabilities. Any other tensor may be used, but then the interaction 
matrix is not complex-symmetric; hence, QMR and Bi-CG are less efficient. 
ADDA can perform orientation averaging of the scattering quantities over three Euler 
angles (α, β, γ) of the particle orientation. Averaging over the angle α is done with a single 
computation of internal fields by computing scattering in different scattering planes, which is 
comparably fast. Averaging over the other two Euler angles is done by independent DDA 
simulations. The averaging itself is performed using a Romberg integration [18], which may 
be used adaptively (i.e. automatically simulating the required number of different orientations 
to reach a prescribed accuracy) but limits the possible number of values for each orientation 
angle to be , where n is an integer. Moreover, symmetries of the scatterer may be used 
to decrease the intervals of Euler angles, over which to average, and hence accelerate the 
calculation. This feature of ADDA was tested in a recent benchmark study [4]. 
12 +n
Other features of ADDA include computation of scattering by a tightly focused 
Gaussian beams [6], a checkpoint system to allow for long runs on queuing systems that 
enforce upper limits on wall clock time for execution as is usually the case on massively 
parallel supercomputers, calculation of radiation forces on each of the dipoles [19], use of 
rotational symmetry of the scatterer to halve the simulation time, and an extended command 
line interface. Some other features, such as applicability to anisotropic scatterers and a large 
set of predefined shapes, are planned to be implemented in the near future. 
There are several factors that allow ADDA's performance to compare favorably with 
other codes, which was shown in a benchmark study by Penttila et al. [4]. First of all, the 
FFTW 3 package that is used automatically adapts itself to optimally perform on any 
particular hardware. Moreover, ADDA does not perform complete 3D FFT transforms in one 
run, but decomposes them into a set of 1D transforms with data transposition in between. This 
allows employing the fact that input data for the forward transform contains many zeros, and 
0 20 40 60 80 100 120 140 160
ε ∈(10−5,10−3)
Size parameter x
ε =10−5
70 GB
Fig. 1. Current capabilities of the ADDA for spheres with different x and m. The striped region 
corresponds to full convergence and densely hatched region to incomplete convergence. The dashed 
lines show two levels of memory requirements for the simulation, according to the “rule of thumb” 
(see main text for explanation). 
only part of the output data of the backward transform is used [8]. Second, we have 
implemented four different Krylov-space-based iterative solvers, allowing us to choose the 
most suitable one for a particular application. As is known from the literature [17,20,21] and 
demonstrated in Section 3, there is not a best iterative solver for DDA. Depending on all 
details of the scattering problem, any of the methods may outperform the others. Third, 
dynamic memory allocation and optimized data structures allow all computations, except the 
FFT, to be performed only for the real (non-void) dipoles and not for the whole computational 
box. This also decreases ADDA's memory consumption. Moreover, symmetry of the 
interaction matrix is used to decrease memory required for its Fourier transform. Finally, all 
float variables in ADDA are represented in double precision. This accelerates convergence in 
cases when machine precision becomes important. Moreover, basic operations with double-
precision numbers can be faster than with single-precision ones on modern processors. This 
acceleration comes at a cost of increased memory consumption, which is, however, still lower 
than for other computer codes [4]. 
More information on ADDA can be found in an extensive manual included in the 
distribution package.
3 Numerical simulations 
3.1 Simulation parameters 
In our tests we used ADDA v.0.75, compiled with the Intel C compiler v.9.0 with maximum 
possible optimizations (default options in ADDA’s makefile). All the tests were run on the 
Dutch compute cluster LISA,3 using 32 nodes (each dual Intel Xeon 3.4 GHz processor with 
4 GB RAM). LDR was used as the most common polarization formulation. We have tried 
three different iterative solvers: QMR, Bi-CG, and Bi-CGSTAB. For all of them a default 
stopping criterion  was used. 510−=ε
                                                 
3 http://www.sara.nl/userinfo/lisa/description/
http://www.sara.nl/userinfo/lisa/description/
Table 1. Parameters of the numerical simulations. 
m x λ/md 
Number of 
dipolesa Iterative method 
Number of 
iterations 
20 9.6 2.6×105 Bi-CGSTAB 6 
30 9.6 8.8×105 Bi-CGSTAB 7 
40 9.6 2.1×106 Bi-CGSTAB 9 
60 9.6 7.1×106 Bi-CGSTAB 14 
80 9.6 1.7×107 Bi-CGSTAB 20 
100 9.6 3.3×107 Bi-CGSTAB 27 
130 10.3 9.0×107 Bi-CGSTAB 40 
1.05 
160 9.6 1.3×108 Bi-CGSTAB 65 
20 10.5 5.1×105 QMR 86 
30 11.2 2.1×106 QMR 223 
40 10.5 4.1×106 QMR 598 
60 9.8 1.1×107 QMR 2120 
80 10.5 3.3×107 Bi-CGSTAB 21748 
100 10.1 5.7×107 Bi-CGSTAB 6169 
130 10.3 1.3×108 Bi-CGSTAB 29200 
20 10.8 8.8×105 QMR 1344 
30 10.8 3.0×106 QMR 16930 
40 10.8 7.1×106 QMR 8164 
60 9.6 1.7×107 Bi-CG 127588 
20 11.0 1.4×106 QMR 8496 1.6 
30 10.5 4.1×106 Bi-CG 69748 
20 11.2 2.1×106 QMR 28171 1.8 
30 10.2 5.5×106 Bi-CG 118383 
2 20 10.1 2.1×106 QMR 58546 
a This is the total number of dipoles in the rectangular computational grid, which is the main factor determining 
the computation time of one iteration. For spheres the number of dipoles occupied by the scatterer itself is almost 
two times smaller. 
0 5000 10000 15000 20000 25000
Number of iterations  
Fig. 2. Convergence of the QMR iterative solver for the sphere with x = 20 and m = 1.8. The residual 
as a function of the iteration number is shown. The system of linear equations contains 3×106 
unknowns. 
Spheres were used as test objects. Their size parameter x was varied from 20 to 160 and 
their refractive index m was varied from 1.05 to 2. We limited ourselves to the case of real m. 
The current capabilities of ADDA are shown as a region of the (x,m)-plane in Fig. 1. The 
striped region corresponds to full convergence, the densely hatched region corresponds to 
those cases where ADDA could not fully converge to the required residual norm, but only to 
20 40 60 80 100 120 140 160
106 1 week
1 day
Size parameter x
          m =
 1.05
1 min
1 hour
Fig. 3. Total simulation wall clock time (on 64 processors) for spheres with different x and m. Time is 
shown in logarithmic scale. Horizontal dotted lines corresponding to a minute, an hour, a day, and a 
week are shown for convenience. 
20 40 60 80 100 120 140 160
Size parameter x
          m =
 1.05
Fig. 4. Relative errors of the extinction efficiency in logarithmic scale for spheres with different x and 
)10,10( 35 −−∈ε . Although this incomplete convergence probably affects the final accuracy of 
the scattering quantities only slightly, we remove such results from further consideration 
because a separate study is required to quantify this effect (see Section 4). For fully converged 
results, the errors of scattering quantities due to the numerical convergence are much smaller 
than the total errors (data not shown). 
A complete set of (x,m) pairs, for which ADDA converged, is shown in Table 1. It also 
shows the number of dipoles per wavelength in the medium ( md/λ  where d is the size of the 
dipole). We tried to keep it equal to 10 according to the “rule of thumb” as formulated by 
Draine and Flatau [2]; however, it was slightly different because we varied the size of the 
dipole grid to optimize the parallel efficiency of ADDA.4 The total number of dipoles in a 
rectangular computational grid, shown in Table 1, was varied from 2.6×105 to 1.3×108, it can 
                                                 
4 The best parallel performance is obtained when grid size divides the number of processors. However, ADDA 
works with any grid size. 
20 40 60 80 100 120 140 160
Size parameter x
          m =
 1.05
Fig. 5. Same as Fig. 4 but now for the asymmetry parameter. 
20 40 60 80 100 120 140 160
Size parameter x
          m =
 1.05
Fig. 6. Maximum relative errors of S11(θ ) in logarithmic scale for spheres with different x and m. 
be approximately determined as . Both memory requirements and computation time 
of one iteration are proportional to this number. Two dashed lines are shown in 
3)18.3( xm
Fig. 1 to 
indicate the memory requirements for different x and m. They correspond to typical memory 
of a modern desktop computer (2 GB) and the maximum total memory used in our 
simulations (70 GB), respectively. 
For each sphere we computed the extinction efficiency, the asymmetry parameter, and 
all Mueller matrix elements in one scattering plane, which is a symmetry plane of the cubical 
discretization of the sphere. Exact results for the same spheres were obtained using the Mie 
theory [22]. Spherical symmetry was used by ADDA to get all results from calculations for 
only one polarization state of the incident field. Therefore computation time is a factor of two 
smaller than for non-symmetric scatterers with the same x and m. We employed a volume 
correction to ensure equal volumes of sphere and its dipole representation [2]. Note, however, 
that for the very large spheres this correction is extremely small. 
3.2 Results 
Table 1 shows the iterative solver that provided the best performance for each particular case 
and the number of iterations to achieve convergence. Fig. 2 illustrates one specific example of 
20 40 60 80 100 120 140 160
Size parameter x
          m =
 1.05
Fig. 7. Same as Fig. 6 but now for RMS relative errors. 
0 30 60 90 120 150 180
170 175 180
Scattering angle θ, deg
Fig. 8. DDA results (dotted line) of S11(θ ) in logarithmic scale for a sphere with x = 160 and m = 1.05, 
compared with the results of the Mie theory (solid line). 
convergence of the DDA iterative solver. This is QMR applied to the system of 3⋅106 linear 
equations obtained for the sphere with 20=x  and 8.1=m . The total simulation wall clock 
time t for all particles is shown in Fig. 3. Fig. 4 and Fig. 5 show the relative errors of the 
extinction efficiency Qext and the asymmetry parameter >< θcos  respectively. Maximum - 
and root-mean-squared (RMS) relative errors of S11 over the whole range of scattering angle 
are shown in Fig. 6 and Fig. 7 respectively. Errors of other non-trivial Mueller matrix 
elements behave in a similar way (data not shown). 
DDA results of S11(θ) for a sphere with 160=x  and 05.1=m  are compared with the 
Mie theory in Fig. 8. The inset shows a magnification of the backscattering region. This is, to 
the best of our knowledge, the largest particle ever simulated with DDA. Fig. 9 and Fig. 10 
show the same comparisons but for 60=x , 4.1=m  and 20=x , 2=m  respectively. 
0 30 60 90 120 150 180
Scattering angle θ, deg
Fig. 9. Same as Fig. 8 but now for x = 60 and m = 1.4. 
0 30 60 90 120 150 180
Scattering angle θ, deg
Fig. 10. Same as Fig. 8 but now for x = 20 and m = 2. 
4 Discussion 
The convergence of the QMR iterative solver shown in Fig. 2, featuring plateaus and steep 
descents, is in agreement both with its behavior in general [16] and with particular examples 
of its application to DDA [20,23]. A distinctive feature of this graph compared to the 
literature data is that the convergence slows down with iteration number, i.e. the logarithm of 
the residual norm decreases slower than linearly. This is probably due to the large size of the 
scatterer and loss of numerical precision (see discussion below). 
The total computation times t increase steeply both with x and m (Fig. 3). The time is 
displayed in a logarithmic scale covering a range from 4 seconds to more than 2 weeks. For 
, the increase of t with x is mostly due to the increasing number of dipoles to model 
the scatterer, since the number of iterations increase at a slower pace (
05.1=m
Table 1). For larger m 
these two effects are comparable, combining into a very unfavorable scaling, which can be 
approximately described by a power law , where )()( mxmCt α≈ 6>α  for . It should 
be noted that both the number of iterations and t do not always increase monotonically with x. 
For example for ,  and 
2.1≥m
80=x 2.1=m 30=x , 4.1=m  the execution times are unusually high. 
This may be caused by a large condition number of DDA interaction matrices for these two 
particular particles. Moreover, when the convergence is slow it may suffer from machine 
precision, the latter determining the limit of x and m, for which ADDA will converge at all. 
Therefore, current size limitations of the DDA for  are due to the practically 
unbearable computation times, and not due to memory requirements.
2.1≥m
5 Simulations for larger 
m are far from the memory limit shown in Fig. 1. Moreover, simply using more processors 
does not solve the problem. Improving numerical performance is required, e.g. dedicated 
preconditioning of the iterative solver [15]. On the other hand, extension to larger sizes for 
 is feasible if more computer resources are available. This facilitates, for example, 
simulating scattering of visible light by almost all biological cells in suspension. 
2.1<m
The increase of the number of iterations with m is a well-known fact [12,17,21,24]; 
however, there is still no theoretical foundation to describe it in details. Rahola [24] provided 
theoretical predictions of the dependence of the number of iterations on m, valid for scatterers 
smaller than the wavelength. However, these conclusions are not applicable to the scatterers 
studied in this manuscript. The general reason for the slowing down of the convergence with 
increasing m is increased interaction between dipoles and, hence, an increased condition 
number of the interaction matrix. Absorption, if present, should decrease the overall 
interaction between dipoles in a large scatterer. Therefore, it is expected that convergence for 
complex refractive indices should be better than for the purely real ones that we consider here. 
The same was suggested by Budko and Samokhin [25] based on the analysis of the spectrum 
of the integral scattering operator. However, this proposition is still to be verified by 
numerical tests. 
Another parameter that may greatly affect the computation time is the convergence 
threshold ε. In this paper it is set to a de-facto default value of 10-5 [2], which ensures 
negligibly small numerical errors compared to the model errors. However, in many cases 
numerical errors are small enough already for , i.e. the difference of the scattering 
quantities between simulations with  and  is significantly smaller than the 
difference between the latter and the exact values (data not shown). 
310−=ε
310−=ε 510−=ε
Fig. 2 shows that QMR 
for a particular case converges to  and  three and six times faster 
respectively than to . Results for other simulated particles and iterative solvers show 
similar trends and even larger acceleration with increasing ε in some cases (data not shown). 
Therefore, if one can determine an optimum ε for a particular case, it can decrease the 
computation time significantly. However, we do not pursue this issue further in this 
manuscript. 
310−=ε 3102 −×=ε
510−=ε
Fig. 4 shows the deterioration of the accuracy of Qext with increasing m, while there is 
no clear dependence on x (the only exception is a single result for ). Results for 2=m
>< θcos  (Fig. 5) behave in a similar way. These results are in good agreement with results of 
other researchers for smaller size parameters [2,13,26], both in terms of the errors themselves 
and their dependence on m. To express errors on the angular dependencies of S11 we use two 
integral parameters: the maximum - and RMS relative errors (Fig. 6 and Fig. 7 respectively). 
Although these parameters are not completely objective, as they are significantly influenced 
by the values of S11 in deep minima, which are completely irrelevant to most real 
experiments, they do provide a consistent measure of the DDA accuracy. To relate these 
integral parameters to some other criteria, e.g. visual agreement, three examples are presented 
in Fig. 8 – Fig. 10. Errors of S11(θ ) show the same tendencies as the integral scattering 
quantities, except that errors for  are relatively large (larger than those for  in 
the range ) and generally decrease with x. This is due to the relative nature of the 
measured errors and the huge dynamical range of S
05.1=m 2.1=m
11(θ ) for small refractive indices (see Fig. 
8). Results for smaller size parameters found in the literature [2,26] show a similar increase of 
                                                 
5 The boundary value of m is not well-defined, as it depends on particular hardware and restrictions on 
computation time; 1.2 is just a convenient value to guide the reader. 
errors with m: however, the errors themselves are considerably smaller. For instance, 
maximum relative errors of S11(θ ) for 10<x  and m up to i4.15.2 +  are smaller than 0.4. This 
is due to the general differences between functions S11(θ ) for particles comparable to and 
much larger than the wavelength. The latter has deeper minima and a larger overall dynamic 
range. It is important to note that refractive indices as small as 1.05 are rarely used in DDA 
simulations [26], therefore it is hard to make any definite conclusions concerning the behavior 
of errors in this case. 
In what follows, the traditional “rule of thumb” [2] is discussed, which states that for 
10/ =mdλ  errors of cross sections and asymmetry parameter are expected to be a few 
percents, and maximum errors in the angular dependence of S11 on the order of 20-30 %. 
Results for both Qext and >< θcos  do satisfy the “rule of thumb,” however this rule does not 
describe the decrease of errors by two orders of magnitude with decreasing m. The latter can 
be used to cut down the number of dipoles and hence computation time in cases when only 
integral scattering quantities need to be calculated for small m. Relative errors of S11(θ ) are 
much larger than that predicted by the “rule of thumb,” which is due to the fact that the latter 
was derived based on test simulations for x smaller than 10 [2]. See, however, the discussion 
below on possible changes for complex refractive index and non-spherical shapes. To 
conclude, the “rule of thumb” has very limited application for the range of x and m here. More 
elaborate empirical functions are required to estimate the number of dipoles needed to reach a 
prescribed accuracy. They will also allow a more realistic estimate of DDA computational 
complexity, i.e. the computation time needed to reach a certain accuracy of some scattering 
quantities for particular x and m. This topic is left for the future study. 
The test results shown in this paper are limited to real refractive indices and spherically 
shaped scatterers. In the following we try to generalize our conclusions to complex refractive 
index and non-spherical shapes. However, we want to stress that this generalization is 
speculative, and more numerical tests are clearly needed to verify them. It is expected that 
accuracies of integral scattering quantities should not change significantly for more general 
cases. Their accuracy should deteriorate both with increasing real and imaginary parts of the 
refractive index. The situation for angle-resolved scattering quantities is expected to be 
different. Large relative errors observed in this paper are due to deep minima that are a 
consequence of both spherical symmetry and purely real refractive index. It is expected that 
visual agreement between the DDA results and the exact solution (as shown in Fig. 8 – Fig. 
10) should not change significantly for more general cases, however it will result in smaller 
relative errors, especially for larger x and smaller m. 
5 Conclusion 
In this paper we present the ADDA, a computer program to simulate light scattering by 
arbitrarily shaped particles. ADDA can parallelize a single DDA simulation, which allows it 
not to be limited by the memory of a single computer. Moreover, ADDA is heavily optimized, 
which allows it to compare favorably with other programs based on DDA when running on a 
single processor. We showed its capabilities for simulating light scattering by spheres with x 
up to 160 and m up to 2. The maximum reachable x on a cluster of 64 modern processors 
decrease rapidly with increasing m: it is 160 for 05.1=m  and only 20-40 (depending on the 
convergence threshold) for . This is mostly due to the slow convergence of the iterative 
solver leading to practically unbearable computation times. It is expected that larger particle 
sizes can be reached if m has a significant imaginary part. 
Errors of both integral and angle-resolved scattering quantities show no systematic 
dependence on x, but generally increase with m. Errors of Qext and >< θcos  range from less 
than 0.01 % to 6 %. Maximum - and RMS relative errors of S11(θ ) are in the ranges 0.2–18 
and 0.04–1 respectively. Error predictions of the traditional “rule of thumb” have very limited 
application in this range of x and m: it describes the upper limit of errors of Qext and >< θcos , 
however it does not account for the decrease of the errors with m. 
Currently, the ADDA is capable of simulating light scattering by almost all biological 
cells in suspension; however, its performance for other cases can be improved. These 
improvements, left for future work, may include improving the convergence of the iterative 
solver by preconditioning. It also is desirable to conduct a detailed study of the dependence of 
the accuracy of the final results on the size of the dipole and convergence thresholds of the 
iterative solver for different scatterers. Such a study should result in a reduction of the 
computation time and provide a realistic estimate of DDA complexity over a wide range of x 
and m. 
Acknowledgements 
We thank Gorden Videen for critically reading the manuscript and anonymous reviewer for 
valuable comments. Our research is supported by Siberian Branch of the Russian Academy of 
Sciences through the grant 2006-03. 
References 
 [1]  Purcell EM, Pennypacker CR. Scattering and adsorption of light by nonspherical dielectric grains. 
Astrophys J 1973;186:705-714. 
 [2]  Draine BT, Flatau PJ. Discrete-dipole approximation for scattering calculations. J Opt Soc Am A 
1994;11:1491-1499. 
 [3]  Yurkin MA, Hoekstra AG. The discrete dipole approximation: an overview and recent developments. J 
Quant Spectrosc Radiat Transf 2007, doi:10.1016/j.jqsrt.2007.01.034. 
 [4]  Penttila A, Zubko E, Lumme K, Muinonen K, Yurkin MA, Draine BT, Rahola J, Hoekstra AG, Shkuratov 
Y. Comparison between discrete dipole implementations and exact techniques. J Quant Spectrosc Radiat 
Transf 2007, doi:10.1016/j.jqsrt.2007.01.026. 
 [5]  Draine BT, Flatau PJ. User guide for the discrete dipole approximation code DDSCAT 6.1. 
http://xxx.arxiv.org/abs/astro-ph/0409262, 2004. 
 [6]  Hoekstra AG. Computer simulations of elastic light scattering. PhD thesis. University of Amsterdam, 
Amsterdam, 1994. 
 [7]  Hoekstra AG, Sloot PMA. Coupled dipole simulations of elastic light scattering on parallel systems. Int J 
Mod Phys C 1995;6:663-679. 
 [8]  Hoekstra AG, Grimminck MD, Sloot PMA. Large scale simulations of elastic light scattering by a fast 
discrete dipole approximation. Int J Mod Phys C 1998;9:87-102. 
 [9]  Yurkin MA, Semyanov KA, Tarasov PA, Chernyshev AV, Hoekstra AG, Maltsev VP. Experimental and 
theoretical study of light scattering by individual mature red blood cells with scanning flow cytometry and 
discrete dipole approximation. Appl Opt 2005;44:5249-5256. 
 [10]  Temperton C. Self-sorting mixed-radix fast Fourier transforms. J Comp Phys 1983;52:1-23. 
 [11]  Frigo M, Johnson SG. FFTW: an adaptive software architecture for the FFT. Proc ICASSP 1998;3:1381-
1384. 
 [12]  Draine BT. The discrete-dipole approximation and its application to interstellar graphite grains. Astrophys 
J 1988;333:848-872. 
 [13]  Draine BT, Goodman JJ. Beyond clausius-mossotti - wave-propagation on a polarizable point lattice and 
the discrete dipole approximation. Astrophys J 1993;405:685-697. 
 [14]  Gutkowicz-Krusin D, Draine BT. Propagation of electromagnetic waves on a rectangular lattice of 
polarizable points. http://xxx.arxiv.org/abs/astro-ph/0403082, 2004. 
 [15]  Barrett R, Berry M, Chan TF, Demmel J, Donato J, Dongarra J, Eijkhout V, Pozo R, Romine C, van der 
Vorst HA. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd ed. 
SIAM, 1994. 
 [16]  Freund RW. Conjugate gradient-type methods for linear-systems with complex symmetrical coefficient 
matrices. SIAM J Sci Stat Comp 1992;13:425-448. 
 [17]  Flatau PJ. Improvements in the discrete-dipole approximation method of computing scattering and 
absorption. Opt Lett 1997;22:1205-1207. 
 [18]  Davis PJ, Rabinowitz P. Methods of Numerical Integration. New York: Academic Press, 1975. 
 [19]  Hoekstra AG, Frijlink M, Waters LBFM, Sloot PMA. Radiation forces in the discrete-dipole 
approximation. J Opt Soc Am A 2001;18:1944-1953. 
 [20]  Rahola J. Solution of dense systems of linear equations in the discrete-dipole approximation. SIAM J Sci 
Comp 1996;17:78-89. 
http://xxx.arxiv.org/abs/astro-ph/0409262,
http://xxx.arxiv.org/abs/astro-ph/0403082,
 [21]  Fan ZH, Wang DX, Chen RS, Yung EKN. The application of iterative solvers in discrete dipole 
approximation method for computing electromagnetic scattering. Microwave Opt Tech Lett 
2006;48:1741-1746. 
 [22]  Bohren CF, Huffman DR. Absorption and scattering of Light by Small Particles. New York: Wiley, 1983. 
 [23]  Rahola J. Iterative solution of dense linear systems arising from integral equations. Appl Parall Comput , 
Lect Not Comp Sci 1998;1541:460-467. 
 [24]  Rahola J. On the eigenvalues of the volume integral operator of electromagnetic scattering. SIAM J Sci 
Comp 2000;21:1740-1754. 
 [25]  Budko NV, Samokhin AB. Spectrum of the volume integral operator of electromagnetic scattering. SIAM 
J Sci Comp 2006;28:682-700. 
 [26]  Hoekstra AG, Rahola J, Sloot PMA. Accuracy of internal fields in volume integral equation simulations 
of light scattering. Appl Opt 1998;37:8482-8497. 
	1  Introduction
	2 ADDA computer code
	3 Numerical simulations
	3.1 Simulation parameters
	3.2 Results
	4 Discussion
	5 Conclusion
	Acknowledgements
	References
  /ASCII85EncodePages false
  /AllowTransparency false
  /AutoPositionEPSFiles true
  /AutoRotatePages /None
  /Binding /Left
  /CalGrayProfile (Dot Gain 20%)
  /CalRGBProfile (sRGB IEC61966-2.1)
  /CalCMYKProfile (U.S. Web Coated \050SWOP\051 v2)
  /sRGBProfile (sRGB IEC61966-2.1)
  /CannotEmbedFontPolicy /Error
  /CompatibilityLevel 1.4
  /CompressObjects /Tags
  /CompressPages true
  /ConvertImagesToIndexed true
  /PassThroughJPEGImages true
  /CreateJDFFile false
  /CreateJobTicket false
  /DefaultRenderingIntent /Default
  /DetectBlends true
  /DetectCurves 0.0000
  /ColorConversionStrategy /CMYK
  /DoThumbnails false
  /EmbedAllFonts true
  /EmbedOpenType false
  /ParseICCProfilesInComments true
  /EmbedJobOptions true
  /DSCReportingLevel 0
  /EmitDSCWarnings false
  /EndPage -1
  /ImageMemory 1048576
  /LockDistillerParams false
  /MaxSubsetPct 100
  /Optimize true
  /OPM 1
  /ParseDSCComments true
  /ParseDSCCommentsForDocInfo true
  /PreserveCopyPage true
  /PreserveDICMYKValues true
  /PreserveEPSInfo true
  /PreserveFlatness true
  /PreserveHalftoneInfo false
  /PreserveOPIComments true
  /PreserveOverprintSettings true
  /StartPage 1
  /SubsetFonts true
  /TransferFunctionInfo /Apply
  /UCRandBGInfo /Preserve
  /UsePrologue false
  /ColorSettingsFile ()
  /AlwaysEmbed [ true
  /NeverEmbed [ true
  /AntiAliasColorImages false
  /CropColorImages true
  /ColorImageMinResolution 300
  /ColorImageMinResolutionPolicy /OK
  /DownsampleColorImages true
  /ColorImageDownsampleType /Bicubic
  /ColorImageResolution 300
  /ColorImageDepth -1
  /ColorImageMinDownsampleDepth 1
  /ColorImageDownsampleThreshold 1.50000
  /EncodeColorImages true
  /ColorImageFilter /DCTEncode
  /AutoFilterColorImages true
  /ColorImageAutoFilterStrategy /JPEG
  /ColorACSImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /ColorImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /JPEG2000ColorACSImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /JPEG2000ColorImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /AntiAliasGrayImages false
  /CropGrayImages true
  /GrayImageMinResolution 300
  /GrayImageMinResolutionPolicy /OK
  /DownsampleGrayImages true
  /GrayImageDownsampleType /Bicubic
  /GrayImageResolution 300
  /GrayImageDepth -1
  /GrayImageMinDownsampleDepth 2
  /GrayImageDownsampleThreshold 1.50000
  /EncodeGrayImages true
  /GrayImageFilter /DCTEncode
  /AutoFilterGrayImages true
  /GrayImageAutoFilterStrategy /JPEG
  /GrayACSImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /GrayImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /JPEG2000GrayACSImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /JPEG2000GrayImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /AntiAliasMonoImages false
  /CropMonoImages true
  /MonoImageMinResolution 1200
  /MonoImageMinResolutionPolicy /OK
  /DownsampleMonoImages true
  /MonoImageDownsampleType /Bicubic
  /MonoImageResolution 1200
  /MonoImageDepth -1
  /MonoImageDownsampleThreshold 1.50000
  /EncodeMonoImages true
  /MonoImageFilter /CCITTFaxEncode
  /MonoImageDict <<
    /K -1
  /AllowPSXObjects false
  /CheckCompliance [
    /None
  /PDFX1aCheck false
  /PDFX3Check false
  /PDFXCompliantPDFOnly false
  /PDFXNoTrimBoxError true
  /PDFXTrimBoxToMediaBoxOffset [
    0.00000
    0.00000
    0.00000
    0.00000
  /PDFXSetBleedBoxToMediaBox true
  /PDFXBleedBoxToTrimBoxOffset [
    0.00000
    0.00000
    0.00000
    0.00000
  /PDFXOutputIntentProfile ()
  /PDFXOutputConditionIdentifier ()
  /PDFXOutputCondition ()
  /PDFXRegistryName ()
  /PDFXTrapped /False
  /Description <<
    /CHS <FEFF4f7f75288fd94e9b8bbe5b9a521b5efa7684002000410064006f006200650020005000440046002065876863900275284e8e9ad88d2891cf76845370524d53705237300260a853ef4ee54f7f75280020004100630072006f0062006100740020548c002000410064006f00620065002000520065006100640065007200200035002e003000204ee553ca66f49ad87248672c676562535f00521b5efa768400200050004400460020658768633002>
    /CHT <FEFF4f7f752890194e9b8a2d7f6e5efa7acb7684002000410064006f006200650020005000440046002065874ef69069752865bc9ad854c18cea76845370524d5370523786557406300260a853ef4ee54f7f75280020004100630072006f0062006100740020548c002000410064006f00620065002000520065006100640065007200200035002e003000204ee553ca66f49ad87248672c4f86958b555f5df25efa7acb76840020005000440046002065874ef63002>
    /DAN <FEFF004200720075006700200069006e0064007300740069006c006c0069006e006700650072006e0065002000740069006c0020006100740020006f007000720065007400740065002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e007400650072002c0020006400650072002000620065006400730074002000650067006e006500720020007300690067002000740069006c002000700072006500700072006500730073002d007500640073006b007200690076006e0069006e00670020006100660020006800f8006a0020006b00760061006c0069007400650074002e0020004400650020006f007000720065007400740065006400650020005000440046002d0064006f006b0075006d0065006e0074006500720020006b0061006e002000e50062006e00650073002000690020004100630072006f00620061007400200065006c006c006500720020004100630072006f006200610074002000520065006100640065007200200035002e00300020006f00670020006e0079006500720065002e>
    /DEU <FEFF00560065007200770065006e00640065006e0020005300690065002000640069006500730065002000450069006e007300740065006c006c0075006e00670065006e0020007a0075006d002000450072007300740065006c006c0065006e00200076006f006e002000410064006f006200650020005000440046002d0044006f006b0075006d0065006e00740065006e002c00200076006f006e002000640065006e0065006e002000530069006500200068006f006300680077006500720074006900670065002000500072006500700072006500730073002d0044007200750063006b0065002000650072007a0065007500670065006e0020006d00f60063006800740065006e002e002000450072007300740065006c006c007400650020005000440046002d0044006f006b0075006d0065006e007400650020006b00f6006e006e0065006e0020006d006900740020004100630072006f00620061007400200075006e0064002000410064006f00620065002000520065006100640065007200200035002e00300020006f0064006500720020006800f600680065007200200067006500f600660066006e00650074002000770065007200640065006e002e>
    /ESP <FEFF005500740069006c0069006300650020006500730074006100200063006f006e0066006900670075007200610063006900f3006e0020007000610072006100200063007200650061007200200064006f00630075006d0065006e0074006f00730020005000440046002000640065002000410064006f0062006500200061006400650063007500610064006f00730020007000610072006100200069006d0070007200650073006900f3006e0020007000720065002d0065006400690074006f007200690061006c00200064006500200061006c00740061002000630061006c0069006400610064002e002000530065002000700075006500640065006e00200061006200720069007200200064006f00630075006d0065006e0074006f00730020005000440046002000630072006500610064006f007300200063006f006e0020004100630072006f006200610074002c002000410064006f00620065002000520065006100640065007200200035002e003000200079002000760065007200730069006f006e0065007300200070006f00730074006500720069006f007200650073002e>
    /FRA <FEFF005500740069006c006900730065007a00200063006500730020006f007000740069006f006e00730020006100660069006e00200064006500200063007200e900650072002000640065007300200064006f00630075006d0065006e00740073002000410064006f00620065002000500044004600200070006f0075007200200075006e00650020007100750061006c0069007400e90020006400270069006d007000720065007300730069006f006e00200070007200e9007000720065007300730065002e0020004c0065007300200064006f00630075006d0065006e00740073002000500044004600200063007200e900e90073002000700065007500760065006e0074002000ea0074007200650020006f007500760065007200740073002000640061006e00730020004100630072006f006200610074002c002000610069006e00730069002000710075002700410064006f00620065002000520065006100640065007200200035002e0030002000650074002000760065007200730069006f006e007300200075006c007400e90072006900650075007200650073002e>
    /ITA <FEFF005500740069006c0069007a007a006100720065002000710075006500730074006500200069006d0070006f007300740061007a0069006f006e00690020007000650072002000630072006500610072006500200064006f00630075006d0065006e00740069002000410064006f00620065002000500044004600200070006900f900200061006400610074007400690020006100200075006e00610020007000720065007300740061006d0070006100200064006900200061006c007400610020007100750061006c0069007400e0002e0020004900200064006f00630075006d0065006e007400690020005000440046002000630072006500610074006900200070006f00730073006f006e006f0020006500730073006500720065002000610070006500720074006900200063006f006e0020004100630072006f00620061007400200065002000410064006f00620065002000520065006100640065007200200035002e003000200065002000760065007200730069006f006e006900200073007500630063006500730073006900760065002e>
    /JPN <FEFF9ad854c18cea306a30d730ea30d730ec30b951fa529b7528002000410064006f0062006500200050004400460020658766f8306e4f5c6210306b4f7f75283057307e305930023053306e8a2d5b9a30674f5c62103055308c305f0020005000440046002030d530a130a430eb306f3001004100630072006f0062006100740020304a30883073002000410064006f00620065002000520065006100640065007200200035002e003000204ee5964d3067958b304f30533068304c3067304d307e305930023053306e8a2d5b9a306b306f30d530a930f330c8306e57cb30818fbc307f304c5fc59808306730593002>
    /KOR <FEFFc7740020c124c815c7440020c0acc6a9d558c5ec0020ace0d488c9c80020c2dcd5d80020c778c1c4c5d00020ac00c7a50020c801d569d55c002000410064006f0062006500200050004400460020bb38c11cb97c0020c791c131d569b2c8b2e4002e0020c774b807ac8c0020c791c131b41c00200050004400460020bb38c11cb2940020004100630072006f0062006100740020bc0f002000410064006f00620065002000520065006100640065007200200035002e00300020c774c0c1c5d0c11c0020c5f40020c2180020c788c2b5b2c8b2e4002e>
    /NLD (Gebruik deze instellingen om Adobe PDF-documenten te maken die zijn geoptimaliseerd voor prepress-afdrukken van hoge kwaliteit. De gemaakte PDF-documenten kunnen worden geopend met Acrobat en Adobe Reader 5.0 en hoger.)
    /NOR <FEFF004200720075006b00200064006900730073006500200069006e006e007300740069006c006c0069006e00670065006e0065002000740069006c002000e50020006f0070007000720065007400740065002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e00740065007200200073006f006d00200065007200200062006500730074002000650067006e0065007400200066006f00720020006600f80072007400720079006b006b0073007500740073006b00720069006600740020006100760020006800f800790020006b00760061006c0069007400650074002e0020005000440046002d0064006f006b0075006d0065006e00740065006e00650020006b0061006e002000e50070006e00650073002000690020004100630072006f00620061007400200065006c006c00650072002000410064006f00620065002000520065006100640065007200200035002e003000200065006c006c00650072002000730065006e006500720065002e>
    /PTB <FEFF005500740069006c0069007a006500200065007300730061007300200063006f006e00660069006700750072006100e700f50065007300200064006500200066006f0072006d00610020006100200063007200690061007200200064006f00630075006d0065006e0074006f0073002000410064006f0062006500200050004400460020006d00610069007300200061006400650071007500610064006f00730020007000610072006100200070007200e9002d0069006d0070007200650073007300f50065007300200064006500200061006c007400610020007100750061006c00690064006100640065002e0020004f007300200064006f00630075006d0065006e0074006f00730020005000440046002000630072006900610064006f007300200070006f00640065006d0020007300650072002000610062006500720074006f007300200063006f006d0020006f0020004100630072006f006200610074002000650020006f002000410064006f00620065002000520065006100640065007200200035002e0030002000650020007600650072007300f50065007300200070006f00730074006500720069006f007200650073002e>
    /SUO <FEFF004b00e40079007400e40020006e00e40069007400e4002000610073006500740075006b007300690061002c0020006b0075006e0020006c0075006f00740020006c00e400680069006e006e00e4002000760061006100740069007600610061006e0020007000610069006e006100740075006b00730065006e002000760061006c006d0069007300740065006c00750074007900f6006800f6006e00200073006f00700069007600690061002000410064006f0062006500200050004400460020002d0064006f006b0075006d0065006e007400740065006a0061002e0020004c0075006f0064007500740020005000440046002d0064006f006b0075006d0065006e00740069007400200076006f0069006400610061006e0020006100760061007400610020004100630072006f0062006100740069006c006c00610020006a0061002000410064006f00620065002000520065006100640065007200200035002e0030003a006c006c00610020006a006100200075007500640065006d006d0069006c006c0061002e>
    /SVE <FEFF0041006e007600e4006e00640020006400650020006800e4007200200069006e0073007400e4006c006c006e0069006e006700610072006e00610020006f006d002000640075002000760069006c006c00200073006b006100700061002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e007400200073006f006d002000e400720020006c00e4006d0070006c0069006700610020006600f60072002000700072006500700072006500730073002d007500740073006b00720069006600740020006d006500640020006800f600670020006b00760061006c0069007400650074002e002000200053006b006100700061006400650020005000440046002d0064006f006b0075006d0065006e00740020006b0061006e002000f600700070006e00610073002000690020004100630072006f0062006100740020006f00630068002000410064006f00620065002000520065006100640065007200200035002e00300020006f00630068002000730065006e006100720065002e>
    /ENU (Use these settings to create Adobe PDF documents best suited for high-quality prepress printing.  Created PDF documents can be opened with Acrobat and Adobe Reader 5.0 and later.)
  /Namespace [
    (Adobe)
    (Common)
    (1.0)
  /OtherNamespaces [
    <<
      /AsReaderSpreads false
      /CropImagesToFrames true
      /ErrorControl /WarnAndContinue
      /FlattenerIgnoreSpreadOverrides false
      /IncludeGuidesGrids false
      /IncludeNonPrinting false
      /IncludeSlug false
      /Namespace [
        (Adobe)
        (InDesign)
        (4.0)
      ]
      /OmitPlacedBitmaps false
      /OmitPlacedEPS false
      /OmitPlacedPDF false
      /SimulateOverprint /Legacy
    >>
    <<
      /AddBleedMarks false
      /AddColorBars false
      /AddCropMarks false
      /AddPageInfo false
      /AddRegMarks false
      /ConvertColors /ConvertToCMYK
      /DestinationProfileName ()
      /DestinationProfileSelector /DocumentCMYK
      /Downsample16BitImages true
      /FlattenerPreset <<
        /PresetSelector /MediumResolution
      >>
      /FormElements false
      /GenerateStructure false
      /IncludeBookmarks false
      /IncludeHyperlinks false
      /IncludeInteractive false
      /IncludeLayers false
      /IncludeProfiles false
      /MultimediaHandling /UseObjectSettings
      /Namespace [
        (Adobe)
        (CreativeSuite)
        (2.0)
      ]
      /PDFXOutputIntentProfileSelector /DocumentCMYK
      /PreserveEditing true
      /UntaggedCMYKHandling /LeaveUntagged
      /UntaggedRGBHandling /UseDocumentProfile
      /UseDocumentBleed false
    >>
>> setdistillerparams
  /HWResolution [2400 2400]
  /PageSize [612.000 792.000]
>> setpagedevice
ABSTRACT
  In this manuscript we investigate the capabilities of the Discrete Dipole
Approximation (DDA) to simulate scattering from particles that are much larger
than the wavelength of the incident light, and describe an optimized publicly
available DDA computer program that processes the large number of dipoles
required for such simulations. Numerical simulations of light scattering by
spheres with size parameters x up to 160 and 40 for refractive index m=1.05 and
2 respectively are presented and compared with exact results of the Mie theory.
Errors of both integral and angle-resolved scattering quantities generally
increase with m and show no systematic dependence on x. Computational times
increase steeply with both x and m, reaching values of more than 2 weeks on a
cluster of 64 processors. The main distinctive feature of the computer program
is the ability to parallelize a single DDA simulation over a cluster of
computers, which allows it to simulate light scattering by very large
particles, like the ones that are considered in this manuscript. Current
limitations and possible ways for improvement are discussed.

<|endoftext|><|startoftext|>
Introduction ......................................................................................................................... 2 
2 General framework.............................................................................................................. 3 
3 Various DDA models .......................................................................................................... 7 
3.1 Theoretical base of the DDA........................................................................................ 7 
3.2 Accuracy of DDA simulations ................................................................................... 13 
3.3 The DDA for clusters of spheres................................................................................ 16 
3.4 Modifications and extensions of the DDA................................................................. 18 
4 Numerical considerations.................................................................................................. 19 
4.1 Direct vs. iterative methods........................................................................................ 19 
4.2 Scattering order formulation ...................................................................................... 22 
4.3 Block-Toeplitz ........................................................................................................... 23 
4.4 FFT............................................................................................................................. 24 
4.5 Fast multipole method................................................................................................ 24 
4.6 Orientation averaging and repeated calculations ....................................................... 25 
5 Comparison of the DDA to other methods ....................................................................... 27 
6 Concluding remarks .......................................................................................................... 28 
Acknowledgements .................................................................................................................. 28 
Appendix. Description of used acronyms and symbols ........................................................... 28 
References ................................................................................................................................ 31 
1 Introduction 
The discrete dipole approximation (DDA) is a general method to compute scattering and 
absorption of electromagnetic waves by particles of arbitrary geometry and composition. 
Initially the DDA was proposed by Purcell and Pennypacker (PP) [1], who replaced the 
scatterer by a set of point dipoles. These dipoles interact with each other and the incident 
field, giving rise to a system of linear equations, which is solved to obtain dipole 
polarizations. All the measured scattering quantities can be obtained from these polarizations. 
The DDA was further developed by Draine and coworkers [2-5], who popularized the method 
by developing a publicly available computer code DDSCAT [6]. Later it was shown that the 
DDA also can be derived from the integral equation for the electric field, which is discretized 
by dividing the scatterer into small cubical subvolumes. This derivation was apparently first 
performed by Goedecke and O'Brien [7] and further developed by others (see, for instance, 
[8-11]). It is important to note that the final equations, produced by both lines of derivation of 
the DDA are essentially the same. The only difference is that derivations based on the integral 
equations give more mathematical insight into the approximation, thus pointing at ways to 
improve the method, while the model based on point dipoles is physically clearer. 
The DDA is called the coupled dipole method or approximation by some researchers 
[12,13]. There are also other methods, such as the volume integral equation formulation [14] 
and the digitized Green’s function (DGF) [7], which were developed completely 
independently from PP. However, later they were shown to be equivalent to DDA [8,15]. In 
this review we will use the term DDA to refer to all such methods, since we describe them in 
terms of one general framework. However, it is difficult to separate unambiguously the DDA 
from other similar methods, based on the volume integral equations for the electromagnetic 
fields, such as a broad range of method of moments (MoM) with different bases and testing 
functions [16-19]. In our opinion, one fundamental aspect of the DDA is that the solution for 
the “physically meaningful” internal fields or their direct derivatives, e.g. polarization, plays 
an integral role in the process. In other words, any DDA formulation can be interpreted as 
replacing a scatterer by a set of interacting dipoles; this is further discussed in Section 2. An 
example of method that is not considered DDA is the MoM with higher-order hierarchical 
Legendre basis functions [17]. 
The DDA is a popular method in the light-scattering community and it has been 
reviewed by several authors. An extensive review by Draine and Flatau [4] covers almost all 
DDA developments up to 1994. A more recent review by Draine [5] mainly concerns 
applications and numerical considerations. DDA theory was discussed together with other 
methods for light scattering simulations in reviews by Wriedt [20], Chiappetta and Torresani 
[21], and Kahnert [15] and in books by Mishchenko et al. [22] and Tsang et al. [23]. Jones 
[24] placed the DDA in context of different methods with respect to particle characterization. 
However, many important DDA developments since 1994 are not mentioned in any of these 
manuscripts. Those that are mentioned are usually considered as side-steps, and are not placed 
into a general framework. Moreover, to the best of our knowledge numerical aspects of the 
DDA have never been reviewed extensively – each paper discusses only a few particular 
aspects. In this review we try to fill this gap. 
A general framework is developed in Section 2 to ease the further discussion of different 
DDA models. This framework is based on the integral equation because it allows a uniform 
description of all the DDA development. However, connection to a physically clearer model 
of point dipoles is discussed throughout the section. The sources of errors in the DDA 
formulation are also discussed there. 
In Section 3 the physical principles of the DDA are reviewed and results of different 
models are compared. In Subsection 3.1 different improvements of polarizabilities and 
interaction terms are reviewed from a theoretical point of view. Different expressions for Cabs 
also are discussed. Comparison of simulation results using different formulations is given in 
Subsection 3.2. Subsection 3.3 covers the special case of a cluster of spheres that allows 
particular improvements and simplifications. In Section 3.4 different significant modifications 
are reviewed, which do not fall completely into the general framework described in Section 2. 
Enhancements of the DDA for some special purposes also are discussed. 
Different numerical aspects of the DDA are reviewed in Section 4. These are concerned 
primarily with solving very large systems of linear equations (Subsection 4.1). Subsection 4.2 
describes the simplest iterative procedure to solve DDA linear system, which has a clear 
physical meaning. The special structure of the DDA interaction matrix for a rectangular grid 
and its application to decrease computational costs are described in subsections 4.3 and 4.4 
respectively. General methods to accelerate calculations, which do not require a rectangular 
grid, are discussed in Subsection 4.5. Subsection 4.6 covers special techniques to increase the 
efficiency of repeated calculations (e.g. in orientation averaging). 
A numerical comparison of the DDA with other methods is reviewed in Section 5; its 
strong and weak points are discussed. Section 6 concludes the review and discusses future 
development of the DDA. 
2 General framework 
The )iexp( tω−  time dependence of all fields is assumed throughout this review. The scatterer 
is assumed dielectric but not magnetic (magnetic permittivity 1=μ ). The electric permittivity 
is assumed isotropic to simplify the derivations; however, extension to arbitrary dielectric 
tensors is straightforward.1
The general form of the integral equation governing the electric field inside the 
dielectric scatterer is the following [8,15]: 
)()(),(),()()(),(d)()( 00
rErrLrMrErrrGrErE χχ VVr
∂−+′′′′+= ∫ , (1)
                                                 
1 In most formulae scalar values can be replaced directly by tensors, but there are exceptions. Extensions of 
DDA to optically anisotropic scatterers are discussed in Section 3.4. 
where Einc(r) and E(r) are the incident and total electric field at location r; 
πεχ 4)1)(()( −= rr  is the susceptibility of the medium at point r (ε(r) – relative 
permittivity). V is the volume of the particle, i.e., the volume that contains all points where the 
susceptibility is not zero. V0 is a smaller volume such that , VV ⊂0 00 \ VV ∂∈r . ),( rrG ′  is 
the free space dyadic Green’s function, defined as 
−=∇∇+=′ 222
i1ˆˆ)iexp()iexp(ˆˆ),(
k IIIrrG , (2)
where I  is the identity dyadic, ck ω=  is the free space wave vector, rrR ′−= , R=R , 
and  is a dyadic defined as  (μ and ν  are Cartesian components of the vector 
or tensor). M is the following integral associated with the finiteness of the exclusion volume 
RR ˆˆ νμμν RRRR =ˆˆ
( )∫ ′−′′′′=
)()(),()()(),(d),( s30
rV rErrrGrErrrGrM χχ , (3)
where ),(s rrG ′  is the static limit ( ) of 0→k ),( rrG ′ : 
−−=∇∇=′
11ˆˆ),(
IrrG . (4)
L  is the so-called self-term dyadic: 
rV rL , (5)
where  is an external normal to the surface ∂Vn′ˆ 0 at point r'. L  is always a real, symmetric 
dyadic with trace equal to 4π [25]. It is important to note that L  does not depend on the size 
of the volume V0, but only on its shape (and location of the point r inside it). On the contrary, 
M does depend on the size of the volume, moreover it approaches zero when the size of the 
volume decreases [8] (if both χ(r) and E(r) are continuous inside V0). 
When deriving Eq. (1) the singularity of the Green’s function has been treated explicitly, 
therefore it is preferable to the commonly used formulation [8,15]: 
∫ ′′′′+=
r )()(),(d)()( 3inc rErrrGrErE χ . (6)
Moreover, Yanghjian noted [25] that there exist several methods for treating the singularity in 
Eq. (6) leading to different results. He also proved that the derivation of Eq. (6) is false in the 
vicinity of the singularity of ),( rrG ′ . Hence it can be considered correct only if the 
singularity is then treated in a way similar to that of Lakhtakia [8], resulting in the correct 
Eq. (1). 
Discretization of Eq. (1) is done in the following way [15]. Let , U
= /0=ji VV I  
for ji ≠ ; N denotes number of subvolumes.2 Although the formulation is applicable to any 
set of subvolumes Vi, in most applications standard (equal) cells are used. Then the shape of 
the scatterer cannot always be described exactly by such standard cells. Hence, the 
discretization may be only approximately correct. Assuming iV∈r  and choosing iVV =0 , 
Eq. (1) can be rewritten as 
)()(),(),()()(),(d)()( 3inc rErrLrMrErrrGrErE χχ ii
∂−+′′′′+= ∑ ∫
. (7)
                                                 
2 In the framework of the DDA we usually call a subvolume a dipole. 
The set of Eq. (7) (for all i) is exact. Further, one fixed point ri inside each Vi (its center) is 
chosen and  is set. In many cases the following assumptions can be made: irr =
)()()()(),(d3 jjijj
rErGrErrrG χχ =′′′′∫ , (8)
)()(),( iiiiiV rErMrM χ= , (9)
which state that integrals in Eq. (7) linearly depend upon the values of χ and E at point ri. 
Eq. (7) can then be rewritten as 
( ) iiii
jjjijii V ELMEGEE χχ −++= ∑
inc , (10)
where , , )( ii rEE = )(
incinc
ii rEE = )( ii rχχ = , ),( iii V rLL ∂= . 
The usual approximation [15] is to consider E and χ constant inside each subvolume: 
iii V∈== rrErE for)(,)( χχ , (11)
which automatically implies Eqs. (8), (9) and 
( )∫ ′−′′=
iii r ),(),(d
s3)0( rrGrrGM , (12)
∫ ′′=
ij rV
1 3)0( rrGG . (13)
Superscript (0) denotes approximate values of the dyadics. A further approximation, which is 
used in almost all formulations of the DDA, including e.g. [8], is  
),()0( jiij rrGG = . (14)
This assumption is made implicitly by all formulations that start by replacing the scatterer 
with a set of point dipoles. It is important to note that Eq. (10) and derivations resulting from 
it require weaker assumptions (Eqs. (8), (9)) than imposed by Eq. (11) and, moreover, 
Eq. (14). It is possible to formulate the DDA based on Eq. (10), e.g. the Peltoniemi 
formulation [26] that is described in Section 3.1. We postulate Eq. (10) as a distinctive feature 
of the DDA, i.e. a method is called the DDA if and only if its main equation is equivalent to 
Eq. (10) with any Vi, χi, iM , iL , and ijG . 
Kahnert [15] distinguished the DDA from the MoM by the fact that the MoM solves 
directly Eq. (10) for unknown Ei, while the DDA seeks not the total, but the exciting electric 
fields 
( )( ) selfexc iiiiiii EEEMLIE −=−+= χ , (15)
( ) iiiii ELME χ−=self , (16)
where  is the field induced by the subvolume on itself. Eq. selfiE (10) is then equivalent to  
jjijii
excexcinc EαGEE , (17)
where iα  is the polarizability tensor defined as 
( )( ) 1−−+= iiiiii V χχ MLIα . (18)
However, an alternative formulation of the DDA exists [4] seeking a solution for unknown 
polarizations Pi: 
iiiiii V EEαP χ==
exc , (19)
jijiii PGPαE
1inc . (20)
It is important to note that Pi, defined by Eq. (19), is only an approximation to the polarization 
of the subvolume Vi. This approximation is exact only under the assumption of Eq. (11), while 
the formulation itself does not require it. The formulation, using Eq. (20), can be thought as 
an intermediary between the DDA and the MoM as classified by Kahnert [15], therefore 
revealing complete equivalence of these two formulations. The special structure of the matrix 
ijG  makes Eq. (20) preferable over Eqs. (10), (17) to find a numerical solution. This is 
discussed in Section 4. 
Lakhtakia [8] classified strong and weak forms of the DDA as those accounting for or 
neglecting iM  respectively. The weak form approaches the strong form when the size of the 
cell decreases, because iM  approaches zero. For a cubical cell Vi and with ri located at the 
center of the cell, iL  can be calculated analytically yielding [25] 
=i . (21)
Using Eq. (18), this results in the well-known Clausius-Mossotti (CM) polarizability (used 
originally by Purcell and Pennypacker [1]) for the weak form of the DDA: 
ii d ε
α IIα , (22)
where )( ii rεε = , and d is the size of the cubical cell. 
After the internal electric fields are determined, the scattered fields and cross sections 
can be calculated. The scattered fields are obtained by taking the limit ∞→r  of the integral 
in Eq. (1) (see e.g. [7]): 
)iexp(
)(sca nFrE
= , (23)
where rrn =  is the unit vector in the scattering direction, and F is the scattering amplitude: 
∑∫ ′′⋅′−′−−=
krnnk )()()iexp(d)ˆˆ(i)( 33 rErnrInF χ . (24)
All other differential scattering properties, such as amplitude and Mueller scattering matrices, 
and asymmetry parameter >< θcos  can be derived from F(n), calculated for two incident 
polarizations [27]. Radiation forces also can be calculated [28-30]. Consider an incident 
polarized plane wave3
)iexp()( 0inc rkerE ⋅= , (25)
where , a is the incident direction, and ak k= 10 =e . The scattering cross section Csca is [27] 
2sca )(d
C . (26)
Absorption and extinction cross sections (Cabs, Cext) are derived [7,14] directly from the 
internal fields: 
( )∑ ∫ ′′′=
abs )()(Imd4 rErχπ , (27)
[ ]( ) ( )∗∗ ⋅=′⋅′′′= ∑∫ 02inc3ext )(Re4)()()(Imd4 eaFrErEr krkC i Vi
χπ , (28)
where * denotes a complex conjugate. Conservation of energy necessitates that 
absextsca CCC −= . (29)
However, as was noted by Draine [2], use of Eq. (29) for evaluation of Csca can lead to larger 
errors than Eq. (26), especially when . scaabs CC >>
The easiest way to express Eqs. (24) and (27) in terms of the internal fields in the 
subvolumes centers is to assume Eq. (11), yielding 
∑ ∫ ⋅′−′−−=
krnnk )iexp(d)ˆˆ(i)( 33)0( nrEInF χ , (30)
                                                 
3 DDA can be used for any incident wave, e.g. Gaussian beams [31]; however, we do not discuss this here. 
∑∑ ∗==
iii kVkC )Im(4)Im(4
abs EPE πχπ . (31)
Further approximation of Eq. (30), leaving only the lowest order expansion of the exponent 
around ri, leads to 
∑ ⋅−−−=
ii knnk )iexp()ˆˆ(i)(
3)0( nrPInF , (32)
which together with Eq. (28), leads to 
( )∑ ∗⋅=
inc)0(
ext Im4 EPπ . (33)
Eqs. (32) and (33) are identical to those used by Purcell and Pennypacker [1] and then by 
Draine [2], while expressions for Cabs (compared to Eq. (31)) are slightly different. These 
differences are discussed in Subsection 3.1. Unfortunately, many researchers do not specify 
explicitly how the scattering quantities are obtained from the computed internal fields or 
polarizations. Those who do usually use Draine’s prescription (Eqs. (26), (32), (33), and 
(35)). 
Errors of the formulation can be classified as associated with the finite cell size d 
(discretization errors), and with approximating the particle shape with a set of standard cells, 
e.g. cubical (shape errors). Discretization errors result from considering E constant inside 
each cell and the approximate evaluation of iM  and ijG . Shape errors also can be considered 
as resulting from the assumption of constant χ and E inside bordering cells, which is false 
since the edge of the particle crosses these cells. On the other hand, shape errors can be 
viewed as a difference of the results for the exact particle shape and for that comprised of the 
set of standard cells. Both errors approach zero when ∞→N , while the geometry of the 
scatterer and parameters of the incident field are fixed. However, the same does not apply if 
 while N is fixed, i.e. the DDA is not exact in the long-wavelength limit. Moreover, 
both errors are sensitive to the size of the scatterer in the resonance region (see discussion in 
Subsection 
3.2). The behavior of these errors was studied by Yurkin et al. [32]. 
3 Various DDA models 
3.1 Theoretical base of the DDA 
Since the original manuscript by Purcell and Pennypacker [1], many attempts have been made 
to improve the DDA. The first stage (1988-1993) of these improvements was reviewed by 
Draine and Flatau [4]. It has been noted [2] that Eq. (22) does not satisfy energy conservation, 
and results obtained using this formulation do not satisfy the optical theorem. Based on the 
well-known [33] “radiative reaction” (RR) electric field, a correction to the polarizability for a 
finite dipole was added [2]: 
i)32(1 α
= . (34)
Draine [2] also proposed the following expression for the absorption cross section: 
[ ]∑ ⋅−⋅=
iiii kkC
*3*exc)0(
abs )32()Im(4 PPEPπ , (35)
derived from Eq. (29) applied to a single point dipole. The PP formulation uses Eq. (35) 
without the second part. It can be verified that Eq. (35) results in zero absorption for any 
scatterer if the polarizability is of the following form: 
IAα 31 i)32( kii −=
− , Hii AA = , (36)
where H denotes the conjugate transpose of a tensor. For real refractive index m, RR and all 
other expressions specified below result in α  satisfying Eq. (36), which makes Eq. (35) 
clearly favorable over e.g. the PP formulation. It must be noted however that the original PP 
formulation, where CM polarizability was used, also results in zero absorption for real m. 
The correction in Eq. (34) is ( )3)(O kd . Several other corrections of ( )2)(O kd  have been 
proposed. The first one was proposed by Goedecke and O’Brien [7] and independently in two 
other manuscripts [34,35]. They started from Eqs. (10)-(12) and used the following 
simplifying fact for a cubical cell (also valid for spherical cells), resulting from symmetry: 
)(d IRRf
RRf , (37)
where the origin is in the center of the cube. Eq. (37) is valid for any f(R) that has a singularity 
of less than third order for , i.e. the integrals on both sides are defined. They obtained 0→R
32)0( )iexp(d
Rki IM . (38)
By expanding exp(ikR) in Taylor series one can obtain 
( )⎟⎟
++= ∫ 423
2)0( Oi
ki IM . (39)
The remaining integral was evaluated by approximating the cube by a volume-equivalent 
sphere, resulting in 
( )( )432DGF1)0( )(O))i(32()( kdkdkdbi ++= IM , (40)
611992.1)34( 31DGF1 ≈= πb . (41)
An exact evaluation, obtained without expanding the exponent, of Eq. (38) for the equivolume 
sphere with radius 31)43( πda =  was performed by Livensay and Chen [36] and 
implemented into the DGF formulation of the DDA by Hage and Greenberg [14,35] and later 
Lakhtakia [37]: 
[ ]1)iexp()i1()38()0( −−= kakai IM π . (42)
In terms of the first two orders of expansion, this yields an identical result as Eq. (40). Finally 
the polarizability is obtained as 
( )( )32DGF13CM
)i()32()(1 kdkdbd +−
α . (43)
We denote the method based on Eq. (42) as LAK. Differences between LAK and DGF should 
be noticeable only for large values of kd. 
Dungey and Bohren [38], using results by Doyle [39], proposed the following treatment 
of the polarizability. First, each cubic cell is replaced by the inscribed sphere that is called a 
dipolar subunit with a higher relative electric permittivity εs as determined by the Maxwell-
Garnett effective medium theory [27]: 
f , (44)
where 6π=f  is the volume filling factor. Other effective medium theories also may be used 
[40]. Next, the dipole moment of the equivalent sphere is determined using the Mie theory, 
and the polarizability is defined as [39] 
i= , (45)
where α1 is the electric dipole coefficient from the Mie theory (see e.g. [41]): 
)()()()(
)()()()(
sssssss
sssssss
1 xmxxxmm
xmxxxmm
α , (46)
where ψ, ξ are Riccati-Bessel functions; 2s kdx =  and ss ε=m  are the size parameter and 
the relative refractive index of the equivalent sphere. We denote this formulation for the 
polarizability as the a1-term method (note that this terminology was introduced later [42]). It 
has the particular property that 1constCMM ≠→αα  when , contrary to all other 
polarization prescription, for which this ratio approaches 1. It should be noted that the Mie 
theory is based on the assumption that the external electric field is a plane wave. In most 
applications of the DDA this is true for the incident electric field, but not for the field created 
by other subvolumes. Therefore the a
1-term method is expected to be correct only for very 
small cell size. Hence it is not clear whether this method has advantages even compared to 
CM. On the other hand, this method may be more justified for clusters of small spheres, 
where each sphere can be considered as a dipole (see Subsection 3.3). 
Draine and Goodman [3] pointed out that considering electric fields constant for 
evaluating integrals over a cell introduces errors of order ( )( )2O kd . This represents a problem 
for many polarizability corrections, based on integral equations. Draine and Goodman 
approached this problem from a different angle. They determined the optimal polarizability in 
the sense that an infinite lattice of point dipoles with such polarizability would lead to the 
same propagation of plane waves4 as in a medium with a given refractive index. This 
polarizability was called LDR (Lattice Dispersion Relation) and is, as expected, CM plus 
high-order corrections. These corrections in turn depend on the direction of propagation a and 
the polarization of the incident field e0: 
( ) ( )[ ]322LDR32LDR2LDR13CM
)i()32()(1 kdkdSmbmbbd +++−
α , (47)
8915316.1LDR1 ≈b , , , 1648469.0
2 −≈b 7700004.1
3 ≈b (48)
( )∑=
20eaS . (49)
We use a reverse sign convention in the denominator of Eq. (47) and the LDR coefficients as 
compared to the original paper [3]. 
Recently it has been shown [43] that the LDR derivation is not completely accurate, 
since the resulting dipole moment does not satisfy the transversality condition, for which a 
correction was proposed. This corrected LDR (CLDR) differs principally in the fact that the 
polarizability tensor can not be made isotropic but only diagonal [43], though not dependent 
on the incident polarization: 
( ) ( )[ ]3222LDR32LDR2LDR13CM
)i()32()(1 kdkdambmbbd +++−
α . (50)
Another flaw of LDR is that it is evidently not correct for dipoles near the particle surface. 
However, it is not clear how to evaluate the effect of these mistreated surface dipoles on the 
overall results, e.g. on the scattering cross section. 
Further improvement of the DDA was initiated by Peltoniemi [26] (PEL) who showed 
that the term M(Vi) in Eq. (7) can be evaluated exactly up to the third order of kd by 
expanding the term )()( rEr ′′χ  under the integral in a Taylor series over the point irr =′ , 
yielding 
( ) ( ,)(O3i3)iexp(d
)iexp(
EkdERRRRkRRk
ERkRRk
REMVM
ντρτρνμ
+∂∂−+−
∂−++=
                                                
 (51)
where χ, E and their derivatives are all considered at the point ri. Eq. (51) is correct up to the 
third order of kd since the third term in the Taylor series vanishes because of symmetry. For 
spherical Vi of radius a, the integrals can be evaluated exactly [26] in a way similar to 
4 with certain direction of propagation and polarization state. 
obtaining Eq. (42), but only terms of less than fourth order of kd are significant, which results 
( EkaakakaVi χχχχπ 42232 )(O)(10
)( +⎥
⎛ ⋅∇∇−∇−⎟
⎛ += EEEM ). (52)
If χ is constant inside the cell then the Maxwell equations state that 
EE 222 km−=∇ , 0=⋅∇ E . (53)
Hence Eq. (9) is valid up to the third order of ka and 
( )[ ]322 )(i)32()()101(1)34( kakami ++= IM π . (54)
Piller and Martin [44] proposed using sampling theory to evaluate the integrals in 
Eq. (1). The electric field and the susceptibility is sampled: 
∑ −′=′′
iiih )()()()()(
r rErrrrEr χχ , (55)
where hr(r) is the impulse response function of an antialiasing filter defined as 
)cos()sin(
qrqrqr
=r , (56)
where dq π2= . Eq. (1) is then transformed to Eq. (10) with the so-called filtered Green’s 
function, defined as 
∫ −′′′=
r3 )(),(d
ij hrV
rrrrGG . (57)
Eq. (57) can be viewed as a generalization of Eq. (13). The latter is obtained if a pulse 
function is considered instead of hr. The integral in Eq. (57) is evaluated analytically [44], 
taking V0 to be infinitesimally small. The filtered Green’s function does not have a singularity 
when , therefore ji rr = iiii V GM = . It was shown that the Fourier spectrum of E(r) lies on a 
sphere with radius m(r)k, if m is constant in the vicinity of r. Therefore at least two sampling 
points per wavelength in the scatterer are required. The susceptibility is also filtered, either by 
a mean value filter or a more complicated one, e.g. a Hanning window. This approach is 
called FCD (filtered coupled dipoles), and a computer code library for evaluation of filtered 
Green’s function is available [45]. 
Chaumet et al. [11] proposed direct integration of the Green’s tensor (IT) in Eqs. (12), 
(13). A Weyl expansion of the Green’s tensor is performed, transforming it to a form allowing 
efficient numerical computation of the self-term ( LM − ). They also proposed a correction to 
the second term in Draine’s expression for Cabs (Eq. (35)). Extension of their results to a non-
isotropic self-term is 
( ) ( )[ ]∑ −⋅+⋅=
iiiiiii VkC /)(ImIm4
***exc)0(
abs PLMPEPπ
The corrected second term is based on radiation energy of a finite dipole [11]: , in 
contrast to a point dipole used in the derivation of Eq. 
)Im( self ∗⋅ ii PE
(35). One can see that Eqs. (58) and 
(31) are equivalent. Moreover, both of them are equivalent to Eq. (35) if and only if 
IAM iii Vk
3i)32(+= , Hii AA = . (59)
This condition is similar, but not equivalent, to Eq. (36) and is always satisfied for RR, DGF, 
and LAK. Other polarizability prescriptions satisfy Eq. (59) for real m, then both Eqs. (58) 
and (35) result in zero absorption. 
Rahmani, Chaumet, and Bryant [46] proposed a new method (RCB) to determine 
polarizability based on the known solution of the electrostatic problem for the same scatterer. 
In the static limit the electric field at any point is linearly related to the incident field 
)()()( 01 rErCrE −= . (60)
Substituting Eq. (60) into Eq. (20) with the static Green’s tensor, one can obtain the 
polarizability, which would give an exact solution in the static limit, as 
1RCB −= iiii V Λχα , (61)
ii CCrrGCΛ
1),( −
∑+= χ , (62)
where )( ii rCC = . This static polarizability then replaces the CM polarizability, and the RR 
(Eq. (34)) is applied to it [46] to obtain the final polarizability for DDA simulations. It was 
later shown that RCB polarizabilities differ significantly from CM only for dipoles closer than 
2d to the interface [47]. 
In their next manuscript [48] Rahmani et al. stated that the previous derivation is correct 
only if the tensor C  is constant inside the particle (e.g. for ellipsoids), since otherwise the 
polarizability tensor obtained from Eq. (61) is generally not symmetric, which is physically 
impossible in the static case. This shows that a particle with a non-constant C  is not 
equivalent to any set of physical point dipoles even in the static regime. However, it is 
equivalent to a set of non-physical dipoles with an asymmetric polarizability. Therefore, the 
polarization defined by Eq. (61) formally can be used, by itself or with RR, even when C  is 
not constant. 
Collinge and Draine [47] empirically combined the RCB prescription with CLDR to get 
the surface-corrected LDR (SCLDR): 
( )( ) 13RCBRCBSCLDR −−= BαIαα d , (63)
where B  is the correction matrix (analogous to Eq. (50)): 
( )[ ]3222LDR32LDR2LDR1 )i()32()( kdkdambmbbB +++= μμνμν δ . (64)
All methods based on the paper by Rahmani et al. [46] are initially limited to very specific 
shapes of the scatterer (ellipsoids, infinite slabs and cylinders). Expansion of its applicability 
to other shapes is debatable [48] and would anyway require a preliminary solution of the 
electrostatic problem for the same shape, which is generally not trivial. 
All DDA formulations are schematically depicted in Fig. 1, which also shows 
interrelations between them. Some formulations can be compared unambiguously in terms of 
theoretical soundness: one is an improvement of the other, i.e. it employs fewer 
approximations. Such formulations are depicted in the same column on Fig. 1, while others 
cannot be compared directly with each other; they give rise to different columns. Comparison 
between formulations from different columns can and has been made almost exclusively 
empirically by comparing the accuracy of the simulation results (see Subsection 3.2). 
All the above techniques are aimed at reducing discretization errors; only a few aim at 
reducing shape errors. Some of them employ adaptive discretization (different dipole sizes) to 
better describe the shape of the scatterer (see Subsection 3.4). Another approach is to average 
susceptibility in boundary subvolumes. The simplest averaging using the Lorentz-Lorenz 
mixing rule was proposed by Evans and Stephens [49] for the case of the boundary between 
the scatterer and its surrounding medium 
3434 e
, (65)
where  is the effective susceptibility, and f is the volume fraction of the subvolume actually 
occupied by scatterer. 
A more advanced averaging, called the weighted discretization (WD), was proposed by 
Piller [13]. It modifies the susceptibility and self-term of the boundary subvolume.5 The 
particle surface, crossing the subvolume Vi, is assumed linear and divides the subvolume into 
two parts: the principal  that contains the center and a secondary  with susceptibilities piV
                                                 
5 any subvolume that has non-zero intersection with both the scatterer and the outer medium. All such 
subvolumes are accounted for. 
 Integral Eq. (1)  
discretization 
(no assumptions)
Eq. (7) 
General 
formulation of 
DDA – Eq. (20) 
Eqs. (8), (9) 
DGF, LAK 
Eqs. (11) 
Eq. (14)
(weak form)
CLDR 
a1 term 
SCLDR FCD 
sampling with 
antialiasing 
filter 
removing 
antialiasing 
filter 
improving polarizability starting 
from dipole formulation 
complies 
Eq. (14) 
simplifies to 
Fig. 1. Scheme of interrelation between the different DDA models discussed in Section 3.1. Arrows 
down correspond to assumptions employed. Vertical position of the method qualitatively corresponds 
to its accuracy (higher = better), however methods in different columns cannot be compared directly. 
iχ ,  and electric fields ,  respectively. Electric fields are considered constant 
inside each part and related to each other via a boundary condition tensor 
iχ ii EE ≡
iT : 
iii ETE =
s . (66)
Then the total polarization of the subvolume can be evaluated as follows: 
iiiiiiiii
i VVVr
EEErErP essspp3 )()(d χχχχ =+=′′′= ∫ , (67)
( ) iiiiiii VVV TI ssppe χχχ += . (68)
The susceptibility of the boundary subvolume is replaced by an effective one. 
The effective self-term is evaluated directly starting from Eq. (3), considering χ and E 
constant inside each part: 
( ) ( ) ii
rr TrrGrrGrrGrrGM ss3ps3ee
),(),(d),(),(d χχχ ∫∫ ′−′′+′−′′= . (69)
Piller [13] evaluated the integrals in Eq. (69) numerically. The final equations are the same as 
Eq. (20), where polarizabilities are obtained from Eq. (18) using effective susceptibilities and 
self-terms for boundary subvolumes. Hence, WD does not modify the general numerical 
scheme. 
Currently, there are no rigorous theoretical reasons for preferring one formulation over 
others. However, theoretical analyses of DDA convergence when refining discretization 
recently conducted by Yurkin et al. [32], showed that IT and WD significantly improve the 
convergence of shape and discretization errors, respectively. Experimental verification of 
these theoretical conclusions is still to be performed. 
Table 1. Accuracy of different DDA formulations for a sphere.a
Value Method x a/d y m Error, % Ref.
Cext a1-term 1÷2 2÷4
c 0.65 
0.85 
1.33+0.05i 
1.7+0.1i 
CSec, S11 LAK 9 
0.44 
0.42 
0.51 
1.05 
1.33+0.01i 
2.5+1.4i 
0.05, 37 
0.5, 35 
4, 15 
Csca, Cabs DGF ≤3.2c 16 ≤1 4+3i 5, 10÷30 [3]
CSec LDR ≤8c 16 ≤0.5, ≤0.1 m-1≤1 1, 2 
Csca 
LDR ≤7c 16 ≤1 
≤0.5, ≤1 
2+i 1.5 
3, 4 
CSec ≤16c
25 ≤1 1.6+0.0008i 
2.5+0.02i 
LDR [51]
[4]eCSec 
LDR any any ≤1 |m|≤3 5 
20÷30 
Csca LDR ≤10c 16 kd≤0.63 0.69 
0.41 
0.29 
[148]
S11 LDR ≤10c 24 kd≤0.42 0.69 
0.41 
Cext,  RMS11S
3.2 Accuracy of DDA simulations 
Over the years many results on the accuracy of DDA simulations have been published. It is, 
however, generally hard to systematically compare the relevant manuscripts because they all 
use different independent parameters, such as the size parameter x, refractive index m, or 
discretization, as a function of which the error is measured. We will describe discretization by 
the parameter kdmy =  or . The former is used wherever possible; however, 
in some cases a description of results is more straightforward in terms of y
kdmy )Re(Re =
Re. Accuracy results 
LDR 20÷160 
20÷130 
20÷60 
20÷30 
20÷30 
32÷256c
40÷256c
48÷128c
56÷80c
64÷88c
0.61÷0.65d
0.56÷0.64d
0.58÷0.65d
0.57÷0.60d
0.56÷0.62d
0.62d
1.05 
0.04, 38 
0.4, 23 
1, 59 
4.4, 56 
5.7, 105 
2.0, 86 
[113]
Ψ FCD π, 2π 2.8, 5.6c 1.7 1.5 1 [44]
WD-FCD 0.5÷3.2c
1.5÷3.8c
0.9÷1.5c
yRe=0.63 |m|<7b
|m|<2.5b
|m|<4b
[10]Ψ 
IT ≤5.2c CSec 
≤2.1c
≤1.1c
8 ≤1 1.5+0.3i 
3.5+1.4i 
7.1+0.7i 
Cabs 
CSec RCB-RR ≤8.2c 
≤7.5c 
≤5.9c 
≤3.4c 
≤1.3c
16 ≤1 1.8+0.4i 
1.9+i 
2.5+i 
2.5+4i 
7.4+9.4i 
CSec SCLDR 
SCLDR 
≤7.2c 
≤1.5c 
≤1.5c
12 ≤0.8 1.33+0.1i 
5+4i 
5+4i 
a All errors are relative. CSec denotes the maximum error over all cross sections, S11 and  correspond to 
maximum and root mean square error over the range of scattering angles, Ψ is the normalized mean error of the 
far-field electric fields [44]. In some cases two errors are shown in one cell separated by a coma. They 
correspond to two values of one of the parameters in the same row. 
b approximate description of the range. 
c this value is determined by other values in the same row. 
d this value is slightly different for different size parameters. 
e this corresponds to the “rule of thumb” for spheres. 
for scattering by a sphere are summarized in Table 1. All manuscripts on this subject can be 
divided into two classes: those that fix x and vary N (or equivalently, the number of dipoles 
per sphere radius a/d) with y, and those that fix a/d and vary the size parameter with y. The 
former is easier to interpret; the latter is easier to simulate. To facilitate comparison between 
different methods we provide both x and a/d, however one of them is dependent on the other. 
Some additional information on these results follows below. 
Draine and Goodman [3] compared RR, DGF, and LDR for cross sections of a sphere 
with . DGF is generally more accurate than RR. For 16/ =da 1|1| ≤−m  LDR gives superior 
or comparable results to DGF, for i2+=m  LDR and DGF are comparable, and for 
 DGF is preferable over LDR. In the review of LDR DDA, Draine and Flatau [4] 
summarized that for  cross sections can be evaluated to accuracies of a few percent 
provided . In that case differential cross sections have satisfactory accuracy: relative 
errors up to 20-30%, but only where the absolute value of the differential cross sections is 
small. For spheres, such results are obtained even for 
i34 +=m
2|| ≤m
3|| ≤m . Comparison of CLDR to LDR 
[43] only results in minor differences. Generally CLDR results in slightly better accuracy for 
Csca, but worse for Cabs. 
Piller and Martin [44] compared FCD to LAK by studying the dependence of the mean 
relative error of the far-field electric fields (Ψ) on y for spheres with π=x , 2π and . It 
was shown that FCD (with a Hanning window filter for the electric permittivity ε) is roughly 
3 times more accurate than LAK in the range 
5.1=m
5.27.0 ≤≤ y  and gives similar accuracies for 
 (for larger spheres). Comparison of WD to traditional methods [13] was performed 
for spheres with 
4.0≤y
π=x , 2π and 32.1=m , i7.01.2 + . LAK was used to determine 
polarizabilities. For  in the range 32.1=m 3.14.0 ≤≤ y  overall accuracy was only slightly 
improved, but error peaks for certain values of y were smoothed out. For i7.01.2 +=m  
accuracy was improved 4-5 times over the whole range 3.1≤y . Piller also showed [10] that a 
combination of WD and FCD gives even better results. Generally FCD decreases the negative 
effects of Re(ε) on accuracy and WD those of Im(ε). 
Rahmani et al. [48] showed that RCB was clearly superior to CM in calculating cross 
sections for fixed  and m from 16/ =da i4.08.1 +  to i4.94.7 +  in the range . Two 
corrections (LDR and RR) over the static case were compared, and they gave similar overall 
results. Improvement of overall accuracy compared with CM was 2-5 times in all cases 
studied. For a thin slab, it was shown [46,48] that the internal fields calculated using RCB 
differ from those by CM mostly near the interfaces, where RCB yields much smaller errors, 
almost the same as far from interfaces. 
Collinge and Draine [47] compared LDR, RCB, and SCLDR in calculations of cross 
sections of spheres with . It was shown that for 12/ =da i01.033.1 +=m , LDR and SCLDR 
are superior in the range , while for 8.0≤y i45 +=m , SCLDR and RCB are superior. 
Convergence of cross sections for spheres and ellipsoids for increasing N with fixed x and 
different m (from  to i01.033.1 + i45+ ) also was studied. SCLDR showed the most stable 
results for all cases, being the most or close to the most accurate one; however, for ellipsoids 
with large Im(m) RCB gave significantly more accurate results for Csca, especially for larger y. 
Performance of the DDA for more complex shapes also was studied by different 
authors. Flatau et al. [50] compared DDA simulations for a bisphere with an exact solution 
from a multipole expansion. For i01.033.1 +=m , 16/ =da , and 8.0≤y , LDR was several 
times more accurate than DGF and resulted in errors of less than 0.5% for both Csca and Cabs. 
Xu and Gustafson [51] made a similar but much more extended study of LDR. For 
, i008.06.1 +=m 25/ =da , and , errors in C4.0≤y ext, Cabs, and θcos  are within 10%. For 
, errors in the angular dependence of S81.0=y 11 are up to 20% while S12 and S21 were 
completely wrong. For , errors in cross sections exceed 10% for . i02.05.2 +=m 3.0≥y
Errors in the angular dependencies of the Mueller matrix elements are within 10-20% for 
 and increase rapidly with increasing y. For a fixed 3.0=y 3=x  and , errors 
i004.06.1 +=m
ext, Cabs, and >< θcos  decrease from 10% to 1% while y decrease from 1 to 0.2. For 
, the angular dependence of S33.0=y 11 is in good agreement with the rigorous solution, 
while S12 and S21 differ significantly for certain orientations of the bisphere. 
Hage and Greenberg [14] compared LAK to experimental results obtained from 
microwave experiments on porous cubes. Using i005.0362.1 +=m , 64.0=y  and , 
they obtained a difference of less than 40% with the experimental results of angular scattering 
patterns, except for deep minima. Light scattering of cubes, tiles, and cylinders with similar 
parameters also was studied and comparable differences between experiment and theory were 
obtained. Theoretical errors were estimated to be less than 10%, except for deep minima. 
5504=N
Iskander et al. [34] conducted a limited test of LAK for small elongated spheroids, 
comparing the results to those obtained using an iterative extended boundary condition 
method. Using , calculations were performed for aspect ratios up to 20 with maximum 
size parameter of the long axis being 10 and 0.5 for 
i01.033.1 +=m  and  
respectively. Errors in scattering cross section were 21% and 11%, respectively. Ku [52] 
compared LAK with CM and the a
i28.076.1 +
1-term for different shapes, but his conclusions are based 
on a large parameter y (up to 2), and are therefore suspicious and not further discussed here. 
Andersen et al. [53] studied the performance of the DDA for Rayleigh-sized clusters of 
a few spheres (most DDA formulations are then equivalent to CM). Several constituent 
materials were tested, all with high refractive indices in the studied region. It was shown that 
the DDA failed to converge using the fixed computational resources for very high (up to 13.0) 
and very low (down to 0.12) Re(m); up to 30 dipoles were used per diameter of a single 
sphere. 
It can be concluded that particles with more complex shapes than spheres are more 
difficult to model with the DDA, leading to larger errors for the same m and y. This effect can 
be explained in general by the increase of surface to volume ratio and hence larger fraction of 
boundary subvolumes [32]. Another possible reason is complex regions, e.g. contact between 
two particles in a cluster, where rapid variation of the electric field deteriorates the overall 
accuracy. There is, however, a notable exception from this general tendency. Shapes, which 
can be modeled exactly by a set of cubical dipoles, e.g. a cube, can be simulated using the 
DDA much more accurately than spheres, especially for small y [32]. 
Draine and Flatau [4] have introduced a “rule of thumb” for discretization: use 10 
dipoles per wavelength in the medium (i.e. either y or yRe equal to 0.63, depending on the 
interpretation). Though it is widely used, the accuracy of the results, when using such 
discretization, is hard to deduce a priori. Draine and Flatau themselves derived an estimate of 
the error based on a set of test simulations. This estimate is described above and mentioned in 
Table 1; it is usually cited as a “few percent accuracy in cross sections.” However, it may 
significantly over- or under-estimate the error, especially for large size parameters. Moreover, 
it does not completely account for the dependence on m, even in the stated range of its 
application ( ), since DDA accuracy deteriorates rapidly with increasing m (see 2|| ≤m Table 
1). Still, the rule of thumb is good first guess for many applications. 
Most studies of DDA accuracy are limited to integral scattering quantities and, at most, 
the angular dependence of S11. In only a few manuscripts are other scattering quantities 
studied. For instance, Singham [54] simulated the angular dependence of Mueller matrix 
element S34 for spheres and less compact particles, using CM polarizability. It was shown that 
an accurate simulation of this element requires smaller values of y than for S11. For 55.1=x  
and  a calculation of S33.1=m 11 was accurate already for 8.0=y , while  was 
required for S
2.0≤y
34. It was also reported that for less compact objects like discs and rods, the 
required y was larger, 0.4 and 0.55 respectively, because of the smaller interaction between 
the dipoles. However, Hoekstra and Sloot argued [55] that this effect is mostly caused by the 
pronounced S34 sensitivity to surface roughness, which is significant for smaller size if y is 
fixed. They showed that for  and 7.10=x 05.1=m , very high accuracy is achieved with 
 because of the larger number of dipoles used. 66.0=y
Internal fields are an intermediate result in the DDA. They cannot be directly compared 
to the experimental results; however, all measured scattering quantities are derived from 
them. Therefore, a study of their accuracy can reveal greater understanding of the nature of 
DDA errors. Hoekstra et al. [56] performed such a study for LAK polarizability. Three 
spheres were examined with , 9, 5 and 9=x 05.1=m , i01.033.1 + ,  respectively. 
Values of y were 0.44, 0.42, and 0.51 respectively. The most significant errors in the 
amplitude of the internal field were localized at the boundary of the spheres with maximum 
relative errors of 3.4%, 19%, and 120% respectively. Errors in S
i4.15.2 +
12, S33, S34 were significant 
only for the third sphere. It was shown that for a given yRe these errors rapidly increase with m 
but only slightly depend upon x in the range from 1 to 10. Moreover, the DDA is capable of 
reproducing resonances of Mie theory, although their positions are slightly shifted (less than 
1% in m). 
Druger and Bronk [57] studied the accuracy of the internal fields for single and coated 
spheres. They used 5.1=x , , and CM polarizability. Errors in the internal fields were 
localized at the interfaces, with average errors larger than 30% for a single sphere with 
 and , and less than 7% for a single and concentric sphere with  and 
. The core of the concentric sphere has 
8.1≤m
8.1=m 17.0=y 3.1=m
08.0=y 1.1=m  and its diameter is half the total 
diameter. The angular dependence of the absolute values of S1 and S2 had significant errors in 
the side- and backscattering. It can be concluded that shape errors contribute mostly to the 
internal fields near the boundary, and increase with m. 
All the literature discussing DDA accuracy shows errors as a function of input 
parameters and discretization, which is the most straightforward way. The only exception so 
far is the rule of thumb, which is too general and approximate to be applied in many particular 
cases. A more useful way to present errors is to fix the desired accuracy for certain input 
parameters and find the discretization that results in such accuracy. Such an analysis can be 
applied directly to practical calculations and can be used to derive rigorous estimates of DDA 
computational requirements [58]. 
In a number of manuscripts the origin of errors in the DDA was examined to try to 
separate and compare shape and discretization errors [49,59-62]; however, no definite 
conclusions were reached. The uncertainty was due to the indirect methods used that have 
inherent interpretation problems. Recently, Yurkin et al. [63] proposed a direct method to 
separate shape and discretization errors, which can be used to study their fundamental 
properties. This method also can be applied to study the performance of different formulations 
aimed at decreasing shape errors, e.g. WD. For example, it has been shown that the maximum 
errors of S11(θ) for a sphere with  and 5=x 5.1=m , discretized using 16 dipoles per diameter 
( ), are mostly due to shape errors. However the same is not true for all measured 
quantities. In another manuscript [32] it was suggested that the discretization error should 
decrease more rapidly with decreasing y than shape errors. However, it is still hard to deduce 
a priori the importance of shape errors for a certain scatterer and y; hence, further systematic 
quantitative study is required. 
93.0=y
3.3 The DDA for clusters of spheres 
There are two main peculiarities when the DDA is applied to clusters of spheres. First, such 
particles are generally less compact, yielding smaller interactions between dipoles. This leads 
to a smaller condition number of the DDA interaction matrix and hence faster convergence of 
the iterative solver (see Section 4.1). Second, when the constituent spheres are small 
compared to the wavelength, each sphere can be modeled as one spherical subvolume, 
yielding some theoretical simplifications. 
A general theory exists [64] based on the Mie theory (generalized multiparticle Mie 
solution (GMM) [65]) that allows for highly accurate simulations of clusters of spheres. 
However, when many small spheres are used one wants to minimize the number of unknowns 
in the linear system. Direct reduction of the GMM to the lowest order (using only the first 
order expansion coefficients) leads to DDA + CM [64]. Improving accuracy in the GMM is 
done by accounting for higher multipole moments, while the DDA introduces higher order 
corrections to the coefficients of the linear system. It is not clear how the accuracy of these 
two methods compare with each other; however, the former should lead to a formulation 
similar to a coupled multipole method (Subsection 3.4) with a larger number of unknowns. 
DDA-based methods (starting usually with the integral equations introduced in Section 2) 
should be successful in making the formulation more accurate without increasing the number 
of unknowns, which is the goal for large clusters of small spheres. Moreover, the DDA may 
employ fast algorithms for solving the linear system. In this setting, the fast multipole method 
(FMM) (see Subsection 4.5) seems most promising. 
It should be noted, however, that a cluster having a small size parameter (i.e. in the 
electrostatic approximation) does not imply that all expansion coefficients, except the first 
one, are negligible. This is because the size of the constituent particles is also very small and 
the fields inside them are far from constant, especially when the spheres are located close to 
each other and have large refractive indices [66]. Therefore, the DDA does have some 
principal difficulties of calculating scattering by clusters of spheres. Mackowski [67], for 
instance, found that for some systems composed of spheres much smaller than the 
wavelength, up to 10 expansion terms were necessary to achieve convergence. In studies of 
osculating spheres, Ngo et al. [68] proved that the GMM could be chaotic and were able to 
calculate Lyapunov exponents, and that the slow convergence for the touching spheres was 
the result of the system lying in an attractor region. A recent paper by Markel et al. [69] 
presented computationally efficient modifications of the GMM in the static limit and 
demonstrated the insufficiency of the DDA to compute scattering properties of fractal 
aggregates accurately. However, Kim et al. [70] showed that the DDA is satisfactory in 
calculating the static polarizability of dielectric nanoclusters, especially of clusters with a 
large number of constituents. 
The development of DDA-based methods for calculating light scattering by clusters of 
small spheres was started by Jones [71,72], who developed a method similar to CM. Iskander 
et al. [34] used a method equivalent to LAK to calculate scattering of chained aerosol 
clusters. This subject was further investigated by Kosaza [73,74]. Lou and Charalampopoulos 
[75] (LC) further improved the calculations of the interaction term and scattering quantities. 
Starting from an integral equation for the internal field equivalent to Eq. (1), they assumed 
Eq. (11). After that the integrals in Eqs. (12) and (13) over spherical subvolumes can be 
evaluated analytically. The result for the interaction term is the following: 
),()()0( jiij ka rrGG η= , (70)
where a correction function η is defined as 
)O()101(1
cossin
3)( 42
x +−=
=η . (71)
Eq. (30) also is evaluated analytically, yielding 
∑ ⋅−−−=
ii knnkak )iexp()ˆˆ)((i)(
3)0( nrPInF η , (72)
( )∑ ∗⋅=
iikakC
inc)0(
ext Im)(4 EPηπ . (73)
The following expression for Cabs is stated without derivation: 
iikakC )Im()(4
abs EPηπ . (74)
Markel et al. [76] applied the DDA to fractal clusters of spheres, and studied their 
optical properties. However, they have not fixed the polarizability of a single dipole but rather 
treated it as a variable, calculating the dependence of a cluster’s optical characteristics upon it. 
Pustovit et al. [77] argued that the DDA is inaccurate for touching spheres. They developed a 
hybrid of the DDA and the GMM, which considers only pair interactions between spheres (as 
the DDA) but, when calculating them, accounts for higher multipole terms. This formulation 
can be considered as the one providing a more accurate evaluation of the interaction term 
(Eq. (13)), and hence similar to LC. 
LC was compared to DGF and LAK in a Csca computation of a cluster of 10 particles for 
 and . Differences between DGF and LAK are less than 1% (as 
expected), while the difference between LC and LAK increases quadratically with ka, 
reaching 10% for 
i7.07.1 +=m 5.005.0 ≤≤ ka
5.0=ka . However, as no exact (e.g. GMM) solution is presented, the 
accuracy of each individual method is not clear. 
Okamoto [42] tested the a1-term method for clusters of up to 3 touching spheres. No 
effective medium is needed in this case, making the method sounder. It was shown that the a1-
term is clearly superior to LDR in cross-sections calculations, when each sphere is treated as a 
single dipole. Errors of the a1-term are less than 10% for 2.1≤y  when . For 
three collinear touching spheres the errors are 30% and 40% for  and 2.8 when 
 and  respectively. However, errors do not seem to diminish 
significantly for small y (results are presented only down to 
i01.033.1 +=m
9.1≤y
i01.033.1 +=m i2 +
2.0=y ). Therefore, the a1-term 
seems suitable for obtaining quick crude estimations of cross sections. 
In the sequel of this subsection we mention several applications of the DDA to 
scattering from clusters of spheres. It was applied to describe the scattering by astrophysical 
dust aggregates [78,79] using the a1-term method. Hull et al. [80] applied CM DDA to Diesel 
soot particles. LC was applied [81] to the computation of light scattering by randomly 
branched chain aggregates. Lumme and Rahola [40] studied scattering properties of clusters 
of large spheres (each modeled by a set of dipoles) with the a1-term method considering 
astrophysical applications. Hage and Greenberg [35] studied scattering by porous particles, 
which were modeled as clusters of cubical cells making their method equivalent to standard 
LAK. Recently the DDA with LDR was used [82] to model scattering by porous dust grains 
and compare them to approximate theories, e.g. effective medium theories. It also was used to 
study light scattering by fractal aggregates [83], especially its dependence on the internal 
structure [84]. 
3.4 Modifications and extensions of the DDA 
Bourrely et al. [85] proposed to use small d to minimize surface roughness, but larger dipoles 
inside the particle. Starting with small dipoles with CM polarizability, one dipole is combined 
with 6 adjacent ones (if they all have the same polarizability) producing a dipole, located at 
the same point but with a 7 times larger polarizability. This operation is repeated while 
possible. Interaction terms are considered in their simplest form (Eq. (14)). This method 
allows the decrease of the shape errors with only a minor increase in the number of dipoles. 
The authors showed that this method is more than two times more accurate than CM for some 
test cases. 
Rouleau and Martin [86] proposed a generalized semi-analytical method. A dynamic 
grid is used to evaluate the integral in Eq. (1). First, a static grid is built inside the particle. 
Then each point on the static grid is used as an origin of a spherical coordinate system, and 
the particle is approximated by an ensemble of volume elements in these spherical 
coordinates. As usual, the polarization inside each subvolume is assumed constant, but 
Eq. (13) can be evaluated analytically in spherical coordinates. Polarization inside a 
subvolume is obtained by interpolation of its values at the points of the static grid. In addition, 
adaptive gridding is employed, where smaller subvolumes are used at the boundary of the 
particle. 
Mulholland et al. [87] proposed a coupled electric and magnetic dipole method 
(CEMD), where a magnetic dipole is considered at each subvolume together with an electric 
dipole. Polarizabilities are derived from the a1 and b1 terms of the Mie theory. CEMD 
requires two times more variables in the linear system, since the electric and magnetic fields 
are interconnected. Lemaire [88] went further and developed the coupled multipole method, 
considering also the electric quadrupole. Addition of the electric quadrupole can be 
considered as a more accurate evaluation of the interaction term in Eq. (13), as compared to 
Eq. (14). It results in even better accuracy than CEMD, but at the expense of additional 
computation time. The major disadvantage of all these four methods is that the matrix of the 
system of linear equations does not seem to have any special form, suitable for faster 
algorithms (see Section 4). Therefore computational costs are much larger compared to 
regular methods, thus limiting their practical use. In what follows, several DDA extensions 
are mentioned without further discussion. 
The theoretical basis for application of the DDA to optically anisotropic particles was 
summarized by Lakhtakia [89]. Loiko and Molochko [90] applied the DDA to study light 
scattering by liquid-crystal spherical droplets. Smith and Stokes [91] used the DDA to 
calculate the Faraday effect for nanoparticles. Researchers in the electrical engineering 
community applied MoM (in a variation that is equivalent to the DDA) to anisotropic 
scatterers [92,93]. 
Rectangular parallelepipeds can be used as subvolumes in the DDA [11,23,43]. This 
allows an accurate description of light scattering by particles with large aspect ratios, using 
fewer dipoles and is also compatible with FFT techniques (Subsection 4.4). 
Khlebtsov [94] proposed a simplification of the DDA, based on the assumption that all 
polarizations are parallel to the incident electric field. The number of variables is thus reduced 
three times, however at a cost of accuracy. Moreover, depolarization is completely ignored. 
Markel [95] analytically solved the DDA equations for scattering by an infinite one-
dimensional periodic dipole array. This approach is similar to the one used in obtaining the 
LDR formulation for dipole polarizability [3]. 
Chaumet et al. [96] generalized the DDA to periodic structures, and further to defects in 
a periodic grating on a surface [97]. The idea of using the complex Green’s tensor in the 
standard DDA formulation was summarized by Martin [98]. 
Yang et al. [99] used the DDA to calculate surface electromagnetic fields and determine 
Raman intensities for small metal particles of arbitrary shape.  
Lemaire and Bassrei [100] showed that the shape of an object can be reconstructed from 
the measured angle dependence of scattered intensities. This procedure can be thought of as 
an inversion of the dependence between dipole polarizabilities and scattering. This 
dependence is taken from the DDA. A similar idea is used in recent manuscripts on optical 
tomography [101-103]. 
Zubko et al. [104] modified the Green’s tensor used in the DDA to study the 
backscattering of debris particles. They showed that the far-field part of the Green’s tensor is 
responsible for both the backscattering brightness surge and the negative polarization branch. 
4 Numerical considerations 
In this section the numerical aspects of the DDA are discussed. One should keep in mind, 
however, that final simulation times depend not only on the chosen numerical methods but 
also on the particular implementation. Recently, Penttila et al. [105] have compared four 
different computer programs for the DDA. These are based on almost identical numerical 
methods: the Krylov-subspace iterative method (Section 4.1) combined with a FFT 
acceleration of the matrix-vector product (Section 4.4). However, simulation times may differ 
by several factors. Optimizations of computer codes are not further discussed in this review. 
4.1 Direct vs. iterative methods 
There are two general types of methods to solve linear systems of equations , where x 
is an unknown vector and A and y are known matrix and vector, respectively: direct and 
yAx =
iterative [106]. Direct methods give results in a fixed number of steps, while the number of 
iterations required in iterative methods is generally not known a priori. The most usual 
example of a direct method is LU decomposition, which allows quick solving for multiple y 
once the decomposition is performed. Iterative methods are usually faster, less memory 
consuming and numerically more stable. However, iterative methods cannot be considered 
superior over direct, since they strongly depend on the problem to solve [107]. 
For a general n×n matrix (in DDA Nn 3= ) computation time of LU decomposition is 
O(n3) and storage requirements O(n2), while computation time for one iteration is O(n2) [107]. 
Iterative methods for a general matrix converge in O(n) iterations, although some of them 
may not converge at all. However, in many cases satisfactory accuracy can be obtained after a 
much smaller number of iterations. In these cases, iterative methods can provide significant 
increases in speed, especially for large n. Most iterative methods access the matrix A only 
through matrix-vector multiplication (sometimes also with the transposed matrix), which 
allows the construction of special routines for calculation of these products. Such routines 
may decrease memory requirements, since it is no longer necessary to store the entire matrix, 
especially for matrices of special form (see Subsection 4.3). A special structure of the matrix 
may also allow acceleration of the matrix-vector product from O(n2) to O(nlnn) (see 
subsections 4.4, 4.5). However, the same applies to direct methods (see Subsection 4.3). 
Throughout DDA history, mostly iterative methods were employed (however see 
Subsection 4.6). At first, they were used to accelerate computations [1], but they also allowed 
larger numbers of dipoles to be simulated [6,108], since storage of the entire matrix is 
prohibitive for direct methods. The most widely used iterative methods in the DDA are 
Krylov-space methods, such as [107] conjugate gradient (CG), CG applied to the Normalized 
equation with minimization of Residual norm (CGNR), Bi-CG, Bi-CG stabilized (Bi-
CGSTAB), CG squared (CGS), generalized minimal residual (GMRES), quasi-minimal 
residual (QMR), transpose free QMR (TFQMR), and generalized product-type methods based 
on Bi-CG (GPBi-CG) [109]. 
An important part of the iterative solver is preconditioning, which effectively decreases 
the condition number of the matrix A and therefore speeds up convergence. However, this 
requires additional computational time during both initialization and each iteration. 
Preconditioning of the initial system can be summarized as [107] 
yMxMAMM 12
21 )( =
− , (75)
where M1 and M2 are left and right preconditioners, respectively. Preconditioners should 
either allow fast inversion or be integrated into the iteration process. The simplest 
preconditioner of the first type is the Jacobi (point), which is just the diagonal part of matrix 
A. An example of the second type of preconditioner is the Neumann polynomial 
preconditioner of order l: 
)( AIM . (76)
QMR and Bi-CG can be made to employ the complex symmetric (CS) property of the 
DDA interaction matrix to halve the number of matrix-vector multiplications [110] (and thus 
computational time). Lumme and Rahola [40] were the first to apply QMR(CS) to the DDA 
and compared it with CGNR. They used m from i1.06.1 +  to i43+ , and x from 1.3 to 13.5, 
corresponding to N from 136 to 20336. For all cases studied QMR(CS) was 2-4 times faster 
than CGNR. 
Rahola [9] further studied QMR(CS) and compared it to CGNR, Bi-CG(CS), Bi-
CGSTAB, CGS, GMRES (full and with different memory length). For a “typical small 
problem” (parameters were not specified, unfortunately) the convergence of different methods 
was tested and QMR(CS) along with Bi-CG(CS) showed the best results. Although full 
GMRES was able to converge in fewer iterations, GMRES with as much as 40 memory 
lengths was slower than QMR(CS). 
Flatau [111] reviewed the use of iterative algorithms in the DDA and tested many of 
them, together with several preconditioners. He calculated scattering of a homogenous sphere 
with  and m from 1.33 up to 1.0=x i0001.05+ , 1=x  and m from 1.33 up to  and 
. Left (L) and right (R) Jacobi-, and first-order Neumann polynomial 
preconditioners were tested. Unfortunately the number of dipoles N was not specified, which 
hampers comparison with other studies. For small particles CG(L) was superior for all 
refractive indices studied. CG and CG(R) showed similar results, while CGNR(L) and Bi-
CGSTAB(L) were about 4 times slower. For 
i33.1 +
i0001.03+
1=x  Bi-CGSTAB(L) was superior while Bi-
CGSTAB,(R) and CGS,(L),(R) were slightly worse. TFQMR (both with and without Jacobi 
preconditioner) was 3-4 times slower. The first-order Neumann preconditioner showed 
unsatisfactory results. It was concluded that Bi-CGSTAB(L) is the most satisfactory choice 
for the DDA, and that method is the default one used in the DDSCAT program [6]. 
Recently Fan et al. [112] have compared GMRES, QMR(CS), Bi-CGSTAB, GPBi-CG, 
and Bi-CG(CS). They tested them on wavelength-sized scatterers (x up to 10) with m up to 
, and concluded that GMRES with memory depth 30 was the fastest, although it 
required four times more memory than the other methods. However, only the times of the 
matrix-vector product was compared, while other parts of the iteration may also take 
significant time, especially for GMRES(30). Choosing from less memory-consuming 
methods, QMR(CS) and Bi-CG(CS) showed a better convergence rate than Bi-CGSTAB and 
GPBi-CG, especially when 
i2.05.4 +
2>m . Moreover, the authors pointed out some flaws in the 
comparison by Flatau [111], making his conclusions insufficient. 
Yurkin et al. [113] employed QMR(CS), Bi-CG(CS), and Bi-CGSTAB to simulate light 
scattering by spheres with x up to 160 and 40 for 05.1=m  and 2, respectively. It was shown 
that convergence of the iterative methods becomes very slow with increasing x and m (up to 
105 iterations are required), and none of them is clearly preferable to the others. Moreover, 
there seems to be no systematic dependency of the choice of the best iterative solver on x and 
m; however, the difference in computational time was less than a factor of two, except for the 
largest x and m studied. 
Rahola [114] showed that the spectrum of the integral scattering operator for any 
homogenous scatterer is a line in the complex plane going from 1 to m2, except for a small 
amount of points, which corresponds to refractive indices that cause resonances for the 
specific shape. The spectrum of A is similar, since this matrix is obtained in the DDA by 
discretization of the integral operator (see also [9]). Assuming that the spectrum of A exactly 
lies on the specified line, it was shown that an estimate for the optimal reduction factor6 γ can 
be given as 
Eq. (77) is an approximation valid for small particle sizes, where no, or only few, resonances 
are present. However, in all cases the spectrum of A resembles the spectrum of the linear 
operator, which is defined by shape, size and refractive index of the scatterer. Therefore, the 
spectrum, and thus convergence, should not depend significantly on the discretization. This 
fact was confirmed empirically in other manuscripts [9,63]. 
Budko and Samokhin [115] generalized Rahola’s results to arbitrary inhomogeneous 
and anisotropic scatterers. They described a region in the complex plane that contains the 
whole spectrum of the integral scattering operator. This region depends only on the values of 
m inside the scatterer and does not depend on x. They showed that for purely real m or for m 
with very small imaginary part this region may come close to the origin, therefore the 
spectrum may contain very small eigenvalues for particles larger than the wavelength. This 
                                                 
6 Norm of the residual is decreased by this factor every iteration. 
may explain the extremely slow convergence of the iterative solver for real m and large x, 
which was recently obtained in numerical simulations [113]. Based on the analysis of the 
spectrum of the integral scattering operator for particles much smaller than the wavelength, 
Budko et al. [116] proposed an efficient iteration method for this particular case. 
It can be concluded that there are several modern iterative methods (QMR(CS), Bi-
CG(CS), and Bi-CGSTAB) that have proved to be efficient when applied to the DDA. 
However, none of them can be claimed superior to the others, and one should test them for 
particular light-scattering problems. Moreover, except for the simplest cases, preconditioning 
of the DDA interaction matrix is almost not studied, while there is a need for it for large x and 
m, since then all methods converge extremely slowly or even diverge. It seems to us that the 
next major numerical advance in the DDA will be achieved by developing a powerful 
preconditioner for the DDA matrix. 
A large number of dipoles requires large computational power and, hence, parallel 
computers are commonly used, e.g. [108,113]. Parallel efficiency is not discussed here, but 
for iterative solvers, it is generally close to 1 [117]. However, this is not true for all 
preconditioners [107], and hence heavy preconditioners requiring large computational time in 
combination with a parallel DDA implementation should be employed with caution. 
4.2 Scattering order formulation 
The Rayleigh-Debye-Gans (RDG) approximation [27] consists in considering E(r) equal to 
Einc(r). F(n) is then obtained directly from Eq. (24). Generalization of the RDG approach is 
obtained by iteratively solving the integral equation (1), which can be rewritten as 
)()()( inc rΛErErE += , (78)
where Λ is a linear integral operator describing the scatterer. The iterative scheme is readily 
obtained by inserting the current (l-th) iteration of the electric field E(l)(r) into the right side of 
Eq. (78) and calculating the next iteration in the left side: 
)()()( )(inc)1( rΛErErE ll +=+ . (79)
The starting value is taken the same as in RDG, , and the general formula for 
the solution is the following: 
)()( inc)0( rErE =
inc )()(
l rEΛrE , (80)
which is a direct implementation of the well-known Neumann series: 
lΛΛI , (81)
where I is the unitary operator. A necessary and sufficient condition for Neumann-series 
convergence is 
1<Λ . (82)
Physical sense of this iterative method lies in successive calculations of interaction 
between different parts of the scatterer. The zeroth approximation (or RDG) accounts for no 
interaction; the first approximation considers the influence of scattering of each dipole on the 
others once, and so on. Eq. (82) states that the interaction inside the scatterer should be small, 
but not as small as required for the applicability of RDG ( 1<<Λ ). In scattering problems, 
especially in quantum physics, Eq. (80) is called the Born expansion. 
Although theoretically clear, the Born expansion is not directly applicable [118], since 
each successive iteration requires analytical evaluation of multidimensional integrals with 
rising complexity, which quickly becomes unfeasible even for the simplest scatterers. The 
latest result is probably that of Acquista [118], who evaluated the Born expansion for a 
homogenous sphere up to second order. Therefore, realistic application of the Born expansion 
does require discretization of the integral operator, which is naturally done in the DDA. 
A scattering order formulation (SOF) of the DDA was developed independently by 
Chiappetta [119] and Singham and Bohren [12,120] by applying the Neumann series to 
Eq. (17). Λ is then a matrix defined as jijij αGΛ = , where each element is a dyadic, which 
can be expressed as a 3×3 matrix. An explicit check of Eq. (82) for a certain scatterer is not 
feasible numerically, however de Hoop [121] derived a sufficient condition for scalar waves: 
1)(max)(2 20 <r
χπ kR , (83)
where R0 is the radius of the smallest sphere circumscribing the scatterer. Although not 
directly applicable to light scattering, Eq. (83) can be used as an estimate. 
The range of size parameter and refractive index where SOF converges is limited [120]. 
Moreover, even when SOF converges, more advanced iterative methods converge faster (see 
Subsection 4.1). However, SOF has clear physical sense and can be used to study the 
importance of multiple scattering. 
4.3 Block-Toeplitz 
A square matrix A is called Toeplitz if jiij aA −= , i.e. matrix elements on any line parallel to 
the main diagonal are the same [106]. In a block-Toeplitz (BT) matrix (of order K) elements 
ai are not numbers, but square matrices themselves: 
. (84)
A 2-level BT matrix has BT matrices as components ai. Proceeding recursively a multilevel 
BT (MBT) matrix for any number of levels is defined. 
Let us consider a rectangular lattice nx×ny×nz, numbered in the following way 
zzyyzxxzy ininninnni +−+−= )1()1( , (85)
where  indicates the position of the element along the axes. Let us also define 
the vector index . Then one can verify that the interaction matrix in Eq. 
},...,1{ μμ ni ∈
),,( zyx iii=i (20), 
defined by Eq. (13), satisfies the following: 
jiGGG −′== jiij . (86)
This equation alone can be used to greatly reduce the storage requirements of iterative 
methods by use of indirect addressing. Further improvement is to note that Eq. (86) defines a 
symmetric 3-level BT matrix (orders of subsequent levels – nx, ny, nz) whose smallest blocks 
are 3×3 matrices (dyadics) ijG . 
A rectangular lattice is not much of a restriction, since any scatterer can be embedded in 
an appropriate rectangular grid. However, additional “empty” dipoles should be introduced to 
build up the grid up to the full parallelepiped. Moreover, position and size of the dipoles 
cannot be chosen arbitrarily to better describe the shape of the scatterer. This is especially 
problematic for highly porous particles or clusters of particles, where the monomer has a size 
comparable to a single dipole. For all other cases these restrictions are minor compared to the 
large increase in computational speed, imposed by the BT-structure of the interaction matrix. 
A matrix-vector multiplication can be transformed to a convolution, which is computed using 
a fast Fourier transform (FFT) technique in O(nln(n)) operations (see Subsection 4.4). Note 
however, that alternative techniques exist that do not require a regular grid (see Subsection 
4.5). 
The BT-structure also permits acceleration of direct methods. Flatau et al. [122] used an 
algorithm for inversion of symmetric BT-matrices. It has complexity )(O 3 xnn  and storage 
requirements )(O 2 xnn , since only 2 block columns of the inverse matrix need to be stored. 
In this case the x-axis is oriented along the longest particle dimension. Recently Flatau [123] 
studied the special case of 1D DDA where all dipoles are located on a straight line and 
equally spaced, in which systems of equations for different components can be separated. The 
interaction matrix for each component is symmetric Toeplitz, and a modern fast algorithm can 
be applied for its inversion. This method requires preliminarily solving linear equations for 
two right sides (e.g., by some iterative technique); then multiplication of the inverse matrix by 
any vector (i.e., a solution of the linear system for any right part) requires only O(nln(n)) 
operations. However, Flatau pointed out a strict limitation for all methods for fast calculation 
of the inverse of the interaction matrix: they are applicable only when polarizabilities of all 
dipoles are the same, since otherwise the first term on the right side of Eq. (20) ruins the BT 
structure on the diagonal of the interaction matrix. Therefore, they are currently limited to 
homogenous rectangular scatterers. Fortunately, it is not a problem for matrix-vector 
multiplication, since the diagonal term can be evaluated independently and added to the final 
result. 
4.4 FFT 
Goodman et al. [124] showed that multiplication of the interaction matrix for a rectangular 
lattice (see Subsection 4.3) by a vector can be transformed into a discrete convolution 
jii PGPGPGy ′′=′== ∑∑∑
)2,2,2(
)1,1,1(
)1,1,1(1
zyxzyx nnnnnnN
jij , (87)
where iG′  is defined by Eq. (86) (and 0=′0G ) for μμ ni ≤ and 
⎧ ≤≤∀
otherwise,0
1:, μμμ njj
P . (88)
Both G ′  and  are then regarded as periodic in each dimension μ with period 2nP′ μ. A 
discrete convolution can be transformed with a FFT to an element-wise product of two 
vectors, which is easily computed. It requires evaluation of a direct and inverse FFT for each 
matrix-vector product. Each of them is a 3D FFT of order 2nx×2ny×2nz. This operation is done 
for each of the 3 Cartesian components of P′  and preliminary calculations is performed for 6 
independent tensor components of G ′ . 
A slightly different method can be devised based on the paper by Barrowes et al. [125], 
who developed an algorithm for multiplication of any MBT by a vector. The multiplication is 
brought down to a 1D convolution that is evaluated by two 1D FFTs of order 
)12)(12)(12( −−− zyx nnn . Flatau [123] proposed an algorithm of matrix-vector 
multiplication for BT interaction matrix (e.g. 1D DDA), which requires twice as many FFTs 
as the standard algorithm, but of order n instead of 2n. Although Flatau stated that an 
extension of this algorithm to the general 3D case is straightforward, it is at least not trivial 
and probably its complexity will scale the same as standard methods. 
4.5 Fast multipole method 
The fast multipole method (FMM) was developed by Greengard and Rokhlin [126] for 
efficient evaluation of the potential and force fields in N-body simulations where all pairwise 
interactions of N particles are computed. The FMM is based on truncated potential expansions 
[127]. It is also called a hierarchical tree method because particles are grouped together in a 
hierarchical way, and the interaction between single particles and this hierarchy of particle 
groups is calculated [128]. However, some researchers distinguish between single- and 
multilevel FMM [129,130]; only the latter is truly hierarchical. The FMM naturally fits the 
DDA, since the matrix-vector multiplication is actually computing the total field on each 
single dipole due to all other dipoles, as was noted by Hoekstra and Sloot [128]. The 
computational complexity of the FMM (see below) is similar to FFT-based methods (see 
Subsection 4.4), but it does not require any regularity of the grid, thus making it applicable to 
any scatterer. The drawback is that the FMM is conceptually more complex, making it much 
harder to code. Nonetheless, the FMM was implemented in the DDA by Rahola [9,127]. 
Error analysis is critical for the FMM, since the acceleration is obtained by using 
approximations, in contrast to exact FFT-based methods. Approximation parameters are 
chosen to keep an error, calculated according to some estimate, in certain bounds. The more 
exact the error estimate is, the less computations are required; thus, the faster the whole 
algorithm. Therefore, algorithm complexity is directly connected to error analysis [131]. 
Koc and Chew [129] described the application of multilevel FMM to the DDA. They 
used semi-empirical formulae to determine the number of terms in multipole series, and 
obtained O(N) complexity. However, rigorous, close to exact, error analysis is still lacking for 
the FMM applied to the DDA. It will allow obtaining a real algorithm complexity with 
guaranteed accuracy. Such an analysis has been conducted for 2D acoustic scattering [130], 
and for light scattering formulated in terms of surface integrals [131]. In both cases the FMM 
was proven to have an asymptotic complexity O(Nln2(N)). Application of the FMM to 
surface-integrals formulation of light scattering was reviewed by Dembart and Yip [132]. 
Another problem of implementing the FMM is that it is completely dependent upon the 
exact form of the interaction potential ijG . All manuscripts mentioned above deal with 
interaction between point dipoles, i.e. Eq. (14). If a more complex expression for ijG  is used 
(e.g. IT), most of the FMM should be developed anew. This makes integration of the FMM 
and the DDA a formidable problem. 
The FMM is a promising method to calculate light scattering by particles that cannot be 
mapped effectively on a rectangular grid; however, there is still space for improving its theory 
to make it more robust and guarantee certain accuracy. 
The FMM is not the only hierarchical tree method available. For instance, a very 
intuitively simple method was proposed by Barnes and Hut [133,134]. Multipole expansions 
over the center of mass in gravitational computations are used, contrary to geometrical center 
in the FMM. It automatically eliminates the second term in the multipole expansion, and 
allows fast evaluation of monopole terms. Though this method is much simpler and clearer 
than the FMM, it has very little control over the errors that can be studied almost exclusively 
empirically. It can be applied to the DDA without significant increase in the total 
computational errors.7
An alternative approach was proposed by Ding and Tsang [135]. They studied scattering 
from trees and used a sparse matrix iterative approach. The interaction matrix is divided into a 
strong part, which accounts for interaction between nearby dipoles, and a complement weak 
part: . The strong part is sparse and therefore allows quick solution of the linear 
system. The weak part is a small correction that is accounted for iteratively: 
ws AAA +=
yxA =)0(s , . )(w)1(s ll xAyxA −=+ (89)
The authors demonstrate potential of this approach for some test cases. 
4.6 Orientation averaging and repeated calculations 
In many physical applications, one is interested in optical properties of an ensemble of 
randomly oriented particles. When the concentration of particles is small, multiple scattering 
is negligible and the optical properties are obtained by averaging single-particle scattering 
over different particle orientations. More general problems, where particles are not identical 
or multiple scattering is significant, are not considered here. 
Orientation averaging of any scattering property can be described as the integral over 
the Euler’s orientation angles (including a probability distribution function if necessary), 
                                                 
7 Hoekstra AG, unpublished results 
which is brought down to a sum by appropriate quadrature. The problem therefore consists in 
calculation of some scattering property for a set of different orientations of the same particle. 
The easiest way is to calculate it by solving sequentially and independently each problem 
from the set. However, the large size of this set calls for some means of reducing the 
calculations. This is especially relevant when the particle is asymmetric; hence, its optical 
properties are sensitive to particle orientation. Let us further assume for clarity that we are 
interested in the scattering matrix at a certain scattering angle. All the discussion for other 
scattering properties is analogous or even simpler. 
Singham et al. [136] noted that the set of problems described above is physically 
equivalent to a fixed orientation of the particle and different incident and scattering directions. 
The latter are determined by transformation of the laboratory reference frame to the reference 
frame associated with the particle. The amplitude scattering matrix, and hence the Mueller 
matrix, also is transformed along with the reference frame (see e.g. [137] for transformation 
formulae). There are two immediate advantages of using a fixed particle orientation. First, A 
is kept constant (see though discussion below), and therefore the construction of A is done 
only once. Second, the amplitude matrix for any scattering angle is quickly obtained after the 
linear system is solved (for two incident polarizations). Hence, integration over one Euler 
angle is relatively fast. 
The constancy of A can be exploited to further reduce the time of orientation-averaging. 
If  or its LU decomposition is obtained [75,136], a single solution for any right part y can 
be obtained in n
2 operations – the same or less time than required for one iteration using 
general iterative methods (see Subsection 4.1). Moreover, Singham et al. [136] and McClain 
and Ghoul [138] independently proposed an analytical way of averaging the scattering matrix 
at any scattering angle, which requires O(n2) operations once  is known. Khlebtsov [139] 
extended this technique to averaging of extinction and absorption cross sections. 
However, by employing special properties of the matrix A in the DDA allows 
computing matrix-vector products in O(nln(n)) operations (see subsections 4.4, 4.5). Although 
some acceleration of direct methods also can be performed (see Subsection 4.3), they are still 
O(n2) or slower. For large n, iterative methods (assuming that they converge in much less than 
n iterations) are clearly preferable, even if many quadrature points are used. Moreover, large n 
is unattainable by direct methods because of storage requirements. Another improvement 
could be using a heavy preconditioner, which has large initialization cost and greatly 
increases convergence rate. Initialization cost is then justified because it is computed only 
once. Possible candidates are incomplete factorization preconditioners [107]. 
Above it was stated that A is constant for a fixed-orientation particle. However, modern 
DDA formulations (e.g. LDR) take into account the direction of light incidence. Hence A 
depends upon this direction, but only weakly through ( )( )2O kd  corrections. This complicates 
the techniques described above, however probably they still may be used together with some 
special methods to correct for small changes in A on every step. Such methods have not been 
developed as yet. 
Another possibility to perform orientation averaging is to first compute the T-matrix of 
the particle, which then allows analytical averaging [140]. The T-matrix formalism is based 
on the multipole expansion which is truncated at some order N0. Although N0 is hard to 
deduce a priori, usually it is several times x [141,142]. The number of rows in the T-matrix 
equals . The simplest way to evaluate the T-matrix based on the DDA is to solve 
for every incident spherical wave (i.e. for each row of the T-matrix) independently [141]. 
Then the above discussion about optimizing this repeated calculation is relevant. Using 
iterative techniques with N
)2(2 00 +NN
iter number of iterations, computation time is 
( ) ( )[ ]20iter20 O)ln(O NNNNN + , where the first term in the sum is the time for solving the linear 
system, and the second one is the actual computation of the values in the row of the T-matrix. 
A new method to obtain the T-matrix from the DDA interaction matrix was proposed by 
Mackowski [141]. This requires two summations with computational time ( ))ln(O 20 NNN  and 
( )NN 40O . Mackowski showed that for 5=x  his method is an order of magnitude faster than 
the straightforward one. 
Recently Muinonen and Zubko [143] have proposed a way to optimize ensemble 
averaging of DDA results over different sizes and refractive indices. It is based on calculating 
a “good guess” for the initial vector in the iterative solver using results of the calculations 
with similar parameters. Similar ideas can be used to optimize simulation of a set of slightly 
different shapes or orientational averaging. 
Use of repeated calculations to increase the accuracy of DDA simulations was proposed 
recently by Yurkin et al. [63]. Several independent simulations with different discretization 
parameter were performed and results were extrapolated to the infinite discretization giving 
better accuracy than those of a single DDA simulation. 
5 Comparison of the DDA to other methods 
Hovenier et al. [144] compared the DDA, the extended boundary condition method (EBCM), 
and the separation of variables method (SVM) for calculations of scattering by spheroids, 
finite cylinders and bispheres. Parameters of the problems were as follows: , 
equivolume size parameter , 
i01.05.1 +=m
5=x 6.0=y . The angular dependencies of scattering matrix 
elements were calculated. The EBCM and SVM seemed to achieve an exact solution, and the 
DDA showed little errors, except for backscattering angles, where they were up to 10-20%. 
Wriedt and Comberg [145] compared the DDA, EBCM, and finite difference time 
domain (FDTD) method for a cube with 33.1=m , 1.5 and 9.2=x , 4.9, 9.7. For  and 
4.9. The DDA and EBCM achieved good accuracy in calculation of scattering intensity angle 
dependence; the DDA was 2-5 times faster, but consumed 8-16 times more memory (y was in 
the range 0.3-0.5). The FDTD had similar computational requirements as the DDA but was 
less accurate. For  the DDA was the only one to achieve little errors within the given 
computational resources. 
9.2=x
7.9=x
Comberg and Wriedt [146] compared the DDA, GMM (see Subsection 3.3) and the 
generalized multipole technique (GMT) for clusters of a few spheres. A single sphere had x in 
the range 4–20 and 33.1=m , 1.5. All the methods managed to achieve good accuracy, but the 
GMM was one order of magnitude (and for large x even several orders) faster than the other 
two. The DDA and GMT also were used to compute scattering by a cluster of two oblate 
spheroids with  and . The DDA was less accurate and consumed 4 times more 
memory, but was 6 times faster than the GMT. 
5=x 33.1=m
Wriedt et al. [147] compared the DDA, FDTD, GMT, and discrete sources method 
(DSM) for the calculation of light scattering by a red blood cell (RBC) with  and 
. Accuracy of all methods was similar. The DDA and GMT showed similar 
calculation times; they were 7 times faster than the FDTD and 12 times slower than the DSM. 
It should be noted that the latter employed the axisymmetric property of RBC. 
06.1=m
Recently Yurkin et al. [58] systematically compared the DDA and the FDTD for 
spheres with m from 1.02 to 2 and x from 10 to 100, depending on m. It was shown that 
numerical performance of the DDA is much more sensitive to the refractive index than that of 
the FDTD. Therefore, the DDA is preferable for small m, the FDTD for larger m. Cleary, the 
crossover point is not well defined and will depend on the details of the problem at hand as 
well as on the particular implementations of both methods. 
The main advantage of the DDA is that it is one of the most general methods, having a 
very broad range of applicability, limited only by available computational power. The reverse 
of this advantage is that it has almost no means to use the symmetry of the scatterer. Thus the 
DDA is not able to compete with the EBCM for homogenous axisymmetric scatterers. For 
homogenous non-axisymmetric scatterers the DDA is competitive with the EBCM for single-
particle orientation, but the latter allows much faster orientation averaging. The EBCM has 
little applicability to inhomogeneous scatterers, where the DDA can be applied without any 
changes. Comparison between the FDTD and the DDA suggests that the DDA is more 
suitable for small m. It also should be noted that the FDTD is even more general, being easily 
applicable to non-harmonic incident electric fields. Moreover, simulation of one pulse 
incident wave with the FDTD gives the solution for a complete spectrum of incident harmonic 
plane waves, but with a limitation on accuracy. 
6 Concluding remarks 
The DDA has been reviewed using a general framework based on the integral equation for the 
electric field. Although mainstream DDA algorithms as used in several production computer 
programs, has not changed significantly since 1994, many different improvements have been 
proposed since that time. Some of them do improve the accuracy or numerical performance of 
the DDA; however, they still wait for a wide acceptance. It seems that a critical mass of new 
improvements is building up, hopefully resulting in a next breakthrough in the field of the 
DDA. 
In our opinion, future major improvements in the DDA computer implementations will 
be connected with one of the following: 
1) Decreasing shape errors by implementing WD or similar techniques. 
2) Improving polarizability and interaction terms by techniques that are still to be 
developed similar to IT and PEL. 
3) Studying different preconditioners for the DDA interaction matrix, either trying some of 
the known types or developing one considering the special structure of the matrix. 
Item (1) should improve the overall accuracy of the DDA, especially for cases where shape 
errors are dominant, item (2) should expand the DDA applicability region to higher refractive 
indices, and item (3) should boost overall performance, especially for large size parameters 
and/or refractive indices. 
Acknowledgements 
We thank Dan Mackowski for clarifying discussion on the simulations of scattering by 
clusters of spheres and Gorden Videen for critically reading the manuscript and for valuable 
discussions. Our research is supported by Siberian Branch of the Russian Academy of 
Sciences through the grant 2006-03. 
Appendix. Description of used acronyms and symbols 
See Tables A1 and A2. 
Table A1. Acronyms in alphabetical order. 
Acronym Description Sectiona
(L) left Jacobi preconditioner 4.1
I right Jacobi preconditioner 4.1
a1-term (M) dipole term in the Mie theory 3.1
Bi-CGSTAB Bi-CG stabilized 4.1
BT block-Toeplitz 4.3
CEMD coupled electric and magnetic dipole 3.4
CG conjugate gradient 4.1
CGNR CG applied to normalized equation with minimization of residual norm 4.1
CGS CG squared 4.1
CS complex symmetric 4.1
CLDR corrected LDR 3.1
CM Clausius-Mossotti 2
DDA discrete dipole approximation 1
DGF digitized Green’s function 1
DSM discrete sources method 5
EBCM extended boundary condition method 5
FCD filtered coupled dipoles 3.1
FDTD finite difference time domain 5
FFT fast Fourier transform 4.4
FMM fast multipole method 4.5
GMM generalized multiparticle Mie solution 3.3
GMRES generalized minimal residual 4.1
GMT generalized multipole technique 5
GPBi-CG generalized product-type methods based on Bi-CG 4.1
IT integration of Green’s tensor 3.1
LAK Lakhtakia 3.1
LC Lou and Charalampopoulos 3.3
LDR lattice dispersion relation 3.1
MBT multilevel BT 4.3
MoM method of moments 3.1
PEL Peltoniemi 3.1
PP Purcell and Pennypacker 1
QMR quasi-minimal residual 4.1
RBC red blood cell 5
RCB Rahmani, Chaumet, and Bryant 3.1
RDG Rayleigh-Debye-Gans approximation 4.2
RR radiative reaction correction 3.1
SCLDR surface-corrected LDR 3.1
SOF scattering order formulation 4.2
SVM separation of variables method 5
TFQMR transpose free QMR 4.1
WD weighted discretization 3.1
a where it is explained or first appears (if no explanation is given). 
Table A2. Symbols used, Latin and Greek letters in alphabetical order.a
Symbols Description Section
(0) superscript: approximate value (usually under constant field assumption) 2
(n) superscript: after n-th iteration 4.2
>< θcos  asymmetry parameter 2
* superscript: complex conjugate 2
A a matrix 3.1
a kk  2
a radius of (equivalent) sphere 3.1
B correction matrix in SCLDR 3.1, Eq. (64)
b1 – b3 numerical coefficients in polarization prescriptions 3.1
C  tensor of electrostatic solution 3.1, Eq. (60)
Csca, Cabs, Cext scattering, absorption, extinction cross section 2
c speed of light in vacuum 2
d size of a cubical cell 2
E, Einc, Eexc, Eself, Esca (total) electric field, incident, exciting, self-induced, scattered 2
e0 polarization vector of the incident wave, 10 =e  2
e superscript: effective 3.1
F scattering amplitude 2
f a function; 
volume filling factor 
G  free space dyadic Green’s function (tensor) 2
2sG  G  in static limit 
ijG  interaction term 
H superscript: conjugate transpose 3.1
hr impulse response function of a filter 3.1
2, 4.1I , I identity dyadic (tensor), operator (matrix) 
i, j subscript: vector indices 4.3
i imaginary unity 2
i, j subscript: number of the dipole 2
K order of a BT matrix 4.3
k free space wave vector 2
L  self-term dyadic 2
M integral associated with finiteness of V0; 
preconditioner 
M  dyadic associated with M 2
m refractive index (relative) 3.1
N total number of dipoles 2
n rr  2
n size of a matrix 4.1
n′ˆ  external normal to the surface 2
nx, ny, nz sizes of the rectangular lattice 4.3
P polarization 2
p superscript: principal 3.1
q dπ2  3.1
R rr ′−  2
R0 radius of the smallest sphere circumscribing the scatterer 4.2
2r, r′ radius-vectors 
S LDR coefficient dependent on incident polarization 3.1, Eq. (49)
Si amplitude matrix element 3.2
Sij Mueller matrix element 3.2
s superscript: secondary; 
strong; 
subscript: equivalent spherical dipole 
T  boundary condition tensor 3.1, Eq. (66) 
t time 2
V volume of the scatterer 2
V0 exclusion volume 2
w superscript: weak 4.5
x unknown vector 4.1
Table A2 (continued) 
Symbols Description Section
x size parameter of scatterer 3.2
x, y, z Cartesian coordinates 4.3
y a known vector (right side of a linear system) 4.1
y kdm ||  3.2
yRe kdm)Re(  3.2
2α, α  polarizability, tensor 
4.1γ optimal reduction factor 
3.1δ Kronecker symbol 
2ε electric permittivity (relative) 
3.3η correction function 
Λ intermediate tensor in RCB method 3.1, Eq. (62)
4.2Λ linear integral operator, its matrix 
μ, ν, ρ, τ, … sub-, superscript: Cartesian components of vectors (tensors) 2
ξ, ψ Riccati-Bessel functions 3.1
2χ electric susceptibility 
3.2Ψ mean relative error of far-field electric field 
2Ω solid angle 
2ω circular frequency of the harmonic electric field 
a common sub- and superscripts are given on their own. For all vectors – the same symbol but in italic (instead of 
bold) denotes Euclidian norm of the vector (except unitary vectors). 
References 
 [1]  Purcell EM, Pennypacker CR. Scattering and adsorption of light by nonspherical dielectric grains. 
Astrophys J 1973;186:705-714. 
 [2]  Draine BT. The discrete-dipole approximation and its application to interstellar graphite grains. 
Astrophys J 1988;333:848-872. 
 [3]  Draine BT, Goodman JJ. Beyond clausius-mossotti - wave-propagation on a polarizable point lattice and 
the discrete dipole approximation. Astrophys J 1993;405:685-697. 
 [4]  Draine BT, Flatau PJ. Discrete-dipole approximation for scattering calculations. J Opt Soc Am A 
1994;11:1491-1499. 
 [5]  Draine BT. The discrete dipole approximation for light scattering by irregular targets. In: Mishchenko 
MI, Hovenier, JW, Travis, LD, editors. Light Scattering by Nonspherical Particles, Theory, 
Measurements, and Applications. New York: Academic Press, 2000. p. 131-145. 
 [6]  Draine BT, Flatau PJ. User guide for the discrete dipole approximation code DDSCAT 6.1. 
http://xxx.arxiv.org/abs/astro-ph/0409262, 2004. 
 [7]  Goedecke GH, O'Brien SG. Scattering by irregular inhomogeneous particles via the digitized Green's 
function algorithm. Appl Opt 1988;27:2431-2438. 
 [8]  Lakhtakia A. Strong and weak forms of the method of moments and the coupled dipole method for 
scattering of time-harmonic electromagnetic-fields. Int J Mod Phys C 1992;3:583-603. 
 [9]  Rahola J. Solution of dense systems of linear equations in the discrete-dipole approximation. SIAM J Sci 
Comp 1996;17:78-89. 
 [10]  Piller NB. Coupled-dipole approximation for high permittivity materials. Opt Comm 1999;160:10-14. 
 [11]  Chaumet PC, Sentenac A, Rahmani A. Coupled dipole method for scatterers with large permittivity. 
Phys Rev E 2004;70:036606. 
 [12]  Singham SB, Bohren CF. Light scattering by an arbitrary particle: a physical reformulation of the 
coupled dipole method. Opt Lett 1987;12:10-12. 
 [13]  Piller NB. Influence of the edge meshes on the accuracy of the coupled-dipole approximation. Opt Lett 
1997;22:1674-1676. 
 [14]  Hage JI, Greenberg JM, Wang RT. Scattering from arbitrarily shaped particles - theory and experiment. 
Appl Opt 1991;30:1141-1152. 
 [15]  Kahnert FM. Numerical methods in electromagnetic scattering theory. J Quant Spectrosc Radiat Transf 
2003;79:775-824. 
 [16]  Peterson AW, Ray SL, Mittra R. Computational Methods of Electromagnetic Scattering. IEEE Press, 
1998. 
 [17]  Kim OS, Meincke P, Breinbjerg O, Jorgensen E. Method of moments solution of volume integral 
equations using higher-order hierarchical Legendre basis functions. Radio Science 2004;39. 
 [18]  Lu CC. A fast algorithm based on volume integral equation for analysis of arbitrarily shaped dielectric 
radomes. IEEE Trans Ant Propag 2003;51:606-612. 
http://xxx.arxiv.org/abs/astro-ph/0409262,
 [19]  Ivakhnenko V, Eremin Y. Light scattering by needle-type and disk-type particles. J Quant Spectrosc 
Radiat Transf 2006;100:165-172. 
 [20]  Wriedt T. A review of elastic light scattering theories. Part Part Sys Charact 1998;15:67-74. 
 [21]  Chiappetta P, Torresani B. Some approximate methods for computing electromagnetic fields scattered by 
complex objects. Meas Sci Technol 1998;9:171-182. 
 [22]  Mishchenko MI, Travis LD, Lacis AA. Scattering, Absorption, and Emission of Light by Small Particles. 
Cambridge: Cambridge University Press, 2002. 
 [23]  Tsang L, Kong JA, Ding KH, Ao CO. Scattering of Electromagnetic Waves: Numerical Simulations. 
New York: Wiley, 2001. 
 [24]  Jones AR. Light scattering for particle characterization. Prog Ener Comb Sci 1999;25:1-53. 
 [25]  Yanghjian AD. Electric dyadic Green's function in the source region. IEEE Proc 1980;68:248-263. 
 [26]  Peltoniemi JI. Variational volume integral equation method for electromagnetic scattering by irregular 
grains. J Quant Spectrosc Radiat Transf 1996;55:637-647. 
 [27]  Bohren CF, Huffman DR. Absorption and scattering of Light by Small Particles. New York: Wiley, 
1983. 
 [28]  Draine BT, Weingartner JC. Radiative torques on interstellar grains .1. Superthermal spin-up. Astrophys 
J 1996;470:551-565. 
 [29]  Hoekstra AG, Frijlink M, Waters LBFM, Sloot PMA. Radiation forces in the discrete-dipole 
approximation. J Opt Soc Am A 2001;18:1944-1953. 
 [30]  Chaumet PC, Rahmani A, Sentenac A, Bryant GW. Efficient computation of optical forces with the 
coupled dipole method. Phys Rev E 2005;72:046708. 
 [31]  Hoekstra AG. Computer simulations of elastic light scattering. PhD thesis. University of Amsterdam, 
Amsterdam, 1994. 
 [32]  Yurkin MA, Maltsev VP, Hoekstra AG. Convergence of the discrete dipole approximation. I. Theoretical 
analysis. J Opt Soc Am A 2006;23:2578-2591. 
 [33]  Jackson JD. Classical Electrodynamics. New York: Wiley, 1975. 
 [34]  Iskander MF, Chen HY, Penner JE. Optical-scattering and absorption by branched chains of aerosols. 
Appl Opt 1989;28:3083-3091. 
 [35]  Hage JI, Greenberg JM. A model for the optical-properties of porous grains. Astrophys J 1990;361:251-
259. 
 [36]  Livesay DE, Chen KM. Electromagnetic fields induced inside arbitrarily shaped biological bodies. IEEE 
Trans Microw Theory Tech 1974;22:1273-1280. 
 [37]  Lakhtakia A, Mulholland GW. On 2 numerical techniques for light-scattering by dielectric agglomerated 
structures. J Res Nat Inst Stand Technol 1993;98:699-716. 
 [38]  Dungey CE, Bohren CF. Light-scattering by nonspherical particles - a refinement to the coupled-dipole 
method. J Opt Soc Am A 1991;8:81-87. 
 [39]  Doyle WT. Optical properties of a suspension of metal spheres. Phys Rev B 1989;39:9852-9858. 
 [40]  Lumme K, Rahola J. Light-scattering by porous dust particles in the discrete-dipole approximation. 
Astrophys J 1994;425:653-667. 
 [41]  van de Hulst HC. Light Scattering by Small Particles. New York: Dover, 1981. 
 [42]  Okamoto H. Light scattering by clusters: the a1-term method. Opt Rev 1995;2:407-412. 
 [43]  Gutkowicz-Krusin D, Draine BT. Propagation of electromagnetic waves on a rectangular lattice of 
polarizable points. http://xxx.arxiv.org/abs/astro-ph/0403082, 2004. 
 [44]  Piller NB, Martin OJF. Increasing the performance of the coupled-dipole approximation: A spectral 
approach. IEEE Trans Ant Propag 1998;46:1126-1137. 
 [45]  Gay-Balmaz P, Martin OJF. A library for computing the filtered and non-filtered 3D Green's tensor 
associated with infinite homogeneous space and surfaces. Comp Phys Comm 2002;144:111-120. 
 [46]  Rahmani A, Chaumet PC, Bryant GW. Coupled dipole method with an exact long-wavelength limit and 
improved accuracy at finite frequencies. Opt Lett 2002;27:2118-2120. 
 [47]  Collinge MJ, Draine BT. Discrete-dipole approximation with polarizabilities that account for both finite 
wavelength and target geometry. J Opt Soc Am A 2004;21:2023-2028. 
 [48]  Rahmani A, Chaumet PC, Bryant GW. On the importance of local-field corrections for polarizable 
particles on a finite lattice: Application to the discrete dipole approximation. Astrophys J 2004;607:873-
878. 
 [49]  Evans KF, Stephens GL. Microwave radiative-transfer through clouds composed of realistically shaped 
ice crystals .1. Single scattering properties. J Atmos Sci 1995;52:2041-2057. 
 [50]  Flatau PJ, Fuller KA, Mackowski DW. Scattering by 2 spheres in contact - comparisons between 
discrete-dipole approximation and modal-analysis. Appl Opt 1993;32:3302-3305. 
 [51]  Xu YL, Gustafson BAS. Comparison between multisphere light-scattering calculations: Rigorous 
solution and discrete-dipole approximation. Astrophys J 1999;513:894-909. 
 [52]  Ku JC. Comparisons of coupled-dipole solutions and dipole refractive-indexes for light-scattering and 
absorption by arbitrarily shaped or agglomerated particles. J Opt Soc Am A 1993;10:336-342. 
http://xxx.arxiv.org/abs/astro-ph/0403082,
 [53]  Andersen AC, Mutschke H, Posch T, Min M, Tamanai A. Infrared extinction by homogeneous particle 
aggregates of SiC, FeO and SiO2: Comparison of different theoretical approaches. J Quant Spectrosc 
Radiat Transf 2006;100:4-15. 
 [54]  Singham SB. Theoretical factors in modeling polarized light scattering by arbitrary particles. Appl Opt 
1989;28:5058-5064. 
 [55]  Hoekstra AG, Sloot PMA. Dipolar unit size in coupled-dipole calculations of the scattering matrix-
elements. Opt Lett 1993;18:1211-1213. 
 [56]  Hoekstra AG, Rahola J, Sloot PMA. Accuracy of internal fields in volume integral equation simulations 
of light scattering. Appl Opt 1998;37:8482-8497. 
 [57]  Druger SD, Bronk BV. Internal and scattered electric fields in the discrete dipole approximation. J Opt 
Soc Am B 1999;16:2239-2246. 
 [58]  Yurkin MA, Brock RS, Lu JQ, Hoekstra AG. Systematic comparison of the discrete dipole 
approximation and the finite difference time domain method. (in preparation) 
 [59]  Okamoto H, Macke A, Quante M, Raschke E. Modeling of backscattering by non-spherical ice particles 
for the interpretation of cloud radar signals at 94 GHz. An error analysis. Contrib Atmos Phys 
1995;68:319-334. 
 [60]  Liu CL, Illingworth AJ. Error analysis of backscatter from discrete dipole approximation for different ice 
particle shapes. Atmos Res 1997;44:231-241. 
 [61]  Lemke H, Okamoto H, Quante M. Comment on error analysis of backscatter from discrete dipole 
approximation for different ice particle shapes [ Liu, C.-L., Illingworth, A.J., 1997, Atmos. Res. 44, 231-
241.]. Atmos Res 1998;49:189-197. 
 [62]  Liu CL, Illingworth AJ. Reply to comment by Lemke, Okamoto and Quante on 'Error analysis of 
backscatter from discrete dipole approximation for different ice particle shapes'. Atmos Res 1999;50:1-2. 
 [63]  Yurkin MA, Maltsev VP, Hoekstra AG. Convergence of the discrete dipole approximation. II. An 
extrapolation technique to increase the accuracy. J Opt Soc Am A 2006;23:2592-2601. 
 [64]  Fuller KA, Mackowski DW. Electromagnetic scattering by compounded spherical particles. In: 
Mishchenko MI, Hovenier, JW, Travis, LD, editors. Light Scattering by Nonspherical Particles, Theory, 
Measurements, and Applications. New York: Academic Press, 2000. p. 223-272. 
 [65]  Xu YL. Scattering Mueller matrix of an ensemble of variously shaped small particles. J Opt Soc Am A 
2003;20:2093-2105. 
 [66]  Mackowski DW. Electrostatics analysis of radiative absorption by sphere clusters in the rayleigh limit - 
application to soot particles. Appl Opt 1995;34:3535-3545. 
 [67]  Mackowski DW. Calculation of total cross-sections of multiple-sphere clusters. J Opt Soc Am A 
1994;11:2851-2861. 
 [68]  Ngo D, Videen G, Dalling R. Chaotic light scattering from a system of osculating, conducting spheres. 
Physics Letters A 1997;227:197-202. 
 [69]  Markel VA, Pustovit VN, Karpov SV, Obuschenko AV, Gerasimov VS, Isaev IL. Electromagnetic 
density of states and absorption of radiation by aggregates of nanospheres with multipole interactions. 
Phys Rev B 2004;70:054202. 
 [70]  Kim HY, Sofo JO, Velegol D, Cole MW, Mukhopadhyay G. Static polarizabilities of dielectric 
nanoclusters. Phys Rev A 2005;72:053201. 
 [71]  Jones AR. Electromagnetic wave scattering by assemblies of particles in the Rayleigh approximation. 
Proc R Soc London A 1979;366:111-127. 
 [72]  Jones AR. Scattering efficiency factors for agglomerates for small spheres. J Phys D 1979;12:1661-1672. 
 [73]  Kozasa T, Blum J, Mukai T. Optical-properties of dust aggregates .1. Wavelength dependence. Astron 
Astrophys 1992;263:423-432. 
 [74]  Kozasa T, Blum J, Okamoto H, Mukai T. Optical-properties of dust aggregates .2. Angular-dependence 
of scattered-light. Astron Astrophys 1993;276:278-288. 
 [75]  Lou WJ, Charalampopoulos TT. On the electromagnetic scattering and absorption of agglomerated small 
spherical-particles. J Phys D 1994;27:2258-2270. 
 [76]  Markel VA, Shalaev VM, Stechel EB, Kim W, Armstrong RL. Small-particle composites .1. Linear 
optical properties. Phys Rev B 1996;53:2425-2436. 
 [77]  Pustovit VN, Sotelo JA, Niklasson GA. Coupled multipolar interactions in small-particle metallic 
clusters. J Opt Soc Am A 2002;19:513-518. 
 [78]  Lumme K, Rahola J, Hovenier JW. Light scattering by dense clusters of spheres. Icarus 1997;126:455-
469. 
 [79]  Kimura H, Mann I. Light scattering by large clusters of dipoles as an analog for cometary dust 
aggregates. J Quant Spectrosc Radiat Transf 2004;89:155-164. 
 [80]  Hull P, Shepherd I, Hunt A. Modeling light scattering from Diesel soot particles. Appl Opt 
2004;43:3433-3441. 
 [81]  Venizelos DT, Lou WJ, Charalampopoulos TT. Development of an algorithm for the calculation of the 
scattering properties of agglomerates. Appl Opt 1996;35:542-548. 
 [82]  Voshchinnikov NV, Il'in VB, Henning T. Modelling the optical properties of composite and porous 
interstellar grains. Astron Astrophys 2005;429:371-381. 
 [83]  Kohler M, Kimura H, Mann I. Applicability of the discrete-dipole approximation to light-scattering 
simulations of large cosmic dust aggregates. Astron Astrophys 2006;448:395-399. 
 [84]  Zubko E, Petrov D, Shkuratov Y, Videen G. Discrete dipole approximation simulations of scattering by 
particles with hierarchical structure. Appl Opt 2005;44:6479-6485. 
 [85]  Bourrely C, Chiappetta P, Lemaire TJ, Torresani B. Multidipole formulation of the coupled dipole 
method for electromagnetic scattering by an arbitrary particle. J Opt Soc Am A 1992;9:1336-1340. 
 [86]  Rouleau F, Martin PG. A new method to calculate the extinction properties of irregularly shaped 
particles. Astrophys J 1993;414:803-814. 
 [87]  Mulholland GW, Bohren CF, Fuller KA. Light-scattering by agglomerates - coupled electric and 
magnetic dipole method. Langmuir 1994;10:2533-2546. 
 [88]  Lemaire TJ. Coupled-multipole formulation for the treatment of electromagnetic scattering by a small 
dielectric particle of arbitrary shape. J Opt Soc Am A 1997;14:470-474. 
 [89]  Lakhtakia A. General-theory of the purcell-pennypacker scattering approach and its extension to 
bianisotropic scatterers. Astrophys J 1992;394:494-499. 
 [90]  Loiko VA, Molochko VI. Polymer dispersed liquid crystal droplets: Methods of calculation of optical 
characteristics. Liq Crys 1998;25:603-612. 
 [91]  Smith DA, Stokes KL. Discrete dipole approximation for magneto-optical scattering calculations. Opt 
Expr 2006;14:5746-5754. 
 [92]  Su CC. Electromagnetic scattering by a dielectric body with arbitrary inhomogeneity and anisotropy. 
IEEE Trans Ant Propag 1989;37:384-389. 
 [93]  Chen RS, Fan ZH, Yung EKN. Analysis of electromagnetic scattering of three-dimensional dielectric 
bodies using Krylov subspace FFT iterative methods. Microwave Opt Tech Lett 2003;39:261-267. 
 [94]  Khlebtsov NG. An approximate method for calculating scattering and absorption of light by fractal 
aggregates. Opt Spec 2000;88:594-601. 
 [95]  Markel VA. Coupled-dipole approach to scattering of light from a one-dimensional periodic dipole 
structure. J Mod Opt 1993;40:2281-2291. 
 [96]  Chaumet PC, Rahmani A, Bryant GW. Generalization of the coupled dipole method to periodic 
structures. Phys Rev B 2003;67:165404. 
 [97]  Chaumet PC, Sentenac A. Numerical simulations of the electromagnetic field scattered by defects in a 
double-periodic structure. Phys Rev B 2005;72:205437. 
 [98]  Martin OJF. Efficient scattering calculations in complex backgrounds. AEU-Int J Electr Comm 
2004;58:93-99. 
 [99]  Yang WH, Schatz GC, Vanduyne RP. Discrete dipole approximation for calculating extinction and 
raman intensities for small particles with arbitrary shapes. J Chem Phys 1995;103:869-875. 
 [100]  Lemaire TJ, Bassrei A. Three-dimensional reconstruction of dielectric objects by the coupled-dipole 
method. Appl Opt 2000;39:1272-1278. 
 [101]  Belkebir K, Chaumet PC, Sentenac A. Superresolution in total internal reflection tomography. J Opt Soc 
Am A 2005;22:1889-1897. 
 [102]  Chaumet PC, Belkebir K, Sentenac A. Three-dimensional subwavelength optical imaging using the 
coupled dipole method. Phys Rev B 2004;69:245405. 
 [103]  Chaumet PC, Belkebir K, Lencrerot R. Three-dimensional optical imaging in layered media. Opt Expr 
2006;14:3415-3426. 
 [104]  Zubko E, Shkuratov Y, Videen G. Discrete-dipole analysis of backscatter features of agglomerated 
debris particles comparable in size with wavelength. J Quant Spectrosc Radiat Transf 2006;100:483-488. 
 [105]  Penttila A, Zubko E, Lumme K, Muinonen K, Yurkin MA, Draine BT, Rahola J, Hoekstra AG, 
Shkuratov Y. Comparison between discrete dipole implementations and exact techniques. J Quant 
Spectrosc Radiat Transf 2007, doi:10.1016/j.jqsrt.2007.01.26. 
 [106]  Press WH, Flannery BP, Teukolsky SA, Vetterling WT. Numerical Recipes in C. The Art of Scientific 
Computing. New York: Cambridge University Press, 1990. 
 [107]  Barrett R, Berry M, Chan TF, Demmel J, Donato J, Dongarra J, Eijkhout V, Pozo R, Romine C, van der 
Vorst HA. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, 
1994. 
 [108]  Hoekstra AG, Grimminck MD, Sloot PMA. Large scale simulations of elastic light scattering by a fast 
discrete dipole approximation. Int J Mod Phys C 1998;9:87-102. 
 [109]  Zhang SL. GPBi-CG: Generalized product-type methods based on Bi-CG for solving nonsymmetric 
linear systems. SIAM J Sci Comp 1997;18:537-551. 
 [110]  Freund RW. Conjugate gradient-type methods for linear-systems with complex symmetrical coefficient 
matrices. SIAM J Sci Stat Comp 1992;13:425-448. 
 [111]  Flatau PJ. Improvements in the discrete-dipole approximation method of computing scattering and 
absorption. Opt Lett 1997;22:1205-1207. 
 [112]  Fan ZH, Wang DX, Chen RS, Yung EKN. The application of iterative solvers in discrete dipole 
approximation method for computing electromagnetic scattering. Microwave Opt Tech Lett 
2006;48:1741-1746. 
 [113]  Yurkin MA, Maltsev VP, Hoekstra AG. The discrete dipole approximation for simulation of light 
scattering by particles much larger than the wavelength. J Quant Spectrosc Radiat Transf 2007, 
doi:10.1016/j.jqsrt.2007.01.33. 
 [114]  Rahola J. On the eigenvalues of the volume integral operator of electromagnetic scattering. SIAM J Sci 
Comp 2000;21:1740-1754. 
 [115]  Budko NV, Samokhin AB. Spectrum of the volume integral operator of electromagnetic scattering. 
SIAM J Sci Comp 2006;28:682-700. 
 [116]  Budko NV, Samokhin AB, Samokhin AA. A generalized overrelaxation method for solving singular 
volume integral equations in low-frequency scattering problems. Differ Eq 2005;41:1262-1266. 
 [117]  Hoekstra AG, Sloot PMA. Coupled dipole simulations of elastic light scattering on parallel systems. Int J 
Mod Phys C 1995;6:663-679. 
 [118]  Acquista C. Light scattering by tenuous particles: a generalization of the Rayleigh-Gans-Rocard 
approach. Appl Opt 1976;15:2932-2936. 
 [119]  Chiappetta P. Multiple scattering approach to light scattering by arbitrarily shaped particles. J Phys A 
1980;13:2101-2108. 
 [120]  Singham SB, Bohren CF. Light-scattering by an arbitrary particle - the scattering-order formulation of 
the coupled-dipole method. J Opt Soc Am A 1988;5:1867-1872. 
 [121]  de Hoop AT. Convergence criterion for the time-domain iterative Born approximation to scattering by an 
inhomogeneous, dispersive object. J Opt Soc Am A 1991;8:1256-1260. 
 [122]  Flatau PJ, Stephens GL, Draine BT. Light-scattering by rectangular solids in the discrete-dipole 
approximation - a new algorithm exploiting the block-Toeplitz structure. J Opt Soc Am A 1990;7:593-
600. 
 [123]  Flatau PJ. Fast solvers for one dimensional light scattering in the discrete dipole approximation. Opt 
Expr 2004;12:3149-3155. 
 [124]  Goodman JJ, Draine BT, Flatau PJ. Application of fast-Fourier-transform techniques to the discrete-
dipole approximation. Opt Lett 1991;16:1198-1200. 
 [125]  Barrowes BE, Teixeira FL, Kong JA. Fast algorithm for matrix-vector multiply of asymmetric multilevel 
block-Toeplitz matrices in 3-D scattering. Microwave Opt Tech Lett 2001;31:28-32. 
 [126]  Greengard L, Rokhlin V. A fast algorithm for particle simulations. J Comp Phys 1987;73:325-348. 
 [127]  Rahola J. Diagonal forms of the translation operators in the fast multipole algorithm for scattering 
problems. BIT 1996;36:333-358. 
 [128]  Hoekstra AG, Sloot PMA. New computational techniques to simulate light-scattering from arbitrary 
particles. Part Part Sys Charact 1994;11:189-193. 
 [129]  Koc S, Chew WC. Multilevel fast multipole algorithm for the discrete dipole approximation. J Electrom 
Wav Applic 2001;15:1447-1468. 
 [130]  Amini S, Profit ATJ. Multi-level fast multipole solution of the scattering problem. Engin Anal Bound 
Elem 2003;27:547-564. 
 [131]  Darve E. The fast multipole method I: error analysis and asymptotic complexity. SIAM J Num Anal 
2000;38:98-128. 
 [132]  Dembart B, Yip E. The accuracy of fast multipole methods for Maxwell's equations. IEEE Comp Sci 
Engin 1998;5:48-56. 
 [133]  Barnes JE, Hut P. A hierarchical O(N log N) force-calculation algorithm. Nature 1986;324:446-449. 
 [134]  Barnes JE, Hut P. Error analysis of a tree code. Astrophys J Suppl 1989;70:389-417. 
 [135]  Ding KH, Tsang L. A sparse matrix iterative approach for modeling tree scattering. Microwave Opt Tech 
Lett 2003;38:198-202. 
 [136]  Singham MK, Singham SB, Salzman GC. The scattering matrix for randomly oriented particles. J Chem 
Phys 1986;85:3807-3815. 
 [137]  Mishchenko MI. Calculation of the amplitude matrix for a nonspherical particle in a fixed orientation. 
Appl Opt 2000;39:1026-1031. 
 [138]  McClain WM, Ghoul WA. Elastic light scattering by randomly oriented macromolecules: Computation 
of the complete set of observables. J Chem Phys 1986;84:6609-6622. 
 [139]  Khlebtsov NG. Orientational averaging of integrated cross sections in the discrete dipole method. Opt 
Spec 2001;90:408-415. 
 [140]  Mishchenko MI, Travis LD, Mackowski DW. T-matrix computations of light scattering by nonspherical 
particles: A review. J Quant Spectrosc Radiat Transf 1996;55:535-575. 
 [141]  Mackowski DW. Discrete dipole moment method for calculation of the T matrix for nonspherical 
particles. J Opt Soc Am A 2002;19:881-893. 
 [142]  Mishchenko MI. Light-scattering by size shape distributions of randomly oriented axially-symmetrical 
particles of a size comparable to a wavelength. Appl Opt 1993;32:4652-4666. 
 [143]  Muinonen K, Zubko E. Optimizing the discrete-dipole approximation for sequences of scatterers with 
identical shapes but differing sizes or refractive indices. J Quant Spectrosc Radiat Transf 2006;100:288-
294. 
 [144]  Hovenier JW, Lumme K, Mishchenko MI, Voshchinnikov NV, Mackowski DW, Rahola J. 
Computations of scattering matrices of four types of non-spherical particles using diverse methods. J 
Quant Spectrosc Radiat Transf 1996;55:695-705. 
 [145]  Wriedt T, Comberg U. Comparison of computational scattering methods. J Quant Spectrosc Radiat 
Transf 1998;60:411-423. 
 [146]  Comberg U, Wriedt T. Comparison of scattering calculations for aggregated particles based on different 
models. J Quant Spectrosc Radiat Transf 1999;63:149-162. 
 [147]  Wriedt T, Hellmers J, Eremina E, Schuh R. Light scattering by single erythrocyte: Comparison of 
different methods. J Quant Spectrosc Radiat Transf 2006;100:444-456. 
 [148]  Laczik Z. Discrete-dipole-approximation-based light-scattering calculations for particles with a real 
refractive index smaller than unity. Appl Opt 1996;35:3736-3745. 
	1 Introduction
	2 General framework
	3 Various DDA models
	3.1 Theoretical base of the DDA
	3.2 Accuracy of DDA simulations
	3.3 The DDA for clusters of spheres
	3.4 Modifications and extensions of the DDA
	4 Numerical considerations
	4.1 Direct vs. iterative methods
	4.2 Scattering order formulation
	4.3 Block-Toeplitz
	4.4 FFT
	4.5 Fast multipole method
	4.6 Orientation averaging and repeated calculations
	5 Comparison of the DDA to other methods
	6 Concluding remarks
	Acknowledgements
	Appendix. Description of used acronyms and symbols
	References
  /ASCII85EncodePages false
  /AllowTransparency false
  /AutoPositionEPSFiles true
  /AutoRotatePages /None
  /Binding /Left
  /CalGrayProfile (Dot Gain 20%)
  /CalRGBProfile (sRGB IEC61966-2.1)
  /CalCMYKProfile (U.S. Web Coated \050SWOP\051 v2)
  /sRGBProfile (sRGB IEC61966-2.1)
  /CannotEmbedFontPolicy /Error
  /CompatibilityLevel 1.4
  /CompressObjects /Tags
  /CompressPages true
  /ConvertImagesToIndexed true
  /PassThroughJPEGImages true
  /CreateJDFFile false
  /CreateJobTicket false
  /DefaultRenderingIntent /Default
  /DetectBlends true
  /DetectCurves 0.0000
  /ColorConversionStrategy /CMYK
  /DoThumbnails false
  /EmbedAllFonts true
  /EmbedOpenType false
  /ParseICCProfilesInComments true
  /EmbedJobOptions true
  /DSCReportingLevel 0
  /EmitDSCWarnings false
  /EndPage -1
  /ImageMemory 1048576
  /LockDistillerParams false
  /MaxSubsetPct 100
  /Optimize true
  /OPM 1
  /ParseDSCComments true
  /ParseDSCCommentsForDocInfo true
  /PreserveCopyPage true
  /PreserveDICMYKValues true
  /PreserveEPSInfo true
  /PreserveFlatness true
  /PreserveHalftoneInfo false
  /PreserveOPIComments true
  /PreserveOverprintSettings true
  /StartPage 1
  /SubsetFonts true
  /TransferFunctionInfo /Apply
  /UCRandBGInfo /Preserve
  /UsePrologue false
  /ColorSettingsFile ()
  /AlwaysEmbed [ true
  /NeverEmbed [ true
  /AntiAliasColorImages false
  /CropColorImages true
  /ColorImageMinResolution 300
  /ColorImageMinResolutionPolicy /OK
  /DownsampleColorImages true
  /ColorImageDownsampleType /Bicubic
  /ColorImageResolution 300
  /ColorImageDepth -1
  /ColorImageMinDownsampleDepth 1
  /ColorImageDownsampleThreshold 1.50000
  /EncodeColorImages true
  /ColorImageFilter /DCTEncode
  /AutoFilterColorImages true
  /ColorImageAutoFilterStrategy /JPEG
  /ColorACSImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /ColorImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /JPEG2000ColorACSImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /JPEG2000ColorImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /AntiAliasGrayImages false
  /CropGrayImages true
  /GrayImageMinResolution 300
  /GrayImageMinResolutionPolicy /OK
  /DownsampleGrayImages true
  /GrayImageDownsampleType /Bicubic
  /GrayImageResolution 300
  /GrayImageDepth -1
  /GrayImageMinDownsampleDepth 2
  /GrayImageDownsampleThreshold 1.50000
  /EncodeGrayImages true
  /GrayImageFilter /DCTEncode
  /AutoFilterGrayImages true
  /GrayImageAutoFilterStrategy /JPEG
  /GrayACSImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /GrayImageDict <<
    /QFactor 0.15
    /HSamples [1 1 1 1] /VSamples [1 1 1 1]
  /JPEG2000GrayACSImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /JPEG2000GrayImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 30
  /AntiAliasMonoImages false
  /CropMonoImages true
  /MonoImageMinResolution 1200
  /MonoImageMinResolutionPolicy /OK
  /DownsampleMonoImages true
  /MonoImageDownsampleType /Bicubic
  /MonoImageResolution 1200
  /MonoImageDepth -1
  /MonoImageDownsampleThreshold 1.50000
  /EncodeMonoImages true
  /MonoImageFilter /CCITTFaxEncode
  /MonoImageDict <<
    /K -1
  /AllowPSXObjects false
  /CheckCompliance [
    /None
  /PDFX1aCheck false
  /PDFX3Check false
  /PDFXCompliantPDFOnly false
  /PDFXNoTrimBoxError true
  /PDFXTrimBoxToMediaBoxOffset [
    0.00000
    0.00000
    0.00000
    0.00000
  /PDFXSetBleedBoxToMediaBox true
  /PDFXBleedBoxToTrimBoxOffset [
    0.00000
    0.00000
    0.00000
    0.00000
  /PDFXOutputIntentProfile ()
  /PDFXOutputConditionIdentifier ()
  /PDFXOutputCondition ()
  /PDFXRegistryName ()
  /PDFXTrapped /False
  /Description <<
    /CHS <FEFF4f7f75288fd94e9b8bbe5b9a521b5efa7684002000410064006f006200650020005000440046002065876863900275284e8e9ad88d2891cf76845370524d53705237300260a853ef4ee54f7f75280020004100630072006f0062006100740020548c002000410064006f00620065002000520065006100640065007200200035002e003000204ee553ca66f49ad87248672c676562535f00521b5efa768400200050004400460020658768633002>
    /CHT <FEFF4f7f752890194e9b8a2d7f6e5efa7acb7684002000410064006f006200650020005000440046002065874ef69069752865bc9ad854c18cea76845370524d5370523786557406300260a853ef4ee54f7f75280020004100630072006f0062006100740020548c002000410064006f00620065002000520065006100640065007200200035002e003000204ee553ca66f49ad87248672c4f86958b555f5df25efa7acb76840020005000440046002065874ef63002>
    /DAN <FEFF004200720075006700200069006e0064007300740069006c006c0069006e006700650072006e0065002000740069006c0020006100740020006f007000720065007400740065002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e007400650072002c0020006400650072002000620065006400730074002000650067006e006500720020007300690067002000740069006c002000700072006500700072006500730073002d007500640073006b007200690076006e0069006e00670020006100660020006800f8006a0020006b00760061006c0069007400650074002e0020004400650020006f007000720065007400740065006400650020005000440046002d0064006f006b0075006d0065006e0074006500720020006b0061006e002000e50062006e00650073002000690020004100630072006f00620061007400200065006c006c006500720020004100630072006f006200610074002000520065006100640065007200200035002e00300020006f00670020006e0079006500720065002e>
    /DEU <FEFF00560065007200770065006e00640065006e0020005300690065002000640069006500730065002000450069006e007300740065006c006c0075006e00670065006e0020007a0075006d002000450072007300740065006c006c0065006e00200076006f006e002000410064006f006200650020005000440046002d0044006f006b0075006d0065006e00740065006e002c00200076006f006e002000640065006e0065006e002000530069006500200068006f006300680077006500720074006900670065002000500072006500700072006500730073002d0044007200750063006b0065002000650072007a0065007500670065006e0020006d00f60063006800740065006e002e002000450072007300740065006c006c007400650020005000440046002d0044006f006b0075006d0065006e007400650020006b00f6006e006e0065006e0020006d006900740020004100630072006f00620061007400200075006e0064002000410064006f00620065002000520065006100640065007200200035002e00300020006f0064006500720020006800f600680065007200200067006500f600660066006e00650074002000770065007200640065006e002e>
    /ESP <FEFF005500740069006c0069006300650020006500730074006100200063006f006e0066006900670075007200610063006900f3006e0020007000610072006100200063007200650061007200200064006f00630075006d0065006e0074006f00730020005000440046002000640065002000410064006f0062006500200061006400650063007500610064006f00730020007000610072006100200069006d0070007200650073006900f3006e0020007000720065002d0065006400690074006f007200690061006c00200064006500200061006c00740061002000630061006c0069006400610064002e002000530065002000700075006500640065006e00200061006200720069007200200064006f00630075006d0065006e0074006f00730020005000440046002000630072006500610064006f007300200063006f006e0020004100630072006f006200610074002c002000410064006f00620065002000520065006100640065007200200035002e003000200079002000760065007200730069006f006e0065007300200070006f00730074006500720069006f007200650073002e>
    /FRA <FEFF005500740069006c006900730065007a00200063006500730020006f007000740069006f006e00730020006100660069006e00200064006500200063007200e900650072002000640065007300200064006f00630075006d0065006e00740073002000410064006f00620065002000500044004600200070006f0075007200200075006e00650020007100750061006c0069007400e90020006400270069006d007000720065007300730069006f006e00200070007200e9007000720065007300730065002e0020004c0065007300200064006f00630075006d0065006e00740073002000500044004600200063007200e900e90073002000700065007500760065006e0074002000ea0074007200650020006f007500760065007200740073002000640061006e00730020004100630072006f006200610074002c002000610069006e00730069002000710075002700410064006f00620065002000520065006100640065007200200035002e0030002000650074002000760065007200730069006f006e007300200075006c007400e90072006900650075007200650073002e>
    /ITA <FEFF005500740069006c0069007a007a006100720065002000710075006500730074006500200069006d0070006f007300740061007a0069006f006e00690020007000650072002000630072006500610072006500200064006f00630075006d0065006e00740069002000410064006f00620065002000500044004600200070006900f900200061006400610074007400690020006100200075006e00610020007000720065007300740061006d0070006100200064006900200061006c007400610020007100750061006c0069007400e0002e0020004900200064006f00630075006d0065006e007400690020005000440046002000630072006500610074006900200070006f00730073006f006e006f0020006500730073006500720065002000610070006500720074006900200063006f006e0020004100630072006f00620061007400200065002000410064006f00620065002000520065006100640065007200200035002e003000200065002000760065007200730069006f006e006900200073007500630063006500730073006900760065002e>
    /JPN <FEFF9ad854c18cea306a30d730ea30d730ec30b951fa529b7528002000410064006f0062006500200050004400460020658766f8306e4f5c6210306b4f7f75283057307e305930023053306e8a2d5b9a30674f5c62103055308c305f0020005000440046002030d530a130a430eb306f3001004100630072006f0062006100740020304a30883073002000410064006f00620065002000520065006100640065007200200035002e003000204ee5964d3067958b304f30533068304c3067304d307e305930023053306e8a2d5b9a306b306f30d530a930f330c8306e57cb30818fbc307f304c5fc59808306730593002>
    /KOR <FEFFc7740020c124c815c7440020c0acc6a9d558c5ec0020ace0d488c9c80020c2dcd5d80020c778c1c4c5d00020ac00c7a50020c801d569d55c002000410064006f0062006500200050004400460020bb38c11cb97c0020c791c131d569b2c8b2e4002e0020c774b807ac8c0020c791c131b41c00200050004400460020bb38c11cb2940020004100630072006f0062006100740020bc0f002000410064006f00620065002000520065006100640065007200200035002e00300020c774c0c1c5d0c11c0020c5f40020c2180020c788c2b5b2c8b2e4002e>
    /NLD (Gebruik deze instellingen om Adobe PDF-documenten te maken die zijn geoptimaliseerd voor prepress-afdrukken van hoge kwaliteit. De gemaakte PDF-documenten kunnen worden geopend met Acrobat en Adobe Reader 5.0 en hoger.)
    /NOR <FEFF004200720075006b00200064006900730073006500200069006e006e007300740069006c006c0069006e00670065006e0065002000740069006c002000e50020006f0070007000720065007400740065002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e00740065007200200073006f006d00200065007200200062006500730074002000650067006e0065007400200066006f00720020006600f80072007400720079006b006b0073007500740073006b00720069006600740020006100760020006800f800790020006b00760061006c0069007400650074002e0020005000440046002d0064006f006b0075006d0065006e00740065006e00650020006b0061006e002000e50070006e00650073002000690020004100630072006f00620061007400200065006c006c00650072002000410064006f00620065002000520065006100640065007200200035002e003000200065006c006c00650072002000730065006e006500720065002e>
    /PTB <FEFF005500740069006c0069007a006500200065007300730061007300200063006f006e00660069006700750072006100e700f50065007300200064006500200066006f0072006d00610020006100200063007200690061007200200064006f00630075006d0065006e0074006f0073002000410064006f0062006500200050004400460020006d00610069007300200061006400650071007500610064006f00730020007000610072006100200070007200e9002d0069006d0070007200650073007300f50065007300200064006500200061006c007400610020007100750061006c00690064006100640065002e0020004f007300200064006f00630075006d0065006e0074006f00730020005000440046002000630072006900610064006f007300200070006f00640065006d0020007300650072002000610062006500720074006f007300200063006f006d0020006f0020004100630072006f006200610074002000650020006f002000410064006f00620065002000520065006100640065007200200035002e0030002000650020007600650072007300f50065007300200070006f00730074006500720069006f007200650073002e>
    /SUO <FEFF004b00e40079007400e40020006e00e40069007400e4002000610073006500740075006b007300690061002c0020006b0075006e0020006c0075006f00740020006c00e400680069006e006e00e4002000760061006100740069007600610061006e0020007000610069006e006100740075006b00730065006e002000760061006c006d0069007300740065006c00750074007900f6006800f6006e00200073006f00700069007600690061002000410064006f0062006500200050004400460020002d0064006f006b0075006d0065006e007400740065006a0061002e0020004c0075006f0064007500740020005000440046002d0064006f006b0075006d0065006e00740069007400200076006f0069006400610061006e0020006100760061007400610020004100630072006f0062006100740069006c006c00610020006a0061002000410064006f00620065002000520065006100640065007200200035002e0030003a006c006c00610020006a006100200075007500640065006d006d0069006c006c0061002e>
    /SVE <FEFF0041006e007600e4006e00640020006400650020006800e4007200200069006e0073007400e4006c006c006e0069006e006700610072006e00610020006f006d002000640075002000760069006c006c00200073006b006100700061002000410064006f006200650020005000440046002d0064006f006b0075006d0065006e007400200073006f006d002000e400720020006c00e4006d0070006c0069006700610020006600f60072002000700072006500700072006500730073002d007500740073006b00720069006600740020006d006500640020006800f600670020006b00760061006c0069007400650074002e002000200053006b006100700061006400650020005000440046002d0064006f006b0075006d0065006e00740020006b0061006e002000f600700070006e00610073002000690020004100630072006f0062006100740020006f00630068002000410064006f00620065002000520065006100640065007200200035002e00300020006f00630068002000730065006e006100720065002e>
    /ENU (Use these settings to create Adobe PDF documents best suited for high-quality prepress printing.  Created PDF documents can be opened with Acrobat and Adobe Reader 5.0 and later.)
  /Namespace [
    (Adobe)
    (Common)
    (1.0)
  /OtherNamespaces [
    <<
      /AsReaderSpreads false
      /CropImagesToFrames true
      /ErrorControl /WarnAndContinue
      /FlattenerIgnoreSpreadOverrides false
      /IncludeGuidesGrids false
      /IncludeNonPrinting false
      /IncludeSlug false
      /Namespace [
        (Adobe)
        (InDesign)
        (4.0)
      ]
      /OmitPlacedBitmaps false
      /OmitPlacedEPS false
      /OmitPlacedPDF false
      /SimulateOverprint /Legacy
    >>
    <<
      /AddBleedMarks false
      /AddColorBars false
      /AddCropMarks false
      /AddPageInfo false
      /AddRegMarks false
      /ConvertColors /ConvertToCMYK
      /DestinationProfileName ()
      /DestinationProfileSelector /DocumentCMYK
      /Downsample16BitImages true
      /FlattenerPreset <<
        /PresetSelector /MediumResolution
      >>
      /FormElements false
      /GenerateStructure false
      /IncludeBookmarks false
      /IncludeHyperlinks false
      /IncludeInteractive false
      /IncludeLayers false
      /IncludeProfiles false
      /MultimediaHandling /UseObjectSettings
      /Namespace [
        (Adobe)
        (CreativeSuite)
        (2.0)
      ]
      /PDFXOutputIntentProfileSelector /DocumentCMYK
      /PreserveEditing true
      /UntaggedCMYKHandling /LeaveUntagged
      /UntaggedRGBHandling /UseDocumentProfile
      /UseDocumentBleed false
    >>
>> setdistillerparams
  /HWResolution [2400 2400]
  /PageSize [612.000 792.000]
>> setpagedevice
ABSTRACT
  We present a review of the discrete dipole approximation (DDA), which is a
general method to simulate light scattering by arbitrarily shaped particles. We
put the method in historical context and discuss recent developments, taking
the viewpoint of a general framework based on the integral equations for the
electric field. We review both the theory of the DDA and its numerical aspects,
the latter being of critical importance for any practical application of the
method. Finally, the position of the DDA among other methods of light
scattering simulation is shown and possible future developments are discussed.

<|endoftext|><|startoftext|>
Introduction
The scalar form factor of the pion, Γπ(t), corresponds to the matrix element
Γπ(t) =
d4x e−i(q
′−q)x〈π(q′)|
muū(x)u(x) +mdd̄(x)d(x)
|π(q)〉 , t = (q′ − q)2 . (1.1)
Performing a Taylor expansion around t = 0,
Γπ(t) = Γπ(0)
t〈r2〉πs +O(t2)
, (1.2)
where 〈r2〉πs is the quadratic scalar radius of the pion.
The quantity 〈r2〉πs contributes around 10% [1] to the values of the S-wave ππ scattering lengths
a00 and a
0 as determined in ref.[1], by employing Roy equations and χPT to two loops. If one
takes into account that this reference gives a precision of 2.2% in its calculation of the scattering
lengths, a 10% of contribution from 〈r2〉πs is a large one. Related to that, 〈r2〉πs is also important
in SU(2)×SU(2) χPT since it gives the low energy constant ℓ̄4 that controls the departure of Fπ
from its value in the chiral limit [2, 3] at leading order correction.
Based on one loop χPT , Gasser and Leutwyler [2] obtained 〈r2〉πs = 0.55 ± 0.15 fm2. This
calculation was improved later on by the same authors together with Donoghue [4], who solved
the corresponding Muskhelishvili-Omnès equations with the coupled channels of ππ and KK̄. The
update of this calculation, performed in ref.[1], gives 〈r2〉πs = 0.61±0.04 fm2, where the new results
on S-wave I=0 ππ phase shifts from the Roy equation analysis of ref.[5] are included. Moussallam
[6] employs the same approach and obtains values in agreement with the previous result.
One should notice that solutions of the Muskhelishvili-Omnès equations for the scalar form
factor rely on non-measured T−matrix elements or on assumptions about which are the channels
that matter. Given the importance of 〈r2〉πs , and the possible systematic errors in the analyses
based on Muskhelishvili-Omnès equations, other independent approaches are most welcome. In
this respect we quote the works [7, 8, 9], and Ynduráin’s ones [10, 11, 12]. These latter works have
challenged the previous value for 〈r2〉πs , shifting it to the larger 〈r2〉πs = 0.75 ± 0.07 fm2. From
ref.[1] the equations,
δa00 = +0.027∆r2 , δa
0 = −0.004∆r2 , (1.3)
give the change of the scattering lengths under a variation of 〈r2〉πs defined by 〈r2〉πs = 0.61(1 +
∆r2) fm
2. For the difference between the central values of 〈r2〉πs given above from refs.[1, 10], one
has ∆r2 = +0.23. This corresponds to δa
0 = +0.006 and δa
0 = −0.001, while the errors quoted
are a00 = 0.220 ± 0.005 and a20 = −0.0444 ± 0.0010. We then adduce about shifting the central
values for the predicted scattering lengths at the level of one sigma.
The value taken for 〈r2〉πs is also important for determining the O(p4) χPT coupling ℓ̄4. The
value of ref.[1] is ℓ̄4 = 4.4±0.2 while that of ref.[10] is ℓ̄4 = 5.4±0.5. Both values are incompatible
within errors.
The papers [10, 11, 12] have been questioned in refs.[13, 14]. The value of the Kπ quadratic
scalar radius, 〈r2〉Kπs , obtained by Ynduráin in ref.[10], 〈r2〉Kπs = 0.31± 0.06 fm2, is not accurate,
because he relies on old experiments and on a bad parameterization of low energy S-wave I=1/2Kπ
phase shifts by assuming dominance of the κ resonance as a standard Breit-Wigner pole [15]. Fur-
thermore, 〈r2〉Kπs was recently fixed by high statistics experiments in an interval in agreement with
the sharp prediction of [15], based on dispersion relations (three-channel Muskhelishvili-Omnès
equations from the T−matrix of ref.[16]) and two-loop χPT [17]. From the recent experiments
[18, 19], one has for the charged kaons [18] 〈r2〉K±πs = 0.235 ± 0.014 ± 0.007 fm2, and for the
neutral ones [19] 〈r2〉KLπs = 0.165 ± 0.016 fm2. The prediction of [15], in an isospin limit, is
〈r2〉Kπs = 0.192± 0.012 fm2, lying just in the middle of the experimental determinations. Another
issue is Ynduráin’s more sound determination of the pionic scalar radius, whose (in)correctness is
not settled yet.
In this paper we concentrate on the approach of Ynduráin [10, 11, 12] to evaluate the quadratic
scalar radius of the pion based on an Omnés representation of the I=0 non-strange pion scalar form
factor. Our main conclusion will be that this approach [10] and the solution of the Muskhelishvili-
Omnès equations [4], with ππ and KK̄ as coupled channels, agree between each other if one
properly takes into account, for some T−matrices, the presence of a zero in the pion scalar form
factor at energies slightly below the KK̄ threshold. Precisely these T−matrices are those used in
[10] and favoured in [11]. Once this is considered we conclude that 〈r2〉πs = 0.63± 0.05 fm2.
The contents of the paper are organized as follows. In section 2 we discuss the Omnès rep-
resentation of Γπ(t) and derive the expression to calculate 〈r2〉πs . This calculation is performed
in section 3, where we consider different parameterizations for experimental data and asymptotic
phases for the scalar form factor. Conclusions are given in the last section.
2 Scalar form factor
The pion scalar form factor Γπ(t), eq.(1.1), is an analytic function of t with a right hand cut,
due to unitarity, for t ≥ 4m2π. Performing a dispersion relation of its logarithm, with the possible
zeroes of Γπ(t) removed, the Omnès representation results,
Γπ(t) = P (t) exp
s(s− t)
. (2.1)
Here, P (t) is a polynomial made up from the zeroes of Γπ(t), with P (0) = Γπ(0). In the previous
equation, φ(s) is the phase of Γπ(t)/P (t), taken to be continuous and such that φ(4m
π) = 0. In
ref.[10] the scalar form factor is assumed to be free of zeroes and hence P (t) is just the constant
Γπ(0) (the exponential factor is 1 for t = 0). Thus,
Γπ(t) = Γπ(0) exp
s(s− t)
. (2.2)
From where it follows that,
〈r2〉πs =
ds . (2.3)
One of the features of the pion scalar form factor of refs.[4, 6, 8], as discussed in ref.[13], is the
presence of a strong dip at energies around the KK̄ threshold. This feature is also shared by the
strong S-wave I=0 ππ amplitude, tππ. This is so because tππ is in very good approximation purely
elastic below the KK̄ threshold and hence, neglecting inelasticity altogether in the discussion that
follows, it is proportional to sin δπe
iδπ , with δπ the S-wave I=0 ππ phase shift. It is an experimental
fact that δπ is very close to π around the KK̄ threshold, as shown in fig.1. Therefore, if δπ = π
happens before the opening of this channel the strong amplitude has a zero at that energy. On
the other hand, if δπ = π occurs after the KK̄ threshold, because inelasticity is then substantial,
see eq.(2.4) below, there is not a zero but a pronounced dip in |tππ|. This dip can be arbitrarily
close to zero if before the KK̄ threshold δπ approaches π more and more, without reaching it.
400 600 800 1000 1200 1400 1600
 (MeV)
Eq. (3.13), [20]
PY [24]
CGL [1]
Sol. A of [27]
Sol. B of [27]
Sol. C of [27]
Sol. D of [27]
Sol. E of [27]
Kaminski et al. [21]
BNL-E865 Coll. [25]
NA48/2 Coll. [26]
300 350 400 450
BL-E865 Coll. [25]
NA48/2 Coll. [26]
PY [24]
CGL [1]
Figure 1: S-wave I = 0 ππ phase shift, δπ(s). Experimental data are from refs.[21, 25, 26, 27].
Because of Watson final state theorem the phase φ(s) in eq.(2.1) is given by δπ(s) below the
KK̄ threshold, neglecting inelasticity due to 4π or 6π states as indicated by experiments [20]. The
situation above the KK̄ threshold is more involved. Let us recall that
tππ = (η e
2iδπ − 1)/2i , (2.4)
with 0 ≤ η ≤ 1 and the inelasticity is given by 1− η2, with η the elasticity coefficient. We denote
by ϕ(s) the phase of tππ, required to be continuous (below 4m
K it is given by δπ(s)). By continuity,
close enough to the KK̄ threshold and above it, η → 1 and then we are in the same situation as in
the elastic case. As a result, because of the Watson final state theorem and continuity, the phase
φ(s) must still be given by ϕ(s). For δπ(sK) < π, sK = 4m
K , ϕ(s) does not follow the increasing
trend with energy of δπ(s) but drops as a result of eq.(2.4), see fig.2 for δπ(sK) < π. This is easily
seen by writing explicitely the real and imaginary parts of tππ in eq.(2.4),
tππ =
η sin 2δπ +
(1− η cos 2δπ) . (2.5)
400 600 800 1000 1200 1400 1600 1800 2000
 (MeV)
1000 1100 1200 1300 1400 1500 1600
 (MeV)
( δπ(sK)< 180
( δπ(sK)> 180
Figure 2: Left panel: Strong phase ϕ(s), eigenvalue phase δ(+)(s) and asymptotic phase φas(s). Right
panel: Integrand of 〈r2〉πs in eq.(3.12) for parameterization I (dashed line) and II (solid line). For more
details see the text. Notice that the uncertainty due to φas(s) is much reduced in the integrand.
The imaginary part is always positive (η < 1 above the KK̄ threshold and 1.1 GeV [20]) while the
real part is negative for δπ < π, but in an interval of just a few MeV the real part turns positive
as soon as δπ > π, fig.1. As a result, ϕ(s) passes quickly from values below but close to π to
the interval [0, π/2]. This rapid motion of φ(s) gives rise to a pronounced minimum of |Γπ(t)| at
this energy, as indicated in ref.[13] and shown in fig.3. The drop in φ(s) becomes more and more
dramatic as δπ(sK) → π− (with the superscript +(−) indicating that the limit is approached from
values above(below), respectively); and in this limit, φ(sk) = ϕ(sK) is discontinuous at sK . This is
easily understood from eq.(2.5). Let us call s1 the point at which δπ(s1) = π with s1 > sK . Close
and above s1, ϕ(s) ∈ [0, π/2], for the reasons explained above, and ϕ(s) has decreased very rapidly
from almost π at the KK̄ threshold to values below π/2 just after s1. Then, in the limit s1 → s+K
one has φ(s−K) = ϕ(s
K) = π on the left, while on the right φ(s
K) = ϕ(s
K) < π/2. As a result ϕ(s)
is discontinuous at s = sK . We stress that this discontinuity of ϕ(s) at sK when δπ(sK) → π−
applies rigorously to φ(sK) as well since η(sK) = 1. This discontinuity at s = sK implies also that
the integrand in the Omnès representation for Γπ(t) develops a logarithmic singularity as,
φ(s−K)− φ(s
, (2.6)
with δ → 0+. When exponentiating this result one has a zero for Γπ(sK) as (δ/sK)ν , ν = (φ(s−K)−
φ(s+K))/π > 0 and δ → 0+. This zero is a necessary consequence when evolving continuously from
δπ(sK) < π to δπ(sK) > π.
#1 This in turn implies rigorously that in the Omnès representation of
Γπ(t), eq.(2.1), P (t) must be a polynomial of first degree for those cases with δπ(sK) ≥ π,#2
P (t) = Γπ(0)
s1 − t
, (2.7)
with s1 the position of the zero. Notice that the degree of the polynomial P (t) is discrete and thus
by continuity it cannot change unless a singularity develops. This is the case when δπ(sK) = π,
changing the degree from 0 to 1. Hence, if δπ(sK) ≥ π for a given tππ, instead of eqs.(2.2) and
(2.3) one must then consider,
Γπ(t) = Γπ(0)
s1 − t
s(s− t)
, (2.8)
〈r2〉πs = −
ds . (2.9)
For those tππ for which δπ(sK) > π then ϕ(s) follows δπ(s) just after the KK̄ threshold and there
is no drop, as emphasized in ref.[11], see fig.2.
Summarizing, we have shown that Γπ(t) has a zero at s1 when δπ(sK) ≥ π as a consequence of
the assumption that φ(s) follows ϕ(s) above the KK̄ threshold, along the lines of ref.[11], and by
imposing continuity in Γπ(t) under small changes in δπ(sK) ≃ π. As a result eqs.(2.8) and (2.9)
should be used in the latter case, instead of eqs.(2.2) and (2.3), valid for δπ(sK) < π. This solution
was overlooked in refs.[10, 11, 12]. We show in appendix A why the previous discussion on the
zero of Γπ(t) for δπ(sK) ≥ π at s1 cannot be applied to all pion scalar form factors, in particular
to the strange one.
If eq.(2.2) were used for those tππ with δπ(sK) ≥ π then a strong maximum of |Γπ(t)| would
be obtained around the KK̄ threshold, instead of the aforementioned zero or the minimum of
refs.[4, 6], as shown in fig.3 by the dashed-dotted line. That is also shown in fig.10 of ref.[22]
or fig.2 of [13]. This is the situation for the Γπ(t) of refs.[10, 11], and it is the reason why 〈r2〉πs
obtained there is much larger than that of refs.[4, 1, 6]. That is, Ynduráin uses eqs.(2.2), (2.3) for
δπ(sK) ≥ π, instead of eqs.(2.8), (2.9) (solid line in fig.3). The unique and important role played
by δπ(sK) (for elastic tππ below the KK̄ threshold) is perfectly recognised in ref.[11]. However, in
this reference the astonishing conclusion that Γπ(t) has two radically different behaviours under
tiny variations of tππ was sustained. These variations are enough to pass from δπ(sK) < π to
δπ(sK) ≥ π [10], while the T− or S−matrix are fully continuous. Because of this instability of the
solution of refs.[10, 11] under tiny changes of δπ(s), we consider ours, that produces continuous
Γπ(t), to be certainly preferred. We also stress that our solutions, either for δπ(sK) ≥ π and
δπ(sK) < π, are the ones that agree with those obtained by solving the Muskhelishvili-Omnès
equations [4, 1, 6] and Unitary χPT [8].
#1It can be shown from eq.(2.5) that φ(s−
) − φ(s+
) = π. Here we are assuming η = 1 for s ≤ sK , which is a
very good approximation as indicated by experiment [20, 21].
#2We are focusing in the physically relevant region of experimental allowed values for δπ(sK), which can be larger
or smaller than π but close to.
0 200 400 600 800 1000 1200
(MeV)
δπ(sK)<π
δπ(sK)>π,  P(t)=Γπ(0)(s1- t)/s1
δπ(sK)>π,  P(t)=Γπ(0)
ref. [8]  (δπ(sK)>π)
PSfrag replacements
Figure 3: |Γπ(t)/Γπ(0)| from eq.(2.2) with δπ(sK) < π, dashed-line, and δπ(sK) > π, dashed-dotted line.
The solid line corresponds to use eq.(2.8) for the latter case. For this figure we have used parameterization
II (defined in section 3) with α1 = 2.28 (dashed line) and 2.20 (dashed-dotted and solid lines). The
dashed-double-dotted line is the scalar form factor of ref.[8] that has δπ(sK) > π.
Let us now show how to fix s1 in terms of the knowledge of δπ(s) with δπ(sK) ≥ π. For this
purpose let us perform a dispersion relation of Γπ(t) with two subtractions,
Γπ(t) = Γπ(0) +
〈r2〉πs t+
ImΓπ(s)
s2(s− t)
ds , (2.10)
From asymptotic QCD [23] one expects that the scalar form factor vanishes at infinity [10, 12],
then the dispersion integral in eq.(2.10) should converge rather fast. Eq.(2.10) is useful because
it tells us that the only point around 1 GeV where there can be a zero in Γπ(t) is at the energy
s1 for which the imaginary part of Γπ(t) vanishes. Otherwise, the integral in the right hand side
of eq.(2.10) picks up an imaginary part and there is no way to cancel it as Γπ(0), 〈r2〉πs and t are
all real. Since |ImΓπ(t)| = |Γπ(t) sin δπ(t)| for t ≤ sK , it certainly vanishes at the point s1 where
δπ(s1) = π. As there is only one zero at such energies, this determines s1 exactly in terms of the
given parameterization for δπ(s).
One could argue against the argument just given to determine s1 that this energy could be
complex. However, this would imply two zeroes at s1 and s
1, and then the degree of P (t) would
be two instead of one. Notice that the degree of the polynomial P (t) is discrete and thus, by
softness in the continuous parameters of the T−matrix, its value should stay at 1 for some open
domain in the parameters with δπ(sK) > π until a discontinuity develops. Physically, the presence
of two zeroes would in turn require that φ(s) → 3π so as to guarantee that Γπ(t) still vanishes as
−1/t, as required by asymptotic QCD [23, 10]. This value for the asymptotic phase seems to be
rather unrealistic as ϕ(s) only reaches 2π at already quite high energy values, as shown in fig.2.
3 Results
Our main result from the previous section is the sum rule to determine 〈r2〉πs ,
〈r2〉πs = −
θ(δπ(sK)− π) +
ds , (3.11)
where θ(x) = 0 for x < 0 and 1 for x ≥ 0. We split 〈r2〉πs in two parts:
〈r2〉πs = QH +QA ,
QH = −
θ(δπ(sK)− π) +
, (3.12)
with sH = 2.25 GeV
2. Reasons for fixing sH to this value are given below.
The main issue in the application of eq.(3.11) is to determine φ(s) in the integrand. Below the
KK̄ threshold and neglecting inelasticity, one has that φ(s) = δπ(s), 4m
π ≤ s ≤ 4m2K . This follows
because of the Watson final state theorem, continuity and the equality φ(4m2π) = δπ(4m
π) = 0.
For practical applications we shall consider the S-wave I=0 ππ phase shifts given by the
K−matrix parameterization of ref.[20] (from its energy dependent analysis of data from 0.6 GeV
up to 1.9 GeV) and the parameterizations of ref.[1] (CGL) and ref.[24] (PY). The resulting δπ(s)
for all these parameterizations are shown in fig.1. We use CGL from ππ threshold up to 0.8 GeV,
because this is the upper limit of its analysis, while PY is used up to 0.9 GeV, because at this
energy it matches well inside the experimental errors with the data of [20]. The K−matrix of
ref.[20] is used for energies above 0.8 GeV, when using CGL below this energy (parameterization
I), and above 0.9 GeV, when using PY for lower energies (parameterization II). We take the pa-
rameterizations CGL and PY as their difference below 0.8 GeV accounts well for the experimental
uncertainties in δπ, see fig.1, and they satisfy constraints from χPT (the former) and dispersion re-
lations (both). The reason why we skip to use the parameterization of ref.[20] for lower energies is
because one should be there as precise as possible since this region gives the largest contribution to
〈r2〉πs , as it is evident from the right panel of fig.2. It happens that the K−matrix of [20], that fits
data above 0.6 GeV, is not compatible with data from Ke4 decays [25, 26]. We show in the insert
of fig.1 the comparison of the parameterizations CGL and PY with the Ke4 data of [25, 26]. We
also show in the same figure the experimental points on δπ from refs.[20, 21, 27]. Both refs.[20, 21]
are compatible within errors, with some disagreement above 1.5 GeV. This disagreement does not
affect our numerical results since above 1.5 GeV we do not rely on data.
The K−matrix of ref.[20] is given by,
Kij(s) = αiαj/(x1 − s) + βiβj/(x2 − s) + γij , (3.13)
where
1 = 0.11± 0.15 x
2 = 1.19± 0.01
α1 = 2.28± 0.08 α2 = 2.02± 0.11
β1 = −1.00± 0.03 β2 = 0.47± 0.05
γ11 = 2.86± 0.15 γ12 = 1.85± 0.18 γ22 = 1.00± 0.53 ,
(3.14)
with units given in appropriate powers of GeV. In order to calculate the contribution from the
phase shifts of this K−matrix we generate Monte-Carlo gaussian samples, taking into account the
errors shown in eq.(3.14), and evaluate QH according to eq.(3.12). The central value of δπ(sK)
for the K−matrix of ref.[20] is 3.05, slightly below π. When generating Monte-Carlo gaussian
samples according to eq.(3.14), there are cases with δπ(sK) ≥ π, around 30% of the samples. Note
that for these cases one also has the contribution −6/s1 in eq.(3.11).
The application of Watson final state theorem for s > 4m2K is not straightforward since inelastic
channels are relevant. The first important one is the KK̄ channel associated in turn with the
appearance of the narrow f0(980) resonance, just on top of its threshold. This implies a sudden
drop of the elasticity parameter η, but it again rapidly raises (the f0(980) resonance is narrow
with a width around 30 MeV) and in the region 1.12 . s . 1.52 GeV2 is compatible within
errors with η = 1 [20, 21]. For η ≃ 1, the Watson final state theorem would imply again that
φ(s) = ϕ(s), but, as emphasized by [13], this equality only holds, in principle, modulo π. The
reason advocated in ref.[13] is the presence of the region sK < s < 1.1
2 GeV2 where inelasticity
can be large, and then continuity arguments alone cannot be applied to guarantee the equality
φ(s) ≃ ϕ(s) for s & 1.12 GeV2. This argument has been proved in ref.[11] to be quite irrelevant
in the present case. In order to show this a diagonalization of the ππ and KK̄ S−matrix is done.
These channels are the relevant ones when η is clearly different from 1, between 1 and 1.1 GeV.
Above that energy one also has the opening of the ηη channel and the increasing role of multipion
states.
We reproduce here the arguments of ref.[11], but deliver expressions directly in terms of the
phase shifts and elasticity parameter, instead of K−matrix parameters as done in ref.[11]. For
two channel scattering, because of unitarity, the T−matrix can be written as:
(ηe2iδπ − 1) 1
1− η2ei(δπ+δK)
1− η2ei(δπ+δK) 1
(ηe2iδK − 1)
, (3.15)
with δK the elastic S-wave I=0KK̄ phase shift. In terms of the T -matrix the S-wave I=0 S−matrix
is given by,
S = I + 2iT , (3.16)
satisfying SS† = S†S = I. The T -matrix can also be written as
T = Q1/2
K−1 − iQ
Q1/2 , (3.17)
where the K−matrix is real and symmetric along the real axis for s ≥ 4m2π and Q = diag(qπ, qK),
with qπ(qK) the center of mass momentum of pions(kaons). This allows one to diagonalize K with
a real orthogonal matrix C, and hence both the T− and S−matrices are also diagonalized with
the same matrix. Writing,
cos θ sin θ
− sin θ cos θ
, (3.18)
one has
cos θ =
[(1− η2)/2]1/2
1− η2 cos2∆− η| sin∆|
1− η2 cos2∆
]1/2 ,
sin θ = −
sin∆√
1 + (1− η2) cot2∆
1− η2 cos2∆− η| sin∆|
1− η2 cos2∆
]1/2 , (3.19)
with ∆ = δK − δπ. On the other hand, the eigenvalues of the S−matrix are given by,
e2iδ(+) = S11
1 + e2i∆
1 + (1− η2) cot2∆
(3.20)
e2iδ(−) = S22
1 + e−2i∆
1 + (1− η2) cot2∆
. (3.21)
The eigenvalue phase δ(+) satisfies δ(+)(sK) = δπ(sK). The expressions above for exp 2iδ(+) and
exp 2iδ(−) interchange between each other when tan∆ crosses zero and simultaneously the sign in
the right hand side of eq.(3.19) for sin θ changes. This diagonalization allows to disentangle two
elastic scattering channels. The scalar form factors attached to every of these channels, Γ′1 and
Γ′2, will satisfy the Watson final state theorem in the whole energy range and then one has,
= CTQ1/2Γ = CTQ1/2
Γπ = q
λ cos θ |Γ′1|eiδ(+) ± sin θ |Γ′2|eiδ(−)
ΓK = q
± cos θ |Γ′2|eiδ(−) − λ sin θ |Γ′1|eiδ(+)
. (3.22)
The ± in front of |Γ′2| is due to the fact that Γ′2 = 0 at sK , as follows from its definition in the
equation above. Since Watson final state theorem only fixes the phase of Γ′2 up to modulo π, and
the phase is not defined in the zero, we cannot fix the sign in front at this stage. Next, Γ′1 has a
zero at s1 when δπ(sK) ≥ π. For this case, −|Γ′1| must appear in the previous equation, so as to
guarantee continuity of its ascribed phase, and this is why λ = (−1)θ(δπ(sK)−π).
Now, when η → 1 then sin θ → 0 as
(1− η)/2 and φ(s) is then the eigenvalue phase δ(+).
This eigenvalue phase can be calculated given the T−matrix. For those T−matrices employed
here, and those of refs.[10, 11, 4, 13], δ(+)(s) follows rather closely ϕ(s) in the whole energy range.
This is shown in fig.2 and already discussed in detail in ref.[11]. In this way, one guarantees
that φ(s) and ϕ(s) do not differ between each other in an integer multiple of π when η ≃ 1,
1.12 . s . 1.52 GeV2.
For the calculation of QH in eq.(3.12) we shall equate φ(s) = ϕ(s) for 4m
K < s < 1.5
2 GeV2.
Denoting,
= I1 + I2 + I3 ,
∫ 1.12
ds , (3.23)
QH ≃ IH −
θ(δπ(sK)− π) . (3.24)
Now, eq.(3.22) can also be used to estimate the error of approximating φ(s) by ϕ(s) in the range
4m2K < s < 1.5
2 GeV2 to calculate I2 and I3 as done in eq.(3.23). We could have also used δ(+)(s)
in eq.(3.23). However, notice that when η . 1 then ϕ(s) ≃ δ(+)(s) and when inelasticity could be
substantial the difference between δ(+)(s) and ϕ(s) is well taken into account in the error analysis
that follows. Remarkably, consistency of our approach also requires φ(s) to be closer to ϕ(s) than
to δ(+)(s). The reason is that ϕ(s) for δπ(sK) ≥ π is in very good approximation the ϕ(s) for
δπ(sK) < π plus π, this is clear from fig.2. This difference is precisely the required one in order
to have the same value for 〈r2〉πs either for δπ(sK) < π or δπ(sK) ≥ π from eq.(3.11). However, the
difference for δ(+)(s) between δπ(sK) < π and δπ(sK) ≥ π is smaller than π. Indeed, we note that
φ(s) follows closer ϕ(s) than δ(+)(s) for the explicit form factors of refs.[8, 4].
Let us consider first the range 1.12 < s < 1.52 GeV2 where from experiment [20] η ≃ 1 within
errors. With ǫ = ± tan θ|Γ′2/Γ′1| and ρ = δ(−) − δ(+), eq.(3.22) allows us to write,
Γπ = λ cos θ |Γ′1|eiδ(+)(1 + ǫ cos ρ)
1 + i
ǫ sin ρ
1 + ǫ cos ρ
. (3.25)
When η → 1 then ǫ → 0, according to the expansion,#3
tan θ =
(1− η)/2
1− 1 + 3 cos 2∆
8 sin2∆
(1− η)
(1− η)5/2
. (3.26)
Rewriting,
1 + i
ǫ sin ρ
1 + ǫ cos ρ
= exp
ǫ sin ρ
1 + ǫ cos ρ
+O(ǫ2) , (3.27)
which from eqs.(3.25) and (3.27) implies a shift in δ(+) because of inelasticity effects,
δ(+) → δ(+) +
ǫ sin ρ
1 + ǫ cos ρ
. (3.28)
#3The the ratio |Γ′2/Γ′1|, present in ǫ, is not expected to be large since the f0(1300) couples mostly to 4π and
similarly to ππ and KK̄, and the f0(1500) does mostly to ππ [28].
Using η = 0.8 in the range 1.12 . s . 1.52 GeV, η ≃ 1 from the energy dependent analysis of
ref.[20] given by the K−matrix of eq.(3.13), one ends with ǫ ≃ 0.3. Taking into account that δ(+)
is larger than & 3π/2 for δπ(sK) ≥ π (in this case δ(+) ≃ δπ), and around 3π/4 for δπ(sK) < π,
see fig.2, one ends with relative corrections to δ(+) around 6% for the former case and 13% for the
latter. Although the K−matrix of ref.[20], eq.(3.13), is given up to 1.9 GeV, one should be aware
that to take only the two channels ππ and KK̄ in the whole energy range is an oversimplification,
particularly above 1.2 GeV. Because of this we finally double the previous estimate. Hence I3 is
calculated with a relative error of 12% for δπ(sK) ≥ π and 25% for δπ(sK) < π.
In the narrow region between sK < s < 1.1
2 GeV2, η can be rather different from 1, due to
the f0(980) that couples very strongly to the just open KK̄ channel. However, from the direct
measurements of ππ → KK̄ [29], where 1 − η2 is directly measured,#4 one has a better way to
determine η than from ππ scattering [20, 21]. It results from the former experiments, as shown
also by explicit calculations [30, 31, 32], that η is not so small as indicated in ππ experiments [20],
and one has η ≃ 0.6 − 0.7 for its minimum value. Employing η = 0.6 in eq.(3.28) then ǫ ≃ 0.5.
Taking δ(+) around π/2 when δπ(sK) < π this implies a relative error of 30%. For δπ(sK) ≥ π one
has instead δ(+) & π, and a 15% of estimated error. Regarding the ratio of the moduli of form
factors entering in ǫ we expect it to be . 1 (see appendix A). Therefore, our error in the evaluation
of I2 is estimated to be 30% and 15% for the cases δπ(sK) < π and δπ(sK) ≥ π, respectively.
As a result of the discussion following eq.(3.24), we consider that the error estimates done
for I2 and I3 in the case δπ(sK) < π are too conservative and that the relative errors given for
δπ(sK) > π are more realistic. Nonetheless, since the absolute errors that one obtains for I2 and
I3 are the same in both cases (because I2 and I3 for δπ(sK) < π are around a factor 2 smaller than
those for δπ(sK) ≥ π) we keep the errors as given above. To the previous errors for I2 and I3 due
to inelasticity, we also add in quadrature the noise in the calculation of QH due to the error in
tππ from the uncertainties in the parameters of the K−matrix eqs.(3.13), (3.14), and those in the
parameterizations CGL and PY.
We finally employ for s > 2.25 GeV2 the knowledge of the asymptotic phase of the pion scalar
form factor in order to evaluate QA in eq.(3.12). The function φ(s) is determined so as to match
with the asymptotic behaviour of Γπ(t) as −1/t from QCD. The Omnès representation of the
scalar form factor, eqs.(2.2) and (2.8), tends to t−q/π and t−q/π+1 for t → ∞, respectively. Here, q
is the asymptotic value of the phase φ(s) when s → ∞. Hence, for δπ(sK) < π the function φ(s)
is then required to tend to π while for δπ(sK) ≥ π the asymptotic value should be 2π. The way
φ(s) is predicted to approach the limiting value is somewhat ambiguous [11, 12],
φas(s) ≃ π
log(s/Λ2)
. (3.29)
In this equation, 2dm = 24/(33 − 2nf ) ≃ 1, Λ2 is the QCD scale parameter and n = 1, 2 for
δπ(4m
K) < π, ≥ π, respectively. The case n = 2 was not discussed in refs.[10, 11, 12, 13, 14] for
the form factor given in eq.(1.1). There is as well a controversy between [14] and [12] regarding
the ± sign in eq.(3.29). If leading twist contributions dominate [11, 12] then the limiting value is
reached from above and one has the plus sign, while if twist three contributions are the dominant
ones [14] the minus sign has to be considered [12]. In the left panel of fig.2 we show with the wide
#4Neglecting multipion states.
φ(s) I I II II
δπ(sK) ≥ π < π ≥ π < π
I1 0.435± 0.013 0.435± 0.013 0.483± 0.013 0.483± 0.013
I2 0.063± 0.010 0.020± 0.006 0.063± 0.010 0.020± 0.006
I3 0.143± 0.017 0.053± 0.013 0.143± 0.017 0.053± 0.013
QH 0.403± 0.024 0.508± 0.019 0.452± 0.024 0.554± 0.019
QA 0.21± 0.03 0.10± 0.03 0.21± 0.03 0.10± 0.03
〈r2〉πs 0.61± 0.04 0.61± 0.04 0.66± 0.04 0.66± 0.04
Table 1: Different contributions to 〈r2〉πs as defined in eqs.(3.12) and (3.23). All the units are fm2.
In the value for 〈r2〉πs the errors due to I1, I2, I3 and QA are added in quadrature.
bands the values of φ(s)as for s > 2.25 GeV
2 from eq.(3.29), considering both signs, for n = 1
(δπ(sK) < π) and 2 (δπ(sK) ≥ π). We see in the figure that above 1.4−1.5 GeV (1.96−2.25 GeV2)
both ϕ(s) and φ(s)as phases match and this is why we take sH = 2.25 GeV
2 in eq.(3.11), similarly
as done in refs.[10, 11]. In this way, we also avoid to enter into hadronic details in a region where
η < 1 with the onset of the f0(1500) resonance. The present uncertainty whether the + or − sign
holds in eq.(3.29) is taken as a source of error in evaluating QA. The other source of uncertainty
comes from the value taken for Λ2, 0.1 < Λ2 < 0.35 GeV2, as suggested in ref.[10]. From fig.2 it is
clear that our error estimate for φas(s) is very conservative and should account for uncertainties
due to the onset of inelasticity for energies above 1.4 − 1.5 GeV and to the appearance of the
f0(1500) resonance. In the right panel of fig.2 we show the integrand for 〈r2〉πs , eq.(3.12), for
parameterization I (dashed line) and II (solid line). Notice as the large uncertainty in φas(s) is
much reduced in the integrand as it happens for the higher energy domain.
In table 1 we show the values of I1, I2, I3, QH , QA and 〈r2〉πs for the parameterizations I and
II and for the two cases δπ(sK) ≥ π and δπ(sK) < π. This table shows the disappearance of the
disagreement between the cases δπ(sK) ≥ π and δπ(sK) < π from the ππ and KK̄ T−matrix
of eq.(3.13), once the zero of Γπ(t) at s1 < sK is taken into account for the former case. This
disagreement was the reason for the controversy between Ynduráin and ref.[13] regarding the value
of 〈r2〉πs . The fact that the parameterization II gives rise to a larger value of 〈r2〉πs than I is because
PY follows the upper δπ data below 0.9 GeV, while CGL follows lower ones, as shown in fig.1.
The different errors in table 1 are added in quadrature. The final value for 〈r2〉πs is the mean
between those of parameterizations I and II and the error is taken such that it spans the interval
of values in table 1 at the level of two sigmas. One ends with:
〈r2〉πs = 0.63± 0.05 fm
2 . (3.30)
The largest sources of error in 〈r2〉πs are the uncertainties in the experimental δπ and in the
asymptotic phase φas. This is due to the fact that the former are enhanced because of its weight
in the integrand, see fig.2, and the latter due to its large size.
Our number above and that of refs.[1, 4], 〈r2〉πs = 0.61± 0.04 fm2, are then compatible. On the
other hand, we have also evaluated 〈r2〉πs directly from the scalar form factor obtained with the
dynamical approach of ref.[8] from Unitary χPT and we obtain 〈r2〉πs = 0.64±0.06 fm2, in perfect
agreement with eq.(3.30). Notice that the scalar form factor of ref.[8] has δπ(sK) > π and we have
checked that it has a zero at s1, as it should. This is shown in fig.3 by the dashed-double-dotted line.
The value 〈r2〉πs = 0.75±0.07 fm2 from refs.[10, 11] is much larger than ours because the possibility
of a zero at s1 was not taking into account there and other solution was considered. This solution,
however, has an unstable behaviour under the transition δπ(sK) = π− 0+ to δπ(sK) = π+0+ and
it cannot be connected continuously with the one for δπ(sK) < π. Our solution for Γπ(t) from
Ynduráin’s method does not have this unstable behaviour and it is continuous under changes in
the values of the parameters of the K−matrix, eqs.(3.13) and (3.14). This is why, from our results,
it follows too that the interesting discussion of ref.[11], regarding whether δπ(sK) < π or ≥ π,
is not any longer conclusive to explain the disagreement between the values of refs.[10, 11] and
ref.[1] for 〈r2〉πs .
We can also work out from our determination of 〈r2〉πs , eq.(3.30), values for the O(p4) SU(2)
χPT low energy constant ℓ̄4. We take the two loop expression in χPT for 〈r2〉πs [1],
〈r2〉πs =
8π2f 2π
ℓ̄4 −
+ ξ∆r
, (3.31)
where fπ = 92.4 MeV is the pion decay constant, ξ = (Mπ/4πfπ)
2 and Mπ is the pion mass. First,
at the one loop level calculation ∆r = 0 and then one obtains,
ℓ̄4 = 4.7± 0.3 . (3.32)
We now move to the determination of ℓ̄4 based on the full two loop relation between 〈r2〉πs and ℓ̄4.
The expression for ∆r can be found in Appendix C of ref.[1]. ∆r is given in terms of one O(p6)
χPT counterterm, r̃S2 , and four O(p4) ones. Taking the values of all these parameters, but for ℓ̄4,
from ref.[1], and solving for ℓ̄4, one arrives to
ℓ̄4 = 4.5± 0.3 . (3.33)
This number is in good agreement with ℓ̄4 = 4.4± 0.2 [1].
Ref.[12] also points out that one loop χPT fits to the S-, P- and D-wave scattering lengths and
effective ranges give rise to much larger values for ℓ̄2 and ℓ̄4 than those of ref.[1]. For more details
we refer to [12].
4 Conclusions
In this paper we have addressed the issue of the discrepancies between the values of the quadratic
pion scalar radius of Leutwyler et al. [4, 13], 〈r2〉πs = 0.61 ± 0.04 fm2, and Ynduráin’s papers
[10, 11, 12], 〈r2〉 = 0.75±0.07 fm2. One of the reasons of interest for having a precise determination
of 〈r2〉πs is its contribution of a 10% to a00 and a20, calculated with a precision of 2% in ref.[1]. The
value taken for 〈r2〉πs is also important for determining the O(p4) χPT coupling ℓ̄4.
From our study it follows that Ynduráin’s method to calculate 〈r2〉πs [10, 11], based on an
Omnès representation of the pion scalar form factor, and that derived by solving the two(three)
coupled channel Muskhelishvili-Omnès equations [4, 1, 6], are compatible. It is shown that the
reason for the aforementioned discrepancy is the presence of a zero in Γπ(t) for those S-wave I=0
T−matrices with δπ(sK) ≥ π and elastic below the KK̄ threshold, with sK = 4m2K . This zero
was overlooked in refs.[10, 11], though, if one imposes continuity in the solution obtained under
tiny changes of the ππ phase shifts employed, it is necessarily required by the approach followed
there. Once this zero is taken into account the same value for 〈r2〉πs is obtained irrespectively
of whether δπ(sK) ≥ π or δπ(sK) < π. Our final result is 〈r2〉πs = 0.63 ± 0.05 fm2. The error
estimated takes into account experimental uncertainty in the values of δπ(s), inelasticity effects
and present ignorance in the way the phase of the form factor approaches its asymptotic value π,
as predicted from QCD. Employing our value for 〈r2〉πs we calculate ℓ̄4 = 4.5 ± 0.3. The values
〈r2〉πs = 0.61± 0.04 fm2 and ℓ̄4 = 4.5± 0.3 of ref.[1] are then in good agreement with ours.
Acknowledgements
We thank Miguel Albaladejo for providing us numerical results from some unpublished T−matrices
and Carlos Schat for his collaboration in a parallel research. We also thank F.J. Ynduráin for
long discussions and B. Anathanarayan, I. Caprini, G. Colangelo, J. Gasser and H. Leutwyler for
a critical reading of a previous version of the manuscript. This work was supported in part by the
MEC (Spain) and FEDER (EC) Grants FPA2004-03470 and Fis2006-03438, the Fundación Séneca
(Murcia) grant Ref. 02975/PI/05, the European Commission (EC) RTN Network EURIDICE
under Contract No. HPRN-CT2002-00311 and the HadronPhysics I3 Project (EC) Contract No
RII3-CT-2004-506078.
Appendices
A Coupled channel dynamics
We take ππ and KK̄ coupled channels and denote by F1 and F2 their respective I=0 scalar form
factors. Unitarity requires,
ImFi =
Fjρjθ(t− s′j)t∗ji , (A.1)
where ||tij|| is the I=0 S-wave T−matrix, s′i is the threshold energy square of channel i and
ρi = qi/8π
s, with qi its center of mass three momentum.
A general solution to the previous equations is given by,
F = T G , F =
, G =
, (A.2)
where the functions Gi(t) do not have right hand cut. This equation is interesting as tells us that
if pion dynamics dominate, |G1| >> |G2|, then F1 ≃ G1t11 and the form factor phase φ(s) follows
ϕ(s). As a result, like t11, it has a zero at s1 below the KK̄ threshold for δπ(sK) ≥ π, as shown
in section 3. On the other hand, if kaon dynamics dominates, |G2| >> |G1|, then F1 ≃ G2t12 and
φ(s) follows the phase of t12, that above the KK̄ threshold is clearly above π. This is why for
the pion strange scalar form factor there is no zero at s1 . sK for δπ(sK) ≥ π, indeed there is a
maximum like that shown in fig.3 by the dashed-dotted line.
As in section 3 we now proceed to the diagnolization above the KK̄ threshold of the renormal-
ized T−matrix T ′,
T ′ = ρ1/2Tρ1/2 , ρ =
, T̃ = CTT ′C =
t̃11 0
0 t̃22
t̃11 = sin δ(+)e
iδ(+) , t̃22 = sin δ(−)e
iδ(−) . (A.3)
The corresponding diagonal form factors F ′1 and F
2, collected in the vector F
′, are
F ′ = CTρ1/2F = T̃CTρ−1/2G =
cos θ ρ
1 G1 − sin θ ρ
t̃11{
sin θ ρ
1 G1 + cos θ ρ
 . (A.4)
The previous expressions allow to obtaining F1 directly in terms of the eigenphases and with clean
separation between pion, G1, and kaon dynamics, G2. From eq.(3.22) it follows that,
cos2 θ ρ−1G1 − cos θ sin θ ρ−1/22 ρ
sin2 θ ρ−11 G1 + cos θ sin θ ρ
t̃22 .
(A.5)
For δπ(sK) ≥ π typical values, somewhat above the KK̄ threshold, are e2iδ(+) ≃ +i, e2iδ(−) ≃ −i
and sin θ > 0. For dominance of G1 one has F1/G1 ≃ ρ−11 (i + cos 2θ)/2 while for dominance
of G2 the result is F1/G2 ≃ − sin θ cos θ ρ−1/22 ρ
1 < 0. The factors G1,2 do not introduce
any change in φ(s) with respect to its value before the opening of the KK̄ threshold since they
are smooth functions in s.#5 In both cases the phase φ(s) is larger than π and F1 follows the
upper trend of phases shown in fig.2 (note that in this case t̃11 is in the first quadrant though
δπ > π). Now, doing the same exercise for δπ(sK) < π, one has the typical values e
2iδ(+) ≃ −i,
e2iδ(−) ≃ +i and sin θ < 0. For pion dominance then F1/G1 ≃ ρ−11 (i− cos 2θ)/2 and for the kaon
one F1/G2 ≃ + sin θ cos θρ−1/22 ρ
1 < 0. Thus, in the former case the phase is & π/2, and follows
the lower trend of phases of fig.2, while in the latter is & π and follows again the upper trend (this
is the case of the strange scalar form factor).
The demonstration in section 3 that φ(sK) is discontinuous in the limit δπ(sK) → π− by taking
s1 → s+K , cannot be applied in the case of kaon dominance (e.g. pion strange scalar form factor).
From eq.(A.5) it follows that,
F1(t) ≃ − cos θ sin θρ−1/22 ρ
t̃11 − t̃22
. (A.6)
The point is that t̃22 for t ≥ s1 (s1 → s+K) is of size comparable with that of t̃11 (both tend to zero)
and the phase does not follow δ(+). This is not the case for pion dominance because for s1 → s+K
then sin2 θ → 0, F1(t) ≃ cos2 θ ρ−11 G1t̃11, eq.(A.5), and φ(s) follows δ(+).
From eq.(A.4) we can also write |Γ′2/Γ′2| ≃ |t̃11 tan θ/t̃22| for the case of pion dominance. Since
typically |t̃11/t̃22| ≃ 1, as shown above for energies somewhat above the KK̄ threshold, then
|Γ′2/Γ′1| ≃ | tan θ| < 1. This is why we consider that equating it to 1 in section 3 is a conservative
estimate.
#5Due to the Adler zeroes this is not necessarily case close to the ππ threshold.
References
[1] G. Colangelo, J. Gasser and H. Leutwyler, Nucl. Phys. B603, 125 (2001).
[2] J. Gasser and H. Leutwyler, Phys. Lett. B125, 325 (1983).
[3] G. Colangelo and S. Dür, Eur. Phys. J. C33, 543 (2004).
[4] J. F. Donoghue, J. Gasser and H. Leutwyler, Nucl. Phys. B343, 341 (1990).
[5] B. Ananthanarayan, G. Colangelo, J. Gasser and H. Leutwyler, Phys. Rep. 353, 207 (2001).
[6] B. Moussallam, Eur. Phys. J. C14, 111 (2000).
[7] J. Gasser and U.-G. Meißner, Nucl. Phys. B357, 90 (1991).
[8] U. G. Meißner and J. A. Oller, Nucl. Phys. A679, 671 (2001).
[9] J. Bijnens, G. Colangelo and P. Talavera, JHEP 9805, 014 (1998).
[10] F. J. Ynduráin, Phys. Lett. B578, 99 (2004); (E)-ibid B586, 439 (2004).
[11] F. J. Ynduráin, Phys. Lett. B612, 245 (2005).
[12] F. J. Ynduráin, arXiv:hep-ph/0510317.
[13] B. Ananthanarayan, I. Caprini, G. Colangelo, J. Gasser and H. Leutwyler, Phys. Lett. B602,
218 (2004).
[14] I. Caprini, G. Colangelo and H. Leutwyler, Int. J. Mod. Phys. A21, 954 (2006).
[15] M. Jamin, J.A. Oller and A. Pich, JHEP 0402, 047 (2004); Phys. Rev. D74, 074009 (2006).
[16] M. Jamin, J.A. Oller and A. Pich, Nucl. Phys. B 587, 331 (2000).
[17] J. Bijnens and P. Talavera, Nucl. Phys. B669, 341 (2003).
[18] O. P. Yushchenko et al., Phys. Lett. B581, 31 (2004).
[19] T. Alexopoulos et al. [KTeV Collaboration], Phys. Rev. D70, 092007 (2004).
[20] B. Hyams et al., Nucl. Phys. B64, 134 (1973).
[21] R. Kaminski, L. Lesniak and K. Rybicki, Z. Phys. C 74, 79 (1997).
[22] F. Guerrero and J. A. Oller, Nucl. Phys. B537, 459 (1999); (E)-ibid. B602, 641 (2001).
[23] S. J. Brodsky and G. P. Lepage, Phys. Rev. D22, 2157 (1980).
[24] J. R. Peláez and F. J. Ynduráin, Phys. Rev. D68, 074005 (2003); ibid D71, 074016 (2005).
http://arxiv.org/abs/hep-ph/0510317
[25] S Pislak et al. [BNL-E865 Collaboration], Phys. Rev. Lett. 87, 221801; Phys. Rev. D67,
072004 (2003).
[26] L. Masetti [NA48/2 Collaboration], arXiv:hep-ex/0610071.
[27] G. Grayer et al., Nucl. Phys. B 75 (1974) 189.
[28] W.-M. Yao et al., Journal of Physics G33, 1 (2006).
[29] W. Wetzel et al., Nucl. Phys. B115, 208 (1976); V. A. Polychromatos et al., Phys. Rev. D19,
1317 (1979); D. Cohen et al. Phys. Rev. D22, 2595 (1980); E. Etkin et al., Phys. Rev. D25,
1786 (1982).
[30] J. A. Oller and E. Oset, Nucl. Phys. A 620 (1997) 438 (E)-ibid. A 652 (1999) 407].
[31] J. A. Oller and E. Oset, Phys. Rev. D 60 (1999) 074023.
[32] M. Albaladejo and J. A. Oller, forthcoming. Here the 4π channel is included.
[33] J. A. Oller, Nucl. Phys. A727, 353 (2003).
http://arxiv.org/abs/hep-ex/0610071
	Introduction
	Scalar form factor
	Results
	Conclusions
	Coupled channel dynamics
ABSTRACT
  The quadratic pion scalar radius, \la r^2\ra^\pi_s, plays an important role
for present precise determinations of \pi\pi scattering. Recently, Yndur\'ain,
using an Omn\`es representation of the null isospin(I) non-strange pion scalar
form factor, obtains \la r^2\ra^\pi_s=0.75\pm 0.07 fm^2. This value is larger
than the one calculated by solving the corresponding Muskhelishvili-Omn\`es
equations, \la r^2\ra^\pi_s=0.61\pm 0.04 fm^2. A large discrepancy between both
values, given the precision, then results. We reanalyze Yndur\'ain's method and
show that by imposing continuity of the resulting pion scalar form factor under
tiny changes in the input \pi\pi phase shifts, a zero in the form factor for
some S-wave I=0 T-matrices is then required. Once this is accounted for, the
resulting value is \la r^2\ra_s^\pi=0.65\pm 0.05 fm^2. The main source of error
in our determination is present experimental uncertainties in low energy S-wave
I=0 \pi\pi phase shifts. Another important contribution to our error is the not
yet settled asymptotic behaviour of the phase of the scalar form factor from
QCD.

<|endoftext|><|startoftext|>
Introduction
The paper addresses a topic related to conditionally free (or, shortly, using the
term from [2], c-free) probability. This notion was developed in the ’90’s (see [1],
[2]) as an extension of freeness within the framework of ∗-algebras endowed with
not one, but two states. Namely, given a family of unital algebras {A}i∈I, each Ai
endowed with two expectations ϕi, ψi : Ai −→ C, their c-free product is the triple
(A, ϕ, ψ), where:
(i) A = ∗i∈IAi is the free product of the algebras Ai.
(ii) ψ = ∗i∈Iψi and ϕ = ∗(ψi)i∈Iϕi are expectations given by the relations
(a) ψ(a1 · · · an) = 0
(b) ϕ(a1 · · · an) = ϕε(1)(a1) · · ·ϕε(n)(an)
for all aj ∈ Aε(j), j = 1, . . . , n such that ψε(j)(aj) = 0 and ε(1) 6= · · · 6= ε(n).
A key result is that if the Ai are ∗-algebras and ϕi, ψi are positive functionals,
then ϕ and ψ are also positive.
In [6], the positivity of the free product maps ϕ, ψ is proved for the case
when ϕ1, ϕ2 are positive conditional expectations in a common C∗-subalgebra,
but ψ1, ψ2 remain positive C-valued maps. A more general situation is indeed
discussed (see Theorem 3, Section 6, from [6]), but the question if ϕ, ψ re positive
for ϕ1,2, ψ1,2 arbitrary positive conditional expectations is left unanswered.
A first answer was given in [8], where we showed that for A a ∗-algebra, the
analogous construction with both ϕ and ψ valued in a C∗-subalgebra B of A still
retains the positivity. The present paper further develops this result (see Theorem
2.3) and also demonstrates the use of multilinear function series in c-free setting.
2000 Mathematics Subject Classification. Primary 45L53; Secondary 46L08.
Key words and phrases. conditional freeness, conditional expectation, R-transform, multi-
linear function series.
http://arxiv.org/abs/0704.0040v3
2 MIHAI POPA
In [2] is constructed a c-free version of Voiculescu’s R-transform, which we will
call the cR-transform, with the property that cRX+Y =
cRX +
cRY if X and Y
are c-free elements from the algebra A relative to ϕ and ψ (i.e. the relations (a)
and (b) from the definition of the c-free product hold true for the subalgebras
generated by X and Y .)
The apparatus of multilinear function series is used in recent work of K. Dykema
([3] and [4]) to construct suitable analogues for the R and S-transforms in the
framework of freeness with amalgamation. We will show that this construction
is also appropriate for the cR-transform mentioned above. The techniques used
differ from the ones of [3], the Fock space type construction being substituted by
combinatorial techniques similar to [2] and [7]. Particularly, Theorems 3.3 and 3.6
contain new (shorter) proves of the results 6.1–6.13 from [3].
The paper is structured in four sections. In Section 2 are stated the basic
definitions and are proved the main positivity results. Section 3 describes the
construction and the basic property of the multilinear function series cR-transform
and Section 4 treats the central limit theorem and a related positivity property.
2. Definitions and positivity results
Definition 2.1. Let {Ai}i∈I be a family of algebras, all containing the subalgebra
B. Suppose D is a subalgebra of B and Ψi : Ai −→ D and Φi : Ai −→ B are con-
ditional expectations, i ∈ I. We say that the triple (A,Φ,Ψ) = ∗i∈I(Ai,Φi,Ψi)
is the conditionally free product with amalgamation over (B,D), or shortly, the
c-free product, of the triples (Ai,Φi,Ψi)i∈I if
(1) A is the free product with amalgamation over B of the family (Ai)i∈I
(2) Ψ = ∗i∈IΨi and Φ = ∗(Ψi)i∈IΦi are determined by the relations
Ψ(a1a2 . . . an) = 0
Φ(a1a2 . . . an) = Φ(a1)Φ(a2) . . .Φ(an),
for all ai ∈ Aε(i), ε(i) ∈ I such that ε(1) 6= ε(2) 6= · · · 6= ε(n) and Ψε(i)(ai) = 0.
When D = C, this definition reduces to the one given in [6]. When both B and
D are equal to C, this definition was given in [2].
When discussing positivity, we need a ∗-structure on our algebras. We will
demand that B and D be C∗-algebras, while Ai and A are only required to be
∗-algebras.
The following results are slightly modified versions of Lemma 6.4 and Theorem
6.5 from [8].
Lemma 2.2. Let B be a C∗-algebra and A1, A2 be two ∗-algebras containing B
as a ∗-subalgebra, endowed with positive conditional expectations Φj : Aj −→ B,
j = 1, 2. If a1, . . . , an ∈ A1, an+1, . . . , an+m ∈ A2 and A = (Ai,j)i,j ∈ Mn+m(B)
is the matrix with the entries
Ai,j =
i aj) if i, j ≤ n
i )Φ2(aj) if i ≤ n, j > n
i )Φ1(aj) if i > n, j ≤ n
i aj) if i, j > n
MULTILINEAR FUNCTION SERIES, C-FREE PROBABILITY WITH AMALGAMATION 3
then A is positive.
Proof. The vector space E = B ⊕ ker(Φ1) ⊕ ker(Φ2) has a B-bimodule structure
given by the algebraic operations on A1 and A2. Consider the B-sesquilinear
pairing
〈·, ·〉 : E× E −→ B
determined by the relations:
〈b1, b2〉 = b∗1b2, for b1, b2 ∈ B
〈uj, vj〉 = Φj(u∗jvj), for uj , vj ∈ ker(Φj), j = 1, 2
〈u1, u2〉 = 〈u2, u1〉 = 0 for u1 ∈ ker(Φ1), and u2 ∈ ker(Φ2).
〈b, uj〉 = 〈uj , b〉 = 0 for all b ∈ B, uj ∈ Aj
With this notation, we have that Ai,j = 〈ai, aj〉, hence it suffices to show that
〈a, a〉 ≥ 0 for all a ∈ E.
Indeed, for an element a = b + u1 + u2 with b ∈ B, uj ∈ ker(Φj), j = 1, 2, we
have:
〈a, a〉 = 〈b + u1 + u2, b+ u1 + u2〉
= 〈b, b〉+ 〈u1, u1〉+ 〈u2, u2〉
= b∗b+Φ1(u
1u1) + Φ2(u
Theorem 2.3. Let B be a C∗-algebra and D a C∗-subalgebra of B. Suppose that
A1, A2 are ∗-algebras containing B, each endowed with two positive conditional
expectations Φj : Aj −→ B, and Ψj : Aj −→ D, j = 1, 2 and consider the c-free
product (A,Φ,Ψ) = ∗i=1,2(Ai,Φi,Ψi).
Then the maps Φ and Ψ are positive.
Proof. The positivity of Ψ is by now a classical result in the theory of free proba-
bility with amalgamation over a C∗-algebra (for example, see [9], Theorem 3.5.6).
For the positivity of Φ we have to show that Φ(a∗a) ≥ 0 for any a ∈ A.
Any element of A can be written as
s1,k . . . sn(k),k,
where sj,k ∈ Aε(j,k) ε(1, k) 6= ε(2, k) 6= · · · 6= ε(n(k), k).
Writing
s(j,k) = s(j,k) −Ψ(s(j,k)) +Ψ(s(j,k))
and expanding the product, we can consider a of the form
a = d+
a1,k . . . an(k),k
with d ∈ D ⊂ B and aj,k ∈ Aε(j,k) such that Ψε(j,k)(aj,k) = 0 and ε(1, k) 6=
ε(2, k) 6= · · · 6= ε(n(k), k).
4 MIHAI POPA
Therefore
Φ(a∗a) = Φ
d+ d∗
a1,k . . . an(k),k
a1,k . . . an(k),k
a1,k . . . an(k),k
]∗[ N∑
a1,k . . . an(k),k
Since Φ is a conditional expectation and d ∈ D ⊂ B, the above equality becomes
Φ(a∗a) = d∗d+
d∗Φ(a1,k . . . an(k),k) +
Φ(a∗n(k),k . . . a
1,k)d
k,l=1
a∗n(k),k . . . a
1,ka1,l . . . an(l),l
Using the definition of the conditionally free product with amalgamation over B
and that Ψε(j,k)(aj,k) = 0 for all j, k, one further has
Φ(a∗a) = d∗d+
Φ(d∗a1,k)Φ(a2,k) . . .Φ(an(k),k)
Φ(an(k),k)
. . .Φ(a∗2,k)Φ(a
1,kd)
k,l=1
Φ(an(k),k)
∗ . . .Φ(a∗2,k)
Φ(a∗1,ka1,l)Φ(a2,l) . . .Φ(an(l),l)
that is
Φ(a∗a) = d∗d+
Φ(d∗a1,k)
Φ(a2,k) . . .Φ(an(k),k)
Φ(a2,k) . . .Φ(an(k),k)
Φ(a∗1,kd)
k,l=1
Φ(a2,k) . . .Φ(an(k),k)
Φ(a∗1,ka1,l)
Φ(a2,l) . . .Φ(an(l),l)
From Lemma 2.2, the matrix S =
Φ(a∗1,ia1,j)
i,j=1
is positive in MN+1(B),
therefore
S = T ∗T, for some T ∈MN+1(B).
MULTILINEAR FUNCTION SERIES, C-FREE PROBABILITY WITH AMALGAMATION 5
Denote now a1,N+1 = d and vk = Φ(a2,k) . . .Φ(an(k),k).The identity for Φ(a
becomes:
Φ(a∗a) = (v1, . . . , vN , 1)
∗T ∗T (v1, . . . , vN , 1)
≥ 0, as claimed.
Theorem 2.4. Assume that I =
j∈J Ij is a partition of I. Then:
∗j∈J (∗i∈Ij (Ai,Φi,Ψi)) = ∗i∈I(Ai,Φi,Ψi)
Proof. The proof is identical to the proofs of similar results in [6] and [2]. Consider
ai ∈ Aε(i), 1 ≤ i ≤ m such that ε(1) 6= ε(2 6= · · · 6= ε(m) and Ψε(i)(ai) = 0. Let
1 = i0 < i1 < · · · < ik = m and Jl = {ε(i), il−1 ≤ i < il}.
Since
(∗j∈JlΨj) ((ail−1 · · · ail)) = 0,
it suffices to show that
Φ(a1 · · · am) =
(∗(Ψj),j∈JlΦj)(ail−1 · · · ail)] .
Φ(a1 · · · am) = Φε(1)(a1) · · ·Φε(m)(am)
while, since Ψε(i)(ai) = 0,
(∗(Ψj),j∈JlΦj)(ail−1 · · · ail) = Φil−1(ail−1) · · ·Φil(ail)
and the conclusion follows. �
Definition 2.5. Let A be an algebra (respectively a ∗-algebra),B a subalgebra (∗-
subalgebra) of A and D a subalgebra (∗-subalgebra) of B. Suppose A is endowed
with the conditional expectations Ψ : A −→ D and Φ : A −→ D.
(i) The subalgebras (∗-subalgebras) (Ai)i∈I of A are said to be c-free with
respect to (Φ,Ψ) if:
(a) (Ai)i∈I are free with respect to Ψ.
(b) if ai ∈ Aε(i), 1 ≤ i ≤ m, are such that ε(1) 6= · · · 6= ε(m) and
Ψ(ai) = 0, then Φ(a1 · · ·am) = Φ(a1) · · ·Φ(am).
(ii) The elements (Xi)i∈I of A are said to be c-free with respect to (Φ,Ψ) if
the subalgebras (∗-subalgebras) generated by B and Xi are c-free with
respect to (Φ,Ψ).
We will denote by B〈ξ〉 the non-commutative algebra of polynomials in the
symbol ξ and with coefficients from B (the coefficients do not commute with
the symbol ξ). If I is a family of indices, B〈{ξi}i∈I〉 will denote the algebra of
polynomials in the non-commuting variables {ξ}i∈I and with coefficients from B.
We will identify B〈{ξi}i∈I〉 with the free product with amalgamation over B of
the family {B〈ξi〉}i∈I .
6 MIHAI POPA
If A is a ∗-algebra and B is with the C∗-algebra, B〈ξ〉 will also be considered
with a ∗-algebra structure, by taking ξ∗ = ξ. If X is a selfadjoint element from A,
we define the conditional expectations
ΦX ,ΨX : B〈ξ〉 −→ B
given by ΦX(f(ξ)) = Φ(f(X)) and ΨX(f(ξ)) = Ψ(f(X)) , for any f(ξ) ∈ B〈ξ〉.
Corollary 2.6. Suppose that A is a ∗-algebra and X and Y are c-free selfadjoint
elements of A such that the maps ΦX ,ΨX and ΦY ,ΨY are positive. Then the
maps ΦX+Y and ΨX+Y are also positive.
Proof. The positivity of ΨX+Y is an immediate consequence of the fact that X
and Y are free with amalgamation over B with respect to Ψ. It remains to prove
the positivity of ΦX+Y .
Since the maps ΦX : B〈ξ1〉 −→ B and ΦY : B〈ξ2〉 −→ B are positive, from
Theorem 2.3 so is
Φx ∗(ΨX ,ΨY ) ΦY : B〈ξ1〉 ∗B B〈ξ2〉 = B〈ξ1, ξ2〉 −→ B
Remark also that
iZ : B〈ξ〉 ∋ f(ξ) 7→ f(X + Y ) ∈ B〈ξ1〉 ∗B B〈ξ2〉
is a positive B-functional.
The conclusion follows from the fact that the c-freeness of X and Y is equivalent
ΦX+Y = (ΦX ∗(ΨX ,ΨY ) ΦY ) ◦ iX+Y .
3. Multilinear function series and the cR-transform
Let A be a ∗-algebra containing the C∗-algebra B, endowed with a conditional
expectation Ψ : A −→ B. If X is a selfadjoint element of A, then by the moment
of order n of X we will understand the map
X : B× · · · ×B︸ ︷︷ ︸
n−1 times
X (b1, . . . , bn−1) = Ψ(Xb1X . . .Xbn−1X)
If B = C, then the moment-generating series of X
mX(z) =
Ψ(Xn)zn
encodes all the information about the moments of X . For B 6= C, the straightfor-
ward generalization
mX(z) =
Ψ(Xn)zn
generally fails to keep track of all the possible moments of X . A solution to
this inconvenience was proposed in [3], namely the moment-generating multilin-
ear function series of X . Before defining this notion, we will briefly recall the
construction and several results on multilinear function series.
MULTILINEAR FUNCTION SERIES, C-FREE PROBABILITY WITH AMALGAMATION 7
Let B be an algebra over a field K. We set B̃ equal to B if B is unital and
to the unitization of B otherwise. For n ≥ 1, we denote by Ln(B) the set of all
K-multilinear maps
ωn : B× · · · ×B︸ ︷︷ ︸
n times
A formal multilinear function series over B is a sequence ω = (ω0, ω1, . . . ),
where ω0 ∈ B̃ and ωn ∈ Ln(B) for n ≥ 1. According to [3], the set of all
multilinear function series over B will de denoted by Mul[[B]].
For α, β ∈Mul[[B]], the formal sum α+ β and the formal product αβ are the
elements from Mul[[B]] defined by:
(α+ β)n(b1, . . . , bn) = αn(b1, . . . , bn) + βn(b1, . . . , bn)
(αβ)n(b1, . . . , bn) =
αk(b1, . . . , bk)βn−k(bk+1, . . . , bn)
for any b1, . . . , bn ∈ B.
If β0 = 0, then the formal composition α ◦ β ∈Mul[[B]] is defined by
(α ◦ β)0 = α0
and, for n ≥ 1, by
(α ◦ β)n(b1, . . . , bn) =
βp1(b1, . . . , bp1), . . . , βpk(bqk+1, . . . , bqk+pk)
where the second summation is done over all k-tuples p1, . . . , pk ≥ 1 such that
p1 + · · ·+ pk = n and qj = p1 + · · ·+ pj−1.
One can work with elements ofMul[[B]] as if they were formal power series. The
relevant properties are described in [3], Proposition 2.3 and Proposition 2.6. As in
[3], we use 1, respectively I, to denote the identity elements of Mul[[B]] relative
to multiplication, respectively composition. In other words, 1 = (1, 0, 0, . . . ) and
I = (0, idB, 0, 0, . . . ). We will also use the fact that an element α ∈Mul[[B]] has
an inverse with respect to formal composition, denoted α〈−1〉, if and only if α has
the form (0, α1, α2, . . . ) with α1 an invertible element of L1(B).
Definition 3.1. With the above notation, the moment-generating multilinear func-
tion series MX of X is the element of Mul[[B]] such that:
MX,0 = Ψ(X)
MX,n(b1, . . . , bn) = Ψ(Xb1X · · ·XbnX).
Given an element α ∈ Mul[[B]], the multilinear function series Rα is defined
by the following equation (see [3], Def 6.1):
(1 + αI)
◦ (I + IαI)〈−1〉. (3.1)
A key property of R is that for any X,Y ∈ A free over B, we have
RMX+Y = RMX +RMY . (3.2)
8 MIHAI POPA
These relations were proved earlier in the particular case B = C. One can also
describe Rα by combinatorial means, via the recurrence relation
αn(b1, . . . , bn) =
[b1αp(1)(b3, . . . , bi1−2)bi1−1], . . .
. . . , [bi(k−1)αp(k)(bi(k−1)+1, . . . , bi(k)−2)bi(k)−1]
bi(k)αn−ik(bik+1 , . . . , bn)
where the second summation is done over all 1 = i(0) < i(1) < · · · < i(k) ≤ n and
p(k) = i(k)− i(k − 1)− 2.
Following an idea from [2], the above equation can be graphically illustrated by
the picture:
In the case of scalar c-free probability, an analogue of the Voiculescu’s R-
transform is developed in [2]. In order to avoid confusions, we will denote it
by cR.
The cR-transform has the property that it linearizes the c-free convolution of
pairs of compactly supported measures. In particular, if X and Y are c-free
elements from some algebra A, then
cRX+Y =
cRX +
cRY .
If the ∗-algebraA is endowed with the C-valued states ϕ, ψ andX is a selfadjoint
element of A, then (see [2]), the coefficients {cRm}m ≥ 0 of cRX are defined by
the recurrence:
ϕ(Xn) =
l(1),...,l(k)≥0
l(1)+···+l(k)=n−k
cRk · ψ(X l(1)) · · ·ψ(X l(k−1))ϕ(X l(k))
equation that can be graphically illustrated by the picture, were the dark boxes
stand for the application of ϕ and the light ones for the application of ψ:
The above considerations lead to the following definition:
Definition 3.2. Let β, γ ∈Mul[[B]]. The multilinear function series cRβ,γ is the
element of Mul[[B]] defined by the recurrence relation
βn(b1, . . . , bn) =
cRβ,γ,k
[b1γp(1)(b3, . . . , bi1−2)bi1−1], . . .
. . . , [bi(k−1)γp(k)(bi(k−1)+1, . . . , bi(k)−2)bi(k)−1]
bi(k)βn−ik(bik+1 , . . . , bn)
where the second summation is done over all 1 = i(0) < i(1) < · · · < i(k) ≤ n and
p(k) = i(k)− i(k − 1)− 2.
MULTILINEAR FUNCTION SERIES, C-FREE PROBABILITY WITH AMALGAMATION 9
The following analytical description of cRβ,γ also shows that it is unique and
well-defined:
Theorem 3.3. For any β, γ ∈Mul[[B]],
Rβ,γ =
β(1 + Iβ)−1
◦ (I + IγI)〈−1〉 (3.3)
Before proving the theorem, remark that the right-hand side of (3.3) is well-
defined and unique, since 1 + Iγ is invertible with respect to the formal multipli-
cation, I+ IβI is invertible with respect to formal composition and its inverse has
0 as first component (see [3]). We will need the following auxiliary result:
Lemma 3.4. Let β be an element of Mul[[B]] and I the identity element with
respect to formal composition, I = (0, idB, 0, 0 . . . ).
(i) the multilinear function series Iβ is given by:
(Iβ)0 = 0
(Iβ)n(b1, . . . , bn) = b1βn−1(b2, . . . , bn)
(ii) the multilinear function series IβI is given by
(IβI)0 = 0
(IβI)1(b1) = 0
(IβI)n(b1, . . . , bn) = b1βn−2(b2, . . . , bn−1)bn
Proof. Since I = (0, idB, 0, . . . ), one has:
(Iβ)0 = I0β0 = 0.
If n ≥ 1,
(Iβ)n(b1, . . . , bn) =
Ik(b1, . . . , bk)βn−k(bk+1, . . . , bn)
= I1(b1)βn−1(bk+1, . . . , bn)
= b1βn−1(bk+1, . . . , bn).
For IβI, the same computations give:
(IβI)0 = (Iβ)0I0 = 0
(IβI)1 = (Iβ)0I1(b1) + (Iβ)1(b1)I0
If n ≥ 2, one has:
(IβI)n(b1, . . . , bn) =
(Iβ)k(b1, . . . , bk)In−k(bk+1, . . . , bn)
= (Iβ)n−1(b1, . . . , bk)I1(b1)
= b1βn−2(b2, . . . , bn−1)bn
10 MIHAI POPA
Proof of the Theorem 3.3: Set σ = I + IβI. Then
(cRβ,γ ◦ σ)n (b1, . . . , bn) =
p1,...,pk≥1
p1+···+pk=n
Rβ,γ,k
σp1 (b1, . . . , bp1), . . . ,
σpk(bqk+1, . . . , bqk+pk)
where qi = p1 + · · ·+ pi−1.
From Lemma (3.4)(ii), we have that
σn(b1, . . . , bn) = (I + IβI)n(b1, . . . , bn)
therefore Definition 3.2 is equivalent to
βn(b1, . . . , bn) =
(cRβ,γ ◦ (I + IβI)k(b1, . . . , bk)) bk+1βn−k−2(bk+2, . . . , bn)
Considering now Lemma 3.4(i), the above relation becomes
βn(b1, . . . , bn) =
(cRβ,γ ◦ (I + IβI)k(b1, . . . , bk)) (I + Iβ)n−k(bk+1, . . . , bn)
therefore
β = [cRβ,γ ◦ (I + IγI)] (1 + Iβ)
which is equivalent to (3.3). �
Remark 3.5. Up to a shift in the coefficients, equation (3.3) is similar to the result
in the case B = C from [2], Theorem 5.1.
Let X be a selfadjoint element of A. If A is endowed with two B-valued condi-
tional expectations Φ,Ψ, the element X will have two moment-generating multi-
linear function series, one with respect to Ψ, that we will denote by MX , and one
with respect to Φ, denoted MX . For brevity, we will use the notation
cRX for the
multilinear function series cRMX ,MX .
Theorem 3.6. Let X and Y be two elements of A that are c-free with respect to
the pair of conditional expectations (Φ,Ψ). Then
cRX+Y =
cRX +
Proof. Let A be an algebra containing B as a subalgebra and endowed with the
conditional expectations Φ,Ψ : A −→ B. Consider the set A0 = A \ B (set
difference). For n ≥ 1 define the maps
cr : A0 × · · · × A0︸ ︷︷ ︸
n times
given by the recurrence formula:
Φ(a1 · · · an) =
l(1)<···<l(k)
1<l(1),l(k)≤n
rk(a1[Ψ(a2 · · · al(1)−1)], . . . ,
. . . , al(k−1)[Ψ(al(k−1)+1 · · · al(k)1)], al(k)[Φ(al(k)+1 · · · an)])
MULTILINEAR FUNCTION SERIES, C-FREE PROBABILITY WITH AMALGAMATION 11
Note that crn is well defined, and that, for any b1, . . . , bn ∈ B,
crn+1(X, b1X, . . . , bnX) =
cRX,n(b1, . . . , bn). (3.4)
As in Section 2, consider B〈ξi〉, the noncommutative algebras of polynomi-
als in the symbols ξi, i = 1, 2 and with coefficients from B and the conditional
expectations
ΦX ,ΨX : B〈ξ1〉 −→ B
given by
ΦX(f(ξ1)) = Φ(f(X))
ΨX(f(ξ1)) = Ψ(f(X))
and their analogues ΦY ,ΨY for B〈ξ2〉.
OnB〈ξ1, ξ2〉, identified toB〈ξ1〉∗BB〈ξ2〉, consider the conditional expectations
Ψ0,Φ0, ϕ given by:
Ψ0 = ΨX ∗ΨY
Φ0(f(ξ1, ξ2)) = Φ(f(X,Y ))
ϕ(a1a2 . . . an) =
l(1)<···<l(k)
1<l(1),l(k)≤n
ρk(a1[Ψ0(a2 · · · al(1)−1)], . . . ,
. . . , al(k−1)[Ψ0(al(k−1)+1 · · ·al(k)1 )], al(k)[ϕ(al(k)+1 · · · an)])
where a1, . . . , an are elements of the set B〈ξ1, ξ2〉0 = B〈ξ1〉 ∪B〈ξ2〉 \B, and the
ρn : B〈ξ1, ξ2〉0 × . . .B〈ξ1, ξ2〉0︸ ︷︷ ︸
n times
are given by:
ρn(a1, . . . , an) =
cr(a1, . . . , an) if all a1, . . . , an ∈ B〈ξ1〉
cr(a1, . . . , an) if all a1, . . . , an ∈ B〈ξ2〉
0 otherwise
We will show that ϕ = Φ0, in particular ϕ is also well-defined. Consider the
element a ∈ B〈ξ1, ξ2〉 of the form a = a1 · · · an with aj ∈ B〈ξε(j)〉, such that
ε(1) 6= ε(2) 6= · · · 6= ε(n) and Ψ0(aj) = 0. The computation of ϕ(a1 · · · an) is done
via the recurrence relation above. Because of the definition of ρ and the fact that
Ψ0 = ΨX ∗ΨY , only the term with k = 1 contribute at the sum, i.e.
ϕ(a1 · · · an) = ϕ(a1ϕ(a2 · · · an))
= ϕε(1)(a1ϕ(a2 · · · an)
= ϕε(1)(a1)ϕ(a2 · · · an)
and the identity between ϕ and Φ0 follows by induction over n.
Since ϕ = Φ0, the maps ρn and
crn are satisfying the same recurrence relation,
hence
ρn(a1, . . . , an) =
cr(a1, . . . , an).
12 MIHAI POPA
In particular
RX+Y,n(b1, . . . , bn) =
rn+1((X + Y )b1(X + Y ) . . . (X + Y )bn(X + Y ))
= ρn+1((X + Y )b1(X + Y ) . . . (X + Y )bn(X + Y ))
= ρn+1((X)b1(X) . . . (X)bn(X)) + ρn+1((Y )b1(Y ) . . . (Y )bn(Y ))
= cRX,n(b1, . . . , bn) +
cRY,n(b1, . . . , bn).
4. Central limit theorem
Consider the ordered set 〈n〉 = {1, 2, . . . , n} and π a partition of 〈n〉 with blocks
B1, . . . , Bm:
〈n〉 = B1 ⊔B2 ⊔ · · · ⊔Bm.
The blocks Bp and Bq of π are said to be crossing if there exist i < j < k < l
in 〈n〉 such that i, k ∈ Bp and j, l ∈ Bq.
The partition π is said to be non-crossing if all pairs of distinct blocks of π
are not crossing. We will denote by NC2(n) the set of all non-crossing partitions
of 〈n〉 whose blocks contain exactly 2 elements and by NC≤s(n) the set of all
non-crossing partitions of 〈n〉 whose blocks contain at most s elements.
Let now γ be a non-crossing partition of 〈n〉 and B and C be two blocks of
π. We say that B is interior to C if there exist two indices i < j in 〈n〉 such
that i, j ∈ C and B ⊂ {i + 1, . . . , j − 1}. The block B is said to be outer if it is
not interior to any other block of γ. In a non-crossing partition of 〈n〉, the block
containing 1 is always outer.
Consider now an element X of A. Let π be a partition from NC2(n+ 1) (n =
odd) and B1 = (1, k) be the block of π containing 1. We define, by recurrence,
the following expressions:
Vπ(X, b1, . . . , bn) = Ψ(Xb1Vπ|{2,...,j−1}(X, b2, . . . , bk−2)bk−1X)bk
Vπ|{k+1,...,n+1}(X, bk+1, . . . , bn)
Wπ(X, b1, . . . , bn) = Φ(Xb1Vπ|{2,...,j−1}(X, b2, . . . , bk−2)bk−1X)bk
Wπ|{k+1,...,n+1}(X, bk+1, . . . , bn)
Theorem 4.1. (Central Limit Theorem) Let (Xn)n≥1 be a sequence of c-free
elements of A such that:
(1) all Xn have the same moment-generating multilinear function series, M
with respect to Φ and M with respect to Ψ.
(2) Ψ(Xn) = Φ(Xn) = 0.
X1 + · · ·+XN√
Then:
(i) lim
RSN = (0,M1(·), 0, . . . )
MULTILINEAR FUNCTION SERIES, C-FREE PROBABILITY WITH AMALGAMATION 13
(ii) lim
RSN = (0,M1(·), 0, . . . )
(iii) there exist two conditional expectations ν : B〈ξ〉 −→ B, depending only
on M1(·), and µ : B〈ξ〉 −→ B, depending only on M1(·) and M1(·), such
ΨSN = ν
ΦSN = µ
in the weak sense; in particular,
ν(ξb1ξ . . . bnξ) =
π∈NC2(n)
Vπ(X1, b1, . . . , bn)
µ(ξb1ξ . . . bnξ) =
π∈NC2(n)
Wπ(X1, b1, . . . , bn).
Proof. Let X be an element of A with the same moment generating series as Xj,
j ≥ 1. As shown in [3],
RSN =
R Xk√
= NR X√
Also, from Theorem 2.4 and Theorem 3.6, it follows that
cRSN =
cR Xk√
= N cR X√
Since R and cR are multilinear and M0 = M0 = 0, we have that
cRSN ,n = lim
cRX,n
0 if n 6= 1
M1(·) if n = 1
and the similar relations for RSN ,n, hence (i) and (ii) are proved.
For (iii) it suffices to check the relations for ν(ξb1ξ . . . bnξ) and µ(ξb1ξ . . . bnξ),
which are a trivial corollary of (i), (ii), and the recurrence formulas that define R
and cR. �
Remark 4.2. For B = C, the theorem is a weaker version of Theorem 4.3 from [2].
If Ψ is C-valued, then the result is similar to Corollary 5.1 from [6]. Also, under
the assumptions that for some a, b ∈ B we have that:
NΨ(X1 · · ·XN ) = a
NΨ(X1 · · ·XN ) = b
the same techniques lead to a Poisson-type limit Theorem, similar to Corollary 2,
Section 5 of [6].
14 MIHAI POPA
In the following remaining pages we will describe the positivity of the limit
functionals µ and ν in terms of Φ and Ψ. The central result is Corollary 4.4.
For simplicity, suppose that B is a unital ∗-algebra (otherwise, we can replace
B by its unitisation). Consider the symbol ξ, the ∗-algebra B〈ξ〉 of polynomials
in ξ with coefficients from B, as defined before, and consider also the linear space
BξB generated by the set {b1ξb2; b1, b2 ∈ B} with the B-bimodule structure
given by
a1b1ξb2a2 = (a1b1)ξ(b2a2)
for all a1, a2, b1, b2 ∈ B.
Lemma 4.3. For any positive B-sesquilinear pairing 〈·, ·〉 on BξB there exists a
positive conditional expectation ϕ : B〈ξ〉 −→ B such that for any b1, b2 ∈ B one
has that
ϕ(ξb∗1b2ξ) = 〈b1ξ, b2ξ〉
Proof. Without loss of generality, we can suppose that B is unital (otherwise we
can replace B by its unitization).
Consider the Full Fock bimodule over BξB
F〈ξ〉 = B⊕
BξB⊗B · · · ⊗B BξB︸ ︷︷ ︸
n times
with the pairing given by
〈a, b〉 = a∗b
〈a1ξ ⊗ · · · ⊗ anξ, b1ξ ⊗ · · · ⊗ bmξ〉 = δm,n〈anξ, 〈. . . , 〈a1ξ, b1ξ〉b2ξ〉, . . . bnξ〉.
(a, aj , b, bj ∈ B, j = 1, . . . , n)
Note that the B-linear operators A1, A2 : F〈ξ〉 −→ F〈ξ〉 described by the
relations
A1b = ξb
A1(a1ξ ⊗ · · · ⊗ anξb) = ξ ⊗ a1ξ ⊗ · · · ⊗ anξb
A2b = 0
A2(a1ξ ⊗ · · · ⊗ anξb) = 〈ξ, a1ξ〉a2ξ ⊗ · · · ⊗ anξb
are self-adjoint to each other, in the sense that
〈A1ζ̃1, ζ̃2〉 = 〈ζ̃1, A2ζ̃2〉
for any ζ̃1, ζ̃2 ∈ F〈ξ〉, therefore S = A1 +A2 is selfadjoint.
Moreover, for any a, b ∈ B,
〈1, Sa∗bS1〉 = 〈aS1, bS1〉
= 〈a(A1 +A2)1, b(A1 +A2)1〉
= 〈aξ, bξ〉
and the conclusion follows by setting ϕ(p(ξ)) = 〈1, p(S)1〉 for all p ∈ B〈ξ〉. �
Corollary 4.4. The maps µ and ν from Theorem 4.1 are positive if and only if
for any b ∈ B one has that Φ(Xb∗bX) ≥ 0 and Ψ(Xb∗bX) ≥ 0.
MULTILINEAR FUNCTION SERIES, C-FREE PROBABILITY WITH AMALGAMATION 15
Proof. One implication is trivial, since, if ν and µ are positive, then
Ψ(Xb∗bX) = ν(Xb∗bX) = ν((bX)∗bX) ≥ 0
Φ(Xb∗bX) = µ(Xb∗bX) = µ((bX)∗bX) ≥ 0.
Suppose now that Φ(Xb∗bX) ≥ 0 and Ψ(Xb∗bX) ≥ 0 for all b ∈ B. We will
use the same argument as in [9] and [8].
Consider the set of selfadjoint symbols {ξi}i≥1. On each B-bimodule BξiB we
have the positive B-sesquilinear pairings 〈·, ·〉Φ and 〈·, ·〉Ψ determined by
〈aξi, bξi〉Φ = Φ(Xa∗bX)
〈aξi, bξi〉Ψ = Ψ(Xa∗bX).
As shown in Lemma 4.3, the above B-sesquilinear pairings determine positive
conditional expectations ϕ1, ψi : Ai −→ B, where Ai = B〈ξi〉 be the ∗-algebras of
polynomials in ξ with coefficients from B, i ≥ 1.
For τ : B〈ξ〉 −→ B a conditional expectation, and λ ≥ 0, note with Dλτ the
dilation with λ of τ , i.e.
Dλτ(ξb1ξ · · · bnξ) = λn+1τ(ξb1ξ · · · bnξ)
Remark that if τ is positive, then Dλτ is also positive.
With the notations above, consider, as in Definition 2.1, the conditionally free
product (A,Φ,Ψ) = ∗i∈I(Ai,Φi,Ψi). The elements {ξi}i≥1 are conditionally free
in A, so Theorem 4.1 implies that:
µ = lim
Φ ξ1+···+ξN√
= D 1√
Φξ1+···+ξN
ν = lim
Ψ ξ1+···+ξN√
= D 1√
Ψξ1+···+ξN
= D 1√
(∗Ni=1Ψξi
We have that ∗Ni=1Ψξi ≥ 0 since it is the free product of states (see, for example
[9]), hence the positivity of ν.
Also, Theorem 2.4 and Corollary 2.6 imply that Φξ1+···+ξN ≥ 0, therefore µ ≥ 0.
Acknowledgment. This research was partially supported by the Grant 2-CEx06-
11-34 of the Romanian Government. I am thankful to Marek Bożejko for present-
ing me the basics of c-freeness and bringing to my attention the references [2] and
[6]. I thank also Hari Bercovici for his constant support and his many advices
during the work on this paper.
16 MIHAI POPA
References
1. Bożejko, M. and Speicher, R. : ψ-independent and symmetrized white noises., in: Quantum
Probability and Related Topics, VI (1991), 219-236, World Scientific, Singapore.
2. Bożejko, M., Leinert, M. and Speicher, R.: Convolution and Limit Theorems for Condi-
tionally free Random Variables., in: Pac. J. Math. 175 (1996), 357-388.
3. Dykema, K.: Multilinear function series and transforms in Free Probability theory, in: Ad-
vances in Mathematics 208 (2007) 351407.
4. Dykema, K.: On the S-transform over a Banach algebra, in: Journal of Functional Analysis
231 (2006) 90110
5. Lance, E. C.: Hilbert C*-modules. A toolkit for operator algebraists, London Mathematical
Society Lecture Note Series 210, Cambridge University Press 1990.
6. M lotkowski, W.: Operator-valued version of conditionally free product, in: Studia Mathe-
matics 153 (1) (2002)
7. Nica, A. and Speicher, R.: Lectures on the Combinatorics of the Free Probability, London
mathematical Society Lecture Note Series 335, Cambridge University Press 2006
8. Popa, M.: A combinatorial approach to monotonic independence over a C*-algebra Preprint,
arXiv: math.OA/0612570, 01/2007
9. Speicher, R.: Combinatorial Theory of the Free Product with amalgamation and Operator-
Valued Free Probability Theory, in: Mem. AMS, Vol 132, No 627 (1998)
Mihai Popa: Department of Mathematics, Indiana University at Bloomington,
Rawles Hall, 931 E 3rd St, Bloomington, IN 47405
E-mail address: mipopa@indiana.edu
ABSTRACT
  As in the cases of freeness and monotonic independence, the notion of
conditional freeness is meaningful when complex-valued states are replaced by
positive conditional expectations. In this framework, the paper presents
several positivity results, a version of the central limit theorem and an
analogue of the conditionally free R-transform constructed by means of
multilinear function series.

<|endoftext|><|startoftext|>
Introduction
Since the formulation of quantum automorphism groups by Wang ([15],
[16]), following suggestions of Alain Connes, many interesting examples of
such quantum groups, particularly the quantum permutation groups of finite
sets and finite graphs, have been extensively studied by a number of mathe-
maticians (see, e.g. [1], [2], [17] and references therein), who have also found
applications to and interaction with areas like free probability and subfactor
theory. The underlying basic principle of defining a quantum automorphism
group corresponding to some given mathematical structure (for example, a
1The author gratefully acknowldges support obtained from the Indian National
Academy of Sciences through the grants for a project on ‘Noncommutative Geometry
and Quantum Groups’, and also wishes to thank The Abdus Salam ICTP (Trieste), where
a major part of the work was done during a visit as Junior Assciate.
http://arxiv.org/abs/0704.0041v4
finite set, a graph, a C∗ or von Neumann algebra) consists of two steps :
first, to identify (if possible) the group of automorphisms of the structure as
a universal object in a suitable category, and then, try to look for the univer-
sal object in a similar but bigger category by replacing groups by quantum
groups of appropriate type. However, most of the work done so far concern
some kind of quantum automorphism groups of a ‘finite’ structure, for ex-
ample, of finite sets or finite dimensional matrix algebras. It is thus quite
natural to try to extend these ideas to the ‘infinite’ or ‘continuous’ mathe-
matical structures, for example classical and noncommutative manifolds. In
the present article, we have made an attempt to formulate and study the
quantum analogues of the groups of Riemannian isometries, which play a
very important role in the classical differential geometry. The group of Rie-
mannian isometries of a compact Riemannian manifold M can be viewed as
the universal object in the category of all compact metrizable groups acting
on M , with smooth and isometric action. Therefore, to define the quantum
isometry group, it is reasonable to consider a category of compact quantum
groups which act on the manifold (or more generally, on a noncommutative
manifold given by spectral triple) in a ‘nice’ way, preserving the Riemannian
structure in some suitable sense, to be precisely formulated. In this article,
we have given a definition of such ‘smooth and isometric’ action by a com-
pact quantum group on a (possibly noncommutative) manifold, extending
the notion of smooth and isometric action by a group on a classical mani-
fold. Indeed, the meaning of isometric action is nothing but that the action
should commute with the ‘Laplacian’ coming from the spectral triple, and
we should mention that this idea was already present in [2], though only in
the context of a finite metric space or a finite graph. The universal object
in the category of such quantum groups, if it exists, should be thought of
as the quantum analogue of the group of isometries, and we have been able
to prove its existence under some regularity assumptions, all of which can
be verified for a general compact connected Riemannian manifold as well
as the standard examples of noncommutative manifolds. Motivated by the
ideas of Woronowicz and Soltan, we actually consider a bigger category. The
isometry group of a classical manifold, viewed as a compact metrizable space
(forgetiing the group structure), can be seen to be the universal object of a
category whose object-class consists of subsets (not necessarily subgroups)
of the set of smooth isometries of the manifold. Then it can be proved
that this universal compact set has a canonical group structure. A natural
quantum analogue of this has been formulated by us, called the category of
‘quantum families of smooth isometries’. The underlying C∗-algebra of the
quantum isometry group has been identified with its universal object and
moreover, it is shown to be equipped with a canonical coproduct making it
into a compact quantum group.
We believe that a detailed study of quantum isometry groups will not
only give many new and interesting examples of compact quantum groups,
it will also contribute to the understanding of quantum group covariant
spectral triples. In fact, we have made some progress in this direction already
by constructing a spectral triple (which is often closely related to the original
spectral triple) on the Hilbert space of forms which is equivriant with respect
to a canonical unitary representation of the quantum isometry group.
In a companion article [3] with J. Bhowmick, we provide explicit compu-
tations of quantum isometry groups of a few classical and noncommutative
manifolds. However, we briefly quote some of main results of [3] in the
present article. One interesting observation is that the quantum isometry
group of the noncommutative two-torus Aθ (with the canonical spectral
triple) is (as a C∗ algebra) a direct sum of two commutative and two non-
commutative tori, and contains as a quantum subgroup (which is univer-
sal for certain class of isometric actions called holomorphic isometries) the
‘quantum double-torus’ discovered and studied by Hajac and Masuda ([11]).
2 Definition of the quantum isometry group
2.1 Isometry groups of classical manifolds
We begin with a well-known characterization of the isometry group of a (clas-
sical) compact Riemannian manifold. Let (M,g) be a compact Riemannian
manifold and let Ω1 = Ω1(M) be the space of smooth one-forms, which has
a right Hilbert-C∞(M)-module structure given by the C∞(M)-valued inner
product << ·, · >> defined by
<< ω, η >> (m) =< ω(m), η(m) > |m,
where < ·, · > |m is the Riemannian metric on the cotangent space T
mM at
the pointm ∈M . The Riemannian volume form allows us to make Ω1 a pre-
Hilbert space, and we denote its completion by H1. Let H0 = L
2(M,dvol)
and consider the de-Rham differential d as an unbounded linear map from
H0 toH1, with the natural domain C
∞(M) ⊂ H0, and also denote its closure
by d. Let L := −d∗d. The following identity can be verified by direct and
easy computation using the local coordinates :
(∂L)(φ,ψ) ≡ L(φ̄ψ)−L(φ̄)ψ−φ̄L(ψ) = 2 << dφ, dψ >> for φ,ψ ∈ C∞(M) (∗).
Proposition 2.1 A smooth map γ : M → M is a Riemannian isometry if
and only if γ commutes with L in the sense that L(f ◦ γ) = (L(f)) ◦ γ for
all f ∈ C∞(M).
Proof :
If γ commutes with L then from the identity (*) we get for m ∈ M and
φ,ψ ∈ C∞(M) :
< dφ|γ(m), dψ|γ(m) > |γ(m)
= << dφ, dψ >> (γ(m))
(∂L(φ,ψ) ◦ γ)(m)
∂L(φ ◦ γ, ψ ◦ γ)(m)
= << d(φ ◦ γ), d(ψ ◦ γ) >> (m)
= < d(φ ◦ γ)|m, d(ψ ◦ γ)|m > |m
= < (dγ|m)
∗(dφ|γ(m)), (dγ|m)
∗(dψ|γ(m)) > |m,
which proves that (dγ|m)
∗ : T ∗
M → T ∗mM is an isometry. Thus, γ is a
Riemannian isometry.
Conversely, if γ is an isometry, both the maps induced by γ on H0 and
H1, i.e. U
γ : H0 → H0 given by U
γ (f) = f ◦ γ and U
γ : H
1 → H1 given
by U1γ (fdφ) = (f ◦ γ)d(φ ◦ γ) are unitaries. Moreover, d ◦ U
γ = U
γ ◦ d on
C∞(M) ⊂ H0. From this, it follows that L = −d
∗d commutes with U0γ . ✷
Now let us consider a compact metrizable (i.e. second countable) space
Y with a continuous map θ : M × Y → M . We abbreviate θ(m, y) as ym
and denote by ξy the map M ∋ m 7→ ym. Let α : C(M) → C(M)⊗C(Y ) ∼=
C(M × Y ) be the map given by α(f)(m, y) := f(ym) for y ∈ Y , m ∈ M
and f ∈ C(M). For a state φ on C(Y ), denote by αφ the map (id⊗ φ) ◦ α :
C(M) → C(M). We shall also denote by C the subspace of C(M) ⊗ C(Y )
generated by elements of the form α(f)(1⊗ψ), f ∈ C(M), ψ ∈ C(Y ). Since
C(M) and C(Y ) are commutative algebras, it is easy to see that C is a
∗-subalgebra of C(M)⊗ C(Y ). Then we have the following
Theorem 2.2 (i) C is norm-dense in C(M)⊗C(Y ) if and only if for every
y ∈ Y , ξy is one-to-one.
(ii) The map ξy is C
∞ for every y ∈ Y if and only if αφ(C
∞(M)) ⊆ C∞(M)
for all φ.
(iii) Under the hypothesis of (ii), each ξy is also an isometry if and only if
αφ commutes with (L − λ)
−1 for all state φ and all λ in the resolvent of L
(equivalently, αφ commutes with the Laplacian L on C
∞(M)).
Proof :
(i) First, assume that ξy is one-to-one for all y. By Stone-Weirstrass Theo-
rem, it is enough to show that C separates points. Take (m1, y1) 6= (m2, y2)
in M × Y . If y1 6= y2, we can choose ψ ∈ C(Y ) which separates y1 and y2,
hence (1 ⊗ ψ) ∈ C separates (m1, y1) and (m2, y2). So, we can consider the
case when y1 = y2 = y (say), but m1 6= m2. By injectivity of ξy, we have
ym1 6= ym2, so there exists f ∈ C(M) such that f(ym1) 6= f(ym2), i.e.
α(f)(m1, y) 6= α(f)(m2, y). This proves the density of C.
For the converse, we argue as in the proof of Proposition 3.3 of [14].
Assume that C is dense in C(M)⊗ C(Y ), and let y ∈ Y , m1,m2 ∈ M such
that ym1 = ym2. That is, α(f)(1 ⊗ ψ)(m1, y) = α(f)(1 ⊗ ψ)(m2, y) for all
f ∈ C(M), ψ ∈ C(Y ). By the density of C we get χ(m1, y) = χ(m2, y) for
all χ ∈ C(M × Y ), so (m1, y) = (m2, y), i.e. m1 = m2.
(ii) The ‘if part’ of (ii) follows by considering the states corresponding to
point evaluation, i.e. C(Y ) ∋ ψ 7→ ψ(y), y ∈ Y . For the converse, we note
that an arbitrary state φ corresponds to a regular Borel measure µ on Y
so that φ(h) =
hdµ, and thus, αφ(f)(m) =
f(ym)dµ(y) for f ∈ C(M).
From this, by interchanging differentiation and integation (which is allowed
by the Dominated Convergence Theorem, since µ is a finite measure) we can
prove that αφ(f) is C
∞ whenever f is so.
The assertion (iii) follows from Proposition 2.1 in a straghtforward way.
Let us recall a few well-known facts about the Laplacian L, viewed as a
negative self-adjoint operator on the Hilbert space L2(M,dvol). It is known
(see [12] and references therein) that L has compact resolvents and all its
eigenvectors belong to C∞(M). Moreover, it follows from the Sobolev Em-
bedding Theorem that
Dom(Ln) = C∞(M).
Let {eij , j = 1, ..., di; i = 1, 2, ...} be the set of (normalised) eigenvectors of
L, where eij ∈ C
∞(M) is an eigenvector corresponding to the eigenvalue λi,
|λ1| < |λ2| < .... We have the following:
Lemma 2.3 The complex linear span of {eij} is norm-dense in C(M).
Proof :
This is a consequence of the asymptotic estimates of eigenvalues λi, as
well as the uniform bound of the eigenfunctions eij . For example, it is
known ([9],Theorem 1.2) that there exist constants C,C ′ such that ‖eij‖∞ ≤
C|λi|
4 , di ≤ C
′|λi|
2 , where n is the dimension of the manifoldM . Now,
for f ∈ C∞(M) ⊆
k≥1Dom(L
k), we write f as an a-priori L2-convergent
series
ij fijeij (fij ∈ C), and observe that
|fij |
2|λi|
2k < ∞ for every
k ≥ 1. Choose and fix sufficiently large k such that
i≥0 |λi|
n−1−2k < ∞,
which is possible due to the well-known Weyl asymptotics of eigenvalues of
L. Now, by the Cauchy-Schwarz inequality and the estimate for di, we have
|fij|‖eij‖∞ ≤ C(C
|fij |
2|λi|
n−1−2k
Thus, the series
ij fijeij converges to f in sup-norm, so Sp{eij , j = 1, 2, ..., di ; i =
1, 2, ...} is dense in sup-norm in C∞(M), hence in C(M) as well. ✷
Let us denote Sp{eij , j = 1, ..., di; i ≥ 1} by A
0 from now on. We shall
now show that C∞(M) can be replaced by the smaller subspace A∞0 in
Theorem 2.2. We need a lemma for this, which will be useful later on too.
Lemma 2.4 Let H1,H2 be Hilbert spaces and for i = 1, 2, let Li be (possibly
unbounded) self-adjoint operator on Hi with compact resolvents, and let Vi
be the linear span of eigenvectors of Li. Moreover, assume that there is an
eigenvalue of Li for which the eigenspace is one-dimensional, say spanned by
a unit vector ξi. Let Ψ be a linear map from V1 to V2 such that L2Ψ = ΨL1
and Ψ(ξ1) = ξ2. Then we have
〈ξ2,Ψ(x)〉 = 〈ξ1, x〉 ∀x ∈ V1. (1)
Proof:
By hypothesis on Ψ, it is clear that there is a common eigenvalue, say λ0, of
L1 and L2, with the eigenvectors ξ1 and ξ2 respectively. Let us write the set
of eigenvalues of Li as a disjoint union {λ0}
Λi (i = 1, 2), and let the corre-
sponding orthogonal decomposition of Vi be given by Vi = Cξi
Vλi ≡
Cξi ⊕ V
i, say, where V
i denotes the eigenspace of Li corresponding to the
eigenvalue λ. By assumption, Ψ maps Vλ1 to V
2 whenever λ is an eigenvalue
of L2, i.e. V
2 6= {0}, and otherwise it maps V
1 into {0}. Thus, Ψ(V
1) ⊆ V
Now, (1) is obviously satisfied for x = ξ1, so it is enough to prove (1) for all
x ∈ V ′1. But we have 〈ξ, x〉 = 0 for x ∈ V
1, and since Ψ(x) ∈ V
2 = V2
it follows that 〈ξ2,Ψ(x)〉 = 0 = 〈ξ1, x〉. ✷
Lemma 2.5 Let Y and α be as in Theorem 2.2. Then the following are
equivalent.
(a) For every y ∈ Y , ξy is smooth isometric.
(b) For every state φ on C(Y ), we have αφ(A
0 ) ⊆ A
0 , and αφL = Lαφ on
A∞0 .
Proof:
We prove only the nontrivial implication (b) ⇒ (a). Assume (b) that αφ
leaves A∞0 invariant and commutes with L on it, for every state φ. To
prove that α is a smooth isometric action, it is enough (see the proof of
Theorem 2.2) to prove that αy(A
∞) ⊆ A∞ for all y ∈ Y , where αy(f) :=
(id⊗evy)(f) = f ◦ξy, evy being the evaluation at the point y. LetM1, ...,Mk
be the connected components of the compact manifoldM . Thus, the Hilbert
space L2(M,dvol) admits an orthogonal decomposition ⊕ki=1L
2(Mi,dvol),
and the Laplacian L is of the form ⊕iLi where Li denotes the Laplacian on
Mi. Since each Mi is connected, we have Ker(Li) = Cχi, where χi is the
constant function on Mi equal to 1. Now, we note that for fixed y and i,
the image of Mi under the continuous function ξy must be mapped into a
component, sayMj. Thus, by applying Lemma 2.4 with H1 = L
2(Mi),H2 =
L2(Mj), Ψ = ξy and the L
2-continuity of the map f 7→ αy(f) = f ◦ ξy, we
αy(f)(x)dvol(x) =
f(x)dvol(x)
for all f in the linear span of eigenvectors of Li, hence (by density) for all f
in L2(Mi). It follows that
αy(f)dvol =
fdvol for all f ∈ L2(M), in
particular for all f ∈ C(M). Since αy is a ∗-homomorphism on C(M), we
〈αy(f), αy(g)〉 =
αy(fg)dvol =
fgdvol = 〈f, g〉,
for all f, g ∈ C(M). Thus, αy extends to an isometry on L
2(M), to be
denoted by the same notation, which by our assumption commutes with the
self-adjoint operator L on the core A∞0 , and hence αy commutes with L
n for
all n. In particular it leaves invariant the domains of each Ln, which implies
∞) ⊆ A∞. ✷
In view of the fact that the set of isometries of M , denoted by ISO(M),
is a compact second countable (i.e. compact metrizable) group, we see that
ISO(M) is the maximal compact second countable group acting on M such
that the action is smooth and isometric. In other words, if we consider a
catogory whose objects are compact metrizable groups acting smoothly and
isometrically on M , and morphisms are the group homomorphisms com-
muting with the actions on M , then ISO(M) (with its canonical action on
M) is the initial object of this cateogory. However, one can take a more
general viewpoint and consider the category of compact metrizable spaces
Y equipped with a continuous map θ : M × Y → M satisfying (i)-(iii) of
Theorem 2.2, or equivalently, the pair of commutative unital C∗-algebras
B = C(Y ) and a unital C∗-homomorphism α : C(M) → C(M) → B satisfy-
ing the conditions (i)-(iii). The set of isometries ISO(M) (as a topological
space) can be identified with the universal object of this category, and then
one can prove that it has a group structure.
It is quite natural to formulate a quantum analogue of the above, by con-
sidering, in the spirit of Woronowicz and Soltan (see [19] and [13]), ‘quantum
families of isometries’, which can be defined to be a pair (B, α) where B is
a (not necessarily commutative) C∗-algebra and α : C(M) → C(M)⊗ B is
unital C∗-homomorhism satisfying (i)-(iii) of Theorem 2.2, i.e. the linear
span of α(C(M))(1⊗B) (which is not necessarily a ∗-subalgebra any more,
B being possibly noncommutative) is norm-dense in C(M)⊗ B and for ev-
ery state φ on B, the map αφ keeps C
∞(M) invariant and commutes with
the Laplacian L. The morphisms of this category are obvious. We shall
prove that this category has a universal object, and this universal object
can be equipped with a canonical quantum group structure. This will define
the quantum isometry group of a manifold. However, we shall go beyond
classical manifolds and define quantum isometry group QISO(A∞,H,D)
for a spectral triple (A∞,H,D), with A∞ being unital, and satisfying cer-
tain assumptions. To this end, we need to carefully formulate the notion
of Laplacian in noncommutative geometry, which is the goal of the next
subsection.
2.2 Laplacian in noncommutative geometry
Given a spectral triple (A∞,H,D), we recall from [10] and [6] the con-
struction of the space of one-forms. We have a derivation from A∞ to the
A∞-A∞ bimodule B(H) given by a 7→ [D, a]. This induces a bimodule
morphism π from Ω1(A∞) (the bimodule of universal one-forms on A∞) to
B(H), such that π(δ(a)) = [D, a], where δ : A∞ → Ω1(A∞) denotes the
universal derivation map. We set Ω1D ≡ Ω
∞) := Ω1(A∞)/Ker(π) ∼=
π(Ω1(A∞)) ⊆ B(H). Assume that the spectral triple is of compact type
and has a finite dimension in the sense of Connes ([6]), i.e. there is some
p > 0 such that the operator |D|−p (interpreted as the inverse of the re-
striction of |D|p on the closure of its range, which has a finite co-dimension
since D has compact resolvents) has finite nonzero Dixmier trace, denoted
by Trω (where ω is some suitable Banach limit, see, e.g. [6], [10]). Con-
sider the canonical ‘volume form’ τ coming from the Dixmier trace, i.e.
τ : B(H) → C defined by τ(A) := 1
Trω(|D|−p)
Trω(A|D|
−p). Let us at this
point assume that the spectral triple is QC∞, i.e. A∞ and {[D, a], a ∈ A∞}
are contained in the domains of all powers of the derivation [|D|, ·]. Under
this assumption, τ is a positive faithful trace on the C∗-subalgebra gener-
ated by A∞ and {[D, a] a ∈ A∞}, and the GNS Hilbert space L2(A∞, τ) is
denoted by H0D. Similarly, we equip Ω
D with a semi-inner product given by
< η, η′ >:= τ(η∗η′), and denote the Hilbert space obtained from it by H1D.
The map dD : H
D → H
D given by dD(·) = [D, ·] is an unbounded densely
defined linear map. Let us assume the following:
Assumption(i) (a) dD is closable (the closure is denoted again by dD);
(b) A∞ ⊆ Dom(L), where L := −d∗DdD and A
∞ is viewed as a dense sub-
space of H0D;
At this point, let us show that this assumption is valid under a very
natural condition on the spectral triple.
Lemma 2.6 Suppose that for every element a ∈ A∞, the map R ∋ t 7→
αt(X) := exp(itD)Xexp(−itD) is differentiable at t = 0 in the norm-
topology of B(H), where X = a or [D, a]. Then the assumption (i) is sat-
isfied. Moreover, in this case, L maps A∞ into the weak closure of A∞ in
B(H0D).
Proof :
We first observe that τ(αt(A)) = τ(A) for all t and for all A ∈ B(H), since
exp(itD) commutes with |D|−p. If moreover, A belongs to the domain of
norm-differentiability (at t = 0) of αt, i.e.
αt(A)−A
→ i[D,A] in operator-
norm, then it follows from the property of the Dixmier trace that τ([D,A]) =
limt→0
τ(αt(A))−τ(A)
= 0. Now, since by assumption we have the norm-
differentiability at t = 0 of αt(A) for A belonging to the ∗-subalgebra (say
B) generated by A∞ and [D,A∞], it follows that τ([D,A]) = 0 ∀A ∈ B. Let
us now fix a, b, c ∈ A∞ and observe that
< a dD(b), dD(c) >
= τ((a dD(b))
∗dD(c) >
= −τ([D, [D, b∗]a∗c]) + τ([D, [D, b∗]a∗]c)
= τ([D, [D, b∗]a∗]c),
using the fact that τ([D, [D, b∗]a∗c]) = 0. This implies
| < a dD(b), dD(c) > | ≤ ‖[D, [D, b
∗]a∗]‖τ(c∗c)
2 = ‖[D, [D, b∗]a∗]‖‖c‖2,
where ‖c‖2 = τ(c
2 denotes the L2-norm of c ∈ H0D. This proves that
a dD(b) belongs to the domain of d
D for all a, b ∈ A
∞, so in particular
d∗D is dense, i.e. dD is closable. Moreover, taking a = 1, we see that
∞) ⊆ Dom(d∗D), or in other words, A
∞ ⊆ Dom(d∗DdD). This proves
(i)(a) and (i)(b). The last sentence in the statement of the lemma can be
proven along the line of Theorem 2.9, page 129, [10]. ✷
We need few more assumptions on the operator L to define the quantum
isometry group.
Assumption (ii): L has compact resolvents,
Assumption(iii): L(A∞) ⊆ A∞;
Assumption(iv): Each eigenvector of L (which has a discrete spectrum,
hence a complete set of eigenvectors) belongs to A∞;
Assumption(v)(‘connectedness assumption’): the kernel of L is one-dimensional,
spanned by the identity 1 of A∞, viewed as a unit vector in H0D.
We call L the noncommutative Laplacian and Tt the noncommutative heat
semigroup. We summarize some simple observations in form of the following
Lemma 2.7 (a) If the assumptions (i)-(v) are valid, then for x ∈ A∞, we
have L(x∗) = (L(x))∗.
(b) If Tt := exp(tL) maps H
D into A
∞ for all t > 0, the the assumption
(iv) is satisfied.
Proof :
It follows by simple calculation using the facts that τ is a trace and dD(x
−(dD(x))
∗ that
τ(L(x∗)∗y)
= −τ(dD(x)dD(y)) = −τ(dD(y)dD(x)) = τ((dD(y
∗))∗dD(x))
= < y∗,L(x) >= τ(yL(x)) = τ(L(x)y),
for all y ∈ A∞. By density of A∞ in H0D (a) follows. To prove (b), we note
that if x ∈ H0D is an eigenvector of L, say L(x) = λx (λ ∈ C), then we have
Tt(x) = e
λtx, hence x = e−λtTt(x) ∈ A
Since by assumption, L has a countable set of eigenvalues each with finite
multiplicity, let us denote them by λ0 = 0, λ1, λ2, ... with V0 = C 1, V1, V2, ...
be corresponding eigenspaces (finite dimensional), and for each i, let {eij , j =
1, ..., di} be an orthonormal basis of Vi. By Assumption (iv), Vi ⊆ A
∞ for
each i, Vi is closed under ∗, and moreover, {e
ij , j = 1, ..., di} is also an or-
thonormal basis for Vi, since τ(x
∗y) = τ(yx∗) for x, y ∈ A∞. We also make
the following
Assumption (vi) The complex linear span of {eij , i = 0, 1, ...; j = 1, ..., di},
say A∞0 , is norm-dense in A
Definition 2.8 We say that a spectral triple satisfying the assumptions (i)-
(vi) admissible.
Remark 2.9 We have just seen that classical spectral triple (A∞ = C∞(M),H,D),
where M is compact connected spin manifold, H is the L2 space of square
integrable spinors and D is the Dirac operator, is indeed admissible in our
sense. Later on we shall discuss how we can weaken the connectedness
assumption as well, thus accommodating a general classical (commutative)
spectral triple in our set-up. Moreover, the standard examples of noncom-
mutative spectral triples, e.g. those on Aθ, quantum Heisenberg manifold
etc., do belong to the admissible class.
Lemma 2.10 Let us assume that the spectral triple (A∞,H,D) is admis-
sible. Let Ψ : A∞0 → A
0 be a (norm-) bounded linear map, such that
Ψ(1) = 1, and Ψ ◦L = L◦Ψ on the subspace A∞0 spanned (algebraically) by
Vi, i = 1, 2, .... Then τ(Ψ(x)) = τ(x) for all x ∈ A
Proof :
By Lemma 2.4 with H1 = H2 = H
D, ξ1 = ξ2 = 1, we have τ(Ψ(x)) = τ(x)
for all x ∈ A∞0 . By the norm-continuity of Ψ and τ it extends to the whole
of A∞. ✷
2.3 Definition and existence of the quantum isometry group
We begin by recalling the definition of compact quantum groups and their
actions from [18]. A compact quantum group is given by a pair (S,∆), where
S is a unital separable C∗ algebra equipped with a unital C∗-homomorphism
∆ : S → S ⊗ S (where ⊗ denotes the injective tensor product) satisfying
(ai) (∆⊗ id) ◦∆ = (id⊗∆) ◦∆ (co-associativity), and
(aii) the linear span of ∆(S)(S ⊗ 1) and ∆(S)(1 ⊗ S) are norm-dense in
S ⊗ S.
It is well-known (see [18]) that there is a canonical dense ∗-subalgebra S0
of S, consisting of the matrix coefficients of the finite dimensional unitary
(co)-representations of S, and maps ǫ : S0 → C (co-unit) and κ : S0 → S0
(antipode) defined on S0 which make S0 a Hopf ∗-algebra.
We say that the compact quantum group (S,∆) acts on a unital C∗
algebra B, if there is a unital C∗-homomorphism α : B → B ⊗ S satisfying
the following :
(bi) (α⊗ id) ◦ α = (id⊗∆) ◦ α, and
(bii) the linear span of α(B)(1 ⊗ S) is norm-dense in B ⊗ S.
Let us now recall the concept of universal quantum groups as in [17],
[15] and references therein. We shall use most of the terminologies of [15],
e.g. Woronowicz C∗ -subalgebra, Woronowicz C∗-ideal etc, however with
the exception that we shall call the Woronowicz C∗ algebras just compact
quantum groups, and not use the term compact quantum groups for the
dual objects as done in [15]. For Q ∈ GLn(C), let Au(Q) denote the uni-
versal compact quantum group generated by uij, i, j = 1, ..., n satisfying the
relations
uu∗ = In = u
∗u, u′QuQ−1 = In = QuQ
−1u′,
where u = ((uij)), u
′ = ((uji)) and u = ((u
ij)). The coproduct, say ∆̃, is
given by,
∆̃(uij) =
uik ⊗ ukj.
We refer the reader to [17] for a detailed discussion on the structure and clas-
sification of such quantum groups. Let us denote by Ui the quantum group
Adi(I), where di is dimension of the subspace Vi. We fix a representation
βi : Vi → Vi⊗Ui of Ui on the Hilbert space Vi, given by βi(eij) =
k eik⊗u
for j = 1, ..., di, where u
(i) ≡ u
are the generators of Ui as discussed before.
Thus, both u(i) and ¯u(i) are unitaries. It follows from [15] that the represen-
tations βi canonically induce a representation β = ∗iβi of the free product
U := ∗iUi (which is a compact quantum group, see [15] for the details) on
the Hilbert space H0D, such that the restriction of β on Vi coincides with βi
for all i.
In view of the characterization of smooth isometric action on a classical
manifold, we make the following definitions.
Definition 2.11 A quantum family of smooth isometries of a noncommu-
tative manifold A∞ (or, more precisely on the corresponding spectral triple)
is a pair (S, α) where S is a separable unital C∗-algebra, α : A → A ⊗ S
(where A denotes the C∗ algebra obtained by completing A∞ in the norm of
B(H0D)) is a unital C
∗-homomorphism, satisfying the following:
(a) Sp
α(A)(1⊗ S)
= A⊗ S,
(b) αφ := (id ⊗ φ) ◦ α maps A
0 into itself and commutes with L on A
for every state φ on S.
In case the C∗-algebra S has a coproduct ∆ such that (S,∆) is a compact
quantum group and α is an action of (S,∆) on A, we say that (S,∆) acts
smoothly and isometrically on the noncommutative manifold.
Fix a spectral triple (A∞,H,D). Consider the category Q with the
object-class consisting of all quantum families of isometries (S, α) of the
given noncommutative manifold, and the set of morphismsMor((S, α), (S ′, α′))
being the set of unital C∗-homomorphisms φ : S → S ′ satisfying (id⊗φ)◦α =
α′. We also consider another category Q′ whose objects are triplets (S,∆, α)
where (S,∆) is a compact quantum group acting smoothly and isometrically
on the given noncommutative manifold, with α being the corresponding ac-
tion. The morphisms are the homomorphisms of compact quantum groups
which are also morphisms of the underlying quantum families. The forget-
ful functor F : Q′ → Q is clearly faithful, and we can view F (Q′) as a
subcategory of Q.
Let us assume from now on that the spectral triple (A∞,H,D) is admis-
sible. Our aim is to prove the existence of a universal object in Q. We shall
also prove that the (unique upto isomorphism) universal object belongs to
F (Q′), and its pre-image in Q′ is a universal object in the category Q′. To
this end, we need some preparatory results.
Lemma 2.12 Consider an admissible spectral triple (A∞,H,D) and let
(S, α) be a quantum family of smooth isometries of the spectral triple. More-
over, assume that the action α is faithful in the sense that there is no proper
C∗-subalgebra S1 of S such that α(A
∞) ⊆ A∞ ⊗ S1. Then α̃ : A
∞ ⊗ S →
A∞ ⊗ S defined by α̃(a⊗ b) : α(a)(1 ⊗ b) extends to an S-linear unitary on
the Hilbert S-module H0D ⊗S, denoted again by α̃. Moreover, we can find a
C∗-isomorphism φ : U/I → S between S and a quotient of U by a C∗-ideal
I of U , such that α = (id ⊗ φ) ◦ (id ⊗ ΠI) ◦ β on A
∞ ⊆ H0D, where ΠI
denotes the quotient map from U to U/I.
If, furthermore, there is a compact quantum group structure on S given
by a coproduct ∆ such that (S,∆, α) is an object in Q′, the map α : A∞ →
A∞⊗S extends to a unitary representation (denoted again by α) of the com-
pact quantum group (S,∆) on H0D. In this case, the ideal I is a Woronowicz
C∗-ideal and the C∗-isomorphism φ : U/I → S is a morphism of compact
quantum groups.
Proof :
Let ω be any state on S. Since the action α : A∞ → A∞ ⊗ S is smooth
and isometric, we conclude by Lemma 2.10 that τ(αω(x)) = τ(x)ω(1) for all
x ∈ A. Since ω is arbitrary, we have (τ ⊗ id)α(x) = τ(x)1S for all x ∈ A.
So, < α(x), α(y) >S=< x, y > 1S , where < ·, · >S denotes the S-valued
inner product of the Hilbert module H0D ⊗S. This proves that α̃ defined by
α̃(x⊗ b) := α(x)(1⊗ b) (x ∈ A∞, b ∈ S) extends to an S-linear isometry on
the Hilbert S-module H0D⊗S. Moreover, since α(A
∞)(1⊗S) is norm-dense
in Ā⊗S, it is clear that the S-linear span of the range of α(A∞) is dense in
the Hilbert module H0D ⊗ S, or in other words, the isometry α̃ has a dense
range, so it is a unitary.
Since αω leaves each Vi invariant, it is clear that α maps Vi into Vi ⊗ S
for each i. Let v
(j, k = 1, ..., di) be the elements of S such that α(eij) =
k eik ⊗ v
. Note that vi := ((v
)) is a unitary in Mdi(C)⊗S. Moreover,
the ∗-subalgebra generated by all {v
, i, j, k ≥ 1} must be dense in S by
the assumption of faithfulness.
We have already remarked that {e∗ij} is also an orthonormal basis of
Vi, and since α, being a C
∗-action on A, is ∗-preserving, we have α(e∗ij) =
(α(eij))
, and therefore ((v
)) is also unitary. By univer-
sality of Ui, there is a C
∗-homomorphism from Ui to S sending u
and by definition of the free product, this induces a C∗-homomorphism, say
Π, from U onto S, so that U/I ∼= S, where I := Ker(Π).
In case S has a coproduct ∆ making it into a compact quantum group
and α is a quantum group action, it is easy to see that the subalgebra of
S generated by v
is a Hopf algebra, with ∆(v
. From
this, it follows that Π is Hopf-algebra morphism, hence I is a Woronowicz
C∗-ideal. ✷
Before we state and prove the main theorem, let us note the following
elementary fact about C∗-algebras.
Lemma 2.13 Let C be a C∗ algebra and F be a nonempty collection of
C∗-ideals (closed two-sided ideals) of C. Then for any x ∈ C, we have
‖x+ I‖ = ‖x+ I0‖,
where I0 denotes the intersection of all I in F and ‖x + I‖ = inf{‖x −
y‖ : y ∈ I} denotes the norm in C/I.
Proof :
It is clear that supI∈F ‖x + I‖ defines a norm on C/I0, which is in fact a
C∗-norm since each of the quotient norms ‖ · +I‖ is so. Thus the lemma
follows from the uniqueness of C∗ norm on the C∗ algebra C/I0. ✷
Theorem 2.14 For any admissible spectral triple (A∞,H,D), the category
Q of quantum families of smooth isometries has a universal (initial) object,
say (G, α0). Moreover, G has a coproduct ∆0 such that (G,∆0) is a com-
pact quantum group and (G,∆0, α0) is a universal object in the category Q
of compact quantum groups acting smoothly and isometrically on the given
spectral triple. The action α0 is faithful.
Proof :
Recall the C∗-algebra U considered before, and the map β from H0D to
H0D⊗U . By our definition of β, it is clear that β(A
0 ) ⊆ A
0 ⊗algU . However,
β is only a linear map (unitary) but not necessarily a ∗-homomorphism. We
shall construct the universal object as a suitable quotient of U . Let F
be the collection of all those C∗-ideals I of U such that the composition
ΓI := (id⊗ΠI) ◦ β : A
0 → A
0 ⊗alg (U/I) extends to a C
∗-homomorphsim
from Ā to Ā ⊗ (U/I), where ΠI denotes the quotient map from U onto
U/I. This collection is nonempty, since the trivial one-dimensional C∗-
algebra C gives an object in Q and by Lemma 2.12 we do get a member of
F . Now, let I0 be the intersection of all ideals in F . We claim that I0 is
again a member of F . Since any C∗-homomorphism is contractive, we have
‖ΓI(a)‖ ≡ ‖β(a) + Ā ⊗ I‖ ≤ ‖a‖ for all a ∈ A
0 and I ∈ F . By Lemma
2.13, we see that ‖ΓI0(a)‖ ≤ ‖a‖ for a ∈ A
0 , so ΓI0 extends to a norm-
contractive map on Ā by the density of A∞0 in Ā. Moreover, for a, b ∈ Ā
and for I ∈ F , we have ΓI(ab) = ΓI(a)ΓI(b). Since ΠI = ΠI ◦ ΠI0 , we can
rewrite the homomorphic property of ΓI as
ΓI0(ab)− ΓI0(a)ΓI0(b) ∈ Ā ⊗ (I/I0).
Since this holds for every I ∈ F , we conclude that ΓI0(ab)−ΓI0(a)ΓI0(b) ∈
I∈F Ā⊗(I/I0) = (0), i.e. ΓI0 is a homomorphism. In a similar way, we can
show that it is a ∗-homomorphism. Since each βi is a unitary representation
of the compact quantum group Ui on the finite dimensional space Vi, it
follows that βi(Vi)(1 ⊗ Ui) is total in Vi ⊗ Ui. In particular, for any vi ∈ Vi
(i arbitrary), the element vi ⊗ 1Ui = vi ⊗ 1U belongs to the linear span of
βi(Vi)(1⊗Ui) ⊂ β(Vi)(1⊗U). Thus, A
0 ⊗1U is contained in the linear span of
β(A∞0 )(1⊗U) and henceA
0 ⊗1 U
is linearly spanned by ΓI0(A
0 )(1⊗U/I0).
By the norm-denisty of A∞0 in A and the contractivity of the quotient map,
it follows that A ⊗ U/I0 is the closed linear span of ΓI0(A
0 )(1 ⊗ U/I0).
This completes the proof that (U/I0,ΓI0) is indeed an object of Q.
We now show that G := U/I0 is a universal object in Q. To see this, con-
sider any object (S, α) of Q. Without loss of generality we can assume the
action to be faithful, since otherwise we can replace S by the C∗-subalgebra
generated by the elements {v
} appearing in the proof of Lemma 2.12. But
by Lemma 2.12 we can further assume that S is isomorphic with U/I for
some I ∈ F . Since I0 ⊆ I, we have a C
∗-homomorphism from U/I0 onto
U/I, sending x+I0 to x+I, which is clearly a morphism in the category Q.
This is indeed the unique such morphism, since it is uniquely determined on
the dense subalgebra generated by {u
+ I0, i, j, k ≥ 1} of G.
To construct the coproduct on G = U/I0, we first consider α
(2) = (ΓI0 ⊗
id) ◦ΓI0 : A → A⊗G ⊗G. It is easy to verify that (G ⊗G, α
(2)) is an object
in the category Q, so by the universality of (G,ΓI0), we have a unique unital
C∗-homomorphism ∆0 : G → G ⊗ G satisfying
(id⊗∆0) ◦ ΓI0(x) = α
(2)(x) ∀x ∈ A.
Taking x = eij, we get
eil ⊗ (πI0 ⊗ πI0)
eil ⊗∆0(πI0(u
Comparing coefficients of eil, and recalling that ∆̃(u
(where ∆̃ denotes the coproduct on U), we have
(πI0 ⊗ πI0) ◦ ∆̃ = ∆0 ◦ πI0 (2)
on the linear span of {u
, i, j, k ≥ 1}, and hence on the whole of U . This
implies that ∆0 maps I0 = Ker(πI0) into Ker(πI0⊗πI0) = (I0⊗1+1⊗I0) ⊂
U ⊗ U . In other words, I0 is a Hopf C
∗-ideal, and hence G = U/I0 has the
canonical compact quantum group structure as a quantum subgroup of U . It
is clear from the relation (2) that ∆0 coincides with the canonical coproduct
of the quantum subgroup U/I0 inherited from that of U . It is also easy to
see that the object (G,∆0,ΓI0) is universal in the category Q
′, using the fact
that (by Lemma 2.12) any compact quantum group (G,Φ) acting smoothly
and isometrically on the given spectral triple is isomorphic with a quantum
subgroup U/I, for some Hopf C∗-ideal I of U .
Finally, the faithfulness of α0 follows from the universality by standard
arguments which we briefly sketch. If G1 ⊂ G is a ∗-subalgebra of G such
that α0(A) ⊆ A ⊗ G1, it is easy to see that (G1,∆0, α0) is also a universal
object, and by definition of universality of G it follows that there is a unique
morphism, say j, from G to G1. But the map j ◦ i is a morphism from G to
itself, where i : G1 → G is the inclusion. Again by universality, we have that
j ◦ i = idG , so in particular, i is onto, i.e. G1 = G. ✷
Definition 2.15 We shall call the universal object (G,∆0) obtained in the
theorem above the quantum isometry group of (A∞,H,D) and denote it
by QISO(A∞,H,D), or just QISO(A∞) (or sometimes QISO(Ā)) if the
spectral triple is understood from the context.
Remark 2.16 Assume that an admissible spectral triple (A∞,H,D) also
satisfies the condition (i) of Lemma 2.5, i.e.
Dom(Ln) = A∞. Let α :
A → A⊗S be a smooth isometric action on A∞ by a compact quantum group
S. We recall from the proof of Lemma 2.12 that the map α̃ from A⊗ S to
itself extends to an S-linear unitary on the Hilbert S-module H0D ⊗ S, i.e.
α̃ can be viewed as a unitary in B(H0D) ⊗ S. Clearly, for any state φ on
S, we have αφ = (id ⊗ φ)(α̃) ∈ B(H
D). Now, by the definition of a smooth
isometric action, the bounded operator αφ commutes with the self-adjoint
operator L on A∞0 , which is a core for L. So, αφ must commute with L
for all n, and in particular keeps A∞ =
nDom(L
n) invariant.
Remark 2.17 Let us now briefly indicate how one can weaken the hy-
pothesis of connectedness. Such an extension of our results is desirable
to accommodate the classical spaces, including the finite sets and graphs,
in our framework. One possibile approach could be to consider the cate-
gory of compact quantum group actions α which are not only ‘smmoth’ and
‘isometric’ in our sense, but also satisfy the τ -invariance condition, i.e.
(τ ⊗ id)(α(a)) = τ(a)1. It is easy to see that the connectedness assumption
has been used by us only to prove that the τ -invariance is automatic for
smooth isometric actions. Thus, if we work in the smaller cateogory of such
τ -invraiant actions only, the proof of Theorem 2.14 does go through and thus
we can prove the existence of a universal object, to be defined as the quantum
isometry group. It is easy to see that for the algebra of functions on a finite
set, with the spectral triple given by D = 0, this quantum isometry group
coincides with thw quantum permutation group defined by Wang.
Remark 2.18 It is easy to see how to extend our formulation and results
to spectral triples which are not necessarily of type II, i.e. when the trace
τ is replaced by some non-tracial positive functional. Indeed, our construc-
tion will go through in such a situation more or less verbatim, by replacing
the universal quantum groups Adi(I) by Adi(Qi) for some suitable choice of
matrices Qi coming from the modularity property of τ .
2.4 Construction of quantum group-equivariant spectral triples
In this subsection, we shall briefly discuss the relevance of quantum isometry
group to the problem of constructing quantum group equivariant spectral
triples, which is important to understand the role of quantum groups in the
framework of noncommutative geometry. There has been a lot of activity
in this direction recently, see, for example, the articles by Chakraborty and
Pal ([5]), Connes ([7]), Landi et al ([8]) and the references therein. In the
classical situation, there exists a natural unitary representation of the isom-
etry group G = ISO(M) of a manifold M on the Hilbert space of forms, so
that the operator d+d∗ (where d is the de-Rham differential operator) com-
mutes with the representation. Indeed, d+d∗ is also a Dirac operator for the
spectral triple given by the natural representation of C∞(M) on the Hilbert
space of forms, so we have a canonical construction of G-equivariant spectral
triple. Our aim in this subsection is to generalize this to the noncommuta-
tive framework, by proving that dD + d
D is equivariant with respect to a
canonical unitary representation on the Hilbert space of ‘noncommutative
forms’ (see, for example, [10] for a detailed discussion of such forms).
Consider an admissible spectral triple (A∞,H,D) and moreover, make
the assumption of Lemma 2.6, i.e. assume that t 7→ eitDxe−itD is norm-
differentiable at t = 0 for all x in the ∗-algebra B generated by A∞ and
[D,A∞].
Lemma 2.19 In the notation of Lemma 2.6, we have the following (where
b, c ∈ A∞):
d∗D(dD(b)c) = −
(bL(c)− L(b)c− L(bc)) . (3)
Proof:
Denote by χ(b, c) the right hand side of euqation (3) and fix any a ∈ A∞.
Using the facts the the functional τ is a faithful trace on the ∗-algebra B,
L = −d∗DdD and that [D,X] = 0 for any X in B, we have,
τ(a∗χ(b, c))
{τ(a∗bL(c)) + τ(ca∗L(b)) + τ(a∗L(bc))}
{τ([D, a∗b][D, c]) − τ([D, ca∗][D, b])− τ([D, a∗][D, bc])}
{τ(a∗[D, b][D, c]) − τ([D, c]a∗[D, b])− τ(c[D, a∗][D, b])− τ([D, a∗][D, b]c)}
= −τ([D, a∗][D, b]c)
= τ([D, a]∗[D, b]c)
= 〈dD(a), dD(b)c〉
= τ(a∗(d∗D(dD(b)c))).
From this, we get the following by a simple computation:
〈adD(b), a
′dD(b
′)〉 = −
τ(b∗Ψ(a∗a, b′)), (4)
for a, b, a′, b′ ∈ A∞, and where Ψ(x, y) := L(xy)−L(x)y+xL(y). Now, let us
denote the quantum isometry group of the given spectral triple (A∞,H,D)
by (G,∆, α). Let A0 denote the ∗-algebra generated by A
0 , G0 denote ∗-
algebra of G generated by matrix elements of irreducible representations.
Clearly, α : A0 → A0 ⊗alg G0 is a Hopf-algebraic action of G0 on A0. Define
Ψ̃ : (A0 ⊗alg G0)× (A0 ⊗alg G0) → A0 ⊗alg G0 by
Ψ̃((x⊗ q), (x′ ⊗ q′)) := Ψ(x, x′)⊗ (qq′).
It follows from the relation (L ⊗ id) ◦ α = α ◦ L on A0 that
Ψ̃(α(x), α(y)) = α(Ψ(x, y)). (5)
We now define a linear map α(1) from the linear span of {adD(b) : a, b ∈ A0}
to H1D ⊗ G by setting
α(1)(adD(b)) :=
i dD(b
j )⊗ a
where for any x ∈ A0 we write α(x) =
i ∈ A0⊗algG0 (summation
over finitely many terms). We shall sometimes use the Sweedler convention
of writing the above simply as α(x) = x(1) ⊗ x(2). It then follows from the
identities (4) and (5), and also the fact that (τ ⊗ id)(α(a)) = τ(a)1 for all
a ∈ A0 that
〈adD(b), a
′dD(b
(τ ⊗ id)(α(b∗)Ψ̃(α(a∗a′), α(b′)))
(τ ⊗ id)(α(b∗)α(Ψ(a∗a′, b′)))
(τ ⊗ id)(α(b∗Ψ(a∗a′, b′)))
τ(b∗Ψ(a∗a′, b′))1G
= 〈adD(b), a
′dD(b
′)〉1G .
This proves that α(1) is indeed well-defined and extends to a G-linear isom-
etry on H1D ⊗ G, to be denoted by U
(1), which sends (adD(b)) ⊗ q to
α(1)(adD(b))(1 ⊗ q), a, b ∈ A0, q ∈ G. Moreover, since the linear span
of α(A∞0 )(1 ⊗ G) is dense in H
D ⊗ G, it is easily seen that the range of the
isometry U (1) is the whole of H1D ⊗G, i.e. U
(1) is a unitary. In fact, from its
definition it can also be shwon that U (1) is a unitary representation of the
compact quantum group G on H1D.
In a similar way, we can construct unitary representation U (n) of G on
the Hilbert space of n-forms for any n ≥ 1, by defining
U (n)((a0dD(a1)dD(a2)...dD(an))⊗q) = a
0 dD(a
1 )...dD(a
n )⊗(a
1 ...a
n q), ai ∈ A
(using Sweedler convention) and verifying that it extends to a unitary.
We also denote by U (0) the unitary representation α̃ on H0D discussed be-
fore. Finally, we have a unitary representation U =
n≥0 U
(n) of G on
H̃ :=
D, and also extend dD as a closed densely defined operator on H̃
in the obvious way, by defining dD(a0dD(a1)...dD(an)) = dD(a0)...dD(an).
It is now straightforward to see the following:
Theorem 2.20 The operator D′ := dD+d
D is equivariant in the sense that
U(D′ ⊗ 1) = (D′ ⊗ 1)U .
We point out that there is a natural representation π of A on H̃ given
by π(a)(a0dD(a1)...dD(an)) = aa0dD(a1)...dD(an), and (π(A
∞), H̃,D′) is
indeed a spectral triple, which is G-equivariant.
Although the relation between spectral properties of D and D′ is not
clear in general, in many cases of interest (e.g. when there is an underlying
type (1, 1) spectral data in the sense of [10]) these two Dirac operators are
closely related. As an illustration, consider the canonical spectral on the
noncommutative 2-torus Aθ, which is discussed in some details in the next
section. In this case, the Dirac operator D acts on L2(Aθ, τ) ⊗ C
2, and it
can easily be shown (see [10]) that the Hilbert space of forms is isomorphic
with L2(Aθ, τ)⊗C
4 ∼= L2(Aθ)⊗C
2; thus D′ is essentially same as D in this
case.
3 Examples and computations
We give some simple yet interesting explicit examples of quantum isometry
groups here. However, we give only some computational details for the first
example, and for the rest, the reader is referred to a companion article ([3]).
Example 1 : commutative tori
Consider M = T, the one-torus, with the usual Riemannian structure. The
∗-algebra A∞ = C∞(M) is generated by one unitary U , which is the multi-
plication operator by z in L2(T). The Laplacian is given by L(Un) = −n2Un.
If a compact quantum group (S,∆S) acts on A
∞ smoothly, let An, n ∈ Z be
elements of S such that α0(U) =
n ⊗An (here α0 : A
∞ → A∞ ⊗alg S
is the S-action on A∞). Note that this infinite sum converges at least
in the topology of the Hilbert space L2(T) ⊗ L2(S), where L2(S) denotes
the GNS space for the Haar state of S. It is clear that the condition
(L ⊗ id) ◦ α0 = α0 ◦ L forces to have An = 0 for all but n = ±1. The
conditions α0(U)α0(U)
∗ = α0(U)
∗α0(U) = 1 ⊗ 1 further imply the follow-
A∗1A1 +A
−1A−1 = 1 = A1A
1 +A−1A
A∗1A−1 = A
−1A1 = A1A
−1 = A−1A
1 = 0.
It follows that A±1 are partial isometries with orthogonal domains and
ranges. Say, A1 has domain P and range Q. Hence the domain and
range of A−1 are respectively 1 − P and 1 − Q. Consider the unitary
V = A + B, so that V P = A, V (1 − P ) = B. Now, from the fact that
(L⊗ id)(α0(U
2)) = α0(L(U
2)) it is easy to see that the coefficient of 1⊗1 in
the expression of α0(U)
2 must be 0, i.e. AB+BA = 0. From this, it follows
that V and P commute and therefore P = Q. By straightforward calculation
using the facts that V is unitary, P is a projection and V and P commute, we
can verify that α0 given by α0(U) = U ⊗V P +U
−1⊗V (1−P ) extends to a
∗-homomorphsim from A∞ to A∞⊗C∗(V, P ) satisfying (L⊗id)◦α0 = α0◦L.
It follows that the C∗ algebra QISO(T) is commutative and generated by a
unitary V and a projection P , or equivalently by two partial isometries A,
B such that A∗A = AA∗, B∗B = BB∗, AB = BA = 0. So, as a C∗ algebra
it is isomorphic with C(T) ⊕ C(T) ∼= C(T × Z2). The coproduct (say ∆0)
can easily be calculated from the requirement of co-associativity, and the
Hopf algebra structure of QISO(T) can be seen to coincide with that of
the semi-direct product of T by Z2, where the generator of Z2 acts on T by
sending z 7→ z̄.
We summarize this in form of the following.
Theorem 3.1 The universal quantum group of isometries QISO(T) of the
one-torus T is isomorphic (as a quantum group) with C(T >⊳Z2) = C(ISO(T)).
We can easily extend this result to higher dimensional commutative tori,
and can prove that the quantum isometry group coincides with the classical
isometry group. This is some kind of rigidity result, and it will be interest-
ing to investigate the nature of quantum isometry groups of more general
classical manifolds.
Example 2 : Noncommutative torus; holomorphic isomrtries
Next we consider the simplest and well-known example of noncommutative
manifold, namely the noncommutative two-torus Aθ, where θ is a fixed
irrational number (see [6]). It is the universal C∗ algebra generated by
two unitaries U and V satisfying the commutation relation UV = λV U ,
where λ = e2πiθ. There is a canonical faithful trace τ on Aθ given by
τ(UmV n) = δmn. We consider the canonical spectral triple (A
∞,H,D),
where A∞ is the unital ∗-algebra spanned by U, V , H = L2(τ)⊕ L2(τ) and
D is given by
0 d1 + id2
d1 − id2 0
where d1 and d2 are closed unbounded linear maps on L
2(τ) given by
mV n) = mUmV n, d2(U
mV n) = nUmV n. It is easy to compute the
space of one-forms Ω1D (see [4], [10], [6]) and the Laplacian L = −d
∗d is
given by L(UmV n) = −(m2 + n2)UmV n. For simplicity of computation,
instead of the full quantum isometry group we at first concentrate on an
interesting quantum subgroup G = QISOhol(A∞,H,D), which is the uni-
versal quantum group which leaves invariant the subalgebra ofA∞ consisting
of polynomials in U , V and 1, i.e. span of UmV n with m,n ≥ 0. The proof
of existence and uniqueness of such a universal quantum group is more or
less identical to the proof of existence and uniqueness of QISO. We call G
the quantum group of “holomorphic” isometries, and observe in the theorem
stated below without proof (see [3]) that this quantum group is nothing but
the quantum double torus studied in [11].
Theorem 3.2 Consider the following co-product ∆B on the C
∗ algebra B =
C(T2)⊕A2θ, given on the generators A0, B0, C0,D0 as follows ( where A0,D0
correspond to C(T2) and B0, C0 correspond to A2θ)
∆B(A0) = A0 ⊗A0 + C0 ⊗B0, ∆B(B0) = B0 ⊗A0 +D0 ⊗B0,
∆B(C0) = A0 ⊗ C0 +C0 ⊗D0, ∆B(D0) = B0 ⊗ C0 +D0 ⊗D0.
Then (B,∆0) is a compact quantum group and it has an action α0 on Aθ
given by
α0(U) = U ⊗A0 + V ⊗B0, α0(V ) = U ⊗C0 + V ⊗D0.
Moreover, (B,∆B) is isomorphic (as quantum group) with G = QISO
hol(A∞,H,D).
We refer to [3] for a proof of the above result, and to [11] for the computation
of the Haar stat and representation theory of the compact quantum group
Example 3 : Noncommutative Torus; full quantum isometry group
By similar but somewhat tedious calculations (see [3]) one can also describe
explicitly the full quantum isometry group QISO(A∞,H,D). It is as a
C∗ algebra has eight direct summands, four of which are isomorphic with
the commutative algebra C(T2), and the other four are irrational rotation
algebras.
Theorem 3.3 QISO(Aθ) = ⊕
∗(Uk1, Uk2) (as a C
∗ algebra), where for
odd k, Uk1, Uk2 are the two commuting unitary generators of C(T
2), and for
even k, Uk1Uk2 = exp(4πiθ)Uk2Uk1, i.e. they generate A2θ. The (co)-action
on the generators U, V (say) of Aθ are given by the following :
α0(U) = U⊗(U11+U31)+V⊗(U52+U62)+U
−1⊗(U21+U41)+V
−1⊗(U72+U82),
α0(V ) = U⊗(U51+U71)+V⊗(U12+U22)+U
−1⊗(U61+U81)+V
−1⊗(U32+U42).
From the co-associativity condition, the co-product of QISO(Aθ) can easily
be calculated. For the detailed description of the coproduct, counit, an-
tipode and study of the representation theory of QISO(Aθ), the reader is
referred to [3]. It is interesting to mention here that the quantum isometry
group of Aθ is a Rieffel type deformation of the isometry group (which is
same as the quantum isometry group) of the commutative two-torus. The
commutative two-torus is a subgroup of its isometry group, but when the
isometry group is deformed into QISO(Aθ), the subgroup relation is not
respected, and the deformation of the commutative torus, which is A2θ, sits
in QISO(Aθ) just as a C
∗ subalgebra (in fact a direct summand) but not
as a quantum subgroup any more. This perhaps provides some explanation
of the non-existence of any Hopf algebra structure on the noncommutative
torus.
Acknowledgement : The author would like to thank P. Hajac for draw-
ing his attention to the article [11], and S.L. Woronowicz for many valuable
comments and suggestions which led to substantial improvement of the pa-
References
[1] Banica, T.: Quantum automorphism groups of small metric spaces,
Pacific J. Math. 219(2005), no. 1, 27–51.
[2] Banica, T.: Quantum automorphism groups of homogeneous graphs,
J. Funct. Anal. 224(2005), no. 2, 243–280.
[3] Bhowmick, J. and Goswami, D.: Quantum isometry groups :
examples and computations, preprint (2007), arXiv 0707.2648.
[4] Chakraborty, P. S.: Goswami, D. and Sinha, Kalyan B.: Probability
and geometry on some noncommutative manifolds, J. Operator
Theory 49 (2003), no. 1, 185–201.
[5] Chakraborty, P. S. and Pal, A.: Equivariant spectral triples on the
quantum SU(2) group, K. Theory 28(2003), 107–126.
[6] Connes, A.: “Noncommutative Geometry”, Aacdemic Press,
London-New York (1994).
[7] Connes, A.: Cyclic cohomology, quantum group symmetries and the
local index formula for SUq(2), J. Inst. Math. Jussieu 3(2004), no.
1, 17–68.
[8] Dabrowski,L., Landi, G., Sitarz, A., van Suijlekom, W. and Varilly,
Joseph C.: The Dirac operator on SUq(2), Comm. Math. Phys.
259(2005), no. 3, 729–759.
[9] Donnelly, H.: Eigenfunctions of Laplacians on Compact Riemannian
Manifolds, Assian J. Math. 10 (2006), no. 1, 115–126.
[10] Fröhlich, J.; Grandjean, O.; Recknagel, A.: Supersymmetric quan-
tum theory and non-commutative geometry, Comm. Math. Phys.
203 (1999), no. 1, 119–184.
[11] Hajac, P. and Masuda, T.: Quantum Double-Torus, Comptes
Rendus Acad. Sci. Paris 327(6), Ser. I, Math. (1998), 553–558.
[12] Rosenberg, S.: “The Laplacian on a Riemannian Manifold”, Cam-
bridge University Press, Cambridge (1997).
[13] Soltan, P. M.: Quantum families of maps and quantum semigroups
on finite quantum spaces, preprint, arXiv:math/0610922.
[14] Van Daele, A.: Notes on Compact Quantum Groups,
arXiv:math/9803122.
[15] Wang, S.: Free products of compact quantum groups, Comm. Math.
Phys. 167 (1995), no. 3, 671–692.
[16] Wang, S.: Quantum symmetry groups of finite spaces, Comm.
Math. Phys. 195(1998), 195–211.
[17] Wang, S.: Structure and isomorphism classification of compact
quantum groups Au(Q) and Bu(Q), J. Operator Theory 48 (2002),
573–583.
[18] Woronowicz, S. L.: ”Compact quantum groups”, pp. 845–884 in
Symétries quantiques (Quantum symmetries) (Les Houches, 1995),
edited by A. Connes et al., Elsevier, Amsterdam, 1998.
[19] Woronowicz, S. L.: Pseudogroups, pseudospaces and Pontryagin du-
ality, Proceedings of the International Conference on Mathematical
Physics, Lausane (1979), Lecture Notes in Physics 116, pp. 407-412.
ABSTRACT
  We formulate a quantum generalization of the notion of the group of
Riemannian isometries for a compact Riemannian manifold, by introducing a
natural notion of smooth and isometric action by a compact quantum group on a
classical or noncommutative manifold described by spectral triples, and then
proving the existence of a universal object (called the quantum isometry group)
in the category of compact quantum groups acting smoothly and isometrically on
a given (possibly noncommutative) manifold satisfying certain regularity
assumptions. In fact, we identify the quantum isometry group with the universal
object in a bigger category, namely the category of `quantum families of smooth
isometries', defined along the line of Woronowicz and Soltan. We also construct
a spectral triple on the Hilbert space of forms on a noncommutative manifold
which is equivariant with respect to a natural unitary representation of the
quantum isometry group. We give explicit description of quantum isometry groups
of commutative and noncommutative tori, and in this context, obtain the quantum
double torus defined in \cite{hajac} as the universal quantum group of
holomorphic isometries of the noncommutative torus.

<|endoftext|><|startoftext|>
Microsoft Word - Like-QuantumSemantics.doc
General System theory, Like-Quantum Semantics and Fuzzy Sets 
Ignazio Licata 
Isem, Institute for Scientific Methodology, Pa, Italy 
Ignazio.licata@ejtp.info 
Abstract: It is  outlined the possibility to extend the quantum formalism in relation to the 
requirements of  the  general systems theory. It can be done by using a  quantum semantics 
arising from the deep logical structure of quantum theory. It is so possible taking into account the 
logical openness relationship between observer and system. We are going to  show how 
considering the truth-values of quantum propositions within the context of the fuzzy sets is here 
more useful for systemics . In conclusion we propose an  example of   formal quantum coherence. 
Key-words: Quantum Theory; Fuzzy Sets; System Theory; Syntax and Semantics of Scientific 
Theories; Logical Openness. 
Published in Systemics of Emergence. Research and Development, Minati G., Pessa E., Abram M., 
Springer, 2006, pages 723-734. 
1.The role of syntactics and semantics in general system theory  
The omologic element breaks specializations up, forces taking into account different things at the same time, stirs up the 
interdependent game of the separated sub-totalities, hints at a broader totality whose laws are not the ones of its 
components. In other words, the omologic method is an anti-separatist and reconstructive one, which thing makes it 
unpleasant to specialists. 
F. Rossi-Landi 1985 
The systemic-cybernetic approach ( Wiener, 1961; von Bertalannfy,1968; Klir, 1991) requires a 
careful evaluation of epistemology as the critical praxis internal to the building up of the scientific 
discourse. That is why the usual referring to a “connective tissue” shared in common by different 
subjects could be misleading. As a matter of fact every scientific theory is the outcome of  a 
complex conceptual construction aimed to the problem peculiar features, so what  we are 
interested in is not a framework shaping an abstract super-scheme made by the “filtering” of the 
particular sciences, but a research focusing on the global and foundational characteristics of 
scientific activity in a trans-disciplinary perspective.   According to such view, we can understand 
the General System Theory (GST) by the analogy to metalogic. It deals with the possibilities and 
boundaries of various formal systems to a more higher degree than any specific structure.  
A scientific theory presupposes  a certain set of relations between observer and system, so 
GST has the purpose to investigate the possibility of describing the multeity of system-observer 
relationships. The GST main goal is delineating a formal epistemology to study the scientific 
knowledge formation, a science able to speak about science. Succeeding to outline such 
panorama will make possible analysing those inter-disciplinary processes which are more and 
more important in studying complex systems and they will be guaranteed the “transportability” 
conditions of a modellistic set from a field to another one. For instance, during a theory developing, 
syntax gets more and more structured by putting univocal constraints on semantics according to 
the operative requirements of the problem. Sometimes it can be useful generalising a syntactic tool 
in a new semantic domain so as to formulate new problems. Such work, a typically trans-
disciplinary one, can only be done by the tools of a GST able to discuss new relations between 
syntactics  (formal model) and semantics ( model usage). It is here useful to consider again the 
omologic  perspective, which not only  identifies analogies and isomorphisms in pre-defined 
structures, but aims to find out a structural and dynamical relation among theories to an higher 
level of analysis, so providing new use possibilities (Rossi-Landi, 1985). Which thing is particularly 
useful in studying complex systems, where the very essence of the problem itself makes a 
dynamic use of models necessary to describe the emergent features of the system (Minati & 
Brahms, 2002; Collen, 2002).  
We want here to briefly discuss such GST acceptation, and then showing the possibility of 
modifying the semantics of  Quantum Mechanics (QM) so to get a conceptual tool fit for the 
systemic requirements.  
2. Observer as emergence surveyor and semantic ambiguity solver  
What we look at is not Nature in itself, but Nature unveiling to our questioning methods. 
W. Heisenberg, 1958 
A very important and interesting question in system theory can be stated as follows: given a set 
of measurement systems M and of theories T related to a system S, is it always possible to order 
them, such that Ti-1 �Ti, where the partial order symbol �  is used to denote the relationship 
“physically weaker than” ?  We shall point out that, in this case, the ith theory of the chain contains 
more information than the preceding ones. This consequently leads to a second key question: can 
an unique final theory Tf  describe exhaustively each and every aspect of system S ? From the 
informational and metrical side, this is equivalent to state that all of the information contained in a 
system S can be extracted, by means of adequate measurement processes.  
The fundamental proposition for reductionism is, in fact, the idea that such a theory chain will 
be sufficient to give a coherent and complete description for a system S. Reductionism, in the light 
of our definitions, coincides therefore with the highest degree of semantic space “compression”; 
each object  D ∈ Ti in S has a definition in a theory Ti belonging to the theory chain, and the latter 
is - on its turn - related to the fundamental explanatory level of the “final” theory Tf. This implies that 
each aspect in a system S is unambiguously determined by the syntax described in Tf. Each 
system S can be described at a fundamental level, but also with many phenomenological 
descriptions, each of these descriptions can be considered an approximation of the “final” theory.  
Anyway, most of the “interesting” systems we deal with cannot be included in this chained-
theory syntax compatibility program: we have to consider this important aspect for a correct 
epistemic definition of systems “complexity”. Let us illustrate this point with a simple reasoning, 
based upon the concepts of logical openness and intrinsic emergence (Minati, Pessa, Penna, 
1998; Licata, 2003b). 
Each measurement operation can be theoretically coded on a Turing machine. If a coherent 
and complete fundamental description Tf exists, then there will also exist a finite set - or, at most, 
countably infinite - of measurement operations M which can extract each and every single 
information that describes the system S. We shall call such a measurement set Turing-observer. 
We can easily imagine Turing-observer as a robot that executes a series of measurements on a 
system. The robot is guided by a program built upon rules belonging to the theory T. It can be 
proved, though, that this is only possible for logically closed systems, or at most for systems with a 
very low degree of logical openness. When dealing with highly logically open systems, no recursive 
formal criterion exists that can be as selective as requested (i.e., automatically choose which 
information is relevant to describe and characterize the system, and which one is not), simply 
because it is not possible to isolate the system from the environment. This implies that the Turing-
observer hypothesis does not hold for fundamental reasons, strongly related to Zermelo-Fraenkel's 
choice axiom and to classical Godel's decision problems. In other words, our robot executes the 
measurements always following the same syntactics, whereas the scenario showing intrinsic 
emergence is semantically modified. So it is impossible thinking to codify any possible 
measurement in a logically open system! 
The observer therefore plays a key rule, unavoidable as a semantic ambiguity solver: only the 
observer can and will single out intrinsic-observational emergence properties ( Bass & 
Emmeche,1997; Cariani, 1991), and subsequently plan adequate measurement processes to 
describe what  – as a matter of fact- have turned in  new systems. System complexity is 
structurally bound to logical openness and is, at the same time, both an expression of highly 
organized system behaviours (long-range correlations, hierarchical structure, and so on) and an 
observer's request for new explanatory models.  
So, a GST  has to allow - in the very same theoretical context – to deal with the observer as an 
emergence surveyor in a logical open system. In particular, it is clear that the observer itself is a 
logical open system. 
Moreover, it has to be pointed out that the co-existence of many description levels – compatible but 
not each other deductible – leads to intrinsic uncertainty situations, linked to the different 
frameworks  by which a system property can be defined. 
3. Like-quantum semantics  
I’m not happy with all the analyses that go with just the classical theory, because nature isn’t classical, damm it, 
and if you want to make a simulation of nature, you’d better make it quantum mechanical, and by golly it’s a wonderful 
problem, because it doesn’t look so easy. Thank you. 
R. P. Feyman, 1981 
When we modify and/or amplify a theory so as to being able to speak about different systems 
from the ones they were fitted for, it could be better to look at the theory deep structural features so 
as to get an abstract perspective able to fulfil the omologic approach requirements, aiming to point 
out a non-banal conceptual convergence. 
As everybody knows, the logic of classical physics is a dichotomic language (tertium non 
datur), relatively orthocomplemented and able to fulfil the weak distributivity relations by the logical 
connectives AND/OR. Such features are the core of the Boolean  commutative elements of this 
logic because disjunctions and conjunctions are symmetrical and associative operations. We shall 
here dwell on the systemic consequences of these properties. A system S can get or not to get a 
given property P. Once we fix the P truth-value it is possible to keep on our research over a new 
preposition P subordinated to the previous one’s truth-value. Going ahead, we add a new piece of 
information to our knowledge about the system. So the relative orthocomplementation axiom 
grants that we keep on following a successions of steps, each one making our uncertainty about 
the system to diminish or, in case of a finite amount of steps, to let us defining the state of the 
system by determining all its properties. Each system’s property  can be described by a countable 
infinity of atomic propositions. So, such axiom plays the role of a describable axiom for classical 
systems.  
The unconstrained use of such kind of axiom tends to hide the conceptual problems spreading 
up from the fact that every description implies a context, as we have seen in the case of  Turing-
observer analysis, and it seems to imply that systemic properties are independent of the observer, 
it surely is a non-valid statement when we deal with  open logical systems. In particular, the 
Boolean features point out that it is always possible carrying out exhaustively a synchronic 
description of the properties of a systems. In other words, every question about the system is not 
depending on the order we ask it and it is liable to a fixed answer  we will indicate as 0- false / 1-
true. It can be suddenly noticed that the emergent features otherwise get a diachronic nature and 
can easily make such characteristics not taken for granted. By using Venn diagrams it is possible 
providing a representation of the complete descriptiveness of a  system ruled by classical logics. If 
the system’s state is represented by a point and a property of its by a set of points, then it is always 
possible a complete “blanketing” of the universal set I, which means the always universally true 
proposition. (see fig. 1).  
The quantum logics shows deep differences which could be extremely useful for our goals 
(Birkhoff & von Neumann, 1936; Piron, 1964). At the beginning it was born to clarify some QM’s 
counter-intuitive sides , later it has developed as an autonomous field greatly independent from the 
matters which gave birth to it. We will abridge here the formal references to an essential survey, 
focusing on some points of general interest in systemics. 
The quantum language is a non-Boolean orthomodular structure, which is to say it is relatively 
orthocomplemented but non-commutative, for the crack down of the distributivity axiom. Such thing 
comes naturally from the Heisenberg Indetermination Principle  and binds the truth- value of an 
assertion to the context and the order by which it has been investigated (Griffiths, 1995). A well-
known example is the one of a particle’s spin measurement along a given direction. In this case we 
deal with semantically well defined possibilities and yet intrinsically uncertain. Let put xΨ  
the spin measurement along the direction x. For the indetermination principle the value yΨ  will be 
totally uncertain, yet the proposition  yΨ =0 ∨   yΨ =1  is necessarily true. In general, if P is a 
proposition , (-P ) its negation and Q the property which does not commute with P, then we will get 
a situation that can be represented by a “patchy” blanketing of the set I (see fig.2). 
Such configuration finds its essential meaning just in its relation with the observer. So we can 
state that when a situation can be described by a quantum logics, a system is never completely 
defined a priori. The measurement process by which the observer’s action takes place is a choice 
fixing some system’s characteristics and letting other ones undefined. It happens just for the nature 
itself of the observer-system  inter-relationship. Each observation act  gives birth to new descriptive 
possibilities. The proposition Q – in the above example – describes properties that cannot be 
defined by any implicational chain of propositions P. Since the intrinsic emergence cannot be 
regarded as a system property independent of the observer  action- as in naïve classical  
emergentism - , Q can be formally considered the expression of an emergent property. Now we are 
strongly tempted to define as emergent the undefined proposition of quantum-like anti-
commutative language. In particular, it can be showed that a non-Boolean and irreducible 
orthomodular language arises infinite propositions. It means that for each couple of propositions P1 
and P2   such that non of them imply the other , there exists infinite propositions Q which imply  P1  
∨  P2  without necessarily implying the two of them separately: tertium datur. In a sense, the 
disjunction of the two propositions gets more information than their mere set-sum, that is the 
entirely opposite of what happens in the Boolean case. It is  now easy to comprehend the deep 
relation binding the anti-commutativity, indetermination principles and system’s holistic global 
structure. A system describable by a Boolean structure can be completely “solved” by analysing 
the sub-systems defined by a fit decomposition process( Heylighen, 1990; Abram, 2002). On the 
contrary, in the anti-commutative case studying any sub-system modifies the entire system in an 
irreversible and structural way and produces uncertainty correlated to the gained information, 
which think makes absolutely natural extending the indetermination principles to a big deal of 
spheres of strong interest for systemics (Volkenshtein , 1988). 
A particularly key-matter is how to conceptually managing the infinite cardinality of emergent 
propositions in a lik-quantum semantics. As everybody knows traditional QM refers to the 
frequentistic probability worked out within the Copenhagen Interpretation (CIQM). It is essentially a 
sub specie probabilitatis Boolean logics extension. The values between [ ]1,0  - i.e. between the 
completely and always true proposition I and the always false one O – are meant as expectation 
values, or the probabilities associated to any measurable property. Without dwelling on the 
complex – and as for many questions still open – debate on QM interpretation, we can here ask if 
the probabilistic acception of truth-values is the fittest for system theory. As it usually happens 
when we deal with trans-disciplinary feels, it will bring us to add a new, and of remarkable interest 
for the “ordinary” QM too, step to our search. 
4. A Fuzzy Interpretation of Quantum Languages 
A slight variation in the founding axioms of a theory can give way to huge changings  on  the frontier. 
S. Gudder, 1988 
The study of the structural and logical facets of quantum semanics does not provide any 
necessary indications about the most suitable algebraic space to implement its own ideas. One of 
the thing which made a big merit of such researches has been  to put under discussion the key role 
of Hilbert space. In our approach we have kept the QM “internal” problems and its extension to 
systemic questions well separated.  Anyway, the last ones suggest an interpretative possibility 
bounded to fuzzy logic, which thing can considerably affect the traditional QM too. The fuzzy set 
theory is , in its essence, a formal tool created to deal with information characterized with 
vagueness and indeterminacy. The by-now classical paper of  Lotfi Zadeh (Zadeh, 1965) brings to 
a conclusion an old tradition of logics, which counts Charles S. Peirce, Jan C. Smuts, Bertrand 
Russell, Max Black and Ian Lukasiewicz among its forerunners. At the core of the fuzzy theory  lies 
the idea that an element can belong to a set to a variable degree of membership; the same goes 
for a proposition and its variable relation to the  true and false logical constants. We  underline here 
two aspects of  particular interest for our aims. The fuzziness’ definition  concerns single elements 
and properties, but not a statistical ensemble, so it has to be considered a completely different 
concept from the probability one, it should –by now-  be widely clarified (Mamdani, 1977; Kosko, 
1990). A further essential – even maybe less evident – point is that fuzzy theory calls up a non-
algorithmic “oracle”, an observator  (i.e. a logical open system and a semantic ambiguity solver) to 
make a choice as for the membership degree. In fact, the most part of the theory in its structure is 
free-model; no equation and no numerical value create constraints to the quantitative evaluation, 
being the last one the model builder’s task. There consequently exists a deep bound between 
systemics and fuzziness  successfully expressed by the Zadeh’s incompatibility principle (Zadeh, 
1972) which satisfies our requirement for a generalized indeterminacy principle. It states that by 
increasing the system complexity (i.e. its logical openness degree), it will decrease our ability to 
make exact statements and proved predictions about its behaviour. There already exists many 
examples of crossing between fuzzy theory and QM (Dalla Chiara, Giuntini, 1995; Cattaneo, Dalla 
Chiara, Giuntini 1993). We want here to delineate the utility of fuzzy polyvalence for systemic 
interpretation of quantum semantics.  
Let us consider a complex system, such as a social group, a mind and a biological organism. 
Each of these cases show typical emergent features owed both to the interaction among its 
components and the inter-relations with the environment. An act of the observer will fix some 
properties and will let some others undetermined according to a non-Boolean logic. The recording 
of such properties will depend on the succession of the measurement acts and their very nature. 
The kind of complexity into play, on the other hand, prevents us by stating what the system state is 
so as to associate to the measurement of a property an expectation probabilistic value. In fact, just 
the above-mentioned examples are related to macroscopic systems for which the probabilistic 
interpretation of QM is patently not valid. Moreover, the traditional application of the probability 
concept implies the notion of  “possible cases”, and so it also implies a pre-defined knowledge of 
systems’ properties. However, the non-commutative logical structure here outlined does not 
provide any cogent indication on probability usage. 
Therefore, it would be proper to look at a fuzzy approach so to describe the measurement acts. 
We can state that given a generic system endowed with high logical openness and an indefinite set 
of properties able of describing it, each of them will belong  to the system in a variable degree. 
Such viewpoint expressing the famous theorem of fuzzy “subsetness” – also known as “the whole 
into the part” principle – could seem to be too strong , indeed it is nothing else than the most 
natural expression of the actual scientific praxis facing intrinsic emergent systems.  At the 
beginning,  we have at our disposal  indefinite information progressively structuring thanks to the 
feedback between models and measurements. It can be shown that any logically open model of 
degree n – where n  is an integer – will let a wide range of properties and propositions 
indeterminate  (the Qs in fig. 2).The above-mentioned model  is a “static” approximation of a 
process showing aspects of  variable closeness and openness. The latter ones varies in time, 
intensity, different levels and context. It is remarkable pointing out how such systems are “flexible” 
and context-sensitive, change the rules and make use of “contradictions” . This point  has to be 
stressed to understand the link between fuzzy logic and quantum languages. By increasing the 
logical openness and the unsharp properties of a system, it  will be less and less fit to be described 
by a Boolean logic. It brings as a consequence that for a complex system the intersection between 
a set (properties, propositions) and its complement is not equal to the empty set, but it includes 
they both in a fuzzy sense. So we get a polyvalent semantic situation which is well fitted for being 
described by  a quantum language. As for our systemic goal  it is the probabilistic interpretation to 
be useless, so we are going to build a fuzzy acception of the semantics of the formalism. In our 
case, given a system S and a property Q,, let Ψ be a  function which associates Q to S, the 
expression ( ) [ ]1,0∈Ψ QS  has not to be meant as a probability value, but as a degree of 
membership. Such union between the non-commutative sides of quantum languages and fuzzy 
polyvalence appears to be the most suitable and fecund for systemics. 
Let us consider the traditional expression of quantum coherence (the property expressing the 
QM global and non-local characteristics, i.e. superposition principle, uncertainty, interference of 
probabilities), 2211 Ψ+Ψ=Ψ aa . In the fuzzy interpretation, it means that the properties 1Ψ e 2Ψ  
belong to Ψ with degrees of membership 1a  e 2a  respectively. In other words, for complex 
systems the Schrödinger’s cat can be simultaneously both alive and dead ! Indeed the recent 
experiments with SQUIDs and the other ones investigating the so-called macroscopic quantum 
states suggest a form of macro-realism quite close to our fuzzy acception (Leggett, 1980; Chiatti, 
Cini, Serva, 1995). It can provide in nuce an hint which could show up to be interesting  for the QM 
old-questioned interpretative problems. 
In general, let x be a position coordinate of a quantum object and Ψ  its wave function, 
( ) dVx 2Ψ is usually meant as the probability of finding the particle in a region dV of space.  On the 
contrary, in the fuzzy interpretation we will be compelled to look at  the Ψ  square modulus as the 
degree of membership of the particle to the region  dV of space. How unusual it may seem, such 
idea has not to be regarded thoughtlessly at. As a matter of fact, in Quantum Field Theory and in 
other more advanced quantum scenarios, a particle is not only a localized object in the space, but 
rather an event emerging from the non-local networks elementary quantum transition (Licata, 
2003a).  Thus, the measurement  is  a “defuzzification” process which, according to the stated, 
reduces the system ambiguity by limiting the semantic space and by defining  a fixed  information 
quantity. 
If we agree with such  interpretation we will easily and immediately realize that we will able to 
observate quantum coherence behaviours in non-quantum and quite far from the range of  Plank’s 
h constant  situations. We reconsider here a situation owed to Yuri Orlov (Orlov, 1997). 
Let us consider a Riemann’s sphere (Dirac, 1947) – see fig. 3 - and let assume that each point 
on the sphere represents a single interpretation of a given situation, i.e. the assigning of a  
coherent set  of truth-values to a given proposition. Alternatively, we can consider the choosing of 
a vector v  from the centre O to a point on the sphere as a logical definition of a world. If we 
choose a different direction, associated to a different vector w , we can now set the problem about 
the meaning of the amplitude between the logical descriptions of the two worlds. It is known that 
such amplitude is expressed by ( )ϑcos121 + , where ϑ  is the angle between the two 
interpretations. The amplitude corresponds to a superposition of worlds, so producing the typical  
interference patterns which in vectorial terms are related to v
w .  In this case, the traditional use of 
probability is not necessary because our knowledge of one of the two world  with probability equal 
to p =1 (certainity), say nothing us about the other one probability. An interpretation is not a 
quantum object  in the proper sense, and yet we are forced to formally introduce a wave-function 
and interference terms whose role is  very obscure a one. The fuzzy approach, instead, clarifies 
the quantum semantics of this situation by interpreting  interference as a measurement where the  
properties of the world wv wv Ψ+Ψ  are owed to the global and indissoluble (non-local)  
contribution of the  v  and w  overlapping. 
In conclusion, the generalized using of quantum semantics associated to new interpretative 
possibilities gives to systemics a very powerful tool to describe the observator-environment relation 
and to convey the several, partial attempts - till now undertaken - of applying  the quantum 
formalism to the study of complex systems into a comprehensive  conceptual root. 
ACKNOWLEDGEMENTS  
A special thank to Prof. G. Minati  for his kindness and his supporting during this paper drafting.  I 
owe a lot to the useful discussing on structural Quantum Mechanics and logics with my good 
friends Prof. Renato Nobili (who let me use the figs. 1 and 2 from his book “Dai Quark alla Mente”, 
to be published) and Prof. Eliano Pessa. Dedicated to M.V. 
REFERENCES  
Abram, M.R.,2002, Decomposition of Systems, in Emergence in Complex, Cognitive,  Social and 
Biological Systems ,( G. Minati and E.Pessa eds.), Kluwer Academic, NY, 2002. 
Baas, N. A. and  Emmeche , C., 1997, On Emergence and Explanation, in SFI Working Paper, 
Santa Fé Inst., 97-02-008. 
Birkhoff, G. and von Neumann J., 1936, The Logic of Quantum Mechanics, in Annals of Math.,37. 
Cariani, P., 1991, Adaptivity and Emergence in Organism and Devices, in World Futures, 32( 111). 
Cattaneo, G., Dalla Chiara, M.L.,Giuntini, R., 1993, Fuzzy-Intuitionistic Quantum Logics, in Studia 
Logica, 52. 
Chiatti, L., Cini M., Serva, M., 1995, Is Macroscopic Quantum Coherence Incompatibile with 
Macroscopic Realism? In Nuovo Cim., 110B (5-6). 
Collen, A., 2002, Disciplinarity in the Pursuit of Knowledge, in Emergence in Complex, Cognitive,  
Social and Biological Systems ( G. Minati and E.Pessa eds.), Kluwer Academic, NY, 2002. 
Dalla Chiara, M.L. and Giuntini R, 1995, The Logic of Orthoalgebras, in Studia Logica, 55. 
Dirac,P.A.M., 1947, The Principles of Quantum Mechanics, 3rd ed., Oxford Univ. Press, Oxford. 
Feynman, R. P., 1982, Simulating Physics with Computers, in Int. J. of Theor. Phys., 21(6/7). 
Griffiths, R. B., 1995, Consistent Quantum Reasoning, in arXiv :quant-ph/9505009 v1. 
Gudder, S.P.,1988, Quantum Probability, Academic Press, NY. 
Heisenberg W., 1958, Physics and Philosophy: The Revolution in Modern Science .Harper and 
Row, NY ; Prometheus Books; Reprint edition  1999. 
Heylighen, F., 1990, Classical and Non-Classical Representations in Physics: Quantum 
Mechanics, in Cybernetics and Systems 21. 
Klir, J. G., (ed), 1991, Facets of Systems Science, Plenum Press, NY. 
Kosko, B., 1990, Fuzziness vs. Probability, in Int. J. Of General Systems, 17(2). 
Legget, A. J., 1980, Macroscopic Quantum Systems and the Quantum Theory of  Measurement,in 
Suppl.Prog.Theor.Phys., 69(80). 
Licata, I., 2003a, Osservando la sfinge. La realtà virtuale della fisica quantistica, Di Renzo, Roma. 
Licata,I., 2003b, Mente & Computazione, in Sistema Naturae, Annali di Biologia Teorica,5. 
Mamdani, E.H., 1977, Application of Fuzzy Logic to Approximate Reasoning Using Linguistic 
Synthesis, in IEEE Trans. on Computers, C26. 
Minati G and Brahms S., 2002, The Dynamic Usage of Models (DYSAM), in Emergence in 
Complex, Cognitive,    Social and Biological Systems ( G. Minati and E.Pessa eds.), Kluwer 
Academic, NY, 2002. 
Minati, G., Pessa, E., Penna, M. P., 1998, Thermodynamical and Logical Openness in Systems 
Research and Behavioral Science, 15(3). 
Orlov, Y.F., 1997, Quantum-Type Coherence as a Combination of Symmetry and Semantics, in 
arXiv:quant-ph/9705049 v1. 
Piron, C.,1964,  Axiomatique Quantique, in Helvetica Physica Acta, 37. 
Rossi- Landi, F, 1985, Metodica filosofica e scienza dei segni, Bompiani, Milano. 
Volkenshtein, M.V.,1988, Complementary,Physics and Biology in Soviet Phys. Uspekhi 31. 
Von Bertalanffy, 1968, General System Theory, Braziller, NY. 
Zadeh, L.A., 1965, Fuzzy Sets, in Information and Control , 8. 
Zadeh, L. A. , 1987, Fuzzy Sets and Applications:Selected Papers by L.A. Zadeh, R.R Yager, R.M 
Tong, S. Ovchnikov H.T Nguyen (eds.) , Wiley, NY. 
Wiener, N., 1961, Cybernetics : or control and communication in the animal ed the machine, MIT 
Press, Cambridge.
ABSTRACT
  It is outlined the possibility to extend the quantum formalism in relation to
the requirements of the general systems theory. It can be done by using a
quantum semantics arising from the deep logical structure of quantum theory. It
is so possible taking into account the logical openness relationship between
observer and system. We are going to show how considering the truth-values of
quantum propositions within the context of the fuzzy sets is here more useful
for systemics . In conclusion we propose an example of formal quantum
coherence.

<|endoftext|><|startoftext|>
Introduction
In 1959, S.K. Godunov [17] demonstrated that a (linear) scheme for a PDE
could not, at the same time, be monotone and second order accurate. Hence,
∗ Corresponding author.
Email addresses: r.brownlee@mcs.le.ac.uk (R. A. Brownlee),
a.gorban@mcs.le.ac.uk (A. N. Gorban), j.levesley@mcs.le.ac.uk
(J. Levesley).
1 This work is supported by EPSRC grant number GR/S95572/01.
Preprint submitted to Physica A 24 October 2018
http://arxiv.org/abs/0704.0043v1
we should choose between spurious oscillation in high order non-monotone
schemes and additional dissipation in first order schemes. Flux limiter schemes
are invented to combine high resolution schemes in areas with smooth fields
and first order schemes in areas with sharp gradients.
The idea of flux limiters can be illustrated by computation of the flux F0,1
of the conserved quantity u between a cell marked by 0 and one of two its
neighbour cells marked by ±1:
F0,1 = (1− φ(r))f low0,1 + φ(r)f
0,1 ,
where f low0, 1 , f
0, 1 are low and high resolution scheme fluxes, respectively, r =
(u0 − u−1)/(u1 − u0), and φ(r) ≥ 0 is a flux limiter function. For r close to 1,
the flux limiter function φ(r) should be also close to 1.
Many flux limiter schemes have been invented during the last two decades [43].
No particular limiter works well for all problems, and a choice is usually made
on a trial and error basis.
Below are several examples of flux limiter functions:
φmm(r) = max [0,min (r, 1)] (minmod, [36]);
φos(r) = max [0,min (r, β)] , (1 ≤ β ≤ 2) (Osher, [10]);
φmc(r) = max [0,min (2r, 0.5(1 + r), 2)] (monotonised central [42]);
φsb(r) = max [0,min (2r, 1) ,min (r, 2)] (superbee, [36]);
φsw(r) = max [0,min (βr, 1) , (r, β)] , (1 ≤ β ≤ 2) (Sweby, [40]).
The lattice Boltzmann method has been proposed as a discretization of Boltz-
mann’s kinetic equation and is now in wide use in fluid dynamics and beyond
(for an introduction and review see [38]). Instead of fields of moments M , the
lattice Boltzmann method operates with fields of discrete distributions f . This
allows us to construct very simple limiters that do not depend on slopes or
gradients.
All the limiters we construct are based on the representation of distributions
f in the form:
f = f ∗ + ‖f − f ∗‖
f − f ∗
‖f − f ∗‖
where f ∗ is the correspondent quasiequilibrium (conditional equilibrium) for
given moments M , f − f ∗ is the nonequilibrium “part” of the distribution,
which is represented in the form “norm×direction” and ‖f − f ∗‖ is the norm
of that nonequilibrium component (usually this is the entropic norm). Lim-
iters change the norm of the nonequilibrium component f − f ∗, but do not
touch its direction or the equilibrium. In particular, limiters do not change the
macroscopic variables, because moments for f and f ∗ coincide. All limiters we
use are transformations of the form
f 7→ f ∗ + φ× (f − f ∗) (1)
with φ > 0. If f − f ∗ is too big, then the limiter should decrease its norm.
The outline of the paper is as follows. In Sec. 2 we introduce the notions and
notations from lattice Boltzmann theory we need, in Sec. 3 we elaborate the
idea of entropic limiters in more detail and construct several nonequilibrium
entropy limiters for LBM, in Sec. 4 some numerical experiments are described:
(1) 1D athermal shock tube examples;
(2) steady state vortex centre locations and observation of first Hopf bifur-
cation in 2D lid-driven cavity flow.
Concluding remarks are given in Sec. 5.
2 Background
The essence of lattice Boltzmann methods was formulated by S. Succi in the
following maxim: “Nonlinearity is local, non-locality is linear” 2 . We should
even strengthen this statement. Non-locality (a) is linear; (b) is exactly and
explicitly solvable for all time steps; (c) space discretization is an exact oper-
ation.
The lattice Boltzmann method is a discrete velocity method. The finite set
of velocity vectors {vi} (i = 1, ...m) is selected, and a fluid is described by
associating, with each velocity vi, a single-particle distribution function fi =
fi(x, t) which is evolved by advection and interaction (collision) on a fixed
computational lattice. The values fi are named populations. If we look at all
lattice Boltzmann models, one finds that there are two steps: free flight for
time δt and a local collision operation.
The free flight transformation for continuous space is
fi(x, t+ δt) = fi(x− viδt, t).
After the free flight step the collision step follows:
fi(x) 7→ Fi({fj(x)}), (2)
2 S. Succi, “Lattice Boltzmann at all-scales: from turbulence to DNA transloca-
tion”, Mathematical Modelling Centre Distinguished Lecture, University of Leices-
ter, Leicester UK, 15th November 2006.
or in the vector form
f(x) 7→ F (f(x)).
Here, the collision operator F is the set of functions Fi({fj}) (i = 1, ...m).
Each function Fi depends on all fj (j = 1, ...m): new values of the populations
fi at a point x are known functions of all previous population values at the
same point.
The lattice Boltzmann chain “free flight → collision → free flight → collision
· · · ” can be exactly restricted onto any space lattice which is invariant with
respect to space shifts of the vectors viδt (i = 1, ...m). Indeed, free flight trans-
forms the population values at sites of the lattice into the population values
at sites of the same lattice. The collision operator (2) acts pointwise at each
lattice site separately. Much effort has been applied to answer the questions:
“how does the lattice Boltzmann chain approximate the transport equation for
the moments M?”, and “how does one construct the lattice Boltzmann model
for a given macroscopic transport phenomenon?” (a review is presented in
book [38]).
In our paper we propose a universal construction of limiters for all possible
collision operators, and the detailed construction of Fi({fj}) is not important
for this purpose. The only part of this construction we use is the local equilibria
(sometimes these states are named conditional equilibria, quasiequilibria, or
even simpler, equilibria).
The lattice Boltzmann models should describe the macroscopic dynamic, i.e.,
the dynamic of macroscopic variables. The macroscopic variables Mℓ(x) are
some linear functions of the population values at the same point: Mℓ(x) =
imℓifi(x), or in the vector form, M(x) = m(f(x)). The macroscopic vari-
ables are invariants of collisions:
mℓifi =
mℓiFi({fj}) (or m(f) = m(F (f))).
The standard example of the macroscopic variables are hydrodynamic fields
(density–velocity–energy density): {n, nu, E}(x) := ∑i{1, vi, v2i /2}fi(x). But
this is not an obligatory choice. If we would like to solve, by LBM methods,
the Grad equations [22] or some extended thermodynamic equations [25], we
should extend the list of moments (but, at the same time, we should be ready
to introduce more discrete velocities for a proper description of these extended
moment systems). On the other hand, the athermal lattice Boltzmann models
with a shortened list of macroscopic variables {n, nu} are very popular.
The quasiequilibrium is the positive fixed point of the collision operator for
the given macroscopic variablesM . We assume that this point exists, is unique
and depends smoothly on M . For the quasiequilibrium population vector for
given M we use the notation f ∗M , or simply f
∗, if the correspondent value of
M is obvious. We use Π∗ to denote the equilibration projection operation of
a distribution f into the corresponding quasiequilibrium state:
Π∗(f) = f ∗m(f).
For some of the collision models an entropic description of equilibrium is pos-
sible: an entropy density function S(f) is defined and the quasiequilibrium
point f ∗M is the entropy maximiser for given M [26,39].
As a basic example we shall consider the lattice Bhatnagar–Gross–Krook
(LBGK) model with overrelaxation (see, e.g., [3,12,23,28,38]). The LBGK col-
lision operator is
F (f) := Π∗(f) + (2β − 1)(Π∗(f)− f), (3)
where β ∈ [0, 1]. For β = 0, LBGK collisions do not change f , for β = 1/2
these collisions act as equilibration (this corresponds to the Ehrenfests’ coarse
graining [15] further developed in [14,19,20]), for β = 1, LBGK collisions act
as a point reflection with the center at the quasiequilibrium Π∗(f).
It is shown [8] that under some stability conditions and after an initial period
of relaxation, the simplest LBGK collision with overrelaxation [23,38] provides
second order accurate approximation for the macroscopic transport equation
with viscosity proportional to δt(1− β)/β.
The entropic LBGK (ELBM) method [5,20,26,39] differs in the definition
of (3): for β = 1 it should conserve the entropy, and in general has the following
form:
F (f) := (1− β)f + βf̃ , (4)
where f̃ = (1 − α)f + αΠ∗(f). The number α = α(f) is chosen so that the
constant entropy condition is satisfied: S(f) = S(f̃). For LBGK (3), α = 2. Of
course, for ELBM the entropic definition of quasiequilibrium should be valid.
In the low-viscosity regime, LBGK suffers from numerical instabilities which
readily manifest themselves as local blow-ups and spurious oscillations.
The LBM experiences the same spurious oscillation problems near sharp gra-
dients as high order schemes do. The physical properties of the LBM schemes
allows one to construct new types of limiters: the nonequilibrium entropy lim-
iters. In general, they do the same work for LBM as flux limiters do for finite
differences, finite volumes and finite elements methods, but for LBM the main
idea behind the construction of nonequilibrium entropy limiter schemes is to
limit a scalar quantity — nonequilibrium entropy (and not the vectors or ten-
sors of spatial derivatives, as it is for flux limiters). These limiters introduce
some additional dissipation, but all this dissipation could easily be evaluated
through analysis of nonequilibrium entropy production.
Two examples of such limiters have been recently proposed: the positivity
rule [6,31,41] and the Ehrenfests’ regularisation [7]. The positivity rule just
provides positivity of distributions: if a collision step produces negative popu-
lations, then the positivity rule returns them to the boundary of positivity. In
the Ehrenfests’ regularisation, one selects the k sites with highest nonequilib-
rium entropy (the difference between entropy of the state f and entropy of the
corresponding quasiequilibrium state f ∗ at a given space point) that exceed a
given threshold and equilibrates the state in these sites.
The positivity rule and Ehrenfests’ regularisation provide rare, intense and
localised corrections. It is easy and also computationally cheap to organise
more gentle transformation with smooth shift of highly nonequilibrium states
to quasiequilibrium. The following regularisation transformation distributes
its action smoothly: we can just choose in (1) φ = φ(∆S(f)) with sufficiently
smooth function φ(∆S(f)). Here f is the state at some site, f ∗ is the corre-
sponding quasiequilibrium state, S is entropy, and ∆S(f) := S(f ∗)− S(f).
The next step in the development of the nonequilibrium entropy limiters is in
the usage of local entropy filters. The filter of choice here is the median filter: it
does not erase sharp fronts, and is much more robust than convolution filters.
An important problem is: “how does one create nonequilibrium entropy lim-
iters for LBM with non-entropic quasiequilibria?”. We propose a solution
of this problem based on the nonequilibrium Kullback entropy. For entropic
quasiequilibrium the Kullback entropy approach gives the same entropic lim-
iters. In thermodynamics, Kullback entropy belongs to the family of Massieu–
Planck–Kramers functions (canonical or grandcanonical potentials).
3 Nonequilibrium entropy limiters for LBM
3.1 Positivity rule
There is a simple recipe for positivity preservation [6,31,41]: to substitute
nonpositive I
0 (f)(x) by the closest nonnegative state that belongs to the
straight line
λf(x) + (1− λ)Π∗(f(x))| λ ∈ R
defined by the two points, f(x) and corresponding quasiequilibrium. This op-
eration is to be applied pointwise, at points of the lattice where positivity
is violated. The coefficient λ depends on x too. Let us call this recipe the
positivity rule (Fig. 1). This recipe preserves positivity of populations and
probabilities, but can affect accuracy of approximation. The same rule is nec-
F(f )
Positivity fixation 
Positivity domain 
Fig. 1. Positivity rule in action. The motions stops at the positivity boundary.
essary for ELBM (4) when the positive “mirror state” f̃ with the same entropy
as f does not exists on the straight line (5).
3.2 Ehrenfests’ regularisation
To discuss methods with additional dissipation, the entropic approach is very
convenient. Let entropy S(f) be defined for each population vector f = (fi)
(below we use the same letter S for local in space entropy, and hope that
context will make this notation always clear). We assume that the global
entropy is a sum of local entropies for all sites. The local nonequilibrium
entropy is
∆S(f) := S(f ∗)− S(f), (6)
where f ∗ is the corresponding local quasiequilibrium at the same point.
The Ehrenfests’ regularisation [6,7] provides “entropy trimming”: we moni-
tor local deviation of f from the corresponding quasiequilibrium, and when
∆S(f)(x) exceeds a pre-specified threshold value δ, perform local Ehrenfests’
steps to the corresponding quasiequilibrium: f 7→ f ∗ at those points.
So that the Ehrenfests’ steps are not allowed to degrade the accuracy of LBGK
it is pertinent to select the k sites with highest ∆S > δ. The a posteriori
estimates of added dissipation could easily be performed by analysis of entropy
production in Ehrenfests’ steps. Numerical experiments show (see, e.g., [6,7])
that even a small number of such steps drastically improve stability.
To avoid the change of accuracy order “on average”, the number of sites with
this step should be ≤ O(Nh/L) where N is the total number of sites, h is
the step of the space discretization and L is the macroscopic characteristic
length. But this rough estimate of accuracy in average might be destroyed
by concentration of Ehrenfests’ steps in the most nonequilibrium areas, for
example, in the boundary layer. In that case, instead of the total number of
sites N in O(Nh/L) we should take the number of sites in a specific region.
The effects of concentration could be easily analysed a posteriori.
3.3 Smooth limiters of nonequilibrium entropy
The positivity rule and Ehrenfests’ regularisation provide rare, intense and
localised corrections. Of course, it is easy and also computationally cheap to
organise more gentle transformation with a smooth shift of highly nonequilib-
rium states to quasiequilibrium. The following regularisation transformation
distributes its action smoothly:
f 7→ f ∗ + φ(∆S(f))(f − f ∗). (7)
The choice of function φ is highly ambiguous, for example, φ = 1/(1+α∆Sk)
for some α > 0 and k > 0. There are two significantly different choices: (i)
ensemble-independent φ (i.e., the value of φ depends on local value of ∆S
only) and (ii) ensemble-dependent φ, for example
φ(∆S) =
1 + (∆S/(αE(∆S)))k−1/2
1 + (∆S/(αE(∆S)))k
, (8)
where E(∆S) is the average value of ∆S in the computational area, k ≥ 1,
and α & 1. For small ∆S, φ(∆S) ≈ 1 and for ∆S ≫ αE(∆S), φ(∆S) tends
αE(∆S)/∆S. It is easy to select an ensemble-dependent φ with control
of total additional dissipation.
3.4 Monitoring of total dissipation
For given β, the entropy production in one LBGK step in quadratic approxi-
mation for ∆S is:
δLBGKS ≈ [1− (2β − 1)2]
∆S(x),
where x is the grid point, ∆S(x) is nonequilibrium entropy (6) at point x,
δLBGKS is the total entropy production in a single LBGK step. It would be
desirable if the total entropy production for the limiter δlimS was small relative
to δLBGKS:
δlimS < δ0δLBGKS. (9)
A simple ensemble-dependent limiter (perhaps, the simplest one) for a given
δ0 operates as follows. Let us collect the histogram of the ∆S(x) distribution,
and estimate the distribution density, p(∆S). We have to estimate a value
∆S0 that satisfies the following equation:
p(∆S)(∆S −∆S0) d∆S = δ0[1− (2β − 1)2]
p(∆S)∆S d∆S. (10)
In order not to affect distributions with small expectation of ∆S, we choose
a threshold ∆St = max{∆S0, δ}, where δ is some predefined value (as in
the Ehrenfests’ regularisation). For states at sites with ∆S ≥ ∆St we pro-
vide homothety with quasiequilibrium center f ∗ and coefficient
∆St/∆S (in
quadratic approximation for nonequilibrium entropy):
f(x) 7→ f ∗(x) +
(f(x)− f ∗(x)). (11)
3.5 Median entropy filter
The limiters described above provide pointwise correction of nonequilibrium
entropy at the “most nonequilibrium” points. Due to the pointwise nature,
the technique does not introduce any nonisotropic effects, and provides some
other benefits. But if we involve the local structure, we can correct local non-
monotone irregularities without touching regular fragments. For example, we
can discuss monotone increase or decrease of nonequilibrium entropy as regular
fragments and concentrate our efforts on reduction of “speckle noise” or “salt
and pepper noise”. This approach allows us to use the accessible resource of
entropy change (9) more thriftily.
Among all possible filters, we suggest the median filter. The median is a more
robust average than the mean (or the weighted mean) and so a single very
unrepresentative value in a neighborhood will not affect the median value
significantly. Hence, we suppose that the median entropy filter will work better
than entropy convolution filters.
The median filter considers each site in turn and looks at its nearby neighbours.
It replaces the nonequilibrium entropy value ∆S at the point with the median
of those values ∆Smed, then updates f by the transformation (11) with the
homothety coefficient
∆Smed/∆S. The median, ∆Smed, is calculated by first
sorting all the values from the surrounding neighbourhood into numerical order
and then replacing that being considered with the middle value. For example,
if a point has 3 nearest neighbors including itself, then after sorting we have
3 values ∆S: ∆S1 ≤ ∆S2 ≤ ∆S3. The median value is ∆Smed = ∆S2. For 9
nearest neighbors (including itself) we have after sorting ∆Smed = ∆S5. For
27 nearest neighbors ∆Smed = ∆S14.
We accept only dissipative corrections (those resulting in a decrease of ∆S,
∆Smed < ∆S) because of the second law of thermodynamics. The analogue
of (10) is also useful for acceptance of the most significant corrections.
Median filtering is a common step in image processing [34] for the smoothing
of signals and the suppression of impulse noise with preservation of edges.
3.6 Entropic steps for non-entropic quasiequilibria
Beyond the quadratic approximation for nonequilibrium entropy all the logic of
the above mentioned constructions remain the same. There exists only one sig-
nificant change: instead of a simple homothety (11) with coefficient
∆St/∆S
the transformation (7) should be applied, where the multiplier φ is a solution
of the nonlinear equation
S(f ∗ + φ(f − f ∗)) = S(f ∗)−∆St.
This is essentially the same equation that appears in the definition of ELBM
steps (4).
More differences emerge for LBM with non-entropic quasiequilibria. The main
idea here is to reason that non-entropic quasiequilibria appear only because of
technical reasons, and approximate continuous physical entropic quasiequilib-
ria. This is not an approximation of a density function, but an approximation
of measure, i.e., from the cubature formula:
f(v) ≈
fiδ(v − vi)
ϕ(v)f(v) dv ≈
ϕ(vi)fi.
The discrete populations fi are connected to continuous (and sufficiently
smooth) densities f(v) by cubature weights fi ≈ wif(vi). These weights for
quasiequilibria are found by moment and flux matching conditions [37]. It
is impossible to approximate the BGS entropy
f ln fdv just by discretiza-
tion (to change integration by summation, and continuous distribution f by
discrete fi), because cubature weights appear as additional variables. Never-
theless, the approximate discretization of the Kullback entropy SK [30] does
not change its form:
SK(f) = −
f(v) ln
f ∗(v)
dv ≈ −
fi ln
, (12)
because fi/f
i approximates the ratio of functions f(v)/f
∗(v) and
i fi . . .
gives the integral
f(v) . . .dv approximation. Here, in (12), the state f ∗ is the
quasiequilibrium with the same values of the macroscopic variables as f . More-
over, for given values of the macroscopic variables, SK(f) achieves its maxi-
mum at the point f = f ∗ (both for continuous and for discrete distributions).
The corresponding maximal value is zero. Below, SK is the discrete Kullback
entropy. If the approximate discrete quasiequilibrium f ∗ is non-entropic, we
can use −SK(f) instead of ∆S(f).
For entropic quasiequilibria with perfect entropy the discrete Kullback entropy
gives the same ∆S: −SK(f) = ∆S(f). Let the discrete entropy have the
standard form for an ideal (perfect) mixture [27].
S(f) = −
fi ln
After the classical work of Zeldovich [44], this function is recognised as a
useful instrument for the analysis of kinetic equations (especially in chemical
kinetics [21]). If we define f ∗ as the conditional entropy maximum for given
k mjkfk, then
ln f ∗k =
µjmjk,
where µj(M) are the Lagrange multipliers (or “potentials”). For this entropy
and conditional equilibrium we find
∆S = S(f ∗)− S(f) =
fi ln
, (13)
if f and f ∗ have the same moments, m(f) = m(f ∗). The right hand side
of (13) is −SK(f).
In thermodynamics, the Kullback entropy belongs to the family of Massieu–
Planck–Kramers functions (canonical or grandcanonical potentials). There is
another sense of this quantity: SK is the relative entropy of f with respect to
f ∗ [18,35].
In quadratic approximation,
−SK(f) =
fi ln
(fi − f ∗i )2
3.7 ELBM collisions as a smooth limiter
On the base of numerical tests, the authors of [41] claim that the positivity
rule provides the same results (in the sense of stability and absence/presence
of spurious oscillations) as the ELBM models, but ELBM provides better
accuracy.
For the formal definition of ELBM (4) our tests do not support claims that
ELBM erases spurious oscillations (see below). Similar observation for Burgers
equation was previously published in [4]. We understand this situation in the
following way. The entropic method consists at least of three components:
(1) entropic quasiequilibrium, defined by entropy maximisation;
(2) entropy balanced collisions (4) that have to provide proper entropy bal-
ance;
(3) a method for the solution of the transcendental equation S(f) = S(f̃) to
find α = α(f) in (4).
It appears that the first two items do not affect spurious oscillations at all,
if we solve the equation for α(f) with high accuracy. Additional viscosity
is, potentially, added by explicit analytic formulas for α(f). In order not to
decrease entropy, errors in these formulas always increase dissipation. This
can be interpreted as a hidden transformation of the form (7), where the
coefficients in φ depend also on f ∗.
3.8 Monotonic and double monotonic limiters
Two monotonicity properties are important in the theory of nonequilibrium
entropy limiters:
(1) a limiter should move the distribution to equilibrium: in all cases of (1)
0 ≤ φ ≤ 1. This is the dissipativity condition which means that limiters
never produce negative entropy.
(2) a limiter should not change the order of states on the line: if for two
distributions with the same moments, f and f ′, ∆S(f) > ∆S(f ′) before
the limiter transformation, then the same inequality should hold after the
limiter transformation too. For example, for the limiter (7) it means that
∆S(f ∗ + xφ(∆S(f ∗ + x(f − f ∗))(f − f ∗)) is a monotonically increasing
function of x > 0.
In quadratic approximation,
∆S(f ∗ + x(f − f ∗)) = x2∆S(f),
∆S(f ∗ + xφ(∆S(f ∗ + x(f − f ∗))(f − f ∗)) = x2φ2(x2∆S(f)),
and the second monotonicity condition transforms into the following require-
ment: yφ(y2s) is a monotonically increasing (not decreasing) function of y > 0
for any s > 0.
If a limiter satisfies both monotonicity conditions, we call it “double mono-
tonic”. For example, Ehrenfests’ regularisation satisfies the first monotonicity
condition, but obviously violates the second one. The limiter (8) violates the
first condition for small ∆S, but is dissipative and satisfies the second one in
quadratic approximation for large ∆S. The limiter with φ = 1/(1+α∆Sk) al-
ways satisfies the first monotonicity condition, violates the second if k > 1/2,
and is double monotonic (in quadratic approximation for the second condi-
tion), if 0 < k ≤ 1/2. The threshold limiters (11) are also double monotonic.
Of course, it is not forbidden to use any type of limiters under the local and
global control of dissipation, but double monotonic limiters provide some nat-
ural properties automatically, without additional care.
4 Numerical experiment
To conclude this paper we report some numerical experiments conducted to
demonstrate the performance of some of the proposed nonequilibrium entropy
limiters for LBM from Sec. 3.
4.1 Velocities and quasiequilibria
We will perform simulations using both entropic and non-entropic quasiequi-
libria, but we always work with an athermal LBM model. Whenever we use
non-entropic quasiequilibria we employ Kullback entropy (13).
In 1D, we use a lattice with spacing and time step δt = 1 and a discrete
velocity set {v1, v2, v3} := {0,−1, 1} so that the model consists of static, left-
and right-moving populations only. The subscript i denotes population (not
lattice site number) and f1, f2 and f3 denote the static, left- and right-moving
populations, respectively. The entropy is S = −H , with
H = f1 log(f1/4) + f2 log(f2) + f3 log(f3),
(see, e.g., [27]) and, for this entropy, the local entropic quasiequilibrium state
f ∗ is available explicitly:
f ∗1 =
1 + 3u2
f ∗2 =
(3u− 1) + 2
1 + 3u2
f ∗3 = −
(3u+ 1)− 2
1 + 3u2
where
fi, u :=
vifi. (15)
The standard non-entropic polynomial quasiequilibria [38] are:
f ∗1 =
f ∗2 =
(1− 3u+ 3u2),
f ∗3 =
(1 + 3u+ 3u2).
In 2D, we employ a uniform 9-speed square lattice with discrete velocities
{vi | i = 0, 1, . . . 8}: v0 = 0, vi = (cos((i − 1)π/2), sin((i − 1)π/2)) for i =
1, 2, 3, 4, vi =
2(cos((i − 5)π
), sin((i − 5)π
)) for i = 5, 6, 7, 8. The
numbering f0, f1, . . . , f8 are for the static, east, north, west, south, north-
east, northwest, southwest and southeast-moving populations, respectively.
As usual, the entropic quasiequilibrium state, f ∗, can be uniquely determined
by maximising an entropy functional
S(f) = −
fi log
subject to the constraints of conservation of mass and momentum [2]:
f ∗i = ρWi
1 + 3u2j
2uj +
1 + 3u2j
1− uj
. (17)
Here, the lattice weights, Wi, are given lattice-specific constants: W0 = 4/9,
W1,2,3,4 = 1/9 and W5,6,7,8 = 1/36. Analogously to (15), the macroscopic vari-
ables ρ and u = (u1, u2) are the zeroth and first moments of the distribution
f , respectively. The standard non-entropic polynomial quasiequilibria [38] are:
f ∗i = ρWi
1 + 3viu+
9(viu)
. (18)
4.2 LBGK and ELBM
The governing equations for LBGK are
fi(x+ vi, t+ 1) = f
i (x, t) + (2β − 1)(f ∗i (x, t)−fi(x, t)), (19)
where β = 1/(2ν + 1).
For ELBM (4) the governing equations are:
fi(x+ vi, t+ 1) = (1− β)f ∗i (x, t) + βf̃i(x, t), (20)
with β as above and f̃ = (1−α)f+αf ∗. The parameter, α, is chosen to satisfy
a constant entropy condition. This involves finding the nontrivial root of the
equation
S((1− α)f + αf ∗) = S(f). (21)
To solve (21) numerically we employ a robust routine based on bisection. The
root is solved to an accuracy of 10−15 and we always ensure that the returned
value of α does not lead to a numerical entropy decrease. We stipulate that
if, at some site, no nontrivial root of (21) exists we will employ the positivity
rule instead (Fig. 1).
4.3 Shock tube
The 1D shock tube for a compressible athermal fluid is a standard benchmark
test for hydrodynamic codes. Our computational domain will be the interval
[0, 1] and we discretize this interval with 801 uniformly spaced lattice sites.
We choose the initial density ratio as 1:2 so that for x ≤ 400 we set ρ = 1.0
else we set ρ = 0.5. We will fix the kinematic viscosity of the fluid at ν = 10−9.
4.3.1 Comparison of LBGK and ELBM
In Fig. 2 we compare the shock tube density profile obtained with LBGK
(using entropic quasiequilibria (14)) and ELBM. On the same panel we also
display both the total entropy S(t) :=
x S(x, t) and total nonequilibrium
entropy ∆S(t) :=
x∆S(x, t) time histories. As expected, by construction,
we observe that total entropy is (effectively) constant for ELBM. On the other
hand, LBGK behaves non-entropically for this problem. In both cases we ob-
serve that nonequilibrium entropy grows with time.
As we can see, the choice between the two collision formulas LBGK (19)
or ELBM (20) does not affect spurious oscillation, and reported regularisa-
tion [29] is, perhaps, the result of approximate analytical solution of the equa-
tion (21). Inaccuracy in the solution of (21) can be interpreted as a hidden
nonequilibrium entropy limiter. But it should be mentioned that the entropic
method consists not only of the collision formula, but, what is important, in-
cludes a special choice of quasiequilibrium that could improve stability (see,
e.g., [13]). Indeed, when we compare ELBM with LBGK using either entopic or
standard polynomial quasiequilibria, there appears to be some gain in employ-
ing entropic quasiequilibria (Fig. 3). We observe that the post-shock region
for the LBGK simulations is more oscillatory when polynomial quasiequilibria
are used. In Fig. 3 we have also included a panel with the simulation result-
ing from a much higher viscosity (ν = 3.3333 × 10−2). Here, we observe no
appreciable differences in the results of LBGK and ELBM.
0 0.5 1
0 100 200 300 400
0 100 200 300 400
0 0.5 1
0 100 200 300 400
0 100 200 300 400
Fig. 2. Density and profile of the 1:2 athermal shock tube simulation with ν = 10−9
after 400 time steps using (a) LBGK (19); (b) ELBM (20). In this example, no
negative population are produced by any of the methods so the positivity rule is
redundant. For ELBM in this example, (21) always has a nontrivial root. Total
entropy and nonequilibrium entropy time histories are shown in panels (c), (d) and
(e), (f) for LBGK and ELBM, respectively.
0 0.5 1
0 0.5 1
0 0.5 1
0 0.5 1
0 0.5 1
0 0.5 1
Fig. 3. Density and velocity profile of the 1:2 isothermal shock tube simula-
tion after 400 time steps using (a) LBGK (19) with polynomial quasiequilib-
ria (16) [ν = 3.3333 × 10−2]; (b) LBGK (19) with entropic quasiequilibria (14)
[ν = 3.3333 × 10−2]; (c) ELBM (20) [ν = 3.3333 × 10−2]; (d) LBGK (19) with
polynomial quasiequilibria (16) [ν = 10−9]; (e) LBGK (19) with entropic quasiequi-
libria (14) [ν = 10−9]; (f) ELBM (20) [ν = 10−9].
4.3.2 Nonequilibrium entropy limiters.
Now, we would like to demonstrate just a representative sample of the many
possibilities of limiters suggested in Sec. 3. In each case the limiter is im-
plemented by a post-processing routine immediately following the collision
step (either LBGK (19) or ELBM (20)). Here, we will only consider LBGK
collisions and entropic quasiequilibria (14).
The post-processing step adjusts f by the update formula:
f 7→ f ∗ + φ(∆S)(f − f ∗),
where ∆S is defined by (6) and φ is a limiter function.
For the Ehrenfests’ regularisation one would choose
φ(∆S)(x) =
1, ∆S(x) ≤ δ,
0, otherwise,
where δ is a pre-specified threshold value. Furthermore, it is pertinent to select
just k sites with highest ∆S > δ. This limiter has been previously applied to
the shock tube problem in [6,7,8] and we will not reproduce those results here.
Instead, our first example will be the following smooth limiter:
φ(∆S) =
1 + α∆Sk
. (22)
For this limiter, we will fix k = 1/2 (so that the limiter is double monotonic in
quadratic approximation to entropy) and compare the density profiles for α =
δ/(E(∆S)k), δ = 0.1, 0.01, 0.001. We have also ensured an ensemble-dependent
limiter because of the dependence of α on the average E(∆S). As with Fig. 2,
we accompany each panel with the total entropy and nonequilibrium entropy
histories. Note the different scales for nonequilibrium entropy. Note also that
entropy (necessarily) now grows due to the additional dissipation.
Our next example (Fig. 5) considers the threshold filter (10). In this example
we choose the estimates ∆S0 = 5E(∆S), 10E(∆S), 20E(∆S) and fix the tol-
erance δ = 0 so that the influence of the threshold alone can be studied. Only
entropic adjustments are accepted in the limiter: ∆St ≤ ∆S. As the threshold
increases, nonequilibrium entropy grows faster and spurious begin to appear.
Finally, we test the median filter (Fig. 6). We choose a minimal filter so that
only the nearest neighbours are considered. As with the threshold filter, we
introduce a tolerance δ and we try the values δ = 10−3, 10−4, 10−5. Only
entropic adjustments are accepted in the limiter: ∆Smed ≤ ∆S.
0 0.5 1
0 100 200 300 400
0 100 200 300 400
0 0.5 1
0 100 200 300 400
0 100 200 300 400
0.025
0 0.5 1
0 100 200 300 400
0 100 200 300 400
Fig. 4. Density and profile of the 1:2 athermal shock tube simulation with ν = 10−9
after 400 time steps using LBGK (19) and the smooth limiter (22) with k = 1/2,
α = δ/(E(∆S)k) and (a) δ = 0.1; (b) δ = 0.01 and (c) δ = 0.001. Total entropy and
nonequilibrium entropy time histories for each parameter set {k, α(δ)} are displayed
in the adjacent panels.
We have seen that each of the examples we have considered (Fig. 4, Fig. 5
and Fig. 6) is capable of subduing spurious post-shock oscillations compared
with LBGK (or ELBM) on this problem (cf. Fig. 2). Of course, by limiting
nonequilibrium entropy the result is necessarily an increase in entropy.
From our experiences our recommendation is that the median filter is the
superior choice amongst all the limiters suggested in Sec. 3. The action of the
median filter is found to be both extremely gentle and, at the same time, very
effective.
4.4 Lid-driven cavity
Our second numerical example is the classical 2D lid-driven cavity flow. A
square cavity of side length L is filled with fluid with kinematic viscosity ν
(initially at rest) and driven by the cavity lid moving at a constant velocity
(u0, 0) (from left to right in our geometry).
0 0.5 1
0 100 200 300 400
0 100 200 300 400
0 0.5 1
0 100 200 300 400
0 100 200 300 400
0 0.5 1
0 100 200 300 400
0 100 200 300 400
Fig. 5. Density and profile of the 1:2 athermal shock tube simulation with ν = 10−9
after 400 time steps using LBGK (19) and the threshold limiter (10) with (a)
∆St = 5E(∆S); (b) ∆St = 10E(∆S) and (c) ∆St = 20E(∆S). Total entropy and
nonequilibrium entropy time histories for each threshold ∆St are displayed in the
adjacent panels.
We will simulate the flow on a 100 × 100 grid using LBGK regularised with
the median filter limiter. Unless otherwise stated, we use entropic quasiequilib-
ria (17). The implementation of the filter is as follows: the filter is not applied
to boundary nodes; for nodes which immediately neighbour the boundary the
stencil consists of the 3 nearest neighbours (including itself) closest to the
boundary; for all other nodes the minimal stencil of 9 nearest neighbours is
used.
We have purposefully selected such a coarse grid simulation because it is read-
ily found that, on this problem, unregularised LGBK fails (blows-up) for all
but the most modest Reynolds numbers Re := Lu0/ν.
4.4.1 Steady-state vortex centres
For modest Reynolds number the system settles to a steady state in which the
dominant features are a primary central rotating vortex, with several counter-
rotating secondary vortices located in the bottom-left, bottom-right (and pos-
0 0.5 1
0 100 200 300 400
0 100 200 300 400
0 0.5 1
0 100 200 300 400
0 100 200 300 400
0 0.5 1
0 100 200 300 400
0 100 200 300 400
Fig. 6. Density and profile of the 1:2 athermal shock tube simulation with ν = 10−9
after 400 time steps using LBGK (19) and the minimal median limiter with (a)
δ = 10−5; (b) δ = 10−4 and (c) δ = 10−3. Total entropy and nonequilibrium
entropy time histories for each tolerance δ are displayed in the adjacent panels.
sibly top-left) corners.
Steady state has been extensively investigated in the literature. The study
of Hou et al [24] simulates the flow over a range of Reynolds numbers using
unregularised LBGK on a 256×256 grid. Primary and secondary vortex centre
data is provided. We compare this same statistic for the present median filtered
coarse grid simulation. We will employ the same convergence criteria used
in [24]. Namely, we deem that steady state has been reached by ensuring
that the difference between the maximum value of the stream function for
successive 10, 000 time steps is less that 10−5. The stream function, which is
not a primary variable in the LBM simulation, is obtained from the velocity
data by integration using Simpson’s rule. Vortex centres are characterised as
local extrema of the stream function.
We compare our results with the LBGK simulations in [24] and [41]. To align
ourselves with these studies we specify the following boundary condition: lid
profile is constant; remaining cavity walls are subject to the “bounce-back”
condition [38]. In our simulations, the initial uniform fluid density profile is
ρ = 2.7 and the velocity of the lid is u0 = 1/10 (in lattice units).
Collected in Table 1, for Re = 2000, 5000 and 7500, are the coordinates of
the primary and secondary vortex centres using (a) unregularised LBGK; (b)
LBGK with median filter limiter (δ = 10−3); (c) LBGK with median filter lim-
iter (δ = 10−4), all with non-entropic polynomial quasiequilibria (18). Lines
(d), (e) and (f) are the same but with entropic quasiequilibria (17). The re-
maining lines of Table 1 are as follows: (g) literature data [24] (unregularised
LBGK on a 256×256 grid); (h) literature data [41] (positivity rule); (i) litera-
ture data [41] (ELBM). With the exception of (g), all simulation are conducted
on a 100 × 100 grid. The top-left vortex does not appear at Re = 2000 and
no data was provided for it in [41] at Re = 5000. The unregularised LBGK
Re = 7500 simulation blows-up in finite time and the simulation becomes
meaningless. The y-coordinate of the two lower-vortices at Re = 5000 in (i)
appear anomalously small and were not reproduced by our experiments with
the positivity rule (not shown).
We have conducted two runs of the experiment with the median filter param-
eter δ = 10−3 and δ = 10−4. Despite the increased number of realisations the
vortex centre locations remain effectively unchanged and we detect no signif-
icant variation between the two runs. This demonstrates the gentle nature of
the median filter. At Reynolds Re = 2000 the median filter has no effect at all
on the vortex centres compared with LBGK.
We find no significant differences between the experiments with entropic and
non-entropic polynomial quasiequilibria in this test.
The coordinates of the primary vortex centre for unregularised LBGK at Re =
5000 are already quite inaccurate as LBGK begins to lose stability. Stability
is lost entirely at some critical Reynolds number 5000 < Re ≤ 7500 and the
simulation blows-up.
Furthermore, we have agreement (within grid resolution) with the data given
in [24]. Also compiled in Table 1 is the data from the limiter experiments
conducted in [41] (although not explicitly discussed in the language of limiters
by the authors of that work). In [41] the authors give vortex centre data for
the positivity rule (Fig. 1) and for ELBM (which we interpret as containing a
hidden limiter). In [41] the positivity rule is called FIX-UP.
As Reynolds number increases the flow in the cavity is no longer steady and a
more complicated flow pattern emerges. On the way to a fully developed tur-
bulent flow, the lid-driven cavity flow is known to undergo a series of period
doubling Hopf bifurcations. On our coarse grid, we observe that the coordi-
nates of the primary vortex centre (maximum of the stream function) is a very
robust feature of the flow, with little change between coordinates (no change
in y-coordinates) computed at Re = 5000 and Re = 7500 with the median fil-
ter. On one hand, because of this observation it becomes inconclusive whether
Table 1
Primary and secondary vortex centre coordinates for the lid-driven cavity flow at
Re = 2000, 5000, 7500.
Primary Lower-left Lower-right Top-left
Re x y x y x y x y
2000 (a) 0.5253 0.5455 0.0909 0.1010 0.8384 0.1010 Not applicable
2000 (b) 0.5253 0.5455 0.0909 0.1010 0.8384 0.1010 Not applicable
2000 (c) 0.5253 0.5455 0.0909 0.1010 0.8384 0.1010 Not applicable
2000 (d) 0.5253 0.5455 0.0909 0.1010 0.8384 0.1010 Not applicable
2000 (e) 0.5253 0.5455 0.0909 0.1010 0.8384 0.1010 Not applicable
2000 (f) 0.5253 0.5455 0.0909 0.1010 0.8384 0.1010 Not applicable
2000 (g) 0.5255 0.5490 0.0902 0.1059 0.8471 0.0980 Not applicable
2000 (h) 0.5200 0.5450 0.0900 0.1000 0.8300 0.0950 Not applicable
2000 (i) 0.5200 0.5500 0.0890 0.1000 0.8300 0.1000 Not applicable
5000 (a) 0.5152 0.6061 0.0808 0.1313 0.7980 0.0707 0.0505 0.8990
5000 (b) 0.5152 0.5354 0.0808 0.1313 0.8081 0.0808 0.0606 0.8990
5000 (c) 0.5152 0.5354 0.0808 0.1313 0.8081 0.0808 0.0707 0.8889
5000 (d) 0.5152 0.5960 0.0808 0.1313 0.8081 0.0808 0.0505 0.8990
5000 (e) 0.5152 0.5354 0.0808 0.1313 0.8081 0.0808 0.0606 0.8990
5000 (f) 0.5152 0.5354 0.0808 0.1313 0.8081 0.0808 0.0707 0.8889
5000 (g) 0.5176 0.5373 0.0784 0.1373 0.8078 0.0745 0.0667 0.9059
5000 (h) 0.5150 0.5680 0.0950 0.0100 0.8450 0.0100 Not available
5000 (i) 0.5150 0.5400 0.0780 0.1350 0.8050 0.0750 Not available
7500 (a) — — — — — — — —
7500 (b) 0.5051 0.5354 0.0707 0.1515 0.7879 0.0707 0.0606 0.8990
7500 (c) 0.5051 0.5354 0.0707 0.1515 0.7879 0.0707 0.0707 0.8889
7500 (d) — — — — — — — —
7500 (e) 0.5051 0.5354 0.0707 0.1515 0.7879 0.0707 0.0606 0.8990
7500 (f) 0.5051 0.5354 0.0707 0.1515 0.7879 0.0707 0.0707 0.8889
7500 (g) 0.5176 0.5333 0.0706 0.1529 0.7922 0.0667 0.0706 0.9098
the median limiter is adding too much additional dissipation. On the other
hand, a more studious choice of control criteria may indicate that the first
bifurcation has already occurred by Re = 7500.
4.4.2 First Hopf bifurcation
A survey of available literature reveals that the precise value of Re at which
the first Hopf bifurcation occurs is somewhat contentious, with most current
studies (all of which are for incompressible flow) ranging from around Re =
7400–8500 [9,32,33]. Here, we do not intend to give a precise value because
it is a well observed grid effect that the critical Reynolds number increases
(shifts to the right) with refinement (see, e.g., Fig. 3 in [33]). Rather, we
will be content to localise the first bifurcation and, in doing so, demonstrate
that limiters are capable of regularising without effecting fundamental flow
features.
To localise the first bifurcation we take the following algorithmic approach.
Entropic quasiequilibria are in use. The initial uniform fluid density profile
is ρ = 1.0 and the velocity of the lid is u0 = 1/10 (in lattice units). We
record the unsteady velocity data at a single control point with coordinates
(L/16, 13L/16) and run the simulation for 5000 non-dimensionless time units
(5000L/u0 time steps). Let us denote the final 1% of this signal by (usig, vsig).
We then compute the energy Eu (ℓ2-norm normalised by non-dimensional
signal duration) of the deviation of usig from its mean:
Eu :=
u0|usig|
(usig − usig)
, (23)
where |usig| and usig denote the length and mean of usig, respectively. We
choose this robust statistic instead of attempting to measure signal amplitude
because of numerical noise in the LBM simulation. The source of noise in LBM
is attributed to the existence of an inherently unavoidable neutral stability
direction in the numerical scheme (see, e.g., [8]).
We opt not to employ the “bounce-back” boundary condition used in the pre-
vious steady state study. Instead we will use the diffusive Maxwell boundary
condition (see, e.g., [11]), which was first applied to LBM in [1]. The essence
of the condition is that populations reaching a boundary are reflected, propor-
tional to equilibrium, such that mass-balance (in the bulk) and detail-balance
are achieved. The boundary condition coincides with “bounce-back” in each
corner of the cavity.
To illustrate, immediately following the advection of populations consider the
situation of a wall, aligned with the lattice, moving with velocity uwall and
with outward pointing normal to the wall in the negative y-direction (this is
the situation on the lid of the cavity with uwall = u0). The implementation
of the diffusive Maxwell boundary condition at a boundary site (x, y) on this
wall consists of the update
fi(x, y, t+ 1) = γf
i (uwall), i = 4, 7, 8,
f2(x, y, t) + f5(x, y, t) + f6(x, y, t)
f ∗4 (uwall) + f
7 (uwall) + f
8 (uwall)
Observe that, because density is a linear factor of the quasiequilibria (17),
the density of the wall is inconsequential in the boundary condition and can
therefore be taken as unity for convenience. As is usual, only those populations
pointing in to the fluid at a boundary site are updated. Boundary sites do not
undergo the collisional step that the bulk of the sites are subjected to.
We prefer the diffusive boundary condition over the often preferred “bounce-
back” boundary condition with constant lid profile. This is because we have
experienced difficulty in separating the aforementioned numerical noise from
the genuine signal at a single control point using “bounce-back”. We remark
that the diffusive boundary condition does not prevent unregularised LBGK
from failing at some critical Reynolds number Re > 5000.
Now, we conduct an experiment and record (23) over a range of Reynolds
numbers. In each case the median filter limiter is employed with parameter
δ = 10−3. Since the transition between steady and periodic flow in the lid-
driven cavity is known to belong to the class of standard Hopf bifurcations
we are assured that E2u ∝ Re [16]. Fitting a line of best fit to the resulting
data localises the first bifurcation in the lid-driven cavity flow to Re = 7135
(Fig. 7). This value is within the tolerance of Re = 7402±4% given in [33] for
a 100×100 grid. We also provide a (time averaged) phase space trajectory and
Fourier spectrum for Re = 7375 at the monitoring point (Fig. 8 and Fig. 9)
which clearly indicate that the first bifurcation has been observed.
5 Conclusions
Entropy and thermodynamics are important for stability of the lattice Boltz-
mann methods. It is now clear: after almost 10 years of work since the pub-
lication of [26] proved this statement (the main reviews are [5,28,39]). The
question is now: “how does one utilise, optimally, entropy and thermody-
namic structures in lattice Boltzmann methods?”. In our paper we attempt to
propose a solution (temporary, at least). Our approach is applicable to both
entropic as well as for non-entropic polynomial quasiequilibria.
5750 6000 6250 6500 6750 7000 7250 7500 7750 8000
0.005
0.015
0.025
0.035
0.045
(7135,0)
Fig. 7. Plot of energy squared, E2u (23), as a function of Reynolds number, Re, using
LBGK regularised with the median filter limiter with δ = 10−3 on a 100× 100 grid.
Straight lines are lines of best fit. The intersection of the sloping line with the x-axis
occurs close to Re = 7135.
We have constructed a system of nonequilibrium entropy limiters for the lattice
Boltzmann methods (LBM):
• the positivity rule that provides positivity of distribution;
• the pointwise entropy limiters based on selection and correction of most
nonequilibrium values;
• filters of nonequilibrium entropy, and the median filter as a filter of choice.
All these limiters exploit physical properties of LBM and allow control of total
additional entropy production. In general, they do the same work for LBM as
flux limiters do for finite differences, finite volumes and finite elements meth-
ods, and come into operation when sharp gradients are present. For smoothly
changing waves, the limiters do not operate and the spatial derivatives can be
represented by higher order approximations without introducing non-physical
oscillations. But there are some differences too: for LBM the main idea behind
the construction of nonequilibrium entropy limiter schemes is to limit a scalar
quantity — the nonequilibrium entropy — or to delete the “salt and pepper”
noise from the field of this quantity. We do not touch the vectors or tensors
of spatial derivatives, as it is for flux limiters.
Standard test examples demonstrate that the developed limiters erase spurious
oscillations without blurring of shocks, and do not affect smooth solutions. The
limiters we have tested do not produce a noticeable additional dissipation and
Fig. 8. Velocity components as a function of time for the signal (usig, vsig) at the
monitoring point (L/16, 13L/16) using LBGK regularised with the median filter
limiter with δ = 10−3 on a 100 × 100 grid (Re = 7375). Dots represent simulation
results and the solid line is a 100 step time average of the signal.
allow us to reproduce the first Hopf bifurcation for 2D lid-driven cavity on a
coarse 100× 100 grid. At the same time the simplest median filter deletes the
spurious post-shock oscillations for low viscosity.
Perhaps, it is impossible to find one best nonequilibrium entropy limiter for
all problems. It is a special task to construct the optimal limiters for a specific
classes of problems.
Acknowledgments
Discussion of the preliminary version of this work with S. Succi and par-
ticipants of the lattice Boltzmann workshop held on 15th November 2006
in Leicester (UK) was very important. Author A. N. Gorban is grateful to
S. K. Godunov for the course of numerical methods given many years ago at
Novosibirsk University. This work is supported by Engineering and Physical
Sciences Research Council (EPSRC) grant number GR/S95572/01.
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2
Frequency
Fig. 9. Relative amplitude spectrum for the signal usig at the monitoring point
(L/16, 13L/16) using LBGK regularised with the median filter limiter with δ = 10−3
on a 100 × 100 grid (Re = 7375). We measure a dominant frequency of ω = 0.525.
References
[1] S. Ansumali, and I. V. Karlin. Kinetic boundary conditions in the lattice
Boltzmann method. Phys. Rev. E 66, 026311 2002.
[2] S. Ansumali S, I. V. Karlin, H. C. Ottinger. Minimal entropic kinetic models
for hydrodynamics Europhys. Let. 63 (6): 798-804. 2003
[3] R. Benzi, S. Succi, and M. Vergassola. The lattice Boltzmann-equation - theory
and applications. Physics Reports, 222(3):145–197, 1992.
[4] B. M. Boghosian, P. J. Love, and J. Yepez. Entropic lattice Boltzmann model
for Burgers equation. Phil. Trans. Roy. Soc. A, 362:1691–1702, 2004.
[5] B. M. Boghosian, J. Yepez, P. V. Coveney, and A. J. Wager. Entropic lattice
Boltzmann methods. R. Soc. Lond. Proc. Ser. A Math. Phys. Eng. Sci.,
457(2007):717–766, 2001.
[6] R. A. Brownlee, A. N. Gorban, and J. Levesley. Stabilisation of the lattice-
Boltzmann method using the Ehrenfests’ coarse-graining. cond-mat/0605359,
2006.
[7] R. A. Brownlee, A. N. Gorban, and J. Levesley. Stabilisation of the lattice-
Boltzmann method using the Ehrenfests’ coarse-graining. Phys. Rev. E,
74:037703, 2006.
http://arxiv.org/abs/cond-mat/0605359
[8] R. A. Brownlee, A.N. Gorban, and J. Levesley. Stability and stabilization of the
lattice Boltzmann method, Phys. Rev. E, to appear. cond-mat/0611444, 2006.
[9] C.-H. Bruneau, and M. Saad. The 2D lid-driven cavity problem revisited.
Comput. Fluids, 35:326–348, 2006.
[10] S. R. Chatkravathy, and S. Osher. High resolution applications of the Osher
upwind scheme for the Euler equations, AIAA Paper 83-1943, Proc. AIAA 6th
Comutational Fluid Dynamics Conference, (1983), 363–373.
[11] C. Cercignani. Theory and Application of the Boltzmann Equation. Scottish
Academic Press, Edinburgh, 1975.
[12] S. Chen and G. D. Doolen. Lattice Boltzmann method for fluid flows. Annu.
Rev. Fluid. Mech., 30:329–364, 1998.
[13] S. S. Chikatamarla and I. V. Karlin. Entropy and Galilean Invariance of Lattice
Boltzmann Theories. Phys. Rev. Lett. 97, 190601 (2006)
[14] A. J. Chorin, O. H. Hald, R. Kupferman. Optimal prediction with memory,
Physica D 166 (2002), 239–257.
[15] P. Ehrenfest and T. Ehrenfest. The conceptual foundations of the statistical
approach in mechanics. Dover Publications Inc., New York, 1990.
[16] N. K. Ghaddar, K. Z. Korczak, B. B. Mikic, and A. T. Patera. Numerical
investigation of incompressible flow in grooved channels. Part 1. Stability and
self-sustained oscillations. J. Fluid Mech., 163:99–127, 1986.
[17] S. K. Godunov. A Difference Scheme for Numerical Solution of Discontinuous
Solution of Hydrodynamic Equations, Math. Sbornik, 47 (1959), 271-306.
[18] A. N. Gorban. Equilibrium encircling. Equations of chemical kinetics and their
thermodynamic analysis, Nauka, Novosibirsk, 1984.
[19] A. N. Gorban, I. V. Karlin, H. C. Öttinger, and L. L. Tatarinova. Ehrenfest’s
argument extended to a formalism of nonequilibrium thermodynamics. Phys.
Rev. E, 62:066124, 2001.
[20] A. N. Gorban. Basic types of coarse-graining. In A. N. Gorban, N. Kazantzis,
I. G. Kevrekidis, H.-C. Öttinger, and C. Theodoropoulos, editors, Model
Reduction and Coarse-Graining Approaches for Multiscale Phenomena, pages
117–176. Springer, Berlin-Heidelberg-New York, 2006. cond-mat/0602024.
[21] A. Gorban, B. Kaganovich, S. Filippov, A. Keiko, V. Shamansky, I. Shirkalin,
Thermodynamic Equilibria and Extrema: Analysis of Attainability Regions and
Partial Equilibrium, Springer, Berlin, Heidelberg, New York, 2006.
[22] H. Grad. On the kinetic theory of rarefied gases, Comm. Pure and Appl. Math.
2 4, (1949), 331–407.
[23] F. Higuera, S. Succi, and R. Benzi. Lattice gas – dynamics with enhanced
collisions. Europhys. Lett., 9:345–349, 1989.
http://arxiv.org/abs/cond-mat/0611444
http://arxiv.org/abs/cond-mat/0602024
[24] S. Hou, Q. Zou, S. Chen, G. Doolen and A. C. Cogley. Simulation of cavity
flow by the lattice Boltzmann method. J. Comp. Phys., 118:329–347, 1995.
[25] D. Jou, J. Casas-Vázquez, G. Lebon. Extended irreversible thermodynamics,
Springer, Berlin, 1993.
[26] I. V. Karlin, A. N. Gorban, S. Succi, and V. Boffi. Maximum entropy principle
for lattice kinetic equations. Phys. Rev. Lett., 81:6–9, 1998.
[27] I. V. Karlin, A. Ferrante, and H. C. Öttinger. Perfect entropy functions of the
lattice Boltzmann method. Europhys. Lett., 47:182–188, 1999.
[28] I. V. Karlin, S. Ansumali, C. E. Frouzakis, and S. S. Chikatamarla. Elements of
the lattice Boltzmann method I: Linear advection equation. Commun. Comput.
Phys., 1 (2006), 616–655.
[29] I. V. Karlin, S. S. Chikatamarla and S. Ansumali. Elements of the lattice
Boltzmann method II: Kinetics and hydrodynamics in one dimension. Commun.
Comput. Phys., 2 (2007), 196–238.
[30] S. Kullback. Information theory and statistics, Wiley, New York, 1959.
[31] Y. Li, R. Shock, R. Zhang, and H. Chen. Numerical study of flow past an
impulsively started cylinder by the lattice-Boltzmann method. J. Fluid Mech.,
519:273–300, 2004.
[32] T. W. Pan, and R. Glowinksi. A projection/wave-like equation method for
the numerical simulation of incompressible viscous fluid flow modeled by the
Navier–Stokes equations. Comp. Fluid Dyn. J., 9:28–42, 2000.
[33] Y.-F. Peng, Y.-H. Shiau, and R. R. Hwang. Transition in a 2-D lid-driven cavity
flow. Comput. Fluids, 32:337–352, 2003.
[34] W. K. Pratt. Digital Image Processing, Wiley, New York, 1978.
[35] H. Qian. Relative entropy: free energy associated with equilibrium fluctuations
and nonequilibrium deviations, Phys. Rev. E. 63 (2001), 042103.
[36] P. L. Roe. Characteristic-based schemes for the Euler equations, Ann. Rev.
Fluid Mech., 18 (1986), 337-365.
[37] X. Shan, X-F. Yuan, and H. Chen. Kinetic theory representation of
hydrodynamics: a way beyond the NavierStokes equation. J. Fluid Mech. 550
(2006), 413-441.
[38] S. Succi. The lattice Boltzmann equation for fluid dynamics and beyond. Oxford
University Press, New York, 2001.
[39] S. Succi, I. V. Karlin, and H. Chen. Role of the H theorem in lattice Boltzmann
hydrodynamic simulations. Rev. Mod. Phys., 74:1203–1220, 2002.
[40] P. K. Sweby. High resolution schemes using flux-limiters for hyperbolic
conservation laws. SIAM J. Num. Anal., 21 (1984), 995–1011.
[41] F. Tosi, S. Ubertini, S. Succi, H. Chen, and I.V. Karlin. Numerical stability of
entropic versus positivity-enforcing lattice Boltzmann schemes. Math. Comput.
Simulation, 72:227–231, 2006.
[42] B. Van Leer. Towards the ultimate conservative difference scheme III.
Upstream-centered finite-difference schemes for ideal compressible flow., J.
Comp. Phys., 23 (1977), 263–275.
[43] P. Wesseling. Principles of Computational Fluid Dynamics, Springer Series in
Computational Mathematics (Springer-Verlag, Berlin, 2001), Vol. 29.
[44] Y. B. Zeldovich, Proof of the Uniqueness of the Solution of the Equations of
the Law of Mass Action, In: Selected Works of Yakov Borisovich Zeldovich,
Vol. 1, J. P. Ostriker (Ed.), Princeton University Press, Princeton, USA, 1996,
144–148.
	Introduction
	Background
	Nonequilibrium entropy limiters for LBM 
	Positivity rule
	Ehrenfests' regularisation
	Smooth limiters of nonequilibrium entropy
	Monitoring of total dissipation
	Median entropy filter
	Entropic steps for non-entropic quasiequilibria
	ELBM collisions as a smooth limiter
	Monotonic and double monotonic limiters
	Numerical experiment
	Velocities and quasiequilibria
	LBGK and ELBM
	Shock tube
	Lid-driven cavity
	Conclusions
	References
ABSTRACT
  We construct a system of nonequilibrium entropy limiters for the lattice
Boltzmann methods (LBM). These limiters erase spurious oscillations without
blurring of shocks, and do not affect smooth solutions. In general, they do the
same work for LBM as flux limiters do for finite differences, finite volumes
and finite elements methods, but for LBM the main idea behind the construction
of nonequilibrium entropy limiter schemes is to transform a field of a scalar
quantity - nonequilibrium entropy. There are two families of limiters: (i)
based on restriction of nonequilibrium entropy (entropy "trimming") and (ii)
based on filtering of nonequilibrium entropy (entropy filtering). The physical
properties of LBM provide some additional benefits: the control of entropy
production and accurate estimate of introduced artificial dissipation are
possible. The constructed limiters are tested on classical numerical examples:
1D athermal shock tubes with an initial density ratio 1:2 and the 2D lid-driven
cavity for Reynolds numbers Re between 2000 and 7500 on a coarse 100*100 grid.
All limiter constructions are applicable for both entropic and non-entropic
quasiequilibria.

<|endoftext|><|startoftext|>
Introduction, observational and nu-
merical evidence makes it safe to assume that the turbulence
in such a system will be anisotropic with k‖ ≪ k⊥ (at scales
smaller than the outer scale, k‖L ≫ 1; see § 1.3 and § 1.5.1).
Let us, therefore, introduce a small parameter ǫ ∼ k‖/k⊥ and
carry out a systematic expansion of Eqs. (7-10) in ǫ. In this
expansion, the fluctuations are treated as small, but not arbi-
trarily so: in order to estimate their size, we shall adopt the
critical-balance conjecture (3), which is now treated not as a
detailed scaling prescription but as an ordering assumption.
This allows us to introduce the following ordering:
∼ δB⊥
∼ ǫ, (12)
where vA = B0/
4πρ0 is the Alfvén speed. Note that this
means that we order the Mach number
, (13)
where cs = (γp0/ρ0)
1/2 is the speed of sound and
is the plasma beta, which is ordered to be order unity in the ǫ
expansion (subsidiary limits of high and low β can be taken
after the ǫ expansion is done; see § 2.4).
In Eq. (12), we made two auxiliary ordering assump-
tions: that the velocity and magnetic-field fluctuations
have the character of Alfvén and slow waves (δB⊥/B0 ∼
u⊥/vA, δB‖/B0 ∼ u‖/vA) and that the relative amplitudes
of the Alfvén-wave-polarized fluctuations (δB⊥/B0, u⊥/vA),
slow-wave-polarized fluctuations (δB‖/B0, u‖/vA) and den-
sity/pressure/entropy fluctuations (δρ/ρ0, δp/p0) are all the
same order. Strictly speaking, whether this is the case depends
on the energy sources that drive the turbulence: as we shall
see, if no slow waves (or entropy fluctuations) are launched,
none will be present. However, in astrophysical contexts, the
outer-scale energy input may be assumed random and, there-
fore, comparable power is injected into all types of fluctua-
tions.
We further assume that the characteristic frequency of the
fluctuations is ω∼ k‖vA [Eq. (3)], meaning that the fast waves,
for which ω ≃ k⊥(v2A + c2s )1/2, are ordered out. This restric-
tion must be justified empirically. Observations of the solar-
wind turbulence confirm that it is primarily Alfvénic (see,
e.g., Bale et al. 2005) and that its compressive component is
substantially pressure-balanced (Roberts 1990; Burlaga et al.
1990; Marsch & Tu 1993; Bavassano et al. 2004, see Eq. (22)
below). A weak-turbulence calculation of compressible MHD
turbulence in low-beta plasmas (Chandran 2005b) suggests
that only a small amount of energy is transferred from the fast
waves to Alfvén waves with large k‖. A similar conclusion
emerges from numerical simulations (Cho & Lazarian 2002,
2003). As the fast waves are also expected to be subject to
strong collisionless damping and/or to strong dissipation after
they steepen into shocks, we eliminate them from our con-
sideration of the problem and concentrate on low-frequency
turbulence.
2.2. Alfvén Waves
We start by observing that the Alfvén-wave-polarized
fluctuations are two-dimensionally solenoidal: since, from
Eq. (7),
∇·u = − d
= O(ǫ2) (15)
and ∇·δB = 0 exactly, separating the O(ǫ) part of these diver-
gences gives ∇⊥ ·u⊥ = 0 and ∇⊥ · δB⊥ = 0. To lowest order
in the ǫ expansion, we may, therefore, express u⊥ and δB⊥ in
terms of scalar stream (flux) functions:
u⊥ = ẑ×∇⊥Φ,
= ẑ×∇⊥Ψ. (16)
Evolution equations for Φ and Ψ are obtained by substituting
the expressions (16) into the perpendicular parts of the induc-
tion equation (10) and the momentum equation (8)—of the
latter the curl is taken to annihilate the pressure term. Keep-
ing only the terms of the lowest order, O(ǫ2), we get
+{Φ,Ψ}= vA
, (17)
∇2⊥Φ+
Φ,∇2⊥Φ
∇2⊥Ψ+
Ψ,∇2⊥Ψ
, (18)
where {Φ,Ψ} = ẑ · (∇⊥Φ×∇⊥Ψ) and we have taken into
account that, to lowest order,
+ u⊥ ·∇⊥ =
+{Φ, · · ·} , (19)
b̂ ·∇= ∂
·∇⊥ =
{Ψ, · · ·} . (20)
Here b̂ = B/B0 is the unit vector along the perturbed field line.
Equations (17-18) are known as the Reduced Magne-
tohydrodynamics (RMHD). The first derivations of these
equations (in the context of fusion plasmas) are due to
Kadomtsev & Pogutse (1974) and to Strauss (1976). These
were followed by many systematic derivations and gener-
alizations employing various versions and refinements of
the basic expansion, taking into account the non-Alfvénic
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 9
modes (which we will do in § 2.4), and including the ef-
fects of spatial gradients of equilibrium fields (e.g., Strauss
1977; Montgomery 1982; Hazeltine 1983; Zank & Matthaeus
1992; Kinney & McWilliams 1997; Bhattacharjee et al. 1998;
Kruger et al. 1998). A comparative review of these expansion
schemes and their (often close) relationship to ours is outside
the scope of this paper. One important point we wish to em-
phasize is that we do not assume the plasma beta [defined in
Eq. (14)] to be either large or small.
Equations (17) and (18) form a closed set, meaning that the
Alfvén-wave cascade decouples from the slow waves and den-
sity fluctuations. It is to the turbulence described by Eqs. (17-
18) that the GS theory outlined in § 1.2 applies.13 In § 5.3, we
will show that Eqs. (17) and (18) correctly describe inertial-
range Alfvénic fluctuations even in a collisionless plasma,
where the full MHD description [Eqs. (7-10)] is not valid.
2.3. Elsasser Fields
The MHD equations (7-10) in the incompressible limit
(ρ = const) acquire a symmetric form if written in terms of
the Elsasser fields z± = u± δB/
4πρ (Elsasser 1950). Let
us demonstrate how this symmetry manifests itself in the re-
duced equations derived above.
We introduce Elsasser potentials ζ± = Φ±Ψ, so that z±⊥ =
ẑ×∇⊥ζ±. For these potentials, Eqs. (17-18) become
∇2⊥ζ±∓ vA
∇2⊥ζ± = −
ζ+,∇2⊥ζ−
ζ−,∇2⊥ζ+
∓∇2⊥ {ζ+, ζ−}
. (21)
These equations show that the RMHD has a simple set of ex-
act solutions: if ζ− = 0 or ζ+ = 0, the nonlinear term vanishes
and the other, non-zero, Elsasser potential is simply a fluc-
tuation of arbitrary shape and magnitude propagating along
the mean field at the Alfvén speed vA: ζ
± = f±(x,y,z∓ vAt).
These solutions are finite-amplitude Alfvén-wave packets of
arbitrary shape. Only counterpropagating such solutions can
interact and thereby give rise to the Alfvén-wave cascade
(Kraichnan 1965). Note that these interactions are conserva-
tive in the sense that the “+” and “−” waves scatter off each
other without exchanging energy.
Note that the individual conservation of the “+” and “−”
waves’ energies means that the energy fluxes associated
with these waves need not be equal, so instead of a sin-
gle Kolmogorov flux ε assumed in the scaling arguments
13 The Alfvén-wave turbulence in the RMHD system has been stud-
ied by many authors. Some of the relevant numerical investigations are
due to Kinney & McWilliams (1998), Dmitruk et al. (2003), Oughton et al.
(2004), Rappazzo et al. (2007, 2008), Perez & Boldyrev (2008, 2009). An-
alytical theory has mostly been confined to the weak-turbulence paradigm
(Ng & Bhattacharjee 1996, 1997; Bhattacharjee & Ng 2001; Galtier et al.
2002; Lithwick & Goldreich 2003; Galtier & Chandran 2006; Nazarenko
2008). We note that adopting the critical balance [Eq. (3)] as an ordering
assumption for the expansion in k‖/k⊥ does not preclude one from subse-
quently attempting a weak-turbulence approach: the latter should simply be
treated as a subsidiary expansion. Indeed, implementing the anisotropy as-
sumption on the level of MHD equations rather than simultaneously with
the weak-turbulence closure (Galtier et al. 2000) significantly reduces the
amount of algebra. One should, however, bear in mind that the weak-
turbulence approximation always breaks down at some sufficiently small
scale—namely, when k⊥ ∼ (vA/U)
L, where L is the outer scale of the
turbulence, U velocity at the outer scale, and k‖ the parallel wavenum-
ber of the Alfvén waves (see Goldreich & Sridhar 1997 or the review by
Schekochihin & Cowley 2007). Below this scale, interactions cannot be as-
sumed weak.
reviewed in § 1.2, we could have ε+ 6= ε−. The GS the-
ory can be generalized to this case of imbalanced Alfvénic
cascades (Lithwick et al. 2007; Beresnyak & Lazarian 2008a;
Chandran 2008), but here we will focus on the balanced tur-
bulence, ε+ ∼ ε−. If one considers the turbulence forced
in a physical way (i.e., without forcing the magnetic field,
which would break the flux conservation), the resulting cas-
cade would always be balanced. In the real world, imbal-
anced Alfvénic fluxes are measured in the fast solar wind,
where the influence of initial conditions in the solar atmo-
sphere is more pronounced, while the slow-wind turbulence
is approximately balanced (Marsch & Tu 1990a; see also re-
views by Tu & Marsch 1995; Bruno & Carbone 2005 and ref-
erences therein).
2.4. Slow Waves and the Entropy Mode
In order to derive evolution equations for the remaining
MHD modes, let us first revisit the perpendicular part of the
momentum equation and use Eq. (12) to order terms in it. In
the lowest order, O(ǫ), we get the pressure balance
B0δB‖
= 0 ⇒ δp
. (22)
Using Eq. (22) and the entropy equation (9), we get
, (23)
where s0 = p0/ρ
0 . Now, substituting Eq. (15) for ∇·u in the
parallel component of the induction equation (10), we get
− b̂ ·∇u‖ = 0. (24)
Combining Eqs. (23) and (24), we obtain
1 + c2s/v
b̂ ·∇u‖, (25)
1 + v2A/c2s
b̂ ·∇u‖. (26)
Finally, we take the parallel component of the momentum
equation (8) and notice that, due to the pressure balance (22)
and to the smallness of the parallel gradients, the pressure
term is O(ǫ3), while the inertial and tension terms are O(ǫ2).
Therefore,
= v2Ab̂ ·∇
. (27)
Equations (26-27) describe the slow-wave-polarized fluctu-
ations, while Eq. (23) describes the zero-frequency entropy
mode, which is decoupled from the slow waves.14 The non-
linearity in Eqs. (26-27) enters via the derivatives defined in
14 For other expansion schemes leading to reduced sets of equations for
these “compressive” fluctuations see references in § 2.2. Note that the na-
ture of the density fluctuations described above is distinct from the so called
“pseudosound” density fluctuations that arise in the “nearly incompress-
ible” MHD theories (Montgomery et al. 1987; Matthaeus & Brown 1988;
Matthaeus et al. 1991; Zank & Matthaeus 1993). The “pseudosound” is es-
sentially the density response caused by the nonlinear pressure fluctuations
calculated from the incompressibility constraint. The resulting density fluc-
tuations are second order in Mach number and, therefore, order ǫ2 in our
expansion [see Eq. (13)]. The passive density fluctuations derived in this sec-
tion are order ǫ and, therefore, supersede the “pseudosound” (see review by
Tu & Marsch 1995 for a discussion of the relevant solar-wind evidence).
10 SCHEKOCHIHIN ET AL.
Eqs. (19-20) and is due solely to interactions with Alfvén
waves. Thus, both the slow-wave and the entropy-mode cas-
cades occur via passive scattering/mixing by Alfvén waves, in
the course of which there is no energy exchange between the
cascades.
Note that in the high-beta limit, cs ≫ vA [see Eq. (14)], the
entropy mode is dominated by density fluctuations [Eq. (23),
cs ≫ vA], which also decouple from the slow-wave cascade
[Eq. (25), cs ≫ vA]. and are passively mixed by the Alfvén-
wave turbulence:
= 0. (28)
The high-beta limit is equivalent to the incompressible ap-
proximation for the slow waves.
In § 5.5, we will derive a kinetic description for the inertial-
range compressive fluctuations (density and magnetic-field
strength), which is more generally valid in weakly collisional
plasmas and which reduces to Eqs. (26-27) in the collisional
limit (see Appendix D). While these fluctuations will in gen-
eral satisfy a kinetic equation, they will remain passive with
respect to the Alfvén waves.
2.5. Elsasser Fields for the Slow Waves
The original Elsasser (1950) symmetry was derived for in-
compressible MHD equations. However, for the “compres-
sive” slow-wave fluctuations, we may introduce generalized
Elsasser fields:
= u‖±
. (29)
Straightforwardly, the evolution equation for these fields is
∓ vA√
1 + v2A/c2s
1∓ 1√
1 + v2A/c2s
ζ+,z±
1± 1√
1 + v2A/c2s
ζ−,z±
. (30)
In the high-beta limit (vA ≪ cs), the generalized Elsasser
fields (29) become the parallel components of the conven-
tional incompressible Elsasser fields. We see that only in this
limit do the slow waves interact exclusively with the counter-
propagating Alfvén waves, and so only in this limit does set-
ting ζ− = 0 or ζ+ = 0 gives rise to finite-amplitude slow-wave-
packet solutions z±
= f±(x,y,z∓ vAt) analogous to the finite-
amplitude Alfvén-wave packets discussed in § 2.3.15 For gen-
eral β, the phase speed of the slow waves is smaller than that
of the Alfvén waves and, therefore, Alfvén waves can “catch
up” and interact with the slow waves that travel in the same
direction. All of these interactions are of scattering type and
involve no exchange of energy.
2.6. Scalings for Passive Fluctuations
15 Obviously, setting both ζ± = 0 does always enable these finite-
amplitude slow-wave solutions. More non-trivially, such finite-amplitude so-
lutions exist in the Lagrangian frame associated with the Alfvén waves—this
is discussed in detail in § 6.3.
The scaling of the passively mixed scalar fields introduced
above is slaved to the scaling of the Alfvénic fluctuations.
Consider for example the entropy mode [Eq. (23)]. As
in Kolmogorov–Obukhov theory (see § 1.1), one assumes a
local-in-scale-space cascade of scalar variance and a constant
flux εs of this variance. Then, analogously to Eq. (1),
v2thi
∼ εs. (31)
Since the cascade time is τ−1λ ∼ u⊥ ·∇⊥ ∼ vA/l‖λ ∼ ε/u2⊥λ,
)1/2 u⊥λ
, (32)
so the scalar fluctuations have the same scaling as the turbu-
lence that mixes them (Obukhov 1949; Corrsin 1951). In GS
turbulence, the scalar-variance spectrum should, therefore, be
⊥ (Lithwick & Goldreich 2001). The same argument ap-
plies to all passive fields.
It is the (presumably) passive electron-density spectrum
that provides the main evidence of the k−5/3 scaling in the in-
terstellar turbulence (Armstrong et al. 1981, 1995; Lazio et al.
2004, see further discussion in § 8.4.1). The explanation of
this spectrum in terms of passive mixing of the entropy mode,
originally proposed by Higdon (1984), was developed on the
basis of the GS theory by Lithwick & Goldreich (2001). The
turbulent cascade of the compressive fluctuations and the rel-
evant solar-wind data is discussed further in § 6.3. In partic-
ular, it will emerge that the anisotropy of these fluctuations
remains a non-trivial issue: is there an analog of the scaling
relation (5)? The scaling argument outlined above does not
invoke any assumptions about the relationship between the
parallel and perpendicular scales of the compressive fluctu-
ations (other than the assumption that they are anisotropic).
Lithwick & Goldreich (2001) argue that the parallel scales of
the Alfvénic fluctuations will imprint themselves on the pas-
sively advected compressive ones, so Eq. (5) holds for the
latter as well. In § 6.3, we examine this conclusion in view
of the solar-wind evidence and of the fact that the equations
for the compressive modes become linear in the Lagrangian
frame associated with the Alfvénic turbulence.
2.7. Five RMHD Cascades
Thus, the anisotropy and critical balance (3) taken as
ordering assumptions lead to a neat decomposition of the
MHD turbulent cascade into a decoupled Alfvén-wave cas-
cade and cascades of slow waves and entropy fluctuations pas-
sively scattered/mixed by the Alfvén waves. More precisely,
Eqs. (23), (21) and (30) imply that, for arbitrary β, there are
five conserved quantities:16
W±AW =
d3rρ0|∇⊥ζ±|2 (Alfven waves), (33)
W±sw =
d3rρ0|z±‖ |
2 (slow waves), (34)
(entropy fluctuations).(35)
16 Note that magnetic helicity of the perturbed field is not an invariant of
RMHD, except in two dimensions (see Appendix F.4). In 2D, there is also
conservation of the mean square flux,
d3r |Ψ|2 (see Appendix F.2).
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 11
W +AW and W
AW are always cascaded by interaction with each
other, Ws is passively mixed by W
AW and W
AW, W
sw are pas-
sively scattered by W∓AW and, unless β ≫ 1, also by W
This is an example of splitting of the overall energy cascade
into several channels (recovered as a particular case of the
more general kinetic cascade in Appendix D.2)—a concept
that will repeatedly arise in the kinetic treatment to follow.
The decoupling of the slow- and Alfvén-wave cascades in
MHD turbulence was studied in some detail and confirmed
in direct numerical simulations by Maron & Goldreich (2001,
for β ≫ 1) and by Cho & Lazarian (2002, 2003, for a range
of values of β). The derivation given in § 2.2 and § 2.4 (cf.
Lithwick & Goldreich 2001) provides a straightforward theo-
retical basis for these results, assuming anisotropy of the tur-
bulence (which was also confirmed in these numerical stud-
ies).
It turns out that the decoupling of the Alfvén-wave cascade
that we demonstrated above for the anisotropic MHD turbu-
lence is a uniformly valid property of plasma turbulence at
both collisional and collisionless scales and that this cascade
is correctly described by the RMHD equations (17-18) all the
way down to the ion gyroscale, while the fluctuations of den-
sity and magnetic-field strength do not satisfy simple fluid
evolution equations anymore and require solving the kinetic
equation. In order to prove this, we adopt a kinetic descrip-
tion and apply to it the same ordering (§ 2.1) as we used to
reduce the MHD equations. The kinetic theory that emerges
as a result is called gyrokinetics.
3. GYROKINETICS
The gyrokinetic formalism was first worked out
for linear waves by Rutherford & Frieman (1968)
and by Taylor & Hastie (1968) (see also Catto 1978;
Antonsen & Lane 1980; Catto et al. 1981) and subsequently
extended to the nonlinear regime by Frieman & Chen (1982).
Rigorous derivations of the gyrokinetic equation based on
the Hamiltonian formalism were developed by Dubin et al.
(1983, electrostatic) and Hahm et al. (1988, electromagnetic).
This approach is reviewed in Brizard & Hahm (2007). A
more pedestrian, but perhaps also more transparent exposition
of the gyrokinetics in a straight mean field can be found in
Howes et al. (2006), who also provide a detailed explanation
of the gyrokinetic ordering in the context of astrophysical
plasma turbulence and a treatment of the linear waves and
damping rates. Here we review only the main points so as
to allow the reader to understand the present paper without
referring elsewhere.
In general, a plasma is completely described by the distribu-
tion function fs(t,r,v)—the probability density for a particle
of species s (= i,e) to be found at the spatial position r mov-
ing with velocity v. This function obeys the kinetic Vlasov–
Landau (or Boltzmann) equation
+ v ·∇ fs +
· ∂ fs
, (36)
where qs and ms are the particle’s charge and mass, c is the
speed of light, and the right-hand side is the collision term
(quadratic in f ). The electric and magnetic fields are
E = −∇ϕ−
, B = ∇×A. (37)
The first equality is Faraday’s law uncurled, the second
the magnetic-field solenoidality condition; we shall use the
Coulomb gauge, ∇·A = 0. The fields satisfy the Poisson and
the Ampère–Maxwell equations with the charge and current
densities determined by fs(t,r,v):
∇·E = 4π
qsns = 4π
d3v fs, (38)
∇×B − 1
d3vv fs. (39)
3.1. Gyrokinetic Ordering and Dimensionless Parameters
As in § 2 we set up a static equilibrium with a uniform mean
field, B0 = B0ẑ, E0 = 0, assume that the perturbations will be
anisotropic with k‖ ≪ k⊥ (at scales smaller than the outer
scale, k‖L ≫ 1; see § 1.3 and § 1.5.1), and construct an expan-
sion of the kinetic theory around this equilibrium with respect
to the small parameter ǫ ∼ k‖/k⊥. We adopt the ordering ex-
pressed by Eqs. (3) and (12), i.e., we assume the perturbations
to be strongly interacting Alfvén waves plus electron density
and magnetic-field-strength fluctuations.
Besides ǫ, several other dimensionless parameters are
present, all of which are formally considered to be of order
unity in the gyrokinetic expansion: the electron–ion mass ra-
tio me/mi, the charge ratio
Z = qi/|qe| = qi/e (40)
(for hydrogen, this is 1, which applies to most astrophysical
plasmas of interest to us), the temperature ratio17
τ = Ti/Te, (41)
and the plasma (ion) beta
v2thi
8πniTi
, (42)
where vthi = (2Ti/mi)
1/2 is the ion thermal speed and the total
β was defined in Eq. (14) based on the total pressure p = niTi +
neTe. We shall occasionally also use the electron beta
8πneTe
βi. (43)
The total beta is β = βi +βe.
3.1.1. Wavenumbers and Frequencies
As we want our theory to be uniformly valid at all (perpen-
dicular) scales above, at or below the ion gyroscale, we order
k⊥ρi ∼ 1, (44)
where ρi = vthi/Ωi is the ion gyroradius, Ωi = qiB0/cmi the ion
cyclotron frequency. Note that
ρi. (45)
17 It can be shown that equilibrium temperatures change on the timescale
∼ (ǫ2ω)−1 (Howes et al. 2006). On the other hand, from standard theory
of collisional transport (e.g., Helander & Sigmar 2002), the ion and elec-
tron temperatures equalize on the timescale ∼ ν−1ie ∼ (mi/me)
1/2ν−1ii [see
Eq. (51)]. Therefore, τ can depart from unity by an amount of order
ǫ2(ω/νii)(mi/me)1/2. In our ordering scheme [Eq. (49)], this is O(ǫ2) and,
therefore, we should simply set τ = 1 + O(ǫ2). However, we shall carry the
parameter τ because other ordering schemes are possible that permit arbitrary
values of τ . These are appropriate to plasmas with very weak collisions. For
example, in the solar wind, τ appears to be order unity but not exactly 1
(Newbury et al. 1998), while in accretion flows near the black hole, some
models predict τ ≫ 1 (see § 8.5).
12 SCHEKOCHIHIN ET AL.
FIG. 3.— Regions of validity in the wavenumber space of two primary approximations—the two-fluid (Appendix A.1) and gyrokinetic (§ 3). The gyrokinetic
theory holds when k‖ ≪ k⊥ and ω ≪ Ωi [when k‖ ≪ k⊥ < ρ
i , the second requirement is automatically satisfied for Alfvén, slow and entropy modes; see
Eq. (46)]. The two-fluid equations hold when k‖λmfpi ≪ 1 (collisional limit) and k⊥ρi ≪ 1 (magnetized plasma). Note that the gyrokinetic theory holds for all
but the very largest (outer) scales, where anisotropy cannot be assumed.
Assuming Alfvénic frequencies implies
∼ k⊥ρi√
ǫ. (46)
Thus, gyrokinetics is a low-frequency limit that averages over
the timescales associated with the particle gyration. Because
we have assumed that the fluctuations are anisotropic and have
(by order of magnitude) Alfvénic frequencies, we see from
Eq. (46) that their frequency remains far below Ωi at all scales,
including the ion and even electron gyroscale—the gyroki-
netics remains valid at all of these scales and the cyclotron-
frequency effects are negligible (cf. Quataert & Gruzinov
1999).
3.1.2. Fluctuations
Equation (3) allows us to order the fluctuations of the scalar
potential: on the one hand, we have from Eq. (3) u⊥ ∼ ǫvA; on
the other hand, the plasma mass flow velocity is (to the lowest
order) the E×B drift velocity of the ions, u⊥ ∼ cE⊥/B0 ∼
ck⊥ϕ/B0, so
ǫ. (47)
All other fluctuations (magnetic, density, parallel velocity) are
ordered according to Eq. (12).
Note that the ordering of the flow velocity dictated by
Eq. (3) means that we are considering the limit of small Mach
numbers:
M ∼ u
. (48)
This means that the gyrokinetic description in the form used
below does not extend to large sonic flows that can be
present in many astrophysical systems. It is, in principle,
possible to extend the gyrokinetics to systems with sonic
flows (e.g., in the toroidal geometry; see Artun & Tang 1994;
Sugama & Horton 1997). However, we do not follow this
route because such flows belong to the same class of non-
universal outer-scale features as background density and tem-
perature gradients, system-specific geometry etc.—these can
all be ignored at small scales, where the turbulence should be
approximately homogeneous and subsonic (as long as k‖L ≫
1, see discussion in § 1.5.1).
3.1.3. Collisions
Finally, we want our theory to be valid both in the colli-
sional and the collisionless regimes, so we do not assume
ω to be either smaller or larger than the (ion) collision fre-
quency νii:
k‖λmfpi√
∼ 1, (49)
where λmfpi = vthi/νii is the ion mean free path (this order-
ing can actually be inferred from equating the gyrokinetic en-
tropy production terms to the collisional entropy production;
see extended discussion in Howes et al. 2006). Note that the
ordering (49) holds on the understanding that we have ordered
k⊥ρi ∼ 1 [Eq. (44)] because the fluctuation frequency can de-
pend on k⊥ρi in the dissipation range (see § 7.3).
Other collision rates are related to νii via a set of standard
formulae (see, e.g., Helander & Sigmar 2002), which will be
useful in what follows:
νei = Zνee =
τ 3/2
νii, (50)
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 13
νie =
τ 3/2
νii, (51)
νii =
2πZ4e4ni lnΛ
, (52)
where lnΛ is the Coulomb logarithm and the numerical factor
in the definition of νie has been inserted for future notational
convenience (see Appendix A). We always define
λmfpi =
, λmfpe =
λmfpi. (53)
The ordering of the collision frequency expressed by
Eq. (49) means that collisions, while not dominant as in
the fluid description (Appendix A), are still retained in
the version of the gyrokinetic theory adopted by us. Their
presence is required in order for us to be able to assume
that the equilibrium distribution is Maxwellian [Eq. (54)
below] and for the heating and entropy production to be
treated correctly (§ 3.4 and § 3.5). However, our ordering of
collisions and of the fluctuation amplitudes (§ 3.1.2) imposes
certain limitations: thus, we cannot treat the class of nonlinear
phenomena involving particle trapping by parallel-varying
fluctuations, non-Maxwellian tails of particle distributions,
plasma instabilities arising from the equilibrium pressure
anisotropies (mirror, firehose) and their possible nonlinear
evolution to large amplitudes (see discussion in § 8.3).
The region of validity of the gyrokinetic approximation in
the wavenumber space is illustrated in Fig. 3—it embraces
all of the scales that are expected to be traversed by the
anisotropic energy cascade (except the scales close to the
outer scale).
As we explained above, me/mi, βi, k⊥ρi and k‖λmfpi (or
ω/νii) are assigned order unity in the gyrokinetic expansion.
Subsidiary expansions in small me/mi (§ 4) and in small or
large values of the other three parameters (§§ 5-7) can be car-
ried out at a later stage as long as their values are not so large
or small as to interfere with the primary expansion in ǫ. These
expansions will yield simpler models of turbulence with more
restricted domains of validity than gyrokinetics.
3.2. Gyrokinetic Equation
Given the gyrokinetic ordering introduced above, the ex-
pansion of the distribution function up to first order in ǫ can
be written as
fs(t,r,v) = F0s(v) −
qsϕ(t,r)
F0s(v) + hs(t,Rs,v⊥,v‖). (54)
To zeroth order, it is a Maxwellian:18
F0s(v) =
(πv2ths)
v2ths
, vths =
, (55)
with uniform density n0s and temperature T0s and no mean
flow. As will be explained in more detail in § 3.5, F0s has a
slow time dependence via the equilibrium temperature, T0s =
T0s(ǫ
2t). This reflects the slow heating of the plasma as the tur-
bulent energy is dissipated. However, T0s can be treated as a
constant with respect to the time dependence of the first-order
18 The use of isotropic equilibrium is a significant idealization—this is
discussed in more detail in § 8.3.
distribution function (the timescale of the turbulent fluctua-
tions). The first-order part of the distribution function is com-
posed of the Boltzmann response [second term in Eq. (54), or-
dered in Eq. (47)] and the gyrocenter distribution function hs.
The spatial dependence of the latter is expressed not by the
particle position r but by the position Rs of the particle gy-
rocenter (or guiding center)—the center of the ring orbit that
the particle follows in a strong guide field:
Rs = r +
v⊥× ẑ
. (56)
Thus, some of the velocity dependence of the distribution
function is subsumed in the Rs dependence of hs. Explicitly,
hs depends only on two velocity-space variables: it is cus-
tomary in the gyrokinetic literature for these to be chosen as
the particle energy εs = msv
2/2 and its first adiabatic invari-
ant µs = msv
⊥/2B0 (both conserved quantities to two lowest
orders in the gyrokinetic expansion). However, in a straight
uniform guide field B0ẑ, the pair (v⊥,v‖) is a simpler choice,
which will mostly be used in what follows (we shall some-
times find an alternative pair, v and ξ = v‖/v, useful, especially
where collisions are concerned). It must be constantly kept in
mind that derivatives of hs with respect to the velocity-space
variables are taken at constant Rs, not at constant r.
The function hs satisfies the gyrokinetic equation:
{〈χ〉Rs ,hs} =
qsF0s
∂〈χ〉Rs
where
χ(t,r,v) = ϕ−
v⊥ ·A⊥
, (58)
the Poisson brackets are defined in the usual way:
{〈χ〉Rs ,hs} = ẑ ·
∂〈χ〉Rs
× ∂hs
, (59)
and the ring average notation is introduced:
〈χ(t,r,v)〉Rs =
t,Rs −
v⊥× ẑ
, (60)
where ϑ is the angle in the velocity space taken in the plane
perpendicular to the guide field B0ẑ. Note that, while χ is
a function of r, its ring average is a function of Rs. Note
also that the ring averages depend on the species index, as
does the gyrocenter variable Rs. Equation (57) is derived by
transforming the first-order kinetic equation to the gyrocenter
variable (56) and ring averaging the result (see Howes et al.
2006, or the references given at the beginning of § 3). The
ring-averaged collision integral (∂hs/∂t)c is discussed in Ap-
pendix B.
3.3. Field Equations
To Eq. (57), we must append the equations that determine
the electromagnetic field, namely, the potentials ϕ(t,r) and
A(t,r) that enter the expression for χ [Eq. (58)]. In the
non-relativistic limit (vthi ≪ c), these are the plasma quasi-
neutrality constraint [which follows from the Poisson equa-
tion (38) to lowest order in vthi/c]:
qsδns =
n0s +
d3v〈hs〉r
14 SCHEKOCHIHIN ET AL.
and the parallel and perpendicular parts of Ampère’s law
[Eq. (39) to lowest order in ǫ and in vthi/c]:
∇2⊥A‖ = −
j‖ = −
d3vv‖〈hs〉r, (62)
∇2⊥δB‖ = −
∇⊥× j⊥
d3v〈v⊥hs〉r
, (63)
where we have used δB‖ = ẑ · (∇⊥×A⊥) and dropped the dis-
placement current. Since field variables ϕ, A‖ and δB‖ are
functions of the spatial variable r, not of the gyrocenter vari-
able Rs, we had to determine the contribution from the gy-
rocenter distribution function hs to the charge distribution at
fixed r by performing a gyroaveraging operation dual to the
ring average defined in Eq. (60):
〈hs(t,Rs,v⊥,v‖)〉r =
t,r +
v⊥× ẑ
,v⊥,v‖
In other words, the velocity-space integrals in Eqs. (61-63)
are performed over hs at constant r, rather than constant Rs.
If we Fourier transform hs in Rs, the gyroaveraging operation
takes a simple mathematical form:
〈hs〉r =
〈eik·Rs〉rhsk(t,v⊥,v‖)
eik·r
ik · v⊥× ẑ
hsk(t,v⊥,v‖)
eik·rJ0(as)hsk(t,v⊥,v‖), (65)
where as = k⊥v⊥/Ωs and J0 is a Bessel function that arose
from the angle integral in the velocity space. In Eq. (63), an
analogous calculation taking into account the angular depen-
dence of v⊥ leads to
δB‖ = −
eik·r
d3vmsv
J1(as)
hsk(t,v⊥,v‖).
Note that Eq. (63) [and, therefore, Eq. (66)] is the gyroki-
netic equivalent of the perpendicular pressure balance that ap-
peared in § 2 [Eq. (22)]:
B0δB‖
= ∇⊥ ·
d3v〈ẑ× v⊥hs〉r
= ∇⊥ ·
t,r +
v⊥× ẑ
,v⊥,v‖
= −∇⊥∇⊥ :
d3vms〈v⊥v⊥ hs〉r = −∇⊥∇⊥ : δP⊥,(67)
where we have integrated by parts with respect to the gyroan-
gle ϑ and used ∂v⊥/∂ϑ = ẑ× v⊥, ∂2v⊥/∂ϑ2 = −v⊥ (cf. the
Appendix of Roach et al. 2005).
Once the fields are determined, they have to be substi-
tuted into χ [Eq. (58)] and the result ring averaged [Eq. (60)].
Again, we emphasize that ϕ, A‖ and δB‖ are functions of r,
while 〈χ〉Rs is a function of Rs. The transformation is ac-
complished via a calculation analogous to the one that led to
Eqs. (65) and (66):
〈χ〉Rs =
eik·Rs〈χ〉Rs ,k, (68)
〈χ〉Rs ,k = J0(as)
v‖A‖k
v2ths
J1(as)
. (69)
The last equation establishes a correspondence between the
Fourier transforms of the fields with respect to r and the
Fourier transform of 〈χ〉Rs with respect to Rs.
3.4. Generalized Energy and the Kinetic Cascade
As promised in § 1.4, the central unifying concept of this
paper is now introduced.
If we multiply the gyrokinetic equation (57) by T0shs/F0s
and integrate over the velocities and gyrocenters, we find that
the nonlinear term conserves the variance of hs and
d3Rs qs
∂〈χ〉Rs
T0shs
. (70)
Let us now sum this equation over all species. The first term
on the right-hand side is
∂〈χ〉Rs
d3v〈hs〉r −
d3v〈vhs〉r
d3rE · j, (71)
where we have used Eq. (61) and Ampère’s law [Eqs. (62-
63)] to express the integrals of hs. The second term on the
right-hand side is the total work done on plasma per unit time.
Using Faraday’s law [Eq. (37)] and Ampère’s law [Eq. (39)],
it can be written as
d3rE · j = − d
|δB|2
+ Pext, (72)
where Pext ≡ −
d3rE · jext is the total power injected into the
system by the external energy sources (outer-scale stirring; in
terms of the Kolmogorov energy flux ε used in the scaling
arguments in § 1.2, Pext = Vmin0iε, where V is the system vol-
ume). Combining Eqs. (70-72), we find (Howes et al. 2006)
T0s〈h2s 〉r
|δB|2
= Pext +
T0shs
. (73)
W is a positive definite quantity—this becomes explicit if we
use Eq. (61) to express it in terms of the total perturbed distri-
bution function δ fs = −qsϕF0s/T0s + hs [see Eq. (54)]:
T0sδ f
|δB|2
. (74)
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 15
We will refer to W as the generalized energy. We use this
term to emphasize the role of W as the cascaded quantity in
gyrokinetic turbulence (see below). This quantity is, in fact,
the gyrokinetic version of a collisionless kinetic invariant var-
iously referred to as the generalized grand canonical poten-
tial (see Hallatschek 2004, who points out the fundamental
role of this quantity in plasma turbulence simulations) or free
energy (e.g., Fowler 1968; Scott 2007). The non-magnetic
part of W is related to the perturbed entropy of the sys-
tem (Krommes & Hu 1994; Sugama et al. 1996; Howes et al.
2006; Schekochihin et al. 2008b, see discussion in § 3.5).19
Equation (73) is a conservation law of the generalized en-
ergy: Pext is the source and the second term on the right-hand
side, which is negative definite, represents collisional dissi-
pation. This suggests that we might think of kinetic plasma
turbulence in terms of the generalized energy W injected by
the outer-scale stirring and dissipated by collisions. In or-
der for the dissipation to be important, the collisional term in
Eq. (73) has to become comparable to Pext. This can happen
in two ways:
1. At collisional scales (k‖λmfpi ∼ 1) due to deviations of
the perturbed distribution function from a local per-
turbed Maxwellian (see § 6.1 and Appendix D);
2. At collisionless scales (k‖λmfpi ≫ 1) due the develop-
ment of small scales in the velocity space—large gra-
dients in v‖ (see § 6.2.4) or v⊥ (which is accompanied
by the development of small perpendicular scales in the
position space; see § 7.9.1).
Thus, the dissipation is only important at particular (small)
scales, which are generally well separated from the outer
scale. The generalized energy is transferred from the outer
scale to the dissipation scales via a nonlinear cascade. We
shall call it the kinetic cascade. It is analogous to the energy
cascade in fluid or MHD turbulence, but a conceptually new
feature is present: the small scales at which dissipation hap-
pens are small scales both in the velocity and position space.
Whereas the large gradients in v‖ are produced by the lin-
ear parallel phase mixing, whose role in the kinetic dissipa-
tion processes has been appreciated for some time (Landau
1946; Hammett et al. 1991; Krommes & Hu 1994; Krommes
1999; Watanabe & Sugama 2004, see § 6.2.4), the emergence
of large gradients in v⊥ is due to an essentially nonlinear
phase mixing mechanism (§ 7.9.1). At spatial scales smaller
than the ion gyroradius, this nonlinear perpendicular phase
mixing turns out to be a faster and, therefore, presumably the
dominant way of generating small-scale structure in the veloc-
ity space. It was anticipated in the development of gyrofluid
moment hierarchies by Dorland & Hammett (1993). Here we
treat it for the first time as a phase-space turbulent cascade:
this is done in § 7.9 and § 7.10 (see also Schekochihin et al.
2008b).
In the sections that follow, we shall derive particular forms
of W for various limiting cases of the gyrokinetic theory
(§ 4.7, § 5.6, § 6.2.5, § 7.8, Appendices D.2 and E.2). We
shall see that the kinetic cascade of W is, indeed, a direct
generalization of the more familiar fluid cascades (such as
19 Note also that a quadratic form involving both the perturbed distribution
function and the electromagnetic field appears, in a more general form than
Eq. (74), in the formulation of the energy principle for the Kinetic MHD
approximation (Kruskal & Oberman 1958; Kulsrud 1962, 1964). Regarding
the relationship between Kinetic MHD and gyrokinetics, see footnote 23.
the RMHD cascades discussed in § 2) and that W contains
the energy invariants of the fluid models in the appropriate
limits. In these limits, the cascade of the generalized en-
ergy will split into several decoupled cascades, as it did in
the case of RMHD (§ 2.7). Whenever one of the physically
important scales (§ 1.5.2) is crossed and a change of physical
regime occurs, these cascades are mixed back together into
the overall kinetic cascade of W , which can then be split in
a different way as it emerges on the “opposite side” of the
transition region in the scale space. The conversion of the
Alfvénic cascade into the KAW cascade and the entropy cas-
cade at k⊥ρi ∼ 1 is the most interesting example of such a
transition, discussed in § 7.
The generalized energy appears to be the only quadratic
invariant of gyrokinetics in three dimensions; in two dimen-
sions, many other invariants appear (see Appendix F).
3.5. Heating and Entropy
In a stationary state, all of the the turbulent power injected
by the external stirring is dissipated and thus transferred into
heat. Mathematically, this is expressed as a slow increase in
the temperature of the Maxwellian equilibrium. In gyrokinet-
ics, the heating timescale is ordered as ∼ (ǫ2ω)−1.
Even though the dissipation of turbulent fluctuations may
be occurring “collisionlessly” at scales such that k‖λmfpi ≫ 1
(e.g., via wave–particle interaction at the ion gyroscale; § 7.1),
the resulting heating must ultimately be effected with the help
of collisions. This is because heating is an irreversible process
and it is a small amount of collisions that make “collisionless”
damping irreversible. In other words, slow heating of the
Maxwellian equilibrium is equivalent to entropy production
and Boltzmann’s H-theorem rigorously requires collisions to
make this possible. Indeed, the total entropy of species s is
Ss = −
d3v fs ln fs
F0s lnF0s +
δ f 2s
+ O(ǫ3), (75)
where we took
d3rδ fs = 0. It is then not hard to show that
T0shs
where the overlines mean averaging over times longer than
the characteristic time of the turbulent fluctuations ∼ ω−1
but shorter than the typical heating time ∼ (ǫ2ω)−1 (see
Howes et al. 2006; Schekochihin et al. 2008b for a detailed
derivation of this and related results on heating in gyroki-
netics; see also earlier discussions of the entropy production
in gyrokinetics by Krommes & Hu 1994; Krommes 1999;
Sugama et al. 1996). We have omitted the term describing the
interspecies collisional temperature equalization. Note that
both sides of Eq. (76) are order ǫ2ω.
If we now time average Eq. (73) in a similar fashion, the
left-hand side vanishes because it is a time derivative of a
quantity fluctuating on the timescale ∼ ω−1 and we confirm
that the right-hand side of Eq. (76) is simply equal to the av-
erage power Pext injected by external stirring. The import of
Eq. (76) is that it tells us that heating can only be effected
by collisions, while Eq. (73) implies that the injected power
gets to the collisional scales in velocity and position space by
means of a kinetic cascade of generalized energy.
16 SCHEKOCHIHIN ET AL.
The first term in the expression for the generalized energy
(74) is −
s T0sδSs, where δSs is the perturbed entropy [see
Eq. (75)]. The second term in Eq. (74) is magnetic energy.
Collisionless damping of electromagnetic fluctuations can be
thought of as a redistribution of the generalized energy, trans-
ferring the electromagnetic energy into entropy fluctuations,
while the total W is conserved (a simple example of how that
happens for collisionless compressive fluctuations in the iner-
tial range is worked out in § 6.2.3).
The contribution to the perturbed entropy from the gy-
rocenter distribution is the integral of −h2s/2F0s, whose
evolution equation (70) can be viewed as the gyrokinetic
version of the H-theorem. The first term on the right-hand
side of this equation represents the wave–particle interaction
(collisionless damping). Under time average, it is related to
the work done on plasma [Eq. (71)] and hence to the average
externally injected power Pext via time-averaged Eq. (72).
In a stationary state, this is balanced by the second term in the
right-hand side of Eq. (70), which is the collisional-heating,
or entropy-production, term that also appears in Eq. (76).
Thus, the generalized energy channeled by collisionless
damping into entropy fluctuations is eventually converted
into heat by collisions. The sub-gyroscale entropy cascade,
which brings the perturbed distribution function hs to col-
lisional scales, will be discussed further in § 7.9 and § 7.10
(see also Schekochihin et al. 2008b).
This concludes a short primer on gyrokinetics necessary
(and sufficient) for adequate understanding of what is to fol-
low. Formally, all further analytical derivations in this paper
are simply subsidiary expansions of the gyrokinetics in the pa-
rameters we listed in § 3.1: in § 4, we expand in (me/mi)
in § 5 in k⊥ρi (followed by further subsidiary expansions in
large and small k‖λmfpi in § 6), and in § 7 in 1/k⊥ρi.
4. ISOTHERMAL ELECTRON FLUID
In this section, we carry out an expansion of the electron gy-
rokinetic equation in powers of (me/mi)
1/2 ≃ 0.02 (for hydro-
gen plasma). In virtually all cases of interest, this expansion
can be done while still considering
βi, k⊥ρi, and k‖λmfpi to
be order unity.21 Note that the assumption k⊥ρi ∼ 1 together
with Eq. (45) mean that
k⊥ρe ∼ k⊥ρi(me/mi)1/2 ≪ 1, (77)
i.e., the expansion in (me/mi)
1/2 means also that we are
considering scales larger than the electron gyroradius. The
idea of such an expansion of the electron kinetic equation
has been utilized many times in plasma physics literature.
The mass-ratio expansion of the gyrokinetic equation in a
form very similar to what is presented below is found in
Snyder & Hammett (2001).
20 Note that Eq. (72) is valid not only in the integral form but also indi-
vidually for each wavenumber: indeed, using the Fourier-transformed Fara-
day and Ampère’s laws, we have Ek · j
k + E
k · jk = Ek · j
ext,k + E
k · jext,k −
(1/4π)∂|δBk|2/∂t. In a stationary state, time averaging eliminates the time
derivative of the magnetic-fluctuation energy, so Ek · j∗k + E
k · jk = 0 at all k
except those corresponding to the outer scale, where the external energy in-
jection occurs. This means that below the outer scale, the work done on one
species balances the work done on the other. The wave–particle interaction
term in the gyrokinetic equation is responsible for this energy exchange.
21 One notable exception is the LAPD device at UCLA, where β ∼ 10−4 −
10−3 (due mostly to the electron pressure because the ions are cold, τ ∼
0.1, so βi ∼ βe/10; see, e.g., Morales et al. 1999; Carter et al. 2006). This
interferes with the mass-ratio expansion.
The primary import of this section will be technical: we
shall dispense with the electron gyrokinetic equation and thus
prepare the necessary ground for further approximations. The
main results are summarized in § 4.9. A reader who is only
interested in following qualitatively the major steps in the
derivation may skip to this summary.
4.1. Ordering the Terms in the Kinetic Equation
In view of Eq. (77), ae ≪ 1, so we can expand the Bessel
functions arising from averaging over the electron ring mo-
tion:
J0(ae) = 1 −
a2e + · · · ,
J1(ae)
a2e + · · ·
. (78)
Keeping only the lowest-order terms of the above expansions
in Eq. (69) for 〈χ〉Re , then substituting this 〈χ〉Re and qe =
−e in the electron gyrokinetic equation, we get the following
kinetic equation for the electrons, accurate up to and including
the first order in (me/mi)
1/2 (or in k⊥ρe):
︸ ︷︷ ︸
︸ ︷︷ ︸
v2the
︸ ︷︷ ︸
︸ ︷︷ ︸
v2the
︸ ︷︷ ︸
︸ ︷︷ ︸
. (79)
Note that ϕ, A‖, δB‖ in Eq. (79) are taken at r = Re. We
have indicated the lowest order to which each of the terms
enters if compared with v‖∂he/∂z. In order to obtain these
estimates, we have assumed that the physical ordering intro-
duced in § 3.1 holds with respect to the subsidiary expansion
in (me/mi)
1/2 as well as for the primary gyrokinetic expansion
in ǫ, so we can use Eqs. (3) and (12) to order terms with re-
spect to (me/mi)
1/2. We have also made use of Eqs. (45), (47),
and of the following three relations:
∼ vthe
, (80)
(v‖/c)A‖
∼ vtheδB⊥
, (81)
v2the
βi. (82)
The collision term is estimated to be zeroth order because [see
Eqs. (49), (50)]
k‖λmfpi
. (83)
The consequences of other possible orderings of the collision
terms are discussed in § 4.8. We remind the reader that all
dimensionless parameters except k‖/k⊥ ∼ ǫ and (me/mi)1/2
are held to be order unity.
We now let he = h
e + h
e + . . . and carry out the expansion
to two lowest orders in (me/mi)
4.2. Zeroth Order
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 17
To zeroth order, the electron kinetic equation is
v‖b̂ ·∇h(0)e = v‖
∂h(0)e
, (84)
where we have assembled the terms in the left-hand side to
take the form of the derivative of the distribution function
along the perturbed magnetic field:
b̂ ·∇ = ∂
·∇ = ∂
A‖, · · ·
. (85)
We now multiply Eq. (84) by h(0)e /F0e and integrate over v and
r (since we are only retaining lowest-order terms, the distinc-
tion between r and Re does not matter here). Since ∇·B = 0,
the left-hand side vanishes (assuming that all perturbations are
either periodic or vanish at the boundaries) and we get
h(0)e
∂h(0)e
‖e = 0.
The right-hand side of this equation is zero because the
electron flow velocity is zero in the zeroth order, u(0)
(1/n0e)
d3vv‖h
e = 0. This is a consequence of the paral-
lel Ampére’s law [Eq. (62)], which can be written as follows
u‖e =
4πen0e
∇2⊥A‖ + u‖i, (87)
where
u‖i =
eik·r
d3vv‖J0(ai)hik. (88)
The three terms in Eq. (87) can be estimated as follows
∼ ǫvthe
ǫ, (89)
∼ ǫ, (90)
c∇2⊥A‖
4πen0evA
∼ k⊥ρi
ǫ, (91)
where we have used the fundamental ordering (12) of the slow
waves (u‖i ∼ ǫvA) and Alfvén waves (δB⊥ ∼ ǫB0). Thus, the
two terms in the right-hand side of Eq. (87) are one order of
(me/mi)
1/2 smaller than u(0)
‖e , which means that to zeroth order,
the parallel Ampère’s law is u(0)
‖e = 0.
The collision operator in Eq. (86) contains electron–
electron and electron–ion collisions. To lowest order in
(me/mi)
1/2, the electron–ion collision operator is simply the
pitch-angle scattering operator [see Eq. (B20) in Appendix B
and recall that u‖i is first order]. Therefore, we may then
rewrite Eq. (86) as follows
h(0)e
Cee[h
νeiD (v)
1 − ξ2
∂h(0)e
= 0. (92)
Both terms in this expression are negative definite and must,
therefore, vanish individually. This implies that h(0)e must be
a perturbed Maxwellian distribution with zero mean veloc-
ity (this follows from the proof of Boltzmann’s H theorem;
see, e.g., Longmire 1963), i.e., the full electron distribution
function to zeroth order in the mass-ratio expansion is [see
Eq. (54)]:
fe = F0e +
+ h(0)e =
2πTe/me
, (93)
where ne = n0e + δne, Te = T0e + δTe. Expanding around the
unperturbed Maxwellian F0e, we get
h(0)e =
v2the
F0e, (94)
where the fields are taken at r = Re. Now substitute this so-
lution back into Eq. (84). The collision term vanishes and the
remaining equation must be satisfied at all values of v. This
gives
+ b̂ ·∇ϕ= b̂ ·∇T0e
, (95)
b̂ ·∇δTe
= 0. (96)
The collision term is neglected in Eq. (95) because, for h(0)e
given by Eq. (94), it vanishes to zeroth order.
4.3. Flux Conservation
Equation (95) implies that the magnetic flux is conserved
and magnetic-field lines cannot be broken to lowest order in
the mass-ratio expansion. Indeed, we may follow Cowley
(1985) and argue that the left-hand side of Eq. (95) is minus
the projection of the electric field on the total magnetic field
[see Eq. (37)], so we have
E · b̂ = −b̂ ·∇
; (97)
hence the total electric field is
Î − b̂b̂
and Faraday’s law becomes
= −c∇×E = ∇× (ueff ×B) , (99)
ueff =
E +∇T0e
×B, (100)
i.e., the magnetic field lines are frozen into the velocity field
ueff. In Appendix C.1, we show that this effective velocity is
the part of the electron flow velocity ue perpendicular to the
total magnetic field B [see Eq. (C6)].
The flux conservation is broken in the higher orders of the
mass-ratio expansion. In the first order, Ohmic resistivity for-
mally enters in Eq. (95) (unless collisions are even weaker
than assumed so far; if they are downgraded one order as is
done in § 4.8.3, resistivity enters in the second order). In the
second order, the electron inertia and the finiteness of the elec-
tron gyroradius also lead to unfreezing of the flux. This can be
seen formally by keeping second-order terms in Eq. (79), mul-
tiplying it by v‖ and integrating over velocities. The relative
importance of these flux unfreezing mechanisms is evaluated
in § 7.7.
18 SCHEKOCHIHIN ET AL.
4.4. Isothermal Electrons
Equation (96) mandates that the perturbed electron temper-
ature must remain constant along the perturbed field lines.
Strictly speaking, this does not preclude δTe varying across
the field lines. However, we shall now assume δTe = const (has
no spatial variation), which is justified, e.g., if the field lines
are stochastic. Assuming that no spatially uniform perturba-
tions exist, we may set δTe = 0. Equation (94) then reduces
h(0)e =
F0e(v), (101)
or, using Eq. (54),
δ fe =
F0e(v). (102)
Hence follows the equation of state for isothermal electrons:
δpe = T0eδne. (103)
4.5. First Order
We now integrate Eq. (79) over the velocity space and retain
the lowest (first) order terms only. Using Eq. (101), we get
A‖,u‖e
= 0, (104)
where the parallel electron velocity is first order:
u‖e = u
d3vv‖h
e . (105)
The velocity-space integral of the collision term does not enter
because it is subdominant by at least one factor of (me/mi)
indeed, as shown in Appendix B.1, the velocity integration
leads to an extra factor of k2⊥ρ
e , so that
∼ νeik2⊥ρ2e
i νii
, (106)
where we have used Eqs. (45) and (50). The collision term
is subdominant because of the ordering of the ion collision
frequency given by Eq. (49).
4.6. Field Equations
Using Eq. (101) and qi = Ze, n0e = Zn0i, T0e = T0i/τ ,
we derive from the quasi-neutrality equation (61) [see also
Eq. (65)]
eik·r
d3vJ0(ai)hik, (107)
and, from the perpendicular part of Ampère’s law [Eq. (66),
using also Eq. (107)],
eik·r
J0(ai) +
v2thi
J1(ai)
. (108)
The parallel electron velocity, u‖e, is determined from the par-
allel part of Ampère’s law, Eq. (87).
The ion distribution function hi that enters these equations
has to be determined by solving the ion gyrokinetic equation:
Eq. (57) with s = i.
4.7. Generalized Energy
The generalized energy (§ 3.4) for the case of isothermal
electrons is calculated by substituting Eq. (102) into Eq. (74):
T0iδ f
n0eT0e
|δB|2
, (109)
where δ fi = hi −
Zeϕ/T0i
F0i [see Eq. (54)].
4.8. Validity of the Mass-Ratio Expansion
Let us examine the range of spatial scales in which the
equations derived above are valid. In carrying out the ex-
pansion in (me/mi)
1/2, we ordered k⊥ρi ∼ 1 [Eq. (77)] and
k‖λmfpi ∼ 1 [Eq. (83)]. Formally, this means that the perpen-
dicular and parallel wavelengths of the perturbations must not
be so small or so large as to interfere with the mass ratio ex-
pansion. We now discuss the four conditions that this require-
ment leads to and whether any of them can be violated without
destroying the validity of the equations derived above.
4.8.1. k⊥ρi ≪ (mi/me)
This is equivalent to demanding that k⊥ρe ≪ 1, a condition
that was, indeed, essential for the expansion to hold [Eq. (78)].
This is not a serious limitation because electrons can be con-
sidered well magnetized at virtually all scales of interest for
astrophysical applications. However, we do forfeit the de-
tailed information about some important electron physics at
k⊥ρe ∼ 1: for example such effects as wave damping at the
electron gyroscale and the electron heating (although the total
amount of the electron heating can be deduced by subtracting
the ion heating from the total energy input). The breaking of
the flux conservation (resistivity) is also an effect that requires
incorporation of the finite electron gyroscale physics.
4.8.2. k⊥ρi ≫ (me/mi)
If this condition is broken, the small-k⊥ρi expansion, car-
ried out in § 5, must, formally speaking, precede the mass-
ratio expansion. However, it turns out that the small-
k⊥ρi expansion commutes with the mass-ratio expansion
(Schekochihin et al. 2007, see also footnote 23), so we
may use the equations derived in §§ 4.2-4.6 when k⊥ρi .
(me/mi)
4.8.3. k‖λmfpi ≪ (mi/me)
Let us consider what happens if this condition is broken
and k‖λmfpi & (mi/me)
1/2. In this case, the collisions be-
come even weaker and the expansion procedure must be mod-
ified. Namely, the collision term picks up one extra order of
(me/mi)
1/2, so it is first order in Eq. (79). To zeroth order,
the electron kinetic equation no longer contains collisions: in-
stead of Eq. (84), we have
v‖b̂ ·∇h(0)e = v‖
. (110)
We may seek the solution of this equation in the form h(0)e =
H(t,Re)F0e + h
e,hom, where H(t,Re) is an unknown function to
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 19
FIG. 4.— Region of validity in the wavenumber space of the secondary approximation—isothermal electrons and gyrokinetic ions (§ 4). It is the region
of validity of the gyrokinetic approximation (Fig. 3) further circumscribed by two conditions: k‖λmfpi ≫ (me/mi)
1/2 (isothermal electrons) and k⊥ρe ≪ 1
(magnetized electrons). The region of validity of the strongly magnetized two-fluid theory (Appendix A.2) is also shown. It is the same as for the full two-fluid
theory plus the additional constraint k⊥ρi ≪ k‖λmfpi. The region of validity of MHD (or one-fluid theory) is the subset of this with k‖λmfpi ≪ (me/mi)
(adiabatic electrons).
be determined and h(0)e,hom is the homogeneous solution satis-
fying
b̂ ·∇h(0)e,hom = 0, (111)
i.e., h(0)e,hom must be constant along the perturbed magnetic
field. This is a generalization of Eq. (96). Again assuming
stochastic field lines, we conclude that h(0)e,hom is independent
of space. If we rule out spatially uniform perturbations, we
may set h(0)e,hom = 0. The unknown function H(t,Re) is readily
expressed in terms of δne and ϕ:
d3vh(0)e ⇒ H =
, (112)
so h(0)e is again given by Eq. (101), so the equations derived
in §§ 4.2-4.6 are unaltered. Thus, the mass-ratio expansion
remains valid at k‖λmfpi & (mi/me)
4.8.4. k‖λmfpi ≫ (me/mi)
If the parallel wavelength of the fluctuations is so long that
this is violated, k‖λmfpi . (me/mi)
1/2, the collision term in
Eq. (79) is minus first order. This is the lowest-order term in
the equation. Setting it to zero obliges h(0)e to be a perturbed
Maxwellian again given by Eq. (94). Instead of Eq. (84), the
zeroth-order kinetic equation is
v‖b̂ ·∇h(0)e = v‖
∂h(1)e
. (113)
Now the collision term in this order contains h(1)e , which
can be determined from Eq. (113) by inverting the colli-
sion operator. This sets up a perturbation theory that in due
course leads to the Reduced MHD version of the general
MHD equations—this is what was considered in § 2. Equa-
tion (96) no longer needs to hold, so the electrons are not
isothermal. In this true one-fluid limit, both electrons and
ions are adiabatic with equal temperatures [see Eq. (115) be-
low]. The collisional transport terms in this limit (parallel
and perpendicular resistivity, viscosity, heat fluxes, etc.) were
calculated [starting not from gyrokinetics but from the gen-
eral Vlasov–Landau equation (36)] in exhaustive detail by
Braginskii (1965). His results and the way RMHD emerges
from them are reviewed in Appendix A.
In physical terms, the electrons can no longer be isothermal
if the parallel electron diffusion time becomes longer than the
characteristic time of the fluctuations (the Alfvén time):
vtheλmfpik2‖
⇔ k‖λmfpi .
. (114)
Furthermore, under a similar condition, electron and ion tem-
peratures must equalize: this happens if the ion–electron col-
lision time is shorter than the Alfvén time,
⇔ k‖λmfpi .
(115)
(see Lithwick & Goldreich 2001 for a discussion of these con-
ditions in application to the ISM).
4.9. Summary
The original gyrokinetic description introduced in § 3 was
a system of two kinetic equations [Eq. (57)] that evolved the
electron and ion distribution functions he, hi and three field
20 SCHEKOCHIHIN ET AL.
equations [Eqs. (61-63)] that related ϕ, A‖ and δB‖ to he and
hi. In this section, we have taken advantage of the smallness
of the electron mass to treat the electrons as an isothermal
magnetized fluid, while ions remained fully gyrokinetic.
In mathematical terms, we solved the electron kinetic equa-
tion and replaced the gyrokinetics with a simpler closed sys-
tem of equations that evolve 6 unknown functions: ϕ, A‖, δB‖,
δne, u‖e and hi. These satisfy two fluid-like evolution equa-
tions (95) and (104), three integral relations (107), (108), and
(87) which involve hi, and the kinetic equation (57) for hi.
The system is simpler because the full electron distribution
function has been replaced by two scalar fields δne and u‖e.
We now summarize this new system of equations: denoting
ai = k⊥v⊥/Ωi, we have
+ b̂ ·∇ϕ= b̂ ·∇T0e
, (116)
+ b̂ ·∇u‖e = −
,(117)
eik·r
d3vJ0(ai)hik, (118)
u‖e =
4πen0e
∇2⊥A‖ +
eik·r
d3vv‖J0(ai)hik, (119)
eik·r
J0(ai) +
v2thi
J1(ai)
, (120)
and Eq. (57) for s = i and ion–ion collisions only:
{〈χ〉Ri ,hi} =
∂〈χ〉Ri
F0i + 〈Cii[hi]〉Ri ,
(121)
where 〈Cii[. . .]〉Ri is the gyrokinetic ion–ion collision oper-
ator (see Appendix B) and the ion–electron collisions have
been neglected to lowest order in (me/mi)
1/2 [see Eq. (51)].
Note that Eqs. (116-117) have been written in a compact form,
where
+ uE ·∇ =
{ϕ, · · ·} (122)
is the convective derivative with respect to the E×B drift ve-
locity, uE = −c∇⊥ϕ× ẑ/B0, and
b̂ ·∇ = ∂
·∇ = ∂
A‖, · · ·
(123)
is the gradient along the total magnetic field (mean field plus
perturbation).
The generalized energy conserved by Eqs. (116-121) is
given by Eq. (109).
It is worth observing that the left-hand side of Eq. (116) is
simply minus the component of the electric field along the to-
tal magnetic field [see Eq. (37)]. This was used in § 4.3 to
prove that the magnetic flux described by Eq. (116) is exactly
conserved (see § 7.7 for a discussion of scales at which this
conservation is broken). Equation (116) is the projection of
the generalized Ohm’s law onto the total magnetic field—the
right-hand side of this equation is the so-called thermoelec-
tric term. This is discussed in more detail in Appendix C.1,
where we also show that Eq. (117) is the parallel part of Fara-
day’s law and give a qualitative non-gyrokinetic derivation of
Eqs. (116-117).
We will refer to Eqs. (116-121) as the equations of isother-
mal electron fluid. They are valid in a broad range of scales:
the only constraints are that k‖ ≪ k⊥ (gyrokinetic order-
ing, § 3.1), k⊥ρe ≪ 1 (electrons are magnetized, § 4.8.1) and
k‖λmfpi ≫ (me/mi)1/2 (electrons are isothermal, § 4.8.4). The
region of validity of Eqs. (116-121) in the wavenumber space
is illustrated in Fig. 4. A particular advantage of this hybrid
fluid-kinetic system is that it is uniformly valid across the
transition from magnetized to unmagnetized ions (i.e., from
k⊥ρi ≪ 1 to k⊥ρi ≫ 1).
5. TURBULENCE IN THE INERTIAL RANGE: KINETIC RMHD
Our goal in this section is to derive a reduced set of equa-
tions that describe the magnetized plasma in the limit of small
k⊥ρi. Before we proceed with an expansion in k⊥ρi, we need
to make a formal technical step, the usefulness of which will
become clear shortly. A reader with no patience for this or
any of the subsequent technical developments may skip to the
summary at the end of this section (§ 5.7).
5.1. A Technical Step
Let us formally split the ion gyrocenter distribution function
into two parts:
v⊥ ·A⊥
F0i + g
eik·Ri
J0(ai)
v2thi
J1(ai)
F0i + g. (124)
Then g satisfies the following equation, obtained by substitut-
ing Eq. (124) and the expression for ∂A‖/∂t that follows from
Eq. (116) into the ion gyrokinetic equation (121):
{〈χ〉Ri ,g}− 〈Cii[g]〉Ri
︸ ︷︷ ︸
A‖,ϕ− 〈ϕ〉Ri
︸ ︷︷ ︸
+ b̂ ·∇
v⊥ ·A⊥
︸ ︷︷ ︸
v⊥ ·A⊥
︸ ︷︷ ︸
. (125)
In the above equation, we have used compact notation in
writing out the nonlinear terms: e.g.,
A‖,ϕ− 〈ϕ〉Ri
A‖(r),ϕ(r)
〈A‖〉Ri ,〈ϕ〉Ri
, where the first Poisson
bracket involves derivatives with respect to r and the second
with respect to Ri.
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 21
The field equations (118-120) rewritten in terms of g are
−Γ1(αi)
︸ ︷︷ ︸
1 −Γ0(αi)
]Zeϕk
︸ ︷︷ ︸
d3vJ0(ai)gk
︸ ︷︷ ︸
, (126)
4πen0e
k2⊥A‖k
︸ ︷︷ ︸
d3vv‖J0(ai)gk
︸ ︷︷ ︸
= u‖ki, (127)
︸ ︷︷ ︸
Γ2(αi) +
︸ ︷︷ ︸
1 −Γ1(αi)
]Zeϕk
︸ ︷︷ ︸
v2thi
J1(ai)
︸ ︷︷ ︸
, (128)
where ai = k⊥v⊥/Ωi, αi = k
i /2 and we have defined
Γ0(αi) =
d3v [J0(ai)]
2 F0i
= I0(αi)e
−αi = 1 −αi + · · · , (129)
Γ1(αi) =
v2thi
J0(ai)
J1(ai)
F0i = −Γ′0(αi)
= [I0(αi) − I1(αi)] e−αi = 1 −
αi + · · · , (130)
Γ2(αi) =
v2thi
J1(ai)
F0i = 2Γ1(αi). (131)
Underneath each term in Eqs. (125-128), we have indicated
the lowest order in k⊥ρi to which this term enters.
5.2. Subsidiary Ordering in k⊥ρi
In order to carry out a subsidiary expansion in small k⊥ρi,
we must order all terms in Eqs. (95-104) and (125-128) with
respect to k⊥ρi. Let us again assume, like we did when ex-
panding the electron equation (§ 4), that the ordering intro-
duced for the gyrokinetics in § 3.1 holds also for the sub-
sidiary expansion in k⊥ρi. First note that, in view of Eq. (47),
we must regard Zeϕ/T0i to be minus first order:
. (132)
Also, as δB⊥/B0 ∼ ǫ [Eq. (12)],
(v‖/c)A‖
∼ vthiδB⊥
βi, (133)
so ϕ and (v‖/c)A‖ are same order.
Since u‖ = u‖i (electrons do not contribute to the mass flow),
assuming that slow waves and Alfvén waves have comparable
energies implies u‖i ∼ u⊥. As u‖i is determined by the second
equality in Eq. (127), we can order g [using Eq. (12)]:
, (134)
so g is zeroth order in k⊥ρi. Similarly, δne/n0e ∼ δB‖/B0 ∼ ǫ
are zeroth order in k⊥ρi—this follows directly from Eq. (12).
Together with Eq. (3), the above considerations allow us to
order all terms in our equations. The ordering of the collision
term involving ϕ is explained in Appendix B.2.
5.3. Alfvén Waves: Kinetic Derivation of RMHD
We shall now show that the RMHD equations (17-18) hold
in this approximation. There is a simple correspondence be-
tween the stream and flux functions defined in Eq. (16) and
the electromagnetic potentials ϕ and A‖:
ϕ, Ψ = −
4πmin0i
. (135)
The first of these definitions says that the perpendicular flow
velocity u⊥ is the E×B drift velocity; the second definition
is the standard MHD relation between the magnetic flux func-
tion and the parallel component of the vector potential.
5.3.1. Derivation of Eq. (17)
Deriving Eq. (17) is straightforward: in Eq. (95), we retain
only the lowest—minus first—order terms (those that contain
ϕ and A‖). The result is
= 0. (136)
Using Eq. (135) and the definition of the Alfvén speed, vA =
4πmin0i, we get Eq. (17). By the argument of § 4.3,
Eq. (136) expresses the fact that that magnetic-field lines are
frozen into the E×B velocity field, which is the mean flow
velocity associated with the Alfvén waves (see § 5.4).
5.3.2. Derivation of Eq. (18)
As we are about to see, in order to derive Eq. (18), we have
to separate the first-order part of the k⊥ρi expansion. The
easiest way to achieve this, is to integrate Eq. (125) over the
velocity space (keeping r constant) and expand the resulting
equation in small k⊥ρi. Using Eqs. (126) and (127) to express
the velocity-space integrals of g, we get
1 −Γ0(αi)
]Zeϕk
︸ ︷︷ ︸
−Γ1(αi)
︸ ︷︷ ︸
4πen0e
k2⊥A‖k
︸ ︷︷ ︸
d3vJ0(ai){〈χ〉Ri ,g}k
︸ ︷︷ ︸
d3vJ0(ai)
v⊥ ·A⊥
︸ ︷︷ ︸
22 SCHEKOCHIHIN ET AL.
. (137)
Underneath each term, the lowest order in k⊥ρi to which it
enters is shown. We see that terms containing ϕ are all first
order, so it is up to this order that we shall retain terms. The
collision term integrated over the velocity space picks up two
extra orders of k⊥ρi (see Appendix B.1), so it is second order
and can, therefore, be dropped. As a consequence of quasi-
neutrality, the zeroth-order part of the above equation exactly
coincides with Eq. (104), i.e, δni/n0i = δne/n0e satisfy the
same equation. Indeed, neglecting second-order terms (but
not first-order ones!), the nonlinear term in Eq. (137) (the last
term on the left-hand side) is
d3vv‖g
v2thi
, (138)
and, using Eqs. (126-128) to express velocity-space integrals
of g in the above expression, we find that the zeroth-order part
of the nonlinearity is the same as the nonlinearity in Eq. (104),
while the first-order part is
ρ2i ∇2⊥
4πen0e
∇2⊥A‖
, (139)
where we have used the expansion (129) of Γ0(αi) and con-
verted it back into x space.
Thus, if we subtract Eq. (104) from Eq. (137), the remain-
der is first order and reads
ρ2i ∇2⊥
ρ2i ∇2⊥
4πen0e
∇2⊥A‖ −
4πen0e
∇2⊥A‖
= 0. (140)
Multiplying Eq. (140) by 2T0i/Zeρ
i and using Eq. (135), we
get the second RMHD equation (18).
We have established that the Alfvén-wave component of the
turbulence is decoupled and fully described by the RMHD
equations (17) and (18). This result is the same as that in
§ 2.2 but now we have proven that collisions do not affect the
Alfvén waves and that a fluid-like description only requires
k⊥ρi ≪ 1 to be valid.
5.4. Why Alfvén Waves Ignore Collisions
Let us write explicitly the distribution function of the ion
gyrocenters [Eq. (124)] to two lowest orders in k⊥ρi:
〈ϕ〉Ri F0i +
v2thi
F0i + g + · · · , (141)
where, up to corrections of order k2⊥ρ
i , the ring-averaged
scalar potential is 〈ϕ〉Ri = ϕ(Ri), the scalar potential taken at
the position of the ion gyrocenter. Note that in Eq. (141), the
first term is minus first order in k⊥ρi [see Eq. (132)], the sec-
ond and third terms are zeroth order [Eq. (134)], and all terms
of first and higher orders are omitted. In order to compute the
full ion distribution function given by Eq. (54), we have to
convert hi to the r space. Keeping terms up to zeroth order,
we get
〈ϕ〉Ri ≃
ϕ(Ri) =
ϕ(r) +
v⊥× ẑ
·∇ϕ(r) + · · ·
ϕ(r) +
2v⊥ ·uE
v2thi
+ . . . , (142)
where uE = −c∇ϕ(r)× ẑ/B0, the E×B drift velocity. Sub-
stituting Eq. (142) into Eq. (141) and then Eq. (141) into
Eq. (54), we find
fi = F0i +
2v⊥ ·uE
v2thi
F0i +
v2thi
F0i + g + · · · . (143)
The first two terms can be combined into a Maxwellian
with mean perpendicular flow velocity u⊥ = uE . These are
the terms responsible for the Alfvén waves. The remaining
terms, which we shall denote δ f̃i, are the perturbation of the
Maxwellian in the moving frame of the Alfvén waves—they
describe the passive (compressive) component of the turbu-
lence (see § 5.5). Thus, the ion distribution function is
(πv2thi)
(v⊥ − uE)2 + v2‖
+ δ f̃i. (144)
This sheds some light on the indifference of Alfvén waves
to collisions: Alfvénic perturbations do not change the
Maxwellian character of the ion distribution. Unlike in a neu-
tral fluid or gas, where viscosity arises when particles trans-
port the local mean momentum a distance ∼ λmfpi, the parti-
cles in a magnetized plasma instantaneously take on the lo-
cal E×B velocity (they take a cyclotron period to adjust, so,
roughly speaking, ρi plays the role of the mean free path).
Thus, there is no memory of the mean perpendicular motion
and, therefore, no perpendicular momentum transport.
Some readers may find it illuminating to notice that
Eq. (140) can be interpreted as stating simply ∇·j = 0: the first
two terms represent the divergence of the polarization current,
which is perpendicular to the magnetic field;22 the last two
terms are b̂ ·∇ j‖. No contribution to the current arises from
the collisional term in Eq. (137) as ion–ion collisions cause
no particle transport to lowest order in k⊥ρi.
5.5. Compressive Fluctuations
The equations that describe the density (δne) and magnetic-
field-strength (δB‖) fluctuations follow immediately from
Eqs. (125-128) if only zeroth-order terms are kept. In these
equations, terms that involve ϕ and A‖ also contain factors
∼ k2⊥ρ2i and are, therefore, first-order [with the exception of
the nonlinearity on the left-hand side of Eq. (125)]. The fact
that 〈Cii[〈ϕ〉Ri F0i]〉Ri in Eq. (125) is first order is proved in Ap-
pendix B.2. Dropping these terms along with all other contri-
butions of order higher than zeroth and making use of Eq. (69)
22 The polarization-drift velocity is formally higher order than uE in the
gyrokinetic expansion. However, since uE does not produce any current,
the lowest-order contribution to the perpendicular current comes from the
polarization drift. The higher-order contributions to the gyrocenter distri-
bution function did not need to be calculated explicitly because the informa-
tion about the polarization charge is effectively carried by the quasi-neutrality
condition (61). We do not belabor this point because, in our approach, the no-
tion of polarization charge is only ever brought in for interpretative purposes,
but is not needed to carry out calculations. For further qualitative discussion
of the role of the polarization charge and polarization drift in gyrokinetics,
we refer the reader to Krommes 2006 and references therein.
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 23
to write out 〈χ〉Ri , we find that Eq. (125) takes the form
+ v‖ b̂ ·∇
v2thi
v2thi
, (145)
where we have used definitions (122-123) of the convective
time derivative d/dt and the total gradient along the magnetic
field b̂ · ∇ to write our equation in a compact form. Note
that, in view of the correspondence between Φ, Ψ and ϕ, A‖
[Eq. (135)], these nonlinear derivatives are the same as those
defined in Eqs. (19-20). The collision term in the right-hand
side of the above equation is the zeroth-order limit of the gy-
rokinetic ion–ion collision operator: a useful model form of it
is given in Appendix B.3 [Eq. (B18)].
To zeroth order, Eqs. (126-128) are
d3vg, (146)
d3vv‖g, (147)
v2thi
g. (148)
Note that u‖ is not an independent quantity—it can be com-
puted from the ion distribution but is not needed for the deter-
mination of the latter.
Equations (145-148) evolve the ion distribution function
g, the “slow-wave quantities” u‖, δB‖, and the density fluc-
tuations δne. The nonlinearities in Eq. (145), contained in
d/dt and b̂ ·∇, involve the Alfvén-wave quantities Φ and Ψ
(or, equivalently, ϕ and A‖) determined separately and inde-
pendently by the RMHD equations (17-18). The situation
is qualitatively similar to that in MHD (§ 2.4), except now
a kinetic description is necessary—Eqs. (145-148) replace
Eqs. (25-27)—and the nonlinear scattering/mixing of the slow
waves and the entropy mode by the Alfvén waves takes the
form of passive advection of the distribution function g. The
density and magnetic-field-strength fluctuations are velocity-
space moments of g.
Another way to understand the passive nature of the com-
pressive component of the turbulence discussed above is to
think of it as the perturbation of a local Maxwellian equilib-
rium associated with the Alfvén waves. Indeed, in § 5.4, we
split the full ion distribution function [Eq. (144)] into such a
local Maxwellian and its perturbation
δ f̃i = g +
v2thi
F0i. (149)
It is this perturbation that contains all the information about
the compressive component; the second term in the above ex-
pression enforces to lowest order the conservation of the first
adiabatic invariant µi = miv
⊥/2B. In terms of the function
(149), Eqs. (145-148) take a somewhat more compact form
(cf. Schekochihin et al. 2007):
δ f̃i −
v2thi
+ v‖b̂ ·∇
δ f̃i +
δ f̃i
, (150)
FIG. 5.— Channels of the kinetic cascade of generalized energy (§ 3.4)
from large to small scales: see § 2.7 and Appendix D.2 (inertial range,
collisional regime), § 5.6 and § 6.2.5 (inertial range, collisionless regime),
§ 7.8 and § 7.12 (dissipation range). Note that some ion heating probably
also results from the collisional and collisionless damping of the compressive
fluctuations in the inertial range (see § 6.1.2 and § 6.2.4).
d3vδ f̃i, (151)
v2thi
δ f̃i. (152)
5.6. Generalized Energy: Three KRMHD Cascades
The generalized energy (§ 3.4) in the limit k⊥ρi ≪ 1 is cal-
culated by substituting into Eq. (109) the perturbed ion dis-
tribution function δ fi = 2v⊥ · uEF0i/v2thi + δ f̃i [see Eqs. (143)
and (149)]. After performing velocity integration, we get
min0iu
n0iT0i
δ f̃ 2i
=WAW +Wcompr. (153)
We see that the kinetic energy of the Alfvénic fluctuations
has emerged from the ion-entropy part of the generalized en-
ergy. The first two terms in Eq. (153) are the total (kinetic
plus magnetic) energy of the Alfvén waves, denoted WAW. As
we learned from § 5.3, it cascades independently of the rest of
the generalized energy, Wcompr, which contains the compres-
sive component of the turbulence (§ 5.5) and is the invariant
conserved by Eqs. (150-152).
In terms of the potentials used in our discussion of RMHD
in § 2, we have
WAW =
min0i
|∇⊥Φ|2 + |∇⊥Ψ|2
min0i
|∇⊥ζ+|2 + |∇⊥ζ−|2
=W +AW +W
AW (154)
whereW +AW and W
AW are the energies of the “+” and “−” waves
[Eq. (33)], which, as we know from § 2.3, cascade by scatter-
ing off each other but without exchanging energy.
Thus, the kinetic cascade in the limit k⊥ρi ≪ 1 is split, in-
dependently of the collisionality, into three cascades: of W +AW,
24 SCHEKOCHIHIN ET AL.
W −AW and Wcompr. The compressive cascade is, in fact, split
into three independent cascades—the splitting is different in
the collisional limit (Appendix D.2) and in the collisionless
one (§ 6.2.5). Figure 5 schematically summarizes both the
splitting of the kinetic cascade that we have worked out so far
and the upcoming developments.
5.7. Summary
In § 4, gyrokinetics was reduced to a hybrid fluid-kinetic
system by means of an expansion in the electron mass, which
was valid for k⊥ρe ≪ 1. In this section, we have further re-
stricted the scale range by taking k⊥ρi ≪ 1 and as a result have
been able to achieve a further reduction in the complexity of
the kinetic theory describing the turbulent cascades. The re-
duced theory derived here evolves 5 unknown functions: Φ,
Ψ, δB‖, δne and g. The stream and flux functions, Φ and Ψ
are related to the fluid quantities (perpendicular velocity and
magnetic field perturbations) via Eq. (16) and to the electro-
magnetic potentials ϕ, A‖ via Eq. (135). They satisfy a closed
system of equations, Eqs. (17-18), which describe the decou-
pled cascade of Alfvén waves. These are the same equations
that arise from the MHD approximations, but we have now
proven that their validity does not depend on the assumption
of high collisionality (the fluid limit) and extends to scales
well below the mean free path, but above the ion gyroscale.
The physical reasons for this are explained in § 5.4. The den-
sity and magnetic-field-strength fluctuations (the “compres-
sive” fluctuations, or the slow waves and the entropy mode in
the MHD limit) now require a kinetic description in terms of
the ion distribution function g [or δ f̃i, Eq. (149)], evolved by
the kinetic equation (145) [or Eq. (150)]. The kinetic equation
contains δne and δB‖, which are, in turn calculated in terms
of the velocity-space integrals of g via Eqs. (146) and (148)
[or Eqs. (151) and (152)]. The nonlinear evolution (turbulent
cascade) of g, δB‖ and δne is due solely to passive advection
of g by the Alfvén-wave turbulence.
Let us summarize the new set of equations:
= vAb̂ ·∇Φ, (155)
∇2⊥Φ= vAb̂ ·∇∇2⊥Ψ, (156)
+ v‖ b̂ ·∇
v2thi
v2thi
, (157)
v2thi
(158)
v2thi
g, (159)
where
+{Φ, · · ·} , b̂ ·∇ = ∂
{Ψ, · · ·} . (160)
An explicit form of the collision term in the right-hand side of
Eq. (157) is provided in Appendix B.3 [Eq. (B18)].
The generalized energy conserved by Eqs. (155-159) is
given by Eq. (153). The kinetic cascade is split, the Alfvénic
cascade proceeding independently of the compressive one
(see Fig. 5).
The decoupling of the Alfvénic cascade is manifested by
Eqs. (155-156) forming a closed subset. As already noted in
§ 4.9, Eq. (155) is the component of Ohm’s law along the total
magnetic field, B ·E = 0. Equation (156) can be interpreted as
the evolution equation for the vorticity of the perpendicular
plasma flow velocity, which is the E×B drift velocity.
We shall refer to the system of equations (155-159) as Ki-
netic Reduced Magnetohydrodynamics (KRMHD).23 It is a
hybrid fluid-kinetic description of low-frequency turbulence
in strongly magnetized weakly collisional plasma that is uni-
formly valid at all scales satisfying k⊥ρi ≪ min(1,k‖λmfpi)
(ions are strongly magnetized)24 and k‖λmfpi ≫ (me/mi)1/2
(electrons are isothermal), as illustrated in Fig. 2. Therefore,
it smoothly connects the collisional and collisionless regimes
and is the appropriate theory for the study of the turbulent cas-
cades in the inertial range. The KRMHD equations generalize
rather straightforwardly to plasmas that are so collisionless
that one cannot assume a Maxwellian equilibrium distribu-
tion function (Chen et al. 2009)—a situation that is relevant
in some of the solar-wind measurements (see further discus-
sion in § 8.3).
KRMHD describe what happens to the turbulent cascade at
or below the ion gyroscale—we shall move on to these scales
in § 7, but first we would like to discuss the turbulent cascades
of density and magnetic-field-strength fluctuations and their
damping by collisional and collisionless mechanisms.
6. COMPRESSIVE FLUCTUATIONS IN THE INERTIAL RANGE
Here we first derive the nonlinear equations that govern
the evolution of the compressive (density and magnetic-field-
strength) fluctuations in the collisional (k‖λmfpi ≪ 1, § 6.1 and
Appendix D) and collisionless (k‖λmfpi ≫ 1, § 6.2) limits, dis-
cuss the linear damping that these fluctuations undergo in the
two limits and work out the form the generalized energy takes
for compressive fluctuations (which is particularly interesting
in the collisionless limit, §§ 6.2.3-6.2.5). As in previous sec-
tions, an impatient reader may skip to § 6.3 where the results
of the previous two subsections are summarized and the im-
plications for the structure of the turbulent cascades of the
density and field-strength fluctuations are discussed.
6.1. Collisional Regime
6.1.1. Equations
In the collisional regime, k‖λmfpi ≪ 1, the fluid limit is re-
covered by expanding Eqs. (155-159) in small k‖λmfpi. The
calculation that is necessary to achieve this is done in Ap-
pendix D (see also Appendix A.4). The result is a closed set
23 The term is introduced by analogy with a popular fluid-kinetic system
known as Kinetic MHD, or KMHD (see Kulsrud 1964, 1983). KMHD is de-
rived for magnetized plasmas (ρi ≪ λmfpi) under the assumption that kρs ≪ 1
and ω ≪ Ωs but without assuming either strong anisotropy (k‖ ≪ k⊥) or
small fluctuations (|δB| ≪ B0). The KRMHD equations (155-159) can be
recovered from KMHD by applying to it the GK-RMHD ordering [Eq. (12)
and § 3.1] and an expansion in (me/mi)1/2 (Schekochihin et al. 2007). This
means that the k⊥ρi expansion (§ 5), which for KMHD is the primary ex-
pansion, commutes with the gyrokinetic expansion (§ 3) and the (me/mi)1/2
expansion (§ 4), both of which preceded it in this paper.
24 The condition k⊥ρi ≪ k‖λmfpi must be satisfied because in our esti-
mates of the collision terms (Appendix B.2) we took k⊥ρi ≪ 1 while assum-
ing that k‖λmfpi ∼ 1.
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 25
of three fluid equations that evolve δB‖, δne and u‖:
= b̂ ·∇u‖ +
, (161)
= v2Ab̂ ·∇
+ ν‖ib̂ ·∇
b̂ ·∇u‖
, (162)
+κ‖ib̂ ·∇
b̂ ·∇δTi
, (163)
where
ν‖ib̂ ·∇u‖
, (164)
and ν‖i and κ‖i are the coefficients of parallel ion viscosity
and thermal diffusivity, respectively. The viscous and ther-
mal diffusion are anisotropic because plasma is magnetized,
λmfpi ≫ ρi (Braginskii 1965). The method of calculation of
ν‖i and κ‖i is explained in Appendix D.3. Here we shall ig-
nore numerical prefactors of order unity and give order-of-
magnitude values for these coefficients:
ν‖i ∼ κ‖i ∼
v2thi
∼ vthiλmfpi. (165)
If we set ν‖i = κ‖i = 0, Eqs. (161-164) are the same as the
RMHD equations of § 2 with the sound speed defined as
cs = vA
. (166)
This is the natural definition of cs for the case of adiabatic
ions, whose specific heat ratio is γi = 5/3, and isothermal elec-
trons, whose specific heat ratio is γe = 1 [because δpe = T0eδne;
see Eq. (103)]. Note that Eq. (164) is equivalent to the
pressure balance [Eq. (22) of § 2] with p = niTi + neTe and
δpe = T0eδne.
As in § 2, the fluctuations described by Eqs. (161-164) sep-
arate into the zero-frequency entropy mode and the left- and
right-propagating slow waves with
ω = ±
1 + v2A/c2s
(167)
[see Eq. (30)]. All three are cascaded independently of each
other via nonlinear interaction with the Alfvén waves. In Ap-
pendix D.2, we show that the generalized energy Wcompr for
this system, given in § 5.6, splits into the three familiar invari-
ants W +sw, W
sw, and Ws, defined by Eqs. (34-35) (see Fig. 5).
6.1.2. Dissipation
The diffusion terms add dissipation to the equations. Be-
cause diffusion occurs along the field lines of the total mag-
netic field (mean field plus perturbation), the diffusive terms
are nonlinear and the dissipation process also involves interac-
tion with the Alfvén waves. We can estimate the characteristic
parallel scale at which the diffusion terms become important
by balancing the nonlinear cascade time and the typical diffu-
sion time:
k‖vA ∼ vthiλmfpik2‖ ⇔ k‖λmfpi ∼ 1/
βi, (168)
where we have used Eq. (165).
Technically speaking, the cutoff given by Eq. (168) always
lies in the range of k‖ that is outside the region of validity
of the small-k‖λmfpi expansion adopted in the derivation of
Eqs. (161-163). In fact, in the low-beta limit, the collisional
cutoff falls manifestly in the collisionless scale range, i.e.,
the collisional (fluid) approximation breaks down before the
slow-wave and entropy cascades are damped and one must use
the collisionless (kinetic) limit to calculate the damping (see
§ 6.2.2). The situation is different in the high-beta limit: in
this case, the expansion in small k‖λmfpi can be reformulated
as an expansion in small 1/
βi and the cutoff falls within the
range of validity of the fluid approximation. Equations (161-
163) in this limit are
= b̂ ·∇u‖, (169)
= v2Ab̂ ·∇
+ ν‖ib̂ ·∇
b̂ ·∇u‖
, (170)
1 + Z/τ
5/3 + Z/τ
κ‖ib̂ ·∇
b̂ ·∇δne
. (171)
As in § 2 [Eq. (28)], the density fluctuations [Eq. (171)] have
decoupled from the slow waves [Eqs. (169-170)]. The former
are damped by thermal diffusion, the latter by viscosity. The
corresponding linear dispersion relations are
ω = −i
1 + Z/τ
5/3 + Z/τ
‖, (172)
ω =±k‖vA
ν‖ik‖
. (173)
Equation (172) describes strong diffusive damping of the den-
sity fluctuations. The slow-wave dispersion relation (173) has
two distinct regimes:
1. When k‖ < 2vA/ν‖i, it describes viscously damped slow
waves. In particular, in the limit k‖λmfpi ≪ 1/
βi, we
ω ≃±k‖vA − i
. (174)
2. For k‖ > 2vA/ν‖i, both solutions become purely imag-
inary, so the slow waves are converted into aperiodic
decaying fluctuations. The stronger-damped (diffusive)
branch has ω ≃ −iν‖ik2‖, the weaker-damped one has
ω ≃ −i v
∼ − i
λmfpi
∼ − i√
λmfpi
. (175)
This damping effect is called viscous relaxation. It is
valid until k‖λmfpi ∼ 1, where it is replaced by the col-
lisionless damping discussed in § 6.2.2 [see Eq. (190)].
The viscous and thermal-diffusive dissipation mechanisms
described above lead, in the limits where they are efficient, to
ion heating via the standard fluid (collisional) route, involving
the development of small parallel scales in the position space,
but not in velocity space (see § 3.4 and § 3.5).
6.2. Collisionless Regime
6.2.1. Equations
In the collisionless regime, k‖λmfpi ≫ 1, the collision inte-
gral in the right-hand side of the kinetic equation (157) can be
26 SCHEKOCHIHIN ET AL.
neglected. The v⊥ dependence can then be integrated out of
Eq. (157). Indeed, let us introduce the following two auxiliary
functions:
Gn(v‖) = −
dv⊥ v⊥
v2thi
g, (176)
GB(v‖) = −
dv⊥ v⊥
v2thi
g. (177)
In terms of these functions,
dv‖Gn,
dv‖GB (178)
and Eq. (157) reduces to the following two coupled one-
dimensional kinetic equations
+ v‖b̂ ·∇Gn = −
v‖FM(v‖)
×b̂ ·∇
, (179)
+ v‖b̂ ·∇GB =
v‖FM(v‖)
×b̂ ·∇
, (180)
where FM(v‖) = (1/
πvthi)exp(−v2‖/v
thi) is a one-dimensional
Maxwellian. This system can be diagonalized, so it splits into
two decoupled equations
+v‖b̂ ·∇G± =
v‖FM(v‖)
b̂ ·∇
dv′‖ G
±(v′‖), (181)
where
± = −
(182)
and we have introduced a new pair of functions
G+ = GB +
Gn, G
− = Gn +
GB, (183)
where
σ = 1 +
. (184)
Equation (181) describes two decoupled kinetic cascades,
which we will discuss in greater detail in §§ 6.2.3-6.2.5.
6.2.2. Collisionless Damping
Fluctuations described by Eq. (181) are subject to collision-
less damping. Indeed, let us linearize Eq. (181), Fourier trans-
form in time and space, divide through by −i(ω − k‖v‖), and
integrate over v‖. This gives the following dispersion relation
(the “−” branch is for G−, the “+” branch for G+)
ζiZ (ζi) = Λ
± − 1, (185)
FIG. 6.— Schematic log-log plot (artist’s impression) of the ratio of the
damping rate of magnetic-field-strength fluctuations to the Alfvén frequency
k‖vA in the high-beta limit [see Eqs. (173) and (190)]. In Barnes et al. (2009),
this plot is reproduced via a direct numerical solution of the linearized ion
gyrokinetic equation with collisions.
where ζi = ω/|k‖|vthi = ω/|k‖|vA
βi and we have used the
plasma dispersion function (Fried & Conte 1961)
Z (ζi) =
x − ζi
(186)
(the integration is along the Landau contour). This function is
not to be confused with the ion charge parameter Z = qi/e.
Formally, Eq. (185) has an infinite number of solutions.
When βi ∼ 1, they are all strongly damped with damping rates
Im(ω) ∼ |k‖|vthi ∼ |k‖|vA, so the damping time is compara-
ble to the characteristic timescale on which the Alfvén waves
cause these fluctuations to cascade to smaller scales.
It is interesting to consider the high- and low-beta limits.
High-Beta Limit. — When βi ≫ 1, we have in Eq. (185)
− − 1≃−2
, G− ≃ Gn, (187)
+ − 1≃
, G+ ≃ GB +
Gn. (188)
The “−” branch corresponds to the density fluctuations. The
solution of Eq. (185) has Im(ζi) ∼ 1, so these fluctuations are
strongly damped:
ω ∼ −i|k‖|vA
βi. (189)
The damping rate is much greater than the Alfvénic rate k‖vA
of the nonlinear cascade. In contrast, for the “+” branch, the
damping rate is small: it can be obtained by expanding Z(ζi) =
π + O (ζi), which gives25
ω = −i
|k‖|vthi√
|k‖|vA√
. (190)
Since Gn is strongly damped, Eq. (188) implies G
+ ≃ GB, i.e.,
the fluctuations that are damped at the rate (190) are predom-
inantly of the magnetic-field strength. The damping rate is a
25 This is the gyrokinetic limit (k‖/k⊥ ≪ 1) of the more general damping
effect known in astrophysics as the Barnes (1966) damping and in plasma
physics as transit-time damping. We remind the reader that our approach was
to carry out the gyrokinetic expansion (in small k‖/k⊥) first, and then take
the high-beta limit as a subsidiary expansion. A more standard approach in
the linear theory of plasma waves is to take the limit of high βi while treating
k‖/k⊥ as an arbitrary quantity. A detailed calculation of the damping rates
done in this way can be found in Foote & Kulsrud (1979).
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 27
constant (independent of k‖) small fraction ∼ 1/
βi of the
Alfvénic cascade rate.
In Fig. 6, we give a schematic plot of the damping rate of the
magnetic-field-strength fluctuations (slow waves) connecting
the fluid and kinetic limits for βi ≫ 1.
Low-Beta Limit. — When βi ≪ 1, we have
− − 1≃−
, G− ≃ Gn +
GB, (191)
+ − 1≃ 2
, G+ ≃ GB. (192)
For the “−” branch, we again have Im(ζi) ∼ 1, so
ω ∼ −i|k‖|vA
βi, (193)
which now is much smaller than the Alfvénic cascade rate
k‖vA. For the “+” branch (predominantly the field-strength
fluctuations), we seek a solution with ζ = −iζ̃i and ζ̃i ≫ 1.
Then Eq. (185) becomes ζiZ(ζi) ≃ 2
π ζ̃i exp(ζ̃i) = 2/βi. Up
to logarithmically small corrections, this gives ζ̃i ≃
| lnβi|,
whence
ω ∼ −i|k‖|vA
βi| lnβi|. (194)
While this damping rate is slightly greater than that of the “−”
branch, it is still much smaller than the Alfvénic cascade rate.
6.2.3. Collisionless Invariants
Equation (181) obeys a conservation law, which is very easy
to derive. Multiplying Eq. (181) by G±/FM and integrating
over space and velocities and performing integration by parts
in the right-hand side, we get
(G±)2
b̂ ·∇
dv‖v‖G
±. (195)
On the other hand, integrating Eq. (181) over v‖ gives
± = −b̂ ·∇
dv‖v‖G
±. (196)
Using this to express the right-hand side of Eq. (195) as a full
time derivative, we find
dW±compr
= 0, (197)
where the two invariants are
W±compr =
n0iT0i
(G±)2
(198)
It is useful (and always possible) to split
G± = FM
± + G̃±, (199)
where
dv‖G̃
± = 0 by construction. Then
W±compr =
n0iT0i
(G̃±)2
. (200)
Written in this form, the two invariants W±compr are mani-
festly positive definite quantities because Λ+ > 1 and Λ− < 0.
The invariants regulate the two decoupled kinetic cascades of
compressive fluctuations in the collisionless regime. The col-
lisionless damping derived in § 6.2.2 leads to exponential de-
cay of the density and field-strength fluctuations, or, equiva-
lently, of
±, while conserving W±compr. This means that
the damping is merely a redistribution of the conserved quan-
tity W±compr: the first term in Eq. (200) grows to compensate
for the decay of the second.
6.2.4. Linear Parallel Phase Mixing
In dynamical terms, how does the kinetic system Eq. (181)
arrange for the integral of the distribution function G±(v‖) to
decay while allowing its norm to grow? This is a very well
known phenomenon of (linear) phase mixing (Landau 1946;
Hammett et al. 1991; Krommes & Hu 1994; Krommes 1999;
Watanabe & Sugama 2004). To put it in simple terms, the
solution of the linearized Eq. (181) consists of the inhomoge-
neous part, which contains the collisionless damping and the
homogeneous part (solution of the left-hand side = 0) given by
G± ∝ e−ik‖v‖t , the so-called ballistic response (this is also the
nonlinear solution if t and k‖ are interpreted as Lagrangian
variables in the frame of the Alfvén waves; see § 6.3). As
time goes on, this part of the solution becomes increasingly
oscillatory in v‖, so its velocity integral tends to zero, while
its amplitude does not decay. It is such ballistic contributions
that make up the G̃± term in Eq. (200).
As the velocity gradient of G̃± increases with time,
∂G̃±/∂v‖ ∼ k‖tG±, at some point it can become sufficiently
large to activate the collision integral [the right-hand side of
Eq. (157)], which has so far been neglected. This way the col-
lisionless damping of compressive fluctuations can be turned
into ion heating—a simple example of a more general prin-
ciple of how electromagnetic fluctuation energy is transferred
into heat via the entropy part of the generalized energy (§ 3.5).
Indeed, we will prove in § 6.2.5 that the invariants W±compr are
constituent parts of the overall generalized energy functional
for the compressive fluctuations, so their cascade to small
scales in phase space is part of the overall kinetic cascade in-
troduced in § 3.4.
It is not entirely clear how efficient is the parallel-phase-
mixing route to ion heating and, therefore, whether the colli-
sionlessly damped energy of compressive fluctuations ends up
in the ion heat or rather reaches the ion gyroscale and couples
back to the Alfvénic component of the turbulence (§ 7.1). The
answer to this question will depend on whether compressive
fluctuations can develop large k‖—a non-trivial issue further
discussed in § 6.3.
6.2.5. Generalized Energy: Three Collisionless Cascades
We will now show how the generalized energy for com-
pressive fluctuations in the collisionless regime incorporates
the two invariants derived in § 6.2.3.
Rewriting the compressive part of the KRMHD generalized
energy [Eq. (153)] in terms of the function g [see Eq. (149)],
we get
Wcompr =
n0iT0i
. (201)
28 SCHEKOCHIHIN ET AL.
Using Eqs. (178) and (183), we can express δne and δB‖ in
terms of
± as follows
, (202)
, (203)
where σ was defined in Eq. (184) and
. (204)
In order to express g in terms of G±, we have to reconstruct
the v⊥ dependence of g, which we integrated out at the begin-
ning of § 6.2.1.
Let us represent the distribution function as follows
πv2thi
e−xĝ(x,v‖), ĝ(x,v‖) =
Ll(x)Gl(v‖), (205)
where x = v2⊥/v
thi and we have expanded ĝ in Laguerre poly-
nomials Ll(x) = (e
x/l!)(dl/dxl)xle−x. Since Laguerre polyno-
mials are orthogonal, the first term in Eq. (201) splits into a
sum of “energies” associated with the expansion coefficients:
. (206)
The expansion coefficients are determined via the Laguerre
transform:
Gl(v‖) =
dxe−xLl(x)ĝ(x,v‖). (207)
As L0 = 1 and L1 = 1 − x, it is easy to see that δne and δB‖ can
be expressed as linear combinations of
dv‖G0 and
dv‖G1
[see Eqs. (176-178)]. Using Eqs. (176), (177), and (183), we
can show that
G0 = −
+G+ +
σ − 1 −
, (208)
σΛ+G+ −
, (209)
where G± satisfy Eq. (181). As follows from Eq. (157) (ne-
glecting the collision integral), all higher-order expansion co-
efficients satisfy a simple homogeneous equation:
+ v‖b̂ ·∇Gl = 0, l > 1. (210)
Thus, the distribution function can be explicitly written in
terms of G±:
G0(v‖) +
v2thi
G1(v‖)
πv2thi
thi + g̃, (211)
where G0 and G1 are given by Eqs. (208-209) and g̃ com-
prises the rest of the Laguerre expansion (all Gl with l > 1),
i.e., it is the homogeneous solution of Eq. (157) that does not
contribute to either density or magnetic-field strength:
+ v‖b̂ ·∇g̃ = 0,
d3v g̃ = 0,
v2thi
g̃ = 0. (212)
Now substituting Eqs. (208) and (209) into Eq. (206) and
then substituting the result and Eqs. (202-203) into Eq. (201),
we find after some straightforward manipulations
Wcompr =
T0ig̃
(Λ+)2W +compr
(Λ−)2W −compr, (213)
where κ is defined by Eq. (204) and W±compr are the two inde-
pendent invariants that we derived in § 6.2.3. Thus, the gener-
alized energy for compressive fluctuations splits into three in-
dependently cascading parts: W±compr associated with the den-
sity and magnetic-field-strength fluctuations and a purely ki-
netic part given by the first term in Eq. (213) (see Fig. 5).
The dynamical evolution of this purely kinetic component is
described by Eq. (212)—it is a passively mixed, undamped
ballistic-type mode.
All three cascade channels lead to small perpendicular spa-
tial scales via passive mixing by the Alfvénic turbulence and
also to small scales in v‖ via the parallel phase mixing pro-
cess discussed in § 6.2.4 (note that g̃ is subject to this process
as well).
6.3. Parallel and Perpendicular Cascades
Let us return to the kinetic equation (157) and transform
it to the Lagrangian frame associated with the velocity field
u⊥ = ẑ×∇⊥Φ of the Alfvén waves: (t,r) → (t,r0), where
r(t,r0) = r0 +
dt ′u⊥(t
′,r(t ′,r0)). (214)
In this frame, the convective derivative d/dt defined in
Eq. (160) turns into ∂/∂t, while the parallel spatial gradient
b̂ ·∇ can be calculated by employing the Cauchy solution for
the perturbed magnetic field δB⊥ = ẑ×∇⊥Ψ:
b̂(t,r) = ẑ +
δB⊥(t,r)
= b̂(0,r0) ·∇0r, (215)
where r is given by Eq. (214) and ∇0 = ∂/∂r0. Then
b̂ ·∇ = b̂(0,r0) ·
·∇ = b̂(0,r0) ·∇0 =
, (216)
where s0 is the arc length along the perturbed magnetic field
taken at t = 0 [if δB⊥(0,r0) = 0, s0 = z0]. Thus, in the La-
grangian frame associated with the Alfvénic component of
the turbulence, Eq. (157) is linear. This means that, if the
effect of finite ion gyroradius is neglected, the KRMHD sys-
tem does not give rise to a cascade of density and magnetic-
field-strength fluctuations to smaller scales along the moving
(perturbed) field lines, i.e., b̂ · ∇δne and b̂ · ∇δB‖ do not in-
crease. In contrast, there is a perpendicular cascade (cascade
in k⊥): the perpendicular wandering of field lines due to the
Alfvénic turbulence causes passive mixing of δne and δB‖ in
the direction transverse to the magnetic field (see § 2.6 for a
quick recapitulation of the standard scaling argument on the
passive cascade that leads to a k
⊥ in the perpendicular di-
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 29
FIG. 7.— Lagrangian mixing of passive fields: fluctuations develop small
scales across, but not along the exact field lines.
rection). Figure 7 illustrates this situation.26
We emphasize that this lack of nonlinear refinement of the
scale of δne and δB‖ along the moving field lines is a particu-
lar property of the compressive component of the turbulence,
not shared by the Alfvén waves. Indeed, unlike Eq. (157), the
RMHD equations (155-156), do not reduce to a linear form
under the Lagrangian transformation (214), so the Alfvén
waves should develop small scales both across and along the
perturbed magnetic field.
Whether the density and magnetic-field-strength fluctua-
tions develop small scales along the magnetic field has direct
physical and observational consequences. Damping of these
fluctuations, both in the collisional and collisionless regimes,
discussed in § 6.1.2 and § 6.2.2, respectively, depends pre-
cisely on their scale along the perturbed field: indeed, the
linear results derived there are exact in the Lagrangian frame
(214). To summarize these results, the damping rate of δne
and δB‖ at βi ∼ 1 is
γ∼ vthiλmfpik2‖0, k‖0λmfpi ≪ 1, (217)
γ∼ vthik‖0, k‖0λmfpi ≫ 1, (218)
where k‖0 ∼ b̂ ·∇ is the wavenumber along the perturbed field
(i.e., if there is no parallel cascade, the wavenumber of the
large-scale stirring).
Whether this damping cuts off the cascades of δne and δB‖
depends on the relative magnitudes of the damping rate γ for
a given k⊥ and the characteristic rate at which the Alfvén
waves cause δne and δB‖ to cascade to higher k⊥. This rate
is ωA ∼ k‖AvA, where k‖A is the parallel wave number of the
Alfvén waves that have the same k⊥. Since the Alfvén waves
do have a parallel cascade, assuming scale-by-scale critical
balance (3) leads to [Eq. (5)]
k‖A ∼ k
0 . (219)
If, in contrast to the Alfvén waves, δne and δB‖ have no par-
allel cascade, k‖0 does not grow with k⊥, so, for large enough
k⊥, k‖0 ≪ k‖A and γ≪ωA. This means that, despite the damp-
ing, the density and field-strength fluctuations should have
perpendicular cascades extending to the ion gyroscale.
The validity of the argument at the beginning of this sec-
tion that ruled out the parallel cascade of δne and δB‖ is not
quite as obvious as it might appear. Lithwick & Goldreich
(2001) argued that the dissipation of δne and δB‖ at the ion
gyroscale would cause these fluctuations to become uncorre-
lated at the same parallel scales as the Alfvénic fluctuations by
which they are mixed, i.e., k‖0 ∼ k‖A. The damping rate then
becomes comparable to the cascade rate, cutting off the cas-
cades of density and field-strength fluctuations at k‖λmfpi ∼ 1.
The corresponding perpendicular cutoff wavenumber is [see
26 Note that effectively, there is also a cascade in k‖ if the latter is mea-
sured along the unperturbed field—more precisely, a cascade in kz. This is
due to the perpendicular deformation of the perturbed magnetic field by the
Alfvén-wave turbulence: since ∇⊥ grows while b̂ ·∇ remains the same, we
have from Eq. (123) ∂/∂z ≃ −(δB⊥/B0) ·∇⊥.
Eq. (219)]
k⊥ ∼ l1/20 λ
mfpi . (220)
Asymptotically speaking, in a weakly collisional plasma,
this cutoff is far above the ion gyroscale, k⊥ρi ≪ 1. How-
ever, the relatively small value of λmfpi in the warm ISM,
which was the main focus of Lithwick & Goldreich 2001,
meant that the numerical value of the perpendicular cutoff
scale given by Eq. (220) was, in fact, quite close both to
the ion gyroscale (see Table 1) and to the observational es-
timates for the inner scale of the electron-density fluctuations
in the ISM (Spangler & Gwinn 1990; Armstrong et al. 1995).
Thus, it was not possible to tell whether Eq. (220), rather than
k⊥ ∼ ρ−1i , represented the correct prediction.
The situation is rather different in the nearly collision-
less case of the solar wind, where the cutoff given by
Eq. (220) would mean that very little density or field-
strength fluctuations should be detected above the ion gy-
roscale. Observations do not support such a conclu-
sion: the density fluctuations appear to follow a k−5/3 law
at all scales larger than a few times ρi (Lovelace et al.
1970; Woo & Armstrong 1979; Celnikier et al. 1983, 1987;
Coles & Harmon 1989; Marsch & Tu 1990b; Coles et al.
1991), consistently with the expected behavior of an un-
damped passive scalar field (see § 2.6). An extended range
of k−5/3 scaling above the ion gyroscale is also observed for
the fluctuations of the magnetic-field strength (Marsch & Tu
1990b; Bershadskii & Sreenivasan 2004; Hnat et al. 2005;
Alexandrova et al. 2008a).
These observational facts suggest that the cutoff formula
(220) does not apply. This does not, however, conclusively
vitiate the Lithwick & Goldreich (2001) theory. Heuristically,
their argument is plausible, although it is, perhaps, useful
to note that in order for the effect of the perpendicular dis-
sipation terms, not present in the KRMHD equations (157-
159), to be felt, the density and field-strength fluctuations
should reach the ion gyroscale in the first place. Quanti-
tatively, the failure of the compressive fluctuations in the
solar wind to be damped could still be consistent with the
Lithwick & Goldreich (2001) theory because of the relative
weakness of the collisionless damping, especially at low beta
(§ 6.2.2)—the explanation they themselves favor. The way to
check observationally whether this explanation suffices would
be to make a comparative study of the compressive fluctua-
tions for solar-wind data with different values of βi. If the
strength of the damping is the decisive factor, one should al-
ways see cascades of both δne and δB‖ at low βi, no cascades
at βi ∼ 1, and a cascade of δB‖ but not δne at high βi (in
this limit, the damping of the density fluctuations is strong,
of the field-strength weak; see § 6.2.2). If, on the other hand,
the parallel cascade of the compressive fluctuations is intrin-
sically inefficient, very little βi dependence is expected and a
perpendicular cascade should be seen in all cases.
Obviously, an even more direct observational (or numer-
ical) test would be the detection or non-detection of near-
perfect alignment of the density and field-strength structures
with the moving field lines (not with the mean magnetic
field—see footnote 26), but it is not clear how to measure
this reliably. It is interesting, in this context, that in near-
the-Sun measurements, the density fluctuations are reported
to have the form of highly anisotropic filaments aligned with
the magnetic field (Armstrong et al. 1990; Grall et al. 1997;
Woo & Habbal 1997). Another intriguing piece of observa-
30 SCHEKOCHIHIN ET AL.
tional evidence is the discovery that the local structure of the
magnetic-field-strength and density fluctuations at 1 AU is, in
a certain sense, correlated with the solar cycle (Kiyani et al.
2007; Hnat et al. 2007; Wicks et al. 2009)—this suggests a
dependence on initial conditions that is absent in the Alfvénic
fluctuations and that presumably should also disappear in the
compressive fluctuations if the latter are fully mixed both in
the perpendicular and parallel directions.
7. TURBULENCE IN THE DISSIPATION RANGE: ELECTRON RMHD
AND THE ENTROPY CASCADE
7.1. Transition at the Ion Gyroscale
The validity of the theory discussed in § 5 and § 6 breaks
down when k⊥ρi ∼ 1. As the ion gyroscale is approached,
the Alfvén waves are no longer decoupled from the rest of
the plasma dynamics. All modes now contain perturbations
of density and magnetic-field strength and can be collision-
lessly damped. Because of the low-frequency nature of the
Alfvén-wave cascade, ω ≪ Ωi even at k⊥ρi ∼ 1 [Eq. (46)],
so the ion cyclotron resonance (ω − k‖v‖ = ±Ωi) is not im-
portant, while the Landau one (ω = k‖v‖) is. The linear the-
ory of this collisionless damping in the gyrokinetic approx-
imation is worked out in detail in Howes et al. (2006) (see
also Gary & Borovsky 2008). Figure 8 shows the solutions of
their dispersion relation that illustrate how the Alfvén wave
becomes a dispersive kinetic Alfvén wave (KAW) (see § 7.3)
and collisionless damping becomes important as the ion gy-
roscale is reached.
We stress that this transition occurs at the ion gyroscale, not
at the ion inertial scale di = ρi/
βi (except in the limit of cold
ions, τ = T0i/T0e ≪ 1; see Appendix E). This statement is true
even when βi is not order unity, as illustrated in Fig. 8: for the
three cases plotted there, k⊥di = 1 corresponds to k⊥ρi = 0.1,
1 and 10 for βi = 0.01, 1 and 100, respectively, but there is
no trace of the ion inertial scale in the solutions of the linear
dispersion relation. Nonlinearly, in the limit βi ≪ 1, we may
consider the scales k⊥di ∼ 1 and expand the gyrokinetics in
k⊥ρi = k⊥di
βi ≪ 1 in a way similar to how it was done in § 5
and obtain precisely the same results: Alfvénic fluctuations
described by the RMHD equations and compressive fluctua-
tions passively advected by them and satisfying the reduced
kinetic equation derived in § 5.5. Thus, even though di ≫ ρi
at low beta, there is no change in the nature of the turbulent
cascade until k⊥ρi ∼ 1 is reached.
The nonlinear theory of what happens at k⊥ρi ∼ 1 is very
poorly understood. It is, however, possible to make progress
by examining what kind of fluctuations emerge on the other
side of the transition, at k⊥ρi ≫ 1. As we will demonstrate
below, it turns out that another turbulent cascade—this time
of KAW—is possible in this so-called dissipation range. It
can transfer the energy of KAW-like fluctuations down to the
electron gyroscale, where electron Landau damping becomes
important (see Howes et al. 2006). Some observational evi-
dence of KAW is, indeed, available in the solar wind and the
magnetosphere (Bale et al. 2005; Grison et al. 2005, see fur-
ther discussion in § 8.2.4). Below we derive the equations that
describe KAW-like fluctuations in the scale range k⊥ρi ≫ 1,
k⊥ρe ≪ 1 (§ 7.2) and work out a Kolmogorov-style scaling
theory for this cascade (§ 7.5).
Because of the presence of the collisionless damping at the
ion gyroscale, only a certain fraction of the turbulent power
arriving there from the inertial range is converted into the
KAW cascade, while the rest is Landau-damped. The damp-
ing leads to the heating of the ions, but the process of deposit-
ing the collisionlessly damped fluctuation energy into the ion
heat is non-trivial because, as we explained in § 3.5, collisions
do need to play a role in order for true heating to occur. As
we explained in § 3.5 and will see specifically for the dissi-
pation range in § 7.8, the electromagnetic-fluctuation energy
does not disappear as a result of the Landau damping but is
converted into ion entropy fluctuations, while the generalized
energy is conserved. Collisions are then accessed and ion
heating achieved via a purely kinetic phenomenon: the ion
entropy cascade in phase space (nonlinear phase mixing), for
which a theory is developed in § 7.9 and § 7.10. A similar pro-
cess of conversion of the KAW energy into electron entropy
fluctuations and then electron heat is treated in § 7.12.
Figure 5 illustrates the routes energy takes from the ion gy-
roscale towards heating. Crucially, it is at k⊥ρi ∼ 1 that it
is decided how much energy would eventually go into the
ions and how much into electrons.27 How this distribution
of energy depends on plasma parameters (βi and T0i/T0e)
is an open theoretical question28 of considerable astrophys-
ical interest: e.g., the efficiency of ion heating is a key un-
known in the theory of advection-dominated accretion flows
(Quataert & Gruzinov 1999, see discussion in § 8.5) and of
the solar corona (e.g., Cranmer & van Ballegooijen 2003); we
will also see in § 7.11 that it may determine the form of the
observed dissipation-range spectra in space plasmas.
A short summary of this section is given in § 7.14.
7.2. Equations of Electron Reduced MHD
The derivation is straightforward: when ai ∼ k⊥ρi ≫ 1, all
Bessel functions in Eqs. (118-120) are small, so the integrals
of the ion distribution function vanish and Eqs. (118-120) be-
, (221)
u‖e =
4πen0e
∇2⊥A‖ = −
ρi∇2⊥Ψ√
, u‖i = 0, (222)
, (223)
where we used the definitions (135) of the stream and flux
functions Φ and Ψ.
These equations are a reflection of the fact that, for k⊥ρi ≫
1, the ion response is effectively purely Boltzmann, with the
gyrokinetic part hi contributing nothing to the fields or flows
[see Eq. (54) with hi omitted; hi does, however, play an impor-
tant role in the energy balance and ion heating, as explained
in §§ 7.8-7.10 below]. The Boltzmann response for ion den-
sity is expressed by Eq. (221). Equation (222) states that the
parallel ion flow velocity can be neglected. Finally, Eq. (223)
expresses the pressure balance for Boltzmann (and, therefore,
isothermal) electrons [Eq. (103)] and ions: if we write
B0δB‖
= −δpi − δpe = −T0iδni − T0eδne, (224)
27 Some of the energy of compressive fluctuations may go into ion heat via
collisional (§ 6.1.2) or collisionless (§ 6.2.2) damping of these fluctuations
in the inertial range. Whether this is a significant ion heating mechanism
depends on the efficiency of the parallel cascade (see § 6.2.4 and § 6.3).
28 How much energy is converted into ion entropy fluctuations in the pro-
cess of a nonlinear turbulent cascade is not necessarily directly related to the
strength of the linear collisionless damping.
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 31
FIG. 8.— Numerical solutions of the linear gyrokinetic dispersion relation (for a detailed treatment of the linear theory, see Howes et al. 2006) showing the
transition from the Alfvén wave to KAW between the inertial range (k⊥ρi ≪ 1) and the dissipation range (k⊥ρi & 1). We show three cases: low beta (βi = 0.01),
βi = 1, and high beta (βi = 100). In all three cases, τ = 1 and Z = 1. Bold solid lines show the real frequency ω, bold dashed lines the damping rate γ, both
normalized by k‖vA (in gyrokinetics, ω/k‖vA and γ/k‖vA are functions of k⊥ only). Dotted lines show the asymptotic KAW solution (230). Horizontal solid line
shows the Alfvén wave ω = k‖vA. Vertical solid lines show k⊥ρi = 1 and k⊥ρe = 1. Note that the damping can be considered strong if the characteristic decay
time is comparable or shorter than the wave period, i.e., γ/ω & 1/2π. Thus, in these plots, the damping at k⊥ρi ∼ 1 is relatively weak for βi = 1, relatively
strong for low beta and very strong for high beta.
it follows that
, (225)
which, combined with Eq. (221), gives Eq. (223). We remind
the reader that the perpendicular Ampère’s law, from which
Eq. (223) was derived [Eq. (66) via Eq. (120)] is, in gyrokinet-
ics, indeed equivalent to the statement of perpendicular pres-
sure balance (see § 3.3).
Substituting Eqs. (221-223) into Eqs. (116-117), we obtain
the following closed system of equations
b̂ ·∇Φ, (226)
2 +βi
1 + Z/τ
) b̂ ·∇
ρ2i ∇2⊥Ψ
. (227)
Note that, using Eq. (223), Eqs. (226) and (227) can be recast
as two coupled evolution equations for the perpendicular and
parallel components of the perturbed magnetic field, respec-
tively [Eqs. (C10) in Appendix C.2].
We shall refer to Eqs. (226-227) as Electron Reduced MHD
(ERMHD). They are related to the Electron Magnetohydrody-
namics (EMHD)—a fluid-like approximation that evolves the
magnetic field only and arises if one assumes that the mag-
netic field is frozen into the electron flow velocity ue, while
the ions are immobile, ui = 0 (Kingsep et al. 1990):
4πen0e
∇× [(∇×B)×B] . (228)
As explained in Appendix C.2, the result of applying the
RMHD/gyrokinetic ordering (§ 2.1 and § 3.1) to Eq. (228),
where B = B0ẑ + δB and
ẑ×∇⊥Ψ+ ẑ
, (229)
coincides with our Eqs. (226-227) in the effectively incom-
pressible limits of βi ≫ 1 or βe = βiZ/τ ≫ 1. When betas are
arbitrary, density fluctuations cannot be neglected compared
to the magnetic-field-strength fluctuations [Eq. (225)] and
give rise to perpendicular ion flows with ∇·ui 6= 0. Thus, our
ERMHD system constitutes the appropriate generalization of
EMHD for low-frequency anisotropic fluctuations without the
assumption of incompressibility.
A (more tenuous) relationship also exists between our
ERMHD system and the so-called Hall MHD, which, like
EMHD, is based on the magnetic field being frozen into
the electron flow, but includes the ion motion via the stan-
dard MHD momentum equation [Eq. (8)]. Strictly speak-
ing, Hall MHD can only be used in the limit of cold ions,
τ = T0i/T0e ≪ 1 (see, e.g., Ito et al. 2004; Hirose et al. 2004,
and Appendix E), in which case it can be shown to reduce
to Eqs. (226-227) in the appropriate small-scale limit (Ap-
pendix E). Although τ ≪ 1 is not a natural assumption for
most space and astrophysical plasmas, Hall MHD has, due to
its simplicity, been a popular theoretical paradigm in the stud-
ies of space and astrophysical plasma turbulence (see § 8.2.6).
We have therefore devoted Appendix E to showing how this
approximation fits into the theoretical framework proposed
here: namely, we derive the anisotropic low-frequency ver-
sion of the Hall MHD approximation from gyrokinetics under
the assumption τ ≪ 1 and discuss the role of the ion inertial
and ion sound scales, which acquire physical significance in
this limit. However, outside this Appendix, we assume τ ∼ 1
everywhere and shall not use Hall MHD.
The validity of the ERMHD equations as a model for
plasma dynamics in the dissipation range is further discussed
in § 7.6.
7.3. Kinetic Alfvén Waves
The linear modes supported by ERMHD are kinetic Alfvén
waves (KAW) with frequencies
ωk = ±
1 + Z/τ
2 +βi
1 + Z/τ
) k⊥ρik‖vA. (230)
This dispersion relation is illustrated in Fig. 8: note that the
transition from Alfvén waves to dispersive KAW always oc-
curs at k⊥ρi ∼ 1, even when βi ≪ 1 or βi ≫ 1. In the latter
case, there is a sharp frequency jump at the transition (accom-
panied by very strong ion Landau damping).
The eigenfunctions corresponding to the two waves with
32 SCHEKOCHIHIN ET AL.
FIG. 9.— Polarization of the kinetic Alfvén wave, see Eqs. (232) and (233).
frequencies (230) are
2 +βi
∓ k⊥Ψk. (231)
Using Eqs. (229) and (223), the perturbed magnetic-field vec-
tor can be expressed as follows
= −iẑ× k⊥
1 + Z/τ
2 +βi
1 + Z/τ
(232)
so, for a single “+” or “−” wave (corresponding to Θ−k = 0 or
k = 0, respectively), δBk rotates in the plane perpendicular
to the wave vector k⊥ clockwise with respect to the latter,
while the wave propagates parallel or antiparallel to the guide
field (Fig. 9).
The waves are elliptically right-hand polarized. Indeed, us-
ing Eq. (223), the perpendicular electric field is:
E⊥k = −ik⊥ϕ+
−ik⊥ + ẑ×k⊥
ϕ (233)
(cf. Gary 1986; Hollweg 1999). The second term is small in
the gyrokinetic expansion, so this is a very elongated ellipse
(Fig. 9).
7.4. Finite-Amplitude Kinetic Alfvén Waves
As we are about to argue for a critically balanced KAW
turbulence in a fashion analogous to the GS theory for the
Alfvén waves (§ 1.2), it is a natural question to ask how simi-
lar the nonlinear properties of a putative KAW cascade will be
to an Alfvén-wave cascade. As in the case of Alfvén waves,
there are two counterpropagating linear modes [Eqs. (230)
and (231)], and it turns out that certain superpositions of these
modes (KAW packets) are also exact nonlinear solutions of
Eqs. (226-227). Let us show that this is the case.
We might look for the nonlinear solutions of Eqs. (226-227)
by requiring that the nonlinear terms vanish. Since b̂ · ∇ =
∂/∂z + (1/vA){Ψ, · · ·}, this gives
{Ψ,Φ} = 0 ⇒ Ψ = c1Φ, (234)
{Ψ,ρ2i ∇2⊥Ψ} = 0 ⇒ ρ2i ∇2⊥Ψ = c2Ψ, (235)
where c1 and c2 are constants. Whether such solutions are
possible is determined by substituting Eqs. (234) and (235)
into Eqs. (226) and (227) and demanding that the two result-
ing linear equations be consistent with each other (both equa-
tions now just evolve Ψ). This is achieved if29
c21 = −
2 +βi
, (236)
so real solutions exist if c2 < 0. In particular, wave pack-
ets consisting of KAW given by one of the linear eigen-
modes (231) with an arbitrary shape in z but confined to a
single shell |k⊥| = k⊥ = const, satisfy Eqs. (234-236) with
c2 = −k2⊥ρ
i . This outcome is, in fact, only mildly non-trivial:
in gyrokinetics, the Poisson bracket nonlinearity [Eq. (59)]
vanishes for any monochromatic (in k⊥) mode because the
Poisson bracket of two modes with wavenumbers k⊥ and k′⊥
is ∝ ẑ · (k⊥ × k′⊥). Therefore, any monochromatic solution
of the linearized equations is also an exact nonlinear solution.
As we have shown above, a superposition of monochromatic
KAW that have a fixed k⊥, or, somewhat more generally, sat-
isfy Eq. (235) with a fixed c2, is still an exact solution.
Note that a similar procedure applied to the RMHD equa-
tions (17-18) returns the Elsasser solutions: perturbations of
arbitrary shape that satisfy Φ = ±Ψ. The physical difference
between these finite-amplitude Alfven-wave packets and the
finite-amplitude KAW packets discussed above is that non-
linear interactions can occur not just between counterpropa-
gating KAW but also between copropagating ones—a natural
conclusion because KAW are dispersive (their group velocity
along the guide field is ∝ vAk⊥ρi), so copropagating waves
with different k⊥ can “catch up” with each other and inter-
act.30
7.5. Scalings for KAW Turbulence
A scaling theory for the turbulence described by Eqs. (221-
227) can be constructed along the same lines as the GS theory
for the Alfvén-wave turbulence (§ 1.2). Namely, we shall as-
sume that the turbulence below the ion gyroscale consists of
KAW-like fluctuations with k‖ ≪ k⊥ (Quataert & Gruzinov
1999) and that the interactions between them are critically
balanced (Cho & Lazarian 2004), i.e., that the propagation
time and nonlinear interaction time are comparable at every
scale. We stress that none of these assumptions are, strictly
speaking, inevitable31 (and, in fact, neither were they in-
evitable in the case of Alfvén waves). Since we have de-
rived Eqs. (226-227) from gyrokinetics, the anisotropy of
the fluctuations described by these equations is hard-wired,
but it is not guaranteed that the actual physical cascade be-
low the ion gyroscale is indeed anisotropic, although anal-
ysis of solar-wind measurements does seem to indicate that
29 Formally speaking, c1 and c2 can depend on t and z. If this is allowed,
we still recover Eq. (236), but in addition to it, we get the evolution equation
c1∂c1/∂t = vA(1 + Z/τ )∂c1/∂z. This allows c1 = const, but there are, of
course, other solutions. We shall not consider them here.
30 The calculation above is analogous to the calculation by
Mahajan & Krishan (2005) for incompressible Hall MHD (i.e., essen-
tially, the high-βe limit of the equations discussed in Appendix E), but
the result is more general in the sense that it holds at arbitrary ion and
electron betas. The Mahajan–Krishan solution in the EMHD limit amounts
to noticing that Eq. (228) becomes linear for force-free (Beltrami) magnetic
perturbations, ∇× δB = λδB. Substituting Eq. (229) into this equation
and using Eq. (223), we see that the force-free equation is equivalent
to Eqs. (234-236) if c2 = −λ2 and the incompressible limit (βi ≫ 1 or
βe = βiZ/τ ≫ 1) is taken.
31 In fact, the EMHD turbulence was thought to be weak by several au-
thors, who predicted a k−2 spectrum of magnetic energy assuming isotropy
(Goldreich & Reisenegger 1992) or k
for the anisotropic case (Voitenko
1998; Galtier & Bhattacharjee 2003; Galtier 2006).
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 33
at least a significant fraction of it is (see Leamon et al.
1998; Hamilton et al. 2008). Numerical simulations based
on Eq. (228) (Biskamp et al. 1996, 1999; Ghosh et al. 1996;
Ng et al. 2003; Cho & Lazarian 2004; Shaikh & Zank 2005)
have revealed that the spectrum of magnetic fluctuations
scales as k
⊥ , the outcome consistent with the assumptions
stated above. Let us outline the argument that leads to this
scaling.
First assume that the fluctuations are KAW-like and that Θ+
and Θ− [Eq. (231)] have similar scaling. This implies
1 +βi
Φλ (237)
(for the purposes of scaling arguments and order-of-
magnitude estimates, we set Z/τ = 1, but keep the βi de-
pendence so low- and high-beta limits could be recovered if
necessary). The fact that fixed-k⊥ KAW packets, which sat-
isfy Eq. (237) with λ = 1/k⊥, are exact nonlinear solutions
of the ERMHD equations (§ 7.4) lends some credence to this
assumption.
Assuming scale-space locality of interactions implies a
constant-flux KAW cascade: analogously to Eq. (1),
(Ψλ/λ)
τKAWλ
∼ (1 +βi)(Φλ/ρi)
τKAWλ
∼ εKAW = const, (238)
where τKAWλ is the cascade time and εKAW is the KAW energy
flux proportional to the fraction of the total flux ε (or the total
turbulent power Pext; see § 3.4) that was converted into the
KAW cascade at the ion gyroscale.
Using Eqs. (226-227) and Eq. (237), it is not hard to see
that the characteristic nonlinear decorrelation time is λ2/Φλ.
If the turbulence is strong, then this time is comparable to
the inverse KAW frequency [Eq. (230)] scale by scale and we
may assume the cascade time is comparable to either:
τKAWλ ∼
1 +βi
. (239)
In other words, this says that ∂/∂z ∼ (δB⊥/B0) ·∇⊥ and so
δB⊥λ/B0 ∼ λ/l‖λ (note that the last relation confirms that
our scaling arguments do not violate the gyrokinetic ordering;
see § 2.1 and § 3.1). Equation (239) is the critical-balance as-
sumption for KAW. As in the case of the Alfvén waves (§ 1.2),
we might argue physically that the critical balance is set up be-
cause the parallel correlation length l‖λ is determined by the
condition that a wave can propagate the distance l‖λ in one
nonlinear decorrelation time corresponding to the perpendic-
ular correlation length λ.
Combining Eqs. (238) and (239), we get the desired scaling
relations for the KAW turbulence:
(εKAW
)1/3 vA
(1 +βi)1/3
2/3, (240)
(1 +βi)1/6
, (241)
where l0 = v
A/ε, as in § 1.2. The first of these scaling relations
is equivalent to a k
⊥ spectrum of magnetic energy, the sec-
ond quantifies the anisotropy (which is stronger than for the
GS turbulence). Both scalings were confirmed in the numer-
ical simulations of Cho & Lazarian (2004)—it is their detec-
tion of the scaling (241) that makes a particularly strong case
that KAW turbulence is not weak and that the critical balance
hypothesis applies.
For KAW-like fluctuations, the density [Eq. (221)] and
magnetic field [Eqs. (223) and (231)] have the same spec-
trum as the scalar potential, i.e., k
⊥ , while the electric field
E ∼ k⊥ϕ has a k−1/3⊥ spectrum. The solar-wind fluctuation
spectra reported by Bale et al. (2005) indeed are consistent
with a transition to KAW turbulence around the ion gyroscale:
k−5/3 magnetic and electric-field power spectra at kρi ≪ 1 are
replaced, for kρi & 1, with what appears to be consistent with
a k−7/3 scaling for the magnetic-field spectrum and a k−1/3 for
the electric one (see Fig. 1). A similar result is recovered in
fully gyrokinetic simulations with βi = 1, τ = 1 (Howes et al.
2008b). However, not all solar-wind observations are quite as
straightforwardly supportive of the notion of the KAW cas-
cade and much steeper magnetic-fluctuation spectra have also
been reported (e.g., Denskat et al. 1983; Leamon et al. 1998;
Smith et al. 2006). Possible reasons for this will emerge in
§ 7.6 and § 7.11 and the solar-wind data are further discussed
in § 8.2.4 and § 8.2.5.
7.6. Validity of the Electron RMHD and the Effect of Electron
Landau Damping
The ERMHD equations derived in § 7 are valid provided
k⊥ρi ≫ 1 and also provided it is sufficient to use the leading
order in the mass-ratio expansion (isothermal electrons; see
§ 4). In particular, this means that the electron Landau damp-
ing is neglected. Asymptotically speaking, this is a rigorous
limit, but one must be cautious in applying it to real plas-
mas. Since the width of the scale range where k⊥ρi ≫ 1 and
k⊥ρe ≪ 1 is only ∼ (mi/me)1/2 ≃ 43, for some values of the
plasma parameters (T0i/T0e and βi) there may not be a very
broad interval of scales where the electron Landau damping
is truly negligible. Consider, for example, the low-beta limit,
βi ≪ 1. In this limit, the KAW frequency is ω ∼ k⊥ρik‖vA
[Eq. (230)]. The electron Landau damping becomes impor-
tant when ω ∼ k‖vthe, or k⊥ρe ∼
βi ≪ 1, so the ERMHD
approximation breaks down and, consequently, the KAW cas-
cade, if any, should be interrupted well before the electron
gyroscale is reached. Figure 8 shows the solution of the
full gyrokinetic dispersion relation (Howes et al. 2006) for
small, unity and large βi. One can judge for which scales and
how well (or how badly) the ERMHD approximation holds
from the precision with which the exact frequency follows the
asymptotic solution Eq. (230) and from the relative strength
of the damping compared to the real frequency of the waves.
Non-negligible electron Landau damping may affect turbu-
lence spectra because one can no longer assume a constant
flux of KAW energy as we did in § 7.5. To evaluate the conse-
quences of this effect, Howes et al. (2008a) constructed a sim-
ple model of spectral energy transfer and concluded that Lan-
dau damping leads to steepening of the KAW spectra—one
of several possible reasons for steep dissipation-range spectra
observed in space plasmas (see also § 7.11).
7.7. Unfreezing of Flux
As ERMHD is a limit of the isothermal-electron-fluid sys-
tem (§ 4), the magnetic-field lines remain unbroken (see
§ 4.3). Within the orderings employed above (small mass ra-
tio, νii ∼ ω, βi ∼ 1, τ ∼ 1), the flux unfreezes only in the
vicinity of the electron gyroscale. It is interesting to evaluate
somewhat more precisely the scale at which this happens as a
function of plasma parameters.
34 SCHEKOCHIHIN ET AL.
Physically, there are three kinds of mechanisms by which
the flux conservation is broken: electron inertia, the effects of
finite electron gyroradius, and Ohmic resistivity. Let us take
the v‖ moment of the electron gyrokinetic equation [Eq. (57),
s = e, integration at constant r] and use Eq. (222) to evaluate
the inertial term in the resulting parallel electron momentum
equation:
d2e∇2⊥A‖, (242)
where de = ρe/
βe is the electron inertial scale and βe =
Zβi/τ . Comparing this with the ∂A‖/∂t term in the right-
hand side of the electron momentum equation, we see that the
electron inertia becomes important when k⊥ρe ∼
βe. The
finite-gyroradius effects enter when k⊥ρe ∼ 1. Thus, at low
βe, the electron inertia becomes important above the electron
gyroscale, whereas at high βe, the finite-gyroradius effects en-
ter first. Finally, the Ohmic resistivity comes from the colli-
sion term (see Appendix B.4):
d3vv‖
νeiu‖e ∼ νeik2⊥d2e A‖. (243)
Thus, resistivity starts to act when k⊥de ∼ (ω/νei)1/2. Using
the KAW frequency [Eq. (230)] to estimate ω and assuming
that τ is not small, we get
k⊥ρe ∼ k‖λmfpi
1 +βi
. (244)
Thus, the resistive scale can only be larger the electron gy-
roscale if the plasma is collisional (k‖λmfpi ≪ 1) and/or elec-
trons are much colder than ions (τ ≫ 1) and/or βi ≪ 1. Note
if only the last of these conditions is satisfied, the electron
inertia still becomes important at larger scales than resistivity.
7.8. Generalized Energy: KAW and Entropy Cascades
The generalized energy (§ 3.4) in the limit k⊥ρi ≫ 1 is cal-
culated by substituting Eqs. (221) and (223) into Eq. (109):
T0i〈h2i 〉r
n0iT0i
=Whi +WKAW. (245)
Here the first term, Whi , is the total variance of hi, which is
proportional to minus the entropy of the ion gyrocenter distri-
bution (see § 3.5) and whose cascade to collisional scales will
be discussed in § 7.9 and § 7.10. The remaining two terms are
the independently cascaded KAW energy:
WKAW =
min0i
|∇⊥Ψ|2
min0i
|Θ+|2 + |Θ−|2
. (246)
Although we can write WKAW as the sum of the energies of
the “+” and “−” linear KAW eigenmodes [Eq. (231)], which
are also exact nonlinear solutions (§ 7.4), the two do not cas-
cade independently and can exchange energy. Note that the
ERMHD equations also conserve
d3rΨΦ, which is readily
interpreted as the helicity of the perturbed magnetic field (see
Appendix F.3). However, it does not affect the KAW cascade
discussed in § 7.5 because it can be argued to have a tendency
to cascade inversely (Appendix F.6).
Comparing the way the generalized energy is split above
and below the ion gyroscale (see § 5.6 for the k⊥ρi ≪ 1 limit),
we interpret what happens at the k⊥ρi ∼ 1 transition as a redis-
tribution of the power that arrived from large scales between
a cascade of KAW and a cascade of the (minus) gyrocenter
entropy in the phase space (see Fig. 5). The latter cascade
is the way in which the energy diverted from the electromag-
netic fluctuations by the collisionless damping (wave–particle
interaction) can be transferred to the collisional scales and de-
posited into heat (§ 7.1). The concept of entropy cascade as
the key agent in the heating of the plasma was introduced in
§ 3.5, where we promised a more detailed discussion later on.
We now proceed to this discussion.
7.9. Entropy Cascade
The ion-gyrocenter distribution function hi satisfies the ion
gyrokinetic equation (121), where ion–electron collisions are
neglected under the mass-ratio expansion. At k⊥ρi ≫ 1, the
dominant contribution to 〈χ〉Ri comes from the electromag-
netic fluctuations associated with KAW turbulence. Since
the KAW cascade is decoupled from the entropy cascade, hi
is a passive tracer of the ring-averaged KAW turbulence in
phase space. Expanding the Bessel functions in the expres-
sion for 〈χ〉Ri ,k [ai ≫ 1 in Eq. (69) with s = i] and making
use of Eqs. (222-223) and of the KAW scaling Ψ ∼ Φ/k⊥ρi
[Eq. (231)], it is not hard to show that
〈χ〉Ri ,k ≃
〈ϕ〉Ri ,k =
J0(ai)Φk
, (247)
where
J0(ai) ≃
, ai = k⊥ρi
, (248)
so hi satisfies [Eq. (121)]
+{〈Φ〉Ri ,hi} =
βiρivA
∂〈Φ〉Ri
F0i + 〈Cii[hi]〉Ri
(249)
with the conservation law [Eq. (70), s = i]
βi ρivA
∂〈Φ〉Ri
hi 〈Cii[hi]〉Ri
. (250)
7.9.1. Nonlinear Perpendicular Phase Mixing
The wave–particle interaction term (the first term on the
right hand sides of these two equations) will shortly be seen
to be subdominant at k⊥ρi ≫ 1. It represents the source of
the invariant Whi due to the collisionless damping at the ion
gyroscale of some fraction of the energy arriving from the in-
ertial range. In a stationary turbulent state, we should have
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 35
FIG. 10.— Nonlinear perpendicular phase-mixing mechanism: the
gyrocenter distribution function at Ri of particles with velocities v⊥ and v
is mixed by turbulent fluctuations of the potential Φ (E×B flows) averaged
over particle orbits separated by a distance greater than the correlation length
of Φ.
dWhi/dt = 0 and this source should be balanced on average by
the (negative definite) collisional dissipation term ( = heating;
see § 3.5). This balance can only be achieved if hi develops
small scales in the velocity space and carries the generalized
energy, or, in this case, entropy, to scales in the phase space at
which collisions are important. A quick way to see this is by
recalling that the collision operator has two velocity deriva-
tives and can only balance the terms on the left-hand side of
Eq. (249) if
∼ ω ⇒ δv
, (251)
where ω is the characteristic frequency of the fluctuations
of hi. If νii ≪ ω, δv/vthi ≪ 1. This is certainly true for
k⊥ρi ∼ 1: taking ω ∼ k‖vA and using k‖λmfpi ≫ 1 (which
is the appropriate limit at and below the ion gyroscale for
most of the plasmas of interest; cf. footnote 24), we have
νii/ω ∼
βi/k‖λmfpi ≪ 1.
The condition (251) means that the collision rate can be ar-
bitrarily small—this will always be compensated by the suf-
ficiently fine velocity-space structure of the distribution func-
tion to produce a finite amount of entropy production (heat-
ing) independent of νii in the limit νii → +0. The situa-
tion bears some resemblance to the emergence of small spa-
tial scales in neutral-fluid turbulence with arbitrarily small
but non-zero viscosity (Kolmogorov 1941). The analogy
is not perfect, however, because the ion gyrokinetic equa-
tion (249) does not contain a nonlinear interaction term that
would explicitly cause a cascade in the velocity space. In-
stead, the (ring-averaged) KAW turbulence mixes hi in the
gyrocenter space via the nonlinear term in Eq. (249), so hi
will have small-scale structure in Ri on characteristic scales
much smaller than ρi. Let us assume that the dominant non-
linear effect is a local interaction of the small-scale fluctua-
tions of hi with the similarly small-scale component of 〈Φ〉Ri .
Since ring averaging is involved and k⊥ρi is large, the val-
ues of 〈Φ〉Ri corresponding to two velocities v and v′ will
come from spatially decorrelated electromagnetic fluctuations
if k⊥v⊥/Ωi and k⊥v
⊥/Ωi [the argument of the Bessel function
in Eq. (247)] differ by order unity, i.e., for
|v⊥ − v′⊥|
(252)
(see Fig. 10). This relation gives a correspondence between
the decorrelation scales of hi in the position and velocity
space. Combining Eqs. (252) and (251), we see that there is
a collisional cutoff scale determined by k⊥ρi ∼ (ω/νii)1/2 ≫
1.32 The cutoff scale is much smaller than the ion gyroscale.
In the range between these scales, collisional dissipation is
small. The ion entropy fluctuations are transferred across this
scale range by means of a cascade, for which we will con-
struct a scaling theory in § 7.9.2 (and, for the case without the
background KAW turbulence, in § 7.10).
It is important to emphasize that no matter how small the
collisional cutoff scale is, all of the generalized energy chan-
neled into the entropy cascade at the ion gyroscale eventually
reaches it and is converted into heat. Note that the rate at
which this happens is in general amplitude-dependent because
the process is nonlinear, although we will argue in § 7.9.4 (see
also § 7.10.3) that the nonlinear cascade time and the parallel
linear propagation (particle streaming) time are related by a
critical-balance-like condition (we will also argue there that
the linear parallel phase mixing, which can generate small
scales in v‖, is a less efficient process than the nonlinear per-
pendicular one discussed above).
It is interesting to note the connection between the entropy
cascade and certain aspects of the gyrofluid closure formal-
ism developed by Dorland & Hammett (1993). In their the-
ory, the emergence of small scales in v⊥ manifested itself as
the growth of high-order v⊥ moments of the gyrocenter distri-
bution function. They correctly identified this effect as a con-
sequence of the nonlinear perpendicular phase mixing of the
gyrocenter distribution function caused by a perpendicular-
velocity-space spread in the ring-averaged E ×B velocities
(given by 〈uE〉Ri = ẑ×∇〈Φ〉Ri in our notation) arising at and
below the ion gyroscale.
7.9.2. Scalings
Since entropy is a conserved quantity, we will follow the
well trodden Kolmogorov path, assume locality of interac-
tions in scale space and constant entropy flux, and conclude,
analogously to Eq. (1),
v8thi
h2iλ ∼ εh = const, (253)
where εh is the entropy flux proportional to the fraction of the
total turbulent power ε (or Pext; see § 3.4) that was diverted
into the entropy cascade at the ion gyroscale, and is the cas-
cade time that we now need to find.
By the critical-balance assumption, the decorrelation time
of the electromagnetic fluctuations in KAW turbulence is
comparable at each scale to the KAW period at that scale and
to the nonlinear interaction time [Eq. (239)]:
τKAWλ ∼
(1 +βi)1/3
. (254)
The characteristic time associated with the nonlinear term in
Eq. (249) is longer than τKAWλ by a factor of (ρi/λ)
1/2 due to
the ring averaging, which reduces the strength of the nonlinear
interaction. This weakness of the nonlinearity makes it pos-
sible to develop a systematic analytical theory of the entropy
32 Another source of small-scale spatial smoothing comes from the per-
pendicular gyrocenter-diffusion terms ∼ −νii(v/vthi)2k2⊥ρ
i hik that arise in
the ring-averaged collision operators, e.g., the second term in the model
operator (B13). These terms again enforce a cutoff wavenumber such that
k⊥ρi ∼ (ω/νii)
1/2 ≫ 1.
36 SCHEKOCHIHIN ET AL.
cascade (Schekochihin & Cowley 2009). It is also possible
to estimate the cascade time via a more qualitative argument
analogous to that first devised by Kraichnan (1965) for the
weak turbulence of Alfvén waves: during each KAW correla-
tion time τKAWλ, the nonlinearity changes the amplitude of hi
by only a small amount:
∆hiλ ∼ (λ/ρi)1/2hiλ ≪ hiλ; (255)
these changes accumulate with time as a random walk,
so after time t, the cumulative change in amplitude is
∆hiλ(t/τKAWλ)
1/2; finally, the cascade time t = is the time
after which the cumulative change in amplitude is compara-
ble to the amplitude itself, which gives, using Eq. (254),
τKAWλ ∼
(1 +βi)1/3
. (256)
Substituting this into Eq. (253), we get
hiλ ∼
v3thi
)1/2(
(1 +βi)1/6√
(257)
which corresponds to a k
⊥ spectrum of entropy.
In the argument presented above, we assumed that the scal-
ing of hi was determined by the nonlinear mixing of hi by
the ring-averaged KAW fluctuations rather than by the wave–
particle interaction term on the right-hand side of Eq. (249).
We can now confirm the validity of this assumption. The
change in amplitude of hi in one KAW correlation time τKAWλ
due to the wave–particle interaction term is
∆hiλ∼
v3thi
βiρivA
∼ n0i
v3thi
(εKAW
)1/3 1√
βi (1 +βi)1/3
7/6, (258)
where we have used Eq. (240). Comparing this with Eq. (255)
and using Eq. (257), we see that ∆hiλ in Eq. (258) is a factor
of (λ/ρi)
1/2 smaller than ∆hiλ due to the nonlinear mixing.
7.9.3. Phase-Space Cutoff
To work out the cutoff scales both in the position and veloc-
ity space, we use Eqs. (251) and (252): in Eq. (251), ω ∼ 1/,
where is the characteristic decorrelation time of hi given by
Eq. (256); using Eq. (252), we find the cutoffs:
∼ (νiiτρi )3/5 = Do−3/5, (259)
where τρi is the cascade time [Eq. (256)] taken at λ = ρi.
By a recently established convention, the dimensionless num-
ber Do = 1/νiiτρi is called the Dorland number. It plays
the role of Reynolds number for kinetic turbulence, mea-
suring the scale separation between the ion gyroscale and
the collisional dissipation scale (Schekochihin et al. 2008b;
Tatsuno et al. 2009a,b).
7.9.4. Parallel Phase Mixing
Another assumption, which was made implicitly, was that
the parallel phase mixing due to the second term on the left-
hand side of Eq. (249) could be ignored. This requires jus-
tification, especially because it is with this “ballistic” term
that one traditionally associates the emergence of small-scale
structure in the velocity space (e.g., Krommes & Hu 1994;
Krommes 1999; Watanabe & Sugama 2004). The effect of
the parallel phase mixing is to produce small scales in veloc-
ity space δv‖ ∼ 1/k‖t. Let us assume that the KAW turbu-
lence imparts its parallel decorrelation scale to hi and use the
scaling relation (241) to estimate k‖ ∼ l−1‖λ. Then, after one
cascade time [Eq. (256)], hi is decorrelated on the parallel
velocity scales
βi(1 +βi)
∼ 1. (260)
We conclude that the nonlinear perpendicular phase mixing
[Eq. (259)] is more efficient than the linear parallel one. Note
that up to a βi-dependent factor Eq. (260) is equivalent to a
critical-balance-like assumption for hi in the sense that the
propagation time is comparable to the cascade time, or k‖v‖ ∼
−1 [see Eq. (249)].
7.10. Entropy Cascade in the Absence of KAW Turbulence
It is not currently known how one might determine ana-
lytically what fraction of the turbulent power arriving from
the inertial range to the ion gyroscale is channeled into the
KAW cascade and what fraction is dissipated via the kinetic
ion-entropy cascade introduced in § 7.9 (perhaps it can only
be determined by direct numerical simulations). It is cer-
tainly a fact that in many solar-wind measurements, the rel-
atively shallow magnetic-energy spectra associated with the
KAW cascade (§ 7.5) fail to appear and much steeper spectra
are detected (close to k−4; see Leamon et al. 1998; Smith et al.
2006). In view of this evidence, it is interesting to ask what
would be the nature of electromagnetic fluctuations below the
ion gyroscale if the KAW cascade failed to be launched, i.e.,
if all (or most) of the turbulent power were directed into the
entropy cascade (i.e., if W ≃Whi in § 7.8).
7.10.1. Equations
It is again possible to derive a closed set of equations for all
fluctuating quantities.
Let us assume (and verify a posteriori; § 7.10.4) that the
characteristic frequency of such fluctuations is much lower
than the KAW frequency [Eq. (230)] so that the first term in
Eq. (116) is small and the equation reduces to the balance of
the other two terms. This gives
, (261)
meaning that the electrons are purely Boltzmann [he = 0 to
lowest order; see Eq. (101)]. Then, from Eq. (118),
ρivthi
eik·r
d3vJ0(ai)hik (262)
Using Eq. (262), we find from Eq. (120) that the field-
strength fluctuations are
eik·r
v2thi
J1(ai)
hik, (263)
which is smaller than Zeϕ/T0i by a factor of βi/k⊥ρi.
Therefore, we can neglect δB‖/B0 compared to δne/n0e in
Eq. (117). Using Eq. (261), we get what is physically the
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 37
electron continuity equation:
+ b̂ ·∇
4πen0e
∇2⊥A‖ + u‖i
= 0, (264)
u‖i =
eik·r
d3vv‖J0(ai)hik. (265)
Note that in terms of the stream and flux functions, Eq. (264)
takes the form
ρ2i ∇2⊥Ψ =
, (266)
where we have approximated b̂ · ∇ ≃ ∂/∂z, which will, in-
deed, be shown to be correct in § 7.10.4.
Together with the ion gyrokinetic equation, which deter-
mines hi, Eqs. (261-264) form a closed set. They describe
low-frequency fluctuations of the density and electromagnetic
field due solely to the presence of fluctuations of hi below the
ion gyroscale.
It follows from Eq. (263) that δB‖/B0 contributes subdom-
inantly to 〈χ〉Ri [Eq. (69) with s = i and ai ≫ 1]. It will be
verified a posteriori (§ 7.10.4) that the same is true for A‖.
Therefore, Eqs. (247) and (249) continue to hold, as in the
case with KAW. This means that Eqs. (249) and (262) form
a closed subset. Thus the kinetic ion-entropy cascade is self-
regulating in the sense that hi is no longer passive (as it was
in the presence of KAW turbulence; § 7.9) but is mixed by
the ring-averaged “electrostatic” fluctuations of the scalar po-
tential, which themselves are produced by hi according to
Eq. (262).
The magnetic fluctuations are passive and determined by
the electrostatic and entropy fluctuations via Eqs. (263)
and (264).
7.10.2. Scalings
From Eq. (262), we can establish a correspondence between
Φλ and hiλ (the electrostatic fluctuations and the fluctuations
of the ion-gyrocenter distribution function):
Φλ ∼ ρivthi
hiλλ, (267)
where the factor of (λ/ρi)
1/2 comes from the Bessel function
[Eq. (248)] and the factor of (δv⊥/vthi)
1/2 results from the
v⊥ integration of the oscillatory factor in the Bessel function
times hi, which decorrelates on small scales in the velocity
space and, therefore, its integral accumulates in a random-
walk-like fashion. The velocity-space scales are related to the
spatial scales via Eq. (252), which was arrived at by an ar-
gument not specific to KAW-like fluctuations and, therefore,
continues to hold.
Using Eq. (267), we find that the wave–particle interaction
term in the right-hand side of Eq. (249) is subdominant: com-
paring it with ∂hi/∂t shows that it is smaller by a factor of
(λ/ρi)
3/2 ≪ 1. Therefore, it is the nonlinear term in Eq. (249)
that controls the scalings of hiλ and Φλ.
We now assume again the scale-space locality and con-
stancy of the entropy flux, so Eq. (253) holds. The cascade
(decorrelation) time is equal to the characteristic time associ-
ated with the nonlinear term in Eq. (249): ∼ (ρi/λ)1/2λ2/Φλ.
Substituting this into Eq. (253) and using Eq. (267), we ar-
rive at the desired scaling relations for the entropy cascade
(Schekochihin et al. 2008b):
v3thi
)1/3 1√
1/6, (268)
)1/3 vthi√
7/6, (269)
)1/3 √
1/3, (270)
where l0 = v
A/ε, as in § 1.2. Note that since the existence
of this cascade depends on it not being overwhelmed by the
KAW fluctuations, we should have εKAW ≪ ε and εh = ε −
εKAW ≈ ε.
The scaling for the ion-gyrocenter distribution function,
Eq. (268), implies a k
⊥ spectrum—the same as for the KAW
turbulence [Eq. (257)]. The scaling for the the cascade time,
Eq. (270), is also similar to that for the KAW turbulence
[Eq. (256)]. Therefore the velocity- and gyrocenter-space cut-
offs are still given by Eq. (259), where τρi is now given by
Eq. (270) taken at λ = ρi.
A new feature is the scaling of the scalar potential, given by
Eq. (269), which corresponds to a k
−10/3
⊥ spectrum (unlike the
KAW spectrum, § 7.5). This is a measurable prediction for the
electrostatic fluctuations: the implied electric-field spectrum
⊥ . From Eq. (261), we also conclude that the density
fluctuations should have the same spectrum as the scalar po-
tential, k
−10/3
⊥ —another measurable prediction.
The scalings derived above for the spectra of the ion
distribution function and of the scalar potential have been
confirmed in the numerical simulations by Tatsuno et al.
(2009a,b), who studied decaying electrostatic gyrokinetic tur-
bulence in two spatial dimensions. They also found velocity-
space scalings in accord with Eq. (252) (using a spectral
representation of the correlation functions in the v⊥ space
based on the Hankel transform of the distribution function;
see Plunk et al. 2009).
7.10.3. Parallel Cascade and Parallel Phase Mixing
We have again ignored the ballistic term (the second on
the left-hand side) in Eq. (249). We will estimate the effi-
ciency of the parallel spatial cascade of the ion entropy and of
the associated parallel phase mixing by making a conjecture
analogous to the critical balance: assuming that any two per-
pendicular planes only remain correlated provided particles
can stream between them in one nonlinear decorrelation time
(cf. § 1.2 and § 7.9.4), we conclude that the parallel particle-
streaming frequency k‖v‖ should be comparable at each scale
to the inverse nonlinear time −1, so
k‖vthi ∼ 1. (271)
As we explained in § 7.9.4, the parallel scales in the velocity
space generated via the ballistic term are related to the parallel
wavenumbers by δv‖ ∼ 1/k‖t. From Eq. (271), we find that
after one cascade time , the typical parallel velocity scale is
δv‖/vthi ∼ 1, so the parallel phase mixing is again much less
efficient than the perpendicular one.
Note that Eq. (271) combined with Eq. (270) means that the
anisotropy is again characterized by the scaling relation k‖ ∼
⊥ , similarly to the case of KAW turbulence [see Eq. (241)
and § 7.9.4].
38 SCHEKOCHIHIN ET AL.
7.10.4. Scalings for the Magnetic Fluctuations
The scaling law for the fluctuations of the magnetic-field
strength follows immediately from Eqs. (263) and (269):
ρivthi
−11/6
13/6, (272)
whence the spectrum of these fluctuations is k
−16/3
The scaling of A‖ (the perpendicular magnetic fluctuations)
depends on the relation between k‖ and k⊥. Indeed, the ratio
between the first and the third terms on the left-hand side of
Eq. (264) [or, equivalently, between the first and second terms
on the right-hand side of Eq. (266)] is ∼
k‖vthi
. For a crit-
ically balanced cascade, this makes the two terms comparable
[Eq. (271)]. Using the first term to work out the scaling for the
perpendicular magnetic fluctuations, we get, using Eq. (269),
ρivthi
−11/6
13/6, (273)
which is the same scaling as for δB‖/B0 [Eq. (272)].
Using Eq. (273) together with Eqs. (269) and (270), it is
now straightforward to confirm the three assumptions made
in § 7.10.1 that we promised to verify a posteriori:
1. In Eq. (116), ∂A‖/∂t ≪ cb̂ ·∇ϕ, so Eq. (261) holds (the
electrons remain Boltzmann). This means that no KAW
can be excited by the cascade.
2. δB⊥/B0 ≪ k‖/k⊥, so b̂ ·∇ ≃ ∂/∂z in Eq. (264). This
means that field lines are not significantly perturbed.
3. In the expression for 〈χ〉Ri [Eq. (69)], v‖A‖/c ≪ ϕ, so
Eq. (249) holds. This means that the electrostatic fluc-
tuations dominate the cascade.
7.11. Cascades Superposed?
The spectra of magnetic fluctuations obtained in § 7.10.4
are very steep—steeper, in fact, than those normally observed
in the dissipation range of the solar wind (§ 8.2.5). One might
speculate that the observed spectra may be due to a superposi-
tion of the two cascades realizable below the ion gyroscale: a
high-frequency cascade of KAW (§ 7.5) and a low-frequency
cascade of electrostatic fluctuations due to the ion entropy
fluctuations (§ 7.10). Such a superposition could happen if
the power going into the KAW cascade is relatively small,
εKAW ≪ ε. One then expects an electrostatic cascade to be
set up just below the ion gyroscale with the KAW cascade
superseding it deeper into the dissipation range. Comparing
Eqs. (240) and (269), we can estimate the position of the spec-
tral break:
k⊥ρi ∼
ε/εKAW
. (274)
Since ρi/ρe ∼ (τmi/me)1/2/Z is not a very large number, the
dissipation range is not very wide. It is then conceivable that
the observed spectra are not true power laws but simply non-
asymptotic superpositions of the electrostatic and KAW spec-
tra with the observed range of “effective” spectral exponents
due to varying values of the spectral break (274) between the
two cascades.33
33 Several alternative theories that aim to explain the dissipation-range
spectra exist: see § 8.2.6.
The value of εKAW/ε specific to any particular set of param-
eters (βi, τ , etc.) is set by what happens at k⊥ρi ∼ 1 (§ 7.1;
see § 8.2.2, § 8.2.5, and § 8.5 for further discussion).
7.12. Below the Electron Gyroscale: The Last Cascade
Finally, let us consider what happens when k⊥ρe ≫ 1. At
these scales, we have to return to the full gyrokinetic sys-
tem of equations. The quasi-neutrality [Eq. (61)], parallel
[Eq. (62)] and perpendicular [Eq. (66)] Ampère’s law become
eik·r
d3vJ0(ae)hek, (275)
4πen0e
∇2⊥A‖ =
eik·r
d3vv‖J0(ae)hek, (276)
eik·r
v2the
J1(ae)
hek, (277)
where βe = βiZ/τ . We have discarded the velocity integrals
of hi both because the gyroaveraging makes them subdom-
inant in powers of (me/mi)
1/2 and because the fluctuations
of hi are damped by collisions [assuming the collisional cut-
off given by Eq. (259) lies above the electron gyroscale]. To
Eqs. (275-277), we must append the gyrokinetic equation for
he [Eq. (57) with s = e], thus closing the system.
The type of turbulence described by these equations is very
similar to that discussed in § 7.10. It is easy to show from
Eqs. (275-277) that
. (278)
Hence the magnetic fluctuations are subdominant in the ex-
pression for 〈χ〉Re [Eq. (69) with s = e and ae ≫ 1], so
〈χ〉Re ≃ 〈ϕ〉Re . The electron gyrokinetic equation then is
{〈ϕ〉Re ,he} =
, (279)
where the wave–particle interaction term in the right-hand
side has been dropped because it can be shown to be small
via the same argument as in § 7.10.2.
Together with Eq. (275), Eq. (279) describes the kinetic cas-
cade of electron entropy from the electron gyroscale down to
the scale at which electron collisions can dissipate it into heat.
This cascade the result of collisionless damping of KAW at
k⊥ρe ∼ 1, whereby the power in the KAW cascade is con-
verted into the electron-entropy fluctuations: indeed, in the
limit k⊥ρe ≫ 1, the generalized energy is simply
= Whe (280)
(see Fig. 5).
The same scaling arguments as in § 7.10.2 apply and scaling
relations analogous to Eqs. (268-270), and (272) duly follow:
v3the
(εKAW
1/6, (281)
(εKAW
vthe l
7/6, (282)
)1/3(
, (283)
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 39
(εKAW
−11/6
13/6, (284)
where l0 = v
A/ε, as in § 1.2. The formula for the collisional
cutoffs in the wavenumber and velocity space is analogous to
Eq. (259):
∼ (νeiτρe )3/5, (285)
where τρe is the cascade time (283) taken at λ = ρe.
7.13. Validity of Gyrokinetics in the Dissipation Range
As the kinetic cascade takes the (generalized) energy to ever
smaller scales, the frequency ω of the fluctuations increases.
In applying the gyrokinetic theory, one must be mindful of
the need for this frequency to stay smaller than Ωi. Using
the scaling formulae for the characteristic times of the fluc-
tuations derived above [Eqs. (254), (270) and (283)], we can
determine the conditions for ω ≪ Ωi. Thus, for the gyroki-
netic theory to be valid everywhere in the inertial range, we
must have
k⊥ρi ≪ β3/4i
(286)
at all scales down to k⊥ρi ∼ 1, i.e., ρi/l0 ≪ β3/2i , not a very
stringent condition.
Below the ion gyroscale, the KAW cascade (§ 7.5) remains
in the gyrokinetic regime as long as
k⊥ρi ≪
i (1 +βi)
(287)
(we are assuming Ti/Te ∼ 1 everywhere). The condition for
this still to be true at the electron gyroscale is
i (1 +βi)
. (288)
The ion entropy fluctuations passively mixed by the KAW tur-
bulence (§ 7.9) satisfy Eq. (287) at all scales down to the ion
collisional cutoff [Eq. (259)] if
λmfpi
i (1 +βi)
. (289)
Note that the condition for the ion collisional cutoff to lie
above the electron gyroscale is
λmfpi
βi(1 +βi)1/3
)5/6(
(290)
In the absence of KAW turbulence, the pure ion-entropy cas-
cade (§ 7.10) remains gyrokinetic for
k⊥ρi ≪ β3/2i
. (291)
This is valid at all scales down to the ion collisional cutoff
provided λmfpi/l0 ≪ β3i (l0/ρi), an extremely weak condition,
which is always satisfied. This is because the ion-entropy
fluctuations in this case have much lower frequencies than in
the KAW regime. The ion collisional cutoff lies above the
electron gyroscale if, similarly to Eq. (290),
λmfpi
)5/6(
. (292)
If the condition (290) is satisfied, all fluctuations of the
ion distribution function are damped out above the electron
gyroscale. This means that below this scale, we only need
the electron gyrokinetic equation to be valid, i.e., ω ≪ Ωe.
The electron-entropy cascade (§ 7.12), whose characteristic
timescale is given by Eq. (283), satisfies this condition for
k⊥ρe ≪
β3/2e
. (293)
This is valid at all scales down to the electron
collisional cutoff [Eq. (285)] provided λmfpe/l0 ≪
(ε/εKAW)
2β3e (mi/me)
3(l0/ρe), which is always satisfied.
Within the formal expansion we have adopted (k⊥ρi ∼ 1
and k‖λmfpi ∼
βi), it is not hard to see that λmfpi/l0 ∼ ǫ2
and ρi/l0 ∼ ǫ3. Since all other parameters (me/mi, βi, βe
etc.) are order unity with respect to ǫ, all of the above con-
ditions for the validity of the gyrokinetics are asymptotically
correct by construction. However, in application to real as-
trophysical plasmas, one should always check whether this
construction holds. For example, substituting the relevant pa-
rameters for the solar wind shows that the gyrokinetic ap-
proximation is, in fact, likely to start breaking down some-
where between the ion and electron gyroscales (Howes et al.
2008a).34 This releases a variety of high-frequency wave
modes, which may be participating in the turbulent cascade
around and below the electron gyroscale (see, e.g., the recent
detailed observations of these scales in the magnetosheath by
Mangeney et al. 2006; Lacombe et al. 2006 or the early mea-
surements of high-frequency fluctuations in the solar wind by
Denskat et al. 1983; Coroniti et al. 1982).
7.14. Summary
In this section, we have analyzed the turbulence in the dissi-
pation range, which turned out to have many more essentially
kinetic features than the inertial range.
At the ion gyroscale, k⊥ρi ∼ 1, the kinetic cascade rear-
ranged itself into two distinct components: part of the (gener-
alized) energy arriving from the inertial range was collision-
lessly damped, giving rise to a purely kinetic cascade of ion-
entropy fluctuations, the rest was converted into a cascade of
Kinetic Alfvén Waves (KAW) (Fig. 5; see § 7.1 and § 7.8).
The KAW cascade is described by two fluid-like equa-
tions for two scalar functions, the magnetic flux function
Ψ = −A‖/
4πmin0i and the scalar potential, expressed, for
continuity with the results of § 5, in terms of the function
Φ = (c/B0)ϕ. The equations are (see § 7.2)
b̂ ·∇Φ, (294)
2 +βi
1 + Z/τ
) b̂ ·∇
ρ2i ∇2⊥Ψ
, (295)
where b̂ · ∇ = ∂/∂z + (1/vA){Ψ, · · ·}. The density and
34 See this paper also for a set of numerical tests of the validity of gy-
rokinetics in the dissipation range, a linear theory of the conversion of KAW
into ion-cyclotron-damped Bernstein waves, and a discussion of the potential
(un)importance of ion cyclotron damping for the dissipation of turbulence.
40 SCHEKOCHIHIN ET AL.
magnetic-field-strength fluctuations are directly related to the
scalar potential:
. (296)
We call Eqs. (294-296) the Electron Reduced Magnetohydro-
dynamics (ERMHD).
The ion-entropy cascade is described by the ion gyrokinetic
equation:
+{〈Φ〉Ri ,hi} = 〈Cii[hi]〉Ri . (297)
The ion distribution function is mixed by the ring-averaged
scalar potential and undergoes a cascade both in the velocity
and gyrocenter space—this phase-space cascade is essential
for the conversion of the turbulent energy into the ion heat,
which can ultimately only be done by collisions (see § 7.9).
If the KAW cascade is strong (its power εKAW is an order-
unity fraction of the total injected turbulent power ε), it de-
termines Φ in Eq. (297), so the ion-entropy cascade is passive
with respect to the KAW turbulence. Equations (294-295) and
(297) form a closed system that determines the three func-
tions Φ, Ψ, hi, of which the latter is slaved to the first two.
One can also compute δne and δB‖, which are proportional
to Φ [Eq. (296)]. The generalized energy conserved by these
equations is given by Eq. (245).
If the KAW cascade is weak (εKAW ≪ ε), the ion-entropy
cascade dominates the turbulence in the dissipation range and
drives low-frequency mostly electrostatic fluctuations, with a
subdominant magnetic component. These are given by the
following relations (see § 7.10)
ρivthi
2(1 + τ/Z)
eik·r
d3vJ0(ai)hik, (298)
ρivthi
, (299)
eik·r ×
1 + Z/τ
J0(ai)
hik, (300)
eik·r
v2thi
J1(ai)
hik, (301)
where ai = k⊥v⊥/Ωi, Equations (297) and (298) form a closed
system for Φ and hi. The rest of the fields, namely δne, Ψ and
δB‖, are slaved to hi via Eqs. (299-301).
The fluid and kinetic models summarized above are valid
between the ion and electron gyroscales. Below the electron
gyroscale, the collisionless damping of the KAW cascade con-
verts it into a cascade of electron entropy, similar in nature to
the ion-entropy cascade (§ 7.12).
The KAW cascade and the low-frequency turbulence asso-
ciated with the ion-entropy cascade have distinct scaling be-
haviors. For the KAW cascade, the spectra of the electric,
density and magnetic fluctuations are (§ 7.5)
EE (k⊥) ∝ k−1/3⊥ , En(k⊥) ∝ k
⊥ , EB(k⊥) ∝ k
⊥ . (302)
For the ion- and electron-entropy cascades (§ 7.9 and § 7.12),
EE (k⊥) ∝ k−4/3⊥ , En(k⊥) ∝ k
−10/3
⊥ , EB(k⊥) ∝ k
−16/3
(303)
We argued in § 7.11 that the observed spectra in the dissipa-
tion range of the solar wind could be the result of a superpo-
sition of these two cascades, although a number of alternative
theories exist (§ 8.2.6).
8. DISCUSSION OF ASTROPHYSICAL APPLICATIONS
We have so far only occasionally referred to some relevant
observational evidence for space and astrophysical plasmas.
We now discuss in more detail how the theoretical framework
laid out above applies to real plasma turbulence in space.
Although we will discuss the interstellar medium, accre-
tion disks and galaxy clusters towards the end of this sec-
tion, the most rewarding source of observational information
about plasma turbulence in astrophysical conditions is the so-
lar wind and the magnetosheath because only there direct in
situ measurements of all the interesting quantities are possi-
ble. Measurements of the fluctuating magnetic and velocity
fields in the solar wind have been available since the 1960s
(Coleman 1968) and a vast literature now exists on their spec-
tra, anisotropy, Alfvénic character and many other aspects (a
short recent review is Horbury et al. 2005; two long ones are
Tu & Marsch 1995; Bruno & Carbone 2005). It is not our
aim here to provide a comprehensive survey of what is known
about plasma turbulence in the solar wind. Instead, we shall
limit our discussion to a few points that we consider impor-
tant in light of the theoretical framework proposed in this pa-
per.35 As we do this, we shall provide copious references to
the main body of the paper, so this section can be read as a
data-oriented guide to it, aimed both at a thorough reader who
has arrived here after going through the preceding sections
and an impatient one who has skipped to this one hoping to
find out whether there is anything of “practical” use in the
theoretical developments above.
8.1. Inertial-Range Turbulence in the Solar Wind
In the inertial range, i.e., for k⊥ρi ≪ 1, the solar-wind turbu-
lence should be described by the reduced hybrid fluid-kinetic
theory derived in § 5 (KRMHD). Its applicability hinges on
three key assumptions: (i) the turbulence is Alfvénic, i.e., con-
sists of small (δB/B0 ≪ 1) low-frequency (ω ∼ k‖vA ≪ Ωi)
perturbations of an ambient mean magnetic field and corre-
sponding velocity fluctuations; (ii) it is strongly anisotropic,
k⊥ ≫ k‖; (iii) the equilibrium distribution can be approxi-
mated or, at least, reasonably modeled by a Maxwellian with-
out loss of essential physics (this will be discussed in § 8.3).
If these assumptions are satisfied, KRMHD (summarized in
§ 5.7) is a rigorous set of dynamical equations for the inertial
range, a set of Kolmogorov-style scaling predictions for the
Alfvénic component of the turbulence can be produced (the
GS theory, reviewed in § 1.2), while to the compressive fluc-
tuations, the considerations of § 6 apply. So let us examine
the observational evidence.
8.1.1. Alfvénic Nature of the Turbulence
The presence of Alfvén waves in the solar wind was re-
ported already the early works of Unti & Neugebauer (1968)
and Belcher & Davis (1971). Alfvén waves are detected al-
ready at very low frequencies (large scales)—and, at these
35 An extended quantitative discussion of the applicability of the gyroki-
netic theory to the turbulence in the slow solar wind was given by Howes et al.
(2008a).
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 41
low frequencies, have a k−1 spectrum.36 This spectrum cor-
responds to a uniform distribution of scales/frequencies of
waves launched by the coronal activity of the Sun. Nonlin-
ear interaction of these waves gives rise to an Alfvénic tur-
bulent cascade of the type that was discussed above. The ef-
fective outer scale of this cascade can be detected as a spec-
tral break where the k−1 scaling steepens to the Kolmogorov
slope k−5/3 (see Bavassano et al. 1982; Marsch & Tu 1990a;
Horbury et al. 1996 for fast-wind results on the spectral break;
for a discussion of the effective outer scale in the slow wind
at 1 AU, see Howes et al. 2008a). The particular scale at
which this happens increases with the distance from the Sun
(Bavassano et al. 1982), reflecting the more developed state
of the turbulence at later stages of evolution. At 1 AU, the
outer scale is roughly in the range of 105 − 106 km; the k−5/3
range extends down to scales/frequencies that correspond to a
few times the ion gyroradius (102 − 103 km; see Table 1).
The range between the outer scale (the spectral break) and
the ion gyroscale is the inertial range. In this range, δB/B0 de-
creases with scale because of the steep negative spectral slope.
Therefore, the assumption of small fluctuations, δB/B0 ≪ 1,
while not necessarily true at the outer scale, is increasingly
better satisfied further into the inertial range (cf. § 1.3).
Are these fluctuations Alfvénic? In a plasma such as the
solar wind, they ought to be because, as showed in § 5.3, for
k⊥ρi ≪ 1, these fluctuations are rigorously described by the
RMHD equations. The magnetic flux is frozen into the ion
motions, so displacing a parcel of plasma should produce a
matching (Alfvénic) perturbation of the magnetic field line
and vice versa: in an Alfvén wave, u⊥ = ±δB⊥/
4πmin0i.
The strongest confirmation that this is indeed true for the
inertial-range fluctuations in the solar wind was achieved by
Bale et al. (2005), who compared the spectra of electric and
magnetic fluctuations and found that they both scale as k−5/3
and follow each other with remarkable precision (see Fig. 1).
The electric field is a very good measure of the perpendicular
velocity field because, for k⊥ρi ≪ 1, the plasma velocity is
the E×B drift velocity, u⊥ = cE× ẑ/B0 (see § 5.4).
This picture of agreement between basic theory and ob-
servations is upset in a disturbing fashion by an extraordi-
nary recent result by Chapman & Hnat (2007); Podesta et al.
(2006) and J. E. Borovsky (2008, private communication),
who claim different spectral indices for velocity and mag-
netic fluctuations—k−3/2 and k−5/3, respectively. This result
is puzzling because if it is asymptotically correct in the iner-
tial range, it implies either u⊥ ≫ δB⊥ or u⊥ ≪ δB⊥ and it is
not clear how perpendicular velocity fluctuations in a near-
ideal plasma could fail to produce Alfvénic displacements
and, therefore, perpendicular magnetic field fluctuations with
matching energies. Plausible explanations may be either that
the velocity field in these measurements is polluted by a non-
Alfvénic component parallel to the magnetic field (although
data analysis by Chapman & Hnat 2007 does not support this)
or that the flattening of the velocity spectrum is due to some
form of a finite-gyroradius effect or even an energy injection
into the velocity fluctuations at scales approaching the ion
gyroscale (e.g., from the pressure-anisotropy-driven instabili-
36 Inferred from the frequency spectrum f −1 via the Taylor (1938) hypoth-
esis, f ∼ k ·Vsw , where Vsw is the mean velocity at which the wind blows
past the spacecraft. The Taylor hypothesis is a good assumption for the so-
lar wind because Vsw (∼ 800 km/s in the fast wind, ∼ 300 km/s in the slow
wind) is highly supersonic, super-Alfvénic and far exceeds the fluctuating
velocities.
ties, § 8.3).
8.1.2. Energy Spectrum
How solid is the statement that the observed spectrum
has a k−5/3 scaling? In individual measurements of the
magnetic-energy spectra, very high accuracy is claimed
for this scaling: the measured spectral exponent is be-
tween 1.6 and 1.7; agreement with Kolmogorov value 1.67
is often reported to be within a few percent (see, e.g.,
Horbury et al. 1996; Leamon et al. 1998; Bale et al. 2005;
Narita et al. 2006; Alexandrova et al. 2008a; Horbury et al.
2008)). There is a somewhat wider scatter of spectral in-
dices if one considers large sets of measurement intervals
(Smith et al. 2006), but overall, the observational evidence
does not appear to be consistent with a k
⊥ spectrum consis-
tently found in the MHD simulations with a strong mean field
(Maron & Goldreich 2001; Müller et al. 2003; Mason et al.
2007; Perez & Boldyrev 2008, 2009; Beresnyak & Lazarian
2008b) and defended on theoretical grounds in the recent
modifications of the GS theory by Boldyrev (2006) and by
Gogoberidze (2007) (see footnote 10). This discrepancy be-
tween observations and simulations remains an unresolved
theoretical issue. It is probably best addressed by numeri-
cal modeling of the RMHD equations (§ 2.2) and by a de-
tailed comparison of the structure of the Alfvénic fluctuations
in such simulations and in the solar wind.
8.1.3. Anisotropy
Building up evidence for anisotropy of turbulent fluctua-
tions has progressed from merely detecting their elongation
along the magnetic field (Belcher & Davis 1971)—to fitting
data to an ad hoc model mixing a 2D perpendicular and a
1D parallel (“slab”) turbulent components in some propor-
tion37 (Matthaeus et al. 1990; Bieber et al. 1996; Dasso et al.
2005; Hamilton et al. 2008)—to formal systematic unbiased
analyses showing the persistent presence of anisotropy at all
scales (Bigazzi et al. 2006; Sorriso-Valvo et al. 2006)— to di-
rect measurements of three-dimensional correlation functions
(Osman & Horbury 2007)—and finally to computing spectral
exponents at fixed angles between k and B0 (Horbury et al.
2008). The latter authors appear to have achieved the first
direct quantitative confirmation of the GS theory by demon-
strating that the magnetic-energy spectrum scales as k
wavenumbers perpendicular to the mean field and as k−2
wavenumbers parallel to it [consistent with the first scaling
relation in Eq. (4)]. This is the closest that observations have
got to confirming the GS relation k‖ ∼ k
⊥ [see Eq. (5)] in a
real astrophysical turbulent plasma.
8.1.4. Compressive Fluctuations
According to the theory developed in § 5, the density and
magnetic-field-strength fluctuations are passive, energetically
decoupled from and mixed by the Alfvénic cascade (§ 5.5;
these are slow and entropy modes in the collisional MHD
limit—see § 2.4 and § 6.1). These fluctuations are expected to
be pressure-balanced, as expressed by Eq. (22) or, more gen-
erally in gyrokinetics, by Eq. (67). There is, indeed, strong
37 These techniques originate from the view of MHD turbulence as a su-
perposition of a 2D turbulence and an admixture of Alfvén waves (Fyfe et al.
1977; Montgomery & Turner 1981). As we discussed in § 1.2, we consider
the Goldreich & Sridhar (1995, 1997) view of a critically balanced Alfvénic
cascade to be better physically justified.
42 SCHEKOCHIHIN ET AL.
evidence that magnetic and thermal pressures in the solar
wind are anticorrelated, although there are some indications
of the presence of compressive, fast-wave-like fluctuations as
well (Roberts 1990; Burlaga et al. 1990; Marsch & Tu 1993;
Bavassano et al. 2004).
Measurements of density and field-strength fluctua-
tions done by a variety of different methods both at
1 AU (Celnikier et al. 1983, 1987; Marsch & Tu 1990b;
Bershadskii & Sreenivasan 2004; Hnat et al. 2005;
Kellogg & Horbury 2005; Alexandrova et al. 2008a) and
near the Sun (Lovelace et al. 1970; Woo & Armstrong 1979;
Coles & Harmon 1989; Coles et al. 1991) show fluctuation
levels of order 10% and spectra that appear to have a k−5/3
scaling above scales of order 102 − 103 km, which approxi-
mately corresponds to the ion gyroscale. The Kolmogorov
value of the spectral exponent is, as in the case of Alfvénic
fluctuations, measured quite accurately in individual cases
(1.67 ± 0.03 in Celnikier et al. 1987). Interestingly, the
higher-order structure function exponents measured for the
magnetic-field strength show that it is a more intermittent
quantity than the velocity or the vector magnetic field (i.e.,
than the Alfvénic fluctuations) and that the scaling expo-
nents are quantitatively very close to the values found for
passive scalars in neutral fluids (Bershadskii & Sreenivasan
2004; Bruno et al. 2007). One might argue that this lends
some support to the theoretical expectation of passive
magnetic-field-strength fluctuations.
Considering that in the collisionless regime these fluctua-
tions are supposed to be subject to strong kinetic damping
(§ 6.2.2), the presence of well-developed Kolmogorov-like
and apparently undamped turbulent spectra is more surprising
than has perhaps been publicly acknowledged. An extended
discussion of this issue was given in § 6.3. Without the in-
clusion of the dissipation effects associated with the finite ion
gyroscale, the passive cascade of the density and field strength
is purely perpendicular to the (exact) local magnetic field and
does not lead to any scale refinement along the field. This im-
plies highly anisotropic field-aligned structures, whose length
is determined by the initial conditions (i.e., conditions in the
corona). The kinetic damping is inefficient for such fluctua-
tions. While this would seem to explain the presence of fully
fledged power-law spectra, it is not entirely obvious that the
parallel cascade is really absent once dissipation is taken into
account (Lithwick & Goldreich 2001), so the issue is not yet
settled. This said, we note that there is plenty of evidence of
a high degree of anisotropy and field alignment of the den-
sity microstructure in the inner solar wind and outer corona
(e.g., Armstrong et al. 1990; Grall et al. 1997; Woo & Habbal
1997). There is also evidence that the local structure of the
compressive fluctuations at 1 AU is correlated with the coro-
nal activity, implying some form of memory of initial condi-
tions (Kiyani et al. 2007; Hnat et al. 2007; Wicks et al. 2009).
We note, finally, that whether compressive fluctuations in
the inertial range can develop short parallel scales should also
tell us how much ion heating can result from their damping
(see § 6.2.4).
8.2. Dissipation-Range Turbulence in the Solar Wind and the
Magnetosheath
At scales approaching the ion gyroscale, k⊥ρi ∼ 1, effects
associated with the finite extent of ion gyroorbits start to
matter. Observationally, this transition manifests itself as a
clear break in the spectrum of magnetic fluctuations, with the
inertial-range k−5/3 scaling replaced by a steeper slope (see
Fig. 1). While the electrons at these scales can be treated as
an isothermal fluid (as long as we are considering fluctuations
above the electron gyroscale, k⊥ρe ≪ 1; see § 4), the fully
gyrokinetic description (§ 3) has to be adopted for the ions.
It is, indeed, to understand plasma dynamics at and around
k⊥ρi ∼ 1 that gyrokinetics was first designed in fusion plasma
theory (Frieman & Chen 1982; Brizard & Hahm 2007). In or-
der for gyrokinetics and further dissipation-range approxima-
tions that follow from it (§ 7) to be a credible approach in
the solar wind and other space plasmas, it has to be estab-
lished that fluctuations at and below the ion gyroscale are still
strongly anisotropic, k‖ ≪ k⊥. If that is the case, then their
frequencies (ω∼ k‖vAk⊥ρi, see § 7.3) will still be smaller than
the cyclotron frequency in at least a part of the “dissipation
range”38—the range of scales k⊥ρi & 1 (see § 7.13).
Note that additional information about the dissipation-
range turbulence can be extracted from the measurements in
the magnetosheath—while scales above the ion gyroscale are
probably non-universal there, the dissipation range appears to
display universal behavior, mostly similar to the solar wind
(see, e.g., Alexandrova 2008). This complements the obser-
vational picture emerging from the solar-wind data and al-
lows us to learn more as fluctuation amplitudes in the mag-
netosheath are larger and much smaller scales can be probed
than in the solar wind (Mangeney et al. 2006; Lacombe et al.
2006; Alexandrova et al. 2008b).
8.2.1. Anisotropy
We know with a fair degree of certainty that the fluctu-
ations that cascade down to the ion gyroscale from the in-
ertial range are strongly anisotropic (§ 8.1.3). While it ap-
pears likely that the anisotropy persists at k⊥ρi ∼ 1, it is ex-
tremely important to have a clear verdict on this assumption
from solar wind measurements. While Leamon et al. (1998)
and, more recently, Hamilton et al. (2008) did present some
evidence that magnetic fluctuations in the solar wind have a
degree of anisotropy below the ion gyroscale, no definitive
study similar to Horbury et al. (2008) or Bigazzi et al. (2006);
Sorriso-Valvo et al. (2006) exists as yet. In the magne-
tosheath, where the dissipation-range scales are easier to mea-
sure than in the solar wind, recent analysis by Sahraoui et al.
(2006); Alexandrova et al. (2008b) does show evidence of
strong anisotropy.
Besides confirming the presence of the anisotropy, it would
be interesting to study its scaling characteristics: e.g., check
the scaling prediction k‖ ∼ k
⊥ [Eq. (241); see also § 7.9.4
and § 7.10.3] in a similar fashion as the GS relation k‖ ∼ k
[Eq. (5)] was corroborated by Horbury et al. (2008).
In this paper, we have proceeded on the assumption that
the anisotropy, and, therefore, low frequencies (ω ≪ Ωi) do
characterize fluctuations in the dissipation range—or, at least,
that the low-frequency anisotropic fluctuations are a signifi-
cant energy cascade channel and can be considered decoupled
from any possible high-frequency dynamics.
8.2.2. Transition at the Ion Gyroscale: Collisionless Damping and
Heating
38 This term, customary in the space-physics literature, is somewhat of
a misnomer because, as we have seen in § 7, rich dissipationless turbulent
dynamics are present in this range alongside what is normally thought of as
dissipation.
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 43
If the fluctuations at the ion gyroscale have k‖ ≪ k⊥ and
ω ≪ Ωi (§ 8.2.1), they are not subject to the cyclotron res-
onance (ω − k‖v‖ = ±Ωi), but are subject to the Landau one
(ω = k‖v‖). Alfvénic fluctuations at the ion gyroscale are
no longer decoupled from the compressive fluctuations and
can be Landau-damped (§ 7.1). It seems plausible that it
is the inflow of energy from the Alfvénic cascade that ac-
counts for a pronounced local flattening of the spectrum of
density fluctuations in the solar wind observed just above
the ion gyroscale (Woo & Armstrong 1979; Celnikier et al.
1983, 1987; Coles & Harmon 1989; Marsch & Tu 1990b;
Coles et al. 1991; Kellogg & Horbury 2005).39
In energetic terms, Landau damping amounts to a redis-
tribution of generalized energy from electromagnetic fluctu-
ations to entropy fluctuations (§ 3.4, § 7.8). This gives rise
to the entropy cascade, ultimately transferring the Landau-
damped energy into ion heat (§ 3.5, § 7.9 and § 7.10). How-
ever, only part of the inertial-range cascade is so damped be-
cause an alternative, electron, cascade channel exists: the ki-
netic Alfvén waves (§§ 7.2-7.8). The energy transferred into
the KAW-like fluctuations can cascade to the electron gy-
roscale, where it is Landau damped on electrons, converting
first into the electron entropy cascade and then electron heat
(§ 7.12).
Thus, the transition at the ion gyroscale ultimately de-
cides in what proportion the turbulent energy arriving from
the inertial range is distributed between the ion and electron
heat. How the fraction of power going into either depends on
parameters—βi, Ti/Te, amplitudes, . . . —is a key unanswered
question both in space and astrophysical (see, e.g., § 8.5) plas-
mas. Gyrokinetics appears to be an ideal tool for addressing
this question both analytically and numerically (Howes et al.
2008b). Within the framework outlined in this paper, the min-
imal model appropriate for studying the transition at the ion
gyroscale is the system of equations for isothermal electrons
and gyrokinetic ions derived in § 4 (it is summarized in § 4.9).
8.2.3. Ion Gyroscale vs. Ion Inertial Scale
It is often assumed in the space physics literature that it is
at the ion inertial scale, di = ρi/
βi, rather than at the ion gy-
roscale ρi that the spectral break between the inertial and dis-
sipation range occurs. The distinction between di and ρi be-
comes noticeable when βi is significantly different from unity,
a relatively rare occurrence in the solar wind. While some at-
tempts to determine at which of these two scales a spectral
break between the inertial and dissipation ranges occurs have
produced claims that di is a more likely candidate (Smith et al.
2001), more comprehensive studies of the available data sets
conclude basically that it is hard to tell (Leamon et al. 2000;
Markovskii et al. 2008).
In the gyrokinetic approach advocated in this paper, the ion
inertial scale does not play a special role (see § 7.1). The only
parameter regime in which di does appear as a special scale
is Ti ≪ Te (“cold ions”), when the Hall MHD approximation
can be derived in a systematic way (see Appendix E). This,
however, is not the right limit for the solar wind or most other
astrophysical plasmas of interest because ions are rarely cold.
Hall MHD is discussed further in § 8.2.6 and Appendix E.
8.2.4. KAW Turbulence
39 Celnikier et al. (1987) proposed that the flattening might be a k−1 spec-
trum analogous to Batchelor’s spectrum of passive scalar variance in the
viscous-convective range. We think this analogy cannot apply because den-
sity is not passive at or below the ion gyroscale.
If gyrokinetics is valid at scales k⊥ρi & 1 (i.e., if k‖ ≪ k⊥,
ω ≪ Ωi and it is acceptable to at least model the equilibrium
distribution as a Maxwellian; see § 8.3), the electromagnetic
fluctuations below the ion gyroscale will be described by the
fluid approximation that we derived in § 7.2 and referred to
ERMHD. The wave solutions of this system of equations are
the kinetic Alfvén waves (§§ 7.3-7.4) and it is possible to ar-
gue for a GS-style critically balanced cascade of KAW-like
electromagnetic fluctuations (§ 7.5) between the ion and elec-
tron gyroscales (Landau damped on electrons at k⊥ρe ∼ 1; the
expression for the KAW damping rate in the gyrokinetic limit
is given in Howes et al. 2006; see also Fig. 8).
Individual KAW have, indeed, been detected in space plas-
mas (e.g., Grison et al. 2005). What about KAW turbulence?
How does one tell whether any particular spectral slope one is
measuring corresponds to the KAW cascade or fits some alter-
native scheme for the dissipation-range turbulence (§ 8.2.6)?
It appears to be a sensible program to look for specific rela-
tionships between different fields predicted by theory (§ 7.2)
and for the corresponding spectral slopes and scaling relations
for the anisotropy (§ 7.5). This means that simultaneous mea-
surements of magnetic, electric, density and magnetic-field-
strength fluctuations are needed.
For the solar wind, the spectra of electric and magnetic
fluctuations below the ion gyroscale reported by Bale et al.
(2005) are consistent with the k−1/3 and k−7/3 scalings pre-
dicted for an anisotropic critically balanced KAW cascade
(§ 7.5; see Fig. 1 for theoretical scaling fits superimposed
on a plot taken from Bale et al. 2005; note, however, that
Bale et al. 2005 themselves interpreted their data in a some-
what different way and that their resolution was in any case
not sufficient to be sure of the scalings). They were also able
to check that their fluctuations satisfied the KAW dispersion
relation—for critically balanced fluctuations, this is, indeed,
plausible. Magnetic-fluctuation spectra recently reported by
Alexandrova et al. (2008a) are only slightly steeper than the
theoretical k−7/3 KAW spectrum. These authors also find a
significant amount of magnetic-field-strength fluctuations in
the dissipation range, with a spectrum that follows the same
scaling—this is again consistent with the theoretical picture
of KAW turbulence [see Eq. (223)]. Measurements reported
by Czaykowska et al. (2001); Alexandrova et al. (2008b) for
the magnetosheath appear to present a similar picture.
The density spectra measured by Celnikier et al. (1983,
1987) steepen below the ion gyroscale following the flattened
segment around k⊥ρi ∼ 1 (discussed in § 8.2.2). For a KAW
cascade, the density spectrum should be k−7/3 (§ 7.5); with-
out KAW, k−10/3 (§ 7.10.2). The slope observed in the papers
cited above appears to be somewhat shallower even than k−2
(cf. a similar result by Spangler & Gwinn 1990 for the ISM;
see § 8.4.1), but, given imperfect resolution, neither seriously
in contradiction with the prediction based on the KAW cas-
cade, nor sufficient to corroborate it. Unfortunately, we have
not found published simultaneous measurements of density-
and magnetic- or electric-fluctuation spectra.
8.2.5. Variability of the Spectral Slope
While many measurements consistent with the KAW pic-
ture can be found, there are also many in which the spectra
are much steeper (Denskat et al. 1983; Leamon et al. 1998).
Analysis of a large set of measurements of the magnetic-
fluctuation spectra in the dissipation range of the solar wind
reveals a wide spread in the spectral indices: roughly between
44 SCHEKOCHIHIN ET AL.
−1 and −4 (Smith et al. 2006). There is evidence of a weak
positive correlation between steeper dissipation-range spectra
and higher ion temperatures (Leamon et al. 1998) or higher
cascade rates calculated from the inertial range (Smith et al.
2006). This suggests that a larger amount of ion heating may
correspond to a fully or partially suppressed KAW cascade,
which is in line with our view of the ion heating and the KAW
cascade as the two competing channels of the overall kinetic
cascade (§ 7.8). With a weakened KAW cascade, all or part of
the dissipation range would be dominated by the ion entropy
cascade—a purely kinetic phenomenon manifested by pre-
dominantly electrostatic fluctuations and very steep magnetic-
energy spectra (§ 7.10). This might account both for the steep-
ness of the observed spectra and for the spread in their indices
(§ 7.11), although many other theories exist (see § 8.2.6).
While we may thus have a plausible argument, this is not
yet a satisfactory quantitative theory that would allow us to
predict when the KAW cascade is present and when it is not or
what dissipation-range spectrum should be expected for given
values of the solar-wind parameters (βi, Ti/Te, etc.). Resolu-
tion of this issue again appears to hinge on the question of how
much turbulent power is diverted into the ion entropy cascade
(equivalently, into ion heat) at the ion gyroscale (see § 8.2.2).
8.2.6. Alternative Theories of the Dissipation Range
A number of alternative theories and models have been put
forward to explain the observed spectral slopes (and their vari-
ability) in the dissipation range. It is not our aim to review or
critique them all in detail, but perhaps it is useful to provide a
few brief comments about some of them in light of the theo-
retical framework constructed in this paper.
This entire theoretical framework hinges on adopting gy-
rokinetics as a valid description or, at least, a sensible model
that does not miss any significant channels of energy cascade
and dissipation. While we obviously believe this to be the
right approach, it is worth spelling out what effects are left
out “by construction.”
Parallel Alfvén-wave cascade and ion cyclotron damping. — The
use of gyrokinetics assumes that fluctuations stay anisotropic
at all scales, k‖ ≪ k⊥, and, therefore, ω ≪ Ωi, so the
cyclotron resonances are ordered out. However, if one
insists on routing the Alfvén-wave energy into a paral-
lel cascade, e.g., by forcibly setting k⊥ = 0, it is pos-
sible to construct a weak turbulence theory in which it
is dissipated by the ion cyclotron damping (Yoon & Fang
2008). Numerical simulations of 3D MHD turbulence do
not support the possibility of a parallel Alfvén-wave cascade
(Shebalin et al. 1983; Oughton et al. 1994; Cho & Vishniac
2000; Maron & Goldreich 2001; Cho et al. 2002; Müller et al.
2003). Solar-wind evidence that the perpendicular cascade
dominates is quite strong for the inertial range (§ 8.1.3) and
less so for the dissipation range (§ 8.2.1). While, as stated
in § 8.2.1, one cannot yet definitely claim that observations
tell us that ω ≪ Ωi at k⊥ρi ∼ 1, it has been argued that
observations do not appear to be consistent with cyclotron
damping being the main mechanism for the dissipation of
the inertial-range Alfvénic turbulence at the ion gyroscale
(Leamon et al. 1998, 2000; Smith et al. 2001). Ion-cyclotron
resonance could conceivably be reached somewhere in the
dissipation range (see § 7.13). At this point gyrokinetics will
formally break down, although, as argued by Howes et al.
(2008a, see their § 3.6), this does not necessarily mean that
ion cyclotron damping will become the dominant dissipation
channel for the turbulence.
Parallel whistler cascade. — A parallel magnetosonic/whistler
cascade eventually damped by the electron cyclotron
resonance (Stawicki et al. 2001) is also excluded in the
construction of gyrokinetics. The whistler cascade has
been given some consideration in the Hall MHD approxi-
mation (further discussed at the end of this section). Both
weak-turbulence theory (Galtier 2006) and 3D numerical
simulations (Cho & Lazarian 2004) concluded that, like
in MHD, the turbulent cascade is highly anisotropic, with
perpendicular energy transfer dominating over the parallel
one.40 The same conclusion appears to have been reached
in recent 2D kinetic PIC simulations by Gary et al. (2008);
Saito et al. (2008). Thus, the turbulence again seems to be
driven into the gyrokinetically accessible regime.
While theory and numerical simulations appear to make
arguing in favor of a parallel cascade and cyclotron heat-
ing difficult, there exists some observational evidence in sup-
port of them, especially for the near-Sun solar wind (e.g.,
Harmon & Coles 2005). Thus, the presence or relative im-
portance of the cyclotron heating in the solar wind and, more
generally, the mechanism(s) responsible for the observed per-
pendicular ion heating (Marsch et al. 1983) remain a largely
open problem. Besides the theories mentioned above, many
other ideas have been proposed, some of which attempted
to reconcile the dominance of the low-frequency perpendic-
ular cascade with the possibility of cyclotron heating (e.g.,
Chandran 2005b; Markovskii et al. 2006; see Hollweg 2008
for a concise recent review of the problem).
Mirror cascade. — Sahraoui et al. (2006) analyzed a set of
Cluster multi-spacecraft measurements in the magnetosheath
and reported a broad power-law (∼ k−8/3) spectrum of mirror
structures at and below the ion gyroscale. They claim that
these are not KAW-like fluctuations because their frequency
is zero in the plasma frame. Although these structures are
highly anisotropic with k‖ ≪ k⊥, they cannot be described by
the gyrokinetic theory in its present form because δB‖/B0 is
very large (∼ 40%, occasionally reaching unity) and because
the particle trapping by fluctuations, which is likely to be
important in the nonlinear physics of the mirror instabil-
ity (Kivelson & Southwood 1996; Pokhotelov et al. 2008;
Rincon et al. 2009), is ordered out in gyrokinetics. Thus, if a
“mirror cascade” exists, it is not captured in our description.
More generally, the effect of the pressure-anisotropy-driven
instabilities on the turbulence in the dissipation range is a
wide open area, requiring further analytical effort (see § 8.3).
If k‖ ≪ k⊥, ω ≪ Ωi, and δB/B0 ≪ 1 are accepted for the
dissipation range and plasma instabilities at the ion gyroscale
(§ 8.3) are ignored, the formal gyrokinetic theory and its
asymptotic consequences derived above should hold. There
are two essential features of the linear physics at and below
the ion gyroscale that must play some role: the collisionless
(Landau) damping and the dispersive nature of the wave so-
lutions (see Fig. 8 and § 7.3; cf., e.g., Leamon et al. 1999;
Stawicki et al. 2001). Both of these features have been em-
ployed to explain the spectral break at the ion gyroscale and
the spectral slopes below it.
40 It is possible to produce a parallel cascade artificially by running 1D
simulations (Matthaeus et al. 2008b).
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 45
Landau damping and instrumental effects. — In most of our dis-
cussion, (§ 7, §§ 8.2.4-8.2.5), we effectively assumed that the
Landau damping is only important at k⊥ρi ∼ 1 and k⊥ρe ∼ 1,
but not in between, so we could talk about asymptotic scal-
ings and dissipationless cascades. However, as was noted
in § 7.6, a properly asymptotic scaling behavior in the dis-
sipation range is probably impossible in nature because the
scale separation between the ion and electron gyroscales is
only about (mi/me)
1/2 ≃ 43. In particular, there is not always
a wide scale interval where the kinetic damping is negligi-
bly small (especially at low βi; see Fig. 8; cf. Leamon et al.
1999). Howes et al. (2008a) proposed a model of how the
presence of damping combined with instrumental effects (a
resolution floor) could lead to measured spectra that look like
power laws steeper than k−7/3, with the effective spectral ex-
ponent depending on plasma parameters (we refer the reader
to that paper for a discussion of how this compares with pre-
vious models of a similar kind, e.g., Li et al. 2001). A key
physical assumption of theirs and similar models is that the
amount of power drained from the Alfvén-wave and KAW
cascades into the ion heat is set by the strength of the linear
damping. Whether this is justified is not yet clear.
Hall and Electron MHD. — If Landau damping is deemed
unimportant in some part of the dissipation range (which
can be true in some regimes; see Fig. 8 and Howes et al.
2006, 2008a,b) and the wave dispersion is considered to
be the salient feature, it might appear that a fluid, rather
than kinetic, description should be sufficient. Hall MHD
(Mahajan & Yoshida 1998) or its kdi ≫ 1 limit the Electron
MHD (Kingsep et al. 1990) have been embraced by many au-
thors as such a description, suitable both for analytical argu-
ments (Goldreich & Reisenegger 1992; Krishan & Mahajan
2004; Gogoberidze 2005; Galtier & Bhattacharjee 2003;
Galtier 2006; Alexandrova et al. 2008a) and numerical sim-
ulations (Biskamp et al. 1996, 1999; Ghosh et al. 1996;
Ng et al. 2003; Cho & Lazarian 2004; Shaikh & Zank 2005;
Galtier & Buchlin 2007; Matthaeus et al. 2008b).
To what extent does this constitute an approach alterna-
tive to (and better than?) gyrokinetics (as suggested, e.g., by
Matthaeus et al. 2008b)? For fluctuations with k‖ ≪ k⊥, Hall
MHD is merely a particular limit of gyrokinetics: βi ≪ 1 and
Ti/Te ≪ 1 (cold-ion limit; see Appendix E). If k‖ is not small
compared to k⊥, then the gyrokinetics is not valid, while Hall
MHD continues to describe the cold-ion limit correctly (e.g.,
Ito et al. 2004; Hirose et al. 2004), capturing in particular the
whistler branch of the dispersion relation. However, as we
have already mentioned above, the dominance of the perpen-
dicular energy transfer (k‖ ≪ k⊥) is supported both by weak-
turbulence theory for Hall MHD (Galtier 2006) and by 3D
numerical simulations of the Electron MHD (Cho & Lazarian
2004).
Thus, the gyrokinetic theory and its rigorous limits, such
as ERMHD (§ 7.2), supersede Hall MHD for anisotropic tur-
bulence. Since ions are generally not cold in the solar wind
(or any other plasma discussed here), Hall MHD is not for-
mally a relevant approximation. It also entirely misses the
kinetic damping and the associated entropy cascade channel
leading to particle heating (§ 7.1, § 7.9 and § 7.10). However,
Hall MHD does capture the Alfvén waves becoming disper-
sive and numerical simulations of it do show a spectral break,
although, technically speaking, at the wrong scale (di instead
of ρi; see § 7.1). Although Hall MHD cannot be rigorously
used as quantitative theory of the spectral break and the asso-
ciated change in the nature of the turbulent cascade, the Hall
MHD equations in the limit kdi ≫ 1 are mathematically sim-
ilar to our ERMHD equations (see § 7.2 and Appendix E) to
within constant coefficients probably not essential for quali-
tative models of turbulence. Therefore, results of numerical
simulations of Hall and Electron MHD cited above are di-
rectly useful for understanding the KAW cascade—and, in-
deed, in the limit kdi ≫ 1, kde ≪ 1, they are mostly consistent
with the scaling arguments of § 7.5.
Alfvén vortices. — Finally we mention an argument pertaining
to the dissipation-range spectra that is not based on energy
cascades at all. Based on the evidence of Alfvén vortices in
the magnetosheath, Alexandrova (2008) speculated that steep
power-law spectra observed in the dissipation range at least
in some cases could reflect the geometry of the ion-gyroscale
structures rather than a local energy cascade. If Alfvén vor-
tices are a common feature, this possibility cannot be ex-
cluded. However, the resulting geometrical spectra are quite
steep (k−4 and steeper), so they can become important only
if the KAW cascade is weak or suppressed—somewhat simi-
larly to the steep spectra associated with the entropy cascade
(§ 7.11).
8.3. Is Equilibrium Distribution Isotropic and Maxwellian?
In rigorous theoretical terms, the weakest point of this pa-
per is the use of a Maxwellian equilibrium. Formally, this is
only justified when the collisions are weak but not too weak:
we ordered the collision frequency as similar to the fluctu-
ation frequency [Eq. (49)]. This degree of collisionality is
sufficient to prove that a Maxwellian equilibrium distribution
F0s(v) does indeed emerge in the lowest order of the gyroki-
netic expansion (Howes et al. 2006). This argument works
well for plasmas such as the ISM (§ 8.4), where collisions are
weak (λmfpi ≫ ρi) but non-negligible (λmfpi ≪ L). In space
plasmas, the mean free path is of the order of 1 AU—the dis-
tance between the Sun and the Earth (see Table 1). Strictly
speaking, in so highly collisionless a plasma, the equilib-
rium distribution does not have to be either Maxwellian or
isotropic.
The conservation of the first adiabatic invariant, µ = v2⊥/2B,
suggests that temperature anisotropy with respect to the
magnetic-field direction (T0⊥ 6= T0‖) may exist. When the
relative anisotropy is larger than (roughly) 1/βi, it triggers
several very fast growing plasma instabilities: most promi-
nently the firehose (T0⊥ < T0‖) and mirror (T0⊥ > T0‖) modes
(e.g., Gary et al. 1976). Their growth rates peak around the
ion gyroscale, thus giving rise to additional energy injection
at k⊥ρi ∼ 1.
No definitive analytical theory of how these fluctuations sat-
urate, cascade and affect the equilibrium distribution has been
proposed. It appears to be a reasonable expectation that the
fluctuations resulting from temperature anisotropy will satu-
rate by limiting this anisotropy. This idea has some support
in solar-wind observations: while the degree of anisotropy
of the core particle distribution functions varies consider-
ably between data sets, the observed anisotropies do seem
to populate the part of the parameter plane (T0⊥/T0‖,βi) cir-
cumscribed in a rather precise way by the marginal stabil-
ity boundaries for the mirror and firehose (Gary et al. 2001;
Kasper et al. 2002; Marsch et al. 2004; Hellinger et al. 2006;
Matteini et al. 2007).41
41 Note that Kellogg et al. (2006) measure the electric-field fluctuations
46 SCHEKOCHIHIN ET AL.
If we want to study turbulence in data sets that do not lie
too close to these stability boundaries, assuming an isotropic
Maxwellian equilibrium distribution [Eq. (54)] is probably
an acceptable simplification, although not an entirely rigor-
ous one. Further theoretical work is clearly possible on this
subject: thus, it is not a problem to formulate gyrokinetics
with an arbitrary equilibrium distribution (Frieman & Chen
1982) and starting from that, once can generalize the results
of this paper (for the KRMHD system, § 5, this has been done
by Chen et al. 2009). Treating the instabilities themselves
might prove more difficult, requiring the gyrokinetic order-
ing to be modified and the expansion carried to higher orders
to incorporate features that are not captured by gyrokinetics,
e.g., short parallel scales (Rosin et al. 2009), particle trap-
ping (Pokhotelov et al. 2008; Rincon et al. 2009), or nonlin-
ear finite-gyroradius effects (Califano et al. 2008). Note that
the theory of the dissipation-range turbulence will probably
need to be modified to account for the additional energy in-
jection from the instabilities and for the (yet unclear) way in
which this energy makes its way to dissipation and into heat.
Besides the anisotropies, the particle distribution functions
in the solar wind (especially the electron one) exhibit non-
Maxwellian suprathermal tails (see Maksimovic et al. 2005;
Marsch 2006, and references therein). These contain small
(∼ 5% of the total density) populations of energetic particles.
Both the origin of these particles and their effect on turbulence
have to be modeled kinetically. Again, it is possible to formu-
late gyrokinetics for general equilibrium distributions of this
kind and examine the interaction between them and the turbu-
lent fluctuations, but we leave such a theory outside the scope
of this paper.
Thus, much remains to be done to incorporate realistic equi-
librium distribution functions into the gyrokinetic description
of the solar wind plasma. In the meanwhile, we believe that
the gyrokinetic theory based on a Maxwellian equilibrium dis-
tribution as presented in this paper, while idealized and imper-
fect, is nevertheless a step forward in the analytical treatment
of the space-plasma turbulence compared to the fluid descrip-
tions that have prevailed thus far.
8.4. Interstellar Medium
While the solar wind is unmatched by other astrophysical
plasmas in the level of detail with which turbulence in it can
be measured, the interstellar medium (ISM) also offers an ob-
server a number of ways of diagnosing plasma turbulence,
which, in the case of the ISM, is thought to be primarily ex-
cited by supernova explosions (Norman & Ferrara 1996). The
accuracy and resolution of this analysis are due to improve
rapidly thanks to many new observatories, e.g., LOFAR,42
Planck (Enßlin et al. 2006), and, in more distant future, the
SKA (Lazio et al. 2004).
The ISM is a spatially inhomogeneous environment consist-
ing of several phases that have different temperatures, densi-
ties and degrees of ionization (Ferrière 2001).43 We will use
the Warm ISM phase (see Table 1) as our fiducial interstel-
lar plasma and discuss briefly what is known about the two
main observationally accessible quantities—the electron den-
sity and magnetic fields—and how this information fits into
in the ion-cyclotron frequency range, estimate the resulting velocity-space
diffusion and argue that it is sufficient to isotropize the ion distribution
42 http://www.lofar.org
43 And, therefore, different degrees of importance of the neutral particles
and the associated ambipolar damping effects—these will not be discussed
here; see Lithwick & Goldreich 2001.
the theoretical framework proposed here.
8.4.1. Electron Density Fluctuations
The electron-density fluctuations inferred from the inter-
stellar scintillation measurements appear to have a spectrum
with an exponent ≃ −1.7, consistent with the Kolmogorov
scaling (Armstrong et al. 1981, 1995; Lazio et al. 2004; see,
however, dissenting evidence by Smirnova et al. 2006, who
claim a spectral exponent closer to −1.5). This holds over
about 5 decades of scales: λ ∈ (105,1010) km. Other observa-
tional evidence at larger and smaller scales supports the case
for this presumed inertial range to be extended over as many
as 12 decades: λ ∈ (102,1015) km, a fine example of scale
separation that prompted an impressed astrophysicist to dub
the density scaling “The Great Power Law in the Sky.” The
upper cutoff here is consistent with the estimates of the su-
pernova scale of order 100 pc—presumably the outer scale of
the turbulence (Norman & Ferrara 1996) and also roughly the
scale height of the galactic disk (obviously the upper bound
on the validity of any homogeneous model of the ISM tur-
bulence). The lower cutoff is an estimate for the inner scale
below which the logarithmic slope of the density spectrum
steepens to about −2 (Spangler & Gwinn 1990).
Higdon (1984) was the first to realize that the electron-
density fluctuations in the ISM could be attributed to a cas-
cade of a passive tracer mixed by the ambient turbulence (the
MHD entropy mode; see § 2.6). This idea was brought to ma-
turity by Lithwick & Goldreich (2001), who studied the pas-
sive cascades of the slow and entropy modes in the frame-
work of the GS theory (see also Maron & Goldreich 2001).
If the turbulence is assumed anisotropic, as in the GS theory,
the passive nature of the density fluctuations with respect to
the decoupled Alfvén-wave cascade becomes a rigorous re-
sult both in MHD (§ 2.4) and, as we showed above, in the
more general gyrokinetic description appropriate for weakly
collisional plasmas (§ 5.5). Anisotropy of the electron-density
fluctuations in the ISM is, indeed, observationally supported
(Wilkinson et al. 1994; Trotter et al. 1998; Rickett et al. 2002;
Dennett-Thorpe & de Bruyn 2003; Heyer et al. 2008, see also
Lazio et al. 2004 for a concise discussion), although detailed
scale-by-scale measurements are not currently possible.
If the underlying Alfvén-wave turbulence in the ISM has
⊥ spectrum, as predicted by GS, so should the elec-
tron density (see § 2.6). As we discussed in § 6.3, the phys-
ical nature of the inner scale for the density fluctuations de-
pends on whether they have a cascade in k‖ and are effi-
ciently damped when k‖λmfpi ∼ 1 or fail to develop small
parallel scales and can, therefore, reach k⊥ρi ∼ 1. The ob-
servationally estimated inner scale is consistent with the ion
gyroscale, ρi ∼ 103 km (see Table 1; note that the ion iner-
tial scale di = ρi/
βi is similar to ρi at the moderate values
of βi characteristic of the ISM—see further discussion of the
(ir)relevance of di in § 7.1, § 8.2.3 and Appendix E). How-
ever, since the mean free path in the ISM is not huge (Ta-
ble 1), it is not possible to distinguish this from the perpen-
dicular cutoff k−1⊥ ∼ λ
mfpiL
−1/2 ∼ 500 km implied by the par-
allel cutoff at k‖λmfpi ∼ 1 [see Eq. (220)], as advocated by
Lithwick & Goldreich (2001). Note that the relatively short
mean free path means that much of the scale range spanned by
the Great Power Law in the Sky is, in fact, well described by
the MHD approximation either with adiabatic (§ 2) or isother-
mal (§ 6.1 and Appendix D) electrons.
Below the ion gyroscale, the −2 spectral exponent reported
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 47
by Spangler & Gwinn (1990) is measured sufficiently impre-
cisely to be consistent with the −7/3 expected for the density
fluctuations in the KAW cascade (§ 7.5). However, given the
high degree of uncertainty about what happens in this “dis-
sipation range” even in the much better resolved case of the
solar wind (§ 8.2), it would probably be wise to reserve judg-
ment until better data are available.
8.4.2. Magnetic Fluctuations
The second main observable type of turbulent fluctuations
in the ISM are the magnetic fluctuations, accessible indirectly
via the measurements of the Faraday rotation of the polar-
ization angle of the pulsar light travelling through the ISM.
The structure function of the rotation measure (RM) should
have the Kolmogorov slope of 2/3 if the magnetic fluctua-
tions are due to Alfvénic turbulence described by the GS the-
ory. There is a considerable uncertainty in interpreting the
available data, primarily due to insufficient spatial resolution
(rarely better than a few parsec). Structure function slopes
consistent with 2/3 have been reported (Minter & Spangler
1996), but, depending on where one looks, shallower struc-
ture functions that seem to steepen at scales of a few parsec
are also observed (Haverkorn et al. 2004).
A recent study by Haverkorn et al. (2005) detected an in-
teresting trend: the RM structure functions computed for re-
gions that lie in the galactic spiral arms are nearly perfectly
flat down to the resolution limit, while in the interarm regions,
they have detectable slopes (although these are mostly shal-
lower that 2/3). Observations of magnetic fields in external
galaxies also reveal a marked difference in the magnetic-field
structure between arms and interarms: the spatially regular
(mean) fields are stronger in the interarms, while in the arms,
the stochastic fields dominate (Beck 2007). This qualitative
difference between the magnetic-field structure in the arms
and interarms has been attributed to smaller effective outer
scale in the arms (∼ 1 pc, compared to ∼ 102 pc in the in-
terarms; see Haverkorn et al. 2008) or to the turbulence in the
arms and interarms belonging to the two distinct asymptotic
regimes described in § 1.3: closer to the anisotropic Alfvénic
turbulence with a strong mean field in the interarms and to the
isotropic saturated state of small-scale dynamo in the arms
(Schekochihin et al. 2007).
8.5. Accretion Disks
Accretion of plasma onto a central black hole or neutron
star is responsible for many of the most energetic phenomena
observed in astrophysics (see, e.g., Narayan & Quataert 2005
for a review). It is now believed that a linear instability of dif-
ferentially rotating plasmas—the magnetorotational instabil-
ity (MRI)—amplifies magnetic fields and gives rise to MHD
turbulence in astrophysical disks (Balbus & Hawley 1998).
Magnetic stresses due to this turbulence transport angular mo-
mentum, allowing plasma to accrete. The MRI converts the
gravitational potential energy of the inflowing plasma into
turbulence at the outer scale that is comparable to the scale
height of the disk. This energy is then cascaded to small
scales and dissipated into heat—powering the radiation that
we see from accretion flows. Fluid MHD simulations show
that the MRI-generated turbulence in disks is subsonic and
has β ∼ 10 − 100. Thus, on scales much smaller than the scale
height of the disk, homogeneous turbulence in the parameter
regimes considered in this paper is a valid idealization and
the kinetic models developed above should represent a step
forward compared to the purely fluid approach.
Turbulence is not yet directly observable in disks, so mod-
els of turbulence are mostly used to produce testable predic-
tions of observable properties of disks such as their X-ray and
radio emission. One of the best observed cases is the (pre-
sumed) accretion flow onto the black hole coincident with the
radio source Sgr A∗ in the center of our Galaxy (see review
by Quataert 2003).
Depending on the rate of heating and cooling in the inflow-
ing plasma (which in turn depend on accretion rate and other
properties of the system under consideration), there are differ-
ent models that describe the physical properties of accretion
flows onto a central object. In one class of models, a geometri-
cally thin optically thick accretion disk (Shakura & Sunyaev
1973), the inflowing plasma is cold and dense and well de-
scribed as an MHD fluid. When applied to Sgr A∗, these
models produce a prediction for its total luminosity that is
several orders of magnitude larger than observed. Another
class of models, which appears to be more consistent with the
observed properties of Sgr A∗, is called radiatively inefficient
accretion flows (RIAFs; see Rees et al. 1982; Narayan & Yi
1995 and review by Quataert 2003 of the applications and ob-
servational constraints in Sgr A∗). In these models, the in-
flowing plasma near the black hole is believed to adopt a two-
temperature configuration, with the ions (Ti ∼ 1011 − 1012 K)
hotter than the electrons (Te ∼ 109 − 1011 K).44 The electron
and ion thermodynamics decouple because the densities are
so low that the temperature equalization time ∼ ν−1ie is longer
than the time for the plasma to flow into the black hole. Thus,
like the solar wind, RIAFs are macroscopically collisionless
plasmas (see Table 1 for plasma parameters in the Galactic
center; note that these parameters are so extreme that the gy-
rokinetic description, while probably better than the fluid one,
cannot be expected to be rigorously valid; at the very least, it
needs to be reformulated in a relativistic form). At the high
temperatures appropriate to RIAFs, electrons radiate energy
much more efficiently than the ions (by virtue of their much
smaller mass) and are, therefore, expected to contribute dom-
inantly to the observed emission, while the thermal energy of
the ions is swallowed by the black hole. Since the plasma
is collisionless, the electron heating by turbulence largely de-
termines the thermodynamics of the electrons and thus the
observable properties of RIAFs. The question of which frac-
tion of the turbulent energy goes into ion and which into elec-
tron heating is, therefore, crucial for understanding accretion
flows—and the answer to this question depends on the de-
tailed properties of the small-scale kinetic turbulence (e.g.,
Quataert & Gruzinov 1999; Sharma et al. 2007), as well as on
the linear properties of the collisionless MRI (Quataert et al.
2002; Sharma et al. 2003).
Since all of the turbulent power coming down the cascade
must be dissipated into either ion or electron heat, it is re-
ally the amount of generalized energy diverted at the ion gy-
roscale into the ion entropy cascade (§§ 7.8-7.9) that decides
how much energy is left to heat the electrons via the KAW
cascade (§§ 7.2-7.5, § 7.12). Again, as in the case of the solar
wind (§ 8.2.2 and § 8.2.5), the transition around the ion gy-
roscale from the Alfvénic turbulence at k⊥ρi ≪ 1 to the KAW
turbulence at k⊥ρi ≫ 1 emerges as a key unsolved problem.
8.6. Galaxy Clusters
44 It is partly with this application in mind that we carried the general
temperature ratio in our calculations; see footnote 17.
48 SCHEKOCHIHIN ET AL.
Galaxy clusters are the largest plasma objects in the Uni-
verse. Like the other examples discussed above, the intraclus-
ter plasma is in the weakly collisional regime (see Table 1).
Fluctuations of electron density, temperature and of magnetic
fields are measured in clusters by X-ray and radio observa-
tories, but the resolution is only just enough to claim that a
fairly broad scale range of fluctuations exists (Schuecker et al.
2004; Vogt & Enßlin 2005). No power-law scalings have yet
been established beyond reasonable doubt.
What fundamentally hampers quantitative modeling of tur-
bulence and related effects in clusters is that we do not have
a definite theory of the basic properties of the intracluster
medium: its (effective) viscosity, magnetic diffusivity or ther-
mal conductivity. In a weakly collisional and strongly mag-
netized plasma, all of these depend on the structure of the
magnetic field (Braginskii 1965), which is shaped by the tur-
bulence. If (or at scales where) a reasonable a priori assump-
tion can be made about the field structure, further analytical
progress is possible: thus, the theoretical models presented in
this paper assume that the magnetic field is a sum of a slowly
varying in space “mean field” and small low-frequency per-
turbations (δB ≪ B0).
In fact, since clusters do not have mean fields of any mag-
nitude that could be considered dynamically significant, but
do have stochastic fields, the outer-scale MHD turbulence in
clusters falls into the weak-mean-field category (see § 1.3).
The magnetic field should be highly filamentary, organized
in long folded direction-reversing structures. It is not cur-
rently known what determines the reversal scale.45 Obser-
vations, while tentatively confirming the existence of very
long filaments (Clarke & Enßlin 2006), suggest that the re-
versal scale is much larger than the ion gyroscale: thus, the
magnetic-energy spectrum for the Hydra A cluster core re-
ported by Vogt & Enßlin (2005) peaks at around 1 kpc, com-
pared to ρi ∼ 105 km. Below this scale, an Alfvén-wave cas-
cade should exist (as is, indeed, suggested by Vogt & Enßlin’s
spectrum being roughly consistent with k−5/3 at scales below
the peak). As these scales are collisionless (λmfpi ∼ 100 pc in
the cores and ∼ 10 kpc in the bulk of the clusters), it is to this
turbulence that the theory developed in this paper should be
applicable.
Another complication exists, similar to that discussed in
§ 8.3: pressure anisotropies could give rise to fast plasma
instabilities whose growth rate peaks just above the ion gy-
roscale. As was pointed out by Schekochihin et al. (2005),
these are, in fact, an inevitable consequence of any large-scale
fluid motions that change the strength of the magnetic field.
Although a number of interesting and plausible arguments
can be made about the way the instabilities might determine
the magnetic-field structure (Schekochihin & Cowley 2006;
Schekochihin et al. 2008a; Rosin et al. 2009; Rincon et al.
2009), it is not currently understood how the small-scale
fluctuations resulting from these instabilities coexist with the
Alfvénic cascade.
The uncertainties that result from this imperfect under-
standing of the nature of the intracluster medium are exempli-
fied by the problem of its thermal conductivity. The magnetic-
field reversal scale in clusters is certainly not larger than the
electron diffusion scale, (mi/me)
1/2λmfpi, which varies from a
45 See Schekochihin & Cowley (2006) for a detailed presentation of our
views on the interplay between turbulence, magnetic field and plasma ef-
fects in cluster; for further discussions and disagreements, see Enßlin & Vogt
(2006); Subramanian et al. (2006); Brunetti & Lazarian (2007).
few kpc in the cores to a few hundred kpc in the bulk. There-
fore, one would expect that the approximation of isothermal
electron fluid (§ 4) should certainly apply at all scales below
the reversal scale, where δB ≪ B0 presumably holds. Even
this, however, is not absolutely clear. One could imagine
the electrons being effectively adiabatic if (or in the regions
where) the plasma instabilities give rise to large fluctuations
of the magnetic field (δB/B0 ∼ 1) at the ion gyroscale re-
ducing the mean free path to λmfpi ∼ ρi (Schekochihin et al.
2008a; Rosin et al. 2009; Rincon et al. 2009). Such fluctu-
ations cannot be described by the gyrokinetics in its cur-
rent form. The current state of the observational evidence
does not allow one to exclude either of these possibilities.
Both isothermal (Fabian et al. 2006; Sanders & Fabian 2006)
and non-isothermal (Markevitch & Vikhlinin 2007) coherent
structures that appear to be shocks are observed. Disordered
fluctuations of temperature can also be detected, which allows
one to infer an upper limit for the scale at which the isothermal
approximation can start being valid: thus, Markevitch et al.
(2003) find temperature variations at all scales down to ∼
100 kpc, which is the statistical limit that defines the spa-
tial resolution of their temperature map. In none of these or
similar measurements is the magnetic field data available that
would make possible a pointwise comparison of the magnetic
and thermal structure.
Because of this lack of information about the state of the
magnetized plasma in clusters, theories of the intracluster
medium are not sufficiently constrained by observations, so
no one theory is in a position to prevail. This uncertain state
of affairs might be improved by analyzing the observationally
much better resolved case of the solar wind, which should be
quite similar to the intracluster medium at very small scales
(except for somewhat lower values of βi in the solar wind).
9. CONCLUSION
In this paper, we have considered magnetized plasma tur-
bulence in the astrophysically prevalent regime of weak col-
lisionality. We have shown how the energy injected at the
outer scale cascades in phase space, eventually to increase the
entropy of the system and heat the particles. In the process,
we have explained how one combines plasma physics tools—
in particular, the gyrokinetic theory—with the ideas of a tur-
bulent cascade of energy to arrive at a hierarchy of tractable
models of turbulence in various physically distinct scale in-
tervals. These models represent the branching pathways of a
generalized energy cascade in phase space (the “kinetic cas-
cade”; see Fig. 5) and make explicit the “fluid” and “kinetic”
aspects of plasma turbulence.
A detailed outline of these developments was given in the
Introduction. Intermediate technical summaries were pro-
vided in § 4.9, § 5.7, and § 7.14. An astrophysical summary
and discussion of the observational evidence was given in § 8,
with a particular emphasis on space plasmas (§§ 8.1-8.3). Our
view of how the transformation of the large-scale turbulent
energy into heat occurs was encapsulated in the concept of
a kinetic cascade of generalized energy. It was previewed in
§ 1.4 and developed quantitatively in §§ 3.4-3.5, § 4.7, § 5.6,
§§ 6.2.3-6.2.5, §§ 7.8-7.12, Appendices D.2 and E.2.
Following a series of analytical contributions that set
up a theoretical framework for astrophysical gyrokinetics
(Howes et al. 2006, 2008a; Schekochihin et al. 2007, 2008b,
and this paper), an extensive program of fluid, hybrid fluid-
kinetic, and fully gyrokinetic46 numerical simulations of mag-
netized plasma turbulence is now underway (for the first re-
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 49
sults of this program, see Howes et al. 2008b; Tatsuno et al.
2009a,b). Careful comparisons of the fully gyrokinetic
simulations with simulations based on the more readily
computable models derived in this paper (RMHD—§ 2,
isothermal electron fluid—§ 4, KRMHD—§ 5, ERMHD—
§ 7, HRMHD—Appendix E) as well as with the numerical
studies based on various Landau fluid (Snyder et al. 1997;
Goswami et al. 2005; Ramos 2005; Sharma et al. 2006, 2007;
Passot & Sulem 2007) and gyrofluid (Hammett et al. 1991;
Dorland & Hammett 1993; Snyder & Hammett 2001; Scott
2007) closures appear to be the way forward in developing a
comprehensive numerical model of the kinetic turbulent cas-
cade from the outer scale to the electron gyroscale. Of the
many astrophysical plasmas to which these results apply, the
solar wind and, perhaps, the magnetosheath, due to the high
quality of turbulence measurements possible in them, appear
to be the most suitable test beds for direct and detailed quan-
titative comparisons of the theory and simulation results with
observational evidence. The objective of all this work remains
a quantitative characterization of the scaling-range properties
(spectra, anisotropy, nature of fluctuations and their interac-
tions), the ion and electron heating, and the transport proper-
ties of the magnetized plasma turbulence.
We thank O. Alexandrova, S. Bale, J. Borovsky, T. Carter,
S. Chapman, C. Chen, E. Churazov, T. Enßlin, A. Fabian,
A. Finoguenov, A. Fletcher, M. Haverkorn, B. Hnat, T. Hor-
bury, K. Issautier, C. Lacombe, M. Markevitch, K. Osman,
T. Passot, F. Sahraoui, A. Shukurov, and A. Vikhlinin for
helpful discussions of experimental and observational data;
I. Abel, M. Barnes, D. Ernst, J. Hastie, P. Ricci, C. Roach,
and B. Rogers for discussions of collisions in gyrokinetics;
and G. Plunk for discussions of the theory of gyrokinetic tur-
bulence in two spatial dimensions. The authors’ travel was
supported by the US DOE Center for Multiscale Plasma Dy-
namics and by the Leverhulme Trust (UK) International Aca-
demic Network for Magnetized Plasma Turbulence. A.A.S.
was supported in part by a PPARC/STFC Advanced Fellow-
ship and by the STFC Grant ST/F002505/1. He also thanks
the UCLA Plasma Group for its hospitality on several occa-
sions. S.C.C. and W.D. thank the Kavli Institute for The-
oretical Physics and the Aspen Center for Physics for their
hospitality. G.W.H. was supported by the US DOE contract
DE-AC02-76CH03073. G.G.H. and T.T. were supported by
the US DOE Center for Multiscale Plasma Dynamics. E.Q.
and G.G.H. were supported in part by the David and Lucille
Packard Foundation.
46 Using the publicly available GS2 code (developed originally for fusion
applications; see http://gs2.sourceforge.net) and the purpose-built AstroGK
code (see http://www.physics.uiowa.edu/~ ghowes/astrogk/).
APPENDIX
A. BRAGINSKII’S TWO-FLUID EQUATIONS AND REDUCED MHD
Here we explain how the standard one-fluid MHD equations used in § 2 and the collisional limit of the KRMHD system
(§ 6.1, derived in Appendix D) both emerge as limiting cases of the two-fluid theory. For the case of anisotropic fluctuations,
k‖/k⊥ ≪ 1, all of this can, of course, be derived from gyrokinetics, but it is useful to provide a connection to the more well
known fluid description of collisional plasmas.
A.1. Two-Fluid Equations
The rigorous derivation of the fluid equations for a collisional plasma was done in the classic paper of Braginskii (1965). His
equations, valid for ω/νii ≪ 1, k‖λmfpi ≪ 1, k⊥ρi ≪ 1 (see Fig. 3), evolve the densities ns, mean velocities us and temperatures
Ts of each plasma species (s = i,e):
+ us ·∇
ns = −ns∇·us, (A1)
+ us ·∇
us = −∇ps −∇· Π̂s + qsns
us ×B
+ Fs, (A2)
+ us ·∇
Ts = −ps∇·us −∇·Γs − Π̂s : ∇us + Qs, (A3)
where ps = nsTs and the expressions for the viscous stress tensor Π̂s, the friction force Fs, the heat flux Γs and the interspecies heat
exchange Qs are given in Braginskii (1965). Equations (A1-A3) are complemented with the quasi-neutrality condition, ne = Zni,
and the Faraday and Ampère laws, which are (in the non-relativistic limit)
= −c∇×E, j = ene(ui − ue) =
∇×B. (A4)
Because of quasi-neutrality, we only need one of the continuity equations, say the ion one. We can also use the electron momen-
tum equation [Eq. (A2), s = e] to express E, which we then substitute into the ion momentum equation and the Faraday law. The
resulting system is
= −ρ∇·u, (A5)
−∇· Π̂+ B ·∇B
+ ue ·∇
ue, (A6)
u×B − j×B
c∇· Π̂e
+ ue ·∇
, (A7)
http://gs2.sourceforge.net
http://www.physics.uiowa.edu/~
50 SCHEKOCHIHIN ET AL.
where ρ = mini, u = ui, p = pi + pe, Π̂ = Π̂i + Π̂e, ue = u − j/ene, ne = Zni, d/dt = ∂/∂t + u ·∇. The ion and electron temperatures
continue to satisfy Eq. (A3).
A.2. Strongly Magnetized Limit
In this form, the two-fluid theory starts resembling the standard one-fluid MHD, which was our starting point in § 2: Eqs. (A5-
A7) already look similar to the continuity, momentum and induction equations. The additional terms that appear in these equations
and the temperature equations (A3) are brought under control by considering how they depend on a number of dimensionless
parameters: ω/νii, k‖λmfpi, k⊥ρi, (me/mi)
1/2. While all these are small in Braginskii’s calculation, no assumption is made as to
how they compare to each other. We now specify that
k‖λmfpi√
, k⊥ρi ≪ k‖λmfpi ∼
≪ 1 (A8)
(see Fig. 4). Note that the first of these relations is equivalent to assuming that the fluctuation frequencies are Alfvénic—the same
assumption as in gyrokinetics [Eq. (49)]. The second relation in Eq. (A8) will be referred to by us as the strongly magnetized
limit. Under the assumptions (A8), the two-fluid equations reduce to the following closed set:47
= −ρ∇·u, (A10)
b̂b̂ : ∇u − 1
b̂b̂ρν‖i
b̂b̂ : ∇u − 1
B ·∇B
, (A11)
= B ·∇u − B∇·u, (A12)
Ti∇·u +
b̂ρκ‖ib̂ ·∇Ti
− νie (Ti − Te) +
miν‖i
b̂b̂ : ∇u − 1
, (A13)
Te∇·u +
b̂ρκ‖eb̂ ·∇Te
νie (Te − Ti) , (A14)
where ν‖i = 0.90vthiλmfpi is the parallel ion viscosity, κ‖i = 2.45vthiλmfpi parallel ion thermal diffusivity, κ‖e = 1.40vtheλmfpe ∼
Z2/τ 5/2
(mi/me)
1/2κ‖i parallel electron thermal diffusivity [here λmfpi = vthi/νii with νii defined in Eq. (52)], and νie ion–electron
collision rate [defined in Eq. (51)]. Note that the last term in Eq. (A13) represents the viscous heating of the ions.
A.3. One-Fluid Equations (MHD)
If we now restrict ourselves to the low-frequency regime where ion–electron collisions dominate over all other terms in the
ion-temperature equation (A13),
k‖λmfpi√
≪ 1 (A15)
[see Eqs. (A8) and (51)], we have, to lowest order in this new subsidiary expansion, Ti = Te = T . We can now write p = (ni +ne)T =
(1 + Z)ρT/mi and, adding Eqs. (A13) and (A14), find the equation for pressure:
p∇·u = ∇·
b̂neκ‖eb̂ ·∇T
miν‖i
b̂b̂ : ∇u − 1
, (A16)
where we have neglected the ion thermal diffusivity compared to the electron one, but kept the ion heating term to maintain
energy conservation. Equation (A16) together with Eqs. (A10-A12) constitutes the conventional one-fluid MHD system. With
the dissipative terms [which are small because of Eq. (A15)] neglected, this was the starting point for our fluid derivation of
RMHD in § 2.
Note that the electrons in this regime are adiabatic because the electron thermal diffusion is small
∼ k‖λmfpi
≪ 1, (A17)
47 The structure of the momentum equation (A11) is best understood by realizing that ρν‖i
b̂b̂ : ∇u −∇·u/3
= p⊥ − p‖ , the difference between the perpen-
dicular and parallel (ion) pressures. Since the total pressure is p = (2/3)p⊥ + (1/3)p‖ , Eq. (A11) can be written
p⊥ − p‖
B ·∇B
. (A9)
This is the general form of the momentum equation that is also valid for collisionless plasmas, when k⊥ρi ≪ 1 but k‖λmfpi is order unity or even large.
Equation (A9) together with the continuity equation (A11), the induction equation (A12) and a kinetic equation for the particle distribution function (from the
solution of which p⊥ and p‖ are determined) form the system known as Kinetic MHD (KMHD, see Kulsrud 1964, 1983). The collisional limit, k‖λmfpi ≪ 1, of
KMHD is again Eqs. (A10-A14).
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 51
provided Eq. (A15) holds and βi is order unity. If we take βi ≫ 1 instead, we can still satisfy Eq. (A15), so Ti = Te follows from
the ion temperature equation (A13) and the one-fluid equations emerge as an expansion in high βi. However, these equations now
describe two physical regimes: the adiabatic long-wavelength regime that satisfies Eq. (A17) and the shorter-wavelength regime
in which (me/mi)
βi ≪ k‖λmfpi ≪ (me/mi)1/2
βi, so the fluid is isothermal, T = T0 = const, p = [(1+Z)T0/mi]ρ = c2sρ [Eq. (9)
holds with γ = 1].
A.4. Two-Fluid Equations with Isothermal Electrons
Let us now consider the regime in which the coupling between the ion and electron temperatures is small and the electron
diffusion is large [the limit opposite to Eqs. (A15) and (A17)]:
k‖λmfpi√
∼ k‖λmfpi
≫ 1, (A18)
Then the electrons are isothermal, Te = T0e = const (with the usual assumption of stochastic field lines, so b̂ · ∇Te = 0 implies
∇Te = 0, as in § 4.4), while the ion temperature satisfies
Ti∇·u +
b̂ρκ‖ib̂ ·∇Ti
miν‖i
b̂b̂ : ∇u − 1
. (A19)
Equation (A19) together with Eqs. (A10-A12) and p = ρ(Ti + ZT0e)/mi are a closed system that describes an MHD-like fluid
of adiabatic ions and isothermal electrons. Applying the ordering of § 2.1 to these equations and carrying out an expansion in
k‖/k⊥ ≪ 1 entirely analogously to the way it was done in § 2, we arrive at the RMHD equations (17-18) for the Alfvén waves
and the following system for the compressive fluctuations (slow and entropy modes):
+ b̂ ·∇u‖ = 0, (A20)
− v2Ab̂ ·∇
= ν‖i b̂ ·∇
b̂ ·∇u‖ +
, (A21)
= κ‖ib̂ ·∇
b̂ ·∇
, (A22)
and the pressure balance
b̂ ·∇u‖ +
. (A23)
Recall that these equations, being the consequence of Braginskii’s two-fluid equations (§ A.1), are an expansion in k‖λmfpi ≪ 1
correct up to first order in this small parameter. Since the dissipative terms are small, we can replace (d/dt)δρ/ρ0 in the viscous
terms of Eqs. (A21) and (A23) by its value computed from Eqs. (A20), (A22) and (A23) in neglect of dissipation: (d/dt)δρ/ρ0 =
−b̂ · ∇u‖/(1 + c2s/v2A) [cf. Eq. (25)], where the speed of sound cs is defined by Eq. (166). Substituting this into Eqs. (A21)
and (A23), we recover the collisional limit of KRMHD derived in Appendix D, see Eqs. (D18-D20) and (D22).
B. COLLISIONS IN GYROKINETICS
The general collision operator that appears in Eq. (36) is (Landau 1936)
= 2π lnΛ
q2s q
fs′ (v
∂ fs(v)
fs(v)
∂ fs′ (v′)
, (B1)
where w = v − v′ and lnΛ is the Coulomb logarithm. We now take into account the expansion of the distribution function (54),
use the fact that the collision operator vanishes when it acts on a Maxwellian, and retain only first-order terms in the gyrokinetic
expansion. This gives us the general form of the collision term in Eq. (57): it is the ring-averaged linearized form of the Landau
collision operator (B1), (∂hs/∂t)c = 〈Cs[h]〉Rs , where
Cs[h] = 2π lnΛ
q2s q
F0s′(v
hs(v) − F0s(v)
hs′(v
. (B2)
Note that the velocity derivatives are taken at constant r, i.e., the gyrocenter distribution functions that appear in the integrand
should be understood as hs(v)≡ hs(t,r+v⊥× ẑ/Ωs,v⊥,v‖). The explicit form of the gyrokinetic collision operator can be derived
in k space as follows:
eik·Rhk
eik·rCs
e−ik·ρhk
eik·Rs
eik·ρs(v)Cs
e−ik·ρhk
, (B3)
52 SCHEKOCHIHIN ET AL.
where ρs(v) = −v⊥× ẑ/Ωs and Rs = r−ρs(v). Angle brackets with no subscript refer to averages over the gyroangle ϑ of quantities
that do not depend on spatial coordinates. Note that inside the operator Cs[. . .], h occurs both with index s and velocity v and
with index s′ and velocity v′ (over which summation/integration is done). In the latter case, ρ = ρs′(v′) = −v′⊥× ẑ/Ωs′ in the
exponential factor inside the operator.
Most of the properties of the collision operator that are used in the main body of this paper to order the collision terms
can be established in general, already on the basis of Eq. (B3) (§§ B.1-B.2). If the explicit form of the collision operator is
required, we could, in principle, perform the ring average on the linearized operator C [Eq. (B2)] and derive an explicit form of
(∂hs/∂t)c. In practice, in gyrokinetics, as in the rest of plasma physics, the full collision operator is only used when it is absolutely
unavoidable. In most problems of interest, further simplifications are possible: the same-species collisions are often modeled by
simpler operators that share the full collision operator’s conservation properties (§ B.3), while the interspecies collision operators
are expanded in the electron–ion mass ratio (§ B.4).
B.1. Velocity-Space Integral of the Gyrokinetic Collision Operator
Many of our calculations involve integrating the gyrokinetic equation (57) over the velocity space while keeping r constant.
Here we estimate the size of the integral of the collision term when k⊥ρs ≪ 1. Using Eq. (B3),
d3veik·r−ik·ρs(v)
eik·ρs(v)Cs
e−ik·ρhk
eik·r2π
dv⊥ v⊥
e−ik·ρs(v)
eik·ρs(v)Cs
e−ik·ρhk
eik·r
e−ik·ρs(v)
eik·ρs(v)Cs
e−ik·ρhk
eik·r
d3vJ0(as)e
ik·ρs(v)Cs
e−ik·ρhk
eik·r
1 − ik · v⊥× ẑ
k · v⊥× ẑ
+ . . .
e−ik·ρhk
. (B4)
Since the (linearized) collision operator Cs conserves particle number, the first term in the expansion vanishes. The operator
Cs = Css +Css′ is a sum of the same-species collision operator [the s′ = s part of the sum in Eq. (B2)] and the interspecies collision
operator (the s′ 6= s part). The former conserves total momentum of the particles of species s, so it gives no contribution to the
second term in the expansion in Eq. (B4). Therefore,
d3v〈〈Css[hs]〉Rs〉r ∼ νssk
s δns. (B5)
The interspecies collisions do contribute to the second term in Eq. (B4) due to momentum exchange with the species s′. This
contribution is readily inferred from the standard formula for the linearized friction force (see, e.g., Helander & Sigmar 2002):
d3vvCss′
e−ik·ρhk
S (v)e
−ik·ρs(v)hsk + ms′νs
S (v)e
−ik·ρs′ (v)hs′k
, (B6)
S (v) =
2πn0s′q
s′ lnΛ
(vths
vths′
vths′
erf ′
vths′
, (B7)
where erf(x) = (2/
dy exp(−y2) is the error function. From this, via a calculation of ring averages analogous to Eq. (B17),
we get
−ik ·
v⊥× ẑ
e−ik·ρhk
S (v)
ik ·ρs(v)e−ik·ρs(v)
hsk +
S (v)
ik ·ρs′(v)e−ik·ρs′ (v)
S (v)asJ1(as)hsk +
S (v)as′J1(as′)hs′k
∼ νss′k2⊥ρ2sδns + νs′sk2⊥ρ2s′δns′ . (B8)
For the ion–electron collisions (s = i, s′ = e), using Eqs. (45) and (51), we find that both terms are ∼ (me/mi)1/2νiik2⊥ρ2i δni.
Thus, besides an extra factor of k2⊥ρ
i , the ion–electron collisions are also subdominant by one order in the mass-ratio expansion
compared to the ion–ion collisions. The same estimate holds for the interspecies contributions to the third and fourth terms in
Eq. (B4). In a similar fashion, the integral of the electron–ion collision operator (s = e, s′ = i), is ∼ νeik2⊥ρ2eδne, which is the same
order as the integral of the electron–electron collisions.
The conclusion of this section is that, both for ion and for electron collisions, the velocity-space integral (at constant r) of the
gyrokinetic collision operator is higher order than the collision operator itself by two orders of k⊥ρs. This is the property that we
relied on in neglecting collision terms in Eqs. (104) and (137).
B.2. Ordering of Collision Terms in Eqs. (125) and (137)
In § 5, we claimed that the contribution to the ion–ion collision term due to the (Ze〈ϕ〉Ri/T0i)F0i part of the ion distribution
function [Eq. (124)] was one order of k⊥ρi smaller than the contributions from the rest of hi. This was used to order collision
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 53
terms in Eqs. (125) and (137). Indeed, from Eq. (B3),
Ze〈ϕ〉Ri
eik·Ri
eik·ρiCii
e−ik·ρi J0(ai)F0i
]〉 Zeϕk
eik·Ri
eik·ρiCii
1 − ik ·ρi −
(k ·ρi)2 −
+ · · ·
∼ νiik2⊥ρ2i
F0i. (B9)
This estimate holds because, as it is easy to ascertain using Eq. (B2), the operator Cii annihilates the first two terms in the
expansion and only acts non-trivially on an expression that is second order in k⊥ρi. With the aid of Eq. (47), the desired ordering
of the term (B9) in Eq. (125) follows. When Eq. (B9) is integrated over velocity space, the result picks up two extra orders in
k⊥ρi [a general effect of integrating the gyroaveraged collision operator over the velocity space; see Eq. (B4)]:
Ze〈ϕ〉Ri
∼ νiik4⊥ρ4i
, (B10)
so the resulting term in Eq. (137) is third order, as stated in § 5.3.
B.3. Model Pitch-Angle-Scattering Operator for Same-Species Collisions
A popular model operator for same-species collisions that conserves particle number, momentum, and energy is constructed
by taking the test-particle pitch-angle-scattering operator and correcting it with an additional term that ensures momentum con-
servation (Rosenbluth et al. 1972; see also Helander & Sigmar 2002):
CM[hs] = ν
D (v)
1 − ξ2
) ∂hs
1 − ξ2
2v ·U[hs]
v2ths
, U[hs] =
d3vvνssD (v)hs
d3v (v/vths)2νssD (v)F0s(v)
, (B11)
νssD (v) = νss
(vths
v2ths
erf ′
, νss =
2πn0sq
s lnΛ
, (B12)
where the velocity derivatives are at constant r. The gyrokinetic version of this operator is (cf. Catto & Tsang 1977;
Dimits & Cohen 1994)
〈CM[hs]〉Rs =
eik·RsνssD (v)
1 − ξ2
) ∂hsk
v2(1 + ξ2)
4v2ths
s hsk + 2
v⊥J1(as)U⊥[hsk] + v‖J0(as)U‖[hsk]
v2ths
, (B13)
U⊥[hsk] =
d3vv⊥J1(as)νssD (v)hsk(v⊥,v‖)
d3v (v/vths)2νssD (v)F0s(v)
, U‖[hsk] =
d3vv‖J0(as)ν
D (v)hsk(v⊥,v‖)
d3v (v/vths)2νssD (v)F0s(v)
where as = k⊥v⊥/Ωs. The velocity derivatives are now at constant Rs. The spatial diffusion term appearing in the ring-averaged
collision operator is physically due to the fact that a change in a particle’s velocity resulting from a collision can lead to a change
in the spatial position of its gyrocenter.
In order to derive Eq. (B13), we use Eq. (B3). Since, ρs(v) =
1 − ξ2 sinϑ+ ŷv
1 − ξ2 cosϑ
/Ωs, it is not hard to see
e−ik·ρs(v)hsk = e
−ik·ρs(v)
1 − ξ2
ik⊥ ·
v⊥× ẑ
e−ik·ρs(v)hsk = e
−ik·ρ(v)
ik⊥ ·v⊥
hsk. (B14)
Therefore,
eik·ρs(v)
1 − ξ2
e−ik·ρs(v)hsk
1 − ξ2
) ∂hsk
k2⊥hsk,
eik·ρs(v)
e−ik·ρs(v)hsk
1 − ξ2
k2⊥hsk.
(B15)
Combining these formulae, we obtain the first two terms in Eq. (B13). Now let us work out the U term:
eik·ρs(v)v ·
d3v′ v′νssD (v
′)e−ik·ρs(v
′)hsk
v′⊥,v
veik·ρs(v)
dv′⊥ v
dv′‖ν
v′e−ik·ρs(v
v′⊥,v
(B16)
Since
ve±ik·ρs(v)
= ẑv‖
e±ik·ρs(v)
v⊥e±ik·ρs(v)
, where
e±ik·ρs(v)
= J0(as) and
±ik·ρs(v)
= ẑ×
v⊥× ẑ
∓ik⊥ ·
v⊥× ẑ
= ±iΩsẑ×
∓ik⊥ ·
v⊥× ẑ
= ±i ẑ×k⊥
v⊥J1(as), (B17)
we obtain the third term in Eq. (B13).
54 SCHEKOCHIHIN ET AL.
It is useful to give the lowest-order form of the operator (B13) in the limit k⊥ρs ≪ 1:
〈CM[hs]〉Rs = ν
D (v)
1 − ξ2
) ∂hs
d3v′v′
νssD (v
′)hs(v
d3v′v′2νssD (v
′)F0s(v′)
+ O(k2⊥ρ
s ). (B18)
This is the operator that can be used in the right-hand side of Eq. (145) (as, e.g., is done in the calculation of collisional transport
terms in Appendix D.3).
In practical numerical computations of gyrokinetic turbulence, the pitch-angle scattering operator is not sufficient because the
distribution function develops small scales not only in ξ but also in v (M. Barnes, W. Dorland and T. Tatsuno 2006, unpublished).
This is, indeed, expected because the phase-space entropy cascade produces small scales in v⊥, rather than just in ξ (see § 7.9.1).
In order to provide a cut off in v, an energy-diffusion operator must be added to the pitch-angle-scattering operator derived above.
A numerically tractable model gyrokinetic energy-diffusion operator was proposed by Abel et al. (2008); Barnes et al. (2009).48
B.4. Electron–Ion Collision Operator
This operator can be expanded in me/mi and to the lowest order is (see, e.g., Helander & Sigmar 2002)
Cei[h] = ν
D (v)
1 − ξ2
) ∂he
1 − ξ2
2v ·ui
v2the
, νeiD (v) = νei
(vthe
. (B19)
The corrections to this form are O(me/mi). This is second order in the expansion of § 4 and, therefore, we need not keep these
corrections. The operator (B19) is mathematically similar to the model operator for the same-species collisions [Eq. (B13)]. The
gyrokinetic version of this operator is derived in the way analogous to the calculation in Appendix B.3. The result is
〈Cei[h]〉Re =
eik·ReνeiD (v)
1 − ξ2
) ∂hek
v2(1 + ξ2)
4v2the
v2the
J1(ae)
2v′2⊥
v2thi
hik +
2v‖J0(ae)u‖ki
v2the
. (B20)
At scales not too close to the electron gyroscale, namely, such that k⊥ρe ∼ (me/mi)1/2, the second and third terms are manifestly
second order in (me/mi)
1/2, so have to be neglected along with other O(me/mi) contributions to the electron–ion collisions.
49 The
remaining two terms are first order in the mass-ratio expansion: the first term vanishes for he = h
e [Eq. (101)], so its contribution
is first order; in the fourth term, we can use Eq. (87) to express u‖i in terms of quantities that are also first order. Keeping only
the first-order terms, the gyrokinetic electron–ion collision operator is
〈Cei[h]〉Re = ν
D (v)
1 − ξ2
) ∂h(1)e
2v‖u‖i
v2the
. (B21)
Note that the ion drag term is essential to represent the ion–electron friction correctly and, therefore, to capture the Ohmic
resistivity (which, however, is rarely more important for unfreezing flux than the electron inertia and the finiteness of the electron
gyroradius; see § 7.7).
C. A HEURISTIC DERIVATION OF THE ELECTRON EQUATIONS
Here we show how the equations (116-117) of § 4 and the ERMHD equations (226-227) of § 7 can be derived heuristically
from electron fluid dynamics and a number of physical assumptions, without the use of gyrokinetics (§ C.1). This derivation is
not rigorous. Its role is to provide an intuitive route to the isothermal electron fluid and ERMHD approximations.
C.1. Derivation of Eqs. (116-117)
We start with the following three equations:
= −c∇×E, ∂ne
+∇· (neue) = 0, E +
ue ×B
. (C1)
These are Faraday’s law, the electron continuity equation, and the generalized Ohm’s law, which is the electron momentum
equation with all electron inertia terms neglected (i.e., effectively, the lowest order in the expansion in the electron mass me). The
electron pressure is assumed to be scalar by fiat (this can be justified in certain limits: for example in the collisional limit, as in
Appendix A, or for the isothermal electron fluid approximation derived in § 4). The electron-pressure term in the right-hand side
of Ohm’s law is sometimes called the thermoelectric term. We now assume the same static uniform equilibrium, E0 = 0, B0 = B0ẑ,
that we have used throughout this paper and apply to Eqs. (C1) the fundamental ordering discussed in § 3.1.
48 The collision operator now used the GS2 and AstroGK codes (see footnote 46) is their energy-diffusion operator plus the pitch-angle-scattering opera-
tor (B13).
49 The third term in Eq. (B20) is, in fact, never important: at the electron scales, k⊥ρe ∼ 1, it is negligible because of the Bessel function in the velocity
integral (Abel et al. 2008).
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 55
First consider the projection of Ohm’s law onto the total magnetic field B, use the definition of E [Eq. (37)], and keep the
leading-order terms in the ǫ expansion:
E · b̂ = − 1
b̂ ·∇pe ⇒
+ b̂ ·∇ϕ = b̂ ·∇ δpe
. (C2)
This turns into Eq. (116) if we also assume isothermal electrons, δpe = T0eδne [see Eq. (103)].
With the aid of Ohm’s law, Faraday’s law turns into
= ∇× (ue ×B) = −ue ·∇B + B ·∇ue − B∇·ue. (C3)
Keeping the leading-order terms, we find, for the components of Eq. (C3) perpendicular and parallel to the mean field,
+ u⊥e ·∇⊥
= b̂ ·∇u⊥e,
+ u⊥e ·∇⊥
= b̂ ·∇u‖e. (C4)
In the last equation, we have used the electron continuity equation to write
∇·ue = −
+ u⊥e ·∇⊥
. (C5)
From Ohm’s law, we have, to lowest order,
u⊥e = −ẑ×
E⊥ +∇⊥
= ẑ×∇⊥
. (C6)
Using this expression in the second of the equations (C4) gives
− b̂ ·∇u‖e =
, (C7)
where d/dt is defined in the usual way [Eq. (122)]. Assuming isothermal electrons (δpe = T0eδne) annihilates the second term on
the right-hand side and turns the above equation into Eq. (117). As for the first of the equations (C4), the use of Eq. (C6) and
substitution of δB⊥ = −ẑ×∇⊥A‖ turns it into the previously derived Eq. (C2), whence follows Eq. (116).
Thus, we have shown that Eqs. (116-117) can be derived as a direct consequence of Faraday’s law, electron fluid dynamics
(electron continuity equation and the electron force balance, a. k. a. the generalized Ohm’s law), and the assumption of isothermal
electrons—all taken to the leading order in the gyrokinetic ordering given in § 3.1 (i.e., assuming strongly interacting anisotropic
fluctuations with k‖ ≪ k⊥).
We have just proved that Eqs. (116) and (117) are simply the perpendicular and parallel part, respectively, of Eq. (C3). The
latter equation means that the magnetic-field lines are frozen into the electron flow velocity ue, i.e., the flux is conserved, the
result formally proven in § 4.3 [see Eq. (99)].
C.2. Electron MHD and the Derivation of Eqs. (226-227)
One route to Eqs. (226-227), already explained in § 7.2, is to start with Eqs. (C2) and (C7) and assume Boltzmann electrons
and ions and the total pressure balance. Another approach, more standard in the literature on the Hall and Electron MHD, is to
start with Eq. (C3), which states that the magnetic field is frozen into the electron flow. The electron velocity can be written in
terms of the ion velocity and the current density, and the latter then related to the magnetic field via Ampère’s law:
ue = ui −
= ui −
4πene
∇×B. (C8)
To the leading order in ǫ, the perpendicular and parallel parts of Eq. (C3) are Eqs. (C4), respectively, where the perpendicular and
parallel electron velocities are [from Eq. (C8)]
u⊥e = u⊥i +
4πen0e
ẑ×∇⊥δB‖, u‖e = u‖i +
4πen0e
∇2⊥A‖. (C9)
The relative size of the two terms in each of these expressions is controlled by the size of k⊥di, where di = ρi/
βi is the ion
inertial scale. When k⊥di ≫ 1, we may set ui = 0. Note, however, that the ion motion is not totally neglected: indeed, in the
second of the equations (C4), the δne/ne terms comes, via Eq. (C5), from the divergence of the ion velocity [from Eq. (C8),
∇·ui = ∇·ue]. To complete the derivation, we relate δne to δB‖ via the assumption of total pressure balance, as explained in
§ 7.2, giving us Eq. (225). Substituting this equation and Eqs. (C9) into Eqs. (C4), we obtain
= v2Adi b̂ ·∇
1 + 2/βi(1 + Z/τ )
b̂ ·∇∇2⊥Ψ, (C10)
where Ψ = −A‖/
4πmin0i. Equations (C10) evolve the perturbed magnetic field. These equations become the ERMHD equations
(226-227) if δB‖/B0 is expressed in terms of the scalar potential via Eq. (223).
56 SCHEKOCHIHIN ET AL.
Note that there are two special limits in which the assumption of immobile ions suffices to derive Eqs. (C10) from Eq. (C3)
without the need for the pressure balance: βi ≫ 1 (incompressible ions) or τ = T0i/T0e ≪ 1 (cold ions) but βe = βiZ/τ ≫ 1. In both
cases, Eq. (225) shows that δne/n0e ≪ δB‖/B0, so the density perturbation can be ignored and the coefficient of the right-hand
side of the second of the equations (C10) is equal to 1. The limit of cold ions is discussed further in Appendix E.
D. FLUID LIMIT OF THE KINETIC RMHD
Taking the fluid (collisional) limit of the KRMHD system (summarized in § 5.7) means carrying out another subsidiary
expansion—this time in k‖λmfpi ≪ 1. The expansion only affects the equations for the density and magnetic-field-strength
fluctuations (§ 5.5) because the Alfvén waves are indifferent to collisional effects.
The calculation presented below follows a standard perturbation algorithm used in the kinetic theory of gases and in plasma
physics to derive fluid equations with collisional transport coefficients (Chapman & Cowling 1970). For magnetized plasma,
this calculation was carried out in full generality by Braginskii (1965), whose starting point was the full plasma kinetic theory
[Eqs. (36-39)]. While what we do below is, strictly speaking, merely a particular case of his calculation (see Appendix A), it has
the advantage of relative simplicity and also serves to show how the fluid limit is recovered from the gyrokinetic formalism—a
demonstration that we believe to be of value.
It will be convenient to use the KRMHD system written in terms of the function δ f̃i = g + (v2⊥/v
thi)(δB‖/B0)F0i, which is the
perturbation of the local Maxwellian in the frame of the Alfvén waves [Eqs. (150-152)]. We want to expand Eq. (150) in powers
of k‖λmfpi, so we let δ f̃i = δ f̃
i + δ f̃
i + . . ., δB‖ = δB
+ δB(1)
+ . . ., etc.
D.1. Zeroth Order: Ideal Fluid Equations
Since [see Eq. (49)]
k‖λmfpi√
k‖vthi
∼ k‖λmfpi, (D1)
to zeroth order Eq. (150) becomes
δ f̃ (0)i
= 0. The zero mode of the collision operator is a Maxwellian. Therefore, we
may write the full ion distribution function up to zeroth order in k‖λmfpi as follows [see Eq. (144)]
2πTi/mi
mi[(v⊥ − uE)2 + (v‖ − u‖)2]
, (D2)
where ni = n0i + δni and Ti = T0i + δTi include both the unperturbed quantities and their perturbations. The E×B drift velocity uE
comes from the Alfvén waves (see § 5.4) and does not concern us here. Since the perturbations δni, u‖ and δTi are small in the
original gyrokinetic expansion, Eq. (D2) is equivalent to
δ f̃ (0)i =
δn(0)e
v2thi
δT (0)i
v2thi
F0i, (D3)
where we have used quasi-neutrality to replace δni/n0i = δne/n0e. This automatically satisfies Eq. (151), while Eq. (152) gives us
an expression for the ion-temperature perturbation:
δT (0)i
δn(0)e
δB(0)
. (D4)
Note that this is consistent with the interpretation of the perpendicular Ampère’s law [Eq. (63), which is the progenitor of
Eq. (152)] as the pressure balance [see Eq. (67)]: indeed, recalling that the electron pressure perturbation is δpe = T0eδne
[Eq. (103)], we have
= −δpe − δpi = −δneT0e − δniT0i − n0iδTi, (D5)
whence follows Eq. (D4) by way of quasi-neutrality (Zni = ne) and the definitions of Z, τ , βi [Eqs. (40-42)].
Since the collision operator conserves particle number, momentum and energy, we can obtain evolution equations for δn(0)e /n0e,
and δB(0)
/B0 by multiplying Eq. (150) by 1, v‖, v
2/v2thi, respectively, and integrating over the velocity space. The three
moments that emerge this way are
d3vδ f̃ (0)i =
δn(0)e
d3vv‖δ f̃
i = u
v2thi
δ f̃ (0)i =
δn(0)e
δT (0)i
. (D6)
The three evolution equations for these moments are
δn(0)e
δB(0)
+ b̂ ·∇u(0)
= 0, (D7)
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 57
du(0)
− v2A b̂ ·∇
δB(0)
= 0, (D8)
δn(0)e
δT (0)i
δB(0)
b̂ ·∇u(0)
= 0. (D9)
These allow us to recover the fluid equations we derived in § 2.4: Eq. (D8) is the parallel component of the MHD momentum
equation (27); combining Eqs. (D7), (D9) and (D4), we obtain the continuity equation and the parallel component of the induction
equation—these are the same as Eqs. (25) and (26):
δn(0)e
1 + c2s/v
b̂ ·∇u(0)
δB(0)
1 + v2A/c2s
b̂ ·∇u(0)
, (D10)
where the sound speed cs is defined by Eq. (166). From Eqs. (D7) and (D9), we also find the analog of the entropy equation (23):
δT (0)i
δn(0)e
δs(0)
δs(0)
δT (0)i
δn(0)e
δn(0)e
δB(0)
. (D11)
This implies that the temperature changes due to compressional heating only.
D.2. Generalized Energy: Five RMHD Cascades Recovered
We now calculate the generalized energy by substituting δ f̃i from Eq. (D3) into Eq. (153) and using Eqs. (D4) and (D11):
min0iu
min0iu
n0iT0i
1 + Z/τ
5/3 + Z/τ
=W +AW +W
AW +W
sw +W
n0iT0i
1 + Z/τ
5/3 + Z/τ
Ws. (D12)
The first two terms are the Alfvén-wave energy [Eq. (154)]. The following two terms are the slow-wave energy, which splits into
the independently cascaded energies of “+” and “−” waves (see § 2.5):
WSW = W
sw +W
min0i
|z+‖|
2 + |z−‖|
. (D13)
The last term is the total variance of the entropy mode. Thus, we have recovered the five cascades of the RMHD system (§ 2.7;
Fig. 5 maps out the fate of these cascades at kinetic scales).
D.3. First Order: Collisional Transport
Now let us compute the collisional transport terms for the equations derived above. In order to do this, we have to determine
the first-order perturbed distribution function δ f̃ (1)i , which satisfies [see Eq. (150)]
δ f̃ (1)i
δ f̃ (0)i −
v2thi
δB(0)
+ v‖ b̂ ·∇
δ f̃ (0)i +
δn(0)e
. (D14)
We now use Eq. (D3) to substitute for δ f̃ (0)i and Eqs. (D10-D11) and (D8) to compute the time derivatives. Equation (D14)
becomes
δ f̃ (1)i
1 − 3ξ2
v2thi
2/3 + c2s/v
1 + c2s/v
b̂ ·∇u(0)
v2thi
b̂ ·∇δT
F0i(v), (D15)
where ξ = v‖/v. Note that the right-hand side gives zero when multiplied by 1, v‖ or v
2 and integrated over the velocity space, as
it must do because the collision operator in the left-hand side conserves particle number, momentum and energy.
Solving Eq. (D15) requires inverting the collision operator. While this can be done for the general Landau collision operator
(see Braginskii 1965), for our purposes, it is sufficient to use the model operator given in Appendix B.3, Eq. (B18). This simplifies
calculations at the expense of an order-one inaccuracy in the numerical values of the transport coefficients. As the exact value of
these coefficients will never be crucial for us, this is an acceptable loss of precision. Inverting the collision operator in Eq. (D15)
then gives
δ f̃ (1)i =
ν iiD(v)
1 − 3ξ2
v2thi
2/3 + c2s/v
1 + c2s/v
b̂ ·∇u(0)
v2thi
b̂ ·∇δT
F0i(v), (D16)
58 SCHEKOCHIHIN ET AL.
where ν iiD(v) is a collision frequency defined in Eq. (B12) and we have chosen the constants of integration in such a way that the
three conservation laws are respected:
d3vδ f̃ (1)i = 0,
d3vv‖δ f̃
i = 0,
d3vv2δ f̃ (1)i = 0. These relations mean that δn
e = 0,
= 0, δT (1)i = 0 and that, in view of Eq. (152), we have
δB(1)
2/3 + c2s/v
1 + c2s/v
ν‖ib̂ ·∇u‖, (D17)
where ν‖i is defined below [Eq. (D21)]. Equations (D16-D17) are now used to calculate the first-order corrections to the moment
equations (D7-D9). They become
+ b̂ ·∇u‖ = 0, (D18)
− v2Ab̂ ·∇
2/3 + c2s/v
1 + c2s/v
ν‖i b̂ ·∇
b̂ ·∇u‖
, (D19)
= κ‖ib̂ ·∇
b̂ ·∇δTi
, (D20)
where we have introduced the coefficients of parallel viscosity and parallel thermal diffusivity:
ν‖i =
ν iiD(v)v
F0i(v), κ‖i =
ν iiD(v)v
v2thi
F0i(v). (D21)
All perturbed quantities are now accurate up to first order in k‖λmfpi. Note that in Eq. (D19), we used Eq. (D17) to express
δB(0)
= δB‖ − δB
. We do the same in Eq. (D4) and obtain
2/3 + c2s/v
1 + c2s/v
ν‖ib̂ ·∇u‖
. (D22)
This equation completes the system (D18-D20), which allows us to determine δne, u‖, δTi and δB‖. In § 6.1, we use the equations
derived above, but absorb the prefactor (2/3 + c2s/v
A)/(1 + c
A) into the definition of ν‖i. The same system of equations can also
be derived from Braginskii’s two-fluid theory (Appendix A.4), from which we can borrow the quantitatively correct values of the
viscosity and ion thermal diffusivity: ν‖i = 0.90v
thi/νii, κ‖i = 2.45v
thi/νii, where νii is defined in Eq. (52).
E. HALL REDUCED MHD
The popular Hall MHD approximation consists in assuming that the magnetic field is frozen into the electron flow velocity
[Eq. (C3)]. The latter is calculated from the ion flow velocity and the current determined by Ampère’s law [Eq. (C8)]:
4πen0e
, (E1)
where the ion flow velocity ui satisfies the conventional MHD momentum equation (8). The Hall MHD is an appealing theoretical
model that appears to capture both the MHD behavior at long wavelengths (when ue ≃ ui) and some of the kinetic effects that
become important at small scales due to decoupling between the electron and ion flows (the appearance of dispersive waves)
without bringing in the full complexity of the kinetic theory. However, unlike the kinetic theory, it completely ignores the
collisionless damping effects and suggests that the key small-scale physical change is associated with the ion inertial scale
di = ρi/
βi (or, when βe ≪ 1, the ion sound scale ρs = ρi
Z/2τ ; see § E.3), rather than the ion gyroscale ρi. Is this an acceptable
model for plasma turbulence? Figure 8 illustrates the fact that at τ ∼ 1, the ion inertial scale does not play a special role linearly,
the MHD Alfvén wave becomes dispersive at the ion gyroscale, not at di, and that the collisionless damping cannot in general
be neglected. A detailed comparison of the Hall MHD linear dispersion relation with full hot plasma dispersion relation leads to
the conclusion that Hall MHD is only a valid approximation in the limit of cold ions, namely, τ = T0i/T0e ≪ 1 (Ito et al. 2004;
Hirose et al. 2004). In this Appendix, we show that a reduced (low-frequency, anisotropic) version of Hall MHD can, indeed, be
derived from gyrokinetics in the limit τ ≪ 1.50 This demonstrates that the Hall MHD model fits into the theoretical framework
proposed in this paper as a special limit. However, the parameter regime that gives rise to this special limit is not common in
space and astrophysical plasmas of interest.
E.1. Gyrokinetic Derivation of Hall Reduced MHD
Let us start with the equations of isothermal electron fluid, Eqs. (116-121), i.e., work within the assumptions that allowed us
to carry out the mass-ratio expansion (§ 4.8). In Eq. (120) (perpendicular Ampère’s law, or gyrokinetic pressure balance), taking
50 Note that, strictly speaking, our ordering of the collision frequency does not allow us to take this limit (see footnote 17), but this is a minor betrayal of rigor,
which does not, in fact, invalidate the results.
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 59
the limit τ ≪ 1 gives
eik·r
d3vJ0(ai)hik
, (E2)
where we have used Eq. (118) to express the hi integral and the expression for the electron beta βe = βiZ/τ . Note that the above
equation is simply the statement of a balance between the magnetic and electron thermal pressure (the ions are relatively cold,
so they have fallen out of the pressure balance). Using Eq. (E2) to express δne in terms of δB‖ in Eqs. (116) and (117) and also
substituting for u‖e from Eq. (119) [or, equivalently, Eq. (87)], we get
= vAb̂ ·∇
Φ+ vAdi
1 + 2/βe
b̂ ·∇
u‖i − di∇2⊥Ψ
, (E3)
where we have used our usual definitions of the stream and flux functions [Eq. (135)] and of the full derivatives [Eq. (160)].
These equations determine the evolution of the magnetic field, but we still need the ion gyrokinetic equation (121) to calculate
the ion motion (Φ = cϕ/B0 and u‖i) via Eqs. (118) and (88). There are two limits in which the ion kinetics can be reduced to
simple fluid models.
E.1.1. High-Ion-Beta Limit, βi ≫ 1
In this limit, k⊥ρi = k⊥di
βi ≫ 1 as long as k⊥di is not small. Then the ion motion can be neglected because it is averaged out
by the Bessel functions in Eqs. (118) and (88)—in the same way as in § 7.2. So we get Φ = (τ/Z)vAdiδB‖/B0 [using Eq. (E2);
this is the τ ≪ 1 limit of Eq. (223)] and u‖i = 0. Noting that βe = βiZ/τ ≫ 1 in this limit, we find that Eqs. (E3) reduce to
= v2Adi b̂ ·∇
= −di b̂ ·∇∇2⊥Ψ, (E4)
which is the τ ≪ 1 limit of our ERMHD equations (226-227) [or, equivalently, Eqs. (C10)].
E.1.2. Low-Ion-Beta Limit, βi ∼ τ ≪ 1 (the Hall Limit)
This limit is similar to the RMHD limit worked out in § 5: we take, for now, k⊥di ∼ 1 and βe ∼ 1 (in which subsidiary
expansions can be carried out later), and expand the ion gyrokinetics in k⊥ρi = k⊥di
βi ≪ 1. Note that ordering βe ∼ 1 means
that we have ordered βi ∼ τ ≪ 1. We now proceed analogously to the way we did in § 5: express the ion distribution in terms of
the g function defined by Eq. (124) and, using the relation (E2) between δB‖/B0 and δne/n0e, write Eqs. (125-127) as follows:
︸ ︷︷ ︸
︸ ︷︷ ︸
v⊥ ·A⊥
︸ ︷︷ ︸
− 〈Cii[g]〉Ri
︸ ︷︷ ︸
A‖,ϕ− 〈ϕ〉Ri
︸ ︷︷ ︸
+b̂ ·∇
︸ ︷︷ ︸
v⊥ ·A⊥
︸ ︷︷ ︸
F0i +
v⊥ ·A⊥
︸ ︷︷ ︸
,(E5)
Γ1(αi) +
︸ ︷︷ ︸
1 −Γ0(αi)
]Zeϕk
︸ ︷︷ ︸
d3vJ0(ai)gk
︸ ︷︷ ︸
, u‖ki
d3vv‖J0(ai)gk
︸ ︷︷ ︸
. (E6)
All terms in these equations can be ordered with respect to the small parameter
βi (an expansion subsidiary to the gyrokinetic
expansion in ǫ and the Hall expansion in τ ≪ 1). The lowest order to which they enter is indicated underneath each term. The
ordering we use is the same as in § 5.2, but now we count the powers of
βi and order formally k⊥di ∼ 1 and βe ∼ 1. It is easy
to check that this ordering can be summarized as follows
and that the ion and electron terms in Eqs. (E3) are comparable under this ordering, so their competition is retained (in fact, this
could be used as the underlying assumption behind the ordering). The fluctuation frequency continues to be ordered as the Alfvén
frequency, ω ∼ k‖vA. The collision terms are ordered via ω/νii ∼ k‖λmfpi/
βi and k‖λmfpi ∼ 1, although the latter assumption is
not essential for what follows, because collisions turn out to be negligible and it is fine to take k‖λmfpi ≫ 1 from the outset and
neglect them completely.
In Eqs. (E6), we use Eqs. (129) and (130) to write 1 −Γ0(αi) ≃ αi = k2⊥ρ2i /2 and Γ1(αi) ≃ 1. These equations imply that if we
expand g = g(−1) + g(0) + . . ., we must have
d3vg(−1) = 0, so the contribution to the right-hand side of the first of the equations
60 SCHEKOCHIHIN ET AL.
(E6) (the quasi-neutrality equation) comes from g(0), while the parallel ion flow is determined by g(−1). Retaining only the lowest
(minus first) order terms in Eq. (E5), we find the equation for g(−1), the v‖ moment of which gives an equation for u‖i:
∂g(−1)
{ϕ,g(−1)} = 2
v‖b̂ ·∇
F0i ⇒
= v2Ab̂ ·∇
. (E8)
Now integrating Eq. (E5) over the velocity space (at constant r), using the first of the equations (E6) to express the integral of
g(0), and retaining only the lowest (zeroth) order terms, we find
ρ2i ∇2⊥
+ b̂ ·∇u‖i = 0 ⇒
∇2⊥Φ = vAb̂ ·∇∇2⊥Ψ, (E9)
where we have used the second of the equations (E3) to express the time derivative of δB‖/B0.
Together with Eqs. (E3), Eqs. (E8) and (E9) form a closed system, which it is natural to call Hall Reduced MHD (HRMHD)
because these equations can be straightforwardly derived by applying the RMHD ordering (§ 2.1) to the MHD equations (8-10)
with the induction equation (10) replaced by Eq. (E1). Indeed, Eqs. (E8) and (E9) exactly coincide with Eqs. (27) and (18), which
are the parallel and perpendicular components of the MHD momentum equation (8) under the RMHD ordering; Eqs. (E3) should
be compared Eqs. (17) and (26) while noticing that, in the limit τ ≪ 1, the sound speed is cs = vA
βe/2 [see Eq. (166)]. The
incompressible case (Mahajan & Yoshida 1998) is recovered in the subsidiary limit βe ≫ 1 (i.e., 1 ≫ βi ≫ τ ).
E.2. Generalized Energy for Hall RMHD and the Passive Entropy Mode
To work out the generalized energy (§ 3.4) for the HRMHD regime, we start with the generalized energy for the isothermal
electron fluid [Eq. (109)] and use Eq. (E2) to express the density perturbation:
T0iδ f
, (E10)
where δB⊥ = ẑ×∇⊥Ψ. The perturbed ion distribution function can be written in the same form as it was done in § 5.4 [Eq. (143)]:
to lowest order in the
βi expansion (§ E.1.2),
δ f (−1)i =
2v⊥ ·u⊥
v2thi
F0i + g(−1) =
2v⊥ ·u⊥
v2thi
F0i +
2v‖u‖i
v2thi
F0i + g̃, (E11)
where u⊥ = ẑ×∇⊥Φ. The last equality above is achieved by noticing that, since g(−1) satisfies Eq. (E8), we may split it into a
perturbed Maxwellian with parallel velocity u‖i and the remainder: g
(−1) = 2v‖u‖iF0i/v
thi + g̃. Then g̃ is the homogeneous solution
of the leading-order kinetic equation [see Eq. (E8)]:
+{Φ, g̃} = 0,
d3v g̃ = 0. (E12)
Substituting Eq. (E11) into Eq. (E10) and keeping only the leading-order terms in the
βi expansion, we get
min0iu
min0iu
T0ig̃
. (E13)
The first four terms are the energy of the Alfvénic and slow-wave-polarized fluctuations [cf. Eq. (D12)]. Unlike in RMHD, these
are not decoupled in HRMHD, unless a further subsidiary long-wavelength limit is taken (see § E.4). It is easy to verify that the
sum of these four terms is indeed conserved by Eqs. (E3), (E8) and (E9). The last term in Eq. (E13) is an individually conserved
kinetic quantity. Its conservation reflects the fact that g̃ is decoupled from the wave dynamics and passively advected by the
Alfvénic velocities via Eq. (E12).51
The passive kinetic mode g̃ can be thought of as a kinetic version of the MHD entropy mode and, indeed, reduces to it if the
collision operator in Eq. (E5) is upgraded to the leading order by orderingω/νii ∼ 1 (i.e., by considering long parallel wavelengths,
k‖λmfpi ∼
βi). In such a collisional limit, g̃ has to be a perturbed Maxwellian with no density or velocity perturbation [because
d3vg̃ = 0, while the velocity perturbation is explicitly separated from g̃ in Eq. (E11)]. Therefore,
v2thi
F0i ⇒
T0ig̃
n0iT0i
δT 2i
T 20i
. (E14)
This is to be compared with the βi ∼ τ ≪ 1 limit of Eqs. (D11) and (D12). As we have established, in the
βi expansion,
δTi = δT
i , δni = δn
i , δB‖ = δB
, so to lowest order δs/s0 = δTi/T0i and Eq. (E14) describes the entropy mode in the Hall limit.
51 A similar splitting of the generalized energy cascade into a fluid-like cascade plus a passive cascade of a zero-density part of the distribution function occurs
in the Hasegawa–Mima regime, which is the electrostatic version of the Hall limit (Plunk et al. 2009).
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 61
E.3. Hall RMHD Dispersion Relation
Linearizing the Hall RMHD equations (E3), (E8) and (E9) (derived in § E.1.2 assuming the ordering βi ∼ τ ≪ 1), we obtain
the following dispersion relation:52
ω2 − k2‖v
1 + 2/βe
= ω2k2‖v
1 + 2/βe
. (E15)
When the coupling term on the right-hand side is negligible, k⊥di/
1 + 2/βe ≪ 1, we recover the MHD Alfvén wave, ω2 = k2‖v
and the MHD slow wave, ω2 = k2
v2A/(1 + v
s ) [Eq. (167)], where cs = vA
βe/2 in the limit τ ≪ 1 [Eq. (166)]. In the opposite
limit, we get the kinetic Alfvén wave, ω2 = k2
i /(1 + 2/βe) [same as Eq. (230) with τ ≪ 1].
The solution of the dispersion relation (E15) is
1 + 2/βe
 . (E16)
The corresponding eigenfunctions then satisfy53
Ψ = −
Φ+ vAdi
, u‖i = −
, Φ = −
Ψ. (E17)
Equation (E16) takes a particularly simple form in the subsidiary limits of high and low electron beta βe = βiZ/τ :
βe ≫ 1 : ω2 = k2‖v
 , βe ≪ 1 : ω2 = k2‖v
1 + k2⊥ρ
and ω2 =
1 + k2⊥ρ
, (E18)
where ρs = di
βe/2 = ρi
Z/2τ = cs/Ωi is called the ion sound scale. The Alfvén wave and the slow wave (known as the ion
acoustic wave in the limit of τ ≪ 1, βe ≪ 1) become dispersive at the ion inertial scale (k⊥di ∼ 1) when βe ≫ 1 and at the ion
sound scale (k⊥ρs ∼ 1) when βe ≪ 1.
E.4. Summary of Hall RMHD and the Role of the Ion Inertial and Ion Sound Scales
We have shown that in the limit of cold ions and low ion beta (βi ∼ τ ≪ 1, “the Hall limit”), gyrokinetic turbulence can be
described by five scalar functions: the stream and flux functions Φ and Ψ for the Alfvénic fluctuations, the parallel velocity and
magnetic-field perturbations u‖i and δB‖ for the slow-wave-polarized fluctuations, and g̃, the zero-density, zero-velocity part of
the ion distribution function, which is the kinetic version of the MHD entropy mode. The first four of these functions satisfy a
closed set of four fluid-like equations, derived in § E.1 and collected here:
= vAb̂ ·∇
Φ+ vAdi
1 + 2/βe
b̂ ·∇
u‖i − di∇2⊥Ψ
, (E19)
∇2⊥Φ = vAb̂ ·∇∇2⊥Ψ,
= v2Ab̂ ·∇
. (E20)
We call these equations the Hall Reduced Magnetohydrodynamics (HRMHD). To fully account for the generalized energy cas-
cade, one must append to the four HRMHD equations the fifth, kinetic equation (E12) for g̃, which is energetically decoupled
from HRMHD and slaved to the Alfvénic velocity fluctuations (§ E.2).
The equations given above are valid above the ion gyroscale, k⊥ρi ≪ 1. They contain a special scale, di/
1 + 2/βe, which
is the ion inertial scale di for βe ≫ 1 and the ion sound scale ρs = cs/Ωi for βe ≪ 1. As becomes clear from the linear theory
(§ E.3), the Alfvén and slow waves become dispersive at this scale. Nonlinearly, this scale marks the transition from the regime
in which the Alfvénic and slow-wave-polarized fluctuations are decoupled to the regime in which they are mixed. Namely, when
k⊥di/
1 + 2/βe ≪ 1, HRMHD turns into RMHD: Eqs. (E19) become Eqs. (17) and (26), while Eqs. (E20) remain unchanged
and identical to Eqs. (18) and (27); in the opposite limit, k⊥di/
1 + 2/βe ≫ 1, the ion motion decouples from the magnetic-field
evolution and Eqs. (E19) turn into the ERMHD equations (226-227).
Since we are considering the case βi ≪ 1, both di and ρs are much larger than the ion gyroscale ρi. In the opposite limit of
βi ≫ 1 (§ E.1.1), while di is the only scale that appears explicitly in Eqs. (E4), we have di ≪ ρi and the equations themselves
represent the dynamics at scales much smaller than the ion gyroscale, so the transition between the RMHD and ERMHD regimes
occurs at k⊥ρi ∼ 1. The same is true for βi ∼ 1, when di ∼ ρi. The ion sound scale ρs ≫ ρi does not play a special role when
52 The full gyrokinetic dispersion relation in a similar limit was worked out in Howes et al. (2006), Appendix D.2.1.
53 Note that wave packets with |k⊥| = k⊥ and satisfying Eq. (E17) with k‖vA/ω as a function of k⊥ given by Eq. (E16) are exact nonlinear solutions of
the HRMHD equations (E3) and (E8-E9). This can be shown via a calculation analogous to that in § 7.3 (for the incompressible Hall MHD, this was done by
Mahajan & Krishan 2005).
62 SCHEKOCHIHIN ET AL.
βi is not small: it is not hard to see that for k⊥ρs ∼ 1, the ion motion terms in Eqs. (E19) dominate and we simply recover the
inertial-range KRMHD model (§ 5) by expanding in k⊥ρi = k⊥ρs
2τ/Z ≪ 1.
Various theories of the dissipation-range turbulence based on Hall and Electron MHD are further discussed in § 8.2.6.
F. TWO-DIMENSIONAL INVARIANTS IN GYROKINETICS
Since gyrokinetics is in a sense a “quasi-two-dimensional” approximation, it is natural to inquire if this gives rise to additional
conservation properties (besides the conservation of the generalized energy discussed in § 3.4) and how they are broken by the
presence of parallel propagation terms. It is important to emphasize that, except in a few special cases, these invariants are only
invariants in 2D, so gyrokinetic turbulence in 2D and 3D has fundamentally different properties, despite its seemingly “quasi-2D”
nature. It is, therefore, generally not correct to think of the gyrokinetic turbulence (or its special case the MHD turbulence) as
essentially a 2D turbulence with an admixture of parallel-propagating waves (Fyfe et al. 1977; Montgomery & Turner 1981).
In this Appendix, we work out the 2D invariants. Without attempting to present a complete analysis of the 2D conservation
properties of gyrokinetics, we limit our discussion to showing how some more familiar fluid invariants (most notably, magnetic
helicity) emerge from the general 2D invariants in the appropriate asymptotic limits.
F.1. General 2D Invariants
In deriving the generalized energy invariant, we used the fact that
d3Rs hs{〈χ〉Rs ,hs} = 0, so Eq. (57) after multiplication
by T0shs/F0s and integration over space contains no contribution from the Poisson-bracket nonlinearity. Since we also have∫
d3Rs 〈χ〉Rs{〈χ〉Rs ,hs} = 0, multiplying Eq. (57) by qs〈χ〉Rs and integrating over space has a similar outcome. Subtracting the
latter integrated equation from the former and rearranging terms gives
qs〈χ〉Rs
= qsv‖
d3Rs 〈χ〉Rs
qs〈χ〉Rs
. (F1)
We see that in a purely 2D situation, when ∂/∂z = 0, we have an infinite family of invariants Is = Is(v⊥,v‖) whose conservation (for
each species and for every value of v⊥ and v‖!) is broken only by collisions. In 3D, the parallel particle streaming (propagation)
term in the gyrokinetic equation generally breaks these invariants, although special cases may arise in which the first term on the
right-hand side of Eq. (F1) vanishes and a genuine 3D invariant appears.
F.2. “A2
-Stuff”
Let apply the mass-ratio expansion (§ 4.1) to Eq. (F1) for electrons. Using the solution (101) for the electron distribution
function, we find
T0eF0e
v2the
d3rA‖
v2the
+ · · ·
= −ev‖
v2the
F0e −
∂h(1)e
d3rA‖
, (F2)
where we have kept terms to two leading orders in the expansion. To lowest order, the above equation reduces to
d3rA‖
. (F3)
This equation can also be obtained directly from Eq. (116) (multiply by A‖ and integrate). In 2D, it expresses a well known
conservation law of the “A2‖-stuff.” As this 2D invariant exists already on the level of the mass-ratio expansion of the electron
kinetics, with no assumptions about the ions, it is inherited both by the RMHD equations in the limit of k⊥ρi ≪ 1 (§ 5.3) and
by the ERMHD equations in the limit of k⊥ρi ≫ 1 (§ 7.2). In the former limit, δne/n0e on the right-hand side of Eq. (F3) is
negligible (under the ordering explained in § 5.2); in the latter limit, it is expressed in terms of ϕ via Eq. (221). The conservation
of “A2‖-stuff” is a uniquely 2D feature, broken by the parallel propagation term in 3D.
F.3. Magnetic Helicity in the Electron Fluid
If we now divide Eq. (F2) through by ev‖/c and integrate over velocities, we get, after some integrations by parts, another
relation that becomes a conservation law in 2D and that can also easily be derived directly from the equations of the isothermal
electron fluid (116-117):
d3rA‖
. (F4)
In the ERMHD limit k⊥ρi ≫ 1 (§ 7.2), we use Eqs. (221-223) to simplify the above equation and find that the integral on the
right-hand side vanishes and we get a genuine 3D conservation law:
d3rA‖δB‖ = 0. (F5)
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 63
This can also be derived directly from the ERMHD equations (226-227) [using Eq. (223)]. The conserved quantity is readily seen
to be the helicity of the perturbed magnetic field:
d3rA · δB =
∇⊥×A‖ẑ
+ A‖δB‖
A‖ẑ · (∇⊥×A⊥) + A‖δB‖
d3rA‖δB‖. (F6)
F.4. Magnetic Helicity in the RMHD Limit
Unlike in the case of ERMHD, the helicity of the perturbed magnetic field in RMHD is conserved only in 2D. This is because
the induction equation for the perturbed field has an inhomogeneous term associated with the mean field [Eq. (10) with B =
B0ẑ + δB] (this issue has been extensively discussed in the literature; see Matthaeus & Goldstein 1982; Stribling et al. 1994;
Berger 1997; Montgomery & Bates 1999; Brandenburg & Matthaeus 2004). Directly from the induction equation or from its
RMHD descendants Eqs. (17) and (26), we obtain [note the definitions (135)]
d3rA‖δB‖ =
1 + v2A/c2s
, (F7)
so helicity is conserved only if ∂/∂z = 0.
For completeness, let us now show that this 2D conservation law is a particular case of Eq. (F1) for ions. Let us consider the
inertial range (k⊥ρi ≪ 1). We substitute Eq. (124) into Eq. (F1) for ions and expand to two leading orders in k⊥ρi using the
ordering explained in § 5.2:
v‖〈A‖〉Ri
Z2e2v2
d3rA‖g + · · ·
Z2e2v2
d3rA‖
v2thi
+ Zev‖
d3rA‖
. (F8)
The lowest-order terms in the above equations (all proportional to v2‖F0i) simply reproduce the 2D conservation of “A
‖-stuff,”
given by Eq. (F3). We now subtract Eq. (F3) multiplied by (Zev‖/c)
2F0i/T0i from Eq. (F8). This leaves us with
d3rA‖g = c
+ v‖F0i
v2thi
d3rA‖
. (F9)
This equation is a general 2D conservation law of the KRMHD equations (see § 5.7) and can also be derived directly from them.
If we integrate it over velocities and use Eqs. (146) and (147), we simply recover Eq. (F4). However, since Eq. (F9) holds for
every value of v‖ and v⊥, it carries much more information than Eq. (F4).
To make connection to MHD, let us consider the fluid (collisional) limit of KRMHD worked out in Appendix D. The distribu-
tion function to lowest order in the k‖λmfpi ≪ 1 expansion is g = −(v2⊥/v2thi)δB‖/B0 +δ f̃
i , where δ f̃
i is the perturbed Maxwellian
given by Eq. (D3). We can substitute this expression into Eq. (F9). Since in this expansion the collision integral is applied to δ f̃ (1)i
and is the same order as the rest of the terms (see § D.3), conservation laws are best derived by taking 1, v‖, and v
2/v2thi moments
of Eq. (F9) so as to make the collision term vanish. In particular, multiplying Eq. (F9) by 1 + (2τ/3Z)v2/v2thi, integrating over
velocities and using Eqs. (D4) and (D6), we obtain the evolution equation for
d3rA‖δB‖, which coincides with Eq. (F7). Note
that, either proceeding in an analogous way, one can derive similar equations for
d3rA‖δne and
d3rA‖u‖—these are also 2D
invariants of the RMHD system, broken in 3D by the presence of the propagation terms. The same result can be derived directly
from the evolution equations (D8) and (D10).
F.5. Electrostatic Invariant
Interestingly, the existence of the general 2D invariants introduced in § F.1 alongside the generalized energy invariant given by
Eq. (73) means that one can construct a 2D invariant of gyrokinetics that does not involve any velocity-space quantities. In order
to do that, one must integrate Eq. (F1) over velocities, sum over species, and subtract Eq. (73) from the resulting equation (thus
removing the h2s integrals). The result is not particularly edifying in the general case, but it takes a simple form if one considers
electrostatic perturbations (δB = 0). In this case, χ = ϕ, and the manipulations described above lead to the following equation
d3v Is −W
q2s n0s
1−Γ0(αs)
|ϕk|2 =
d3rE‖ j‖ −
d3Rs 〈ϕ〉Rs
, (F10)
where E‖ = −∂ϕ/∂z, αs = k2⊥ρ
s/2 and Γ0 is defined by Eq. (129). In 2D, E‖ = 0 and the above equation expresses a conservation
law broken only by collisions. The complete derivation and analysis of 2D conservation properties of gyrokinetics in the electro-
static limit, including the invariant (F10), the electrostatic version of Eq. (F1), and their consequences for scalings and cascades,
was given by Plunk et al. (2009). Here we briefly consider a few relevant limits.
For k⊥ρi ≪ 1, we have Γ0(α) = 1 −αs + . . ., so the invariant given by Eq. (F10) is simply the kinetic energy of the E×B flows:
s(msn0s/2)
d3r |∇⊥Φ|2, where Φ = cϕ/B0. In the limit k⊥ρi ≫ 1, k⊥ρe ≪ 1, we have Y = −n0i
d3rZ2e2ϕ2/2T0i. In
64 SCHEKOCHIHIN ET AL.
the limit k⊥ρe ≫ 1, we have Y = −(1 + Z/τ )n0e
d3re2ϕ2/2T0e. Whereas we are not interested in electrostatic fluctuations in the
inertial range, electrostatic turbulence in the dissipation range was discussed in § 7.10 and § 7.12. The electrostatic 2D invariant
in the limits k⊥ρi ≫ 1, k⊥ρe ≪ 1 and k⊥ρe ≫ 1 can also be derived directly from the equations given there [in the former limit,
use Eq. (264) to express u‖i in terms of j‖ in order to get Eq. (F10)].
Note that, taken separately and integrated over velocities, Eq. (F1) for ions (when k⊥ρi ≫ 1, k⊥ρe ≪ 1) and for electrons
(when k⊥ρe ≫ 1), reduces to lowest order to the statement of 3D conservation of
d3Ri T0ih2i /2F0i [Whi in Eq. (245)] and∫
d3Re T0eh2e/2F0e [Eq. (280)], respectively.
F.6. Implications for Turbulent Cascades and Scalings
Since invariants other than the generalized energy or its constituent parts are present in 2D and, in some limits, also in 3D, one
might ask how their presence affects the turbulent cascades and scalings. As an example, let us consider the magnetic helicity in
KAW turbulence, which is a 3D invariant of the ERMHD equations (§ F.3).
A Kolmogorov-style analysis of a local KAW cascade based on a constant flux of helicity gives (proceeding as in § 7.5):
τKAWλ
1 +βi
τKAWλ
1 +βi
∼ εH = const ⇒ Φλ ∼
(1 +βi)1/6
1/3, (F11)
where εH is the helicity flux (omitting constant dimensional factors, the helicity is now defined as
d3rΨΦ and assumed to be
non-zero). This corresponds to a k
⊥ spectrum of magnetic energy.
In order to decide whether we expect the scalings to be determined by the constant-helicity flux or by the constant-energy
flux (as assumed in § 7.5), we adapt a standard argument originally due to Fjørtoft (1953). If the helicity flux of the KAW
turbulence originating at the ion gyroscale (via partial conversion from the inertial-range turbulence; see § 7) is εH , its energy
flux is εKAW ∼ εH [set λ = ρi in Eq. (F11) and compare with Eq. (238)]. If the cascade between the ion and electron gyroscales
is controlled by maintaining a constant flux of helicity, then the helicity flux arriving to the electron gyroscale is still εH , while
the associated energy flux is εHρi/ρe ≫ εKAW, i.e., more energy arrives to ρe than there was at ρi! This is clearly impossible in
a stationary state. The way to resolve this contradiction is to conclude that the helicity cascade is, in fact, inverse (i.e., directed
towards larger scales), while the energy cascade is direct (to smaller scales). A similar argument based on the constancy of the
energy flux εKAW then leads to the conclusion that the helicity flux arriving to the electron gyroscale is εKAWρe/ρi ≪ εH ∼ εKAW,
i.e., the helicity indeed does not cascade to smaller scales. It does not, in fact, cascade to large scales either because the ERMHD
equations are not valid above the ion gyroscale and the helicity of the perturbed magnetic field in the inertial range is not a 3D
invariant (§ F.4). The situation would be different if an energy source existed either at the electron gyroscale or somewhere in
between ρe and ρi. In such a case, one would expect an inverse helicity cascade and the consequent shallower scaling [Eq. (F11)]
between the energy-injection scale and the ion gyroscale.
Other invariants introduced above can in a similar fashion be argued to give rise to inverse cascades in the hypothetical 2D
situations where they are valid and provided there is energy injection at small scales (for the electrostatic case, see Plunk et al.
2009 and numerical simulations by Tatsuno et al. 2009b). The view of turbulence advanced in this paper does not generally
allow for this to happen. First, the fundamentally 3D nature of the turbulence is imposed via the critical balance conjecture and
supported by the argument that “two dimensionality” can only be maintained across parallel distances that do not exceed the
distance a parallel-propagating wave (or parallel-streaming particles) travels over one nonlinear decorrelation time (see § 1.2,
§ 7.5 and § 7.10.3). Secondly, the lack of small-scale energy injection was assumed at the outset. This can, however, be violated
in real astrophysical plasmas by various small-scale plasma instabilities (e.g., triggered by pressure anisotropies; see discussion
in § 8.3). Treatment of such effects falls outside the scope of this paper and remains a matter for future work.
REFERENCES
Abel, I. G., Barnes, M., Cowley, S. C., Dorland, W., & Schekochihin, A. A.
2008, Phys. Plasmas, 15, 122509
Alexandrova, O. 2008, Nonlinear Process. Geophys., 15, 95
Alexandrova, O., Carbone, V., Veltri, P., & Sorriso-Valvo, L. 2008a, ApJ,
674, 1153
Alexandrova, O., Lacombe, C., & Mangeney, A. 2008b, Ann. Geophys., 26,
Antonsen, T. M. & Lane, B. 1980, Phys. Fluids, 23, 1205
Armstrong, J. W., Coles, W. A., Kojima, M., & Rickett, B. J. 1990, ApJ,
358, 685
Armstrong, J. W., Cordes, J. M., & Rickett, B. J. 1981, Nature, 291, 561
Armstrong, J. W., Rickett, B. J., & Spangler, S. R. 1995, ApJ, 443, 209
Artun, M. & Tang, W. M. 1994, Phys. Plasmas, 1, 2682
Balbus, S. A. & Hawley, J. F. 1998, Rev. Mod. Phys., 70, 1
Bale, S. D., Kellogg, P. J., Mozer, F. S., Horbury, T. S., & Reme, H. 2005,
Phys. Rev. Lett., 94, 215002
Barnes, A. 1966, Phys. Fluids, 9, 1483
Barnes, M. A., Abel, I. G., Dorland, W., Ernst, D. R., Hammett, G. W.,
Ricci, P., Rogers, B. N., Schekochihin, A. A., and Tatsuno, T. 2009, Phys.
Plasmas, submitted (arXiv:0809.3945)
Bavassano, B., Dobrowolny, M., Fanfoni, G., Mariani, F., & Ness, N. F.
1982, J. Geophys. Res., 87, 3617
Bavassano, B., Pietropaolo, E., & Bruno, R. 2004, Ann. Geophys., 22, 689
Beck, R. 2007, in Polarisation 2005, ed. F. Boulanger &
M. A. Miville-Deschenes, EAS Pub. Ser., 23, 19
Belcher, J. W. & Davis, L. 1971, J. Geophys. Res., 76, 3534
Beresnyak, A. & Lazarian, A. 2006, ApJ, 640, L175
Beresnyak, A. & Lazarian, A. 2008a, ApJ, 682, 1070
Beresnyak, A. & Lazarian, A. 2008b, arXiv:0812.0812
Berger, M. 1997, J. Geophys. Res., 102, 2637
Bershadskii, A. & Sreenivasan, K. R. 2004, Phys. Rev. Lett., 93, 064501
Bhattacharjee, A. & Ng, C. S. 2001, ApJ, 548, 318
Bhattacharjee, A., Ng, C. S., Spangler, S. R. 1998, ApJ, 494, 409
Bieber, J. W., Wanner, W., & Matthaeus, W. H. 1996, J. Geophys. Res., 101,
Bigazzi, A., Biferale, L., Gama, S. M. A., & Velli, M. 2006, ApJ, 638, 499
Biskamp, D. & Müller, W.-C. 2000, Phys. Plasmas, 7, 4889
Biskamp, D., Schwartz, E., & Drake, J. F. 1996, Phys. Rev. Lett., 76, 1264
Biskamp, D., Schwartz, E., Zeiler, A., Celani, A., & Drake, J. F. 1999, Phys.
Plasmas, 6, 751
Boldyrev, S. A. 2006, Phys. Rev. Lett., 96, 115002
Braginskii, S. I. 1965, Rev. Plasma Phys., 1, 205
Brandenburg, A. & Matthaeus, W. H. 2004, Phys. Rev. E, 69, 056407
Brizard, A. J. & Hahm, T. S. 2007, Rev. Mod. Phys., 79, 421
http://arxiv.org/abs/0809.3945
http://arxiv.org/abs/0812.0812
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 65
Brunetti, G. & Lazarian, A. 2007, MNRAS, 378, 245
Bruno, R. & Carbone, V. 2005, Living Rev. Solar Phys., 2, 4
Bruno, R., Carbone, V., Chapman, S., Hnat, B., Noullez, A., &
Sorriso-Valvo, L. 2007, Phys. Plasmas, 14, 032901
Burlaga, L. F., Scudder, J. D., Klein, L. W., & Isenburg, P. A. 1990, J.
Geophys. Res., 95, 2229
Califano, F., Hellinger, P., Kuznetsov, E., Passot, T., Sulem, P.-L., &
Trávnícek 2008, J. Geophys. Res., 113, A08219
Candy, J. & Waltz, R. E. 2003, J. Comput. Phys., 186, 545
Carter, T. A., Brugman, B., Pribyl, P., & Lybarger, W. 2006, Phys. Rev. Lett.,
96, 155001
Catto, P. J. 1978, Plasma Phys., 20, 719
Catto, P. J., Tang, W. M., & Baldwin, D. E. 1981, Plasma Phys., 23, 639
Catto, P. J. & Tsang, K. T. 1977, Phys. Fluids, 20, 396
Celnikier, L. M., Harvey, C. C., Jegou, R., Kemp, M., & Moricet, P. 1983,
A&A, 126, 293
Celnikier, L. M., Muschietti, L., & Goldman, M. V. 1987, A&A, 181, 138
Chandran, B. D. G. 2005a, ApJ, 632, 809
Chandran, B. D. G. 2005b, Phys. Rev. Lett., 95, 265004
Chandran, B. D. G. 2008, ApJ, 685, 646
Chapman, S. & Cowling, T. G. 1970, The Mathematical Theory of
Non-Uniform Gases (Cambridge: Cambridge Univ. Press)
Chapman, S. C. & Hnat, B. 2007, Geophys. Res. Lett., 34, L17103
Chen, Y. & Parker, S. E. 2003, J. Comput. Phys., 189, 463
Chen, C. H. K., Schekochihin, A. A., Cowley, S. C., & Horbury, T. S. 2009,
ApJ, submitted
Cho, J. & Lazarian, A. 2002, Phys. Rev. Lett., 88, 245001
Cho, J. & Lazarian, A. 2003, MNRAS, 345, 325
Cho, J. & Lazarian, A. 2004, ApJ, 615, L41
Cho, J. & Vishniac, E. T. 2000, ApJ, 539, 273
Cho, J., Lazarian, A., & Vishniac, E. T. 2002, ApJ, 564, 291
Cho, J., Lazarian, A., & Vishniac, E. T. 2003, ApJ, 595, 812
Clarke, T. E. & Enßlin T. A. 2006, AJ, 131, 2900
Coleman, P. J. 1968, ApJ, 153, 371
Coles, W. A. & Harmon, J. K. 1989, ApJ, 337, 1023
Coles, W. A., Liu, W., Harmon, J. K., & Martin, C. L. 1991, J. Geophys.
Res., 96, 1745
Coroniti, F. W., Kennel, C. F., Scarf, F. L., & Smith, E. J. 1982, J. Geophys.
Res., 87, 6029
Corrsin, S. 1951, J. Appl. Phys., 22, 469
Cowley, S. C. 1985, Ph. D. Thesis, Princeton University
Cranmer, S. R. & van Ballegooijen, A. A. 2003, ApJ, 594, 573
Czaykowska, A., Bauer, T. M., Treumann, R. A., & Baumjohann, W. 2001,
Ann. Geophys., 19, 275
Dasso, S., Milano, L. J., Matthaeus, W. H., & Smith, C. W. 2005, ApJ, 635,
Dennett-Thorpe, J. & de Bruyn, A. G. 2003, A&A, 404, 113
Denskat, K. U., Beinroth, H. J., & Neubauer, F. M. 1983, J. Geophys., 54, 60
Dimits, A. M. & Cohen, B. I. 1994, Phys. Rev. E, 49, 709
Dmitruk, P., Gomez, D. O., & Matthaeus, W. H. 2003, Phys. Plasmas, 10,
Dobrowolny, M., Mangeney, A., & Veltri, P.-L. 1980, Phys. Rev. Lett., 45,
Dorland, W. & Hammett, G. W. 1993, Phys. Fluids B, 5, 812
Dubin, D. H. E., Krommes, J. A., Oberman, C., & Lee, W. W. 1983, Phys.
Fluids, 26, 3524
Elsasser, W. M. 1950, Phys. Rev., 79, 183
Enßlin, T. A. & Vogt, C. 2006, A&A, 453, 447
Enßlin, T. A., Waelkens, A., Vogt, C., & Schekochihin, A. A. 2006, Astron.
Nachr., 327, 626
Fabian, A. C., Sanders, J. S., Taylor, G. B., Allen, S. W., Crawford, C. S.,
Johnstone, R. M., & Iwasawa, K. 2006, MNRAS, 366, 417
Ferriere, K. M. 2001, Rev. Mod. Phys., 73, 1031
Fjørtoft, R. 1953, Tellus, 5, 225
Foote, E. A. & Kulsrud, R. M. 1979, ApJ, 233, 302
Fowler, T. K. 1968, Adv. Plasma Phys., 1, 201
Fried, B. D. & Conte, S. D. 1961, The Plasma Dispersion Function (San
Diego, CA: Academic Press)
Frieman, E. A. & Chen, L. 1982, Phys. Fluids, 25, 502
Fyfe, D., Joyce, G., & Montgomery, D. 1977, J. Plasma Phys., 17, 317
Galtier, S. 2006, J. Plasma Phys., 72, 721
Galtier, S. & Bhattacharjee, A. 2003, Phys. Plasmas, 10, 3065
Galtier, S. & Buchlin, E. 2007, ApJ, 656, 560
Galtier, S. & Chandran, B. D. G. 2006, Phys. Plasmas, 13, 114505
Galtier, S., Nazarenko, S. V., Newell, A. C., & Pouquet, A. 2000, J. Plasma
Phys., 63, 447
Galtier, S., Nazarenko, S. V., Newell, A. C., & Pouquet, A. 2002, ApJ, 564,
Gary, S. P., Montgomery, M. D., Feldman, W. C., & Forslund, D. W. 1976,
J. Geophys. Res., 81, 1241
Gary, S. P. 1986, J. Plasma Phys., 35, 431
Gaty, S. P. & Borovsky, J. 2008, J. Geophys. Res., 113, A12104
Gary, S. P., Saito, S., & Li, H. 2008, Geophys. Res. Lett., 35, L02104
Gary, S. P., Skoug, R. M., Steinberg, J. T., & Smith, C. W. 2001, Geophys.
Res. Lett., 28, 2759
Ghosh, S., Siregar, E., Roberts, D. A., & Goldstein, M. L. 1996,
J. Geophys. Res., 101, 2493
Gogoberidze, G. 2005, Phys. Rev. E, 72, 046407
Gogoberidze, G. 2007, Phys. Plasmas, 14, 022304
Goldreich, P. & Reisenegger, A. 1992, ApJ, 395, 250
Goldreich, P. & Sridhar, S. 1995, ApJ, 438, 763
Goldreich, P. & Sridhar, S. 1997, ApJ, 485, 680
Goswami, P., Passot, T., & Sulem, P. L. 2005, Phys. Plasmas, 12, 102109
Grall, R. R., Coles, W. A., Spangler, S. R., Sakurai, T., & Harmon, J. K.
1997, J. Geophys. Res., 102, 263
Grison, B., Sahraoui, F., Lavraud, B., Chust, T., Cornilleau-Wehrlin, N.,
Rème, H., Balogh, A., & André, M. 2005, Ann. Geophys., 23, 3699
Hahm, T. S., Lee, W. W., & Brizard, A. 1988, Phys. Fluids, 31, 1940
Hallatschek, K. 2004, Phys. Rev. Lett., 93, 125001
Hamilton, K., Smith, C. W., Vasquez, B. J., & Leamon, R. J. 2008,
J. Geophys. Res., 113, A01106
Hammett, G. W., Dorland, W., & Perkins, F. W. 1991, Phys. Fluids B, 4,
Harmon, J. K. & Coles, W. A. 2005, J. Geophys. Res., 110, A03101
Haugen, N. E. L., Brandenburg, A., & Dobler, W. 2004, Phys. Rev. E, 70,
016308
Haverkorn, M., Gaensler, B. M., McClure-Griffiths, N. M., Dickey, J. M., &
Green, A. J. 2004, ApJ, 609, 776
Haverkorn, M., Gaensler, B. M., Brown, J. C., Bizunok, N. S.,
McClure-Griffiths, N. M., Dickey, J. M., & Green, A. J. 2005, ApJ, 637,
Haverkorn, M., Brown, J. C., Gaensler, B. M., & McClure-Griffiths, N. M.
2008, ApJ, 680, 362
Hazeltine, R. D. 1983, Phys. Fluids, 26, 3242
Helander, P. & Sigmar, D. J. 2002, Collisional Transport in Magnetized
Plasmas (Cambridge: Cambridge Univ. Press)
Hellinger, P., Trávnícek, P., Kasper, J. C., & Lazarus, A. J., 2006, Geophys.
Res. Lett., 33, L09101
Heyer, M., Gong, H., Ostriker, E., & Brunt, C. 2008, ApJ, 680, 420
Higdon, J. C. 1984, ApJ, 285, 109
Hirose, A., Ito, A., Mahajan, S. M., & Ohsaki, S. 2004, Phys. Lett. A, 330,
Hnat, B., Chapman, S. C., & Rowlands, G. 2005, Phys. Rev. Lett., 94,
204502
Hnat, B., Chapman, S. C., Kiyani, K., Rowlands, G., & Watkins, N. W.
2007, Geophys. Res. Lett., 34, L15108
Hollweg, J. V. 1999, J. Geophys. Res., 104, 14811
Hollweg, J. V. 2008, J. Astrophys. Astr., 29, 217
Horbury, T. S., Balogh, A., Forsyth, R. J., & Smith E. J. 1996, A&A, 316,
Horbury, T. S., Forman, M. A., & Oughton, S. 2005, Plasma Phys. Control.
Fusion, 47, B703
Horbury, T. S., Forman, M., & Oughton, S. 2008, Phys. Rev. Lett., 101,
175005
Howes, G. G., Cowley, S. C., Dorland, W., Hammett, G. W., Quataert, E., &
Schekochihin, A. A. 2006, ApJ, 651, 590
Howes, G. G., Cowley, S. C., Dorland, W., Hammett, G. W., Quataert, E., &
Schekochihin, A. A., 2008a, J. Geophys. Res., 113, A05103
Howes, G. G., Cowley, S. C., Dorland, W., Hammett, G. W., Quataert, E.,
Schekochihin, A. A., & Tatsuno, T. 2008b, Phys. Rev. Lett., 100, 065004
Iroshnikov, R. S. 1963, Astron. Zh., 40, 742 [English translation: 1964, Sov.
Astron, 7, 566]
Ito, A., Hirose, A., Mahajan, S. M., & Ohsaki, S. 2004, Phys. Plasmas, 11,
Jenko, F., Dorland, W., Kotschenreuther, M., & Rogers, B. N. 2000, Phys.
Plasmas, 7, 1904
Kadomtsev, B. B. & Pogutse, O. P. 1974, Sov. Phys.—JETP, 38, 283
Kasper, J. C., Lazarus, A. J., & Gary, S. P. 2002, Geophys. Res. Lett., 29, 20
Kellogg, P. J. & Horbury, T. S. 2005, Ann. Geophys., 23, 3765
Kellogg, P. J., Bale, S. D., Mozer, F. S., Horbury, T. S., & Reme, H. 2006,
ApJ, 645, 704
Kingsep, A. S., Chukbar, K. V., & Yankov, V. V. 1990, Rev. Plasma Phys.,
16, 243
66 SCHEKOCHIHIN ET AL.
Kinney, R. & McWilliams, J. C., 1997, J. Plasma Phys., 57, 73
Kinney, R. M. & McWilliams, J. C. 1998, Phys. Rev. E, 57, 7111
Kivelson, M. G. & Southwood, D. J. 1996, J. Geophys. Res., 101, 17365
Kiyani, K., Chapman, S. C., Hnat, B. & Nicol, R. M. 2007, Phys. Rev. Lett.,
98, 211101
Kolmogorov, A. N. 1941, Dokl. Akad. Nauk SSSR, 30, 299 [English
translation: 1991, Proc. R. Soc. A, 434, 9]
Kotschenreuther, M., Rewoldt, G., & Tang, W. M. 1995, Comput. Phys.
Commun., 88, 128
Kraichnan, R. H. 1965, Phys. Fluids, 8, 1385
Krishan, V. & Mahajan, S. M. 2004, J. Geophys. Res., 109, A11105
Krommes, J. A. 1999, Phys. Plasmas, 6, 1477
Krommes, J. A. 2006, in Turbulence and Coherent Structures in Fluids,
Plasmas and Nonlinear Medium, eds. M. Shats & H. Punzmann
(Singapore: World Scientific), 115
Krommes, J. A. & Hu, G. 1994, Phys. Plasmas, 1, 3211
Kruger, S. E., Hegna, C. C., & Callen, J. D. 1998, Phys. Plasmas, 5, 4169
Kruskal, M. D. & Oberman, C. R. 1958, Phys. Fluids, 1, 275
Kulsrud, R. 1962, Phys. Fluids, 5, 192
Kulsrud, R. M. 1964, in Teoria dei plasmi, ed. M. N. Rosenbluth (London:
Academic Press), 54
Kulsrud, R. M. 1983, in Handbook of Plasma Physics, Vol. 1, ed.
A. A. Galeev & R. N. Sudan (Amsterdam: North–Holland), 115
Lacombe, C., Samsonov, A. A., Mangeney, A., Maksimovic, M.,
Cornilleau-Wehrlin, N., Harvey, C. C., Bosqued, J.-M., & Trávnícek, P.
2006, Ann. Geophys., 24, 3523
Landau, L. 1936, Zh. Exp. Teor. Fiz., 7, 203
Landau, L. 1946, Zh. Exp. Teor. Fiz., 16, 574 [English translation: 1946,
J. Phys. U.S.S.R., 10, 25]
Lazio, T. J. W., Cordes, J. M., de Bruyn, A. G., & Macquart, J.-P. 2004, New
Astron. Rev., 48, 1439
Leamon, R. J., Smith, C. W., Ness, N. F., Matthaeus, W. H., & Wong, H. K.
1998, J. Geophys. Res., 103, 4775
Leamon, R. J., Smith, C. W., Ness, N. F., & Wong, H. K. 1998, J. Geophys.
Res., 104, 22331
Leamon, R. J., Matthaeus, W. H., Smith, C. W., Zank, G. P., & Mullan, D. J.
2000, ApJ, 537, 1054
Li, H., Gary, P., & Stawicki, O. 2001, Geophys. Res. Lett., 28, 1347
Lithwick, Y. & Goldreich, P. 2001, ApJ, 562, 279
Lithwick, Y. & Goldreich, P. 2003, ApJ, 582, 1220
Lithwick, Y., Goldreich, P., & Sridhar, S. 2007, ApJ, 655, 269
Loeb, A. & Waxman, E. 2007, J. Cosmol. Astropart. Phys., 03, 011
Longmire, C. L. 1963, Elementary Plasma Physics (New York: Interscience)
Lovelace, R. V. E., Salpeter, E. E., Sharp, L. E., & Harris, D. E. 1970, ApJ,
Mahajan, S. M. & Krishan, V. 2005, MNRAS, 359, L27
Mahajan, S. M. & Yoshida, Z. 1998, Phys. Rev. Lett., 81, 4863
Maksimovic, M., Zouganelis, I., Chaufray, J.-Y., Issautier, K., Scime, E. E.,
Littleton, J. E., Marsch, E., McComas, D. J., Salem, C., Lin, R. P., &
Elliott, H. 2005, J. Geophys. Res., 110, A09104
Mangeney, A., Lacombe, C., Maksimovic, M., Samsonov, A. A.,
Cornilleau-Wehrlin, N., Harvey, C. C., Bosqued, J.-M., & Trávnícek, P.
2006, Ann. Geophys., 24, 3507
Markevitch, M. & Vikhlinin, A. 2007, Phys. Rep., 443, 1
Markevitch, M., Mazzotta, P., Vikhlinin, A., Burke, D., Butt, Y., David, L.,
Donnelly, H., Forman, W. R., Harris, D., Kin, D.-W., Virani, S., &
Vrtilek, J. 2003, ApJ, 586, L19
Markovskii, S. A., Vasquez, B. J., Smith, C. W., & Holweg, J. V. 2006, ApJ,
639, 1177
Markovskii, S. A., Vasquez, B. J., & Smith, C. W. 2008, ApJ, 675, 1576
Maron, J. & Goldreich, P. 2001, ApJ, 554, 1175
Marsch, E. 2006, Living Rev. Solar Phys., 3, 1
Marsch, E. & Tu, C.-Y. 1990a, J. Gephys. Res, 95, 8211
Marsch, E. & Tu, C.-Y. 1990b, J. Gephys. Res, 95, 11945
Marsch, E. & Tu, C.-Y. 1993, Ann. Geophys., 11, 659
Marsch, E., Ao, X.-Z.,& Tu, C.-Y. 2004, J. Geophys. Res., 109, A04102
Marsch, E., Mühlhäuser, K. H., Rosenbauer, H., & Schwenn, R. 1983,
J. Geophys. Res., 88, 2982
Mason, J., Cattaneo, F., & Boldyrev, S. 2006, Phys. Rev. Lett., 97, 255002
Mason, J., Cattaneo, F., & Boldyrev, S. 2007, Phys. Rev. E, 77, 036403
Matteini, L., Landi, S., Hellinger, P., Pantellini, F., Maksimovic, M., Velli,
M., Goldstein, B. E., & Marsch, E. 2007, Geophys. Res. Lett., 34, L20105
Matthaeus, W. H. & Goldstein, M. L. 1982, J. Geophys. Res., 87, 6011
Matthaeus, W. H. & Brown, M. R. 1988, Phys. Fluids, 31, 3634
Matthaeus, W. H., Goldstein, M. L., & Roberts, D. A. 1990, J. Geophys.
Res., 95, 20673
Matthaeus, W. H., Klein, K. W., Ghosh, S., & Brown, M. R. 1991,
J. Geophys. Res., 96, 5421
Matthaeus, W. H., Pouquet, A., Mininni, P. D., Dmitruk, P., & Breech, B.
2008a, Phys. Rev. Lett., 100, 085003
Matthaeus, W. H., Servidio, S., & Dmitruk, P. 2008b, Phys. Rev. Lett., 101,
149501
Minter, A. H. & Spangler, S. R. 1996, ApJ, 458, 194
Montgomery, D. C. 1982, Phys. Scripta, T2/1, 83
Montgomery, D. C. & Bates, J. W. 1999, Phys. Plasmas, 6, 2727
Montgomery, D. & Turner, L. 1981, Phys. Fluids, 24, 825
Montgomery, D., Brown, M. R., & Matthaeus, W. H. 1987, J. Geophys.
Res., 92, 282
Morales, G. J., Maggs, J. E., Burke, A. T., & Peñano, J. R. 1999, Plasma
Phys. Control. Fusion, 41, A519
Müller, W.-C., Biskamp, D., & Grappin, R. 2003, Phys. Rev. E, 67, 066302
Narayan, R. & Quataert, E. 2005, Science, 307, 77
Narayan, R. & Yi, I. 1995, ApJ, 452, 710
Narita, Y., Glassmeier, K.-H., & Treumann, R. A. 2006, Phys. Rev. Lett., 97,
191101
Nazarenko, S. 2007, New J. Phys, 9, 307
Newbury, J. A., Russell, C. T., Phillips, J. L., & Gary, S. P. 1998,
J. Geophys. Res., 103, 9553
Ng, C. S. & Bhattacharjee, A. 1996, ApJ, 465, 845
Ng, C. S. & Bhattacharjee, A. 1997, Phys. Plasmas, 4, 605
Ng, C. S., Bhattacharjee, A., Germaschewski, K., & Galtier, S. 2003, Phys.
Plasmas, 10, 1954
Norman, C. A. & Ferrara, A. 1996, ApJ, 467, 280
Obukhov, A. M. 1941, Izv. Akad. Nauk SSSR Ser. Geogr. Geofiz., 5, 453
Obukhov, A. M. 1949, Izv. Akad. Nauk SSSR Ser. Geogr. Geofiz., 13, 58
Osman, K. T. & Horbury, T. S. 2007, 654, L103
Oughton, S., Dmitruk, P., & Matthaeus, W. H. 2004, Phys. Plasmas, 11, 2214
Oughton, S., Priest, E. R., & Matthaeus, W. H. 1994, J. Fluid Mech., 280, 95
Passot, T. & Sulem, P. L. 2007, Phys. Plasmas, 14, 082502
Perez, J. C. & Boldyrev, S. 2008, ApJ, 672, L61
Perez, J. C. & Boldyrev, S. 2009, Phys. Rev. Lett., 102, 025003
Plunk, G. G., Cowley, S. C., Schekochihin, A. A., & Tatsuno, T. 2009,
J. Fluid Mech., submitted (arXiv:0904.0243)
Podesta, J. J., Roberts, D. A., & Goldstein, M. L. 2006, J. Geophys. Res.,
111, A10109
Pokhotelov, O. A., Sagdeev, R. Z., Balikhin, M. A., Onishchenko, O. G., &
Fedun, V. N. 2008, J. Geophys. Res., 113, A04225
Quataert, E. 2003, Astron. Nachr., 324, 435
Quataert, E. & Gruzinov, A. 1999, ApJ, 520, 248
Quataert, E., Dorland, W., & Hammett, G. W. 2002, ApJ, 577, 524
Ramos, J. J. 2005, Phys. Plasmas, 12, 052102
Rappazzo, A. F., Velli, M., Einaudi, G., & Dahlburg, R. B. 2007, ApJ, 657,
Rappazzo, A. F., Velli, M., Einaudi, G., & Dahlburg, R. B. 2008, ApJ, 677,
Rees, M. J., Begelman, M. C., Blandford, R. D., & Phinney, E. S. 1982,
Nature, 295, 17
Rickett, B. J., Kedziora-Chudczer, L., & Jauncey, D. L. 2002, ApJ, 581, 103
Rincon, F., Schekochihin, A. A., & Cowley, S. C. 2009, MNRAS, submitted
Roach, C. M., Applegate, D. J., Connor, J. W., Cowley, S. C., Dorland,
W. D., Hastie, R. J., Joiner, N., Saarelma, S., Schekochihin, A. A., Akers,
R. J., Brickley, C., Field, A. R., Valovic, M., & MAST Team 2005,
Plasma Phys. Control. Fusion, 47, B323
Roberts, D. A. 1990, J. Geophys. Res., 95, 1087
Robinson, D. C. & Rusbridge, M. G. 1971, Phys. Fluids, 14, 2499
Rosenbluth, M. N., Hazeltine, R. D., & Hinton, F. L. 1972, Phys. Fluids, 15,
Rosin, M. S., Rincon, F., Schekochihin, A. A., & Cowley, S. C. 2009,
MNRAS, submitted
Rutherford, P. H. & Frieman, E. A. 1968, Phys. Fluids, 11, 569
Sahraoui, F., Belmont, G., Rezeau, L., Cornilleau-Wehrlin, N., Pinçon, J. L.,
& Balogh, A. 2006, Phys. Rev. Lett., 96, 075002
Saito, S., Gary, S. P., Li, H., & Narita, Y. 2008, Phys. Plasmas, 15, 102305
Sanders, J. S. & Fabian, A. C. 2006, MNRAS, 371, L65
Schekochihin, A. A. & Cowley, S. C. 2006, Phys. Plasmas, 13, 056501
Schekochihin, A. A. & Cowley, S. C. 2007, in Magnetohydrodynamics:
Historical Evolution and Trends, ed. S. Molokov, R. Moreau, &
H. K. Moffatt, (Berlin: Springer), 85 (arXiv:astro-ph/0507686)
Schekochihin, A. A. & Cowley, S. C. 2009, Phys. Rev. Lett., submitted
Schekochihin, A. A., Cowley, S. C., Taylor, S. F., Maron, J. L., &
McWilliams, J. C. 2004, ApJ, 612, 276
Schekochihin, A. A., Cowley, S. C., Kulsrud, R. M., Hammett, G. W., &
Sharma, P. 2005, ApJ, 629, 139
http://arxiv.org/abs/0904.0243
http://arxiv.org/abs/astro-ph/0507686
KINETIC TURBULENCE IN MAGNETIZED PLASMAS 67
Schekochihin, A. A., Cowley, S. C., & Dorland, W. 2007, Plasma Phys.
Control. Fusion, 49, A195
Schekochihin, A. A., Cowley, S. C., Kulsrud, R. M., Rosin, M. S., &
Heinemann, T. 2008a, Phys. Rev. Lett., 100, 081301
Schekochihin, A. A., Cowley, S. C., Dorland, W., Hammett, G. W., Howes,
G. G., Plunk, G. G., Quataert, E., & Tatsuno, T. 2008b, Plasma Phys.
Control. Fusion, 50, 124024
Schuecker, P., Finoguenov, A., Miniati, F., Böhringer, H., & Briel, U. G.
2004, A&A, 426, 387
Scott, B. D. 2007, Phys. Plasmas, submitted (arXiv:0710.4899)
Shaikh, D. & Zank, G. P. 2005, Phys. Plasmas, 12, 122310
Shakura, N. I. & Sunyaev, R. A. 1973, A&A, 24, 337
Sharma, P., Hammett, G. W., & Quataert, E. 2003, ApJ, 596, 1121
Sharma, P., Hammett, G. W., Quataert, E., & Stone, J. M. 2006, ApJ, 637,
Sharma, P., Quataert, E., Hammett, G. W., & Stone, J. M. 2007, ApJ, 667,
Shebalin, J. V., Matthaeus, W. H., & Montgomery, D. 1983, J. Plasma Phys.,
29, 525
Shukurov, A. 2007, in Mathematical Aspects of Natural Dynamos, eds.
E. Dormy & A. M. Soward (London: CRC Press), 313
(arXiv:astro-ph/0411739)
Smirnova, T. V., Gwinn, C. R., & Shishov, V. I. 2006, A&A, 453, 601
Smith, C. W., Mullan, D. J., Ness, N. F., Skoug, R. M., & Steinberg, J. 2001,
J. Geophys. Res., 106, 18625
Smith, C. W., Hamilton, K., Vasquez, B. J., & Leamon, R. J. 2006, ApJ, 645,
Snyder, P. B. & Hammett, G. W. 2001, Phys. Plasmas, 8, 3199
Snyder, P. B., Hammett, G. W., & Dorland, W. 1997, Phys. Plasmas, 4, 3974
Sorriso-Valvo, L., Carbone, V., Bruno, R., & Veltri, P. 2006, Europhys. Lett.,
75, 832
Spangler, S. R. & Gwinn, C. R. 1990, ApJ, 353, L29
Stawicki, O., Gary, S. P., & Li, H. 2001, J. Geophys. Res., 106, 8273
Strauss, H. R. 1976, Phys. Fluids, 19, 134
Strauss, H. R. 1977, Phys. Fluids, 20, 1354
Stribling, T., Matthaeus, W. H., & Ghosh, S. 1994, J. Geophys. Res., 99,
Subramanian, K., Shukurov, A., & Haugen, N. E. L. 2006, MNRAS, 366,
Sugama, H. & Horton, W. 1997, Phys. Plasmas, 4, 405
Sugama, H., Okamoto, M., Horton, W., & Wakatani, M. 1996, Phys.
Plasmas, 3, 2379
Tatsuno, T., Dorland, W., Schekochihin, A. A., Plunk, G. G., Barnes, M. A.,
Cowley, S. C., & Howes, G. G. 2009a, Phys. Rev. Lett., submitted
(arXiv:0811.2538)
Tatsuno, T., Dorland, W., Schekochihin, A. A., Plunk, G. G., Barnes, M. A.,
Cowley, S. C., & Howes, G. G. 2009b, Phys. Plasmas, submitted
Taylor, G. I. 1938, Proc. R. Soc. A, 164, 476
Taylor, J. B. & Hastie, R. J. 1968, Plasma Phys., 10, 479
Trotter, A. S., Moran, J. M., & Rodríguez, L. F. 1998, ApJ, 493, 666
Tu, C.-Y. & Marsch, E. 1995, Space Sci. Rev., 73, 1
Unti, T. W. J. & Neugebauer, M. 1968, Phys. Fluids, 11, 563
Vogt, C. & Enßlin, T. A. 2005, A&A, 434, 67
Voitenko, Yu. M. 1998, J. Plasma Phys., 60, 515
Watanabe, T.-H. & Sugama, H. 2004, Phys. Plasmas, 11, 1476
Wicks, R. T., Chapman, S. C., & Dendy, R. O. 2009, ApJ, 690, 734
Wilkinson, P. N., Narayan, R., & Spencer, R. E. 1994, MNRAS, 269, 67
Woo, R. & Armstrong, S. R. 1979, J. Geophys. Res., 84, 7288
Woo, R. & Habbal, S. R. 1997, ApJ, 474, L139
Yoon, P. H. & Fang, T.-M. 2008, Plasma Phys. Control. Fusion, 50, 085007
Yousef, T., Rincon, F., & Schekochihin, A. 2007, J. Fluid Mech., 575, 111
Yousef, T. A., Schekochihin, A. A., & Nazarenko, S. V. 2009,
Phys. Rev. Lett., submitted
Zank, G. P. & Matthaeus, W. H., 1992, J. Plasma Phys., 48, 85
Zank, G. P. & Matthaeus, W. H., 1993, Phys. Fluids A, 5, 257
Zweben, S. J., Menyuk, C. R. & Taylor, R. J., 1979, Phys. Rev. Lett.,42,
http://arxiv.org/abs/0710.4899
http://arxiv.org/abs/astro-ph/0411739
http://arxiv.org/abs/0811.2538
ABSTRACT
  We present a theoretical framework for plasma turbulence in astrophysical
plasmas (solar wind, interstellar medium, galaxy clusters, accretion disks).
The key assumptions are that the turbulence is anisotropic with respect to the
mean magnetic field and frequencies are low compared to the ion cyclotron
frequency. The energy injected at the outer scale scale has to be converted
into heat, which ultimately cannot be done without collisions. A KINETIC
CASCADE develops that brings the energy to collisional scales both in space and
velocity. Its nature depends on the physics of plasma fluctuations. In each of
the physically distinct scale ranges, the kinetic problem is systematically
reduced to a more tractable set of equations. In the "inertial range" above the
ion gyroscale, the kinetic cascade splits into a cascade of Alfvenic
fluctuations, which are governed by the RMHD equations at both the collisional
and collisionless scales, and a passive cascade of compressive fluctuations,
which obey a linear kinetic equation along the moving field lines associated
with the Alfvenic component. In the "dissipation range" between the ion and
electron gyroscales, there are again two cascades: the kinetic-Alfven-wave
(KAW) cascade governed by two fluid-like Electron RMHD equations and a passive
phase-space cascade of ion entropy fluctuations. The latter cascade brings the
energy of the inertial-range fluctuations that was damped by collisionless
wave-particle interaction at the ion gyroscale to collisional scales in the
phase space and leads to ion heating. The KAW energy is similarly damped at the
electron gyroscale and converted into electron heat. Kolmogorov-style scaling
relations are derived for these cascades. Astrophysical and space-physical
applications are discussed in detail.

<|endoftext|><|startoftext|>
Introduction
There have been many studies of the propagation of water waves over a slope, sometimes
also subject to the effects of bottom friction. Many of these works have considered linear
waves, or have been numerical simulations in the framework of various nonlinear long-wave
model equations. Our interest here is in the propagation of weakly nonlinear long water
http://arxiv.org/abs/0704.0045v1
waves over a slope, simultaneously subject to bottom friction, a combination apparently
first considered by Miles (1983a,b) albeit for the special case of a single solitary wave, or a
periodic wavetrain. An appropriate model equation for this scenario is the variable-coefficient
perturbed Korteweg-de Vries (KdV) equation (see Grimshaw 1981, Johnson 1973a,b),
At + cAx +
AAx +
Axxx = −CD
|A|A. (1)
Here A(x, t) is the free surface elevation above the undisturbed depth h(x) and c(x) =
gh(x) is the linear long wave phase speed. The bottom friction term on the right-hand
side is represented by the Chezy law, modelling a turbulent boundary layer. Here CD is a
non-dimensional drag coefficient, often assumed to have a value around 0.01 (Miles 1983a,b).
Other forms of friction could be used (see, for instance Grimshaw et al 2003) but the Chezy
law seems to be the most appropriate for water waves in a shallow depth. In (1) the first
two terms on the left-hand side are the dominant terms, and by themselves describe the
propagation of a linear long wave with speed c. The remaining terms on the left-hand
side represent, respectively, the effect of varying depth, weakly nonlinear effects and weak
linear dispersion. The equation is derived using the usual KdV balance in which the linear
dispersion, represented by ∂2/∂x2, is balanced by nonlinearity, represented by A. Here we
have added to this balance weak inhomogeneity so that cx/c scales as h
2∂3/∂x3, and weak
friction so that CD scales with h∂/∂x. Within this basic balance of terms, we can cast (1)
into the asymptotically equivalent form
AAX +
AXXX = −CD
|A|A, (2)
where τ =
c(x′)
, X = τ − t. (3)
Here we have h = h(x(τ)), explicitly dependent on the variable τ which describes evolution
along the path of the wave.
The governing equation (2) can be cast into several equivalent forms. That most com-
monly used is the variable-coefficient KdV equation, obtained here by putting
B = (gh)1/4A (4)
so that Bτ +
2g1/4h5/4
BBX +
BXXX = −CD
|B|B . (5)
This form shows that, in the absence of friction term, i.e. when CD ≡ 0, equation (2)
has two integrals of motion with the densities proportional to h1/4A and h1/2A2. These
are often referred to as laws for the conservation of “mass” and “momentum”. However,
these densities do not necessarily correspond to the corresponding physical entities. Indeed,
to leading order, the “momentum” density is proportional to the wave action flux, while
the “mass” density differs slightly from the actual mass density. This latter issue has been
explored by Miles (1979), where it was shown that the difference is smaller than the error
incurred in the derivation of equation (4), and is due to reflected waves.
Our main concern in this paper is with the behaviour of an undular bore over a slope
in the presence of bottom friction, using the perturbed KdV equation (2), where we were
originally motivated by the possibility that the behaviour of a tsunami approaching the
shore might be modeled in this way. The undular bore solution to the unperturbed KdV
equation can be constructed using the well-known Gurevich-Pitaevskii (GP) (1974) approach
(see also Fornberg and Whitham 1978). In this approach, the undular bore is represented
as a modulated nonlinear periodic wave train. The main feature of this unsteady undular
bore is the presence of a solitary wave (which is the limiting wave form of the periodic
cnoidal wave) at its leading edge. The original initial-value problem for the KdV equation is
then replaced by a certain boundary-value problem for the associated modulation Whitham
equations. We note, however, that so far, the simplest, “(x/t)”-similarity solutions of the
modulation equations have been used for the modelling of undular bores in various contexts
(see Grimshaw and Smyth 1986, Smyth 1987 or Apel 2003 for instance). These solutions,
while effectively describing many features of undular bores, are degenerate and fail to cap-
ture, even qualitatively, some important effects associated with non-self-similar modulation
dynamics. In particular, in the classical GP solution for the resolution of an initial jump
in the unperturbed KdV equation, the amplitude of the lead solitary wave in the undular
bore is constant (twice the value of the initial jump). On the other hand, the modulation
solution for the undular bore evolving from a general monotonically decreasing initial profile
shows that the lead solitary wave amplitude in fact grows with time (Gurevich, Krylov and
Mazur 1989; Gurevich, Krylov and El 1992; Kamchatnov 2000). As we shall see, the very
possibility of such variations in the modulated solutions of the unperturbed KdV equation
has a very important fluid dynamics implication: in a general setting, the undular bore lead
solitary wave cannot be treated as an individual KdV solitary wave but rather represents a
part of the global nonlinear wave structure. In other words, while at every particular moment
of time the lead solitary wave has the spatial profile of the familiar KdV soliton, generally,
the temporal dependence of its amplitude cannot be obtained in the framework of single
solitary wave perturbation theory.
In the unperturbed KdV equation, the growth of the lead solitary wave amplitude is
caused by the spatial inhomogeneity of the initial data. Here, however, the presence of a
perturbation due to topography and/or friction serves as an alternative and/or additional
cause for variation of the lead solitary wave amplitude. Thus, in the present case, the
variation in the amplitude will have two components (which generally, of course, cannot be
separated because of the nonlinear nature of the problem); one is local, described by the
adiabatic perturbation theory for a single solitary wave, and the other one is nonlocal, which
in principle requires the study of the full modulation solution. Depending on the relative
values of the small parameters associated with the slope, friction and spatial non-uniformity
of the initial modulations, we can take into account only one of these components, or a
combination of them.
The structure of the paper is as follows. First, in Section 2, we reformulate the basic model
(1) as a constant-coefficient KdV equation perturbed by terms representing topography
and friction. Then we derive in Section 3 the associated perturbed Whitham modulation
equations using methods recently developed by Kamchatnov (2004). Next, in Section 4, this
Whitham system is integrated in the solitary-wave limit. Our purpose here is primarily to
obtain the equation of a multiple characteristic, which defines the leading edge of a shoaling
undular bore in the case when the modulations due to the combined action of the slope and
bottom friction are small compared to the existing spatial modulations due to non-uniformity
of the initial data. As a by-product of this integration, we reproduce and extend the known
results on the adiabatic variation of a single solitary wave (Miles 1983a,b). Then, in Section
5, we carry out an analogous study of a cnoidal wave, propagating over a gradual slope and
subject to friction, a case studied previously by Miles (1983 b) but under the restriction of
zero mean flow, which is removed here. Finally, in Section 6 we study effects of a gradual
slope and bottom friction on the front of an undular bore which represents a modulated
cnoidal wave transforming into a system of weakly interacting solitons near its leading edge.
2 Problem formulation
For the purpose of the present paper it is convenient to recast (2) into the standard KdV
equation form with constant coefficients, modified by certain perturbation terms. Thus we
introduce the new variables
A, T =
hdτ =
6g3/2
h(x)dx. (6)
so that UT + 6UUX + UXXX = R = F (T )U −G(T )|U |U, (7)
where F (T ) = −9hT
, G(T ) = 4CD
. (8)
In this form, the governing equation (7) has the structure of the integrable KdV equation
on the left-hand side, while the separate effects of the varying depth and the bottom friction
are represented by the two terms on the right-hand side. This structure enables us to use
the general theory developed in Kamchatnov (2004) for perturbed integrable systems.
For much of the subsequent discussion, it is useful to assume that h(x) = constant,
CD = 0 for x < 0 in the original equation (1), which corresponds to F (T ) = G(T ) = 0 for
T < 0 in (7). We shall also assume that A = 0 for x > 0 at t = 0, which corresponds to
U = 0 for X > 0 on X = τ(T ) (see (6)). Then we shall propose two types of initial-value
problem for (1), and correspondingly for (7).
(a) Let a solitary wave of a given amplitude a0 initially propagating over a flat bottom
without friction (i.e a soliton described by an unperturbed KdV equation), enter the variable
topography and bottom friction region at t = 0, x = 0 (Fig. 1 a).
(b) Let an undular bore of a given intensity propagate over a flat bottom without friction
(the corresponding solution of the unperturbed KdV equation will be discussed in Section
5). Let the lead solitary wave of this undular bore have the same amplitude a0 and enter
the variable topography and bottom friction region at t = 0, x = 0 (Fig. 1b).
In particular, we shall be interested in the comparison of the slow evolution of these
two, initially identical, solitary waves in the two different problems described above. The
expected essential difference in the evolution is due to the fact that the lead solitary wave
in the undular bore is generally not independent of the remaining part of the bore and can
exhibit features that cannot be captured by a local perturbation analysis. The well-known
example of such a behaviour, when a solitary wave is constrained by the condition of being
a part of a global nonlinear wave structure, is provided by the undular bore solution of the
KdV-Burgers (KdV-B) equation
ut + 6uux + uxxx = µuxx, µ ≪ 1 . (9)
( )h x
  a) 
( )h x
Figure 1: Isolated solitary wave (a) and undular bore (b) entering the variable topogra-
phy/bottom friction region.
Indeed, the undular bore solution of the KdV-B equation (9) is known to have a solitary
wave at its leading edge (see Johnson 1970; Gurevich & Pitaevskii 1987; Avilov, Krichever
& Novikov 1987) and this solitary wave: (a) is asymptotically close to a soliton solution
of the unperturbed KdV equation; and (b) has the amplitude, say a0, that is constant in
time. At the same time, it is clear that if one takes an isolated KdV soliton of the same
amplitude a0 as initial data for the KdV-Burgers equation it would damp with time due
to dissipation. The physical explanation of such a drastic difference in the behaviour of an
isolated soliton and a lead solitary wave in the undular bore for the same weakly dissipative
KdV-B equation is that the action of weak dissipation on an expanding undular bore is
twofold: on the one hand, the dissipation tends to decrease the amplitude of the wave
locally but, on the other hand, it “squeezes” the undular bore so that the interaction (i.e.
momentum exchange) between separate solitons within the bore becomes stronger than in
the absence of dissipation and this acts as the amplitude increasing factor. The additional
momentum is extracted from the upstream flow with a greater depth (see Benjamin and
Lighthill 1954). As a result, in the case of the KdV-B equation, an equilibrium non-zero
value for the lead solitary wave amplitude in the undular bore is established. Of course,
for other types of dissipation, a stationary value of the lead soliton amplitude would not
necessarily exist, but in general, due to the expected increase of the soliton interactions near
the leading edge, the amplitude of the lead soliton of the undular bore would decay slower
than that of an isolated soliton. Indeed, the presence here of variable topography as well
can result in an additional “nonlocal” amplitude growth.
While the problem (a) can be solved using traditional perturbation analysis for a single
solitary wave, which leads to an ordinary differential equation along the solitary wave path
(see Miles 1983a,b), the undular bore evolution problem (b) requires a more general approach
which can be developed on the basis of Whitham’s modulation theory leading to a system of
three nonlinear hyperbolic partial differential equations of the first order. Since the Whitham
method, being equivalent to a nonlinear multiple scale perturbation procedure, contains the
adiabatic theory of slow evolution of a single solitary wave as a particular (albeit singular)
limit, it is instructive for the purposes of this paper to treat both problems (a) and (b) using
the general Whitham theory.
3 Modulation equations
The original Whithammethod (Whitham 1965, 1974) was developed for conservative constant-
coefficient nonlinear dispersive equations and is based on the averaging of appropriate con-
servation laws of the original system over the period of a single-phase periodic travelling
wave solution. The resulting system of quasi-linear equations describes the slow evolution
of the modulations (i.e. of the mean value, the wavenumber, the amplitude etc.) of the pe-
riodic travelling wave. Here, that approach is extended to the perturbed KdV equation (6)
following the general approach of Kamchatnov (2004), which extends earlier results for cer-
tain specific cases (see Gurevich and Pitaevskii (1987, 1991), Avilov, Krichever and Novikov
(1987) and Myint and Grimshaw (1995) for instance).
We suppose that the evolution of the nonlinear wave is adiabatically slow, that is, the
wave can be locally represented as a solution of the corresponding unperturbed KdV equation
(i.e. (7) with zero on the right-hand side) with its parameters slowly varying with space and
time. The one-phase periodic solution of the KdV equation can be written in the form
U(X, T ) = λ3 − λ1 − λ2 − 2(λ3 − λ2)sn2(
λ3 − λ1 θ,m) (10)
where sn(y,m) is the Jacobi elliptic sine function, λ1 ≤ λ2 ≤ λ3 are parameters and the
phase variable θ and the modulus m are given by
θ = X − V T, V = −2(λ1 + λ2 + λ3) , (11)
λ3 − λ2
λ3 − λ1
, (12)
and L =
−P (µ)
2K(m)√
λ3 − λ1
, (13)
where K(m) is the complete elliptic integral of the first kind, L is the “wavelength” along the
X-axis (which is actually a retarded time rather than a true spatial co-ordinate). Here we
have used the representation of the basic ordinary differential equation for the KdV travelling
wave solution (10) in the form (see Kamchatnov (2000) for a general motivation behind this
representation)
−P (µ), (14)
where
µ = 1
(U + s1), s1 = λ1 + λ2 + λ3 (15)
P (µ) =
(µ− λi) = µ3 − s1µ2 + s2µ− s3, (16)
that is the solution (10) is parameterized by the zeroes λ1, λ2, λ3 of the polynomial P (µ).
In a modulated wave, the parameters λ1, λ2, λ3 are allowed to be slow functions of X and
T , and their evolution is governed by the Whitham equations. For the unperturbed KdV
equation, the evolution of the modulation parameters is due to a spatial non-uniformity
of the initial distributions for λj, j = 1, 2, 3 and the typical spatio-temporal scale of the
modulation variations is determined by the scale of the initial data.
In the case of the perturbed KdV equation (7), the evolution of the parameters λ1, λ2, λ3
is caused not only by their initial spatial non-uniformity, but also by the action of the
weak perturbation, so that, generally, at least two independent spatio-temporal scales for
the modulations can be involved. However, at this point we shall not introduce any scale
separation within the modulation theory and derive general perturbed Whitham equations
assuming that the typical values of F (T ) and G(T ) are O(∂λj/∂T, ∂λj/∂X) within the
modulation theory.
It is instructive to first introduce the Whitham equations for the perturbed KdV equation
(7) using the traditional approach of averaging the (perturbed) conservation laws. To this
end, we introduce the averaging over the period (13) of the cnoidal wave (10) by
〈F〉 =
Fdθ =
−P (µ)
. (17)
In particular,
〈U〉 = 2〈µ〉 − s1 = 2(λ3 − λ1)
+ λ1 − λ2 − λ3, (18)
〈U2〉 = 8[−s1
(λ3 − λ1)
s1λ1 +
(λ21 − λ2λ3)] + s21 , (19)
where E(m) is the complete elliptic integral of the second kind. Now, one represents the
KdV equation (7) in the form of the perturbed conservation laws
= Rj , j = 1, 2, 3 , Rj ≪ 1 , (20)
where Pj and Qj are the standard expressions for the conserved densities (Kruskal integrals)
and “fluxes” of the unperturbed KdV equation. Just as in the Whitham (1965) theory for
unperturbed dispersive systems, the number of conservation laws required is equal to the
number of free parameters in the travelling wave solution, which is three in the present case.
Next, one applies the averaging (17) to the system (20) to obtain (see Dubrovin and Novikov
1989)
∂〈Pj〉
∂〈Qj〉
= 〈Rj〉 , j = 1, 2, 3 . (21)
The system (21) describes slow evolution of the parameters λj in the cnoidal wave solution
(10).
Along with these derived perturbed conservative form of the Whitham equations, we
introduce the wave conservation law which is a general condition for the existence of slowly
modulated single-phase travelling wave solutions (10) (see for instance Whitham 1974) and
must be consistent with the modulation system (21). This conservation law has the form
= 0 , (22)
where k =
, ω = kV (23)
are the “wavenumber” and the “frequency” respectively (we have put quotation marks here
because the actual wavenumber and frequency related to the physical variables x, t are
different quantities from those in (23), but are related through the transformations (3, 6) ).
The wave conservation law (22) can be introduced instead of any of three inhomogeneous
averaged conservation laws comprising the Whitham system (21).
It is known that the Whitham system for the homogeneous constant-coefficient KdV
equation can be represented in diagonal (Riemann) form (Whitham 1965, 1974) by an ap-
propriate choice of the three parameters characterising the periodic travelling wave solution.
In fact, in our solution (7) the parameters λj have already been chosen so that they coincide
with the Riemann invariants of the unperturbed KdV modulation system. Introducing them
explicitly into the perturbed system (21) we obtain (see Kamchatnov 2004)
∂L/∂λi
〈(2λi − s1 − U)R〉
j 6=i(λi − λj)
, i = 1, 2, 3, (24)
where R is the perturbation term on the right-hand side of the KdV equation (7) and
vi = −2
∂L/∂λi
, i = 1, 2, 3, (25)
are the Whitham characteristic velocities corresponding to the unperturbed KdV equation.
It should be noted that the straightforward realisation of the above lucid general algo-
rithm for obtaining perturbed modulation system in diagonal form is quite a laborious task.
In fact, to derive system (24), the so-called finite-gap integration method incorporating the
integrable structure of the unperturbed KdV equation has been used. The modulation sys-
tem (24) in a more particular form corresponding to specific choices of the perturbation term
was obtained by Myint and Grimshaw (1995) using a multiple-scale perturbation expansion.
In that latter setting, the wave conservation law (22) is an inherent part of the construction,
while in the averaging approach used here, it can be obtained as a consequence of the system
(24).
To obtain an explicit representation of the Whitham equations for the present case of
equation (7), we must substitute the perturbation R from the right-hand side of (7) and
perform the integration (17) with U given by (10). From now on, we are going to consider
only the flows where U ≥ 0 so that the perturbation term assumes the form
R(U) = G(T )U − F (T )U2 . (26)
Substituting (26) into (24) we obtain, after some detailed calculations (see Appendix),
the perturbed Whitham system in the form
= ρi = Ci[F (T )Ai −G(T )Bi], i = 1, 2, 3 (27)
where C1 =
, C2 =
E − (1−m)K
, C3 =
; (28)
(5λ1 − λ2 − λ3)E +
(λ2 − λ1)K,
(5λ2 − λ1 − λ3)E − (λ2 − λ1)
λ3 − λ1
(5λ3 − λ1 − λ2)E −
(λ2 − λ1)
(−27λ21 − 7λ22 − 7λ23 + 2λ1λ2 + 2λ1λ3 + 22λ2λ3)E
(λ2 − λ1)(3λ1 + λ2 + λ3)K,
(−7λ21 − 27λ22 − 7λ23 + 2λ1λ2 + 22λ1λ3 + 2λ2λ3)E
λ2 − λ1
λ3 − λ1
(7λ21 + 15λ
2 + 11λ
3 − 6λ1λ2 − 18λ1λ3 + 6λ2λ3)K,
(−7λ21 − 7λ22 − 27λ23 + 22λ1λ2 + 2λ1λ3 + 2λ2λ3)E
(7λ21 + 11λ
2 + 15λ
3 − 18λ1λ2 − 6λ1λ3 + 6λ2λ3)K;
and the characteristic velocities are:
v1 = −2
4(λ3 − λ1)(1−m)K
v2 = −2
4(λ3 − λ2)(1−m)K
E − (1−m)K
v3 = −2
4(λ3 − λ2)K
The equations (27) – (31) provide a general setting for studying the nonlinear modulated
wave evolution over variable topography with bottom friction. In the absence of the pertur-
bation terms (i.e. when F (T ) ≡ 0, G(T ) ≡ 0), the system (27), (31) indeed coincides with
the original Whitham equations (Whitham 1965) for the integrable KdV dynamics. In that
case the variables λ1, λ2, λ3 become Riemann invariants, so in this general (perturbed) case
we shall call them Riemann variables.
It is important to study the structure of the perturbed Whitham equations (27) – (31) in
two limiting cases when the underlying cnoidal wave degenerates into (i) a small-amplitude
sinusoidal wave (linear limit), when λ2 = λ3 (m = 0), and (ii) into a solitary wave when
λ2 = λ1 (m = 1). Since in both these limits the oscillations do not contribute to the mean
flow (they are infinitely small in the linear limit and the distance between them becomes
infinitely long in the solitary wave limit) one should expect that in both cases one of the
Whitham equations will transform into the dispersionless limit of the original perturbed
KdV equation (7) i.e.
UT + 6UUX = F (T )U −G(T )U2, (32)
Indeed, using formulae (27) – (31) we obtain for m = 0:
λ2 = λ3 ,
− 6λ1
= λ1F + λ
+ (6λ1 − 12λ3)
= λ1F + λ
Similarly, for m = 1, one has
λ2 = λ1 ,
− (4λ1 + 2λ3)
(4λ1 − λ3)F +
(7λ23 − 24λ1λ3 + 32λ21)G,
− 6λ3
= λ3F + λ
We see that, in both cases, one of the Riemann variables (taken with inverted sign) coincides
with the solution of the dispersionless equation (32) (recall that in the derivation of the
Whitham equations we assumed U ≥ 0 everywhere), namely U = 〈U〉 = −λ1 when λ2 = λ3
(m = 0) and U = 〈U〉 = −λ3 when λ2 = λ1 (m = 1).
To conclude this section, we present expressions for the physical wave parameters such as
the surface elevation wave amplitude a, mean elevation 〈A〉 speed and wavenumber in terms
of the modulation solution λj(X, T ). Using (6) and (10) we obtain for the wave amplitude
(peak to trough) and the mean elevation
(λ3 − λ2) , 〈A〉 =
〈U〉 , (35)
where the dependence of 〈U〉 on λj(X, T ), j = 1, 2, 3 is given by (18) and X = X(x, t),
T = T (x, t) by (3, 6). In order to obtain the physical wavenumber κ and the frequency Ω
we first note that the phase function θ(X, T ) defined in (11) is replaced by a more general
expression defined so that k = θX and kV = −θT are the “wavenumber” and “frequency” in
the X −T coordinate system. Then we define the physical phase function Θ(x, t) = θ(X, T )
so that we get
κ = Θx , Ω = −Θt . (36)
It now follows that
(1− hV
) , Ω = k , and
1− hV/6g
. (37)
Note that the physical frequency is the “wavenumber” in the X − T coordinate system,
and that the physical phase speed is Ω/κ. Since the validity of the KdV model (1) requires
inter alia that the wave be right-going it follows from this expression that the modulation
solution remains valid only when hV < 6g. Of course, the validity of (1) also requires that
the amplitude remains small, and this would normally also ensure that V remains small.
4 Modulation solution in the solitary wave limit
In this section, we shall integrate the perturbed modulation system (27) along the multiple
characteristic corresponding to the merging of two Riemann variables λ2 and λ1. As we shall
see later, this characteristic specifies the motion of the leading edge of the shoaling undular
bore in the case when the perturbations due to variable topography and bottom friction can
be considered as small compared with the existing spatial modulations within the bore. At
the same time, as the case λ2 = λ1 ( i.e. m = 1) corresponds to the solitary wave limit in the
travelling wave solution (10), our results here are expected to be consistent with the results
from the traditional perturbation approach to the adiabatic variation of a solitary wave due
to topography and bottom friction (see Miles 1983a,b).
In the limit m → 1 the periodic solution (10) of the KdV equation goes over to its solitary
wave solution
U(X, T ) = U0sech
λ3 − λ1(X − VsT )]− λ3, (38)
where
U0 = 2(λ3 − λ1) , Vs = −(4λ1 + 2λ3) (39)
are the solitary wave amplitude and “velocity” respectively. The solution (38) depends on two
parameters λ1 and λ3 whose adiabatic slow evolution is governed by the reduced modulation
system (34). It is important that the second equation in this system is decoupled from
the first one. Hence, evolution of the pedestal −λ3 on which the solitary wave rides, can
be found from the solution of this dispersionless equation by the method of characteristics.
When λ3(X, T ) is known, evolution of the parameter λ1 can be found from the solution of
the first equation (34). As a result, we arrive at a complete description of adiabatic slow
evolution of the solitary wave parameters taking account of its interaction with the (given)
pedestal.
However, it is important to note here that while this description of the adiabatic evolution
of a solitary wave is complete as far as the solitary wave itself is concerned, it fails to describe
the evolution of a trailing shelf, which is needed to conserve total “mass” (see, for instance,
Johnson 1973b, Grimshaw 1979 or Grimshaw 2006). This trailing shelf has a very small
amplitude, but a very large length scale, and hence can carry the same order of “mass” as
the solitary wave. But note that the “momentum” of the trailing shelf is much smaller than
that of the solitary wave, whose adiabatic deformation is in fact governed to leading order by
conservation of “momentum”, or more precisely, by conservation of wave action flux (strictly
speaking, conservation only in the absence of friction).
The situation simplifies if the solitary wave propagates into a region of still water so
that there is no pedestal ahead of the wave, that is λ3 = 0 in X > τ(T ). But then, since
λ3 = 0 is an exact solution of the degenerate Whitham system (34) for this solitary wave
configuration, we can put λ3 = 0 both in the solitary wave solution,
U(X, T ) = −2λ1sech2[
−λ1 (X − VsT )], Vs = −4λ1, (40)
and in equation (34) for the parameter λ1 to obtain,
− 4λ1
Fλ1 +
Gλ21 , (41)
As we see, the solitary wave moves with the instant velocity
= −4λ1, (42)
and the parameter λ1 changes with T along the solitary wave trajectory according to the
ordinary differential equation
F (T )λ1 +
G(T )λ21. (43)
It can be shown that equation (43) is consistent with the equation for the solitary wave half-
width γ =
−λ1 obtained by the traditional perturbation approach (see Grimshaw (1979)
for instance).
Next, we re-write equation (43) in terms the original independent x-variable. For that,
we find from (6), that
dT = (h1/2/6g3/2)dx (44)
and F = −27
)3/2 dh
, G = 4CD
. (45)
Then substituting these expressions into (43) yields the equation
= −31
λ21 (46)
which can be easily integrated to give
−C0 −
, (47)
where C0 is an integration constant and x = 0 is a reference point where h = h0. According
to (40), U0 = −2λ1 is the amplitude of the soliton expressed in terms of variable U(X, T ).
Returning to the original surface displacement A(x, t) by means of (6) and denoting C0 =
4/(3ga0h0), we find the dependence of the surface elevation soliton amplitude a = (2h
2/3g)U0
on x in the form
a = a0
CDa0h0
, (48)
where a0 is the solitary wave amplitude at x = 0. We note that for CD = 0 this reduces to
the classical Boussinesq (1872) result a ∼ h−1, while for h = h0 it reduces to the well-known
algebraic decay law a ∼ 1/(1 + constant x) due to Chezy friction. Miles (1983a,b) obtained
this expression for a linear depth variation, although we note that there is a factor of 2
difference from (48) (in Miles (1983a,b) the factor 16CD/15 is 8CD/15). The trajectory of
the soliton can be now found from (42) and (47):
− t =
dx′h−5/2(x′)
CDa0h0
h3(x)
. (49)
This expression determines implicitly the dependence of x on t along the solitary wave path
and provides the desired equation for the multiple characteristic of the modulation system
for the case m = 1.
It is instructive to derive an explicit expression for the solitary wave speed by computing
the derivative dx/dt from (49), or more simply, directly from (37),
1− a/2h
. (50)
The formula (50) yields the restriction for the relative amplitude γ = a/h < 2 which is
clearly beyond the applicability of the KdV approximation (wave breaking occurs already
at γ = 0.7 (see Whitham 1974)). In the frictionless case (CD = 0) equation (48) gives
a/h = a0h0/h
2, and so the expression (50) for the speed must fail as h → 0. It is interesting
to note that this failure of the KdV model as h → 0 due to appearance of infinite (and
further negative!) solitary wave speeds is not apparent from the expression (48) for the
solitary wave amplitude, and the implication is that the model cannot be continued as
h → 0. Curiously this restriction of the KdV model seems never to have been noticed before
in spite of numerous works on this subject. Note that taking account of bottom friction
leads to a more complicated formula for the solitary wave speed as a function of h but the
qualitative result remains the same.
It is straightforward to show from (46) or (48) that
= −hx
CDa0h0
CDa0h0
. (51)
It follows immediately that for a wave advancing into increasing depth (hx > 0), the ampli-
tude decreases due to a combination of increasing depth and bottom friction. However, for
a wave advancing into decreasing depth, there is a tendency to increase the amplitude due
to the depth decrease, but to decrease the amplitude due to bottom friction. Hence whether
or not the amplitude increases is determined by which of these effects is larger, and this in
turn is determined by the slope, the depth, and the consolidated drag parameter CDa0/h0.
To illustrate, let us consider the bottom topography in the form
h(x) = h1−α0 (h0 − δx)α , α > 0 , (52)
which satisfies the condition h(0) = h0; the parameter δ characterizes the slope of the bot-
tom. In this case the formula (48) becomes
a = a0
δ(3α− 1)h0
)(3α−1)/α
if α 6= 1/3. One can see now that if α < 1/3, then the bottom friction term is relatively
unimportant due to the smallness of CD. Of course, for this case we again recover the
Boussinesq result, now slightly modified,
a ≈ a0
δ(1− 3α)h20
, 0 < α <
, h ≪ h0. (54)
Of course, this result is impractical in the KdV context as the KdV approximation used here
requires the ratio a/h to remain small.
If α > 1/3 now obtain asymptotic formula
15(3α− 1)δ
, h ≪ h0 , (55)
which is independent of the initial amplitude a0. This expression is consistent with the small-
amplitude KdV approximation as long as (3α− 1)δ/CD is order unity. Simple inspection of
(55) shows that the solitary wave amplitude
• increases as h → 0 if 1
< α < 1
• is constant as h → 0 if α = 1
• decreases as h → 0 if α > 1
Thus for 1/3 < α < 1/2, as for the case α < 1/3, the amplitude will increase as the depth
decreases, in spite of the presence of (sufficiently small) friction. However, for α > 1/3, even
although there is usually some initial growth in the amplitude, eventually even small bottom
friction will take effect and the amplitude decreases to zero. We note that if α = 1/3 then
the integral
h−3dx in (48) diverges logarithmically as h → 0, which just slightly modifies
the result (55) for h ≪ h0 and implies growth of the amplitude ∝ ln h/h as h → 0.
Of particular interest is the case α = 1. In that case formula (53) becomes
a = a0
. (56)
and a ≈ 15
h , h ≪ h0 (57)
These expressions (56, 57) were obtained by Miles (1983a,b) using wave energy conservation
(as above, note, however, that in Miles (1983a,b) the numerical coefficient is 15/4 rather
than 15/8). Thus, these results obtained from the Whitham theory are indeed consistent, at
the leading order, with the traditional perturbation approach for a slowly-varying solitary
wave.
5 Adiabatic deformation of a cnoidal wave
Next we consider a modulated cnoidal wave (10) in the special case when the modulation does
not depend on X . While this case is, strictly speaking, impractical as it assumes there is an
infinitely long wavetrain, it can nevertheless provide some useful insights into the qualitative
effects of gradual slope and friction on undular bores which are locally represented as cnoidal
waves. In the absence of friction, the slow dependence of the cnoidal wave parameters on T
was obtained by Ostrovsky & Pelinovsky (1970, 1975) and Miles (1979) (see also Grimshaw
2006), assuming that the surface displacement had a zero mean (i.e. 〈U〉 = 0), while,
the effects of friction were taken into account by Miles (1983b) using the same zero-mean
displacement assumption. However, this assumption is inconsistent with our aim to study
undular bores where the value of 〈U〉 is essentially nonzero. Hence, we need to develop a
more general theory enabling us to take into account variations in all the parameters in the
cnoidal wave. Such a general setting is provided by the modulation system (27).
Thus we consider the case when the Riemann variables in (27) do not depend on the
variable X so that the general Whitham equations become ordinary differential equations
in T , which can be conveniently reformulated in terms of the original spatial x-coordinate
using the relationship (44):
, i = 1, 2, 3, (58)
where all variables are defined above in section 3 (see 28, 29, 30). This system can be readily
solved numerically. But it is instructive, however, to first indicate some general properties
of the solution.
First, the solution to the system (58) must have the property of conservation of “wave-
length” L (or “wavenumber” k=2π/L)
2K(m)√
λ3 − λ1
= constant (59)
Indeed, the wave conservation law (22) in absence of X-dependence assumes the form
= 0 , (60)
which yields (59). Thus the system of three equations (58) can be reduced to two equations.
Next, applying Whitham averaging directly to (7) yields
P̃ , M = 〈U〉 , P̃ = 〈|U |U〉 . (61)
P − 4CD
Q̃ , P = 〈U2〉 , Q̃ = 〈|U |3〉 . (62)
The equation set (59), (61), (62) comprise a closed modulation system for three independent
modulation parameters, say M , P̃ and m. While this system is not as convenient for further
analysis as the system (27) in Riemann variables, it does not have a restriction U > 0 inherent
in (27), and allows for some straightforward inferences regarding the possible existence of
modulation solutions with zero mean elevation, that is with M = 0. Indeed, one can see
that the solution with the zero mean is actually not generally permissible when CD 6= 0, a
situation overlooked in Miles (1983b). Indeed, M = 0 immediately then implies that P̃ = 0
by (61). But then due to (59) we have all three modulation parameters fixed which is clearly
inconsistent with the remaining equation (62) (except for the trivial case M = 0, P = 0,
Q̃ = 0). However, in the absence of friction, when CD = 0, equation (61) uncouples and
permits a nontrivial solution with a zero mean. In general, when CD = 0 equations (61),
(62) can be easily integrated to give
d = Mh9/4 = constant; σ = Ph9/2 = constant. (63)
Then, using (18, 19, 59) one readily gets the formula for the variation of the modulus m,
and hence of all the other wave parameters, as a function of h
K2[2(2−m)EK − 3E2 − (1−m)K2] =
(σ − d2)L4
. (64)
200
 400
 600
 800
0.
2
0.
4
0.
6
0.
8
C  = 0
C  = 0.01
Figure 2: Dependence of the modulus m on the physical space coordinate x in the cases
without and with bottom friction in the X-independent modulation solution.
Formula (64) generalises to the case M 6= 0 (i.e. d 6= 0) the expressions of Ostrovsky &
Pelinovsky (1970, 1975), Miles (1979) and Grimshaw (2006) (note that in Grimshaw (2006)
the zero mean restriction in actually not necessary). We note here that, again with CD = 0,
equation (5) implies conservation of 〈B〉 and 〈B2〉 (the averaged wave action flux), which,
together with (59), also yield (64).
The physical frequency Ω and wavenumber κ in the modulated periodic wave under study
are given by the formula (37), and we recall here that k = 2π/L is constant (see (59)). As
discussed before at the end of Section 3 we must require that the phase speed stays positive as
the wave evolves, and here that requires that the physical wavenumber κ > 0. Since a/h (and
hence hV/6g) is supposed to be small within the range of applicability of the KdV equation
(2) the expression (37) implies the behaviour κ ≃ Ω/
gh which of course agrees with the
well known result for linear waves on a sloping beach (see Johnson 1997 for instance). This
effect will be slightly attenuated for the nonlinear cnoidal wave, since V h/6g > 0, but the
overall effect will be a “squeezing” of the cnoidal wave, a result important for our further
study of undular bores. Next we study numerically the combined effect of slope and friction
on a cnoidal wave.
As we have shown, in the presence of Chezy friction M 6= 0, and we have also assumed
that U > 0, which is necessary when we come to study undular bores. Now we use the
stationary modulation system (58) in Riemann variables, which was derived using this as-
sumption. We solve the coupled ordinary differential equation system (58) for the case of a
linear slope
h(x) = h0 − δx (65)
with h0 = 10, δ = 0.01, and with the initial conditions
λ1 = −0.441, λ2 = 0.147, λ3 = 0.294 at x = 0, (66)
which corresponds to a nearly harmonic wave with m = 0.2, a/h0 = 0.2, 〈A〉/h0 ≈ 0.3
at x = 0 (see (35)). Also we note that for the chosen parameters we have V = 0, so at
x = 0 we have κ = Ω/
gh0 as in linear theory. It is instructive to compare solutions with
(CD = 0.01) and without (CD = 0) friction. In Fig. 2 the dependence of the modulus m
100 200 300 400 500
C  = 0
C  = 0.01
100 200 300 400 500
h  <A>
1/4 C  = 0
C  = 0.01
Figure 3: Left: Dependence of the mean value 〈A〉 in theX-independent modulation solution
on the physical space coordinate x without (dashed line) and with (solid line) bottom friction;
Right: Same but multiplied by the Green’s law factor, h1/4
100
 200
 300
 400
 500
1.
4
1.
6
1.
8
2.
2
2.
4
C   = 0
C   = 0.01
Figure 4: Dependence of the surface elevation amplitude a on the space coordinate x. Dashed
line corresponds to the frictionless case and solid line to the case with bottom friction.
on x is shown for both cases. We see that for the frictionless case m → 1 with decrease
of depth, i.e. the wave crests assume the shape of solitary waves when one approaches the
shoreline. When CD 6= 0 the modulus also grows with decrease of depth but never reaches
unity. The dependence on x of the mean surface elevation 〈A〉 for the cases without and
with friction is shown in Fig. 3. We have checked that the “wavelength” L (59) is constant
for both solutions. Also, one can see from Fig. 3 (right) that the value h1/4〈A〉 ∝ d is indeed
conserved in the frictionless case but is not constant if friction is present (the same holds
true for the value h1/2〈A2〉 ∝ σ but we do not present the graph here). Finally, in Fig. 4 the
dependence of the physical elevation wave amplitude a on the spatial coordinate x is shown.
One can see that the amplitude adiabatically grows with distance in the frictionless case
due to the effect of the slope (without friction) but, not unexpectedly, gradually decreases
in the case when bottom friction is present, where the decrease for these parameter settings
is comparable in magnitude to the effect of the slope. In both cases the main qualitative
changes occur in the wave shape and the wavelength.
Overall, we can infer from these results that the main local effect of a slope and bottom
friction on a cnoidal wave, along with the adiabatic amplitude variations, is twofold: a wave
with a m < 1 at x = 0 tends to transform into a sequence of solitary waves as x decreases,
and at the same time the distance between subsequent wave crests tends to decrease. This
is in sharp contrast with the behaviour of modulated cnoidal waves in problems described
by the unperturbed KdV equation, where growth of the modulus m is accompanied by an
increase of the distance between the wave crests. Generally, in the study of behaviour of
unsteady undular bores in the presence of a slope and bottom friction we will have to deal
with the combination of these two opposite tendencies.
6 Undular bore propagation over variable topography
with bottom friction
6.1 Gurevich-Pitaevskii problem for flat-bottom zero-friction case
We now turn to the problem (b) outlined in Section 2. We study the evolution of an undular
bore developing from an initial surface elevation jump ∆ > 0, located at some point x0 < 0.
As discussed below, the undular bore will expand with time so that at some t = t0 its lead
solitary wave enters the gradual slope region, which begins at x = 0 (see Fig. 1b). We assume
that for x < 0 one has h = h0 = constant and CD ≡ 0. We shall first present a formulation
of the Gurevich-Pitaevskii problem for the perturbation-free KdV equation and reproduce
the well-known similarity modulation solution describing the evolution of the undular bore
until the moment it enters the slope. We emphasize that, although this formulation and,
especially, this similarity solution are known very well and have been used by many authors,
some of the inferences important for the present application to fluid dynamics have not been
widely appreciated, as far as we can discern. Pertinent to our main objective in this paper,
we undertake a detailed study of the characteristics of the Whitham modulation system in
the vicinity of the leading edge of the undular bore solution, and show that the boundary con-
ditions of Gurevich-Pitaevskii type permit only two possible characteristics configurations,
implying two qualitatively different types of the leading solitary wave behaviour. Next, we
shall show how this Gurevich-Pitaevskii formulation of the problem applies to the perturbed
modulation system in the form (27) and finally we will study the effects of the perturbation
on the modulations in the vicinity of the leading edge of the undular bore.
In the case of a flat, frictionless bottom the original equation (1) becomes the constant-
coefficient KdV equation which can be cast into the standard form
ηζ + 6ηηξ + ηξξξ = 0 (67)
by introducing the new variables
A , ξ =
(x+ x0 −
gh0t) , ζ =
t , (68)
where x0 < 0 is an arbitrary constant. In the Gurevich-Pitaevskii (GP) approach, one
considers a large-scale initial disturbance η(ξ, 0) = f(ξ), in the form of a decreasing profile,
f ′(ξ) < 0 (e.g. a smooth step: f(ξ) → 0 as ξ → +∞; f(ξ) → η0 > 0 as ξ → −∞), whose
initial evolution until some critical (breaking) time ζb can be described by the dispersionless
limit of the KdV equation, i.e. by the Hopf equation,
ζ < ζb : η ≈ r(ξ, ζ), rζ + 6rrξ = 0 , r(ξ, 0) = f(ξ) . (69)
The evolution (69) leads to wave-breaking of the r(ξ)-profile at some ζ = ζb, with the
consequence that the dispersive term in the KdV equation then comes into play, and an
undular bore forms, which can be locally represented as a single-phase travelling wave. This
travelling wave is modulated in such a way that it acquires the form of a solitary wave at the
leading edge ξ = ξ+(ζ) and gradually degenerates, via the nonlinear cnoidal-wave regime, to
a linear wave packet at the trailing edge ξ = ξ−(ζ). It is important that this undular bore
is essentially unsteady, i.e. the region ξ−(ζ) < ξ < ξ+(ζ) expands with time ζ .
The single-phase travelling wave solution of the KdV equation (67) has the form (cf.
(10))
η(ξ, ζ) = r3 − r1 − r2 − 2(r3 − r2)sn2(
r3 − r1θ,m) (70)
θ = ξ + 2(r1 + r2 + r3)ζ , m =
r3 − r2
r3 − r1
. (71)
The parameters r1 ≤ r2 ≤ r3 ≤ 0 in the undular bore are slowly varying functions of ξ, ζ ,
whose evolution is governed by the Whitham equations
+ vj(r1, r2, r3)
= 0 , j = 1, 2, 3. (72)
The characteristic velocities in (72) are given by (31). We stress that, although analytical
expressions (70) and (10) (as well as (72) and the homogeneous version of (27)) are identical,
they are written for completely different sets of variables, both dependent and independent.
The Riemann invariants rj(ξ, ζ) are subject to special matching conditions at the free
boundaries, ξ = ξ±(ζ) defined by the conditions m = 0 (trailing edge) and m = 1 (leading
edge), formulated in Gurevich and Pitaevskii (1974) (see also Kamchatnov (2000) or El
(2005) for a detailed description).
At the trailing (harmonic) edge, where the wave amplitude a = 2(r3 − r2) vanishes and
m = 0, one has
ξ = ξ−(ζ) : r2 = r3 , −r1 = r . (73)
At the leading (soliton) edge, where m = 1 one has
ξ = ξ+(ζ) : r2 = r1 , −r3 = r . (74)
In both (73) and (74), r(ξ, ζ) is the solution of the Hopf equation (69).
The curves ξ = ξ±(ζ) are defined for the solution of the GP problem (72), (73), (74) by
the ordinary differential equations
= v−(ξ−, ζ) ,
= v+(ξ+, ζ) , (75)
where v± are calculated as the values of double characteristic velocities of the modulation
system at the undular bore edges,
v− = v2(r1, r3, r3)|ξ=ξ−(ζ) = v3(r1, r3, r3)|ξ=ξ−(ζ), (76)
v+ = v2(r1, r1, r3)|ξ=ξ+(ζ) = v1(r1, r1, r3)|ξ=ξ+(ζ) (77)
These equations (75) essentially represent kinematic boundary conditions for the undular
bore (see El 2005). Indeed, the double characteristic velocity v2(r1, r3, r3) = v3(r1, r3, r3) can
be shown to coincide with the linear group velocity of the small-amplitude KdV wavepacket
while the double characteristic velocity v2(r1, r1, r3) = v1(r1, r1, r3) is the soliton speed.
One might infer from this GP formulation of the problem that, since the leading edge of
the undular bore specified by (75), (77) is a characteristic of the modulation system, then
the value of the double Riemann invariant r+ ≡ r2 = r1 is constant. Then, on considering an
undular bore propagating into still water, where r = 0, one would obtain from the matching
condition (74) at the leading edge that r3|ξ=ξ+ = 0 and thus, the amplitude of the lead solitary
wave a+ = 2(r3−r1)|ξ=ξ+ = −r+ would always be constant as well. However, this contradicts
the general physical reasoning that the amplitude of the lead solitary wave should be allowed
to change in the case of general initial data. The apparent contradiction is resolved by noting
that the leading edge specified by (75), (77) can be an envelope of the characteristic family,
i.e. a caustic, rather than necessarily a regular characteristic, and hence there is no necessity
for the double Riemann invariant r+ to be constant along the curve ξ = ξ+(ζ) in general case.
On the other hand, since the leading edge is defined by the condition m = 1, the wave form
at the leading edge will coincide with the spatial profile of the standard KdV soliton. Thus
we arrive at the conclusion that, in general, the amplitude of the leading KdV solitary wave
will vary, even in the absence of the perturbation terms. Of course, in the unperturbed KdV
equation, such varying solitary waves cannot not exist on their own, and require the presence
of the rest of the undular bore. We also stress that these variations of the leading solitary
wave in the undular bore, as described here, have a completely different physical nature to
the variations of the parameters of an individual solitary wave due to small perturbations as
described in Section 4. They are caused by nonlinear wave interactions within the undular
bore rather than by a local adiabatic response of the solitary wave to a perturbation induced
by topography and friction. Importantly for our study, however, it will transpire that the
action of these same perturbation terms on the undular bore can lead to both a local and a
nonlocal response of the leading solitary wave.
6.2 Undular bore developing from an initial jump
Next we consider the simplest solution of the modulation system, which describes an undular
bore developing from an initial discontinuity placed at the point x = −x0. In (η; ξ, ζ) -
variables we have the initial conditions
η(ξ, 0) = ∆ for ξ < 0 ; η(ξ, 0) = 0 for ξ > 0 , (78)
where ∆ > 0 is a constant. Then, on using (69), the initial conditions (78) are readily
translated into the free-boundary matching conditions (73), (74) for the Riemann invariants.
Because of the absence of a length scale in this problem, the corresponding solution of the
modulation system must depend on the self-similar variable τ = ξ/ζ alone, which reduces
the modulation system to the ordinary differential equations
(vi − τ)
= 0 , i = 1, 2, 3. (79)
-4
0
 -3
0
 -2
0
 -1
0
 10
 20
-0.8
-0.6
-0.4
-0.2
-30 -20 -10 10
η(ξ, ζ = 5)
Figure 5: Left: Riemann invariants behaviour in the similarity modulation solution for the
flat-bottom zero-friction case ; Right: corresponding undular bore profile η(ξ).
The boundary conditions for (79) follow from the matching conditions (73), (74) using the
initial condition (78):
τ = τ− : r2 = r3 , r1 = −∆
τ = τ+ : r2 = r1 , r3 = 0 .
where τ± are self-similar coordinates (speeds) of the leading and trailing edges, ξ± = τ±ζ .
Taking into account the inequality r1 ≤ r2 ≤ r3 one obtains the well-known modulation
solution of Gurevich and Pitaevskii (1974) (see also Fornberg and Whitham 1978) in the
r1 = −∆ , r3 = 0 , r2 = −m∆ , (81)
= v2(−∆,−m∆, 0) = 2∆[(1 +m)−
2m(1−m)K(m)
E(m)− (1−m)K(m)
] . (82)
This modulation solution (81), (82) (see Fig. 5a) represents the replacement, due to averag-
ing over the oscillations, of the unphysical formal three-valued solution of the dispersionless
KdV equation (i.e. of the Hopf equation) which would have taken place in the absence of
the dispersive regularisation by the undular bore. We see that (82) describes an expansion
fan in the characteristic (ξ, ζ)-plane and thus is a global solution. Substituting (81), (82)
into the travelling wave solution (70) one obtains the asymptotic wave form of the undular
bore (see Fig. 5b), which then can be readily represented in terms of the original physical
variables using the relationships (68).
The equations of the trailing and leading edges of the undular bore are determined from
(82) by putting m = 0 and m = 1 respectively
= τ− = v2(−∆, 0, 0) = −6∆ ,
= τ+ = v2(−∆,−∆, 0) = 4∆ . (83)
The leading solitary wave amplitude is η0 = 2(r3−r1) = 2∆, which is exactly twice the height
of the initial jump. This corresponds to the amplitude of the surface elevation a = 3h0∆ (see
(68)). Note that, to get the leading solitary wave of the same initial amplitude a0 as for the
separate solitary wave considered in Section 4, one should use the jump value ∆0 = a0/3h0,
which of course is just 2∆̃, where ∆̃ = 3h0∆/2 is the initial discontinuity in the surface
elevation.
6.3 Structure of the undular bore front
We are especially interested in the behaviour of the modulation solution (81), (82) in the
vicinity of the leading edge ξ = ξ+(ζ). This behaviour is essentially determined by the
manner in which the pair of characteristics corresponding to the velocities v2 and v1 merge
into a multiple eigenvalue v+ of the modulation system at ξ = ξ+(ζ).
First, one can readily infer from the modulation solution (81), (82) that the phase velocity
c = −2(r1 + r2 + r3) = 2∆(1 +m) > v2(−∆,−m, 0) for m < 1 and c = v2 for m = 1. Thus,
any individual wave crest generated at the trailing edge of the undular bore moves towards
the leading edge, i.e. for any crest m → 1 as ζ → ∞. Thus, for any particular wave crest,
except for the very first one, the solitary wave ‘status’ is achieved only asymptotically as
ζ → ∞.
Without loss of generality we assume in this section that ∆ = 1 in (81), (82). First, as
we have already mentioned, the characteristic family Γ2 : dξ/dζ = v2 is an expansion fan in
the ξ, ζ - plane,
Γ2 : ξ = C2ζ , (84)
parameterised by a constant C2, −6 ≤ C2 ≤ 4 . Next, in (82) we make an asymptotic
expansion of v2(−1,−m, 0) for small (1−m) ≪ 1, to get
2(1−m) ln(16/(1−m)) ≃ τ+ − ξ/ζ (85)
or, with logarithmic accuracy,
(τ+ − ξ/ζ) ≪ 1 : 1−m ≃
τ+ − ξ/ζ
2 ln[1/(τ+ − ξ/ζ)]
. (86)
Next, expanding v1(−1,−m, 0) for (1 − m) ≪ 1 and using (86) we get the asymptotic
equation for the characteristics family Γ1,
= v1 = τ
+ + (τ+ − ξ/ζ) +O(1−m) , (87)
which is readily integrated to leading order to give
Γ1 : ξ ≃ τ+ζ −
, (88)
where C1 ≥ 0 is an arbitrary constant ‘labeling’ the characteristics; C1 = 0 corresponds to
the leading edge of the undular bore. This asymptotic formula (88) is valid as long as ζ ≫ 1.
The behaviour of the characteristics belonging to the families Γ1 and Γ2 near the leading
edge is shown in Fig. 6a.
Next, expanding the equation for the third characteristic family, Γ3: dξ/dζ = v3(−1,−m, 0)
for (1−m) ≪ 1, we get on using (86)
τ+ − ξ/ζ
ln(1/(τ+ − ξ/ζ))
+O(τ+ − ξ/ζ) . (89)
Figure 6: Characteristics behaviour for the similarity modulation solution near the leading
edge ξ+(ζ): (a) families Γ1: dξ/dζ = v1 and Γ2 : ξ = C2ζ , (b) family Γ3: dξ/dζ = v3.
Integrating (89) we obtain to first order
Γ3 : ξ ≃ C3 − g(ζ) , (90)
where g(ζ) =
τ+ζ − C3
ln |τ+ζ − C3| − ln ζ
dζ , g(C3/τ
+) = 0 , (91)
C3 being an arbitrary constant. The asymptotic formula (90) is valid as long as g(ζ)/C3 ≪
1. Since the characteristics Γ3 intersect the leading edge ξ = τ
+ζ we must indicate their
behaviour outside the undular bore. It follows from the matching condition (74) and the
limiting structure (34) of the characteristic velocities of the Whitham system, that the
characteristics from the family Γ3 match with the Hopf equation characteristics dξ/dζ = 6r
carrying the value of the Riemann invariant r = 0 corresponding to still water upstream the
undular bore. Therefore, the sought external characteristics are simply vertical lines ξ = C3.
The qualitative behaviour of the characteristics from the family Γ3 is shown in Fig. 6b.
It is clear from the asymptotic behaviour of the characteristics that the edge characteristic
ξ = τ+ζ corresponding to the motion of the leading solitary wave intersects only with
characteristics of the family Γ3 carrying the Riemann invariant value r3 = 0 into the undular
bore domain. Since, according to the matching condition (80), r3 ≡ 0 everywhere along the
edge characteristic one can infer that the leading solitary wave motion is completely specified
by its amplitude at ζ = 0. Indeed, in this case, the leading edge represents a genuine multiple
characteristic of the modulation system, along which the Riemann invariant r+ = r2 = r1 is
a constant. Given the constant value of r1 = −1 for the solution (82), one infers that the
amplitude of the lead soliton of the self-similar undular bore, η0 = 2(r3 − r+) = 2 is also a
constant value. Thus, in the undular bore evolving from an initial jump, the leading solitary
wave represents an independent soliton of the KdV equation. Of course, this fact follows
directly from the modulation solution (82) but now we have established its meaning in the
context of the characteristics, which will play an important role below.
Next we discuss the structure of the undular bore front in the case when the initial
profile η(ξ, 0) is not a simple jump discontinuity, and instead has the form of a monotonically
decreasing function, for instance, (−ξ)1/2 when ξ ≤ 0 and η(ξ, 0) = 0 for ξ > 0. In that case,
the modulation solution for the undular bore no longer possesses x/t-similarity as in the
Figure 7: a) Leading edge ξ+(ζ) of non-self-similar undular bore as an envelope of pairwise
merging characteristics from the families dξ/dζ = v1 and dξ/dζ = v2; b) behaviour of the
Riemann invariants in non-self-similar modulation solution with r3 ≡ 0.
jump resolution case and, as a result, the speed (and therefore, the amplitude) of the lead
solitary wave is not constant. For instance, for the afore-mentioned square-root initial profile
the amplitude of the lead solitary wave grows as ζ2 (see Gurevich, Krylov and Mazur 1989,
or Kamchatnov 2000). Clearly, such an amplitude variation is impossible if the leading edge
ξ+(ζ) was a regular characteristic carrying a constant value of the Riemann invariant r+. As
discussed above, however, the GP matching conditions (73) -(77) admit another possibility;
the leading edge curve is the envelope of the characteristic families Γ1: dξ/dζ = v1 and Γ2:
dξ/dζ = v2 merging when m = 1. This configuration is shown in Fig. 7a. In this case, the
behaviour of the modulus m in the vicinity of the leading edge is given by the asymptotic
formula found in Gurevich & Pitaevskii (1974):
(1−m)2
(r+)2
(ξ+ − ξ) (92)
where the function r+(ζ) 6= constant is assumed to be known. Another specific feature of
this (general) configuration is that dr1,2/dξ → ±∞ as ξ → ξ+ (see Fig. 7b - also found in
Gurevich & Pitaevskii 1974, see also Kamchatnov 2000), which is in drastic contrast with
similarity solution (see Fig. 6a). This particular difference was discussed in relation with
undular bores in the KdV-Burgers equation in Gurevich and Pitaevskii (1987).
In summary, we see from (92) that the structure of the modulation solution in the vicin-
ity of the leading edge of an undular bore defined as a characteristic envelope is qualitatively
different compared to that for the similarity case (see (85)). The more general (but qual-
itatively similar to (92)) asymptotic formula which takes into account small perturbations
due to a variable topography and bottom friction will be derived later. At the moment,
it is important for us that in this configuration, when the leading edge is a characteristic
envelope rather than just a characteristic, the value r+, and thus, the leading solitary wave
amplitude are allowed to vary.
The analysis of the corresponding modulation solution in Gurevich, Krylov and Mazur
(1989) showed that, while in the case of an initial jump the wave crests generated at the
trailing edge reach the leading edge (and therefore, transform into solitary waves) only
asymptotically as t → ∞, for the more general case of decreasing initial data each wave
crest generated at the trailing edge reaches the leading edge in finite time and replaces
(overtakes) the existing leading solitary wave. This process is manifested as a continuous
amplitude growth of the (apparent) leading solitary wave. As in classical soliton theory,
an alternative explanation of the leading solitary wave amplitude growth can be made in
terms of the momentum exchange between the “instantaneous” leading solitary wave and
solitary waves of greater amplitude coming from the left. Indeed, as the rigorous analysis of
Lax, Levermore and Venakides showed (see Lax, Levermore and Venakides (1994) and the
references therein), the whole modulated structure of the undular bore can be asymptotically
described in terms of the interactions of a large number of KdV solitons initially ‘packed’
into a non-oscillating large-scale initial profile.
This latter interpretation is especially instructive for our purposes. Our point is that
the specific cause of the enhanced soliton interactions resulting in amplitude growth at the
leading edge is not essential; it can be large-scale spatial variations of the initial profile as
just described, but it could also equally well be an effect of small perturbations in the KdV
equation itself. Indeed, in the weakly perturbed KdV equation, the local wave structure
of the undular bore must be described to leading order by the periodic solution (70) of
the unperturbed KdV equation, so if one assumes the GP boundary conditions analogous to
(73) – (77) for the perturbed modulation system (27), one invariably will have to deal with
one of the two possible types of the characteristics behaviour (shown in Figs. 7a and 8a)
in the vicinity of the leading edge of the undular bore, because this qualitative behaviour
is determined only by the structure of the GP boundary conditions and by the associated
asymptotic structure of the characteristic velocities of the Whitham system for (1−m) ≪ 1,
which are the same for both unperturbed and perturbed modulation systems. Next, we will
show that, by using the knowledge of this qualitative behaviour of the characteristics, one
is able to construct the asymptotic modulation solution for the undular bore front in the
presence of variable topography and bottom friction even if the full solution of the perturbed
modulation system is not available.
6.4 Gurevich-Pitaevskii problem for perturbed modulation sys-
We investigate now how the GP matching problem applies to the perturbed modulation
system (27). As in the original GP problem, we postulate the natural physical requirement
that the mean value 〈U〉 is continuous across the undular bore edges, which represent free
boundaries and are defined by the conditions m = 0 (trailing edge X = X−(T )) and m = 1
(leading edge X = X+(T )). Also, we consider propagation of the undular bore into still
water, hence 〈U〉|X=X+(T ) = 0. Now, using the explicit expression (18) for 〈U〉 in terms of
complete elliptic integrals and calculating its limits as m → 0 and m → 1 one has
X = X−(T ) : λ2 = λ3 , 〈U〉 = −λ1 = u ,
X = X+(T ) : λ2 = λ1 , 〈U〉 = −λ3 = 0 ,
where u(X, T ) is solution of the dispersionless perturbed KdV equation (7), i.e.
uT + 6uuX = F (T )u−G(T )u2, (94)
with the boundary conditions
∆0 if τ < τ0; u
= 0 if τ > τ0 , (95)
where τ0 = −x0/
gh0. The boundary conditions (95) correspond to a discontinuous initial
surface elevation A(x, t) at x = −x0, obtained by using transformations (3) and (6) where
one sets t = 0. As earlier, ∆0 = a0/(3h0) is the value of the discontinuity in A, chosen in
such a way that the amplitude of the lead solitary wave in the undular bore was exactly a0
in the flat-bottom zero-friction region (see Section 6.2).
This free-boundary matching problem is then complemented by the kinematic conditions
explicitly defining the boundaries X = X±(T ). These are formulated using the multiple
characteristic directions of the perturbed modulation system (27) in the limits as m → 0
and m → 1 (cf. (75) - (77)),
= V −(X−, T ) ,
= V +(X+, T ) , (96)
where V − = v2(u, λ
−, λ−) = v3(u, λ
−, λ−), (97)
V + = v2(λ
+, λ+, 0) = v1(λ
+, λ+, 0) , (98)
and λ− = λ2(X
−, T ) = λ3(X
−, T ) , λ+ = λ2(X
+, T ) = λ1(X
+, T ). (99)
Thus, for the perturbed KdV equation the leading and trailing edges of the undular bore are
defined mathematically in the same way as for the unperturbed one, albeit for a different
set of variables.
6.5 Deformation of the undular bore front due to variable topog-
raphy and bottom friction
Finally we study the effects of gradual slope and bottom friction on the leading front of the
self-similar expanding undular bore described in Sections 6.2, 6.3. The result will essentially
depend on the relative values of the small parameters appearing in the problem. We note
that in general there are three distinct relevant small parameters,
≪ 1 , δ = max(hx) ≪ 1, CD ≪ 1 (100)
The first small parameter is determined by the ratio of the equilibrium depth in the flat
bottom region, to the distance from the beginning of the slope region to the location of the
initial jump discontinuity in the surface displacement. This measures the typical relative
spatial variations of the modulation parameters in the undular bore when it reaches the
beginning of the slope. The second and third parameters are contained in the KdV equation
(1) itself and measure the values of the slope and bottom friction respectively. In terms of
the transformed variables appearing in (7), |F (T )| ∼ δ, |G(T )| ∼ CD (see (8)). Generally
we assume δ ∼ CD (the possible orderings δ ≪ CD or CD ≪ δ can be then considered as
particular cases).
To obtain a quantitative description of the vicinity of the leading edge of the undular
bore we perform an expansion of the Whitham modulation system (27) for (1 − m) ≪ 1.
We first introduce the substitutions
λi(X, T ) = λ
+(T ) + li(X̃, T ) , vi = V
+ + v′i , ρi = ρ
+ + ρ′i, i = 1, 2. (101)
where X̃ = X+ −X , V + = −4λ+ , ρ+ =
F (T )λ+ +
G(T )(λ+)2. (102)
Since λ2 ≥ λ1, v2 ≥ v1 one always has l2 ≥ l1, v′2 ≥ v′1. Assuming X̃/X+ ≪ 1 ⇔ 1−m ≪ 1
and using that λ3 = 0 to leading order in the vicinity of the leading edge (see the matching
condition (93)), we have from asymptotic expansions of (28) – (31) as (1−m) ≪ 1
v′1 = M1(l2 − l1) ≡ −2
ln(16/(1−m))
1 + 1
(1−m) ln(16/(1−m))
(l2 − l1),
v′2 = M2(l2 − l1) ≡ −2
1− ln(16/(1−m))
(1−m) ln(16/(1−m))
(l2 − l1),
(103)
ρ′1 = N1(l2 − l1) ≡
1 + ln
l2 − l1
−16λ+
2λ+ ln
l2 − l1
−16λ+
− 3λ+
(l2 − l1)
ρ′2 = N2(l2 − l1) ≡
5 + ln
l2 − l1
−16λ+
2λ+ ln
l2 − l1
−16λ+
+ 13λ+
(l2 − l1).
(104)
Naturally, v′i and ρ
i vanish when l2 = l1. Now, substituting (101), (102) into the modulation
system (27) we obtain
− (V + + v′i)
= ρ+ + ρ′i, i = 1, 2. (105)
On using the kinematic condition (96) at the leading edge, this reduces to
− v′i
= ρ+ + ρ′i, i = 1, 2. (106)
There are two qualitatively different cases to consider:
(i) limX̃→0 |dli/dX̃| < ∞, i = 1, 2 (Fig. 8a)
(ii) limX̃→0 |dli/dX̃| = ∞, i = 1, 2 (Fig. 8b)
The case (i) implies that to leading order (106) reduces to
= ρ+ , (107)
which, together with the kinematic condition dX+/dT = −4λ+, defines the leading edge
curve X+(T ). One can observe that this system coincides with (43), (42) defining the
Figure 8: Riemann variables behaviour in the vicinity of the leading edge of the undular
bore propagating over gradual slope with bottom friction (a) Adiabatic variations of the
similarity GP regime, δ ≪ ǫ, CD ≪ ǫ; (b) General case, δ ∼ CD ∼ ǫ.
motion of a separate solitary wave over a gradual slope with bottom friction. Its integral
expressed in terms of original physical x, t-variables is given by (49). Therefore, in the case
(i) the lead solitary wave in the undular bore to leading order is not restrained by interactions
with the remaining part of the bore and behaves as a separate solitary wave. Physically this
case corresponds to adiabatic deformation of the similarity modulation solution (81), (82)
and implies the following small parameter ordering : δ ≪ ǫ, CD ≪ ǫ.
Next, we study the structure of this weakly perturbed similarity modulation solution in
the vicinity of the leading edge. The next leading order of the system (106) yields
− v′i
= ρ′i, i = 1, 2, (108)
that is
= −N1
= −N2
. (109)
Subtraction of one equation (109) from another with account of the relationship l2 − l1 ∼=
−λ+(1−m) leads consistently to leading order to the differential equation for 1−m
∂(1 −m)
F (T )
16G(T )
, (110)
This equation should be solved with the initial condition
1−m = 0 at X̃ = 0 . (111)
Elementary integration gives with the accuracy O(1−m) (cf. (85))
(1−m) ln 16
F (T )− 16
λ+G(T )
X+ −X
. (112)
This formula determines the dependence of the modulus m on T and X (as long as 1−m ≪
Now, we make use of the solution λ+ of equation (107) given by (47) with C0 =
4/(3ga0h0) (see (48)). Under supposition that the integral
h−3dx diverges as h → 0,
so that the turbulent bottom friction plays an essential role in the undular bore front be-
haviour (see Section 4 for a similar approximation for an isolated solitary wave), we obtain
for h ≪ h0
(1−m) ln
2 + 3h2
(X+ −X). (113)
At last, if the bottom topography is approximated by the dependence (52), we get with the
same accuracy
(1−m) ln 16
(3α− 1)δ
(X+ −X) , (114)
where α > 1/3. The second term in square brackets tends to zero as h → 0. However, the
region where it can be neglected may be very narrow because of smallness of the parameter
δ. We recall that in this formula X+ is given by (49) and X is defined by (3) in terms of
the original physical independent variables x and t.
Summarising, if the conditions δ, CD ≪ ǫ are satisfied, the lead solitary wave of the
undular bore behaves as an individual (noninteracting) solitary wave adiabatically varying
under small perturbation due to variable topography and bottom friction. The modulation
solution in the vicinity of the leading edge also varies adiabatically, however, its qualitative
structure considered in Section 6.4 (see Figs 5,6) remains unchanged.
In a sharp contrast with the described case of adiabatic deformation of an undular bore
front is case (ii) when the second term in the left-hand side of (106) contributes to the
leading order, i.e. to the motion of the leading edge itself. Namely, we have
= ρ+ + v′i
, i = 1, 2. (115)
Now dλ+/dT 6= ρ+ which means that the amplitude of the lead solitary wave a = −2λ+ varies
essentially differently compared to the case of an isolated solitary wave. Indeed, the term
ρ+ in the right-hand side of (115) is responsible for local adiabatic variations of the solitary
wave while the term v′i∂li/∂X̃ describes nonlocal parts of the variations associated with
the wave interactions within the undular bore. Using asymptotic formulae (103) implying
v′2 ≥ 0, v′1 ≤ 0, and the condition limX̃→0 |dl1,2/dX̃| = ∞ along with l2 ≥ l1, it is not
difficult to show that this nonlocal term is always nonnegative , i.e. the lead solitary wave
in the undular bore propagating over a gradual slope with bottom friction always moves
faster (and, therefore, has greater amplitude) than an isolated solitary wave of the same
initial amplitude in the beginning of the slope. Indeed, as we have shown in Section 5, the
presence of a slope and bottom friction always result in “squeezing” the cnoidal wave, hence
increasing momentum exchange between solitary waves in the vicinity of the leading edge
of the undular bore and acceleration of the lead solitary wave itself. The situation here is
qualitatively analogous to that described in Section 6.4 where the general global modulation
solution for the unperturbed KdV equation was discussed. Similarly to that case, the leading
edge now represents a characteristic envelope – a caustic (otherwise we are back in the case
(i) implying dλ+/dT = ρ+) (see Fig. 6a).
Unlike the case of adiabatic variations of the leading edge, determination of the function
λ+(T ) requires now knowledge of the full solution of the perturbed modulation system (27)
with the matching conditions (93). While the analytic methods to construct such a solution
for inhomogeneous quasilinear systems are not available presently, it is instructive to assume
that dλ+/dT − ρ+ is a known function of T and to study the structure of the solution in
close vicinity of the leading edge. With an account of the explicit form (103) of the velocity
corrections, equations (115) assume the form
= −dλ
+/dT − ρ+
2(l2 − l1)
ln[16/(1−m)]
(1−m)
, (116)
= −dλ
+/dT − ρ+
2(l2 − l1)
ln[16/(1−m)]
(1−m)
. (117)
Taking the difference of (116) and (117) we transform it to the form
∂(1 −m)
dλ+/dT − ρ+
(λ+)2
(1−m) ln[16/(1−m)]
. (118)
This equation can be readily integrated with the initial condition (111) to give
(1−m)2
2(dλ+/dT − ρ+)
(λ+)2
(X+ −X). (119)
This solution coincides with the asymptotic formula (92) for the behaviour of the modulus
in the vicinity of the leading edge of the undular bore in general unperturbed GP problem
[16] but instead of the derivative dλ+/dT in (92) we have the difference dλ+/dT −ρ+ (which
is always positive as we have established).
7 Conclusions
We have studied the effects of a gradual slope and turbulent (Chezy) bottom friction on the
propagation of solitary waves, nonlinear periodic waves and undular bores in shallow-water
flows in the framework of the variable-coefficient perturbed KdV equation. The analysis has
been performed in the most general setting provided by the associated Whitham equations
describing slow modulations of a periodic travelling wave due to the slope, bottom friction
and spatial nonuniformity of initial data. This modulation theory, developed in general form
for perturbed integrable equations in Kamchatnov (2004) was applied here to the perturbed
KdV equation and allowed us to take into account slow variations of all three parameters
in the cnoidal wave solution. The particular time-independent solutions of the perturbed
modulation equations were shown to be consistent with the adiabatically varying solutions
for a single solitary wave and for a periodic wave propagating over a slope without bottom
friction obtained in Ostrovsky & Pelinovsky (1970, 1975) and Miles (1979, 1983a). It was
shown, however, that the assumption of zero mean elevation used in these papers for the
description of slow variations of a cnoidal wave, ceases to be valid in the case when the
turbulent bottom friction is present. In this case, a more general solution was obtained
numerically improving the results of Miles (1983b).
Further, the derived full time-dependent modulation system was used for the descrip-
tion of the effects of variable topography and bottom friction on the propagation of undular
bores, in particular on the variations of the undular bore front representing a system of
weakly interacting solitary waves. By the analysis of the characteristics of the Whitham
system in the vicinity of the leading edge of the undular bore, two possible configurations
have been identified depending on whether the leading edge of the undular bore represents a
regular characteristic of the modulation system or its singular characteristic, i.e. a caustic.
The first case was shown to correspond to adiabatically slow deformations of the classi-
cal Gurevich-Pitaevskii modulation solution and is realised when the perturbations due to
variable topography and bottom friction are small compared with the existing spatial non-
uniformity of modulations in the undular bore (which is supposed to be formed outside the
region of variable topography/bottom friction). In the case when modulations due to the
external perturbations are comparable in magnitude with the existing modulations in the
undular bore, the leading edge becomes a caustic, and this situation was shown to corre-
spond to enhanced solitary wave interactions within the undular bore front. These enhanced
interactions have been shown to lead to a “nonlocal” leading solitary wave amplitude growth,
which cannot be predicted in the frame of the traditional local adiabatic approach to prop-
agation of an isolated solitary wave in a variable environment. As we mentioned in the
Introduction, one of our original motivations for this study was the possibility to model a
shoreward propagating tsunami as an undular bore. In this context, we would suggest that
the second scenario described above is the more relevant, which has the implication that
the growth, and eventual breaking of the leading waves in a tsunami wavetrain, cannot be
modeled as a local effect for that particular wave, but is determined instead by the whole
structure of the wavetrain.
Acknowledgements
This work was started during the visit of A.M.K. at the Department of Mathematical Sci-
ences, Loughborough University, UK. A.M.K. is grateful to EPSRC for financial support.
Appendix A: Derivation of the perturbed modulation system
We express the integrand function in the right-hand side of (24) in terms of the µ-variable
(15):
(2λi − s1 − U)R = 8Gµ3 − [8Gλi + 4(F + 2s1G)]µ2
+ [4(F + 2s1G)λi + 2s1(s1G+ F )]µ− 2s1(s1G+ F )λi.
(120)
Then we obtain with the use of (13), (14), and (16) the following expressions:
〈µ〉 =
µdθ =
−P (µ)
〈µ2〉 = 1
µ2dθ =
〈µ3〉 =
µ3dθ = −
+ s1〈µ2〉 − s2〈µ〉+ s3,
(121)
where I is a known integral
(λ3 − µ)(µ− λ2)(µ− λ1) dµ
(λ3 − λ1)5/2[(1−m+m2)E(m)− (1−m)(1−m/2)K(m)],
(122)
K(m) and E(m) being the complete elliptic integrals of the first and second kind, respec-
tively. The derivatives of I with respect to λi are also known table integrals (Gradshtein &
Ryzhik 1980):
(λ3 − µ)(µ− λ2)
µ− λ1
λ3 − λ1[(λ2 + λ3 − 2λ1)E − 2(λ2 − λ1)K],
(λ3 − µ)(µ− λ1)
µ− λ2
λ3 − λ1[(λ3 − λ1)K + (λ1 + λ3 − 2λ2)E],
(µλ2)(µ− λ1)
λ3 − µ
λ3 − λ1[(2λ3 − λ1 − λ2)E − (λ2 − λ1)K].
(123)
We can easily express the si-derivatives in terms of λi derivatives by differentiation of the
formulae (see (16))
s1 = λ1 + λ2 + λ3, s2 = λ1λ2 + λ1λ3 + λ2λ3, s3 = λ1λ2λ3 (124)
and solving the linear system for differentials. Simple calculation gives
(−1)3−k
j 6=i(λi − λj)
. (125)
Then, combining (123) and (125), we obtain the derivatives ∂I/∂si and hence the expressions
(λ3 − λ1)
(s21 − 3s2)
(λ2 − λ1)(λ2 + λ3 − 2λ1)
+ s1λ1 + λ
1 − λ2λ3
(λ3 − λ1)
(126)
To complete the calculation of the right-hand side of (24), we need also expressions
∂L/∂λ1
= 2(λ2 − λ1)
∂L/∂λ2
2(λ3 − λ2)(1−m)K
E − (1−m)K
∂L/∂λ3
2(λ3 − λ2)K
(127)
Collecting all contributions into perturbations terms, we obtain the Whitham equations in
the form
= ρi = Ci[F (T )Ai −G(T )Bi], (128)
where Cj , Aj , Bj and vj , j = 1, 2, 3 are specified by formulae (28) - (30).
References
[1] Apel, J.P. 2003 A new analytical model for internal solitons in the ocean, Journ. Phys.
Oceanogr. 33, 2247.
[2] Avilov, V.V., Krichever,I.M. and Novikov, S.P 1987 Evolution of Whitham zone in
the theory of Korteweg-de Vries. Sov. Phys. Dokl. 32, 564 - .
[3] Benjamin, T.B. and Lighthill, M.J. 1954 On cnoidal waves and bores. Proc. Roy. Soc.
A224, 448
[4] Boussinesq, J. 1982 Théorie des ondes des remous qui se propagent le long d’un canal
rectangulaire, en communuuant au liquide contenu dans ce canal des vitesses sensblemnt
pareilles de la surface au fond. J. Math. Pures Appl. 17, 55-108.
[5] Dubrovin, B.A. and Novikov, S.P. 1989 Hydrodynamics of weakly deformed soliton
lattices. Differential geometry and Hamiltonian theory. Russian Math. Surveys 44, 35–
[6] El, G.A. 2005 Resolution of a shock in hyperbolic systems modified by weak dispersion.
Chaos 15, Art. No 037103.
[7] Fornberg, D. and Whitham, G.B. 1978 A numerical and theoretical study of certain
nonlinear wave phcnomena. Phil Trans. Roy. Soc. London A 289 373-403.
[8] Gradshtein, I.S. and Ryzhik, I.M. 1980 Table of integrals, series, and products, London
: Academic Press.
[9] Grimshaw, R. 1979 Slowly varying solitary waves. I Korteweg-de Vries equation. Proc.
Roy. Soc. 368A, 359-375.
[10] Grimshaw, R. 1981 Evolution equations for long nonlinear internal waves in stratified
shear flows. Stud. Appl. Math. 65, 159-188.
[11] Grimshaw, R. 2006 Internal solitary waves in a variable medium. Gesellschaft für
Angewandte Mathematik (accepted).
[12] Grimshaw, R. Pelinovsky, E. and Talipova, T. 2003 Damping of large-amplitude soli-
tary waves. Wave Motion 37, 351-364.
[13] Grimshaw, R.H.J. and Smyth, N.F. 1986 Resonant flow of a stratified fluid over
topography. J. Fluid Mech. 169, 429.
[14] Gurevich, A.V., Krylov, A.L. and El, G.A. 1992 Evolution of a Riemann wave in
dispersive hydrodynamics. Sov. Phys. JETP, 74 957–962.
[15] Gurevich, A.V., Krylov, A.L. and Mazur, N.G. 1989 Quasi-simple waves in Korteweg-
de Fries hydrodynamics, Zh. Eksp. Teor. Fiz. 95 1674.
[16] Gurevich, A.V. and Pitaevskii, L.P. 1974 Nonstationary structure of a collisionless
shock wave. Sov. Phys. JETP 38, 291.
[17] Gurevich, A.V. and Pitaevskii, L.P. 1987 Averaged description of waves in the
Korteweg-de Vries-Burgers equation. Sov. Phys. JETP 66, 490.
[18] Gurevich, A.V. and Pitaevskii, L.P. 1991 Nonlinear waves with dispersion and non-local
damping. Sov. Phys. JETP, 72, 821–825.
[19] Johnson, R.S. 1970 A non-linear equation incorporating damping and dispersion, J.
Fluid Mech. 42, 49-60.
[20] Johnson, R.S. 1973a On the development of a solitary wave moving over an uneven
bottom. Proc. Camb. Phil. Soc. 73, 183-203.
[21] Johnson, R.S. 1973b On an asymptotic solution of the Korteweg - de Vries equation
with slowly varying coefficients, J. Fluid Mach., 60, 813-824.
[22] Johnson, R.S. 1997 A Modern Introduction to the Mathematical Theory of Water
Waves Cambridge University Press, Cambridge.
[23] Kamchatnov, A.M. 2000 Nonlinear Periodic Waves and Their Modulations—An In-
troductory Course, World Scientific, Singapore.
[24] Kamchatnov, A.M. 2004 On Whitham theory for perturbed integrable equations.
Physica D188 247–261.
[25] Lax, P.D., Levermore, C.D. and Venakides, S. 1994 The generation and propagation of
oscillations in dispersive initial value problems and their limiting behavior. Important
developments in soliton theory, ed. by A.S. Focas and V.E. Zakharov, (Springer Ser.
Nonlinear Dynam., Springer, Berlin 1994) p. 205.
[26] Miles, J.W. 1979 On the Korteweg - de Vries equation for a gradually varying channel,
J. Fluid Mech 91 181-190
[27] Miles J.W. 1983a Solitary wave evolution over a gradual slope with turbulent friction.
J. Phys. Oceanography, 13 551–553.
[28] Miles, J.W. 1983b Wave evolution over a gradual slope with turbulent friction. J. Fluid
Mech 133 207-216
[29] Myint, S. and Grimshaw, R.H.J. 1995 The modulation of nonlinear periodic wavetrains
by dissipative terms in the Korteweg-de Vries equation. Wave Motion, 22, 215–238.
[30] Ostrovsky, L.A. and Pelinovsky, E.N. 1970 Wave transformation on the surface of a
fluid of variable depth. Akad. Nauk SSSR, Izv. Atmos. Ocean Phys. 6, 552-555.
[31] Ostrovsky, L.A. and Pelinovsky, E.N. 1975 Refraction of nonlinear sea waves in a
coastal zone. Akad. Nauk SSSR, Izv. Atmos. Ocean Phys. 11, 37-41.
[32] Smyth, N.F. 1987 Modulation theory for resonant flow over topography, Proc. Roy.
Soc. 409A, 79.
[33] Whitham, G.B. 1965 Non-linear dispersive waves, Proc. Roy. Soc. London A283, 238.
[34] Whitham, G.B. 1974 Linear and Nonlinear Waves, Wiley–Interscience, New York.
	Introduction
	Problem formulation
	Modulation equations
	Modulation solution in the solitary wave limit
	Adiabatic deformation of a cnoidal wave
	Undular bore propagation over variable topography with bottom friction
	Gurevich-Pitaevskii problem for flat-bottom zero-friction case
	Undular bore developing from an initial jump
	Structure of the undular bore front
	Gurevich-Pitaevskii problem for perturbed modulation system
	Deformation of the undular bore front due to variable topography and bottom friction
	Conclusions
ABSTRACT
  This paper considers the propagation of shallow-water solitary and nonlinear
periodic waves over a gradual slope with bottom friction in the framework of a
variable-coefficient Korteweg-de Vries equation. We use the Whitham averaging
method, using a recent development of this theory for perturbed integrable
equations. This general approach enables us not only to improve known results
on the adiabatic evolution of isolated solitary waves and periodic wave trains
in the presence of variable topography and bottom friction, modeled by the
Chezy law, but also importantly, to study the effects of these factors on the
propagation of undular bores, which are essentially unsteady in the system
under consideration. In particular, it is shown that the combined action of
variable topography and bottom friction generally imposes certain global
restrictions on the undular bore propagation so that the evolution of the
leading solitary wave can be substantially different from that of an isolated
solitary wave with the same initial amplitude. This non-local effect is due to
nonlinear wave interactions within the undular bore and can lead to an
additional solitary wave amplitude growth, which cannot be predicted in the
framework of the traditional adiabatic approach to the propagation of solitary
waves in slowly varying media.

<|endoftext|><|startoftext|>
Introduction
It was conjectured by Diósi, Feldmann and Kosloff in [4], based on thermodynamical
considerations, that the von Neumann entropy of a quantum state equal to a mixture
Rn :=
σ ⊗ ρ⊗(n−1) + ρ⊗ σ ⊗ ρ⊗(n−2) + · · · + ρ⊗(n−1) ⊗ σ
exceeds the entropy of a component asymptotically by the Umegaki relative entropy
S(σ‖ρ), that is,
S(Rn) − (n− 1)S(ρ) − S(σ) → S(σ‖ρ) (1)
as n → ∞. Here ρ and σ are density matrices acting on a finite dimensional Hilbert
space. Recall that S(σ) = −Tr σ log σ and
S(σ‖ρ) =
Tr σ(log σ − log ρ) if supp σ ≤ supp ρ
+∞ otherwise.
Concerning the background of quantum entropy quantities, we refer to [10, 12].
Apparently no exact proof of (1) has been published even for the classical case, al-
though for that case a heuristic proof is offered in [4].
In the paper first an analytic proof of (1) is given for the case supp σ ≤ supp ρ, using
an inequality between the Umegaki and the Belavkin-Staszewski relative entropies, and
the weak law of large numbers in the quantum case. In the second part of the paper, it
is clarified that the problem is related to the theory of classical-quantum channels. The
essential observation is the fact that S(Rn) − (n− 1)S(ρ) − S(σ) in the conjecture is a
Holevo quantity (classical-quantum mutual information) for a certain channel for which
the relative entropy emerges as the capacity per unit cost.
The two different proofs lead to two different generalizations of the conjecture.
2 An analytic proof of the conjecture
In this section we assume that supp σ ≤ supp ρ for the support projections of σ and ρ.
One can simply compute:
S(Rn‖ρ
⊗n) = Tr(Rn logRn − Rn log ρ
= −S(Rn) − (n− 1)Tr ρ log ρ− Trσ log ρ.
Hence the identity
S(Rn‖ρ
⊗n) = −S(Rn) + (n− 1)S(ρ) + S(σ‖ρ) + S(σ)
holds. It follows that the conjecture (1) is equivalent to the statement
S(Rn‖ρ
⊗n) → 0 as n → ∞
when supp σ ≤ supp ρ.
Recall the Belavkin-Staszewski relative entropy
SBS(ω‖ρ) = Tr(ω log(ω
1/2ρ−1ω1/2)) = −Tr(ρ η(ρ−1/2ωρ−1/2))
if suppω ≤ supp ρ, where η(t) := −t log t, see [1, 10]. It was proved by Hiai and Petz
S(ω‖ρ) ≤ SBS(ω‖ρ), (2)
see [6], or Proposition 7.11 in [10].
Theorem 1. If supp σ ≤ supp ρ, then S(Rn)− (n−1)S(ρ)−S(σ) → S(σ‖ρ) as n → ∞.
Proof: We want to use the quantum law of large numbers, see Proposition 1.17 in
[10]. Assume that ρ and σ are d × d density matrices and we may suppose that ρ is
invertible. Due to the GNS-construction with respect to the limit ϕ∞ of the product
states ϕn(A) = Tr ρ
⊗nA on the n-fold tensor product Md(C)
⊗n, n ∈ N, all finite tensor
products Md(C)
⊗n are embedded into a von Neumann algebra M acting on a Hilbert
space H. If γ denotes the right shift and X := ρ−1/2σρ−1/2, then Rn is written as
Rn = (ρ
1/2)⊗n
γi(X)
(ρ1/2)⊗n.
By inequality (2), we get
0 ≤ S(Rn‖ρ
⊗n) ≤ SBS(Rn‖ρ
= −Tr
ρ⊗n η
(ρ−1/2)⊗nRn(ρ
−1/2)⊗n
γi(X)
, (3)
where Ω is the cyclic vector in the GNS-construction.
The law of large numbers gives
γi(X) → I
in the strong operator topology in B(H), since ϕ(X) = Tr ρρ−1/2σρ−1/2 = 1.
Since the continuous functional calculus preserves the strong convergence (simply due
to approximation by polynomials on a compact set), we obtain
γi(X)
→ η(I) = 0 strongly.
This shows that the upper bound (3) converges to 0 and the proof is complete.
By the same proof one can obtain that for
Rm,n :=
σ⊗m ⊗ ρ⊗(n−1) + ρ⊗ σ⊗m ⊗ ρ⊗(n−2) + · · · + ρ⊗(n−1) ⊗ σ⊗m
the limit relation
S(Rm,n) − (n− 1)S(ρ) −mS(σ) → mS(σ‖ρ) (4)
holds as n → ∞ when m is fixed.
In the next theorem we treat the probabilistic case in a matrix language. The proof
includes the case when supp σ ≤ supp ρ is not true. Those readers who are not familiar
with the quantum setting of the previous theorem are suggested to follow the arguments
below.
Theorem 2. Assume that ρ and σ are commuting density matrices. Then S(Rn)− (n−
1)S(ρ) − S(σ) → S(σ‖ρ) as n → ∞.
Proof: We may assume that ρ = Diag(µ1, . . . , µℓ, 0, . . . , 0) and σ = Diag(λ1, . . . , λd)
are d×d diagonal matrices, µ1, . . . , µℓ > 0 and ℓ < d. (We may consider ρ, σ in a matrix
algebra of bigger size if ρ is invertible.) If supp σ ≤ supp ρ, then λℓ+1 = · · · = λd = 0;
this will be called the regular case. When supp σ ≤ supp ρ is not true, we may assume
that λd > 0 and we refer to the singular case.
The eigenvalues of Rn correspond to elements (i1, . . . , in) of {1, . . . , d}
(λi1µi2 · · ·µin + µi1λi2µi3 · · ·µin + · · · + µi1 · · ·µin−1λin). (5)
We divide the eigenvalues in three different groups as follows:
(a) A corresponds to (i1, . . . , in) ∈ {1, . . . , d}
n with 1 ≤ i1, . . . , in ≤ ℓ,
(b) B corresponds to (i1, . . . , in) ∈ {1, . . . , d}
n which contains exactly one d,
(c) C is the rest of the eigenvalues.
If the eigenvalue (5) is in group A, then it is
(λi1/µi1) + · · · + (λin/µin)
µi1µi2 · · ·µin .
First we compute
η(κ) =
i1,...,in
(λi1/µi1) + · · · + (λin/µin)
µi1 · · ·µin
Below the summations are over 1 ≤ i1, . . . , in ≤ ℓ:
i1,...,in
(λi1/µi1) + · · · + (λin/µin)
µi1 · · ·µin
i1,...,in
(λi1/µi1) + · · · + (λin/µin)
µi1 · · ·µin
log(µi1 · · ·µin) + Qn
i1,...,in
λi1µi2 · · ·µin log µik +
i1,...,in
λi1µi2 · · ·µin logµik
+ · · · +
i1,...,in
λi1µi2 · · ·µin log µik
(n− 1)
µik logµik +
λik logµik
= (n− 1)S(ρ) −
λi logµi + Qn,
where
Qn :=
i1,...,in
(µi1 · · ·µin)η
(λi1/µi1) + · · · + (λin/µin)
Consider a probability space
(Ω,P) :=
{1, . . . , ℓ}N, (µ1, . . . , µℓ)
where (µ1, . . . , µℓ)
N is the product of the measure on {1, . . . , ℓ} with the distribution
(µ1, . . . , µℓ). For each n ∈ N let Xn be a random variable on Ω depending on the
nth {1, . . . , ℓ} so that the value of Xn at i ∈ {1, . . . , ℓ} is λi/µi. Then X1, X2, . . . are
identically distributed independent random variables and Qn is the expectation value of
X1 + · · · + Xn
The strong law of large numbers says that
X1 + · · · + Xn
→ E(X1) =
λi almost surely.
Since η((X1 + · · · + Xn)/n) is uniformly bounded, the Lebesgue bounded convergence
theorem implies that
Qn → η
as n → ∞.
In the regular case
i=1 λi = 1, Qn → 0 and all non-zero eigenvalues are in group A.
Hence we have
S(Rn) − (n− 1)S(ρ) − S(σ) = −
λi logµi +
λi log λi + Qn = S(σ‖ρ) + Qn
and the statement is clear.
Next we consider the singular case, when we have
η(κ) = (n− 1)S(ρ) + O(1),
and we turn to eigenvalues in B. If the eigenvalue corresponding to (i1, . . . , in) ∈
{1, . . . , d}n is in group B and i1 = d, then the eigenvalue is
λdµi2 . . . µin .
It follows that
i2,...,in
(λdµi2 · · ·µin
(λdµi2 · · ·µin
i2,...,in
(µi2 · · ·µin) log(µi2 · · ·µin) −
(n− 1)S(ρ) −
When i2 = d, . . . , in = d, we get the same quantity, so this should be multiplied with n:
η(κ) = λd(n− 1)S(ρ) − λd log
We make a lower estimate to the entropy of Rn in such a way that we compute
κ η(κ)
when κ runs over A and B. It is clear now that
S(Rn) − (n− 1)S(ρ) − S(σ) ≥
η(κ) +
η(κ) − (n− 1)S(ρ) − S(σ)
≥ λd(n− 1)S(ρ) + λd log n + O(1) → +∞
as n → ∞.
3 Interpretation as capacity
A classical-quantum channel with classical input alphabet X transfers the input x ∈ X
into the output W (x) ≡ ρx which is a density matrix acting on a Hilbert space K. We
restrict ourselves to the case when X is finite and K is finite dimensional.
If a classical random variable X is chosen to be the input, with probability distribution
P = {p(x) : x ∈ X}, then the corresponding output is the quantum state ρX :=
x∈X p(x)ρx. When a measurement is performed on the output quantum system, it
gives rise to an output random variable Y which is jointly distributed with the input X .
If a partition of unity {Fy : y ∈ X} in B(K) describes the measurement, then
Prob(Y = y |X = x) = Tr ρxFy (x, y ∈ X ). (6)
According to the Holevo bound, we have
I(X ∧ Y ) := H(Y ) −H(Y |X) ≤ I(X,W ) := S(ρX) −
p(x)S(ρx), (7)
which is actually a simple consequence of the monotonicity of the relative entropy un-
der state transformation [7], see also [11]. I(X,W ) is the so-called Holevo quantity or
classical-quantum mutual information, and it satisfies the identity
p(x)S(ρx‖ρ) = I(X,W ) + S(ρX‖ρ), (8)
where ρ is an arbitrary density.
The channel is used to transfer sequences from the classical alphabet; x = (x1, x2, . . . , xn) ∈
X n is transferred into the quantum state W⊗n(x) = ρx := ρx1⊗ρx2⊗. . .⊗ρxn . A code for
the channel W⊗n is defined by a subset An ⊂ X
n, which is called a codeword set. The de-
coder is a measurement {Fy : y ∈ X
n}. The probability of error is Prob(X 6= Y ), where
X is the input random variable uniformly distributed on An and the output random
variable is determined by (6), where x and y are replaced by x and y.
The essential observation is the fact that S(Rn)−(n−1)S(ρ)−S(σ) in the conjecture
is a Holevo quantity in case of a channel with input sequences (x1, x2, . . . , xn) ∈ {0, 1}
and outputs ρx1 ⊗ ρx2 ⊗ . . . ⊗ ρxn, where ρ0 = σ, ρ1 = ρ and the codewords are all
sequences containing exactly one 0. More generally, we shall consider Holevo quantities
I(A, ρ0, ρ1) := S
S(ρx).
defined for any set A ⊂ {0, 1}n of binary sequences of length n.
The concept related to the conjecture we study is the channel capacity per unit cost
which is defined next for simplicity only in the case where X = {0, 1}, the cost of a
character 0 ∈ X is 1, while the cost of 1 ∈ X is 0.
For a memoryless channel with a binary input alphabet X = {0, 1} and an ε > 0, a
number R > 0 is called an ε-achievable rate per unit cost if for every δ > 0 and for any
sufficiently large T , there exists a code of length n > T with at least eT (R−δ) codewords
such that each of the codewords contains at most T 0’s and the error probability is at
most ε. The largest R which is an ε-achievable per unit cost for every ε > 0 is the
channel capacity per unit cost.
Lemma 1. For an arbitrary A ⊂ {0, 1}n,
I(A, ρ0, ρ1) ≤ c(A)S(ρ0‖ρ1)
holds, where
c(A) :=
|{i : xi = 0}|.
Proof: Let c(x) := |{i : xi = 0}| for x ∈ A. Since I(A, ρ0, ρ1) is a particular Holevo
quantity I(X,W⊗n), we can use the identity (8) to get an upper bound
S(ρx‖ρ
1 ) =
c(x)S(ρ0‖ρ1) = c(A)S(ρ0‖ρ1)
for I(A, ρ0, ρ1).
Lemma 2. If A ⊂ {0, 1}n is a code of the channel W⊗n, whose probability of error (for
some decoding scheme) does not exceed a given 0 < ε < 1, then
(1 − ε) log |A| − log 2 ≤ I(A, ρ0, ρ1).
Proof: The right-hand side is a bound for the classical mutual information I(X∧Y ) =
H(Y ) − H(Y |X), where Y is the channel output, see (7). Since the error probability
Prob(X 6= Y ) is smaller than ε, application of the Fano inequality (see [3]) gives
H(X|Y ) ≤ ε log |A| + log 2.
Therefore
I(X ∧ Y ) = H(X) −H(X|Y ) ≥ (1 − ε) log |A| − log 2,
and the proof is complete.
The above two lemmas shows that the relative entropy S(ρ0‖ρ1) is an upper bound
for the channel capacity per unit cost of the channel W (0) = ρ0 and W (1) = ρ1 with
a binary input alphabet. In fact, assume that R > 0 is an ε-achievable rate. For every
δ > 0 and T > 0 there is a code A ⊂ {0, 1}n for which we get by Lemmas 1 and 2
TS(ρ0‖ρ1) ≥ c(A)S(ρ0‖ρ1) ≥ I(A, ρ0, ρ1)
≥ (1 − ε) log |A| − log 2
≥ (1 − ε)T (R− δ) − log 2.
Since T is arbitrarily large and ε, δ are arbitrarily small, R ≤ S(ρ0‖ρ1) follows. That
S(ρ0‖ρ1) equals the channel capacity per unit cost will be verified below.
Theorem 3. Let the classical-quantum channel W : X = {0, 1} → B(K) be defined as
W (0) = ρ0 ≡ σ and W (1) = ρ1 ≡ ρ. Assume that An ⊂ {0, 1}
n is chosen such that
(a) each element x = (x1, x2, . . . , xn) ∈ An contains at most ℓ copies of 0,
(b) log |An|/ logn → c as n → ∞,
c(An) :=
|{i : xi = 0}| → c as n → ∞
for some real number c > 0 and for some natural number ℓ. If the random variable Xn
has a uniform distribution on An, then
S(ρXn) −
S(ρx)
= cS(σ‖ρ).
The proof of the theorem is divided into lemmas. We need the direct part of the
so-called quantum Stein lemma obtained in [6], see also [2, 5, 9, 12].
Lemma 3. Let ρ0 and ρ1 be density matrices. For every η > 0 and 0 < R < S(ρ0‖ρ1),
if N is sufficiently large, then there is a projection E ∈ B(K⊗N) such that
αN [E] := Tr ρ
0 (I − E) < η
and for βN [E] := Tr ρ
1 E the estimate
log βN [E] < −R
holds.
Note that αN is called the error of the first kind, while βN is the error of the second
kind.
Lemma 4. Assume that ε > 0, 0 < R < S(ρ0‖ρ1), ℓ is a positive integer and the
sequences x in An ⊂ {0, 1}
n contain at most ℓ copies of 0. Let the codewords be the
N-fold repetitions xN = (x,x, . . . ,x) of the sequences x ∈ An. If N is the integer part
and n is large enough, then there is a decoding scheme such that the error probability is
smaller than ε.
Proof: We follow the probabilistic construction in [13]. Let the codewords be the N -
fold repetitions xN = (x,x, . . . ,x) of the sequences x ∈ An. The corresponding output
density matrices act on the Hilbert space K⊗Nn ≡ (K⊗n)⊗N . We decompose this Hilbert
space into an N -fold product in a different way. For each 1 ≤ i ≤ n, let Ki be the
tensor product of the factors i, i + n, i + 2n, . . . , i + (N − 1)n. So K is identified with
K1 ⊗K2 ⊗ . . .⊗Kn.
For each 1 ≤ i ≤ n we perform a hypothesis testing on the Hilbert space Ki. The
0-hypothesis is that the ith component of the actually chosen x ∈ An is 0. Based on
the channel outputs at time instances i, i + n, . . . , i + (N − 1)n, the 0-hypothesis is
tested against the alternative hypothesis that the ith component of x is 1. According
to the quantum Stein lemma (Lemma 3), given any η > 0 and 0 < R < S(σ‖ρ), for N
sufficiently large, there exists a test Ei such that the probability of error of the first kind
is smaller than η, while the probability of error of the second kind is smaller than e−NR.
The projections Ei and I − Ei form a partition of unity in the Hilbert space Ki, and
the n-fold tensor product of these commuting projection will give a partition of unity in
K⊗Nn. Let y ∈ {0, 1}n and set Fy := ⊗
i=1Fyi , where Fyi = Ei if yi = 0 and Fyi = I −Ei
if yi = 1. Therefore, the result of decoding can be an arbitrary 0–1 sequence in {0, 1}
The decoding scheme gives y ∈ {0, 1}n in such a way that yi = 0 if the tests accepted
the 0-hypothesis for i and yi = 1 if the alternative was accepted. The error probability
should be estimated:
Prob(Y 6= X|X = x) =
y:y 6=x
Tr ρ⊗N
y:y 6=x
Tr ρ⊗Nxi Fyi
y:yi 6=xi
Tr ρ⊗Nxj Fyj ≤
Tr ρ⊗Nxi (I − Fxi).
If xi = 0, then
Tr ρ⊗Nxi (I − Fxi) = Tr ρ
0 (I −Ei) ≤ η,
because it is an error of the first kind. When xi = 1,
Tr ρ⊗Nxi (I − Fxi) = Tr ρ
1 Ei ≤ e
from the error of the second kind. It follows that ℓη + ne−NR is a bound for the error
probability. The first term will be small if η is small. The second term will be small
if N is large enough. If both terms are majorized by ε/2, then the statement of the
lemma holds. We can choose n so large that N defined by the statement should be large
enough.
Proof of Theorem 3: Since Lemma 1 gives an upper bound, that is,
lim sup
S(ρXn) −
S(ρx)
≤ cS(σ‖ρ),
it remains to prove that
lim inf
S(ρXn) −
S(ρx)
≥ cS(σ‖ρ).
Lemma 4 is about the N -times repeated input XN and describes a decoding scheme
with error probability at most ε. According to Lemma 2 we have
(1 − ε) log |An| − 1 ≤ S(ρXN ) −
S(ρxN ).
From the subadditivity of the entropy we have
S(ρXN ) ≤ NS(ρX)
S(ρxN ) = NS(ρx)
holds due to the additivity for product. It follows that
(1 − ε)
log |An|
≤ S(ρX) −
S(ρx).
From the choice of N in Lemma 4 we have
log |An|
log n
logn + log 2 − log ε
log |An|
and the lower bound is arbitrarily close to cR. Since R < S(ρ0‖ρ1) was arbitrary, the
proof is complete.
References
[1] V.P. Belavkin and P. Staszewski, C*-algebraic generalization of relative entropy and
entropy, Ann. Inst. Henri Poincaré, Sec. A 37(1982), 51–58.
[2] I. Bjelaković, J. Deuschel, T. Krüger, R. Seiler, R. Siegmund-Schultze and A. Szko la,
A quantum version of Sanov’s theorem, Comm. Math. Phys. 260(2005), 659–671.
[3] T. M. Cover and J. A. Thomas, Elements of Information Theory, Second edition,
Wiley-Interscience, Hoboken, NJ, 2006.
[4] L. Diósi, T. Feldmann and R. Kosloff, On the exact identity between thermodynamic
and informatic entropies in a unitary model of friction, Int. J. Quantum Information,
4(2006), 99–104.
[5] M. Hayashi, Quantum information. An introduction, Springer, 2006.
[6] F. Hiai and D. Petz, The proper formula for relative entropy and its asymptotics in
quantum probability, Comm. Math. Phys. 143(1991), 99–114.
[7] A.S. Holevo, Some estimates for the amount of information transmittable by a quan-
tum communication channel (in Russian), Problemy Peredachi Informacii, 9(1973),
3–11.
[8] M.A. Nielsen and I.L. Chuang, Quantum computation and quantum information,
Cambridge University Press, Cambridge, 2000.
[9] T. Ogawa and H. Nagaoka, Strong converse and Stein’s lemma in quantum hypoth-
esis testing, IEEE Tans. Inf. Theory 46(2000), 2428–2433.
[10] M. Ohya and D. Petz, Quantum Entropy and its Use, Springer, 1993.
[11] M. Ohya, D. Petz and N. Watanabe, On capacities of quantum channels, Prob.
Math. Stat. 17(1997), 179–196.
[12] D. Petz, Lectures on quantum information theory and quantum statistics, book
manuscript in preparation.
[13] S. Verdu, On channel capacity per unit cost, IEEE Trans. Inform. Theory 36(1990),
1019–1030.
	Introduction
	An analytic proof of the conjecture
	Interpretation as capacity
ABSTRACT
  In a quantum mechanical model, Diosi, Feldmann and Kosloff arrived at a
conjecture stating that the limit of the entropy of certain mixtures is the
relative entropy as system size goes to infinity. The conjecture is proven in
this paper for density matrices. The first proof is analytic and uses the
quantum law of large numbers. The second one clarifies the relation to channel
capacity per unit cost for classical-quantum channels. Both proofs lead to
generalization of the conjecture.

<|endoftext|><|startoftext|>
Intelligent location of simultaneously active acoustic
emission sources:
Part I
Tadej Kosel and Igor Grabec
Faculty of Mechanical Engineering, University of Ljubljana,
Aškerčeva 6, POB 394, SI-1001 Ljubljana, Slovenia
e-mail: tadej.kosel@guest.arnes.si; igor.grabec@fs.uni-lj.si
Abstract— The intelligent acoustic emission locator is described
in Part I, while Part II discusses blind source separation,
time delay estimation and location of two simultaneously active
continuous acoustic emission sources.
The location of acoustic emission on complicated aircraft frame
structures is a difficult problem of non-destructive testing. This
article describes an intelligent acoustic emission source locator.
The intelligent locator comprises a sensor antenna and a general
regression neural network, which solves the location problem
based on learning from examples. Locator performance was
tested on different test specimens. Tests have shown that the
accuracy of location depends on sound velocity and attenuation
in the specimen, the dimensions of the tested area, and the
properties of stored data. The location accuracy achieved by
the intelligent locator is comparable to that obtained by the
conventional triangulation method, while the applicability of the
intelligent locator is more general since analysis of sonic ray paths
is avoided. This is a promising method for non-destructive testing
of aircraft frame structures by the acoustic emission method.
INTRODUCTION
Acoustic emission (AE) concerns non-destructive testing
methods and is used to locate and characterize developing
cracks and defects in material. In non-destructive testing of
aviation frame structures, acoustic emission is a well accepted
method [8]. The location problem is usually solved by various
triangulation techniques based on the analysis of ultrasonic ray
trajectories [10], [1], [3]. Solving and programming the related
equation is rather cumbersome and cannot be simply per-
formed if the structure of the tested specimen is geometrically
complicated. Acoustic emission testing of aircraft structures
is a challenging and difficult problem. The structures involve
bolts, fasteners and plates, all of which move relative to one
another due to differential structural loading during flight. The
complex geometry of the airframe results in multiple mode
conversions of AE source signals, compounding the difficulty
of relating the source event to the detected signal.
In order to avoid difficulties with equation solving and
programming of the triangulation procedure, several empirical
approaches based on learning from examples have already
been proposed [5]. We developed an intelligent locator capable
of learning from examples which we therefore called an
intelligent locator. The purpose of developing the intelligent
Manuscript generated: January 31, 2007
locator is to replace information obtained from the analysis
of sonic ray trajectories by information obtained directly from
simulated AE events on the specimen under test. In this way,
the calibration procedure, which has to be performed anyway,
could be generalized to the training of the intelligent locator.
The development of such an intelligent locator has been
described elsewhere [4]. In the locator developed a general
regression neural network (GRNN) is employed [9], which
acquires data about the detected AE signals and parameters
of their sources during learning. The GRNN uses these data
in testing when estimating the unknown source position from
detected AE signals. For this purpose, associative GRNN
operation is utilized. The basis of such operation is statistical
estimation determined by the conditional average [6]. Conse-
quently, the accuracy of the intelligent locator also depends on
the learning procedure, and must be examined before testing.
This article describes the results obtained by testing the
intelligent locator on experimental continuous AE sources. The
purpose of this study was to test and examine the advantages
of the intelligent locator compared to a conventional locator.
as described in Part I. In Part II an experiment will be
explained in which an intelligent locator was used to locate
two simultaneously active continuous AE sources generated
by leakage air flow. Location of more than one source at the
same time on the test specimen is a new approach in acoustic
emission testing, and is a very promising method for aircraft
and airspace structural testing.
When preparing the experiments, we focused on locating
evolving defects in stressed materials and constructions, and
leakage of vessels. We therefore performed location exper-
iments on four different specimens with three different AE
sources. The specimens comprised bands, plates, rings, and
vessels, while the AE sources were simulated by rupture of a
pencil lead (pen test), material deformation during tensile test,
and leakage air flow through a small hole in a sample. The
positions of AE sources used in testing were well specified.
Actual positions were compared with estimated ones, and
the discrepancy was used to describe the inaccuracy of the
locator. In this article, only the experiment with leakage air
flow through a small hole in a sample is explained. In Part I,
location of one continuous AE source is explained. This Part
is intended for better understanding of Part II and comparison
of results. In Part II, a new approach to the location of two
http://arxiv.org/abs/0704.0047v1
simultaneously active continuous AE sources is explained.
Below, the article first explains the theoretical background
for application of the conditional average to the location
problem, then describes auxiliary AE signal processing, and
finally demonstrates performance of the experimental intelli-
gent locator.
THEORETICAL BACKGROUND
In this section we describe a non-parametric approach to
empirical modeling of AE phenomena and solving the location
problem. This modeling stems from a description of physical
laws in terms of probability distributions. Since it has been
explained in detail elsewhere, we present here just its basic
concepts [6], [5].
The object of empirical modeling is the relationship between
variables which are simultaneously measured by a set of
sensors. In our example the variables are source coordinates
and AE signal characteristics. Let them be represented by a
vector of M components: x = (ξ1, . . . , ξM ). In the empirical
description of an AE phenomenon we repeat the observation N
times to create a database of prototype vectors {x1, . . . ,xN}.
Instead of formulating a relation between the components of x
we instead treat this vector as a random variable and express
the joint probability density function f by the estimator
f(x) =
δ(x− xn) . (1)
Here δ denotes Dirac’s delta function. For the purposes of
modelling, we must also estimate the probability density in
the space between the prototype points. This is achieved by
expressing the singular delta function in Eqs. 1 by a smooth
function, such as for example the Gaussian
wn(x− xn, σ) = exp
−‖x− xn‖
, n = 1, . . . , N .
in which σ denotes the smoothing parameter.
The data vectors determine an empirical model of the
probability density function. Their acquisition corresponds to
the learning phase of the empirical modeling. Let us further
assume that observation of AE phenomenon provides only
partial information that is given by a truncated vector
g = (ξ1, . . . , ξS ; ∅) , (3)
in which ∅ denotes missing components. The problem is
to estimate the complementary vector of missing or hidden
components:
h = (∅; ξS+1, . . . , ξM ); (4)
such that the complete data vector is determined by concate-
nation
x = g ⊕ h = (ξ1, . . . , ξS , ξS+1, . . . , ξM ) . (5)
A statistically optimal solution to this problem is determined
by the conditional average estimator, which is expressed by a
superposition of terms [6]
Bn(g)hn, where (6)
Bn(g) =
w(g − gn, σ)
w(g − gk, σ)
. (7)
The basis functions Bn(g) represent a measure of similarity
between the truncated vector g given by a particular ob-
servation and truncated vectors from the database gn. The
higher the value of Bn(g) the higher the contribution of hn
to the sum 7 estimating ĥ. Hence, estimation of the hidden
vector ĥ resembles associative recall, which is characteristic
of intelligence. The conditional average represents a general
non-parametric regression [6].
During the learning phase of operation an intelligent locator
of AE sources accepts AE signals and source coordinates
and stores prototype data vectors, while during application
it accepts only AE signals and estimates the corresponding
source position. Each of these phases can be performed in a
separate unit which can be interpreted as a layer of a sensory-
neural network.
In order to ensure acceptable properties of the locator,
the smoothing parameter σ must be properly chosen[2]. The
purpose of δ function smoothing is to estimate the probability
density function between the prototype data points. A unique
method for optimal specification of the smoothing parameter
is as yet unknown. In this case, it is numerically simpler to
specify σ by the half distance to the closest neighbor point:
σn = 0.5 min
‖gi − gn‖ , for all i 6= n . (8)
Signal pre-processing
The intelligent locator comprised a sensor antenna, signal
pre-processing unit and source locating unit, as shown in
Fig. 1. The first unit calculates the time delay ∆t from
AE signals y1(t) and y2(t), while the second unit estimates
the source position ẑ from the time delay ∆t. AE signals
y1(t) and y2(t) are detected by sensors and filtered using a
Butterworth bandpass filter. Without the bandpass filter, time
delays cannot be easily mapped to source positions on the
sample band, and therefore the applicability of this method
depends on the proper choice of bandpass filter function H(f).
We found on dispersive specimens that information in the
continuous AE signal about source position is located in a
narrow frequency band. A wave packet with approximately
constant wave velocity along the specimen must be extracted
by this filter. The filter function H(f) is determined during
training procedure of the locator.
PSfrag replacements
y1(t)
y2(t)
y1(t)
y2(t)
Ry1y2 ∆tCross-
correlator
detector
Locator ẑSensor
Sensor
Bandpass
filter
Test specimen
#2 H(f)
Signal pre-processing unit Source location
Fig. 1. AE signal processing by the intelligent locator
Two conventional methods for time delay estimation be-
tween two signals are known: threshold function and cross-
correlation function. Estimation of time delay by the threshold
function is simple, but only applicable in the case of discrete
AE. More general, but also more demanding, is time delay
estimation from the cross-correlation function of AE signals
[11]. The cross-correlation function:
Ry1y2(τ) =
y1(t) y2(t+ τ) , (9)
generally exhibits a peak when parameter τ corresponds to
the time delay ∆t between signals y1(t) and y2(t). The time
delay is thus determined from the position of the peak of the
cross-correlation function. One advantage of the application
of the cross-correlation function is that it does not depend
on the discrete or continuous character of AE signals. This
method for time delay estimation is only applicable when one
AE source is active at the time of detection. In the event of
two or more simultaneously active continuous AE sources, a
different approach should be used which will be discussed in
the Part II.
A filter function is calculated during calibration of the
intelligent locator as follows. During calibration, a set of
prototype sources is generated on the test specimen by a pen
test at a prepared coordinate net[8]. This net in most cases
has linear sections, where the prototype sources are positioned
on a straight line. In this case, we know that time delays
between signals are also linearly dependent. If we have a test
specimen with a complicated geometrical structure, then a pre-
calibration process has to be performed in which we have to
choose a geometrically simple part of the specimen and carry
out a pre-calibration procedure on this part such that time
delays between signals are linearly dependent.
For calibration we used AE signals generated by a pen
test. We obtained 12 pairs of AE signals from two sensors
concatenated with known coordinates of sources. The posi-
tions of simulated sources were uniformly distributed along a
straight line on a specimen. In such cases, time delay ∆t is
linearly related to source position z. This is of advantage for
optimal determination of bandpass filter because the reference
is a straight line. Calculation of time delays on the same set
of prototype AE signals was repeated 70 times. The bandpass
filter of ∆f = 10 kHz was shifted by 1 kHz at each repetition
from 5 to 75 kHz. Time delays were calculated at each
repetition and the distribution obtained was compared with a
straight line, as shown in Fig. 2. The frequency bandwidth was
considered optimal when the root mean square error (RMSE)
was minimal, as shown in Fig. 3(a). The optimal frequency
band for this specimen was 35-45 kHz and the velocity of
elastic waves was 1.7 km s−1. The filter was further used for
pre-processing samples of prototype as well as test sources. As
shown in Fig. 3(b), the pairs (z,∆t), estimated from filtered
signals, fit a straight line, except one outlier, which results
from experimental error.
EXPERIMENT
The intelligent AE source locator is shown schematically
in Fig. 4. It includes an automatic data-acquisition system
−1 0 1
PSfrag replacements
l [m]
5–15 kHz
−1 0 1
PSfrag replacements
l [m]
∆t [ms]
5–15 kHz
15–25 kHz
−1 0 1
PSfrag replacements
l [m]
∆t [ms]
5–15 kHz
15–25 kHz
25–35 kHz
−1 0 1
PSfrag replacements
l [m]
∆t [ms]
5–15 kHz
15–25 kHz
25–35 kHz
35–45 kHz
−1 0 1
PSfrag replacements
l [m]
∆t [ms]
5–15 kHz
15–25 kHz
25–35 kHz
35–45 kHz
45–55 kHz
−1 0 1
PSfrag replacements
l [m]
∆t [ms]
5–15 kHz
15–25 kHz
25–35 kHz
35–45 kHz
45–55 kHz
55–65 kHz
Fig. 2. Distribution of time delays and their linear approximation along
the band specimen. By this procedure an optimal bandpass filter can be
determined.
controlled by computer and a network of AE sensors.
The AE sensors are piezoelectric transducers (pinducers).
The diameter of the transducer active area is 1.3 mm, And so it
can be considered a point-like sensor. The signals from sensors
are fed to a digital oscilloscope where they are digitized and
transferred to a PC. Operation of the intelligent locator is
determined by software in the PC that controls data acquisition
and estimates the position of unknown AE sources.
The locator operates in two different modes:
1) In learning or calibration mode, a set of N pen tests is
performed in which complete information about the AE
phenomenon is acquired. The operator must prepare an
orientation net the shape of which depends on the shape
of the test specimen. The recommended shape is an
equidistant net, since such position of prototype sources
yield a minimum error of the locator. ¿From source
coordinates and time delays between pre-processed AE
signals, the prototype vectors are created and stored in
the memory of the neural network as a data base.
2) In application mode, only time delays between AE
signals are provided. There are then associated in the
neural network with the estimated source coordinates.
In the case of discrete AE, the time delay can visually be
estimated from a marked jump in the burst of the AE signal, or
can be instrumentally determined using a threshold function.
Hence, in the case of continuous AE, time delays cannot
be simply estimated, although a cross-correlation function
has already been used for this purpose. In our approach, we
therefore applied a cross-correlation function. The purpose of
this experiment was to determine the accuracy of location of
continuous AE sources on a one-dimensional specimen.
Two experiments on aluminum band specimen are explained
in this article. We tested the locator on an aluminum band
specimen of dimensions 4000 × 40 × 5mm3. Reflection of
AE signals at the ends of the band specimen was reduced
by sharpening the ends. For testing we selected a test area
15−25 35−45 55−65 75−85
PSfrag replacements
∆f [kHz] - Frequency band
E ∆fopt
−1000 −500 0 500 1000
PSfrag replacements
z [mm] - Actual location
-outlier
Fig. 3. Time delays for prototype and test sources by using the bandpass
filter of frequency 35-45 kHz. a) Deviation of prototype source position from
a straight line for different filter frequency bandwidth. b) Time delays of
prototype and test sources; Legend: + prototype source, ◦ test source
in the middle of the band specimen where 23 holes were
prepared. The distance between holes was 100 mm and the
diameter of holes was 2 mm. Two AE sensors were mounted
100 mm away from the terminal holes. For the purpose of
locator training, we generated 12 prototype sources separated
by 200 mm, while all 23 holes were applied for locator testing.
In this experiment, we calibrate the locator by pen test and
examine it by continuous AE generated by air flow. The air
flow was produced by expansion of compressed air through
nozzle of 1 mm diameter. The nozzle was mounted 1 mm
above the band specimen surface.
Two experiments were performed. In the first experiment,
only one continuous AE source was active on the band
specimen, while in the second experiment two continuous AE
sources were active simultaneously on the band specimen.
Successive simultaneous location of two sources is explained
in Part II.
Signals were processed as shown in Fig. 1. The first step
in processing was calculation of cross-correlation function
of AE signals. The corresponding signal was sent through a
bandpass Butterworth filter of bandpass from 35 to 45 kHz.
Determination of this filter is explained earlier in this article.
RESULTS
The results of locator testing are shown in Fig. 5(a). The
absolute location error for each test source is shown in
Fig. 5(b). Location error in the experiment ranges from 1.3 mm
to 60 mm with average value εa = 20mm (ignoring the
outlier). If we describe the error with respect to the distance
between sensors (2.4 m), the relative value is less than 1%.
Increasing the number of prototype sources can reduce the
error. Despite the complexity of continuous AE signals, the
location problem was solved satisfactorily with respect to The
accuracy required in non-destructive testing. Results also show
that a standard calibration procedure with discrete AE signals
generated by pen test can be used for locator training.
PSfrag replacements
Sensors
Operator
Analog
Signals
#2 Digital
oscilloscope
Parameter set
Computer
Calibration by simulated AE sources
Fig. 4. Experimental setup of intelligent locator
−1000 −500 0 500 1000
−1000
PSfrag replacements
x [mm] - Actual location
-outlier
−1000 −500 0 500 1000
PSfrag replacements
x [mm] - Actual location
-outlier
Fig. 5. Result of continuous AE source location on the band. a) Estimated
versus actual location of test sources; Legend: + prototype source, ◦ test source.
b) Absolute location error; εa - average error.
DISCUSSION AND CONCLUSION
Estimation of source coordinates by the conditional average
is subject to systematic error caused by smoothing of the
delta function [5]. This error can be reduced by increasing
the number of prototype sources. Since it is not always
possible to increase the number of prototype sources due to
the complexity of experiments, a compromise must be found
by trial and error.
Experimental error is acceptable, so we decided to make
additional tests, as will be discussed in Part II.
This study shows that a conventional AE locator operating
on the triangulation method can be successfully replaced by an
intelligent locator that learns from examples. The results show
that the intelligent locator can locate sources with acceptable
accuracy in cases of: (1) discrete AE on band and plate, (2)
continuous AE on band, (3) discrete AE on plate with hole
(ring), (4) discrete AE generated by specimen rupture during
the tensile test, and (5) discrete AE on pressure vessel. Is has
been also shown that the locator can perform zonal locating[7].
Comparing mean errors of all experiments and the distances
between prototype sources, we find that the average error
is always less than 30% of the distance between prototype
sources, while the maximal error is always less than 50% of
the distance between prototype sources. The accuracy of the
locator can be controlled by the number of prototype sources
excited during training. The experimental error of the locator is
a consequence of wave dispersion on a specimen that operates
as a waveguide, reflections from boundaries, and attenuation.
We found for dispersive waves that an optimal wave packet
must be found which has approximately constant velocity
along the test specimen. Estimation of time delay between AE
signals by the cross-correlation function is only applicable for
one active AE source. If there are several simultaneously active
AE sources, then blind source separation should be used, as
will be shown in Part II.
REFERENCES
[1] Chan, Y. T. Ho, K. C. 1994 , A simple and efficient estimator for hy-
perbolic location, IEEE Transactions on Signal Processing 42(8), 1905–
1915.
[2] Cherkassky, V. Mulier, F. 1998 , Leraning from Data: Concepts, Theory,
and Methods, John Wiley & Sons inc., New York.
[3] Friedlander, B. 1987 , A passive localization algorithm and its accuracy
analysis, IEEE Journal of Oceanic Engineering OE-12(1), 234–245.
[4] Grabec, I. Antolovič, B. 1994 , Intelligent locator of AE sources, in
T. Kishi, Y. Mori M. Enoki, eds, The 12th International Acoustic Emission
Symposium, Vol. 7 of Progress in Acoustic Emission, The Japanese
Society for Non-Destructive Inspection, Tokyo, Japan, pp. 565–570.
[5] Grabec, I. Sachse, W. 1991 , ‘Automatic modeling of physical phenomena:
Application to ultrasonic data’, J. Appl. Phys. 69(9), 6233–6244.
[6] Grabec, I. Sachse, W. 1997 , Synergetics of Measurement, Prediction and
Control, Springer-Verlag, Berlin.
[7] Kosel, T. Grabec, I. 1998 , Intelligent locator of discrete and continuous
acoustic emission sources, in J. Grum, ed., Application of Contemporary
Non-destructive Testing in Engineering, The 5th International Conference
of Slovenian Society for Nondestructive Testing, Slovenian Society for
Nondestructive Testing, Ljubljana, Slovenia, pp. 39–54.
[8] McIntire, P. Miller, R. K., eds 1987 , Acoustic Emission Testing, Vol. 5
of Nondestructive Testing Handbook, 2 edn, American Society for Non-
destructive Testing, Philadelphia, USA.
[9] Specht, D. F. 1991 , A general regression neural network, IEEE Trans.
on Neural Networks 2(6), 568–576.
[10] Tobias, A. 1976 , Acoustic emission source location in two dimensions
by an array of three sensors, Non-Destructive Testing 9(2), 9–12.
[11] Ziola, S. M. Gorman, M. R. 1991 , Source location in thin plates using
cross-correlation, J. Acoust. Soc. Am. 90(5), 2551–2556.
	References
ABSTRACT
  The intelligent acoustic emission locator is described in Part I, while Part
II discusses blind source separation, time delay estimation and location of two
simultaneously active continuous acoustic emission sources.
  The location of acoustic emission on complicated aircraft frame structures is
a difficult problem of non-destructive testing. This article describes an
intelligent acoustic emission source locator. The intelligent locator comprises
a sensor antenna and a general regression neural network, which solves the
location problem based on learning from examples. Locator performance was
tested on different test specimens. Tests have shown that the accuracy of
location depends on sound velocity and attenuation in the specimen, the
dimensions of the tested area, and the properties of stored data. The location
accuracy achieved by the intelligent locator is comparable to that obtained by
the conventional triangulation method, while the applicability of the
intelligent locator is more general since analysis of sonic ray paths is
avoided. This is a promising method for non-destructive testing of aircraft
frame structures by the acoustic emission method.

<|endoftext|><|startoftext|>
Introduction
The data obtained from LISA [1] will contain a large number of white dwarf binary
systems across the whole observational window [2]. At frequencies below ∼ 3 mHz the
sources are so abundant that they produce a stochastic foreground whose intensity
dominates the instrumental noise [3]. The closer (and louder) sources will still be
sufficiently bright to be individually resolvable. Above ∼ 3 mHz the sources become
sufficiently sparse in parameter space (and in particular in the frequency domain) that
the detectable sources become individually resolvable. The identification of white dwarfs
in the LISA data set represents one of the most interesting analysis problems posed by
the mission: the total number of signals in the data set is unknown, the effective noise
http://arxiv.org/abs/0704.0048v2
WD MLDC1 2
level affecting the measurements is not easily estimated from the data streams, and
there is a large number of overlapping sources to the limit of confusion.
Bayesian inference provides a clear framework to tackle such a problem [4, 5, 6].
Some of us have carried out exploratory studies and “proof of concept” analyses
on simplified problems that have demonstrated that Bayesian techniques do indeed
show good potential for LISA applications [11, 10, 12]. Similarly other authors have
successfully implemented techniques using Bayesian inference [18, 17, 16]. In this paper
we present the first results of an end-to-end analysis pipeline developed in the context of
the Mock LISA Data Challenges that has evolved from our earlier work. This pipeline
is applied to the simplest single-source challenge data sets 1.1.1a and 1.1.1b and all the
results presented here are obtained after the release of the key files. In a companion
paper [19], we present results that we have obtained for the analysis of the data sets
containing gravitational radiation from a massive-black-hole binary inspiral. Our group
submitted an entry for the MLDC analysing the blind data set 1.1.1c [13, 14]: however
that result suffered from the fact that the pipeline was not complete, the analysis code
was inefficient and we encountered hardware problems with the Beowulf cluster used to
perform the analysis.
The results that we present here are obtained with a two-stage end-to-end analysis
pipeline: (i) we first process the data set with a grid-based coherent algorithm to identify
candidate signals; (ii) we then follow up the candidate signals with a Markov Chain
Monte Carlo code to obtain probability density function on the model parameters. Our
method differs from other MCMC methods that have been proposed and applied to
the MLDC data in the context of white dwarf binaries [18, 17, 16]: the MCMC is not
used to search, but only in the final stage of the analysis to produce posterior density
functions of the model parameters. The noise spectral level is included as one of the
unknown parameters and is estimated together with the parameters of the gravitational
wave source(s).
2. Analysis method
In this section we describe the two stage approach that we have adopted for the analysis.
The signal produced by a white dwarf binary system is modelled as monochromatic in
the source reference frame, following the conventions adopted in the first MLDC [7, 8, 9].
It is described by 7 parameters: ecliptic latitude ϑe and longitude ϕe, inclination ι and
polarisation angle Ψ, frequency at a reference time f0 and corresponding overall phase
Φ0 and amplitude A.
The data distributed for the MLDC are the three TDI v1.5 Michelson observables
X , Y and Z ‡. From those we construct the two orthogonal TDI outputs
A = (2X − Y − Z)/3 (1)
E = (Z − Y )/
3 (2)
‡ In our MCMC analysis we use the data set produced using the LISA Simulator.
WD MLDC1 3
by diagonalizing the noise covariance matrix following the procedure presented in [23].
The noise affecting the channels A and E is uncorrelated and described by the one-sided
noise spectral density Sn(f). We model the LISA response function in the low frequency
limit in order to improve the computational efficiency of our analysis.
2.1. First stage: Grid based search
The first stage of the pipeline consists of a fast search of the data for the best
matched filter based on the well-known F -statistic algorithm, first developed for triaxial
pulsar signals in the context of ground-based observations [20]. This exploits the Fast
Fourier Transform to perform matching in the frequency domain to templates which are
generated at an array of fixed points in the parameter space.
The data from an individual detector in the frequency domain d̃(f) is supposed to
contain a signal plus Gaussian noise, d̃(f) = h̃(f) + ñ(f). We define the logarithmic
likelihood as a measure of match, as given by logL ≈ (d̃|h̃)− 1
(h̃|h̃) with (·|·) denoting
the scalar product as defined in [20]. A single signal in the F -statistic algorithm is
re-parameterised as a linear function of four orthogonal variables, and the frequency
f0. The detection statistic is based on four parameters AF , BF , CF and DF , found by
integrating over the response functions a(t) and b(t) to the two polarisation states of
the gravitational wave signal [20],
∫ Tobs
a(t)2dt (3)
b(t)2dt (4)
a(t)b(t)dt, (5)
DF = AFBF − C2F (6)
Tobs denotes the total observed time for the data set being analysed. The optimal
detection statistic 2F , which is pre-maximised over the nuisance parameters h0, ι, φ0
and ψ is
2F = 8
Sn(f)Tobs
BF |Fa|2 + AF |Fb|2 − 2CF ×R(FaFb)
. (7)
Fa and Fb are the demodulated Fourier transforms of the data,
∫ Tobs
d(t)a(t)e−iΦ(t)dt; Fb =
∫ Tobs
d(t)b(t)e−iΦ(t)dt, (8)
Φ(t) is the phase of the gravitational wave signal, as is described in [22].
As the LISA array moves in space, the frequency f0 is affected by Doppler
modulations. This modulation changes with differing position of the source in the sky,
implying the need to recalculate the modulations and thus a(t) and b(t) for each sky
position that is tested - a significant factor in the performance of this approach. The
differing modulation structure however also allows us to estimate the location of the
WD MLDC1 4
source in the sky by maximising the 2F value. The resolution possible on the sky with
this method is not as good as from a full Bayesian posterior probability calculation as
performed in the parameter estimation stage, as shown in an example for Challenge
1.1.1a in figure 1. Nevertheless, since this statistic can be computed fairly quickly it
serves as a useful way of finding initial values to feed into the MCMC routine, as adopted
within the pipeline. The resolution achievable on the sky increases with frequency, which
implies that the mismatch between filter and signal falls off more rapidly at higher
frequencies, requiring a greater number of templates to cover the sky. Therefore for
challenge 1.1.1b at f ≈ 3mHz a sky grid of size 5,752 points was used, in comparison
with 765 points for challenge 1.1.1a at f ≈ 1mHz.
The F -statistic search was implemented using the LIGO “Lalapps” suite of software
[24], in which the pulsar search code was modified by Reinhard Prix and John Whelan
to use the LISA response function for the TDI variables X , Y , and Z [21]. These input
data streams were given in the form of Short Fourier Transforms, each of length one day,
created from the MLDC1 challenge data. For each challenge the full specified range of
frequencies was searched for the signal as it would be in a blind search. The code was
run on a single CPU and executed in a few hours, with the run-time increasing at higher
frequency due to the higher resolution of sky and frequency grid that had to be used.
The candidate chosen to pass to the MCMC stage was simply that which triggered the
highest value of 2F .
2.2. Second stage: Markov Chain Monte Carlo follow-up
According to Bayes’ theorem, the posterior probability, p(m̃|d̃) of a model m̃ given the
data d̃ depends on the prior distribution p(m̃), containing the information known before
the analysis, the likelihood L(d̃|m̃) of the model and a normalisation factor p(d̃)
p(m̃|d̃) = L(d̃|m̃)p(m̃)
p(d̃)
The posterior probability density function shows the joint probability density of given
values of parameters of the model m̃, conditional on the data d̃.
We implemented Bayes’ theorem using data in the form of TDI variables A and E
and modelled our template according to the Long Wavelength Approximation directly
in the Fourier domain [25] to gain computational speed. The logarithmic likelihood
L(d̃|m̃) in this stage explicitly included its dependence on the one-sided noise spectral
density Sn(f)
logL(d̃|m̃) = const. −
log Sn(f) − (d̃− h̃|d̃− h̃), (10)
shown here for either A or E, with the combined likelihood as sum of the individual
likelihoods. We restricted our analysis to a sufficiently narrow frequency window in
order to be able to approximate the noise spectral density as constant, Sn(f) = S0. This
window was set as the interval in frequency that contains at least 98% of the power of our
WD MLDC1 5
Ecliptic Longitude
2F as a function of sky position, at a frequency 0.001063 Hz
1 2 3 4 5 6
Figure 1. The variation of 2F values for the search for unknown signal 1.1.1a, as a
function of sky position, parameterised by ecliptic latitude β and longitude λ. The
distribution is multi-modal and non-Gaussian, and has a poor resolution in comparison
with that can be achieved with the MCMC and a Bayesian likelihood, but by finding the
maximum it serves well as a starting point for the more refined parameter estimation
below.
model m̃, with the interval’s upper and lower limits given by f±(2/year)(5+2πf0AU/c)
[25]. S0 is therefore an additional parameter to be inferred within the model m̃ in Eq. 10.
We implemented an automatic Random Walk Metropolis sampler (Stroeer &
Vecchio 2007, in. prep.) to sample from the posterior probability density function in
form of a Markov chain. Metropolis sampling eliminates the need to explicitly calculate
the normalisation constant in Bayes’ theorem, and the evolving Markov chain gives
easy access to joint as well as marginalised posterior density distribution. The sampler
was started from the parameter set which triggered the highest value of 2F in our grid
based coherent run of the analysis (see former section). The automated function of the
Metropolis sampling is achieved by controlling the sampling step-size with adaptive
acceptance probability techniques [26]. The sampler therefore does not depend on
assumptions about the signal in the data set in order to perform successfully and reliably;
it develops a suitable algorithm and approach by itself based on the properties of the
likelihood as found on the fly, in the initial steps of the sampler. The length of our
Markov chain was pre-set to 106, with the initial 104 chain states discarded as the
“burn-in” phase of our sampler. The runtime for one data analysis run is 5 hours on a
single 2 GHz CPU on the Tsunami cluster of the University of Birmingham.
WD MLDC1 6
Figure 2. The marginalised posterior probability density functions of the eight
unknown parameters – the seven parameters that describe the signal and the noise
spectral density S0 – for the the challenge data set 1.1.1a. The vertical black solid
line denotes the true value of the parameter (for the polarisation angle the true value
modulo π/2), and the grey dashed line the initial value for the MCMC analysis as
determined by the template of the first-stage that produces the maximum value of the
F -statistic. In the case of the noise spectral density the first stage of the analysis does
not provide an estimate; the true value of this parameter is taken to be the value of
the instrumental noise spectrum used to generate the data set and provided in [9].
WD MLDC1 7
Figure 3. The marginalised posterior probability density functions of the eight
unknown parameters for the the challenge data set 1.1.1b. Labels are as in Figure
3. Results
We found that the most promising candidate signal from the F -statistic search already
matched the true embedded signal to high accuracy, particularly in frequency and
sky location. Our MCMC sampler, as a post-processing unit, thus only needed 1000
iterations to burn in and to establish a reliable sampling from the posterior. The
marginalised posteriors are shown in Figs. 2 and 3. We found, as seen in latter
figures, that the MCMC sampler further refined the initial guesses from the F -statistic,
as measured by the absolute difference between the true value of a given parameter
and the median of the marginalised posterior recovered for that parameter. Table 1
WD MLDC1 8
Table 1. Details about the results from Challenge 1.1.1a and Challenge 1.1.1b. S0,
the constant one-sided noise spectral density within our narrow frequency window, is
compared to the true one sided noise spectral density at the true frequency of the
signal, Ψ is given modulo π/2. Int90 denotes the minimum interval to include 90% of
MCMC states for given parameter, ∆mode denotes the absolute difference between the
true value of a signal parameter and the mode of its recovered posterior; ∆median and
∆mean denote the equivalent absolute difference for median and mean of the posterior
respectively; σ denotes the sampled standard deviation of the posterior as derived
from the median. We further quote the signal-to-noise ratio (SNR) for a template
using the true values of the source and the recovered values of the data analysis run,
as derived from the median of the individual posterior distributions, and the correlation
C between these two templates.
Int90 ∆mode ∆median ∆mean σ
Challenge 1.1.1a
10−41Hz−1
(3.53257, 4.72639) -0.42084 -0.440278 -0.452456 0.36704
ϑe/ rad (0.958409, 1.03165) -0.0147383 -0.0149381 -0.0148725 0.0222861
ϕe/ rad (5.05376, 5.13528) -0.00550139 -0.00569547 -0.00579889 0.0247886
Ψ/ rad (1.32475, 0.500553) 0.1768 0.1823 0.1902 0.1908
ι/ rad (0.097761, 1.0008) -0.0459747 0.190001 0.23459 0.295211
A/10−22 (1.61976, 2.67967) 0.664371 0.358844 0.298978 0.368524
f0/ mHz (1.06273, 1.06273) -1.19664e-06 -1.22207e-06 -1.22259e-06 1.04422e-06
Φ0/ rad (3.10668, 5.808) -0.164989 0.00998525 0.229659 0.829146
SNR true = 51.024497 recovered = 50.648600
C true vs. recovered = 0.99689
Challenge 1.1.1b
10−41Hz−1
(0.876833, 1.38959) -0.0679571 -0.0906557 -0.0996144 0.16017
ϑe/ rad (-0.121611, 0.0116916) -0.0343353 -0.151185 -0.150328 0.0406552
ϕe/ rad (4.60969, 4.63537) 0.00265723 0.00305564 0.00302203 0.00779893
Ψ/ rad (0.246328, 0.362409) 0.0301541 0.0311747 0.0311268 0.0353938
ι/ rad (1.22036, 1.33338) -0.0430412 -0.040458 -0.0394818 0.0348383
A/10−22 (0.45001, 0.542454) -0.016442 -0.0151921 -0.0149907 0.0281154
f0/ mHz (3.00036, 3.00036) 3.1221e-07 2.49289e-07 2.42807e-07 8.18111e-07
Φ0/ rad (5.83869, 6.19411) 0.137219 0.119301 0.119921 0.502384
SNR true = 36.587444 recovered = 37.368806
C true vs. recovered = 0.97897
shows details of the statistics of recovered posterior distributions. We highlight that
the majority of the true values of the parameters are within one standard deviation
of the median of the posterior, with a small percentage within two sampled standard
deviations. In addition, every true value of a parameter of the signal is within the
minimum interval of the posterior to cover 90% of all MCMC state values. Recovered
signal-to-noise ratios are measured as SNR = (s|h)/
(h|h), and the match C =
(htrue|hmed)/
(htrue|htrue) (hmed|hmed) between a template constructed from the true
values and a template from the median values of the individual posterior distributions,
yielding a correlation that is always higher than 0.97. Noise levels are determined
accurately and within 1 to 1.5 sampled standard deviations. Nevertheless we note that
WD MLDC1 9
our run on Challenge 1.1.1a shows a lower match and higher differences between true
value and recovered value of parameters as compared to the run on Challenge 1.1.1b. It
also exhibits tailing posterior distributions in inclination and amplitude, although the
SNR of Challenge 1.1.1a is twice the value of Challenge 1.1.1b.
4. Conclusions
We have presented a new approach to LISA data analysis in the form of an end-to-end
pipeline. We first detected and identified candidate signals in the LISA data stream with
a grid-based coherent algorithm, and then post-processed the most promising candidate
signals with an automatic Markov Chain Monte Carlo code to obtain probability
densities for the model’s parameters. We demonstrated successful identification and
post-processing of the signals from the double white dwarf single source MLDC data
sets 1.1.1a and 1.1.1b. Furthermore, the automatic Markov Chain Monte Carlo code
successfully identified the noise level within a small frequency window of interest in
these data sets. We note that a parallel approach to the data analysis of binary inspiral
signals is being developed by Röver et al, with a Markov Chain Monte Carlo method
that can successfully post-process a candidate signal generated from the true parameters
of the signal. Signal detection in a pre-processing stage is currently being tested within
parallel tempered MCMC methods and/or time-frequency analyses [19].
We identify two prominent and promising features of our pipeline: its ability to
determine good initial conditions for the MCMC and its ability to run the MCMC
automatically. As we have demonstrated in this paper, the width of the marginalised
posterior density for the frequency parameter is extremely narrow. It is therefore vital
that the initial estimate of the frequency is within this region, as the almost flat structure
of the posterior PDF outside this region gives little to no information on the location of
the peak. The chances of finding the mode through a random sampling are decreased
further still with a larger prior range for the parameter. Adding an F -statistic search as
the first stage in the pipeline solves this problem, since the frequency and position in the
sky are recovered very accurately, to within the limits of the posterior probability region
of interest, before the MCMC performs post-processing and parameter estimation. The
automatic feature of the MCMC ensures a successful post-processing for the other
astrophysical parameters that may have been located outside the posterior region of
interest by the F -statistic approach, as in the case for the amplitude of Challenge
1.1.1a. Convergence is aided by the ability of our code to increase or decrease sampling
step-sizes according to its experience of the sampling quality of the posterior during the
burn-in phase.
We are working on an extension of the pipeline as shown in this document to
successfully tackle multi-source data sets, required for the second round of the MLDC.
Current work includes the exploration of our grid-based coherent search on such data
streams in order to automatically identify the most promising individual candidate
signals, and the implementation of an automatic Reversible Jump Markov Chain Monte
WD MLDC1 10
Carlo routine (e.g. as already demonstrated in [10]) to find the trans-dimensional
probability density functions of the parameters of an unknown total number of signals.
We highlight that the noise level determination presented here already serves as a key
ingredient to round 2, where the simulation of a galactic white dwarf binary population
introduces additional confusion noise levels from unresolvable sources.
Acknowledgements
Nelson Christensen’s work was supported by the National Science Foundation grant
PHY-0553422 and the Fulbright Scholar Program. Alberto Vecchio’s work was partially
supported by the Packard Foundation and the National Science Foundation. The
University of Auckland group was supported by the Royal Society of New Zealand
Marsden Fund Grant UOA-204.
References
[1] Bender B L et al 1998 LISA Pre-Phase A Report; Second Edition, MPQ 233
[2] Nelemans G, Yungelson L R and Portegies Zwart S F 2001 Astron. and Astrophys. 375 890
[3] Farmer A J and Phinney E S 2003 Mon. Not. R. Astron. Soc 346 1197
[4] Jaynes E T Probability theory: The logic of science 2003 Cambridge University Press
[5] Gregory P C Bayesian logical data analysis for the physical sciences 2005 Cambridge University
Press
[6] Gelman A, Carlin J B, Stern H, and Rubin D B Bayesian data analysis 1997 Chapman & Hall
CRC Boca Raton
[7] Arnaud K A et al 2006 AIP Conf. Proc. 873 619 Preprint gr-qc/0609105
[8] Arnaud K A et al 2006 AIP Conf. Proc. 873 625 Preprint gr-qc/0609106
[9] Mock LISA Data Challenge Task Force, “Document for Challenge 1,”
svn.sourceforge.net/viewvc/lisatools/Docs/challenge1.pdf.
[10] Stroeer A, Gair J and Vecchio A 2006 Automatic Bayesian inference for LISA data analysis
strategies Preprint gr-qc/0609010
[11] Umstätter R, Christensen N, Hendry M, Meyer R, Simha V, Veitch J, Vigeland S and Woan G
2005 Phys Rev D 72 022001
[12] Wickham E D L, Stroeer A and Vecchio A 2006 Class Quantum Grav 23 819
[13] Bloomer E et al Report on MLDC1 available at http://astrogravs.nasa.gov/docs/mldc/round1/entries.html
[14] Arnaud K A et al 2007 Preprint gr-qc/0701139
[15] Arnaud K A et al 2007 Preprint gr-qc/0701170
[16] Crowder, J., and Cornish, N. J. 2007 Phys. Rev. D 75 043008
[17] Crowder J, Cornish N J and Reddinger J L 2006 Phys. Rev. D 73 063011
[18] Cornish N J and Crowder J 2005 Phys. Rev. D 72 043005
[19] Röver C et al in this volume
[20] Jaranowski P, Królak A and Schutz B F 1998 Phys. Rev. D 58 063001
[21] Prix R and Whelan J 2006 Technical note
[22] Brady P R, Creighton T, Cutler C and Schutz B F 1997 Phys. Rev. D 57 2101
[23] Prince T A, Tinto M, Larson S L and Armstrong J W 2002 Phys. Rev. D 66 122002
[24] LAL Home Page: http://www.lsc-group.phys.uwm.edu/daswg/projects/lal.html
[25] Cornish N J, Larson S L 2003 Phys. rev. D 67 103001
[26] Atchade Y F, Rosenthal J S 2005 Bernoulli 11 815-828
http://arxiv.org/abs/gr-qc/0609105
http://arxiv.org/abs/gr-qc/0609106
http://arxiv.org/abs/gr-qc/0609010
http://astrogravs.nasa.gov/docs/mldc/round1/entries.html
http://arxiv.org/abs/gr-qc/0701139
http://arxiv.org/abs/gr-qc/0701170
http://www.lsc-group.phys.uwm.edu/daswg/projects/lal.html
	Introduction
	Analysis method
	First stage: Grid based search
	Second stage: Markov Chain Monte Carlo follow-up
	Results
	Conclusions
ABSTRACT
  We report on the analysis of selected single source data sets from the first
round of the Mock LISA Data Challenges (MLDC) for white dwarf binaries. We
implemented an end-to-end pipeline consisting of a grid-based coherent
pre-processing unit for signal detection, and an automatic Markov Chain Monte
Carlo post-processing unit for signal evaluation. We demonstrate that signal
detection with our coherent approach is secure and accurate, and is increased
in accuracy and supplemented with additional information on the signal
parameters by our Markov Chain Monte Carlo approach. We also demonstrate that
the Markov Chain Monte Carlo routine is additionally able to determine
accurately the noise level in the frequency window of interest.

<|endoftext|><|startoftext|>
Introduction
Isomorphism classes of smooth toric Fano varieties of dimension d correspond
to isomorphism classes of socalled smooth Fano d-polytopes, which are fully
dimensional convex lattice polytopes in Rd, such that the origin is in the
interior of the polytopes and the vertices of every facet is a basis of the
integral lattice Zd ⊂ Rd. Smooth Fano d-polytopes have been intensively
studied for the last decades. They have been completely classified up to
isomorphism for d ≤ 4 ([1], [18], [3], [15]). Under additional assumptions
there are classification results valid in every dimension.
To our knowledge smooth Fano d-polytopes have been classified in the fol-
lowing cases:
• When the number of vertices is d+ 1, d+ 2 or d+ 3 ([9],[2]).
• When the number of vertices is 3d, which turns out to be the upper
bound on the number of vertices ([6]).
• When the number of vertices is 3d− 1 ([19]).
• When the polytopes are centrally symmetric ([17]).
• When the polytopes are pseudo-symmetric, i.e. there is a facet F ,
such that −F is also a facet ([8]).
• When there are many pairs of centrally symmetric vertices ([5]).
http://arxiv.org/abs/0704.0049v1
2 2 SMOOTH FANO POLYTOPES
• When the corresponding toric d-folds are equipped with an extremal
contraction, which contracts a toric divisor to a point ([4]) or a curve
([16]).
Recently a complete classification of smooth Fano 5-polytopes has been an-
nounced ([12]). The approach is to recover smooth Fano d-polytopes from
their image under the projection along a vertex. This image is a reflexive
(d− 1)-polytope (see [3]), which is a fully-dimensional lattice polytope con-
taining the origin in the interior, such that the dual polytope is also a lattice
polytope. Reflexive polytopes have been classified up to dimension 4 using
the computer program PALP ([10],[11]). Using this classification and PALP
the authors of [12] succeed in classifying smooth Fano 5-polytopes.
In this paper we present an algorithm that classifies smooth Fano d-polytopes
for any given d ≥ 1. We call this algorithm SFP (for Smooth Fano Poly-
topes). The input is the positive integer d, nothing else is needed. The
algorithm has been implemented in C++, and used to classify smooth Fano
d-polytopes for d ≤ 7. For d = 6 and d = 7 our results are new:
Theorem 1.1. There are 7622 isomorphism classes of smooth Fano 6-
polytopes and 72256 isomorphism classes of smooth Fano 7-polytopes.
The classification lists of smooth Fano d-polytopes, d ≤ 7, are available on
the authors homepage: http://home.imf.au.dk/oebro
A key idea in the algorithm is the notion of a special facet of a smooth Fano
d-polytope (defined in section 3.1): A facet F of a smooth Fano d-polytope
is called special, if the sum of the vertices of the polytope is a non-negative
linear combination of vertices of F . This allows us to identify a finite subset
Wd of the lattice Z
d, such that any smooth Fano d-polytope is isomorphic to
one whose vertices are contained in Wd (theorem 3.6). Thus the problem of
classifying smooth Fano d-polytopes is reduced to the problem of considering
certain subsets of Wd.
We then define a total order on finite subsets of Zd and use this to define a
total order on the set of smooth Fano d-polytopes, which respects isomor-
phism (section 4). The SFP-algorithm (described in section 5) goes through
certain finite subsets of Wd in increasing order, and outputs smooth Fano
d-polytopes in increasing order, such that any smooth Fano d-polytope is
isomorphic to exactly one in the output list.
As a consequence of the total order on smooth Fano d-polytopes, the algo-
rithm needs not consult the previous output to check for isomorphism to
decide whether or not to output a constructed polytope.
2 Smooth Fano polytopes
We fix a notation and prove some simple facts about smooth Fano polytopes.
The convex hull of a set K ∈ Rd is denoted by convK. A polytope is
the convex hull of finitely many points. The dimension of a polytope P is
the dimension of the affine hull, affP , of the polytope P . A k-polytope is
a polytope of dimension k. A face of a polytope is the intersection of a
supporting hyperplane with the polytope. Faces of polytopes are polytopes.
Faces of dimension 0 are called vertices, while faces of codimension 1 and 2
are called facets and ridges, respectively. The set of vertices of a polytope
P is denoted by V(P ).
Definition 2.1. A convex lattice polytope P in Rd is called a smooth Fano
d-polytope, if the origin is contained in the interior of P and the vertices of
every facet of P is a Z-basis of the lattice Zd ⊂ Rd.
We consider two smooth Fano d-polytopes P1, P2 to be isomorphic, if there
exists a bijective linear map ϕ : Rd → Rd, such that ϕ(Zd) = Zd and
ϕ(P1) = P2.
Whenever F is a (d−1)-simplex in Rd, such that 0 /∈ affF , we let uF ∈ (R
be the unique element determined by 〈uF , F 〉 = {1}. For every w ∈ V(F )
we define uw
∈ (Rd)∗ to be the element where 〈uw
, w〉 = 1 and 〈uw
, w′〉 = 0
for every w′ ∈ V(F ), w′ 6= w. Then {uw
|w ∈ V(F )} is the basis of (Rd)∗
dual to the basis V(F ) of Rd.
When F is a facet of a smooth Fano polytope and v ∈ V(P ), we certainly
have 〈uF , v〉 ∈ Z and
〈uF , v〉 = 1 ⇐⇒ v ∈ V(F ) and 〈uF , v〉 ≤ 0 ⇐⇒ v /∈ V(F ).
The lemma below concerns the relation between the elements uF and uF ′ ,
when F and F ′ are adjacent facets.
Lemma 2.2. Let F be a facet of a smooth Fano polytope P and v ∈ V(F ).
Let F ′ be the unique facet which intersects F in a ridge R of P , v /∈ V(R).
Let v′ = V(F ′) \ V(R).
1. 〈uv
, v′〉 = −1.
2. 〈uF , v
′〉 = 〈uF ′ , v〉.
3. 〈uF ′ , x〉 = 〈uF , x〉+ 〈u
, x〉(〈uF , v
′〉 − 1) for any x ∈ Rd.
4. In particular,
• 〈uv
, x〉 < 0 iff 〈uF ′ , x〉 > 〈uF , x〉.
• 〈uv
, x〉 > 0 iff 〈uF ′ , x〉 < 〈uF , x〉.
• 〈uv
, x〉 = 0 iff 〈uF ′ , x〉 = 〈uF , x〉.
for any x ∈ Rd.
4 2 SMOOTH FANO POLYTOPES
5. Suppose x 6= v′ is a vertex of P where 〈uv
, x〉 < 0. Then 〈uF , v
〈uF , x〉.
Proof. The sets V(F ) and V(F ′) are both bases of the lattice Zd and the
first statement follows.
We have v + v′ ∈ span(F ∩ F ′), and then the second statement follows.
Use the previous statements to calculate 〈uF ′ , x〉.
〈uF ′ , x〉 = 〈uF ′ ,
w∈V(F )
〈uwF , x〉w〉
w∈V(F )\{v}
〈uwF , x〉+ 〈u
F , x〉〈uF ′ , v〉
= 〈uF , x〉+ 〈u
F , x〉
〈uF ′ , v〉 − 1
= 〈uF , x〉+ 〈u
F , x〉
〈uF , v
′〉 − 1
As 〈uF , v
′〉 − 1 < 0 the three equivalences follow directly.
Suppose there is a vertex x ∈ V(P ), such that 〈uv
, x〉 < 0 and 〈uF , v
〈uF , x〉. Then
〈uF ′ , x〉 = 〈uF , x〉+ 〈u
F , x〉(〈uF , v
′〉 − 1) ≥ 〈uF , x〉 − (〈uF , v
′〉 − 1) ≥ 1.
Hence x is on the facet F ′. But this cannot be the case as V(F ′) = {v′} ∪
V(F ) \ {v}. Thus no such x exists.
And we’re done.
In the next lemma we show a lower bound on the numbers 〈uw
, v〉, w ∈ V(F ),
for any facet F and any vertex v of a smooth Fano d-polytope.
Lemma 2.3. Let F be a facet and v a vertex of a smooth Fano polytope P .
〈uwF , v〉 ≥
0 〈uF , v〉 = 1
−1 〈uF , v〉 = 0
〈uF , v〉 〈uF , v〉 < 0
for every w ∈ V(F ).
Proof. When 〈uF , v〉 = 1 the statement is obvious.
Suppose 〈uF , v〉 = 0 and 〈u
, v〉 < 0 for some w ∈ V(F ). Let F ′ be the
unique facet intersecting F in the ridge conv{V(F ) \ {w}}. By lemma 2.2
〈uF ′ , v〉 > 0. As 〈uF ′ , v〉 ∈ Z we must have 〈uF ′ , v〉 = 1. This implies
〈uF , v〉 = −1.
Suppose 〈uF , v〉 < 0 and 〈u
, v〉 < 〈uF , v〉 ≤ −1 for some w ∈ V(F ). Let
F ′ 6= F be the facet containing the ridge conv{V(F ) \ {w}}, and let w′ be
the unique vertex in V(F ′) \ V(F ). Then by lemma 2.2
〈uF ′ , v〉 = 〈uF , v〉 + 〈u
F , v〉(〈uF , w
′〉 − 1) ≥ 〈uF , v〉 − 〈u
F , v〉.
If 〈uF , v〉 − 〈u
, v〉 > 0, then v is on the facet F ′. But this is not the case
as 〈uw
, v〉 < −1. We conclude that 〈uw
, v〉 ≥ 〈uF , v〉.
When F is a facet and v a vertex of a smooth Fano d-polytope P , such that
〈uF , v〉 = 0, we can say something about the face lattice of P .
Lemma 2.4 ([7] section 2.3 remark 5(2), [13] lemma 5.5). Let F be a facet
and v be vertex of a smooth Fano polytope P . Suppose 〈uF , v〉 = 0.
Then conv{{v} ∪ V(F ) \ {w}} is a facet of P for every w ∈ V(F ) with
, v〉 = −1.
Proof. Follows from the proof of lemma 2.3.
3 Special embeddings of smooth Fano polytopes
In this section we find a concrete finite subset Wd of Z
d with the nice prop-
erty that any smooth Fano d-polytope is isomorphic to one whose vertices
are contained in Wd. The problem of classifying smooth Fano d-polytopes
is then reduced to considering subsets of Wd.
3.1 Special facets
The following definition is a key concept.
Definition 3.1. A facet F of a smooth Fano d-polytope P is called special,
if the sum of the vertices of P is a non-negative linear combination of V(F ),
that is
v∈V(P )
w∈V(F )
aww , aw ≥ 0.
Clearly, any smooth Fano d-polytope has at least one special facet.
Let F be a special facet of a smooth Fano d-polytope P . Then
0 ≤ 〈uF ,
v∈V(P )
v〉 = d+
v∈V(P ),〈uF ,v〉<0
〈uF , v〉,
which implies −d ≤ 〈uF , v〉 ≤ 1 for any vertex v of P . By using the lower
bound on the numbers 〈uw
, v〉, w ∈ V(F ) (see lemma 2.3), we can find an
explicite finite subset of the lattice Zd, such that every v ∈ V(P ) is contained
in this subset. In the following lemma we generalize this observation to
subsets of V(P ) containing V(F ).
Lemma 3.2. Let P be a smooth Fano polytope. Let F be a special facet of
P and let V be a subset of V(P ) containing V(F ), whose sum is ν.
〈uF , ν〉 ≥ 0
6 3 SPECIAL EMBEDDINGS OF SMOOTH FANO POLYTOPES
〈uwF , ν〉 ≤ 〈uF , ν〉+ 1
for every w ∈ V(F ).
Proof. For convenience we set U = V(P ) \ V and µ =
v∈U v. Since F is a
special facet we know that
0 ≤ 〈uF ,
v∈V(P )
v〉 = 〈uF , ν〉+ 〈uF , µ〉.
The set V(F ) is contained in V so 〈uF , v〉 ≤ 0 for every v in U , hence
〈uF , ν〉 ≥ 0.
Suppose that for some w ∈ V(F ) we have 〈uw
, ν〉 > 〈uF , ν〉+ 1. By lemma
2.3 we know that
〈uwF , v〉 ≥
−1 〈uF , v〉 = 0
〈uF , v〉 〈uF , v〉 < 0
for every vertex v ∈ V(P ) \ V(F ). There is at most one vertex v of P ,
〈uF , v〉 = 0, with negative coefficient 〈u
, v〉 (lemma 2.4). So
〈uwF , µ〉 ≥ 〈uF , µ〉 − 1.
Now, consider 〈uw
v∈V(P ) v〉.
〈uwF ,
v∈V(P )
v〉 = 〈uwF , ν〉+ 〈u
F , µ〉 > 〈uF , ν〉+ 〈uF , µ〉 = 〈uF ,
v∈V(P )
But this implies that 〈ux
v∈V(P ) v〉 is negative for some x ∈ V(F ). A
contradiction.
Corollary 3.3. Let F be a special facet and v any vertex of a smooth Fano
d-polytope. Then −d ≤ 〈uF , v〉 ≤ 1 and
〈uF , v〉
≤ 〈uwF , v〉 ≤
1 , 〈uF , v〉 = 1
d− 1 , 〈uF , v〉 = 0
d+ 〈uF , v〉 , 〈uF , v〉 < 0
for every w ∈ V(F ).
Proof. For 〈uF , v〉 = 1 the statement is obvious. When 〈uF , v〉 = 0 the
coefficients of v with respect to the basis V(F ) is bounded below by −1
(lemma 2.3), so no coefficient exceeds d− 1.
So the case 〈uF , v〉 < 0 remains. The lower bound is by lemma 2.3. Use
lemma 3.2 on the subset V = V(F ) ∪ {v} to prove the upper bound.
3.2 Special embeddings 7
3.2 Special embeddings
Let (e1, . . . , ed) be a fixed basis of the lattice Z
d ⊂ Rd.
Definition 3.4. Let P be a smooth Fano d-polytope. Any smooth Fano
d-polytope Q, with conv{e1, . . . , ed} as a special facet, is called a special
embedding of P , if P and Q are isomorphic.
Obviously, for any smooth Fano polytope P , there exists at least one special
embedding of P . As any polytope has finitely many facets, there exists only
finitely many special embeddings of P .
Now we define a subset of Zd which will play an important part in what
follows.
Definition 3.5. By Wd we denote the maximal set (with respect to inclu-
sion) of lattice points in Zd such that
1. The origin is not contained in Wd.
2. The points in Wd are primitive lattice points.
3. If a1e1 + . . .+ aded ∈ Wd, then −d ≤ a ≤ 1 for a = a1 + . . .+ ad and
≤ ai ≤
1 , a = 1
d− 1 , a = 0
d+ a , a < 0
for every i = 1, . . . , d.
The next theorem is one of the key results in this paper. It allows us to
classify smooth Fano d-polytopes by considering subsets of the explicitely
given set Wd.
Theorem 3.6. Let P be an arbitrary smooth Fano d-polytope, and Q any
special embedding of P . Then V(Q) is contained in the set Wd.
Proof. Follows directly from corollary 3.3 and the definition of Wd.
4 Total ordering of smooth Fano polytopes
In this section we define a total order on the set of smooth Fano d-polytopes
for any fixed d ≥ 1.
Throughout the section (e1, . . . , ed) is a fixed basis of the lattice Z
8 4 TOTAL ORDERING OF SMOOTH FANO POLYTOPES
4.1 The order of a lattice point
We begin by defining a total order � on Zd.
Definition 4.1. Let x = x1e1 + . . . + xded, y = y1e1 + . . . + yded be two
lattice points in Zd. We define x � y if and only if
(−x1 − . . .− xd, x1, . . . , xd) ≤lex (−y1 − . . .− yd, y1, . . . , yd),
where ≤lex is the lexicographical ordering on the product of d + 1 copies of
the ordered set (Z,≤).
The ordering � is a total order on Zd.
Example. (0, 1) ≺ (−1, 1) ≺ (1,−1) ≺ (−1, 0).
Let V be any nonempty finite subset of lattice points in Zd. We define max V
to the maximal element in V with respect to the ordering�. Similarly, minV
is defined to be the minimal element in V .
A important property of the ordering is shown in the following lemma.
Lemma 4.2. Let P be a smooth Fano d-polytope, such that conv{e1, . . . , ed}
is a facet of P . For every 1 ≤ i ≤ d, let vi 6= ei denote the vertex of P , such
that conv{e1, . . . , ei−1, vi, ei+1, . . . , ed} is a facet of P .
Then vi = min{v ∈ V(P ) | 〈u
, v〉 < 0}.
Proof. By lemma 2.2.(1) the vertex vi is in the set {v ∈ V(P ) | 〈u
, v〉 < 0},
and by lemma 2.2.(5) and the definition of the ordering �, vi is the minimal
element in this set.
In fact, we have chosen the ordering � to obtain the property of lemma 4.2,
and any other total order on Zd having this property can be used in what
follows.
4.2 The order of a smooth Fano d-polytope
We can now define an ordering on finite subsets of Zd. The ordering is
defined recursively.
Definition 4.3. Let X and Y be finite subsets of Zd. We define X � Y if
and only if X = ∅ or
Y 6= ∅ ∧ (minX ≺ minY ∨ (minX = minY ∧X\{minX} � Y \{min Y })).
Example. ∅ ≺ {(0, 1)} ≺ {(0, 1), (−1, 1)} ≺ {(0, 1), (1,−1)} ≺ {(−1, 1)}.
When W is a nonempty finite set of subsets of Zd, we define maxW to be the
maximal element in W with respect to the ordering of subsets �. Similarly,
minW is the minimal element in W .
Now, we are ready to define the order of a smooth Fano d-polytope.
4.3 Permutation of basisvectors and presubsets 9
Definition 4.4. Let P be a smooth Fano d-polytope. The order of P ,
ord(P ), is defined as
ord(P ) := min{V(Q) | Q a special embedding of P}.
The set is non-empty and finite, so ord(P ) is well-defined.
Let P1 and P2 be two smooth Fano d-polytopes. We say that P1 ≤ P2 if
and only if ord(P1) � ord(P2). This is indeed a total order on the set of
isomorphism classes of smooth Fano d-polytopes.
4.3 Permutation of basisvectors and presubsets
The group Sd of permutations of d elements acts on Z
d is the obvious way
by permuting the basisvectors:
σ.(a1e1 + . . .+ aded) := a1eσ(1) + . . .+ adeσ(d) , σ ∈ Sd.
Similarly, Sd acts on subsets of Z
σ.X := {σ.x | x ∈ X}.
In this notation we clearly have for any special embedding P of a smooth
Fano d-polytope
ord(P ) � min{σ.V(P ) | σ ∈ Sd}.
Let V and W be finite subsets of Zd. We say that V is a presubset of W , if
V ⊆ W and v ≺ w whenever v ∈ V and w ∈ W \ V .
Example. {(0, 1), (−1, 1)} is a presubset of {(0, 1), (−1, 1), (1,−1)}, while
{(0, 1), (1,−1)} is not.
Lemma 4.5. Let P be a smooth Fano polytope. Then every presubset V of
ord(P ) is the minimal element in {σ.V | σ ∈ Sd}.
Proof. Let ord(P ) = {v1, . . . , vn}, v1 ≺ . . . ≺ vn. Suppose there exists a
permutation σ and a k, 1 ≤ k ≤ n, such that
σ.{v1, . . . , vk} = {w1, . . . , wk} ≺ {v1, . . . , vk},
where w1 ≺ . . . ≺ wk. Then there is a number j, 1 ≤ j ≤ k, such that
wi = vi for every 1 ≤ i < j and wj ≺ vj.
Let σ act on {v1, . . . , vn}.
σ.{v1, . . . , vn} = {x1, . . . , xn} , x1 ≺ . . . ≺ xn.
Then xi � vi for every 1 ≤ i < j and xj ≺ vj. So σ.ord(P ) ≺ ord(P ), but
this contradicts the definition of ord(P ).
10 5 THE SFP-ALGORITHM
5 The SFP-algorithm
In this section we describe an algorithm that produces the classification list
of smooth Fano d-polytopes for any given d ≥ 1. The algorithm works by
going through certain finite subsets of Wd in increasing order (with respect
to the ordering defined in the previous section). It will output a subset V
iff convV is a smooth Fano d-polytope P and ord(P ) = V .
Throughout the whole section (e1, . . . , ed) is a fixed basis of Z
d and I denotes
the (d− 1)-simplex conv{e1, . . . , ed}.
5.1 The SFP-algorithm
The SFP-algorithm consists of three functions,
SFP, AddPoint and CheckSubset.
The finite subsets of Wd are constructed by the function AddPoint, which
takes a subset V , {e1, . . . , ed} ⊆ V ⊆ Wd, together with a finite set F ,
I ∈ F , of (d − 1)-simplices in Rd as input. It then goes through every v in
the set
{v ∈ Wd | max V ≺ v}
in increasing order, and recursively calls itself with input V ∪ {v} and some
set F ′ of (d − 1)-simplices of Rd, F ⊆ F ′. In this way subsets of Wd are
considered in increasing order.
Whenever AddPoint is called, it checks if the input set V is the vertex set of
a special embedding of a smooth Fano d-polytope P such that ord(P ) = V ,
in which case the polytope P = convV is outputted.
For any given integer d ≥ 1 the function SFP calls the function AddPoint
with input {e1, . . . , ed} and {I}. In this way a call SFP(d) will make the
algorithm go through every finite subset of Wd containing {e1, . . . , ed}, and
smooth Fano d-polytopes are outputted in strictly increasing order.
It is vital for the effectiveness of the SFP-algorithm, that there is some
efficient way to check if a subset V ⊆ Wd is a presubset of ord(P ) for some
smooth Fano d-polytope P . The function AddPoint should perform this
check before the recursive call AddPoint(V,F ′).
If P is any smooth Fano d-polytope, then any presubset V of ord(P ) is the
minimal element in the set {σ.V |σ ∈ Sd} (by lemma 4.5). In other words, if
there exists a permutation σ such that σ.V ≺ V , then the algorithm should
not make the recursive call AddPoint(V ).
But this is not the only test we wish to perform on a subset V before the
recursive call. The function CheckSubset performs another test: It takes
a subset V , {e1, . . . , ed} ⊆ V ⊆ Wd as input together with a finite set of
(d−1)-simplices F , I ∈ F , and returns a set F ′ of (d−1)-simplices containing
F , if there exists a special embedding P of a smooth Fano d-polytope, such
5.2 An example of the reasoning in CheckSubset 11
1. V is a presubset of V(P )
2. F is a subset of the facets of P
This is proved in theorem 5.1. If no such special embedding exists, then
CheckSubset returns false in many cases, but not always! Only when
CheckSubset(V,F) returns a set F ′ of simplices, we allow the recursive
call AddPoint(V,F ′).
Given input V ⊆ Wd and a set F of (d − 1)-simplices of R
d, the function
CheckSubset works in the following way: Suppose V is a presubset of V(P )
for some special embedding P of a smooth Fano d-polytope and F is a subset
of the facets of P . Deduce as much as possible of the face lattice of P and
look for contradictions to the lemmas stated in section 2. The more facets
we know of P , the more restrictions we can put on the vertex set V(P ), and
then on V . If a contradiction arises, return false. Otherwise, return the
deduced set of facets of P .
The following example illustrates how the function CheckSubset works.
5.2 An example of the reasoning in CheckSubset
Let d = 5 and V = {v1, . . . , v8}, where
v1 = e1 , v2 = e2 , v3 = e3 , v4 = e4 , v5 = e5
v6 = −e1 − e2 + e4 + e5 , v7 = e2 − e3 − e4 , v8 = −e4 − e5.
Suppose P is a special embedding of a smooth Fano 5-polytope, such that
V is a presubset of V(P ). Certainly, the simplex I is a facet of P .
Notice, that V does not violate lemma 3.2.
v1 + . . . + v8 = e2 + e5.
If V did contradict lemma 3.2, then the polytope P could not exist, and
CheckSubset(V, {I}) should return false.
For simplicity we denote any k-simplex conv{vi1 , . . . , vik} by {i1, . . . , ik}.
Since 〈uI , v6〉 = 0, the simplices F1 = {2, 3, 4, 5, 6} and F2 = {1, 3, 4, 5, 6}
are facets of P (lemma 2.4).
There are exactly two facets of P containing the ridge {1, 2, 4, 5}. One of
them is I. Suppose the other one is {1, 2, 4, 5, 9}, where v9 is some lattice
point not in V , v9 ∈ V(P ). Then 〈uI , v9〉 > 〈uI , v7〉 by lemma 2.2.(5)
and then v9 ≺ v7 by the definition of the ordering of lattice points Z
But then V is not a presubset of V(P ). This is the nice property of the
ordering of Zd, and the reason why we chose it as we did. We conclude that
F3 = {1, 2, 4, 5, 7} is a facet of P , and by similar reasoning F4 = {1, 2, 3, 5, 8}
and F5 = {1, 2, 3, 4, 8} are facets of P .
12 5 THE SFP-ALGORITHM
Now, for each of the facets Fi and every point vj ∈ V , we check if 〈uFi , vj〉 =
0. If this is the case, then by lemma 2.4 conv({vj} ∪ V(Fi) \ {w}) is a facet
of P for every w ∈ V(Fi) where 〈u
, vj〉 < 0. In this way we get that
{2, 4, 5, 6, 7} , {1, 4, 5, 6, 7} , {1, 2, 3, 7, 8} , {1, 3, 5, 7, 8}
are facets of P .
We continue in this way, until we cannot deduce any new facet of P . Every
time we find a new facet F we check that v is beneath F (that is 〈uF , v〉 ≤ 1)
and that lemma 2.3 holds for any v ∈ V . If not, then CheckSubset(V, {I})
should return false.
If no contradiction arises, CheckSubset(V, {I}) returns the set of deduced
facets.
5.3 The SFP-algorithm in pseudo-code
Input: A positive integer d.
Output: A list of special embeddings of smooth Fano d-polytopes, such that
1. Any smooth Fano d-polytope is isomorphic to one and only one poly-
tope in the output list.
2. If P is a smooth Fano d-polytope in the output list, then V(P ) =
ord(P ).
3. If P1 and P2 are two non-isomorphic smooth Fano d-polytopes in the
output list and P1 preceeds P2 in the output list, then ord(P1) ≺
ord(P2).
SFP ( an integer d ≥ 1 )
1. Construct the set V = {e1, . . . , ed} and the simplex I = convV .
2. Call the function AddPoint(V, {I}).
3. End program.
AddPoint ( a subset V where {e1, . . . , ed} ⊆ V ⊆ Wd , a set of (d − 1)-
simplices F in Rd where I ∈ F )
1. If P = conv(V(V )) is a smooth Fano d-polytope and V(V ) = ord(P ),
then output P .
2. Go through every v ∈ Wd, maxV(V ) ≺ v, in increasing order with
respect to the ordering ≺:
(a) If CheckSubset(V ∪ {v},F) returns false, then goto (d). Oth-
erwise let F ′ be the returned set of (d− 1)-simplices.
5.4 Justification of the SFP-algorithm 13
(b) If V ∪ {v} 6= min{σ.(V ∪ {v}) | σ ∈ Sd}, then goto (d).
(c) Call the function AddPoint(V ∪ {v},F ′).
(d) Let v be the next element in Wd and go back to (a).
3. Return
CheckSubset ( a subset V where {e1, . . . , ed} ⊆ V ⊆ Wd , a set of (d− 1)-
simplices F in Rd where I ∈ F )
1. Let ν =
v∈V v.
2. If 〈uI , ν〉 < 0, then return false.
3. If 〈u
, ν〉 > 1 + 〈uI , ν〉 for some i, then return false.
4. Let F ′ = F .
5. For every i ∈ {1, . . . , d}: If the set {v ∈ V |〈u
, v〉 < 0} is equal to
{max V }, then add the simplex conv({max V } ∪ V(I) \ {ei}) to F
6. If there exists F ∈ F ′ such that V(F ) is not a Z-basis of Zd, then
return false.
7. If there exists F ∈ F ′ and v ∈ V such that 〈uF , v〉 > 1, then return
false.
8. If there exists F ∈ F ′, v ∈ V and w ∈ V(F ), such that
〈uwF , v〉 <
0 〈uF , v〉 = 1
−1 〈uF , v〉 = 0
〈uF , v〉 〈uF , v〉 < 0
then return false.
9. If there exists F ∈ F ′, v ∈ V and w ∈ V(F ), such that 〈uF , v〉 = 0 and
, v〉 = −1, then consider the simplex F ′ = conv({v}∪V(F ) \ {w}).
If F ′ /∈ F ′, then add F ′ to F ′ and go back to step 6.
10. Return F ′.
5.4 Justification of the SFP-algorithm
The following theorems justify the SFP-algorithm.
Theorem 5.1. Let P be a special embedding of a smooth Fano d-polytope
and V a presubset of V(P ), such that {e1, . . . , ed} ⊆ V . Let F be a set of
facets of P .
Then CheckSubset(V,F) returns a subset F ′ of the facets of P and F ⊆ F ′.
14 6 CLASSIFICATION RESULTS AND WHERE TO GET THEM
Proof. By lemma 3.2 the subset V will pass the tests in step 2 and 3 in
CheckSubset.
The function CheckSubset constructs a set F ′ of (d−1)-simplices contain-
ing the input set F . We now wish to prove that every simplex F in F ′ is a
facet of P : By the assumptions the subset F ⊆ F ′ consists of facets of P .
Consider the addition of a simplex Fi, 1 ≤ i ≤ d, in step 5:
Fi = conv({max V } ∪ V(I) \ {ei}).
As maxV is the only element in the set {v ∈ V |〈uei
, v〉 < 0} and V is a
presubset of V(P ), Fi is a facet of P by lemma 4.2.
Consider the addition of simplices in step 9: If F is a facet of P , then by
lemma 2.4 the simplex conv({v} ∪ V(F ) \ {w}) is a facet of P .
By induction we conclude, that every simplex in F ′ is a facet of P . Then
any simplex F ∈ F ′ will pass the tests in steps 6–8 (use lemma 2.3 to see
that the last test is passed).
This proves the theorem.
Theorem 5.2. The SFP-algorithm produces the promised output.
Proof. Let P be a smooth Fano d-polytope. Clearly, P is isomorphic to at
most one polytope in the output list.
Let Q be a special embedding of P such that V(Q) = ord(P ). We need to
show that Q is in the output list. Let V(Q) = {e1, . . . , ed, q1, . . . , qk}, where
q1 ≺ . . . ≺ qk, and let Vi = {e1, . . . , ed, q1, . . . , qi} for every 1 ≤ i ≤ k.
Certainly the function AddPoint has been called with input {e1, . . . , ed}
and {I}.
By theorem 5.1 the function call CheckSubset(V1 , {I}) returns a set F1 of
(d − 1)-simplices which are facets of Q, I ⊂ F1. By lemma 4.5 the set V1
passes the test in 2b in AddPoint. Then AddPoint is called recursively
with input V1 and F1.
The call CheckSubset(V1,F1) returns a subset F2 of facets of Q, and the
set V2 passes the test in 2b in AddPoint. So the call AddPoint(V2,F2) is
made.
Proceed in this way to see that the call AddPoint(Vk ,Fk) is made, and then
the polytope Q = convVk is outputted in step 1 in AddPoint.
6 Classification results and where to get them
A modified version of the SFP-algorithm has been implemented in C++,
and used to classify smooth Fano d-polytopes for d ≤ 7. On an average
home computer our program needs less than one day (january 2007) to con-
struct the classification list of smooth Fano 7-polytopes. These lists can be
downloaded from the authors homepage: http://home.imf.au.dk/oebro
REFERENCES 15
An advantage of the SFP-algorithm is that it requires almost no memory:
When the algorithm has found a smooth Fano d-polytope P , it needs not
consult the output list to decide whether to output the polytope P or not.
The construction guarentees that V(P ) = min{σ.V(P ) | σ ∈ Sd} and it
remains to check if V(P ) = ord(P ). Thus there is no need of storing the
output list.
The table below shows the number of isomorphism classes of smooth Fano
d-polytopes with n vertices.
n d = 1 d = 2 d = 3 d = 4 d = 5 d = 6 d = 7
4 2 1
5 1 4 1
6 1 7 9 1
7 4 28 15 1
8 2 47 91 26 1
9 27 268 257 40
10 10 312 1318 643
11 1 137 2807 5347
12 1 35 2204 19516
13 5 771 26312
14 2 186 14758
15 39 4362
16 11 1013
17 1 214
18 1 43
Total 1 5 18 124 866 7622 72256
References
[1] V. V. Batyrev, Toroidal Fano 3-folds, Math. USSR-Izv. 19 (1982), 13–
[2] V. V. Batyrev, On the classification of smooth projective toric varieties,
Tohoku Math. J. 43 (1991), 569–585.
[3] V. V. Batyrev, On the classification of toric Fano 4-folds, J. Math. Sci.
(New York) 94 (1999), 1021–1050.
[4] L. Bonavero, Toric varieties whose blow-up at a point is Fano. Tohoku
Math. J. 54 (2002), 593–597.
16 REFERENCES
[5] C. Casagrande, Centrally symmetric generators in toric Fano varieties,
Manuscr. Math. 111 (2003), 471–485.
[6] C. Casagrande, The number of vertices of a Fano polytope, Ann. Inst.
Fourier 56 (2006), 121–130.
[7] O. Debarre, Toric Fano varieties in Higher dimensional varieties and
rational points, lectures of the summer school and conference, Budapest
2001, Bolyai Society Mathematical Studies 12, Springer, 2001.
[8] G. Ewald, On the classification of toric Fano varieties, Discrete Com-
put. Geom. 3 (1988), 49–54.
[9] P. Kleinschmidt, A classification of toric varieties with few generators,
Aequationes Math 35 (1988), no.2-3, 254–266.
[10] M. Kreuzer & H. Skarke, Classification of reflexive polyhedra in three
dimensions, Adv. Theor. Math. Phys. 2 (1998), 853–871.
[11] M. Kreuzer & H. Skarke, Complete classification of reflexive polyhedra
in four dimensions, Adv. Theor. Math. Phys. 4 (2000), 1209–1230.
[12] M. Kreuzer & B. Nill, Classification of toric Fano 5-folds, Preprint,
math.AG/0702890.
[13] B. Nill, Gorenstein toric Fano varieties, Manuscr. Math. 116 (2005),
183–210.
[14] B. Nill. Classification of pseudo-symmetric simplicial reflexive poly-
topes, Preprint, math.AG/0511294, 2005.
[15] H. Sato, Toward the classification of higher-dimensional Toric Fano
varieties,. Tohoku Math. J. 52 (2000), 383–413.
[16] H. Sato, Toric Fano varieties with divisorial contractions to curves.
Math. Nachr. 261/262 (2003), 163–170.
[17] V.E. Voskresenskij & A. Klyachko, Toric Fano varieties and systems of
roots. Math. USSR-Izv. 24 (1985), 221–244.
[18] K. Watanabe & M. Watanabe, The classification of Fano 3-folds with
torus embeddings, Tokyo Math. J. 5 (1982), 37–48.
[19] M. Øbro, Classification of terminal simplicial reflexive d-polytopes with
3d− 1 vertices, Preprint, math.CO/0703416.
http://arxiv.org/abs/math/0702890
http://arxiv.org/abs/math/0511294
http://arxiv.org/abs/math/0703416
REFERENCES 17
Department of Mathematics
University of Århus
8000 Århus C
Denmark
E-mail address : oebro@imf.au.dk
	Introduction
	Smooth Fano polytopes
	Special embeddings of smooth Fano polytopes
	Special facets
	Special embeddings
	Total ordering of smooth Fano polytopes
	The order of a lattice point
	The order of a smooth Fano d-polytope
	Permutation of basisvectors and presubsets
	The SFP-algorithm
	The SFP-algorithm
	An example of the reasoning in CheckSubset
	The SFP-algorithm in pseudo-code
	Justification of the SFP-algorithm
	Classification results and where to get them
ABSTRACT
  We present an algorithm that produces the classification list of smooth Fano
d-polytopes for any given d. The input of the algorithm is a single number,
namely the positive integer d. The algorithm has been used to classify smooth
Fano d-polytopes for d<=7. There are 7622 isomorphism classes of smooth Fano
6-polytopes and 72256 isomorphism classes of smooth Fano 7-polytopes.

<|endoftext|><|startoftext|>
Intelligent location of two simultaneously active
acoustic emission sources:
Part II
Tadej Kosel and Igor Grabec
Faculty of Mechanical Engineering, University of Ljubljana,
Aškerčeva 6, POB 394, SI-1001 Ljubljana, Slovenia
e-mail: tadej.kosel@guest.arnes.si; igor.grabec@fs.uni-lj.si
Abstract— Part I describes an intelligent acoustic emission
locator, while Part II discusses blind source separation, time
delay estimation and location of two continuous acoustic emission
sources.
Acoustic emission (AE) analysis is used for characterization
and location of developing defects in materials. AE sources
often generate a mixture of various statistically independent
signals. A difficult problem of AE analysis is separation and
characterization of signal components when the signals from
various sources and the mode of mixing are unknown. Recently,
blind source separation (BSS) by independent component analysis
(ICA) has been used to solve these problems. The purpose of
this paper is to demonstrate the applicability of ICA to locate
two independent simultaneously active acoustic emission sources
on an aluminum band specimen. The method is promising for
non-destructive testing of aircraft frame structures by acoustic
emission analysis.
INTRODUCTION
A common goal of many non-destructive testing methods is
to detect defects in materials. Acoustic emission analysis (AE)
is a passive testing method used to locate and characterize
defects which emit sound[10].
There are many ways to deduce the location of an AE
source from electrical signals detected by a chain of sensors.
The corresponding problems may be classified by the type of
acoustic source mechanism as the location of a continuous
emission source, such as that generated by a leak, or as the
location of discrete emission, such as an AE burst caused by a
growing crack. This paper describes a method for processing
continuous AE signals to determine the time delay (T-D)
between signals and thus to provide information for location
of AE sources. It should be pointed out that application of
AE source characteristics, such as count, count rate, ampli-
tude distribution, and conventional time delay measurement,
becomes meaningless when dealing with continuous acoustic
sources.
The basic information for AE source location consists of
T-D between stress waves detected at different positions on
a specimen. In the case of only one active AE source, T-
D of continuous acoustic waves can be estimated using the
cross-correlation function (CCF) of sensor signals described
in Part I of this article[10], [7]. In the case of two (or
Manuscript generated: January 31, 2007
more) simultaneously active AE sources, this method is not
applicable, since analysis of the CCF leads only to the T-D
of the most powerful AE signal. Detection of simultaneously
active independent AE source signals therefore requires a more
sophisticated approach.
The purpose of our study was to find a suitable method for
processing a mixture of two simultaneously active continuous
AE signals to determine the T-D and, related to this, the
coordinates of both AE sources. We found that the Blind
Source Separation (BSS) method solves this problem satisfac-
torily. BSS is a general signal processing method involving
the recovery of the contributions of different sources from
a finite set of observations recorded by sensors, independent
of the propagation medium and without any prior knowledge
of the sources. BSS has already been successfully applied
in medicine, telecommunications, image processing etc[8].
However, it is also a promising method for AE analysis of
aircraft structures, because AE signals are often hidden in a
mixture of signals from various sources. BSS could extract
the specific signature of each AE source, which can further
be used for location and characterization purposes, or to
isolate AE sources from background noise. We conducted
experiments with BSS on an aluminum beam on which two
continuous AE sources were generated simultaneously by air
flow.
METHODS
In this section we explain two different methods for time
delay estimation of AE sources. The first method is based on
analysis of the CCF and is convenient for T-D estimation of
one active continuous AE source as is described in Part I[10],
[7], [12]. The CCF exhibits a peak when the delay parameter
compensates the T-D between the sensor signals [10]. The T-D
is thus determined by the position of the highest peak of the
CCF. The second method is based on BSS algorithm and is
convenient for T-D estimation of two (or more) simultaneously
active continuous AE sources[9]. Location of two simulta-
neously active AE sources was performed by an intelligent
locator based on a general regression neural network[5] as is
described in Part I.
Multichannel Blind Source Separation has recently received
increased attention due to the importance of its potential
http://arxiv.org/abs/0704.0050v1
applications[3]. It occurs in many fields of engineering and
applied sciences, including processing of signals from antenna
array, speech and geophysical data processing, noise reduction,
biological system analysis, etc. It consists of recovering signals
emitted by unknown sources and mixed by an unknown
medium (material where waves propagate), using only several
observations of the mixtures. The only assumptions made
are the linearity of the mixing system and the statistical
independence of original signals.
BSS methods may be classified in several ways. One
possible classification that can be made depends on whether
the mixtures are instantaneous or convolutive [4]. Convo-
lutive mixtures correspond to a mixing system with time
dependent memory. They represent a more general case than
instantaneous mixtures, and they have in particular acoustic
applications. Recently, the principle of independent component
analysis (ICA) was applied in BSS, and it was found to
be a simple and powerful tool[6]. This study deals with the
separation of two convolutively mixed independent continuous
AE signals by ICA and the intelligent locator was used to
locate two independent continuous AE sources based on T-D
The mixing and filtering processes of unknown input signals
sj(t) may have different mathematical or physical back-
grounds, depending on specific applications. In this paper,
we focus mainly on the simplest cases with n signals xi(t)
linearly mixed in n unknown statistically independent, zero
mean source signals sj(t). The composition is expressed in
matrix notation as x = A ∗ s [8], where ‘*’ denotes a
convolution, x = [x1(t), . . . , xn(t)]
T is the vector of sensor
signals, s = [s1(t), . . . , sn(t)]
T is the vector of source signals
and A is an unknown full rank n × n mixing matrix whose
elements are finite inpulse response (FIR) filters. We assume
that only vector x is available. The goal of ICA is to find a
matrix W , by which vector x can be transformed into source
signals u = W ∗ x.
Matrix W is simply the inverse of A. However, when noise
corrupts the signals, matrix W must be found by an optimal
statistical treatment of the inverse problem. The optimal ma-
trix W can be estimated by a feed-forward neural network
operating in the frequency domain. A learning algorithm with
Amari’s natural gradient can be written as[1]: ũ = W̃ · x̃,
W̃ (τ + 1) = W̃ (τ) + α∆W̃ (τ) + η∆W̃ (τ − 1), ∆W̃ =
[I − ỹ · ũH] W̃ , ỹ = tanh(ℜ[ũ]) + ı tanh(ℑ[ũ]), where α is
the learning rate, η is the constant of learning, I is the identity
matrix and the tilde ‘˜ ’represents a frequency domain.
The ICA algorithm runs off-line and proceeds as follows
[11] (Fig. 1):
1) Pre-process the time-domain input signals, x(t): sub-
stract the mean from each signal.
2) Initialize the frequency domain unmixing filters, W̃ .
3) Take a block of input data and convert it into the
frequency domain using the Fast Fourier Transform
(FFT).
4) Filter the frequency domain input block, x̃, through W̃
to get the estimated source signals, ũ.
5) Pass ũ through the frequency domain nonlinearity, ỹ.
6) Use W̃ , ũ and ỹ along with the natural gradient
extension [2] to compute the change in the unmixing
PSfrag replacements
pre-process initialize
unmixing
filters
filter
update rule
Fig. 1. Block diagram of ICA algorithm
filter, ∆W̃ .
7) Take the next block of input data, covert it into the
frequency domain, and proceed from step 4. Repeat this
process until the unmixing filters have converged upon
a solution, passing several times through the data.
8) Normalize W̃ and convert it back into the time domain,
using the Inverse Fast Fourier Transform (IFFT).
9) Convolve the time domain unmixing filters, W , with x
to get the estimated sources.
EXPERIMENTS
We performed experiments with two independent continu-
ous AE sources on an aluminum band of dimensions 4000×
40× 5mm3. Reflections at the end of the band were reduced
by wrapping the ends in putty. The testing area was on the
longitudinal axis in the middle of the band, where 23 holes of
diameter 2 mm and mutual separation 100 mm were prepared
as shown in Fig. 2.
PSfrag replacements
φ 2 mm bandl air flow
Fig. 2. AE generation by air flowing through the hole
Two AE sensors were mounted 100 mm away from the
terminal holes, that is 2.4 m from each other. The origin of
the coordinate system was in the middle of the band and the
testing area extended from −1.1m to +1.1m. AE signals
were excited by two independent air jets flowing through
the holes. The source position was arbitrarily selected at
+100mm and +800mm. Air jets were formed by two nozzles
of diameter 1 mm using pressure 7 bar. The experimental set-
up consisted of the test specimen (aluminum band), two AE
sensors (pinducers), two AE sources (air jets), two amplifiers,
a digital oscilloscope (A/D converter) and a computer (BSS
module, locator, plotter) as shown in Fig. 3. Three experiments
were performed : (1) T-D estimation using a CCF of two AE
signals that were not simultaneously active; (2) T-D estimation
using a CCF of two AE signals which were simultaneously
active and (3) T-D estimation of AE signals using ICA.
Location of sources, based on T-D, by the intelligent locator
was performed in all three cases.
PSfrag replacements
sensor
AE source
locator
plotter
Fig. 3. Experimental set-up
In the first experiment only one air jet was activated for
a particular measurement. In the second experiment both air
jets were activated. Sensor signals were linear convolutive
mixtures of two independent continuous AE sources as shown
in Fig. 4. The auto-correlation R11, R22 and cross-correlation
functions R12, R21 were calculated from sensor signals. Only
one T-D of two signals can be estimated from the highest
peak in both CCF, regardless of the number of independent
AE sources on the test specimen as shown in Fig. 5. This
means that a CCF can not be used for automatic T-D estima-
tion of multiple AE signals on the test specimen. The CCF
exhibits various peaks which belong to various independent
AE sources, but it is ussually impossible to relate these peaks
to corresponding coordinates of AE sources.
0 0.1 0.2 0.3 0.4
PSfrag replacements
t [ms]
(a) Sensory signal #1
0 0.1 0.2 0.3 0.4
PSfrag replacements
t [ms]
(b) Sensory signal #2
Fig. 4. Mixtures of two independent continuous AE sources aquired by two
sensors
In the third experiment the ICA algorithm was used to
solve this problem satisfactorily. The ICA algorithm results
in demixing FIR filters which extract the independent source
signals from sensory signals. By inverting the demixing filters
W we obtain mixing filters A. In the case of two independent
0 5000 10000 15000
PSfrag replacements
PSfrag replacements
Rx1x1
PSfrag replacements
Rx1x1
Rx1x2
0 5000 10000 15000
PSfrag replacements
Rx1x1
Rx1x2
Rx2x1
Fig. 5. Auto- and cross-correlation functions of sensory signals; down-arrow
marks the highest peak
AE sources and two sensors, the components of A are four
FIR mixing filters, as shown in Fig. 6. There are two direct
a11, a22 and two cross mixing filters a12, a21. The first index
of the filter represents the number of the sensor, while the
second index represents the number of the source. The position
of the highest peak of the cross FIR filters determines the T-
D between two signals from two sensors. If we substract the
coordinate of the highest peak of a direct mixing FIR filter
a11 from the coordinate of the highest peak of cross filter a21
we obtain the T-D of first independent AE source, since each
of the highest peaks in the FIR filters belongs to different
independent AE signals.
RESULTS
The results of T-D estimation of two continuous independent
AE sources are shown in Fig. 7. Three experiments were done.
In the first experiment, the T-D was estimated by a CCF of two
AE sources which were not active simultaneously as marked
by ‘◦’. Locations of these two sources estimated by the
intelligent locator were +181 mm and +784 mm. The second
experiment was performed with both AE sources active simul-
taneously. T-D were also estimated by a CCF. The highest peak
0 5000 10000 15000
PSfrag replacements
PSfrag replacements
PSfrag replacements
0 5000 10000 15000
0.5PSfrag replacements
Fig. 6. Mixing filters obtained by ICA of sensory signals; down-arrow marks
the highest peak
position corresponds to the source location marked by ‘− −’
and was +784 mm. The third experiment was performed using
ICA for T-D estimation and location by intelligent locator. The
result is marked by ‘�’. Estimated positions of this two sources
were +179 mm and +784 mm respectively. If we compare the
coordinates of both independent AE sources estimated by
the first experiment and by the third experiment, we find a
good correspondence. If we compare estimated AE source
coordinates with actual coordinates, which were +100 mm and
+800 mm respectively, we observe a slight disagreement due to
experimental error. Experimental error is about 3% regarding
the distance between sensors. Absolute error in this case is
79 mm and 16 mm respectively. The results also depend on
the number and distribution of prototype sources marked by
‘•’, which are essential for operation of the intelligent locator.
If the number of prototype sources is increased, location error
is reduced. In our case the prototype sources were distributed
along the beam from −1.1m to +1.1m separated by 0.1m, so
that systematic error of the locator was set to several procents.
PSfrag replacements
actual position l [m]
correlation function
Fig. 7. Results of location of two continuous independent AE sources.
Symbols: ‘�’ – AE sources obtained by ICA; ‘◦’ – estimated AE sources
obtained by cross-correlation function in two steps, when just one of two
AE sources was active at time of measurement; ‘− −’ – estimated AE
sources obtained by cross-correlation function when two AE sources were
active simultaneously; ‘•’ – prototype AE sources required for location using
intelligent locator; ‘−’ – distribution of actual sources.
DISCUSSION AND CONCLUSION
CCF is applicable to T-D estimation only in the case of one
active AE source. The goal of our research is to develop a new
method to estimate T-D between AE signals in the case of mul-
tiple simultaneously active continuous AE sources. We have
shown that, for this purpose, ICA is an applicable option. ICA
finds a linear coordinate system (the unmixing filters) such that
the resulting signals are statistically independent. This is an
advantage of ICA over CCF. It represents a new approach to
processing of AE data and further expands the applicability of
AE analysis in the field of non-destructive testing. In machines
or in an industrial environment, multiple sources are usually
active Simultaneously, often representing environmental dis-
turbances. The corresponding complex signals are not directly
applicable to characterization of particular sources. However,
separation of contributions by ICA analysis in fact represents a
kind of filtering, increasing the applicability of filtered signals
to characterization of sources in complex environments. Future
research will be focused on location of multiple AE sources
on two-dimensional and three-dimensional specimens.
REFERENCES
[1] Amari, S.-I. 1998 , Natural gradient works efficiently in learning, Neural
Computation 10, 251–276.
[2] Amari, S.-I., Cichocki, A. Yang, H. H. 1996 , A new learning algorithm
for blind signal separation, in D. Touretzky, M. Mozer M. Hasselmo, eds,
‘Advances in Neural Information Processing Systems’, Vol. 8, MIT Press,
Cambridge MA, pp. 752–763.
[3] Burel, G. 1992 , Blind separation of sources: A nonlinear algorithm,
Neural Networks 5, 937–947.
[4] Deville, Y. Charkani, N. 1997 , Analysis of the stability of time-
domain source separation algorithms for convolutively mixed signals, in
International Comference on Acoustics, Speech, and Signal Processing,
pp. 1835–1838.
[5] Grabec, I. Sachse, W. 1997 , Synergetics of Measurement, Prediction and
Control, Springer-Verlag, Berlin.
[6] Hyvarinen, A. Oja, E. 2000 , Independent component analysis: algorithms
and applications, Neural Networks 13, 411–430.
[7] Kosel, T., Grabec, I. Mužič, P. 2000 , Location of continuous acoustic
emission sources generated by air flow, Ultrasonics 38(1–8), 824–826.
[8] Lee, T.-W. 1998 , Independent Component Analysis, Theory and Appli-
cations, Kluwer Academic Publishers, Boston etc.
[9] Lee, T.-W., Bell, A. J. Lambert, R. 1997 , Blind separation of convolved
and delayed sources, Advances in Neural Information Processing Systems
9, 758–764.
[10] McIntire, P. Miller, R. K., eds 1987 , Acoustic Emission Testing,
Vol. 5 of Nondestructive Testing Handbook, 2 edn, American Society
for Nondestructive Testing, Philadelphia, USA.
[11] Westner, A. G. 1996 , Object-based audio capture: Separating
acoustically-mixed sources, MSc Thesis, Massachusetts Institute of Tech-
nology.
[12] Ziola, S. M. Gorman, M. R. 1991 , Source location in thin plates using
cross-correlation, J. Acoust. Soc. Am. 90(5), 2551–2556.
	References
ABSTRACT
  Part I describes an intelligent acoustic emission locator, while Part II
discusses blind source separation, time delay estimation and location of two
continuous acoustic emission sources.
  Acoustic emission (AE) analysis is used for characterization and location of
developing defects in materials. AE sources often generate a mixture of various
statistically independent signals. A difficult problem of AE analysis is
separation and characterization of signal components when the signals from
various sources and the mode of mixing are unknown. Recently, blind source
separation (BSS) by independent component analysis (ICA) has been used to solve
these problems. The purpose of this paper is to demonstrate the applicability
of ICA to locate two independent simultaneously active acoustic emission
sources on an aluminum band specimen. The method is promising for
non-destructive testing of aircraft frame structures by acoustic emission
analysis.

<|endoftext|><|startoftext|>
Introduction
	Probabilities
	Classical coins and classical probabilities
	Quantum coins and quantum probabilities
	Teleportation
	Visualizing quantum information processing
	States of quantum coins
	Measurements on quantum coins
	Visualizing teleportation
	Teleporting classical coins
	Conclusion
	Acknowledgments
	References
ABSTRACT
  A novel way of picturing the processing of quantum information is described,
allowing a direct visualization of teleportation of quantum states and
providing a simple and intuitive understanding of this fascinating phenomenon.
The discussion is aimed at providing physicists a method of explaining
teleportation to non-scientists. The basic ideas of quantum physics are first
explained in lay terms, after which these ideas are used with a graphical
description, out of which teleportation arises naturally.

<|endoftext|><|startoftext|>
Introduction
The extension of quantum field theory to curved space-times has led to
the discovery of many qualitatively new phenomena which do not occur
in the simpler theory on Minkowski space, such as Hawking radiation; for
background and historical references, see [2, 6, 18].
The reconstruction of quantum field theory on a Lorentz-signature space-
time from the corresponding Euclidean quantum field theory makes use of
Osterwalder-Schrader (OS) positivity [15, 16] and analytic continuation. On
a curved background, there may be no proper definition of time-translation
and no Hamiltonian; thus, the mathematical framework of Euclidean quan-
tum field theory may break down. However, on static space-times there is a
Hamiltonian and it makes sense to define Euclidean QFT. This approach was
recently taken by the authors [11], in which the fundamental properties of
Osterwalder-Schrader quantization and some of the fundamental estimates
of constructive quantum field theory1 were generalized to static space-times.
The previous work [11], however, did not address the analytic continuation
which leads from a Euclidean theory to a real-time theory. In the present
article, we initiate a treatment of the analytic continuation by constructing
unitary operators which form a representation of the isometry group of the
Lorentz-signature space-time associated to a static Riemannian space-time.
Our approach is similar in spirit to that of Fröhlich [4] and of Klein and
Date: February 22, 2007.
1For background on constructive field theory in flat space-times, see [8, 9].
http://arxiv.org/abs/0704.0052v1
2 ARTHUR JAFFE AND GORDON RITTER
Landau [13], who showed how to go from the Euclidean group to the Poincaré
group without using the field operators on flat space-time.
This work also has applications to representation theory, as it provides
a natural (functorial) quantization procedure which constructs nontrivial
unitary representations of those Lie groups which arise as isometry groups
of static, Lorentz-signature space-times. These groups are typically non-
compact. For example, when applied to AdSd+1, our procedure gives a
unitary representation of the identity component of SO(d, 2). Moreover,
our procedure makes use of the Cartan decomposition, a standard tool in
representation theory.
2. Classical Space-Time
2.1. Structure of Static Space-Times.
Definition 2.1. A quantizable static space-time is a complete, con-
nected orientable Riemannian manifold (M,gab) with a globally-defined (smooth)
Killing field ξ which is orthogonal to a codimension-one hypersurface Σ ⊂M ,
such that the orbits of ξ are complete and each orbit intersects Σ exactly
once.
Throughout this paper, we assume that M is a quantizable static space-
time. Definition 2.1 implies that there is a global time function t defined
up to a constant by the requirement that ξ = ∂/∂t. Thus M is foliated by
time-slices Mt, and
M = Ω− ∪ Σ ∪Ω+
where the unions are disjoint, Σ =M0, and Ω± are open sets corresponding
to t > 0 and t < 0 respectively. We infer existence of an isometry θ which
reverses the sign of t,
θ : Ω± → Ω∓ such that θ
2 = 1, θ|Σ = id.
Fix a self-adjoint extension of the Laplacian, and let C = (−∆ +m2)−1
be the resolvent of the Laplacian (also called the free covariance), where
m2 > 0. Then C is a bounded self-adjoint operator on L2(M). For each s ∈
R, the Sobolev space Hs(M) is a real Hilbert space, defined as completion
of C∞c (M) in the norm
(2.1) ‖f‖2s = 〈f,C
−sf〉.
The inclusion Hs →֒ Hs+k for k > 0 is Hilbert-Schmidt. Define S :=⋂
s<0Hs(M) and S
s>0Hs(M). Then
S ⊂ H−1(M) ⊂ S
form a Gelfand triple, and S is a nuclear space.
Recall that S ′ has a natural σ-algebra of measurable sets (see for instance
[7, 8, 17]). There is a unique Gaussian probability measure µ with mean
zero and covariance C defined on the cylinder sets in S ′ (see [7]).
QUANTUM FIELD THEORY ON CURVED BACKGROUNDS. II.SPACETIME SYMMETRIES3
More generally, one may consider a non-Gaussian, countably-additive
measure µ on S ′ and the space
E := L2(S ′, µ).
We are interested in the case that the monomials of the form A(Φ) =
Φ(f1) . . .Φ(fn) for fi ∈ S are all elements of E , and for which their span
is dense in E . This is of course true if µ is the Gaussian measure with
covariance C.
For an open set Ω ⊂ M , let EΩ denote the closure in E of the set of
monomials A(Φ) =
iΦ(fi) where supp(fi) ⊂ Ω for all i. Of particular
importance for Euclidean quantum field theory is the positive-time subspace
E+ := EΩ+ .
2.2. The Operator Induced by an Isometry. Isometries of the under-
lying space-time manifold act on a Hilbert space of classical fields arising in
the study of a classical field theory. For f ∈ C∞(M) and ψ : M → M an
isometry, define
fψ ≡ (ψ−1)∗f = f ◦ ψ−1.
Since det(dψ) = 1, the operation f → fψ extends to a bounded operator
on H±1(M) or on L
2(M). A treatment of isometries for static space-times
appears in [11].
Definition 2.2. Let ψ be an isometry, and A(Φ) = Φ(f1) . . .Φ(fn) ∈ E a
monomial. Define the induced operator
(2.2) Γ(ψ)A ≡ Φ(f1
ψ) . . .Φ(fn
and extend Γ(ψ) by linearity to the domain of polynomials in the fields,
which is dense in E .
3. Osterwalder-Schrader Quantization
3.1. Quantization of Vectors (The Hilbert Space H of Quantum
Theory). In this section we define the quantization map E+ → H , where
H is the Hilbert space of quantum theory. The existence of the quantization
map relies on a condition known as Osterwalder-Schrader (or reflection)
positivity. A probability measure µ on S ′ is said to be reflection positive if
(3.1)
Γ(θ)F F dµ ≥ 0
for all F in the positive-time subspace E+ ⊂ E . Let Θ = Γ(θ) be the
reflection on E induced by θ. Define the sesquilinear form (A,B) on E+×E+
as (A,B) = 〈ΘA,B〉E , so (3.1) states that (F,F ) ≥ 0.
Assumption 1 (O-S Positivity). Any measure dµ that we consider is re-
flection positive with respect to the time-reflection Θ.
4 ARTHUR JAFFE AND GORDON RITTER
Definition 3.1 (OS-Quantization). Given a reflection-positive measure dµ,
the Hilbert space H of quantum theory is the completion of E+/N with
respect to the inner product given by the sesquilinear form (A,B). Denote
the quantization map Π for vectors E+ → H by Π(A) = Â, and write
(3.2) 〈Â, B̂〉H = (A,B) = 〈ΘA,B〉E for A,B ∈ E+ .
3.2. Quantization of Operators. The basic quantization theorem gives a
sufficient condition to map a (possibly unbounded) linear operator T on E
to its quantization, a linear operator T̂ on H . Consider a densely-defined
operator T on E , the unitary time-reflection Θ, and the adjoint T+ = ΘT ∗Θ.
A preliminary version of the following was also given in [10].
Definition 3.2 (Quantization Condition I). The operator T satisfies QC-I
i. The operator T has a domain D(T ) dense in E .
ii. There is a subdomain D0 ⊂ E+ ∩D(T )∩D(T
+), for which D̂0 ⊂ H
is dense.
iii. The transformations T and T+ both map D0 into E+.
Theorem 3.3 (Quantization I). If T satisfies QC-I, then
i. The operators T ↾D0 and T
+↾D0 have quantizations T̂ and T̂+ with
domain D̂0.
ii. The operators T̂ ∗ =
T̂ ↾D̂0
and T̂+ agree on D̂0.
iii. The operator T̂ ↾D0 has a closure, namely T̂
Proof. We wish to define the quantization T̂ with the putative domain D̂0
(3.3) T̂ Â = T̂A .
For any vector A ∈ D0 and for any B ∈ (D0 ∩ N ), it is the case that
Â = Â+B. The transformation T̂ is defined by (3.3) iff T̂A = ̂T (A+B) =
T̂A+ T̂B. Hence one needs to verify that T : D0 ∩ N → N , which we now
The assumption D0 ⊂ D(T
+), along with the fact that Θ is unitary,
ensures that ΘD0 ⊂ D(T
∗). Therefore for any F ∈ D0,
(3.4)
〈ΘF, TB〉E = 〈T
∗ΘF,B〉E = 〈Θ(ΘT
∗ΘF ) , B〉E = 〈ΘT
+F,B〉E = 〈T̂
+F, B̂〉H .
In the last step we use the fact assumed in part (iii) of QC-I that T+ :
D0 → E+, yielding the inner product of two vectors in H . We infer from
the Schwarz inequality in H that
|〈ΘF, TB〉E | ≤ ‖T̂+F‖H ‖B̂‖H = 0 .
As 〈ΘF, TB〉E = 〈F̂ , T̂B〉H , this means that T̂B ⊥ D̂0. As D̂0 is dense in
H by QC-I.ii, we infer T̂B = 0. In other words, TB ∈ N as required to
define T̂ .
QUANTUM FIELD THEORY ON CURVED BACKGROUNDS. II.SPACETIME SYMMETRIES5
In order show that D̂0 ⊂ D(T̂
∗), perform a similar calculation to (3.4)
with arbitrary A ∈ D0 replacing B, namely
(3.5)
〈F̂ , T̂ Â〉H = 〈ΘF, TA〉E = 〈Θ(ΘT
∗ΘF ) , A〉E = 〈ΘT
+F,A〉E = 〈T̂+F, Â〉H .
The right side is continuous in Â ∈ H , and therefore F̂ ∈ D(T ∗). Further-
more T ∗F̂ = T̂+F . This identity shows that if F ∈ N , then T̂+F = 0.
Hence T+↾D0 has a quantization T̂
+, and we may write (3.5) as
(3.6) T ∗F̂ = T̂+F̂ , for all F ∈ D0 .
In particular T̂ ∗ is densely defined so T̂ has a closure. This completes the
proof. �
Definition 3.4 (Quantization Condition II). The operator T satisfies QC-II
i. Both the operator T and its adjoint T ∗ have dense domains D(T ),D(T ∗) ⊂
ii. There is a domain D0 ⊂ E+ in the common domain of T , T
+, T+T ,
and TT+.
iii. Each operator T , T+, T+T , and TT+ maps D0 into E+.
Theorem 3.5 (Quantization II). If T satisfies QC-II, then
i. The operators T ↾D0 and T
+↾D0 have quantizations T̂ and T̂+ with
domain D̂0.
ii. If A,B ∈ D0, one has 〈B̂, T̂ Â〉H = 〈T̂+B̂, Â〉H .
Remarks.
i. In Theorem 3.5 we drop the assumption that the domain D̂0 is dense,
obtaining quantizations T̂ and T̂+ whose domains are not necessarily
dense. In order to compensate for this, we assume more properties
concerning the domain and the range of T+ on E .
ii. As D̂0 need not be dense in H , the adjoint of T̂ need not be defined.
Nevertheless, one calls the operator T̂ symmetric in case one has
(3.7) 〈B̂, T̂ Â〉H = 〈T̂ B̂, Â〉H , for all A,B ∈ D0 .
iii. If Ŝ ⊃ T̂ is a densely-defined extension of T̂ , then Ŝ∗ = T̂+ on the
domain D̂0.
Proof. We define the quantization T̂ with the putative domain D̂0. As in
the proof of Theorem 3.3, this quantization T̂ is well-defined iff it is the case
that T : D0 ∩ N → N . For any F ∈ D0 ∩ N , by definition ‖F̂‖H = 0.
〈TF, TF 〉H = (TF, TF ) = 〈ΘTF, TF 〉E = 〈F, T
∗ΘTF 〉
where one uses the fact that D0 ⊂ D(T
+T ). Thus
〈TF, TF 〉H =
ΘF, T+TF
= 〈F, T+TF 〉H .
6 ARTHUR JAFFE AND GORDON RITTER
Here we use the fact that T+T maps D0 to E+. Thus one can use the
Schwarz inequality on H to obtain
〈TF, TF 〉H ≤ ‖F̂‖H ‖T̂
= 0 .
Hence T : D0 ∩ N → N , and T has a quantization T̂ with domain D̂0.
In order verify that T+↾D0 has a quantization, one needs to show that
T+ : D0 ∩N ⊂ N . Repeat the argument above with T
+ replacing T . The
assumption TT+ : D0 → E+ yields for F ∈ D0 ∩ N ,
〈T+F, T+F 〉H = 〈T
∗ΘF, T+F 〉E = 〈ΘF, TT
+F 〉E = 〈F̂ , T̂ T
+F 〉H .
Use the Schwarz inequality in H to obtain the desired result that
〈T+F, T+F 〉H ≤ ‖F̂‖H ‖T̂ T
= 0 .
Hence T+ has a quantization T̂+ with domain D̂0, and for B ∈ D0 one has
T̂+B = T̂+B̂. In order to establish (ii), assume that A,B ∈ D0. Then
〈B̂, T̂ Â〉H = 〈ΘB,TA〉E = 〈Θ(ΘT
∗ΘB) , A〉E = 〈ΘT
+B,A〉E
= 〈T̂+B, Â〉H = 〈T̂+B̂, Â〉H .(3.8)
This completes the proof.
4. Structure and Representation of the Lie Algebra of
Killing Fields
For the remainder of this paper we assume the following, which is clearly
true in the Gaussian case as the Laplacian commutes with the isometry
group G. (A further explanation was given in [11].)
Assumption 2. The isometry groups G that we consider leave the measure
dµ invariant, in the sense that Γ, defined above, is a unitary representation
of G on E .
4.1. The Representation of g on E .
Lemma 4.1. Let Gi be an analytic group with Lie algebra gi (i = 1, 2),
and let λ : g1 → g2 be a homomorphism. There cannot exist more than one
analytic homomorphism π : G1 → G2 for which dπ = λ. If G1 is simply
connected then there is always one such π.
Let D = d/dt denote the canonical unit vector field on R. Let G be a
real Lie group with algebra g, and let X ∈ g. The map tD → tX(t ∈ R)
is a homomorphism of Lie(R) → g, so by the Lemma there is a unique
analytic homomorphism ξX : R → G such that dξX(D) = X. Conversely,
if η is an analytic homomorphism of R → G, and if we let X = dη(D),
it is obvious that η = ξX . Thus X 7→ ξX is a bijection of g onto the set
of analytic homomorphisms R → G. The exponential map is defined by
QUANTUM FIELD THEORY ON CURVED BACKGROUNDS. II.SPACETIME SYMMETRIES7
exp(X) := ξX(1). For complex Lie groups, the same argument applies,
replacing R with C throughout.
Since g is connected, so is exp(g). Hence exp(g) ⊆ G0, where G0 denotes
the connected component of the identity in G. It need not be the case for
a general Lie group that exp(g) = G0, but for a large class of examples
(the so-called exponential groups) this does hold. For any Lie group, exp(g)
contains an open neighborhood of the identity, so the subgroup generated
by exp(g) always coincides with G0.
We will apply the above results with G = Iso(M), the isometry group of
M , and g = Lie(G) the algebra of global Killing fields. Thus we have a bijec-
tive correspondence between Killing fields and 1-parameter groups of isome-
tries. This correspondence has a geometric realization: the 1-parameter
group of isometries
φs = ξX(s) = exp(sX)
corresponding to X ∈ g is the flow generated by X.
Consider the two different 1-parameter groups of unitary operators:
(1) the unitary group φ∗s on L
2(M), and
(2) the unitary group Γ(φs) on E .
Stone’s theorem applies to both of these unitary groups to yield densely-
defined self-adjoint operators on the respective Hilbert spaces.
In the first case, the relevant self-adjoint operator is simply an extension
of −iX, viewed as a differential operator on C∞c (M). This is because for
f ∈ C∞c (M) and p ∈M , we have:
Xpf = (LXf)(p) =
f(φs(p))|s=0.
Thus −iX is a densely-defined symmetric operator on L2(M), and Stone’s
theorem implies that −iX has self-adjoint extensions.
In the second case, the unitary group Γ(φs) on E also has a self-adjoint
generator Γ(X), which can be calculated explicitly. By definition,
e−isΓ(X)
Φ(fi)
Φ(fi ◦ φ−s).
Now replace s→ −s and calculate d/ds|s=0 applied to both sides of the last
equation to see that
Φ(fi)
Φ(f1) . . .Φ(−iXfj)Φ(fj+1) . . .Φ(fn) .
One may check that Γ is a Lie algebra representation of g, i.e. Γ([X,Y ]) =
[Γ(X),Γ(Y )].
4.2. The Cartan Decomposition of g. For each ξ ∈ g, there exists some
dense domain in E on which Γ(ξ) is self-adjoint, as discussed previously.
8 ARTHUR JAFFE AND GORDON RITTER
However, the quantizations Γ̂(ξ) acting on H may be hermitian, anti-
hermitian, or neither depending on whether there holds a relation of the
(4.1) Γ(ξ)Θ = ±ΘΓ(ξ),
with one of the two possible signs, or whether no such relation holds.
Even if (4.1) holds, to complete the construction of a unitary representa-
tion one must prove that there exists a dense domain in H on which Γ̂(ξ) is
self-adjoint or skew-adjoint. This nontrivial problem will be dealt with in a
later section using Theorems 3.3 and 3.5 and the theory of symmetric local
semigroups [12, 4]. Presently we determine which elements within g satisfy
relations of the form (4.1).
Let ϑ := θ∗ as an operator on C∞(M), and consider a Killing field X ∈ g
also as an operator on C∞(M). Define T : g → g by
(4.2) T (X) := ϑXϑ.
From (4.2) it is not obvious that the range of T is contained in g. To prove
this, we recall some geometric constructions.
Let M,N be manifolds, let ψ : M → N be a diffeomorphism, and X ∈
Vect(M). Then
(4.3) ψ−1∗Xψ∗ = X(· ◦ ψ) ◦ ψ−1.
defines an operator on C∞(N). One may check that this operator is a
derivation, thus (4.3) defines a vector field on N . The vector field (4.3) is
usually denoted
ψ∗X = dψ(Xψ−1(p))
and referred to as the push-forward of X.
We now wish to show that g = g+⊕ g−, where g± are the ±1-eigenspaces
of T . This is proven by introducing an inner product (X,Y )g on g with
respect to which T is self-adjoint.
Theorem 4.2. Consider g as a Hilbert space with inner product (X,Y )g.
The operator T : g → g is self-adjoint with T 2 = I; hence
(4.4) g = g+ ⊕ g−
as an orthogonal direct sum of Hilbert spaces, where g± are the ±1-eigenspaces
of T . Further, ∂t ∈ g− hence dim(g−) ≥ 1. Elements of g− have hermitian
quantizations, while elements of g+ have anti-hermitian quantizations.
Proof. Write (4.2) as
(4.5) T (X) = θ−1∗Xθ∗ = θ∗X .
Thus T is the operator of push-forward by θ. The push-forward of a Killing
field by an isometry is another Killing field, hence the range of T is contained
2It is not the case that g− consists only of ∂t. In particular, dim(g−) = 2 for M = H 2.
It can occur that dim g+ = 0.
QUANTUM FIELD THEORY ON CURVED BACKGROUNDS. II.SPACETIME SYMMETRIES9
in g. Also, T must have a trivial kernel since T 2 = I, and this implies that
T is surjective. It follows from (4.5) that T is a Hermitian operator on
g. Hence T is diagonalizable and has real eigenvalues which are square
roots of 1. This establishes the decomposition (4.4). That elements of
g− have hermitian quantizations, while elements of g+ have anti-hermitian
quantizations follows from Theorem 3.3. �
A Cartan involution is a Lie algebra homomorphism g → g which squares
to the identity. It follows from (4.2) that T is a Lie algebra homomorphism;
thus, Theorem 4.2 implies that T is a Cartan involution of g. This implies
that the eigenspaces (g+, g−) form a Cartan pair, meaning that
(4.6) [g+, g+] ⊂ g+, [g+, g−] ⊂ g−, and [g−, g−] ⊂ g+ .
Clearly g+ is a subalgebra while g− is not, and any subalgebra contained in
g− is abelian.
5. Reflection-Invariant and Reflected Isometries
Let G = Iso(M) denote the isometry group of M , as above. Then G
has a Z2 subgroup containing {1, θ}. This subgroup acts on G by conjuga-
tion, which is just the action ψ → ψθ := θψθ. Conjugation is an (inner)
automorphism of the group, so
(ψφ)θ = ψθφθ, (ψθ)−1 = (ψ−1)θ.
Definition 5.1. We say that ψ ∈ G is reflection-invariant if
ψθ = ψ,
and that ψ is reflected if
ψθ = ψ−1.
Let GRI denote the subgroup of G consisting of reflection-invariant elements,
and let GR denote the subset of reflected elements.
Note that GRI is the stabilizer of the Z2 action, hence a subgroup. An
alternate proof of this proceeds usingGRI = exp(g+). Although GR is closed
under the taking of inverses and does contain the identity, the product of two
reflected isometries is no longer reflected unless they commute. Generally,
the product of an element of GR with an element ofGRI is neither an element
of GR nor of GRI . The only isometry that is both reflection-invariant and
reflected is θ itself. Thus we have:
GR ∩GRI = {1, θ} ⊂ GR ∪GRI ( G.
Theorem 5.2. Let G0 denote the connected component of the identity in
G. Then G0 is generated by GR ∪ GRI . (This is a form of the Cartan
decomposition for G.)
10 ARTHUR JAFFE AND GORDON RITTER
Proof. Since g = g+ ⊕ g− as a direct sum of vector spaces (though not of
Lie algebras), we have
exp(g)
exp(g+) ∪ exp(g−)
Choose bases {ξ±,i}i=1,...,n± for g± respectively. Then we have:
{exp(sξ+,i) : 1 ≤ i ≤ n+, s ∈ R}∪{exp(sξ−,j) : 1 ≤ j ≤ n−, s ∈ R}
Furthermore, exp(sξ−,i) is reflected, while exp(sξ+,i) is reflection-invariant,
completing the proof. �
Corollary 5.3. The Lie algebra of the subgroup GRI is g+.
To summarize, the isometry group of a static space-time can always be
generated by a collection of n (= dim g) one-parameter subgroups, each of
which consists either of reflected isometries, or reflection-invariant isome-
tries.
6. Construction of Unitary Representations
6.1. Self-adjointness of Semigroups. In this section, we recall several
known results on self-adjointness of semigroups. Roughly speaking, these
results imply that if a one-parameter family Sα of unbounded symmetric
operators satisfies a semigroup condition of the form SαSβ = Sα+β, then
under suitable conditions one may conclude essential self-adjointness.
A theorem of this type appeared in a 1970 paper of Nussbaum [14], who
assumed that the semigroup operators have a common dense domain. The
result was rediscovered independently by Fröhlich, who applied it to quan-
tum field theory in several important papers [5, 3]. For our intended appli-
cation to quantum field theory, it turns out to be very convenient to drop
the assumption that ∃ a such that the Sα all have a common dense domain
for |α| < a, in favor of the weaker assumption that
α>0D(Sα) is dense.
A generalization of Nussbaum’s theorem which allows the domains of the
semigroup operators to vary with the parameter, and which only requires the
union of the domains to be dense, was later formulated and two independent
proofs were given: one by Fröhlich [4], and another by Klein and Landau [12].
The latter also used this theorem in their construction of representations of
the Euclidean group and the corresponding analytic continuation to the
Lorentz group [13].
In order to keep the present article self-contained, we first define symmet-
ric local semigroups and then recall the refined self-adjointness theorem of
Fröhlich, and Klein and Landau.
Definition 6.1. Let H be a Hilbert space, let T > 0 and for each α ∈ [0, T ],
let Sα be a symmetric linear operator on the domain Dα ⊂ H , such that:
(i) Dα ⊃ Dβ if α ≤ β and D :=
0<α≤T Dα is dense in H ,
(ii) α→ Sα is weakly continuous,
(iii) S0 = I, Sβ(Dα) ⊂ Dα−β for 0 ≤ β ≤ α ≤ T , and
QUANTUM FIELD THEORY ON CURVED BACKGROUNDS. II.SPACETIME SYMMETRIES11
(iv) SαSβ = Sα+β on Dα+β for α, β, α + β ∈ [0, T ].
In this situation, we say that (Sα,Dα, T ) is a symmetric local semigroup.
It is important that Dα is not required to be dense in H for each α; the
only density requirement is (i).
Theorem 6.2 ([12, 4]). For each symmetric local semigroup (Sα,Dα, T ),
there exists a unique self-adjoint operator A such that3
Dα ⊂ D(e
−αA) and Sα = e
−αA|Dα for all α ∈ [0, T ].
Also, A ≥ −c if and only if ‖Sαf‖ ≤ e
cα‖f‖ for all f ∈ Dα and 0 < α < T .
6.2. Reflection-Invariant Isometries.
Lemma 6.3. Let ψ be a reflection-invariant isometry and assume ∃ p ∈ Ω+
such that ψ(p) ∈ Ω+. Then ψ preserves the positive-time subspace, i.e.
ψ(Ω+) ⊆ Ω+.
Proof. We first prove that ψ(Σ) ⊆ Σ. Suppose not; then ∃ p ∈ Σ with ψ(p) 6∈
Σ. Assume ψ(p) ∈ Ω+ (without loss of generality: we could repeat the same
argument with ψ(p) ∈ Ω−). Then Ω+ contains (θψθ)(p) = θψ(p) ∈ Ω−, a
contradiction since Ω−∩Ω+ = ∅. We used the fact that θ|Σ = id so θ(p) = p.
Hence ψ restricts to an isometry of Σ. It follows that the restriction of ψ to
M ′ =M \Σ is also an isometry. However, M ′ = Ω− ⊔Ω+, where ⊔ denotes
the disjoint union. Therefore ψ(Ω+) is wholly contained in either Ω+ or Ω−,
since ψ is a homeomorphism and so ψ(Ω+) is connected. The possibility
that ψ(Ω+) ⊆ Ω− is ruled out by our assumption, so ψ(Ω+) ⊆ Ω+. �
Lemma 6.3 has the immediate consequence that if ξ ∈ g+ then the one-
parameter group associated to ξ is positive-time-invariant. This result plays
a key role in the proof of Theorem 6.4.
6.3. Construction of Unitary Representations. The rest of this section
is devoted to proving that the theory of symmetric local semigroups can be
applied to the quantized operators on H corresponding to each of a set of
1-parameter subgroups of G = Iso(M). The proof relies upon Lemma 6.3,
and Theorems 3.3, 3.5 and 6.2.
Theorem 6.4. Let (M,gab) be a quantizable static space-time. Let ξ be a
Killing field which lies in g+ or g−, with associated one-parameter group of
isometries {φα}α∈R. Then there exists a densely-defined self-adjoint opera-
tor Aξ on H such that
Γ̂(φα) =
e−αAξ , if ξ ∈ g−
eiαAξ if ξ ∈ g+.
3The authors of [4, 12] also showed that
bD :=
0<α≤S
0<β<α
Sβ(Dα)
, where 0 < S ≤ T,
is a core for A, i.e. (A, bD) is essentially self-adjoint.
12 ARTHUR JAFFE AND GORDON RITTER
Proof. First suppose that ξ ∈ g−, which implies that the isometries φα are
reflected, and so Γ(φα)
+ = Γ(φα). Define
Ωξ,α := φ
α (Ω+).
For all α in some neighborhood of zero, Ωξ,α is a nonempty open subset of
Ω+, and moreover, as α → 0
+, Ωξ,α increases to fill Ω+ with Ωξ,0 = Ω+.
These statements follow immediately from the fact that, for each p ∈ Ω+,
φα(p) is continuous with respect to α, and φ0 is the identity map.
Since φα(Ωξ,α) ⊆ Ω+, we infer that Γ(φα)EΩξ,α ⊆ E+. By Theorem 3.5,
Γ(φα) has a quantization which is a symmetric operator on the domain
Dξ,α := Π(EΩξ,α).
Note that Dξ,α is not necessarily dense in H .
4 We now show that Theorem
6.2 can be applied.
Fix some positive constant a with Ωξ,a nonempty. Note that
0<α≤a
Ωξ,α = Ω+ ⇒
0<α≤a
EΩξ,α = E+.
It follows that
Dξ :=
0<α≤a
is dense in H . This establishes condition (i) of Definition 6.1, and the
other conditions are routine verifications. Theorem 6.2 implies existence of
a densely-defined self-adjoint operator Aξ on H , such that
Γ̂(φα) = exp(−αAξ) for all α ∈ [0, a] .
This proves the theorem in case ξ ∈ g−.
Now suppose that ξ ∈ g+, implying that the isometries φα are reflection-
invariant, and
Γ(φα)
+ = Γ(φα)
−1 = Γ(φ−α) on E .
Lemma 6.3 implies that Γ(φα)E+ ⊆ E+. By Theorem 3.3, Γ(φα) has a
quantization Γ̂(φα) which is defined and satisfies
Γ̂(φα)
∗ = Γ̂(φα)
on the domain Π(E+), which is dense in H by definition. In this case we
do not need Theorem 6.2; for each α, Γ̂(φα) extends by continuity to a one-
parameter unitary group defined on all of H (not only for a dense subspace).
By Stone’s theorem,
Γ̂(φα) = exp(iαAξ)
for Aξ self-adjoint and for all α ∈ R. The proof is complete. �
4Density of Dξ,α would be implied by a Reeh-Schlieder theorem, which we do not prove
except in the free case. Theorem 6.2 removes the need for a Reeh-Schlieder theorem in
this argument.
QUANTUM FIELD THEORY ON CURVED BACKGROUNDS. II.SPACETIME SYMMETRIES13
7. Analytic Continuation
Each Riemannian static space-time (M,gab) has a Lorentzian continuation
Mlor, which we construct as follows. In adapted coordinates, the metric gab
on M takes the form
(7.1) ds2 = F (x)dt2 + Gµν(x)dx
µdxν .
The analytic continuation t → −it of (7.1) is standard and gives a metric
of Lorentz signature, ds2lor = −F dt
2 + G dx2, by which we define the
Lorentzian space-time Mlor. Einstein’s equation Ricg = k g is preserved
by the analytic continuation, but we do not use this fact anywhere in the
present paper.
Let {ξ
i : 1 ≤ i ≤ n±} be bases of g±, respectively. Let A
i = Aξ(±)
the densely-defined self-adjoint operators on H , constructed by Theorem
6.4. Let
(7.2) U
i (α) = exp(iαA
i ) , for 1 ≤ i ≤ n±
be the associated one-parameter unitary groups on H .
We claim that the group generated by the n = n+ + n− one-parameter
unitary groups (7.2) is isomorphic to the identity component of
Glor := Iso(Mlor),
the group of Lorentzian isometries. Since locally, the group structure is
determined by its Lie algebra, it suffices to check that the generators satisfy
the defining relations of glor := Lie(Glor).
Since quantization of operators preserves multiplication, we have
(7.3) X,Y,Z ∈ g, [X,Y ] = Z ⇒ [Γ̂(X), Γ̂(Y )] = Γ̂(Z).
In what follows, we will use the notation ĝ± for {Γ̂(X) : X ∈ g±}.
Quantization converts the elements of g− from skew operators into Her-
mitian operators; i.e. elements of ĝ− are Hermitian on H and hence, ele-
ments of i ĝ− are skew-symmetric on H . Thus ĝ+ ⊕ i ĝ− is a Lie algebra
represented by skew-symmetric operators on H .
Theorem 7.1. We have an isomorphism of Lie algebras:
(7.4) glor ∼= ĝ+ ⊕ i ĝ− .
Proof. LetMC be the manifold obtained by allowing the t coordinate to take
values in C. Define ψ :MC →MC by t 7→ −it. Then glor is generated by
i }1≤i≤n+ ∪ {ηj}1≤j≤n− , where ηj := iψ
It is possible to define a set of real structure constants fijk such that
(7.5) [ξ
fijkξ
14 ARTHUR JAFFE AND GORDON RITTER
Applying ψ∗ to both sides of (7.5), the commutation relations of glor are
seen to be
(7.6) [ηi, ηj ] = −fijkξ
together with the same relations for g+ as before. Now (7.3) implies that
(7.6) are the precisely the commutation relations of ĝ+ ⊕ i ĝ−, completing
the proof of (7.4). �
Corollary 7.2. Let (M,gab) be a quantizable static space-time. The unitary
groups (7.2) determine a unitary representation of G0lor on H .
7.1. Conclusions. We have obtained the following conclusions. There is a
unitary representation of the group G0lor on the physical Hilbert space H
of quantum field theory on the static space-time M . This representation
maps the time-translation subgroup into the unitary group exp(itH), where
the energy H ≥ 0 is a positive, densely-defined self-adjoint operator corre-
sponding to the Hamiltonian of the theory. The Hilbert space H contains a
ground state Ψ0 = 1̂ which is such that HΨ0 = 0 and Ψ0 is invariant under
the action of all spacetime symmetries. We obtain these results via analytic
continuation from the Euclidean path integral, under mild assumptions on
the measure which should include all physically interesting examples. This is
done without introducing the field operators; nonetheless, Theorems 3.3 and
3.5 do suffice to construct field operators. In the special case M = Rd with
G = SO(4), we obtain a unitary representation of the proper orthochronous
Lorentz group, G0lor = L
+ = SO
0(3, 1).
8. Hyperbolic Space and Anti-de Sitter Space
Consider Euclidean quantum field theory on M = H d. The metric is
ds2 = r−2
dx2i ,
where we define r = xd for convenience. The Laplacian is
(8.1) ∆
d = (2− d)r
+ r2∆
The d−1 coordinate vector fields {∂/∂xi : i 6= d} are all static Killing fields,
and any one of the coordinates xi (i 6= d) is a satisfactory representation of
time in this space-time. It is convenient to define t = x1 as before, and to
identify t with time.
The time-zero slice is M0 = H
d−1. From
H d = {v ∈ Rd,1 | 〈v, v〉 = −1, v0 > 0}
it follows that Isom(H d) = O+(d, 1) and the orientation-preserving isometry
group is SO+(d, 1).
QUANTUM FIELD THEORY ON CURVED BACKGROUNDS. II.SPACETIME SYMMETRIES15
Figure 1. Flow lines of the Killing field ζ = (t2 − r2)∂t +
2tr ∂r on H
For constant curvature spaces, one may solve Killing’s equation LKg = 0
explicitly. Let us illustrate the solutions and their quantizations for d = 2.
The three Killing fields
(8.2) ξ = ∂t, η = t∂t + r∂r, ζ = (t
2 − r2)∂t + 2tr ∂r
are a convenient basis for g. Any d-dimensional manifold satisfies dim g ≤
d(d + 1)/2, manifolds saturating the bound are said to be maximally sym-
metric, and H d is maximally symmetric.
Now, ∂tf(−t) = −f
′(−t) so ∂tΘ = −Θ∂t, hence ∂t ∈ g−. Similar calcula-
tions show [Θ, η] = 0 and Θζ = −ζΘ. Thus η spans g+, while ∂t, ζ span g−.
The commutation relations5 for g are:
[η, ζ] = ζ, [η, ∂t] = −∂t, [ζ, ∂t] = −2η.
These calculations verify that (g+, g−) forms a Cartan pair, as defined in
(4.6).
The flows associated to (8.2) are easily visualized: ξ is a right-translation,
and η flow-lines are radially outward from the Euclidean origin. The flows
of ζ are Euclidean circles, indicated by the darker lines in Figure 1. Hence
the flows of η are defined on all of E+, while the flows of ζ are analogous to
space-time rotations in R2, and hence, must be defined on a wedge of the
Wα = {(t, r) : t, r > 0, tan
−1(r/t) < α}.
The simple geometric idea of Section 6.2 is nicely confirmed in this case: the
flows of η (the generator of g+) preserve the t = 0 plane, and are separately
isometries of Ω+ and Ω−.
Corollary 7.2 implies that the procedure outlined above defines a uni-
tary representation of the identity component of Iso(AdS2) on the physical
Hilbert space H for quantum field theory on this background, including
theories with interactions that preserve the symmetry. Since Iso(AdSd+1) =
5Note that quite generally [g−, g−] ⊆ g+ so it’s automatic that [ζ, ∂t] is proportional
to η.
16 ARTHUR JAFFE AND GORDON RITTER
SO(d, 2), we have a unitary representation of SO0(1, 2). The latter is a non-
compact, semisimple real Lie group, and thus it has no finite-dimensional
unitary representations, but a host of interesting infinite-dimensional ones.
Appendix A. Euclidean Reeh-Schlieder Theorem
We prove the Euclidean Reeh-Schlieder property for free theories on curved
backgrounds. It is reasonable to expect this property to extend to interact-
ing theories on curved backgrounds, but it would have to be established for
each such model since it depends explicitly on the two-point function.
The Reeh-Schlieder theorem guarantees the existence of a dense quanti-
zation domain based on any open subset of Ω+. For this reason, one could
use the Reeh-Schlieder (RS) theorem with Nussbaum’s theorem [14] to con-
struct a second proof of Theorem 6.4 under the additional assumption that
M is real-analytic.
Fortunately, our proof of Theorem 6.4 is completely independent of the
Reeh-Schlieder property. This has two advantages: we do not have to assume
M is a real-analytic manifold and, more importantly, our proof of Theorem
6.4 generalizes immediately and transparently to interacting theories as long
as the Hilbert space H is not modified by the interaction.
We state and prove this using the one-particle space; however, the result
clearly extends to the quantum-field Hilbert space.
Theorem A.1. Let M be a quantizable static space-time endowed with a
real-analytic structure, and assume that gab is real-analytic. Let O ⊂ Ω+
and D = C∞(O) ⊂ L2(Ω+). Then D̂
⊥ = {0}.
Proof. Let f ∈ L2(Ω+) with f̂ ⊥ D . For x ∈ Ω+, define
η(x) := 〈f̂ , δ̂x〉H = 〈Θf,Cδx〉L2 .
Real-analyticity of η(x) follows from the real-analyticity of (the integral
kernel of) C, which in turn follows from the elliptic regularity theorem in the
real-analytic category (see for instance [1, Sec. II.1.3]). Now by assumption,
for any g ∈ C∞c (O), we have
0 = 〈ĝ, f̂〉H = 〈Θf,Cg〉L2(M).
Let g → δx for x ∈ O. Then 0 = 〈Θf,Cδx〉L2 ≡ η(x). Since η|O = 0, by
real-analyticity we infer the vanishing of η on Ω+, completing the proof. �
Acknowledgements. We are grateful to Hanno Gottschalk and Alexander
Strohmaier for helpful discussions, and G.R. is grateful to the Universität
Bonn for their hospitality during February 2007.
References
[1] Lipman Bers, Fritz John, and Martin Schechter. Partial differential
equations. American Mathematical Society, Providence, R.I., 1979. Lec-
tures in Applied Mathematics 3.
QUANTUM FIELD THEORY ON CURVED BACKGROUNDS. II.SPACETIME SYMMETRIES17
[2] N. D. Birrell and P. C. W. Davies. Quantum fields in curved space, vol-
ume 7 of Cambridge Monographs on Mathematical Physics. Cambridge
University Press, Cambridge, 1982.
[3] W. Driessler and J. Fröhlich. The reconstruction of local observable
algebras from the Euclidean Green’s functions of relativistic quantum
field theory. Ann. Inst. H. Poincaré Sect. A (N.S.), 27(3):221–236,
1977.
[4] J. Fröhlich. Unbounded, symmetric semigroups on a separable Hilbert
space are essentially selfadjoint. Adv. in Appl. Math., 1(3):237–256,
1980.
[5] Jürg Fröhlich. The pure phases, the irreducible quantum fields, and
dynamical symmetry breaking in Symanzik-Nelson positive quantum
field theories. Ann. Physics, 97(1):1–54, 1976.
[6] Stephen A. Fulling. Aspects of quantum field theory in curved space-
time, volume 17 of London Mathematical Society Student Texts. Cam-
bridge University Press, Cambridge, 1989.
[7] I. M. Gel′fand and N. Ya. Vilenkin. Generalized functions. Vol. 4.
Academic Press [Harcourt Brace Jovanovich Publishers], New York,
1964 [1977]. Applications of harmonic analysis, Translated from the
Russian by Amiel Feinstein.
[8] James Glimm and Arthur Jaffe. Quantum physics. Springer-Verlag,
New York, second edition, 1987. A functional integral point of view.
[9] Arthur Jaffe. Constructive quantum field theory. In Mathematical
physics 2000, pages 111–127. Imp. Coll. Press, London, 2000.
[10] Arthur Jaffe. Introduction to Quantum Field Theory. 2005. Lecture
notes from Harvard Physics 289r, available in part online at
http://www.arthurjaffe.com/Assets/pdf/IntroQFT.pdf.
[11] Arthur Jaffe and Gordon Ritter. Quantum field theory on curved
backgrounds. i. the euclidean functional integral. Comm. Math. Phys.,
270(2):545–572, 2007.
[12] Abel Klein and Lawrence J. Landau. Construction of a unique self-
adjoint generator for a symmetric local semigroup. J. Funct. Anal.,
44(2):121–137, 1981.
[13] Abel Klein and Lawrence J. Landau. From the Euclidean group to
the Poincaré group via Osterwalder-Schrader positivity. Comm. Math.
Phys., 87(4):469–484, 1983.
[14] A. E. Nussbaum. Spectral representation of certain one-parametric
families of symmetric operators in Hilbert space. Trans. Amer. Math.
Soc., 152:419–429, 1970.
[15] Konrad Osterwalder and Robert Schrader. Axioms for Euclidean
Green’s functions. Comm. Math. Phys., 31:83–112, 1973.
[16] Konrad Osterwalder and Robert Schrader. Axioms for Euclidean
Green’s functions. II. Comm. Math. Phys., 42:281–305, 1975. With
an appendix by Stephen Summers.
http://www.arthurjaffe.com/Assets/pdf/IntroQFT.pdf
18 ARTHUR JAFFE AND GORDON RITTER
[17] Barry Simon. The P (φ)2 Euclidean (quantum) field theory. Princeton
University Press, Princeton, N.J., 1974. Princeton Series in Physics.
[18] Robert M. Wald. Quantum field theory in curved space-time. In Grav-
itation et quantifications (Les Houches, 1992), pages 63–167. North-
Holland, Amsterdam, 1995.
E-mail address: arthur jaffe@harvard.edu
Harvard University, 17 Oxford St., Cambridge, MA 02138
E-mail address: ritter@post.harvard.edu
Harvard University, 17 Oxford St., Cambridge, MA 02138
	1. Introduction
	2. Classical Space-Time
	2.1. Structure of Static Space-Times
	2.2. The Operator Induced by an Isometry
	3. Osterwalder-Schrader Quantization
	3.1. Quantization of Vectors (The Hilbert Space  H of Quantum Theory)
	3.2. Quantization of Operators
	4. Structure and Representation of the Lie Algebra of Killing Fields
	4.1. The Representation of g on  E
	4.2. The Cartan Decomposition of g
	5. Reflection-Invariant and Reflected Isometries
	6. Construction of Unitary Representations
	6.1. Self-adjointness of Semigroups
	6.2. Reflection-Invariant Isometries
	6.3. Construction of Unitary Representations
	7. Analytic Continuation
	7.1. Conclusions
	8. Hyperbolic Space and Anti-de Sitter Space
	Appendix A. Euclidean Reeh-Schlieder Theorem
	Acknowledgements
	References
ABSTRACT
  We study space-time symmetries in scalar quantum field theory (including
interacting theories) on static space-times. We first consider Euclidean
quantum field theory on a static Riemannian manifold, and show that the
isometry group is generated by one-parameter subgroups which have either
self-adjoint or unitary quantizations. We analytically continue the
self-adjoint semigroups to one-parameter unitary groups, and thus construct a
unitary representation of the isometry group of the associated Lorentzian
manifold. The method is illustrated for the example of hyperbolic space, whose
Lorentzian continuation is Anti-de Sitter space.

<|endoftext|><|startoftext|>
Introduction
In Finsler geometry all geometric objects depend not only on positional coordi-
nates, as in Riemannian geometry, but also on directional arguments. In Riemannian
geometry there is a canonical linear connection on the manifold M , while in Finsler
geometry there is a corresponding canonical linear connection, due to E. Cartan,
which is not a connection on M but is a connection on π−1(TM), the pullback of
the tangent bundle TM by π : TM −→ M (the pullback approach). Moreover, in
Riemannian geometry there is one curvature tensor and one torsion tensor associated
with a given linear connection on the manifold M , whereas in Finsler geometry there
are three curvature tensors and five torsion tensors associated with a given linear
connection on π−1(TM).
Most of the special spaces in Finsler geometry are derived from the fact that
the π-tensor fields (torsions and curvatures) associated with the Cartan connection
satisfy special forms. Consequently, special spaces of Finsler geometry are more
numerous than those of Riemannian geometry. Special Finsler spaces are investigated
locally (using local coordinates) by many authors: M. Matsumoto [16], [18], [15], [14]
and others [6], [19], [8], [7]. On the other hand, the global (or intrinsic, free from
local coordinates) investigation of such spaces is very rare in the literature. Some
considerable contributions in this direction are due to A. Tamim [24], [25].
In the present paper, we provide a global presentation of the theory of special
Finsler manifolds. We introduce and investigate globally many of the most important
and most commonly used special Finsler manifolds : locally Minkowskian, Berwald,
Landesberg, general Landesberg, P -reducible, C-reducible, semi-C-reducible, quasi-
C-reducible, P ∗-Finsler, Ch-recurrent, Cv-recurrent, C0-recurrent, Sv-recurrent, Sv-
recurrent of the second order, C2-like, S3-like, S4-like, P2-like, R3-like, P -symmetric,
h-isotropic, of scalar curvature, of constant curvature, of p-scalar curvature, of s-ps-
curvature.
The paper consists of two parts, preceded by a preliminary section (§1), which
provides a brief account of the basic concepts of the pullback approach to Finsler
geometry necessary to this work. For more detail, the reader is referred to [1], [3], [5]
and [24].
In the first part (§2), we introduce the global definitions of the aforementioned
special Finsler manifolds in such a way that, when localized, they yield the usual
local definitions current in the literature (see the appendix). The definitions are
arranged according to the type of the defining property of the special Finsler manifold
concerned.
In the second part (§3), various relationships between the different types of the
considered special Finsler manifolds are found. Many local results, known in the
literature, are proved globally and several new results are obtained. As a by-product
of some of the obtained results, interesting identities and properties concerning the
torsion tensor fields and the curvature tensor fields are deduced, which in turn play
a key role in obtaining other results.
Among the obtained results are: a characterization of Riemannian manifolds, a
characterization of Sv-recurrent manifolds, a characterization of P -symmetric
manifolds, a characterization of Berwald manifolds (in certain cases), the equivalence
of Landsberg and general Landsberg manifolds under certain conditions, a classifica-
tion of h-isotropic Ch-recurrent manifolds and a presentation of different conditions
under which an R3-like Finsler manifold becomes a Finsler manifold of s-ps curvature.
The above results are just a non-exhaustive sample of the global results obtained in
this paper.
It should finally be noted that some important results of [8], [9], [11], [13], [19],
[20],...,etc. (obtained in local coordinates) are immediately derived from the obtained
global results (when localized).
Although our investigation is entirely global, we conclude the paper with an ap-
pendix presenting a local counterpart of our global approach and the local definitions
of the special Finsler spaces considered. This is done to facilitate comparison and to
make the paper more self-contained.
1. Notation and Preliminaries
In this section, we give a brief account of the basic concepts of the pullback
formalism of Finsler geometry necessary for this work. For more details refer to [1], [3],
[5] and [24]. We make the general assumption that all geometric objects we consider
are of class C∞. The following notations will be used throughout this paper:
M : a real differentiable manifold of finite dimension n and of class C∞,
F(M): the R-algebra of differentiable functions on M ,
X(M): the F(M)-module of vector fields on M ,
πM : TM −→M : the tangent bundle of M ,
π : TM −→M : the subbundle of nonzero vectors tangent to M ,
V (TM): the vertical subbundle of the bundle TTM ,
P : π−1(TM) −→ TM : the pullback of the tangent bundle TM by π,
P ∗ : π−1(T ∗M) −→ TM : the pullback of the cotangent bundle T ∗M by π,
X(π(M)): the F(TM)-module of differentiable sections of π−1(TM).
Elements of X(π(M)) will be called π-vector fields and will be denoted by barred
letters X. Tensor fields on π−1(TM) will be called π-tensor fields. The fundamental
π-vector field is the π-vector field η defined by η(u) = (u, u) for all u ∈ TM . The
lift to π−1(TM) of a vector field X on M is the π-vector field X defined by X(u) =
(u,X(π(u))). The lift to π−1(TM) of a 1-form ω on M is the π-form ω defined by
ω(u) = (u, ω(π(u))).
The tangent bundle T (TM) is related to the pullback bundle π−1(TM) by the
short exact sequence
0 −→ π−1(TM)
−→ T (TM)
−→ π−1(TM) −→ 0,
where the bundle morphisms ρ and γ are defined respectively by ρ = (πT M , dπ) and
γ(u, v) = ju(v), where ju is the natural isomorphism ju : TπM (v)M −→ Tu(TπM (v)M).
Let ∇ be a linear connection (or simply a connection) in the pullback bundle
π−1(TM). We associate to ∇ the map
K : TTM −→ π−1(TM) : X 7−→ ∇Xη,
called the connection (or the deflection) map of ∇. A tangent vector X ∈ Tu(TM)
is said to be horizontal if K(X) = 0 . The vector space Hu(TM) = {X ∈ Tu(TM) :
K(X) = 0} of the horizontal vectors at u ∈ TM is called the horizontal space to M
at u . The connection ∇ is said to be regular if
Tu(TM) = Vu(TM)⊕Hu(TM) ∀u ∈ TM.
If M is endowed with a regular connection, then the vector bundle maps
γ : π−1(TM) −→ V (TM),
ρ|H(T M) : H(TM) −→ π
−1(TM),
K|V (T M) : V (TM) −→ π
−1(TM)
are vector bundle isomorphisms. Let us denote β = (ρ|H(T M))
−1, then
ρoβ = idπ−1(T M), βoρ =
idH(T M) on H(TM)
0 on V(TM)
(1.1)
For a regular connection∇ we define two covariant derivatives
∇ and
∇ as follows:
For every vector (1)π-form A, we have
∇ A)(øX, øY ) := (∇βøXA)(øY ) , (
∇ A)(øX, øY ) := (∇γøXA)(øY ).
The classical torsion tensor T of the connection ∇ is defined by
T(X, Y ) = ∇XρY −∇Y ρX − ρ[X, Y ] ∀X, Y ∈ X(TM).
The horizontal ((h)h-) and mixed ((h)hv-) torsion tensors, denoted respectively by Q
and T , are defined by
Q(X, Y ) = T(βXβY ), T (X, Y ) = T(γX, βY ) ∀X, Y ∈ X(π(M)).
The classical curvature tensor K of the connection ∇ is defined by
K(X, Y )ρZ = −∇X∇Y ρZ +∇Y∇XρZ +∇[X,Y ]ρZ ∀X, Y, Z ∈ X(TM).
The horizontal (h-), mixed (hv-) and vertical (v-) curvature tensors, denoted respec-
tively by R, P and S, are defined by
R(X, Y )øZ = K(βXβY )øZ, P (X, Y )øZ = K(βX, γY )øZ, S(X, Y )øZ = K(γX, γY )øZ.
We also have the (v)h-, (v)hv- and (v)v-torsion tensors, denoted respectively by R̂,
P̂ and Ŝ, defined by
R̂(X, Y ) = R(X, Y )øη, P̂ (X, Y ) = P (X, Y )øη, Ŝ(X, Y ) = S(X, Y )øη.
Theorem 1.1. [25] Let (M,L) be a Finsler manifold. There exists a unique regular
connection ∇ in π−1(TM) such that
(a) ∇ is metric : ∇g = 0,
(b) The horizontal torsion of ∇ vanishes : Q = 0,
(c) The mixed torsion T of ∇ satisfies g(T (X, Y ), Z) = g(T (X,Z), Y ).
Such a connection is called the Cartan connection associated to the Finsler man-
ifold (M,L).
One can show that the torsion T of the Cartan connection has the property that
T (X, η) = 0 for all X ∈ X(π(M)) and associated to T we have:
Definition 1.2. [25] Let ∇ be the Cartan connection associated to (M,L). The
torsion tensor field T of the connection ∇ induces a π-tensor field of type (0, 3),
called the Cartan tensor and denoted again T , defined by :
T (X, Y , Z) = g(T (X, Y ), Z), for all X, Y , Z ∈ X(TM).
It also induces a π-form C, called the contracted torsion, defined by :
C(X) := Tr{Y 7−→ T (X, Y )}, for all X ∈ X(TM).
Definition 1.3. [25] With respect to the Cartan connection ∇ associated to (M,L),
we have
– The horizontal and vertical Ricci tensors Rich and Ricv are defined respectively by:
Rich(X, Y ) := Tr{Z 7−→ R(X,Z)Y }, for all X, Y ∈ X(TM),
Ricv(X, Y ) := Tr{Z 7−→ S(X,Z)Y }, for all X, Y ∈ X(TM).
– The horizontal and vertical Ricci maps Rich0 and Ric
0 are defined respectively by:
g(Rich0(X), Y ) := Ric
h(X, Y ), for all X, Y ∈ X(TM),
g(Ricv0(X), Y ) := Ric
v(X, Y ), for all X, Y ∈ X(TM).
– The horizontal and vertical scalar curvatures Sch , Scv are defined respectively by:
Sch := Tr(Rich0), Sc
v := Tr(Ricv0),
where R and S are respectively the horizontal and vertical curvature tensors of ∇.
Proposition 1.4. [12] Let (M,L) be a Finsler manifold. The vector field G deter-
mined by iGΩ = −dE is a spray, called the canonical spray associated to the energy
E, where E := 1
L2 and Ω := ddJE.
One can show, in this case, that G = βoη, and G is thus horizontal with respect
to the Cartan connection ∇.
Theorem 1.5. [26] Let (M,L) be a Finsler manifold. There exists a unique regular
connection D in π−1(TM) such that
(a) D is torsion free,
(b) The canonical spray G = βoη is horizontal with respect to D,
(c) The (v)hv-torsion tensor P̂ of D vanishes.
Such a connection is called the Berwald connection associated to the Finsler
manifold (M,L).
2. Special Finsler spaces
In this section, we introduce the global definitions of the most important and
commonly used special Finsler spaces in such a way that, when localized, they yield
the usual local definitions existing in the literature (see the Appendix). Here we
simply set the definitions, postponing investigation of the mutual relationships be-
tween these special Finsler spaces to the next section. The definitions are arranged
according to the type of defining property of the special Finsler space concerned.
Throughout the paper, g, ĝ, ∇ and D denote respectively the Finsler metric in
π−1(TM), the induced metric in π−1(T ∗M), the Cartan connection and the Berwald
connection associated to a given Finsler manifold (M,L). Also, T denotes the torsion
tensor of the Cartan connection (or the Cartan tensor) and R, P and S denote
respectively the horizontal curvature, the mixed curvature and the vertical curvature
of the Cartan connection.
Definition 2.1. A Finsler manifold (M,L) is :
(a) Riemannian if the metric tensor g(x, y) is independent of y or, equivalently, if
T (X, Y ) = 0, for all X, Y ∈ X(π(M)).
(b) locally Minkowskian if the metric tensor g(x, y) is independent of x or, equiva-
lently, if
∇βX T = 0 and R = 0.
Definition 2.2. A Finsler manifold (M,L) is said to be :
(a) Berwald [24] if the torsion tensor T is horizontally parallel. That is,
∇βX T = 0.
(b) Ch-recurrent if the torsion tensor T satisfies the condition
∇βX T = λo(X) T,
where λo is a π-form of order one.
(c) P ∗-Finsler manifold if the π-tensor field ∇βηT is expressed in the form
∇βη T = λ(x, y) T,
where λ(x, y) =
bg(∇βη C,C)
g(∇βηøC,øC)
and C2 := ĝ(C,C) = C(C) 6= 0; C
being the π-vector field defined by g(C,X) = C(X).
Definition 2.3. A Finsler manifold (M,L) is said to be:
(a) Cv-recurrent if the torsion tensor T satisfies the condition
(∇γXT )(Y , Z) = λo(X)T (Y , Z).
(b) C0-recurrent if the torsion tensor T satisfies the condition
(DγXT )(Y , Z) = λo(X)T (Y , Z).
Definition 2.4. [25] A Finsler manifold (M,L) is said to be :
(a) semi-C-reducible if dimM ≥ 3 and the Cartan tensor T has the form
T (X, Y , Z) =
{~(X, Y )C(Z) + ~(Y , Z)C(X) + ~(Z,X)C(Y )}+
C(X)C(Y )C(Z),
where µ and τ are scalar functions satisfying µ + τ = 1, ~ = g − ℓ ⊗ ℓ and ℓ(X) :=
L−1g(X, η).
(b) C-reducible if dimM ≥ 3 and the Cartan tensor T has the form
T (X, Y , Z) =
{~(X, Y )C(Z) + ~(Y , Z)C(X) + ~(Z,X)C(Y )}.
(c) C2-like if dimM ≥ 2 and the Cartan tensor T has the form
T (X, Y , Z) =
C(X)C(Y )C(Z).
Definition 2.5. A Finsler manifold (M,L), where dimM ≥ 3, is said to be quasi-C-
reducible if the Cartan tensor T is written as :
T (X, Y , Z) = A(X, Y )C(Z) + A(Y , Z)C(X) + A(Z,X)C(Y ),
where A is a symmetric indicatory (2) π-form (A(X, η) = 0 for all X).
Definition 2.6. [25] A Finsler manifold (M,L) is said to be :
(a) S3-like if dim(M) ≥ 4 and the vertical curvature tensor S(X, Y , Z,W )
:= g(S(X, Y )Z,W ) has the form :
S(X, Y , Z,W ) =
(n− 1)(n− 2)
{~(X,Z)~(Y ,W )− ~(X,W )~(Y , Z)}.
(b) S4-like if dim(M) ≥ 5 and the vertical curvature tensor S(X, Y , Z,W ) has the
form :
S(X, Y , Z,W ) =~(X,Z)F(Y ,W )− ~(Y , Z)F(X,W )+
+ ~(Y ,W )F(X,Z)− ~(X,W )F(Y , Z),
(2.1)
where F is the (2)π-form defined by F =
{Ricv −
Scv ~
2(n− 2)
Definition 2.7. A Finsler manifold (M,L) is said to be :
(a) Sv-recurrent if the v-curvature tensor S satisfies the condition
(∇γXS)(Y , Z,W ) = λ(X)S(Y , Z)W,
where λ is a π-form of order one.
(b) Sv-recurrent of the second order if the v-curvature tensor S satisfies the condition
∇ S)(øY, øX,Z,W,U) = Θ(X, Y )S(Z,W )U,
where Θ is a π-form of order two.
Definition 2.8. [24] A Finsler manifold (M,L) is said to be :
(a) a Landsberg manifold if
P̂ (X, Y ) = P (X, Y )η = 0 ∀X, Y ∈ X(π(M)), or equivalently ∇βη T = 0.
(b) a general Landsberg manifold if
Tr{Y −→ P̂ (X, Y )} = 0 ∀X,∈ X(π(M)), or equivalently ∇βη C = 0.
Definition 2.9. A Finsler manifold (M,L) is said to be P -symmetric if the mixed
curvature tensor P satisfies
P (X, Y )Z = P (Y ,X)Z, ∀ øX, øY, øZ ∈ X(π(M)).
Definition 2.10. A Finsler manifold (M,L), where dimM ≥ 3, is said to be P2-like
if the mixed curvature tensor P has the form :
P (X, Y , Z, øW ) = α(Z)T (X, Y , øW )− α(W ) T (X, øY, Z),
where α is a (1) π-form (positively homogeneous of degree 0).
Definition 2.11. [25] A Finsler manifold (M,L), where dimM ≥ 3, is said to be
P -reducible if the π-tensor field P (X, Y , Z) := g(P (X, Y )η, Z) can be expressed in
the form :
P (X, Y , Z) = δ(X)~(Y , Z) + δ(Y )~(Z,X) + δ(Z)~(X, Y ),
where δ is a (1) π-form satisfying δ(øη) = 0.
Definition 2.12. [2] A Finsler manifold (M,L), where dimM ≥ 3, is said to be
h-isotropic if there exists a scalar ko such that the horizontal curvature tensor R has
the form
R(X, Y )Z = ko{g(Y , Z)X − g(X,Z)Y }.
Definition 2.13. [2] A Finsler manifold (M,L), where dimM ≥ 3, is said to be :
(a) of scalar curvature if there exists a scalar function k : TM −→ R such that
the horizontal curvature tensor R(X, Y , Z,W ) := g(R(X, Y )Z,W ) satisfies the
relation
R(η,X, η, Y ) = kL2~(X, Y ).
(b) of constant curvature if the function k in (a) is constant.
Definition 2.14. A Finsler manifold (M,L) is said to be R3-like if dimM ≥ 4 and
the horizontal curvature tensor R(X, Y , Z,W ) is expressed in the form
R(X, Y , Z,W ) =g(X,Z)F (Y ,W )− g(Y , Z)F (X,W )+
+ g(Y ,W )F (X,Z)− g(X,W )F (Y , Z),
(2.2)
where F is the (2)π-form defined by F = 1
{Rich − Sc
2(n−1)
3. Relationships between different types of special
Finsler spaces
This section is devoted to global investigation of some mutual relationships
between the special Finsler spaces introduced in the preceding section. Some conse-
quences are also drawn from these relationships.
We start with some immediate consequences from the definitions:
(a) A Locally Minkowskian manifold is a Berwald manifold.
(b) A Berwald manifold is a Landsberg manifold.
(c) A Landsberg manifold is a general Landsberg manifold.
(d) A Berwald manifold is Ch-recurrent (resp. P ∗-Finsler).
(e) A P ∗-manifold is a Landsberg manifold.
(f) A C-reducible (resp. C2-like) manifold is semi-C-reducible.
(g) A semi-C-reducible manifold is quasi-C-reducible.
(h) A Finsler manifold of constant curvature is of scalar curvature.
The following two lemmas are useful for subsequent use.
Lemma 3.1. [25] For every øX, øY ∈ X(π(M)), we have:
(a) P (øη, øX)øY = 0, (b) P (øX, øη)øY = 0, (c) P (øX, øY )øη = (∇βøηT )(øX, øY ).
Lemma 3.2. If φ is the vector π-form defined by
φ(øX) := øX − L−1ℓ(øX)øη, or φ := I − L−1ℓ⊗ øη, (3.1)
where ℓ is the π-form given by ℓ(X) = L−1g(X, η), then we have:
(a) ~(øX, øY ) = g(φ(øX), øY ), (b) φ(øη) = 0, (c) φ o φ = φ,
(d) Tr(φ) = n− 1, (e) ∇βøX φ = 0, (f) ∇βøX ~ = 0.
As we have seen, a Landsberg manifold is general Landsberg. The converse is
not true. Nevertheless, we have
Proposition 3.3. A C-reducible general Landsberg manifold (M,L) is a Landsberg
manifold.
Proof. Since (M,L) is a C-reducible manifold, then, by Definition 2.4, Lemma 3.2,
the symmetry of ~ and the non-degeneracy of g, we get
T (øX, øY ) =
{~(øX, øY )øC + C(øX)φ(øY ) + C(øY )φ(øX)},
where øC is the π-vector field defined by g(øC, øX) := C(øX). Taking the h-covariant
derivative ∇βøZ of both sides of the above equation, we obtain
(∇βøZ T )(øX, øY ) =
{(∇βøZ ~)(øX, øY )øC + ~(øX, øY )∇βøZ øC + C(øX)(∇βøZ φ)(øY ) +
+(∇βøZ C)(øX)φ(øY ) + C(øY )(∇βøZ φ)(øX) + (∇βøZ C)(øY )φ(øX)},
from which, by setting øZ = øη and taking into account the fact that ∇βøZ ~ = 0 and
that ∇βøZ φ = 0 ( Lemma 3.2), we get
(∇βøη T )(øX, øY ) =
{~(øX, øY )∇βøη øC+(∇βøη C)(øX)φ(øY )+(∇βøη C)(øY )φ(øX)}.
Now, under the given assumption that the (M,L) is a general Landsberg manifold,
then ∇βøη C = 0 (Definition 2.8) and hence ∇βøη øC = 0. Hence ∇βøη T = 0 and the
result follows. �
Also, a Berwald manifold is Landsberg. The converse is by no means true,
although we have no counter-examples. Finding a Landsberg manifold which is not
Berwald is still an open problem. Nevertheless, we have
Proposition 3.4. [25] A C-reducible Landsberg manifold (M,L) is a Berwald
manifold.
Combining the above two Propositions, we obtain the more powerful result :
Proposition 3.5. A C-reducible general Landsberg manifold (M,L) is a Berwald
manifold.
Summing up, we get:
Theorem 3.6. Let (M,L) be a C-reducible Finsler manifold. The following assertion
are equivalent :
(a) (M,L) is a Berwald manifold.
(b) (M,L) is a Landsberg manifold.
(c) (M,L) is a general Landsberg manifold.
We retrieve here a result of Matsumuoto [15], namely
Corollary 3.7. If the h-curvature tensor R and hv-curvature tensor P of a C-
reducible manifold vanish, then the manifold is Locally Minkowskian.
Remark 3.8. [15] It may be conjectured that a Finsler manifold will be Minkowskian
if the h-curvature tensor R and hv-curvature tensor P vanish. As above seen the
conjecture is verified already under somewhat strong condition “C-reducibility”.
Theorem 3.9. Let (M,L) be a Finsler manifold. Then we have :
(a) A C-reducible manifold is P -reducible.
(b) A P -reducible general Landsberg manifold is Landsberg.
Proof.
(a) Since (M,L) is C-reducible, then by Definition 2.4, we have
T (øX, øY, øZ) =
SøX,øY,øZ{~(øX, øY )C(øZ)}.
Applying the h-covariant derivative ∇βøW on both sides of the above equation, taking
into account the fact that (∇βøW T )(øX, øY, øZ) = g((∇βøW T )(øX, øY ), øZ) and that
∇βøW ~ = 0, we obtain
g((∇βøWT )(øX, øY ), øZ) =
SøX,øY,øZ{~(øX, øY )(∇βøW C)(øZ)}.
From which, by setting øW = øη and noting that P (øX, øY )øη = (∇βøη T )(øX, øY ),
the result follows.
(b) Since (M,L) is a P -reducible manifold, then by Definition 2.11, taking into
account the fact that g is nondegenerate, we obtain
P (øX, øY )øη = δ(øX)φ(øY ) + δ(øY )φ(øX) + ø~(øX, øY ) øζ, (3.2)
where øζ is the π-vector field defined by g(øζ, øX) := δ(øX).
Since δ(øη) = 0, then Tr{øY 7−→ δ(øY )φ(øX) + ~(øX, øY ) øζ} = 2δ(øX). Taking
the trace of both sides of (3.2), using the fact that P (øX, øY )øη = (∇βøη T )(øX, øY )
(Lemma 3.1) and that Tr{øY 7−→ (∇βøη T )(øX, øY )} = (∇βøη C)(øX), we get
δ(øX) =
n + 1
(∇βøη C)(øX). (3.3)
Now, from Equations (3.2) and (3.3), we have
g(P (øX, øY )øη, øZ) =
n + 1
SøX,øY,øZ{~(øX, øY )(∇βøη C)(øZ)}. (3.4)
According to the given assumption that the manifold is general Landsberg, then
∇βøη C = 0. Therefore, from (3.4), we get P (øX, øY )øη = 0 and hence the manifold
is Landsberg. �
Proposition 3.10.
(a) A Ch-recurrent manifold is a P ∗-Finsler manifold.
(b) A general Landsberg P ∗-Finsler manifold is a Landsberg manifold.
Proof. The proof is straightforward and we omit it. �
Proposition 3.11. A C2-like Finsler manifold is a Berwald manifold if, and only if,
the π-tensor field C is horizontally parallel.
Proof. Let (M,L) be C2-like. Then, T (øX, øY, øZ) =
C(øC)
C(øX)C(øY )C(øZ),
from which T (øX, øY ) = 1
C(øC)
C(øX)C(øY )øC. Taking the h-covariant derivative of
both sides, we get
(∇βøZT )(øX, øY ) =
−∇βøZC(øC)
C(øX)C(øY )øC +
C(øC)
(∇βøZC)(øX)C(øY )øC +
C(øC)
(∇βøZC)(øY )C(øX)øC +
C(øC)
C(øX)C(øY )∇βøZøC.
In view of this relation, ∇βøZ T = 0 if, and only if, ∇βøZ C = 0. Hence the result. �
Corollary 3.12. A C2-like general Landsberg manifold is a Landsberg manifold.
In view of the above Theorems, we have:
Corollary 3.13. The two notions of being Landsberg and general Landsberg coincide
in the case of C-reducibility, P -reducibility, C2-likeness or P
∗-Finsler.
As we know, a C-reducible Landsberg manifold is a Berwald manifold (Proposi-
tion 3.4 ). Moreover, A C2-like Finsler manifold is a Berwald manifold if, and only
if, the π-tensor field C is horizontally parallel (Proposition 3.11). We shall try to
generalize these results to the case of semi-C-reduciblity.
Theorem 3.14. A semi-C-reducible Finsler manifold is a Berwald manifold if, and
only if, the characteristic scalar µ and the π-tensor field C are horizontally parallel.
Proof. Firstly, if (M,L) is semi-C-reducible, then
T (øX, øY, øZ) =
SøX,øY,øZ{~(øX, øY )C(øZ)}+
C(øC)
C(øX)C(øY )C(øZ).
Taking the h-covariant derivative of both sides, noting that ∇βøX~ = 0, we get
(∇βøWT )(øX, øY, øZ) =
n + 1
SøX,øY,øZ{~(øX, øY ){µ(∇βøWC)(øZ) + (∇βøWµ)C(øZ)}}+
SøX,øY,øZ{(∇βøWC)(øX)C(øY )C(øZ)} −
∇βøW µ
τ ∇βøWC(øC)
}C(øX)C(øY )C(øZ).
Now, if the characteristic scalar µ and the π-tensor field C are horizontally par-
allel, then ∇βøWT = 0 and (M,L) is a Berwald manifold.
Conversely, if (M,L) is a Berwald manifold, then∇βøXT = 0 and hence ∇βøXC =
0, ∇βøXøC = 0. These, together with the above equation, give
∇βøWµ{
SøX,øY,øZ{~(øX, øY )C(øZ)} −
C(øX)C(øY )C(øZ)} = 0,
which implies immediately that ∇βøWµ = 0. �
The following lemmas are useful for subsequent use
Lemma 3.15. For all X, Y ∈ X(π(M)), we have :
(a) [γX, γY ] = γ(∇γXY −∇γYX)
(b) [γX, βY ] = −γ(P (Y ,X)η +∇βYX) + β(∇γXY − T (X, Y ))
(c) [βX, βY ] = γ(R(X, Y )η) + β(∇βXY −∇βYX)
Lemma 3.16. For all øX, øY, øZ, øW ∈ X(π(M)) and W ∈ X(TM), we have :
(a) g((∇WT )(øX, øY ), øZ) = g((∇WT )(øX, øZ), øY ),
(b) g(S(øX, øY )øZ, øW ) = −g(S(øX, øY )øW, øZ).
Proof.
(a) From the definition of the covariant derivative, we get
g((∇WT )(øX, øY ), øZ) = g(∇WT (øX, øY ), øZ)− g(T (∇WøX, øY ), øZ)−
−g(T (øX,∇WøY ), øZ).
(3.5)
Now, we have
g(∇WT (øX, øY ), øZ) = W · g(T (øX, øY ), øZ)− g(T (øX, øY ),∇WøZ)
= W · g(T (øX, øY ), øZ)− g(T (øX,∇WøZ), øY ),
Similarly,
g(T (øX,∇WøY ), øZ) = W · g(T (øX, øZ), øY )− g(∇WT (øX, øZ), øY ).
Substituting these two equations into (3.5), noting the property that g(T (∇WøX, øY ), øZ)
= g(T (∇WøX, øZ), øY ) (cf. §1), the result follows.
(b) follows directly from the general formula (which can be easily proved)
g(K(X, Y )øZ, øW ) + g(K(X, Y )øW, øZ) = 0
by setting X = γøX and Y = γøY , where K is the classical curvature tensor of the
Cartan connection as a linear connection in the pull-back bundle (cf. §1). �
Proposition 3.17. Let (M,L) be a Ch-recurrent Finsler manifold (∇βøXT = λ0(øX)T ).
Then, we have:
(a) If Ko := λo(øη) = 0, then the hv-curvature tensor P is expressed in the form:
P (øX, øY, øZ, øW ) = λo(øZ)T (øX, øY, øW )− λo(øW )T (øX, øY, øZ)
and the (v)hv-torsion P̂ vanishes.
(b) If Ko 6= 0, then the v(hv)-torsion tensor P̂ is recurrent:
(∇βøZP̂ )(øX, øY ) = (λo(øZ) +
∇βøZKo
)P̂ (øX, øY ).
Proof.
(a) The hv-curvature tensor P can be written in the form [25]:
P (øX, øY, øZ, øW ) = g((∇βøZT )(øX, øY ), øW )− g((∇βøWT )(øX, øY ), øZ)+
+g(T (øX, øZ), P̂(øW, øY ))− g(T (øX, øW ), P̂(øZ, øY )).
Then, by using P̂ (øX, øY ) = (∇βøηT )(øX, øY ) (Lemma 3.1) and the C
h-recurrence
condition, we get
P (øX, øY, øZ, øW ) = λo(øZ)T (øX, øY, øW )− λo(øW )T (øX, øY, øZ)−
−λo(øη){g(T (øX, øW ), T (øY, øZ))− g(T (øX, øZ), T (øY, øW ))}
= λo(øZ)T (øX, øY, øW )− λo(øW )T (øX, øY, øZ)− λo(øη)S(øX, øY, øZ, øW ).
Now, if λo(øη) = 0, then (a) follows from the above relation.
(b) If Ko := λo(øη) 6= 0, then by Lemma 3.1 and the recurrence condition, we have
P̂ (øX, øY ) = KoT (øX, øY ),
from which
(∇βøZP̂ )(øX, øY ) = {∇βøZKo +Koλo(øZ)}T (øX, øY ).
Then, (b) follows from the above two equations. �
Theorem 3.18. Assume that (M,L) is Ch-recurrent. Then, the v-curvature tensor S
is recurrent with respect to the h-covariant differentiation : ∇βøXS = θ(øX)S, where
θ is a π-form of order one.
Proof. One can easily show that : For all X, Y, Z ∈ X(TM),
SX,Y,Z{K(X, Y )ρZ +∇XT(Y, Z) +T(X, [Y, Z])} = 0.
Setting X = γøX, Y = γøY and Z = βøZ in the above equation, we get
S(øX, øY )øZ = ∇γøY T (øX, øZ)−∇γøXT (øY, øZ)−∇βøZT(γøX, γøZ)−
−T(γøX, [γøY, βøZ]) +T(γøY, [γøX, βøZ]) +T([γøX, γøY ], βøZ).
Using Lemma 3.15 and the fact that T(γøX, γøZ) = 0, the above equation reduces
S(øX, øY )øZ = (∇γøY T )(øX, øZ)− (∇γøXT )(øY, øZ)+
+T (øX, T (øY, øZ))− T (øY, T (øX, øZ)).
(3.6)
From which, since g(T (øX, øY ), øZ) = g(T (øX, øZ), øY ), we have
g(S(øX, øY )øZ, øW ) = g((∇γøY T )(øX, øZ), øW )− g((∇γøXT )(øY, øZ), øW )+
+g(T (øX, øW ), T (øY, øZ))− g(T (øY, øW ), T (øX, øZ)).
Similarly,
g(S(øX, øY )øW, øZ) = g((∇γøY T )(øX, øW ), øZ)− g((∇γøXT )(øY, øW ), øZ)+
+g(T (øX, øZ), T (øY, øW ))− g(T (øY, øZ), T (øX, øW )).
The above two equations, together with Lemma 3.16, yield
g((∇γøXT )(øY, øZ), øW ) = g((∇γøY T )(øX, øZ), øW ). (3.7)
By (3.6) and (3.7), we obtain
S(øX, øY, øZ, øW ) = g(T (øX, øW ), T (øY, øZ))− g(T (øY, øW ), T (øX, øZ)). (3.8)
Now, using the given assumption that the manifold is Ch-recurrent, Equation
(3.8) implies that
(∇βøXS)(øY, øZ, øV, øW ) = ∇βøXS(øY, øZ, øV, øW )−
−S(∇βøXøY, øZ, øV, øW )− S(øY,∇βøXøZ, øV, øW )−
−S(øY, øZ,∇βøXøV, øW )− S(øY, øZ, øV,∇βøXøW ).
= +∇βøXg(T (øY, øW ), T (øZ, øV ))−∇βøXg(T (øZ, øW ), T (øY, øV ))−
−g(T (∇βøXøY, øW ), T (øZ, øV )) + g(T (øZ, øW ), T (∇βøXøY, øV ))−
−g(T (øY, øW ), T (∇βøXøZ, øV )) + g(T (∇βøXøZ, øW ), T (øY, øV ))−
−g(T (øY, øW ), T (øZ,∇βøXøV )) + g(T (øZ, øW ), T (øY,∇βøXøV ))−
−g(T (øY,∇βøXøW ), T (øZ, øV )) + g(T (øZ,∇βøXøW ), T (øY, øV )).
= g((∇βøXT )(øY, øW ), T (øZ, øV )) + g(T (øY, øW ), (∇βøXT )(øZ, øV ))−
−g((∇βøXT )(øZ, øW ), T (øY, øV ))− g(T (øZ, øW ), (∇βøXT )(øY, øV )).
= 2λo(øX)S(øY, øZ, øV, øW ) =: θ(øX)S(øY, øZ, øV, øW ).
Hence, the result follows. �
Corollary 3.19. In the course of the proof of Theorem 3.18, we have shown that
(Equations (3.7) and (3.8)) :
(a) (∇γøXT )(øY, øZ) = (∇γøY T )(øX, øZ),
(b) S(øX, øY, øZ, øW ) = g(T (øX, øW ), T (øY, øZ))− g(T (øY, øW ), T (øX, øZ)).
Corollary 3.20. Let (M,L) be a C2-like Finsler manifold. Then the the v-curvature
tensor S vanishes.
Proof. Substituting T (øX, øY ) = 1
C(øC)
C(øX)C(øY )øC in Corollary 3.19(b), we
get the result. �
Corollary 3.21. Let (M,L) be a C-reducible manifold. Then,
(a) the v-curvature tensor S has the form
S(øX, øY, øZ, øW ) =
(n + 1)2
{C2~(øX, øW )~(øY, øZ)− C2~(øY, øW )~(øX, øZ) +
+~(øX, øW )C(øY )C(øZ) + ~(øY, øZ)C(øX)C(øW )−
−~(øY, øW )C(øX)C(øZ)− ~(øX, øZ)C(øY )C(øW )}.
(b) the vertical Ricc tensor Ricv has the form
Ricv(øX, øY ) =
(3− n)
(n + 1)2
C(øX)C(øY )−
(n− 1)
(n+ 1)2
C2~(øX, øY ).
(c) the vertical scalar curvature Scv has the form
Scv =
(2− n)
(n+ 1)
Theorem 3.22. A Finsler manifold (M,L) is P -Symmetric if, and only if, the
v-curvature tensor S satisfies the equation ∇βøηS = 0.
Proof. One can show that: For all X, Y, Z ∈ X(TM),
SX,Y,Z{∇ZK(X, Y )−K(X, Y )∇Z −K([X, Y ], Z)} = 0. (3.9)
Setting X = γøX, Y = γøY and Z = βøZ in the above equation, we get
∇βøZS(øX, øY )øW +∇γøY P (øZ, øX)øW −∇γøXP (øZ, øY )øW−
−S(øX, øY )∇βøZøW + P (øZ, øY )∇γøXøW − P (øZ, øX)∇γøY øW−
−K([γøX, γøY ], βøZ)øW −K([γøY, βøZ], γøX)øW −K([βøZ, γøX ], γøY )øW = 0.
By using Lemma 3.15, the above relation reduces to
(∇βøZS)(øX, øY, øW ) + (∇γøY P )(øZ, øX, øW )− (∇γøXP )(øZ, øY, øW )+
+S(P (øZ, øY )øη, øX)øW − S(P (øZ, øX)øη, øY )øW+
+P (T (øY, øZ), øX)øW − P (T (øX, øZ), øY )øW = 0.
(3.10)
Setting øZ = øη in the above equation, taking into account Lemma 3.1 and the fact
that T (øX, øη) = 0 and that (∇γøXP )(øη, øY, øZ) = −P (øX, øY )øZ, we get
P (øX, øY )øZ = P (øY, øX)øZ − (∇βøηS)(øX, øY, øZ). (3.11)
The result follows immediately from (3.11). �
According to (3.11) and Lemma 3.1, we have :
Corollary 3.23. Let P̂ (øX, øY ) := P (øX, øY )øη and T̂ (øX, øY ) := (∇βøηT )(øX, øY ).
Then the π-tensor fields P̂ and T̂ are symmetric.
Theorem 3.18 and Theorem 3.22 give rise the following result.
Theorem 3.24. Assume that a Finsler manifold (M,L) is Ch-recurrent and P -
symmetric. If θ(øη) 6= 0, then the v-curvature tensor S vanishes identically.
Now, we shall prove the following lemma which provides some important and
useful properties of the torsion tensor T and the v-curvature S :
Lemma 3.25. For every øX, øY, øZ and øW ∈ X(π(M)), we have
(a) T (øX, øY ) = T (øY, øX),
(b) T (øη, øX) = 0,
(c) SøX,øY,øZS(øX, øY )øZ = 0,
(d) g(S(øX, øY )øZ, øW ) = g(S(øZ, øW )øX, øY ),
(e) S(øη, øX)øY = 0 = S(øX, øη)øY ,
(f) (∇γøXS)(øη, øY )øZ = −S(øX, øY )øZ, (∇γøXS)(øη, øX)øη = 0 .
(g) S(øX, øY )øZ = −1
{(DγXT )(Y , Z)− (DγY T )(X,Z)}.
Consequently, S vanishes if and only if (DγXT )(Y , Z) = (DγY T )(X,Z).
Proof.
(a) From Corollary 3.19(a), we have
(∇γøXT )(øY, øZ) = (∇γøY T )(øX, øZ).
Setting øZ = øη and using the fact that T (øX, øη) = 0 and that K oγ = idX(π(M)),
the result follows.
(b) Follows from (a) together with the relation T (øX, øη) = 0.
(c) Setting X = γøX, Y = γøY and Z = γøZ in (3.9) and using Lemma 3.15, we
SøX,øY,øZ(∇γøXS)(øY, øZ, øW ) = 0.
Again, setting øW = øη in the above equation and using the fact that S(øX, øY )øη =
0 and that K oγ = idX(π(M)), the result follows.
(d) Follows from Corollary 3.19(b), noting that T is symmetric.
(e) and (f) are clear.
(g) From the relation DγXøY = ∇γXøY − T (øX, øY ) [27], we get
(DγXT )(øY, øZ) = (∇γXT )(øY, øZ)−T (øX, T (øY, øZ))+T (T (øX, øY ), øZ)+T (øY, T (øX, øZ)),
(DγY T )(øX, øZ) = (∇γY T )(øX, øZ)−T (øY, T (øX, øZ))+T (T (øY, øX), øZ)+T (øX, T (øY, øZ)).
The result follows from the above two equations, using Corollary 3.19 and the sym-
metry of T . �
As a direct consequence of the above lemma, we have the
Corollary 3.26. A P2-like Finsler manifold is P -symmetric.
Proposition 3.27. Assume that (M,L) is Cv-recurrent. Then, the v-curvature ten-
sor S is v-recurrent : ∇γøXS = Ψ(øX)S, Ψ being a (1)π-form. Consequently, S
vanishes identically.
Proof. Taking the v-covariant derivative of both sides of the relation in Corollary
3.19(b) and, then, using the assumption that ∇γXT = λ0(X)T , we get
(∇γøXS)(øY, øZ, øV, øW ) = 2λo(øX)S(øY, øZ, øV, øW ) =: ψ(øX)S(øY, øZ, øV, øW ),
which shows that S is v-recurrent.
Now, setting øV = øη in the last equation, using the properties of S and noting
that K oγ = idX(π(M)), we conclude that S = 0. �
The following result gives a characterization of Riemannian manifolds in terms
of Cv-recurrence and C0-recurrence.
Theorem 3.28.
(a) A Cv-recurrent Finsler manifold is Riemannian,
(b) A C0-recurrent Finsler manifold is Riemannian.
Proof. (a) Since (M,L) is Cv-recurrent, then (∇γXT )(Y , Z) = λo(X)T (Y , Z), from
which, by setting øX = øη and noting that ∇γøηT = −T , we get
T (Y , Z) = −λo(η)T (Y , Z). (3.12)
But since (∇γøXT )(øY, øZ) = (∇γøY T )(øX, øZ) (Corollary 3.19), then λo(øX)T (øY, øZ) =
λo(øY )T (øX, øZ). Hence,
λo(η)T (Y , Z) = 0. (3.13)
Then, the result follows from (3.12) and (3.13).
(b) can be proved similarly. �
Theorem 3.29. For a Finsler manifold (M,L), the following assertions are
equivalent :
(a) (M,L) is Sv-recurrent.
(b) The v-curvature tensor S vanishes identically.
(c) (M,L) is Sv-recurrent of the second order.
Proof.
(a) =⇒ (b) : If (M,L) is Sv-recurrent, then by Definition 2.7(a) we have
(∇γøWS)(øX, øY, øZ) = λ(øW )S(øY, øX)øZ,
from which, by setting øZ = øη, taking into account the fact that S(øX, øY )øη = 0
and that Koγ = idπ−1(TM), the result follows.
(b) =⇒ (a) : Trivial.
(b) =⇒ (c) : Trivial.
(c) =⇒ (b) : If the given manifold (M,L) is Sv-recurrent of the second order, then
by Definition 2.7(b) we get
Θ(øX, øY )S(øZ, øV )øW = (
∇ S)(øY, øX, øZ, øV, øW )
= ∇γøY (∇γøXS)(øZ, øV, øW )− (∇γ∇γøY øXS)(øZ, øV, øW )−
−(∇γøXS)(∇γøY øZ, øV, øW )− (∇γøXS)(øZ,∇γøY øV, øW )−
−(∇γøXS)(øZ, øV,∇γøY øW ).
By substituting øZ = øη = øW in the above equation and using Lemma 3.25 and the
fact that S(øX, øY )øη = 0, we get
S(øX, øY )øZ = −S(øZ, øY )øX and S(øX, øY )øZ = −S(øX, øZ)øY.
From this, together with the identity SøX,øY,øZS(øX, øY )øZ = 0, the v-curvature
tensor S vanishes identically. �
In view of the above theorem we have :
Corollary 3.30.
(a) An Sv-recurrent (resp. a second order Sv-recurrent) manifold (M,L) is S3-like,
provided that dimM ≥ 4.
(b) An Sv-recurrent (resp. a second order Sv-recurrent) manifold (M,L) is S4-like,
provided that dimM ≥ 5.
Theorem 3.31. If (M,L) is a P2-like Finsler manifold, then the v-curvature tensor
S vanishes or the hv-curvature tensor P vanishes. In the later case, the h-covariant
derivative of S vanishes.
Proof. As (M,L) is P2-like, then P (X, Y , η, øW ) = α(η)T (X, Y , øW ) =: αoT (X, Y , øW )
and hence
P̂ (øX, øY ) = αoT (X, Y ). (3.14)
Now, setting øW = øη into (3.10), we get
(∇γøY P̂ )(øZ, øX)− (∇γøX P̂ )(øZ, øY )− P (øZ, øX)øY + P (øZ, øY )øX−
−P̂ (T (øX, øZ), øY ) + P̂ (T (øY, øZ), øX) = 0.
Hence,
g((∇γøY P̂ )(øZ, øX), øW )− g((∇γøXP̂ )(øZ, øY ), øW )− P (øZ, øX, øY, øW )+
+P (øZ, øY, øX, øW )− g(P̂ (T (øX, øZ), øY ), øW ) + g(P̂ (T (øY, øZ), øX), øW ) = 0.
From which, together with (3.14) and Definition 2.10, taking into account the relation
(∇γøY P̂ )(øZ, øX) = (∇γøY αo)T (øZ, øX) + αo(∇γøY T )(øZ, øX), we obtain
g((∇γøY αo)T (øZ, øX) + αo(∇γøY T )(øZ, øX), øW )− g((∇γøXαo)T (øZ, øY )+
+αo(∇γøXT )(øZ, øY ), øW ) + α(X)T (Z, Y , øW )− α(W ) T (Z, øY,X)− α(Y )T (Z,X, øW )
+α(W ) T (X, øY, Z)− g(αoT (T (øX, øZ), øY ), øW ) + g(αoT (T (øY, øZ), øX), øW ) = 0.
Therefore, using Corollary 3.19,
(∇γøY α)(øη)T (øX, øZ, øW )− (∇γøXα)(øη)T (øY, øZ, øW ) = αoS(øX, øY, øW, øZ).
It is to be observed that the left-hand side of the above equation is symmetric in
the arguments øZ and øW while the right-hand side is skew-symmetric in the same
arguments. Hence we have
αoS(øX, øY, øW, øZ) = 0, (3.15)
ε(øY )T (øX, øZ, øW )− ε(øX)T (øY, øZ, øW ) = 0, (3.16)
where ε is the π-form defined by ε(øY ) := (∇γøY α)(øη).
Now, If ε 6= 0, it follows from (3.16) that there exists a scalar function Υ such that
T (øX, øY, øZ) = Υ ε(øX)ε(øY )ε(øZ). Consequently, T (øX, øY ) = Υ ε(øX)ε(øY )øε,
where g(øε, øX) := ε(øX). From which
S(øX, øY, øZ, øW ) = g(T (øX, øW ), T (øY, øZ))− g(T (øY, øW ), T (øX, øZ))
= Υ ε(øX)ε(øY )ε(øZ)ε(øW )g(øε, øε)−Υ ε(øX)ε(øY )ε(øZ)ε(øW )g(øε, øε) = 0.
On the other hand, if the v-curvature tensor S 6= 0, then it follows from (3.15)
that ε = 0 and α(øη) = 0. Hence, α = 0 and the hv-curvature tensor P vanishes. In
this case, it follows from the identity (3.10) that ∇βøXS = 0. �
Proposition 3.32. A P2-like Finsler manifold (M,L) is a P
∗-Finsler manifold.
Proof. As (M,L) is P2-like, then from (3.14), we have P̂ (X, Y ) = αoT (X, Y ). Using
Lemma 3.1, we get (∇βøηT )(øX, øY ) = α0T (øX, øY ), from which, by taking the trace,
∇βøηC = α0T , where α0 =
bg(∇βη C,C)
. Hence the result. �
The next definition will be useful in the sequel.
Definition 3.33. A π-tensor field Θ is positively homogenous of degree r in the
directional argument y (symbolically, h(r)) if it satisfies the condition
∇γη Θ = rΘ, or Dγη Θ = rΘ.
Lemma 3.34. Let (M,L) be a Finsler manifold, then we have
(a) The Finsler metric g (the angular metric tensor ~) is homogenous of degree 0,
(b) The v-curvature tensor S is homogenous of degree −2,
(c) The hv-curvature tensor P is homogenous of degree −1,
(d) The h-curvature tensor R is homogenous of degree 0,
(e) The (h)hv-torsion tensor T is homogenous of degree −1,
(f) The (v)hv-torsion tensor P̂ is homogenous of degree 0,
(g) The (v)h-torsion tensor R̂ is homogenous of degree 1.
Lemma 3.35. For every vector (1)π-form A, we have
∇ A)(øX, øY, øZ)− (
∇ A)(øY, øX, øZ) = A(R(øX, øY )øZ)− R(øX, øY )A(øZ)+
γ bR(øX,øY )
A)(øZ).
Deicke theorem [4] can be formulated globally as follows:
Lemma 3.36. Let (M,L) be a Finsler manifold. The following assertions are
equivalent:
(a) (M,L) is Riemannian,
(b) The (h)hv-torsion tensor T vanishes,
(c) The π-form C vanishes.
Theorem 3.37. Let (M,L) be Finsler manifold which is h-isotropic (of scalar k0)
and Ch-recurrent (of recurrence vector λ0). Then, (M,L) is necessarily one of the
following:
(a) A Riemannian manifold of constant curvature,
(b) A Finsler manifold of dimension 2,
(c) A Finsler manifold of dimensions n ≥ 3 with vanishing scalar k0 and
(∇βøXλo)(øY ) = (∇βøY λo)(øX).
Proof. For a Ch-recurrent manifold, one can easily show that
∇ T )(øX, øY, øZ, øW )− (
∇ T )(øY, øX, øZ, øW ) =
= {(∇βøXλo)(øY )− (∇βøY λo)(øX)}T (øZ, øW ) =: Ψ(øX, øY )T (øZ, øW ).
From which, taking into account Lemma 3.35, we obtain
Ψ(øX, øY )T (øZ, øW ) = T (R(øX, øY )øZ, øW ) + T (øZ,R(øX, øY )øW )−
−R(øX, øY )T (øZ, øW ) + (∇
γ bR(øX,øY )
T )(øZ, øW ).
Now, as (M,L) is h-isotropic of scalar k0, then the h-curvature tensor R has the form
R(øX, øY )øZ = k0{g(øX, øZ)øY − g(øY, øZ)øX} ; (n ≥ 3).
From the above two equations, we get
Ψ(øX, øY )T (øZ, øW ) = k0g(øX, øZ)T (øY, øW )− k0g(øY, øZ)T (øX, øW ) + k0g(øX, øW )T (øZ, øY )−
−k0g(øY, øW )T (øZ, øX)− k0g(øX, T (øZ, øW ))øY + k0g(øY, T (øZ, øW ))øX
+k0g(øX, øη)(∇γøY T )(øZ, øW )− k0g(øY, øη)(∇γøXT )(øZ, øW ).
(3.17)
Setting øY = øη, noting that T is h(−1) and g(øη, øη) = L2, we get
Ψ(øX, øη)T (øZ, øW ) = −k0g(øη, øZ)T (øX, øW )− k0g(øη, øW )T (øZ, øX)− k0T (øX, øZ, øW )øη −
−k0g(øX, øη)T (øZ, øW )− k0L
2(∇γøXT )(øZ, øW ).
From which, we have
g(øY, øη)Ψ(øX, øη)T (øZ, øW ) = −k0g(øY, øη)g(øη, øZ)T (øX, øW )− k0g(øY, øη)g(øη, øW )T (øZ, øX)−
−k0g(øY, øη)T (øX, øZ, øW )øη− k0g(øY, øη)g(øX, øη)T (øZ, øW )−
2g(øY, øη)(∇γøXT )(øZ, øW ),
(3.18)
whereas
g(øX, øη)Ψ(øY, øη)T (øZ, øW ) = −k0g(øX, øη)g(øη, øZ)T (øY, øW )− k0g(øX, øη)g(øη, øW )T (øZ, øY )−
−k0g(øX, øη)T (øY, øZ, øW )øη− k0g(øX, øη)g(øY, øη)T (øZ, øW )−
2g(øX, øη)(∇γøY T )(øZ, øW ).
(3.19)
Now, from (3.17), (3.18) and (3.19), we obtain
T (øZ, øW ){L2Ψ(øX, øY )− g(øY, øη)Ψ(øX, øη) + g(øX, øη)Ψ(øY, øη)} =
= UøX,øY k0L
2{~(øX, øZ)T (øY, øW ) + ~(øX, øW )T (øY, øZ)− φ(øY ) T (øX, øZ, øW )}.
Taking the trace of both sides of the above equation, we get
C(øZ){L2Ψ(øX, øY )− g(øY, øη)Ψ(øX, øη) + g(øX, øη)Ψ(øY, øη)} =
= 2k0L
2{~(øX, øZ)C(øY )− ~(øY, øZ)C(øX)}.
(3.20)
Setting øZ = øC, taking into account the fact that ~(øX, øC) = C(øX), the above
equation reduces to
C(øC){L2Ψ(øX, øY )− g(øY, øη)Ψ(øX, øη) + g(øX, øη)Ψ(øY, øη)} = 0.
Now, if C(øC) = g(øC, øC) = 0, then øC = 0 and so C = 0. Consequently, by
Lemma 3.36, (M,L) is a Riemannian manifold of constant curvature.
On the other hand, if (M,L) is not Riemannian, then we have
L2Ψ(øX, øY )− g(øY, øη)Ψ(øX, øη) + g(øX, øη)Ψ(øY, øη) = 0.
From which, together with (3.20), we get
k0{~(øX, øZ)C(øY )− ~(øY, øZ)C(øX)} = 0. (3.21)
If k0 6= 0, then, by (3.21), ~(øX, øZ)C(øY ) = ~(øY, øZ)C(øX). Setting øY = øC,
we get ~(øX, øZ) = 1
C(øX)C(øZ), which implies that dimM = 2.
If k0 = 0, then R = 0 and (3.17) yields Ψ(øX, øY ) = 0, which means that
(∇βøXλo)(øY ) = (∇βøY λo)(øX). �
Now, we focus our attention to the interesting case (c) of the above theorem. In
this case, the h-curvature tensor R = 0 and hence the (v)h-torsion tensor R̂ = 0.
Therefore, the equation (deduced from (3.9))
(∇γøXR)(øY, øZ, øW ) + (∇βøY P )(øZ, øX, øW )− (∇βøZP )(øY, øX, øW )−
−P (øZ, P (øY, øX)øη)øW +R(T (øX, øY ), øZ)øW − S(R(øY, øZ)øη, øX)øW+
+P (øY, P (øZ, øX)øη)øW − R(T (øX, øZ), øY )øW = 0.
reduces to
(∇βøY P )(øZ, øX, øW )− (∇βøZP )(øY, øX, øW )−
−P (øZ, P̂ (øY, øX))øW + P (øY, P̂ (øZ, øX))øW = 0.
Setting øW = øη, we get
(∇βøY P̂ )(øZ, øX)− (∇βøZP̂ )(øY, øX)− P̂ (øZ, P̂ (øY, øX)) + P̂ (øY, P̂ (øZ, øX)) = 0.
(3.22)
Since (M,L) is Ch-recurrent, then, by Proposition 3.17, the (v)hv-torsion tensor
P̂ satisfies the relations (∇βøZP̂ )(øX, øY ) = (Koλo(øZ) + ∇βøZKo)T (øX, øY ) and
P̂ (øX, øY ) = λo(øη)T (øX, øY ) = KoT (øX, øY ). From these, together with (3.22),
we get
(Koλo(øY ) +∇βøYKo)T (øZ, øX)− (Koλo(øZ) +∇βøZKo)T (øX, øY )−
−K2oT (øZ, T (øX, øY )) +K
oT (øY, T (øX, øZ)) = 0.
Hence, by Corollary 3.19,
K2oS(øY, øZ, øX, øW ) = UøY,øZ{(Koλo(øY ) +∇βøYKo)T (øX, øZ, øW )}.
As S(øY, øZ, øX, øW ) is skew-symmetric in the arguments øX and øW while the
right-hand side is symmetric in the same arguments, we obtain
K2oS(øY, øZ, øX, øW ) = 0, (3.23)
UøY,øZ{(Koλo(øY ) +∇βøYKo)T (øZ, øX, øW )} = 0. (3.24)
It follows from (3.23) and () that
P (øX, øY, øZ, øW ) = λo(øZ)T (øX, øY, øW )− λo(øW )T (øX, øY, øZ).
On the other hand, if Ko 6= 0, then the v-curvature tensor S vanishes from (3.23).
Next, it is seen from (3.24) that, if V(øY ) := Koλo(øY ) +∇βøYKo 6= 0, then there
exists a scalar function Υ =
T (øX,øZ,øW )T (øX,øY,øZ)T (øY,øZ,øW )
(T (øX,øY,øW ))2(V(øZ))3
such that
T (øX, øY, øW ) = ΥV(øX)V(øY )V(øW ).
Summing up, we have
Theorem 3.38. Let (M,L) be a Finsler manifold of dimensions n ≥ 3. If (M,L) is
h-isotropic and Ch-recurrent, then
(a) the recurrence vector λo satisfies : (∇βøXλo)(øY ) = (∇βøY λo)(øX),
(b) the h-curvature tensor R = 0 and the (v)h-torsion tensor R̂ = 0,
(c) the hv-curvature tensor P has the property that
P (øX, øY, øZ, øW ) = λo(øZ)T (øX, øY, øW )− λo(øW )T (øX, øY, øZ),
(d) the (v)hv-torsion tensor P̂ (øX, øY ) = KoT (øX, øY ).
Moreover, if Ko 6= 0, then
(e) the v-curvature tensor S vanishes,
(f) the (h)hv-torsion tensor T satisfies : T (øX, øY, øW ) = ΥV(øX)V(øY )V(øW ).
By Definition 2.10 and Theorem 3.38, we immediately have :
Corollary 3.39. A Finsler manifold (M,L) of dimension n ≥ 3 which is h-isotropic
and Ch-recurrent is necessarily P2-like.
Now, we define an operator P which aids us to investigate the R3-like manifolds.
Definition 3.40.
(a) If ω is a π-tensor field of type (1,p), then P · ω is a π-tensor field of the same
type defined by :
(P · ω)(øX1, ..., øXp) := φ(ω(φ(øX1), ..., φ(øXp))),
where φ is the vector π-form defined by (3.1).
(b) If ω is a π-tensor field of type (0,p), then P · ω is a π-tensor field of the same
type defined by :
(P · ω)(øX1, ..., øXp) := ω(φ(øX1), ..., φ(øXp)).
Remark 3.41. Since φ(φ(øX)) = φ(øX) for every øX ∈ X(π(M)) (Lemma 3.2),
then the operator P is a projector (i.e. P · (P · ω) = P · ω).
Definition 3.42. A π-tensor field ω is said to be indicatory if it satisfies the
condition : P · ω = ω.
The following result gives a characterization of the indicatory property for certain
types of π-tensor fields :
Lemma 3.43.
(a) A vector (2)π-form ω is indicatory if, and only if, ω(øX, øη) = 0 = ω(øη, øX)
and g(ω(øX, øY ), øη) = 0.
(b) A scaler (2) π-form ω is indicatory if, and only if, ω(øX, øη) = 0 = ω(øη, øX).
Proof.
(a) Let ω be a vector (2)π-form. By Definition 3.40(a) and taking into account (3.1),
we get
(P · ω)(øX, øY ) = φ(ω(φ(øX), φ(øY )))
= φ{ω(øX − L−1ℓ(øX)øη, øY − L−1ℓ(øY )øη)}
= φ{ω(øX, øY )− L−1ℓ(øY )ω(øX, øη)−
−L−1ℓ(øX)ω(øη, øY ) + L−2ℓ(øX)ℓ(øY )ω(øη, øη)}
= ω(øX, øY )− L−2g(ω(øX, øY ), øη)øη − φ{L−1ℓ(øY )ω(øX, øη)+
+L−1ℓ(øX)ω(øη, øY )− L−2ℓ(øX)ℓ(øY )ω(øη, øη)}
(3.25)
Now, if ω(øX, øη) = 0 = ω(øη, øX) and g(ω(øX, øY ), øη) = 0, then (3.25) implies
that (P · ω)(øX, øY ) = ω(øX, øY ) and hence ω is indicatory.
On the other hand, if ω is indicatory, then ω(øX, øY ) = φ(ω(φ(øX), φ(øY ))).
From which, setting øX = øη (resp. øY = øη) and taking into account the fact that
φ(øη) = 0 (Lemma 3.2), we get ω(øη, øY ) = 0 (resp. ω(øX, øη) = 0). From this, to-
gether with (P·ω)(øX, øY ) = ω(øX, øY ), Equation (3.25) implies that L−2g(ω(øX, øY ), øη)øη =
0. Consequently, g(ω(øX, øY ), øη) = 0.
(b) The proof is similar to that of (a) and we omit it. �
Proposition 3.44. For a Finsler manifold (M,L), the following tensors are
indicatory :
(a) The π-tensor field φ,
(b) The mixed torsion tensor T ,
(c) The v-curvature tensor S,
(d) The angular metric tensor ~,
(e) The π-tensor field P · ω for every π-tensor field ω.
Now, we define the following π-tensor fields:
F : F (X, Y ) := 1
{Rich(X, Y )−
Schg(X,Y )
2(n−1)
Fo : g(Fo(øX), øY ) := F (øX, øY ),
F a : F a(øX) := F (øη, øX),
F b : F b(øX) := F (øX, øη),
m : m(øX, øY ) := (P · F )(øX, øY ),
mo : g(mo(øX), øY ) := m(øX, øY ),
a : a(øX) := L−1(P · F a)(øX),
øa : g(øa, øY ) := a(øX),
b : b(øX) := L−1(P · F b)(øX),
øb : g(øb, øX) := b(øX),
c : c := L−2F (øη, øη),
R̂ : R̂(øX, øY ) := R(øX, øY )øη,
H : H(øX) := R(øη, øX)øη = R̂(øη, øX).


(3.26)
Remark 3.45. One can show that m, mo, a and b are indicatory and H(øη) = 0.
Proposition 3.46. If (M,L) is an R3-like Finsler manifold, then the π-tensor field
F can be written in the form
F (øX, øY ) = m(øX, øY ) + ℓ(øX)a(øY ) + ℓ(øY )b(øX) + c ℓ(øX)ℓ(øY ). (3.27)
Proof. The proof follows from Definitions 2.14 and 3.40(b), taking into account
Equations (3.1) and (3.26). In more details :
(P · F )(øX, øY ) = F (φ(øX), φ(øY ))
= F (øX − L−1ℓ(øX)øη, øY − L−1ℓ(øY )øη)
= F (øX, øY )− L−1ℓ(øY )F (øX, øη)−
−L−1ℓ(øX)F (øη, øY ) + L−2ℓ(øX)ℓ(øY )F (øη, øη)
= F (øX, øY )− L−1ℓ(øY ){(P · F b)(øX) + L−1ℓ(øX)F (øη, øη)}−
−L−1ℓ(øX){(P · F a)(øY ) + L−1ℓ(øY )F (øη øη)}+ L−2ℓ(øX)ℓ(øY )F (øη, øη)
= F (øX, øY )− ℓ(øX)a(øY )− ℓ(øY )b(øX)− c ℓ(øX)ℓ(øY ). �
Remark 3.47. One can show that the π-tensor fields a and b satisfy the following
relations
F a(øX) = L{a(øX) + c ℓ(øX)},
F b(øX) = L{b(øX) + c ℓ(øX)}.
(3.28)
Proposition 3.48. In an R3-like Finsler manifold (M,L), we have :
(a) R(øX, øY )øZ = g(øX, øZ)Fo(øY )+F (øX, øZ)øY−g(øY, øZ)Fo(øX)−F (øY, øZ)øX.
(b) R̂(øX, øY ) = g(øX, øη)Fo(øY )+F (øX, øη)øY −g(øY, øη)Fo(øX)−F (øY, øη)øX.
(c) H(øY ) = L2Fo(øY ) + c L
2øY − g(øY, øη)Fo(øη)− F (øY, øη)øη.
(d) Fo(øX) = mo(øX) + øa ℓ(øX) + L
−1b(øX)øη + c L−1ℓ(øX)øη.
Consequently,
(e) R̂(øX, øY ) = L{ℓ(øX)(mo(øY ) + c φ(øY )) + b(øX)φ(øY )}−
− L{ℓ(øY )(mo(øX) + c φ(øX)) + b(øY )φ(øX)}.
(f) H(øY ) = L2{mo(øY ) + c φ(øY )}.
Proof.
(a) Since (M,L) is an R3-like manifold, then by Definition 2.14, we have
R(X, Y , Z,W ) =g(X,Z)F (Y ,W )− g(Y , Z)F (X,W )+
+ g(Y ,W )F (X,Z)− g(X,W )F (Y , Z).
From which, using the fact that g(Fo(øX), øY ) = F (øX, øY ) and that the Finsler
metric g is non-degenerate, the result follows.
(b) Follows from (a) by setting øZ = øη.
(c) Follows from (b) by setting øX = øη.
(d) By (3.27) and (3.26), we get
g(Fo(øX), øY ) = g(mo(øX), øY )+g(øa, øY ) ℓ(øX)+L
−1b(øX)g(øη, øY )+c L−1ℓ(øX)g(øη, øY ).
Hence, the result follows, from the non-degeneracy of g.
(e) Follows by substituting Fo(øX) (from (d)) and F
b(øX) (from (3.28)) into (b).
(f) Follows from (e) by setting øX = øη, taking into account Remark 3.45 and the
fact that ℓ(øη) = L. �
Remark 3.49. In view of (3.26) and Lemma 3.2, Definition 2.13(a) can be reformu-
lated as follows:
A Finsler manifold (M,L) is of scaler curvature if the π-tensor field H satisfies the
relation H(øX) = L2κφ(øX), where κ is a scalar function on TM.
Definition 3.50. A Finsler manifold (M,L) is said to be of perpendicular scalar
(or of p-scalar) curvature if the h-curvature tensor R satisfies the condition
(P · R)(øX, øY, øZ, øW ) = Ro{~(øX, øZ)~(øY, øW )− ~(øX, øW )~(øY, øZ)}, (3.29)
where Ro is a function called the perpendicular scalar curvature.
Definition 3.51. A Finsler manifold (M,L) is said to be of s-ps curvature if (M,L)
is both of scalar curvature and of p-scalar curvature.
Proposition 3.52. If mo(øX) = t φ(øX), then an R3-like Finsler manifold is a
Finsler manifold of s-ps curvature.
Proof. Under the given assumption and taking into account Proposition 3.48(f), we
H(øX) = L2κφ(øX), with κ = t + c.
Thus, the considered manifold is of scalar curvature.
Now, we prove that the given manifold is of p-scalar curvature. Applying the
projection P on the h-curvature tensor R of an R3-like manifold, we get
(P · R)(øX, øY, øZ, øW ) = R(φ(øX), φ(øY ), φ(øZ), φ(øW ))
= g(φ(øX), φ(øZ))(P · F )(øY, øW ) + g(φ(øY ), φ(øW ))(P · F )(øX, øZ)−
−g(φ(øY ), φ(øZ))(P · F )(øX, øW )− g(φ(øX), φ(øW ))(P · F )(øY, øZ)
= g(φ(øX), φ(øZ))m(øY, øW ) + g(φ(øY ), φ(øW ))m(øX, øZ)−
−g(φ(øY ), φ(øZ))m(øX, øW )− g(φ(øX), φ(øW ))m(øY, øZ).
(3.30)
Since
g(φ(øX), φ(øY )) = g(φ(øX), øY − L−1ℓ(øY )øη) = g(φ(øX), øY )− L−1ℓ(øY )g(φ(øX), øη)
= ~(øX, øY )− L−1ℓ(øY )~(øX, øη) = ~(øX, øY ),
then, by using again the given assumption (mo = t φ =⇒ m = t~), Equation (3.30)
reduces to
(P · R)(øX, øY, øZ, øW ) = ~(øX, øZ)m(øY, øW ) + ~(øY, øW )m(øX, øZ)−
−~(øY, øZ)m(øX, øW )− ~(øX, øW )m(øY, øZ)
= 2t{~(øX, øZ)~(øY, øW )− ~(øY, øZ)~(øX, øW )}.
Therefore, by taking Ro = 2t, we have
(P · R)(øX, øY, øZ, øW ) = Ro{~(øX, øZ)~(øY, øW )− ~(øY, øZ)~(øX, øW )}.
Consequently, the given manifold is of p-scalar curvature. �
Theorem 3.53. If an R3-like Finsler manifold (M,L) is of p-scalar curvature, then
it is of s-ps curvature.
Proof. Since the considered manifold is R3-like, then, by the same procedure as in
the proof of Proposition 3.52, we have
(P · R)(øX, øY, øZ, øW ) = ~(øX, øZ)m(øY, øW ) + ~(øY, øW )m(øX, øZ)−
−~(øY, øZ)m(øX, øW )− ~(øX, øW )m(øY, øZ).
(3.31)
On the other hand, since the considered manifold is of p-scalar curvature, then the
h-curvature tensor satisfies
(P · R)(øX, øY, øZ, øW ) = Ro{~(øX, øZ)~(øY, øW )− ~(øY, øZ)~(øX, øW )}. (3.32)
Now, from Equations (3.31) and (3.32), we obtain
UøX,øY {Ro~(øX, øZ)~(øY, øW )− ~(øX, øZ)m(øY, øW )− ~(øY, øW )m(øX, øZ)} = 0.
Using (3.26) and the non-degeneracy of the metric tensor g, the above equation
reduces to
UøX,øY {Ro~(øX, øZ)φ(øY )− ~(øX, øZ)mo(øY )−m(øX, øZ)φ(øY )} = 0. (3.33)
Since the π-tensor fields φ,m and mo are indicatory, then
Tr{øY 7−→ ~(øX, øY )φ(øZ)} = g(øX, φ(øZ)) = ~(øX, øZ),
Tr{øY 7−→ ~(øX, øY )mo(øZ)} = m(øX, øZ),
Tr{øY 7−→ m(øX, øY )φ(øZ)} = m(øX, øZ).
Consequently, if we take the trace of both sides of Equation (3.33), making use of
Lemma 3.43, we get
(n− 2)Ro~(øX, øZ)− (n− 3)m(øX, øZ)− (n− 1)t ~(øX, øZ) = 0,
where t := 1
Tr(mo). From which, using (3.26) and Lemma 3.2, we get
(n− 2)Roφ− (n− 3)mo − (n− 1)t φ = 0. (3.34)
Again, taking the trace of the above equation, we obtain
(n− 1)(n− 2)(Ro − 2t) = 0.
Substituting the above relation into (3.34), we get mo = t φ. Hence, by Proposition
3.52, the result follows. �
Theorem 3.54. If an R3-like Finsler manifold (M,L) is of scalar curvature, then it
is of s-ps curvature.
Proof. Since the given manifold is R3-like, then the π-tensor H is given by (cf.
Proposition 3.48):
H(øX) = L2{mo(øX) + c φ(øX)}. (3.35)
And since the considered manifold is of scalar curvature, then
H(øX) = L2κφ(øX). (3.36)
From Equations (3.35) and (3.36), we deduce thatmo(øX) = (κ−c)φ(øX) =: tφ(øX).
Hence, by Proposition 3.52, the result follows. �
Now, let us define the π-tensor field
Ψ(øX, øY, øZ, øW ) = R(øX, øY, øZ, øW )− 1
UøX,øY {g(øX, øZ)Ric
h(øY, øW )+
+g(øY, øW )Rich(øX, øZ)− rg(øX, øZ)g(øY, øW )},
(3.37)
where r = 1
Sch. From Definition 2.14 and (3.37), we immediately obtain
Theorem 3.55. An R3-like Finsler manifold is characterized by
Ψ(øX, øY, øZ, øW ) = 0.
The tensor field Ψ in the above theorem being of the same form as the Weyl
conformal tensor in Riemannian geometry, we draw the following
Theorem 3.56. An R3-like Riemannian manifold is conformally flat.
Remark 3.57. It should be noted that some important results of [8], [9], [11], [13],
[19], [20],...,etc. (obtained in local coordinates) are retrieved from the above mentioned
global results (when localized).
Appendix. Local formulae
For the sake of completeness, we present in this appendix a brief and concise
survey of the local expressions of some important geometric objects and the local
definitions of the special Finsler manifolds treated in the paper.
Let (U, (xi)) be a system of local coordinates on M and (π−1(U), (xi, yi)) the
associated system of local coordinates on TM . We use the following notations :
(∂i) := (
): the natural basis of TxM, x ∈M ,
(∂̇i) := (
): the natural basis of Vu(TM), u ∈ TM ,
(∂i, ∂̇i): the natural basis of Tu(TM),
(ø∂i): the natural basis of the fiber over u in π
−1(TM) (ø∂i is the lift of ∂i at u).
To a Finsler manifold (M,L), we associate the geometric objects :
gij :=
∂̇i∂̇jL
2 = ∂̇i∂̇jE: the Finsler metric tensor,
Cijk :=
∂̇k gij : the Cartan tensor,
~ij := gij − ℓiℓj (ℓi := ∂L/∂y
i): the angular metric tensor,
Gh: the components of the canonical spray,
Ghi := ∂̇iG
Ghij := ∂̇jG
i = ∂̇j ∂̇iG
(δi) := (∂i −G
i ∂̇h): the basis of Hu(TM) adapted to G
(δi, ∂̇i): the basis of Tu(TM) = Hu(TM)⊕ Vu(TM) adapted to G
We have :
γ(ø∂i) = ∂̇i,
ρ(∂i) = ø∂i, ρ(∂̇i) = 0, ρ(δi) = ø∂i,
β(ø∂i) = δi,
J(∂i) = ∂̇i, J(∂̇i) = 0, J(δi) = ∂̇i,
h := βoρ = dxi ⊗ ∂i −G
j ⊗ ∂̇i v := γoK = dy
i ⊗ ∂̇i +G
j ⊗ ∂̇i.
We define :
γhij :=
ghℓ(∂i gℓj + ∂j giℓ − ∂ℓ gij),
Chij :=
ghℓ(∂̇i gℓj + ∂̇j giℓ − ∂̇ℓ gij) =
ghℓ ∂̇i gjℓ = g
hℓCijℓ,
Γhij :=
ghℓ(δi gℓj + δj giℓ − δℓ gij) .
Then, we have :
• The canonical spray G: Gh = 1
γhij y
• The Barthel connection Γ: Ghi = ∂̇iG
h = Γhijy
j = Ghijy
• The Cartan connection CΓ: ( Γhij, G
i , C
The associated h-covariant (resp. v-covariant) derivative is denoted by p (resp. |),
where Ki
j|k := δkK
mk −K
jk and K
j |k := ∂̇kK
mk −K
• The Berwald connection BΓ: ( Ghij, G
i , 0).
The associated h-covariant (resp. v-covariant) derivative is denoted by
p(resp.
where Ki
:= δkK
mk −K
jk and K
:= ∂̇kK
We also have Ghij = Γ
ij + C
ij |k y
k = Γhij + C
ij |o, where C
ij |o = C
ij |k y
For the Cartan connection, we have :
(v)h-torsion : Rijk = δkG
j − δjG
k = Ujk{δkG
(v)hv-torsion : P ijk = G
jk − Γ
jk = C
jk|my
m = C i
jk|0,
(h)hv-torsion : C ijk = 1/2{g
ri∂̇rgjk},
h-curvature : Rihjk = Ujk{δkΓ
hj + Γ
mk} − C
hv-curvature : P ihjk = ∂̇kΓ
hj − C
+ C ihmP
v-curvature : Sihjk = C
mj − C
mk = Ujk{C
For the Berwald connection, we have :
(v)h-torsion : R∗ijk = δkG
j − δjG
k = Ujk{δkG
h-curvature : R∗ihjk = Ujk{δkG
hj +G
hv-curvature : P ∗ihjk = ∂̇kG
hj =: G
In the following, we give the local definitions of the special Finsler spaces treated
in the paper. For each special Finsler space (M,L), we set its name, its defining
property and a selected reference in which the local definition is located:
• Rimaniann manifold [22] : gij(x, y) ≡ gij(x) ⇐⇒ Cijk = 0 ⇐⇒ Ci := C
ik = 0
(Deicke’s theorem [4]).
• Minkowaskian manifold [22]: gij(x, y) ≡ gij(y) ⇐⇒ C
= 0 and Rhijk = 0.
• Berwald manifold [22]: Γhij(x, y) ≡ Γ
ij(x) (i.e. ∂̇kΓ
ij = 0) ⇐⇒ C
• Ch-recurrent manifold [13]: Chij|k = µkChij ,
where µj is a covariant vector field.
• P ∗-Finsler manifold [7]: Ch
ij|0 = λ(x, y)C
where λ(x, y) = PiC
; Pi := P
ik = C
ik|0 = Ci|0 and C
2 = CiC
i 6= 0.
• Cv-recurrent manifold [13]: C ijk|l = λl C
jk or Cijk|l = λl Cijk.
• C0-recurrent manifold [13]: C ijk
= λl C
jk or Cijk
= λl Cijk.
• Semi-C-reducible manifold (dimM ≥ 3) [18]:
Cijk =
(n+ 1)
(~ijCk + ~jkCi + ~kiCj) +
CiCjCk, C
2 6= 0,
where µ and τ are scalar functions satisfying µ+ τ = 1.
• C-reducible manifold (dimM ≥ 3) [15]: Cijk =
(~ijCk + ~jkCi + ~kiCj).
• C2-like manifold (dimM ≥ 2) [17]: Cijk =
CiCjCk, C
2 6= 0.
• quasi-C-reducible manifold (dimM ≥ 3) [23]: Cijk = AijCk + AjkCi + AkiCj ,
where Aij(x, y) is a symmetric tensor field satisfying Aijy
i = 0.
• S3-like manifold (dimM ≥ 4) [6]: Slijk =
(n−1)(n−2)
{~ik~lj − ~ij~lk},
where S is the vertical scalar curvature.
• S4-like manifold (dimM ≥ 5) [6]: Slijk = ~ljFik − ~lkFij + ~ikFlj − ~ijFlk,
where Fij :=
{Sij −
2(n−2)
S~ij}; Sij being the vertical Ricci tensor.
• Sv-recurrent manifold [20], [11]: Shijk|m = λmShijk,
where λj(x, y) is a covariant vector field.
• Second order Sv-recurrent manifold [20], [11]: Shijk|m|n = ΘmnShijk,
where Θij(x, y) is a covariant tensor field.
• Landsberg manifold [7]: P hkji y
k = 0 ⇐⇒ (∂̇iΓ
k = 0 ⇐⇒ Ch
yk = 0.
• General Landsberg manifold [10]: P rijry
i = 0 ⇐⇒ Cj|o = 0.
• P -symmetric manifold [19]: Phijk = Phikj.
• P2-like manifold (dimM ≥ 3) [14]: Phijk = αhCijk − αiChjk,
where αk(x, y) is a covariant vector field.
• P -reducible manifold (dimM ≥ 3) [19]: Pijk =
(~ij Pk + ~jk Pi + ~ki Pj),
where Pijk = ghiP
• h-isotropic manifold (dimM ≥ 3) [13]: Rhijk = ko{ghjgik − ghkgij},
for some scalar ko, where Rhijk = gilR
• Manifold of scalar curvature [21]: Rijkl y
iyk = kL2~jl,
for some function k : TM −→ R .
• Manifold of constant curvature [21]: the function k in the above definition is
constant.
• Manifold of perpendicular scalar (or of p-scalar ) curvature [8], [9]:
P ·Rhijk := ~
k Rlmnr = Ro{~ik~hj − ~ij~hk},
where Ro is a function called a perpendicular scalar curvature.
• Manifold of s-ps curvature [8], [9]: (M,L) is both of scalar curvature and of
p-scalar curvature.
• R3-like manifold (dimM ≥ 4) [8]: Rhijk = ghjFik − ghkFij + gikFhj − gijFhk,
where Fij :=
{Rij −
r gij}; Rij := R
ijh, r :=
References
[1] H. AKbar-Zadeh, Les espaces de Finsler et certaines de leurs généralisations,
Ann. Ec. Norm. Sup., Série 3, 80 (1963), 1–79.
[2] , Sur les espaces de Finsler isotropes, C. R. Acad. Sc. Paris, série A
(1979), 53–56.
[3] , Initiation to global Finsler geometry, Elsevier, 2006.
[4] F. Brickell, A new proof of Deicke’s theorem on homogeneous functions, Proc.
Amer. Math. Soc., 16 (1965), 190-191.
[5] P. Dazord, Propriétés globales des géodésiques des espaces de Finsler, Thèse
d’Etat, (575) Publ. Dept. Math. Lyon, 1969.
[6] F. Ikedo, On S3- and S4-like Finsler spaces with the T-tensor of a special form,
Tensor, N. S., 35 (1981), 345–351.
[7] H. Izumi, On P∗-Finsler spaces, I, Memoirs of the Defense Academy, Japan, No.
4, XVI (1976), 133–138.
[8] H. Izumi and T. N. Srivastava, On R3-like Finsler spaces, Tensor, N. S., 32
(1978), 339–349.
[9] H. Izumi and M. Yoshida, On Finsler spaces of perpendicular scalar curvature,
Tensor, N. S., 32 (1978), 219–224.
[10] M. Kitayama, Geometry of transformations of Finsler metrics, Ph. D. Thesis,
Hokkaido University of Education, Kushiro, Japan, 2000.
[11] , Indicatrices of Randers change, 9th International Conference of Tensor
Society, Sapporo, Japan, September 4-8, 2006.
[12] J. Klein and A. Voutier, Formes extérieures génératrices de sprays, Ann. Inst.
Fourier, Grenoble, 18(1) (1968), 241–260.
[13] M. Matsumoto, On h-isotropic and Ch-recurrent Finsler spaces, J. Math. Kyoto
Univ., 11 (1971), 1–9.
[14] , On Finsler spaces with curvature tensors of some special forms, Tensor,
N. S., 22 (1971), 201–204.
[15] , On C-reducible Finsler spaces, Tensor, N. S., 24 (1972), 29–37.
[16] , Foundations of Finsler geometry and special Finsler spaces, Kaiseisha
Press, Otsu, Japan, 1986.
[17] M. Matsumoto and S. Numata, On semi-C-reducible Finsler spaces with constant
coefficients and C2-like Finsler spaces, Tensor, N. S., 34 (1980), 218–222.
[18] M. Matsumoto and C. Shibata, On semi-C-reducibility, T-tensor = 0 and S4-
likeness of Finsler spaces, J. Math. Kyoto Univ., 19 (1979), 301–314.
[19] M. Matsumoto and Shimada, On Finsler spaces with the curvature tensors Phijk
and Shijk satsfiying special conditions, Rep. Math. Phys., 12 (1977), 77–87.
[20] A. Moór, Über Finsler Räume von Zweifach Rekurrenter Krümmung, Acta
Math. Acad. Sci. Hungaricae, 22 (1971), 453–465.
[21] S. Numata, On Landesberg spaces of scalar curvature, J. Korean Math. Soc.,
12(2) (1975), 97–100.
[22] H. Rund, The differential geometry of Finsler spaces, Springer-Verlag, Berlin,
1959.
[23] C. Shibata, On invariant tensors of β-changes of Finsler metrics, J. Math.
Kyoto Univ., 24(1) (1984), 163–188.
[24] A. A. Tamim, General theory of Finsler spaces with applications to Randers
spaces, Ph. D. Thesis, Cairo University, 1991.
[25] , Special Finsler manifolds, J. Egypt. Math. Soc., 10(2) (2002), 149–177.
[26] , On Finsler submanifolds, J. Egypt. Math. Soc., 12(1) (2004), 55–70.
[27] Nabil L. Youssef, S. H. Abed, and A. Soleiman, A global theory of conformal
Finsler geometry, Submitted. ArXiv No.: math. DG/0610052.
ABSTRACT
  The aim of the present paper is to provide a global presentation of the
theory of special Finsler manifolds. We introduce and investigate globally (or
intrinsically, free from local coordinates) many of the most important and most
commonly used special Finsler manifolds: locally Minkowskian, Berwald,
Landesberg, general Landesberg, $P$-reducible, $C$-reducible,
semi-$C$-reducible, quasi-$C$-reducible, $P^{*}$-Finsler, $C^{h}$-recurrent,
$C^{v}$-recurrent, $C^{0}$-recurrent, $S^{v}$-recurrent, $S^{v}$-recurrent of
the second order, $C_{2}$-like, $S_{3}$-like, $S_{4}$-like, $P_{2}$-like,
$R_{3}$-like, $P$-symmetric, $h$-isotropic, of scalar curvature, of constant
curvature, of $p$-scalar curvature, of $s$-$ps$-curvature. The global
definitions of these special Finsler manifolds are introduced. Various
relationships between the different types of the considered special Finsler
manifolds are found. Many local results, known in the literature, are proved
globally and several new results are obtained. As a by-product, interesting
identities and properties concerning the torsion tensor fields and the
curvature tensor fields are deduced. Although our investigation is entirely
global, we provide; for comparison reasons, an appendix presenting a local
counterpart of our global approach and the local definitions of the special
Finsler spaces considered.

<|endoftext|><|startoftext|>
The Hardy-Lorentz Spaces Hp,q(Rn)
Wael Abu-Shammala and Alberto Torchinsky
Abstract
In this paper we consider the Hardy-Lorentz spaces Hp,q(Rn), with
0 < p ≤ 1, 0 < q ≤ ∞. We discuss the atomic decomposition of
the elements in these spaces, their interpolation properties, and the
behavior of singular integrals and other operators acting on them.
The real variable theory of the Hardy spaces represents a fruitful setting for
the study of maximal functions and singular integral operators. In fact, it
is because of the failure of these operators to preserve L1 that the Hardy
space H1 assumes its prominent role in harmonic analysis. Now, for many
of these operators, the role of L1 can just as well be played by H1,∞, or
Weak H1. However, although these operators are amenable to H1 − L1 and
H1,∞ − L1,∞ estimates, interpolation between H1 and H1,∞ has not been
available. Similar considerations apply to Hp and Weak Hp for 0 < p < 1.
The purpose of this paper is to provide an interpolation result for the
Hardy-Lorentz spaces Hp,q, 0 < p ≤ 1, 0 < q ≤ ∞, including the case of
Weak Hp as and end point for real interpolation. The atomic decomposition
is the key ingredient in dealing with interpolation since in this context neither
truncations are available, nor reiteration applies.
The paper is organized as follows. The Lorentz spaces, including criteria
that assure membership in Lp,q, 0 < p < ∞, 0 < q ≤ ∞, are discussed in
Section 1. In Section 2 we show that distributions in Hp,q have an atomic
decomposition in terms ofHp atoms with coefficients in an appropriate mixed
norm space. An interesting application of this decomposition is to Hp,q−Lp,∞
estimates for Calderón-Zygmund singular integral operators, p < q ≤ ∞.
Also, by manipulating the different levels of the atomic decomposition, we
show that, for 0 < q1 < q < q2 ≤ ∞, H
p,q is an intermediate space between
Hp,q1 and Hp,q2. This result applies to Calderón-Zygmund singular integral
operators, including those with variable kernels, Marcinkiewicz integrals, and
other operators.
http://arxiv.org/abs/0704.0054v1
1 The Lorentz spaces
The Lorentz space Lp,q(Rn) = Lp,q, 0 < p <∞, 0 < q ≤ ∞, consists of those
measurable functions f with finite quasinorm ‖f‖p,q given by
‖f‖p,q =
[t1/pf ∗(t)]q
, 0 < q <∞ ,
‖f‖p,∞ = sup
[t1/pf ∗(t)] , q = ∞ .
The Lorentz quasinorm may also be given in terms of the distribution func-
tion m(f, λ) = |{x ∈ Rn : |f(x)| > λ}|, loosely speaking, the inverse of the
non-increasing rearrangement f ∗ of f . Indeed, we have
‖f‖p,q =
λq−1m(f, λ)q/p dλ
2km(f, 2k)1/p
when 0 < q <∞, and
‖f‖p,∞ = sup
2km(f, 2k)1/p , q = ∞ .
Note that, in particular, Lp,p = Lp, and Lp,∞ is weak Lp.
The following two results are useful in verifying that a function is in Lp,q.
Lemma 1.1. Let 0 < p <∞, and 0 < q ≤ ∞. Assume that the non-negative
sequence {µk} satisfies {2
kµk} ∈ ℓ
q. Further suppose that the non-negative
function ϕ verifies the following property: there exists 0 < ε < 1 such that,
given an arbitrary integer k0, we have ϕ ≤ ψk0 + ηk0, where ψk0 is essentially
bounded and satisfies ‖ψk0‖∞ ≤ c 2
k0, and
2k0εpm(ηk0 , 2
k0) ≤ c
[2kεµk]
Then, ϕ ∈ Lp,q, and ‖ϕ‖p,q ≤ c ‖{2
kµk}‖ℓq.
Proof. It clearly suffices to verify that ‖{2k |{ϕ > γ 2k}|1/p}‖ℓq <∞, where γ
is an arbitrary positive constant. Now, given k0, let ψk0 and ηk0 be as above,
and put γ = c+ 1, where c is the constant in the above inequalities; for this
choice of γ, {ϕ > γ 2k0} ⊆ {ηk0 > 2
When q = ∞, we have
2k0εm(ηk0 , 2
k0)1/p ≤ c
[2−k(1−ε) 2k µk]
≤ c 2−k0(1−ε) sup
[ 2k µk] .
Thus, 2k0 m(ηk0 , 2
k0)1/p ≤ supk≥k0[ 2
k µk] , and, consequently,
2k0 m(ϕ, γ 2k0)1/p ≤ c ‖{2kµk}‖ℓ∞ , all k0.
When 0 < q < ∞, let 1 − ε = 2δ and rewrite the right-hand side above
[2k(1−δ)µk]
When p < q, by Hölder’s inequality with exponent r = q/p and its conjugate
r′, this expression is dominated by
2k δpr
)1/r′(
2k(1−δ)µk
]rp )1/r
≤ c 2−k0 δp
2k(1−δ)µk
]q )p/q
and, when 0 < q ≤ p, r < 1, and we get a similar bound by simply observing
that it does not exceed
2−k0δp
[2k(1−δ)µk]
≤ 2−k0δp
2k(1−δ)µk
]q )p/q
Whence, continuing with the estimate, we have
2k0εpm(ηk0 , 2
k0) ≤ c 2−k0δp
2k(1−δ)µk
]q )p/q
which yields, since 1− ε = 2 δ,
2k0 m(ϕ, γ 2k0)1/p ≤ c 2k0 δ
2k(1−δ)µk
]q )1/q
Thus, raising to the q and summing, we get
2k0 m(ϕ, γ 2k0)1/p
2k0 δ q
2k(1−δ)µk
which, upon changing the order of summation in the right-hand side of the
above inequality, is bounded by
2k(1−δ)µk
k0=−∞
2k0 δ q
The reader will have no difficulty in verifying that, for Lemma 1.1 to hold,
it suffices that ψx0 satisfies
m(ψx0 , 2
k0)1/p ≤ c µk0 , all k0 .
This holds, for instance, when ‖ψx0‖
r ≤ c 2
, 0 < r < ∞. In fact, the
assumptions of Lemma 1.1 correspond to the limiting case of this inequality
as r → ∞.
Another useful condition is given by our next result, the proof is left to
the reader.
Lemma 1.2. Let 0 < p < ∞, and let the non-negative sequence {µk} be
such that {2kµk} ∈ ℓ
q, 0 < q ≤ ∞. Further, suppose that the non-negative
function ϕ satisfies the following property: there exists 0 < ε < 1 such that,
given an arbitrary integer k0, we have ϕ ≤ ψk0+ηk0, where ψk0 and ηk0 satisfy
2k0pm(ψk0 , 2
k0)ε ≤ c
2kµεk
, 0 < ε < min(1, q/p) ,
2k0ε|{ηk0 > 2
k0}| ≤ c
2kεµk
Then, ϕ ∈ Lp,q, and ‖ϕ‖p,q ≤ c ‖{2
kµk}‖ℓq.
We will also require some basic concepts from the theory of real interpo-
lation. Let A0, A1, be a compatible couple of quasinormed Banach spaces,
i.e., both A0 and A1 are continuously embedded in a larger topological vector
space. The Peetre K functional of f ∈ A0 + A1 at t > 0 is defined by
K(t, f ;A0, A1) = inf
f=f0+f1
‖f0‖0 + t ‖f1‖1 ,
where f = f0 + f1, f0 ∈ A0 and f1 ∈ A1.
In the particular case of the Lq spaces, the K functional can be computed
by Holmstedt’s formula, see [12]. Specifically, for 0 < q0 < q1 ≤ ∞, let α be
given by 1/α = 1/q0 − 1/q1. Then,
K(t, f ;Lq0, Lq1) ∼
f ∗(s)q0ds
)1/q0
f ∗(s)q1ds
)1/q1
The intermediate space (A0, A1)η, q, 0 < η < 1, 0 < q < ∞, consists of
those f ’s in A0 + A1 with
‖f‖(A0,A1)η, q =
t−ηK(t, f ;A0, A1)
]q dt
‖f‖(A0,A1)η,∞ = sup
t−ηK(t, f ;A0, A1)
<∞ , q = ∞ .
Finally, for the Lq and Lp,q spaces, we have the following result. Let
0 < q1 < q < q2 ≤ ∞, and suppose that 1/q = (1 − η)/q1 + η/q2. Then,
Lq = (Lq1 , Lq2)η,q, and, L
1,q = (L1,q1, L1,q2)η,q, see [4].
2 The Hardy-Lorentz spaces Hp,q
In this paper we adopt the atomic characterization of the Hardy spaces Hp,
0 < p ≤ 1. Recall that a compactly supported function a with [n(1/p− 1)]
vanishing moments is an Hp atom with defining interval I (of course, I is
a cube in Rn), if supp(a) ⊆ I, and |I|1/p |a(x)| ≤ 1. The Hardy space
Hp(Rn) = Hp consists of those distributions f that can be written as f =
λjaj , where the aj ’s are H
p atoms,
p < ∞, and the convergence is
in the sense of distributions as well as in Hp. Furthermore,
‖f‖Hp ∼ inf
where the infimum is taken over all possible atomic decompositions of f .
This last expression has traditionally been called the atomic Hp norm of f .
C. Fefferman, Rivière and Sagher identified the intermediate spaces be-
tween the Hardy space Hp0, 0 < p0 < 1, and L
∞, as
(Hp0, L∞)η,q = H
p,q, 1/p = (1− η)/p0 , 0 < q ≤ ∞ ,
where Hp,q consists of those distributions f whose radial maximal function
Mf(x) = supt>0 |(f ∗ ϕt)(x)| belongs to L
p,q. Here ϕ is a compactly sup-
ported, smooth function with nonvanishing integral, see [10]. R. Fefferman
and Soria studied in detail the space H1,∞, which they called Weak H1, see
[11].
Just as in the case of Hp, Hp,q can be characterized in a number of
different ways, including in terms of non-tangential maximal functions and
Lusin functions. In what follows we will calculate the quasinorm of f in Hp,q
by the means of the expression
2km(Mf, 2k)1/p
, 0 < p ≤ 1, 0 < q ≤ ∞ ,
where Mf is an appropriate maximal function of f .
Passing to the atomic decomposition of Hp,q, the proof is divided in two
parts. First, we construct an essentially optimal atomic decomposition; Par-
ilov has obtained independently this result for H1,q when 1 ≤ q, see [14].
Also, R. Fefferman and Soria gave the atomic decomposition of Weak H1,
see [11], and Alvarez the atomic decomposition of Weak Hp, 0 < p < 1, see
Theorem 2.1. Let f ∈ Hp,q, 0 < p ≤ 1, 0 < q ≤ ∞. Then f has an atomic
decomposition f =
j,k λj,kaj,k, where the aj,k’s are H
p atoms with defining
intervals Ij,k that have bounded overlap uniformly for each k, the sequence
{λj,k} satisfies
j |λj,k|
< ∞, and the convergence is in the
sense of distributions. Furthermore,
j |λj,k|
∼ ‖f‖Hp,q .
Proof. The idea of constructing an atomic decomposition using Calderón’s
reproducing formula is well understood, so we will only sketch it here, for
further details, see [5] and [18]. Let Nf(x) = sup{|(f ∗ ψt)(y)| : |x− y| < t}
denote the non-tangential maximal function of f with respect to a suitable
smooth function ψ with nonvanishing integral. One considers the open sets
Ok = {Nf > 2
k}, all integers k, and builds the atoms with defining interval
associated to the intervals, actually cubes, of the Whitney decomposition
of Ok, and hence satisfying all the required properties. More precisely, one
constructs a sequence of bounded functions fk with norm not exceeding c 2
for each k, and such that f −
|k|≤n fk → 0 as n→ ∞ in the sense of distri-
butions. These functions have the further property that fk(x) =
j αj,k(x) ,
where |αj,k(x)| ≤ c 2
k, c is a constant, each αj,k has vanishing moments up
to order [n(1/p − 1)] and is supported in Ij,k - roughly one of the Whitney
cubes -, where the Ij,k’s have bounded overlaps for each k, uniformly in k. It
only remains now to scale αj,k,
αj,k(x) = λj,k aj,k(x) ,
and balance the contribution of each term to the sum. Let λj,k = 2
k|Ij,k|
Then, aj,k(x) is essentially an H
p atom with defining interval Ij,k, and one
j |λj,k|
∼ 2k |Ok|
1/p. Thus,
|λj,k|
)1/p∥
2k |Ok|
∼ ‖f‖Hp,q , 0 < q ≤ ∞ . �
As an application of this atomic decomposition, the reader should have
no difficulty in showing directly the C. Fefferman, Rivière, Sagher character-
ization of Hp,q, see [10].
Another interesting application of this decomposition is to Hp,q − Lp,∞
estimates for Calderón-Zygmund singular integral operators T , p < q ≤ ∞.
This approach combines the concept of p-quasi local operator of Weisz, see
[17], with the idea of variable dilations of R. Fefferman and Soria, see [11].
Intuitively, since Hörmander’s condition implies that T maps H1 into L1, say,
for T to be defined in H1,s, 1 < s ≤ ∞, some strengthening of this condition
is required. This is accomplished by the variable dilations. Moreover, since
we will include p < 1 in our discussion, as p gets smaller, more regularity of
the kernel of T will be required. This justifies the following definition.
Given 0 < p ≤ 1, let N = [n(1/p − 1)], and, associated to the kernel
k(x, y) of a Calderón-Zygmund singular integral operator T , consider the
modulus of continuity ωp given by
ωp(δ) = sup
Rn\(2/δ)I
| k(x, y)−
|α|≤N
(y − yI)
αkα(x, yI)| dy
where 0 < δ ≤ 1, and the sup is taken over the collection of arbitrary intervals
I of Rn centered at yI . Here, for a multi-index α = (α1, . . . , αn),
kα(x, yI) =
Dαk(x, y)
ωp(δ) controls the behavior of T on atoms. More precisely, if a is an H
p atom
with defining interval I, and 0 < δ < 1, observe that
T (a)(x) =
[k(x, y)−
|α|≤N
(y − yI)
αkα(x, yI)] a(y) dy ,
and, consequently,
Rn\(2/δ)I
|T (a)(x)|p dx ≤ ωp(δ) .
We are now ready to prove the Hp,q − Lp,∞ estimate for a Calderón-
Zygmund singular integral operator T with kernel k(x, y).
Theorem 2.2. Let 0 < p ≤ 1, and p < q ≤ ∞. Assume that a Calderón-
Zygmund singular integral operator T is of weak-type (r, r) for some 1 < r <
∞, and that the modulus of continuity ωp of the kernel k satisfies a Dini
condition of order q/(q − p), namely,
Ap,q =
ωp(δ)
q/(q−p)dδ
](q−p)/q
Then T maps Hp,q continuously into Lp,∞, and ‖Tf‖p,∞ ≤ cA
p,q ‖f‖Hp,q .
Proof. We need to show that
2k0pm(Tf, 2k0) ≤ c ‖f‖
Hp,q , all k0 .
Let f =
j λj,kaj,k , be the atomic decomposition of f given in Theorem
2.1, and set f1 =
j λj,kaj,k, and f2 = f − f1. Further, let µk =
j |λj,k|
, and recall that ‖{µk}‖ℓq ∼ ‖f‖Hp,q .
Since ‖f1‖
r ≤ c 2
k0(r−p)‖f‖
Hp,∞, we have
2pk0m(Tf1, 2
k0) ≤ c ‖f‖
Hp,∞ .
Next, put I∗j,k = 2
1/n(3/2)p(k−k0)/nIj,k, and let
I∗j,k .
Since |I∗j,k| = 2(3/2)
p(k−k0)|Ij,k| ∼ 2
−k0p(3/4)p(k−k0)|λj,k|
p, we get
|Ω| ≤
|I∗j,k| ≤ c 2
(3/4)p(k−k0)
|λj,k|
≤ c 2−k0p
≤ c 2−k0p‖f‖
Hp,∞ .
Also, since 0 < p ≤ 1, it readily follows that
|T (f2)(x)|
|λj,k|
p|T (aj,k)(x)|
and, by Tonelli and the estimate for T (a), we have
|T (f2)(x)|
p dx ≤
|λj,k|
Rn\I∗
|T (aj,k)(x)|
)p(k−k0)/n
)pk/n)q/(q−p))(q−p)/q
ωp(δ)
q/(q−p)dδ
](q−p)/q
Hp,q .
This bound gives at once
2pk0 |{x /∈ Ω : |T (f2)(x)| > 2
k0}| ≤ cAp,q ‖f‖
Hp,q ,
which implies that
2pk0m(Tf2, 2
k0−1) ≤ 2pk0
|Ω|+ |{x /∈ Ω : |T (f2)(x)| > 2
k0−1}|
≤ c ‖f‖
Hp,∞ + cAp,q ‖f‖
Hp,q .
Finally,
2k0pm(Tf, 2k0) ≤ 2k0pm(Tf1, 2
k0−1) + 2k0pm(Tf2, 2
k0−1)
≤ c ‖f‖
Hp,∞ + cAp,q ‖f‖
Hp,q ,
and, since ‖f‖Hp,∞ ≤ c ‖f‖Hp,q for all q, we have finished. �
We pass now to the converse of Theorem 2.1. It is apparent that a
condition that relates the coefficients λj with the corresponding atoms aj
involved in an atomic decomposition of the form
j λjaj(x) is relevant here.
More precisely, if Ij denotes the supporting interval of aj , let
Ik = {j : 2
k ≤ |λj|/|Ij|
1/p < 2k+1} ,
and, for λ = {λj}, put
‖λ‖[p,q] =
]q/p)1/q
We then have,
Theorem 2.3. Let 0 < p ≤ 1, 0 < q ≤ ∞, and let f be a distribution given
by f =
j λj aj(x) , where the aj’s are H
p atoms, and the convergence is in
the sense of distributions. Further, assume that the family {Ij} consisting of
the supports of the aj’s has bounded overlap at each level Ik uniformly in k,
and ‖λ‖[p,q] <∞. Then, f ∈ H
p,q, and ‖f‖Hp,q ≤ c ‖λ‖[p,q].
Proof. Let Mf(x) = supt>0 |(f ∗ ψt)(x)| denote the radial maximal function
of f with respect to a suitable smooth function ψ with support contained in
{|x| ≤ 1} and nonvanishing integral. We will verify that Mf satisfies the
conditions of Lemma 1.1 and is thus in Lp,q.
Fix an integer k0 and let
g(x) =
λjaj(x) .
Since ‖Mg‖∞ ≤ ‖g‖∞ it suffices to estimate |g(x)|. Let C be the bounded
overlap constant for the family of the supports of the aj’s. Then, for j ∈ Ik,
|λj| |aj(x)| =
|Ij|1/p
|λj | |Ij|
1/p |aj(x)| ≤ 2
kχIj (x) ,
and, consequently,
|g(x)| ≤
χIj(x) ≤ C 2
Next, let
h(x) =
λjaj(x) .
Since aj has N = [n(1/p − 1)] vanishing moments, it is not hard to see
that, if Ij is the defining interval of aj and Ij is centered at xj , and γ =
(n+N+1)/n > 1/p, then, with c independent of j, ϕj(x) =Maj(x) satisfies
ϕj(x) ≤ c
|Ij |
γ−1/p
(|Ij|+ |x− xj |n)γ
Thus, if 1/γ < εp < 1,
Mh(x)εp ≤ c
j∈Ik,k≥k0
(|λj| |Ij|
γ−1/p)εp
(|Ij |+ |x− xj |n)γεp
which, upon integration, yields
Mh(x)εp dx ≤ c
j∈Ik,k≥k0
(|λj| |Ij|
γ−1/p)εp
(|Ij |+ |x− xj |n)γεp
The integrals in the right-hand side above are of order |Ij|
1−γεp and, conse-
quently, by Chebychev’s inequality,
2k0εp|{Mh > 2k0}| ≤ c
j∈Ik,k≥k0
εp |Ij|
1−ε ≤ c
|Ij| .
Thus, Lemma 1.1 applies with ϕ = Mf , ψk0 = Mg, ηk0 = Mh, and µk =
, and we get
2km(Mf, 2k)1/p
)1/p}∥
which, since
|Ij| ∼
, j ∈ Ik ,
is bounded by c ‖λ‖[p,q], 0 < q ≤ ∞. �
The next result is of interest because it applies to arbitrary decomposi-
tions in Hp,q. The proof relies on Lemma 1.2, and is left to the reader.
Theorem 2.4. Let 0 < p ≤ 1, 0 < q ≤ ∞, and let f be a distribution given
by f =
j λj aj(x) , where the aj’s are H
p atoms, and the convergence is
in the sense of distributions. Further, assume that ‖λ‖[η,q] < ∞ for some
0 < η < min(p, q). Then, f ∈ Hp,q, and ‖f‖Hp,q ≤ c ‖λ‖[η,q].
2.1 Interpolation between Hardy-Lorentz spaces
We are now ready to identify the intermediate spaces of a couple of Hardy-
Lorentz spaces with the same first index p ≤ 1.
Theorem 2.5. Let 0 < p ≤ 1. Given 0 < q1 < q < q2 ≤ ∞, define 0 < η < 1
by the relation 1/q = (1− η)/q1 + η/q2. Then, with equivalent quasinorms,
Hp,q = (Hp,q1, Hp,q2)η,q .
Proof. Since the non-tangential maximal function Nf of a distribution f in
Hp,q1 is in Lp,q1, and that of f in Hp,q2 is in Lp,q2, we have
K(t, Nf ;Lp,q1, Lp,q2) ≤ cK(t, f ;Hp,q1, Hp,q2) .
Thus,
‖Nf‖p,q ∼ ‖Nf‖(Lp,q1 ,Lp,q2)η,q ≤ c ‖f‖(Hp,q1 ,Hp,q2 )η,q ,
and (Hp,q1, Hp,q2)η,q →֒ H
To show the other embedding, with the notation in the proof of Theorem
2.1, write f =
j λj,kaj,k , and recall that for every integer k, the level
set Ik = {j : |λj,k|/|Ij,k|
1/p ∼ 2k} contains exclusively the sequence {λj,k}.
Let µ
|λj,k|
p. By construction,
k ∼ ‖f‖
Hp,q . Now, rearrange
{µk} into {µ
l }, and, for each l ≥ 1, let kl be such that µkl = µ
l . For
l0 ≥ 1, let Kl0 = {k1, . . . , kl0}, and put f1,l0 =
k∈Kl0
j λj,kaj,k and f2,l0 =
f − f1,l0 . Then, by Theorem 2.2, f1,l0 ∈ H
p,q1, f2,l0 ∈ H
p,q2, and, with the
usual interpretation for q2 = ∞,
‖f1,l0‖Hp,q1 ≤ c
)1/q1
, ‖f2,l0‖Hp,q2 ≤ c
)1/q2
So, for t > 0 and every positive integer l0, we have
K(t, f ;Hp,q1, Hp,q2) ≤ c
)1/q1
)1/q2
Now, by Homstedt’s formula, there is a choice of l0 such that the right-hand
side above ∼ K(t, {µk}; ℓ
q1, ℓq2), and, consequently,
K(t, f ;Hp,q1, Hp,q2) ≤ cK(t, {µk}; ℓ
q1, ℓq2) .
Thus,
‖f‖(Hp,q1 ,Hp,q2 )η,q ≤ c ‖{µk}‖(ℓq1 ,ℓq2)η,q
≤ c ‖{µk}‖ℓq ≤ c ‖f‖Hp,q ,
and Hp,q →֒ (Hp,q1, Hp,q2)η,q. �
The reader will have no difficulty in verifying that Theorem 2.5 gives that
if T is a continuous, sublinear map fromH1 into L1, and fromH1,∞ into L1,∞,
then ‖Tf‖1,q ≤ c ‖f‖H1,q for 1 < q < ∞. This observation has numerous
applications. For instance, consider the Calderón-Zygmund singular integral
operators with variable kernel defined by
TΩ(f)(x) = p.v.
Ω(x, x−y)
|x− y|n
f(y) dy .
Under appropriate growth and smoothness assumptions on Ω, TΩ maps H
continuously into L1, see [6], and H1,∞ continuously into L1,∞, see [8]. Thus,
if Ω satisfies the assumptions of both of these results, TΩ maps H
1,q contin-
uously into L1,q for 1 < q < ∞. A similar result follows by invoking the
characterization of H1,q given by C. Fefferman, Rivière and Sagher. How-
ever, in this case the Hp−Lp estimate requires additional smoothness of Ω, as
shown, for instance, in [6]. Similar considerations apply to the Marcinkiewicz
integral, see [9], and [7].
Finally, when p < 1, our results cover, for instance, the δ-CZ operators
satisfying T ∗(1) = 0 discussed by Alvarez and Milman, see [3]. These oper-
ators, as well as a more general related class introduced in [15], preserve Hp
and Hp,∞ for n/(n + δ) < p ≤ 1, and, consequently, by Theorem 2.5, they
also preserve Hp,q for p in that same range, and q > p.
References
[1] W. Abu-Shammala and A. Torchinsky, The atomic decomposition for
H1,q(Rn), Proceedings of the International Conference on Harmonic
Analysis and Ergodic Theory, (2005), to appear.
[2] J. Alvarez, Hp and Weak Hp continuity of Calderón-Zygmund type op-
erators, Lecture Notes in Pure and Appl. Math. 157 (1992), 17–34.
[3] J. Alvarez and M. Milman, Hp continuity of Calderón-Zygmund type
operators, J. Math. Anal. Appl. 118 (1986), 63–79.
[4] J. Bergh and J. Löfström, Interpolation spaces, an introduction,
Springer-Verlag, 1976.
[5] A. P. Calderón, An atomic decomposition of distributions in parabolic
Hp spaces, Advances in Math. 25 (1977), 216–225.
[6] J. Chen, Y. Ding, and D. Fan, A class of integral operators with variable
kernels in Hardy spaces, Chinese Annals of Math. (A) 23 (2002), 289–
[7] Y. Ding, C.-C. Lin, and S. Shao, On the Marcinkiewciz integral with
variable kernels, Indiana Math. J. 53, (2004), 805–821.
[8] Y. Ding, S. Z. Lu, and S. Shao, Integral operators with variable kernels
on weak Hardy spaces, J. Math. Anal. Appl. 317, (2006), 127-135.
[9] Y. Ding, S. Z. Lu, and Q. Xue, Marcinkiewicz integral on Hardy spaces,
Integr. Equ. Oper. Theory 42, (2002), 174-182.
[10] C. Fefferman, N. M. Rivière, and Y. Sagher, Interpolation between Hp
spaces: the real method, Trans. Amer. Math. Soc. 191 (1974), 75–81.
[11] R. Fefferman and F. Soria, The space Weak H1, Studia Math. 85 (1987),
1–16.
[12] T. Holmstedt, Interpolation of quasi-normed spaces, Math. Scand. 25
(1970), 177–199.
[13] P. Krée, Interpolation d’espaces vectoriels qui ne sont ni normés ni com-
plets. Applications., Ann. Inst. Fourier (Grenoble) 17 (1967), 137–174.
[14] D. V. Parilov, Two theorems on the Hardy-Lorentz classes H1,q, Zap.
Nauchm. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 327
(2005), 150-167.
[15] T. Quek and D. Yang, Calderón-Zygmund type operators on weighted
weak Hardy spaces over Rn, Acta Math. Sinica (Engl. Ser.) 16 (2000),
141–160.
[16] A. Torchinsky, Real-variable methods in harmonic analysis, Dover Pub-
lications, Inc., 2004.
[17] F. Weisz, Summability of multi-dimensional Fourier series and Hardy
spaces, Kluwer Academic Publishers, 2002.
[18] J. M. Wilson, On the atomic decomposition for Hardy spaces, Pacific. J.
Math. 116 (1985), 201–207.
DEPARTMENT OF MATHEMATICS, INDIANA UNIVERSITY,
BLOOMINGTON, IN 47405
E-mail: wabusham@indiana.edu, torchins@indiana.edu
	The Lorentz spaces
	The Hardy-Lorentz spaces Hp,q
	Interpolation between Hardy-Lorentz spaces
ABSTRACT
  In this paper we consider the Hardy-Lorentz spaces $H^{p,q}(R^n)$, with
$0<p\le 1$, $0<q\le \infty$. We discuss the atomic decomposition of the
elements in these spaces, their interpolation properties, and the behavior of
singular integrals and other operators acting on them.

<|endoftext|><|startoftext|>
Potassium intercalation in graphite: A van der Waals density-functional study
Eleni Ziambaras,1 Jesper Kleis,1 Elsebeth Schröder,1 and Per Hyldgaard1, 2, ∗
Department of Applied Physics, Chalmers University of Technology, SE–412 96 Göteborg, Sweden
Microtechnology and Nanoscience, MC2, Chalmers University of Technology, SE–412 96 Göteborg, Sweden
(Dated: April 1, 2007)
Potassium intercalation in graphite is investigated by first-principles theory. The bonding in
the potassium-graphite compound is reasonably well accounted for by traditional semilocal density
functional theory (DFT) calculations. However, to investigate the intercalate formation energy
from pure potassium atoms and graphite requires use of a description of the graphite interlayer
binding and thus a consistent account of the nonlocal dispersive interactions. This is included
seamlessly with ordinary DFT by a van der Waals density functional (vdW-DF) approach [Phys.
Rev. Lett. 92, 246401 (2004)]. The use of the vdW-DF is found to stabilize the graphite crystal,
with crystal parameters in fair agreement with experiments. For graphite and potassium-intercalated
graphite structural parameters such as binding separation, layer binding energy, formation energy,
and bulk modulus are reported. Also the adsorption and sub-surface potassium absorption energies
are reported. The vdW-DF description, compared with the traditional semilocal approach, is found
to weakly soften the elastic response.
I. INTRODUCTION
Graphite with its layered structure is easily interca-
lated by alkali metals (AM) already at room tempera-
ture. The intercalated compound has two-dimensional
layers of AM between graphite layers,1,2,3,4,5 giving rise
to interesting properties, such as superconductivity.6,7
The formation of an AM-graphite intercalate proceeds
with adsorption of AM atoms on graphite and absorption
of AM atoms below the top graphite layer, after which
further exposure to AM atoms leads the AM intercalate
compound.
Recent experiments8,9 on the structure and elec-
tronic properties of AM/graphite systems use samples
of graphite that are prepared by heating SiC crystals
to temperatures around ∼ 1400◦ C.10 This heat-induced
graphitization is of great value for spectroscopic studies
of graphitic systems, since the resulting graphite overlay-
ers are of excellent quality.11 The nature of the bonding
between the SiC surfaces and graphite has been explored
experimentally with photoemission spectroscopy12 and
theoretically13 with a van der Waals density functional
(vdW-DF) theory approach that accounts for the van der
Waals (vdW) forces.14,15,16,17
Here we investigate with density functional theory
(DFT) the effects on the graphite structure and the
energetics and the elastic response when potassium
is intercalated. The final intercalate compound is
C8K. The AM intercalate system is interesting in it-
self and has been the focus of numerous experimen-
tal investigations.18,19,20,21,22 Graphitic systems are also
ideal test materials in ongoing theory development that
aims at improving the description of the nonlocal inter-
layer bonds in sparse systems.14,23,24 Standard DFT ap-
proaches are based on local (local density approximation,
LDA) and semilocal approximations (generalized gradi-
ent approximation, GGA)25,26,27,28 for the electron ex-
change and correlation. Such regular DFT tools do not
treat correctly the weak vdW binding, e.g., the cohe-
sion between (adjacent) graphite layers. The failure of
traditional DFT for graphite makes it impossible to ob-
tain a meaningful comparison of the energetics in on-
surface AM adsorption and subsurface AM absorption.
Conversely, investigations of graphitic systems like C8K
permit us to test the accuracy of our vdW-DF develop-
ment work.
We explore the nature of the bonding of graphite,
the process leading to intercalation via adsorption and
absorption of potassium, and the nature of potassium-
intercalated graphite C8K using a recently developed
vdW-DF density functional.16 This choice of functional is
essential for a comparison of graphite and C8K properties
because of the inability of traditional GGA-based DFT
to describe graphite. We calculate the structure and elas-
tic response (bulk modulus B0) of pristine graphite and
potassium intercalated graphite and we present results
for the formation energies of the C8K system.
The intercalation of potassium in graphite is preceded
by the adsorption of potassium on top of a graphite
surface and potassium absorption underneath the top
graphite layer of the surface. In this work we study
how potassium bonds to graphite in these two parts of
the process towards intercalation. Our vdW-DF inves-
tigations of the binding of potassium in or on graphite
supplements corresponding vdW-DF studies of the bind-
ing of polycyclic aromatic hydrocarbon dimers, of the
polyethylene crystal, of benzene dimers, and of poly-
cyclic aromatic hydrocarbon and phenol molecules on
graphite.29,30,31,32,33,34
The outline of the paper is as follows. Section II con-
tains a short description of the materials of interest here:
graphite, C8K, and graphite with an adsorbed or ab-
sorbed K atom layer. The vdW-DF scheme is described
in Sec. III. Section IV presents our results, Sec. V the
discussion, and conclusions are drawn in Sec. VI.
http://arxiv.org/abs/0704.0055v1
FIG. 1: (Color online) Simple hexagonal graphite (AA stack-
ing) and natural hexagonal graphite (AB stacking). The two
structures differ by that each second carbon layer in AB-
stacked graphite is shifted, whereas in AA-stacked graphite
all planes are directly above each other. The experimentally
obtained in-plane lattice constant and sheet separation of nat-
ural graphite is (Ref. 40) a = 2.459 Å and dC-C = 3.336 Å,
respectively.
II. MATERIAL STRUCTURE
Graphite is a semimetallic solid with strong intra-plane
bonds and weakly coupled layers. The presence of these
two types of bonding results in a material with different
properties along the various crystallographic directions.35
For example, the thermal and electrical conductivity
along the carbon sheets is two orders of magnitude higher
than that perpendicular to the sheets. This specific prop-
erty allows heat to move directionally, which makes it
possible to control the heat transfer. The relatively weak
vdW forces between the sheets contribute to another in-
dustrially important property: graphite is an ideal lubri-
cant. In addition, the anisotropic properties of graphite
make the material suitable as a substrate in electronic
studies of ultrathin metal films.36,37,38,39
The natural structure of graphite is an AB stacking,
with the graphite layers shifted relative to each other,
as illustrated in Fig. 1. The figure also shows hexagonal
graphite, consisting of AA-stacked graphite layers. The
in-plane lattice constant a and the layer separation dC-C
is also illustrated. In natural graphite the primitive unit
cell is hexagonal, includes four carbon atoms in two lay-
ers, and has unit cell side lengths a and height c = 2dC-C.
The physical properties of graphite have been studied
in a variety of experimental40,41,42 and theoretical43,44
work. Some of the DFT work has been performed in
LDA, which does not provide a physically meaningful
account of binding in layered systems.15,45 At the same
time, using GGA is not an option because it does not
bind the graphite layers. For a good description of the
FIG. 2: (Color online) Crystalline structure of C8K show-
ing the AA-stacking of the carbon layers (small balls) and
the αβγδ-stacking of the potassium layers (large balls) per-
pendicular to the graphene sheets. The potassium layers are
arranged in a p(2× 2) structure, with the K atoms occupying
the sites over the hollows of every fourth carbon hexagon.
graphite structure and nature the vdW interactions must
be included.45
Alkali metals (AM), except Na, easily penetrate the
gallery of the graphite forming alkali metal graphite in-
tercalation compounds. These intercalation compounds
are formed through electron exchange between the inter-
calated layer and the host carbon layers, resulting in a
different nature of the interlayer bonding type than that
of pristine graphite. The intercalate also affects the con-
ductive properties of graphite, which becomes supercon-
ductive in the direction parallel to the planes at critical
temperatures below 1 K.6,7
The structure of AM graphite intercalation compounds
is characterized by its stage n, where n is the number of
graphite sheets located between the AM layers. In this
work we consider only stage-1 intercalated graphite C8K,
in which the layers of graphite and potassium alternate
throughout the crystal. The primitive unit cell of C8K
is orthorhombic and contains sixteen C atoms and two
K atoms. In the C8K crystal the K atoms are ordered
in a p(2× 2) registry with K-K separation 2a, where a is
the in-plane lattice constant of graphite. This separation
of the potassium atoms is about 8% larger than that in
the natural K bcc crystal (based on experimental values).
The carbon sheet stacking in C8K is of AA type, with the
K atoms occupying the sites over the hollows of every
fourth carbon hexagon, each position denoted by α, β,
γ, or δ, and the stacking of the K atoms perpendicular
to the planes being described by the αβγδ-sequence as
illustrated in Fig. 2.
III. COMPUTATIONAL METHODS
The first-principle total-energy and electronic struc-
ture calculations are performed within the framework
of DFT. The semilocal Perdew-Burke-Ernzerhof (PBE)
flavor26 of GGA is chosen for the exchange-correlation
functional for the traditional self-consistent calculations
underlying the vdW-DF calculations. For all GGA cal-
culations we use the open source DFT code Dacapo,46
which employs Vanderbilt ultrasoft pseudopotentials,47
periodic boundary conditions, and a plane-wave basis set.
An energy cut-off of 500 eV is used for the expansion of
the wave functions and the Brillouin zone (BZ) of the
unit cells is sampled according to the Monkhorst-Pack
scheme.48 The self-consistently determined GGA valence
electron density n(r) as well as components of the energy
from these calculations are passed on to the subsequent
vdW-DF calculation of the total energy.
For the adsorption and absorption studies a graphite
surface slab consisting of 4 layers is used, with a surface
unit cell of side lengths twice those in the graphite bulk
unit cell (i.e., side lengths 2a). The surface calculations
are performed with a 4×4×1 k-point sampling of the BZ.
The (pure) graphite bulk GGA calculations are per-
formed with a 8×8×4 k-point sampling of the BZ,
whereas for the C8K bulk structure, in a unit cell at least
double the size in any direction, 4×4×2 k-points are used,
consistent also with the choice of k-point sampling of the
surface slabs.
We choose to describe C8K by using a hexagonal unit
cell with four formula units, lateral side lengths approxi-
mately twice those of graphite and with four graphite and
four K-layers in the direction perpendicular to the layers.
C8K can also be described by the previously mentioned
primitive orthorhombic unit cell containing two formula
units of atoms but we retain the orthorhombic cell for
ease of description and for simple implementation of nu-
merically robust vdW-DF calculations.
In all our studies, except test cases, the Fast Fourier
Transform (FFT) grids are chosen such that the separa-
tion of neighboring points is maximum ∼0.13 Å in any
direction in any calculation.
A. vdW density function calculations
In graphite, the carbon layers bind by vdW interac-
tions only. In the intercalated compound a major part of
the attraction is ionic, but also here the vdW interactions
cannot be ignored. In order to include the vdW interac-
tions systematically in all of our calculations we use the
vdW-DF of Ref. 16. There, the correlation energy func-
tional is divided into a local and a nonlocal part,
Ec ≈ E
c + E
c , (1)
where the local part is approximated in the LDA and the
nonlocal part Enlc is consistently constructed to vanish
for a homogeneous system. The nonlocal correlation Enlc
is calculated from the GGA-based n(r) and its gradients
by using information about the many-body response of
the weakly inhomogeneous electron gas:
Enlc =
dr′n(r)φ(r, r′)n(r′). (2)
The nonlocal kernel φ(r, r′) can be tabulated in terms
of the separation |r − r′| between the two fragments at
positions r and r′ through the parameters D = (q0 +
q′0)|r − r
′|/2 and δ = (q0 − q
0)/(q0 + q
0). Here q0 is a
local parameter that depends on the electron density and
its gradient at position r. The analytic expression for the
kernel φ in terms of D and δ can be found in Ref. 16.
For periodic systems, such as bulk graphite, C8K,
and the graphite surface (with adsorbed or absorbed K-
atoms), the nonlocal correlation per unit cell is simply
evaluated from the interaction of the points in the unit
cell V0 with points everywhere in space (V ) in the three
(for bulk graphite and C8K) or two (for the graphite
surface) dimensions of periodicity. Thus, the V -integral
in Eq. (2) in principle requires a representation of the
electron density infinitely repeated in space. In prac-
tice, the nonlocal correlation rapidly converges31 and it
suffices with repetitions of the unit cell a few times in
each spatial direction. For graphite bulk the V -integral
is converged when we use a V that extends 9 (7) times
the original unit cell in directions parallel (perpendicular)
to the sheets. For the potassium investigation a signif-
icantly larger original unit cell is adopted (see Fig. 2);
here a fully converged V corresponds to a cell extending
five (three) times the original cell in the direction parallel
(perpendicular) to the sheets for C8K bulk. To describe
the nonlocal correlations (2) for the graphite surface a
sufficient V extends five times the original unit cell along
the carbon sheets.
For the exchange energy Ex we follow the choice of
Ref. 16 of using revPBE27 exchange. Among the func-
tionals that we have easy access to, the revPBE has
proved to be the best candidate for minimizing the ten-
dency of artificial exchange binding in graphite.15
Using the scheme described above to evaluate Enlc , the
total energy finally reads:
EvdW−DF = EGGA − EGGAc + E
c + E
c , (3)
where EGGA is the GGA total energy with the revPBE
choice for the exchange description and EGGAc (E
the GGA (LDA) correlation energy. As our GGA calcu-
lations in this specific application of vdW-DF are carried
out in PBE, not revPBE, we further need to explicitly
replace the PBE exchange in EGGA by that of revPBE
for the same electron charge density distribution.
B. Convergence of the local and nonlocal energy
variation
DFT calculations provide physically meaningful results
for energy differences between total energies (3). To un-
derstand materials and processes we must compare total
energy differences between a system with all constituents
at relatively close distance and a system of two or more
fragments at “infinite” separation (the reference system).
Since the total energy (3) consists both of a long-range
term and shorter-ranged GGA and LDA terms it is nat-
ural to choose different ways to represent the separated
fragments for these different long- or short-range energy
terms.
For the shorter-range energy parts (LDA and GGA
terms) the reference system is a full system with vacuum
between the fragments. For LDA and GGA calculations
it normally suffices to make sure that the charge den-
sity tails of the fragments do not overlap, but here we
find that the surface dipoles cause a slower convergence
with layer separation. We use a system with the layer
separation between the potassium layer and the nearest
graphite layer(s) dC-K = 12 Å (8 Å) as reference for the
adsorption (absorption) study.
The evaluation of the nonlocal correlationsEnlc requires
additional care. This is due to technical reasons per-
taining to numerically stability in basing the Enlc eval-
uation on the FFT grid used to converge the underly-
ing traditional-DFT calculations. The evaluation of the
nonlocal correlation energy, Eq. (2), involves a weighted
double integral of a kernel with a significant short-range
variation16. The shape of the kernel makes the Enlc eval-
uation sensitive to the particulars of FFT-type griding,49
for example, to the relative position of FFT grid points
relative to the nuclei position (for a finite grid-point spac-
ing).
However, robust evaluation of binding- or cohesive-
energy contributions by nonlocal correlations can gen-
erally be secured by a further splitting of energy differ-
ences into steps that minimize the above-mentioned grid
sensitivity. The problem of FFT sensitivity of the Enlc
evaluation is accentuated because the binding in the Enlc
channel arises as a smaller energy difference between siz-
ableEnlc contributions of the system and of the fragments.
Conversely, convergence in vdW-DF calculations of bind-
ing and cohesive energies can be obtained even at a mod-
erate FFT grid accuracy (0.13 Å used here) by devising a
calculational scheme that always maintains identical po-
sition of the nuclei relative to grid points in the combined
systems as well as in the fragment reference system.
Thus we obtain a numerically robust evaluation of the
Enlc energy differences by choosing steps for which we
can explicitly control the FFT griding. For adsorption
and absorption cases we calculate the reference systems
as a sum of Enlc -contributions for each fragment and we
make sure to always position the fragment at the exact
same position in the system as in the interacting system.
For bulk systems we choose steps in which we exclusively
adjust the inter-plane or in-plane lattice constant. Here
the reference system is then simply defined as a system
with either double (or in some cases quadruple) lattice
constant and with a corresponding doubling of the FFT
griding along the relevant unit-cell vector.
The cost of full convergence is that, in practice, we of-
ten do three or more GGA calculations and subsequent
Enlc calculations for each point on the absorption, absorp-
tion, or formation-energy curve. In addition to the cal-
culations for the full system we have to do one for each of
the isolated fragments at identical position in the adsorp-
tion/absorption cases and one or more for fragments in
the doubled unit-cell and doubled griding reference. We
have explicitly tested that using a FFT grid spacing of
< 0.13 Å (but not larger) for such reference calculations
is sufficient to ensure full convergence in the reported
Enlc (and E
vdW−DF total) energy variation for graphitic
systems.
C. Material formation and sorption energies
The cohesive energy of graphite (G) is the energy gain,
per carbon atom, of creating graphite at in-plane lattice
constant a and layer separation dC-C from isolated (spin-
polarized) carbon atoms.
EG,coh(a, dC-C) = EG,tot(a, dC-C)− EC-atom,tot (4)
where EG,tot and EC-atom,tot are total energies per carbon
atom. The graphite structure is stable at the minimum
of the cohesive energy, at lattice constants a = aG and
2dC-C = cG.
The adsorption (absorption) energy for a p(2 × 2) K-
layer over (under) the top layer of a graphite surface is
the difference in total energy [from Eq. (3)] for the system
at hand minus the total energy of the initial system, i.e.,
a clean graphite surface and isolated gas-phase potassium
atoms. However, due to the above mentioned technical
issues in using the vdW-DF we calculate the adsorption
and absorption energy as a sum of (artificial) stages lead-
ing to the desired system: First the initially isolated,
spin-polarized potassium atoms are gathered into a free
floating potassium layer with the structure correspond-
ing to a full cover of potassium atoms. By this the total
system gains the energy ∆EK-layer(aG), with
∆EK-layer(a) = EK,tot(a)− EK-atom,tot . (5)
In adsorption the potassium layer is then simply placed
on top of the four-layer (2 × 2) graphite surface (with
the K atoms above graphite hollows) at distance dC-K.
The system thereby gains a further energy contribution
∆EK-G(dC-K). This leads to an adsorption energy per
K-atom
Eads(dC-K) = ∆EK-layer(aG) + ∆EK-G(dC-K) . (6)
In absorption the top graphite layer is peeled off the
(2 × 2) graphite surface and moved to a distance far
from the remains of the graphite surface. This process
costs the system an (“exfoliation”) energy −∆EC-G =
−[Etot,C-G(dC-C = cG/2)− Etot,C-G(dC-C → ∞)]. At the
far distance the isolated graphite layer is moved into AA
stacking with the surface, at no extra energy cost. Then,
the potassium layer is placed midway between the far-
away graphite layer and the remains of the graphite sur-
face. Finally the two layers are gradually moved towards
the surface. At distance 2dC-K between the two topmost
graphite layers (sandwiching the K-layer) the system has
further gained an energy ∆EC-K-G(dC-K). The absorp-
tion energy per K-atom is thus
Eabs(dC-K) = −∆EC-G+∆EK-layer(aG)+∆EC-K-G(dC-K) .
Similarly, the C8K intercalate compound is formed
from graphite by first moving the graphite layers far
apart accordion-like (and there shift the graphite stack-
ing from ABA . . . to AAA . . . at no energy cost), then
changing the in-plane lattice constant of the isolated
graphene layers from aG to a, then intercalating K-layers
(in stacking αβγδ) between the graphite layers, and fi-
nally moving all the K- and graphite layers back like an
accordion, with in-plane lattice constant a (which has the
value aC8K at equilibrium).
In practice, a unit cell of four periodically repeated
graphite layers is used in order to accommodate the
potassium αβγδ-stacking. The energy gain of creating
a (2× 2) graphene sheet from 8 isolated carbon atoms is
defined similarly to that of the K-layer:
∆EC-layer(a) = EC-layer,tot(a)− 8EC-atom,tot . (8)
The formation energy for the C8K intercalate com-
pound per K atom or formula unit, Eform, is thus found
from the energy cost of moving four graphite layers
apart by expanding the (2 × 2) unit cell to large height,
−∆EG-acc, the cost of changing the in-plane lattice con-
stant from aG to a in each of the four isolated graphene
layers, 4(∆EC-layer(a)−∆EC-layer(aG)), the gain of creat-
ing four K-layers from isolated K-atoms, 4∆EK-layer(a),
plus the gain of bringing four K-layers and four graphite
layers together in the C8K structure, ∆EC8K-acc(a, dC-K),
yielding
Eform(a, dC-K)
−∆EG-acc + 4∆EC-layer(a)− 4∆EC-layer(aG)
+ 4∆EK-layer(a) + ∆EC8K-acc(a, dC-K)
. (9)
The relevant energies to use for comparing the three
different mechanisms of including potassium (adsorp-
tion, absorption and intercalation) are thus Eads(dC-K),
Eabs(dC-K) and Eform(a, dC-K) at their respective mini-
mum values.
IV. RESULTS
Experimental observations indicate that the intercala-
tion of potassium into graphite starts with the absorption
of evaporated potassium into an initially clean graphite
surface.50 This subsurface absorption is preceded by ini-
tial, sparse potassium adsorption onto the surface, and
proceeds with further absorption into deeper graphite
voids. The general view is that the K atoms enter
graphite at the graphite step edges.20 The amount and
position of intercalated K atoms is controlled by the tem-
perature and time of evaporation.
Below, we first describe the initial clean graphite sys-
tem, and the energy gain in (artificially) creating free-
floating K-layers from isolated K-atoms. Then we present
and discuss our results on potassium adsorption and sub-
surface absorption, followed by a characterization of bulk
For the adsorption (absorption) system we calculate
the adsorption (absorption) energy curve, including the
equilibrium structure. As a demonstration of the need
for a relatively fine FFT griding in the vdW-DF cal-
culations we also calculate and compare the absorption
curve for a more sparse FFT grid. For the bulk sys-
tems (graphite and C8K) we determine the lattice pa-
rameters and the bulk modulus. We also calculate the
formation energy of C8K and the energy needed to peel
off one graphite layer from the graphite surface and com-
pare with experiment.51
A. Graphite bulk structure
The present calculations on pure graphite are for the
natural, AB-stacked graphite (lower panel of Fig. 1). The
cohesive energy is calculated at a total of 232 structure
values (a, dC-C) and the equilibrium structure and bulk
modulus B0 are then evaluated using the method de-
scribed in Ref. 52.
Figure 3 shows a contour plot of the graphite cohesive
energy variation EG,coh as a function of the layer sep-
aration dC-C and the in-plane lattice constant a, calcu-
lated within the vdW-DF scheme. The contour spacing
is 5meV per carbon atom, shown relative to the energy
minimum located at (a, dC-C) = (aG, cG/2) =(2.476 Å,
3.59 Å). These values are summarized in Table I together
with the results obtained from a semilocal PBE cal-
culation. As expected, and discussed in Ref. 14, the
semilocal PBE calculation yields unrealistic results for
the layer separation. The table also presents the cor-
responding experimental values. Our calculated lattice
values obtained using vdW-DFT are in good agreement
with experiment,40 and close to those found from the
older vdW-DF of Refs. 14 and 15, (in which we for Enlc as-
sume translational invariance of n(r) along the graphite
planes,) at (2.47 Å, 3.76 Å).
Consistent with experimental reports18 and our previ-
ous calculations14,15,45 we find graphite to be rather soft,
indicated by the bulk modulus B0 value. Since in-plane
compression is very hard in graphite most of the softness
suggested by (the isotropic) B0 comes from compression
perpendicular to the graphite layers, and the value of
B0 is expected to be almost identical to the C33 elastic
3 3.5 4 4.5 5 5.5 6
dC−C [Å]
FIG. 3: Graphite cohesive energy EG,coh (AB-stacked), based
on vdW-DF, as a function of the carbon layer separation dC-C
and the in-plane lattice constant a. The energy contours are
spaced by 5meV per carbon atom.
TABLE I: Optimized structure parameters and elastic
properties for natural hexagonal graphite (AB-stacking)
and the potassium-intercalated graphite structure C8K in
AαAβAγAδAα . . . stacking. The table shows the calcu-
lated optimal values of the in-plane lattice constant a, the
(graphite-)layer-layer separation dC-C, and the bulk modulus
B0. In C8K the value if dC-C is twice the graphite-potassium
distance dC-K.
Graphite C8K
PBE vdW-DF Exp. PBE vdW-DF Exp.
a (Å) 2.473 2.476 2.459a 2.494 2.494 2.480b
dC-C (Å) ≫ 4 3.59 3.336
a 5.39 5.53 5.35c
B0 (GPa) 27 37
de 37 26 47de
aRef. 40. bRef. 53. cRef. 4. dRef. 18.
eValue presented is for C33; for laterally rigid materials, like
graphite and C8K, C33 is a good approximation of B0.
coefficient.14,18
We find the energy cost of peeling off a graphite layer
from the graphite surface (the exfoliation energy) to be
∆EC-G = −435 meV per (2 × 2) unit cell, i.e., −55
meV per surface carbon atom (Table II). A recent
experiment51 measured the desorption energy of poly-
cyclic aromatic hydrocarbons (basically flakes of graphite
sheets) off a graphite surface. From this experiment
the energy cost of peeling off a graphite layer from the
graphite surface was deduced to −52± 5meV/atom.
Our value −55 meV/C-atom is also consistent with
a separate vdW-DF determination29 of the binding
(−47meV per in-plane atom) between two (otherwise)
isolated graphene sheets.
For the energies of the absorbate system and of the
C8K intercalate a few other graphite-related energy con-
tributions are needed. The energy of collecting C atoms
to form a graphene sheet at lattice constant a from iso-
lated (spinpolarized) atoms is given by ∆EC-layer(a); we
find that changing the lattice constant a from aG to
the equilibrium value aC8K of C8K causes this energy
to change a mere 30 meV per (2 × 2) sheet. The contri-
bution ∆EG-acc is the energy of moving bulk graphite
layers (in this case four periodically repeated layers)
far away from each other, by expanding the unit cell
along the direction perpendicular to the layers. Thus,
∆EG-acc = 32∆EG,coh(aG, cG/2) − 4∆EC-layer(aG) tak-
ing the number of atoms and layers per unit cell into
account. We find the value ∆EG-acc = −1600 meV per
(2×2) four-layer unit cell. This corresponds to −50 meV
per C atom, again consistent with our result for the ex-
foliation energy, ∆EC-G/8 = −55 meV.
B. Creating a layer of K-atoms
The (artificial) step of creating a layer of potassium
atoms from isolated atoms releases a significant energy
∆EK-layer. This energy contains the energy variation
with in-plane lattice constant and the energy cost of
changing from a spin-polarized to a spin-balanced elec-
tron configuration for the isolated atom.54
The creation of the K-layer provides an energy gain
which is about half an eV per potassium atom, depending
on the final lattice constant. With the graphite lattice
constant aG the energy change, including the spin-change
cost, is ∆EK-layer(aG) = −476meV per K atom in vdW-
DF (−624meV when calculated within PBE), whereas
∆EK-layer(aC8K) = −473meV in vdW-DF.
C. Graphite-on-surface adsorption of potassium
The potassium atoms are adsorbed on a usual
ABA . . .-stacked graphite surface. We consider here full
(one monolayer) coverage, which is one potassium atom
per (2 × 2) graphite surface unit cell. This orders the
potassium atoms in a honeycomb structure with lattice
constant 2aG, and a nearest-neighbor distance within the
K-layer of aG.
The unit cell used in the standard DFT calculations
for adsorption and absorption has a height of 40 Å and
includes a vacuum region sufficiently big that no interac-
tions (within GGA) can occur between the top graphene
sheet and the slab bottom in the periodically repeated
image of the slab. The vacuum region is also large in
order to guarantee that the separation from any atom to
the dipole layer55 always remains larger than 4 Å.
In the top panel of Fig. 4 we show the adsorption en-
ergy per potassium atom. The adsorption energy at equi-
librium is −937 meV per K atom at distance dC-K = 3.02
Å from the graphite surface.
For comparison we also show the adsorption curve cal-
culated in a PBE-only traditional DFT calculation. Since
vdW−DF
2 3 4 5 6 7
dC−K [Å]
vdW−DF
43.532.5
dC−K [Å]
sparse
FIG. 4: Potassium adsorption and absorption energy at the
graphite surface as a function of the separation dC-K of the
K-atom layer and the nearest graphite layer(s) (at in-plane
lattice constant corresponding to that of the surface, aG).
Top panel: Adsorption curve based on vdW-DF calculations
(solid line with black circles) and PBE GGA calculations
(dashed line). The horizontal lines to the left show the en-
ergy gain in creating the isolated K layer from isolated atoms,
∆EK-layer(aG), the asymptote of Eads(dC-K) in this plot.
Bottom panel: Absorption curve based on vdW-DF calcu-
lations. The asymptote is here the sum ∆EK-layer(aG) −
∆EC-G. Inset: Binding energy of the K-layer and the
top graphite layer (“C-layer”) on top of the graphite slab,
∆EC-K-G. The dashed curve shows our results when in E
ignoring every second FFT grid point (in each direction) of
the charge density from the underlying GGA calculations, the
solid curve with black circles shows the result of using every
available FFT grid point.
the interaction between the K-layer and the graphite sur-
face has a short-range component to it, even GGA calcu-
lations, such as the PBE curve, show significant binding
(−900 meV/K-atom at dC-K = 2.96 Å). This is in con-
trast to the pure vdW binding between the layers in clean
graphite.14,15 Note that the asymptote of the PBE curve
is different from that of the vdW-DF curve, this is due
to the different energy gains (∆EK-layer) in collecting a
potassium layer from isolated atoms when calculated in
PBE or in vdW-DF.
For K-adsorption the vdW-DF and PBE curves agree
reasonably well, and the use of vdW-DF for this spe-
cific calculation is not urgently necessary. However, in
order to compare the adsorption results consistently to
absorption, intercalation and clean graphite, it is neces-
sary to include the long-range interactions through vdW-
DF. As shown for the graphite bulk results above, PBE
yields quantitatively and qualitatively wrong results for
the layer separation.
D. Graphite-subsurface absorption of potassium
The first subsurface adsorption of K takes place in the
void under the top-most graphite layer. The surface ab-
sorption of the first K-layer causes a lateral shift of the
top graphite sheet, resulting in a A/K/ABAB . . . stack-
ing of the graphite. We have studied the bonding nature
of this absorption process by considering a full p(2× 2)-
intercalated potassium layer in the subsurface of a four
layer thick graphite slab.
Following the receipt of Section III for the absorption
energy (7) the energies ∆EC-K-G are approximated by
those from a four-layer intercalated graphite slab with
the stacking A/K/ABA, and the values are shown in the
inset of Fig. 4. The absorption energy Eabs is given by
the curve in the bottom panel of Fig. 4, and its minimum
is −952meV per K atom at dC-K = 2.90 Å.
To investigate what grid spacing is sufficiently dense
to obtain converged total-energy values in vdW-DF we
do additional calculations in the binding distance region
with a more sparse grid. Specifically, the inset of Fig. 4
compares the vdW-DF calculated at full griding with one
that uses only every other FFT grid point in each direc-
tion, implying a grid spacing for Enlc (but not for the lo-
cal terms) which is maximum 0.26 Å. We note that using
the full grid yields smaller absolute values of the absorp-
tion energy. We also notice that the effect is more pro-
nounced for small separations than for larger distances.
Thus given resources, the dense FFT grid calculations
are preferred, but even the less dense FFT grid calcu-
lations yield reasonably well-converged results. In all
calculations (except tests of our graphitic systems) we
use a spacing with maximum 0.13 Å between grid points.
This is a grid spacing for which we have explicitly tested
convergence of the vdW-DF for graphitic systems given
the computational strategy described and discussed in
Sec. III.
E. Potassium-intercalated graphite
When potassium atoms penetrate the gallery of the
graphite, they form planes that are ordered in a p(2× 2)
fashion along the planes. The K intercalation causes a
shift of every second carbon layer resulting into an AA
stacking of the graphite sheets. The K atoms then simply
occupy the sites over the hollows of every fourth carbon
hexagon. The order of the K atoms perpendicular to the
planes is described by the αβγδ stacking, illustrated in
Fig. 2.
For the potassium intercalated compound C8K we cal-
culate in standard DFT using PBE the total energy at
132 different combinations of the structural parameters a
TABLE II: Comparison of the graphite exfoliation energy per
surface atom, EC-G/8, graphite layer binding energy per car-
bon atom, ∆EC-acc/32, the energy gain per K atom of col-
lecting K- and graphite-layers at equilibrium to form C8K,
∆EC8K-acc/4, and the equilibrium formation energy of C8K,
Eform.
∆EC-G/8 ∆EC-acc/32 ∆EC8K-acc/4 Eform
[meV/atom] [meV/atom] [meV/C8K] [meV/C8K]
vdW-DF −55 −50 −818 −861
PBE − − −511 −
Exp. −52± 5a −1236b
aRef. 51. bRef. 1.
 5  5.2  5.4  5.6  5.8  6
dC−C [Å]
FIG. 5: Formation energy of C8K, Eform, as a function of
the carbon-to-carbon layer separation dC-C and half the in-
plane lattice constant, a. The energy contours are spaced by
20meV per formula unit.
and dC-C. The charge densities and energy terms of these
calculations are then used as input to vdW-DF. The equi-
librium structure and elastic properties (B0) both for the
vdW-DF results and for the PBE results are then evalu-
ated with the same method as in the graphite case.52
Figure 5 shows a contour plot of the C8K formation
energy, calculated in vdW-DF, as a function of the C-C
layer separation (dC-C) and the in-plane periodicity (a)
of the graphite-layer structure. The contour spacing is
20 meV per formula unit and are shown relative to the
energy minimum at (a, dC-C) = (2.494 Å, 5.53 Å).
V. DISCUSSION
Table I presents an overview of our structural results
obtained with the vdW-DF for graphite and C8K. The
table also contrasts the results with the corresponding
values calculated with PBE where available. The vdW-
DF value dC-C = 5.53 Å for the C8K C − C layer sep-
aration is 3% larger than the experimentally observed
value whereas the PBE value corresponds to less than a
1% expansion. Our vdW-DF result for the C8K bulk
modulus (26 GPa) is also softer than the PBE result
(37 GPa) and further away from the experimental esti-
mates (47 GPa) based on measurements of the C33 elastic
response.18 A small overestimation of atomic separation
is consistent with the vdW-DF behavior that has been
documented in a wide range of both finite and extended
systems.14,15,16,17,29,30,33,34 This overestimation results,
at least in part, from our choice of parametrization of
the exchange behavior — an aspect that lies beyond
the present vdW-DF implementation which focuses on
improving the account of the nonlocal correlations, per
se. It is likely that systematic investigations of the ex-
change effects can further refine the accuracy of vdW-DF
implementations.56 In any case, vdW-DF theory calcu-
lations represent, in contrast to PBE, the only approach
to obtain a full ab initio characterization of the AM in-
tercalation process.
The C8K system is more compact than graphite and
this explains why PBE alone can here provide a good de-
scription of the materials structure and at least some ma-
terials properties, whereas it fails completely for graphite.
The distance between the graphene sheets upon interca-
lation of potassium atoms is stretched compared to that
of pure graphite, but the (K-)layer to (graphite-)layer
separation, dC-K = dC-C/2 = 2.77 Å, is significantly less
than the layer-layer separation in pure graphite. This in-
dicates that C8K is likely held together, at least in part,
by shorter-ranged interactions.
Table II documents that the vdW binding neverthe-
less plays an important role in the binding and forma-
tion of C8K. The table summarizes and contrasts our
vdW-DF and PBE results for graphite exfoliation and
layer binding energies as well as C8K interlayer binding
and formation energies. The vdW-DF result for the C8K
formation energy is smaller than experimental measure-
ments by 31% but it nevertheless represent a physically
motivated ab initio calculation. In contrast, the C8K
formation energy is simply unavailable in PBE because
PBE, as indicated, fails to describe the layer binding in
graphite. Moreover, for the vdW-DF/PBE comparisons
that we can make — for example, of the C8K layer inter-
action ∆EC8K-acc — the vdW-DF is found to significantly
strengthen the bonding compared with PBE.
It is also interesting to note that the combination of
shorter-ranged and vdW bonding components in C8K
yields a layer binding energy that is close to that of
the graphite case. In spite of the difference in nature
of interactions, we find almost identical binding energies
per layer for the case of the exfoliation and accordion in
graphite and for the accordion in C8K. This observation
testifies to a perhaps surprising strength of the so-called
soft-matter vdW interactions.
In a wider perspective our vdW-DF permits a first
comparison of the range of AM-graphite systems from
adsorption over absorption to full intercalation and thus
insight on the intercalation progress. Assuming a dense
2×2 configuration, we find that the energy for potassium
adsorption and absorption is nearly degenerate with an
indication that absorption is slightly preferred, consis-
tent with experimental behavior. We also find that the
potassium absorption may eventually proceeds towards
full intercalation thanks to a significant release of forma-
tion energy.
VI. CONCLUSIONS
The potassium intercalation process in graphite has
been investigated by means of the vdW-DF density func-
tional method. This method includes the dispersive in-
teractions needed for a consistent investigation of the
intercalation process. For clean graphite the vdW-DF
predicts — contrary to standard semilocal DFT imple-
mentations — a stabilized bulk system with equilibrium
crystal parameters in close agreement with experiments.
Two limits of the absorption process have been inves-
tigated by the vdW-DF, namely single layer subsurface
absorption and the fully potassium intercalated stage-1
crystal C8K. Here the vdW-DF is shown to enhance the
(semi-)local type of bonding described by traditional ap-
proaches. The significant impact on the materials behav-
ior indicates that the vdW-DF is needed not only for a
consistent description of sparse matter systems that are
solely stabilized by dispersion forces, but also for their
intercalates.
We thank D.C. Langreth and B.I. Lundqvist for stim-
ulating discussions. Partial support from the Swedish
Research Council (VR), the Swedish National Graduate
School in Materials Science (NFSM), and the Swedish
Foundation for Strategic Research (SSF) through the
consortium ATOMICS is gratefully acknowledged, as
well as allocation of computer time at UNICC/C3SE
(Chalmers) and SNIC (Swedish National Infrastructure
for Computing).
∗ Electronic address: hyldgaar@chalmers.se
1 S. Aronson, F.J. Salzano, and D. Ballafiore, J. Chem. Phys.
49, 434 (1968).
2 D.E. Nixon and G.S. Parry, J. Phys. D 1, 291 (1968).
3 R. Clarke, N. Wada, and S.A. Solin, Phys. Rev. Lett. 44,
1616 (1980).
4 M.S. Dresselhaus and G. Dresselhaus, Adv. Phys. 30, 139
(1981).
5 D.P. DiVincenzo and E.J. Mele, Phys. Rev. B 32, 2538
(1985).
6 N.B. Hannay, T.H. Geballe, B.T. Matthias, K. Andreas, P.
Schmidt, and D. MacNair, Phys. Rev. Lett. 14, 225 (1965).
7 R.A. Jishi and M.S. Dresselhaus, Phys. Rev. B 45, 12465
(1992).
8 T. Kihlgren, T. Balasubramanian, L. Walldén, and R.
Yakimova, Surf. Sci. 600, 1160 (2006).
9 M. Breitholtz, T. Kihlgren, S.-Å. Lindgren, and L.
Walldén, Phys. Rev. B 66, 153401 (2002).
10 I. Forbeaux, J.-M. Themlin, and J.-M. Debever, Phys. Rev.
B 58, 16396 (1998).
11 T. Kihlgren, T. Balasubramanian, L. Walldén, and R.
Yakimova, Phys. Rev. B 66, 235422 (2002).
12 I. Forbeaux, J.-M. Themlin, A. Charrier, F. Thibaudau,
and J.-M. Debever, Appl. Surf. Sci. 162–163, 406 (2000).
13 E. Ziambaras, Ph.D. thesis, Chalmers (2006).
14 H. Rydberg, M. Dion, N. Jacobson, E. Schröder, P.
Hyldgaard, S.I. Simak, D.C. Langreth, and B.I. Lundqvist,
Phys. Rev. Lett. 91, 126402 (2003).
15 D.C. Langreth, M. Dion, H. Rydberg, E. Schröder, P.
Hyldgaard, and B.I. Lundqvist, Int. J. Quantum Chem.
101, 599 (2005).
16 M. Dion, H. Rydberg, E. Schröder, D.C. Langreth, and
B.I. Lundqvist, Phys. Rev. Lett. 92, 246401 (2004); 95,
109902(E) (2005).
17 T. Thonhauser, V.R. Cooper, S. Li, A. Puzder,
P. Hyldgaard, and D.C. Langreth, Van der
Waals density functional: Self-consistent poten-
tial and the nature of the van der Waals bond,
http://arxiv.org/abs/cond-mat/0703442
18 N. Wada, R. Clarke, and S.A. Solin, Solid State Comm.
35, 675 (1980).
19 H. Zabel and A. Magerl, Phys. Rev. B 25, 2463 (1982).
20 J.C. Barnard, K.M. Hock and R.E. Palmer, Surf. Science
287–288, 178 (1993).
21 K. M. Hock and R. E. Palmer, Surf. Science 284, 349
(1993).
22 Z.Y. Li, K.M. Hoch, and R.E. Palmer, Phys. Rev. Lett.
67, 1562 (1991).
23 S.D. Chakarova and E. Schröder, Materials Science and
Engineering C 25, 787 (2005).
24 L.A. Girifalco and M. Hodak, Phys. Rev. B 65, 125404
(2002).
25 J.P. Perdew, J.A. Chevary, S.H. Vosko, K.A. Jackson,
M.R. Pederson, D.J. Singh, and C. Fiolhais, Phys. Rev.
B 48, 6671 (1992).
26 J.P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett.
77, 3865 (1996).
27 Y. Zhang and W. Yang, Phys. Rev. Lett. 80, 890 (1998).
28 B. Hammer, L.B. Hansen, and J.K. Nørskov, Phys. Rev.
B 59, 7413 (1999).
29 S.D. Chakarova-Käck, J. Kleis, and E. Schröder, Appl.
Phys. Rep. 2005-16 (2005).
30 J. Kleis, B.I. Lundqvist, D.C. Langreth, and E. Schröder,
Towards a working density-functional theory for polymers:
First-principles determination of the polyethylene crystal
structure, http://arxiv.org/abs/cond-mat/0611498
31 S.D. Chakarova-Käck, E. Schröder, B.I. Lundqvist, and
D.C. Langreth, Phys. Rev. Lett. 96, 146107 (2006).
32 S.D. Chakarova-Käck, Ø. Borck, E. Schröder, and B.I.
Lundqvist, Phys. Rev. B 74, 155402 (2006).
33 A. Puzder, M. Dion, and D.C. Langreth, J. Chem. Phys.
124, 164105 (2006).
34 T. Thonhauser, A. Puzder, and D.C. Langreth, J. Chem.
Phys. 124, 164106 (2006).
35 D.D.L. Chung, J. Mat. Sci. 37, 1475 (2002).
36 M. Breitholtz, T. Kihlgren, S.-Å. Lindgren, H. Olin, E.
mailto:hyldgaar@chalmers.se
http://arxiv.org/abs/cond-mat/0703442
http://arxiv.org/abs/cond-mat/0611498
Wahlström, and L. Walldén, Phys. Rev. B 64, 073301
(2001).
37 Z.P. Hu, N.J. Wu, and A. Ignatiev, Phys. Rev. B 33, 7683
(1986).
38 J. Cui, J.D. White, R.D. Diehl, J.F. Annett, and M.W.
Cole, Surf. Sci. 279, 149 (1992).
39 L. Österlund, D.V. Chakarov, and B. Kasemo, Surf. Sci.
420, L437 (1991).
40 Y. Baskin and L. Meyer, Phys. Rev. 100, 544 (1955).
41 W. Eberhardt, I.T. McGovern, E.W. Plummer, and J.E.
Fisher, Phys. Rev. Lett. 44, 200 (1980).
42 A.R. Law, J.J. Barry, and H.P. Hughes, Phys. Rev. B 28,
5332 (1983).
43 R. Ahuja, S. Auluck, J. Trygg, J.M. Wills, O. Eriksson,
and B. Johansson, Phys. Rev. B 51, 4813 (1995).
44 N.A.W. Holzwarth, S.G. Louie, and S. Rabii, Phys. Rev.
B 26, 5382 (1982).
45 H. Rydberg, N. Jacobson, P. Hyldgaard, S.I. Simak, B.I.
Lundqvist, and D.C. Langreth, Surf. Sci. 532-535, 606
(2003).
46 Open-source plane-wave DFT computer code Dacapo,
http://www.fysik.dtu.dk/CAMPOS/
47 D. Vanderbilt, Phys. Rev. B 41, 7892 (1990).
48 H.J. Monkhorst and J.D. Pack, Phys. Rev. B 13, 5188
(1976).
49 D.C. Langreth, private communication; J. Kleis and P.
Hyldgaard, unpublished.
50 The transition from on-surface adsorption to subsurface
absorption is identified in experiment by a work function
change, Refs. 20 and 21.
51 R. Zacharia, H. Ulbricht, and T. Hertel, Phys. Rev. B 69,
155406 (2004).
52 E. Ziambaras and E. Schröder, Phys. Rev. B 68, 064112
(2003).
53 D.E. Nixon and G.S. Parry, J. Phys. C 2, 1732 (1969).
54 O. Gunnarsson, B.I. Lundqvist, and J.W. Wilkins, Phys.
Rev. B 10, 1319 (1974). Since no spin-polarized version
of vdW-DF exists at present, we calculate the the energy
cost for changing the spin of isolated potassium atoms in
PBE. The spin-change cost is thus determined to be 26
meV/K-atom.
55 L. Bengtsson, Phys. Rev. B 59, 12301 (1999), and refer-
ences therein.
56 The choice of exchange flavor in vdW-DF was set in Ref. 15
to avoid artificial bonding in noble-gas systems and to
better mimic exact exchange calculations for those sys-
tems. However, it is far from certain and even unlikely that
the conclusions drawn for noble-gas systems carry over to
bonding separations smaller than 3 Å.
http://www.fysik.dtu.dk/CAMPOS/
ABSTRACT
  Potassium intercalation in graphite is investigated by first-principles
theory. The bonding in the potassium-graphite compound is reasonably well
accounted for by traditional semilocal density functional theory (DFT)
calculations. However, to investigate the intercalate formation energy from
pure potassium atoms and graphite requires use of a description of the graphite
interlayer binding and thus a consistent account of the nonlocal dispersive
interactions. This is included seamlessly with ordinary DFT by a van der Waals
density functional (vdW-DF) approach [Phys. Rev. Lett. 92, 246401 (2004)]. The
use of the vdW-DF is found to stabilize the graphite crystal, with crystal
parameters in fair agreement with experiments. For graphite and
potassium-intercalated graphite structural parameters such as binding
separation, layer binding energy, formation energy, and bulk modulus are
reported. Also the adsorption and sub-surface potassium absorption energies are
reported. The vdW-DF description, compared with the traditional semilocal
approach, is found to weakly soften the elastic response.

<|endoftext|><|startoftext|>
Introduction, one main in-
convenience of liquid-crystal simulations is the correct
identification of the solid phase(s) of the system, since
a plethora of such phases are conceivable and there is
no unfailing criterion for choosing those that are really
relevant to the specific model under investigation. The
actual importance of a given crystal phase can only be
judged a posteriori, after proving its mechanical stability
in a long simulation run and, ultimately, on the basis of
the calculation of its Gibbs free energy, but nothing can
nevertheless ensure that no important phase was skipped.
Besides these vague indications, we adopted a more strin-
gent test in order to select the phases for which it is worth
performing the numerically-expensive calculation of the
free energy. With specific reference to the model (2.2),
we did a comprehensive T = 0 study of the chemical po-
tential µ as a function of the pressure for many stretched
cubic and hexagonal phases, in such a way as to iden-
tify the stable ground states and leave out from further
consideration all solids with a very large µ at zero tem-
perature. In fact, it is unlikely that such phases can ever
play a role for the thermodynamics at non-zero temper-
atures.
For the interaction potential describing the GCN
model, we surmise that all of its stable crystal phases
are to be sought among the structures obtained from
the common cubic and hexagonal lattices by a suit-
able stretching along a high-symmetry crystal axis, with
optimal stretching ratios α that are probably close to
L/D. Take e.g. the case of BCC. We can stretch
it along [001], [110], or [111], this way defining new
BCC001(α), BCC110(α), and BCC111(α) lattices (the
number within parentheses is the stretching ratio; for in-
stance, BCC001(2) is a BCC crystal whose unit cell has
been expanded by a factor of 2 along ẑ). The same can
be done with the simple-cubic (SC) and FCC structures.
We further consider hexagonal-close-packed (HCP) and
simple-hexagonal (SH) lattices that are stretched along
[111], this way arriving at a total of eleven potentially
relevant crystal phases.
METHOD
For fixed T and P values, the most stable of several
thermodynamic phases is the one with lowest chemical
potential µ (Gibbs free energy per particle). At T =
0, only crystal phases are involved in this competition
and, once a list of relevant phases has been compiled,
searching for the optimal one at a given P becomes a
simple computational exercise. An exact property of the
Gaussian-core model (which is the L/D = 1 limit of the
GCN model) is that, on increasing pressure, the BCC
crystal takes over the FCC crystal at P ∗ ≡ PD3/ǫ ≃
0.055 [3]. Hence, in the GCN model with L/D > 1 a
leading role is naturally expected for the stretched FCC
and BCC crystals.
For an assigned crystal structure, we calculate the T =
0 chemical potential µ(P ) of the GCN model for a given
pressure P by adjusting the stretching ratio α(P ) and the
density ρ(P ) until the minimum of (U+PV )/N is found.
Once the profile of µ as a function of P is known for each
structure, it is straightforward to draw the T = 0 phase
diagram for the given L/D.
The known thermodynamic behavior at zero tempera-
ture provides the general framework for the further simu-
lational study at non-zero temperatures. In fact, it is safe
to say that the same crystals that are stable at T = 0
also give the underlying lattice structure for the stable
solid phases at T > 0. As we shall see in more detail in
the next Section, the only complication is the existence
of three degenerate T = 0 structures for not too small
pressures, which obliged us to consider each of them as a
potentially relevant low-temperature GCN phase.
We perform a Monte Carlo (MC) simulation of the
GCN model with L/D = 3 in the isothermal-isobaric
ensemble, using the standard Metropolis algorithm with
periodic boundary conditions and the nearest-image con-
vention. For the solid phase, four different types of
lattices are considered, namely FCC001(3), BCC110(3),
BCC111(3), and BCC001(3) (see Section IV). The num-
ber of particles in a given direction is chosen so as to guar-
antee a negligible contribution to the interaction energy
from pairs of particles separated by half a simulation-
box length in that direction. More precisely, our samples
consist of 10× 20× 8 = 1600 particles in the FCC001(3)
phase, of 8 × 24 × 6 = 1152 particles in the fluid and in
the solid BCC110(3) phase, of 10× 12× 18 = 2160 parti-
cles in the BCC111(3) phase, and of 12× 12× 10 = 1440
particles in the BCC001(3) phase. Considering the large
system sizes employed, we made no attempt to extrapo-
late our finite-size results to infinity.
At given T and P , equilibration of the sample typically
took a few thousand MC sweeps, a sweep consisting of
one average attempt per particle to change its center-of-
mass position plus one average attempt to change the
volume by a isotropic rescaling of particle coordinates.
The maximum random displacement of a particle and
the maximum volume change in a trial MC move are ad-
justed once a sweep during the run so as to keep the
acceptance ratio of moves close to 50% and 40%, respec-
tively. While the above setup is sufficient when simu-
lating a (nematic) fluid system, it could have harmful
consequences on the sampling of a solid state to operate
with a fixed box shape since this would not allow the
system to release the residual stress. That is why, after a
first rough optimization with a fixed box shape, the equi-
librium MC trajectory of a solid state is generated with a
modified (so called constant-stress) Metropolis algorithm
which makes it possible to adjust the length of the vari-
ous sides of the box independently from each other (see
e.g. [8]). Ordinarily, however, the simulation box will de-
viate only very little from its original shape. When the
opposite occurs, this indicates a mechanic instability of
the solid in favor of the fluid, hence it gives a clue as to
where melting is located. We note that MC simulations
with a varying box shape are not well suited for the fluid
phase since in this case one side of the box usually be-
comes much larger or smaller than the other two, a fact
that seriously prejudicates the reliability of the simula-
tion results.
In order to locate the melting point for a given pres-
sure, we generate separate sequences of simulation runs,
starting from the cold solid on one side and from the hot
fluid on the other side. The last configuration produced
in a given run is taken to be the first of the next run at a
slightly different temperature. The starting configuration
of a “solid” chain of runs was always a perfect crystal with
α = 3 and a density equal to its T = 0 value. Usually,
this series of runs is carried on until a sudden change is
observed in the difference between the energies/volumes
of solid and fluid, so as to prevent us from averaging over
heterogeneous thermodynamic states. Thermodynamic
averages are computed over trajectories 104 sweeps long.
Much longer trajectories are constructed for estimating
the chemical potential of the fluid (see below).
Estimating statistical errors is a critical issue whenever
different candidate solid structures so closely compete for
thermodynamic stability. To this aim, we divide the MC
trajectory into ten blocks and estimate the length of the
error bars to be twice as large as the standard deviation of
the block averages. Typically, the relative errors affecting
the energy and the volume of the fluid are found to be
very small, a few hundredths percent at the most (for a
solid, they are even smaller).
A more direct clue about the nature of the phase(s)
expressed by the system for intermediate temperatures
can be got from a careful monitoring across the state
space of a “smectic” order parameter (OP) and of two
different, transversal and longitudinal (with respect to ẑ)
distribution functions (DFs). The smectic OP is defined
τ(λ) =
. (3.1)
This quantity is able to notice the existence of a layered
structure along ẑ in the system, be it solid-like or smectic-
like. In particular, the λ at which τ takes its largest value
gives the nominal distance λmax between the layers. A
large value of τ at λmax signals a strong layering along z
with period λmax. In order to discriminate between solid
and smectic (fluid) layers, we can rely on the in-plane
DF g⊥(r⊥), with r⊥ = r − (r · ẑ)ẑ, which informs on
how much rapid is the decay of crystal-like spatial corre-
lations in directions perpendicular to ẑ. The persistence
of crystal order along ẑ is measured through another DF,
g‖(z), which gives similar indications as τ(λ). A liquid-
like profile of g⊥ along with a sharply peaked τ or g‖ will
be faithful indication of a smectic phase. Conversely,
a sharply peaked g⊥ along with a structureless g‖ will
be the imprints of a columnar phase. Both g⊥(r⊥) and
g‖(z) are normalized in such a way as to approach 1 at
large distances in case of fully disordered center-of-mass
distributions in the respective directions. Slight devia-
tions from this asymptotic value may occur as a result
of the variation of box sidelengths during a simulation
run. The two DFs were constructed with a spatial reso-
lution of ∆r⊥ = D/20 and ∆z = L/20 respectively, and
updated every 10 MC sweeps.
We compute the difference in chemical potential be-
tween any two equilibrium states of the system – say,
1 and 2 – within the same phase (or even in different
phases, provided they are separated by a second-order
boundary) by the standard thermodynamic-integration
method as adapted to the isothermal-isobaric ensemble,
i.e., via the combined use of the formulas:
µ(T, P2)− µ(T, P1) =
dP v(T, P ) (3.2)
µ(T2, P )
µ(T1, P )
u(T, P ) + Pv(T, P )
(3.3)
To prove really useful, however, the above equations re-
quire an independent estimate of µ for at least one ref-
erence state in each phase. For the fluid, a reference
state can be any state characterized by a very small den-
sity (a nearly ideal gas), since then the excess chemical
potential can be estimated accurately through Widom’s
particle-insertion method [15]. The use of this technique
for small but finite densities avoids the otherwise neces-
sary extrapolation to the ideal gas limit as a reference
state for thermodynamic integration.
In order to calculate the excess Helmholtz free energy
of a solid, we resort to the method proposed by Frenkel
and Ladd [1], based on a different kind of thermody-
namic integration (see Ref. [4] for a full description of this
method and of its implementation on a computer). We
note that the ellipsoidal symmetry of the GCN particles
is not a complication at all, since the particle axes are
frozen and the only degrees of freedom been left are the
centers of mass. The solid excess Helmholtz free energy is
calculated through a series of NV T simulation runs, i.e.,
for fixed density and temperature. As far as the density
is concerned, its value is chosen in a way such that com-
plies with the pressure of the low-temperature reference
state, that is the one from which the NPT sequence of
runs is started. We wish to emphasize that, thanks to the
large sample sizes employed, the density histogram in a
NPT run always turned out to be sharply peaked, indi-
cating very limited density fluctuations (hence, negligible
ensemble dependence of statistical averages).
RESULTS
Zero-temperature calculations
For various L/D values in the interval between 1.1
and 3, we have calculated the T = 0 chemical poten-
tial µ(P ) for our eleven candidate ground states, with
P ranging from 0 to 0.20. We report in Table 1 the
results relative to L/D = 3 for two values of P , 0.05
and 0.20. An emergent aspect of this Table is the exis-
tence of a rich degeneracy that is only partly a result of
the effective identity of crystal structures up to a dila-
tion. Take e.g. the five structures with the minimum µ
(and with the same density). While the BCC001 lattice
with α = 3 is obtained from the FCC001 lattice with
α = 3/
2 = 2.12 . . . by a simple
2 dilation, there is
no homothety transforming BCC001(3) into BCC110(3)
or into BCC111(3) (in turn equivalent to SC111(1.5)):
Points in these three lattices have different local envi-
ronments, as can be checked by counting the nth-order
neighbors for n up to 4, yet the three stretched BCC crys-
tals of minimum µ share the same U/N . Also the pairs
FCC110(3), FCC111(3) and SC001(3), SC110(3) consist
of topologically-different degenerate structures. This fact
is an emergent phenomenon whose deep reason remains
unclear to us; it should deal with the dependence of u on
the ratio r/σ(θ), since the same symmetry holds with a
polynomial, rather than Gaussian, dependence.
For the case of L/D = 3, we show in Fig. 1 the over-
all P dependence at T = 0 of the chemical potential µ
for the various solids. The solid with the minimum µ is
either of the type FCC001 (with α = 3) or, say, of the
type BCC001 (with α = 3), a fact that holds true, but
with α = L/D, for all 1 < L/D < 3. Other solids are
definitely ruled out, and the same will probably hold for
T > 0. On increasing L/D, the transition from a FCC-
type to a BCC-type phase occurs at a lower and lower
pressure, whose reduced value is slightly less than 0.02
for L/D = 3.
Monte Carlo simulation
In order to investigate the thermodynamic behavior of
the GCN model at non-zero temperatures, we have car-
ried out a number of MC simulation runs for a GCN
system with L/D = 3, which is the system with the
strongest liquid-crystalline features that we can still man-
age numerically.
We have effected scans of the phase diagram for six
different pressure values, P ∗ = 0.01, 0.02, 0.03, 0.05, 0.12,
and 0.20. With all probability, FCC001(3) is the sta-
ble system phase only in a very small pocket of T -P
plane nearby the origin. However, we decided not to
embark on a free-energy study of the relative stability of
fluid, FCC001(3), and BCC-type phases at such low pres-
sures since this would require a numerical accuracy that
is beyond our capabilities. To a first approximation, the
boundary line between FCC001(3) and, say, BCC111(3)
can be assumed to run at constant pressure. For relating
data obtained at different pressures, we have carried out
two further sequences of MC runs along the isothermal
paths for T ∗ = 0.002 (solids) and T ∗ = 0.015 (fluid).
The Frenkel-Ladd computation of the excess
Helmholtz free energy per particle fex confirms that
the BCC001(3), BCC110(3), and BCC111(3) solids
are nearly degenerate at low temperature. We take
T ∗ = 0.002, P ∗ = 0.05 as a reference state for the
calculation of solid free energies. With the density
fixed at ρ = 0.08562D−3, in every case corresponding
to P ∗ = 0.05, we find βfex = 144.461(2), 144.470(2),
and 144.453(3), for the three above solids respectively,
implying a weak preference for the BCC111(3) phase.
Then, using thermodynamic integration along the
T ∗ = 0.002 isotherm (see Eq. (3.2)), we have studied
the relative stability of the three solids as a function
of pressure, up to P ∗ = 0.20. The results, depicted
in Fig. 2, suggest that BCC111(3) is the stable phase
throughout the low-temperature region, the other solids
being very good solutions anyway with near-optimal
chemical potentials.
We then follow the thermal disordering of the BCC-
type solids for fixed pressure (with three cases consid-
ered, P ∗ = 0.05, 0.12, and 0.20) through sequences of
isothermal-isobaric runs, all starting from T ∗ = 0.002,
with steps of 0.001. Any such sequence is stopped when
the values of potential energy and specific volume have
collapsed onto those of the fluid, thus informing that the
ultimate bounds of solid stability are reached (usually, a
solid can hardly be overheated). The stability thresholds
detected this way are fairly consistent with the indica-
tion coming from the DF profiles which, upon increas-
ing temperature, will eventually show a fluid-like appear-
ance. Thermodynamic integration (see Eq. (3.3)) is used
to propagate the calculated µ for T ∗ = 0.002 to higher
temperatures.
As far as the (nematic) fluid is concerned, we have
first generated a sequence of NPT simulation runs for
P ∗ = 0.05, starting from T ∗ = 0.015. At this initial
point, the excess chemical potential µex was estimated
by Widom’s insertion method, obtaining µex = 0.986(5).
It is worth noting that, in a long simulation run of as
many as 5×104 MC sweeps at equilibrium, the chemical-
potential value relaxed very soon, with small fluctuations
around the average and no significant drift observed. Our
analysis of the fluid phase is completed by further sim-
ulation runs along the isobaric paths for P ∗ = 0.12 and
0.20, for which we did not have the need to compute the
chemical potential again since this could be deduced from
the volume data along the T ∗ = 0.015 isotherm.
Chemical-potential results along the three isobars on
which we focussed are reported in Figs. 3 to 5. As is clear,
with increasing temperature the fluid eventually takes
over the solids. Among the solids, the BCC111(3) phase
is the preferred one for any temperature and pressure, al-
though the chemical potential of the other solid phases is
only slightly larger. On increasing pressure, the melting
temperature goes down, like in the Gaussian-core model.
The necessity of a matching with the zero-temperature
melting point for P = 0 will then imply reentrant melt-
ing in the GCN model too. The maximum error on the
melting temperature Tm, which we estimate to be about
0.003 (hence not that small), entirely depends on the lim-
ited precision of the fluid µex, which then constitutes a
major source of error on Tm.
The only conclusion we can draw from the above
chemical-potential study is that BCC111(3) is the most
stable solid phase of the system (provided the pressure is
not too low). However, a closer look at the DF profiles
obtained from the simulation of BCC111(3) raises some
doubts about the absolute stability of this phase at in-
termediate temperatures, whatever the pressure, calling
for a different interpretation of the hitherto considered
as BCC111(3) MC data. Take, for instance, the case of
P = 0.05. Upon increasing temperature, while g⊥ keeps
strongly peaked all the way to melting, the solid-like os-
cillations of g‖ undergo progressive damping until they
are washed out completely, suggesting a second-order
(or very weak first-order at the most) transformation of
BCC111(3) into a columnar phase before melting. This
is illustrated in Figs. 6 and 7, where the DFs are plotted
for a number of temperatures. A similar indication is got
from the behavior of the smectic OP, see Fig. 8, whose
highest maximum eventually deflates at practically the
same temperature, T ∗ ≈ 0.005, at which the oscillations
of g‖ disappear. Note that no appearance of a columnar
phase is seen during the simulation of either BCC110(3)
or BCC001(3), nor in the simulation of FCC001(3) for
P ∗ = 0.01. A slice of the columnar phase is depicted in
Fig. 9 (right panels). In this phase, columns of stacked
particles are arranged side by side, tightly packed to-
gether so as to project a triangular solid on the x-y plane.
Neighboring columns are not commensurate with each
other, as implied by a completely featureless g‖.
The probable reason for the instability of the smectic
phase in the GCNmodel is the absence of an ad hocmech-
anism for lateral attraction between the molecules, which
is present instead in the model of Ref. [14]. By the way,
hard ellipsoids do not show a smectic phase either [7], at
variance with (long) hard spherocylinders where particle
geometry alone proves sufficient to stabilize a periodic
modulation of the number density along ẑ [10].
Given the compelling evidence of a columnar phase
in the GCN model, one may now ask whether the con-
clusions drawn from the chemical-potential data are all
flawed. In particular, the µ curves that are tagged as
BCC111(3) in Figs. 3 to 5 would be meaningless beyond
a certain temperature Tc < Tm. In fact they are not, i.e.,
they retain full validity up to melting since the (nearly)
continuous character of the transition from BCC111(3)
to columnar allows one to safely continuate thermody-
namic integration across the boundary, with the proviso
that what previously treated as the BCC111(3) chemi-
cal potential beyond Tc is to be assigned instead to the
columnar phase.
As pressure goes up, the transition from BCC111(3) to
columnar takes place at lower and lower temperatures.
In order to exclude that the columnar phase too, like-
wise the fluid, will show reentrant behavior at low pres-
sure, we have simulated the disordering of a BCC111(3)
solid also for P ∗ = 0.02 and 0.03 (in fact, no reentrance
of the columnar phase is observed). Further points on
the melting line for P = 0.01, 0.02, and 0.03 are fixed
through the behavior of g⊥ as a function of temperature.
All in all, the overall GCN phase diagram appears as
sketched in Fig. 10. This is similar to the phase portrait
of the Gaussian-core model, see Fig. 1 of Ref. [4], with
the obvious exception of the columnar phase. There is a
small discrepancy between the melting points as located
through free-energy calculations (full dots in Fig. 10) and
those assessed from the evolution of g⊥ (open dots). In
our opinion, this would mostly be attributed to the sta-
tistical error associated with the µex of the fluid in its
reference state. Notwithstanding their limited precision,
however, free-energy calculations are all but useless in
identifying the structure of the solid phase. In conclu-
sion, although some aspects of the equilibrium behavior
of the GCN model remain still uncertain, especially with
regard to the exact location of the solid-solid transition
at low pressure, we are confident that the main features
of the GCN phase diagram are correctly accounted for
by Fig. 10.
Summing up, there are at least two conceivable and
mutually exclusive paths for the thermal disordering of
a liquid-crystal solid (aside from a direct transformation
of it into a nematic phase). One is through the forma-
tion of a smectic phase, which eventually transforms into
a nematic fluid. A second possibility is a more gradual
release of crystalline order by the appearance of a colum-
nar phase as intermediate stage between the solid and the
nematic phase. Our study showed that it is this second
scenario that occurs in the GCN model, with no evidence
whatsoever of a smectic phase.
CONCLUSIONS
We have introduced a liquid-crystal model of softly-
repulsive parallel ellipsoids, named the Gaussian-core ne-
matic (GCN) model, aiming at a complete characteriza-
tion of its phase behavior, including the solid sector. This
requires a preliminary identification of all relevant solid
structures, which is generally a far-from-trivial task to be
accomplished for model liquid crystals [16]. Through a
careful scrutiny of as many as eleven uniaxially-deformed
cubic and hexagonal phases, we obtained a thorough de-
scription of the T = 0 equilibrium phase portrait of the
GCN model, identifying its ground state at any given
pressure. In doing so, we discovered a rich and absolutely
unexpected structural degeneracy, which is only lifted by
going to T > 0. At low temperature, and for not too
low pressures, our free-energy calculations indicate that a
GCN system with an aspect ratio of 3 is found in just one
solid phase, i.e., a stretched BCC solid with the molecules
oriented along [111]. Only near zero pressure, the stable
phase becomes a stretched FCC solid. With increasing
temperature, the BCC-type solid first undergoes a weak
transition into a columnar phase, which still retains par-
tial crystalline order, before melting completely into the
nematic fluid.
It is worth emphasizing that our interest in the GCN
model is purely theoretical, hard-core ellipsoids provid-
ing a more physically realistic model liquid crystal. One
could even argue that a Gaussian repulsion is highly irre-
alistic for a liquid crystal. In real atomic systems, super-
position of particle cores is strongly obstructed, whence
the consideration of hard-core or steep inverse-power re-
pulsion in the more popular models. However, unless the
system density is very high, higher than considered in
our study, repulsive Gaussian particles would effectively
be blind to an inner hard core, which thus may or may
not exist, as evidenced e.g. in the snapshots of Fig. 9
where particles appear well spaced out.
The GCN model is a “deformation” of Stillinger’s
Gaussian-core model, well known for exhibiting a
reentrant-melting transition. Various instances of reen-
trant behavior are also known for nematics [17] and in-
deed one of the original motivations for the present work
was searching for a new kind of reentrance, i.e., re-
appearance of a more disordered phase with increasing
pressure. With this study, we provide yet another exam-
ple of reentrant behavior in a model nematic: While this
is nothing but the analog of fluid-phase reentrance in the
Gaussian-core model, the absolute novelty of our findings
is in the nature of the intermediate phase, this being sur-
prisingly columnar in a range of pressures rather than
genuinely solid.
∗ Electronic address: Santi.Prestipino@unime.it
† Electronic address: saija@me.cnr.it
[1] D. Frenkel and A. J. C. Ladd, J. Chem. Phys. 81, 3188
(1984); see also J. M. Polson, E. Trizac, S. Pronk, and
D. Frenkel, J. Chem. Phys. 112, 5339 (2000).
[2] F. Saija and S. Prestipino, Phys. Rev. B 72, 024113
(2005).
[3] S. Prestipino, F. Saija, and P. V. Giaquinta, Phys. Rev.
E 71, 050102(R) (2005).
[4] S. Prestipino, F. Saija, and P. V. Giaquinta, J. Chem.
Phys. 123, 144110 (2005).
[5] F. H. Stillinger, J. Chem. Phys., 65, 3968 (1976).
[6] A. Lang, C. N. Likos, M. Watzlawek, and H. Löwen, J.
Phys.: Condens. Matter, 12, 5087 (2000).
[7] D. Frenkel, B. M. Mulder, and J. P. McTague, Phys. Rev.
Lett. 52, 287 (1984).
[8] A. Stroobants, H. N. W. Lekkerkerker, and D. Frenkel,
Phys. Rev. A 36, 2929 (1987).
[9] J. A. C. Veerman and D. Frenkel, Phys. Rev. A 41, 3237
(1990); ibidem, 43, 4334 (1991).
[10] P. Bolhuis and D. Frenkel, J. Chem. Phys. 106, 666
(1997).
[11] C. Vega, E. P. A. Paras, and P. A. Monson, J. Chem.
Phys. 96, 9060 (1992); ibidem, 97, 8543 (1992).
[12] P. Pasini and C. Zannoni eds., Advances in the Computer
Simulations of Liquid Crystals (NATO-ASI Series, 1998).
[13] S. Singh, Phys. Rep. 324, 107 (2000).
[14] E. de Miguel and E. Martin del Rio, Phys. Rev. Lett. 95,
217802 (2005).
[15] B. Widom, J. Chem. Phys. 39, 2808 (1963).
[16] After completion of this paper, we became aware of
the discovery, reported in P. Pfleiderer and T. Schilling,
cond-mat/0612151, of a new stable crystal phase in
freely-standing hard ellipsoids. This further demonstrates
that the solid structure of liquid crystals is generally dif-
ficult to anticipate, even when the model system is the
simplest as possible.
[17] The first example of such behavior was discovered by
P. E. Cladis, Phys. Rev. Lett. 35, 48 (1975); see also
Ref. [14] and references therein.
mailto:Santi.Prestipino@unime.it
mailto:saija@me.cnr.it
TABLE I: GCN model for L/D = 3: T = 0 chemical poten-
tial µ(P ) for eleven different solids and two values of P ∗, 0.05
and 0.20. Nx, Ny , Nz are the number of lattice points along
the three spatial directions, ρ = NxNyNz/V is the density,
and α is the stretching ratio (for the SH111 lattice, α is the
so-called c/a ratio). Nx, Ny , Nz have been chosen so large
that the rounding-off error on the total potential energy per
particle, U/N , due to the finite lattice size is negligible. The
numerical precision on ρ and α is of one unit on the last deci-
mal digit. Looking at the Table, the most stable structures at
both pressures are five degenerate crystals, actually belonging
to three distinct types which are exemplified by BCC001(3)
(equivalent to FCC001(2.12) up to a dilation), BCC110(3),
and BCC111(3) (equivalent to SC111(1.5)) – within brackets
is the value of α.
crystal Nx, Ny , Nz ρ(0.05) α(0.05) µ(0.05) ρ(0.20) α(0.20) µ(0.20)
FCC001 10,20,10 0.086 2.12 0.855724 0.157 2.12 2.093695
BCC001 14,14,10 0.086 3.00 0.855724 0.157 3.00 2.093695
SC001 20,20,8 0.086 3.00 0.881586 0.158 3.00 2.105241
FCC110 16,12,12 0.086 3.00 0.856391 0.157 3.00 2.094368
BCC110 10,28,8 0.086 3.00 0.855724 0.157 3.00 2.093695
SC110 14,18,10 0.086 3.00 0.881586 0.158 3.00 2.105241
FCC111 16,18,9 0.086 3.00 0.856391 0.157 3.00 2.094368
BCC111 12,12,18 0.086 3.00 0.855724 0.157 3.00 2.093695
SC111 12,12,18 0.086 1.50 0.855724 0.157 1.50 2.093695
HCP111 18,20,10 0.086 3.00 0.856429 0.157 3.02 2.094474
SH111 18,20,9 0.086 2.75 0.870014 0.158 2.69 2.099565
FIG. 1: T = 0 equilibrium behavior of the GCN model with
L/D = 3. Left: T = 0 chemical potential µ(P ∗) of var-
ious crystals relative to BCC110(3), which thus serves as
the zero or reference level. The reduced pressure P ∗ is in-
cremented by steps of 0.01. Note that, for all P , the five
crystals FCC001(2.12), BCC001(3), BCC110(3), BCC111(3),
and SC111(1.5) are degenerate (∆µ = 0). Other data points
are for FCC001 (continuous line; α = 3 for P ∗ = 0.01, be-
ing α = 2.12 otherwise), FCC110(3) and FCC111(3) (dotted
line), HCP111 (open dots), SH111 (open squares), SC001(3)
and SC110(3) (dashed line). Right: Resulting equation of
state in the pressure range from 0 to 0.30. FCC001(3) (open
triangle) is stable at very low pressure, up to slightly less
than 0.02, while FCC001(2.12), BCC001(3), etc. (open dots)
prevail for higher pressures.
FIG. 2: GCN model with L/D = 3, chemical-potential results
for T ∗ = 0.002. In the picture, we plot the reduced chemical
potential of the three T = 0 degenerate structures that exist
for not too low pressure, taking BCC111(3) for reference. The
latter phase gives the most stable solid for any P in the range
from 0.05 to 0.20 (and, most likely, even further). The µ
curves are obtained by thermodynamic integration of volume
MC data, using as initial conditions those specified by the
Frenkel-Ladd calculations that were carried out at P ∗ = 0.05.
Though the reported µ values for the BCC-type solids are very
close to each other and also affected by some numerical noise,
the higher stability of BCC111(3) cannot be truly called into
question – a regular pattern is clearly seen behind each curve.
FIG. 3: GCN model with L/D = 3, chemical-potential
results for P ∗ = 0.05: Chemical potential of the fluid
phase (dotted line) as compared with those of the competing
solid phases for that pressure (BCC001(3), long-dashed line;
BCC110(3), dashed line; BCC111(3), continuous line). While
the BCC111(3) solid is stable at low temperature, the fluid
phase overcomes it in stability for higher temperatures. This
is more clearly seen in the inset, where chemical-potential
differences are reported, taking the fluid µ for reference. The
melting temperature for P ∗ = 0.05, which is where the con-
tinuous line crosses zero, is estimated to be T ∗ ≃ 0.0073.
FIG. 4: GCN model with L/D = 3, chemical-potential results
for P ∗ = 0.12. Same notation as in Fig. 3, except for the
absence of data for BCC001(3), which were not computed.
Despite this, a look at the results in Figs. 2 and 3 give us
confidence that the chemical potential of BCC001(3) will be
closer to that of BCC110(3) than is for P ∗ = 0.05.
FIG. 5: GCN model with L/D = 3, chemical-potential results
for P ∗ = 0.20. Same notation as in Figs. 3 and 4.
FIG. 6: GCN model with L/D = 3, distribution functions of
BCC111(3) for P ∗ = 0.05. Left: T ∗ = 0.002. Right: T ∗ =
0.003. The strenght of crystalline order along ẑ, as measured
by the amplitude of g‖ oscillations, reduces with increasing
temperature, until complete disorder is left above T ∗ ≃ 0.005
(see next Fig. 7). Considering that the crystallinity within
the x-y plane persists well beyond T ∗ = 0.005 (the spatial
modulation of g⊥ remains solid-like beyond this temperature
and up to melting), we conclude that the GCN system is found
in a columnar phase for 0.005 < T < Tm.
FIG. 7: GCN model with L/D = 3, distribution functions
of BCC111(3) for P ∗ = 0.05. Left: T ∗ = 0.004. Right:
T ∗ = 0.005.
FIG. 8: GCN model with L/D = 3, smectic order parame-
ter τ (λ) of BCC111(3) for P ∗ = 0.05. The behavior of τ (λ)
faithfully reproduces that seen for g‖(z) (cf. Figs. 6 and 7):
The deflating of the highest τ (λ) maximum with increasing
temperature closely follows the thermal damping of g‖(z) os-
cillations.
FIG. 9: GCN model with L/D = 3, some snapshots of the
particle configuration taken at low temperature (T ∗ = 0.002,
BCC111(3) solid phase) and at intermediate temperature
(T ∗ = 0.006, columnar phase). The reduced pressure is
P ∗ = 0.05 in both cases. Above: side view, i.e., projection
of particle coordinates onto the x-z plane. Below: top view,
i.e., projection of particle coordinates onto the x-y plane. For
clarity, in spite of their mutual interaction being soft, the par-
ticles are given sharp ellipsoidal boundaries, corresponding to
a unitary short axis (D) and a long axis of L = 3D. While the
crystalline order along z is lost already at T ∗ = 0.005 (hence,
it is there in the top-left panel while it is absent in the top-
right panel), the triangular order within the x-y plane is main-
tained up to the melting temperature (here, Tm ≃ 0.0073).
FIG. 10: GCN model with L/D = 3, sketch of the phase
diagram on the T -P plane. The full dots mark the location
of the melting transition as extracted from our free-energy
calculations. Open symbols refer instead to the transition
thresholds as given by a visual inspection of the DF profiles.
Though the latter melting-point estimates are more easily ob-
tained than the former, the free-energy study was essential to
identify the correct solid structure of the GCN model at not
too low pressure. To help the eye, tentative phase bound-
aries are drawn as continuous (i.e., first-order) and dashed
(nearly second-order) lines through the transition points. In
the low-pressure region, the solid-solid boundary is highly hy-
pothetical since we have no data there.
ABSTRACT
  We study a simple model of a nematic liquid crystal made of parallel
ellipsoidal particles interacting via a repulsive Gaussian law. After
identifying the relevant solid phases of the system through a careful
zero-temperature scrutiny of as many as eleven candidate crystal structures, we
determine the melting temperature for various pressure values, also with the
help of exact free energy calculations. Among the prominent features of this
model are pressure-driven reentrant melting and the stabilization of a columnar
phase for intermediate temperatures.

<|endoftext|><|startoftext|>
High-spin to low-spin and orbital polarization transitions in multiorbital Mott systems
Philipp Werner and Andrew J. Millis
Columbia University, 538 West, 120th Street, New York, NY 10027, USA
(Dated: June 30, 2007)
We study the interplay of crystal field splitting and Hund coupling in a two-orbital model which
captures the essential physics of systems with two electrons or holes in the eg shell. We use single site
dynamical mean field theory with a recently developed impurity solver which is able to access strong
couplings and low temperatures. The fillings of the orbitals and the location of phase boundaries
are computed as a function of Coulomb repulsion, exchange coupling and crystal field splitting. We
find that the Hund coupling can drive the system into a novel Mott insulating phase with vanishing
orbital susceptibility. Away from half-filling, the crystal field splitting can induce an orbital selective
Mott state.
PACS numbers: 71.10.Fd, 71.10.Fd, 71.28.+d, 71.30.+h
The Mott metal-insulator transition plays a fundamen-
tal role in electronic condensed matter physics [1]. Much
attention has focused on the one-orbital case, in part be-
cause of its presumed relevance to high temperature su-
perconductivity [2] and in part because appropriate the-
oretical tools for the multiorbital case have until recently
not been available. In most Mott systems, however, more
than one orbital is relevant [3] and the redistribution of
electrons among different orbitals leads to new phenom-
ena such as orbital ordering or “orbital selective” Mott
transitions. Recent studies of nickelates [4], titanates [5],
cobaltates [6], manganates [7], vanadates [8, 9, 10] and
ruthenates [3, 11, 12, 13] have focused interest on the in-
terplay between the Mott metal-insulator transition and
orbital degeneracy. A fundamental question in this field,
relevant in particular to the issue of lattice distortions
in strongly correlated materials, is the response of multi-
orbital systems to a perturbation which breaks the orbital
degeneracy. In this paper, we show that a two orbital
model with Hund coupling and crystal field splitting ex-
hibits two fundamentally different Mott phases, one char-
acterized by a vanishing orbital susceptibility, and one
adiabatically connected to the band insulating state. We
characterize these phases in terms of the atomic ground
states.
Multiorbital models are more difficult to study both
because of the larger number of degrees of freedom,
and because the physically important exchange and
pair-hopping terms are not easy to treat by standard
Hubbard-Stratonovich methods [14]. Weak coupling ap-
proaches [12] have been used to show that exchange and
pair hopping interactions act to suppress the response to
a crystal field splitting, and some authors have studied
the model without exchange and pair hopping terms [9],
but a reliable extension of these results to physically rel-
evant Slater-Kanamori interactions and the strong cou-
pling regime has been lacking.
Dynamical mean field theory (DMFT) provides a non-
perturbative and computationally tractable framework to
study correlation effects and has allowed insights into
the Mott metal-insulator transition [15]. In its sin-
gle site version, DMFT ignores the momentum depen-
dence of the self-energy and reduces the original lat-
tice problem to the self-consistent solution of a quan-
tum impurity model given by the Hamiltonian HQI =
Hloc + Hhyb + Hbath. For multi-orbital models Hloc =∑
m ǫmc
j,k,l,m U
jklmc
l cm, where m = (i, σ)
denotes both orbital and spin indices, and U jklm some
general four-fermion interaction. Hhyb and Hbath are
the impurity-bath mixing and bath Hamiltonians, respec-
tively. While the DMFT approximation simplifies the
problem enormously (replacing a 3 + 1 dimensional field
theory by a quantum impurity model plus a self consis-
tency condition), the extra complications associated with
exchange couplings in multiorbital systems have until re-
cently prohibited extensive numerical work. Interesting
progress has been made using a finite temperature exact
diagonalization technique [6, 13], but this approach re-
quires a truncation of Hbath to a small number of levels.
In Refs. [16, 17] we have introduced a continuous-time
impurity solver which can handle the general interactions
in Hloc. The method, which is free from systematic er-
rors, is based on a diagrammatic expansion of the parti-
tion function in the impurity-bath hybridization Hhyb.
Here, we employ this solver to study the physically
relevant case in which the number of electrons matches
the number of orbitals. The local Hamiltonian is
Hloc = −
α=1,2
µnα,σ +
∆(n1,σ − n2,σ)
α=1,2
Unα,↑nα,↓ +
U ′n1,σn2,−σ
(U ′ − J)n1,σn2,σ
− J(ψ†
2,↑ψ2,↓ψ1,↑ + ψ
2,↓ψ1,↑ψ1,↓ + h.c.), (1)
with µ the chemical potential, ∆ the crystal field split-
ting, U the intra-orbital and U ′ the inter-orbital Coulomb
interaction, and J the coefficient of the Hund coupling.
We adopt the conventional choice of parameters, U ′ =
http://arxiv.org/abs/0704.0057v2
 0  1  2  3  4  5  6  7  8  9
∆/t=0.2
∆/t=0.6
∆/t=1
FIG. 1: Filling of orbital 1 as a function of U for ∆/t =
0.2, 0.6, 1 and several values of J/U . The different curves
for given ∆ correspond (from bottom to top) to J/U = 0,
(0.01), (0.02), 0.05, 0.1, 0.15, 0.25, respectively. Open (full)
symbols correspond to metallic (insulating) solutions. The
metal-insulator transition is characterized by a jump in filling
and a coexistence region where both insulating and metallic
solutions exist. Our data show the region of stability of the
metallic phase.
U − 2J , which follows from symmetry considerations for
d-orbitals in free space and is also assumed to hold in
solids. With this choice the Hamiltonian (1) is rotation-
ally invariant in orbital space and the condition for half-
filling becomes µ = µ1/2 ≡ 32U −
J . In the DMFT self-
consistency loop we use a semi-circular density of states
of bandwith 4t (Bethe lattice). The temperature, unless
otherwise noted, is T/t = 0.02 and we suppress magnetic
order by averaging over spin up and down in each orbital.
No sign problem is encountered in the simulations.
The main result is shown in Fig. 1, which for several
values of ∆ and J/U plots the filling per spin, n1, of or-
bital 1 as a function of interaction strength. The half
filling, non-magnetic condition implies n2 = 1 − n1. In
the T → 0 limit, three phases are found: a metallic phase
(which may have any value of n1 between 0 and 0.5), an
orbitally polarized insulator favored by large ∆ and small
J , and a Mott insulator (with n1 = 0.5 = n2) favored
by large U , small ∆ and large J . If U is increased from
zero to a small value, the orbital splitting either increases
(small J/U) or decreases (large J/U), consistent with the
findings of Ref. [12]. As interaction strength is further in-
creased, one of several things may happen: at very small
J/U , n1 continues to decrease, and the system eventually
undergoes a transition to an orbitally polarized insulator
(for large ∆ essentially a band insulator). For somewhat
larger J/U , the occupancy n1, after an initial decrease,
goes through a minimum and begins to increase. At even
stronger interactions, one then observes a transition ei-
ther to an orbitally polarized insulator (where n1 may
take a range of values) or into a special type of insulator
 0  0.2  0.4  0.6  0.8  1
J/U=0
0.05 0.1
metal
insulator
 0  0.5  1  1.5  2  2.5  3  3.5
J/U=0 J/U=0.25
Mott insulator
(spin triplet for J/U=0.25)
metal
orbitally polarized
insulator
FIG. 2: Phase diagram in the plane of crystal field splitting ∆
and intraorbital Coulomb repulsion U for indicated values of
J/U . For J = 0 the phase boundary is a monotonic function
of ∆, whereas for J/U > 0 it peaks near ∆ =
2J (indicated
by the dotted lines). The insulating state in the region ∆ .√
2J is characterized by a vanishing orbital susceptibility.
with n1 = 0.5.
Figure 2 shows the metal-insulator phase diagram in
the space of crystal field splitting and Coulomb repul-
sion for several values of J/U . In the absence of a crystal
field splitting (∆ = 0), we observe a metal-insulator tran-
sition at a strongly J-dependent critical U . This finding
is consistent with data presented in Ref. [18]. As ∆ is
increased, the critical U changes. For J = 0 and fixed
U , n1 decreases until the band is emptied and a metal-
insulator transition occurs. The monotonic decrease of
the critical U with ∆ at J = 0 is a special case. For
J > 0, the first effect of a small ∆ is to stabilize the
metallic phase. Then, at larger ∆, a reentrant insulating
phase occurs. We shall show below that this behavior
arises from the unusual nature of the insulating state at
J > 0 and small ∆, which is characterized at T = 0 by a
strictly vanishing orbital susceptibility. If ∆ is increased
at large U , this state makes a transition to an orbitally
polarized insulator at ∆ ≈
2J . We therefore plot in
Fig. 2 the curves ∆ =
2J as dotted lines, and sug-
gest that they correspond to the T = 0 phase boundary
 0.05
 0.15
 0.25
 0.35
 0.45
 0  0.2  0.4  0.6  0.8  1  1.2
J/U=0.1
 0  0.05  0.1  0.15  0.2
J/U=0
J/U=0.002
J/U=0.005
J/U=0.010
J/U=0.020
FIG. 3: Filling of orbital 1 as a function of ∆ for fixed U
and indicated values of J/U . Top panel: U/t = 6. Open
(full) symbols correspond to metallic (insulating) solutions.
Bottom panel: U/t = 9. Here, all solutions are insulating.
For crystal field splittings smaller than ∆c =
2J (indicated
by a vertical line) the orbital susceptibility in the T → 0 limit
is completely suppressed. Solid lines are for βt = 50, dotted
lines show results for βt = 12.5, 25 and 100, respectively.
between two distinct insulating states.
Figure 3 plots the filling of orbital 1 as a function of
crystal field splitting for fixed U/t and several values of
J/U . The leftmost curve in the upper panel shows the
density variation for J/U = 0.02. At ∆ = 0, the model
is metallic. The rapid variation of n1 with ∆ reflects the
large, but finite orbital susceptibility of the metal, which
for this small value of J is strongly enhanced by U . At
∆/t ≈ 0.325 >
2J , an apparently first order transition
occurs to the orbitally polarized insulating state, which
then evolves smoothly (as ∆ is increased) to the band
insulator (n1 → 0). The two larger J values reveal a
different behavior. For ∆ < ∆c ≈
2J the insulating
state is characterized by an orbital occupancy which is
independent of crystal field splitting. Then, an appar-
ently discontinuous transition occurs to a metallic state
with a large orbital susceptibility, which at even larger ∆
exhibits a first order transition to the orbitally polarized
insulating state. The lower panel of Fig. 3 shows the be-
 2  4  6  8  10  12  14  16
basis state
∆/t=0.3
∆/t=0.5
∆/t=0.7
FIG. 4: Weight of the different eigenstates of Hloc for U/t =
6, J/U = 0.05 and ∆/t = 0.3, 0.5 and 0.7. The smallest
crystal field splitting corresponds to an insulating state with
suppressed orbital susceptibility, the intermediate value to a
metallic state and the largest splitting to a “band insulator”
(see Fig. 3).
havior for larger U , where the model is always insulating.
Our data for J = 0 exhibit a rapid variation of n1 with
∆. The slope is set by the inverse of the Kugel-Khomskii
superexchange ∼ t2/U ; thermal effects are unimportant
at βt = 50. For J > 0 and small ∆, the model is insu-
lating, with a vanishing orbital susceptibility, then (near
2J) makes a transition to the orbitally polarized
insulating phase with a differential susceptibility ∂n1/∂∆
determined by Kugel-Khomskii physics. Note that the
transition between the two insulators is sharp only at
T = 0; for T > 0 a rapid (but smooth) crossover occurs.
To gain insight into these phenomena, we look at the
contribution to the partition function from the differ-
ent eigenstates of the local Hamiltonian. Hloc has 16
eigenstates, which we number essentially as in Table II
of Ref. [17]. For the following discussion it is important to
note that |6〉, |7〉 and |8〉 are the three spin triplet states
(with energy U − 3J − 2µ), while |10〉 and |11〉 are lin-
ear combinations of the states |↑↓, 0〉 and |0,↑↓〉 with two
electrons in one orbital and none in the other. The latter
two states are coupled by the pair hopping and affected
by the crystal field splitting. Here, we choose them to
be eigenstates of Hloc corresponding to the eigenenergies
J2 + 4∆2 − 2µ: (1 + α2±)−1/2(| ↑↓, 0〉+ α±|0,↑↓〉),
α± = ±(
J2 + 4∆2 ∓ 2∆)/J . In particular, we choose
|10〉 to be the eigenstate with lower energy.
Figure 4 shows the weights of these states for the three
phases found at U/t = 6, J/U = 0.05 (see Fig. 3).
In the small-∆ phase, the triplet states are occupied,
with small excursions into states with occupancy 1 or
3. We therefore call this phase the triplet Mott insu-
lator. The triplet states of course have one electron in
each orbital and gain no energy from orbital polarization
(the remarkable fact is that this feature is preserved af-
 0.55
 0.65
 0.75
 3.5  4  4.5  5  5.5  6
orbital 1
orbital 2
FIG. 5: Filling n1(µ), n2(µ) for U/t = 4, J/U = 0.25 and
∆/t = 0.4. Full (open) symbols correspond to insulating
(metallic) solutions. At half-filling (µ/t = 3.5), the system
is in a triplet Mott insulating state, for 3.9 . µ/t . 4.6 in
an orbital selective Mott state, and for µ/t & 4.6 metallic in
both bands.
ter coupling to the lattice). In the metallic phase, a large
number of states is visited, while in the orbitally polar-
ized insulator, the dominant local state (whose weight
increases continuously with ∆) is a singlet (|10〉). The
triplet states are almost completely suppressed in the or-
bitally polarized phase. The large-U insulator-insulator
transition exhibits the same features, but without the
intermediate metallic phase, and is therefore also a tran-
sition between high and low spin states. Comparison of
the eigenenergies of the spin triplet states and |10〉 show
that these levels cross at ∆ =
2J . Thus, the transition
from triplet Mott insulator to orbitally polarized insula-
tor occurs at ∆c =
2J , consistent with our numerical
data. We also note that the wave-function of state |10〉
depends on the ratio J/∆, leading in the large-∆ limit
to n1(∆) ≈ (J/4∆)2. In the low spin phase, the or-
bital susceptibility has therefore two contributions: one
originating from Kugel-Khomskii physics and one of or-
der J2/∆3 from Hloc. The latter explains the roundings
seen in the right most curve of Fig. 3.
We briefly address the issue of the orbital selective
Mott transition, which provides a mechanism for local
moment formation in correlated materials, and has been
the subject of much recent debate [11]. Previous studies
focused on two-orbital models with different band-widths
and integer number of electrons. We find that in the
presence of a crystal field splitting, shifting the chemical
potential can drive the system into an orbital selective
Mott state, even if the band-widths are equal. Figure 5
shows the filling per spin in both orbitals as a function
of µ, for U/t = 4, J/U = 0.25 and ∆/t = 0.4. Doping
occurs first in one of the bands, leaving the other in a
Mott state with a magnetic moment. Further change of
the chemical potential drives the second band metallic.
In conclusion, we have shown that multiorbital impu-
rity models with realistic couplings can be efficiently sim-
ulated with the method of Ref. [17]. We have presented
numerical evidence, based on single site DMFT calcula-
tions, for the existence of two distinct Mott insulating
phases in a half-filled two-orbital model with Hund cou-
pling and crystal field splitting. At strong interactions
and J <
2∆, the system, in the T → 0 limit, is in
a phase characterized by a vanishing orbital susceptibil-
ity, and a spin 1 moment on each site. For J >
an orbitally polarized insulator is found. The exchange
terms promote insulating behavior at ∆ = 0 but can
stabilize a metallic phase at values of ∆ for which the
non-interacting model is a band-insulator.
It is interesting to compare our results to recent work
on the bilayer Hubbard model [19, 20]. The model
which these authors study is equivalent to our model with
U = U ′ = J , and ∆ replaced by the interlayer hopping.
In the low energy sector of this model, only four states
(essentially our three triplets and the pair hopping state
|10〉) are relevant, and what these authors describe as the
Mott insulator to band insulator crossover corresponds
to our transition (apparently sharp at T = 0) between
triplet Mott insulator and orbitally polarized insulator.
The existence of two distinct insulating phases raises
many interesting questions including the theory of an
insulator with strictly vanishing orbital susceptibility
(which should exhibit an orbital gauge symmetry) and
the nature and properties of the different metal-insulator
transitions. The physics near the “triple point” remains
to be studied. Our results away from half-filling suggest
that lightly doped La2NiO4 is in an orbitally selective
Mott phase.
The calculations have been performed on the Hreidar
beowulf cluster at ETH Zürich, using the ALPS-library
[21]. We thank M. Troyer for the generous allocation of
computer time, A. Georges and A. Poteryaev for stimu-
lating discussions and NSF-DMR-040135 for support.
[1] M. Imada, A. Fujimori and Y. Tokura, Rev. Mod. Phys.
70, 1039 (1998).
[2] P. W. Anderson, Science 235, 1196 (1987).
[3] Y. Tokura and N. Nagaosa, Science 288, 462 (2000).
[4] J. Kunes et al., Phys. Rev. B 75, 165115 (2007)
[5] C. Ulrich et al., Phys. Rev. Lett. 97, 157401 (2006).
[6] H. Ishida, M. D. Johannes, and A. Liebsch, Phys. Rev.
Lett. 94, 196401 (2005); A. Liebsch and H. Ishida,
arXiv:0705.3627.
[7] A. Yamasaki et al., Phys. Rev. Lett. 96, 166401 (2006).
[8] S. Biermann et al., Phys. Rev. Lett. 94, 026404 (2005).
[9] F. Lechermann, S. Biermann and A. Georges, Phys. Rev.
Lett. 94, 166402 (2005).
[10] T. Yoshida et al., Phys. Rev. Lett. 95, 146404 (2005).
[11] A. Liebsch, Phys. Rev. Lett. 91, 226401 (2003); A. Koga
et al., Phys. Rev. Lett. 92, 216402 (2004).
http://arxiv.org/abs/0705.3627
[12] S. Okamoto and A. J. Millis, Phys. Rev. B 70, 195120
(2004).
[13] A. Liebsch and H. Ishida, Phys. Rev. Lett. 98, 216403
(2007).
[14] S. Sakai, R. Arita, K. Held, and H. Aoki, Phys. Rev. B
74, 155102 (2006).
[15] A. Georges, G. Kotliar, W. Krauth and M. J. Rozenberg,
Rev. Mod. Phys. 68, 13 (1996).
[16] P. Werner et al., Phys. Rev. Lett. 97, 076405 (2006).
[17] P. Werner and A. J. Millis, Phys. Rev. B 74, 155107
(2006).
[18] A. Koga, Y. Imai and N. Kawakami, Phys. Rev. B 66,
165107 (2002).
[19] A. Fuhrmann, D. Heilmann and H. Monien, Phys. Rev.
B 73 245118 (2006).
[20] S. S. Kancharla and S. Okamoto, cond-mat/0703728.
[21] M. Troyer et al., Lecture Notes in Computer Science
1505, 191 (1998); F. Alet et al., J. Phys. Soc. Jpn. Suppl.
74, 30 (2005); http://alps.comp-phys.org/ .
http://arxiv.org/abs/cond-mat/0703728
http://alps.comp-phys.org/
ABSTRACT
  We study the interplay of crystal field splitting and Hund coupling in a
two-orbital model which captures the essential physics of systems with two
electrons or holes in the e_g shell. We use single site dynamical mean field
theory with a recently developed impurity solver which is able to access strong
couplings and low temperatures. The fillings of the orbitals and the location
of phase boundaries are computed as a function of Coulomb repulsion, exchange
coupling and crystal field splitting. We find that the Hund coupling can drive
the system into a novel Mott insulating phase with vanishing orbital
susceptibility. Away from half-filling, the crystal field splitting can induce
an orbital selective Mott state.

<|endoftext|><|startoftext|>
Introduction
Martin Rees is fond of arguing, absence of evidence is not evidence of absence. How could anyone
disagree? But on the question of the existence of extraterrestrial intelligent life, we have an undeniable fact:
they aren’t here. That is, extraterrestrial intelligent beings are not obviously present on our planet, or in
our solar system. I think even Martin will agree with this! But I claim this fact allows us to conclude that
extraterrestrial intelligence (ETI) is absence from our Galaxy and from the Local Group of galaxies. In other
words, if they existed, theyd be here!
This argument has often been called the Fermi Paradox. I think it is analogous to Olbers’ Paradox in
cosmology, which uses an equally obvious fact, known to all of us — the fact that the sky is dark at night
— to conclude that the universe must have evolved to its present state. The universe cannot have been
the same as it appears now for all eternity. I shall outline in Section 2 the reasons that the absence of ETI
on Earth allows us to conclude that they don’t exist in our galactic neighborhood. I have developed this
argument is much more detail elsewhere, addressing all counter-arguments that have been proposed. So I
shall only outline my argument in Section 2. I shall also only outline the evolutionary argument against ETI
here. Mayr, Dobzhanski, Simpson and Ayala have defended this position at length over the past 40 years,
and I’m sure this argument is quite familiar to the readers of this journal. What I want to develop in this
paper is a new argument against the existence of ETI.
I shall call it the Limited Resources Argument. It is related to the Fermi Paradox in that it assumes
that an intelligent life form will inevitability expand off its planet of origin and once this expansion begins, it
will never stop. But if intelligent life were common in the cosmos, the expansion of technological civilization
would use up resources so fast that intelligent life would die out. If intelligent life is rare, the speed of light
barrier will prevent life from using up the resources too fast.
The immediate reaction to this argument is, so what if intelligent life uses up the resources too fast and
dies out? Do we have any reason for believing that intelligent has some guarantee for survival that other
species do not? Most species that have evolved are now extinct, and have left no descendants. Why should
Homo sapiens be any different? There is no evidence from evolutionary biology that intelligence should
survive indefinitely.
But there is evidence from physics for the importance of intelligence life in cosmology. Not of course in
the current phase of universal history, but instead near the end of the universe.
http://arXiv.org/abs/0704.0058v1
II. Why Intelligent Life Must Be Rare
A. The Improbable Evolution Argument
The argument against ETI that most readers of this journal will be familiar with goes back to Alfred
Russell Wallace, and has more recently been defended by such major evolutionists as George Gaylord Simp-
son, Theodosius Dobzhanski, and Ernst Mayr. These scientists point out that according to the Modern
Synthesis, evolution has no knowledge of goals. Instead, natural selection acts on random mutations, muta-
tions which never appear with the intent of achieving a goal in the distant future. There are an enormous
number of evolutionary pathways, and so few of these lead to intelligent life, that it is unlikely intelligent
life will appear more than once in the visible universe, which is the part of the universe within 13.7 billion
light years. The universe is observed to be 13.7 billion years old, and so we cannot see out a distance greater
than 13.7 billion light years, the distance light could have traveled in that time. (Actually, we can see out
a bit further than 13.7 billion light years because of the expansion of the universe, but let me ignore this
minor technicality.) Even if we were to assume that all the matter and energy in the visible universe were in
the form of Earthlike planets, there would be only (!) about 1028 Earthlike planets in the visible universe.
This number assumes that “earthlike” means only that the mass of the planet is greater than or equal to
the mass of the Earth. No assumption is made about the planet’s star, atmosphere, or orbital radius.
The well-known evolutionist Francisco Ayala has recently made this argument quantitative. He estimates
that the probability of an intelligent species evolving on an Earthlike planet upon which one-cell organisms
have appeared is less than 10 to the minus one million power! This number is so tiny that the evolution
of intelligent life is exceedingly unlikely to have occurred even once. Ayala’s number is not contradicted
by the fact that intelligent life exists on Earth. It is just exceedingly improbable that it exists anywhere
in the universe (at least if the universe is finite in spatial size, as I shall argue in Section IV that it is).
Ayala’s number depends on the assumption that gene changes upon which natural selection operates are
essentially random. Evolution has no foresight. Mayr has emphasized that intelligence on earth is limited to
the chordate lineage, so, he argues that if the chordates never appeared on Earth, neither would intelligence.
But chordates first evolved more than half a billion years ago. These animals did not know that they had to
evolve so that Homo sapiens would eventually appear. Natural selection can only operate during an animal’s
lifetime. It cannot select a genome with the intent of using the genome a billion yeas later.
There is an important caveat to this; a caveat first pointed out by Charles Darwin himself in the last
pages of his book The Variation of Animals and Plants under Domestication. Darwin noted that at the
ultimate level of physics, the universe is deterministic. This means that at the ultimate level, there are no
random events. In particular, the evolution of Homo sapiens was inevitable, determined by the initial state
of the universe and the universes initial conditions. “Random” variation does not mean uncaused. It just
means unpredictable for human beings. Therefore, at this ultimate physical level, Darwin claims that his
own theory is only an approximation. Darwin noted that the advance of science might enable us to obtain
enough information to predict these “random” variations. I shall argue below that this time has now come.
B. If They Existed, They’d Be Here
The argument against the existence of extraterrestrial intelligent life that I have developed in most
detail is sometimes called the Fermi Paradox: if they existed, they’d be here. The force of this argument is
not usually appreciated, because most people — and even most scientists (! — tacitly assume that any alien
civilization, no matter when they evolved or how long they have had advanced technology, will nevertheless
have essentially the technology of the late 20th century. The reason for this tacit assumption is the usual
human weakness: we have an unfortunate habit of trying to impose our current human perspectives on the
physical universe.
But let consider the consequences of only slightly more advanced computer technology than we now
have. According to most computer experts, within a century or so we should have computer programs which
have human level intelligence, computers which can run such programs and also make copies of themselves
and the programs. Imagine such a machine combined with our rocket technology into a space probe. Such a
space probe can reach the nearest star in 40,000 years. Once there in the nearest star system, the probe could
make several copies of itself, using the asteroid material which we now know is present in almost all star
systems, sending these daughter probes to further star systems, where the process would be repeated. Even
with our rocket technology, every star system in the entire Galaxy would have a probe within 100 million
years. With a more advanced rocket technology, a rocket technology which is even today been experimented
with, it should be possible to send a probe between the stars at 1/10 light speed. With such a speed, probes
would cover the entire galaxy within a few million years. And all for the cost of a single probe!
Almost any motivation we can imagine would lead an intelligent species with the technology to launch
that single probe. Suppose for example, ET wants to contact other intelligent life forms. Then rather than
send out radio signals, they should send out that single probe. With radio, one has to send out the signals
to many stars, over many thousands of years. (We would expect evolution to intelligence to require billions
of years, as it did on Earth.) But once the probe is launched, coverage of the entire galaxy is automatic.
Once in a target star system, the intelligent probe can contact any intelligent life forms that happen to have
evolved on any planet in the system. Or if no intelligent life is found, the probe can study the entire system
and transmit the results back to Earth. This on the spot investigation is obviously impossible if radio signals
are sent out instead of a space probe.
One might think an intelligent species would be reluctant to use probes because of the worry that
these machines would eventually escape from the control of the original transmitting species. But the same
objection can be made to sending out radio signals. It is impossible to predict what use a recipient species
would make of the information in the signal. Many scientists here on Earth have opposed the transmission
of signals, fearing that hostile aliens may use the signals to home in on our planet. The fear of losing control
of the probes — which, since these machines are rational beings, should be regarded as our mind children
— apply with equal force to our biological descendants. “No species now existing will transmit its unaltered
likeness to a distant futurity” was how Darwin put in the closing pages of Origin of Species. We do not know
whether they will be good or bad by our standards. We do know that in the far future they won’t be Homo
sapiens.
But in the long run, our descendants, whatever they look like, whether they are silicon machines or
the more familiar DNA devices, must leave the Earth if they are to survive. Within 6 billion years, the
Sun’s atmosphere will expand out and engulf the Earth, which will spiral into the Sun and be vaporized. A
similar fate is in store for any and all intelligent species that evolve on a water planet. Making the reasonable
Darwinian assumption that survival will be a central motivation of all intelligent species, all intelligent species
will eventually develop space travel, leave their planet, and colonize their own star system. The universe
is 13.7 billion years old, and most stars and their planets are billions of years older than our own. Thus,
whatever the probability intelligent life evolves on an earthlike planet on which one-cell organisms appear,
most intelligent species would be billions of years older than we are. They should have left their mother
planet billions of years ago. Once they leave their planet, nothing can stop their expansion into interstellar
space. If they existed, they would be here.
C. The Limited Resources Argument
Once an intelligent species begins its expansion into interstellar space, there is only the speed of light
barrier to stop the expansion. Furthermore, as Dyson has emphasized, intelligent life will eventually develop
the ability to convert any form of matter into living matter and life support devices. Given time, intelligent
life can take apart no only asteroids, but also entire Jupiter-sized planets and even stars. Thus a galaxy
which has been invaded (infected?) by a space travelling intelligent life form will start to disappear. This,
by the way, is yet another argument for human uniqueness in the visible universe. We have never observed
galaxies in the process of controlled disintegration. Intelligent life, in the long term, ought to appear as a
horde of locusts, devouring all matter in its domain. A galactic wide government cannot be set up to stop
such behavior because of the speed of light barrier, but even if it could be set up, it would have no choice
but to allow such behavior. Survival requires the conversion of matter into energy. Setting an ultimate limit
to how much matter can be so converted would merely doom life to extinction.
However, the speed of light barrier, which prevents a galactic scale government from being set up to
prevent life from devouring all matter, itself imposes a limitation on how fast life can use up resources. The
disc of our galaxy is some 100,000 light years across; we not use up the material resources of our galaxy in less
than 100,000 years. The Virgo cluster is some 60 million light years away. We cannot use up the resources
of the Virgo cluster in less than 60 million years. If the universe were closed and decelerating, a single
intelligent life form could not devour the entire universe until after the universe had begun to recollapse.
Actually the universe is currently accelerating. If this acceleration were to continue forever at its present
rate, our descendants could devour only the region currently within at most 10 billion light years. This limit
is imposed by the speed of light barrier modified by the universal acceleration.
But the more intelligent life there is in the universe, the more planets upon which intelligent life inde-
pendently evolves, the more rapidly resources will be used up. When all the material resources are used up,
intelligent life will die. The more common intelligent life is in the universe, the more rapidly it will become
extinct.
Conversely, if intelligent life is quite rare — a single intelligent species, if the universe were closed and
always decelerating — intelligent life would be forced by the laws of physics to use resources at just the right
rate to survive to the very end of time. And even more intelligent species could so survive if the universe
were to have a period of acceleration in its expansion phase, as the universe is indeed observed to have.
But why should the universe adjust the number of intelligent species so that the descendants of the
species would survive to the end of time? As Darwin pointed out in the closing pages of Origin of Species,
almost all species that have ever existed on Earth have died out, leaving no descendants. Why should an
intelligent life form have a survival probability utterly different from almost all other species? I claim that
intelligent life will survive until the end of time because the laws of physics require it. Or to put it another
ways, because such survival is one of the goals of the universe.
III. Unitarity is Teleology
Teleology has been completely rejected by evolutionary biologists. This rejection is unfortunate, because,
teleology is alive and well in physics, under the name of unitarity. Unitarity is an absolutely central postulate
of quantum mechanics, and it has many consequences. One of these consequences is the CPT theorem, which
implies that the g-factors of particles and antiparticles must be exactly equal. This equality (for electrons
and positrons) has been verified experimentally to 13 decimal places, the most precise experimental number
we have. Which is why very few physicists are willing to give up the postulate of unitarity! Furthermore,
unitarity is closely related to the law of conservation of energy, and a violation of unitarity has been shown
to result usually in the gigantic creation of energy out of nothing. One model (due to Leonard Susskin) of
unitarity violation had the implication that whenever a microwave oven was turned on, so much energy was
created that the Earth was blown apart. So physicists are very reluctant to abandon unitarity.
Unitarity is most often applied to what physicists call the S-matrix, which is the quantum mechanical
linear operator that transforms any state in the ultimate past to a unique state in the ultimate future. But
unitarity more generally applies to the time evolution operator, a linear operator that carries the quantum
state of the universe at any initial time uniquely into the quantum state of the universe at any chosen future
time. Uniquely is a key word. It means that unitarity is the quantum mechanical version of determinism.
Contrary to what is generally thought, determinism is alive and well in quantum mechanics. Determinism,
however, applies to wave functions (quantum states) rather than to individual particles. Alternatively, we
can say that determinism applies to coherent collections of worlds rather than to individuals. There is a sense,
which I won’t have room to discuss here, in which quantum mechanics is more deterministic than classical
mechanics, and that Schrödinger derived his famous equation by requiring that classical mechanics in it
most general expression (Hamilton-Jacobi theory) be deterministic. (See Tipler (2005) for the mathematical
details.)
But the usual past-to-future determinism is not the fundamental meaning of unitarity. What unitarity
really means is that the inverse of the time evolution operator exists, and is easily computed from the time
evolution operator itself by forming the time evolution operator’s hermitian conjugate. Any operator whose
inverse is obtained in this manner is said to be a unitary operator. But in the present context, the important
point is that the inverse of the time evolution operator exists. The inverse of any operator is an operator
that undoes the effect of the original operator. In the case of the time evolution operator, which generates
past-to-future evolution, the inverse operator generates future-to-past evolution. In other words, it carries
future quantum states uniquely into past quantum states. Therefore, unitarity tells us that any complete
statement of usual past-to-future causation is mathematically equivalent to some complete statement of
future-to-past causation. In more traditional language, a complete list of all efficient causes is equivalent to
some complete list of final causes. Teleology is reborn!
Nevertheless, the Second Law of Thermodynamics says that the complexity of the universe at the
microlevel is increasing with time. This means that it will usually be the case that past-to-future causation
will be the simpler explanation of the two causal languages. But this will not always be the case. We should
always remember that for physical reality the two causation languages are mathematically equivalent. It
might occasionally be the case that we humans can understand where the evolution of the universe is taking
us only by using future-to-past causation. That is, we can understand what is happening now only by
considering the ultimate goal of the universe.
To reject this possibility is a terrible mistake. Humans naturally think in terms of past-to-future
causation because our memories are designed (by the laws of physics) to work in this time direction. But the
universe is not similarly restricted. It is a mistake to impose human limitations on the physical universe. It
was a terrible mistake to require that solar system mechanics look simple in a geocentric frame of reference.
Let me now use this future-to-past causation to show that biological evolution cannot be completely
random. I shall now argue that the laws of physics require intelligent life to evolve somewhere, and survive
to the very end of time.
IV. Why Intelligent Life MUST Exist in the Far Future
The necessity of intelligent life in the far future is an automatic consequence of the laws of physics,
specifically quantum mechanics, general relativity, the Standard Model of particle physics, and most impor-
tantly, the Second Law of Thermodynamics. I shall show that the mutual consistency of these laws requires
three things. First, the universe must be closed (the universe’s spatial topology must be a three-sphere).
Second, life must survive to the very end of time. Third, the knowledge possessed by life must increase to
infinity as the end of time is approached. I do not assume life survives to the end of time. Life’s survival
follows from the laws of physics. If the laws of physics be for us, who can be against us?
But before I prove that the laws of physics require life to survive, let me first show that it is possible
for life to survive. To survive for infinite experiential time, life requires an unlimited supply of energy. That
is, the supply of available energy must diverge to infinity as the end of time is approached. Nevertheless,
conservation of energy requires the total energy of the universe to be constant. In fact, Roger Penrose has
shown that the total energy of any closed universe is ZERO! The total energy is zero now, was zero in the
past, and will be zero at all times in the future. One might wonder how this is possible. After all, we are
now receiving energy from the Sun, we are using food energy as we read this, and we can extract energy
from coal, oil, and uranium. Energy, in other words, seems to be non-zero.
However, the forms of energy just listed are not all the forms of energy in the universe. There is also
gravitational energy, which is negative. So if we were to add all the positive forms of energy — radiant
energy, the stored energy in coal, oil, and uranium, and most importantly, the mass-energy of matter —
to the negative gravitational energy, the sum is zero. This means that if we can make the gravitational
energy even more negative, the positive energy, that is, the energy available for life, necessarily increases,
even though the total energy in the universe stays zero. The key property of energy that must always be
kept in mind is that it transforms from one form to another. Once we realize that gravitational energy can
transformed into available energy, we understand where life can obtain the unlimited source of available
energy it needs for survival: life must make the total gravitational energy approach minus infinity.
Life can do this only if the universe is closed, and collapses to zero size as the end of time is approached.
Conversely, if the universe is closed and collapses to zero size, then the total gravitational energy goes to
minus infinity, since the gravitational energy of a system is inversely proportional the size of the system.
I have shown in my book (Tipler, 1994) that life can in fact extract unlimited available energy from the
collapse of the universe.
Now let me outline the proof of my three claims above. I can give here only a bare outline. For complete
details, the reader is referred to my book (Tipler, 1994) and to papers ((Tipler et al, 2000), and (Tipler
2001)) on arXiv, the physics preprint database (available on the Internet at http://arxiv.org/). Black holes
exist, but Hawking proved that were black holes to evaporate completely — as they necessarily would if the
universe were to expand forever — the black holes would violate unitarity, the fundamental law of quantum
mechanics which I described in the previous section. Hence the universe must eventually stop expanding,
collapse, and end in a final singularity. If this final singularity were to be accompanied by event horizons, then
the Bekenstein Bound (another law of quantum mechanics, basically the Heisenberg Uncertainty Principle
http://arxiv.org/
expressed in the language of information theory) would have the following effect. It would force that all
the microstate information in the universe to go to zero as the universe approaches the final singularity.
But the microstate information going to zero would imply that the entropy of the universe would have to
go to zero, and this would contradict the Second Law of Thermodynamics, which says that the entropy of
the universe can never decrease. But if event horizons do not exist, then the Bekenstein Bound allows the
information in the microstates to diverge to infinity as the final singularity is approached. Conversely, ONLY
if event horizons do not exist can quantum mechanics (the Bekenstein Bound) be consistent with the Second
Law of Thermodynamics. Therefore, event horizons cannot exist, and by Seifert’s Theorem (see (Tipler,
1994), p. 435) the non-existence of event horizons requires the universe to be spatially closed. In Penrose’s
c-boundary construction (Tipler, 1994), (Hawking and Ellis, 1973), a singularity without event horizons is a
single point. I call such a final singularity the OMEGA POINT. At a Windsor Castle conference, Martin Rees
objected that many physicists (in particular, himself) do not accept Hawking’s proof that unitarity would
be violated were a black hole to evaporate to completion. But most of the physicists who reject Hawking’s
argument nevertheless accept that there is nevertheless a Black Hole Information Problem: i.e., that we
must explain how the information that falls into a black hole gets out. Many solutions to the Information
Problem have been proposed but all of these solutions (except the one I shall advance) have one feature in
common. They all involve proposed new laws of physics. My proposal — that there are no event horizons
at all, hence no black hole event horizons, so ALL information at all events are accessible to all observers
in the far future — does NOT involve new physical laws. Only classical general relativity is used. I use
Hawkings unitarity argument only to infer the non-existence of event horizons. If we resolve the Black Hole
Information Problem by simply assuming the non-existence of event horizons, then I don’t need to use either
the Bekenstein Bound or the Second Law of Thermodynamics to infer the existence of the Omega Point,
or spatial closure. Resolving the Information Problem using known physics automatically yields no event
horizons and spatial closure for the universe.
If the universe were to evolve into an Omega Point type final singularity without life being present to
guide its evolution, then the non-existence of event horizons would mean that the universe would be evolving
into an infinitely improbable state. Such an evolution would contradict the Second Law of Thermodynamics,
which requires the universe to evolve from less probable to more probable states. On the other hand, if life
is present guiding the evolution of the universe into the final singularity, then the absence of event horizons
is actually the MOST probable state, because the absence of event horizons is exactly what life requires
in order to survive (details in my book (Tipler 1994)). In other words, the validity of the Second Law
of Thermodynamics REQUIRES life to be present all the way into the final singularity, and further, the
Second Law requires life to guide the universe in such a way as to eliminate the event horizons. Life is the
only process consistent with known physical law capable of eliminating event horizons without the universe
evolving into an infinitely improbable state. Exactly how life eliminates the event horizons is described in
my book (Tipler, 1994). Roughly speaking, life nudges the universe so as to allow light to circumnavigate
the universe first in one direction, and then another. This is done repeatedly, an infinite number of times.
There are thus an INFINITE number of circumnavigations of light before the Omega Point is reached. If
we were to regard a single circumnavigation as a single tick of the light clock there would be an infinite
amount of such time between now and the Omega Point. An even more physical time would be the number
of experiences which life has between now and the Omega Point. This “experiential time” — the time
experienced by life in the far future — is the most appropriate physical time to use near the Omega Point.
It is far more appropriate than the human based proper time we now use in our clocks.
V. Life in the Future of an Accelerating Universe
As anyone who has read the science columns of the newspapers over the past decade knows, the universe
is now accelerating. The most recent WMAP observations of the Cosmic Microwave Background Radiation
provide the strongest evidence for acceleration, but there are several independent lines of evidence that lead
to the conclusion that the universe is accelerating. The evidence is also strong that the mechanism for the
acceleration is due to a positive cosmological constant. If this acceleration were to continue forever, then as
Barrow and I showed in our book (Barrow and Tipler, 1986), intelligent life will eventually die out, and the
entire theory, which I described in section III, would be false. If intelligent life is to continue until the very
end of time — as it must if the laws of physics are to hold at all times — then the universe must eventually
stop accelerating, slow down until the expansion stops, and then recollapse to a final singularity. In this
section, I shall outline a mechanism which can cancel the acceleration. My proposal assumes the validity of
the Standard Model of particle physics, a theory which is so far supported by all experiments conducted to
date, and which provides only one mechanism for a universal acceleration.
The latest WMAP observations of the Cosmic Microwave Background Radiation (CMBR) have provided
the following facts. First, the universe is 13.7 billion years old. Second, in the present epoch, the density
parameters of the curvature, the ordinary matter, the dark matter, and the dark energy are respectively
Ωk << 0.01, Ωm = 0.04, ΩDM = 0.23, and ΩΛ = 0.73. Notice that the subscript on the dark energy is Λ. I
use this subscript to emphasize that the WMAP data indicate the dark energy looks observationally like the
effect of a positive cosmological constant, traditionally written Λ. Any correct cosmological theory must be
consistent with these observations.
The Standard Model, minimally coupled to gravity, necessarily has a positive cosmological constant. I
predicted in my book (Tipler, 1994) that this cosmological constant would cause the universe to undergo an
acceleration. I argued that this acceleration would occur in the collapsing phase of universal history. I did
not realize that an acceleration could also occur in the expanding phase. Though I should have, since the
Standard Model requires such an acceleration.
The Standard Model requires a positive cosmological constant to cancel the effect of the Higgs vacuum.
Recall that according to the Standard Model, the universe is permeated with a non-zero value of the Higgs
field, and it is this non-zero value that breaks the electroweak symmetry and gives mass to all the particles.
But this symmetry breaking is accomplished via the Higgs potential, which for constant Higgs field, acts
exactly a very strong negative cosmological constant. Initially, at the Big Bang singularity, the Higgs field,
and hence the Higgs potential, was zero. But zero is not the lowest value of the potential, so as the universe
expanded, the Higgs potential dropped to its lowest value, corresponding to a negative cosmological constant.
Now in special relativity, this negative constant can be re-normalized out of existence. Not so in general
relativity. Any constant in the matter Lagrangian multiples the invariant volume element, and is equivalent
to putting in a cosmological constant in the Lagrangian (Weinberg, 1988).
The value of the negative cosmological constant corresponding to the Higgs potential can be set by
experiment, and it is enormous: −1.0× 1026 gm/cm3, as compared to the energy density of the dark matter
and dark energy, only 10−29 gm/cm3. The only way to make the Standard Model consistent with general
relativity is to add a positive cosmological constant of the same magnitude to the Lagrangian. We would
expect the value of the added positive cosmological constant to precisely cancel the value of the Higgs
potential, when the Higgs is in its true ground state (the absolute lowest energy density of the potential).
But the Higgs field cannot presently be in its true ground state, for a very simple reason: there is more
matter than antimatter in the universe. The Standard Model has a mechanism of generating this observed
excess of matter over antimatter, but most cosmologists believe that this cannot be the main mechanism
to generate matter, because they think, incorrectly, that it will generate too many photons to baryons. I
have shown that this large number of photons to baryons is a consequence of imposing the wrong boundary
conditions in the very early universe. If the only boundary conditions consistent with the Bekenstein Bound
(a.k.a. quantum field theory) are imposed, the photon to baryon ratio turns out fine. The Standard Model
generation of matter works by electroweak vacuum tunneling. And if this tunneling yields an excess of matter
over antimatter, the Higgs field cannot be in its true vacuum. Thus the excess of matter over antimatter in
the universe ultimately causes the observed acceleration of the universe!
Conversely, if the excess of matter over antimatter were to disappear — if matter were converted into
energy via electroweak tunneling — and if this disappearance were to occur rapidly enough, then the Higgs
potential would fall toward its true ground state, the positive cosmological constant would be progressively
cancelled, and the universe would cease to accelerate. If he universe were a spatially a three-sphere — and
I have argued in the previous section that it is — then once the acceleration stops, the universe will expand
to a maximum size, and then recollapse into the final singularity.
Provided, of course, than a mechanism can be found to convert matter into energy via electrweak
quantum tunneling. The mechanism would have to be the inverse of the process that created the matter
excess in the early universe. But a large amount of matter was created in the early universe because the
gauge field energy density was enormous. The gauge field energy density is tiny today: 10−31 gm/cm3, and
getting smaller as the universe expands. If the acceleration is to stop, another mechanism must annihilate
the matter.
I claim that our future descendants will annihilate the matter. Once again, they will annihilate the
matter in order to survive. Survival requires energy. If baryon number is conserved, then only a small
fraction of the energy content of matter can be extracted. If hydrogen is converted into helium, as in the
Sun, only 0.7% of the mass of the hydrogen is converted into energy. But if our descendants use the inverse of
baryogenesis (the technical term for the process that generated matter in the early universe), ALL the energy
in matter can be extracted. I predict that in the future, a way will be found to use inverse baryogenesis,
our descendants will use this process as their main energy source, and as a consequence of using up there
matter resources, they will save both themselves, and the entire universe. Because if the acceleration can be
cancelled and universal recollapse induced, then the gravitational collapse energy can provide an unlimited
energy source, as I showed above.
But in an accelerating universe, life can only travel to the cosmological event horizon, which is about 10
billion light years away at the present time, given the observed value of the dark energy. (Actually, I should
call it the “pseudo event horizon”, since it would be a true event horizon only if life never stops the expansion,
and the Omega Point never develops. The Omega Point, recall, means that there are no event horizons.) But
quantum non-locality means that the quantum tunneling responsible for baryogenesis generates a uniform
density of baryons on large scales. (And since it is the creation of baryons that generate perturbations in the
CMBR, the perturbation spectrum must be scale invariant.) This means that the baryons have essentially
the same density on large scales everywhere in the universe. This means that the acceleration must be
universal. This means that if the universe is to recollapse, the baryons must be annihilated everywhere, even
at distances greater than 10 billion light years, where our descendants cannot travel, even were rockets based
on baryon annihilation to be constructed. Such rockets could approach light speed. I have shown (Tipler,
1994) that such rockets can travel cosmological distances, using the expansion of the universe itself to slow
down the rocket. Our descendants can reach the pseudo event horizon but no farther.
Thus the laws of physics require there to exist other intelligent species in the universe. Because of the
Limited Resources Argument, the different intelligent life forms must be rare, roughly one species per Hubble
volume. The nearest other intelligent life form must be roughly 10 billion light years away. But were we to
look for them, we would not see them, because at 10 billion light years, we would see their galaxy as it was
10 billion years ago, probably long before their planetary system formed.
VI. Conclusion and Proposed Experiments
But sufficiently advanced radio telescopes MIGHT be able to detect their future presence. In other
words, I shall now argue that there is a role for SETI! If we cannot detect alien civilizations, we might be
able to detect the one-cell organisms out of which they will eventually evolve. Provided that these organisms
already existed 10 billion years ago.
There is some evidence that the one-cell organism that were our own ancestors were around billions of
years before the Earth formed 4.6 billion years ago. William Schopf (1999, p. 77) has discovered structures
in the 3,465 ± 5 million-year-old Apex chert of Australia that closely resemble modern cyanobacteria. Schopf
identified these structures as fossil cyanobacteria, an identification that has been recently challenged. But I
shall assume that his identification is correct, so I can consider the consequences.
Now cyanobacteria are actually very sophisticated biochemical machines. If the fossil found by Schopf are
indeed cyanobacteria, then all the machinery of prokaryotes, including photosynthetic ability, must have been
present on Earth almost as soon as the Earth became capable of sustaining life, about 3.8 billion years ago.
Schopf himself remarks (1999, p. 98) that it seems extraordinary to suppose that this much sophistication
could have evolved in the geologically short period between the solidification of the Earth and the date of
the Apex fossils. I agree with Schopf. If indeed the Apex structures are fossils of cyanobacteria, then these
organisms cannot have evolved on Earth. They must have evolved their observed level of sophistication on
some other planet whose star long ago left the main sequence, and in the process, scattered the cyanobacteria
throughout interstellar space.
At the Windsor Castle conference, Paul Davies emphasized the consensus opinion that cyanobacteria
could survive a trip from one of Solar System’s planets, but because of the amount of radiation that they
would receive, they could not survive an interstellar journey. But the evidence Paul cited was theoretical,
rather than experimental. Cyanobacteria are capable of surviving nuclear explosions, and they have been
known to live inside nuclear reactors (Schopf, 1999, pp. 232-234). Given the ability of cyanobacteria to
survive radiation, their biochemical complexity, and the evidence that they appeared almost instantaneously
on Earth, I think that the preponderance of evidence says that cyanobacteria evolved billions of years before
the Earth formed, on a star that has long since disappeared.
This hypothesis has consequences. First, our interplanetary space probes should find cyanobacteria
wherever in the Solar System there is, or has been, liquid water. But if cyanobacteria have indeed been
dispersed throughout interstellar space billions of years before the Earth formed, we would expect to find
cyanobacteria, with the same DNA codons and cellular machinery, wherever there is liquid water in the
entire Galaxy. This hypothesis can be rigorously tested only with interstellar space probes. Incidentally,
notice that I’ve given in passing yet another reason why interstellar probes will eventually be sent out by
any intelligent species: to check how related life is in the Galaxy.
But if photosynthetic organisms have existed for billions of years before the Earth formed — for the
order of 10 billion years — and if our evolution is typical, we would expect intelligent life near the pseudo
event horizon to have evolved from organisms, some of which have photosynthetic ability, which existed on
liquid water planets 10 billion years ago. We would also expect there to have been time for the photosynthetic
organisms to convert some of these ancient planets’ atmospheres into oxygen atmospheres. This is what we
should search for in distant galaxies: the spectral lines of free oxygen. It has long been known that the
oxygen in Earth’s atmosphere can be seen at a distance of 10 light years by a one meter orbiting telescope.
A million-kilometer telescope would be able to see free oxygen lines in planetary atmospheres near the pseudo
event horizon. From the arguments above, some such atmospheres must exist.
A million-kilometer telescope is not going to be built in the immediate future. In the short run, I would
propose testing the hypothesis that the excess of matter over antimatter is responsible for the universal
acceleration, and that a special boundary condition on the fields of the Standard Model generate the excess
of matter over antimatter. This can be done rather easily, using a modification of the original equipment that
discovered the CMBR. I have shown in (Tipler, 2001, 2005) that if Standard Model physics is responsible
for both the dark matter and the dark energy, then the CMBR should not couple to right-handed electrons,
and this can be seen by sending the CMBR through filters consisting of poor conductors. Through such a
filter, the CMBR would be more penetrating than thermal radiation of the same temperature. I have shown
elsewhere that the same effect is visible in the Sunyaev-Zel-dovich effect (Tipler, 2005), and it is responsible
for the great penetrating power of ultrahigh energy cosmic rays (Tipler, 2001, 2005).
Two of the arguments against the existence of ETI have been around for a long time. The evolutionary
argument goes back to Alfred Wallace, with Darwin the co-discoverer of the principle of natural selection.
The Fermi Paradox goes back to Enrico Fermi. I’ve added a third, the “Limited Resources Argument” which
connects the rarity of intelligent life in the universe to the unlimited survival of intelligence in the far future.
But to appreciate the power of this argument, we must learn to give up anthropocentric ways of thinking.
We must abandon the (usually tacit) idea that our technology exhausts what is possible using the known
laws of physics. We must abandon the idea that the universe acts according to human thought patterns,
that causality works from past to future. We must abandon the idea that the universe evolves us as the
highest level of intelligence, and that all other intelligent species will be as limited in space as we are. Finally,
we must abandon the idea that there is a limit to what intelligence can accomplish, and that intelligence
will never play a role on the cosmological scale. Once we give up these human ways of thinking, we can
appreciate the true relation between intelligent life and the cosmos.
References
Barrow, J.D., Tipler, F. J. 1986 The Anthropic Cosmological Principle, Oxford University Press.
Hawking, S.W., Ellis, G.F.R. 1973 The Large-Scale Structure of Space-Time, Cambridge University Press.
Schopf, W. 1999 Cradle of Life: the Discovery of Earths Earliest Fossils, Princeton University Press.
Tipler, F. J. 1994 The Physics of Immortality, Doubleday.
Tipler, F. J., Graber, J., McGinley, M., Nichols-Barrer, J., Staecker 2000 gr-qc/0003082.
Tipler, F.J. 2001 astro-ph/0111520.
http://arXiv.org/abs/gr-qc/0003082
http://arXiv.org/abs/astro-ph/0111520
Tipler, F. J. 2005, Reports Prog. Phys. 68, pp. 897–964.
Weinberg, S. 1989, Rev. Mod. Phys., 61, pp. 1–22.
ABSTRACT
  I shall present three arguments for the proposition that intelligent life is
very rare in the universe. First, I shall summarize the consensus opinion of
the founders of the Modern Synthesis (Simpson, Dobzhanski, and Mayr) that the
evolution of intelligent life is exceedingly improbable. Second, I shall
develop the Fermi Paradox: if they existed they'd be here. Third, I shall show
that if intelligent life were too common, it would use up all available
resources and die out. But I shall show that the quantum mechanical principle
of unitarity (actually a form of teleology!) requires intelligent life to
survive to the end of time. Finally, I shall argue that, if the universe is
indeed accelerating, then survival to the end of time requires that intelligent
life, though rare, to have evolved several times in the visible universe. I
shall argue that the acceleration is a consequence of the excess of matter over
antimatter in the universe. I shall suggest experiments to test these claims.

<|endoftext|><|startoftext|>
Introduction, we
assume that the spin axes of both stars have been aligned with the orbital normal and that
the rotation of both stars has been synchronized to the orbital period. This allows us to use
the observed rotational line broadening of the primary to solve for the radius of the primary
in linear units, which in turn allows us to convert the orbital size and secondary radius into
linear units from the values of [a/RA] and [RB/RA] derived from the light curves.
Using the assumption of synchronization, and that iorb = irot, we see by inspection that
[Vrot sin irot]
sin iorb
[Vrot sin irot]
sin iorb
[RB/RA] (13)
where [Vrot sin irot] is the projected rotational broadening of the primary derived from its
observed spectra. We may now substitute in Eq.(8) for sin iorb to get both radii in terms of
– 14 –
our observables:
2π (1− [b]2/[a/RA]2)
[Vrot sin irot] (14)
2π (1− [b]2/[a/RA]2)1/2
[RB/RA] [Vrot sin irot] (15)
By combining these two statements with Eq.(9) and (10), we arrive at expressions for
the masses of each component in terms of just the observable quantities:
[a/RA]
(1− [b]2/[a/RA]2)3/2
1− [e]2
[a/RA][Vrot sin irot]
[Vrot sin irot]
3 (16)
[a/RA]
(1− [b]2/[a/RA]2)3/2
1− [e]2 [Vrot sin irot]2 (17)
The results for the masses and radii for both components of HAT-TR-205-013 are pre-
sented in Table 8. The errors were estimated using Monte-Carlo simulations and were com-
pared with the results of formal error propagation, including the correlation coefficients
derived from the light-curve fits. Both approaches delivered similar results. The mass and
radius obtained for the primary star are essentially the same for both the g and i light curves,
but the mass and radius for the secondary differ by 0.8 and 3 percent, respectively. This
radius difference between the two light curves is close to 1-σ, and may be due to uncertainties
in the limb-darkening coefficients. Our adopted values are based on the average values of
the light curve parameters.
4. DISCUSSION
In Figure 6 we plot our mass and radius for the M-dwarf secondary HAT-TR-205-
013 B on a mass-radius diagram, together with isochrones for ages of 0.5 and 5 Gyr from
Baraffe et al. (1998). We also plot the results for 11 M dwarf secondaries from the sample
of OGLE planetary candidates analyzed by Bouchy et al. (2005); Pont et al. (2005a,b, 2006)
and listed in Table 9. For the systems OGLE-TR-34 (Bouchy et al. 2005), OGLE-TR-120
(Pont et al. 2005b), and the low mass systems OGLE-TR-122 (Pont et al. 2005a) and OGLE-
TR-123 (Pont et al. 2006) the authors had to use stellar models to to estimate the masses
and radii of the primaries without the assumption of synchronization, as synchronization
implied masses and radii that were inconsistent with the spectroscopic observations. For the
other seven systems, they were able to assume synchronization and to derive the radius of the
primary from the observed rotational line broadening. In general the agreement between the
– 15 –
OGLE results and the Baraffe et al. (1998) isochrones looks promising, but the observational
uncertainties are still too large to allow a critical test of the theoretical models. The OGLE
systems are all much fainter than HAT-TR-205-013, which presents significant challenges
for both the spectroscopic and photometric follow-up observations. Spectroscopy with the
resolution and signal-to-noise ratio suitable for determining accurate values for rotational
broadening requires time on large telescopes, and photometry for high-quality light curves
also requires large telescopes to achieve the needed photon statistics. Eclipsing binaries
identified by wide-angle surveys are much brighter and therefore less challenging on both
counts.
Our value for the radius of the M-dwarf secondary in HAT-TR-205-013 lies 11 percent,
about 3-σ, above the theoretical isochrones. This divergence is further reinforced by Eq.(11),
which, as has been previously noted, restricts the position of HAT-TR-205-013 B to lie on a
single line that is determined by the surface gravity of the object. This gravity curve does
not rely on any prior assumptions about the HAT-TR-205-013 system, nor does it depend
upon our measured value of Vrot sin irot, which is the biggest contributor of uncertainty to our
final results. We use the assumption of synchronization and the spectroscopically measured
Vrot sin irot to place HAT-TR-205-013 B at a specific location along the curve, but it is
important to note that in the region that we find HAT-TR-205-013 B, the curve of allowable
locations runs nearly parallel to the theoretical models. This is illustrated in Figure 6 by
the red line that passes through our point for HAT-TR-205-013 B. Thus the conclusion that
the theoretical models predict a radius for HAT-TR-205-013 B that is too small by about 10
percent is on much firmer ground than the error bar in the observed radius might suggest.
Indeed, it would require a 6-σ difference in Vrot sin irot to place HAT-TR-205-013 B onto the
Baraffe models.
Our result for HAT-TR-205-013 B supports the suggestion from the results for double-
lined eclipsing binaries plotted in Figure 1 that the models predict radii for M dwarfs that
are too small by up to 10 percent. This discrepancy has been noted before, for example
by Torres & Ribas (2002) in the case of YY Gem. Torres et al. (2006) raised the issue of
whether short-period eclipsing binaries are representative of isolated field stars and wide
binaries where tidal forces are negligible. They suggested that the rapid rotation of the
stars in these systems caused by tidal synchronization might give rise to enhanced magnetic
activity, thus decreasing the efficiency of energy transport in the convective envelopes and
leading to inflated stellar radii. For low mass stars, this effect is examined in more detail by
Lopez-Morales (2007).
In the case of HAT-TR-205-013, we see no evidence in the photometry of star spots on
the primary star, which would be tell-tale indicators of enhanced stellar magnetic activity.
– 16 –
Though HAT-TR-205-013 A is rapidly rotating, the lack of magnetic ativity is not suprising,
given its spectral type (F7). The star’s outer convective layer is relatively shallow, and
it is not unusual for rapidly rotating stars of this type to lack strong magnetic activity
(Torres et al. 2006).
In some instances it may be possible to independently determine the rotational period of
the primary through high-quality light curves used to definitively identify photometric varia-
tion outside of eclipse. This would serve as a check to the assumption of tidal synchronization
in the system.
In future papers we will present the results for several additional single-lined eclipsing
binaries with circularized orbits.
We thank Joe Zajac, Perry Berlind, and Mike Calkins for obtaining some of the spectro-
scopic observations; Bob Davis for maintaining the database for the CfA Digital Speedome-
ters; and John Geary, Andy Szentgyorgyi, Emilio Falco, Ted Groner, and Wayne Peters for
their contribution to making KeplerCam such an effective instrument for obtaining high-
quality light curves. TGB thanks the Harvard University Origins of Life Initiative for sup-
port. GK thanks the support of OTKA K-60750. The HATnet project is supported by
NASA Grant NNG04GN74G. This research was supported in part by the Kepler Mission
under NASA Cooperative Agreement NCC2-1390.
REFERENCES
Andersen, J. 1991, A&AR, 3, 91
Bakos, G. Á., Lázár, J., Papp, I., Sári, P., & Green, E. M. 2002, PASP, 114, 974
Bakos, G., Noyes, R. W., Kovács, G., Stanek, K. Z., Sasselov, D. D., & Domsa, I. 2004,
PASP, 116, 266
Baraffe, I., Chabrier, G., Allard, F., & Hauschildt, P. H. 1998, A&A, 337, 403
Baraffe, I., Chabrier, G., Allard, F., & Hauschildt, P. H. 2002, A&A, 382, 563
Borucki, W. J., Caldwell, D., Koch, D. G., Webster, L. D., Jenkins, J. M., Ninkov, Z., &
Showen, R. 2001, PASP, 113, 439
Bouchy, F., Pont, F., Melo, C., Santos, N. C., Mayor, M., Queloz, D., & Udry, S. 2005,
A&A, 431, 1105
– 17 –
Chabrier, G., & Baraffe, I. 1997, A&A, 327, 1039
Claret, A. 2004, A&A, 428, 1001
Creevey, O. L., Benedict, G. F., Brown, T. M., Alonso, R., Cargile, P., Mandushev, G.,
Charbonneau, D., McArthur, B. E., et al. 2005, ApJ, 625, L127
Cox, Arthur N., ed. 2000. Allen’s Astrophysical Quantities, Fourth Edition. New York:
Springer-Verlag.
Flower, P. J. 1996, ApJ, 469, 355
Gaudi, B. S., & Winn, J. N. 2007, ApJ, 655, 550
Girardi L., Bressan A., Bertelli G., & Chiosi C. 2000, A&AS, 141, 371
Hut, P. 1981, A&A, 99, 126
Kovács, G., Bakos, G., & Noyes, R. W. 2005, MNRAS, 356, 557
Kovács, G., Zucker, S., & Mazeh, T. 2002, A&A, 391, 369
Kurtz, M. J., & Mink, D. J. 1998, PASP, 110, 934
Kurucz, R. L. 1992, in The Stellar Populations of Galaxies, IAU Symp. No. 149, ed. B.
Barbuy and A. Renzini (Kluwer Acad. Publ.: Dordrecht), 225
Lacy, C. H. 1977, ApJ, 218, 444
Latham, D. W. 1992, in IAU Coll. 135, Complementary Approaches to Double and Multiple
Star Research, ASP Conf. Ser. 32, eds. H. A. McAlister & W. I. Hartkopf (San
Francisco: ASP), 110
Latham, D. W. 2003, in ASP Conf. Ser. 294: Scientific Frontiers in Research on Extrasolar
Planets, ed. D. Deming & S. Seager (San Fransisco: ASP), 409
Latham, D. W. 2007, in Transiting Extrasolar Planets Workshop, ed. C. Afonso, ASP Conf.
Ser. in press.
López-Morales, M., & Ribas, I. 2005, ApJ, 631, 1120
López-Morales, M., Orosz, J. A., Shaw, J. S., Havelka, L., Arevalo, M. J., McIntyre, T., &
Lazaro, C. 2006, ApJ, submitted (astro-ph/0610225)
Lopez-Morales, M. 2007, ArXiv Astrophysics e-prints, arXiv:astro-ph/0701702
http://arxiv.org/abs/astro-ph/0610225
http://arxiv.org/abs/astro-ph/0701702
– 18 –
Maceroni, C., & Montalbán, J. 2004, A&A, 426, 577
Mandel, K., & Agol, E. 2002, ApJ, 580, L171
Metcalfe, T. S., Mathieu, R. D., Latham, D. W., & Torres, G. 1996, ApJ, 456, 356
Mullan, D. J., & MacDonald, J. 2001, ApJ, 559, 353
Nordström, B., Mayor, M., Andersen, J., Holmberg, J., Pont, F., Jogensen, B. R., Olsen, E.
H., Udry, S., Mowlavi, N. 2004, A&A, 418, 989
O’Donovan, F. T., Charbonneau, D., Alonso, R., Brown, T. M., Mandushev, G., Dunham, E.
W., Latham, D. W., Stefanik, R. P., et al. 2007, ApJ, submitted (astro-ph/0610603)
Pont, F., Melo, C. H. F., Bouchy, F., Udry, S., Queloz, D., Mayor, M., & Santos, N. C. 2005,
A&A, 433, L21
Pont, F., Bouchy, F., Melo, C., Santos, N. C., Mayor, M., Queloz, D., & Udry, S. 2005,
A&A, 438, 1123
Pont, F., Moutou, C., Bouchy, F., Behrend, R., Mayor, M., Udry, S., Queloz, D., Santos, N.,
& Melo, C. 2006, A&A, 447, 1035
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P., Numerical Recipes,
1992 (Cambridge: Cambridge Univ. Press)
Ribas, I. 2003, A&A, 398, 239
Southworth, J., Zucker, S., Maxted, P. F. L., & Smalley, B. 2004, MNRAS, 355, 986
Tody, D. 1986, Proc. SPIE, 627, 733
Tody, D. 1993, in ASP Conf. Ser. 52, Astronomical Data Analysis Software and Systems II,
ed. R. J. Hanisch, R. J. V. Brissenden, & J. Barnes (San Francisco: ASP), 173
Torres, G., Lacy, C. H., Marschall, L. A., Sheets, H. A. & Mader, J. A. 2006, ApJ, 640, 1018
Torres, G., & Ribas, I. 2002, ApJ, 567, 1140
Winn, J. N., et al. 2006, ArXiv Astrophysics e-prints, arXiv:astro-ph/0612224
Young, A. T. 1967, AJ, 72, 747
Zahn, J. P. 1989, A&A, 220, 112
This preprint was prepared with the AAS LATEX macros v5.2.
http://arxiv.org/abs/astro-ph/0610603
http://arxiv.org/abs/astro-ph/0612224
– 19 –
Table 1. Individual Radial Velocities
HJD Vrad σ(Vrad)
(days) (km s−1) (km s−1)
2453034.45642 −2.02 1.38
2453035.47574 −10.11 1.61
2453035.58018 −18.52 1.01
2453036.48778 −12.93 1.53
2453037.46565 −0.47 1.43
2453037.61215 −6.83 0.91
2453038.46413 −25.72 1.48
2453038.57874 −19.76 1.15
2453040.47360 −28.58 1.24
2453042.58686 −27.79 0.91
2453043.58338 +4.84 1.23
2453044.58422 −19.73 1.14
2453045.57911 −6.42 0.73
2453046.46373 −2.11 0.80
2453046.60000 −10.47 0.77
2453047.50881 −20.45 1.49
2453047.58731 −14.83 0.95
2453543.94910 −4.01 1.10
2453658.69572 −20.53 1.16
2453659.75967 +3.14 2.28
2453659.78398 +2.85 1.09
2453660.70213 −25.98 1.57
2453664.70202 −19.39 1.21
– 20 –
Table 2. Spectroscopic Orbital Parameters
Parameter Value
P (days) 2.23072± 0.00005
γ (km s−1) −9.83± 0.30
K (km s−1) 18.33± 0.47
e 0.012± 0.021
ω (◦) 143± 90
Epoch (HJD) 2, 453, 198.61± 0.56
Nobs 23
O − C rms (km s−1) 1.06
f(M) (M⊙3) 0.00142± 0.00023
aA sin i (Gm) 0.562± 0.030
– 21 –
Table 3. Rotational Velocity Results
log g,[Fe/H] Teff < Vrot > σ(< Vrot >) Correlation
(K) (km s−1) (km s−1) Coefficient
4.0,0.0 6340 29.4 0.25 0.826
4.5,0.0 6540 28.4 0.24 0.823
4.0,−0.5 5960 29.2 0.21 0.821
4.5,−0.5 6150 28.2 0.24 0.816
Adopted:
4.24,−0.2 6295 28.9 1.0
– 22 –
Table 4. g Band Photometry
HJD Flux
2453666.575985 1.00054
2453666.576472 0.99856
2453666.576946 1.00108
2453666.579226 1.00084
2453666.579712 1.00008
2453666.580198 0.99985
2453666.582501 0.99998
2453666.582976 0.99882
2453666.583474 1.00159
Note. — Table 4 is pre-
sented in its entirety in the
electronic edition of the As-
trophysical Journal. A por-
tion is shown here for guid-
ance regarding its form and
content.
column (1): Heliocentric
Julian Date,
column (2): Normalized in-
strumental flux.
– 23 –
Table 5. i Band Photometry
HJD Flux
2453666.574226 0.99951
2453666.574469 0.99803
2453666.574724 1.00096
2453666.574967 0.99846
2453666.575233 0.99628
2453666.575488 1.00014
2453666.577432 1.00062
2453666.577698 0.99886
2453666.577965 0.99941
2453666.578208 0.99613
2453666.578474 0.99957
2453666.578728 0.99962
2453666.580719 0.99837
2453666.580974 0.99976
2453666.581228 0.99761
2453666.581494 1.00025
2453666.581772 1.00113
2453666.582027 0.99823
Note. — Table 5 is pre-
sented in its entirety in the
electronic edition of the As-
trophysical Journal. A por-
tion is shown here for guid-
ance regarding its form and
content.
column (1): Heliocentric
Julian Date,
– 24 –
column (2): Normalized in-
strumental flux.
– 25 –
Table 6. Light-Curve Fit Results
Parameter g Band i Band Adopted
a/RA 5.93± 0.15 5.91± 0.16 5.92± 0.11
RB/RA 0.1330± 0.0010 0.1288± 0.0007 0.1309± 0.0006
b 0.36± 0.06 0.37± 0.07 0.365± 0.046
– 26 –
Table 7. Correlation Coefficients
Coefficient g Band i Band
(a/RA,RB/RA) 0.28 0.27
(a/RA,b) −0.21 −0.42
(RB/RA,b) 0.04 0.01
– 27 –
Table 8. Physical Parameters for HAT-TR-205-013
Parameter g Band i Band Adopted
MA (M⊙) 1.04± 0.14 1.03± 0.14 1.04± 0.13
RA (R⊙) 1.28± 0.04 1.28± 0.04 1.28± 0.04
MB (M⊙) 0.124± 0.011 0.123± 0.011 0.124± 0.010
RB (R⊙) 0.169± 0.006 0.164± 0.006 0.167± 0.006
a (AU) 0.0351± 0.0015 0.0351± 0.0015 0.0351± 0.0014
– 28 –
Table 9. Masses and Radii for Low-Mass Stars
Name M (M⊙) R (R⊙) Type Ref.
OGLE-TR-123 B 0.085± 0.011 0.133± 0.009 SB1 EB 1
OGLE-TR-122 B 0.092± 0.009 0.120± 0.018 SB1 EB 2,3
OGLE-TR-106 B 0.116± 0.021 0.181± 0.013 SB1 EB 3
HAT-TR-205-013 B 0.123± 0.011 0.167± 0.007 SB1 EB 13
OGLE-TR-125 B 0.209± 0.033 0.211± 0.027 SB1 EB 3
CM Dra B 0.2136± 0.0010 0.2347± 0.0019 SB2 EB 4,5
CM Dra A 0.2307± 0.0010 0.2516± 0.0020 SB2 EB 4,5
OGLE-TR-78 B 0.243± 0.015 0.240± 0.013 SB1 EB 3
OGLE-TR-5 B 0.271± 0.035 0.263± 0.012 SB1 EB 6
OGLE-TR-7 B 0.281± 0.029 0.282± 0.013 SB1 EB 6
OGLE-TR-6 B 0.359± 0.025 0.393± 0.018 SB1 EB 6
OGLE-TR-18 B 0.387± 0.049 0.390± 0.040 SB1 EB 6
CU Cnc B 0.3890± 0.0014 0.3908± 0.0094 SB2 EB 7
OGLE-BW3-V38 B 0.41± 0.09 0.44± 0.06 SB2 EB 8
CU Cnc A 0.4333± 0.0017 0.4317± 0.0052 SB2 EB 7
OGLE-BW3-V38 A 0.44± 0.07 0.51± 0.04 SB2 EB 8
OGLE-TR-120 B 0.47± 0.04 0.42± 0.02 SB1 EB 3
TrES-Her0-07621 B 0.489± 0.003 0.452± 0.050 SB2 EB 9
TrES-Her0-07621 A 0.493± 0.003 0.453± 0.060 SB2 EB 9
NSVS01031772 B 0.4982± 0.0025 0.5088± 0.0030 SB2 EB 10
OGLE-TR-34 B 0.509± 0.038 0.435± 0.033 SB1 EB 6
NSVS01031772 A 0.5428± 0.0027 0.5260± 0.0028 SB2 EB 10
YY Gem A & B 0.5992± 0.0047 0.6191± 0.0057 SB2 EB 11
GU Boo B 0.599± 0.006 0.620± 0.020 SB2 EB 12
GU Boo A 0.610± 0.007 0.623± 0.016 SB2 EB 12
References. — 1. Pont et al. (2006); 2. Pont et al. (2005a); 3. Pont et al.
(2005b); 4. Lacy (1977); 5. Metcalfe et al. (1996); 6. Bouchy et al. (2005);
– 29 –
7. Ribas (2003); 8. Maceroni & Montalbán (2004); 9. Creevey et al.
(2005); 10. López-Morales et al. (2006); 11. Torres & Ribas (2002); 12.
López-Morales & Ribas (2005); 13. This paper
– 30 –
Fig. 1.— The mass-radius diagram for 10 stars in 5 double-lined eclipsing binaries each
composed of two M dwarfs, and with errors better than 3 percent.
– 31 –
 9.98
 10.02
 10.04
 10.06
 10.08
 10.1
-0.4 -0.2  0  0.2  0.4
Phase
HAT-5 TFA data
HAT-8 TFA data
 10.01
 10.02
 10.03
 10.04
 10.05
 10.06
-0.1 -0.05  0  0.05  0.1
Fig. 2.— The phase-folded HATnet light curve for HAT-TR-205-013.
– 32 –
Fig. 3.— The velocity curve for our orbital solution for HAT-TR-205-013, together with the
individual observed velocities. The lower panel shows the O-C velocity residuals from the
orbital solution.
– 33 –
Fig. 4.— KeplerCam light curves for HAT-TR-205-013 in the SDSS g and i bands. Contin-
uous lines show the best fit synthetic light curves for each.
– 34 –
Fig. 5.— Contours of χ2 for the results from fits to the light curves in the g-band (left
panels) and i-band (right panels). For each band the three panels show the projections onto
the three possible planes involving b, a/RA, and RB/RA. The 1-σ, 2-σ, and 3-σ contours are
plotted.
– 35 –
Fig. 6.— The mass-radius diagram for M dwarfs in single-lined eclipsing binaries. The M
dwarfs from Pont et al. (2005a,b, 2006) are plotted as open circles. The red line passing
through the point for HAT-TR-205-013 B shows the constraint imposed on its location by
Eq.(11) and our observed quantities, without making any explicit assumptions (such as
synchronization) about the system. Assuming synchronization, the hash marks on the line
show the effect that differences of ± 1, 2, & 3 km s−1 in Vrot have on our final results.
	INTRODUCTION
	OBSERVATIONS AND DATA REDUCTION
	HAT Photometry
	Follow-up Spectroscopy
	Follow-up KeplerCam Photometry
	LIGHT-CURVE ANALYSIS
	MASSES AND RADII FOR HAT-TR-205-013
	DISCUSSION
ABSTRACT
  We derive masses and radii for both components in the single-lined eclipsing
binary HAT-TR-205-013, which consists of a F7V primary and a late M-dwarf
secondary. The system's period is short, $P=2.230736 \pm 0.000010$ days, with
an orbit indistinguishable from circular, $e=0.012 \pm 0.021$. We demonstrate
generally that the surface gravity of the secondary star in a single-lined
binary undergoing total eclipses can be derived from characteristics of the
light curve and spectroscopic orbit. This constrains the secondary to a unique
line in the mass-radius diagram with $M/R^2$ = constant. For HAT-TR-205-013, we
assume the orbit has been tidally circularized, and that the primary's rotation
has been synchronized and aligned with the orbital axis. Our observed line
broadening, $V_{\rm rot} \sin i_{\rm rot} = 28.9 \pm 1.0$ \kms, gives a primary
radius of $R_{\rm A} = 1.28 \pm 0.04$ \rsun. Our light curve analysis leads to
the radius of the secondary, $R_{\rm B} = 0.167 \pm 0.006$ \rsun, and the
semimajor axis of the orbit, $a = 7.54 \pm 0.30 \rsun = 0.0351 \pm 0.0014$ AU.
Our single-lined spectroscopic orbit and the semimajor axis then yield the
individual masses, $M_{\rm B} = 0.124 \pm 0.010$ \msun and $M_{\rm A} = 1.04
\pm 0.13$ \msun. Our result for HAT-TR-205-013 B lies above the theoretical
mass-radius models from the Lyon group, consistent with results from
double-lined eclipsing binaries. The method we describe offers the opportunity
to study the very low end of the stellar mass-radius relation.

<|endoftext|><|startoftext|>
Coulomb excitation of unstable nuclei at intermediate energies
C.A. Bertulani1∗, G. Cardella2, M. De Napoli2,3, G. Raciti2,3, and E. Rapisarda2,3
1 Department of Physics and Astronomy,
University of Tennessee, Knoxville, Tennessee 37996, USA
2 Istituto Nazionale di Fisica Nucleare, Sezione di Catania,
via Santa Sofia 64, I-95123, Catania, Italy
3 Dipartimento di Fisica e Astronomia, Universitá Catania,
via Santa Sofia 64, I-95123, Catania, Italy
Abstract
We investigate the Coulomb excitation of low-lying states of unstable nuclei in intermediate
energy collisions (Elab ∼ 10−500 MeV/nucleon). It is shown that the cross sections for the E1 and
E2 transitions are larger at lower energies, much less than 10 MeV/nucleon. Retardation effects
and Coulomb distortion are found to be both relevant for energies as low as 10 MeV/nucleon and
as high as 500 MeV/nucleon. Implications for studies at radioactive beam facilities are discussed.
PACS numbers: 25.60.-t, 25.70.-z, 25.70.De
Keywords: Coulomb excitation, cross sections, unstable nuclei.
∗ bertulanica@ornl.gov.
http://arxiv.org/abs/0704.0060v2
Unstable nuclei are often studied with reactions induced by secondary radioactive beams.
Examples of these reactions are elastic scattering, fragmentation and Coulomb excitation
by heavy targets. Coulomb excitation is specially useful since the interaction mechanism is
very well known [1]. It is the result of electromagnetic interactions of a projectile (ZP ,AP )
with a target (ZT ,AT ). One of the participating nuclei is excited as it passes through the
electromagnetic field of the other. Here we will only consider the excitation of the pro-
jectile as is of interest in studies carried out in heavy ion facilities around the world, e.g.
LNS/Catania, NSCL/MSU, GSI, GANIL, RIKEN, etc. In Coulomb excitation a virtual pho-
ton with energy E is absorbed by the projectile. Because in pure Coulomb excitation the
participating nuclei stay outside the range of the nuclear strong force, the excitation cross
section can be expressed in terms of the same multipole matrix elements that character-
ize excited-state gamma-ray decay, or the reduced transition probabilities, B(πλ; Ji → Jf).
Hence, Coulomb excitation amplitudes are strongly coupled with valuable nuclear struc-
ture information. Therefore, this mechanism has been used for many years to study the
electromagnetic properties of low-lying nuclear states [1].
Coulomb excitation cross sections are large if the adiabacity parameter satisfies the con-
dition
ξ = ωfi
< 1 , (1)
where a0 is half the distance of closest approach in a head-on collision for a projectile
velocity v, and Ex = ~ωfi is the excitation energy. This adiabatic cut-off limits the possible
excitation energies below 1-2 MeV in sub-barrier collisions. A possible way to overcome this
limitation, and to excite high-lying states, is to use higher projectile energies. In this case,
the closest approach distance, at which the nuclei still interact only electromagnetically, is
of order of the sum of the nuclear radii, R = RP + RT , where P refers to the projectile
and T to the target. For very high energies one has also to take into account the Lorentz
contraction of the interaction time by means of the Lorentz factor γ = (1− v2/c2)−1/2, with
c being the speed of light. For such collisions the adiabacity condition, Eq. (1), becomes
ξ(R) =
< 1 . (2)
From this relation one obtains that for bombarding energies around and above 100
MeV/nucleon, states with energy up to 10-20 MeV can be readily excited [3].
An appropriate description of Coulomb excitation at intermediate energies (Elab =
10 − 500 MeV/nucleon) has been described in ref. [2]. In this energy region neither the
non-relativistic Coulomb excitation formalism described in ref. [1], nor the relativistic one
formulated in refs. [3, 4] are appropriate. This is discussed in details in ref. [2] where it
is shown that the correct values of the Coulomb excitation cross sections differ by up to
30-40% when compared to the non-relativistic and relativistic treatments used to calculate
experimental observables (cross sections, gamma-ray angular distributions, etc.).
We follow the formalism of ref. [2] to calculate cross sections for Coulomb excitation from
energies varying from 10 to 500 MeV/nucleon. These are the energies where most radioactive
beam facilities are or will be operating around the world. The calculated cross sections will
be of useful guide for future experiments. We also compare the accurate calculations with
those obtained by using simple analytical formulas and test the regime of their validity.
The cross sections for the transition Ji → Jf in the projectile are calculated using the
equation [2]
dσi→f
4π2Z2T e
B(πλ, Ji → Jf)
(2λ+ 1)3
| S(πλ, µ) |2 , (3)
where π = E or M stands for the electric or magnetic multipolarity, and
B(πλ, Ji −→ Jf) =
2Ji + 1
|〈Jf ‖M(πλ)‖Ji〉|
are the reduced transition probabilities. In these equations, ǫ = 1/ sin(Θ/2), with Θ being
the deflection angle, a0 = ZPZT e
2/m0v
2 and a = a0/γ. The complex functions S(πλ, µ) are
integrals along Coulomb trajectories corrected for retardation. Their calculation and how
they relate to the non-relativistic and relativistic theories are described in details in ref. [2].
Here we will introduce another comparison tool for the total cross section, which is obtained
by integration of eq. 3 over scattering angles. The code COULINT [2] was used to calculate
the orbital integrals S(πλ, µ) and the cross sections of eq. 3 (for more details, see ref. [2]).
Using the theory described in ref. [4], it is easy to show that approximate values of the
cross sections for E1, E2, and M1 transitions can be obtained by means of the relations
(app)
B (E1)
ξK0K1 −
K21 −K
(app)
E3xB (E2)
K21 + ξ
K0K1 −
K21 −K
(app)
B (M1)
ξK0K1 −
K21 −K
, (5)
where Kn are the modified Bessel functions of the second order, as a function of ξ given by
eq. 2, with R corrected for recoil by the modification R → R + πa/2 [3].
Here we will only consider the excitation of the lowest lying states in light and medium
heavy nuclei. For nuclear masses A < 20, the TUNL nuclear data evaluation web site
was of great help [5]. The electromagnetic transition rates at the TUNL database are
given in Weisskopf units and are transformed to the appropriate B(πλ, Ii → If)-values
by means of the standard Weisskopf relations BW (E1; Ji → Jgs) = 0.06446A
2/3 e2fm2,
BW (E2; Ji → Jgs) = 0.05940A
4/3 e2fm4, and BW (M1; Ji → Jgs) = 1.79 (e~/2mnc)
. For
comparison, a few medium mass nuclei, as well as a few stable nuclei, were included in the
calculation. Other data were taken from refs. [6, 7, 8, 9].
Some cases of nuclei far from the stability line are very interesting and deserve further
study, possibly using the method of Coulomb excitation. For example, it is well known that
nuclei with open shells tend to have B(E2) values greater than 10 W.u., whereas nuclei with
shell closure of neutrons or protons tend to have distinctly smaller B(E2) values. Typical
examples of the latter category are the doubly magic nuclei, 16O and 48Ca, which B(E2)
values are 3.17 and 1.58 W.u., respectively. According to an empirical formula adjusted
to a global fit of the known transition rates, the values of first excited 2+ level, E2+ , and
B(E2; 0+ → 2+) are related by [10] (E2+ in keV)
B(E2; 0+ → 2+) = 26
A2/3E2+
e2fm4. (6)
The value of B(E2) for 16C based on this formula is at least one order of magnitude
larger than what is observed experimentally in a Coulomb dissociation experiment [9]. The
anomalously strong hindrance of the 16C transition is not well explained theoretically. This
is just an example of the power of Coulomb excitation as a tool to access the new physics
inherent of poorly known rare nuclear species.
Another example is the strong E1 transition in 11Be. 11Be is an archetype of a halo nu-
cleus and exhibits the fastest known dipole transition between bound states in nuclei. The
B(E1) transition strength between the ground and the only bound excited state (at 0.32
MeV) was determined from lifetime measurements by Millener et al. to be 0.116 e2fm2 [11].
However, Coulomb excitation experiments have obtained a much smaller value of the B(E1)
which is still a matter of investigation [12, 13, 14]. It is thus seems clear that predictions
based on traditional nuclear structure and reaction theory often yields results in disagree-
ment with experimental data. In spite of that, when proper corrections are accounted for
(e.g. channel-coupling, nuclear excitation, relativistic corrections), Coulomb excitation of ra-
dioactive beams is a powerful complementary tool to investigate electromagnetic properties
of nuclei far from the stability line.
In Table 1 we compare our calculations with several experimentally obtained cross sections
for Coulomb excitation of unstable nuclei. The units of energy are MeV, the laboratory
energy is in MeV/nucleon, the B-values are in units of e2 fm2λ, and the cross sections are
in millibarns. The last two columns give the calculated cross sections obtained by using
eqs. 3 and 5, respectively. Since the cross sections of eq. 5 are functions of the minimum
impact parameter, the values reported in the Table have been calculated according to the
experimental angular ranges reported in the seventh column. Except for the 11Be case, for
which the discrepancy between theory and experiment is known (see discussion above), the
calculated cross sections are close to the experimental values. Nonetheless, the calculated
cross sections tend to be smaller than the experimental ones for 17Ne, 32Mg, 38S, 40S, 42S,
44Ar, and 46Ar projectiles. This is worrisome because the B(πλ) values were extracted
from the experimentally obtained cross sections, using equations similar to eq. 5. These
experimental B-values would have to be larger by 10− 30% according to our calculations.
It is important to stress the fact that many experimental data on unstable nuclei collected
up to now have been analyzed by means of theoretical tools (DWBA and coupled-channels
codes) which do not include relativistic dynamics (the inclusion of relativistic kinematics
is straightforward). This problem was first addressed in ref. [15], where it was shown
that the analysis of experimental data at intermediate energies without a proper treatment
of relativistic dynamics leads to wrong values of electromagnetic transition probabilities.
We should stress that a full theoretical treatment of relativistic dynamics of strong and
electromagnetic interactions in many-body systems is very difficult and still does not exist
[15].
Data Projectile Target Elab πλ B(πλ) θrange Ex σexp σth σapp
1 [16] 11Be Pb 43. E1 0.115 < 5◦ 0.32 (1
) 191± 26 328. 323.
2 [16] 11Be Pb 59.4 E1 0.094 < 3.8◦ 0.32 (1
) 304± 43 213. 211.
3 [18] 11Be Au 57.6 E1 0.079 < 3.8◦ 0.32 (1
) 244± 31 170. 168.
4 [17] 11Be Pb 64. E1 0.099 < 3.8◦ 0.32 (1
) 302± 31 217. 215.
5 [19, 20] 17Ne Au 60. M1 0.163 < 4.5◦ 1.29 (1
) 12± 4 12.6 13.0
6 [6] 32Mg Pb 49.2 E2 454 < 4◦ 0.885 (0+ → 2+) 91.7± 14.4 137. 128.
7 [19] 38S Au 39.2 E2 235 < 4.1◦ 1.29 (0+ → 2+) 59± 7 48. 45.0
8 [19] 40S Au 39.5 E2 334 < 4.1◦ 0.91 (0+ → 2+) 94± 9 75.5 70.4
9 [19] 42S Au 40.6 E2 397 < 4.1◦ 0.89 (0+ → 2+) 128± 19 101. 94.3
10 [19] 44Ar Au 33.5 E2 345 < 4.1◦ 1.14 (0+ → 2+) 81± 9 62.3 58.3
11 [19] 46Ar Au 35.2 E2 196 < 4.1◦ 1.55 (0+ → 2+) 53± 10 40.9 38.2
12 [8] 46Ar Au 76.4 E2 212 < 2.9◦ 1.55 (0+ → 2+) 68± 8 50.0 47.4
Table 1. Cross sections for Coulomb excitation of unstable nuclei. The units of energy
are MeV, the laboratory energy is in MeV/nucleon, the B(πλ)-values are in units of e2fm2λ,
and the cross sections are in millibarns. The data for different experiments (numbered 1 to
12) were collected from the references listed in column 1. The last two columns give the
calculated cross sections obtained by using eqs. 3 and 5, respectively.
In figure 1 we show a comparison between the experimental data and our calculations.
We notice that the cross sections calculated with help of eq. 5 are not much different than
those calculated with eq. 3. They are systematically lower, up to 10%, than the exact
calculation following eq. 3. As we discuss below, this is not always the case, specially for
the excitation of high-lying states. In fact, this is a good check of eq. 3, which is done in a
very different way than the analytical calculations of eq. 5. But as we will see below, this
agreement is not always the case, specially when one includes small impact parameters for
which the sensitivity to the relativistic corrections is higher (see ref. [2]). The dashed curve
in figure 1 is a guide to the eye. It helps to see that the experimental cross sections are on
average larger than the calculated ones, either with eq. 3 (open circles), or with eq. 5 (open
triangles).
2 4 6 8 10 12
Data set
FIG. 1: Comparison between experimental Coulomb excitation cross sections (solid stars with
error bars) and theoretical ones, calculated either with eq. 3 (open circles), or with eq. 5 (open
triangles).
Ex [MeV] J
i → J
f πλ B(πλ) [e
2 fm2λ] 10 20 30 50 100 200 500
11Be 0.32 1
E1 0.115 1128 653 473 315 187 115 69.6
11B 2.21 3
M1 2.40×10−2 0.301 0.799 1.15 1.63 2.33 3.08 4.17
11C 2.00 3
M1 1.52×10−2 0.196 0.551 0.793 1.12 1.57 2.07 2.76
12B 0.953 1+ → 2+ M1 4.62×10−3 0.227 0.395 0.490 0.607 0.762 0.917 1.13
12C 4.44 0+ → 2+ E2 37.9 34.6 38.6 31.3 21.6 12.1 6.93 3.81
13C 3.09 1
E1 1.39×10−2 8.37 11.3 11.0 9.61 7.28 5.39 3.89
13N 2.37 1
E1 3.56×10−2 38.2 43.6 39.6 32.5 23.2 16.4 11.4
15C 0.74 1
E2 2.90 8.79 4.04 2.65 1.59 0.839 0.475 0.267
16C 1.77 0+ → 2+ E2 2.12 8.81 4.41 2.92 1.76 0.920 0.517 0.285
16N 0.12 0− → 2− E2 10.2 31.0 14.1 9.21 5.53 2.91 1.64 0.926
17N 1.37 1
M1 5.15×10−3 0.153 0.304 0.397 0.516 0.680 0.848 1.09
17O 0.87 5
E2 2.07 6.30 2.88 1.87 1.12 0.588 0.332 0.184
17F 0.5 5
E2 21.6 68.3 29.7 19.3 11.6 6.08 3.44 1.92
18O 1.98 0+ → 2+ E2 44.8 109 60.7 40.9 24.8 11.6 7.27 3.99
18F 0.94 1+ → 3+ E2 37.9 115 52.5 34.1 20.4 10.7 6.01 3.34
18Ne 1.89 0+ → 2+ E2 248 615 342 229 138 72.0 40.1 22.1
19O 0.1 5
M1 2.34×10−4 0.0495 0.0615 0.0673 0.0737 0.0799 0.779 0.799
19F 0.11 1
E1 5.51×10−4 8.07 4.36 3.06 1.97 1.10 0.592 0.337
19Ne 0.24 1
E2 119 361 157 102 61.6 32.5 18.5 10.5
20O 1.67 0+ → 2+ E2 28.0 72 37.4 24.9 15.1 7.86 4.41 2.43
20F 0.656 2+ → 3+ M1 3.56×10−3 0.237 0.385 0.465 0.560 0.683 0.803 0.959
20Ne 1.63 0+ → 2+ E2 319 834 433 287 173 89.8 50.3 27.6
30Ne 0.791 0+ → 2+ E2 460 1167 550 361 218 115 65.0 35.2
32Mg 0.885 0+ → 2+ E2 454 1151 541 355 214 112 63.0 36.7
42S 0.89 0+ → 2+ E2 397 945 445 292 175 91.9 52 29.7
46Ar 1.55 0+ → 2+ E2 190 399 209 140 84.4 44.1 24.7 13.6
54Ni 1.40 0+ → 2+ E2 626 1319 677 447 268 139 78.1 43.1
Table 2 - Cross sections (in mb) for Coulomb excitation of projectiles incident on Pb
targets at bombarding energies ranging from 10 to 500 MeV/nucleon. The energy units are
MeV, the laboratory energy is in MeV/nucleon, the B(πλ)-values are in units of e2fm2λ.
The cross sections for Coulomb excitation of numerous projectiles incident on Pb targets
at bombarding energies ranging from 10 to 500 MeV/nucleon are shown in Table 2. These
cross sections were calculated assuming that the detectors collect events from all possible
Coulomb scattering events. In a real experimental situation, the angular distribution is
restricted to angular windows, reducing the available cross sections. Only the lowest lying
transitions have been considered, i.e. from the ground to the first excited states. One
observes that some cross sections are very large, specially for 11Be, 18Ne, 30Ne and 54Ni.
For these and other similar cases, the measurements are easy to perform, with a large
number of events/second even with modest intensities. Cases such as 16C are well within
the experimental possibilities in most radioactive beam facilities.
Table 2 also shows that, except for M1 excitations, the Coulomb excitation cross sections
decrease steadily as the energy increases from 10 to 500 MeV/nucleon. Based on these
numbers alone, one could conclude that Coulomb excitation of low-lying states (in contrast
to the case of high-lying states, e.g. giant resonances [4]) are better suited for studies at low
energies. However, reactions at lower energies while are less influenced by contamination due
to nuclear breakup [12, 14] can give rise to large high-order effects [21]. The interpretation
of data could be distorted as in the case of Coulomb dissociation of 8B at low energy [24],
which was completely misinterpreted in terms of first-order calculations. In some situations,
when higher-order effects are relevant, the effect of the nuclear breakup cannot be neglected
either [22, 23]. Thus, the choice of the incident energy would depend on the experimental
conditions. Identification of gamma-rays from de-excitation using Doppler shift techniques
are often more advantageous at higher energies. Moreover, except for few cases (e.g. 11C), the
magnetic dipole transitions are much smaller than those for E1 and E2 transitions. Even
for M1 transitions the measurements are under the possibility of most new experimental
facilities.
The comparison of the exact calculations, using eq. 3 (solid lines), and the approximations
5 (dashed lines) are shown in figs. 2(a-d), for 11Be, 11B, 54Ni and 16O, respectively. The 16O
case (as well as for 12C in Table 2) was included for comparison, with a high-lying excited
state. We see from figs 2a and 2b that the approximations in eq. 5 work quite well for
the M1 multipolarity and reasonably well (within 20% at 10 MeV/nucleon and 5% at 50
MeV/nucleon) for the E1 cases. But they fail badly at low and intermediate energies for
(c) (d)
FIG. 2: Coulomb excitation cross section of the first excited state in 11Be, 11B and 54Ni and of the
13.05 MeV sate in 16O projectiles incident on Pb targets as a function of the laboratory energy.
the E2 ( fig. 2c). The reason is that the E2 Coulomb field (“tidal field”) is very sensitive
to the details of the collision dynamics at low energies. These conclusions can be deceiving
since even for the E1 and M1 cases the approximations in eq. 5 may strongly differ from
the exact calculations if the excitation energy is large (see discussion in ref. [2]). This is
shown in figure 2d, where we plot the Coulomb excitation cross section of the Ex = 13.09
MeV state in 16O. In this case, the cross sections based on eq. 5 is a factor of 10 smaller
than the exact calculation at 10 MeV/nucleon. At 100 MeV/nucleon this difference drops
to 10%, which still needs to be considered with care.
In summary, in this article we have used the formalism of ref. [2] to predict the cross
sections for Coulomb excitation of several light projectiles with electromagnetic transitions
found in the literature, listed in the TUNL database [5], and for a few other selected cases.
These estimates will be useful for planing Coulomb excitation experiments at present and
future heavy ion facilities. It is evident that the inclusion of relativistic effects combined
with Coulomb distortion are of the utmost relevance. The cross section inferred by using
non-relativistic or pure relativistic treatments can be wrong by up to 30% even at 100
MeV/nucleon, as shown here and in ref. [2]. Finally, the use of Coulomb excitation to
produce nuclei in high-lying states is an important tool to study particle emission processes.
For example, the excitation of 18Ne and its subsequent decay by two-proton emission is a
process of large theoretical and experimental interest. Experimental work in this direction
is in progress [25].
Acknowledgments
This research was supported by the U.S. Department of Energy under contract No. DE-
AC05-00OR22725 (Oak Ridge National Laboratory) with UT-Battelle, LLC., and by DE-
FC02-07ER41457 with the University of Washington (UNEDF, SciDAC-2).
[1] K. Alder and A. Winther, Electromagnetic Excitation, North-Holland, Amsterdam, 1975.
[2] C.A. Bertulani, A.E. Stuchbery, T.J. Mertzimekis and A.D. Davies, Phys. Rev. C 68 (2003)
044609.
[3] A. Winther and K. Alder, Nucl. Phys. A 319 (1979) 518.
[4] C.A. Bertulani and G. Baur, Nucl. Phys. A 442 (1985) 739.
[5] TUNL Nuclear Data Project: http://www.tunl.duke.edu/nucldata/index.shtml
[6] T. Motobayashi et al., Phys. Lett. B 346 (1995) 9.
[7] H. Scheit et al., Phys. Rev. Lett. 77 (1996) 3967.
[8] A. Gade et al., Phys. Rev. C 68 (2003) 014302.
[9] N. Imai et al, Phys. Rev. Lett. 92 (2004) 062501.
[10] S. Raman, C.W. Nestor, Jr., and K. H. Bhatt, Phys. Rev. C 37, 805 (1988).
[11] D. J. Millener, J. W. Olness, E. K. Warburton, and S. S. Hanna, Phys. Rev. C 28 (1983) 497.
[12] C. A. Bertulani, L. F. Canto, and M. S. Hussein, Phys. Lett. B 353 (1995) 413.
[13] M. S. Hussein, R. Lichtenthäler, F. M. Nunes, and I. J. Thompson, Phys. Lett. B 640 (2006)
http://www.tunl.duke.edu/nucldata/index.shtml
[14] R. Chatterjee, [Los Alamos archiive: nucl-th/0703083], 2007.
[15] C.A. Bertulani, Phys. Rev. Lett. 94 (2005) 072701.
[16] R. Anne et al., Z. Phys. A 352 (1995) 397.
[17] T. Nakamura et al., Phys. Lett. B 394 (1997) 11.
[18] M. Fauerbach et al., Phys. Rev. C 56 (1997) R1.
[19] M.J. Chromik et al., Phys. Rev C 55 (1997) 1676.
[20] M.J. Chromik et al., Phys. Rev C 66 (2002) 024313.
[21] C.A.Bertulani and L.F.Canto, Nucl. Phys. A 539 (1992) 163; G.F. Bertsch and C.A. Bertulani,
Nucl. Phys. A 556 (1993) 136.
[22] C.A. Bertulani and M. Gai, Nucl. Phys. A 636 (1998) 227.
[23] C.H. Dasso, S.M. Lenzi, A. Vitturi, Nucl.Phys. A 639 (1998) 635.
[24] J. von Schwarzenberg, J.J. Kolata, D. Peterson, P. Santi, and M. Belbot, Phys. Rev. C 53,
R2598 (1996).
[25] E. Rapisarda, G. Cardella, F. Amorini, L. Calabretta, M. De Napoli, P.Figuera, G. Raciti, F.
Rizzo, D. Santonocito and C. Sfienti, 7th Int. Conf. on Radioactive Nuclear Beams, Cortina
d’Ampezzo, Italy, July 3 - 7, 2006.
http://arxiv.org/abs/nucl-th/0703083
	References
ABSTRACT
  We investigate the Coulomb excitation of low-lying states of unstable nuclei
in intermediate energy collisions ($E_{lab}\sim10-500$ MeV/nucleon). It is
shown that the cross sections for the $E1$ and $E2$ transitions are larger at
lower energies, much less than 10 MeV/nucleon. Retardation effects and Coulomb
distortion are found to be both relevant for energies as low as 10 MeV/nucleon
and as high as 500 MeV/nucleon. Implications for studies at radioactive beam
facilities are discussed.

<|endoftext|><|startoftext|>
Introduction.
2. Preliminaries.
3. Analytic families of the generalized cosine transforms.
4. Positive definite homogeneous distributions.
5. λ-intersection bodies.
6. Examples of λ-intersection bodies.
7. (q, ℓ)-balls.
8. The generalized cosine transforms and comparison of volumes.
9. Appendix.
2000 Mathematics Subject Classification. Primary 44A12; Secondary 52A38.
Key words and phrases. Spherical Radon transforms, cosine transforms, inter-
section bodies.
The research was supported in part by the NSF grant DMS-0556157 and the
Louisiana EPSCoR program, sponsored by NSF and the Board of Regents Support
Fund.
http://arxiv.org/abs/0704.0061v2
2 BORIS RUBIN
1. Introduction
This is an updated and extended version of our previous preprint
[R5].
Intersection bodies interact with Radon transforms and encompass
diverse classes of geometric objects associated to sections of star bodies.
The concept of intersection body was introduced in the remarkable
paper by Lutwak [Lu] and led to a breakthrough in the solution of the
long-standing Busemann-Petty problem; see [G], [K4], [Lu], [Z2] for
references and historical notes.
We remind some known facts that will be needed in the following.
An origin-symmetric (o.s.) star body in Rn, n ≥ 2, is a compact set
K with non-empty interior such that tK ⊂ K ∀t ∈ [0, 1], K = −K,
and the radial function ρK(θ) = sup{λ ≥ 0 : λθ ∈ K} is continuous
on the unit sphere Sn−1. In the following, Kn denotes the set of all
o.s. star bodies in Rn, Gn,i is the Grassmann manifold of i-dimensional
linear subspaces of Rn, and voli(·) denotes the i-dimensional volume
function. The Minkowski functional of a body K ∈ Kn is defined by
||x||K = min{a ≥ 0 : x ∈ aK}, so that ||θ||K = ρ−1K (θ), θ ∈ Sn−1.
Definition 1.1. [Lu] A body K ∈ Kn is an intersection body of a body
L ∈ Kn if ρK(θ) = voln−1(L ∩ θ⊥) for every θ ∈ Sn−1, where θ⊥ is the
central hyperplane orthogonal to θ.
By taking into account that voln−1(L ∩ θ⊥) in Definition 1.1 is a
constant multiple of the Minkowski-Funk transform
(Mf)(θ) =
Sn−1∩θ⊥
f(u) dθu, f(u) = ρ
L (u),
Goodey, Lutwak and Weil [GLW] generalized Definition 1.1 as follows.
Definition 1.2. A body K ∈ Kn is an intersection body if ρK = Mµ
for some even non-negative finite Borel measure µ on Sn−1.
A sequence of bodies Kj ∈ Kn is said to be convergent to K ∈ Kn
in the radial metric if lim
||ρKj − ρK ||C(Sn−1) = 0.
Proposition 1.3. The class of intersection bodies is the closure of the
class of intersection bodies of star bodies in the radial metric.
Proposition 1.4. If K is an intersection body in Rn, n > 2, then for
every i = 2, 3, . . . , n − 1 and every η ∈ Gn,i, K ∩ η is an intersection
body in η.
Regarding these two important propositions see [FGW], [GW] and
a nice historical survey in [G].
INTERSECTION BODIES 3
Different generalizations of the concept of intersection body associ-
ated to lower dimensional sections were suggested in the literature; see,
e.g., [K4], [RZ], [Z1]. The following one, which plays an important role
in the study of the lower dimensional Busemann-Petty problem, is due
to Zhang [Z1].
Definition 1.5. We say, that a body K ∈ Kn belongs to Zhang’s class
Zni if there is a non-negative finite Borel measure m on the Grassmann
manifold Gn,i such that ρ
K = R
im, where R
i is the dual spherical
Radon transform; see (2.2), (2.5).
Another generalization was suggested by Koldobsky [K2] and de-
scribed in detail in [K4]. This class of bodies will be our main concern.
Definition 1.6. [K4, p. 71] A body K ∈ Kn is a k-intersection body
of a body L ∈ Kn (we write K = IBk(L)) if
(1.1) volk(K ∩ ξ) = voln−k(L ∩ ξ⊥) ∀ξ ∈ Gn,k.
We denote by IBk,n the set of all bodies K ∈ Kn satisfying (1.1) for
some L ∈ Kn.
When k = 1, this definition coincides with Definition 1.1 up to a
constant multiple. An analog of Definition 1.2 was given in the Fourier
analytic terms as follows.
Definition 1.7. [K4, Definition 4.7] A body K ∈ Kn is a k-intersection
body if there is a non-negative finite Borel measure µ on Sn−1, so that
for every Schwartz function φ,
||x||−kK φ(x) dx =
tk−1φ̂(tθ) dt
dµ(θ),
where φ̂ denotes the Fourier transform of φ.
The set of all k-intersection bodies in Rn will be denoted by Ink .
Keeping in mind Proposition 1.3 for k = 1, one can alternatively
define the class Ink as a closure of IBk,n in the radial metric; cf. [Mi1,
p. 532]. However, to apply results from [K4] to such class, equivalence
of this definition to Definition 1.7 must be proved. We will do this in
the more general situation in Section 5.2.
From Definitions 1.6 and 1.7 it is not clear, for which bodies L ∈ Kn
the relevant k-intersection body K = IBk(L) does exist. It is also not
obvious which bodies actually constitute the class Ink . The following
important characterization is due to Koldobsky.
4 BORIS RUBIN
Theorem 1.8. [K4, Theorem 4.8] A body K ∈ Kn is a k-intersection
body if and only if || · ||−kK represents a positive definite tempered dis-
tribution on Rn, that is, the Fourier transform (|| · ||−kK )∧ is a positive
tempered distribution on Rn.
The concept of k-intersection body is related to another important
development. For K ∈ Kn, the quasi-normed space (Rn, || · ||K) is said
to be isometrically embedded in Lp, p > 0, if there is a linear operator
T : Rn → Lp([0, 1]) so that ||x||K = ||Tx||Lp([0,1]).
Theorem 1.9. [K4, Theorem 6.10] The space (Rn, || · ||K) embeds iso-
metrically in Lp, p > 0, p 6= 2, 4, . . . , if and only if Γ(−p/2)(|| · ||pK)∧
is a positive distribution on Rn \ {0}.
Following Theorems 1.9 and 1.8, one can formally say that K ∈ Ink if
and only if (Rn, || · ||K) embeds isometrically in L−k. This observation,
combined with Definition 1.7, was used by A. Koldobsky to define the
concept of “isometric embedding in Lp” for negative p.
Definition 1.10. [K4, Definition 6.14] Let 0 < p < n, K ∈ Kn. The
space (Rn, || · ||K) is said to be isometrically embedded in L−p if there
is a non-negative finite Borel measure µ on Sn−1, so that for every
Schwartz function φ,
||x||−pK φ(x) dx =
tp−1φ̂(tθ) dt
dµ(θ),
where φ̂ denotes the Fourier transform of φ.
Origin-symmetric bodiesK in this definition can be regarded as “unit
balls of n-dimensional subspaces of L−p”. Comparing Definitions 1.10
and 1.7, one might call these bodies “p-intersection bodies”. Since the
meaning of the space L−p itself is not specified in Definition 1.10 and
since our paper is mostly focused on geometric properties of bodies
(rather than embeddings in Lp), in the following we prefer to adopt
another name “λ-intersection body”, where λ is a real number, that
will be specified in due course. We denote the set of all λ-intersection
bodies in Rn by Inλ .
Contents of the paper. We will focus on intimate connection
between intersection bodies, spherical Radon transforms, and general-
ized cosine transforms; see definitions in Section 2.2. This approach
is motivated by the fact that the volume of a central cross section of
a star body is expressed through the spherical Radon transform, and
the latter is a member of the analytic family of the generalized cosine
transforms. These transforms were introduced by Semyanistyi [Se] and
INTERSECTION BODIES 5
arise (up to naming and normalization) in different contexts of analysis
and geometry; see, e.g., [K4], [R1]-[RZ], [Sa2], [Sa3], [Str1], [Str2].
Sections 2-4 provide analytic background for geometric considera-
tions in Sections 5-7. In Section 2 we establish our notation and define
the generalized cosine transforms on the sphere and the relevant dual
transforms on Grassmann manifolds. In Section 3 we present basic
properties of these transforms, establish new relations between spheri-
cal Radon transforms and the generalized cosine transforms, and prove
“restriction theorems”, which are akin to trace theorems in Sobolev
spaces. Section 4 deals with positive definite homogeneous distribu-
tions, that can be characterized in terms of the generalized cosine
transforms. This section serves as a preparation for the forthcoming
definition of the concept of λ-intersection body. We investigate which
λ’s are appropriate and why. In Section 5 we switch to geometry and
define the class Inλ of λ-intersection bodies. The case 0 < λ < n cor-
responds to the “unit balls of L−p-spaces” in the spirit of Definition
1.10. The reader will find in this section new proofs of some known
facts. We introduce the notion of λ-intersection body of a star body
in Rn, which extends Definition 1.6 to all λ < n, λ 6= 0. The class
of all such bodies will be denoted by IBnλ . We will prove that for
all λ < n, λ 6= 0,−2,−4, . . . , the class Inλ is the closure of IBnλ in
the radial metric. The case λ = 1 gives Proposition 1.3. It will be
proved that all m-dimensional central sections of λ-intersection bod-
ies are λ-intersection bodies in the corresponding m-planes provided
λ < m, λ 6= 0.
The natural question arises: How to construct λ-intersection bodies?
In Section 6 we give a series of examples; some of them are known and
some are new. They can be obtained by utilizing auxiliary statements
from Section 3. In particular, the famous embedding of Zhang’s class
Znn−k into Ink , which was first established in [K3] and studied in [Mi1],
[Mi2], will be generalized to the case, when k is replaced by any λ ∈
(0, n). Section 7 is devoted to the so called (q, ℓ)-balls, defined by
Bnq,ℓ = {x = (x′, x′′) : |x′|q + |x′′|q ≤ 1; x′ ∈ Rn−ℓ, x′′ ∈ Rℓ}, q > 0.
We show that if 0 < q ≤ 2, then Bnq,ℓ ∈ Inλ for all λ ∈ (0, n). If
q > 2 and n − 3 ≤ λ < n, we still have Bnq,ℓ ∈ Inλ . If q > 2 and
0 < λ < λ0 = max(n − ℓ, ℓ) − 2, then Bnq,ℓ 6∈ Inλ . The case, when
q > 2, ℓ > 1, and λ0 ≤ λ < n− 3 represents an open problem.
In Section 8 we remind the generalized Busemann-Petty problem
(GBP) for i-dimensional central sections of o.s. convex bodies in Rn.
This challenging problem is still open for i = 2 and i = 3 (n ≥ 5).
It actually inspires the whole investigation. Using properties of the
6 BORIS RUBIN
generalized cosine transforms, we give a short direct proof of the fact
that an affirmative answer to GBP implies that every smooth o.s. con-
vex body in Rn with positive curvature is an (n− i)-intersection body.
This fact was discovered by A. Koldobsky. The original proof in [K3]
is based on the embedding Inn−i ⊂ Zni and Zhang’s result [Z1, Theorem
6]. The latter heavily relies on the Hahn-Banach separation theorem.
Our proof is more constructive and almost self-contained. We conclude
the paper by Appendix, which is added for convenience of the reader.
The list of references at the end of the paper is far from being com-
plete. Further references can be found in cited books and papers.
Acknowledgement. I am grateful to Professor Alexander Koldob-
sky, who shared with me his knowledge of the subject. Special thanks
go to Professors Erwin Lutwak, Deane Yang, and Gaoyong Zhang for
useful discussions.
2. Preliminaries
2.1. Notation. In the following, N = {1, 2, . . . } is the set of all nat-
ural numbers, Sn−1 is the unit sphere in Rn with the area σn−1 =
2πn/2/Γ(n/2); Ce(S
n−1) is the space of even continuous functions on
Sn−1; SO(n) is the special orthogonal group of Rn; for θ ∈ Sn−1 and
γ ∈ SO(n), dθ and dγ denote the relevant invariant probability mea-
sures; D(Sn−1) is the space of C∞-functions on Sn−1 equipped with
the standard topology, and D′(Sn−1) stands for the corresponding dual
space of distributions. The subspaces of even test functions (distribu-
tions) are denoted by De(Sn−1) ( D′e(Sn−1)); Gn,i denotes the Grass-
mann manifold of i-dimensional subspaces ξ of Rn with the SO(n)-
invariant probability measure dξ; D(Gn,i) is the space of infinitely dif-
ferentiable functions on Gn,i.
We write M(Sn−1) and M(Gn,i) for the spaces of finite Borel mea-
sures on Sn−1 and Gn,i; M+(Sn−1) and M+(Gn,i) are the relevant
spaces of non-negative measures; Me+(Sn−1) denotes the space of even
measures µ ∈ M+(Sn−1). Given a function ϕ on Gn,i, we denote
ϕ⊥(η) = ϕ(η⊥), η ∈ Gn,n−i. Similarly, given a measure µ ∈ M(Gn,n−i),
the corresponding “orthogonal measure” µ⊥ in M(Gn,i) is defined by
(µ⊥, ϕ) = (µ, ϕ⊥), ϕ ∈ C(Gn,i).
Let {Yj,k} be an orthonormal basis of spherical harmonics on Sn−1.
Here j = 0, 1, 2, . . . , and k = 1, 2, . . . , dn(j), where dn(j) is the di-
mension of the subspace of spherical harmonics of degree j. Each
function ω ∈ D(Sn−1) admits a decomposition ω =
j,k ωj,kYj,k with
the Fourier-Laplace coefficients ωj,k =
ω(θ)Yj,k(θ)dθ, which decay
rapidly as j → ∞. Each distribution f ∈ D′(Sn−1) can be defined by
INTERSECTION BODIES 7
(f, ω) =
j,k fj,kωj,k where fj,k = (f, Yj,k) grow not faster than j
m for
some integer m. We will need the Poisson integral, which is defined for
f ∈ L1(Sn−1) by
(2.1) (Πtf)(θ) = (1− t2)
f(u)|θ − tu|−ndu, 0 < t < 1,
and has the Fourier-Laplace decomposition Πtf =
j,k t
jfj,kYj,k [SW].
For f ∈ D′(Sn−1), this decomposition serves as a definition of Πtf . For
harmonic analysis on the unit sphere, the reader is referred to [Le],
[Mü], [Ne], [SW], and a survey article [Sa3].
2.2. Basic integral transforms. For integrable functions f on Sn−1
and ϕ onGn,i, 1 ≤ i ≤ n−1, the spherical Radon transform (Rif)(ξ), ξ ∈
Gn,i, and its dual (R
iϕ)(θ), θ ∈ Sn−1, are defined by
(2.2) (Rif)(ξ) =
θ∈Sn−1∩ξ
f(θ) dξθ, (R
iϕ)(θ) =
ϕ(ξ) dθξ,
where dξθ and dθξ denote the probability measures on the manifolds
Sn−1 ∩ ξ and {ξ ∈ Gn,i : ξ ∋ θ}, respectively. The precise meaning of
the second integral is
(2.3) (R∗iϕ)(θ) =
SO(n−1)
ϕ(rθγp0) dγ, θ ∈ Sn−1,
where p0 is an arbitrarily fixed coordinate i-plane containing the north
pole en and rθ ∈ SO(n) is a rotation satisfying rθen = θ.
Operators Ri and R
i extend to finite Borel measures in a canonical
way, using the duality
(2.4)
(Rif)(ξ)ϕ(ξ)dξ =
f(θ)(R∗iϕ)(θ)dθ.
Specifically, for µ ∈ M(Sn−1) and m ∈ M(Gn,i), we define Riµ ∈
M(Gn,i) and R∗im ∈ M(Sn−1) by
(2.5) (Riµ, ϕ)=
(R∗iϕ)(θ)dµ(θ), (R
im, f)=
(Rif)(ξ)dm(ξ),
where ϕ ∈ C(Gn,i), f ∈ C(Sn−1).
The generalized cosine transforms are defined by
(2.6) (Rαi f)(ξ) = γn,i(α)
|Prξ⊥θ|α+i−n f(θ) dθ,
(2.7) (
αϕ)(θ) = γn,i(α)
|Prξ⊥θ|α+i−n ϕ(ξ) dξ,
8 BORIS RUBIN
γn,i(α) =
σn−1 Γ((n− α− i)/2)
2π(n−1)/2 Γ(α/2)
, Re α > 0, α+i−n 6= 0, 2, 4, . . . .
Here Prξ⊥θ stands for the orthogonal projection of θ onto ξ
⊥, the or-
thogonal complement of ξ ∈ Gn,i. If f and ϕ are smooth enough, then
integrals (2.2) can be regarded (up to a constant multiple) as members
of the relevant analytic families (2.6) and (2.7); cf. Lemma 3.1. The
particular case i = n − 1 in (2.2) corresponds to the Minkowski-Funk
transform
(2.8) (Mf)(u) =
{θ : θ·u=0}
f(θ) duθ = (Rn−1f)(u
⊥), u ∈ Sn−1,
which integrates a function f over great circles of codimension 1. This
transform is a member of the analytic family
(2.9) (Mαf)(u) = (Rαn−1f)(u
⊥) = γn(α)
f(θ)|θ · u|α−1 dθ,
(2.10) γn(α)=
σn−1 Γ
(1−α)/2
2π(n−1)/2Γ(α/2)
, Re α>0, α 6=1, 3, 5, . . . .
The values α = 1, 3, 5, . . . are poles of the Gamma function Γ((1−α)/2).
In some occasions we include these values into consideration and set
(2.11) (M̃αf)(u) =
f(θ)|θ · u|α−1 dθ.
Historical notes. Regarding spherical Radon transforms (2.2) and
the Minkowski-Funk transform (2.8), see [GGG], [He], [R2], [R3]. The
first detailed investigation of the analytic family {Mα} is due to Se-
myanistyi [Se], who showed that these operators naturally arise in the
Fourier analysis of homogeneous functions. The case α = 2 in (2.11)
was known before, thanks to W. Blaschke, A.D. Alexandrov, and P.
Lévy. Integrals (2.9) (sometimes with different normalization) arise
in diverse areas of analysis and geometry; see [K4], [R1] - [R3], [Sa3],
[Str1], and references therein. In convex geometry and Banach space
theory, operators (2.11) with α − 1 replaced by p are known as the p-
cosine transforms. More general analytic families (2.6) and (2.7) were
introduced in [R2].
INTERSECTION BODIES 9
3. Analytic Families of the Generalized Cosine
Transforms
3.1. Basic properties. Below we review basic properties of integrals
(2.6), (2.7), (2.9); see [R2], [R3] for more details. For integrable func-
tions f and ϕ and Reα > 0, integrals (2.6), (2.7) and (2.9) are ab-
solutely convergent. When f and ϕ are infinitely differentiable, these
integrals extend meromorphically to all α ∈ C.
Lemma 3.1. If f and ϕ are continuous functions, then
Rαi f = R
i f = ciRif, ci =
2π(i−1)/2
;(3.1)
0ϕ = ciR
iϕ,(3.2)
Mαf = M0f = cn−1Mf, cn−1 =
2π(n−2)/2
.(3.3)
Hence, the Radon transform, its dual, and the Minkowski-Funk trans-
form can be regarded (up to a constant multiple) as members of the
corresponding analytic families {Rαi }, {
α}, {Mα}.
Proof. Formulas (3.2) and (3.3) follow from (3.1). To prove (3.1), we
write (2.6) in bi-spherical coordinates θ = u sin ψ + vcosψ, where
u ∈ Sn−1 ∩ ξ ∼ Si−1, v ∈ Sn−1 ∩ ξ⊥ ∼ Sn−i−1, 0 ≤ ψ ≤ π/2.
dθ = c sini−1 ψ cosn−i−1ψ dψdudv, c = σi−1σn−i−1/σn−1.
This gives
(Rαi f)(ξ) = c γn,i(α)
∫ π/2
sini−1 ψ cosα−1ψ dψ
Sn−1∩ξ⊥
Sn−1∩ξ
f(u sin ψ+vcosψ) du
ci(α)
Γ(α/2)
tα/2−1F (t) dt,
where
ci(α) =
c γn,i(α) Γ(α/2)
σi−1σn−i−1
Γ((n− α− i)/2)
2π(n−1)/2
2π(i−1)/2
as α → 0, and
F (t) = (1− t2)i/2−1
Sn−1∩ξ⊥
Sn−1∩ξ
1− t2+vt) du.
10 BORIS RUBIN
Since
Γ(α/2)
tα/2−1F (t) dt = F (0) =
Sn−1∩ξ
f(u)du = (Rif)(ξ),
we are done. �
Analytic continuation of integrals (2.9) can be realized in spherical
harmonics as Mαf=
mj,αfj,kYj,k, where
(3.4) mj,α=
(−1)j/2
Γ(j/2 + (1− α)/2)
Γ(j/2 + (n− 1 + α)/2)
if j is even,
0 if j is odd;
see [R1], [R3]. If f ∈D′(Sn−1), then Mαf is a distribution defined by
(Mαf, ω)=(f,Mαω)=
mj,α fj,k ωj,k, ω∈D(Sn−1); α 6=1, 3, 5, . . . .
Lemma 3.2. Let α, β ∈ C; α, β 6= 1, 3, 5, . . . . If α + β = 2 − n and
f ∈ De(Sn−1) (or f ∈ D′e(Sn−1)), then
(3.5) MαMβf = f.
If α, 2−n−α 6= 1, 3, 5, . . ., then Mα is an automorphism of the spaces
De(Sn−1) and D′e(Sn−1).
Proof. The equality (3.5) is equivalent to mj,αmj,β = 1, α+β = 2−n.
The latter follows from (3.4). The second statement is a consequence of
the standard theory of spherical harmonics [Ne], because the Fourier-
Laplace multiplier mj,α has a power behavior as j → ∞. �
Corollary 3.3. The Minkowski-Funk transform on the spaces De(Sn−1)
and D′e(Sn−1) can be inverted by the formula
(3.6) (M)−1 = cn−1M
2−n, cn−1 =
2π(n−2)/2
Note that there is a wide variety of diverse inversion formulas for
the Minkowski-Funk transform (see [GGG], [He], [R3] and references
therein), but all of them are, in fact, different realizations of (3.6),
depending on classes of functions.
3.2. Auxiliary statements. We establish some connections between
operator families defined above.
INTERSECTION BODIES 11
Lemma 3.4. Let α, β ∈ C; α, β 6= 1, 3, 5, . . . . If Reα > Reβ, then
Mα =MβAα,β, where Aα,β is a spherical convolution operator with the
Fourier-Laplace multiplier
(3.7) aα,β(j) =
Γ(j/2 + (1− α)/2)
Γ(j/2 + (n− 1 + α)/2)
Γ(j/2 + (n− 1 + β)/2)
Γ(j/2 + (1− β)/2)
so that aα,β(j) ∼ (j/2)β−α as j → ∞. If α and β are real numbers
satisfying α > β > 1− n, α + β < 2, then Aα,β is an integral operator
such that Aα,βf ≥ 0 for every non-negative f ∈ L1(Sn−1).
Proof. The first statement follows from (3.4). To prove the second one,
we consider integral operators
+ f)(x) =
Γ(µ/2)
(1− t2)µ/2−1(Πtf)(x) tn−νdt,(3.8)
− f)(x) =
Γ(µ/2)
(t2 − 1)µ/2−1(Π1/tf)(x) t1−νdt,(3.9)
expressed through the Poisson integral (2.1). The Fourier-Laplace mul-
tipliers of Q
+ and Q
− are
(3.10) q̂
+ (j)=
Γ((j+n−ν+1)/2)
Γ((j+n−ν+1+µ)/2)
− (j)=
Γ((j+ν−µ)/2)
Γ((j+ν)/2)
They can be easily computed by taking into account that Πt ∼ tj in
the Fourier-Laplace terms. If f ∈ L1(Sn−1) and 0 < µ < ν < n, then
integrals (3.8) and (3.9) are absolutely convergent and obey Q
± f ≥ 0
when f ≥ 0. Comparing (3.10) and (3.7), we obtain a factorization
Aα,β = Q
α−β,1−β
α−β,1−β
− (set µ = α − β, ν = 1 − β), which implies
the second statement of the lemma. �
It is convenient to introduce a special notation for the spherical
Radon transform and the generalized cosine transform with orthogonal
argument. Assuming ξ ∈ Gn,i, we denote
(3.11) (Rn−i,⊥f)(ξ) = (Rn−if)(ξ
⊥), (Rαn−i,⊥f)(ξ) = (R
n−if)(ξ
Lemma 3.5. Let f ∈ L1(Sn−1), Re α > 0; α 6= 1, 3, 5, . . . . Then
(3.12) (RiM
αf)(ξ) = c (Rα+i−1n−i,⊥ f)(ξ), ξ ∈ Gn,i, c =
2π(i−1)/2
or (replace i by n− i)
(3.13) (Rn−i,⊥M
αf)(ξ) =
2π(n−i−1)/2
σn−i−1
(Rα+n−i−1i f)(ξ).
If f ∈ De(Sn−1), then (3.12) and (3.13) extend to Reα ≤ 0 by analytic
continuation.
12 BORIS RUBIN
Proof. For Reα > 0,
αf)(ξ) = γn(α)
Sn−1∩ξ
f(θ)|θ · u|α−1 dθ.
Since |θ · u| = |Prξθ||vθ · u| for some vθ ∈ Sn−1 ∩ ξ, by changing the
order of integration, we obtain
αf)(ξ) = γn(α)
f(θ)|Prξθ|α−1 dθ
Sn−1∩ξ
|vθ · u|α−1dξu.
The inner integral is independent on vθ and can be easily evaluated:
Sn−1∩ξ
|vθ · u|α−1dξu =
|t|α−1(1− t2)(i−3)/2 dt
2π(i−1)/2 Γ(α/2)
σi−1 Γ((i+ α− 1)/2)
This implies (3.12). �
The following statement is dual to Lemma 3.5.
Lemma 3.6. Let µ ∈ M(Gn,i), α 6= 1, 3, 5, . . . . Then
(3.14) MαR∗iµ = c
Rα+i−1n−i µ
⊥, c = 2π(i−1)/2/σi−1,
in the D′(Sn−1)-sense. If Reα > 0 and µ is absolutely continuous with
density ϕ ∈ L1(Gn,i), then
(3.15) MαR∗iϕ = c
Rα+i−1n−i ϕ
almost everywhere on Sn−1. If ϕ ∈ D(Gn,i), then (3.15) extends to all
complex α 6= 1, 3, 5, . . . by analytic continuation.
Proof. Let ω ∈ De(Sn−1) (it suffices to consider only even test func-
tions). By (2.4) and (3.12),
(MαR∗iµ, ω) = (µ,RiM
αω) = c (µ,Rα+i−1n−i,⊥ ω) = c (µ
⊥, Rα+i−1n−i ω).
This gives the result. �
The next statement contains explicit representations of the right in-
verse of the dual Radon transform R∗i (note that R
i is non-injective on
D(Gn,i) when 1 < i < n− 1).
Lemma 3.7. Every function f ∈De(Sn−1) is represented as f=R∗iAf ,
where A : De(Sn−1) → D(Gn,i),
(3.16) Af = c1R
i f = c2Rn−i,⊥M
2−nf,
π(1−i)/2σn−2
σn−i−1
Γ((n− i)/2)
Γ((n− 1)/2)
, c2 =
2πn/2−1
INTERSECTION BODIES 13
Proof. The coincidence of expressions in (3.16) follows from (3.13). To
prove the first equality, we invoke spherical convolutions defined by
analytic continuation of the integral
(3.17) (Qαf)(θ)=
σn−1Γ((n−1−α)/2)
2π(n−1)/2Γ(α/2)
(1−|u·θ|2)(α−n+1)/2f(u)du,
Reα > 0, α − n 6= 0, 2, 4, . . . , so that Q0f = f [R2]. By Theorem
1.1 from [R2], R∗iR
i f = c
α+i−1f , and therefore (set α = 1 − i),
i f = c
1 f , as desired. �
The next statement provides an intriguing factorization of the Minkowski-
Funk transform in terms of Radon transforms associated to mutually
orthogonal subspaces. This factorization can be useful in different oc-
currences.
Theorem 3.8. For f ∈ L1(Sn−1) and 0 < i < n,
(3.18) Mf = R∗iRn−i,⊥f.
Proof. By (2.3),
(R∗iRn−i,⊥f)(θ) =
SO(n−1)
(Rn−i,⊥f)(rθγR
i) dγ
SO(n−1)
(Rn−if)(rθγR
n−i) dγ
SO(n−1)
Sn−1∩rθγR
f(v) dv
Sn−1∩Rn−i
SO(n−1)
f(rθγw) dγ.
The inner integral is independent on w ∈ Sn−1 ∩ Rn−i and equals
(Mf)(θ). This gives (3.18). �
3.3. Restriction theorems. Theorems of such type deal with traces
of functions on lower dimensional subspaces and are well known, for in-
stance, in the theory of function spaces. To the best of our knowledge,
traces of functions represented by Radon transforms or, more generally,
by the generalized cosine transforms , were not studied systematically
and deserve particular attention, because they provide analytic back-
ground to a series of results related to sections of star bodies; cf. [R3,
Sec. 3.5], [FGW]. Given a subspace η ∈ Gn,m and k < m, we denote
by Gk(η) the manifold of all k-dimensional subspaces of η.
14 BORIS RUBIN
Theorem 3.9. Let f ∈ Ce(Sn−1), 1 ≤ k < m < n, λ 6= 0,−2,−4, . . . .
If Reλ < k, then for every η ∈ Gn,m and every ξ ∈ Gk(η),
(3.19) (Rk−λn−kf)(ξ
⊥) = (Rk−λm−kT
η f)(ξ
⊥ ∩ η),
where
(3.20) (T λη f)(u) = c̃
Sn−1∩(η⊥⊕Ru)
f(w)|u · w|m−λ−1 dw,
u ∈ Sn−1 ∩ η, c̃ = π(m−n)/2 σn−m/2.
In particular (let λ→ k),
(3.21) (Rn−kf)(ξ
⊥)=c (Rm−kT
η f)(ξ
⊥ ∩ η), c= π
(n−m)/2 σm−k−1
σn−k−1
Proof. By (2.6),
(Rk−λn−kf)(ξ
⊥)=γn,n−k(k − λ)
|Prξθ|−λ f(θ) dθ.
We represent θ in bi-spherical coordinates as
(3.22) θ = ucosψ + v sinψ,
where
u ∈ Sn−1 ∩ η ∼ Sm−1, v ∈ Sn−1 ∩ η⊥ ∼ Sn−m−1, 0 ≤ ψ ≤ π/2,
dθ = c′′ sinn−m−1 ψ cosm−1ψ dψdudv, c′′ = σm−1σn−m−1/σn−1.
If ξ ⊂ η, then |Prξθ| = |Prξ[Prηθ]| = |Prξu| cosψ, and therefore,
(Rk−λn−kf)(ξ
⊥) = γm,m−k(k − λ)
Sn−1∩η
|Prξu|−λ(T λη f)(u) du,
where
(T λη f)(u) =
c′′ γn,n−k(k − λ)
γm,m−k(k − λ)
∫ π/2
sinn−m−1 ψ cosm−λ−1ψ dψ
Sn−1∩η⊥
f(ucosψ+v sin ψ) dv
π(m−n)/2 σn−m
Sn−1∩(η⊥⊕Ru)
f(w)|u · w|m−λ−1 dw.
Formula (3.21) follows from (3.19) by (3.1). �
INTERSECTION BODIES 15
Theorem 3.10. Let f ∈De(Sn−1), η∈Gn,m, 1<m<n. Suppose that
f =M1−λg, where Reλ < m, λ 6= 0,−2,−4, . . . . Then the restriction
of f onto η is represented as f =M1−λ
Sn−1∩η
T λη g, where T
η has the form
(3.20) andM1−λ
Sn−1∩η
denotes the same operator M1−λ, but on the sphere
Sn−1 ∩ η.
Proof. For Reλ < 1, the statement is a particular case of Theorem
3.9 (set k = 1). For other values of λ, the result follows by analytic
continuation. �
Remark 3.11. The restriction λ 6= 0,−2,−4, . . . in Theorems 3.9 and
3.10 is caused by the Gamma function Γ(λ/2) in the numerator of the
corresponding normalizing factor. It is evident from the proof, that
both theorems remain true also for λ = −2ℓ, ℓ ∈ N, if we remove the
normalizing factor. Then M1−λ in Theorem 3.10 will be substituted
for M̃1+2ℓ; see (2.11).
We will need the following generalization of Theorem 3.10.
Theorem 3.12. Let f ∈Ce(Sn−1), µ ∈ Me+(Sn−1), and let η∈Gn,m,
1<m<n. Suppose that f =M1−λµ, if λ < m, λ 6= −2ℓ, ℓ ∈ N, and
f=M̃1+2ℓµ, if λ = −2ℓ.
(i) There is a measure ν ∈ Me+(Sn−1 ∩ η) such that the restriction of
f onto Sn−1 ∩ η is represented as f =M1−λ
Sn−1∩η
(ii) If dµ(θ) = g(θ)dθ, g ∈ Ce(Sn−1), then (i) holds with dν(θ) =
(T λη g)(θ)dθ, where T
η g has the form (3.20).
(iii) If λ = −2ℓ, ℓ ∈ N, then (i) and (ii) hold with M1−λ
Sn−1∩η
substituted
for M̃1+2ℓ
Sn−1∩η
Proof. STEP 1. Let first λ < m, λ 6= 0,−2,−4, . . . . We invoke the
Poisson integral (2.1) so that
Πtf = ΠtM
1−λµ =M1−λgt, gt = Πtµ ∈ De(Sn−1), t ∈ (0, 1).
Since f is continuous, then Πtf converges to f as t → 0 uniformly
on Sn−1, and therefore, uniformly on Sn−1 ∩ η. Hence, for any test
function ω ∈ D(Sn−1 ∩ η), owing to Theorem 3.10, we have
(f, ω) = lim
(Πtf, ω) = lim
(M1−λgt, ω)
= lim
(M1−λ
Sn−1∩η
T λη gt, ω) = lim
(T λη gt,M
Sn−1∩η
= lim
(νt,M
Sn−1∩η
ω), νt = T
η gt.(3.23)
Thus, lim
(νt,M
Sn−1∩η
ω) exists for every ω∈D(Sn−1∩η). If ω is even,
i.e., ω ∈ De(Sn−1 ∩ η), then, by Lemma 3.2, we can replace ω by
16 BORIS RUBIN
M1−m+λ
Sn−1∩η
ω and conclude that the limit lim
(νt, ω) is well-defined for every
ω ∈ De(Sn−1∩η). Since νt = T λη Πtµ is an even function and the generic
test function ω ∈ D(Sn−1 ∩ η) can be represented as ω+ + ω−, where
ω± are even and odd, respectively, it follows that the limit lim
(νt, ω) =
(νt, ω+) is well-defined for every ω ∈ D(Sn−1∩ η) (not only for even
ω, as stated above). Since D′(Sn−1∩ η) is weakly complete, there is an
even distribution ν in D′(Sn−1 ∩ η) so that
(ν, ω) = lim
(νt, ω), ω ∈ D(Sn−1 ∩ η).
Furthermore, since (νt, ω) = (T
η Πtµ, ω) is non-negative for every non-
negative ω ∈ D(Sn−1 ∩ η) and every t ∈ (0, 1), then ν is a positive
distribution and, by Theorem 9.1, ν is a measure in Me+(Sn−1 ∩ η).
Thus, by (3.23), (f, ω) = lim
(νt,M
Sn−1∩η
ω) = (ν,M1−λ
Sn−1∩η
ω), which
means that f =M1−λ
Sn−1∩η
ν, as desired.
If dµ(θ) = g(θ)dθ, g ∈ Ce(Sn−1), then νt = T λη Πtg tends to T λη g uni-
formly on Sn−1∩η as t→ 0. Hence, by (3.23), (f, ω) = (T λη g,M1−λSn−1∩ηω),
which means f =M1−λ
Sn−1∩η
T λη g.
STEP 2. Consider the case λ = −2ℓ, ℓ ∈ N, when f = M̃1+2ℓµ,
µ ∈ Me+(Sn−1), and the operator T λη = T−2ℓη has the form
(T−2ℓη h)(u) = c̃
Sn−1∩(η⊥⊕Ru)
|u · w|m+2ℓ−1 h(w) dw,
cf. (3.20). For any functions h ∈ C(Sn−1) and ω ∈ C(Sn−1 ∩ η),
(3.24) (T−2ℓη h, ω) = (h,
T−2ℓη ω),
where
T−2ℓη ω)(θ)=
Γ(m/2)
2Γ(n/2)
( Prηθ
|Prηθ|
|Prηθ|2ℓ ∈ C(Sn−1).
INTERSECTION BODIES 17
Indeed, using bi-spherical coordinates (see (3.22)), we have
(T−2ℓη h, ω) = c̃
Sn−1∩η
ω(u)du
Sn−1∩(η⊥⊕Ru)
h(w)|u · w|m+2ℓ−1 dw
c̃ σn−m−1
Sn−1∩η
ω(u)du
sinn−m−1 ψ cosm+2ℓ−1ψ dψ
Sn−1∩η⊥
h(ucosψ+v sin ψ) dv
c̃ σn−m−1
c′′ σn−m
h(θ)ω
( Prηθ
|Prηθ|
|Prηθ|2ℓ dθ = (h,
T−2ℓη ω).
Let h = Πtµ and observe that the limit lim
(T−2ℓη Πtµ, ω) exists, be-
cause, by (3.24), (T−2ℓη Πtµ, ω) = (Πtµ,
T−2ℓη ω) → (µ,
T−2ℓη ω). Note
that (T−2ℓη Πtµ, ω) ≥ 0 for any non-negative ω ∈ C(Sn−1∩η). Applying
the standard completeness argument (as in Step 1), we conclude, that
there is a measure ν ∈ M+(Sn−1 ∩ η) such that
(T−2ℓη Πtµ, ω) = (ν, ω) ∀ω ∈ C(Sn−1 ∩ η).
Using this equality, for f = M̃1+2ℓµ we obtain
(f, ω) = lim
(Πtf, ω) = lim
(ΠtM̃
1+2ℓµ, ω) = lim
(M̃1+2ℓΠtµ, ω)
(use Theorem 3.10 and Remark 3.11)
= lim
(M̃1+2ℓ
Sn−1∩η
T−2ℓη Πtµ, ω) = lim
(T−2ℓη Πtµ, M̃
Sn−1∩η
= (ν, M̃1+2ℓ
Sn−1∩η
This gives the result.
If dµ(θ) = g(θ)dθ, g ∈ Ce(Sn−1), then, by Theorem 3.10 and Remark
3.11, for θ ∈ Sn−1 ∩ η we have
(Πtf)(θ) = (ΠtM̃
1+2ℓg)(θ) = (M̃1+2ℓΠtg)(θ) = (M̃
Sn−1∩η
T−2ℓη Πtg)(θ).
Owing to continuity of the operators M̃1+2ℓ
Sn−1∩η
, T−2ℓη , and Πt in the
relevant spaces of continuous functions, by passing to the limit as t→ 0,
we obtain f(θ) = (M̃1+2ℓ
Sn−1∩η
T−2ℓη g)(θ), θ ∈ Sn−1 ∩ η, as desired. �
18 BORIS RUBIN
4. Positive Definite Homogeneous Distributions
We remind some known facts; see, e.g., [GS], [Le]. Let S(Rn) be the
Schwartz space of rapidly decreasing C∞-functions on Rn and S ′(Rn)
its dual. The Fourier transform of F ∈ S ′(Rn) is defined by
〈F̂ , φ̂〉 = (2π)n〈F, φ〉, φ̂(y) =
φ(x) eix·y dx, φ ∈ S(Rn).
A distribution F ∈ S ′(Rn) is homogeneous of degree λ ∈ C if for any
φ ∈ S(Rn) and any a > 0, 〈F, φ(x/a)〉 = aλ+n 〈F, φ〉. Homogeneous dis-
tributions on Rn are intimately connected with distributions on Sn−1.
Let first f ∈ L1(Sn−1), (Eλf)(x) = |x|λf(x/|x|), x ∈ Rn \ {0}. The
operator Eλ generates a meromorphic S ′-distribution
〈Eλf, φ〉= a.c.
rλ+n−1u(r)dr, u(r) =
f(θ)φ(rθ)dθ,
where “a.c.” denotes analytic continuation in the λ-variable. The dis-
tribution Eλf is regular if Reλ > −n and admits simple poles at
λ = −n,−n − 1, . . .. The above definition extends to all distributions
f ∈ D′(Sn−1) by the formula
〈Eλf, φ〉 = a.c.
rλ+n−1u(r)dr, u(r) = (f, φ(rθ)), 1
and the map Eλ : D′(Sn−1) → S ′(Rn) is weakly continuous. If f
is orthogonal to all spherical harmonics of degree j, then the deriv-
ative u(j)(r) equals zero at r = 0 and the pole at λ = −n − j is
removable. In particular, if f is an even distribution, i.e., (f, ϕ) =
(f, ϕ−), ϕ−(θ) = ϕ(−θ) ∀ϕ ∈ D(Sn−1), then the only possible poles
of Eλf are −n,−n− 2,−n− 4, . . . .
The Fourier transform of homogeneous distributions was extensively
studied by many authors; see [Sa3] and references therein. We restrict
our consideration to even distributions, when the operator family {Mα}
defined by (2.9) naturally arises thanks to the formula
(4.1) [E1−n−αf ]
∧ = 21−απn/2Eα−1M
This formula amounts to Semyanistyi [Se]. If f ∈ De(Sn−1), then (4.1)
holds pointwise for 0 < Reα < 1 (see, e.g., Lemma 3.3 in [R1] ) and
extends in the S ′-sense to all α ∈ C satisfying
(4.2) α /∈ {1, 3, 5, . . .} ∪ {1− n,−n− 1,−n− 3, . . .}.
1Here and on, different notations 〈·, ·〉 and (·, ·) are used for distributions on Rn
and Sn−1, respectively.
INTERSECTION BODIES 19
Since De(Sn−1) is dense in D′e(Sn−1) and the maps E1−n−α and Eα−1
are weakly continuous from D′e(Sn−1) to S ′(Rn), then (4.1) extends to
all f ∈ D′e(Sn−1).
Regarding the cases excluded in (4.2), we note that if α = 1+ 2ℓ for
some ℓ = 0, 1, . . ., then (4.1) is meaningful if and only if f is orthogonal
to all spherical harmonics of degree 2ℓ. If α = 1 − n − 2ℓ for some
ℓ = 0, 1, . . ., then, according to the spherical harmonic decomposition
j,k fj,kYj,k, j even, formula (4.1) is substituted for the following:
[E2ℓf ]
∧(ξ) = (2π)n
fj,k(−∆)ℓ−j/2Yj,k(i∂) δ(ξ)(4.3)
+2n+2ℓπn/2E−n−2ℓM
1−n−2ℓ
fj,kYj,k
where −∆ is the Laplace operator, ∂ = (∂/∂ξ1, . . . , ∂/∂ξn), and δ(ξ)
is the delta function. It is worth noting that for α = 1, 3, 5, . . ., the
distribution [E1−n−αf ]
∧ can also be understood in the regularized sense
without any orthogonality assumptions. However, such regularization
does not preserve homogeneity; see [Sa1], [Sa3].
Our main concern is positivity and positive definiteness of even ho-
mogeneous distributions. The reader is referred to [GV] for the general
theory. A distribution F ∈ S ′(Rn) is positive if 〈F, φ〉 ≥ 0 for all non-
negative φ ∈ S(Rn). A similar definition holds for distributions on the
sphere and on Rn \ {0}. A distribution F ∈ S ′(Rn) is positive definite
if F̂ is positive. For our purposes, it is important to know, which even
homogeneous distributions are positive definite. Let us rewrite (4.1)
and (4.2) with 1− n− α replaced by −λ. We have
(4.4) [E−λf ]
∧ = 2n−λπn/2Eλ−nM
1+λ−nf,
(4.5) λ /∈ Λ0, Λ0 = {n, n + 2, n+ 4 . . .} ∪ {0,−2,−4, . . .}.
Theorem 4.1. Let λ ∈ R \ Λ0, f ∈ D′e(Sn−1).
(i) If λ < 0 and E−λf is a positive definite distribution, then f = 0.
(ii) For all λ ∈ R \ Λ0, the following statements are equivalent:
(a) [E−λf ]
∧ is a positive distribution on Rn \{0} (for λ > 0, this can
be replaced by “E−λf is a positive definite distribution on R
(b) M1+λ−nf ∈ Me+(Sn−1);
(c) f =M1−λµ for some measure µ ∈ Me+(Sn−1).
Furthermore, for any real λ 6= 0,−2,−4, . . ., and any i = 1, 2, . . . , n−1,
(c) is equivalent to
(d) Rif = R
n−i,⊥µ for some measure µ ∈ Me+(Sn−1).
20 BORIS RUBIN
Proof. (i) Choose φ(x) = exp(−|x|m) pt,θ(x/|x|), where m ∈ 2N and
pt,θ(·) is the Poisson kernel
(4.6) pt,θ(u) =
1− t2
(1− 2tu · θ + t2)n/2
, 0 < t < 1; u, θ ∈ Sn−1.
Then 〈Eλ−nM1+λ−nf, φ〉 = cλ(ΠtM1+λ−nf)(θ), where
cλ = a.c.
rλ−1 exp(−rm) dr = m−1Γ(λ/m)
and (ΠtM
1+λ−nf)(θ) is the Poisson integral of M1+λ−nf . If E−λf is a
positive definite distribution, then, by (4.4), Eλ−nM
1+λ−nf is a positive
distribution. On the other hand, if λ < 0 and m > −λ, then cλ < 0.
Hence 〈Eλ−nM1+λ−nf, φ〉 can be non-negative for every non-negative
φ ∈ S(Rn) only if (ΠtM1+λ−nf)(θ) = 0 for every 0 < t < 1 and
θ ∈ Sn−1. The latter implies M1+λ−nf = 0, which is equivalent to
f = 0 because M1+λ−n is injective; see Lemma 3.2.
(ii) Let [E−λf ]
∧ be a positive distribution on Rn \{0}. It means that
for every φ ∈ S(Rn) such that φ ≥ 0 and 0 /∈ suppφ, 〈[E−λf ]∧, φ〉 ≥ 0
or, by (4.4), 〈Eλ−nM1+λ−nf, φ〉 ≥ 0. Choose φ(x) = ψ(|x|)ω(x/|x|),
where ω ∈ D(Sn−1), ω ≥ 0, and ψ is a smooth non-negative function
such that
rα+n−2ψ(r)dr = 1 and 0 /∈ suppψ. Then
〈Eλ−nM1+λ−nf, φ〉 = (M1+λ−nf, ω) ≥ 0,
and therefore, M1+λ−nf ∈ Me+(Sn−1); see Theorem 9.1.
Conversely, let µ = M1+λ−nf ∈ Me+(Sn−1) and let φ ∈ S(Rn);
φ ≥ 0. In the case λ < 0 we additionally assume 0 /∈ suppφ. By (4.4),
〈[E−λf ]∧, φ〉 = 2n−λπn/2 〈Eλ−nµ, φ〉
= 2n−λπn/2
rλ−1dr
φ(rθ)dµ(θ) ≥ 0.
This proves equivalence of (a) and (b). Equivalence of (b) and (c)
follows from Lemma 3.2.
Let us prove the equivalence of (c) and (d). If Rif = R
n−i,⊥µ,
µ ∈ Me+(Sn−1), then, by (3.15),
(f, R∗iϕ) = (Rif, ϕ) = (R
n−i,⊥µ, ϕ) = (µ,
Ri−λn−iϕ
= c−1(µ,M1−λR∗iϕ), ϕ ∈ D(Gn,i).
Since any function ω ∈ De(Sn−1) can be expressed as ω = R∗iϕ for
some ϕ ∈ D(Gn,i) (see Lemma 3.7), this gives (f, ω) = c−1(µ,M1−λ, ω)
which is (c). Conversely, let f = M1−λµ, µ ∈ Me+(Sn−1), that is,
(f, ω) = (µ,M1−λ, ω) for every ω ∈ De(Sn−1). Choose ω = R∗iϕ, ϕ ∈
INTERSECTION BODIES 21
D(Gn,i). Then, as above, (f, R∗iϕ) = (µ,M1−λR∗iϕ) = c (µ,
Ri−λn−iϕ
which gives (d). �
5. λ-intersection bodies
5.1. Definitions and comments. We remind that Kn is the set of
all origin-symmetric star bodies K in Rn, n ≥ 2; ρK and || · ||K are the
radial function and the Minkowski functional of K. The following defi-
nitions and statements are motivated by Theorem 4.1 and the previous
consideration. Let λ be a real number,
(5.1) sλ =
1 if λ > 0, λ 6= n, n+ 2, n+ 4, . . . ,
Γ(λ/2) if λ < 0, λ 6= −2,−4, . . . .
The values λ = 0, n, n + 2, n + 4, . . . will not be considered in the
following, but values λ = −2,−4, . . . will be included. They become
meaningful if we change normalization. For λ 6= 0, n, n + 2, n + 4 . . . ,
let Inλ be the set of bodies K ∈ Kn, for which there is a measure
µ ∈ Me+(Sn−1) such that sλρK = M1−λµ if λ 6= −2ℓ, ℓ ∈ N, and
ρK = M̃
1−λµ ≡ M̃1+2ℓµ, otherwise. The equality sλρK = M1−λµ
means that for any ϕ ∈ D(Sn−1),
ρkK(θ)ϕ(θ) dθ =
(M1−λϕ)(θ) dµ(θ),
where for λ ≥ 1, (M1−λϕ)(θ) is understood in the sense of analytic
continuation. We remind the notation
Λ0 = {n, n+ 2, n+ 4 . . .} ∪ {0,−2,−4, . . .}.
Theorem 5.1. For λ ∈ R\Λ0, the following statements are equivalent:
(a) K ∈ Inλ ;
(b) The Fourier transform [sλ || · ||−λK ]∧ is a positive distribution on
n\{0} (for λ > 0, this can be replaced by “|| · ||−λK is a positive definite
distribution on Rn”);
(c) sλM
1+λ−nρλK ∈ Me+(Sn−1);
The theorem is an immediate consequence of Theorem 4.1 if the lat-
ter is applied to f = sλρ
K . Another useful characterization is provided
by Theorem 4.1 (d).
Theorem 5.2. Let λ ∈ R \ Λ0. If K ∈ Inλ , then for every i ∈
{1, 2, . . . , n−1} there is a measure µ ∈ Me+(Sn−1) such that sλRiρλK =
Ri−λn−i,⊥µ. Conversely, if
sλRiρ
K = R
n−i,⊥µ
for some i ∈ {1, 2, . . . , n−1} and some µ ∈ Me+(Sn−1), then K ∈ Inλ .
22 BORIS RUBIN
Although Inλ was called “the set of bodies”, the definition of this set
is purely analytic and extra work is needed to understand what bodies
(if any) actually constitute the class Inλ .
The following comments will be helpful.
1. The case λ > n is not so interesting, because by Theorem 5.1(c),
Inλ is either empty (if Γ((n − λ)/2) < 0) or coincides with the whole
class Kn (if Γ((n− λ)/2) > 0).
2. The case λ ∈ (0, n) agrees with the concept of isometric embed-
ding of the space (Rn, || · ||K) into L−p, p = λ; see Introduction. In the
framework of this concept, all bodies K ∈ Inλ can be regarded as “unit
balls of n-dimensional subspaces of L−λ”.
3. If K ∈ Inλ , where λ < 0 (one can replace λ by = −p, p > 0), then
||u||pK =
|θ · u|p dµ(θ)
for some µ ∈ Me+(Sn−1). This is the well known Lévy representation,
characterizing isometric embedding of the space (Rn, || · ||K) into Lp;
see Lemma 6.4 in [K4]. Statement (b) in Theorem 5.1 agrees with
Theorem 1.9. Keeping this terminology, we can state the following
Proposition 5.3. Let p > −n, p 6= 0. Then (Rn, || · ||K) embeds
isometrically in Lp if and only if K ∈ In−p.
4. If λ = k ∈ {1, 2, . . . , n−1}, then Inλ = Ink coincides with the class
of k-intersection bodies; see Definition 1.7 and Theorem 1.8. Theorems
5.1 and 5.2 provide new characterizations of this class.
These comments inspire the following
Definition 5.4. Let λ < n, λ 6= 0. A body K ∈ Kn is said to be a
λ-intersection body if K ∈ Inλ , or, in other words, if there is a measure
µ ∈ Me+(Sn−1) such that sλρλK = M1−λµ if λ 6= −2ℓ, ℓ ∈ N, and
ρ−2ℓK = M̃
1+2ℓµ, otherwise.
The result of Theorem 5.2 for λ = i = k can serve as an alternative
definition of k-intersection bodies in terms of Radon transforms. This
definition agrees with Definition 1.6 and mimics Definition 1.2.
Definition 5.5. Let k ∈ {1, 2, . . . , n − 1}. A body K ∈ Kn is a k-
intersection body if there is a non-negative measure µ on Sn−1 such
(5.2) (Rkρ
K)(ξ) = (Rn−kµ)(ξ
⊥), ξ ∈ Gn,k.
INTERSECTION BODIES 23
Equality (5.2) is understood in the weak sense according (2.5). Namely,
for ϕ ∈ C(Gn,k) and ϕ⊥(η) = ϕ(η⊥), η ∈ Gn,n−k, (5.2) means
(5.3)
K)(ξ)ϕ(ξ) dξ =
(R∗n−kϕ
⊥)(θ) dµ(θ).
5.2. λ-intersection bodies of star bodies and closure in the ra-
dial metric. As we mentioned in Introduction, the class of intersection
bodies, which coincides with Inλ when λ = 1, is the closure in the ra-
dial metric of the class of intersection bodies of star bodies. Below
we extend this result to all λ < n, λ 6= 0, in the framework of the
unique approach. We remind (see Definition 1.6) that K ∈ Kn is a
k-intersection body of a body L ∈ Kn and write K = IBk(L) if
(5.4) volk(K ∩ ξ) = voln−k(L ∩ ξ⊥) ∀ξ ∈ Gn,k.
Let IBk,n be the set of all bodies K ∈ Kn satisfying (5.4) for some
L ∈ Kn.
How can we extend the purely geometric property (5.4) to non-
integer values of k? To this end, we first express (5.4) in terms of
the generalized cosine transforms (2.9).
Lemma 5.6. If K = IBk(L) is infinitely smooth, then
(5.5) ρn−kL =cM
1−n+kρkK , ρ
−1M1−kρn−kL ,
c = πk−n/2(n− k)/k.
Proof. We make use of (3.13), where we set i = k, α = 1 − n + k and
f = ρkK . By (3.1), this gives
(5.6) Rkρ
K = c̃Rn−k,⊥M
1−n+kρkK , c̃ =
πk−n/2 σn−k−1
On the other hand, if K = IBk(L) is infinitely smooth, then, according
to (5.4) and the equality
(5.7) volk(K ∩ ξ) =
K)(ξ),
we have
(5.8) Rkρ
k σn−k−1
(n− k) σk−1
Rn−k,⊥ρ
Comparing (5.6) and (5.8), owing to injectivity of the Radon transform,
we obtain the first equality in (5.5). The second equality follows from
the first one by (3.5). �
24 BORIS RUBIN
Equalities (5.5) are extendable to non-integer values of k. We denote
cλ,n = π
λ−n/2(n−λ)/λ,
and let sλ be defined by (5.1).
Definition 5.7. Let λ < n, λ 6= 0; K,L ∈ Kn. We say that K is a
λ-intersection body of L and write K = IBλ(L) if sλρλK=c−1λ,nM1−λρ
in the case λ 6= −2ℓ, ℓ ∈ N, and ρ−2ℓK = M̃1+2ℓρ
L , otherwise. The
set of all λ-intersection bodies of star bodies will be denoted by IBλ,n.
We also denote
(5.9) IB∞λ,n={K ∈ IBλ,n : ρK ∈ De(Sn−1)}.
By (3.5), equality sλρ
K = c
1−λρn−λL is equivalent to ρ
sλ cλ,nM
1−n+λρλK . Both equalities are generally understood in the sense
of distributions, for instance,
K , ϕ) = c
λ,n(ρ
1−λϕ), ϕ ∈ D(Sn−1).
If K (or L) is smooth, then sλρ
K(θ)=c
λ,n(M
1−λρn−λL )(θ) pointwise for
every θ∈Sn−1.
Theorem 5.8. Let λ < n, λ 6= 0. If λ 6= −2ℓ, ℓ ∈ N, then the class
Inλ of λ-intersection bodies is the closure of the classes IBλ,n and IB∞λ,n
of λ-intersection bodies of star bodies in the radial metric:
(5.10) Inλ = cl IBλ,n = cl IB∞λ,n.
If λ = −2ℓ, ℓ ∈ N, then Inλ ⊂ cl IBλ,n = cl IB∞λ,n.
Proof. STEP 1. We first prove that Inλ ⊂ cl IB∞λ,n. Let K ∈ Inλ , i.e.,
(a) sλρ
1−λµ, µ ∈ Me+(Sn−1), if λ 6= −2ℓ, ℓ ∈ N, and
(b) ρ−2ℓK = M̃
1+2ℓµ, otherwise.
Our aim is to define a sequence Kj ∈ IB∞λ,n such that ρKj → ρK in the
C-norm. Consider the Poisson integral Πtρ
K (see (2.1)), that converges
to ρλK in the C-norm when t→ 1. In the case (a), for any test function
ω ∈ D(Sn−1) we have
K , ω) = (ρ
K ,Πtω) = s
λ (µ,M
1−λΠtω) = s
1−λΠtµ, ω).
Similarly, in the case (b), we have a pointwise equality (Πtρ
K )(θ) =
(M̃1+2ℓΠtµ)(θ), θ ∈ Sn−1. Choose Kj so that ρλKj = Πtjρ
K , where tj is
a sequence in (0, 1) approaching 1. Clearly, Kj converges to K in the
radial metric. Moreover, Kj ∈ IB∞λ,n, because ρλKj = s
1−λρn−λLj
and ρ−2ℓKj = M̃
1+2ℓρn+2ℓLj , where the bodies Lj are defined by ρ
cλ,nΠtjµ in the case (a), and ρ
= Πtjµ in the case (b), respectively.
INTERSECTION BODIES 25
Conversely, let K ∈ cl IB∞λ,n, λ 6= −2,−4, . . . . It means that there
is a sequence of Kj ∈ IB∞λ,n such that lim
||ρK − ρKj ||C(Sn−1) = 0 and
= c−1λ,nM
1−λρn−λLj , ρLj ∈ De+(S
n−1). If j → ∞, then for every
ω ∈ D(Sn−1),
(5.11) sλ(ρ
,M1−n+λω) → sλ(ρλK ,M1−n+λω)=sλ(M1−n+λρλK , ω).
The right-hand side of (5.11) is non-negative, because by (3.5), for
every j and every ω ∈ De+(Sn−1),
,M1−n+λω) = c−1λ,n(M
1−λρn−λLj ,M
1−n+λω) = c−1λ,n(ρ
, ω) ≥ 0.
By Theorem 9.1, it follows that sλM
1−n+λρλK is a non-negative mea-
sure. We denote it by µ. By (3.5), for any ω ∈ D(Sn−1),
K , ω) = sλ(M
1−n+λρλK ,M
1−λω) = (µ,M1−λω) = (M1−λµ, ω),
i.e., K∈Inλ . This gives IB∞λ,n⊂Inλ and, by above, Inλ =cl IB∞λ,n.
STEP 2. It remains to prove that cl IB∞λ,n = cl IBλ,n. Since IB∞λ,n ⊂
IBλ,n, then cl IB∞λ,n ⊂ cl IBλ,n. To prove the opposite inclusion, let
K ∈ cl IBλ,n and consider the case λ 6= −2,−4, . . . . We have to show
that there is a sequence of smooth bodies Kj, which converges to K in
the radial metric and such that sλρ
= c−1λ,nM
1−λρn−λLj for some bodies
Lj ∈ Kn. Since K ∈ cl IBλ,n, there is a sequence K̃j ∈ Kn such that
||ρK̃j −ρK ||C(Sn−1) = 0 and sλρ
= c−1λ,nM
1−λρn−λ
for some bodies
L̃j ∈ Kn. We define smooth bodies Kj and Lj by
ρλKj = Π1−1/jρ
, ρn−λLj = Π1−1/jρ
where Π1−1/j stands for the Poisson integral with parameter 1 − 1/j.
Since operators Π1−1/j andM
1−λ commute, then sλρ
=c−1λ,nM
1−λρn−λLj ,
and therefore, Kj ∈ IB∞λ,n. On the other hand, by the properties of the
Poisson integral [SW],
|ρλKj − ρ
K | ≤ |Π1−1/jρλK̃j − Π1−1/jρ
K |+ |Π1−1/jρλK − ρλK | → 0
as j → ∞. It means, that K ∈ cl IB∞λ,n or cl IBλ,n ⊂ cl IB∞λ,n. Hence,
by above, cl IBλ,n = cl IB∞λ,n. For λ = −2,−4, . . . , the argument is
similar. �
Remark 5.9. If λ = −2,−4, . . . , we cannot prove the coincidence of
Inλ and cl IB∞λ,n, because the proof of the embedding cl IB∞λ,n ⊂ Inλ
relies heavily on the fact that M1−λ is an isomorphism of De(Sn−1). If
λ = −2,−4, . . . , this is not so, and the operator M̃1−λ has a nontrivial
kernel, which consists of spherical harmonics of degree > 2ℓ; see [R1]
for details.
26 BORIS RUBIN
It is interesting to translate Theorem 5.8 for λ = −p, p > 0, into
the language of isometric embeddings. Ignoring a non-important pos-
itive constant factor and using polar coordinates, one can replace the
equalities sλρ
K = c
1−λρn−λL and ρ
K = M̃
1+2ℓρn+2ℓL in Definition
5.7 by
(5.12) ||u||pK =
|x · u|p dx, u ∈ Sn−1.
Corollary 5.10.
(i) A unit ball of every n-dimensional subspace of Lp, can be approxi-
mated in the radial metric by bodies K, defined by (5.12), where L ∈ Kn
has a C∞ boundary.
(ii) If, moreover, p 6= 2, 4, . . . , then the set of unit balls of all n-
dimensional subspaces of Lp, can be identified with the closure in the
radial metric of the set of bodies K satisfying (5.12) for some smooth
body L ∈ Kn (one can also consider arbitrary bodies L ∈ Kn).
5.3. Central sections of λ-intersection bodies. It is known, that
a cross-section K ∩ η of a body K ∈ Ink by any m-dimensional central
plane η is a k-intersection body in η provided 1 ≤ k < m < n. This
fact was established in [Mi1, Proposition 3.17] by using Theorem 1.8
and a certain approximation procedure. Below we present more general
results, including sections of k-intersection bodies of star bodies and
the case of non-integer k = λ. These results are consequences of the
restriction theorems from Section 3.3.
Theorem 5.11. Let 1 ≤ k < m < n, η ∈ Gn,m. If K = IBk(L) in Rn,
then K ∩ η = IBk(L̃) in η, where the body L̃ is defined by
(5.13) ρm−k
(u) = ck,m,n
Sn−1∩(η⊥⊕Ru)
ρn−kL (w)|u · w|
m−k−1 dw,
u ∈ Sn−1 ∩ η, ck,m,n =
(m− k) σn−m
2(n− k)
Proof. By (5.7) and (3.21) (with f = ρn−kL ),
volk(K ∩ ξ) = voln−k(L ∩ ξ⊥) =
σn−k−1
(Rn−kρ
L )(ξ
c σn−k−1
(Rm−kT
L )(ξ
⊥ ∩ η)(5.14)
σm−k−1
(Rm−kρ
)(ξ⊥ ∩ η) = volm−k(L̃ ∩ ξ⊥),
as desired. �
INTERSECTION BODIES 27
Theorem 5.11 has the following generalization.
Theorem 5.12. Let 1 < m < n, η ∈ Gn,m and suppose that λ < m,
λ 6= 0. If K = IBλ(L) in Rn, then K ∩ η = IBλ(L̃) in η, where the
body L̃ is defined by
(5.15) ρm−λ
(u) = c̃
Sn−1∩(η⊥⊕Ru)
ρn−λL (w)|u · w|
m−λ−1 dw,
u ∈ Sn−1 ∩ η, c̃ =
(m− λ) σn−m
2(n− λ)
if λ 6= −2ℓ, ℓ ∈ N,
π(m−n)/2 σn−m/2 otherwise.
Moreover, if K ∈ Inλ in Rn, then K ∩ η ∈ Imλ in η.
Proof. Let λ 6= −2ℓ, ℓ ∈ N, and let θ ∈ Sn−1 ∩ η. By Definition 5.7,
K = c
1−λρn−λL , and Theorem 3.12 (with f = sλρ
K and g =
c−1λ,nρ
L ) yields
K(θ) = (M
Sn−1∩η
T λη [c
L ])(θ) = c
λ,m(M
Sn−1∩η
)(θ),
where ρm−λ
= c T λη ρ
L , c = π
(n−m)/2(m − λ)/(n − λ). By Definition
5.7 and (3.20), we are done. If λ = −2ℓ, ℓ ∈ N, then, as above,
ρ−2ℓK (θ) = (M̃
Sn−1∩η
T−2ℓη ρ
L )(θ) = (M
Sn−1∩η
ρm+2ℓ
where ρm+2ℓ
= T−2ℓη ρ
L . This gives (5.15).
Furthermore, if K ∈ Inλ , λ 6= −2ℓ, ℓ ∈ N, then, by Definition 5.4,
K = M
1−λµ, µ ∈ Me+(Sn−1). Hence, by Theorem 3.12, there is
a measure ν ∈ Me+(Sn−1 ∩ η) such that the restriction of sλρλK onto
Sn−1∩η is represented as sλρλK =M1−λSn−1∩ην. It means that K∩η ∈ I
in η. In the case λ = −2ℓ, ℓ ∈ N, the argument is similar. �
6. Examples of λ-intersection bodies
The definition of the classes Inλ and IBλ,n and all known characteri-
zations are purely analytic. Unlike the case λ = 1, when an intersection
body of a star body is explicitly defined by a simple geometric proce-
dure, it is not clear how can we construct λ−intersection bodies in the
general case. Below we give some examples, when the radial function
of a λ−intersection body can be explicitly determined. These examples
utilize the generalized cosine transforms.
Example 6.1. Let λ < 1, λ 6= 0. This case is the simplest. Indeed,
given a non-negative measure µ on Sn−1, the relevant λ−intersection
28 BORIS RUBIN
body can be directly constructed by the formula ρλK = M
1−λµ, if λ 6=
−2ℓ, ℓ ∈ N, and ρ−2ℓK = M̃1+2ℓµ, otherwise. In other words (cf. (2.11)),
(6.1) ρλK(u) =
|θ · u|−λ dµ(θ).
This fact (with λ replaced by −p) is a reformulation of Theorem 6.17
from [K4], which was stated in the language of isometric embeddings
and relies on the P. Lévy characterization; see also Lemma 6.4 and
Theorem 4.11 in [K4].
Example 6.2. If n − 3 ≤ λ < n, λ > 0, then Inλ includes all origin-
symmetric convex bodies in Rn.
This fact is due to Koldobsky [K4, Corollary 4.9]. It can be proved
using a slight modification of the argument from [R3, Sec. 7] as follows.
By Theorem 5.1 (c), it suffices to check that for any o.s. convex body
K,M1+λ−nρλK ∈ Me+(Sn−1). For λ ≥ n−1, this is obvious. To handle
the case n− 3 ≤ λ < n− 1, suppose first that K is infinitely smooth.
Using polar coordinates, for Reα > 0, we can write
(6.2) (Mαρα+n−1K )(u) = (α + n− 1) γn(α)
|x · u|α−1 dx.
Then M1+λ−nρλK can be realized as analytic continuation (a.c.) at α =
1 + λ− n of the right-hand side of (6.2). The latter can be written as
I(α) = 2(α+ n− 1)γn(α)
tα−1AK,u(t) dt,
AK,u(t) = voln−1(K∩{tu+u⊥}). Taking analytic continuation (see [GS,
Chapter 1]), for −2 < α < 0 (which is equivalent to n−3 ≤ λ < n−1)
we get
a.c.I(α) = c1
tα−1[AK,u(t)−AK,u(0)] dt.
Similarly, a.c.I(α) at α = −2 (which corresponds to λ = n − 3) is
K,u(0). Following [GS], one can easily check that constants c1 and
c2 are negative. Since K is convex, both analytic continuations are
positive, and thereforeM1+λ−nρλK > 0. If K is an arbitrary o.s. convex
body, we approximate it in the radial metric by smooth o.s. convex
bodies Kj. Then for any test function ω ∈ D+(Sn−1), by the previous
step we have
(M1+λ−nρλK , ω) = (ρ
1+λ−nω) = lim
(ρλKj ,M
1+λ−nω)
= lim
(M1+λ−nρλKj , ω) ≥ 0.
INTERSECTION BODIES 29
Hence, by Theorem 9.1, M1+λ−nρλK is a non-negative measure and the
proof is complete.
Example 6.3. If ρλK =
Ri−λn−iν for some ν ∈ M+(Gn,n−i) and λ ≤ i < n,
then K ∈ Inλ .
Indeed, for any test function ω ∈ D(Sn−1), by (3.12) (with α = 1−λ)
we have
(ρλK , ω) = (
Ri−λn−iν, ω) = (ν, R
n−iω) = (ν
⊥, Ri−λn−i,⊥ω)
= c−1(ν⊥, RiM
1−λω) = c−1(R∗i ν
⊥,M1−λω), c =
2π(i−1)/2
It means that for 0 < λ ≤ i < n and ν ∈ M+(Gn,n−i),
(6.3) ρλK =
Ri−λn−iν ⇐⇒ {ρλK =M1−λµ, µ = c−1R∗i ν⊥}.
By Definition 5.4, this gives the result. The particular case λ = i
implies the embedding into Ini of the Zhang’s class Znn−i; see Definition
1.5. This embedding was proved in [K3] and [Mi1] in a different way;
see also [Mi2], where it is proved that Znn−i is a proper subset of Ini if
2 ≤ i ≤ n− 2.
Example 6.4. If 0 < (i− 1)/2 < λ ≤ i < n and ρλK = M i−λµ for some
µ ∈ M+(Sn−1), then K ∈ Inλ .
Indeed, by Lemma 3.4 (with α = i− λ, β = 1− λ), ρλK =M i−λµ =
M1−λAi−λ,1−λ, where Ai−λ,1−λ is an integral operator which preserves
positivity provided i− λ > 1− λ > 1− n, (i− λ) + (1− λ) < 2. This
is just our case.
Example 6.5. One can construct bodies K ∈ Inλ from bodies L ∈ Inδ
by the formula ρK = ρ
L provided 0 < δ < λ < n.
Indeed, by Definition 5.4, there is a measure µ ∈ M+(Sn−1) so that
ρδL = M
1−δµ. Then, by Lemma 3.4 (with α = 1 − δ, β = 1 − λ),
ρλK = ρ
L = M
1−δµ = M1−λA1−δ,1−λµ, and we are done. This example
generalizes the corresponding result from [Mi1, p. 533, Statement (c)],
which was obtained in a different way for the case, when λ and δ are
integers.
Example 6.6. Let
(6.4) Bnq = {x ∈ Rn : ||x||q =
|xk|q
≤ 1}.
If 0 < q ≤ 2, then Bnq ∈ Inλ for all λ ∈ (0, n). If 2 < q <∞, λ ∈ (0, n),
then Bnq ∈ Inλ if and only if λ ≥ n− 3.
30 BORIS RUBIN
Both statements are due to Koldobsky. The first one follows from
the fact that for 0 < q ≤ 2 the Fourier transform of ||x||−λq is a positive
S ′-distribution (see Lemmas 3.6 and 2.27 in [K4]). The second state-
ment is a reformulation of Theorem 4.13 from [K4]. The “if” part is a
consequence of Example 6.2.
7. (q, ℓ)-balls
In this section we consider one more example, which resembles Ex-
ample 6.6, but does not fall into its scope and requires a separate
consideration. Let
x = (x′, x′′) ∈ Rn, x′ ∈ Rn−ℓ =
Rej , x
′′ ∈ Rℓ =
j=n−ℓ+1
Rej ,
e1, . . . , εn being coordinate unit vectors. Consider the (q, ℓ)-ball
(7.1) Bnq,ℓ = {x : ||x||q,ℓ = (|x′|q + |x′′|q)1/q ≤ 1}, q > 0.
We wonder, for which triples (q, ℓ, n), Bnq,ℓ is a λ-intersection body. To
study this problem, we need some preparation. Consider the Fourier
integral
(7.2) γq,ℓ(η) =
e−|y|
eiy·η dy, η ∈ Rℓ, q > 0.
The function γq,ℓ(η) is uniformly continuous on R
ℓ and vanishes at
infinity.
Lemma 7.1. If 0 < q ≤ 2, then γq,ℓ(η) > 0 for all η ∈ Rℓ.
Proof. (Cf. [K4, p. 44, for ℓ = 1]). For η = 0, the statement is obvious.
It is known (see, e.g., [SW]), that
(7.3) [e−t|· |
]∧(η) = πℓ/2t−ℓ/2e−|η|
2/4t, t > 0.
This gives the result for q = 2. Let 0 < q < 2. By Bernstein’s theorem
[F, Chapter 18, Sec. 4], there is a non-negative finite measure µq on
[0,∞) so that e−zq/2 =
e−tz dµq(t), z ∈ [0,∞). Replace z by |y|2 to
(7.4) e−|y|
e−t|y|
dµq(t).
Then (7.3) yields
γq,ℓ(η) =
eiy·ηdy
e−t|y|
dµq(t) =
dµq(t)
eiy·ηe−t|y|
= πℓ/2
t−ℓ/2e−|η|
2/4t dµq(t) > 0.
INTERSECTION BODIES 31
The Fubini theorem is applicable here, because, by (7.4),
|eiy·η|dy
e−t|y|
dµq =
e−|y|
dy <∞.
Our next concern is the behavior of γq,ℓ(η) when |η| → ∞. If q
is even, then e−|·|
is a Schwartz function and therefore, γq,ℓ is infin-
itely smooth and rapidly decreasing. In the general case, we have the
following.
Lemma 7.2. For any q > 0,
(7.5) lim
|η|→∞
|η|ℓ+qγq,ℓ(η) = 2ℓ+qπℓ/2−1Γ(1+ q/2)Γ((ℓ+ q)/2) sin(πq/2).
Proof. For ℓ = 1, this statement can be found in [PS, Chapter 3, Prob-
lem 154] and in [K4, p. 45]. In the general case, the proof is more
sophisticated and relies on the properties of Bessel functions. By the
well-known formula for the Fourier transform of a radial function (see,
e.g., [SW]), we write γq,ℓ(η) = I(|η|), where
I(s) = (2π)ℓ/2s1−ℓ/2
rℓ/2Jℓ/2−1(rs) dr
= (2π)ℓ/2s−ℓ
[(rs)ℓ/2Jℓ/2(rs)] dr.
Integration by parts yields
I(s) = q(2π)ℓ/2s−ℓ/2
rℓ/2+q−1Jℓ/2(rs) dr.
Changing variable z = sqrq, we obtain
sℓ+qI(s) = (2π)ℓ/2A(s1/q), A(δ) =
e−zδzℓ/2qJℓ/2(z
1/q) dz.
We actually have to compute the limit A0 = lim
A(δ). To this end, we
invoke Hankel functions H
ν (z), so that Jν(z) = ReH
ν (z) if z is real
[Er]. Let hν(z) = z
ν (z). This is a single-valued analytic function
in the z-plane with cut (−∞, 0]. Using the properties of the Bessel
functions [Er], we get
(7.6) lim
hν(z) = 2
νΓ(ν)/πi,
(7.7) hν(z) ∼
2/π zν−1/2eiz−
(ν+ 1
), z → ∞.
Then we write A(δ) as A(δ) = Re
e−zδhℓ/2(z
1/q) dz and change the
line of integration from [0,∞) to ℓθ = {z : z = reiθ, r > 0} for
32 BORIS RUBIN
small θ < πq/2. By Cauchy’s theorem, owing to (7.6) and (7.7), we
obtain A(δ) = Re
e−zδhℓ/2(z
1/q) dz. Since for z = reiθ, hℓ/2(z
1/q) =
O(1) when r = |z| → 0 and hℓ/2(z1/q) = O(r(ℓ−1)/2qe−r
1/q sin(θ/q)) as
r → ∞, by the Lebesgue theorem on dominated convergence, we get
A0 = Re
hℓ/2(z
1/q) dz. To evaluate the last integral, we again use
analyticity and replace ℓθ by ℓπq/2 = {z : z = reiπq/2, r > 0} to get
A0 = Re
eiπq/2
hℓ/2(r
1/qeiπ/2) dr
To finalize calculations, we invoke McDonald’s function Kν(z) so that
hν(z) = z
νH(1)ν (z) = −
(ze−iπ/2)νKν(ze
−iπ/2).
This gives
sin(πq/2)
sℓ/2+q−1Kℓ/2(s) ds.
The last integral can be explicitly evaluated by the formula 2.16.2 (2)
from [PBM], and we obtain the result. �
Now we can proceed to studying (q, ℓ)-balls Bnq,ℓ; see (7.1). There is
an intimate connection between geometric properties of the balls Bnq,ℓ
and the Fourier transform of the power function || · ||pq,ℓ. The case q = 2
is well-known and associated with Riesz potentials; see, e.g., [St]. The
relevant case of ℓnq -balls, which agrees with ℓ = 1 was considered in
Example 6.6.
Lemma 7.3. Let q > 0, ξ = (ξ′, ξ′′) ∈ Rn, γq,ℓ(ξ′′) and γq,n−ℓ(ξ′) be
the functions of the form (7.2). We define
(7.8) hp,q,ℓ(ξ) =
Γ(−p/q)
tn+p−1 γq,n−ℓ(ξ
′t) γq,ℓ(ξ
′′t) dt.
(i) Let ξ′ 6= 0 and ξ′′ 6= 0. If q is even, then the integral (7.8) is abso-
lutely convergent for all p > −n. Otherwise, it is absolutely convergent
when −n < p < 2q. In these cases, hp,q,ℓ (ξ) is a locally integrable
function away from the coordinate subspaces Rℓ and Rn−ℓ.
(ii) If −n < p < 0, then hp,q,ℓ (ξ) ∈ L1loc(Rn)∩S ′(Rn) and (||·||
∧(ξ) =
hp,q,ℓ(ξ) in the sense of S ′-distributions. Specifically, for ϕ ∈ S(Rn),
(7.9) 〈hp,q,ℓ , ϕ̂〉 = (2π)n〈|| · ||pq,ℓ , ϕ〉.
INTERSECTION BODIES 33
Proof. (i) For any 0 < ε < a <∞,
ε<|ξ′|<a
ε<|ξ′′|<a
|hp,q,ℓ (ξ; , ξ′′)| dξ′′
|Γ(−p/q)|
tn+p−1 dt
ε<|ξ′|<a
|γq,n−ℓ (ξ′t)| dξ′
ε<|ξ′′|<a
|γq,ℓ (ξ′′t)| dξ′′
|Γ(−p/q)|
tp−1 dt
tε<|z′|<ta
|γq,n−ℓ (z′)| dz′
tε<|z′′|<ta
|γq,ℓ (z′′)| dz′′
|Γ(−p/q)|
(...) =
|Γ(−p/q)|
(I1 + I2).
The first integral is dominated by
tn+p−1 dt, c = σn−ℓ−1σℓ−1max
|γq,n−ℓ (z′)| max
|γq,ℓ (z′′)|
and is finite for p > −n. The second integral can be estimated by
making use of Lemma 7.2. Specifically, if q is not an even integer, then
I2 ≤ cε
tp−1 dt
|z′|>tε
|z′|n−ℓ+q
|z′′|>tε
|z′′|ℓ+q
tp−2q−1 dt.
If q is even, then γq,ℓ and γq,n−ℓ are rapidly decreasing and I2 ≤
tp−2m−1 dt for any m > 0. This gives what we need.
(ii) If −n < p < 0, the same argument is applicable with ε = 0. In
this case, I2 does not exceed ||γq,n−ℓ||1||γq,ℓ||1
tp−1 dt. The latter is
finite when p < 0, because, by Lemma 7.2, γq,n−ℓ and γq,ℓ are integrable
functions on respective spaces. When ξ → ∞, one can readily check
that hp,q,ℓ (ξ) = O(|ξ|m) for some m > 0, and therefore, hp,q,ℓ ∈ S ′(Rn).
To compute the Fourier transform (|| · ||pq,ℓ)∧(ξ), we replace ||x||
q,ℓ by
the formula
||x||pq,ℓ =
Γ(−p/q)
tp−1 e−|x
′/t|q−|x′′/t|q dt, p < 0,
34 BORIS RUBIN
and note that the Fourier transform of the function x→ e−|x′/t|q−|x′′/t|q
is just γq,n−ℓ (ξ
′t) γq,ℓ (ξ
′′t). Then
〈|| · ||pq,ℓ)
∧ , ϕ̂〉 = (2π)n〈|| · ||pq,ℓ , ϕ〉
(2π)nq
Γ(−p/q)
tp−1 dt
′/t|q−|x′′/t|qϕ(x) dx
Γ(−p/q)
tn+p−1 dt
γq,n−ℓ (ξ
′t) γq,ℓ (ξ
′′t) ϕ̂(ξ) dξ.
Interchange of the order of integration in this argument can be easily
justified using absolute convergence of integrals under consideration.
Theorem 7.4. If 0 < q ≤ 2, 0 < ℓ < n, then Bnq,ℓ is a λ-intersection
body for any 0 < λ < n.
Proof. Owing to Lemma 7.1, the function (7.8) (with p replaced by −λ)
is positive, and therefore, by Lemma 7.3, || · ||−λq,ℓ represents a positive
definite distribution. Now the result follows by Theorem 5.1. �
Consider the case q > 2. In this case Bnq,ℓ is convex, and, owing to
Example 6.2, Bnq,ℓ ∈ Inλ for all n− 3 ≤ λ < n. What about λ < n− 3?
This case is especially intriguing.
Proposition 7.5. If q > 2 and 0 < λ < max(n− ℓ, ℓ)− 2, then || · ||−λq,ℓ
is not a positive definite distribution and therefore, Bnq,ℓ 6∈ Inλ .
Proof. Let 0 < λ < n− ℓ− 2 and suppose the contrary, that Bnq,ℓ ∈ Inλ .
Consider the section of Bnq,ℓ by the (n − ℓ + 1)-dimensional plane η =
Ren ⊕ Rn−ℓ. By Theorem 5.12, Bnq,ℓ ∩ η ∈ In−ℓ+1λ in η, and therefore
||xnen + x′′||λq,ℓ = (|xn|q + |x′′|q)−λ/q
is a positive definite distribution in η. By the second derivative text (see
[K4, Theorem 4.19]) this is impossible if 0 < λ < n− ℓ− 2. A similar
contradiction can be obtained if we assume 0 < λ < ℓ− 2 and consider
the section of Bnq,ℓ by the (ℓ+ 1)-dimensional plane Re1 ⊕ Rℓ. �
Proposition 7.5 can be proved without using the second derivative
text and Theorem 5.12 on sections of λ-intersection bodies; see [R4].
The bounds for λ appear to be the same.
Open problem. Let q > 2, ℓ > 1. Is Bnq,ℓ a λ-intersection body if
max(n− ℓ, ℓ)− 2 < λ < n− 3?
This problem does not occur in the case ℓ = 1 as in Example 6.6.
INTERSECTION BODIES 35
8. The generalized cosine transforms and comparison of
volumes
For 1 < i < n, let voli(·) denote the i-dimensional volume function.
Suppose that i is fixed, and let A and B be o.s. convex bodies in Rn
satisfying
(8.1) voli(A ∩ ξ) ≤ voli(B ∩ ξ) ∀ξ ∈ Gn,i.
Does it follow that
(8.2) voln(A) ≤ voln(B) ?
This question is known as the Generalized Busemann-Petty Problem
(GBP); see [G], [RZ], [Z1].
Theorem 8.1. If GBP (8.1)-(8.2) has an affirmative answer, then
every smooth origin-symmetric convex body with positive curvature in
n is an (n− i)-intersection body.
Proof. Suppose that B is an o.s. convex body in Rn so that the radial
function ρB is infinitely smooth, the boundary of B has a positive curva-
ture and B /∈ Inn−i. By Definition 5.4, there is a function ϕ ∈ De(Sn−1),
which is negative on some open origin-symmetric set Ω ⊂ Sn−1 and
such that ρn−iB = M
1+i−nϕ. We choose a function h ∈ De(Sn−1) so
that h 6≡ 0, h(θ) ≥ 0 if θ ∈ Ω and h(θ) ≡ 0 otherwise. Define an o.s.
smooth body A by ρiA = ρ
B − εM1−ih, ε > 0. If ε is small enough,
then A is convex. Since by (3.12), RiM
1−ih = cR0n−i,⊥h ≥ 0, then
A ≤ RiρiB, which gives (8.1). On the other hand, by (3.5),
(ρn−iB , ρ
B − ρiA) = ε(M1+i−nϕ,M1−ih) = ε(ϕ, h) < 0,
or (ρn−iB , ρ
B) < (ρ
B , ρ
A). By Hölder’s inequality, this implies voln(B) <
voln(A), which contradicts (8.2). �
Remark 8.2. As we noted in Introduction, Theorem 8.1 is not new, and
its proof given in [K3] relies on a sequence of deep facts from functional
analysis. The proof presented above is much more elementary and
constructive. For instance, it allows us to keep invariance properties of
the bodies under control. This advantage was essentially used in our
paper [R4].
Theorem 8.1 and Proposition 7.5 imply the following
Corollary 8.3. Let 1 ≤ ℓ ≤ n/2; i > ℓ+2, B = Bn4,ℓ (see (7.1)). Then
there is a smooth o.s. convex body A in Rn so that (8.1) holds but (8.2)
fails.
36 BORIS RUBIN
Setting ℓ = 1 in this statement, we obtain the well-known Bourgain-
Zhang theorem, which states that GBP has a negative answer when
3 < i < n; see [BZ], [K4], [RZ] on this subject. For i = 2 and i = 3
(n ≥ 5) the GBP is still open. An affirmative answer in these cases
was obtained in [R4] for bodies having a certain additional symmetry.
9. Appendix
Every positive distribution F ∈ S ′(Rn) is given by a tempered non-
negative measure µ, i.e., 〈F, φ〉 =
φ(x)dµ(x); see, e.g., [GV, p.147]).
For convenience of the reader, we present a similar fact for the sphere.
Theorem 9.1. A distribution f ∈ D′(Sn−1) is positive if and only if
there is a measure µ ∈ M+(Sn−1) such that
(f, ϕ) =
ϕ(θ)dµ(θ) ∀ϕ ∈ D(Sn−1).
Proof. This statement is known, however, we could not find precise ref-
erence and decided to give a proof for convenience of the reader. The
“if” part is obvious. To prove the “ only if” part, we write a test func-
tion ϕ ∈ D(Sn−1) as a sum ϕ = ϕ1+iϕ2, where ϕ1 = Reϕ, ϕ2 = Imϕ.
Since −||ϕ||C(Sn−1) ≤ ϕj ≤ ||ϕ||C(Sn−1), j = 1, 2, and f is positive, then
−(f, 1) ||ϕ||C(Sn−1) ≤ (f, ϕj) ≤ (f, 1) ||ϕ||C(Sn−1),
and therefore, |(f, ϕ)| ≤ |(f, ϕ1)|+ |(f, ϕ2)| ≤ 2(f, 1) ||ϕ||C(Sn−1). Since
D(Sn−1) is dense in C(Sn−1), then f extends as a linear continuous
functional f̃ on C(Sn−1) and, by the Riesz theorem, there is a mea-
sure µ on Sn−1 such that (f̃ , ω) =
ω(θ)dµ(θ) for every ω ∈
C(Sn−1). In particular, (f, ϕ) = (f̃ , ϕ) =
ϕ(θ)dµ(θ) for every
ϕ ∈ D(Sn−1). By taking into account that every non-negative function
ω ∈ C(Sn−1) can be uniformly approximated by non-negative functions
ϕk ∈ D(Sn−1) (for instance, by Poisson integrals of ω), we get
ω(θ)dµ(θ) = lim
ϕk(θ)dµ(θ) = lim
(f, ϕk) ≥ 0.
The latter means that µ is non-negative. �
References
[BZ] J. Bourgain, G. Zhang, On a generalization of the Busemann-Petty prob-
lem, Convex geometric analysis (Berkeley, CA, 1996), 65–76, Math. Sci.
Res. Inst. Publ., 34, Cambridge Univ. Press, Cambridge, 1999.
[Er] A. Erdélyi (Editor), Higher transcendental functions, Vol. II, McGraw-
Hill, New York, 1953.
INTERSECTION BODIES 37
[FGW] H. Fallert, P. Goodey, W, Weil, Spherical projections and centrally sym-
metric Sets, Advances in Math., 129 (1997), 301–322.
[F] W. Feller, An introduction to probability theory and its application,
Wiley & Sons, New York, 1971.
[G] R.J. Gardner, Geometric tomography, Cambridge University Press, New
York, 1995; updates in http://www.ac.wwu.edu/ gardner/.
[GGG] I. M. Gel’fand, S. G. Gindikin, andM. I. Graev, Selected topics in integral
geometry, Translations of Mathematical Monographs, AMS, Providence,
Rhode Island, 2003.
[GS] I. M. Gelfand, G.E. Shilov, Generalized functions, vol. 1, Properties and
Operations, Academic Press, New York, 1964.
[GV] I. M. Gelfand, N. Ya. Vilenkin, Generalized functions, vol. 4, Applica-
tions of harmonic analysis, Academic Press, New York, 1964.
[GLW] P. Goodey, E. Lutwak, W. Weil, Functional analytic characterizations of
classes of convex bodies, Math. Z. 222 (1996), 363–381.
[GW] P. Goodey, W. Weil, Intersection bodies and ellipsoids. Mathematika, 42
(1995), 295–304.
[GZ] E.L. Grinberg, G. Zhang, Convolutions, transforms, and convex bodies,
Proc. London Math. Soc. (3), 78 (1999), 77–115.
[He] S. Helgason, The Radon transform, Birkhäuser, Boston, Second edition,
1999.
[K1] A. Koldobsky, Intersection bodies in R4, Adv. Math., 136 (1998), 1-14.
[K2] , A generalization of the Busemann-Petty problem on sections of
convex bodies, Israel J. Math. 110 (1999), 75–91.
[K3] , A functional analytic approach to intersection bodies, Geom.
Funct. Anal., 10 (2000), 1507–1526.
[K4] , Fourier analysis in convex geometry, Mathematical Surveys and
Monographs, 116, AMS, 2005.
[Le] C. Lemoine, Fourier transforms of homogeneous distributions, Ann.
Scuola Norm. Super. Pisa Sci. Fis. e Mat., 26 (1972), No. 1, 117–149.
[Lu] E. Lutwak, Intersection bodies and dual mixed volumes, Adv. in Math.
71 (1988), 232–261.
[Mi1] E. Milman, Generalized intersection bodies, J. Funct. Anal., 240 (2006),
530–567.
[Mi2] , Generalized intersection bodies are not equivalent,
math.FA/0701779.
[Mü] Cl. Müller, Spherical harmonics, Springer, Berlin, 1966.
[Ne] U. Neri, Singular integrals, Springer, Berlin, 1971.
[PS] G. Polya, G. Szego, Aufgaben und lehrsatze aus der analysis, Springer-
Verlag, Berlin-New York, 1964.
[PBM] A. P. Prudnikov, Y. A. Brychkov, O. I. Marichev, Integrals and series:
special functions, Gordon and Breach Sci. Publ., New York - London,
1986.
[R1] B. Rubin, Inversion of fractional integrals related to the spherical Radon
transform, Journal of Functional Analysis, 157 (1998), 470–487.
[R2] , Inversion formulas for the spherical Radon transform and the
generalized cosine transform, Advances in Appl. Math. 29 (2002), 471–
38 BORIS RUBIN
[R3] , Notes on Radon transforms in integral geometry, Fractional Cal-
culus and Applied Analysis, 6 (2003), 25–72.
[R4] , The lower dimensional Busemann-Petty problem for bodies with
the generalized axial symmetry, math.FA/0701317.
[R5] , Generalized cosine transforms and classes of star bodies,
math.FA/0602540.
[RZ] B. Rubin, G. Zhang, Generalizations of the Busemann-Petty problem for
sections of convex bodies, J. Funct. Anal., 213 (2004), 473–501.
[Sa1] S. G. Samko, The Fourier transform of the functions Ym(x/|x|)/|x|n+α,
Soviet Math. (IZ. VUZ) 22 (1978), no. 7, 6–64.
[Sa2] , Generalized Riesz potentials and hypersingular integrals with
homogeneous characteristics, their symbols and inversion, Proceeding of
the Steklov Inst. of Math., 2 (1983) , 173–243.
[Sa3] , Singular integrals over a sphere and the construction of the
characteristic from the symbol, Soviet Math. (Iz. VUZ), 27 (1983), No.
4, 35–52.
[Schn] R. Schneider, Convex bodies: The Brunn-Minkowski theory, Cambridge
Univ. Press, 1993.
[Schw] L. Schwartz, Théorie des distributions, Tome 1, Paris, Hermann, 1950.
[Se] V.I. Semyanistyi, Some integral transformations and integral geometry in
an elliptic space, Trudy Sem. Vektor. Tenzor. Anal., 12 (1963), 397–441
(Russian).
[St] E. M. Stein, Singular integrals and differentiability properties of func-
tions, Princeton Univ. Press, Princeton, NJ, 1970.
[SW] E.M. Stein, G. Weiss, Introduction to Fourier analysis on Euclidean
spaces, Princeton Univ. Press, Princeton, NJ, 1971.
[Str1] R.S. Strichartz, Convolutions with kernels having singularities on a
sphere, Trans. Amer. Math. Soc., 148 (1970), 461–471.
[Str2] , Lp-estimates for Radon transforms in Euclidean and non-
euclidean spaces, Duke Math. J., 48 (1981), 699–727.
[Z1] G. Zhang, Sections of convex bodies, Amer. J. Math., 118 (1996), 319–
[Z2] , A positive solution to the Busemann-Petty problem in R4, Ann.
of Math. (2), 149 (1999), 535–543.
Department of Mathematics, Louisiana State University, Baton Rouge,
LA, 70803 USA
E-mail address : borisr@math.lsu.edu
ABSTRACT
  Intersection bodies represent a remarkable class of geometric objects
associated with sections of star bodies and invoking
  Radon transforms, generalized cosine transforms, and the relevant Fourier
analysis. The main focus of this article is interrelation between generalized
cosine transforms of different kinds in the context of their application to
investigation of a certain family of intersection bodies, which we call
$\lam$-intersection bodies. The latter include $k$-intersection bodies (in the
sense of A. Koldobsky) and unit balls of finite-dimensional subspaces of
$L_p$-spaces. In particular, we show that restrictions onto lower dimensional
subspaces of the spherical Radon transforms and the generalized cosine
transforms preserve their integral-geometric structure. We apply this result to
the study of sections of $\lam$-intersection bodies. New characterizations of
this class of bodies are obtained and examples are given. We also review some
known facts and give them new proofs.

<|endoftext|><|startoftext|>
Introduction
Hidden Markov models (HMMs) are generative probabilistic models that have been succesfuly
used for annotation of sequence data, such as DNA and protein sequences, natural langauge texts,
and sequences of observations or measurements. Their numerous applications include gene finding
[1], protein secondary structure prediction [2], and speech recognition [3]. The linear-time Viterbi
algorithm [4] is the most commonly used algorithm for these tasks. Unfortunately, the space required
by the Viterbi algorithm grows linearly with the length of the sequence (with a high constant factor),
which makes it unsuitable for analysis of continuous or very long sequences. For example, DNA
sequence of a single chromosome can be hundreds of megabases long. In this paper, we address this
problem by proposing an on-line Viterbi algorithm that on average requires much less memory and
that can annotate continuous streams of data on-line without reading the complete input sequence
first.
An HMM, composed of states and transitions, is a probabilistic model that generates sequences
over a given alphabet. In each step of this generative process, the current state generates one symbol
of the sequence according to the emission probabilities associated with that state. Then, an outgoing
transition is randomly chosen according to the transition probability table, and this transition is
followed to the new state. This process is repeated until the whole sequence is generated.
The states in the HMM represent distinct features of the observed sequences (such as protein
coding and non-coding sequences in a genome), and the emission probabilities in each state represent
statistical properties of these features. The HMM thus defines a joint probability Pr(X,S) over all
possible sequences X and all state paths S through the HMM that could generate these sequences.
To annotate a given sequence X, we want to recover the state path S that maximizes this joint
probability. For example, in an HMM with one state for protein-coding sequences, and one state
for non-coding sequences, the most probable state path marks each symbol of the input sequence
X as either protein coding or non-coding.
http://arxiv.org/abs/0704.0062v1
To compute the most probable state path, we use the Viterbi dynamic programming algorithm
[4]. For every prefix X1 . . . Xi of the given sequence X and for every state j, we compute the most
probable state path generating this prefix ending in state j. We store the probability of this path
in table P (i, j) and its second last state in table B(i, j). These values can be computed from left to
right, using the recurrence P (i, j) = maxk{P (i− 1, k) · tk(j) · ej(Xi)}, where tk(j) is the transition
probability from state k to state j, and ej(Xi) is the emission probability of the i-th symbol
of X in state j. Back pointer B(i, j) is the value of k that maximizes P (i, j). After computing
these values, we can recover the most probable state path S = s1, . . . , sn by setting the last state
as sn = argmaxk{P (n, k)}, and then following the back pointers B(i, j) from right to left (i.e.,
si = B(i + 1, si+1)). For an HMM with m states and a sequence X of length n, the running time
of the Viterbi algorithm is Θ(nm2), and the space is Θ(nm).
This algorithm is well suited for sequences and models of moderate size. However, to annotate
all 250 million symbols of the human chromosome 1 with a gene finding HMM consisting of hundred
states, we would require 25 GB of memory just to store the back pointers B(i, j). This is clearly
impractical on most computational platforms.
Several solutions are used in practice to overcome this problem. For example, most practical
gene finding programs process only sequences of limited size. The long input sequence is split into
several shorter sequences which are processed separately. Afterwards, the results are merged and
conflicts are resolved heuristically. This approach leads to suboptimal solutions, especially if the
genes we are looking for cross the boundaries of the split.
Grice et al. [5] proposed a practical checkpointing algorithm that trades running time for space.
We divide the input sequence into K blocks of L symbols, and during the forward pass, we only
keep the first column of each block. To obtain the most probable state path, we recompute the last
block of L columns, and use back pointers to recover the last L states of the most probable path, as
well as the last state of the previous block. The information about this last state can now be used to
recompute the most probable state path within the previous block in the same way, and the process
is repeated for all blocks. Since every value of P (i, j) will be computed twice, this means two-fold
slow-down compared to the Viterbi algorithm, but if we set K = L =
n, this algorithm only
requires Θ(
nm) memory. Checkpointing can be further generalized to trade L-fold slow-down for
memory of Θ( L
nm) [6, 7].
In this paper, we propose and analyze an on-line Viterbi algorithm that does not use fixed
amount of memory for a given sequence. Instead, the amount of memory varies depending on
the properties of the HMM and the input sequence. In the worst case, our algorithm still requires
Θ(nm) memory; however, in practice the requirements are much lower. We prove, by demonstrating
analogy to random walks and using results from the theory of extreme values, that in simple cases
the expected space for a sequence of length n is as low as Θ(m log n). We also experimentally
demonstrate that the memory requirements are low for more complex HMMs.
2 On-line Viterbi algorithm
In our algorithm, we represent the back pointer matrix B in the Viterbi algorithm by a tree structure
(see [4]), with node (i, j) for each sequence position i and each state j. Parent of node (i, j) is the
node (i − 1, B(i, j)). In this data structure, the most probable state path is a path from the leaf
node (n, j) with the highest probability P (n, j) to the root of the tree (see Figure 1).
This tree is built as the Viterbi algorithm progresses from left to right. After processing sequence
position i, all edges that do not lie on one of the paths ending in a level i node can be removed;
sequence positions
Fig. 1. Example of the back pointer tree structure. Dashed lines mark the edges that cannot be part of the
most probable state path. The square node marks the coalescence point of the remaining paths.
these edges will not be used in the most probable path [8]. The remaining m paths represent all
possible initial segments of the most probable state path. These paths are not necessarily edge
disjoint; in fact, often all the paths share the same prefix up to some node called coalescence point
(see Figure 1).
Left of the coalescence point, there is only a single candidate for the initial segment of the most
probable state path. Therefore we can output this segment and remove all edges and nodes of the
tree up to the coalescence point. Forney [4] describes an algorithm that after processing D symbols
of the input sequence checks whether a coalescence point has been reached; in such case, the initial
segment of the most probable state path is outputted. If the coalescence point was not reached,
one potential initial segment is chosen heuristicaly. Several studies [9, 10] suggest how to choose D
to limit the expected error caused by such heuristic steps in the context of convolution codes.
Here we show how to detect the existence of a coalescence point dynamically without introducing
significant overhead to the whole computation. We maintain a compressed version of the back
pointer tree, where we omit all internal nodes that have less than two children. Any path consisting
of such nodes will be contracted to a single edge. This compressed tree has m leaves and at most
m− 1 internal nodes. Each node stores the number of its children and a pointer to its parent node.
We also keep a linked list of all the nodes of the compressed tree ordered by the sequence position.
Finally, we also keep the list of pointers to all the leaves.
When processing the k-th sequence position in the Viterbi algorithm, we update the compressed
tree as follows. First, we create a new leaf for each node at position i, link it to its parent (one of
the former leaves), and insert it into the linked list. Once these new leaves are created, we remove
all the former leaves that have no children, and recursively all of their ancestors that would not
have any children.
Finally, we need to compress the new tree: we examine all the nodes in the linked list in order
of decreasing sequence position. If the node has zero or one child and is not a current leaf, we
simply delete it. For each leaf or node that has at least two children, we follow the parent links
until we find its first ancestor (if any) that has at least two children and link the current node
directly to that ancestor. A node (ℓ, j) that does not have an ancestor with at least two children
is the coalescence point; it will become a new root. We can output the most probable state path
for all sequence positions up to ℓ, and remove all results of computation for these positions from
memory.
The running time of this update is O(m) per sequence position, and the representation of the
compressed tree takes O(m) space. Thus the asymptotic running time of the Viterbi algorithm is
not increased by the maintanance of the compressed tree. Moreover, we have implemented both
the standard Viterbi algorithm and our new on-line extension, and the time measurements suggest
that the overhead required for the compressed tree updates is less than 5%.
The worst-case space required by this algorithm is still O(nm). However, this is rarely the case
for realistic data; required space changes dynamically depending on the input. In the next section,
we show that for simple HMMs the expected maximum space required for processing sequence of
length n is Θ(m log n). This is much better than checkpointing, which requires space of Θ(m
with a significant increase in running time. We conjecture that this trend extends to more complex
cases. We also present experimental results on a gene finding HMM and real DNA sequences showing
that the on-line Viterbi algorithm leads to significant savings in memory.
Another advantage of our algorithm is that it can construct initial segments of the most probable
state path before the whole input sequence is read. This feature makes it ideal for on-line processing
of signal streams (such as sensor readings).
3 Memory requirements of the on-line Viterbi algorithm
In this section, we analyze the memory requirements of the on-line Viterbi algorithm. The memory
used by the algorithm is variable throughout the execution of the algorithm, but of special interest
are asymptotic bounds on the expected maximum amount of memory used by the algorithm while
decoding a sequence of length n.
We use analogy to random walks and results in extreme value theory to argue that for a symmet-
ric two-state HMMs, the expected maximum memory is Θ(m log n). We also conduct experiments
on an HMM for gene finding, and both real and simulated DNA sequences.
3.1 Symmetric two-state HMMs
Consider a two-state HMM over a binary alphabet as shown in Figure 2a. For simplicity, we assume
t < 1/2 and e < 1/2. The back pointers between the sequence positions i and i+1 can form one of
the configurations i–iii shown in Figure 2b. Denote pA = log P (i, A) and pB = logP (i, B), where
P (i, j) is the table of probabilities from the Viterbi algorithm. The recurrence used in the Viterbi
algorithm implies that the configuration i occurs when log t−log(1−t) ≤ pA−pB ≤ log(1−t)−log t,
configuration ii occurs when pA−pB ≥ log(1−t)−log t, and configuration iii occurs when pA−pB ≤
log t− log(1− t). Configuration iv never happens for t < 1/2.
Note that for a two-state HMM, a coalescence point occurs whenever one of the configurations
ii or iii occur. Thus the memory used by the HMM is proportional to the length of continuous
sequence of configurations i. We will call such a sequence of configurations a run.
First, we analyze the length distribution of runs under the assumption that the input sequence
X is a sequence of uniform i.i.d. binary random variables. In such case, we represent the run by a
symmetric random walk corresponding to a random variable X = pA−pB
log(1−e)−log e
− (log t− log(1− t)).
Whenever this variable is within the interval (0,K), where K =
log(1−t)−log(t)
log(1−e)−log(e)
, the configuration
i occurs, and the quantity pA−pB is updated by log(1−e)−log e, if the symbol at the corresponding
sequence position is 0, or log e− log(1− e), if this symbol is 1. These shifts correspond to updating
the value of X by +1 or −1.
When X reaches 0, we have a coalescence point in configuration iii, and the pA−pB is initialized
to log t− log(1 − t) ± (log e − log 1 − e), which either means initialization of X to +1, or another
0: 1−e
1−t 1−t
1: 1−e
configuration i:
configuration ii:
configuration iii: configuration iv:
(a) (b)
Fig. 2. (a) Symmetric two-state HMM with two parameters: e for emission probabilities and t for transitions
probabilities. (b) Possible back-pointer configurations for the two-state HMM.
coalescence point, depending on the symbol at the corresponding sequence position. The other case,
when X reaches K and we have a coalescence point in configuration ii, is symmetric.
We can now apply the classical results from the theory of random walks (see [11, ch.14.3,14.5])
to analyze the expected length of runs.
Lemma 1. Assuming that the input sequence is uniformly i.i.d., the expected length of a run of a
symmetrical two-state HMM is K − 1.
Therefore the larger is K, the more memory is required to decode the HMM. The worst case is
achieved as e approaches 1/2. In such case, the two states are indistinguishable and being in state
A is equivalent to being in state B. Using the theory of random walks, we can also characterize the
distribution of length of runs.
Lemma 2. Let Rℓ be the event that the length of a run of a symmetrical two-state HMM is either
2ℓ + 1 or 2ℓ + 2. Then, assuming that the input sequence is uniformly i.i.d., for some constants
b, c > 0:
b · cos2ℓ π
≤ Pr(Rℓ) ≤ c · cos2ℓ
Proof. For a symmetric random walk on interval (0,K) with absorbing barriers and with starting
point z, the probability of event Wz,n that this random walk ends in point 0 after n steps is zero,
if n− z is odd, and the following quantity, if n− z is even [11, ch.14.5]:
Pr(Wz,n) =
0<v<K/2
cosn−1
Using symmetry, note that the probability of the same random walk ending after n steps at barrier
K is the same as probability of WK−z,n. Thus, if K is odd, we can state:
Pr(Rℓ) = Pr(W1,2ℓ+1) + Pr(WK−1,2ℓ+1)
0<v<K/2
cos2ℓ
+ (−1)v+1 sin πv
0<v<K/2, v odd
cos2ℓ
There are at most K/4 terms in the sum and they can all be bounded from above by cos2ℓ πv
Thus, we can give both upper and lower bounds on Pr(Rℓ) using only the first term of the sum as
follows:
cos2ℓ
≤ Pr(Rℓ) ≤ cos2ℓ
Similarly, if K is even, we can state:
Pr(Rℓ) = Pr(W1,2ℓ+1) + Pr(WK−1,2ℓ+2)
0<v<K/2
cos2ℓ
1 + (−1)v+1 cos
and thus we have a similar bound:
1 + cos
cos2ℓ
≤ Pr(Rℓ) ≤ 2 cos2ℓ
The previous lemma characterizes the length distribution of a single run. However, to analyze
memory requirements for a sequence of length n, we need to consider maximum over several runs
whose total length is n. Similar problem was studied for the runs of heads in a sequence of n coin
tosses [12, 13]. For coin tosses, the length distribution of runs is geometric, while in our case the
runs are only bounded by geometricaly decaying functions. Still, we can prove that the expected
length of the longest run grows logarithmically with the length of the sequence, as is the case for
the coin tosses.
Lemma 3. Let X1,X2, . . . be a sequence of i.i.d. random variables drawn from a geometrically
decaying distribution over positive integers, i.e. there exist constants a, b, c, a ∈ (0, 1), 0 < b ≤ c,
such that for all integers k ≥ 1, bak ≤ Pr(Xi > k) ≤ cak.
Let Nn be the largest index such that
i=1...Nn
Xi ≤ n, and let Yn be max{X1,X2, . . . ,XNn , n−
i=1Xi}. Then
E[Yn] = log1/a n+ o(log n) (7)
Proof. Let Zn = maxi=1...nXn be the maximum of the first n runs. Clearly, Pr(Zn ≤ k) = Pr(Xi ≤
k)n, and therefore (1− cak)n ≤ Pr(Zn ≤ k) ≤ (1− bak)n for all integers k ≥ log1/a(c).
Lower bound: Let tn = log1/a n−
lnn. If Yn ≤ tn, we need at least n/tn runs to reach the sum n,
i.e. Nn ≥ n/tn − 1 (discounting the last incomplete run). Therefore
Pr(Yn ≤ tn) ≤ Pr(Z n
−1 ≤ tn) ≤ (1− batn)
= (1− batn)a
−tnatn ( n
Since limn→∞ a
tn(n/tn−1) = ∞ and limx→0(1− bx)1/x = e−b, we get limn→∞Pr(Yn ≤ tn) = 0.
Note that E[Yn] ≥ tn(1− Pr(Yn ≤ tn)), and thus we get the desired bound.
Upper bound: Clearly, Yn ≤ Zn and so E[Yn] ≤ E[Zn]. Let Z ′n be the maximum of n i.i.d. geometric
random variables X ′1, . . . ,X
n such that Pr(X
i ≤ k) = 1− ak.
We will compare E[Zn] to the expected value of variable Z
n. Without loss of generality, c ≥ 1.
For any real x ≥ log1/a(c) + 1 we have:
Pr(Zn ≤ x) ≥ (1− ca⌊x⌋)n
1− a⌊x⌋−log1/a(c)
1− a⌊x−log1/a(c)−1⌋
= Pr(Z ′n ≤ x− log1/a(c)− 1)
= Pr(Z ′n + log1/a(c) + 1 ≤ x)
This inequality holds even for x < log1/a(c) + 1, since the right-hand side is zero in such case.
Therefore, E[Zn] ≤ E[Z ′n+log1/a(c)+1] = E[Z ′n]+O(1). Expected value of Z ′n is log1/a(n)+o(log n)
[14], which proves our claim. ⊓⊔
Using results of Lemma 3 together with the characterization of run length distributions by
Lemma 2, we can conclude that for symmetric two-state HMMs, the expected maximum memory
required to process a uniform i.i.d. input sequence of length n is (1/ ln(1/ cos(π/K)))·ln n+o(log n).
3 Using the Taylor expansion of the constant term as K grows to infinity, 1/ ln(1/ cos(π/K))) =
2K2/π2 +O(1), we obtain that the maximum memory grows approximately as (2K2/π2) lnn.
The asymptotic bound Θ(log n) can be easily extended to the sequences that are generated by
the symmetric HMM, instead of uniform i.i.d. The underlying process can be described as a random
walk with approximately 2K states on two (0,K) lines, each line corresponding to sequence symbols
generated by one of the two states. The distribution of run lengths still decays geometrically as
required by Lemma 3; the base of the exponent is the largest eigenvalue of the transition matrix
with absorbing states omitted (see e.g. [15, Claim 2]).
The situation is more complicated in the case of non-symmetric two-state HMMs. Here, our
random walks proceed in steps that are arbitrary real numbers, different in each direction. We are
not aware of any results that would help us to directly analyze distributions of runs in these models,
however we conjecture that the size of the longest run is still Θ(log n). Perhaps, to obtain bounds
on the length distribution of runs, one can approximate the behaviour of such non-discrete random
walks by a different model (for example, [16, ch.7]).
3.2 Multi-state HMMs
Our analysis technique cannot be easily extended to HMMs with many states. In two-state HMMs,
each new coalescence event clears the memory, and thus the execution of the algorithm can be
divided into more or less independent runs. A coalescent event in a multi-state HMM results in a
non-trivial tree left in memory, sometimes with a substantial depth. Thus, the sizes of consecutive
runs are no longer independent (see Figure 3a).
3 We omitted the first run, which has a different starting point and thus does not follow the distribution outlined in
Lemma 2. However, the expected length of this run does not depend on n and thus contributes only a lower-order
term. We also omitted the runs of length one that start outside the interval (0,K); these runs again contribute
only to lower order terms of the lower bound.
15.2M 15.3M 15.4M 15.5M
Section of chromosome 1
0 5M 10M 15M 20M
Sequence length
Human genome (35)
HMM generated (100)
Random i.i.d. (35)
Fig. 3. Memory requirements of a gene finding HMM. a) Actual length of table used on a segment of human
chromosome 1. b) Average maximum table length needed for prefixes of 20 MB sequences.
To evaluate the memory requirements of our algorithm for multi-state HMMs, we have im-
plemented the algorithm and performed several experiments on both simulated and biological se-
quences. First, we generalized the symmetric HMMs from the previous section to multiple states.
The symmetric HMM with m states emits symbols over m-letter alphabet, where each state
emits one symbol with higher probability than the other symbols. The transition probabilities
are equiprobable, except for self-transitions. We have tested the algorithm for m ≤ 6 and sequences
generated both by a uniform i.i.d. process, and by the HMM itself. Observed data are consistent
with the logarithmic growth of average maximum memory needed to decode a sequence of length
n (data not shown).
We have also evaluated the algorithm using a simplified HMM for gene finding with 265 states.
The emission probabilities of the states are defined using at most 4-th order Markov chains, and
the structure of the HMM reflects known properties of genes (similar to the structure shown in
[17]). The HMM was trained on RefSeq annotations of human chromosomes 1 and 22.
In gene finding, we segment the input DNA sequence into exons (protein-coding sequence in-
tervals), introns (non-coding sequence separating exons within a gene), and intergenic regions (se-
quence separating genes). Common measure of accuracy is exon sensitivity (how many of real exons
we have succesfuly and exactly predicted). The implementation used here has exon sensitivity 37%
on testing set of genes by Guigo et al. [18]. A realistic gene finder, such as ExonHunter [19], trained
on the same data set achieves sensitivity of 53%. This difference is due to additional features that
are not implemented in our test, namely GC content levels, non-geometric length distributions, and
sophisticated signal models.
We have tested the algorithm on 20 MB long sequences: regions from the human genome,
simulated sequences generated by the HMM, and i.i.d. sequences. Regions of the human genome
were chosen from hg18 assembly so that they do not contain sequencing gaps. The distribution for
the i.i.d. sequences mirrors the distribution of bases in the human chromosome 1.
The results are shown in Figure 3b. The average maximum length of the table over several
samples appears to grow faster than logarithmically with the length of the sequence, though it
seems to be bounded by a polylogarithmic function. It is not clear whether the faster growth is an
artifact that would disapear with longer sequences or higher number of samples.
The HMM for gene finding has a special structure, with three copies of the state for introns
that have the same emission probabilities and the same self-transition probability. In two-state
symmetric HMMs, similar emission probabilities of the two states lead to increase in the length of
individual runs. Intron states of a gene finder are an extreme example of this phenomenon.
Nonetheless, on average a table of length roughly 100,000 is sufficient to to process sequences
of length 20 MB, which is a 200-fold improvement compared to the trivial Viterbi algorithm. In
addition, the length of the table did not exceed 222,000 on any of the 20MB human segments. As
we can see in Figure 3a, most of the time the program keeps only relatively short table; the average
length on the human segments is 11,000. The low average length can be of a significant advantage
if multiple processes share the same memory.
4 Conclusion
In this paper, we introduced the on-line Viterbi algorithm. Our algorithm is based on efficient detec-
tion of coalescence points in trees representing the state-paths under consideration of the dynamic
programming algorithm. The algorithm requires variable space that depends on the HMM and
on the local properties of the analyzed sequence. For two-state symmetric HMMs, we have shown
that the expected maximum memory used for analysis of sequence of length n is approximately
only (2K2/π2) ln n. Our experiments on both simulated and real data suggest that the asymptotic
bound Θ(m lnn) also extend to multi-state HMMs, and in fact, for most of the time throughout
the execution of the algorithm, much less memory is used.
Further advantage of our algorithm is that it can be used for on-line processing of streamed
sequences; all previous algorithms that are guaranteed to produce the optimal state path require
the whole sequence to be read before the output can be started.
There are still many open problems. We have only been able to analyze the algorithm for two-
state HMMs, though trends predicted by our analysis seem to generalize even to more complex cases.
Can our analysis be extended to multi-state HMMs? Apparently, design of the HMM affects the
memory needed for the decoding algorithm; for example, presence of states with similar emission
probabilities tends to increase memory requirements. Is it possible to characterize HMMs that
require large amounts of memory to decode? Can we characterize the states that are likely to serve
as coalescence points?
Acknowledgments: Authors would like to thank Richard Durrett for useful discussions. Recently, we
have found out that parallel work on this problem is also performed by another research group [20].
Focus of their work is on implementation of an algorithm similar to our on-line Viterbi algorithm
in their gene finder, and possible applications to parallelization, while we focus on the expected
space analysis.
References
1. Burge, C., Karlin, S.: Prediction of complete gene structures in human genomic DNA. Journal of Molecular
Biology 268(1) (1997) 78–94
2. Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L.: Predicting transmembrane protein topology with a
hidden Markov model: application to complete genomes. Journal of Molecular Biology 305(3) (2001) 567–570
3. Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings
of the IEEE 77(2) (1989) 257–286
4. Forney Jr., G.D.: The Viterbi algorithm. Proceedings of the IEEE 61(3) (1973) 268–278
5. Grice, J.A., Hughey, R., Speck, D.: Reduced space sequence alignment. Computer Applications in the Biosciences
13(1) (1997) 45–53
6. Tarnas, C., Hughey, R.: Reduced space hidden Markov model training. Bioinformatics 14(5) (1998) 401–406
7. Wheeler, R., Hughey, R.: Optimizing reduced-space sequence analysis. Bioinformatics 16(12) (2000) 1082–1090
8. Henderson, J., Salzberg, S., Fasman, K.H.: Finding genes in DNA with a hidden Markov model. Journal of
Computational Biology 4(2) (1997) 127–131
9. Hemmati, F., Costello, D., J.: Truncation error probability in Viterbi decoding. IEEE Transactions on Commu-
nications 25(5) (1977) 530–532
10. Onyszchuk, I.: Truncation length for Viterbi decoding. IEEE Transactions on Communications 39(7) (1991)
1023–1026
11. Feller, W.: An Introduction to Probability Theory and Its Applications, Third Edition, Volume 1. Wiley (1968)
12. Guibas, L.J., Odlyzko, A.M.: Long repetitive patterns in random sequences. Probability Theory and Related
Fields 53 (1980) 241–262
13. Gordon, L., Schilling, M.F., Waterman, M.S.: An extreme value theory for long head runs. Probability Theory
and Related Fields 72 (1986) 279–287
14. Schuster, E.F.: On overwhelming numerical evidence in the settling of Kinney’s waiting-time conjecture. SIAM
Journal on Scientific and Statistical Computing 6(4) (1985) 977–982
15. Buhler, J., Keich, U., Sun, Y.: Designing seeds for similarity search in genomic DNA. Journal of Computer and
System Sciences 70(3) (2005) 342–363
16. Durrett, R.: Probability: Theory and Examples. Duxbury Press (1996)
17. Brejova, B., Brown, D.G., Vinar, T.: Advances in hidden Markov models for sequence annotation. In Mandoiu,
I., Zelikovski, A., eds.: Bioinformatics Algorithms: Techniques and Applications. Wiley (2007) To appear.
18. Guigo, R., et al.: EGASP: the human ENCODE Genome Annotation Assessment Project. Genome Biology
7(S1) (2006) 1–31
19. Brejova, B., Brown, D.G., Li, M., Vinar, T.: ExonHunter: a comprehensive approach to gene finding. Bioinfor-
matics 21(S1) (2005) i57–65
20. Keibler, E., Brent, M.: Personal communication (2006)
ABSTRACT
  In this paper, we introduce the on-line Viterbi algorithm for decoding hidden
Markov models (HMMs) in much smaller than linear space. Our analysis on
two-state HMMs suggests that the expected maximum memory used to decode
sequence of length $n$ with $m$-state HMM can be as low as $\Theta(m\log n)$,
without a significant slow-down compared to the classical Viterbi algorithm.
Classical Viterbi algorithm requires $O(mn)$ space, which is impractical for
analysis of long DNA sequences (such as complete human genome chromosomes) and
for continuous data streams. We also experimentally demonstrate the performance
of the on-line Viterbi algorithm on a simple HMM for gene finding on both
simulated and real DNA sequences.

<|endoftext|><|startoftext|>
Introduction 
Neutrinoless double beta decay is one of the most sensitive approaches with great perspectives to test particle 
physics beyond the Standard Model. There is immense scope to use 0νββ decay for constraining neutrino masses, 
left–right–symmetric models, interactions involving R-parity breaking in the supersymmetric model and 
leptoquark scenarios, as well as effective lepton number violating couplings. Experimental limits on 0νββ decay 
are not only complementary to accelerator experiments but at least in some cases competitive or superior to the 
best existing direct search limits. The steadily improving experimental limits on the half-life of 0νββ can be 
translated into more stringent limits on the parameters of these new physics scenarios.  
In the process of beta decay an unstable nucleus decays by converting a neutron in the nucleus to a proton and 
emitting an electron and an anti-neutrino. In order for beta decay to be possible the final nucleus must have a 
larger binding energy than the original nucleus. For some nuclei, such as Germanium-76 the nuclei with atomic 
number one higher have a smaller binding energy, preventing beta decay from occurring. In the case of 
Germanium-76 the nuclei with atomic number two higher, Selenium-76 has a larger binding energy, so the 
"double beta decay" process is allowed. In double beta decay two neutrons in the nuclei are converted to protons, 
and two electrons and two anti-neutrinos are emitted. It is the rarest known kind of radioactive decay; it was 
observed for only ten isotopes. For some nuclei, the process occurs as conversion of two protons to neutrons, with 
emission of two neutrinos and absorption of two orbital electrons (double electron capture). If mass difference 
between the parent and daughter atoms is more than 1022 keV (two electron masses), another branch of the 
process becomes possible, with capture of one orbital electron and emission of one positron. And, at last, when 
the mass difference is more then 2044 keV (four electron masses), the third branch of the decay arises, with 
emission of two positrons (β+β+ decay).  
The processes described above are also known as two neutrino double beta decay, as two neutrinos (or anti-
neutrinos) are emitted. If the neutrino is a Majorana particle, meaning that the anti-neutrino and the neutrino are 
actually the same particle then it is possible for neutrinoless double beta decay to occur. In 0νββ decay the emitted 
neutrino is immediately absorbed (as its anti-particle) by another nucleon of the nucleus, so the total kinetic 
energy of the two electrons would be exactly the difference in binding energy between the initial and final state 
nuclei.  
† Now at Indiana University – Bloomington, USA 
Experiments have been carried out and proposed to search for 0νββ decay mode, as its discovery would indicate 
that neutrinos are indeed Majorana particles and allow a calculation of neutrino mass. While the two-neutrino 
mode (1.1) is allowed by the Standard Model of particle physics, the neutrinoless mode (0νββ) (1.2) requires 
violation of lepton number (∆L=2). This mode is possible only, if the neutrino is a Majorana particle, i.e. the 
neutrino is its own antiparticle. Double beta decay, the rarest known nuclear decay process, can occur in different 
modes:  
                                                                            _   
2νββ -decay : A(Z,N) → A(Z+2, N-2)+2e ⎯ +2 ν                                                                            (1.1) 
0 ν ββ -decay : A(Z,N) → A(Z+2, N-2) + 2e⎯                                                                                           (1.2) 
0 ν(2) χ ββ -decay : A(Z,N) → A(Z+2, N-2)+2e ⎯  + (2) χ                                                                                   (1.3) 
                                                                                          
2.  Double Beta Decay:  A Rare Process 
The process arises in certain cases of even-A nuclei, where A is the mass number and is the sum of the number of 
protons and neutrons (A = Z + N). For even-A nuclei, the strong pairing force between like nucleons (neutrons 
like to be paired with other neutrons in a given nucleus, with the same true for protons), the binding energy of 
even-even nuclei (even number of protons and even number neutrons) is larger than that of odd-odd nuclei (odd 
numbers of protons and neutrons). This fact results in two separate parabolas on a plot of binding energy, one 
parabola for even-even nuclei and one for odd-odd. Consequently, one occasionally finds a situation where two 
even-even nuclei for a given mass number A are stable against ordinary beta decay. However, the heavier nucleus 
is not fully stable and can decay to the lighter nucleus via normal double beta decay, a second-order process 
whereby the nuclear charge changes by two units. The ground state of the even-even nuclei is 0+ (positive parity) 
and the nuclear transition is 0+ +→0 . 
                 
One particular type of experimental approach that hopes to determine if the neutrino is a massive Majorana 
particle is the search for neutrinoless double beta decay. This type of experiment is perhaps the only feasible 
method for determining if the neutrino is a Majorana or Dirac particle. While neutrinoless double beta decay has 
not yet been experimentally discovered, searches have been conducted for many years, with many continuing 
today. In fact, the next generation of double beta decay experiments is currently being designed and developed 
and involves a tremendous increase in the amount of source material to be studied (on the order of a half-ton or 
more). In neutrinoless double beta decay an antineutrino emitted at the first vertex is absorbed at the second as 
 - 2 -
seen in the figure below or that a virtual neutrino emitted by a neutron is absorbed by the second neutron 
participating in the double beta decay.   
          
                                              
The two neutrino mode is allowed in standard model. The neutrinoless mode can occur only if neutrinos have 
masses of the Majorana type. The decay rate is proportional to the squared mass. In other words, the half life is 
inversely proportional to the squared mass. Experimentally one can distinguish the two modes. In the two 
neutrino mode the electrons take away only a fraction of the energy Q released in the decay. The sum energy 
spectrum is continuous, extending from 0 to Q. In the neutrinoless mode the total energy Q is carried away by the 
electrons, and the sum energy spectrum is a peak centered at Q, with a width given by the instrumental resolution. 
            
                      
allowed in the Standard Model of physics is given by  
                                                                                   (2.1) 
the exchange of the Majorana neutrino in the absence of right-handed 
                                                               (2.2) 
tively. The nuclear 
y in the 
The decay rate for 2ν ββ decay which is 
The decay rate for the process involving 
currents can be expressed as follows: 
2/1 ),()00(
MZEGT −=→
[ ] 2
2/1 ),()00( ><−=→
νννν mM
MZEGT F
The M  and MGT F are the nuclear matrix elements of Gamow-Teller and Fermi transitions respec
atrix elements of the 0+→0+ Gamow-Teller and Fermi transition for the two neutrino mode in weak theorm
second order perturbation is given by 
 - 3 -
                                   (2
 ∑ Δ+−= n in
∑∑ ++++++ iknnjf 0110 ττ
 - 4 -
nd is given by                           and the Gamow-Teller transition operator 
                                 respectively. 
 complete orthonormal set of intermediate excited states have been introduce  denot
eutrino double beta decay mode has been expressed in terms of single beta transitions through the introduction of 
termediate excited states via which the transition from the initial 0+ to the final 0+ state occurs.  
or the neutrinoless mode, the nuclear matrix elements resulting from Fermi and Gamow-Teller transitions are 
        (2.6) 
      
          (2.7) 
here, R  ro=1.2 fm. 
he par e lepton phase space, and gV and gA are the weak vector and 
xial-vector coupling constants respectively. The <mν> is the effective electron neutrino ma . If the light 
j« few MeV) exchange is the dominant mechanism at both 
ton number 
mj) and the mixing 
arameters (
                                                                                                                                       
               
est possible construction cost. The next sections review experiments 
3.   International Germa iment  (IGEX)  
 is a unique ground to investigate the nature and properties of the neutrino. The 
tern. To 
           (2.4) 
where Δ denotes the average energy a
∑ Δ+−= n in
∑∑ ++++++ ikknnjjf 0110 τστσ
and the Fermi transition operator is given by.
A d ed by    . Thus the two 
given by 
           (2.5) 
The function H depends on the distance between the nucleons and approximately has the form 
w = ro A
⅓,  A being the mass number and
T t G2ν and G0ν results from integrating over th
neutrino (m  for the 0νββ-decay process and th
the neutrino currents are left-handed, then the 0νββ-decay amplitude is proportional to the lep
iolating parameters. This effective mass is related to the light neutrino mass eigenvalues (v
p Uej) and is give by the relation 
              
                (2.8) 
The effective light neutrino mass <mν> may be suppressed by a destructive interference between the different 
contributions in the sum of equation (2.8) if CP is conserved. In this case the mixing matrix satisfies the condition 
Uej= Uej*.ζj, where ζj = ±i is the CP parity of the Majorana neutrino νj. The absolute value has thus been inserted 
for convenience, since the quantity inside it is squared in equation (2.8) and is complex if CP is violated. 
The ideal 0νββ-decay experiment has the following dream features: the lowest possible background, the best 
possible energy resolution, the greatest possible mass of the parent isotope, detection efficiency near 100% for 
valid events, a unique signature and the low
in such an effort from the isotope 76Ge. 
nium EXper
The nuclear Double Beta Decay
neutrinoless decay mode, if it exists, would provide an unambiguous evidence of the Majorana nature of the 
neutrino, its non-zero mass, and the non-conservation of lepton number. After implication from solar and 
atmospheric neutrino oscillation results that neutrinos have non-zero mass, the process of neutrinoless Double 
Beta Decay has become the most relevant place to test the neutrino mass scale and its hierarchy pat
achieve high sensitivity limits of the effective Majorana electron neutrino mass derived from the neutrinoless half-
life lower bound required for such new objectives, it will require a large number of double beta emitter nuclei, a 
ejj Umm
++++ ∑= ik
jjkfF ErHM 0),(0
0 ττν
++++ ∑= ikjjkfGT ErHM 0.),(00 ττν kj σσ
2)( fi EEΔ = −
∑ ∑ ++
jj ττσ  and
∫ +−+=
}2/)({
fi EEE
sin2 qrqR
 - 5 -
 of this type of search was the IGEX. The International Germanium 
Xperiment (IGEX) was a search for the neutrinoless double beta decay of 76Ge employing large amounts of 
In the first phase of the experiment three detectors of 0.7 
ents (the most 
 sites. It provided a rejection of ~ 60 % of the events in the region of 
                                            
                                   IGEX spectrum with and without the PSD background rejection. 
The IGEX detectors had the initial objective of the detection of the double beta decay of 76Ge. At the end of 1999 
certain modifications were made to adapt the detectors to the detection at low energy where the signal of WIMPs 
(Weak Interacting Massive Particles) is relevant. The shielding, shared by three IGEX detectors (2 kg germanium 
detectors isotopically enriched to 86% in 76Ge) and the COSME detector, included from inside to outside 40 cm 
of lead, a PVC box (silicone sealed and flushed with nitrogen), 2 mm of cadmium, plastic scintillators working in 
anticoincidence with the Ge detectors and 20 cm of polyethylene. The shielding was modified on July 2001 as it 
included only one 2 kg germanium detector inside a more efficient neutron shielding. These techniques of passive 
very low background and a sharp energy resolution in the Q-value region, and effective methods to disentangle 
signal from noise. A typical example
HPGe detectors, isotopically enriched to 86% in 76Ge.  
kg active volume each were operated: one in the Homestake gold mine (4000 m.w.e.), other in the Baksan 
Neutrino Observatory (660 m.w.e.) and the other in the Canfranc underground laboratory (Laboratory 2 at 1380 
m.w.e.). A conservative lower bound on the neutrinoless half-life of about 1024 years was derived.  
The International Germanium EXperiment (IGEX) took data at the Canfranc Underground Laboratory in Spain at 
a depth of 2450 m.w.e. in a search of neutrinoless double beta decay.  Three Germanium detectors (RG1, RG2 
and RG3), of ~2 kg each, enriched to 86% in 76Ge were used. Efforts were made to reduce part of the radioactive 
background by discriminating it from the expected signal by comparison of the shape of the pulses (PSD) of both 
types of events. The method was applied to the data recorded by two Ge detectors of the IGEX, which has 
produced one of the two best current sensitivity limits for the Majorana neutrino mass parameter. In the second 
phase, three large detectors (2 kg each) were fabricated (with improvements derived from the analysis of data of 
Phase 1). They are installed in the Canfranc underground laboratory (Laboratory 3 at 2450 m.w.e.) inside a low 
background shielding consisting of 40 cm of lead, a PVC box (silicone sealed and flushed with nitrogen), 2mm of 
cadmium, 20 cm of polyethylene and an active veto (plastic scintillators). A pulse shape discrimination (PSD) 
technique capable to distinguish single site events (ββ decay events for example) from multisite ev
dominant background events) is implemented. New limits on the neutrinoless half-life and the neutrino mass 
parameter were thus obtained from here. 
In large intrinsic Ge detectors, the charge carriers take 300 - 500 ns to reach their respective electrodes. These 
drift times are long enough for the current pulses to be recorded at a sufficient sampling rate. The current pulse 
contributions from electrons and holes are displacement currents, and therefore dependent on their instantaneous 
velocities and locations. Accordingly, events occurring at a single site (ββ-decay events for example) have 
associated current pulse characteristics which reflect the position in the crystal where the event occurred. More 
importantly, these single-site events (SSE) frequently have pulse shapes that differ significantly from those due to 
the background events that produce electron-hole pairs at several sites by multi-Compton-scattering process, for 
example (the so-called Multi-Site Events (MSE)). Consequently, pulse-shape analysis was used to distinguish 
between these two types of energy depositions since DBD events belong to the SSE class of events and will 
deposit energy at a single site in the detector while most of the background events belong to the MSE class of 
events and will deposit energy at several
interest, accepting the criterion that those events having more than two lobes cannot be due to DBD event. 
 - 6 -
and active shielding, along with the extreme radiopurity of the detectors and their components, allowed a low 
energy background as well as a low enough threshold which are unique in this type of detectors. So, very stringent 
contour limits for cross sections and masses of dark matter particles interacting with Ge nuclei through spin-
independent interactions were derived from here. The need to understand and reject backgrounds in Ge-diode 
detector double-beta decay experiments thus gave rise to the development of the pulse shape analysis technique in 
such detectors to distinguish DBD single-site energy deposits from the multiple-site deposits. Henceforth the 
analysis was extended by DBD people to segmented Ge detectors to study the effectiveness of combining 
segmentation with pulse shape analysis to identify the multiplicity of the energy deposits. 
The IGEX calculations for a lower bound to the half-life for the neutrinoless mode where there were fewer than 
3.1 candidate events (90% Confidence Level) under a peak having FWHM = 4 keV and centered at 2038.56 keV 
corresponded to:  
he requirements for a next generation experiment can easily be deduced by reference to                                 (3.1) 
             
where N is the number of parent nuclei, t is the counting time, and c is the upper limit on the number of 0νββ-
decay counts consistent with the observed background. To improve the sensitivity of ‹mν› by a factor of 100, the 
quantity Nt/c must be increased by a factor of 104. The quantity N can feasibly be increased by a factor of ~102 
over present experiments, so that t/c must also be improved by that amount. Since practical counting times can 
only be increased by a factor of 2 to 4, the background should be reduced by a factor of 25 to 50 below present 
levels.  These are approximately the target parameters of the next generation neutrinoless double-beta decay 
experiments.  
                                               
Histogram of the IGEX data in the energy region of interest for the 0ν -ββ decay. The limits on the half-life and 
neutrino mass parameter are also shown. 
The Effective ν Mass: The section of KKDK on effective neutrino mass (“Critical View to the IGEX neutrinoless 
ouble-beta decay experiment...” published in Phys. Rev. D, Volume 65 (2002) 092007, by H. V. Klapdor-
Kleingrothaus, A. Dietz, and I. V. Krivosheina) begins with: “Starting from their incorrectly determined half-life 
limit the authors claim a range of effective neutrino mass of (0.33-1.35) eV.” In response the IGEX collaboration, 
came out stating that KKDK selected only the 52.51 mole·years of the IGEX data that had been subjected to PSD 
and obtained T½0ν > 7.1×1024 y using the maximum number of counts, 3.1, from the entire 117 mole·years of data 
which was erroneous and unjustified. In another case, KKDK also decided to arbitrarily use the entire IGEX data 
set prior to PSD selection from which they obtained 0ν a bound of T½0ν > 1.1 × 1025 y for which there was no 
scientific justification for selecting only PSD corrected data on one hand and totally ignoring the PSD corrected 
data on the other hand. In the conclusion of KKDK it states: “the IGEX paper - apart from the too high half-life 
limits presented, as a consequence of an arithmetic error - is rather incomplete in its presentation”. In response to 
this paper the IGEX collaboration published the article “The IGEX experiment revisited: a response to the critique 
of Klapdor-Kleingrothaus, Dietz, and Krivosheina” where they stated that there was absolutely no arithmetic error 
yryrGe 25
76 1057.1
1087.4
>T 0 (ν2/1 1.3
).2(ln0
2/1 =
 - 7 -
and that the analysis of the published IGEX data presented in KKDK stands illegitimate. To obtain a much shorter 
bound on the half-life, they arbitrarily analyzed two ~ halves of the data separately. Instead of having 4.88×1025 y 
in the numerator (ln2 N.t) they used 2.2×1025 y. Yet they used the 90% CL upper limit on the number of counts 
under the peak, obtained by IGEX from all of the data. In another analysis, they ignored the fact that 52.51 
mole·years were corrected with PSD and treated the complete uncorrected data set. Naturally, the lower limits on 
T1/2oν (76Ge) obtained by these completely unjustified procedures are shorter than that obtained from properly 
analyzing the complete data set. This paper henceforth states “the lower limit quoted by IGEX, T1/20ν ≥ 1.57 × 1025 
years, is correct and that there was no arithmetical error as claimed in the Critical View article.” 
4.  The  HEIDELBERG - MOSCOW  Experiment  
The Heidelberg-Moscow experiment at the Gran Sasso underground laboratory is now claimed to be the most 
sensitive neutrinoless double beta decay experiment worldwide. It has contributed in an extraordinary way to the 
research in neutrino physics and particularly to beyond standard model physics, and limits for the latter are 
competing with those from the largest high-energy accelerators. The emphasis on the first indication for 
y is found in the Heidelberg-Moscow experiment ving first evidence of the lepton neutrinoless double beta deca gi
number violation and a Majorana nature of the neutrinos. The neutrinoless double beta decay could answer 
questions to the absolute scale of the neutrino mass and the fundamental character of the neutrino whether it is a 
Dirac or a Majorana particle. 
                                      
                                        Entrance of the highway tunnel under Gran Sasso mountain. 
With the support of the LNGS the experimental building of the experiment was built between Halls A and B in 
Gran Sasso, into which the first enriched 76 76Ge detector (the first high-purity enriched Ge detector worldwide) 
was installed in July 1990 . First preparation work had been done since 1989 in a provisional tent in Hall C. The 
ll amount of five enriched 76Ge detectors of in total 11 kg was finally installed in 1995 and were operated since 
n method  
lead (detectors ## 1,2,3,5). Each setup is coated with stainless steal casing. Non-
dioactive pure nitrogen was blown through casings to reduce radon emanation contribution. To reduce neutron 
background the casing with detectors ##1,2,3,5 was coated with borated polyethylene and two anticoincidence 
plates of plastic scintillator were located over the casing in order to reduce muon component. The setup was 
located in Gran Sasso underground laboratory, Italy at a depth of 3500 metres of water equivalent of the lab 
1996 with a newly developed pulse shape discriminatio
High purity germanium crystals, enriched by Germanium-76 isotope up to 86% are used as the main detecting 
elements. Five coaxial detectors with the total weight of 11.5 kg (125 moles in the active volume of detectors) are 
used. Each detector is located in a separate cryostat made of electrolytic copper with low content of radioactive 
impurities. The quantity of other designed materials (iron, bronze, light material insulators) is minimized in order 
to reduce the feasible radioactive impurities contribution to the total background of the detectors. The detectors 
were located in two separate shielded boxes. One of them, 270 mm thick is made of electrolytic copper (detector 
#4), the other consists of two layers of lead – inner -100mm of high purity LCD2-grade lead and outer – 200 mm 
of low  background Boliden 
reduces influence of cosmic rays on background conditions of the experiment. The electronics and the system of 
collecting data allow to record each event – the number (or numbers) of acted detector, amplitude and pulse shape, 
and anticoincidence veto. The Heidelberg-Moscow experiment, with five enriched 86%-88% high-purity p-type 
Germanium detectors, of in total 10.96 kg of active volume, used the largest source strength of all double beta 
experiments at present, and reached a record low level of background. The detectors were the first high-purity Ge 
detectors ever produced. The degree of enrichment was checked by investigation of tiny pieces of Ge after crystal 
production using the Heidelberg MP-Tandem accelerator as a mass spectrometer.  
The detectors, except detector # 4, were operated in a common Pb shielding of 30 cm, which consisted of an inner 
shielding of 10 cm radiopure LC2-grade Pb followed by 20 cm of Boliden lead. The whole setup was placed in an 
air-tight steel box and flushed with radiopure nitrogen in order to suppress the 222Rn contamination of the air. The 
shielding was improved in the course of the measurement. The steel box operated since 1994 centered inside a 10-
cm boron-loaded polyethylene shielding to decrease the neutron flux from outside. An active anticoincidence 
shielding was placed on top of the setup since 1995 to reduce the effect of muons. Detector # 4 was installed in a 
separate setup, which had an inner shielding of 27.5 cm electrolytical Cu, 20 cm lead, and boron-loaded 
below the steel box, but no muon shielding. The setup was kept air-tight closed since 
stallation of detector #5 in February’95. Since then no radioactive contaminations of the inner of the 
he sensitivity for the 0ν ββ half-life is given by                                                                   (4.1) 
ent are: energy resolution, background and 
e strength ever operated in a double beta decay 
xperiment. The background reached to the experiment, was 0.113 ± 0.007 events/kg y keV (in the period 1995-
was the lowest lim
polyethylene shielding 
experimental setup by air and dust from the tunnel could occur. 
- 8 -
With     denoting the degree of enrichment, ε the efficiency of the detector for detection of a double beta event, M 
the detector (source) mass, ∆E the energy resolution, B the background and t the measuring time, the sensitivity of 
our 11 kg of enriched 76Ge experiment corresponds to that of an at least 1.2 ton natural Ge experiment. After 
enrichment - the other most important parameters of a ββ experim
source strength. The high energy resolution of the Ge detectors of 0.2% or better, assures no background for a 
0νββ line from the two-neutrino double beta decay in this experiment (5.5 × 10-9 events expected in the energy 
range 2035-2039.1keV), in contrast to most other present experimental approaches, where limited energy 
resolution is a severe drawback. 
                                 
The efficiency of Ge detectors for detection of 0ν ββ decay events is close to 100%. The source strength in the 
Heidelberg-Moscow experiment of 11kg was the largest sourc
~0 〉〈×ν ε mand
. 02/1 Δ ννTBE
2003) in the 0ν ββ decay region (around Qββ). This it ever obtained in such type of experiment. 
 - 9 -
he statistics collected in this experiment during 13 years of stable running is the largest ever collected in a 
 presented a paper concerning “Measurement of the Bi spectrum in the energy 
gion around the Q-value of Ge neutrinoless double-beta decay”. In this work they presented the measurements 
f the 214Bi spectrum from a 226Ra source with a high purity germanium detector. Their attention was mostly 
focused on the energy region around the Q-value of 76Ge neutrinoless double-beta decay (2039.006 keV). The 
results of the measurement strongly relates to the first indication for neutrinoless double beta decay of 76Ge. An 
analysis of the data collected during ten years of measurements by the Heidelberg-Moscow experiment, at Gran-
Sasso Underground Laboratory, yields a first indication for the neutrinoless double beta decay of 76Ge. An 
important point of this analysis is the interpretation of the background, in the region around the Q-value of the 
double beta decay (2039.006 keV), as containing several weak photopeaks. It was suggested and has been shown 
that four of these peaks are produced by a contamination from the isotope 214Bi, whose lines are present 
throughout the Heidelberg-Moscow background spectrum.  
In this work they performed a measurement of a 226Ra source with a high-purity germanium detector. The aim of 
this work was to study the spectral shape of the lines in the energy region from 2000 to 2100keV and, most 
important, to show the difference in this spectral shape when changing the position of the source with respect to 
the detector, and to verify the effect of TCS (True Coincidence Summing) for the weak 214Bi lines seen in the 
Heidelberg-Moscow experiment. The activity of the 226Ra source is 95.2kBq. The isotope 226Ra appears in the 
238U natural decay chain and from its decays also 214Bi is produced. The γ-spectrum of 214Bi is clearly visible in 
the 226Ra measured spectrum. 214Bi is a naturally occurring isotope: it is produced in the 238U natural decay chain 
through the β- decay of 214Pb and the alpha decay of 218At. With a subsequent β- reaction, 214Bi decays then into 
214Po (the branching ratio with respect to the α decay into 210Tl is 99.979%). The decay, however, does not lead 
d state of 214Po, but to its excited states. From the decays of those excited states to the ground 
tate the well known γ-spectrum of 214Bi is obtained, which contains more than hundred lines. 
 the table given below, one can see in the energy region around the Q-value of the 0νββ decay (2000-2100keV), 
ur γ-lines and one E0 transition with energy 2016.7keV are expected. The E0 transition can produce a 
double beta decay experiment. The experiment took data during ~ 80% of its installation time. The Q value for 
neutrinoless double beta decay was recently determined with high precision. 
The background of the experiment: (1) primordial activities of the natural decay chains from 238 232U, Th, and 40K; 
(2) anthropogenic radio nuclides, like 137 134 125 207Cs, Cs, Sb, Bi; (3) cosmogenic isotopes, produced by activation 
due to cosmic rays during production and transport; (4) the bremsstrahlungs spectrum of 210Bi (daughter of 210Pb); 
(5) elastic and inelastic neutron scattering; and (6) direct muon-induced events. 
H.V. Klapdor-Kleingrothaus, O. Chkvorez, I.V. Krivosheina and C. Tomei at Max-Planck-Institut fur Kernphysik 
in the Heidelberg-Moscow group 214
directly to the groun
conversion electron or an electron-positron pair but it could not contribute directly to the γ-spectrum in the 
considered energy region if the source is located outside the detector active volume.  
               
0.0502010.71
Intensity(%)Energy (keV)
                                    
0.0782052.94
0.05020889.7
0.0202021.8
0.00582016.7
0.0502010.71
Intensity(%)Energy (keV)
0.0202021.8
0.00582016.7
0.0782052.94
0.05020889.7
The intensity of each line is defined as the number of emitted photons, with the corresponding energy, per 100 
decays of the parent nuclide. The considerations for the measurement were the efficiency of the detector (which 
depends on the size of the detector and on the distance source-detector) and the effect called True Coincidence 
Summing (TCS). The lifetimes of the atomic excited levels are much shorter than the resolving time of the 
detector. If two gamma-rays are emitted in cascade, there is a certain probability that they will be detected 
 - 10 -
lled in 
. The measurement of Bi 
pectrum, with a high purity germanium detector, in the energy region around the Q-value of 76Ge neutrinoless 
ta decay (2039.006keV) was done with the 226Ra source used for the measurements positioned, in a first 
step the source was positioned on the top of the detector, directly in contact with the copper cap (close geometry) 
and in a second step the source was moved 15cm away from the detector cap (far geometry). The results of the 
measurements show that, if the source is close to the detector, the intensities of the weak Bi lines in the energy 
region 2000- 2100keV are not in the same ratio as reported by Table of Isotopes.  The results of the analysis of the 
data collected by the Heidelberg-Moscow experiment with all the five detectors, yielding a first indication for the 
neutrinoless double beta decay of 76Ge, shows that four 214Bi lines are present in the energy region from 2000 to 
2080keV (many other strong lines from the same isotope are present in the spectrum), due to the presence of 
bismuth in the experimental setup, especially in the copper in the vicinity of the Ge crystals. 
                        
together. If this happens, then a pulse will be recorded which represents the sum of the energies of the two 
individual photons, instead of two separated pulses with different energies. The TCS effect can result both in 
lower peak-intensity for full-energy peaks and in bigger peak-intensity for those transitions whose energy can be 
given by the sum of two lower-energy gamma-rays. In this case, the lines at 2010.7 keV and 2016.7 keV can be 
given by the coincidence of the 609.312 keV photon (strongest line, intensity = 46.1%) with the 1401.50keV 
photon (intensity = 1.27%) or with the 1407.98keV photon (intensity = 2.15%). The degree of TCS depends on 
the probability that two gamma-rays emitted simultaneously will be detected simultaneously which is a function 
of the detector geometry and of the solid angle subtended at the detector by the source and for this the intensities 
of the two lines mentioned above (2010.71keV and 2016.7keV) are expected to depend on the position of the 
source with respect to the detector. 
The 226Ra γ-ray spectra were measured using a γ-ray spectroscopy system based on an HPGe detector insta
the operation room of the HEIDELBERG-MOSCOW experiment in Gran Sasso Underground Laboratory, Italy. 
The coaxial germanium detector had an external diameter of 5.2cm and 4.9cm height. The distance between the 
top of the detector and the copper cap was kept at 3.5cm. The relative detection efficiency of the detector was 
23% and the energy resolution being 3.6keV for the energy range 2000-2100keV 214
double-be
                 
The  above  figure  shows  the  sum  spectrum  of  the  76Ge  detectors 1,2,3,4  and  5  over  the  period  August  
1990  to  May  2003 as recorded by the Heidelberg-Moscow experiment. 
 - 11 -
 There is no null hypothesis analysis demonstrating that the data require a peak. Furthermore, no simulation has 
to demonstrate that the analysis correctly finds true peaks or that it would find no peaks if none 
existed. Monte Carlo simulations of spectra containing different numbers of peaks are needed to confirm the 
significance of any found peaks. 
2. There are three unidentified peaks in the region of analysis that have greater significance than the 2039-keV 
peak. There is no discussion of the origin of these peaks. 
3. There is no discussion of how sensitive the conclusions are to different mathematical models. There is a 
previous Heidelberg-Moscow publication that gives a lower limit of 1.9 × 1025 y (90% confidence level). This is 
in conflict with the “best value” of a newer KDHK paper of 1.5 × 1025 y. This indicates a dependence of the 
results on the analysis model and the background evaluation. 
In this paper they state that a number of other cross checks of the result should also be performed. For example, 
there is no discussion of how a variation of the size of the chosen analysis window affects the significance of the 
hypothetical peak. There is no relative peak strength analysis of all the 214Bi peaks. Quantitative evaluations 
should be made on the four 214Bi peaks in the region of interest. There is no statement of the net count rate of the 
peaks other than the 2039-keV peak. There being no presentation of the entire spectrum, is difficult to compare 
relative strengths of peaks. There is no discussion of the relative peak strengths before and after the single-site-
event cut.  
On the other hand the Heidelberg-Moscow group claims that the signal found at Qββ is consisting of single site 
events and is not a γ line. The signal does not occur in the Ge experiments not enriched in the double beta emitter 
76Ge, while neighbouring background lines appear consistently in these experiments. On this basis they translated 
the observed numbers of events into half-lives for neutrinoless double beta decay.  
The Heidelberg-Moscow experiment continued regularly from 1990 till 2003. The analysis of the full data taken 
with the Heidelberg-Moscow experiment in the period 2 August 1990 until 20 May 2003 is presented. The 
completed  Heidelberg-Moscow  76Ge  Experiment -71.7 kg y  after 13 years of operation presents their mass 
calculation limit status as  mν (eV) = 0.24 - 0.58  ( 99.997%  C.L.) with the best value of 0.4 eV (95% C.L.). 
hile an unambiguous interpretation of all of the neutrino oscillation experiments is not yet possible, it is 
bundantly clear that neutrinos exhibit properties not included in the standard model, namely mass and flavor 
arch which will employ 500 kg of Ge, 
a. The Majorana experiment is proposed for a US deep underground laboratory, 
eriments. Furthermore, new segmented Ge detector 
yogenic performance and background reduction and 
Moscow and IGEX experiments both utilized Germanium enriched to 86% in Ge and operated deep 
In a paper by Klapdor-Kleingrothaus, Dietz, Harney, and Krivosheina (hereafter referred to as KDHK) evidence is 
claimed for zero-neutrino double-beta decay in 76 Ge. The high quality data, upon which this claim is based, was 
compiled by the 2 careful efforts of the Heidelberg-Moscow collaboration, and is well documented. However, the 
analysis in KDHK makes an extraordinary claim, and therefore requires very solid substantiation according to 
another paper “Comment on Evidence for Neutrinoless Double Beta Decay” C.E.Aalseth et al. They state that a 
large number of issues were not addressed in KDHK some of which are: 
been presented 
5.  The proposed MAJORANA experiment 
mixing. Accordingly, sensitive searches for neutrinoless double-beta decay (0νββ-decay) are more important than 
ever. Experiments with large quantities of Ge, isotopically enriched in 76Ge, have thus far proven to be the most 
sensitive, specifically the Heidelberg-Moscow and IGEX experiments with lower limits in half-life sensitivities 
1.9×1025 y and 1.6×1025 y respectively. A new generation of experiments will be required to make significant 
improvements in sensitivity one of which is the proposed Majorana Experiment.  
The Majorana Experiment is a next-generation Ge double-beta decay se
isotopically enriched to 86% in Ge, in the form of ~200 detectors in a close-packed array for high granularity. 
Each crystal will be electronically segmented, with each region fitted with pulse-shape analysis electronics. A 
half-life sensitivity is predicted of 4.2 × 1027 years or < mν> ~ 0.02 - 0.07 eV, depending on the nuclear matrix 
elements used to interpret the dat
and requires very little R&D as it stands on the technical shoulders of the IGEX experiment and other previous 
successful double-beta decay and low-background exp
technology has recently become commercially available, while Pacific Northwest National Laboratory 
(PNNL)/University of South Carolina (USC) researchers have developed new pulse-shape discrimination 
techniques. 
Several configurations have been evaluated with respect to cr
rejection. It will concentrate on a conventional modular design using ultra-low background cryostat technology 
developed by IGEX. It will also utilize new pulse-shape discrimination hardware and software techniques 
developed by the Majorana collaboration and detector segmentation to reduce background. The Heidelberg-
underground. The projection for the Majorana is that the background will be reduced by a factor of 65 over the 
early IGEX results prior to pulse shape analysis (from 0.2 to ~0.003 keV-1
 - 12 -
germanium by limiting the time above ground after crystal growth, careful material selection 
marily comprised of multiple 
o or 
ore of the independent segments. When coincidences are found, the output from all detector segments is 
nly of the full-energy peak lying above a featureless 
 kg-1 y-1). This will occur mainly by the 
decay of the internal background due to cosmogenic neutron spallation reactions that produce 56 58 60Co, Co, Co, 
65Zn and 68Ge in the 
and electroforming copper cryostats. One component of the background reduction will arise from the 
segmentation and granularity of the detector array.  
Most of the Compton continuum consists of single Compton scatterings followed by escape of the scattered 
gamma ray, whereas full-energy events at typical gamma-ray energies are pri
scattering sequences followed by a photoelectric absorption. The peak-to-Compton ratio can therefore be 
enhanced by requiring a recorded event to correspond to more than one interaction within the detector before its 
acceptance. In germanium detectors, this selection is usually accomplished by subdividing the detector into 
several segments (or providing several adjacent independent detectors) and seeking coincident pulses from tw
summed and recorded. The resulting spectrum is made up o
continuum that is greatly suppressed and has no abrupt Compton edges. New Ge experiments must not simply be 
a volume expansion of IGEX or Heidelberg–Moscow. They must have superior background rejection and better 
electronic stability. The summing of 200 individual energy spectra can result in serious loss of energy resolution 
for the overall experiment which can be avoided by segmenting n-type intrinsic Ge detectors, advanced PSD 
techniques and electronic stability in measurement.  
                                         
The above figure depicts a standard Ge detector segmentation scheme. This is the configuration of the SEGA 
detector undergoing tests by the Majorana collaboration. A configuration with six-azimuthal-segment by two-
axial-segment geometry is shown in the above figure. 
Efforts are thus on with the Majorana experiment for the search of neutrinoless double beta decay that would give 
a new shape to the standard model of physics. Majorana cannot not simply be a volume expansion of IGEX, but 
must have superior background rejection. As it was conclusively shown that the limiting background in at least 
some previous experiments has been cosmogenic activation of the germanium itself, it is necessary to mitigate 
those background sources. Cosmogenic activity fortunately has certain factors which discriminate it from the 
signal of interest. For example, while 0νββ -decay would deposit 2 MeV between two electrons in a small, 
perhaps 1 mm3 volume, internal 60Co decay deposits about 318 keV (endpoint) in beta energy near the decaying 
atom, while simultaneous 1173 keV and 1332 keV gammas can deposit energy elsewhere in the crystal, most 
probably both in more than one location, for a total energy capable of reaching the 2039 keV region-of-interest. A 
similar situation exists for internal 68Ge decay. Thus deposition-location multiplicity distinguishes double-beta 
decay from the important long lived cosmogenics in germanium. Isotopes such as 56 57 58Co, Co, Co and 68Ge are 
produced at a rate of roughly 1 atom per day per kilogram on the earth’s surface. Only 60Co and 68Ge have both 
the energy and half-life to be of concern. To pursue the multiplicity parameter, firstly, the detector current pulse 
shape carries with it the record of energy deposition along the electric field lines in the crystal; that is, the radial 
 - 13 -
imension of cylindrical detectors. This information may be exploited through pulse-shape discrimination. 
econdly, the electrical contacts of the detector may be divided to produce independent regions of charge 
collection. The ability of new techniques to be easily calibrated for individual detectors makes them practical for 
large detector arrays. Calibration for single-site event pulses was trivially accomplished by collecting pulses from 
thorium ore; the 2614.47-keV gamma ray from 208Tl produces a largely single-site double-escape peak at 1592.47 
keV. The PSD discriminator was then calibrated to the properties of the double-escape peak A slightly improved 
double-escape peak was be made from the 26Al gamma ray of 2938.22-keV. The double-escape appears at 
1916.22 keV, only about 120 keV away from the expected region of interest for 0νββ-decay. The obvious and 
direct use of pulse- shape discrimination and segmentation is the rejection of cosmogenic pulses in the germanium 
itself. However, the approach should be also effective on gamma rays from the shielding and structural materials. 
The background effects of neutrons of both high energy (cosmic muon generated) and low energy (fission and 
(α,n) from rock) could be protected by the segmentation and granularity of the detectors. These neutrons could 
also produce other unwanted activities like the formation of 3H and 14C in nitrogen from high and low energy 
neutrons, respectively. Fortunately, Majorana detectors will not be surrounded by nitrogen at high density. 
                              
               
The GERDA (GERmanium Detector Assembly), which is another next generation 76Ge double beta decay 
experiment at the Gran Sasso Underground Laboratory, has projected a sensitivity in the half-life of the 0νββ-
decay mode which is less than the proposed Majorana experiment. In conclusion, the Majorana project has been 
designed in a compact, modular way such that it can be built and operated with high confidence in the approach 
and the technology. The initial years of construction will allow alternate cooling methods to be employed if they 
have an advantage and should they be shown to overcome long-term concerns due to surface contamination, 
muon-induced ions, and diffusion. The Majorana Collaboration has made an extensive analysis of the predicted 
backgrounds and their impact on the final sensitivity of the experiment. The Majorana experiment represents a 
great increase in Ge mass over IGEX with new segmented Ge detectors and the newest electronic systems for 
pulse-shape discrimination. Their conclusion is that with 500 kg of Ge, enriched to 86% in the isotope 76Ge, the 
Majorana array operating over 10 years including construction time, can reach a lower limit on T1/20ν of 4×1027 
years. This corresponds to an upper bound of < m  > of 0.038 ± 0.007eV.  One advantage of 76ν Ge is that it may 
well be a candidate for a future more reliable microscopic calculation of the 0ν ββ- decay nuclear matrix element.  
6   Conclusion 
eutrinoless double beta decay is thus one of the most sensitive approaches with great perspectives to test pN article 
hysics beyond the Standard Model. The possibilities to use 0νββ decay for constraining neutrino masses, left–
ght symmetric models, SUSY and leptoquark scenarios, as well as effective lepton number violating couplings, 
have been reviewed. It is a very sensitive probe to the lepton number violating terms in the Lagrangian such as the 
Majorana mass of the light neutrinos, right-handed weak couplings involving heavy Majorana neutrinos, as well 
as Higgs and other interactions involving violation of chirality conservation. 
 - 14 -
In search for neutrinoless double beta decay 76Ge as the source material has multiple advantages. It has high 
resolution (< 4 keV at Qββ) with no background from 2ν mode. A huge leap in sensitivity is possible applying 
ultra-low background techniques and 0ν- ββ signal discrimination. There can be a phased approach in the 
experiment with the increment of target mass. The source and detector are the same material thereby reducing 
background and maintaining the 4π geometry and the only way to scrutinize 0ν – DBD claim on short time scale: 
since it tests T1/2 and not mν. The consequences of Neutrinoless Double Beta Decay are- [1] Total Lepton number 
violation: The most important consequence of the observation of neutrinoless double beta decay is that lepton 
number is not conserved. This is fundamental for particle physics. [2] Majorana nature of neutrino: Another 
fundamental consequence is that the neutrino is a Majorana particle. Both of these conclusions are independent of 
any discussion of nuclear matrix elements. [3] Effective neutrino mass: The matrix element enters when we derive 
a value for the effective neutrino mass - making the most natural assumption that the 0νββ decay amplitude is 
dominated by exchange of a massive Majorana neutrino.  
Acknowledgements 
I would like to thank the IGEX collaboration, the Heidelberg-Moscow collaboration and the Majorana 
n for having used information from th imental works to write up this bri  review. 
eferences 
oration), Physics Review D (2002) 
 Lett. A(2002), hep-ph/0202018  C. E. 
hed 76Ge in Gran Sasso 1990-2003 Heidelberg-Moscow 
or-Kleingrothaus, I.V. Krivosheina, A.Dietz, O.Chkvoretz, Physics Letters B 586(2004)  198-212. 
m the Heidelberg-Moscow double beta decay experiment”, (The Heidelberg Moscow Collaboration), 
Eur. Phys. J.  A  12, 147-154(2001). 
collaboratio eir exper ef
[1] “Search for neutrinoless double beta decay with enriched 76Ge in Gran Sasso 1990-2003”, H.V. Klapdor-Kleingrothaus, 
I.V. Krivosheina, A. Dietz, O. Chkvorets, Phys. Lett. B 586 (2004) 198 - 212 and hep-ph/0404088. 
[2] “Next generation double-beta decay experiments: metrics for their evaluation”,  F T Avignone III, G S King III and Yu G 
Zdesenko , New Journal of Physics 7 (2005) 
[3] “Double-beta decay”, Steven R Elliott and Jonathan Engel, J. Phys. G: Nuclear and Particle Physics. 
[4] “New Physics Potential of Double Beta Decay and Dark Matter Search”, H.V. Klapdor–Kleingrothaus, H. Pas, Talk 
presented by Heinrich Pas atthe at the 6th Symp. on Particles, Strings and Cosmology (PASCOS’98), Boston, March 1998 
[5] H.V. Klapdor-Kleingrothaus et al. Mod. Phys. Lett. A 16 (2001) 2409 - 2420. 
[6] H.V. Klapdor-Kleingrothaus, A. Dietz, I.V. Krivosheina, Part. & Nucl. 110(2002)57. 
[7] H.V. Klapdor-Kleingrothaus, et al., Nucl. Instr. Meth. 522 A (2004) 371-406 and hep-ph/0403018 and Phys. Lett. B 586 
(2004) 198-212. 
[8] H.V. Klapdor-Kleingrothaus, A. Dietz, I.V. Krivosheina, Ch. Dorr, C. Tomei, Phys. Lett. B 578 (2004) 54-62 and hep-
ph/0312171. 
[9] H.V. Klapdor-Kleingrothaus et al., (Heidelberg-Moscow Collaboration.),  Eur. Phys. J. A 12(2001)147. 
10] “IGEX [ 76Ge neutrinoless double-beta decay experiment: Prospects for next generation experiments”, C.E.Aalseth et al., 
(The IGEX collab
[11] H.V.Klapdor-Kleingrothaus, A.Dietz, I.V.Krivosheina and O.Chkvorets, Nucl. Instr. Meth. A 522 (2004) 371-406. 
[12] “Heidelberg - Moscow Experiment. First Evidence for Lepton Number Violation and the Majorana Character of 
Neutrinos” H.V. Klapdor-Kleingrothaus and I.V. Krivosheina 
[13] “Search for Neutrinoless Double Beta Decay with Enriched 76Ge 1990-2003 Heidelberg-Moscow Experiment” 
H.V.Klapdor-Kleingrothaus, I.V. Krivosheina, A.Dietz, C.Tomei, O.Chkvoretz, H.Strecker   hep-ph/0404062 (2004) 
[14] “Pulse Shape Discrimination in the IGEX Experiment”, D. Gonzalez et al, hep-ex/0302018. 
[15] “Comment On Evidence for Neutrinoless Double Beta Decay”, Mod. Phys.
Aalseth et al. 
[16.] “The IGEX experiment revisited: a response to the critique of Klapdor-Kleingrothaus, Dietz, and Krivosheina”, 
C.E.Aalseth et al., (The IGEX collaboration),  nucl-ex/0404036. 
[17] “The Majorana 76Ge Double-Beta  Decay Project”, The Majorana Collaboration, hep-ex/0201021 
[18] H.V. Klapdor-Kleingrothaus , O. Chkvorez, I.V. Krivosheina, C. Tomei, Nucl. Instrum. Meth. A (2003), “Measurement 
of the 214Bi spectrum in the energy region around the Q-value of 76Ge neutrinoless double-beta decay”  
[19] “Critical View to the IGEX neutrinoless double-beta decay experiment” H. V. Klapdor-Kleingrothaus, A. Dietz, and I. V. 
Krivosheina, hep-ph/0403056. 
[20] “Results of the experiment on investigation of Germanium-76 double beta decay - Experimental data of Heidelberg-
Moscow collaboration November 1995 - August 2001”, A.M. Bakalyarov, A.Ya. Balysh, S.T. Belyaev, V.I. Lebedev, S.V. 
Zhukov, Phys.Part.Nucl.Lett. 2 (2005) 77-81 , hep-ex/0309016. 
[21] “The proposed  Majorana 76Ge double-beta decay experiment” , The Majorana Collaboration, Nuclear Physics B  
138(2005) 217-220. 
22] “Search For Neutrinoless Double Beta Decay With Enric[
Experiment” H.V.Klapd
[23] “Latest Results fro
ABSTRACT
  Neutrinoless double beta decay is one of the most sensitive approaches in
non-accelerator particle physics to take us into a regime of physics beyond the
standard model. This article is a brief review of the experiments in search of
neutrinoless double beta decay from 76Ge. Following a brief introduction of the
process of double beta decay from 76Ge, the results of the very first
experiments IGEX and Heidelberg-Moscow which give indications of the existence
of possible neutrinoless double beta decay mode has been reviewed. Then ongoing
efforts to substantiate the early findings are presented and the Majorana
experiment as a future experimental approach which will allow a very detailed
study of the neutrinoless decay mode is discussed.

<|endoftext|><|startoftext|>
Introduction
The geometrical superfield approach [1-8] to Becchi-Rouet-Stora-Tyutin (BRST) formalism
is one of the most attractive and intuitive approaches which enables us to gain some physical
insights into the beautiful (but abstract mathematical) structures that are associated with
the nilpotent (anti-)BRST symmetry transformations and their corresponding generators.
The latter quantities play a very decisive role in (i) the covariant canonical quantization of
the gauge theories, (ii) the proof of the unitarity of the “quantum” gauge theories at any
arbitrary order of perturbative computations for a given physical process (that is allowed
by the theory), (iii) the definition of the physical states of the “quantum” gauge theories
in the quantum Hilbert space, and (iv) the cohomological description of the physical states
of the quantum Hilbert space w.r.t. the conserved and nilpotent BRST charge.
To be specific, in the superfield formulation [1-8] of the 4D 1-form gauge theories, one
defines the super curvature 2-form F̃ (2) = d̃Ã(1)+ i Ã(1)∧ Ã(1) in terms of the super exterior
derivative d̃ = dxµ∂µ + dθ∂θ + dθ̄∂θ̄ (with d̃
2 = 0) and the super 1-form connection Ã(1) on
a (4, 2)-dimensional supermanifold parametrized by the usual spacetime variables xµ (with
µ = 0, 1, 2, 3) and a pair of anticommuting (i.e. θ2 = θ̄2 = 0, θθ̄ + θ̄θ = 0) Grassmannian
variables θ and θ̄. The above super 2-form is subsequently equated, due to the so-called
horizontality condition [1-8], to the ordinary curvature 2-form F (2) = dA(1) + iA(1) ∧ A(1)
defined on the ordinary 4D flat Minkowski spacetime manifold in terms of the ordinary
exterior derivative d = dxµ∂µ (with d
2 = 0) and the 1-form connection A(1) = dxµAµ. The
above super exterior derivative d̃ and super 1-form connection Ã(1) are the generalization of
the 4D ordinary exterior derivative d and 1-form connection A(1) to the (4, 2)-dimensional
supermanifold because d̃ → d, Ã(1) → A(1) in the limit (θ, θ̄) → 0.
The above horizontality condition (HC) has been referred to as the soul-flatness con-
dition in [9] which amounts to setting equal to zero all the Grassmannian components of
the (anti)symmetric second-rank super tensor that constitutes the super curvature 2-form
F̃ (2) on the (4, 2)-dimensional supermanifold. The key consequences, that emerge from
the HC, are (i) the derivation of the nilpotent (anti-)BRST symmetry transformations for
the gauge and (anti-)ghost fields of a given 4D 1-form gauge theory, (ii) the geometrical
interpretation of the (anti-)BRST symmetry transformations for the 4D local fields as the
translation of the corresponding superfields along the Grassmannian directions of the su-
permanifold, (iii) the geometrical interpretation of the nilpotency property as a pair of
successive translations of the superfield along a particular Grassmannian direction of the
supermanifold, and (iv) the geometrical interpretation of the anticommutativity property
of the (anti-)BRST symmetry transformations for a 4D local field as the sum of (a) the
translation of the corresponding superfield first along the θ-direction followed by the trans-
lation along the θ̄-direction, and (b) the translation of the same superfield first along the
θ̄-direction followed by the translation along the θ-direction.
It will be noted that the above HC (i.e. F̃ (2) = F (2)) is valid for the non-Abelian (i.e.
A(1)(n)∧A(1)(n) 6= 0) 1-form gauge theory as well as the Abelian (i.e. A(1)∧A(1) = 0) 1-form
gauge theory. As expected, for both types of theories, the HC leads to the derivation of the
nilpotent (anti-)BRST symmetry transformations for the gauge and (anti-)ghost fields of
the respective theories. We lay emphasis on the fact that the HC does not shed any light
on the derivation of the nilpotent (anti-)BRST symmetry transformations associated with
the matter fields of the interacting 4D (non-)Abelian 1-form gauge theories.
In a recent set of papers [10-17], the above HC condition has been generalized, in a
consistent manner, so as to compute the nilpotent (anti-)BRST symmetry transformations
associated with the matter fields of a given 4D interacting 1-form gauge theory (along with
the well-known nilpotent transformations for the gauge and (anti-)ghost fields) without
spoiling the cute geometrical interpretations of the (anti-)BRST symmetry transformations
(and their corresponding generators) that emerge from the HC alone. The latter approach
has been christened as the augmented superfield approach to BRST formalism where the
restrictions imposed on the (4, 2)-dimensional superfields are (i) the HC plus the invariance
of the (super) matter Noether conserved currents [10-14], (ii) the HC plus the equality of
any (super) conserved quantities [15], (iii) the HC plus a restriction that owes its origin
to the gauge invariance and the (super) covariant derivatives on the matter (super)fields
[16,17], and (iv) an alternative to the HC where the gauge invariance and the property
of a pair of (super) covariant derivatives on the (super) matter fields (and their intimate
connection with the (super) curvatures) play a crucial role [18-20].
In all the above approaches [1-20], however, the invariance of the Lagrangian densities
of the 4D (non-)Abelian 1-form gauge theories, under the nilpotent (anti-)BRST symmetry
transformations, has not yet been discussed at all. Some attempts in this direction have
been made in our earlier works where the specific topological features [21,22] of the 2D
free (non-)Abelian 1-form gauge theories have been captured in the superfield formulation
[23-25]. In particular, the invariance of the Lagrangian density under the nilpotent and
anticommuting (anti-)BRST and (anti-)co-BRST symmetry transformations has been ex-
pressed in terms of the superfields and the Grassmannian derivatives on them. These are,
however, a bit more involved in nature because of the existence of a new set of nilpotent
(anti-)co-BRST symmetries in the theory. The geometrical interpretations for the La-
grangian densities and the symmetric energy-momentum tensor (for the above topological
theory) have also been provided within the framework of the superfield formulation.
The purpose of our present paper is to capture the (anti-)BRST symmetry invariance of
the Lagrangian density of the 4D (non-)Abelian 1-form gauge theories within the framework
of the superfield approach to BRST formalism and to demonstrate that the above symme-
try invariance could be understood in a very simple manner in terms of the translational
generators along the Grassmannian directions of the (4, 2)-dimensional supermanifold on
which the above 4D ordinary gauge theories are considered. In addition, the reason behind
the existence (or non-existence) of any specific nilpotent symmetry transformation could
also be explained within the framework of the above superfield approach. We demonstrate
the uniqueness of the existence of the nilpotent (anti-)BRST symmetry transformations
for the Lagrangian density of a U(1) Abelian 1-form gauge theory. We go a step further
and show the existence of the nilpotent BRST symmetry transformations for the specific
Lagrangian densities (cf. (4.1) and (4.4) below) of the 4D non-Abelian 1-form gauge theory
and clarify the non-existence of the anti-BRST symmetry transformations for these spe-
cific Lagrangian densities within the framework of the superfield formulation (cf. section
5 below). Finally, we provide the geometrical basis for the existence of the off-shell nilpo-
tent and anticommuting (anti-)BRST symmetry transformations (and their corresponding
generators) for the specifically defined Lagrangian densities (cf. (4.7) and/or (4.8) below)
of the 4D non-Abelian 1-form gauge theory in the Feynman gauge.
The motivating factors that have propelled us to pursue our present investigation are
as follows. First and foremost, to the best of our knowledge, the property of the symmetry
invariance of a given Lagrangian density has not yet been captured in the language of the
superfield approach to BRST formalism. Second, the above (anti-)BRST invariance of the
theory has never been shown, in as simplified fashion, as we demonstrate in our present
endeavour. The geometrical interpretations for (i) the existence of the above nilpotent
(anti-)BRST symmetry invariance, and (ii) the on-shell conditions of the on-shell nilpotent
(anti-)BRST symmetries, turn out to be quite transparent in our present work. Third, we
establish the uniqueness of the existence of the (anti-)BRST symmetry invariance in their
various forms. The non-existence of the specific symmetry transformation is also explained
within the framework of the superfield approach to BRST formalism. Finally, our present
investigation is the first modest step in the direction to gain some insights into the existence
of the nilpotent symmetry transformations and their invariance for the higher form (e.g.
2-form, 3-form, etc.) gauge theories within the framework of the superfield formulation.
The contents of our present paper are organized as follows. In section 2, we recapitulate
some of the key points connected with the nilpotent (anti-)BRST symmetry transformations
for the free 4D Abelian 1-form gauge theory (having no interaction with matter fields) in
the Lagrangian formulation. The above symmetry transformations as well as the symmetry
invariance of the Lagrangian densities are captured in the geometrical superfield approach
to BRST formalism in section 3 where the HC on the gauge superfield plays a crucial role.
Section 4 deals with the bare essentials of the nilpotent (anti-)BRST symmetry transfor-
mations for the 4D non-Abelian 1-form gauge theory in the Lagrangian formulation. The
subject matter of section 5 concerns itself with the superfield formulation of the symmetry
invariance of the appropriate Lagrangian densities of the above 4D non-Abelian 1-form
gauge theory. Finally, in section 6, we summarize our key results, make some concluding
remarks and point out a few future directions for further investigations.
2 (Anti-)BRST symmetries in Abelian theory: Lagrangian formulation
Let us begin with the following (anti-)BRST invariant Lagrangian density of the 4D Abelian
1-form gauge theory∗ in the Feynman gauge [26,27,9]
B = −
F µνFµν + B (∂µA
B2 − i ∂µC̄ ∂
µC, (2.1)
where Fµν = ∂µAν − ∂νAµ is the antisymmetric (Fµν = −Fνµ) curvature tensor that con-
stitutes the Abelian 2-form F (2) = dA(1) ≡ 1
(dxµ ∧ dxν)Fµν , B is the Nakanishi-Lautrup
auxiliary multiplier field and (C̄)C are the anticommuting (i.e. C2 = C̄2 = 0, CC̄+C̄C = 0)
(anti-)ghost fields of the theory. The above Lagrangian density respects the off-shell nilpo-
tent (s2(a)b = 0) (anti-)BRST symmetry transformations s(a)b (with sbsab + sabsb = 0)
sbAµ = ∂µC, sbC = 0, sbC̄ = iB, sbB = 0, sbFµν = 0,
sabAµ = ∂µC̄, sabC̄ = 0, sabC = −iB, sabB = 0, sabFµν = 0.
(2.2)
It is clear that, under the nilpotent (anti-)BRST symmetry transformations s(a)b, the cur-
vature tensor Fµν is found to be invariant. In other words, the 2-form F
(2), owing its
origin to the cohomological operator d = dxµ∂µ, is an (anti-)BRST invariant object for the
Abelian U(1) 1-form gauge theory and is, therefore, a physically meaningful (i.e. gauge-
invariant) quantity. These observations will play an important role in our discussion on the
horizontality condition that would be exploited in the context of our superfield approach
to (anti-)BRST invariance of the Lagrangian densities in sections 3 and 5 (see below).
A noteworthy point, at this stage, is the observation that the gauge-fixing and Faddeev-
Popov ghost terms can be written, modulo a total derivative, in the following fashion
−i C̄ {(∂µA
B}], sab
+i C {(∂µA
sb sab
(2.3)
The above equation establishes, in a very simple manner, the (anti-)BRST invariance of
the 4D Lagrangian density (2.1). The simplicity ensues due to (i) the nilpotency s2(a)b = 0
of the (anti-)BRST symmetry transformations, (ii) the anticommutativity property (i.e.
sbsab + sabsb = 0) of s(a)b, and (iii) the invariance of the Fµν term under s(a)b.
As a side remark, it is interesting to note that the following on-shell (i.e. ✷C = ✷C̄ = 0)
nilpotent (s̃2(a)b = 0) (anti-)BRST symmetry transformations (with s̃bs̃ab + s̃abs̃b = 0)
s̃bAµ = ∂µC, s̃bC = 0, s̃bC̄ = −i(∂µA
µ), s̃bFµν = 0,
s̃abAµ = ∂µC̄, s̃abC̄ = 0, s̃abC = +i(∂µA
µ), s̃abFµν = 0,
(2.4)
∗We adopt here the notations and conventions such that the flat Minkowski metric in 4D is ηµν = diag
(+1,−1,−1,−1) so that AµB
µ = ηµνA
µBν = A0B0 − AiBi for two non-null 4-vectors Aµ and Bµ. The
Greek indices µ, ν...... = 0, 1, 2, 3 and Latin indices i, j, k.... = 1, 2, 3 stand for the 4D spacetime and 3D
space directions on the 4D Minkowski spacetime manifold, respectively, and the symbol ✷ = (∂0)
2 − (∂i)
†We follow here the notations and conventions adopted in [27]. In its full blaze of glory, the nilpotent
(anti-)BRST transformations δ(A)B are a product of an anticommuting spacetime independent parameter
η and s(a)b (i.e. δ(A)B = ηs(a)b) where the nilpotency property is encoded in the operators s(a)b.
are the symmetry transformations for the following Lagrangian density
b = −
F µνFµν −
µ)2 − i ∂µC̄ ∂
µC. (2.5)
The above transformations (2.4) and the Lagrangian density (2.5) have been derived from
(2.2) and (2.1) by the substitution B = −(∂µA
µ). An interesting point, connected with the
on-shell nilpotent symmetry transformations, is to express the analogue of (2.3) as ‡
C̄ (∂µA
µ) + i Aµ∂
µC̄], s̃ab
C (∂µA
µ)− i Aµ∂
s̃b s̃ab
(2.6)
It should be noted that, in the above precise computation, one has to take into account
the on-shell (✷C = ✷C̄ = 0) conditions so that, for all practical purposes s̃(a)b(∂µA
µ) = 0.
The above nilpotent (anti-)BRST symmetry transformations (i.e. sr, s̃r with r = b, ab)
are connected with the conserved and nilpotent generators (i.e. Qr, Q̃r with r = b, ab).
This statement can be succinctly expressed, in the mathematical form, as
sr Ω = −i [ Ω, Qr ](±), s̃r Ω̃ = −i [ Ω̃, Q̃r ](±), r = b, ab, (2.7)
where the subscripts (with the signatures (±)) on the square bracket stand for the bracket
to be an (anti)commutator, for the generic fields Ω = Aµ, C, C̄, B and Ω̃ = Aµ, C, C̄ (of
the Lagrangian densities (2.1) and (2.5)), being (fermionic)bosonic in nature. The above
charges Qr, Q̃r are found to be anticommuting (i.e. QbQab+QabQb = 0, Q̃bQ̃ab+Q̃abQ̃b = 0)
and off-shell as well as on-shell nilpotent (Q2(a)b = 0, Q̃
(a)b = 0) in nature, respectively.
3 (Anti-)BRST invariance in Abelian theory: superfield formalism
In this section, we exploit the geometrical superfield approach to BRST formalism, endowed
with the theoretical arsenal of the horizontality condition, to express the (anti-)BRST
symmetry transformations and the Lagrangian densities (cf. (2.1) and (2.5)) in terms of
the superfields defined on the (4, 2)-dimensional supermanifold. The latter is parametrized
by the spacetime coordinates xµ (with µ = 0, 1, 2, 3) and a pair of Grassmannian variables θ
and θ̄. As a consequence, the generalization of the 4D ordinary exterior derivative d = dxµ∂µ
and the 1-form connection A(1) = dxµAµ(x) on the (4, 2)-dimensional supermanifold, are
d → d̃ = dxµ ∂µ + dθ ∂θ + dθ̄ ∂θ̄, d̃
2 = 0,
A(1) → Ã(1) = dxµ Bµ(x, θ, θ̄) + dθ F̄(x, θ, θ̄) + dθ̄ F(x, θ, θ̄),
(3.1)
where the mapping from the 4D local fields to the superfields are: Aµ(x) → Bµ(x, θ, θ̄),
C(x) → F(x, θ, θ̄) and C̄(x) → F̄(x, θ, θ̄). The super-expansion of the superfields, in terms
‡We lay emphasis on the fact that (2.6) cannot be derived directly from (2.3) by the simple substitution
B = −(∂µA
µ). One has to be judicious to deduce the precise expression for (2.6). The logical reasons
behind the derivation of (2.6) are encoded in the superfield formulation (cf. (3.9) below).
of the basic fields as well as the secondary fields, are (see, e.g., [4-7, 10-12]):
Bµ(x, θ, θ̄) = Aµ(x) + θ R̄µ(x) + θ̄ Rµ(x) + i θ θ̄ Sµ(x),
F(x, θ, θ̄) = C(x) + i θ B̄1(x) + i θ̄ B1(x) + i θ θ̄ s(x),
F̄(x, θ, θ̄) = C̄(x) + i θ B̄2(x) + i θ̄ B2(x) + i θ θ̄ s̄(x).
(3.2)
It can be readily seen that, in the limiting case of (θ, θ̄) → 0, we get back our 4D basic
fields (Aµ, C, C̄). Furthermore, on the r.h.s. of the above super expansion, the bosonic (i.e.
Aµ, Sµ, B1, B̄1, B2, B̄2) and the fermionic (Rµ, R̄µ, C, C̄, s, s̄) fields do match.
At this juncture, we have to recall our observations after equation (2.2). The nilpotent
(anti-)BRST symmetry transformations basically owe their origin to the cohomological
operator d. This is capitalized in the horizontality condition where we impose the restriction
d̃Ã(1) = dA(1) on the super 1-form connection Ã(1) that contains the superfields defined on
the (4, 2)-dimensional supermanifold. The latter condition yields the following relationships
(see, e.g., for details, in our earlier works [21-25]):
B1 = B̄2 = s = s̄ = 0, B̄1 +B2 = 0, (3.3)
where we are free to choose the secondary fields (B2, B̄1) (i.e. B2 = B ⇒ B̄1 = −B) in
terms of the Nakanishi-Lautrup auxiliary field B of the BRST invariant Lagrangian density
(2.1). The other relations, that emerge from the above HC (i.e. d̃Ã(1) = dA(1)), are
Rµ = ∂µC, R̄µ = ∂µC̄, Sµ = ∂µB, ∂µBν − ∂νBµ = ∂µAν − ∂νAµ. (3.4)
At this stage, the super-curvature tensor F̃µν = ∂µBν − ∂νBµ is not equal to the ordinary
curvature tensor Fµν = ∂µAν−∂νAµ as the former contains Grassmannian dependent terms.
The substitution of the above values (cf. (3.3),(3.4)) of the secondary fields, in terms
of the basic and auxiliary fields of the Lagrangian density (2.1), leads to
B(h)µ (x, θ, θ̄) = Aµ + θ ∂µC̄ + θ̄ ∂µC + i θ θ̄ ∂µB,
F (h)(x, θ, θ̄) = C − i θ B, F̄ (h)(x, θ, θ̄) = C̄ + i θ̄ B,
(3.5)
where the superscript (h) has been used to denote that the above expansions have been
obtained after the application of the HC. It can be seen that, due to (3.5), we get
ν − ∂νB
µ = ∂µAν − ∂νAµ, (3.6)
where there is no Grassmannian θ and θ̄ dependence on the l.h.s.
In the language of the geometry on the (4, 2)-dimensional supermanifold, the expansions
(3.5) imply that the (anti-)BRST symmetry transformations s(a)b (and their corresponding
generators Q(a)b) for the 4D local fields (cf. (2.7)) are connected with the translational
generators (∂/∂θ, ∂/∂θ̄) because the translation of the corresponding (4, 2)-dimensional
superfields, along the Grassmannian directions of the supermanifold, produces it. Thus,
the Grassmannian independence of the super curvature tensor F̃ (h)µν = ∂µB
ν −∂νB
µ implies
that the 4D curvature tensor Fµν is an (anti-)BRST (i.e. gauge) invariant physical quantity.
In terms of the superfields, equations (2.3) can be expressed as
Limθ→0
−i F̄ (h) { (∂µB(h)µ +
Limθ̄→0
+ iF (h) { (∂µB(h)µ +
Bµ(h)B(h)µ +
F (h) F̄ (h)
(3.7)
These equations are unique because there is no other way to express the above equations in
terms of the derivatives w.r.t. Grassmannian variables θ and θ̄. Thus, besides (2.3), there
is no other possibility to express the gauge-fixing and the Faddeev-Popov ghost terms in
the language of the off-shell nilpotent (anti-)BRST symmetry transformations (2.2). The
superfield approach to BRST formulation, therefore, establishes the uniqueness of (2.3).
To express (2.6) in terms of the superfields, one has to substitute B = −(∂µA
µ) in (3.5).
Thus, the expansion (3.5), in terms of the transformations (2.4), becomes§
µ(o)(x, θ, θ̄) = Aµ + θ ∂µC̄ + θ̄ ∂µC − i θ θ̄ ∂µ(∂
ρAρ),
≡ Aµ + θ (s̃abAµ) + θ̄ (s̃bAµ) + θ θ̄(s̃bs̃abAµ),
(o) (x, θ, θ̄) = C + i θ (∂µA
µ) ≡ C + θ (s̃abC),
(o) (x, θ, θ̄) = C̄ − i θ̄ (∂µA
µ) ≡ C̄ + θ̄ (s̃bC̄).
(3.8)
We note that (3.5) and (3.8) are the super expansions (after the application of the HC)
which lead to the derivation of the off-shell nilpotent (anti-)BRST symmetry transforma-
tions s(a)b as well as the on-shell nilpotent (anti-)BRST symmetry transformations s̃(a)b,
respectively, for the basic fields Aµ, C and C̄ of the theory.
The gauge-fixing and Faddeev-Popov ghost terms of the Lagrangian density (2.5) can
also be expressed in terms of the superfields (3.8). In other words, (vis-à-vis (3.7)), we have
the following equations that are the analogue of (2.6), namely;
Limθ→0
(o) (∂
µAµ) + i B
µ(o) ∂
(o) )
Limθ̄→0
(o) (∂
µAµ)− i B
µ(o) ∂
(o) )
(o) B
µ(o) +
(o) F̄
(3.9)
We know that, for all practical computational purposes, it is essential to take into account
s̃(a)b(∂µA
µ) = 0 because of the on-shell conditions ✷C = ✷C̄ = 0. The logical reason behind
such a restriction (i.e. s̃(a)b(∂µA
µ) = 0) in (2.6) is encoded in the superfield approach to
BRST formalism as can be seen from a close look at (3.9).
The Lagrangian density (2.1) can be expressed, in terms of the (4, 2)-dimensional
superfields, in the following distinct and different forms
B = −
F̃ (h)µν F̃
µν(h) + Limθ→0
−i F̄ (h)(∂µB(h)µ +
, (3.10)
§The on-shell nilpotent (anti-)BRST symmetry transformations s̃(a)b can also be obtained by invoking
the (anti-)chiral superfields on the appropriately chosen supermanifolds (see, e.g. [23] for details).
B = −
F̃ (h)µν F̃
µν(h) + Limθ̄→0
+i F (h)(∂µB(h)µ +
, (3.11)
B = −
F̃ (h)µν F̃
µν(h) +
Bµ(h)B(h)µ +
F (h)F̄ (h)
. (3.12)
It would be noted that the kinetic energy term −(1/4)F̃ (h)µν F̃
µν(h) is independent of the
variables θ and θ̄ because F̃ (h)µν = Fµν . In exactly similar fashion, the Lagrangian density of
(2.5) can be expressed, with the help of the super expansion (3.8), as
b = −
µν(o)F̃
µν(h)
(o) + Limθ→0
(o) (∂
µAµ) + i B
µ(o) ∂
(o) )
, (3.13)
b = −
µν(o)F̃
µν(h)
(o) + Limθ̄→0
(o) (∂
µAµ)− i B
µ(0) ∂
(o) )
, (3.14)
b = −
µν(o)F̃
µν(h)
(o) +
(o) B
µ(o) +
(o) F̄
. (3.15)
The form of the Lagrangian densities (e.g. from (3.10) to (3.15)) simplify the proof for the
(anti-)BRST invariance of the Lagrangian densities in (2.1) and (2.5).
In the above forms (e.g. from (3.10) to (3.12)) of the Lagrangian density, the BRST
invariance sbLB = 0 and the anti-BRST invariance sabLB = 0 become very transparent
and simple because the following equalities and mappings exist, namely;
B = 0 ⇒ Limθ→0
B = 0, sb ⇔ Limθ→0
, s2b = 0 ⇔
= 0, (3.16)
B = 0 ⇒ Limθ̄→0
B = 0, sab ⇔ Limθ̄→0
, s2ab = 0 ⇔
= 0. (3.17)
Similarly, the most beautiful relation (3.12), leads to the (anti-)BRST invariance together.
Here one has to use the anticommutativity property sbsab + sabsb = 0 in the language of
the translational generators (i.e. (∂/∂θ̄), (∂/∂θ)) along the Grassmannian directions of the
supermanifold, for its proof. This statement can be mathematically expressed as
s(a)bL
B = 0 ⇒
B = 0, sbsab + sabsb = 0 ⇔
= 0. (3.18)
In exactly similar fashion, the on-shell nilpotent (anti-)BRST symmetry invariance (i.e.
s̃(a)bL
b = 0) of the Lagrangian density (2.5) can also be captured in the language of the
superfields if we exploit the expressions (3.13) to (3.15) for the Lagrangian density. In the
latter case, the on-shell nilpotent (anti-)BRST invariance turns out to be like (3.16), (3.17)
and (3.18) with the replacements: s(a)b → s̃(a)b, L
B → L
b , L̃
(1,2,3)
B → L̃
(1,2,3)
Mathematically, the (anti-)BRST invariance of the Lagrangian density (2.1) is captured
in the equations (3.16) to (3.18). In the language of geometry on the (4, 2)-dimensional
supermanifold, the (anti-)BRST invariance corresponds to the Grassmannian independence
of the supersymmetric versions of the Lagrangian density (2.1). In other words, the trans-
lation of the super Lagrangian densities (i.e. (3.10) to (3.12)), along the (θ)θ̄ directions of
the supermanifold, is zero. This observation captures the (anti-)BRST invariance of (2.1).
4 (Anti-)BRST symmetries in non-Abelian theory: Lagrangian approach
We begin with the following BRST-invariant Lagrangian density, in the Feynman gauge,
for the four (3 + 1)-dimensional non-Abelian 1-form gauge theory¶ (see, e.g. [26,27,9])
B = −
F µν · Fµν +B · (∂µA
B · B − i∂µC̄ ·D
µC, (4.1)
where the curvature tensor (Fµν) is defined through the 2-form F
(2)(n) = dA(1)(n)+iA(1)(n)∧
A(1)(n). Here the non-Abelian 1-form gauge connection is A(1)(n) = dxµ(Aµ · T ) and the
exterior derivative is d = dxµ∂µ. The Nakanishi-Lautrup auxiliary field B = B · T is
required for the linearization of the gauge-fixing term and the (anti-)ghost fields (C̄)C are
essential for the proof of the unitarity in the theory. The latter fields are fermionic (i.e.
(Ca)2 = 0, (C̄a)2 = 0, CaCb + CbCa = 0, CaC̄b + C̄bCa = 0, etc.) in nature.
The above Lagrangian density respects the following off-shell nilpotent ((s
2 = 0)
BRST symmetry transformations s
b , namely;
b Aµ = DµC, s
b C = −
(C × C), s
b C̄ = iB,
b B = 0, s
b Fµν = i(Fµν × C).
(4.2)
It will be noted that (i) the curvature tensor Fµν · T transforms here under the BRST
symmetry transformation. However, it can be checked explicitly that the kinetic energy
term −(1/4)Fµν · F
µν remains invariant under the BRST symmetry transformations, (ii)
the nilpotent anti-BRST symmetry transformations corresponding to the above BRST
symmetry transformations (4.2) cannot be defined for the Lagrangian density (4.1), and
(iii) the on-shell nilpotent version of the above BRST symmetry transformations is also
possible if we substitute, in the above symmetry transformations, B = −(∂µA
µ). The
ensuing on-shell (i.e. ∂µD
µC = 0) nilpotent BRST symmetry transformations s̃
b are
b Aµ = DµC, s̃
b C = −
(C × C),
b C̄ = −i(∂µA
µ), s̃
b Fµν = i(Fµν × C).
(4.3)
The above on-shell nilpotent transformations leave the following Lagrangian density
b = −
F µν · Fµν −
µ) · (∂ρA
ρ)− i∂µC̄ ·D
µC, (4.4)
¶For the non-Abelian 1-form gauge theory, the notations used in the Lie algebraic space are: A · B =
AaBa, (A ×B)a = fabcAbBc, DµC
a = ∂µC
a + ifabcAbµC
c ≡ ∂µC
a + i(Aµ × C)
a, Fµν = ∂µAν − ∂νAµ +
iAµ×Aν , Aµ = Aµ ·T, [T
a, T b] = fabcT c where the Latin indices a, b, c = 1, 2, 3....N are in the SU(N) Lie
algebraic space. The structure constant fabc can be chosen to be totally antisymmetric for any arbitrary
semi simple Lie algebra that includes SU(N), too (see, e.g., [27]).
quasi-invariant because it transforms to a total derivative.
The gauge-fixing and Faddeev-Popov ghost terms of the Lagrangian densities (4.1) and
(4.4) can be written, modulo a total derivative, as a BRST-exact quantity in terms of
the off-shell and on-shell nilpotent BRST symmetry transformations (4.2) and (4.3). This
statement can be mathematically expressed as follows
−i C̄ · {(∂µA
= B · (∂µA
B · B − i ∂µC̄ ·D
µC, (4.5)
C̄ · (∂µA
µ) + i Aµ · ∂
µ) · (∂ρA
ρ)− i ∂µC̄ ·D
µC. (4.6)
It will be noted that one has to take into account s̃
b (∂µA
µ) = ∂µD
µC = 0 in the above
proof of the exactness of the expression in (4.6).
The Lagrangian densities that respect the off-shell nilpotent (i.e. (s
(a)b)
2 = 0) and
anticommuting (s
ab + s
b = 0) (anti-)BRST symmetry transformations are
(1)(n)
b = −
F µν · Fµν +B · (∂µA
(B · B + B̄ · B̄)− i∂µC̄ ·D
µC, (4.7)
(2)(n)
b = −
F µν · Fµν − B̄ · (∂µA
(B · B + B̄ · B̄)− iDµC̄ · ∂
µC. (4.8)
Here auxiliary fields B and B̄ satisfy the Curci-Ferrari condition B+B̄ = −(C×C̄) [28,29].
It is also evident, from this relation, that B ·(∂µA
µ)−i∂µC̄ ·D
µC = −B̄ ·(∂µA
µ)−iDµC̄ ·∂
Furthermore, it should be re-emphasized that the Lagrangian densities (4.1) and (4.4) do
not respect the anti-BRST symmetry transformations of any kind. The BRST and anti-
BRST symmetry transformations, for the above Lagrangian densities, are
b Aµ = DµC, s
b C = −
(C × C), s
b C̄ = iB,
b B = 0, s
b Fµν = i(Fµν × C), s
b B̄ = i(B̄ × C),
(4.9)
ab Aµ = DµC̄, s
ab C̄ = −
(C̄ × C̄), s
ab C = iB̄,
ab B̄ = 0, s
ab Fµν = i(Fµν × C̄), s
ab B = i(B × C̄).
(4.10)
The above off-shell nilpotent (anti-)BRST symmetry transformations leave the Lagrangian
densities (4.7) as well as (4.8) quasi-invariant as they transform to some total derivatives.
The gauge-fixing and Faddeev-Popov ghost terms of the Lagrangian densities (4.7) and
(4.8) can be written, in a symmetrical fashion with respect to s
b and s
ab , as
Aµ ·A
µ + C · C̄
= B · (∂µA
(B ·B + B̄ · B̄)− i∂µC̄ ·D
≡ −B̄ · (∂µA
(B · B + B̄ · B̄)− iDµC̄ · ∂
(4.11)
This demonstrates the key fact that the above gauge-fixing and Faddeev-Popov ghost terms
are (anti-)BRST invariant together because of the nilpotency and anticommutativity of the
(anti-)BRST symmetry transformations s
(a)b that are present in the theory.
5 (Anti-)BRST invariance in non-Abelian theory: superfield approach
To capture (i) the off-shell as well as the on-shell nilpotent (anti-)BRST symmetry transfor-
mations, and (ii) the invariance of the Lagrangian densities, in the language of the superfield
approach to BRST formalism, we have to consider the 4D 1-form non-Abelian gauge theory
on a (4, 2)-dimensional supermanifold. As a consequence, we have the following mappings:
d → d̃ = dxµ ∂µ + dθ ∂θ + dθ̄ ∂θ̄, d̃
2 = 0,
A(1)(n) → Ã(1)(n) = dxµ(Bµ · T )(x, θ, θ̄) + dθ(F̄ · T )(x, θ, θ̄) + dθ̄(F · T )(x, θ, θ̄),
(5.1)
where the (4, 2)-dimensional superfields (Bµ ·T,F ·T, F̄ ·T ) are the generalizations of the 4D
basic local fields (Aµ ·T, C ·T, C̄ ·T ) of the Lagrangian density (4.1), (4.7) and (4.8). These
superfields can be expanded along the Grassmannian directions of the supermanifold, in
terms of the basic 4D fields, auxiliary fields and secondary fields as [4,16,19]
(Bµ · T )(x, θ, θ̄) = (Aµ · T )(x) + θ (R̄µ · T )(x) + θ̄ (Rµ · T )(x) + i θ θ̄ (Sµ · T )(x),
(F · T )(x, θ, θ̄) = (C · T )(x) + i θ (B̄1 · T )(x) + i θ̄ (B1 · T )(x) + i θ θ̄ (s · T )(x),
(F̄ · T )(x, θ, θ̄) = (C̄ · T )(x) + i θ (B̄2 · T )(x) + i θ̄ (B2 · T )(x) + i θ θ̄ (s̄ · T )(x).
(5.2)
To determine the exact expressions for the secondary fields, in terms of the basic and
auxiliary fields of the theory, we have to exploit the HC. The horizontality condition, for
the non-Abelian gauge theory is the requirement of the equality of the Maurer-Cartan
equation on the (super) manifolds. In other words, the covariant reduction of the super
2-form curvature F̃ (2)(n) to the ordinary 2-form curvature (i.e. d̃Ã(1)(n)+ iÃ(1)(n) ∧ Ã(1)(n) =
dA(1)(n)+ iA(1)(n)∧A(1)(n)) leads to the determination of the secondary fields in terms of the
basic and auxiliary fields of the theory. The ensuing expansions, in terms of the basic and
auxiliary fields, lead to (i) the derivation of the (anti-)BRST symmetry transformations
for the basic fields of the theory, and (ii) the geometrical interpretations of the nilpotent
(anti-)BRST symmetry transformations (and their corresponding nilpotent generators) for
the basic fields of the theory as the translations of the corresponding superfields along the
Grassmannian directions of the (4, 2)-dimensional supermanifold (see, e.g., [16,19]).
With the identifications B2 = B and B̄1 = B̄, the following relationships emerge after
the application of the horizontality condition ‖ (see, e.g., [16]):
Rµ = DµC, R̄µ = DµC̄, B + B̄ = −(C × C̄), s = i(B̄ × C),
Sµ = DµB +DµC × C̄ ≡ −DµB̄ −DµC̄ × C,
s̄ = −i(B × C̄), B1 = −
(C × C), B̄2 = −
(C̄ × C̄).
(5.3)
‖In the rest of our present text, we shall be using the short-hand notations for all the fields e.g.:
Aµ · T = Aµ, C · T = C, B · T = B, etc., for the sake of brevity.
The substitution of the above expressions, which are obtained after the application of the
horizontality condition, leads to the following expansions
B(h)µ (x, θ, θ̄) = Aµ + θ DµC̄ + θ̄ DµC + i θ θ̄ (DµB +DµC × C̄),
F (h)(x, θ, θ̄) = C + i θ B̄ −
θ̄ (C × C)− θ θ̄ (B̄ × C),
F̄ (h)(x, θ, θ̄) = C̄ −
θ (C̄ × C̄) + i θ̄ B + θ θ̄ (B × C̄).
(5.4)
The above expansions (see, e.g., our earlier works [16,19]) can be expressed in terms of the
off-shell nilpotent (anti-)BRST symmetry transformations (4.9) and (4.10).
With the above expansion at our disposal, the gauge-fixing and Faddeev-Popov terms
of the Lagrangian density (4.1) can be written, modulo a total ordinary derivative, as
Limθ→0
−iF̄ (h) · ∂µB(h)µ −
F̄ (h) · B
= B · (∂µA
B · B − i ∂µC̄ ·D
µC. (5.5)
Furthermore, it can be seen that, due to the validity and consequences of the horizontality
condition, the super curvature tensor F̃µν has the following form [16,4]
F̃ (h)µν = Fµν + iθ(Fµν × C̄) + iθ̄(Fµν × C)− θ θ̄ (Fµν × B + Fµν × C × C̄). (5.6)
It is clear from the above relationship that the kinetic energy term of the present 4D
non-Abelian 1-form gauge theory remains invariant, namely;
F̃ (h)µν · F̃
µν(h) = −
Fµν · F
µν . (5.7)
The Grassmannian independence of the l.h.s. of (5.7) has deep meaning as far as physics is
concerned. It implies immediately that the kinetic energy term of the non-Abelian gauge
theory is an (anti-)BRST (i.e. gauge) invariant physical quantity.
At this juncture, it is worthwhile to point out that one can also capture the equation
(4.6) in the superfield approach to BRST formalism where the on-shell nilpotent version
of the BRST symmetry transformations (i.e. s̃
b ) plays an important role. For this pur-
pose, we have to express the superfield expansion (5.4) for the on-shell nilpotent BRST
symmetry transformation where one has to exploit the replacement B = −(∂µA
µ). With
this substitution, the equation (5.4) for the superfield expansion becomes
µ(o)(x, θ, θ̄) = Aµ + θ DµC̄ + θ̄ DµC + i θ θ̄ [−Dµ(∂
ρAρ) +DµC × C̄],
(o) (x, θ, θ̄) = C + i θ B̄ −
θ̄ (C × C)− θ θ̄ (B̄ × C),
(o) (x, θ, θ̄) = C̄ −
θ (C̄ × C̄)− i θ̄ (∂µA
µ)− θ θ̄ [(∂µA
µ)× C̄)].
(5.8)
Now, the equation (4.6) can be expressed in terms of the above superfields, as:
Limθ→0
(o) · (∂
µAµ) + i B
µ(o) · ∂
µ) · (∂ρA
ρ)− i ∂µC̄ ·D
(5.9)
Furthermore, it will be noted that the analogue of (5.6), for the on-shell nilpotent BRST
symmetry transformation (i.e. F̃
µν(o)), can be obtained by the replacement B = −(∂µA
Once again, the equality (5.7) would remain intact even if we take into account the on-shell
nilpotent BRST symmetry transformations. Thus, we note that the kinetic energy term
(i.e. (−(1/4)F µν · Fµν = −(1/4)F̃
µν(h)
(o) · F̃
µν(o)) of the non-Abelian gauge theory remains
independent of the Grassmannian variables θ and θ̄ after the application of the HC. This
statement is true for the off-shell as well as the on-shell nilpotent (anti-)BRST symmetry
transformations. Physically, it implies that the kinetic energy term for the gauge field of
the non-Abelian theory is an (anti-)BRST (i.e. gauge) invariant quantity.
The above key observation helps in expressing the Lagrangian density (4.1) and (4.4)
in terms of the superfields (obtained after the application of HC), as
B = −
F̃ (h)µν · F̃
µν(h) + Limθ→0
−iF̄ (h) · ∂µB(h)µ −
F̄ (h) · B
b = −
µν(o) · F̃
µν(h)
(o) + Limθ→0
(o) · (∂
µAµ) + i B
µ(o) · ∂
(5.10)
This result, in turn, simplifies the BRST invariance of the above Lagrangian density (4.1)
and (4.4) (describing the 4D 1-form non-Abelian gauge theory) as follows
Limθ→0
B = 0 ⇒ s
B = 0, Limθ→0
b = 0 ⇒ s̃
b = 0. (5.11)
This is a great simplification because the total super Lagrangian densities (5.10) remain
independent of the Grassmannian variable θ̄. This key result is encoded in the mapping
b , s̃
b ) ⇔ Limθ→0(∂/∂θ̄) and the nilpotency (s
2 = 0, (s̃
2 = 0, (∂/∂θ̄)2 = 0.
It can be readily checked that the analogues of (5.5) and (5.9) cannot be expressed as
the derivative w.r.t. the Grassmannian variable θ. To check this, one has to exploit the
super expansions (5.4) and (5.8) obtained after the application of the HC (in the context
of the derivation of the off-shell as well as the on-shell nilpotent BRST symmetry transfor-
mations s
b and s̃
b ). It can be clearly seen that the operation of the derivative w.r.t. the
Grassmannian variable θ, on any combination of the superfields from the expansions (5.4)
and (5.8), does not lead to the derivation of the r.h.s. of (5.5) and (5.9). In the language
of the superfield approach to BRST formalism, this is the reason behind the non-existence
of the anti-BRST symmetry transformations for the Lagrangian densities (4.1) and (4.4).
The form of the gauge-fixing and Faddeev-Popov terms (4.11), expressed in terms of
the BRST and anti-BRST symmetry transformations together, can be represented in the
language of the superfields (obtained after the application of HC), as
B(h)µ · B
µ(h) + F (h) · F̄ (h)
= B · (∂µA
(B · B + B̄ · B̄)− i∂µC̄ ·D
(5.12)
As a consequence of the above expression, the Lagrangian densities (4.7) (as well as (4.8))
can be presented, in terms of the superfields, as
(1,2)(n)
b = −
F̃ µν(h) · F̃ (h)µν +
B(h)µ · B
µ(h) + F (h) · F̄ (h)
. (5.13)
The BRST and anti-BRST invariance of the above super Lagrangian density (and that of
the ordinary 4D Lagrangian densities (4.7) and (4.8)) is encoded in the following simple
equations that are expressed in terms of the translational generators along the Grassman-
nian directions of the (4, 2)-dimensional supermanifold, namely;
Limθ→0
(1,2)(n)
b = 0 ⇒ s
(1)(n)
b = 0, Limθ̄→0
(1,2)(n)
b = 0 ⇒ s
(2)(n)
b = 0.
(5.14)
This is a tremendous simplification of the (anti-)BRST invariance of the Lagrangian den-
sities (4.7) and (4.8) in the language of the superfield approach to BRST formalism. In
other words, if one is able to show the Grassmannian independence of the super Lagrangian
densities of the theory, the (anti-)BRST invariance of the 4D theory follows automatically.
In the language of the geometry on the supermanifold, the (anti-)BRST invariance of
a 4D Lagrangian density is equivalent to the statement that the translation of the super
version of the above Lagrangian density, along the Grassmannian directions of the (4, 2)-
dimensional supermanifold, is zero. Thus, the super Lagrangian density of an (anti-)BRST
invariant 4D theory is a Lorentz scalar, constructed with the help of (4, 2)-dimensional
superfields (obtained after the application of HC), such that, when the partial derivatives
w.r.t. the Grassmannian variables (θ and θ̄) operate on it, the result is zero.
The nilpotency and anticommutativity properties (that are associated with the con-
served (anti-)BRST charges and (anti-)BRST symmetry transformations) are found to be
captured very naturally (cf. (3.16)-(3.18)) when we consider the superfield formulation of
the (anti-)BRST invariance of the Lagrangian density of a given 1-form gauge theory. We
mention, in passing, that one could also derive the analogue of the equations (3.16), (3.17)
and (3.18) for the 4D non-Abelian 1-form gauge theory in a straightforward manner.
6 Conclusions
In our present investigation, we have concentrated mainly on the (anti-)BRST invariance
of the Lagrangian densities of the free 4D (non-)Abelian 1-form gauge theories (having no
interaction with matter fields) within the framework of the superfield approach to BRST
formalism. We have been able to provide the geometrical basis for the existence of the
(anti-)BRST invariance in the above 4D theories. To be more specific, we have been able
to show that the Grassmannian independence of the (4, 2)-dimensional super Lagrangian
density, expressed in terms of the appropriate superfields, is a clear-cut proof that there is
an (anti-)BRST invariance (cf. (3.16), (3.17), (3.18), (5.11), (5.14)) in the 4D theory.
If the super Lagrangian density could be expressed as a sum of (i) a Grassmannian
independent term, and (ii) a derivative w.r.t. the Grassmannian variable, then, the cor-
responding 4D Lagrangian density will automatically respect BRST and/or anti-BRST
invariance. In the latter piece of the above super Lagrangian density, the derivative could
be either w.r.t. θ or w.r.t. θ̄ or w.r.t. both of them put together. More specifically,
(i) if the derivative is w.r.t. θ̄, the nilpotent symmetry would correspond to the BRST,
(ii) if the derivative is w.r.t. θ, the nilpotent symmetry would be that of the anti-BRST
type, and (iii) if both the derivatives are present together, both the nilpotent (anti-)BRST
symmetries would be present together (and they would turn out to be anticommuting).
For the 4D (non-)Abelian 1-form gauge theories, that are considered on the (4, 2)-
dimensional supermanifold, it is the HC on the 1-form super connection Ã(1) that plays a
very important role in the derivation of the (anti-)BRST symmetry transformations. The
cohomological origin for the above HC lies in the (super) exterior derivatives (d̃)d. This
point has been made quite clear in our discussions after the off-shell as well as the on-shell
nilpotent (anti-)BRST symmetry transformations (2.2), (2.4), (4.2), (4.3), (4.9) and (4.10).
In fact, it is the full kinetic energy term of the above theories (owing its origin to the
cohomological operator d = dxµ∂µ) that remains invariant under the above on-shell as well
the off-shell nilpotent (anti-)BRST symmetry transformations.
The HC produces specifically the nilpotent (anti-)BRST symmetry transformations for
the gauge and (anti-)ghost fields because of the fact that the super 1-form connection
Ã(1)/Ã(1)(n) (cf. (3.1) and (5.1)) is constructed with a super vector multiplet (Bµ,F , F̄)
which is the generalization of the gauge field Aµ and the (anti-)ghost fields (C̄)C (of the
ordinary 4D (non-)Abelian 1-form gauge theories) to the (4, 2)-dimensional supermanifold.
As a consequence, only the nilpotent and anticommuting (anti-)BRST symmetry transfor-
mations for the 4D local fields Aµ, C and C̄ are obtained when the full potential of the HC
is exploited within the framework of the above superfield formulation.
It is worthwhile to point out that geometrically the super Lagrangian densities, ex-
pressed in terms of the (4, 2)-dimensional superfields, are equivalent to the sum of the
kinetic energy term and the translations of some composite superfields (obtained after the
application of the HC) along the Grassmannian directions (i.e. θ and/or θ̄) of the (4, 2)-
dimensional supermanifold. This observation is distinctly different from our earlier works
on the superfield approach to 2D (non-)Abelian 1-form gauge theories [24,25,23] which are
found to correspond to the topological field theories. In fact, for the latter theories, the
total super Lagrangian density turns out to be a total derivative w.r.t. the Grassmannian
variables (θ and/or θ̄). That is to say, even the kinetic energy term of the latter theories,
is able to be expressed as the total derivative w.r.t. the variables θ and/or θ̄.
In our present endeavour, within the framework of the superfield approach to BRST
formalism, we have been able to provide (i) the logical reason behind the non-existence of
the anti-BRST symmetry transformations for the Lagrangian densities (4.1) and (4.4) for
the 4D non-Abelian 1-form gauge theory, (ii) the explicit explanation for the uniqueness
of the equations (2.3) and (2.6) for the 4D Abelian 1-form gauge theory, (iii) the convinc-
ing proof for the on-shell nilpotent (anti-)BRST invariance of the gauge-fixing term (i.e.
s̃(a)b(∂µA
µ) = 0, s̃
(a)b(∂µA
µ) = 0) for the (non-)Abelian 1-form gauge theories, and (iv) the
compelling arguments for the non-existence of the exact analogue(s) of (2.3) and (2.6) for
the non-Abelian 1-form gauge theory. To the best of our knowledge, the logical explana-
tions for the above subtle points (connected with the 1-form gauge theories) are completely
new. Thus, the results of our present work are simple, beautiful and original.
It is worthwhile to mention that our superfield construction and its ensuing geometrical
interpretations are not specific to the Feynman gauge (which has been taken into account
in our present endeavor). To corroborate this assertion, we take the simple case of the 4D
Abelian 1-form gauge theory and write the Lagrangian density (2.1) in the arbitrary gauge
(a,ξ)
B = −
F µνFµν + B (∂µA
B2 − i ∂µC̄ ∂
µC, (6.1)
where ξ is the gauge parameter. It is elementary to check that, in the limit ξ → 1, we get
back our Lagrangian density (2.1) for the Abelian theory in the Feynman gauge.
The analogue of the equation (2.3) (for the gauge-fixing and Faddeev-Popov ghost terms
in the case of the arbitrary gauge) can be expressed as
−i C̄ {(∂µA
B}], sab
+i C {(∂µA
sb sab
(6.2)
The above expression can be easily generalized to the analogues of the equations (3.10)—
(3.12) in terms of the superfields by taking the help of (3.8). Thus, the geometrical inter-
pretations remain intact even in the case of the arbitrary gauge.
In a similar fashion, for the 4D non-Abelian 1-form gauge theory, the equations (4.5),
(4.6) and (4.11) can be generalized to the case of arbitrary gauge and, subsequently, can
be expressed in terms of superfields as the analogues of (5.5), (5.9) and (5.12). Finally,
we can obtain the analogues of (5.7), (5.10) and (5.13) which will lead to the derivation of
the analogues of (5.11) and (5.14). Thus, we note that geometrical interpretations, in the
arbitrary gauge, remain the same for the 4D (non-)Abelian 1-form gauge theory within the
framework of our superfield approach to BRST formalism.
Our present work can be generalized to the case of the interacting 4D (non-)Abelian
1-form gauge theories where there exists an explicit coupling between the gauge field and
the matter fields. In fact, our earlier works [14-18] might turn out to be quite handy in
attempting the above problems. It seems to us that it is the combination of the HC and
the restrictions, owing their origin to the (super) covariant derivative on the matter (super)
fields and their intimate connection with the (super) curvatures, that would play a decisive
role in proving the existence of the (anti-)BRST invariance for the above gauge theories.
It is gratifying to state that we have accomplished the above goals in our very recent
endeavours [30-32]. In fact, we have been able to provide the geometrical basis for the
existence of the (anti-)BRST invariance, in the context of the interacting (non-)Abelian
1-form gauge theories with Dirac as well as complex scalar fields, within the framework
of the augmented superfield approach to BRST formalism. As it turns out, here too, the
super Lagrangian density is found to be independent of the Grassmannian variables.
In our earlier works [33-35], we have been able to show the existence of the nilpotent
(anti-)BRST and (anti-)co-BRST symmetry transformations for the 4D free Abelian 2-form
gauge theory. We have also established the quasi-topological nature of it in [35]. In a recent
work [36], the nilpotent (anti-)BRST symmetry transformations have been captured in the
framework of the superfield formulation. It would be a very nice endeavour to study the
(anti-)BRST and (anti-)co-BRST invariance of the 4D Abelian 2-form gauge theory within
the framework of superfield formulation. At present, this issue and connected problems in
the context of the 4D free Abelian 2-form gauge theory are under intensive investigation
and our results would be reported in our forthcoming future publications [37].
Acknowledgement: Financial support from the Department of Science and Technology
(DST), Government of India, under the SERC project sanction grant No: - SR/S2/HEP-
23/2006, is gratefully acknowledged.
References
[1] J. Thierry-Mieg, J. Math. Phys. 21, 2834 (1980).
[2] J. Thierry-Mieg, Nuovo Cimento A 56, 396 (1980).
[3] M. Quiros, F. J. De Urries, J. Hoyos, M. L. Mazon and E. Rodrigues, J. Math. Phys.
22, 1767 (1981).
[4] L. Bonora and M. Tonin, Phys. Lett. B 98, 48 (1981).
[5] L. Bonora, P. Pasti and M. Tonin M, Nuovo Cimento A 63, 353 (1981).
[6] R. Delbourgo and P. D. Jarvis, J. Phys. A: Math. Gen. 15, 611 (1981).
[7] R. Delbourgo, P. D. Jarvis and G. Thompson, Phys. Lett. B 109, 25 (1982).
[8] D. S. Hwang and C. -Y. Lee, J. Math. Phys. 38, 30 (1997).
[9] N. Nakanishi and I. Ojima, Covariant Operator Formalism of Gauge Theories and
Quantum Gravity (World Scientific, Singapore, 1990).
[10] R. P. Malik, Phys. Lett. B 584, 210 (2004), hep-th/0311001.
[11] R. P. Malik, Int. J. Geom. Methods Mod. Phys. 1, 467 (2004), hep-th/0403230.
[12] R. P. Malik, J. Phys. A: Math. Gen. 37, 5261 (2004), hep-th/031193.
[13] R. P. Malik, Int. J. Mod. Phys. A 20, 4899 (2005), hep-th/0402005.
R. P. Malik, Int. J. Mod. Phys. A 20, 7285 (2005), hep-th/0402005 (Erratum).
[14] R. P. Malik, Mod. Phys. Lett. A 20, 1667 (2005), hep-th/0402123.
[15] R. P. Malik, Eur. Phys. J. C 45, 513 (2006), hep-th/0506109.
[16] R. P. Malik and B. P. Mandal, Eur. Phys. J. C 47, 219 (2006), hep-th/0512334.
[17] R. P. Malik, Eur. Phys. J. C 47, 227 (2006), hep-th/0507127.
[18] R. P. Malik, J. Phys. A: Math. Gen. 39, 10575 (2006), hep-th/0510164.
[19] R. P. Malik, Eur. Phys. J. C 51, 169 (2007), hep-th/0603049.
[20] R. P. Malik, J. Phys. A: Math. Theor. 40, 4877 (2007), hep-th/0605213.
[21] R. P. Malik, J. Phys. A: Math. Gen 33, 2437 (2000), hep-th/9902146.
[22] R. P. Malik, J. Phys. A: Math. Gen. 34, 4167 (2001), hep-th/0012085.
[23] R. P. Malik, Ann. Phys. (N. Y.) 307, 01 (2003), hep-th/0205135.
[24] R. P. Malik, J. Phys. A: Math. Gen 35, 6919 (2002), hep-th/0112260.
[25] R. P. Malik, J. Phys. A: Math. Gen. 35, 8817 (2002), hep-th/0204015.
[26] K. Nishijima, Czech. J. Phys. 46, 01 (1996).
[27] S. Weinberg, The Quantum Theory of Fields: Modern Applications Vol. 2 (Cambridge
University Press, Cambridge, 1996).
[28] G. Curci and R. Ferrari, Phys. Lett. B 63, 51 (1976).
[29] G. Curci and R. Ferrari, Nuovo Cimento A 32, 151 (1976).
[30] R. P. Malik, Nilpotent symmetry invariance in QED with Dirac fields: superfield for-
malism, arXiv: 0706.4168 [hep-th].
[31] R. P. Malik and B. P. Mandal, Superfield approach to the nilpotent symmetry invariance
in the non-Abelian 1-form gauge theory, arXiv: 0709.2277 [hep-th].
[32] R. P. Malik and B. P. Mandal, Nilpotent symmetry invariance in QED with complex
scalar fields: augmented superfield formalism, arXiv: 0711.2389 [hep-th].
[33] E. Harikumar, R. P. Malik and M. Sivakumar, J. Phys. A: Math. Gen. 33, 7149 (2000),
hep-th/0004145.
[34] R. P. Malik, Int. J. Mod. Phys. A 19, 5663 (2004), hep-th/0212240.
[35] R. P. Malik, J. Phys. A: Math. Gen. 36, 5056 (2003), hep-th/0209136.
[36] R. P. Malik, Superfield approach to nilpotent (anti-)BRST symmetries for the free
Abelian 2-form gauge theory, hep-th/0702039.
[37] R. P. Malik, in preparation.
ABSTRACT
  We capture the off-shell as well as the on-shell nilpotent
Becchi-Rouet-Stora-Tyutin (BRST) and anti-BRST symmetry invariance of the
Lagrangian densities of the four (3 + 1)-dimensional (4D) (non-)Abelian 1-form
gauge theories within the framework of the superfield formalism. In particular,
we provide the geometrical interpretations for (i) the above nilpotent symmetry
invariance, and (ii) the above Lagrangian densities, in the language of the
specific quantities defined in the domain of the above superfield formalism.
Some of the subtle points, connected with the 4D (non-)Abelian 1-form gauge
theories, are clarified within the framework of the above superfield formalism
where the 4D ordinary gauge theories are considered on the (4, 2)-dimensional
supermanifold parametrized by the four spacetime coordinates x^\mu (with \mu =
0, 1, 2, 3) and a pair of Grassmannian variables \theta and \bar\theta. One of
the key results of our present investigation is a great deal of simplification
in the geometrical understanding of the nilpotent (anti-)BRST symmetry
invariance.

<|endoftext|><|startoftext|>
Introduction
Let a = (ai), i ∈ Z be a sequence of variables. Consider the ring of polynomials
Z[a] in the variables ai with integer coefficients. Introduce another infinite set of
variables x = (x1, x2, . . . ) and for each nonnegative integer n denote by Λn the ring
of symmetric polynomials in x1, . . . , xn with coefficients in Z[a]. The ring Λn is
filtered by the usual degrees of polynomials in x1, . . . , xn with the ai considered to
have the zero degree. The evaluation map
ϕn : Λn → Λn−1, P (x1, . . . , xn) 7→ P (x1, . . . , xn−1, an) (1.1)
is a homomorphism of filtered rings so that we can define the inverse limit ring Λ by
Λ = lim
Λn, n → ∞, (1.2)
where the limit is taken with respect to the homomorphisms (1.1) in the category of
filtered rings. When a is specialized to the sequence of zeros, this reduces to the usual
definition of the ring of symmetric functions; see e.g. Macdonald [14]. In that case,
a distinguished basis of Λ is comprised by the Schur functions sλ(x) parameterized
by all partitions λ. The respective analogues of the sλ(x) in the general case are the
double Schur functions sλ(x||a) which form a basis of Λ over Z[a]. We introduce the
Littlewood–Richardson polynomials cνλµ(a) as the structure coefficients of the ring Λ
in the basis of double Schur functions,
sλ(x||a) sµ(x||a) =
cνλµ(a) sν(x||a). (1.3)
In the specialization a = (0) the polynomials cνλµ(a) become the classical Littlewood–
Richardson coefficients cνλµ; see [12]. These are remarkable nonnegative integers which
occupy a prominent place in combinatorics, representation theory and geometry; see
e.g. Fulton [5], Macdonald [14] and Sagan [21].
The main result of this paper is a combinatorial rule for the calculation of the
Littlewood–Richardson polynomials which provides a manifestly positive formula in
the sense that cνλµ(a) is written as a polynomial in the differences ai − aj , i < j, with
positive integer coefficients.
We consider two applications of the rule. The results of Knutson and Tao [9]
imply that under an appropriate specialization, the polynomials cνλµ(a) describe the
multiplication rule for the equivariant Schubert classes on Grassmannians; see also
Fulton [6] for a more direct argument. Let n and N be nonnegative integers with
n 6 N and let Gr(n,N) denote the Grassmannian of the n-dimensional vector sub-
spaces of CN . The torus T = (C∗)N acts naturally on Gr(n,N). The equivariant
cohomology ring H∗T (Gr(n,N)) is a module over the polynomial ring Z[t1, . . . , tN ]
which can be identified with H∗T ({pt}), the equivariant cohomology ring of a point.
This module has a basis of the equivariant Schubert classes σλ parameterized by all
diagrams λ contained in the n×m rectangle, m = N − n; see e.g. [5, 6]. Then
σλ σµ =
d νλµ σν , (1.4)
where d νλµ = c
λµ(a) with the sequence a specialized by
a−m+1 = −t1, a−m+2 = −t2, . . . , an = −tN , (1.5)
while the remaining parameters ai are set to zero (the ti should be replaced with
yi in the notation of [9]). The coefficients d
λµ are given explicitly as polynomials
in the ti − tj , i > j, with positive integer coefficients. This positivity property was
established by Graham [8] in the general context of the equivariant Schubert calculus.
The first manifestly positive formula for the coefficients in the expansion (1.4) was
obtained by Knutson and Tao [9] by using combinatorics of puzzles. An earlier rule
of Molev and Sagan [17] also calculates d νλµ but lacks the explicit positivity property.
Our new rule implies a stability property of the coefficients d νλµ (see Corollary 3.1
below). Even though this property was not pointed out in [9], it can be derived
directly from the puzzle rule; see also Fulton [6] for its geometrical interpretation and
an extension to the equivariant Schubert calculus on the flag variety.
As another application, we obtain a rule for the positive integer expansion of the
product of two (virtual) quantum immanants (or the corresponding higher Capelli
operators) of Okounkov and Olshanski [18, 19]; cf. [17]. The quantum immanants
Sλ|n are elements of the center Z(gln) of the universal enveloping algebra U(gln)
parameterized by partitions λ with at most n parts; see [18]. The elements Sλ|n form
a basis of Z(gln) so that we can define the coefficients f
λµ by the expansion
Sλ|n Sµ|n =
f νλµ Sν|n.
Then f νλµ = c
λµ(a) for the specialization ai = −i for i ∈ Z. As n → ∞ this yields
a multiplication rule for the virtual quantum immanants Sλ; see Section 3.2 for the
definitions.
We define the double Schur function sλ(x||a) as the sequence of the double Schur
polynomials
sλ(x1, . . . , xn ||a), n = 1, 2, . . . , (1.6)
which are compatible with respect to the homomorphisms (1.1),
ϕn : sλ(x1, . . . , xn ||a) 7→ sλ(x1, . . . , xn−1 ||a). (1.7)
The polynomials (1.6) are closely related to the “factorial” or “double” Schur poly-
nomials sλ(x|u) with x = (x1, . . . , xn). The latter were introduced by Goulden and
Greene [7] and Macdonald [13] as a generalization of the factorial Schur polynomials
of Biedenharn and Louck [1, 2], and they are also a special case of the double Schu-
bert polynomials of Lascoux and Schützenberger; see Lascoux [11]. We follow Chen,
Li and Louck [4] and Fulton [6] and use the name “double Schur polynomials” for
the related polynomials sλ(x||a) as well.
In a more detail, consider a partition λ which is a sequence λ = (λ1, . . . , λn)
of integers λi such that λ1 > · · · > λn > 0. We will identify λ with its diagram
represented graphically as the array of left justified rows of unit boxes with λ1 boxes
in the top row, λ2 boxes in the second row, etc. The total number of boxes in λ will
be denoted by |λ|. The transposed diagram λ′ = (λ′1, . . . , λ
p) is obtained from λ by
applying the symmetry with respect to the main diagonal, so that λ′j is the number
of boxes in the j-th column of λ.
Let u = (u1, u2, . . . ) be a sequence of variables. The polynomials sλ(x|u) can be
defined by
sλ(x|u) =
(xT (α) − uT (α)+c(α)), (1.8)
where T runs over all semistandard (column-strict) tableaux of shape λ with entries
in {1, . . . , n}, T (α) is the entry of T in the box α ∈ λ and c(α) = j − i is the content
of the box α = (i, j) in row i and column j.
By a reverse λ-tableau T we will mean the tableau obtained by filling in the boxes
of λ with the numbers 1, 2, . . . , n in such a way that the entries weakly decrease along
the rows and strictly decrease down the columns. If α = (i, j) is a box of λ we let
T (α) = T (i, j) denote the entry of T in the box α. We define the double Schur
polynomials sλ(x||a) by
sλ(x||a) =
(xT (α) − aT (α)−c(α)), (1.9)
summed over the reverse λ-tableaux T . Then we have
sλ(x||a) = sλ(x|u) (1.10)
for the sequences a and u related by an−i+1 = ui with i = 1, 2, . . . . In particular, the
polynomial sλ(x||a) only depends on the variables ai with i 6 n, i ∈ Z. The relation
(1.10) is verified easily by replacing xi with xn−i+1 in (1.8) for all i = 1, . . . , n and using
the fact that sλ(x|u) is a symmetric polynomial in x. The property (1.7) of the double
Schur polynomials is immediate from their definition. In the specialization of the
sequence a with ai = −i, i ∈ Z, formula (1.9) defines the shifted Schur polynomials
of Okounkov and Olshanski [18, 19] in the variables yi = xi+ i. The use of the reverse
tableaux was significant in their study of the vanishing and stability properties of
these polynomials and associated central elements of the universal enveloping algebra
for the Lie algebra gln; see also Section 3.2 below.
Note that the stability property (1.7) extends to the double Schubert polynomials
(and to the equivariant Schubert calculus on the flag manifold). This follows easily
from the Cauchy formula for the Schubert polynomials (e.g., put x1 = y1 in [15,
Formula in 2.5.5]). In a more general context, this was also pointed out in [3].
The double Schur polynomials sλ(x||a) parameterized by the diagrams λ with at
most n rows form a basis of the ring Λn. Due to the stability property (1.7), the
Littlewood–Richardson polynomials cνλµ(a) can be defined by the expansion (1.3),
where x is understood as the set of variables x = (x1, . . . , xn) for any positive integer
n such that the diagrams λ, µ and ν have at most n rows. This allows us to work
with a finite set of variables for the determination of the polynomials cνλµ(a). For
the proof of the main theorem (Theorem 2.1) we follow the general approach of [17],
using the techniques of “barred” tableaux and modify the corresponding arguments
in order to obtain manifestly positive polynomials. This is achieved by imposing a
boundness condition on the barred tableaux.
It was observed by Goulden and Greene [7] and Macdonald [13] that sλ(x|u),
regarded as a formal power series in the infinite sets of variables x and u, admits
a “supertableaux” representation. We show that this representation has its “finite”
counterpart where x is a finite set of variables. We derive the corresponding formula
by choosing a certain specialization of the 9th Variation in [13]. This representa-
tion leads to a “supertableau” expression for the Littlewood–Richardson polynomials
cνλµ(a), although that expression is neither manifestly positive, nor stable.
After the first version of this paper was completed we have learned of an indepen-
dent work of V. Kreiman [10], where a positive equivariant Littlewood–Richardson
rule was given. That rule is equivalent to our Theorem 2.1 although the proof in [10]
is different. Moreover, Kreiman’s paper also provides a weight-preserving bijection
between the Knutson–Tao puzzles and the barred tableaux used in Theorem 2.1.
This work was inspired by Bill Fulton’s lectures [6]. I am grateful to Bill for
stimulating discussions.
2 Multiplication rule
Let R denote a sequence of diagrams
µ = ρ(0) → ρ(1) → · · · → ρ(l−1) → ρ(l) = ν, (2.1)
where ρ → σ means that σ is obtained from ρ by adding one box. Let ri denote
the row number of the box added to the diagram ρ(i−1). The sequence r1r2 . . . rl is
called the Yamanouchi symbol of R. Introduce the ordering on the set of boxes of
a diagram λ by reading them by columns from left to right and from bottom to top
in each column. We call this the column order . We shall write α ≺ β if α (strictly)
precedes β with respect to the column order. Given a sequence R, construct the
set T (λ,R) of barred reverse λ-tableaux T with entries from {1, 2, . . . } such that T
contains boxes α1, . . . , αl with
α1 ≺ · · · ≺ αl and T (αi) = ri, 1 6 i 6 l.
We will distinguish the entries in α1, . . . , αl by barring each of them. So, an element
of T (λ,R) is a pair consisting of a reverse λ-tableau and a chosen sequence of barred
entries compatible with R. We shall keep the notation T for such a pair. For example,
let R be the sequence
(3, 1) → (3, 2) → (3, 2, 1) → (3, 3, 1) → (4, 3, 1)
so that the Yamanouchi symbol is 2 3 2 1. Then for λ = (5, 5, 3) the following barred
λ-tableau belongs to T (λ,R):
For each box α with αi ≺ α ≺ αi+1, 0 6 i 6 l, set ρ(α) = ρ
(i). The barred entries
r1, . . . , rl divide the tableau into regions marked by the elements of the sequence R,
as illustrated:
ρ(0) ρ(1)
r1 r2
· · ·
Finally, a reverse λ-tableau T will be called ν-bounded if
T (1, j) 6 ν ′j for all j = 1, . . . , λ1.
Note that ν-bounded λ-tableaux exist only if λ ⊆ ν.
We are now in a position to state a rule for the calculation of the Littlewood-
Richardson polynomials cνλµ(a) defined by (1.3).
Theorem 2.1. The polynomial cνλµ(a) is zero unless µ ⊆ ν. If µ ⊆ ν then
cνλµ(a) =
T (α) unbarred
aT (α)−ρ(α)
T (α)
− aT (α)−c(α)
, (2.2)
summed over all sequences R of the form (2.1) and all ν-bounded reverse λ-tableaux
T ∈ T (λ,R). Moreover, for each factor occurring in the formula (2.2) we have
ρ(α)T (α) > c(α).
Before proving the theorem, let us point out some properties of the Littlewood-
Richardson polynomials which are immediate from the rule and consider some exam-
ples. The polynomial cνλµ(a) is zero unless both diagrams λ and µ are contained in
ν and |λ| + |µ| > |ν|. In this case cνλµ(a) is a homogeneous polynomial in the ai of
degree |λ| + |µ| − |ν|. If |λ| + |µ| − |ν| = 0 then the theorem reproduces a version
of the classical Littlewood-Richardson rule; see Corollary 2.9 below. Note also that
by the definition, the polynomials have the symmetry cνλµ(a) = c
µλ(a) which is not
apparent from the rule.
Example 2.2. For the product of the double Schur functions s(2)(x||a) and s(2,1)(x||a)
we have
s(2)(x||a) s(2,1)(x||a) = s(4,1)(x||a) + s(3,2)(x||a) + s(3,1,1)(x||a) + s(2,2,1)(x||a)
a−1 − a2 + a−2 − a0
s(3,1)(x||a) +
a−1 − a2
s(2,2)(x||a)
a−1 − a0
s(2,1,1)(x||a) +
a−1 − a2
a−1 − a0
s(2,1)(x||a).
For instance, the coefficient of s(3,1)(x||a) is calculated by the following barred (2)-
tableaux
1 1 1 1 2 1
compatible with the sequence (2, 1) → (3, 1). They contribute respectively a−1 − a1,
a−2 − a0, a1 − a2 which sums up to the coefficient a−1 − a2 + a−2 − a0. Alternatively,
using the symmetry cνλµ(a) = c
µλ(a) we can calculate the coefficient of s(3,1)(x||a) by
considering the barred (2, 1)-tableaux
compatible with the sequences (2) → (3) → (3, 1) and (2) → (2, 1) → (3, 1), respec-
tively. Their contributions to the coefficient are a−2 − a0 and a−1 − a2.
Example 2.3. For the calculation of c
(5,2,2)
(4,2,1)(2,2)
(a) take λ = (4, 2, 1), µ = (2, 2) and
ν = (5, 2, 2). We have ten sequences R of the form (2.1) but the set T (λ,R) contains
ν-bounded tableaux only for three of them. For the sequence R1 with the Yamanouchi
symbol 1 3 3 1 1, the set T (λ,R1) contains two bounded barred tableaux
3 1 1
3 1 1
whose contributions to the Littlewood–Richardson polynomial are (a0 − a3)(a0 − a2)
and (a0 − a3)(a−2 − a1), respectively. For the sequence R2 with the Yamanouchi
symbol 1 3 1 3 1, the set T (λ,R2) contains the bounded tableaux
3 1 1
3 1 1
with the respective contributions (a0 − a3)(a−4 − a−2) and (a0 − a3)(a−3 − a−1). For
the sequence R3 with the Yamanouchi symbol 3 1 3 1 1, the set T (λ,R3) contains the
only bounded tableau
3 1 1
with the contribution (a−1 − a3)(a0 − a3). Hence,
(5,2,2)
(4,2,1)(2,2)
(a) = (a0 − a3) (a−4 + a−3 + a0 − a1 − a2 − a3).
Taking λ = (2, 2), µ = (4, 2, 1) and ν = (5, 2, 2) we get two sequences with the
Yamanouchi symbols 1 3 and 3 1. The corresponding sets T (λ,R) consist of five
and four bounded barred tableaux, respectively, thus leading to a slightly longer
calculation.
Proof of Theorem 2.1. We present the proof as a sequence of lemmas. Due to the
stability property (1.7), we may (and will) work with a finite set of variables x =
(x1, . . . , xn). Accordingly, possible entries of the tableaux are now elements of the set
{1, . . . , n}. Introduce another sequence of variables b = (bi), i ∈ Z, and define the
Littlewood–Richardson type coefficients cνλµ(a, b) by the expansion
sλ(x||b) sµ(x||a) =
cνλµ(a, b) sν(x||a). (2.3)
Lemma 2.4. The coefficient cνλµ(a, b) is zero unless µ ⊆ ν. If µ ⊆ ν then
cνλµ(a, b) =
T (α) unbarred
aT (α)−ρ(α)
T (α)
− bT (α)−c(α)
, (2.4)
summed over all sequences R of the form (2.1) and all reverse λ-tableaux T ∈ T (λ,R).
Proof. This is essentially a reformulation of the main result of [17] (Theorem 3.1).
Note that the summation in (2.4) is taken over all barred tableaux T ∈ T (λ,R) (not
just over the ν-bounded ones as in (2.2)). Rather than repeating the whole argument
of [17], we only sketch the main steps of the proof and indicate the necessary changes
to be made. We refer the reader to [17] for the details.
We assume that all diagrams here have at most n rows. If ρ = (ρ1, . . . , ρn) is a
such diagram, we set
aρ = (a1−ρ1 , . . . , an−ρn) and |aρ| = a1−ρ1 + · · ·+ an−ρn.
Under the correspondence (1.10) we have aρ = uρ = (uρ1+n, . . . , uρn+1), the latter
notation was used in [17].
The starting point is the Vanishing Theorem of [18] whose proof was also repro-
duced in [17]. By that theorem,
sλ(aρ ||a) = 0 unless λ ⊆ ρ,
sλ(aλ ||a) =
(i,j)∈λ
ai−λi − aλ′j−j+1
The first claim of the lemma follows from the Vanishing Theorem which also implies
λµ(a, b) = sλ(aµ ||b).
This proves (2.4) for the case ν = µ. Now we suppose that |ν| − |µ| > 1 and proceed
by induction on |ν| − |µ|. The induction step is based on the recurrence relation
cνλµ(a, b) =
|aν | − |aµ|
cνλµ+(a, b)−
λµ (a, b)
(2.5)
which was proved in [17, Proposition 3.4]; see also [9]. Suppose that the diagram ν
is obtained from µ by adding one box in row r. Then
cνλµ(a, b) =
sλ(aν ||b)− sλ(aµ ||b)
(aν)r − (aµ)r
. (2.6)
Now use the definition (1.9) of the double Schur polynomials. Since the n-tuples aν
and aµ only differ at the r-th component, the ratio on the right hand side of (2.6)
can be expanded by taking into account the entries r of the reverse λ-tableaux T .
We need the following formula, where we are thinking of y = (aν)r, z = (aµ)r and
mi = bT (α)−c(α) as α runs over the boxes of T with T (α) = r in column order:
i=1(y −mi)−
i=1(z −mi)
y − z
(z −m1) . . . (z −mj−1)(y −mj+1) . . . (y −mk).
The right hand side of (2.6) can now be interpreted as the right hand side of (2.4),
where R is the only sequence µ → ν and the sum is taken over the reverse λ-tableaux
T with one barred entry r, as illustrated:
µ r ν
Here ρ(α) = µ for all boxes α preceding the box occupied by the barred r, and
ρ(α) = ν for all boxes α which follow that box in column order. Note that the
variables y and z are now swapped on the right hand side of the above expansion,
as compared to [17] (this does not change the polynomial due to the symmetry in y
and z). Consequently, the column order used in [17] is the opposite to the order on
the boxes of λ we use here.
We can represent the above calculation of cνλµ(a) by the “diagrammatic” relation
|aν | − |aµ|
µ r ν = ν − µ
Consider now the next case where |ν| − |µ| = 2 and apply the recurrence relation
(2.5). We have three subcases: the diagram ν is obtained from µ by adding two boxes
in different rows and columns; by adding two boxes in the same row; or by adding
two boxes in the same column. The first two subcases are dealt with in a way similar
to the case |ν|− |µ| = 1. An additional care is needed for the third subcase where we
suppose that ν is obtained from µ by adding the boxes in rows r and r + 1. Denote
by ρ the diagram obtained from µ by adding the box in row r. Then (2.5) gives
cνλµ(a, b) =
cνλρ(a, b)− c
λµ(a, b)
|aν | − |aµ|
Set s = r+1. Exactly as in the case |ν|−|µ| = 1, we have the following diagrammatic
relations:
|aρ| − |aµ|
sρ ν = ρ s ν − µ s ν
|aν | − |aρ|
sρ ν = µ r ν − µ r ρ
Hence, the desired formula for cνλµ(a, b) will follow if we prove the relation
µ r ν = µ s ν
We construct a weight-preserving bijection between the barred reverse λ-tableaux
which are represented by the left and right hand sides of this diagrammatic relation.
Here the weight is the product on the right hand side of (2.4) corresponding to a
barred tableau. Let such a tableau with a barred entry r in the box (i, j) be given.
Suppose first that the box (i − 1, j) belongs to the diagram and it is occupied by
s = r+1. Then the image of the tableau under the map is the same tableau but the
entry T (i, j) = r is now unbarred while T (i− 1, j) = r + 1 is barred. Since
(aν)r+1 = (aµ)r and T (i− 1, j)− c(i− 1, j) = T (i, j)− c(i, j),
the weights of the tableaux are preserved under the map.
Suppose now that the entry in the box (i − 1, j) is greater than r + 1, or this
box is outside the diagram. Consider all entries r in the row i to the left of the box
(i, j) and suppose that they occupy the boxes (i, j −m), (i, j −m+1), . . . , (i, j− 1).
Then the image of the tableau under the map is the tableau obtained by replacing
the entries in each of the boxes (i, j −m), . . . , (i, j) with s = r + 1 and barring the
entry in the box (i, j −m). The weights of the tableaux are again preserved.
The inverse map is described in a similar way. This gives the desired weight-
preserving bijection. The general argument uses similar calculations with the barred
diagrams and a similar bijection described in [17].
Remark 2.5. (i) A cohomological interpretation of the coefficients cνλµ(a, b) and their
puzzle computation can be found in [9].
(ii) The definition (2.3) of the coefficients cνλµ(a, b) can be extended to the case
where λ is a skew diagram. Lemma 2.4 and its proof remain valid; see [17].
(iii) In contrast with the Littlewood–Richardson polynomials cνλµ(a), the coeffi-
cients cνλµ(a, b) do not have the stability property as they depend on n.
Lemma 2.4 implies that the Littlewood–Richardson polynomials can be calcu-
lated by (2.4) with b = a, that is, cνλµ(a) = c
λµ(a, a). Our strategy now is to show
that (unlike the formula of Theorem 3.1 in [17]), the formula (2.4) (with b = a) is
“nonnegative” in the sense that all nonzero products which occur in the formula are
polynomials in the ai − aj with i < j. Then we demonstrate that the ν-boundness
condition serves to eliminate the unwanted zero terms.
Lemma 2.6. Let R be a sequence of the form (2.1) and let T ∈ T (λ,R). Suppose
that ∏
T (α) unbarred
aT (α)−ρ(α)
T (α)
− aT (α)−c(α)
6= 0. (2.7)
Then ρ(α)
T (α)
> c(α) for all α ∈ λ with unbarred T (α).
Proof. Suppose on the contrary that there exists a box α = (i, j) with an unbarred
T (i, j) and the condition ρ(i, j)T (i,j) < j − i; the equality ρ(i, j)T (i,j) = j − i is
excluded since this would violate (2.7). Choose such a box with the minimum possible
value of j. If all the entries T (i, 1), . . . , T (i, j − 1) of T are barred then ρ(i, j) is
obtained from µ by adding boxes in rows T (i, 1) > · · · > T (i, j − 1) and, possibly,
by adding other boxes. Since T (i, j − 1) > T (i, j), we have ρ(i, j)T (i,j) > j − 1, a
contradiction. So, at least one of the entries T (i, 1), . . . , T (i, j−1) must be unbarred.
Take such an unbarred entry T (i, k) which is the closest to T (i, j), that is, all entries
T (i, k+1), . . . , T (i, j − 1) are barred. Then ρ(i, j) is obtained from ρ(i, k) by adding
boxes in rows T (i, k + 1) > · · · > T (i, j − 1) and, possibly, by adding other boxes.
Hence,
ρ(i, j)T (i,j) > ρ(i, k)T (i,k) + j − k − 1
which implies ρ(i, k)T (i,k) < k − i + 1. However, if ρ(i, k)T (i,k) = k − i then the
factor in (2.7) corresponding to α = (i, k) is zero, which is impossible. Therefore
ρ(i, k)T (i,k) < k − i which contradicts the choice of j.
Lemma 2.7. Suppose that R is a sequence of the form (2.1) and T ∈ T (λ,R). If
(2.7) holds then T is ν-bounded.
Proof. By Lemma 2.6, for all unbarred entries T (1, k) of the first row of the tableau
T we have ρ(1, k)T (1,k) > k. This implies νT (1,k) > k. If the entry T (1, j) is barred
then ρ(1, k)T (1,k) > k for the nearest unbarred entry T (1, k) on its left (if it exists).
Then ν is obtained from ρ(1, k) by adding boxes in rows T (1, k + 1) > · · · > T (1, j)
and, possibly, by adding other boxes. This implies νT (1,j) > j. Thus, this inequality
holds for all j = 1, . . . , λ1. This is equivalent to the ν-boundness of T .
Lemma 2.8. Suppose that R is a sequence of the form (2.1) and T ∈ T (λ,R) is
ν-bounded. Then ρ(α)
T (α)
> c(α) for all α ∈ λ with unbarred T (α).
Proof. We argue by contradiction. Taking into account Lemma 2.6, we find that for
some α = (i, j) with unbarred T (α) we have ρ(i, j)T (i,j) = j − i. Set t = T (i, j)
and consider all barred entries of T (assuming for now they exist) which are equal
to t and occur to the right of the column j. Since T is a reverse tableau, these
entries t̄ can only occur in rows 1, 2, . . . , i. Let (r, k) be the box with the maximum
column number k containing t̄. Then the total number of such entries t̄ does not
exceed k− j. This implies that the number of boxes νt in row t of ν does not exceed
ρ(i, j)t + k − j = k − i. Hence, ν
k 6 t − 1. On the other hand, by the ν-boundness
of T we have t = T (r, k) 6 T (1, k) 6 ν ′k, a contradiction.
If none of the boxes to the right of the column j contains t̄ then νt = ρ(i, j)t = j−i.
However, by the assumption, νt > νT (1,j) > j, a contradiction.
This completes the proof of the theorem.
By the column word of a tableau T we will mean the sequence of all entries of T
written in the column order.
Corollary 2.9. Suppose that |ν| = |λ| + |µ|. The Littlewood–Richardson coefficient
cνλµ equals the number of ν-bounded reverse λ-tableaux T whose column word coincides
with the Yamanouchi symbol of a certain sequence R of the form (2.1).
This can be shown to be equivalent to a well-known version of the Littlewood–
Richardson rule. Corollary 2.9 also holds with the ν-boundness condition dropped;
see Lemma 2.7. By the corollary, cνλµ counts the cardinality of the intersection of two
finite sets: the set of column words of ν-bounded reverse λ-tableaux and the set of
Yamanouchi symbols of the sequences of the form (2.1).
Remark 2.10. Due to (1.10), the multiplication rule for the polynomials sλ(x|u) is
obtained from Theorem 2.1 by replacing ai with un−i+1 for each i. The corresponding
coefficients are polynomials in the ui−uj, i > j, with positive integer coefficients.
Corollary 2.11. Suppose that the polynomials cνλµ(a) are defined by the expansion
(1.3) with x = (x1, . . . , xn). Then c
λµ(a) is independent of n as soon as n > ν
Moreover, if n < ν ′1 then c
λµ(a) = 0.
Proof. This follows from the boundness condition on the reverse tableaux.
3 Applications
3.1 Equivariant Schubert calculus on the Grassmannian
As in the Introduction, consider the equivariant cohomology ring H∗T (Gr(n,N)) as
a module over Z[t1, . . . , tN ]. Let x1, . . . , xn denote the Chern roots of the dual S
of the tautological subbundle S of the trivial bundle CNGr(n,N) so that for the total
equivariant Chern class of S we have
cT (S) =
(1− xi).
Then, due to [6, Lecture 8, Proposition 1.1] (see also [16]), the equivariant Schubert
classes σλ can be expressed by
σλ = sλ(x|u), u = (−tN , . . . ,−t1, 0, . . . ).
Hence, Theorem 2.1 yields a multiplication rule for the equivariant Schubert classes.
The corresponding stability property is implied by Corollary 2.11.
Corollary 3.1. We have
σλ σµ =
d νλµ σν ,
where
d νλµ =
T (α) unbarred
tm+T (α)−c(α) − tm+T (α)−ρ(α)
T (α)
, (3.1)
summed over all sequences R of the form (2.1) and all ν-bounded reverse λ-tableaux
T ∈ T (λ,R). In particular, the d νλµ are polynomials in the ti− tj, i > j, with positive
integer coefficients. Moreover, the coefficients d νλµ, regarded as polynomials in the
variables ai defined in (1.5), are independent of n and m, as soon as the inequalities
n > λ′1 + µ
1 and m > λ1 + µ1 hold.
Example 3.2. For any n > 3 and m > 4 we have
σ(2) σ(2,1) = σ(4,1) + σ(3,2) + σ(3,1,1) + σ(2,2,1)
+ (tm+2 − tm−1 + tm − tm−2) σ(3,1) + (tm+2 − tm−1) σ(2,2)
+ (tm − tm−1) σ(2,1,1) + (tm+2 − tm−1) (tm − tm−1) σ(2,1).
This follows from Example 2.2.
The first manifestly positive rule for the expansion of σλ σµ was given by Knutson
and Tao [9] by using combinatorics of puzzles. Although the stability property was not
pointed out in [9], it can be deduced directly from the puzzle rule or by applying the
weight-preserving bijection between the puzzles and the barred tableaux constructed
by Kreiman [10].
3.2 Quantum immanants and higher Capelli operators
Let gln denote the general linear Lie algebra over C. Consider the center Z(gln) of
the universal enveloping algebra U(gln). The algebra U(gln) is equipped with the
natural filtration. For all n we identify gln−1 as a subalgebra of gln in a usual way
and denote by gl∞ the corresponding inductive limit
gl∞ =
Due to Olshanski [20], there exist filtration-preserving homomorphisms
on : Z(gln) → Z(gln−1), n > 1, (3.2)
which allow one to define the algebra Z of the virtual Casimir elements for the Lie
algebra gl∞ as the inverse limit
Z = lim
Z(gln), n → ∞,
in the category of filtered algebras.
The quantum immanants Sλ|n are elements of the center Z(gln) of the universal
enveloping algebra U(gln) parameterized by the diagrams λ with at most n rows;
see [18]. The elements Sλ|n form a basis of Z(gln) and they are consistent with the
Olshanski homomorphisms (3.2) so that
on : Sλ|n 7→ Sλ|n−1, (3.3)
where we assume Sλ|n = 0 if the number of rows of λ exceeds n. For any diagram λ,
the corresponding virtual quantum immanant Sλ is then defined as the sequence
Sλ = ( Sλ|n |n > 0).
The elements Sλ parameterized by all diagrams λ form a basis of the algebra Z so
that we can define the coefficients f νλµ by the expansion
Sλ Sµ =
f νλµ Sν .
Note that the same coefficients f νλµ determine the multiplication rule for the higher
Capelli operators ∆λ, which are defined as the sequences of the images of the quantum
immanants Sλ|n, where each image is taken under a natural representation of gln by
differential operators; see [18, 19].
Corollary 3.3. The coefficient f νλµ is zero unless µ ⊆ ν. If µ ⊆ ν then
f νλµ =
T (α) unbarred
ρ(α)T (α) − c(α)
, (3.4)
summed over all sequences R of the form (2.1) and all ν-bounded reverse λ-tableaux
T ∈ T (λ,R). In particular, the f νλµ are nonnegative integers.
Proof. Due to the stability property (3.3) of the quantum immanants, it suffices to
calculate the corresponding coefficients for the expansion of the products Sλ|n Sµ|n.
The images of the quantum immanants Sλ|n under the Harish-Chandra isomorphism
can be identified with the double Schur polynomials sλ(x||a) where the sequence a is
specialized to ai = −i; see [18]. Therefore, the coefficients in question coincide with
the corresponding specializations of the Littlewood–Richardson polynomials cνλµ(a).
Example 3.4. Using Example 2.2 we get
S(2) S(2,1) = S(4,1) + S(3,2) + S(3,1,1) + S(2,2,1) + 5 S(3,1) + 3 S(2,2) + S(2,1,1) + 3 S(2,1).
In the course of the proof of Corollary 3.3 we also calculated the coefficients
for the expansion of the products Sλ|n Sµ|n for any n. Some other formulas for these
coefficients were obtained in [17]. In particular, it was shown that the f νλµ are integers,
although their positivity property was not established there.
Note also that the algebra of virtual Casimir elements Z is isomorphic to the
algebra of shifted symmetric functions Λ∗; see [19]. The latter can be regarded as the
specialization of Λ (or rather, its extension over C) at ai = −i for all i ∈ Z.
4 Supertableau formulas for sλ(x||a) and c
λµ(a)
Here we obtain one more rule for the calculation of the Littlewood–Richardson poly-
nomials cνλµ(a). It relies on a supertableau representation of the double Schur poly-
nomials sλ(x||a) which is implied by the results of [13]. This representation provides
a “finite” version of the supertableau formulas of [7] and [13]; cf. [4].
Fix a positive integer n. For r > 1 set u(r) = (u1, . . . , ur) and use the 9th Variation
in [13] with the indeterminates hrs specialized by
hrs = hr(u
(n−r−s+1)) if r + s 6 n,
and 0 otherwise, where hr denotes the r-th complete symmetric polynomial. Let us
write ŝλ/µ(u) for the corresponding Schur functions. Then (8.2) and (9.1) in [13] give
ŝλ/µ(u) =
α∈λ/µ
uT (α),
summed over semistandard tableaux T of shape λ/µ, such that the entries of the i-th
row do not exceed n− λi + i. Furthermore, using (6.18)
1 and (9.6 ′) in [13] we get
sλ(x|u) =
sµ(x) ŝλ′/µ′(−u). (4.5)
Equivalently, this can be interpreted as a combinatorial expression for the polynomials
sλ(x|u) in terms of “supertableaux”. Identify the indices of u with the symbols
1′, 2′, . . . . A supertableau T is obtained by filling in the diagram of λ with the indices
1, . . . , n, 1′, 2′, . . . in such a way that in each row (resp. column) each primed index is
to the right (resp. below) of each unprimed index; unprimed indices weakly increase
along the rows and strictly increase down the columns; primed indices strictly increase
along the rows and weakly increase down the columns; primed indices in column j
do not exceed n− λ′j + j. Relation (4.5) implies the following.
Proposition 4.1. We have
sλ(x|u) =
T (α) unprimed
xT (α)
T (α) primed
(−uT (α)), (4.6)
summed over all λ-supertableaux T .
Using (1.10), we get an analogous representation for the double Schur polynomials
sλ(x||a). A reverse supertableau T is obtained by filling in the diagram of λ with
1This formula in [13] should be corrected by replacing a(λj+n−j) with a(λi+n−i).
the indices 1, . . . , n, n′, (n − 1)′, . . . (including non-positive primed indices) in such
a way that in each row (resp. column) each primed index is to the right (resp.
below) of each unprimed index; unprimed indices weakly decrease along the rows and
strictly decrease down the columns; primed indices strictly decrease along the rows
and weakly decrease down the columns; primed indices in column j are not less than
λ′j − j + 1. The following supertableau representation of the polynomials sλ(x||a)
follows from Proposition 4.1.
Corollary 4.2. We have
sλ(x||a) =
T (α) unprimed
xT (α)
T (α) primed
(−aT (α)), (4.7)
summed over all reverse λ-supertableaux T .
Example 4.3. Let n = 2 and λ = (2, 1). By the definition (1.9),
s(2,1)(x||a) = (x2 − a2)(x1 − a0)(x1 − a2) + (x2 − a2)(x2 − a1)(x1 − a2).
On the other hand, the reverse (2, 1)-supertableaux are
2 0 ′
2 1 ′
2 2 ′
2 0 ′
2 1 ′
2 2 ′
1 0 ′
1 1 ′
1 2 ′
2 ′ 0 ′
2 ′ 1 ′
which yield
s(2,1)(x||a) = x
1x2 + x1x
2 − x1x2a2 − x
2a2 − x
1a2 − x1x2a0 − x1x2a1 − x1x2a2
+ x2a0a2 + x2a1a2 + x2a
2 + x1a0a2 + x1a1a2 + x1a
2 − a0a
2 − a1a
Formula (4.5) implies a supertableau representation of the coefficients cνλµ(a, b)
and hence, of the Littlewood–Richardson polynomials cνλµ(a). The representation
for the latter is neither manifestly positive, nor stable; it provides an expression for
cνλµ(a) as an alternating sum of monomials in the ai. Given a sequence R of the form
(2.1), construct the set S(λ,R) of barred reverse λ-supertableaux by analogy with
T (λ,R). A tableau T ∈ S(λ,R) must contain boxes α1, . . . , αl occupied by unprimed
indices r1, r2, . . . , rl listed in the column order which is restricted to the subtableau of
T formed by the unprimed indices. As before, we distinguish the entries in α1, . . . , αl
by barring each of them. For each box α with αi ≺ α ≺ αi+1, 0 6 i 6 l, which is
occupied by an unprimed index, set ρ(α) = ρ(i).
Corollary 4.4. The coefficients cνλµ(a, b) defined in (2.3) can be given by
cνλµ(a, b) =
T (α) unprimed, unbarred
aT (α)−ρ(α)
T (α)
T (α) primed
(−bT (α)), (4.8)
summed over sequences R of the form (2.1) and reverse supertableaux T ∈ S(λ,R).
Proof. Applying formula (4.5) we can reduce the calculation of cνλµ(a, b) to the par-
ticular case of the sequence b = (0). Now (4.8) follows from Lemma 2.4.
Example 4.5. In order to calculate the Littlewood–Richardson polynomial c
(2,1)
(2,1) (2)
we may take n = 2; see Corollary 2.11. The barred reverse supertableaux compatible
with the sequence (2) → (2, 1) are
2 0 ′
2 1 ′
2 2 ′
2 0 ′
2 1 ′
2 2 ′
so that
(2,1)
(2,1) (2)
(a) = a2−1 + a−1a1 + a−1a2 − a−1a2 − a1a2 − a
− a−1a0 − a−1a1 − a−1a2 + a0a2 + a1a2 + a
= a2−1 − a−1a0 − a−1a2 + a0a2,
which agrees with Example 2.2.
References
[1] L. C. Biedenharn and J. D. Louck, A new class of symmetric polynomials defined
in terms of tableaux, Advances in Appl. Math. 10 (1989), 396–438.
[2] L. C. Biedenharn and J. D. Louck, Inhomogeneous basis set of symmetric poly-
nomials defined by tableaux, Proc. Nat. Acad. Sci. U.S.A. 87 (1990), 1441–1445.
[3] A. S. Buch and R. Rimányi, Specializations of Grothendieck polynomials, C. R.
Acad. Sci. Paris, Ser. I 339 (2004), 1–4.
[4] W. Y. C. Chen, B. Li and J. D. Louck, The flagged double Schur function, J.
Alg. Comb. 15 (2002), 7-26.
[5] W. Fulton, Young tableaux. With applications to representation theory and ge-
ometry. London Mathematical Society Student Texts, 35. Cambridge University
Press, Cambridge, 1997.
[6] W. Fulton, Equivariant cohomology in algebraic geometry, Eilenberg lectures,
Columbia University, Spring 2007. Available at
http://www.math.lsa.umich.edu/∼dandersn/eilenberg
[7] I. Goulden and C. Greene, A new tableau representation for supersymmetric
Schur functions, J. Algebra. 170 (1994), 687–703.
[8] W. Graham, Positivity in equivariant Schubert calculus, Duke Math. J. 109
(2001), 599–614.
[9] A. Knutson and T. Tao, Puzzles and (equivariant) cohomology of Grassmanni-
ans, Duke Math. J. 119 (2003), 221–260.
[10] V. Kreiman, Equivariant Littlewood-Richardson tableaux, preprint
arXiv:0706.3738.
[11] A. Lascoux, Interpolation, Lectures at Tianjin University, June 1996. Available
at http://www-igm.univ-mlv.fr/∼al/pub−engl.html
[12] D. E. Littlewood and A. R. Richardson, Group characters and algebra, Philos.
Trans. Roy. Soc. London Ser. A 233 (1934), 49–141.
[13] I. G. Macdonald, Schur functions: theme and variations, in “Actes 28-e
Séminaire Lotharingien”, pp. 5–39. Publ. I.R.M.A. Strasbourg, 1992, 498/S–27.
[14] I. G. Macdonald, Symmetric Functions and Hall Polynomials, Oxford University
Press, Oxford, 1995.
[15] L. Manivel, Symmetric functions, Schubert polynomials and degeneracy loci,
SMF/AMS Texts and Monographs, Vol. 6, 1998.
[16] L. C. Mihalcea, Giambelli formulae for the equivariant quantum cohomology of
the Grassmannian, preprint math.CO/0506335.
[17] A. I. Molev and B. E. Sagan, A Littlewood-Richardson rule for factorial Schur
functions, Trans. Amer. Math. Soc, 351 (1999), 4429–4443.
[18] A. Okounkov, Quantum immanants and higher Capelli identities, Transform.
Groups 1 (1996), 99–126.
http://www.math.lsa.umich.edu/~dandersn/eilenberg
http://arxiv.org/abs/0706.3738
http://www-igm.univ-mlv.fr/~al/pub$_-$engl.html
http://arxiv.org/abs/math/0506335
[19] A. Okounkov and G. Olshanski, Shifted Schur functions, St. Petersburg Math.
J. 9 (1998), 239–300.
[20] G. I. Olshanski, Representations of infinite-dimensional classical groups, limits
of enveloping algebras, and Yangians, in “Topics in Representation Theory”,
Advances in Soviet Math. 2, Amer. Math. Soc., Providence RI, 1991, pp. 1–66.
[21] B. E. Sagan, The symmetric group. Representations, combinatorial algorithms,
and symmetric functions, 2nd edition, Grad. Texts in Math., 203, Springer-
Verlag, New York, 2001.
	Introduction
	Multiplication rule
	Applications
	Equivariant Schubert calculus on the Grassmannian
	Quantum immanants and higher Capelli operators
	Supertableau formulas for s(x || a) and c(a)
ABSTRACT
  We introduce a family of rings of symmetric functions depending on an
infinite sequence of parameters. A distinguished basis of such a ring is
comprised by analogues of the Schur functions. The corresponding structure
coefficients are polynomials in the parameters which we call the
Littlewood-Richardson polynomials. We give a combinatorial rule for their
calculation by modifying an earlier result of B. Sagan and the author. The new
rule provides a formula for these polynomials which is manifestly positive in
the sense of W. Graham. We apply this formula for the calculation of the
product of equivariant Schubert classes on Grassmannians which implies a
stability property of the structure coefficients. The first manifestly positive
formula for such an expansion was given by A. Knutson and T. Tao by using
combinatorics of puzzles while the stability property was not apparent from
that formula. We also use the Littlewood-Richardson polynomials to describe the
multiplication rule in the algebra of the Casimir elements for the general
linear Lie algebra in the basis of the quantum immanants constructed by A.
Okounkov and G. Olshanski.

<|endoftext|><|startoftext|>
Introduction 1
2 The momentum picture 3
3 Lagrangians, Euler-Lagrange equations and dynamical variables 5
4 On the uniqueness of the dynamical variables 10
5 Heisenberg relations 14
6 Types of possible commutation relations 20
6.1 Restrictions related to the momentum operator . . . . . . . . . . . . . . . . . 21
6.2 Restrictions related to the charge operator . . . . . . . . . . . . . . . . . . . . 25
6.3 Restrictions related to the angular momentum operator(s) . . . . . . . . . . . 27
7 Inferences 31
8 State vectors, vacuum and mean values 37
9 Commutation relations for several coexisting different free fields 43
9.1 Commutation relations connected with the momentum operator. Problems
and their possible solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
9.2 Commutation relations connected with the charge and angular momentum
operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
9.3 Commutation relations between the dynamical variables . . . . . . . . . . . . 50
9.4 Commutation relations under the uniqueness conditions . . . . . . . . . . . . 52
10 Conclusion 54
References 55
This article ends at page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Abstract
Possible (algebraic) commutation relations in the Lagrangian quantum theory of free
(scalar, spinor and vector) fields are considered from mathematical view-point. As sources of
these relations are employed the Heisenberg equations/relations for the dynamical variables
and a specific condition for uniqueness of the operators of the dynamical variables (with
respect to some class of Lagrangians). The paracommutation relations or some their gen-
eralizations are pointed as the most general ones that entail the validity of all Heisenberg
equations. The simultaneous fulfillment of the Heisenberg equations and the uniqueness re-
quirement turn to be impossible. This problem is solved via a redefinition of the dynamical
variables, similar to the normal ordering procedure and containing it as a special case. That
implies corresponding changes in the admissible commutation relations. The introduction of
the concept of the vacuum makes narrow the class of the possible commutation relations;
in particular, the mentioned redefinition of the dynamical variables is reduced to normal
ordering. As a last restriction on that class is imposed the requirement for existing of an
effective procedure for calculating vacuum mean values. The standard bilinear commutation
relations are pointed as the only known ones that satisfy all of the mentioned conditions and
do not contradict to the existing data.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 1
1. Introduction
The main subject of this paper is an analysis of possible (algebraic) commutation relations
in the Lagrangian quantum theory1 of free fields. These relations are considered only from
mathematical view-point and physical consequence of them, like the statistics of many-par-
ticle systems, are not investigated.
The canonical quantization method finds its origin in the classical Hamiltonian mechan-
ics [9, 10] and naturally leads to the canonical (anti)commutation relations [3, 11,12]. These
relations can be obtained from different assumptions (see, e.g., [1,13–15]) and are one of the
basic corner stones of the present-day quantum field theory.
Theoretically there are possible also non-canonical commutation relations. The best
known example of them being the so-called paracommutation relations [16–18]. But, however,
it seems no one of the presently known particles/fields obeys them.
In the present work is shown how different classes of commutation relations, understood
in a broad sense as algebraic connections between creation and/or annihilation operators,
arise from the Lagrangian formalism, when applied to three types of Lagrangians describing
free scalar, spinor and vector fields. Their origin is twofold. One one hand, a requirement for
uniqueness of the dynamical variables (that can be calculated from Lagrangians leading to
identical Euler-Lagrange equation) entails a number of specific commutation relations. On
another hand, any one of the so-called Heisenberg relations/equations [3, 11], implies cor-
responding commutation relations; for example, the paracommutation relations arise from
the Heisenberg equations regarding the momentum operator, when ‘charge symmetric’ La-
grangian is employed.2 The combination of the both methods leads to strong, generally
incompatible, restrictions on the admissible types of commutation relations.
The introduction of the concept of vacuum, combined with the mentioned uniqueness of
the operators of the dynamical variables, changes the situation and requires a redefinition
of these operators in a way similar to the one known as the normal ordering [1, 3, 11, 12],
which is its special case. Some natural assumptions reduce the former to the letter one; in
particular, in that way are excluded the paracommutation relations. However, this does not
reduce the possible commutation relations to the canonical ones. Further, the requirement
to be available an effective procedure for calculating vacuum mean (expectation) values, to
which reduce all predictable results in the theory, puts new restriction, whose only realistic
solution at the time being seems to be the standard canonical (anti)commutation relations.
The layout of the work is as follows.
Sect. 2 gives an idea of the momentum picture of motion and discusses the relations
between the creation and annihilation operators in it and in Heisenberg picture. In Sect. 3
are reviewed some basic results from [13–15], part of which can be found also in papers like [1,
3, 11, 12]. In particular, the explicit expression of the dynamical variables via the creation
1 In this paper we considered only the Lagrangian (canonical) quantum field theory in which the quantum
fields are represented as operators, called field operators, acting on some Hilbert space, which in general
is unknown if interacting fields are studied. These operators are supposed to satisfy some equations of
motion, from them are constructed conserved quantities satisfying conservation laws, etc. From the view-point
of present-day quantum field theory, this approach is only a preliminary stage for more or less rigorous
formulation of the theory in which the fields are represented via operator-valued distributions, a fact required
even for description of free fields. Moreover, in non-perturbative directions, like constructive and conformal
field theories, the main objects are the vacuum mean (expectation) values of the fields and from these are
reconstructed the Hilbert space of states and the acting on it fields. Regardless of these facts, the Lagrangian
(canonical) quantum field theory is an inherent component of the most of the ways of presentation of quantum
field theory adopted explicitly or implicitly in books like [1–8]. Besides, the Lagrangian approach is a source
of many ideas for other directions of research, like the axiomatic quantum field theory [3,7,8].
2 Ordinary [3,11], the commutation relations are postulated and the validity of the Heisenberg relations is
then verified. We follow the opposite method by postulating the Heisenberg equations and, then, looking for
commutation relations that are compatible with them.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 2
and annihilation operators are presented (without assuming some commutation relations or
normal ordering) and it is pointed to the existence of a family of such variables for a given
system of Euler-Lagrange equations for free fields. The last fact is analyzed in Sect. 4,
where a number of its consequences, having a sense of commutation relations, are drawn.
The Heisenberg relations and the commutation relations between the dynamical variables
are reviewed and analyzed in Sect. 5. It is pointed that the letter should be consequences
from the former ones. Arguments are presented that the Heisenberg equation concerning
the angular momentum operator should be split into two independent ones, representing its
‘orbital’ and ‘spin’ parts, respectively.
Sect. 6 contains a method for assigning commutation relations to the Heisenberg equa-
tions. It is shown that the Heisenberg equation involving the ‘orbital’ part of the angular
momentum gives rise to a differential, not algebraic, commutation relation and the one con-
cerning the ‘spin’ part of the angular momentum implies a complicated integro-differential
connections between the creation and annihilation operators. Special attention is paid to
the paracommutation relations, whose particular kind are the ordinary ones, which ensure
the validity of the Heisenberg equations concerning the momentum operator. Partially is
analyzed the problem for compatibility of the different types of commutation relations de-
rived. It is proved that some generalization of the paracommutation relations ensures the
fulfillment of all of the Heisenberg relations.
Sect. 7 is devoted to consequences from the commutation relations derived in Sect. 6
under the conditions for uniqueness of the dynamical variables presented in Sect. 4. Gen-
erally, these requirements are incompatible with the commutation relations. To overcome
the problem, it is proposed a redefinition of the dynamical variables via a method similar to
(and generalizing) the normal ordering. This, of course, entails changes in the commutation
relations, the new versions of which happen to be compatible with the uniqueness conditions
and ensure the validity of the Heisenberg relations.
The concept of the vacuum is introduced in Sect. 8. It reduces (practically) the redefini-
tion of the operators of the dynamical variables to the one obtained via the normal ordering
procedure in the ordinary quantum field theory, but, without additional suppositions, does
not reduce the commutation relations to the standard bilinear ones. As a last step in specify-
ing the commutation relations as much as possible, we introduce the requirement the theory
to supply an effective way for calculating vacuum mean values of (anti-normally ordered)
products of creation and annihilation operators to which are reduced all predictable results,
in particular the mean values of the dynamical variables. The standard bilinear commutation
relation seems to be the only ones know at present that survive that last condition, however
their uniqueness in this respect is not investigated.
Sect. 9 deals with the same problems as described above but for systems containing at
least two different quantum fields. The main obstacle is the establishment of commutation
relations between creation/annihilation operators concerning different fields. Argument is
presented that they should contain commutators or anticommutators of these operators.
The major of corresponding commutation relations are explicitly written and the results
obtained turn to be similar to the ones just described, only in ‘multifield’ version.
Section 10 closes the paper by summarizing its main results.
The books [1–3] will be used as standard reference works on quantum field theory. Of
course, this is more or less a random selection between the great number of (text)books and
papers on the theme to which the reader is referred for more details or other points of view.
For this end, e.g., [4, 12,19] or the literature cited in [1–4,12,19] may be helpful.
Throughout this paper ~ denotes the Planck’s constant (divided by 2π), c is the velocity of
light in vacuum, and i stands for the imaginary unit. The superscripts † and ⊤ mean respec-
tively Hermitian conjugation and transposition (of operators or matrices), the superscript ∗
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 3
denotes complex conjugation, and the symbol ◦ denotes compositions of mappings/operators.
By δfg, or δ
f or δ
fg (:= 1 for f = g, := 0 for f = g) is denoted the Kronecker δ-symbol,
depending on arguments f and g, and δn(y), y ∈ Rn, stands for the n-dimensional Dirac
δ-function; δ(y) := δ1(y) for y ∈ R.
The Minkowski spacetime is denoted by M . The Greek indices run from 0 to dimM −1 =
3. All Greek indices will be raised and lowered by means of the standard 4-dimensional
Lorentz metric tensor ηµν and its inverse ηµν with signature (+ − −−). The Latin indices
a, b, . . . run from 1 to dimM − 1 = 3 and, usually, label the spacial components of some
object. The Einstein’s summation convention over indices repeated on different levels is
assumed over the whole range of their values.
At last, we ought to give an explanation why this work appears under the general title
“Lagrangian quantum field theory in momentum picture” when in it all considerations are
done, in fact, in Heisenberg picture with possible, but not necessary, usage of the creation and
annihilation operators in momentum picture. First of all, we essentially employ the obtained
in [13–15] expressions for the dynamical variables in momentum picture for three types of
Lagrangians. The corresponding operators in Heisenberg picture, which in fact is used in this
paper, can be obtained via a direct calculation, as it is partially done in, e.g., [1] for one of
the mentioned types of Lagrangians. The important point here is that in Heisenberg picture
it suffice to be used only the standard Lagrangian formalism, while in momentum picture one
has to suppose the commutativity between the components of the momentum operator and
the validity of the Heisenberg relations for it (see below equations (2.6) and (2.7)). Since for
the analysis of the commutation relations we intend to do the fulfillment of these relations is
not necessary (they are subsidiary restrictions on the Lagrangian formalism), the Heisenberg
picture of motion is the natural one that has to be used. For this reason, the expression for the
dynamical variables obtained in [13–15] will be used simply as their Heisenberg counterparts,
but expressed via the creation and annihilation operators in momentum picture. The only
real advantage one gets in this way is the more natural structure of the orbital angular
momentum operator. As the commutation relations considered below are algebraic ones, it
is inessential in what picture of motion they are written or investigated.
2. The momentum picture
Since the momentum picture of motion will be used only partially in this work, below is
presented only its definition and the connection between the creation/annihilation operators
in it and in Heisenberg picture. Details concerning the momentum picture can be found
in [20,21] and in the corresponding sections devoted to it in [13–15].
Let us consider a system of quantum fields, represented in Heisenberg picture of motion
by field operators ϕ̃i(x) : F → F , i = 1, . . . , n ∈ N, acting on the system’s Hilbert space
F of states and depending on a point x in Minkowski spacetime M . Here and henceforth,
all quantities in Heisenberg picture will be marked by a tilde (wave) “˜” over their kernel
symbols. Let P̃µ denotes the system’s (canonical) momentum vectorial operator, defined via
the energy-momentum tensorial operator T̃ µν of the system, viz.
P̃µ :=
x0=const
T̃0µ(x) d3x. (2.1)
Since this operator is Hermitian, P̃†µ = P̃µ, the operator
U(x, x0) = exp
(xµ − xµ0 ) P̃µ
, (2.2)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 4
where x0 ∈ M is arbitrarily fixed and x ∈ M ,3 is unitary, i.e. U†(x0, x) := (U(x, x0))† =
U−1(x, x0) := (U(x, x0))−1 and, via the formulae
X̃ 7→ X (x) = U(x, x0)( X̃ ) (2.3)
Ã(x) 7→ A(x) = U(x, x0) ◦ ( Ã(x)) ◦ U−1(x, x0), (2.4)
realizes the transition to the momentum picture. Here X̃ is a state vector in system’s Hilbert
space of states F and Ã(x) : F → F is (observable or not) operator-valued function of x ∈ M
which, in particular, can be polynomial or convergent power series in the field operators
ϕ̃i(x); respectively X (x) and A(x) are the corresponding quantities in momentum picture.
In particular, the field operators transform as
ϕ̃i(x) 7→ ϕi(x) = U(x, x0) ◦ ϕ̃i(x) ◦ U−1(x, x0). (2.5)
Notice, in (2.2) the multiplier (xµ − xµ0 ) is regarded as a real parameter (in which P̃µ is
linear). Generally, X (x) and A(x) depend also on the point x0 and, to be quite correct, one
should write X (x, x0) and A(x, x0) for X (x) and A(x), respectively. However, in the most
situations in the present work, this dependence is not essential or, in fact, is not presented
at all. For that reason, we shall not indicate it explicitly.
The momentum picture is most suitable in quantum field theories in which the compo-
nents P̃µ of the momentum operator commute between themselves and satisfy the Heisenberg
relations/equations with the field operators, i.e. when P̃µ and ϕ̃i(x) satisfy the relations:
[ P̃µ, P̃ν ] = 0 (2.6)
[ ϕ̃i(x), P̃µ] = i~∂µ ϕ̃i(x). (2.7)
Here [A,B]± := A ◦ B ± B ◦ A, ◦ being the composition of mappings sign, is the commuta-
tor/anticommutator of operators (or matrices) A and B.
However, the fulfillment of the relations (2.6) and (2.7) will not be supposed in this paper
until Sect. 6 (see also Sect. 5).
Let a±s (k) and a
s (k) be the creation/annihilation operators of some free particular field
(see Sect. 3 below for a detailed explanation of the notation). We have the connections
ã±s (k) = e
xµkµ U−1(x, x0) ◦ a±s (k) ◦ U(x, x0)
ã†±s (k) = e
xµkµ U−1(x, x0) ◦ a†±s (k) ◦ U(x, x0)
m2c2 + k2 (2.8)
whose explicit form is
ã±s (k) = e
kµa±s (k)
ã†±s (k) = e
kµa†±s (k)
m2c2 + k2. (2.9)
Further it will be assumed ã±s (k) and ã
s (k) to be defined in Heisenberg picture, indepen-
dently of a±s (k) and a
s (k), by means of the standard Lagrangian formalism. What concerns
the operators a±s (k) and a
s (k), we shall regard them as defined via (2.9); this makes them
independent from the momentum picture of motion. The fact that the so-defined operators
a±s (k) and a
s (k) coincide with the creation/annihilation operators in momentum picture
(under the conditions (2.6) and (2.7)) will be inessential in the almost whole text.
3 The notation x0, for a fixed point in M , should not be confused with the zeroth covariant coordinate
µ of x which, following the convention xν := ηνµx
µ, is denoted by the same symbol x0. From the context,
it will always be clear whether x0 refers to a point in M or to the zeroth covariant coordinate of a point
x ∈ M .
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 5
3. Lagrangians, Euler-Lagrange equations
and dynamical variables
In [13–15] we have investigated the Lagrangian quantum field theory of respectively scalar,
spin 1
and vector free fields. The main Lagrangians from which it was derived are respectively
(see loc. cit. or, e.g. [1, 3, 11,12]):
L̃′sc = L̃′sc( ϕ̃, ϕ̃†) =−
1 + τ( ϕ̃)
m2c4 ϕ̃(x) ◦ ϕ̃†(x) + 1
1 + τ( ϕ̃)
c2~2(∂µ ϕ̃(x)) ◦ (∂µ ϕ̃†(x))
(3.1a)
L̃′sp = L̃′sp( ψ̃,
ψ) =− 1
i~c{ ˜̆ψ
(x)C−1γµ ◦ (∂µ ψ̃(x))
− (∂µ ˜̆ψ
(x))C−1γµ ◦ ψ̃(x)}+mc2 ˜̆ψ
(x)C−1 ◦ ψ̃(x)
(3.1b)
L̃′v = L̃′v( Ũ , Ũ†) =
1 + τ( Ũ)
Ũ†µ ◦ Ũ
1 + τ( Ũ)
−(∂µ Ũ
ν) ◦ (∂
µ Ũν) + (∂µ Ũ
) ◦ (∂ν Ũ
(3.1c)
Here it is used the following notation: ϕ̃(x) is a scalar field, a tilde (wave) over a symbol
means that it is in Heisenberg picture, the dagger † denotes Hermitian conjugation, ψ̃ :=
( ψ̃0, ψ̃1, ψ̃2, ψ̃3) is a 4-spinor field,
ψ := C ψ̃
:= C( ψ̃†γ0) is its charge conjugate with γµ
being the Dirac gamma matrices and the matrix C satisfies the equations C−1γµC = −γµ
and C⊤ = −C, Uµ is a vector field, m is the field’s mass (parameter) and the function
τ(A) :=
1 for A† = A (Hermitian operator)
0 for A† 6= A (non-Hermitian operator)
, (3.2)
with A : F → F being an operator on the systems Hilbert space F of states, takes care of
is the field charged (non-Hermitian) or neutral (Hermitian, uncharged). Since a spinor field
is a charged one, we have τ( ψ̃) = 0; sometimes below the number 0 = τ( ψ̃) will be written
explicitly for unification of the notation.
We have explored also the consequences from the ‘charge conjugate’ Lagrangians
L̃′′sc = L̃′′sc( ϕ̃, ϕ̃†) := L̃′sc( ϕ̃†, ϕ̃) (3.3a)
L̃′′sp = L̃′′sp( ψ̃,
ψ) := L̃′sp(
ψ, ψ̃) (3.3b)
L̃′′v = L̃′′v( Ũ , Ũ†) := L̃′v( Ũ†, Ũ), (3.3c)
as well as from the ‘charge symmetric’ Lagrangians
L̃′′′sc = L̃′′′sc( ϕ̃, ϕ̃†) :=
L̃′sc + L̃′′sc
L̃′sc( ϕ̃, ϕ̃†) + L̃′sc( ϕ̃†, ϕ̃)
(3.4a)
L̃′′′sp = L̃′′′sp( ψ̃,
ψ) :=
L̃′sp + L̃′′sp
L̃′sp( ψ̃,
ψ) + L̃′sp(
ψ, ψ̃)
(3.4b)
L̃′′′v = L̃′′′v ( Ũ , Ũ†) :=
L̃′v + L̃′′v
L̃′v( Ũ , Ũ†) + L̃′v( Ũ†, Ũ)
. (3.4c)
It is essential to be noted, for a massless, m = 0, vector field to the Lagrangian formalism
are added as subsidiary conditions the Lorenz conditions
∂µ Ũµ = 0 ∂
µ Ũ†µ = 0 (3.5)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 6
on the solutions of the corresponding Euler-Lagrange equations. Besides, if the opposite is
not stated explicitly, no other restrictions, like the (anti)commutation relations, are supposed
to be imposed on the above Lagrangians. And a technical remark, for convenience, the fields
ϕ̃, ψ̃ and Ũ and their charge conjugate ϕ̃†, ˜̆ψ and Ũ†, respectively, are considered as
independent field variables.
Let L̃′ denotes any one of the Lagrangians (3.1) and L̃′′ (resp. L̃′′′) the corresponding
to it Lagrangian given via (3.3) (resp. (3.4)). Physically the difference between L̃′ and L̃′′ is
that the particles for L̃′ are antiparticles for L̃′′ and vice versa. Both of the Lagrangians L̃′
and L̃′′ are not charge symmetric, i.e. the arising from them theories are not invariant under
the change particle↔antiparticle (or, in mathematical terms, under some of the changes
ϕ̃ ↔ ϕ̃†, ψ̃ ↔ ˜̆ψ, Ũ ↔ Ũ†) unless some additional hypotheses are made. Contrary to
this, the Lagrangian L̃′′′ is charge symmetric and, consequently, the formalism on its base
is invariant under the change particle↔antiparticle.4
The Euler-Lagrange equations for the Lagrangians L̃′, L̃′′ and L̃′′′ happen to coin-
cide [13–15]:5
∂ L̃′
( ∂ L̃′
∂(∂µχ)
≡ ∂ L̃
( ∂ L̃′′
∂(∂µχ)
≡ ∂ L̃
( ∂ L̃′′′
∂(∂µχ)
= 0, (3.6)
where χ = ϕ̃, ϕ̃†, ψ̃,
ψ, Ũ , Ũ† for respectively scalar, spinor and vector field.
Since the creation and annihilation operators are defined only on the base of Euler-La-
grange equations [1, 3, 11–15], we can assert that these operators are identical for the La-
grangians L̃′, L̃′′ and L̃′′′. We shall denote these operators by a±s (k) and a
s (k) with the
convention that a+s (k) (resp. a
s (k)) creates a particle (resp. antiparticle) with 4-momen-
tum (
m2c2 + k2,k), polarization s (see below) and charge (−q) (resp. (+q))6 and a†−s (k)
(resp. a−s (k)) annihilates/destroys such a particle (resp. antiparticle). Here and henceforth
k ∈ R3 is interpreted as (anti)particle’s 3-momentum and the values of the polarization index
s depend on the field considered: s = 1 for a scalar field, s = 1 or s = 1, 2 for respectively
massless (m = 0) or massive (m 6= 0) spinor field, and s = 1, 2, 3 for a vector field.7 Since
massless vector field’s modes with s = 3 may enter only in the spin and orbital angular mo-
menta operators [15], we, for convenience, shall assume that the polarization indices s, t, . . .
take the values from 1 to 2j+1− δ0m(1− δ0j), where j = 0, 12 , 1 is the spin for scalar, spinor
and vector field, respectively, and δ0m := 1 for m = 0 and δ0m := 0 for m 6= 0;8 if the value
s = 3 is important when j = 1 and m = 0, it will be commented/considered separately. Of
course, the creation and annihilation operators are different for different fields; one should
write, e.g., ja
(k) for a±s (k), but we shall not use such a complicated notation and will
assume the dependence on j to be an implicit one.
4 Besides, under the same assumptions, the Lagrangian L̃′′′ does not admit quantization via anticommu-
tators (commutators) for integer (half-integer) spin field, while L̃′ and L̃′′ do not make difference between
integer and half-integer spin fields.
5 Rigorously speaking, the Euler-Lagrange equations for the Lagrangian (3.4b) are identities like 0 = 0 —
see [22]. However, bellow we shall handle this exceptional case as pointed in [14].
6 For a neutral field, we put q = 0.
7 For convenience, in [14], we have set s = 0 if m = 0 and s = 1, 2 if m 6= 0 for a spinor field. For a massless
vector field, one may set s = 1, 2, thus eliminating the ‘unphysical’ value s = 3 for m = 0 — see [1, 11, 15].
In [13], for a scalar field, the notation ϕ±
(k) and ϕ
(k) is used for a±
(k) and a
(k), respectively.
8 In this way the case (j, s,m) = (1, 3, 0) is excluded from further considerations; if (j,m) = (1, 0) and
q = 0, the case considered further in this work corresponds to an electromagnetic field in Coulomb gauge,
as the modes with s = 3 are excluded [15]. However, if the case (j, s,m) = (1, 3, 0) is important for some
reasons, the reader can easily obtain the corresponding results by applying the ones from [15].
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 7
The following settings will be frequently used throughout this chapter:
0 for scalar field
for spinor field
1 for vector field
1 for q = 0 (neutral (Hermitian) field)
0 for q 6= 0 (charged (non-Hermitian) field)
ε := (−1)2j =
+1 for integer j (bose fields)
−1 for half-integer j (fermi fields)
(3.7)
[A,B]± := [A,B]±1 := A ◦B ±B ◦A, (3.8)
where A and B are operators on the system’s Hilbert space F of states.
The dynamical variables corresponding to L̃′, L̃′′ and L̃′′′ are, however, completely dif-
ferent, unless some additional conditions are imposed on the Lagrangian formalism [13–15].
In particular, the momentum operators P̃ωµ , charge operators Q̃ω, spin operators S̃ωµν and
orbital operators L̃ωµν , where ω = ′, ′′, ′′′, for these Lagrangians are [13–15]:
P̃ ′µ =
1 + τ
2j+1−δ0m(1−δ0j )
d3kkµ|
m2c2+k2
{a†+s (k) ◦ a−s (k) + εa†−s (k) ◦ a+s (k)}
(3.9a)
P̃ ′′µ =
1 + τ
2j+1−δ0m(1−δ0j )
d3kkµ|
m2c2+k2
{a+s (k) ◦ a†−s (k) + εa−s (k) ◦ a†+s (k)}
(3.9b)
P̃ ′′′µ =
2(1 + τ)
2j+1−δ0m(1−δ0j )
d3kkµ|
m2c2+k2
{[a†+s (k), a−s (k)]ε + [a+s (k), a†−s (k)]ε}
(3.9c)
Q̃′ = +q
2j+1−δ0m(1−δ0j)
d3k{a†+s (k) ◦ a−s (k)− εa†−s (k) ◦ a+s (k)} (3.10a)
Q̃′′ = −q
2j+1−δ0m(1−δ0j)
d3k{a+s (k) ◦ a†−s (k)− εa−s (k) ◦ a†+s (k)} (3.10b)
Q̃′′′ = 1
2j+1−δ0m(1−δ0j )
d3k{[a†+s (k), a−s (k)]ε − [a+s (k), a†−s (k)]ε} (3.10c)
S̃ ′µν =
(−1)j−1/2j~
1 + τ
2j+1−δ0m(1−δ1j )
s,s′=1
µν (k)a
s (k) ◦ a−s′(k)
+ σss
µν (k)a
s (k) ◦ a+s′(k)
(3.11a)
S̃ ′′µν = ε
(−1)j−1/2j~
1 + τ
2j+1−δ0m(1−δ1j )
s,s′=1
µν (k)a
s′(k) ◦ a
s (k)
+ σss
µν (k)a
(k) ◦ a†+s (k)
(3.11b)
S̃ ′′′µν =
(−1)j−1/2j~
2(1 + τ)
2j+1−δ0m(1−δ1j )
s,s′=1
µν (k)[a
s (k), a
(k)]ε
+ σss
µν (k)[a
s (k), a
s′(k)]ε
(3.11c)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 8
L̃′µν =x0µ P̃ ′ν − x0 ν P̃ ′µ
(−1)j−1/2j~
1 + τ
2j+1−δ0m(1−δ1j )
s,s′=1
µν (k)a
s (k) ◦ a−s′(k)
+ lss
µν (k)a
s (k) ◦ a+s′(k)
2(1 + τ)
2j+1−δ0m(1−δ0j )
a†+s (k)
←−−−−−→
←−−−−−→
◦ a−s (k)
− εa†−s (k)
←−−−−−→
←−−−−−→
◦ a+s (k)
m2c2+k2
(3.12a)
L̃′′µν =x0µ P̃ ′′ν − x0 ν P̃ ′′µ
(−1)j−1/2j~
1 + τ
2j+1−δ0m(1−δ1j )
s,s′=1
µν (k)a
s′(k) ◦ a
s (k)
+ lss
µν (k)a
(k) ◦ a†+s (k)
2(1 + τ)
2j+1−δ0m(1−δ0j )
a+s (k)
←−−−−−→
←−−−−−→
◦ a†−s (k)
− εa−s (k)
←−−−−−→
←−−−−−→
◦ a†+s (k)
m2c2+k2
(3.12b)
L̃′′′µν =x0µ P̃ ′′′ν − x0 ν P̃ ′′′µ
(−1)j−1/2j~
2(1 + τ)
2j+1−δ0m(1−δ1j )
s,s′=1
µν (k)[a
s (k), a
(k)]ε
+ lss
µν (k)[a
s (k), a
(k)]ε
4(1 + τ)
2j+1−δ0m(1−δ0j )
a†+s (k)
←−−−−−→
←−−−−−→
◦ a−s (k)
− εa−s (k)
←−−−−−→
←−−−−−→
◦ a†+s (k) + a+s (k)
←−−−−−→
←−−−−−→
◦ a†−s (k)
− εa†−s (k)
←−−−−−→
←−−−−−→
◦ a+s (k)
m2c2+k2
(3.12c)
Here we have used the following notation: (−1)n+1/2 := (−1)ni for all n ∈ N and i := +
←−−−−−→
◦B(k) := −
∂A(k)
◦B(k) +
A(k) ◦ kµ
∂B(k)
←−−−→
◦B(k)
(3.13)
for operators A(k) and B(k) having C1 dependence on k,9 and σ
ss′,±
µν (k) and l
ss′,±
µν (k) are
9 More generally, if ω : {F → F} → {F → F} is a mapping on the operator space over the system’s Hilbert
space, we put A
ω ◦ B := −ω(A) ◦ B + A ◦ ω(B) for any A,B : F → F . Usually [2, 12], this notation is used
for ω = ∂µ.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 9
some functions of k such that10
µν (k) = −σss
νµ (k) l
ss′,±
µν (k) = −lss
νµ (k)
µν (k) = l
ss′,±
νµ (k) = 0 for j = 0 (scalar field)
µν (k) = −σss
µν (k) =: σ
µν (k) = −σs
µν (k) = −σss
νµ (k) for j = 1 (vector field)
µν (k) = −lss
µν (k) =: l
µν (k) = −ls
µν (k) = −lss
νµ (k) for j = 1 (vector field).
(3.14)
A technical remark must be make at this point. The equations (3.9)–(3.12) were de-
rived in [13–15] under some additional conditions, represented by equations (2.6) and (2.7),
which are considered bellow in Sect. 5 and ensure the effectiveness of the momentum pic-
ture of motion [21] used in [13–15]. However, as it is partially proved, e.g., in [1], when the
quantities (3.9)–(3.12) are expressed via the Heisenberg creation and annihilation operators
(see (2.9)), they remain valid, up to a phase factor, and without making the mentioned
assumptions, i.e. these assumptions are needless when one works entirely in Heisenberg pic-
ture. For this reason, we shall consider (3.9)–(3.12) as pure consequence of the Lagrangian
formalism.
We should emphasize, in (3.11) and (3.12) with S̃ωµν and L̃ωµν , ω = ′, ′′, ′′′, are denoted
the spin and orbital, respectively, operators for L̃ω, which are the spacetime-independent
parts of the spin and orbital, respectively, angular momentum operators [14, 23]; if the last
operators are denoted by S̃ωµν and L̃
µν , the total angular momentum operator of a system
with Lagrangian L̃ω is [23]
M̃ωµν = L̃
µν + S̃
µν = L̃ωµν + S̃ωµν , ω = ′, ′′, ′′′ (3.15)
and S̃ωµν = S̃ωµν (and hence L̃
µν = L̃ωµν) iff S̃
µν is a conserved operator or, equivalently, iff
the system’s canonical energy-momentum tensor is symmetric.11
Going ahead (see Sect. 6), we would like to note that the expressions (3.9c) and, conse-
quently, the Lagrangian L̃′′′ are the base from which the paracommutation relations were
first derived [16].
And a last remark. Above we have expressed the dynamical variables in Heisenberg picture
via the creation and annihilation operators in momentum picture. If one works entirely in
Heisenberg picture, the operators (2.9), representing the creation and annihilation operators
in Heisenberg picture, should be used. Besides, by virtue of the equations
(a±s (k))
† = a†∓s (k) (a
s (k))
† = a∓s (k) (3.16)
ã±s (k)
= ã†∓s (k)
ã†±s (k)
= ã∓s (k), (3.17)
some of the relations concerning a
s (k), e.g. the Euler-Lagrange and Heisenberg equations,
are consequences of the similar ones regarding a±s (k). In view of (2.9), we shall consider (3.9)–
(3.12) as obtained form the corresponding expressions in Heisenberg picture by making the
replacements ã±s (k) 7→ a±s (k) and ã
s (k) 7→ a†±s (k). So, (3.9)–(3.12) will have, up to a
phase factor, a sense of dynamical variables in Heisenberg picture expressed via the cre-
ation/annihilation operators in momentum picture.
10 For the explicit form of these functions, see [13–15]; see also equation (6.57) below.
11 In [14,23] the spin and orbital operators are labeled with an additional left superscript ◦, which, for brevity,
is omitted in the present work as in it only these operators, not S̃
µν and L̃
µν , will be considered. Notice,
the operators S̃
µν and L̃
µν are, generally, time-dependent while the orbital and spin ones are conserved, as
a result of which the total angular momentum is a conserved operator too [14,23].
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 10
4. On the uniqueness of the dynamical variables
Let D = Pµ, Q, Sµν , Lµν denotes some dynamical variable, viz. the momentum, charge,
spin, or orbital operator, of a system with Lagrangian L. Since the Euler-Lagrange equations
for the Lagrangians L′, L′′ and L′′′ coincide (see (3.6)), we can assert that any field satisfying
these equations admits at least three classes of conserved operators, viz. D′, D′′ and D′′′ =
D′+D′′
.Moreover, it can be proved that the Euler-Lagrange equations for the Lagrangian
Lα,β := αL′ + β L′′ α+ β 6= 0 (4.1)
do not depend on α, β ∈ C and coincide with (3.6). Therefore there exists a two parameter
family of conserved dynamical variables for these equations given via
Dα,β := αD′ + βD′′ α+ β 6= 0. (4.2)
Evidently L′′′ = L 1
and D′′′ = D 1
. Since the Euler-Lagrange equations (3.6) are linear
and homogeneous (in the cases considered), we can, without a lost of generality, restrict the
parameters α, β ∈ C to such that
α+ β = 1, (4.3)
which can be achieved by an appropriate renormalization (by a factor (α+β)−1/2) of the field
operators. Thus any field satisfying the Euler-Lagrange equations (3.6) admits the family
Dα,β, α + β = 1, of conserved operators. Obviously, this conclusion is valid if in (4.1) we
replace the particular Lagrangians L′ and L′′ (see (3.1) and (3.3)) with any two Lagrangians
(of one and the same field variables) which lead to identical Euler-Lagrange equations. How-
ever, the essential point in our case is that L′ and L′′ do not differ only by a full divergence,
as a result of which the operators Dα,β are different for different pairs (α, β), α+ β = 1.12
Since one expects a physical system to possess uniquely defined dynamical characteristics,
e.g. energy and total angular momentum, and the Euler-Lagrange equations are considered
(in the framework of Lagrangian formalism) as the ones governing the spacetime evolution of
the system considered, the problem arises when the dynamical operators Dα,β, α+β = 1, are
independent of the particular choice of α and β, i.e. of the initial Lagrangian one starts off.
Simple calculation show that the operators (4.2), under the condition (4.3), are independent
of the particular values of the parameters α and β if and only if
D′ = D′′. (4.4)
Some consequences of the condition(s) (4.4) will be considered below, as well as possible ways
for satisfying these restrictions on the Lagrangian formalism.
Combining (3.9)–(3.12) with (4.4), for respectively D = Pµ, Q, Sµν , Lµν , we see that a
free scalar, spinor or vector field has a uniquely defined dynamical variables if and only if
the following equations are fulfilled:
2j+1−δ0m(1−δ0j )
d3k kµ
m2c2+k2
a†+s (k) ◦ a−s (k)− εa−s (k) ◦ a†+s (k)
− a+s (k) ◦ a†−s (k) + εa†−s (k) ◦ a+s (k)
= 0 (4.5)
12 Note, no commutativity or some commutation relations between the field operators and their charge (or
Hermitian) conjugate are presupposed, i.e., at the moment, we work in a theory without such relations and
normal ordering.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 11
2j+1−δ0m(1−δ0j )
a†+s (k) ◦ a−s (k)− εa−s (k) ◦ a†+s (k)
+ a+s (k) ◦ a†−s (k)− εa†−s (k) ◦ a+s (k)
= 0 (4.6)
2j+1−δ0m(1−δ1j )
s,s′=1
µν (k)a
s (k) ◦ a−s′(k)− εσ
ss′,−
µν (k)a
(k) ◦ a†+s (k)
− εσss′,+µν (k)a+s′(k) ◦ a
s (k) + σ
ss′,+
µν (k)a
s (k) ◦ a+s′(k)
= 0 (4.7)
2j+1−δ0m(1−δ1j )
s,s′=1
µν (k)a
s (k) ◦ a−s′(k)− εl
ss′,−
µν (k)a
(k) ◦ a†+s (k)
− εlss′,+µν (k)a+s′(k) ◦ a
s (k) + l
ss′,+
µν (k)a
s (k) ◦ a+s′(k)
2j+1−δ0m(1−δ0j )
a†+s (k)
←−−−−−→
←−−−−−→
◦a−s (k)+εa−s (k)
←−−−−−→
←−−−−−→
◦a†+s (k)
−a+s (k)
←−−−−−→
←−−−−−→
◦a†−s (k)−εa†−s (k)
←−−−−−→
←−−−−−→
◦a+s (k)
m2c2+k2
(4.8)
In (4.6) is retained the constant factor q as in the neutral case it is equal to zero and,
consequently, the equation (4.6) reduces to identity.
Since the Euler-Lagrange equations do not impose some restrictions on the creation and
annihilation operators, the equations (4.5)–(4.8) can be regarded as subsidiary conditions
on the Lagrangian formalism and can serve as equations for (partial) determination of the
creation and annihilation operators. The system of integral equations (4.5)–(4.8) is quite
complicated and we are not going to investigate it in the general case. Below we shall
restrict ourselves to analysis of only those solutions of (4.5)–(4.8), if any, for which the
integrands in (4.5)–(4.8) vanish. This means that we shall replace the system of integral
equations (4.5)–(4.8) with respect to creation and annihilation operators with the following
system of algebraic equations (do not sum over s and s′ in (4.12) and (4.13)!):
a†+s (k) ◦ a−s (k) − εa−s (k) ◦ a†+s (k) − a+s (k) ◦ a†−s (k) + εa†−s (k) ◦ a+s (k) = 0 (4.9)
a†+s (k) ◦ a−s (k) − εa−s (k) ◦ a†+s (k) + a+s (k) ◦ a†−s (k) − εa†−s (k) ◦ a+s (k) = 0 if q 6= 0
(4.10)
a†+s (k)
←−−−−−→
←−−−−−→
◦ a−s (k) + εa−s (k)
←−−−−−→
←−−−−−→
◦ a†+s (k)
−a+s (k)
←−−−−−→
←−−−−−→
◦a†−s (k)−εa†−s (k)
←−−−−−→
←−−−−−→
◦a+s (k)
m2c2+k2
(4.11)
µν (k)a
s (k) ◦ a−s′(k)− εσ
ss′,−
µν (k)a
(k) ◦ a†+s (k)
− εσss′,+µν (k)a+s′(k) ◦ a
s (k) + σ
ss′,+
µν (k)a
s (k) ◦ a+s′(k)
= 0 (4.12)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 12
µν (k)a
s (k) ◦ a−s′(k)− εl
ss′,−
µν (k)a
(k) ◦ a†+s (k)
− εlss′,+µν (k)a+s′(k) ◦ a
s (k) + l
ss′,+
µν (k)a
s (k) ◦ a+s′(k)
= 0 (4.13)
Here: s = 1, . . . , 2j + 1 − δ0m(1 − δ0j) in (4.9)–(4.11) and s, s′ = 1, . . . , 2j + 1 − δ0m(1 −
δ1j) in (4.12) and (4.13). (Notice, by virtue of (3.14), the equations (4.12) and (4.13) are
identically valid for j = 0, i.e. for scalar fields.) Since all polarization indices enter in (4.5)
and (4.6) on equal footing, we do not sum over s in (4.9)–(4.11). But in (4.12) and (4.13)
we have retain the summation sign as the modes with definite polarization cannot be singled
out in the general case. One may obtain weaker versions of (4.9)–(4.13) by summing in them
over the polarization indices, but we shall not consider these conditions below regardless of
the fact that they also ensure uniqueness of the dynamical variables.
At first, consider the equations (4.9)–(4.11). Since for a neutral field, q = 0, we have
s (k) = a
s (k), which physically means coincidence of field’s particles and antiparticles,
the equations (4.9)–(4.11) hold identically in this case.
Let consider now the case q 6= 0, i.e. the investigated field to be charged one. Using the
standard notation (cf. (3.8))
[A,B]η := A ◦B + ηB ◦A, (4.14)
for operators A and B and η ∈ C, we rewrite (4.9) and (4.10) as
[a†+s (k), a
s (k)]−ε − [a+s (k), a†−s (k)]−ε = 0 (4.9′)
[a†+s (k), a
s (k)]−ε + [a
s (k), a
s (k)]−ε = 0 if q 6= 0, (4.10′)
which are equivalent to
[a†±s (k), a
s (k)]−ε = 0 if q 6= 0. (4.15)
Differentiating (4.15) and inserting the result into (4.11), one can verify that (4.11) is
tantamount to
a†+s (k),
◦ a−s (k)
a+s (k),
◦ a†−s (k)
m2c2+k2
= 0 if q 6= 0, (4.16)
Consider now (4.12) and (4.13). By means of the shorthand (4.14), they read
µν (k)[a
s (k), a
(k)]−ε + σ
ss′,+
µν (k)[a
s (k), a
(k)]−ε
= 0 (4.17)
µν (k)[a
s (k), a
(k)]−ε + l
ss′,+
µν (k)[a
s (k), a
(k)]−ε
= 0. (4.18)
For a scalar field, j = 0, these conditions hold identically, due to (3.14). But for j 6= 0 they
impose new restrictions on the formalism. In particular, for vector fields, j = 1 and ε = +1
they are satisfied iff (see (3.14))
[a†+s (k), a
(k)]−ε − [a†−s (k), a+s′(k)]−ε − [a
(k), a−s (k)]−ε + [a
(k), a+s (k)]−ε = 0. (4.19)
One can satisfy (4.17) and (4.18) if the following generalization of (4.15) holds
[a†±s (k), a
(k)]−ε = 0. (4.20)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 13
For spin j = 1
(and hence ε = −1 – see (3.7)), the conditions (4.12) and (4.13) cannot
be simplified much, but, if one requires the vanishment of the operator coefficients after
ss′,±
µν (k) and l
ss′,±
µν (k), one gets
a†±s (k) ◦ a∓s′(k) = 0 j =
ε = −1. (4.21)
Excluding some special cases, e.g. neutral scalar field (q = 0 and j = 0), the equa-
tions (4.15) and (4.21) are unacceptable from many viewpoints. The main of them is that they
are incompatible with the ordinary (anti)commutation relations (see, e.g., e.g. [1, 11, 12, 18]
or Sect. 6, in particular, equations (6.13) bellow); for example, (4.21) means that the acts of
creation and annihilation of (anti)particles with identical characteristics should be mutually
independent, which contradicts to the existing theory and experimental data.
Now we shall try another way for achieving uniqueness of the dynamical variables for
free fields. Since in (4.9)–(4.13) naturally appear (anti)commutators between creation and
annihilation operators and these (anti)commutators vanish under the standard normal or-
dering [1,11,12,18], one may suppose that the normally ordered expressions of the dynamical
variables may coincide. Let us analyze this method.
Recall [1, 3, 11, 12], the normal ordering operator N (for free field theory) is a linear
operator on the operator space of the system considered such that to a product (composition)
c1 ◦ · · · ◦ cn of n ∈ N creation and/or annihilation operators c1, . . . cn it assigns the operator
(−1)f cα1 ◦ · · · cαn . Here (α1, . . . , αn) is a permutation of (1, . . . , n), all creation operators
stand to the left of all annihilation ones, the relative order between the creation/annihilation
operators is preserved, and f is equal to the number of transpositions among the fermion
operators (j = 1
) needed to be achieved the just-described order (“normal order”) of the
operators c1 ◦ · · · ◦ cn in cα1 ◦ · · · cαn .13 In particular this means that
a+s (k) ◦ a
t (p)
= a+s (k) ◦ a
t (p) N
a†+s (k) ◦ a−t (p)
= a†+s (k) ◦ a−t (p)
a−s (k) ◦ a
t (p)
t (p) ◦ a−s (k) N
a†−s (k) ◦ a+t (p)
= εa+t (p) ◦ a†−s (k) (4.22)
and, consequently, we have
[a†±s (k), a
t (p)]−ε
= 0 N
[a±s (k), a
t (p)]−ε
= 0, (4.23)
due to ε := (−1)2j = ±1 (see (3.7)). (In fact, below only the equalities (4.22) and (4.23),
not the general definition of a normal product, will be applied.)
Applying the normal ordering operator to (4.9′), (4.10′), (4.17) and (4.18), we, in view
of (4.23), get the identity 0 = 0, which means that the conditions (4.9), (4.10), (4.12)
and (4.13) are identically satisfied after normal ordering. This is confirmed by the application
of N to (3.9) and (3.10), which results respectively in (see (4.22))
N ( P̃ ′µ) = N ( P̃ ′′µ)
1 + τ
2j+1−δ0m(1−δ0j )
d3kkµ|
m2c2+k2
{a†+s (k) ◦ a−s (k) + a+s (k) ◦ a†−s (k)} (4.24)
13 We have slightly modified the definition given in [1,3,11,12] because no (anti)commutation relations are
presented in our exposition till the moment. In this paper we do not concern the problem for elimination of
the ‘unphysical’ operators a±
(k) and a
(k) from the spin and orbital momentum operators when j = 1; for
details, see [15], where it is proved that, for an electromagnetic field, j = 1 and q = 0, one way to achieve this
is by adding to the number f above the number of transpositions between a±s (k), s = 1, 2, and a
(k) needed
for getting normal order.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 14
N ( Q̃′) = N ( Q̃′′) = 1
1 + τ
2j+1−δ0m(1−δ0j )
d3k{a†+s (k) ◦ a−s (k)− a+s (k) ◦ a†−s (k)}.
(4.25)
Therefore the normal ordering ensures the uniqueness of the momentum and charge operators,
if we redefine them respectively as
P̃µ := N ( P̃ ′µ) Q̃ := N ( Q̃′). (4.26)
Putting ωµν := kµ
− kν ∂∂kµ and using (4.22), one can verify that
a+s (k)
←−−−→
ωµν ◦ a†−s (k)
= a+s (k)
←−−−→
ωµν ◦ a†−s (k)
a†+s (k)
←−−−→
ωµν ◦ a−s (k)
= a†+s (k)
←−−−→
ωµν ◦ a−s (k)
a−s (k)
←−−−→
ωµν ◦ a†+s (k)
= −εa†+s (k)
←−−−→
ωµν ◦ a−s (k)
a†−s (k)
←−−−→
ωµν ◦ a+s (k)
= −εa+s (k)
←−−−→
ωµν ◦ a†−s (k).
(4.27)
As a consequence of these equalities, the action of N on the l.h.s. of (4.11) vanishes. Com-
bining this result with the mentioned fact that the normal ordering converts (4.12) and (4.13)
into identities, we see that the normal ordering procedure ensures also uniqueness of the spin
and orbital operators if we redefine them respectively as:
S̃µν := N ( S̃ ′µν) := N ( S̃ ′′µν) =
(−1)j−1/2j~
1 + τ
2j+1−δ0m(1−δ1j )
s,s′=1
µν (k)a
s (k) ◦ a−s′(k) + εσ
ss′,+
µν (k)a
s′(k) ◦ a
s (k)
(4.28)
L̃µν := N ( L̃′µν) := N ( L̃′′µν) = x0µ P̃ν − x0 ν P̃µ +
(−1)j−1/2j~
1 + τ
2j+1−δ0m(1−δ1j )
s,s′=1
µν (k)a
s (k) ◦ a−s′(k) + εl
ss′,+
µν (k)a
(k) ◦ a†−s (k)
2(1 + τ)
2j+1−δ0m(1−δ0j )
a†+s (k)
←−−−−−→
←−−−−−→
◦ a−s (k)
+ a+s (k)
←−−−−−→
←−−−−−→
◦ a†−s (k)
m2c2+k2
(4.29)
where (3.14) was applied.
5. Heisenberg relations
The conserved operators, like momentum and charge operators, are often identified with the
generators of the corresponding transformations under which the action operator is invari-
ant [1, 3, 11, 12]. This leads to a number of commutation relations between the components
of these operators and between them and the field operators. The relations of the letter
set are known/referred as the Heisenberg relations or equations. Both kinds of commuta-
tion relations are from pure geometric origin and, consequently, are completely external to
the Lagrangian formalism; one of the reasons being that the mentioned identification is, in
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 15
general, unacceptable and may be carried out only on some subset of the system’s Hilbert
space of states [23, 24]. Therefore their validity in a pure Lagrangian theory is questionable
and should be verified [11]. However, the considered relations are weaker conditions than
the identification of the corresponding operators and there are strong evidences that these
relations should be valid in a realistic quantum field theory [1,11]; e.g., the commutativity be-
tween the momentum and charge operators (see below (5.18)) expresses the experimental fact
that the 4-momentum and charge of any system are simultaneously measurable quantities.
It is known [1,11], in a pure Lagrangian approach, the field equations, which are usually
identified with the Euler-Lagrange, 14 are the only restrictions on the field operators. Besides,
these equations do not determine uniquely the field operators and the letter can be expressed
through the creation and annihilation operators. Since the last operators are left completely
arbitrary by a pure Lagrangian formalism, one is free to impose on them any system of
compatible restrictions. The best known examples of this kind are the famous canonical
(anti)commutation relations and their generalization, the so-called paracommutation rela-
tions [16,18]. In general, the problem for compatibility of such subsidiary to the Lagrangian
formalism system of restrictions with, for instance, the Heisenberg relations is open and
requires particular investigation [11]. For example, even the canonical (anti)commutation
relations for electromagnetic field in Coulomb gauge are incompatible with the Heisenberg
equation involving the (total) angular momentum operator unless the gauge symmetry of this
field is taken into account [11, § 84]. However, the (para)commutation relations are, by con-
struction, compatible with the Heisenberg relations regarding momentum operator (see [16]
or below Subsect. 6.1). The ordinary approach is to be imposed a system of equations on
the creation and annihilation operators and, then, to be checked its compatibility with, e.g.,
the Heisenberg relations. In the next sections we shall investigate the opposite situation:
assuming the validity of (some of) the Heisenberg equations, the possible restrictions on
the creation and annihilation operators will be explored. For this purpose, below we briefly
review the Heisenberg relations and other ones related to them.
Consider a system of quantum fields ϕ̃i(x), i = 1, . . . , N ∈ N, where ϕ̃i(x) denote the
components of all fields (and their Hermitian conjugates), and P̃µ, Q̃ and M̃µν be its
momentum, charge and (total) angular momentum operators, respectively. The Heisenberg
relations/equations for these operators are [1, 3, 11,12]
[ ϕ̃i(x), P̃µ] = i~
∂ ϕ̃i(x)
(5.1)
[ ϕ̃i(x), Q̃] = e( ϕ̃i)q ϕ̃i(x) (5.2)
[ ϕ̃i(x), M̃µν ] = i~{xµ∂ν ϕ̃i(x)− xν∂µ ϕ̃i(x)}+ i~
ϕ̃i′(x). (5.3)
Here: q = const is the fields’ charge, e( ϕ̃i) = 0 if ϕ̃
i = ϕ̃i, e( ϕ̃i) = ±1 if ϕ̃
i 6= ϕ̃i with
e( ϕ̃i)+e( ϕ̃
i ) = 0, and the constants I
iµν = −Ii
iνµ characterize the transformation properties
of the field operators under 4-rotations. (If ε( ϕ̃i) 6= 0, it is a convention whether to put
ε( ϕ̃i) = +1 or ε( ϕ̃i) = −1 for a fixed i.)
We would like to make some comments on (5.3). Since its r.h.s. is a sum of two operators,
the first (second) characterizing the pure orbital (spin) angular momentum properties of
the system considered, the idea arises to split (5.3) into two independent equations, one
involving the orbital angular momentum operator and another concerning the spin angular
momentum operator. This is supported by the observation that, it seems, no process is known
for transforming orbital angular momentum into spin one and v.v. (without destroying the
14 Recall, there are Lagrangians whose classical Euler-Lagrange equations are identities. However, their
correct and rigorous treatment [22] reveals that they entail field equations which are mathematically correct
and physically sensible.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 16
system). So one may suppose the existence of operators M̃orµν and M̃
µν such that
[ ϕ̃i(x), M̃orµν ] = i~{xµ∂ν ϕ̃i(x)− xν∂µ ϕ̃i(x)} (5.4)
[ ϕ̃i(x), M̃spµν ] = i~
iµν ϕ̃i′(x) (5.5)
M̃µν = M̃orµν + M̃spµν . (5.6)
However, as particular calculations demonstrate [5,14,15], neither the spin (resp. orbital)
nor the spin (resp. orbital) angular momentum operator is a suitable candidate for M̃spµν
(resp. M̃orµν). If we assume the validity of (5.1), then equations (5.4) and (5.5) can be
satisfied if we choose
M̃orµν(x) = L̃extµν := xµ P̃ν − xν P̃µ (5.7)
M̃spµν(x) = M̃(0)µν (x) := M̃µν − L̃extµν = S̃µν + L̃µν − {xµ P̃ν − xν P̃µ} (5.8)
with M̃µν satisfying (5.3). These operators are not conserved ones. Such a representation
is in agreement with the equations (3.12), according to which the operator (5.7) enters addi-
tively in the expressions for the orbital operator.15 The physical sense of the operator (5.7)
is that it represents the orbital angular momentum of the system due to its movement as a
whole. Respectively, the operator (5.8) describes the system’s angular momentum as a result
of its internal movement and/or structure.
Since the spin (orbital) angular momentum is associated with the structure (movement)
of a system, in the operator (5.8) are mixed the spin and orbital angular momenta. These
quantities can be separated completely via the following representations of the operators
Morµν and M
µν in momentum picture (when (5.1) holds)
Morµν = xµ Pν − xµPµ + Lintµν (5.9)
Mspµν = Mµν − (xµ Pν − xµ Pµ)− Lintµν , (5.10)
where Lintµν describes the ‘internal’ orbital angular momentum of the system considered and
depends on the Lagrangian we have started off. Generally said, Lintµν is the part of the
orbital angular momentum operator containing derivatives of the creation and annihilation
operators. In particular, for the Lagrangians L′, L′′ and L′′′ (see Sect. 3), the explicit forms
of the operators (5.9) and (5.10) respectively are:
M′ orµν =xµP ′ν − xν P ′µ
2(1 + τ)
2j+1−δ0m(1−δ0j)
a†+s (k)
←−−−−−→
←−−−−−→
◦ a−s (k)
− εa†−s (k)
←−−−−−→
←−−−−−→
◦ a+s (k)
m2c2+k2
(5.11a)
M′′ orµν =xµP ′′ν − xν P ′′µ
2(1 + τ)
2j+1−δ0m(1−δ0j)
a+s (k)
←−−−−−→
←−−−−−→
◦ a†−s (k)
− εa−s (k)
←−−−−−→
←−−−−−→
◦ a†+s (k)
m2c2+k2
(5.11b)
15 This is evident in the momentum picture of motion, in which xµ stands for x0µ in (3.12) — see [13–15].
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 17
M′′′ orµν =xµP ′′′ν − xν P ′′′µ
4(1 + τ)
2j+1−δ0m(1−δ0j)
a†+s (k)
←−−−−−→
←−−−−−→
◦ a−s (k)
− εa−s (k)
←−−−−−→
←−−−−−→
◦ a†+s (k) + a+s (k)
←−−−−−→
←−−−−−→
◦ a†−s (k)
− εa†−s (k)
←−−−−−→
←−−−−−→
◦ a+s (k)
m2c2+k2
(5.11c)
M′ spµν =
(−1)j−1/2j~
1 + τ
2j+1−δ0m(1−δ1j )
s,s′=1
µν (k) + l
ss′,−
µν (k))a
s (k) ◦ a−s′(k)
+ (σss
µν (k) + l
ss′,+
µν (k))a
s (k) ◦ a+s′(k)
(5.12a)
M′′ spµν = ε
(−1)j−1/2j~
1 + τ
2j+1−δ0m(1−δ1j )
s,s′=1
µν (k) + l
ss′,+
µν (k))a
(k) ◦ a†−s (k)
+ (σss
µν (k) + σ
ss′,−
µν (k))a
(k) ◦ a†+s (k)
(5.12b)
M′′′ spµν =
(−1)j−1/2j~
2(1 + τ)
2j+1−δ0m(1−δ1j )
s,s′=1
µν (k) + l
ss′,−
µν (k))[a
s (k), a
s′(k)]ε
+ (σss
µν (k) + l
ss′,+
µν (k))[a
s (k), a
(k)]ε
(5.12c)
Obviously (see Sect. 2), the equations (5.12) have the same form in Heisenberg picture in
terms of the operators (2.9) (only tildes over M and a must be added), but the equa-
tions (5.11) change substantially due to the existence of derivatives of the creation and
annihilation operators in them [13–15]:
M̃′ orµν =
2(1 + τ)
2j+1−δ0m(1−δ0j )
ã†+s (k)
←−−−−−→
←−−−−−→
◦ ã−s (k)
− εã†−s (k)
←−−−−−→
←−−−−−→
◦ ã+s (k)
m2c2+k2
(5.13a)
M̃′′ orµν =
2(1 + τ)
2j+1−δ0m(1−δ0j )
ã+s (k)
←−−−−−→
←−−−−−→
◦ ã†−s (k)
− εã−s (k)
←−−−−−→
←−−−−−→
◦ ã†+s (k)
m2c2+k2
(5.13b)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 18
M̃′′′ orµν =
4(1 + τ)
2j+1−δ0m(1−δ0j )
ã†+s (k)
←−−−−−→
←−−−−−→
◦ ã−s (k)
− εã−s (k)
←−−−−−→
←−−−−−→
◦ ã†+s (k) + ã+s (k)
←−−−−−→
←−−−−−→
◦ ã†−s (k)
− εã†−s (k)
←−−−−−→
←−−−−−→
◦ ã+s (k)
m2c2+k2
(5.13c)
From (5.13) and (5.12) is clear that the operators M̃orµν and M̃
µν so defined are conserved
(contrary to (5.7) and (5.8)) and do not depend on the validity of the Heisenberg rela-
tions (5.1) (contrary to expressions (5.11) in momentum picture).
The problem for whether the operators (5.12) and (5.13) satisfy the equations (5.4)
and (5.5), respectively, will be considered in Sect. 6.
There is an essential difference between (5.4) and (5.5): the equation (5.5) depends on
the particular properties of the operators ϕ̃i(x) under 4-rotations via the coefficients I
(see (5.25) below), while (5.4) does not depend on them. This is explicitly reflected in (5.11)
and (5.12): the former set of equations is valid independently of the geometrical nature of the
fields considered, while the latter one depends on it via the ‘spin’ (‘polarization’) functions
ss′,±
µν (k) and l
ss′,±
µν (k). Similar remark concerns (5.3), on one hand, and (5.1) and (5.2), on
another hand: the particular form of (5.3) essentially depends on the geometric properties
of ϕ̃i(x) under 4-rotations, the other equations being independent of them.
It should also be noted, the relation (5.3) does not hold for a canonically quantized
electromagnetic field in Coulomb gauge unless some additional terms it its r.h.s., reflecting
the gauge symmetry of the field, are taken into account [11, § 84].
As it was said above, the relations (5.1)–(5.3) are from pure geometrical origin. However,
the last discussion, concerning (5.4)–(5.8), reveals that the terms in braces in (5.3) should be
connected with the momentum operator in the (pure) Lagrangian approach. More precisely,
on the background of equations (3.11a)–(3.12c), the Heisenberg relation (5.3) should be
replaced with
[ ϕ̃i(x), M̃µν ] = xµ[ ϕ̃i(x), P̃ν ] − xν [ ϕ̃i(x), P̃µ] + i~
iµν ϕ̃i′(x), (5.14)
which is equivalent to (5.3) if (5.1) is true. An advantage of the last equation is that it is valid
in any picture of motion (in the same form) while (5.3) holds only in Heisenberg picture.16
Obviously, (5.14) is equivalent to (5.5) with M̃spµν defined by (5.8).
The other kind of geometric relations mentioned at the beginning of this section are
connected with the basic relations defining the Lie algebra of the Poincaré group [7, pp. 143–
147], [8, sect. 7.1]. They require the fulfillment of the following equations between the com-
ponents P̃µ of the momentum and M̃µν of the angular momentum operators [3, 5, 7, 8]:
[ P̃µ, P̃ν ] = 0 (5.15)
[M̃µν , P̃λ] = −i~(ηλµ P̃ν − ηλν P̃µ). (5.16)
[M̃κλ, M̃µν ] = −i~
ηκµ M̃λν − ηλµ M̃κν − ηκν M̃λµ + ηλν M̃κµ
. (5.17)
We would like to pay attention to the minus sign in the multiplier (−i~) in (5.16) and (5.17)
with respect to the above references, where i~ stands instead of −i~ in these equations. When
16 In other pictures of motion, generally, additional terms in the r.h.s. of (5.3) will appear, i.e. the functional
form of the r.h.s. of (5.3) is not invariant under changes of the picture of motion, contrary to (5.14).
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 19
(a representation of) the Lie algebra of the Poincaré group is considered, this difference in the
sign is insignificant as it can be absorbed into the definition of M̃µν . However, the change
of the sign of the angular momentum operator, M̃µν 7→ −M̃µν , will result in the change
i~ 7→ −i~ in the r.h.s. of (5.3). This means that equations (5.15), (5.16) and (5.3), when
considered together, require a suitable choice of the signs of the multiplier i~ in their right
hand sides as these signs change simultaneously when M̃µν is replaced with −M̃µν . Since
equations (5.3), (5.16) and (5.17) hold, when M̃µν is defined according to the Noether’s
theorem and the ordinary (anti)commutation relations are valid [13–15], we accept these
equations in the way they are written above.
To the relations (5.15)–(5.17) should be added the equations [3, p. 78]
[ Q̃, P̃µ] = 0 (5.18)
[ Q̃, M̃µν ] = 0, (5.19)
which complete the algebra of observables and express, respectively, the translational and
rotational invariance of the charge operator Q̃; physically they mean that the charge and
momentum or the charge and angular momentum are simultaneously measurable quantities.
Since the spin properties of a system are generally independent of its charge or momentum,
one may also expect the validity of the relations17
[ S̃µν , P̃µ] = 0 (5.20)
[ S̃µν , Q̃] = 0. (5.21)
But, as the spin describes, in a sense, some of the rotational properties of the system,
equality like [ S̃µν , L̃κλ] = 0 is not likely to hold. Indeed, the considerations in [13–15] reveal
that (5.20) and (5.21), but not the last equation, are true in the framework of the Lagrangian
formalism with added to it standard (anti)commutation relations. Notice, if (5.20) and (5.21)
hold, then, respectively, (5.16) and (5.19) are equivalent to
[ L̃µν , P̃λ] = −i~(ηλµ P̃ν − ηλν P̃µ). (5.22)
[ Q̃, L̃µν ] = 0. (5.23)
It is intuitively clear, not all of the commutation relations (5.1)–(5.3) and (5.15)–(5.21)
are independent: if D̃ denotes some of the operators P̃µ, Q̃, M̃µν , S̃µν or L̃µν and the
commutators [ ϕ̃i(x), D̃] , i = 1, . . . , N , are known, then, in principle, one can calculate the
commutators [Γ( ϕ̃1(x), . . . , ϕ̃N (x)), D̃] , where Γ( ϕ̃1(x), . . . , ϕ̃N (x)) is, for example, any
function/functional bilinear in ϕ̃1(x), . . . , ϕ̃N (x); to prove this fact, one should apply the
identity [A,B ◦ C] = [A,B] ◦ C + B ◦ [A,C] a suitable number of times. In particular, if
D̃1 and D̃2 denote any two (distinct) operators of the dynamical variables, and [ ϕ̃i(x), D̃1]
is known, then the commutator [ D̃1, D̃2] can be calculated explicitly. For this reason, we
can expect that:
(i) Equation (5.1) implies (5.15), (5.16), (5.18), (5.20) and (5.22).
(ii) Equation (5.2) implies (5.18), (5.19), (5.21), and (5.23).
(iii) Equation (5.3) implies (5.16), (5.17), and (5.19).
Besides, (5.3) may, possibly, entail equations like (5.17) with S or L forM , with an exception
of M̃µν in the l.h.s., i.e.
[ S̃κλ, M̃µν ] = −i~
ηκµ S̃λν − ηλµ S̃κν − ηκν S̃λµ + ηλν S̃κµ
[ L̃κλ, M̃µν ] = −i~
ηκµ L̃λν − ηλµ L̃κν − ηκν L̃λµ + ηλν L̃κµ
} (5.24)
17 Recall, S̃µν (resp. L̃µν) is the conserved spin (resp. orbital) operator, not the generally non-conserved
spin (resp. orbital) angular momentum operator [23].
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 20
The validity of assertions (i)–(iii) above for free scalar, spinor and vector fields, when respec-
tively
ϕ̃i(x) 7→ ϕ̃(x), ϕ̃†(x) Ii
iµν 7→ Iµν = 0 e( ϕ̃) = −e( ϕ̃†) = +1 (5.25a)
ϕ̃i(x) 7→ ψ̃(x), ˜̆ψ(x) Ii
iµν 7→ Iψµν = Iψ̆µν = −
σµν e( ψ̃) = −e( ˜̆ψ) = +1 (5.25b)
ϕ̃i(x) 7→ Ũµ(x), Ũ†µ(x) Ii
iµν 7→ Iσρµν = I†σρµν = δσµηνρ − δσν ηµρ e( Ũµ) = −e( Ũ†µ) = +1,
(5.25c)
where σµν := i
[γµ, γν ] with γµ being the Dirac γ-matrices [1, 25], is proved in [13–15],
respectively. Besides, in loc. cit. is proved that equations (5.24) hold for scalar and vector
fields, but not for a spinor field.18
Thus, we see that the Heisenberg relations (5.1)–(5.3) are stronger than the commutation
relations (5.15)–(5.23), when imposed on the Lagrangian formalism as subsidiary restrictions.
6. Types of possible commutation relations
In a broad sense, by a commutation relation we shall understand any algebraic relation
between the creation and annihilation operators imposed as subsidiary restriction on the
Lagrangian formalism. In a narrow sense, the commutation relations are the equations (6.13),
with ε = −1, written below and satisfied by the bose creation and annihilation operators. As
anticommutation relations are known the equations (6.13), with ε = +1, written below and
satisfied by the fermi creation and annihilation operators. The last two types of relations
are often referred as the bilinear commutation relations [18]. Theoretically are possible also
trilinear commutation relations, an example being the paracommutation relations [16, 18]
represented below by equations (6.18) (or (6.20)).
Generally said, the commutation relations should be postulated. Alternatively, they
could be derived from (equivalent to them) different assumptions added to the Lagrangian
formalism. The purpose of this section is to be explored possible classes of commutation
relations, which follow from some natural restrictions on the Lagrangian formalism that are
consequences from the considerations in the previous sections. Special attention will be paid
on some consequences of the charge symmetric Lagrangians as the free fields possess such a
symmetry [1, 3, 11,12].
As pointed in Sect 3, the Euler-Lagrange equations for the Lagrangians L̃′, L̃′′ and L̃′′′
coincide and, in quantum field theory, the role of these equations is to be singled out the
independent degrees of freedom of the fields in the form of creation and annihilation operators
a±s (k) and a
s (k) (which are identical for L̃′, L̃′′ and L̃′′′). Further specialization of these
operators is provided by the commutation relations (in broad sense) which play a role of field
equations in this situation (with respect to the mentioned operators).
Before proceeding on, we would like to simplify our notation. As a spin variable, s say, is
always coupled with a 3-momentum one, k say, we shall use the letters l, m and n to denote
pairs like l = (s,k), m = (t,p) and n = (r, q). Equipped with this convention, we shall write,
e.g., a±l for a
s (k) and a
l for a
s (k). We set δlm := δstδ
3(k−p) and a summation sign like
l should be understood as
d3k, where the range of the polarization variable s will
be clear from the context (see, e.g., (3.9)–(3.12)).
18 The problem for the validity of assertions (i)–(iii) or equations (5.24) in the general case of arbitrary
fields (Lagrangians) is not a subject of the present work.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 21
6.1. Restrictions related to the momentum operator
First of all, let us examine the consequences of the Heisenberg relation (5.1) involving the
momentum operator. Since in terms of creation and annihilation operators it reads [1,13–15]
[a±s (k), Pµ] = ∓kµa±s (k) [a†±s (k), Pµ] = ∓kµa†±s (k) k0 =
m2c2 + k2, (6.1)
the field equations in terms of creation and annihilation operators for the Lagrangians (3.1),
(3.3) and (3.4) respectively are (see [13–15] or (6.1) and (3.9)):
2j+1−δ0m(1−δ0j )
m2c2+q2
a±s (k), a
t (q) ◦ a−t (q) + εa
t (q) ◦ a+t (q)
± (1 + τ)a±s (k)δstδ3(k − q)
d3q = 0
(6.2a)
2j+1−δ0m(1−δ0j )
m2c2+q2
a†±s (k), a
t (q) ◦ a−t (q) + εa
t (q) ◦ a+t (q)
± (1 + τ)a†±s (k)δstδ3(k − q)
d3q = 0
(6.2b)
2j+1−δ0m(1−δ0j )
m2c2+q2
a±s (k), a
t (q) ◦ a
t (q) + εa
t (q) ◦ a
t (q)
± (1 + τ)a±s (k)δstδ3(k − q)
d3q = 0
(6.3a)
2j+1−δ0m(1−δ0j )
m2c2+q2
a†±s (k), a
t (q) ◦ a
t (q) + εa
t (q) ◦ a
t (q)
± (1 + τ)a†±s (k)δstδ3(k − q)
d3q = 0
(6.3b)
2j+1−δ0m(1−δ0j )
m2c2+q2
a±s (k), [a
t (q), a
t (q)]ε + [a
t (q), a
t (q)]ε
± (1 + τ)a±s (k)δstδ3(k − q)
d3q = 0
(6.4a)
2j+1−δ0m(1−δ0j)
m2c2+q2
a†±s (k), [a
t (q), a
t (q)]ε + [a
t (q), a
t (q)]ε
± (1 + τ)a†±s (k)δstδ3(k − q)
d3q = 0,
(6.4b)
where j and ε are given via (3.7), the generalized commutation function [·, ·]ε is defined
by (4.14), and the polarization indices take the values
s, t = 1, . . . , 2j + 1− δ0m(1− δ0j) =
1 for j = 0 or for j = 1
and m = 0
1, 2 for j = 1
and m 6= 0 or for j = 1 and m = 0
1, 2, 3 for j = 1 and m 6= 0
(6.5)
The “b” versions of the equations (6.2)–(6.4) are consequences of the “a” versions and the
equalities
(a±l )
† = a
l ) = a
l (6.6)
[A,B]η
= η[A†, B†]η for [A,B]η = η[B,A]η η = ±1. (6.7)
Applying (6.2)–(6.4) and the identity
[A,B ◦ C] = [A,B]η ◦ C − ηB ◦ [A,C]η for η = ±1 (6.8)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 22
for the choice η = −1, one can prove by a direct calculation that
[ P̃µ, P̃ν ] = 0 [ Q̃, P̃µ] = 0 [ S̃µν , P̃λ] = 0
[ L̃µν , P̃λ] = −i~{ηλµ P̃ν − ηλν P̃µ} [M̃µν , P̃λ] = −i~{ηλµ P̃ν − ηλν P̃µ},
(6.9)
where the operators P̃µ, Q̃, S̃µν , L̃µν , and M̃µν denote the momentum, charge, spin,
orbital and total angular momentum operators, respectively, of the system considered and
are calculated from one and the same initial Lagrangian. This result confirms the supposition,
made in Sect. 5, that the assertion (i) before (5.24) holds for the fields investigated here.
Below we shall study only those solutions of (6.2)–(6.4) for which the integrands in them
vanish, i.e. we shall replace the systems of integral equations (6.2)–(6.4) with the following
systems of algebraic equations (see the above convention on the indices l and m and do not
sum over indices repeated on one and the same level):
a±l , a
m ◦ a−m + εa†−m ◦ a+m
± (1 + τ)δlma±l = 0 (6.10a)
l , a
m ◦ a−m + εa†−m ◦ a+m
± (1 + τ)δlma†±l = 0 (6.10b)
a±l , a
m ◦ a†−m + εa−m ◦ a†+m
± (1 + τ)δlma±l = 0 (6.11a)
l , a
m ◦ a†−m + εa−m ◦ a†+m
± (1 + τ)δlma†±l = 0 (6.11b)
a±l , [a
m , a
m]ε + [a
± 2(1 + τ)δlma±l = 0 (6.12a)
, [a†+m , a
m]ε + [a
± 2(1 + τ)δlma†±l = 0. (6.12b)
It seems, these are the most general and sensible trilinear commutation relations one may
impose on the creation and annihilation operators.
First of all, we should mentioned that the standard bilinear commutation relations, viz. [1,
3, 11–15]
[a±l , a
m]−ε = 0 [a
l , a
m ]−ε = 0
[a∓l , a
m]−ε = (±1)2j+1τδlm idF [a
l , a
m ]−ε = (±1)2j+1τδlm idF
[a±l , a
m ]−ε = 0 [a
l , a
m]−ε = 0
, a†±m ]−ε = (±1)2j+1δlm idF [a
, a±m]−ε = (±1)2j+1δlm idF , (6.13)
provide a solution of any one of the equations (6.10)–(6.12) in a sense that, due to (3.7)
and (6.8), with η = −ε any set of operators satisfying (6.13) converts (6.10)–(6.12) into
identities.
Besides, this conclusion remains valid also if the normal ordering is taken into account,
i.e. if, in this particular case, the changes a
m ◦ a+m 7→ εa+m ◦ a
m and a
m ◦ a
m 7→ εa†+m ◦ a−m
are made in (6.10)–(6.12).
Now we shall demonstrate how the trilinear relations (6.12) lead to the paracommuta-
tion relations. Equations (6.12) can be ‘split’ into different kinds of trilinear commutation
relations into infinitely many ways. For example, the system of equations
a±l , [a
± (1 + τ)δlma±l = 0 (6.14a)
a±l , [a
m , a
± (1 + τ)δlma±l = 0 (6.14b)
, [a+m, a
± (1 + τ)δlma†±l = 0 (6.14c)
l , [a
m , a
± (1 + τ)δlma†±l = 0 (6.14d)
provides an evident solution of (6.12). However, it is a simple algebra to be seen that
these relations are incompatible with the standard (anti)commutation relations (6.13) and,
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 23
in this sense, are not suitable as subsidiary restrictions on the Lagrangian formalism. For
our purpose, the equations
a+l , [a
+ 2δlma
l = 0 (6.15a)
a+l , [a
m , a
+ 2τδlma
l = 0 (6.15b)
a−l , [a
− 2τδlma−l = 0 (6.15c)
a−l , [a
m , a
− 2δlma−l = 0 (6.15d)
and their Hermitian conjugate provide a solution of (6.12), which is compatible with (6.13),
i.e. if (6.13) hold, the equations (6.15) are converted into identities.
The idea of the paraquantization is in the following generalization of (6.15)
a+l , [a
+ 2δlna
m = 0 (6.16a)
a+l , [a
m , a
+ 2τδlna
m = 0 (6.16b)
a−l , [a
− 2τδlma−n = 0 (6.16c)
a−l , [a
m , a
− 2δlma−n = 0 (6.16d)
which reduces to (6.15) for n = m and is a generalization of (6.13) in a sense that any set
of operators satisfying (6.13) converts (6.16) into identities, the opposite being generally not
valid.19
Suppose that the field considered consists of a single sort of particles, e.g. electrons or
photons, created by b
and annihilated by bl := a
. Then the equation Hermitian
conjugated to (6.15a) reads
[bl, [b
m, bm]ε] = 2δlmbm. (6.17)
This is the main relation from which the paper [16] starts. The basic paracommutation
relations are [16–18,26]:
[bl, [b
m, bn]ε] = 2δlmbn (6.18a)
[bl, [bm, bn]ε] = 0. (6.18b)
The first of them is a generalization (stronger version) of (6.17) by replacing the second index
m with an arbitrary one, say n, and the second one is added (by ”hands”) in the theory as
an additional assumption. Obviously, (6.18) are a solution of (6.15) and therefore of (6.12)
in the considered case of a field consisting of only one sort of particles.
The equations (6.15) contain also the relativistic version of the paracommutation rela-
tions, when the existence of antiparticles must be respected [18, sec. 18.1]. Indeed, noticing
that the field’s particles (resp. antiparticles) are created by b
:= a+
(resp. c
) and
annihilated by bl := a
(resp. cl := a
), from (6.15) and the Hermitian conjugate to them
equations, we get
[bl, [b
m, bm]ε] = 2δlmbm [cl, [c
m, cm]ε] = 2δlmcm (6.19a)
, [c†m, cm]ε] = −2τδlmb†m [c
, [b†m, bm]ε] = −2τδlmc†m. (6.19b)
Generalizing these equations in a way similar to the transition from (6.17) to (6.18), we
obtain the relativistic paracommutation relations as (cf. (6.16))
[bl, [b
m, bn]ε] = 2δlmbn [bl, [bm, bn]ε] = 0 (6.20a)
[cl, [c
m, cn]ε] = 2δlmcn [cl, [cm, cn]ε] = 0 (6.20b)
l , [c
m, cn]ε] = −2τδlnb†m [c
l , [b
m, bn]ε] = −2τδlnc†m. (6.20c)
19 Other generalizations of (6.15) are also possible, but they do not agree with (6.13). Moreover, it is easy
to be proved, any other (non-trivial) arrangement of the indices in (6.16) is incompatible with (6.13).
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 24
The equations (6.20a) (resp. (6.20b)) represent the paracommutation relations for the field’s
particles (resp. antiparticles) as independent objects, while (6.20c) describe a pure relativistic
effect of some “interaction” (or its absents) between field’s particles and antiparticles and
fixes the paracommutation relations involving the bl’s and cl’s, as pointed in [18, p. 207]
(where bl is denoted by al and cl by bl). The relations (6.17) and (6.20) for ε = +1 (resp.
ε = −1) are referred as the parabose (resp. parafermi) commutation relations [18]. This
terminology is a natural one also with respect to the commutation relations (6.16), which
will be referred as the paracommutation relations too.
As first noted in [16], the equations (6.13) provide a solution of (6.20) (or (6.18) in the
nonrelativistic case) but the latter equations admit also an infinite number of other solutions.
Besides, by taking Hermitian conjugations of (some of) the equations (6.18) or (6.20) and
applying generalized Jacobi identities, like
α[[A,B]ξ , C]η + ξη[[A,C]−α/ξ , B]−α/η − α2[[B,C]ξη/α, A]1/α = 0 αξη 6= 0
β[A, [B,C]α, ]−βγ + γ[B, [C,A]β , ]−γα + α[C, [A,B]γ , ]−αβ = 0 α, β, γ = ±1
[[A,B]η, C]− + [[B,C]η, A]− + [[C,A]η , B]− = 0 η = ±1
[[A,B]ξ, [C,D]η ]− = [[A,B]ξ , C]−,D]η + η[[A,B]ξ ,D]−, C]1/η η 6= 0,
(6.21)
one can obtain a number of other (para)commutation relations for which the reader is referred
to [16,18,26].
Of course, the paracommutation relations (6.16), in particular (6.18) and (6.20) as their
stronger versions, do not give the general solution of the trilinear relations (6.12). For
instance, one may replace (6.12) with the equations
a+l , [a
m , a
n ]ε + [a
+ 2(1 + τ)δlna
m = 0 (6.22a)
a−l , [a
m , a
n ]ε + [a
− 2(1 + τ)δlma−n = 0. (6.22b)
and their Hermitian conjugate, which in terms of the operators bl and cl introduced above
[bl, [b
m, bn]ε + [c
m, cm]ε] = 2(1 + τ)δlmbn (6.23a)
[cl, [b
m, bn]ε + [c
m, cm]ε] = 2(1 + τ)δlmcn, (6.23b)
and supplement these relations with equations like (6.18b). Obviously, equations (6.16) con-
vert (6.22) into identities and, consequently, the (standard) paracommutation relations (6.20)
provide a solution of (6.23). On the base of (6.23) or other similar equations that can be
obtained by generalizing the ones in (6.10)–(6.12), further research on particular classes of
trilinear commutation relations can be done, but, however, this is not a subject of the present
work.
Let us now pay attention to the fact that equations (6.10), (6.11) and (6.12) are generally
different (regardless of existence of some connections between their solutions). The cause for
this being that the momentum operators for the Lagrangians L′, L′′ and L′′′ are generally
different unless some additional restrictions are added to the Lagrangian formalism (see
Sect. 4). A necessary and sufficient condition for (6.10)–(6.12) to be identical is
[a±l , [a
m , a
m]−ε − [a+m, a†−m ]−ε] = 0, (6.24)
which certainly is valid if the condition (4.9′), viz.
[a†+m , a
m]−ε − [a+m, a†−m ]−ε = 0, (6.25)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 25
ensuring the uniqueness of the momentum operator are, holds. If one adopts the standard
bilinear commutation relations (6.13), then (6.25), and hence (6.24), is identically valid, but
in the framework of, e.g., the paracommutation relations (6.16) (or (6.20) in other form) the
equations (6.25) should be postulated to ensure uniqueness of the momentum operator and
therefore of the field equations.
On the base of (6.10) or (6.11) one may invent other types of commutation relations,
which will not be investigated in this paper because we shall be interested mainly in the
case when (6.10), (6.11) and (6.12) are identical (see (6.24)) or, more generally, when the
dynamical variables are unique in the sense pointed in Sect. 4.
6.2. Restrictions related to the charge operator
The consequences of the Heisenberg relations (5.2), involving the charge operator for a charged
field, q 6= 0 (and hence τ = 0 – see (3.7)), will be examined in this subsection. In terms of
creation and annihilation operators it is equivalent to [1, 13–15]
[a±s (k), Q] = qa±s (k) [a†±s (k), Q] = −qa†±s (k), (6.26)
the values of the polarization indices being specified by (6.5). Substituting here (3.10), we
see that, for a charged field, the field equations for the Lagrangians L′, L′′ and L′′′ (see
Sect. 3) respectively are:
2j+1−δ0m(1−δ0j )
d3p{[a±s (k), a
t (p) ◦ a−t (p)− εa
t (p) ◦ a+t (p)] − a±s (k)δstδ3(k − p)} = 0
(6.27a)
2j+1−δ0m(1−δ0j )
d3p{[a†±s (k), a
t (p) ◦ a−t (p)− εa
t (p) ◦ a+t (p)] + a†±s (k)δstδ3(k − p)} = 0
(6.27b)
2j+1−δ0m(1−δ0j )
d3p{[a±s (k), a+t (p) ◦ a
t (p)− εa−t (p) ◦ a
t (p)] + a
s (k)δstδ
3(k − p)} = 0
(6.28a)
2j+1−δ0m(1−δ0j )
d3p{[a†±s (k), a+t (p) ◦ a
t (p)− εa−t (p) ◦ a
t (p)] − a†±s (k)δstδ3(k − p)} = 0
(6.28b)
2j+1−δ0m(1−δ0j )
d3p{[a±s (k), [a
t (p), a
t (p)]ε − [a+t (p), a
t (p)ε] − 2a±s (k)δstδ3(k − p)} = 0
(6.29a)
2j+1−δ0m(1−δ0j )
d3p{[a†±s (k), [a
t (p), a
t (p)]ε − [a+t (p), a
t (p)ε] + 2a
s (k)δstδ
3(k − p)}=0.
(6.29b)
Using (6.27)–(6.29) and (6.8), with η = ε = −1, or simply (6.26), one can easily verify
the validity of the equations
[ P̃µ, Q̃] = 0 [ L̃µν , Q̃] = 0
[ S̃µν , Q̃] = 0 [M̃µν , Q̃] = 0,
(6.30)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 26
where the operators P̃µ, Q̃, S̃µν , L̃µν and M̃µν are calculated from one and the same
initial Lagrangian according to (3.9)–(3.12). This result confirms the validity of assertion (ii)
before (5.24) for the fields considered.
Following the above considerations, concerning the momentum operator, we shall now
replace the systems of integral equations (6.27)–(6.29) with respectively the following stronger
systems of algebraic equations (by equating to zero the integrands in (6.27)–(6.29)):
a±l , a
m ◦ a−m − εa†−m ◦ a+m
− δlma±l = 0 (6.31a)
, a†+m ◦ a−m − εa†−m ◦ a+m
+ δlma
= 0 (6.31b)
a±l , a
m ◦ a†−m − εa−m ◦ a†+m
+ δlma
l = 0 (6.32a)
, a+m ◦ a†−m − εa−m ◦ a†+m
− δlma†±l = 0 (6.32b)
a±l , [a
m , a
m]ε − [a+m, a†−m ]ε
− 2δlma±l = 0 (6.33a)
, [a†+m , a
m]ε − [a+m, a†−m ]ε
+ 2δlma
= 0. (6.33b)
These trilinear commutation relations are similar to (6.10)–(6.12) and, consequently, can be
treated in analogous way.
By invoking (6.8), it is a simple algebra to be proved that the standard bilinear commu-
tation relations (6.13) convert (6.31)–(6.33) into identities. Thus (6.13) are stronger version
of (6.31)–(6.33) and, in this sense, any type of commutation relations, which provide a
solution of (6.31)–(6.33) and is compatible with (6.13), is a suitable candidate for general-
izing (6.13). To illustrate that idea, we shall proceed with (6.33) in a way similar to the
‘derivation’ of the paracommutation relations from (6.12).
Obviously, the equations (cf. (6.14) with τ = 0, as now q 6= 0)
, [a+m, a
m ]ε] + δlma
m = 0 (6.34a)
, [a†+m , a
m]ε] − δlma±m = 0 (6.34b)
and their Hermitian conjugate provide a solution of (6.33), but, as a direct calculations shows,
they do not agree with the standard (anti)commutation relations (6.13). A solution of (6.33)
compatible with (6.13) is given by the equations (6.15), with τ = 0 as the field considered is
charged one — see (3.7). Therefore equations (6.16), with τ = 0, also provide a compatible
with (6.13) solution of (6.33), from where immediately follows that the paracommutation
relations (6.20), with τ = 0, convert (6.33) into identities. To conclude, we can say that the
paracommutation relations (6.20), in particular their special case (6.13), ensure the simul-
taneous validity of the Heisenberg relations (5.1) and (5.2) for free scalar, spinor and vector
fields.
Similarly to (6.22), one may generalize (6.33) to
a+l , [a
m , a
n ]ε − [a+m, a†−n ]ε
− 2δlna+m = 0 (6.35a)
a−l , [a
m , a
n ]ε − [a+m, a†−n ]ε
− 2δlma−n = 0. (6.35b)
which equations agree with (6.13), (6.15), (6.16) and (6.20), but generally do not agree
with (6.22), with τ = 0, unless the equations (6.16), with τ = 0, hold.
More generally, we can assert that (6.33) and (6.12), with τ = 0, hold simultaneously if
and only if (6.15), with τ = 0, is fulfilled. From here, again, it follows that the paracommu-
tation relations ensure the simultaneous validity of (5.1) and (5.2).
Let us say now some words on the uniqueness problem for the Heisenberg equations
involving the charge operator. The systems of equations (6.31)–(6.33) are identical iff
a±l , [a
m , a
m]−ε + [a
m ]−ε
= 0, (6.36)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 27
which, in particular, is satisfied if the condition
[a†+m , a
m]−ε + [a
m ]−ε = 0, (6.37)
ensuring the uniqueness of the charge operator (see (4.10′)), is valid. Evidently, equa-
tions (6.36) and (6.24) are compatible iff
a+l , [a
m , a
a−l , [a
m , a
= 0 (6.38)
which is a weaker form of (4.15) ensuring simultaneous uniqueness of the momentum and
charge operator.
6.3. Restrictions related to the angular momentum operator(s)
It is now turn to be investigated the restrictions on the creation and annihilation operators
that follow from the Heisenberg relations (5.3) concerning the angular momentum operator.
They can be obtained by inserting the equations (3.11) and (3.12) into (5.3). As pointed
in Sect. 5, the resulting equalities, however, depend not only on the particular Lagrangian
employed, but also on the geometric nature of the field considered; the last dependence
being explicitly given via (5.25) and the polarization functions σss
µν (k) and l
ss′m±
µν (k) (see
also (3.14)).
Consider the terms containing derivatives in (5.3),
L̃orµν := i~
ϕ̃i(x). (6.39)
If ϕ̃
(k) denotes the Fourier image of ϕ̃i(x), i.e.
ϕ̃i(x) = Λ
d4ke−
kµxµ ϕ̃
(k), (6.40)
with Λ being a normalization constant, then the Fourier image of (6.39) is
(k). (6.41)
Comparing this expression with equations (3.12), we see that the terms containing derivatives
in (3.12) should be responsible for the term (6.39) in (5.3).20 For this reason, we shall suppose
that the momentum operator M̃µν admits a representation
M̃µν = M̃orµν + M̃spµν (6.42)
such that the operators M̃orµν and M̃
µν satisfy the relations (5.4) and (5.5), respectively.
Thus we shall replace (5.3) with the stronger system of equations (5.4)–(5.5). Besides,
we shall admit that the explicit form of the operatorsM̃orµν and M̃
µν are given via (5.13)
and (5.12) for the fields investigated in the present work.
Let us consider at first the ‘orbital’ Heisenberg relations (5.4), which is independent
of the particular geometrical nature of the fields studied. Substituting (5.13) and (6.40)
into (5.4), using that ϕ̃
(±k), with k2 = m2c2, is a linear combination of ã±s (k) with classical,
not operator-valued, functions of k as coefficients [1, 13–15] and introducing for brevity the
operator
ωµν(k) := kµ
, (6.43)
20 The terms proportional to the momentum operator in (3.12) disappear if the creation and annihilation
operators (2.9) in Heisenberg picture are employed (see also [13–15]).
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 28
we arrive to the following integro-differential systems of equations:
2j+1−δ0m(1−δ0j )
(−ωµν(p) + ωµν(q))([ã±s (k), ã
t (p) ◦ ã−t (q)
− εã†−t (p) ◦ ã+t (q)] )
m2c2+p2
= 2(1 + τ)ωµν(k)(ã
s (k)) (6.44a)
2j+1−δ0m(1−δ0j )
(−ωµν(p) + ωµν(q))([ã† ±s (k), ã
t (p) ◦ ã−t (q)
− εã†−t (p) ◦ ã+t (q)] )
m2c2+p2
= 2(1 + τ)ωµν(k)(ã
s (k)) (6.44b)
2j+1−δ0m(1−δ0j )
(−ωµν(p) + ωµν(q))([ã±s (k), ã+t (p) ◦ ã
t (q)
− εã−t (p) ◦ ã
t (q)] )
m2c2+p2
= 2(1 + τ)ωµν(k)(ã
s (k)) (6.45a)
2j+1−δ0m(1−δ0j )
(−ωµν(p) + ωµν(q))([ã† ±s (k), ã+t (p) ◦ ã
t (q)
− εã−t (p) ◦ ã
t (q)] )
m2c2+p2
= 2(1 + τ)ωµν(k)(ã
s (k)) (6.45b)
2j+1−δ0m(1−δ0j )
(−ωµν(p) + ωµν(q))([ã±s (k), [ã
t (p), ã
t (q)]ε
+ [ã+t (p), ã
t (q)]ε] )
m2c2+p2
= 4(1 + τ)ωµν(k)(ã
s (k)) (6.46a)
2j+1−δ0m(1−δ0j )
(−ωµν(p) + ωµν(q))([ã† ±s (k), [ã
t (p), ã
t (q)]ε
+ [ã+t (p), ã
t (q)]ε] )
m2c2+p2
= 4(1 + τ)ωµν(k)(ã
s (k)), (6.46b)
where k0 =
m2c2 + k2 is set after the differentiations are performed (see (6.43)). Follow-
ing the procedure of the previous considerations, we replace the integro-differential equa-
tions (6.44)–(6.46) with the following differential ones:
(−ω◦µν(m) + ω◦µν(n))([ã±l , ã
m ◦ ã−n − εã†−m ◦ ã+n ] )
= 2(1 + τ)δlmω
µν(l)(ã
l ) (6.47a)
(−ω◦µν(m)+ω◦µν(n))([ã
l , ã
m ◦ ã−n − εã†−m ◦ ã+n ] )
= 2(1+ τ)δlmω
µν(l)(ã
l ) (6.47b)
(−ω◦µν(m) + ω◦µν(n))([ã±l , ã
m ◦ ã†−n − εã−m ◦ ã†+n ] )
= 2(1 + τ)δlmω
µν(l)(ã
) (6.48a)
(−ω◦µν(m)+ω◦µν(n))([ã
l , ã
m ◦ ã†−n − εã−m ◦ ã†+n ] )
= 2(1+ τ)δlmω
µν(l)(ã
l ) (6.48b)
(−ω◦µν(m) + ω◦µν(n))([ã±l , [ã
m , ã
n ]ε + [ã
m, ã
n ]ε] )
= 4(1 + τ)δlmω
µν(l)(ã
l ) (6.49a)
(−ω◦µν(m) + ω◦µν(n))([ã
, [ã†+m , ã
n ]ε + [ã
m, ã
n ]ε] )
= 4(1 + τ)δlmω
µν(l)(ã
(6.49b)
where we have set (cf. (6.43))
ω◦µν(l) := ωµν(k) = kµ
if l = (s,k) (6.50)
and k0 =
m2c2 + k2 is set after the differentiations are performed.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 29
Remark. Instead of (6.47)–(6.49) one can write similar equations in which the operator
−ω◦µν(m) or +ω◦µν(n) is deleted and the factor +12 or −
, respectively, is added on their right
hand sides. These manipulations correspond to an integration by parts of some of the terms
in (6.44)–(6.46).
The main difference of the obtained trilinear relations with respect to the previous ones
considered above is that they are partial differential equations of first order.
The relations (6.49) agree with the equations (6.16) in a sense that if (6.16) hold,
then (6.49) become identically valid. Indeed, since
(−ω◦µν(m) + ω◦µν(n))(ã±mδln)
= −2δlmω◦µν(m)(ã±m)
(−ω◦µν(m) + ω◦µν(n))(ã±n δlm)
= +2δlmω
µν(m)(ã
(6.51)
due to (6.50), (6.43) and the equality
dδ(x)
f(x) = −δ(x)df(x)
for a C1 function f , the
application of the operator (−ω◦µν(m) + ω◦µν(n)) to (6.16) and subsequent setting n = m
entails (6.49). In particular, this means that the paracommutation relations (6.20) and,
moreover, the standard (anti)commutation relations (6.13) convert (6.49) into identities.
Therefore the ‘orbital’ Heisenberg relations (5.4) hold for scalar, spinor and vector fields
satisfying the bilinear or para commutation relations.
It should be noted, the paracommutation relations are not the only trilinear commutation
relations that are solutions of (6.49). As an example, we shall present the trilinear relations
a+l , [a
a+l , [a
m , a
= −(1 + τ)δlna+m (6.52a)
a−l , [a
a−l , [a
m , a
= +(1 + τ)δlma
n , (6.52b)
which reduce to (6.14) for n = m, do not agree with (6.13), but convert (6.49) into identities
(see (6.51)). Other example is provided by the equations (6.22), which are compatible with
the paracommutation relations and, as a result of (6.51), convert (6.49) into identities. Prima
facie one may suppose that any solution of (6.12) provides a solution of (6.49), but this is
not the general case. A counterexample is provided by the commutation relations
a±l , [a
m , a
n ]ε + [a
± 2(1 + τ)δlna±m = 0, (6.53)
which reduce to (6.12) for n = m, satisfy (6.49) with ã+l for ã
l , and do not satisfy (6.49)
with ã−l for ã
l (see (6.51) and cf. (6.22)).
From (5.13) follows that the operator M̃orµν is independent of the Lagrangian L′, L′′ or
L′′′ one starts off if and only if (see (4.11))
(−ω◦µν(m) + ω◦µν(n))
[ã†+m , ã
n ]−ε − [ã+m, ã†−n ]−ε
= 0. (6.54)
This condition ensures the coincidence of the systems of equations (6.47), (6.48) and (6.49)
too. However, the following necessary and sufficient condition for the coincidence of these
systems is expressed by the weaker equations
(−ω◦µν(m) + ω◦µν(n))
ã±l , [ã
m , ã
n ]−ε − [ã+m, ã†−n ]−ε
= 0. (6.55)
It is now turn to be considered the ‘spin’ Heisenberg relations (5.5).
Recall, the field operators ϕi for the fields considered here admit a representation [13–15]
ϕi = Λ
i (p)a
t (p) + v
i (p)a
t (p)
, (6.56)
where Λ is a normalization constant and v
i (p) are classical, not operator-valued, complex
or real functions which are linearly independent. The particular definition of v
i (p) depends
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 30
on the geometrical nature of ϕi and can be found in [13–15] (see also [1]), where the reader
can find also a number of relations satisfied by v
i (p). Here we shall mention only that
i (p) = 1 for a scalar field and v
i (p) = v
i (p) =: v
i(p) = (v
i(p))
∗ for a vector field.
The explicit form of the polarization functions σ
ss′,±
µν (k) and l
ss′,±
µν (k) (see Sect. 3, in
particular (3.14)) through v
i (k) are [13–15]:
µν (k) =
(−1)j
j + δj0
i (k))
∗Iii′µνv
µν (k) =
(−1)j
2j + δj0
i (k))
←−−−−−→
←−−−−−→
i (k),
(6.57)
with an exception that σ
ss′,±
0a (k) = σ
ss′,±
a0 (k) = 0, a = 1, 2, 3, for a spinor field, j =
, [14].
Evidently, the equations (3.14) follow from the mentioned facts (see also (5.25)).
Substituting (6.56) and (5.12) into (5.5), we obtain the following systems of integral
equations (corresponding respectively to the Lagrangians L′, L′′ and L′′′):
(−1)j+1j
1 + τ
s,s′,t
i (p)
µν (k) + l
ss′,−
µν (k))[a
t (p), a
s (k) ◦ a−s′(k)]
+ (σss
µν (k) + l
ss′,+
µν (k))[a
t (p), a
s (k) ◦ a+s′(k)]
d3pIi
(p)a±t (p) (6.58)
(−1)j+1j
1 + τ
s,s′,t
i (p)
µν (k) + l
ss′,+
µν (k))[a
t (p), a
(k) ◦ a†−s (k)]
+ (σss
µν (k) + l
ss′,−
µν (k))[a
t (p), a
(k) ◦ a†+s (k)]
d3pIi
(p)a±t (p) (6.59)
(−1)j+1j
2(1 + τ)
s,s′,t
i (p)
µν (k) + l
ss′,−
µν (k))
a±t (p), [a
s (k), a
(k)]ε
+ (σss
µν (k) + l
ss′,+
µν (k))
a±t (p), [a
s (k), a
(k)]ε
d3pIi
(p)a±t (p).
(6.60)
For the difference of all previously considered systems of integral equations, like (6.2)–
(6.4), (6.27)–(6.29) and (6.44)–(6.46), the systems (6.58)–(6.60) cannot be replaced by ones
consisting of algebraic (or differential) equations. The cause for this state of affairs is that
in (6.58)–(6.60) enter polarization modes with arbitrary s and s′ and, generally, one cannot
‘diagonalize’ the integrand(s) with respect to s and s′; moreover, for a vector field, the modes
with s = s′ are not presented at all (see (3.14)). That is why no commutation relations can
be extracted from (6.58)–(6.60) unless further assumptions are made. Without going into
details, below we shall sketch the proof of the assertion that the commutation relations (6.16)
convert (6.60) into identities for massive spinor and vector fields.21 In particular, this entails
that the paracommutation and the bilinear commutation relations provide solutions of (6.60).
Let (6.16) holds. Combining it with (6.60), we see that the latter splits into the equations
21 The equations (6.58)–(6.60) are identities for scalar fields as for them Iµν = 0 and v
i (k) = 1, which
reflects the absents of spin for these fields.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 31
(−1)jj
1 + τ
i (p)
τ(σst,−µν (p) + l
µν (p)) + ε(σ
µν (p) + l
µν (p))
a+s (p),
(p)a+s (p) (6.61a)
(−1)j+1j
1 + τ
i (p)
(σts,−µν (p) + l
µν (p)) + ετ(σ
µν (p) + l
µν (p))
a−s (p),
i′ (p)a
s (p). (6.61b)
Inserting here (6.57), we see that one needs the explicit definition of v
i (k) and formulae for
sums like ρii′(k) :=
i (k)(v
(k))∗, which are specific for any particular field and can
be found in [13–15]. In this way, applying (5.25), (3.7) and the mentioned results from [13–15],
one can check the validity of (6.61) for massive fields in a way similar to the proof of (5.3)
in [13–15] for scalar, spinor and vector fields, respectively.
We shall end the present subsection with the remark that the equations (4.17) and (4.18),
which together with (4.15) ensure the uniqueness of the spin and orbital operators, are
sufficient conditions for the coincidence of the equations (6.58), (6.59) and (6.60).
7. Inferences
To begin with, let us summarize the major conclusions from Sect. 6. Each of the Heisenberg
equations (5.1)–(5.3), the equations (5.3) being split into (5.4) and (5.5), induces in a natural
way some relations that the creation and annihilation operators should satisfy. These rela-
tions can be chosen as algebraic trilinear ones in a case of (5.1) and (5.2) (see (6.10)–(6.12)
and (6.31)–(6.33), respectively). But for (5.4) and (5.5) they need not to be algebraic and
are differential ones in the case of (5.4) (see (6.47)–(6.49)) and integral equations in the case
of (5.5) (see (6.58)–(6.60)). It was pointed that the cited relations depend on the initial
Lagrangian from which the theory is derived, unless some explicitly written conditions hold
(see (6.24), (6.37) and (6.55)); in particular, these conditions are true if the equations (4.9)–
(4.13), ensuring the uniqueness of the corresponding dynamical operators, are valid. Since
the ‘charge symmetric’ Lagrangians (3.4) seem to be the ones that best describe free fields,
the arising from them (commutation) relations (6.12), (6.33), (6.49) and (6.60) were stud-
ied in more details. It was proved that the trilinear commutation relations (6.16) convert
them into identities, as a result of which the same property possess the paracommutation
relations (6.20) and, in particular, the bilinear commutation relations (6.13). Examples of tri-
linear commutation relations, which are neither ordinary nor para ones, were presented; some
of them, like (6.14), (6.34) and (6.52), do not agree with (6.13) and other ones, like (6.16),
(6.22) and (6.35), generalize (6.20) and hence are compatible with (6.13). At last, it was
demonstrated that the commutators between the dynamical variables (see (5.15)–(5.23)) are
uniquely defined if a Heisenberg relation for one of the operators entering in it is postulated.
The chief aim of the present section is to be explored the problem whether all of the
reasonable conditions, mentioned in the previous sections and that can be imposed on the
creation and annihilation operators, can hold or not hold simultaneously. This problem is
suggested by the strong evidences that the relations (5.1)–(5.3) and (5.15)–(5.23), with a
possible exception of (5.3) (more precisely, of (5.5)) in the massless case, should be valid in
a realistic quantum field theory [1, 3, 7, 8, 11, 12]. Besides, to the arguments in loc. cit., we
shall add the requirement for uniqueness of the dynamical variables (see Sect. 4).
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 32
As it was shown in Sect. 6, the relations (5.1), (5.2), (5.4) and (5.5) are compatible if
one starts from a charge symmetric Lagrangian (see (3.4)), which best describes a free field
theory; in particular, the commutation relations (6.16) (and hence (6.20) and (6.13)) ensure
their simultaneous validity.22 For that reason, we shall investigate below only commutation
relations for which (5.1), (5.2), (5.4) and (5.5) hold. It will be assumed that they should be
such that the equations (6.10)–(6.12), (6.31)–(6.33), (6.47)–(6.49) and (6.58)–(6.60), respec-
tively, hold.
Consider now the problem for the uniqueness of the dynamical variables and its consis-
tency with the commutation relations just mentioned for a charged field. It will be assumed
that this uniqueness is ensured via the equations (4.9)–(4.11).
The equation (4.15), viz.
[a†±m , a
m]−ε = 0, (7.1)
is a necessary and sufficient conditions for the uniqueness of the momentum and charge
operators (see Sect. 4 and the notation introduced at the beginning of Sect. 6). Before
commenting on this relation, we would like to derive some consequences of it. Applying
consequently (6.8) for η = −ε, (7.1) and the identity
[A,B ◦ C]+ = [A,B]η ◦ C − ηB ◦ [A,C]−η η = ±1 (7.2)
for η = +ε,−ε, we, in view of (7.1), obtain
[a+m, [a
m ]ε] = [a
m , [a
m]−ε]+ = (1− ε)[a†−m , a+m]ε ◦ a+m
[a−m, [a
m , a
m]ε] = ε[a
m , [a
m]−ε]+ = ε(1 − ε)[a†+m , a−m]ε ◦ a−m.
(7.3)
Forming the sum and difference of (6.12a), for τ = 0, and (6.33a), we see that the system
of equations they form is equivalent to
[a+l , [a
m , a
m]ε] = 0 [a
l , [a
m ]ε] = 0 (7.4a)
, [a+m, a
m ]ε] + 2δlma
= 0 [a−
, [a†+m , a
m]ε] − 2δlma−l = 0. (7.4b)
Combining (7.4b), for l = m, with (7.3), we get
(1− ε)[a†−m , a+m]ε ◦ a+m + 2a+m = 0 ε(1− ε)[a†+m , a−m]ε ◦ a−m − 2a−m = 0. (7.5)
Obviously, these equations reduce to
a±m = 0 (7.6)
for bose fields as for them ε = +1 (see (3.7)). Since the operators (7.6) describe a completely
unobservable field, or, more precisely, an absence of a field at all, the obtained result means
that the theory considered cannot describe any really existing physical field with spin j = 0, 1.
Such a conclusion should be regarded as a contradiction in the theory. For fermi fields, j = 1
and ε = −1, the equations (7.5) have solutions different from (7.6) iff a±m are degenerate
operators, i.e. with no inverse ones, in which case (7.4a) is a consequence of (7.5) and (7.1)
(see (6.8) and (7.3) too).
The source of the above contradiction is in the equation (7.1), which does not agree with
the bilinear commutation relations (6.13) and contradicts to the existing correlation between
creation and annihilation of particles with identical characteristics (m = (t,p) in our case)
as (7.1) can be interpreted physically as mutual independence of the acts of creation and
annihilation of such particles [1, § 10.1].
At this point, there are two ways for ‘repairing’ of the theory. On one hand, one can
forget about the uniqueness of the dynamical variables (in a sense of Sect. 4), after which
22 The special case(s) when (5.5) may not hold for a massless field will not be considered below.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 33
the formalism can be developed by choosing, e.g., the charge symmetric Lagrangians (3.4)
and following the usual Lagrangian formalism; in fact, this is the way the parafield theory is
build [16,18]. On another hand, one may try to change something at the ground of the theory
in such a way that the uniqueness of the dynamical variables to be ensured automatically.
We shall follow the second method. As a guiding idea, we shall have in mind that the
bilinear commutation relations (6.13) and the related to them normal ordering procedure
provide a base for the present-day quantum field theory, which describes sufficiently well
the discovered elementary particles/fields. On this background, an extensive exploration of
commutation relations which are incompatible with (6.13) is justified only if there appear
some evidences for fields/particles that can be described via them. In that connection it
should be recalled [17, 18], it seems that all known particles/fields are described via (6.13)
and no one of them is a para particle/field.
Using the notation introduced at the beginning of Sect. 4, we shall look for a linear
mapping (operator) E on the operator space over the system’s Hilbert space F of states such
E(D′) = E(D′′). (7.7)
As it was shown in Sect. 4, an example of an operator E is provided by the normal ordering
operator N . Therefore an operator satisfying (7.7) always exists. To any such operator E
there corresponds a set of dynamical variables defined via
D = E(D′). (7.8)
Let us examine the properties of the mapping E that it should possess due to the re-
quirement (7.7).
First of all, as the operators of the dynamical variables should be Hermitian, we shall
require
= E(B†) (7.9)
for any operator B, which entails
D† = D, (7.10)
due to (3.9)–(3.12) and (7.8).
As in Sect. 4, we shall replace the so-arising integral equations with corresponding alge-
braic ones. Thus the equations (4.5)–(4.20) remain valid if the operator E is applied to their
left hand sides.
Consider the general case of a charged field, q 6= 0. So, the analogue of (4.15) reads
[a†±m , a
= 0, (7.11)
which equation ensures the uniqueness of the momentum and charge operators. Respectively,
the condition (4.11) transforms into
(−ω◦µν(m) + ω◦µν(n))
E([a†+m , a−n ]−ε)− E([a+m, a†−n ]−ε)
= 0, (7.12)
which, by means of (7.11) can be rewritten as (cf. (4.16))
ω◦µν(n)
E([a†+m , a−n ]−ε)− E([a+m, a†−n ]−ε)
= 0. (7.13)
At the end, equations (4.17) and (4.18) now should be written as
µν (k) E
[a†+s (k), a
(k)]−ε
+ σss
µν (k) E
[a†−s (k), a
(k)]−ε
= 0 (7.14)
µν (k) E
[a†+s (k), a
(k)]−ε
+ lss
µν (k) E
[a†−s (k), a
(k)]−ε
= 0. (7.15)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 34
These equations can be satisfied if we generalize (7.11) to (cf. (4.20))
[a†±s (k), a
(k)]−ε
= 0 (7.16)
for any s and s′. At last, the following stronger version of (7.16)
[a†±m , a
n ]−ε
= 0, (7.17)
for any m = (t,p) and n = (r, q), ensures the validity of (7.14) and (7.15) and thus of the
uniqueness of all dynamical variables.
It is time now to call attention to the possible commutation relations. The replacement
D′, D′′, D′′′ 7→ D := E(D′) = E(D′′) = E(D′′′) results in corresponding changes in the
whole of the material of Sect. 6. In particular, the systems of commutation relations (6.10)–
(6.12), (6.31)–(6.33), (6.47)–(6.49) and (6.58)–(6.60) should be replaced respectively with:23
a±l , E(a
m ◦ a−m) + ε E(a†−m ◦ a+m)
± (1 + τ)δlma±l = 0 (7.18)
a±l , E(a
m ◦ a−m)− ε E(a†−m ◦ a+m)
− δlma±l = 0 (7.19)
(−ω◦µν(m) + ω◦µν(n))([ã±l , E(ã
m ◦ ã−n )− ε E(ã†−m ◦ ã+n )] )
= 2(1 + τ)δlmω
µν(l)(ã
(7.20)
(−1)j+1j
1 + τ
s,s′,t
i (p)
µν (k) + l
ss′,−
µν (k))[a
t (p), E(a†+s (k) ◦ a−s′(k))]
+ (σss
µν (k) + l
ss′,+
µν (k))[a
t (p), E(a†−s (k) ◦ a+s′(k))]
d3pIi
i′ (p)a
t (p).
(7.21)
Due to the uniqueness conditions (7.11)–(7.14), one can rewrite the terms E(a†±m ◦ a∓m)
in (7.18)–(7.21) in a number of equivalent ways; e.g. (see (7.11))
E(a†±m ◦ a∓m) = ε E(a∓m ◦ a†±m ) =
E([a†±m , a∓m]ε). (7.22)
Consider the general case of a charged field, q 6= 0 (and hence τ = 0). The system of
equations (7.18)–(7.19) is then equivalent to
, E(a†±m ◦ a∓m)
= 0 (7.23a)
, E(a†−m ◦ a+m)
+ εδlma
= 0 (7.23b)
a−l , E(a
m ◦ a−m)
− δlma−l = 0. (7.23c)
These (commutation) relations ensure the simultaneous fulfillment of the Heisenberg rela-
tions (5.1) and (5.2) involving the momentum and charge operators, respectively. To ensure
also the validity of (7.20), with τ = 0, and, consequently, of (5.4), we generalize (7.23) to
a±l , E(a
m ◦ a∓n )
= 0 (7.24a)
, E(a†−m ◦ a+n )
+ εδlma
n = 0 (7.24b)
, E(a†+m ◦ a−n )
− δlma−n = 0, (7.24c)
for any l = (s,k), m = (t,p) and n = (t, q) (see also (6.51)). In the way pointed in
Sect. 6, one can verify that (7.24) for any l = (s,k), m = (t,p) and n = (r,p) entails (7.21)
and hence (5.5). At last, to ensure the validity of all of the mentioned conditions and a
23 To save some space, we do not write the Hermitian conjugate of the below-written equations.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 35
suitable transition to a case of Hermitian field, for which q = 0 and τ = 1 (see (3.7)), we
generalize (7.24) to
a+l , E(a
m ◦ a−n )
+ τδlna
m = 0 (7.25a)
a−l , E(a
m ◦ a+n )
− ετδlna−m = 0 (7.25b)
, E(a†−m ◦ a+n )
+ εδlma
n = 0, (7.25c)
, E(a†+m ◦ a−n )
− δlma−n = 0 (7.25d)
where l, m and n are arbitrary. As a result of (7.17), which we assume to hold, and τa
τa±l (see (3.7)), the equations (7.25a) and (7.25c) (resp. (7.25b) and (7.25d)) become identical
when τ = 1 (and hence a
l = a
l ); for τ = 0 the system (7.25) reduces to (7.24). Recalling
that ε = (−1)2j (see (3.7)), we can rewrite (7.25) in a more compact form as
a±l , E(a
m ◦ a∓n )
+ (±1)2j+1τδlna±m = 0 (7.26a)
a±l , E(a
m ◦ a±n )
− (∓1)2j+1τδlma±n = 0. (7.26b)
Since the last equation is equivalent to (see (7.17)) and use that ε = (−1)2j)
, E(a±m ◦ a†∓n )
+ (±1)2j+1δlna±m = 0, (7.26b′)
it is evident that the equations (7.26a) and (7.26b) coincide for a neutral field.
Let us draw the main moral from the above considerations: the equations (7.17) are
sufficient conditions for the uniqueness of the dynamical variables, while (7.26) are such
conditions for the validity of the Heisenberg relations (5.1)–(5.5), in which the dynamical
variables are redefined according to (7.8). So, any set of operators a±
and E , which are
simultaneous solutions of (7.17) and (7.26), ensure uniqueness of the dynamical variables
and at the same time the validity of the Heisenberg relations.
Consider the uniqueness problem for the solutions of the system of equations consisting
of (7.17)and (7.26). Writing (7.17) as
E(a†±m ◦ a∓n ) = ε E(a∓n ◦ a†±m ) =
E([a†±m , a∓n ]ε), (7.27)
which reduces to (7.22) for n = m, and using ε = (−1)2j (see (3.7)), one can verify that (7.26)
is equivalent to
a+l , E([a
n ]ε)
+ 2δlna
m = 0 (7.28a)
a+l , E([a
m , a
n ]ε)
+ 2τδlna
m = 0 (7.28b)
a−l , E([a
n ]ε)
− 2τδlma−n = 0 (7.28c)
a−l , E([a
m , a
n ]ε)
− 2δlma−n = 0. (7.28d)
The similarity between this system of equations and (6.16) is more than evident: (7.28) can
be obtained from (6.16) by replacing [·, ·]ε with E([·, ·]ε).
As it was said earlier, the bilinear commutation relations (6.13) and the identification of
E with the normal ordering operator N ,
E = N , (7.29)
convert (7.27)–(7.28) into identities; by invoking (6.8), for η = −ε, the reader can check
this via a direct calculation (see also (4.23)). However, this is not the only possible solution
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 36
of (7.27)–(7.28). For example, if, in the particular case, one defines an ‘anti-normal’ ordering
operator A as a linear mapping such that
A(a+m ◦ a†−n ) := εa†−n ◦ a+m A(a†+m ◦ a−n ) := εa−n ◦ a†+m
A(a−m ◦ a†+n ) := a−m ◦ a†+n A(a†−m ◦ a+n ) := a†−m ◦ a+n ,
(7.30)
then the bilinear commutation relations (6.13) and the setting E = A provide a solution
of (7.27)–(7.28); to prove this, apply (6.8) for η = −ε. Evidently, a linear combination of
N and A, together with (6.13), also provides a solution of (7.27)–(7.28).24 Other solution
of the same system of equations is given by E = id and operators a±
satisfying (6.16), in
particular the paracommutation relations (6.20), and a
m ◦ a,∓n = εa,∓n ◦ a†±m . The problem
for the general solution of (7.27)–(7.28) with respect to E and a±l is open at present.
Let us introduce the particle and antiparticle number operators respectively by (see (7.27),
(7.9) and (3.16))
Nl :=
= E(a+
◦ a†−
) = (Nl)† =: N †l
†Nl :=
l , a
= E(a†+l ◦ a
l ) = (
†Nl)† =: †Nl†.
(7.31)
As a result of the commutation relations (7.28), with n = m, they satisfy the equations25
[Nl, a+m]− = δlma+l (7.32a)
[ †Nl, a+m]− = τδlma+l (7.32b)
[Nl, a†+m ]− = τδlma
l (7.32c)
[ †Nl, a†+m ]− = δlma
l . (7.32d)
Combining (3.9)–(3.12) and (5.11)–(5.13) with (7.8), (7.27) and (7.31), we get the following
expressions for the operators of the (redefined) dynamical variables:
P̃µ =
1 + τ
m2c2+k2
(Nl + †Nl) l = (s,k) (7.33)
Q̃ = q
(−Nl + †Nl) (7.34)
S̃µν =
(−1)j−1/2j~
1 + τ
{εσmn,+µν Nnm + σmn,−µν †Nmn)}
m=(s,k)
n=(s′,k)
(7.35)
L̃µν = x0µ P̃ν − x0 ν P̃µ +
(−1)j−1/2j~
1 + τ
{εlmn,+µν Nnm + lmn,−µν †Nmn)}
m=(s,k)
n=(s′,k)
2(1 + τ)
−ω◦µν(l) + ω◦µν(m)
(Nl + †Nl)
m=l=(s,k)
(7.36)
M̃spµν =
(−1)j−1/2j~
1 + τ
{ε(σmn,+µν + lmn,+µν )Nnm + (σmn,−µν + lmn,−µν ) †Nmn)}
m=(s,k)
n=(s′,k)
(7.37)
M̃orµν =
2(1 + τ)
−ω◦µν(l) + ω◦µν(m)
(Nl + †Nl)
m=l=(s,k)
. (7.38)
24 If we admit a±
to satisfy the ‘anomalous” bilinear commutation relations (8.27) (see below), i.e. (6.13)
with ε for −ε and (±1)2j for (±1)2j+1, then E = N , A also provides a solution of (7.27)–(7.28). However,
as it was demonstrated in [13–15], the anomalous commutation relations are rejected if one works with the
charge symmetric Lagrangians (3.4).
25 The equations (7.32a) and (7.32b) correspond to (7.28a) and (7.28b), respectively, and (7.32c) and (7.32d)
correspond to the Hermitian conjugate to (7.28c) and (7.28d), respectively.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 37
Here ω◦µν(l) is defined via (6.50), we have set
σmn,±µν := σ
ss′,±
µν (k) l
µν := l
ss′,±
µν (k) for m = (s,k) and n = (s
′,k), (7.39)
and (see (7.27))
Nlm :=
[a+l , a
= E(a+l ◦ a
m ) = (Nml)† =: N
†Nlm :=
l , a
= E(a†+l ◦ a
m) = (
†Nml)† =: †Nml†
(7.40)
are respectively the particle and antiparticle transition operators (cf. [26, sec. 1] in a case of
parafields). Obviously, we have
Nl = Nll †Nl = †Nll. (7.41)
The choice (7.29), evidently, reduces (7.33)–(7.36) to (4.24), (4.25), (4.28) and (4.29), respec-
tively.
In terms of the operators (7.38), the commutation relations (7.28) can equivalently be
rewritten as (see also (7.9))
[Nlm, a+n ]− = δmna+l (7.42a)
[ †Nlm, a+n ]− = τδmna+l (7.42b)
[Nlm, a†+n ]− = τδmna
(7.42c)
[ †Nlm, a†+n ]− = δmna
l . (7.42d)
If m = l, these relations reduce to (7.32), due to (7.39).
We shall end this section with the remark that the conditions for the uniqueness of
the dynamical variables and the validity of the Heisenberg relations are quite general and
are not enough for fixing some commutation relations regardless of a number of additional
assumptions made to reduce these conditions to the system of equations (7.27)–(7.28).
8. State vectors, vacuum and mean values
Until now we have looked on the commutation relations only from pure mathematical view-
point. In this way, making a number of assumptions, we arrived to the system (7.27)–(7.28) of
commutation relations. Further specialization of this system is, however, almost impossible
without making contact with physics. For the purpose, we have to recall [1, 3, 11, 12] that
the physically measurable quantities are the mean (expectation) values of the dynamical
variables (in some state) and the transition amplitudes between different states. To make
some conclusions from these basic assumption of the quantum theory, we must rigorously
said how the states are described as vectors in system’s Hilbert space F of states, on which
all operators considered act.
For the purpose, we shall need the notion of the vacuum or, more precisely, the assumption
of the existence of unique vacuum state (vector) (known also as the no-particle condition).
Before defining rigorously this state, which will be denoted by X0, we shall heuristically
analyze the properties it should possess.
First of all, the vacuum state vector X0 should represent a state of the field without any
particles. From here two conclusions may be drawn: (i) as a field is thought as a collection
of particles and a ‘missing’ particle should have vanishing dynamical variables, those of the
vacuum should vanish too (or, more generally, to be finite constants, which can be set equal
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 38
to zero by rescaling some theory’s parameters) and (ii) since the operators a−l and a
l are
interpreted as ones that annihilate a particle characterize by l = (s,k) and charge −q or +q,
respectively, and one cannot destroy an ‘absent’ particle, these operators should transform
the vacuum into the zero vector, which may be interpreted as a complete absents of the field.
Thus, we can expect that
D(X0) = 0 (8.1a)
a−l (X0) = 0 a
l (X0) = 0. (8.1b)
Further, as the operators a+l and a
l are interpreted as ones creating a particle charac-
terize by l = (s,k) and charge −q or +q, respectively, state vectors like a+l (X0) and a
l (X0)
should correspond to 1-particle states. Of course, a necessary condition for this is
X0 6= 0, (8.2)
due to which the vacuum can be normalize to unit,
〈 X0| X0〉 = 1, (8.3)
where 〈·|·〉 : F × F → C is the Hermitian scalar (inner) product of F . More generally, if
, . . .) is a monomial only in i ∈ N creation operators, the vector
ψl1l2... := M(a+l1 , a
, . . .)(X0) (8.4)
may be expected to describe an i-particle state (with i1 particles and i2 antiparticles, i1+i2 =
i, where i1 and i2 are the number of operators a
l and a
l , respectively, in M(a
, . . .)).
Moreover, as a free field is intuitively thought as a collection of particles and antiparticles,
it is natural to suppose that the vectors (8.4) form a basis in the Hilbert space F . But the
validity of this assumption depends on the accepted commutation relations; for its proof,
when the paracommutation relations are adopted, see the proof of [18, p. 26, theorem I-1].
Accepting the last assumption and recalling that the transition amplitude between two
states is represented via the scalar product of the corresponding to them state vectors, it
is clear that for the calculation of such an amplitude is needed an effective procedure for
calculation of scalar products of the form
〈ψl1l2...|ϕm1m2...〉 := 〈 X0|(M(a+l1 , a
, . . .))† ◦ M′(a+m1 , a
, . . .)X0〉, (8.5)
with M and M′ being monomials only in the creation operators. Similarly, for computation
of the mean value of some dynamical operator D in a certain state, one should be equipped
with a method for calculation of scalar products like
〈ψl1l2...| Dϕm1m2...〉 := 〈X0|(M(a+l1 , a
, . . .))† ◦ D ◦ M′(a+m1 , a
, . . .)X0〉. (8.6)
Supposing, for the moment, the vacuum to be defined via (8.1), let us analyze (8.1)–(8.6).
Besides, the validity of (7.27)–(7.28) will be assumed.
From the expressions (7.8) and (3.9)–(3.12) for the dynamical variables, it is clear that
the condition (8.1a) can be satisfied if
E(a†±m ◦ a∓n )(X0) = 0, (8.7)
which, in view of (7.27), is equivalent to any one of the equations
E(a±m ◦ a†∓n )(X0) = 0 (8.8a)
E([a±m, a†∓n ]ε)(X0) = 0. (8.8b)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 39
Equation (8.7) is quite natural as it expresses the vanishment of all modes of the vacuum
corresponding to different polarizations, 4-momentum and charge. It will be accepted here-
after.
By means of (8.8) and the commutation relations (7.28) in the form (7.42), in particu-
lar (7.32), one can explicitly calculate the action of any one of the operators (7.33)–(7.38)
on the vectors (8.4): for the purpose one should simply to commute the operators Nlm (or
Nl = Nll) with the creation operators in (8.4) according to (7.42) (resp. (7.32)) until they
act on the vacuum and, hence, giving zero, as a result of (8.8) and (7.42) (resp. (7.32)). In
particular, we have the equations (k0 =
m2c2 + k2):
a+l (X0)
= kµa
l (X0) P̃µ
l (X0)
= kµa
l (X0) l = (s,k) (8.9)
= −qa+
(X0) Q̃
= +qa
(X0) (8.10)
l=(s,k)
(−1)j−1/2j~
1 + τ
{εσlm,+µν + τσml,−µν }
m=(t,k)
a+m|m=(t,k)(X0)
l=(s,k)
(−1)j−1/2j~
1 + τ
{ετσlm,+µν + σml,−µν }
m=(t,k)
a†+m |m=(t,k)(X0)
(8.11)
l=(s,k)
= (x0 µkν − x0 νkµ)(a+l )(X0)− i~
ω◦µν(l)(a
(−1)j−1/2j~
1 + τ
{εllm,+µν + τ lml,−µν }
m=(t,k)
a+m|m=(t,k)(X0)
l=(s,k)
= (x0 µkν − x0 νkµ)(a†+l )(X0)− i~
ω◦µν(l)(a
(−1)j−1/2j~
1 + τ
{ετ llm,+µν + lml,−µν }
m=(t,k)
a†+m |m=(t,k)(X0)
(8.12)
M̃spµν
l=(s,k)
(−1)j−1/2j~
1 + τ
{ε(σlm,+µν + llm,+µν )
+ τ(σml,−µν + l
µν )}
m=(t,k)
a+m|m=(t,k)(X0)
M̃spµν
l=(s,k)
(−1)j−1/2j~
1 + τ
{ετ(σlm,+µν + llm,+µν )
+ (σml,−µν + l
µν )}
m=(t,k)
a†+m |m=(t,k)(X0)
(8.13)
M̃orµν
ã+l (X0)
= −i~
ω◦µν(l)(ã
(X0) M̃orµν
l (X0)
= −i~
ω◦µν(l)(ã
(X0). (8.14)
These equations and similar, but more complicated, ones with an arbitrary monomial in the
creation operators for a+
are the base for the particle interpretation of the quantum
theory of free fields. For instance, in view of (8.9) and (8.10), the state vectors a+l (X0) and
l (X0) are interpreted as ones representing particles with 4-momentum (
m2c2 + k2,k)
and charges −q and +q, respectively; similar multiparticle interpretation can be given to the
general vectors (8.4) too.
The equations (8.9)–(8.12) completely agree with similar ones obtained in [13–15] on the
base of the bilinear commutation relations (6.13).
By means of (8.7), the expression (8.6) can be represented as a linear combination of
terms like (8.5). Indeed, as D is a linear combinations of terms like E(a†±m ◦a∓n ), by means of
the relations (7.28) we can commute each of these terms with the creation (resp. annihilation)
operators in the monomial M′(a+m1 , a
m2 , . . .) (resp. (M(a+l1 , a
, . . .))† = M′′(a†−
, . . .))
and thus moving them to the right (resp. left) until they act on the vacuum X0, giving
the zero vector — see (8.7). In this way the matrix elements of the dynamical variables,
in particular their mean values, can be expressed as linear combinations of scalar products
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 40
of the form (8.5). Therefore the supposition (8.7) reduces the computation of mean values
of dynamical variables to the one of the vacuum mean value of a product (composition)
of creation and annihilation operators in which the former operators stand to the right of
the latter ones. (Such a product of creation and annihilation operators can be called their
‘antinormal’ product; cf. the properties (7.30) of the antinormal ordering operator A.)
The calculation of such mean values, like (8.5) for states ψ,ϕ 6= X0, however, cannot
be done (on the base of (7.27)–(7.28), (8.7) and (8.1a)) unless additional assumption are
made. For the purpose one needs some kind of commutation relations by means of which
the creation (resp. annihilation) operators on the r.h.s. of (8.5) to be moved to the left (resp.
right) until they act on the left (resp. right) vacuum vector X0; as a result of this operation,
the expressions between the two vacuum vectors in (8.5) should transform into a linear
combination of constant terms and such with no contribution in (8.5). (Examples of the last
type of terms are E(a†±m ◦ a∓) and normally ordered products of creation and annihilation
operators.) An alternative procedure may consists in defining axiomatically the values of all
or some of the mean values (8.5) or, more stronger, the explicit action of all or some of the
operators, entering in the r.h.s. of (8.5), on the vacuum.26 It is clear, both proposed schemes
should be consistent with the relations (7.27)–(7.28), (8.1b) and (8.7)–(8.8).
Let us summarize the problem before us: the operator E in (7.27)–(7.28) has to be fixed
and a method for computation of scalar products like (8.5) should be given provided the
vacuum vector X0 satisfies (8.1b), (8.2), (8.3) and (8.7). Two possible ways for exploration
of this problem were indicated above.
Consider the operator E . Supposing E(a†±m ◦ a∓n ) to be a function only of a
m and a
we, in view of (8.1b), can write E(a†±m ◦ a∓n ) = f±(a
m ◦ a∓n ) ◦ b with b = a−n (upper sign) or
b = a
m (lower sign) and some functions f
±. Applying (7.27), we obtain (do not sum over l)
E(a†+m ◦ a−l ) = f
+(a†+m , a
l ) ◦ a
l E(a
m ◦ a
l ) = f
−(a+m, a
l ) ◦ a
◦ a†+m ) = εf+(a†+m , a−l ) ◦ a
E(a†−
◦ a+m) = εf−(a+m, a
) ◦ a†−
Since E is a linear operator, the expression E(a†±m ◦a∓n ) turns to be a linear and homogeneous
function of a
m and a
n , which immediately implies f
±(A,B) = λ±A for operators A and
B and some constants λ± ∈ C. For future convenience, we assume λ± = 1, which can be
achieved via a suitable renormalization of the creation and annihilation operators.27 Thus,
the last equations reduce to
E(a†+m ◦ a−l ) = a
m ◦ a−l E(a
m ◦ a
) = a+m ◦ a
(8.15a)
E(a−l ◦ a
m ) = εa
m ◦ a−l E(a
l ◦ a
m) = εa
m ◦ a
l . (8.15b)
Evidently, these equations convert (7.27), (8.7) and (8.8) into identities. Comparing (8.15)
and (4.22), we see that the identification
E = N (8.16)
of the operator E with the normal ordering operator N is quite natural. However, for our
purposes, this identification is not necessary as only the equations (8.15), not the general
definition of N , will be employed.
26 Such an approach resembles the axiomatic description of the scattering matrix [1,7,8].
27 Since λ+ = 0 or/and λ− = 0 implies D = 0, due to (7.8), these values are excluded for evident reasons.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 41
As a result of (8.15), the commutation relations (7.28) now read:
[a+l , a
m ◦ a†−n ] + δlna+m = 0 (8.17a)
, a†+m ◦ a−n ] + τδlna+m = 0 (8.17b)
, a+m ◦ a†−n ] − τδlma−n = 0 (8.17c)
[a−l , a
m ◦ a−n ] − δlma−n = 0. (8.17d)
(In a sense, these relations are ‘one half’ of the (para)commutation relations (6.16): the
latter are a sum of the former and the ones obtained from (8.17) via the changes a+m ◦a
n ◦ a+m and a
m ◦ a−n 7→ εa−n ◦ a
m ; the last relations correspond to (7.28) with E = A, A
being the antinormal ordering operator — see (7.30). Said differently, up to the replacement
a±i 7→
for all l, the relations (8.17) are identical with (6.16) for ε = 0; as noted in [26, the
remarks following theorem 2 in sec. 1], this is a quite exceptional case from the view-point
of parastatistics theory.) By means of (6.8) for η = −ε, one can verify that equations (8.17)
agree with the bilinear commutation relations (6.13), i.e. (6.13) convert (8.17) into identities.
The equations (8.15) imply the following explicit forms of the number operators (7.31)
and the transition operators (7.40):
Nl = a+l ◦ a
†Nl = a†+l ◦ a
l (8.18)
Nlm = a+l ◦ a
†Nlm = a†+l ◦ a
m. (8.19)
As a result of them, the equations (7.33)–(7.36) are simply a different form of writing of (4.24),
(4.25), (4.28) and (4.29), respectively.
Let us return to the problem of calculation of vacuum mean values of antinormal ordered
products like (8.5). In view of (8.1b) and (8.3), the simplest of them are
〈 X0|λ idF (X0)〉 = λ 〈X0|M±(X0)〉 = 0 (8.20)
where λ ∈ C and M+ (resp. M−) is any monomial of degree not less than 1 only in the
creation (resp. annihilation) operators; e.g. M± = a±l , a
l , a
◦a±l2 , a
◦a†±l2 . These equations,
with λ = 1, are another form of what is called the stability of the vacuum: if Xi denotes an
i-particle state, i ∈ N∪{0}, then, by virtue of (8.20) and the particle interpretation of (8.4),
we have
〈 Xi| X0〉 = δi0, (8.21)
i.e. the only non-forbidden transition into (from) the vacuum is from (into) the vacuum.
More generally, if Xi′,0 and X0,j′′ denote respectively i′-particle and j′′-antiparticle states,
with X0,0 := X0, then
〈Xi′,0| X0,j′′〉 = δi′0δ0j′′ , (8.22)
i.e. transitions between two states consisting entirely of particles and antiparticles, respec-
tively, are forbidden unless both states coincide with the vacuum. Since we are dealing with
free fields, one can expect that the amplitude of a transitions from an (i′-particle + j′-an-
tiparticle) state Xi′,j′ into an (i′′-particle + j′′-antiparticle) state Xi′′,j′′ is
〈 Xi′,j′| Xi′′,j′′〉 = δi′i′′δj′j′′ , (8.23)
but, however, the proof of this hypothesis requires new assumptions (vide infra).
Let us try to employ (8.17) for calculation of expressions like (8.5). Acting with (8.17)
and their Hermitian conjugate on the vacuum, in view of (8.1b), we get
a+m ◦ (−a†−n ◦ a+l + δln idF)(X0) = 0 a
n ◦ (a−m ◦ a
− δlm idF )(X0) = 0
a†+m ◦ (−a−n ◦ a+l + τδln idF )(X0) = 0 a
n ◦ (a†−m ◦ a
l − τδlm idF )(X0) = 0.
(8.24)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 42
These equalities, as well as (8.17), cannot help directly to compute vacuum mean values
of antinormally ordered products of creation and annihilation operators. But the equa-
tions (8.24) suggest the restrictions28
◦ a+m(X0) = δlm X0 a−l ◦ a
m (X0) = δlm X0
a−l ◦ a
m(X0) = τδlm X0 a
l ◦ a
m (X0) = τδlm X0
(8.25)
to be added to the definition of the vacuum. These conditions convert (8.24) into identities
and, in this sense agree with (8.17) and, consequently, with the bilinear commutation rela-
tions (6.13). Recall [16, 18], the relations (8.25) are similar to ones accepted in the parafield
theory and coincide with that for parastatistics of order p = 1; however, here we do not sup-
pose the validity of the paracommutation relations (6.20) (or (6.16)). Equipped with (8.25),
one is able to calculate the r.h.s. of (8.5) for any monomial M (resp. M′) and monomials
M′ (resp. M) of degree 1, degM′ = 1 (resp. degM = 1).29 Indeed, (8.25), (8.1b) and (8.3)
entail:
〈 X0|a†−l ◦ a
m(X0)〉 = 〈X0|a−l ◦ a
m (X0)〉 = δlm
〈 X0|a−l ◦ a
m(X0)〉 = 〈X0|a
◦ a†+m (X0)〉 = τδlm
〈 X0|(M(a+l1 , a
, · · · ))† ◦ a+m(X0)〉 = 〈 X0|(M(a+l1 , a
, · · · ))† ◦ a†+m (X0)〉 = 0 degM≥ 2
〈 X0|a−l ◦ M(a
, a†+m2 , · · · )(X0)〉 = 〈X0|a
l ◦ M(a
, a†+m2 , · · · )(X0)〉 = 0 degM≥ 2.
(8.26)
Hereof the equation (8.23) for i′ + j′ = 1 (resp. i′′ + j′′ = 1) and arbitrary i′′ and j′′ (resp. i′
and j′) follows.
However, it is not difficult to be realized, the calculation of (8.5) in cases more general
than (8.20) and (8.26) is not possible on the base of the assumptions made until now.30 At
this point, one is free so set in an arbitrary way the r.h.s. of (8.5) in the mentioned general
case or to add to (8.17) (and, possibly, (8.25)) other (commutation) relations by means of
which the r.h.s. of (8.5) to be calculated explicitly; other approaches, e.g. some mixture
of the just pointed ones, for finding the explicit form of (8.5) are evidently also possible.
Since expressions like (8.5) are directly connected with observable experimental results, the
only criterion for solving the problem for calculating the r.h.s. of (8.5) in the general case
can be the agreement with the existing experimental data. As it is known [1, 3, 11, 12], at
present (almost?) all of them are satisfactory described within the framework of the bilinear
commutation relations (6.13). This means that, from physical point of view, the theory
should be considered as realistic one if the r.h.s. of (8.5) is the same as if (6.13) are valid
or is reducible to it for some particular realization of an accepted method of calculation,
e.g. if one accepts some commutation relations, like the paracommutation ones, which are
a generalization of (6.13) and reduce to them as a special case (see, e.g., (6.20)). It should
be noted, the conditions (8.1b)–(8.3) and (8.25) are enough for calculating (8.5) if (6.16),
or its versions (6.17) or (6.20), are accepted (cf. [16]). The causes for that difference are
replacements like [a+m, a
n ] 7→ 2a+m◦a
n , when one passes from (6.16) to (8.17); the existence
of terms like a
n ◦ a+ma+l in (6.16) are responsible for the possibility to calculate (8.5).
28 Since the operators a±
and a
are, generally, degenerate (with no inverse ones), we cannot say that (8.24)
implies (8.25).
29 For degM′ = 0 (resp. degM′ = 0) — see (8.20).
30 It should be noted, the conditions (8.1b)–(8.3) and (8.25) are enough for calculating (8.5) if the rela-
tions (6.16), or their version (6.20), are accepted (cf. [16]). The cause for that difference is in replacements
like [a+m, a
n ] 7→ 2a
m ◦ a
n , when one passes from (6.16) to (8.17); the existence of terms like a
n ◦ a
m ◦ a
in (6.16) is responsible for the possibility to calculate (8.5), in case (6.16) hold.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 43
If evidences appear for events for which (8.5) takes other values, one should look, e.g.,
for other commutation relations leading to desired mean values. As an example of the last
type can be pointed the following anomalous bilinear commutation relations (cf. (6.13))
, a±m]ε = 0 [a
, a†±m ]ε = 0
[a∓l , a
m]ε = (±1)2jτδlm idF [a
l , a
m ]ε = (±1)2jτδlm idF
[a±l , a
m ]ε = 0 [a
l , a
m]ε = 0
[a∓l , a
m ]ε = (±1)2jδlm idF [a
l , a
m]ε = (±1)2jδlm idF , (8.27)
which should be imposed after expressions like E(a†±m ◦ a∓n ) are explicitly calculated. These
relations convert (8.17) and (8.25) into identities and by their means the r.h.s. of (8.5) can be
calculated explicitly, but, as it is well known [1,3,11,12,27] they lead to deep contradictions
in the theory, due to which should be rejected.31
At present, it seems, the bilinear commutation relations (6.13) are the only known com-
mutation relations which satisfy all of the mentioned conditions and simultaneously provide
an evident procedure for effective calculation of all expressions of the form (8.5). (Besides,
for them and for the paracommutation relations the vectors (8.4) form a base, the Fock
base, for the system’s Hilbert space of states [18].) In this connection, we want to mention
that the paracommutation relations (6.16) (or their conventional version (6.20)), if imposed
as additional restrictions to the theory together with (8.17), reduce in this particular case
to (6.13) as the conditions (8.25) show that we are dealing with a parafield of order p = 1,
i.e. with an ordinary field [17,18].32
Ending this section, let us return to the definition of the vacuum X0. It, generally,
depends on the adopted commutation relations. For instance, in a case of the bilinear com-
mutation relations (6.13) it consists of the equations (8.1a)–(8.3), while in a case of the
paracommutation relations (6.16) (or other ones generalizing (6.13)) it includes (8.1a)–(8.3)
and (8.25).
9. Commutation relations for
several coexisting different free fields
Until now we have considered commutation relations for a single free field, which can be
scalar, or spinor or vector one. The present section is devoted to similar treatment of a
system consisting of several, not less than two, different free fields. In our context, the
fields may differ by their masses and/or charges and/or spins; e.g., the system may consist
of charged scalar field, neutral scalar field, massless spinor field, massive spinor field and
massless neural vector field. It is a priori evident, the commutation relations regarding only
one field of the system should be as discussed in the previous sections. The problem is to be
derived/postulated commutation relations concerning different fields. It will be shown, the
developed Lagrangian formalism provides a natural base for such an investigation and makes
superfluous some of the assumptions made, for example, in [17, p. B 1159, left column] or
in [18, sec. 12.1], where systems of different parafields are explored.
To begin with, let us introduce suitable notation. With the indices α, β, γ = 1, 2, . . . , N
will be distinguished the different fields of the system, with N ∈ N, N ≥ 2, being their
number, and the corresponding to them quantities. Let qα and jα be respectively the charge
31 As it was demonstrated in [13–15], a quantization like (8.27) contradicts to (is rejected by) the charge
symmetric Lagrangians (3.4).
32 Notice, as a result of (8.17), the relations (6.16) correspond to (7.28) for E = A, with A being the
antinormal ordering operator (see (7.30)).
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 44
and spin of the α-th field. Similarly to (3.7), we define
jα :=
0 for scalar α-th field
for spinor α-th field
1 for vector α-th field
τα :=
1 for qα = 0 (neutral (Hermitian) field)
0 for qα 6= 0 (charged (non-Hermitian) field)
εα := (−1)2jα =
+1 for integer jα (bose fields)
−1 for half-integer jα (fermi fields)
(9.1)
Suppose Lα is the Lagrangian of the α-field. For definiteness, we assume Lα for all
α to be given by one and the same set of equations, viz. (3.1), or (3.3) or (3.4). To save
some space, below the case (3.4), corresponding to charge symmetric Lagrangians, will be
considered in more details; the reader can explore other cases as exercises.
Since the Lagrangian of our system of free fields is
Lα, (9.2)
the dynamical variables are
Dα (9.3)
and the corresponding system of Euler-Lagrange equations consists of the independent equa-
tions for each of the fields of the system (see (3.6) with Lα for L). This allows an introduction
of independent creation and annihilation operators for each field. The ones for the α-th field
will be denoted by a±α,sα(k) and a
α,sα(k); notice, the values of the polarization variables
generally depend on the field considered and, therefore, they also are labeled with index
α for the α-th field. For brevity, we shall use the collective indices lα, mα and nα, with
lα := (α, sα,k) etc., in terms of which the last operators are a±
and a
, respectively. The
particular expressions for the dynamical operators Dα are given via (3.9)–(3.12) in which
the following changes should be made:
τ 7→ τα j 7→ jα ε 7→ εα s 7→ sα s′ 7→ s′α
µν (k) 7→ σs
αs′α,±
µν (k) l
ss′,±
µν (k) 7→ ls
αs′α,±
µν (k).
(9.4)
The content of sections 4 and 5 remains valid mutatis mutandis, viz. provided the just
pointed changes (9.4) are made and the (integral) dynamical variables are understood in
conformity with (9.3).
9.1. Commutation relations connected with the momentum operator.
Problems and their possible solutions
In sections 6–8, however, substantial changes occur; for instance, when one passes from (6.12)
or (6.15) to (6.16). We shall consider them briefly in a case when one starts from the charge
symmetric Lagrangians (3.4).
The basic relations (6.12), which arise from the Heisenberg relation (5.1) concerning the
momentum operator, now read (here and below, do not sum over α, and/or β and/or γ if
the opposite is not indicated explicitly!)
a±lα , [a
]εβ + [a
± (1 + τ)δlαmβa±lα = 0 (9.5a)
lα , [a
]εβ + [a
± (1 + τ)δlαmβa
lα = 0. (9.5b)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 45
It is trivial to be seen, the following generalizations of respectively (6.14) and (6.15)
a±lα , [a
± (1 + τβ)δlαmβa±lα = 0 (9.6a)
± (1 + τβ)δlαmβa±lα = 0 (9.6b)
lα , [a
± (1 + τβ)δlαmβa
lα = 0 (9.6c)
± (1 + τβ)δlαmβa
= 0 (9.6d)
a+lα , [a
+ 2δlαmβa
lα = 0 (9.7a)
+ 2τβδlαmβa
= 0 (9.7b)
a−lα , [a
− 2τβδlαmβa−lα = 0 (9.7c)
− 2δlαmβa−lα = 0 (9.7d)
provide a solution of (9.5) in a sense that they convert it into identity. As it was said in
Sect. 6, the equations (9.6) (resp. (9.7)) for a single field, i.e. for β = α, agree (resp. disagree)
with the bilinear commutation relations (6.13).
The only problem arises when one tries to generalize, e.g., the relations (9.7) in a way
similar to the transition from (6.15) to (6.16). Its essence is in the generalization of expres-
sions like [a
]εβ and τ
βδlαmβa
. When passing from (6.15) to (6.16), the indices l and
m are changed so that the obtained equations to be consistent with (6.13); of course, the
numbers ε and τ are preserved because this change does not concern the field regarded. But
the situation with (9.7) is different in two directions:
(i) If we change the pair (mβ,mβ) in [a
]εβ with (m
β, nγ), then with what the num-
ber εβ should be replace? With εβ , or εγ or with something else? Similarly, if the mentioned
changed is performed, with what the multiplier τβ in τβδlαmβa
lα should be replaced? The
problem is that the numbers εβ and τβ are related to terms like a
and a±
◦ a†∓
in the momentum operator, as a whole and we cannot say whether the index β in εβ and τβ
originates from the first of second index mβ in these expressions.
(ii) When writing (mβ , nγ) for (mβ ,mβ) (see (i) above), then shall we replace δlαmβa
with δlαmβa
nγ , or δlαnγa
, or δmβnγa
? For a single field, γ = β = α, this problem is
solved by requiring an agreement of the resulting generalization (of (6.16) in the particular
case) with the bilinear commutation relations (6.13). So, how shall (6.13) be generalized for
several, not less than two, different fields? Obviously, here we meet an obstacle similar to
the one described in (i) above, with the only change that −εβ should stand for εβ .
Let blα and clα denote some creation or annihilation operator of the α-field. Consider the
problem for generalizing the (anti)commutator [blα , clα ]±εα . This means that we are looking
for a replacement
[blα , clα ]±εα 7→ f±(blα , cmβ ;α, β), (9.8)
where the functions f± are such that
f±(blα , cmβ ;α, β)
= [blα , clα ]±εα . (9.9)
Unfortunately, the condition (9.9) is the only restriction on f± that the theory of free fields
can provide. Thus the functions f±, subjected to equation (9.9), become new free parameters
of the quantum theory of different free fields and it is a matter of convention how to choose/fix
them.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 46
It is generally accepted [18, appendix F], the functions f± to have forms ‘maximum’
similar to the (anti)commutators they generalize. More precisely, the functions
f±(blα , cmβ ;α, β) = [blα , cmβ ]±εαβ (9.10)
where εαβ ∈ C are such that
εαα = εα, (9.11)
are usually considered as the only candidates for f±. Notice, in (9.10), εαβ are functions in
α and β, not in lα and/or mβ. Besides, if we assume εαβ to be function only in εα and εβ ,
then the general form of εαβ is
εαβ = uαβεα + (1− uαβ)εβ + vαβ(1− εαεβ) uαβ , vαβ ∈ C, (9.12)
due to (9.1) and (9.11). (In view of (6.13), the value εαβ = +1 (resp. εαβ = −1) corresponds
to quantization via commutators (resp. anticommutators) of the corresponding fields.)
Call attention now on the numbers τα which originate and are associated with each term
[blα , cmα ]±εα . With every change (9.8) one can associate a replacement
τα 7→ g(blα , cmβ ;α, β), (9.13)
where the function g is such that
g(blα , cmβ ;α, β)
= τα. (9.14)
Of course, the last condition does not define g uniquely and, consequently, the function
g, satisfying (9.14), enters in the theory as a new free parameter. Suppose, as a working
hypothesis similar to (9.10)–(9.11), that g is of the form
g(blα , cmβ ;α, β) = τ
αβ , (9.15)
where ταβ are complex numbers that may depend only on α and β and are such that
ταα = τα. (9.16)
Besides, if we suppose ταβ to be functions only in τα and τβ, then
ταβ = xαβτα + yαβτβ + (1− xαβ − yαβ)τατβ xαβ , yαβ ∈ C, (9.17)
as a result of (9.1) and (9.16).
Let us summarize the above discussion. If we suppose a preservation of the algebraic
structure of the bilinear commutation relations (6.13) for a system of different free fields,
then the replacements
[blα , clα ]±εα 7→ [blα , cmβ ]±εαβ εαα = εα (9.18a)
τα 7→ ταα ταα = τα (9.18b)
should be made; accordingly, the relations (6.13) transform into:
]−εαβ = 0 [a
]−εαβ = 0
[a∓lα , a
]−εαβ = τ
αβδlαmβ idF ×
lα , a
]−εαβ = τ
αβδlαmβ idF ×
[a±lα , a
]−εαβ = 0 [a
lα , a
]−εαβ = 0
]−εαβ = δlαmβ idF ×
]−εαβ = δlαmβ idF ×
, (9.19)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 47
where 1 (resp. −εαβ) in
corresponds to the choice of the upper (resp. lower) signs. If
we suppose additionally εαβ (resp. ταβ) to be a function only in εα and εβ (resp. in τα and
τβ), then these numbers are defined up to two sets of complex parameters:
εαβ = uαβεα + (1− uαβ)εβ + vαβ(1− εαεβ) uαβ, vαβ ∈ C (9.20a)
ταβ = xαβτα + yαβτβ + (1− xαβ − yαβ)τατβ xαβ, yαβ ∈ C. (9.20b)
A reasonable further specialization of εαβ and ταβ may be the assumption their ranges
to coincide with those of εα and τα, respectively. As a result of (9.1), this supposition is
equivalent to
vαβ = −uαβ,−uαβ + 1, uαβ − 1, uαβ uαβ ∈ C (9.21a)
(xαβ , yαβ) = (0, 0), (0, 1), (1, 0), (1, 1). (9.21b)
Other admissible restriction on (9.20) may be the requirement εαβ and ταβ to be symmetric,
εαβ(εα, εβ) = εβα(εα, εβ) = εαβ(εβ , εα) (9.22a)
ταβ(τα, τβ) = τβα(τα, τβ) = ταβ(τβ, τα), (9.22b)
which means that the α-th and β-th fields are treated on equal footing and there is no a
priori way to number some of them as the ‘first’ or ‘second’ one.33 In view of (9.20), the
conditions (9.22) are equivalent to
uαβ =
vαβ ∈ C (9.23a)
yαβ = xαβ. (9.23b)
If both of the restrictions (9.21) and (9.23) are imposed on (9.20), then the arbitrariness of
the parameters in (9.20) is reduced to:
(uαβ , uαβ) =
(9.24a)
(xαβ , yαβ) = (0, 0), (1, 1) (9.24b)
and, for any fixed pair (α, β), we are left with the following candidates for respectively εαβ
and ταβ:
(+1 + εα + εβ − εαεβ) (9.25a)
(−1 + εα + εβ + εαεβ) (9.25b)
0 := τ
α + τβ (9.25c)
1 := τ
α + τβ − τατβ. (9.25d)
When free fields are considered, as in our case, no further arguments from mathematical
or physical nature can help for choosing a particular combination (εαβ , ταβ) from the four
possible ones according to (9.25) for a fixed pair (α, β). To end the above considerations of
εαβ and ταβ, we have to say that the choice
(εαβ , ταβ) = (ε
+ , τ
0 ) =
(+1 + εα + εβ − εαεβ), τα + τβ
(9.26)
33 However, nothing can prevent us to make other choices, compatible with (9.18), in the theory of free
fields; for instance, one may set εαβ = εαεβεβα and ταβ = 1
(τα + τβ)τβα.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 48
is known as the normal case [18, appendix F]; in it the relative behavior of bose (resp.
fermi) fields is as in the case of a single field, i.e. they are quantized via commutators (resp.
anticommutators) as (εαβ , ταβ) = (+1, 0) (resp. (εαβ , ταβ) = (−1, 0)), and the one of bose and
fermi field is as in the case of a single fermi field, viz. the quantization is via commutators as
(εαβ , ταβ) = (+1, 0). All combinations between ε
± and τ
0,1 different from (9.26) are referred
as anomalous cases. Above we supposed the pair (α, β) to be fixed. If α and β are arbitrary,
the only essential change this implies is in (9.25), where the choice of the subscripts +, −, 0
and 1 may depend on α and β. In this general situation, the normal case is defined as the one
when (9.26) holds for all α and β. All other combinations are referred as anomalous cases;
such are, for instance, the ones when some fermi and bose operators satisfy anticommutation
relations, e.g. (9.19) with εαβ = −1 for εα + εβ = 0, or some fermi fields are subjected to
commutation relations, like (9.19) with εαβ = +1 for εα = εβ = −1. For some details on this
topic, see, for instance, [18, appendix F], [7, chapter 20] and [27, sect 4-4]. Fields/operators
for which εαβ = +1 (resp. εαβ = −1), with β 6= α, are referred as relative parabose (resp.
parafermi) in the parafield theory [17,18]. One can transfer this terminology in the general
case and call the fields/operators for which εαβ = +1 (resp. εαβ = −1), with β 6= α, relative
bose (resp. fermi) fields/operators.
Further the relations (9.19) will be referred as the multifield bilinear commutation rela-
tions and it will be assumed that they represent the generalization of the bilinear commuta-
tion relations (6.13) when we are dealing with several, not less than two, different quantum
fields. The particular values of εαβ and ταβ in them are insignificant in the following; if one
likes, one can fix them as in the normal case (9.26). Moreover, even the definition (9.19)
of ταβ is completely inessential at all, as ταβ always appears in combinations like ταβδlαmβ
(see (9.19) or similar relations, like (9.27), below), which are non-vanishing if β = α, but
then ταα = τα; so one can freely write τα for ταβ in all such cases.
Equipped with (9.19) and (9.18), we can generalize (9.7) in different ways. For example,
the straightforward generalization of (6.16) is:
, [a+
nγ ]εβγ
+ 2δlαnγa
= 0 (9.27a)
a+lα , [a
, a−nγ ]εβγ
+ 2ταγδlαnγa
= 0 (9.27b)
, [a+
nγ ]εβγ
− 2ταβδlαmβa−nγ = 0 (9.27c)
a−lα , [a
, a−nγ ]εβγ
− 2δlαmβa−nγ = 0. (9.27d)
However, generally, the relations (9.19) do not convert (9.27) into identities. The reason is
that an equality/identity like (cf. (6.8))
[blα , cmβ ◦ dnγ ] = [blα , cmβ ]−εαβ ◦ dnγ + λαβγcmβ ◦ [blα , dnγ ]−εαγ , (9.28)
where blα , cmβ and dnγ are some creation/annihilation operators and λ
αβγ ∈ C, can be valid
only for
λαβγ = εαβ εαγ = 1/εαβ (εαβ 6= 0), (9.29)
which, in particular, is fulfilled if γ = β and εαβ = ±1. So, the agreement between (9.19)
and (9.27) depends on the concrete choice of the numbers εαβ . There exist cases when even
the normal case (9.26) cannot ensure (9.19) to convert (9.27) into identities; e.g. when the
α-th field and β-th fields are fermion ones and the γ-th field is a boson one. Moreover, it can
be proved that (9.19) and (9.27) are compatible in the general case if unacceptable equalities
like a±
◦ a±m = 0 hold.
One may call (9.27) the multifield paracommutation relations as from them a correspond-
ing generalization of (6.18) and/or (6.20) can be derived. For completeness, we shall record
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 49
the multifield version of (6.20):
[blα , [b
, bnγ ]εβγ ] = 2δlαmβbnγ [blα , [bmβ , bnγ ]εβγ ] = 0 (9.30a)
[clα , [c
, cnγ ]εβγ ] = 2δlαmβcnγ [clα , [cmβ , cnγ ]εβγ ] = 0 (9.30b)
lα , [c
, cnγ ]εβγ ] = −2ταγδlαnγb
lα , [b
, bnγ ]εβγ ] = −2ταγδlαnγc
. (9.30c)
For details regarding these multifield paracommutation relations, the reader is referred to [17,
18], where the case τα = τβ = ταβ = 0 is considered.
We leave to the reader as exercise to write down the multifield versions of the commuta-
tion relations (6.22) or (6.23), which provide examples of generalizations of (9.7) and hence
of (9.19) and (9.27).
9.2. Commutation relations connected with the charge and
angular momentum operators
In a case of several, not less than two, different fields, the basic trilinear commutation rela-
tions (6.33), which ensure the validity of the Heisenberg relation (5.2) concerning the charge
operator, read:
a±lα , [a
]εβ − [a+mβ , a
− 2δlαmβa±lα = 0 (9.31a)
lα , [a
]εβ − [a+mβ , a
+ 2δlαmβa
lα = 0. (9.31b)
Of course, these relations hold only for those fields which have non-vanishing charges, i.e.
in (9.31) is supposed (see (9.1))
τα = 0 τβ = 0 (⇐⇒ qαqβ 6= 0). (9.32)
The problem for generalizing (9.31) for these fields is similar to the one for (9.7) in the
case of non-vanishing charges, τβ = 0. Without repeating the discussion of Subsect. 9.1,
we shall adopt the rule (9.18) for generalizing (anti)commutation relations between cre-
ation/annihilation operators of a single field. By its means one can obtain different general-
izations of (9.31). For instance, the commutation relations.
, a−nγ ]εβγ − [a+mβ , a
nγ ]εβγ
− 2δlαnγa+mβ = 0 (9.33a)
a−lα , [a
, a−nγ ]εβγ − [a+mβ , a
nγ ]εβγ
− 2δlαmβa−nγ = 0 (9.33b)
and their Hermitian conjugate contain (9.31) and (6.35) as evident special cases and agree
with (9.19) if γ = β and εαβεβγ = +1. Besides, the multifield paracommutation rela-
tions (9.27) for charged fields, τα = τβ = τγ = 0, convert (9.33) into identities and, in this
sense, (9.33) agree with (contain as special case) (9.27) for charged fields. As an example
of commutation relations that do not agree with (9.27) for charged fields and, consequently,
with (9.33), we shall point the following ones:
a±lα , [a
nγ ]εβγ
+ δlαnγa
= 0 (9.34a)
, a−nγ ]εβγ−
− δlαnγa±mβ = 0, (9.34b)
which are a multifield generalization of (6.34).
The consideration of commutation relations originating from the ‘orbital’ Heisenberg
equation (5.4) is analogous to the one of the same relations regarding the charge operator.
The multifield version of (6.49) is:
(−ω◦µν(mβ) + ω◦µν(nγ))([ã±lα , [ã
, ã−nγ ]εβγ
+ [ã+
nγ ]εβγ ] )
nγ=mβ
= 4(1 + ταβ)δlαmβω
α)(ã±
) (9.35a)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 50
(−ω◦µν(mβ) + ω◦µν(nγ))([ã
lα , [ã
, ã−nγ ]εβγ
+ [ã+
nγ ]εβγ ] )
nγ=mβ
= 4(1 + ταβ)δlαmβω
α)(ã
lα ) (9.35b)
where
ω◦µν(l
α) := ωµν(k) = kµ
if lα = (α, sα,k). (9.36)
Applying (6.51), with mβ for m and nγ for n, one can check that the multifield paracom-
mutation relations (9.27) convert (9.35) into identities and hence provide a solution of (9.35)
and ensure the validity of (5.4), when system of different free fields is considered. An example
of a solution of (9.35) which does not agree with (9.27) is provided by the following multifield
generalization of (6.52):
a+lα , [a
nγ ]εβγ
a+lα , [a
, a−nγ ]εβγ
= −(1 + ταγ)δlαnγa+mβ (9.37a)
a−lα , [a
nγ ]εβγ
a−lα , [a
, a−nγ ]εβγ
= +(1 + ταβ)δlαmβa
nγ , (9.37b)
which provides a solution of (9.5). Notice, the evident multifield version of (6.53) agrees
with (9.5), but disagrees with (9.35) when the lower signs are used.
At last, the multifield exploration of the ‘spin’ Heisenberg relations (5.5) is a mutatis
mutandis (see (9.35)) version of the corresponding considerations in the second part of Sub-
sect. 6.3. The main result here is that the multifield bilinear commutation relations (9.19),
as well as their para counterparts (9.27), ensure the validity of (5.5).
9.3. Commutation relations between the dynamical variables
The aim of this subsection is to be discussed/proved the commutation relations (5.15)–(5.24)
for a system of at least two different quantum fields from the view-point of the commutation
relations considered in subsections 9.1 and 9.2.
To begin with, we rewrite the Heisenberg relations (5.1), (5.2) and (5.4) in terms of
creation and annihilation operators for a multifield system [1,11]:
, Pµ] = ∓kµa±lα [a
, Pµ] = ∓kµa†±lα (9.38)
[a±lα , Q] = qa
lα [a
lα , Q] = −qa
lα (9.39)
,Morµν ] = i~ω◦µν(lα)
,Morµν ] = i~ω◦µν(lα)
, (9.40)
where lα = (α, sα,k), ω◦(lα) is defined by (9.36) and k0 =
m2c2 + k2 is set in (9.38)
and (9.40) (after the differentiations are performed in the last case). The corresponding
version of (5.5) is more complicated and depends on the particular field considered (do not
sum over sα!):
[a±α,sα(k),Mspµν ] = i~gα
αtα,+
µν (k)a
α,tα(k) +
αtα,−
µν (k)a
α,tα(k)
α,sα(k),Mspµν ] = i~hα
αtα,−
µν (k)a
α,tα(k) +
αtα,+
µν (k)a
α,tα(k)
(9.41)
where fsα = −1, 0,+1 (depending on the particular field), gα := −hα := 1jα+δjα0 (−1)
jα+1 and
sαtα,+
µν (k) and
sαtα,−
µν (k) are some functions which strongly depend on the particular field
considered, with ±σ
sαtα,±
µν (k) being related to the spin (polarization) functions σ
sαtα,±
µν (k)
(see (3.14) and (3.11)).34 As a result of (5.6), (9.40) and (9.41), one can easily write the
Heisenberg relations (5.3) in a form similar to (9.38)–(9.41).
34 If φ̃αi (k) are the Fourier images of the α-th field and
i (k) =
i (k)ã
α,sα(k) + v
i (k)ã
α,sα(k)
, (9.42)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 51
The commutation relations involving the momentum operator are:
[Pµ, Pν ] = 0 [Q, Pµ] = 0
[Sµν , Pλ] = [Mspµν , Pλ] = 0
[Lµν , Pλ] = [Morµν , Pλ] = [Mµν , Pλ] = −i~{ηλµ Pν − ηλν Pµ}.
(9.45)
We claim that these equations are consequences from (9.38) and the explicit expressions (3.9)–
(3.12) and (5.11)–(5.13) for the operators of the dynamical variables of the free fields con-
sidered in the present work. In fact, since (9.38) implies
[b±lα ◦ c
, Pµ] = 0 lα = (α, sα,k), mβ = (β, sβ ,k) (9.46a)
[b±lα
←−−−→
ω◦µν (l
α) ◦ c∓
, Pµ] = ±2(kµηνλ − kνηµλ)b±lα ◦ c
, (9.46b)
where b±lα , c
lα = a
lα , a
lα and
←−−−→
ω◦µν (l
α) is defined via (9.36) and (3.13), the verification of (9.45)
reduces to almost trivial algebraic calculations. Further, we assert that any system of commu-
tation relations considered in Subsect. 9.1 entails (9.45): as these relations always imply (9.5)
(or similar multifield versions of (6.10) and (6.11) in the case of the Lagrangians (3.1) or (3.3),
respectively) and, on its turn, (9.5) implies (5.1), the required result follows from the last
assertion and the remark that (5.1) and (9.38) are equivalent. As an additional verification
of the validity of (9.45), the reader can prove them by invoking the identity (6.8) and any
system of commutation relations mentioned in Subsect. 9.1, in particular (9.19) and (9.27).
The commutation relations concerning the charge operator read:
[Pµ, Q] = 0 [Q, Q] = 0
[Lµν , Q] = [Sµν , Q] = 0
[Morµν , Q] = [Mspµν , Q] = [Mµν , Q] = 0.
(9.47)
These equations are trivial corollaries from (3.9)–(3.12) and (5.11)–(5.13) and the observation
that (9.39) implies
lα ◦ a
, Q] = [a±lα ◦ a
, Q] = 0 if qα = qβ, (9.48)
due to (6.8) for η = −1. Since any one of the systems of commutation relations mentioned in
Subsect. 9.2 entails (9.31) (or systems of similar multifield versions of (6.31) and (6.32), if the
Lagrangians (3.1) or (3.3) are employed), which is equivalent to (9.39), the equations (9.47)
hold if some of these systems is valid. Alternatively, one can prove via a direct calculation
that the commutation relations arising from the charge operator entail the validity of (9.47);
where v
i (k) are linearly independent functions normalize via the condition
i (k)
i (k) = δ
, (9.43)
with fs
= 1 for jα = 0, 1
and fs
= 0,−1 for (jα, sα) = (1, 3) or (jα, sα) = (1, 1), (1, 2), respectively, then
µν (k) :=
i (k)
µν (k) :=
i (k)
(9.44)
with Ii
iµν given via (5.25). Besides, σ
µν (k) =
µν (k) with an exception that σ
µν (k) = 0 for
α = 1
and (µ, ν) = (a, 0), (0, a) with a = 1, 2, 3.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 52
for the purpose the identity (6.8) and the explicit expressions for the dynamical variables via
the creation and annihilation operators should be applied.
At last, consider the commutation relations involving the different angular momentum
operators:
[Pλ, Sµν ] = [Pλ,Mspµν ] = 0
[Pλ, Lµν ] = [Pλ,Morµν ] = [Pλ,Mµν ] = +i~{ηλµ Pν − ηλν Pµ}
[Q, Lµν ] = [Q, Sµν ] = [Q,Morµν ] = [Q,Mspµν ] = [Q,Mµν ] = 0
[Sκλ,Mµν ] = −i~
ηκµ Sλν − ηλµ Sκν − ηκν Sλµ + ηλν Sκµ
[Lκλ,Mµν ] = −i~
ηκµ Lλν − ηλµ Lκν − ηκν Lλµ + ηλν Lκµ
[Mκλ,Mµν ] = −i~
ηκµMλν − ηλµMκν − ηκνMλµ + ηλνMκµ
(9.49)
(The other commutators, that can be form from the different angular momentum operators,
are complicated and cannot be expressed in a ‘closed’ form.) The proof of these relations is
based on equations like (see (9.40) and (6.8))
[blα ◦ cmβ ,Morµν ] = i~ω◦µν(lα)
blα ◦ cmβ
lα = (α, sα,k), mβ = (β, sβ ,k), (9.50)
with blα , clα = a
lα , a
lα , a
lα , a
lα , and similar, but more complicated, ones involving the other
angular momentum operators. It, generally, depends on the particular field considered and
will be omitted.
As it was said in Subsect. 6.3, the Heisenberg relations concerning the angular momentum
operator(s) do not give rise to some (algebraic) commutation relations for the creation and
annihilation operators. For this reason, the only problem is which of the commutation
relations discussed in subsections 9.1 and 9.2 imply the validity of the equations (9.49) (or
part of them). The general answer of this problem is not known but, however, a direct
calculation by means of (9.7), if it holds, and (6.8) shows the validity of (9.49). Since (9.19)
and (9.27) imply (9.7), this means that the multifield bilinear and para commutation relations
are sufficient for the fulfillment of (9.49).
To conclude, let us draw the major moral of the above material: the multifield bilinear
commutation relations (9.19) and the multifield paracommutation relations (9.27) ensure
the validity of all ‘standard’ commutation relations (9.45), (9.47) and (9.49) between the
operators of the dynamical variables characterizing free scalar, spinor and vector fields.
9.4. Commutation relations under the uniqueness conditions
As it was said at the end of the introduction to this section, the replacements (9.4) ensure the
validity of the material of Sect. 4 in the multifield case. Correspondingly, the considerations
in Sect. 7 remain valid in this case provided the changes
l 7→ lα m 7→ mβ n 7→ nγ
τδlm 7→ ταβδlαmβ = ταδlαmβ
[bm, bm]ε 7→ [bmβ , bmβ ]εβ [bm, bn]ε 7→ [bmβ , bnγ ]εβγ ,
(9.51)
with bm (or bmβ ) being any creation/annihilation operator, and, in some cases, (9.4) are
made.35 Without going into details, we shall write the final results.
The multifield version of (7.27)–(7.28) is:
E(a†±
◦ a∓nγ ) = εβγ E(a∓nγ ◦ a
E([a†±
, a∓nγ ]εβγ) (9.52)
35 As a result of (7.11), (7.16) and (7.17), in expressions like (7.18)–(7.26) the number ε should be replace
by εαβ, where α and β are the corresponding field indices of the creation/annihilation operators on which the
operator E acts, i.e. ε E(bm ◦ bn) 7→ ε
βγ E(bmβ ◦ bnγ ).
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 53
a+lα , E([a
nγ ]εβγ)
+ 2δlαnγa
= 0 (9.53a)
, E([a†+
, a−nγ ]εβγ )
+ 2ταγδlαnγa
= 0 (9.53b)
a−lα , E([a
nγ ]εβγ)
− 2ταβδlαmβa−nγ = 0 (9.53c)
, E([a†+
, a−nγ ]εβγ )
− 2δlαmβa−nγ = 0 (9.53d)
γ =β. (9.53e)
As one can expect, the relations (9.53a)–(9.53d) can be obtained from the multifield paracom-
mutation relations (9.27) via the replacement [·, ·]ε 7→ E([·, ·]εβγ ). It should be paid special
attention on the equation (9.53e). It is due to the fact that in the expressions for the dynami-
cal variables do not enter ‘cross-field-products’, like a
for β 6= α, and it corresponds to
the condition (ii) in [17, p. B 1159]. The equality (9.53e) is quite important as it selects only
that part of the ‘ E-transformed’ multifield paracommutation relations (9.27) which is com-
patible with the bilinear commutation relations (9.19) (see (9.28) and (9.29)). Besides, (9.53e)
makes (9.53a)–(9.53d) independent of the particular definition of εαβ (see (9.11)).
The equations (9.52) are the only restrictions on the operator E ; examples of this operator
are provided by the normal (resp. antinormal) ordering operator N (resp. A), which has the
properties (cf. (4.22) (resp. (7.30))
◦ a†−nγ
:= a+
◦ a†−nγ N
◦ a−nγ
◦ a−nγ
◦ a†+nγ
:= εβγa
nγ ◦ a−mβ N
◦ a+nγ
:= εβγa+nγ ◦ a
(9.54)
◦ a†−nγ
:= εβγa
nγ ◦ a+mβ A
◦ a−nγ
:= εβγa−nγ ◦ a
◦ a†+nγ
:= a−
◦ a†+nγ A
◦ a+nγ
◦ a+nγ .
(9.55)
The material of Sect. 8 has also a multifield variant that can be obtained via the re-
placements (9.51) and (9.4). Here is a brief summary of the main results found in that
The operator E should possess the properties (9.54) and, in this sense, can be identified
with the normal ordering operator,
E = N . (9.56)
As a result of this fact and εββ = εβ (see (9.11)), the commutation relations (9.53) take the
final form:
a+lα , a
◦ a†−
+ δlαnβa
= 0 (9.57a)
a+lα , a
+ ταβδlαnβa
= 0 (9.57b)
a−lα , a
◦ a†−
− ταβδlαmβa−nβ = 0 (9.57c)
− δlαmβa−nβ = 0 (9.57d)
which is the multifield version of (8.17) and corresponds, up to the replacement a±lα 7→
2a±lα ,
to (9.27) with εβγ = 0.
The vacuum state vector X0 is supposed to be uniquely defined by the following equations
(cf. (8.1b)–(8.3)):
a−lα X0 = 0 a
lα X0 = 0 (9.58a)
X0 6= 0 (9.58b)
〈 X0| X0〉 = 1 (9.58c)
lα ◦ a
(X0) = δlαmβ X0 a−lα ◦ a
(X0) = δlαmβ X0
(X0) = ταβδlαmβ X0 a−lα ◦ a
(X0) = ταβδlαmβ X0.
(9.58d)
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 54
The Hilbert space F of state vectors is a direct sum of the Hilbert spaces Fα of the
different fields and it is supposed to be spanned by the vectors
... = M(a
, . . . )(X0) (9.59)
with M(a+
, . . . ) being arbitrary monomial only in the creation operators.
Since (9.58a), (9.56) and (9.54) imply the multifield version of (8.7), the computation of
the mean values of (8.6), with l1 7→ lα11 etc., of the dynamical variables is reduced to the one
of scalar products like (cf. (8.5))
〈ψlα1
...|φmβ1
〉 = 〈 X0|
, . . . )
)† ◦ M′(a+
, . . . )(X0)〉 (9.60)
of basic vectors of the form (9.59). By means of the basic properties (9.58) of the vacuum,
one is able to calculate the simplest forms of the vacuum mean values (9.60), viz. the mul-
tifield versions (see (9.51)) of (8.20) and (8.26). But more general such expression cannot
be calculated by means of (9.57)–(9.58). Prima facie one can suppose that the multifield
commutation relations (9.19), which ensure the vectors (9.59) to form a base of the system’s
Hilbert space of states, can help for the calculation of (9.60) in more complicated cases. In
fact, this is the case which works perfectly well and covers the available experimental data.
In this connection, we must mention that the applicability of (9.19) for calculation of (9.60)
is ensured by the compatibility/agreement between (9.19) and (9.57): by means of (6.8) for
η = −εαβ, one can check that (9.19) converts (9.57) into identities.36
The commutation relations (9.57) admit as a solution also the multifield version of the
anomalous bilinear commutation relations (8.27) but it, as we said earlier, leads to contradic-
tions and must be rejected. The existence of solutions of (9.57) different from it and (9.19)
seems not to be investigated. If there appear date which do not fit into the description by
means of (9.19), one should look for other, if any, solutions of (9.57) or compatible with (9.57)
effective procedures for calculating vacuum mean values like (9.60).
10. Conclusion
In this paper we have investigated two sources of (algebraic) commutation relations in the
Lagrangian quantum theory of free scalar, spinor and vector fields: the uniqueness of the
dynamical variables (momentum, charge and angular momentum) and the Heisenberg rela-
tions/equations for them. If one ignores the former origin, which is the ordinary case, the
paracommutation relations or some their generalizations seems to be the most suitable can-
didates for the most general commutation relations that ensure the validity of all Heisenberg
equations. The simultaneous consideration of the both sources mentioned reveals, however,
their incompatibility in the general case. The outlet of this situation is in the redefinition
of the operators of the dynamical variables, similar to the normal ordering procedure and
containing it as a special case. That operation ensures the uniqueness of the new (redefined)
dynamical variables and changes the possible types of commutation relations. Again, the
commutation relations, connected with the Heisenberg relations concerning the (redefined)
momentum operator, entail the validity of all Heisenberg equations.
36 Recall, equations (9.19) and (9.27), or (9.53a)–(9.53d), for γ 6= β are generally incompatible. For instance,
excluding some special cases, like systems consisting of only fermi (bose) fields or one fermi (bose) field and
arbitrary number of bose (fermi) fields, the only operators satisfying (9.19) and (9.27) for γ 6= β and having
normal spin-statistics connection are such that bmβ ◦ bnγ = 0, with γ 6= β and bmβ and cnγ being any
creation/annihilation operators, which, in particular, means that no states with two particles from different
fields can exist.
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 55
Further constraints on the possible commutation relations follow from the definition/in-
troduction of the concept of the vacuum (vacuum state vector). They practically reduce the
redefined dynamical variables to the ones obtained via normal ordering procedure, which
results in the explicit form (8.17) of the admissible commutation relations. In a sense,
they happen to be ‘one half’ of the paracommutation ones. As a last argument in the way
for finding the ‘unique true’ commutation relations, we require the existence of procedure
for calculation of vacuum mean values of anti-normally ordered products of creation and
annihilation operators, to which the mean values of the dynamical variables and the transition
amplitudes between different states are reduced. We have pointed that the standard bilinear
commutation relations are, at present, the only known ones that satisfy all of the conditions
imposed and do not contradict to the existing experimental data.
The consideration of a system of at least two different quantum free fields meets a new
problem: the general relations between creation/annihilation operators belonging to differ-
ent fields turn to be undefined. The cause for this is that the commutation relations for any
fixed field are well defined only on the corresponding to it Hilbert subspace of the system’s
Hilbert space of states and their extension on the whole space, as well as the inclusion in
them of creation/annihilation operators of other fields, is a matter of convention (when free
fields are concerned); formally this is reflected in the structure of the dynamical variables
which are sums of those of the individual fields included in the system under consideration.
We have, however, presented argument by means of which the a priori existing arbitrari-
ness in the commutation relations involving different field operators can be reduced to the
‘standard’ one: these relations should contain either commutators or anticommutators of
the creation/annihilation operators belonging to different fields. A free field theory cannot
make difference between these two possibilities. Accepting these possibilities, the admissible
commutation relations (9.57) for system of several different fields are considered. They turn
to be corresponding multifield versions of the ones regarding a single field. Similarly to the
single field case, the standard multifield bilinear commutation relations seem to be the only
known ones that satisfy all of the imposed restrictions and are in agreement with the existing
data.
Acknowledgments
This research was partially supported by the National Science Fund of Bulgaria under Grant
No. F 1515/2005.
References
[1] N. N. Bogolyubov and D. V. Shirkov. Introduction to the theory of quantized fields.
Nauka, Moscow, third edition, 1976. In Russian. English translation: Wiley, New York,
1980.
[2] J. D. Bjorken and S. D. Drell. Relativistic quantum mechanics, volume 1 and 2. McGraw-
Hill Book Company, New York, 1964, 1965. Russian translation: Nauka, Moscow, 1978.
[3] Paul Roman. Introduction to quantum field theory. John Wiley&Sons, Inc., New York-
London-Sydney-Toronto, 1969.
[4] Lewis H. Ryder. Quantum field theory. Cambridge Univ. Press, Cambridge, 1985.
Russian translation: Mir, Moscow, 1987.
[5] A. I. Akhiezer and V. B. Berestetskii. Quantum electrodynamics. Nauka, Moscow,
1969. In Russian. English translation: Authorized English ed., rev. and enl. by the
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 56
author, Translated from the 2d Russian ed. by G.M. Volkoff, New York, Interscience
Publishers, 1965. Other English translations: New York, Consultants Bureau, 1957;
London, Oldbourne Press, 1964, 1962.
[6] Pierre Ramond. Field theory: a modern primer, volume 51 of Frontiers in physics. Read-
ing, MA Benjamin-Cummings, London-Amsterdam-Don Mills, Ontario-Sidney-Tokio, 1
edition, 1981. 2nd rev. print, Frontiers in physics vol. 74, Adison Wesley Publ. Co.,
Redwood city, CA, 1989; Russian translation from the first ed.: Moscow, Mir 1984.
[7] N. N. Bogolubov, A. A. Logunov, and I. T. Todorov. Introduction to axiomatic quantum
field theory. W. A. Benjamin, Inc., London, 1975. Translation from Russian: Nauka,
Moscow, 1969.
[8] N. N. Bogolubov, A. A. Logunov, A. I. Oksak, and I. T. Todorov. General principles of
quantum field theory. Nauka, Moscow, 1987. In Russian. English translation: Kluwer
Academic Publishers, Dordrecht, 1989.
[9] P. A. M. Dirac. The principles of quantum mechanics. Oxford at the Clarendon Press,
Oxford, fourth edition, 1958. Russian translation in: P. Dirac, Principles of quantum
mechanics, Moscow, Nauka, 1979.
[10] P. A. M. Dirac. Lectures on quantum mechanics. Belfer graduate school of science,
Yeshiva University, New York, 1964. Russian translation in: P. Dirac, Principles of
quantum mechanics, Moscow, Nauka, 1979.
[11] J. D. Bjorken and S. D. Drell. Relativistic quantum fields, volume 2. McGraw-Hill Book
Company, New York, 1965. Russian translation: Nauka, Moscow, 1978.
[12] C. Itzykson and J.-B. Zuber. Quantum field theory. McGraw-Hill Book Company, New
York, 1980. Russian translation (in two volumes): Mir, Moscow, 1984.
[13] Bozhidar Z. Iliev. Lagrangian quantum field theory in momentum picture. In O. Kovras,
editor, Quantum Field Theory: New Researcn, pages 1–66. Nova Science Publishers, Inc.,
New York, 2005.
http://arXiv.org e-Print archive, E-print No. hep-th/0402006, February 1, 2004.
[14] Bozhidar Z. Iliev. Lagrangian quantum field theory in momentum picture. II. Free spinor
fields.
http://arXiv.org e-Print archive, E-print No. hep-th/0405008, May 1, 2004.
[15] Bozhidar Z. Iliev. Lagrangian quantum field theory in momentum picture. III. Free
vector fields.
http://arXiv.org e-Print archive, E-print No. hep-th/0505007, May 1, 2005.
[16] H. S. Green. A generalized method of field quantization. Phys. Rev., 90(2):270–273,
1953.
[17] O. W. Greenberg and A. M. I. Messiah. Selection rules for parafields and the absence
of para particles in nature. Phys. Rev., 138B(5B):1155–1167, 1965.
[18] Y. Ohnuki and S. Kamefuchi. Quantum field theory and parafields. University of Tokyo
Press, Tokyo, 1982.
[19] Silvan S. Schweber. An introduction to relativistic quantum field theory. Row, Peter-
son and Co., Evanston, Ill., Elmsford, N.Y., 1961. Russian translation: IL (Foreign
Literature Pub.), Moscow, 1963.
http://arXiv.org
http://arxiv.org/abs/hep-th/0402006
http://arXiv.org
http://arxiv.org/abs/hep-th/0405008
http://arXiv.org
http://arxiv.org/abs/hep-th/0505007
Bozhidar Z. Iliev: QFT in momentum picture: IV. Commutation relations 57
[20] Bozhidar Z. Iliev. Pictures and equations of motion in Lagrangian quantum field theory.
In Charles V. Benton, editor, Studies in Mathematical Physics Research, pages 83–125.
Nova Science Publishers, Inc., New York, 2004.
http://arXiv.org e-Print archive, E-print No. hep-th/0302002, February 2003.
[21] Bozhidar Z. Iliev. Momentum picture of motion in Lagrangian quantum field the-
ory. International Journal of Theoretical Physics, Group Theory, and Nonlinear Optics,
??(?):??–??, 2007. To appear.
http://arXiv.org e-Print archive, E-print No. hep-th/0311003, November, 2003.
[22] Bozhidar Z. Iliev. On operator differentiation in the action principle in quantum field the-
ory. In Stancho Dimiev and Kouei Sekigava, editors, Proceedings of the 6th International
Workshop on Complex Structures and Vector Fields, 3–6 September 2002, St. Knstantin
resort (near Varna), Bulgaria, “Trends in Complex Analysis, Differential Geometry and
Mathematical Physics”, pages 76–107. World Scientific, New Jersey-London-Singapore-
Hong Kong, 2003.
http://arXiv.org e-Print archive, E-print No. hep-th/0204003, April 2002.
[23] Bozhidar Z. Iliev. On angular momentum operator in quantum field theory. In Frank
Columbus and Volodymyr Krasnoholovets, editors, Frontiers in quantum physics re-
search, pages 129–142. Nova Science Publishers, Inc., New York, 2004.
http://arXiv.org e-Print archive, E-print No. hep-th/0211153, November 2002.
[24] Bozhidar Z. Iliev. On momentum operator in quantum field theory. In Frank Columbus
and Volodymyr Krasnoholovets, editors, Frontiers in quantum physics research, pages
143–156. Nova Science Publishers, Inc., New York, 2004.
http://arXiv.org e-Print archive, E-print No. hep-th/0206008, June 2002.
[25] J. D. Bjorken and S. D. Drell. Relativistic quantum mechanics, volume 1. McGraw-Hill
Book Company, New York, 1964. Russian translation: Nauka, Moscow, 1978.
[26] A. B. Govorkov. Parastatistics and internal symmetries. In N. N. Bogolyubov, editor,
Physics of elementary particles and atomic nuclei, volume 14, No. 5, of Particles and
nuclei, pages 1229–1272. Energoatomizdat, Moscow, 1983. In Russian.
[27] R. F. Streater and A. S. Wightman. PCT, spin and statistics and all that. W. A.
Benjamin, Inc., New York-Amsterdam, 1964. Russian translation: Nauka, Moscow,
1966.
http://arXiv.org
http://arxiv.org/abs/hep-th/0302002
http://arXiv.org
http://arxiv.org/abs/hep-th/0311003
http://arXiv.org
http://arxiv.org/abs/hep-th/0204003
http://arXiv.org
http://arxiv.org/abs/hep-th/0211153
http://arXiv.org
http://arxiv.org/abs/hep-th/0206008
	Introduction
	The momentum picture
	Lagrangians, Euler-Lagrange equations and dynamical variables
	On the uniqueness of the dynamical variables
	Heisenberg relations
	Types of possible commutation relations
	Restrictions related to the momentum operator
	Restrictions related to the charge operator
	Restrictions related to the angular momentum operator(s)
	Inferences
	State vectors, vacuum and mean values
	Commutation relations for several coexisting different free fields
	Commutation relations connected with the momentum operator. Problems and their possible solutions
	Commutation relations connected with the charge and angular momentum operators
	Commutation relations between the dynamical variables
	Commutation relations under the uniqueness conditions
	Conclusion
	References
	This article ends at page
ABSTRACT
  Possible (algebraic) commutation relations in the Lagrangian quantum theory
of free (scalar, spinor and vector) fields are considered from mathematical
view-point. As sources of these relations are employed the Heisenberg
equations/relations for the dynamical variables and a specific condition for
uniqueness of the operators of the dynamical variables (with respect to some
class of Lagrangians). The paracommutation relations or some their
generalizations are pointed as the most general ones that entail the validity
of all Heisenberg equations. The simultaneous fulfillment of the Heisenberg
equations and the uniqueness requirement turn to be impossible. This problem is
solved via a redefinition of the dynamical variables, similar to the normal
ordering procedure and containing it as a special case. That implies
corresponding changes in the admissible commutation relations. The introduction
of the concept of the vacuum makes narrow the class of the possible commutation
relations; in particular, the mentioned redefinition of the dynamical variables
is reduced to normal ordering. As a last restriction on that class is imposed
the requirement for existing of an effective procedure for calculating vacuum
mean values. The standard bilinear commutation relations are pointed as the
only known ones that satisfy all of the mentioned conditions and do not
contradict to the existing data.

<|endoftext|><|startoftext|>
Introduction
Epitaxial self-assembled quantum dots (SAQDs) represent an important step in the advancement of semiconductor
fabrication at the nanoscale that will allow breakthroughs in optoelectronics and electronics. [1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12] Most frequent optoelectronic applications are high efficiency lasers with exotic wavelengths or photode-
tectors. [1, 3, 4, 5, 6, 7, 8, 10, 11, 12] SAQDs are the result of a transition from 2D growth to 3D growth in strained
epitaxial films such as SixGe1−x/Si and InxGa1−xAs/GaAs. This process is known as Stranski-Krastanow growth or
Volmer-Webber growth. [13, 1, 14, 15].
In applications, order is a key factor. There are two types of order, spatial and size. Spatial order refers to
the regularity of SAQD dot placement, and it is necessary for nano-circuitry applications. Size order refers to the
uniformity of SAQD size which determines the voltage and/or energy level quantization of SAQDs. It is reasonable
to expect that these type of order are linked, and it is important to understand the factors that determine SAQD order.
Further understanding should help in the design and simulation of both spontaneous “bottom up” self-assembly and
directed or guided self-assembly to enhance SAQD order. [16, 17, 18, 19, 20, 21, 22, 23]Here, an elaboration of and
further application of a linear analysis of SAQD order [24] is presented. The work reported here forms the basis of a
non-linear theory and modeling of SAQD order that will be reported in future work.
In [24] it was reported that one could calculate a correlation function using a linearized model of SAQD formation.
This correlation function included two correlation lengths that could be used to describe SAQD order. It was also found
that one effect of a hypothesized wetting potential was to enhance SAQD order when growth occurs near the critical
film height for 3D growth. Here, these results are expanded to create a more rigorous linearized theory of SAQD order
that will inform non-linear theories. In particular, the model is generalized to any model that combines local energy
effects such as surface energy density and non-local elastic destabilization, and the procedure for predicting order
based on any linear theory with peak wavelengths is presented. The hypothesized effect of elastic anisotropy in [24]
is verified with calculations using linear anisotropic elasticity theory. [25, 26] Details such as statistical fluctuation
and convergence are also addressed along with a discussion of the possible forms of linear anisotropic terms in SAQD
growth kinetics, and the effect of an atomic-scale cutoff in the continuum theory is addressed. Finally, the order
enhancing effect of growing near the critical threshold is explored in more detail using calculations appropriate to
Ge/Si SAQDs.
In the literature, two modes of SAQD formation are generally discussed, the thermal nucleation mode and the nu-
cleationless mode. [27, 28, 29] In the thermal nucleation mode, a 2D film surface is metastable, and the formation of
individual quantum dots is thermally activated. [27]. This growth mode leads to the formation of individual quantum
dots as uncorrelated or loosely correlated discrete events at essentially random locations. In the nucleationless mode,
the 2D film surface transitions from stable (or metastable) to unstable. In this mode, dots form everywhere at once
appearing at first as a cross-hatched ripple-like disturbance on the 2D film surface and then maturing into recogniz-
able individual dots.[27, 30, 28, 31, 32] These two modes are probably connected via an encompassing conceptual
and mathematical model1, and perhaps some of what is observed experimentally is in fact a hybrid mechanism. In
agreement with intuition, it appears that the nucleationless mode leads to a more ordered dot pattern than the thermal
nucleation mode that is dominated by randomness. 2 Thus, the presented analysis applies to the nucleationless mode.
There are various implementations of nucleationless growth models [28, 37, 38, 39, 40, 18, 34], although, there is
also a great deal of commonality among these models. In particular, they all include a non-local elastic effect and local
surface energies and/or local wetting energies. Here, a linear analysis of quantum dot order resulting from this class of
model is presented. Particular note is taken of the effects of stochastic initial conditions crystal anisotropy in general,
elastic anisotropy in particular, and the effect of varying film height as a control parameter as first introduced in [33].
A simple model similar to [28, 37, 38, 40, 18] is presented to produce numerical examples and explore the effects
of the average film height. Concurrently, a more abstract and general model is presented and analyzed that includes
non-local elastic strain effects, and a local combined surface and wetting energy. The linear model with stochastic
initial conditions and deterministic film height evolution will pave the way for more sophisticated analysis involving a
non-linear model of stochastic film height evolution.
As previously stated, one of the goals in the present work is to further explore the role of the wetting poten-
tial during growth near the stability threshold in film height. A wetting potential has been included in the analysis
and simulations in [38, 33, 37, 28]. Although somewhat controversial, the wetting potential plays an important phe-
nomenological role. It ensures that growth takes place in the Stranski-Krastanow mode: that a 3D unstable growth
occurs only after a critical layer thickness is achieved, and that a residual wetting layer persists. The physical origins
and consequences of the wetting potential are discussed in [41, 28]. The analysis presented here is usable in models
that neglect the wetting potential by simply setting it to zero. Another possibility is simply that the wetting potential
is simply an approximation to the stabilizing effect of intermixing. [42] That said, if the wetting potential is real, the
present analysis shows that it is beneficial to SAQD order to grow near the critical layer thickness.
The presented analytic formulas and linear analysis are intended to complement existing numerical models of
SAQD order. [43, 37, 44, 45] and to form a basis for future non-linear analytic analysis of SAQD order. The current
findings agree with previous work on the beneficial effects of elastic anisotropy to enhance in-plane order.
The linear analysis, of course, represents a simplification of the film evolution, and it applies only to the initial
stages of SAQD formation when the nominally flat film surface becomes unstable and transitions to three-dimensional
growth. However, the small surface fluctuation stage of SAQD growth determines the initial seeds of order or disorder
in an SAQD array; thus, the small fluctuation stage should have an important influence on the final outcome. At later
stages when surface fluctuations are large, there is a natural tendency of SAQDS to either order or ripen [33, 37, 46,
39, 47] Ordering systems tend to evolve slowly due to critical slowing down [39], while ripening tends to diminish
order further. [37] Thus, it is possible that the linear model could, in fact, yield good predictions of SAQD order.
The simplification and linearizion facilitates the development of analytic solutions that are most transparent, easily
portable to multiple material systems and have no effective limit on system size. Finally, it is virtually impossible to
have a thorough understanding of the full non-linear model without first having a thorough understanding of the linear
behavior.
The remainder of the paper is organized as follows. Section 2 presents the physical assumptions and mathematical
1It is likely that there a transition from stable, to metastable and finally to unstable. The analysis presented in [33] would appear to support such
a view where the film height acts as the control parameter driving the transition. There is also some controversy regarding whether all dot growth is
nucleationless or not. [34, 32]
2Compare various figures in [29, 35, 14, 31, 36].
approximations used to model film growth. Section 3 discusses the stochastic initial conditions and the resulting
correlation functions and correlation lengths. Section 4, presents a procedure for estimating SAQD order with an
application to Ge dots on a Si substrate. Section 5 presents conclusions, while Appendices A-F present additional
calculational details.
2 Modeling
The formation of SAQDs is modeled as a deterministic surface diffusion process with stochastic initial conditions. The
resulting equations and ultimately the sought after correlation functions are different depending on whether the film
surface is treated as one-dimensional isotropic, two-dimensional isotropic or two-dimensional anisotropic. The 1D
and 2D isotropic cases are discussed first, and then the essential differences of the 2D anisotropic model are presented.
The stochastic initial conditions need to be expressed in terms of the correlation functions that are also use to analyze
order; consequently, the discussion of the initial conditions is deferred to Sec. 3.2.
It should be noted that the results presented here are fairly general. There has been a good deal of recent work
refining the modeling of nucleationless growth processes to incorporate various phenomenological aspects of SAQD
growth. For example, the inclusion of orientation-dependent surface energy [38], strain-dependent surface energy [34]
and explicit modeling of atomic species segregation and film-substrate inter diffusion. [48] Two models are presented
here. One is a simple concrete example. It is the simplest model one can use including elastic effects surface energy
and wetting energy. The second model is more abstract and describes the general case of a local potential energy
that depends on both the film height and film height gradient. One effect that is not examined here is that of mixed
4-fold and two-fold symmetry. Such a mixing can occur due to diffusional anisotropy or surface energy anisotropy.
(Sec. 2.2.1.2 and Appendix D). However, a similar analysis procedure should work for these cases as well. The general
procedure for possible application to other models is discussed in Sec. 3.5.
The following discussion will use abstract vector notation, e.g. k instead of ki, etc. Also, because it is sometimes
computational expedient to perform one-dimensional modeling [24, 39, 17, 42], the case of a one dimensional surface
with two dimensional volume is discussed along with the case of an isotropic 2D surface. To facilitate this combined
discussion, the dimensionality of the surface will be denoted as d. In Secs. 3.3 and 3.4, d = 1, 2 will be substituted as
appropriate. Finally, much of the calculation involves reciprocal space. The convention used for the Fourier transforms
f(x) =
ddk eik·xfk, and fk = (2π)
ddx e−ik·xf(x)
following the example of [28].
2.1 1D and 2D Isotropic model
This discussion pertains to both the 1D model and the 2D isotropic model. The formation of SAQDs is modeled as a
surface diffusion process where the film height is a function of the lateral position and time. The system is treated as
deterministic with stochastic initial conditions. First, the general non-linear governing equations are presented. Then,
the linearized form is presented. Finally, the key behavior is reviewed.
The mathematical model uses film height, H(x, t) as the dependent variable and the horizontal position x and
time t as the independent variables. The film height evolves over time due to surface diffusion driven by a diffusion
potential µ(x, t) and a flux of new material Q. The surface velocity is thus
vn = nz∂tH = −∇S · D∇Sµ(x, t) +Q (1)
where nz is the vertical component of the surface normal nz = [1 + (∇H)2]−1/2, ∇S is the surface gradient, D is the
diffusivity, and ∇S · is the surface divergence.
2.1.1 Energetics
The diffusion potential µ(x, t) must produce Stranski-Krastanow growth. Thus, it must contain an elastic term that
destabilizes film growth, a surface energy term that stabilizes planar growth and a wetting energy that ensures a wetting
layer. The diffusion potential can be derived from a total free energy.
F = Felast + Fsurf. + Fwet
volume
dV ω +
surface
dAsurf. γ +
dAW (H)
where ω is the elastic energy density, γ is the surface energy density, W (H) is the wetting energy density. The
last integral corresponds to Fwet, and whether the integral should be taken over the film surface or the substrate is
ambiguous. The “simple” model (Sec. 2.1.1.1) assumes that the integral is over the substrate, while the “general”
model (Sec. 2.1.1.2) can accommodate both cases.
2.1.1.1 simple form The simplest possible model results if the integral corresponding to Fwet is taken over the
lateral positions x rather than over the actual free-surface. In concrete terms, one can use dV = d2xdz and dAsurf. =
1 + (∇H(x))2
to obtain the expression,
volume
d2xdz ω[H](x, z) +
x-plane
1 + (∇H(x))2
γ +W (H(x))
, (2)
where the “ω[H]” indicates that the elastic energy density is a non-local functional of the film height,H. The diffusion
potential µ can be found, similar to [15], by differentiating F with respect to the surface motion (Appendix A.1),
µ(x) = ΩδF/δH(x). Doing so for Eq. (2) (Appendix A.2),
µ(x) = Ω [ω(x)− γκ(x) +W ′ (H(x))] . (3)
where Ω is the atomic volume, ω(x) is the elastic energy density at the film surface (implicitly ω[H] (x,H(x))),
κ = ∇ ·
∇H(x)
1 + (∇H(x))2
]−1/2}
is the total surface curvature, and W ′(H) = ∂H(x)W (H(x)) is the
derivative of W (H(x)) evaluated at x.
2.1.1.2 general form It should be noted that Eq. (3) is not the same diffusion potential used in [38]. The wetting
potential used there can be derived by taking W (H) as an energy density of the free surface, not a density in the
x-plane. Expressions like Eq. (3) and Eq. (1) in [38] are part of a larger class of surface evolution models with more
or less the same linear behavior.
The surface and wetting energy can be combined and incorporated into a more general form, with a total free
energy Fsw and a free energy density Fsw(H,∇H) that depends on the film heightH(x) and the film height slope or
orientation ∇H(x). The total free energy is thus
F = Felast. + Fsw (4)
volume
d2xdz ω[H](x, z) +
x−plane
d2xFsw (H(x),∇H(x)) .
Fsw may not necessarily be the sum of separate surface energy and wetting energy contributions. It need only be a
local function ofH and ∇H. The corresponding diffusion potential is
µ(x) = Ω
ω(x) + F (10)sw (x)−∇ · F
sw (x)
, (5)
where F (mn)sw indicates the mth derivative with respect to H and the nth derivative with respect to ∇H. F
sw (x) =
∂H(x)Fsw (H(x),∇H(x)) and each vector component of F
sw (x) is
F(01)sw (x)
= ∂[∇H(x)]
Fsw (H(x),∇H(x)).
One can obtain the results of the simple model (Eqs. (2) and (3)) by setting
Fsw =
1 + (∇H(x))2
γ +W (H(x)) . (6)
A diffusion potential like Eq. (1) in [38] can be obtained by setting
Fsw =
1 + (∇H(x))2
[γ (∇H(x)) +W (H(x))] .
This is different from Eq. (6) in two ways. First, the surface energy density depends on the surface orientation. Second,
the Jacobian, J =
1 + (∇H(x))2
multiplies both the surface energy density and the wetting potential. Despite
these differences, the common form of the diffusion potential (Eq. (5)) among different models suggests that they
might all lead to similar linearized forms and behavior.
2.1.1.3 Linearization The diffusion potential is now linearized about the average film height H̄. In general, one
can control the amount of deposited material, and thus the average film height H̄. It is therefore useful to decompose
H(x) into the spatially averaged mean value and fluctuations about the average. Similar to [28],
H = H̄+ h(x, t). (7)
In the present calculation, H̄ is specified as constant in time. This assumption corresponds physically to a fast deposi-
tion and then an anneal. It is not too difficult to generalize to a time dependent H̄, but that is beyond the scope of this
manuscript. In [38, 49], deposition and evaporation is explicitly modeled.
All terms in µ(x, t) are now kept to only linear order in h(x, t). The elastic energy density ω is a non-local
functional of h(x, t) [40]; however, the equations generating ω(x) are translationally invariant. Thus, it is convenient
to use reciprocal space for the linearization. The curvature is trivially linearized as κ(x) → ∇2h(x) in real space or
κk → −k2hk in reciprocal space. The linearized elastic strain energy ω can be found in reciprocal space as in [15]
to be ωk = −2M(1 + ν)�2mhk, where M = E/(1 − ν) is the biaxial modulus, E is the Young modulus, ν is the
Poisson ratio, and �m is the film-substrate mismatch strain. This formula neglects possible differences in elastic moduli
between the film and substrate as in [28], but a similar method of analysis should apply to that case as well. Linearizing
Eqs. (3) and (5) in reciprocal space, µk is proportional to hk with a proportionality coefficient that depends on k and
µlin,k = f(k, H̄)hk (8)
where f(k, H̄) for three different isotropic cases, corresponding to Eqs. (3) and (5), and an abstracted general form, is
given by
f(k, H̄) =
−2M(1 + ν)�2mk + γk2 +W ′′(H̄)
; case a (Eq. (3))
−2M(1 + ν)�2mk + F 02k2 + F 20
; case b (Eq. (5))
−ak + bk2 + c ; case c (general)
. (9)
Due to isotropy, f(k, H̄) is independent of the direction of k, and only the wave number, k = ‖k‖, appears in the
right hand side. F (20)sw is the second derivative of Fsw with respect to H, and F
sw the second derivative of Fsw with
respect to ∇H. F (20)sw and F
sw depend on H̄ only; thus they are constants in the present analysis. See Appendix B.2
for more precise definitions and the derivation of f(k, H̄). Using Eq. (6), produces F (02)sw = γ and F
sw = W ′′(H̄)
which is identical to the simple case of Eq. (9), a. Case c, labeled as “general” where a, b, and c depend implicitly on
H̄ shows that f(k, H̄) for cases a and b have the same relatively simple form. It also emphasizes the dynamic effects
as opposed to the physical causes. There is a destabilizing term, −ak, a short wavelength cutoff term, bk2, and a term
that stabilizes the entire spectrum, c.
Despite the label “general,” there are of course limitations to the application of Eqs. (8) and (9). For example,
there has been recent work on the effects of strain-dependent surface energies. [34] The second form can not represent
such an effect because the derivation assumes that the surface energy only depends on local quantities, (H and ∇H)
whereas the strain effect is non-local. However, it is reasonable to conjecture that a more detailed analysis of the
effects of a strain dependent surface energy term would produce a coefficient function f(k, H̄) not very different from
the case c “general” form of Eq. (9). Thus, the following analysis may very well apply to this more exotic model, but
more study is needed to be certain.
2.1.2 Dynamics
As discussed in Sec. 2.1.1, the dynamics are derived assuming no flux of new material (Q = 0) and keeping only
terms to linear order in the height fluctuation, h(x, t). Under these assumptions, Eq. (1) can be decomposed into a
Table 1: Characteristic wave-numbers, characteristic times and associated dimensionless variables for the three cases
addressed in Eq. (9).
kc tc α β
case a 2M(1+ν)�
16DΩM4(1+ν)4�8m
γW ′′(H̄)
4M2(1+ν)2�4m
case b 2M(1+ν)�
(F (02)sw )
16DΩM4(1+ν)4�8m
F (02)sw F
4M2(1+ν)2�4m
case c a/b b3/(DΩa4) k/kc cb/a2
trivial equation for H̄ and an equation for the film height fluctuation by inserting Eq. (7).
dH̄/dt = 0 (10)
∂th(x) = −∇ · D∇µlin(x) (11)
where µlin(x) is the inverse Fourier transform of Eqs. (8) and (9), and it depends implicitly on the average film height
H̄. Note that the time dependence is implicitly while the coordinate dependence is explicit. The explicit coordinate
dependence serves to distinguish Assuming that the diffusivity D is constant, the Fourier transform of Eq. (11) gives
the linearized differential equation for the evolution of each Fourier component.
∂thk = −Dk2µk = −Dk2f(k, H̄)hk. (12)
Solving Eq. (12),
hk(t) = hk(0)e
σkt; (13)
σk = −Dk2f(k, H̄). (14)
The surface evolves in reciprocal space as an initial condition, hk(0) multiplied by an envelope function, eσkt. For
most values of H̄, this envelope function has a peak. As time passes, this peak narrows and can be approximated by a
gaussian. To analyze this behavior, appropriate dimensionless variables are defined. Then, the stability of the film is
discussed. Finally, σk is expanded about its peak to aid analytic calculations.
The time dependent behavior of the film height fluctuations is facilitated by using a characteristic wave number,
characteristic time and related dimensionless variables. For the “general” case c of Eq. (9), the characteristic wavenum-
ber is kc = a/b, and the characteristic time is tc = 1/(DΩbk4c ) = b3/(DΩa4). These characteristic dimensions can be
used to define a dimensionless wave vector, α = k/kc and a dimensionless wetting parameter β = c/(bk2c ) = cb/a
One can also define a dimensionless time, τ = t/tc. To obtain the corresponding characteristic scales for cases a and
b, one merely has to plug in the appropriate substitutes for a, b and c and follow the pattern. For example, for case
a, make the substitution a → Ω2M(2 + ν)�2m, etc. Table 1 summarizes these values for all three cases. For all three
cases, f(k, H̄) and the growth constant σk reduce to the following forms:
f(k, H̄) = f(kcα, H̄) = Ωbk2c
−α+ α2 + β
σk = σkcα = t
α− α2 − β
, (16)
where α = ‖α‖ = k/kc is the dimensionless wave number. These forms are plotted in Figs. 1a and 2. Fig. 1a shows
f(k, H̄)/Ωbk2c vs. α for an isotropic or one dimensional surface. Figs. 2 shows tcσk vs. α for a 2D anisotropic
surface (Sec. 2.2). However, the curves marked 0◦ are identical to the dispersion relation for a 1D or 2D isotropic
surface (compare Eqs. (9) and (23)).
2.1.3 Peaks
The peak growth rate and the corresponding wavenumber k can be found from Eq. (16). σk has a peak at k0 = kcα0
where
9− 32β
. (17)
Expanding σk about this peak to second order in k − k0,
σk ≈ σ0 −
σ2(k − k0)2
! ! 0.25!
increasing θk
Figure 1: Dimensionless diffusion potential prefactors vs. dimensionless wave number. (a) The one dimensional or
isotropic case with β = 0.3. (b) The elastically isotropic case with anisotropy �A = 0.1 (see Eq. (22)).
increasing θk
22.5◦
increasing θk
22.5◦
Figure 2: Dimensionless growth constant vs. dimensionless wave number. Curves are plotted for the elastically
anisotropic case, but the curves marked 0◦ are the same as for the isotropic cases. In (a), β = 0. In (b) β = 0.2.
Figure 3: Exponential Envelope eσkt as function of α for β = 0.208 and t/tc = 100. (a) 2D isotropic surface. (b) 2D
anisotropic surface with � = 0.1236.
The two constants are
t−1c α
0 (α0 − 2β) , (18)
σ2 = t
c (3α0 − 4β) . (19)
Inserting this approximation for σk into Eq. (13),
hk(t) = hk(0)e
σ0te−
2σ2t(k−k0)
. (20)
The individual initial surface fluctuation components grow with a gaussian shaped envelope. An example of this
envelope is plotted in Fig. 3(a). Notice that in two dimensions, the envelope forms a ring as the peak is about the
wave-number k0 but not about any particular point in the k-plane.
2.1.4 Stability and wetting potential
Stranski-Krastanow growth is marked by a transition from stable two-dimensional growth to unstable three-dimensional
growth once a critical height Hc is reached. [1] Eqs. (17), (18) and (20) are useful for analyzing the transition from
stable to unstable growth. In order for this transition to occur, there must be some stabilizing term in the diffusion
potential. In the present model, this means that there must be some surface energy-like term that varies strongly with
film height. This condition equates to stating that W ′′(H̄) or F 20sw or c (Eq. (9)) must be rather large if H̄ < Hc.
However, as H̄ increases, these terms are reduced. Finally, when H̄ > Hc, this term is no longer capable of stabilizing
the film against fluctuations of all possible wavelengths.
The critical value Hc can be found using the analysis from [33]. By inspection of Eqs. (8), (9) and (12), modes
with f > 0 increase the total free energy F as they grow; thus, they are stable and decay with time. Modes with
f < 0 decrease the total free energy F as they grow; thus, they are unstable and grow with time. This growth and
decay rule is easily verified by inspection of Eq. (14). Thus, stable growth occurs when f(k, H̄) > 0 for all values of
k, and unstable growth occurs when f(k, H̄) < 0 for some values of k. Thus, the transition from stable to unstable
growth occurs when the minimum value of f(k, H̄) just becomes negative. Using the same dimensional analysis as in
the previous section and following the discussion of [33], one finds that the minimum value, fmin = Ωbk2c (β − 1/4),
occurs at kmin/kc = αmin = 1/2. fmin first becomes negative, and the transition to unstable growth occurs when
the dimensionless wetting parameter (Table 1) drops to a critical value, β = 1/4 . β > 1/4 stable 2D growth, and
β < 1/4 unstable 3D growth. It is reasonable to suppose that W (H̄), W ′′(H̄), and thus β are positive monotonically
decreasing functions of H̄ so that the interface becomes less important for large values of H̄. For example, in [50] it is
assumed that W (H) = B/H, where B is constant. When β → 0, corresponding to large H̄, the case discussed in [28]
is obtained. A similar analysis can be done for cases b and c once one specifies how the terms F (20)sw and F
sw or a, b
and c depend on H̄.
Using a guessed form for a wetting potential, one can find the critical film heightHc by setting β = 1/4 . Applying
this condition to case a in Eq. (3)
W ′′(Hc) = γk2c/4.
Using the wetting potential of [50] as an example, W (H) = B/H,
Hc = 3
8B/(γk2c ) =
8Bγ/(2M(1 + ν)�2m)2. (21)
Conversely, one can fit a wetting potential to an observed or reasonable critical layer thickness from the same condition.
Using the example wetting potential from [50],
(2M(1 + ν)�2m)
as stated in [50].3
2.2 2D Anisotropic case
Crystal anisotropy leads to a dispersion relation σk that is both quantitatively and qualitatively different from the
isotropic case. Here the effect of elastic anisotropy is discussed in most detail. Other sources of anisotropy are
the surface and wetting energies. For example, in [38] the surface energy density is orientation dependent which
introduces a possible anisotropy in the dispersion relation. Possible sources of anisotropy are an anisotropic elastic
stiffness tensor, an orientation dependent surface energy or wetting potential or anisotropic diffusion. As discussed
below, the form of anisotropy to linear order in the height fluctuation, h, is somewhat restricted. Results are presented
for 4-fold symmetric surfaces, that is surfaces that have invariant dynamic evolution laws when rotated by 90◦. Possible
complications arising from 2-fold symmetric anisotropic terms (with 180◦ rotational symmetry) are also discussed. As
for the isotropic case, first the energetics are discussed, then the dynamics, and finally the expansion about the peaks
in the dispersion relation, σk.
2.2.1 Energetics
The discussion of energetics will first treat the effects of elastic anisotropy and then anisotropy resulting from surface
or wetting like terms.
3This result from [50] corresponds to the choice Fsw(H,∇H) =
1 + (∇H)2γ+W (H). However, the numerical model in [50] appears to
use Fsw(H,∇H) =
1 + (∇H)2 [γ(∇H) +W (H)]. This difference should lead to a slightly different critical film height in their numerical
model from the one that they predicted (Eq. (21)).
Figure 4: Plot of Eθk/(M�
m) for various materials. Symbols indicate values calculated using Appendix C. Solid lines
are the interpolation (Eq. (22)) using the values from Table 2.
Table 2: Elastic constants [51] and calculated values (see Appendix C) for various materials of interest at T = 300K.
c11 c12 c44 M
1011 ergcm3 10
11 erg
cm3 10
11 erg
cm3 10
11 erg
Ge 12.60 4.40 6.77 13.93 2.16 1.906 0.1176
Si 16.60 6.40 7.96 18.07 2.22 1.997 0.1005
InAs 8.34 4.54 3.95 7.94 2.70 2.09 0.226
GaAs 11.90 5.34 5.96 12.45 2.15 1.87 0.1302
2.2.1.1 Elastic anisotropy One would like to obtain a simple symbolic expression for the elastic energy density at
the free surface, ωk, to first order in hk for the elastically anisotropic case. Similar discussions can be found in [25, 26].
For the isotropic case, ωk = −2M(1 + ν)�2mhk. For the anisotropic case,
ωk = −Eθkkhk
where the prefactor Eθk is the decrease in elastic energy at the surface per unit wave number (k → 1) and unit amplitude
(hk → 1) . It is not constant, but instead depends on the θk, the angle that k makes with the x−direction. The case of a
cube-symmetry elastic stiffness tensor such as for Si is considered where one must specify three elastic constants c11,
c12 and c44. [51]. Growth on a (100) surface will produce an elastic energy prefactor Eθk that is four-fold symmetric
(symmetric upon rotations by 90◦). A procedure similar to [25, 26] based on a first order perturbation analysis is
followed (Appendix C). A relatively simple interpolation formula [24] is hypothesized and then verified numerically.
The interpolation procedure, suggested in [24] uses the lowest possible order expansion in sin(θk) and cos(θk)
that has the appropriate four-fold symmetry and then interpolates between θk = 0◦ and θk = 45◦. Thus,
Eθk = E0◦
1− �A sin2 (2θk)
where �A = (E0◦ − E45◦)/E0◦ is an anisotropy factor. This lowest order form turns out to be a very good fit to
numerical calculations (Fig. 4). Table 2 gives values of E0◦ and �A for some systems of interest. In the elastically
isotropic case, E0◦ = E45◦ = 2M(1 + ν) so that �A = 0.
There are two important differences from the elastically isotropic case. The first is obvious, that Eθk depends on
angular orientation, θk. The second is that the peak value of ωk is not the same as that for the elastically isotropic case
because in general, E0◦ 6= 2M(1 + ν). In [24], where the purpose was simply to investigate the mechanism by which
elastic anisotropy effects order, this second difference was neglected.
2.2.1.2 Surface and Wetting Energy Anisotropy The surface energy and wetting potential can be additional
sources of anisotropy if they depend on the surface orientation so that γ → γ(∇H) or W (H) → W (H,∇H) (for
example, [52, 38]). Then, to first order in h ,
µsurf.,k = Ω
γk2 + k · γ̃′′ · k
where γ̃′′ is the (2×2) matrix or Hessian matrix that results from taking the second derivatives of γ(∇H) with respect
to the two components of ∇H (Appendix B.1). Similarly
µwet,k = Ω
W (20) + k · W̃(02) · k
where W (20) and W̃(02) are the second derivatives of W (H,∇H) with respect to H and ∇H (Appendix B.1). For
both µsurf.,k and µwet,k, the first term is isotropic, and the second term contains any possible anisotropy.
The rank of the γ̃′′ and W̃(02) matrices greatly restricts the possible forms of the additional anisotropy. These
(2 × 2) matrices must be either two-fold symmetric or perfectly isotropic. Thus, if the surface energy and wetting
potential are four-fold symmetric as Eθk is, then γ̃
′′ → γ′′, a scalar, and W̃(02) → W (02), a scalar, and neither one
contributes any additional anisotropy. They do, however, help to stabilize or further destabilize the 2D surface as they
add terms proportional to k2. The effect of these additional terms is indistinguishable from the effect of varying the
value of the surface energy density, γ. [52, 31]
It should be noted that the (100) surface of a diamond or zinc-blend structures allows for anisotropy that is only 2-
fold symmetric (rotations by 180◦). Thus, they could “break” the four-fold symmetry that occurs when one considers
the elastic anisotropy alone. However, this “broken” symmetry is somewhat dubious because even the diamond and
zinc-blend structures have a screw symmetry (rotations by 90◦ and translation in the [100] direction by half a lattice
vector). Thus, if for example, W (H,∇H) is anisotropic with two-fold symmetry to linear order, there must be a
fast oscillation with changes in the film height H. In Appendix D, a similar term related to anisotropic diffusion is
discussed. There does not appear to be any evidence for this two-fold symmetry in the case of (100) surfaces of IV/IV
systems such as Ge/Si, but in III-V/III-V systems the four-fold symmetry of the (100) surface may indeed be “broken”
in this way corresponding to either a surface energy anisotropy or a diffusional anisotropy. [53, 54]. Further analysis
of such terms in any more detail would greatly complicate the present discussion, so it is left for future work. Most
of the modeling literature avoids this complication by not including the symmetry-breaking of the zinc-blend surface,
for example [25, 26, 38].
One can perform a similar analysis of the combined surface and wetting potential, Fsw(H,∇H) (case b). To linear
order the resulting anisotropic diffusion potential is (Appendix B.2)
µsw,k = Ω
F (20)sw + k · F̃
sw · k
Again, F̃(02)sw is a rank 2 tensor, and all of the same symmetry considerations apply here as well.
Because the two-fold symmetry anisotropic terms are excluded from the current discussion, and isotropic terms
simply “renormalize” the effective of surface energy, there will be no further consideration of anisotropy resulting
from the surface energy or wetting potential in this discussion. Further calculations will proceed assuming that the
surface energy density, γ, nor the wetting potential,W (H), depend on ∇H or similarly that Fsw(H,∇H) has a purely
isotropic dependence on ∇H. This assumption can be made without affecting any of the qualitative results.
2.2.1.3 total diffusion potential Having dispensed with the discussion of the various sources of anisotropy, the
total diffusion potential is stated for the case of 4-fold symmetric elastic anisotropy and a completely isotropic surface
energy and wetting potential. µk = f(k, H̄) with
f(k, H̄) =
1− �A sin2(2θk)
k + γk2 +W ′′(H̄)
; case a (Eq. (3))
1− �A sin2(2θk)
k + F (02)sw k2 + F
; case b (Eq. (5))
1− �A sin2(2θk)
+ bk2 + c ; case c (general)
. (23)
Table 3: Characteristic wave-numbers, characteristic times and associated dimensionless variables for the three cases
addressed in Eq. (9)
kc tc α β
case a E0◦/γ γ3/(DΩE40◦) k/kc γW ′′(H̄)/E20◦
case b E0◦/F
/(DΩE40◦) k/kc F
sw /E20◦
case c a/b b3/(DΩa4) k/kc cb/a2
2.2.2 Dynamics
The dynamics is governed by surface diffusion, just as for the fully isotropic case. It is assumed that the diffusivity
is isotropic as was done for the surface energy and the wetting energies; thus, all anisotropy in the film evolution
dynamics comes from elastic effects alone. The possibility and effects of an anisotropic diffusion potential is discussed
in Appendix D (also see [54]). The time dependence of the surface perturbations simply follows Eqs. (13) and (14),
but with Eq. (23) used for f(k, H̄). As for the isotropic case, appropriate characteristic wave numbers (kc) and
time scales (tc) can be found for each of the three cases along with the associated dimensionless wave vector α and
dimensionless wetting parameter β. These are listed in Table 3. The dispersion relation, σk can be expressed in terms
of these dimensionless variables (α and β), giving
σk = σkcα = t
1− �A sin2(2θk)
− α2 − β
. (24)
The stability behavior is essentially the same as for the isotropic case with a transition occurring at β = 1/4 corre-
sponding to H̄ = Hc.
2.2.3 Expansion about peaks
σk has 4 peaks at (k, θk) = (k0, π[n− 1]/2) with k0 = kcα0 (Eq. (17)) and n = 1 . . . 4. In vector form, there are four
peaks at
kn = k0 (cos(π(n− 1)/2)i + sin(π(n− 1)/2)j) .
Similar to the isotropic case, σk can be expanded about individual peaks so that in the vicinity of peak n, σk ≈ σn
σn = σ0 −
σ‖(k − k0)2 −
0(θk − nπ/2)
where σ0 is given by Eq. (18), σ‖ = σ2 given by Eq. (19), and
σ⊥ = 8�Aα0t
In terms of the vector components parallel and perpendicular to kn, k‖ and k⊥ respectively,
σn = σ0 −
σ‖(k‖ − k0)2 −
k‖ = cos[π(n − 1)/2]kx + sin[π(n − 1)/2]ky , and k⊥ = − sin[π(n − 1)/2]kx + cos[π(n − 1)/2]ky . The time
evolution of hkin the vicinity of one of the kn is
hk(t) ≈ hk(0)et(σ0−
2σ2(k‖−k0)
2− 12σ⊥k
3 Correlation Functions
Correlation functions and associated constants such as correlation lengths can be very useful for characterizing order.
In particular, the autocorrelation function (Eq. (25)) and its Fourier transform (Eq. (26)) also known as the spectrum
function can give a very good characterization of dot order (Figs. 6a and c and 5b, e and h). The autocorrelation
function is denoted CA(∆x) where ∆x is the difference vector between two points in the x−plane. The spectrum
function is a function of k, and it is denoted CAk . The goal here is to be able to predict these two functions and to
h(nm)
h(nm)
h(nm)
describe them quantitatively in a manner that can be used to characterize SAQD order with just a few numbers. The au-
tocorrelation function is the result of a spatial average over one experiment or one simulation (numerical experiment).
It is regular and repeatable because it is closely tied to the correlation function and spectrum function that results from
an ensemble average (Eqs. X and X). These are denoted as C(∆x) and the spectrum Ck respectively. Note that the
ensemble averaged functions do not have a superscript “A.” These ensemble average correlation functions are useful
in the analysis of stochastic ordinary and partial differential equations. [55, 56]. From a strictly technical viewpoint,
the spatial average and the ensemble average are not exactly the same; however, they are closely enough connected
that it is reasonable to use one as a substitute for the other (Sec. 3.1 and Appendix 3).
In the following, the analysis of SAQD order via autocorrelation and correlation functions is discussed (Sec. 3.1).
Then, the stochastic initial conditions are discussed (Sec. 3.2). Then, the prediction of the Fourier transforms of the
correlation functions is discussed (Sec. 3.3). The real-space correlation functions are presented (Sec. 3.4). Finally,
there are some notes regarding generalizing the analysis method to any dispersion relation that has peaks (Sec. 3.5),
for example, peaks related to broken four-fold symmetry or growth on a miscut substrate.
3.1 Correlation Functions and SAQD order
Auto-correlation functions are well-suited for investigating SAQD order. The autocorrelation function is defined as
CA(∆x) =
d2x′ h(∆x + x′)h(x′)∗. (25)
Its Fourier transform sometimes called the spectrum [56], spectrum function or power spectrum is
CAk =
(2π)d
d2∆x e−ik·∆xC(∆x) =
(2π)d
, (26)
where A is the projected area of the film in the x − y−plane. A periodic array of SAQDs leads to a periodic auto-
correlation function. A nearly periodic array leads to a range-limited periodic auto-correlation function. The ensemble-
mean of these autocorrelation functions can be calculated, and it is a good predictor of a SAQD order.
3.1.1 Periodic array
Consider a perfectly periodic height fluctuation corresponding to a perfect lattice of SAQDs,
h(x) =
exp [ikn · (x− xO)] (27)
plus higher order harmonic, where the dots have a height proportional to h0, N is the degree of symmetry, probably,
4-fold or 6-fold, xO is a random origin offset.
kn = k0
2π(n− 1)
i + sin
2π(n− 1)
In a linear analysis, the higher order harmonics do not come into play, so they are neglected here. In reciprocal space,
e−ikn·xOδd(k− kn)
plus higher order harmonic. The autocorrelation function is found by plugging Eq. (27) into Eq. (25) and simplifying,
CA(∆x) =
)2 N∑
exp [ikn ·∆x] (28)
plus higher order harmonic. In finding Eq. (28), the relation∫
d2x′ ei(km−kn)·x
= Aδkmkn = (2π)
dδd(km − kn) (29)
has been used. δkk′ is the Kronecker Delta, and δd(k− k′) is the Dirac Delta. Eq. (29) will be helpful whenever it is
necessary to take an areal average or sum over Dirac Delta functions. In reciprocal space,
CAk =
(2π)2
m,n=1
δ2(k− km)δ2(k− kn)
δ2(k− ki) (30)
plus higher order harmonics, where δd(k − kn) = (A/(2π)d)δkkn .4. Thus, the order of the SAQD lattice manifests
itself as periodic functions in real-space (Eq. (28)) and sharp peaks in reciprocal space (Eq. (30)).
3.1.2 Nearly Periodic array
A nearly periodic arrays shows deviation from perfect order. This deviation is shows itself by broadening of the peaks
of the spectrum function, CAk , and by range limited periodicity of the real-space autocorrelation function, C
A(∆x).
These two measure of disorder are naturally related.
The disorder in lateral dot size ∆size and spacing, ∆spacing are related to each other and to the broadening of the
peaks in CAk (Fig. 6.a and c). Prior to ripening, the size and spatial order should be related, as the volume of a dot
should be proportional to the amount of nearby material. If the SAQDs have nearly uniform size and spacing (peak-
to-peak distance) L0, the reciprocal space autocorrelation function will be tightly clustered around the wavenumber
characterizing the dot spacing k0 = 2π/L0. There are a number of such peaks depending on the system symmetry
(Fig. 6.a and c), but consider just one. Since the order is not perfect, the peak will have a finite width. Consequently,
there will be a scatter in the dot size. Since L0 = 2π/k0, the scatter in dot spacing (∆spacing) is related to the scatter in
Fourier components (∆k). Taking the derivative of the spacing-wavenumber relation and rearranging,
∆spacing
It is reasonable to expect that the fractional disorder in size (∆size /Lsize) is given by a similar (if not exactly the same)
number.
Another way to view spatial order (periodicity) is not by dot-dot distances, but the distance over which the dot array
can be considered periodic. This limited periodicity is evident in the film height autocorrelation function (Eq. (25) and
Figs. 5.b, e and h). Consider two distant dots. Their position will be completely uncorrelated, so it will be completely
random as to whether one position corresponds to a peak or a valley. Thus, for a large differences in position the
autocorrelation function tends to zero.
CA(∆xlarge) = 0
Similarly, the mean-square fluctuation of the film height can be large so that
CA(∆x = 0)� 0.
The distance over which the autocorrelation function, CA(∆x) decays to 0 is the correlation length, Lcor. Thus, Lcor
is a reasonable measure of spatial order.
The two measures of order ∆spacing and Lcor are intrinsically linked. The well known rule of Fourier transforms
states that the product of the real-space and reciprocal space widths must be greater than or equal to unity. Thus,
∆kLcor ≥ 1, or ∆spacing ≥ 2πL20/Lcor. Similarly, one can expect that ∆size ∼ L2size/Lcor. Thus, assuming that dot size
is governed by the amount of nearby material, small dispersions in dot size are only possible if there is long correlation
length.
3.1.3 Ensemble Correlation Functions / ergodicity
SAQDs are seeded by random fluctuations. Consequently, each experiment or simulation must be treated as just
one possible realization, and the autocorrelation function will be different for each realization. Thus, for analytic
4Eq. (29) has been used to help with summation
predictions, one must rely on ensemble averages. In [24], it was assumed that the ensemble average correlation
function was a good description of a SAQD order, an assumption that was born out by numerical calculations. Now,
this relation is put on a more solid ground. In particular, it is found that the ensemble correlation functions provide
good estimates of the auto correlation function and spectrum function produced by any particular realization. First,
it is shown that the mean value of the film-height fluctuation is zero. Then the method to calculate the ensemble-
averaged autocorrelation function and spectrum function is presented. Additional mathematical details are presented
in Appendix 3.
3.1.3.1 Mean fluctuation It is fairly straightforward to show that the ensemble mean film-height fluctuation is
zero. The governing dynamics (Eq. (12)) is invariant upon the substitution h(x, t) → −h(x, t). Thus, assuming that
one does not bias the initial conditions the mean fluctuations must be zero for all time,
〈h(x, t)〉 = 〈−h(x, t)〉 = 0, and 〈hk(t)〉 = 0.
This is a common situation, and it is most appropriate to characterize the film height fluctuations using the two-point
correlation function (or simply “the correlation function”). [55]
3.1.3.2 Correlation Function The autocorrelation function can be estimated by its ensemble average. Further-
more, this ensemble average is equivalent to the correlation function that can be easily calculated analytically. These
relations are first discussed for the real-space correlation functions and then their Fourier transforms. First, the statis-
tical properties of the autocorrelation function are discussed. Then the statistical properties of the spectrum function.
Finally, the method to The main results are reported here, and details of derivations are reported in Appendix E.
First it is noted that the autocorrelation function averaged over all realizations is equal to the ensemble correlation
function. 〈
CA(∆x)
= C(∆x), where C(∆x) = 〈h(∆x)h(0)〉 , (31)
where 〈. . . 〉 indicate an ensemble average. Eq. (31) assumes that the model of film-growth is translationally invariant.5
This relationship is fortunate, in that it allows one to predict the “typical” autocorrelation function using analytic tools
that apply only to ensemble averages.
Second, it is noted that as the area that is used to calculate the autocorrelation function becomes large, the autocor-
relation function tends towards it mean value,
CA(∆x) ≈ C(∆x) +O[A−1/2], (32)
where O[A−1/2] indicates statistical fluctuations about the mean value that become smaller and smaller as the area in
an experiment or the simulation area in a numerical experiment becomes larger. These fluctuations or noise die out as
A1/2. For example, the autocorrelation functions in Figs. 5.e and h are very close to the ensemble average autocorre-
lation functions Figs. 5.f and h, but have random fluctuations that are most visible far from the origin. This property,
that averaging over a parameter such as position is equivalent to averaging over all realizations, is known as ergodicity.
Individual realizations are tightly distributed about a “typical” behavior. This tight distribution lends credibility to the
notion that one can have representative experiments or simulations. Unfortunately, the “demonstration” of Eq. (32) in
Appendix E is not as general as one might like. Rigorously, it applies when the Fourier components of film height (hk)
are independent and normally distributed; however, it is reasonable to conjecture that a relationship like Eq. (32) holds
whenever the statistical distribution of film heights is suitably bounded as the boundedness of CAk plays an important
role in the derivations.
In reciprocal space, one finds that the ensemble-mean spectrum function is〈
= Ck, (33)
where Ck is defined as the prefactor appearing in the reciprocal-space two-point correlation function.
Ckk = 〈hkh∗k〉 = Ckδ
d(k− k′) = Ck
(2π)d
δkk′ , (34)
5A quick survey of literature will find that, virtually all published continuum models of SAQD formation are translationally invariant.
where Eq. (29) has been used. This form for the two-point correlation function in reciprocal space occurs if and only
if the system is translationally invariant. Eq. (33) is valuable because one can solve for Ck analytically in the linear
case or using various analytic approximations in the non-linear case. Unlike the autocorrelation function, the spectrum
function fluctuates greatly about its mean. In fact, the fluctuations are about 100% (Appendix E.2). These large
fluctuations result in the commonly observed speckle pattern for the spectrum function CAk (Figs. 6.a and c). Contrast
this pattern with ensemble-mean spectrum function Ck shown in Figs. 6.b and d. These speckles can be removed by a
smoothing operation, and a relation similar to Eq. (32) results (Appendix E.2.2). Finally, it should be noted that just
as CAk is the Fourier transform of C
A(∆x), Ck is the Fourier transform of C(∆x) (Appendix E.1).
3.2 Stochastic Initial Conditions
To model or simulate the formation of SAQDs, it is absolutely essential to include some sort of stochastic effect. An
initially flat film h(x, 0) = 0 is in unstable equilibrium. Thus, to seed the formation of quantum dots, it is necessary
to perturb the flat surface. The simplest method to do this is to use stochastic initial conditions with deterministic
evolution. One can tenuously suppose that white noise initial conditions do not “bias” the ultimate evolution of the
film. [57] Thus, the initial conditions are taken from an ensemble with zero mean,
〈h(x, 0)〉 = 0. (35)
and a spatial correlation function,
C(x,x′, 0) = 〈h(x, 0)h(x′, 0)∗〉 = ∆2δd (x− x′) , (36)
where the brackets 〈. . . 〉 indicate an ensemble average, ∆ is the noise amplitude, and δd(x) is the d−dimensional
Dirac Delta function. White noise conditions have an infinite amplitude which is not physical. Thus, a minimum
modification can be made to “cut off” the infinite fluctuations.
C(x,x′, 0) =
(2πb20)
(x− x′)2
In the limit b0 → 0, this correlation function reverts to the white noise correlation functions.
In reciprocal space,
Ckk′(0) = 〈hk(0)h∗k′(0)〉
= (2π)−2d
ddx′ e(−ik·x+ik
′·x′)C(x,x′, 0)
(2π)d
δd(k− k′)
Letting b0 → 0, the white noise reciprocal space correlation function is obtained. Thus, the initial spectrum function
Ck(0) =
(2π)d
The atomic-scale has a small and short-lived influence on the final film morphology (Appendix F), but the cutoff
procedure is useful for choosing a reasonable value of ∆2. It seems reasonable to choose ∆2 so that the initial r.m.s.
fluctuation
C(0, 0) = 〈h(0, 0)h(0, 0)∗〉1/2is one monolayer (1 ML). Also, choosing b0 = 1 ML as the atomic scale
cutoff is
∆2 = (2π)d/2(1 ML)2+d, (38)
where the natural unit 1 ML is, of course, material dependent.
Using stochastic initial conditions, one can integrate individual initial conditions to obtain representative samples
and then average over many realizations, the Monte Carlo approach, or one can calculate analytically, the statistical
measures of the ensemble. The ensemble statistical measures are strongly related to the statistical measures of order
for an individual realization, so the second approach is opted for here. Thus, the predicted SAQD order is ultimately
stated in terms of ensemble correlation functions.
3.3 Reciprocal Space Correlation Functions
The reciprocal space correlation function, Ckk′ , and spectrum function, Ck, are calculated for the 1D and 2D isotropic
case and then for the 2D anisotropic case. Generally Ck includes the length scales introduced in Sec. 2.1.3 as well as
the atomic scale cutoff b0.
Ckk′ = 〈hk(t)h∗k′(t)〉 = e
(σk+σk′ )t 〈hk(0)hk′(0)∗〉
(2π)2
e(σk+σk′ )t−
δ2(k− k′). (39)
Without much error, b0 can be neglected in the exponential (Appendix F). Using Eq. (34), the spectrum function is
then identified as
(2π)d
e2σkt. (40)
Ck is now calculated for each model: 1D isotropic, 2D isotropic and 2D anisotropic.
3.3.1 one-dimensional
The one dimensional surface is the simplest, so it is treated first. The spectrum function is simply
e2σ0t−
2 (2σ2t)(k−k0)
Ck has a peak at k = ±k0i. One can easily read off the correlation length as
Lcor =
2σ2t = k
2(3α0 − 4β)(t/tc). (41)
so that
e2σ0t−
cor(k−k0)
This approximation is valid when k0Lcor � 1. In terms of kx,
e2σ0t
cor(kx−k0)
cor(kx+k0)
3.3.2 2D isotropic
The 2D isotropic case is very similar;
(2π)2
e2σ0t−
cor(k−k0)
, (42)
where Lcor is the same as in Eq. (41). It has maximum that forms a ring in the k−plane as graphed in Fig. 6.b.
3.3.3 anisotropic
The anisotropic spectrum function is
(2π)2
e2σ0t
‖(k‖−k0)
2− 12L
⊥ , (43)
where
2σ‖t = k
(6α0 − 8β)(t/tc), (44)
2σ⊥t = k
16�α0(t/tc), (45)
k‖ = cos[π(n− 1)/2]kx + sin[π(n− 1)/2]ky , and k⊥ = − sin[π(n− 1)/2]kx + cos[π(n− 1)/2]ky and it is graphed
in Fig. 6.d. This approximation is valid when k0L‖ � 1 and k0L⊥ � 1.
Figure 6: CAk and Ck for Ge/Si as discussed in Sec. 4. (a,b) 2D isotropic surface. Eq. (42) is used for Ck. (c,d) 2D
anisotropic surface. Eq. (43) is used for Ck.
Figure 7: 1D isotropic surface in real space for Ge/Si as discussed in Sec. 4. (a) Example of h(x) plotted over a length
of 8Lcor. (b) corresponding reals space correlation functions plotted for range ±4Lcor. Filled plot is an example of
CA(∆x). Solid line isC(∆x) (Eq. (46)).
Loosely speaking, one can argue that the isotropic case is similar to letting �A → 0 in Eq. (45) so that the
perpendicular correlation length is always 0 regardless of time. A more conservative approach would be to argue that
L⊥ ≈ 2π/k0 for the isotropic model via inspection of Figs. 6(a) and (b). Even still, the more conservative result
guarantees that the perpendicular correlation length will always be the same as the dot spacing; thus, it will always
limit SAQD order to the first nearest neighbor at best.
3.4 Real Space Correlation Functions
The real space correlation functionsC(∆x) are now calculated for the 1D and 2D isotropic cases and the 2D elastically
anisotropic case.
3.4.1 one-dimensional
In one dimension,
C(∆x) =
dkx e
ikx∆xCk
e2σ0t−
2/L2cor2 cos (k0x) . (46)
Thus, C(∆x) has a damped periodicity indicating that it is imperfectly periodic (Fig. 7).
3.4.2 2D isotropic
In two dimensions with elastic isotropy,
C(∆x) =
d2k eik·∆xCk
(2π)2
e2σ0t
dk kei(k∆x cos(θk−θ∆x)e−
cor(k−k0)
Performing the angular integration first,
C(∆x) =
e2σ0t
dk kJ0(k∆x)e
− 12L
cor(k−k0)
where J0 is the zeroth Bessel function. In general, this integral is best performed numerically; however, it can be
solved in two important cases: ∆x→ 0 and Lcor →∞ (corresponding to long times). In the first case,
C(∆x) =
e2σ0t
dk ke−
cor(k−k0)
Under the same conditions that Eq. (42) is valid (k0Lcor � 1), the lower limit of the integral can be approximated as
−∞ so that
C(∆x = 0) =
∆2k0√
2πLcor
e2σ0t. (47)
This function gives the mean square surface height fluctuation. In the second case where Lcor →∞, e−
cor(k−k0)
(2π)1/2L−1cor δ(k − k0), so that
C(∆x) =
∆2k0√
2πLcor
e2σ0tJ0 (k0∆x) . (48)
This correlation function is the most ordered case for a 2D isotropic surface. It is graphed in Fig. 5c.
3.4.3 anisotropic
To find the real-space correlation function for the elastically anisotropic case, it is best to find the contribution from
each peak and then sum so that
C(∆x) =
(2π)2
e2σ0t
Cn(∆x) (49)
where
Cn(∆x) =
d2k eik·xe−
‖(k‖−k0)
2− 12L
∆x can be decomposed into the directions parallel and perpendicular to kn, so that ∆x‖ = cos(π(n − 1)/2)∆x +
sin(π(n− 1)/2)∆y and ∆x⊥ = − sin(π(n− 1)/2)∆x+ cos(π(n− 1)/2)∆y. Thus,
Cn(∆x) =
dk‖ e
ik‖∆x‖− 12L
‖(k‖−k0)
dk⊥ e
ik⊥∆x⊥− 12L
2 (x2‖/L2‖+x2⊥/L2⊥)eik0x‖ .
Plugging into Eq. (49),
C(∆x) =
πL‖L⊥
e2σ0t
2 (x2/L2‖+y2/L2⊥) cos(k0x) + e
− 12 (x2/L2⊥+y2/L2‖) cos(k0y)
. (50)
3.5 Generalizability
The dynamics and analysis used here were for a specific model, but the general procedure for analyzing the order
resulting from a linearized model should hold for any model with well-separated peaks in the dispersion relation, σk.
The procedure to follow is:
1. Generate the dispersion relation, σk as some function of k.
2. Find the peaks in the dispersion relation, kn, (n = 1 . . . N )
3. Expand about the peaks to generate the peak values, σn, and local Hessian matrix,(
∂ki∂kj
The spectrum function is then approximately
Ck(t) ≈
(2π)2
e2σnt exp
t (k− kn) · H̃n · (k− kn)
. (51)
4. Find the Eigenvalues of the local Hessian matrix, (Hn)I and (Hn)II . They should be negative, if there is a
peak at kn
5. Use the eigenvalues to determine the correlation lengths, (Ln)I =
2 |(Hn)I | t and (Ln)II =
2 |(Hn)II | t.
The real-space correlation function is
C(∆x, t) ≈
(Hn)I (Hn)II
e2σnt exp
x · H̃−1n · x
eikn·x. (52)
The “goodness” of these approximate forms requires that (Ln)
I and (Ln)
II be much less than the spacing
between peaks in the correlation function so that the gaussians do not overlap greatly. A reasonable test for
this no-overlap condition is ‖kn‖ (Ln)I � 1 and ‖kn‖ (Ln)II � 1, assuming that the peaks are not large in
number or very closely spaced.
4 Order Predictions
The real-space correlation function formulas (Eqs. (46), (47), and (50)) and correlation length formulas (Eqs. (41), (44)
and (45)) can now be used to estimate the order of SAQDs. Ge on Si is chosen for this example because this system has
received the most attention from theoretical work [58, 38, 31, 18, 39, 41, 27, 25, 26, and others], and it is the simplest
since it involves the diffusion of a single species. The procedure described below tries to predict the amount of order
when an initial atomic-scale fluctuation becomes “large”. “Large” is taken to be greater than atomic-scale. Beyond
this point, one would expect non-linear terms to become important. An example is presented for Ge on Si at 600K to
compare and contrast the 2D anisotropic results with the 1D isotropic and 2D isotropic results. The predictions are
also compared with a linear numerical calculation on a discrete reciprocal-space grid to test the approximations made
and to illustrate the relation between the surface profile (h(x)), the example autocorrelation functions (CA(x) and
CAk ) and the ensemble correlation functions (C(∆x) and Ck). Figs. 6, 7 and 5 show these results. Finally, the relation
between average film height and order is investigated.
4.1 Ge at 600K
The formulations for the three discussed cases are implemented for Ge/Si at 600K. The correlation lengths are esti-
mated for the end of the linear regime where fluctuations become large (greater than atomic scale). First, appropriate
physical constants are used to give the corresponding correlation length and correlation functions vs. time. These in-
clude an initial average film height H̄ and a white noise amplitude ∆ (Eq. (38)). These initial conditions approximate
a film at the beginning of an anneal that immediately follows a rapid deposition. The time tlarge is found by solving for
the time where the mean-square fluctuations are atomic scale,
h(x, t)2
= C(∆x = 0) = 1 ML2. At this point, the
correlation lengths are calculated.
Physical constants for the 2D anisotropic calculation are taken as follows. The elastic constants for Ge at 600 K
are c11 = 1.199 × 1012, c12 = 4.01 × 1011(from cS = 3.991), c44 = 6.73. [51] Using aGe = 0.5658nm and
aSi = 0.5431nm, it is found that �m = 0.0418. Using the procedure from (Appendix C), M = 1.332× 1012dyn/cm2.
E0◦ = 4.96 × 109erg/cm3, and E45◦ = 4.35 × 109erg/cm3 , giving �A = 0.1236. The atomic volume is Ω =
2.27 × 10−23 cm3. The estimated surface energy density is γ = 1927 erg/cm2. The wetting potential is estimated
by picking a plausible critical surface height, Hc ≈ 4 ML = 1.132 nm and setting W (H) = E20◦H3c/(8γH) =
2.315 × 10−6/H erg/cm2. The resulting characteristic wave number is kc = 0.257 nm−1. The initial film height is
taken to be H̄ = Hc + 0.25 ML = 1.203 nm and then allowed to evolve naturally. Thus, β = 0.208, α0 = 0.5658,
k0 = 0.1456 nm−1, σ0 = 0.1192/tc, σ‖ = 0.864/(k2c tc), σ⊥ = 0.559/(k
c tc), L‖ = 0.744k
0 (t/tc)
1/2, and
L⊥ = 0.599k
0 (t/tc)
1/2. The unspecified diffusivity has been absorbed into the characteristic time tc. From Eq. (38),
∆2 = 0.0403 nm4, and Eq. (50) gives
C(0) =
1.223× 10−3tc/t
e0.02385t/tc nm2.
The initial infinitely rough surface undergoes a smoothing described by the tc/t factor. Then the surface roughens due
to the exponential. The initial divergent roughness is an artifact of the non-physical white noise with the atomic scale
cutoff b0 neglected (Appendix F). The time for the fluctuations to become “large” again are found by setting
C(0) = h2large (53)
where hlarge = 1 ML = 0.283 nm. The solutions are t1 = 0.01527tc or t2 = 430tc. The first solution is discarded
since it is due to the non-physical white noise. At tlarge = t2, L‖ = 105.8 nm, and L⊥ = 85.2 nm. Taking L⊥ as
more limiting, the correlation spans about n = k0L⊥/π = 3.95 islands across. The corresponding reciprocal space
(Eq. (43)) and real-space correlation function (Eq. (50)) are shown in Figs. 6.d and 5.f respectively.
A corresponding numerical experiment is performed. A periodic surface of size l = 96(2π/k0) is used. Random
initial conditions consistent with Eq. (38) are used for k−space points on a square grid bounded by kx, ky = ±2k0.
The relation between discrete and continuous Fourier components is used, (hk)discrete = [(2π)d/A]hk. Eqs. (13)
and (14) are used without any additional approximation to find hk at time t = tlarge. The resulting CAk , a portion of
the height profile h(x) and CA(∆x) are plotted in Figs. 6(c), 5(d) and 5(f) respectively.
Similar calculations can be performed for the one-dimensional and two-dimensional elastically isotropic cases.
Isotropic values used previously [58, 24] are about E = 1.361 × 1012 dyn/cm2 and ν = 0.198 giving M = E/(1 −
ν) = 1.697 × 1012 dyn/cm2 and E = 2M(1 + ν) = 7.10 × 109 erg/cm3. Using the same critical surface height,
Hc = 4 ML, W (H) = 4.74×10−6/H erg/cm2. The resulting characteristic wave number is kc = 0.368 nm−1. If the
film is grown to H̄ = Hc+0.25 ML = 1.203 nm and then allowed to evolve naturally, β = 0.208; thus, α0 = 0.5658,
k0 = 0.208 nm−1, σ0 = 0.1192/tc, σ2 = 0.864/(k2c tc), and Lcor = 0.744k
0 (t/tc)
1/2. In one dimension, Eq. (46)
is used to find the mean square height fluctuation. Using Eq. (38) with d = 1, ∆2 = 0.0568 nm3, and
C(0, t) = 0.01271(t/tc)
−1/2e0.0238t/tc .
Setting C(0, t) = (1 ML)2 = 0.0801 nm2, t1 = 0.0252tc, and t2 = 186.9tc. At t2, Lcor = 48.8 nm, and n =
k0Lcor/π = 3.24, so about 3 dots in a row should be well correlated. The corresponding numerical calculation of
size l = 96(2π/k0) is performed. A portion of h(x), CA(∆x) and C(∆x) are shown in Fig. 7. In two dimensions,
Eq. (47) is used to find
h(x, t)2
C(0, t) = 9.40× 10−4(t/tc)−1/2e0.0238t/tc .
Setting C(0, t) = 0.0801 nm2, t1 = 1.376 × 10−4tc, and t2 = 306tc. At t2, Lcor = 62.4 nm, and n = k0Lcor/π =
4.14, and correlation is expected to extend about 4 dots. However, it should be noted that this correlation is not
lattice-like. Corresponding numerical results and ensemble correlation functions are shown in Figs. 6 and 5.a-c.
4.2 General case of β
In [24] it was suggested that allowing the film to evolve with β close to the stability threshold could enhance the SAQD
correlation. It is interesting to note what happens for different values of β. Similar analytic and numerical calculations
are performed for the large film-height limit, β = 0, for the 2D anisotropic Ge/Si surface. For β = 0, tlarge = 40.3tc,
L⊥ = 30.0 nm, and n = k0L⊥/π = 1.84, so one to two dots in a row are expected to be well correlated. h(x)
and real-space correlation functions are shown in Figs. 5g-i. The range of order is significantly less than for the case
β = 0.208 (Sec. 4.1). For Si/Ge at 600K, the 2D anisotropic predictions for tlarge and L⊥ are shown in Fig. 8. In
general, the closer β is to the critical value 0.25, the longer the correlation length. One can manipulate equation (53)
to find that tlarge/tc varies approximately but not exactly as (β − 1/4)−1 × ln[h2large
�A/(∆2k2c )]. Consequently,
L⊥ ∼ (β − 1/4)−1/2. Furthermore, the appearance of hlarge and ∆2 inside the logarithm shows that the final order
estimates are not overly sensitive to the guesses for ∆2 and h2large. The divergence of L⊥ with β − 1/4 is initially
encouraging, but it is clear that for the parameters used for Ge/Si, subatomic control of the film height is needed to
yield significantly enhanced long range correlations. Also as one approaches this threshold, one can probably expect
thermal activation to nucleate subcritical SAQDs whose effect on supercritically formed SAQDs is uncertain. There
should be some interesting phenomena at the theH → Hc.
l arge c
l a r g e
k 0L �
� 0 �
l a r g e � �
Figure 8: tlarge and L⊥ vs. β for Si/Ge using the 2D anisotropic model as described in Sec. 4. Units are normalize to
characteristic time tc and predicted number of correlated dots (n = k0L⊥/π).
5 Discussion/Conclusions
The order of epitaxial self-assembled quantum dots during initial stages of growth has been studied using a common
model of surface diffusion with stochastic initial conditions. It has been shown that correlation functions of small
surface height fluctuations can be predicted analytically using corresponding ensemble average correlation functions.
These correlation functions are characterized by correlation lengths that can be predicted by analytic formulas given
certain reasonable assumptions about the diffusion potential and the height and lateral scale of initial atomic scale
random fluctuations. Thus, the linear model of film surface height evolution via surface diffusion has enabled analytic
predictions of epitaxial SAQD order that are valid for small film height fluctuations. To what extent the initial degree
of order persists into later stages of growth remains to be studied, but the order of initial stages should certainly have
a strong influence on final outcomes. Furthermore, the linear analysis should provide insight into the less tractable
non-linear behavior. These predictions of SAQD order have been used to investigate the role of crystal anisotropy and
initial film height.
Crystal anisotropy has been shown to play an important role in enhancing SAQD order as observed in previous
numerical simulations continuum and atomistic numerical simulations. [43, 37, 44, 45] If a four-fold symmetry is
assumed for the governing dynamics, the effect of crystal anisotropy to linear order is felt through elastic anisotropy
alone. It is shown that elastic anisotropy is required to produce a lattice-like structure of SAQDs. The enhanced spatial
order should in turn lead to enhanced size order, a consequence that must be confirmed with non-linear studies, but
appears to be true based on the present available literature.
The role of initial film height has been shown to greatly influence order. Growth near the critical film height for
dot formation can enhance order. This order enhancement comes from increasing the duration of the linear small-
fluctuation stage of growth. In fact, the predicted correlation lengths diverge when the initial film height approaches
the critical film height from above. Achieving large correlation lengths in this manor is of course practically limited
by ability to control film heights to subatomic accuracy. Additionally, one should be careful when interpreting the
continuum model in such a context, as the effect of atomic discreteness might be greater at the transition film height.
Finally, it is likely that additional randomizing effects of thermal activation will effectively cut off this divergence
when the critical film height is approached from below during deposition.
Finally, the presented method may be useful as a first step in the analysis of methods to enhance SAQD order. It is
reasonable to suppose that under some circumstances initial growth stages will be very important while for others they
will not. For example, prior work on vertical stacking appears to confirm the presented ordering mechanism. [44].
Vertical stacking not only achieves vertical correlation of dots, but each layer is more ordered horizontally than the
one below. Additionally, a “growth window” was found, whereby to achieve enhanced order, the evolution of each
layer be terminated before ripening begins. The reported simulation [44] supports the following scenario for SAQD
order development. Order is enhanced during the small fluctuation stage as described here. Once the fluctuations are
sufficiently large, the seeded dots evolve towards their equilibrium shapes. Finally, the dots begin to ripen and order
diminishes. Order is transfered via strain to the next layer so that the next layer gets a head start on its initial ordering.
Thus, the multiple layers of dots effectively draws out the linear growth stage. It may be possible to modify the present
model to predict the correlation length of each SAQD layer.
A Diffusion Potential
The diffusion potential is calculated in terms of the film height H that is a function of the in plane coordinates x =
xi + yj. The elastic and surface energy portions of the diffusion potential can be found in [15]
µelast(x) = Ωω(x), and µsurf = −Ωγκ(x),
where Ω is the atomic volume, ω(x) is the elastic energy density at the film surface, γ is the surface energy density,
and κ is the total surface curvature. However, other calculations need to be included:
1. µwet for the two wetting potential cases, Eq. (3) and (5),
2. and µsurf and µwet when the surface energy density γ and wetting energy density W also depend on surface
orientation.
Before these case are addressed, a general form for the diffusion potential is justified.
A.1 General Form µ = ΩδF/δH(x)
The diffusion potential, µ(x), is the change in free energy, F , when a particle is added at a position, x. Note that µ(x)
and F are relative energies. They can be used to compare the binding energy of one site on the surface in comparison
with another site, but should not be interpreted as an absolute binding energy or total formation energy of the surface.
If a particle has a volume Ω, then the diffusion potential at x is related to the variation of free energy with volume,
δF = Ω−1
ddxµ(x)δV (x), (54)
where δV (x) is the volume variation at x. Calculating δV (x), V =
ddxH(x).Therefore, δV (x) = δH(x).
Substituting into δF (Eq. (54)), δF = Ω−1
ddxµ(x)δH(x) or µ(x) = ΩδF/δH(x).
A.2 Simple Model
Starting from Eq. (2), µ(x) is found by taking the variational derivative,
µelast.(x) = Ω
δH(x)
volume
ddxdz ω[H](x, z) = Ωω (x)
where the “[H]” indicates that the elastic energy, ω, is a nonlocal functional of the film height H, and ω(x) =
ω[H] (x,H(x)), the elastic energy density evaluated above lateral position x at the free surface (z = H(x)). See [15]
for details of the derivation. The surface energy diffusion potential is
µsurf.(x,t) = Ω
δH(x)
1 + (∇H(x))2
= −Ω∇ ·
1 + (∇H(x))2
γ = −Ωγκ(x).
The wetting energy diffusion potential is
µwet(x) = Ω
δH(x)
ddxW (H(x))
= ΩW ′(H(x))
Putting these three terms together, one obtains Eq. (3)
A.3 General Model
Consider the general form for the combined surface energy and wetting potential,
Fsw =
ddxFsw(H(x),∇H(x))
as in Eq. (4) so that the free energy is an integral over the x−plane of an energy density that depends on H(x) and
∇H(x) locally. The corresponding diffusion potential is
µ(x) = Ω
δH(x)
F (10)sw (H(x),∇H(x))−∇ · F
sw (H(x),∇H(x))
B Linearized Diffusion Potential and Anisotropy
The linearized diffusion potential µlin, k is found by finding µ(x) to first order in height fluctuations (h), to get µlin(x)
and then taking the Fourier transform to get µlin,k. The linearization of the simple isotropic diffusion potential corre-
sponding to Eqs. (2) and (3) was discussed in Sec. 2.1.1.1. Here, the more general diffusion potential corresponding
to Eqs (4) and (5) is linearized and then applied to the anisotropic simple model and the anisotropic general model.
Only the surface and wetting parts of the diffusion potential are discussed in this appendix. See ref. [15], Sec. 2.2.1.1
and Appendix C for discussion of µelast..
B.1 Linearizing the simple model
Consider a wetting potential and diffusion potential that both depend on the film height gradient ∇H, γ → γ(∇H)
and W (H) → W (H,∇H). Starting from Eq. (6) and expanding to second order in the film height fluctuation using
H(x) = H̄+ h(x) (Eq. (7)),
1 + (∇H)2
]−1/2
γ(∇H) =
(∇h)2 + . . .
γ + γ′ ·∇h+ γ̃′′ : ∇h∇h+ . . .
= γ + γ′ ·∇h−
γ (∇h)2 + γ̃′′ : ∇h∇h+O[h3]
where γ is γ(0), and the primes indicate the derivatives with respect to the surface height gradient.
γ′ = ∂∇Hγ(∇H)|∇H=0 , and γ̃
′′ = ∂∇H∂∇Hγ(∇H)|∇H=0 .
Taking the derivative with respect to ∇h results in a tensor of rank equal to the order of the derivative because ∇h is
a vector (tank 1 tensor). Taking the variational derivative, µsurf.(x) = ΩδFsurf./δh(x),
µsurf., lin(x) = Ω
γ∇2h(x)− γ̃′′ : ∇∇h(x)
The term with γ′ vanishes because it is the divergence of a constant (∇ · γ′). Taking the inverse Fourier transform,
µsurf., lin,k = Ω
−γk2 + k · γ̃′′ · k
hk. (55)
The first term is isotropic. The second term is parameterized by a rank 2 symmetric tensor.
Going through the same process, one finds essentially the same result for an orientation dependent wetting energy.
The step details are so close to the details for linearizing the more general form, Fsw(H,∇H), they are deferred to
(Appendix B.2). One finds that
µwet,lin,k = Ω
W (20) + k · W̃(02) · k
. (56)
where W (mn) = ∂mH∂
∇HW (H,∇H)|H=H̄,∇H=0 is the m
th and nth derivative of the wetting energy density with
respect toH and ∇H evaluated for a perfectly flat film of height H̄. W (mn) is a tensor of rank n.
B.1.1 isotropic case
In the isotropic case, γ̃′′ → γ′′Ĩ, where Ĩ is the identity operator, and γ′′ is a scalar. Similarly, W̃(02) →W (02)Ĩ. One
thus gets for the combined surface and wetting parts of the diffusion potential,
µsw,lin,k = Ω
−γ + γ′′ +W (02)
k2 +W (20)
Thus, in the isotropic case, the linear order effect of introducing a surface orientation to either the surface energy or
the wetting potential is simply to change the apparent surface energy density by γ → γ − γ′′ −W (02).
B.1.2 anisotropic case
The surface and wetting parts of the diffusion potential (Eqs. (55) and (56)) can admit only a limited anisotropy.
They both contain rank 2 symmetric tensors, γ̃′′ and W̃(02) in the x−plane. For a two-dimensional surface, this
means that they can either have two-fold-symmetric (rotations by 180◦) anisotropy or none at all. Thus, for the case
considered in Sec. 2.2.1.2, four-fold-symmetric anisotropy , the surface and wetting parts of the diffusion potential
must be completely isotropic. As discussed in Sec. 2.2.1.2, the (100) surface of zinc-blend structures, such as the
mentioned Ge, Si, InAs and GaAs present a rather complicated situation. For simplicity, it is assumed here that the
surface and wetting energies are at least four-fold symmetric. Consequently, they are completely isotropic.
Finally, it should be noted that if Fsw depends on higher order derivatives, then the discussion is greatly compli-
cated and a larger class of anisotropic terms is admissible. For example, whenFsw → Fsw(H,∇H,∇∇H,∇∇∇H, . . . )
is expanded aboutH(x) = H̄ to quadratic order in h, it would contains tensors of rank 6 and maybe even higher.
B.2 Linearizing the general model
The elastic part of the linearized diffusion potential was discussed in Sec. 2.2.1.1 and Appendix C . Eq. (56) can be
found by using all of the following steps with the substitution Fsw →W . The surface-wetting part of the diffusion po-
tential µ(x) is found by expanding Fsw to second order in the film-height fluctuation, h, and then taking the variational
derivative. Expanding Fsw about h = 0 and ∇h = 0,
Fsw(H̄+ h,∇h) = F (00)sw + F
sw h+ F
sw ·∇h+ hF
sw ·∇h . . .
· · ·+
F (20)sw h
F̃(02)sw : ∇h∇h+O[h
Note that in this expansion, all the F (mn)sw terms are constant with respect to h and depend implicitly on the average
film height, H̄. The first index indicates the mth derivative with respect to h. The second index indicates the nth
derivative with respect to ∇h. The derivatives are evaluated for a perfectly flat surface of height H̄. Thus,
F (mn)sw = ∂
∇HFsw (H,∇H)|H=H̄,∇H=0 .
Since ∇h is a vector in the x−plane, F (mn)sw is a tensor of rank n. Taking the variational derivative of Fsw =∫
ddxFsw(H,∇H) and keeping terms to order h1,
δh(x)
= F (10)sw −∇ · F
sw + F
sw h−∇ ·
F̃(02)sw ·∇h
Note that the F (00)sw term vanishes because it is constant, and the F
sw term vanishes upon simplification. Additionally,
the F (10)sw can be neglected if one enforces the condition that the film-height fluctuations do not add or subtract material
from the surface, namely that
ddx δh(x, t) = 0. Alternatively, one can discard it in anticipation of taking the gradient
of the diffusion potential, since it is a constant. The term ∇ · F(01)sw = 0 for the same reasons, or because F
sw is a
constant. Multiplying through by the atomic volume,
µlin(x) = Ω
F (20)sw h− F̃
sw : ∇∇h
. (57)
B.2.1 isotropic case
In the isotropic case, F̃(02)sw must be proportional to the identity so that F̃
sw = F
sw Ĩ; thus,
µsw,lin(x) = Ω
F (20)sw h(x)− F
2h(x)
Taking the inverse Fourier transform of this equation,
µsw,lin,k = Ω
F (20)sw + F
This gives case b in Eq. (9).
B.2.2 anisotropic case
If the surface is anisotropic, then F̃(02)sw in Eq. (57) is a rank 2 symmetric tensor in the x−plane. Thus, it can have two
distinct eigenvalues, and automatically has 2-fold rotational symmetry (rotations by 180◦). If any other symmetry is
assumed such as 4-fold symmetry (rotations by 90◦), then F̃(02)sw must be fully isotropic. Taking the inverse Fourier
transform,
µsw,lin,k = Ω
F (20)sw + k · F̃
sw · k
In Eq. (23), case b, it is assumed that there is four-fold symmetry, resulting in a surface-wetting part of the diffusion
potential that is completely isotropic.
C Elastic Anisotropy
In principal, the anisotropic elastic energy ωk is found in the same fashion as the isotropic elastic energy. [15] The
flat film, initially in a state of biaxial stress, is perturbed by a small periodic surface fluctuation of amplitude h0. An
appropriate elastic field is added to satisfy the perturbed traction-free boundary condition at the free surface. Finally,
the elastic energy is evaluated at the free surface to first order in h0. The coefficient h0 is the sought after ωk. The
equations themselves are cumbersome and best solved using a numeric implementation, so an abstract procedure for
calculating ωk is outlined here. ωk is found for k = 1 but arbitrary θk.
Let the surface have a height variation
h(x) = h0e
To first order in h0, the surface normal is
n(x) = −ikh0eikxi + k.
The elastic energy needs to be calculated to first order in h0. To find the elastic energy, it is necessary to find the
perturbing elastic field to first order in h0.
The initial unperturbed stress state is
σ̃m =
 σm 0 00 σm 0
0 0 0
where σm =
c11 + c12 − 2c212/c11
�m. Note that this stress state is isotropic in the x−y-plane and thus independent
of rotations about the vertical axis. Under this stress state, a flat surface is traction-free. With the height perturbation,
the traction is
tj = (n · σ̃m)j = −ikh0M�mδj1e
ikx. (58)
Next to find the perturbing elastic fields. These are not isotropic in the x − y−plane, and it is necessary to take
into account the angle. First, the 3× 3× 3× 3 elastic stiffness tensor cijkl is constructed for the cube orientation from
the compact 9× 9 matrix cij . The tensor representation aids in rotation. The stiffness tensor is then passively rotated
in the x− y−plane by an angel θk,
cijkl(θk) =
m,n,p,q=1
R(θk)imR(θk)jnR(θk)kpR(θk)lqcmnpq
where
R(θk) =
 cos(θk) sin(θk) 0− sin(θk) cos(θk) 0
0 0 1
This passive rotation of cijkl is equivalent to actively rotating the wave vector k = ki by θk.
The appropriate form for the perturbing displacement field is found. Assume a displacement of the form
ui(x, y, z) = Uie
k(ix+κz),
where κ can have a complex value. The elastic equilibrium equations are
i,k,l=1
cijkl(θk)
ul = 0; j = 1 . . . 3.
Cjl(θk, κ)Ul
k2ek(ix+κz) = 0 (59)
where
Cjl(θk, κ) =
i,k=1
cijkl(θk)(iδi1 + δi3κ)(iδk1 + δk3κ).
Factoring out k2ek(ix+κz), the part in parenthesis must be identically zero.
To obtain a non-trivial solution, the determinant of Cjl(θk, κ) to zero. Six complex values of κ are found. The
values of κ with Re[κ] < 0 are discarded since the corresponding displacements blow up as z → −∞. Each of
the remaining values κ = κp with p = 1 . . . 3 is substituted back into Cjl(θk, κ), and Eq. (59) is solved to find the
corresponding eigenvectors, Upl . The total displacement is thus
ul(x, y, z) = i�mh0
k(ix+κpz),
where it is assumed that the perturbing elastic displacement field is proportional to h0and σm, and the factor of i is
put in for convenience. The coefficients Ap can be found from the traction-free boundary condition at the free surface.
The traction formula is
i,k,l=1
nicijkl(θk)
ul(x, y, z) = ik�mh0
i,k,l,p=1
nicijkl(θk)ApU
l (iδk1 + κ
pδk3)e
k(ix+κpz) (60)
The traction is already proportional to h0. Thus, all terms in the sum must be kept to zeroth order in h0 so that
h(x) = 0, and n(x) = k.
Thus, plugging z = 0 to Eq. (60),
tj = ik�mh0
(ic3j1l(θk) + κ
pc3j3l(θk))ApU
ikx. (61)
Since the total traction (Eqs. (58) and (61)) must be zero, the coefficients Ap are found from
KjpAp = Rj ,
where
Kjp =
(ic3j1l(θk) + κ
pc3j3l(θk))U
Rj = Mδj1
for j = 1 . . . 3. It is worth noting that only for the symmetry directions, θk = 0◦ and θk = 45◦ is the strain purely
plane-strain as it is for the elastically isotropic case.
The elastic energy at the film surface is found to order O(h0). If the stress and strain are expanded to first order in
h0, σ̃ = σ̃0 + σ̃1, and �̃ = �̃0 + �̃1, then
�̃ : c̃ : �̃ =
σ̃0 : �0 + σ̃0 : �̃1 +O(h
Thus,
U = U0 +M�m ((�1)11 + (�1)22)
(�1)11 =
= −�mkh0
(�1)22 = ∂u2/∂y = 0. Thus,
U = U0 − Eθkkh0e
where
Eθk = M�
where Apand U
1 are implicitly functions of θk. This procedure has been used to find the values of E0◦ and E45◦ for
Table. 2 and Sec. 4.
D Diffusional Anisotropy
In general, the surface diffusivity can depend on the film height H(x) and the surface orientation ∇H(x) so that the
surface current is
JS(x) = D̃(H(x),∇H(x)) ·∇sµ(x)
where ∇s is the surface gradient, and D̃ is a rank 2 tensor in the two-dimensional space tangent to the film surface at
x. Linearizing the surface current about a flat surface,
JS(x) = D̃(H̄) ·∇µlin(x)
where the diffusivity must be evaluated for h = 0 and ∇h = 0, since µlin(x) is already proportional to h(x). The lin-
earized diffusivity is a symmetric rank 2 tensor in the x−plane. Thus, it is similar to F̃sw discussed in Appendix B.2.2.
It is automatically either two-fold symmetry (rotations by 180◦) or it is completely isotropic. In Eq. (23), four-fold
symmetry of the surface is assumed. Thus, the diffusivity must be completely isotropic; D̃ → D, a scalar. Sec-
tion 2.2.1.2 and Appendix B.2.2 contain discussions of the symmetry properties of the various rank 2 tensors that
appear in the linear evolution equations. A limited case of diffusional anisotropy has been modeled via kinetic Monte
Carlo technique. [54]
E Correlation Functions
E.1 Mean Values
Equations (31) and (33) are central to the presented analysis. Here, they are derived. The two-point correlation func-
tions for a stochastic system are introduced. Then, the average of the autocorrelation function is taken and expressed
in terms of the two-point correlation functions. Finally, this average is simplified using the translational invariance of
the system (governing equations and ensemble of initial conditions).
The two-point real-space space correlation function is
C(x,x′) = 〈h(x)h(x′)∗〉 ,
and the reciprocal space correlation function is
Ckk′ = 〈hkh∗k′〉 .
These are related by the double Fourier transform,
Ckk′ = 1(2π)2d
ddxddx′ e−ik·x+ik
′·x′C(x,x′); (62)
C(x,x′) =
ddkddk′ eik·x−ik
′·x′Ckk′ . (63)
These ensemble correlation functions can be used to give the ensemble-mean autocorrelation function and spec-
trum function. In real space,
CA(∆x)
d2x′ 〈h(∆x + x′)h(x′)〉
d2x′ C(∆x + x′,x′). (64)
(2π)d
〈hkh∗k〉 =
(2π)d
Ckk. (65)
Fortunately, the translational invariance of the system simplifies these relations. Inspecting the governing equations
and invoking the translational invariance of the stochastic initial conditions, the resulting ensemble and its statistical
measures must also be translationally invariant. Thus under the translation by x′,
C(∆x + x′,x′) = C(∆x,0) = C(∆x), (66)
so that the independent variable is reduced to just the difference vector ∆x = x − x′. This relation can be used to
simplify both the real and reciprocal space relations.
The real space relation simplifies as follows.Inserting Eq. (66) into Eq. (64),
CA(∆x)
d2x′ C(∆x,0) = C(∆x). (67)
The reciprocal space relation (Eq. (62)) simplifies to
Ckk′ = Ckδ
2(k− k′) = Ck
(2π)d
δkk′ , (68)
where
(2π)d
d2∆x e−ik·∆xC(∆x).
One can see immediately from Eq. (67) that Ck is the Fourier transform of
CA(∆x)
= C(∆x), or one can plug
Eq. (68) into Eq. (65), to get
= Ck.
E.2 Variance and Convergence
The ergodic hypothesis is that an average with respect to a parameter such as position or time tends towards an
ensemble average. In this case,
CAk ≈
= Ck, (69)
and CA(∆x) ≈
CA(∆x)
= C(∆x).
when the surface area is very large. The ensemble average is a good substitute if the variance about the average
vanishes as the substrate area A becomes large. It is found that in reciprocal space,
Var(CAk ) =
= C2k. (70)
Thus, the ergodic hypothesis does not hold for CAk . In practice, C
k is a speckled version of Ck (Fig. 6) However, if
one smooths CAk by averaging over a small patch in reciprocal space of size ksmooth = 1/∆s, so that
CAk (∆s) =
)d/2 ∫
ddk′ e−
′−k)2CAk′ , (71)
then Var
CAk (∆s)
diminishes as 1/A. For sufficiently large ∆s,〈
CAk (∆s)
≈ Ck, (72)
CAk (∆s)
πd/2∆ds
C2k. (73)
Thus, the ergodic hypothesis (Eq. (69)) only holds for a smoothed version of CAk .
In real space,
CA(∆x)
CA(∆x)
CA(∆x)
(2π)d
e2ik·∆xC2k + C
, (74)
where the integral is bounded (finite) provided that either t > 0 or the atomic scale cutoff b0 > 0. Thus, the ergodic
hypothesis holds for the real space autocorrelation function.
E.2.1 Eq. (70)
First,
CAk C
is calculated.
CAk C
(2π)d
〈hkh∗khk′h
k′〉 .
Assume that he distribution of hk is gaussian. Also, assume that h(x) is real so that hkh−k = |hk|
2. Then,〈
= Ck1Ck2δ
d(k1 − k4)δd(k2 − k3) . . .
. . . +Ck1Ck2δ
d(k1 + k3)δ
d(k2 + k4) . . .
. . . +Ck1Ck3δ
d(k1 − k2)δd(k3 − k4).
Thus,
CAk C
(2π)d
δd(k− k′)
. . .
. . . +C2k
δd(k + k′)
+ CkCk′
δd(0)
. (75)
= C2k
δkk′ + δk(−k′)
+ CkCk′ , (76)
where Eq. (29) has been used liberally. Setting k = k′, results in Eq. (70).
E.2.2 Eq. (73)
Now consider CAk smoothed over a length ∆s (Eq. (71)). The mean value is
CAk (∆s)
)d/2 ∫
ddk′ e−
′−k)2 〈CAk′〉 .
)d/2 ∫
ddk′ e−
′−k)2Ck′ .
For sufficiently small ksmooth, (sufficiently large ∆s), Eq. (72) results.
The variance of CAk (∆s) is now calculated. First, it is necessary to calculate
CAk (∆s)
CAk (∆s)
ddk′ e−
′−k)2 . . .
. . . ×
ddk′′ e−
′′−k)2 〈CAk′CAk′′〉 .
Using Eq. (75) and Eq. (29) as needed,
CAk (∆s)
ddk′ddk′′ e−
′−k)2e−
′′−k)2
(2π)d
. . .
· · · ×
δd(k′ − k′′)
+ C2k
δd(k′ + k′′)
+ CkCk′
δd(0)
′−k)2C2k′ + e
− 12 ∆
s[(k′−k)2+(k′+k)2]C2k′
. . .
· · ·+
)d/2 ∫
ddk′ e−
′−k)2Ck′
The first integral is bounded (finite) because Ck is bounded. Let its finite value be denoted I . The second integral is
simply
CAk (∆s)
.Thus,
Var(CAk (∆s)) =
∆2ds I
a finite value that decreases as A−1 as required for the ergodic hypothesis to hold. For sufficiently small ksmooth (large
∆s), I ≈ (π/∆2s)d/2C2k, and Eq. (73) results. It should also be noted that the large ∆s required for this approximation
also creates a more stringent requirement that A be large.
E.2.3 Eq. (74)
Now, consider the real space auto-correlation function. First,
CA(∆x)CA(∆x)
is needed.
CA(∆x)CA(∆x)
ddkddk′ ei(k+k
′)·∆x 〈CAk CAk′〉
Proceeding in a fashion similar to the previous section (making use of Eqs. (75) and (29) as needed) ,
CA(∆x)CA(∆x)
(2π)2d
ddkddk′ ei(k+k
′)·∆x
δd(k− k′)
. . .
· · ·+ C2k
δd(k + k′)
+ CkCk′
δd(0)
(2π)d
e2ik·∆xC2k + C
. . .
· · ·+
ddk eik·∆xCk
ddk′ eik
′·∆xCk
(2π)d
e2ik·∆xC2k + C
CA(∆x)
Thus, Eq. (74) results. For the variance to be vanishing, the integral in Eq. (74) must be bounded (finite). If time,
t > 0, the exponential in Eq. (77) guarantees that the integral is bounded. For time t = 0, the integral is only bounded
if the atomic scale cutoff b0 > 0.
F Atomic Scale Cutoff
Starting from Eq. (39),
(2π)d
e2σkt−
. (77)
The effect of the small scale cutoff is both small and short-lived, as it only works to suppress fluctuations with large
wavenumbers. The most important fluctuations have wavenumbers between 0 and 2kc. Thus, the typical size of the
cutoff term is about b20k
c . If a typical dot size or spacing size 10 nm, and a typical atomic scale is 10
−1 nm, a typical
value for this term is about 10−3 − 10−2. To calculate the effect of the cutoff, it can absorbed into the time-dependent
part with the substitution
so that its effect lasts only as long as a perturbation with atomic scale curvature (κ = b0). Thus, Eq. (40) is a good
approximation.
Acknowledgement
Thanks to L. Fang and C. Kumar for useful comments during the writing of this article.
References
[1] D. Bimberg, M. Grnudmann, and N. N. Ledentsov. Quantum Dot Heterostructures. John Wiley & Sons, 1999.
[2] O. P. Pchelyakov, Yu. B. Bolkhovityanov, A. V. Dvurechenski, L. V. Sokolov, A. I. Nikiforov, A. I. Yakimov,
and B. Voigtländer. SiliconGermanium nanostructures with quantum dots: Formation mechanisms and electrical
properties. Semiconductors, 34(11):122947, 2000. [doi:10.1134/1.1325416].
[3] M. Grundmann. The present status of quantum dot lasers. Physica E, 5:167, 2000. [doi:10.1016/S1386-
9477(99)00041-7].
[4] Pierre M. Petroff, Axel Lorke, and Atac Imamoglu. Epitaxially self-assembled quantum dots. Physics Today,
pages 46–52, May 2001.
[5] Hui-Yun Liu, Bo Xu, Yong-Qiang Wei, Ding Ding, Jia-Jun Qian, Qin Han, Ji-Ben Liang, and Zhan-Guo Wang.
High-power and long-lifetime InAs/GaAs quantum-dot laser at 1080 nm. Applied Physics Letters, 79(18):2868–
70, 2001. [doi:10.1063/1.1415416].
[6] F. Heinrichsdorff, M.H. Mao, N. Kirstaedter, A. Krost, D. Bimberg, A. O. Kosogov, and P. Werner. Room-
temperature continuous-wave lasing from stacked InAs/GaAs quantum dots grown by metalorganic chemical
vapor deposition. Applied Physics Letters, 71(1):22–4, 1997. [doi:doi:10.1063/1.120556].
[7] D. Bimberg, N.N. Ledentsov, and J.A. Lott. Quantum-dot vertical-cavity surface-emitting laser. MRS Bulletin,
27(7):531–7, 2002.
[8] N. N. Ledentsov. Long-wavelength quantum-dot lasers on GaAs substrates: From media to device con-
cepts. IEEE Journal of Selected Topics in Quantum Electronics, 8(5):1015–23, September/October 2002.
[doi:10.1109/JSTQE.2002.804236].
[9] M Friesen, P Rugheimer, D. E. Savage, M. G. Lagally, D. W. van der Weide, R Joynt, and M. A. Eriksson.
Practical design and simulation of silicon-based quantum-dot qubits. Physical Review B, 67(12):121301 (R),
2003. [doi:10.1103/PhysRevB.67.121301].
[10] Yi-Chang Cheng, San-Te (Cing-Ming) Yang, Jyh-Neng Yang, Liann-Be Chang, and Li-Zen Hsieh. Fabrication of
a far-infrared photodetector based on InAs/GaAs quantum-dot superlattices. Optical Engineering, 42(1):11923,
2003. [doi:doi:10.1117/1.1525277].
[11] R. Krebs, S. Deubert, J.P. Reithmaier, and A. Forchel. Improved performance of MBE grown quantum-dot
lasers with asymmetricdots in a well design emitting near 1.3 µm. Journal of Crystal Growth, 251:7427, 2003.
[doi:10.1016/S0022-0248(02)02385-0].
[12] Hiroyuki Sakaki. Progress and prospects of advanced quantum nanostructures and roles of molecular beam
epitaxy. Journal of Crystal Growth, 251:9–16, 2003. [doi:10.1016/S0022-0248(03)00831-5].
[13] B. J. Spencer, P. W. Voorhees, and S. H. Davis. Morphological instability in epitaxially strained dislocation-free
films. Physical Review Letters, 67(26):3696–3699, 1991. [doi:10.1103/PhysRevLett.67.3696].
[14] Karl Brunner. Si/ge nanostructures. Reports on Progress in Physics, 65(1):27–72, 2002. [doi:10.1088/0034-
4885/65/1/202].
[15] L. B. Freund and S. Suresh. Thin Film Materials: Stress, Defect Formation and Surface Evolution, chapter 8.
Cambridge University Press, 2003.
[16] S. Yu Shiryaev, E. Verstlund Pedersen, F. Jensen, J. Wulff Petersen, J. Lundsgaard Hansen, and A. Nylandsted
Larson. Dislocation patterning - a new tool for spatial manipulation of Ge islands. Thin solid films, 294(1-
2):311–314, 1997. [doi: 10.1016/S0040-6090(96)09240-1].
[17] C. Kumar and L. H. Friedman. Simulation of thermal field directed self assembly of epitaxial self-assembled Ge
quantum dots. Journal of Applied Physics, in press.
[18] Lawrence H. Friedman and Jian Xu. Feasibility study for thermal-field directed self-assembly of heteroepitaxial
quantum dots. Applied Physics Letters, 88:093105, 2006. [doi:10.1063/1.2179109].
[19] S. Krishna, D. Zhu, J. Xu, and P. Bhattacharya. Structural and luminescence characteristics of cycled sub-
monolayer InAs/GaAs quantum dots with room-temperature emission at 1.3 µm. Journal of Applied Physics,
86:6135–8, 1999. [doi:10.1063/1.371664].
[20] R. Hull, J.L. Gray, M. Kammler, T. Vandervelde, T. Kobayashi, P. Kumar, T. Pernell, J.C. Bean, J.A. Floro,
and F.M. Ross. Precision placement of heteroepitaxial semiconductor quantum dots. Materials Science and
Engineering B, 101:1–8, 2003. [doi:10.1016/S0921-5107(02)00680-3].
[21] O. Guise, Jr. J. T. Yates, J. Levy, J. Ahner, V. Vaithyanathan, and D. G. Schlom. Patterning of sub-10nm ge
islands on si(100) by direct self-assembly. Applied Physics Letters, 87:171902, 2005. [doi:10.1063/1.2112198].
[22] X. Niu, R. Vardavas, R. E. Caflisch, and C. Ratsch. Level set simulation of directed self-assembly during epitaxial
growth. Physical Review B, 74(19):193403, Nov 2006. [doi:10.1103/PhysRevB.74.193403.
[23] Z. M. Zhao, T. S. Yoon, W. Feng, B. Y. Li, J. H. Kim, J. Liu, O. Hulko, Y. H. Xie, H. M. Kim, K. B.
Kim, H. J. Kim, K. L. Wang, C. Ratsch, R. Caflisch, D. Y. Ryu, and T. P. Russell. The challenges in
guided self-assembly of ge and inas quantum dots on si. THIN SOLID FILMS, 508(1-2):195–199, Jun 2006.
[doi:10.1016/j.tsf.2005.08.407].
[24] Lawrence H. Friedman. Anisotropy and order of epitaxial self-assembled quantum dots. Physical Review B, in
press.
[25] Y. Obayashi and K. Shintani. Directional dependence of surface morphological stability of heteroepitaxial layers.
Journal of Applied Physics, 84(6):3141, 1998. [doi:10.1063/1.368468].
[26] C. S. Ozkan, W. D. Nix, and H. J. Gao. Stress-driven surface evolution in heteroepitaxial thin films: Anisotropy
of the two-dimensional roughening mode. JOURNAL OF MATERIALS RESEARCH, 14(8):3247–3256, Aug
1999. [doi:10.1557/JMR.1999.043].
[27] J. Tersoff and F. K. LeGoues. Competing relaxation mechanisms in strained layers. Physical Review Letters,
72(22):3570–3573, May 1994. [doi:10.1103/PhysRevLett.72.3570].
[28] B. J. Spencer, P. W. Voorhees, and S. H. Davis. Morphological instability in epitaxially strained dislocation-
free solid films: Linear stability theory. Journal of Applied Physics, 73(10):4955–4970, 1993. [doi:
10.1063/1.353815].
[29] J. M. Baribeau, X. Wu, N. L. Rowell, and D. J. Lockwood. Ge dots and nanostructures grown epitaxially
on si. JOURNAL OF PHYSICS-CONDENSED MATTER, 18(8):R139–R174, Mar 2006. [doi:10.1088/0953-
8984/18/8/R01].
[30] D. J. Srolovitz. On the stability of surfaces of stressed solids. Acta Metallurgica, 37(2):621–625, 1989.
[doi:10.1016/0001-6160(89)90246-0].
[31] H. J. Gao and W. D. Nix. Surface roughening of heteroepitaxial thin films. ANNUAL REVIEW OF MATERIALS
SCIENCE, 29:173–209, 1999. [doi:0.1146/annurev.matsci.29.1.173].
[32] P. Sutter and M. G. Lagally. Nucleationless three-dimensional island formation in low-misfit heteroepitaxy.
Physical Review Letters, 84(20):4637, 2000. [doi:10.1103/PhysRevLett.84.4637.
[33] A. A. Golovin, S. H. Davis, and P. W. Voorhees. Self-organization of quantum dots in epitaxially strained solid
films. Physical Review E, 68:056203, 2003. [doi:10.1103/PhysRevE.68.056203].
[34] A. Ramasubramaniam and V. B. Shenoy. Growth and ordering of si-ge quantum dots on strain patterned sub-
strates. JOURNAL OF ENGINEERING MATERIALS AND TECHNOLOGY-TRANSACTIONS OF THE ASME,
127(4):434–443, Oct 2005. [doi:10.1115/1.1924559].
[35] I. Berbezier, A. Ronda, F. Volpi, and A. Portavoce. Morphological evolution of SiGe layers. Surface Science,
531:231–243, 2003. [doi:10.1016/S0039-6028(03)00488-6].
[36] J. R. R. Bortoleto, H. R. Gutierrez, M. A. Cotta, J. Bettini, L. P. Cardoso, and M. M. G. de Carvalho. Spatial order-
ing in InP/InGaP nanostructures. Applied Physics Letters, 82(20):3523–3525, 2003. [doi:10.1063/1.1572553].
[37] P. Liu, Y. W. Zhang, and C. Lu. Formation of self-assembled heteroepitaxial islands in elastically anisotropic
films. Physical Review B, 67:165414, 2003. [doi: 10.1103/PhysRevB.67.165414].
[38] Y.W. Zhang, A.F. Bower, and P. Liu. Morphological evolution driven by strain induced surface diffusion. Thin
solid films, 424:9–14, 2003. [doi:10.1016/S0040-6090(02)00897-0].
[39] Yu U. Wang, Yongmei M. Jin, and Armen G. Khachaturyan. Phase field microelasticity modeling of surface
instability of heteroepitaxial thin films. Acta Materialia, 52:81–92, 2004. [doi:10.1016/j.actamat.2003.08.027].
[40] W. T. Tekalign and B. J. Spencer. Evolution equation for a thin epitaxial film on a deformable substrate. Journal
of Applied Physics, 96(10):5505–5512, 2004. [doi:10.1063/1.1766084].
[41] M. J. Beck, A. van de Walle, and M. Asta. Surface energetics and structure of the ge wetting layer on si(100).
Physical Review B, 70(20):205337, Nov 2004. [doi:10.1103/PhysRevB.70.205337].
[42] Y. H. Tu and J. Tersoff. Origin of apparent critical thickness for island formation in heteroepitaxy. Physical
Review Letters, 93(21):216101, Nov 2004. [doi:10.1103/PhysRevLett.93.216101.
[43] V. Holy, G. Springholz, M. Pinczolits, and G. Bauer. Strain induced vertical and lateral correlations in quantum
dot superlattices. Physical Review Letters, 83(2):356–359, 1999. [doi:10.1103/PhysRevLett.83.356].
[44] P. Liu, Y. W. Zhang, and C. Lu. Three-dimensional finite-element simulations of the self-organized growth of
quantum dot superlattices. Physical Review B, 68:195314, 2003. [doi:10.1103/PhysRevB.68.195314].
[45] G. Springholz, M. Pinczolits, V. Holy, S. Zerlauth, I. Vavra, and G. Bauer. Vertical and lateral ordering in
self-organized quantum dot superlattices. Physica E, 9:149–163, 2001. [doi:10.1016/S1386-9477(00)00189-2.
[46] P. Liu, Y. W. Zhang, and C. Lu. Coarsening kinetics of heteroepitaxial islands in nucleationless stranski-krastanov
growth. Physical Review B, 68:035402, 2003. [doi:10.1103/PhysRevB.68.035402].
[47] F. M. Ross, J. Tersoff, and R. M. Tromp. Coarsening of self-assembled ge quantum dots on Si(001). Physical
Review Letters, 80(5):984–7, 1998. [doi:10.1103/PhysRevLett.80.984].
[48] J. Tersoff. Kinetic surface segregation and the evolution of nanostructures. Applied Physics Letters, 83(2):353–
355, 2003. [doi:doi:10.1063/1.1592304].
[49] A. Ramasubramaniam and V. B. Shenoy. A spectral method for the nonconserved surface evolution of nanocrys-
talline gratings below the roughening transition. Journal of Applied Physics, 97(11):114312, 2005. [doi:
10.1063/1.1897837].
[50] Y. W. Zhang and A. F. Bower. Three-dimensional analysis of shape transitions in strained-heteroepitaxial islands.
Applied Physics Letters, 78(18):2706–2708, 2001. [doi:10.1063/1.1354155].
[51] L. E. Vorbyev. Handbook Series On Semiconductor Parameters, volume 1. World Scientific, 1996.
[52] A. A. Golovin, M. S. Levine, T. V. Savina, and S. H. Davis. Faceting instability in the presence of wet-
ting interactions: A mechanism for the formation of quantum dots. Physical Review B, 70:235342, 2004.
[doi:10.1103/PhysRevB.70.235342].
[53] B L Liang, Zh M Wang, Yu I Mazur, V V Strelchuck, K Holmes, J H Lee, and G J Salamo. Ingaas quantum
dots grown on b-type high index gaas substrates: surface morphologies and optical propertiesmorphologies and
optical properties. Nanotechnology, 17(11):2736–2740, 2006. [doi:10.1088/0957-4484/17/11/004].
[54] M. Meixner, R. Kunert, and E. Scholl. Control of strain-mediated growth kinetics of self-assembled semicon-
ductor quantum dots. Physical Review B, 67:195301, 2003. [doi: 10.1103/PhysRevB.67.195301].
[55] Robert Zwanzig. Nonequilbrium Statistical Mechanics. Oxford University Press, New York, 2001.
[56] C. W. Gardiner. Handbook of Stochastic Methods for Physics Chemistry and the Natural Sciences. Springer,
New York, 3rd edition, 2004.
[57] M. C. Cross and P. C. Hohenberg. Pattern formation outside equilibrium. Reviews of Modern Physics, 65(3):851–
1112, 1993. [doi:10.1103/RevModPhys.65.851].
[58] B. J. Spencer, S. H. Davis, and P. W. Voorhees. Morphological instability in epitaxially strained dislocation-free
solid films: Nonlinear evolution. Physical Review B, 47(15):9760, 1993. [doi: 10.1103/PhysRevB.47.9760].
	Introduction
	Modeling
	1D and 2D Isotropic model
	Energetics
	simple form
	general form
	Linearization
	Dynamics
	Peaks
	Stability and wetting potential
	2D Anisotropic case
	Energetics
	Elastic anisotropy
	Surface and Wetting Energy Anisotropy
	total diffusion potential
	Dynamics
	Expansion about peaks
	Correlation Functions
	Correlation Functions and SAQD order
	Periodic array
	Nearly Periodic array
	Ensemble Correlation Functions / ergodicity
	Mean fluctuation
	Correlation Function
	Stochastic Initial Conditions
	Reciprocal Space Correlation Functions
	one-dimensional
	2D isotropic
	anisotropic
	Real Space Correlation Functions
	one-dimensional
	2D isotropic
	anisotropic
	Generalizability
	Order Predictions
	Ge at 600K
	General case of 
	Discussion/Conclusions 
	Diffusion Potential
	General Form =F/H(x)
	Simple Model
	General Model
	Linearized Diffusion Potential and Anisotropy
	Linearizing the simple model
	isotropic case
	anisotropic case
	Linearizing the general model
	isotropic case
	anisotropic case
	Elastic Anisotropy
	Diffusional Anisotropy
	Correlation Functions
	Mean Values
	Variance and Convergence
	Eq. (??)
	Eq. (??)
	Eq. (??)
	Atomic Scale Cutoff
ABSTRACT
  Epitaxial self-assembled quantum dots (SAQDs) are of interest for
nanostructured optoelectronic and electronic devices such as lasers,
photodetectors and nanoscale logic. Spatial order and size order of SAQDs are
important to the development of usable devices. It is likely that these two
types of order are strongly linked; thus, a study of spatial order will also
have strong implications for size order. Here a study of spatial order is
undertaken using a linear analysis of a commonly used model of SAQD formation
based on surface diffusion. Analytic formulas for film-height correlation
functions are found that characterize quantum dot spatial order and
corresponding correlation lengths that quantify order. Initial atomic-scale
random fluctuations result in relatively small correlation lengths (about two
dots) when the effect of a wetting potential is negligible; however, the
correlation lengths diverge when SAQDs are allowed to form at a near-critical
film height. The present work reinforces previous findings about anisotropy and
SAQD order and presents as explicit and transparent mechanism for ordering with
corresponding analytic equations. In addition, SAQD formation is by its nature
a stochastic process, and various mathematical aspects regarding statistical
analysis of SAQD formation and order are presented.

<|endoftext|><|startoftext|>
A NOTE ABOUT THE {Ki(z)}
FUNCTIONS
Branko J. Malešević
In the article [10], A. Petojević verified useful properties of the Ki(z) functions
which generalize Kurepa’s [1] left factorial function. In this note, we present
simplified proofs of two of these results and we answer the open question stated
in [10]. Finally, we discuss the differential transcendency of the Ki(z) functions.
A. Petojević [7, p. 3.] considered the family of functions:
vMm(s; a, z) =
(−1)k−1
z +m+ 1− k
L[s; 2F1(a, k − z,m+ 2; 1− t)],(1)
for ℜ(z) > v−m−2, where v∈N is a positive integer; m∈{−1, 0, 1, 2, . . .} is an integer;
s, a, z are complex variables; L[s;F (t)] is Laplace transform and 2F1(a, b, c;x) is
the hypergeometric function (|x| < 1). D- .Kurepa has considered in the articles
[1, p. 151.] and [2, p. 297.] a complex function defined by the integral:
K(z) =
tz − 1
dt,(2)
for ℜ(z)>0. Especially, forKurepa’s functionK(z), it is true thatK(z)=1M0(1; 1, z),
for ℜ(z)>0, according to [10]. For various of values of parameters v,m, s, a, z from (1),
different special functions, as presented in [10], are obtained. A. Petojević has con-
sidered in the article [10, p. 1640.] the following sequence of functions:
Ki(z) =
1M0(1; 1, z + i− 1)− 1M0(1; 1, i− 1)
1M−1(1; 1, i)
for i∈N and ℜ(z)>−i. On the basis of the definition in (3), the following represen-
tation via Kurepa’s function is true:
Ki(z) =
(i−1)!
K(z + i− 1)−K(i− 1)
for i∈N and ℜ(z)>−i+1. Note that K(0)=0 [2, p. 297.] and therefore K1(z)=K(z)
for ℜ(z)>0. Analytical and differential–algebraic properties of Kurepa’s function
K(z) are considered in articles [1− 12] and in many other articles. On the basis
of well-known statements for Kurepa’s function K(z), using representation (4), in
many cases we can get simple proofs for analogous statements for Ki(z) functions.
For example, it is a well-known fact that it is possible to analytically continue Ku-
repa’s function to a meromorphic function with simple poles at integer points z = −1
and z = −m, (m ≥ 3) [2, p. 303.], [3, p. 474.]. Residues of Kurepa’s function at
these poles have the following form [2]:
Research partially supported by the MNTRS, Serbia, Grant No. 144020.
http://arxiv.org/abs/0704.0068v2
2 Branko J. Malešević
z = −1
K(z) = −1 and res
z = −m
K(z) =
(−1)k−1
, (m≥3).(5)
For Kurepa’s function K(z) the infinite point is an essential singularity [3]. Hence,
on the basis of (4), each function Ki(z) is meromorphic with simple poles at integer
points z=−i and z=−(i+m), (m≥2). On the basis of (4) we have:
z = −(i+m)
Ki(z) =
(i−1)!
· res
z = −(i+m)
K(z + i− 1) = 1
(i−1)!
· res
z = −(m+1)
K(z),(6)
where m = 0 or m≥2. Hence:
z = −i
Ki(z)=−
(i−1)!
and res
z = −(i+m)
Ki(z)=
(i−1)!
(−1)k−1
, (m≥2).(7)
For each Ki(z) function the infinite point is an essential singularity. Therefore, we get
Theorem 3.3. from [10]. Next, it is a well-known fact that for Kurepa’s function
the following asymptotic relation K(x) ∼ Γ(x) is true for real x such that x → ∞
and where Γ(x) is the gamma function [2, p. 299.]. Hence, for fixed i ∈N and real
x>−i+1, on the basis of (4), we get:
Ki(x)
Γ(x+ i− 1)
(i−1)!
· K(i+ x− 1)−K(i− 1)
Γ(x+ i− 1)
(i−1)!
Ki(x)
Γ(x+ i)
(i−1)!
· K(i+ x− 1)−K(i− 1)
(x+ i− 1)Γ(x+ i− 1)
0.(9)
Therefore, we get Theorem 3.6. from [10]. Next we give a solution to the open
problem stated in Question 3.7. in [10]. Namely, the following formula in the article
[8, p. 35.] is given:
K(z) =
Ei(1) + iπ
(−1)zΓ(1 + z)Γ(−z,−1)
,(10)
for values z ∈ C\{−1,−2,−3,−4, . . .} and i =
−1. In the previous formula Ei(z)
and Γ(z, a) are exponential integral and incomplete gamma function respectively [8].
Then, for fixed i∈N and values z∈C\{−i,−i− 1,−i− 2,−i− 3, . . .}, on the basis of
(4) and (10), we get:
Ki(z) =
(i−1)!
K(z + i− 1)−K(i− 1)
Ei(1) + iπ
e(i−1)!
(−1)z+i−1Γ(1 + z + i− 1)Γ(−z − i+ 1,−1)
e(i−1)!
−Ei(1) + iπ
e(i−1)!
− (−1)
i−1Γ(i)Γ(−i+ 1,−1)
e(i−1)!
= (−1)ie−1
Γ(1− i,−1)− (−1)z Γ(1− i− z,−1)Γ(i+ z)
(i−1)!
Therefore, the affirmative answer for Question 3.7. from [10] is true for complex
values z∈C\{−i,−i− 1,−i− 2,−i− 3, . . .}.
A note about the {Ki(z)}∞i=1 functions 3
Finally, at the end of this note let us emphasize one differential–algebraic fact for
the sequence of functions Ki(z). On the basis of the formula (17) from the article
[10], we can conclude that each Ki(z) function satisfies the following recurrence re-
lation (i−1)!Ki(z + 1)− (i−1)!Ki(z) = Γ(z + i). The previous relation can be used
to verify the differential transcendency of these functions as discussed in [11, 12].
Therefore, we can conclude that each Ki(z) function is a differential transcendental
function, i.e. it satisfies no algebraic differential equation over the field of complex ra-
tional functions.
REFERENCES
[1] D- . Kurepa: On the left factorial function !n, Mathematica Balkanica 1 (1971), 147−153.
[2] D- . Kurepa: Left factorial function in complex domain, Mathematica Balkanica 3 (1973),
297 − 307.
[3] D. Slavić: On the left factorial function of the complex argument, Mathematica Balkan-
ica 3 (1973), 472− 477.
[4] A. Ivić, Ž. Mijajlović: On Kurepa problems in number theory, Publications de
l’Institut Mathématique, SANU Beograd, 57, (71) (1995), 19 − 28, available at
http://elib.mi.sanu.ac.yu/pages/browse journals.php .
[5] G.V. Milovanović: Expansions of the Kurepa function, Publications de l’Institut
Mathématique, SANU Beograd 57 (71) (1995), 81− 90, available at home page
http://gauss.elfak.ni.ac.yu .
[6] G.V. Milovanović, A. Petojević: Generalized factorial functions, numbers and poly-
nomials, Mathematica Balkanica 16 (2002), 113− 130.
[7] A. Petojević: The function vMm(s; a, z) and some well-known sequences, Journal of
Integer Sequences, Article 02.1.6, Vol. 5 (2002).
[8] B. Malešević: Some considerations in connection with Kurepa’s function, Univerzitet u
Beogradu, Publikacije Elektrotehničkog Fakulteta, Serija Matematika, 14 (2003), 26−36,
available at http://pefmath.etf.bg.ac.yu/ .
[9] B. Malešević: Some inequalities for Kurepa’s function, Journal of Inequalities in
Pure and Applied Mathematics, Vol. 5, Issue 4, Article 84, (2004), available at
http://jipam.vu.edu.au/ .
[10] A. Petojević: The {Ki(z)}
i=1 functions, Rocky Mountain Journal of Mathematics,
Vol. 36, No. 5, (2006), 1637-1650.
[11] Ž. Mijajlović, B. Malešević: Differentially transcendental functions, accepted
in Bulletin of the Belgian Mathematical Society − Simon Stevin 2007, available at
http://arxiv.org/abs/math.GM/0412354 .
[12] Ž. Mijajlović, B. Malešević: Analytical and differential – algebraic properties of
Gamma function, to appear in International Journal of Applied Mathematics &
Statistics
J.Rassias (ed.), Functional Equations, Integral Equations, Differen-
tial Equations & Applications, http://www.ceser.res.in/ijamas/cont/fida.html
Special Issues dedicated to the Tri-Centennial Birthday Anniversary of L. Euler, 2007.,
available at http://arxiv.org/abs/math.GM/0605430 .
University of Belgrade, (Received : 04/01/2007 )
Faculty of Electrical Engineering, (Accepted : 05/25/2007 )
P.O.Box 35-54, 11 120 Belgrade, Serbia
malesh@eunet.yu, malesevic@etf.bg.ac.yu
http://elib.mi.sanu.ac.yu/pages/browse_journals.php
http://gauss.elfak.ni.ac.yu
http://pefmath.etf.bg.ac.yu/
http://jipam.vu.edu.au/
http://arxiv.org/abs/math.GM/0412354
http://www.ceser.res.in/ijamas/cont/fida.html
http://arxiv.org/abs/math.GM/0605430
ABSTRACT
  In the article [Petojevic 2006], A. Petojevi\' c verified useful properties
of the $K_{i}(z)$ functions which generalize Kurepa's [Kurepa 1971] left
factorial function. In this note, we present simplified proofs of two of these
results and we answer the open question stated in [Petojevic 2006]. Finally, we
discuss the differential transcendency of the $K_{i}(z)$ functions.

<|endoftext|><|startoftext|>
Introduction
Our purpose is to construct invariant dynamical objects for a self map f : X →
X of a compact topological space. We make use of sheaf cohomology and
differences in rates of expansion in different terms of a long exact sequence
to construct invariant sections of a sheaf. We will show that there are in-
variant degree 1 currents (or eigencurrents) corresponding to each expanding
eigenvector of H1(X,R). We also show that successive preimages of suffi-
ciently regular degree one currents converge to one of these eigencurrents.
We show that if most of the expansion f : X → X is ”along” an invariant
cohomological class v ∈ Hk(X,R) then there is an invariant current c in that
cohomology class and other sufficiently regular currents in the same class
converge to c under successive pullback.
The group cohomology of Z acting on a space of functions on X via pull-
back has been studied in the context of dynamical systems [Kat03]. This
work seems related to ours, but to be pursued in an essentially different di-
rection. Our map f is not assumed to be invertible, so there is not necessarily
a Z action, only an N action. Also, we use sheaves rather than functions and
make substantial use of cohomological tools. Most importantly, we are par-
ticularly interested in the construction of invariant currents, especially when
the current is some sense unique.
Our results are actually motivated by results in higher dimensional holo-
morphic dynamics showing the existence of a unique closed positive (1, 1) cur-
rent under a variety of circumstances (just about any recent paper on higher
dimensional holomorphic dynamics either proves such results or makes essen-
tial use of such results, see e.g. [FS92], [HOV94], [HOV95], [BS91a], [BS91b],
[BS92], [BLS93], [BS98a], [BS98b], [BS99], [Can01], [McM02], [FS94a], [FS94b],
[FS95b], [FS01], [FS95a], [JW00], [FJ03], [Ued94], [Ued98], [Ued97], and
[DS05]).
While invariant measures have been a focal point in dynamics, it seems
that invariant currents also have an imporant role to play. We will show under
mild conditions that if some degree one cohomological class of a smooth self
map f of a compact manifold is invariant and expanded there is necessarily
a invariant degree one current of a certain type representing that class. We
obtain analogous results for higher degree currents given bounds on the local
growth rates of f . The uniqueness of these classes is significant. It seems
clear that one could modify a map locally near a fixed point to obtain other
invariant currents of the same type without affecting the topology. Thus
our results also say that any local modification that created an invariant
current of the given type must violate the local growth conditions. In other
words, as long as things do not grow too fast compared to the growth rate of
the cohomology class, the expansion of the cohomology class gives sufficient
“marching orders” to points that no other invariant cohomological class of the
given type can be created by purely local dynamical behavior. Our results
give explicit conditions under which uniqueness is guaranteed. For degree
one currents, no restriction on local growth rates is necessary for our results.
2 Cohomomorphisms
We will make use of sheaves in this paper. There are two standard def-
initions of sheaves on a topological space X, one as a topological space
([Bre97],[GR84]), and one as a functor on the category TopX satisfying var-
ious axioms ([Har77],[Wei97]). Since we will often want to make use of a
topology on sections of a sheaf A that differs from the topology these inherit
using the topological definition of a sheaf, we will instead use the functor
definition of a sheaf.
Our sheaves will always be sheaves of K modules over some fixed field K.
We will require that K have an absolute value for which K is complete.
Given a continuous map f : X → Y and sheaves A and B on X and
Y respectively, an f -cohomomorphism is a generalized notion of a pullback
from B to A through f . Different types of geometric objects pull back
differently, and this allows us to handle all cases at once.
We take the following facts from from [Bre97] page 14–15.
Definition 1. If A and B are sheaves onX and Y then an “f-cohomomorphism”
k : B → A is a collection of homomorphisms kU : B(U) → A (f−1(U)), for
U open in Y , compatible with restrictions.
Note that if A is a sheaf on X and f : X → Y is continuous then there
is a canonical cohomomorphism f∗A ; A where f∗A is the direct image of
A , i.e. given an open U ⊂ Y , f∗A (U) = A (f−1(U)).
Remark. Given a continuous map f : X → Y of topological spaces X and
Y and sheaves A and B on X and Y respectively, all f -cohomomorphisms
f : B ; A are given by a composition of the form
j→ f∗A
f∗→ A
where j : B → f∗A is a sheaf homomorphism, and each such composition is
seen to given an f -cohomomorphism.
The usual notion of “a morphism of sheaves on X” is the same as an idX
cohomomorphism of sheaves on X.
2.1 Cohomomorphisms and Γ.
The functor Γ returns the global sections of that sheaf. Given a morphism
φ : A → A ′ of sheaves on X, Γφ is just the homomorphism A (X)→ A ′(X).
Given sheaves A and B on X and Y and given f : X → Y continuous
then for a sheaf cohomomorphism F : B → A one defines ΓF to be the
homomorphism B(Y ) → A (X). This extends Γ to be a functor on the
category of topological spaces with an associated sheaf where morphisms are
given by cohomomorphisms.
3 Invariant Global Sections
Fix a continuous self map f : X → X of a topological space X. We will
be interested in f self cohomomorphisms of sheaves A on X. As we will
typically have several sheaves of interest on X, each with a corresponding
f self cohomomorphism, we let fA : A ; A be the default notation for an
f -cohomomorphism of A .
Assume that X is a manifold and that
p→ B q→ C
is a short exact sequence of sheaves on X. Let f : X → X be a continuous
self map of X and assume further that we are given f self cohomomorphisms
of each of these sheaves and that
// B q
commutes. We will say that a commutative diagram as in (1) is an f self-
cohomomorphism of the sequence A → B → C .
Applying the functor Γ to this diagram, the rows can be extended in the
usual long exact sequence. The resulting diagram is commutative ([Bre97]
page 62).
0 // A (X)
C (X)
H1(X,A )
· · ·
0 // A (X)
// B(X)
// C (X)
// H1(X,A )
// · · ·
One can think of B as providing local potentials for members of C and
of A as being those potentials which give rise to the zero member of C .
It will be assumed that the reader is familiar with interpreting H1(X,A )
as classifying equivalence classes of bundles with transition functions in A .
We will frequently refer to members of H1(X,A ) as bundles. Sections of
such bundles will be assumed to be given locally by local sections of B,
so that every member c of Γ(C ) is given locally by potentials in B, and
these potentials, taken together, are a section of the corresponding bundle
δ(c) ∈ H1(X,A ).
Convention 1. We will frequently refer to a member v of H1(X,A ) as a
bundle, to a member c ∈ Γ(C ) as a divisor and if δ(c) = v we will call c a
divisor of the bundle v. We think this substantially adds to the readability of
the paper.
Definition 2. The support of a divisor c ∈ Γ(C ) is defined to be the com-
plement of the union of all open sets U such that c
Lemma 3. If an open set U lies outside the support of some c ∈ Γ(C ) then
f−1(U) lies outside the support of fC (c)
Proof. We note that by the definition of an f -cohomomorphism fC : C → C ,
since the cohomomorphism fC on C (U) is a homomorphism from C (U) to
C (f−1(U)) and the induced action of fC on Γ(C ) restricted to U must agree
with its action C (U) → C (f−1(U)), then if an open set U is outside the
support of c then f−1(U) is outside the support of of fC (c).
The following conditions for a given v ∈ H1(X,A ) will be of interest:
Definition 4. We will refer to a bundle v ∈ H1(X,A ) for which (H1p)(v) =
0 as being closed.
Note that this notion depends upon the exact sequence A → B → C ,
and not just on v. If B is γ acyclic then every member of H1(X,A ) is closed.
Definition 5. We will call a bundle v ∈ H1(X,A ) base point free if for every
x ∈ X there is some divisor c ∈ Γ(C ) associated to v whose support does
not contain x.
Lemma 6. If B is soft, X is a regular topological space, and a ∈ H1(X,A )
is a closed bundle then a is base point free.
Proof. From the long exact sequence there is some c′ ∈ Γ(C ) with δ(c′) = a
and given any point x ∈ X, from the fact that B
� C the germ c′x of c
at x is the image under qx of some germ b
x of Γ(B) at x. Choose an open
neighborhood U of x on which there is some b′ ∈ B(U) with b′x = b′′x. The
topological assumption on X implies that there is a neighborhood V b U
of x. The fact that B is soft implies there is some b ∈ Γ(B) such that
. Then c = c′ − b ∈ Γ(C ) has δ(c) = a and x 6∈ Supp(c).
Definition 7. We will refer to a bundle a ∈ H1(X,A ) such that fA (a) = λ·a
for some λ ∈ C as a λ eigenbundle.
We also find it useful to introduce a relevant notion of expansiveness
of a map f : X → X relative to a base point free closed eigenbundle v ∈
H1(X,A ).
Definition 8. Given a base point free closed eigenbundle v ∈ H1(X,A )
then we say that f is cohomologically expansive at x for v if for any open
neighborhood U of x and any divisor c ∈ Γ(C ) of v, the set U intersects the
support of fkC (c) for all sufficiently large k.
Remark. It is a corollary of the definition that the set of points at which
f is cohomologically expansive for v is closed and forward invariant. If
Supp fkC (c) = f
−k(Supp(c)) for each c ∈ Γ(C ) then the set of cohomolog-
ically expansive points is totally invariant.
The notion of being cohomologically expansive at x for v means roughly
that under iteration by f small neighborhoods U of x always grow to cover
enough of X that the pullback of the bundle v to the set fk(U) is a nontrivial
bundle on fk(U) whenever k is large.
We show that if B is soft and X is a compact metric space then some
minimal expansion takes place at points where f is cohomologically expansive
for a closed eigenbundle a ∈ H1(X,A ).
We use B�(x) to denote the ball of radius � about x.
Lemma 9. Let X be a compact metric space. If B is soft and v is a closed
eigenbundle then there exists δ > 0 such that for every � > 0 there exists
some K > 0 such that if f is cohomologically expanding at x then for every
k > K, diam fk(B�(x)) > δ.
Proof. The bundle v is base point free by Lemma 6. Using compactness we
can conclude that there is a finite open cover U1, . . . , U` of X such that for
each j, Uj is disjoint from Supp cj for some cj ∈ Γ(C ) with δ(cj) = v. We
will prove the lemma by contradiction. Let δ be the Lebesgue number of the
cover U1, . . . , U`. If the lemma is false there is some � > 0 and some increasing
sequence kn and points xn at which f is cohomologically expansive such that
diam fkn(B�(xn)) ≤ δ for each n. By going to a subsequence if necessary we
can assume xn converges to a point x∞. Letting U = B 1
�(x∞) we see that
U ⊂ B�(xn) for all large n and thus there is some one cj of c1, . . . , c` such that
fkn(U) is disjoint from Supp cj for infinitely many values of n. Consequently
U is disjoint from Supp fknC (cj) for infinitely many n, contrary to x∞ being
a point at which f is cohomologically expansive for v.
We included Lemma 9 to show that our notion of cohomological expansion
is genuinely expansive. However, depending on the nature of A , being coho-
mologically expansive can imply that neighborhoods grow a great deal under
iteration indeed. In Lemma 10 we show that given any closed set K such that
the pullback of a fixed point free closed eigenbundle a ∈ H1(X,A ) to K is a
trivial bundle then any neighborhood U of a point at which f is cohomolog-
ically expanding for a is so expanded under iteration that fk(U) 6⊂ intK for
all sufficiently large k. The collection of such sets K typically contains very
large sets about every point so no matter where fk(x) is the conclusion that
fk(U) does not lie in any intK implies some points of fk(U) must lie far away
from fk(x). The point is roughly that large iterates of any neighborhood of
x can not be homotopically contracted to a point in X.
Lemma 10. If B is soft, then for any closed set K ⊂ X such that the image
of H1(X,A ) → H1(K,A
) is zero, given any divisor c ∈ Γ(C ), there is
another divisor c′ ∈ Γ(C ) associated to the same bundle and c′ is supported
outside the interior of K. Consequently, if f is cohomologically expansive
at x ∈ X for some base point free closed eigenbundle a ∈ H1(X,A ) then
necessarily for any neighborhood U of x, fk(U) 6⊂ intK for all large k, where
intK is the interior of K.
Proof. We use the commutative diagram
H0(X,B)
Γq //
H0(X,C )
H1(X,A )
H0(K,B
Γq // H0(K,C
δ // 0
which we have written using H0 instead of Γ so it is clear what the ambi-
ent space is in each case. From exactness there exists some β ∈ H0(K,B
such that δ(β) = c
. Then since B is soft the mapH0(X,B)→ H0(K,B
is surjective so there is some b ∈ Γ(B) = H0(X,B) such that b
= β. Then
c′ = c − (Γq)(b) has δ(c′) = δ(c) and c′
= 0 so Supp(c′) is disjoint from
the interior of K.
It is easy to see that if f is cohomologically expansive at x ∈ X for some
fixed point free closed eigenbundle a ∈ H1(X,A ) then necessarily for any
neighborhood U of x, fk(U)∩Supp c 6= ∅ for all large k for any c ∈ Γ(C ) such
that δ(c) = a. Hence fk(U) can not lie in the interior of K for any large
Convention 2. We let K be either R or C, although our central theorems
only require K to be a complete field with an absolute value.
The following Theorem takes advantage of the fact that in an exact se-
quence the eigenvalues of members of nonadjacent members of the sequence
do not have to agree to give conditions under which one can uniquely “lift”
fixed members of one term of the exact sequence to a fixed member of the pre-
ceding term. Interpreted as a statement in the context of sheaf cohomology
we will be able to use this Theorem to make dynamical conclusions.
The theorem shows that each closed eigenbundle of the induced map
fA : H
1(X,A ) → H1(X,A ) with sufficiently large eigenvalue has a unique
associated invariant divisor c ∈ Γ(C ).
Definition 11. Given any finite dimensional K vector space V along with
a linear map g : V → V and any positive real number r, we let the r chron-
ically expanding subspace of V be the span of the subspaces associated2 to
eigenvalues of absolute value greater than r. We refer to the 1 chronically
expanding subspace simply as the chronically expanding subspace.
Theorem 12 (Unique Invariant Subspace Theorem). We will assume the
following:
• f : X → X is a continuous self map of a topological space X.
• We are given an f self cohomomorphism of a short exact sequence of
sheaves on X,
p→ B q→ C
• Γ(B) is a Banach space over K, and there exists some α, d ∈ R>0 such
that ‖ΓfBk(B)‖ ≤ d · αk‖B‖ for k ∈ N, B ∈ Γ(B),
• Γ(C ) is a topological vector space over K.
• If a sequence Ci ∈ Γ(C ) of divisors converges to another divisor C∞
then the support of C∞ is contained in the closure of the union of the
supports of Ci.
• The maps ΓfC and Γq are continuous.
• We are given a finite dimensional H1(fA ) invariant subspace W of the
α chronically expanding subspace of H1(X,A ). We also require W to
be comprised only of closed bundles.
2Meaning for each eigenvector λ we include not just the λ eigenspace, but also every
v ∈ V such that (g − λ · idV )n(v) = 0 for some positive integer n.
Then given any K linear map s : W → Γ(C ) such that δs = idW there is a
K linear map τ : W → Γ(B) satisfying
κ := lim
(ΓfC )
ksgk = s+ (Γq)τ (3)
where g : W → W is the inverse of H1fA
. Under iterated pullback the
rescaled pullbacks of any divisor C ∈ Γ(C ) of a bundle w ∈ W converge
toward the invariant plane of divisors κ(W ) ⊂ Γ(C ). The map κ : W →
Γ(C ) is the unique map making the diagram
wwo o
Γ(C )
// H1(X,A )
wwo o
// Γ(C )
// H1(X,A )
commute. Finally, for any basepoint free eigenbundle v ∈ W the support of
the corresponding invariant divisor κ(v) ∈ Γ(C ) is contained in the set of
points on which f is cohomologically expansive for v.
Proof. We note that δ
(ΓfC )sg−s
= 0 and so there is a map σ : W → Γ(B)
such that (Γq)σ = (ΓfC )sg − s.
Define Φ: Hom(W,Γ(B)) → Hom(W,Γ(B)) by Φ(σ) = (ΓfB)σg−1. We
will show that the sequence of maps Φk is exponentially contracting on
Hom(W,Γ(B)). Fix a norm ‖ · ‖ on W . The assumption that W lies in
the α chronically expanding subspace of H1(X,A ) implies that there exists
some β > α and some c > 0 such that ‖g−k(w)‖ ≤ cβ−k‖w‖ for k ∈ N,
w ∈ W . This with the assumption on the rate of expansion of ΓfB easily
implies that
‖Φk(φ)(w)‖ = ‖(ΓfB)k(φ(g−k(w)))‖ ≤ cd
‖φ‖ · ‖w‖
Thus Φk is an operator of norm no more than cd
, where α < β.
Letting τk = σ + Φ(σ) + Φ
2(σ) + · · · + Φk(σ) then limk→∞ τk converges
to some map τ . It is easily confirmed that (Γq)τk = (ΓfC )
ksg−k − s. Equa-
tion (3) then follows by continuity of Γq.
The conclusions about the map κ are easy consequences of its definition.
For the final conclusion note that if we just let W be the span of v then
we have already shown that if C is the unique invariant member of Γ(C )
associated to v then for any divisor c′ ∈ Γ(C ) satisfying δ(c′) = v letting
λ be the eigenvalue of v we can write c′ = κ(v) + (Γq)(b) and equation 3
becomes (ΓfC )
kc′/λk = κ(v) + (Γq)(ΓfB)
kbλk where the final term goes to
zero as k →∞ (by our assumptions on growth rates of g−1 and ΓfB). Hence
(ΓfC )
k(c′)/λk converges to c = κ(v). If U is any open subset of X and if the
support of c′ is disjoint from fn(U) for arbitrarily large values of n, then the
support of (ΓfC )
n(c′) must be disjoint from U for arbitrarily large values of
n. Since, rescaled, these converge to c then U must lie outside the support
of c.
Remark. While we have not formally required X to be compact, the re-
quirement that Γ(B) be a Banach space makes this the main case in which
Theorem 12 is apt to have interesting applications.
Theorem 12 shows that among all members of Γ(C ) representing a coho-
mology class in W there is a unique invariant linear subspace which can be
identified with W and all other such members of Γ(C ) are contracted to this
invariant copy of W in Γ(C ) under (rescaled) pullback.
Corollary 13. Assume that the hypothesis of Theorem 12 are satisfied, and
that g : W → W is dominated by a single simple real eigenvalue r > 0 with
eigenvector v. Let C ≡ κ(v) be the unique invariant divisor of v. Then given
a divisor C′ ∈ Γ(C ) of any w ∈ W the successive rescaled pullbacks fkC (C
′)/rk
converge to a multiple (possibly zero) of C.
Proof. This is a direct consequence of equation (3).
The assumption that g : W → W is dominated by a single simple real
eigenvalue is meant to handle the most typical situation, and is not an es-
sential restriction.
Remark. Given that for a fixed f : X → X the category of SC sheaves A
on X endowed with an f self cohomomorphism F is an abelian category
with enough injectives, then the functor Fixed Γ which gives the fixed global
sections of A under F will be left exact and its right derived functors should
be of dynamical interest. In the case where A is a sheaf of functions and
f is invertible this is just group cohomology with the group Z acting on
Γ(A ) and has been an object of study for some time (see, e.g. [Kat03]). We
anticipate studying the case of more general sheaves A and the right derived
functors of the composition Fixed Γ in a future paper, including the case of
endomorphisms.
3.1 Regularity and Positivity
Typically our regularity results for the members invariant plane κ(W ) will
be most easily described in terms of B rather than C . We therefore make
the following definition.
Definition 14. Given a subsheaf B′ ⊂ B we will say a divisor C ∈ Γ(C ) has
local B′ potentials if C ∈ Γ(q(B′)). This is equivalent to requiring that about
each point x ∈ X there is an open neighborhood U and some B′ ∈ B′(U)
such that q(B′) = C
The proof of Theorem 12 implicitly provides a method to prove regularity
results for members of the invariant plane κ(W ). We make this explicit as a
corollary (of the proof).
Corollary 15. Assume we are given f : X → X and a short exact sequence
of sheaves A
p→ B q→ C satisfying the hypothesis of Theorem 12. Assume
that B′ is a subsheaf of B and that ΓfB(B
′) ⊂ B′. Let C ′ be the image of
B′ under q : B → C . Let A ′ ⊂ A be the kernel of q : B → C ′. Assume
that the canonical map H1(X,A ′) → H1(X,A ) is injective. Assume that
there are basis members w1, . . . , wk of W with divisors each of which has local
potentials in B′. Let r be the the inverse of the absolute value of the largest
eigenvalue of g−1 (so for all j ≥ 0, g−j is an operator of norm no more
than cr−j for some c > 0) Finally assume that for any sequence of numbers
aj, j = 0, 1, 2, . . . such that |aj| is no more than a constant times r−j as
j →∞ then for B ∈ Γ(B′) the exponentially decaying sequence
a0 B + a1 (ΓfB)(B) + a2 (ΓfB)
2(B) + · · · (4)
converges in the Banach space structure on Γ(B) to a member of Γ(B′).
Then the map κ : W → Γ(C ) lands in Γ(C ′).
Proof. Since W lies in the α chronically expanding subspace of W then neces-
sarily α/r < 1. Thus the terms of equation (4) have exponentially decreasing
norms and the series is exponentially decaying.
By the assumption of a divisor in Γ(C ′) for each member wj of a basis
then the map s : W → Γ(C ) in Theorem 12 can be assumed to land in Γ(C ′).
Then (ΓfC )sg
−1− s lands in Γ(C ′) and satisfies δ((ΓfC )sg−1− s) = 0. Since
H1(X,A ′) → H1(X,A ) injects it easily follows that for each wj one can
choose σ(wj) to be a member Bj of Γ(B)
′. Using the basis w1, . . . , wk to write
g−1 as a matrix A, and letting aij,` be the ij entry of A
` (so for each ij, aij,` is
bounded by a constant times r−`) we see that τ`(wj) = Bj + (ΓfB)(a1j,1B1 +
· · ·+ akj,1Bk) + (ΓfB)2(a1j,2B2 + · · ·+ akj,2Bk) + · · ·+ (ΓfB)`(a1j,`B1 + · · ·+
akj,`Bk). Gathering all the B1 terms, B2 terms, etc... from the right hand
side we see that τ = limk→∞ τk is a member of Γ(B
′) and thus that κ lands
in Γ(C ′) by equation (3).
The following trivial observation will suffice for our needed positivity
conclusions.
Observation. Assume we have an f self cohomomorphism of a short exact
sequence of sheaves A
p→ B q→ C satisfying the hypothesis of Theorem 12,
and also a subsheaf C ′ ⊂ C such that
1. C ′ is closed under multiplication by R>0. Note that C ′ is not necessarily
a sheaf of K modules, or even of groups.
2. fC (C
′) ⊂ C ′
3. Γ(C ′) is closed in Γ(C ).
Then for any closed eigenbundle v ∈ H1(X,A ) with eigenvalue in K0 and at
least one divisor C′ ∈ Γ(C ′) the unique invariant divisor C ∈ Γ(C ) of v also
lies in Γ(C ′).
Proof. The proof is trivial since C = limk→∞(ΓfC )
k(C′)/λk where λ ∈ R>0 is
the eigenvalue of v.
4 Subsheaf Cohomology
In applications of Theorem 12 it is common that there is a well understood
exact sequence of sheaves
d0→ S1
d1→ S2
d2→ · · · (5)
and that B is a subsheaf of Sk for some k, A is the kernel of dk
: B →
Sk+1 and C is the image of B in Sk+1. Moreover, in these cases the self co-
homomorphism f on A → B → C is induced by an f self cohomomorphism
of the sequence (5). In order to apply Theorem 12 to these cases we need to
understand the R module H1(X,A ) and its induced self map.
There does not seem to be a computationally useful way to extract an
injective resolution of A using subsheaves of S0
d0→ S1
d1→ · · · even if this
last sequence is acyclic. Consider for example the case where for each n,
Sn is the sheaf of currents of degree n and B ⊂ Sk is a subsheaf of mildly
regular currents. It is not clear one could make the regularization method of
[dR84] work to compare H1(X,A ) to deRham cohomology groups because
his chain homotopy operator A does not restrict well to B since dA does not
preserve regularity. We use a standard sheaf cohomological trick, which we
include here as a proposition which we will need and which we expect to be
commonly used in conjuction with Theorem 12 because of the requirement
that Γ(B) be a Banach space.
Theorem 16 (Subsheaf Cohomology). Assume we are given an exact se-
quence of sheaves S0
d0→ S1
d1→ S2
d2→ · · · and that B is a subsheaf of Sk for
some k ≥ 1. Let A = ker dk
, and B′ be the preimage of B under dk−1.
Further assume that for each j ≥ 1 we have Hj(X,B′) = 0, Hj(X,B) = 0
and for any m satisfying 0 ≤ m ≤ k − 1 we have Hj(X,Sm) = 0 for j ≥ 1.
Then for each n ≥ 1 there is a canonical isomorphism
Hn(X,A ) ∼= Hn+k(X, ker d0).
Proof. While this result is essential for us, its proof is a standard cohomo-
logical trick. First one notes that ker dk−1
= ker dk−1 by the definition of
B′. One has the short exact sequences of sheaves:
ker dk−1 → B′ → (dk(B′) = A )
ker dj → Sj → ker dj+1, j = 0, . . . , k − 2.
Considering the long exact sequences for these shows that the induced maps
Hn(X,A )→ Hn+1(X, ker dk−1) andHn+j(X, ker dk−j)→ Hn+j−1(X, ker dk−j−1)
are isomorphisms for j = 1, . . . , k−1. Composing each of these canonical iso-
morphisms gives a canonical isomorphism fromHn(X,A )→ Hn+k(X, ker d0).
Remark. We take it as clear from the functorality of the δ map in the long
exact sequence that given an f -self cohomomorphism of S0
d0→ S1
d1→ S2
· · · which maps B to itself that the induced map of H1(X,A ) is identified
with the induced map of Hk+1(X, ker d0) via the above isomorphism.
We will need one more tool be able to make effective use of Theorem 16
for calculating sheaf cohomology of subsheaves of sheaves of currents.
Definition 17. By an interval flow h on a bounded open interval I ⊂ R we
will mean the flow obtained by integrating a vector field of the form σ(t) ∂
where σ is positive exactly on I and zero elsewhere. We use h(x, t) to denote
the location of x ∈ R after following the flow for time t.
Definition 18. By an n-box in Rn we will mean an open subset which is
a product of n bounded open intervals I1, . . . , In. By an n-box in an n
dimensional manifold we will mean an n-box which is compactly supported
in some coordinate patch. By an n-subbox of an n box U = I1 × · · · × In we
will mean an n box of the form I ′1 × · · · × I ′n where I ′k is a subinterval of Ik
for each k ∈ 1, . . . , n.
Definition 19. By an n-box flow we will mean the Rn action h on Rn
which is the product of n interval flows h1(t1), . . . , hn(tn) on Rn. That is
h(x, t) = (h1(x1, t1), . . . , hn(xn, tn)) where x = (x1, . . . , xn), t = (t1, . . . , tn)
and h1, . . . , hn are interval flows on I1, . . . , In respectively. We refer to the
n-box I1 × · · · × In as the open support of the n-box flow. We will often ht
to denote the diffeomorphism h(·, t) : Rn → Rn.
Definition 20. Let h be an n-box flow on an n-box B. Let ρ be a compactly
supported smooth volume form on Rn. With this data we define an operator
Sh,ρ on smooth k forms on any n box U containing B by
Sh,ρ(φ) =
h∗t (φ)ρ(t) (6)
We say Sh,ρ defines a box smear on U , or smears U . We will omit the
subscript from Sh,ρ when the meaning is clear from context. It is clear S(φ)
is compactly supported in U if φ is.
It is clear from the definition of S that if ψ is an n− k form on U then∫
SH,ρ(φ) ∧ ψ =
φ ∧ S−H,ρ(ψ)
where−H is the family Ht with the parameter negated. From this motivation
we define a smear of a current.
Definition 21. Given h, ρ defining a smear on an n box U we define the
smear Sh,ρ on currents on U via
< Sh,ρ(C), φ >≡< C,S−h,ρ(φ) > .
Lemma 22. Given h, ρ defining a smear S on an n box U then d
S(dC) for currents C on any open subset of U containing the open support
of the smear. Also, restricted to the open support of the smear, S(C) is a
smooth form on V .
Proof. We remark that it is clear that d
= S(dφ) for forms φ, and
consequently for currents φ via the definition.
Because on the open support of the smear, a smear is just convolution
with a smooth function, then we see that if V is an open subset of the open
support of smear S on U then for any current C on U , S(C)
is a smooth
form on V .
Proposition 23. Let B be a sheaf of degree k currents. Assume that B
contains the sheaf of smooth k forms on X, and that B(U) is closed under
smears on any n-box U ⊂ X. Let B′ be the preimage under d of B in the
sheaf of degree k − 1 currents. Then B′ is soft, and therefore, Γ-acyclic.
Proof. To show that B′ is soft it is sufficient to show that B′ is locally soft
([Bre97] page 69). Given an n-box U in X we therefore only need to show
that if K is a closed subset of X in U and if W is an open neighborhood
of K then given any member B′0 of B
′(W ) there is an open neighborhood
W0 ⊂ W of K and a member B′ ∈ B′(U) such that B′
= B′0
Choose any pair of open sets V1, V2 such that K b V1 b V2 b W . Then
V2 \ V1 is compact and can therefore be covered by finitely many (open) n-
subboxes Y1 . . . , YN of U . Moreover these subboxes can all be chosen to be
disjoint from K and to lie inside W . Letting S1, . . . ,Sn be smears on U with
open support Y1, . . . , YN respectively then let B = S1(S2(· · · (SN(B′0)) · · · )).
Then on each Yj, B is given by a smooth k form. Also, B
= B′0
. Finally,
we choose a smooth function ψ : U → [0, 1] which is one on a neighborhood
of V1 and zero on a neighborhood of U \ V2. Then the current B′ ≡ ψB
extends (by zero) to a current on all of U . Then for each Yj, B
smooth function times a smooth form. Thus d(B′
) is a smooth form and
Figure 1: A current comprised of parallel submanifolds smeared and cropped.
lies in B(Yj). The boxes Yj cover V2 \ V1. Outside V2, B′ is identically zero.
We know that dB ∈ B(W ) by Lemma 22. We also know that ψ ≡ 1 on
an open neighborhood W1 of V1. Thus d(B
) = d(B
) ∈ B(W1). We
thus conclude that B′ ∈ B′(U) since its restriction to each Yj, to W1 and to
U \ V2 is a section of B′. Letting W0 = V2 \ (Y1 ∪ Y2 ∪ · · · ∪ YN) then W0 is
an open neighborhood of K, then W0 ⊂ W1 so B′
= B′0 since
W0 is disjoint from the open support of each of the smears S1, . . . ,SN . This
completes the proof that B′ is soft.
The following gives a broad generalization of the equalivalence of the co-
homology of currents with the deRham cohomology groups. To the author’s
knowledge, this result is new.
Corollary 24. Let B be a sheaf of degree k currents. Assume that B con-
tains the sheaf of smooth k forms on X, and that B(U) is closed under
smears on any n-box U ⊂ X. Letting A be the subsheaf of d closed members
of B, then
Hm(X,A ) = Hm+k(X,K),
where K is R or C depending on whether or not we allow complex valued
currents and forms.
Proof. This is an immediate consequence of Proposition 23 and Theorem 16.
5 Invariant Currents
Notation 1. If G is some sheaf of functions on a smooth orientable manifold
X we will use F k(G ) to denote the sheaf of k forms on X with coefficients
in G . We will let F kc (G ) be the subsheaf of closed (in the sense of currents)
members of F k(G ).
It will be convenient to use either degree or dimension of a current de-
pending on the context (just as dimension and codimension are useful for
discussing manifolds), so we will not stick to just one of these terms. We will
let C k denote the sheaf of degree k currents with the index written above
as is typical for cohomology since d increases the degree. We will similarly
write Ck for the sheaf of dimension k currents with the index written below
since d decreases dimension as is common for homology. We use the following
convention to realize a form α as a current so that if α is C1 then dα is the
same whether computed as a current or a form.
Definition 25. Given an k form α with L1 coefficients on an n manifold X
we realize α as a degree k current via
β 7→ (−1)(
α ∧ β
Definition 26. Given a (possibly complex) nonzero deRham cohomology
class c ∈ HkdeRham(X) with f
∗(c) = α · c for some scalar α ∈ C we will refer
to a current C in the same cohomology class as α as an eigencurrent for f if
f ∗(C) = αC.
Currents naturally pushforward, rather than pullback. Because we are
considering maps which are not necessarily invertible we need to address
how this pullback is performed. If f has critical points it is impossible to
define a continuous pullback operation f ∗ on all currents in a way that agrees
with expected cases. For instance, consider f(x) = x2 and let Ca be the
dimension one current on R with Ca(h(x)dx) = h(a), i.e. Ca is a unit mass
vector. Then the pullback f ∗(Ca) should be the sum of weighted unit masses
at the two preimages of this vector (just like the pullback of a point mass
is a sum of point masses each weighted by multiplicity), that is, f ∗(Ca) =
C√a − C−√a
. However, these pullbacks do not converge to a current as
a → 0 so f ∗(C0) is not defined. Since we want f ∗ to be continuous, we are
forced to work with currents that have some extremely mild regularity. We
address this in the next section.
5.1 Nimble Forms and Lenient Currents
Finding a good set of currents to use to study smooth finite self maps (not
necessarily invertible) of compact manifolds turns out to be rather delicate.
Our solution is to first expand our class of forms to include pushforwards (in
the sense of currents) of forms through an appropriate class of smooth maps.
Then we restrict our attention to currents which act on this extended class
of forms.
This solution has the very nice property that it can potentially be adapted
directly to study the dynamics of other various other categories of smooth
maps (by simply changing which forms are considered nimble, according
to the class of maps used). It will convenient to first define the natural
pushforward operator on forms:
Definition 27. Given a compact orientable manifold X we let SX be the
category of smooth maps f : X → X of nonzero degree and having the
property that the critical set has measure zero. We use critical set here to
mean the points at which Df is not invertible.
It follows from our definition that the image of any set of positive measure
under some f ∈ SX has positive measure.
Definition 28. Given a compact orientable manifold X we define N k to be
those currents ϕ which are a finite sum of currents of the form p∗(σ) where
p : X → X is a map in SX and σ is a form of degree k. The pushforward
p∗(σ) is computed in the sense of currents.
We will later show that nimble forms are also, in fact, bona fide forms.
Definition 29. We topologize N k by saying ϕj → ϕ in N k if for sufficiently
large j there are maps f1, . . . , fk and k forms σ1j, . . . , σkj as well as forms
σ1, . . . , σk such that
i fi∗(σij) = ϕj and
i fi∗(σi) = ϕ (where pushfor-
wards are taken in the sense of currents) and for each i ∈ 1, . . . , k, the forms
σij converge to σi in the strong sense (i.e. all derivatives converge uniformly).
Lemma 30. Given a compact orientable manifold Y , N k(Y ) is a topological
vector space.
Proof. This follows easily from our definition of the topology.
We now define the corresponding space of currents.
Definition 31. We define the dimension k lenient currents Lk(Y ) to be
the topological dual of N k(Y ). Every member of Lk(Y ) is a dimension k
current, but with the added structure of its action on all nimble k forms. We
give Lk the weak topology, i.e. Ci → C in Lk iff < Ci, ϕ >→< C, ϕ > for
every ϕ ∈ N k. We write L k for the lenient currents of degree k.
We define operations of wedge products with smooth forms as is usual for
currents. It is clear that the lenient dimension k currents give a sheaf on X.
The following properties of nimble forms are also immediately clear.
Lemma 32. Let f : X → X be a member of SX . The pushforward (as
a current) of a nimble k form by f is again a nimble form. Moreover
f∗ : N
k(X)→ N k(X) is continuous (in the topology of nimble forms). Also
the exterior derivative of a nimble form (as a current) is a nimble form and
d : N k(X)→ N k+1(X) is continuous.
The basic necessary facts about pulling back lenient currents are then
immediate. We state them here:
Lemma 33. Given f : X → X a member of SX the induced map f ∗ on the
sheaf of lenient degree k currents is an f cohomomorphism of sheaves. Both
f ∗ : L k(X) → L k(X) and d : L k(X) → L k+1(X) are continuous. Lastly,
f ∗d = df ∗ : L k(Y )→ L k+1(X).
Proposition 34. Assume that f : X → X is a member of SX . Let R be the
regular set of f . By Sard’s theorem R has full measure. Since the critical set
is compact then R is an open subset of X. Since the preimage of a measure
zero set has measure zero for SX maps then f−1(R) is also a full measure
open set in X. There is a well defined operation f? which maps k forms on
f−1(R) to k forms on R. Given a k form β on X, f?(β) is defined on any
open subset V ⊂ R such that each component U1, . . . , Um of f−1(V ) maps
diffeomorphically onto V by the formula
f?(β)
deg f
(β) · σi (7)
where σi ∈ {±1} is the oriented degree of f
: Ui → V . The pushforward
f? satisfies:
• f?d = df? (keeping in mind that f? returns a current on R)
• f?(1) = 1
• f?(f ∗(β) ∧ α) = β ∧ f?(α)
• (f?)n = (fn)?
• The formula ∫
f ∗(β) ∧ α =
β ∧ f?(α) (8)
holds for any k form β with L∞loc coefficients on Y and any smooth n−k
form α on X. This justifies using f? to pullback currents. (Part of the
conclusion is that both sides are integrable.)
Proof. Each statement is a consequence of formula (7) except the integrabil-
ity conclusion for equation (8). Local charts can be given which are bounded
subsets of Rn and for which Df remains uniformly bounded (over each of the
charts) and thus f ∗(β) will be a form with L∞loc coefficients in these charts.
Thus the left hand side of (8) is the integral of a bounded function over a
finite union of bounded charts and is therefore absolutely integrable. Since
∗(β) ∧ α) = β ∧ f?(α) it is sufficient to show that if γ is an n form with
L∞loc coefficients then ∫
f−1(R)
f?(γ). (9)
Typicaly f?(γ) is unbounded so we need to show that the right hand side of
(9) is integrable. About any point x ∈ R we can find an open V such that
each of the preimages U1, . . . , Uk of V is mapped diffeomorphically onto V .
Since X is orientable and n dimensional there is a well defined notion of the
absolute value of an n form. Then∫
|f?(γ)| ≤
deg f
((f
 = ∑
|γ| =
f−1(V )
NowR is covered by countably many such sets V and listing them as V0, V1, V2, . . . ,
we can let V ′0 = V0, V
1 = V1 \V0, V ′2 = V2 \ (V0∪V1), . . . . Then R is the union
of the countable collection of disjoint measurable sets V ′j and∫
|f?(γ)| =
|f?(γ)| ≤
f−1(Vj)
|γ| =
f−1(R)
Since
f−1(R)
|γ| is finite then f?(γ) is an L1 form. Using precisely the same
argument but with the absolute values removed and the inequalities replaced
with equalities then shows
f?(γ) =
f−1(R)
Since R and f−1(R) are open and full measure then f? is an operator
which takes in forms on X and returns forms defined almost everywhere on
We now show that nimble forms are bona fide forms.
Lemma 35. If g : X → X is a map in SX and σ is a smooth k form on X
then the current g∗(σ) is the current of integration against the form g?(σ).
Proof. If ϕ is a smooth n − k form then by definition < g∗(σ), ϕ >=<
σ, g∗(ϕ) >= (−1)(
σ ∧ g∗(ϕ) = (−1)(
g?(σ) ∧ ϕ =< g?(σ), ϕ >
by formula (8) of Proposition 34
As described in [Fed69], an inner product on a vector space V can be
viewed as an isomorphism ` : V → V ∗ satisfying certain properties. The
inverse of ` gives the induced inner product on V ∗. The fact that < v,w >≤
‖v‖ · ‖w‖ with equality iff v and w are scalar multiples implies that the inner
product norm on V ∗ is the same as the operator norm of V ∗ acting on V .
The induced map
V ∗ gives an inner product on
We call this the canonical inner product on
V induced by the inner prod-
uct on V . Hence, given a Riemannian metric on X, there are canonical
smoothly varying inner products on
TxX and
T ∗xX for each x ∈ X.
At any point x ∈ X we define ‖
Dxf‖ to be the operator norm of the
linear function
Dxf :
TxX →
Tf(x)X. We define ‖
Df‖ to be
the L∞loc norm of the map x 7→ ‖
Dxf‖. Also, given a k form ϕ we define
the comass ‖ϕ‖L∞loc of ϕ to be the L
loc norm of the function x 7→ ‖
ϕx‖. It
is clear that the k forms with the comass norm is a Banach space. We now
show that the k forms with L∞loc coefficients are naturally lenient currents.
We start by defining the action on nimble forms.
Definition 36. Given an n− k form C with L∞loc coefficients we define
< C, p∗(σ) >= (−1)(
n−k+1
C ∧ p?(σ)
Lemma 37. The space F n−k(L∞loc) of n−k forms with L
loc coefficients under
the comass norm includes continuously into Lk(X) where the action of C ∈
F n−k(L∞loc) on some ϕ =
i fi∗(σi) ∈ N
k(X), with each fi ∈ SX and each
σi ∈ F k(C∞) is given by
< C, ϕ >≡
f ∗i (C) ∧ σi.
Proof. The assumption that X is compact means that any two Rieman-
nian metrics on X are comparable. Choose one so the notion of the comass
norm makes sense. The result is then a straightforward consequence of equa-
tion (8), Lemma 35, and our definitions.
Remark. It follows that a current with local F k(L∞loc) potentials is also a
lenient current.
Remark. Given a member C of F k(L∞loc) then f
∗(C) is the same whether
done as a lenient current or as a form. This, along with the fact that df ∗ =
f ∗d justifies the ad hoc pullback of closed positive (1, 1) currents used so
successfully in holomorphic dynamics. Similarly dC gives the same result
whether calculated as a lenient current or a form if C ∈ F k(C1).
5.2 Hölder Lemmas
We will want to apply Corollary 15 to show that each eigencurrent we con-
struct has local d potentials (or ddc potentials in the holomorphic case) which
are forms with Hölder continuous coefficients. In order to do this we will need
a few facts which we include here in order to avoid having to include regu-
larization results as afterthoughts to our main theorems.
Observation. Let Hα be the functions with coefficients that are Hölder of
exponent at least equal to some fixed α > 0. Since diffeomorphisms preserve
Hölder exponents and averages of Hölder functions are Hölder then we take
it as clear that Corollary 24 applies to show that H1(X,A ′) = H1(X,A )
where A ′ is the closed members of F k(Hα)) and A is the closed degree k
currents.
Lemma 38. Let X be a compact manifold (real or complex) with a Rieman-
nian metric and of real dimension n. Let f : X → X be a smooth map. Then
local coordinate charts Ui can be chosen on X (each representing a convex
open subset of Rn) so that there is a positive constant 1 < M so that for any
k form ϕ, there exist constants c, C > 0 such that writing each fk∗(ϕ) in any
of the charts Ui as
fk∗(ϕ) =
akidx
then each function aki satisfies
|aki| ≤ c · ‖fk∗(ϕ)‖comass (10)
and for each j ∈ 1, . . . , n,
∂aki
 ≤ C ·Mk.
Proof. Equation (10) is a basic fact.
The rest is a straightforward consequence of realizing a self map of a
manifold as being made up of a bunch of maps between different coordinate
patches in Rn. That is, one chooses an open cover of patches Ui of X. Each
patch is realized in Rn as a round ball. Thinking of each patch as lying in Rn
then we can find explicit maps from between open subsets of Rn of the form
pij : Ui ∩ f−1(Uj)→ Uj. By shrinking each open ball Ui a small amount the
resulting patches still cover X but the derivatives of the maps pij are all now
bounded (since we are working on relatively compact subsets of the previous
maps pij).
Then given any x we can keep track of which patch fk(x) is in at each time
and can then realize the map fk(x) as a composition pi1i2 ◦ pi2i3 ◦ · · · ◦ pik−1ik .
Since each partial derivative of each pij is uniformly bounded then any partial
derivative of the composition grows at most exponentially with k and we are
done.
The following observation will also be useful:
Lemma 39. If there are positive constants c, C,m,M with m < 1 < M
such that a sequence of smooth functions hk on an open convex set U ⊂ Rn
satisfies
‖hk‖sup < c ·mk
and www∂hk
< C ·Mk
for all k ∈ 0, 1, 2, . . . then h1+h2+h3+. . . converges to a bounded continuous
function which is Hölder of any exponent α <
log(m)
log(m/M)
Proof. The proof is elementary.
5.3 Eigencurrents for Cohomologically Expanding Smooth
We will call a section V of
TX a k-vector field. We define ‖V ‖L∞loc to be
the L∞loc norm of the function x 7→ ‖Vx‖. Whether Theorem 12 applies to a
map will depend the size of ‖
Df‖. Replacing f with an iterate does not
affect the needed estimate so we make the following definition.
Definition 40. We define Υk to be the limit supremum as j → ∞ of
D(f j)‖
j . It follows that Υ1 ≥ eλ for any Lyapunov exponent λ and
that Υk ≤ Υk1 ([Fed69] page 33).
We let B be the sheaf F k−1(L∞loc). The norm ‖ · ‖∞ clearly makes Γ(B)
into a Banach space. Given a member B ∈ Γ(B), since the operator norm on
TxX is equal to the norm already defined on
T ∗xX for each x ∈ X
then ‖B‖∞ is equal to supremum of the L∞loc norm of the function x 7→ B(Vx)
as V varies over all L∞loc k-vector fields of norm no more than one.
Theorem 41. Given f : X → X an a map in SX for the compact orientable
manifold X, assume that c ∈ HkdeRham(X) is a cohomology class (using ei-
ther real or complex deRham cohomology) which is an eigenvector for f ∗ with
eigenvalue β. Assume also that |β| > Υk−1. Then there exists a unique eigen-
current C with local F k−1(L∞loc) potentials representing the class c. Moreover
C has local F k−1(H) potentials.
Also, given any neighborhood U ⊂ X of any point in the support of C,
then for every lenient current C′ with local F k−1(L∞loc) potentials and which
represents the cohomology class c then fk(U) ∩ Supp C′ 6= ∅ for all large k.
Assume that the linear map f ∗ : HkdeRham(X)→ H
deRham(X) is dominated
by a single simple real eigenvalue r. Given C′ any current which has local
F k−1(L∞loc) potentials and which represents a cohomology class in the Υ
chronically expanding subspace of HkdeRham(X), then the successive rescaled
pullbacks fk∗(C′)/rk of C′ converge to a multiple of C in the sense of lenient
currents (and thus also in the sense of currents).
Proof. We let B = F k−1(L∞loc), A and C be the kernel and image respec-
tively of B
d→ L k. By Theorem 24, H1(X,A ) can be canonically identified
with Hk(X,K). Since B is Γ-acyclic then every member of H1(X,A ) is a
closed bundle with respect to the short exact sequence A → B → C .
From Lemma 33 there is an induced f cohomorphism of the short ex-
act sequence A
ι→ B d→ C . Also Γ(C ) is a space of lenient currents by
Lemma 33 and thus has a natural structure as a topological vector space. If
a sequence Bi ∈ Γ(B) converges to B ∈ Γ(B) then < dBi, ϕ >=
Bi∧dϕ =∫
B ∧ dϕ =< dB, ϕ > so the map d : Γ(B)→ Γ(C ) is continuous.
The cohomomorphism ΓfB is pullback f
∗ of differential forms. Fixing any
real α satisfying Υk−1 < α < |β| it is clear from the definition of Υk−1 that one
can choose a real d > 0 such that ‖
D(f `)‖ ≤ d ·α` for all ` ∈ N. The `th
pullback f `∗(B) of B ∈ Γ(B) satisfies ‖f `∗(B)‖∞ = supV ‖B(
D(f `)(V ))‖∞
where the supremum is taken over all k-covector fields V with ‖V ‖∞ ≤ 1.
However
D(f `)(V ) is a k-covector field of norm no more than ‖
D(f `)‖,
so ‖f `∗(B)‖∞ ≤ ‖B‖∞ ·
D(f `)‖∞ ≤ d · α`‖B‖∞.
Given any W in the Υk−1 chronically expanding subspace of H
k(X,K),
we can alter our choice of α > Υk−1 so that W also lies in the α chronically
expanding subspace of Hk(X,K).
We can therefore apply Theorem 12 to conclude that there is a (unique)
map κ : W → Γ(C ) such that f ∗κ = κf ∗, where the first f ∗ is pullback of
currents and the second is pullback on Hk(X,K).
In fact κ(W ) lies in the space of currents with locally Hölder potentials
(meaning F k−1(H) potentials) by applying Corollary 15 in conjunction with
Observation 5.2, Lemma 38 and Lemma 39. The second half of the Theorem
is a consequence of equation (3).
Remark. Theorem 41 gives regular degree one eigencurrents for every eigen-
value of f ∗ : H1(X,K) → H1(X,K) of norm greater one without requiring
any constraints on the local behavior of f . The degree one eigencurrents
seem to be, in some sense, more robust than currents of lower dimension,
including invariant measures. Moreover since codimension one closed sub-
manifolds are closed currents with local F 0(L∞loc) potentials then successive
rescaled preimages of such manifolds in the right cohomological class will
converge to the eigencurrent.
Remark. The fact that eigencurrents constructed via Theorem 41 have local
potentials which are forms does not imply their support has positive Lebesgue
measure as the classical example of a monotonic nonconstant function which
is constant on a set of full measure shows.
Remark. The assumption that f ∗ : H1deRham(X)→ H
deRham(X) is dominated
by a single simple real eigenvalue r is not essential, but just meant to handle
the simplest case. In fact the proof actually shows that if W lies in the
Υk−1 chronically expanding subspace of H
k(X,K) then every current in the
invariant plane κ(W ) ⊂ Γ(C ) of currents has local F k−1(H) potentials and
any current with cohomological class in W with local F k−1(L∞loc) potentials
is attracted to κ(W ) under successive rescaled pullback.
Since measures are of particular interest in dynamics, we note thatH1(X,F n−1(L∞loc)) =
Hn(X,K) = K by Corollary 24 so there is a unique f ∗ eigenvalue and it is
precisely the topological degree of f . We thus obtain:
Corollary 42. Given that Υn−1 < deg f then there is a unique dimension
zero eigencurrent C with F k−1(L∞loc) potentials (and in fact it has F
k−1(H)
potentials) and the successive rescaled preimages of any C′ with F k−1(L∞loc)
potentials converge to C. If additionally there is no point x ∈ X about which
f is locally an orientation reversing diffeomorphism then C (and every other
member of κ(W )) is a positive distribution and is therefore a Radon measure.
Proof. Since f ∗ pulls back dimension zero currents (i.e. distributions) which
are positive to distributions which are positive then by Corollary 3.1 the
distribution C is positive. It is therefore a Radon measure (see e.g. [HL99]
page 270).
Remark. In the case where f is orientation reversing on some parts of X (but
not on all of X) some special remarks apply. If it happens that successive
rescaled images of some point converge to a dimension zero eigencurrent then
since preimages of points are counted with multiplicity then when pulled back
through a portion of X on which f reverses orientation the sign of a point
is flipped. Thus in this case the eigencurrent may not describe so much
the distribution of preimages as the relative density of preimages counted
negatively as compared to those counted positively. The number of actual
preimages of a point may grow exponentially faster than the degree of the
map in such cases so that dividing by the degree does not yield a measure in
the limit unless some such “cancellation” takes place in the limit. One would
expect that the corresponding eigencurrents have local potentials which are
not of bounded variation in such a case.
5.4 Eigencurrents for Smooth Covering Maps
We will call a covering map which is locally a diffeomorphism a smooth
covering map. We now consider the special case of smooth self covering
maps f : X → X of a compact smooth orientable manifold X. We show
that in this case we have a substantially broader collection of currents whose
successive pullbacks converge to an eigencurrent, albeit we need different
estimates for Theorem 12 to apply. We will pull back currents by pushing
forward forms with f?. Since the regular set of f is all of X then f? is a well
defined operator from smooth forms to smooth forms.
Definition 43. For a map satisfying the hypothesis of Proposition 34 we
define the operation f ∗ from currents on X to currents on Y by
< f ∗(C), α >≡< C, f?(α) > .
Clearly f ∗ preserves the dimension of a current.
Let Mk−1 be the sheaf for which Mk−1(U) is the Banach space of bounded
linear operations on the topological vector space comprised of F k−1(C∞)(U)
with the ‖ · ‖∞ norm. Equivalently, Mk−1 is the sheaf of dimension k − 1
currents of finite mass.
Choose a Riemannian metric on X. If f : X → X is a smooth cover
then for each x ∈ X and each ` ∈ N, Dx(f `) : TxX → Tf`(x)X is invertible.
We let νk(x, `) be the operator norm of the inverse of
TxX →∧k
Tf`(x)X. We define νk(`) = supx∈X νk(x, `)
1/`. We define νk = lim sup`→∞ νk(`).
The iterated pushforward operation f `? : F
k−1(C∞)(X) → F k−1(C∞)(X)
satisfies ‖f `?(ϕ)‖∞ ≤ νk(`) · ‖ϕ‖∞ as is straightforward to verify. If f is in-
vertible then νk is a bound on the growth of the k
th wedge product of the
derivative under f−1. For non-invertible f , νk represents a bound on the
growth of the kth wedge product of the derivative under any sequence of
successive branches of f−1.
Theorem 44. Given f : X → X a smooth self covering map and that
c ∈ HkdeRham(X) is a cohomology class (using either real or complex deR-
ham cohomology) which is an eigenvector for f ∗ with eigenvalue β. Assume
also that |β| > νk−1. Then there exists a unique eigencurrent C with local
Mk−1 potentials representing the class c. Moreover C has local F
k−1(C0)
potentials. Consequently C is a current of order one.
Also, given any neighborhood U ⊂ X of any point in the support of C, then
for every lenient current C′ with local Mk−1 potentials and which represents
the cohomology class c then fk(U) ∩ Supp C′ 6= ∅ for all large k.
Assume that the linear map f ∗ : HkdeRham(X)→ H
deRham(X) is dominated
by a single simple real eigenvalue r. Given C′ any current which has local
Mk−1 potentials and which represents a cohomology class in the ν
k−1 chroni-
cally expanding subspace of HkdeRham(X), then the successive rescaled pullbacks
fk∗(C′)/rk of C′ converge a multiple of C.
Proof. We let A and C be the kernel and image respectively of d : Mk−1 →
C k. Since df? = f?d then pullback of currents gives an f cohomomorphism
of the short exact sequence of sheaves A →Mk−1 → C .
Since ΓMk−1 is the continuous linear operators on a normed vector space
then it is a Banach space. From the observations previous to the statement
of Theorem 44 one concludes that for any α > νk−1 there is a constant d > 0
such that ‖f `∗(B)‖ ≤ d · αk‖B‖ for all ` ∈ N.
Since Γ(C ) is a space of currents it is naturally a topological vector space
over K.
The map f ∗ : Γ(C ) → Γ(C ) is continuous since if Ci → C in Γ(C ) then
< f ∗(Ci), ϕ >=< Ci, f?(ϕ) >→< C, f?(ϕ) >=< f ∗(C), ϕ >.
If Pi → P in ΓMk−1 (using the Banach space structure) then ‖Pi−P‖ →
0 by assumption then ‖P(dϕ) − Pi(dϕ)‖ ≤ ‖P − Pi‖ · ‖dϕ‖ → 0. Hence
< dPi, ϕ >= Pi(dϕ)→ P(dϕ) =< dP, ϕ > and so we conclude that the map
d : ΓMk−1 → Γ(C ) is continuous.
Given any W in the νk−1 chronically expanding subspace of H
k(X,K),
we can alter our choice of α > νk−1 so that W also lies in the α chronically
expanding subspace of Hk(X,K).
We can therefore apply Theorem 12 to conclude that there is a (unique)
map κ : W → Γ(C ) such that f ∗κ = κf ∗, where the first f ∗ is pullback of
currents and the second is pullback on Hk(X,K).
In fact κ(W ) in the currents with locally continuous potentials by apply-
ing applying Corollary 15 in conjunction with Observation 5.2, Lemma 38
and Lemma 39. The second half of the Theorem is a consequence of equa-
tion (3).
Proposition 45. Let Y be an oriented codimension k submanifold of X. If
the cohomological class of Y (as a current) lies in the νk−1 chronically expand-
ing subspace of Hk(X,K) then the successive rescaled preimages of Y con-
verge to the invariant plane of currents κ(W ). If f ∗ : Hk(X,K)→ Hk(X,K)
is dominated by a single real eigenvalue r > νk−1 then the successive rescaled
preimages of Y converge to a multiple (possibly zero) of the r eigencurrent.
In particular, if νn−1 < deg f then the successive rescaled preimages of any
point converge to the unique invariant measure with Mn−1 potentials.
Proof. This follows immediately from Theorem 44 if we show that Y has
local potentials in Mk−1. This is equivalent to showing that locally Y = dP
where < P,ϕ >≤ a · ‖ϕ‖∞ for some a > 0. Let B be a ball in Rn and Y0
a k-plane in Rn. Then there is a k + 1 half plane P such that, as currents
in U , ∂P = Y0. Moreover it is clear that < P,ϕ >≤ a‖ϕ‖∞ for some real
a > 0. (There are also local potentials for Y which are given by forms with
L1loc coefficients. These can be constructed by choosing a projection π from
U \ Y0 to a codimension one cylinder C with axis Y0, and choosing a volume
form σ on C. The local potential is the pullback π∗(σ).)
Remark. As with Theorem 41, Theorem 44 gives regular degree one eigencur-
rents for every eigenvalue of f ∗ : H1(X,K)→ H1(X,K) of norm greater one
without requiring any constraints on the local behavior of f . In holomorphic
dynamics much progress has been made in constructing degree one eigencur-
rents and then constructing dynamically important invariant measures via
a generalized wedge product (see the references cited at the beginning of
Section 6).
Remark. The proof of Proposition 45 could clearly be modified to apply to
many singular manifolds as well.
6 Holomorphic Endomorphisms
We now restrict our interest to holomorphic dynamics. Thus all manifolds
are assumed to be complex manifolds and all maps are assumed to be holo-
morphic unless stated otherwise.
Holomorphic endomorphisms of the Riemann sphere have been studied
in great detail. For endomorphisms much of the theory is still in its be-
ginnings. Much attention has been paid to holomorphic automorphisms of
C2 [FM89], [FS92], [HOV94], [HOV95], [BS91a], [BS91b], [BS92], [BLS93],
[BS98a], [BS98b], [BS99] or K3 surfaces [Can01], [McM02], the major de-
velopments for endomorphisms have been on Pn, [FS94a], [FS94b], [FS95b],
[FS01], [FS95a], [JW00], [FJ03], [Ued94], [Ued98], [Ued97]. Recent signifi-
cant developments have been made for endomorphisms of Kahler manifolds
in [DS05]. The paper [DS05] shows existence of eigencurrents (or Green’s
currents) for endomorphisms of Kahler manifolds under a simple condition
on the comparative rates of growth of volume in two different dimensions.
They also show that a specific weighted sum of an arbitrary closed positive
smooth current will converge to the Green’s current, and that the Green’s
current has a Hölder continuous potential. In this setting our theorem shows
that arbitrary (rescaled) preimages of a broader class of currents will con-
verge to the Green’s current. A wide variety of results have been proven
in these various circumstances either showing the existence of invariant cur-
rents, showing convergence of currents to invariant currents, or studying the
properties of these invariant currents. We include here results that follow
from the method of this paper, which we are sure substantially overlap with
existing results. Presumably our cohomologicaly lifting theorem could be
used in conjuction with Theorem 12 to show existence of higher degree (k, k)
currents given certain bounds on local growth rates.
6.1 ddc Cohomology
Let Z be a complex manifold and let f : Z → Z be a holomorphic self map
of Z. Let H be the sheaf of pluriharmonic functions, let L∞loc be the sheaf
of locally bounded functions, and let C be the sheaf of currents with local
potentials in L∞loc, i.e. currents locally of the form dd
cb, for b a locally bounded
function. The members of C are closed (1, 1) currents on Z.
Using the usual pullback on functions, and the induced pullback on cur-
rents with function potentials (i.e. pullback the current by pulling back its
local potentials), then we get a self cohomomorphism of the exact sequence
of sheaves
H → L∞loc
ddc→ C . (11)
We note that H1(Z,H ) is a finite dimensional R vector space as can be
seen from the long exact sequence for the short exact sequence R→ O →H
where the first map is inclusion and the second takes the imaginary part. The
terms H1(Z,O) → H1(Z,H ) → H2(Z,R) give the finite dimensionality
since O is a coherent analytic sheaf (see e.g. [Tay02] page 302).
Then from Theorem 12 we obtain:
Corollary 46. Given v any closed eigenbundle of H1(Z,H ) for f ∗ with
eigenvalue r > 1, there is a unique closed (1, 1) current C such that limk→∞ f
k∗(C′)/rk
converges to C for any divisor C′ of v.
Remark. We note that the terms “closed eigenbundle” and “divisor” in Corol-
lary 46 are understood using the long exact sequence for (11).
We can apply Corollary 15 to show that
Corollary 47. Any such invariant current C so obtained has Hölder contin-
uous local potentials.
Proof. The result follows from Lemma 5.2, Lemma 38, the fact that the
ddc closed Hölder continuous functions are the same as the ddc closed L∞loc
functions and from Corollary 15.
Also from Observation 3.1,
Corollary 48. If v has a plurisubharmonic section the current C is positive.
7 Result via Invariant Sections
We stated early on that our construction of invariant members of H0(C ) for
a self cohomomorphism of a short exact sequence A → B → C of sheaves
could be done in terms of finding invariant sections of bundles. We illustrate
this here in a specific case where we can take advantage of geometry to make
further conclusions. Finding an invariant section of a bundle is equivalent to
finding an invariant trivialization of the bundle, and we will make our initial
statement in terms of a trivialization.
Let Z be a compact complex manifold. Let f : Z → Z be a holomorphic
endomorphism. Let p ∈ H1(Z,H ) be an eigenvector for f ∗ with real eigen-
value λ of norm greater than one. If f ∗ were to have complex eigenvalues of
interest, an analogous construction can be made to the one that follows.
We note that there is a canonical bundle map f̃ : f ∗(p) → p which gives
the map f on the base space. It is easy to show that there is a map σ : p→ λp
which is the identity on the base space and takes the form r 7→ λr+ b on the
fibers, where b is a constant. What is more, the map τλ is easily seen to be
unique up to the addition of a constant. Then define the map f̌ : p → p to
be the composition of
τλ→ λp = f ∗(p) f̃→ p.
Then f̌ is the map f on the base space and takes the form r 7→ λr + b on
the fibers.
Since every pluriharmonic bundle is trivial as a smooth bundle, then we
can choose a smooth trivialization t : p→ R, i.e. t(a+ r) = σ(a) + r for any
a ∈ p, r ∈ R, where a+ r is computed in the fiber containing a.
Theorem 49. There is a unique continuous trivialization g : p → R such
that:
g(a+ r) = g(a) + r for a ∈ p and r ∈ R,
g(f̌(a)) = λ · g(a) for a ∈ p,
moreover
g = lim
λ−k ◦ t ◦ f̌ ◦k
and the limit converges uniformly. Finally, the zero set of g is the image of a
section g : Z → p and is exactly the set of points whose forward image under
f̌ remains bounded.
Proof. Define a function T : p→ R by
T (a) ≡ t
f̌(a)
− λ · t(a).
Note that T descends to a well defined continuous function T : Z → R since
for an arbitrary r ∈ R one has T (a + r) = t
f̌(a + r)
− λ · t(a + r) =
t(f̌(a) + λr)− λ · (t(a) + r) = T (a).
One notes that since the function T is necessarily bounded if Z is compact
then defining
g(a) ≡ t(a) + λ−1 · T (a) + λ−2T (f̌(a)) + λ−3T (f̌ ◦2(a)) + · · ·
gives a continuous function g : p→ R satisfying the above two properties.
Assume g1 and g2 are two such functions. Then ∆ ≡ g1− g2 : p→ R is a
function satisfying ∆(a+ r) = ∆(a) for a ∈ p and r ∈ R so ∆ descends to a
continuous function ∆: p→ R satisfying ∆(f̌(a)) = λ ·∆(a). However since
λ > 1 one concludes that this is only possible if ∆ ≡ 0 since M is compact
so ∆(M) has compact image in R.
It is easy to check using the definition of T that λ−k ◦ t ◦ f̌ ◦k(a) is exactly
a partial sum of the first k terms of the above series and this gives the
convergence result. The conclusion about the section g is trivial.
The above construction can be carried through almost without modifi-
cation for any subspace of H1(Z,H ) on which f ∗ is expanding. This gives
an alternate way of understanding the convergence of preimages of sections.
The point is that if s is any section of p, i.e. the potential of a current C,
then 1
f ∗(C) is a current with potential which is the setwise preimage of s
under f̌ (this is easy to confirm from the construction of f). The Green’s
trivialization g shows that f̌ is uniformly repelling away from the image of
the invariant section g. Thus as long as s is bounded in p, (not even neces-
sarily continuous), then the successive preimages of s will converge uniformly
to the section g. Since uniform convergence of potentials implies convergence
of currents then the rescaled pullbacks of a current C converge to the cur-
rent with potential g. We already have this as a theorem, so we have not
restated it as such here. This is just an alternative approach. Note that in
the case where Z = P2 [FJ03] has given far more precise control of when the
successive rescaled preimages of a current will converge to the eigencurrent.
7.1 Sections version with an Invariant Ample Bundle
It is also interesting to consider the special case where there is an invariant
ample bundle with eigenvalue λ ≥ 2 an integer. Without loss of generality
we assume ` is very ample. The morphism of sheaves log | · | : O∗ → H
induces a map from holomorphic line bundles to pluriharmonic bundles. We
let p = log |`| be the corresponding pluriharmonic bundle.
It is easy to see that there is a holomorphic map ` → `λ which is of the
form σλ : z 7→ azλ, a ∈ C∗ on each fiber and is the identity on the base
space. There is also a canonical holomorphic map f̃ : f ∗(`) → ` which is a
line bundle map and is f on the base space.
One then defines the holomorphic map f̆ : `→ ` which is the composition
σk→ `k = f ∗(`) f̃→ `.
This map is of the form z 7→ azk on each fiber and is equal to the map
f : Z → Z on the base space. Let `∗ denote ` with its zero section removed,
so that log | · | : ` → p is a well defined continuous map. Since the preimage
of the zero section of ` under f̆ is the zero section then f̆ is a holomorphic
self map of `∗. It is easy to confirm that f̆ : ` → ` can be rescaled so that
the diagram
log |·|
log |·|
commutes.
Our Greens trivialization g : p→ R can be pulled back to give a Green’s
function G : `∗ → R on the punctured bundle `∗. It satisfies G(f̃(w)) =
λ · G(w) and G(βw) = G(w) + log |β| for w ∈ ` and β ∈ C∗. Since g is
a trivialization of an R bundle over a compact space, g is proper. Since
log | · | : `∗ → p is proper then G is proper. Thus, in this setting one can
construct a Greens function that is exactly analogous to the Green’s function
constructed on Cn+1 for a holomorphic endomorphism of Pn. Potentially one
could take advantage of the special geometry of very ample bundles to get
information about the dynamics in this situation.
8 Bibliography
References
[BLS93] Eric Bedford, Mikhail Lyubich, and John Smillie. Polynomial dif-
feomorphisms of C2. IV. The measure of maximal entropy and lam-
inar currents. Invent. Math., 112(1):77–125, 1993.
[Bre97] Glen E. Bredon. Sheaf Theory. Springer-Verlag, 1997.
[BS91a] Eric Bedford and John Smillie. Polynomial diffeomorphisms of C2:
currents, equilibrium measure and hyperbolicity. Invent. Math.,
103(1):69–99, 1991.
[BS91b] Eric Bedford and John Smillie. Polynomial diffeomorphisms of C2.
II. Stable manifolds and recurrence. J. Amer. Math. Soc., 4(4):657–
679, 1991.
[BS92] Eric Bedford and John Smillie. Polynomial diffeomorphisms of C2.
III. Ergodicity, exponents and entropy of the equilibrium measure.
Math. Ann., 294(3):395–420, 1992.
[BS98a] Eric Bedford and John Smillie. Polynomial diffeomorphisms of
C2. V. Critical points and Lyapunov exponents. J. Geom. Anal.,
8(3):349–383, 1998.
[BS98b] Eric Bedford and John Smillie. Polynomial diffeomorphisms of C2.
VI. Connectivity of J . Ann. of Math. (2), 148(2):695–735, 1998.
[BS99] Eric Bedford and John Smillie. Polynomial diffeomorphisms of C2.
VII. Hyperbolicity and external rays. Ann. Sci. École Norm. Sup.
(4), 32(4):455–497, 1999.
[Can01] Serge Cantat. Dynamique des automorphismes des surfaces K3.
Acta Math., 187(1):1–57, 2001.
[dR84] Georges de Rham. Differentiable Manifolds. Springer-Verlag, 1984.
[DS05] Tien-Cuong Dinh and Nessim Sibony. Green currents for holomor-
phic automorphisms of compact kähler manifolds. J. Amer. Math.
Soc., 18(2):291–312, 2005.
[Fed69] Herbert Federer. Geometric Measure Theory. Springer-Verlag,
1969.
[FJ03] Charles Favre and Mattias Jonsson. Brolin’s Theorem for Curves
in Two Complex Dimensions. Ann. Inst. Fourier, 53:1461–1501,
2003.
[FM89] Shmuel Friedland and John Milnor. Dynamical properties of
plane polynomial automorphisms. Ergodic Theory Dynam. Sys-
tems, 9(1):67–99, 1989.
[FS92] John Erik Fornaess and Nessim Sibony. Complex Hénon mappings
in C2 and Fatou-Bieberbach domains. Duke Mathematical Journal,
65(2):345–380, 1992.
[FS94a] John Erik Fornaess and Nessim Sibony. Complex dynamics in
higher dimension. In Complex Potential Theory, pages 131–186,
1994.
[FS94b] John Erik Fornaess and Nessim Sibony. Complex dynamics in
higher dimension. I. Astérisque, 222:201–231, 1994.
[FS95a] John Erik Fornaess and Nessim Sibony. Classification of recurrent
domains for some holomorphic maps. Math. Ann., 301(4):813–820,
1995.
[FS95b] John Erik Fornaess and Nessim Sibony. Complex dynamics in
higher dimension. II. In Modern methods in complex analysis
(Princeton, NJ, 1992), pages 135–182. Princeton Univ. Press,
Princeton, NJ, 1995.
[FS01] John Erik Fornæss and Nessim Sibony. Dynamics of p2 (examples).
In Laminations and foliations in dynamics, geometry and topology
(Stony Brook, NY, 1998), pages 47–85. Amer. Math. Soc., Provi-
dence, RI, 2001.
[GR84] H. Grauert and R. Remmert. Coherent Analytic Sheaves. Springer-
Verlag, 1984.
[Har77] Robin Hartshorne. Algebraic Geometry. Springer-Verlag, 1977.
[HL99] Francis Hirsche and Gilles Lacombe. Elements of Functional Analy-
sis, volume 192 of Graduate Texts in Mathematics. Springer-Verlag,
1999. Translated from the 1997 French original by Silvio Levy.
[HOV94] John H. Hubbard and Ralph W. Oberste-Vorth. Hénon mappings
in the complex domain. I. The global topology of dynamical space.
Inst. Hautes Études Sci. Publ. Math., (79):5–46, 1994.
[HOV95] John H. Hubbard and Ralph W. Oberste-Vorth. Hénon mappings
in the complex domain. II. Projective and inductive limits of poly-
nomials. In Real and complex dynamical systems (Hillerød, 1993),
volume 464 of NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci., pages
89–132. Kluwer Acad. Publ., Dordrecht, 1995.
[JW00] Mattias Jonsson and Brendan Weickert. A nonalgebraic attractor
in P2. Proc. Amer. Math. Soc., 128(10):2999–3002, 2000.
[Kat03] Anatole Katok. Combinatorial constructions in ergodic theory and
dynamics, volume 30 of Unversity Lecture Series. American Math-
ematical Society, Providence, RI, 2003.
[McM02] Curtis T. McMullen. Dynamics on K3 surfaces: Salem numbers
and Siegel disks. J. Reine Angew. Math., 545:201–233, 2002.
[Tay02] Joseph L Taylor. Several Ccomplex Variables with Connections
to Algebraic Geometry and Lie Groups. American Mathematical
Society, 2002.
[Ued94] Tetsuo Ueda. Fatou sets in complex dynamics in projective spaces.
J. Math. Soc. Japan, 46:545–555, 1994.
[Ued97] Tetsuo Ueda. Complex dynamics on Pn and kobayashi metric.
In Complex dynamical systems and related areas, pages 188–191.
Surikaisekikenkyusho Kokyuroku No 988, 1997. (Kyoto 1996).
[Ued98] Tetsuo Ueda. Critical orbits of holomorphic maps on projective
spaces. J. Geometric Analysis, 8(2):319–334, 1998.
[Wei97] Charles A. Weibel. An Introduction to Homological Algebra. Cam-
bridge University Press, 1997.
	Introduction
	Cohomomorphisms
	Cohomomorphisms and .
	Invariant Global Sections
	Regularity and Positivity
	Subsheaf Cohomology
	Invariant Currents
	Nimble Forms and Lenient Currents
	Hölder Lemmas
	Eigencurrents for Cohomologically Expanding Smooth Maps
	Eigencurrents for Smooth Covering Maps
	Holomorphic Endomorphisms
	ddc Cohomology
	Result via Invariant Sections
	Sections version with an Invariant Ample Bundle
	Bibliography
ABSTRACT
  The goal of this paper is to construct invariant dynamical objects for a (not
necessarily invertible) smooth self map of a compact manifold. We prove a
result that takes advantage of differences in rates of expansion in the terms
of a sheaf cohomological long exact sequence to create unique lifts of finite
dimensional invariant subspaces of one term of the sequence to invariant
subspaces of the preceding term. This allows us to take invariant cohomological
classes and under the right circumstances construct unique currents of a given
type, including unique measures of a given type, that represent those classes
and are invariant under pullback. A dynamically interesting self map may have a
plethora of invariant measures, so the uniquess of the constructed currents is
important. It means that if local growth is not too big compared to the growth
rate of the cohomological class then the expanding cohomological class gives
sufficient "marching orders" to the system to prohibit the formation of any
other such invariant current of the same type (say from some local dynamical
subsystem). Because we use subsheaves of the sheaf of currents we give
conditions under which a subsheaf will have the same cohomology as the sheaf
containing it. Using a smoothing argument this allows us to show that the sheaf
cohomology of the currents under consideration can be canonically identified
with the deRham cohomology groups. Our main theorem can be applied in both the
smooth and holomorphic setting.

<|endoftext|><|startoftext|>
Coincidence of the oscillations in the dipole transition and in the persistent current of
narrow quantum rings with two electrons
Y. Z. He and C. G. Bao∗
State Key Laboratory of Optoelectronic Materials and Technologies,
and Department of Physics, Sun Yat-Sen University, Guangzhou, 510275, P.R. China
The fractional Aharonov-Bohm oscillation (FABO) of narrow quantum rings with two electrons
has been studied and has been explained in an analytical way, the evolution of the period and
amplitudes against the magnetic field can be exactly described. Furthermore, the dipole transition
of the ground state was found to have essentially two frequencies, their difference appears as an
oscillation matching the oscillation of the persistent current exactly. A number of equalities relating
the observables and dynamical parameters have been found.
PACS numbers: 73.23.Ra, 78.66.-w
* The corresponding author
Quantum rings containing only a few electrons can
be now fabricated in laboratories1,2. When a magnetic
field B is applied, interesting physical phenomena, e.g.,
Aharonov-Bohm oscillation (ABO) and fractional ABO
(FABO)of the ground state (GS) energy Eo and persis-
tent current Jo, have been observed
2−4,13. In the the-
oretical aspect, a number of calculations based on exact
diagonalization5−8, local-spin-density approximation9,10,
and the diffusion Monte Carlo method11 have been
performed. These calculations can in general repro-
duce the experimental data. For examples, in the
calculation of 4-electron ring6,11, the period of oscilla-
tion Φ0/4 found in experiments was recovered (Φ0 =
hc/eisthefluxquantum).
In addition to the oscillations in Eo and Jo, the oscil-
lation in the optical properties is noticeable.16,17. In this
paper a new kind of oscillation found in the dipole tran-
sition of two-electron (2-e) narrow rings is reported. The
emitted (absorbed) photon of the dipole transition of the
GS was found to have essentially two energies, their dif-
ference is exactly equal to hJo, where h is the Planck’s
constant. In other words the difference of the two photon
energies appears as an oscillation which matches exactly
the oscillation of Jo. This finding is approved by both
numerical calculation and analytical analysis as follows.
The narrow 2-e ring is first considered as one-
dimensional, then the effect of the width of the ring is
further evaluated afterward. The Hamiltonian reads
H = T + V12 +HZeeman (1)
G(−i ∂
+Φ)2, G = ~
2m∗R2
where m∗ the effective mass, θj the azimuthal an-
gle of the j − th electron, Φ = πR2B/Φ0, where B
is a magnetic field perpendicular to the plane of the
ring, V12 the e-e Coulomb interaction, HZeeman =
−SZµΦ the well known Zeeman energy where SZ is
the Z-component of the total spin S, and µ = g
πR2Φ0
where g∗ is the effective g-factor and µB is the Bohr
magneton. The interaction is adjusted as 7 V12 =
e2/(2ε
d2 +R2 sin2((θ1 − θ2)/2))
−1, where ε is the di-
electric constant and the parameter d is introduced to
account for the effect of finite thickness of the ring.
We first perform a numerical calculation so that all
related quantities can be evaluated quantitatively. m∗ =
0.063me, ε = 12.4 (for InGaAs), d = 0.05R , and the
units meV , nm , Tesla and Φ0 are used. Accordingly,
G = 604.8/R2,and µ = 33.53/R2.
A set of basis functions φk1k2 = e
i(k1θ1+k2θ2)/2π is in-
troduced to diagonalize the Hamiltonian, where k1 and
k2 must be integers to assure the periodicity, the sum of
k1 and k2 is just the total orbital angular momentum L.
φk1k2 must be further (anti-)symmetrized when S = 0(1).
When about three thousand basis functions are adopted,
accurate solutions (at least six effective digits) can be ob-
tained. The low-lying spectrum is plotted in Fig.1, where
the oscillation of the GS energy and the transition of the
GS angular momentum Lo can be clearly seen.
Let θC = (θ2 + θ1)/2 , and ϕ = θ2 − θ1. Then
H = Hcoll +Hint (2)
whereHcoll =
G(−i ∂
+2Φ)2+HZeeman andHint =
2G(−i ∂
)2+V12, they are for the collective and internal
motions, respectively. Our numerical results lead to the
following points.
(i) Separability: The separability of one-dimensional
ring is well known5. However, for the convenience of the
following description, it is briefly summarized as follows.
Each eigenenergy E can be exactly divided as a sum of
three terms
E = 1
G(L+ 2Φ)2 + Eint − SZµΦ (3)
where the first term is the kinetic energy of collective
motion, Eint is the internal energy .
Since the basis functions can be rewritten as
http://arxiv.org/abs/0704.0070v1
φk1k2 = e
iLθCei
(k2−k1)ϕ/2π (4)
the spatial part of each eigenstate Ψ is strictly separa-
ble as Ψ = 1√
eiL θCψint where the first part describes
the collective motion, while ψint is a normalized internal
state depending only on ϕ. In particular, both Eint and
ψint do not depend on B (or Φ).
0 2 4 6 8
E (meV)
FIG. 1: Low-lying levels of a 2-e ring against Φ/Φ0 in the
FABO region. When Φ/ is positive, Lo is negative, the num-
bers by the curves are −Lo.
(ii) Classification of ψint: When L is even (odd), (k2−
k1)/2 is an integer (half-integer), thus the period of ϕ as
shown in (4) is 2π (4π). Therefore, the periodicity of the
internal states have two choices. In fact, the difference in
the periodicity is closely related to the dependence of the
domains of the new variables θC and ϕ, this point has
been discussed in detail in ref.[14,15]. Let Q = (−1)L,
then the four cases (Q,S) = (1,0), (-1,0), (-1,1) and (1,1)
are associated with four types of states labeled by a, b, c,
and d , respectively. The internal states of Type a
are denoted as ψa, ψa∗ , · · · and the associated internal
energies as Ea < Ea∗ , · · · and so on. Examples of
ψint and Eint are plotted in Fig.2 and listed in Table 1,
respectively.
Table 1, The lowest and second lowest internal
energies (in meV ) of Type a to d, R = 30nm.
Type a b c d
Eint 2.626 4.247 2.630 4.272
E∗int 6.342 8.912 6.435 9.158
Due to the e-e repulsion, a dumbbell shape (DB), i.e.,
ϕ = 180◦, is advantageous in energy because the two
electrons are farther away from each other meanwhile.
However, a rotation of this geometry by π is equivalent
to an interchange of particles, these operations will cre-
ate the factors (−1)L and (−1)S, respectively, from the
wave function. Therefore, the equivalence leads to a con-
straint, accordingly the DB is allowed only for the states
Type a Type b
0 90 180 270 360
Type c
0 90 180 270 360
Type d
FIG. 2: Four types of ψint against ϕ , R = 40nm. The lowest
three of each type are shown, the higher state has more nodes.
with L + S even, i.e., only for Type a and c. Other-
wise, the states would have an inherent node at the DB
and therefore be higher in energy as shown in Table 1,
where Ea << Eb, Ec << Ed, and Ea ≈ Ec. In Fig.2
the patterns of Type a are one-to-one similar to Type c
, they all have a peak at the DB. On the contrary, all
those of Type (b) and(d) have the inherent node at the
DB. It is noticeable that Type b and c are not continuous
at ϕ = 0 and 2π due to their periods are not equal to 2π.
It was found that the internal states of all the GSs are
either ψa or ψc without exceptions because the favorable
DB is allowed in them. When the dynamical parame-
ters vary in reasonable ranges, the qualitative features of
Fig.2 remain the same.
According to (3), an appropriate Lo would be chosen to
minimize the GS energy. When Φ increases, Lo will un-
dergo even-odd transitions repeatedly and become more
negative as shown in Fig.1. Correspondingly, the total
spin So undergoes singlet-triplet transitions, and ψa and
ψc appear in the GS alternatively. However, due to the
Zeeman effect, when Φ is larger than a critical value Φcrit
, only So = 1 states will be dominant, and accordingly
only ψc will appear in the GS. The region Φ < (>) Φcrit
is called the FABO (ABO) region.
(iii) Persistent Current : Let J1 be the current of the
particle e1. The expression of J1 is well known.
5 However,
since it does not depend on the azimuthal angle, it equals
to its average over θ1. Thus the total current J = J1+J2
J = 1
dθ1dθ2 [Ψ
∗(−i ∂
+2Φ)Ψ+c.c.] (5)
where g = ~/(m∗R2). Using the arguments θC and ϕ
and making use of the separability, the integration over
θC and ϕ can be performed. Thus we have
J = g(L+ 2Φ)/2π (6)
This equation demonstrates explicitly the mechanism
of the oscillation of the persistent current, it is caused
by the step-by-step transition of L during the increase
of Φ. Examples of J are shown in Fig.3, where each
stronger oscillation (associated with a L odd and S = 1
GS) is followed by a weaker oscillation (associated with
a L even and S = 0 GS).
0 2 4 6 8
0 2 4 6 8
0 2 4 6 8
FIG. 3: The oscillation of the persistent current and the two
photon energies of the ground states against Φ/Φ0 in the
FABO region. The unit of current is 10−5C/R, where C
is the velocity of light. In the lowest panel, the black square
(white circle) denotes ~ω+ (~ω−), namely, the energy associ-
ated with Lo to Lo + 1 (Lo − 1) transition.
(iv) Relations among the internal states : Define
Om = e
im(θ1−θC)+ eim(θ2−θC) = 2 cos(mϕ/2). By an-
alyzing the numerical data, we found
Ñ(O1ψa) = ψb + ξa and Ñ(O1ψc) = ψd + ξc (7)
where Ñ is the operator of normalization, both ξa and
ξc are very small functions and depend on the dynamical
parameters very weakly. E.g., when R varies from 30 to
90, the weights of ξa and ξc vary from 0.0004 to 0.0002.
They are so small that in fact can be neglected. Since
O1 contains a node at the DB, it must cause a change of
type from a to b, or from c to d. Thus it is not surprising
that (7) holds. Since O1 is the operator of the dipole
transition (see below), eq.(7) provides an additional rule
of selection as discussed later.
(v) Dipole transition: The probability of dipole tran-
sition reads P
(o),± =
(ω±/c)
3R2 |A
(o),±|
2, where ω±
is the frequency of the photon,
= 〈Ψ(f)±|e
±iθ1 + e±iθ2 |Ψ(o)〉
= δL(f), L(o) ±1〈ψ
int |O1|ψ
int〉 (8)
where (f) and (o) denote the final and initial states,
respectively, the signs ± are associated with L(f) =
L(o) ± 1.
Let the initial state be the GS with Lo, then ψ
must be ψa or ψc depending on Lo is even or odd .
Let α denotes the type of the initial state. Due to (7)
int |O1|ψ
int〉 = δ(f),α < O1ψα|O1ψα >
1/2, where
δ(f),α implies that the final state must be ψb (ψd) if α = a
(c), otherwise the amplitude is zero. Thus, due to the ad-
ditional rule of selection eq.(7), the dipole strength of the
GS is completely concentrated in two final states having
L(f) = L(o) ± 1 and both having the same internal state
specified by eq.(7). Accordingly, only the photons with
the two energies
~ω± = E(f)± − E(o)
= G[ 1
(1± 2(Lo + 2Φ)) + ∆α/G] (9)
can be emitted (absorbed), where ∆α = Eb − Ea or
Ed − Ec depending on α = a or (c). The oscillation
of ~ω± is plotted in the lowest panel of Fig.3. It turns
out that ∆α/G depends on R very weakly, thus ~ω± is
nearly proportional to R−2. Accordingly, a smaller ring
will have a larger probability of transition with a higher
energy.
(vi) FABO region: The oscillation in this region is com-
plicated as shown in Fig.1 and 3. It is noted that the GS
energy (3), persistent current (6), and the photon ener-
gies (9) all contain the factor Lo + 2Φ, thus their FABO
are completely in phase and have the same mechanism
caused by the transition of Lo against Φ. In Fig.1 the
abscissa Φ can be divided into segments, in each the
GS has a specific Lo and the GS energy is given by a
piece of a parabolic curve. The segment is called an
even (odd) segment if Lo is even (odd). At the border of
two neighboring segments the two GS energies are equal.
From the equality and based on (3), the right and left
boundaries of the segment with Lo can be obtained as
Φright(Lo) = (1−(µ/2G)
2)−1[ 1−µ(Ec−Ea)/G
2Lo+(−1)
Lo(2(Ec−Ea)+µ(Lo−1/2))/G]/4 (10)
Φleft(Lo) = (1 − (µ/2G)
2)−1[−1 − µ(Ec − Ea)/G
2Lo − (−1)
Lo(2(Ec − Ea) + µ(Lo + 1/2))/G]/4 (11)
where Lo ≤ 0 and Φright(Lo) = Φleft(Lo − 1), µ arises
from the HZeeman. The length of the segment reads
dLo = Φright(Lo) − Φleft(Lo) = (1 − (µ/2G)
2)−1[1 +
(−1)Lo(2(Ec − Ea) + µLo)/G]/2 (12)
which is related to the period of the FABO. When Φ
increases, the magnitude of Lo would increase. Since µLo
is negative, it is clear from eq.(12) that the length of even
(odd) segments would become shorter (longer) when Φ
increases.
The location of a segment with a given Lo can be
known from the inequality Φleft(Lo) ≤ Φ ≤ Φright(Lo).
Once the relation between Lo and the segments of Φ is
clear, every details of the FABO can be analytically and
exactly explained via the eq.(3), (6), and (9). In particu-
lar, the extrema in each segment can be known by giving
Φ = Φright or Φleft. For an example, the maximal cur-
rent is g(Lo+2Φright)/2π. Incidentally, the minimum of
the GS energy in a segment is Emin = Ec−µ
2/8G+µLo/2
(if So = 1), or just equal to Ea (if S0 = 0).
It is noted that Ec − Ea (cf. Table 1) and µ/G (it
is 0.0554 in our case) are both small. When Φ is small
the magnitude of |Lo| would be also small . In this case
eq.(12) leads to dLo ≈ 1/2, i.e., the period is a half of
the one of the normal ABO. In fact, (12) provides an
quantitative description of the variation of the period of
the FABO.
(vii) ABO region: When Φ becomes sufficiently large,
Lo will become very negative, the even segments will
disappear due to their lengths dLo ≤ 0 . We can de-
fine a critical odd integer Lcrit so that dLcrit−1 ≤ 0
while dLcrit+1 > 0, thereby the critical flux separating
the FABO and ABO region can be defined as
Φcrit = Φleft(Lcrit) (13)
Once Φ > Φcrit, Lo remains odd and the system keeps
polarized. Let IX be the largest even integer smaller
than −(G + 2(Ec − Ea))/µ. It turns out from eq.(12)
that Lcrit = IX + 1. With our parameters, Lcrit = −19
and accordingly Φcrit = 9.003 ( refer to Fig.1). Both
Lcrit and Φcrit depend on R very weakly, but sensitively
on the effective mass m∗.
In the ABO region (Φ > Φcrit), eqs.(10) to (12) do not
hold. Instead we have Φright = −(Lo − 1)/2, Φleft =
−(Lo + 1)/2, and dLo = 1. Thus the normal ABO re-
covers. Evaluated from (6), the magnitude of current is
from −g/2π to g/2π (for a comparison, it is from −g/4π
to g/4π for 1-e rings). From (9) the photon energies ~ω+
is from ∆c −G/2 to ∆c + 3G/2, at the same time ~ω−
is from ∆c + 3G/2 to ∆c −G/2.
(viii) Relations between the photon energies and other
physical quantities : Due to (7), the emitted (absorbed)
dipole photon has only two frequencies , therefore it is
meaningful to define ∆~ω = ~(ω+ − ω−). Directly from
(9) and (6), we have
∆~ω = hJo (14)
where h is the Planck’s constant and Jo is the persis-
tent current of the GS. To compare with 1-e rings, the
latter has ∆~ω = 2hJoĖq.(14) demonstrates that the os-
cillation of ∆~ω and the oscillation of Jo are matched
with each other exactly, they keep strictly proportional
to each other during the variation of Φ.
The maxima of ∆~ω measured in the ABO and FABO
regions, respectively, read
(∆~ω)
max = 2G (15)
(∆~ω)
max = 2G(Lo + 2Φright) (16)
Obviously, (15) provides a way to determine G, m∗can
be thereby obtained. (16) can be rewritten as
Ec−Ea = (G−µLo)/2−(2G−µ)/(4G)(∆~ω)
max (17)
This equation can be used to determine Ec −Ea. Fur-
thermore, we define
Γ~ω = ~(ω+ + ω−) = G+ 2∆α (18)
Once G has been known, (18) can be used to determine
Eb−Ea andEd−Ec. Since the spectrum can be generated
from the internal energies via (3), the evolutions of the
spectrum and the persistent current against Φ can be
understood simply by measuring the photon energies.
0.0 0.2 0.4 0.6 0.8 1.0
-0.15
-0.10
-0.05
B (Tesla)
50 to 120nm
FIG. 4: Evolution of hJo(solid line) and ∆~ω (dotted line)
against B for a 2-e ring with ra = 50 and rb = 120nm
(ix) Effect of the width: We now consider a two-
dimensional model in which the two electrons are strictly
confined in an annular region by a potential U(r), which
is zero if ra < r < rb or is infinite otherwise. Under
this model we have performed numerical calculation to
obtain ∆~ω and hJo, where Jo is now the total angular
current inside the ring (from ra to rb). The result is
shown in Fig.4 where ra = 50 and rb = 120 are assumed,
and the two quantities are slightly different from each
other. However, when the width becomes smaller, say
rb − ra < 30, the two curves overlap. Thus (14) works
not only for one-dimensional but also for two-dimensional
narrow rings. Let us define r = ~/
m∗(∆~ω)ABmax.
For one-dimensional rings and from (15), we have r= R,
where R is the radius of the ring. For two-dimensional
rings, it was found from our numerical calculation that
r ≈ (rb + ra)/2 if rb − ra < 30. E.g., when rb = 100
and ra = 70, r =85.03. When rb = 100 and ra = 90, r
=95.00. Thus (15) works also well for two-dimensional
narrow rings if the R in G is replaced by the average
radius.
It is noted that the band-structure and related optical
properties of 2-e rings have already been studied in de-
tail by Wendler and coauthors18. They classify the eigen-
states according to their radial motion, relative angular
motion, and collective rotation. In our paper the relative
angular motion is further classified into four types ac-
cording to the inherent nodal structures and periodicity
of their wave functions, i.e., according to whether the DB
shape is allowed and whether the wave function is contin-
uous at ϕ = 2π. The DB-accessibility turns out to be im-
portant because it affects the eigenenergies decisively. In
fact, the classification of states based on inherent nodal
structures was found to be crucial in atomic physics,19
this would be also true in two-dimensional systems. Fur-
thermore, the rule of selection for the dipole transition
has been proposed in ref.[18]. In our paper, an addi-
tional rule (namely,eq.(7)) is further proposed based on
the possible transition of internal structures. This rule
would affect the dipole spectrum seriously because the
emission (absorption) is thereby concentrated into two
frequencies. The difference of these two frequencies turns
out to be proportional to the persistent current. There-
fore the measurement of this difference can be used to
determine the magnitude of the current.
In summary, we have studied the FABO both analyt-
ically and numerically. The analytical formalism pro-
vides not only a base for qualitative understanding, but
also provides a number of formulae for quantitative de-
scription. The domain of Φ is divided into segments,
each corresponds to a Lo. This division describes ex-
actly how Lo would transit against |Phi, which causes
directly the FABO. Thereby the variation of the period
and amplitude of the oscillation of the GS energy, persis-
tent current, and the frequencies of dipole transition in
the FABO region can be described exactly. A number of
equalities to relate the physical quantities and dynamical
parameters have been found. In particular, a new oscilla-
tion, namely, the oscillation of ∆~ω was found to match
exactly the oscillation of Jo. Since the photon energies
can be more accurately measured, other observables and
parameters can be thereby determined via the equali-
ties. Since the separability of the Hamiltonian and the
existence of inherent nodes are common, the above de-
scription can be more or less generalized to N−electron
rings, this deserves to be further studied.
Acknowledgment, This work is supported by the
NSFC of China under the grants 10574163 and 90306016.
.REFERENCES
1, A. Lorke, R.J. Luyken, A.O. Govorov, J.P. Kot-
thaus, J.M. Garcia, and P.M. Petroff, Phys. Rev. Lett.
84, 2223 (2000).
2, U.F. Keyser, C. Fühner, S. Borck, R.J. Haug, M.
Bichler, G. Abstreiter, and W. Wegscheider, Phys. Rev.
Lett. 90, 196601 (2003)
3, D. Mailly, C. Chapelier, and A. Benoit, Phys. Rev.
Lett. 70, 2020 (1993)
4, A. Fuhrer, S. Lüscher, T. Ihn, T. Heinzel, K. Ensslin,
W. Wegscheider, and M. Bichler, Nature (London) 413,
822 (2001)
5, S. Viefers, P. Koskinen, P.Singha Deo, M. Manninen,
Physica E 21, 1(2004).
6, K. Niemelä, P. Pietiläinen, P. Hyvönen, and T.
Chakraborty, Europhys. Lett. 36, 533 (1996)
7, M. Korkusinski, P. Hawrylak, and M. Bayer, Phys.
Stat. Sol. B 234, 273 (2002)
8, Z. Barticevic, G. Fuster, and M. Pacheco, Phys.
Rev. B 65, 193307 (2002)
9, M. Ferconi and G.Vignale, Phys. Rev. B 50, 14722
(1994).
10, Li. Serra, M. Barranco, A. Emperador, M. Pi, and
E. Lipparini, Phys. Rev. B 59, 15290 (1999)
11 A. Emperador, F. Pederiva, and E. Lipparini,
Phys. Rev. B 68, 115312 (2003)
12, C.G. Bao, G.M. Huang, Y.M. Liu, Phys. Rev. B
72, 195310 (2005)
13, A.E. Hansen, A. Kristensen, S. Pedersen, C.B.
Sorensen, and P.E. Lindelof, Physica E (Amsterdam)
12,770 (2002).
14, K. Moulopoulos and M. Constantinou, Phys. Rev.
B. 70, 235327 (2004)
15,J. Planelles, J.I. Climente, and J.L. Movilla,
arXiv:cond-mat/0506691 (2005)
16, J.I. Climente and J. Planelles, Phys. Rev. B 72,
155322 (2005)
17, A.O. Govorov, S.E. Ulloa, K. Karrai, and R.J. War-
burton, Phys. Rev. B 66, 081309 (2002)
18, L. Wendler, V.M. Fomin, A.V. Chaplik, and A.O.
Govorov, Phys. Rev. B 54, 4794 (1996).
19, M.D. Poulsen and L.B. Madsen, Phys. Rev. A 72,
042501 (2005).
http://arxiv.org/abs/cond-mat/0506691
ABSTRACT
  The fractional Aharonov-Bohm oscillation (FABO) of narrow quantum rings with
two electrons has been studied and has been explained in an analytical way, the
evolution of the period and amplitudes against the magnetic field can be
exactly described. Furthermore, the dipole transition of the ground state was
found to have essentially two frequencies, their difference appears as an
oscillation matching the oscillation of the persistent current exactly. A
number of equalities relating the observables and dynamical parameters have
been found.

<|endoftext|><|startoftext|>
Introduction
Being rare or ‘exotic’ is a relative phenomenon. From a Samoan point of view Burushaski is an 
extremely exotic language, but from the point of view of Telugu much less so. In this brief note we 
want to look a how different and how similar languages turn out to be in pairwise comparisons and 
the role that genealogical relatedness plays in this regard. We are interested in knowing whether 
there is a cut-off point Shigh in the amount of similarities such that we can be sure that language pairs 
that  have more than Shigh similarities are all generally thought to be related and also whether there 
is a cut-of point Slow at the other end of the scale such that all languages having less similarities than 
Slow are thought to be unrelated. In other words, if a language is ‘normal’ relative to some other 
language (as Burushaski is to Telugu), does this imply that the two languages are related according 
to commonly accepted classifications? Or, if two languages are mutually very exotic (as Burushaski 
and Samoan), does this imply that they are not thought to be related in commonly accepted 
classifications? 
 The data we use, as well as the genealogical classification, are from the World Atlas of 
Language Structures (Haspelmath et al., ed., henceforth WALS). The conclusions must of course be 
seen in relation to this particular dataset. Thus, when we observe a certain amount of typological 
similarity between two languages, this is strictly and only similarity in terms of the kinds of features 
investigated in WALS. The dataset includes 134 nonredundant features, each of which has from two 
to nine discrete values. All of these are quite generic typological features. Our conclusions are also 
limited to the amount of data available. We have required that for any language pair in our sample 
there should be 45 or more features attested for both members of the pair (a motivation for this 
precise number follows shortly). This has limited our sample to 320 languages and 29,810 pairs of 
languages compared. Among these pairs, there are 1,099 which are related according to the 
                                                           
 We would like to thank Bernard Comrie, Cecil Brown, and Dietrich Stauffer for comments on this manuscript. 
classification used in WALS. Henceforth we substitute ‘related’ for the more cumbersome ‘related 
according to the WALS classification’. We follow this classification because it seeks to meet a 
consensus view. 
1. Results 
Figure 1 presents the overall results of the investigation. As can be seen, the more similar languages 
get, the greater the probability is that they are related. The figures on which the curve is based are 
presented in Table 1. Percent similarity was defined as the percentage of attested features for which 
both languages have the same value. We have binned language pairs in 5% intervals from 10% to 
90% similarity. For the plot in Figure 1 the mean percent similarity in each interval was used. Table 
1 gives some additional information: it also shows how many language pairs belong in each 
interval. This is important for the interpretation of the results, as we shall see shortly. 
Before giving our interpretation let us explain why we have chosen the criterion that 
language pairs should have 45 or more features attested for both languages. It turns out that for a 
criterion of 30 or more features the curve is rather similar but not quite as steep, showing less 
dependence between the amount of similarity and the probability of finding related pairs. This 
indicates that the fewer features one operates with, the more prominent is random sampling 
variability in percent similarity. When operating with a criterion of 60 or more attested features the 
curve becomes uneven, indicating that the higher criterion passes too few pairs for stable results. 
This becomes even more pronounced when the criterion is 75 or more features. Obviously, with a 
more extended database the number of features taken to be criterial could be raised, but 45 is a 
number that suits the data available in WALS.  
Figure 1. The probability of finding related languages 
Probability of finding related languages
0 10 20 30 40 50 60 70 80 90 100
% similarity
Table 1. Data. (%SIM = % typological similarity between members of pairs; %REL = % of 
language pairs that are related; PAIRS = number of language pairs in range) 
%SIM %REL PAIRS 
10.0-14.9 0 11 
15.0-19.9 0 91 
20.0-24.9 0 443 
25.0-29.9 0.26 1566 
30.0-34.9 0.33 3904 
35.0-39.9 0.4 6019 
40.0-44.9 1.2 6772 
45.0-49.9 3.26 4873 
50.0-54.9 6.68 3520 
55.0-59.9 15.41 1551 
60.0-64.9 23.72 666 
65.0-69.9 38.24 238 
70.0-74.9 54.26 94 
75.0-79.9 61.54 39 
80.0-84.9 85 20 
85.0-89.9 100 2 
90.0-94.9 100 1 
 It may be of interest to mention the language pairs that fall in the lower and upper 
ranges of the percentage of shared values. Collectors of linguistic trivia may find it interesting that 
the members of the most divergent language pair in the world (in our dataset), i.e. Tümpisa 
Shoshone and Wari’, are found in the same general area, namely the Americas, that someone who is 
tired of Romance linguistics should turn to Nivkh and someone fed up with Swedish should visit 
the Koasatis when looking for something as radically different as it gets. Lists of the 20 most 
divergent language pairs and the 20 most similar ones are provided in tables 2 and 3. 
Table 2. The 20 most divergent language pairs in the sample 
Language A Language B Number of 
features 
compared 
% Similarity 
Tümpisa 
Shoshone   
Wari'              48 10.4 
Archi              Tukang Besi        46 13 
Maybrat            Limbu              45 13.3 
Italian            Nivkh              51 13.7 
Burushaski         Samoan             49 14.3 
Tzutujil           Burmese            49 14.3 
Ju|'hoan           Yup'ik (Central)   56 14.3 
Maybrat            Tamil              55 14.5 
Nubian 
(Dongolese) 
Acehnese           48 14.6 
Swedish            Koasati            47 14.9 
Klamath            Wari'              47 14.9 
Kongo              Ladakhi            46 15.2 
Bashkir            Maori              46 15.2 
Berber (Middle 
Atlas) 
Waorani            45 15.6 
Lango              Archi              45 15.6 
Archi              Thai               45 15.6 
Thai               Retuarã            45 15.6 
Ijo (Kolokuma)     Kutenai            50 16 
Kongo              Evenki             56 16.1 
Arabic (Egyptian)  Tümpisa 
Shoshone   
48 16.7 
Table 3. The 20 most similar language pairs in the sample 
Language A Language B Relatedness Number of 
features 
compared 
% Similarity 
Lango              Luo                same genus 46 80.4 
Luvale             Zulu               same genus 97 80.4 
Khmer              Vietnamese         same family, 
different genera 
89 80.9 
Vietnamese         Thai               different families 110 80.9 
Khalkha            Tuvan              same family, 
different genera 
48 81.3 
Lithuanian         Russian            same family, 
different genera 
64 81.3 
Greek (Modern)     Bulgarian          same family, 
different genera 
64 81.3 
Khmer              Thai               different families 91 81.3 
Polish             Russian            same genus 71 81.7 
Russian            Serbian-Croatian   same genus 45 82.2 
Swahili            Zulu               same genus 107 82.2 
Dagur              Turkish            same family, 
different genera 
46 82.6 
Telugu             Kannada            same family, 
different genera 
47 83 
Kongo              Nkore-Kiga         same genus 48 83.3 
Dutch              German             same genus 56 83.9 
Italian            Spanish            same genus 63 84.1 
Drehu              Iaai               same genus 46 84.8 
English            Swedish            same genus 60 85 
French             Italian            same genus 64 85.9 
Hindi              Panjabi            same genus 49 91.8 
While Table 2 does not point in any specific direction and remains a curiosity, Table 3 provides 
fragments of information which fits into the larger picture that emerges from our study. We note 
that two pairs of unrelated languages, Vietnamese-Thai and Khmer-Thai, turn up in this list, which 
otherwise consists of genealogically unrelated language pairs. Furthermore, the rest of the pairs 
represent a mixture of languages related to different degrees (see Dryer 1992, 2005 for a definition 
of ‘genera’). 
 Returning to Figure 1 and the associated data in Table 1 let us proceed to overall 
interpretations. We set out asking whether there is some degree of similarity in typological profiles 
beyond which it is certain that languages are related. The answer is positive, but nevertheless 
discouraging. Members of language pairs in the sample that are 81.5% or more similar are all 
related. But only 12 pairs of languages are that similar, in spite of the fact that there are 1099 pairs 
of related languages in the sample! On the other hand, if there are less than 25% shared feature 
values all language pairs will be unrelated, and this goes for 545 pairs in the sample. If one allows 
for a very small margin of error (around 1%), it can predicted that less than 40% shared feature 
values implies unrelatedness. That goes for 12,034 language pairs in the sample—close to half of 
the total of 29,810. Thus, lack of similarity is a good predictor of unrelatedness, but presence of 
similarity is a bad predictor of relatedness. 
2. Are there ways of improving the results? 
We next consider the question of whether the prediction of relatedness could be improved 
somehow. In other studies (Holman et al. 2006a,b, Brown et al. 2006) we have made exact 
quantitative explorations of the relationship between typological similarity and geographical 
distance among languages. Not surprisingly, the greater the geographical proximity is between 
languages, the more similar they tend to be (this goes for both related and unrelated languages). If 
one takes into account the areal factor, this might move the cut-off point to allow more accurate 
predictions of relatedness. Testing this strategy was unsuccessful. We were not able to obtain 
markedly different results by adjusting the measure of similarity relative to geographical distance: 
the correlation between adjusted and unadjusted measures was 0.96. The reason for this is probably 
that the distance measure, as given in the WALS database, identifies the location of a given 
language (roughly) with its center of extension. This means that some neighbouring languages, such 
as German and Dutch, are treated as having a certain distance between them when in reality they 
don’t have any. The more widespread the languages compared are, the bigger this problem gets. 
Since it is impossible to provide adequate measure of geographical distances for 29,810 language 
pairs, and not just take recourse to a mechanical measure of distance from one WALS dot to 
another, it is not viable to improve on the cut-off point in such a way. 
 Also, the 134 features differ appreciably in the distribution of rarity and commonness 
among their values.  It is possible to imagine that taking into account the relative rarity of feature 
values might improve the predictions. We again failed to obtain markedly different results by 
adjusting the measure of similarity relative to differences among features: the correlation between 
adjusted and unadjusted measures was 0.98. The probable reason is that differences among features 
tend to average out in a sample of at least 45 attested features. 
   Another strategy to try to improve the power of prediction concerning relatedness 
would be to weight different features or values of features according to their stability. We have 
explored ways of measuring stability and have come out with a ranked order of stability for WALS 
features (Wichmann et al. 2006). Conceivably, if the features shared among languages were 
weighted for their stability the cut-off point could be pushed a bit. We expect, however, that the 
results would be similar to the results for taking into account rarity, since stable and unstable 
features would also average out. 
A final strategy to improve the results would be to take into account the areality of 
features. The linguistic typological literature abounds with statements concerning the susceptibility 
to diffusion of certain features as opposed to others. In practice, however, it turns out to be virtually 
impossible to define areas and measure areality in a consistent way. A major contribution of WALS 
has been to show that most typological features are ‘areal’ to various extents. Browsing the maps 
will make it clear to anyone that almost any feature can spread and that whatever features diffuse 
are the features that happen to exist in an area. Thus, ‘areality’ is not amenable to quantification in 
any straightforward way. 
3. Deviant language pairs 
The results reported on in Figure 1 and Table 1 show that there are a few pairs of languages which 
are related even though showing less than 40% similarities, which is the point where pairs tend 
overwhelmingly not to be related. It serves the record to provide a list of the pairs of related 
languages that are deviant in the sense that they show less similarities than related languages 
normally do. This list is provided in Table 4. 
Table 4. Related languages that have unusually different typological profiles (less than 40% 
similarities) 
Language A Language B Language family Number of 
features 
compared 
% Similarity 
Luvale             Ijo (Kolokuma)     Niger-Congo 52 28.8 
Zulu               Ijo (Kolokuma)     Niger-Congo 52 28.8 
Maidu 
(Northeast)  
Tsimshian 
(Coast)  
Penutian 48 29.2 
Ngiti              Koyra Chiini       Nilo-Saharan 47 29.8 
Yoruba             Ijo (Kolokuma)     Niger-Congo 51 31.4 
Mundari            Semelai            Austro-Asiatic 66 31.8 
Swahili            Ijo (Kolokuma)     Niger-Congo 50 32 
Maung              Yidiny             Australian 81 32.1 
Mundari            Khmer              Austro-Asiatic 78 32.1 
Koyraboro Senni    Murle              Nilo-Saharan 65 32.3 
Koromfe            Ijo (Kolokuma)     Niger-Congo 49 32.7 
Beja               Margi              Afro-Asiatic 45 33.3 
Sango              Ijo (Kolokuma)     Niger-Congo 51 33.3 
Nandi              Koyraboro Senni    Nilo-Saharan 47 34 
Nandi              Koyra Chiini       Nilo-Saharan 52 34.6 
Marathi            Spanish            Indo-European 52 34.6 
Margi              Amharic            Afro-Asiatic 49 34.7 
Mundari            Vietnamese         Austro-Asiatic 88 35.2 
Garo               Cantonese          Sino-Tibetan 51 35.3 
Berber (Middle 
Atlas) 
Kera               Afro-Asiatic 65 35.4 
Irish              Marathi            Indo-European 45 35.6 
Paamese            Acehnese           Austronesian 45 35.6 
Limbu              Mandarin           Sino-Tibetan 45 35.6 
Mandarin           Bawm               Sino-Tibetan 76 36.8 
Ijo (Kolokuma)     Diola-Fogny        Niger-Congo 46 37 
Ngiti              Nubian 
(Dongolese) 
Nilo-Saharan 54 37 
Miwok (Southern 
Sierra) 
Tsimshian 
(Coast)  
Penutian 62 37.1 
Mundari            Khmu'              Austro-Asiatic 70 37.1 
Bagirmi            Nubian 
(Dongolese) 
Nilo-Saharan 64 37.5 
Beja               Hausa              Afro-Asiatic 82 37.8 
Koromfe            Kisi               Niger-Congo 45 37.8 
Yidiny             Tiwi               Australian 90 37.8 
Limbu              Meithei            Sino-Tibetan 45 37.8 
Kera               Amharic            Afro-Asiatic 50 38 
Zulu               Yoruba             Niger-Congo 104 38.5 
Beja               Kera               Afro-Asiatic 57 38.6 
Ngiyambaa          Maranungku         Australian 74 39.2 
Malagasy           Acehnese           Austronesian 56 39.3 
Ngiti              Nandi              Nilo-Saharan 48 39.6 
Lugbara            Lango              Nilo-Saharan 53 39.6 
Fur                Ngiti              Nilo-Saharan 58 39.7 
Experts in the different families involved will surely have good explanations for these deviant 
cases. In some cases a pair may in reality not belong to the same family, as in the case of large and 
not altogether uncontroversial families such as Australian and Nilo-Saharan. In other cases, such as 
the two pairs featuring Marathi, a wide separation both temporally and geographically and 
interaction with widely different types of languages may conspire to make a related pair stand out as 
unusually different. In any case, measuring the amount of typological similarity provides a clue that 
‘something is going on’—either the classification is potentially wrong or heavy language contact is 
involved. So the method of comparing typological profiles is potentially useful for someone 
wishing to probe into the behavior of different languages within a proposed family. 
4. Conclusions 
The results reported on in this note were, in part, unsurprising and, in part, unexpected. Figure 1 
showed a close correlation between relatedness and typological similarity. This is what we had 
expected. But we also expected to find some minimal amount of typological similarity among 
language pairs which would suffice to predict that two languages are related. It turned out to be the 
case, however, that the amount of similarity required to make this prediction is so high (81.5%) that 
only few language pairs qualify. In practice, this means that typological features such as those of 
WALS are not useful for identifying relatedness among languages when it comes to comparisons of 
single pairs (when groups of languages are compared the situation may be different, but this issue is 
beyond the scope of this paper). At the other end of the scale we found that typological dissimilarity 
is a good predictor of unrelatedness: with only a small margin of error one can predict that 
languages which have 60% or more differences are not related according to the WALS 
classification. Our finding that a certain amount of typological differences can be used to  predict 
that languages are not commonly believed to be related means that typological differences are a 
yardstick for gauging the limits of the traditional comparative method. 
 While it was was not surprising to find a correlation between relatedness and the 
amount of typological differences among language pairs, this finding may nevertheless steer us in 
new directions. Presumably there is a correlation between the amount of shared basic vocabulary 
and relatedness as well. If so, the amount of shared basic vocabulary and the amount of typological 
similarity should also be correlated, and it may even be possible to start considering whether there 
is such a thing as a ‘typological clock’ such that the time of separation of languages of a given 
family may be inferred from the amount of typological differences within the family. The fact that 
unrelated languages may be as similar typologically as related ones indicates that for a ‘typological 
clock’ to work reasonably well, several pairwise comparison should be made. How, in practice, this 
kind of methodology could be developed would be an item for future research. 
References 
Brown, Cecil H., Eric W. Holman, Christian Schulze, Dietrich Stauffer, and Søren Wichmann. 
2006. Are similarities among languages of the Americas due to diffusion or 
inheritance? An exploration of the WALS evidence. Paper presented at the conference 
“Genes and Languages”, University of California Santa Barbara, September 8-10, 
2006. 
Dryer, Matthew S. 1992. The Greenbergian word order correlations.  Language 68:81-138. 
Dryer, Matthew S. 2005. “Genealogical language list,” in The World Atlas of Language Structures, 
edited by Martin Haspelmath, Matthew S. Dryer, David Gil, and Bernard Comrie, pp. 
584-643.  Oxford: Oxford University Press. 
Haspelmath, Martin, Matthew S. Dryer, David Gil, and Bernard Comrie (eds.). 2005. The World 
Atlas of Language Structures. Oxford: Oxford University Press. 
Holman, Eric W., Dietrich Stauffer, Christian Schulze, and Søren Wichmann. 2006. On the relation 
between structural diversity and geographical distance among languages: observations 
and computer simulations. (Revised version under review for Linguistic Typology). 
Holman, Eric W., Søren Wichmann, and Cecil H. Brown. 2006. Linguistic and cultural diffusion in 
a comparative perspective. Submitted. 
Wichmann, Søren, Eric W. Holman, & Hans-Jörg Bibiko. 2006. How computer simulations may 
help linguists: recent progress and prospects for more. Paper presented at the 
conference “Language and physics”, Warsaw, September 11-15, 2006.
ABSTRACT
  No abstract given; compares pairs of languages from World Atlas of Language
Structures.

<|endoftext|><|startoftext|>
Introduction
Nonlinear partial differential equations (PDEs) play very important role in many
fields of mathematics, physics, chemistry, and biology, and numerous applica-
tions. If for nonlinear ordinary differential equations (ODEs) one can observe
incontestable progress in their automatic solving, the situation for nonlinear
PDEs seems as nearly hopeless one.
Despite the fact that various methods for solving nonlinear PDEs have been
developed in 19-20 centuries as the suitable groups of transformations, such as
point or contact transformations, differential substitutions, and Backlund trans-
formations etc., the most powerful method for explicit integration of second-
order nonlinear PDEs in two independent variables remains the method of Dar-
boux [1]-[4]. The original Darboux method (as already Darboux stated in [1])
is extendable in principle to equations of all orders in an arbitrary number of
independent variables, even to systems of equations; however, in [1]-[2] and sub-
sequent papers by many authors, the detailed calculations were performed only
http://arxiv.org/abs/0704.0072v1
for a single second-order equation with one dependent and two independent
variables.
The Darboux method was refined in recent years into more precise and effi-
cient (although not completely algorithmic) form [5]-[8] and references therein.
Nevertheless this approaches suffer from high complexity and necessitate to use
some tricks.
There were some partially successful attempts to extend modern variants of
the Darboux method based on Laplace cascade method on higher-order PDEs
and PDEs in the space of more than two independent variables [10]-[13] but
they suffer from high complexity too.
There is an original approach to the problem, based on the special type
of local change of variables which leads to the order reduction of initial PDE,
proposed in [14], which is suitable for high dimensions problems but of very
special class though.
In present paper we propose seemingly new method for finding solutions
of some types of nonlinear PDEs in closed form. The method is based on
decomposition of nonlinear operators on sequence of operators of lower orders. It
is shown that decomposition process can be done by iterative procedure(s), each
step of which is reduced to solution of some auxiliary PDEs system(s) for one
dependent variable. Moreover, we find on this way the explicit expression of the
first-order PDE(s) for first integral of decomposable initial PDE. Remarkably
that this first-order PDE is linear if initial PDE is linear in its highest derivatives.
The developed method is implemented in Maple procedure, which can really
solve many of different order PDEs with different number of independent vari-
ables. Examples of PDEs with calculated their general solutions demonstrate a
potential of the method for automatic solving of nonlinear PDEs.
2 Bases of the method
2.1 Decomposable PDEs
The simplest second-order non-linear PDE for w = w(t, x)
can be easily transformed to the following decomposed form
) = 0 , (2)
from which we can without difficulty obtain the general solution to PDE (1) in
two steps. First step gives us
d ln(G(x))
, (3)
where G(x) is an arbitrary function. And then, solving the equation (3) on the
second step, we obtain
w(t, x) = F (t)G(x) , (4)
where F (t) is one more arbitrary function.
The main observations on analyzing the grounds of solvability of the PDE
(1) by the above method are that
1. The PDE (1) is ”decomposable”, i.e., it can be represented as a composi-
tion of successive differential operators of type (5) (not necessarily linear). It is
clear that such type of decomposition can be done for some PDEs of any order
and with any number of independent variables in the following manner
D1(w) = u1 ,
D2(u1) = u2 ,
. . . . . . , (5)
Dn(un−1) = 0 ,
where ~x = (x1, . . . , xm), w = w(~x), ui = ui(~x) and
Di(u) = Vi(~x, u,
, . . . ,
Assuming that Vi are arbitrary functions, and eliminating ui by successive sub-
stitutions in system (5), we get a family of PDEs for w of nth order
Dn(Dn−1(. . . D1(w) . . . )) = 0 . (6)
which are ”decomposable” and in principle their solutions general or particular
can be obtained by integration of split system (5). The PDE (6) is nonlinear if
at least one of the operators Di is nonlinear. Not all PDEs admit such repre-
sentation. And in positive cases such representation is not unique in general.
Note that as a matter of fact Di need not be the first-order differential
operators. So the composition procedure for nth order PDE, when n > 2 can
be as follows
(w) = u ,
(u) = 0 , (7)
where n1, n2 are integers and n1 + n2 = n, w = w(~x), u = u(~x), and (k ≤ j)
i (u) = Vi(~x, u,
, . . . ,
. . . ∂x
m |k1+···+km=k≤j
, . . . ,
The late representation allows us to carry out the PDE‘s decomposition or
order reduction gradually bit by bit.
We have to stress here that in general representations (5) and (7) may have
different meaning. For example, some PDEs do not admit representation (5)
but permit the form (7) with both solvable DEs.
2. Each step of the solving process for decomposed PDE is faced with the
necessity to solve differential equation Di(ui−1) = ui (or D
i (ui−1) = ui), so all
such DEs must be solvable. Note that only first step Dn(un−1) = 0 is free from
arbitrary functions.
So one of the PDEs solving strategies may be as follows. First of all we
try to decompose given PDE. In order to do so we have to solve corresponding
auxiliary nonlinear PDE system for unknown functions Vi, it is sufficient to
find a particular solution here. And, if it is successful, then, deciding between
the variants, try to solve each arising DE from the chain (5). Main obstacle
here, beginning at the second step is just mentioned necessity to solve DEs with
arbitrary functions. There are sufficiently narrow circle of solvable (in sense of
the general solutions) DEs with an arbitrary function as a parameter.
Another (classification) approach can be based on the usage of only solvable
DEs. That is, we can form a composition of successive solvable differential
operators and as a result obtain a families of solvable PDEs. Such a way leads to
extensive nontrivial families for different types of nonlinear PDEs which general
solutions can be expressed in closed form. But on this way we encounter a
difficulty to circumscribe such families integrally and are forced to consider
particular subfamilies. Nevertheless it yields extensive field of PDEs for methods
testing [15].
2.2 Decomposition algorithm for decomposable PDEs
For nth order PDE, when n > 2 there are some slightly different approaches
which are dictated by goals of the problem. If the goal is to decompose given
nonlinear operator then we have to use the scheme (7) with n1 = 1, n2 = n− 1.
And conversely we have to use the scheme (7) with n1 = n − 1, n2 = 1 if the
goal is to solve given PDE. The last procedure in some features resembles the
well-known technics of reducing ODEs order, e.g., by first integral method. Of
course, it is possible to use intermediate cases.
All above cases can be treated by the same way as we consider below but
each of them leads to auxiliary PDEs systems of different order, viz n2+1, with
corresponding calculation complexity.
In sequel we will consider for shortness only the case with n1 = n−1, n2 = 1,
as more practical for PDEs solving.
Let us consider the decomposition of type (7) with Dn−1
(w) as a solution
of the following equation with respect of u = u(~x)
J(u, ~x, w,
, . . . ,
. . . ∂x
m |k1+···+km=k≤n−1
, . . . ,
∂n−1w
) = 0 (8)
D2(u) = V (~x, u,
, . . . ,
) . (9)
If substitute u = Dn−1
(w) into (9) we obtain decomposable n-th order PDE
V (~x, U0, Ux1 , . . . , Uxm) = 0 , (10)
where (we use below the following notation w = W0 and
...∂x
= Wk1,...,km)
(w) = U0 , (11)
∂Wk1,...,km
,...,k∗
= Uxi (i = 1, . . . ,m) , (12)
where k∗j = kj + 1 if j = i and k
j = kj otherwise, and it is supposed that
differentiations in sum are carried out on all indexed W ‘s which are involved in
Here we can introduce U0 and Ux1 , . . . , Uxm as new independent variables
if express m variables from the set {Wk1,...,km} with k1 + · · ·+ km = n using
linear system (12).
Assuming that given PDE of order n
F (~x, w,
, . . . ,
. . . ∂x
m |k1+···+km=k≤n
, . . . ,
) = 0 (13)
is decomposable, we receive, that after substitution of the new variables, left-
hand side of given PDE must turn into (10) with some V .
Left-hand side of given PDE expressed in new variables is the first-order
differential expression with respect to
J(U0, ~x,W0,W1,0,...,0, . . . ,Wk1,...,km |k1+···+km=k≤n−1, . . . ,W0,0,...,n−1)
and must not depend on all indexed W ‘s, that is derivatives of F expressed in
new variables with respect to all indexed W ‘s are equal to zero. Sequence of
such derivatives of F equated to zero form a second-order PDE system for J .
So a solution (particular as well) the PDE system gives possible expression of
differential operator Dn−1
(w) through (8) and differential operator D2(u) by
substituting the solution of J into left-hand side of given PDE expressed in new
variables.
Of course, there are problems where a operator decomposition is required
only. But in most cases obtained decomposition is intended for finding solutions
for given PDE. If in obtained decomposition the corresponding PDE D2(u) = 0
is solvable, then substitution of obtained u into J expressed in original variables
gives us a first integral (see its definition in the next subsection) of given PDE.
It is easy to see that for decomposable PDEs the first integral is a differential
equation, so we can try to solve it or to find a first integral for this new DE (or
decompose it) by the scheme described above until we come to the first-order
Remarkably that in the approach under consideration the finding of first
integrals can be done more directly and effectively.
2.3 Differential equation for first integral of decomposable
The first integral I of the PDE is an expression, involving one arbitrary func-
tion, which is equivalent in some sense to the given PDE. The first integral
vanishes on the set of solutions of given PDE. And (in accordance with [4]) all
differential consequences of the equation I = 0 coincide with respective differen-
tial consequences of given PDE (e.g., elimination of the arbitrary function leads
to the given PDE).
Our goal here is to find PDE for first integral of a decomposable PDE. To
do so we first of all have to take into account that u(~x) is the solution of the
corresponding PDE
V (~x, u,
, . . . ,
) = 0 ,
so u(~x) depends only on ~x but in no way on indexed W ‘s. Secondly, the depen-
dent variable in this case, namely
J(u(~x), ~x,W0,W1,0,...,0, . . . ,Wk1,...,km |k1+···+km=k≤n−1, . . . ,W0,0,...,n−1)
of given PDE (13) expressed in new variables do not to depend on Ux1 , . . . , Uxm
and is a first integral of given PDE.
If now consider u(~x) as an unknown function, we can denote the first integral
I(~x,W0,W1,0,...,0, . . . ,Wk1,...,km |k1+···+km=k≤n−1, . . . ,W0,0,...,n−1) =
J(u(~x), ~x,W0,W1,0,...,0, . . . ,Wk1,...,km |k1+···+km=k≤n−1, . . . ,W0,0,...,n−1)
and instead of (12) in the form
∂Wk1,...,km
,...,k∗
= −Uxi
(i = 1, . . . ,m)
we arrive to the following system
∂Wk1,...,km
,...,k∗
= 0 (i = 1, . . . ,m) . (14)
If express m variables from the set {Wk1,...,km} with k1 + · · ·+ km = n (at
least one of which is actual for given PDE - note that there are some variants
here as a rule, so we can obtain some consistent PDEs on this stage) using linear
system (14) and substitute them into given PDE (13) we receive a first-order
(even linear if PDE (13) is linear in its highest derivatives) PDE with respect to
first integral I. And it remains only to solve this PDE(s) to find a first integral
of given PDE.
Note, given PDE is decomposable iff exists a solution of such first-order
PDE(s).
3 Examples
To facilitate necessary calculations in the process of finding first integrals I
have implemented above described method in prototype of Maple procedure
reduce PDE order (see Appendix). The input data of the procedure are given
PDE of any order and dependent variable of the PDE with any number of
independent variables. The procedure tries to find first integral(s) of the input
linear or nonlinear PDE.
The Maple built-in procedure pdsolve is used inside my procedure to solve
the first-order PDE for first integral. As different Maple versions have different
PDE solving abilities so the output depends on Maple version. In the following
examples I refer to Maple 11.
The procedure reduce PDE order is able to find first integrals for many
known and unknown linear and nonlinear PDEs. Here we give examples of
PDEs for which it is possible to find finally their general solutions. More exam-
ples one can find in collection of solvable nonlinear PDEs [15].
3.1 Second-order PDE with two independent variables
For PDE (w = w(t, x))
−kw− bc
= 0 (15)
with a 6= 0 and 4ak − b2 6= 0 the procedure reduce PDE order outputs the
following first integral
I = F1
4ak − b2 − 2 arctan
c+ 2a∂w
4ak − b2
4ak − b2
with arbitrary function F1.
The ODE I = 0 can be solved and one obtains (after some hand simplifica-
tions and edition) the following general solution to (15)
w(t, x) =
exp(t
b2 − 4ak)F (x)(b +
b2 − 4ak)−
b2 − 4ak + b
1 + exp(t
b2 − 4ak)F (x)
G(t)} exp
exp(t
b2 − 4ak)F (x)(b +
b2 − 4ak)−
b2 − 4ak + b
1 + exp(t
b2 − 4ak)F (x)
where F (x) and G(t) are arbitrary functions.
3.2 Second-order PDE with four independent variables
For PDE
∂x1∂x4
∂x2∂x4
∂x3∂x4
+ C0 +B1
C1(A1
+B1w +B0)+
C2(A1
+B1w +B0)
2 = 0 , (16)
where w = w(x1, x2, x3, x4) and Ai, Bi, Ci are constants, the procedure re-
duce PDE order outputs the following first integral
I = F1
x1, x2, x3, x4 +
2 arctan
2C2(A1
+B1w +B0) + C1
4C0C2 − C21
4C0C2 − C21
with arbitrary function F1.
The PDE I = 0 can be solved and one obtains the following general solution
to (16)
w(x1, x2, x3, x4) =
2A1C2
exp(−B1x1
)(2B0C2 + C1 + tan[
4C0C2 − C21
+G(ξ, (A2ξ +A1x2 −A2x1), (A3ξ +A1x3 −A3x1))]
4C0C2 − C21 )dξ
+ exp(−B1x1
)F [(A1x2 − A2x1), (A1x3 −A3x1), x4] ,
where F (t1, t2, t3) and G(t1, t2, t3) are arbitrary functions, c is arbitrary con-
stant.
3.3 Third order PDE with two independent variables
For PDE (w = w(t, x))
∂t∂x2
− 2w ∂
− w∂w
− aw3 = 0 (17)
the procedure reduce PDE order outputs the following first integrals
I1 = F1
− axw2), 1
ax2w2 + 2w(
− x ∂
) + 2x
I2 = F1
− atw2 −
with arbitrary function F1.
We can form some PDEs from I1 and to solve them we can repeat the process
of order reduction with the procedure reduce PDE order. The ODE I2 = 0 can
be solved directly and one obtains in any way the following general solution to
w(t, x) = F (t) exp
− xH(t) + x
G(x)dx −
xG(x)dx
where F (t), H(t) and G(x) are arbitrary functions.
3.4 Fourth order PDE with two independent variables
For PDE (w = w(t, x))
∂t2∂x2
− 2w2
∂t2∂x
∂t∂x2
− 2∂w
= 0 (18)
the procedure reduce PDE order outputs the following first integrals
I1 = F1(t,
∂t2∂x
− 2w∂w
− x ∂
∂t2∂x
w − 2x∂w
I2 = F1(x,
∂t∂x2
w + 2
− t ∂
∂t∂x2
w − 2t∂w
with arbitrary function F1.
The wealth of first integrals here allows us to operate with them in many
different ways. Apart from aforesaid subsequent order reduction we can, for
example, from
∂t∂x2
− 2w∂w
w + 2
= F (x)
∂t∂x2
] = G(x) ,
where F (x) and G(x) are arbitrary functions, algebraically eliminate mixed
derivative and obtain the following ODE
+ [tF (x)−G(x)]w2 = 0 ,
which gives the general solution to (18)
w(t, x) = H(t) exp
xF (x) dx − tx
F (x) dx+
G(x) dx −
xG(x) dx + xK(t)
where F (x), H(t), G(x) and K(t) are arbitrary functions.
4 Conclusion
The method have considered above is efficient enough for solving decomposable
PDEs of relatively high order with many independent variables. The main
limitation here is concerned with abilities to solve corresponding auxiliary first-
order PDEs for first integrals.
An adaptability of the method to PDEs which are not decomposable but
which general solutions can be expressed in closed form remains unsolved yet.
But it can be shown on examples that there are some ways to extend the method
for some types of such PDEs. These approaches deserve further thorough study
in another publication.
References
[1] G. Darboux, Sur les equations aux derivees partieles du second ordre. Ann.
Sci. Ecole Norm. Sup. 1870, v. 7, pp. 163-173.
[2] G. Darboux, Lecons sur la theorie generale des surfaces. v.II. Paris: Her-
mann, 1915.
[3] E. Goursat, Lecons sur l’integration des equations aux derivees partieles
du second ordre a deux variables independantes. V.I,II. Paris: Hermann,
1896, 1898.
[4] A.R. Forsyth, Theory of differential equations. Part 4. Partial differential
equations, vol. 6, Dover Press, New York, 1959.
[5] M. Juras, Generalized Laplace invariants and classical integration meth-
ods for second-order scalar hyperbolic partial differential equations in the
plane, Differential Geometry and Applications: Proc., Conf. Brno (Czech
Republic), 28 Aug.-1 Sept. 1995, Brno: Masaric Univ., 1966, pp. 275-284.
[6] M. Juras, Geometric aspects of second-order scalar hyperbolic partial dif-
ferential equations in the plane, Ph.D. thesis, 1997, Utah State University,
[7] V.V. Sokolov, A.V. Zhiber, On the Darboux integrable hyperbolic equa-
tions. Phys. Lett. A, v. 208, pp. 303-308, 1995.
[8] A.V. Zhiber, V.V. Sokolov, Exact integrable Liouville type hyperbolic equa-
tions [in Russian], Uspekhi Mat. Nauk, Vol. 56, No. 1, pp. 64-104, 2001.
[9] S.P.Tsarev, On Darboux integrable nonlinear partial differential equations,
Proc. Steklov Institute of Mathematics, v. 225, p. 372-381, 1999.
[10] J. Le Roux. Extensions de la methode de Laplace aux equations lin-
eaires aux derivees partielles dordre superieur au second. Bull. Soc. Math.
de France, v. 27, p. 237262, 1899. A digitized copy is obtainable from
http://www.numdam.org/
[11] U. Dini, Sopra una classe di equazioni a derivate parziali di secondordine
con un numero qualunque di variabili. Atti Acc. Naz. dei Lincei. Mem.
Classe fis., mat., nat. (ser. 5) v. 4, 1901, p. 121178. Also Opere v. III, p.
489566.
[12] U. Dini, Sopra una classe di equazioni a derivate parziali di secondordine.
Atti Acc. Naz. dei Lincei. Mem. Classe fis., mat., nat. (ser. 5) v. 4, 1902,
p. 431467. Also Opere v. III, p. 613660.
[13] S.P. Tsarev, On factorization and solution of multidimensional linear par-
tial differential equations. http://arxiv.org/abs/cs.SC/0609075, 2006.
[14] V.M. Boyko, W.I. Fushchych, Lowering of order and general solutions of
some classes of partial differential equations, Reports on Math. Phys., V.
41, No. 3, pp. 311-318, 1998.
[15] Yu.N. Kosovtsov, The general solutions of some nonlinear second
and third order PDEs with constant and nonconstant parameters.
http://arxiv.org/abs/math-ph/0609003 , 2006.
http://www.numdam.org/
http://arxiv.org/abs/cs.SC/0609075
http://arxiv.org/abs/math-ph/0609003
5 Appendix.
Maple procedure reduce PDE order
reduce PDE order:=proc(pde,unk)
local B,W,N,NN,ARG,acargs,i,M,pde0,DN,IND,IND2,IND3,IND4,ARGS,SUB,SUB0,
Z0,Bargs,EQS,XXX,WW,BB,PP,pdeI,IV,s,AN;
option ‘Copyright (c) 2006-2007 by Yuri N. Kosovtsov. All rights reserved.‘;
N:=PDETools[difforder](op(1,[selectremove(has,indets(pde,function),unk)]));
NN:=op(1,[selectremove(has,op(1,[selectremove(has,indets(pde,function),unk)]),diff)]);
ARG:=[op(unk)];
acargs:={};
for i from 1 to nops(ARG) do
if PDETools[difforder](NN,op(i,ARG))=0 then else acargs:=acargs union {op(i,ARG)}
fi; od;
acargs:=convert(acargs,list);
M:=op(0,unk)(op(acargs));
if type(pde,equation)=true then
pde0:=lhs(subs(unk=M,pde))-rhs(subs(unk=M,pde)) else pde0:=subs(unk=M,pde)
DN:=[seq(seq(i,i=1..nops(acargs)),j=1..N)];
IND:=seq(op(combinat[choose](DN,i)),i=1..N);
IND2:=seq(op(combinat[choose](DN,i)),i=1..N-2);
IND3:=op(combinat[choose](DN,N-1));
IND4:=op(combinat[choose](DN,N));
ARGS:=op(unk),M,seq(convert(D[op(op(i,[IND2]))](op(0,unk))
(op(acargs)),diff),i=1..nops([IND2]));
SUB:={M=W[0],seq(convert(D[op(op(i,[IND]))](op(0,unk))
(op(acargs)),diff)=W[op(op(i,[IND]))],i=1..nops([IND]))};
SUB0:={W[0]=op(0,unk)(op(ARG)),
seq(W[op(op(i,[IND]))]=subs(M=op(0,unk)(op(ARG)),
convert(D[op(op(i,[IND]))](op(0,unk))(op(acargs)),diff)),i=1..nops([IND]))};
Z0:=B(ARGS,seq(convert(D[op(op(i,[IND3]))](op(0,unk))(op(acargs)),diff),
i=1..nops([IND3])));
Bargs:=op(indets(subs(SUB,Z0),name));
EQS:=convert(subs(SUB,{seq(diff(Z0,op(i,acargs))=0,i=1..nops(acargs))}),diff);
XXX:={seq(W[op(op(i,[IND4]))],i=1..nops([IND4]))};
WW:=select(type,indets(subs(SUB,pde0)), ’name’) intersect
{seq(W[op(op(i,[IND4]))],i=1..nops([IND4]))};
BB:=select(has,combinat[choose](XXX, nops(acargs)),WW);
PP:={};
pdeI:={seq({subs(subs(solve(EQS,op(i,BB)),subs(SUB,pde0)))},i=1..nops(BB))};
IV:={seq(W[op(op(i,[IND4]))],i=1..nops([IND4]))};
for s from 1 to nops(pdeI) do
AN:=pdsolve(op(s,pdeI),{B},ivars=IV);
for i from 1 to nops(AN) do
if op(0,lhs(op(i,AN)))=B then
PP:=PP union {rhs(op(i,AN))}
catch:
end try;
PP:=subs(SUB0,PP);
RETURN(PP);
end proc:
Calling Sequence: reduce PDE order(PDE, f(~x));
PDE - partial differential equation;
f(~x) - indeterminate function with its arguments.
	Introduction
	Bases of the method
	Decomposable PDEs
	Decomposition algorithm for decomposable PDEs
	Differential equation for first integral of decomposable PDEs
	Examples
	Second-order PDE with two independent variables
	Second-order PDE with four independent variables
	Third order PDE with two independent variables
	Fourth order PDE with two independent variables
	Conclusion
	Appendix. Maple procedure reduce_PDE_order
ABSTRACT
  In present paper we propose seemingly new method for finding solutions of
some types of nonlinear PDEs in closed form. The method is based on
decomposition of nonlinear operators on sequence of operators of lower orders.
It is shown that decomposition process can be done by iterative procedure(s),
each step of which is reduced to solution of some auxiliary PDEs system(s) for
one dependent variable. Moreover, we find on this way the explicit expression
of the first-order PDE(s) for first integral of decomposable initial PDE.
Remarkably that this first-order PDE is linear if initial PDE is linear in its
highest derivatives.
  The developed method is implemented in Maple procedure, which can really
solve many of different order PDEs with different number of independent
variables. Examples of PDEs with calculated their general solutions demonstrate
a potential of the method for automatic solving of nonlinear PDEs.

<|endoftext|><|startoftext|>
Introduction
Morita contexts, in general, and (semi-)strict Morita contexts (with surjective con-
necting bilinear morphisms), in particular, were extensively studied and developed expo-
nentially during the last few decades (e.g. [AGH-Z1997]). However, we sincerely feel that
there is a gap in the literature on injective Morita contexts (i.e. those with injective con-
necting bilinear morphisms). Apart from the results in [Nau1994-a], [Nau1994-b] (where
the second author initially explored this notion) and from an application to Grothendieck
groups in the recent paper ([Nau2004]), it seems that injective Morita contexts were not
studied systematically at all.
∗Corresponding Author
http://arxiv.org/abs/0704.0074v2
We noticed that in several results of ([Nau1993], [Nau1994-a] and [Nau1994-b]) that are
related to Morita contexts, only one trace ideal is used. Observing this fact, we introduce
the notions of Morita semi-contexts and Morita data and investigate them. Several results
are proved then for injective Morita semi contexts and/or injective Morita data.
Consider a Morita datum M = (T, S, P,Q,<,>T , <,>S), with not necessarily compat-
ible bimodule morphisms <,>T : P ⊗S Q→ T and <,>S: Q⊗T P → S. We say that M is
injective, iff <,>T and <,>S are injective, and to be a Morita α-datum, iff the associated
dual pairings Pl := (Q, TP ), Pr := (Q,PS), Ql := (P, SQ) and Qr := (P,QT ) satisfy
the α-condition (which is closely related to the notion of local projectivity in the sense of
Zimmermann-Huisgen [Z-H1976]). The α-condition was introduced in [AG-TL2001] and
further investigated by the first author in [Abu2005].
While (semi-)strict unital Morita contexts induce equivalences between the whole mod-
ule categories of the rings under consideration, we show in this paper how injective Morita
(semi-)contexts and injective Morita data play an important role in establishing equiva-
lences between suitable intersecting subcategories of module categories (e.g. intersections
of subcategories that are localized/colocalized by trace ideals of a Morita datum with sub-
categories of static/adstatic modules, etc.). Our main applications in addition to equiv-
alences related to the Kato-Ohtake-Müller localization-colocalization theory (developed in
[Kat1978], [KO1979] and [Mül1974]), will be to ∗-modules (introduced by Menini and Or-
satti [MO1989]) and to right wide Morita contexts (introduced by F. Castaño Iglesias and
J. Gómez-Torrecillas [C-IG-T1995]).
Most of our results will be stated for left modules, while deriving the “dual” versions for
right modules is left to the interested reader. Moreover, for Morita contexts, some results
are stated/proved for only one of the Morita semi-contexts, as the ones corresponding to
the second semi-context can be obtained analogously. For the convenience of the reader, we
tried to make the paper self-contained, so that it can serve as a reference on injective Morita
(semi-)contexts and their applications. In this respect, and for the sake of completeness, we
have included some previous results of the authors that are (in most cases) either provided
with new shorter proofs, or are obtained under weaker conditions.
This paper is organized as follows: After this brief introduction, we give in Section 2
some preliminaries including the basic properties of dual α-pairings, which play a central
role in rest of the work. The notions of Morita semi-contexts and Morita data are intro-
duced in Section 3, where we clarify their relations with the dual pairings and the so-called
elementary rngs. Injective Morita (semi-)contexts appear in Section 4, where we study
their interplay with dual α-pairings and provide some examples and a counter-example.
In Section 5 we include some observations regarding static and adstatic modules and use
them to obtain equivalences among suitable intersecting subcategories of modules related
to a Morita (semi-)context. In the last section, more applications are presented, mainly to
subcategories of modules that are localized or colocalized by a trace ideal of an injective
Morita (semi-)context, to ∗-modules and to injective right wide Morita contexts.
2 Preliminaries
Throughout, R denotes a commutative ring with 1R 6= 0R and A,A
′, B, B′ are unital
R-algebras. We have reserved the term “ring” for an associative ring with a multiplicative
unity, and we will use the term “rng” for a general associative ring (not necessarily with
unity). All modules over rings are assumed to be unitary, and ring morphisms are assumed
to respect multiplicative unities. If T and S are categories, then we write T ≤ S (T ≤ S)
to mean that T is a (full) subcategory of S, and T ≈ S to indicate that T and S are
equivalent.
Rngs and their modules
2.1. By an A-rng (T, µT ), we mean an (A,A)-bimodule T with an (A,A)-bilinear mor-
phism µT : T ⊗A T → T, such that µT ◦ (µT ⊗A idT ) = µT ◦ (idT ⊗A µT ). We call an A-rng
(T, µT ) an A-ring, iff there exists in addition an (A,A)-bilinear morphism ηT : A → T,
called the unity map, such that µT ◦ (ηT ⊗A idT ) = ϑ
T and µT ◦ (idT ⊗A ηT ) = ϑ
T (where
A⊗A T
≃ T and T ⊗A A
≃ T are the canonical isomorphisms). So, an A-ring is a unital
A-rng; and an A-rng is (roughly speaking) an A-ring not necessarily with unity.
2.2. A morphism of rngs (ψ : δ) : (T : A) → (T ′ : A′) consists of a morphism of R-algebras
δ : A→ A′ and an (A,A)-bilinear morphism ψ : T → T ′, such that µT ′◦χ
(A,A′)
(T ′,T ′)
◦(ψ⊗Aψ) =
ψ ◦µT (where χ
(A,A′)
(T ′,T ′)
: T ′ ⊗A T
′ → T ′ ⊗A′ T
′ is the canonical map induced by δ). By RNG
we denote the category of associative rngs with morphisms being rng morphisms, and
by URNG < RNG the (non-full) subcategory of unital rings with morphisms being the
morphisms in RNG which respect multiplicative unities.
2.3. Let (T, µT ) be an A-rng. By a left T -module we mean a left A-module N with a left
A-linear morphism φNT : T ⊗AN → N, such that φ
T ◦ (µT ⊗A idN) = φ
T ◦ (idT ⊗A φ
T ). For
left T -modulesM,N, we call a left A-linear morphism f :M → N a T -linear morphism,
iff f(tm) = tf(m) for all t ∈ T. The category of left T -modules and left T -linear morphisms
is denoted by TM. The category MT of right T -modules is defined analogously. Let (T : A)
and (T ′ : A′) be rngs. We call an (A,A′)-bimodule N a (T, T ′)-bimodule, iff (N, φNT ) is
a left T -module and (N, φNT ′) is a right T
′-module, such that φNT ′ ◦ (φ
T ⊗A′ idT ′) = φ
(idT ⊗Aφ
T ′). For (T, T
′)-bimodulesM,N, we call an (A,A′)-bilinear morphism f :M → N
(T, T ′)-bilinear, provided f is left T -linear and right T ′-linear. The category of (T, T ′)-
bimodules is denoted by TMT ′ . In particular, for any A-rng T, a left (right) T -module
M has a canonical structure of a unitary right (left) S-module, where S := End(TM)
(S := End(MT )); and moreover, with this structure M becomes a (T, S)-bimodule (an
(S, T )-bimodule).
Remark 2.4. Similarly, one can define rngs over arbitrary (not-necessarily unital) ground
rngs and rng morphisms between them. Moreover, one can define (bi)modules over such
rngs and (bi)linear morphisms between them.
Notation. Let T be an A-rng. We write TU (UT ) to denote that U is a left (right) T -
module. For a left (right) T -module TU, we consider the set
∗U := HomT−(U, T ) (U
Hom−T (U, T )) of all left (right) T -linear morphisms from U to T with the canonical right
(left) T -module structure.
Generators and cogenerators
Definition 2.5. Let T be an A-rng. For a left T -module TU consider the following sub-
classes of TM :
Gen(TU) := {TV | ∃ a set Λ and an exact sequence U
(Λ) → V → 0};
Cogen(TU) := {TW | ∃ a set Λ and an exact sequence 0 → W → U
Pres(TU) := {TV | ∃ sets Λ1,Λ2 and an exact sequence U
(Λ2) → U (Λ1) → V → 0};
Copres(TU) := {TW | ∃ sets Λ1,Λ2 and an exact sequence 0 →W → U
Λ1 → UΛ2};
A left T -module in Gen(TU) (respectively Cogen(TU), Pres(TU), Copres(TU)) is said to be
U-generated (respectively U-cogenerated, U-presented, U-copresented). Moreover,
we say that TU is a generator (respectively cogenerator, presentor, copresentor), iff
Gen(TU) = TM (respectively Cogen(TU) = TM, Pres(TU) = TM, Copres(TU) = TM).
Dual α-pairings
In what follows we recall the definition and properties of dual α-pairings introduced
in [AG-TL2001, Definition 2.3.] and studied further in [Abu2005].
2.6. Let T be an A-rng. A dual left T -pairing Pl = (V, TW ) consists of a left T -module
W and a right T -module V with a right T -linear morphism κPl : V →
∗W (equivalently
a left T -linear morphism χPl : W → V
∗). For dual left pairings Pl = (V, TW ), P
l = (V
′), a morphism of dual left pairings (ξ, θ) : (V ′,W ′) → (V,W ) consists of a triple
(ξ, θ : ς) : (V, TW ) → (V
′, T ′W
where ξ : V → V ′ and θ : W ′ → W are T -linear and ς : T → T ′ is a morphism of rngs,
such that considering the induced maps <,>T : V ×W → T and <,>T ′: V
′ ×W ′ → T ′ we
< ξ(v), w′ >T ′= ς(< v, θ(w
′) >T ) for all v ∈ V and w
′ ∈ W ′. (1)
The dual left pairings with the morphisms defined above build a category, which we denote
by Pl. With Pl(T ) ≤ Pl we denote the full subcategory of dual T -pairings. The category
Pr of dual right pairings and its full subcategory Pr(T ) ≤ Pr of dual right T -pairings are
defined analogously.
Remark 2.7. The reader should be warned that (in general) for a non-commutative rng T
and a dual left T -pairing Pl = (V, TW ), the following map induced by the right T -linear
morphism κPl : V →
<,>T : V ×W → T, < v, w >T := κPl(v)(w)
is not necessarily T -balanced, and so does not induce (in general) a map V ⊗T W → T. In
fact, for all v ∈ V, w ∈ W and t ∈ T we have
< vt, w > = κPl(vt)(w) = [κPl(v)t](w) = [κPl(v)(w)]t = < v,w >T t;
< v, tw > = κPl(v)(tw) = t[κPl(v)(w)] = t < v, w >T .
2.8. Let T be an A-rng, N,W be left T -modules and identify NW with the set of all
mappings fromW toN. Considering N with the discrete topology andNW with the product
topology, the induced relative topology on HomT−(W,N) →֒ N
W is a linear topology (called
the finite topology), for which the basis of neighborhoods of 0 is given by the set of
annihilator submodules:
Bf (0) := {F
⊥(HomT−(W,N)) | F = {w1, ..., wk} ⊂W is a finite subset},
where
F⊥(HomT−(W,N)) := {f ∈ HomT−(W,N)) | f(W ) = 0}.
2.9. Let T be an A-rng, Pl = (V, TW ) a dual left T -pairing and consider for every right
T -module UT the following canonical map
U : U ⊗T W → Hom−T (V, U),
ui ⊗T wi 7→ [v 7→
ui < v,wi >T ]. (2)
We say that Pl = (V, TW ) ∈ Pl(T ) satisfies the left α-condition (or is a dual left α-
pairing), iff α
U is injective for every right T -module UT . By P
l (T ) ≤ Pl(T ) we denote the
full subcategory of dual left T -pairings satisfying the left α-condition. The full subcategory
of dual right α-pairings Pαr (T ) ≤ Pr(T ) is defined analogously.
Definition 2.10. Let T be an A-rng, Pl = (V, TW ) be a dual left T -pairing and consider
κPl : V →
∗W and α
V : V ⊗T W → End(VT ).
We say Pl ∈ Pl(T ) is
dense, iff κPl(V ) ⊆
∗W is dense (w.r.t. the finite topology on ∗W →֒ TW );
injective (resp. semi-strict, strict), iff α
V is injective (resp. surjective, bijective);
non-degenerate, iff V
→֒ ∗W and W
→֒ V ∗ canonically.
2.11. Let T be an A-rng. We call a T -module W locally projective (in the sense of B.
Zimmermann-Huisgen [Z-H1976]), iff for every diagram of T -modules
0 // F
g′◦ι   
ι //W
// N // 0
with exact rows and finitely generated T -submodule F ⊆W : for every T -linear morphism
g : W → N, there exists a T -linear morphism g′ : W → L, such that g ◦ ι = π ◦ g′ ◦ ι.
For proofs of the following basic properties of locally projective modules and dual
α-pairings see [Abu2005] and [Z-H1976]:
Proposition 2.12. Let T be an A-ring and Pl = (V, TW ) ∈ Pl(T ).
1. The left T -module TW is locally projective if and only if (
∗W,W ) is an α-pairing.
2. The left T -module TW is locally projective, iff for any finite subset {w1, ..., wk} ⊆ W,
there exists {(fi, w̃i)}
i=1 ⊂
∗W ×W such that wj =
fi(wj)w̃i for all j = 1, ..., k.
3. If TW is locally projective, then TW is flat and T -cogenerated.
4. If Pl ∈ P
l (T ), then TW is locally projective.
5. If TW is locally projective and κP (V ) ⊆
∗W is dense, then Pl ∈ P
l (T ).
6. Assume TT is an injective cogenerator. Then Pl ∈ P
l (T ) if and only if TW is locally
projective and κPl(V ) ⊆
∗W is dense.
7. If T is a QF ring, then Pl ∈ P
l (T ) if and only if TW is projective and W
→֒ V ∗.
The following result completes the nice observation [BW2003, 42.13.] about locally
projective modules:
Proposition 2.13. Let T be a ring, TW a left T -module, S := End(TW )
op and consider
the canonical (S, S)-bilinear morphism
[, ]W :
∗W ⊗T W → End(TW ), f ⊗T w 7→ [w̃ 7→ f(w̃)w].
1. TW is finitely generated projective if and only if [, ]W is surjective.
2. TW is locally projective if and only if Im([, ]W ) ⊆ End(TW ) is dense.
Proof. 1. This follows by [Fai1981, 12.8.].
2. Assume TW is locally projective and consider for every left T -module N the canonical
mapping
[, ]WN :
∗ W ⊗T N → HomT (W,N), f ⊗T n 7→ [w̃ 7→ f(w̃)n].
It follows then by [BW2003, 42.13.], that Im([, ]WN ) ⊆ HomT (W,N) is dense. In
particular, setting N = W we conclude that Im([, ]W ) ⊆ End(TW ) is dense. On
the other hand, assume Im([, ]W ) ⊆ End(TW ) is dense. Then for every finite subset
{w1, ..., wk} ⊆ W, there exists
g̃i ⊗T w̃i ∈
∗W ⊗T W with
wj = idW (wj) = [, ]W (
g̃i ⊗T w̃i)(wj) =
g̃i(wj)w̃i for j = 1, ..., k.
It follows then by Proposition 2.12 “2” that TW is locally projective.�
3 Morita (Semi)contexts
We noticed, in the proofs of some results on equivalences between subcategories of
module categories associated to a given Morita context, that no use is made of the com-
patibility between the connecting bimodule morphisms (or even that only one trace ideal
is used and so only one of the two bilinear morphisms is really in action). Some results of
this type appeared, for example, in [Nau1993], [Nau1994-a] and [Nau1994-b]. Moreover,
in our considerations some Morita contexts will be formed for arbitrary associative rngs
(i.e. not necessarily unital rings). These considerations motivate us to make the following
general definitions:
3.1. By a Morita semi-context we mean a tuple
mT = ((T : A), (S : B), P, Q,<,>T , I), (3)
where T is an A-rng, S is a B-rng, P is a (T, S)-bimodule, Q is an (S, T )-bimodule,
<,>T : P ⊗S Q → T is a (T, T )-bilinear morphism and I := Im(<,>T ) ⊳ T (called the
trace ideal associated to mT ).We drop the ground rings A,B and the trace ideal I ⊳ T,
if they are not explicitly in action. If mT (3) is a Morita semi-context and T, S are unital
rings, then we call mT a unital Morita semi-context.
3.2. Let mT = ((T : A), (S : B), P, Q,<,>T ), mT ′ = ((T
′ : A′), (S ′ : B′), P ′, Q′, <,>T ′) be
Morita semi-contexts. By a morphism of Morita semi-contexts from mT to mT ′ we
mean a four fold set of morphisms
((β : δ), (γ : σ), φ, ψ) : ((T : A), (S : B), P, Q) → ((T ′ : A′), (S ′ : B′), P ′, Q′),
where (β : δ) : (T : A) → (T ′ : A′) and (γ : σ) : (S : B) → (S ′ : B′) are rng morphisms,
φ : P → P ′ is (T, S)-bilinear and ψ : Q→ Q′ is (S, T )-bilinear, such that
β(< p, q >T ) =< φ(p), ψ(q) >T ′ for all p ∈ P, q ∈ Q .
Notice that we consider P ′ as a (T, S)-bimodule and Q′ as an (S, T )-bimodule with actions
induced by the morphism of rngs (β : δ) and (γ : σ). By MSC we denote the category
of Morita semi-contexts with morphisms defined as above, and by UMSC < MSC the
(non-full) subcategory of unital Morita semi-contexts.
Morita semi-contexts are closely related to dual pairings in the sense of [Abu2005]:
3.3. Let (T, S, P,Q,<,>T ) ∈ MSC and consider the canonical isomorphisms of Abelian
groups
Hom(S,T )(Q,
≃ Hom(T,T )(P ⊗S Q, T )
≃ Hom(T,S)(P,Q
This means that we have two dual T -pairings Pl := (Q, TP ) ∈ Pl(T ) and Qr := (P,QT ) ∈
Pr(T ), induced by the canonical T -linear morphisms
κPl := ξ
−1(<,>T ) : Q→
∗P and κQr := ζ(<,>T ) : P → Q
On the other hand, let (S, T,Q, P,<,>S) ∈ MSC and consider the canonical isomorphisms
of Abelian groups
Hom(S,T )(Q,P
≃ Hom(S,S)(Q⊗T P, S)
≃ Hom(T,S)(P,
Then we have two dual S-pairings Pr := (Q,PS) ∈ Pr(S) and Ql := (P, SQ) ∈ Pl(S),
induced by the canonical morphisms
κPr := ξ
′−1(<,>S) : Q→ P
∗ and κQr := ζ
′(<,>S) : P →
3.4. By a Morita datum we mean a tuple
M = ((T : A), (S : B), P, Q,<,>T , <,>S, I, J), (4)
where the following are Morita semi-contexts.
MT := ((T : A), (S : B), P, Q,<,>T , I) and MS := ((S : B), (T : A), Q, P,<,>S, J) (5)
If, moreover, the bilinear morphisms <,>T : P ⊗S Q → T and < −, >S: Q ⊗T P → S are
compatible, in the sense that
< q, p >S q
′ = q < p, q′ >T and p < q, p
′ >S =< p, q >T p
′ ∀ p, p′ ∈ P, q, q′ ∈ Q, (6)
then we call M a Morita context. If T, S in a Morita datum (context) M are unital,
then we call M a unital Morita datum (context).
3.5. LetM = ((T : A), (S : B), P, Q,<,>T , <,>S) andM
′ = ((T ′ : A′), (S ′ : B′), P ′, Q′, <
,>T ′, <,>S′) be Morita contexts. Extending [Ami1971, Page 275], we mean by a mor-
phism of Morita contexts from M to M′ a four fold set of maps
((β : δ), (γ : σ), φ, ψ) : ((T : A), (S : B), P, Q) → ((T ′ : A′), (S ′ : B′), P ′, Q′),
where (β : δ) : (T : A) → (T ′ : A′), (γ : σ) : (S : B) → (S ′ : B′) are rng morphisms,
φ : P → P ′ is (T, S)-bilinear and ψ : Q→ Q′ is (S, T )-bilinear, such that
β(< p, q >T ) =< φ(p), ψ(q) >T ′ and γ(< q, p >S) =< ψ(q), φ(p) >S′ ∀ p ∈ P, q ∈ Q.
By MC we denote the category of Morita contexts with morphisms defined as above, and
by UMC <MC the (non-full) subcategory of unital Morita contexts.
Example 3.6. If R is commutative, then any Morita semi-context (R,R, P,Q,<,>R) yields
a Morita context (R,R, P,Q,<,>R, [, ]R), where [, ]R := Q⊗R P ≃ P ⊗R Q
−→ R.�
3.7. We call a Morita semi-context mT = (T, S, P,Q,<,>T ) semi-derived (derived),
iff S := End(TP )
op (and Q = ∗P ). We call a Morita datum, or a Morita context, M =
(T, S, P,Q,<,>T , <,>S) semi-derived (derived), iff S = End(TP )
op, or T = End(PS)
(S = End(TP )
op and Q = ∗P, or T = End(PS) and Q = P
Remark 3.8. Following [Cae1998, 1.2.] (however, dropping the condition that the bilinear
map <,>T : P ⊗SQ→ T is surjective), Morita semi-contexts (T, S, P,Q,<>T ) in our sense
were called dual pairs in [Ver2006]. However, we think the terminology we are using is more
informative and avoids confusion with other notions of dual pairings in the literature (e.g.
the ones studied by the first author in [Abu2005]). The reason for this specific terminology
(i.e. Morita semi-contexts) is that every Morita context contains two Morita semi-contexts
as clear from the definition; and that any Morita semi-context can be extended to a (not
necessarily unital) Morita context in a natural way as explained below.
Elementary rngs
In what follows we demonstrate how to build new Morita (semi-)contexts from a given
Morita semi-context. These constructions are inspired by the notion of elementary rngs in
[Cae1998, 1.2.] (and [Ver2006, Remark 3.8.]):
Lemma 3.9. Let mT := ((T : A), (S : B), P, Q,<,>T ) ∈ MSC.
1. The (T, T )-bimodule T := P ⊗S Q has a structure of a T -rng (A-rng) with multipli-
cation
(p⊗S q) ·T (p
′ ⊗S q
′) :=< p, q >T p
′ ⊗S q
′ ∀ p, p′ ∈ P, q, q′ ∈ Q,
such that <,>T : T → T is a morphism of A-rngs, P is a (T, S)-bimodule and Q is
an (S,T)-bimodule, where
(p⊗S q)⇀ p̃ :=< p, q >T p̃ and q̃ ↼ (p⊗S q) := q̃ < p, q >T .
Moreover, we have morphisms of T -rngs (A-rngs)
ψ : T → End(PS), p⊗S q 7→ [p̃ 7→< p, q >T p̃];
φ : T → End(SQ)
op, p⊗S q 7→ [q̃ 7→ q̃ < p, q >T ],
((T : A), (S : B), P, Q, idT) ∈ MSC and we have a morphism of Morita semi-contexts
(<,>T , idS, , idP , idQ) : (T, S, P,Q, idT) → (T, S, P,Q,<,>T ).
2. The (S, S)-bimodule S := Q⊗T P has a structure of an S-rng (B-rng) with multipli-
cation
(q⊗T p) ·S (q
′ ⊗T p
′) := q < p, q′ >T ⊗T p
′ = q⊗T < p, q
′ >T p
′ ∀ p, p′ ∈ P, q, q′ ∈ Q,
such that <,>S: S → S is a morphism of B-rngs, P is a (T,S)-bimodule and Q is
an (S, T )-bimodule, where
p̃ ↼ (q ⊗T p) :=< p̃, q >T p and (q ⊗T p)⇀ q̃ := q < p, q̃ >T .
Moreover, we have morphisms of S-rngs (B-rngs)
Ψ : S → End(TP )
op, q ⊗T p 7→ [p̃ 7→< p̃, q >T p],
Φ : S → End(QT ), q ⊗T p 7→ [q̃ 7→ q < p, q̃ >T ],
and M := ((T : A), (S : B), P, Q,<,>T , idS) is a Morita context.
Remarks 3.10. 1. Given ((S : B), (T : A), Q, P,<,>S) ∈ MSC, the (S, S)-bimodule
S := Q⊗T P becomes an S-rng with multiplication
(q ⊗T p) ·S (q
′ ⊗T p
′) :=< q, p >S q
′ ⊗T p
′ ∀ p, p′ ∈ P, q, q′ ∈ Q;
and the (T, T )-bimodule T := P ⊗S Q becomes a T -rng with multiplication
(p⊗S q) ·T (p
′⊗S q
′) := p < q, p′ >S ⊗S q
′ = p⊗S < q, p
′ >S q
′ ∀ p, p′ ∈ P, q, q′ ∈ Q.
Analogous results to those in Lemma 3.9 can be obtained for the S-rng S and the
T -rng T.
2. Given a Morita semi-context (T, S, P,Q,<,>T ) several equivalent conditions for the
T -rng T := P ⊗S Q to be unital and the modules TP, QT to be firm can be found in
[Ver2006, Theorem 3.3.]. Analogous results can be formulated for the S-rng Q⊗T P
and the S-modules PS, SQ corresponding to any (S, T,Q, P,<,>S) ∈ MSC.
Proposition 3.11. 1. Let mT = (T, S, P,Q,<,>T ) ∈ UMSC and assume the A-rng
T := P ⊗S Q to be unital. If <,>T : T → T respects unities (and mT is injective),
then <,>T is surjective (T
≃ T as A-rings).
2. Let mS = (S, T,Q, P,<,>S) ∈ UMSC and assume the B-rng S := Q ⊗S P to
be unital. If <,>S: S → S respects unities (and mS is injective), then <,>S is
surjective (S
≃ S as B-rings).
3. Let M = (T, S, P,Q,<,>T , <,>S) ∈ UMC and assume the rngs T := P ⊗S Q, T,
S := Q⊗S P to be unital. If <,>T : P ⊗S Q → T and <,>S: S → S respect unities,
then T
≃ T as A-ring, S
≃ S as B-rings and we have equivalences of categories
TM ≈ SM (and MT ≈ MS).
Proof. Assume T is unital with 1T =
pi ⊗S qi. If <,>T respects unities, then we
< pi, qi >T= 1T , and so for any t ∈ T we get t = t1T =
t < pi, qi >T=∑n
< tpi, qi >T∈ Im(<,>T ). One can prove “2” analogously. As for “3”, it is well
known that a unital Morita context with surjective connecting bimodule morphisms is
strict (e.g. [Fai1981, 12.7.]), hence T
≃ T, S
≃ S. The equivalences of categories
TM ≃ TM ≈ SM ≃ SM (and MT ≃ MT ≈ MS ≃ MS) follow then by classical Morita
Theory (e.g. [Fai1981, Chapter 12]).�
Definition 3.12. Let T be an A-rng, VT a right T -module and consider for every left
T -module TL the annihilator
ann⊗L(VT ) := {l ∈ L | V ⊗T l = 0}.
Following [AF1974, Exercises 19], we say VT is L-faithful, iff ann
L(VT ) = 0; and to be
completely faithful, iff VT is L-faithful for every left T -module SL. Similarly, we can
define completely faithful left T -modules.
Under suitable conditions, the following result characterizes the Morita data, which
are Morita contexts:
Proposition 3.13. Let M = (T, S, P,Q,<,>T , <,>S) be a Morita datum.
1. If M ∈ MC, then S
≃ S and T
≃ T as rngs.
2. Assume TP is Q-faithful and QT is P -faithful. Then M ∈ MC if and only if S
and T
≃ T as rngs.
Proof. 1. Obvious.
2. Assume S
≃ S and T
≃ T as rngs. If p ∈ P and q, q′ ∈ Q are arbitrary, then we have
for any p̃ ∈ P :
< q, p >S q
′ ⊗T p̃ = (q ⊗T p) ·S (q
′ ⊗T p̃) = (q ⊗T p) ·S (q
′ ⊗T p̃) = q < p, q
′ >T ⊗T p̃,
hence < q, p >S q
′ − q < p, q′ >T∈ annQ(P ) = 0 (since TP is Q-faithful), i.e.
< q, p >S q
′ = q < p, q′ >T for all p ∈ P and q, q
′ ∈ Q. Assuming QT is P -faithful,
one can prove analogously that < p, q >T p
′ = p < q, p′ >S for all p, p
′ ∈ P and
q ∈ Q. Consequently, M is a Morita context.�
4 Injective Morita (Semi-)Contexts
Definition 4.1. We call a Morita semi-context mT = (T, S, P,Q,<,>T , I) :
injective (resp. semi-strict, strict), iff <,>T : P ⊗S Q→ T is injective (resp. surjec-
tive, bijective);
non-degenerate, iff Q →֒ ∗P and P →֒ Q∗ canonically;
Morita α-semi-context, iff Pl := (Q, TP ) ∈ P
l (T ) and Qr := (P,QT ) ∈ P
r (T ).
Notation. By MSCα ≤ MSC (UMSCα ≤ UMSC) we denote the full subcategory of
(unital) Morita semi-contexts satisfying the α-condition. Moreover, we denote by IMSC ≤
MSC (IUMSC ≤ UMSC) the full subcategory of injective (unital) Morita semi-contexts.
Definition 4.2. We say a Morita datum (context) M = (T, S, P,Q,<,>T , <,>S, I, J) :
is injective (resp. semi-strict, strict), iff <,>T : P⊗SQ→ T and <,>S: Q⊗T P → S
are injective (resp. surjective, bijective);
is non-degenerate, iff Q →֒ ∗P, P →֒ Q∗, Q →֒ P ∗ and P →֒ ∗Q canonically;
satisfies the left α-condition, iffPl := (Q, TP ) ∈ P
l (T ) andQl := (P, SQ) ∈ P
l (S);
satisfies the right α-condition, iff Qr := (P,QT ) ∈ P
r (T ) and Pr := (Q,PS) ∈
Pαr (S);
satisfies the α-condition, or M is a Morita α-datum (Morita α-context), iff M
satisfies both the left and the right α-conditions.
Notation. By MCαl < MC (UMC
l < UMC) we denote the full subcategory of Morita
contexts satisfying the left α-condition, and by MCαr < MC (UMC
r < UMC) the full
subcategory of (unital) Morita contexts satisfying the right α-condition. Moreover, we set
α := MCαl ∩MC
r and UMC
α := UMCαl ∩ UMC
Lemma 4.3. Let M = (T, S, P,Q,<,>T , <,>S, I, J) ∈ MC. Consider the Morita semi-
context MS := (S, T,Q, P,<,>S), the dual pairings Pl := (Q, TP ) ∈ Pl(T ), Qr :=
(P,QT ) ∈ Pr(T ) and the canonical morphisms of rings
ρP : S → End(TP )
op and λQ : S → End(QT ).
1. If Qr is injective (semi-strict), then MS is injective (ρP : S → End(TP )
op is a
surjective morphism of B-rngs).
2. Assume PS is faithful and let Qr be semi-strict. Then S ≃ End(TP )
op (an isomor-
phism of unital B-rings) and MS is strict.
3. If Pl is injective (semi-strict), then MS is injective (λQ : S → End(QT ) is a surjec-
tive morphism of B-rngs).
4. Assume SQ is faithful and let Pl is semi-strict. Then S ≃ End(QT ) (an isomorphism
of unital B-rings) and MS is strict.
Proof. We prove only “1” and “2”, as “3” and “4” can be proved analogously.
Consider the following butterfly diagram with canonical morphisms
Q⊗T Q
∗P ⊗T P
Q⊗T P
idQ⊗TκQr
llYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
⊗T idP
22eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
uujjj
Hom−T (
∗P,Q)
(κPl ,Q)
Hom−T (Q
∗, P )
(κQr ,P )
ttjjjj
**TTT
End(QT ) End(TP )
qi ⊗T pi ∈ Q⊗T P be arbitrary. For every p̃ ∈ P we have
[(κQr , P ) ◦ α
qi ⊗T pi)](p̃) =
< p̃, qi >T pi
p̃ < qi, pi >S
= ρP (
< qi, pi >S)(p̃)
= (ρP◦ <,>S)(
qi ⊗T pi)(p̃),
i.e. α
P := (κQr , P ) ◦ α
P = ρP◦ <,>S; and
[, ]lP ◦ (κPl ⊗T idP ))(
qi ⊗T pi)](p̃) =
κPl(qi)(p̃)pi
< p̃, qi >T pi
p̃ < qi, pi >S
= ρP (
< qi, pi >S)(p̃)
= [(ρP◦ <,>S)(
qi ⊗T pi)](p̃),
i.e. [, ]lP ◦ (κPl ⊗T idP ) = ρP◦ <,>S . On the other hand, for every q̃ ∈ Q we have
((κPl , Q) ◦ α
qi ⊗T pi)(q̃) =
qi < pi, q̃ >T
< qi, pi >S)q̃
= λQ(
< qi, pi >S)(q̃)
= (λQ◦ <,>S)(
qi ⊗T pi),
i.e. α
Q := (κPl , Q) ◦ α
Q = λQ◦ <,>S and
([, ]rQ ◦ (idQ ⊗T κQr))(
qi ⊗T pi)](q̃) =
qiκQr(pi)(q̃)
qi < pi, q̃ >T
< qi, pi >S q̃
= λQ(
< qi, pi >S)(q̃)
= [(λQ◦ <,>S)(
qi ⊗T pi)](q̃),
i.e. [, ]rQ ◦ (idQ ⊗T κQr) = λQ◦ <,>S . Hence Diagram (7) is commutative.
(1) Follows directly from the assumptions and the equality α
P = ρP◦ <,>S .
(2) Let PS be faithful, so that the canonical left S-linear map ρP : S → End(TP )
is injective. Assume now that Qr is semi-strict. Then ρP is surjective by “1” , whence
bijective. Since rings of endomorphisms are unital, we conclude that S ≃ End(TP )
op is a
unital B-ring as well (with unity ρ−1P (idP )). Moreover, the surjectivity of α
P = ρP◦ <,>S
implies that <,>S is surjective (since ρP is injective), say 1S =
< q̃j, p̃j >S for some
{(q̃j , p̃j)}J ⊆ Q× P. For any
qi ⊗T pi ∈ Ker(<,>S), we have then
qi ⊗T pi = (
qi ⊗T pi) · 1S =
(qi ⊗T pi) · (
< q̃j, p̃j >S)
qi ⊗T pi < q̃j, p̃j >S =
qi⊗T < pi, q̃j >T p̃j
qi < pi, q̃j >T ⊗T p̃j =
< qi, pi >S q̃j ⊗T p̃j
< qi, pi >S)q̃j ⊗T p̃j = 0,
i.e. <,>S is injective, whence an isomorphism.�
The following result shows that Morita α-contexts are injective:
Corollary 4.4. MCαl ∪MC
r ≤ IMC.
Example 4.5. Let mT = (T, S, P,Q,<,>T ) be a non-degenerate Morita semi-context. If
T is a QF ring and the T -modules TP, QT are projective, then by Proposition 2.12 “7”
Pl := (Q, TP ) ∈ P
l (T ) and Qr := (P,QT ) ∈ P
r (T ) (i.e. mT is a Morita α-semi-
context, whence injective). On the other hand, let M = (T, S, P,Q,<,>T , <,>S) be a
non-degenerate Morita datum. If T, S are QF rings and the modules TP, QT , PS, SQ are
projective, then M is an Morita α-datum (whence injective).�
Every semi-strict unital Morita context is injective (whence strict, e.g. [Fai1981, 12.7.]).
The following example, which is a modification of [Lam1999, Example 18.30]), shows that
the converse is not necessarily true:
Example 4.6. Let T = M2(Z2) be the ring of 2× 2 matrices with entries in Z2. Notice that
∈ T is an idempotent, and that eTe ≃ Z2 as rings. Set
P := Te = {
| a′, c′ ∈ Z2} and Q := eT = {
| a, b ∈ Z2}.
Then P = Te is a (T, eTe)-bimodule and Q = eT is an (eTe, T )-bimodule. Moreover, we
have a Morita context
Me = (T, eTe, T e, , eT, <,>T , < . >eTe),
where the connecting bilinear maps are
<,>T : Te⊗eTe eT → T,
a′a a′b
c′a c′b
<,>eTe : eT ⊗T Te → eTe
aa′ + bc′ 0
Straightforward computations show that<,>T is injective but not surjective (as
Im(<,>T )) and that <,>eTe is in fact an isomorphism. This means that Me is an injective
Morita context that is not semi-strict (whence not strict).�
Definition 4.7. Let T be a rng and I ⊳ T an ideal. For every left T -module TV consider
the canonical T -linear map
ζI,V : V → HomT (I, V ), v 7→ [t 7→ tv].
We say T I is strongly V -faithful, iff annV (I) := Ker(ζI,V ) := 0. Moreover, we say I is
strongly faithful, if T I is V -faithful for every left T -module TV. Strong faithfulness of I
w.r.t. right T -modules can be defined analogously.
Remark 4.8. Let T be a rng, I ⊳ T an ideal and TU a left ideal. It’s clear that ann
U(IT ) ⊆
annU(I) := Ker(ζI,U). Hence, if T I is strongly U-faithful, then IT is U-faithful (which
justifies our terminology). In particular, if T I is strongly faithful, then IT is completely
faithful.
Morita α-contexts are injective by Corollary 4.4. The following result gives a
partial converse:
Lemma 4.9. Let M = (T, S, P,Q,<,>T , <,>S, I, J) ∈ MC and assume the Morita semi-
context MS := (S, T,Q, P,<,>S, J) is injective.
1. If SJ is strongly faithful, then Qr := (P,QT ) ∈ P
r (T ).
2. If JS is strongly faithful, then Pl := (Q, TP ) ∈ P
l (T ).
Proof. We prove only “1”, since “2” can be proved similarly. Assume MS is injective and
consider for every left T -module U the following diagram
Q⊗T U
ζJ,Q⊗T U ((QQ
HomT−(P, U)
ψQ,Uuukkk
HomS−(J,Q⊗T U)
where for all f ∈ HomT−(P, U) and
< qj , pj >S∈ J we define
ψQ,U(f)(
< qj , pj >S) :=
qj ⊗T f(pj).
Then we have for every
q̃i ⊗T ũi ∈ Q⊗T U and s =
< qj, pj >S∈ J :
(ψQ,U ◦ α
q̃i ⊗T ũi)(s) =
qj ⊗T [α
q̃i ⊗T ũi)](pj)
qj ⊗T
< pj , q̃i >T ũi]
qj⊗T < pj, q̃i >T ũi
qj < pj , q̃i >T ⊗T ũi
< qj , pj >S q̃i ⊗T ũi
= ζJ,Q⊗TU(
q̃i ⊗T ũi)(s),
i.e. diagram (8) is commutative. If SJ is strongly faithful, then Ker(ζJ,Q⊗TU) = annQ⊗TU(J) =
0, hence ζJ,Q⊗TU is injective and it follows then that α
U is injective.�
Proposition 4.10. Let M = (T, S, P,Q,<,>T , <,>S, I, J) ∈ IMC. If T I, IT , SJ and JS
are strongly faithful, then M ∈ MCα.
5 Equivalences of Categories
In this section we give some applications of injective Morita (semi-)contexts and in-
jective Morita data to equivalences between suitable subcategories of modules arising in
the Kato-Müller-Ohtake localization-colocalization theory (as developed in (e.g. [Kat1978],
[KO1979], [Mül1974]). All rings, hence all Morita (semi-)contexts and data, in this section
are unital.
Static and Adstatic Modules
5.1. ([C-IG-TW2003]) Let A and B be two complete cocomplete Abelian categories, R :
A → B an additive covariant functor with left adjoint L : B → A and let
ω : LR → 1A and η : 1B → RL
be the induced natural transformations (called the counit and the unit of the adjunction,
respectively). Related to the adjoint pair (L,R) are two full subcategories of A and B :
Stat(R) := {X ∈ A | LR(X)
≃ X} and Adstat(R) := {Y ∈ B | Y
≃ RL(Y )},
whose members are called R-static objects and R-adstatic objects, respectively. It is
evident (from definition) that we have equivalence of categories Stat(R) ≈ Adstat(R).
A typical situation, in which static and adstatic objects arise naturally is the
following:
5.2. Let T, S be rings, TUS a (T, S)-bimodule and consider the covariant functors
HlU := HomT (U,−) : TM → SM and T
U := U ⊗S − : SM → TM.
It is well-known that (TlU ,H
U) is an adjoint pair of covariant functors via the natural
isomorphisms
HomT (U ⊗S M,N) ≃ HomS(M,HomT (U,N)) for all M ∈ SM and N ∈ TM
and the natural transformations
ωlU : U ⊗S HomT (U,−) → 1TM and η
U : 1SM → HomT (U, U ⊗S −)
yield for every TK and SL the canonical morphisms
ωlU,K : U ⊗S HomT (U,K) → K and η
U,L : L→ HomT (U, U ⊗S L). (9)
We call the HlU-static modules U-static w.r.t. S and set
Statl(TUS) := Stat(H
U) = {TK | U ⊗S HomT−(U,K)
≃ K};
and the HlU-adstatic modules U-adstatic w.r.t. S and set
Adstatl(TUS) := Adstat(H
U) = {SL | L
≃ HomT−(U, U ⊗S L)}.
By [Nau1990a] and [Nau1990b], there are equivalences of categories
Statl(TUS) ≈ Adstat
l(TUS). (10)
On the other hand, one can define the full subcategories Statr(TUS) ≈ Adstat
r(TUS) :
Statr(TUS) := {KS | Hom−S(U,K)⊗T U ≃ K};
Adstatr(TUS) := {LT | L ≃ Hom−S(U, L⊗T U)}.
In particular, setting
Stat(TU) := Stat
l(TUEnd(TU)op); Adstat(TU) := Adstat
l(TUEnd(TU)op);
Stat(US) := Stat
r(End(SU)US); Adstat(US) := Adstat
r(End(SU)US),
there are equivalences of categories:
Stat(TU) ≃ Adstat(TU) and Stat(US) ≃ Adstat(US). (11)
Remark 5.3. The theory of static and adstatic modules was developed in a series of papers
by the second author (see the references). They were also considered by several other
authors (e.g. [Alp1990], [CF2004]). For other terminologies used by different authors, the
interested reader may refer to a comprehensive treatment of the subject by R. Wisbauer
in [Wis2000].
Intersecting subcategories
Several intersecting subcategories related to Morita contexts were introduced in
the literature (e.g. [Nau1993], [Nau1994-b]). In what follows we introduce more and we
show that many of these coincide, if one starts with an injective Morita semi-context.
Moreover, other results on equivalences between some intersecting subcategories related
to an injective Morita context will be reframed for arbitrary (not necessarily compatible)
injective Morita data.
Definition 5.4. 1. For a right T -module X, a T -submodule X ′ ⊆ X is called K-pure
for some left T -module TK, iff the following sequence of Abelian groups is exact
0 → X ′ ⊗T K → X ⊗T K → X/X
′ ⊗T K → 0;
2. For a left T -module Y, a T -submodule Y ′ ⊆ Y is called L-copure for some left
T -module TL, iff the following sequence of Abelian groups is exact
0 → HomT (Y/Y
′, L) → HomT (Y, L) → HomT (Y
′, L) → 0.
Definition 5.5. (Compare [KO1979, Theorems 1.3., 2.3.]) Let T be a ring, I ⊳ T an
ideal, U a left T -module and consider the canonical T -linear morphisms
ζI,U : U → HomT (I, U) and ξI,U : I ⊗T U → U.
1. We say TU is I-divisible, iff ξI,U is surjective (equivalently, iff IU = U).
2. We say TU is I-localized, iff U
≃ HomT (I, U) canonically (equivalently iff T I is
strongly U -faithful and T I ⊆ T is U -copure).
3. We say a left T -module U is I-colocalized, iff I⊗TU
≃ U canonically (equivalently,
iff TU is I-divisible and IT ⊆ T is U -pure).
Notation. For a ring T, an ideal I ⊳ T, and with morphisms being the canonical ones,
we set
ID := {TU | IU = U}; IF := {TU | U →֒ HomT−(I, U)};
IL := {TU | U ≃ HomT (I, U}; IC := {TU | I ⊗T U ≃ U};
DI := {UT | UI = U}; FI := {UT | U →֒ Hom−T (I, U)};
LI := {UT | U ≃ HomT (I, U}; CI := {UT | U ⊗T I ≃ U}; .
The following result is due to T. Kato, K. Ohtake and B. Müller (e.g. [Mül1974],
[Kat1978], [KO1979]):
Proposition 5.6. Let M = (T, S, P,Q,<,>T , <,>S, I, J) ∈ UMC. Then there are equiv-
alences of categories
IC ≈ JC, CI ≈ CJ , IL ≈ JL and LI ≈ LJ .
5.7. Let mT = (T, S, P,Q,<,>T , I) ∈ UMSC and consider the dual pairings Pl := (Q,
TP ) ∈ Pl(T ) and Qr := (P,QT ) ∈ Pr(T ). For every left (right) T -module U consider the
canonical S-linear morphism induced by <,>T :
U : Q⊗T U → HomT−(P, U) (α
U : U ⊗T P → Hom−T (Q,U)).
We define
Dl(mT ) := {TU | Q⊗T U
≃ HomT−(P, U)};
Dr(mT ) := {UT | U ⊗T P
≃ Hom−T (Q,U)}.
Moreover, set
Ul(mT ) := Stat
l(TPS) ∩Adstat
l(SQT ); Ur(mT ) := Stat
r(SQT ) ∩Adstat
r(TPS);
Vl(mT ) := Stat
l(TPS) ∩ Dl(mT ); Vr(mT ) := Stat
r(SQT ) ∩ Dr(mT );
Vl(mT ) := IC ∩ Dl(mT ); Vr(mT ) := CI ∩ Dr(mT );
V̂l(mT ) := Vl(mT )∩ IL; V̂r(mT ) := Vr(mT ) ∩ LI ;
Wl(mT ) := Adstat
l(SQT ) ∩ Dl(mT ); Wr(mT ) := Adstat
r(TPS) ∩ Dr(mT );
Wl(mT ) := IL ∩ Dl(mT ); Wr(mT ) := LI ∩ Dr(mT );
Ŵl(mT ) := Wl(mT )∩ IC; Ŵr(mT ) := Wr(mT ) ∩ CI ;
Xl(mT ) := Vl(mT ) ∩Wl(mT ); Xr(mT ) := Vr(mT ) ∩Wr(mT );
Xl(mT ) := Vl(mT ) ∩Wl(mT ); Xr(mT ) := Vr(mT ) ∩Wr(mT ).
X ∗l (mT ) := {S(Q⊗T U) | V ∈ Xl(mT )}; X
r (mT ) := {(U ⊗T P )S | V ∈ Xr(mT )};
l (mT ) := {S(Q⊗T U) | V ∈ Xl(mT )}; X
r(mT ) := {(U ⊗T P )S | V ∈ Xr(mT )}.
Given mS = (S, T,Q, P,<,>S, J) ∈ UMSC one can define analogously, the corresponding
intersecting subcategories of SM and MS.
As an immediate consequence of Proposition 5.6 we get
Corollary 5.8. Let M = (T, S, P,Q,<,>T , <,>S, I, J) ∈ IUMC and consider the asso-
ciated Morita semi-contexts MT and MS (5).
1. If IC ≤ Dl(MT ) and JC ≤ Dl(MS), then Vl(MT ) ≈ Vl(MS). Similarly, if CI ≤
Dr(MT ) and CJ ≤ Dr(MS), then Vr(MT ) ≈ Vr(MS).
2. If IL ≤ Dl(MT ) and JL ≤ Dl(MS), then Wl(MT ) ≈ Wl(MS). Similarly, if LI ≤
Dr(MT ) and LJ ≤ Dr(MS), then Wr(MT ) ≈ Wr(MS).
Starting with a Morita context, the following result was obtained in [Nau1993,
Theorem 3.2.]. We restate the result for an arbitrary (not necessarily compatible) Morita
datum and sketch its proof:
Lemma 5.9. Let M = (T, S, P,Q,<,>T , <,>S, I, J) be a unital Morita datum and con-
sider the associated Morita semi-contexts MT and MS in (5). Then there are equivalences
of categories
Xl(MT )
HomT−(P,−)
HomS−(Q,−)
Xl(MS) and Xr(MT )
Hom−T (Q,−)
Hom−S(P,−)
Xr(MS).
Proof. Let TV ∈ Xl(MT ). By the equivalence Stat
l(TPS)
HomT (P,−)
≈ Adstatl(TPS) in 5.2 we
have HomT−(P, V ) ∈ Adstat
l(TPS). Moreover, V ∈ Dl(M), hence HomT−(P, V ) ≃ Q⊗T V
canonically and it follows then from the equivalence Adstatl(SQT )
≈ Statl(SQT ) that
HomT−(P, V ) ∈ Stat
l(SQT ). Moreover, we have the following natural isomorphisms
P ⊗S HomT−(P, V ) ≃ V ≃ HomS−(Q,Q⊗T V ) ≃ HomS−(Q,HomT−(P, V )), (13)
i.e. HomT−(P, V ) ∈ Dl(MS). Consequently, HomT−(P, V ) ∈ Xl(MS). Moreover, (13)
yields a natural isomorphism V ≃ HomS−(Q,HomT−(P, V )). Analogously, one can show for
everyW ∈ Xl(MS) that HomS−(Q,W ) ∈ Xl(MT ) and thatW ≃ HomT−(P,HomS−(Q,W ))
naturally. Consequently, Xl(MT ) ≈ Xl(MS). The equivalences Xr(MT ) ≈ Xr(MS) can
be proved analogously.�
Proposition 5.10. Let M = (T, S, P,Q,<,>T , <,>S, I, J) be a unital injective Morita
datum and consider the associated Morita semi-contexts MT and MS in (5).
1. There are equivalences of categories
Statl(T IT ) ≈ Adstat
l(T IT ); Stat
l(SJS) ≈ Adstat
l(SJS);
Statr(T IT ) ≈ Adstat
r(T IT ); Stat
r(SJS) ≈ Adstat
r(SJS).
2. If Statl(T IT ) ≤ X
l (MS) and Stat
l(SJS) ≤ X
l (MT ), then there are equivalences of
categories
Statl(T IT ) ≈ Stat
l(SJS) and Adstat
l(T IT ) ≈ Adstat
l(SJS).
3. If Statr(T IT ) ≤ X
r (MS) and Stat
r(SJS) ≤ X
r (MT ), then there are equivalences of
categories
Statr(T IT ) ≈ Stat
r(SJS) and Adstat
r(T IT ) ≈ Adstat
r(SJS).
Proof. To prove “1”, notice that since M is an injective Morita datum, P ⊗S Q
and Q⊗T P
≃ J as bimodules and so the four equivalences of categories result from 5.2.
To prove “2”, one can use an argument similar to that in [Nau1994-b, Theorem 3.9.] to
show that the inclusion Statl(T IT ) = Stat
l(T (P ⊗S Q)T ) ≤ X
l (MS) implies Stat
l(T IT ) =
Statl(T (P ⊗S Q)T ) = Xl(MT ) and that the inclusion Stat
l(SJS) = Stat
l(S(Q ⊗T P )S) ≤
X ∗l (MT ) implies Stat
l(SJS) = Stat
l(S(Q ⊗T P )S) = Xl(MS). The result follows then by
Lemma 5.9. The proof of “3” is analogous to that of “2”.�
For injective Morita semi-contexts, several subcategories in (12) are shown in the
following result to be equal:
Theorem 5.11. Let mT = (T, S, P,Q,<,>T , I) ∈ IUMS. Then
1. Vl(mT ) = Vl(mT ), Wl(mT ) = Wl(mT ), whence
V̂l(mT ) = Ŵl(mT ) = Xl(mT ) = Xl(mT ) = IC∩Dl(mT )∩ IL and X
l (mT ) = X
l (mT ).
2. Vr(mT ) = Vr(mT ), Wr(mT ) = Wr(mT ), whence
V̂r(mT ) = Ŵr(mT ) = Xr(mT ) = Xr(mT ) = CI∩Dr(mT )∩LI and X
r (mT ) = X
r(mT ).
Proof. We prove only “1” as “2” can be proved analogously. Assume the Morita semi-
context mT = (T, S, P,Q,<,>T , I) is injective. By our assumption we have for every
V ∈ Dl(mT ) the commutative diagram
P ⊗S (Q⊗T V )
idP⊗S(α
(P ⊗S Q)⊗T V
<,>T⊗T idV≃
P ⊗S HomT−(P, V )
// V I ⊗T VξI,V
Then it becomes obvious that ωlP,V : P ⊗S HomT (P, V ) → V is an isomorphism if and only
if ξI,V : I ⊗T V → V is an isomorphism. Consequently
V(mT ) = Dl(mT ) ∩ Stat
l(TPS) = Dl(mT ) ∩ IC = V(mT ).
On the other hand, we have for every V ∈ Dl(mT ) the following commutative diagram
HomS−(Q,HomT−(P, V ))
// HomT−(P ⊗S Q, V )
HomS−(Q,Q⊗T V )
// HomT−(I, V )
(<,>T ,V )≃
It follows then that ηlP,L : V → HomS(Q,Q ⊗T P ) is an isomorphism if and only if
ζI,V : V → HomT (I, V ) is an isomorphism. Consequently,
W(mT ) = Dl(mT ) ∩ Adstat
l(TPS) = Dl(mT ) ∩I L = W(mT ).
Moreover, we have
V̂l(mT ) := Vl(mT ) ∩ IL = Vl(mT ) ∩ IL = IC ∩ Dl(mT )∩ IL
= IC ∩Wl(mT ) = IC ∩Wl(mT ) = Ŵl(mT ).
On the other hand, we have
Xl(mT ) = Vl(mT ) ∩Wl(mT ) = Vl(mT ) ∩Wl(mT ) = Xl(mT )
and so the equalities V̂l(mT ) = Ŵl(mT ) = Xl(mT ) = Xl(mT ) and X
l (mT ) = X
l (mT ) are
established.�
In addition to establishing several other equivalences of intersecting subcategories,
the following results reframe the equivalence of categories V̂ ≈ Ŵ in [Nau1994-b, Theorem
4.9.] for an arbitrary (not necessarily compatible) injective Morita datum:
Theorem 5.12. Let M = (T, S, P,Q,<,>T , <,>S, I, J) be an injective Morita datum and
consider the associated Morita semi-contexts MT and MS (5).
1. The following subcategories are mutually equivalent:
V̂l(MT ) = Ŵl(MT ) = Xl(MT ) = Xl(MT ) ≈ Xl(MS) = Xl(MS) = Ŵl(MS) = V̂l(MS).
2. If Vl(MT ) ≤ IL and Wl(MS) ≤ JC, then Vl(MT ) ≈ Wl(MS). If Wl(MT ) ≤ IC
and Vl(MS) ≤ JL, then Wl(MT ) ≈ Vl(MS).
3. The following subcategories are mutually equivalent:
V̂r(MT ) = Ŵr(MT ) = Xr(MT ) = Xr(MT ) ≈ Xr(MS) = Xr(MS) = Ŵr(MS) = V̂r(MS).
4. If Vr(MT ) ≤ LI and Wr(MT ) ≤ CJ , then Vr(MT ) ≈ Wr(MS). If Wr(MT ) ≤ CJ
and Vr(MS) ≤ LI , then Vr(MS) ≈ Wr(MT ).
Proof. By Lemma 5.9, Xl(MT ) ≈ Xl(MS) and so “1” follows by Theorem 5.11. If
Vl(MT ) ≤ IL and Wl(MS) ≤ JC, then we have
Vl(MT ) = Vl(MT ) ∩ IL = V̂l(MT ) ≈ Ŵl(MS) = Wl(MS) ∩ JC = Wl(MS).
On the other hand, if Wl(MT ) ≤ IL and Vl(MS) ≤ JC, then
Wl(MT ) = Wl(MT ) ∩ IC = Ŵl(MT ) ≈ V̂l(MS) = Vl(MS) ∩ JL = Vl(MS).
So we have established “2”. The results in “3” and “4” can be obtained analogously.�
6 More applications
In this final section we give more applications of Morita α-(semi-)contexts and
injective Morita (semi-)contexts. All rings in this section are unital, whence all Morita
(semi-)contexts are unital. Moreover, for any ring T we denote with TE an arbitrary, but
fixed, injective cogenerator in TM.
Notation. Let T be an A-ring. For any left T -module TV, we set
#V := HomT (V, TE). If
moreover, TVS is a (T, S)-bimodule for some B-ring S, then we consider
S V with the left
S-module structure induced by that of VS.
Lemma 6.1. (Compare [Col1990, Lemma 3.2.], [CF2004, Lemmas 2.1.2., 2.1.3.]) Let T be
an A-ring, S a B-ring and TVS a (T, S)-bimodule,
1. A left T -module TK is V -generated if and only if the canonical T -linear morphism
ωlV,K : V ⊗S HomT (V,K) → K (18)
is surjective. Moreover, V ⊗S W ⊆ Pres(TV ) ⊆ Gen(TV ) for every left S-module
2. A left S-module SL is
S V -cogenerated if and only if the canonical S-linear morphism
ηlV,L : L→ HomT (V, V ⊗S L) (19)
is injective. Moreover, HomT (V,M) ⊆ Copres(
S V ) ⊆ Cogen(
S V ) for every left
T -module TM.
Remark 6.2. Let T be an A-ring, S a B-ring and TVS a (T, S)-bimodule. Notice that for
any left S-module SL we have
ann⊗L(VS) := {l ∈ L | V ⊗S l = 0} = Ker(η
V,L),
whence (by Lemma 6.1 “2” ) VS is L-faithful if and only if SL is
S V -cogenerated. It follows
then that VS is completely faithful if and only if
S V is a cogenerator.
Localization and colocalization
In what follows we clarify the relations between static (adstatic) modules and subcate-
gories colocalized (localized) by a trace ideal of a Morita context satisfying the α-condition.
Recall that for any (T, S)-bimodule TPS we have by Lemma 6.1:
Statl(TPS) ⊆ Gen(TP ) and Adstat
l(TPS) ⊆ Cogen(
S P ). (20)
Theorem 6.3. Let M = (T, S, P,Q,<,>T , <,>S, I, J) ∈ UMC. Then we have
IC ⊆ ID ⊆ Gen(TP ). (21)
Assume Pr := (Q,PS) ∈ P
r (S). Then
1. Gen(TP ) = Stat
l(TPS) ⊆ IF.
2. If Gen(TP ) ⊆ IC, then IC = ID = Gen(TP ) = Stat
l(TPS).
3. If Qr := (P,QT ) ∈ P
r (T ), then T I ⊆ TT is pure and IC = ID.
Proof. For every left T -module TK, consider the following diagram with canonical mor-
phisms and let α2 := ζI,K ◦ ω
P,K. It is easy to see that both rectangles and the two right
triangles commutes:
P ⊗S Q⊗T K
idP⊗Sα
<,>T⊗T idK
P ⊗S HomT (P,K)
HomT (P,K)//
HomS(Q,HomT (P,K))
HomT (P ⊗S Q,K)
I ⊗T K
// HomT (I,K)
(<,>T ,K)
It follows directly from the definitions that IC ⊆ ID and Stat
l(TPS) ⊆ Gen(TP ). If TK
is I-divisible, then ξI,K◦ <,>T ⊗T idK = ω
P,K ◦ idP ⊗S α
K is surjective, whence ω
is surjective and we conclude that TK is P -generated by Lemma 6.1 “1”. Consequently,
ID ⊆ Gen(TP ).
Assume now that Pr ∈ P
r (S). Considering the canonical map ρQ : T → End(SQ)
the map ρQ◦ <,>T= α
Q is injective and so the bilinear map <,>T is injective (i.e.
P ⊗S Q
≃ I). Define α1 := (idP ⊗S α
K ) ◦ (<,>T ⊗T idK)
−1, so that the left triangles
commute. Notice that αPr
HomT (P,K)
is injective and the commutativity of the upper right
triangle in Diagram (22) implies that α2 is injective (whence ω
P,K is injective by the
commutativity of the lower right triangle).
1. If K ∈ Statl(TPS), then the commutativity of the lower right triangle (22) and the
injectivity of α2 show that ζI,K is injective; hence, Stat
l(TPS) ⊆ IF. On the other
hand, if TK is P -generated, then ω
P,K is surjective by Lemma 6.1 (1), thence bijective,
i.e. K ∈ Statl(TPS). Consequently, Gen(TP ) = Stat
l(TPS).
2. This follows directly from the inclusions in (21) and “1”.
3. Assume Qr := (P,QT ) ∈ P
r (T ). Since Pr ∈ P
r (S), it follows by analogy to Propo-
sition 2.12 “3” that PS is flat, hence idP ⊗S α
K is injective. The commutativity of
the upper left triangle in Diagram (22) implies then that α1 is injective, thence ξI,K
is injective by commutativity of the lower left triangle (i.e. T I ⊆ TT is K-pure). If
TK is divisible, then K ⊗T I
≃ K (i.e. K ∈ IC).�
Theorem 6.4. Let M = (T, S, P,Q,<,>T , <,>S, I, J) ∈ UMC. Then we have
JL ⊆ JF ⊆ Cogen(
S P ) and Adstat
l(TPS) ⊆ Cogen(
S P ).
Assume Qr := (P,QT ) ∈ P
r (T ). Then
1. JS ⊆ SS is pure and JC ⊆ Cogen(
S P ).
2. If Pr := (Q,PS) ∈ P
r (S), then JL ⊆ Adstat
l(TPS) ⊆ Cogen(
S P ) ⊆ JF.
3. If Pr ∈ P
r (S) and Cogen(
S P ) ⊆ JL, then JL = Cogen(
S P ) = Adstat
l(TPS).
Proof. For every right S-module L consider the commutative diagram with canonical
morphisms and let α3 be so defined, that the left triangles become commutative
J ⊗S L
ξJ,L //
ζJ,L // HomS(J, L)
(<,>S ,L)
HomS(Q⊗T P, L)
≃ can
Q⊗T P ⊗S L
(<,>S)⊗S idL
// HomT (P, P ⊗S L)
HomT (P,HomS(Q,L))
By definition JL ⊆ JF and Adstat
l(TPS) ⊆ Cogen(
S P ). If SL ∈ JF, then ζJ,L is injective
and it follows by commutativity of the right rectangle in Diagram (23) that ηlP,L is injective,
hence SL is
S P -cogenerated by Lemma 6.1 “2”. Consequently, JF ⊆ Cogen(
S P ).
Assume now that Qr ∈ P
r (T ). Then it follows from Lemma 4.3 that <,>S is injective
(hence Q⊗T P
≃ J) and so α4 := (can ◦ (<,>S, L))
−1 ◦ (P, αPrL ) is injective.
1. Since α3 is injective, ξJ,L is also injective for every SL, i.e. JS ⊆ SS is pure. If SL ∈
JC, then it follows from the commutativity of the left rectangle in Diagram (23) that
ηlP,L is injective, hence L ∈ Cogen(
S P ) by Lemma 6.1 (2).
2. Assume that Pr ∈ P
r (S), so that α4 is injective. If SL ∈ JL, then ζJ,L is an
isomorphism, thence ηlP,L is surjective (notice that α4 is injective). Consequently,
JL ⊆ Adstat
l(TPS).
3. This follows directly from the assumptions and “2”.�
∗-Modules
To the end of this section, we fix a unital ring T, a left T -module TP and set S :=
End(TP )
Definition 6.5. ([MO1989]) We call TP a ∗-module, iff Gen(TP ) ≈ Cogen(
S P ).
Remark 6.6. It was shown by J. Trlifaj [Trl1994] that all ∗-modules are finitely generated.
By definition, Statl(TPS) ≤ TM and Adstat
l(TPS) ≤ SM are the largest subcat-
egories between which the adjunction (P ⊗S −,HomT (P,−)) induces an equivalence. On
the other hand, Lemma 6.1 shows that Gen(TP ) ≤ TM and Cogen(
S P ) ≤ SM are the
largest such subcategories (see [Col1990, Section 3] for more details). This suggests the
following observation:
Proposition 6.7. ([Xin1999, Lemma 2.3.]) We have
TP is a ∗ -module ⇔ Stat(TP ) = Gen(TP ) and Adstat(TP ) = Cogen(
S P ).
Definition 6.8. A left T -module TU is said to be
semi-
-quasi-projective (abbr. s-
-quasi-projective), iff for any left T -module
TV ∈ Pres(TU) and any U-presentation
U (Λ) → U (Λ
′) → V → 0
of TV (if any), the following induced sequence is exact:
HomT (U, U
(Λ)) → HomT (U, U
(Λ′)) → HomT (U, V ) → 0;
weakly-
-quasi-projective (abbr. w-
-quasi-projective), iff for any left T -
module TV and any short exact sequence
0 → K → U (Λ
′) → V → 0
with K ∈ Gen(TU) (if any), the following induced sequence is exact:
0 → HomT (U,K) → HomT (U, U
(Λ′)) → HomT (U, V ) → 0;
self-tilting, iff TU is w-
-quasi-projective and Gen(TU) = Pres(TU);∑
-self-static, iff any direct sum U (Λ) is U -static.
(self)-small, iff HomT (U,−) commutes with direct sums (of TU);
Proposition 6.9. Assume M = (T, S, P,Q,<,>T , <,>S) is a unital Morita context.
1. If Pr := (Q,PS) ∈ P
r (S), then:
(a) Gen(TP ) = Stat
l(TPS);
(b) there is an equivalence of categories Gen(TP ) ≈ Cop(
S P );
(c) TP is
-self-static and Statl(TPS) is closed under factor modules.
(d) Gen(TP ) = Pres(TP );
2. If M ∈ UMCαr and Cogen(
S P ) ⊆ JL, then:
(a) Gen(TP ) = Stat
l(TPS) and Cogen(
S P ) = Adstat
l(TPS);
(b) there is an equivalence of categories Cogen(
S P ) ≈ Gen(TP );
(c) TP is a ∗-module;
(d) TP is self-tilting and self-small.
Proof. 1. If Pr ∈ P
r (S), then it follows by Theorem 6.3 that Gen(TP ) = Stat
l(TPS),
which is equivalent to each of “b” and “c” by [Wis2000, 4.4.] and to “d” by [Wis2000,
4.3.].
2. It follows by the assumptions, Theorems 6.3, 6.4 and 5.2 that Gen(TP ) = Stat
l(TPS) ≈
Adstatl(TPS) = Cogen(
S P ), whence Gen(TP ) ≈ Cogen(
S P ) (which is the definition
of ∗-modules). Hence “a” ⇔“b” ⇔“c”. The equivalence “a” ⇔ “d” is evident by
[Wis2000, Corollary 4.7.] and we are done.�
Wide Morita Contexts
Wide Morita contexts were introduced by F. Castaño Iglesias and J. Gómez-Torrecillas
[C-IG-T1995] and [C-IG-T1996] as an extension of classical Morita contexts to Abelian
categories.
Definition 6.10. Let A and B be Abelian categories. A right (left) wide Morita
context between A and B is a datum Wr = (G,A,B, F, η, ρ), where G : A ⇄ B : F are
right (left) exact covariant functors and η : F ◦ G −→ 1A, ρ : G ◦ F −→ 1B (η : 1A −→
F ◦ G, ρ : 1B −→ G ◦ F ) are natural transformations, such that for every pair of objects
(A,B) ∈ A× B the compatibility conditions G(ηA) = ρG(A) and F (ρB) = ηF (B) hold.
Definition 6.11. Let A and B be Abelian categories and W = (G,A,B, F, η, ρ) be a right
(left) wide Morita context. We call W injective (respectively semi-strict, strict), iff η
and ρ are monomorphisms (respectively epimorphisms, isomorphisms)
Remarks 6.12. Let W = (G,A,B, F, η, ρ) be a right (left) wide Morita context.
1. It follows by [CDN2005, Propositions 1.1., 1.4.] that if either η or ρ is an epimorphism
(monomorphism), then W is strict, whence A ≈ B.
2. The resemblance of injective left wide Morita contexts is with the Morita-Takeuchi
contexts for comodules of coalgebras, i.e. the so called pre-equivalence data for cate-
gories of comodules introduced in [Tak1977] (see [C-IG-T1998] for more details).
Injective Right wide Morita contexts
In a recent work [CDN2005, 5.1.], Chifan, et. al. clarified (for module categories) the
relation between classical Morita contexts and right wide Morita contexts. For the conve-
nience of the reader and for later reference, we include in what follows a brief description
of this relation.
6.13. Let T, S be rings, A := TM and B := SM. Associated to each Morita context
M = (T, S, P,Q,<,>T , <,>S) is a wide Morita context as follows: Define G : A ⇄ B : F
by G(−) = Q ⊗T − and F (−) = P ⊗S −. Then there are natural transformations η :
F ◦G −→ 1
and ρ : G ◦ F −→ 1
such that for each TV and WS :
ηV : P ⊗S (Q⊗T V ) → V,
pi ⊗S (qi ⊗T vi) 7→
< pi, qi >T vi,
ρW : Q⊗T (P ⊗S W ) → W,
qi ⊗T (pi ⊗S wi) 7→
< qi, pi >S wi.
Then the datum Wr(M) := (G, TM, SM, F, η, ρ) is a right wide Morita context.
Conversely, let T ′, S ′ be two rings and W ′r = (G
′, T ′M, S′M, F
′, η′, ρ′) be a right wide
Morita context between T ′M and S′M such that the right exact functors G
′ : T ′M ⇄
S′M : F
′ commute with direct sums. By Watts’ Theorems (e.g. [Gol1979]), there exists a
(T, S)-bimodule P ′ (e.g. F ′(S ′)) such that F ′ ≃ P ′⊗S′ −, an (S, T )-bimodule Q
′ such that
G′ ≃ Q′ ⊗T ′ − and there should exist two bilinear forms
<,>T ′: P
′ ⊗S′ Q
′ → T ′ and <,>S′: Q
′ ⊗T ′ P
′ → S ′,
such that the natural transformations η′ : F ′ ◦G′ → 1
, ρ : G′ ◦ F ′ → 1
are given by
η′V ′(p
′ ⊗S′ q
′ ⊗T ′ v
′) =< p′, q′ >T ′ v
′ and ρ′W ′(q
′ ⊗T p
′ ⊗S w
′) =< q′, p′ >S′ w
for all V ′ ∈ T ′M, W
′ ∈ S′M, p
′ ∈ P ′, q′ ∈ Q′, v′ ∈ V ′ and w′ ∈ W ′. It can be shown that
in this way one obtains a Morita context M′ = M′(W ′r) := (T
′, S ′, P ′, Q′, <,>T ′, <,>S′).
Moreover, it turns out that given a wide Morita context Wr, we have Wr ≃ Wr(M(Wr)).
The following result clarifies the relation between injective Morita contexts and
injective right wide Morita contexts.
Theorem 6.14. Let M = (T, S, P,Q,<,>T , <,>S) be a Morita context, A := TM, B :=
SM and consider the induced right wide Morita context Wr(M) := (G,A,B, F, η, ρ).
1. If Wr(M) is an injective right wide Morita context, then M is an injective Morita
context.
2. If M ∈ UMCαr , then Wr(M) is an injective right wide Morita context.
Proof. 1. Let Wr(M) be an injective right wide Morita context. Then in particular,
<,>T= ηT and <,>S= ρS are injective, i.e. M is an injective Morita context.
2. Assume that M satisfies the right α-condition. Suppose there exists some TV and∑
pi ⊗S (qi ⊗T vi) ∈ Ker(ηV ). Then for any q ∈ Q we have
0 = q ⊗T ηV (
(pi ⊗S qi)⊗T vi) =
q⊗T < pi, qi >T vi
q < pi, qi >T ⊗T vi =
< q, pi >S qi ⊗T vi
< q, pi >S (qi ⊗T vi) = α
pi ⊗S (qi ⊗T vi))(q).
Since Pr := (Q,PS) ∈ P
r (S), the morphism α
is injective and so
pi⊗S (qi⊗T
vi) = 0, i.e. ηV is injective. Analogously, suppose
qi ⊗T (pi ⊗S wi) ∈ Ker(ρW ).
Then for any p ∈ P we have
0 = p⊗S ρW (
qi ⊗T (pi ⊗S wi) =
p⊗S < qi, pi >S wi
p < qi, pi >S ⊗Swi =
< p, qi >T pi ⊗S wi
< p, qi >T (pi ⊗S wi) = α
qi ⊗T (pi ⊗S wi))(p).
Since Qr := (P,QT ) ∈ P
r (T ), the morphism α
is injective and so
qi ⊗T
(pi ⊗S wi) = 0, i.e. ρW is injective. Consequently, the induced right wide Morita
context Wr(M) is injective.�
Acknowledgement: The authors thank the referee for his/her careful reading of
the paper and for the fruitful suggestions, comments and corrections, which helped in
improving several parts of the paper. Moreover, they acknowledge the excellent research
facilities as well as the support of their respective institutions, King Fahd University of
Petroleum and Minerals and King AbdulAziz University.
References
[Abr1983] G.D. Abrams, Morita equivalence for rings with local units, Comm. Algebra 11
(1983), 801-837.
[Abu2005] J.Y. Abuhlail, On the linear weak topology and dual pairings over rings, Topol-
ogy Appl. 149 (2005), 161-175.
[AF1974] F. Anderson and K. Fuller, Rings and Categories of Modules, Springer-Verlag
(1974).
[AGH-Z1997] A.V. Arhangélskii, K.R. Goodearl and B. Huisgen-Zimmermann, Kiiti
Morita, (1915-1995 ), Notices Amer. Math. Soc. 44(6) (1997), 680-684.
[AG-TL2001] J.Y. Abuhlail, J. Gómez-Torrecillas and F. Lobillo, Duality and rational
modules in Hopf algebras over commutative rings, J. Algebra 240 (2001), 165-
[Alp1990] J.L. Alperin, Static modules and nonnormal Clifford theory, J. Austral. Math.
Soc. Ser. A 49(3) (1990), 347-353.
[AM1987] P.N. Ánh and L. Márki, Morita equivalence for rings without identity, Tsukuba
J. Math 11 (1987), 1-16.
[Ami1971] S.A. Amitsur, Rings of quotients and Morita contexts, J. Algebra 17 (1971),
273-298.
[Ber2003] I. Berbee, The Morita-Takeuchi theory for quotient categories, Comm. Algebra
31(2) (2003), 843-858.
[C-IG-T1995] F. Castaño Iglesias and J. Gomez-Torrecillas, Wide Morita contexts, Comm.
Algebra 23 (1995), 601-622.
[C-IG-T1996] F. Castaño Iglesias and J. Gomez-Torrecillas, Wide left Morita contexts and
equivalences, Rev. Roum. Math. Pures Appl. 4(1-2) (1996), 17-26.
[C-IG-T1998] F. Castaño Iglesias and J. Gomez-Torrecillas, Wide Morita contexts and
equivalences of comodule categories, J. Pure Appl. Algebra 131 (1998), 213-
[BW2003] T. Brzeziński and R. Wisbauer, Corings and Comodules, Lond. Math. Soc. Lec.
Not. Ser. 309, Cambridge University Press (2003).
[Cae1998] S. Caenepeel, Brauer Groups, Hopf Algebras and Galois Theory, Kluwer Aca-
demic Publishers (1998).
[C-IG-TW2003] F. Castaño Iglesias, J. Gómez-Torrecillas and R. Wisbauer, Adjoint func-
tors and equivalence of subcategories, Bull. Sci. Math. 127 (2003), 279-395.
[CDN2005] N. Chifan, S. Dăscălescu and C. Năstăsescu, Wide Morita contexts, relative
injectivity and equivalence results, J. Algebra 284 (2005), 705-736.
[Col1990] R. Colpi, Some remarks on equivalences between categories of modules, Comm.
Algebra 18 (1990), 1935-1951.
[CF2004] R. Colby and K. Fuller, Equivalence and Duality for Module Categories. With
Tilting and Cotilting for Rings, Cambridge University Press (2004).
[Fai1981] C. Faith, Algebra I, Rings, Modules and Categories, Springer-Verlag (1981).
[Gol1979] J. Golan, An Introduction to Homological Algebra, Academic Press (1979).
[HS1998] Z. Hao and K.-P. Shum, The Grothendieck groups of rings of Morita contexts,
Group theory (Beijing, 1996), 88-97, Springer (1998).
[Kat1978] T. Kato, Duality between colocalization and localization, J. Algebra 55 (1978),
351-374.
[KO1979] T. Kato and K. Ohtake, Morita contexts and equivalences. J. Algebra 61 (1979),
360-366.
[Lam1999] T.Y. Lam, Lectures on Modules and Rings, Springer (1999).
[MO1989] C. Menini and A. Orsatti, Representable equivalences between categories of mod-
ules and applications. Rend. Sem. Mat. Univ. Padova 82 (1989), 203-231.
[Mül1974] B.J. Müller, The quotient category of a Morita context, J. Algebra 28 (1974),
389-407.
[Nau1990a] S.K. Nauman, Static modules, Morita contexts, and equivalences. J. Algebra
135 (1990), 192-202.
[Nau1990b] S.K. Nauman, Static modules and stable Clifford theory, J. Algebra 128(2)
(1990), 497-509.
[Nau1993] S.K. Nauman, Intersecting subcategories of static modules and their equiva-
lences, J. Algebra 155(1) (1993), 252-265.
[Nau1994-a] S.K. Nauman, An alternate criterion of localized modules, J. Algebra 164
(1994), 256-263.
[Nau1994-b] S.K. Nauman, Intersecting subcategories of static modules, stable Clifford the-
ory and colocalization-localization, J. Algebra 170(2) (1994), 400-421.
[Nau2004] S.K. Nauman, Morita similar matrix rings and their Grothendieck groups, Ali-
garh Bull. Math. 23(1-2) (2004), 49-60.
[Sat1978] M. Sato, Fuller’s Theorem of equivalences, J. Algebra 52 (1978), 274-284.
[Tak1977] M. Takeuchi, Morita theorems for categories of comodules, J. Fac. Univ. Tokyo
24 (1977), 629-644.
[Trl1994] J. Trlifaj, Every ∗-module is finitely generated, J. Algebra 169 (1994), 392-398.
[Ver2006] J. Vercruysse, Local units versus local dualisations: corings with local structure
maps, Commun. Algebra 34 (2006), 2079-2103.
[Wis1991] R.Wisbauer, Foundations of Module and Ring Theory, a Handbook for Study
and Research, Gordon and Breach Science Publishers (1991).
[Wis1998] R. Wisbauer, Tilting in module categories, in “Abelian groups, module theory
and topology”, LNPAM 201 (1998), 421-444.
[Wis2000] R. Wisbauer, Static modules and equivalences, in “Interactions between Ring
Theory and Representation Theory”, Ed. V. Oystaeyen, M. Saorin, Marcel
Decker (2000), 423-449.
[Xin1999] Lin Xin, A note on ∗-modules, Algebra Colloq. 6(2) (1999), 231-240.
[Z-H1976] B. Zimmermann-Huisgen, Pure submodules of direct products of free modules,
Math. Ann. 224 (1976), 233-245.
	Introduction
	Preliminaries
	Morita (Semi)contexts
	Injective Morita (Semi-)Contexts
	Equivalences of Categories
	More applications
ABSTRACT
  This paper is an exposition of the so-called injective Morita contexts (in
which the connecting bimodule morphisms are injective) and Morita
$\alpha$contexts (in which the connecting bimodules enjoy some local
projectivity in the sense of Zimmermann-Huisgen). Motivated by situations in
which only one trace ideal is in action, or the compatibility between the
bimodule morphisms is not needed, we introduce the notions of Morita
semi-contexts and Morita data, and investigate them. Injective Morita data will
be used (with the help of static and adstatic modules) to establish
equivalences between some intersecting subcategories related to subcategories
of modules that are localized or colocalized by trace ideals of a Morita datum.
We end up with applications of Morita $\alpha$-contexts to $\ast$-modules and
injective right wide Morita contexts.

<|endoftext|><|startoftext|>
Introduction
	The notations and conventions of charmed baryon
	The 3P0 model
	The strong decays of charmed baryon
	Numerical results
	Discussion and conclusion
	Appendix
	The harmonic oscillator wave functions used in our calculation
	The momentum space integration
	Acknowledgments
	References
ABSTRACT
  There has been important experimental progress in the sector of heavy baryons
in the past several years. We study the strong decays of the S-wave, P-wave,
D-wave and radially excited charmed baryons using the $^3P_0$ model. After
comparing the calculated decay pattern and total width with the available data,
we discuss the possible internal structure and quantum numbers of those charmed
baryons observed recently.

<|endoftext|><|startoftext|>
Introduction
It took thirty-seven years from the discovery of a tiny CP violating effect of order
10−3 inKL → π+π− 1 to a first observation of a breakdown of CP symmetry outside
the strange meson system. A large CP asymmetry of order one between rates of
initial B0 and B̄0 decays to J/ψKS was measured in summer 2001 by the Babar and
Belle Collaborations.2 A sizable however smaller asymmetry had been anticipated
twenty years earlier 3 in the framework of the Kobayashi-Maskawa (KM) model
of CP violation,4 in the absence of crucial information on b quark couplings. The
asymmetry was observed in a time-dependent measurement as suggested,5 thanks
to the long B0 lifetime and the large B0-B̄0 mixing.6 The measured asymmetry,
fixing (in the standard phase convention7) the sine of the phase 2β (≡ 2φ1) ≡
2arg(VtbV
td) of the top-quark dominated B
0-B̄0 mixing amplitude, was found to
be in good agreement with other determinations of Cabibbo-Kobayasi-Maskawa
(CKM) parameters,8,9 including a recent precise measurement of Bs-B̄s mixing.
This showed that the CKM phase γ (≡ φ3) ≡ arg(V ∗ub), which seems to be unable
to account for the observed cosmological baryon asymmetry,11 is the dominant
source of CP violation in flavor-changing processes. With this confirmation the next
pressing question became whether small contributions beyond the CKM framework
occur in CP violating flavor-changing processes, and whether such effects can be
observed in beauty decays.
One way of answering this question is by over-constraining the CKM unitarity
triangle through precise CP conserving measurements related to the lengths of the
∗Based partially on review talks given at recent conferences.
http://arxiv.org/abs/0704.0076v2
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
2 M. Gronau
sides of the triangle. An alternative and more direct way, focusing on the origin
of CP violation in the CKM framework, is to measure β and γ in a variety of B
decay modes. Different values obtained from asymmetries in several processes, or
values different from those imposed by other constraints, could provide clues for
new sources of CP violation and for new flavor-changing interactions. Such phases
and interactions occur in the low energy effective Hamiltonian of extensions of the
Standard Model (SM) including models based on supersymmetry.12
In this presentation we will focus on the latter approach based primarily on
CP asymmetries, using also complementary information on hadronic B decay rates
which are expected to be related to each other in the CKM framework. In the
next section we outline several of the most relevant processes and the theoretical
tools applied for their studies, quoting numerous papers where these ideas have
been originally proposed and where more details can be found.13 Sections 3, 4 and
5 describe a number of methods in some detail, summarizing at the end of each
section the current experimental situation. Section 6 discusses several tests for NP
effects, while Section 7 concludes.
2. Processes, methods and New Physics effects
Whereas testing the KM origin of CP violation in most hadronic B decays requires
separating strong and weak interaction effects, in a few “golden modes” CP asym-
metries are unaffected by strong interactions. For instance, the decay B0 → J/ψKS
is dominated by a single tree-level quark transition b̄ → c̄cs̄, up to a correction
smaller than a fraction of a percent.14,15,16,17 Thus, the asymmetries measured
in this process and in other decays dominated by b̄ → c̄cs̄ have already provided a
rather precise measurement of sin 2β,18,19,20
sin 2β = 0.678± 0.025 . (1)
This value permits two solutions for β at 21.3◦ and at 68.7◦. Time-dependent an-
gular studies of B0 → J/ψK∗0,21 and time-dependent Dalitz analyses of B0 →
Dh0 (D → KSπ+π−, h0 = π0, η, ω)22 measuring cos 2β > 0 have excluded the
second solution at a high confidence level, implying
β = (21.3± 1.0)◦ . (2)
Since B0 → J/ψKS proceeds through a CKM-favored quark transition, contribu-
tions to the decay amplitude from physics at a higher scale are expected to be
very small, potentially identifiable by a tiny direct asymmetry in this process or in
B+ → J/ψK+.23
Another process where the determination of a weak phase is not affected
by strong interactions is B+ → DK+, proceeding through tree-level amplitudes
b̄ → c̄us̄ and b̄ → ūcs̄. The interference of these two amplitudes, from D̄0 and D0
which can always decay to a common hadronic final state, leads to decay rates and
a CP asymmetry which measure very cleanly the relative phase γ between these
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
CP violation in beauty decays 3
amplitudes.24,25 The trick here lies in recognizing the measurements which yield
this fundamental CP-violating quantity. Physics beyond the SM is expected to have
a negligible effect on this determination of γ which relies on the interference of two
tree amplitudes.
B decays into pairs of charmless mesons, such as B → ππ (or B → ρρ) and
B → Kπ (or B → K∗ρ), involve contributions of both tree and penguin ampli-
tudes which carry different weak and strong phases.14,26,27 Contrary to the case
of B → DK, the determination of β and γ using CP asymmetries in charmless B
decays involves two correlated aspects which must be considered: its dependence
on strong interaction dynamics and its sensitivity to potential New Physics (NP)
effects. This sensitivity follows from the CKM and loop suppression of penguin am-
plitudes, implying that new heavy particles at the TeV mass range, replacing the
W boson and the top-quark in the penguin loop, may have sizable effects.28. In
order to claim evidence for physics beyond the SM from a determination of β and
γ in these processes one must handle first the question of dynamics. There are two
approaches for treating the dynamics of charmless hadronic B decays:
(1) Study systematically strong interaction effects in the framework of QCD.
(2) Identify by symmetry observables which do not depend on QCD dynamics.
The first approach faces the difficulty of having to treat precisely long distance
effects of QCD including final state interactions. Remarkable theoretical progress
has been made recently in proving a leading-order (in 1/mb) factorization formula
for these amplitudes in a heavy quark effective theory approach to perturbative
QCD.29,30,31 However, there remain differences between ways of treating in differ-
ent approaches power counting, the scale of Wilson coefficients, end-point quark dis-
tribution functions of light mesons, and nonperturbative contributions from charm
loops.32 Also, the nonperturbative input parameters in these calculations involve
non-negligible uncertainties. These parameters include heavy-to-light form factors
at small momentum transfer, light-cone distribution amplitudes, and the average
inverse momentum fraction of the spectator quark in the B meson. The resulting
inaccuracies in calculating magnitudes and strong phases of amplitudes prohibit a
precise determination of γ from measured decay rates and CP asymmetries. Also,
the calculated rates and asymmetries cannot provide a clear case for physics be-
yond the SM in cases where the results of a calculation deviate slightly from the
measurements.
In the second approach one applies isospin symmetry to obtain relations among
several decay amplitudes. For instance, using the distinct behavior under isospin of
tree and penguin operators contributing in B → ππ, a judicious choice of observ-
ables permits a determination of γ or α (≡ φ2) = π − β − γ. 33 The same analysis
applies in B decays to pairs of longitudinally polarized ρ mesons. In case that an
observable related to the subdominant penguin amplitude is not measured with
sufficient precision, it may be replaced in the analysis by a CKM-enhanced SU(3)-
related observable, in which a large theoretical uncertainty is translated to a small
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
4 M. Gronau
error in γ. The precision of this method is increased by including contributions of
higher order electroweak penguin amplitudes, which are related by isospin to tree
amplitudes.34,35 With sufficient statistics one should also take into account isospin-
breaking corrections of order (md−mu)/ΛQCD ∼ 0.02,36,37 and an effect caused by
the ρ meson width.38 A similar analysis proposed for extracting γ in B → Kπ 39,40
requires using flavor SU(3) instead of isospin for relating electroweak penguin con-
tributions and tree amplitudes.35,41 While flavor SU(3) is usually assumed to be
broken by corrections of order (ms − md)/ΛQCD ∼ 0.3, in this particular case a
rather precise recipe for SU(3) breaking is provided by QCD factorization, reducing
the theoretical uncertainty in γ to only a few degrees.42
Charmless B decays, which are sensitive to physics beyond the SM 28, provide a
rich laboratory for studying various signatures of NP. A large variety of theories have
been studied in this context, including supersymmetric models, models involving
tree-level flavor-changing Z or Z ′ couplings, models with anomalous three-gauge-
boson couplings and other models involving an enhanced chromomagnetic dipole
operator.43,44 The following effects have been studied and will be discussed in
Section 6 in a model-independent manner:
(1) Within the SM, the three values of γ extracted from B → ππ, B → Kπ and
B+ → DK+ are equal. As we will explain, these three values are expected to be
different in extensions of the SM involving new low energy four-fermion operators
behaving as ∆I = 3/2 in B → ππ and as ∆I = 1 in B → Kπ.
(2) Other signatures of anomalously large ∆I = 1 operators contributing to
B → Kπ are violations of isospin sum rules, holding in the SM for both decay rates
and CP asymmetries in these decays.45,46,47
(3) Time-dependent asymmetries in B0 → π0KS , B0 → φKS and B0 → η′KS
and in other b → s penguin-dominated decays may differ substantially from the
asymmetry sin 2β sin∆mt, predicted approximately in the SM.26,43,48 Significant
deviations are expected in models involving anomalous |∆S| = 1 operators behaving
as ∆I = 0 or ∆I = 1.
(4) An interesting question, which may provide a clue to the underlying New
Physics once deviations from SM predictions are observed, is how to diagnose the
value of ∆I in NP operators contributing to |∆S| = 1 charmless B decays. We will
discuss an answer to this question which has been proposed recently.49
3. Determining γ in B → DK
In this section we will discuss in some length a rather rich and very precise method
for determining γ in processes of the form B → D(∗)K(∗), which uses both charged
and neutral B mesons and a large variety of final states. It is based on a broad idea
that any coherent admixture of a state involving a D̄0 from b̄ → c̄us̄ and a state
with D0 from b̄ → ūcs̄ can decay to a common final state.24,25 The interference
between the two channels, B → D(∗)0K(∗), D0 → fD and B → D̄(∗)0K(∗), D̄0 →
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
CP violation in beauty decays 5
fD, involves the weak phase difference γ, which may be determined with a high
theoretical precision using a suitable choice of measurements. Effects of D0-D̄0
mixing are negligible.50 While some of these processes are statistically limited,
combining them together is expected to reduce the experimental error in γ. In
addition to (quasi) two-body B decays, the D or D∗ in the final state may be
accompanied by any multi-body final state with quantum numbers of a kaon.25
Each process in this large class of neutral and charged B decays is characterized
by two pairs of parameters, describing complex ratios of amplitudes for D0 and D̄0
for the two steps of the decay chain (we use a convention rB , rf ≥ 0, 0 ≤ δB, δf <
A(B → D(∗)0K(∗))
A(B → D̄(∗)0K(∗))
= rBe
i(δB+γ) ,
A(D0 → fD)
A(D̄0 → fD)
= rfe
iδf . (3)
In three-body decays ofB andD mesons, such asB → DKπ andD → Kππ, the two
pairs of parameters (rB , δB) and (rf , δf ) are actually functions of two corresponding
Dalitz variables describing the kinematics of the above three-body decays. The
sensitivity of determining γ depends on rB and rf because this determination relies
on an interference of D0 and D̄0 amplitudes. For D decay modes with rf ∼ 1 (see
discussion below) the sensitivity increases with the magnitude of rB.
For each of the eight sub-classes of processes, B+,0 → D(∗)K(∗)+,0, one may
study a variety of final states in neutral D decays. The states fD may be divided
into four families, distinguished qualitatively by their parameters (rf , δf ) defined in
Eq. (3):
(1) fD = CP-eigenstate
24,25,51 (K+K−,KSπ
0, etc.); rf = 1, δf = 0, π.
(2) fD = flavorless but non-CP state
52 (K+K∗−,K∗+K−, etc.); rf = O(1).
(3) fD = flavor state
53 (K+π−,K+π−π0, etc.); rf ∼ tan2 θc.
(4) fD = 3-body self-conjugate state
54 (KSπ
+π−); rf , δf vary across the Dalitz
plane.
In the first family, CP-odd states occur in Cabibbo-favored D0 and D̄0 decays,
while CP-even states occur in singly Cabibbo-suppressed decays. The second family
of states occurs in singly Cabibbo-suppressed decays, the third family occurs in
Cabibbo-favored D̄0 decays and in doubly Cabibbo-suppressed D0 decays, while
the last state is formally a Cabibbo-favored mode for both D0 and D̄0.
The parameters rB and δB in B → D(∗)K(∗) depend on whether the B meson
is charged or neutral, and may differ for K vs K∗,55 and for D vs D∗, where a
neutral D∗ can be observed in D∗ → Dπ0 or D∗ → Dγ.56 The ratio rB involves
a CKM factor |VubVcs/VcbVus| ≃ 0.4 in both B+ and B0 decays, and a color-
suppression factor in B+ decays, while in B0 decays both b̄ → c̄us̄ and b̄ → ūcs̄
amplitudes are color-suppressed. A rough estimate of the color-suppression factor in
these decays may be obtained from the color-suppression measured in corresponding
CKM-favored decays, B → Dπ,D∗π,Dρ,D∗ρ, where the suppression is found to
be in the range 0.3 − 0.5.57 Thus, one expects rB(B0) ∼ 0.4, rB(B+) = (0.3 −
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
6 M. Gronau
0.5)rB(B
0) in all the processes B+,0 → D(∗)K(∗)+,0. We note that three-body B+
decays, such as B+ → D0K+π0, are not color-suppressed, making these processes
advantageous by their potentially large value of rB which varies in phase space.
58,59
The above comparison of rB(B
+) and rB(B
0) may be quantified more precisely
by expressing the four ratios rB(B
0)/rB(B
+) in B → D(∗)K(∗) in terms of recip-
rocal ratios of known magnitudes of amplitudes:60
0 → D(∗)K(∗)0)
rB(B+ → D(∗)K(∗)+)
B+ → D̄(∗)0K(∗)+)
B0 → D̄(∗)0K(∗)0)
. (4)
This follows from an approximation,
A(B0 → D(∗)0K(∗)0) ≃ A(B+ → D(∗)0K(∗)+) , (5)
where the B0 and B+ processes are related to each other by replacing a spectator
d quark by a u quark. While formally Eq. (5) is not an isospin prediction, it may
be obtained using an isospin triangle relation,61
A(B0 → D(∗)0K(∗)0) = A(B+ → D(∗)0K(∗)+) +A(B+ → D(∗)+K(∗)0), (6)
and neglecting the second amplitude on the right-hand-side which is “pure
annihilation”.62 This amplitude is expected to be suppressed by a factor of four or
five relative to the other two amplitudes appearing in (6) which are color-suppressed.
Evidence for this kind of suppression is provided by corresponding ratios of CKM-
favored amplitudes,57 |A(B0 → D−s K+)/
2A(D̄0π0)| = 0.23 ± 0.03, |A(B0 →
D∗−s K
2A(D̄∗0π0)| < 0.24.
Applying Eq. (4) to measured branching ratios,57,63 one finds
rB(B+)
B → DK B → DK∗ B → D∗K B → D∗K∗
2.9± 0.4 3.7± 0.3 > 2.2 > 3.0 (7)
This agrees with values of rB(B
0) near 0.4 and rB(B
+) between 0.1 and 0.2. Note
that in spite of the expected larger values of rB in B
0 decays, from the point
of view of statistics alone (without considering the question of flavor tagging and
the efficiency of detecting a KS in B
0 → D(∗)K0), B+ and B0 decays may fare
comparably when studying γ. This follows from (5) because the statistical error on
γ scales roughly as the inverse of the smallest of the two interfering amplitudes.
We will now discuss the actual manner by which γ can be determined using
separately three of the above-mentioned families of final states fD. We will men-
tion advantages and disadvantages in each case. For illustration of the method we
will consider B+ → fDK+. We will also summarize the current status of these
measurements in all eight decay modes B+,0 → D(∗)K(∗)+,0.
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
CP violation in beauty decays 7
3.1. fD = CP-eigenstates
One considers four observables consisting of two charge-averaged decay rates for
even and odd CP states, normalized by the decay rate into a D0 flavor state,
RCP± ≡
Γ(DCP±K
−) + Γ(DCP±K
Γ(D0K−)
, (8)
and two CP asymmetries for even and odd CP states,
ACP± ≡
Γ(DCP±K
−)− Γ(DCP±K+)
Γ(DCP±K−) + Γ(DCP±K+)
. (9)
In order to avoid dependence of RCP± on errors in D
0 and DCP branching ratio
measurements one uses a definition of RCP± in terms of ratios of B decay branching
ratios intoDK andDπ final states.59 The four observablesRCP± and ACP± provide
three independent equations for rB, δB and γ,
RCP± = 1 + r
B ± 2rB cos δB cos γ , (10)
ACP± = ±2rB sin δB sin γ/RCP± . (11)
While in principle this is the simplest and most precise method for extracting
γ, up to a discrete ambiguity, in practice this method is sensitive to r2B , because
(RCP+ + RCP−)/2 = 1 + r
B . This becomes very difficult for charged B decays
where one expects rB ∼ 0.1− 0.2, but may be feasible for neutral B decays where
rB ∼ 0.4. An obvious signature for a non-zero value of rB would be observing a
difference between RCP+ and RCP− which is linear in this quantity.
Studies of B+ → DCPK+, B+ → DCPK∗+ and B+ → D∗CPK+ have been car-
ried out recently,64,65,66 each consisting of a few tens of events. A nonzero difference
RCP+ −RCP− at 2.6 standard deviations, measured in B+ → DCPK∗+,64 is prob-
ably a statistical fluctuation. A larger difference is anticipated in B0 → DCPK∗0,
as the value of rB in this process is expected to be three or four times larger than
in B+ → DK∗+. [See Eq. (7).] Higher statistics is required for a measurement of γ
using this method.
3.2. fD = flavor state
Consider a flavor state fD in Cabibbo-favored D̄
0 decays, accessible also to doubly
Cabibbo-suppressed D0 decays, such that one has rf ∼ tan2 θc in Eq. (3). One
studies the ratio of two charge-averaged decay rates, for decays into f̄DK and fDK,
Γ(fDK
−) + Γ(f̄DK
Γ(f̄DK−) + Γ(fDK+)
, (12)
and the CP asymmetry,
Γ(fDK
−)− Γ(f̄DK+)
Γ(fDK−) + Γ(f̄DK+)
. (13)
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
8 M. Gronau
These observables are given by
Rf = r
B + r
f + 2rB rf cos(δB − δf ) cos γ , (14)
Af = 2r rf sin(δB − δf ) sin γ/Rf , (15)
where a multiplicative correction 1 +O(rBrf ) ∼ 1.01 has been neglected in (14).
These two observables involve three unknowns, rB , δB − δf and γ. One assumes
rf to be given by the measured ratio of doubly Cabibbo-suppressed and Cabibbo-
favored branching ratios. Thus, one needs at least two flavor states, fD and f
for which two pairs of observables (Rf , Af ) and (Rf ′ , Af ′) provide four equations
for the four unknowns, rB, δB − δf , δB − δf ′ , γ. The strong phase differences δf , δf ′
can actually be measured at a ψ′′ charm factory,67 thereby reducing the number of
unknowns to three.
While the decay rate in the numerator of Rf is rather low, the asymmetry Af
may be large for small values of rB around 0.1, as it involves two amplitudes with
a relative magnitude rf/rB.
So far, only upper bounds have been measured for Rf implying upper limits
on rB in several processes, rB(B
+ → DK+) < 0.2,68,69,70 rB(B+ → D∗K+) <
0.2,68 r(B+ → DK∗+) < 0.4,71 and rB(B0 → DK∗0) < 0.4.63,72 Further con-
straints on rB in the first three processes have been obtained by studying D decays
into CP-eigenstates and into the state KSπ
+π−. Using rB(B
0 → DK∗0)/rB(B+ →
DK∗+) = 3.7 ± 0.3 in (7) and assuming that rB(B+ → DK∗+) is not smaller
than about 0.1, one may conclude that a nonzero measurement of rB(B
0 → DK∗0)
should be measured soon. The signature for B0 → D0K∗0 events would be two
kaons with opposite charges.
3.3. fD = KSπ
The amplitude for B+ → (KSπ+π−)DK+ is a function of the two invariant-mass
variables, m2
≡ (pKS + pπ±)2, and may be written as
A(B+ → (KSπ+π−)DK+) = f(m2+,m2−) + rBei(δB+γ)f(m2−,m2+) . (16)
In B− decay one replaces m+ ↔ m−, γ → −γ. The function f may be written as a
sum of about twenty resonant and nonresonant contributions modeled to describe
the amplitude for flavor-tagged D̄0 → KSπ+π− which is measured separately.73,74
This introduces a model-dependent uncertainty in the analysis. Using the measured
function f as an input and fitting the rates for B± → (KSπ+π−)DK± to the
parameters, rB , δB and γ, one then determines these three parameters.
The advantage of using D → KSπ+π− decays over CP and flavor states is
being Cabibbo-favored and involving regions in phase space with a potentially large
interference between D0 and D̄0 decay amplitudes. The main disadvantage is the
uncertainty introduced by modeling the function f .
Two recent analyses of comparable statistics by Belle and Babar, combining
B± → DK±, B± → D∗K± and B± → DK∗±, obtained values 73 γ = [53+15
−18 ± 3±
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
CP violation in beauty decays 9
9(model)]◦ and γ = [92±41±11±12(model)]◦.74 [This second value does not use the
process B+ → D(KSπ+)K∗ , also studied by the same group,75.] The larger errors in
the second analysis are correlated with smaller values of the extracted parameters rB
in comparison with those extracted in the first study. The model-dependent errors
may be reduced by studying at CLEO-c the decays DCP± → KSπ+π−, providing
further information on strong phases in D decays.67
Conclusion: The currently most precise value of γ is γ = [53+15
−18 ± 3± 9(model)]◦,
obtained from B± → D(∗)K(∗)± using D → KSπ+π−. These errors may be reduced
in the future by combining the study of all D decay modes in B+,0 → D(∗)K(∗)+,0.
The decay B0 → DK∗0 seems to carry a high potential because of its expected
large value of rB. Decays B
0 → D(∗)K0 may also turn useful, as they have been
shown to provide information on γ without the need for flavor tagging of the initial
B0.60,76
4. The currently most precise determination of γ: B → ππ, ρρ, ρπ
4.1. B → ππ
The amplitude for B0 → π+π− contains two terms, conventionally denoted “tree”
(T ) and “penguin” (P ) amplitudes, 14,26 involving a weak CP-violating phase γ
and a strong CP-conserving phase δ, respectively:
A(B0 → π+π−) = |T |eiγ + |P |eiδ . (17)
Time-dependent decay rates, for an initial B0 or a B
, are given by
Γ(B0(t)/B
(t) → π+π−) = e−ΓtΓπ+π− [1± C+− cos∆mt∓ S+− sin∆mt] , (18)
where
S+− =
2Im(λππ)
1 + |λππ|2
, C+− =
1− |λππ |2
1 + |λππ |2
, λππ ≡ e−2iβ
0 → π+π−)
A(B0 → π+π−)
. (19)
One has14
S+− = sin 2α+ 2|P/T | cos2α sin(β + α) cos δ +O(|P/T |2) ,
C+− = 2|P/T | sin(β + α) sin δ +O(|P/T |2) . (20)
This tells us two things:
(1) The deviation of S+− from sin 2α and the magnitude of C+− increase with
|P/T |, which can be estimated to be |P/T | ∼ 0.5 by comparing B → ππ rates with
penguin-dominated B → Kπ rates.77
(2) Γπ+π− , S+− and C+− are insufficient for determining |T |, |P |, δ and γ (or α).
Further information on these quantities may be obtained by applying isospin sym-
metry to all B → ππ decays.
In order to carry out an isospin analysis,33 one uses the fact that the three
physical B → ππ decay amplitudes and the three B̄ → ππ decay amplitudes,
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
10 M. Gronau
depending each on two isospin amplitudes, obey triangle relations of the form,
A(B0 → π+π−)/
2 +A(B0 → π0π0)−A(B+ → π+π0) = 0 . (21)
Furthermore, the penguin amplitude is pure ∆I = 1/2; hence the ∆I = 3/2 am-
plitude carries a week phase γ, A(B+ → π+π0) = e2iγA(B− → π−π0). Defin-
ing sin 2αeff ≡ S+−/(1 − C2+−)1/2, the difference αeff − α is then determined by
an angle between corresponding sides of the two isospin triangles sharing a com-
mon base, |A(B+ → π+π0)| = |A(B− → π−π0)|. A sign ambiguity in αeff − α is
resolved by two model-independent features which are confirmed experimentally,
|P |/|T | ≤ 1, |δ| ≤ π/2. This implies α < αeff .78
Table I. Branching ratios and CP asymmetries in B → ππ, B → ρρ.
Decay mode Branching ratio (10−6) ACP = −C S
B0 → π+π− 5.16± 0.22 0.38 ± 0.07 −0.61± 0.08
B+ → π+π0 5.7± 0.4 0.04 ± 0.05
B0 → π0π0 1.31± 0.21 0.36
+0.33
−0.31
B0 → ρ+ρ− 23.1
0.11 ± 0.13 −0.06± 0.18
B+ → ρ+ρ0 18.2± 3.0 −0.08± 0.13
B0 → ρ0ρ0 1.07± 0.38
Current CP-averaged branching ratios and CP asymmetries for B → ππ and
B → ρρ decays are given in Table I,20 where ACP ≡ −C for decays to CP eigen-
states. An impressive experimentally progress has been achieved in the past two
years in extracting a precise value for αeff , αeff = (110.6
−3.2)
◦. However, the er-
ror on αeff − α using the isospin triangles is still large. An upper bound, given by
CP-averaged rates and a direct CP asymmetry in B0 → π+π−,79,80
cos 2(αeff − α) ≥
Γ+− + Γ+0 − Γ00
)2 − Γ+−Γ+0
Γ+−Γ+0
1− C2+−
, (22)
leads to 0 < αeff − α < 31◦ at 1σ. Adding in quadrature the error in αeff and the
uncertainty in α−αeff , this implies α = (95± 16)◦ or γ = (64± 16)◦ by . A similar
central value but a smaller error, α = (97± 11)◦, has been reported recently by the
Belle Collaboration.81 The possibility that a penguin amplitude in B0 → π+π−
may lead to a large CP asymmetry S for values of α near 90◦ where sin 2α = 0 was
anticipated fifteen years ago.82
The bound on αeff − α may be improved considerably by measuring a nonzero
direct CP asymmetry in B0 → π0π0. This asymmetry can be shown to be large and
positive (see Eq. (46) in Sec. 5.2), implying a large rate for B̄0 but a small rate for
B0. Namely, the triangle (21) is expected to be squashed, while the B̄ triangle is
roughly equal sided.
An alternative way of treating the penguin amplitude in B0 → π+π− is by
combining within flavor SU(3) the decay rate and asymmetries in this process with
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
CP violation in beauty decays 11
rates and asymmetries in B0 → K0π+ or B0 → K+π−.77 The ratio of ∆S = 1 and
∆S = 0 tree amplitudes in these processes, excluding CKM factors, is taken to be
given by fK/fπ assuming factorization, while the ratio of corresponding penguin
amplitudes is allowed to vary by ±0.22 around one. A current update of this rather
conservative analysis obtains 83
γ = (73± 4+10
◦ , (23)
where the first error is experimental, while the second one is due to an uncertainty
in SU(3) breaking. A discussion of SU(3) breaking factors relating B0 → π+π− and
B0 → K+π− is included in Section 5.2.
4.2. B → ρρ
Angular analyses of the pions in ρ decays have shown that B0 → ρ+ρ− is dominated
almost 100% by longitudinal polarization 20. This simplifies the isospin analysis of
CP asymmetries in these decays to becoming similar to B0 → π+π−. The advantage
of B → ρρ over B → ππ is the relative small value of (
ρ0ρ0) in comparison with
ρ+ρ−) and (
ρ+ρ0) (see Table I), indicating a smaller |P/T | in B → ρ+ρ− (|P/T | <
0.3 8) than in B0 → π+π− (|P/T | ∼ 0.5 77). Eq. (22) leads to an upper bound on
αeff − α in B → ρρ, 0 < αeff − α < 17◦ (at 1σ). The asymmetries for longitudinal
ρ’s given in Table I imply αeff = (91.7
−5.2)
◦. Thus, one finds α = (83 ± 10)◦ or
γ = (76± 10)◦ by adding errors in quadrature.
A stronger bound on |P/T | in B0 → ρ+ρ−, leading to a more precise value of γ,
may be obtained by relating this process to B+ → K∗0ρ+ within flavor SU(3). 84
One uses the branching ratio and fraction of longitudinal rate measured for this
process 20, (
K∗0ρ+) = (9.2 ± 1.5) × 10−6, fL(K∗0ρ+) = 0.48 ± 0.08, to normalize
the penguin amplitude in B0 → ρ+ρ−. Including a conservative uncertainty from
SU(3) breaking and smaller amplitudes, one finds a value
γ = (71.4+5.8
−1.7)
◦ , (24)
where the first error is experimental and the second one theoretical.
The current small theoretical error in γ requires including isospin breaking effects
in studies based on isospin symmetry. The effect of electroweak penguin amplitudes
on the isospin analyses of B → ππ and B → ρρ has been calculated and was found
to move γ slightly higher by an amount ∆γEWP = 1.5
◦.34,35 Other corrections,
relevant to methods using π0 and ρ0, includng π0-η-η′ mixing, ρ-ω mixing, and a
small I = 1 ρρ contribution allowed by the ρ-width, are each smaller than one
degree.36,37,38
Conclusion: Taking an average of the two values of γ in (23) and (24) obtained
from B0 → π+π− and B0 → ρ+ρ−, and including the above-mentioned EWP
correction, one finds
γ = (73.5± 5.7)◦ . (25)
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
12 M. Gronau
A third method of measuring γ (or α) in time-dependent Dalitz analyses of B0 →
(ρπ)0 involves a much larger error,85 and has a small effect on the overall averaged
value of the weak phase. We note that sin γ is close to one and its relative error
is only 3%, the same as the relative error in sin 2β and slightly smaller than the
relative error in sinβ.
5. Rates, asymmetries, and γ in B → Kπ
5.1. Extracting γ in B → Kπ
The four decays B0 → K+π−, B0 → K0π0, B+ → K0π+, B+ → K+π0 involve
a potential for extracting γ, provided that one is sensitive to interference between
a dominant isoscalar penguin amplitude and a small tree amplitude contributing
to these processes. This idea has led to numerous suggestions for determining γ in
these decays starting with a proposal made in 1994.86,87 An interference between
penguin and tree amplitudes may be identified in two ways:
(1) Two different properly normalized B → Kπ rates.
(2) Nonzero direct CP asymmetries.
Table II. Branching ratios and asymmetries in B → Kπ.
Decay mode Branching ratio (10−6) ACP
B0 → K+π− 19.4± 0.6 −0.097± 0.012
B+ → K+π0 12.8± 0.6 0.047± 0.026
B+ → K0π+ 23.1± 1.0 0.009± 0.025
B0 → K0π0 10.0± 0.6 −0.12± 0.11
Current branching ratios and CP asymmetries are summarized in Table II.20 Three
ratios of rates, calculated using the ratio of B+ and B0 lifetimes, τ+/τ0 = 1.076±
0.008,20 are:
R ≡ Γ(B
0 → K+π−)
Γ(B+ → K0π+)
= 0.90± 0.05 ,
2Γ(B+ → K+π0)
Γ(B+ → K0π+)
= 1.11± 0.07 ,
Γ(B0 → K+π−)
2Γ(B0 → K0π0) = 0.97± 0.07 . (26)
The largest deviation from one, observed in the ratio R at 2σ, is insufficient for
claiming unambiguous evidence for a non-penguin contribution. An upper limit,
R < 0.965 at 90% confidence level, would imply γ ≤ 79◦ using sin2 γ ≤ R,88
which neglects however “color-suppressed” EWP contributions.89 As we will argue
now, these contributions and “color-suppressed” tree amplitudes are actually not
suppressed as naively expected.
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
CP violation in beauty decays 13
The nonzero asymmetry measured in B0 → K+π− provides first evidence for
an interference between penguin (P ′) and tree (T ′) amplitudes with a nonzero rel-
ative strong phase. Such an interference occurs also in B+ → K+π0 where no
asymmetry has been observed. An assumption that other contributions to the lat-
ter asymmetry are negligible has raised some questions about the validity of the
CKM framework. In fact, a color-suppressed tree amplitude (C′), also occurring in
B+ → K+π0,86 resolves this “puzzle” if this amplitude is comparable in magnitude
to T ′. Indeed, several studies have shown that this is the case,90,91,92,93,94 also im-
plying that color-suppressed and color-favored EWP amplitudes are of comparable
magnitudes.35 For consistency between the two CP asymmetries in B0 → K+π−
and B+ → K+π0, the strong phase difference between C′ and T ′ must be negative
and cannot be very small.95 This seems to stand in contrast to QCD calculations
using a factorization theorem.29,31,94
The small asymmetry ACP (B
+ → K+π0) implies bounds on the sine of the
strong phase difference δc between T
′ +C′ and P ′. The cosine of this phase affects
Rc − 1 involving the decay rates for B+ → K0π+ and B+ → K0π+. A question
studied recently is whether the two upper bounds on | sin δc| and | cos δc| are con-
sistent with each other or, perhaps, indicate effects of NP. Consistency was shown
by proving a sum rule involving ACP (B
+ → K+π0) and Rc − 1, in which an elec-
troweak penguin (EWP) amplitude plays an important role. We will now present a
proof of the sum rule, which may provide important information on γ.95
The two amplitudes for B+ → K0π+,K+π0 are given in terms of topological
contributions including P ′, T ′ and C′,
A(B+ → K0π+) = (P ′ − 1
P ′cEW ) +A
A(B+ → K+π0) = (P ′ − 1
P ′cEW ) + (T
′ + P ′cEW ) + (C
′ + P ′EW ) +A
′ , (27)
where P ′EW and P
EW are color-favored and color-suppressed EWP contributions.
The small annihilation amplitude A′ and a small u quark contribution to P ′ involv-
ing a CKM factor V ∗ubVus will be neglected (|V ∗ubVus|/|V ∗cbVcs| = 0.02). Evidence for
the smallness of these terms can be found in the small CP asymmetry measured for
B+ → K0π+. Large terms would require rescattering and a sizable strong phase
difference between these terms and P ′.
Flavor SU(3) symmetry relates ∆I = 1, I(Kπ) = 3/2 electroweak penguin and
tree amplitudes through a calculable ratio δEW
35,41,
T ′ + C′ + P ′EW + P
EW = (T
′ + C′)(1 − δEW e−iγ) ,
δEW = −
c9 + c10
c1 + c2
|V ∗tbVts|
|V ∗ubVus|
= 0.60± 0.05 . (28)
The error in δEW is dominated by the current uncertainty in |Vub|/|Vcb| = 0.104±
0.007 57, including also a smaller error from SU(3) breaking estimated using QCD
factorization. Eqs. (27) and (28) imply 96
Rc = 1− 2rc cos δc(cos γ − δEW) + r2c (1− 2δEW cos γ + δ2EW) , (29)
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
14 M. Gronau
ACP (B
+ → K+π0) = −2rc sin δc sin γ/Rc , (30)
where rc ≡ |T ′ + C′|/|P ′ − 13P
EW | and δc is the strong phase difference between
T ′ + C′ and P ′ − 1
P ′cEW .
The parameter rc is calculable in terms of measured decay rates, using bro-
ken flavor SU(3) which relates T ′ + C′ and T + C dominating B+ → π+π0 by a
factorization factor fK/fπ (neglecting a tiny EWP term in B
+ → π+π0),87
|T ′ + C′| =
|A(B+ → π+π0)| . (31)
Using branching ratios from Tables I and II, one finds
B+ → π+π0)
B+ → K0π+)
= 0.198± 0.008 . (32)
The error in rc does not include an uncertainty from assuming factorization for
SU(3) breaking in T ′ + C′. While this assumption should hold well for T ′, it may
not be a good approximation for C′ which as we have mentioned is comparable in
magnitude to T ′ and carries a strong phase relative to it. Thus one should allow a
10% theoretical error when using factorization for relating B → Kπ and B → ππ
T + C amplitudes, so that
rc = 0.20± 0.01 (exp)± 0.02 (th) . (33)
Eliminating δc in Eqs. (29) and (30) by retaining terms which are linear in rc,
one finds
Rc − 1
cos γ − δEW
ACP (B
+ → K+π0)
sin γ
= (2rc)
2 +O(r3c ) . (34)
This sum rule implies that at least one of the two terms whose squares occur on
the left-hand-side must be sizable, of the order of 2rc = 0.4. The second term,
|ACP (B+ → K+π0)|/ sin γ, is already smaller than ≃ 0.1, using the current 2σ
bounds on γ and |ACP (B+ → K+π0)|. Thus, the first term must provide a dominant
contribution. For Rc ≃ 1, this implies γ ≃ arccos δEW ≃ (53.1± 3.5)◦. This range is
expanded by including errors in Rc and ACP (B
+ → K+π0). For instance, an upper
bound Rc < 1.1 would imply an inportant upper limit, γ < 70
◦. Currently one only
obtains an upper limit γ ≤ 88◦ at 90% confidence level.95 This bound is consistent
with the value obtained in (25) from B → ππ and B → ρρ, but is not competitive
with the latter precision.
Conclusion: The current constraint obtained from Rc and ACP (B
+ → K+π0) is
γ ≤ 88◦ at 90% confidence level. Further improvement in the measurement of Rc
(which may, in fact, be very close to one) is required in order to achieve a precision
in γ comparable to that obtained in B → ππ, ρρ. (A conclusion concerning the
different CP asymmetries measured in B0 → K+π− and B+ → K+π0 will be given
at the end of the next subsection.)
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
CP violation in beauty decays 15
5.2. Symmetry relations for B → Kπ rates and asymmetries
The following two features imply rather precise sum rules in the CKM framework,
both for B → Kπ decay rates and CP asymmetries:
(1) The dominant penguin amplitude is ∆I = 0.
(2) The four decay amplitudes obey a linear isospin relation,39
A(K+π−)−A(K0π+)−
2A(K+π0) +
2A(K0π0) . (35)
An immediate consequence of these features are two isospin sum rules, which
hold up to terms which are quadratic in small ratios of non-penguin to penguin
amplitudes,45,46,47
Γ(K+π−) + Γ(K0π+) = 2Γ(K+π0) + 2Γ(K0π0) , (36)
∆(K+π−) + ∆(K0π+) = 2∆(K+π0) + 2∆(K0π0) , (37)
where
∆(Kπ) ≡ Γ(B̄ → K̄π̄)− Γ(B → Kπ) . (38)
Quadratic corrections to (36) have been calculated in the SM and were found to
be a few percent.97,98,99 This is the level expected in general for isospin-breaking
corrections which must therefore also be considered. The above two features imply
that these ∆I = 1 corrections are suppressed by a small ratio of non-penguin to
penguin amplitudes and are therefore negligible.100 Indeed, this sum rule holds
experimentally within a 5% error.101 One expects the other sum rule (37) to hold
at a similar precision.
The CP rate asymmetry sum rule (37), relating the four CP asymmetries, leads
to a prediction for the asymmetry in B0 → K0π0 in terms of the other three
asymmetries which have been measured with higher precision,
ACP (B
0 → K0π0) = −0.140± 0.043 . (39)
While this value is consistent with experiment (see Table II), higher accuracy in
this asymmetry measurement is required for testing this straightforward prediction.
Relations between CP asymmetries in B → Kπ and B → ππ following from
approximate flavor SU(3) symmetry of QCD 102 are not expected to hold as pre-
cisely as isospin relations, but may still be interesting and useful. An important
question relevant to such relations is how to include SU(3)-breaking effects, which
are expected to be at a level of 20-30%. Here we wish to discuss two SU(3) rela-
tions proposed twelve years ago,103,104 one of which holds experimentally within
expectation, providing some lesson about SU(3) breaking, while the other has a an
interesting implication for future applications of the isospin analysis in B → ππ.
A most convenient proof of SU(3) relations is based on using a diagramatic
approach, in which diagrams with given flavor topologies replace reduced SU(3)
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
16 M. Gronau
matrix elements.86 In this language, the amplitudes for B0 decays into pairs of
charged or neutral pions, and pairs of charged or neutral π and K, are given by:
−A(B0 → π+π−) = T + (P + 2P cEW /3) + E + PA ,
2A(B0 → π0π0) = C − (P − PEW − P cEW /3)− E − PA ,
−A(B0 → K+π−) = T ′ + (P ′ + 2P ′cEW /3) ,
2A(B0 → K0π0) = C′ − (P ′ − P ′EW − P ′cEW /3) . (40)
The combination E + PA, representing exchange and penguin annihilation topolo-
gies, is expected to be 1/mb-suppressed relative to T and C,
31,62 as demonstrated
by the small branching ratio measured for B0 → K+K−.20 This term will be
neglected.
Expressing topological amplitudes in terms of CKM factors, SU(3)-invariant
amplitudes and SU(3) invariant strong phases, one may write
T ≡ V ∗ubVud|T + Puc| , P + 2P cEW /3 ≡ V ∗tbVtd|Ptc|eiδ ,
T ′ ≡ V ∗ubVus|T + Puc| , P ′ + 2P ′cEW /3 ≡ V ∗tbVts|Ptc|eiδ , (41)
C ≡ V ∗ubVud|C − Puc| , P − PEW − P cEW /3 ≡ V ∗tbVtd|P̃tc|eiδ̃ ,
C′ ≡ V ∗ubVus|C − Puc| , P ′ − P ′EW − P ′cEW /3 ≡ V ∗tbVts|P̃tc|eiδ̃ .
Unitarity of the CKM matrix, V ∗cbVcd(s) = −V ∗tbVtd(s) − V ∗ubVud(s), has been used
to absorb in T (
′) and C(
′) a penguin term Puc ≡ Pu − Pc multiplying V ∗ubVud(s),
while Ptc ≡ Pt − Pc and P̃tc ≡ P̃t − P̃c contain two distinct combinations of EWP
contributions. Using the identity
Im (V ∗ubVudVtbV
td) = −Im (V ∗ubVusVtbV ∗ts) , (42)
one finds103,104
∆(B0 → K+π−) = −∆(B0 → π+π−) (43)
∆(B0 → K0π0) = −∆(B0 → π0π0) , (44)
where ∆ is the CP rate difference defined in (38).
Quoting products of branching ratios and asymmetries from Tables I and II,
Eq. (43) reads
− 1.88± 0.24 = −1.96± 0.37 . (45)
This SU(3) relation works well and requires no SU(3)-breaking. An SU(3) breaking
factor fK/fπ in T but not in P , or in both T and P , are currently excluded at a
level of 1.0σ, or 1.75σ. More precise CP asymmetry measurements in B0 → K+π−
and B0 → π+π− are required for determining the pattern of SU(3) breaking in tree
and penguin amplitudes.
Using the prediction (39) of the B → Kπ asymmetry sum rule, Eq. (44) predicts
ACP (B
0 → π0π0) = 1.07± 0.38 . (46)
The error is dominated by current errors in CP asymmetries for B+ → K0π+
and B+ → K+π0, and to a less extent by the error in (
π0π0). SU(3) breaking in
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
CP violation in beauty decays 17
amplitudes could modify this prediction by a factor fπ/fK if this factor applies to
C, and less likely by (fπ/fK)2. A large positive CP asymmetry, favored in all three
cases, will affect future applications of the isospin analysis in B → ππ. It implies
that while the B̄ isospin triangle is roughly equal-sided, the B triangle is squashed.
A twofold ambiguity in the value of γ disappears in the limit of a flat B triangle.24
Conclusion: The isospin sum rule for B → Kπ decay rates holds well, while the
CP asymmetry sum rule predicts ACP (B
0 → K0π0) = −0.140±0.043. The different
asymmetries in B0 → K+π− and B+ → K+π0 can be explained by an amplitude
C′ comparable to T ′ and involving a relative negative strong phase, and should
not be considered a “puzzle”. An SU(3) relation for B0 → ππ and B0 → Kπ CP
asymmetries works well for charged modes. The corresponding relation for neutral
modes predicts a large positive asymmetry in B0 → π0π0. Improving asymmetry
measurements can provide tests for SU(3) breaking factors.
6. Tests for small New Physics effects
6.1. Values of γ
We have described three ways for extracting a value for γ relying on interference
of distinct pairs of quark amplitudes, (b → cūs, b → uc̄s), (b → cc̄s, b → uūs) and
(b → cc̄d, b → uūd). The three pairs provide a specific pattern for CP violation in
the CKM framework, which is expected to be violated in many extensions of the
SM. The rather precise value of γ (25) extracted from B → ππ, ρρ, ρπ is consistent
with constraints on γ from CP conserving measurements related to the sides of the
unitarity triangle.8,9 The values of γ obtained in B → D(∗)K(∗) and B → Kπ
are consistent with those extracted in B → ππ, ρρ, ρπ, but are not yet sufficiently
precise for testing small NP effects in charmless B decays. Further experimental
improvements are required, in particular in the former two types of processes.
While the value of γ in B → D(∗)K∗) is not expected to be affected by NP,
the other two classes of processes involving penguin loops are susceptible to such
effects. The extraction of γ in B → ππ ρρ assumes that γ is the phase of a ∆I =
3/2 tree amplitude, while an additional ∆I = 3/2 EWP contribution is included
using isospin. The extracted value could be modified by a new ∆I = 3/2 effective
operator originating in physics beyond the SM, but not by a new ∆I = 1/2 operator.
Similarly, the value of γ extracted in B → Kπ is affected by a potential new ∆I = 1
operator, but not by a new ∆I = 0 operator, because the amplitude (28), playing
an essential role in this method, is pure ∆I = 1.
6.2. B → Kπ sum rule
Charmless |∆S| = 1 B and Bs decays are particularly sensitive to NP effects, as
new heavy particles at the TeV mass range may replace the the W boson and top-
quark in the penguin loop dominating these amplitudes.28 The sum rule (36) for
B → Kπ decay rates provides a test for such effects. However, as we have argued
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
18 M. Gronau
from isospin considerations, it is only affected by quadratic ∆I = 1 amplitudes
including NP contributions. Small NP amplitudes, contributing quadratically to
the sum rule, cannot be separated from SM corrections, which are by themselves at
a level of a few percent. This is the level to which the sum rule has already been
tested. We will argue below for evidence showing that potential NP contributions
to |∆S| = 1 charmless decays must be suppressed by roughly an order of magnitude
relative to the dominant b→ s penguin amplitudes.
6.3. Values of S,C in |∆S| = 1 B0 → fCP decays
A class of b → s penguin-dominated B0 decays to CP-eigenstates has recently at-
tracted considerable attention. This includes final statesXKS andXKL, whereX =
φ, π0, η′, ω, f0, ρ
0,K+K−,KSKS , π
0π0, for which measured asymmetries −ηCPS
and C are quoted in Table III. [The asymmetries S and C = −ACP were de-
fined in (18) for B0 → π+π−. Observed modes with KL in the final states obey
ηCP (XKL) = −ηCP (XKS).] In these processes, a value S = −ηCP sin 2β (for states
Table III. Asymmetries S and C in B0 → XKS .
X φ π0 η′ ω f0(980)
−ηCP S 0.39± 0.18 0.33± 0.21 0.61± 0.07 0.48± 0.24 0.42 ± 0.17
C 0.01± 0.13 0.12± 0.11 −0.09± 0.06 −0.21± 0.19 −0.02± 0.13
X ρ0 K+K− KSKS π
−ηCP S 0.20± 0.57 0.58
+0.18
−0.13
0.58± 0.20 −0.72± 0.71
C 0.64± 0.46 0.15± 0.09 −0.14± 0.15 0.23± 0.54
with CP-eigenvalue ηCP ) is expected approximately.
26,43 These predictions involve
hadronic uncertainties at a level of several percent, of order λ2, λ ∼ 0.2. It has been
pointed out some time ago105 that it is difficult to separate these hadronic uncer-
tainties within the SM from NP contributions to decay amplitudes if the latter are
small. In the next subsection we will discuss indirect experimental evidence showing
that NP contributions to S and C must be small. Corrections to S = −ηCP sin 2β
and values for the asymmetries C have been calculated in the SM using methods
based on QCD factorization106,107 and flavor SU(3),90,108,109 and were found to
be between a few percent up to above ten percent within hadronic uncertainties.
Whereas the deviation of S from −ηCP sin 2β is process-dependent, a generic
result has been proven a long time ago for both S and C, to first order in |c/p|,14
∆S ≡ −ηCPS − sin 2β = 2
cos 2β sin γ cos∆ ,
C = 2
sin γ sin∆ . (47)
Here p and c are penguin and color-suppressed tree amplitudes involving a small ra-
tio and relative weak and strong phases γ and ∆, respectively. This implies ∆S > 0
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
CP violation in beauty decays 19
for |∆| < π/2, which can be argued for several of the above decays using QCD
arguments106,107 or SU(3) fits.109 (Note that while |p| is measurable in certain
decay rates up to first order corrections, |c| and ∆ involve sizable hadronic uncertain-
ties in QCD calculations.) In contrast to this expectation, the central values mea-
sured for ∆S are negative for all decays. (See Table III.) Consequently, one finds an
averaged value sin 2βeff = 0.53±0.05,20 to be compared with sin 2β = 0.678±0.025.
Two measurements which seem particularly interesting are−ηCPSφKS = 0.39±0.18,
where a positive correction of a few percent to sin 2β is expected in the SM,106,107
and −ηCPSπ0KS = 0.33± 0.21, where a rather large positive correction to sin 2β is
expected shifting this asymmetry to a value just above 0.8.90
While the current averaged value of sin 2βeff is tantalizing, experimental errors
in S and C must be reduced further to make a clear case for physics beyond the
SM. Assuming that the discrepancy between improved measurements and calcu-
lated values of S and C persists beyond theoretical uncertainties, can this pro-
vide a clue to the underlying New Physics? Since many models could give rise to
a discrepancy,28,43,44 one would seek signatures characterizing classes of models
rather than studying the effects in specific models. One way of classifying extensions
of the SM is by the isospin behavior of the new effective operators contributing to
b→ sqq̄ transitions.
6.4. Diagnosis of ∆I for New Physics operators
Four-quark operators in the effective Hamiltonian associated with NP in b → sqq̄
transitions can be either isoscalar or isovector operators. We will now discuss a study
proposed recently in order to isolate ∆I = 0 or ∆I = 1 operators, thus determining
corresponding NP amplitudes and CP violating phases.49 We will show that since S
and C in the above processes combine ∆I = 0 or ∆I = 1 contributions, separating
these contributions requires using also information from other two asymmetries,
which are provided by isospin-reflected decay processes.
Two |∆S| = 1 charmless B (or Bs) decay processes, related by isospin reflection,
RI : u ↔ d, ū ↔ −d̄, can always be expressed in term of common ∆I = 0 and
∆I = 1 amplitudes B and A in the form:
A(B+ → f) = B +A , A(B0 → RIf) = ±(B −A) . (48)
A proof of this relation uses a sign change of Clebsch-Gordan coefficients underm↔
−m.49 The description (48) applies, in particular, to pairs of processes involving
all the B0 decay modes listed in Table III, and B+ decay modes where final states
are obtained by isospin reflection from corresponding B0 decay modes. Decay rates
for pairs of isospin-reflected B decay processes, and for B̄ decays to corresponding
charge conjugate final states are therefore given by (we omit inessential common
kinematic factors),
Γ+ = |B +A|2 , Γ0 = |B − A|2 ,
Γ− = |B̄ + Ā|2 , Γ0̄ = |B̄ − Ā|2 . (49)
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
20 M. Gronau
The amplitudes B̄ and Ā are related to B and A by a change in sign of all weak
phases, whereas strong phases are left unchanged.
For each pair of processes one defines four asymmetries: an isospin-dependent
CP-conserving asymmetry,
Γ+ + Γ− − Γ0 − Γ0̄
Γ+ + Γ− + Γ0 + Γ0̄
, (50)
two CP-violating asymmetries for B+ and B0,
A+CP ≡
Γ− − Γ+
Γ− + Γ+
, − C ≡ A0CP ≡
Γ0̄ − Γ0
Γ0̄ + Γ0
, (51)
and the time-dependent asymmetry S in B0 decays,
1 + |λ|2
, λ ≡ ηCP
B̄ − Ā
e−2iβ , (52)
In the Standard Model, the isoscalar amplitude B contains a dominant penguin
contribution, BP , with a CKM factor V
cbVcs. The residual isoscalar amplitude,
∆B ≡ B −BP , (53)
and the amplitude A, consist each of contributions smaller than BP by about an
order of magnitude.29,30,31,32,86 These contributions include terms with a much
smaller CKM factor V ∗ubVus, and a higher order electroweak penguin amplitude with
CKM factor V ∗tbVts. Thus, one expects
|∆B| ≪ |BP | , |A| ≪ |BP | . (54)
Consequently, the asymmetries AI , A
CP and ∆S are expected to be small, of or-
der 2|A|/|B| and 2|∆B|/|BP |. In contrast, potentially large contributions to ∆B
and A from NP, comparable to BP , would most likely lead to large asymmetries
of order one. An unlikely exception is the case when both ∆B/BP and A/BP are
purely imaginary, or almost purely imaginary. This would require very special cir-
cumstances such as fine-tuning in specific models. Excluding cancellations between
NP and SM contributions in both CP-conserving and CP violating asymmetries,
tests for the hierarchy (54) become tests for the smallness of corresponding potential
NP contributions to B and A.
There exists ample experimental information showing that asymmetries A+CP
are small in processes related by isospin reflection to the decay modes in Table III.
Upper limits on the magnitudes of most asymmetries are at a level of ten or fifteen
percent [e.g., A+CP (K
+φ) = 0.034±0.044,A+CP (K+η′) = 0.031±0.026], while others
may be as large as twenty or thirty percent [A+CP (K
+ρ0) = 0.31+0.11
−0.10]. Similar values
have been measured for isospin asymmetries AI [e.g., AI(K
+φ) = −0.037± 0.077,
+η′) = −0.001± 0.033, AI(K+ρ0) = −0.16± 0.10].49 Since these two types
of asymmetries are of order 2|∆B|/|BP | and 2|A|/|BP |, this confirms the hierarchy
(54), which can be assumed to hold also in the presence of NP.
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
CP violation in beauty decays 21
We will take by convention the dominant penguin amplitude BP to have a zero
weak phase and a zero strong phase, referring all other strong phases to it. Writing
B = BP +∆B , B̄ = BP +∆B̄ , (55)
and expanding the four asymmetries to leading order in ∆B/BP or A/BP , one has
∆S = cos 2β
Im(Ā−A)
− Im(∆B̄ −∆B)
, (56)
Re(Ā+A)
, (57)
A+CP =
Re(Ā−A)
Re(∆B̄ −∆B)
, (58)
A0CP = −
Re(Ā−A)
Re(∆B̄ −∆B)
. (59)
The four asymmetries provide the following information:
• The ∆I = 0 and ∆I = 1 contributions in CP asymmetries are separated
by taking sums and differences,
A∆I=0CP ≡
(A+CP +A
CP ) =
Re(∆B̄ −∆B)
, (60)
A∆I=1CP ≡
(A+CP −A
CP ) =
Re(Ā−A)
. (61)
• ReA/BP and ReĀ/BP may be separated by using information from A∆I=1CP
and AI .
• ∆S is governed by an imaginary part of a combination of ∆I = 0 and ∆I =
1 terms which cannot be separated in B decays. Such a separation is possible
in Bs decays to pairs of isospin-reflected decays, e.g. Bs → K+K−,KSKS
or Bs → K∗+K∗−,K∗0K̄∗0, where 2β in the definition of ∆S (47) is now
replaced by the small phase of Bs-B̄s mixing.
One may take one step further under the assumption that strong phases as-
sociated with NP amplitudes are small relative to those of the SM and can be
neglected.110 This assumption, which must be confronted by data, is reasonable
because rescattering from a leading b → scc̄ amplitude is likely the main source
of strong phases, while rescattering from a smaller b → sqq̄ NP amplitude is then
a second-order effect. In the convention (55), where the strong phase of BP is set
equal to zero, ∆B and A have the same CP-conserving strong phase δ, and involve
CP-violating phases φB and φA, respectively,
∆B = |∆B|eiδeiφB , A = |A|eiδeiφA . (62)
Since the four asymmetries (56)-(59) are first order in small ratios of amplitudes,
one may take BP in their expression to be given by the square root of Γ+ or Γ0,
thereby neglecting second order terms. These four observables can then be shown to
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
22 M. Gronau
determine |A|, φA and |∆B| sinφB .49 The combination |∆B| cosφB adds coherently
to BP and cannot be fixed independently.
The amplitudes ∆B and A consist of process-dependent SM and potential NP
contributions. Assuming that the former are calculable, either using methods based
on QCD-factorization or by fitting within flavor SU(3) these and other B decay
rates and asymmetries, the four asymmetries determine the magnitude and CP
violating phase of a ∆I = 1 NP amplitude and the imaginary part of a ∆I = 0 NP
amplitude. In certain cases, e.g., B → φK or B → η′KS , stringent upper bounds on
SM contributions to ∆B and Amay suffice if some of the four measured asymmetries
are larger than permitted by these bounds. In the pair B+ → K+π0, B0 → K0π0,
the four measured asymmetries [using the predicted value (39)] are AI = 0.087 ±
0.038, A∆I=0CP = −0.047± 0.025, A∆I=1CP = 0.094± 0.025,∆S = −0.35± 0.21. Some
reduction of errors is required for a useful implementation of this method.
Conclusion: There exists ample experimental evidence in pairs of isospin-reflected
b → s penguin-dominated decays that potential NP amplitudes must be small.
Assuming that these amplitudes involve negligible strong phases, and assuming that
small SM non-penguin contributions are calculable or can be strictly bounded, one
may determine the magnitude and CP violating phase of a NP ∆I = 1 amplitude,
and the imaginary part of a NP ∆I = 0 amplitude in each pair of isospin-reflected
decays.
6.5. Null or nearly-null tests
We have not discussed null tests of the CKM framework.111 Evidence for physics
beyond the Standard Model may show-up as (small) nonzero asymmetries in pro-
cesses where they are predicted to be extremely small in the CKM framework. A
well-known example is B+ → π+π0, where the CP asymmetry is expected to be a
small fraction of a percent including EWP amplitudes.34,35 We have only discussed
exclusive hadronic B decays, where QCD calculations involve hadronic uncertain-
ties. A more robust calculation exists for the direct CP asymmetry in inclusive
radiative decays B → Xsγ, found to be smaller than one percent.112 The current
upper limit on this asymmetry is at least an order of magnitude larger.113
Time-dependent asymmetries in radiative decays B0 → KSπ0γ, for a KSπ0
invariant-mass in the K∗ region and for a larger invariant-mass range including
this region, are interesting because they test the photon helicity, predicted to be
dominantly right-handed in B0 decays and left-handed in B̄0 decays.105,114 The
asymmetry, suppressed by ms/mb, is expected to be several percent in the SM,
and can be very large in extensions where spin-flip is allowed in b → sγ. While
dimensional arguments seem to indicate a possible larger asymmetry in the SM,
of order ΛQCD/mb ∼ 10%,115 calculations using perturbative QCD116 and QCD
factorization117 find asymmetries of a few percent. The current averaged values,
for the K∗ region and for a larger invariant-mass range including this region, are
S((KSπ
0)K∗γ) = −0.28 ± 0.26 and S(KSπ0γ) = −0.09 ± 0.24.20,118 These mea-
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
CP violation in beauty decays 23
surements must be improved in order to become sensitive to the level predicted in
the SM, or to provide evidence for physics beyond the SM.
7. Summary
The Standard Model passed with great success numerous tests in the flavor sector,
including a variety of measurements of CP asymmetries related to the CKM phases
β and γ. Small potential New Physics corrections may occur in ∆S = 0 and |∆S| =
1 penguin amplitudes, affecting the extraction of γ and modifying CP-violating
and isospin-dependent asymmetries in |∆S| = 1 B0 decays and isospin-related B+
decays. Higher precision than achieved so far is required for claiming evidence for
such effects and for sorting out their isospin structure.
Similar studies can be performed with Bs mesons produced at hadron colliders
and at e+e− colliders running at the Υ(5S) resonance. Time-dependence in Bs →
D−s K
+ and Bs → J/ψφ or Bs → J/ψη measures γ and the small phase of the
Bs-B̄s mixing amplitude.
119 Comparing time-dependence and angular analysis in
Bs → J/ψφ with b → s penguin-dominated processes including Bs → φφ,Bs →
K∗+K∗−, Bs → K∗0K̄∗0 provides a methodic search for potential NP effects. Work
on Bs decays has just begun at the Tevatron.
120 One is looking forward to first
results from the LHC.
Acknowledgments
I am grateful to numerous collaborators, in particular to Jonathan Rosner whose
collaboration continued without interruption for many years. This work was sup-
ported in part by the Israel Science Foundation under Grant No. 1052/04 and by
the German-Israeli Foundation under Grant No. I-781-55.14/2003.
References
1. J. H. Christenson, J. W. Cronin, V. L. Fitch and R. Turlay, Phys. Rev. Lett. 13, 138
(1964).
2. B. Aubert et al. [BABAR Collaboration], Phys. Rev. Lett. 87, 091801 (2001); K. Abe
et al. [Belle Collaboration], Phys. Rev. Lett. 87, 091802 (2001).
3. A. B. Carter and A. I. Sanda, Phys. Rev. Lett. 45, 952 (1980); Phys. Rev. D 23, 1567
(1981); I. I. Y. Bigi and A. I. Sanda, Nucl. Phys. B 193, 85 (1981).
4. M. Kobayashi and T. Maskawa, Prog. Theor. Phys. 49, 652 (1973).
5. I. Dunietz and J. L. Rosner, Phys. Rev. D 34, 1404 (1986); I. I. Y. Bigi and A. I. Sanda,
Nucl. Phys. B 281, 41 (1987).
6. H. Albrecht et al. [ARGUS Collaboration], Phys. Lett. B 192, 245 (1987); S. L. Wu,
Nucl. Phys. Proc. Suppl. 3, 39 (1988).
7. L. Wolfenstein, Phys. Rev. Lett. 51, 1945 (1983). We use a standard phase convention
in which Vub and Vtd are complex, while all other CKM matrix elements are real to a
good approximation.
8. J. Charles et al. [CKMfitter Collaboration], eConf C060409, 043 (2006),
presenting updated results periodically on the web site http://www.slac.
stanford.edu/xorg/ckmfitter/.
http://www.slac
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
24 M. Gronau
9. M. Bona et al. [UTfit Collaboration], JHEP 0610, 081 (2006), presenting updated
results periodically on the web site http://www.utfit.org/.
10. V. M. Abazov et al. [D0 Collaboration], Phys. Rev. Lett. 97, 021802 (2006); A. Abu-
lencia et al. [CDF Collaboration], Phys. Rev. Lett. 97, 242003 (2006).
11. For a recent review see A. D. Dolgov, arXiv:hep-ph/0511213.
12. See e.g. E. Gabrielli, A. Masiero and L. Silvestrini, Phys. Lett. B 374, 80 (1996).
13. This review, which is only 27 page long (the number of Hebrew alphabet letters)
includes 120 references, as a Jewish blessing says “May you live to be 120!” It is too
short to include other hundreds or thousands of relevant papers. I apologize to their
many authors.
14. M. Gronau, Phys. Rev. Lett. 63, 1451 (1989).
15. H. Boos, T. Mannel and J. Reuter, Phys. Rev. D 70, 036006 (2004).
16. M. Ciuchini, M. Pierini and L. Silvestrini, Phys. Rev. Lett. 95, 221804 (2005).
17. H. n. Li and S. Mishima, arXiv:hep-ph/0610120.
18. B. Aubert et al. [BABAR Collaboration], arXiv:hep-ex/0607107.
19. K. F. Chen et al. [Belle Collaboration], arXiv:hep-ex/0608039.
20. E. Barbiero et al. [Heavy Flavor Averaging Group], hep-ex/0603003; updates are avail-
able at http://www.slac.stanford.edu/xorg/hfag/.
21. B. Aubert et al. [BABAR Collaboration], Phys. Rev. D 71, 032005 (2005); R. Itoh et
al. [Belle Collaboration], Phys. Rev. Lett. 95, 091601 (2005).
22. P. Krokovny et al. [Belle Collaboration], Phys. Rev. Lett. 97, 081801 (2006). B. Aubert
et al. [BABAR Collaboration], arXiv:hep-ex/0607105.
23. R. Fleischer and T. Mannel, Phys. Lett. B 506, 311 (2001).
24. M. Gronau and D. London., Phys. Lett. B 253, 483 (1991).
25. M. Gronau and D. Wyler, Phys. Lett. B 265, 172 (1991).
26. D. London and R. D. Peccei, Phys. Lett. B 223, 257 (1989).
27. B. Grinstein, Phys. Lett. B 229, 280 (1989).
28. M. Gronau and D. London, Phys. Rev. D 55, 2845 (1997).
29. M. Beneke, G. Buchalla, M. Neubert and C. T. Sachrajda, Phys. Rev. Lett. 83, 1914
(1999); Nucl. Phys. B 606, 245 (2001); Phys. Rev. D 72, 098501 (2005).
30. Y. Y. Keum, H. n. Li and A. I. Sanda, Phys. Lett. B 504, 6 (2001); Phys. Rev. D 63,
054008 (2001).
31. C. W. Bauer, D. Pirjol, I. Z. Rothstein and I. W. Stewart, Phys. Rev. D 70, 054015
(2004); C. W. Bauer, D. Pirjol, I. Z. Rothstein and I. W. Stewart, Phys. Rev. D 72,
098502 (2005).
32. M. Ciuchini, E. Franco, G. Martinelli and L. Silvestrini, Nucl. Phys. B 501, 271
(1997); M. Ciuchini, R. Contino, E. Franco, G. Martinelli and L. Silvestrini, Nucl.
Phys. B 512, 3 (1998) [Erratum-ibid. B 531, 656 (1998)]; M. Ciuchini, E. Franco,
G. Martinelli, M. Pierini and L. Silvestrini, Phys. Lett. B 515, 33 (2001).
33. M. Gronau and D. London, Phys. Rev. Lett. 65, 3381 (1990).
34. A. J. Buras and R. Fleischer, Eur. Phys. J. C 11, 93 (1999).
35. M. Gronau, D. Pirjol and T. M. Yan, Phys. Rev. D 60, 034021 (1999) [Erratum-ibid.
D 69, 119901 (2004)].
36. S. Gardner, Phys. Rev. D 59, 077502 (1999); S. Gardner, Phys. Rev. D 72, 034015
(2005).
37. M. Gronau and J. Zupan, Phys. Rev. D 71, 074017 (2005).
38. A. F. Falk, Z. Ligeti, Y. Nir and H. Quinn, Phys. Rev. D 69, 011502 (2004).
39. Y. Nir and H. R. Quinn, Phys. Rev. Lett. 67, 541 (1991); H. J. Lipkin, Y. Nir,
H. R. Quinn and A. Snyder, Phys. Rev. D 44, 1454 (1991); M. Gronau, Phys. Lett. B
265, 389 (1991);
http://www.utfit.org/
http://arxiv.org/abs/hep-ph/0511213
http://arxiv.org/abs/hep-ph/0610120
http://arxiv.org/abs/hep-ex/0607107
http://arxiv.org/abs/hep-ex/0608039
http://arxiv.org/abs/hep-ex/0603003
http://www.slac.stanford.edu/xorg/hfag/
http://arxiv.org/abs/hep-ex/0607105
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
CP violation in beauty decays 25
40. See, however, N. G. Deshpande and X. G. He, Phys. Rev. Lett. 74, 26 (1995) [Erratum-
ibid. 74, 4099 (1995)].
41. M. Neubert and J. L. Rosner, Phys. Lett. B 441, 403 (1998); Phys. Rev. Lett. 81,
5076 (1998).
42. M. Neubert, JHEP 9902, 014 (1999); M. Beneke and S. Jager, hep-ph/0610322.
43. Y. Grossman and M. P. Worah, Phys. Lett. B 395, 241 (1997).
44. M. Ciuchini, E. Franco, G. Martinelli, A. Masiero and L. Silvestrini, Phys. Rev. Lett.
79, 978 (1997); R. Barbieri and A. Strumia, Nucl. Phys. B 508, 3 (1997); S. A. Abel,
W. N. Cottingham and I. B. Whittingham, Phys. Rev. D 58, 073006 (1998); Y. Gross-
man, M. Neubert and A. L. Kagan, JHEP 9910, 029 (1999); X. G. He, C. L. Hsueh
and J. Q. Shi, Phys. Rev. Lett. 84, 18 (2000); G. Hiller, Phys. Rev. D 66, 071502
(2002); N. G. Deshpande and D. K. Ghosh, Phys. Lett. B 593, 135 (2004); V. Barger,
C. W. Chiang, P. Langacker and H. S. Lee, Phys. Lett. B 580, 186 (2004); ibid. 598,
218 (2004).
45. M. Gronau and J. L. Rosner, Phys. Rev. D 59, 113002 (1999); H. J. Lipkin, Phys.
Lett. B 445, 403 (1999).
46. D. Atwood and A. Soni, Phys. Rev. D 58, 036005 (1998); M. Gronau, Phys. Lett. B
627, 82 (2005).
47. A sum rule involving three asymmetries, based on the expectation that the asymmetry
in B+ → K0π+ should be very small, is discussed in M. Gronau and J. L. Rosner,
Phys. Rev. D 71, 074019 (2005).
48. D. London and A. Soni, Phys. Lett. B 407, 61 (1997).
49. M. Gronau and J. L. Rosner, arXiv:hep-ph/0702193, to be published in Phys. Rev.
50. Y. Grossman, A. Soffer and J. Zupan, Phys. Rev. D 72, 031501 (2005). Evidence for
0-D̄0 mixing has been reported recently, B. Aubert et al. [BABAR Collaboration],
arXiv:hep-ex/0703020; K. Abe et al. [Belle Collaboration], arXiv:hep-ex/0703036.
51. M. Gronau, Phys. Rev. D 58, 037301 (1998).
52. Y. Grossman, Z. Ligeti and A. Soffer, Phys. Rev. D 67, 071301 (2003)
53. D. Atwood, I. Dunietz and A. Soni, Phys. Rev. Lett. 78, 3257 (1997); D. Atwood,
I. Dunietz and A. Soni, Phys. Rev. D 63, 036005 (2001).
54. A. Giri, Y. Grossman, A. Soffer and J. Zupan, Phys. Rev. D 68, 054018 (2003);
A. Bondar, Proceedings of BINP Special Analysis Meeting on Data Analysis, 24–26
September 2002, unpublished.
55. I. Dunietz, Phys. Lett. B 270, 75 (1991).
56. A. Bondar and T. Gershon, Phys. Rev. D 70, 091503 (2004).
57. W. M. Yao et al. [Particle Data Group], J. Phys. G 33, 1 (2006).
58. R. Aleksan, T. C. Petersen and A. Soffer, Phys. Rev. D 67, 096002 (2003).
59. M. Gronau, Phys. Lett. B 557, 198 (2003).
60. M. Gronau, Y. Grossman, N. Shuhmaher, A. Soffer and J. Zupan, Phys. Rev. D 69,
113003 (2004).
61. M. Gronau and J. L. Rosner, Phys. Lett. B 439, 171 (1998); Z. z. Xing, Phys. Rev.
D 58, 093005 (1998); J. H. Jang and P. Ko, Phys. Rev. D 58, 111302 (1998).
62. B. Blok, M. Gronau and J. L. Rosner, Phys. Rev. Lett. 78, 3999 (1997).
63. B. Aubert et al. [BABAR Collaboration], Phys. Rev. D 74, 031101 (2006).
64. B. Aubert et al. [BABAR Collaboration], Phys. Rev. D 72, 071103 (2005).
65. B. Aubert et al. [BABAR Collaboration], Phys. Rev. D 73, 051105 (2006).
66. K. Abe et al. [BELLE Collaboration], Phys. Rev. D 73, 051106 (2006).
67. J. P. Silva and A. Soffer, Phys. Rev. D 61, 112001 (2000); M. Gronau, Y. Grossman
and J. L. Rosner, Phys. Lett. B 508, 37 (2001).
http://arxiv.org/abs/hep-ph/0610322
http://arxiv.org/abs/hep-ph/0702193
http://arxiv.org/abs/hep-ex/0703020
http://arxiv.org/abs/hep-ex/0703036
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
26 M. Gronau
68. B. Aubert et al. [BABAR Collaboration], Phys. Rev. D 72, 032004 (2005).
69. K. Abe et al. [Belle Collaboration], arXiv:hep-ex/0508048.
70. B. Aubert et al. [BABAR Collaboration], arXiv:hep-ex/0607065.
71. B. Aubert et al. [BABAR Collaboration], Phys. Rev. D 72, 071104 (2005).
72. See also P. Krokovny et al. [Belle Collaboration], Phys. Rev. Lett. 90, 141802 (2003);
K. Abe et al. [Belle Collaboration], arXiv:hep-ex/0408108.
73. A. Poluektov et al. [Belle Collaboration], Phys. Rev. D 73, 112009 (2006).
74. B. Aubert et al. [BABAR Collaboration], arXiv:hep-ex/0607104. See also B. Aubert
et al. [BABAR Collaboration], Phys. Rev. Lett. 95, 121802 (2005).
75. B. Aubert et al. [BABAR Collaboration], arXiv:hep-ex/0507101.
76. M. Gronau, Y. Grossman, Z. Surujon and J. Zupan, arXiv:hep-ph/0702011, to be
published in Phys. Lett. B.
77. M. Gronau and J. L. Rosner, Phys. Lett. B 595, 339 (2004).
78. M. Gronau, E. Lunghi and D. Wyler, Phys. Lett. B 606, 95 (2005).
79. M. Gronau, D. London, N. Sinha and R. Sinha, Phys. Lett. B 514, 315 (2001).
80. For two somewhat weaker bounds, which are included in this bound, see Y. Grossman
and H. R. Quinn, Phys. Rev. D 58, 017504 (1998); J. Charles, Phys. Rev. D 59, 054007
(1999).
81. H. Ishino et al. [Belle Collaboration], BELLE-PREPRINT-2006-33.
82. M. Gronau, Phys. Lett. B 300, 163 (1993).
83. M. Gronau and J. L. Rosner, work in progress.
84. M. Beneke, M. Gronau, J. Rohrer and M. Spranger, Phys. Lett. B 638, 68 (2006).
85. A. E. Snyder and H. R. Quinn, Phys. Rev. D 48, 2139 (1993); A. Kusaka et al.
[Belle Collaboration], arXiv:hep-ex/0701015; B. Aubert et al. [BABAR Collaboration],
arXiv:hep-ex/0703008.
86. M. Gronau, O. F. Hernandez, D. London and J. L. Rosner, Phys. Rev. D 50, 4529
(1994); ibid 52, 6374 (1995).
87. M. Gronau, J. L. Rosner and D. London, Phys. Rev. Lett. 73, 21 (1994).
88. R. Fleischer and T. Mannel, Phys. Rev. D 57, 2752 (1998).
89. M. Gronau and J. L. Rosner, Phys. Rev. D 57, 6843 (1998).
90. C. W. Chiang, M. Gronau, J. L. Rosner and D. A. Suprun, Phys. Rev. D 70, 034020
(2004).
91. S. Baek, P. Hamel, D. London, A. Datta and D. A. Suprun, Phys. Rev. D 71, 057502
(2005).
92. A. J. Buras, R. Fleischer, S. Recksiegel and F. Schwab, Phys. Rev. Lett. 92, 101804
(2004).
93. H. n. Li, S. Mishima and A. I. Sanda, Phys. Rev. D 72, 114005 (2005).
94. M. Beneke and S. Jager, Nucl. Phys. B 751, 160 (2006).
95. M. Gronau and J. L. Rosner, Phys. Lett. B 644, 237 (2007).
96. M. Gronau and J. L. Rosner, Phys. Rev. D 65, 013004 (2002); [Erratum-ibid. D 65,
079901 (2002).
97. M. Gronau and J. L. Rosner, Phys. Lett. B 572, 43 (2003).
98. M. Beneke and M. Neubert, Nucl. Phys. B 675, 333 (2003).
99. C. W. Bauer, I. Z. Rothstein and I. W. Stewart, Phys. Rev. D 74, 034010 (2006).
100. M. Gronau, Y. Grossman, G. Raz and J. L. Rosner, Phys. Lett. B 635, 207 (2006).
101. M. Gronau and J. L. Rosner, Phys. Rev. D 74, 057503 (2006).
102. D. Zeppenfeld, Z. Phys. C 8, 77 (1981); M. J. Savage and M. B. Wise, Phys. Rev. D
39, 3346 (1989) [Erratum-ibid. D 40, 3127 (1989)]; L. L. Chau, H. Y. Cheng, W. K. Sze,
H. Yao and B. Tseng, Phys. Rev. D 43, 2176 (1991). [Erratum-ibid. D 58, 019902
(1998)].
http://arxiv.org/abs/hep-ex/0508048
http://arxiv.org/abs/hep-ex/0607065
http://arxiv.org/abs/hep-ex/0408108
http://arxiv.org/abs/hep-ex/0607104
http://arxiv.org/abs/hep-ex/0507101
http://arxiv.org/abs/hep-ph/0702011
http://arxiv.org/abs/hep-ex/0701015
http://arxiv.org/abs/hep-ex/0703008
October 27, 2018 17:34 WSPC/INSTRUCTION FILE CP-review
CP violation in beauty decays 27
103. N. G. Deshpande and X. G. He, Phys. Rev. Lett. 75, 1703 (1995); X. G. He, Eur.
Phys. J. C 9, 443 (1999).
104. M. Gronau and J. L. Rosner, Phys. Rev. Lett. 76, 1200 (1996); A. S. Dighe,
M. Gronau and J. L. Rosner, Phys. Rev. D 54, 3309 (1996).
105. D. Atwood, M. Gronau and A. Soni, Phys. Rev. Lett. 79, 185 (1997).
106. M. Beneke, Phys. Lett. B 620, 143 (2005).
107. H. Y. Cheng, C. K. Chua and A. Soni, Phys. Rev. D 72, 014006 (2005); H. Y. Cheng,
C. K. Chua and A. Soni, Phys. Rev. D 72, 094003 (2005).
108. Y. Grossman, Z. Ligeti, Y. Nir and H. Quinn, Phys. Rev. D 68, 015004 (2003);
G. Engelhard, Y. Nir and G. Raz, Phys. Rev. D 72, 075013 (2005); G. Engelhard and
G. Raz, Phys. Rev. D 72, 114017 (2005).
109. M. Gronau and J. L. Rosner, Phys. Lett. B 564, 90 (2003); C. W. Chiang, M. Gronau
and J. L. Rosner, Phys. Rev. D 68, 074012 (2003); C. W. Chiang, M. Gronau, Z. Luo,
J. L. Rosner and D. A. Suprun, Phys. Rev. D 69, 034001 (2004); M. Gronau, J. L. Ros-
ner and J. Zupan, Phys. Lett. B 596, 107 (2004); M. Gronau, J. L. Rosner and J. Zupan,
Phys. Rev. D 74, 093003 (2006).
110. A. Datta and D. London, Phys. Lett. B 595, 453 (2004); S. Baek, P. Hamel, D. Lon-
don, A. Datta and D. A. Suprun, Phys. Rev. D 71, 057502 (2005); A. Datta, M. Im-
beault, D. London, V. Page, N. Sinha and R. Sinha, Phys. Rev. D 71, 096002 (2005).
111. T. Gershon and A. Soni, J. Phys. G 33, 479 (2007).
112. J. M. Soares, Nucl. Phys. B 367, 575 (1991); A. L. Kagan and M. Neubert, Phys.
Rev. D 58, 094012 (1998).
113. B. Aubert et al. [BABAR Collaboration], Phys. Rev. Lett. 93, 021804 (2004); Phys.
Rev. Lett. 97, 171803 (2006); S. Nishida et al. [BELLE Collaboration], Phys. Rev. Lett.
93, 031803 (2004).
114. D. Atwood, T. Gershon, M. Hazumi and A. Soni, Phys. Rev. D 71, 076003 (2005).
115. B. Grinstein, Y. Grossman, Z. Ligeti and D. Pirjol, Phys. Rev. D 71, 011504 (2005);
B. Grinstein and D. Pirjol, Phys. Rev. D 73, 014013 (2006).
116. M. Matsumori and A. I. Sanda, Phys. Rev. D 73, 114022 (2006).
117. P. Ball and R. Zwicky, Phys. Lett. B 642, 478 (2006).
118. B. Aubert et al. [BaBar Collaboration], Phys. Rev. D 72, 051103 (2005); Y. Ushiroda
et al. [Belle Collaboration], Phys. Rev. D 74, 111104 (2006).
119. R. Aleksan, I. Dunietz and B. Kayser, Z. Phys. C 54, 653 (1992).
120. M. Paulini, arXiv:hep-ex/0702047; G. Punzi [CDF - Run II Collaboration],
arXiv:hep-ex/0703029.
http://arxiv.org/abs/hep-ex/0702047
http://arxiv.org/abs/hep-ex/0703029
ABSTRACT
  Precision tests of the Kobayashi-Maskawa model of CP violation are discussed,
pointing out possible signatures for other sources of CP violation and for new
flavor-changing operators. The current status of the most accurate tests is
summarized.

<|endoftext|><|startoftext|>
Introduction of Reichenbach’s book [3]. He called the concept as ” ... of great
interest for the methodology of physics but what has so far not received the
attention it deserves”. In this paper we shall try to rectify for this failure
of appreciating the concept of the Universal Force - albeit in a somewhat
altered and improved form.
Reichenbach defines two kind of forces - Differential Forces and Univer-
sal Forces. It may be pointed out that the term ”force” here should not be
taken strictly as defined in physics but in a broad and general framework.
In fact Carnap has suggested that the term ”effect” instead of ”force’ would
better serve the purpose [5] and which allows it be used in different frame-
works. Hence to conform with the accepted practice, though in this paper
we shall continue to use the term ”Universal Force” the reader may do well
to remember that what we really mean is ”Universal Effect”.
One calls a force Differetial if it acts differently on different substances. It
is called Universal if it is quantitatively the same for all the substances [3,5].
If we heat a rod of initial length l0 from initial temperature T0 to tempetature
T then its length is given as
l = l0[1 + β(T − T0)] (1)
where β the coefficient for thermal expansion is different for different
materials. Hence this is a Differential Force. Now the correction factor due
to the influence of gravitation on the length of the rod is
l = l0[1− C
φ] (2)
Here the rod is placed at a distance r from sun whose mass is m and φ is
the angle of the rod with respect to the the line sun to rod. C is a universal
constant ( in CGS unit C= 3.7 x 10−29 ). As this acts in the same manner
for any material of mass m, gravity is a Universal Force as per the above
definition.
Reichenbach also gives a general definition of the Universal Forces [3,p
12] as: (1) affecting all the materials in the same manner and (2) there are
no insulating walls against it. We saw above that gravity is such a force,
Indeed gravity is a Universal Force par excellance. It affects all matter
in the same manner. The equality of the gravitational and inertial masses is
what ensures this physically. If the gravitational and inertial masses were not
found to be equal, then one would not have been able to visualize of the paths
of freely falling mass points as geodesics in the four dimentional space-time.
In that case different geodesics would have resulted from different materials
of mass points [3].
Therefore the universal effect of gravitation on different kinds of measur-
ing instruments is to define a single geometry for all of them. Viewed this
way, one may say that gravity is geometerized. ”It is not theory of gravitation
that becomes geometry, but it is geometry that becomes the experience of the
gravitational field” [3, p 256]. Why does the planet follow the curved path?
Not because it is acted upon by a force but because the curved space-time
manifold leaves it with no other choice!
So as per Einstein’s theory of relativity, one does not speak of a change
produced by the gravitational field in the measuring instruments, but regard
the measuring instruments as free from any deforming forces. Gravity being a
Universal Force, in the Einstein’s Theory of Relativity, it basically disappears
and is replaced by geometry.
In fact Reichenbach [3, p 22] shows how one can give a consistent defi-
nition of a rigid rod - the same rigid rods which are needed in relativity to
measure all lengths. ”Rigid rods are solid bodies which are not affected by
Differential Forces, or concerning which the influence of Differential Forces
has been eliminated by corrections; Universal Forces are disregarded. We do
not neglect Universal Forces. We set them to zero by definition. Without
such a rule a rigid body cannot be defined.” In fact this rule also helps in
defining a closed system as well.
All this was formalized in terms of a theorem by Reichenbach [3, p 33]
THEOREM θ :
Given the geometry G0 to which the measuring instruments conform,
we can imagine a Universal Force F which affects the instruments in such
a way that the actual geometry is an arbitrary geometry G, while the ob-
served deviation from G is due to universal deformation of the measuring
instruments.”
G0 + F = G (3)
Hence only the combination G0+F is testable. As per Reichenbach’s prin-
ciple one prefers the theory wherein we put F=0. If we accept Reichenbach
principle of putting the Universal Force of gravity to zero, then the arbitrari-
ness in the choice of the measuring procedure is avoided and the question of
the geometrical structure of the physical space has a unique answer deter-
mined by physical measurement. It is this principle which Carnap praises
highly [5, p 171], ” Whenever there is a system of physics in which a certain
universal effect is asserted by a law that specifies under what conditions in
what amount the effect occurs, then the theory should be transformed so that
the amount of effect would be reduced to zero. This is what Einstein did
in regard to contraction and expansion of bodies in gravitational field.” The
left hand side of Einstein’s equation (below) gives the relevant non-Euclideon
geometry
Gµν = 8πG〈φ|Tµν|φ〉 (4)
In the case of gravity, and in as much as Einsteins’s Theory of Relativ-
ity has been well tested experimentally, we treat the above concept as well
placed empirically. But from this single success Reichenbach generalizes this
as a fundamental principle for all cases where Universal forces may arise. As
Carnap states [5, p 171], ” Whenever universal effects are found in physics,
Reichenbach maintained that it is always possible to eliminate them by suit-
able transformation of theory; such a transformation should be made because
of the overall simplicity that would result. This is a useful general principle,
deserving more attention than it has received. It applies not only to relativ-
ity theory, but also to situations that may arise in the future in which other
universal effects may be observed. Without the adoption of this rule there
is no way to give unique answer to the question - what is the structure of
space?”.
As such Reichenbach goes ahead and tries to apply this principle of elimi-
nation of Universal Forces to another universal effect that he finds and which
arises from considerations of topology ( as an additional consideration over
and above that of geometry ) of space-time of the universe.
The Theorem θ is limited to talking about the geometry of space-time
only. It does not take account of specific topological issues that may arise.
To take account of topology of the space-time we shall have to extend the
said theorem appropriately.
What would one experience if space had different topological properties.
To make the point home Reichenbach considers a torus-space [3, p 63]. This
is quite detailed and extensive. However for the purpose of simplifying the
and shortening the discussion here we shall talk of a two dimensional being
who lives on the surface of a sphere. His measurements tell him so. But in
spite of this he insists that he lives on a plane. He may actually do so as per
our discussion above if he confines himself to metrical relations only. With
an appropriate Universal Force he can he can justify living on a plane. But
the surface of a sphere is topologically different from that of a plane. On a
sphere if he starts at a point X and goes on a world tour he may come back
to the same point X. But this is impossible on a plane. And hence to account
for coming back to the ”same point” he has to maintain that on the plane
he actually has come back to a different point Y - which though is identical
to X in all other respects. One option for him is to accept that he is actually
living on a sphere. However if he still wants to maintain his position that he
is living on a plane then he has to explain as to how point Y is physically
identical to point X in spite of the fact that X and Y are different and distinct
points of space. Indeed he can do so by visualizing a fictitious force as an
effect of some kind of ”pre-established harmony” [3, p 65] by proposing that
everything that occurs at X also occurs at the point Y. As it would affect all
matter in the same manner this corresponds to a Universal Force/Effect as
per Reichenbach’s definition.
This interdependence of corresponding points which is essential in this
”pre-established” harmony cannot be interpreted as ordinary causality, as
it does not require ordinary time to transmit it and also does not spread
continuously through intervening space. Hence there is no mysterious causal
connection between the points X and point Y. Thus this necessarily entails
proposing a ”causal anomaly” [3, p 65]. In short connecting different topolo-
gies through a fictitious Universal Effect of ”pre-established harmony” neces-
sarly calls for introduction of ”causal anomalies”. Call this new hypothesize
Universal Force as A and the Theorem θ be extended to read
G0 + F + A = G (5)
where on the right had side we have given a different capital G which
reduces to G of the original Theorem θ when A is set equal to zero.
Now as per Reichenbach’s law of preferring that physical reality wherein
all Universal Forces are put to zero, he advocates of putting A to zero. He
pointed out that this has the advantage of retaining physical ”causality ”
in our science, This he takes as a success of his methodology. As per Re-
ichenbach [3, p 65] ” The principle of causality is one of its (physics) sacred
laws, which it will not abandon lightly; pre-established harmony, however is
incompatible with this law”.
However, as the said ’causal anomaly” is of topological origin we cannot
be sure in what manner it will manifest itself physically. In addition will not
the Universal Force/Effect of ”pre-established harmony” compensate for it in
some manner? So what one is saying is that it is possible that Reichenbach
was wrong in putting all Universal Forces to zero. It was OK to put F to
zero which justified the geometrical interpretation of gravity. But in the case
of this new topological Universal Force we really do not know enough and
let us not be governed by any theoretical prejudice and let the Nature decide
as to what is happening. So to say, let us look at modern cosmology to see
if it is throwing up any new Universal Forces which may be identified with
our ”pre-established harmony” here.
To understand this let us look at the Einstein’s Equation given above.
Harvey and Schucking [11] correcting for Einstein’s error in understanding
the role of the cosmological term λ have derived the most general equation
of motion to be
Gµν + λgµν = 8πG〈φ|Tµν|φ〉 (6)
They showed that [11] the Cosmological Constant λ above provides a
new repulsive force proportional to mass m, repelling every particle of mass
m with a force
F = mc2
x (7)
Recent data [1] on λ is what leads to the crisis of Dark Energy.
Quite clearly this repulsive force is a new Universal Force as per our
definition and hence conforms to the ”pre-established harmony” aspect of
the ”causal anomaly”. Thus we see that indeed as per the recent data on
accelerating universe we have stumbled upon this new Universal Force which
is of topological origin. Hence the source of dark energy is due to ”causal
anomaly” arising from the unique topological structure of our universe. This
solves the mystery of the origin of Dark Energy.
So we would like to emphasize that it is the accelerating universe ( and
hence the Dark Energy ) which is forcing us to accept the incorporation of
this ”causal anomaly” of topological origin. Implications of this new concept
in physics have now to be explored.
Note that as per Theorem θ when one puts F to be zero then one obtains
the proper non-Euclidean Geometry of Einstein’s equation. But now we
know that full structure is the sum of this non-Euclidean geometry plus A
, the new Universal Force ( as per the modified theorem above ) and this is
what the accelerating universe is forcing us to accept. This is what we called
capital G above. We feel that the DASI data on Ω0 being close to one and
thus showing that the Universe is flat [1] is consistent with capital G being
equal to G+ A. In principle just as per the original Theorem θ one may
add a Universal Force F to Einstein’s non-Euclidean geometry to obtain a
physically relevant Euclidean geometry, so in the same manner given a non-
Euclidean geometry of Einstein on can add an appropriate Universal Force
A to provide a flat universe. And this is exactly what capital G is telling us.
Thus the observed flatness of the universe may be treated as a success of the
new idea proposed here.
One would like to ask as to in what other manner incorporation of this
new ”causal anomaly” may help us in understanding Nature better? Will
it provide new perspectives as answers to quantum mechanical puzzles of
quantum jumps, non-locality etc. These are open questions to be tackled in
future.
REFERENCES
1. M S Turner, ”Making sense of the new cosmology”, Int J Mod Phys,
A17S1 (2002) 180-196
2. Hans Reichenbach (1891-1953) can properly be called a philosopher-
scientist. As a leading philosopher of science he was founder of the Berlin
Circle and a proponent of logical positivism. Among his teachers were David
Hilbert, Max Planck, Max Born and Albert Einstein. He wrote extensively
on the theory of probability, theory of relativity and quantum mechanics.
His philosophical writings have a definite scientific touch in them, very much
akin to that of Descartes, Leibniz and Huygens.
3. H Reichenbach, ”The philosophy of space and time”, Dover, New York
(1957) (Original German edition in 1928)
4. C Callender and N Huggett, ” Physics meets philosophy at the Planck
scale”, Cambridge University Press, UK (2001)
5. R Carnap, ”An introduction to the philosophy of science”, Basic Books,
New York (1966)
6. E Nagel, ”The structure of science”, Routledge and Kegan Paul, Lon-
don (1961)
7. D Dieks, ”Gravitation as a Universal Force”, Synthese, 73 (1987)
381-397
8. B Ellis, ”Universal and Differential Forces”, Brit J Phil Sc, 14 (1963)
177-194
9. A Gruenbaum, ”Philosophical problems of space and time”, Dordrecht,
Holland; D Reidel (1973) or Alfred A Knopf, New York (1963)
10. R Torretti, ”Relativity and geometry”, Pergamon Press (1983)
11. A Harvey and E Schucking, ”Einstein’s mistake and the cosmological
constant”, Am J Phys, 68 (2000) 723-727
ABSTRACT
  The Dark Energy problem is forcing us to re-examine our models and our
understanding of relativity and space-time. Here a novel idea of Fundamental
Forces is introduced. This allows us to perceive the General Theory of
Relativity and Einstein's Equation from a new pesrpective. In addition to
providing us with an improved understanding of space and time, it will be shown
how it leads to a resolution of the Dark Energy problem.

<|endoftext|><|startoftext|>
Introduction
An important aspect in any geometric gravitational theory is the analysis of how to match
two spacetimes. This is true in particular for General Relativity and its perturbation
theory. Despite the relevance and maturity of the matching theory one often finds papers
where the matching conditions are not properly used. Most of the difficulties arise from
the fact that the matching conditions are imposed in specific coordinate systems in a
manner which is not completely coordinate independent. More specifically, matching two
spacetimes requires identifying the boundaries pointwise, and sometimes this identification
is done implicitly by fixing spacetime coordinates, without paying enough attention to the
fact that solving the matching involves finding an identification of the boundary and that
this should not be fixed a priori.
In perturbation theory this problem also arises, and it gets complicated by the fact
that the fields to be matched (as the perturbed metric) are gauge dependent. So, in
addition to a priori choices of identifications of the boundary, there is also the problem
http://arxiv.org/abs/0704.0078v1
that particular gauges are often used. It may be argued that the matching theory must
be gauge independent and therefore it can be performed in any gauge. This is true, but
only when due care is taken to ensure that the choice of gauge does not restrict, a priori,
the perturbed identification of the boundaries.
A complete description of the linearized matching conditions has been achieved only
recently by Carter and Battye [5] and independently by Mukohyama [6]. To second order,
the matching conditions have been recently found in [7]. Despite these papers, we believe
that some confusion still lingers in the field, in particular with respect to the existing gauge
invariant formulations. The aim of this paper is to try to clarify these issues. In order
to do that, we will critically discuss some of the approaches proposed in the literature
trying to make clear which are the implicit assumptions made and to what extent are
they justified.
The first papers discussing the perturbed matching theory are, as far as we know,
the classic papers by Gerlach and Sengupta [2, 3]. However, as explained below, their
description of the perturbed matching theory contains imprecisions, and we will therefore
start discussing their approach pointing out the difficulties they encounter. A first attempt
to justify the claims in [2, 3] is due to Mart́ın-Garćıa and Gundlach [4], who propose a
different but nevertheless closely related set of linearized matching conditions. Pointing
out the implicit assumption made by these authors will also help us to try to explain the
subtleties inherent to the perturbed matching theory.
In [6] the linearized matching conditions are described for arbitrary backgrounds,
perturbations and matching hypersurfaces, and then applied to the case of two background
spacetimes with a high degree of symmetry, namely those which admit a maximal group of
isometries acting on codimension two spacelike submanifolds (e.g. spherically symmetric
spacetimes). In order to simplify the matching conditions, Mukohyama derives a set of
matching conditions for so-called doubly gauge invariants. However, a gap arises in his
final conclusions as the presented set of conditions for doubly gauge invariant quantities
for the linearized matching of spacetimes are only shown to be necessary conditions.
Analysing sufficiency touches directly on the issue we are trying to emphasize in this
paper, so we devote one section to clarify this point, where we show how these conditions
are, stricktly speaking, not sufficient. Since the matching conditions in terms of doubly
gauge invariants are widely used in the literature, we consider important to close this
gap. Moreover, the constructions of gauge invariant quantities using spherical harmonic
decompositons leaves out the l = 0 and l = 1 sectors. We will discuss this issue and its
consequences.
The paper is organized as follows. We start by summarising the perturbed matching
conditions in Section 2, where we also describe the gauge freedom involved. Then, the
procedures used in the classic papers [2, 3] together with the justifications and further
developments in [4] are reviewed in Section 3. Section 4 focuses on the consequences of
the existence of symmetries in the background configuration, which will have relevance
in our final discussion. Section 5 has three subsections. The first one is devoted to
present briefly the procedure and results discussed in [6] particularised to the case of
spherically symmetric backgrounds. In the second subsection we analyse the sufficiency
of the doubly gauge invariant matching conditions in [6]. The last subsection is devoted
to the study of the freedom left in the perturbation of the matching hypersurface once
the metric perturbations have been fixed at both sides. We finish with an appendix where
explicit expressions for the discontinuities of the perturbed second fundamental forms in
the spherical case are given. Some of these expressions are used in the main text.
2 Linearized matching
In this section we describe the gauge freedom involved in the linearised spacetime matching
and summarise the perturbed matching conditions.
2.1 Gauge freedom
The purpose of the matching theory is to construct a new spacetime out of two spacetimes
M± with boundary by finding a suitable diffeomorphism between the boundaries which
allows for their pointwise identification. In particular, the matched spacetime cannot
be thought to exist beforehand. Another aspect to bear in mind is that the matching
conditions involve exclusively tensors on the identified boundary Σ and hence any coordi-
nate system in M± is equally valid. This is well-known but it is still source of confusion
sometimes.
In perturbed matching theory, not only the metrics are perturbed but also the match-
ing hypersurfaces may be deformed. Furthermore, as for the metric, the “deviation” of the
matching hypersurface is also a gauge dependent quantity. This can be best understood
by viewing perturbations as ε-derivatives (at ε = 0) of a one-parameter family of space-
times (M+ε , g
ε ) with boundary Σ
ε . It is convenient to embed M
ε within a larger manifold
(without boundary) V +ε to clarify the discussion. A priori, the manifolds (M
ε , gε) are
completely distinct so it makes no direct sense to talk about ε-derivatives. It is necessary
to identify first the different manifolds so that a single point p refers to one point on each
of the manifolds. Obviously, there are infinite ways to identify the manifolds, all of them
equally valid a priori. This freedom leads to the gauge dependence of the perturbed met-
ric (and of any other geometrically defined tensor). The identification above may, or may
not, map the boundaries Σ+ε among themselves. A priori, a point in Σ
0 may be mapped,
for ε 6= 0, to a point on Σ+ε , to a point interior to M+ε or to a point exterior to M+ε (within
the extension V +ε ) which is not part of the manifold. How can we then take derivatives
with respect to ε at those later points? Since only derivatives at ε = 0 are needed, re-
stricting to infinitesimal values of ε entails no loss of generality. Then, if for some small
ε, a point q ∈ Σ+0 is mapped to the exterior of M+ε , it follows from differentiability with
respect to ε that q is mapped, for the reverse value −ε, to a point interior to M+ε . Thus,
perturbations can be defined at the boundary by taking one sided derivatives, i.e. to take
limits ε → 0, with a sign restriction on ε (c.f. [7] for an alternative discussion).
However, an important issue remains: How do we describe the deformation of the
boundary Σ+0 ? As a set of points each boundary Σ
ε maps, with the above identification,
into a hypersurface of the background spacetime, which we call Σ̂+ε . In general, this
hypersurface will not coincide with Σ+0 and may well touch it or cross it. This gives us
an idea of how the boundary is deformed, but only as a subset, not pointwise. In order
to know how the boundary actually moves within the background, we need to prescribe a
priori a pointwise identification of Σ+0 with Σ
ε . This identification is completely different
and independent from the one described above involving spacetime points, and involves
only the points on the boundaries. As before, there are infinitely many ways to identify the
boundaries, and this defines a second gauge freedom, which involves objects intrinsically
defined on the boundary. This gauge freedom will be referred as hypersurface gauge, as
opposed to the usual spacetime gauge described above.
With both identifications chosen, the deformation of the boundary within the back-
ground can already be described: Fix a point q on the background boundary Σ+0 . The
identification of the boundaries defines a point qε on Σ
ε , for each ε. The spacetime iden-
tification takes this point qε and maps it into a point q̂ε of the background M
0 (perhaps
after a sign restriction on ε). Obviously q̂ε belongs to the perturbed hypersurface Σ̂
ε . We
have therefore not only a deformation of the background hypersurface as a set of points,
but also pointwise information. It only remains to take the tangent vector of the curve
q̂ε at ε = 0, i.e. ~Z
+ = dq̂ε
|ε=0 which encodes completely the deformation of the matching
hypersurface as seen from the background spacetime. Two final remarks are in order:
(i) ~Z+ is defined exclusively on Σ+0 , no extension thereof is defined or required and (ii)
~Z+ depends on both the spacetime and hypersurface gauges, since its defining curve is
constructed using both identifications. However, decomposing ~Z+ = Q+~n 0+ +
~T+, where
~n 0+ is the unit normal of Σ
0 (assumed non-null anywhere) and
~T+ is tangent to it, it turns
out that Q+ depends on the spacetime gauge but not on the hypersurface gauge. This
is because changing the hypersurface gauge reorganizes the points within each Σ̂+ε , but
cannot modify any of them as a set of points.
Tensors defined intrinsically on the boundaries Σ+ε are completely unaffected by the
spacetime identification, and are therefore invariant under spacetime gauge transforma-
tions. Recall that the matching conditions involve only objects intrinsic to the match-
ing hypersurfaces. Since the perturbed matching conditions are, formally, just their ε-
derivatives, it follows by construction that the perturbed matching conditions must be
gauge invariant under spacetime gauge transformations. This may seem surprising at first
sight since the matching conditions must involve the perturbed metric, which is obviously
gauge dependent. However, the conditions turn out to be gauge independent because they
also involve the deformation vector ~Z+, which is spacetime gauge dependent. This vector
is therefore of fundamental importance and must be taken into account in any sensible
approach to the problem, as we shall see next.
2.2 Matching conditions
Let (M±0 , g
0 ) be n-dimensional spacetimes with non-null boundaries Σ
0 . Matching them
requires an identification of the boundaries, i.e. a pair of embeddings Φ± : Σ0 −→ M±0
with Φ±(Σ0) = Σ
0 , where Σ0 is an abstract copy of any of the boundaries. Let ξ
(i, j, . . . = 1, . . . , n−1) be a coordinate system on Σ0. Tangent vectors to Σ±0 are obtained
by e±αi =
(α, β, . . . = 0, . . . , n − 1). There are also unique (up to orientation) unit
normal vectors n
α to the boundaries. We choose them so that if n
α points towards
M+ then n
α points outside of M− or viceversa. The first and second fundamental are
simply q(0)ij
± ≡ e±αi e
, K(0)ij
± = −n(0)±αe
i ∇±β e
j . The matching conditions
(in the absence of shells) require the equality of the first and second fundamental forms
on Σ±0 , i.e.
q(0)ij
+ = q(0)ij
−, K(0)ij
+ = K(0)ij
−. (1)
Under a perturbation of the background metric g±pert = g
(0)± + g(1)± and of the matching
hypersurfaces via ~Z± = Q± ~n(0)± + ~T
±, the matching conditions will be perturbatively
satisfied if and only if [6]
q(1)+ij = q
ij , K
ij = K
ij, (2)
q(1)ij
± = L~T±q
± + 2Q±K(0)ij
± + e±αi e
±, (3)
K(1)ij
± = L~T±K
± − ǫDiDjQ± +Q±(−n(0)± µn
αµβνe
±K(0)
g(1)αβ
βK(0)ij
± − n(0)± µS
(1)±µ
j , (4)
where ǫ = n
(0)α, D is the covariant derivative of (Σ, q(0)±) and S(1)
βγ ≡ 12(∇
(1)±α
∇±γ g(1)±αβ −∇±α g(1)
In these equations, Q± and ~T± are a priori unknown quantities and fulfilling the
matching conditions requires showing that two vectors ~Z± exist such that (2) are satisfied.
The spacetime gauge freedom can be exploited to fix either or both vectors ~Z± a priori,
but this should be avoided (or at least carefully analysed) if additional spacetime gauge
choices are made, in order not to restrict a priori the possible matchings. Regarding the
hypersurface gauge, this can be used to fix one of the vectors ~T+ or ~T−, but not both.
As already stressed the linearized matching conditions are by construction spacetime
gauge invariant (in fact each of the tensors q(1)ij
±, K(1)ij
± is). Moreover, the set of
conditions (2) are hypersurface gauge invariant, provided the background is properly
matched, since [6] under such a gauge transformation given by the vector ~ζ on Σ0, q
transforms as q(1)ij + L~ζ q(0)ij , and similarly for K(1)ij .
3 On previous spacetime gauge invariant formalisms
The first attempt to derive a general formalism for the matching conditions in linearized
gravity is, to our knowledge, due to Gerlach and Sengupta [2]. Their approach is based
on the description of the matching hypersurface Σ as a level set of a function f defined
on the spacetime. Assuming the level sets {f = const} to be timelike, a field of spacelike
unit normals is defined as nµ = (g
αβf,αf,β)
−1/2f,µ. The unperturbed matching conditions
correspond to the continuity everywhere (in particular across Σ) of the tensors
qαβ ≡ gαβ − nαnβ, Kαβ ≡ qαµqβν∇µnν , (5)
which are the spacetime versions of the first and second fundamental forms introduced
above. Being f defined everywhere, it makes sense to perturb it in order to describe the
variation of the matching hypersurface. Obviously, by perturbing f one also perturbs nµ.
The perturbed matching conditions proposed in [2] read
β△(qαβ)+ = qµαqνβ△(qαβ)−, qµαqνβ△(Kαβ)+ = qµαqνβ△(Kαβ)−, (6)
where qβ
α is the projector onto Σ, △ stands for perturbation and + and − denote the
quantities as computed from either side of the matching hypersurface Σ. These expres-
sions involve the projections of the perturbations of qαβ and Kαβ onto Σ. The need of
1We will abuse slightly the notation and refer to vectors on Σ0 and their images on spacetime with
the same symbol. The meaning should be clear from the context.
considering only the projected components is justified in [2] since the matching condi-
tions need to be intrinsic to the matching hypersurfaces. However, Gerlach and Sengupta
themselves note that conditions (6) are not gauge2 invariant.
Since the main interest in [2, 3] refers to spherically symmetric backgrounds, this
“ambiguity” is fixed in that case by finding suitable gauge invariant combinations of
the linearized matching conditions, which turn out to give a correct set of necessary
perturbed matching conditions in spherical symmetry. It should be stressed however, that
the authors consider these gauge invariant subset to be sufficient also, with no further
justification.
We know from the discussion in Sect. 2.1 above that (6) cannot be correct as it leads
to a set of gauge dependent conditions. Since, on the other hand the proposal (6) may
look plausible, it is of interest to point out where, and in which sense, it fails to be correct.
The first source of problems comes from assuming that the matched spacetime is given
beforehand. Indeed, qαβ and Kαβ are spacetime tensors and they can only exist (and be
continuous) once the matched spacetime is constructed. But this is precisely the purpose
of the matching conditions, so the conditions become circular. Another aspect of the same
problem is that one can only talk about continuity once the pointwise identification of
the boundaries is chosen. But a level set of a function defines only a set of points and not
the way those points must be identified. A third instance of the same issue is that tensor
components must be expressed in some basis, e.g. a common coordinate system covering
both sides of Σ. But again this cannot be assumed a priori. It needs to be constructed.
Let us however mention that once the pointwise identification of the boundaries is
chosen, the use of spacetime tensors is allowed provided they are finally projected onto
the hypersurface. In that sense, and when properly used, using spacetime indices may
simplify some calculations notably (see Carter and Battye, [5] where this notation is used
to derive the perturbed matching conditions).
Besides this aspect (which already affects the background matching) the perturbed
equations (6) suffer from one extra problem. The perturbations △(qαβ)(p) and△(Kαβ)(p)
at a point p in the background can be defined by taking ε-derivatives at fixed p and ε = 0
of the corresponding tensors (defined by gαβ(ε) and fε). For each value of ε, the matching
conditions impose the continuity of qαβ(ε) andKαβ(ε) everywhere (with the caveat already
mentioned regarding the identification of the boundaries). However, continuity of △(qαβ)
and △(Kαβ) at p would only follow if derivatives of continuous functions with respect
to an external parameter were necessarily continuous (in our case, the derivative with
respect to ε), which is not true in general. A trivial example is given by the function
u(ε, x) = |x + ε|, whith x ∈ R. For each ε this function is continuous. However, the
derivative with respect to ε does not even exist at x = 0, ε = 0. This reflects the fact that
subtracting continuous tensors at a fixed spacetime point p leads to objects that need not
be continuous. This is in fact the main problem of (6) as linearized matching conditions.
An immediate question arises: Why is the gauge invariant subset of matching condi-
tions found in [2, 3] for spherically symmetric backgrounds correct? In order to understand
this, let us rewrite (6) using the formalism of section 2.2. First of all, since △(nαnβ) will
contain, at least, one free n
α , we have
β△(qαβ)± = qµαqνβg(1)
αβ . (7)
2Throughout this section gauge will refer to spacetime gauge. Hypersurface gauges will only appear
briefly towards the end of the section.
Moreover, a simple calculation gives △(∇αnβ) = ∇α(△nβ) − S(1)µαβ n
µ and △(qαβ ) =
−g(1)αµn(0)µ n(0)β + g(0)αµ△(nµ)n
(0)α△(nβ). These, together with standard properties
of the projector, lead to
β△(Kαβ)± =
a(0)ν qµ
α△(nα) + qµαqνβ∇α(△nβ)− qµαqνβS(1)ραβ n
±, (8)
where a
ν ≡ n(0)α∇αn(0)ν . In general, these expressions do not agree with (3) and (4).
However, when the gauges are chosen so that ~Z± = 0, then △f ≡ 0 on Σ because
the matching hypersurface is unperturbed as seen from the background. Consequently
∂α(△f) ∝ n(0)α on Σ, which implies △(nα)
α for some function h. Imposing ~n(ε) to
be unit for all ε fixes h = ǫ
β . Inserting into (8) the matching conditions (6)
become
βg(1)αβ
βg(1)αβ
, (9)
n(0)α n
β Kµν − qµ
n(0)α n
β Kµν − qµ
which agree with (2) (with the exception that (9) refers to spacetime tensors and (2) are
defined on Σ). Since Gerlach and Sengupta derive a subset of gauge invariant matching
conditions out of (6) in the spherically symmetric case and their conditions are correct in
one gauge, it follows that the invariant subset is correct in any gauge. This is the reason
why the results in [2, 3] involving spherically symmetric backgrounds turn out to be fine.
Substantial progress in the linearized matching problem was made by Mart́ın-Garćıa
and Gundlach [4]. These authors pointed out the lack of justification in [2, 3] for the choice
of (6) as matching conditions. It was also argued that for spacetimes with boundary it
only makes sense to define perturbations by using gauges where the perturbed matching
hypersurface is mapped onto the background matching hypersurface. Perturbations in this
gauge, called “surface gauge” (not to be confused with hypersurface gauge) are denoted
by △̄, and its defining property is △̄f = 0. The idea was to write down the matching
conditions in this gauge and then transform into any other gauge if necessary. As noticed
by the authors, the surface gauge is not unique since there are still three degrees of freedom
left, which correspond to the three directions tangent to Σ.
A relevant observation made in [4] was that the continuity of tensorial perturbations
may depend on the index position in the tensors. The authors argue that the tensors
truly intrinsic to the hypersurfaces are qαβ, Kαβ (with indices upstairs) and propose the
following perturbed matching conditions
△̄(qαβ)+ = △̄(qαβ)−, △̄(Kαβ)+ = △̄(Kαβ)−, (10)
which are demonstrated to become exactly (9). This shows the equivalence of both pro-
posals in the surface gauge, as explicitly stated in [4]. This justifies partially the validity of
both approaches in the surface gauge. However, the justification is not complete because
of the issue we discuss next.
Indeed, conditions (10) still carry one implicit assumption that needs to be clarified.
As already stressed the perturbed matching conditions have two inherent and independent
degrees of gauge freedom. The approach by Mart́ın-Garćıa and Gundlach involves only
spacetime objects, and therefore can only notice the spacetime gauge freedom. This leads
to an incorrect statement in [4], as it is not true that the linearized matching conditions
read (10) in any surface gauge. Conditions (10) will only be valid when the spacetime
gauge maps pairs of background points (identified, via the background matching) to pairs
of points on the perturbed boundaries Σ±ε which are also identified through the matching.
Notice that not all surface gauges have this property. In explicit terms, this means that
the vectors ~Z± must (i) only have tangential components (so that we are in surface gauge)
and (ii) have the same components when written in terms of an intrinsic basis of Σ0. In
less precise, but more intuitive terms, condition (ii) states that ~Z+ and ~Z− are the same
vector, i.e. that the gauges in both regions are chosen such that the displacement of
one fixed point of the background hypersurface is identical in both regions (the displaced
point, of course, stays on the unperturbed hypersurface, due to the choice of surface
gauge). Observe finally that if Q± = 0 and ~T+ = ~T−, then the linearized matching
conditions (2) truly reduce to conditions (9), once the latter are projected on Σ. This
shows the correctness of the approaches by Gerlach and Sengupta and Mart́ın-Garćıa and
Gundlach in special gauges.
4 Freedom in matching due to symmetries
We devote this section to the study of the consequences of the existence of background
symmetries on perturbed spacetime matchings.
The existence of symmetries in the background configuration introduces two issues
which are important to take into consideration: the first corresponds to the freedom in-
troduced by the matching procedure, when preserving the symmetries, at the background
level [9], c.f. [10] for an application. The second issue corresponds to the consequences
that the symmetries in the background configuration may have on the perturbation of the
matching.
It must be stressed here that the arbitrariness introduced by the presence of symmetries
in the background configuration is completely independent from both the hypersurface and
spacetime gauge freedoms. However, that arbitrariness is gauge dependent and therefore
a gauge choice can be made to remove it. As we will show, an isometry in the background
implies that there is a direction along which the difference [~T ] ≡ ~T+ − ~T− cannot be
determined by the perturbed matching equations. But, as we have discussed at the end
of section 2, one could eventually choose part of the spacetime gauges (if there is any
freedom left) to fix [~T ]. Note, finally, that a change of hypersurface gauge leaves [~T ]
invariant.
4.1 Isometries
We shall now consider the presence of isometries in the background configuration. So,
let us assume that one of the sides, say (M+0 , g
(0)+), admits an isometry generated by
the Killing vector field ~ξ+ tangent to the boundary Σ+0 . The commutation of the Lie
derivative and the pull-back implies [9]
L~ξ+q
+ = e+αi e
j L~ξ+g
+|Σ0 = 0,
which means that ~ξ+ is a Killing vector of (Σ0, q
+). This implies from expression (3)
that q(1)ij
+ is invariant under the transformation ~T+ → ~T+ + ~ξ+|Σ0.
As for K(1)ij
+, from its expression (4), it is again clear that the previous transforma-
tions of ~T+ will leave K(1)ij
+ invariant provided L~ξ+K(0)ij+ = 0. But this is precisely the
case since ~ξ+ is a Killing vector orthogonal to n
+ , which implies L~ξ+n
+ β|Σ+
= 0, and
hence
L~ξ+K
+ = e+αi e
j L~ξ+(∇n
+ )αβ|Σ0 = e+αi e
j ∇αL~ξ+n
+ β|Σ0 = 0.
Of course, all this discussion also applies to the (−) side.
The combination of the invariance of q(1)ij
± and K(1)ij
± leads to the fact that the
first order perturbed matching conditions are invariant under a change of the vectors
~T± along the direction of any isometry of the background configuration (preserved by
the matching). Then, as expected, when symmetries are present the linearized matching
conditions cannot determine the difference [~T ] completely: they leave undetermined the
relative (between the two sides) deformation of the hypersurface along the direction of
the symmetry. Note that, still, the shape of the perturbed hypersurface is completely
determined, since that is driven by Q±.
The overall picture is as follows: at the background level we have the arbitrariness of
the identification of Σ+0 with Σ
0 [9], which can be seen as a “sliding” between Σ
0 and
Σ−0 . The perturbation adds to this an arbitrary shift of the deformation of the matching
hypersurface at each side along the orbits of the isometry group. As an example, in
the description of stationary and axisymmetric compact bodies discussed in [10, 9], the
background sliding corresponds to an arbitrary constant rotation of the interior with
respect to the exterior. Note that, in that case, this rotation is only relevant because
the exterior is taken to be asymptotically flat. As a result, two identical interiors can,
in principle, give rise to two exteriors that differ by a constant rate rotation [10]. The
shift of the surface deformation would, in principle, lead to an arbitrary constant rotation
along the axial coordinate of the surface deformation of the body. Likewise, two identical
perturbations in the interior of the body may produce two different perturbations in the
exterior, which may differ by a relative constant rate rotation. A choice of spacetime
gauge could be used to relate the deformations inside and outside. However, this may
interfere with other gauge fixings that may have been made.
5 n-dimensional spherically symmetric backgrounds
In this section we shall revisit Mukohyama’s theory for linearized matching in the special
case of spherical symmetry. Similar results [6] hold for backgrounds admitting isometry
groups of dimension (n−1)(n−2)/2 acting on non-null codimension-two orbits of arbitrary
topology (strictly speaking the orbits need to be compact).
5.1 The approach of Mukohyama
Concentrating on one of the two spacetimes to be matched, either + or −, we consider a
spherically symmetric background metric of the form
αdxβ = γabdx
adxb + r2ΩABdθ
AdθB, (11)
where γab (a, b, .. = 0, 1) is a Lorentzian two-dimensional metric (depending only on {xa}),
r > 0 is a function of {xa}, and ΩABdθAdθB is the n − 2 dimensional unit sphere metric
with coordinates {θA} (A,B, . . . = 2, 3, . . . , n− 1).
A general spherically symmetric background hypersurface can be given in parametric
form as
Σ0 := {x0 = Z(0)0(λ), x1 = Z(0)1(λ), θA = ϑA}, (12)
where {ξi} = {λ, ϑA} is a coordinate system in Σ0 adapted to the spherical symmetry.
The tangent vectors to Σ0 read
~eλ =
˙Z(0)0∂x0 +
˙Z(0)1∂x1
, ~eϑA = ∂θA |Σ0 , (13)
where dot is derivative w.r.t. λ. With N2 ≡ −ǫeλaeλbγab|Σ0, so that ǫ = 1 (ǫ = −1)
corresponds to a timelike (spacelike) hypersurface, the unit normal to Σ0 reads
(0) =
− det γ
− ˙Z(0)1dx0 + ˙Z(0)0dx1
, (14)
where the sign choice of N corresponds to the choice of orientation of the normal. The
background induced metric and second fundamental form on Σ0 read
q(0)ijdξ
idξj = −ǫN2dλ2 + r2|Σ0ΩAB|Σ0dϑAdϑB, (15)
K(0)ijdξ
idξj = N2Kdλ2 + r2K̄|Σ0ΩAB|Σ0dϑAdϑB, (16)
where
K ≡ N−2eλaeλb∇an(0)b , K̄ = n
(0)a∂xa ln r.
It follows that the background matching conditions (1) are
N2+ = N
+|Σ0 = r2−|Σ0, K+ = K−, K̄+ = K̄−. (17)
Using (3) and (4) we could now compute the first order perturbations q(1)ij and K
ij in
terms of the above quantities and ~Z (or equivalently Q and ~T ), c.f. Eqs. (45) and (46)
in [6]. Let us recall (see subsection 2.2) that while the individual tensors q(1)ij and K
are not hypersurface gauge invariant, their respective differences from the + and − sides
(i.e. the linearized matching conditions) are. Those tensors depend of the hypersurface
gauge through the tangent vectors ~T+ and ~T−, which under a gauge change transform
simply by adding the gauge vector. It follows that only their difference [~T ] can appear
in the linearized matching conditions. Consequently there are three degrees of freedom
that cannot be fixed by the equations, but can be fixed by choosing the hypersurface
gauge, for instance to set ~T+. Thus, the linearized matching conditions can be looked at
as equations for the difference [~T ] as well as for Q+ and Q−, i.e. for five objects. If these
equations admit solutions, then the linearized matching is possible and it is impossible
otherwise.
Mukohyama emphasizes the convenience to look for doubly gauge invariant quantities
to write down the linearized matching conditions, however the matching conditions are
already gauge invariant (both for the spacetime and hypersurfaces gauges). Looking for
gauge invariant combinations on each side amounts to writing equations where the dif-
ference vector [~T ] simply drops. Indeed, in many cases, knowing the value of such vector
in a specific matching is not interesting. In that sense, using doubly gauge invariant
quantities is useful as it lowers the number of equations to analyse. However, we want to
stress that this is not related to obtaining gauge invariant linearized matching equations.
It is just related to not solving for superfluous information. In fact, a set of equations
where also Q+ and Q− have disappeared would be even more convenient from this point
of view, provided one is not interested in knowing how the hypersurfaces are deformed in
the specific spacetime gauge being used.
Since the use of doubly gauge invariant matching conditions is used extensively, let us
recall its main ingredients in order to discuss if they really are equivalent to the full set
of linearized matching equations and in which sense.
To that aim Mukohyama [6], decomposes the perturbation tensors q(1)ij and K
ij in
terms of scalar Y , vector VA and tensor harmonics TAB on the sphere, as
q(1)ijdξ
idξj =
(σ00Y dλ
2 + σ(Y )T(Y )ABdϑ
AdϑB) +
2(σ(T )0V(T )A + σ(L)0V(L)A)dλdϑ
(σ(T )T(T )AB + σ(LT )T(LT )AB + σ(LL)T(LL)AB)dϑ
AdϑB, (18)
K(1)ijdξ
idξj =
(κ00Y dλ
2 + κ(Y )T(Y )ABdϑ
AdϑB) +
2(κ(T )0V(T )A + κ(L)0V(L)A)dλdϑ
(κ(T )T(T )AB + κ(LT )T(LT )AB + κ(LL)T(LL)AB)dϑ
AdϑB, (19)
where all the scalar coefficients depend only on λ. Each coefficient in the decomposition
has indices l and m which have been dropped for notational simplicity. Notice that
each coefficient σ and κ is defined in the range of l’s appearing in the corresponding
summatory. By construction, each of the σ and κ are spacetime-gauge invariant (but
not hypersurface-gauge invariant). For l ≥ 2 they can even be written down [6] explicity
in terms of spacetime-gauge invariant quantities. In a similar way, the doubly gauge-
invariant quantities presented in [6], are only defined for l ≥ 2 (except k(T )0, which is also
defined for l = 1), and read
l ≥ 2 : f00 ≡ σ00 − 2N∂λ
l ≥ 2 : f ≡ σ(Y ) + ǫN−2χ∂λ
r2|Σ0
k2l σ(LL),
l ≥ 2 : f0 ≡ σ(T )0 − r2|Σ0∂λ
r−2|Σ0σ(LT )
l ≥ 2 : f(T ) ≡ σ(T ),
l ≥ 2 : k00 ≡ κ00 + ǫKσ00 + ǫχ∂λK,
l ≥ 1 : k(T )0 ≡ κ(T )0 − K̄σ(T )0, (20)
l ≥ 2 : k(L)0 ≡ κ(L)0 +
(ǫK − K̄)σ(L)0 +
(ǫK + K̄)
χ− r2|Σ0∂λ(r−2|Σ0σ(LL))
l ≥ 2 : k(LT ) ≡ κ(LT ) − K̄σ(LT ),
l ≥ 2 : k(LL) ≡ κ(LL) − K̄σ(LL),
l ≥ 2 : k(Y ) ≡ κ(Y ) − K̄σ(Y ) + ǫN−2r2|Σ0χ∂λK̄,
l ≥ 2 : k(T ) ≡ κ(T ) − K̄σ(T ),
3The ranges of l’s are not made explicit in [6] in order to include also non-compact homogeneous
spaces, where the index l is continuous. However, to discuss sufficiency of the equations we need to be
precise on the range of validity of each equation.
where k2l = l(l + n− 3) and
l ≥ 2 : χ ≡ σ(L)0 − r2|Σ0∂λ(r−2|Σ0σ(LL)).
The orthogonality properties of the scalar, vector and tensor harmonics imply that
the equalities of the coefficients σ and κ for each l and m is equivalent to the equality of
the perturbation tensors (18) and (19) at both sides of Σ0. Thus, recalling the notation
[f ] ≡ f+|Σ0 − f−|Σ0 , the equations
l ≥ 0 : [σ00] = [σ(Y )] = 0
l ≥ 1 : [σ(L)0] = [σ(T )0] = 0
l ≥ 2 : [σ(T )] = [σ(LT )] = [σ(LL)] = 0
l ≥ 0 : [κ00] = [κ(Y )] = 0
l ≥ 1 : [κ(L)0] = [κ(T )0] = 0
l ≥ 2 : [κ(T )] = [κ(LT )] = [κ(LL)] = 0
are equivalent to (2) and therefore correspond exactly to the linearized matching condi-
tions in this setting. Notice that each of the equalities in (21) and (22) is in fact one
equation for each l and m in the appropriate range. We will however refer to them simply
as equations.
The full linearized matching conditions obviously imply the following equalities in
terms of the doubly-gauge invariant quantities (20),
l ≥ 2 : [f00] = [f ] = [f0] = [f(T )] = 0 (23)
l ≥ 1 : [k(T )0] = 0
l ≥ 2 : [k00] = [k(Y )] = [k(L)0] = [k(LL)] = [k(LT )] = [k(T )] = 0.
Whether these equations can be regarded as the full set of linearized matching conditions
or not requires studying their sufficiency, i.e. whether they imply (21)-(22) or not. This
point was not mentioned in [6] and in fact the answer turns out to be negative, although
in a mild way, as we discuss in the next subsection.
5.2 On the sufficiency of the continuity of the doubly-gauge in-
variants
Let us recall that fulfilling the matching conditions requires finding two ~Z± such that
(21)-(22) are satisfied. The key issue for the matching is therefore to show existence of
deformation vectors ~Z± so that all the equations hold.
A plausibility argument in favour of the sufficiency of (23)-(24) comes from simple
equation counting. Indeed, as already discussed, the linearized matching conditions are
spacetime and hypersurface gauge invariant and therefore can only involve the difference
vector [~T ], i.e. three quantities. Since constructing double gauge invariant quantities on
each side eliminates this vector, the number of equations should be reduced exactly by
three if they are to remain equivalent to the original set. This is precisely what happens
as we go from the original forteen equations in (21)-(22) down to eleven equations in
(23)-(24). This argument however is not conclusive, both because it is not rigorous and
because each equation in those expressions is, in fact, a set of equations depending on l
and m, and the range of l’s changes with the equations. Let us therefore analyse this issue
in detail. In particular we need to discuss what are the consequences of the non-existence
of doubly gauge-invariant variables for l = 0 and l = 1 (except for k(T )0 which exists for
l = 1), something not mentioned in [6].
Let us start by finding explicit expressions for σ’s valid in the whole range of l’s. As
in [6], we decompose g(1) in harmonics as
g(1)αβdx
αdxβ =
(habY dx
adxb + h(Y )T(Y )ABdθ
AdθB)
2(h(T )aV(T )A + h(L)aV(L)A)dx
(h(T )T(T )AB + h(LT )T(LT )AB + h(LL)T(LL)AB)dθ
AdθB, (25)
and ~Z as
zaY dx
(z(T )V(T )A + z(L)V(L)A)dθ
QY n(0) − ǫN−2zλY eλ
(z(T )V(T )A + z(L)V(L)A)dθ
A, (26)
which implies Tαdx
l=0(−ǫN−2zλY eλ) +
l=1(z(T )V(T )A + z(L)V(L)A)dθ
A. Inserting
these expressions into (2) and expanding in spherical harmonics it is straightforward to
l ≥ 0 : [σ00] = 0 ⇔ [hλλ] + 2[Q]N2K + 2N∂λ
N−1[zλ]
l ≥ 1 : [σ(L)0] = 0 ⇔ [zλ] + [h(L)λ] + r2|Σ0∂λ(r−2|Σ0[z(L)]) = 0, (27)
l ≥ 2 : [σ(LL)] = 0 ⇔ [z(L)] + [h(LL)] = 0, (28)
l ≥ 0 : [σ(Y )] = 0 ⇔ [h(Y )] + 2[Q]r2|Σ0K̄ − ǫN−2[zλ]∂λ(r2|Σ0)−
k2l [z(L)] = 0,
l ≥ 1 : [σ(T )0] = 0 ⇔ [h(T )λ] + r2|Σ0∂λ(r−2|Σ0[z(T )]) = 0,
l ≥ 2 : [σ(LT )] = 0 ⇔ [z(T )] + [h(LT )] = 0, (29)
l ≥ 2 : [σ(T )] = 0 ⇔ [h(T )] = 0,
where [hλλ], [h(L)λ], etc. denote eλ
b[hab], eλ
a[h(L)a], etc. Later on we will also write
down the explicit expressions for (22) but they are not needed in this subsection.
It is obvious by the form of f ’s and κ’s (20) that the set of equations (21)-(22) are
equivalent to (23)-(24) together with
l ≥ 2 : [σ(L)0] = [σ(LT )] = [σ(LL)] = 0 (30)
l = 0, 1 : [σ00] = [σ(Y )] = 0
l = 1 : [σ(L)0] = [σ(T )0] = 0
l = 0, 1 : [κ00] = [κ(Y )] = 0
l = 1 : [κ(L)0] = 0.
Sufficiency of Mukohyama’s doubly gauge invariant matching conditions would follow if
these equations serve exclusively to determine the discontinuity [~T ], i.e. [zλ] for l ≥ 0
and [z(T )], [z(L)] for l ≥ 1. Now, the explicit expressions (27), (29), (28) show that (30)
determine uniquely [zλ], [z(T )] and [z(L)] for l ≥ 2. So, restricted to the sector l ≥ 2
Mukoyama’s doubly gauge invariant matching conditions can be regarded as equivalent
to the full set of matching conditions. Taking all l’s into account, however, the equations
turn out not to be sufficient. To show this, it is enough to display one equation involving
the discontinuity of the background metric perturbations and [Q] (but not [~T ]) which holds
as a consequence of the full set of matching conditions (21)-(22) but not as a consequence
of (23)-(24). Using the fact that each l = 1 expression refers to n − 1 objects (one for
each m), the number of equations in (31)-(32) is 7n− 3, while the number of unkowns in
[~T ] not yet determined by (30), i.e. [zλ] for l = 0, 1 and [z(T )], [z(L)] for l = 1 is 3n − 2,
which is smaller. It is to be expected, therefore, that (31), (32) imply conditions where
these variables do not appear. This can be made explicit, for instance, by combining
[σ00]l=0 = 0 with [σ(Y )]l=0 = 0 which yields
l = 0 : [hλλ] + 2[Q]N
2K + 2N∂λ
∂λ(r2|Σ0)
[h(Y )] + 2[Q]r
2|Σ0K̄
whenever ∂λ(r
2|Σ0) 6= 0. (If ∂λ(r2|Σ0) = 0 it is enough to consider [σ(Y )]l=0 = 0.) This
relation is enough to show that the continuity of the doubly-gauge invariant variables of
Mukohyama is not sufficient to ensure the existence of the perturbed matching. Of course,
this does not invalidate Mukohyama’s approach in any way, which remains interesting
and useful. It only means that, when using this approach to solve linearized matchings,
one still needs to look more carefully into the l = 0 and l = 1 sector to make sure that
the remaining equations (31) and (32) hold.
On the other hand, equations (31), (32) do not completely determine [~T ]. The variable
[z(T )]l=1 only appears in [σ(T )0]l=1 = 0, in the term ∂λ(r
−2|Σ0 [z(T )]). As a result, the
matching conditions do not fix [z(T )]l=1 completely, but up to a constant factor times
r2|Σ0 (for each m). Recalling that V(T )AdϑA for l = 1 correspond to the three Killing
vectors on the sphere, this arbitrary constant (for each m) accounts for the addition to
[~T ] of an arbitrary Killing vector of the sphere. This is in accordance with the discussion
in Section 4. We devote the following subsection to complete the study of the freedom
left in the matching.
5.3 Freedom in the matching
As already emphasized, solving the linearized matching amounts to finding perturbation
vectors ~Z+ and ~Z−. Assume now that a linearized matching between two given back-
grounds and perturbations has been done. It is natural to ask what is the most general
matching between those two spaces, i.e. what is the most general solution for ~Z+ and ~Z−
of the matching conditions. Geometrically, this means finding all the possible deforma-
tions of the matching hypersurface Σ0 which allow the two spaces to be matched.
Since this problem is of interest not only when the full matching conditions are imposed
but also in situations where layers of matter are present (e.g. in brane-world or shell
cosmologies) so that jumps in the second fundamental forms are allowed, we will analyse
this issue in two steps. First, we will study the equations involving the perturbed first
fundamental forms and will determine the freedom they admit. On a second step we
will write down the extra conditions coming from the equality of the second fundamental
forms.
Thus, let us consider two perturbation configurations of the same background matching
and denote their respective sets of difference variables on Σ0 as [f ] and [f ]
′ for any given
variable f . Now, we will define the difference between the two configurations as < f >≡
[f ]′ − [f ] for any variable f . The assumption that the perturbation on each side is fixed
once and for all implies < g(1) >= 0. We are assuming that the linearized matching
conditions are satisfied in each case, and so we can subtract them. Linearity implies that
the differences of the linearised matching equations become equations for the difference
vector < ~Z >. The general solution of these equations clearly determines the freedom in
the deformation of the hypesurface.
The difference of the equations in (21) for the two configurations using < g(1) >= 0
give the following set of equations
l ≥ 0 : < σ00 >= 0 → < Q > N2K +N∂λ
N−1 < zλ >
= 0, (33)
l ≥ 1 : < σ(L)0 >= 0 → < zλ > +r2|Σ0∂λ(r−2|Σ0 < z(L) >) = 0, (34)
l ≥ 2 : < σ(LL) >= 0 → < z(L) >= 0, (35)
l ≥ 0 : < σ(Y ) >= 0 → 2 < Q > r2|Σ0K̄ − ǫN−2 < zλ > ∂λ(r2|Σ0)
k2l < z(L) >= 0, (36)
l ≥ 1 : < σ(T )0 >= 0 → ∂λ(r−2|Σ0 < z(T ) >) = 0, (37)
l ≥ 2 : < σ(LT ) >= 0 → < z(T ) >= 0, (38)
l ≥ 2 : < σ(T ) >= 0 → 0 = 0.
Expressions (35) and (38) readily determine < z(L) >l≥2=< z(T ) >l≥2= 0, which sub-
stituted in (34) give < zλ >l≥2= 0. As a result, (36) for l ≥ 2 lead to < Q >l≥2= 0.
Clearly all the equations for l ≥ 2 are now satisfied. We now concentrate on the l = 1
equations. Equation (37) implies that < z(T ) >l=1= ar
2|Σ0, where a is a constant for each
m. Combining equations (33), (34) and (36) for l = 1 we obtain the following equation
for r−2|Σ0 < z(L) >l=1,
K̄∂2λ(r−2|Σ0 < z(L) >l=1) +
(2K̄ + ǫK)∂λ(ln r|Σ0)− K̄∂λ lnN
−2|Σ0 < z(L) >l=1)
r−2|Σ0 < z(L) >l=1= 0, (39)
while (34) and (33) determine < zλ >l=1 and < Q >l=1 respectively (provided K 6= 0,
which occurs generically). The two equations for l = 0 can be rearranged onto
K̄∂λ(N−1 < zλ >l=0) +N−1 < zλ >l=0 ǫK∂λ ln(r|Σ0) = 0 (40)
plus the equation (33) for l = 0, which determines < Q >l=0.
Summarizing, we have found that the freedom in the deformation of the hypersurface
compatible with the linearized matching conditions involving the first fundamental form
[~Z]′ − [~Z] =
< Q > Y ~n(0) − ǫN−2 < zλ > Y ~eλ
+am~V
(T ) + r
−2|Σ0 < z(L) >l=1,m ~V m(L),
where r−2|Σ0 < z(L) >l=1,m, satisfy (39), < zλ >l=0 satisfy (40) and the rest of the
variables are completely determined as described above. The term in am corresponds
to adding Killing vectors on the sphere, something already discussed in Section 4. The
rest of terms involve combinations (with functions) of the conformal Killing vectors on
the sphere and tangential vectors along λ. Notice that the coefficients of the conformal
Killing (i.e. < z(L) >l=1,m ) determine all the rest of the l = 1 coefficients. In particular
when < z(L) >l=1,m vanishes, then all the l = 1 terms vanish and the freedom becomes
radially symmetric.
We now add to the analysis the difference of the equations in (22). Due to the fact
that all coefficients in < ~Z > vanish for l ≥ 2 we only need to consider the equations for
l = 0, 1, i.e. (32). We refer the reader to Appendix A for the explicit expressions of (32)
in terms of the metric perturbations and ~Z. For the sake of completeness we also include
all the explicit expressions of (22) in Appendix A. The difference of equations (32), see
(44)-(46), whenever < g(1) >= 0 read
l = 0, 1 : < κ00 >= 0 ⇔ (41)
− < QR(γ)dbac > n
(0)dn(0)aeλ
c − ǫ∂2λ < Q > +
2∂λ < Q > −ǫ < Q > K2N2
−ǫKN2∂λ(N−2 < zλ >)− ǫ∂λ(K < zλ >) = 0,
l = 1 : < κ(L)0 >= 0 ⇔ (42)
−ǫ∂λ < Q > +ǫK < zλ > +ǫ < Q > ∂λ ln(r|Σ0) + r2|Σ0K̄∂λ(r−2|Σ0 < z(L) >) = 0
l = 0, 1 : < κ(Y ) >= 0 ⇔ (43)
N−2∂λ(r
2|Σ0) (∂λ < Q > +K < zλ >) +
< Qn(0)an(0)b∇a∇br2 >
N−2eλ
a < zλn
(0)b∇b∇ar2 > +
l(l + n− 3)
ǫ < Q > −2K̄ < z(L) >
It can be checked that in general these equations overdetermine the previous equations,
i.e. (39) and (40), although there may be particular cases for which they are compatible.
Therefore, generically, they will imply that < z(L) >l=1,m= 0 and < zλ >l=0= 0, and
thence all the rest of the variables vanish, < zλ >l=1,m=< Q >l=1,m= 0, < zλ >l=0=<
Q >l=0= 0, so that the only freedom left is given by
[~Z]′ − [~Z] = am~V m(T ).
Finding in which particular cases equations (39)-(43) are compatible is straightforward
but tedious and will not be carried out explicitly here.
Acknowledgements
FM and MM thank CRUP(Portugal)/MCT(Spain) for grant E-113/04. FM thanks FCT
(Portugal) for grant SFRH/BPD/12137/2003 and CMAT, University of Minho, for sup-
port. MM was supported by the projects FIS2006-05319 of the Spanish Ministerio de
Educación y Tecnoloǵıa and SA010CO of the Junta de Castilla y León. RV was supported
by the Irish IRCSET, Ref. PD/2002/108, and now is funded by the Basque Government
Ref. BFI05.335.
A Appendix
For the sake of completeness we devote this appendix to present the explicit expressions
of (22) in terms of the metric perturbations and ~Z, which read
l ≥ 0 : [κ00] = 0 ⇔ (44)
N2K[hnn]−
n(0)aeλ
c(2∇c[hab]−∇a[hbc])− [QR(γ)dbac]n
(0)dn(0)aeλ
−ǫ∂2λ[Q] +
2∂λ[Q]− ǫ[Q]K2N2 − ǫKN2∂λ(N−2[zλ])− ǫ∂λ(K[zλ]) = 0,
l ≥ 1 : [κ(L)0] = 0 ⇔ (45)
[hnλ]−
n(0)aeλ
b(∂b[h(L)a]− ∂a[h(L)b])− ǫ∂λ[Q]− ǫK[zλ]
+(ǫ[Q] + [h(L)n])∂λ ln(r|Σ0) + r2|Σ0K̄∂λ(r−2|Σ0[z(L)]) = 0
l ≥ 0 : [κ(Y )] = 0 ⇔ (46)
r2|Σ0K̄[hnn] +
N−2∂λ(r
2|Σ0) (ǫ[hnλ] + ∂λ[Q] +K[zλ]) +
[n(0)a∂ah(Y )]
[Qn(0)an(0)b∇a∇br2]−
N−2eλ
a[zλn
(0)b∇b∇ar2]
l(l + n− 3)
[h(L)n] + ǫ[Q]− 2K̄[z(L)]
l ≥ 1 : [κ(T )0] = 0 ⇔
n(0)aeλ
b(∂b[h(T )a]− ∂a[h(T )b]) + [h(T )n]∂λ ln(r|Σ0) + r2|Σ0K̄∂λ(r−2|Σ0[z(T )]) = 0,
l ≥ 2 : [κ(LT ] = 0 ⇔ −
[h(T )n] +
n(0)a∂a[h(LT )] + K̄[z(T )] = 0,
l ≥ 2 : [κ(LT ] = 0 ⇔ −
[h(L)n] +
n(0)a∂a[h(LL)] + K̄[z(L)]−
[Q] = 0,
l ≥ 2 : [κ(T ] = 0 ⇔
n(0)a∂a[h(T )] = 0.
References
[1] Gerlach U H and Sengupta U K (1979) “Gauge-invariant perturbations on most gen-
eral spherically symmetric space-times” Phys. Rev. D 19 2268-2272
[2] Gerlach U H and Sengupta U K (1979) “Junction conditions for odd-parity perturba-
tions on most general spherically symmetric space-times” Phys. Rev. D 20 3009-3014
[3] Gerlach U H and Sengupta U K (1979) “Even parity junction conditions for pertur-
bations on most spherically symmetric space-times” J. Math. Phys. 20 2540-2546
[4] Mart́ın-Garćıa J M and Gundlach (2001) “Gauge-invariant and coordinate-
independent perturbations of stellar collapse II: matching to the exterior” Phys. Rev.
D 64 024012
[5] Carter B and Battye R A (1995) “Gravitational Perturbations of Relativistic Mem-
branes and Strings” Phys. Lett. B35 29-35
[6] Mukohyama S (2000) “Perturbation of the junction conditions and doubly gauge-
invariant variables” Class. Quantum Grav. 17 4777-4797
[7] Mars M (2005) “First and second order perturbations of hypersurfaces” Class. Quan-
tum Grav. 22 3325-3347
[8] Mars M and Senovilla J M M (1993) “Geometry of general hypersurfaces in spacetime:
junction conditions” Class. Quantum Grav. 10 1865-1897
[9] Vera R (2002) “Symmetry-preserving matchings” Class. Quantum Grav. 19 5249-5264
[10] Mars M and Senovilla J M M (1998) “On the construction of global models describing
rotating bodies; uniqueness of the exterior gravitational field” Mod. Phys. Lett. A13
1509-1519
	Introduction
	Linearized matching
	Gauge freedom
	Matching conditions
	On previous spacetime gauge invariant formalisms
	Freedom in matching due to symmetries
	Isometries
	n-dimensional spherically symmetric backgrounds
	The approach of Mukohyama
	On the sufficiency of the continuity of the doubly-gauge invariants
	Freedom in the matching
	Appendix
ABSTRACT
  We present a critical review about the study of linear perturbations of
matched spacetimes including gauge problems. We analyse the freedom introduced
in the perturbed matching by the presence of background symmetries and revisit
the particular case of spherically symmetry in n-dimensions. This analysis
includes settings with boundary layers such as brane world models and shell
cosmologies.

<|endoftext|><|startoftext|>
Introduction
The unilateral shift on complex separable Hilbert space generates two funda-
mental operator algebras, namely the norm closed (unital) algebra and the
weak operator topology closed algebra. The former is naturally isomorphic to
the disc algebra of holomorphic functions on the unit disc, continuous to the
boundary, while the latter is isomorphic to H∞. The freely noncommuting
multivariable generalisations of these algebras arise from the freely noncom-
muting shifts Le1 , . . . , Len given by the left creation operators on the Fock
space Fn =
k=0⊕(Cn)⊗k. Here the generated operator algebras, denoted
An and Ln for the norm and weak topologies, are known as the noncommu-
tative disc algebra and the freesemigroup algebra. They have been studied
extensively with respect to operator algebra structure, representation theory
and the multivariable operator theory of row contractions. See for example
[2], [9].
Higher rank generalisations of these algebras arise when one considers
several families of freely noncommuting generators between which there are
commutation relations. In the present paper we consider a very general form
of such relations, namely
LeiLfj =
ui,j,k,lLflLek
where Le1 , . . . , Len and Lf1 , . . . , Lfm are freely noncommuting and u = (ui,j,k,l)
is an nm×nm unitary matrix. The associated operator algebras are denoted
Au and Lu and we classify them up to various forms of isomorphism in terms
of the unitary matrices u. Such unitary relations arose originally in the con-
text of the general dilation theorem proven in Solel ([12], [13]) for two row
contractions [T1 · · ·Tn] and [S1 · · ·Sm] satisfying the unitary commutation
relations.
For n = m = 1, we have u = [α] with |α| = 1 and Au is the subalgebra of
the rotation C*-algebra for the relations uv = αvu. When u is a permutation
unitary matrix arising from a permutation θ in Snm then the relations are
those associated with a single vertex rank 2 graph in the sense of Kumjian
and Pask, and the algebras in this case have been considered in Kribs and
Power [5] and Power [10]. In particular, in [10] it was shown that there are 9
operator algebras Aθ arising from the 24 permutations in case n = m = 2. In
contrast, we see below in Section 6 that for general 2 by 2 unitaries u there
are uncountably many isomorphism classes of the unitary relation algebras
Au expressed in terms of a nine fold real parametrisation of isomorphism
types.
The algebras Aθ are easily defined; they are determined by the left regular
representation of the semigroup F+θ whose generators are e1, . . . , en, f1, . . . , fm
subject to the relations eifj = flek where θ(i, j) = (k, l). On the other hand
the unitary relation algebras Au are generated by creation operators on a
Z2+-graded Fock space
k,l ⊕(Cn)⊗k⊗ (Cm)⊗l with relations arising from the
identification u : Cn ⊗Cm → Cm ⊗Cn. In particular, Au is a representation
of the non-selfadjoint tensor algebra of a rank 2 correspondence (or a product
system over N2) in the sense of [13]. See also [3]
In the main results, summarised partly in Theorem 5.10, we see that if
Au and Av are isomorphic then the two families of generators have match-
ing cardinalities. Furthermore, if n 6= m then the algebras are isomorphic if
and only if the unitaries u, v in Mnm(C) are unitary equivalent by a unitary
A ⊗ B in Mn(C) ⊗Mm(C). As in [10] we term this product unitary equiv-
alence (with respect to the fixed tensor product decomposition). The case
n = m admits an extra possibility, in view of the possibility of generator
exchanging isomorphisms, namely that u, ṽ are product unitary equivalent,
where ṽi,j,k,l = v̄l,k,j,i.
The theorem is proven as follows. After some preliminaries we identify, in
Section 3, the character space M(Au) and the set of w*-continuous charac-
ters on Lu. These are subsets of the closed unit ball product Bn ×Bm which
are associated with a variety Vu in C
n×Cm determined by u. We then define
the core Ω0u, a closed subset of the realised character space Ωu =M(Au), and
we identify this intrinsically (algebraically) in terms of representations of Au
into T2, the algebra of upper triangular matrices in M2(C). The importance
of the core is that we are able to show that the interior is a minimal automor-
phism invariant subset on which automorphisms act transitively. This allows
us to infer the existence of graded isomorphisms from general isomorphisms.
To construct automorphisms we first review, in Section 4, Voiculescu’s con-
struction of a unitary action of the Lie group U(1, n) on the Cuntz algebraOn
and the operator algebras An and Ln. This provides, in particular, unitary
automorphisms Θα, for α ∈ Bn, which act transitively on the interior ball,
Bn, of the character space of An. For these explicit unitary automorphisms
of the ei-generated copy of An in Au, we establish unitary commutation re-
lations for the tuples Θα(Le1), . . . ,Θα(Len) and Lf1 , . . . , Lfm , when (α, 0) is
a point in the core. This enables us to define natural unitary automorphisms
of Au itself, and in Theorem 4.8 the relative interior of the core is identi-
fied as an automorphism invariant set in the Gelfand space Ωu. In Section
5 we determine the graded and bigraded isomorphisms in terms of product
unitary equivalence. To do this we observe that such isomorphisms induce
an origin preserving biholomorphic map between the cores Ω0u and Ω
v and
that these maps, by a generalised Schwarz’s Lemma, are implemented by a
product unitary. We then prove the main classification theorem.
In Section 6 we analyse in detail the case n = m = 2 and consider the
special case of permutation unitaries.
Finally, in Section 7 we show that the algebra Au is contained in a tensor
algebra T+(X), associated with a correspondence X as in [7]. Moreover, at
least when n 6= m, every automorphism of Au extends to an automorphism
of T+(X). The advantage of the tensor algebra is that its representation
theory is known ([7]) while this is not the case yet for the algebra Au.
2 Preliminaries
Fix two finite dimensional Hilbert spaces E = Cn and F = Cm and a
unitary mn × mn matrix u. The rows and columns of u are indexed by
{1, . . . , n}×{1, . . . , m} (u = (u(i,j),(k,l))) and when we write u as an mn×mn
matrix we assume that {1, . . . , n} × {1, . . . , m} is ordered lexicographically
(so that, for example, the second row is the row indexed by (1, 2)). We also
fix orthonormal bases {ei} and {fj} for E and F respectively and the matrix
u is used to identify E ⊗ F with F ⊗ E through the equation
ei ⊗ fj =
u(i,j),(k,l)fl ⊗ ek. (1)
Equivalently, we write
fl ⊗ ek =
ū(i,j),(k,l)ei ⊗ fj. (2)
For every k, l ∈ N, we write X(k, l) for E⊗k ⊗ F⊗l. Using succesive applica-
tions of (1), we can identify X(k, l) with E⊗k1 ⊗ F⊗l1 ⊗ E⊗k2 ⊗ · · · ⊗ F⊗lr
whenever k =
ki and l =
Let F(n,m, u) be the Fock space given by the Hilbert space direct sum
X(k, l) =
E⊗k ⊗ F⊗l,
and, for e ∈ E and f ∈ F , write Le and Lf for the “shift” operators
Leei1⊗ei2⊗· · ·⊗eik⊗fj1⊗fj2⊗· · ·⊗fjl = e⊗ei1⊗ei2⊗· · ·⊗eik⊗fj1⊗fj2⊗· · ·⊗fjl
Lfei1⊗ei2⊗· · ·⊗eik⊗fj1⊗fj2⊗· · ·⊗fjl = f⊗ei1⊗ei2⊗· · ·⊗eik⊗fj1⊗fj2⊗· · ·⊗fjl
where, in the last equation, we use (1) to identify the resulting vector as a
vector of E⊗k ⊗ F⊗(l+1).
The unital semigroup generated by {I, Le, Lf : e ∈ E, f ∈ F} is
denoted F+u and the algebra it generates denoted C[F
u ]. The norm closure
of C[F+u ] will be written Au and its closure in the weak* operator topology
will be written Lu. In particular, the algebras Lθ and Aθ studied in [10] are
the algebras Lu and Au for u which is a permutation matrix.
The results of Section 2 in [5] hold here too with minor changes. Every
A ∈ Lu is the limit (in the strong operator topology) of its Cesaro sums
Σp(A) =
(1− k
)Φk(A)
where Φk(A) lies in Lu and is “supported” on
l ⊕E⊗l ⊗ F⊗(k−l). In fact,
let Qk be the projection of F(n,m, u) onto
l ⊕E⊗l ⊗ F⊗(k−l), form the
one-parameter unitary group {Ut} defined by Ut :=
k=0 e
iktQk and set
γt = AdUt. Then {γt}t∈R is a w∗-continuous action of R on L(F(n,m, u))
that normalizes both Au and Lu and
Φk(a) =
e−iktγt(a)dt
for all a ∈ L(F(n,m, u)). Then Φk leaves Lu invariant.
We can define the algebra Ru generated by the right shifts Re and Rf
defined by
Reei1⊗ei2⊗· · ·⊗eik⊗fj1⊗fj2⊗· · ·⊗fjl = ei1⊗ei2⊗· · ·⊗eik⊗fj1⊗fj2⊗· · ·⊗fjl⊗e
Rfei1⊗ei2⊗· · ·⊗eik⊗fj1⊗fj2⊗· · ·⊗fil = ei1⊗ei2⊗· · ·⊗eik⊗fj1⊗fj2⊗· · ·⊗fil⊗f.
The techniques of the proof of Proposition 2.3 of [5] can be applied here to
show that the commutant of Ru is Lu. Also, mapping ei1 ⊗ ei2 ⊗ · · · ⊗ eik ⊗
fj1 ⊗ fj2 ⊗ · · · ⊗ fjl to fjl ⊗ fjl−1 ⊗ · · · ⊗ fj1 ⊗ eik ⊗ eik−1 ⊗ · · · ⊗ ei1 , we get a
unitary operator
W : F(n,m, u) → F(n,m, u∗)
implementing a unitary equivalence of Lu and Ru∗ . In fact, it is easy to
check that ReiW = WLei and RfjW = WLfj for every i, j. To see that the
commutation relation in the range is given by u∗, apply W to (2) to get (in
the range of W ) ek⊗ fl =
i,j ū(i,j),(k,l)fj ⊗ ei =
i,j(u
∗)(k,l),(i,j)fj ⊗ ei which
is equation (1) with u∗ instead of u.
As in [5], we conclude that (Lu)′ = Ru and (Lu)′′ = Lu.
3 The character space and its core
In the following proposition we describe the structure of the character spaces
M(Lu) and M(Au) (equipped with the weak∗ topology). Similar results
were obtained in [5] for algebras defined for higher rank graphs and in [2] for
analytic Toeplitz algebras. (See also [10].)
It will be convenient to write
Vu = {(z, w) ∈ Cn × Cm : ziwj =
u(i,j),(k,l)zkwl } (3)
Ωu = Vu ∩ (Bn × Bm) (4)
where Bn is the open unit ball of C
n. We refer to Vu as the variety associated
with u.
Proposition 3.1 (1) The linear multiplicative functionals on C[F+u ] are in
one-to-one correspondence with points (z, w) in Vu.
(2) M(Au) is homeomorphic to Ωu.
(3) For (z, w) ∈ Ωu, write α(z,w) for the corresponding character of Au.
Then α(z,w) extends to a w
∗-continuous character on Lu if and only if
(z, w) ∈ Bn × Bm.
Proof. Part (1) follows immediately from (1). Fix α ∈ M(Au) and
write zi = α(Lei), 1 ≤ i ≤ n, and wi = α(Lfj ), 1 ≤ j ≤ m. From the
multiplicativity and linearity of α and (1), it follows that (z, w) ∈ Vu. Since
α is contractive and maps
i aiLei to
i aizi, it follows that ‖z‖ ≤ 1 and
similarly ‖w‖ ≤ 1. Thus (z, w) ∈ Ωu.
For the other direction, fix first (z, w) ∈ Ωu with ‖z‖ < 1 and ‖w‖ < 1.
It follows from the definition of Ωu and from (1) that (z, w) defines a linear
and multiplicative map α on the algebra C[F+u ] such that Lei is mapped into
zi and α(Lfj) = wj . Abusing notation slightly, we write α(x) for α(Lx) for
every x ∈ E⊗k ⊗ F⊗l. Also, for i = (i1, . . . , ik) and j = (j1, . . . , jl), we write
eifj for ei1 ⊗ · · · ⊗ eik ⊗ fj1 ⊗ · · · ⊗ fjl. These elements form an orthonormal
basis for E⊗k ⊗ F⊗l and we now set
α(eifj)eifj ∈ F(X).
If pi ≥ 0 and p1 + . . . + pn = k then there are k!p1!···pn! terms ei1 ⊗ · · · ⊗ eik
with α(ei1 ⊗ · · · ⊗ eik) = z
2 · · · z
k . It follows that
i |α(ei)|2 =
i=(i1,...,ik)
|α(ei1)|2 · · · |α(eik)|2. Thus
‖wα‖2 =
i,j,k,l
|α(eifj)|2 = (1− ‖z‖2)−1(1− ‖w‖2)−1 <∞
Note that, for every x ∈ E⊗k ⊗ F⊗l,
〈x, wα〉 = α(x).
Thus, for e ∈ E, 〈x, L∗ewα〉 = 〈Lex, wα〉 = α(e⊗x) = α(e)α(x) = 〈α(e)wα, x〉
and, similarly 〈x, L∗fwα〉 = 〈α(f)wα, x〉 for f ∈ F . Thus 〈wα, L∗ewα〉 =
α(e)α(wα) = α(e)
|α(eifj)|2 = α(e)‖wα‖2. Similarly, 〈wα, L∗fwα〉 =
α(f)α(wα) = α(f)
|α(eifj)|2 = α(f)‖wα‖2 for f ∈ F . Thus if we write
να = wα/‖wα‖ then
α(x) = 〈Lxνα, να〉
for every x ∈ E⊗k ⊗ F⊗l (for every k, l). This shows that α is contractive
and is w∗-continuous. We can, therefore, extend it to an element of M(Lu),
also denoted α.
The analysis above shows that the image of the map α 7→ (z, w) ∈ Ωu
defined above (onM(Au)) contains Vu∩(Bn×Bm). Since M(Au) is compact
and the map is w∗-continuous, its image contains (and, thus, is equal to) Ωu.
This completes the proof of (2). To complete the proof of (3), we need
to show that, if (z, w) ∈ Ωu and the corresponding character extends to a
w∗-continuous character on Lu, then ‖z‖ < 1 and ‖w‖ < 1.
For this, write L for the w∗-closed subalgebra of Lu generated by {Le :
e ∈ E} ∪ {I}. Let P be the projection of F(E, F, u) onto F(E) = C ⊕
E ⊕ (E ⊗ E) ⊕ · · ·. Then PLP = PLuP and the map T 7→ PTP , is a
w∗-continuous isomorphism of L onto PLuP . The latter algebra is unitarily
equivalent to the algebra Ln studied in [2]. A w∗-continuous character of Lu
gives rise, therefore, to a w∗-continuous character on Ln. It follows from [2,
Theorem 2.3] that z ∈ Bn. Similarly, one shows that w ∈ Bm. �
To state the next result, we first write u(i,j) for the n×m matrix whose
k, l-entry is u(i,j),(k,l). Thus, the (i, j) row of u provides the n rows of u(i,j).
We then compute
u(i,j),(k,l)zkwl =
u(i,j),(k,l)wl)zk =
(u(i,j)w)kzk = 〈u(i,j)w, z̄〉.
Write Ei,j for the n×m matrix whose i, j-entry is 1 and all other entries are
0 (so that 〈Ei,jw, z̄〉 = ziwj) and write C(i,j) for the matrix u(i,j)−Ei,j . Then
the computation above yields the following.
Lemma 3.2 With C(i,j) defined as above, we have
Vu = {(z, w) ∈ Cn × Cm : 〈C(i,j)w, z̄〉 = 0, for all i, j}.
Definition 3.3 The core of Ωu is the subset given by
Ω0u := {(z, w) ∈ Bn × Bm : C(i,j)w = 0, Ct(i,j)z = 0 for all i, j}.
Fix (z, w) ∈ Ω0u. We have u(i,j)w = Ei,jw for all i, j. Thus, for every k,
u(i,j),(k,l)wl = δi,kwj (6)
(where δi,k is 1 if i = k and 0 otherwise) and, for a1, a2, . . . , an, in C we have
k,l u(i,j),(k,l)akwl = aiwj. Hence, if we let w̃
(i) be the vector in Cmn defined
by w̃
(k,l)
= δk,iwl, we get uw̃
(i) = w̃(i). Similarly, for z, we have
u(i,j),(k,l)zk = δj,lzi (7)
and for scalars b1, . . . , bm we have
k,l u(i,j),(k,l)blzk = bjzi. Thus, writing z̃(j)
for the vector defined by (z̃(j))(k,l) = δl,jzk, we have uz̃(j) = z̃(j). The vector
w̃(i) in Cnm = Cn ⊗ Cm is also expressible as δi ⊗ w where {δ1, . . . , δn} is
the standard basis of Cn, and, similarly, z̃(j) = z ⊗ δj . We therefore obtain
Lemma 3.4 which will be useful in Section 6.
We note also the following companion formula. Suppose (z, w) ∈ Ω0u.
Then, as we noted above, uz̃(j) = z̃(j) and, thus, u
∗z̃(j) = z̃(j). Writing this
explicitly, we have, for all i, j, l,
u(k,l),(i,j)z̄k = δj,lz̄i. (8)
Lemma 3.4 Let (z, w) be a vector in the core Ω0u. Then
span{z̃(j), w̃(i) : 1 ≤ i ≤ n, 1 ≤ j ≤ m} ⊆ Ker(u− I).
In particular,
(i) If the core contains a vector (z, w) with z 6= 0, then dim(Ker(u−I)) ≥
(ii) If the core contains a vector (z, w) with w 6= 0 then dim(Ker(u−I)) ≥
(iii) If the core contains a vector (z, w) with z 6= 0 and w 6= 0, then
dim(Ker(u− I)) ≥ m+ n− 1.
We now characterise the core in an algebraic manner in terms of repre-
sentations into the algebra T2 of upper triangular 2×2 matrices. We remark
that nest representations such as these have proven useful in the algebraic
structure theory of nonself-adjoint algebra [?], [11].
Let ρ : C[F+u ] → T2 with
ρ(T ) =
ρ1,1(T ) ρ1,2(T )
0 ρ2,1(T )
Then ρ1,1 and ρ2,2 are characters and ρ1,2 is a linear functional that satisfies
ρ1,2(TS) = ρ1,1(T )ρ1,2(S) + ρ1,2(T )ρ2,2(S) (9)
for T, S ∈ C[F+u ].
We now restrict to the case where ρ1,1 = ρ2,2. By Proposition 3.1(1),
both are associated with a point (z, w) in Vu. It follows from (9) that ρ1,2
is determined by its values on Lei and Lfj . Setting λi = ρ1,2(Lei) and µj =
ρ1,2(Lfj ), we associate with each homomorphism ρ (as discussed above) a
quadruple (z, w, λ, µ) where (z, w) ∈ Vu and, for every i, j,
ziµj + λiwj =
u(i,j),(k,l)(wlλk + µlzk). (10)
(The last equation follows from (1)). Using (5) we can write the last equation
〈u(i,j)w, λ̄〉+ 〈u(i,j)µ, z̄〉 = ziµj + λiwj = 〈Ei,jw, λ̄〉+ 〈Ei,jµ, z̄〉.
That is,
〈C(i,j)w, λ̄〉+ 〈µ, Ct(i,j)z〉 = 0. (11)
The following lemma now follows from the definition of the core.
Lemma 3.5 A point (z, w) ∈ Ωu lies in the core Ω0u if and only if every
(λ, µ) ∈ Cn × Cm defines a homomorphism ρ : C[F+u ] → T2 such that
ρ(Lei) =
zi λi
ρ(Lfj ) =
wj µj
for all i, j.
4 Automorphisms of Ln and Lu
We first derive the unitary automorphisms of Ln and An associated with
U(1, n). These were obtained by Voiculescu [14] in the setting of the Cuntz-
Toeplitz algebra. However the automorphisms restrict to an action of U(1, n)
on the free semigroup algebra. The result is rather fundamental, being a
higher dimensional version of the familiar Möbius automorphism group on
H∞. For the reader’s convenience we provide complete proofs. See also the
discussion in Davidson and Pitts [2], and in [1], [10].
Lemma 4.1 Let α ∈ Bn and write
(i) x0 = (1− ‖α‖2)−1/2,
(ii) η = x0α, and
(iii) X1 = (ICn + ηη
∗)1/2.
(1) ‖η‖2 = |x0|2 − 1,
(2) X1η = x0η, and
(3) X21 = I + ηη
In particular, the matrix X =
satisfies X∗JX = J , where J =
Proof. Part (1) is an easy computation and part (3) follows from the
definition of X1. For (2), note that X
1η = (I + ηη
∗)η = η + ‖η‖2η = x20η
and, for every ζ ∈ η⊥, X1ζ = ζ . Suppose X1η = aη + ζ (ζ ∈ η⊥). Then
x20η = X
1η = a
2η + ζ and it follows that a = x0 (as X1 ≥ 0) and ζ = 0. �
The lemma exhibits specific matrices (X1 is nonnegative) in U(1, n) asso-
ciated with points in the open ball. One can similarly check (see [2] or [10] for
example) that the general form of a matrix Z in U(1, n) is Z =
η2 Z1
where
‖η1‖2 = ‖η2‖2 = |z0|2 − 1,
Z1η1 = z̄0η2, Z
1η2 = z0η1,
Z∗1Z1 = In + η1η
1 , Z1Z
1 = In + η2η
It is these equations that are equivalent to the single matrix equation Z∗JZ =
It is well known that the map θX defined on Bn by
θX(λ) =
X1λ+ η
x0 + 〈λ, η〉
, λ ∈ Bn.
is an automorphism of Bn with inverse θX−1 . See Lemma 4.9 of [2] and Lemma
8.1 of [10] for example. We make use of this in the proof of Voiculescu’s
theorem below.
Let L1, . . . , Ln be the generators of the norm closed algebraAn and for ζ ∈
Cn write Lζ =
ζiLi. Recall that the character space M(An) is naturally
identifiable with the closed ball B̄n, with λ in this ball providing a character
φλ for which φλ(Li) = λi. The proof is a reduced version of that given above
for M(Aθ).
Theorem 4.2 Let α ∈ Bn and let X1, x0, η and X be associated with α as
in Lemma 4.1. Then
(i) there is an automorphism ΘX of Ln such that
Θα(Lζ) = (x0I + Lη)
−1(LX1ζ + 〈ζ, η̄〉I), (12)
(ii) the inverse automorphism Θ−1X is ΘX−1, and X
−1 is the matrix in
U(1, n) associated with −α,
(iii) there is a unitary UX on Fn such that for a ∈ An,
UXaξ0 = Θα(a)(x0I + Lη)
and ΘX(a) = UXaU
Proof. Let Fn be the Fock space for Ln, In = IFn , and let L̃ =
[In L1 · · ·Ln] viewed as an operator from (C⊕ Cn)⊗Fn = Fn ⊕ (Cn ⊗Fn)
to Fn. Then
L̃(J ⊗ I)L̃∗ = In − L̃L̃∗ = In − (L1L∗1 + . . . LnL∗n) = P0
where P0 is the vacuum vector projection from Fn to C. Also, since XJX =
J , we have
L̃(J ⊗ I)L̃∗ = L̃(X ⊗ In)(J ⊗ I)(X ⊗ In)L̃∗ = [Y0 Y1](J ⊗ I)[Y0 Y1]∗
where
[Y0 Y1] = [In L]
x0 ⊗ In η∗ ⊗ In
η ⊗ In X1 ⊗ In
Thus Y0Y
0 − Y1Y ∗1 = P0. Also
Y0 = x0 ⊗ In + L(η ⊗ In) = x0In + Lη,
Y1 = η
∗ ⊗ In + L(X1 ⊗ In) = η∗ ⊗ In + [LX1e1 . . . LX1en]
where, here, e1, . . . , en is the standard basis for C
The operator V = Y −10 Y1 is a row isometry [V1 · · · Vn], from Cn ⊗Fn to
Fn with defect 1. To see this we compute
I − V V ∗ = I − Y −10 Y1Y ∗1 Y ∗−10 = I − Y −10 (−P0 + Y0Y ∗0 )Y ∗−10
= I + Y −10 P0Y
0 − I = ξ
0 = Y
0 ξ0 = (x0In + Lη)
−1ξ0 = x
(x−10 Lη)
and so
‖ξ′0‖ = |x0|−2
|x0|−2j‖η‖2j =
x20 − ‖η‖2
Considering the path t → tα for 0 ≤ t ≤ 1 and the corresponding path of
partial isometries V it follows from the stability of Fredholm index that the
index of V and L coincide and so in fact V is a row isometry. Thus V1, . . . , Vn
are isometries with orthogonal ranges.
We now have a contractive algebra homomorphism An → L(Fn) deter-
mined by the correspondence Lei → Vi, i = 1, . . . , n. In fact it is an algebra
endomorphism Θ : An → An. Indeed, for ξ = (ξ1, . . . , ξn) we have
Θ(Lξ) =
ξiVi =
0 Y1(ei ⊗ In)
ζi(x0In + Lη)
−1(η∗ ⊗ In + [LX1e1 . . . LX1en])[In · · · In]t
= (x0In + Lη)
−1(〈ζ, η〉In + LX1ζ).
Thus far we have followed Voiculescu’s proof [14]. The following argument
shows that Θ is an automorphism and is an alternative to the calculation
suggested in [14]. The calculation shows that
φλ ◦ΘX = φθ
We have
φλ ◦ΘX(Lζ) = φλ((x0In + Lη)−1(〈ζ, η〉In + LX1ζ))
= (x0 + 〈λ, η〉)−1(〈ζ, η〉+ 〈X1ζ, λ〉) = φµ(Lζ)
where
X∗1λ + η
x0 + 〈λ, η〉
X1λ+ η
x0 + 〈λ, η〉
= θX(λ).
Write ΘX for the contractive endomorphism Θ of An as constructed
above. It follows that the composition Φ = ΘX−1 ◦ ΘX is a contractive
endomorphism which, by the remarks preceding the statement of the theo-
rem, induces the identity map on the character space, so that φλ = φλ ◦Φ−1
for all λ ∈ Bn. Such a map must be the identity. Indeed, suppose that we
have the Fourier series representation Φ−1(Le1) = a1Le1 + . . . + anLen + X
where X is a series with terms of total degree greater than one. It follows
t−1φ(t,0,...,0)(Φ
−1(Le1)) = a1
while
t−1φ(t,0,...,0)(Le1) = 1.
Since the induced map is the identity, we have a1 = 1 and ak = 0 for k ≥ 2.
In this way we see that the image of each Li has the form Li+Ti where Ti has
only terms of total degree greater than one. Since Liξ0 is orthogonal to Tiξ0
and Φ−1(Li) is a contraction, we have 1 ≥ ‖Φ−1(Li)ξ0‖2 = ‖Liξ0 + Tiξ0‖2 =
‖Liξ0‖2 + ‖Tiξ0‖2 = 1 + ‖Tiξ0‖2. Thus Tiξ0 = 0 and, consequently, Ti = 0
and so the composition Φ is the identity map.
Finally, we show that Θα is unitarily implemented. Define UX on Anξ0
by UXaξ0 = ΘX(a)ξ
0 = ΘX(a)(x0I + Lη)
−1ξ0 for a ∈ A. Since ΘX is an
automorphism, (UXa)bξ0 = UXabξ0 = ΘX(a)ΘX(b)ξ
0 = ΘX(a)UXbξ0, for
a, b ∈ An, and it follows that UXa = ΘX(a)UX , as linear transformations on
the dense space Anξ0.
Now, V = [V1, . . . , Vn] is a row isometry with defect space spanned by ξ
The map UX maps ξi = Liξ0 to ΘX(Li)ξ
0 = Viξ
0 and, if w = w(e1, . . . , en) is
a word in e1, . . . , en , then
UXξw = UXw(L1, . . . , Ln)ξ0 = ΘX(w(L1, . . . , Ln))ξ
0 = w(V1, . . . , Vn)ξ
Since V is a row isometry and ξ′0 is a unit wandering vector for V , it follows
that {w(V1, . . . , Vn)ξ′0} is an orthonormal set. Thus, UX is an isometry. Since
the range of UX contains UXAnξ0 = ΘX(An)ξ′0 = An(x0I + Lη)−1ξ0 = Anξ0
we see that UX is unitary. �
Remark 4.3 With the same calculations as in the proof above and slightly
more notation, one can show that each invertible matrix Z ∈ U(1, n) defines
an automorphism ΘZ and that Z → ΘZ is an action of U(1, n) on An and,
in particular, ΘZΘX = ΘZX . Moreover, Z → UZ is a unitary representation
of U(1, n) implementing this as the following calculation indicates.
Let W =
be the matrix in U(1, n) associated with β ∈ Bn
as in Lemma 4.1. Then
UXUWaξ0 = UX(Θβ(a)(w0 + Lω)
−1ξ0)
= Θα(Θβ(a)(w0 + Lω)
−1)(x0In + Lη)
= Θα(Θβ(a))Θα((w0 + Lω)
−1)(x0In + Lη)
= ΘXW (a)Θα((w0 + Lω)
−1)(x0In + Lη)
= ΘXW (a)[w0In + (x0In + Lη)
−1(LX1ω + 〈ω, η〉In)]
(x0In + Lη)
= ΘXW (a)[w0x0In + w0Lη + LX1ω + 〈ω, η〉In)]
= ΘXW (a)[(w0x0In + 〈ω, η〉)In + Lω0η+X1ω]
One readily checks that this is the same as UXW (a)ξ0
It is evident from the last theorem and its proof that the unitary auto-
morphisms of An and Ln act transitively on the open subset Bn associated
with the weak star continuous characters. We shall show that a version of
this holds for the unitary relation algebras with respect to the open core of
the character space. As a first step to constructing automorphisms of Au we
obtain unitary commutation relations for the n-tuples [Θ(Le1), . . . ,Θ(Len)]
and [Lf1 , . . . , Lfm ] for certain automorphisms Θ of the copy of An in Au.
Lemma 4.4 Suppose (z, w) ∈ Ω0u∩(Bn×Bm). Write α for z̄ and let Θ := Θα
be as in (12). Then, for every 1 ≤ i ≤ n and 1 ≤ j ≤ m,
Θ(Lei)Lfj =
u(i,j),(k,l)LflΘ(Lek). (13)
Proof. Write Y for ηη∗ and β for (x0 + 1)
−1. Since X21 = I + ηη
X1 = I + βηη
∗ = I + βY and Y = (Yi,j) where Yi,j = ηiη̄j = x
0z̄izj . We now
compute
(X1ei)fj = eifj +
βYt,ietfj = eifj +
t,k,l
βYt,iu(t,j),(k,l)flek
u(i,j),(k,l)flek +
t,k,l
βx20z̄tziu(t,j),(k,l)flek
u(i,j),(k,l)flek + βx
t,k,l
z̄tu(t,j),(k,l)flek.
Using the core equation (8), the last expression is equal to
u(i,j),(k,l)flek + βx
δj,lz̄kflek
u(i,j),(k,l)flek + βx
z̄kfjek
u(i,j),(k,l)flek + βx
(δj,lzi)z̄kflek.
Using the core equation (7), this is equal to
u(i,j),(k,l)flek + βx
u(i,j),(t,l)zt)z̄kflek
u(i,j),(k,l)flek + βx
k,l,t
u(i,j),(k,l)zkz̄tflet
u(i,j),(k,l)flek + β
k,l,t
u(i,j),(k,l)Yt,kflet
u(i,j),(k,l)flek + β
u(i,j),(k,l)flY ek
u(i,j),(k,l)flX1ek.
LX1eiLfj =
u(i,j),(k,l)LflLX1ek . (14)
Next, we compute
i z̄ieifj =
i,k,l u(i,j),(k,l)z̄iflek. Using (8), this is equal
k,l δj,lz̄kflek =
k z̄kfjek. Thus
z̄ieifj =
z̄ifjei (15)
and, hence, Lη commutes with Lfj . It follows that
Lfj (x0I − Lη)−1 = (x0I − Lη)−1Lfj . (16)
We have, using (14) and (16),
(x0I − Lη)−1LX1eiLfj =
u(i,j),(k,l)(x0I − Lη)−1LflLX1ek
u(i,j),(k,l)Lfl(x0I − Lη)
−1LX1ek .
Also, applying (7) and (16), we get
(x0I − Lη)−1〈ei, η〉Lfj = ziLfj (x0I − Lη)−1
δj,lziLfl(x0I − Lη)
u(i,j),(k,l)zkLfl(x0I − Lη)
Subtracting the last two equations, we get (13). �
Corollary 4.5 In the notation of Lemma 4.4, for every i, j,
L∗fjΘ(Lei) =
u(i,l),(k,j)Θ(Lek)L
Proof. It follows from (13) that Θ(Lei)Lfl =
k,t u(i,l),(k,t)LftΘ(Lek) for
every i, l. Thus, for i, j, l,
L∗fjΘ(Lei)LflL
u(i,l),(k,t)L
LftΘ(Lek)L
u(i,l),(k,t)δj,tΘ(Lek)L
u(i,l),(k,j)Θ(Lek)L
Summing over l, we get
L∗fjΘ(Lei)(
u(i,l),(k,j)Θ(Lek)L
l LflL
= I − P where P is the projection onto the subspace C ⊕
E ⊕ (E ⊗ E) ⊕ . . .. Note that P is left invariant under the operators in
the algebra generated by {Lei : 1 ≤ i ≤ n} and, in particular, by Θ(Lei).
Thus L∗fjΘ(Lei)P = L
PΘ(Lei)P = 0 =
k,l u(i,l),(k,j)Θ(Lek)L
P . This
completes the proof of the corollary. �
Proposition 4.6 Suppose (z, w) ∈ Ω0u ∩ (Bn × Bm). Then there is a auto-
morphism Θ̃z of Au that is unitarily implemented and such that, for every
X ∈ Au,
α(0,w)(Θ̃
z (X)) = α(z,w)(X) (17)
where α(z,w) is the character associated with (z, w) by Proposition 3.1.
Proof. Let U be the unitary operator implementing Θ. We can view
F(n,m, u) as the sum
F(n,m, u) =
F⊗k ⊗ F(E)
where F(E) = C⊕E⊕ (E⊗E)⊕· · ·. We now let V be the unitary operator
whose restriction to F⊗k ⊗F(E) is Ik ⊗U (where Ik is the identity operator
on F⊗k). It is easy to check that, for every fj ,
V LfjV
∗ = Lfj .
Now, fix i. We shall show, by induction, that, for every k and every ξ ∈
F⊗k ⊗ F(E),
(Ik ⊗ U)Leiξ = Θ(Lei)(Ik ⊗ U)ξ. (18)
For k = 0 this is just the fact that U implements Θ. Suppose we know this
for k and fix fj ∈ F . Then, for ξ ∈ F⊗k ⊗ F(E) we have,
(Ik+1 ⊗ U)LeiLfjξ =
u(i,j),(k,l)(Ik+1 ⊗ U)LflLekξ
u(i,j),(k,l)Lfl(Ik ⊗ U)Lekξ.
Applying the induction hypothesis, this is equal to
k,l u(i,j),(k,l)LflΘ(Lek)(Ik⊗
U)ξ. Using (13), this is Θ(Lei)Lfj (Ik ⊗ U)ξ = Θ(Lei)(Ik ⊗ U)Lfjξ. Since
F⊗(k+1) ⊗ F(E) is spanned by elements of the form Lfjξ (as above) the
equality follows. From the relations of Lemma 4.4 it follows that the map
Θ̃z : X → V XV ∗ defines a unitary endomorphism of Au. Since Θ is an
automorphism of An it follows that Θ̃z gives the desired automorphism. �
Clearly, in Proposition 4.6, we can interchange z and w to get the follow-
ing, where Θz,w = Θ̃zΘ̃w.
Proposition 4.7 Suppose (z, w) ∈ Ω0u ∩ (Bn ×Bm). Then there is a unitary
automorphism Θz,w of Lu which is a homeomorphism with respect to the
w∗-topologies and which restricts to an automorphism of Au. Moreover, for
every X ∈ Lu,
α(0,0)(Θ
z,w(X)) = α(z,w)(X) (19)
where α(z,w) is the character associated with (z, w) as in Proposition 3.1.
An automorphism Ψ of Au, defines a map on the character space of Au,
namely φ 7→ φ ◦Ψ−1. Thus using Proposition 3.1 we have a homeomorphism
θΨ of Ωu. Also, since Ωu ∩ (Bn × Bm) is the interior of Ωu, θΨ maps Ωu ∩
(Bn × Bm) onto itself.
Similarly, if Ψ is an automorphism of Lu which is a homeomorphism with
respect to the w∗-topologies, then θΨ is a homeomorphism of Ωu∩ (Bn×Bm).
In the following theorem we identify the relative interior of the core as the
orbit of (0, 0) under the group of maps θΨ associated with automorphisms Ψ.
Theorem 4.8 For (z, w) ∈ Bn×Bm the following conditions are equivalent.
(1) (z, w) ∈ Ω0u.
(2) There exists a completely isometric automorphism Ψ of Lu that is a
homeomorphism with respect to the w∗-topologies and restricts to an
automorphism of Au, such that θΨ(0, 0) = (z, w).
(3) There exists an algebraic automorphism Ψ of Au such that θΨ(0, 0) =
(z, w).
Proof. The proof that (1) implies (2) follows from Proposition 4.7. Clearly
(2) implies (3). It is left to show that (3) implies (1).
Given a point (z, w) ∈ Ωu, we saw in Lemma 3.5 that, for every (λ, µ)
satisfying (11) there is a homomorphism ρz,w,λ,µ : C[F
u ] → T2. For (z, w) =
(0, 0) equation (11) holds for every pair (λ, µ). Since ρ0,0,λ,µ vanishes off a
finite dimensional subspace, it is a bounded homomorphism. In fact, for
every (λ, µ), ‖ρ0,0,λ,µ‖ ≤ 1 + ‖λ‖+ ‖µ‖.
Given Ψ and (z, w) as in (3), for every (λ, µ) ∈ Cn × Cm, ρ0,0,λ,µ ◦ Ψ−1
is a homomorphism on C[F+u ] and, thus, it is of the form ρz,w,λ′,µ′ for some
(unique) (λ′, µ′) satisfying (11). Write ψ(λ, µ) = (λ′, µ′) and note that this
defines a continuous map. To prove the continuity, suppose (λn, µn) → (λ, µ)
and write ρn for ρ0,0,λn,µn and ρ for ρ0,0,λ,µ. Then (using the estimate on the
norm of ρ0,0,λ,µ) there is someM such that ‖ρn‖ ≤M for all n and ‖ρ‖ ≤M .
For every Y ∈ C[F+u ], ρn(Y ) → ρ(Y ). Now fix X ∈ Au and ǫ > 0. There is
some Y ∈ C[F+u ] such that ‖X − Y ‖ ≤ ǫ and there is some N such that for
n ≥ N ‖ρn(Y )− ρ(Y )‖ ≤ ǫ. Thus, for such n, ‖ρn(X)− ρ(X)‖ ≤ (2M +1)ǫ.
Setting X = Ψ(Lei), we get λ
n → λ′ and similarly for µ′.
If (z, w) is not in Ω0u, then the set of all (λ, µ) satisfying (11) is a subspace
of Cn ×Cm of dimension strictly smaller than n+m and, as is shown above,
it contains the continuous image (under the injective map ψ) of Cn × Cm.
This is impossible. �
5 Isomorphic algebras
In this section we shall find conditions for algebras Au and Av to be (isomet-
rically) isomorphic. The characterisation also applies to the weak star closed
algebras Lu.
We start by considering a special type of isomorphism. We shall now
assume that the set {n,m} for both algebras is the same. In fact, by inter-
changing E and F , we can assume that the corresponding dimensions are the
same and the algebras are defined on F(n,m, u) and F(n,m, v) respectively.
This assumption will be in place in the discussion below up to the end of
Lemma 5.5.
The algebra Au carries a natural Z2+-grading, with the (k, l) labeled sub-
space being spanned by products of the form Lei1Lei2 . . . LeikLfi1Lfi2 . . . Lfil .
Also, the total length of such operators provides a natural Z+-grading. Note
that an algebra isomorphism Ψ : Au → Av which respects the Z+-grading is
determined by a linear map between the spans of the generators
Le1 , . . . , Len , Lf1 , . . . , Lfm . Here we use the same notation for the generators
of Au and Av. Such an isomorphism will be called graded.
We now consider two types of graded isomorphisms, namely, either bi-
graded, as in the following definition, or, in case n = m, bigraded after
relabeling generators.
Definition 5.1 (i) An isomorphism Ψ : Au → Av is said to be bigraded
isomorphism if there are unitary matrices A (n × n) and B (m ×m)
such that
Ψ(Lei) =
ai,jLej , Ψ(Lfk) =
bk,lLfl.
(ii) If m = n and Ψ is a graded isomorphism such that
Ψ(Lei) =
ai,jLfj , Ψ(Lfk) =
bk,lLel
for n × n unitary matrices A and B then we say that Ψ is a graded
exchange isomorphism.
We write ΨA,B for the bigraded isomorphism (as in (i)) and Ψ̃A,B for the
graded exchange isomorphism.
Abusing notation, we write Ψ(ei) =
j ai,jej instead of Ψ(Lei) =
j ai,jLej
for a bigraded isomorphism (and similarly for the other expressions).
For unitary permutation matrices the following lemma was proved in [10,
Theorem 5.1(iii)].
Lemma 5.2 (i) If ΨA,B is a bigraded isomorphism then
(A⊗ B)v = u(A⊗B) (20)
where A⊗B is the mn×mn matrix whose (i, j), (k, l) entry is ai,kbj,l.
(ii) If m = n and Ψ̃A,B is a graded exchange isomorphism then
(A⊗ B)ṽ = u(A⊗B) (21)
where ṽ(i,j),(k,l) = v̄(l,k),(j,i).
Proof. Assume Ψ = ΨA,B is a bigraded isomorphism. For i, j,
Ψ(ei ⊗ fj) = (
ai,kek)⊗ (
bj,lfl) =
(A⊗ B)(i,j),(k,l)ek ⊗ fl =
k,l,r,t
(A⊗ B)(i,j),(k,l)v(k,l),(r,t)ft ⊗ er =
((A⊗ B)v)(i,j),(r,t)ft ⊗ er.
On the other hand,
Ψ(ei ⊗ fj) = Ψ(
u(i,j),(k,l)fl ⊗ ek) =
k,l,t,r
u(i,j),(k,l)bl,tak,rft ⊗ er =
(u(A⊗B))(i,j),(r,t)ft ⊗ er.
This proves equation (20). A similar argument can be used to verify equation
(21). �
Definition 5.3 If u, v are mn×mn unitary matrices and there exist unitary
matrices A and B satisfying (20), we say that u and v are product unitary
equivalent.
Now suppose that A and B are unitary matrices satisfying (20). The
same computation as in Lemma 5.2 shows that WA,B : E ⊗u F → E ⊗v F
defined by
WA,B(ei ⊗ fj) =
(A⊗ B)(i,j),(k,l)ek ⊗ fl
is a well defined unitary operator. Here the notation E ⊗u F indicates that
this is E ⊗ F as a subspace of F(n,m, u). Similarly, one defines a unitary
operator, also denoted WA,B, from E
⊗k ⊗F⊗l in F(n,m, u) to E⊗k ⊗ F⊗l in
F(n,m, v) by
WA,B(ei1 ⊗ · · · ⊗ eik ⊗ fj1 ⊗ · · · ⊗ fjl) =
ai1,r1 · · · aik,rkbj1,t1 · · · bjl,tler1 ⊗ · · · ⊗ erk ⊗ ft1 ⊗ · · · ⊗ ftl .
This gives a well defined unitary operator
WA,B : F(n,m, u) → F(n,m, v).
Lemma 5.4 For every i, j, write Aei =
k ai,kek and Bfj =
l bj,lfl.
Then, for g1, g2, . . . , gr in {e1, . . . , en, f1, . . . , fm},
WA,B(g1 ⊗ g2 ⊗ · · · ⊗ gr) = Cg1 ⊗ Cg2 ⊗ · · · ⊗ Cgr (22)
where Cgi = Agi if gi ∈ {e1, . . . , en} and Cgi = Bgi if gi ∈ {f1, . . . , fm}.
Proof. If the gi’s are ordered such that the first ones are from E and the
following vectors are from F , then the result is clear from the definition of
WA,B. Since we can get any other arrangement by starting with one of this
kind and interchanging pairs gl, gl+1 successively (with gl ∈ {e1, . . . , en} and
gl+1 ∈ {f1, . . . , fm}), it is enough to show that that if (22) holds for a given
arrangement of e’s and f ’s and we apply such an interchange, then it still
holds. So, we assume gl = ek, gl+1 = fs and we write g
′ = g1 ⊗ · · · ⊗ gl−1,
g′′ = gl+2 ⊗ · · · ⊗ gr, Cg′ = Cg1 ⊗ · · · ⊗ Cgl−1 and Cg′′ = Cgl+2 ⊗ · · · ⊗ Cgr
and compute
WA,B(g
′ ⊗ fs ⊗ ek ⊗ g′′) = WA,B(
ū(i,j),(k,s)g
′ ⊗ ei ⊗ fj ⊗ g′′).
Using our assumption, this is equal to
ū(i,j),(k,s)Cg
′ ⊗ (
ai,tet)⊗ (
bj,qfq)⊗ Cg′′ =
i,j,t,q
ū(i,j),(k,s)ai,tbj,qCg
′ ⊗ et ⊗ fq ⊗ Cg′′ =
i,j,t,q,d,p
ū(i,j),(k,s)ai,tbj,qv(t,q),(d,p)Cg
′ ⊗ fp ⊗ ed ⊗ Cg′′ =
(u∗)(k,s),(i,j)(A⊗ B)(i,j),(t,q)v(t,q),(d,p)Cg′ ⊗ fp ⊗ ed ⊗ Cg′′ =
(A⊗ B)(k,s),(d,p)Cg′ ⊗ fp ⊗ ed ⊗ Cg′′ =
ak,dbs,pCg
′ ⊗ fp ⊗ ed ⊗ Cg′′ =
Cg′ ⊗ Bfs ⊗ Aek ⊗ Cg′′
completing the proof. �
The following lemma was proved in [10, Section 7] and it shows that the
necessary conditions of Lemma 5.2 are also sufficient conditions on A⊗B for
the existence of a unitarily implemented isomorphism ΨA,B.
Lemma 5.5 For unitary matrices A,B satisfying (20) and X ∈ Au, the
X 7→WA,BXW ∗A,B
is the bigraded isomorphism ΨA,B : Au → Av. Moreover ΨA,B extends to
a unitary isomorphism Lu → Lv, and similar statements holds for graded
exchange isomorphisms (when m = n).
Proof. It will suffice to show the equality
ΨA,B(X)WA,B = WA,BX
for X = Lei and for X = Lfj . Let X = Lfj and apply both sides of the
equation to ei1 ⊗ · · · ⊗ eik ⊗ fj1 ⊗ · · · ⊗ fjl. Using Lemma 5.4, we get
ΨA,B(Lfj )WA,B(ei1 ⊗ · · · ⊗ eik ⊗ fj1 ⊗ · · · ⊗ fjl)
bj,rLfr(Aei1 ⊗ · · · ⊗ Aeik ⊗Bfj1 ⊗ · · · ⊗ Bfjl)
= Bfj ⊗Aei1 ⊗ · · · ⊗Aeik ⊗ Bfj1 ⊗ · · · ⊗ Bfjl
=WA,B(fj ⊗ ei1 ⊗ · · · ⊗ eik ⊗ fj1 ⊗ · · · ⊗ fjl)
=WA,BLfj (ei1 ⊗ · · · ⊗ eik ⊗ fj1 ⊗ · · · ⊗ fjl).
This proves the equality for X = Lfj . The proof for X = Lei is similar. �
At this point we drop our assumption that the set {n,m} is the same for
both algebras and write {n′, m′} for the dimensions associated with Av. We
shall see in Proposition 5.8 (and Remark 5.11(i)) that, if the algebras are
isomorphic, then necessarily {n,m} = {n′, m′}.
Given an isomorphism Ψ : Au → Av we get a homeomorphism θΨ : Ωu →
Ωv (as in the discussion preceeding Theorem 4.8). The arguments used in
the proof of Theorem 4.8 to show that part (3) implies part (1) apply also
to isomorphisms and thus, θΨ(0, 0) ∈ Ω0v.
Proposition 5.6 Let Ψ : Au → Av be an (algebraic) isomorphism. Then
u) = Ω
v and θΨ(Ω
u ∩ (Bn × Bm)) = Ω0v ∩ (Bn × Bm).
Proof. Fix (z, w) in Ω0u and use Theorem 4.8 to get an automorphism Φ
of Au such that θΦ(0, 0) = (z, w). But then θΨ◦Φ(0, 0) = θΨ(z, w) and, as we
noted above, this implies that θΨ(z, w) ∈ Ω0v. It follows that θΨ(Ω0u) ⊆ Ω0v
and, applying this to Ψ−1, the lemma follows. �
Lemma 5.7 The map θΨ is a biholomorphic map.
Proof. The coordinate functions for θΨ are (z, w) 7→ α(z,w)(Ψ−1(ei)) (and
(z, w) 7→ α(z,w)(Ψ−1(fj))) where α(z,w) is the character associated with (z, w)
by Proposition 3.1. For every Y ∈ C[F+v ], α(z,w)(Y ) is a polynomial in (z, w)
(for (z, w) ∈ Ωv) and, therefore, an analytic function. Each X ∈ Av is a norm
limit of elements in C[F+v ] and, thus, α(z,w)(X) is an analytic function being
a uniform limit of analytic functions on compact subsets of Ωv. Hence, for
every (z, w) ∈ Ωv, there is a power series that converges in some, non empty,
circular, neighborhoodC of (z, w) that represents α(z,w)(X) onC∩Ωv. Taking
for X the operators Ψ−1(ei) and Ψ
−1(fj), we see that θ is analytic. The same
arguments apply to θ−1. �
The facts in the following proposition obtained in [10] in the case of
permutation matrices.
Proposition 5.8 Let Ψ : Au → Av be an algebraic isomorphism and let
θΨ : Ωu → Ωv be the associated map between the character spaces. Suppose
θΨ(0, 0) = (0, 0). Then we have the following.
(1) {n,m} = {n′, m′} and we shall assume that n = n′ and m = m′
(interchanging E and F and changing u to u∗ if necessary).
(2) There are unitary matrices U (n×n) and V (m×m) such that θΨ(z, w) =
(Uz, V w) for (z, w) ∈ Ωu. (If n = m it is also possible that θΨ(z, w) =
(V w, Uz).)
(3) If Ψ is an isometric isomorphism, then Ψ is a bigraded isomorphism.
(Or, if m = n, it may be a graded exchange isomorphism).
Proof. The proof of Proposition 6.3 in [10] giving (1) and (2) in the
permutation case is based essentially on Schwarz’s lemma for holomorphic
map from the unit disc. It applies without change to the case of unitary
matrices.
For (3) we may assume m = m′ and n = n′. From (2) we have for each
Φ(Lei) = LUei +X where X is a sum of higher order terms. Since Φ(Lei) is a
contraction and LUei is an isometry it follows, as in the proof of Voiculescu’s
theorem, that X = 0. Similarly, Φ(Lfj ) = LV fj and it follows that Φ is
bigraded. �
Since every graded isomorphism Ψ satisfies θΨ(0, 0) = (0, 0), we conclude
the following.
Corollary 5.9 Every graded isometric isomorphism is bigraded if n 6= m
and otherwise is either bigraded or is a graded exchange isomorphism.
Theorem 5.10 The following statements are equivalent for unitary matrices
u, v in Mn(C)⊗Mm(C).
(i) There is an isometric isomorphism Ψ : Au → Av.
(ii) There is a graded isometric isomorphism from Ψ : Au → Av.
(iii) The matrices u, v are product unitary equivalent or (in case n = m)
the matrices u, ṽ are product unitary equivalent, where ṽ(i,j),(k,l) = v̄(l,k),(j,i).
(iv) There is an isometric w*-continuous isomorphism Γ :  Lu →  Lv.
Proof. Given Ψ in (i), let (z, w) = θΨ(0, 0). By Proposition 5.6 (z, w) lies in
the interior of Ω0v. By Theorem 4.8 there is a completely isometric automor-
phism Φ ofAv such that θΦ(0, 0) = (z, w) and, therefore, θΦ−1◦Ψ(0, 0) = (0, 0).
By Proposition 5.8, Φ−1 ◦Ψ is a graded isometric isomorphism and (ii) holds.
Lemma 5.2 shows that (ii) implies (iii) and Lemma 5.5 that (iii) implies (i).
Finally, (iii) implies (iv) follows from Lemma 5.5, and (iv) implies (ii) is
entirely similar to (i) implies (ii). �
Remark 5.11 The argument at the beginning of the proof of Theorem 5.10
shows that, whenever Au and Av are isomorphic, we have {n,m} = {n′, m′}.
Theorem 5.12 For n 6= m the isometric automorphisms of Au are of the
form ΨA,BΘz,w where (z, w) ∈ Ω0u and (A⊗B)u = u(A⊗B). In case n = m
the isometric automorphisms include, in addition, those of the form Ψ̃A,BΘz,w
where (A⊗ B)ũ = u(A⊗ B).
6 Special cases
6.1 The case n = m = 2
Even in the low dimensions n = m = 2 there are many isomorphism classes
and special cases. Note that the product unitary equivalence class orbit O(u)
of the 4× 4 unitary matrix u takes the form
O(u) = {(A⊗B)u(A⊗ B)∗ : A,B ∈ SU2(C)},
and so the product unitary equivalence classes are parametrised by the set of
orbits, U4(C)/Ad(SU2(C)×SU2(C)). This set admits a 10-fold parametrisa-
tion, since, as is easily checked, U4(C) and SU2(C)×SU2(C) are real algebraic
varieties of dimension 16 and 6 respectively. It follows that the isometric iso-
morphism types of the algebras Au admit a 10 fold real parametrisation, with
coincidences only for pairs O(u),O(v) with u = ṽ
We now look at some special cases in more detail. Let d = dimKer(u−I).
Case I: d = 0
For every (z, w) ∈ B2 × B2, we have (z, w) ∈ Ωu if and only if the vector
(z1w1, z1w2, z2w1, z2w2)
t lies in Ker(u− I). Thus, in case I, Ωu is as small as
possible and is equal to
Ωmin := (B2 × {0}) ∪ ({0} × B2).
It follows from Lemma 3.4 that, in this case,
Ω0u = {(0, 0)}.
By Proposition 5.8 every isometric automorphism of Au is graded and the
isometric automorphisms of Au are given by pairs (A,B) of unitary matrices
such that A⊗ B either commutes with u or intertwines u and ũ.
Case II: d = 1
When d = 1 it still follows from Lemma 3.4 that
Ω0u = {(0, 0)}
but now it is possible for Ωu to be larger than Ωmin. In fact, if the non zero
vector (a, b, c, d)t spanning Ker(u− I) satisfies ad 6= bc then Ωu = Ωmin but
if ad = bc then the matrix
is of rank one and can be written as
(z1, z2)
t(w1, w2). Thus, (z, w) ∈ Vu and Ωu contains some (z, w) with non
zero z and w.
Since Ω0u = {(0, 0)}, it is still true that isometric isomorphisms and auto-
morphisms of these algebras are graded.
Case III: d = 2
When d = 2 it is possible that Ω0u will contain non zero vectors (z, w) but, as
Lemma 3.4 shows, it does not contain a vector with both z 6= 0 and w 6= 0.
All other possibilities may occur. For example write u1, u2 and u3 for the
three diagonal matrices:
u1 = diag(1,−1,−1, 1), u2 = diag(1,−1, 1,−1)
u3 = diag(1, 1,−1,−1).
Using the definition of the core, we easily see that
Ω0u1 = {(0, 0)}, Ω
= {(0, 0, w1, 0) : |w1| ≤ 1}
Ω0u3 = {(z1, 0, 0, 0) : |z1| ≤ 1}.
Thus, the only isometric automorphisms of Au1 are graded, the isomet-
ric automorphisms of Au2 are formed by composing graded automorphisms
with automorphisms of the type described in Proposition 4.7 (with z = (0, 0)
and w = (w1, 0)). Similarly, for the automorphisms of Au3, we use Proposi-
tion 4.6.
Case IV: d = 3
In this case we are able to obtain an explicit 2-fold parametrization of the
isomorphism types of the algebra Au.
Every 4×4 unitary matrix u with dim(Ker(u− I)) = 3 is determined by
a unit eigenvector x and its (different from 1) eigenvalue. So that ux = λx,
‖x‖ = 1, |λ| = 1 and λ 6= 1. Suppose u and v are product unitary equivalent;
that is
(A⊗ B)u = v(A⊗B)
for unitary matrices A,B, and write x, λ for the unit eigenvector and eigen-
value of u. (Of course, x is determined only up to a multiple by a scalar
of absolute value 1). Then y = (A ⊗ B)x is a unit eigenvector of v with
eigenvalue λ. For unit vectors x, y (in C4) we write x ∼ y if there are unitary
(2 × 2) matrices A,B with y = (A ⊗ B)x. For the statement of the next
lemma recall that the entries of the vectors x and y in C4 are indexed by
{(i, j) : 1 ≤ i, j ≤ 2}.
Lemma 6.1 For a vector x = {x(i,j)} in C4, write c(x) for the 2× 2 matrix
c(x) =
x(1,1) x(1,2)
x(2,1) x(2,2)
Then x ∼ y if and only if there are unitary matrices A,B such that c(x) =
Ac(y)B. (In this case, we shall write c(x) ∼ c(y).)
Proof. Suppose y = (A⊗B)x for some unitary matrices A = (ai,j) and B =
(bi,j). Then c(y)i,j = y(i,j) =
(A ⊗ B)(i,j),(k,l)x(k,l) =
k,l ai,kbj,lc(x)k,l =
(Ac(x)B)i,j. �
Using the polar decomposition c(x) = U |c(x)| and diagonalizing |c(x)| =
V ∗, we find that c(x) ∼
= c(y) where y = (a, 0, 0, d)
and a, d ≥ 0. Then a, d (the eigenvalues of |c(x)|) are uniquely determined
once we choose them such that a ≤ d and, if ‖x‖ = 1, then a2 + d2 = 1
(so that 0 ≤ a ≤ 1/
2 and a determines d). In this way, we associate to
each unitary matrix u as above a pair (a, λ) with 0 ≤ a ≤ 1/
2, λ 6= 1
and |λ| = 1. Using Lemma 6.1 and the discussion preceeding it, we have the
following.
Corollary 6.2 For every 4× 4 unitary matrix u with dim(Ker(u− I)) = 3,
there are numbers λ (with |λ| = 1 and λ 6= 1) and a (0 ≤ a ≤ 1/
2) such
that u and v are product unitary equivalent if and only if they have the same
a, λ.
Proof. Let u and v be unitary matrices with dim(Ker(u− I)) = 3 and let
(a, λ), (b, µ) be the pairs associated to u and v (respectively) as above. Also
write x for the unit eigenvector of u associated to the eigenvalue λ and let y
be the unit eigenvector of v associated to µ.
Suppose u and v are product unitarily equivalent. Then they are unitary
equivalent and, thus, λ = µ. Write (A⊗B)u = v(A⊗B) for unitary matrices
A,B. As we saw above, y can be chosen to be (A⊗ B)x so that x ∼ y and,
by Lemma 6.1, c(x) ∼ c(y). It follows that a = b.
Conversely, assume that a = b and λ = µ. Then c(x) ∼ c(y) and, thus,
x ∼ y so we can write y = (A⊗B)x for some unitary matrices A,B. Writing
v′ = (A⊗B)u(A⊗B)∗, we find that y is the unit eigenvector of v′ associated
to λ. Thus v = v′, completing the proof. �
For every a, λ as in Corollary 6.2 we let u(a, λ) be the following 4 × 4
matrix.
u(a, λ) =
(λ− 1)a2 + 1 0 0 (λ− 1)a(1− a2)1/2
0 1 0 0
0 0 1 0
(λ− 1)a(1− a2)1/2 0 0 λ+ (1− λ)a2
It is a straightforward computation to verify that dim(Ker(u− I)) = 3 and
that λ is an eigenvalue of u(a, λ) with eigenvector (a, 0, 0, (1− a2)1/2)t. Thus
the pair associated to u(a, λ) is a, λ and we have
Corollary 6.3 Every matrix u with dim(Ker(u−I)) = 3 is product unitary
equivalent to a unique matrix of the form u(a, λ) (with 0 ≤ a ≤ 1/
2, |λ| = 1
and λ 6= 1).
Using the definition of the core, we immediately get the following.
Proposition 6.4 If a = 0, |λ| = 1, λ 6= 1, then Ωu(0,λ) is the union
{(z1, z2, w1, 0) : z ∈ B2; |w1| ≤ 1} ∪ {(z1, 0, w1, w2) : w ∈ B2; |z1| ≤ 1},
Ω0u(0,λ) = {(z1, 0, w1, 0) : |z1| ≤ 1; |w1| ≤ 1}.
If a 6= 0 then
Ωu(a,λ) = {(z1, z2, w1, w2) : az1w1 + (1− a2)1/2z2w2 = 0, (z, w) ∈ B2 × B2}
Ω0u(a,λ) = {(0, 0)}.
Proof. The space Ωu(a,λ) consists of points (z, w) for which
(z1w1, z1w2, z2w1, z2w2)
t = u(a, λ)(z1w1, z1w2, z2w1, z2w2)
that is, for which
((λ− 1)a2 + 1)z1w1 + (λ− 1)a(1− a2)1/2z2w2 = z1w1,
(λ− 1)a(1− a2)1/2z1w1 + (λ+ (λ− 1)a2)z2w2 = z2w2.
If a = 0 this implies z2w2 = 0, while if a 6= 0 then (z1w1, 0, 0, z2w2) is a fixed
vector for u(a, λ) and so for some scalar µ (z1w1, z2w2) = µ((1− a2)1/2,−a).
The descriptions of Ωu(a,λ) follows.
From the definition of the core and the fact that here C12 = C21 = 0 and
C11 =
(λ− 1) 0
0 (λ− 1)a(1− a2)1/2
C22 =
(λ− 1)a(1− a2)1/2 0
0 (λ− 1) + (λ− 1)a2
we see that for a = 0 we have w2 = z2 = 0 while for a 6= 0, z1 = z2 = w1 =
w2 = 0. �
Recall that, for a 4 × 4 unitary matrix v we defined the matrix ṽ by
ṽ(i,j),(k,l) = v̄(l,k),(j,i) and showed (Corollary 5.10) that Au and Av are isomet-
rically isomorphic if and only if either u and v or u and ṽ are product unitary
equivalent.
Now, it is easy to check that ũ(a, λ) = u(a, λ̄) and so, using Proposi-
tion 3.3 and previous results, we obtain the following.
Theorem 6.5 Let 0 ≤ a, b ≤ 1/
2, |λ| = |µ| = 1, λ, µ 6= 1. Then
(1) Au(a,λ) and Au(b,µ) are isometrically isomorphic if and only if a = b and
λ equals either µ or µ̄.
(2) When a 6= 0 the isometric automorphisms of Au(0,λ) are all bigraded
(3) If a = 0 then there are isometric isomorphisms that are not graded
Case V: d = 4
This is the case where u = I. We have Ωu = Ω
u = Bn×Bm and the isometric
automorphisms are obtained by composing graded automorphisms and the
automorphisms described by Proposition 4.6, Proposition 4.7.
6.2 Permutation unitary relation algebras
With more structure assumed for a class of unitaries u it may be possible to
derive an appropriately more definitive classification of the algebras Au. We
indicate this now for the class of permutation unitaries. A fuller discussion
is in [10].
Let θ ∈ S4, viewed as a permutation of the product set {1, 2} × {1, 2} =
{11, 12, 21, 22}. Associate with θ the matrix uθ = u(i,j),(k,l) where u(i,j),(k,l) =
1 if (k, l) = θ(i, j) and is zero otherwise. If τ ∈ S4 is product conjugate to θ in
the sense that τ = σθσ−1 with σ in S2×S2, then it follows that uτ and uθ are
product unitarily equivalent. Thus we need only consider product conjugacy
classes. It turns out that these classes are the same as the product unitary
equivalence classes of the matrices uθ.
It can be helpful to view a permutation θ in Snm as a permutation of the
entries of an n ×m rectangular array, since product conjugacy corresponds
to conjugation through row permutations and column permutations. Con-
sidering this for n = m = 2 one can verify firstly that there are at most
9 isomorphism types for the algebras Atheta corresponding to the following
permutations:
θ1 = id, θ2 = (11, 12), θ3 = (11, 22),
θ4a = (11, 22, 12), θ4b = θ
4a = (11, 12, 22), θ5 = ((11, 12), (21, 22)),
θ6 = ((11, 22), (12, 21)), θ7 = (11, 12, 22, 21), θ8 = (11, 12, 21, 22).
The Gelfand spaces of the algebras Aθ (and Lθ) distinguish all of these al-
gebras except for the pairs {θ4a, θ4b} and {θ7, θ8}. However, one can verify
in both cases that neither the pair u, v nor the pair u, ṽ are product unitary
equivalent. Theorem 5.10 now applies to yield the following result from [10].
Theorem 6.6 For n = m = 2 there are 9 isometric isomorphism classes for
the algebras Aθ and for the algebras Lθ.
To a higher rank graph (Λ, d) in the sense of Kumjian and Pask [6] one can
associate nonself-adjoint Toeplitz algebra AΛ,LΛ, as in Kribs and Power [5].
In the single vertex rank 2 case it is easy to see that AΛ is equal to the algebra
Au for some permutation matrix u = θ in Snm. Thus Theorem 5.10 classifies
these algebras in terms of product unitary equivalence restricted to Snm as
stated formally in the next theorem. In the rank 2 case this is a significant
improvement on the results in [10] which, although covering general rank,
were restricted to the case of trivial core for the character space. With θ̃ the
permutation for the permutation matrix ũθ (which corresponds to generator
exchange) we have:
Theorem 6.7 Let Λ1 and Λ2 be single vertex 2-graphs with relations de-
termined by the permutations θ1 and θ2. Then the rank 2 graph algebras
AΛ1,AΛ2 are isometrically isomorphic if and only if the pair θ1, θ2 or the
pair θ1, θ̃2 are product unitary equivalent
It is natural to expect that as in the (2, 2) case product unitary equiva-
lence will correspond to product conjugacy.
7 Au as a subalgebra of a tensor algebra
Let En be the Toeplitz extension of the Cuntz algebra On and write H for
the Fock space associated with E (that is, H = C ⊕ E ⊕ (E ⊗ E) ⊕ · · ·).
Note that En acts naturally on H ( by the “shift” or “creation” operators
Li = Lei, 1 ≤ i ≤ n). In fact, Le1, . . . , Len generate En as a C∗-algebra.
Consider also the space F(F )⊗H = H⊕(F⊗H)⊕((F⊗F )⊗H)⊕· · ·. This
space is isomorphic to F(E, F, u) and we write w : F(F )⊗H → F(E, F, u)
for the isomorphism. It will be convenient to write wk for the restriction of
w to the summand F⊗k⊗H (which is an isomorphism onto its image). Note
that, for a fixed k, {w∗kLeiwk : 1 ≤ i ≤ n } is a set of n isometries with
orthogonal ranges. Thus it defines a representation ρk of En on F⊗k⊗H (with
ρk(Lei) = w
kLeiwk). (Note that we are using Lei for the creation operators
both on H and on F(E, F, u). This should cause no confusion). We also
write ρ∞ for the representation
k ⊕ρk of En on F(F )⊗H (where ρ0 is the
representation of En on H).
Let X be the column space Cm(En). This is a C∗-module over En. As a
vector space it is the direct sum of m copies of En. The right module action
of En on X is given by (ai) · b = (aib) and the En-valued inner product is
〈(ai), (bi)〉 =
i bi. For every 1 ≤ i ≤ n, we write S̃i for the operator in
L(X) defined by
S̃i(aj)
j=1 = (
u(i,j),(k,l)Lekaj)
Note that
u(i,j),(k,l)Lekaj)
l=1, (
j′,k′
u(i,j′),(k′,l)Lek′ bj′)
l=1〉 =
j,j′,k,k′,l
ū(i,j),(k,l)a
Lek′ bj′u(i,j′),(k′,l) =
(uu∗)(i,j′),(i,j)a
jbj′ =
a∗jbj
= 〈(aj), (bj′)〉.
Thus S̃i is an isometry. A similar computation shows that these isometries
have orthogonal ranges and, thus, this family defines a ∗-homomorphism
ϕ : En → L(X), with ϕ(Lei) = S̃i, 1 ≤ i ≤ n, making X a C∗-correspondence
over En (in the sense of [8] and [7]). Once we have a correspondence we can
formX⊗X and, more generally, X⊗k. Recall that to define X⊗X one defines
the sesquilinear form 〈x⊗y, x′⊗y′〉 = 〈y, ϕ(〈x, x′〉)y′〉 on the algebraic tensor
product and then lets X ⊗X be the Hausdorff completion. The right action
of En on X ⊗X is (x⊗ y) · a = x⊗ (y · a) and the left action is given by the
map ϕ2.
ϕ2(a)(x⊗ y) = ϕ(a)x⊗ y.
The definition of X⊗k is similar (and the left action map is denoted ϕk)
For k = 0 we set X⊗0 = En and ϕ0 is defined by left multiplication . Also
write ϕ∞ for
k ⊕ϕk, the left action of En on F(X).
One can then define the Hilbert spaceX⊗k⊗EnH by defining the sesquilin-
ear form 〈x⊗h, y⊗k〉 = 〈h, 〈x, y〉k〉 (x, y ∈ X⊗k) and applying the Hausdorff
completion.
Now define the map
v : X ⊗En H → F ⊗H
by setting
v((ai)⊗ h) =
fi ⊗ aih.
It is straightforward to check that this map is a well defined Hilbert space
isomorphism. By induction, we also define maps vk : X
⊗k⊗En H → F⊗k⊗H
vk+1((aj)⊗ z) =
fj ⊗ vk((ϕk(aj)⊗ IH)z) (23)
for z ∈ X⊗k ⊗En H and v0 is the identity map from En ⊗En H (which is
isomorphic to H) and F⊗0 ⊗ H = H . Assume that vk is a Hilbert space
isomorphism of X⊗k ⊗En H onto F⊗k ⊗ H and compute, for (aj), (bj) ∈ X
and z, z′ ∈ X⊗k ⊗H ,
〈vk+1((aj)⊗z), vk+1((bj)⊗z)〉 =
〈fj⊗vk((ϕk(aj)⊗IH)z), fj′⊗vk((ϕk(bj′)⊗IH)z′)〉 =
〈vk((ϕk(aj)⊗ IH)z), vk((ϕk(bj)⊗ IH)z′)〉 =
〈z, (ϕk(a∗jbj)⊗ IH)z′)〉 =
〈(aj)⊗ z, (bj)⊗ z′〉.
Thus, by induction, each map vk is a Hilbert space isomorphism and, sum-
ming up, we get a Hilbert space isomorphism
v∞ :=
⊕vk : F(X)⊗En H → F(F )⊗H.
Lemma 7.1 v∞ is a Hilbert space isomorphism and intertwines the actions
of En. That is,
v∞ ◦ (ϕ∞(a)⊗ IH) = ρ∞(a) ◦ v∞
for a ∈ En.
Proof. We show that, for every p ≥ 0 and a ∈ En, we have
vp ◦ (ϕp(a)⊗ IH) = ρp(a) ◦ vp. (24)
The proof will proceed by induction on p. For p = 0 this is clear so we now
assume that it holds for p. For 1 ≤ i ≤ n, (aj) ∈ X and z ∈ X⊗p⊗H , we have
vp+1((ϕp+1(Lei)⊗ IH)((aj)⊗ z)) = vp+1(ϕ(Lei)(aj)⊗ z) =
l,k,j u(i,j),(k,l)fl ⊗
vp((ϕp(Lekaj)⊗ IH)z). Using the induction hypothesis, this is equal to
l,k,j
u(i,j),(k,l)fl ⊗ ρp(Lek)ρp(aj)vpz =
l,k,j
u(i,j),(k,l)fl ⊗ w∗pLekwpρp(aj)vpz =
l,k,j
u(i,j),(k,l)fl ⊗ ekρp(aj)vpz = w∗∞
ei ⊗ fj ⊗ ρp(aj)vpz =
ρp+1(Lei)w
fj ⊗ ρp(aj)vpz.
Using the induction hypothesis again, we get ρp+1(Lei)w
j fj⊗vp((ϕp(aj)⊗
IH)z) = ρp+1(Lei)vp+1((aj)⊗z). This proves (24) for p+1 and the generators
of En. Since both ρp+1 and vp+1(ϕp+1(·)⊗IH)v∗p+1 are ∗-homomorphisms, (24)
holds for p + 1 and every a ∈ En, completing the induction step. Thus, (24)
holds for every p and this implies the statement of the lemma. �
Write δl for the vector (aj) in X such that al = I and aj = 0 if l 6= j.
The tensor algebra T+(X) is generated by the operators Tδl (where Tδl is the
creation operator on F(X) associated with δl) and the C∗-algebra ϕ∞(En).
The latter algebra is generated (as a C∗-algebra) by the operators ϕ∞(Li)
where {Li} is the set of generators of En.
We have
Lemma 7.2 For every 1 ≤ i ≤ n and 1 ≤ j ≤ m and k ≥ 0,
(i) w ◦ vk ◦ (ϕ∞(Li)⊗ IH) = Lei ◦ w ◦ vk.
(ii) w ◦ vk+1 ◦ (Tδj ⊗ IH) = Lfj ◦ w ◦ vk.
Proof. Part (i) follows from (24) and part (ii) from (23) (with δj in place
of (aj)). �
Recalling that w ◦ v∞ is a unitary operator mapping F(X) ⊗ H onto
F(E, F, u), we get
Theorem 7.3 (1) The algebra Au is unitarily isomorphic to the (norm
closed) subalgebra of the tensor algebra T+(X) that is generated by
{ϕ∞(Li), Tδj : 1 ≤ i ≤ n, 1 ≤ j ≤ m}.
(2) The (norm closed) subalgebra of B(F(E, F, u)) that is generated by
{Lei, L∗ei, Lfj : 1 ≤ i ≤ n, 1 ≤ j ≤ m } is unitarily isomorphic to the
tensor algebra T+(X) (and contains Au).
(2) The (norm closed) subalgebra of B(F(E, F, u)) that is generated by
{Lei, L∗fj , Lfj : 1 ≤ i ≤ n, 1 ≤ j ≤ m } is unitarily isomorphic to a
tensor algebra T+(Y ) (and contains Au).
Proof. Parts (1) and (2) follow from Lemma 7.2. For part (3), note
that one can interchange the roles of E and F . More precisely, one defines
the C∗-module Y over Em to be Y = Cn(Em) and the left action of Em on
Y by ϕY (Lfl)(bk)
k=1 = (
j,k ū(i,j),(k,l)Lfjbk)
i=1. This makes Y into a C
correspondence over Em and the rest of the proof proceeds along similar lines
as above. �
Suppose m = 1. Then X is the correspondence associated with the
automorphism α of En given by mapping Ti to
j=1 ui,jTj (note that u, in
this case, is an n × n matrix). The tensor algebra T+(X) is the analytic
crossed product En ×α Z+ and Au is unitarily isomorphic to the subalgebra
of this analytic crossed product that can be written An×α Z+. One can also
embed Au in T+(Y ) (as in Corollary 7.3(3)). Here Em is simply the (classical)
Toeplitz algebra T and Y = Cn(T ) with ϕY (Tz)(bk)k = (
k ūi,kTzbk)i (where
Tz is the generator of T ).
Remark 7.4 Since the automorphisms Θz,w and ΨA,B of Au are both uni-
tarily implemented, they can be extended to T+(X). It is easy to check that
they map T+(X) into itself and, thus, are automorphisms of T+(X). Hence,
at least when n 6= m, every automorphism of Au can be extended to an auto-
morphism of the tensor algebra T+(X) that contains it (see Theorem 5.12).
References
[1] K.R. Davidson, Free Semigroup Algebras : a survey. Systems, approxi-
mation, singular integral operators, and related topics (Bordeaux, 2000),
209–240, Oper. Theory Adv. Appl. 129, Birkhauser, Basel, 2001.
[2] K.R. Davidson and D.R. Pitts, The algebraic structure of noncommuta-
tive analytic Toeplitz algebras, Math. Ann. 311 (1998), 275-303.
[3] N. Fowler, Discrete product systems of Hilbert bimodules, Pacific J.
Math. 204 (2002), 335-375.
[4] E. Katsoulis, D.W. Kribs, Isomorphisms of algebras associated with di-
rected graphs, Math. Ann., 330 (2004), 709-728.
[5] D.W. Kribs and S.C. Power, The H∞ algebras of higher rank graphs,
Math. Proc. of the Royal Irish Acad., 106 (2006), 199-218.
[6] A. Kumjian and D. Pask, Higher rank graph C* -algebras, New York J.
Math. 6 (2000), 1–20.
[7] P. Muhly and B. Solel, Tensor algebras over C∗-correspondences (Rep-
resentations, dilations, and C∗-envelopes), J. Functional Anal. 158
(1998), 389–457.
[8] M. Pimsner, A class of C∗-algebras generalizing both Cuntz-Krieger
algebras and crossed products by Z, in Free Probability Theory, D.
Voiculescu, Ed., Fields Institute Communications 12, 189-212, Amer.
Math. Soc., Providence, 1997.
[9] G. Popescu, Von Neumann inequality for (B(H)n)1, Math. Scand.68
(1991), 292-304.
[10] S.C. Power, Classifying higher rank analytic Toeplitz algebras, preprint
2006, preprint Archive no., math.OA/0604630.
[11] B. Solel, You can see the arrows in a quiver algebra, J. Australian Math.
Soc., 77 (2004), 111-122.
[12] B. Solel, Regular dilations of representations of product systems, preprint
Archive no., math.OA/0504129.
[13] B. Solel, Representations of product systems over semigroups and dila-
tions of commuting CP maps, J. Funct. Anal.235 (2006), 593-618.
[14] D. Voiculescu, Symmetries of some reduced free product C∗-algebras,
Lect. Notes Math. 1132, 556-588, Springer-Verlag, New York 1985.
http://arxiv.org/abs/math/0604630
http://arxiv.org/abs/math/0504129
	Introduction
	Preliminaries
	The character space and its core
	Automorphisms of Ln and  Lu
	Isomorphic algebras
	Special cases
	The case n=m=2
	Permutation unitary relation algebras
	Au as a subalgebra of a tensor algebra
ABSTRACT
  We define nonselfadjoint operator algebras with generators $L_{e_1},...,
L_{e_n}, L_{f_1},...,L_{f_m}$ subject to the unitary commutation relations of
the form \[ L_{e_i}L_{f_j} = \sum_{k,l} u_{i,j,k,l} L_{f_l}L_{e_k}\] where $u=
(u_{i,j,k,l})$ is an $nm \times nm$ unitary matrix. These algebras, which
generalise the analytic Toeplitz algebras of rank 2 graphs with a single
vertex, are classified up to isometric isomorphism in terms of the matrix $u$.

<|endoftext|><|startoftext|>
THE ASTROPHYSICAL JOURNAL, 679:1272–1287, 2008 JUNE 1
Preprint typeset using LATEX style emulateapj v. 08/22/09
SHAPING THE GLOBULAR CLUSTER MASS FUNCTION BY STELLAR-DYNAMICAL EVAPORATION
DEAN E. MCLAUGHLIN1,2 AND S. MICHAEL FALL3,4
The Astrophysical Journal, 679:1272–1287, 2008 June 1
ABSTRACT
We show that the globular cluster mass function (GCMF) in the Milky Way depends on cluster half-mass
density, ρh, in the sense that the turnover mass MTO increases with ρh while the width of the GCMF decreases.
We argue that this is the expected signature of the slow erosion of a mass function that initially rose towards low
masses, predominantly through cluster evaporation driven by internal two-body relaxation. We find excellent
agreement between the observed GCMF—including its dependence on internal density rhoh, central concen-
tration c, and Galactocentric distance rgc—and a simple model in which the relaxation-driven mass-loss rates
of clusters are approximated by −dM/dt = µev ∝ ρ
h . In particular, we recover the well-known insensitivity
of MTO to rgc. This feature does not derive from a literal “universality” of the GCMF turnover mass, but rather
from a significant variation of MTO with ρh—the expected outcome of relaxation-driven cluster disruption—
plus significant scatter in ρh as a function of rgc. Our conclusions are the same if the evaporation rates are
assumed to depend instead on the mean volume or surface densities of clusters inside their tidal radii, as
µev ∝ ρ
t or µev ∝ Σ
t —alternative prescriptions that are physically motivated but involve cluster properties
(ρt and Σt) that are not as well defined or as readily observable as ρh. In all cases, the normalization of µev
required to fit the GCMF implies cluster lifetimes that are within the range of standard values (although falling
towards the low end of this range). Our analysis does not depend on any assumptions or information about
velocity anisotropy in the globular cluster system.
Subject headings: galaxies: star clusters—globular clusters: general
1. INTRODUCTION
The mass functions of star cluster systems provide an im-
portant point of reference for attempts to understand the con-
nection between old globular clusters (GCs) and the young
massive clusters that form in local starbursts and galaxy merg-
ers. When expressed as the number per unit logarithmic mass,
dN/d log M, the GC mass function (GCMF) is character-
ized by a peak, or turnover, at a mass MTO ≈ 1–2× 10
that is empirically very similar in most galaxies. By con-
trast, the mass functions of young clusters show no such fea-
ture but instead rise monotonically towards low masses over
the full observed range (106 M⊙ & M & 10
4 M⊙ in the best-
studied cases), in a way that is well described by a power law,
dN/d log M ∝ M1−β with β ≃ 2 (e.g., Zhang & Fall 1999).
At the same time, for high M > MTO, old GCMFs
closely resemble the mass functions of young clusters, and
of molecular clouds in the Milky Way and other galaxies
(Harris & Pudritz 1994; Elmegreen & Efremov 1997); and
it is well known that a number of dynamical processes
cause star clusters to lose mass and can lead to their com-
plete destruction as they orbit for a Hubble time in the
potential wells of their parent galaxies (e.g., Fall & Rees
1977; Caputo & Castellani 1984; Aguilar, Hut, & Ostriker
1988; Chernoff & Weinberg 1990; Gnedin & Ostriker 1997;
Murali & Weinberg 1997). It is therefore natural to ask
whether the peaks in GCMFs can be explained by the deple-
1 Dept. of Physics and Astronomy, University of Leicester, University
Road, Leicester, UK LE1 7RH
2 Permanent address: Astrophysics Group, Lennard-Jones Lab-
oratories, Keele University, Keele, Staffordshire, UK ST5 5BG;
dem@astro.keele.ac.uk
3 Institute for Advanced Study, Einstein Drive, Princeton, NJ 08450
4 Permanent address: Space Telescope Science Institute, 3700 San Martin
Drive, Baltimore, MD 21218; fall@stsci.edu
tion over many Gyr of globulars from initial mass distribu-
tions that were similar to those of young clusters below MTO
as well as above.
Our chief purpose in this paper is to establish and inter-
pret an aspect of the Galactic GCMF that appears fundamen-
tal but has gone largely unnoticed to date: dN/d log M has
a strong and systematic dependence on GC half-mass den-
sity, ρh ≡ 3M/8πr
h (rh being the cluster half-mass radius), in
the sense that the turnover mass MTO increases and the width
of the distribution decreases with increasing ρh. As observed
facts, these must be explained by any theory of the GCMF. We
argue here that they are an expected signature of slow dynam-
ical evolution from a mass function that initially increased
towards M < MTO, if the long-term mass loss from surviv-
ing GCs has been dominated by stellar escape due to internal,
two-body relaxation (which we refer to from now on as either
relaxation-driven evaporation or simply evaporation).
Fall & Zhang (2001; hereafter FZ01) explain in detail why
cluster evaporation dominates the long-term evolution of the
low-mass shape of observable GCMFs. Briefly, stellar evolu-
tion removes (through supernovae and winds) the same frac-
tion of mass from all clusters of a given age, and so cannot
change the shape of dN/d log M (unless special initial con-
ditions are invoked; cf. Vesperini & Zepf 2003). Meanwhile,
for GCs like those that have survived for a Hubble time in
the Milky Way, the mass loss from gravitational shocks dur-
ing disk crossings and bulge passages is generally less than
that due to evaporation for M < MTO (FZ01; Prieto & Gnedin
2006).5
As we discuss further in §2 below, the evaporation of tidally
5 It is possible that there existed a past population of GCs with low den-
sities or concentrations, or perhaps on extreme orbits, that were destroyed in
less than a Hubble time by shocks or stellar evolution. Our discussion does
not cover such clusters.
http://arxiv.org/abs/0704.0080v4
2 McLAUGHLIN & FALL
limited clusters proceeds at a rate, µev ≡ −dM/dt, that is ap-
proximately constant in time and primarily determined by
cluster density. FZ01 show that a constant mass-loss rate
leads to a power-law scaling dN/d log M ∝ M1−β with β → 0
(corresponding to a flat distribution of clusters per unit lin-
ear mass) at sufficiently low M < µevt in the evolved mass
function of coeval GCs that began with any nontrivial ini-
tial dN/d log M0.
6 To accommodate this when dN/d log M0
originally increased towards low masses as a power law, a
time-dependent peak must develop in the GCMF at a mass
of order MTO(t) ∼ µevt (FZ01). But then, since µev depends
fundamentally on cluster density, so too must MTO.
A β ≃ 0 power-law scaling below the turnover mass has
been confirmed directly in the GCMFs of the Milky Way
(FZ01) and the giant elliptical M87 (Waters et al. 2006), while
Jordán et al. (2007) show it to be consistent with dN/d log M
data for 89 Virgo Cluster galaxies, and it is apparent in deep
observations of some other GCMFs (e.g., in the Sombrero
galaxy, M104; Spitler et al. 2006). As regards the peak itself,
old GCs are observed (e.g., Jordán et al. 2005) to have rather
similar densities on average—and, therefore, similar typical
µev—in galaxies with widely different total luminosities and
Hubble types. (Inasmuch as cluster densities are set by tides,
this is probably related to the mild variation of mean galaxy
density with total luminosity; see FZ01, and also Jordán et al.
2007.) Thus, an evaporation-dominated evolutionary origin
for a turnover in the GCMF appears to be consistent with
the well-known fact that the mass scale MTO generally dif-
fers very little among galaxies (e.g., Harris 2001; Jordán et al.
2006).
If this picture is basically correct, it implies that, even
though MTO may appear nearly universal when considering
the global mass functions of entire GC systems, in fact the
GCMFs of subsamples of clusters with similar ages but dif-
ferent densities should have different turnovers. In §2, we
show—working for definiteness and relatively easy observ-
ability with the half-mass density, ρh—that this is the case for
globulars in the Milky Way. We fit the observed dN/d log M
for GCs in bins of different ρh with models assuming that
(1) the initial distribution increased as a β = 2 power law at
low masses and (2) the mass-loss rates of individual clusters
can be estimated from their half-mass densities by the rule
µev ∝ ρ
h . In §3 we discuss the validity of this prescription
for µev, which is certainly approximate but captures the main
physical dependence of relaxation-driven mass loss. In par-
ticular, we show that the alternative mass-loss laws µev ∝ ρ
and µev ∝ Σ
t —where ρt and Σt are the mean volume and
surface densities inside cluster tidal radii—lead to models for
the GCMF that are essentially indistinguishable from those
based on µev ∝ ρ
h . The normalization of µev required to fit
the observed GCMF implies cluster lifetimes that are within
a factor of ≈ 2 (perhaps slightly on the low side, if the ini-
tial power-law exponent at low masses was β = 2) of typical
values in theories and simulations of two-body relaxation in
tidally limited GCs.
We also show in §2 that when the observed densities of in-
dividual clusters are used in our models to predict GCMFs
in different bins of Galactocentric radius (rgc), they fit the
6 Throughout this paper, we use “initial” to mean at a relatively early time
in the development of long-lived clusters, after they have dispersed any rem-
nants of their natal gas clouds, survived the bulk of stellar-evolution mass
loss, and come into virial equilibrium in the tidal field of a galaxy.
much weaker variation of dN/d log M and MTO as functions
of rgc, which is well-known in the Milky Way and other large
galaxies (see Harris 2001; Harris, Harris, & McLaughlin
1998; Barmby, Huchra, & Brodie 2001; Vesperini et al. 2003;
Jordán et al. 2007). Similarly, applying our models to the
GCs in two bins of central concentration, with only the mea-
sured ρh of the clusters in each subsample as input, suffices
to account for previously noted differences between the mass
functions of low- and high-concentration Galactic globulars
(Smith & Burkert 2002). The most fundamental feature of the
GCMF therefore appears to be its dependence on cluster den-
sity, which can be understood at least qualitatively (and even
quantitatively, to within a factor of 2) in terms of evaporation-
dominated cluster disruption.
There is a widespread perception that if the GCMF evolved
slowly from a rising power law at low masses, then a weak
or null variation of MTO with rgc can be achieved only in GC
systems with strongly radially anisotropic velocity distribu-
tions, which are not observed (see especially Vesperini et al.
2003). This apparent inconsistency has been cited to bol-
ster some recent attempts to identify a mechanism by which
a “universal” peak at MTO ∼ 10
5 M⊙ might have been im-
printed on the GCMF at the time of cluster formation, or
very shortly afterwards, and little affected by the subsequent
destruction of lower-mass GCs (e.g., Vesperini & Zepf 2003;
Parmentier & Gilmore 2007). However, given the real suc-
cesses of an evaporation-dominated evolutionary scenario for
the origin of MTO, as summarized above and added to below,
it would be premature to reject the idea in favor of requiring a
near-formation origin, solely on the basis of difficulties with
GC kinematics. (And, in any event, formation-oriented mod-
els must now be reconsidered in light of the non-universality
of MTO as a function of cluster density.)
We are not concerned in this paper with velocity anisotropy
in GC systems, because we only predict an evaporation-
evolved dN/d log M as a function of cluster density (and age)
and take the observed distribution of ρh versus rgc in the Milky
Way as a given, to show consistency with the observed be-
havior of MTO as a function of rgc. Most other models (FZ01;
Vesperini et al. 2003; and references therein) predict dynam-
ically evolved GCMFs directly in terms of rgc, and in doing
so are forced also to derive theoretical dependences of cluster
density on rgc. It is only at this stage that GC orbital distri-
butions enter the problem, and then only in conjunction with
several other assumptions and simplifications. As we discuss
further in §3 below, the radially biased GC velocity distribu-
tions that appear in such models could well be consequences
of one or more of these other assumptions, rather than of the
main hypothesis about evaporation-dominated GCMF evolu-
tion.
2. THE GALACTIC GCMF AS A FUNCTION OF
CLUSTER DENSITY
In this section we define and model the dependence of
the Galactic GCMF on cluster density. First, we describe
the dependence that is expected to arise from evaporation-
dominated evolution.
Two-body relaxation in a tidally limited GC leads to a
roughly steady rate of mass loss, µev ≡ −dM/dt ≃ constant
in time. Thus, the total cluster mass decreases approxi-
mately linearly, as M(t) ≃ M0 −µevt. This behavior is exact
in some classic models of GC evolution (Hénon 1961) and
is found to be a good approximation in most other calcula-
tions (e.g., Lee & Ostriker 1987; Chernoff & Weinberg 1990;
SHAPING THE GCMF BY EVAPORATION 3
Vesperini & Heggie 1997; Gnedin, Lee, & Ostriker 1999;
Baumgardt 2001; Giersz 2001; Baumgardt & Makino 2003;
Trenti, Heggie, & Hut 2007). The result comes from a variety
of computational methods (semi-analytical, Fokker-Planck,
Monte Carlo, and N-body simulation) applied to clusters with
different initial conditions (densities and concentrations) on
different kinds of orbits (circular and eccentric; with and with-
out external gravitational shocks) and with different internal
processes and ingredients (with or without stellar mass spec-
tra, binaries, and central black holes). To be sure, deviations
from perfect linearity in M(t) do occur, but these are generally
small—especially away from the endpoints of the evolution,
i.e., for 0.9&M(t)/M0 & 0.1—and neglecting them to assume
an approximately constant dM/dt is entirely appropriate for
our purposes.
When gravitational shocks are subdominant to relaxation-
driven evaporation, as they generally appear to be for
extant GCs, they work to boost the mass-loss rate µev
slightly without altering the basic linearity of M(t) (e.g.,
Vesperini & Heggie 1997; Gnedin, Lee, & Ostriker 1999; see
also Figure 1 of FZ01). A time-dependent mass scale ∆ ≡
µevt is then associated naturally with any system of coeval
clusters having a common mass-loss rate: all those with initial
M0 ≤ ∆ are disrupted by time t, and replaced with the rem-
nants of objects that began with M0 >∆. As we mentioned
in §1, if the initial GCMF increased towards low masses as a
power law, then ∆ is closely related to a peak in the evolved
distribution, which eventually decreases towards low M <∆
as dN/d log M ∝ M1−β with β = 0 (FZ01).
In standard theory (e.g., Spitzer 1987; Binney & Tremaine
1987, Section 8.3), the lifetime of a cluster against evapora-
tion is a multiple of its two-body relaxation time, trlx. For
a total mass M of stars within a radius r, this scales to first
order (ignoring a weak mass dependence in the Coulomb log-
arithm) as trlx(r) ∝ (Mr
3)1/2 ∝ M/ρ1/2, where ρ∝ M/r3. In a
concentrated cluster with an internal density gradient, trlx(r)
of course varies throughout the cluster, and the global re-
laxation timescale is an average of the local values (see the
early discussion by King 1958). This can still be written as
trlx ∝ M/ρ
1/2, with M the total cluster mass and ρ an appro-
priate reference density. We then have for the instantaneous
mass-loss rate, µev ≡ −dM/dt ∝ M/trlx ∝ ρ1/2. Insofar as this
is approximately constant in time, a GCMF evolving from an
initial β > 1 power law at low masses should therefore de-
velop a peak at a mass that depends on cluster density and age
through the parameter ∆∝ ρ1/2t.
It remains to identify the best measure of ρ in this context.
A standard choice in the literature, and the one that we even-
tually make to derive our main results in this paper, is the
half-mass density ρh = 3M/8πr
h. However, in a steady tidal
field, the mean density ρt inside the tidal radius of a cluster is
constant by definition, and thus choosing ρ = ρt instead is the
simplest way to ensure that µev ∝ ρ
1/2 and µev ≃ constant in
time are mutually consistent. In fact, King (1966) found from
direct calculations of the escape rate at each radius within his
standard (lowered Maxwellian) models, that the coefficient
in µev ∝ ρ
t is only a weak function of the internal density
structure (concentration) of the models, and thus only a weak
function of time for a cluster evolving quasistatically through
a series of such models.
The rule µev ∝ ρ
t is routinely used to set the GC mass-
loss rates in models for the dynamical evolution of the GCMF,
although such studies normally express µev immediately in
terms of orbital pericenters, rp, most often by assuming ρt ∝
r−2p as for GCs in galaxies whose total mass distributions fol-
low a singular isothermal sphere (e.g., Vesperini 1997, 1998,
2000, 2001; Vesperini et al. 2003; Baumgardt 1998; FZ01).
This bypasses any explicit examination of the GCMF as a
function of cluster density, which is our main goal in this pa-
per. But it is done in part because tidal radii are the most
poorly constrained of all structural parameters for GCs in the
Milky Way (their theoretical definition is imprecise and their
empirical estimation is highly model-dependent and sensitive
to low-surface brightness data), and they are exceedingly dif-
ficult if not impossible to measure in distant galaxies. We
deal with this here by focusing on the GCMF as a function of
cluster density ρh inside the less ambiguous, empirically bet-
ter determined and more robust half-mass radius, asking how
simple models with µev ∝ ρ
h fare against the data.
Taking µev ∝ ρ
h in place of µev ∝ ρ
t , which we do to
construct evaporation-evolved model GCMFs in §2.2, is most
appropriate if the ratio ρt/ρh is the same for all clusters and
constant in time. This is the case in Hénon’s (1961) model
of GC evolution, and in this limit (adopted by FZ01 in their
models for the Galactic GCMF) our analysis is rigorously jus-
tified. However, real clusters are not homologous (ρt/ρh dif-
fers among clusters) and they do not evolve self-similarly (ρh
may vary in time even if ρt does not). The key assumption in
our models is that µev is approximately independent of time
for any GC, which is well-founded in any case. By using cur-
rent ρh values to estimate µev, we do not suppose that the half-
mass densities are also constant, but we in effect use a single
number for all GCs to represent a range of (ρt/ρh)
1/2. Equiv-
alently, we ignore a dependence on cluster concentration in
the normalization of µev ∝ ρ
h . As we discuss further in §3,
it is reasonable to neglect this complication in a first approx-
imation because (ρt/ρh)
1/2 varies much less among Galactic
globulars than ρt and ρh do separately. We demonstrate this
explicitly by repeating our analysis with ρh replaced by ρt and
recover essentially the same results for the GCMF.
In §3 we also discuss some recent results, which indicate
that the timescale for relaxation-driven evaporation depends
on a slightly less-than-linear power of trlx (Baumgardt 2001;
Baumgardt & Makino 2003). We point out that this implies
that µev may increase as a modest power of the average sur-
face density of a cluster as well as (or, in an important special
case, instead of) the usual volume density. However, we show
in detail that making the appropriate changes throughout the
rest of the present section to reflect this possibility does not
change any of our conclusions.
2.1. Data
Figure 1 shows the distribution of mass against half-mass
density and against Galactocentric radius for 146 Milky Way
GCs in the catalogue of Harris (1996),7 along with the dis-
tribution of ρh versus rgc linking the two mass plots. The
Harris catalogue actually records the absolute V magni-
tudes of the GCs. We obtain masses from these by apply-
ing the population-synthesis model mass-to-light ratios ΥV
computed by McLaughlin & van der Marel (2005) for indi-
vidual clusters based on their metallicities and an assumed
age of 13 Gyr. However, we first multiplied all of the
7 Feb. 2003 version; see http://physwww.mcmaster.ca/∼harris/mwgc.dat .
http://physwww.mcmaster.ca/~harris/mwgc.dat
4 McLAUGHLIN & FALL
FIG. 1.— Left: Mass versus three-dimensional half-mass density, ρh ≡ 3M/8πr
h , and versus Galactocentric radius, rgc, for 146 Milky Way GCs in the
catalogue of Harris (1996). The dashed line in the first panel is M ∝ ρ
h , a locus of approximately constant lifetime against evaporation. Right: Half-mass
density versus rgc for the same clusters.
McLaughlin & van der Marel ΥV values by a factor of 0.8 so
as to obtain a median Υ̂V ≃ 1.5M⊙L
⊙ in the end,
8 consistent
with direct dynamical estimates (see McLaughlin 2000 and
McLaughlin & van der Marel 2005; also Barmby et al. 2007).
By assigning mass-to-light ratios to GCs in this way, we
allow for expected differences between clusters with differ-
ent metallicities. Our application of a corrective factor to
the population-synthesis values, ΥpopV , is motivated empiri-
cally by the fact that their distribution among Galactic GCs
is strongly peaked around a median Υ̂popV ≃ 1.9 M⊙ L
⊙ , while
the observed (dynamical) ΥdynV lie in a fairly narrow range
around Υ̂dynV ≃ 1.5 M⊙ L
⊙ (McLaughlin & van der Marel
2005). However, it is worth noting that the size of this differ-
ence is similar to what is found in some numerical simulations
of two-body relaxation over a Hubble time in clusters with a
spectrum of stellar masses (e.g., Baumgardt & Makino 2003).
In such simulations, ΥdynV falls below Υ
V due to the prefer-
ential escape of low-mass stars with high individual M∗/L∗
(population-synthesis models do not incorporate this or any
other stellar-dynamical effect). Thus, a median Υ̂dynV < Υ̂
may itself be a signature of cluster evaporation. We might
then also expect that more dynamically evolved clusters—that
is, those with shorter relaxation times—could have systemat-
ically lower ratios of ΥdynV /Υ
V . However, this is a relatively
small effect, which is not well quantified theoretically and
is not clearly evident in real data (the numbers published by
McLaughlin & van der Marel 2005 show no significant corre-
lation between Υ
V and trh for Galactic globulars). We
therefore proceed, as stated, with a single ΥdynV /Υ
V = 0.8
assumed for all GCs.
Harris (1996) gives the projected half-light radius Rh for
141 of the clusters with a mass estimated in this way, and for
these we obtain the three-dimensional half-mass radius from
the general rule rh = (4/3)Rh (Spitzer 1987), which assumes
no internal mass segregation. The remaining five objects have
mass estimates but no size measurements. To each of these
clusters, we assign an rh equal to the median value for those
of the other 141 GCs having masses within a factor two of the
one with unknown rh. In all cases, the half-mass density is
8 Throughout this paper, we use bx to denote the median of any quantity x.
ρh ≡ 3M/8πr
The leftmost panel in Figure 1 shows immediately that
the cluster mass distribution has a strong dependence on
half-mass density: the median M̂ increases with ρh while
the scatter in log M—that is, the width of the GCMF—
decreases. The first of these points is related to the fact that
rh correlates poorly with M (e.g., Djorgovski & Meylan 1994;
McLaughlin 2000). The second point, that the dispersion of
dN/d log M decreases with increasing ρh, is behind the find-
ing (Kavelaars & Hanes 1997; Gnedin 1997) that the GCMF
is broader at very large Galactocentric radii. We return to this
in §2.2.
A natural concern, when plotting M against ρh as we have
done here, is that any apparent correlation might only be a
trivial reflection of the definition ρh ∝ M/r
h. This may seem
particularly worrisome because, as we just mentioned, it is
known that size does not correlate especially well with mass
for GCs in the Milky Way (or, indeed, in other galaxies).
However, the lack of a tight M–rh correlation does not imply
that all GCs have the same rh, even within the unavoidable
measurement errors. The root-mean-square (rms) scatter of
log rh about its average value is ±0.3 for Galactic GCs, and
the 68-percentile spread in log rh is slightly greater than 0.5,
or more than a factor of 3 in linear terms (from the data in
Harris 1996; see, e.g., Figure 8 of McLaughlin 2000). This
compares to an rms random measurement error (from formal,
χ2 fitting uncertainties) of δ(log rh) ≈ 0.05, or about 10% rel-
ative error; and an rms systematic measurement error (i.e.,
differences in the rh inferred from fitting different structural
models to a single cluster) of perhaps δ(log rh) . 0.03; see
McLaughlin & van der Marel (2005). Most of the scatter in
plots of observed half-light radius versus mass is therefore
real and contains physical information. The left-hand panel
of Figure 1 displays this information in a form that highlights
clear, nontrivial overall trends requiring physical explanation.
The dashed line in the plot of mass against density traces
the proportionality M ∝ ρ
h , or Mr
h = constant. Insofar as
the half-mass relaxation time scales as trh ∝ (Mr
1/2, and
to the extent that µev ∝ M/trh ∝ ρ
h approximates the av-
erage rate of relaxation-driven mass loss, this line is one of
equal evaporation time. That such a locus nicely bounds
the lower envelope of the observed cluster distribution is
SHAPING THE GCMF BY EVAPORATION 5
TABLE 1
MILKY WAY GC PROPERTIES IN BINS OF DENSITY AND GALACTOCENTRIC RADIUS
Bin N bρh
brgc a Mmin Mmax bM
a MTO
[M⊙ pc−3] [kpc] [M⊙] [M⊙] [M⊙] [M⊙]
ρh bins
0.034 ≤ ρh ≤ 76.5 M⊙ pc
−3 48 8.48 12.9 5.63× 102 8.84× 105 4.12× 104 3.98× 104
78.8 ≤ ρh ≤ 526 M⊙ pc
−3 49 232 5.6 8.37× 103 1.67× 106 1.22× 105 1.58× 105
579 ≤ ρh ≤ 5.65× 10
4 M⊙ pc−3 49 973 3.2 1.93× 104 1.30× 106 2.82× 105 2.88× 105
rgc bins
0.6 ≤ rgc ≤ 3.2 kpc 47 597 1.9 4.47× 103 1.02× 106 1.15× 105 2.14× 105
3.3 ≤ rgc ≤ 9.4 kpc 50 261 5.2 2.02× 103 1.67× 106 1.27× 105 1.66× 105
9.6 ≤ rgc ≤ 123 kpc 49 18.4 18.3 5.63× 102 1.30× 106 7.42× 104 8.71× 104
a The notation bx represents the median of quantity x.
b MTO is the peak mass of the model GCMFs traced by the solid curves in each panel of Figure 2, which
are given by equation (3) of the text with β = 2, Mc = 10
6 M⊙, and individual ∆ given by the observed ρh
of each cluster through equation (4).
itself a strong hint that relaxation-driven cluster disruption
has significantly modified the GCMF at low masses (re-
call that Mr3h = constant defines one side of the GC “sur-
vival triangle” when the M–ρh plot is recast as rh versus M:
Fall & Rees 1977; Okazaki & Tosa 1995; Ostriker & Gnedin
1997; Gnedin & Ostriker 1997). It is also further evidence
that the weak correlation of observed rh with M is due to sig-
nificant and real differences in cluster radii, since if rh were
intrinsically the same for all GCs, then we would see M ∝ ρh
instead.
The middle panel of Figure 1 shows the well-known result
that the typical GC mass depends weakly if at all on Galacto-
centric radius, at least until large rgc & 30–40 kpc, where there
are too few clusters to discern any trend. The right-hand panel
of the figure shows why this is true even though the GCMF
depends significantly on cluster density: although there is a
correlation between half-mass density and Galactocentric po-
sition, the large scatter about it is such that convolving the
observed M versus ρh with the observed ρh versus rgc results
in an almost null dependence of M on rgc.
We now divide the GC sample in Figure 1 roughly into
thirds, in two different ways: first on the basis of half-mass
density, and second by Galactocentric radius. These ρh and
rgc bins are defined in Table 1, which also gives a few sum-
mary statistics for the globulars in each subsample. We count
the clusters in every subsample in about 10 equal-width bins
of log M to obtain histogram representations of dN/d log M,
first as a function of ρh and then as a function of rgc. These
GCMFs are shown by the points in Figure 2, with errorbars
indicating standard Poisson uncertainties. The curves in the
figure trace model GCMFs, which we describe in §2.2. For
the moment, it is important to note that the dashed curve is the
same in every panel, apart from minor differences in normal-
ization, and is proportional to the GCMF for the whole sample
of 146 GCs. (In the middle-left panel of Figure 2, which per-
tains to clusters distributed tightly around the median ρh of
the entire GC system, the dashed curve is coincident with the
solid curve running through the data.)
The left-hand panels of Figure 2 show directly that the
GCMF is peaked for clusters at any density, and that the mass
of the peak increases systematically with ρh (see also the last
column of Table 1, but note that the turnover masses there
refer to the model GCMFs that we develop below). The sta-
tistical significance of this is very high, and qualitatively it
is the behavior expected if MTO owes its existence to cluster
disruption at a rate that increases with ρh, as is the case with
relaxation-driven evaporation.
The right-hand panels of Figure 2 confirm once again that
the GCMF peak mass is a very weak function of Galactocen-
tric position. In fact, the observed distributions in the two rgc
bins inside ≃10 kpc are statistically indistinguishable in their
entirety, and the main difference at larger rgc & 10 kpc is a
slightly higher proportion of low-mass clusters rather than a
large change in MTO. All of this is consistent with the pri-
mary dependence of the GCMF being that on ρh, since Fig-
ure 1 shows that the GC density distribution is not sensitive to
Galactocentric position for rgc . 10–20 kpc but has a substan-
tial low-density tail at larger radii (with a broader associated
GCMF, as seen in the upper-left panel of Figure 2).
2.2. Simple Models
We now assess more quantitatively whether these results
are consistent with evaporation-dominated evolution of the
GCMF from an initial distribution like that observed for
young clusters in the local universe. We model the time-
evolution of the distribution of M versus ρh in Figure 1 but do
not attempt this for the distribution of ρh over rgc—the details
of which likely depend on a complicated interplay between
the tidal field of the Galaxy, the present and past orbital pa-
rameters of clusters, and the structural nonhomology of GCs.
To compare our models to the current GCMF as a function of
rgc, we simply calculate them using the observed ρh of indi-
vidual clusters in different ranges of Galactocentric radius.
We assume that the initial GCMF was independent of clus-
ter density, and that all globulars surviving to the present day
have been losing mass for the past Hubble time at constant
rates. We use the current half-mass density of each cluster
to estimate µev ∝ ρ
h . As we discussed earlier, an approxi-
mately time-independentµev is indicated by most calculations
of two-body relaxation in tidally limited GCs. We give a more
detailed, a posteriori justification in §3 for using ρh, rather
than other plausible measures of cluster density, to estimate
Consider first a group of coeval GCs with an initial mass
function dN/d log M0 and a single, time-independent mass-
loss rate µev. The mass of every cluster decreases linearly as
M(t) = M0 −µevt, and at any later time each has lost the same
amount ∆ ≡ M0 − M(t) = µevt. FZ01 show rigorously that in
6 McLAUGHLIN & FALL
FIG. 2.— GCMF as a function of half-mass density, ρh ≡ 3M/8πr
h (left panels), and as a function of Galactocentric radius, rgc (right panels), for 146 Milky
Way GCs in the catalogue of Harris (1996). The dashed curve in all cases is an evolved Schechter function for the entire GC system (Jordán et al. 2007): equation
(3) with β = 2, Mc = 106 M⊙, and ∆≡ 2.3×105 M⊙ for all clusters (from equation [4] and a median bρh = 246 M⊙ pc
−3), giving a peak at MTO = 1.6×10
5 M⊙ .
Solid curves are the GCMFs predicted by equation (3) with β = 2 and Mc = 106 M⊙ but individual ∆ given by the observed ρh of each cluster (equation [4]) in
the different subsamples.
this case, the evolved and initial GCMFs are related by
d log M
d log M0
(M +∆)
d log (M +∆)
. (1)
This is the basis for the claim that the mass function scales
generically as dN/d log M ∝ M+1 (a β = 0 power law) at low
enough M(t)<∆—that is, for the surviving remnants of clus-
ters with M0 ≈∆—just so long as the initial distribution was
not a delta function.
We follow FZ01 (see also Jordán et al. 2007) in adopting a
Schechter (1976) function for the initial GCMF:
dN/d log M0 ∝ M
0 exp
−M0/Mc
. (2)
With β ≃ 2, this distribution describes the power-law mass
functions of young massive clusters in systems like the Anten-
nae galaxies (e.g., Zhang & Fall 1999). An exponential cut-
off at Mc ∼ 10
6 M⊙ is generally consistent with such data,
even if not always demanded by them; here we require it
mainly to match the curvature observed at high masses in old
GCMFs (e.g., Burkert & Smith 2000; Jordán et al. 2007).
Combining equations (1) and (2) gives the probability den-
sity that a single GC with known evaporation rate and age has
an instantaneous mass M. The time-dependent GCMF of a
system of N GCs with a range of µev (or ages, or both) is
then just the sum of all such individual probability densities:
d log M
[M +∆i]
M +∆i
. (3)
Here the total mass losses ∆i = (µevt)i may differ from clus-
ter to cluster (ti being the age of a single GC) but both β and
Mc are assumed to be constants, independent of ρh in particu-
lar.9 Given each ∆i, the normalizations Ai in equation (3) are
defined so that the integral over d log M of each term in the
summation is unity.
Jordán et al. (2007) have introduced a specialization of
equation (3) in which all clusters have the same ∆. They refer
to this as an evolved Schechter function and describe its prop-
erties in detail (including giving a formula for the turnover
mass MTO as a function of ∆ and Mc) for the case β = 2. Here
we note only that, at very young cluster ages or for slow mass-
loss rates, such that ∆ ≪ Mc and only the low-mass, power-
law part of the initial GCMF is significantly eroded, any one
evolved Schechter function has a peak at MTO ≃ ∆/(β − 1)
(for β > 1). As ∆ increases relative to Mc, the turnover at
first increases proportionately and the width of the distribu-
tion decreases (since the high-mass end at M & MTO is largely
unchanged). For large ∆≫Mc, however, the peak is bounded
above by MTO →Mc and the width approaches a lower limit.
Thus, the dependence of MTO on ∆ is weaker than linear when
Mc is finite in the initial GCMF of equation (2). Any peak in
the full equation (3) for a system of GCs with individual ∆
values is an average of N different turnovers and must be cal-
culated numerically.
In their modeling of the Milky Way GC system, FZ01 ef-
9 Note that Mc appears to take on different values in the GCMFs of other
galaxies, varying systematically with the total luminosity Lgal (Jordán et al.
2007). The reasons for this are unclear, as is the origin of this mass scale in
the first place.
10 The increase of MTO and the decrease of the full width of dN/d log M
for increasing ∆ eventually saturate when the mass loss per GC is so high
that it affects clusters in the exponential part of the initial Schechter-function
GCMF. This is because dN/d log M ∝ M+1 exp(−M/Mc) is a self-similar
solution to equation (1).
SHAPING THE GCMF BY EVAPORATION 7
fectively compute mass functions of the type (3)—based on
the same initial conditions and dynamical evolution—with a
distribution of ∆ values determined by the orbital parame-
ters of clusters in an idealized, spherical and static logarith-
mic Galaxy potential (used both to fix µev in terms of clus-
ter tidal densities and to estimate additional mass loss due
to gravitational shocks). Jordán et al. (2007) fit GCMF data
in the Milky Way and scores of Virgo Cluster galaxies with
their version of equation (3) in which all GCs have the same
∆. They thus estimate the dynamical mass loss from typi-
cal clusters in these systems. Here, we construct models for
the Milky Way GCMF using ∆ values given directly by the
observed half-mass densities of individual GCs.
We adopt β = 2 for the initial low-mass power-law index
in equation (2), which carries over into equation (3) for the
evolved dN/d log M. Jordán et al. (2007) have fitted the full
Galactic GCMF with an evolved Schechter function assuming
β = 2 and a single ∆ ≡ ∆̂ for all surviving globulars. They
find Mc ≃ 10
6 M⊙ and ∆̂ = 2.3× 10
5 M⊙. We use this value
of Mc in equation (3) and we associate ∆̂ with the mass loss
from clusters at the median half-mass density of the entire GC
system, which is ρ̂h = 246 M⊙ pc
−3 from the data in Figure 1.
Since we are assuming that ∆ = µevt ∝ ρ
h t for coeval GCs,
we therefore stipulate
∆ = 1.45× 104 M⊙
ρh/M⊙ pc
−3)1/2 (4)
for globulars with arbitrary ρh. Assuming a typical GC age of
t = 13 Gyr, this corresponds to a mass-loss rate of
µev ≃ 1100 M⊙ Gyr
−1 (ρh/M⊙ pc−3
. (5)
In §3 we discuss the cluster lifetimes implied by this value
of µev. We emphasize here that the scaling of µev and ∆
with ρ
h follows rather generically from our hypothesis of
evaporation-dominated cluster evolution, while the numerical
coefficients in equations (4) and (5) are specific to the assump-
tion of β = 2 for the power-law index at low masses in the
initial GCMF.
The dashed curve shown in every panel of Figure 2 is
the evolved Schechter function fitted to the entire GCMF of
the Milky Way by Jordán et al. (2007). This has a peak at
MTO ≃ 1.6× 10
5 M⊙ (magnitude MV ≃ −7.4 for a typical V -
band mass-to-light ratio of 1.5 in solar units) and gives a very
good description of the observed dN/d log M in the middle
density bin, 79 . ρh . 530 M⊙ pc
−3, and in the two inner ra-
dius bins, rgc ≤ 9.4 kpc. This is expected, since the median
half-mass density in each of these cluster subsamples is very
close to the system-wide median ρ̂h = 246 M⊙ pc
−3 (see Ta-
ble 1). Even in the outermost rgc bin, a Kolmogorov-Smirnov
(KS) test only marginally rejects the dashed-line model (at
the ≃95% level), because this subsample still includes many
GCs at or near the global median ρ̂h (see Figure 1). By con-
trast, the average GCMF is strongly rejected as a model for
the lowest- and highest-density GCs on the left-hand side of
Figure 2: the KS probabilities that these data are drawn from
the dashed distribution are <10−4 in both cases. This is also
expected since, by construction, these bins only contain clus-
ters with densities well away from the median of the full GC
system, for which the total mass lost by evaporation should be
significantly different from the typical ∆̂ = ∆(ρ̂h).
The solid curves in Figure 2, which are different in ev-
ery panel, are the superpositions of many different evolved
Schechter functions, as in equation (3), with distinct ∆ values
given by equation (4) using the observed ρh of each cluster in
the corresponding subsample. These models provide excel-
lent matches to the observed dN/d log M in every ρh and rgc
bin, with χ2 < 1.3 per degree of freedom in all cases. This is
the main result of this paper.
The last column of Table 1 gives the mass MTO at which
each of the solid model GCMFs in Figure 2 peaks. We note
that these turnovers increase roughly as MTO ∼ ρ̂
0.3−0.4
h for
our specific binnings in ρh and rgc, somewhat shallower than
the ρ
h scaling of the cluster mass-loss rate that defines the
models. This is partly because of the averaging over indi-
vidual turnovers implied by the summation of many evolved
Schechter functions in each GC bin, and partly because—as
we discussed just after equation (3)—the turnover mass of any
one evolved Schechter function cannot increase indefinitely in
direct proportion to ∆ ∝ ρ
h , but has a strict upper limit of
MTO ≤ Mc.
Our models are naturally consistent with the fact that the
GCMF is narrower for clusters with higher densities. This
is obvious in the left-hand panels of Figure 2; in the dis-
cussion immediately after equation (3), we described how it
follows from the increase of MTO with ∆ ∝ ρ
h for a sin-
gle evolved Schechter function. In addition, the superposi-
tion of many such functions with separate, density-dependent
turnovers and widths results in wider GCMFs for cluster sub-
samples spanning larger ranges of ρh. This accounts in partic-
ular for the breadth of the mass function at rgc ≥ 9.4 kpc. The
globulars at these radii have 0.034 ≤ ρh ≤ 4.1×10
3 M⊙ pc
corresponding to individual evolved Schechter functions with
turnovers at 2.7× 103 . MTO . 4.0× 10
5 M⊙. The compos-
ite GCMF in the lower-right panel of Figure 2 is therefore
extremely broad and shows a very flat peak, such that an over-
all MTO cannot be established precisely from the data alone.
This explains the findings of Kavelaars & Hanes (1997), who
pointed out that the GCMF of the outermost third of the Milky
Way cluster system has a turnover that is statistically consis-
tent with the full-Galaxy average, but a larger dispersion (see
also Gnedin 1997).
Finally, if the GCMF evolved dynamically from initial con-
ditions similar to those we have adopted, then the data and
models in the left-hand panels of Figure 2 argue against
the notion that external gravitational shocks, rather than in-
ternal two-body relaxation, were primarily responsible for
shaping the present-day GCMF. This is because the mass-
loss rate caused by shocks alone, −dM/dt = µsh ∝ M/ρh,
differs significantly from that caused by evaporation alone,
−dM/dt = µev ∝ ρ
h . The direct dependence of µsh on M en-
sures that shocks become progressively less important com-
pared to evaporation as clusters lose mass (at a given ρh), and
consequently shocks are not likely to have had much effect on
the observed GCMF for M < MTO. Furthermore, the inverse
dependence of µsh on ρh is contrary to the direct dependence
of MTO on ρh shown in Figure 2. The different roles played
by shocks and evaporation in shaping the observed GCMF are
discussed more fully by FZ01. We note here that gravitational
shocks may have been important in destroying very massive
or very low-density clusters early in the history of our Galaxy.
2.3. Other Cluster Properties
If the current shape of the GCMF is fundamentally the re-
sult of long-term cluster disruption according to a mass-loss
8 McLAUGHLIN & FALL
rule like µev ∝ ρ
h , then it should be possible to reproduce
the distribution as a function of any other cluster attribute by
using the observed ρh of individual GCs in equations (3) and
(4) to build model dN/d log M for subsamples of the Galactic
cluster system defined by that attribute—as we did for the rgc
binning of §2.2. Here we explore one example in which dif-
ferences in the GCMFs of two groups of globulars can be seen
in this way to follow from differences in their ρh distributions.
Smith & Burkert (2002) have shown that the mass function
of Galactic globulars with King (1966) model concentrations
c < 0.99 has a less massive peak than that for c ≥ 0.99. [Here
c ≡ log(rt/r0), where rt is the fitted tidal radius and r0 a core
scale.] They further find that a power-law fit to the low-
c GCMF just below its peak returns dN/d log M ∝ M+0.5—
shallower than the M+1 expected generically for a mass-loss
rate that is constant in time—but they confirm that the latter
slope applies for the GCMF at c ≥ 0.99. They discuss various
options to explain these results, including a suggestion that, if
the mass functions of both low- and high-concentration clus-
ters evolved slowly from the same, young-cluster–like initial
distribution, then the mass-loss law for low-c GCs may have
differed from that for high-c clusters. However, they give no
physical explanation for such a difference, and we can show
now that none is required.
The upper panel of Figure 3 plots concentration against
half-mass density for the same 146 GCs from Figure 1; the
filled circles distinguish 24 clusters with c < 0.99. There
is a correlation of sorts between c and ρh, which either de-
rives from or causes the better-known correlation between c
and M (e.g., Djorgovski & Meylan 1994; McLaughlin 2000).
The important point here is that the ρh distribution is off-
set to lower values and has a higher dispersion at c < 0.99.
Following the discussion in §2.2, we therefore expect the
low-concentration GCMF to have a smaller MTO, a flatter
shape around the peak, and a larger full width than the high-
concentration GCMF.
The lower panel of Figure 3 shows the GCMFs for c < 0.99
(filled circles) and c ≥ 0.99 (open circles). The curves are
again given by equation (3) with β = 2, Mc = 10
6 M⊙, and in-
dividual ∆ calculated from the observed cluster ρh through
equation (4). These models peak at MTO ≃ 4.3× 10
for the c < 0.99 subsample but at MTO ≃ 1.8× 10
5 M⊙ for
c ≥ 0.99, entirely as a result of the different ρh involved.
The larger width of dN/d log M and its shallower slope at
any M . 105 M⊙ for the low-concentration GCs are also
clear, in the model curves as well as the data. It is further
evident that there are no low-c Galactic globulars observed
with M & 2× 105 M⊙, above the nominal turnover of the full
GCMF (as Smith & Burkert 2002 noted). But this is not sur-
prising, given that there are so few low-concentration clusters
in total and they are expected to be dominated by low-mass
objects because of their generally low densities. Thus, the
solid curve in Figure 3 predicts perhaps ≃ 3 high-mass clus-
ters with c < 0.99, where none is found.
The apparent variation of the Milky Way GCMF with in-
ternal concentration is therefore consistent with the same
density-based model for evaporation-dominated dynamical
evolution that we compared to dN/d log M as a function of
ρh and rgc in §2.2. To show this, we have made use of the
densities ρh exactly as observed within the two concentration
bins indicated in Figure 3—just as we also took ρh directly
from the data for GCs in different ranges of rgc to construct
models for comparison with the observed dN/d log M in the
FIG. 3.— Top: Concentration parameter as a function of half-mass density
for 146 Galactic GCs. The line of points at c ≡ 2.5 comes from the practice of
assigning this value to core-collapsed clusters in the Harris (1996) catalogue
and its sources. Bottom: GCMF data and models (eqs. [3] and [4]) for 24
clusters with c < 0.99 (filled circles and solid curve) and 122 clusters with
c ≥ 0.99 (open circles and dashed curve).
right-hand panels of Figure 2. Of course, this is not the same
as explaining the distribution of ρh versus rgc or c. Doing so
would certainly be of interest in its own right, but it is beyond
the scope of our work here.
3. DISCUSSION
In this section, we first show that the mass-loss rate in equa-
tion (5) above implies cluster lifetimes that compare favorably
with those expected from relaxation-driven evaporation. Then
we discuss why it is reasonable to approximate µev ∝ ρ
the first place. Finally, we address the issue of possible con-
flict, in some other models for evaporation-dominated GCMF
evolution, between the near-constancy of MTO as a function
of rgc and the observed kinematics of GC systems.
3.1. Cluster Lifetimes
The disruption time of a GC with mass M and a steady
mass-loss rate µev is just tdis = M/µev. It is convenient,
for purposes of comparison with evaporation times in the
literature, to normalize tdis to the relaxation time of a
cluster at its half-mass radius. In general, this is trh =
0.138M1/2r
G1/2m∗ ln (γM/m∗)
, where m∗ is the mean
SHAPING THE GCMF BY EVAPORATION 9
stellar mass. For clusters of stars with a single mass,
m∗ ≃ 0.7M⊙ and γ ≃ 0.4 are appropriate (Spitzer 1987;
Binney & Tremaine 1987, equation [8-72]), in which case
equation (5) for µev from our GCMF modeling implies
µevtrh
0.57M/M⊙
0.57× 105
. (6)
Clusters with realistic stellar mass spectra will have slightly
different values of m∗ and a smaller γ in the calculation of
the relaxation time (Giersz & Heggie 1996), which changes
the numerical value of tdis/trh somewhat but does not alter any
scalings.
We obtained the normalization of µev ∝ ρ
h in §2.2 by
fitting to observed GCMFs constructed by applying a spe-
cific mass-to-light ratio ΥV to every cluster, with models as-
suming a specific form for the initial dN/d log M0. Thus,
the result in equation (6) depends both on the median Υ̂V
and on the power-law index β at low masses in the original
Schechter-function GCMF. The net scaling, for either single-
or multiple-mass clusters, is
tdis/trh ∝ Υ̂
V (β − 1)
−1 . (7)
To see the dependence of this dimensionless lifetime on
Υ̂V , note that we require µev ∝∆∝ΥV to fit the mass losses
of clusters with a given distribution of luminosities (the di-
rect observables), whereas M/trh is proportional to ρ
V (L/r
1/2. Therefore, tdis/trh ∝ (M/trh)/µev ∝Υ
V . The
mass-to-light ratios adopted in this paper, with a median value
Υ̂V ≃ 1.5 M⊙ L
⊙ , are tied directly to dynamical determina-
tions (§2.1).
To understand the dependence on β in equation (7), recall
first that the coefficients in our expressions for ∆ and µev
as functions of ρh (eqs. [4] and [5]) followed from choos-
ing β = 2 for the power-law exponent at low masses in the
initial GCMF (equation [2]). As we mentioned just after
equation (3), the turnover mass of an evolved Schechter func-
tion with any β > 1 is MTO ≃ ∆/(β − 1) in the limit of low
∆ ∝ ρ
h , and MTO → Mc for very high ∆. In this sense,
the strongest observational constraints on the normalizations
of ∆ and µev come from the low-density clusters. All other
things being equal, their GCMF can be reproduced with β 6= 2
if ∆ and µev are multiplied by (β − 1) at fixed ρh. Therefore,
tdis ∝ 1/µev ∝ 1/(β− 1). Observations of young massive clus-
ters (e.g., Zhang & Fall 1999) indicate that β is near 2; but if
it were slightly shallower, then the cluster lifetimes we infer
from the old GCMF would increase accordingly. Even a rela-
tively minor change to β = 1.5 would double tdis/trh from ≈10
to ≈20.
In the model of Hénon (1961) for single-mass clusters
evolving self-similarly (fixed ratio ρt/ρh of mean densities in-
side the tidal and half-mass radii) in a steady tidal field (ρt
constant in time), a cluster loses 4.5% of its remaining mass
every half-mass relaxation time. The time to complete disrup-
tion is therefore tdis/trh = 1/0.045≃ 22. For non-homologous
clusters in a steady tidal field, tdis/trh is a function of central
concentration and can differ from the Hénon value by factors
of about two. From one-dimensional Fokker-Planck calcula-
tions, Gnedin & Ostriker (1997) find tdis/trh ≃ 10–40 for King
(1966) model clusters with c values similar to those found in
real GCs and with gravitational shocks suppressed (see their
Figure 6). Thus, even though the evaporation time in equation
(6) may be slightly shorter than is typically found in theoreti-
cal calculations, it is certainly within the range of such calcu-
lations. Moreover, the assumptions of a steady tidal field and
a single stellar mass in Hénon (1961) and Gnedin & Ostriker
(1997) are important. Part of the difference between the typ-
ical lifetimes in these particular theoretical treatments and
our estimate of tdis/trh from the GCMF is that the former do
not include gravitational shocks, which may have accelerated
somewhat the evolution of real clusters (although we stress
again that shocks do not appear in general to have dominated
the evolution of extant Galactic GCs and are not expected
to affect the basic time-independence of the net mass-loss
rate; see Vesperini & Heggie 1997, Gnedin, Lee, & Ostriker
1999, FZ01, and Prieto & Gnedin 2006). A spectrum of stel-
lar masses in the clusters may also have contributed to an in-
crease in evaporation rate over the single-mass values (e.g.,
Johnstone 1993; Lee & Goodman 1995).
Estimates of evaporation times from other numerical
methods and for models of multimass clusters can be rather
sensitive to the detailed computational techniques and input
assumptions and approximations, and differences at roughly
the factor-of-two level in tdis/trh between different analyses
are not uncommon; see, e.g., Vesperini & Heggie (1997),
Takahashi & Portegies Zwart (1998, 2000), Baumgardt
(2001), Joshi, Nave, & Rasio (2001), Giersz (2001), and
Baumgardt & Makino (2003). Thus, although the lifetimes
in these studies tend to be broadly comparable to those in
Hénon (1961) and Gnedin & Ostriker (1997), noticeably
shorter values do occur in some models. In any case, we
are encouraged by consistency to within factors of two or
three between estimates of tdis or µev by such vastly different
methods—one purely observational, based on the mass
functions of cluster systems; the other purely theoretical,
based on idealized models for the evolution of individual
clusters—particularly since each method involves several
uncertain inputs and parameters.
3.2. Approximating µev ∝ ρ
3.2.1. Half-mass versus Tidal Density
The dimensionless disruption time in equation (6) is inde-
pendent of any cluster property other than the Coulomb log-
arithm because we have used GC half-mass densities to esti-
mate tdis = M/µev ∝ M/ρ
h , while trh also scales as M/ρ
However, as we mentioned above, the Fokker-Planck calcu-
lations of Gnedin & Ostriker (1997) in particular show that
tdis/trh is actually a function of central concentration, c, for
King (1966) model clusters in steady tidal fields. The constant
of proportionality in µev ∝ ρ
h should therefore also depend
on c, a detail that we have neglected to this point. We show
now that this has not biased any of our analysis or affected our
conclusions.
The dotted curve in Figure 4 illustrates the dependence of
tdis/trh on c for single-mass King models, as given by equation
(30) of Gnedin & Ostriker (1997). The solid curve is propor-
tional to (ρh/ρt)
1/2 = (r3t /2r
1/2, which we have calculated
as a function of c for these models and multiplied by a con-
stant to compare directly with tdis/trh. Evidently, there is an
approximate equality tdis/trh ≈ 2.15(ρh/ρt)
1/2, which holds to
within <15% over the range of concentrations shown in Fig-
ure 4 (note that all but 6 Galactic GCs have 0.7 ≤ c ≤ 2.5,
corresponding to central potentials 3 .W0 . 11). Thus, if the
10 McLAUGHLIN & FALL
FIG. 4.— Dependence of tdis/trh (dotted line; from Gnedin & Ostriker
1997) and (ρh/ρt )
1/2 (solid line; after scaling by a factor of 2.15) on cen-
tral concentration for single-mass King-model clusters. Over the range of c
shown, which includes nearly all Galactic globulars, the approximate propor-
tionality tdis/trh ∝ (ρh/ρt )
1/2 holds to within better than 15%. Thus, to this
level of accuracy the evaporation time tdis is roughly the same multiple of
t for clusters with any internal density profile.
evaporation time is written as tdis ∝ trh(ρh/ρt)
1/2 ∝ M/ρ
then the constant of proportionality in the mass-loss rate µev ∝
M/tdis ∝ ρ
t should be nearly independent of c. In fact, King
(1966) originally concluded, from quite basic arguments, that
the evaporation rate of a cluster with a lowered-Maxwellian
velocity distribution would take the form µev ∝ ρ
t with
only a weak dependence on c. An essentially concentration-
independent scaling of µev with ρ
t is also found in N-
body simulations of tidally limited, multimass clusters (e.g.,
Vesperini & Heggie 1997) and so is not an artifact of any as-
sumptions specific to the calculations of either King (1966) or
Gnedin & Ostriker (1997).
This suggests that it might have been more natural to spec-
ify cluster evaporation rates proportional to ρ
t rather than
h when developing our GCMF models in §2. For any clus-
ter in a steady tidal field, with a constant ρt , such a choice
would also have been automatically consistent with an ap-
proximately time-independent µev and the corresponding lin-
ear M(t) dependence that we have adopted throughout this
paper. As we discussed at the beginning of §2, our deci-
sion to work with ρh rather than ρt was motivated by the
fact that the half-mass density is much better defined in prin-
ciple and more accurately observed in practice. Neverthe-
less, re-writing µev ∝ ρ
t as µev ∝ (ρt/ρh)
1/2 × ρ
h makes
it clear that the validity of our models, with a fixed coefficient
in µev ∝ ρ
h , depends on the extent to which variations in
(ρt/ρh)
1/2 can safely be ignored.
Figure 4 shows that the full range of possible values for
(ρh/ρt)
1/2 in King-model clusters with c ≥ 0.7 is only a factor
of ≃ 4 between minimum and maximum. Therefore, using
a single, intermediate value of this density ratio to describe
all GCs (or a single GC evolving in time through a series of
quasi-static King models)—which we have effectively done
by using a GCMF fit to normalize ∆ and µev in equations (4)
and (5)—should never be in error by more than a factor of 2 or
so. This is a relatively small inaccuracy, given that measured
GC densities range over four to five orders of magnitude.
To confirm more directly that our models with µev ∝ ρ
are good approximations to GCMF evolution under a mass-
loss law µev ∝ ρ
t , we have repeated the analysis of §2 in full
but using the GC tidal densities ρt (derived from the values of
rt listed by Harris 1996) in place of ρh throughout. All of our
main results persist.
For example, the two panels of Figure 5, which are analo-
gous to the left- and rightmost panels of Figure 1 above, show
that (1) the GC mass distribution has a clear dependence on
ρt , with a lower envelope that is well matched by a line of
constant evaporation time, M ∝ ρ
t (the dashed line in the
plot); and (2) although the scatter in the distribution of ρt
over Galactocentric radius is smaller than the scatter in ρh ver-
sus rgc, it is still significant. Because the M–rgc distribution
can now be viewed as the convolution of the M–ρt distribu-
tion with the ρt–rgc distribution, the scatter in ρt versus rgc is
again critical in explaining the weak or null dependence of the
GCMF on Galactocentric radius. (The M–rgc distribution is,
of course, unchanged from that shown in the middle panel of
Figure 1.)11
Figure 6 shows the Milky Way GCMF for globulars in three
equally populated bins of tidal density (defined as indicated in
the left-hand panels of the plot) and in the same three bins of
Galactocentric radius that we used in §2.2 above. Our mod-
els for these distributions are based as before on equation (3)
with β = 2, but now the total mass lost from any GC is esti-
mated from its tidal density rather than its half-mass density.
Specifically, we take
∆ = 2.1× 105 M⊙
ρt/M⊙ pc
−3)1/2 . (8)
The numerical coefficient in equation (8) is such that it gives a
∆ identical to that in equation (4) for a GC with ρh/ρt = 210,
which is the median value of this density ratio for the 146 GCs
in the Harris (1996) catalogue.
As in Figure 2, the dashed curve in every panel of Figure
6 is the same, representing a fit to the average dN/d log M of
the entire Galactic GC system. Thus, it is immediately clear
that the peak mass of the GCMF increases significantly and
systematically with increasing ρt , just as it does with increas-
ing ρh. Meanwhile, the solid curves are subsample-specific
model GCMFs, obtained by using the observed tidal density
of each cluster in any ρt or rgc bin to specify individual ∆ val-
ues via equation (8) for each of the evolved Schechter func-
tions in the summation of equation (3). As expected, there is
no appreciable difference, in terms of the fits to any of the ob-
served GCMFs, between these models based on evaporation
rates µev ∝ ρ
t and our original models with µev ∝ ρ
3.2.2. Retarded Evaporation
Another potential concern comes from recent arguments
(see especially Baumgardt 2001; Baumgardt & Makino 2003)
11 As was also the case with our earlier plots involving ρh in Figure 1,
the scatter and structure in both panels of Figure 5 are real, since the rms
scatter of log rt about the best-fit lines to either of log M or log rgc is 0.3–0.35
while the rms errorbars based on formal fitting uncertainties are in the range
δ(log rt ) ≃ 0.05–0.15 for a variety of models (McLaughlin & van der Marel
2005).
SHAPING THE GCMF BY EVAPORATION 11
FIG. 5.— Scatter plots of mass M versus mean density inside the tidal radius (ρt ≡ 3M/4πr3t ) and of ρt versus Galactocentric radius rgc, for 146 Galactic GCs
from the Harris (1996) catalogue. These plots are analogous to the left- and rightmost panels of Figure 1. The dashed line in the left-hand plot traces the relation
M ∝ ρ
t , which defines a locus of constant evaporation time for µev ∝ ρ
FIG. 6.— Observed GCMF (points, with Poisson errorbars) and models (curves) as a function of mean cluster density inside the tidal radius, ρt ≡ 3M/4πr3t (left-
hand panels), and as a function of Galactocentric radius, rgc (right-hand panels). The dashed curve in every panel is an evolved Schechter function representing
the entire GC system: equation (3) with β = 2, Mc = 106 M⊙, and a single ∆, common to all clusters, evaluated from equation (8) using the median bρt of all 146
Galactic GCs. Solid curves are subsample-specific models using equation (3) with β = 2 and Mc = 106 M⊙ but a different ∆ value for every cluster (obtained
from equation [8] using individual observational estimates of ρt ) in any ρt or rgc bin.
12 McLAUGHLIN & FALL
that the total evaporation time of a tidally limited cluster
is not simply a multiple of an internal two-body relaxation
time, trlx ∝ (Mr
3)1/2, but depends on both trlx and the crossing
time tcr ∝ (M/r
3)−1/2 through the combination tdis ∝ t
with x < 1. The mass-loss rate µev ∝ M/tdis then scales as
M3/2−xr−3/2, which for x 6= 1 differs from the rates µev ∝ ρ
and µev ∝ ρ
t that we have so far adopted. However, our
GCMF models are still meaningful, because postulating tdis ∝
txrlxt
cr implies a dependence of µev on a measure of cluster
density that is, once again, well approximated by ρ
h for
Galactic GCs. Before showing this, we briefly discuss the
reasons and the evidence for a possible dependence of tdis on
both trlx and tcr.
If stars are assumed to escape a cluster as soon as they
have attained energies above some critical value as a result
of two-body relaxation, then tdis ∝ trlx is expected (and con-
firmed by N-body simulations; e.g., Baumgardt 2001). How-
ever, more complicated behavior may arise when escape not
only depends on stars satisfying such an energy criterion, but
also requires them to cross a spatial boundary. Then, al-
though the stars are still scattered to near- and above-escape
energies on the timescale trlx, they require some additional
time to actually leave the cluster. This escape timescale is
related fundamentally to tcr (but also depends on details of
the stellar orbits, the external tidal field, and the shape of
the zero-energy surface). The longer this extra time, the
higher is the probability that further encounters with bound
cluster stars may scatter any potential escapers back down
to sub-escape energies. The net result is a slow-down (“re-
tardation”) of the overall evaporation rate (Chandrasekhar
1942; King 1959; Takahashi & Portegies Zwart 1998, 2000;
Fukushige & Heggie 2000; Baumgardt 2001) and a length-
ening of the cluster lifetime tdis, by a factor that can be ex-
pected to increase with the ratio tcr/trlx. If this factor scales as
(tcr/trlx)
1−x for some x < 1, then tdis ∝ trlx (tcr/trlx)
1−x = txrlxt
While such a retardation of evaporation can be expected
to occur at some level in all clusters, there are physical sub-
tleties in the effect that are probably not captured adequately
by a simple re-parametrization of lifetimes as tdis ∝ t
In particular, it is unlikely that this expression can hold for
clusters of all masses with a single value of x < 1. Since
tcr/trlx ∝ M
−1, very massive clusters have tcr ≪ trlx, and stars
scattered to greater than escape energies by relaxation cross
the tidal boundary effectively instantaneously—implying that
the standard tdis ∝ trlx, or x→ 1, applies in the high-mass limit.
Indeed, if this were not the case, and a fixed x < 1 held for all
M, then an unphysical tdis < trlx would obtain at high enough
masses; see Baumgardt (2001) for further discussion. Unfor-
tunately, “very massive” is not well quantified in this context,
and it is not yet clear if a single value of x is accurate for the
entire GC mass regime. So far, it has been checked directly
only for initial cluster masses below the current peak of the
GCMF.
It is also worth noting that the analysis and simulations
aimed at this problem to date have dealt with clusters on cir-
cular or moderately eccentric orbits in galactic potentials that
are static and spherical. This means that any tidal perturba-
tions felt by stars within the clusters are relatively weak and/or
slow compared to their own orbital periods, leading to nearly
adiabatic or at least non-impulsive responses. In more realis-
tic situations, the galactic potential would be time-dependent
and non-spherical and there might be additional tidal pertur-
bations, including disk and bulge shocks. These perturbations
could in some cases accelerate the escape of weakly bound
stars from the clusters and thus counteract the retardation ef-
fect to some degree. Further study is therefore needed to de-
termine the regime of validity of the formula tdis ∝ t
cr and
its possible modification outside this regime.
In the meantime, Baumgardt (2001) and
Baumgardt & Makino (2003; hereafter BM03) have fit-
ted this formula to the lifetimes of a suite of N-body clusters
with initial masses M0 . 7 × 10
4 M⊙ and several different
initial concentrations and orbital eccentricities. BM03 at
first write tdis in terms of the relaxation and crossing times
of clusters at their half-mass radii, so that trlx ∝ (Mr
tcr ∝ (M/r
−1/2, and tdis ∝ M
x−1/2r
h (see their equation
[5]). However, they immediately take a factor of (rt/rh)
out from the normalization of this scaling—in effect to obtain
tdis ∝M
x−1/2r
t with a different constant of proportionality—
and then use a simple definition of the tidal radius (their
equation [1], r3t = GMr
c , which is appropriate for a
circular orbit of radius rp in a logarithmic potential with
circular speed Vc; see Innanen, Harris, & Webbink 1983)
to obtain the total lifetime of a cluster as a function of its
initial mass, perigalactic distance, and Vc (their equation
[7]). A single exponent x ≃ 0.75 and a single normalization
in this function then suffice to predict to within 10% the
lifetimes of the simulated clusters, regardless of their initial
concentrations. By implication, if trlx and tcr were fixed at rh
rather than rt , then tdis would have an additional concentration
dependence, related to the ratio (rt/rh)
3/2—very similar to
what we discussed in §3.2.1 for the case x = 1.
We now re-examine the Milky Way GCMF in terms of
this prescription for retarded evaporation (bearing in mind the
caveats mentioned above). To avoid any explicit dependences
on concentration, we also focus on the tidal radius and write
tdis ∝ M
x−1/2r
t for general x ≤ 1; but we do not substitute a
potential- and orbit-specific formula for rt in terms of rp and
galactic properties such as Vc. Instead, to keep the empha-
sis entirely on cluster densities, we re-write the scaling of the
lifetime in terms of the mean surface density inside the tidal
radius, Σt ≡ M/πr
t , and the corresponding volume density
ρt = 3M/4πr
t . This leads to tdis ∝ MΣ
−3(1−x)
−2(x−3/4)
t , which
then implies
µev ≡ −dM/dt ∝ M/tdis ∝ Σ
3(1−x)
2(x−3/4)
t . (9)
Clearly, the standard µev ∝ ρ
t , which we have already dis-
cussed, is recovered for x = 1; while for x = 0.75, we have the
equally straightforward µev ∝ Σ
BM03 find that, even with the retarded evaporation implied
by x ≃ 0.75, the masses of their simulated clusters still de-
crease approximately linearly with time after stellar-evolution
effects (which are only important for the first few 108 yr) are
separated out; see especially their Figure 6, equation (12), and
related discussion. Thus, if the GCMF initially rose towards
low masses and has been eroded by slow, relaxation-driven
cluster destruction, then in this modified description of evap-
oration we might expect the current mass function to depend
fundamentally on Σt rather than ρh or ρt . But because M(t)
still decreases nearly linearly with t, only now with µev ∝Σ
for each cluster, the shape of the evolved GCMF and its de-
pendence on Σt should resemble our earlier results for ρh and
SHAPING THE GCMF BY EVAPORATION 13
We have confirmed this expectation by repeating all of our
analyses in §2 again, now using µev ∝Σ
t to estimate cluster
mass-loss rates. As before, we calculate Σt from the data in
the Harris (1996) catalogue, although we caution once more
that the tidal radii, and thus the derived Σt , are more uncertain
than rh and ρh.
Figure 7, which should be compared to Figures 1 and 5
above, shows that the average Galactic GC mass increases
systematically with Σt; that the lower envelope of the M–Σt
distribution is described well by M ∝ Σ
t (the dashed line in
the left-hand panel of Figure 7), which is a locus of constant
lifetime against evaporation for µev ∝ Σ
t ; and that the scat-
ter in the distribution of cluster Σt versus Galactocentric ra-
dius (right-hand panel of the figure) is substantial, as required
to account for the almost non-existent correlation between M
and rgc.
The left-hand side of Figure 8 shows the mass functions
of globulars in three bins of Σt , as defined in each panel.
The right-hand side of the figure shows dN/d log M in the
same three intervals of rgc as in Figures 2 and 6 above. As
in those earlier plots, the dashed curve in all panels of Figure
8 is a model GCMF with the same parameters in every case,
representing the mass function of the entire Galactic GC sys-
tem. Once again, compared to the average MTO, the observed
turnover mass is significantly lower for clusters in the lowest
Σt bin and higher for clusters in the highest Σt bin, while the
width of dN/d log M decreases noticeably as Σt increases.
The solid curves in Figure 8 are again different in every
panel. They are the sums of evaporation-evolved Schechter
functions as in equation (3), with the usual β = 2 assumed
but with total mass losses estimated individually for each GC
in any Σt or rgc bin according to ∆ ∝ Σ
t rather than ∆ ∝
h or ∆ ∝ ρ
t . However, it turns out not to be necessary
to change the normalization of ∆ ∝ ρ
h in equation (4) to
achieve good fits to the observed GCMF as a function of either
Σt or rgc. Thus, in Figure 8 we have simply used
∆ = 1.45× 104 M⊙
Σt/M⊙ pc
−2)3/4 . (10)
The fits of these models, based on tdis ∝ t
cr with x ≃
0.75, are indistinguishable from the fits of our original mod-
els based on the standard tdis ∝ trlx, i.e., x = 1. (We have con-
firmed that adopting individual ∆ given by equation [10] also
reproduces the GCMFs of low-and high-concentration GCs
in Figure 3 as well as before.) It was somewhat unexpected
that equation (10) and equation (4) should have the same nu-
merical coefficient, but we note that this follows empirically
from the fact that the measured ρh and Σt of Galactic GCs
are consistent with the simple near-equality, ρh/M⊙ pc
(Σt/M⊙ pc
−2)1.5 in the mean. This is illustrated in Figure 9,
which also shows that there is significant scatter about the re-
lation.12 However, this scatter does not correlate with clus-
ter mass or Galactocentric radius. From a pragmatic point of
view, therefore, ρ
h and Σ
t are near enough to interchange-
able for our purposes, and there is no practical difference be-
12 Although it may be only a coincidence that the constant of proportion-
ality in ρh ∝ Σ
t is so near unity, the basic scaling itself holds because
combining the observed correlation between cluster mass and central con-
centration (Djorgovski & Meylan 1994; McLaughlin 2000) with the intrinsic
dependence of rt/rh on c in King models leads roughly to (rt/rh) ∝ M
tween GCMF models based on one or the other measure of
GC density.
One further check on this is to verify that the mass-loss rate
associated with equation (10) is roughly in keeping with that
implied by the N-body simulations pointing to x = 0.75 in the
first place. Thus, we compare the rate
µev = ∆/(13 Gyr) ≃ 1100 M⊙ Gyr
−1 (Σt/M⊙ pc
−2)3/4 (11)
to a formula implicit in BM03. Starting with their equa-
tion (7) for the lifetime tdis as a function of initial cluster
mass and perigalactic distance and circular speed in a loga-
rithmic halo potential; using their x = 0.75 and their normal-
ization of 1.91× 106 yr, multiplied as in their equation (9)
by (1 + e) to allow for eccentric orbits with apo- and peri-
galactic distances related by e ≡ (ra − rp)/(ra + rp); insert-
ing their equation (1) for rt ; taking the mean mass of clus-
ter stars to be m∗ = 0.55M⊙, as they do; using γ = 0.02 as
they do in the Coulomb logarithm, ln(γM0/m∗); and defining
Σt,0 ≡ M0/πr
t,0 (the subscript 0 denoting initial values), we
obtain
µev(BM03)≃
0.7M0
1 + e
M⊙ Gyr
0.036M0/M⊙
0.036× 105
]3/4 (
M⊙ pc−2
This is appropriate for clusters that just fill their Roche lobes
at perigalacticon, which is where Σt,0 is specified. The factor
of 0.7 in the first equality accounts for mass loss due to stellar
evolution in the BM03 simulations, which, as they discuss,
can be treated as having occurred almost immediately and in
full at the beginning of a cluster’s life.
Our GCMF-based µev is a factor of ≈ 2 faster than the N-
body value for clusters on circular orbits (with e = 0 and in
steady tidal fields) in the simulations; and our µev is still
within a factor of about three of the N-body rate for clusters on
eccentric orbits with e = 0.5 in BM03 (e ≃ 0.5–0.6 is typical
for tracers with an isotropic velocity distribution in a logarith-
mic potential; van den Bosch et al. 1999). This is very similar
to the comparison of lifetimes in §3.1 for our original models
based on µev ∝ ρ
h . Moreover, our new estimate of µev and
that in BM03 are still subject to their own, separate uncertain-
ties and reflect different idealizations and assumptions. For
example, our rate still depends on the exact power-law expo-
nent β at low masses in the initial GCMF, as discussed after
equation (7); while the rate from BM03 still neglects grav-
itational shocks from disk crossings and passages by a dis-
crete galactic bulge, and may additionally be biased low for
M0 > 10
5 M⊙ if x > 0.75 at such masses. All of this—not
to mention again the large uncertainties and possible system-
atics in the estimates of tidal radii needed to calculate Σt—
makes the near agreement between equations (11) and (12)
more striking than any apparent discrepancy.
In summary, although the relation µev ∝ ρ
h ≃ constant
in time is rigorously correct only in rather specific circum-
stances, our GCMF models based on it in §2 are good proxies,
in all respects, for models based on other plausible characteri-
zations of relaxation-driven cluster mass loss. This result will
likely be important for future studies of the mass functions of
extragalactic cluster systems, where it may well be necessary
14 McLAUGHLIN & FALL
FIG. 7.— Scatter plots of mass M versus mean surface density inside the tidal radius (Σt ≡ M/πr2t ) and of Σt versus Galactocentric radius rgc, for 146 Galactic
GCs from the Harris (1996) catalogue. These plots are analogous to the left- and rightmost panels of Figure 1, and the two panels of Figure 5. The dashed line in
the left-hand plot traces the relation M ∝ Σ
t , which defines a locus of constant evaporation time for µev ∝ Σ
FIG. 8.— Observed GCMF (points, with Poisson errorbars) and models (curves) as a function of mean surface density inside the tidal radius, Σt ≡ M/πr2t (left-
hand panels), and as a function of Galactocentric radius, rgc (right-hand panels). The dashed curve in every panel is an evolved Schechter function representing
the entire GC system: equation (3) with β = 2, Mc = 106 M⊙, and a single ∆, common to all clusters, evaluated from equation (10) using the median bΣt of all
146 Galactic GCs. Solid curves are subsample-specific models using equation (3) with β = 2 and Mc = 106 M⊙ but a different ∆ value for every cluster (obtained
from equation [10] using individual observational estimates of Σt ) in any Σt or rgc bin.
SHAPING THE GCMF BY EVAPORATION 15
FIG. 9.— Half-mass density, ρh = 3M/8πr
h , against mean surface den-
sity inside the tidal radius, Σt = M/πr2t , for 146 clusters with data in Harris
(1996). The straight line is ρh = Σ
to adopt procedures based on ρh rather than ρt or Σt because
of the difficulty or impossibility of estimating tidal radii.
3.3. MTO versus rgc, and Velocity Anisotropy in GC Systems
In this paper we have directly modeled dN/d log M as a
function only of GC density and age, and used the observed ρh
(or ρt , or Σt) of clusters in relatively narrow ranges of Galac-
tocentric position to show that such models are consistent with
the current near-constancy of the GCMF as a function of rgc.
Most other models in the literature for evaporation-dominated
GCMF evolution, in either the Milky Way or other galaxies,
instead predict the distribution explicitly as a function of rgc at
any time. They therefore need, in effect, to derive theoretical
density–position relations for clusters in galaxies alongside
their main GCMF calculations. This usually begins with the
adoption of analytical potentials to describe the parent galax-
ies of GCs. Taking these to be spherical and static for a Hub-
ble time allows the use of standard tidal-limitation formulae
to write GC densities ab initio in terms of the (fixed) peri-
centers rp of unique orbits in the adopted potentials. Cluster
relaxation times and mass-loss rates µev then follow as func-
tions of rp as well. Finally, specific initial mass, space, and
velocity (or orbital eccentricity) distributions are chosen for
entire GC systems, so that at all later times it is known what
the dynamically evolved dN/d log M is for globulars with any
single rp; how many clusters with a given rp survive; and what
the distributions of rp and all dependent cluster properties are
at any instantaneous position rgc.
In this approach, if the GCMF began with a power-law rise
towards low masses and its current peak is due entirely to clus-
ter disruption, then a dependence of MTO on rp is expected in
general, because the densities of tidally limited GCs decrease
with increasing rp. Thus, models along these lines that as-
sume the orbit distribution of a GC system to be the same
at all radii in a galaxy (i.e., that the time average of the ra-
tio rgc/rp is independent of position) have typically had diffi-
culty in accounting for the observed weak or non-correlation
between MTO and present rgc in large galaxies. This is partic-
ularly a problem if it is assumed that the initial GCMF was a
pure power law, with the same index at arbitrarily high masses
as low (e.g., Baumgardt 1998; Vesperini 2001). It is poten-
tially less of a concern if dN/d log M started as a Schechter
function with an exponential cut-off at masses M > Mc, as
we have assumed, since then the existence of a strict upper
bound MTO ≤ Mc (§2.2) means that the dependence of an
evaporation-evolved MTO on rp and rgc must saturate for small
enough galactocentric radii (high enough GC densities). Even
so, the “scale-free” models of FZ01, in which Mc ≃ 10
and all GCs in a Milky Way-like galaxy potential have the
same time-averaged rgc/rp, still predict a gradient in MTO ver-
sus rgc that is stronger than observed.
FZ01 showed that, if they left all of their other assumptions
unchanged, then a dependence of GCMF peak mass on rgc
could be effectively erased by an appropriately varying radial
velocity anisotropy in the initial GC system. Thus, in their
“Eddington” models the eccentricity of a typical cluster or-
bit increases with galactocentric distance (the time average of
rgc/rp increases with radius), such that globulars spread over
a larger range of current rgc can have more similar rp and asso-
ciated MTO. However, the initial velocity-anisotropy gradient
required to fit the Milky Way GCMF data specifically is only
marginally consistent with the observed kinematics of the GC
system (e.g., Dinescu, Girard, & van Altena 1999).13 Subse-
quently, Vesperini et al. (2003) constructed broadly similar
models for the GCMF of the Virgo elliptical M87 and con-
cluded that there, too, a variable radial velocity anisotropy is
required to match the observed MTO versus rgc; but the model
anisotropy profile in this case is clearly inconsistent with the
true velocity distribution of the GC system, which is observed
to be isotropic out to large rgc (Romanowsky & Kochanek
2001; Côté et al. 2001).
These results certainly suggest that some element is lacking
in rgc-oriented GCMF models developed as outlined above.
But they do not mean that the fault lies with the main hypoth-
esis, that the difference between the mass functions of young
clusters and old GCs is due to the effects of slow, relaxation-
driven disruption in the latter case. Any conclusions about
velocity anisotropy depend on the totality of steps taken to
connect the densities and positions of clusters; and it is possi-
ble that reasonable changes to one or more of these ancillary
assumptions could make the models compatible with the ob-
served kinematics of GCs in both the Milky Way and M87,
without abandoning a basic physical picture of evaporation-
dominated GCMF evolution that is otherwise quite success-
One issue is that previous models have always specified
evaporation rates a priori as functions of cluster density (or or-
bital pericenter), usually normalizing µev so that tdis/trh ≃ 20–
40 as in standard treatments of two-body relaxation. How-
ever, following our discussion in §3.1 and §3.2, it would seem
worthwhile to investigate these models with µev increased at
fixed ρh or rp to allow tdis/trh ≈ 10 (if β ≃ 2 for the low-mass
power-law part of the initial GCMF).
FZ01 and Vesperini et al. (2003) both consider velocity dis-
tributions parametrized by a galactocentric anisotropy radius,
RA, inside of which a cluster system is essentially isotropic
and beyond which it is increasingly dominated by radial or-
13 The fact that clusters on radial orbits are preferentially disrupted lessens
any inconsistency between the radial anisotropy required in the initial veloc-
ity distribution and observational constraints on the present velocity distribu-
tion.
16 McLAUGHLIN & FALL
bits. In these terms, the difficulty with the published models
is that, to reproduce the observed insensitivity of MTO to rgc
given standard normalizations of µev, they require values of
RA that are smaller than allowed by observations (especially
for M87). Increasing RA to more realistic values while keep-
ing the normalization of µev fixed leads to a stronger gra-
dient in MTO: the orbits of GCs at small rgc . RA remain
closely isotropic and the typical rp and MTO are essentially
unchanged, while at large galactocentric distances the clus-
ter orbits are on average less radial than before, with larger
rp, lower densities, and lower evolved MTO for a given rgc.
This effect is illustrated, for example, in Figure 9 of FZ01.
However, it can be compensated at least in part by increas-
ing µev by a common factor for all GCs, with the new, larger
RA fixed, if the initial mass function is assumed to have been
a Schechter function rather than a pure power law extend-
ing to arbitrarily high masses. A faster evaporation rate
will then lead to a (roughly) proportionate increase in the
evolved GCMF peak mass for GCs with relatively low den-
sities, i.e., those at large rgc and rp; but the increase in MTO
will be smaller, and eventually even negligible, for higher-
density clusters at progressively smaller rgc—again because
MTO grows less than linearly with µev ∝ ρ
h when there is
an upper limit MTO < Mc due to an exponential cut-off in the
initial dN/d log M0. Thus, the qualitative effect of increasing
the normalization of µev in models with radially varying GC
velocity anisotropy is to weaken the amount of radial-orbit
bias required to fit an observed MTO versus rgc.
Another point, emphasized by FZ01, has to do with the
standard starting assumption that GCs orbit in galaxies that
are perfectly static and spherical. In reality, galaxies grow
hierarchically. In this case, even if the values of µev are not
changed, much of the burden for the weakening or erasing of
any initial gradients in MTO versus rgc may be transferred from
velocity anisotropy to the time-dependent evolution of the
galaxies themselves. Violent relaxation, major mergers, and
smaller accretion events all work to move clusters between
different parts of galaxies and between different progenitors,
scrambling and combining any number of pericenter–density–
MTO relations. Any position dependences in the GC ρh dis-
tribution and in MTO itself for the final galaxy are therefore
bound to be weaker, more scattered, and more difficult to re-
late accurately to a cluster velocity distribution than in the
case of a monolithic, non-evolving potential. Allowing for
a non-spherical galaxy potential would have qualitatively the
same effect, because in this case every cluster explores a range
of pericenters and different maximum tidal fields on each of
its orbits.
In this situation, it may be important to ask how evapora-
tion rates can still be approximately constant in time—so that
cluster masses still decrease approximately linearly with t as
our models assume—if the tidal field around any given GC
changes significantly over time. Thus, consider first a sys-
tem of GCs in a single, static galaxy potential. The mass-
evolution curve for each cluster is approximately a straight
line, M(t) ≃ M0 −µevt, with µev depending on some measure
of internal density, which may be ρ
h , ρ
t , or Σ
t . The av-
erage mass-evolution curve for the entire system of clusters
is also approximately linear, 〈M(t)〉 ≃ 〈M0〉 − 〈µev〉t. If now
a merger or other event rearranges the clusters in the galaxy,
then after the event the mass-loss rates of some clusters will
be higher than before and the rates of other clusters will be
lower than before. However, if the mean density of the galaxy
as a whole is roughly the same after the event as before, then
so too will be the average of the GC densities, because of
tidal limitation. The average 〈µev〉 ∝ 〈ρ
h 〉 (say) will differ
even less between the pre- and post-merger systems. Thus, al-
though using instantaneous densities to estimate the past µev
of individual clusters may err on the high side for some clus-
ters and on the low side for others, these errors will average
away to a small or even zero net bias. The approximation
µev ≃ constant in time in our GCMF models will then still be
valid in the mean, and the average 〈M(t)〉 dependence of suf-
ficiently large numbers of clusters will remain roughly linear.
This type of scenario might be expected to pertain at least
to galaxies that evolve on the fundamental plane, since this
entails a connection between the total (baryonic plus dark)
masses and circular speeds of galaxies, of the form Mgal ∝V
or Mgal ∝ V
c . By the virial theorem, the average densities
scale as ρgal ∝ V
2, and thus ρgal ∝ M
gal or ρgal ∝ M
gal .
Insofar as 〈ρh〉 ∝ 〈ρt〉 ∝ ρgal for the GCs, the system-wide av-
erage 〈µev〉 ∝ 〈ρ
h 〉 should therefore not change drastically
even after a major merger between two fundamental-plane
galaxies; at most, the ratio of final to initial 〈µev〉 will be
roughly of order the −1/4 power of the ratio of final to ini-
tial Mgal. Note that this line of reasoning is closely related to
that applied by FZ01 to explain the small observed galaxy-
to-galaxy differences in the average turnover masses of entire
GC systems (although non-zero differences do exist, and can
be accomodated in these sorts of arguments; see Jordán et al.
2006, 2007).
A full exploration of questions such as these, about the wide
range of ingredients in current GC-plus-galaxy models, will
most likely require large N-body simulations set in a realistic,
cold dark matter cosmology. Until these can be carried out,
it is our view that the kinematics of globular cluster systems
cannot be used as decisive side constraints on theories for the
GCMF.
4. CONCLUSIONS
We have shown that the mass function dN/d log M of glob-
ular clusters in the Milky Way depends significantly on clus-
ter half-mass density, ρh, with the peak or turnover mass MTO
increasing and the width of the distribution decreasing as ρh
increases. This behavior is expected if the GCMF initially
rose towards masses below the present turnover scale—as the
mass functions of young cluster systems like that in the An-
tennae galaxies do—and has evolved to its current shape via
the slow depletion of low-mass clusters over Gyr timescales,
primarily through relaxation-driven evaporation. The fact that
MTO increases with cluster density favors evaporation over
external gravitational shocks as the primary mechanism of
low-mass cluster disruption, since the mass-loss rates asso-
ciated with shocks depend inversely on cluster density and
directly on cluster mass. Our results therefore add to previ-
ous arguments supporting an interpretation of the GCMF in
terms of evaporation-dominated evolution, based on the fact
that dN/d log M scales as M1−β with β ≃ 0 in the low-mass
limit (Fall & Zhang 2001).
The observed GCMF as a function of ρh is fitted well
by simple models in which the initial distribution was
a Schechter function, dN/d log M0 ∝ M
0 exp
−M0/Mc
with β = 2 and Mc ≃ 10
6 M⊙ assumed, and in which clusters
have been losing mass for a Hubble time at roughly steady
rates that can be estimated from their current half-mass den-
SHAPING THE GCMF BY EVAPORATION 17
sities as µev ∝ ρ
h . We have shown that, although this pre-
scription is approximate, it captures the main physical depen-
dence of relaxation-driven evaporation. In particular, it leads
to model GCMFs that are entirely consistent with those re-
sulting from alternative characterizations of evaporation rates
in terms of cluster tidal densities ρt or mean surface densities
Σt (§3.2). The normalization of µev at a given ρh (or ρt , or Σt)
required to fit the GCMF implies total cluster lifetimes that
are within range of the lifetimes typically obtained in theoret-
ical studies of two-body relaxation, although our values may
be slightly shorter than the theoretical ones if the low-mass,
power-law part of the initial cluster mass function was as steep
as we have assumed.
Taking clusters in various bins of central concentration c
and Galactocentric radius rgc and using their (individual) ob-
served densities as direct input to our models yields dynam-
ically evolved GCMFs as functions of c and rgc that agree
well with all data. This again indicates that the most fun-
damental physical dependence in the GCMF is that on clus-
ter density. Moreover, our models for dN/d log M versus
rgc obtained in this way are consistent in particular with the
well-known insensitivity of the GCMF peak mass to Galac-
tocentric position. This is seen to follow from a significant
variation of MTO with ρh (or ρt , or Σt)—due in our analysis
to evaporation-dominated cluster disruption—combined with
substantial scatter in the GC densities at any Galactocentric
position.
We have not invoked an anisotropic GC velocity distribu-
tion to explain the observed weak variation of MTO with rgc;
indeed, we have made no predictions or assumptions what-
soever about velocity anisotropy. We have emphasized that,
when velocity anisotropy enters other long-term dynamical-
evolution models for the GCMF, it is only in conjunction with
several additional, interrelated assumptions made as part of
larger efforts to derive theoretical density–rgc relations for
GCs—which we have not attempted to do here. The appar-
ent need in some current models for a strong bias towards
high-eccentricity cluster orbits to explain the near-constancy
of MTO versus rgc might well be avoided by changing one or
more ancillary assumptions in the models, without having to
discard the underlying idea that the peak and low-mass shape
of the GCMF are the result of relaxation-driven cluster dis-
ruption.
It clearly will be of interest to test and refine the main
ideas in this paper through modeling of the GCMFs in other
galaxies. For the time being at least, doing so will re-
quire the estimation of approximate mass-loss rates using
cluster half-mass densities rather than tidal quantities, sim-
ply because GC half-light radii can be measured accurately
in many systems beyond the Local Group, whereas tidal
radii are much more model-dependent and difficult to ob-
serve. Chandar, Fall, & McLaughlin (2007) have recently
shown that the peak mass of the GCMF in the Sombrero
galaxy (M104) increases with ρh in a way that is reasonably
well described by sums of evolved Schechter (1976) functions
as in the models presented in this paper. It should be rela-
tively straightforward to pursue similar studies in other nearby
galaxies.
We thank Michele Trenti, Douglas Heggie, Bill Harris, Ru-
pali Chandar, and Bruce Elmegreen for helpful discussions
and comments. SMF acknowledges support from the Am-
brose Monell Foundation and from NASA grant AR-09539.1-
A, awarded by the Space Telescope Science Institute, which is
operated by AURA, Inc., under NASA contract NAS5-26555.
REFERENCES
Aguilar, L., Hut, P., & Ostriker, J. P. 1988, ApJ, 335, 720
Barmby, P., Huchra, J. P., & Brodie, J. P. 2001, AJ, 121, 1482
Barmby, P., McLaughlin, D. E., Harris, W. E., Harris, G. L. H., & Forbes, D.
A. 2007, AJ, 133, 2764
Baumgardt, H. 1998, A&A, 330, 480
Baumgardt, H. 2001, MNRAS, 325, 1323
Baumgardt, H., & Makino, J. 2003, MNRAS, 340, 227 (BM03)
Binney, J., & Tremaine, S. 1987, Galactic Dynamics (Princeton: Princeton
University Press)
Burkert, A., & Smith, G. H. 2000, ApJ, 542, L95
Caputo, F., & Castellani, V. 1984, MNRAS, 207, 185
Chandar, R., Fall, S. M., & McLaughlin, D. E. 2007, ApJ, 668, L119
Chandrasekhar, S. 1942, Principles of Stellar Dynamics (Chicago:
University of Chicago Press)
Chernoff, D. F., & Weinberg, M. D. 1990, ApJ, 351, 121
Côté, P., et al. 2001, ApJ, 559, 828
Dinescu, D. I., Girard, T. M., & van Altena, W. F. 1999, AJ, 117, 1792
Djorgovski, S., & Meylan, G. 1994, AJ, 108, 1292
Elmegreen, B. G., & Efremov, Y. N. 1997, ApJ, 480, 235
Fall, S. M., & Rees, M. J. 1977, MNRAS, 181, 37P
Fall, S. M., & Zhang, Q. 2001, ApJ, 561, 751 (FZ01)
Fukushige, T., & Heggie, D. C. 2000, MNRAS, 318, 753
Giersz, M. 2001, MNRAS, 324, 218
Giersz, M., & Heggie, D. C. 1996, MNRAS, 279, 1037
Gnedin, O. Y. 1997, ApJ, 487, 663
Gnedin, O. Y., & Ostriker, J. P. 1997, ApJ, 474, 223
Gnedin, O. Y., Lee, H. M., & Ostriker, J. P. 1999, ApJ, 522, 935
Harris, W. E. 1996, AJ, 112, 1487
Harris, W.E. 2001, in Star Clusters (28th Saas-Fee Advanced Course) ed. L.
Labhardt & B. Binggeli (Berlin: Springer), 223
Harris, W. E., & Pudritz, R. E. 1994, ApJ, 429, 177
Harris, W. E., Harris, G. L. H., & McLaughlin, D. E. 1998, AJ, 115, 1801
Hénon, M. 1961, Ann. d’Astrophys., 24, 369
Innanen, K. A., Harris, W. E., & Webbink, R. F. 1983, AJ, 88, 338
Johnstone, D. 1993, AJ, 105, 155
Jordán, A., et al. 2005, ApJ, 634, 1002
Jordán, A., et al. 2006, ApJ, 651, L25
Jordán, A., et al. 2007, ApJS, 171, 101
Joshi, K. J., Nave, C. P., & Rasio, F. A. 2001, ApJ, 550, 691
Kavelaars, J. J., & Hanes, D. A. 1997, MNRAS, 285, L31
Lee, H. M., & Goodman, J. 1995, ApJ, 443, 109
King, I. 1958, AJ, 63, 109
King, I. 1959, AJ, 64, 351
King, I. R. 1966, AJ, 71, 64
Lee, H. M., & Ostriker, J. P. 1987, ApJ, 322, 123
McLaughlin, D. E. 2000, ApJ, 539, 618
McLaughlin, D. E., & van der Marel, R. P. 2005, ApJS, 161, 304
Murali, C., & Weinberg, M. D. 1997, MNRAS, 291, 717
Okazaki, T., & Tosa, M. 1995, MNRAS, 274, 48
Ostriker, J. P., & Gnedin, O. Y. 1997, ApJ, 487, 667
Parmentier, G., & Gilmore, G. 2007, MNRAS, 377, 352
Prieto, J. L., & Gnedin, O. Y. 2006, preprint (astro-ph/0608069)
Romanowsky, A. J., & Kochanek, C. S. 2001, ApJ, 553, 722
Schechter, P. 1976, ApJ, 203, 297
Smith, G. H., & Burkert, A. 2002, ApJ, 578, L51
Spitler, L. R., Larsen, S. S., Strader, J., Brodie, J. P., Forbes, D. A., &
Beasley, M. A. 2006, AJ, 132, 1593
Spitzer, L. 1987, Dynamical Evolution of Globular Clusters (Princeton:
Princeton Univ. Press)
Takahashi, K., & Portegies Zwart, S. F. 1998, ApJ, 503, L49
Takahashi, K., & Portegies Zwart, S. F. 2000, ApJ, 535, 759
Trenti, M., Heggie, D. C., & Hut, P. 2007, MNRAS, 374, 344
van den Bosch, F. C., Lewis, G. F., Lake, G., & Stadel, J. 1999, ApJ, 515, 50
Vesperini, E. 1997, MNRAS, 287, 915
Vesperini, E. 1998, MNRAS, 299, 1019
Vesperini, E. 2000, MNRAS, 318, 841
Vesperini, E. 2001, MNRAS, 322, 247
Vesperini, E., & Heggie, D. C. 1997, MNRAS, 289, 898
Vesperini, E., & Zepf, S. E. 2003, ApJ, 587, L97
Vesperini, E., Zepf, S. E., Kundu, A., & Ashman, K. M. 2003, ApJ, 593, 760
Waters, C. Z., Zepf, S. E., Lauer, T. R., Baltz, E. A., & Silk, J. 2006, ApJ,
650, 885
Zhang, Q., & Fall, S. M. 1999, ApJ, 527, L81
http://arxiv.org/abs/astro-ph/0608069
ABSTRACT
  We show that the globular cluster mass function (GCMF) in the Milky Way
depends on cluster half-mass density (rho_h) in the sense that the turnover
mass M_TO increases with rho_h while the width of the GCMF decreases. We argue
that this is the expected signature of the slow erosion of a mass function that
initially rose towards low masses, predominantly through cluster evaporation
driven by internal two-body relaxation. We find excellent agreement between the
observed GCMF -- including its dependence on internal density rho_h, central
concentration c, and Galactocentric distance r_gc -- and a simple model in
which the relaxation-driven mass-loss rates of clusters are approximated by
-dM/dt = mu_ev ~ rho_h^{1/2}. In particular, we recover the well-known
insensitivity of M_TO to r_gc. This feature does not derive from a literal
``universality'' of the GCMF turnover mass, but rather from a significant
variation of M_TO with rho_h -- the expected outcome of relaxation-driven
cluster disruption -- plus significant scatter in rho_h as a function of r_gc.
Our conclusions are the same if the evaporation rates are assumed to depend
instead on the mean volume or surface densities of clusters inside their tidal
radii, as mu_ev ~ rho_t^{1/2} or mu_ev ~ Sigma_t^{3/4} -- alternative
prescriptions that are physically motivated but involve cluster properties
(rho_t and Sigma_t) that are not as well defined or as readily observable as
rho_h. In all cases, the normalization of mu_ev required to fit the GCMF
implies cluster lifetimes that are within the range of standard values
(although falling towards the low end of this range). Our analysis does not
depend on any assumptions or information about velocity anisotropy in the
globular cluster system.

<|endoftext|><|startoftext|>
Introduction
The quantum deformations of relativistic symmetries are described by Hopf-algebraic
deformations of Lorentz and Poincaré algebras. Such quantum deformations are classified
by Lorentz and Poincaré Poisson structures. These Poisson structures given by classical
r-matrices were classified already some time ago by S. Zakrzewski in [1] for the Lorentz
algebra and in [2] for the Poincaré algebra. In the case of the Lorentz algebra a complete
list of classical r-matrices involves the four independent formulas and the corresponding
quantum deformations in different forms were already discussed in literature (see [3, 4, 5,
6, 7]). In the case of Poincaré algebra the total list of the classical r-matrices, which satisfy
the homogeneous classical Yang-Baxter equation, consists of 20 cases which have various
numbers of free parameters. Analysis of these twenty solutions shows that each of them
can be presented as a sum of subordinated r-matrices which almost all are of Abelian
and Jordanian types. A part of twists corresponding to the r-matrices of Zakrzewski
classification are given in explicit form.
2 Preliminaries
Let r be a classical r-matrix of a Lie algebra g, i.e. r ∈
∧ g and r satisfies to the classical
Yang–Baxter equation (CYBE)
[r12, r13 + r23] + [r13, r23] = Ω , (2.1)
∗Invited talk at the XXII Max Born Symposium ”Quantum, Super and Twistors”, September 27-29,
2006 Wroclaw (Poland), in honour of Jerzy Lukierski.
†Supported by the grants RFBR-05-01-01086 and FNRA NT05-241455GIPM.
http://arxiv.org/abs/0704.0081v1
where Ω is g-invariant element, Ω ∈ (
∧ g)g. We consider two types of the classical
r-matrices and corresponding twists.
Let the classical r-matrix r = rA has the form
xi ∧ yi , (2.2)
where all elements xi, yi (i = 1, . . . , n) commute among themselves. Such an r-matrix is
called of Abelian type. The corresponding twist is given as follows
= exp
= exp
xi ∧ yi
. (2.3)
This twisting two-tensor F := Fr
satisfies the cocycle equation
F 12(∆⊗ id)(F ) = F 23(id⊗∆)(F ) , (2.4)
and the ”unital” normalization condition
(ǫ⊗ id)(F ) = (id⊗ ǫ)(F ) = 1 . (2.5)
The twisting element F defines a deformation of the universal enveloping algebra U(g)
considered as a Hopf algebra. The new deformed coproduct and antipode are given as
follows
∆(F )(a) = F∆(a)F−1 , S(F )(a) = uS(a)u−1 (2.6)
for any a ∈ U(g), where ∆(a) is a co-product before twisting, and u =
i S(f
i ) if
i ⊗ f
Let the classical r-matrix r = rJ(ξ) has the form
rJ(ξ) = ξ
xν ∧ yν
, (2.7)
where the elements xν , yν (ν = 0, 1, . . . , n) satisfy the relations
[x0, y0] = y0 , [x0, xi] = (1− ti)xi , [x0, yi] = tiyi ,
[xi, yj] = δijy0 , [xi, xj ] = [yi, yj] = 0 , [y0, xj ] = [y0, yj] = 0 ,
(2.8)
(i, j = 1, . . . , n), (ti ∈ C). Such an r-matrix is called of Jordanian type. The corresponding
twist is given as follows [8, 9]
= exp
xi ⊗ yi e
−2tiσ
exp(2x0 ⊗ σ) , (2.9)
1Here entering the parameter deformation ξ is a matter of convenience.
2It is easy to verify that the two-tensor (2.7) indeed satisfies the homogenous classical Yang-Baxter
equation (2.1) (with Ω = 0), if the elements xν , yν (ν = 0, 1, . . . , n) are subject to the relations (2.8).
where σ := 1
ln(1 + ξy0).
Let r be an arbitrary r-matrix of g. We denote a support of r by Sup(r)4. The
following definition is useful.
Definition 2.1 Let r1 and r2 be two arbitrary classical r-matrices. We say that r2 is
subordinated to r1, r1 ≻ r2, if δr1(Sup(r2)) = 0, i.e.
(x) := [x⊗ 1 + 1⊗ x, r1] = 0 , ∀x ∈ Sup(r2) . (2.10)
If r1 ≻ r2 then r = r1 + r2 is also a classical r-matrix (see [15]). The subordination
enables us to construct a correct sequence of quantizations. For instance, if the r-matrix
of Jordanian type (2.7) is subordinated to the r-matrix of Abelian type (2.2), rA ≻ rJ ,
then the total twist corresponding to the resulting r-matrix r = rA+ rJ is given as follows
Fr = Fr
. (2.11)
The further definition is also useful.
Definition 2.2 A twisting two-tensor Fr(ξ) of a Hopf algebra, satisfying the conditions
(2.4) and (2.5), is called locally r-symmetric if the expansion of Fr(ξ) in powers of the
parameter deformation ξ has the form
Fr(ξ) = 1 + c r +O(ξ
2) . . . (2.12)
where r is a classical r-matrix, and c is a numerical coefficient, c 6= 0.
It is evident that the Abelian twist (2.3) is globally r-symmetric and the twist of Jordanian
type (2.9) does not satisfy the relation (2.12), i.e. it is not locally r-symmetric.
3 Quantum deformations of Lorentz algebra
The results of this section in different forms were already discussed in literature (see
[3, 4, 5, 6, 7]).
The classical canonical basis of the D = 4 Lorentz algebra, o(3, 1), can be described
by anti-Hermitian six generators (h, e±, h
′, e′±) satisfying the following non-vanishing
commutation relations5:
[h, e±] = ±e± , [e+, e−] = 2h , (3.1)
[h, e′±] = ±e
± , [h
′, e±] = ±e
± , [e±, e
∓] = ±2h
′ , (3.2)
[h′, e′±] = ∓e± , [e
−] = −2h , (3.3)
and moreover
x∗ = −x (∀ x ∈ o(3, 1)) . (3.4)
3The corresponding twists for Lie algebras sl(n), so(n) and sp(2n) were firstly constructed in the
papers [10, 11, 12, 13].
4The support Sup(r) is a subalgebra of g generated by the elements {xi, yi} if r =
xi ∧ yi.
5Since the real Lie algebra o(3, 1) is standard realification of the complex Lie sl(2,C) these relations
are easy obtained from the defining relations for sl(2,C), i.e. from (3.1).
A complete list of classical r-matrices which describe all Poisson structures and generate
quantum deformations for o(3, 1) involve the four independent formulas [1]:
r1 = α e+ ∧ h , (3.5)
r2 = α (e+ ∧ h− e
+ ∧ h
′) + 2β e′+ ∧ e+ , (3.6)
r3 = α (e
+ ∧ e− + e+ ∧ e
−) + β (e+ ∧ e− − e
+ ∧ e
−)− 2γ h ∧ h
′ , (3.7)
r4 = α
e′+ ∧ e− + e+ ∧ e
− − 2h ∧ h
± e+ ∧ e
+ . (3.8)
If the universal R-matrices of the quantum deformations corresponding to the classical
r-matrices (3.5)–(3.8) are unitary then these r-matrices are anti-Hermitian, i.e.
r∗j = −rj (j = 1, 2, 3, 4) . (3.9)
Therefore the ∗-operation (3.4) should be lifted to the tensor product o(3, 1) ⊗ o(3, 1).
There are two variants of this lifting: direct and flipped, namely,
(x⊗ y)∗ = x∗ ⊗ y∗ (∗ − direct) , (3.10)
(x⊗ y)∗ = y∗ ⊗ x∗ (∗ − flipped) . (3.11)
We see that if the ”direct” lifting of the ∗-operation (3.4) is used then all parameters in
(3.5)–(3.8) are pure imaginary. In the case of the ”flipped” lifting (3.11) all parameters
in (3.5)–(3.8) are real.
The first two r-matrices (3.5) and (3.6) satisfy the homogeneous CYBE and they are
of Jordanian type. If we assume (3.10), the corresponding quantum deformations were
described detailed in the paper [6] and they are entire defined by the twist of Jordanian
type:
= exp (h⊗ σ) , σ =
ln(1 + αe+) (3.12)
for the r-matrix (3.5), and
= exp
σ ∧ ϕ
exp (h⊗ σ − h′ ⊗ ϕ) , (3.13)
(1 + αe+)
2+ (αe′+)
, ϕ = arctan
1 + αe+
(3.14)
for the r-matrix (3.6). It should be recalled that the twists (3.12) and (3.13) are not
locally r-symmetric. A locally r-symmetric twist for the r-matrix (3.5) was obtained in
[14] and it has the following complicated formula:
= exp
∆(h)−
sinhαe+
⊗ e−αe++ eαe+⊗ h
sinhαe+
α∆(e+)
sinhα∆(e+)
, (3.15)
where ∆ is a primitive coproduct.
The last two r-matrices (3.7) and (3.8) satisfy the non-homogeneous (modified) CYBE
and they can be easily obtained from the solutions of the complex algebra o(4,C) ≃
sl(2,C)⊕ sl(2,C) which describes the complexification of o(3, 1). Indeed, let us introduce
the complex basis of Lorentz algebra (o(3, 1) ≃ sl(2;C) ⊕ sl(2,C)) described by two
commuting sets of complex generators:
(h + ıh′) , E1± =
(e± + ıe
±) , (3.16)
(h− ıh′) , E2± =
(e± − ıe
±) , (3.17)
which satisfy the relations (compare with (3.1))
[Hk, Ek±] = ±Ek± , [Ek+, Ek−] = 2Hk (k = 1, 2) . (3.18)
The ∗-operation describing the real structure acts on the generatorsHk, and Ek± (k = 1, 2)
as follows
H∗1 = −H2 , E
1± = −E2± , H
2 = −H1 , E
2± = −E1± . (3.19)
The classical r-matrix r3, (3.7), and r4, (3.8), in terms of the complex basis (3.16), (3.17)
take the form
r3 = r
1 + r
r′3 := 2(β + ıα)E1+ ∧ E1− + 2(β − ıα)E2+ ∧ E2− ,
r′′3 := 4ıγ H2 ∧H1 ,
(3.20)
r4 = r
4 + r
r′4 := 2ıα(E1+ ∧ E1− −E2+ ∧ E2− − 2H1 ∧H2) ,
r′′4 := 4ıλE1+ ∧ E2+
(3.21)
For the sake of convenience we introduce parameter6λ in r′′4 . It should be noted that
r′3, r
3 and r
4 are themselves classical r-matrices. We see that the r-matrix r
3 is simply
a sum of two standard r-matrices of sl(2;C), satisfying the anti-Hermitian condition
r∗ = −r. Analogously, it is not hard to see that the r-matrix r4 corresponds to a Belavin-
Drinfeld triple [15] for the Lie algebra sl(2;C) ⊕ sl(2,C)). Indeed, applying the Cartan
automorphism E2± → E2∓, H2 → −H2 we see that this is really correct (see also [16]).
We firstly describe quantum deformation corresponding to the classical r-matrix r3
(3.20). Since the r-matrix r′′3 is Abelian and it is subordinated to r
3 therefore the algebra
o(3, 1) is firstly quantized in the direction r′3 and then an Abelian twist corresponding
to the r-matrix r′′3 is applied. We introduce the complex notations z± := β ± ıα. It
should be noted that z− = z
+ if the parameters α and β are real, and z− = −z
the parameters α and β are pure imaginary. From structure of the classical r-matrix r′3
it follows that a quantum deformation Ur′
(o(3, 1)) is a combination of two q-analogs of
U(sl(2;C)) with the parameter qz
and qz
, where qz
:= exp z±. Thus Ur′3(o(3, 1))
(sl(2;C))⊗Uq
(sl(2;C)) and the standard generators q±H1z
, E1± and q
, E2± satisfy
6We can reduce this parameter λ to ± 1
by automorphism of o(4,C).
the following non-vanishing defining relations
qH1z+ E1± = q
E1± q
, [E1+, E1−] =
q2H1z+ − q
qz+ − q
, (3.22)
qH2z− E2± = q
E2± q
, [E2+, E2−] =
q2H2z− − q
qz− − q
. (3.23)
In this case the co-product ∆r′
and antipode Sr′
for the generators q±H1z
, E1± and q
E2± can be given by the formulas:
(q±H1z+ ) = q
⊗ q±H1z+ , ∆r′1
(E1±) = E1± ⊗ q
+ q−H1z+ ⊗ E1± , (3.24)
(q±H2z− ) = q
⊗ q±H2z− , ∆r′1
(E2±) = E2± ⊗ q
+ q−H2z− ⊗ E2± , (3.25)
(q±H1z+ ) = q
, Sr′
(E1±) = −q
E1± , (3.26)
(q±H2z
) = q∓H2z− , Sr′1
(E2±) = −q
E2± . (3.27)
The ∗-involution describing the real structure on the generators (3.8) can be adapted to
the quantum generators as follows
(q±H1z+ )
∗ = q∓H2
, E∗1± = −E2± , (q
)∗ = q∓H1
, E∗2± = −E1± , (3.28)
and there exit two ∗-liftings: direct and flipped, namely,
(a⊗ b)∗ = a∗ ⊗ b∗ (∗ − direct) , (3.29)
(a⊗ b)∗ = b∗ ⊗ a∗ (∗ − flipped) (3.30)
for any a ⊗ b ∈ Ur′
(o(3, 1)) ⊗ Ur′
(o(3, 1)), where the ∗-direct involution corresponds to
the case of the pure imaginary parameters α, β and the ∗-flipped involution corresponds
to the case of the real deformation parameters α, β. It should be stressed that the Hopf
structure on Ur′
(o(3, 1)) satisfy the consistency conditions under the ∗-involution
(a∗) = (∆r′
(a))∗, Sr′
((Sr′
(a∗))∗) = a (∀x ∈ Ur′
(o(3, 1)) . (3.31)
Now we consider deformation of the quantum algebra Ur′
(o(3, 1)) (secondary quan-
tization of U(o(3, 1))) corresponding to the additional r-matrix r′′3 , (3.20). Since the
generators H1 and H2 have the trivial coproduct
(Hk) = Hk ⊗ 1 + 1⊗Hk (k = 1, 2) , (3.32)
therefore the unitary two-tensor
:= qH1∧H2ıγ (F
= F−1
) (3.33)
satisfies the cocycle condition (2.4) and the ”unital” normalization condition (2.5). Thus
the complete deformation corresponding to the r-matrix r3 is the twisted deformation of
(o(3, 1)), i.e. the resulting coproduct ∆r
is given as follows
(x) = Fr′′
(x)F−1
(∀x ∈ Ur′
(o(3, 1)) . (3.34)
and in this case the resulting antipode Sr
does not change, Sr
. Applying the
twisting two-tensor (3.33) to the formulas (3.24) and (3.25) we obtain
(q±H1z+ ) = q
⊗ q±H1z+ , ∆r′1(q
) = q±H2z− ⊗ q
, (3.35)
(E1+) = E1+ ⊗ q
qH2ıγ + q
q−H2ıγ ⊗ E1+ , (3.36)
(E1−) = E1− ⊗ q
q−H2ıγ + q
qH2ıγ ⊗ E1− , (3.37)
∆r3(E2+) = E2+ ⊗ q
q−H1ıγ + q
qH1ıγ ⊗ E2+ , (3.38)
∆r3(E2−) = E2− ⊗ q
qH1ıγ + q
q−H1ıγ ⊗ E2− . (3.39)
Next, we describe quantum deformation corresponding to the classical r-matrix r4
(3.21). Since the r-matrix r′4(α) := r
4 is a particular case of r3(α, β, γ) := r3, namely
r′4(α) = r3(α, β = 0, γ = α), therefore a quantum deformation corresponding to the r-
matrix r′4 is obtained from the previous case by setting β = 0, γ = α, and we have the
following formulas for the coproducts ∆r′
) = q
(k = 1, 2) , (3.40)
(E1+) = E1+ ⊗ q
H1+H2
+ q−H1−H2
⊗ E1+ , (3.41)
(E1−) = E1− ⊗ q
H1−H2
+ q−H1+H2
⊗ E1− , (3.42)
(E2+) = E2+ ⊗ q
−H1−H2
ξ + q
H1+H2
ξ ⊗ E2+ , (3.43)
(E2−) = E2− ⊗ q
H1−H2
ξ + q
−H1+H2
ξ ⊗ E2− , (3.44)
where we set ξ := ıα.
Consider the two-tensor
:= exp
λE1+q
H1+H2
ξ ⊗ E2+q
H1+H2
. (3.45)
Using properties of q-exponentials (see [17]) is not hard to verify that Fr′′
satisfies the co-
cycle equation (2.4). Thus the quantization corresponding to the r-matrix r4 is the twisted
q-deformation Ur′
(o(3, 1)). Explicit formulas of the co-products ∆r
(·) = F
(·)F−1
and antipodes Sr4(·) in the complex and real Cartan-Weyl bases of Ur′4(o(3, 1)) will be
presented in the outgoing paper [7].
4 Quantum deformations of Poincare algebra
The Poincaré algebra P(3, 1) of the 4-dimensional space-time is generated by 10 elements:
the six-dimensinal Lorentz algebra o(3, 1) with the generators Mi, Ni (i = 1, 2, 3):
[Mi, Mj ] = ıǫijk Mk, [Mi, Nj ] = ıǫijk Nk, [Ni, Nj ] = −ıǫijk Mk, (4.1)
and the four-momenta Pj, P0 (j = 1, 2, 3) with the standard commutation relations:
[Mj , Pk] = ıǫjkl Pl , [Mj , P0] = 0 , (4.2)
[Nj, Pk] = −ıδjk P0 , [Nj , P0] = −ıPj . (4.3)
The physical generators of the Lorentz algebra, Mi, Ni (i = 1, 2, 3), are related with the
canonical basis h, h′, e±, e
± as follows
h = ıN3 , e± = ı(N1 ± M2), (4.4)
h′ = −ıM3 , e
± = ı(±N2 −M1). (4.5)
The subalgebra generated by the four-momenta Pj, P0 (j = 1, 2, 3) will be denoted by P
and we also set P± := P0 ± P3.
S. Zakrzewski has shown in [2] that each classical r-matrix, r ∈ P(3, 1) ∧ P(3, 1), has
a decomposition
r = a+ b+ c , (4.6)
where a ∈ P ∧P, b ∈ o(3, 1) ∧P, c ∈ o(3, 1) ∧ o(3, 1) satisfy the following relations
[[c, c]] = 0 , (4.7)
[[b, c]] = 0 , (4.8)
2[[a, c]] + [[b, b]] = tΩ (t ∈ R) , (4.9)
[[a, b]] = 0 . (4.10)
Here [[·, ·]] means the Schouten bracket. Moreover a total list of the classical r-matrices
for the case c 6= 0 and also for the case c = 0, t = 0 was found.7 It was shown that
there are fifteen solutions for the case c = 0, t = 0, and six solutions for the case c 6= 0
where there is only one solution for t 6= 0. Thus Zakrzewski found twenty r-matrices which
satisfy the homogeneous classical Yang-Baxter equation (t = 0 in (4.9)). Analysis of these
twenty solutions shows that each of them can be presented as a sum of subordinated r-
matrices which almost all are of Abelian and Jordanian types. Therefore these r-matrices
correspond to twisted deformations of the Poincaré algebra P(3, 1). We present here
r-matrices only for the case c 6= 0, t = 0:
r1 = γh
′ ∧ h+ α(P+ ∧ P− − P1 ∧ P2) , (4.11)
r2 = γe
+ ∧ e+ + β1(e+ ∧ P1 − e
+ ∧ P2 + h ∧ P+) + β2h
′ ∧ P+ , (4.12)
r3 = γe
+ ∧ e+ + β(e+ ∧ P1 − e
+ ∧ P2 + h ∧ P+) + αP1 ∧ P+ , (4.13)
r4 = γ(e
+ ∧ e+ + e+ ∧ P1+ e
+ ∧ P2− P1 ∧ P2) + P+ ∧ (α1P1+ α2P2) , (4.14)
r5 = γ1(h ∧ e+ − h
′ ∧ e′+) + γ2e+ ∧ e
+ . (4.15)
The first r-matrix r1 is a sum of two subordinated Abelian r-matrices
r1 := r
1 + r
1 , r
1 ≻ r
r′1 = α(P+ ∧ P− − P1 ∧ P2) , r
1 := γh
′ ∧ h .
(4.16)
Therefore the total twist defining quantization in the direction to this r-matrix is the
ordered product of two the Abelian twits
= Fr′′
= exp
γh′ ∧ h
α(P+ ∧ P− − P1 ∧ P2)
. (4.17)
7Classification of the r-matrices for the case c = 0, t 6= 0 is an open problem up to now.
The second r-matrix r2 is a sum of three subordinated r-matrices where two of them
are of Abelian type and one is of Jordanian type
r2 := r
3 + r
2 + r
2 , r
2 ≻ r
2 ≻ r
r′2 := β1(e+ ∧ P1 − e
+ ∧ P2 + h ∧ P+) ,
r′′2 := γe
+ ∧ e+ , r
2 := β2h
′ ∧ P+ .
(4.18)
Corresponding twist is given by the following formulas
= Fr′′′
, (4.19)
where
= exp
β1(e+ ⊗ P1 − e
+ ⊗ P2)
exp(2h⊗ σ+) ,
= exp(γe′+ ∧ e+) , Fr′′′2 = exp(β2h
′ ∧ σ+) .
(4.20)
Here and below we set σ+ :=
ln(1 + β1P+).
The third r-matrix r3 is a sum of two subordinated r-matrices where one is of Abelian
type and another is a more complicated r-matrix which we call mixed Jordanian-Abelian
r3 := r
3 + r
3 , r
3 ≻ r
r′3 := β1(e+ ∧ P1 − e
+ ∧ P2 + h ∧ P+) + αP1 ∧ P+ ,
r′′3 := γe
+ ∧ e+ .
(4.21)
Corresponding twist is given by the following formulas
= Fr′′
, (4.22)
where
= exp
β1(e+ ⊗ P1 − e
+ ⊗ P2)
exp(αP1 ∧ σ+) exp(2h⊗ σ+) ,
= exp(γe′+ ∧ e+) .
(4.23)
The fourth r-matrix r4 is a sum of two subordinated r-matrices of Abelian type
r4 := r
4 + r
4 , r
4 ≻ r
r′4 := P+ ∧ (α1P1 + α+P2) ,
r′′4 := γ(e
+ − P1) ∧ (e+ + P2) .
(4.24)
Corresponding twist is given by the following formulas
= Fr′′
, (4.25)
where
= exp
(P+ ⊗ (α1P1 + α2P2)
= exp
γ(e′+ − P1) ∧ (e+ + P2)
(4.26)
The fifth r-matrix r5 is the r-matrix of the Lorentz algebra, (3.6), and the correspond-
ing twist is given by the formula (3.13).
References
[1] S. Zakrzewski, Lett. Math. Phys., 32, 11 (1994).
[2] S. Zakrzewski, Commun. Math. Phys., 187, 285 (1997);
http://arxiv.org/abs/q-al/9602001.
[3] M. Chaichian and A. Demichev, Phys. Lett., B34, 220 (1994)
[4] A. Mudrov, Yadernaya Fizika, 60, No.5, 946 (1997).
[5] A. Borowiec, J. Lukierski, V.N. Tolstoy, Czech. J. Phys., 55, 11 (2005);
http://xxx.lanl.gov/abs/hep-th/0301033.
[6] A. Borowiec, J. Lukierski, V.N. Tolstoy, Eur. Phys. J., C48, 336 (2006);
arXiv:hep-th/0604146.
[7] A. Borowiec, J. Lukierski, V.N. Tolstoy, in preparation.
[8] V.N. Tolstoy, Proc. of International Workshop ”Supersymmetries and Quantum Sym-
metries (SQS’03)”, Russia, Dubna, July, 2003, eds: E. Ivanov and A. Pashnev, publ.
JINR, Dubna, p. 242 (2004); http://xxx.lanl.gov/abs/math.QA/0402433.
[9] V.N. Tolstoy, Nankai Tracts in Mathematics ”Differential Geometry and Physics”.
Proceedings of the 23-th International Conference of Differential Geometric
Methods in Theoretical Physics (Tianjin, China, 20-26 August, 2005). Edi-
tors: Mo-Lin Ge and Weiping Zhang. Wold Scientific, 2006, Vol. 10, 443-452;
http://xxx.lanl.gov/abs/math.QA/0701079.
[10] P.P. Kulish, V.D. Lyakhovsky and A.I. Mudrov, Journ. Math. Phys., 40, 4569 (1999).
[11] P.P. Kulish, V.D. Lyakhovsky and M.A. del Olmo, Journ. Phys. A: Math. Gen., 32,
8671 (1999).
[12] V.D. Lyakhovsky, S. Stolin and P.P. Kulish, J. Math. Phys. Gen., 42, 5006 (2000).
[13] D.N. Ananikyan, P.P. Kulish and V.D. Lyakhovsky, St.Petersburg Math. J., 14, 385
(2003).
[14] Ch. Ohn, Lett. Math. Phys., 25, 85 (1992).
[15] A. Belavin and V. Drinfeld, Functional Anal. Appl., 16(3), 159 (1983); translated
from Funktsional. Anal. i Prilozhen, 16, 1 (1982) (Russian).
[16] A.P. Isaev and O.V. Ogievetsky, Phys. Atomic Nuclei, 64(12), 2126 (2001);
math.QA/0010190.
[17] S.M. Khoroshkin and V. Tolstoy, Comm. Math. Phys., 141(3), 599 (1991).
http://arxiv.org/abs/q-al/9602001
http://xxx.lanl.gov/abs/hep-th/0301033
arXiv:hep-th/0604146
http://xxx.lanl.gov/abs/math.QA/0402433
http://xxx.lanl.gov/abs/math.QA/0701079
	Introduction
	Preliminaries
	Quantum deformations of Lorentz algebra
	Quantum deformations of Poincare algebra
ABSTRACT
  We discussed quantum deformations of D=4 Lorentz and Poincare algebras. In
the case of Poincare algebra it is shown that almost all classical r-matrices
of S. Zakrzewski classification correspond to twisted deformations of Abelian
and Jordanian types. A part of twists corresponding to the r-matrices of
Zakrzewski classification are given in explicit form.

<|endoftext|><|startoftext|>
Introduction
In 2002, matter-wave bright solitons in quasi-one-dimensional (1D) Bose-Einstein conden-
sates (BECs) were observed experimentally.1, 2) Bright solitons propagate in most cases with
much larger amplitudes than dark solitons,3, 4) and are expected to have the potential for
various applications such as coherent transport and atom interferometry. Soliton propagation
in BEC can be described by the Gross-Pitaevskii (GP) equation. The GP equation, called the
nonlinear Schrödinger (NLS) equation in nonlinear science, is integrable and has soliton solu-
tions in a one-dimensional and uniform system. Recent experimental and theoretical advances
about matter-wave bright solitons are reviewed, for instance, in ref. 5.
The experimental creation of matter-wave solitons has been so far achieved only for single-
component BEC. It is, nevertheless, very interesting to consider soliton propagation in BEC
with internal degrees of freedom, so-called, spinor BEC. When BEC of ultracold alkali atoms
is trapped exclusively by optical means, the hyperfine spin of atoms remains liberated. The
spinor BEC was realized in such a way.6–8) Internal degrees of freedom endow solitons with
a multiplicity. The multiple solitons will show a rich variety of dynamics. Here, we focus on
the boson system in the F = 1 hyperfine spin state, exemplified by 23Na, 39K and 87Rb. The
multi-component GP equation for F = 1 spinor BEC turns to an integrable model at special
points, which is mathematically equivalent to the matrix NLS equation. An integrable model
with a self-focusing nonlineality enables one to perform exact analysis via the inverse scatter-
ing method (ISM) for the matrix NLS equation.9) In particular, bright soliton solutions under
vanishing boundary conditions (VBC) are obtained, whose properties are investigated in refs.
10 and 11. Recently, the ISM for the matrix NLS equation under nonvanishing boundary con-
ditions (NVBC) is formulated.12) Dark solitons in the F = 1 spinor BEC can be investigated
by applying the ISM under NVBC to an integrable model with a self-defocusing nonlineal-
ity.13) Although the ISM under NVBC is dedicated mainly to the self-defocusing case, we note
that this technique is also applicable to an integrable model with a self-focusing nonlineality,
which makes us available to bright soliton solutions with a finite background.
In this paper, the detail of matter-wave bright solitons in the quasi-1D F = 1 spinor BEC
is further investigated, based on an integrable model. We consider matter-wave spinor bright
solitons traveling on a finite background of the condensate. We write down explicitly new
soliton solutions, and verify that the obtained soliton solutions have the similar properties
compared to those without a background. In the usual experimental setups, the condensates
are confined in a finite-size regime, and the matter-wave bright solitons will accompany a
finite background. The study given in this paper is meaningful in such realistic circumstances.
The paper is organized as follows. In § 2, the GP equation for quasi-1D F = 1 spinor BEC
is introduced. In particular, the integrable model is presented. There, the interactions between
two atoms are supposed to be inter-atomic attractive and ferromagnetic, which lead to bright
J. Phys. Soc. Jpn. Full Paper
solitons. In § 3, the inverse scattering method under nonvanishing boundary conditions is
applied to the integrable model. This application leads to bright soliton solutions with a finite
background. Several conserved quantities of the model are also provided. One-soliton solutions
are investigated in § 4. The spin states of one-solitons are classified, assuming that discrete
eigenvalues are purely imaginary. Two-soliton solutions are discussed in § 5. The last section,
§ 6, is devoted to the concluding remarks.
2. GP Equation for F = 1 Spinor Bose-Einstein Condensates
For BEC of ultracold alkali atoms, the mean-field theory works well, because almost all
atoms go into condensation and the condensate is dilute. In this paper, we deal with the
quasi-one-dimensional system. Atoms in the F = 1 hyperfine spin state have three magnetic
substates labeled by the magnetic quantum number mF = 1, 0,−1. The system is charac-
terized by a vectorial field operator with the components corresponding to each substate,
Φ̂ = (Φ̂1, Φ̂0, Φ̂−1)
T , satisfying equal-time commutation relations:
[Ψ̂α(x, t), Ψ̂
(x′, t)] = δαβδ(x− x′), (1)
where the subscripts α, β take on 1, 0,−1. In the framework of the mean-field theory for BEC,
the quantum field is replaced with the order parameter:
Φ(x, t) ≡ 〈Φ̂(x, t)〉 = (Φ1(x, t),Φ0(x, t),Φ−1(x, t))T . (2)
Φ(x, t) is often called the spinor condensate wavefunction, which is normalized to the total
number of atoms NT:
dxΦ†(x, t)Φ(x, t) = NT. (3)
The spinor condensate wavefunction obeys a set of coupled evolution equations, namely, the
multi-component GP equation:
i~∂tΦ1 = −
∂2xΦ1 + (c̄0 + c̄2)
|Φ1|2 + |Φ0|2
+(c̄0 − c̄2)|Φ−1|2Φ1 + c2Φ∗−1Φ20,
i~∂tΦ0 = −
∂2xΦ0 + (c̄0 + c̄2)
|Φ1|2 + |Φ−1|2
+c̄0|Φ0|2Φ0 + 2c̄2Φ∗0Φ1Φ−1,
i~∂tΦ−1 = −
∂2xΦ−1 + (c̄0 + c̄2)
|Φ−1|2 + |Φ0|2
+(c̄0 − c̄2)|Φ1|2Φ−1 + c̄2Φ∗1Φ20, (4)
where c̄0 = (ḡ0 + 2ḡ2)/3 and c̄2 = (ḡ2 − ḡ0)/3 denote effective 1D coupling constants for the
mean-field and the spin-exchange interaction, respectively. Here, the effective 1D coupling
constants ḡf are given by
ḡf =
4~2af
1− Caf/a⊥
, (5)
J. Phys. Soc. Jpn. Full Paper
where af are the s-wave scattering lengths in the total hyperfine spin f channel, a⊥ is the
size of the transverse ground state, m is the atomic mass, and C = −ζ(1/2) ≃ 1.46. Note that
one may change the values of c̄0 and c̄2 by tuning a⊥.
Equation (4) is derived as follows. The interaction between two atoms in the F = 1
hyperfine spin state has a form,15, 16)
V̂ (x1 − x2) = δ(x1 − x2)
c̄0 + c̄2F̂ 1 · F̂ 2
, (6)
where F̂ i is the spin operator. The Gross-Pitaevskii energy functional is thus given by
EGP[Φ] =
α∂xΦα +
α′Φα′Φα +
αβ · fα′β′Φβ′Φβ
, (7)
where repeated subscripts (α, β, α′, β′ = 1, 0,−1) should be summed up and f = (fx, fy, fz)T
with fi(i = x, y, z) being 3× 3 spin-1 matrices. Then, the variational principle: i~∂tΦα(x, t) =
δEGP[Φ]/δΦ
α(x, t), for α = 1, 0,−1, yields eq. (4).
An important fact is that eq. (4) possesses a completely integrable point when c̄0 = c̄2 ≡
−c < 0, equivalently, 2ḡ0 = −ḡ2 > 0.10, 11) This condition is realized when
a⊥ = 3C
2a0 + a2
, (8)
assuming that a0a2(a2−a0) > 0 holds. The situation corresponds to attractive (c̄0 < 0) and fer-
romagnetic (c̄2 < 0) interaction. When we change the wavefunction by Φ = (φ1,
2φ0, φ−1)
and measure time and length in units of t̄ = ~a⊥/c and x̄ = ~
a⊥/2mc, respectively, we can
rewrite eq. (4) with c̄0 = c̄2 ≡ −c < 0 into the dimensionless form,
i∂tφ1 = −∂2xφ1 − 2(|φ1|2 + 2|φ0|2)φ1 − 2φ∗−1φ20,
i∂tφ0 = −∂2xφ0 − 2(|φ−1|2 + |φ0|2 + |φ1|2)φ0 − 2φ∗0φ1φ−1,
i∂tφ−1 = −∂2xφ−1 − 2(|φ−1|2 + 2|φ0|2)φ−1 − 2φ∗1φ20. (9)
Then, eq. (9) is found to be equivalent to a 2× 2 matrix version of the NLS equation with a
self-focusing nonlinearity:
i∂tQ+ ∂
xQ+ 2QQ
†Q = O, (10)
with an identification,
φ1 φ0
φ0 φ−1
. (11)
The matrix NLS equation (10) is integrable in the sense that the initial value problem can
be solved via the inverse scattering method.9, 12) The integrability of the reduced equations
(9) is thus proved automatically. Thus, we have derived the integrable spinor model. Another
integrable point of eq. (4) is c̄0 = c̄2 ≡ c > 0, i.e., the matrix NLS equation with a self-
defocusing nonlineality.13) Special solutions for generic coupling constants c̄0, c̄2 are given in
ref. 17.
J. Phys. Soc. Jpn. Full Paper
3. Bright Solitons with a Finite Background
We consider bright soliton solutions of the integrable model (9) under NVBC, whereas
those under VBC are studied in refs. 10 and 11. We summarize briefly the results of the inverse
scattering method for eq. (9) with NVBC.12)
We define the nonvanishing boundary conditions as
Q(x, t) → Q±, x→ ±∞,
±Q± = Q±Q
± = λ
0I, (12)
where λ0 is a positive real constant and I denotes a 2 × 2 unit matrix. Note that vanishing
boundary conditions are recovered as λ0 → 0. The analysis of the ISM under NVBC yields
the standard form of the multiple soliton solutions of the 2 × 2 matrix NLS equation with a
self-focusing nonlineality (10) as
Q(x, t) = λ0 e
I + 2i(I · · · I
︸ ︷︷ ︸
. (13)
Here, a 2 × 2 complex matrix Πj is called the polarization matrix. S is a 2N × 2N matrix
defined by
Sij =
ζj + λj
δijI +
i(ζi + ζj)
ζi + λi
ζj + λj
1 ≤ i, j ≤ N, (14)
where λj is a complex discrete eigenvalue for the bound state and ζj = (λ
j + λ
1/2 with
Im ζj > 0 for j = 1, . . . , N . It is required for the ISM under NVBC that a two-sheet Riemann
surface is introduced appropriately, due to a double-valued function ζj. The phase of the
carrier wave, φ(x, t), is given by
φ(x, t) = kx− (k2 − 2λ20)t+ δ, (15)
and the coordinate function is given by
χj ≡ χj(x, t) = 2iζj(x− 2(λj + k)t). (16)
The above solution is the M(= N/2)-soliton solution. The ISM under NVBC for the self-
focusing case results in pairs of discrete eigenvalues corresponding to each Riemann sheet.
The constraint should be imposed on λj and ζj (j = 1, · · · , N) such that λ2l−1 = λ∗2l and
ζ2l−1 = −ζ∗2l for l = 1, · · · , N/2. At the same time, Πj must satisfy that Π2l−1 = Π
For our reduction to the integrable model for F = 1 spinor BEC, we must make the
potential Q symmetric, noting eq. (11). The symmetry of Q is naturally reflected in Πj.
When we take every Πj to be symmetric in eq. (13), soliton solutions of the integrable model
(9) under NVBC are obtained.
J. Phys. Soc. Jpn. Full Paper
The form (13) is the standard form of soliton solutions in the sense that the boundary
value at x→ ∞ ⇔ eχj → 0 is supposed to be fixed as
Q(x, t) e−iφ(x,t) → λ0I, x→ ∞. (17)
The spinor model, however, allows the SU(2) transformation of the solutions, if they are kept
symmetric. To be concrete, let U be a 2× 2 unitary matrix. When Q is a solution of eq. (10)
with eq. (11), then
Q′ = UQUT , (18)
is also a solution. Assuming that Q is the standard form (13), the limit x→ ∞ of Q′ becomes
Q′ e−iφ(x,t) → λ0 UUT ≡ λ0Q′+, x→ ∞. (19)
Q′+ = UUT is the so-called Cholesky decomposition. The arbitrary boundary conditions Q′+
other than eq. (17) are thus realized via the SU(2) transformation. On the other hand, the
behavior in the limit x → −∞ varies depending on whether detΠ = 0 or not, which will be
discussed later for the one-soliton case.
There is another important concept about an integrable model. Due to the integrability,
the model has the infinite conservation laws, which restrict the dynamics of the system in an
essential way. Several conserved quantities, related to the physical quantities of the system,
are listed below:
Total number : N̄T =
dx n̄(x, t),
n̄(x, t) = tr (Q†Q)− tr (Q†±Q±). (20)
Total spin : FT =
dx f(x, t),
f(x, t) = tr (Q†σQ). (21)
Total momentum : P̄T =
dx p̄(x, t),
p̄(x, t) = −i~[tr (Q†Qx)− tr (Q†±Q±,x)]. (22)
Total energy : ĒT =
dx ē(x, t),
ē(x, t) = c[tr (Q†xQx −Q†QQ†Q)
−tr (Q†±,xQ±,x −Q
±Q±)]. (23)
Here, σ = (σx, σy, σz)
T are the Pauli matrices. To avoid the divergence of integrals, we should
subtract the contribution of the background from the physical quantities, except the total spin,
to which the background does not contribute explicitly. These subtractions are emphasized
by the bars on the conserved densities and quantities. The local spin density f = (fx, fy, fz)
is covariant under the SU(2) transformation (18), whereas the other densities such as n̄(x, t),
J. Phys. Soc. Jpn. Full Paper
p̄(x, t) and ē(x, t) are invariant. The total macroscopic spin is directed to face to an arbitrary
direction by the global spin rotation. The SU(2) symmetry of the system causes the energy
degenerated states of solitons for this spin rotation.
4. One-Soliton Solutions
In this section, one-soliton solutions of the integrable spinor model (9) are investigated in
detail. We can derive the explicit form of one-soliton solutions by setting N = 2 (M = 1) in
the formula (13). The calculation is complicated but straightforward. The result is as follows:
Q = λ0 e
iφ(x,t)
I + 2i
, (24)
where detS is given by
detS = κ21κ
2 − eχ1ν1κ1κ22 trΠ− ex2ν2κ21κ2 trΠ†
+eχ1+χ2κ1κ2
̟ + ν1ν2(|tr Π|2 − 1)
+e2χ1ν21κ
2 detΠ + e
2χ2ν22κ
1 detΠ
− e2χ1+χ2ν1κ2̟ trΠ† detΠ− eχ1+2χ2ν2κ1̟ trΠdetΠ†
+e2χ1+2χ2̟2|detΠ|2, (25)
and T is a 2× 2 matrix such that
T = eχ1κ1κ
2 Π+ e
χ2κ21κ2 Π
− eχ1+χ2κ1κ2
ς1 trΠ · Π† + ς2 trΠ† ·Π+ µ(|trΠ|2 − 1)I
− e2χ1µ1κ22 detΠ · I − e2χ2µ2κ21 detΠ† · I
+e2χ1+χ2κ2 detΠ
ς21 Π
† +̟ trΠ† · I
+eχ1+2χ2κ1 detΠ
ς22 Π+̟ trΠ · I
− e2χ1+2χ2̟ (ς1 + ς2) |detΠ|2 · I. (26)
We explain physical meanings of notations. The phase of the carrier wave is given by
φ(x, t) = kx− (k2 − 2λ20)t+ δ. (27)
Let λj and ζj = (λ
j + λ
1/2 for j = 1, 2 be complex constants satisfying λ1 = λ
2 and
ζ1 = −ζ∗2 . Without loss of generality, we assume that (λ1, ζ1) ((λ2, ζ2)) belongs to the upper
(lower) Riemann sheet, which is characterized such that Im ζ Im λ > 0 (Im ζ Im λ < 0).
χj(x, t) is expressed in terms of them as
χ1 ≡ χ1(x, t) = 2iζ1(x− 2(λ1 + k)t), (28)
χ2 ≡ χ2(x, t) = 2iζ2(x− 2(λ2 + k)t). (29)
Note that χ1 = χ
2 ≡ χ holds. Reχ thus denotes the coordinate of the envelope soliton,
whereas Imχ implies the self-modulation phase. The polarization matrix Π is a 2×2 symmetric
J. Phys. Soc. Jpn. Full Paper
matrix. Here, the normalization in a sense of the square norm is imposed on Π as a matter of
convenience:
, 2|α|2 + |β|2 + |γ|2 = 1. (30)
The other parameters are expedient functions of λ0, λ1 and λ2:
ζj + λj
, µ = iλ0
κ1 + κ2
ζ1 + ζ2
, νj =
iλ0κj
, ̟ = ν1ν2 − µ2, ςj = νj − µ, (31)
for j = 1, 2. We list the meaning of each parameter as follows:
k : wave number of soliton’s carrier wave.
λ0 : amplitude of soliton’s carrier wave.
φ(x, t) : phase of soliton’s carrier wave.
Re χ(x, t) : coordinate of soliton’s envelope.
Im χ(x, t) : self-modulation phase of soliton.
Π : symmetric polarization matrix of soliton.
Equations (24)-(26) are new soliton solutions which had never been written down explicitly
in the literatures. If we take the vanishing limit λ0 → 0, ζ1 and ζ2 converge at λ1 and −λ2,
respectively. Then,
· 2kR
Πe−(χR+ρ/2) + (σyΠ†σy) eχR+ρ/2detΠ
e−(2χR+ρ) + 1 + e2χR+ρ|detΠ|2
eiχI , (32)
with notations
eρ/2 ≡ 1
, (33)
χR ≡ χR(x, t) = kR(x− 2kI t)− ǫ, (34)
χI ≡ χI(x, t) = kIx+ (k2R − k2I )t, (35)
each of which holds the following correspondence respectively:
kR = −2Imλ1, (36)
kI = 2Reλ1 + k, (37)
ǫ = − ln(4|λ1|). (38)
Equations (32)-(35) are the same forms as those in ref. 11, except a phase factor. This con-
sequence is natural but non-trivial, because the formula of solitons under VBC9) is quite
different from that under NVBC (13), in particular, in the form of the matrix S. Actually, the
initial displacement ǫ can be arbitrarily changed, regardless of eq. (38), by the parallel shift
of the position x. We have shown that the soliton solutions (24)-(26) can be regarded as a
general form of bright soliton solutions, including the case of VBC.
J. Phys. Soc. Jpn. Full Paper
4.1 Classification by the boundary conditions
We shall show that there are two kinds of one-soliton solutions depending on the boundary
conditions, detΠ = 0 or detΠ 6= 0. The similar classification about the boundary conditions
also exists for dark solitons.13) The examples of snapshots of one-soliton density profiles are
shown in Fig. 1. The upper row is for detΠ = 0, and the lower row is for detΠ 6= 0. The
shape of envelope solitons looks a locally-oscilating wave rather than, literally, a solitary wave,
because of the self-modulation due to the complex velocity.
For detΠ = 0, the boundary conditions of the standard form (24)-(26) are
Q e−iφ → λ0I, x→ ∞,
Q e−iφ → λ0
I − 2i ς1 trΠ · Π
† + ς2 trΠ
† · Π+ µ(|trΠ|2 − 1)I
̟ + ν1ν2(|tr Π|2 − 1)
, x→ −∞. (39)
The left and right boundary values differ in not only the global phase but also the population
of each component, in general. That is, those are the SU(2) rotated boundary conditions.
In the upper row of Fig. 1, we see that the envelope soliton of each component forms the
domain-wall (DW) shape, although it does not manifest in the total number density.
On the other hand, for detΠ 6= 0, the boundary conditions are
Q e−iφ → λ0I, x→ ∞,
Q e−iφ → λ0
1− 2i
ς1 + ς2
I, x→ −∞. (40)
In contrast to the case that detΠ = 0, both boundary values are diagonal matrices, and only
the phase-shift (PS) occurs. That is, those are the U(1) rotated boundary conditions.
For the above reasons, we call one-soliton solutions the DW-type for detΠ = 0, and the
PS-type for detΠ 6= 0. Remark the following; the spin density profile of DW-type suggests
that the total spin is nonzero, whereas that of PS-type is dipole-shape, implying that the total
spin amounts to zero. See the right panel of Fig. 1. This observation will be solidified in § 4.3.
4.2 Case of purely imaginary discrete eigenvalues
The ISM performed on Riemann sheets involves a double-valued function of the spectral
parameter, and it usually renders a very complicated representation of N -soliton solutions
even for N = 1, as is seen from eqs. (24)-(26). To simplify an explicit representation, it is
convenient to assume that λj and ζj are purely imaginary.
18) The similar approach is employed
for the analysis of N -soliton solutions of the derivative NLS equation under NVBC.20)
If we take a pair of discrete eigenvalues as
(λ1, ζ1) ≡ (iλ0λ, iλ0ζ), (λ2, ζ2) ≡ (−iλ0λ, iλ0ζ), (41)
where λ and ζ are positive real numbers such that
λ > 1, ζ =
λ2 − 1, (42)
J. Phys. Soc. Jpn. Full Paper
(a) (b) (c)
Fig. 1. Snapshots of one-soliton density profiles. The upper row is plotted for detΠ = 0 at the moment
t = 0, with k = 0, λ0 = 1, λ1 = 1+i, ξ1 = 1.27+0.79i (χ(x, t) = −(1.57−2.54i)x− (8.23−1.94i)t)
and Π =
4/5 2/5
2/5 1/5
. The lower row is plotted for detΠ 6= 0 at the moment t = 0, with the
same parameters except for Π =
2 2/5
2/5 3/(5
. The left panel (a) depicts the local density
for each component, |φ1|2 (solid line), |φ0|2 (chain line) and |φ−1|2 (dotted line). The center panel
(b) depicts the local number density n, where the contribution of the background is included. The
right panel (c) depicts the local spin densities, fx (solid line) and fz (dotted line). fy vanishes
identically due to a choice of a real matrix Π.
we obtain a relatively simple form of one-soliton solutions as
Q = λ0 e
iφ(x,t)
I + 2i
, (43)
detS = 1−
(tr Π(t) + trΠ†(t)) + e2χP
|trΠ(t)|2
(detΠ(t) + detΠ†(t))−
trΠ†(t)detΠ(t) + trΠ(t)detΠ†(t)
e4χP |detΠ(t)|2, (44)
2i T =
2(|tr Π(t)|2 − 1) e2χP + 2e2χP
detΠ(t) +
detΠ†(t)
tr Π†(t)detΠ(t) +
trΠ(t)detΠ†(t)
10/18
J. Phys. Soc. Jpn. Full Paper
−2(ζ + λ) eχP + 2
e2χP trΠ†(t) + 2
e3χP detΠ†(t)
−2(ζ − λ) eχP − 2
e2χP trΠ(t) + 2
e3χP detΠ(t)
Π†(t), (45)
where the coordinate of the envelope soliton is given by
χP (x, t) = −2λ0ζ(x− 2kt), (46)
and the time dependence of the phase modulation is embedded in the polarization matrix,
namely,
Π(t) ≡ Πe4iλ20ζλt. (47)
When we take the limit λ0 → 0 with λ0λ and λ0ζ kept finite in eqs. (43)-(45), Q converges
to the form of eqs. (32)-(35), accompanying the parameters kR = −2λ0λ, kI = k and ǫ =
− ln(4λ0λ). Here, kR and kI are independent free parameters, apart from the trivial initial
displacement ǫ. In this sense, in spite of the reduction, we can still regard the form of one-
soliton solutions (43)-(45) as a general form of those under VBC.
We can also take another limit. That is, we consider the reduction to the single-component
case. If we set
k = 0, Π =
eiθ 0
, (48)
the (1,1)-component of Q becomes
Q11 · e−i(2λ
t+δ) = λ0 − 2λ0ζ
ζ cos(4λ20ζλt+ θ) + iλ sin(4λ
0ζλt+ θ)
λ cosh(2λ0ζx+ ψ)− cos(4λ20ζλt+ θ)
, (49)
where eψ = ζ/λ. The form (49) was given in ref. 18. We thus verify that our soliton solutions
are also generalization of those for the single-component NLS equation under NVBC.19)
4.3 Spin states
In this subsection, we discuss the spin states of one-soliton solutions with a finite back-
ground, by calculating the total spin. The conservation law guarantees that we obtain the
total spin FT from integrating at arbitrary time. Therefore, we can select the time so that the
calculation becomes easier. We concentrate on the case of purely imaginary discrete eigenval-
ues. As a result, we see that the DW-type is associated with the ferromagnetic state, whereas
the PS-type is associated with the polar state. One-soliton solutions for purely imaginary dis-
crete eigenvalues (43)-(45) include those under VBC apart from an initial displacement, and
therefore the classification about the spin states presented below is wider than that performed
before.10, 11)
4.3.1 Ferromagnetic state
For detΠ = 0 (DW-type), we substitute eqs. (43)-(45) into eq. (21) and calculate the total
spin. The time t′ such that trΠ(t′) + trΠ†(t′) = 0 is suitable for the calculation. The result is
11/18
J. Phys. Soc. Jpn. Full Paper
as follows:
FT = 4λ0τ
2λRe{α∗(β + γ)}
−2ζIm{α∗(β − γ)}
λ(|β|2 − |γ|2)
, τ ≡
|β + γ|2
, (50)
with the modulus
|F T|2 = (4λλ0)2τ. (51)
The total number of the particles transformed into a soliton, N̄T, is calculated by eq. (20),
N̄T = 4λ0ζ, (52)
and the range of the value taken by |F T|2 is expressed in terms of N̄T,
N̄2T ≤ |F T|2 ≤ N̄2T + (4λ0)2. (53)
Remark that, in the vanishing limit λ0 → 0, the modulus of the total spin is always equal to
the total number of the particles, namely, |F T| → NT.
With nonzero total spin, the DW-type of solitons belongs to the ferromagnetic state.
Since inter-atomic ferromagnetic interactions are supposed here, solitons tend to take the
ferromagnetic state or DW-type. In various contexts of physics, the domain-walls are topo-
logical solitons related with the symmetry breaking. Here, resulting from the domain-walls,
the magnetic entity emerges as the spontaneously broken symmetry. It is worthy to notice
that the case |N̄T| < |FT| may happen. The background is spinless, but its internal spin state
appears to be affected on the ground that the ferromagnetic soliton runs over the background.
Thereby, the background contributes to the total spin.
4.3.2 Polar state
If detΠ 6= 0 (PS-type), the solitons show the other magnetic property. The time t′ such
that detΠ(t′) = detΠ†(t′) > 0 is suitable for the analysis. After lengthy calculations, the local
spin density at such time is derived as
= 8λ20 e
−3υΞ−2(χP ′)
β − γ
ζ e2υΞ(χP ′) sinh(χP ′)
(ζ2 − λ2)tr Π(t′) + (ζ2 + λ2)trΠ†(t′)
sinh(2χP ′)
λ2/ζ · ((tr Π†(t′))2 − |tr Π(t′)|2) + 4ζdetΠ(t′) + 2ζ(|tr Π(t′)|2 − 1)
sinh(χP ′)
+h.c., (54)
fy = 32λ
−3υIm{α∗(β − γ)}Ξ−2(χP ′)
2ζ eυ sinh(2χP ′)− (trΠ(t′) + trΠ†(t′)) sinh(χP ′)
, (55)
12/18
J. Phys. Soc. Jpn. Full Paper
where Ξ(χP ′) is an even function of χP ′ , χP ′(x, t) is a parallel-shifted coordinate, and υ is a
constant:
Ξ(χP ′) ≡ 2 cosh(2χP ′)−
2 e−υ
(trΠ(t′) + tr Π†(t′)) cosh(χP ′)
ζ2 + |trΠ(t′)|2
λ2detΠ(t′)
, (56)
χP ′ ≡ χP + υ = −2λ0ζ(x− 2kt′) + υ, (57)
υ ≡ ln
(detΠ(t′))1/2
. (58)
Note that fx and fz share the same functional form. Each component of the above local spin
density is an odd function of χP ′ and, in particular, it has the node at the same point x0 such
that χP ′(x0, t
′) = 0, namely, x0 = 2kt
′ + (2λ0ζ)
−1υ. Consequently, the total spin amounts to
zero:
dxf(χP ′) = (0, 0, 0)
T . (59)
For this reason, the PS-type of solitons, on the average, belongs to the polar state.15)
5. Two-Soliton Collision
We proceed to the discussion of two-soliton collisions in the integrable spinor model (9).
Two-soliton solutions are obtained by setting N = 4 (M = 2) in the formula (13). There
exist two independent discrete eigenvalues and symmetric polarization matrices, respectively,
i.e., λ1 = λ
2 and Π1 = Π
2 for one of solitons, λ3 = λ
4 and Π3 = Π
4 for the other. Each
soliton is separated at t = ±∞. Then, a two-soliton is asymptotically two one-solitons. The
classification of one-soliton solutions based on the values of the determinants of polarization
matrices, discussed in the previous section, is thus valid for two-soliton solutions.
The derivation of the explicit form is more complicated than that in the case for one-soliton
solutions. For the derivative NLS equation under NVBC, explicit two-soliton solutions and
shifts of soliton positions due to collisions between solitons have been analytically obtained,
in the case of purely imaginary eigenvalues, where complexity of calculation is considerably
reduced.20) This strategy, however, does not stand in our NLS equation under NVBC. The
reason is understood from eq. (46). In the spinor model, purely imaginary eigenvalues give
two solitons with the same velocity 2k, and they do not collide with each other. Accordingly,
we can not investigate the properties of collisions for purely imaginary discrete eigenvalues.
No one has studied explicit multi-soliton solutions of the NLS equation under NVBC, even in
the single-component case, because of the computational complexity.
Here, we graphically show the characteristic behaviors of two-soliton collisions in the
spinor model, by use of the exact solutions given by the ISM. Referring to them, we carry
out the qualitative discussions. Although the presented graphs are depicted for the specific
parameters, much the same behaviors are observed for arbitrarily selected parameters.
13/18
J. Phys. Soc. Jpn. Full Paper
Figure 2 illustrates the behavior of a mutual collision between two PS-types, where
detΠ1 6= 0 and detΠ3 6= 0. One can see that, in all three components, Both solitons re-
tain their shapes before and after the collision, which is the common property with solitons
in the single-component case. In this sense, PS-PS soliton collision is essentially equivalent to
two-soliton collision of the single-component NLS equation.
Figure 3 illustrates the behavior of a mutual collision between DW-type and PS-type,
where detΠ1 = 0 and detΠ3 6= 0. The behavior of collisions between DW-type and PS-type is
qualitatively alien from that between two PS-types. One observes that, in PS-type, much of
the population initially inhabiting the hyperfine substate |F = 1,mF = ±1〉 is transferred into
the hyperfine substate |F = 1,mF = 0〉 due to the collision. In contrast, in DW-type, such
spin transfer does not occur, and the domain-wall shape is preserved against the collision. This
phenomenon can be interpreted as follows. DW-type, with nonzero spin, can affect the internal
spin state of PS-type, whereas PS-type, which is expected to have zero spin in total, does not
affect the internal spin state of DW-type. This kind of spin-transfer phenomenon, called the
spin-switching, has been first predicted for the case of VBC.11) Due to the conservation laws,
the total number, the total spin and so on are invariant throughout the collision process.
Population-mixing among internal degrees of freedom is permitted, as far as the conservation
laws are not violated. The spin-switching is one of the dynamical processes which make the
spinor solitons more interesting.
Finally, for a mutual collision between two DW-types, where detΠ1 = detΠ3 = 0, the
shapes of both solitons are expected to be deformed by the collision since each soliton carries
nonzero total spin. In fact, drastic population-mixing is seen in Fig. 4, which shows an example
of this kind of collisions. One finds that domain-walls ”repel” at the collision region, rather
than collide.
(a) (b) (c)
Fig. 2. Density plots of |φ1|2 (a), |φ0|2 (b) and |φ−1|2 (c) for a mutual collision between two PS-types.
The parameters used here are k = 1, λ0 = 1, λ1 = 1.03i, λ3 = 1.05 + i, Π1 =
2 i/2
i/2 0
0 i/2
i/2 1/
. The velocity of the right (left) moving soliton is 2.00 (−3.41). The collision
takes place at t = 0.
14/18
J. Phys. Soc. Jpn. Full Paper
(a) (b) (c)
Fig. 3. Density plots of |φ1|2 (a), |φ0|2 (b) and |φ−1|2 (c) for a mutual collision between DW-
type and PS-type. The parameters used here are the same as those of Fig. 2, except for
2i/3 −1/3
, Π3 =
0 −1/
. The right (left) moving soliton is DW-type
(PS-type).
(a) (b) (c)
|φ1|2 |φ0|2 |φ−1|2
t t t
x x x
Fig. 4. Density plots of |φ1|2 (a), |φ0|2 (b) and |φ−1|2 (c) for a mutual collision between two DW-
types. The parameters used here are the same as those of Fig. 2, except for Π1 =
1/2 i/2
i/2 −1/2
. The values more than 2 are colored white.
6. Concluding Remarks
In this paper, we have investigated dynamical properties of matter-wave bright solitons
with a finite background in the F = 1 spinor Bose-Einstein condensate. To perform our anal-
ysis concretely, we have exploited an integrable spinor model with a self-focusing nonlinearity
and the inverse scattering method under nonvanishing boundary conditions. The situation
15/18
J. Phys. Soc. Jpn. Full Paper
that matter-wave solitons are located on a finite background fits to the experiments.
One-soliton solutions are derived explicitly and studied in detail. From the point of the
mathematical view, they offer general forms of bright soliton solutions of the NLS equation.
We have confirmed that our one-soliton solutions include those obtained in the previous works.
One-soliton solutions are classified into two kinds by the difference of boundary conditions;
DW-type and PS-type. The spin density profiles of one-solitons vary depending on the bound-
ary conditions. In the case of purely imaginary discrete eigenvalues, we have analytically shown
that DW-type is in the ferromagnetic state, whereas PS-type is in the polar state. The exis-
tence of two distinct magnetic properties for one-soliton solutions also gives rise to fascinating
phenomena in the case for two-soliton collisions, for example, the spin-switching. The above
results for bright solitons with a finite background agree with those for bright solitons under
VBC10, 11) and dark solitons.13)
Several problems still remain. It is desirable to extend our analysis to the case of general
discrete eigenvalues. The computations of the conserved quantities other than the total spin
are also required. (One approach is given in Appendix.) In addition, we wish to investigate
analytical properties of general N -soliton solutions under NVBC in the spinor model. Needless
to say, too complicated calculations are inevitable for the above problems. The remaining
problems should be discussed elsewhere.
We conclude that the properties of the multiple matter-wave solitons in the spinor BECs
are interesting and should be useful in various applications. Bright solitons are preferable to
dark solitons for applications, because of the advantage in the propagation distance. We hope
that our analysis contributes to illuminating dynamical properties of solitons in the spinor
BECs, which should be demonstrated experimentally in near future.
Acknowledgment
One of the authors (TK) acknowledges Dr. J. Ieda and Dr. M. Uchiyama for valuable
comments and discussions.
Appendix: Several Conserved Quantities of One-Soliton Solutions
The conserved quantities help us to understand the dynamics of the system. In this ap-
pendix, we calculate the total number, the total spin, the total momentum and the total
energy of the spinor model. We assume that, in addition to purely imaginary discrete eigen-
values, Π is a real symmetric 2× 2 matrix. The condition that Π is a real symmetric matrix
is inherent in the self-defocusing case, i.e., dark solitons.12)
For Π = Π†, one-soliton solutions of purely imaginary discrete eigenvalues (43)-(45) be-
come the following form at t = t′ = (4n− 1)π/8λ20ξλ for n = 0,±1, . . . :
Q = λ0 e
iφ(x,t)
I + 4iζ
Πe−(χP+ρ
′/2) + (σyΠσy) eχP+ρ
′/2detΠ
e−(2χP+ρ
′) + 1 + e2χP+ρ
(detΠ)2
, (A·1)
16/18
J. Phys. Soc. Jpn. Full Paper
where eρ
′/2 ≡ λ/ζ. This form is suitable for calculations, since the imaginary part is separated
from the real one. One can see clearly that the one-soliton solutions under VBC (32) are located
on a finite background in the form (A·1). Note that the domain-wall shape is lost even for
detΠ = 0 there.
Several conserved quantities of the solitons (A·1) are calculated by use of eqs. (20)-(23).
The results for detΠ = 0 are given by
N̄T = 4λ0ζ, (A·2)
FT = N̄T
2α(β + γ)
β2 − γ2
, |F T| = N̄T, (A·3)
P̄T = N̄T~k, (A·4)
ĒT = N̄Tc
(k2 − 2λ20)−
, (A·5)
and those for detΠ 6= 0 are given by
N̄T = 8λ0ζ, (A·6)
FT = (0, 0, 0)
T , (A·7)
P̄T = N̄T~k, (A·8)
ĒT = N̄Tc
(k2 − 2λ20)−
. (A·9)
It is intriguing that, for fixed amplitude and discrete eigenvalue, N̄T, P̄T and ĒT of the
PS-type (detΠ 6= 0) have just twice values as those of the DW-type (detΠ = 0), respectively.
This enables us to interpret that the PS-type of solitons is a bound state of the two DW-types
of solitons. Additionally, for fixed amplitude and total number, the total energy ĒT of the
DW-type is lower than that of the PS-type: ĒDWT − ĒPST = −N̄3Tc/16 < 0, which suggests that
the DW-type is physically preferable. This result is consistent with inter-atomic ferromagnetic
interaction, i.e., c̄2 < 0.
17/18
J. Phys. Soc. Jpn. Full Paper
References
1) K. E. Strecker, G. B. Partridge, A. G. Truscott and R. G. Hulet: Nature (London) 417 (2002) 150.
2) L. Khaykovich, F. Schreck, G. Ferrari, T. Bourdel, J. Cubizolles, L. D. Carr, Y. Castin and C.
Salomon: Science 296 (2002) 1290.
3) S. Burger, K. Bongs, S. Dettmer, W. Ertmer and K. Sengstock: Phys. Rev. Lett. 83 (1999) 5198.
4) J. Denschlag, J. E. Simsarian, D. L. Feder, C. W. Clark, L. A. Collins, J. Cubizolles, L. Deng, E.
W. Hagley, K. Helmerson, W. P. Reinhardt, S. L. Rolston, B. I. Schneider and W. D. Phillips:
Science 287 (2000) 97.
5) F. K. Abdullaev, A. Gammal, A. M. Kamchatnov and L. Tomio: Int. J. Mod. Phys. B 19 (2005)
3415.
6) J. Stenger, S. Inouye, D. M. Stamper-Kurn, H.-J. Miesner, A. P. Chikkatur, W. Ketterle: Nature
396 (1998) 345.
7) D. M. Stamper-Kurn, M. R. Andrews, A. P. Chikkatur, S. Inouye, H.-J. Miesner, J. Stenger and
W. Ketterle: Phys. Rev. Lett. 80 (1998) 2027.
8) H.-J. Miesner, D. M. Stamper-Kurn, J. Stenger, S. Inouye, A. P. Chikkatur and W. Ketterle: Phys.
Rev. Lett. 82 (1999) 2228.
9) T. Tsuchida and M. Wadati: J. Phys. Soc. Jpn. 67 (1998) 1175.
10) J. Ieda, T. Miyakawa and M. Wadati: Phys. Rev. Lett. 93 (2004) 194102.
11) J. Ieda, T. Miyakawa and M. Wadati: J. Phys. Soc. Jpn. 73 (2004) 2996.
12) J. Ieda, M. Uchiyama and M. Wadati: J. Math. Phys. 48 (2007) 013507.
13) M. Uchiyama, J. Ieda and M. Wadati: J. Phys. Soc. Jpn. 75 (2006) 064002.
14) M. Olshanii: Phys. Rev. Lett. 81 (1998) 938.
15) T.-L. Ho: Phys. Rev. Lett. 81 (1998) 742.
16) T. Ohmi and K. Machida: J. Phys. Soc. Jpn. 67 (1998) 1822.
17) M. Wadati and N. Tsuchida: J. Phys. Soc. Jpn. 75 (2006) 014301.
18) T. Kawata and H. Inoue: J. Phys. Soc. Jpn. 44 (1978) 1722.
19) In ref. 18, the right-hand side of eq. (49) is λ0+2λ0ζ · · · . We are afraid that there exists a misprint
of the sign.
20) X.-J. Chen, J. Yang and W. K. Lam: J. Phys. A 39 (2006) 3263.
18/18
ABSTRACT
  We investigate dynamical properties of bright solitons with a finite
background in the F=1 spinor Bose-Einstein condensate (BEC), based on an
integrable spinor model which is equivalent to the matrix nonlinear
Schr\"{o}dinger equation with a self-focusing nonlineality. We apply the
inverse scattering method formulated for nonvanishing boundary conditions. The
resulting soliton solutions can be regarded as a generalization of those under
vanishing boundary conditions. One-soliton solutions are derived in an explicit
manner. According to the behaviors at the infinity, they are classified into
two kinds, domain-wall (DW) type and phase-shift (PS) type. The DW-type implies
the ferromagnetic state with nonzero total spin and the PS-type implies the
polar state, where the total spin amounts to zero. We also discuss two-soliton
collisions. In particular, the spin-mixing phenomenon is confirmed in a
collision involving the DW-type. The results are consistent with those of the
previous studies for bright solitons under vanishing boundary conditions and
dark solitons. As a result, we establish the robustness and the usefulness of
the multiple matter-wave solitons in the spinor BECs.

<|endoftext|><|startoftext|>
Why there is something rather than nothing (out of everything)?
A.O.Barvinsky
Theory Department, Lebedev Physics Institute, Leninsky Prospect 53, 119991 Moscow, Russia
The path integral over Euclidean geometries for the recently suggested density matrix of the
Universe is shown to describe a microcanonical ensemble in quantum cosmology. This ensemble
corresponds to a uniform (weight one) distribution in phase space of true physical variables, but
in terms of the observable spacetime geometry it is peaked about complex saddle-points of the
Lorentzian path integral. They are represented by the recently obtained cosmological instantons
limited to a bounded range of the cosmological constant. Inflationary cosmologies generated by these
instantons at late stages of expansion undergo acceleration whose low-energy scale can be attained
within the concept of dynamically evolving extra dimensions. Thus, together with the bounded
range of the early cosmological constant, this cosmological ensemble suggests the mechanism of
constraining the landscape of string vacua and, simultaneously, a possible solution to the dark energy
problem in the form of the quasi-equilibrium decay of the microcanonical state of the Universe.
PACS numbers: 04.60.Gw, 04.62.+v, 98.80.Bp, 98.80.Qc
Euclidean quantum gravity (EQG) is a lame duck in
modern particle physics and cosmology. After its sum-
mit in early and late eighties (in the form of the cosmo-
logical wavefunction proposals [1, 2] and baby universes
boom [3]) the interest in this theory gradually declined,
especially, in cosmological context, where the problem of
quantum initial conditions was superseded by the con-
cept of stochastic inflation [4]. EQG could not stand the
burden of indefiniteness of the Euclidean gravitational
action [5] and the cosmology debate of the tunneling vs
no-boundary proposals [6].
Thus, a recently suggested EQG density matrix of the
Universe [7] is hardly believed to be a viable candidate
for the initial state of the Universe, even though it avoids
the infrared catastrophe of small cosmological constant
Λ, generates an ensemble of universes in the limited
range of Λ, and suggests a strong selection mechanism
for the landscape of string vacua [7, 8]. Here we want to
justify this result by deriving it from first principles of
Lorentzian quantum gravity applied to a microcanonical
ensemble of closed cosmological models.
Thermal properties in quantum cosmology [9] are in-
corporated by a mixed physical state, which is dynam-
ically more preferable than a pure state of the Hartle-
Hawking type. This follows from the path integral for
the EQG statistical sum [7, 8]. It can be cast into the
form of the integral over a minisuperspace of the lapse
function N(τ) and scale factor a(τ) of spatially closed
FRW metric ds2 = N2(τ) dτ2 + a2(τ) d2Ω(3),
e−Γ =
periodic
D[ a,N ] e−ΓE [ a,N ], (1)
e−ΓE [ a,N ] =
periodic
Dφ(x) e−SE [ a,N ;φ(x) ]. (2)
Here ΓE [ a, N ] is the Euclidean effective action of all
inhomogeneous “matter” fields which include also met-
ric perturbations on minisuperspace background Φ(x) =
(φ(x), ψ(x), Aµ(x), hµν (x), ...). SE [a,N ;φ(x)] is the clas-
sical Eucidean action, and the integration runs over pe-
riodic fields on the Euclidean spacetime with a compact-
ified time τ (of S1 × S3 topology).
For free matter fields φ(x) conformally coupled to grav-
ity (which are assumed to be dominating in the sys-
tem) the effective action has the form [7] ΓE [ a,N ] =
dτ NL(a, a′) + F (η), a′ ≡ da/Ndτ . Here NL(a, a′) is
the effective Lagrangian of its local part including the
classical Einstein term (with the cosmological constant
Λ = 3H2) and the contribution of the conformal anomaly
of quantum fields and their vacuum (Casimir) energy,
L(a, a′) = −aa′2−a+H2a3+B
. (3)
F (η) is the free energy of their quasi-equilibrium excita-
tions with the temperature given by the inverse of the
conformal time η =
dτ N/a. This is a typical boson or
fermion sum F (η) = ±
1∓ e−ωη
over field oscil-
lators with energies ω on a unit 3-sphere. We work in
units of mP = (3π/4G)
1/2, and B is the constant deter-
mined by the coefficient of the Gauss-Bonnet term in the
overall conformal anomaly of all fields φ(x).
Semiclassically the integral (1) is dominated by the
saddle points — solutions of the Friedmann equation
−H2 − C
, (4)
modified by the quantum B-term and the radiation term
C/a4 with the constant C satisfying the bootstrap equa-
tion C = B/2 + dF (η)/dη. Such solutions represent
garland-type instantons which exist only in the limited
range 0 < Λmin < Λ < 3m
P/B [7, 8] and eliminate the
infrared catastrophe of Λ = 0. The period of these quasi-
thermal instantons is not a freely specifiable parameter,
but as a function of Λ follows from this bootstrap. There-
fore this is not a canonical ensemble.
Contrary to the construction above, the density ma-
trix that we advocate here is given by the canonical path
integral of Lorentzian quantum gravity. Its kernel in the
http://arxiv.org/abs/0704.0083v2
representation of 3-metrics and matter fields denoted be-
low as q reads
ρ(q+, q−) = e
q(t±)= q±
D[ q, p,N ] e
dt (p q̇−NµHµ)
, (5)
where the integration runs over histories of phase-space
variables (q(t), p(t)) interpolating between q± at t± and
the Lagrange multipliers of the gravitational constraints
Hµ = Hµ(q, p) — lapse and shift functionsN(t) = N
µ(t).
The measure D[ q, p,N ] includes the gauge-fixing factor
containing the delta function δ(χ) =
µ δ(χ
µ) of gauge
conditions χµ and the ghost factor [10, 11] (condensed
index µ includes also continuous spatial labels). It is
important that the integration range of Nµ
−∞ < N < +∞, (6)
is such that it generates in the integrand the delta-
functions of these constraints δ(H) =
µ δ(Hµ). As a
consequence the kernel (5) satisfies the set of quantum
Dirac constraints — Wheeler-DeWitt equations
q, ∂/i∂q
ρ( q, q− ) = 0, (7)
and the density matrix (5) can be regarded as an operator
delta-function of these constraints
ρ̂ ∼ “
δ(Ĥµ)
”. (8)
This notation should not be understood literally because
this multiple delta-function is not well defined, for the
operators Ĥµ do not commute and form a quasi-algebra
with nonvanishing structure functions. Moreover, ex-
act operator realization Ĥµ is not known except the
first two orders of a semiclassical ~-expansion [12]. For-
tunately, we do not need a precise form of these con-
straints, because we will proceed with their path-integral
solutions well adjusted to the semiclassical perturbation
theory. This strategy does not extend beyond typical
field-theoretic considerations when the path integral is
regarded more fundamental than the Schrodinger equa-
tion marred with the problems of divergent equal-time
commutators, operator ordering, etc.
The very essence of our proposal is the interpretation
of (5) and (8) as the density matrix of a microcanonical
ensemble in spatially closed quantum cosmology. A sim-
plest analogy is the density matrix of an unconstrained
system having a conserved Hamiltonian Ĥ in the micro-
canonical state with a fixed energy E, ρ̂ ∼ δ(Ĥ − E).
Major distinction of (8) from this case is that spatially
closed cosmology does not have freely specifiable con-
stants of motion like the energy or other global charges.
Rather it has as constants of motion the Hamiltonian and
momentum constraints Hµ, all having a particular value
— zero. Therefore, the expression (8) can be considered
as a most general and natural candidate for the quantum
state of the closed Universe. Below we confirm this fact
by showing that in the physical sector the correspond-
ing statistical sum is just a uniformly distributed (with
a unit weight) integral over entire phase space of true
physical degrees of freedom. Thus, this is a sum over
Everything. However, in terms of the observable quanti-
ties, like spacetime geometry, this distribution turns out
to be nontrivially peaked around a particular set of uni-
verses. Semiclassically this distribution is given by the
EQG density matrix and the saddle-point instantons of
the above type [7].
From the normalization of the density matrix in the
physical Hilbert space the statistical sum follows as the
path integral
1 = Trphys ρ̂ =
q, ∂/i∂q
ρ(q, q′)
periodic
D[ q, p,N ] e i
dt(p q̇−NµHµ), (9)
where the integration runs over periodic in time histo-
ries of q = q(t). Here µ
q, ∂/i∂q
= µ̂ is the mea-
sure which distinguishes the physical inner product in
the space of solutions of the Wheeler-DeWitt equations
(ψ1|ψ2) = 〈ψ1|µ̂|ψ2〉 from that of the space of square-
integrable functions, 〈ψ1|ψ2〉 =
dq ψ∗1ψ2. This measure
includes the delta-function of unitary gauge conditions
and an operator factor built with the aid of the relevant
ghost determinant [12].
On the other hand, in terms of the physical phase space
variables the Faddeev-Popov path integral equals [10, 11]
D[ q, p,N ] e i
dt (p q̇−NµHµ)
DqphysDpphys e
dt (pphys q̇phys−Hphys(t))
= Trphys
T e−i
dt Ĥphys(t)
, (10)
where T denotes the chronological ordering. Here the
physical Hamiltonian Hphys(t) and its operator realiza-
tion Ĥphys(t) are nonvanishing only in unitary gauges ex-
plicitly depending on time [12], ∂tχ
µ(q, p, t) 6= 0. In static
gauges, ∂tχ
µ = 0, they identically vanish, because in spa-
tially closed cosmology the full Hamiltonian reduces to
the combination of constraints.
The path integral (10) is gauge-independent on-shell
[10, 11] and coincides with that in the static gauge.
Therefore, from Eqs.(9)-(10) with Ĥphys = 0, the sta-
tistical sum of our microcanonical ensemble equals
e−Γ = Trphys Iphys =
dqphys dpphys
= sum over Everything. (11)
This ultimate equipartition, not modulated by any non-
trivial density of states, is a result of general covariance
and closed nature of the Universe lacking any freely speci-
fiable constants of motion. The volume integral of entire
physical phase space, whose structure and topology is
not known, is very nontrivial. However, below we show
that semiclassically it is determined by EQG methods
and supported by instantons of [7] spanning a bounded
range of the cosmological constant.
Integration over momenta in (9) yields a Lagrangian
path integral with a relevant measure and action
e−Γ =
D[ q,N ] eiSL[ q, N ]. (12)
Integration runs over periodic fields (not indicated ex-
plicitly but assumed everywhere below) even despite the
Lorentzian signature of the underlying spacetime. Sim-
ilarly to the procedure of [7, 8] leading to (1)-(2) in
the Euclidean case, we decompose [ q,N ] into a min-
isuperspace [ aL(t), NL(t) ] and the “matter” φL(x) vari-
ables, the subscript L indicating their Lorentzian na-
ture. With a relevant decomposition of the measure
D[ q,N ] = D[ aL, NL ]×DφL(x), the microcanonical sum
takes the form
e−Γ =
D[ aL, NL ] e
iΓL[ aL, NL ], (13)
eiΓL[ aL, NL ] =
DφL(x) e
iSL[ aL, NL;φL(x)], (14)
where ΓL[ aL, NL ] is a Lorentzian effective action. The
stationary point of (13) is a solution of the effective equa-
tion δΓL/δNL(t) = 0. In the gauge NL = 1 it reads as
a Lorentzian version of the Euclidean equation (4) and
the bootstrap equation for the radiation constant C with
the Wick rotated τ = it, a(τ) = aL(t), η = i
dt/aL(t).
However, with these identifications C turns out to be
purely imaginary (in view of the complex nature of the
free energy F (i
dt/aL)). Therefore, no periodic solu-
tions exist in spacetime with a real Lorentzian metric.
On the contrary, such solutions exist in the Euclidean
spacetime. Alternatively, the latter can be obtained with
the time variable unchanged t = τ , aL(t) = a(τ), but
with the Wick rotated lapse function
NL = −iN, iSL[ aL, NL;φL] = −SE[ a,N ;φ ]. (15)
In the gauge N = 1 (NL = −i) these solutions exactly
coincide with the instantons of [7]. The corresponding
saddle points of (13) can be attained by deforming the
integration contour (6) of NL into the complex plane to
pass through the point NL = −i and relabeling the real
Lorentzian t with the Euclidean τ . In terms of the Eu-
clidean N(τ), a(τ) and φ(x) the integrals (13) and (14)
take the form of the path integrals (1) and (2) in EQG,
iΓL[ aL, NL] = −ΓE [ a, N ]. (16)
However, the integration contour for the Euclidean N(τ)
runs from −i∞ to +i∞ through the saddle point N = 1.
This is the source of the conformal rotation in Euclidean
quantum gravity, which is called to resolve the problem of
unboundedness of the gravitational action and effectively
renders the instantons a thermal nature, even though
they originate from the microcanonical ensemble. This
mechanism implements the justification of EQG from
canonical quantization of gravity [14] (see also [15] in
black hole context).
To show this we calculate (1) in the one-loop approx-
imation with the measure inherited from the canonical
path integral (5) D[ a,N ] = DaDN µ[ a,N ] δ[χ ] DetQ.
Here µ[ a,N ] is a local measure determined by the La-
grangian NL(a, a′), (3), in the local part of ΓE [ a,N ],
µ1−loop =
∂2(NL)
∂ȧ ∂ȧ
N a2a′2
D = a a′2(a2 −B +B a′2). (17)
The Faddeev-Popov factor δ[χ ] DetQ contains a gauge
condition χ = χ(a,N) fixing the one-dimensional dif-
feomorphism, τ → τ̄ = τ − f/N , which for infinitesi-
mal f = f(τ) has the form ∆fN ≡ N̄(τ) − N(τ) = ḟ ,
∆fa ≡ ā(τ) − a(τ) = ȧ f/N , and Q = Q(d/dτ) is a
ghost operator determined by the gauge transformation
of χ(a,N), ∆fχ = Q(d/dτ) f(τ).
The conformal mode σ of the perturbations δa = σa0
and δN = σN0 on the saddle-point background (labeled
below by zero, a = a0 + δa, N = N0 + δN) origi-
nates from imposing the background gauge χ(a,N) =
δN − (N0/a0) δa. In this gauge Q = a(d/dτ)a−1, and
the quadratic part of ΓE takes the form [13]
δ2σΓE = −
3πm2P
, (18)
where D is given by (17). As is known from [7] for the
background instantons a20(τ) ≥ a2− > B (a− is the turn-
ing point with the smallest value of a0(τ)), so that D > 0
everywhere except the turning points where D degener-
ates to zero. Therefore δ2σΓE < 0 for real σ, but the
Gaussian integration runs along the imaginary axes and
yields the functional determinant of the positive operator
— the kernel of the quadratic form (18)
e−Γ1−loop = e−Γ0 DetQ0
D/a′2
= e−Γ0×Det
)]−1/2
In view of periodic boundary conditions for both oper-
ators their determinants cancel each other (their zero
modes to be eliminated because they correspond to the
conformal Killing symmetry of FRW instantons) [13].
Therefore, the contribution of the conformal mode re-
duces to the selection of instantons with a fixed time pe-
riod, effectively endowing them with a thermal nature.
As suggested in [7, 8, 16] these instantons serve as
initial conditions for inflationary universes which evolve
according to the Lorentzian version of Eq.(4) and, at late
stages, have two branches of a cosmological acceleration
with Hubble scales H2
= (m2P /B)(1±(1−2BH2)1/2). If
the initial Λ = 3H2 is a composite inflaton field decaying
at the end of inflation, then one of the branches under-
goes acceleration with H2+ = 2m
P/B. This is determined
by the new quantum gravity scale suggested in [8] – the
upper bound of the instanton Λ-range, Λmax = 3m
P /B.
To match the model with inflation and the dark energy
phenomenon, one needs B of the order of the inflation
scale in the very early Universe and B ∼ 10120 now, so
that this parameter should effectively be a growing func-
tion of time.
This picture seems to fit into string theory at its low-
energy field-theoretic level. Then, with a bounded range
of Λ, it might constrain the landscape of string vacua
[7, 8]. Moreover, string theory has a qualitative mecha-
nism to promote the constant B to the level of a mod-
uli variable indefinitely growing with the evolving size
R(t) of extra dimensions. Indeed B as a coefficient in
the overall conformal anomaly of 4-dimensional quantum
fields basically counts their number N , B ∼ N . In the
Kaluza-Klein (KK) theory and string theory the effective
4-dimensional fields arise as KK and winding modes with
the masses [17]
m2n,w =
R2 (19)
(enumerated by the KK and winding numbers), which
break their conformal symmetry. These modes remain
approximately conformally invariant as long as their
masses are much smaller than the spacetime curvature,
m2n,w ≪ H2+ ∼ m2P /N . Therefore the number of confor-
mally invariant modes changes with R. Simple estimates
show that for pure KK modes (w = 0, n ≤ N) their num-
ber grows with R as N ∼ (mPR)2/3, whereas for pure
winding modes (n = 0, w ≤ N) their number grows with
the decreasing R as N ∼ (mPα′/R)2/3. Thus, the effect
of indefinitely growing B is possible for both scenarios
with expanding or contracting extra dimensions. In the
first case this is the growing tower of superhorizon KK
modes which make the horizon scale H+ = mP
2/B ∼
mP /(mPR)
1/3 indefinitely decreasing with R → ∞. In
the second case this is the tower of superhorizon winding
modes which make this acceleration scale decrease with
the decreasing R as H+ ∼ mP (R/mPα′)1/3. This effect
is flexible enough to accommodate the present day ac-
celeration scale H0 ∼ 10−60mP (though, by the price of
fine-tuning an enormous coefficient of expansion or con-
traction of R). This gives a new dark energy mechanism
driven by the conformal anomaly and transcending the
inflationary and matter-domination stages starting with
the state of the microcanonical distribution.
To summarize, within a minimum set of assumptions
(the equipartition in the physical phase space (11)),
we seem to have the mechanism of constraining the
landscape of string vacua and get the full evolution of
the Universe as a quasi-equilibrium decay of its initial
microcanonical state. Thus, contrary to anticipations
of Sidney Coleman, “there is Nothing rather than
Something” [3], one can say that Something (rather
than Nothing) comes from Everything.
The author thanks O.Andreev, C.Deffayet, A.Kamen-
shchik, J.Khoury, H.Tye, A.Tseytlin, I.Tyutin and
B.Voronov for thought provoking discussions and espe-
cially Andrei Linde, this work having appeared as an un-
intended response to his discontent with EQG initial con-
ditions. The work was supported by the RFBR grant 05-
02-17661, the grant LSS 4401.2006.2 and SFB 375 grant
at the Ludwig-Maximilians University in Munich.
[1] J.B.Hartle and S.W.Hawking, Phys.Rev. D28, 2960
(1983); S.W.Hawking, Nucl. Phys. B 239, 257 (1984).
[2] A.D. Linde, JETP 60, 211 (1984); A.Vilenkin, Phys.
Rev. D 30, 509 (1984).
[3] S.R.Coleman, Nucl. Phys. B 310, 643 (1988).
[4] A.A.Starobinsky, in Field Theory, Quantum Gravity and
Strings, 107 (eds. H.De Vega and N.Sanchez, Springer,
1986); A.D.Linde, Particle physics and inflationary cos-
mology (Harwood, Chur, Switzerland, 1990).
[5] G.W.Gibbons, S.W.Hawking and M.Perry, Nucl. Phys.
B 138, 141 (1978).
[6] A.Vilenkin, Phys. Rev. D58, 067301 (1988),
gr-qc/9804051; gr-qc/9812027.
[7] A.O.Barvinsky and A.Yu.Kamenshchik, J. Cosmol. As-
tropart. Phys. 09, 014 (2006), hep-th/0605132.
[8] A.O.Barvinsky and A.Yu.Kamenshchik, Phys. Rev.D74,
121502 (2006), hep-th/0611206.
[9] H.Firouzjahi et al, JHEP 0409, 060 (2004); S.Sarangi
and S.-H.H.Tye, hep-th/0505104; R.Brustein and S.P.de
Alwis, Phys. Rev. D 73, 046009 (2006).
[10] L.D.Faddeev, Theor. Math. Phys. 1, 1 (1970).
[11] A.O.Barvinsky, Phys. Rep. 230, 237 (1993); Nucl. Phys.
B 520 (1998) 533.
[12] A.O.Barvinsky and V.Krykhtin, Class. Quantum Grav.
10, 1957 (1993); A.O.Barvinsky, Geometry of the Dirac
and reduced phase space quantization of constrained sys-
tems, gr-qc/9612003; M.Henneaux and C.Teitelboim,
Quantization of Gauge Sytems (Princeton University
Press, Princeton, 1992).
[13] A.O.Barvinsky and A.Yu.Kamenshchik, in preparation.
[14] J.B. Hartle and K. Schleich, in Quantum field theory and
quantum statistics, 67 (eds. I.Batalin et al, Hilger, Bris-
tol, 1988); K. Schleich, Phys.Rev. D 36, 2342 (1987).
[15] D. Brown and J.W. York, Jr., Phys. Rev. D 47, 1420
(1993), gr-qc/9209014.
[16] A.O.Barvinsky and A.Yu.Kamenshchik, Cosmological
landscape and Euclidean quantum gravity,to appear in J.
Phys. A, hep-th/0701201.
[17] J.Polchinski, String Theory (Cambridge University Press,
Cambridge, 1998).
http://arxiv.org/abs/gr-qc/9804051
http://arxiv.org/abs/gr-qc/9812027
http://arxiv.org/abs/hep-th/0605132
http://arxiv.org/abs/hep-th/0611206
http://arxiv.org/abs/hep-th/0505104
http://arxiv.org/abs/gr-qc/9612003
http://arxiv.org/abs/gr-qc/9209014
http://arxiv.org/abs/hep-th/0701201
ABSTRACT
  The path integral over Euclidean geometries for the recently suggested
density matrix of the Universe is shown to describe a microcanonical ensemble
in quantum cosmology. This ensemble corresponds to a uniform (weight one)
distribution in phase space of true physical variables, but in terms of the
observable spacetime geometry it is peaked about complex saddle-points of the
{\em Lorentzian} path integral. They are represented by the recently obtained
cosmological instantons limited to a bounded range of the cosmological
constant. Inflationary cosmologies generated by these instantons at late stages
of expansion undergo acceleration whose low-energy scale can be attained within
the concept of dynamically evolving extra dimensions. Thus, together with the
bounded range of the early cosmological constant, this cosmological ensemble
suggests the mechanism of constraining the landscape of string vacua and,
simultaneously, a possible solution to the dark energy problem in the form of
the quasi-equilibrium decay of the microcanonical state of the Universe.

<|endoftext|><|startoftext|>
Introduction.
1.1. Description of the model. We give a quantitative analysis of clus-
tering in a stochastic model of one-dimensional gas. At time zero, the gas
consists of n point particles, each one of mass 1
. These particles are ran-
domly distributed on the real line and have zero initial speeds. Particles
begin to move under the forces of mutual attraction. When two or more
particles collide, they stick together forming a new particle, called cluster,
whose mass and speed are defined by the laws of mass and momentum
conservation. Between collisions, particles move according to the laws of
Newtonian mechanics.
We suppose that the force of mutual attraction does not depend on dis-
tance and equals the product of masses. This assumption is natural for
Received March 2007; revised September 2007.
1Supported in part by the Grants NSh-4222.2006.1 and DFG-RFBR 436 RUS
113/773/0-1(R).
AMS 2000 subject classifications. Primary 60K35, 82C22; secondary 60F17, 70F99.
Key words and phrases. Sticky particles, particle systems, gravitating particles, number
of clusters, aggregation, adhesion.
This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Applied Probability,
2008, Vol. 18, No. 3, 1026–1058. This reprint differs from the original in
pagination and typographic detail.
http://arxiv.org/abs/0704.0086v2
http://www.imstat.org/aap/
http://dx.doi.org/10.1214/07-AAP481
http://www.imstat.org
http://www.ams.org/msc/
http://www.imstat.org
http://www.imstat.org/aap/
http://dx.doi.org/10.1214/07-AAP481
2 V. V. VYSOTSKY
one-dimensional models because, by the Gauss law applied to flux of the
gravitational field, gravitation is proportional to the distance to the power
one minus dimension of the space. At any moment, the acceleration of a
particle is thus equal to difference of masses located to the right and to the
left of the particle.
Random initial positions of particles are usually described (see [8, 16,
25]) by the following natural models: in the uniform model, n particles are
independently and uniformly spread on [0,1]; in the Poisson model, particles
are located at points 1
S2, . . . ,
Sn, where Si is a standard exponential
random walk. In other words, particles are located at points of first n jumps
of a Poisson process with intensity n.
These two models are the most natural and interesting; let us call them
the main models of initial positions. However, we will see that behavior
of the Poisson model is essentially defined by independence of initial dis-
tances between particles rather than by the particular type of the distances’
distribution. Therefore, it is of a great mathematical interest to general-
ize the Poisson model by introducing the i.d. model, where “i.d.” stands
for “independent distances,” as follows. Particles are initially located at
S2, . . . ,
Sn, where Si is a positive random walk whose nonnegative
i.i.d. increments Xi satisfy the normalization condition EXi = 1. Note that
if we proceed to the limit as n→∞, we consider a system of total mass one,
which consists of, roughly speaking, infinitesimal particles homogeneously
spread on [0,1]; this is true for all the mentioned models of initial positions.
The mathematical interest in sticky particles systems arises mainly from
relations between these systems and some nonlinear partial differential equa-
tions originating from fluid mechanics, for example, the Burgers equation.
These equations admit interpretation in terms of sticky particles; see Gur-
batov et al. [10], Brenier and Grenier [4] or E, Rykov and Sinai [6]. Sticky
particles models are also used for numerical solving of other partial differen-
tial equations; see Chertock et al. [5] for explanations and further references.
As time goes, particles aggregate in clusters. Clusters become larger and
larger while the number of clusters decreases until they merge into a single
cluster containing all initial particles. This process of mass aggregation is
strongly connected with additive coalescence; see Bertoin [2] and Giraud [9]
for the most recent results and references.
The aggregation process resembles formation of a star from dispersed
space dust and sticky particles models indeed have relations to astrophysics.
It is appropriate to clarify these relations since they are not so direct and
cause a lot of misunderstanding.
It is known that the distribution of galaxies in the universe is very inho-
mogeneous and the regions of high density form a peculiar cellular structure.
The first attempt to understand the formation of such structures was made
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 3
in 1970 by Zeldovich. Most of the mass in the universe is believed to ex-
ist in the form of particles that practically do not collide with each other
and interact only gravitationally, for example, neutrinos. In his model, Zel-
dovich considered an initially homogeneous collisionless medium of particles
moving by pure inertia; the gravitational interaction was taken away by an
appropriate time change. He showed that singularities, that is, the thin re-
gions of very high density of particles, so called “pancakes,” appear even if
initial speeds of particles form a smooth velocity field.
Zeldovich’s approximate model, however, does not explain formation of
the cellular structure of matter. His approximation does not take into ac-
count that particles hitting a “pancake” are hampered by its strong gravita-
tional field and start oscillating inside the “pancake” instead of flying away.
Although this gravitational adhesion of collisionless particles is not precisely
the same as the real sticking, the model of sticky particles serves as a reason-
able approximation. The effect of gravitational adhesion was then analyzed
by the use of the Burgers equation; Gurbatov, Saichev and Shandarin pro-
posed it in 1984 to extend Zeldovich’s approximation, which is invalid after
formation of “pancakes.”
The model of sticky particles is directly mentioned in Gurbatov et al. [11];
a comprehensive survey of the formation of the Universe’s large-scale struc-
ture could be found in Shandarin and Zeldovich [23].
1.2. Statement of the problem and the results. In general, the problem
is to describe the process of mass aggregation. How fast is it? How large
the clusters are? Where do clusters appear most intensively, and so forth?
Numerous papers on the model (e.g., [8, 14, 16, 20, 25]) are dedicated to
probabilistic description of various properties of the aggregation process as
the number of initial particles n tends to infinity. Thus, the behavior of a
typical system consisting of a large number of particles is studied.
In this paper, we are interested in the asymptotic behavior of Kn(t), which
denotes the number of clusters at time t in the system with n initial particles.
This variable is a decreasing random step function satisfying Kn(0) = n
and Kn(t) = 1 for t ≥ T lastn , where T lastn denotes the moment of the last
collision. While calculating Kn(t), we also count initial particles that have
not experienced any collisions; in other words, Kn(t) is the total number of
particles existing at time t.
It is very important to know the behavior of Kn(t). This gives us a deep
understanding of the aggregation process since the average size of a cluster
at time t is n
Kn(t)
At first we give a short deterministic example. Suppose that particles are
located at points 1
, . . . , n
, that is, Si = i. By simple calculations, we find
that there would not be any collisions before t= 1. At the moment t= 1, all
4 V. V. VYSOTSKY
particles simultaneously stick together, hence Kn(t) = n for 0 ≤ t < 1 and
Kn(t) = 1 for t≥ 1.
However, when the initial positions are random, the aggregation process
behaves entirely differently. In [25], the author proved the following state-
ment.
Fact 1. There exists a deterministic function a(t) such that both in the
Poisson and the uniform models of initial positions, for any t≥ 0, we have
Kn(t)
P−→ a(t), n→∞.(1)
The function a(t) is continuous, a(0) = 1, and a(t) = 0 for t≥ 1. We conjec-
ture, on the basis of numerical simulations, that a(t) = 1− t2 for 0≤ t≤ 1.
The relation a(t) = 0 for t > 1 is not of a surprise because we know from
Giraud [8] that both in the Poisson and the uniform models, T lastn
P−→ 1 (the
limit constant is so “fine” due to the proper scaling of the model). Therefore,
we say that the moment t= 1 is critical ; note that this moment coincides
with the moment of the total collision in the deterministic model.
The aim of this paper is to strengthen the result of [25]. We first gen-
eralize Fact 1 and prove it for the i.d. model. We will see [relations (19)
and (27) below] that a(t) is equal to the probability of a certain event that
is expressed in terms of Xi. Also, we will prove that a(t) depends on the
common distribution of Xi as follows: a(t) = 1 on [0,
µ), where
µ := sup{y :P{Xi < y}= 0};
a(t) ∈ (0,1) on (√µ,1); and a(t) = 0 on (1,∞).
Furthermore, the recent results of the author [26] allow us to prove the
conjecture from Fact 1 that aPoiss(t) = aUnif(t) = 1− t2 for 0≤ t≤ 1. There
is an amazing contrast between the simplicity of this formula and the hard
calculations one needs to obtain it. It is remarkable that now we know the
limit function a(t) for the main models of initial positions.
Our main goal is to improve (1) by finding the next term in the asymp-
totics of Kn(t). The result is the following statement, where the standard
symbol
D−→ denotes weak convergence and D denotes the Skorohod space.
Theorem 1. In the i.d. model with continuous Xi satisfying EX
for some γ > 4, there exists a centered Gaussian process K(·) on [0,1) such
Kn(·)− na(·)√
D−→K(·) in D[0,1− ε] for all ε ∈ (0,1)(2)
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 5
as n→ ∞. The process K(·) depends on the distribution of Xi. This pro-
cess satisfies K(0) = 0 and has a.s. continuous trajectories. The covariance
function R(s, t) of K(·) is continuous on [0,1)2, R(s, t)> 0 on (√µ,1)2, and
R(s, t) = 0 on [0,1)2 \ (√µ,1)2.
In the uniform model, (2) holds for some centered Gaussian process KUnif(·)
on [0,1). This process satisfies KUnif(0) = 0 and has a.s. continuous trajecto-
ries. The covariance function RUnif(s, t) of KUnif(·) is continuous on [0,1)2,
and RUnif(s, t) =RPoiss(s, t)− s2t2.
Thus, the Poisson and the uniform models lead to different limit processes
KPoiss(·) and KUnif(·), although aPoiss(·) = aUnif(·).
As an immediate corollary of Theorem 1 (see Billingsley [3], Section 15),
we get
Kn(t)− na(t)√
D−→N (0, σ2(t)), n→∞(3)
for any t < 1, where σ2(t) := R(t, t). It is possible to show that in the i.d.
model, (3) holds for all t 6= 1 under the less restrictive condition EX2i <∞,
with σ2(t) = 0 for t > 1; continuity of Xi is not required.
We also study convergence of the left-hand side of (3) at the critical
moment t= 1. Apparently, the limit is not Gaussian, but this complicated
problem is related to a curious, but hardly provable conjecture on integrated
random walks. In view of this non-Gaussianity, it seems impossible to prove
any extended version of Theorem 1 that describes the weak convergence of
trajectories on the whole interval [0,1]; we refer to Section 7 for further
discussion.
We finish this subsection with a note on scaling. In our model, the masses
of particles are equal to 1
and the distances between them are of the order
. Let us rescale the i.d. model by multiplying all masses and distances
by n: the system of particles of mass one each, initially located at points
S1 − S[n/2], S2 − S[n/2], . . . , Sn − S[n/2], is called the expanding model. The
particles are shifted by S[n/2] because we want the system to expand “filling”
the whole line as n→∞ rather than only the positive half-line.
All results of our paper hold true for the expanding model. This is not
unexpected because the shift does not produce any changes and the rescaling
of masses is equivalent to the time contraction by n times while the rescaling
of distances is equivalent to the time expansion by n times. We refer the
reader to Section 2 below or to Lifshits and Shi [16] for rigorous arguments.
1.3. Organization of the paper. In Section 2 we describe a general method
which is used to study systems of sticky particles. This method is applied for
studying the i.d. model in Section 3, where we investigate some properties of
6 V. V. VYSOTSKY
the aggregation process. We will show that the aggregation process is highly
local, that is, the behavior of a particle is essentially defined by the motion
of neighbor particles. This localization property suggests that we could use
limit theorems for weakly dependent variables to prove both Fact 1 and
Theorem 1 for the i.d. model; this will be done in Section 4. Then we will
prove Theorem 1 for the uniform model in Section 5. In Section 6 we study
the number of clusters at the critical moment t = 1. Some open questions
are discussed in Section 7.
2. Method of barycenters. In this section we briefly describe the method
of barycenters, which is the main tool used to study systems of sticky par-
ticles; it is also applicable to more general models where particles could
have nonzero initial speeds and different masses. The method of barycenters
was independently introduced by E, Rykov and Sinai [6] and Martin and
Piasecki [20].
Let us start with several definitions. We always numerate particles from
left to right and identify particles with their numbers. A block of particles is
a nonempty set J ⊂ [1, n] consisting of consecutive numbers. For example,
the block (i, i+k] consists of particles i+1, . . . , i+k. Note that there are not
any relations between blocks and clusters: for example, a block’s particles
could be contained in different clusters and these clusters could even contain
particles that do not belong to the block.
It is convenient to assume that initial particles do not vanish at collisions
but continue to exist in created clusters. Then the coordinate xi,n(t) of a
particle i could be defined as the coordinate of a cluster that contains the
particle at time t. The second subscript n always indicates the number of
initial particles; we will omit this subscript as often as possible.
By xJ(t) := |J |−1
i∈J xi(t) denote the position of the barycenter of a
block J at time t. Further, define
x∗J(t) := xJ(0) +
where M
J := n
−1(n −maxj∈J) and M (L)J := n−1(minj∈J −1) are the to-
tal masses of particles located to the right and to the left of the block J ,
respectively.
A block is free from the right up to time t if, up to this time, the block’s
particles did not collide with particles initially located to the right of the
block. We similarly define blocks that are free from the left and say that a
block is free up to time t if it is both free from the right and from the left.
The next statement plays the key role in the analysis of sticky particles
systems. The barycenter of a free block moves as an imaginary particle con-
sisting of all particles of the block put together at the initial barycenter. In a
more precise and general way, we state the following.
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 7
Proposition 1. If a block J is free from the right (resp. left) up to time
t, then xJ(s)≥ x∗J(s) for s ∈ [0, t] [resp. xJ(s)≤ x∗J(s)]. If a block J is free
up to time t, then xJ(s) = x
J(s) for s ∈ [0, t].
This statement could be found, for example, in Lifshits and Shi [16],
Proposition 4.1. The easy proof is based on the property of conservation of
momentum.
The moment when a particle j sticks with its right-hand side neighbor
j + 1 is called the merging time Tj,n of the particle j. In other words, Tj,n
is the first moment when particles j and j + 1 are contained in a common
cluster; here j ∈ [1, n− 1]. Proposition 4.3 from Lifshits and Shi [16], which
is stated below, gives us a way to calculate Tj,n.
Proposition 2. For every j ∈ [1, n− 1], we have
Tj,n = min
j<k≤n
0≤l<j
{s≥ 0 :x∗(j,k](s) = x∗(l,j](s)}.(4)
Thus, Tj,n is expressed by means of barycenters. Note that since
x∗(j,k](s)− x∗(l,j](s) = x(j,k](0)− x(l,j](0)−
s2,(5)
each of the equations x∗
(j,k]
(s) = x∗
(l,j]
(s) has a unique nonnegative solution.
We also mention that at the moment Tj,n appears a cluster that consists of
the particles l+1, . . . , k, where k and l are minimizers of the right-hand side
of (4).
We will prove Proposition 2 since the proof is simple and perfectly illus-
trates the sense of the method of barycenters.
Proof of Proposition 2. For any u < Tj,n, the particles j and j + 1
are contained in different clusters. Therefore, for every l < j, the block [l, j]
is free from the right up to time u, and for every k > j, the block [j + 1, k]
is free from the left. By Proposition 1,
x∗(l,j](u)≤ x(l,j](u)≤ xj(u)<xj+1(u)
≤ x(j,k](u)≤ x∗(j,k](u),
and since, by (5), the function x∗
(j,k]
(s)− x∗
(l,j]
(s) is decreasing for s≥ 0, we
conclude that
u <{s≥ 0 :x∗(j,k](s) = x∗(l,j](s)}.
Taking minimum over k, l and taking supremum over u, we get Tj,n ≤
min{· · ·}.
8 V. V. VYSOTSKY
Let us prove the last inequality in the other direction. By the definition of
Tj,n, there exist an l < j and a k > j such that the blocks (l, j] and (j, k] are
free up to time Tj,n (clusters containing particles from these blocks collide
exactly at time Tj,n). In view of Proposition 1,
x∗(l,j](Tj,n) = x(l,j](Tj,n) = x(j,k](Tj,n) = x
(j,k](Tj,n);
hence Tj,n = {s≥ 0 :x∗(j,k](s) = x
(l,j]
(s)} and Tj,n ≥min{· · ·}. �
3. Study of the i.d. model. The localization property. At first, note that
Kn(t) = 1+
1{t<Ti,n}(6)
because the total number of clusters decreases by one at each moment Ti,n.
This representation plays the key role in the investigation of Kn(t). Clearly,
we need to study properties of the r.v.’s Ti,n to prove limit theorems for
Kn(t); such study will be done in this section.
3.1. The initial study. Let us simplify the representation for Tj,n from
Proposition 2. In this section we consider the i.d. model of initial positions,
where xj,n(0) =
Sj . Recall that Sj is a random walk with i.i.d. increments
{Xj}j∈Z (we will need the variables {Xj}j≤0 later).
Rewrite the initial distance between barycenters as
x(j,k](0)− x(l,j](0)
i=j+1
j − l
i=l+1
i=j+1
(Si − Sj+1) +
j − l
i=l+1
(Sj − Si) + (Sj+1 − Sj)
k−j−1
(Sj+i+1 − Sj+1) +
j − l
j−l−1
(Sj − Sj−i) +Xj+1
let us agree that
:= 0. Further, by
x(j,k](0)− x(l,j](0)
k−j−1
j+i+1
m=j+2
j − l
j−l−1
m=j−i+1
Xm +Xj+1
k−j−1
(k− j − i)Xj+i+1
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 9
j − l
j−l−1
(j − l− i)Xj−i+1 +Xj+1
and (5), we have
x∗(j,k](s)− x∗(l,j](s) =
Fk−j,j,j−l(s),
where
Fp,j,q(s) :=
(p− i)Xj+i+1
(q − i)Xj−i+1 +Xj+1 −
(for p, q ≥ 1 and j ∈ Z). Now, by Proposition 2, we get
Tj,n = min
j<k≤n
0≤l<j
{s≥ 0 :Fk−j,j,j−l(s) = 0}
= min
1≤k≤n−j
1≤l≤j
{s≥ 0 :Fk,j,l(s) = 0}.(8)
Note that Fp,j,q(0) ≥ 0 for all p, j, q and Fp,j,q(s) is decreasing for s ≥ 0.
This function could be also written in the more convenient form:
Fp,j,q(s) =
(p− i)(Xj+i+1 − s2)
(q − i)(Xj−i+1 − s2) + (Xj+1 − s2).
3.2. Localization property of the aggregation process. We see that Tj,n
is a function of X2, . . . ,Xn; in other words, it is necessary to know the
distances between all n particles to find Tj,n. The aggregation process is
actually highly local, that is, the value of Tj,n is essentially defined by the
initial distances between neighbor particles {i} of j for which |j − i| is small
enough.
To make this statement rigorous, we need to introduce the following no-
tation. Let us put
j := min
1≤k,l≤M
{s≥ 0 :Fk,j,l(s) = 0}, j ∈ Z,M ∈N,
which is expressed in terms of the variables {Xi}|j−i|≤M only. Also, define
Tj := inf
k,l≥1
{s≥ 0 :Fk,j,l(s) = 0}, j ∈ Z,
10 V. V. VYSOTSKY
which is, in some sense, the merging time in an appropriate infinite system
of particles. The reader could construct such system by considering the limit
of the expanding model, see Section 1.
It is clear that
Tj ≤ Tj,n ≤ T (j∧n−j)j , j, n ∈N, j ≤ n,(10)
where by ∧ and ∨ we denote minimum and maximum, respectively, and
Tj ≤ T (M)j , j ∈ Z,M ∈N.(11)
Let us estimate the rate of the convergence of P{Tj 6= T (M)j } to zero as the
“radius of the neighborhood” M tends to infinity. We thus could “measure”
the above-mentioned locality of the aggregation process. In fact, by (10), we
have P{Tj,n 6= T (M)j } ≤ P{Tj 6= T
j } for any n ∈N, j ≤ n, andM ≤ j∧n−j.
Lemma 1. Suppose EX
i <∞ for some γ ≥ 1. Then there exists a non-
decreasing function ρ(t) such that
max(P{1{t≤Tj} 6= 1{t≤T (M)
}},P{Tj 6= T
j , T
j ≤ t})≤ ρ(t)M
1−γ(12)
for any t ∈ (0,1), j ∈ Z, and M ∈N. Moreover, for any t < 1, the left-hand
side of (12) is o(M1−γ).
Proof. Let us estimate the first probability in the left-hand side of
(12). By properties of Fk,j,l(·) and definitions of T (M)j and of Tj ,
P{1{t≤Tj} 6= 1{t≤T (M)
}}= P{Tj < t≤ T
k,l≥1
Fk,j,l(t)< 0, min
1≤k,l≤M
Fk,j,l(t)≥ 0
By (9), this expression does not depend on j, and putting j :=−1,
P{1{t≤Tj} 6= 1{t≤T (M)
(k− i)(Xi − t2)
+ inf
(l− i)(X−i − t2) + (X0 − t2)< 0,
1≤k≤M
(k− i)(Xi − t2)
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 11
+ min
1≤l≤M
(l− i)(X−i − t2) + (X0 − t2)≥ 0
We then compare the inequalities in the braces and obtain
P{1{t≤Tj} 6= 1{t≤T (M)
(k− i)(Xi − t2)< min
1≤k≤M
(k− i)(Xi − t2)
(Si − it2)< min
1≤k≤M
(Si − it2)
(Si − it2)< min
k∈{1,M}
(Si − it2)
Now rewrite the event in the last line as
∃k >M : 1
(Si − it2)<min
(Si − it2)
∃k >M : 1
(Si − it2)
(Si − it2)<min
(Si − it2)
Analyzing both cases 0≤ 1
i=1 (Si − it2) and 0> 1M
i=1 (Si − it2), we
conclude that the considered event implies
∃k >M : 1
(Si − it2)< 0
∃k >M :
(Si − it2)< 0
Clearly, the latter implies
{∃i≥M :Si − it2 < 0}=
hence, combining all the estimates together, we get
P{1{t≤Tj} 6= 1{t≤T (M)
}} ≤ 2P
.(13)
Note that we obtained (13) without any assumptions on the moments of Xi.
We now estimate the right-hand side of (13); recall that EXi = 1. Then
the first part of (12) immediately follows from the classical result of Baum
and Katz [1] (see their Theorem 3 and Lemma):
12 V. V. VYSOTSKY
Fact 2. If EXi = a and E|Xi|γ <∞ for some γ ≥ 1, then
= o(k1−γ), k→∞
for any ε > 0. In addition, the series
k=1P{supi≥k |Sii − a|> ε} converges
for all ε > 0 if γ = 2.
The estimation of the second probability in the left-hand side of (12) is
completely analogous, since
{Tj 6= T (M)j , T
j ≤ t}
= {Tj < T (M)j ≤ t}
1≤k,l
Fk,j,l(T
j )< 0, min
1≤k,l≤M
Fk,j,l(T
j ) = 0, T
j ≤ t
We put j :=−1, repeat the estimates, and get
P{Tj 6= T (M)j , T
j ≤ t} ≤ 2P{∃i≥M :Si − i[T
< 0, T
−1 ≤ t}
instead of (13). The right-hand side does not exceed 2P{∃i≥M : Si − it2 <
0}, hence
P{Tj 6= T (M)j , T
j ≤ t} ≤ 2P
.(14)
3.3. The distribution function of T0 in the Poisson model. It is amazing
that in the Poisson model, the distribution function of T0 could be found
explicitly. This is important because by (27) below, the limit function a(t)
equals P{T0 > t} for the i.d. model. Also, in the proof of Theorem 1 for the
uniform model, we will need aPoiss(t) = P{TPoiss0 ≥ t} to be twice differen-
tiable and have a continuous second derivative.
Lemma 2. In the Poisson model, for 0≤ t≤ 1, we have
P{T0 ≥ t}= 1− t2.(15)
In addition, for t≥ 0, n≥ 2, and 1≤ j ≤ n− 1, we have
P{Tj,n ≥ t}= et
1≤k≤j
(Si − it2)≥ 0
1≤k≤n−j
(Si − it2)≥ 0
where Si is a standard exponential random walk.
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 13
Proof. We start with (16). By (8), (9) and properties of Fk,j,l(·),
P{Tj,n ≥ t}= P
1≤k≤n−j
1≤l≤j
Fk,j,l(t)≥ 0
1≤k≤n−j
(k− i)(Xj+i+1 − t2)(17)
+ min
1≤l≤j
(l− i)(Xj−i+1 − t2) +Xj+1 − t2 ≥ 0
In the right-hand side of the last equality, by Y denote the first minimum
and by Ỹ denote the second one.
Suppose X is a standard exponential r.v., Z is a nonnegative r.v., and
that X and Z are independent; then
P{Z ≤X}=
P{Z ≤ x}e−x dx
E1{Z≤x}e
−x dx= E
1{Z≤x}e
−x dx= Ee−Z .
Hence in view of independence of Y , Ỹ , Xj+1 we get
P{Y + Ỹ +Xj+1 − t2 ≥ 0}= EeY+Ỹ−t
EeY−t
EeỸ−t
and therefore,
P{Tj,n ≥ t}= et
P{Y +Xj+1 − t2 ≥ 0} · P{Ỹ +Xj+1 − t2 ≥ 0}.
Now, by
P{Ỹ +Xj+1 − t2 ≥ 0}
1≤l≤j
(l− i)(Xj−i+1 − t2) +Xj+1 − t2 ≥ 0
1≤l≤j
(l− i)(Xi+1 − t2) + l(X1 − t2)
1≤l≤j
(l− i+1)(Xi − t2)≥ 0
we conclude the proof of (16). Indeed, the expression in the last line equals
the first probability in the right-hand side of (16).
14 V. V. VYSOTSKY
Now let us prove (15). From the definition of T0 and T
0 we see that
1{t≤T (k)
} → 1{t≤T0} a.s. as k→∞; then by (16),
P{T0 ≥ t}= et
(Si − it2)≥ 0
Then we need to check that
(Si − it)≥ 0
1− te−t/2
for 0 ≤ t ≤ 1. The complicated calculations of this probability take more
then ten pages. Therefore, they were separated into independent paper [26].
Although these calculations seem to be technical, they are based on quite
original ideas. �
3.4. Some properties of the variables Ti. In this subsection we prove
several important properties of the r.v.’s Ti.
1. The sequence Ti is stationary.
Proof. This statement immediately follows from the definition of Ti and
stationarity of Xi, which are i.i.d.
2. The common distribution function of Ti is defined by
P{Ti ≥ t}= P
(k− i)(Xi − t2)
+ inf
(l− i)(X−i − t2) + (X0 − t2)≥ 0
Proof. This formula follows from (9).
3. We have P{√µ ≤ Ti ≤ 1} = 1 while sup{y :P{Ti < y} = 0} =
µ and
inf{y :P{Ti < y} = 1} = 1; recall that µ = sup{y :P{Xi < y} = 0}. In addi-
tion, if 0<DXi <∞, then P{Ti = 1}= 0.
Proof. First, P{√µ ≤ Ti} = 1 is trivial, because both infima in (19) are
nonpositive.
Second, fix a t≥ 1 and consider P{Ti ≥ t}. Taking into account that infima
in (19) are nonpositive, we obtain
P{Ti ≥ t} ≤ P
(k− i)(Xi − t2) + (X0 − t2)≥ 0
Then by the same arguments as in (18),
P{Ti ≥ t} ≤ P
(k − i+1)(Xi − t2)≥ 0
(Si − it2)≥ 0
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 15
By the strong law of large numbers, this probability is zero for all t > 1.
If t= 1 and 0<DXi <∞, then
(Si − i)≥ 0
= lim
1≤k≤n
(Si − i)≥ 0
= lim
1≤k≤n
Si − i√
and from the invariance principle, we get
P{Ti ≥ 1} ≤ P
0≤s≤1
W (u)du≥ 0
It follows from the asymptotics of unilateral small deviation probabilities of
an integrated Wiener process, see (43) and (44) below, that the last expres-
sion equals zero.
Third, sup{y :P{Ti < y}= 0}=
µ and inf{y :P{Ti < y}= 1} = 1 follow
if we prove that for any t < EXi = 1, the common distribution of the i.i.d.
infima in (19) has an atom at zero. But we have
(k− i)(Xi − t2) = 0
(Si − it2) = 0
and it could be shown via the strong law of large numbers that the last
probability is strictly positive for all t < 1.
4. Suppose Xi is continuous. Then T
j and Tj,n are continuous for any
j, k,n and the common distribution of Tj could have an atom only at 1. In
addition, if EX2i <∞, then Tj are continuous.
Proof. By (7) and (8),
Tj,n = min
1≤k≤n−j
1≤l≤j
H(k, j, l),(20)
where
H(p, j, q) :=
(p− i)Xj+i+1 +
(q − i)Xj−i+1 +Xj+1
Hence Tj,n is continuous as a minimum of a finite number of continuous
r.v.’s. The T
j are also continuous because T
= Tk,2k.
16 V. V. VYSOTSKY
Now we prove the continuity of Tj . By Property 3, it only remains to
verify that P{Tj ≥ t} is continuous on [0,1). But P{T (k)j ≥ t} − P{Tj ≥ t}=
P{1{t≤Tj} 6= 1{t≤T (k)
}}, and in view of (13),
0≤t≤s
|P{T (k)j ≥ t}−P{Tj ≥ t}| ≤ sup
0≤t≤s
for every s < 1 = EXi. The last expression tends to zero by the strong law
of large numbers; then P{Tj ≥ t} is continuous on [0, s] as a uniform limit
of continuous functions P{T (k)j ≥ t}. Since s < 1 is arbitrary, P{Tj ≥ t} is
continuous on [0,1).
5. The cov(1{s≤T0},1{t≤Tk}) tends to zero as k→∞ for all s, t ∈ [0,1). If,
in addition, EX
i <∞ for some γ > 1, then for any s, t ∈ [0,1) and k ∈ N,
we have
|cov(1{s≤T0},1{t≤Tk})| ≤ 2
γ(ρ(s) + ρ(t))k1−γ .(21)
Proof. The idea is to approximate 1{s≤T0} and 1{t≤Tk} by 1{s≤T (k/2)0 }
1{t≤T (k/2)
}, respectively; here by k/2 we mean ⌈k/2⌉, where ⌈x⌉=min{m ∈
Z :m≥ x}. Note that 1{s≤T (k/2)0 }
and 1{t≤T (k/2)
} are independent because the
first is a function of {Xi}i≤k/2 while the second is a function of {Xi}i≥k/2+1.
We then have
|cov(1{s≤T0},1{t≤Tk})|
= |cov(1{s≤T0},1{t≤Tk})− cov(1{s≤T (k/2)
},1{t≤T (k/2)
≤ |E(1{s≤T0}1{t≤Tk} − 1{s≤T (k/2)
}1{t≤T (k/2)
+ |E(1{s≤T0} − 1{s≤T (k/2)
})|+ |E(1{t≤Tk} − 1{t≤T (k/2)
})|(22)
= P{1{s≤T0}1{t≤Tk} 6= 1{s≤T (k/2)
}1{t≤T (k/2)
+ P{1{s≤T0} 6= 1{s≤T (k/2)0 }}+ P{1{t≤Tk} 6= 1{t≤T (k/2)k }
P{1{s≤T0}1{t≤Tk} 6= 1{s≤T (k/2)
}1{t≤T (k/2)
≤ P{1{s≤T0} 6= 1{s≤T (k/2)
} ∪ 1{t≤Tk} 6= 1{t≤T (k/2)
therefore the result follows from Lemma 1.
6. The r.v.’s {Ti}i∈Z, {T (k)i }i∈Z, and {Ti,n}
i=1 are associated ; the author
owes this observation to M. A. Lifshits.
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 17
Proof. Let us first recall the definition and some basic properties of as-
sociated variables. R.v.’s ξ1, . . . , ξm are associated if for any coordinate-wise
nondecreasing functions f, g :Rm →R, it is true that
cov(f(ξ1, . . . , ξm), g(ξ1, . . . , ξm))≥ 0
(assuming that the left-hand side is well defined). An infinite set of r.v.’s is
associated if any finite subset of its variables is associated.
The following sufficient conditions of association are well known; see [7].
(a) Independent variables are associated.
(b) Coordinate-wise nondecreasing functions (of finite number of argu-
ments) of associated r.v.’s are associated.
(c) If the variables ξ1,k, . . . , ξm,k are associated for every k and (ξ1,k, . . . ,
ξm,k)
D−→ (ξ1, . . . , ξm) as k→∞, then ξ1, . . . , ξm are associated.
(d) If two sets of associated variables are independent, then the union of
these sets is also associated.
Then {Ti,n}n−1i=1 are associated for every n by (a), (b) and (20). Analo-
gously, {T (k)i }i∈Z are associated for every k. Finally, since T
i → Ti a.s. as
k→∞ for every i, (c) ensures the association of {Ti}i∈Z.
7. For any s, t ∈R and k ∈ Z,
cov(1{T0≤s},1{Tk≤t})≥ 0.(23)
Proof. This inequality follows from cov(1{T0≤s},1{Tk≤t}) = cov(1{s<T0},
1{t<Tk}), the association of T0, Tk and (b).
8. If EX
i <∞ for some γ ≥ 2, then the stationary sequence min{Ti, t}
is strongly mixing for any t < 1 and its coefficients of strong mixing α(k)
satisfy α(k) = o(k2−γ).
Proof. Recall that stationary r.v.’s ξi are strongly mixing if α(k)→ 0 as
k→∞, where α(k) are the coefficients of strong mixing defined as
α(k) := sup
A∈F0−∞,B∈F
|P(AB)− P(A)P(B)|;
here F0−∞ := σ(ξ0, ξ−1, . . .) and F∞k := σ(ξk, ξk+1, . . .) are the σ-algebras of
“past” and “future,” respectively. It is readily seen that
α(k)≤ sup
0≤f,g≤1
| cov(f(ξ0, ξ−1, . . .), g(ξk, ξk+1, . . .))|,(24)
where the supremum is taken over Borel functions f, g :R∞ → [0,1].
Let us estimate α(k) in the same way we estimated the left-hand side
of (21). Fix some Borel functions f, g : R∞ → [0,1]. We approximate the
variables from the “past” T0∧ t, T−1∧ t, T−2∧ t, . . . by T (k/2)0 ∧ t, T
(k/2+1)
−1 ∧ t,
(k/2+2)
−2 ∧ t, . . . , respectively; and for the variables from the “future,” we
18 V. V. VYSOTSKY
use the analogous approximation. Now, f(T
(k/2)
0 ∧ t, T
(k/2+1)
−1 ∧ t, . . .) and
(k/2)
k ∧ t, T
(k/2+1)
k+1 ∧ t, . . .) are independent because the first is a function
of {Xi}i≤k/2 and the second is a function of {Xi}i≥k/2+1. We then argue in
the same way as in (22) to get
| cov(f(T0 ∧ t, T−1 ∧ t, . . .), g(Tk ∧ t, Tk+1 ∧ t, . . .))|
(T−i ∧ t) 6= (T (k/2+i)−i ∧ t)
(Tk+i ∧ t) 6= (T (k/2+i)k+i ∧ t)
i=k/2
P{(T0 ∧ t) 6= (T (i)0 ∧ t)}.
Now, by the formula of total probability, we have
P{(T0 ∧ t) 6= (T (i)0 ∧ t)}
= P{(T0 ∧ t) 6= (T (i)0 ∧ t), T
0 ≥ t}+ P{(T0 ∧ t) 6= (T
0 ∧ t), T
0 < t}
≤ P{1{t≤T0} 6= 1{t≤T (i)
}}+ P{T0 6= T
0 , T
0 ≤ t}
and combining all the estimates together, by Lemma 1 (24) and arbitrariness
of f and g, we get α(k)≤ 8
i=k/2 o(i
1−γ) = o(k2−γ) if γ > 2. For γ = 2, we
get α(k)≤ 16
i=k/2 P{inf i≥M Sii < t
2}= o(1) using the same argument and
applying (13), (14), and Fact 2 instead of Lemma 1.
3.5. The last collision. We finish this section with a statement on the
convergence of the moments of the last collision.
Proposition 3. In the i.d. model, T lastn
P−→ 1 as n→∞ if EX2i <∞.
This result is well known for the Poisson model; see Giraud [8].
Proof of Proposition 3. Let us first prove that P{T lastn ≥ t}→ 0 as
n→∞ for all t > 1. Since T lastn =max1≤j≤n−1Tj,n, we have
P{T lastn ≥ t} ≤
P{Tj,n ≥ t}.(25)
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 19
By taking into account that the minima in (17) are nonpositive and by
arguing as in (18),
P{Tj,n ≥ t} ≤ P
1≤k≤j∨n−j
(k− i)(Xj+i+1 − t2) +Xj+1 − t2 ≥ 0
1≤k≤j∨n−j
(k− i+1)(Xi − t2)≥ 0
1≤k≤n/2
(Si − it2)≥ 0
We claim that (without any assumptions on the moments of Xi)
P{Tj,n ≥ t} ≤ P
i≥(t−1)/4tn
1 + t2
;(26)
recall that t > 1. Clearly, (26) follows if we check that
1≤k≤n/2
(Si − it2)≥ 0
i≥(t−1)/4tn
1 + t2
Assume the converse; then, by the nonnegativity of Si,
(Si − it2) =
(Si − it2) +
i=cn+1
(Si − it2)
(Scn − it2) +
i=cn+1
1 + t2
− it2
where c := t−1
. We estimate the last expression with
cnScn −
(cn)2
t2 − (n/2)
2 − (cn)2
2 − 1
n2 − 1/4− c
2 − 1
It is simple to check that the right-hand side is negative, thus we have a
contradiction.
Then from (25), (26) and Fact 2 it follows that P{T lastn ≥ t} =
i=1 o((cn)
−1) = o(1) for all t > 1.
Now let us prove that P{T lastn < t} → 0 as n → ∞ for all t < 1. Since
T lastn =max1≤j≤n−1Tj,n, we estimate
P{T lastn < t} ≤ P
n,n < t
P{1{t≤Tj√n,n} 6= 1{t≤T (
20 V. V. VYSOTSKY
In view of (10) and Lemma 1, the sum is
j=1 o(n
−1/2) = o(1), hence it
remains to check that the first probability in the last line tends to zero.
For a fixed n, all T
are independent because each one is a function of
{Xi}|j√n−i|≤√n/2 (to be precise, of Xj√n−√n/2+2, . . . ,Xj√n+√n/2). Thus,
n−1{T (
n/2)√
< t} ≤ P
n−1{T0 < t},
which tends to zero; indeed, P{T0 < t}< 1 by Property 3, Section 3.4. �
4. Proofs of Fact 1 and Theorem 1 for the i.d. model. Recall that the
number of clusters Kn(t) is given by (6). Our idea is to study
i=1 1{t<Ti}
instead of
i=1 1{t<Ti,n}: We thus deal with a single sequence Ti and avoid
considering the triangular array Ti,n.
Let us now prove Fact 1 for the i.d. model. We prove (1) for t 6= 1 without
any additional assumptions on Xi; for t = 1, we require EX
i < ∞. The
properties of the limit function a(t) were studied in Section 3.4, Properties
3 and 4.
Proof of Fact 1. We put
a(t) := P{T0 > t}.(27)
Let us first prove (1) for all t < 1. It is sufficient to check that
Kn(t)
1{t<Ti}
P−→ 0, n→∞.(28)
Indeed, the stationary sequence 1{t<Ti} satisfies the law of large numbers by
Property 5, Section 3.4, and the well-known result of S. N. Bernstein:
Fact 3. The law of large numbers holds for r.v.’s ξi if there exists a
sequence r(k)→ 0 such that cov(ξi, ξj)≤ r(|i− j|) for all i, j ∈N.
By (6),
Kn(t)
1{t<Ti}
(1{t<Ti,n} − 1{t<Ti}),
where we used (10) to get the nonnegativity of the right-hand side. Then
(28) immediately follows from the Chebyshev inequality provided that the
expectation of the right-hand side tends to zero. By using (10), we obtain
E(1{t<Ti,n} − 1{t<Ti})≤
(E1{t<T (i∧n−i)
} − E1{t<Ti})
P{1{t<Ti} 6= 1{t<T (i∧n−i)
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 21
which is 2
i=1 o(1) = o(1) by Lemma 1. To be very precise, Lemma 1
deals with slightly different indicators, but we can estimate the considered
probability by repeating the proof of Lemma 1 word for word (or just use
Property 4, Section 3.4).
We now check that (1) holds for all t > 1. Using (26) gives E
Kn(t)
i=1 P{Ti,n > t} → 0 as n→∞ and
Kn(t)
P−→ a(t) = 0 follows from the
Chebyshev inequality.
It remains to check that (1) holds for t = 1 if EX2i < ∞ to conclude
the proof. If DXi = 0, then the situation is deterministic, this case was
described in Introduction. Here we always have Kn(1) = 1 and (1) is true.
If 0<DXi <∞, then by Property 3 from Section 3.4, we have a(1) = 0 and
P{T0 = 1} = 0; consequently, a(t) = P{T0 > t} is continuous at t= 1. Then
(1) is true for t = 1 since 0 <
Kn(1)
≤ Kn(t)
P−→ a(t) for any t ∈ (0,1) and
a(t)→ a(1) = 0 as tր 1. �
Now we prove Theorem 1 for the i.d. model. We think of D[0,1] as of a
separable metric space equipped with the Skorohod metric d, which induces
the Skorohod topology.
Proof of Theorem 1. At first, we prove (2). In view of representation
(6) for Kn(t), relation (2) follows from the relation
0≤t≤1−ε
1{t<Ti,n} −
1{t<Ti}
P−→ 0 for all ε ∈ (0,1)(29)
and the existence of a centered Gaussian process K(·) on [0,1) such that
1{t<Ti} −na(t)
D−→K(·) in D[0,1− ε] for all ε ∈ (0,1).(30)
Indeed, if Yn
D−→ Y and d(Yn, Y ′n)
P−→ 0 for some random elements Yn, Y ′n, Y
of the separable metric spaceD[0,1−ε], then Y ′n
D−→ Y ; recall that d(Yn, Y ′n)≤
supt∈[0,1−ε] |Yn(t)− Y ′n(t)|.
We start with (29). It is sufficient to prove that the expectation of the
left-hand side tends to zero. Since the supremum of a sum does not exceed
the sum of suprema, let us check that
E sup
0≤t≤1−ε
|1{t<Ti,n} − 1{t<Ti}| −→ 0 for all ε ∈ (0,1).(31)
By (10), we have
E sup
0≤t≤1−ε
|1{t<Ti,n} − 1{t<Ti}| ≤ E sup
0≤t≤1−ε
(1{t<T (i∧n−i)
} − 1{t<Ti})
22 V. V. VYSOTSKY
= P{Ti 6= T (i∧n−i)i , Ti ≤ 1− ε}
= P{Ti 6= T (i∧n−i)i , T
(i∧n−i)
i < 1− ε}
+ P{1{1−ε≤Ti} 6= 1{1−ε≤T (i∧n−i)
where the last equality was obtained via the formula of total probability.
Combining the estimates together and using Lemma 1,
E sup
0≤t≤1−ε
|1{t<Ti,n} − 1{t<Ti}|
≤ 2ρ(1− ε)√
(i ∧ n− i)1−γ = 4ρ(1− ε)√
i1−γ .
The last expression is O(n3/2−γ) and (31), which implies (29), follows.
Now let us prove (30). As long as
Un(t) :=−
1{t<Ti} − na(t)
1{Ti≤t} − (1− a(t))
the Un(·) is the empirical process of stationary r.v.’s Ti with the continuous
common distribution function 1− a(t). By K(·) D=−K(·), (30) is equivalent
to the existence of a centered Gaussian process K(·) on [0,1) such that
Un(·)
D−→K(·) in D[0,1− ε] for all ε ∈ (0,1).(32)
We will use the following result from Lin and Lu [17], Section 12 on
convergence of empirical processes. They attribute this statement to Q.-M.
Shao, who published it in 1986, in Chinese.
Fact 4. Let ξi be a sequence of stationary strongly mixing r.v.’s dis-
tributed on [0,1], and let F be the common distribution function of ξi.
Suppose F (x) = x on [0,1] (i.e., ξi are uniformly distributed) and the co-
efficients of strong mixing of the sequence F (ξi) decrease as O(k
−(2+δ))
as k → ∞ for some δ > 0. Then the empirical processes of ξi weakly con-
verge in D[0,1] to a centered Gaussian process with the covariance function
i∈Z cov(1{ξ0≤s},1{ξi≤t}).
Remark. The limit Gaussian process is a.s. continuous on [0,1]. Fact 4
also holds true if F is an arbitrary continuous distribution function.
The a.s. continuity of the limit process could be concluded by a compar-
ison of the proof from Lin and Lu [17] with the proof of Theorem 22.1 from
Billingsley [3]. The statements and the proofs of these theorems are identical,
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 23
but Lin and Lu do not state the continuity while Billingsley does. Further,
since F (ξi) is uniformly distributed on [0,1] if F is continuous, Fact 4 holds
true for every continuous F ; see the proof of Theorem 22.1 by Billingsley [3]
for explanations.
Recall that we need to prove the convergence of the empirical process of
Ti. It seems that the r.v.’s Ti are not strongly mixing; but min{Ti,1− ε} are
strongly mixing because of Property 8, Section 3.4. These variables are not
continuous and so we need to fix them. Let us fix an ε ∈ (0,1), and let αi
be i.i.d. r.v.’s independent of all Ti and, say, uniformly distributed on [0, ε];
we define T̃i := min{Ti,1− ε}+ 1{Ti≥1−ε}αi.
The stationary variables T̃i are distributed on [0,1], their common dis-
tribution function G is continuous, and the coefficients of strong mixing
of G(T̃i) decrease as o(k
2−γ). The proof of the last statement is the same
as the proof of Property 8 from Section 3.4. Indeed, approximate the vari-
ables G(T̃0),G(T̃−1), . . . from the “past” by G(T̃
(k/2)
0 ),G(T̃
(k/2+1)
−1 ), . . . where
i := min{T
i ,1− ε}+ 1{T (m)
≥1−ε}αi; use the analogous approximation
for the variables from the “future”; and then repeat word for word the ar-
guments of the previous proof.
Now, recalling that γ > 4, we see that T̃i satisfy the assumptions of Fact 4,
with the only difference that their distribution is not uniform. By Ũn(·)
denote the empirical process of T̃i; clearly, Ũn(·) coincides with the empirical
process Un(·) of Ti on [0,1− ε]. By the remark to Fact 4, we conclude that
first,
Ũn(·)
D−→ K̃(·) in D[0,1],(33)
where K̃(·) is a centered Gaussian process with the covariance function
R̃(s, t) :=
cov(1{T̃0≤s},1{T̃i≤t})
and, second, trajectories of K̃(·) are a.s. continuous on [0,1].
[There exists a simpler and more elegant proof of (33). Note that {T̃i}i∈Z
are associated as coordinate-wise nondecreasing functions of associated r.v.’s
{Ti, αi}i∈Z, see (a), (b) and (d) from Property 6, Section 3.4. Then we can
obtain (33) applying the result of Louhichi [18] on convergence of empirical
processes of stationary associated r.v.’s ξi instead of using Fact 4. This the-
orem requires only cov(F (ξ0), F (ξk)) = O(k
−(4+δ)), which could be proved
analogously to Property 5, Section 3.4. Thus we avoid the complicated esti-
mations of the strong mixing coefficients, and the proof of (33) is becomes
much simpler. The only problem is that this proof requires γ > 5.
We also note that the a.s. continuity of K̃(·) could be proved directly,
without referring to the proof of Fact 4. The arguments should be the same
as in the proof of the continuity of KUnif(·) in Section 5.]
24 V. V. VYSOTSKY
Define
R(s, t) :=
cov(1{T0≤s},1{Ti≤t}),(34)
which is, evidently, equal to R̃(s, t) on [0,1 − ε]2. Since R̃(s, t) is positive
definite and ε > 0 is arbitrary, the function R(s, t) is positive definite on
[0,1)2. Therefore, by Lifshits [15], Section 4, there exists a centered Gaussian
process K(·) on [0,1) with the covariance function R(s, t). The trajectories
of K(·) are a.s. continuous on [0,1) by K(·) D= K̃(·) on [0,1−ε], arbitrariness
of ε > 0, and the a.s. continuity of K̃(·) on [0,1].
Finally, by (33), Ũn(·) = Un(·) on [0,1− ε], K̃(·) D=K(·) on [0,1− ε], and
the a.s. continuity of K̃(·), we get (32). Since (32) implies (30), we conclude
the proof of (2).
Only the stated properties of R(s, t) remain to be proven. The continuity
of the joint distribution function of continuous variables T0 and Ti implies
that cov(1{T0≤s},1{Ti≤t}) is continuous on [0,1)
2 for every i ≥ 0. Then, in
view of (21), R(s, t) is continuous on [0,1)2 as a sum of uniformly converging
series of continuous functions.
The strict positivity of R(s, t) on (
µ,1)2 trivially follows from (34), (23)
and cov(1{T0≤s},1{T0≤t}) = a(s∨ t)(1−a(s∧ t)) > 0; the last inequality holds
by Property 3, Section 3.4. The R(s, t) = 0 on [0,1)2 \ (√µ,1)2 follows from
P{Ti ≤
µ}= 0, which holds by Properties 3 and 4 from Section 3.4. �
We note that (3) holds for t 6= 1 under the less restrictive condition EX2i <
∞. For t < 1, the proof is almost the same: By (29), which is true for γ > 3/2,
we conclude that (3) holds if the stationary associated sequence 1{t<Ti}
satisfies the central limit theorem. Then we refer to the central limit theorem
for stationary associated sequences from Newman [21]; his theorem requires
only R(t, t)<∞, that is, the convergence of the right-hand side of (34). This
condition holds by (13) and Fact 2. For t > 1, relation (3) holds true with
σ2(t) = 0 because of Proposition 3.
Finally, note that the process K(·) is associated, that is, the r.v.’s
{K(t)}t∈[0,1) are associated. In fact, by (6), Property 6 from Section 3.4,
and Condition (b) from the same Property 6, the processes
Kn(·)−na(·)√
associated for every n. Then K(·) is associated by (2) and (c), Property 6.
5. Proof of Theorem 1 for the uniform model. There exists a simple
method that allows to extend results from the Poisson model to the uniform
model and vise versa. The method is based on the next statement (see
Karlin [13], Section 9.1).
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 25
Fact 5. Let Si be an exponential random walk. Then for any k ≥ 1, we
, . . . ,
= (U1,k,U2,k, . . . ,Uk,k),(35)
where Ui,k are the order statistics of k i.i.d. r.v.’s uniformly distributed on
[0,1]. Moreover, the random vector in the left-hand side of (35) is indepen-
dent of Sk+1.
Therefore, if xPoissj,n (0) =
Sj are the initial positions of particles in the
Poisson model, then for the initial positions of particles in the uniformmodel,
we have xUnifj,n (0) =
· xPoissj,n (0). By Proposition 2 and (5), we conclude
TUnifj,n = β
Poiss
j,n , βn :=
,(36)
and hence, using (6), we get
KUnifn (t) =K
Poiss
n (βnt).(37)
Note that the process KUnifn (·) and the r.v. βn are independent since val-
ues of the process are defined by xUnif1,n (0), . . . , x
n,n (0), which are mutually
independent of βn by Fact 5.
Now we prove Theorem 1 for the uniform model.
Proof of Theorem 1. Denote
Yn(t) :=
KUnifn (t)− na(t)√
, Zn(t) :=
n(a(t)− a(βnt));
we stress that Yn(·) and Zn(·) are independent.
Fix an ε ∈ (0,1). First, it follows from (2) for the Poisson model and (37)
Yn(·) +Zn(·)
D−→KPoiss(·) in D[0,1− ε].(38)
Indeed, the process Yn(·) +Zn(·) is obtained from 1√n(K
Poiss
n (·)− na(·)) by
the random time change t 7→ βnt; and since ‖βnt− t‖C[0,1−ε]
P−→ 0, we have
Yn(·) +Zn(·),
KPoissn (·)− na(·)√
P−→ 0
by the definition of the Skorohod metric d.
Second, from Fact 1, (15), and (27) it follows that aUnif(t) = aPoiss(t) =
P{TPoiss0 ≥ t}= 1− t2 for 0≤ t≤ 1, and by the central limit theorem,
Zn(t)
D−→ t2η in D[0,1− ε],(39)
26 V. V. VYSOTSKY
where η is a standard Gaussian r.v.
We claim that (38), the independence of Yn(·) and Zn(·), and (39) yield
the weak convergence of Yn(·) in D[0,1− ε]. Let us check the tightness of
Yn(·) and the convergence of their finite-dimensional distributions.
The tightness of Yn(·) in D[0,1− ε] follows from Yn(·) = (Yn(·)+Zn(·))−
Zn(·), (38), and (39). Indeed, by the Prokhorov theorem, (38) and (39) yield
that both sequences Yn(·) + Zn(·) and −Zn(·) are tight. But trajectories
of −Zn(·) are a.s. continuous because of the continuity of a(·), and the
tightness follows from the continuity of addition + :D×C →D and the fact
that under any continuous mapping, the image of a compact set is also a
compact set.
Now we study convergence of finite dimensional distributions of Yn(·).
Recall that the characteristic function of a centered Gaussian vector in Rm
is e−1/2(Ru,u), where u ∈ Rm and R is the covariance matrix of the vector.
Then (38), the independence of Yn(·) and Zn(·), and (39) yield that for the
characteristic functions of all finite-dimensional distributions of Yn(·), we
Eei(Yn(t),u) −→ e−1/2({R
Poiss(tj ,tk)−t2j t
j,k=1
,(40)
where u ∈Rm, t= (t1, . . . , tm) ∈ [0,1−ε]m, and Yn(t) := (Yn(t1), . . . , Yn(tm)).
We stress that (40) is true for every t ∈ [0,1− ε]m since the limit processes
in (38) and (39) have continuous trajectories.
We see that the matrix {RPoiss(tj , tk)− t2j t2k}mj,k=1 is positive definite for
any t = (t1, . . . , tm) ∈ [0,1− ε]m and m≥ 1 since the absolute value of the
left-hand side of (40) does not exceed one. Putting
RUnif(s, t) :=RPoiss(s, t)− s2t2,
we have {RPoiss(tj , tk)− t2j t2k}mj,k=1 = {RUnif(tj , tk)}mj,k=1; then the function
RUnif(s, t) is positive definite on [0,1)2 since ε > 0 is arbitrary. Thus, by
Lifshits [15], Section 4, RUnif(s, t) is the covariance function of some centered
Gaussian process KUnif(·) on [0,1).
Relation (2) is thus proved. Now check that KUnif(·) ∈C[0,1− ε] a.s. to
conclude the proof of Theorem 1 for the uniform model.
For this purpose, let us prove that a.s., trajectories of Yn(·) have jumps
of size 1√
only. In fact, the jumps of Yn(·) coincide with the jumps of
KUnifn (·), whose jumps are of size 1√n if and only if T
6= TUnifj2,n for
1 ≤ j1 6= j2 ≤ n − 1. By (36), we need to verify that TPoissj1,n 6= T
Poiss
for 1 ≤ j1 6= j2 ≤ n − 1. This relation follows from (20) if H(k1, j1, l1) 6=
H(k2, j2, l2) a.s. for j1 6= j2 and k1, k2, l1, l2 ≥ 1. The last a.s. nonequality is
obvious because if the equality holds true, then a certain nontrivial linear
combination of i.i.d. exponential Xi equals zero.
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 27
Then there exist a.s. continuous Ỹn(·) such that supt∈[0,1−ε] |Ỹn(t)−Yn(t)| ≤
a.s.; consequently, d(Ỹn, Yn) ≤ 1√n a.s. Then by Yn(·)
D−→ KUnif(·), we
also have Ỹn(·)
D−→KUnif(·). But 1 = lim inf P{Ỹn(·) ∈C} ≤ P{KUnif(·) ∈C}
since C ⊂D is closed in the Skorohod topology, therefore, a.s., KUnif(·) is
continuous on [0,1− ε].
Since ε ∈ (0,1) is arbitrary, a.s., KUnif(·) is continuous on the whole inter-
val [0,1). The RUnif(s, t) =RPoiss(s, t)− s2t2 is continuous on [0,1)2 because
RPoiss(s, t) is. �
6. The number of clusters at the critical moment. Now we turn our
attention to the number of clusters at the critical moment t = 1. We are
interested in the behavior of
Kn(1)− na(1)√
Kn(1)√
which is the left-hand side of (3) at t = 1; here we have a(1) = 0 under
EX2i <∞, see Property 3, Section 3.4.
We do not know if this sequence is weakly convergent, but we hope that
it is. We also have a naive guess that its limit is Gaussian because the limit
in Theorem 1 is Gaussian. In view of Kn(1)≥ 1, this conjectured weak limit
is nonnegative, hence it is Gaussian if and only if it is identically equal to
zero. However, the results of this section show that the limit is nonzero, thus
our guess on Gaussianity fails.
The study of convergence of
Kn(1)√
is quite complicated. Therefore, in this
section, we consider only the Poisson model. First, let us prove the following
statement.
Proposition 4. In the Poisson model, we have limn→∞P{Kn(1) =
1}> 0.
Proof. On the one hand, Kn(1) = 1 is equivalent to T
n;Poiss ≤ 1, where
T lastn;Poiss denotes the moment of the last collision in the Poisson model. On
the other hand, a result by Giraud [8] states that in the uniform model,
n(T lastn;Unif − 1)
D−→ sup
0≤x≤1
W (y)dy −
W (y)dy
=: τ,
where
W (·) is a Brownian bridge. Now, by (36), we have T lastn;Unif = β−1n T lastn;Poiss,
hence
n(β−1n T
n;Poiss − 1)
D−→ τ.(41)
28 V. V. VYSOTSKY
But from the central limit theorem and the law of large numbers,
n(β−1n − 1) =−
Sn+1 − n√
Sn+1(
Sn+1 +
D−→ η
,(42)
where η is a standard Gaussian r.v. and Si is a standard exponential ran-
dom walk that defines initial positions of particles. Since, in view of Fact 5,
T lastn;Unif = β
n;Poiss and βn are independent, from (41), (42), and the law
of large numbers it follows that
n(T lastn;Poiss − 1)
D−→ τ − η
= τ +
where τ and η are independent. Thus,
P{Kn(1) = 1}= lim
P{T lastn;Poiss ≤ 1}= P
The main advantage of the Poisson model is that, by Lemma 2 and Prop-
erty 4, Section 3.4 we have P{Tj,n > 1}= epjpn−j , where
pk := P
1≤m≤k
(Si − ESi)≥ 0
and Si is a standard exponential random walk. We say that the sequence of
r.v.’s
i=1(Si−ESi) is an integrated random walk. In the proof of Property
3, Section 3.4, we showed that pk → 0 as k→∞. Therefore, it is reasonable
to say that pk are the unilateral small deviation probabilities of an integrated
centered random walk.
We need to obtain the asymptotics of pk → 0 to continue the study of
convergence of
Kn(1)√
. Unfortunately, the results of the rest of this section
are completely dependent on the correctness of the following conjecture.
Conjecture 1. We have pk ∼ c1k−1/4 as k→∞ for some c1 ∈ (0,∞).
Simulations show that the conjecture is true and c1 ≈ 0.36. The weaker
form pk ≍ k−1/4 of Conjecture 1 was proved by Sinai [22], but only for inte-
grated symmetric Bernoulli random walks. It also interesting to note that,
by McKean [19], the unilateral small deviation probabilities of an integrated
Wiener process have the same order as T →∞:
0≤s≤T
W (u)du≥−1
∼ c2T−1/4(43)
for some c2 ∈ (0,∞). The left-hand side of (43) is a unilateral small deviation
probability since
0≤s≤T
W (u)du≥−1
0≤s≤1
W (u)du≥−T−3/2
.(44)
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 29
To be precise, McKean was interested in a more general problem, and
some calculations are required to obtain (43) from his results. Therefore,
we additionally refer to Isozaki and Watanabe [12] who state (43) explic-
itly.
By the results mentioned above, we also suppose that Conjecture 1 is
true for other integrated centered random walks that satisfy some moment
conditions.
Now we are able to prove the following result on convergence of
Kn(1)√
Proposition 5. Suppose Conjecture 1 holds true. Then in the Poisson
model, we have
Kn(1)√
= c3, sup
Kn(1)√
<∞(45)
for some c3 ∈ (0,∞); the sequence Kn(1)√n is tight and uniformly integrable;
and the limit of any weakly converging subsequence of
Kn(1)√
takes value zero
with positive probability, but is not identically equal to zero.
Numerical simulations show that
Kn(1)√
is weakly convergent and that this
convergence is quite fast. In Figure 1 we present the (empirical) distribution
function of
Kn(1)√
for n = 10,000. Since the simulations performed for n =
40,000 showed a hardly perceptible difference, this function seems to be a
good candidate for the distribution function of the conjectured limit.
Note that if we weaken Conjecture 1 to pk ≍ k−1/4, then Proposition 5
still holds true with the only difference that E
Kn(1)√
Proof of Proposition 5. We start with the convergence of the ex-
pectation. On the one hand, by (6) and Lemma 2,
Kn(1)√
pipn−i,
and on the other hand,
i−1/4(n− i)−1/4 = 1
)−1/4(
)−1/4
−→B(3/4,3/4)
as the integral sum of Beta function. Then it follows from Conjecture 1 and
standard arguments that E
Kn(1)√
converges to c3 := ec
1B(3/4,3/4) > 0.
30 V. V. VYSOTSKY
Fig. 1. The distribution function of
Kn(1)√
for n= 10,000.
Now we check the uniform boundedness of E(
Kn(1)√
)2. By (6) it is sufficient
to prove that
i,j=1,i 6=j
P{Ti,n > 1, Tj,n > 1}<∞.(46)
Suppose i < j; then by using (8) and properties of Fk,j,l(·), we get
P{Ti,n > 1, Tj,n > 1}= P
1≤k≤n−i
1≤l≤i
Fk,i,l(1)> 0, min
1≤k≤n−j
1≤l≤j
Fk,j,l(1)> 0
1≤k≤(j−i)/2
1≤l≤i
Fk,i,l(1)> 0, min
1≤k≤n−j
1≤l≤(j−i)/2
Fk,j,l(1)> 0
where by (j − i)/2 we mean ⌈(j − i)/2⌉. The minima in the last expres-
sion are independent as functions of {Xm}m≤(i+j)/2 and {Xm}m≥(i+j)/2+1,
respectively; hence
P{Ti,n > 1, Tj,n > 1} ≤ P
1≤k≤(j−i)/2
1≤l≤i
Fk,i,l(1)> 0
1≤k≤n−j
1≤l≤(j−i)/2
Fk,j,l(1)> 0
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 31
= P{Ti,i+(j−i)/2 > 1} · P{T(j−i)/2,n−j+(j−i)/2 > 1}
= e2pip
⌈(j−i)/2⌉pn−j,
where the first equality follows from (8) and the second follows from Lemma 2.
Recalling Conjecture 1, we get
i,j=1,i 6=j
P{Ti,n > 1, Tj,n > 1} ≤
i,j=1,i 6=j
e2pip
⌈|j−i|/2⌉pn−j
i,j=1,i 6=j
i−1/4⌈|j − i|/2⌉−1/2(n− j)−1/4
i,j=1,i 6=j
)−1/4∣
−1/2(
)−1/4
for some c > 0. The last expression is an integral sum converging to
x−1/4|x− y|−1/2(1− y)−1/4 dxdy,
and it is a simple exercise to check that the integral is finite. This concludes
(46).
The uniform integrability of
Kn(1)√
follows from the second relation from
(45), see Billingsley [3], Section 5, and the tightness follows from the uniform
integrability.
Finally, suppose
Kni(1)√
D−→ ξ for some subsequence ni → ∞ and some
r.v. ξ. Then Eξ = c3 > 0 by the uniform integrability and (45), and hence
ξ is not identically equal to zero. But the distribution of ξ has an atom at
zero since by Proposition 4 and properties of weak convergence,
P{ξ = 0}= lim
P{ξ ≤ ε}
≥ lim
lim sup
Kni(1)√
≥ lim
P{Kni(1) = 1}> 0. �
7. Open questions. 1. The number of clusters at the critical moment
t= 1.
Here the main question is if Conjecture 1 holds true. Even by itself, this
problem is worth studying.
But even if Conjecture 1 is true, we still do not have a proof of weak
convergence of
Kn(1)√
, it is only known that this sequence is tight. The author
32 V. V. VYSOTSKY
strongly believes, relying on numerical simulations, that the limit exists. It
would be interesting to find this conjectured limit, which should be nontrivial
by Proposition 5, in an explicit form.
2. The weak convergence of
Kn(·)−na(·)√
on the whole interval [0,1].
It is very natural to ask if it is possible to strengthen Theorem 1 by
proving the weak convergence of
Kn(·)−na(·)√
in D[0,1]. This complicated
problem returns us again to Question 1 because the weak convergence of
Kn(·)−na(·)√
in D[0,1] implies the weak convergence of
Kn(1)−na(1)√
Kn(1)√
, see
Billingsley [3], Section 15. But even if
Kn(1)√
converges, its weak limit K(1)
is not Gaussian, hence the limit process K(·), which is Gaussian on [0,1),
is no more Gaussian on [0,1]. Therefore, it is doubtful that Theorem 1 is
true in D[0,1]; at least, one should provide a proof completely different from
the presented one. Also, it is unclear how to define the finite-dimensional
distributions of the non-Gaussian K(·) on [0,1] because simulations show
that K(1) would not be independent with K(t) for t < 1.
3. The number of clusters in the warm gas.
In the presented case, initial speeds of particles are zero. This model is of-
ten called the cold gas according to its zero initial temperature. We introduce
a new model stating that initial speeds of particles are anv1, anv2, . . . , anvn,
where vi are some i.i.d. r.v.’s and an is a sequence of normalization con-
stants. This model, called the warm gas, was studied in many papers, for
example, [14, 16, 20, 25].
It is of a great interest to study the behavior of Kn(t) in the warm gas.
In [25], the author proved that in the basic case where an = 1 for all n and
Ev2i <∞, we have
Kn(t)
P−→ 0 for all t > 0. The question is to find a normal-
ization of Kn(t) leading to some nontrivial limit. Clearly, this normalization
depends on an, but it is very possible that there is an effect of phase tran-
sition similar to the one discovered by Lifshits and Shi [16]: If an are small
enough, then the gas has a low temperature and the normalization is the
same as in the cold gas. If an are big enough, as in the basic case an ≡ 1,
then the normalization and the behavior of the gas differ entirely from the
case of the cold gas.
The author believes that the localization property, which is described in
Section 3, could be helpful in a study of these questions.
It is also interesting to compare the behavior of Kn(1) in the warm and in
the cold gases; in the warm gas, the moment t= 1 plays the same “critical”
role as in the cold gas, see Lifshits and Shi [16]. The variable Kn(1) was
studied by Suidan [24], who considered the warm gas with an ≡ 1 and deter-
ministic initial positions of particles (his initial positions were 1
, . . . , n
For this case, Suidan found the distribution of Kn(1) and showed that
EKn(1)∼ logn. Recall that in the presented case, EKn(1)∼ c3
CLUSTERING IN A STOCHASTIC MODEL OF ONE-DIMENSIONAL GAS 33
4. The number of clusters in ballistic systems of sticky particles.
A sticky particles model is called ballistic if it evolves according to the
laws introduced in Section 1, but in the absence of gravitation. Such models
are, in some sense, more natural than gravitational ones because the ba-
sic assumption that gravitation does not depend on distance is sometimes
confusing. However, an unpublished paper of Lifshits and Kuoza shows that
certain gravitational and ballistic models are tightly connected.
It seems interesting to study the number of clusters in the ballistic model.
The author does not know any results in this field.
Acknowledgments. I am grateful to my adviser Mikhail A. Lifshits for
drawing my attention into the subject and for his guidance. I also thank the
anonymous referees for carefully reading this paper and useful comments.
REFERENCES
[1] Baum, L. E. and Katz, M. (1965). Convergence rates in the law of large numbers.
Trans. Amer. Math. Soc. 120 108–123. MR0198524
[2] Bertoin, J. (2002). Self-attracting Poisson clouds in an expanding universe. Comm.
Math. Phys. 232 59–81. MR1942857
[3] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.
MR0233396
[4] Brenier, Y. and Grenier, E. (1998). Sticky particles and scalar conservation laws.
SIAM J. Numer. Anal. 35 2317–2328. MR1655848
[5] Chertock, A., Kurganov, A. and Rykov, Yu. (2007). A new sticky particle
method for pressureless gas dynamics. SIAM J. Numer. Anal. 45 2408–2441.
MR2361896
[6] E, W., Rykov, Yu. G. and Sinai, Ya. G. (1996). Generalized variational principles,
global weak solutions and behavior with random initial data for systems of
conservation laws arising in adhesion particle dynamics. Comm. Math. Phys.
177 349–380. MR1384139
[7] Esary, J. D., Proschan, F. and Walkup, D. W. (1967). Association of random
variables, with applications. Ann. Math. Stat. 38 1466–1474. MR0217826
[8] Giraud, C. (2001). Clustering in a self-gravitating one-dimensional gas at zero tem-
perature. J. Statist. Phys. 105 585–604. MR1871658
[9] Giraud, C. (2005). Gravitational clustering and additive coalescence. Stochastic Pro-
cess. Appl. 115 1302–1322. MR2152376
[10] Gurbatov, S. N., Malakhov, A. N. and Saichev, A. I. (1991). Nonlinear Ran-
dom Waves and Turbulence in Nondispersive Media: Waves, Rays, Particles.
Manchester Univ. Press. MR1255826
[11] Gurbatov, S. N., Saichev, A. I. and Shandarin, S. F. (1989). The large-scale
structure of the universe in the frame of the model equation of nonlinear diffu-
sion. Mon. Not. R. Astr. Soc. 236 385–402.
[12] Isozaki, Y. and Watanabe, S. (1994). An asymptotic formula for the Kolmogorov
diffusion and a refinement of Sinai’s estimates for the integral of Brownian mo-
tion. Proc. Japan Acad. Ser. A Math. Sci. 70 271–276. MR1313176
[13] Karlin, S. (1968). A First Course in Stochastic Processes. Academic Press, New
York. MR0208657
http://www.ams.org/mathscinet-getitem?mr=0198524
http://www.ams.org/mathscinet-getitem?mr=1942857
http://www.ams.org/mathscinet-getitem?mr=0233396
http://www.ams.org/mathscinet-getitem?mr=1655848
http://www.ams.org/mathscinet-getitem?mr=2361896
http://www.ams.org/mathscinet-getitem?mr=1384139
http://www.ams.org/mathscinet-getitem?mr=0217826
http://www.ams.org/mathscinet-getitem?mr=1871658
http://www.ams.org/mathscinet-getitem?mr=2152376
http://www.ams.org/mathscinet-getitem?mr=1255826
http://www.ams.org/mathscinet-getitem?mr=1313176
http://www.ams.org/mathscinet-getitem?mr=0208657
34 V. V. VYSOTSKY
[14] Kuoza, L. V. and Lifshits, M. A. (2006). Aggregation in one-dimensional gas model
with stable initial data. J. Math. Sci. 133 1298–1307. MR2092206
[15] Lifshits, M. A. (1995). Gaussian Random Functions. Kluwer, Dordrecht.
MR1472736
[16] Lifshits, M. and Shi, Z. (2005). Aggregation rates in one-dimensional stochastic
systems with adhesion and gravitation. Ann. Probab. 33 53–81. MR2118859
[17] Lin, Z. and Lu, C. (1996). Limit Theory for Mixing Dependent Random Variables.
Kluwer, Dordrecht. MR1486580
[18] Louhichi, S. (2000). Weak convergence for empirical processes of associated se-
quences. Ann. Inst. H. Poincare Probab. Statist. 36 547–567. MR1792655
[19] McKean, H. P. (1963). A winding problem for a resonator driven by a white noise.
J. Math. Kyoto Univ. 2 227–235. MR0156389
[20] Martin, Ph. A. and Piasecki, J. (1996). Aggregation dynamics in a self-gravitating
one-dimensional gas. J. Statist. Phys. 84 837–857. MR1400187
[21] Newman, C. M. (1980). Normal fluctuations and the FKG inequalities. Comm.
Math. Phys. 74 119–128. MR0576267
[22] Sinai, Ya. G. (1992). Distribution of some functionals of the integral of a random
walk. Theor. Math. Phys. 90 219–241. MR1182301
[23] Shandarin, S. F. and Zeldovich, Ya. B. (1989). The large-scale structure of the
universe: Turbulence, intermittency, structures in a self-gravitating medium.
Rev. Modern Phys. 61 185–220. MR0989562
[24] Suidan, T. M. (2001). A one-dimensional gravitationally interacting gas and the con-
vex minorant of Brownian motion. Russ. Math. Surv. 56 687–708. MR1861441
[25] Vysotsky, V. V. (2006). On energy and clusters in stochastic systems of sticky
gravitating particles. Theory Probab. Appl. 50 265–283. MR2221711
[26] Vysotsky, V. V. (2007). The area of exponential random walk and partial sums of
uniform order statistics. J. Math. Sci. 147 6873–6883.
Department of Probability Theory
and Mathematical Statistics
Faculty of Mathematics and Mechanics
St. Petersburg State University
Bibliotechnaya pl. 2
Stary Peterhof 198504
Russia
E-mail: vlad.vysotsky@gmail.com
http://www.ams.org/mathscinet-getitem?mr=2092206
http://www.ams.org/mathscinet-getitem?mr=1472736
http://www.ams.org/mathscinet-getitem?mr=2118859
http://www.ams.org/mathscinet-getitem?mr=1486580
http://www.ams.org/mathscinet-getitem?mr=1792655
http://www.ams.org/mathscinet-getitem?mr=0156389
http://www.ams.org/mathscinet-getitem?mr=1400187
http://www.ams.org/mathscinet-getitem?mr=0576267
http://www.ams.org/mathscinet-getitem?mr=1182301
http://www.ams.org/mathscinet-getitem?mr=0989562
http://www.ams.org/mathscinet-getitem?mr=1861441
http://www.ams.org/mathscinet-getitem?mr=2221711
mailto:vlad.vysotsky@gmail.com
	Introduction
	Description of the model
	Statement of the problem and the results
	Organization of the paper
	Method of barycenters
	Study of the i.d. model. The localization property
	The initial study
	Localization property of the aggregation process
	The distribution function of T0 in the Poisson model
	Some properties of the variables Ti
	The last collision
	Proofs of Fact 1 and Theorem 1 for the i.d. model
	Proof of Theorem 1 for the uniform model
	The number of clusters at the critical moment
	Open questions
	Acknowledgments
	References
	Author's addresses
ABSTRACT
  We give a quantitative analysis of clustering in a stochastic model of
one-dimensional gas. At time zero, the gas consists of $n$ identical particles
that are randomly distributed on the real line and have zero initial speeds.
Particles begin to move under the forces of mutual attraction. When particles
collide, they stick together forming a new particle, called cluster, whose mass
and speed are defined by the laws of conservation. We are interested in the
asymptotic behavior of $K_n(t)$ as $n\to \infty$, where $K_n(t)$ denotes the
number of clusters at time $t$ in the system with $n$ initial particles. Our
main result is a functional limit theorem for $K_n(t)$. Its proof is based on
the discovered localization property of the aggregation process, which states
that the behavior of each particle is essentially defined by the motion of
neighbor particles.

<|endoftext|><|startoftext|>
Introduction
Let Hm and Hn are hyperbolic spaces with dimensions m ≥ 2 and n correspond-
ingly. For convenience, we use the upper-half space models for Hm and Hn. So
Hm = {(x1, ..., xm) ∈ IRm : xm > 0}, Hn = {(y1, ..., yn) ∈ IRn : yn > 0} with
metrics
d2Hm =
(xm)2
((dx1)2 + ...+ (dxm)2),
d2Hn =
(yn)2
((dy1)2 + ...+ (dyn)2).
So the tension fields of u = (y1, ..., yn) is
τα = (xm)2(∆0y
< ∇0y
α,∇0y
n >),
for 1 ≤ α ≤ n− 1 and
τn(u) = (xm)2(∆0y
α|2 − |∇0y
n|2)),
where ∇0 is the Euclidean gradient and ∆0 is the Euclidean Laplacian.
A C2 map u : Hm → Hn is called a harmonic map if τ(u)s = 0 for all s =
1, 2, ..., n. The literature about harmonic maps between Riemannian manifolds are
abundant, we refer the readers to the classical work [4].
One of the interesting problems for harmonic maps is that of the Dirichlet prob-
lem at infinity: Given ∂Hm and ∂Hn geometric boundaries of Hm and Hn, and
given a continuous map f : ∂M → ∂N (here continuity is understood in the sense
Date: October 22, 2018.
2000 Mathematics Subject Classification. 53A35.
Key words and phrases. Dirichlet problems; Harmonic functions; Hyperbolic spaces.
This work has been initiated when the second author was at Department of mathematics,
University of natural sciences, Hochiminh city, Vietnam. He would like to thank Professor Dang
Duc Trong for his many invaluable helps. He also would like to express his thankfulness to
Professor F. Helein, Professor R. Schoen, and Mr. Le Quang Nam for their generous help.
http://arxiv.org/abs/0704.0087v2
2 DUONG MINH DUC AND TRUONG TRUNG TUYEN
of Euclidean), is there a harmonic map u : Hm → Hn such that in Euclidean sense
u is continuous up to the boundary ∂Hm and takes boundary value f?
For this problem with some more requirements for the smoothness of f , there are
many results. In three papers [8], [9] and [7], Li and Tam established the existence
and uniqueness of a harmonic function u which is C1 up to the boundary and has
boundary value f , provided f is C1. But for more general types of f , according to
our knowledge, there is no answer to the existence of a solution u.
In this paper we establish the existence of approximate solutions to the Dirichlet
problem for harmonic maps between two hyperbolic spaces with prescribed bound-
ary value. More explicitly, we prove the following result
Theorem 1. Let f : Hm → Hn be a bounded uniformly continuous. Let functions
g and ϕ be as in Section 2. Assume that
t−1g(t)dt < ∞, in particular, this
condition is satisfied if f is Holder continuous. For each ǫ > 0, there exists a
harmonic map uǫ : H
m → Hn which is continuous up to the boundary ∂Hm and
u|∂Hm = (f
1, ..., fn−1, ǫ).
Our strategy for proving this result is the follows: First, we construct an initial
map, i.e., a C2 map v = (v1, ..., vn−1, vn) : Hm → Hn which has boundary value
f for any continuous map f : ∂Hm → Hn. For this step we follows the ideas in
[9], with some changes: Since the function f needs not to differentiable, we can
not take vn as in [9], and the function vn of ours is a function of one variable xm.
Then, we use this function to produce harmonic maps uǫ : H
m → Hn which takes
boundary value (f1, ..., fn−1, vn + ǫ) for every ǫ > 0.
2. Initial maps
In this part, we use the techniques in [9] to construct good initial maps v having
the map f : ∂Hm → ∂Hn as the boundary value.
Let f : IRm−1 → IRn−1 be a uniformly continuous bounded function. Let
g : Hm → (0.∞) be C2, bounded and
g(x′, xm) = 0,
uniformly in x′.
We denote by v = {f, g} : Hm → Hn the extension of f defined as follows
vα(x′, xm) =
xmfα(y′)
(|x′ − y′|2 + (xm)2)m/2
for 1 ≤ α ≤ n− 1 and
vn(x′, xm) = g(x′, xm).
By results in [9] (pp. 628-630) we have
(i) v is C2 and up to the boundary given by xm = 0 it is continuous.
(ii) If 1 ≤ α ≤ n− 1 then
xm|∇0v
α| = 0,
uniformly in x′.
Moreover, by estimates of elliptic PDEs (see Theorem 2.10 in [5]), noting that
vα is bounded, there exists constants C > 0 such that
(2.1) max{(xm)3|D3vα|, (xm)2|D2vα|, (xm)|∇0v
α|} ≤ C.
APPROXIMATE SOLUTIONS TO THE DIRICHLET PROBLEM FOR HARMONIC MAPS BETWEEN HYPERBOLIC SPACES3
We put
g(r) = sup
x′,y′∈IRm−1, |x′−y′|≤r
|f(y′)− f(x′)|,
ϕ(r) =
s2 + r2
g(s)ds.
Since g is monotone it follows that g is Lebesgue measurable. Moreover, since g is
bounded, we see that ϕ is well-defined.
Using polar coordinates with center at x′ we see that there exists a constant
C > 0 such that ∫
IRm−1
xm|f(y′)− f(x′)|
(|x′ − y′|2 + (xm)2)m/2
≤ Cϕ(xm),
for all x′ ∈ IRm−1.
Since f is uniformly continuous we see that
g(r) = 0.
Now we show that
ϕ(xm) = 0.
Indeed, for any ǫ > 0, we find δ > 0 such that
g(s) ≤ ǫ,
if 0 < s ≤ δ. So, if K = sup
s∈IR g(s) we have
ϕ(r) =
s2 + r2
g(s)ds+
s2 + r2
g(s)ds
s2 + r2
s2 + r2
= ǫ arctan(δ/r) +K(π/2− arctan(δ/r)).
Letting r → 0 we see that
lim sup
ϕ(r) ≤ ǫπ/2.
Since ǫ > 0 is arbitrary, we see that
ϕ(r) = 0.
Thus, if we put v = {f, ϕ(xm)} we see that v is an extension of f . Moreover we
have the following result
Lemma 1. Let f : ∂Hm → ∂Hn be nonconstant, uniformly continous and
bounded. Put v = {f, ϕ(xm)} as above. Then v is smooth, up to the boundary
it is continuous, v|IRm−1 = f and there exists C > 0 such that for x
m near 0 we
||τ(v)||2 ≤ C.
4 DUONG MINH DUC AND TRUONG TRUNG TUYEN
Proof. By Section 6 in [9] we have
|(xm)∇0v
α| ≤ C3|ϕ(x
where 1 ≤ α ≤ n− 1 and C3 is a positive constant.
Directly computation gives
ϕ′(r) =
s2 − r2
(s2 + r2)2
g(s)ds,
ϕ”(r) =
(s2 + r2)2
g(s)ds+
−4r(s2 − r2)
(s2 + r2)3
g(s)ds.
max{|rϕ′(r)|, |r2ϕ”(r)|} ≤ C4ϕ(r),
where C4 is a constant.
Since g is increasing, g′ exists almost everywhere and g′ ≥ 0. Using integration
by parts, noting that d
s2+r2
) = s
(s2+r2)2
, we have
ϕ′(r) =
s2 − r2
(s2 + r2)2
g(s)ds
s2 + r2
g(s)|∞0 +
s2 + r2
g′(s)ds
s2 + r2
g′(s)ds.
Differentiating the last term in above equality we get
ϕ”(r) = −
(s2 + r2)2
g′(s)ds.
Since f is nonconstant we see easily that g′ 6≡ 0 (in fact, we don’t need this
restriction since we can add g with a non-constant positive function, for example
(xm)1/2 ). So since g′ ≥ 0, it follows from above equalities that
ϕ′(r) > 0,
|rϕ”(r)| ≤ C5ϕ
′(r),
where C5 is a positive constant. Then use the formula for the tension field we are
done. �
3. Proof of Theorem 1
Proof. Fixed ǫ > 0. We define vǫ : H
m → Hn as follows:
vǫ(x) = (v
1(x), v2(x), ..., vn−1(x), ϕ(xm) + ǫ).
For each δ > 0 denote uǫ,δ : H
m k Ωδ = {x
m > δ} → Hn the harmonic map
taking value vǫ on ∂Ωδ.
By inequality (2.1) in [2] and properties of v and vǫ (see Lemma 1) we have
∆HmdHn(uǫ,δ, vǫ) ≥ −|τ(vǫ)| ≥ −C
ϕ(xm)
ϕ(xm) + ǫ
ϕ(xm),
for all x ∈ Ωδ, and here C is one constant from Lemma 1.
APPROXIMATE SOLUTIONS TO THE DIRICHLET PROBLEM FOR HARMONIC MAPS BETWEEN HYPERBOLIC SPACES5
We claim that the function
ψ(r) =
u−2ϕ(u) du ds
is well-defined for r ≥ 0. In fact, using the formula for ϕ we have
ψ(r) =
u−2ϕ du ds =
u−1(u2 + t2)−1g(t)dt du ds.
Since the integrand is non-negative, using Fubini’s theorem we have∫ r
u−1(u2 + t2)−1g(t)dt du ds =
u−1(u2 + t2)−1g(t)du dt ds
log(1 +
)g(t)dt ds
log(1 +
)g(t)ds dt
πt2 − 2 arctan( t
)t2 + r log(1 + t
g(t)dt.
Now since g(t) is bounded we have
πt2 − 2 arctan( t
g(t)dt
is convergent. Fixed r ≥ 0, near t = 0 we have
r log(1 + t
g(t) ≈ t−1g(t),
and when t→ ∞ we have
r log(1 + t
g(t) ≈ t−3g(t),
hence since g(t) is bounded and the assumption that
t−1g(t) converges, our claim
is verified.
We use the same ψ to denote the function ψ : Hm → IR defined by ψ(x) = ψ(xm)
for x = (x1, ..., xm−1, xm) ∈ Hm. Now we have ψ′(r) =
u−2ϕ du > 0 and
ψ”(r) = −r−2ϕ(r), since m ≥ 2 we have
∆Hm(−ψ(x)) = −(x
m)2[ψ”(xm)−
(m− 2)
ψ′(xm)] ≥ −(xm)2ψ”(xm) = ϕ(r).
Hence
∆Hm (dHn(uǫ,δ, vǫ)− C
ψ) ≥ 0,
for x ∈ Ωδ. Hence by maximum principle we have
dHn(uǫ,δ, vǫ) ≤ C
ψ(xm).
This bound for dHn(uǫ,δ, vǫ) is independent of δ, hence by standard arguments (see
the proof of Theorem 6.4 in [9]) we have a harmonic map uǫ : H
m → Hn which is
the subsequent limit of uǫ,δ. Moreover for all x ∈ H
m we have
dHn(uǫ, vǫ) ≤ C
ψ(xm).
6 DUONG MINH DUC AND TRUONG TRUNG TUYEN
Hence
dHn(uǫ, vǫ) = 0,
which shows that uǫ is continuous up to the boundary and takes boundary value
1, ..., xm−1, 0) = (f1, f2, ..., fn−1, ǫ). �
References
[1] Shiu-Yuen Cheng, Liouville theorem for harmonic maps, Proc. Symp. Pure Math. 36, 1980,
147–151.
[2] Wei-Yue Ding and Youde Wang, Harmonic maps of complete noncompact Riemannian man-
ifolds, Internat. J. Math. 2, 1991, 617–633.
[3] Duong Minh Duc and Alberto Verjovsky, Proper harmonic maps with Lipschitz boundary
values, preprint.
[4] James Eells, Jr. and J. H. Sampson, Harmonic mappings of Riemannian manifolds, Ams. J.
Math. 86 (1), 1964, pp. 109–160.
[5] David Gilbarg and Neil S. Trudinger, Elliptic partial differential Equations of second order,
Springer - Verlag, Berlin - Heidelberg-New York -Tokyo, 1983.
[6] Frederic Helein, Regularite des applications faiblement harmoniques entre une surface et une
variete riemannienne, C. R. Acad. Sci. Paris 312 (1), 1991, 591–596.
[7] Peter Li and Luen-Fai Tam, The heat equation and harmonic maps of complete manifolds,
Invent. Math. 105, 1991, 1–46.
[8] Peter Li and Luen-Fai Tam, Uniqueness and regularity of proper harmonic maps, Anals of
Mathematics 137, 1993, pp. 167-201.
[9] Peter Li and Luen-Fai Tam, Uniqueness and regularity of proper harmonic maps II, Indiana
University Mathematics Journal 42 (2), 1993, pp. 591-635.
[10] Richard Schoen and Shing Tung Yau, Compact group actions and the topology of manifolds
with non-positive curvature, Topology 18, 1979, 361–380.
Department of Mathematics, University of natural sciences, Hochiminh city, Viet-
E-mail address: dmduc@hcmuns.edu.vn
Department of Mathematics, Indiana University, Rawles Hall, Bloomington, IN 47405
E-mail address: truongt@indiana.edu
	1. Introduction
	2. Initial maps
	3. Proof of Theorem ??
	References
ABSTRACT
  Our main result in this paper is the following: Given $H^m, H^n$ hyperbolic
spaces of dimensional $m$ and $n$ corresponding, and given a Holder function
$f=(s^1,...,f^{n-1}):\partial H^m\to \partial H^n$ between geometric boundaries
of $H^m$ and $H^n$. Then for each $\epsilon >0$ there exists a harmonic map
$u:H^m\to H^n$ which is continuous up to the boundary (in the sense of
Euclidean) and $u|_{\partial H^m}=(f^1,...,f^{n-1},\epsilon)$.

<|endoftext|><|startoftext|>
Introduction 
    In [1-3] new effect called photonic flame effect was found and some its properties were 
studied. This effect is determined by  properties of photonic crystals.       
    Photonic crystals have attracted great attention since the first papers concerning such 
structures [4-6]. One of the most important photonic crystals are artificial opals – self –
assembled structures composed of SiO2 spheres organizing face-centered cubic lattice. The size 
of such spheres varying between 200 nm and 400 nm and defines the parameters of the face-
centered cubic lattice and the photonic bandgap. The possibility of opal infiltration with different 
medium gives rise to effective processing the properties of the light propogating through the 
crystal. The voids in the opal structures can be filled with semiconductors, superconductors, 
ferromagnetic materials, fluorescent medium [7] and this fact gives large possibility to practical 
applications of such structures for optoelectronics.  The study of the linear optical properties of 
the photonic band gap have been the task of many theoretical and experimental works and still 
remain the task to be investigated [7,8]. The theoretical description of the electromagnetic field 
inside the photonic crystal structures (obtained by transfer matrix method [9] or coupled mode 
theory [10]) gives the clear picture of the transmitted and reflected spectrum, electromagnetic 
field distribution inside the crystal and their dependence on the parameters of the photonic 
crystal structure (values of period, number of periods, refractive index contrast). Large values of 
the electromagnetic field localization in some regions lead to the expectation of the strong 
enhancement of nonlinear wave-matter interaction in comparison with bulk crystals. Second 
harmonic generation in different types of photonic crystals was investigated in [11,12]. Properly 
chosen photonic crystal exhibits negative refraction at some conditions [13]. Some features of 
the stimulated Raman scattering in one-dimensional photonic structure  were considered in [14]. 
Fully quantum mechanical treatment of the generation of entangled photon in nonlinear photonic 
crystals at the process of down-conversion was realized in [15]. Photonic band gap properties 
which are demonstrated by  photonic crystalls are being actively used for investigation of 
photon-exciton interaction [16]. Acoustic modes excited in SiO2 balls which compose opal 
photonic crystal show the effect of phonon modes quantization [17] and are the reason of 
stimulated globular scattering [3]. Specific features of the acoustic wave propagation in the 
photonic structures lead to the possibility of the diverging ultrasonic beam focusing into a 
mailto:tchera@mail1.lebedev.ru
narrow focal spot with a large focal depth [18]. Optical parametric oscillations via four-wave 
mixing in isotropic photonic crystals showes the possibility of the effective frequency processing 
[19].  
    The aim of this work is to give a short review of results [1-3] and to  study collective behavior 
of several photonic crystals. The crystals are posed on Cu plate at the temperature of liquid 
nitrogen. One of the photonic crystals is illuminated by laser pulse and the laser light is focused 
on this only crystal. The phenomenon which we observe is the appearance of  luminescence of  
other photonic crystals. The duration of the luminescence of other crystals which are spatially 
separated with the crystal illuminated by laser pulse is of the order of seconds. The appearance of 
the luminescence takes place with some time delay respectively to the laser pulse. The form of 
these  light spots on the other crystals and their slow motions along the crystal reminds a small 
flame spot. This inspired us to give the “photonic flame” name to the observed effect. In the case 
of covering the surface of the Cu plate with liquid (acetone, ethanol, water) after the PFE 
excitation in the opal situated on this plate  blue luminescence is being seen in the frozen liquid. 
The temporal characteristics of this luminescence are the sme as for single opal crystal. The 
paper is organized as follows. In Sec.2 the experimental setup, laser, the photonic crystals 
(artificial opals) used in the experiment are described. In Sec.3 the “photonic flame effect” 
observed in the experiment is discussed. In Sec.4 perspectives and possible explanations are 
presented.    
2. Photonic crystals and laser used in experiment. 
    One of the most promising three-dimensional photonic crystals is artificial opal. Opal is a 
crystal with face-centered cubic lattice consisting of the monodisperse close packed SiO2 spheres 
with diameter about several hundred nanometers. Because the refractive index contrast (ratio 
nSiO2/nair) is about 1,45 the complete photonic band gap does not exist but the photonic 
pseudogap takes place. Empty cavities among these globules have octahedral and tetrahedral 
form. It is possible to investigate both initial opals (opal matrices) and nanocomposites, in which 
cavities are filled with organic or inorganic materials, for instance, semiconductors, 
superconductors, ferromagnetic substances, dielectrics, displaying different types of  
Fig.1.  Common appearance of a globular photonic crystal, built of spherical particles (globules)  
nonlinearities and so on. Filling voids of the photonic crystal with materials with different 
refraction index one can effectively process the parameters of the photonic pseudogap. 
 Ruby laser giant pulse (λ=694.3 nm, τ=20 ns, Emax =0.3 J, spectral width of the initial light - 
0.015 cm-1.) has been used as a source of excitation. Exciting light has been focused on the 
material by lenses with different focal lengths (50, 90, and 150 mm). The samples of opal 
crystals used had the size 3x5x5mm and were cut parallel to the plane (111) (see Fig.2) .The 
angle of the incidence of the laser beam on the plane (111)  varied from 0 to 600. Sample 
distance from focusing system and exciting light energy were different in different runs of the 
experiment. This gave possibility to make measurements for different power density at the 
entrance of the sample and for different field distribution inside the sample. Opal crystals 
consisting of the close-packed amorphose spheres with diameter 200 nm, 230 nm, 250 nm and 
nanocomposites (opal crystals with voids filled with acetone or ethanol) were investigated.   
                                                                                       z
                                           θ                                                                        y
                                                                  [111]                                 x
Fig.2. The scheme of  illuminating the sample. Plane XY correspondes to the CU plate surface. 
3.Characteristics of “photonic flame” 
     Opal crystals were placed on the Cu plate which was put into the cell with liquid nitrogen (see 
Fig.3). The number of crystals varied from 1 to 5. The distance (d) betwen the crystals was of the 
order of several centimeters (maximum value of d was 5 centimeters and was determined by the 
Cu plate size).  One of the crystals was illuminated by the focused laser pulse. In the case of the 
reaching of the threshold visible (blue) luminescence appeared. The luminescence duration was 
from 1 to 12 seconds and it looked like inhomogeneous spot changing its spatial distribution and 
position on the surface of the crystal during this time.    
                                                           liquid   N2
                                                                 opal               opal
                                                                                                               exciting light
                                                      d
                                   opal                 Cu
Fig.3. Experimental setup. 
    Parameters of the luminescence (duration, threshold) were determined by the geometric 
characteristic of the illumination and the refractive index contrast of the sample. For optimal 
geometry of the excitation the power density threshold for opal crystal was 0.12 Gw/cm2, for 
opal crystals filled with ethanol – 0.05 Gw/cm2, for opal crystal filled with acetone - 0,03 
Gw/cm2. Typical luminescence temporal distribution measured for the part of the crystal 
displaying the most intensive brightness is shown on Fig.4. The same behavior is typical for all 
cases of the luminescence at these conditions of excitation, but the value of the luminescence 
duration fluctuated from shoot to shoot. 
                                  a)                                                              b) 
Fig.4. Temporal distribution of the visible luminiscence.  
     The duration of the luminiscence fluctuated from 1 till 12 seconds and demonstrated 
oscillating structure. In some cases the temporal distribution had maximum at the beginning of 
the luminiscence in some cases – minimum. Fig.4 a) and b) show the luminiscence of the pure 
opal matrix of the nearly the same duration at the same geometrical and energetical conditions of 
excitation near the threshold of excitation (0.12 Gw/cm2 ). The beginning of the mesurements 
corresponds to 0.3 s delay after the laser shoot  (laser pulse duration is 20 ns).  
     Secondary emission spectrum observed in photonic flame effect has been investigated with 
the help of setup shown at the Fig.5. 
                             
                                          12
                          11                                          10
                                                      9
                                          8
                                                                                1
                    3                4          2
             6           7
    Fig. 5.The experimental setup for PPE spectrum study. 1- ruby laser; 2- lens; 3, 4, 5 – photonic 
crystals; 7 – cell with liquid nitrogen 6 – Cu plate; 8 – fiber wave guide; 9 – minipolychromator; 
10 – computer; 11 – camera; 12 – computer. 
Spectra of the light emitted by photonic crystal  for different pumping light power density are 
shown at the Fig. 6  (a and b) 
                                                                                                                   
200 300 400 500 600 700 800 900 1000
643I, 
λ, nm
200 300 400 500 600 700 800 900 1000 1100
λ, nm
                              a                                                          b                                           
 Fig. 6. Secondary emission spectrum of a photonic crystal for different laser light power density: 
a - I = 0.12 GW/cm2,     b - I = 0.14   GW/cm2. 
Spectrum consisted of the sharp lines with wavelengths: 429.0, 453.0, 489.0, 555.0, 643.0 nm, 
which corresponds to the antistokes spectral range for exciting line 694.3 nm. Lines intensity in 
the spectrum strongly depended on the laser pumping intensity, which was evidence of 
stimulated type of the radiation emission.  
    In the case of several crystals placed on  the Cu plate only one of them was irradiated by the 
laser pulse. Luminescence took place in this crystal in the case of the threshold reaching. Bright 
shining of the other crystals began with some time delay after laser shoot. The value of this delay 
(and the intensity of the luminiscence) was determined by the spatial position of the crystals on 
the plate. The steal screen beeing put between the crystals (in order to avoid irradiating of the 
crystals by the light scattered by the crystal excited by the laser) did not stop the appearing of the 
luminescence if the distance between the Cu plate and the screen was more than 0.5 mm.  The 
duration of the luminiscence was of the order of several seconds and temporal behavior was like 
shown on Fig.4. The typical features of such distribution were existence of  maximum and large 
plato with near constant value of the intensity. 
     In order to show the role of the material of the plate used we repeated these measurements 
with plates of the same size but made from steel and quarz on which opal crystals were placed 
like in the previous experiments. Luminescence of the same kind in the irradiated crystal took 
place but the luminescence of the other samples situated on these plates was not observed.   
    The effect was also determined by the angle of incidence (Fig.2). For the samples used the 
value of the angle was chosen experimentally for achieving of the maximal value of the 
luminescence (it worth to mention that this value differed from 0 and was about 400). Easier the 
effect was excited in the unprocessed samples. In Fig.5 one can see the luminescence of the 
crystals situated at the distance of about 1 centimeter from the crystal which was irradiated. 
                                 
Fig.7 Visible luminiscence of the opal crystals in the case of the irradiating one of them (the 
irradiated crystal can be seen by bright red light; on the left picture it was the crystal in the 
center, on the right picture  it was the crystal on the left). Left picture corresponds to the case 
where crystals are infiltrated by acetone. Right picture corresponds to the case of the opal 
crystals without infiltration. 
       In the case of the large laser energy (several times more the threshold) or if the crystal was 
irradiated by several laser pulses the opal can be destroyed and the parts of the crystal produce 
the luminiscence with the spectral  and temporal properties described above (Fig.8). 
Fig.8 Opal crystal is destroyed and 3 large pieces and several little pieces are going on to 
produce the luminescence. 
      In order to clarify the role of the Cu plate surface on the energy transport between the 
crystals the next experiment  was realized: the pure opal matrix posed on the Cu plate was 
irradiated  by the ruby laser pulse and demonstrated strong luminescence lasting few seconds 
with the properties described above (Fig.9)  
Fig.9  Luminiscence of the single opal matrix 
    Next step was covering the surface of the Cu plate with the liquid (experiments were made 
with water, aceton or ethanol). The thickness of the frozen liquid on the plate surface was about 
1 mm. The transverse size of the frozen liquid was about 1 cm. After illuminating of the crystal 
by the ruby laser pulse the luminiscence of the crystal appears the bright blue luminiscence of 
the frozen species of the liquid used appears. The temporal characteristics of the luminiscence in 
crystal and in the frozen liquid are approximatly the same (the luminiscence duration is about 
several seconds). The luminiscence of the frozen liquid goes on in spite on the putting the screen 
between the crystal and the liquid. It shows that the luminiscence of the liquid is not a reflection 
of the light which is emmited by the crystal. Fig.10 shows the luminiscence of the crystal and the 
frozen liquid (in this case it was water). The pictures were made with the interval of 1 second 
between each other. Analogous behaviour is demonstrated by aceton and ethanol. The 
luminiscence of the area covered with frozen liquid takes place even if this area is at the distance 
of several cantimeters from the irradiated crystal.The explanation of the blue luminiscence of the 
frozen liquid can be done in several ways and for clarifying the reasons of this luminescence 
appearance it is necessary to produce additional experiments.  The intensity of the laser in the 
experiments is about 0.12 Gw/cm2, and the large enhancement of this field due to Mie – 
resonance [20]  simultaniously with the interference effect caused by the structure of the opal 
matrix can lead to the extremely large field enhancement which can play an important role in this 
effect.  
                                                                                                                   
                                                                                                                                                                          
Fig.10  Luminiscence of the opal matrix (bright round spot) and frozen liquid (large blue spot) 
on the surface of the Cu plate.  
   4. Conclusions. 
   In this paper we reported about some new features of the photonic flame effect. The main 
features of PFE are: 
- At the excitation of the artificial opal crystal which is placed on the Cu plate at the 
temperature of the liquid nitrogen by the ruby laser pulse of the nanosecond range long-
continued optical luminescence takes place in the case if the threshold of the process        is 
reached;                                                                                               
- In the case of  several opal crystals being put on the Cu plate while one of them is being          
irradiated bright visible luminescence occurs in all samples;                               
- Temporal behavior and thresholds of the luminescence have been determined. Photonic    
crystals infiltrated with different nonlinear liquids and without infiltration have been 
investigated. Investigated transport of the excitation between the samples spatially separated 
by the length of several centimeters gives the possibility of the practical applications of  
PFE; 
- The blue luminiscence of the frozen liquid on the surface of the Cu plate takes place at the 
precense of the photonic flame effect; 
   The photonic flame effect can have different explanation. Probably an essential role is played 
by plasma properties. The slow transport of the excitations from the irradiated crystal to other 
photonic crystals can be associated with sound waves created due to laser pulse interaction with 
the sample. Exciton mechanism and surface waveguides on the surface of the Cu plate also can 
play important role. It was checked that the change of the properties of the plate surface was 
leading to change of the photonic flame effect. Removing the oxid layer from the plate changed 
the threshold PFE. The luminescence of the frozen liquids on the surface of the Cu plate showes 
the important role of the electromagnetic field enhancement due to Mie resonance and Bragg 
diffraction on the photonic crystal lattice. The electromagnetic field enhancement  can lead to 
producing laser plasma, electron acceleration and x-ray production.    
References 
1. N.V.Tcherniega, A.D.Kudryavtseva, ArXiv Physics/ 0608150 (2006). 
2. N.V.Tcherniega, A.D.Kudryavtseva, Journal of Russian Laser Research, V.27, N 5, стр.400-
409 (2006). 
3.A.A.Esakov, V.S.Gorelik, A.D.Kudryavtseva, M.V.Tareeva and N.V.Tcherniega,  SPIE 
Proceedings, V 6369, 6369 OE1 - 6369 OE12, Photonic Crystals and Photonic Crystal Fibers for 
Sensing Applications II; Henry H. Du, Ryan Bise; Eds, (Oct.2006). 
4.P. Bykov,  J. Eksp. Teor. Fiz., 35, 269, (1972). 
5.E.Yablonovich, Phys. Rev. Lett.,58, 2059 (1987). 
6.S.John , Phys. Rev. Lett., 58, 2486, (1987). 
7.V. N. Astratov, V. N. Bogomolov, A. A. Kaplyanskii, A. V. Prokofiev, L. A. Samoilovich, S. 
M. Samoilovich, Yu. A. Vlasov, Nuovo Cimento, D 17,1349 (1995). 
8.A. V. Baryshev, A. A. Kaplyanskii, V. A. Kosobukin, M. F. Limonov, K. B. Samsuev, 
Fiz.Tverd.Tela, 45, 434 (2003), in Russian. 
9.M. Born, E. Wolf, Principles of Optics, Macmillan, New York (1964) 
10.A. Yariv, Quantum Electronics, John Wiley and Sons, Inc., New York, London, Sudneu 
(1967). 
11.M. G. Martemyanov, D. G. Gusev, I. V. Soboleva, T.V. Dolgova, A. A. Fedyanin, O. A. 
Akstipetrov, and G. Marovsky, Laser Physics, 14, 677 (2004). 
12.A. A. Fedyanin, O. A. Aktsipetrov, D. A. Kurdyukov, V. G. Golubev, M. Inoue, 
Appl.Phys.Letters, 87, 151111 (2005). 
13.Foteinopoulou, E.N.Economou, C.M.Soukoulis, Phys.Rev.Let., 90 , 107402, (2003). 
14.R. G. Zaporozhchenko, S. Ya. Kilin, A. G. Smirnov, Quantum Electronics, 30, 997 (2000), in 
Russian. 
15.W. T. M. Irvine, M. J. A. de Dood, D. Bouwmeester, Phys Rev.A 72, 043815 (2005). 
16.N. A. Gippius, S. G. Tihodeev, A. Christ, J. Kuhl, H. Giessen, Fiz. Tverd. Tela, 47, 139 
(2005). 
17.M.H.Kuok, H.S.Lim, S.C.NG, N.N.Liu, Z.K.Wang, Phys.Rev.Let., 90 , 255502, (2003). 
18.  Suxia Yang, J.H.Page, Zhengyou Liu, M.L.Cowan, C.T.Chan, Ping Sheng, Phys.Rev.Let., 93 
, 024301, (2004). 
19.Claudio Conti, Andrea Di Falco, Gaetano Assantom, Optics Express, 12, 823, (2004).  
20.G.Mie, Ann.Phys.,(Berlin), 25,  377,(1908)
ABSTRACT
  The results of the spectral, energetical and temporal characteristics of
radiation in the presence of the photonic flame effect are presented.
Artificial opal posed on Cu plate at the temperature of liquid nitrogen boiling
point (77 K) being irradiated by nanosecond ruby laser pulse produces long-
term luminiscence with a duration till ten seconds with a finely structured
spectrum in the the antistocks part of the spectrum. Analogous visible
luminescence manifesting time delay appeared in other samples of the artificial
opals posed on the same plate. In the case of the opal infiltrated with
different nonlinear liquids the threshold of the luminiscence is reduced and
the spatial disribution of the bright emmiting area on the opal surface is
being changed. In the case of the putting the frozen nonlinear liquids on the
Cu plate long-term blue bright luminiscence took place in the frozen species of
the liquids. Temporal characteristics of this luminiscence are nearly the same
as in opal matrixes.

<|endoftext|><|startoftext|>
Introduction
	 Fundamentals of nonparametric modeling
	Description of kernel function
	 Nonparametric estimation of PDF pertaining to experimental data
	Estimation of a physical law
	 Characteristics of the model
	 Predictor quality
	 Redundancy and predictor cost function
	 Example
	 Discussion
	 Conclusions
	Acknowledgments
	References
ABSTRACT
  Statistical modeling of experimental physical laws is based on the
probability density function of measured variables. It is expressed by
experimental data via a kernel estimator. The kernel is determined objectively
by the scattering of data during calibration of experimental setup. A physical
law, which relates measured variables, is optimally extracted from experimental
data by the conditional average estimator. It is derived directly from the
kernel estimator and corresponds to a general nonparametric regression. The
proposed method is demonstrated by the modeling of a return map of noisy
chaotic data. In this example, the nonparametric regression is used to predict
a future value of chaotic time series from the present one. The mean predictor
error is used in the definition of predictor quality, while the redundancy is
expressed by the mean square distance between data points. Both statistics are
used in a new definition of predictor cost function. From the minimum of the
predictor cost function, a proper number of data in the model is estimated.

<|endoftext|><|startoftext|>
Introduction
This paper is a brief description of a methodology of developing options (in the sense of financial options,
e.g., with all Greeks), to be applied in collaboration with Michael Bowman, as a first example to
scheduling a massive US Army project, Future Combat Systems (FCS) [1].
The major focus is to develop Real Options for non-financial projects, as discussed in other earlier
papers [3,4,12]. Data and some guidance on its use has been reported in a previous study of FCS [2,5].
The need for tools for fairly scheduling and pricing such a complex project has been emphasized in
Recommendations for Executive Action in a report by the U.S. General Accounting Office (GAO) on
FCS [14], and they also emphasize the need for management of FCS business plans [13].
2. Goals
A giv en Plan results in S(t), money allocated by the client/government is defined in terms of Projects
Si(t),
S(t) =
Σ Si(t)
where ai(t) may be some scheduled constraints. PATHTREE processes a probability tree developed over
the life of the plan T , divided into N nodes at times {tn}, each with mean epoch length dt [11]. Options,
including all Greeks, familiar to financial markets, are calculated for quite arbitrary nonlinear means and
variances of multiplicative noise [6,9]. This ability to process nonlinear functions in probability
distributions is essential for real-world applications.
Each Task has a range of durations, with nonzero Ai , with a disbursement of funds used, defining Si(tn).
Any Task dependent on a Task completion is slaved to its precursor(s).
We dev elop the Plan conditional probability density (CPD) in terms of differenced costs, dS,
P(S ± dS; tn + dt |S; tn)
P is modeled/cast/fit into the functional form
P(S ± dS; tn + dt |S; tn) = (2π g
2 exp(−Ldt)
(dS − fdt)2
(2g2dt2)
where f and g are nonlinear function of cost S and time t. The g2 variance function absorbs the multiple
Task cost and schedule statistical spreads, to determine P(dS, t), giving rise to the stochastic nature of
dollars spent on the Plan.
A giv en Project i with Task k has a mean duration iik , with a a mean cost Sik . The spread in dS has two
components arising from: (1) a stochastic duration around the mean duration, and (2) a stochastic spread
of mean dollars around a deterministic disbursement at a given time. Different finite-width asymmetric
distributions are used for durations and costs. For example, the distribution created for Adaptive
Simulated Annealing (ASA) [8], originally called Very Fast Simulated Re-annealing [7], is a finite-ranged
distribution with shape determined by a parameter “temperature” q. For each state (whether duration or
cost): (a) A random binary choice can be made to be higher or lower than the mean, using any ratio of
probabilities selected by the client. (b) Then, an ASA distribution is used on the chosen side. Each side
has a different q, each falling off from the mean. This is illustrated and further described in Fig. 1.
At the end of the tree at a time T (T also can be a parameter), there is a total cost at each node S(T ),
called a final “strike” in financial language. (A final strike might also appear at any node before T due to
cancellation of the Project using a particular kind of schedule alternative.) Working backwards, options
are calculated at time t0. Greeks (functional derivatives of the option) assess sensitivity to various
variables, e.g., like those discussed in previous papers [12], but here we deliver precise numbers based on
as much real-world information available.
Lester Ingber - 3 -  Real Options for Project Schedules (ROPS)
-1 -0.5  0  0.5  1
ASA (q = 0.1)
1/(2 * (abs(y) + q) * log(1 + 1/q))
Fig. 1. The ASA distribution can be used to develop finite-range asymmetric distributions
from which a value can be chosen for a given state of duration or cost. (a) A random binary
distribution is selected for a lower-than or higher-than mean, using any ratio of probabilities
selected by the client. Each side of the mean has its own temperature q. Here an ASA distri-
bution is given for q = 0.1. The range can be scaled to any finite interval and the mean
placed within this range. (b) A uniform random distribution selects a value from [-1,1], and
a normalized ASA value is read off for the given state.
3. Data
The following data are used to develop Plan CPD. Each Task i has
(a) a Projected allocated cost, Ci
(b) a Projected time schedule, Ti
(c) a CPD with a statistical width of funds spent, SWSi
(d) a distribution with a statistical width of duration, SWTi
(e) a range of durations, RTi
(f) a range of costs, RSi
Expert guesses need to be provided for (c)-(f) for the prototype study.
A giv en Plan must be constructed among all Tasks, specified the ordering of Tasks, e.g., obeying any
sequential constraints among Tasks.
4. Three Recursive Shell
4.1. Outer Shell
There may be several parameters in the Project, e.g., as coefficients of variables in means and variances of
different CPD. These are optimized in an outer shell using ASA [8]. This end product, including
MULTI_MIN states returned by ASA, gives the client flexibility to apply during a full Project [12]. We
may wish to minimize Cost/T , or (CostOverrun - CostInitial)/T , etc.
Lester Ingber - 4 -  Real Options for Project Schedules (ROPS)
4.2. Middle Shell
To obtain the Plan CPD, an middle shell of Monte Carlo (MC) states are generated from recursive
calculations. A Weibull or some other asymmetric finite distribution might be used for Task durations.
For a giv en state in the outer middle, a MC state has durations and mean cost disbursements defined for
each Task.
4.3. Inner Shell
At each time, for each Task, the differenced cost ((Sik(t + dt) − Sik(t))) is subjected to a inner shell
stochastic variation, e.g., some asymmetric finite distribution. The net costs dSik(t) for each Project i and
Task k are added to define dS(t) for the Plan. The inner shell cost CPD is re-applied many times to get a
set of {dS} at each time.
5. Real Options
5.1. Plan Options
After the Outer MC sampling is completed, there are histograms generated of the Plan’s dS(t) and
dS(t)/S(t − dt) at each time t. The histograms are normalized at each time to give P(dS, t). At each time
t, the data representing P is “curve-fit” to the form of Eq. (0), where f and g are functions needed to get
good fits, e.g., fitting coefficients of parameters {x}
f = x f 0 + x f 1S + x f 2S
2 + . . .
g = xg0 + xg1S + xg2S
2 + . . .
At each time t, the functions f and g are fit to the function ln((P(dS, t)), which includes the prefactor
containing g and the function L which may be viewed as a Padé approximate of these polynomials.
Complex constraints as functions of Sik(t) can be easily incorporated in this approach, e.g., due to regular
reviews by funding agencies or executives. These P’s are input into PATHTREE to calculate options for a
given strategy or Plan.
5.2. Risk Management of Project Options
If some measure of risk among Projects is desired, then during the MC calculations developed for the top-
level Plan, sets of differenced costs for each Project, dSi(t) and dSi(t)/Si(t − dt), stored from each of the
Project’s Tasks. Then, histograms and Project CPDs are developed, similar to the development of the
Plan CPD. A copula analysis, coded in TRD for risk management of financial markets, are applied to
develop a relative risk analysis among these projects [10]. In such an analysis, the Project marginal CPDs
are all transformed to Gaussian spaces, where it makes sense to calculate covariances and correlations.
An audit trail back the original Project spaces permits analysis of risk dependent on the tails of the Project
CPDs.
6. Generic Applications
ROPS can be applied to any complex scheduling of tasks similar to the FCS project. The need for
government agencies to plan and monitor such large projects is becoming increasingly difficult and
necessary [15]. Many large businesses have similar projects and similar requirements to manage their
complex projects.
Lester Ingber - 5 -  Real Options for Project Schedules (ROPS)
References
[1] M. Bowman and L. Ingber, ‘‘Real Options for US Army Future Combat Systems,’’ Report
2007:ROFCS, Lester Ingber Research, Ashland, OR, 2007.
[2] G.G. Brown, R.T. Grose, and R.A. Koyak, ‘‘Estimating total program cost of a long-term, high-
technology, high-risk project with task durations and costs that may increase over time,’’ Military
Operations Research 11, 41-62 (2006). [URL
http://www.nps.navy.mil/orfacpag/resumePages/papers/Brownpa/Estimating_total_
program_cost.pdf]
[3] T.E. Copeland and P.T. Keenan, ‘‘Making real options real,’’ McKinsey Quarterly 128-141 (1998).
[URL http://faculty.fuqua.duke.edu/˜charvey/Teaching/BA456_2006/McK98_3.pdf]
[4] G. Glaros, ‘‘Real options for defense,’’ Tr ansformation Trends June, 1-11 (2003). [URL
http://www.oft.osd.mil/library/library_files/trends_205_transforma-
tion_trends_9_june%202003_issue.pdf]
[5] R. Grose, ‘‘Cost-constrained project scheduling with task durations and costs that may increase
over time: Demonstrated with the U.S. Army future combat systems,’’ Thesis, Naval Postgraduate
School, Monterey, CA, 2004. [URL http://www.stormingmedia.us/75/7594/A759424.html]
[6] J.C. Hull, Options, Futures, and Other Derivatives, 4th Edition (Prentice Hall, Upper Saddle River,
NJ, 2000).
[7] L. Ingber, ‘‘Very fast simulated re-annealing,’’ Mathl. Comput. Modelling 12, 967-973 (1989).
[URL http://www.ingber.com/asa89_vfsr.pdf]
[8] L. Ingber, ‘‘Adaptive Simulated Annealing (ASA),’’ Global optimization C-code, Caltech Alumni
Association, Pasadena, CA, 1993. [URL http://www.ingber.com/#ASA-CODE]
[9] L. Ingber, ‘‘Statistical mechanics of portfolios of options,’’ Report 2002:SMPO, Lester Ingber
Research, Chicago, IL, 2002. [URL http://www.ingber.com/markets02_portfolio.pdf]
[10] L. Ingber, ‘‘Trading in Risk Dimensions (TRD),’’ Report 2005:TRD, Lester Ingber Research,
Ashland, OR, 2005.
[11] L. Ingber, C. Chen, R.P. Mondescu, D. Muzzall, and M. Renedo, ‘‘Probability tree algorithm for
general diffusion processes,’’ Phys. Rev. E 64, 056702-056707 (2001). [URL
http://www.ingber.com/path01_pathtree.pdf]
[12] K.J. Leslie and M.P. Michaels, ‘‘The real power of real options,’’ McKinsey Quarterly 4-22 (1997).
[http://faculty.fuqua.duke.edu/˜charvey/Teaching/BA456_2006/McK97_3.pdf]
[13] General Accounting Office, ‘‘Future Combat System Risks Underscore the Importance of
Oversight,’’ Report GAO-07-672T, GAO, Washington DC, 2007. [URL http://www.gao.gov/cgi-
bin/getrpt?GAO-07-672T]
[14] General Accounting Office, ‘‘Key Decisions to Be Made on Future Combat System,’’ Report
GAO-07-376, GAO, Washington DC, 2007. [URL http://www.gao.gov/cgi-
bin/getrpt?GAO-07-376]
[15] B. Wysocki, Jr, ‘‘Is U.S. Government ’Outsourcing Its Brain’?,’’ Wall Street Journal March 30, 1
(2007).
ABSTRACT
  Real Options for Project Schedules (ROPS) has three recursive
sampling/optimization shells. An outer Adaptive Simulated Annealing (ASA)
optimization shell optimizes parameters of strategic Plans containing multiple
Projects containing ordered Tasks. A middle shell samples probability
distributions of durations of Tasks. An inner shell samples probability
distributions of costs of Tasks. PATHTREE is used to develop options on
schedules.. Algorithms used for Trading in Risk Dimensions (TRD) are applied to
develop a relative risk analysis among projects.

<|endoftext|><|startoftext|>
Introduction
We shall start with
Definition. Suppose that n ≥ 2 is an integer. We will say that a group M has the
property (nCC) if there are exactly n conjugacy classes of elements in M .
Note that a group M has (2CC) if and only if any two non-trivial elements are
conjugate in M . For two elements x, y of some group G, we shall write x
∼ y if x
and y are conjugate in G, and x
≁ y if they are not.
For a group G, denote by π(G) the set of all finite orders of elements of G. A
classical theorem of G. Higman, B. Neumann and H. Neumann ([8]) states that
every countable group G can be embedded into a countable (but infinitely gen-
erated) group M , where any two elements of the same order are conjugate and
π(M) = π(G).
For any integer n ≥ 2, take G = Z/2n−2Z and embed G into a countable group
M according to the theorem above. Then card(π(M)) = card(π(G)) = n − 1.
Since, in addition, M will always contain an element of infinite order, the theorem
of Higman-Neumann-Neumann implies that G has (nCC).
Another way to construct infinite groups with finitely many conjugacy classes
was suggested by S. Ivanov [15, Thm. 41.2], who showed for every sufficiently large
prime p there is an infinite 2-generated groupMp of exponent p possessing exactly p
conjugacy classes. The groupMp is constructed as a direct limit of word hyperbolic
groups, and, as noted in [21], it is impossible to obtain an infinite group with (2CC)
in the same manner.
In the recent paper [21] D. Osin developed a theory of small cancellation over
relatively hyperbolic groups and used it to obtain the following remarkable result:
2000 Mathematics Subject Classification. 20F65, 20E45, 20F28.
Key words and phrases. Conjugacy Classes, Relatively Hyperbolic Groups, Outer Automor-
phism Groups.
This work was supported by the Swiss National Science Foundation Grant ♯ PP002-68627.
http://arxiv.org/abs/0704.0091v2
2 ASHOT MINASYAN
Theorem 1.1 ([21], Thm. 1.1). Any countable group G can be embedded into a
2-generated group M such that any two elements of the same order are conjugate
in M and π(M) = π(G).
Applying this theorem to the group G = Z/2n−2Z one can show that for each
integer n ≥ 2 there exists a 2-generated group with (nCC). And when n = 2 we
get a 2-generated torsion-free group that has exactly two conjugacy classes.
The presence of elements of finite orders in the above constructions was impor-
tant, because if two elements have different orders, they can never be conjugate.
So, naturally, one can ask the following
Question 1. Do there exist torsion-free (finitely generated) groups with (nCC),
for any integer n ≥ 3?
Note that if G is the finitely generated group with (2CC) constructed by Osin,
then the m-th direct power Gm of G is also a finitely generated torsion-free group
which satisfies (2mCC). But what if we want to achieve a torsion-free group with
(3CC)? With this purpose one could come up with
Question 2. Suppose that G is a countable torsion-free group and x, y ∈ G are
non-conjugate. Is it possible to embed G into a groupM , which has (3CC), so that
x and y stay non-conjugate in M?
Unfortunately, the answer to Question 2 is negative as the following example
shows.
Example 1. Consider the group
(1.1) G1 = 〈a, t ‖ tat
−1 = a−1〉
which is isomorphic to the non-trivial semidirect product Z ⋊ Z. Note that G1 is
torsion-free, and t is not conjugated to t−1 in G1 because t ≁ t
−1 in the infinite
cyclic group 〈t〉 which is canonically isomorphic to the quotient of G1 by the normal
closure of a. However, if G1 is embedded into a (3CC)-group M , it is easy to see
that every element of M will be conjugated to its inverse (indeed, if y ∈ M \ {1}
and y
≁ y−1 then yǫ
∼ a−1, for some ǫ ∈ {1,−1}, hence yǫ
∼ y−ǫ – a
contradiction). In particular, t
∼ t−1.
An analog of the above example can be given for each n ≥ 3 – see Section 3. This
example shows that, in order to get a positive result, one would have to strengthen
the assumptions of Question 2.
Let G be a group. Two elements x, y ∈ G are said to be commensurable if there
exist k, l ∈ Z \ {0} such that xk is conjugate to yl. We will use the notation x
if x and y are commensurable in G. In the case when x is not commensurable with
y we will write x
6≈ y. Observe that commensurability, as well as conjugacy, defines
an equivalence relation on the set of elements of G. It is somewhat surprising that
if one replaces the words ”non-conjugate” with the words ”non-commensurable” in
Question 2, the answer becomes positive:
Corollary 1.2. Assume that G is a countable torsion-free group, n ∈ N, n ≥ 2,
and x1, . . . , xn−1 ∈ G \ {1} are pairwise non-commensurable. Then there exists a
group M and an injective homomorphism ϕ : G→M such that
1. M is torsion-free and generated by two elements;
GROUPS WITH FINITELY MANY CONJUGACY CLASSES 3
2. M has (nCC);
3. M is 2-boundedly simple;
4. the elements ϕ(x1), . . . , ϕ(xn−1) are pairwise non-commensurable in M .
Recall that a group G is said to be k-boundedly simple if for any x, y ∈ G \ {1}
there exist l ≤ k and g1, . . . , gl ∈ G such that x = g1yg
1 · · · glyg
l in G. A group
is called boundedly simple if it is k-boundedly simple for some k ∈ N. Evidently
every boundedly simple group is simple; the converse is not true in general. For
example, the infinite alternating group A∞ is simple but not boundedly simple
because conjugation preserves the type of the decomposition of a permutation into
a product of cycles. First examples of torsion-free finitely generated boundedly
simple groups were constructed by A. Muranov (see [12, Thm. 2], [13, Thm. 1]).
Corollary 1.2 is an immediate consequence of a more general Theorem 3.5 that
will be proved in Section 3.
Applying Corollary 1.2 to the group G = F (x1, . . . , xn−1), which is free on the
set {x1, . . . , xn−1}, and its non-commensurable elements x1, . . . , xn−1, we obtain a
positive answer to Question 1:
Corollary 1.3. For every integer n ≥ 3 there exists a torsion-free 2-boundedly
simple group satisfying (nCC) and generated by two elements.
(In the case when n = 2 the above statement was obtained by Osin in [21,
Cor. 1.3].) In fact, for any (finitely generated) torsion-free group H we can set
G = H ∗ F (x1, . . . , xn−1), and then use Corollary 1.2 to embed G into a group M
enjoying the properties 1− 4 from its claim. Since there is a continuum of pairwise
non-isomorphic 2-generated torsion-free groups ([4]), and a finitely generated group
can contain at most countably many of different 2-generated subgroups, this shows
that there must be continually many pairwise non-isomorphic groups satisfying
properties 1− 3 from Corollary 1.2.
Recall that the rank rank(G) of a group G is the minimal number of elements re-
quired to generate G. In Section 4 we show how classical theory of HNN-extensions
allows to construct different embeddings into (infinitely generated) groups that have
finitely many classes of conjugate elements, and in Section 5 we use Osin’s results
(from [21]) regarding quotients of relatively hyperbolic groups to prove
Theorem 1.4. Let H be a torsion-free countable group and let M ⊳H be a non-
trivial normal subgroup. Then H can be isomorphically embedded into a torsion-free
group Q, possessing a normal subgroup N ⊳Q, such that
• Q = H ·N and H ∩N =M (hence Q/N ∼= H/M);
• N has (2CC);
• ∀ x, y ∈ Q \ {1}, x
∼ y if and only if ϕ(x)
∼ ϕ(y), where ϕ : Q → Q/N
is the natural homomorphism;
• rank(N) = 2 and rank(Q) ≤ rank(H/M) + 2.
This theorem implies that if Q/N ∼= H/M has exactly (n− 1) conjugacy classes
(e.g., if it is finite), then the group Q will have (nCC) and will not be simple
(if n ≥ 3). Thus it may be used to build (nCC)-groups in a recursive manner.
It also allows to obtain embeddings of countable torsion-free groups into (nCC)-
groups, which we could not get by using Corollary 1.2. For instance, as we saw in
Example 1, the fundamental group of the Klein bottle G1, given by (1.1), can not
be embedded into a (3CC)-group M so that t
≁ t−1. However, with 4 conjugacy
4 ASHOT MINASYAN
classes this is already possible: see Corollary 5.5 in Section 5. The idea is as
follows: the group G1 can be mapped onto Z/3Z in such a way that the images of
the elements t and t−1 are distinct. Let M be the kernel of this homomorphism.
One can apply Theorem 1.4 to the pair (G1,M) to obtain the required embedding
of G1 into a group Q. And since Z/3Z has exactly 3 conjugacy classes, the group
Q will have (4CC).
An application of Theorem 1.4 to the case when H = Z and M = 2Z⊳H also
provides an affirmative answer to a question of A. Izosov from [9, Q. 11.42], asking
whether there exists a torsion-free (3CC)-group Q that contains a normal subgroup
N of index 2.
The goal of the second part of this article is to show that every countable group
can be realized as a group of outer automorphisms of some finitely generated (2CC)-
group. This problem has some historical background: in [11] T. Matumoto proved
that every group is a group of outer automorphisms of some group (in contrast,
there are groups, e.g., Z, that are not full automorphism groups of any group); M.
Droste, M. Giraudet, R. Göbel ([7]) showed that for every group C there exists
a simple group S such that Out(S) ∼= C; I. Bumagina and D. Wise in [3] proved
that each countable group C is isomorphic to Out(N) where N is a 2-generated
subgroup of a countable C′(1/6)-group, and if, in addition, C is finitely presented
then one can choose N to be residually finite.
In Section 6 we establish a few useful statements regarding paths in the Cayley
graph of a relatively hyperbolic group G, and apply them in Section 7 to obtain
small cancellation quotients of G satisfying certain conditions. Finally, in Section
8 we prove the following
Theorem 1.5. Let C be an arbitrary countable group. Then for every non-elemen-
tary torsion-free word hyperbolic group F1 there exists a torsion-free group N sat-
isfying the following properties:
• N is a 2-generated quotient of F1;
• N has (2CC);
• Out(N) ∼= C.
The principal difference between this theorem and the result of [3] is that our
group N is torsion-free and simple. Moreover, if one applies Theorem 1.5 to the
case when F1 is a torsion-free hyperbolic group with Kazhdan’s property (T) (and
recalls that every quotient of a group with property (T) also has (T)), one will get
Corollary 1.6. For any countable group C there is a 2-generated group N such
that N has (2CC) and Kazhdan’s property (T), and Out(N) ∼= C.
The reason why Kazhdan’s property (T) is interesting in this context is the
question from [6, p. 134] which asked whether there exist groups that satisfy
property (T) and have infinite outer automorphism groups (it can be motivated
by a theorem of F. Paulin [22] which claims that the outer automorphism group is
finite for any word hyperbolic group with property (T)). Positive answers to this
question were obtained (using different methods) by Y. Ollivier and D. Wise [14],
Y. de Cornulier [5], and I. Belegradek and D. Osin [2]. Corollary 1.6 not only
shows that the group of outer automorphisms of a group N with property (T)
can be infinite, but also demonstrates that there are no restrictions whatsoever on
Out(N).
GROUPS WITH FINITELY MANY CONJUGACY CLASSES 5
Acknowledgements. The author would like to thank D. Osin for fruitful dis-
cussions and encouragement.
2. Relatively hyperbolic groups
Assume that G is a group, {Hλ}λ∈Λ is a fixed collection of subgroups of G (called
peripheral subgroups), and X is a subset of G. The subset X is called a relative
generating set of G with respect to {Hλ}λ∈Λ if G is generated by X ∪
λ∈ΛHλ. In
this case G a quotient of the free product
F = (∗λ∈ΛHλ) ∗ F (X ),
where F (X ) is the free group with basis X . Let R be a subset of F such that the
kernel of the natural epimorphism F → G is the normal closure of R in the group
F ; then we will say that G has relative presentation
(2.1) 〈X , {Hλ}λ∈Λ ‖ R = 1, R ∈ R〉.
If the sets X and R are finite, the relative presentation (2.1) is said to be finite.
Set H =
λ∈Λ(Hλ \ {1}). A finite relative presentation (2.1) is said to satisfy a
linear relative isoperimetric inequality if there exists C > 0 such that, for every word
w in the alphabet X ∪H (for convenience, we will further assume that X−1 = X )
representing the identity in the group G, one has
f−1i R
i fi,
with equality in the group F , where Ri ∈ R, fi ∈ F , for i = 1, . . . , k, and k ≤ C‖w‖,
where ‖w‖ is the length of the word w.
The next definition is due to Osin (see [20]):
Definition. the group G is called hyperbolic relative to (the collection of peripheral
subgroups) {Hλ}λ∈Λ, if G admits a finite relative presentation (2.1) satisfying a
linear relative isoperimetric inequality.
This definition is independent of the choice of the finite generating set X and
the finite set R in (2.1) (see [20]). We would also like to note that, in general, it
does not require the group G to be finitely generated, which will be important in
this paper. The definition immediately implies the following basic facts:
Remark 2.1 ([20]). (a) Let {Hλ}λ∈Λ be an arbitrary family of groups. Then the
free product G = ∗λ∈ΛHλ will be hyperbolic relative to {Hλ}λ∈Λ.
(b) Any word hyperbolic group (in the sense of Gromov) is hyperbolic relative
to the family {{1}}, where {1} denotes the trivial subgroup.
Recall that a group H is called elementary if it has a cyclic subgroup of finite
index. Further in this section we will assume that G is a non-elementary group
hyperbolic relative to a family of proper subgroups {Hλ}λ∈Λ.
An element g ∈ G is said to be parabolic if it is conjugated to an element of Hλ
for some λ ∈ Λ. Otherwise g is said to be hyperbolic. Given a subgroup S ≤ G, we
denote by S0 the set of all hyperbolic elements of S of infinite order.
Lemma 2.2 ([17], Thm. 4.3, Cor. 1.7). For every g ∈ G0 the following conditions
hold.
6 ASHOT MINASYAN
1) The element g is contained in a unique maximal elementary subgroup EG(g)
of G, where
(2.2) EG(g) = {f ∈ G : fg
nf−1 = g±n for some n ∈ N}.
2) The group G is hyperbolic relative to the collection {Hλ}λ∈Λ ∪ {EG(g)}.
Recall that a non-trivial subgroup H ≤ G is called malnormal if for every g ∈
G \H , H ∩ gHg−1 = {1}. The next lemma is a special case of Theorem 1.4 from
[20]:
Lemma 2.3. For any λ ∈ Λ and any g /∈ Hλ, the intersection Hλ ∩ gHλg
−1 is
finite. If h ∈ G, µ ∈ Λ and µ 6= λ, then the intersection Hλ ∩ hHµh
−1 is finite. In
particular, if G is torsion-free then Hλ is malnormal (provided that Hλ 6= {1}).
Lemma 2.4 ([20], Thm. 2.40). Suppose that a group G is hyperbolic relative
to a collection of subgroups {Hλ}λ∈Λ ∪ {S1, . . . , Sm}, where S1, . . . , Sm are word
hyperbolic (in the ordinary non-relative sense). Then G is hyperbolic relative to
{Hλ}λ∈Λ.
Lemma 2.5 ([19], Cor. 1.4). Let G be a group which is hyperbolic relative to a
collection of subgroups {Hλ}λ∈Λ ∪ {K}. Suppose that K is finitely generated and
there is a monomorphism α : K → Hν for some ν ∈ Λ. Then the HNN-extension
〈G, t ‖ txt−1 = α(x), x ∈ K〉 is hyperbolic with respect to {Hλ}λ∈Λ.
In [21] Osin introduced the following notion: a subgroup S ≤ G is suitable if
there exist two elements g1, g2 ∈ S
0 such that g1
6≈ g2 and EG(g1)∩EG(g2) = {1}.
For any S ≤ G with S0 6= ∅, one sets
(2.3) EG(S) =
EG(g)
which is obviously a subgroup of G normalized by S. Note that EG(S) = {1} if the
subgroup S is suitable in G. As shown in [1, Lemma 3.3], if S is non-elementary
and S0 6= ∅ then EG(S) is the unique maximal finite subgroup of G normalized by
Lemma 2.6. Let {H}λ∈Λ be a family of groups and let F be a torsion-free non-
elementary word hyperbolic group. Then the free product G = (∗λ∈ΛHλ) ∗ F is
hyperbolic relative to {Hλ}λ∈Λ and F is a suitable subgroup of G.
Proof. Indeed, G is hyperbolic relative to {Hλ}λ∈Λ by Remark 2.1 and Lemma
2.4. Since F is non-elementary, there are elements of infinite order x, y ∈ F such
that x
6≈ y (see, for example, [16, Lemma 3.2]). Evidently, x and y are hyperbolic
elements of G that are not commensurable with each other, and the subgroups
EG(x) = EF (x) ≤ F , EG(y) = EF (y) ≤ F are cyclic (as elementary subgroups of a
torsion-free group). Hence EG(x) ∩ EG(y) = {1}, and thus F is suitable in G. �
Lemma 2.7 ([21], Lemma 2.3). Suppose that G is a group hyperbolic relative to a
family of subgroups {Hλ}λ∈Λ and S ≤ G is a suitable subgroup. Then one can find
infinitely many pairwise non-commensurable (in G) elements g1, g2, · · · ∈ S
0 such
that EG(gi) ∩ EG(gj) = {1} for all i 6= j.
The following theorem was proved by Osin in [21] using the theory of small
cancellation over relatively hyperbolic groups, and represents our main tool for
obtaining new quotients of such groups having a number of prescribed properties:
GROUPS WITH FINITELY MANY CONJUGACY CLASSES 7
Theorem 2.8 ([21], Thm. 2.4). Let G be a torsion-free group hyperbolic relative to
a collection of subgroups {Hλ}λ∈Λ, let S be a suitable subgroup of G, and let T, U
be arbitrary finite subsets of G. Then there exist a group G1 and an epimorphism
η : G→ G1 such that:
(i) The restriction of η to
λ∈ΛHλ ∪ U is injective, and the group G1 is hy-
perbolic relative to the collection {η(Hλ)}λ∈Λ;
(ii) for every t ∈ T , we have η(t) ∈ η(S);
(iii) η(S) is a suitable subgroup of G1;
(iv) G1 is torsion-free;
(v) the kernel ker(η) of η is generated (as a normal subgroup of G) by a finite
collection of elements belonging to T · S.
We have slightly changed the original formulation of the above theorem from
[21], demanding the injectivity on V =
λ∈ΛHλ ∪ U (instead of just
λ∈ΛHλ)
and adding the last point concerning the generators of the kernel. The latter
follows from the explicit form of the relations, imposed on G (see the proof of
Thm. 2.4 in [21]), and the former – from part 2 of Lemma 5.1 in [21] and the fact
that any element from V has length (in the alphabet X ∪ H) at most N , where
N = max{|h|X∪H : h ∈ U}+ 1.
3. Groups with finitely many conjugacy classes
Lemma 3.1. Let G be a group and let x1, x2, x3, x4 ∈ G be elements of infinite order
such that x1
6≈ xi, i = 2, 3, 4. Let H = 〈G, t ‖ tx3t
−1 = x4〉 be the HNN-extension
of G with associated cyclic subgroups generated by x3 and x4. Then x1
6≈ x2.
Proof. Arguing by contradiction, assume that hxl1h
−1xm2 = 1 for some h ∈ H ,
l,m ∈ Z \ {0}. The element h has a reduced presentation of the form
h = g0t
ǫ1g1t
ǫ2 . . . tǫkgk
where g0, . . . , gk ∈ G, ǫ1, . . . , ǫk ∈ Z \ {0}, and
gj /∈ 〈x3〉 if 1 ≤ j ≤ k − 1 and ǫj > 0, ǫj+1 < 0
gj /∈ 〈x4〉 if 1 ≤ j ≤ k − 1 and ǫj < 0, ǫj+1 > 0
By the assumptions, x1
6≈ x2 hence k ≥ 1, and in the group H we have
(3.1) hxl1h
−1xm2 = g0t
ǫ1g1t
ǫ2 . . . tǫkgkx
−ǫk . . . t−ǫ2g−11 t
−ǫ1 g̃0 = 1,
where g̃0 = g
2 ∈ G. By Britton’s Lemma (see [10, IV.2]), the left hand side
in (3.1) can not be reduced, and this can happen only if gkx
k belongs to either
〈x3〉 or 〈x4〉 in G, which would contradict the assumptions. Thus the lemma is
proved. �
Definition. Suppose that G is a group and Xi ⊂ G, i ∈ I, is a family of subsets.
We shall say that Xi, i ∈ I, are independent if no element of Xi is commensurable
with an element of Xj whenever i 6= j, i, j ∈ I.
Lemma 3.2. Assume that G is a countable torsion-free group, n ∈ N, n ≥ 2, and
non-empty subsets Xi ⊂ G \ {1}, i = 1, . . . , n− 1, are independent in G. Then G
can be (isomorphically) embedded into a countable torsion-free group M in such a
way that M has (nCC) and the subsets Xi, i = 1, . . . , n− 1, remain independent in
8 ASHOT MINASYAN
Proof. For each i = 1, . . . , n− 1, fix an element xi ∈ Xi. First we embed G into a
countable torsion-free group G1 such that for each non-trivial element g ∈ G there
exist j ∈ {1, . . . , n− 1} and t ∈ G1 satisfying tgt
−1 = xj in G1, and the subsets Xi,
i = 1, . . . , n− 1, stay independent in G1.
Let g1, g2, . . . be an enumeration of all non-trivial elements of G. Set G(0) = G
and suppose that we have already constructed the group G(k), containing G, so
that for each l ∈ {1, . . . , k} there is j ∈ {1, . . . , n− 1} such that the element gl is
conjugated in G(k) to xj , and Xi, i = 1, . . . , n− 1, are independent in G(k).
Suppose, at first, that gk+1 is commensurable in G(k) with an element of Xj
for some j. Then gk+1
6≈ h for every h ∈
i=1,i6=j
Xi. Define G(k + 1) to be the
HNN-extension 〈G(k), tk+1 ‖ tk+1gk+1t
k+1 = xj〉. By Lemma 3.1 the subsets Xi,
i = 1, . . . , n− 1, will remain independent in G(k + 1).
Thus we can assume that gk+1 is not commensurable with any element from
i=1 Xi in G(k). According to the induction hypotheses one can apply Lemma 3.1
to the HNN-extension
G(k + 1) = 〈G(k), tk+1 ‖ tk+1gk+1t
k+1 = x1〉
to see that the subsets Xi ⊂ G ≤ G(k + 1), i = 1, . . . , n − 1, are independent in
G(k + 1).
Now, setG1 =
k=0G(k). EvidentlyG1 has the required properties. In the same
manner, one can embed G1 into a countable torsion-free group G2 so that each non-
trivial element of G1 will be conjugated to xi in G2, for some i ∈ {1, . . . , n − 1},
and the subsets Xi, i = 1, . . . , n− 1, continue to be independent in G2.
Proceeding like that we obtain the desired groupM =
s=1Gs. By the construc-
tion, M is a torsion-free countable group which has exactly n conjugacy classes:
[1], [x1], . . . , [xn−1]. The subsets Xi, i = 1, . . . , n− 1, are independent in M because
they are independent in Gs for each s ∈ N. �
Corollary 3.3. In Lemma 3.2 one can add that the groupM is 2-boundedly simple.
Proof. Let a torsion-free countable group G and its non-empty independent subsets
Xi, i = 1, . . . , n− 1, be as in Lemma 3.2. Let F = F (a1, . . . , an−1, b1, . . . , bn−1) be
the free group with the free generating set {a1, . . . , an−1, b1, . . . , bn−1}, and consider
the group Ḡ = G ∗ F . For each i = 1, . . . , n− 1, define
X̄i = Xi ∪ {ai, a
i } ∪ {[aj, bi] | j = 1, . . . , n− 1, j 6= i} ⊂ Ḡ,
where [aj , bi] = ajbia
i . Using the universal properties of free groups and free
products one can easily see that the subsets X̄i, i = 1, . . . , n− 1, are independent
in Ḡ.
Now we apply Lemma 3.2 to find a countable torsion-free (nCC)-group M , con-
taining Ḡ, such that X̄i, i = 1, . . . , n− 1, are independent in M . Observe that this
implies that for any given i = 1, . . . , n− 1, any two elements of X̄i are conjugate in
M . For arbitrary x, y ∈ M \ {1} there exist i, j ∈ {1, . . . , n − 1} such that x
and y
∼ aj . If i = j then x
∼ y. Otherwise, y
∼ a−1j and x
∼ [aj , bi] which
is a product of two conjugates of aj , and, hence, of y. Therefore the group M is
2-boundedly simple, and since G ≤ Ḡ ≤M , the corollary is proved. �
GROUPS WITH FINITELY MANY CONJUGACY CLASSES 9
Below is a particular (torsion-free) case of a theorem proved by Osin in [21, Thm.
2.6]:
Lemma 3.4. Any countable torsion-free group S can be embedded into a 2-generated
group M so that S is malnormal in M and every element of M is conjugated to an
element of S in M .
Proof. Following Osin’s proof of Theorem 2.6 from [21], we see that the required
group M can be constructed as an inductive limit of relatively hyperbolic groups
G(i), i ∈ N. More precisely, one sets G(0) = S ∗ F2, where F2 is a free group of
rank 2, ξ0 = idG(0) : G(0) → G(0), and for each i ∈ N one constructs a group
G(i) and an epimorphism ξi : G(0) → G(i) so that ξi is injective on S, G(i) is
torsion-free and hyperbolic relative to {ξi(S)}, and ξi factors through ξi−1. The
group M is defined to be the direct limit of (G(i), ξi) as i → ∞, i.e., Q = G(0)/N
where N =
i∈N ker(ξi). By Lemma 2.3, ξi(S) is malnormal in G(i), hence the
image of S will also be malnormal in M . �
Theorem 3.5. Let G be a torsion-free countable group, n ∈ N, n ≥ 2, and non-
empty subsets Xi ⊂ G \ {1}, i = 1, . . . , n − 1, be independent in G. Then G can
be embedded into a 2-generated torsion-free group M which has (nCC), so that the
subsets Xi, i = 1, . . . , n− 1, stay independent in M . Moreover, one can choose M
to be 2-boundedly simple.
Proof. First, according to Corollary 3.3, we can embed the groupG into a countable
torsion-free group S such that S has (nCC) and is 2-boundedly simple, and Xi,
i = 1, . . . , n − 1, are independent in S. Second, we apply Lemma 3.4 to find the
2-generated group M from its claim. Choose any i, j ∈ {1, . . . , n − 1}, i 6= j, and
x ∈ Xi, y ∈ Xj. If x and y were commensurable inM , the malnormality of S would
imply that x and y must be commensurable in S, contradicting the construction.
Hence Xi, i = 1, . . . , n − 1, are independent in M . Since each element of M is
conjugated to an element of S, it is evident that M has (nCC), is torsion-free and
2-boundedly simple. �
Remark 3.6. A more direct proof of Theorem 3.5, not using Lemma 3.4, can be
extracted from the proof of Theorem 5.1 (see Section 5), applied to the case when
H =M .
It is easy to see that Theorem 3.5 immediately implies Corollary 1.2 that was
formulated in the Introduction. As promised, we now give a counterexample to
Question 2 (formulated in the Introduction) for any n ≥ 3.
Example 2. Let G2 = 〈a, t ‖ tat
−1 = a2〉 be the Baumslag-Solitar BS(1, 2)-group.
ThenG2 is torsion-free, and the elements t
2, t4, . . . , t2
are pairwise non-conjugate
in G2 (since this holds in the quotient of G2 by the normal closure of a). Suppose
that G2 is embedded into a group M having (nCC) so that t
2, t4, . . . , t2
pairwise non-conjugate in M . Then t2, . . . , t2
is the list of representatives of all
non-trivial conjugacy classes of M . Therefore there exist k, l ∈ {1, . . . , n− 1} such
that t
and a
. Consequently
and t2
hence k = l = n− 1 according to the assumptions. But this yields
n−1 M
∼ t2,
10 ASHOT MINASYAN
implying that t2
∼ t4, which contradicts our assumptions.
Thus G2 can not be embedded into a (nCC)-group M in such a way that
t2, . . . , t2
remain pairwise non-conjugate in M .
4. Normal subgroups with (nCC)
If M is a normal subgroup of a group H , then H naturally acts on M by con-
jugation. We shall say that this action preserves the conjugacy classes of M if for
any h ∈ H and a ∈M there exists b ∈M such that hah−1 = bab−1.
Lemma 4.1. Let G be a torsion-free group, N ⊳ G and x1, . . . , xl ∈ N \ {1}
be pairwise non-commensurable (in G) elements. Then there exists a partition
N \ {1} =
k=1Xk of N \ {1} into a (disjoint) union of G-independent subsets
X1, . . . , Xl such that xk ∈ Xk for every k ∈ {1, . . . , l}. Moreover, each subset Xk
will be invariant under conjugation by elements of G.
Proof. Since
≈ is an equivalence relation on G\{1}, one can find the corresponding
decomposition: G \ {1} =
j∈J Yj , where Yj is an equivalence class for each j ∈ J .
For each k = 1, . . . , l, there exists j(k) ∈ J such that xk ∈ Yj(k). Note that
j(k) 6= j(m) if k 6= m since xk
6≈ xm.
Denote J ′ = J \ {j(1), . . . , j(l − 1)},
X1 = Yj(1) ∩N, . . . , Xl−1 = Yj(l−1) ∩N, and Xl =
Yj ∩N.
EvidentlyN\{1} =
k=1Xk, X1, . . . , Xl are independent subsets of G and xk ∈ Xk
for each k = 1, . . . , l. The final property follows from the construction since for any
a ∈ G and j ∈ J we have aYja
−1 = Yj and aNa
−1 = N . �
Lemma 4.2. For every countable group C and each n ∈ N, n ≥ 2, there exists a
countable torsion-free group H having a normal subgroup M ⊳H such that
(i) M satisfies (nCC);
(ii) M is 2-boundedly simple;
(iii) the natural action of H on M preserves the conjugacy classes of M ;
(iv) H/M ∼= C.
Proof. Let H ′0 be the free group of infinite countable rank. Choose N
0 so that
H ′0/N
∼= C. Let F = F (x1, . . . , xn−1) denote the free group freely generated by
x1, . . . , xn−1. Define H0 = H
0 ∗ F and let N0 be the normal closure of N
0 ∪ F in
H0. Evidently, H0/N0 ∼= H
∼= C and the elements x1, . . . , xn−1 ∈ N0 \ {1} are
pairwise non-commensurable in H0.
By Lemma 4.1, one can choose a partition of N0 \ {1} into the union of H0-
independent subsets:
N0 \ {1} =
so that xk ∈ X0k for each k = 1, . . . , n− 1.
By Corollary 3.3 there exists a countable torsion-free 2-boundedly simple group
M1 with the property (nCC) containing a copy of N0, such that the subsets X0k,
k = 1, 2, . . . , n − 1, are independent in M1. Denote by H1 = H0 ∗N0 M1 the
amalgamated product of H0 and M1 along N0, and let N1 be the normal closure
GROUPS WITH FINITELY MANY CONJUGACY CLASSES 11
of M1 in H1. Note that H1 is torsion-free as an amalgamated product of two
torsion-free groups ([10, IV.2.7]).
We need to verify that the elements x1, . . . , xn−1 are pairwise non-commen-
surable in H1. Indeed, if a ∈ X0k and b ∈ X0l, k 6= l, are conjugate in H1 then
there must exist y1, . . . , yt ∈ M1 \N0 and z1, . . . , zt−1 ∈ H0 \N0, z0, zt ∈ H0 such
z0y1 · · · zt−1ytztaz
t−1 · · · y
Suppose that t is minimal possible with this property. As conjugation by elements
of H0 preserves X0k and X0l, we can assume that z0, zt = 1. Hence
y1z1 · · · zt−1ytay
t−1 · · · z
−1 H1= 1.
By the properties of amalgamated products (see [10, Ch. IV]), the left-hand side
in this equality can not be reduced, consequently ytay
t ∈ N0 \ {1} =
k=1 X0k.
But then ytay
t ∈ X0k by the properties of M1, contradicting the minimality of t.
Thus, we have shown that xk
6≈ xl whenever k 6= l.
Assume that the groupHi = Hi−1∗Ni−1Mi, i ≥ 1, has already been constructed,
so that
0) Hi is countable and torsion-free;
1) Ni−1 ⊳Hi−1;
2) Hi−1 = H0 ·Ni−1 and H0 ∩Ni−1 = N0;
3) Mi satisfies (nCC);
4) x1, . . . , xn−1 are pairwise non-commensurable in Hi.
Let Ni be the normal closure of Mi in Hi. Because of the condition 4) and
Lemma 4.1, one can find a partition of Ni \ {1} into a union of Hi-independent
subsets:
Ni \ {1} =
so that xk ∈ Xik for each k = 1, . . . , n − 1. By Lemma 3.2 there is a countable
group a Mi+1, with (nCC), containing a copy of Ni, in which the subsets Xik,
i = 1, . . . , n− 1, remain independent. Set Hi+1 = Hi ∗Ni Mi+1. Now, it is easy to
verify that the analogs of the conditions 0)-3) hold for Hi+1 and
(4.1) Ni−1 ≤Mi ≤ Ni ≤Mi+1.
The analog of the condition 4) is true in Hi+1 by the same considerations as before
(in the case of H1).
Define the group H =
i=1Hi and its subgroup M =
i=1Ni. Observe that
the condition 0) implies that H is torsion-free, condition 1) implies that M is
normal in H , and 2) implies that H = H0 ·M and H0 ∩M = N0. Hence H/M ∼=
H0/(H0∩M) ∼= C. Applying (4.1) we getM =
i=1Mi, and thus, by the conditions
3), 4) it enjoys the property (nCC): each element of M will be a conjugate of xk
for some k ∈ {1, . . . , n− 1}. Since x1, . . . , xn−1 ∈M1 ≤M and M1 is 2-boundedly
simple, then so will be M . Finally, 4) implies that xk
≁ xl whenever k 6= l,
and, consequently, the natural action of H on M preserves its conjugacy classes.
Q.e.d. �
Lemma 4.3. Suppose that G is a group, N ⊳ G, A,B ≤ G and ϕ : A → B is
an isomorphism such that ϕ(a) ∈ aN (i.e., the canonical images of a and ϕ(a) in
12 ASHOT MINASYAN
G/N coincide) for each a ∈ A. Let L = 〈G, t ‖ tat−1 = ϕ(a), ∀ a ∈ A〉 be the
HNN-extension of G with associated subgroups A and B, and let K be the normal
closure of 〈N, t〉 in L . Then G ∩K = N .
Proof. This statement easily follows from the universal property of HNN-extensions
and is left as an exercise for the reader. �
The next lemma will allow us to construct (nCC)-groups that are not simple:
Lemma 4.4. Assume that H is a torsion-free countable group and M⊳H is a non-
trivial normal subgroup. Then H can be isomorphically embedded into a countable
torsion-free group G possessing a normal subgroup K ⊳G such that
1) G = HK and H ∩K =M ;
2) ∀ x, y ∈ G \ {1}, ϕ(x) = ϕ(y) if and only if ∃ h ∈ K such that x = hyh−1,
where ϕ : G → G/K is the natural homomorphism; in particular, K will
have (2CC);
3) ∀ x, y ∈ G \ {1}, x
∼ y if and only if ϕ(x)
∼ ϕ(y);
Proof. Choose a set of representatives Z ⊂ H of cosets of H modulo M , in such a
way that each coset is represented by a unique element from Z and 1 /∈ Z.
DefineG(0)=H andK(0) =M . Enumerate the elements ofG(0)\{1}: g1, g2, . . . .
First we embed the group G(0) into a countable torsion-free group G1, having a
normal subgroup K1 ⊳G1, such that G1 = HK1, H ∩K1 =M and for every i ≥ 0
there are ti ∈ K1 and zi ∈ Z satisfying tigit
i = zi.
Suppose that the (countable torsion-free) group G(j), j ≥ 0, and K(j) ⊳ G(j),
have already been constructed so that H ≤ G(j), G(j) = HK(j), H ∩K(j) = M
and, if j ≥ 1, then tjgjt
j = zj for some tj ∈ K(j) and zj ∈ Z. The group G(j+1),
containing G(j), is defined as the following HNN-extension:
G(j + 1) = 〈G(j), tj+1 ‖ tj+1gj+1t
j+1 = zj+1〉,
where zj+1 ∈ Z ⊂ H is the unique representative satisfying gj+1 ∈ zj+1K(j) in
G(j). Denote by K(j+1)⊳G(j+1) the normal closure of 〈K(j), tj+1〉 in G(j+1).
Evidently the group G(j + 1) is countable and torsion-free, H ≤ G(j) ≤ G(j + 1),
G(j + 1) = HK(j + 1) and H ∩K(j + 1) = H ∩K(j) =M by Lemma 4.3.
Now, it is easy to verify that the group G1 =
j=0G(j) and its normal subgroup
j=0K(j) enjoy the required properties.
In the same way we can embed G1 into a countable torsion-free group G2, that
has a normal subgroup K2⊳G2, so that G2 = HK2, H∩K2 =M and each element
of G1 \ {1} is conjugated in G2 to a corresponding element of Z. Performing such
a procedure infinitely many times we achieve the group G =
i=1Gi and a normal
subgroup K =
i=1Ki ⊳ G that satisfy the claims 1) and 2) of the lemma. It is
easy to see that the claim 2) implies 3), thus the proof is finished. �
5. Adding finite generation
Theorem 5.1. Assume that H is a countable torsion-free group and M is a non-
trivial normal subgroup of H. Let F be an arbitrary non-elementary torsion-free
word hyperbolic group. Then there exist a countable torsion-free group Q, containing
H, and a normal subgroup N ⊳Q with the following properties:
1. H is malnormal in Q;
2. Q = H ·N and N ∩H =M ;
GROUPS WITH FINITELY MANY CONJUGACY CLASSES 13
3. N is a quotient of F ;
4. the centralizer CQ(N) of N in Q is trivial;
5. for every q ∈ Q there is z ∈ H such that q
Proof. The group Q will be constructed as a direct limit of relatively hyperbolic
groups.
Step 0 . Set G(0) = H ∗F and F (0) = F ; then G(0) is hyperbolic relative to its
subgroupH and F (0) is a suitable subgroup of G(0) by Lemma 2.6. LetN(0)⊳G(0)
be the normal closure of the subgroup 〈M,F 〉 in G(0). Evidently G(0) = H ·N(0)
and H ∩ N(0) = M . Enumerate all the elements of N(0): {g0, g1, g2, . . . }, and of
G(0): {q0, q1, q2, . . . }, in such a way that g0 = q0 = 1.
Steps 0-i . Assume the groups G(j), j = 0, . . . , i, i ≥ 0, have been already
constructed, so that
1◦. for each 1 ≤ j ≤ i there is an epimorphism ψj−1 : G(j−1) → G(j) which is
injective on (the image of) H in G(j − 1). Denote F (j) = ψj−1(F (j − 1)),
N(j) = ψj−1(N(j − 1));
2◦. G(j) is torsion-free and hyperbolic relative to (the image of) H , and F (j) ≤
G(j) is a suitable subgroup, j = 0, . . . , i;
3◦. G(j) = H ·N(j), N(j)⊳G(j) and H ∩N(j) =M , j = 0, . . . , i;
4◦. the natural image ḡj of gj in G(j) belongs to F (j), j = 0, . . . , i;
5◦. there exists zj ∈ H such that q̄j
∼ zj, j = 0, . . . , i, where q̄j is the image
of qj in G(j).
Step i+1 . Let q̂i+1 ∈ G(i), ĝi+1 ∈ N(i) be the images of qi+1 and gi+1 in G(i).
First we construct the group G(i+1/2), its normal subgroup Ki+1 and its element
ti+1 as follows.
If for some f ∈ G(i), f q̂i+1f
−1 = z ∈ H , then set G(i + 1/2) = G(i), Ki+1 =
N(i)⊳G(i + 1/2) and ti+1 = 1.
Otherwise, q̂i+1 is a hyperbolic element of infinite order in G(i). Since G(i) is
torsion-free, the elementary subgroup EG(i)(q̂i+1) is cyclic, thus EG(i)(q̂i+1) = 〈hx〉
for some h ∈ H and x ∈ N(i) (by 3◦), and q̂i+1 = (hx)
m for some m ∈ Z. Now, by
Lemma 2.2, G(i) is hyperbolic relative to {H, 〈hx〉}. Choose y ∈M so that hy 6= 1
and let G(i + 1/2) be the following HNN-extension of G(i):
G(i + 1/2) = 〈G(i), ti+1 ‖ ti+1(hx)t
i+1 = hy〉.
The group G(i+ 1/2) is torsion-free and hyperbolic relative to H by Lemma 2.5.
Let us now verify that the subgroup F (i) is suitable in G(i + 1/2). Indeed,
according to Lemma 2.7, there are two hyperbolic elements f1, f2 ∈ F (i) of infinite
order in G(i) such that fl
6≈ hx, fl
6≈ hy, l = 1, 2, and f1
6≈ f2. Then
G(i+1/2)
6≈ f2 by Lemma 3.1. It remains to check that fl is a hyperbolic element
of G(i + 1/2) for each l = 1, 2. Choose an arbitrary element w ∈ H and observe
that fl
6≈ w (since H is malnormal in G(i) by Lemma 2.3, a non-trivial power of
fl is conjugated to an element of H if and only if fl is conjugated to an element of
H in G(i), but the latter is impossible because fl is hyperbolic in G(i)). Applying
Lemma 3.1 again, we get that fl
G(i+1/2)
6≈ w for any w ∈ H . Hence f1, f2 ∈ F (i) are
hyperbolic elements of infinite order in G(i+1/2). The intersection EG(i+1/2)(f1)∩
14 ASHOT MINASYAN
EG(i+1/2)(f2) must be finite, since these groups are virtually cyclic (by Lemma 2.2),
and f1 is not commensurable with f2 in G(i+1/2). But G(i+ 1/2) is torsion-free,
therefore EG(i+1/2)(f1) ∩EG(i+1/2)(f2) = {1}. Thus F (i) is a suitable subgroup of
G(i+ 1/2).
Lemma 4.3 assures that H ∩Ki+1 =M where Ki+1 ⊳G(i + 1/2) is the normal
closure of 〈N(i), ti+1〉 in G(i + 1/2). Finally, note that
ti+1q̂i+1t
i+1 = ti+1(hx)
mt−1i+1 = (hy)
m = z ∈ H in G(i + 1/2).
Now, that the group G(i+ 1/2) has been constructed, set Ti+1 = {ĝi+1, ti+1} ⊂
Ki+1 and define G(i+ 1) as follows. Since Ti+1 ·F (i) ⊂ Ki+1 ⊳G(i+1/2), we can
apply Theorem 2.8 to find a group G(i+1) and an epimorphism ϕi : G(i+1/2) →
G(i + 1) such that ϕi is injective on H , G(i + 1) is torsion-free and hyperbolic
relative to (the image of) H , {ϕi(ĝi+1), ϕi(ti+1)} ⊂ ϕi(F (i)), ϕi(F (i)) is a suitable
subgroup of G(i + 1), and ker(ϕi) ≤ Ki+1. Denote by ψi the restriction of ϕi on
G(i). Then ψi(G(i)) = ϕi(G(i)) = G(i + 1) because G(i + 1/2) was generated by
G(i) and ti+1, and according to the construction, ti+1 ∈ ϕi(F (i)) ≤ ϕi(G(i)). Now,
after defining F (i+1) = ψi(F (i)), N(i+1) = ψi(N(i)), ḡi+1 = ϕi(ĝi+1) ∈ F (i+1)
and zi+1 = ϕi(z) ∈ H , we see that the conditions 1
◦,2◦,4◦ and 5◦ hold in the case
when j = i+1. The properties G(i+1) = H ·N(i+1) and N(i+1)⊳G(i+1) are
immediate consequences of their analogs for G(i) and N(i). Finally, observe that
ϕ−1i (H ∩N(i+ 1)) = H · ker(ϕi) ∩N(i) · ker(ϕi) =
H ∩N(i) · ker(ϕi)
· ker(ϕi)
H ∩Ki+1
· ker(ϕi) =M · ker(ϕi).
Therefore H ∩N(i+ 1) =M and the condition 3◦ holds for G(i + 1).
Let Q = G(∞) be the direct limit of the sequence (G(i), ψi) as i → ∞, and let
F (∞) and N = N(∞) be the limits of the corresponding subgroups. Then Q is
torsion-free by 2◦, N ⊳ Q, Q = H · N and H ∩N = M by 3◦. N ≤ F (∞) by 4◦,
and 5◦ implies the condition 5 from the claim.
Since F (0) ≤ N(0) we get F (∞) ≤ N . Thus N = F (∞) is a homomorphic
image of F (0) = F .
For any i, j ∈ N ∪ {∞}, i < j, we have a natural epimorphism ζij : G(i) → G(j)
such that if i < j < k then ζjk ◦ ζij = ζik. Take any g ∈ G(0). Since F = F (0)
is finitely generated, using the properties of direct limits one can show that if
w = ζ0∞(g) ∈ CQ(F (∞)) in Q, then ζ0j(g) ∈ CG(j)(F (j)) for some j ∈ N. But
CG(j)(F (j)) ≤ EG(j)(F (j)) = {1} (by formulas (2.2) and (2.3)) because F (j) is
a suitable subgroup of G(j), hence w = ζj∞
ζ0j(g)
= 1, that is, CQ(F (∞)) =
CQ(N) = {1}. This concludes the proof. �
The next statement is well-known:
Lemma 5.2. Assume G is a group and N ⊳ G is a normal subgroup such that
CG(N) ⊆ N , where CG(N) is the centralizer of N in G. Then the quotient-group
G/N embeds into the outer automorphism group Out(N).
Proof. The action of G on N by conjugation induces a natural homomorphism
ϕ from G to the automorphism group Aut(N) of N . Since ϕ(N) is exactly the
group of inner automorphisms Inn(N) of N , one can define a new homomorphism
ϕ̄ : G/N → Out(N) = Aut(N)/Inn(N) in the natural way: ϕ̄(gN) = ϕ(g)Inn(N)
for every gN ∈ G/N . It remains to check that ϕ̄ is injective, i.e., if g ∈ G \N then
ϕ̄(gN) 6= 1 in Out(N); or, equivalently, ϕ(g) /∈ Inn(N). Indeed, otherwise there
GROUPS WITH FINITELY MANY CONJUGACY CLASSES 15
would exist a ∈ N such that ghg−1 = aha−1 for every h ∈ N , thus N 6∋ a−1g ∈
CG(N), contradicting the assumptions. Q.e.d. �
Note that for an arbitrary group N , any subgroup C ≤ Out(N) naturally acts
on the set of conjugacy classes C(N) of the group N .
Theorem 5.3. For any n ∈ N, n ≥ 2, and an arbitrary countable group C, C can
be isomorphically embedded into the outer automorphism group Out(N) of a group
N satisfying the following conditions:
• N is torsion-free;
• N is generated by two elements;
• N has (nCC) and the natural action of C on C(N) is trivial;
• N is 2-boundedly simple.
Proof. By Lemma 4.2 we can find a countable torsion-free group H and its normal
subgroup M enjoying the properties (i)-(iv) from its claim. Now, if F denotes
the free group of rank 2, we can obtain a countable torsion-free group Q together
with its normal subgroup N that satisfy the conditions 1-5 from the statement of
Theorem 5.1.
Then N is torsion-free and generated by two elements (as a quotient of F ).
Condition 2 implies that Q/N ∼= H/M ∼= C and, by 4 and Lemma 5.2, C embeds
into the group Out(N).
Using property 5, for each g ∈ N we can find u ∈ Q and z ∈ H such that
ugu−1 = z ∈ N ∩ H = M . Since Q = HN , there are h ∈ H and x ∈ N
such that u = hx. Since z, h−1zh ∈ M and the action of H on M preserves
the conjugacy classes of M , there is r ∈ M such that rh−1zhr−1 = z, hence
z = rh−1ugu−1(rh−1)−1 = rxgx−1r−1, where v = rx ∈ N . Thus for every g ∈ N
there is v ∈ N such that vgv−1 ∈ M . Evidently, this implies that N is also 2-
boundedly simple. Since M has (nCC), the number of conjugacy classes in N will
be at most n.
Suppose x1, x2 ∈ M and x1
≁ x2. Then x1
≁ x2 (by the property (iii) from
the claim of Lemma 4.2), and since H is malnormal in Q we get x1
≁ x2. Hence
≁ x2, i.e., N also enjoys (nCC).
The fact that the natural action of C on C(N) is trivial follows from the same
property for the action of H on C(M) and the malnormality of H in Q. Q.e.d. �
Now, let us proceed with the
Proof of Theorem 1.4. First we apply Lemma 4.4 to construct a group G and a
normal subgroup K ⊳ G according to its claim. Now, by Theorem 5.1, there is a
groupQ, having a normal subgroupN⊳Q such that G is malnormal in Q, Q = GN ,
G ∩ N = K, rank(N) ≤ 2 (if one takes the free group of rank 2 as F ) and every
element q ∈ Q is conjugated (in Q) to an element of G. By claim 2) of Lemma 4.4,
K has (2CC), and an argument, similar to the one used in the proof of Theorem
5.3, shows that N will also have (2CC). Consequently, rank(N) > 1 because N is
torsion-free, hence rank(N) = 2.
Since G = HK and H ∩ K = M we have Q = HKN = HN and H ∩ N =
H ∩K =M . Since Q/N ∼= H/M and N can be generated by two elements, we can
conclude that rank(Q) ≤ rank(H/M) + 2.
16 ASHOT MINASYAN
Consider arbitrary x, y ∈ Q \ {1} and suppose that ϕ(x)
∼ ϕ(y). By Theorem
5.1, there are w, z ∈ G \ {1} such that x
∼ w and y
∼ z. Therefore ϕ(w)
∼ ϕ(z),
hence the images of w and z in G/K are also conjugate. By claim 3) of Lemma
4.4, w
∼ z, implying x
∼ y. �
Theorem 1.4 provides an alternative way of obtaining torsion-free groups that
have finitely many conjugacy classes: for any countable group C we can choose
a free group H of countable rank and a normal subgroup {1} 6= M ⊳ H so that
H/M ∼= C, and then apply Theorem 1.4 to the pair (H,M) to get
Corollary 5.4. Assume that n ∈ N, n ≥ 2, and C is a countable group that
contains exactly (n− 1) distinct conjugacy classes. Then there exists a torsion-free
group Q and N ⊳Q such that
• Q/N ∼= C;
• N has (2CC) and Q has (nCC);
• rank(N) = 2 and rank(Q) ≤ rank(C) + 2.
Corollary 5.5. The group G1, given by presentation (1.1), can be isomorphically
embedded into a 2-generated torsion-free group Q satisfying (4CC) in such a way
that t
≁ t−1.
Proof. Denote by K the kernel of the homomorphism ϕ : G1 → Z3, for which
ϕ(a) = 0 and ϕ(t) = 1, where Z3 is the group of integers modulo 3. Now, apply
Theorem 1.4 to the pair (G1,K) to find the group Q, containing G1, and the normal
subgroup N ⊳ Q from its claim. Since Q/N ∼= G1/K ∼= Z3 has (3CC), the group
Q will have (4CC). We also have t
≁ t−1 because the images of t and t−1 are not
conjugate in Q/N .
Choose an element q1 ∈ Q \ N . Then q2 = q
1 ∈ N \ {1} and since N is 2-
generated and has (2CC), there is q3 ∈ N such that N = 〈q2, q3〉 in Q. As Q/N
is generated by the image of q1, the group Q will be generated by {q1, q2, q3}, and,
consequently, by {q1, q3}. Q.e.d. �
6. Combinatorics of paths in relatively hyperbolic groups
Let G be a group hyperbolic relative to a family of proper subgroups {Hλ}λ∈Λ,
and let X be a finite symmetrized relative generating set of G. Denote H =
λ∈Λ (Hλ \ {1}).
For a combinatorial path p in the Cayley graph Γ(G,X ∪H) (of G with respect
to X ∪H) p−, p+, L(p), and lab(p) will denote the initial point, the ending point,
the length (that is, the number of edges) and the label of p respectively. p−1 will
be the path obtained from p by following it in the reverse direction. Further, if Ω
is a subset of G and g ∈ 〈Ω〉 ≤ G, then |g|Ω will be used to denote the length of a
shortest word in Ω±1 representing g.
We will be using the following terminology from [20]. Suppose q is a path in
Γ(G,X ∪ H). A subpath p of q is called an Hλ-component for some λ ∈ Λ (or
simply a component) of q, if the label of p is a word in the alphabet Hλ \ {1} and
p is not contained in a bigger subpath of q with this property.
Two components p1, p2 of a path q in Γ(G,X ∪ H) are called connected if they
are Hλ-components for the same λ ∈ Λ and there exists a path c in Γ(G,X ∪ H)
connecting a vertex of p1 to a vertex of p2 such that lab(c) entirely consists of letters
GROUPS WITH FINITELY MANY CONJUGACY CLASSES 17
from Hλ. In algebraic terms this means that all vertices of p1 and p2 belong to the
same coset gHλ for a certain g ∈ G. We can always assume c to have length at
most 1, as every nontrivial element of Hλ is included in the set of generators. An
Hλ-component p of a path q is called isolated if no other Hλ-component of q is
connected to p.
The next statement is a particular case of Lemma 2.27 from [20]; we shall for-
mulate it in a slightly more general form, as it appears in [18, Lemma 2.7]:
Lemma 6.1. Suppose that a group G is hyperbolic relative to a family of subgroups
{Hλ}λ∈Λ. Then there exists a finite subset Ω ⊆ G and a constant K ∈ N such
that the following holds. Let q be a cycle in Γ(G,X ∪H), p1, . . . , pk be a collection
of isolated components of q and g1, . . . , gk be the elements of G represented by
Lab(p1), . . . ,Lab(pk) respectively. Then g1, . . . , gk belong to the subgroup 〈Ω〉 ≤ G
and the word lengths of gi’s with respect to Ω satisfy
|gi|Ω ≤ KL(q).
Definition. Suppose that m ∈ N and Ω is a finite subset of G. Define W(Ω,m) to
be the set of all words W over the alphabet X ∪H that have the following form:
W ≡ x0h0x1h1 . . . xlhlxl+1,
where l ∈ Z, l ≥ −2 (if l = −2 then W is the empty word; if l = −1 then W ≡ x0),
hi and xi are considered as single letters and
1) xi ∈ X ∪{1}, i = 0, . . . , l+1, and for each i = 0, . . . , l, there exists λ(i) ∈ Λ
such that hi ∈ Hλ(i);
2) if λ(i) = λ(i + 1) then xi+1 /∈ Hλ(i) for each i = 0, . . . , l− 1;
3) hi /∈ {h ∈ 〈Ω〉 : |h|Ω ≤ m}, i = 0, . . . , l.
Choose the finite subset Ω ⊂ G and the constant K > 0 according to the claim
of Lemma 6.1.
Recall that a path q in Γ(G,X ∪ H) is said to be without backtracking if all of
its components are isolated.
Lemma 6.2. Let q be a path in the Cayley graph Γ(G,X ∪ H) with Lab(q) ∈
W(Ω,m) and m ≥ 5K. Then q is without backtracking.
Proof. Assume the contrary to the claim. Then one can choose a path q providing
a counterexample of the smallest possible length. Thus if p1, . . . , pl is the (consec-
utive) list of all components of q then l ≥ 2, p1 and pl must be connected Hλ′ -
components, for some λ′ ∈ Λ, the components p2, . . . , pl−1 must be isolated, and q
starts with p1 and ends with pl. Since Lab(q) ∈ W(Ω,m) we have L(q) ≤ 2l− 1.
If l = 2 then the (X ∪ {1})-letter between p1 and p2 would belong to Hλ′
contradicting the property 2) from the definition of W(Ω,m).
Therefore l ≥ 3. Since p1 and pl are connected, there exists a path v in Γ(G,X ∪
H) between (pl)− and (p1)+ with Lab(v) ∈ Hλ′ (thus we can assume that L(v) ≤ 1).
Denote by q̂ the subpath of q starting with (p1)+ and ending with (pl)−. Note that
L(q̂) = L(q)−2 ≤ 2l−3, and p2, . . . , pl−1 is the list of components of q̂, all of which
are isolated. If one of them were connected to v it would imply that it is connected
to p1 contradicting with the minimality of q. Hence the cycle o = q̂v possesses
18 ASHOT MINASYAN
k = l − 2 ≥ 1 isolated components, which represent elements h1, . . . , hk ∈ H.
Consequently, applying Lemma 6.1 one obtains that hi ∈ 〈Ω〉, i = 1, . . . , k, and
|hi|Ω ≤ KL(o) ≤ K(L(q̂) + 1) ≤ K(2l− 2).
By the condition 3) from the definition of W(Ω,m) one has |hi|Ω > m ≥ 5K for
each i = 1, . . . , k. Hence
k · 5K ≤
|hi|Ω ≤ K(2l− 2), or 5 ≤
2l − 2
which contradicts the inequality k ≥ l − 2. Q.e.d. �
Definition. Consider an arbitrary cycle o = rqr′q′ in Γ(G,X ∪ H), where Lab(q)
and Lab(q′) belong to W(Ω,m). Let p be a component of q (or q′). We will say
that p is regular if it is not an isolated component of o. If m ≥ 5K, and hence q and
q′ are without backtracking by Lemma 6.2, this means that p is either connected
to some component of q′ (respectively q), or to a component of r or r′.
Lemma 6.3. In the above notations, suppose that m ≥ 7K and denote C =
max{L(r),L(r′)}. Then
(a) if C ≤ 1 then every component of q or q′ is regular;
(b) if C ≥ 2 then each of q and q′ can have at most 4C components which are
not regular.
(c) if l is the number of components of q, then at least (l− 6C) of components
of q are connected to components of q′; and two distinct components of q
can not be connected to the same component of q′. Similarly for q′.
Proof. Assume the contrary to (a). Then one can choose a cycle o = rqr′q′ with
L(r),L(r′) ≤ 1, having at least one isolated component on q or q′, and such that
L(q) + L(q′) is minimal. Clearly the latter condition implies that each component
of q or q′ is an isolated component of o. Therefore q and q′ together contain k
distinct isolated components of o, representing elements h1, . . . , hk ∈ H, where
k ≥ 1 and k ≥ (L(q) − 1)/2 + (L(q′) − 1)/2. Applying Lemma 6.1 we obtain
hi ∈ 〈Ω〉, i = 1, . . . , k, and
|hi|Ω ≤ KL(o) ≤ K(L(q) + L(q
′) + 2).
Recall that |hi|Ω > m ≥ 7K by the property 3) from the definition of W(Ω,m).
Therefore
i=1 |hi|Ω ≥ k · 7K, implying
L(q′)
L(q)− 1
L(q′)− 1
which yields a contradiction.
Let us prove (b). Suppose that C ≥ 2 and q contains more than 4C isolated
components of o. We shall consider two cases:
Case 1. No component of q is connected to a component of q′. Then a com-
ponent of q or q′ can be regular only if it is connected to a component of r or r′.
Since, by Lemma 6.2, q and q′ are without backtracking, two distinct components
of q or q′ can not be connected to the same component of r (or r′). Hence q and
GROUPS WITH FINITELY MANY CONJUGACY CLASSES 19
q′ together can contain at most 2C regular components. Thus the cycle o has k
isolated components, representing elements h1, . . . , hk ∈ H, where k ≥ 4C > 4 and
k ≥ (L(q)−1)/2+(L(q′)−1)/2−2C. By Lemma 6.1, hi ∈ 〈Ω〉 for each i = 1, . . . , k,
i=1 |hi|Ω ≤ K(L(q) + L(q
′) + 2C). Once again we can use the property 3)
from the definition of W(Ω,m) to achieve
L(q′)
L(q)− 1
L(q′)− 1
− 2C + 1 + 3C
L(q)− 1
L(q′)− 1
≤ 2 +
yielding a contradiction.
Case 2. The path q has at least one component which is connected to a com-
ponent of q′. Let p1, . . . , pl denote the sequence of all components of q. By part
(a), if ps and pt, 1 ≤ s ≤ t ≤ l, are connected to components of q
′, then for any
j, s ≤ j ≤ t, pj is connected to some component of q
′ (because q is without back-
tracking by Lemma 6.2). We can take s (respectively t) to be minimal (respectively
maximal) possible. Consequently p1, . . . , ps−1, pt+1, . . . , pl will contain the set of
all isolated components of o that belong to q, and none of these components will
be connected to a component of q′.
Without loss of generality we may assume that s− 1 ≥ 4C/2 = 2C. Since ps is
connected to some component p′ of q′, there exists a path v in Γ(G,X∪H) satisfying
v− = (ps)−, v+ = p
+, Lab(v) ∈ H ∪ {1}, L(v) ≤ 1. Let q̄ (respectively q̄
′) denote
the subpath of q (respectively q′) from q− to (ps)− (respectively from p
+ to q
Consider a new cycle ō = rq̄vq̄′. Reasoning as before, one can show that ō has k
isolated components, where k ≥ 2C ≥ 4 and k ≥ (L(q̄)−1)/2+(L(q̄′)−1)/2−C−1.
Now, an application of Lemma 6.1 to the cycle ō together with the property 3) from
the definition of W(Ω,m) will lead to a contradiction as before.
By the symmetry, the statement (b) of the lemma also holds for q′.
The claim (c) follows from (b) and the estimate L(r) + L(r′) ≤ 2C because if
two different components p and p̄ of q were connected to the same component of
some path in Γ(G,X ∪H), then p and p̄ would also be connected with each other,
which would contradict Lemma 6.2. �
Lemma 6.4. In the previous notations, let m ≥ 7K, C = max{L(r),L(r′)}, and
let p1, . . . , pl, p
1, . . . , p
l′ be the consecutive lists of the components of q and q
respectively If l ≥ 12max{C, 1} + 2, then there are indices s, t, s′ ∈ N such that
1 ≤ s ≤ 6C + 1, l − 6max{C, 1} ≤ t ≤ l and for every i ∈ {0, 1, . . . , t − s}, the
component ps+i of q is connected to the component p
s′+i of q
Proof. By part (c) of Lemma 6.3, there exists s ≤ 6C +1 such that the component
ps is connected to a component p
s′ for some s
′ ∈ {1, . . . , l′}. Thus there is a path
r1 between (p
s′)+ and (ps)+ with L(r1) ≤ 1. Consider a new cycle o1 = r1q1r
where q1 is the segment of q from (ps)+ to q+ = r
− and q
1 is the segment of q
from q′− = r
+ to (p
s′)+.
Observe that ps+1, . . . , pl is the list of all components of q1 and l−s ≥ l−6C−1 ≥
6max{1, C}+ 1, hence, according to part (c) of Lemma 6.3 applied to o1, there is
t ≥ l − 6max{1, C} > s such that pt is connected to p
t′ by means of a path r
where s′ + 1 ≤ t′ ≤ l′, (r′1)− = (pt)+, (r
1)+ = (p
t′)+ and L(r
1) ≤ 1. Consider
20 ASHOT MINASYAN
s′+i′ p
Figure 1.
the cycle o2 = r1q2r
2 in which q2 and q
2 are the segments of q1 and q
1 from
(ps)+ = (r1)+ to (pt)+ and from (p
t′)+ to (p
s′)+ = (r1)− respectively (Fig. 1).
Note that ps+1, . . . , pt is the list of all components of q2 and p
s′+1, . . . , p
t′ is the
list of all components of q′2
. The cycle o2 satisfies the assumptions of part (a) of
Lemma 6.3, therefore for every i ∈ {1, . . . , t − s} there exists i′ ∈ {1, . . . , t′ − s′}
such that ps+i is connected to p
s′+i′ (ps+i can not be connected to r1 [r
1] because in
this case it would be connected to ps [pt], but q is without backtracking by Lemma
6.2).
It remains to show that i′ = i for every such i. Indeed, if i′ < i for some
i ∈ {1, . . . , t − s} then one can consider the cycle o3 = r1q3r
3, where q3 and
q′3 are segments of q2 and q
2 from (q2)− = (r1)+ to (ps+i)+ and from (p
s′+i′ )+ to
(q′2)+ = (r1)− respectively, and (r
3)− = (q3)+, (r
3)+ = (q
3)−, L(r
3) ≤ 1. According
to part (a) of Lemma 6.3, each of the components ps+1, . . . , ps+i of q3 must be
connected to one of p′s′+1, . . . , p
s′+i′ . Hence, since i
′ < i, two distinct components
of q3 will be connected to the same component of q
, which is impossible by part
(c) of Lemma 6.3.
The inequality i′ > i would lead to a contradiction after an application of a
symmetric argument to q′3. Therefore i
′ = i and the lemma is proved. �
Lemma 6.5. In the above notations, let m ≥ 7K and C = max{L(r),L(r′)}. For
any positive integer d there exists a constant L = L(C, d) ∈ N such that if L(q) ≥ L
then there are d consecutive components ps, . . . , ps+d−1 of q and p
s′ , . . . , p
s′+d−1 of
q′−1, so that ps+i is connected to p
s′+i for each i = 0, . . . , d− 1.
Proof. Choose the constant L so that (L − 1)/2 ≥ 12max{C, 1} + 2 + d. Let
p1, . . . , pl be the consecutive list all components of q. Since Lab(q) ∈ W(Ω,m), we
have l ≥ (L − 1)/2 (due to the form of any word from W(Ω,m)). Thus we can
apply Lemma 6.4 to find indices s, t from its claim. By the choice of s and t, and
the estimate on l, we have t− s ≥ d+ 1, yielding the statement of the lemma. �
Corollary 6.6. Let G be a group hyperbolic relative to a family of proper subgroups
{Hλ}λ∈Λ. Suppose that a ∈ Hλ0 , for some λ0 ∈ Λ, is an element of infinite order,
and x1, x2 ∈ G \ Hλ0 . Then there exists k ∈ N such that g = a
k1x1a
k2x2 is a
hyperbolic element of infinite order in G whenever |k1|, |k2| ≥ k.
GROUPS WITH FINITELY MANY CONJUGACY CLASSES 21
Proof. Without loss of generality we can assume that x1, x2 ∈ X , since relative
hyperbolicity does not depend on the choice of the finite relative generating set
([20, Thm. 2.34]). Choose the finite subset Ω ⊂ G and the constant K ∈ N
according to the claim of Lemma 6.1, and set m = 7K. As the order of a is infinite,
there is k ∈ N such that ak
/∈ {h ∈ 〈Ω〉 : |h|Ω ≤ m} whenever |k
′| ≥ k. Assume
that |k1|, |k2| ≥ k.
Suppose, first, that gl = 1 for some l ∈ N. Consider the cycle o = rqr′q′ in
Γ(G,X ∪ H) where q− = q+ = 1, Lab(q) ≡ (a
k1x1a
k2x2)
l ∈ W(Ω,m) (akj are
considered as single letters from the alphabet X ∪H) and r, r′, q′ are trivial paths
(consisting of a single point). Then, by part (a) of Lemma 6.3, every component of
q must be regular in o, which is impossible since q is without backtracking according
to Lemma 6.2. Hence g has infinite order in G.
Suppose, now, that there exists λ′ ∈ Λ, u ∈ Hλ′ and y ∈ G such that ygy
−1 = u.
Denote C = |y|X∪H. Since element u ∈ G has infinite order, there exists l ∈ N such
that 2l ≥ 6C+2 and ul /∈ {h ∈ 〈Ω〉 : |h|Ω ≤ m}. The equality yg
ly−1u−l = 1 gives
rise to the cycle o = rqr′q′ in Γ(G,X∪H), where r and r′ are paths of length C whose
labels represent y in G, r− = 1, q− = r+ = y, Lab(q) ≡ (a
k1x1a
k2x2)
l ∈ W(Ω,m),
r′− = q+, q
− = r
+ = y(a
k1x1a
k2x2)
ly−1 and Lab(q′) ≡ u−l ∈ W(Ω,m), L(q′) = 1.
By part (c) of Lemma 6.3, at least 2l − 6C ≥ 2 distinct components of q must
be connected to distinct components of q′, which is impossible as q′ has only one
component. The contradiction shows that g must be a hyperbolic element of G. �
Lemma 6.7. Let G be a torsion-free group hyperbolic relative to a family of proper
subgroups {Hλ}λ∈Λ, a ∈ Hλ0 \ {1}, for some λ0 ∈ Λ, and t, u ∈ G \Hλ0 . Suppose
that there exists k̂ ∈ N such that for every k ≥ k̂ the element g1 = a
ktakt−1
is commensurable with g2 = a
kuaku−1 in G. Then there are β, γ ∈ Hλ0 and
ǫ, ξ ∈ {−1, 1} such that u = γtξβ, βaβ−1 = aǫ, γ−1aγ = aǫ.
Proof. Changing the finite relative generating set X of G, if necessary, we can
assume that t, u, t−1, u−1 ∈ X . Let the finite subset Ω ⊂ G and the constant
K ∈ N be chosen according to Lemma 6.1. Define m = 7K and suppose that k is
large enough to satisfy ak /∈ {h ∈ 〈Ω〉 : |h|Ω ≤ m}.
Since g1 and g2 are commensurable, there exist l, l
′ ∈ Z\{0} and y ∈ G such that
ygl2y
−1 = gl
1 . Let C = |y|X∪H, d = 8 and L = L(C, d) be the constant from Lemma
6.5. Without loss of generality, assume that 4l ≥ L. Consider the cycle o = rqr′q′
in Γ(G,X ∪ H) such that r and r′ are paths of length C whose labels represent y
in G, r− = 1, q− = r+ = y, Lab(q) ≡ (a
kuaku−1)l ∈ W(Ω,m), L(q) = 4l, r′− = q+,
q′− = r
+ = yg
−1, Lab(q′) ≡ (aktakt−1)l
∈ W(Ω,m), L(q′) = 4l′.
Now, by Lemma 6.5, there are subpaths q̃ = p1s1p2s2p3s3p4 of q and q̃
4 of q
′−1 such that Lab(pi) ≡ a
k, Lab(p′i) ≡ a
ǫk, i = 1, 2, 3, 4, for
some ǫ ∈ {−1, 1} (which depends on the sign of l′), Lab(s1) ≡ Lab(s3) ≡ u,
Lab(s2) ≡ u
−1, Lab(s′1) ≡ Lab(s
3) ≡ t
ξ, Lab(s′2) ≡ t
−ξ, for some ξ ∈ {−1, 1}, and
pi is connected in Γ(G,X∪H) to p
i for each i = 1, 2, 3, 4. Therefore there exist paths
p̃1, p̃2, p̃3, p̃4 whose labels represent the elements α, β, γ, δ ∈ Hλ0 respectively, such
that (p̃1)− = (p1)+, (p̃1)+ = (p
1)+, (p̃2)− = (p
2)+, (p̃2)+ = (p2)+, (p̃3)− = (p3)−,
(p̃3)+ = (p
3)−, (p̃4)− = (p
4)−, (p̃4)+ = (p4)− (see Fig. 2).
The cycles s−11 p̃1s
2p̃2p
2 , s2p̃3s
p̃2 and s
3 p̃3p
3p̃4 give rise to the fol-
lowing equalities in the group G:
u = αtξaǫkβa−k, u = γtξβ and u = a−kγaǫktξδ.
22 ASHOT MINASYAN
p1 s1 p2 s2 p3 s3 p4
p′1 s
p̃1 p̃2 p̃3 p̃4
ak u ak u−1 ak u
aǫkaǫkaǫk
t−ξtξ tξ
Figure 2.
Consequently, recalling that Hλ0 is malnormal (Lemma 2.3) and that t
ξ /∈ Hλ0 , we
βakβ−1a−ǫk = t−ξγ−1αtξ ∈ Hλ0 ∩ t
−ξHλ0t
ξ = {1}, and
a−ǫkγ−1akγ = tξδβ−1t−ξ ∈ Hλ0 ∩ t
ξHλ0t
−ξ = {1}.
(6.1) βakβ−1 = aǫk and γ−1akγ = aǫk
for some β = β(k), γ = γ(k) ∈ Hλ0 and ǫ = ǫ(k), ξ = ξ(k) ∈ {−1, 1}. Note that
the proof works for any sufficiently large k, therefore we can find two mutually
prime positive integers k, k′ with the above properties such that ǫ(k) = ǫ(k′) = ǫ
and ξ(k) = ξ(k′) = ξ. Denote β′ = β(k′) and γ′ = γ(k′), then γtξβ = u = γ′tξβ′,
implying
γ−1γ′ = tξββ′
t−ξ ∈ Hλ0 ∩ t
ξHλ0t
−ξ = {1}.
Hence β′ = β, γ′ = γ,
(6.2) βak
β−1 = aǫk
and γ−1ak
γ = aǫk
It remains to observe that since k and k′ are mutually prime, the formulas (6.1)
and (6.2) together yield
βaβ−1 = aǫ and γ−1aγ = aǫ,
q.e.d. �
7. Small cancellation over relatively hyperbolic groups
Let G be a group generated by a subset A ⊆ G and let O be the set of all
words in the alphabet A±1, that are trivial in G. Then G has a presentation of the
following form:
(7.1) G = 〈A ‖ O〉.
Given a symmetrized set of words R over the alphabet A, consider the group G1
defined by
(7.2) G1 = 〈A ‖ O ∪R〉 = 〈G ‖ R〉.
During the proof of the main result of this section we use presentations (7.2) (or,
equivalently, the sets of additional relators R) that satisfy the generalized small
cancellation condition C1(ε, µ, λ, c, ρ). In the case of word hyperbolic groups this
condition was suggested by Ol’shanskii in [16], and was afterwards generalized to
relatively hyperbolic groups by Osin in [21]. For the definition and detailed theory
we refer the reader to the paper [21], as we will only use the properties, that were
GROUPS WITH FINITELY MANY CONJUGACY CLASSES 23
already established there. The following observation is an immediate consequence
of the definition:
Remark 7.1. Let the constants εj, µj , λ, c, ρj , j = 1, 2, satisfy 0 < λ ≤ 1, 0 ≤ ε1 ≤
ε2, c ≥ 0, 0 < µ2 ≤ µ1, ρ2 ≥ ρ1 > 0. If the presentation (7.2) enjoys the condition
C1(ε2, µ2, λ, c, ρ2) then it also enjoys the condition C1(ε1, µ1, λ, c, ρ1).
We will also assume that the reader is familiar with the notion of a van Kampen
diagram over the group presentation (7.2) (see [10, Ch. V] or [15, Ch. 4]). Let ∆ be
such a diagram. A cell Π of ∆ is called an R-cell if the label of its boundary contour
∂Π (i.e., the word written on it starting with some vertex in the counter-clockwise
direction) belongs to R.
Consider a simple closed path o = rqr′q′ in a diagram ∆ over the presentation
(7.2), such that q is a subpath of the boundary cycle of an R-cell Π and q′ is a
subpath of ∂∆. Let Γ denote the subdiagram of ∆ bounded by o. Assuming that
Γ has no holes, no R-cells and L(r),L(r′) ≤ ε, it will be called an ε-contiguity
subdiagram of Π to ∂∆. The ratio L(q)/L(∂Π) will be called the contiguity degree
of Π to ∂∆ and denoted (Π,Γ, ∂∆).
A diagram is said to be reduced if it has a minimal number of R-cells among all
the diagrams with the same boundary label.
If G is a group hyperbolic relative to a family of proper subgroups {Hi}i∈I , with
a finite relative generating set X , then G is generated by the set A = X ∪
i∈I(Hi \
{1}), and the Cayley graph Γ(G,A) is a hyperbolic metric space [20, Cor. 2.54].
As for every condition of small cancellation, the main statement of the theory is
the following analogue of Greendlinger’s Lemma, claiming the existence of a cell,
large part of whose contour lies on the boundary of the van Kampen diagram.
Lemma 7.2 ([21], Cor. 4.4). Suppose that the group G is generated by a subset A
such that the Cayley graph Γ(G,A) is hyperbolic. Then for any 0 < λ ≤ 1 there is
µ0 > 0 such that for any µ ∈ (0, µ0] and c ≥ 0 there are ε0 ≥ 0 and ρ0 > 0 with the
following property.
Let the symmetrized presentation (7.2) satisfy the C1(ε0, µ, λ, c, ρ0)-condition.
Further, let ∆ be a reduced van Kampen diagram over G1 whose boundary contour
is (λ, c)-quasigeodesic in G. Then, provided ∆ has an R-cell, there exists an R-cell
Π in ∆ and an ε0-contiguity subdiagram Γ of Π to ∂∆, such that
(Π,Γ, ∂∆) > 1− 23µ.
The main application of this particular small cancellation condition is
Lemma 7.3 ([21], Lemmas 5.1 and 6.3). For any 0 < λ ≤ 1, c ≥ 0 and N > 0
there exist µ1 > 0, ε1 ≥ 0 and ρ1 > 0 such that for any symmetrized set of words
R satisfying C1(ε1, µ1, λ, c, ρ1)-condition the following hold.
1. The group G1 defined by (7.2) is hyperbolic relative to the collection of
images {η(Hi)}i∈I under the natural homomorphism η : G→ G1.
2. The restriction of η to the subset of elements having length at most N with
respect to A is injective.
3. Any element that has a finite order in G1 is an image of an element of
finite order in G.
Below is the principal lemma of this section that will later be used to prove
Theorem 1.5.
24 ASHOT MINASYAN
Lemma 7.4. Assume that G is a torsion-free group hyperbolic relative to a family
of proper subgroups {Hi}i∈I , X is a finite relative generating set of G, S is a suitable
subgroup of G and U ⊂ G is a finite subset. Suppose that i0 ∈ I, a ∈ Hi0 \ {1}
and v1, v2 ∈ G are hyperbolic elements which are not commensurable to each other.
Then there exists a word W (x, y) over the alphabet {x, y} such that the following is
true.
Denote w1 = W (a, v1) ∈ G, w2 = W (a, v2) ∈ G, and let 〈〈w2〉〉 be the normal
closure of w2 in G, G1 = G/〈〈w2〉〉 and η : G → G1 be the natural epimorphism.
• η is injective on {Hλ}λ∈Λ ∪ U and G1 is hyperbolic relative to the family
{η(Hλ)}λ∈Λ;
• η(S) is a suitable subgroup of G1;
• G1 is torsion-free;
• η(w1) 6= 1.
Proof. By Lemma 2.7 there are hyperbolic elements v3, v4 ∈ S such that vi
6≈ vj if
1 ≤ i < j ≤ 4. Then by Lemma 2.2, the group G is hyperbolic relative to the finite
collection of subgroups {Hi}i∈I ∪
j=1{EG(vj)}, and generated by the set
A = X ∪
EG(vj)
 \ {1}.
Let Ω ⊂ G and K ∈ N denote the finite subset and the constant achieved after an
application of Lemma 6.1 to this new collection of peripheral subgroups.
Define m = 7K, λ = 1/3, c = 2 and N = max{|u|A : u ∈ U} + 1. Choose
µj > 0, εj ≥ 0 and ρj > 0, j = 0, 1, according to the claims of Lemmas 7.2
and 7.3. Let ε = max{ε0, ε1}, and let L = L(C, d) > 0 be the constant given by
Lemma 6.5 where C = ε0 and d = 2. Evidently there exists n ∈ N such that, for
µ = (3ε+ 11)/n, one has
0 < µ ≤ min{µ0, µ1}, 2n(1− 23µ) > L, and 2n > max{ρ0, ρ1}.
F(ε) =
h ∈ 〈Ω〉 : |h| ≤ max{K(32ε+ 70),m}
Since the subset F(ε) is finite, we can find k ∈ N such that ak
1 , v
2 /∈ F(ε)
whenever k′ ≥ k. Consider the word
W (x, y) ≡ xkykxk+1yk+1 . . . xk+n−1yk+n−1.
Let wj ∈ G be the element represented by the wordW (a, vj) in G, j = 1, 2, and let
R be the set of all cyclic shifts ofW (a, v2) and their inverses. By Lemma 2.3, Hi0 ∩
EG(v2) = {1} because G is torsion-free, hence by [21, Thm. 7.5] the presentation
(7.2) satisfies the condition C1(ε, µ, 1/3, 2, 2n), and therefore, by Remark 7.1, it
satisfies the conditions C1(ε0, µ, 1/3, 2, ρ0) and C1(ε1, µ1, 1/3, 2, ρ1).
Observe that w1 6= 1 in G because, otherwise, there would have existed a closed
path q in Γ(G,A) labelled by the word W (a, v1), and, by part (a) of Lemma 6.3,
all components of q would have been regular in the cycle o = rqr′q′ (where r, r′, q′
are trivial paths), which is obviously impossible.
Denote G1 = G/〈〈w2〉〉 and let η : G → G1 be the natural epimorphism. Then,
according to Lemma 7.3, the group G1 is is torsion-free, hyperbolic relative to
{η(Hi)}i∈I∪
j=1{η(EG(vj))} and η is injective on the set
i∈I Hi∪
j=1 EG(vj)∪
GROUPS WITH FINITELY MANY CONJUGACY CLASSES 25
U (because the length in A of any element from this set is at most N). Since any
elementary group is word hyperbolic, G1 is also hyperbolic relative to {η(Hi)}i∈I
(by Lemma 2.4) and η(v3), η(v4) ∈ η(S) become hyperbolic elements of infinite or-
der in G1, that are not commensurable with each other (by Lemma 2.3). Therefore
EG1(η(v3)) ∩ EG2(η(v4)) = {1} (recall that these subgroups are cyclic by Lemma
2.2 and because G1 is torsion-free), and, consequently, η(S) is a suitable subgroup
of G1.
Suppose that η(w1) = 1. By van Kampen’s Lemma there exists a reduced
planar diagram ∆ over the presentation (7.2) with the wordW (a, v1) written on its
boundary. SinceW (a, v1)
6= 1, ∆ possesses at least one R-cell. It was proved in [21,
Lemma 7.1] that any path in Γ(G,A) labelled byW (a, v1) is (1/3, 2)-quasigeodesic,
hence we can apply Lemma 7.2 to find an R-cell Π of ∆ and an ε0-contiguity
subdiagram Γ (containing no R-cells) between Π and ∂∆ such that (Π,Γ, ∂∆) >
1− 23µ. Thus there exists a cycle o = rqr′q′ in Γ(G,A) such that q is labelled by a
subword of (a cyclic shift of) W (a, v2), q
′ is labelled by a subword of (a cyclic shift
of) W (a, v1)
±1, L(r),L(r′) ≤ ε0 = C and
L(q) > (1− 23µ) · L(∂Π) = (1− 23µ) · 2n > L.
In particular, Lab(q),Lab(q′) ∈ W(Ω,m). Therefore we can apply Lemma 6.5 to
find two consecutive components of q that are connected to some components of
q′. Due to the form of the word W (a, v2), one of the formers will have to be an
EG(v2)-component, but q
′ can have only EG(v1)- or Hi0 -components. This yields
a contradiction because EG(v2) 6= EG(v1) and EG(v2) 6= Hi0 . Hence η(w1) 6= 1 in
G1, and the proof is complete. �
8. Every group is a group of outer automorphisms of a (2CC)-group
Lemma 8.1. There exists a word R(x, y) over the two-letter alphabet {x, y} such
that every non-elementary torsion-free word hyperbolic group F1 has a non-elemen-
tary torsion-free word hyperbolic quotient F that is generated by two elements a, b ∈
F satisfying
(8.1) R(a, b)
6= 1, R(a−1, b−1)
= 1, R(b, a)
= 1, R(b−1, a−1)
Proof. Consider the word
R(x, y) ≡ xy101x2y102 . . . x100y200.
Denote by F (a, b) the free group with the free generators a, b. Let
R1 = {R(a, b), R(a
−1, b−1), R(b, a), R(b−1, a−1)},
and R2 be the set of all cyclic permutations of words from R
1 . It is easy to see
that the set R2 satisfies the classical small cancellation condition C
′(1/8) (see [10,
Ch. V]). Denote by Ñ the normal closure of the set
R3 = {R(a
−1, b−1), R(b, a), R(b−1, a−1)}
in F (a, b). Since the symmetrization of R3 also satisfies C
′(1/8), the group F̃ =
F (a, b)/Ñ is a torsion-free ([10, Thm. V.10.1]) word hyperbolic group (because it
has a finite presentation for which the Dehn function is linear by [10, Thm. V.4.4])
such that
R(a, b)
6= 1 but R(a−1, b−1)
= R(b, a)
= R(b−1, a−1)
26 ASHOT MINASYAN
Indeed, if the word R(a, b) were trivial in F̃ then, by Greendlinger’s Lemma [10,
Thm. V.4.4], it would contain more than a half of a relator from (the symmetriza-
tion of) R3 as a subword, which would contradict the fact that R2 enjoys C
′(1/8).
The group F̃ is non-elementary because every torsion-free elementary group is
cyclic, hence, abelian, but in any abelian group the relation R(a−1, b−1) = 1 implies
R(a, b) = 1.
Now, the free product G̃ = F̃ ∗F1 is a torsion-free hyperbolic group. Its subgroups
F̃ and F1 are non-elementary, hence, according to a theorem of Ol’shanskii [16,
Thm. 2], there exists a non-elementary torsion-free word hyperbolic group F and
a homomorphism φ : G̃ → F such that φ(F̃ ) = φ(F1) = F and φ(R(a, b)) 6= 1 in
F . Therefore F is a quotient of F1, the (φ-images of the) elements a, b generate F
and enjoy the required relations. �
We are now ready to prove Theorem 1.5.
Proof of Theorem 1.5. The argument will be similar to the one used to prove The-
orem 5.1.
First, set n = 2 and apply Lemma 4.2 to find a countable torsion-free group H
and a normal subgroup M ⊳H , where H/M ∼= C and M has (2CC) (alternatively,
one could start with a free group H ′ and M ′ ⊳H ′ such that H ′/M ′ ∼= C, and then
apply Lemma 4.4 to the pair (H ′,M ′) to obtain H and M with these properties).
Consider the word R(x, y) and the torsion-free hyperbolic group F , generated by
the elements a, b ∈ F which satisfy (8.1), given by Lemma 8.1. Denote G(−2) =
H ∗ F and let N(−2) be the normal closure of 〈M,F 〉 in G(−2), F (−2) = F ,
R(−2) = {R(a, b)} – a finite subset of F (−2). By Lemma 2.6, G(−2) will be
hyperbolic relative to the subgroup H , G(−2) = H ·N(−2), H ∩N(−2) =M and
F (−2) will be a suitable subgroup of G(−2).
The element a ∈ F (−2) will be hyperbolic in G(−2) and since the group G(−2)
is torsion-free, the maximal elementary subgroup EG(−2)(a) will be cyclic generated
by some element h−2x−2, where h−2 ∈ H , x−2 ∈ N(−2).
Choose y−2 ∈ M so that h−2y−2 6= 1. By Lemmas 2.2 and 2.5, the HNN-
extension
G(−3/2) = 〈G(−2), t−1 ‖ t−1h−2x−2t
−1 = h−2y−2〉
is hyperbolic relative to H . As in proof of Theorem 5.1, one can verify that F (−3)
is a suitable subgroup of G(−3/2), and apply Theorem 2.8 to find an epimorphism
η−2 : G(−3/2) → G(−1) such that G(−1) is a torsion-free group hyperbolic relative
to η−2(H), η−2 is injective on H ∪ R(−2) and η−2(t−1) ∈ F (−1) where F (−1) =
η−2(F (−2)) is a suitable subgroup of G(−1). Hence η−2(G(−2)) = G(−1) as
G(−3/2) was generated by G(−2) and t−1.
Denote N(−1) = η−2(N(−2)), R(−1) = η−2(R(−2)) and ψ−2 = η−2|G(−2) :
G(−2) ։ G(−1). One can show thatG(−1) = H ·N(−1) andH∩N(−1) =M using
the same arguments as in the proof of Theorem 5.1. According to the construction,
we have
η−2(t−1)η−2(a)η−2(t
−1) = η(t−1at
−1) ∈ N(−1) ∩H =M
in G(−1), therefore, since the conjugation by η−2(t−1) is an inner automorphism
of F (−1), we can assume that F (−1) is generated by a−1 and b−1, where a−1 ∈M
and R(a−1, b−1) 6= 1 in F (−1) (because η−2(R(a, b)) 6= 1 in F (−1)).
GROUPS WITH FINITELY MANY CONJUGACY CLASSES 27
Now, if b−1 is not a hyperbolic element of G(−1), i.e., if b−1
G(−1)
∼ c for some
c ∈ H , then c ∈ N(−1)∩H =M , and since M has (2CC) we can find s−1 ∈ G(−1)
such that b−1 = s−1a−1s
−1. In this case we define G(0) = G(−1), N(0) = N(−1),
F (0) = F (−1), R(0) = R(−1), a0 = a−1, s0 = s−1 and ψ−1 = idG(−1).
Otherwise, if b−1 is hyperbolic in G(−1), then we construct the group G(0),
and an epimorphism ψ−1 : G(−1) → G(0) in an analogous way, to make sure that
η−1 is injective on H ∪ R(−1), G(0) torsion-free and hyperbolic relative to (the
image of) H , F (0) = ψ−1(F (−1)) is a suitable subgroup of G(0), G(0) = H ·N(0)
and H ∩N(0) = M where N(0) = ψ−1(N(−1)), and b0 = s0a0s
0 in G(0) where
b0 = ψ−1(b−1), a0 = ψ−1(a−1) for some s0 ∈ G(0)
Enumerate all elements of N(0): {g0, g1, g2, . . . }, and of G(0): {q0, q1, q2, . . . },
so that g0 = q0 = 1.
The groups G(j) together with N(j)⊳G(j), F (j) ≤ G(j), finite subsets R(j) ⊂
G(j), and elements aj, sj ∈ G(j), j = 1, 2, . . . , that we will construct shall satisfy
the following properties:
1◦. for each j ∈ N there is an epimorphism ψj−1 : G(j − 1) → G(j) which is
injective on H ∪R(j− 1). F (j) = ψj−1(F (j − 1)), N(j) = ψj−1(N(j − 1)),
aj = ψj−1(aj−1) ∈M , sj = ψj−1(sj−1) ∈ G(j);
2◦. G(j) is torsion-free and hyperbolic relative to (the image of) H , and F (j) ≤
G(j) is a suitable subgroup generated by aj and sjajs
3◦. G(j) = H ·N(j), N(j)⊳G(j) and H ∩N(j) =M ;
4◦. the natural image ḡj of gj in G(j) belongs to F (j);
5◦. there exists zj ∈ H such that q̄j
∼ zj, where q̄j is the image of qj in G(j);
6◦. if j ≥ 1, q̄j−1 ∈ G(j − 1) \H and for each k̂ ∈ N there is k ≥ k̂ such that
akj−1sj−1a
G(j−1)
6≈ akj−1q̄j−1a
j−1q̄
j−1, then there is a word Rj−1(x, y)
over the two-letter alphabet {x, y} which satisfies
R(j) ∋ ψj−1
Rj−1(aj−1, sj−1aj−1s
6= 1 and
Rj−1(aj−1, q̄j−1aj−1q̄
=1 in G(j).
Suppose that the groups G(0), . . . , G(i) have already been defined. The group
G(i+ 1) will be constructed in three steps.
First, assume that q̄i ∈ G(i) \ H and for each k̂ ∈ N there is k ≥ k̂ such that
aki sia
6≈ aki q̄ia
i . Observe that si /∈ H because, otherwise, one would
have F (i) ⊂ H , which is impossible as F (i) is suitable in G(i). Therefore, by
Corollary 6.6, we can suppose that k is so large that the elements v1 = a
i sia
and v2 = a
i q̄ia
i are hyperbolic in G(i). Applying Lemma 7.4 we can find a
word W (x, y) over {x, y} such that the group G(i+ 1/3) = G(i)/〈〈W (ai, v2)〉〉 and
the natural epimorphism η : G(i) → G(i + 1/3) satisfy the following: η is injective
on H ∪ R(i), G(i + 1/3) is torsion-free and hyperbolic relative to (the image of)
H , η(F (i)) ≤ G(i + 1/3) is a suitable subgroup, and η(W (ai, v1)) 6= 1. Define the
word Ri(x, y) ≡ W (x, x
kyk). Then Ri(ai, siais
i ) = W (ai, v1), Ri(ai, q̄iaiq̄
i ) =
W (ai, v2) in G(i), hence
Ri(ai, siais
6= 1 and η
Ri(ai, q̄iaiq̄
= 1 in G(i + 1/3).
28 ASHOT MINASYAN
If, on the other hand, q̄i ∈ H or there is k̂ ∈ N such that for every k ≥ k̂ one has
aki sia
≈ aki q̄ia
i , then we define G(i+ 1/3) = G(i), η : G(i) → G(i+ 1/3)
to be the identical homomorphism and Ri(x, y) to be the empty word.
Let ĝi+1 and q̂i+1 denote the images of gi+1 and qi+1 in G(i + 1/3), N̂(i) =
η(N(i)), F̂ (i) = η(F (i)) and R̂(i) = η
R(i) ∪ {Ri(ai, siais
. Then, using 3◦,
we get G(i + 1/3) = H · N̂(i) and H ∩ N̂(i) = M because ker(η) ≤ N(i) (as
ai, q̄iaiq̄
i ∈ N(i)).
Now we construct the group G(i + 2/3) in exactly the same way as the group
G(i+ 1/2) was constructed in during the proof of Theorem 5.1.
If for some f ∈ G(i + 1/3), f q̂i+1f
−1 = z ∈ H , then set G(i + 2/3) = G(i),
Ki+1 = N̂(i)⊳G(i + 2/3) and ti+1 = 1.
Otherwise, q̂i+1 is a hyperbolic element of infinite order in G(i + 1/3). Since
G(i + 1/3) is torsion-free, one has EG(i+1/3)(q̂i+1) = 〈hx〉 for some h ∈ H and
x ∈ N̂(i), and there is m ∈ Z such that q̂i+1 = (hx)
m. Now, by Lemma 2.2,
G(i + 1/3) is hyperbolic relative to {H, 〈hx〉}. Choose y ∈ M so that hy 6= 1 and
let G(i+ 2/3) be the following HNN-extension of G(i + 1/3):
G(i + 2/3) = 〈G(i + 1/3), ti+1 ‖ ti+1(hx)t
i+1 = hy〉.
The group G(i + 2/3) is torsion-free and hyperbolic relative to H by Lemma 2.5.
One can show that F̂ (i) is a suitable subgroup of G(i + 2/3) in the same way as
during the proof of Theorem 5.1. Lemma 4.3 assures that H ∩Ki+1 = M where
Ki+1 ⊳G(i+ 2/3) is the normal closure of 〈N̂(i), ti+1〉 in G(i+2/3). Finally, note
ti+1q̂i+1t
i+1 = ti+1(hx)
mt−1i+1 = (hy)
m = z ∈ H in G(i + 2/3).
Define Ti+1 = {ĝi+1, ti+1} ⊂ Ki+1. The group G(i + 1) is constructed from
G(i+2/3) as follows. Since Ti+1 · F̂ (i) ⊂ Ki+1⊳G(i+2/3), we can apply Theorem
2.8 to find a group G(i + 1) and an epimorphism ϕi : G(i + 2/3) → G(i + 1) such
that ϕi is injective on H ∪ R̂(i), G(i + 1) is torsion-free and hyperbolic relative to
(the image of) H , {ϕi(ĝi+1), ϕi(ti+1)} ⊂ ϕi(F̂ (i)), ϕi(F̂ (i)) is a suitable subgroup
of G(i + 1), and ker(ϕi) ≤ Ki+1. Denote by ψi : G(i) → G(i + 1) the composition
ϕi ◦ η. Then ψi(G(i)) = ϕi(G(i)) = G(i + 1) because G(i + 2/3) was generated
by G(i) and ti+1, and according to the construction, ti+1 ∈ ϕi(F̂ (i)) ≤ ϕi(G(i)).
Now, after defining F (i+1) = ψi(F (i)), N(i+1) = ψi(N(i)), R(i+1) = ϕi(R̂(i)),
ḡi+1 = ϕi(ĝi+1) ∈ F (i+1) and zi+1 = ϕi(z) ∈ H , we see that the conditions 1
◦ - 5◦
hold in the case when j = i+ 1, as in the proof of Theorem 5.1. The last property
6◦ follows from the way we constructed the group G(i + 1/3).
Let Q = G(∞) be the direct limit of the sequence (G(i), ψi) as i → ∞, and let
F (∞) and N = N(∞) be the limits of the corresponding subgroups. Let a∞, b∞
and s∞ be the images of a0, b0 and s0 in Q respectively. Then b∞ = s∞a∞s
∞ , Q
is torsion-free by 2◦, N ⊳Q, Q = H ·N and H ∩N =M by 3◦, N ≤ F (∞) by 4◦.
Hence Q/N ∼= H/M ∼= C.
Since F (0) ≤ N(0) we get F (∞) ≤ N . Thus N = F (∞) is a homomorphic
image of F (0) = F , and, consequently, it is a quotient of F1. By 5
◦, for any q ∈ N
there are z ∈ H and p ∈ Q such that pqp−1 = z. Consequently z ∈ H ∩ N = M .
Choose x ∈ N and h ∈ H so that p = hx. Since M has (2CC) and h−1zh ∈ M ,
there is y ∈ M such that yh−1zhy−1 = z, therefore (yx)q(yx)−1 = z ∈ M and
GROUPS WITH FINITELY MANY CONJUGACY CLASSES 29
yx ∈ MN = N . Hence each element q of N will be conjugated (in N) to an
element ofM , and since M has (2CC), therefore the group N will also have (2CC).
The property that CQ(N) = {1} can be established in the same way as in
Theorem 5.1. Therefore the natural homomorphism Q → Aut(N) is injective. It
remains to show that it is surjective, that is for every φ ∈ Aut(N) there is g ∈ Q
such that φ(x) = gxg−1 for every x ∈ N . Since all non-trivial elements of N are
conjugated, after composing φ with an inner automorphism of N , we can assume
that φ(a∞) = a∞. On the other hand, there exist q∞ ∈ N and i ∈ N such that
φ(b∞) = q∞a∞q
∞ and q∞ is the image of qi in Q. Note that s∞ /∈ H because
si ∈ G(i) \ H for every i ∈ N. This implies that H is a proper subgroup of N ,
thus q∞ /∈ H since N = F (∞) = 〈a∞, q∞a∞q
∞ 〉 ≤ Q, and a∞ ∈ H . Hence
q̄i ∈ G(i) \H .
Now we have to consider two possibilities.
Case 1: for each k̂ ∈ N there is k ≥ k̂ such that
aki sia
6≈ aki q̄ia
Then there is a word Ri(x, y) such that the property 6
◦ holds for j = i + 1. And,
since each ψj is injective on {1} ∪Rj (by 2
◦), we conclude that
Ri(a∞, s∞a∞s
∞ ) 6= 1 and Ri(a∞, q∞a∞q
∞ ) = 1 in Q,
which contradicts the injectivity of φ. Hence Case 1 is impossible.
Case 2: the assumptions of Case 1 fail. Then we can use Lemma 6.7 to find
β, γ ∈ H and ǫ, ξ ∈ {−1, 1} such that q̄i = γs
iβ, βaiβ
−1 = aǫi and γ
−1aiγ = a
in G(i). Denote by γ∞ the image γ in Q, and for any y ∈ Q let Cy be the
automorphism of N defined by Cy(x) = yxy
−1 for all x ∈ N .
If ξ = −1 then γ−1∞ a∞γ∞ = a
∞ and φ(b∞) = q∞a∞q
∞ = γ∞s
hence
Aut(N) ∋ Cs∞γ−1∞ ◦ φ :
a∞ 7→ s∞a
∞ = b
b∞ = s∞a∞s
∞ 7→ a
But N has no such automorphisms because R(a∞, b∞) 6= 1 and R(b
∞) = 1 in
N (since N is a quotient of F and 1 6= R(a0, b0) ∈ R(0) in G(0)).
Therefore ξ = 1. Similarly, ǫ = 1, as otherwise we would obtain a contradiction
with the fact that R(a−1∞ , b
∞ ) = 1 in N . Thus
Aut(N) ∋ Cγ−1∞ ◦ φ :
a∞ 7→ a∞
b∞ = s∞a∞s
∞ 7→ s∞a∞s
∞ = b∞
And since a∞ and b∞ generate N we conclude that for all x ∈ N , φ(x) = gxg
where g = γ∞ ∈ Q. Thus the natural homomorphism fromQ to Aut(N) is bijective,
implying that Out(N) = Aut(N)/Inn(N) ∼= Q/N ∼= C. Q.e.d. �
References
[1] G. Arzhantseva, A. Minasyan, D. Osin, The SQ-universality and residual properties of rela-
tively hyperbolic groups, J. Algebra 315 (2007), no. 1, 165-177.
[2] I. Belegradek, D. Osin, Rips construction and Kazhdan property (T), preprint (2006). arXiv:
math.GR/0605553
[3] I. Bumagin, D.T. Wise, Every group is an outer automorphism group of a finitely generated
group, J. Pure Appl. Algebra 200 (2005), no. 1-2, 137-147.
[4] R. Camm, Simple Free Products, J. London Math. Soc. 28 (1953), 66-76.
http://arxiv.org/abs/math/0605553
30 ASHOT MINASYAN
[5] Y. de Cornulier, Finitely presentable, non-Hopfian groups with Kazhdan’s Property and infi-
nite outer automorphism group, Proc. Amer. Math. Soc. 135 (2007), no. 4, 951-959.
[6] P. de la Harpe, A. Valette, La propriété (T) de Kazhdan pour les groupes localement compacts,
(avec un appendice de Marc Burger). Astérisque 175, 1989.
[7] M. Droste, M. Giraudet, R. Göbel, All groups are outer automorphism groups of simple
groups, J. London Math. Soc. (2) 64 (2001), no 3, 565-575.
[8] G. Higman, B.H. Neumann, H. Neumann, Embedding theorems for groups, J. London Math.
Soc. 24 (1949), 247-254.
[9] The Kourovka notebook. Unsolved problems in group theory, 16th augmented edition, V. D.
Mazurov and E. I. Khukhro eds., Rossĭıskaya Akademiya Nauk, Sibirskoe Otdelenie, Insti-
tut Matematiki (Siberian branch of Russian Academy of Sciences, Mathematical Institute),
Novosibirsk, 2006.
[10] R. Lyndon and P. Schupp, Combinatorial Group Theory, Springer-Verlag, 1977.
[11] T. Matumoto, Any group is represented by an outer automorphism group, Hiroshima Math.
J. 19 (1989), no. 1, 209-219.
[12] A. Muranov, Diagrams with selection and method for constructing boundedly generated and
boundedly simple groups, Comm. Algebra 33 (2005), no. 4, 1217-1258.
[13] A. Muranov, Finitely generated infinite simple groups of infinite commutator width, Int. J.
Algebra Comput. 17 (2007), no. 3, 607-659.
[14] Y. Ollivier, D.T. Wise, Kazhdan groups with infinite outer automorphism group, Trans. Amer.
Math. Soc. 359 (2007), no. 5, 1959-1976.
[15] A.Yu. Ol’shanskii, Geometry of defining relations in groups, Moscow, Nauka, 1989 (in Rus-
sian); English translation in Mathematics and its Applications (Soviet Series), 70. Kluwer
Academic Publishers Group, Dordrecht, 1991.
[16] A.Yu. Ol’shanskii, On residualing homomorphisms and G-subgroups of hyperbolic groups,
Internat. J. Algebra Comput. 3, no. 4 (1993), 365-409.
[17] D.V. Osin, Elementary subgroups of relatively hyperbolic groups and bounded generation,
Internat. J. Algebra Comput. 16 (2006), no. 1, 99-118.
[18] D.V. Osin, Peripheral fillings of relatively hyperbolic groups, Invent. Math. 167 (2007), no. 2,
295-326.
[19] D.V. Osin, Relative Dehn functions of HNN-extensions and amalgamated products, Topo-
logical and asymptotic aspects of group theory, Contemp. Math. 394, 209-220, Amer. Math.
Soc., Providence, RI, 2006.
[20] D.V. Osin, Relatively hyperbolic groups: intrinsic geometry, algebraic properties, and algo-
rithmic problems, Mem. Amer. Math. Soc. 179 (2006), no. 843, vi+100 pp.
[21] D.V. Osin, Small cancellations over relatively hyperbolic groups and embedding theorems,
Annals of Math., to appear. arXiv: math.GR/0411039
[22] F. Paulin, Outer automorphisms of hyperbolic groups and small actions on R-trees, Arboreal
Group Theory (MSRI, Berkeley, 1988), R.C. Alperin ed., Math. Sci. Res. Inst. Publ. 19,
Springer, New York, 1991.
School of Mathematics, University of Southampton, Highfield, Southampton, SO17
1BJ, United Kingdom.
E-mail address: aminasyan@gmail.com
http://arxiv.org/abs/math/0411039
	1. Introduction
	2. Relatively hyperbolic groups
	3. Groups with finitely many conjugacy classes
	4. Normal subgroups with (nCC)
	5. Adding finite generation
	6. Combinatorics of paths in relatively hyperbolic groups
	7. Small cancellation over relatively hyperbolic groups
	8. Every group is a group of outer automorphisms of a (2CC)-group
	References
ABSTRACT
  We combine classical methods of combinatorial group theory with the theory of
small cancellations over relatively hyperbolic groups to construct finitely
generated torsion-free groups that have only finitely many classes of conjugate
elements. Moreover, we present several results concerning embeddings into such
groups.
  As another application of these techniques, we prove that every countable
group $C$ can be realized as a group of outer automorphisms of a group $N$,
where $N$ is a finitely generated group having Kazhdan's property (T) and
containing exactly two conjugacy classes.

<|endoftext|><|startoftext|>
Energy density for chiral lattice fermions with chemical potential
Christof Gattringera and Ludovit Liptakb
Institut für Physik, FB Theoretische Physik, Universität Graz,
Universitätsplatz 5, 8010 Graz, Austria
Institute of Physics, Slovak Academy of Sciences,
Dúbravská cesta 9, 845 11 Bratislava 45, Slovak Republic
We study a recently proposed formulation of overlap fermions at finite density. In particular we
compute the energy density as a function of the chemical potential and the temperature. It is shown
that overlap fermions with chemical potential approach the correct continuum behavior.
PACS numbers: 11.15.Ha, 12.38.Gc
I. INTRODUCTION
Over the last two decades lattice gauge theory was
turned into a powerful qualitative tool for analyzing
QCD. This progress is in part due to the advances in
algorithms and computer technology, but also on the con-
ceptual side important breakthroughs were made. Most
prominent among these is the correct implementation of
chiral symmetry on the lattice based on the Ginsparg-
Wilson equation for the Dirac operator [1].
An application of lattice techniques which has seen a
lot of attention in recent years, is the study of QCD at
finite temperature. The lattice implementation of the
chemical potential µ, necessary for such an analysis, is
not straightforward, however. It is well known [2], that
a naive introduction leads to µ2/a2 contributions which
diverge in the continuum limit when the lattice spacing
a is sent to zero. For more traditional formulations, such
as the Wilson or staggered Dirac operators, the problem
has been solved by introducing the chemical potential in
the same way as the 4-component of the gauge field.
A satisfactory implementation of the chemical poten-
tial should be compatible with chiral symmetry on the
lattice based on the Ginsparg-Wilson equation. When
attempting to introduce the chemical potential into the
only solution of the Ginsparg-Wilson equation know in
closed form, the overlap operator [3], a potential prob-
lem quickly surfaces: defining the sign function of a non-
hermitian matrix. In [4] Bloch and Wettig proposed a
solution based on an analytic continuation of the sign
function into the complex plane. It was shown, that the
eigenvalue spectra of this construction match the expec-
tations from random matrix theory.
In this letter we analyze the proposal [4] further
and study the energy density of free, massless overlap
fermions with chemical potential. The dependence of the
energy density on µ and the temperature T allows for a
detailed analysis of the lattice formulation at finite den-
sity. Of particular interest will be the question whether
the analytic continuation of the sign function produces
divergent µ2/a2 terms. Our study indicates the absence
of such contributions and we find that the µ and T de-
pendence of the energy density is approached correctly.
II. SETUP OF THE CALCULATION
The overlap Dirac operator D(µ) for fermions with a
chemical potential µ is given as
D(µ) =
[1− γ5 signH(µ)] ,
H(µ) = γ5 [1− aDW (µ)] . (1)
The sign function may be defined through the spectral
theorem for matrices. DW (µ) denotes the usual Wilson
Dirac operator,
DW(µ)x,y = 1
δx,y − (2)
Uj(x)δx+ĵ,y +
Uj(x− ĵ)†δx−ĵ,y
U4(x)e
µa4δx+4̂,y −
U4(x−4̂)†e−µa4δx−4̂,y .
For later use we distinguish between the lattice spacing
a in spatial direction and the temporal lattice constant
a4. Periodic boundary conditions are used in the spatial
directions, while in time direction we apply anti-periodic
boundary conditions. The chemical potential µ is cou-
pled in the usual exponential form [2].
For vanishing µ the Wilson Dirac operator is γ5-
hermitian, i.e., γ5DW (0)γ5 = DW (0)
†. This implies that
H(0) is a hermitian matrix. As soon as the chemical po-
tential µ is turned on, γ5-hermiticity no longer holds, and
H(µ) is a non-hermitian, general matrix. This fact has
two important consequences: Firstly, the eigenvalues of
H(µ) are no longer real and the sign function for a com-
plex number has to be defined in the spectral representa-
tion of signH(µ). Secondly, the spectral representation
has to be formulated using left and right eigenvectors.
This latter problem will be dealt with later when we dis-
cuss the evaluation of signH(µ). For the sign function
of a complex number we use the analytic continuation
proposed in [4] and define the sign function through the
sign of the real part
sign (x+ iy) = sign (x) . (3)
http://arxiv.org/abs/0704.0092v2
The observable we study here is the energy density
defined as
ǫ(µ) =
〈H〉 = 1
H e−β (H−µN )
= (4)
e−β(H−µN )
= − 1
∂ lnZ
Here H is the Hamiltonian of the system, N denotes the
number operator and β = 1/T is the inverse temperature
(in our units the Boltzmann constant k is set to k = 1).
The derivatives in the second line are taken such that
βµ = c = const.
The continuum result for the subtracted energy density
of free massless fermions reads (see, e.g., [5])
ǫ(µ)− ǫ(0) = µ
µ2T 2 . (5)
When working on the lattice, the inverse tempera-
ture β is given by the lattice extent in 4-direction, i.e.,
β = N4a4. Thus the derivative ∂/∂β in (4) turns into
N−14 ∂/∂a4. The partition function Z is given by the
fermion determinant detD which we write as the prod-
uct over all eigenvalues λn. We thus find
ǫ(µ) = − 1
∂ ln detD
a4µ=c
= − 1
a4µ=c
= − 1
a4µ=c
. (6)
III. EVALUATION OF THE EIGENVALUES
According to (6), for the evaluation of ǫ(µ) the eigen-
values λn of the Dirac operator D have to be computed.
This is done in three steps: First we bring the Dirac oper-
ator for free fermions to 4× 4 block-diagonal form, using
Fourier transformation. Subsequently the spectral repre-
sentation is applied to the 4× 4 blocks of H to evaluate
sign H . Finally the eigenvalues of the blocks of D are
computed and by summing over the discrete momenta
all eigenvalues are obtained.
Following this strategy, one finds for the Fourier trans-
form Ĥ of H ,
Ĥ = γ5h5 + iγ5
γνhν , (7)
h5 = 1−
[1− cos(apj)]−
[1− cos(a4(p4 − iµ))] ,
hj = − sin(apj) for j = 1, 2, 3 ,
h4 = −
sin(a4(p4 − iµ)) . (8)
The spatial momenta are given by pj = 2πkj/aN ,
where N is the number of lattice points in the spatial
directions and kj = 0, 1 ... N − 1. The momenta in time-
direction are p4 = π(2k4 + 1)/a4N4, k4 = 0, 1 ... N4 − 1.
The remaining diagonalization of Ĥ is similar to the
construction of the left- and right-eigenfunctions for the
free Dirac operator. One finds that Ĥ has two different,
doubly degenerate eigenvalues
α1 = α2 = + s , α3 = α4 = − s , s =
h2 + h25 , (9)
where h2 =
ν . The corresponding left- and right-
eigenvectors, lj and rj are given by
l1 = l
1 [Ĥ + s1] , l2 = l
2 [Ĥ + s1] ,
l3 = l
3 [Ĥ − s1] , l4 = l
4 [Ĥ − s1] ,
r1 = [Ĥ + s1]r
1 , r2 = [Ĥ + s1]r
r3 = [Ĥ − s1]r(0)3 , r4 = [Ĥ − s1]r
4 . (10)
The constant spinors l
j , r
j are (T is transposition)
1 = r
(0) T
1 = c (1, 0, 0, 0) , l
2 = r
(0) T
2 = c (0, 1, 0, 0) ,
3 = r
(0) T
3 = c (0, 0, 1, 0) , l
4 = r
(0) T
4 = c (0, 0, 0, 1) .
The constant c = (2s(s + h5))
−1/2 ensures the correct
normalization, such that the eigenvectors obey lirj = δij .
Using these eigenvectors and the spectral theorem we
find for sign Ĥ the simple result
sign Ĥ =
sign (λj) rj lj =
sign(s)
Ĥ . (12)
Plugging this back into the overlap formula (1) and di-
agonalizing the remaining 4 × 4 problem one finds two
different eigenvalues for the overlap operator at a given
momentum,
1− sign (
h2 + h25 )h5 ± i
h2 + h25
, (13)
where each of the two eigenvalues is twofold degenerate.
The momentum dependence enters through the compo-
nents hν , h5 defined in (8). In the spectral sum (6) the la-
bel n runs over all momenta and the eigenvalues at fixed
momentum as given in (13). The necessary derivative
with respect to a4 is straightforward to compute in closed
form, and the spectral sum (6) can then be summed nu-
merically. The argument of the sign function cannot be-
come purely imaginary on a finite lattice, and no δ-like
terms occur. We remark, that after taking the derivative
with respect to a4, we set a = a4 = 1, i.e., all the results
we present are in lattice units.
0.00 0.02 0.04 0.06 0.08 0.10
0.000
0.001
0.002
0.003
0.004
 / 4π
FIG. 1: The energy density ǫ(µ)−ǫ(0) as a function of µ4 (all
in lattice units). The symbols (connected to guide the eye)
are for various lattice sizes, the dashed line is the continuum
result.
IV. RESULTS
We begin the discussion of our results with Fig. 1,
where we show the subtracted energy density ǫ(µ)− ǫ(0)
as a function of µ4 for three different lattice volumes. For
those lattices all 4 sides have equal length, i.e., in the
thermodynamic limit they correspond to zero tempera-
ture. Thus, according to (5), we expect the data (sym-
bols in Fig. 1) to approach the continuum form µ4/4π2
(dashed line) as the 4-d volume is sent to infinity.
The figure clearly shows that the lattice data are pre-
dominantly linear when plotted versus µ4 and that for
small µ they approach the continuum curve when the
volume is increased. It is, however, obvious that also on
our largest lattice still a discrepancy remains for larger
µ. In particular one finds a slight curvature upwards, a
discretization effect which here, since the lattice spacing
is just the inverse lattice extension, is also a finite size
effect. Furthermore, for small µ one expects to see finite
temperature corrections according to (5).
In order to study these finite temperature corrections
systematically, we analyzed lattices with short tempo-
ral extent, i.e., lattices with non-vanishing temperature.
Fig. 2 shows the corresponding results, where we again
plot the subtracted energy density as a function of µ4.
The lattice with the shortest temporal extent, 1283×8,
which corresponds to the largest temperature, shows a
clear curvature. This curvature is due to the T 2µ2/2
term in (5), which appears as a square root when plotted
as function of µ4. The effect is visible also for the other
lattices, but becomes less pronounced as the temporal
extent is increased, i.e., the temperature T is lowered.
In order to study this effect quantitatively, we fit the
finite temperature results to the continuum form (5) plus
two terms even in µ which parameterize the cutoff effects
0.00 0.02 0.04 0.06 0.08 0.10
0.000
0.001
0.002
0.003
0.004
 x 12
 x 16
 x 24
FIG. 2: The energy density ǫ(µ) − ǫ(0) as a function of µ4,
now for finite temperature lattices (all in lattice units).
observed in Fig. 1. The fit function is given by
2 + c4 µ
4 + c6 µ
6 + c8 µ
8 . (14)
Due to (5) the coefficient of the quadratic term should
scale with the temperature such that one expects
c2 ∼ T 2/2 = N−24 /2 . (15)
The coefficient for the quartic term should be constant,
c4 ∼ 1/4π2 = 0.02533 . (16)
The results of the fit for the data used in Fig. 2, and for
the largest lattice of Fig. 1 are given in Table 1.
The table shows that with increasing N4 the two phys-
ically significant parameters c2 and c4 approach the val-
ues expected from the continuum formula (5): c2 gets
closer to N−24 /2 as listed in the second column, and c4
approaches 1/4π2 = 0.02533. For the largest finite tem-
perature lattice 1283×24 the discrepancy is down to 9 %
/2 c2 c4 c6 c8
8 0.007812 0.010125 0.03519 0.010 -0.021
12 0.003472 0.004125 0.03178 0.023 -0.013
16 0.001953 0.002192 0.02803 0.029 -0.015
24 0.000868 0.000947 0.02587 0.025 -0.030
128 0.000030 0.000032 0.02543 0.015 0.016
TABLE I: Results of the fits to the form (14). The spatial
volume is always 1283. The temporal extension N4 is given
in the first column. In the second column we list the corre-
sponding value of N−2
/2 which is what one expects for the
fitting coefficient c2 in the third column. The coefficient c4 in
the fourth column is expected to approach the constant value
1/4π2 = 0.02533.
0.0 0.5 1.0 1.5 2.0
-0.05
Overlap ,  128
Wilson ,  128
FIG. 3: The ratio (ǫ(µ) − ǫ(0))/µ4 as a function of µ (in
lattice units). We compare the results for overlap to those
from Wilson fermions.
for c2, and 2 % for c4. The larger discrepancy for small
N4 can be understood as a discretization effect, since the
temporal lattice spacing a4 is related to the temporal ex-
tension through a4 = 1/N4 and thus larger N4 implies a
smaller a4. For comparison we also display the fit results
for the 1284 lattice, which corresponds to zero temper-
ature. There we find excellent agreement (less than 1%
discrepancy) for the parameter c4, governing the leading
term at T = 0. The overall picture obtained from the fit
results is that overlap fermions with chemical potential
reproduce very well both, the µ4 term, as well as the fi-
nite temperature contribution T 2µ2/2. We conclude that
the analytic continuation of the sign function does not in-
troduce lattice artifacts, such as the µ2/a2 term known
to be present in a naive implementation of the chemical
potential.
In the final step of our analysis we study the discretiza-
tion effect for larger values of µ and compare the re-
sults to the data from the standard Wilson operator. In
Fig. 3 we plot the ratio (ǫ(µ)− ǫ(0))/µ4 as a function of
µ. In the continuum at T = 0 this ratio has the value
1/4π2 = 0.02533 indicated by the horizontal line. For
small µ, up to about µ ∼ 0.7, the Wilson and overlap data
fall on top of each other. For very small µ both opera-
tors show a prominent increase which is a left-over finite
temperature effect, which for the ratio (ǫ(µ) − ǫ(0))/µ4
shows up as a 1/µ2 term. In the range between µ = 0.1
and 0.5 the data are close to the continuum value. Be-
yond 0.5 the discretization effects kick in and the overlap
and Wilson results start to differ. A comparison with
the equivalent plot in [6], where the results from various
other lattice Dirac operators were presented, shows that
the discretization effects of the overlap operator at large
µ are comparable to other formulations.
V. SUMMARY
In this article we have analyzed the energy density of
the overlap operator at finite chemical potential. Follow-
ing [4], the sign function in the overlap was implemented
through the spectral theorem using the analytic continu-
ation of the sign into the complex plane. The subtracted
energy density ǫ(µ) − ǫ(0) was analyzed for finite and
zero temperature lattices. Fits of the data show that the
expected continuum behavior is approached. No trace
of unphysical µ2/a2 terms was found. We conclude that
overlap fermions with chemical potential [4] provide both,
chiral symmetry and the correct description of fermions
at finite density.
Acknowledgments: We thank Leonard Fister,
Gabriele Jaritz, Christian Lang, Stefan Olejnik, Tilo
Wettig, and Florian Wodlei for discussions and check-
ing some of our calculations. This work is supported by
the Slovak Science and Technology Assistance Agency
under Contract No. APVT–51–005704, and the Austrian
Exchange Service ÖAD.
[1] P. H. Ginsparg and K. G. Wilson, Phys. Rev. D 25, 2649
(1982).
[2] P. Hasenfratz and F. Karsch, Phys. Lett. B 125, 308
(1983).
[3] R. Narayanan and H. Neuberger, Nucl. Phys. B 443, 305
(1995); H. Neuberger, Phys. Lett. B 417, 141 (1998).
[4] J. Bloch and T. Wettig, Phys. Rev. Lett. 97, 012003
(2006); J. Bloch and T. Wettig, contribution to Lattice
2006 (hep-lat/0609020).
[5] J. Kapusta, Finite temperature field theory, Cambridge
University Press, Cambridge (1989).
[6] W. Bietenholz and U. J. Wiese, Phys. Lett. B 426, 114
(1998).
http://arxiv.org/abs/hep-lat/0609020
ABSTRACT
  We study a recently proposed formulation of overlap fermions at finite
density. In particular we compute the energy density as a function of the
chemical potential and the temperature. It is shown that overlap fermions with
chemical potential reproduce the correct continuum behavior.

<|endoftext|><|startoftext|>
Aspects of Electron-Phonon Self-Energy Revealed from Angle-Resolved
Photoemission Spectroscopy
W.S. Lee,1 S. Johnston,2 T.P. Devereaux,2 and Z.-X. Shen1
Department of Physics, Applied Physics, and Stanford Synchrotron
Radiation Laboratory, Stanford University, Stanford, CA 94305
Department of Physics, University of Waterloo,Waterloo, Ontario, Canada N2L 3G1
(Dated: November 4, 2018)
Lattice contribution to the electronic self-energy in complex correlated oxides is a fascinating
subject that has lately stimulated lively discussions. Expectations of electron-phonon self-energy
effects for simpler materials, such as Pd and Al, have resulted in several misconceptions in strongly
correlated oxides. Here we analyze a number of arguments claiming that phonons cannot be the
origin of certain self-energy effects seen in high-Tc cuprate superconductors via angle resolved pho-
toemission experiments (ARPES), including the temperature dependence, doping dependence of the
renormalization effects, the inter-band scattering in the bilayer systems, and impurity substitution.
We show that in light of experimental evidences and detailed simulations, these arguments are not
well founded.
PACS numbers: Valid PACS appear here
I. INTRODUCTION
The microscopic pairing mechanism of the high-Tc
superconductivity remains an unsolved question even
after twenty years of its discovery. Observations of
a kink at around 40-70 meV in the band dispersion
along the diagonal of the Brillouin zone (nodal diec-
tion) and a peak-dip-hump (PDH) structure at the zone
boundary by angle-resolved photoemission spectroscopy
(ARPES)1,2,3,4,5,6,7,8,9,10,11,12,13 have drawn a great deal
of recent attention as they may shed some light on this
problem. Although an agreement has been established
that the kink and PDH structure are signatures of the
electrons coupled to a sharp bosonic mode, it is still
strongly debated about the origin of this bosonic mode.
Influenced by the fact that the high-Tc cuprate is a doped
antiferromagnetic insulator, a common belief is that this
bosonic mode has a magnetic origin2,3,4,5,6,7,8,9. How-
ever, an alternative view is that the electron-phonon cou-
pling in such a doped-insulator can be very strong and
anomalous because of a number of unusual effects, such
as poor screening, complex structure as well as the in-
terplay with correlations. Indeed, oxygen related op-
tical phonons have been invoked to explain the tem-
perature and doping dependence of the renormalization
effects10,11,12,13,14. This idea of phonons being mainly
responsible for this low energy band renormalization ef-
fect observed by ARPES has stimulated intense debate.
There is currently no consensus whether a phonon, a set
of phonons, or a magnetic mode is the primary cause of
the band renormalization.
As mentioned, some important reasons to invoke
phonon interpretation of the ARPES data are: the
presence of an universal energy scale in all materials
at all doping10,15, particularly in the normal state of
very low Tc materials
16; the strong inferred momen-
tum dependence11,14; the existence of multiple bosonic
mode couplings12 and the decrease in the overall cou-
pling strength with increased doping, interpreted as a
screening effect, especially for phonons with eigenvectors
along the c-axis13. Yet, there is still a widespread belief
that phonons are not responsible for the kink features. In
the following sections, with a comprehensive look at all
experimental data as well as some recent simulations, we
address some of the criticisms that have been commonly
used to argue against the phonon interpretation. These
include the temperature and doping dependence of the
renormalization effects, inter-band scattering for bilayer
system, and the ARPES experiments on impurity substi-
tuted Bi2212 crystal, Bi2Sr2Ca(Cu2−xMx)O8+δ with M
= Zn or Ni. Our goal is to clarify these misconceptions
as being due to oversimplifying the effects of electron-
phonon coupling in cuprates as well as other strongly
correlated transition metal oxides.
II. ASPECTS OF THE ELECTRON-PHONON
SELF-ENERGY
A. Temperature Dependence
In the standard treatment of electron-phonon coupling
effects, the Debye temperature sets a characteristic tem-
perature scale, which is well above Tc in conventional
superconducting materials. However in the cuprates and
other low Fermi energy systems, these two energy scales
can be comparable. As a result, the temperature depen-
dence of phonon induced self-energies can be very differ-
ent from that of conventional superconductors. Accord-
ing to the ARPES measurements on Bi2212 system, the
band renormalization in the antinodal region (peak-dip-
hump structure) shows a dramatic superconductivity-
induced enhancement when the system goes through a
phase transition from the normal state to the supercon-
ducting state. It has been argued that only a mode which
emerges in the superconducting state and vanishes in
http://arxiv.org/abs/0704.0093v1
the normal state can explain this temperature-dependent
renormalization effect2,3,4,5. Phonons are thereby ex-
cluded.
The sharpness of the renormalization effects due to
electron-phonon coupling is strongly temperature depen-
dent, given by the fact that Tc of optimally-doped Bi2212
is close to 100K. To demonstrate this temperature de-
pendence, we consider the normal state (120 K) and su-
perconducting state (10 K) of a d-wave superconductor
coupled to a 36 meV B1g, 55meV oxygen A1g, and 70
meV breathing phonons14,17. The electron-phonon cou-
pling for the B1g and breathing phonons are those used
in Ref. 14. The A1g modes involve c-axis motions of
both planar and apical oxygens, and have two branches
around 55 and 80 meV. The apical electron-phonon cou-
pling, derived in Ref. 17, involves a charge-transfer from
the apex oxygen into the CuO2 plane via the Cu 4s or-
bital, the same pathway that gives rise to bi-layer split-
ting. However, for simplicity, the apical electron-phonon
coupling is treated as a constant in the calculations pre-
sented in this paper. The reason to include three modes
in the calculation was inspired by the earlier success of
the two-mode calculation11 as well as the recent discov-
ery of multiple mode coupling12,13. For this calculation,
the tight-binding band structure described in Ref. 19
has been used. The real part and imaginary part of the
self-energy Σ(k, ω) and the spectral functions A(k, ω) at
k = (0, π) are then obtained within weak coupling Eliash-
berg formalism14 and plotted in Fig. 1. Details of the
calculations are presented in the Appendix.
At high temperature, both ReΣ(k, ω) and ImΣ(k, ω)
do not exhibit a sharp renormalization feature as shown
by the dashed curves in Fig. 1 (a) and Fig. 1(b), respec-
tively. This demonstrates that the thermal broadening
effect smears out the sharp renormalization features; in
addition, broadening effects due to additional many body
effects would smear out the renormalization features fur-
ther. Thus, one should not expect to observe any sharp
renormalization features at k = (0, π) in the normal state
(∼ 100K) from the electron-phonon coupling. In the su-
perconducting state, the renormalization features of the
self-energy become much sharper, due to smaller ther-
mal broadening effect as well as the opening of a su-
perconducting gap. Consistent with the optimally-doped
Bi2212 and Bi2223 measurements2,4,11,18, the PDH struc-
ture of the spectral function at k = (0, π) emerges at low
temperature and disappears at high temperature (nor-
mal state), as illustrated in Fig. 1(c) and Fig. 1(d),
respectively. While this behavior is generally expected
for any phonon, we note that in addition, the self-energy
from electron-phonon couplings which involve momen-
tum transfers within and between anti-nodal regions of
the Fermi surface, such as the apex A1g and B1g phonons,
are greatly enhanced for all k-points due to the large
density of state enhancements in these regions via the
opening of a d-wave gap. A detailed momentum depen-
dence of the renormalization effects in the normal state
and superconducting state due to the coupling to the B1g
FIG. 1: The calculated (a) real part, ReΣ, (b) imaginary
part of the self-energy, ImΣ , and the corresponding spectral
functions, A(k,ω) in (c) normal state and (d) superconducting
state. An extra 5 meV is added to the imaginary part of the
self energy in 120K simulation to account for the thermal
broadening of the quasi-particle life time. The location for
this calculation is indicated in inset of (a) by a red dot with
a red curve representing the FS. Insets of (c) and (d) are the
data of optimally-doped Bi2223 system (Tc=110K) taken at
the normal state (120K) and superconducting state(25K)18,
respectively.
phonon has been discussed in Ref. 11 and Ref. 14. Fur-
thermore, both the dispersion kink and the PDH struc-
ture in the nodal region have been clearly observed in
the normal state when measured at a low temperature
on samples with a lower Tc
16. This lends further support
to the strongly temperature dependent renormalization
features due to electron-phonon coupling.
B. Doping Dependence
Another problematic statement against the phonon
scenario stems from the apparent strong doping depen-
dence of the kink position and strength. Based on the
wisdom for conventional good metals, phonons should
not have a strong doping dependence, either in frequency
of the mode or in strength of the coupling. This is
not necessary valid for layered, doped insulators with
strong correlation effects, such as cuprates where dop-
ing dramatically changes the metallicity and the abil-
ity of electrons to screen charge fluctuations. We first
note that many experiments on various cuprates have re-
ported strongly doping dependent anomalies for several
phonons, which implies a strongly doping dependent e-p
coupling for these phonons. For example, from inelastic
neutron scattering measurements, the breathing mode,
half-breathing mode, and the bond-stretching modes ex-
hibit prominent doping dependence of dispersion and en-
ergy renormalizations20,21. In Raman and infrared spec-
FIG. 2: The intensity plots of the (a) spectral functions with-
out resolution convolution and (b) resolution convoluted spec-
tral functions in the superconducting state (10K) along the
nodal direction, as indicated in the inset of (b) by the blue
line. Black curves are the band dispersion extracted from
the maximum positions of the momentum distribution curves,
which cut the spectral functions at a fixed energy. The MDC-
derived dispersions in (a) exhibit three sharp ”subkinks” due
to the coupling to the three phonon modes used in the model;
while in (b) the subkinks are washed out by the finite instru-
ment resolution effect leaving an apparent single kink in the
band dispersion. The white dashed line illustrates the bare
band for extracting ReΣ shown in Fig. 3 (a).
troscopy, the Fano lineshapes of phonon modes in B1g
and B1u symmetry show strong doping dependences
Furthermore, the strength of the phonon energy shift and
linewidth variation across Tc also changes strongly with
doping23.
Second, the doping dependence of the renormalization
effects to the electronic self-energy is sophisticated as in-
ferred by two recent ARPES studies. One is the observa-
tion of multiple bosonic mode couplings along the nodal
direction12. The other is the doping dependence of the
c-axis screening effect to the coupling between the elec-
tron and c−axis phonons. As proposed by Meevasana et
13,24 and Devereaux et al.17, for electron-phonon cou-
pling at long wavelengths, the screening becomes more
effective at reducing the coupling strength when the c-
axis conductivity becomes more metallic. Given these
two results plus the variation of the superconducting gap
magnitude with doping, the doping dependence of the
kink energy is highly convoluted in Bi2212 whose super-
conducting gap has an energy comparable to some of the
phonons.
In Fig. 2, we present the intensity plot of calculated
spectral functions demonstrating a doping dependence
of the dispersion kink in the superconducting state. The
superconducting gap size was set to be 40, 20, and 10
meV for the optimally-doped and more overdoped sys-
tems. In addition, the coupling strength of the breathing
mode, whose appreciable coupling is only for short wave-
lengths and large momentum transfers20,21, remains un-
FIG. 3: (a) The ReΣ extracted from Fig. 2(a) (dashed lines)
and Fig. 2(b) (solid lines) by subtracting a linear bare band
(dashed line in Fig. 2(a)) from the band dispersion. The ar-
rows indicate the maximum positions of the ReΣ where the
“single” apparent kink in the band dispersion is usually de-
fined. (b) Summary of the doping dependence of the apparent
kink energy and the apparent mode energy extracted by as-
suming a single mode scenario.
changed for our doping dependence simulation; while, a
filter function, ω2/(ω2+ω2sc) with different value of c-axis
screening frequency ωsc is applied to the c-axis phonons
(B1g and A1g), to simulate the doping-dependent cou-
pling strength due to the change of the c-axis screening
effect13,24. We note that although this is a simplifica-
tion, it represents the general behavior of screening con-
siderations for phonons involving small in-plane momen-
tum transfers. Full consideration of screening has been
given in Ref. 17 and Ref. 24. In addition, a component
0.005+ ω2 eV is added in the imaginary part of the self-
energy to mimic the quasiparticle life time broadening
due to electron-electron interaction.
As shown in Fig. 2(a), the coupling to multiple phonon
modes induces several “subkinks” in the dispersion. The
positions of these subkinks mostly correspond to the en-
ergies of phonons plus the maximum d-wave SC gap,
∆0, even through there is no gap along the nodal di-
rection. This is because when calculating the self-energy,
one needs to integrate the contributions from the entire
zone, which makes the electrons in the nodal region ”feel”
the presence of the gap. Furthermore, revealed by the ex-
tracted real-part of the self-energy, ReΣ (dashed curves
in Fig. 3 (a)), the dominant feature in ReΣ for the OP
case is induced by 36 meV B1g mode, while for the OD1
and OD2 case, the features of the 55 meV A1g mode and
70 meV breathing mode gradually out-weight the con-
tribution from the B1g mode. This demonstrates that
the change of the SC gap magnitude and the increasing
screening effect to these phonons because of increased
doping alters the relative strength of the subkinks caused
by different modes.
To simulate the experimental data, we convoluted the
spectral functions shown in Fig. 2(a) with a typical
ARPES instrumental resolution: 12.5 meV in energy res-
olution and 0.012 π/a in momentum resolution. As illus-
trated in Fig. 2(b) and the extracted ReΣ (solid curves
in Fig. 3(a)), the subkinks are less obvious and become
a broadened “single” kink in the dispersion which is lo-
cated at roughly the energy of the dominant phonon fea-
ture determined by the maximum position of the ReΣ
(the arrows in Fig. 3(a)). The doping dependence of
the kink position is summarized as the solid symbols in
Fig. 3(b). Assuming a single mode scenario, one can
obtain the “doping dependence” of the mode energy by
subtracting out the SC gap size. However, we note that
this extracted “apparent” mode energy does not match
any of the modes used in the model; instead, it is a av-
erage between the dominant features (open symbols in
Fig. 3(b). Clearly, since the kink energy is a sum of the
superconducting gap and a spectrum of bosonic modes,
it should not be taken as a precise measurement of the
energy of any particular bosonic mode. This casts doubts
to the analysis of the doping dependent properties of the
kink in the nodal band dispersion based on the single
bosonic mode coupling scenario7,8. More importantly,
this illustrates the complex nature of lattice effects in
these oxides.
C. Interband Scattering
Borisenko et al.8,9 observed that the scattering rate
of the bonding and antibonding bands along the nodal
direction cross each other near the energy of the Van
Hove singularity, suggesting a strong inter-band scatter-
ing between the bonding and antibonding bands. They
argued that only a mode with ”odd” symmetry, such as
spin resonance mode, can mediate such inter-band scat-
tering. The question whether phonons can induce such
inter-band scattering has also been raised by these au-
thors.
First, we note that recent high energy and momentum
resolution ARPES experiments on Bi2212 using low en-
ergy photons( <10 eV) have better resolved the bilayer
splitting at the nodal point25. However, as shown in Fig.
2 of Ref. 25, the scattering rate of the bonding and anti-
bonding band does not exhibit a crossover behavior as
reported by Borisenko at al.. The inconsistency of the
data between the two groups implies that more experi-
ments and better analysis are needed to verify whether
this inter-band scattering effect is genuine.
Second, empirically, it has been known for over 15 years
that interband electron-phonon coupling in the cuprates
is very large. The evidence comes from the strong reso-
nance profiles of many Raman active phonons, which dis-
play large intensity variations26. This is generally under-
stood as a result of strong interband coupling, whereby
phonons can be brought in and out of resonances via tun-
ing of the incident photon energy27. Since, in general,
phonons can also provide momentum to scatter electrons
along the c-axis, direct inter bi-layer scattering can occur
which involves mixing of different symmetries of phonons.
This can be viewed in a simplified way even if we first
neglect direct interband scattering and consider a bilayer
system coupled to c−axis phonons. For qz = 0, a simple
classification of c-axis modes is possible:
k,σ,α=1,2
ǫα(k)c
k,α,σ
ck,α,σ +
t⊥(k)
k,1,σck,2,σ + h.c.
k,q,σ,α=1,2,ν
gν,α(k, q)
k+q,α,σck,α,σ
aν(−q) + a
+ h.c.
, (1)
where α is the index for the electronic states of different
layers, ǫ1(k) = ǫ2(k), t⊥(k) describes the hopping of elec-
trons between two layers, and the index ν can be either
gerade or ungerade active c-axis modes, with symmetry
classification with respect to the displacement eigenvec-
tors to the inversion center of the cell, depicted in Fig.
4. After diagonalizing the first two terms by canonical
transformation, the electron-phonon coupling can be re-
cast as
(g,u)
k,q,σ
g(g,u)(k, q)
a(g,u)(q) + a
(g,u)
k+q,+,σck,(+,−),σ + c
k+q,−,σck,(−,+),σ
+ h.c.
..(2)
We have used the c+ and c− for the even and odd
linear combination of c1 and c2, and subscript g and u
for the gerade and ungerade mode, respectively. Thus
for qz = 0, where this classification is possible, ger-
ade phonons induce intra-band scattering (even chan-
nel), while the ungerade phonons mediate the inter-
band scattering process (odd channel) even without di-
rect electron-phonon coupling across the layers. Yet for
qz = π/c, the classification inverts, where gerade modes
become ungerade and vice-versa, as illustrated in Fig 4.
Thus, even in this simple case, modes at different qz con-
tribute both to intra and interband scattering, and the
net weight of the coupling appearing in the self energy is
then largely determined by the specific momentum struc-
ture g(k, q). Since the self-energy generally involves sums
over qz, and coupling directly of electrons in adjacent lay-
ers via phonons are non-negligible, clearly the inter-band
scattering phenomena can not be used to argue against
FIG. 4: The illustration of the gerade and ungerade c−axis
phonons. The eigenmode of the gerade (ungerade) phonons
is even (odd) with respect to the mirror plane between two
CuO2 layers at qz = 0, while their definition swapped at qz =
π/c. The black, grey, and white circles represent the Cu, Ca,
and O atoms, respectively.
the phonons being important to the electronic states.
We also add a remark concerning the electron-
phonon coupling derived from Raman measurements28
in YBa2Cu3O7 and Bi-2212 compared to that obtained
from ARPES. While one might naively expect the cou-
plings to be comparable from Raman and ARPES, we re-
mark that this situation is remarkably different if the cou-
pling is strongly moment dependent and whenever corre-
lations are appreciable. Since Raman measures phonons
with net zero momentum transfer and ARPES involves a
sum over all transfers, a sizeable coupling difference may
be discernable. This is specifically the case for the B1g
phonon, where scattering involving momentum transfers
across the necks of the Fermi surface near (π, 0)14, further
enhanced via correlations29, yields a strong contribution
to the electron self-energy that is absent in phonon self-
energies. Moreover, a sum rule analysis presented in Ref.
30 highlights in general how electron and phonon self-
energies may be qualitatively different in strongly corre-
lated systems.
D. ARPES Experiments on Zn and Ni substituted
Bi2212
In this section, we comment on recent experiments
about the renormalization effects in Zn and Ni substi-
tuted Bi2212 crystal31,32. The strength of sharp renor-
malizations in these substituted crystals is found to be
weakened compared to the pristine crystals. Since the
magnetic properties are expectedly modified due to the
Cu substitution by these impurities, the authors con-
cluded that the sharp renormalization effects are induced
by magnetic-related modes, not phonons.
In fact, a close examination of the data published by V.
B. Zabolotnyy et al.31 and K. Terashima et al.32 implies
that the magnetic property is not the only modification
due to the substitution by Zn and Ni. First, although
both sets of data are consistent in the antinodal region
where the strength of the band renormalization is re-
duced, they are inconsistent with each other on the kink
strength along the nodal direction. In the data set of V.
B. Zabolotnyy et al., the kink strength is weaker in the
Zn or Ni doped samples, whereas there is no detectable
change in the data set reported by K. Terashima et al..
Second, the data from K. Terashima et al. (Fig. 1(d)-
(f) in Ref. 32) suggest that the bilayer-splitting struc-
ture is much clearer in the pristine crystals than in the
Zn and Ni doped crystals. Since the authors have ruled
out the possibility of a significant doping level difference
between pristine and impurity-doped crystals, the dis-
tinct visibility variation of the bilayer structure implies
a impurity-related change in the electronic structure.
From these two observations on their data, it implies
that not only the magnetic properties could change, the
band structure and scattering behaviors could also be
affected due to these impurities. It is possible that
these changes of the electronic structures could “weaken”
the renormalizaton features observed in the ARPES
spectrum. Furthermore, we note that the strength of
electron-phonon coupling could also be modified by the
substituted impurities: this can be inferred from the
change of the Fano spectra lineshape of the B1g 340
cm−1 phonon in Raman spectral for Zn-doped YBCO33
and Th-doped YBCO34 resulting from an increase in the
phonon linewidth due to impurity scattering. Therefore,
the experiments on Ni and Zn substituted Bi2212 crys-
tals are inconclusive experiments to distinguish phonon
and magnetic modes as the origin of the renormalization
effects.
III. CONCLUSION
We have shown that the temperature and doping de-
pendence of the renormalization effects, inter-band band
scattering, and the results of Zn and Ni doped materials
can be understood in the framework of electron-phonon
coupling. On the other hand, the issues that make it
not plausible for the sharp kink being of spin origin, es-
pecially the spin resonance mode, remain: i) the nearly
constant energy scale as a function of doping in small
gap system12; ii) the multiple modes12; iii) the presence
of clear kink in the normal state4,13,16 iv) the detailed
agreement between B1g phonon based explanation of the
mode coupling as a function of momentum11,14, while
the spin resonance with tiny spectral weight (2%) is un-
likely to give an explanation for both nodal and antin-
odal renormalization; v) the accumulated evidence for
lattice polaron effect in underdoped and deeply under-
doped systems35,36. With these weaknesses of the spin
resonance interpretation, lattice effect is a more plausible
explanation of the renormalization effects. It remains a
possibility that the spin-fluctuation and other strong cor-
relation effects are also very important to determine the
electronic structure of cuprates; they likely contribute to
a smooth renormalization of the band and may be more
relevant to the higher binding energy. However, opti-
cal phonons are the most probable origin for the renor-
malization effects due to sharp modes near 40-70 meV,
which is also supported by the recent finding of STM
experiments37,38.
Acknowledgments
W.S. Lee acknowledge the support from SSRL which
is operated by the DOE Office of Basic Energy Science,
Division of Chemical Science and Material Science under
contract DE-AC02-76SF00515. T. P. Devereaux would
like to acknowledge support from NSERC, ONR grant
N00014-05-1-0127 and the A. von Humboldt Foundation.
APPENDIX A: MIGDAL-ELIASHBERG BASED
APPROACH
In the calculations presented herein, we evaluate elec-
tronic self energies and spectral functions via Migdal-
Eliashberg treatment, as discussed in Ref. 39. The
dressed Green’s function in the superconducting state is
given in Nambu notation by
Ĝ(k, ω) =
ωZ(k, ω)τ̂0 + [ǫ(k) + χ(k, ω)]τ̂3 + φ(k, ω)τ̂1
[ωZ(k, ω)]2 − [ǫ(k) + χ(k, ω)]2 − φ2(k, ω)
, (A1)
from which the spectral function follows A(k, ω) = − 1
G′′1,1(k, ω) as shown in Figs. 1c,1d, and 2. The momentum-
dependent components of the Nambu self energy are given as generalizations of those found in Ref. 39:
ωZ2(k, ω) =
|gν(k,p− k)|
[nb(Ων) + nf(Ep)][δ(ω +Ων − Ep) + δ(ω − Ων + Ep)]
+[nb(Ων) + nf (−Ep)][δ(ω − Ων − Ep) + δ(ω +Ων + Ep)]
χ2(k, ω) = −
|gν(k,p− k)|
[nb(Ων) + nf (Ep)][δ(ω +Ων − Ep)− δ(ω − Ων + Ep)]
+[nb(Ων) + nf (−Ep)][δ(ω − Ων − Ep)− δ(ω +Ων + Ep)]
φ2(k, ω) =
|gν(k,p− k)|
[nb(Ων) + nf (Ep)][δ(ω +Ων − Ep)− δ(ω − Ων + Ep)]
+[nb(Ων) + nf(−Ep)][δ(ω − Ων − Ep)− δ(ω +Ων + Ep)]
where ν denotes the phonon mode index, and nf and nb
are the Fermi and Bose occupation factors. gν(k,q) are
the corresponding electron-phonon couplings for mode
ν, given in reference14 for the B1g and breathing modes.
We choose to model the A1g coupling via a momentum
independent coupling. Further details can be found in
Ref. 17.
1 P. V. Bogdanov, A. Lanzara, S. A. Kellar, X. J. Zhou, E.
D. Lu, W. J. Zheng, G. Gu, J.-I. Shimoyama, K. Kishio,
H. Ikeda, R. Yoshizaki, Z. Hussain, and Z. X. Shen, Phys.
Rev. Lett. 85, 2581 (2000).
2 A. Kaminski, M. Randeria, J. C. Campuzano, M. R. Nor-
man, H. Fretwell, J. Mesot, T. Sato, T. Takahashi, and K.
Kadowaki, Phys. Rev. Lett. 86, 1070 (2001).
3 T. K. Kim, A. A. Kordyuk, S. V. Borisenko, A. Koitzsch,
M. Knupfer, H. Berger, and J. Fink, Phys. Rev. Lett. 91,
167002 (2003).
4 T. Sato, H. Matsui, T. Takahashi, H. Ding, H.-B. Yang,
S.-C. Wang, T. Fujii, T. Watanabe, A. Matsuda, T.
Terashima, and K. Kadowaki, Phys. Rev. Lett. 91, 157003
(2003).
5 M. R. Norman, H. Ding, J. C. Campuzano, T. Takeuchi,
M. Randeria, T. Yokoya, T. Takahashi, T. Mochiku, and
K. Kadowaki, Phys. Rev. Lett. 79, 3506 (1997).
6 A. D. Gromko, A. V. Fedorov, Y.-D. Chuang, J. D. Ko-
ralek, Y. Aiura, Y. Yamaguchi, K. Oka, Yoichi Ando, and
D. S. Dessau Phys. Rev. B 68, 174520 (2003)
7 A. A. Kordyuk, S. V. Borisenko, V. B. Zabolotnyy, J. Geck,
M. Knupfer, J. Fink, B. Büchner, C. T. Lin, B. Keimer, H.
Berger, A.V. Pan, Seiki Komiya, and Yoichi Ando, Phys.
Rev. Lett. 97, 017002(2006).
8 S. V. Borisenko, A. A. Kordyuk, V. Zabolotnyy, J. Geck,
D. Inosov, A. Koitzsch, J. Fink, M. Knupfer, B. Büchner,
V. Hinkov, C. T. Lin, B. Keimer, T. Wolf, S. G. Chi-
uzbăian, L. Patthey, and R. Follath, Phys. Rev. Lett. 96,
117004 (2006).
9 S. V. Borisenko, A. A. Kordyuk, A. Koitzsch, J. Fink, J.
Geck, V. Zabolotnyy, M. Knupfer, B. Büchner, H. Berger,
M. Falub, M. Shi, J. Krempasky, and L. Patthey, Phys.
Rev. Lett. 96, 067001 (2006).
10 A. Lanzara, P. V. Bogdanov, X. J. Zhou, S. A. Kellar, D.
L. Feng, E. D. Lu, T. Yoshida, H. Eisaki, A. Fujimori, K.
Kishio, J.-I. Shimoyama, T. Noda, S. Uchida, Z. Hussain,
Z.-X. Shen, Nature (London) 412, 510 (2001).
11 T. Cuk, F. Baumberger, D. H. Lu, N. Ingle, X. J. Zhou,
H. Eisaki, N. Kaneko, Z. Hussain, T. P. Devereaux, N. Na-
gaosa, and Z.-X. Shen, Phys. Rev. Lett. 93, 117003 (2004).
12 X. J. Zhou, Junren Shi, T. Yoshida, T. Cuk, W. L. Yang,
V. Brouet, J. Nakamura, N. Mannella, Seiki Komiya,
Yoichi Ando, F. Zhou, W. X. Ti, J. W. Xiong, Z. X. Zhao,
T. Sasagawa, T. Kakeshita, H. Eisaki, S. Uchida, A. Fu-
jimori, Zhenyu Zhang, E. W. Plummer, R. B. Laughlin,
Z. Hussain, and Z.-X. Shen, Phys. Rev. Lett. 95, 117001
(2005).
13 W. Meevasana, N. J. C. Ingle, D. H. Lu, J. R. Shi, F.
Baumberger, K. M. Shen, W. S. Lee, T. Cuk, H. Eisaki,
T. P. Devereaux, N. Nagaosa, J. Zaanen, and Z.-X. Shen,
Phys. Rev. Lett. 96, 157003 (2006).
14 T. P. Devereaux, T. Cuk, Z.-X. Shen, and N. Nagaosa,
Phys. Rev. Lett. 93, 117004 (2004).
15 X.J. Zhou, T. Yoshida, A. Lanzara, P.V. Bogdanov, S.A.
Kellar, K.M. Shen, W.L. Yang, F. Ronning, T. Sasagawa,
T. Kakeshita, T. Noda, H. Eisaki, S. Uchida, C.T. Lin, F.
Zhou, J.W. Xiong, W.X. Ti, Z.X. Zhao, A. Fujimori, Z.
Hussain, and Z.-X. Shen, Nature 423, 398 (2003).
16 A. Lanzara, P. V. Bogdanov, X. J. Zhou, N. Kaneko,
H. Eisaki, M. Greven, Z. Hussain, and Z. -X. Shen,
cond-mat/0412178.
17 T. P. Devereaux, Z.-X. Shen, N. Nagaosa, and J. Zaanen,
preprint.
18 W.S. Lee et al., unpublished.
19 M. Eschrig and M. R. Norman, Phys. Rev. B 67, 144503
(2003).
20 L. Pintschovius, Phys. stat. sol. (b) 242, 30 (2005), and
the references herein.
21 D. Reznik, L. Pintschovius, M. Ito, S. Likubo, M. Sato,
H. Goka, M. Fujita, K. Yamada, G. D. Gu, and J. M.
Tranquada, Nature 440, 1170 (2006).
22 M. Opel, R. Hackl, T. P. Devereaux, A. Virosztek and
A. Zawadowski, A. Erb and E. Walker, H. Berger and L.
Forró, Phys. Rev. B 60, 9836 (1999); C. Bernhard, D.
Munzar, A. Golnik, C. T. Lin, A. Wittlin, J. Humliček,
and M. Cardona, ibid. 61, 618-626 (2000).
23 E. Altendorf, X. K. Chen, J. C. Irwin, R. Liang and W.
N. Hardy, Phys. Rev. B 47, 8140(1993); K. C. Hewitt,
X. K. Chen, C. Roch, J. Chrzanowski, J. C. Irwin, E. H.
Altendorf, R. Liang, D. Bonn, and W. N. Hardy, ibid. 69
064514(2004).
24 W. Meevasana, T. P. Devereaux, N. Nagaosa, Z.-X. Shen,
and J. Zaanen, Phys. Rev. B 74, 174524 (2006).
25 T. Yamasaki, K. Yamazaki, A. Ino, M. Arita, H. Na-
matame, M. Taniguchi, A. Fujimori, Z.-X. Shen, M.
Ishikado, and S. Uchida, cond-mat/0603006.
26 E. T. Heyen, S. N. Rashkeev, I. I. Mazin, O. K. Andersen,
R. Liu, M. Cardona, and O. Jepsen, Phys. Rev. Lett. 65,
3048-3051 (1990); B. Friedl, C. Thomsen, H.-U. Haber-
meier, and M. Cardona, Solid State Commun. 78, 291
(1991); D. Reznik, S.L. Cooper, M.V. Klein, W.C. Lee,
D.M. Ginsberg, A.A. Maksimov, A.V. Puchkov, I.I. Tar-
takovskii, and S-W. Cheong, Phys. Rev. B 48, 7624 (1993);
M. Kang, G. Blumberg, M. V. Klein, and N. N. Kolesnikov
Phys. Rev. Lett. 77, 4434 (1996); X. Zhou, M. Cardona,
D. Colson, and V. Viallet, Phys. Rev. B 55, 12 770 (1997);
V.G. Hadjiev, X. Zhou, T. Strohm, M. Cardona, Q.M. Lin,
and C.W. Chu, ibid. 58, 1043 (1998).
27 See, e.g., E. Ya. Sherman and C. Ambrosch-Draxl, Phys.
Rev. B 62, 9713 (2000), and references therein.
28 Considering Y-123 and Bi-2212, earlier Raman measure-
ments, when fit with a Fano profile, indicated that B1g cou-
pling in Y-123 is more appreciable than in Bi-2212, which
was thought to be due to the different electrostatic environ-
ment surrounding the CuO2 planes. [T.P. Devereaux, A.
Virosztek, A. Zawadowski, M. Opel, P.F. Müller, C. Hoff-
mann, R. Philipp, R. Nemetschek, R. Hackl, H. Berger, L.
Forro, A. Erb, and E. Walker, Solid State Commun. 108,
407 (1998)]. This at the time was supported by electrostatic
calculations of the c-axis oriented crystal field in Y-123
[J. Li and J. Ladik, Solid State Commun. 95, 35 (1995)],
but no calculations had been performed for Bi2212. A re-
examination of the Raman data indicate that the extracted
coupling for Bi-2212 may be affected by intrinsic inhomo-
geneity of phonon lines in Bi-2212 compared to Y-123, as
well to differences in the B1g electronic background. While
λ was estimated to be 0.0013, with inhomogeneity of the
phonon line taken into account along with a different choice
of background, λ = 0.02 may be obtained, comparable
to Y-123. This is supported by recent Ewald calculations
for Bi-2212, which gives a value of local crystal field 1.25
eV/cm, comparable to that obtained for Y-123.
29 Carsten Honerkamp, Henry C. Fu, and Dung-Hai Lee,
cond-mat/0605161.
30 O. Rösch, and O. Gunnarsson, Phys. Rev. Lett. 93,
237001(2004); O. Rösch, G. Sangiovanni, and O. Gunnars-
son, cond-mat/0607612.
31 V. B. Zabolotnyy, S.V. Borisenko, A. A. Kordyuk, J. Fink,
J. Geck, A. Koitzsch, M. Knupfer, B. Büchner, H. Berger,
A. Erb, C. T. Lin, B. Keimer, and R. Follath, Phys. Rev.
Lett. 96, 037003 (2006).
32 K. Terashima, H. Matsui, D. Hashimoto, T. Sato, T. Taka-
hashi, H. Ding, T. Yamamoto AND K. Kadowaki, Nature
Physics 2, 27 (2006).
33 M. Limonov, D. Shantsev, S. Tajima, and A. Yamanaka,
Phys. Rev. B 65, 024515(2001).
34 E. Altendorf, J. C. Irwin, W. N. Hardy, and R. Liang,
Physica C 185-189, 1375(1991).
35 K.M. Shen, F. Ronning, D.H. Lu, W.S. Lee, N.J.C. Ingle,
W. Meevasana, F. Baumberger, A. Damascelli, N.P. Ar-
mitage, L.L. Miller, Y. Kohsaka, M. Azuma, M. Takano, H.
Takagi, and Z.-X. Shen, Phys, Rev. Lett. 93, 267002(2004)
36 O. Rösch, O. Gunnarsson, X. J. Zhou, T. Yoshida, T.
Sasagawa, A. Fujimori, Z. Hussain, Z.-X. Shen, and S.
Uchida, Phys. Rev. Lett. 95, 227002 (2005).
37 Jinho Lee, K. Fujita, K. McElroy, J. A. Slezak, M. Wang,
Y. Aiura, H. Bando, M. Ishikado, T. Masui, J.-X. Zhu, A.
V. Balatsky, H. Eisaki, S. Uchida and J. C. Davis, Nature
http://arxiv.org/abs/cond-mat/0412178
http://arxiv.org/abs/cond-mat/0603006
http://arxiv.org/abs/cond-mat/0605161
http://arxiv.org/abs/cond-mat/0607612
442, 546(2006).
38 Jian-Xin Zhu, A. V. Balatsky, T. P. Devereaux, Qimiao
Si, J. Lee, K. McElroy, and J. C. Davis, Phys. Rev. B 73,
014511(2006); cond-mat/0507621.
39 D. J. Scalapino, in Superconductivity, Vol. 1, editted by R.
Parks, Dekker, 1969.
http://arxiv.org/abs/cond-mat/0507621
ABSTRACT
  Lattice contribution to the electronic self-energy in complex correlated
oxides is a fascinating subject that has lately stimulated lively discussions.
Expectations of electron-phonon self-energy effects for simpler materials, such
as Pd and Al, have resulted in several misconceptions in strongly correlated
oxides. Here we analyze a number of arguments claiming that phonons cannot be
the origin of certain self-energy effects seen in high-$T_c$ cuprate
superconductors via angle resolved photoemission experiments (ARPES), including
the temperature dependence, doping dependence of the renormalization effects,
the inter-band scattering in the bilayer systems, and impurity substitution. We
show that in light of experimental evidences and detailed simulations, these
arguments are not well founded.

<|endoftext|><|startoftext|>
arXiv:0704.0094v1  [astro-ph]  2 Apr 2007
Timing and Lensing of the Colliding Bullet Clusters: barely enough time and gravity
to accelerate the bullet
HongSheng Zhao
University of St Andrews, Scottish University Physics Alliances, KY16 9SS, UK
We present semi-analytical constraint on the amount of dark matter in the merging bullet galaxy
cluster using the classical Local Group timing arguments. We consider particle orbits in potential
models which fit the lensing data. Marginally consistent CDM models in Newtonian gravity are
found with a total mass MCDM = 1 × 10
M⊙ of Cold DM: the bullet subhalo can move with
VDM = 3000 kms
−1, and the ”bullet” X-ray gas can move with Vgas = 4200 kms
−1. These are
nearly the maximum speeds that are accelerable by the gravity of two truncated CDM halos in a
Hubble time even without the ram pressure. Consistency breaks down if one adopts higher end of
the error bars for the bullet gas speed (5000− 5400 kms−1), and the bullet gas would not be bound
by the sub-cluster halo for the Hubble time. Models with VDM ∼ 4500 kms
∼ Vgas would invoke
unrealistic large amount MCDM = 7× 10
M⊙ of CDM for a cluster containing only ∼ 10
M⊙ of
gas. Our results are generalisable beyond General Relativity, e.g., a speed of 4500 kms−1 is easily
obtained in the relativistic MONDian lensing model of Angus et al. (2007). However, MONDian
model with hot dark matter MHDM ≤ 0.6×10
M⊙ and CDM model with a halo mass ≤ 1×10
are barely consistent with lensing and velocity data.
PACS numbers: 98.10.+z, 98.62.Dm, 95.35.+d; submitted to Physical Review D, rapid publications
I. POTENTIAL FROM TIMING
Timing is a unique technique to establish the case for
dark matter halos, first and most throughly explored in
the context of the Local Group (Kahn & Woljter 1959,
Fich & Tremaine 1991, Peebles 1989, Inga & Saha 1998).
In its simplest version the Local Group consists of the
Milky Way and M31 as two isolated point masses, which
formed close to each other, moved apart due to the Hub-
ble expansion, and slowed down and moved towards each
other upto their present velocity ∼ 120 km s−1 and sepa-
ration (about 700 kpc) due to their mutual gravity. The
age of the universe sets the upper limit on the period of
this galaxy pair, hence the total mass of the pair through
Kepler’s 3rd law assuming Newtonian gravity.
Timing also finds a timely application in the pair of
merging galaxy clusters 1E0657-56 at redshift z = 0.3,
which is largely an extra-galactic grand analogy of the
M31-MW system. The sub-cluster, called the ”bullet”,
presently penetrates 400-700 kpc through the main clus-
ter with an apparent speed of ∼ 4750+710
−550 km s
−1 (Marke-
vitch 2006). The X-ray gas of the bullet (amounts to
2×1013M⊙) collides with the X-ray gas of the main clus-
ter (with the total gas up to 1014M⊙) and forms a Mach-3
cone in front of the ”bullet”. The two clusters have at
least four different centers, which are offset by 400 kpc
between the pair of X-ray gas centers and by 700 kpc
between the pair of star-light centers, which coincides
with the gravitational lensing centers and (dark matter)
potential centers (Clowe et al. 2006). The penetration
speed is unusually high, hard for standard cosmology to
explain statistically (Hayashi & White 2006), and modi-
∗Electronic address: hz4@st-andrews.ac.uk
fied force law has been suggested (Farrar & Rosen 2006,
Angus et al. 2007).
The timing method applies in in MONDian gravity
as well as Newtonian. Like lensing, timing is merely a
method about constraining potential distribution, and is
only indirectly related to the matter distribution. In this
Letter we model the bullet clusters as a pair of mass con-
centrations formed at high redshift, and set constraint on
their mutual force using the simple fact that their radial
oscillation period must be close to the age of universe
at z = 0.3. We check the consistency with the lensing
signal of the cluster and give interpretations in terms of
standard CDM and MOND.
First we can understand the speed of the bullet clus-
ter analytically in simplified scenarios. Approximate the
two clusters as points of fixed masses M1 and M2 on a
head-on orbit, we can apply the usual MW-M31 timing
argument. The total mass M0 = M1 + M2 is constant.
The radial orbital period is computed from
T = 2
∫ rmax
V (r)
, (1)
r3max
, Newtonian p = 2 (2)
2πrmax
, deep-MONDian, p = 1 (3)
∝ K−n/2r
max, for a K/r
p gravity, (4)
where rmax is the apocenter and is related to the present
relative velocity V (r) at separation r = 700 kpc by energy
http://arxiv.org/abs/0704.0094v1
conservation
V (r)2
= −GM0
Newtonian (5)
= V 2M (ln rmax − ln r) deep−MONDian (6)
r1−p − r1−pmax
K/(1− p) for a K/rp gravity,(7)
where VM =
ξ(GM0a0)
1/4 is the MOND cir-
cular velocity of two point masses, a0 equals one
Angstrom per square second and is the MOND
acceleration scale, and the dimensionless ξ ≡
3M1M2
∼ 0.81 ∼ 1 (cf. Mil-
grom 1994, Zhao 2007, in preparation) for a typical mass
ratio.
The predictions for simple Newtonian Keplerian grav-
ity are given in Fig. 1; the more subtle case for a MON-
Dian cluster is discussed in the final section. Setting the
orbital period T = 10Gyrs, the age of the universe at
the cluster redshift, yields presently V ∼ 3200 km s−1 in
Newtonian for a normal combined mass of M1 + M2 =
(0.7− 1)× 1015M⊙ for the clusters, which is about 7-10
times their baryonic gas content (∼ 1014M⊙) for Newto-
nian universe of Ω = 0.3 cold dark matter. In agreement
with Farrar & Rosen and Hayashi & White, the sim-
ple timing argument suggests that dark halo velocities
of 4750 km s−1, as high as the ”bullet” X-ray gas, would
require halos with unrealistically larger masses of dark
matter, ∼ 1016M⊙, an order of magnitude more than
what a universal baryon-dark ratio implies. As a sanity
check, assuming a conventional 3× 1012M⊙ Local Group
dark matter mass Fig.1 predicts the relative velocity of
∼ 100km/s for the M31-MW system at separation 700
kpc after 14 Gyrs, consistent with observation (Binney
& Tremaine 1987).
These analytical arguments, while straightforward, are
not precise given its simplifying assumptions. For one,
clusters do not form immediately at redshift infinity, and
the cluster mass and size might grow with time gradual-
lly. More important is that point mass Newtonian halo
models are far from fitting the weak lensing data of the
1E0657-56. A shallower Newtonian potential makes it
even more difficult to accelerate the bullet. On the other
hand, Angus, Shan, Zhao, Famaey (2007) show that there
are MOND-inspired potentials that fits lensing. As com-
mented in their conclusion, the same potential is deep
enough that a V = 4750 km s−1 ”bullet” is bound in an
orbit of apocenter rmax of a few Mpc, so the two clus-
ters could be accelerated by mutual gravity from a zero
velocity apocenter to 4750 km/s within the clusters’ life-
time. This line of thought was further explored by the
more systematic numerical study of Angus & McGaugh
(2007).
Our paper is a spin-off of these works and the works
of Hayashi & White and Farrar & Rosen. We emphasize
the unification of the semi-analytical timing perspective
and the lensing perspective, and aim to derive robust
constraints to the potential, without being limited to a
X-ray bullet speed
Baryon
12.5 13 13.5 14 14.5 15
Log (Combined Mass)
FIG. 1: Analytical timing-predicted dynamical mass vs. the
relative speed of two objects separated by 700 kpc after 10±
4 Gyrs (three lines in increasing order for increasing time)
assuming Keplerian potential of point masses. Three vertical
lines indicate typical Local Group Halo mass, Baryonic mass
in galaxy clusters, and most massive CDM halo masses. Three
horizontal lines indicate the error bar of the speed of the X-ray
”bullet” gas.
specific gravity theory or dark matter candidate.
Towards the completion of this work, we are made
aware by the preprint of Springel & Farrar (2007) that the
unobserved bullet DM halo could be moving slower than
its observed stripped X-ray gas. These authors, as well
as the preprint of Milosovic et al. (2007), emphasized the
effect of hydrodynamical pressure, which we will not be
able to model realisticly here. But to address the velocity
differences, instead we treat the X-ray gas as a ”bullestic
particle”. We argue that our hypothetical ballistic parti-
cle must move slow enough to be bound to vicinity of the
subhalo before the collision, but moves somewhat faster
than 4700+700
−550 km s
now, since it does not experience
ram pressure of the gas. This model follows the spirit
of classical timing models of the separation of the Large
and Small Magellanic Clouds and the Magellanic Stream
(Lin & Lynden-Bell 1982).
II. 3D POTENTIAL FROM LENSING
The weak lensing shear map of Clowe et al. (2006)
has been fitted by Angus et al. (2007) using a four-
component analytical potential each being spherical but
on different centres. For our purpose we redistribute the
minor components and simplify the potential into two
components centred on the moving centroid of galaxy
light of the main cluster with the present spatial coordi-
nates r1(t) = (−564,−176, 0) kpc and subcluster galaxy
centroid r2(t) = (145, 0, 0) kpc; the coordinate origin is
set at the present brightest point of the ”bullet” X-ray
gas; presently the cluster is at z = 0.3 or cosmic time
t = 10Gyrs. We also apply a Keplerian truncation to the
potential beyond the truncation radius rt. So the follow-
ing 3D potential is adopted for the cluster 1E0657-56 at
time t,
Φ(X,Y, Z, t) = (1800 km s−1)2φ (|r− r1|) (8)
+ (1270 km s−1)2φ (|r− r2|) ,
φ(|r − ri(t)|) = ln
|r − ri(t)|
180 kpc
+ cst, r < rt(9)
= − r̃t
|r− ri(t)|
, r ≥ rt(t) = C × t,(10)
where r̃t ≡ r
+1802
is to ensure a continuous and smooth
transition of the potential across the truncation radius rt.
The truncation rt evolves with time, since a pre-cluster
region collapses gradually after the big bang, and its
boundary and total mass grows with time till it reaches
the size of a cluster. In the interests of simplicity rather
than rigour, we use a linear model rt = C × t, where C
is a constant of the unit kpc/Gyr.
To check that the simplified potential is still consis-
tent with weak lensing data, we recompute the 3D weak
lensing convergence (Taylor et al. 2004) for sources at
distance D(0, zs) at the redshift zs,
κ(X,Y, zs) =
i=X,Y
∫ D(0,zs)
2D(z, zs)
(∂iΦ)dZ
where the integrations in square backets are the deflec-
tion angles for a source at zs, and the usrual lensing
effective distance is related to the comoving distances
by D(z, zs) = (1 + z)
−1D̃(z)
1− D̃(z)
D̃(zs)
= 587 Mpc is
for the bullet cluster z = 0.3 lensing sources at zs = 1;
the distance increases by a factor 1.3 to 1.6 for source
redshifts of 3 to infinity. Fig.2 shows the predicted κ
along the line joining the two dark centers; the result
is insensitive to the cluster truncation radius as long as
rt ≥ 1000kpc presently. The lensing model predicts a
signal in between that of the weak lensing data of Clowe
et al., and strong lensing data of Bradac et al. It is
known that these two data sets are somewhat discrepant
to each other. So the fit here is reasonable. The method
is deprojection is essentially similar to the decomoposi-
tion method of Bradac et al. whose explicit assumption
of Einsteinian gravity is however unnecessary.
The important thing here is that as far as deproject-
ing the above potential is concerned, no assumption is
needed on the gravity theory as long as light rays fol-
low geodesics, a feature built in most alternative grav-
ity theory. Similarly orbits of massive particles are also
(different) geodesics in these theories. The meaning of
potential in such theories is that the potential (scaled by
a factor 2/c2) represents metric perturbations to the flat
space-time, especially to the g00(cdt)
2 = −(1+ 2Φ
)(cdt)2
term, so the Christoffel Γi00 ∼ ∂∂XiΦ, it can be shown
–1200 –1000 –800 –600 –400 –200 0 200 400
X/kpc
FIG. 2: Predicted bullet cluster convergence (rescaled for
sources at infinity) along the line Y = 0.3X + cst connect-
ing our two potential centroids. The model predicts a lensing
signal in between that of observed weak lensing data from
sources at zs = 1 (Clowe et al, lower end of error bars) and
the united weak lensing and strong lensing (zs = 3) data
(Bradc et al. upper part of error bars); the mismatch of these
two datasets are presently unresolved.
that the geodesic equations have the same form as Ein-
steinian in the weak-field limit: d
R ≈ −(1 + v
)∇RΦ,
where R is the pair of spatial coordinates perpendicular
to the instantaneous velocity v; the pathes of light rays
are deflected twice as much by the metric perturbation
2Φ/c2 as those of low-speed particles.
III. ORBITS OF THE COLLIDING CLUSTERS
We now use this potential to predict the relative speed
of the two clusters. This is possible using the classical
timing argument, in the style of Kahn & Wolter (159),
Fich & Tremaine (1991) and Voltonen et al. (1998);
we postpone most rigourous least action models (Pee-
bles 1989, Schmoldt & Saha 1998) for later investiga-
tions since these require modeling a cosmological con-
stant and other mass concentrations along the orbital
path of the bullet clusters, which have technical issues in
non-Newtonian gravity. We trace the orbits of the two
centroids of the potentials according to the equation of
motion d
= −∇Φ(ri). We assign different relative ve-
locities presently (at z = 0.3), and integrate backward
in time and require the two centroids of the potential be
close together at a time 10 Gyrs ago. The motions are
primarily in the sky plane, but we allow for 600 km/s
relative velocity component in the line of sight. Clearly
at earlier times when t is small, the two centroids are
well-separated compared to their sizes, so they move in
the growing Keplerian potential of each other. At lat-
ter times the centroids came close and move in the cored
isothermal potential.
We shall consider models with a normal truncation
rt = C × t = 1000 kpc at time t = 10 Gyrs. We also
consider models with a very large truncation C × t =
10000 kpc. In the language of CDM, the truncation
means the virial radius of the halo. The present in-
stantaneous escape speed of the model can be com-
puted by Vesc =
−2Φ(X,Y, Z, t). We find Vesc ∼
4200− 4500 km s−1 in the central region of the shallower
potential model with a present truncation 1000 kpc. The
escape speed increases to Vesc ∼ 5700 km s−1 for models
with a present truncation 10000 kpc.
Fig. 3 shows the predicted orbits for different present
relative velocities VDM = |dr2dt −
|. Among models
with a normal truncation, we find VDM ∼ 2950 km s−1; a
model with relative velocity VDM < 2800 km s
−1 would
predict an unphysical orbital crossing at high redshift,
while models with VDM > 3000 km s
−1 would predict
that the two potential centroids were never close at high
redshift.
Larger halo velocities are only possible in models
with very large truncation. If the relative velocity is
4200 km s−1 < VDM < 4750 km s
−1 between two clus-
ter gravity centroids, then the truncation must be as big
as 10Mpc at z = 0.3.
We also track the orbit of the bullet X-ray gas cen-
troid as a tracer particle in the above bi-centric poten-
tial. We look for orbits where the bullet X-ray gas will
always be bound to one member of the binary system
since the ram pressure in a hydrodynamical collision is
unlikely to be so efficient to eject the X-ray gas out of
potential wells of both the main and sub-clusters. This
means that the bullet speed must not exceed greatly the
present instantaneous escape speed of the model, which is
∼ 4200− 4500 km s−1 in the central region of the shallow
potential of a model with a present truncation 1000 kpc.
The escape speed increases to ∼ 5700 km s−1 for models
with a present truncation 10000 kpc. The model with
normal truncation is marginally consistent with the ob-
served gas speed Vgas ∼ 4750+710−550 km s
−1. The problem
would become more severe if the potential were made
shallower by an even smaller truncation. The gas speed
is less an issue in models with larger truncation.
In short the present velocity and lensing data are eas-
ier explained with potential models of very large trun-
cation. Models with normal truncation have smaller
gravitational power, can only accelerate the subhalo to
3000 km s−1 in 10 Gyrs. Models with normal CDM trun-
cation can only accelerate the bullet X-ray gas cloud to
∼ 4200− 4400 km s−1, the escape speed, marginally con-
sistent with observations.
Above simulation results are sensitive to the present
cluster separation, but insensitive to the present direc-
tion of the velocity vector. Unmodeled effects such as
dynamical friction associated with a live halo will reduce
the predicted VDM for the same potential, but the effect
is mild since the actual collision is brief ∼ 0.1− 0.3Gyrs
and the factor exp(−M2/2) in Chandrasekhar’s formulae
Curve 1
Curve 2
Curve 3
Curve 4
Curve 9
Curve 10
Curve 11
V_DM=2850 
C=100
C=100
V_DM=2950 
C=1000
V_DM=4200 
C=1000kpc/Gyr
V_DM=4750 km/s
–2000
–4000 –2000 0 2000 4000
X kpc
FIG. 3: The orbit of the bullet subcluster X-ray gas (red,
with present Vgas = 5400 km s
−1 for the 10 Gyrs in the past,
and pink: for the future 4 Gyrs), and the orbits of the col-
liding main cluster halo (blue dashes) and subhalo (black
dashes) in the potential (eqs. 8-10) determined by lensing
data; dashes indicate length traveled in 0.5 Gyrs steps. No
explicit assumption of gravity is needed for these calcula-
tions. Orbits with different present halo relative velocity VDM
and halo growth rate C are shown after a vertical shift for
clarity. Timing requires the present cluster relative velocity
in between 2800 kms−1 < VDM < 3000 kms
−1 for poten-
tials of normal truncation (lowest panels where the cluster
truncation grows from zero to C × 10Gyr = 1000 kpc), and
4200 kms−1 < VDM < 4750 kms
−1 for potentials with large
truncation (two upper panels where the cluster truncation
grows from zero to C × 10 Gyr = 10000 kpc).
sharply reduces dynamical friction for a supersonic body,
where M ∼ 2− 3 is the Mach number for the bullet.
IV. NEWTONIAN AND MONDIAN MEANINGS
OF THE POTENTIAL MODEL
Assuming Newtonian gravity the models with normal
truncation rt = 1Mpc at t = 10Gyrs correspond to clus-
ter (dark) masses of M1 = 0.745 × 1015M⊙ and M2 =
0.345×1015M⊙; the larger truncation rt = 10Mpc corre-
sponds to M1 = 7.45×1015M⊙ and M2 = 3.45×1015M⊙
in Newtonian. All these models fit lensing.
Interpreted in the MONDian gravity, the truncation
is due to external field effect and cosmic background
so to make the MOND potential finite hence escapable
(Famaey, Bruneton, Zhao 2007). Beyond the trun-
cation radius, MOND potential becomes nearly Kep-
lerian. The MONDian models, insensitive to trunca-
tion, would have masses only M1 = 0.66 × 1015M⊙ and
M2 = 0.16 × 1015M⊙. These masses are still higher
than their baryonic content ∼ 1014M⊙, implying the
need for, e.g., massive neutrinos; the neutrino density
is too low in galaxies to affect normal MONDian fits to
galaxy rotation curves, but is high enough to bend light
and orbits significantly on 1Mpc scale. The neutrino-to-
baryon ration, approximately 7:1 in the bullet cluster,
would be a reasonable assumption for a MONDian uni-
verse with Ωb ∼ 0.04 plus 2eV neutrinos hot dark matter
ΩHDM ∼ 0.25 ∼ 7 × Ωb (Sanders 2003, Pointecoute &
Silk 2005, Skordis et al. 2006, Angus et al. 2007). The
amount of hot dark matter inferred here is the same as
Angus et al. (2007) since their potential parameters are
fixed by the same lensing data.
V. CONCLUSION
In short a consistent set of simple lensing and dynam-
ical model of the bullet cluster is found. The present
relative speeds between galaxies of the two clusters is pre-
dicted to be VDM ∼ 2900 km s−1 in CDM and VDM ∼
4500 km s−1 in µHDM (MOND + Hot Dark Matter) if
the two clusters were born close to each other 10 Gyrs
ago; both models assume close to universal gas-DM ratio
in clusters, i.e., about (0.6 − 1) × 1015M⊙ Hot or Cold
DM. Modeling the bullet X-ray gas as ballistic particle,
we find the gas particle with speed of Vgas = 4200km/s
(at the lower end of observed speed) is bound to the
potential of the subcluster for most part of the Hubble
time for both above models, insensitive to the preference
of the law of gravity. But if future relative proper mo-
tion measurements of the subcluster galaxy speed is as
high as VDM = 4500km/s, or the gas speed is as high as
Vgas ∼ 5400 km s−1, then Newtonian models would need
to invoke unlikely 7×1015M⊙ DM halos around 1014M⊙
[20] Angus, G.W. & McGaugh S.D. 2007, astro-ph/0703xxx
[20] Angus G.W., Shan H, Zhao H., Famaey B., 2007, ApJ,
654, L13
[3] Bekenstein J., 2004, Phys. Rev. D., 70, 3509
[4] Binney, J., & Tremaine, S. 1987, Galactic Dynamics,
Princeton University Press, Princeton, New Jersey, Ch.7
[5] Bradac M., Clowe D., Gonzalez A.H., et al., 2006,astro-
ph/0608408 (B06)
[6] Clowe D., Bradac M., Gonzalez A.H., et al., 2006,astro-
ph/0608407 (C06)
[20] Farrar G., & Rosen R.A., astro-ph/0610298
[20] Famaey B., Bruneton J.P., Zhao H.S. 2007, MNRAS, in
press (astro-ph/072275)
[20] Inga M.S. & Saha P. 1998, ApJ, 115, 2231
[20] Lin D.N.C. & Lynden-Bell D, 1982, MNRAS, 198, 707
[20] Markevitch M. 2006, in ESA SP-604: The X-ray Universe
2005, ed. A.Wlison 723
[20] Milgrom M. 1994, ApJ, 429, 540
[20] Kahn, F.D. & Woltjer L. 1959, ApJ, 130, 705
[20] Peebles P.J.E. 1989, ApJ, 344, L53
[20] Pointecoute E. & Silk J. 2005, MNRAS, 364, 654
[20] Skordis, C. et al. 2006, Phys. Rev. Lett, 96, 1301
[20] Sanders R. 2003, MNRAS, 343, 901
[20] Taylor A.,N., Bacon D.J., et al. 2004, MNRAS, 353, 1176
[20] Fich M. & Tremaine S. 1991,ARAA, 29, 409
[20] Voltonen M.J., Byrd G.G., McCall M., Innanen K.A.
1993, AJ 105, 886
ABSTRACT
  We present semi-analytical constraint on the amount of dark matter in the
merging bullet galaxy cluster using the classical Local Group timing arguments.
We consider particle orbits in potential models which fit the lensing data.
{\it Marginally consistent} CDM models in Newtonian gravity are found with a
total mass M_{CDM} = 1 x 10^{15}Msun of Cold DM: the bullet subhalo can move
with V_{DM}=3000km/s, and the "bullet" X-ray gas can move with
V_{gas}=4200km/s. These are nearly the {\it maximum speeds} that are
accelerable by the gravity of two truncated CDM halos in a Hubble time even
without the ram pressure. Consistency breaks down if one adopts higher end of
the error bars for the bullet gas speed (5000-5400km/s), and the bullet gas
would not be bound by the sub-cluster halo for the Hubble time. Models with
V_{DM}~ 4500km/s ~ V_{gas} would invoke unrealistic large amount M_{CDM}=7x
10^{15}Msun of CDM for a cluster containing only ~ 10^{14}Msun of gas. Our
results are generalisable beyond General Relativity, e.g., a speed of
$4500\kms$ is easily obtained in the relativistic MONDian lensing model of
Angus et al. (2007). However, MONDian model with little hot dark matter
$M_{HDM} \le 0.6\times 10^{15}\msun$ and CDM model with a small halo mass $\le
1\times 10^{15}\msun$ are barely consistent with lensing and velocity data.

<|endoftext|><|startoftext|>
Introduction 1
2. Quasi-norms and the geometry of nilpotent Lie groups 12
3. The nilshadow 19
4. Periodic metrics 23
5. Reduction to the nilpotent case 27
6. The nilpotent case 31
7. Locally compact G and proofs of the main results 39
8. Coarsely geodesic distances and speed of convergence 47
9. Appendix: the Heisenberg groups 52
References 55
1. Introduction
1.1. Groups with polynomial growth. Let G be a locally compact group with
left Haar measure volG. We will assume that G is generated by a compact sym-
metric subset Ω. Classically, G is said to have polynomial growth if there exist
C > 0 and k > 0 such that for any integer n ≥ 1
volG(Ω
n) ≤ C · nk,
Date: April 2012.
http://arxiv.org/abs/0704.0095v2
2 EMMANUEL BREUILLARD
where Ωn = Ω· . . . · Ω is the n-fold product set. Another choice for Ω would only
change the constant C, but not the polynomial nature of the bound. One of the
consequences of the analysis carried out in this paper is the following theorem:
Theorem 1.1 (Volume asymptotics). Let G be a locally compact group with poly-
nomial growth and Ω a compact symmetric generating subset of G. Then there
exists c(Ω) > 0 and an integer d(G) ≥ 0 depending on G only such that the
following holds:
volG(Ω
nd(G)
= c(Ω)
This extends the main result of Pansu [27]. The integer d(G) coincides with
the exponent of growth of a naturally associated graded nilpotent Lie group, the
asymptotic cone of G, and is given by the Bass-Guivarc’h formula (4) below.
The constant c(Ω) will be interpreted as the volume of the unit ball of a sub-
Riemannian Finsler metric on this nilpotent Lie group. Theorem 1.1 is a by-
product of our study of the asymptotic behavior of periodic pseudodistances on G,
that is pseudodistances that are invariant under a co-compact subgroup of G and
satisfy a weak kind of the existence of geodesics axiom (see Definition 4.1).
Our first task is to get a better understanding of the structure of locally compact
groups of polynomial growth. Guivarc’h [21] proved that locally compact groups
of polynomial growth are amenable and unimodular and that every compactly
generated1 closed subgroup also has polynomial growth.
Guivarc’h [21] and Jenkins [15] also characterized connected Lie groups with
polynomial growth: a connected Lie group has polynomial growth if and only if
it is of type (R), that is if for all x ∈ Lie(S), ad(x) has only purely imaginary
eigenvalues. Such groups are solvable-by-compact and any connected nilpotent
Lie group is of type (R).
It is much more difficult to characterize discrete groups with polynomial growth,
and this was done in a celebrated paper of Gromov [17], proving that they are
virtually nilpotent. Losert [24] generalized Gromov’s method of proof and showed
that it applied with little modification to arbitrary locally compact groups with
polynomial growth. In particular he showed that they contain a normal compact
subgroup modulo which the quotient is a (not necessarily connected) Lie group.
We will prove the following refinement.
Theorem 1.2 (Lie shadow). Let G be a locally compact group of polynomial
growth. Then there exists a connected and simply connected solvable Lie group S
of type (R), which is weakly commensurable to G. We call such a Lie group a Lie
shadow of G.
Two locally compact groups are said to be weakly commensurable if, up to
moding out by a compact kernel, they have a common closed co-compact subgroup.
More precisely, we will show that, for some normal compact subgroupK, G/K has
1in fact it follows from the Gromov-Losert structure theory that every closed subgroup is
compactly generated.
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 3
a co-compact subgroup H/K which can be embedded as a closed and co-compact
subgroup of a connected and simply connected solvable Lie group S of type (R).
We must be aware that being weakly commensurable is not an equivalence
relation among locally compact groups (unlike among finitely generated groups).
Additionally, the Lie shadow S is not unique up to isomorphism (e.g. Z3 is a
co-compact lattice in both R3 and the universal cover of the group of motions of
the plane).
We cannot replace the word solvable by the word nilpotent in the above theo-
rem. We refer the reader to Example 7.9 for an example of a connected solvable
Lie group of type (R) without compact normal subgroups, which admits no co-
compact nilpotent subgroup. In fact this is typical for Lie groups of type (R). So
in the general locally compact case (or just the Lie case) groups of polynomial
growth can be genuinely not nilpotent, unlike what happens in the discrete case.
There are important differences between the discrete case and the general case.
For example, we will show that no rate of convergence can be expected in Theorem
1.1 when G is solvable not nilpotent, while some polynomial rate always holds in
the nilpotent discrete case [9].
Theorem 1.2 will enable us to reduce most geometric questions about locally
compact groups of polynomial growth, and in particular the proof of Theorem
1.1, to the connected Lie group case. Observe also that Theorem 1.2 subsumes
Gromov’s theorem on polynomial growth, because it is not hard to see that a
co-compact lattice in a solvable Lie group of polynomial growth must be virtually
nilpotent (see Remark 7.8). Of course in the proof we make use of Gromov’s
theorem, in its generalized form for locally compact groups due to Losert. The rest
of the proof combines ideas of Y. Guivarc’h, D. Mostow and a crucial embedding
theorem of H.C. Wang. It is given in Paragraph 7.1 and is largely independent of
the rest of the paper.
1.2. Asymptotic shapes. The main part of the paper is devoted to the asymp-
totic behavior of periodic pseudodistances on G. We refer the reader to Definition
4.1 for the precise definition of this term, suffices it to say now that it is a class of
pseudodistances which contains both left-invariant word metrics on G and geodesic
metrics on G that are left-invariant under co-compact subgroup of G.
Theorem 1.2 enables us to assume that G is a co-compact subgroup of a simply
connected solvable Lie group S, and rather than looking at pseudodistances on G,
we will look at pseudodistances on S that are left-invariant under a co-compact
subgroup H. More precisely a direct consequence of Theorem 1.2 is the following:
Proposition 1.3. Let G be a locally compact group with polynomial growth and ρ
a periodic metric on G. Then (G, ρ) is (1, C)-quasi-isometric to (S, ρS) for some
finite C > 0, where S is a connected and simply connected solvable Lie group of
type (R) and ρS some periodic metric on S.
Recall that two metric spaces (X, dX ) and (Y, dY ) are called (1, C)-quasi-isometric
if there exists a map φ : X → Y such that any y ∈ Y is at distance at most C
from some element in the image of φ and if |dY (φ(x), φ(x
′)) − dX(x, x
′)| ≤ C for
all x, x′ ∈ X.
4 EMMANUEL BREUILLARD
In the case when S is Rd and H is Zd, it is a simple exercise to show that any
periodic pseudodistance is asymptotic to a norm on Rd, i.e. ρ(e, x)/ ‖x‖ → 1 as
x → ∞, where ‖x‖ = lim 1
ρ(e, nx) is a well defined norm on Rd. Burago in [6]
showed a much finer result, namely that if ρ is coarsely geodesic, then ρ(e, x)−‖x‖
is bounded when x ranges over Rd.When S is a nilpotent Lie group andH a lattice
in S, then Pansu proved in his thesis [27], that a similar result holds, namely that
ρ(e, x)/ |x| → 1 for some (unique only after a choice of a one-parameter group of
dilations) homogeneous quasi-norm |x| on the nilpotent Lie group. However, we
show in Section 8, that it is not true in general that ρ(e, x) − |x| stays bounded,
even for finitely generated nilpotent groups, thus answering a question of Burago
(see also Gromov [20]). Our main purpose here will be to extend Pansu’s result
to solvable Lie groups of polynomial growth.
As was first noticed by Guivarc’h in his thesis [21], when dealing with geometric
properties of solvable Lie groups, it is useful to consider the so-called nilshadow of
the group, a construction first introduced by Auslander and Green in [2]. Accord-
ing to this construction, it is possible to modify the Lie product on S in a natural
way, by so to speak removing the semisimple part of the action on the nilradical,
in order to turn S into a nilpotent Lie group, its nilshadow SN . The two Lie
groups have the same underlying manifold, which is diffeomorphic to Rn, only a
different Lie product. They also share the same Haar measure. This “semisimple
part” is a commutative relatively compact subgroup T (S) of automorphisms of S,
image of S under a homomorphism T : S → Aut(S). The new product g ∗ h is
defined as follows by twisting the old one g · h by means of T (S),
(1) g ∗ h := g · T (g−1)h
The two groups S and SN are easily seen to be quasi-isometric, and this is why any
locally compact group of polynomial growth G is quasi-isometric to some nilpotent
Lie group. In particular, their asymptotic cones are bi-Lipschitz. The asymptotic
cone of a nilpotent Lie group is a certain associated graded nilpotent Lie group
endowed with a left invariant geodesic distance (or Carnot group). The graded
group associated to SN will be called the graded nilshadow of S. Section 3 will be
devoted to the construction and basic properties of the nilshadow and its graded
group.
In this paper, we are dealing with a finer relation than quasi-isometry. We will
be interested in when do two left invariant (or periodic) distances are asymptotic2
(in the sense that
d1(e,g)
d2(e,g)
→ 1 when g → ∞). In particular, for every locally
compact group G with polynomial growth, we will identify its asymptotic cone
up to isometry and not only up to quasi-isometry or bi-Lipschitz equivalence (see
Corollary 1.9 below). One of our main results is the following:
2Yet a finer equivalence relation is (1, C)-quasi-isometry, i.e. being at bounded distance in
Gromov-Hausdorff metric; classifying periodic metrics up to this kind of equivalence is much
harder.
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 5
Theorem 1.4 (Main theorem). Let S be a simply connected solvable Lie group
with polynomial growth. Let ρ(x, y) be periodic pseudodistance on S which is in-
variant under a co-compact subgroup H of S (see Def. 4.1). On the manifold S,
one can put a new Lie group structure, which turns S into a stratified nilpotent Lie
group, the graded nilshadow of S, and a subFinsler metric d∞(x, y) on S which is
left-invariant for this new group structure such that
ρ(e, g)
d∞(e, g)
as g → ∞ in S. Moreover every automorphism in T (H) is an isometry of d∞.
The reader who wishes to see a simple illustration of this theorem can go directly
to subsection 8.1, where we have treated in detail a specific example of periodic
metric on the universal cover of the groups of motions of the plane.
The new stratified nilpotent Lie group structure on S given by the graded
nilshadow comes with a one-parameter family of so-called homogeneous dilations
{δt}t>0. It also comes with an extra group of automorphisms, namely the image
of H under the homomorphism T . This yields automorphisms of S for both the
original group structure on S and the new graded nilshadow group structure.
Moreover the dilations {δt}t>0 are automorphisms of the graded nilshadow and
they commute with T (H).
A subFinsler metric is a geodesic distance which is defined exactly as subRie-
mannian (or Carnot-Caratheodory) metrics on Carnot groups are defined (see e.g.
[25]), except that the norm used to compute the length of horizontal paths is not
necessarily a Euclidean norm. We refer the reader to Section 2.1 for a precise
definition.
In Theorem 1.4 the subFinsler metric d∞ is left invariant for the new Lie struc-
ture on S and it is also invariant under all automorphisms in T (H) (these form
a relatively compact commutative group of automorphisms). Moreover it satisfies
the following pleasing scaling law:
d∞(δt(x), δt(y)) = td∞(x, y) ∀t > 0.
The proof of Theorem 1.4 splits in two important steps. The first is a reduction
to the nilpotent case and is performed in Section 5. Using a double averaging
of the pseudodistance ρ over both K := T (H) and S/H, we construct an asso-
ciated pseudodistance, which is periodic for the nilshadow structure on S (i.e.
left-invariant by a co-compact subgroup for this structure), and we prove that it
is asymptotic to the original ρ. This reduces the problem to nilpotent Lie groups.
The key to this reduction is the following crucial observation: that unipotent au-
tomorphisms of S induce only a sublinear distortion, forcing the metric ρ to be
asymptotically invariant under T (H). The second step of the proof assumes that
S is nilpotent. This part is dealt with in Section 6 and is essentially a reformula-
tion of the arguments used by Pansu in [27].
6 EMMANUEL BREUILLARD
Incidently, we stress the fact that the generality in which Section 6 is treated
(i.e. for general coarsely geodesic, and even asymptotically geodesic periodic met-
rics) is necessary to prove even the most basic case (i.e. word metrics) of Theorem
1.4 for non-nilpotent solvable groups. So even if we were only interested in the
asymptotics of left invariant word metrics on a solvable Lie group of polynomial
growth S, we would still need to understand the asymptotics of arbitrary coarsely
geodesic left invariant distances (and not only word metrics!) on nilpotent Lie
groups. This is because the new pseudodistance obtained by averaging, see (30),
is no longer a word metric.
The subFinsler metric d∞(e, x) in the above theorem is induced by a certain
T (H)-invariant norm on the first stratum m1 of the graded nilshadow (which
is T (H)-invariant complementary subspace of the commutator subalgebra of the
nilshadow). This norm can be described rather explicitly as follows.
Recall that we have3 a canonical map π1 : S → m1, which is a group homomor-
phism for both the nilshadow and graded nilshadow structures. Then:
{v ∈ m1, ‖v‖∞ ≤ 1} =
CvxHull
π1(h)
ρ(e, h)
, h ∈ H\F
where the right hand side is the intersection over all compact subsets F of S of
the closed convex hull of the points π1(h)/ρ(e, h) for h ∈ H\F .
Figure 1 gives an illustration of the limit shape corresponding to the word
metric on the 3-dimensional discrete Heisenberg group with standard generators.
We explain in the Appendix how one can compute explicitly the geodesics of the
limit metric and the limit shape in this example.
When S itself is nilpotent to begin with and ρ is (in restriction to H) the
word metric associated to a symmetric compact generating set Ω of H (namely
ρΩ(e, h) := inf{n ∈ N;h ∈ Ω
n}), the above norm takes the following simple form:
(2) {v ∈ m1, ‖v‖∞ ≤ 1} = CvxHull {π1(ω), ω ∈ Ω}
For instance, in the special case when H is a torsion-free finitely generated nilpo-
tent group with generating set Ω and S is its Malcev closure, the unit ball
{v ∈ m1, ‖v‖∞ ≤ 1} is a polyhedron in m1. This was Pansu’s description in
[27].
However when S is not nilpotent, and is equipped with a word metric ρΩ on
a co-compact subgroup, then the determination of the limit shape, i.e. the de-
termination of the limit norm ‖ · ‖∞ on the abelianized nilshadow, is much more
difficult. Clearly ‖ · ‖∞ is K-invariant and it is a simple observation that the unit
ball for ‖ · ‖∞ is always contained in the convex hull of the K-orbit of π1(Ω).
3The subspace m1 can be identified with the abelianized nilshadow (or abelianized graded
nilshadow) by first identifying the nilshadow with its Lie algebra via the exponential map and
then projecting modulo the commutator subalgebra. The map does not depend on the choice
involved in the construction of the nilshadow. See also Remark 3.7.
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 7
Nevertheless the unit ball is typically smaller than that (unless Ω was K-invariant
to begin with).
In general it would be interesting to determine whether there exists a simple
description of the limit shape of an arbitrary word metric on a solvable Lie group
with polynomial growth. We refer the reader to Section 8 and Paragraph 8.2 for an
example of a class of word metrics on the universal cover of the group of motions
of the plane, for which we were able to compute the limit shape.
Another by-product of Theorem 1.4 is the following result.
Corollary 1.5 (Asymptotic shape). Let S be a simply connected solvable Lie
group with polynomial growth and H a co-compact subgroup. Let ρ be an H-
periodic pseudodistance on S. Then in the Hausdorff metric,
(Bρ(t)) = C,
where C is a T (H)-invariant compact neighborhood of the identity in S, Bρ(t) is
the ρ-ball of radius t in S and {δt}t>0 is a one-parameter group of dilations on S
(equipped with the graded nilshadow structure). Moreover, C = {g ∈ S, d∞(e, g) ≤ 1}
is the unit ball of the limit subFinsler metric from Theorem 1.4.
Proof. By Theorem 1.4, for every ε > 0 we have Bd∞(t−εt) ⊂ Bρ(t) ⊂ Bd∞(t+εt)
if t is large enough. Since δ 1
(Bd∞(t)) = C, for all t > 0, we are done. �
Combining this with Theorem 1.2, we also get the following corollary, of which
Theorem 1.1 is only a special case with ρ the word metric associated to the gen-
erating set Ω.
Corollary 1.6 (Volume asymptotics). Suppose that G is a locally compact group
with polynomial growth and ρ is a periodic pseudodistance on G. Let Bρ(t) be
the ρ-ball of radius t in G, i.e. Bρ(t) = {x ∈ G, ρ(e, x) ≤ t}, then there exists a
constant c(ρ) > 0 such that the following limit exists:
(3) lim
volG(Bρ(t))
td(G)
= c(ρ)
Here d(G) is the integer d(SN ), the so-called homogeneous dimension of the
nilshadow SN of a Lie shadow S of G (obtained by Theorem 1.2), and is given by
the Bass-Guivarc’h formula:
(4) d(SN ) =
dim(Ck(SN ))
where {Ck(SN )}k is the descending central series of SN .
The limit c(ρ) is equal to the volume volS(C) of the limit shape C from Corollary
1.5 once we make the right choice of Haar measure on a Lie shadow S of G. Let
us explain this choice. Recall that according to Theorem 1.2, G/K admits a co-
compact subgroup H/K which embeds co-compactly in S. Starting with a Haar
measure volG on G, we get a Haar measure on G/K after fixing the Haar measure
of K to be of total mass 1, and we may then choose a Haar measure on H/K so
that the compact quotient G/H has volume 1. Finally we choose the Haar measure
8 EMMANUEL BREUILLARD
Figure 1. The asymptotic shape of large balls in the Cayley graph
of the Heisenberg group H(Z) = 〈x, y|[x, [x, y]] = [y, [x, y]] = 1〉
viewed in exponential coordinates.
on S so that the other compact quotient S/(H/K) has volume 1. This gives the
desired Haar measure volS such that c(ρ) = volS(C).
Note that Haar measure on S is also invariant under the group of automor-
phisms T (S) and is thus left invariant for the nilshadow structure on S. It is also
left invariant for the graded nilshadow structure. In both exponential coordinates
of the first kind (on SN ) and of the second kind (as in Lemma 3.10), Haar measure
is just Lebesgue measure.
In the case of the discrete Heisenberg group of dimension 3 equipped with the
word metric given by the standard generators, it is possible to compute the con-
stant c(ρ) and the volume of the limit shape as shown in Figure 1. In this case the
volume is 31
(see the Appendix). The 5-dimensional Heisenberg group can also be
worked out and the volume of its limit shape (associated to the word metric given
by standard generators) is equal to 2009
21870
log 2
32805
. The fact that this number is
transcendental implies that the growth series of this group, i.e. the formal power
series
n≥0 |Bρ(n)|z
n is not algebraic in the sense that it is not a solution of a
polynomial equation with rational functions in C(z) as coefficients (see [33, Prop.
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 9
3.3.]). This was observed by Stoll in [33] by more direct combinatorial means.
Stoll also shows there the interesting fact that the growth series can be rational
for some other choices of generating sets in the 5-dimensional Heisenberg group.
So rationality of the growth series depends on the generating set.
Another interesting feature is asymptotic invariance:
Corollary 1.7 (Asymptotic invariance). Let S be a simply connected solvable
Lie group with polynomial growth and ρ a periodic pseudodistance on S. Let ∗ be
the new Lie product on S given by the nilshadow group structure (or the graded
nilshadow group structure). Then ρ(e, g ∗ x)/ρ(e, x) → 1 as x → ∞ for every
g ∈ S.
This follows immediately from Theorem 1.4, when ∗ is the graded nilshadow
product, and from Theorem 6.2 below in the case ∗ is the nilshadow group struc-
ture.
It is worth observing that we may not in general replace ∗ by the ordinary
product on S. Indeed, let for instance S = R ⋉ R2 be the universal cover of the
group of motions of the Euclidean plane, then S, like its nilshadow R3, admits
a lattice Γ ≃ Z3. The quotient S/Γ is diffeomorphic to the 3-torus R3/Z3 and
it is easy to find Riemannian metrics on this torus so that their lift to R3 is not
invariant under rotation around the z-axis. Hence this metric, viewed on the Lie
group S will not be asymptotically invariant under left translation by elements of
S. Nevertheless, if the metric is left-invariant and not just periodic, then we have
the following corollary of the proof of Theorem 1.4.
Corollary 1.8 (Left-invariant pseudodistances are asymptotic to subFinsler met-
rics). Let S be a simply connected solvable Lie group of polynomial growth and ρ
be a periodic pseudodistance on S which is invariant under all left-translations by
elements of S (e.g. a left-invariant coarsely geodesic metric on S). Then there
is a left-invariant subFinsler metric d on S which is asymptotic to ρ in the sense
ρ(e,g)
d(e,g)
→ 1 as g → ∞.
We already mentioned above that determining the exact limit shape of a word
metric on S is a difficult task. Consequently so is the task of telling when two
distinct word metrics are asymptotic. The above statement says that in any case
every word metric on S is asymptotic to some left-invariant subFinsler metric. So
the set of possible limit shapes is no richer for word metrics than for left-invariant
subFinsler metrics.
We note that in the case of nilpotent Lie groups (where K is trivial), Theorem
1.4 shows that every periodic metric is asymptotic to a left-invariant metric. It is
still an open problem to determine whether every coarsely geodesic periodic metric
is at a bounded distance from a left-invariant metric (this is Burago’s theorem in
n, more about it below).
Theorems 1.2 and 1.4 allow us to describe the asymptotic cone of (G, ρ) for any
periodic pseudodistance ρ on any locally compact group with polynomial growth.
10 EMMANUEL BREUILLARD
Corollary 1.9 (Asymptotic cone). Let G be a locally compact group with polyno-
mial growth and ρ a periodic pseudodistance on G. Then the sequence of pointed
metric spaces {(G, 1
ρ, e)}n≥1 converges in the Gromov-Hausdorff topology. The
limit is the metric space (N, d∞, e), where N is a graded simply connected nilpo-
tent Lie group and d∞ a left invariant subFinsler metric on N . Moreover the Lie
group N is (up to isomorphism) independent of ρ. The space (N, d∞) is isometric
to “the asymptotic cone” associated to (G, ρ). This asymptotic cone is independent
of the choice of ultrafilter used to define it.
This corollary is a generalization of Pansu’s theorem ((10) in [27]). We refer
the reader to the book [18] for the definitions of the asymptotic cone and the
Gromov-Hausdorff convergence. We discuss in Section 8 the speed of convergence
(in the Gromov-Hausdorff metric) in this theorem and its corollaries about volume
growth. In particular there is a major difference between the discrete nilpotent case
and the solvable non nilpotent case. In the former, one can find a polynomial rate
of convergence [9], while in the latter no such rate exist in general (see Theorem
8.1).
1.3. Folner sets and ergodic theory. A consequence of Corollary 1.6 is that
sequences of balls with radius going to infinity are Folner sequences, namely:
Corollary 1.10. Let G be a locally compact group with polynomial growth and
ρ a periodic pseudodistance on G. Let Bρ(t) be the ρ-ball of radius t in G. Then
{Bρ(t)}t>0 form a Folner family of subsets of G namely, for any compact set F
in G, we have (∆ denotes the symmetric difference)
(5) lim
volG(FBρ(t)∆Bρ(t))
volG(Bρ(t))
Proof. Indeed FBρ(t)∆Bρ(t) ⊂ Bρ(t + c)\Bρ(t) for some c > depending on F .
Hence (5) follows from (3). �
This settles the so-called “localization problem” of Greenleaf for locally compact
groups of polynomial growth (see [16]), i.e. determining whether the powers of
a compact generating set {Ωn}n form a Folner sequence. At the same time it
implies that the ergodic theorem for G-actions holds along any sequence of balls
with radius going to infinity.
Theorem 1.11. (Ergodic Theorem) Let be given a locally compact group G with
polynomial growth together with a measurable G-space X endowed with a G-
invariant ergodic probability measure m. Let ρ be a periodic pseudodistance on
G and Bρ(t) the ρ-ball of radius t in G. Then for any p, 1 ≤ p < ∞, and any
function f ∈ Lp(X,m) we have
volG(Bρ(t))
Bρ(t)
f(gx)dg =
for m-almost every x ∈ X and also in Lp(X,m).
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 11
In fact, Corollary 1.10 above, was the “missing block” in the proof of the ergodic
theorem on groups of polynomial growth. So far and to my knowledge, Corollary
1.10 and Theorem 1.11 were known only along some subsequence of balls {Bρ(tn)}n
chosen so that (5) holds (see for instance [10] or [34]). This issue was drawn to
my attention by A. Nevo and was my initial motivation for the present work. We
refer the reader to the A. Nevo’s survey paper [26] Section 5.
It later turned out that the mere fact that balls are Folner in a given polynomial
growth locally compact group can also be derived from the fact these groups are
doubling metric spaces (which is an easier result than the precise asymptotics
vol(Ωn) ∼ cΩn
d(G) proved in this paper and only requires lower and upper bounds
of the form c1n
d(G) ≤ vol(Ωn) ≤ c2n
d(G)). This was observed by R. Tessera
[35] who rediscovered a cute argument of Colding and Minicozzi [11, Lemma 3.3.]
showing that the volume of spheres Ωn+1 \ Ωn is at most some O(n−δ) times the
volume of the ball Ωn, where δ > 0 is a positive constant depending only on the
doubling constant the word metric induced by Ω in G.
In [9], we give a better upper bound (which depends only on the nilpotency class
and not on the doubling constant) for the volume of spheres in the case of finitely
generated nilpotent groups. This is done by showing the following error term in
the asymptotics of the volume of balls: we have vol(Ωn) = cΩn
d(G)+O(nd(G)−αr ),
where αr > 0 depends only on the nilpotency class r of G. We refer the reader to
Section 8 and to the preprint [9] for more information on this. We only note here
that although the above Colding-Minicozzi-Tessera upper bound on the volume of
spheres holds generally for all locally compact groups G with polynomial growth,
unless G is nilpotent, there is no error term in general in the asymptotics of the
volume of balls. An example with arbitrarily small speed is given in §8.1.
1.4. A conjecture of Burago and Margulis. In [7] D. Burago and G. Margulis
conjectured that any two word metrics on a finitely generated group which are
asymptotic (in the sense that
ρ1(e,γ)
ρ2(e,γ)
tends to 1 at infinity) must be at a bounded
distance from one another (in the sense that |ρ1(e, γ) − ρ2(e, γ)| = O(1)). This
holds for abelian groups. An analogous result was proved by Abels and Margulis
for word metrics on reductive groups [1]. S. Krat [23] established this property
for word metrics on the Heisenberg group H3(Z). However using Theorem 1.4
(which in this particular case of finitely generated nilpotent groups is just Pansu’s
theorem [27]) we will show in Section 8.3, that there are counter-examples and
exhibit two word metrics on H3(Z) × Z which are asymptotic and yet are not at
a bounded distance. For more on this counter-example, and how to adequately
modify the conjecture of Burago and Margulis, we refer the interested reader to
1.5. Organization of the paper. Sections 2-4 are devoted to preliminaries. In
Section 2 we present the basic nilpotent theory as can be found in Guivarc’h’s
thesis [21]. In particular, a full proof of the Bass-Guivarc’h formula is given. In
Section 3, we recall the construction of the nilshadow of a solvable Lie group.
12 EMMANUEL BREUILLARD
In Section 4 we set up the axioms and basic properties of the (pseudo)distance
functions that are studied in this paper.
Sections 5-7 contain the core of the proof of the main theorems. In Section 5, we
assume that G is a simply connected solvable Lie group and reduce the problem to
the nilpotent case. In Section 6, we assume that G is a simply connected nilpotent
Lie group and prove Theorem 1.4 in this case following the strategy used by Pansu
in [27]. In Section 7, we prove Theorem 1.2 for general locally compact groups
and reduce the proof of the results of the introduction to the Lie case.
In the last section we make further comments about the speed of convergence.
In particular we give examples answering negatively the aforementioned question
of Burago and Margulis.
The Appendix is devoted to the discrete Heisenberg groups of dimension 3 and
5. We compute their limit balls, explain Figure 1, and recover the main result of
Stoll [33].
The reader who is mainly interested in the nilpotent group case can read directly
Section 6 while keeping an eye on Sections 2 and 4 for background notations and
elementary facts.
Finally, let us mention that the results and methods of this paper were largely
inspired by the works of Y. Guivarc’h [21] and P. Pansu [27].
1.6. Nota Bene. A version of this article circulated since 2007. The present ver-
sion contains essentially the same material, only the exposition has been improved
and several somewhat sketchy arguments have been replaced by full fledged proofs
(in particular in Sections 3 and 7). This delay is due to the fact that I was plan-
ning for a long time to improve Section 6 and show an error term in the volume
asymptotics of balls in nilpotent groups. E. Le Donne and I recently managed to
achieve this and it has now become an independent joint paper [9].
2. Quasi-norms and the geometry of nilpotent Lie groups
In this section, we review the necessary background material on nilpotent Lie
groups. In paragraph 2.4, we give some crucial properties of homogeneous quasi
norms and reproduce some lemmas originally due to Y. Guivarc’h which will be
used in the sequel. Meanwhile, we prove the Bass-Guivarc’h formula for the de-
gree of polynomial growth of nilpotent Lie groups, following Guivarc’h’s original
argument.
2.1. Carnot-Caratheodory metrics. Let G be a connected Lie group with Lie
algebra g and let m1 be a vector subspace of g. We denote by ‖·‖ a norm on m1.
We now recall the definition of a left-invariant Carnot-Carathéodory metric also
called subFinsler metric on G. Let x, y ∈ G. We consider all possible piecewise
smooth paths ξ : [0, 1] → G going from ξ(0) = x to ξ(1) = y. Let ξ′(u) be the
tangent vector which is pulled back to the identity by a left translation, i.e.
= ξ(u) · ξ′(u)
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 13
where ξ′(u) ∈ g and the notation ξ(u) · ξ′(u) means the image of ξ′(u) under
the differential at the identity of the left translation by the group element ξ(u).
We say that the path ξ is horizontal if the vector ξ′(u) belongs to m1 for all
u ∈ [0, 1]. We denote by H the set of piecewise smooth horizontal paths. The
Carnot-Carathéodory metric associated to the norm ‖·‖ is defined by:
d(x, y) = inf{
∥∥ξ′(u)
∥∥ du, ξ ∈ H, ξ(0) = x, ξ(1) = y}
where the infimum is taken over all piecewise smooth paths ξ : [0, 1] → N with
ξ(0) = x, ξ(1) = y that are horizontal in the sense that ξ′(u) ∈ m1 for all u. If
‖ · ‖ is a Euclidean norm, the metric d(x, y) is also called subRiemannian. In this
paper however the norm ‖ · ‖ will typically not be Euclidean (it can be polyhedral
like in the case of word metrics on finitely generated nilpotent groups) and d(x, y)
will only be subFinsler. If m1 = g, and ‖·‖ is a Euclidean (resp. arbitrary) norm
on g, then d is simply the usual left-invariant Riemannian (resp. Finsler) metric
associated to ‖·‖ .
Chow’s theorem (e.g. see [19] or [25]) tells us that d(x, y) is finite for all x and
y in G if and only if the vector subspace m1, together with all brackets of elements
of m1, generates the full Lie algebra g. If this condition is satisfied, then d is a
distance on G which induces the original topology of G.
In this paper, we will only be concerned with Carnot-Caratheodory metrics on
a simply connected nilpotent Lie group N . In the sequel, whenever we speak of a
Carnot-Carathéodory metric on N, we mean one that is associated to a norm ‖·‖
on a subspace m1 such that n = m1 ⊕ [n, n] where n = Lie(N). It is easy to check
that any such m1 generates the Lie algebra n.
Remark 2.1. Let us observe here that for such a metric d on N, we have the
following description of the unit ball for ‖·‖
{v ∈ m1, ‖v‖ ≤ 1} =
π1(x)
d(e, x)
, x ∈ N\{e}
where π1 is the linear projection from n (identified with N via exp) to m1 with
kernel [n, n]. Indeed, π1 gives rise to a homomorphism from N to the vector space
m1. And if ξ(u) is a horizontal path from e to x, then applying π1 to (6) we
get d
π1(ξ(u)) = ξ
′(u), hence π1(x) =
ξ′(u)du. Hence ‖π1(x)‖ ≤ d(e, x) with
equality if x ∈ m1.
2.2. Dilations on a nilpotent Lie group and the associated graded group.
We now focus on the case of simply connected nilpotent Lie groups. Let N be
such a group with Lie algebra n and nilpotency class r. For background about
analysis on such groups, we refer the reader to the book [12]. The exponential
map is a diffeomorphism between n and N . Most of the time, if x ∈ n, we will
abuse notation and denote the group element exp(x) simply by x. We denote
by {Cp(n)}p the central descending series for n, i.e. C
p+1(n) = [n, Cp(n)] with
C0(n) = n and Cr(n) = {0}.
14 EMMANUEL BREUILLARD
Let (mp)p≥1 be a collection of vector subspaces of n such that for each p ≥ 1,
(7) Cp−1(n) = Cp(n)⊕mp.
Then n = ⊕p≥1mp and in this decomposition, any element x in n (or N by abuse
of notation) will be written in the form
πp(x)
where πp(x) is the linear projection onto mp.
To such a decomposition is associated a one-parameter group of dilations (δt)t>0.
These are the linear endomorphisms of n defined by
δt(x) = t
for any x ∈ mp and for every p. Conversely, the one-parameter group (δt)t≥0
determines the (mp)p≥1’s since they appear as eigenspaces of each δt, t 6= 1. The
dilations δt do not preserve a priori the Lie bracket on n. This is the case if and
only if
(8) [mp,mq] ⊆ mp+q
for every p and q (where [mp,mq] is the subspace spanned by all commutators of
elements of mp with elements of mq). If (8) holds, we say that the (mp)p≥1 form a
stratification of the Lie algebra n, and that n is a stratified (or homogeneous) Lie
algebra. It is an exercise to check that (8) is equivalent to require [m1,mp] = mp+1
for all p.
If (8) does not hold, we can however consider a new Lie algebra structure on
the real vector space n by defining the new Lie bracket as [x, y]∞ = πp+q([x, y])
if x ∈ mp and y ∈ mq. This new Lie algebra n∞ is stratified and has the same
underlying vector space as n. We denote by N∞ the associated simply connected
Lie group. Moreover the (δt)t>0 form a one-parameter group of automorphisms of
n∞. In fact the original Lie bracket [x, y] on n can be deformed continuously to
[x, y]∞ through a continuous family of Lie algebra structures by setting
(9) [x, y]t = δ 1
([δtx, δty])
and letting t → +∞. Note that conversely, if the δt’s are automorphisms of n,
then [x, y] = πp+q([x, y]) for all x ∈ mp and y ∈ mq, and n = n∞.
The graded Lie algebra associated to n is by definition
gr(n) =
Cp(n)/Cp+1(n)
endowed with the Lie bracket induced from that of n. The quotient map mp →
Cp(n)/Cp+1(n) gives rise to a linear isomorphism between n and gr(n), which is
a Lie algebra isomorphism between the new Lie algebra structure n∞ and gr(n).
Hence stratified Lie algebra structures induced by a choice of supplementary sub-
spaces (mp)p≥1 as in (7) are all isomorphic to gr(n).
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 15
On N∞ the left-invariant subFinsler metrics d∞ associated to a choice of norm
on m1 are of special interest. The one-parameter group of dilations {δt}t is an
automorphism of N∞ and that
(10) d∞(δtx, δty) = td∞(x, y)
for any x, y ∈ N∞. The metric space (N∞, d∞) is called a Carnot group.
If on the other hand the simply connected nilpotent Lie groupN is not stratified,
then the group of dilations (δt)t associated to a choice of supplementary vector
subspaces mi’s as in (7) will not consist of automorphisms of N and the relation
(10) will not hold.
Note also that if we are given two different choices of supplementary subspaces
mi’s and m
i’s as in (7), then the left-invariant Carnot-Caratheodory metrics on
the corresponding stratified Lie groups are isometric if and only if (m1, ‖·‖) and
(m′1, ‖·‖
) are isometric (a linear isomorphism from m1 to m
1 that sends ‖·‖ to
extends to an isometry of the two Carnot groups).
2.3. The Campbell-Hausdorff formula. The exponential map exp : n → N
is a diffeomorphism. In the sequel, we will often abuse notation and identify N
and n without further notice. In particular, for two elements x and y of n (or
N equivalently) xy will denote their product in N , while x + y denotes the sum
in n. Let (δt)t be a one-parameter group of dilations associated to a choice of
supplementary subspaces mi’s as in (7). We denote the corresponding stratified
Lie algebra by n∞ as above and the Lie group by N∞. The product on N∞ is
denoted by x ∗ y. On N∞ the dilations (δt)t are automorphisms.
The Campbell-Hausdorff formula (see [12]) allows to give a more precise form of
the product in N. Let (ei)1≤i≤d be a basis of n adapted to the decomposition into
mi’s, that is mi = span{ej , ej ∈ mi}. Let x = x1e1 + ...+ xded the corresponding
decomposition of an element x ∈ n. Then define the degree di = deg(ei) to be the
largest j such that ei ∈ C
j−1(n). If α = (α1, ..., αd) ∈ N
d is a multi-index, then let
dα = deg(e1)α1 + ...+ deg(ed)αd.
The Campbell-Hausdorff formula yields
(11) (xy)i = xi + yi +
Cα,βx
where Cα,β are real constants and the sum is over all multi-indices α and β such
that dα + dβ ≤ deg(ei), dα ≥ 1 and dβ ≥ 1.
From (9), it is easy to give the form of the associated stratified Lie group law:
(12) (x ∗ y)i = xi + yi +
Cα,βx
where the sum is restricted to those α’s and β’s such that dα + dβ = deg(ei),
dα ≥ 1 and dβ ≥ 1.
2.4. Homogeneous quasi-norms and Guivarc’h’s theorem on polynomial
growth. Let n be a finite dimensional real nilpotent Lie algebra and consider a
decomposition
n = m1 ⊕ ...⊕mr
16 EMMANUEL BREUILLARD
by supplementary vector subspaces as in (7). Let (δt)t>0 be the one parameter
group of dilations associated to this decomposition, that is δt(x) = t
ix if x ∈ mi.
We now introduce the following definition.
Definition 2.2 (Homogeneous quasi-norm). A continuous function | · | : n → R+
is called a homogeneous quasi-norm associated to the dilations (δt)t, if it satisfies
the following properties:
(i) |x| = 0 ⇔ x = 0.
(ii) |δt(x)| = t|x| for all t > 0.
Example 2.3. (1) Quasi-norms of supremum type, i.e. |x| = maxp ‖πp(x)‖
where ‖·‖p are ordinary norms on the vector space mp and πp is the projection on
mp as above.
(2) |x| = d∞(e, x), where d∞ is a Carnot-Carathéodory metric on a stratified
nilpotent Lie group (as the relation (10) shows).
Clearly, a quasi-norm is determined by its sphere of radius 1 and two quasi-
norms (which are homogeneous with respect to the same group of dilations) are
always equivalent in the sense that
|·|1 ≤ |·|2 ≤ c |·|1
for some constant c > 0 (indeed, by continuity, | · |2 admits a maximum on the
“sphere” {|x|1 = 1}). If the two quasi-norms are homogeneous with respect to
two distinct semi-groups of dilations, then the inequalities (13) continue to hold
outside a neighborhood of 0, but may fail near 0.
Homogeneous quasi-norms satisfy the following properties:
Proposition 2.4. Let | · | be a homogeneous quasi-norm on n, then there are
constants C,C1, C2 > 0 such that
(a) |xi| ≤ C · |x|
deg(ei) if x = x1e1 + ...+ xnen in an adapted basis (ei)i.
(b) |x−1| ≤ C · |x|.
(c) |x+ y| ≤ C · (|x|+ |y|)
(d) |xy| ≤ C1(|x|+ |y|) + C2.
Properties (a), (b) and (c) are straightforward from the fact that |x| = maxp ‖πp(x)‖
is a homogeneous quasi-norm and from (13). Property (d) justifies the term “quasi-
norm” and follows from Lemma 2.5 below. It can be a problem that the constant
C1 in (d) may not be equal to 1. In fact, this is why we use the word quasi-norm
instead of just norm, because we do not require the triangle inequality axiom to
hold. However the following lemma of Guivarc’h is often a good enough remedy
to this situation. Let ‖·‖p be an arbitrary norm on the vector space mp.
Lemma 2.5. (Guivarc’h, [21] lemme II.1) Let ε > 0. Up to rescaling each ‖·‖p
into a proportional norm λp ‖·‖p (λp > 0) if necessary, the quasi-norm |x| =
maxp ‖πp(x)‖
satisfies
(14) |xy| ≤ |x|+ |y|+ ε
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 17
for all x, y ∈ N . If N is stratified with respect to (δt)t we can take ε = 0.
This lemma is crucial also for computing the coarse asymptotics of volume
growth. For the reader’s convenience, we reproduce here Guivarc’h’s argument,
which is based on the Campbell-Hausdorff formula (11).
Proof. We fix λ1 = 1 and we are going to give a condition on the λi’s so that (14)
holds. The λi’s will be taken to be smaller and smaller as i increases. We set
|x| = maxp ‖πp(x)‖
and let |x|λ = maxp ‖λpπp(x)‖
for any r-tuple of λi’s.
We want that for any index p ≤ r,
(15) λp ‖πp(xy)‖p ≤ (|x|λ + |y|λ + ε)
By (11) we have πp(xy) = πp(x) + πp(y) + Pp(x, y) where Pp is a polynomial map
into mp depending only on the πi(x) and πi(y) with i ≤ p− 1 such that
‖Pp(x, y)‖p ≤ Cp ·
l,m≥1,l+m≤p
Mp−1(x)
lMp−1(y)
where Mk(x) := maxi≤k ‖πi(x)‖
i and Cp > 0 is a constant depending on Pp and
on the norms ‖·‖i’s. Since ε > 0, when expanding the right hand side of (15) all
terms of the form |x|lλ|y|
λ with l +m ≤ p appear with some positive coefficient,
say εl,m. The terms |x|
and |y|
appear with coefficient 1 and cause no trouble
since we always have λp ‖πp(x)‖p ≤ |x|
λ and λp ‖πp(y)‖p ≤ |y|
λ. Therefore, for
(15) to hold, it is sufficient that
λpCpMp−1(x)
lMp−1(y)
m ≤ εl,m|x|
for all remaining l and m. However, clearly Mk(x) ≤ Λk · |x|λ where Λk :=
maxi≤k{1/λ
i } ≥ 1. Hence a sufficient condition for (15) to hold is
where ε = min εl,m. Since Λp−1 depends only on the first p−1 values of the λi’s, it
is obvious that such a set of conditions can be fulfilled by a suitable r-tuple λ. �
Remark 2.6. The constant C2 in Property (d) above can be taken to be 0 when N
is stratified with respect to the mi’s (i.e. the δt’s are automorphisms), as is easily
seen after changing x and y into their image under δt. And conversely, if C2 = 0
for some δt-homogeneous quasi-norm on N, then N admits a stratification. Indeed,
from (11) and (12), we see that if the δt’s are not automorphisms, then one can
find x, y ∈ N such that, when t is small enough, |δt(xy) − δt(x)δt(y)| ≥ ct
(r−1)/r
for some c > 0. However, combining Properties (c) and Property (d) with C2 = 0
above we must have |δt(xy)− δt(x)δt(y)| = O(t) near t = 0. A contradiction.
Guivarc’h’s lemma enables us to show:
Theorem 2.7. (Guivarc’h ibid.) Let Ω be a compact neighborhood of the identity
in a simply connected nilpotent Lie group N and ρΩ(x, y) = inf{n ≥ 1, x
−1y ∈ Ωn}.
18 EMMANUEL BREUILLARD
Then for any homogeneous quasi-norm | · | on N, there is a constant C > 0 such
|x| ≤ ρΩ(e, x) ≤ C|x|+ C
Proof. Since any two homogeneous quasi-norms (w.r.t the same one-parameter
group of dilations) are equivalent, it is enough to do the proof for one of them,
so we consider the quasi-norm obtained in Lemma 2.5 with the extra property
(14). The lower bound in (16) is a direct consequence of (14) and one can take
there C to be max{|x|, x ∈ Ω} + ε. For the upper bound, it suffices to show
that there is C ∈ N such that for all n ∈ N, if |x| ≤ n then x ∈ ΩCn. To
achieve this, we proceed by induction of the nilpotency length of N. The result
is clear when N is abelian. Otherwise, by induction we obtain C0 ∈ N such that
x = ω1 · ... · ωC0n · z where ωi ∈ Ω and z ∈ C
r−1(N) whenever |x| ≤ n. Hence
|z| ≤ |x|+C0n ·max |ω
i |+ εC0 · n ≤ C1n for some other constant C1 ∈ N. So we
have reduced the problem to x = z ∈ mr = C
r−1(N) which is central in N. We
have z = zn
1 where |z1| = |z|/n ≤ C1. Since Ω is a neighborhood of the identity
in N, the set U of all products of at most dim(mr) simple commutators of length
r of elements in Ω is a neighborhood of the identity in Cr−1(N) (e.g. see [19],
p113). It follows that there is a constant C2 ∈ N such that z1 is in U
C2 , hence the
product of at most C2 dim(mr) simple commutators. Then we are done because z
itself will be equal to the same product of commutators where each letter xi ∈ Ω
is replaced by xni . This last fact follows from the following lemma:
Lemma 2.8. Let G be a nilpotent group of nilpotency class r and n1, ..., nr be
positive integers. Then for any x1, ..., xr ∈ G
1 , [x
2 , [..., x
r ]...] = [x1, [x2, [..., xr]...]
n1·...·nr
To prove the lemma it suffices to use induction and the following obvious fact:
if [x, y] commutes to x and y then [xn, y] = [x, y]n. �
Finally, we obtain:
Corollary 2.9. Let Ω be a compact neighborhood of the identity in N. Then there
are positive constants C1 and C2 such that for all n ∈ N,
d ≤ volN (Ω
n) ≤ C2n
where d is given by the Bass-Guivarc’h formula:
(17) d =
i · dimmi
Proof. By Theorem 2.7, it is enough to estimate the volume of the quasi-norm
balls. By homogeneity of the quasi-norm, we have volN{x, |x| ≤ t} = t
dvolN{x, |x| ≤
1}. �
Remark 2.10. The use of Malcev’s embedding theorem allows, as Guivarc’h ob-
served, to deduce immediately that the analogous result holds for virtually nilpotent
finitely generated groups. This fact that was also proven independently by H. Bass
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 19
[3] by a direct combinatorial argument. See also Tits’ appendix to Gromov’s pa-
per [17]. In fact Guivarc’h’s Theorem 2.7 seems to have been rediscovered several
times in the past 40 years, including by Pansu in his thesis [27], the latest example
of that being [22].
3. The nilshadow
The goal of this section is to introduce the nilshadow of a simply connected
solvable Lie group G. We will assume that G has polynomial growth, although
this last assumption is not necessary for almost everything we do in this section.
The only statement which will be used afterwards in the paper (in Section 5) is
Lemma 3.12 below. The reader familiar with the nilshadow can jump directly to
the statement of this lemma and skip the forthcoming discussion.
3.1. Construction of the nilshadow. The nilshadow of G is a simply connected
nilpotent Lie group GN , which is associated to G in a natural way. This notion
was first introduced by Auslander and Green in [2] in their study of flows on
solvmanifolds. They defined it as the unipotent radical of a semi-simple splitting
of G. However, we are going to follow a different approach for its construction by
working first at the Lie algebra level. We refer the reader to the book [13] where
this approach is taken up.
Let g be a solvable real Lie algebra and n the nilradical of g.We have [g, g] ⊂ n.
If x ∈ g, we write ad(x) = ads(x) + adn(x) the Jordan decomposition of ad(x) in
GL(g). Since ad(x) ∈ Der(g), the space of derivations of g, and Der(g) is the Lie
algebra of the algebraic group Aut(g), the Jordan components ads(x) and adn(x)
also belong to Der(g). Moreover, for each x ∈ g, ads(x) sends g into n (because
so does ad(x) and ads(x) is a polynomial in ad(x)). Let h be a Cartan subalgebra
of g, namely a nilpotent self-normalizing subalgebra. Recall that the image of
a Cartan subalgebra by a surjective Lie algebra homomorphism is again a Car-
tan subalgebra. Now since g/n is abelian, it follows that h maps onto g/n, i.e.
h+ n = g. Moreover ads(x)|h = 0 if x ∈ h, because h is nilpotent.
Now pick any real vector subspace v of h in direct sum with n. Then the
following two conditions hold:
(i) v⊕ n = g .
(ii) ads(x)(y) = 0 for all x, y ∈ v.
From (i) and (ii), it follows easily that ads(x) commutes with ad(y), ads(y) and
adn(y), for all x, y in v. We have:
Lemma 3.1. The map v → Der(g) defined by x 7→ ads(x) is a Lie algebra
homomorphism.
Proof. First let us check that this map is linear. Let x, y ∈ v. By the above
ads(y) and ads(x) commute with each other (hence their sum is semi-simple) and
commute with adn(x)+adn(y). From the uniqueness of the Jordan decomposition
20 EMMANUEL BREUILLARD
it remains to check that adn(x)+adn(y) is nilpotent if x, y in v. To see this, apply
the following obvious remark twice to a = adn(x) and V = ad(n) first and then to
a = adn(y) and V = span{adn(x), ad((ad(y))
nx), n ≥ 1} : Let V be a nilpotent
subspace of GL(g) and a ∈ GL(g) nilpotent, i.e. V n = 0 and am = 0 for some
n,m ∈ N and assume [a, V ] ⊂ V. Then (a+ V )nm = 0.
The fact that this map is a Lie algebra homomorphism follows easily from the
fact that all ads(x), x ∈ v commute with one another and with [g, g] ⊂ n.
We define a new Lie bracket on g by setting:
(18) [x, y]N = [x, y]− ads(xv)(y) + ads(yv)(x)
where xv is the linear projection of x on v according to the direct sum v⊕n = g. The
Jacobi identity is checked by a straightforward computation where the following
fact is needed: ads (ads(x)(y)) = 0 for all x, y ∈ g. This holds because, as we just
saw, ads(x)(g) ⊂ n for all x ∈ g, and ads(a) = 0 if a ∈ n.
Definition 3.2. Let gN be the vector space g endowed with the new Lie algebra
structure [·, ·]N given by (18). The nilshadow GN of G is defined to be the simply
connected Lie group with Lie algebra gN .
It is easy to check that gN is a nilpotent Lie algebra. To see this, note first that
[gN , gN ]N ⊂ n, and if x ∈ gN and y ∈ n then [x, y]N = (adn(xv) + ad(xn))(y).
However, adn(xv) + ad(xn) is a nilpotent endomorphism of n as follows from the
same remark used in the proof of Lemma 3.1. Hence gN is a nilpotent.
The nilshadow Lie product on GN will be denoted by ∗ in order to distinguish
it from the original Lie product on G. In the sequel, we will often identify G (resp.
GN ) with its Lie algebra g (resp. gN ) via their respective exponential map. Since
the underlying space of gN was g itself, this gives an identification (although not
a group isomorphism) between G and GN . Then the nilshadow Lie product can
be expressed in terms of the original product as follows:
g ∗ h = g · (T (g−1)h)
Here T is the Lie group homomorphism G → Aut(G) induced by the above
choice of supplementary subspace v as follows.
(19) T (ea)(eb) = exp(eads(av)b) ∀a, b ∈ g.
In other words, T is the unique Lie group homomorphism whose differential
at the identity is the Lie algebra homomorphism deT : g → Der(g) given by
deT (a)(b) = ads(av)b, that is the composition of the map v → Der(g) from
Lemma 3.1 with the linear projection g → g/n ≃ v.
It is easy to check that this definition of the new product is compatible with
the definition of the new Lie bracket.
It can also be checked that two choices of supplementary spaces v as above yield
isomorphic Lie structures (see [13, Chap. III]). Hence by abuse of language, we
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 21
speak of the nilshadow of g, when we mean the Lie structure on G induced by a
choice of v as above.
The following example shows several of the features of a typical solvable Lie
group of polynomial growth.
Example 3.3 (Nilshadow of a semi-direct product). Let G = R ⋉φ R
n where
φt ∈ GLn(R) is some one parameter subgroup given by φt = exp(tA) = ktut where
A is some matrix in Mn(R) and A = As +Au is its Jordan decomposition, giving
rise to kt = exp(tAs) and ut = exp(tAu). The group G is diffeomorphic to R
hence simply connected. If all eigenvalues of As are purely imaginary, then G has
polynomial growth. However G is not nilpotent unless As = 0. So let us assume
that neither As nor Au is zero. Then the nilshadow GN is the semi-direct product
R⋉u R
n where ut is the unipotent part of φt.
It is easy to compute the homogeneous dimension of G (or GN) in terms of the
dimension of the Jordan blocs of Au. If nk is the number of Jordan blocks of Au
of size k, then
d(G) = 1 +
k(k + 1)
3.2. Basic properties of the nilshadow. We now list in the form of a few
lemmas some basic properties of the nilshadow.
Lemma 3.4. The image of T : G → Aut(G) is abelian and relatively compact.
Moreover T (T (g)h) = T (h) for any g, h ∈ G.
Proof. Since G has polynomial growth it is of type (R) by Guivarc’h’s theorem.
Hence all ads(x) have purely imaginary eigenvalues. It follows that K is compact.
Since T factors through the nilradical, its image is abelian. The last equality
follows from (19) and the fact that ∀x, y ∈ g, ads(ads(x)(y)) = 0. �
Lemma 3.5. T (G) also belongs to Aut(GN ) and T is a group homomorphism
GN → Aut(GN ).
Proof. The first assertion follows from (19) and the fact that deT is a derivation
of gN as one can check from (18) and the fact that ∀x, y ∈ g, ads(ads(x)(y)) = 0.
The second assertion then follows from Lemma 3.4. �
We denote by K the closure of T (G) in Aut(G) = Aut(g).
Lemma 3.6 (K-action on gN ). K preserves v and acts trivially on it. It also
preserves the ideals n and the central descending series {Ci(gN )}i of gN .
Proof. It suffices to check that ads(v) preserves n and C
i(gN ). It preserves n
because ad(x) preserves n for all x ∈ g. It preserves Ci(gN ) because it acts as a
derivation of gN as we have already checked in the proof of Lemma 3.5. �
Remark 3.7 (Well-definedness of π1). It is also easy to check from the definition
of the nilshadow bracket that the commutator subalgebra [gN , gN ] and in fact each
term of the central descending series Ci(gN ) is an ideal in g and does not depend
on the choice of supplementary subspace v used to defined the nilshadow bracket.
22 EMMANUEL BREUILLARD
In particular the projection map π1 : gN → gN/[gN , gN ] is a well defined linear
map on g = gN (i.e. independently of the choice involved in the construction of
the nilshadow Lie bracket).
Lemma 3.8 (Exponential map). The respective exponential maps exp : g → G
and expN : gN → GN coincide on n and on v.
Proof. Since the two Lie products coincide on N = exp(n), so do their exponential
map. For the second assertion, note that T (e−tv)v = v for every v ∈ v because
ads(x)(y) = 0 for all x, y ∈ ν. It follows that {e
tv}t is a one-parameter subgroup
for both Lie structures, hence it is equal to {expN (tv)}t. �
Remark 3.9 (Surjectivity of the exponential map). The exponential map is not
always a diffeomorphism, as the example of the universal cover Ẽ of the group E
of motions of the plane shows (indeed any 1-parameter subgroup of E is either a
translation subgroup or a rotation subgroup, but the rotation subgroup is compact
hence a torus, so its lift will contain the (discrete) center of E, hence will miss
every lift of a non trivial translation). In fact, it is easy to see that if g is the Lie
algebra of a solvable (non-nilpotent) Lie group of polynomial growth, then g maps
surjectively on the Lie algebra of E. Hence, for a simply connected solvable and
non-nilpotent Lie group of polynomial growth, the exponential map is never onto.
Nevertheless its image is easily seen to be dense.
However, exponential coordinates of the second kind behave nicely. Note that
[gN , gN ] ⊂ n.
Lemma 3.10 (Exponential coordinates of the second kind). Let {Ci(gN )}i≥0
be the central descending series of gN (with C
1(gN ) = [gN , gN ]) and pick linear
subspaces mi in gN such that C
i(gN ) = mi ⊕ C
i−1(gN ) for i ≥ 2. Let ℓ be a
supplementary subspace of C1(gN ) in n. Define exponential coordinates of the
second kind by setting
mr ⊕ ...⊕m2 ⊕ ℓ⊕ v → G
(ξr, ..., ξ1, v) 7→ expN (ξr) ∗ . . . ∗ expN (ξ1) ∗ expN (v)
This map is a diffeomorphism. Moreover expN (ξr) ∗ . . . ∗ expN (ξ1) ∗ expN (v) =
eξr · ... · eξ1 · ev for all choices of v ∈ v and ξi ∈ mi.
Proof. By Lemma 3.8 the exponential maps of G and GN coincide on n and on v.
Moreover g ∗ h = g · h whenever g belongs to the nilradical exp(n) of G. Hence
expN (ξr)∗. . .∗expN (ξ1)∗expN (v) = expN (ξr)·. . .·expN (ξ1)·expN (v) = e
ξr ·...·eξ1 ·ev.
The restriction of the map to n is a diffeomorphism onto exp(n), because this map
and its inverse are explicit polynomial maps (the ξi’s are coordinates of the second
kind, see the book [12]). Now the map n ⊕ v → G sending (n, v) to en · ev is a
diffeomorphism, because G is simply connected and hence the quotient group
G/ exp(n) isomorphic to a vector space and hence to exp(v). �
Lemma 3.11 (“Bi-invariant” Riemannian metric). There exists a Riemannian
metric on G which is left invariant under both Lie structures.
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 23
Proof. Indeed it suffices to pick a scalar product on g which is invariant under the
compact subgroup K = T (G) ⊂ Aut(g). �
We identify K = {T (g), g ∈ G} with its image in Aut(g) under the canonical
isomorphism between Aut(G) and Aut(g). Recall that, according to Lemma 3.6,
the central descending series of gN is invariant under ads(x) for all x ∈ v and
consists of ideals of g. The same holds for n. It follows that these linear subspaces
also invariant under K. However since K is compact, its action on g is completely
reducible. Therefore we have proved:
Lemma 3.12 (K-invariant stratification of the nilshadow). Let g be the Lie algebra
of a simply connected Lie group G with polynomial growth. Let gN be the nilshadow
Lie algebra obtained from a splitting g = n⊕v as above (i.e. n is the nilradical and
v satisfies ads(x)(y) = 0 for every x, y ∈ v). Let K := {T (g), g ∈ G} ⊂ Aut(G),
where T is defined by (19). Then there is a choice of linear subspaces mi’s and ℓ
such that
(20) gN = mr ⊕ . . . m2 ⊕ ℓ⊕ v,
where each term is K-invariant, m1 := ℓ⊕ v and the central descending series of
gN satisfies C
i(gN ) = mi ⊕ C
i−1(gN ). Moreover the action on K can be read off
on the exponential coordinates of second kind in this decomposition, namely:
eξr · ... · eξ0
= k(eξr) · ... · k(eξ0) = ek(ξr) · ... · ek(ξ0)
= expN (k(ξr)) ∗ ... ∗ expN (k(ξ0))
4. Periodic metrics
In this section, unless otherwise stated, G will denote an arbitrary locally com-
pact group.
4.1. Definitions. By a pseudodistance (or metric) on a topological space X, we
mean a function ρ : X × X → R+ satisfying ρ(x, y) = ρ(y, x) and ρ(x, z) ≤
ρ(x, y) + ρ(y, z) for any triplet of points of X. However ρ(x, y) may be equal to 0
even if x 6= y.
We will require our pseudodistances to be locally bounded, meaning that the
image under ρ of any compact subset of G × G is a bounded subset of R+. To
avoid irrelevant cases (for instance ρ ≡ 0) we will also assume that ρ is proper, i.e.
the map y 7→ ρ(e, y) is a proper map, namely the preimage of a bounded set is
bounded (we do not ask that the map be continuous). When ρ is locally bounded
then it is proper if and only if y 7→ ρ(x, y) is proper for any x ∈ G.
A pseudodistance ρ on G is said to be asymptotically geodesic if for every ε > 0
there exists s > 0 such that for any x, y ∈ G one can find a sequence of points
x1 = x, x2, ..., xn = y in G such that
ρ(xi, xi+1) ≤ (1 + ε)ρ(x, y)
and ρ(xi, xi+1) ≤ s for all i = 1, ..., n − 1.
24 EMMANUEL BREUILLARD
We will consider exclusively pseudodistances on a group G that are invariant
under left translations by all elements of a fixed closed and co-compact subgroup
H of G, meaning that for all x, y ∈ G and all h ∈ H, ρ(hx, hy) = ρ(x, y).
Combining all previous axioms, we set the following definition.
Definition 4.1. Let G be a locally compact group. A pseudodistance ρ on G will
be said to be a periodic metric (or H-periodic metric) if it satisfies the following
properties:
(i) ρ is invariant under left translations by a closed co-compact subgroup H.
(ii) ρ is locally bounded and proper.
(iii) ρ is asymptotically geodesic.
Remark 4.2. The assumption that ρ is symmetric, i.e. ρ(x, y) = ρ(y, x) is here
only for the sake of simplicity, and most of what is proven in this paper can be
done without this hypothesis.
4.2. Basic properties. Let ρ be a periodic metric on G and H some co-compact
subgroup of G. The following properties are straighforward.
(1) ρ is at a bounded distance from its restriction to H. This means that if F
is a bounded fundamental domain for H in G and for an arbitrary x ∈ G, if hx
denotes the element of H such that x ∈ hxF, then |ρ(x, y)− ρ(hx, hy)| ≤ C for
some constant C > 0.
(2) ∀t > 0 there exists a compact subset Kt of G such that, ∀x, y ∈ G, ρ(x, y) ≤
t ⇒ x−1y ∈ Kt. And conversely, if K is a compact subset of G, ∃t(K) > 0 s.t.
x−1y ∈ K ⇒ ρ(x, y) ≤ t(K).
(3) If ρ(x, y) ≥ s, the xi’s in (21) can be chosen in such a way that s ≤
ρ(xi, xi+1) ≤ 2s (one can take a suitable subset of the original xi’s).
(4) The restriction of ρ to H × H is a periodic pseudodistance on H. This
means that the xi’s in (21) can be chosen in H.
(5) Conversely, given a periodic pseudodistance ρH on H, it is possible to extend
it to a periodic pseudodistance on G by setting ρ(x, y) = ρH(hx, hy) where x =
hxF for some bounded fundamental domain F for H in G.
4.3. Examples. Let us give a few examples of periodic pseudodistances.
(1) Let Γ be a finitely generated torsion free nilpotent group which is embedded
as a co-compact discrete subgroup of a simply connected nilpotent Lie group N .
Given a finite symmetric generating set S of Γ, we can consider the corresponding
word metric dS on Γ which gives rise to a periodic metric on N given by ρ(x, y) =
dS(γx, γy) where x ∈ γxF and y ∈ γyF if F is some fixed fundamental domain for
Γ in N.
(2) Another example, given in [27], is as follows. Let N/Γ be a nilmanifold with
universal cover N and fundamental group Γ. Let g be a Riemannian metric on
N/Γ. It can be lifted to the universal cover and thus gives rise to a Riemannian
metric g̃ on N . This metric is Γ-invariant, proper and locally bounded. Since Γ is
co-compact in N, it is easy to check that it is also asymptotically geodesic hence
periodic.
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 25
(3) Any word metric on G. That is, if Ω is a compact symmetric generating
subset of G, let ∆Ω(x) = inf{n ≥ 1, x ∈ Ω
n}. Then define ρ(x, y) = ∆Ω(x
−1y).
Clearly ρ is a pseudodistance (although not a distance) and it is G-invariant on the
left, it is also proper, locally bounded and asymptotically geodesic, hence periodic.
(4) If G is a connected Lie group, any left invariant Riemannian metric on G.
Here again H = G and we obtain a periodic distance. Similarly, any left invariant
Carnot-Carathéodory metric on G will do.
Remark 4.3 (Berestovski’s theorem). According to a result of Berestovski [5]
every left-invariant geodesic distance on a connected Lie group is a subFinsler
metric as defined in Paragraph 2.1.
4.4. Coarse equivalence between invariant pseudodistances. The following
proposition is basic:
Proposition 4.4. Let ρ1 and ρ2 be two periodic pseudodistances on G. Then there
is a constant C > 0 such that for all x, y ∈ G
ρ2(x, y)− C ≤ ρ1(x, y) ≤ Cρ2(x, y) + C
Proof. Clearly it suffices to prove the upper bound. Let s > 0 be the number cor-
responding to the choice ε = 1 in (21) for ρ2. From 4.2 (2), there exists a compact
subset Ks in G such that ρ2(x, y) ≤ 2s ⇒ x
−1y ∈ K2s, and there is a constant
t = t(K2s) > 0 such that x
−1y ∈ K2s ⇒ ρ1(x, y) ≤ t. Let C = max{2t/s, t}, and
let x, y ∈ G. If ρ2(x, y) ≤ s then ρ1(x, y) ≤ t so the right hand side of (22) holds.
If ρ2(x, y) ≥ s then, from (21) and 4.2 (3), we get a sequence of xi’s in G from x
to y such that s ≤ ρ2(xi, xi+1) ≤ 2s and
1 ρ2(xi, xi+1) ≤ 2ρ2(x, y). It follows
that ρ1(xi, xi+1) ≤ t for all i. Hence ρ1(x, y) ≤
ρ1(xi, xi+1) ≤ Nt ≤
tρ2(x, y)
and the right hand side of (22) holds. �
In the particular case when G = N is a simply connected nilpotent Lie group,
the distance to the origin x 7→ ρ(e, x) is also coarsely equivalent to any homoge-
neous quasi-norm on N. We have,
Proposition 4.5. Suppose N is a simply connected nilpotent Lie group. Let ρ1 be
a periodic pseudodistance on N and | · | be a homogeneous quasi-norm, then there
exists C > 0 such that for all x ∈ N
|x−1y| − C ≤ ρ1(x, y) ≤ C|x
−1y|+ C
Moreover, if ρ2 is a periodic pseudodistance on the stratified nilpotent group N∞
associated to N, then again, there is a constant C > 0 such that
ρ2(e, x)− C ≤ ρ1(e, x) ≤ Cρ2(e, x) + C
The proposition follows at once from Guivarc’h’s theorem (see Corollary 2.7
above), the equivalence of homogeneous quasi-norms, and the fact that left-invariant
Carnot-Caratheodory metrics on N∞ are homogeneous quasi norms. However,
since the group structures on N and N∞ differ, (24) cannot in general be replaced
by the stronger relation (22) as simple examples show.
26 EMMANUEL BREUILLARD
The next proposition is of fundamental importance for the study of metrics on
Lie groups of polynomial growth:
Proposition 4.6. Let G be a simply connected solvable Lie group of polynomial
growth and GN its nilshadow. Let ρ and ρN be arbitrary periodic pseudodistances
on G and GN respectively. Then there is a constant C > 0 such that for all
x, y ∈ G
ρN (x, y)− C ≤ ρ(x, y) ≤ CρN (x, y) + C
Proof. According to Proposition 4.4, it is enough to show (25) for some choice of
periodic metrics on G and GN . But in Lemma 3.11 we constructed a Riemannian
metric on G which is left invariant for both G and GN . We are done. �
4.5. Right invariance under a compact subgroup. Here we verify that, given
a compact subgroup of G, any periodic metric is at bounded distance from another
periodic metric which is invariant on the right by this compact subgroup. Let K
be a compact subgroup of G and ρ a periodic pseudodistance on G. We average ρ
with the help of the normalized Haar measure on K to get:
(26) ρK(x, y) =
ρ(xk1, yk2)dk1dk2
Then the following holds:
Lemma 4.7. There is a constant C0 > 0 depending only on ρ and K such that
for all k1, k2 ∈ K and all x, y ∈ G
(27) |ρ(xk1, yk2)− ρ(x, y)| ≤ C0
Proof. From 4.2 (2), ∃t = t(K) > 0 s.t. ∀x ∈ G, ρ(x, xk) ≤ t. Applying the
triangle inequality, we are done. �
Hence we obtain:
Proposition 4.8. The pseudodistance ρK is periodic and lies at a bounded dis-
tance from ρ. In particular, as x tends to infinity in G the following limit holds
(28) lim
ρK(e, x)
ρ(e, x)
Proof. From Lemma 4.7 and 4.2 (3), it is easy to check that ρK must be asymp-
totically geodesic, and periodic. Integrating (27) we get that ρK is at a bounded
distance from ρ and (28) is obvious. �
If K is normal in G, we thus obtain a periodic metric ρK on G/K such that
ρK(p(x), p(y)) is at a bounded distance from ρ(x, y), where p is the quotient map
G→ G/K.
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 27
5. Reduction to the nilpotent case
In this section, G denotes a simply connected solvable Lie group of polynomial
growth. We are going to reduce the proof of the theorems of the Introduction
to the case of a nilpotent G. This is performed by showing that any periodic
pseudodistance ρ on G is asymptotic to some associated periodic pseudodistance
ρN on the nilshadow GN . We state this in Proposition 5.1 below.
The key step in the proof is Proposition 5.2 below, which shows the asymptotic
invariance of ρ under the “semisimple part” of G. The crucial fact there is that the
displacement of a distant point under a fixed unipotent automorphism is negligible
compared to the distance from the identity (see Lemmas 5.4, 5.5), so that the
action of the semisimple part of large elements can be simply approximated by
their action by left translation.
5.1. Asymptotic invariance under a compact group of automorphisms
of G. The main result of this section is the following. Let G be a connected and
simply connected solvable Lie group with polynomial growth and GN its nilshadow
(see Section 3).
Proposition 5.1. Let H be a closed co-compact subgroup of G and ρ an H-
periodic pseudodistance (see Definition 4.1) on G. There exist a closed subset HK
containing H which is a co-compact subgroup for both G and GN , and an HK-
periodic (for both Lie structures) pseudodistance ρK such that
(29) lim
ρK(e, x)
ρ(e, x)
The closed subgroup HK will be taken to be the closure of the group generated
by all elements of the form k(h), where h belongs to H and k belongs to the
closure K in the group Aut(G) of the image of H under the homomorphism
T : G → Aut(G) introduced in Section 3. It is easy to check from the definition
of the nilshadow product (1) that this is indeed a subgroup in both G and its
nilshadow GN .
The new pseudodistance ρK is defined as follows, using a double averaging
procedure:
(30) ρK(x, y) :=
ρ(gk(x), gk(y))dkdµ(g)
Here the measure µ is the normalized Haar measure on the coset space H\HK
and dk is the normalized Haar measure on the compact group K. Recall that
all closed subgroups of S are unimodular (since they have polynomial growth by
[21][Lemme I.3.]). Hence the existence of invariant measures on the coset spaces.
An essential part of the proof of Proposition 5.1 is enclosed in the following
statement:
28 EMMANUEL BREUILLARD
Proposition 5.2. Let ρ be a periodic pseudodistance on G which is invariant
under a co-compact subgroup H. Then ρ is asymptotically invariant under the
action of K = {T (h), h ∈ H} ⊆ Aut(G). Namely, (uniformly) for all k ∈ K,
(31) lim
ρ(e, k(x))
ρ(e, x)
The proof of Proposition 5.2 splits into two steps. First we show that it is
enough to prove (31) for a dense subset of k’s. This is a consequence of the following
continuity statement:
Lemma 5.3. Let ε > 0, then there is a neighborhood U of the identity in K such
that, for all k ∈ U,
limx→∞
ρ(x, k(x))
ρ(e, x)
Then we show that the action of T (g) can be approximated by the conjugation
by g, essentially because the unipotent part of this conjugation does not move x
very much when x is far. This is the content of the following lemma:
Lemma 5.4. Let ρ be a periodic pseudodistance on G which is invariant under
a co-compact subgroup H. Then for any ε > 0, and any compact subset F in H
there is s0 > 0 such that
|ρ(e, T (h)x) − ρ(e, hx)| ≤ ερ(e, x)
for any h ∈ F and as soon as ρ(e, x) > s0.
Proof of Proposition 5.2 modulo Lemmas (5.3) and (5.4): As ρ is assumed to
be H-invariant, for every h ∈ H, we have ρ(e, h−1x)/ρ(e, x) → 1. The proof of
the proposition then follows immediately from the combination of the last two
lemmas. �
5.2. Proof of Lemmas (5.3) and (5.4). We choose K-invariant subspaces mi’s
and ℓ of the nilshadow gN of g as in Lemma 3.12 from Section 3. In particular
gN = mr ⊕ . . . ⊕m2 ⊕ ℓ⊕ v,
where each term is K-invariant, n = [gN , gN ] ⊕ l and C
i(gN ) = mi ⊕ C
i−1(gN ).
Moreover δt(x) = t
ix if x ∈ mi (here m1 = ℓ⊕ v).
We also set v(x) = maxi ‖ξi‖
i if x = expN (ξr) ∗ . . . ∗ expN (ξ0) and di = i if
i > 0 and d0 = 1. And we let |x| := maxi ‖xi‖
1/di if x = xr + . . . + x1 + x0 in the
above direct sum decomposition.
Note that | · | is a δt-homogeneous quasi-norm. Moreover, it is straightforward
to verify (using the Campbell-Hausdorff formula (12) and Proposition 2.4) that
v(x) ≤ C|x|+C for some constant C > 0. In particular ξi/|x|
di remains bounded
as |x| becomes large.
Proof of Lemma 5.3. Combining Propositions 4.5 and 4.6, there is a constant
C > 0 such that for all x, y ∈ G, ρ(x, y) ≤ C|x∗−1 ∗ y| + C. Therefore we have
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 29
reduced to prove the statement for | · | instead of ρ, namely it is enough to show
that |x∗−1 ∗ k(x)| becomes negligible compared to |x| as |x| goes to infinity and k
tends to 1.
It follows from the Campbell-Baker-Hausdorff formula (11) and (12) that, if
x, y ∈ GN and |x|, |y| are O(t), then |δ 1
(x ∗ y) − δ 1
(x) ∗ δ 1
(y)| = O(t−1/r), and
similarly |δ 1
(x1 ∗ . . . ∗ xm) − δ 1
(x1) ∗ . . . ∗ δ 1
(xm)| = Om(t
−1/r), for m elements
xi with |xi| = O(t). Hence when writing x = expN (ξr) ∗ ... ∗ expN (ξ0), and setting
t = |x|, we thus obtain that the following quantity
∣∣∣∣∣∣
(x∗−1 ∗ k(x))−
0≤i≤r
expN (−t
−diξi) ∗
0≤i≤r
expN (t
−dr−ik(ξr−i))
∣∣∣∣∣∣
is a O(t−1/r). Indeed recall from Lemma 3.12 that k(x) = expN (k(ξr)) ∗ ... ∗
expN (k(ξ0)). As x gets larger, each t
−diξi remains in a compact subset of mi.
Therefore, as k tends to the identity in K, each t−dik(ξi) becomes uniformly close
to t−diξi independently of the choice of x ∈ GN as long as t = |x| is large. The
result follows. �
Proof of Lemma 5.4. Recall that hx = h ∗ T (h)x for all x, h ∈ G (see (1).
By the triangle inequality it is enough to bound ρ(y, h ∗ y), where y = T (h)x.
From Propositions 4.5 and 4.6, ρ is comparable (up to multiplicative and additive
constants to the homogeneous quasi-norm | · |. Hence the Lemma follows from the
following:
Lemma 5.5. Let N be a simply connected nilpotent Lie group and let | · | be a
homogeneous quasi norm on N associated to some 1-parameter group of dilations
(δt)t. For any ε > 0 and any compact subset F of N, there is a constant s2 > 0
such that
|x−1gx| ≤ ε|x|
for all g ∈ F and as soon as |x| > s2.
Proof. Recall, as in the proof of the last lemma, that for any c1 > 0 there is
a c2 > 0 such that if t > 1 and x, y ∈ N are such that |x|, |y| ≤ c1t, then
(xy)− δ 1
(x) ∗ δ 1
(y)| ≤ c2t
−1/r. In particular, if we set t = |x|, then
∣∣∣δ 1
(x−1gx)− δ 1
(x)−1 ∗ δ 1
(g) ∗ δ 1
∣∣∣ ≤ c2t−1/r
On the other hand, as g remains in the compact set F, δ 1
(g) tends uniformly to
the identity when t = |x| goes to infinity, and δ 1
(x) remains in a compact set.
By continuity, we see that δ 1
(x)−1 ∗ δ 1
(g) ∗ δ 1
(x) becomes arbitrarily small as t
increases. We are done. �
30 EMMANUEL BREUILLARD
5.3. Proof of Proposition 5.1. First we prove the following continuity state-
ment:
Lemma 5.6. Let ρ be a periodic pseudodistance on G and ε > 0. Then there
exists a neighborhood of the identity U in G and s3 > 0 such that
1− ε ≤
ρ(e, gx)
ρ(e, x)
≤ 1 + ε
as soon g ∈ U and ρ(e, x) > s3.
Proof. Let ρN be a left invariant Riemannian metric on the nilshadow GN .
|ρ(e, x)− ρ(e, gx)| ≤ ρ(x, gx) ≤ ρ(x, g ∗ x) + ρ(g ∗ x, gx)
However ρ(a, b) ≤ CρN(a, b)+C for some C > 0 by Proposition 4.6. Moreover by
(1) we have gx = g ∗ T (g)x. Hence
|ρ(e, x) − ρ(e, gx)| ≤ CρN (x, g ∗ x) + CρN (x, T (g)x) + 2C
To complete the proof, we apply Lemmas 5.5 and 5.3 to the right hand side above.
We proceed with the proof of Proposition 5.1. Let L be the set of all g ∈ G
such that ρ(e, gx)/ρ(e, x) tends to 1 as x tends to infinity in G. Clearly L is a
subgroup of G. Lemma 5.6 shows that L is closed. The H-invariance of ρ insures
that L contains H. Moreover, Proposition 5.2 implies that L is invariant under K.
Consequently L contains HK , the closed subgroup generated by all k(h), k ∈ K,
h ∈ H. This, together with Proposition 5.2, grants pointwise convergence of the
integrand in (29). Convergence of the integral follows by applying Lebesgue’s
dominated convergence theorem.
The fact that ρK is invariant under left multiplication by H and invariant under
precomposition by automorphisms from K insures that ρK is invariant under ∗-
left multiplication by any element h ∈ H, where ∗ is the multiplication in the
nilshadow GN . Moreover we check that T (g) ∈ K if g ∈ HK , hence HK is a
subgroup of GN . It is clearly co-compact in GN too (if F is compact and HF = G
then H ∗ FK = G where FK is the union of all k(F ), k ∈ K).
Clearly ρK is proper and locally bounded, so in order to finish the proof, we
need only to check that ρK is asymptotically geodesic. By H-invariance of ρK and
since H is co-compact in G, it is enough to exhibit a pseudogeodesic between e
and a point x ∈ H. Let x = z1 · ... ·zn with zi ∈ H and
ρ(e, zi) ≤ (1+ε) ·ρ(e, x).
Fix a compact fundamental domain F for H in HK so that integration in (29)
over H\HK is replaced by integration over F. Then for some constant CF > 0 we
have |ρ(g, gz) − ρ(e, gz)| ≤ CF for g ∈ F and z ∈ H. Moreover, it follows from
Proposition 5.2, Lemma 5.6 and the fact that HK ⊂ L, that
(32) ρ(e, gk(z)) ≤ (1 + ε) · ρ(e, z)
for all g ∈ F, k ∈ K and as soon as z ∈ G is large enough. Fix s large enough
so that CF ≤ εs and so that (32) holds when ρ(e, z) ≥ s. As already observed in
the discussion following Definition 4.1 (property 4.2 (3)) we may take the zi’s so
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 31
that s
≤ ρ(e, zi) ≤ s. Then nCF ≤ nsε ≤ 3ερ(e, x). Finally we get for ε < 1 and
x large enough
ρK(e, zi) ≤ CFn+ (1 + ε)
2ρ(e, x)
≤ CFn+ (1 + ε)
3ρK(e, x)
≤ (1 + 10ε) · ρK(e, x)
where we have used the convergence ρK/ρ→ 1 that we just proved. �
6. The nilpotent case
In this section, we prove Theorem 1.4 and its corollaries stated in the Intro-
duction for a simply connected nilpotent Lie group. We essentially follow Pansu’s
argument from [27], although our approach differs somewhat in its presentation.
Throughout the section, the nilpotent Lie group will be denoted by N, and its Lie
algebra by n.
Let m1 be any vector subspace of n such that n = m1 ⊕ [n, n]. Let π1 the
associated linear projection of n onto m1. Let H be a closed co-compact subgroup
of N . To every H-periodic pseudodistance ρ on N we associate a norm ‖·‖0 on
m1 which is the norm whose unit ball is defined to be the closed convex hull of all
elements π1(h)/ρ(e, h) for all h ∈ H\{e}. In other words,
(33) E := {x ∈ m1, ‖x‖0 ≤ 1} = CvxHull
π1(h)
ρ(e, h)
, h ∈ H\{e}
The set E is clearly a convex subset of m1 which is symmetric around 0 (since ρ
is symmetric). To check that E is indeed the unit ball of a norm on m1 it remains
to see that E is bounded and that 0 lies in its interior. The first fact follows
immediately from (23) and Example 2.3. If 0 does not lie in the interior of E,
then E must be contained in a proper subspace of m1, contradicting the fact that
H is co-compact in N .
Taking large powers hn, we see that we can replace the set H \{e} in the above
definition by any neighborhood of infinity in H. Similarly, it is easy to see that
the following holds:
Proposition 6.1. For s > 0 let Es be the closed convex hull of all π1(x)/ρ(e, x)
with x ∈ N and ρ(e, x) > s. Then E =
s>0Es.
Proof. Since ρ is H-periodic, we have ρ(e, hn) ≤ nρ(e, h) for all n ∈ N and h ∈ H.
This shows E ⊂
s>0Es. The opposite inclusion follows easily from the fact that
ρ is at a bounded distance from its restriction to H, i.e. from 4.2 (1). �
We now choose a set of supplementary subspaces (mi) starting with m1 as in
Paragraph 2.2. This defines a new Lie product ∗ on N so that N∞ = (N, ∗) is
stratified. We can then consider the ∗-left invariant Carnot-Carathéodory metric
associated to the norm ‖·‖0 as defined in Paragraph 2.1 on the stratified nilpotent
Lie group N∞. In this section, we will prove Theorem 1.4 for nilpotent groups in
the following form:
32 EMMANUEL BREUILLARD
Theorem 6.2. Let ρ be a periodic pseudodistance on N and d∞ the Carnot-
Carathéodory metric defined above, then as x tends to infinity in N
(34) lim
ρ(e, x)
d∞(e, x)
Note that d∞ is left-invariant for the N∞ Lie product, but not the original Lie
product on N .
Before going further, let us draw some simple consequences.
(1) In Theorem 6.2 we may replace d∞(e, x) by d(e, x), where d is the left
invariant Carnot-Caratheodory metric on N (rather than N∞) defined by the
norm ‖·‖0 (as opposed to d∞ which is ∗-left invariant). Hence ρ, d and d∞ are
asymptotic. This follows from the combination of Theorem 6.2 and Remark 2.1.
(2) Observe that the choice ofm1 was arbitrary. Hence two Carnot-Carathéodory
metrics corresponding to two different choices of a supplementary subspace m1
with the same induced norm on n/[n, n], are asymptotically equivalent (i.e. their
ratio tends to 1), and in fact isometric (see Remark 2.1). Conversely, if two
Carnot-Carathéodory metrics are associated to the same supplementary subspace
m1 and are asymptotically equivalent, they must be equal. This shows that the
set of all possible norms on the quotient vector space n/[n, n] is in bijection with
the set of all classes of asymptotic equivalence of Carnot-Carathéodory metrics on
(3) As another consequence we see that if a locally bounded proper and asymp-
totically geodesic left-invariant pseudodistance on N is also homogeneous with
respect to the 1-parameter group (δt)t (i.e. ρ(e, δtx) = tρ(e, x)) then it has to be
of the form ρ(x, y) = d∞(e, x
−1y) where d∞ is a Carnot-Carathéodory metric on
6.1. Volume asymptotics. Theorem 6.2 also yields a formula for the asymptotic
volume of ρ-balls of large radius. Let us fix a Haar measure on N (for example
Lebesgue measure on n gives rise to a Haar measure on N under exp). Since d∞
is homogeneous, it is straightforward to compute the volume of a d∞-ball:
vol({x ∈ N, d∞(e, x) ≤ t}) = t
d(N)vol({x ∈ N, d∞(e, x) ≤ 1})
where d(N) =
i≥1 dim(C
i(n)) is the homogeneous dimension of N. For a pseu-
dodistance ρ as in the statement of Theorem 6.2, we can define the asymptotic vol-
ume of ρ to be the volume of the unit ball for the associated Carnot-Carathéodory
metric d∞.
AsV ol(ρ) = vol({x ∈ N, d∞(e, x) ≤ 1})
Then we obtain as an immediate corollary of Theorem 6.2:
Corollary 6.3. Let ρ be periodic pseudodistance on N. Then
td(N)
vol({x ∈ N, ρ(e, x) ≤ t}) = AsV ol(ρ) > 0
Finally, if Γ is an arbitrary finitely generated nilpotent group, we need to take
care of the torsion elements. They form a normal finite subgroup T and applying
Theorem 6.2 to Γ/T , we obtain:
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 33
Corollary 6.4. Let S be a finite symmetric generating set of Γ and Sn the ball
of radius n is the word metric ρS associated to S, then
nd(N)
#Sn = #T ·
AsV ol(ρS)
vol(N/Γ)
where N is the Malcev closure of Γ = Γ/T , the torsion free quotient of Γ, and dS
is the word pseudodistance associated to S, the projection of S in Γ.
Moreover, it is possible to be a bit more precise about AsV ol(ρS). In fact,
the norm ‖·‖0 on m1 used to define the limit Carnot-Carathéodory distance d∞
associated to ρS is a simple polyhedral norm defined by
{‖x‖0 ≤ 1} = CvxHull (π1(s), s ∈ S)
More generally the following holds. Let H be any closed, co-compact subgroup
of N. Choose a Haar measure on H so that volN (N/H) = 1. Theorem 6.2 yields:
Corollary 6.5. Let Ω be a compact symmetric (i.e. Ω = Ω−1) neighborhood of
the identity, which generates H. Let ‖·‖0 be the norm on m1 whose unit ball is
CvxHull{π1(Ω)} and let d∞ be the corresponding Carnot-Carathéodory metric on
N∞. Then we have the following limit in the Hausdorff topology
(Ωn) = {g ∈ N, d∞(e, g) ≤ 1}
volH(Ω
nd(N)
= volN ({g ∈ N, d∞(e, g) ≤ 1})
6.2. Outline of the proof. We first devise some standard lemmas about piece-
wise approximations of horizontal paths (Lemmas 6.6, 6.7, 6.10). Then it is shown
(Lemma 6.11) that the original product on N and the product in the associated
graded Lie group are asymptotic to each other, namely, if (δt)t is a 1-parameter
group of dilations of N, then after renormalization by δ 1
, the product of O(t)
elements lying in some bounded subset of N, is very close to the renormalized
product of the same elements in the graded Lie group N∞. This is why all com-
plications due to the fact that N may not be a priori graded and the δt’s may
not be automorphisms disappear when looking at the large scale geometry of the
group. Finally, we observe (Lemma 6.13), as follows from the very definition of
the unit ball E for the limit norm ‖·‖0 , that any vector in the boundary of E,
can be approximated, after renormalizing by δ 1
by some element x ∈ N lying in
a fixed annulus s(1 − ε) ≤ ρ(e, x) ≤ s(1 + ε). This enables us to assert that any
ρ-quasi geodesic gives rise, after renormalization, to a d∞-geodesic (this gives the
lower bound in Theorem 6.2). And vice-versa, that any d∞-geodesic can be ap-
proximated uniformly by some renormalized ρ-quasi geodesic (this gives the upper
bound in Theorem 6.2).
34 EMMANUEL BREUILLARD
6.3. Preliminary lemmas.
Lemma 6.6. Let G be a Lie group and let ‖·‖e be a Euclidean norm on the Lie
algebra of G and de(·, ·) the associated left invariant Riemannian metric on G.
Let K be a compact subset of G. Then there is a constant C0 = C0(de,K) > 0
such that whenever de(e, u) ≤ 1 and x, y ∈ K
|de(xu, yu)− de(x, y)| ≤ C0de(x, y)de(e, u)
Proof. The proof reduces to the case when u and x−1y are in a small neighborhood
of e. Then the inequality boils down to the following ‖[X,Y ]‖e ≤ c ‖X‖e ‖Y ‖e for
some c > 0 and every X,Y in Lie(G). �
Lemma 6.7. Let G be a Lie group, let ‖·‖ be some norm on the Lie algebra of
G and let de(·, ·) be a left invariant Riemannian metric on G. Then for every
L > 0 there is a constant C = C(de, ‖·‖ , L) > 0 with the following property.
Assume ξ1, ξ2 : [0, 1] → G are two piecewise smooth paths in the Lie group G
with ξ1(0) = ξ2(0) = e. Let ξ
i ∈ Lie(G) be the tangent vector pulled back at the
identity by a left translation of G. Assume that supt∈[0,1] ‖ξ
1(t)‖ ≤ L, and that∫ 1
‖ξ′1(t)− ξ
2(t)‖ dt ≤ ε. Then
de(ξ1(1), ξ2(1)) ≤ Cε
Proof. The function f(t) = de(ξ1(t), ξ2(t)) is piecewise smooth. For small dt we
may write, using Lemma 6.6
f(t+ dt)− f(t) ≤ de(ξ1(t)ξ
1(t)dt, ξ1(t)ξ
2(t)dt) + de(ξ1(t)ξ
2(t)dt, ξ2(t)ξ
2(t)dt)− f(t) + o(dt)
∥∥ξ′1(t)− ξ′2(t)
dt+ C0f(t)
∥∥ξ′2(t)dt
+ o(dt)
≤ ε(t)dt +C0Lf(t)dt+ o(dt)
where ε(t) = ‖ξ′1(t)− ξ
2(t)‖e . In other words,
f ′(t) ≤ ε(t) + C0Lf(t)
Since f(0) = 0, Gronwall’s lemma implies that f(1) ≤ eC0L
ε(s)e−C0Lsds ≤ Cε.
From now on, we will take G to be the stratified nilpotent Lie group N∞, and
de(·, ·) will denote a left invariant Riemannian metric on N∞ while d∞(·, ·) is a left
invariant Carnot-Caratheodory Finsler metric on N∞ associated to some norm ‖·‖
on m1.
Remark 6.8. There is c0 > 0 such that c
0 de(e, x) ≤ d∞(e, x) ≤ c0de(e, x)
r in a
neighborhood of e. Hence in the situation of the lemma we get d∞(ξ1(1), ξ2(1)) ≤
r for some other constant C1 = C1(L, d∞, de).
Lemma 6.9. Let N ∈ N and dN (x, y) be the function in N∞ defined in the
following way:
dN (x, y) = inf{
∥∥ξ′(u)
∥∥ du, ξ ∈ HPL(N), ξ(0) = x, ξ(1) = y}
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 35
where HPL(N) is the set of horizontal paths ξ which are piecewise linear with at
most N possible values for ξ′. Then we have dN → d∞ uniformly on compact
subsets of N∞.
Proof. Note that it follows from Chow’s theorem (e.g. see [25] or [19]) that there
exists K0 ∈ N such that A := supd∞(e,x)=1 dK0(e, x) < ∞. Moreover, since piece-
wise linear paths are dense in L1, it follows for example from Lemma 6.7 that for
each fixed x, dn(e, x) → d∞(e, x). We need to show that dN (e, x) → d∞(e, x) uni-
formly in x satisfying d∞(e, x) = 1. By contradiction, suppose there is a sequence
(xn)n such that d∞(e, xn) = 1 and dn(e, xn) ≥ 1 + ε0 for some ε0 > 0. We may
assume that (xn)n converges to say x. Let yn = x
−1 ∗xn and tn = d∞(e, yn). Then
dK0(e, yn) = tndK0(e, δ 1
(yn)) ≤ Atn. Thus dn(e, xn) ≤ dn(e, x) + dn(e, yn) ≤
dn(e, x) +Atn as soon as n ≥ K0. As n tends to ∞, we get a contradiction. �
This lemma prompts the following notation. For ε > 0, we let Nε ∈ N be the
first integer such that 1 ≤ dNε(e, x) ≤ 1 + ε for all x with d∞(e, x) = 1. Then we
have:
Lemma 6.10. For every x ∈ N∞ with d∞(e, x) = 1, and all ε > 0 there exists a
path ξ : [0, 1] → N∞ in HPL(Nε) with unit speed (i.e. ‖ξ
′‖ = 1) such that ξ(0) = e
and d∞(x, ξ(1)) ≤ C2ε and ξ
′ has at most one discontinuity on any subinterval of
[0, 1] of length εr/Nε.
Proof. We know that there is a path in HPL(Nε) connecting e to x with length
ℓ ≤ 1 + ε. Reparametrizing the path so that it has unit speed, we get a path
ξ0 : [0, ℓ] → N∞ in HPL(Nε) with d∞(x, ξ0(1)) = d∞(ξ0(ℓ), ξ0(1)) ≤ ε. The deriva-
tive ξ′0 is constant on at most Nε different intervals say [ui, ui+1). Let us remove
all such intervals of length ≤ εr/Nε by merging them to an adjacent interval
and let us change the value of ξ′0 on these intervals to the value on the adja-
cent interval (it doesn’t matter if we choose the interval on the left or on the
right). We obtain a new path ξ : [0, 1] → N∞ in HPL(Nε) with unit speed and
such that ξ′ has at most one discontinuity on any subinterval of [0, 1] of length
εr/Nε. Moreover
‖ξ′(t)− ξ′0(t)‖ dt ≤ ε
r. By Lemma 6.7 and Remark 6.3, we
have d∞(ξ(1), ξ0(1)) ≤ C1ε, hence
d∞(ξ(1), x) ≤ d∞(x, ξ0(1)) + d∞(ξ0(1), ξ(1)) ≤ (C1 + 1)ε
Lemma 6.11 (Piecewise horizontal approximation of paths). Let x∗y denote the
product inside the stratified Lie group N∞ and x · y the ordinary product in N .
Let n ∈ N and t ≥ n. Then for any compact subset K of N , and any x1, ..., xn
elements of K, we have
de(δ 1
(x1 · ... · xn), δ 1
(x1 ∗ ... ∗ xn)) ≤ c1
de(δ 1
(x1 ∗ ... ∗ xn), δ 1
(π1(x1) ∗ ... ∗ π1(xn))) ≤ c2
36 EMMANUEL BREUILLARD
where c1, c2 depend on K and de only.
Proof. Let ‖·‖ be a norm on the Lie algebra of N. For k = 1, ..., n let zk =
x1 · ... ·xk−1 and yk = xk+1 ∗ ...∗xn. Since all xi’s belong to K, it follows from (24)
that as soon as t ≥ n, all δ 1
(zk) and δ 1
(yk) for k = 1, ..., n remain in a bounded set
depending only on K. Comparing (12) and (11), we see that whenever y = O(1)
and δ 1
(x) = O(1), we have
∥∥∥δ 1
(xy)− δ 1
(x ∗ y)
∥∥∥ = O(
On the other hand, from (12) it is easy to verify that right ∗-multiplication by a
bounded element is Lipschitz for ‖·‖ and the Lipschitz constant is locally bounded.
It follows that there is a constant C1 > 0 (depending only on K and ‖·‖) such
that for all k ≤ n
∥∥∥δ 1
((zk · xk) ∗ yk)− δ 1
(zk ∗ xk ∗ yk)
∥∥∥ ≤ C1
∥∥∥δ 1
(zk · xk)− δ 1
(zk ∗ xk)
Applying n times the relation (35) with x = x1 · ... · xk−1 and y = xk, we finally
obtain ∥∥∥δ 1
(x1 · ... · xn)− δ 1
(x1 ∗ ... ∗ xn)
∥∥∥ = O(
) = O(
where O() depends only on K. On the other hand, using (11), it is another simple
verification to check that if x, y lie in a bounded set, then 1
de(x, y) ≤ ‖x− y‖
≤ c2de(x, y) for some constant c2 > 0. The first inequality follows.
For the second inequality, we apply Lemma 6.7 to the paths ξ1 and ξ2 starting
at e and with derivative equal on [ k
, k+1
) to nδ 1
(xk) for ξ1 and to n
π1(xk)
for ξ2.
We get
de(δ 1
(x1 ∗ ... ∗ xn), δ 1
(π1(x1) ∗ ... ∗ π1(xn)) = O(
Remark 6.12. From Remark 6.3 we see that if we replace de by d∞ in the above
lemma, we get the same result with 1
replaced by t−
Lemma 6.13 (Approximation in the abelianized group). Recall that ‖·‖0 is the
norm on m1 defined in (33). For any ε > 0, there exists s0 > 0 such that for every
s > s0 and every v ∈ m1 such that ‖v‖0 = 1, there exists h ∈ H such that
(1− ε)s ≤ ρ(e, h) ≤ (1 + ε)s
and ∥∥∥∥
π1(h)
ρ(e, h)
Proof. Let ε > 0 be fixed. Considering a finite ε-net in E, we see that there exists
a finite symmetric subset {g1, ..., gp} of H\{e} such that, if we consider the closed
convex hull of F = {fi = π1(gi)/ρ(e, gi)|i = 1, ..., p} and ‖·‖ε the associated norm
on m1, then ‖·‖0 ≤ ‖·‖ε ≤ (1 + 2ε) ‖·‖0. Up to shrinking F if necessary, we may
assume that ‖fi‖ε = 1 for all i’s. We may also assume that the fi’s generate m1 as
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 37
a vector space. The sphere {x, ‖x‖ε = 1} is a symmetric polyhedron in m1 and to
each of its facets corresponds d = dim(m1) vertices lying in F and forming a vector
basis of m1. Let f1, ..., fd, say, be such vertices for a given facet. If x ∈ m1 is of the
form x =
i=1 λifi with λi ≥ 0 for 1 ≤ i ≤ d then we see that ‖x‖ε =
i=1 λi,
because the convex hull of f1, ..., fd is precisely that facet, hence lies on the sphere
{x, ‖x‖ε = 1}.
Now let v ∈ m1, ‖v‖0 = 1, and let s > 0. The half line tv, t > 0, hits the
sphere {x, ‖x‖ε = 1} in one point. This point belongs to some facet and there
are d linearly independent elements of F, say f1, ..., fd, the vertices of that facet,
such that this point belongs to the convex hull of f1, ..., fd. The point sv then lies
in the convex cone generated by π1(g1), ..., π1(gd). Moreover, there is a constant
Cε > 0 (Cε ≤
max1≤i≤p ρ(e, gi)) such that
∥∥∥∥∥sv −
niπ1(gi)
∥∥∥∥∥
for some non-negative integers n1, ..., nd depending on s > 0. Hence
niρ(e, gi) =
∥∥∥∥∥
niπ1(gi)
∥∥∥∥∥
(‖sv‖ε + Cε)
≤ 1 + 2ε +
≤ 1 + 3ε
where the last inequality holds as soon as s > Cε/ε.
Now let h = g
1 · ... · g
d ∈ H. We have π1(h) =
i=1 niπ1(gi)
ρ(e, h) ≥ ‖π1(h)‖0 ≥ s− Cε ≥ s(1− ε)
Moreover
ρ(e, h) ≤
niρ(e, gi) ≤ s(1 + 3ε)
Changing ε into say ε
and for say ε < 1
, we get the desired result with s0(ε) =
max1≤i≤p ρ(e, gi). �
6.4. Proof of Theorem 6.2. We need to show that as x→ ∞ in N
1 ≤ lim
ρ(e, x)
d∞(e, x)
≤ lim
ρ(e, x)
d∞(e, x)
First note that it is enough to prove the bounds for x ∈ H. This follows from
(4.2) (1).
Let us begin with the lower bound. We fix ε > 0 and s = s(ε) as in the definition
of an asymptotically geodesic metric (see (21)). We know by 4.2 (3) and (4) that
as soon as ρ(e, x) ≥ s we may find x1, ..., xn in H with s ≤ ρ(e, xi) ≤ 2s such that
xi and
ρ(e, xi) ≤ (1 + ε)ρ(e, x). Let t = d∞(e, x), then n ≤
ρ(e, x),
hence n ≤ C
t where C is a constant depending only on ρ (see (23)). We may
38 EMMANUEL BREUILLARD
then apply Lemma 6.11 (and the remark following it) to get, as t ≥ n as soon as
s(ε) ≥ C,
d∞(δ 1
(x), δ 1
(π1(x1) ∗ ... ∗ π1(xn))) ≤ c
But for each i we have ‖π1(xi)‖0 ≤ ρ(e, xi) by definition of the norm, hence
t = d∞(e, x) ≤
‖π1(xi)‖0+ d∞(x, π1(x1) ∗ ... ∗π1(xn)) ≤ (1+ ε)ρ(e, x)+ c
Since ε was arbitrary, letting t→ ∞ we obtain
ρ(e, x)
d∞(e, x)
We now turn to the upper bound. Let t = d∞(e, x) and ε > 0. According to
Lemma 6.10, there is a horizontal piecewise linear path {ξ(u)}u∈[0,1] with unit
speed such that d∞(δ 1
(x), ξ(1)) ≤ C2ε and no interval of length ≥
contains
more than one change of direction. Let s0(ε) be given by Lemma 6.13 and assume
t > s0(ε
r)Nε/ε
r. We split [0, 1] into n subintervals of length u1, ..., un such that
ξ′ is constant equal to yi on the i-th subinterval and s0(ε
r) ≤ tui ≤ 2s0(ε
r). We
have ξ(1) = u1y1 ∗ ... ∗ unyn. Lemma 6.13 yields points xi ∈ H such that
∥∥∥∥yi −
π1(xi)
∥∥∥∥ ≤ ε
and ρ(e, xi) ∈ [(1 − ε
r)tui, (1 + ε
r)tui] (note that tui > s0(ε
r)). Let ξ be the
piecewise linear path [0, 1] → N∞ with the same discontinuities as ξ and where the
value yi is replaced by
π1(xi)
. Then according to Lemma 6.7, d∞(ξ(1), ξ(1)) ≤ Cε.
Since ρ(e, xi) ≤ 4s0(ε
r) for each i, we may apply Lemma 6.11 (and the remark
following it) and see that if y = x1 · ... · xn,
d∞(ξ(1), δ 1
(y)) ≤ c′1(ε)t
Hence d∞(δ 1
(x), δ 1
(y)) ≤ (C2+C)ε+c
1(ε)t
r and ρ(e, y) ≤
ρ(e, xi) ≤ (1+ε
while ρ(x, y) ≤ C ′td∞(e, δ 1
(x−1y)) + C ′ ≤ t(Cd∞(δ 1
(x), δ 1
(y)) + oε(1)). Hence
ρ(e, x) ≤ t+ oε(t)
Remark 6.14. In the last argument we used the fact that
∥∥∥δ 1
(xu)− δ 1
(x ∗ u)
∥∥∥ =
) if δ 1
(x) and δ 1
(u) are bounded, in order to get for y = xu,
d∞(e, δ 1
(u)) ≤ d∞(δ 1
(x), δ 1
(xu)) + d∞(δ 1
(xu), δ 1
(x ∗ u))
≤ d∞(δ 1
(x), δ 1
(y)) + o(1).
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 39
7. Locally compact G and proofs of the main results
In this section, we prove Theorem 1.2 and complete the proof of Theorem 1.4
and its corollaries. We begin with the latter.
Proof of Theorem 1.4. It is the combination of Proposition 5.1, which reduces the
problem to nilpotent Lie groups, and Theorem 6.2, which treats the nilpotent case.
It only remains to justify the last assertion that d∞ is invariant under T (H).
SinceK = T (H) stabilizesm1 (see Lemma 3.12 for the definition ofm1) and acts
by automorphisms of the nilpotent (nilshadow) structure (Lemma 3.5), given any
k ∈ K, the metric d∞(k(x), k(y)) is nothing else but the left invariant subFinsler
metric on the nilshadow associated to the norm ‖k(v)‖ for v ∈ m1 (if ‖ · ‖ denotes
the norm associated to d∞).
However, d∞ is asymptotically invariant under K, because of Proposition 5.1.
Namely d∞(e, k(x))/d∞(e, x) tends to 1 as x tends to infinity. Finally d∞(e, v) =
‖v‖ and d∞(e, k(v)) = ‖k(v)‖ for all v ∈ m1. Two asymptotic norms on a vector
space are always equal. It follows that the norms ‖ · ‖ and ‖k(·)‖ on m1 coincide.
Hence d∞(e, k(x)) = d∞(e, k(x)) for all x ∈ S as claimed. �
Proof of Corollary 1.8. First some initial remark (see also Remark 2.1). If d is
a left-invariant subFinlser metric on a simply connected nilpotent Lie group N
induced by a norm ‖ · ‖ on a supplementary subspace m1 of the commutator
subalgebra, then it follows from the very definition of subFinsler metrics (see
Paragraph 2.1) that π1 is 1-Lipschitz between the Lie group and the abelianization
of it endowed with the norm ‖·‖, namely ‖π1(x)‖ ≤ d(e, x), with equality if x ∈ m1.
From this and considering the definition of the limit norm in (33), we conclude
that ‖ · ‖ coincides with the limit norm of d. In particular Theorem 6.2 implies
that d is asymptotic to the ∗-left invariant subFinsler metric d∞ induced by the
same norm ‖ · ‖ on the graded Lie group (N∞, ∗).
We can now prove Corollary 1.8. By the above remark, the limit metric d∞
on the graded nilshadow of S is asymptotic to the subFinsler metric d induced
by the same norm ‖ · ‖ on the same (K-invariant) supplementary subspace m1
of the commutator subalgebra of the nilshadow, and which is left invariant for
the nilshadow structure on S. However, it follows from Theorem 1.4 that d∞
and the norm ‖ · ‖ are K-invariant. This implies that d is also left-invariant
with respect to the original Lie group structure of S. Indeed, by (1), we can
write d(gx, gy) = d(g ∗ (T (g)x), g ∗ (T (g)y)) = d(T (g)x, T (g)y) = d(x, y), where ∗
denotes this time the nilshadow product structure. We are done. �
Proof of Corollary 1.7. This follows immediately from Theorem 1.4, when ∗ de-
notes the graded nilshadow product. If ∗ denotes the nilshadow group structure,
then it follows from Theorem 6.2 and the remark we just made in the proof of
Corollary 1.8 (see also Remark 2.1). �
7.1. Proof of Theorem 1.2. Let G be a locally compact group of polynomial
growth. We will show that G has a compact normal subgroup K such that G/K
40 EMMANUEL BREUILLARD
contains a closed co-compact subgroup, which can be realized as a closed co-
compact subgroup of a connected and simply connected solvable Lie group of type
(R) (i.e. of polynomial growth). The proof will follow in several steps.
(a) First we show that up to moding out by a normal compact subgroup, we may
assume that G is a Lie group whose connected component of the identity has no
compact normal subgroup. Indeed, it follows from Losert’s refinement of Gromov’s
theorem ([24] Theorem 2) that there exists a normal compact subgroup K of G
such that G/K is a Lie group. So we may now assume that G is a Lie group (not
necessarily connected) of polynomial growth. The connected component G0 of G
is a connected Lie group of polynomial growth. Recall the following classical fact:
Lemma 7.1. Every connected Lie group has a unique maximal compact normal
subgroup. By uniqueness it must be a characteristic Lie subgroup.
Proof. Clearly if K1 and K2 are compact normal subgroups, then K1K2 is again
a compact normal subgroup. Considering G/K, where K is a compact normal
subgroup of maximal dimension, we may assume that G has no compact normal
subgroup of positive dimension. But every finite normal subgroup of a connected
group is central. Hence the closed group generated by all finite normal subgroups is
contained in the center ofG. The center is an abelian Lie subgroup, i.e. isomorphic
to a product of a vector space Rn, a torus Rm/Zm, a free abelian group Zk and a
finite abelian group. In such a group, there clearly is a unique maximal compact
subgroup (namely the product of the finite group and the torus). It is also normal,
and maximal in G. �
The maximal compact normal subgroup of G0 is a characteristic Lie subgroup
of of G0. It is therefore normal in G and we may mod out by it. We therefore have
shown that every locally compact (compactly generated) group with polynomial
growth admits a quotient by a compact normal subgroup, which is a Lie group G
whose connected component of the identity G0 has polynomial growth and con-
tains no compact normal subgroup. We will now show that a certain co-compact
subgroup of G has the embedding property of Theorem 1.2.
(b) Second we show that, up to passing to a co-compact subgroup, we may
assume that the connected component G0 is solvable. For this purpose, let Q be
the solvable radical of G0, namely the maximal connected normal Lie subgroup
of G0. Note that it is a characteristic subgroup of G0 and therefore normal in G.
Moreover G0/Q is a semisimple Lie group. Since G0 has polynomial growth, it
follows that G0/Q must be compact. Consider the action of G by conjugation on
G0/Q, namely the map φ : G→ Aut(G0/Q). Since G0/Q is compact semisimple,
its group of automorphisms is also a compact Lie group. In particular, the kernel
ker φ is a co-compact subgroup of G.
The connected component of the identity of Aut(G0/Q) is itself semisimple and
hence has finite center. However the image of the connected component (ker φ)0
of ker φ in G0/Q modulo Q is central. Therefore it must be trivial. We have
shown that (ker φ)0 is contained in Q and hence is solvable. Moreover (ker φ)0
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 41
has no compact normal subgroup, because otherwise its maximal normal compact
subgroup, being characteristic in (ker φ)0, would be normal in G (note that (ker φ)0
is normal in G).
Changing G into the co-compact subgroup kerφ, we can therefore assume that
G0 is solvable, of polynomial growth, and has no non trivial compact normal sub-
group. The group G/G0 is discrete, finitely generated, and has polynomial growth.
By Gromov’s theorem, it must be virtually nilpotent, in particular virtually poly-
cyclic.
(c) We finally prove the following proposition.
Proposition 7.2. Let G be a Lie group such that its connected component of the
identity G0 is solvable, admits no compact normal subgroup, and with G/G0 virtu-
ally polycyclic. Then G has a closed co-compact subgroup, which can be embedded
as a closed co-compact subgroup of a connected and simply connected solvable Lie
group.
The proof of this proposition is mainly an application of a theorem of H.C.
Wang, which is a vast generalization of Malcev’s embedding theorem for torsion
free finitely generated nilpotent groups. Wang’s theorem [36] states that any S-
group can be embedded as a closed co-compact subgroup of a simply connected real
linear solvable Lie group with only finitely many connected components. Wang
defines a S-group to be any real Lie group G, which admits a normal subgroup
A such that G/A is finitely generated abelian and A is a torsion-free nilpotent
Lie group whose connected components group is finitely generated. In particular
any S-group has a finite index (hence co-compact) subgroup which embeds as a
co-compact subgroup in a connected and simply connected solvable Lie group.
In order to prove Proposition 7.2, it therefore suffices to establish that G has a
co-compact S-group.
We first recall the following simple fact:
Lemma 7.3. Every closed subgroup F of a connected solvable Lie group S is
topologically finitely generated.
Proof. We argue by induction on the dimension of S. Clearly there is an epi-
morphism π : S → R. By induction hypothesis F ∩ ker π is topologically finitely
generated. The image of F is a subgroup of R. However every subgroup of R
contains either one or two elements, whose subgroup they generate has the same
closure as the original subgroup. We are done. �
Next we show the existence of a nilradical.
Lemma 7.4. Let G be as in Proposition 7.2. Then G has a unique maximal
normal nilpotent subgroup GN .
Proof. The subgroup generated by any two normal nilpotent subgroups of any
given group is itself nilpotent (Fitting’s lemma, see e.g. [30][5.2.8]). Let GN be
the closure of the subgroup generated by all nilpotent subgroups of G. We need
to show that GN is nilpotent. For this it is clearly enough to prove that it is
42 EMMANUEL BREUILLARD
topologically finitely generated (because any finitely generated subgroup of GN is
nilpotent by the remark we just made). Since G/G0 is virtually polycyclic, every
subgroup of it is finitely generated ([29][4.2]). Hence it is enough to prove that
GN ∩G0 is topologically finitely generated. This follows from Lemma 7.3. �
Incidently, we observe that the connected component of the identity (GN )0
coincides with the nilradicalN of G0 (it is the maximal normal nilpotent connected
subgroup of G0).
We now claim the following:
Lemma 7.5. The quotient group G/GN is virtually abelian.
The proof of this lemma is inspired by the proof of the fact, due to Malcev,
that polycyclic groups have a finite index subgroup with nilpotent commutator
subgroup (e.g. see [30][ 15.1.6]).
Proof. We will show that G has a finite index normal subgroup whose commutator
subgroup is nilpotent. This clearly implies the lemma, for this nilpotent subgroup
will be normal, hence contained in GN .
First we observe that the group G admits a finite normal series Gm ≤ Gm−1 ≤
. . . ≤ G1 = G, where each Gi is a closed normal subgroup of G such that Gi/Gi+1
is either finite, or isomorphic to either Zn, Rn or Rn/Zn. This see it pick one of the
Gi’s to be the connected component G0 and then treat G/G0 and G0 separately.
The first follows from the definition of a polycyclic group (G/G0 has a normal
polycyclic subgroup of finite index). While for G0, observe that its nilradical N is
a connected and simply connected nilpotent Lie group and it admits such a series
of characteristic subgroups (pick the central descending series), and G0/N is an
abelian connected Lie group, hence isomorphic to the direct product of a torus
n/Zn and a vector group Rn. The torus part is characteristic in G0/N , hence its
preimage in G0 is normal in G.
The group G acts by conjugation on each partial quotient Qi := Gi/Gi+1. This
yields a map G → Aut(Qi). Now note that in order to prove our lemma, it is
enough to show that for each i, there is a finite index subgroup of G whose com-
mutator subgroup maps to a nilpotent subgroup of Aut(Qi). Indeed, taking the
intersection of those finite index subgroup, we get a finite index normal subgroups
whose commutator subgroup acts nilpotently on each Qi, hence is itself nilpotent
(high enough commutators will all vanish).
Now Aut(Qi) is either finite (if Qi is finite), or isomorphic to GLn(Z) (in case
Qi is either Z
n or Rn/Zn) or to GLn(R) (when Qi ≃ R
n). The image of G in
Aut(Qi) is a solvable subgroup. However, every solvable subgroup of GLn(R)
contains a finite index subgroup, whose commutator subgroup is unipotent (hence
nilpotent). This follows from Kolchin’s theorem for example, that a connected
solvable algebraic subgroup of GLn(C) is triangularizable. We are done. �
In the sequel we assume that G/G0 is torsion-free polycyclic. It is legitimate
to do so in the proof of Proposition 7.2, because every virtually polycyclic group
has a torsion-free polycyclic subgroup of finite index (see e.g. [29][Lemma 4.6]).
We now claim the following:
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 43
Lemma 7.6. GN is torsion-free.
Proof. Since G/G0 is torsion-free, it is enough to prove that GN ∩ G0 is torsion-
free. However the set of torsion elements in GN forms a subgroup of GN (if x, y are
torsion, then xy is too because 〈x, y〉 is nilpotent). Clearly it is a characteristic
subgroup of GN . Hence its intersection with G0 is normal in G0. Taking the
closure, we obtain a nilpotent closed normal subgroup T of G0 which contains a
dense set of torsion elements. Recall that G0 has no normal compact subgroup.
From this it quickly follows that T is trivial, because first it must be discrete (the
connected component T0 is compact and normal in G0), hence finitely generated
(by Lemma 7.3), hence made of torsion elements. But a finitely generated torsion
nilpotent group is finite. Again since G0 has no compact normal subgroup, T must
be trivial, and GN is torsion-free. �
Now observe that the group of connected components of GN , namely GN/(GN )0
is finitely generated. Indeed, since G/G0 is finitely generated (as any polycyclic
group), it is enough to prove that (G0 ∩GN )/(GN )0 is finitely generated, but this
follows from the fact that G0∩GN is topologically finitely generated (Lemma 7.3).
Now we are almost done. Note thatG is topologically finitely generated (Lemma
7.3), therefore so is G/GN . By Lemma 7.5 G/GN is virtually abelian, hence has
a finite index normal subgroup isomorphic to Zn × Rm. It follows that G/GN
has a co-compact subgroup isomorphic to a free abelian group Zn+m. Hence after
changing G by a co-compact subgroup, we get that G is an extension of GN (a
torsion-free nilpotent Lie group with finitely generated group of connected com-
ponents) by a finitely generated free abelian group. Hence it is an S-group in the
terminology of Wang [36]. We apply Wang’s theorem and this ends the proof of
Proposition 7.2.
(d) We can now conclude the proof of Theorem 1.2. By (a) and (b) G has
a quotient by a compact group which admits a co-compact subgroup satisfying
the assumptions of Proposition 7.2. Hence to conclude the proof it only remains
to verify that the simply connected solvable Lie group in which a co-compact
subgroup of G/K embeds has polynomial growth (i.e. is of type (R)). But this
follows from the following lemma (see [21][Thm. I.2]).
Lemma 7.7. Let G be a locally compact group. Then G has polynomial growth if
and only if some (resp. any) co-compact subgroup of it has polynomial growth.
Proof. First one checks that G is compactly generated if and only if some (resp.
any) co-compact subgroup is. This is by the same argument which shows that
finite index subgroups of a finitely generated group are finitely generated. In
particular, if Ω is a compact symmetric generating set of G and H is a co-compact
subgroup, then there is n0 ∈ N such that Ω
n0H = G. Then H ∩ Ω3n0 generates
If G has polynomial growth and H is any compactly generated closed subgroup,
then H has polynomial growth. Indeed (see [21][Thm I.2]), if ΩH denotes a com-
pact generating set for H, and K a compact neighborhood of the identity in G,
44 EMMANUEL BREUILLARD
volG(K)volH(Ω
H) ≤ volH(KK
−1 ∩H)volG(Ω
This inequality follows by integrating over a left Haar measure of G the function
φ(x) :=
−1x)dh, where dh is a left Haar measure on H. This integral
equals the left handside of the above displayed equation, while it is pointwise
bounded by volH(xK
−1 ∩H) inside HK and by zero outside HK.
In the other direction, if H has polynomial growth, then G also has, because
one can write Ωn ⊂ ΩnHK for some compact generating set ΩH of H and some
compact neighborhood K of the identity in G (see Proposition 4.4). Then the
result follows from the following inequality
volH(ΩH)volG(Ω
HK) ≤ volH(Ω
H )volG(Ω
H K),
which itself is a direct consequence of the fact that the function
ψ(x) :=
(h−1x)dh,
where dh is a left Haar measure onH, satisfies
ψ(x)dx = volH(Ω
H )volG(Ω
on the one hand and is bounded below by volH(ΩH) for every x ∈ Ω
HK on the
other hand. �
Note that the above proof would be slightly easier if we already knew that both
G and H were unimodular, in which case G/H has an invariant measure. But
we know this only a posteriori, because the polynomial growth condition implies
unimodularity ([21]).
Similar considerations show that G has polynomial growth if and only if G/K
has polynomial growth, given any normal compact subgroup K (e.g. see [21]).
We end this paragraph with a remark and an example, which we mentioned in
the Introduction.
Remark 7.8 (Discrete subgroups are virtually nilpotent). Suppose Γ is a discrete
subgroup of a connected solvable Lie group of type (R) (i.e. of polynomial growth).
Then Γ is virtually nilpotent. Indeed, a similar argument as in Lemma 7.3 shows
that every subgroup of Γ is finitely generated. It follows that Γ is polycyclic. How-
ever Wolf [37] proved that polycyclic groups with polynomial growth are virtually
nilpotent.
Example 7.9 (A group with no nilpotent co-compact subgroup). Let G be the
connected solvable Lie group G = R ⋉ (R2 × R2), where R acts as a dense one-
parameter subgroup of SO(2,R) × SO(2,R). Then G is of type (R). It has no
compact subgroup. And it has no nilpotent co-compact subgroup. Indeed suppose H
is a closed co-compact nilpotent subgroup. Then it has a non-trivial center. Hence
there is a non identity element whose centralizer is co-compact in G. However a
simple examination of the possible centralizers of elements of G shows that none
of them is co-compact.
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 45
7.2. Proof of Corollary 1.6 and Theorem 1.1. Let G be an arbitrary locally
compact group of polynomial growth and ρ a periodic pseudodistance on G.
Claim 1: Corollary 1.6 holds for a co-compact subgroup H of G, if and only
if it holds for G. By Lemma 7.7, the groups G and H are unimodular, and hence
G/H bears a G-invariant Radon measure volG/H , which is finite since H is co-
compact. Now let F be a bounded Borel fundamental domain for H inside G. And
let ρ be the periodic pseudodistance on G induced by the restriction of ρ to H, that
is ρ(x, y) := ρ(hx, hy) where hx is the unique element of H such that x ∈ hxF. By
4.2 (1) and (4), ρ and ρ are at a bounded distance from each other. In particular,
Bρ(r−C) ⊂ Bρ(r) ⊂ Bρ(r+C). Hence if the limit (3) holds for ρ, it also holds for
ρ with the same limit. However, Bρ(r) = {x ∈ G, ρ(e, hx) ≤ r} = BρH (r)F where
ρH is the restriction of ρ to H. Hence volG(Bρ(r)) = volH(BρH (r)) · volG/H(F ).
By 4.2 (4), ρH is a periodic pseudodistance on H. So the result holds for (H, ρH)
if and only if it holds for (G, ρ). Conversely, if ρ0 is a periodic pseudodistance
on H, then ρ0(x, y) := ρ0(hx, hy) is a periodic pseudodistance on G, hence again
volG(Bρ0(r)) = volH(Bρ0(r)) · volG(F ) and the result will hold for (H, ρ0) if and
only if it holds for (G, ρ0).
Claim 2: If Corollary 1.6 holds for G/K, where K is some compact normal
subgroup, then it holds for G as well. Indeed, if ρ is a periodic pseudodistance on
G, then the K-average ρK , as defined in (26), is at a bounded distance from G
according to Lemma 4.7. Now ρK induces a periodic pseudodistance ρK on G/K
and BρK (r) = BρK (r)K. Hence, volG(BρK (r)) = volG/K(BρK (r)) · volK(K). And
if the limit (3) holds for ρK , it also holds for ρK , hence for ρ too.
Thus the discussion above combined with Theorem 1.2 reduces Corollary 1.6 to
the case when G is simply connected and solvable, which was treated in Section 5
and 6. �
7.3. Proof of Proposition 1.3 and Corollary 1.9.
Proof of Proposition 1.3. We say that two metric spaces (X, dX ) and (Y, dY ) are
at a bounded distance if they are (1, C)-quasi-isometric for some finite C. This is
an equivalence relation. Now if ρ is H-periodic with H co-compact, then (G, ρ)
is at a bounded distance from (H, ρ|H). Hence we may assume that H = G, i.e.
that ρ is left invariant on G.
Now Theorem 1.2 gives the existence of a normal compact subgroup K, a co-
compact subgroup H containing K and a simply connected solvable Lie group S
such that H/K is isomorphic to a co-compact subgroup of S.
Lemma 4.7 shows that (G, ρ) is at a bounded distance from (G, ρK), where ρK
is defined as in (26). Now ρK induces a left invariant periodic metric on G/K,
and (G/K, ρK ) is clearly at a bounded distance from (G, ρK). Now by 4.2, its
restriction to H/K is at a bounded distance and is left invariant. Now we set
ρS(s1, s2) = ρ
K(h1, h2), where (given a bounded fundamental domain F for the
46 EMMANUEL BREUILLARD
left action of H/K on S) hi is the unique element of H/K such that si ∈ hiF .
Clearly then (S, ρS) is at a bounded distance from (H/K, ρ
K). We are done. �
We note that our construction of S here depends on the stabilizer of ρ in G.
Certainly not every choice of Lie shadow can be used for all periodic metrics (think
that R3 is a Lie shadow of the universal cover of the group of motions of the plane).
Perhaps a single one can be chosen for all, but we have not checked that.
Proof of Corollary 1.9. Proposition 1.3 reduces the proof to a periodic metric ρ
on a simply connected solvable Lie group S. Let d∞ the subFinsler metric on S
(left invariant for the graded nilshadow group structure SN ) as given by Theorem
1.4. Let {δt}t is the group of dilations in the graded nilshadow SN of S as defined
in Section 3. By definition of the pointed Gromov-Hausdorff topology (see [18]),
it is enough to prove the
Claim. The following quantity
ρ(s1, s2)− d∞(δ 1
(s1), δ 1
(s2))|
converges to zero as n tends to +∞ uniformly for all s1, s2 in a ball of radius O(n)
for the metric ρ.
Now this follows in three steps. First ρ is at a bounded distance from its
restriction to the (co-compact) stabilizer H of ρ (cf. 4.2 (1), 4.2 (4)). Then
for h1, h2 ∈ H, we can write ρ(h1, h2) = ρ(e, h
1 h2). However Proposition 5.1
implies the existence of another periodic distance ρK on S, which is invariant
under left translations by elements of H for both the original Lie structure and
the nilshadow Lie structure on S, such that
ρ(e,x)
ρK(e,x)
tends to 1 as x tends to
∞. Hence ρK(e, h
1 h2) = ρK(h1, h2) = ρK(e, h
1 h2), where ∗ is the nilshadow
product on S. Hence | 1
ρ(h1, h2)−
ρK(e, h
1 h2)| tends to zero uniformly as h1
and h2 vary in a ball of radius O(n) for ρ.
Finally Theorem 6.2 implies that | 1
ρK(e, h
1 h2) −
d∞(e, h
1 h2)| tends to
zero and the claim follows, as one verifies from the Campbell Hausdorff formula
by comparing (11) and (12) as we did in (35), that
|d∞(δ 1
(h1), δ 1
(h2))− d∞(e, δ 1
(h∗−11 h2)|
converges to zero.
The fact that the graded nilpotent Lie group does not depend (up to isomor-
phism) on the periodic metric ρ but only on the locally compact group G fol-
lows from Pansu’s theorem [28] that if two Carnot groups (i.e. a graded simply
connected nilpotent Lie group endowed with left-invariant subRiemannian metric
induced by a norm on a supplementary subspace to the commutator subalgebra)
are bi-Lipschitz, the underlying Lie groups must be isomorphic. This deep fact
relies on Pansu’s generalized Rademacher theorem, see [28]. Indeed, two differ-
ent periodic metrics ρ1 and ρ2 on G are quasi-isometric (see Proposition 4.4),
and hence their asymptotic cones are bi-Lipschitz (and bi-Lipschitz to any Carnot
group metric on the same graded group, by (13)). �
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 47
8. Coarsely geodesic distances and speed of convergence
Under no further assumption on the periodic pseudodistance ρ, the speed of
convergence in the volume asymptotics can be made arbitrarily small. This is
easily seen if we consider examples of the following type: define ρ(x, y) = |x−y|+
|x− y|α on R where α ∈ (0, 1). It is periodic and vol(Bρ(t)) = t− t
α + o(tα).
However, many natural examples of periodic metrics, such as word metrics or
Riemannian metrics, are in fact coarsely geodesic. A pseudodistance on G is said
to be coarsely geodesic, if there is a constant C > 0 such that any two points can
be connected by a C-coarse geodesic, that is, for any x, y ∈ G there is a map
g : [0, t] → G with t = ρ(x, y), g(0) = x and g(t) = y, such that
|ρ(g(u), g(v)) − |u− v|| ≤ C
for all u, v ∈ [0, t].
This is a stronger requirement than to say that ρ is asymptotically geodesic
(see 21). This notion is invariant under coarse isometry. In the case when G
is abelian, D. Burago [6] proved the beautiful fact that any coarsely geodesic
periodic metric on G is at a bounded distance from its asymptotic norm. In
particular volG(Bρ(t)) = c · t
d+O(td−1) in this case. In the remarkable paper [32],
M. Stoll proved that such an error term in O(td−1) holds for any finitely generated
2-step nilpotent group. Whether O(td−1) is the right error term for any finitely
generated nilpotent group remains an open question.
The example below shows on the contrary that in an arbitrary Lie group of
polynomial growth no universal error term can be expected.
Theorem 8.1. Let εn > 0 be an arbitrary sequence of positive numbers tending
to 0. Then there exists a group G of polynomial growth of degree 3 and a compact
generating set Ω in G and c > 0 such that
volG(Ω
c · n3
≤ 1− εn
holds for infinitely many n, although 1
volG(Ω
n) → 1 as n→ +∞.
The example we give below is a semi-direct product of Z by R2 and the metric
is a word metric. However, many similar examples can be constructed as soon as
the map T : G → K defined in Paragraph 5.1 in not onto. For example, one can
consider left invariant Riemannian metrics on G = R · (R2 × R2) where R acts
by via a dense one-parameter subgroup of the 2-torus S1 × S1. Incidently, this
group G is known as the Mautner group and is an example of a wild group in
representation theory.
8.1. An example with arbitrarily small speed. In this paragraph we describe
the example of Theorem 8.1. Let Gα = Z · R
2 where the action of Z is given by
the rotation Rα of angle πα, α ∈ [0, 1). The group Gα is quasi-isometric to R
and hence of polynomial growth of order 3 and it is co-compact in the analogously
defined Lie group G̃α = R ⋉ R
2. Its nilshadow is isomorphic to R3. The point is
48 EMMANUEL BREUILLARD
Figure 2. The union of the two cones, with basis the disc of radius
2, represents the limit shape of the balls Ωn in the group Z ⋉ R2,
where Z acts by an irrational rotation, with generating set Ω =
{(±1, 0, 0)} ∪ {(0, x1, x2),
x21 + x
2 ≤ 1}.
that if α is a suitably chosen Liouville number, then the balls in Gα will not be
well approximated by the limit norm balls.
Elements of Gα are written (k, x) where k ∈ Z and x ∈ R
2. Let ‖x‖
x21+x
be a Euclidean norm on R2, and let Ω be the symmetric compact generating set
given by {(±1, 0)}∪{(0, x), ‖x‖ ≤ 1}. It induces a word metric ρΩ on G. It follows
from Theorem 1.4 and the definition of the asymptotic norm that ρΩ(e, (k, x)) is
asymptotic to the norm on R3 given by ρ0(e, (k, x)) := |k| + ‖x‖0 where ‖x‖0 is
the rotation invariant norm on R2 defined by ‖x‖
(x21 + x
2). The unit ball of
‖·‖0 is the convex hull of the union of all images of the unit ball of ‖·‖ under all
rotations Rkα, k ∈ Z.
We are going to choose α as a suitable Liouville number so that (36) holds. Let
δn = (4εn)
1/3 and choose α so that the following holds for infinitely many n’s:
(37) d(kα,Z +
) ≥ 2δn
for all k ∈ Z, |k| ≤ n. This is easily seen to be possible if we choose α of the form∑
1/3ni for some suitable lacunary increasing sequence of (ni)i.
Note that, since ‖x‖0 ≥ ‖x‖ , we have ρΩ ≥ ρ0. Let Sn be the piece of R
2 defined
by Sn = {|θ| ≤ δn} where θ is the angle between the point x and the vertical axis
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 49
Re2. We claim that if x ∈ Sn, ρ0(e, (k, x)) ≤ n and n satisfies (37), then
ρΩ(e, (k, x)) ≥ |k|+ (1 +
) ‖x‖0
It follows easily from the claim that volG(Ω
n) ≤ (1− εn) · volG(Bρ0(n)). Moreover
volG(Bρ0(n)) = c · n
3 + O(n2), where c = 4π
if volG is given by the Lebesgue
measure.
Proof of claim. Here is the idea to prove the claim. To find a short path between
the identity and a point on the vertical axis, we have to rotate by a Rkα such that
kα is close to 1
, hence go up from (0, 0) to (k, 0) first, thus making the vertical
direction shorter. However if (37) holds, the vertical direction cannot be made as
short as it could after rotation by any of the Rkα with |k| ≤ n.
Note that if ρ0(e, (k, x)) ≤ n then |k| ≤ n and ρΩ(e, (k, x)) ≥ |k|+inf
‖Rkiαxi‖
where the infimum is taken over all paths x1, ..., xN such that x =
xi and all
rotations Rkiα with |ki| ≤ n. Note that if δn is small enough and (37) holds
then for every x ∈ Sn we have ‖Rkαx‖ ≥ (1 + δ
n) ‖x‖0 . On the other hand
‖x‖0 =
‖xi‖0 cos(θi) where θi is the angle between xi and the x. Hence
‖Rkiαxi‖ ≥
|θi|≤δn
‖Rkiαxi‖+
|θi|>δn
‖Rkiαxi‖
≥ (1 + δ2n)
|θi|≤δn
‖xi‖0 cos(θi) +
cos(δn)
|θi|>δn
‖xi‖0 cos(θi)
≥ (1 +
) · ‖x‖0
8.2. Limit shape for more general word metrics on solvable Lie groups
of polynomial growth. The determination of the limit shape of the word metric
in Paragraph 8.1 was possible due to the rather simple nature of the generating
set. In general, using the identity (see (1))
(38) ω1 · . . . · ωm = ω1 ∗ (T (ω1)ω2) ∗ . . . ∗ (T (ωm−1 · . . . · ω1)ωm)
it is easy to check that the unit ball of the limit norm ‖ · ‖∞ inducing the limit
subFinsler metric d∞ on the nilshadow associated to a given word metric with
generating set Ω is contained in the K-orbit of the convex hull of the projection
of Ω to the abelianized nilshadow, namely the convex hull of K · π1(Ω).
In the example of Paragraph 8.1, we even had equality between the two. How-
ever this is not the case in general. For example, the limit shape is always K-
invariant, but clearly the limit shape associated to a generating set Ω coincides
with the one associated with a conjugate gΩg−1 of it, while the convex hull of the
respective K-orbits may not be the same.
Of course if the generating set Ω is K-invariant to begin with, then Ωn = Ω∗n
and we are back in the nilpotent case, where we know that the unit ball of the
limit norm is just the convex hull of the projection of the generating set to the
abelianization. In general however it is a challenging problem to determine the
50 EMMANUEL BREUILLARD
precise asymptotic shape of a word metric on a general solvable Lie group with
polynomial growth, and there seems to be no simple description analogous to what
we have in the nilpotent case.
Even in the above example Gα = Z⋉αR
2, or in the universal cover of the group
of motions of the plane (in which Gα embeds co-compactly), it is not that simple.
In general the shape is determined by solving an optimization problem in which
one has to find the path which maximizes the coordinates of the endpoint. In
order to illustrate this, we treat without proof the following simple example.
Suppose Ω is a symmetric compact neighborhood of the identity in Gα = Z⋉αR
of the form Ω = (0,Ω0) ∪ (1,Ω1) ∪ (1,Ω1)
−1, where Ω0,Ω1 ⊂ R
2. Then the
limit shape of the word metric ρΩ associated to Ω is the solid body (rotationally
symmetric around the vertical axis as in Figure 2) made of two copies (upper and
lower) of a truncated cone with base a disc on (0,R2) of radius max{r0, r1} and
top (resp. bottom) a disc on the plane (1,R2) (resp. (−1,R2)) of radius r2, where
the radii are given by
r0 = max{‖x‖, x ∈ Ω0}, r1 =
diam(Ω1),
where diam(Ω1) is the diameter of Ω1 and r2 is given by the integral
(39) r2 =
max{πθ(Ω1)}
where πθ(Ω1) is the orthogonal projection on the x-axis of image of Ω1 ⊂ R
2 by a
rotation of angle θ around the origin. It is indeed convex (note that r2 ≤ r1).
For example if Ω1 is made of only one point, then the limit shape is the same
as in the previous paragraph and as in Figure 2, namely two copies of a cone.
However if Ω1 is made of two points {a, b}, then the upper part of the limit shape
will be a truncated cone with an upper disc of radius r2 =
‖a−b‖
(which is the
result of the computation of the above integral).
Let us briefly explain the formula (39). A path of length n reaching the highest
z-coordinate in Gα is a word of the form (1, ω1) · . . . · (1, ωn), with ωi ∈ Ω1. By
(38) this word equals
Ri−1α ωi).
Here ωi can take any value in Ω1. In order to maximize the norm of the second
coordinate, or equivalently (by rotation invariance) its x-coordinate, one has to
choose ωi ∈ Ω1 at each stage in such a way that the x-coordinate of R
α ωi is
maximized. Formula (39) now follows from the fact that {Ri−1α }1≤i≤n becomes
equidistributed in SO(2,R) as n tends to infinity.
In order to show that max{r0, r1} is the radius of the base disc and more
generally that the limit shape is no bigger than this double truncated cone, one
needs to argue further by considering all possible paths of the form (ε1, ω1) · . . . ·
(εn, ωn) where εi ∈ {0,±1} and
εi is prescribed.
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 51
8.3. Bounded distance versus asymptotic metrics. In this paragraph we an-
swer a question of D. Burago and G. Margulis (see [7]). Based on the abelian case
and the reductive case (Abels-Margulis [1]), Burago and Margulis had conjectured
that every two asymptotic word metrics should be at a bounded distance. We give
below a counterexample to this. We first give an example (A) of a nilpotent Lie
group endowed with two left invariant subFinsler metrics d∞ and d
∞ that are
asymptotic to each other, i.e. d∞(e, x)/d
∞(e, x) → 1 as x → ∞ but such that
|d∞(e, x)− d
∞(e, x)| is not uniformly bounded. Then we exhibit (B) a word met-
ric that is not at a bounded distance from any homogeneous quasi-norm. Finally
these examples also yield (C) two word metrics ρ1 and ρ2 on the same finitely
generated nilpotent group which are asymptotic but not at a bounded distance.
Note that the group Gα with ρ0 and ρΩ from the last paragraph also provides
an example of asymptotic metrics which are not at a bounded distance (but this
group was not discrete).
(A) Let N = R × H3(R) where H3 is classical Heisenberg group and Γ =
Z ×H3(Z) a lattice in N . In the Lie algebra n = RV ⊕ h3 we pick two different
supplementary subspaces of [n, n] = RZ, i.e. m1 = span{V,X, Y } and m
span{V + Z,X, Y }, where h3 is the Lie algebra of H3(R) spanned by X,Y and
Z = [X,Y ].We consider the L1-norm on m1 (resp. m
1) corresponding to the basis
(V,X, Y ) (resp. (V + Z,X, Y )). Both norms induce the same norm on n/[n, n].
They give rise to left invariant Carnot-Caratheodory Finsler metrics on N , say
d∞ (resp. d
∞). We use the coordinates (v, x, y, z) = exp(vV + xX + yY + zZ).
According to Remark (2) after Theorem 6.2, d∞ and d
∞ are asymptotic. Let
us show that they are not at a bounded distance. First observe that, since V
is central, d∞(e, (v; (x, y, z))) = |v| + dH3(e, (x, y, z)) where dH3 is the Carnot-
Caratheodory Finsler metric on H3(R) defined by the standard L
1-norm on the
span{X,Y }. Similarly d′∞(e, (v; (x, y, z))) = |v| + dH3(e, (x, y, z − v))). If d∞ and
d′∞ were at a bounded distance, we would have a C > 0 such that for all t > 0
|d∞(e, (t; (0, 0, t))) − t| ≤ C
Hence |dH3(e, (0, 0, t))| ≤ C, which is a contradiction.
(B) Now let Ω = {(1; (0, 0, 1))±1 , (1; (0, 0,−1))±1 , (0; (1, 0, 0))±1 , (0; (0, 1, 0))±1}
be a generating set for Γ and ρΩ the word metric associated to it. Let | · | be
a homogeneous quasi-norm on N which is at a bounded distance from ρΩ, i.e.
|ρΩ(e, g) − |g|| is bounded. Then | · | is asymptotic to ρΩ, hence is equal to the
Carnot-Caratheodory Finsler metric d asymptotic to ρΩ and homogeneous with
respect to the same one parameter group of dilations {δt}t>0. Let m1 = {v ∈ n,
δt(v) = tv}. Then d is induced by some norm ‖·‖0 on m1, whose unit ball is
given, according to Theorem 1.4 by the convex hull of the projections to m1 of the
generators in Ω. There is a unique vector in m1 of the form V +z0Z. Its ‖·‖0-norm
is 1 and d(e, (1; (0, 0, z0))) = 1. However d(e, (v; (x, y, z))) = |v|+ dH3(e, (x, y, z −
vz0)). Since ρΩ(e, (n; (0, 0, n))) = n, we get
d(e, (n; (0, 0, n))) − ρΩ(e, (n; (0, 0, n))) = dH3(e, (0, 0, n(1 − z0)))
52 EMMANUEL BREUILLARD
If this is bounded, this forces z0 = 1. But we can repeat the same argument with
(n; (0, 0,−n)) which would force z0 = −1. A contradiction.
(C) Let now Ω2 := {(1; (0, 0, 0))
±1 , (0; (1, 0, 0))±1 , (0; (0, 1, 0))±1} and ρΩ2 the
associated word metric on Γ. Then again ρΩ and ρΩ2 are asymptotic by Theorem
6.2 because the convex hull of their projection modulo the z-coordinate coincide.
However ρΩ2 is a product metric, namely we have ρΩ2(e, (v; (x, y, z))) = |v| +
ρ(e, (x, y, z)), where ρ is the word metric on the discrete Heisenberg group H3(Z)
with standard generators {(1, 0, 0)±1, (0, 1, 0)±1}. In particular
ρΩ(e, (n; (0, 0, n))) − ρΩ2(e, (n; (0, 0, n))) = ρ(e, (0, 0, n))
which is unbounded.
Remark 8.2 (An abnormal geodesic). We refer the reader to [9] for more on
these examples. In particular we show there that ρ1 and ρ2 above are not (1, C)-
quasi-isometric for any C > 0. The key phenomenon behind this example is
the presence of an abnormal geodesic (see [25]), namely the one-parameter group
{(t; (0, 0, 0))}t .
Remark 8.3 (Speed of convergence in the nilpotent case). The slow speed phe-
nomenon in Theorem 8.1 relied crucially on the presence of a non-trivial semisim-
ple part in Gα ; this doesn’t occur in nilpotent groups. In [9], we show that for
word metrics on finitely generated nilpotent groups, the convergence in Theorem
6.2 has a polynomial speed with an error term at least as good as O(d∞(e, x)
3r ),
where r is the nilpotency class. We conjecture there that the optimal exponent is 1
This involves refining quantitatively the estimates of the above proof of Theorem
9. Appendix: the Heisenberg groups
Here we show how to compute the asymptotic shape of balls in the Heisenberg
groups H3(Z) and H5(Z) and their volume, thus giving another approach to the
main result of Stoll [33]. The leading term for the growth of H3(Z) is rational
for all generating sets (Prop. 9.1 below), whereas in H5(Z) with its standard
generating set, it is transcendental. This explains how our Figure 1 was made
(compare with the odd [22] Fig. 1).
9.1. 3-dim Heisenberg group. Let us first consider the Heisenberg group
H3(Z) = 〈a, b|[a, [a, b]] = [b, [a, b]] = 1〉 .
We see it as the lattice generated by a = exp(X) and b = exp(Y ) in the real
Heisenberg group H3(R) with Lie algebra h3 generated by X,Y and spanned by
X,Y,Z = [X,Y ]. Let ρΩ be the standard word metric on H3(Z) associated to
the generating set Ω = {a±1, b±1}. According to Theorem 1.4, the limit shape of
the n-ball Ωn in H3(Z) coincides with the unit ball C3 = {g ∈ H3(R), d∞(e, g) ≤
1} for the Carnot-Caratheodory metric d∞ induced on H3(R) by the ℓ
1-norm
‖xX + yY ‖0 = |x|+ |y| on m1 = span{X,Y } ⊂ h3.
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 53
Computing this unit ball is a rather simple task. Exchanging the roles of X
and Y , we see that C3 is invariant under the reflection z 7→ −z. Then clearly C3 is
of the form {xX + yY + zZ, with |x|+ |y| ≤ 1 and |z| ≤ z(x, y)}. Changing X to
−X and Y to −Y, we get the symmetries z(x, y) = z(−x, y) = z(x,−y) = z(y, x).
Hence when determining z(x, y), we may assume 0 ≤ y ≤ x ≤ 1, x+ y ≤ 1.
The following well known observation is crucial for computing z(x, y). If ξ(t) is a
horizontal path in H3(R) starting from id, then ξ(t) = exp(x(t)X+y(t)Y +z(t)Z),
where ξ′(t) = x(t)X + y(t)Y and z(t) is the “balayage” area of the between the
path {x(s)X + y(s)Y }0≤s≤t and the chord joining 0 to x(t)X + y(t)Y.
Therefore, z(x, y) is given by the solution to the “Dido isoperimetric problem”
(see [25]): find a path in the X,Y -plane between 0 and xX + yY of ‖·‖0-length 1
that maximizes the “balayage area”. Since ‖·‖0 is the ℓ
1-norm in the X,Y -plane,
as is well-known (see [8]), such extremal curves are given by arcs of square with
sides parallel to the X,Y -axes. There is therefore a dichotomy: the arc of square
has either 3 or 4 sides (it may have 1 or 2 sides, but these are included are limiting
cases of the previous ones).
If there are 3 sides, they have length ℓ, x and y + ℓ with y + ℓ ≤ x. Hence
1 = ℓ+ x+ y + ℓ and z(x, y) = ℓx+ 1
xy. Therefore this occurs when y ≤ 3x − 1
and we then have z(x, y) =
x(1−x)
If there are 4 sides, they have length ℓ, x+ u, y + ℓ and u, with ℓ+ y = x+ u.
Hence 1 = 2ℓ + 2u + x + y and z(x, y) = (ℓ + y)(x + u) −
. This occurs when
y ≥ 3x− 1 and we then have z(x, y) =
(1+x+y)2
Hence if 0 ≤ y ≤ x ≤ 1 and x+ y ≤ 1
(40) z(x, y) = 1y≤3x−1
x(1− x)
+ 1y>3x−1
(1 + x+ y)2
The unit ball C3 drawn in Figure 1 is the solid body C3 = {xX + yY + zZ, with
|x|+ |y| ≤ 1 and |z| ≤ z(x, y)}.
A simple calculation shows that vol(C3) =
in the Lebesgue measure dxdydz.
Since H3(Z) is easily seen to have co-volume 1 for this Haar measure on H3(R)
(actually {xX+yY +zZ, x ∈ [0, 1), y ∈ [0, 1), z ∈ [0, 1)} is a fundamental domain),
it follows that
#(Ωn)
= vol(C3) =
We thus recover a well-known result (see [4], [31] where even the full growth series
is computed and shown to be rational).
One can also determine exactly which points of the sphere ∂C3 are joined to id
by a unique geodesic horizontal path. The reader will easily check that uniqueness
fails exactly at the points (x, y,±z(x, y)) with |x| < 1
and y = 0, or |y| < 1
x = 0, or else at the points (x, y, z) with |x|+ |y| = 1 and |z| < z(x, y).
The above method also yields the following result.
54 EMMANUEL BREUILLARD
Proposition 9.1. Let Ω be any symmetric generating set for H3(Z). Then the
leading coefficient in #(Ωn) is rational, i.e.
#(Ωn)
is a rational number.
Proof. We only sketch the proof here. We can apply the method above and com-
pute r as the volume of the unit CC-ball C(Ω) of the limit CC-metric d∞ de-
fined in Theorem 1.4. Since we know what is the norm ‖·‖ in the (x, y)-plane
m1 = span 〈X,Y 〉 that generates d∞ (it is the polygonal norm given by the con-
vex hull of the points of Ω), we can compute C(Ω) explicitly. We need to know
the solution to Dido’s isoperimetric problem for ‖·‖ in m1, and as is well known
(see [8]) it is given by polygonal lines from the dual polygon rotated by 90◦. Since
the polygon defining ‖·‖ is made of rational lines (points in Ω have integer coordi-
nates), any vector with rational coordinates has rational ‖·‖-length, and the dual
polygon is also rational. The equations defining z(x, y) will therefore have only
rational coefficients, and z(x, y) will be piecewisely given by a rational quadratic
form in x and y, where the pieces are rational triangles in the (x, y)-plane. The
total volume of C(Ω) will therefore be rational. �
9.2. 5-dim Heisenberg group. The Heisenberg groupH5(Z) is the group gener-
ated by a1, b1, a2, b2,c with relations c = [a1, b1] = [a2, b2], a1 and b1 commute with
a2 and b2 and c is central. Let Ω = {a
i , b
i , i = 1, 2}. Let us describe the limit
shape of Ωn. Again, we see H5(Z) as a lattice of co-volume 1 in the group H5(R)
with Lie algebra h5 spanned by X1, Y1X2, Y2 and Z = [Xi, Yi]. By Theorem 1.4,
the limit shape is the unit ball C5 for the Carnot-Caratheodory metric on H5(R)
induced by the ℓ1-norm ‖x1X1 + y1Y1 + x2X2 + y2Y2‖0 = |x1|+ |y1|+ |x2|+ |y2|.
Since X1, Y1 commute with X2, Y2, in any piecewise linear horizontal path in
H5(R), we can swap the pieces tangent to X1 or Y1 with those tangent to X2 or
Y2 without changing the end point of the path. Therefore if ξ(t) = exp(x1(t)X1 +
y1(t)Y1 + x2(t)X2 + y2(t)Y2 + z(t)Z) is a horizontal path, then z(t) = z1(t) +
z2(t), where zi(t), i = 1, 2, is the “balayage area” of the plane curve {xi(s)Xi +
yi(s)Yi}0≤s≤t.
Since, just like for H3(Z), we know the curve maximizing this area, we can
compute the unit ball C5 explicitly. In exponential coordinates it will take the
form C5 = {exp(x1X1 + y1Y1 + x2X2 + y2Y2 + zZ), |x1|+ |y1|+ |x2|+ |y2| ≤ 1 and
|z| ≤ z(x1, y1, x2, y2)}. Then z(x1, y1, x2, y2) = sup0≤t≤1{zt(x1, y1)+ z1−t(x2, y2)},
where zt(x, y) is the maximum “balayage area” of a path of length t between 0
and xX+yY. It is easy to see that zt(x, y) = t
2z(x/t, y/t) where z is given by (40).
Hence zt is a piecewise quadratic function of t. Again z(x1, y1, x2, y2) is invariant
under changing the signs of the xi,yi’s, and swapping x and y, or else swapping
1 and 2. We may thus assume that the xi,yi’s lie in D = {0 ≤ yi ≤ xi ≤ 1 and
x1+y1+x2+y2 ≤ 1, and x2−y2 ≥ x1−y1}.We may therefore determine explicitly
the supremum z(x1, y1, x2, y2), which after some straightforward calculations takes
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 55
on D the following form:
z(x1, y1, x2, y2) = 1Amax{d1, d2}+ 1B max{d1, c1}+ 1C max{c1, c2}
where d1 =
(1−x1− y1−x2), c1 =
(1+x1+ y1−x2− y2)
x2y2−x1y1
and d2 and c2 are obtained from d1 and c1 by swapping the indices 1 and 2. The
sets A,B and C form the following partition of D : A = D ∩ {m ≤ x1 − y1},
B = D ∩ {x1 − y1 < m < x2 − y2} and C = D ∩ {x2 − y2 ≤ m}, where m =
(1 − x1 − x2 − y1 − y2)/2.
Since C5 has such an explicit form, it is possible to compute its volume. The fact
that z(x1, y1, x2, y2) is piecewisely given by the maximum of two quadratic forms
makes the computation of the integral somewhat cumbersome but tractable. Our
equations coincide (fortunately!) with those of Stoll (appendix of [33]), where he
computed the main term of the asymptotics of #(Ωn) by a different method. Stoll
did calculate that integral and obtained
#(Ωn)
= vol(C5) =
21870
log(2)
32805
which is transcendental. It is also easy to see by this method that if we change
the generating set to Ω0 = {a
2 }, then we get a rational volume. Hence
the rationality of the growth series of H5(Z) depends on the choice of generating
set, which is Stoll’s theorem.
One advantage of our method is that it can also apply to fancier generating
sets. The case of Heisenberg groups of higher dimension with the standard gen-
erating set is analogous: the function z({xi}, {yi}) is again piecewisely defined as
the maximum of finitely many explicit quadratic forms on a linear partition of the
ℓ1-unit ball
|xi|+ |yi| ≤ 1.
Acknowledgments. I would like to thank Amos Nevo for his hospitality at the
Technion of Haifa in December 2005, where part of this work was conducted, and
for triggering my interest in this problem by showing me the possible implications
of Theorem 1.1 to Ergodic Theory. My thanks are also due to V. Losert for
pointing out an inaccuracy in my first proof of Theorem 1.2 and for his other
remarks on the manuscript. Finally I thank Y. de Cornulier, M. Duchin, E. Le
Donne, Y. Guivarc’h, A. Mohammadi, P. Pansu and R. Tessera for several useful
conversations.
References
[1] H. Abels and G. Margulis. Coarsely geodesic metrics on reductive groups. In Modern dy-
namical systems and applications, pages 163–183. Cambridge Univ. Press, Cambridge, 2004.
[2] L. Auslander and L. W. Green, G-induced flows, Amer. J. Math. 88 (1966), 43–60.
[3] H. Bass, The degree of polynomial growth of finitely generated nilpotent groups, Proc. London
Math. Soc. (3) 25 (1972), 603–614.
[4] M. Benson, On the rational growth of virtually nilpotent groups, In: S.M. Gersten, Stallings
(eds), Combinatorial Group Theory and Topology, Ann. Math. Studies, vol 111, PUP (1987).
[5] V. N. Berestovskĭı. Homogeneous manifolds with an intrinsic metric I, Sibirsk. Mat. Zh.,
29(6):17–29, 1988.
56 EMMANUEL BREUILLARD
[6] D. Yu. Burago, Periodic metrics, in Representation Theory and Dynamical Systems, 205–
210, Adv. Soviet Math. 9 Amer. Math. Soc. (1992).
[7] D. Yu. Burago, G.A. Margulis, Problem Session, in Oberwolfach Report, Geometric Group
Theory, Hyperbolic Dynamics and Symplectic Geometry, 2006.
[8] H. Busemann, The isoperimetric problem in the Minkowski plane, AJM 69 (1947), 863–871.
[9] E. Breuillard and E. Le Donne, On the rate of convergence to the asymptotic cone for
nilpotent groups and subFinsler geometry, preprint 2012.
[10] A. Calderon, A general ergodic theorem, Annals of Math. 57 (1953), pp. 182-191.
[11] T. H. Colding and W. P. Minicozzi, II. Liouville theorems for harmonic sections and appli-
cations. Comm. Pure Appl. Math., 51(2):113–138, 1998.
[12] L. Corwin and F. P. Greenleaf, Representations of nilpotent Lie groups and their applications,
Part I, Basic theory and examples, Cambridge Univ. Press, (1990) 269pp.
[13] N. Dungey, A. F. M ter Elst, and D. W. Robinson, Analysis on Lie groups with polynomial
growth, Progress in Math. 214, Birkhauser, (2003) 312pp.
[14] W. R. Emerson, The pointwise ergodic theorem for amenable groups, Amer. J. Math 96
(1974), 472–487.
[15] J. W. Jenkins, A characterization of growth in locally compact groups, Bull. Amer. Math.
Soc. 79 (1973), 103–106.
[16] F. P. Greenleaf, Invariant means on topological groups and their applications, Van Nostrand
Mathematical Studies, no 16 (1969) 113pp.
[17] M. Gromov, Groups of polynomial growth and expanding maps, Publications Mathématiques
de l’IHES, no 53 (1981), 53-73.
[18] M. Gromov. Metric structures for Riemannian and non-Riemannian spaces, volume 152 of
Progress in Mathematics. Birkhäuser Boston Inc., Boston, MA, 1999. Based on the 1981
French original, With appendices by M. Katz, P. Pansu and S. Semmes.
[19] M. Gromov, Carnot-Carathéodory spaces seen from within, in Sub-Riemannian Geometry,
edited by A. Bellaiche and J-J. Risler, 79-323, Birkauser (1996).
[20] M. Gromov, Asymptotic invariants of infinite groups, in Geometric group theory, Vol. 2
(Sussex, 1991), 1–295, London Math. Soc. Lecture Note Ser., 182, CUP (1993).
[21] Y. Guivarc’h, Croissance polynômiale et périodes des fonctions harmoniques, Bull. Sc. Math.
France 101, (1973), p. 353-379.
[22] R. Karidi, Geometry of balls in nilpotent Lie groups, Duke Math. J. 74 (1994), no. 2, 301–317.
[23] S. A. Krat. Asymptotic properties of the Heisenberg group. Zap. Nauchn. Sem. S.-Peterburg.
Otdel. Mat. Inst. Steklov. (POMI), 261(Geom. i Topol. 4):125–154, 268, 1999.
[24] V. Losert, On the structure of groups with polynomial growth, Math. Z. 195 (1987), no 1,
109–117.
[25] R. Montgomery, A tour of sub-riemannian geometry, AMS book 2002.
[26] A. Nevo, Pointwise ergodic theorems for actions of connected Lie groups, Handbook of Dy-
namical Systems, Eds. B. Hasselblatt and A. Katok, to appear.
[27] P. Pansu, Croissance des boules et des géodésiques fermées dans les nilvariétés, Ergodic
Theory Dynam. Systems 3 (1983), no. 3, 415–445.
[28] P. Pansu, Mtriques de Carnot-Carathodory et quasiisomtries des espaces symtriques de rang
un, Ann. of Math. (2) 129 (1989), no. 1, 160.
[29] M. S. Raghunathan, Discrete subgroups of Lie groups, Springer Verlag (1972).
[30] D. Robinson, A course in the theory of groups, Springer-Verlag.
[31] M. Shapiro, A geometric approach to almost convexity and growth of some nilpotent groups,
Math. Ann, 285, 601-624 (1989).
[32] M. Stoll, On the asymptotic of the growth of 2-step nilpotent groups, J. London Math. Soc
(2) 58 (1998), no 1, 38–48.
[33] M. Stoll, Rational and transcendental growth series for higher Heisenberg groups, Invent.
math. 126, 85-109 (1996).
[34] A. Tempelman, Ergodic theorems for group actions, Mathematics and its applications, 78,
Kluwer Academic publishers (1992).
ASYMPTOTIC SHAPE OF BALLS IN GROUPS WITH POLYNOMIAL GROWTH 57
[35] R. Tessera, Volumes of spheres in doubling measures metric spaces and groups of polynomial
growth, Bull. Soc. Math. France, 135(1):47–64, 2007.
[36] H.C. Wang, Discrete subgroups of solvable Lie groups, Annals of Math, (1956), 64, 1-19.
[37] J. Wolf, Growth of finitely generated solvable groups and curvature of Riemanniann mani-
folds, J. Differential Geometry, 2 (1968) p. 421–446.
E-mail address: emmanuel.breuillard@math.u-psud.fr
Université Paris-Sud 11, Laboratoire de Mathématiques, 91405 Orsay, France
	1. Introduction
	2. Quasi-norms and the geometry of nilpotent Lie groups
	3. The nilshadow
	4. Periodic metrics
	5. Reduction to the nilpotent case
	6. The nilpotent case
	7. Locally compact G and proofs of the main results
	8. Coarsely geodesic distances and speed of convergence
	9. Appendix: the Heisenberg groups
	References
ABSTRACT
  We get asymptotics for the volume of large balls in an arbitrary locally
compact group G with polynomial growth. This is done via a study of the
geometry of G and a generalization of P. Pansu's thesis. In particular, we show
that any such G is weakly commensurable to some simply connected solvable Lie
group S, the Lie shadow of G. We also show that large balls in G have an
asymptotic shape, i.e. after a suitable renormalization, they converge to a
limiting compact set which can be interpreted geometrically. We then discuss
the speed of convergence, treat some examples and give an application to
ergodic theory. We also answer a question of Burago about left invariant
metrics and recover some results of Stoll on the irrationality of growth series
of nilpotent groups.

<|endoftext|><|startoftext|>
Introduction
A system of n ordinary differential equations each of order M > 1,
k = fk(u
j , t), j, k = 1, n, s = 0,M − 1, (1)
has a variable number of Lie point symmetries depending upon the structure of
the functions fk. The maximal dimension D of the algebra of admitted Lie point
symmetries can be obtained by the formulæ [9]
M = 2 =⇒ D = n2 + 4n + 3 (2)
M > 2 =⇒ D = n2 +Mn + 3. (3)
Some explicit numbers are given in Table 1.
Recently the elaboration of the elements of the Lie algebra, E8, of order 248 has
been variously announced [3, 7, 13, 17, 16] in the serious popular media. The au-
thoritative source is the Atlas of Lie Groups and Representations [2] which is funded
by the National Science Foundation through the American Institute of Mathematics
[1]. The results of the E8 computation were announced in a talk at MIT by David
Vogan on Monday, March 19, 2007, and the details may be found at [15]. The Atlas
of Lie Groups and Representations is a project to make available information about
representations of semisimple Lie groups over real and p-adic fields. Of particular
importance is the problem of the unitary dual, ie the classification of all of the ir-
reducible unitary representations of a given Lie group. The goal of the Atlas of Lie
∗permanent address: School of Mathematical Sciences, Westville Campus, University of
KwaZulu-Natal, Durban 4000, Republic of South Africa
http://arxiv.org/abs/0704.0096v1
Table 1: : The maximal dimension of the algebra of admitted Lie point symmetries
for systems of equations of varying order (horizontal) and number (vertical).
M 2 3 4 5 6 7 8 9 10
1 8 7 8 9 10 11 12 13 14
2 15 13 15 17 19 21 23 25 27
3 24 21 24 27 30 33 36 39 42
4 35 31 35 39 43 47 51 55 59
5 48 43 48 53 58 63 68 73 78
6 63 57 63 69 75 81 87 93 99
7 80 73 80 87 94 101 108 115 122
8 99 91 99 107 115 123 131 139 147
9 120 111 120 129 138 147 156 165 174
10 143 133 143 153 163 173 183 193 203
Groups and Representations is to classify the unitary dual of a real Lie group, G,
by computer. A step in this direction is to compute the admissible representations
of G including their Kazhdan-Lusztig-Vogan polynomials. The computation for E8
was an important test of the technology. While the computation is an impressive
achievement, it is only a small step towards the unitary dual and should not be
ranked as important as the original work of Kazhdan, Lusztig, Vogan, Beilinson,
Bernstein et al. (See for example [4, 5, 6, 11, 12, 14, 18, 8].) Nevertheless the result
was regarded as being suitable for a concerted campaign of publicity to heighten
awareness of Mathematics in the community at large:
“Symmetrie ist möglicherweise das erfolgreichste Prinzip der Physik überhaupt” [7].
“Un groupe de chercheurs américains et européens, parmi lesquels on trouve deux
Français, est parvenu à décoder une des structures les plus vastes de l’histoire des
mathématiques” [13].
“It may be that some day this calculation can help physicists to understand the
universe” [17].
“Eighteen mathematicians spent four years and 77 hours of supercomputer compu-
tation to describe this structure” [16].
In this note we demonstrate three representations of a Lie algebra of dimension
248. The two of us spent four hours and 77 seconds of pocket-calculator computation
to describe these three structures.
2 Three simple systems
For D = 248 formula (2) does not have integral solutions and so there is no system
of second-order ordinary differential equations of maximal symmetry possessing a
248-dimensional algebra of its Lie point symmetries1. About formula (3) the factors
of 248-3=245 are 1, 5 and 7 (49 is out of question because 492 > 245). Consequently
1Is this another instance of the intrinsically uniqueness of Classical Mechanics?
possible values of n are 1, 5 and 7. The corresponding values ofM are 244, 44 and 28,
respectively. The systems of maximal symmetry are easily obtained as one simply
puts fk = 0 ∀ k. Thus the systems we construct are the simplest representations of
the equivalence class under point transformation of systems of equations of maximal
symmetry.
Firstly we consider the following system:
k = 0, k = 1, 5. (4)
It is easy to show that this simple system admits a 248-dimensional algebra of its
Lie point symmetries since 52 + 5 · 44 + 3 = 248. The algebra is generated by the
operators
Γ1 = t
2∂t + 43t
i=1 ui∂ui ,
Γ2 = t∂t,
Γ3 = ∂t,
Γi,k = uk∂ui , k = 1, 5, i = 1, 5
Γi+5,s = t
s∂ui, s = 0, 43, i = 1, 5.
Secondly we consider the system
u(28)r = 0, r = 1, 7. (6)
This equally simple system admits a 248-dimensional algebra (72+7 · 28+ 3 = 248)
of its Lie point symmetries generated by
Γ1 = t
2∂t + 27t
j=1 uj∂uj ,
Γ2 = t∂t,
Γ3 = ∂t,
Γj,r = ur∂uj , r = 1, 7, j = 1, 7
Γj+7,n = t
n∂uj , n = 0, 27, j = 1, 7.
Thirdly and finally the scalar equation,
u(244) = 0, (8)
admits a 248-dimensional Lie algebra (12+1 ·244+3 = 248) of its point symmetries
generated by the operators
Γ1 = t
2∂t + 243tu∂u,
Γ2 = t∂t,
Γ3 = ∂t,
Γ4 = u∂u,
Γn+5 = t
n∂u, n = 0, 243.
3 Conclusion
We have demonstrated three representations of Lie algebras of dimension 248 which
is the dimension of E8. Although the algebras we present are not simple, their
method of construction is. The reason for this simplicity is that we used represen-
tations for systems of equations of maximal symmetry. We do not deny that larger
systems, be that in order or number, of less than maximal symmetry could possibly
have an algebra of dimension 248, but even on the assumption that such systems
be linear the complexity of the calculation becomes immense [10] and defeats the
purpose of the present note.
Note that we have used the simplest forms for the generators of the algebras of
the three systems, (4), (6) and (8), for our primary interest is the demonstration of
the existence of the algebras. Normally one would use combinations which reflect
subalgebraic structures. For example in the case of (8) for which the algebra is
obviously sl(250, IR) one would replace Γ2 with Γ̃2 = 2t∂t +243u∂u to underline the
subalgebraic structure {sl(2, IR)⊕ A1} ⊕s 244A1, where Γ1, Γ̃2 and Γ3 constitute a
representation of sl(2, IR), Γ4 reflects the homogeneity of the equation in the depen-
dent variable and the 244-element abelian subalgebra is composed of the solution
symmetries, so called because the coefficient functions are solutions of (8).
Acknowledgements
PGLL thanks the University of Kwazulu-Natal for its continued support.
References
[1] American Institute of Mathematics.
http://aimath.org/E8/
[2] Atlas of Lie Groups and Representations.
http://www.liegroups.org/
[3] BBC Monday, 19 March 2007, 12:28 GMT.
http://news.bbc.co.uk/2/hi/science/nature/6466129.stm
[4] Beilinson A (1983) Localization of representations of reductive Lie algebras
Proceedings of the International Congress of Mathematicians, Warsaw 699-710
[5] Beilinson A & Bernstein J (1981) Localisation de g-modules Comptes Rendus
de l’Académie des Sciences de Paris Séries I Mathématiques 292 15-18
[6] Bernstein J (1986) On the Kazhdan-Lusztig conjectures AMS Summer Research
Conference (University of California, Santa Cruz, July 1986)
[7] Der Spiegel, 19 März 2007.
http://www.spiegel.de/wissenschaft/mensch/0,1518,472569,00.html
http://aimath.org/E8/
http://www.liegroups.org/
http://news.bbc.co.uk/2/hi/science/nature/6466129.stm
http://www.spiegel.de/wissenschaft/mensch/0
[8] Gelfand S & MacPherson R (1982) Verma modules and Schubert cells: a dic-
tionary in Seminaire d’algebre Paul Dubriel et MP Malliavin (Lecture Notes in
Mathematics 925, Springer Verlag, Berlin–New York) 150
[9] González-Gascón F & González-López A (1983) Symmetries of differential equa-
tions IV Journal of Mathematical Physics 24 2006-2021
[10] Gorringe VM & Leach PGL (1988) Lie point symmetries for systems of second
order linear ordinary differential equations Quæstiones Mathematicæ 11 95-117
[11] Kazhdan D & Lusztig G (1979) Representations of Coxeter groups and Hecke
algebras Inventiones Mathematicæ 53 165184
[12] Kazhdan D & Lusztig G (1980) Schubert varieties and Poincaré duality in
Geometry of the Laplace Operator, (Proceedings of Symposium on Pure Math-
ematics 36, American Mathematical Society) 185203
[13] LEMONDE.FR avec AFP 19.03.07
http://www.lemonde.fr/web/article/0,1-0@2-3244,36-884723@51-
884724,0.html
[14] Lusztig G & Vogan D (1983) Singularities of closures of K-orbits on flag mani-
fold Inventiones Mathematicæ 71 365370
[15] http://www.liegroups.org/AIME8/technicaldetails.html
[16] NEW YORK TIMES 2007/03/20.
http://select.nytimes.com/gst/abstract.html?res=F40613FE3C540C738EDDAA0894DF404482
[17] The Times March 19, 2007.
http://www.timesonline.co.uk/tol/news/uk/science/article1533648.ece
[18] Vogan D (1983) Irreducible characters of semisimple Lie groups III: Proof of
the Kazhdan-Lusztig conjecture in the integral case Inventiones Mathematicæ
71 381417
http://www.lemonde.fr/web/article/0
http://www.liegroups.org/AIM$_$E8/technicaldetails.html
http://select.nytimes.com/gst/abstract.html?res=F40613FE3C540C738EDDAA0894DF404482
http://www.timesonline.co.uk/tol/news/uk/science/article1533648.ece
	Introduction
	Three simple systems
	Conclusion
ABSTRACT
  In this note we present three representations of a 248-dimensional Lie
algebra, namely the algebra of Lie point symmetries admitted by a system of
five trivial ordinary differential equations each of order forty-four, that
admitted by a system of seven trivial ordinary differential equations each of
order twenty-eight and that admitted by one trivial ordinary differential
equation of order two hundred and forty-four.

<|endoftext|><|startoftext|>
Introduction
A mathematically rigorous approach to quantum field theory based on op-
erator algebras is called an algebraic quantum field theory. It has a long
history since pioneering works of Araki, Haag, Kastker. (See [22] for a gen-
eral treatment of algebraic quantum field theory.) This theory works on
Minkowski spaces on any spacetime dimension, and there have been some
recent results on curved spacetimes or even noncommutative spacetimes. In
the case of 1+1-dimensional Minkowski space with higher spacetime symme-
try, conformal symmetry, we have conformal field theory and there we have
seen many new developments in the recent years, so we survey such results
here. Our emphasis is on representation theoretic aspects of the theory and
we make various comparison with another mathematically rigorous and more
recent approach to conformal field theory, that is, theory of vertex operator
algebras.
∗Supported in part by JSPS.
http://arxiv.org/abs/0704.0097v1
Roughly speaking, a mathematical study of quantum field theory is a
study of Wightman fields, which are certain type of operator-valued distri-
butions on a spacetime with covariance with respect to a given spacetime
symmetry group. We have mathematically rigorous axioms for such Wight-
man fields, but they involve distributions and unbounded operators, so these
cause various kinds of technical difficulty. In contrast, in the algebraic quan-
tum field theory, our fundamental object is a net of von Neumann algebras
of bounded linear operators on a Hilbert space. (See [46] for general the-
ory of von Neumann algebras.) Technical problems on definition domains of
unbounded operators do not arise in this approach.
A basic idea is as follows. Suppose we have a Wightman field Φ on a
spacetime. Fix a bounded region O in the space time and consider a test
function ϕ with support contained in O. Then the pairing 〈Φ, ϕ〉 produces
an (unbounded) operator. We have many Φ and ϕ for a fixed O and obtain
many unbounded operators from such pairing. Then we consider a von Neu-
mann algebra of bounded linear operators on this Hilbert space generated
by these unbounded operators. (For example, if we have a self-adjoint un-
bounded operators, we consider its spectral projections which are obviously
all bounded. In this way, we deal with only bounded operators.) This is re-
garded as a von Neumann algebra generated by observables in the spacetime
region O. A von Neumann algebra is an algebra of bounded linear operators
which is closed under the adjoint operation and the strong operator topol-
ogy. In this way, we have a family {A(O)} of von Neumann algebras on the
same Hilbert space parameterized by spacetime regions. Since the spacetime
regions make a net with respect to the inclusion order, we call such a family a
net of von Neumann algebras. Now we forget Wightman fields and consider
only a net of von Neumann algebras. We have some expected properties for
such nets of von Neumann algebras from a physical consideration, and now
we use these properties as axioms. So our mathematical object is a net of
von Neumann algebras subject to certain set of axioms. Our mathematical
aim is to study such nets of von Neumann algebras.
2 Conformal Quantum Field Theory
We first explain formulation of full conformal quantum field theory on the
1 + 1-dimensional Minkowski space in algebraic quantum field theory. As a
spacetime region O above, it is enough to consider only open rectangles O
with edges parallel to t = ±x in (1 + 1)-dim Minkowski space. In this way,
we get a family {A(O)} of operator algebras parameterized by spacetime
regions O (rectangles). In order to realize conformal symmetry, we have to
make a partial compactification of the 1+1-dimensional Minkowski space. If
two rectangles are spacelike separated, then we have no interactions between
them even at the speed of light, so our axiom requires that the corresponding
two von Neumann algebras commute with each other. This is the locality
axiom. Since this is not our main object in this paper, we omit details of the
other axioms. See [29] for full details.
Next we briefly explain that boundary conformal field theory can be han-
dled within the same framework. Now we consider the half-space {(x, t) |
x > 0} in the 1+1-dimensional Minkowski space and only rectangles O con-
tained in this half-space. In this way, we have a similar net of von Neumann
algebras {A(O)} parameterized with rectangles in the half-space. See [38]
for full details of the axioms.
If we have a net of von Neumann algebras over the 1 + 1-dimensional
Minkowski space, we can restrict the net of von Neumann algebras to two
chiral conformal field theories on the light cones {x = ±t}. In this way, we
have two nets of von Neumann algebras on the compactified S1 as description
of two chiral conformal field theories. Since this net is our main mathematical
object in this article, we give a full set of axioms. (See [29] for details of this
“restriction” procedure.)
Now our “spacetime” is S1 and a “spacetime region” is an interval I,
which means a non-empty, non-dense open connected subset of S1. We have
a family {A(I)} of von Neumann algebras on a fixed Hilbert space H . These
von Neumann algebras are simple and such von Neumann algebras are called
factors, so the family {A(I)} satisfying the axioms below is called a net of
factors (or an irreducible local conformal net of factors, strictly speaking).
Actually, the set of intervals on S1 is not directed with respect to inclusions,
so the terminology net is not mathematically appropriate, but is widely used.
1. (isotony) For intervals I1 ⊂ I2, we have A(I1) ⊂ A(I2).
2. (locality) For intervals I1, I2 with I1∩I2 = ∅, we have [A(I1),A(I2)] = 0
3. (Möbius covariance) There exists a strongly continuous unitary repre-
sentation U of PSL(2,R) on H satisfying U(g)A(I)U(g)∗ = A(gI) for
any g ∈ PSL(2,R) and any interval I.
4. (positivity of energy) The generator of the one-parameter rotation sub-
group of U , called the conformal Hamiltonian, is positive.
5. (existence of the vacuum) There exists a unit U -invariant vector Ω in
H , called the vacuum vector, and the von Neumann algebra
I∈S1 A(I)
generated by all A(I)’s is B(H).
6. (conformal covariance) There exists a projective unitary representation
U of Diff(S1) on H extending the unitary representation of PSL(2,R)
such that for all intervals I, we have
U(g)A(I)U(g)∗ = A(gI), g ∈ Diff(S1),
U(g)AU(g)∗ = A, A ∈ A(I), g ∈ Diff(I ′),
where Diff(S1) is the group of orientation-preserving diffeomorphisms
of S1 and Diff(I ′) is the group of diffeomorphisms g of S1 with g(t) = t
for all t ∈ I.
The isotony axiom is natural because we have more test functions (or more
observables) for a larger interval. The locality axiom takes this simple form on
S1. The choice of the spacetime symmetry is not unique, and we can use the
Poincaré symmetry on the Minkowski space or the Möbius covariance on S1,
for example, but in the conformal field theory, we use conformal symmetry,
which means diffeomorphism covariance as above. This set of axioms imply
various nice conditions such as the Reeh-Schlieder property, the Bisognano-
Wichmann property and the Haag duality. See [28] and references there for
details.
In the usual situation, all the von Neumann algebras A(I) are isomorphic
to the so-called Araki-Woods type III1 factor for all nets A and all intervals
I. So each von Neumann algebra does not contain any information about
the conformal field theory, but it is the relative position of the von Neumann
algebras in the family that encodes the physical information of the theory.
(It is similar to subfactor theory of Jones where we study a relative position
of one factor in another.)
At the end of this section, we compare our formulation of conformal
quantum field theory with another mathematically rigorous approach, the-
ory of vertex operator algebras. A vertex operator algebra is an algebraic
axiomatization of Wightman fields on S1, called vertex operators. If we
have an operator valued distribution on S1, its Fourier expansion should give
countably many (possibly unbounded) operators as the Fourier coefficients.
Under the so-called state-field correspondence, any vector in the space of
“states” should give an operator-valued distribution, a quantum “field”, and
its Fourier expansion gives countably many operators. In this way, one vector
should give countably many operators on the space of these vectors. In other
words, for two vectors v, w we have countably many binary operations v(n)w,
n ∈ Z, the action of the n-th operator given by v on w. An axiomatization
of this idea gives a notion of vertex operator algebra. (See [16] for a precise
definition. There is a slightly weaker notion of a vertex algebra. See [27]
for its precise definition and related results.) In theory of vertex operator
algebra, one considers a vector space of states without an inner product and
even when we have a positive definite inner product, one considers this vec-
tor space without completion. Here in comparison to nets of factors, we are
interested in the case where we have positive definite inner products on the
spaces of states. We say that such a vertex operator algebra is unitary.
Both of one (unitary) vertex operator algebra and one net of factors
should describe one chiral conformal field theory. So unitary vertex operator
algebras and nets of factors should be in a bijective correspondence, at least
under some “nice” additional conditions, but no general theorems have been
known for such a correspondence, though there is a recent progress due to S.
Carpi and M. Weiner. However, if we have one construction or an idea on one
side, we can often “translate” it to the other side, though it can be highly non-
trivial from a technical viewpoint. Fundamental sources of constructions for
vertex operator algebras are affine Kac-Moody algebras and integral lattices.
The corresponding constructions for nets of factors have been done by A.
Wassermann [47] and his students, and Dong-Xu [12], respectively, after the
initial construction of Buchholz-Mack-Todorov [5]. If we have examples with
some nice properties, we canoften construct new examples from them, and
as such methods of constructions of vertex operator algebras, we have simple
current extensions, the coset construction, and the orbifold construction. The
simple current extensions for nets of factors are simply crossed products by
DHR-automorphisms and easy to realize. (See the next section for a notion
of DHR-endomorphisms.) The coset and orbifold constructions for nets of
factors have been studied in detail by F. Xu [50, 51, 52].
For nets of factors, we have introduced a new construction of examples
in [28] based on Longo’s notion of Q-systems [36]. Further examples have
been constructed by Xu [55] with this method. This can be translated to the
setting of vertex operator algebras, as we will see in this article later.
3 Representation Theory
An important tool to study nets of factors is a representation theory. For a
net of factors {A(I)}, all the algebras A(I) act on the initial Hilbert space
H from the beginning, but we also consider their representations on another
Hilbert space, that is, a family {πI} of representations πI : A(I) → B(K),
where K is another Hilbert space, common for all I. For I1 ⊂ I2, we must
have that the restriction of πI2 on A(I1) is equal to πI1 . The representation on
the initial Hilbert space is called the vacuum representation and plays a role of
a trivial representation. We also have to take care of the spacetime symmetry
group when we consider a representation, but this part is often automatic
(see [20]), so we now ignore it for simplicity. See [20] for a more detailed
treatment. Note that a representation of a net of factors is a counterpart of
a module over a vertex operator algebra.
Notions of irreducibility and a direct sum for such representations are
easy to formulate. Non-trivial notions are dimensions and tensor products.
Each representation {πI} is in a bijective correspondence to a certain endo-
morphism λ of an infinite dimensional operator algebra, called a Doplicher-
Haag-Roberts (DHR) endomorphism [13, 15], and we can restrict λ to a single
factor A(I) for an arbitrarily but fixed interval I. Then λ(A(I)) ⊂ A(I) is
a subfactor and we have its Jones index [26]. (See [14, 41, 43] for general
theory of subfactors.) The square root of this Jones index plays the role of
the dimension of the representation [35]. In algebraic quantum field theory,
such a dimension was called a statistical dimension, and it is analogous to
a quantum dimension in the theory of quantum groups. It is a positive real
numbers in the interval [1,∞]. We can also compose endomorphisms and
this composition gives the correct notion of tensor products. We then get a
braided tensor category as in [15].
In representation theory of a vertex operator algebra (and also a quantum
group), it sometimes happens that we have only finitely many irreducible rep-
resentations. Such finiteness is often called rationality, possibly with some
extra assumptions on some finite dimensionality. This also plays an impor-
tant role in theory of quantum invariants in low dimensional topology. In [32],
we have introduced an operator algebraic condition for such rationality for
nets of factors as follows and we called it complete rationality. We split the
circle into four intervals I1, I2, I3, I4 in this order, say, counterclockwise. Then
complete rationality is given by the finiteness of the Jones index for a subfac-
tor A(I1)∨A(I3) ⊂ (A(I2)∨A(I4))
′ where ′ means the commutant, together
with the split property. The split property is known to hold if the vacuum
character,
n=0(dimHn)q
n, is convergent for |q| < 1 by [9], so it usually
holds and is easy to verify. (Here H =
n=0Hn is the eigenspace decompo-
sition of the original Hilbert space for the positive generator of the rotation
group. So this convergence property can be verified simply by looking at
the Hilbert space, not the von Neumann algebras.) In the original definition
of complete rationality in [32], we required another condition called strong
additivity, but it was proved to be redundant by Longo-Xu [39]. We have
proved in [32] that this complete rationality implies that we have a modular
tensor category as a representation category of {A(I)}. A modular tensor
category produces a 3-dimensional topological quantum field theory. (See [45]
for general theory of topological quantum field theory.) The SU(N)k-net of
Wassermann has been shown to be completely rational by [49].
We now introduce an important notion of α-induction. For an inclusion
of nets of factors, A(I) ⊂ B(I), we have an induction procedure analogous
to the group representation. So from a representation of the smaller net A,
we would like to construct a representation of the larger net B, but what
we actually obtain is not a genuine representation of the larger net B in
general, and is something weaker called solitonic. This induction procedure
is called the α-induction and depends a choice of braiding, so we write α+
and α−. This was first defined in Longo-Rehren [37] and studied in detail
in Xu [48]. Then Böckenhauer-Evans [1] made a further study, and [2, 3]
unified this study with Ocneanu’s graphical method [42]. The intersection
of the irreducible endomorphisms appearing in the images of α+-induction
and α−-induction gives the true representation category of {B(I)} if A is
completely rational by [2, 32].
This α-induction opens an important and new connection with theory
of modular invariants. A modular tensor category produces a unitary rep-
resentation π of SL(2,Z) through its braiding as in [44], and its dimension
is the number of irreducible objects. So a completely rational net of fac-
tors produces such a unitary representation. (Note that our representation
of SL(2,Z) comes from the braiding structure, not from the action of this
group on the characters through change of variables τ 7→
aτ + b
cτ + d
, though in
all the “nice” known examples, these two representations coincide. See [30]
for a discussion on this matter.)
It has been proved in [2] that the matrix (Zλ,µ) defined by
Zλ,µ = dimHom(α
λ , α
is in the commutant of the representation π, using Ocneanu’s graphical cal-
culus [42]. Such a matrix Z is called a modular invariant, and we have only
finitely many such Z for a given π. For any completely rational net {A(I)},
any extension {B(I) ⊃ A(I)} produces such Z. Matrices Z are certainly
much easier to classify than extensions and this is a source of classification
theory in the next section.
4 Classification Theory
For a net of factors, we can naturally define a central charge and it is well-
known to take discrete values 1− 6/m(m+ 1), m = 3, 4, 5, . . . , below 1 and
all values in [1,∞) by [17, 18]. We have the Virasoro net {Virc(I)} for each
such c and it is the operator algebraic counterpart of the Virasoro vertex
operator algebra with the same c. Any net of factors {A(I)} with central
charge c is an extension of the Virasoro net with the same central charge and
it is automatically completely rational if c < 1, as shown in [28]. So we can
apply the above theory and we get the following complete classification list
for the case c < 1 as in [28].
1. The Virasoro nets {Virc(I)} with c < 1.
2. The simple current extensions of the Virasoro nets with index 2.
3. Four exceptionals at c = 21/22, 25/26, 144/145, 154/155.
The unitary representations of SL(2,Z) for the Virasoro nets are the well-
known ones, and all the modular invariants for these have been classified by
[6]. Our result shows that each of the so-called type I modular invariants in
the classification list of [6] corresponds to a net of factors uniquely. They
are labeled with pairs of A-D2n-E6,8 Dynkin diagrams with Coxeter numbers
differing by 1. Three in (3) of the above list have been identified with coset
models, but the remaining one does not seem to be related to any other
known constructions. This is constructed with “extension by Q-system”.
Xu [55] recently applied this construction to many other coset models and
obtained infinitely many new examples based on [54], called mirror exten-
sions. Classification for the case c = 1 has been also done under some extra
assumption [7, 53].
This classification theorem also implies a classification of certain types of
vertex operator algebras as follows.
Let V be a (rational) vertex operator algebra and Wi be its irreducible
modules. We would like to classify all vertex operator algebras arising from
putting a vertex operator algebra structure on
i niWi and using the same
Virasoro element as V , where ni is multiplicity and W0 = V , n0 = 1. From
a viewpoint of tensor category, this classification problem of extensions of
a vertex operator algebras is the “same” as the classification problem of
extensions of a completely rational net of factors, as shown in [24].
So the above classification theorem of local conformal nets implies a clas-
sification theorem of extensions of the Virasoro vertex operator algebras with
c < 1 as above, and we obtain the same classification list. That is, besides
the Virasoro vertex operator algebras themselves, we have their simple cur-
rent extensions, and four exceptionals at c = 21/22, 25/26, 144/145, 154/155.
With the usual notation of L(c, h) for a module with central charge c and
conformal weight h of the Virasoro vertex operator algebras with c < 1, the
four exceptionals are listed as follows.
1. L(21/22, 0)⊕L(21/22, 8). It has 15 irreducible representations and has
two coset realizations, from SU(9)2 ⊂ (E8)2 and (E8)3 ⊂ (E8)2⊗(E8)1.
2. L(25/26, 0) ⊕ L(25/26, 10). It has 18 irreducible representations and
has a coset realization from SU(2)11 ⊂ SO(5)1 ⊗ SU(2)1.
3. L(144/145, 0)⊕L(144/145, 24)⊕L(144/145, 78)⊕ L(144/145, 189). It
has 28 irreducible representations and no coset realization has been
known.
4. L(154/155, 0)⊕L(154/155, 26)⊕L(154/155, 84)⊕ L(154/155, 203). It
has 30 irreducible representations and has a coset realization from
SU(2)29 ⊂ (G2)1 ⊗ SU(2)1.
Note that it is not obvious that the representation category of the Virasoro
net Virc and the representation category of the Virasoro vertex operator
algebra L(c, 0) are isomorphic, but as long as the two are braided tensor
category and have the same S- and T -matrices, the arguments in [28] work,
so we obtain the above classification result for vertex operator algebras.
Using the above results and more techniques, we can also completely
classify full conformal field theories within the framework algebraic quantum
field theory for the case c < 1. Full conformal field theories are given as
certain nets of factors on 1 + 1-dimensional Minkowski space. Under natu-
ral symmetry and maximality conditions, those with c < 1 are completely
labeled with the pairs of A-D-E Dynkin diagrams with the difference of
their Coxeter numbers equal to 1, as shown in [29]. We now naturally have
D2n+1, E7 as labels, unlike in the chiral case. The main difficulty in this
work lies in proving uniqueness of the structure for each modular invariant
in the Cappelli-Itzykson-Zuber list [6]. This is done through 2-cohomology
vanishing for certain tensor categories. in the spirit of [25].
Furthermore, using the above results and more techniques we can also
completely classify boundary conformal field theories for the case c < 1.
Boundary conformal field theories are given as certain nets of factors on a 1+
1-dimensional Minkowski half-space. Under a natural maximality condition,
these with c < 1 are now completely labeled with the pairs of A-D-E Dynkin
diagrams with distinguished vertices having the difference of their Coxeter
numbers equal to 1, as shown in [33] based on a general theory in [38]. The
“chiral fields” in a boundary conformal field theory should produce a net
of factors on the boundary (which is compactified to S1) as in the operator
algebraic approach. Then a general boundary conformal field theory restricts
to this boundary to produce a non-local extension of this chiral conformal
field theory on the boundary.
5 Moonshine Conjecture
The Moonshine conjecture, formulated by Conway-Norton [8], is about mys-
terious relations between finite simple groups and modular functions, since
an observation due to McKay.
Today the classification of all finite simple groups is complete and the
classification list contains 26 sporadic groups in addition to several infinite
series. The largest group among the 26 sporadic groups is called the Monster
group and its order is about 8× 1053
One the one hand, the non-trivial irreducible representation of the Mon-
ster having the smallest dimension is 196883 dimensional. On the other
hand, the following function, called j-function, has been classically studied
in algebra.
j(τ) = q−1 + 744 + 196884q +
21493760q2 + 864299970q3 + · · ·
For q = exp(2πiτ), Im τ > 0, we have modular invariance property, j(τ) =
aτ + b
cτ + d
∈ SL(2,Z), and this is the only function, up to
the constant term, satisfying this property and starting with q−1,
McKay noticed 196884 = 196883 + 1, and similar simple relations for
other coefficients of the j-function and dimensions of irreducible represen-
tations of the Monster group turned out to be true. Then Conway-Norton
[8] formulated the Moonshine conjecture roughly as follows, which has been
now proved by Borcherds [4] in 1992.
1. We have a “natural” infinite dimensional graded vector space V =
Vn with some algebraic structure having a Monster action pre-
serving the grading and each Vn is finite dimensional.
2. For any element g in the Monster, the power series
n=0(Tr g|Vn)q
is a special function called a Hauptmodul for some discrete subgroup
of SL(2,R). When g is the identity element, the series is the j-function
minus constant term 744.
For the part (1) of this conjecture, Frenkel-Lepowsky-Meurman [16] gave
a precise definition of “some algebraic structure” as a vertex operator algebra
and constructed a particular example V , which is now called the Moonshine
vertex operator algebras and denoted by V ♮.
The construction roughly goes as follows. In dimension 24, we have an
exceptional lattice Λ called the Leech lattice. Then there is a general con-
struction of a vertex operator algebra from a certain lattice, and the one for
the Leech lattice gives something very close to our final object V ♮. Then we
take a fixed point algebra under a natural action of Z/2Z arising from the
lattice symmetry, and then make a simple current extension of order 2. The
resulting vertex operator algebra is the Moonshine vertex operator algebra
V ♮. (The final step is called a twisted orbifold construction). The series
n=0(dimV
n−1 is indeed the j-function minus constant term 744.
Miyamoto [40] has a new realization of V ♮ as an extension of a tensor
power of the Virasoro vertex operator algebra with c = 1/2, L(1/2, 0)⊗48
(based on Dong-Mason-Zhu [11]). This kind of extension of a Virasoro tensor
power is called a framed vertex operator algebra as in [10].
We have given an operator algebraic counterpart of such a construction
in [31].
We realize a Leech lattice net of factors on S1 as an extension of Vir1/2
using certain Z4-code. Then we can perform the twisted orbifold construction
in the operator algebraic sense to obtain a net of factors, the Moonshine net
A♮. Theory of α-induction is used for obtaining various decompositions. We
then get a Miyamoto-type description of this construction, as an operator
algebraic counterpart of the framed vertex operator algebras. We then have
the following properties.
1. c = 24.
2. The representation theory is trivial.
3. The automorphism group is the Monster.
4. The Hauptmodul property (as above).
Outline of the proof of these four properties is as follows.
It is immediate to get c = 24. We can show complete rationality passes
to an extension (and an orbifold) in general with control over the size of the
representation category, using the Jones index. With this, we obtain (2) very
easily. Such a net is called holomorphic. Property (3) is the most difficult
part. For the Virasoro VOA L(1/2, 0), the vertex operator is indeed a well-
behaved Wightman field and smeared fields produce the Virasoro net Vir1/2.
Using this property and the fact that
g g(L(1/2, 0)
⊗48) for all g ∈ Aut(V ♮)
generate the entire Moonshine VOA V ♮, we can prove that the automorphism
group as a vertex operator algebra and the automorphism group as a net
of factors are indeed the same. Then (4) is now a trivial corollary of the
Borcherds theorem [4].
We note that the Baby Monster, the second largest among the 26 sporadic
finite simple groups, can be treated similarly with Höhn’s construction of the
shorter Moonshine super vertex operator algebra.
Still, these examples are treated with various tricks case by case. We
expect a bijective correspondence between vertex operator algebras and nets
of factors on S1 under some nice conditions. On the side of vertex operator
algebras, the most natural candidate for such a “nice” condition is the C2-
finiteness condition of Zhu [56] (with unitarity). On the operator algebraic
side, our complete rationality in [32] seems to be such a “nice” condition, but
the actual relations between the two notions are not clear at this moment.
The essential condition for complete rationality is the finiteness of the Jones
index arising from four intervals on the circle, and this finiteness somehow
has formal similarity to the finiteness appearing in the definition of the C2-
finiteness.
At the end, we list some open problems. The operator algebraic approach
has an advantage in control of representation theory, but is behind of theory
of vertex operator algebras in the theory of characters.
For a net of factors, we can naturally define a notion of a character for
each representation. But even convergence of these characters have not been
proved in general, and the modular invariance property, the counterpart of
Zhu’s result [56], is unknown, though we certainly expect it to be true. We
also expect the Verlinde identity holds, which has been proved in the context
of vertex operator algebras recently by Huang [23]. We would need an S-
matrix version of the spin-statistics theorem [21] for nets of factors.
References
[1] J. Böckenhauer, D. E. Evans, Modular invariants, graphs and α-
induction for nets of subfactors I, Commun. Math. Phys. 197 (1998)
361–386. II 200 (1999) 57–103. III 205 (1999) 183–228.
[2] J. Böckenhauer, D. E. Evans, Y. Kawahigashi, On α-induction, chiral
projectors and modular invariants for subfactors, Commun. Math. Phys.
208 (1999) 429–487.
[3] J. Böckenhauer, D. E. Evans, Y. Kawahigashi, Chiral structure of modu-
lar invariants for subfactors, Commun. Math. Phys. 210 (2000) 733–784.
[4] R. E. Borcherds, Monstrous moonshine and monstrous Lie superalgebras,
Invent. Math. 109 (1992) 405–444.
[5] D. Buchholz, G. Mack, I. Todorov, The current algebra on the circle as
a germ of local field theories, Nucl. Phys. B, Proc. Suppl. 5B (1988)
20–56.
[6] A. Cappelli, C. Itzykson, J.-B. and Zuber, The A-D-E classification of
minimal and A
1 conformal invariant theories, Commun. Math. Phys.
113 (1987) 1–26.
[7] S. Carpi, On the representation theory of Virasoro nets, Commun. Math.
Phys. 244 (2004) 261–284. math.OA/0306425.
[8] J. H. Conway, S. P. Norton, Monstrous moonshine, Bull. London Math.
Soc. 11 (1979) 308–339.
[9] C. D’Antoni, R. Longo, F. Radulescu, Conformal nets, maximal tem-
perature and models from free probability, J. Operator Theory 45 (2001)
195–208.
[10] C. Dong, R. L. Griess Jr., G. Höhn, Framed vertex operator algebras,
codes and the Moonshine module, Commun. Math. Phys. 193 (1998)
407–448.
[11] C. Dong, G. Mason, Y. Zhu, Discrete series of the Virasoro algebra and
the moonshine module, Proc. Symp. Pure. Math., Amer. Math. Soc. 56
II (1994) 295–316.
[12] C. Dong, F. Xu, Conformal nets associated with lattices and their orb-
ifolds, Adv. Math. 206 (2006) 279–306. math.OA/0411499.
[13] S. Doplicher, R. Haag, J. E. Roberts, Local observables and particle
statistics, I. Commun. Math. Phys. 23, 199-230 (1971); II. 35, 49-85
(1974).
[14] D. E. Evans, Y. Kawahigashi, “Quantum symmetries on operator alge-
bras”, Oxford University Press, 1998.
[15] K. Fredenhagen, K.-H. Rehren, B. Schroer, Superselection sectors with
braid group statistics and exchange algebras, I Commun. Math. Phys.
125, 201–226 (1989), II Rev. Math. Phys. Special issue (1992) 113–
[16] I. Frenkel, J. Lepowsky, A. Meurman, “Vertex operator algebras and
the Monster”, Academic Press, 1988.
http://arxiv.org/abs/math/0306425
http://arxiv.org/abs/math/0411499
[17] D. Friedan, Z. Qiu, S. Shenker, Details of the non-unitarity proof for
highest weight representations of the Virasoro algebra, Commun. Math.
Phys. 107 (1986) 535–542.
[18] P. Goddard, A. Kent, D. Olive, Unitary representations of the Virasoro
and super-Virasoro algebras, Commun. Math. Phys. 103 (1986) 105–
[19] R. L. Griess Jr., The friendly giant, Invent. Math. 69 (1982) 1–102.
[20] D. Guido & R. Longo, Relativistic invariance and charge conjugation in
quantum field theory, Commun. Math. Phys. 148 (1992) 521—551.
[21] D. Guido, R. Longo, The conformal spin and statistics theorem, Com-
mun. Math. Phys. 181 (1996) 11–35.
[22] R. Haag, “Local Quantum Physics”, 2nd ed., Springer, Berlin, Heidel-
berg, New York, 1996
[23] Y.-Z. Huang, Vertex operator algebras, the Verlinde conjecture, and mod-
ular tensor categories, Proc. Natl. Acad. Sci. USA 102 (2005) 5352–
5356.
[24] Y.-Z. Huang, A. Kirillov Jr., J. Lepowsky, Braided tensor categories and
extensions of vertex operator algebras, in preparation.
[25] M. Izumi, H. Kosaki, On a subfactor analogue of the second cohomology,
Rev. Math. Phys. 14 (2002) 733–757.
[26] V. F. R. Jones, Index for subfactors, Invent. Math. 72 (1983) 1–25.
[27] V. Kac, “Vertex Algebras for Beginners”, Lect. Notes Series 10, Amer.
Math. Soc. Providence, RI, 1988.
[28] Y. Kawahigashi, R. Longo, Classification of local conformal nets. Case
c < 1, Ann. of Math. 160 (2004), 493–522. math-ph/0201015.
[29] Y. Kawahigashi, R. Longo, Classification of two-dimensional local con-
formal nets with c < 1 and 2-cohomology vanishing for tensor categories,
Commun. Math. Phys. 244 (2004) 63–97. math-ph/0304022.
http://arxiv.org/abs/math-ph/0201015
http://arxiv.org/abs/math-ph/0304022
[30] Y. Kawahigashi, R. Longo, Noncommutative spectral invariants and
black hole entropy, Commun. Math. Phys. 257 (2005) 193-225.
math-ph/0405037.
[31] Y. Kawahigashi, R. Longo, Local conformal nets arising from
framed vertex operator algebras, Adv. Math. 206 (2006) 729–751.
math.OA/0407263.
[32] Y. Kawahigashi, R. Longo, M. Müger, Multi-interval subfactors and
modularity of representations in conformal field theory, Commun. Math.
Phys. 219 (2001) 631–669.
[33] Y. Kawahigashi, R. Longo, U. Pennig, K.-H. Rehren, The classification
of non-local chiral CFT with c < 1, Commun. Math. Phys. 271 (2007)
375–385. math.OA/0505130.
[34] A. Kirillov Jr., V. Ostrik, On q-analog of McKay correspondence and
ADE classification of sl(2) conformal field theories, Adv. Math. 171
(2002) 183–227.
[35] R. Longo, Index of subfactors and statistics of quantum fields I–II, Com-
mun. Math. Phys. 126 (1989) 217–247 & 130 (1990) 285–309.
[36] R. Longo, A duality for Hopf algebras and for subfactors, Commun.
Math. Phys. 159 (1994) 133–150.
[37] R. Longo, K.-H. Rehren, Nets of subfactors, Rev. Math. Phys. 7 (1995)
567–597.
[38] R. Longo, K.-H. Rehren, Local fields in boundary CFT, Rev. Math. Phys.
16 (2004) 909–960.
[39] R. Longo, F. Xu, Topological sectors and a dichotomy in conformal field
theory, Commun. Math. Phys. 251 (2004) 321–364. math.OA/0309366.
[40] M. Miyamoto, A new construction of the moonshine vertex operator
algebra over the real number field, Ann. of Math. 159 (2004) 535–596.
[41] A. Ocneanu, Quantized group, string algebras and Galois theory for alge-
bras, in Operator algebras and applications, Vol. 2 (Warwick, 1987), (ed.
D. E. Evans and M. Takesaki), London Mathematical Society Lecture
Note Series 36, Cambridge University Press, Cambridge, 1988, 119–172.
http://arxiv.org/abs/math-ph/0405037
http://arxiv.org/abs/math/0407263
http://arxiv.org/abs/math/0505130
http://arxiv.org/abs/math/0309366
[42] A. Ocneanu, Paths on Coxeter diagrams: from Platonic solids and singu-
larities to minimal models and subfactors, (Notes recorded by S. Goto),
in Lectures on operator theory, (ed. B. V. Rajarama Bhat et al.), The
Fields Institute Monographs, AMS Publications, 2000, 243–323.
[43] S. Popa, “Classification of subfactors and of their endomorphisms”,
CBMS Regional Conference Series, Amer. Math. Soc. 86 (1995).
[44] K.-H. Rehren, Braid group statistics and their superselection rules, in
“The Algebraic Theory of Superselection Sectors”, D. Kastler ed., World
Scientific 1990, 333–355.
[45] V. G. Turaev, “Quantum invariants of knots and 3-manifolds”, Walter
de Gruyter, Berlin-New York, 1994.
[46] M. Takesaki, “Theory of Operator Algebras”, vol. I, II, III, Springer
Encyclopaedia of Mathematical Sciences 124 (2002), 125, 127 (2003).
[47] A. Wassermann, Operator algebras and conformal field theory III: Fusion
of positive energy representations of SU(N) using bounded operators,
Invent. Math. 133 (1998) 467–538.
[48] F. Xu, New braided endomorphisms from conformal inclusions, Com-
mun. Math. Phys. 192 (1998) 347–403.
[49] F. Xu, Jones-Wassermann subfactors for disconnected intervals, Com-
mun. Contemp. Math. 2 (2000) 307–347.
[50] F. Xu, Algebraic coset conformal field theories I, Commun. Math. Phys.
211 (2000) 1–44.
[51] F. Xu, Algebraic coset conformal field theories II, Publ. RIMS, Kyoto
Univ. 35 (1999) 795–824.
[52] F. Xu, Algebraic orbifold conformal field theories, Proc. Nat. Acad. Sci.
U.S.A. 97 (2000) 14069–14073.
[53] F. Xu, Strong additivity and conformal nets, Pac. J. Math. 221 (2005)
167–199. math.QA/0303266.
[54] F. Xu, 3-manifolds invariants from cosets, J. Knot Theory Ramif. 14
(2005) 21–90.
http://arxiv.org/abs/math/0303266
[55] F. Xu, Mirror extensions of local nets, Commun. Math. Phys. 270 (2007)
835–847. math.QA/0505367.
[56] Y. Zhu, Modular invariance of characters of vertex operator algebras, J.
Amer. Math. Soc. 9 (1996) 237–302.
http://arxiv.org/abs/math/0505367
	Introduction
	Conformal Quantum Field Theory
	Representation Theory
	Classification Theory
	Moonshine Conjecture
ABSTRACT
  We review recent progress in operator algebraic approach to conformal quantum
field theory. Our emphasis is on use of representation theory in classification
theory. This is based on a series of joint works with R. Longo.

<|endoftext|><|startoftext|>
Sparsely-spread CDMA - a statistical mechanics
based analysis
Jack Raymond and David Saad
Neural Computation Research Group, Aston University, Aston Triangle, Birmingham,
B4 7EJ
E-mail: jack.raymond@physics.org
Abstract.
Sparse Code Division Multiple Access (CDMA), a variation on the standard CDMA
method in which the spreading (signature) matrix contains only a relatively small number
of non-zero elements, is presented and analysed using methods of statistical physics. The
analysis provides results on the performance of maximum likelihood decoding for sparse
spreading codes in the large system limit. We present results for both cases of regular
and irregular spreading matrices for the binary additive white Gaussian noise channel
(BIAWGN) with a comparison to the canonical (dense) random spreading code.
PACS numbers: 64.60.Cn, 75.10.Nr, 84.40.Ua, 89.70.+c
AMS classification scheme numbers: 68P30,82B44,94A12,94A14
http://arxiv.org/abs/0704.0098v5
Sparsely-spread CDMA - a statistical mechanics based analysis 2
1. Background
The area of multiuser communications is one of great interest from both theoretical and
engineering perspectives [1]. Code Division Multiple Access (CDMA) is a particular
method for allowing multiple users to access channel resources in an efficient and robust
manner, and plays an important role in the current preferred standards for allocating
channel resources in wireless communications. CDMA utilises channel resources highly
efficiently by allowing many users to transmit on much of the bandwidth simultaneously,
each transmission being encoded with a user specific signature code. Disentangling the
information in the channel is possible by using the properties of these codes and much of
the focus in CDMA research is on developing efficient codes and decoding methods.
In this paper we study a variant of the original method, sparse CDMA, where the
spreading matrix contains only a relatively small number of non-zero elements as was
originally studied and motivated in [2]. While the straightforward application of sparse
CDMA techniques to uplink multiple access communication is rather limited, as it is
difficult to synchronise the sparse transmissions from the various users, the method can be
highly useful for frequency and time hopping. In frequency-hopping code division multiple
access (FH-CDMA), one repeatedly switches frequencies during radio transmission, often
to minimize the effectiveness of interception or jamming of telecommunications. At
any given time step, each user occupies a small (finite) number of the (infinite) M-ary
frequency-shift-keying (MFSK) chip/carrier pairs (with gain G, the total number of chip-
frequency pairs is MG.) Hops between available frequencies can be either random or
preplanned and take place after the transmission of data on a narrow frequency band. In
time-hopping (TH-)CDMA, a pseudo-noise sequence defines the transmission moment for
the various users, which can be viewed as sparse CDMA when used in an ultra-wideband
impulse communication system. In this case the sparse time-hopping sequences reduces
collisions between transmissions.
This study follows the seminal paper of Tanaka [3], and other recent extensions [4],
in utilising the replica analysis for randomly spread CDMA with discrete inputs, which
established many of the properties of random densely-spread CDMA with respect to
several different detectors including Maximum A Posteriori (MAP), Marginal Posterior
Maximiser (MPM) and minimum mean square-error (MMSE). Sparsely-spread CDMA
differs from the conventional CDMA, based on dense spreading sequences, in that any
user only transmits to a small number of chips (by comparison to transmission on all chips
in the case of dense CDMA). The sparse nature of this model facilitates the use of methods
from statistical physics of dilute disordered systems [5, 6] for studying the properties of
typical cases.
The feasibility of sparse CDMA for transmitting information was recently
demonstrated [2] for the case of real (Gaussian distributed) input symbols by employing a
Gaussian effective medium approximation; several results have been reported for the case of
random transmission patterns. In a separate recent study, based on the belief propagation
inference algorithm and a binary input prior distribution, sparse CDMA has also been
Sparsely-spread CDMA - a statistical mechanics based analysis 3
considered as a route to proving results in the densely spread CDMA [7]. In addition, this
study demonstrated the existence of a waterfall phenomenon comparable to the dense code
for a subset of ensembles. The waterfall phenomenon is observed in decoding techniques,
where there is a dynamical transition between two statistically distinct solutions as the
noise parameter is varied. Finally we note a number of pertinent studies concerning the
effectiveness of belief propagation as an MPM decoding method [8, 9, 10, 11], and in
combining sparse encoding (LDPC) methods with CDMA [12]. Many of these papers
however consider the extreme dilution regime – in which the number of chip contributions
is large but not O(N).
The theoretical work regarding sparsely spread CDMA remained lacking in certain
respects. As pointed out in [2], spreading codes with Poisson distributed number of non-
zero elements, per chip and across users, are systematically failing in that each user
has some probability of not contributing to any chips (transmitting no information).
Even in the “partly regular” code [7] ensemble (where each user transmits on the same
number of chips) some chips have no contributors owing to the Poisson distribution in
chip connectivity, consequently the bandwidth is not effectively utilised. We circumvent
this problem by introducing constraints to prevent this, namely taking regular signature
codes constrained such that both the number of users per chip and chips per user take
fixed integer values. Furthermore we present analytic and numerical analysis without
resort to Gaussian approximations of any quantities. Using new tools from statistical
mechanics we are able to cast greater light on the nature of the binary prior transmission
process. Notably the nature of the decoding state space and relative performance of sparse
ensembles versus dense ones across a range of noise levels; and importantly, the question
of how the coexistence of solutions found by Tanaka [3] extends to sparse ensembles,
especially close to the transition points determined for the dense ensemble.
In this paper we demonstrate the superiority of regular sparsely spread CDMA code
over densely spread codes in certain respects, for example, the anticipated bit error rate
arising in decoding is improved in the high noise regime and the solution coexistence
behaviour is less pervasive. Furthermore, to utilise belief propagation for such an ensemble
is certain to be significantly faster and less computationally demanding [13], this also has
power-consumption implications which may be important in some applications. Other
practical issues of implementation, the most basic being non-synchronisation and power
control, require detailed study and may make fully harnessing these advantages more
complex and application dependent.
The paper is organised as follows: In section 2 we will introduce the general framework
and notation used, while the methodology used for the various codes will be presented in
section 3. The main results for the various codes will be presented in section 4 followed
by concluding remarks in section 5.
Sparsely-spread CDMA - a statistical mechanics based analysis 4
i b ξ j d
1τ iτ
jτ kτ lτ K
yb yc yd yN
Figure 1. A bi-partite graph is useful for visually realising a problem. A user node i at
the bottom interacts with other variables through its set of neighbouring factor nodes (∂i)
to which it connects. The factor nodes are determined through a similar neighborhood.
The interaction at each factor (µ) is conditioned on neighbouring gain factors ξµ (the
non-zero components of s), and yµ (which is an implicit function of the noise ωµ, and
neighbouring input bits bµ and gain factors ξµ), assuming a uniform prior on the bits.
The statistical mechanics reconstruction problem associates dynamical variables τ to the
user nodes that interact through the factors. The thermodynamical equilibrium state of
this system then describes the theoretical performance of optimal detectors.
2. The model
We consider a standard model of CDMA consisting of K users transmitting in a bit
interval of N chips. We assume a model with perfect power control and synchronisation,
and consider only the single bit interval. In our case the received signal y is described by
[skbk] + ω , (1)
where the vector components describe the values for distinct chips: sk is the spreading
code for user k, bk = ±1 is the bit sent by user k (binary input symbols) and ω the noise
vector. Appropriate normalisation of the power is through the definition of the signature
matrix (s). It is possible to include a user or chip specific amplitude variation, which may
be due to fading or imperfect power control. We consider a model without these effects.
The spreading codes are sparse so that in expectation only C of the elements in vector sk
are non-zero. If, with knowledge of the signature matrix in use, we assume the signal has
been subject to additive white Gaussian channel noise of variance σ20/β, where σ
0 is the
variance of the true channel noise 〈ω2〉, we can write the posterior for the transmitted bits
τ (unknowns given the particular instance) using Bayes Theorem
P (τ |y) =
[sµk(bk − τk)] + ωµ
P (τ ) , (2)
and from this define bit error rate, mutual information, and other quantities. The
statistical mechanics approach from here is to define a Hamiltonian and partition
Sparsely-spread CDMA - a statistical mechanics based analysis 5
function from which the various statistics relating to this probability distribution may
be determined - and hence all the usual information theory measures. A suitable choice
for the Hamiltonian is
H(τ ) =
[sµk(bk − τk)] + ωµ
hkτk . (3)
We can here identify τ as the dynamical variables in the inference problem (dependence
shown explicitly). The other quenched variables (parameters), describing the instance of
the disorder, are the signature matrix (s), noise (ω) and the inputs (b). The variables
hk describe our prior beliefs about the inputs (the specific user bias), and we can assume
some simple distribution for this such as all users having the same bias hk = H . Maximal
rate transmission corresponds to unbiased bits H = 0, and this is considered throughout
the paper. The properties of such a system may be reflected in a factor (Tanner) graph,
a bipartite graph in which users and chips are represented by nodes (see figure 1).
The calculation we undertake is specific to the case of the thermodynamic limit in
which the number of chips N → ∞ whilst the load α = K/N is fixed. Note that α is
termed β in many CDMA papers, here we reserve β to mean the “inverse temperature”
in a statistical mechanics sense (which defines our prior belief for the noise level and give
rise to the corresponding MAP detector.)
In all ensembles we may identify the parameter L as the mean number of contributions
to each chip, and C as the mean number of contributions per user. As such the following
also holds
. (4)
The case in which α is greater than 1 will be called oversaturated, since more than one
bit is being transmitted per chip.
The calculations presented henceforth are specific to the case of memoryless noise,
drawn from a single distribution of mean zero and mean square σ20
Ω(ω) = P (ωµ = ω) . (5)
Defining normalised spreading codes such that
sk.sk = N , we can identify the “power
spectral density” (PSD) over a chip interval as a measure of the system noise 1/(2σ20) –
the factor two being connected with physical considerations in implementing the model.
2.1. Code Ensembles
We consider several code ensembles we call irregular, partly regular and regular, which
differ in the constraints placed on the factor and variable degree constraints of the signature
matrix s. The probability distribution
P (s) = N
δ(sµk 6= 0)− L̃)
P (L̃)
Sparsely-spread CDMA - a statistical mechanics based analysis 6
δ(sµk 6= 0)− C̃)
P (C̃)
P (sµk) , (6)
where N is a normalising constant, P (L̃) is the factor degree probability distribution of
mean L, P (C̃) is the variable degree probability distribution of mean C, and P (sµk) is the
marginal probability distribution which is common to all ensembles
P (sµk) =
δ(sµk) +
δ(sµk − ξ) . (7)
The form of (6) is then sufficient for the sparse distributions we consider in the large system
limit, and makes explicit the chip and user connectivity properties of the ensembles. The
gain factor ξ, is drawn randomly from a single distribution with zero measure at ξ = 0,
and finite moments, in any instance of a code
φ(ξ) = P (sµk = ξ|sµk 6= 0) . (8)
Unlike the dense case the details of this distribution will effect results, but only in a small
way for reasonable choices [2]. We here investigate the case of Binary Phase Shift Keying
(BPSK) which corresponds to a uniform distribution on {− 1√
}, though the analytic
results presented are applicable to any distribution of mean square = 1/L. Note that
disorder in the gain factors is not a necessity, the case ξ = 1/
L also allows decoding in
sparse ensembles.
The case where P (L̃) and P (C̃) are Poissonian distributed identifies the irregular
ensemble - where the connections between chips and users are independently distributed.
The second distribution called partly regular has P (C̃) = δC,C̃ , in which the chip
connectivity is again Poisson distributed with mean L, but each user contributes to exactly
C chips. This prevents the systematic failure inherent in the irregular ensemble since
therein an extensive number of users fail to transmit on any chips. If in addition to
the aforementioned constraint all chips receive exactly L contributions, P (L̃) = δL,L̃,
the ensemble is called regular. Regular chip connectivity amongst other things prevents
the systematic inefficiency due to leaving some chips unaccessed by any of the users.
The case of Poissonian distributions is that in which there is no global control. In
many engineering applications constraining users individually (non-Poissonian P (C̃)) is
practical, whereas coordination between users (non-Poissonian P (L̃)) is difficult. The
practicalities of implementing the different ensembles we consider are application specific:
the advantages inherent in distributing channel resources more evenly amongst users may
be lost to practical implentation problems.
3. Methodology
3.1. Spectral Efficiency Lower Bound
The inferiority of codes with Poissonian user connectivity has been pointed out previously
(e.g., in [2]), based on the understanding that codes which leave a portion of the users
Sparsely-spread CDMA - a statistical mechanics based analysis 7
disconnected cannot be optimal. Analogously we argue that codes with irregular chip
connectivity must also be inferior in that they leave a fraction of the chips (bandwidth)
unutilised, thus providing a motivation for considering fully regular codes.
In this section we show a particular case in which the regular codes are expected
to outperform any other ensemble by analysing the amount of information that can be
extracted on the sent bits by consideration of only one chip in isolation of the other chips.
This corresponds to a detector reconstructing bits based only on the value of a single chip,
and is independent of the user connectivity.
The spectral efficiency is defined as the mutual information between the received
signal and reconstructed bits per chip. In considering only a single chip (µ) we have
I(τ ; yµ) =
P (τ |yµ)
P (τ )
P0(τ ,yµ)
, (9)
where the subscript zero indicates that the true (generative), rather than model (2),
probability distribution. For brevity we consider the simplest case that the generative
and model probability distributions are the same with unbiased bits and a Gaussian noise
distribution in which case after some rearrangement
I(τ ; yµ) = L̃−
exp(−Hµ(τ µ))
τ µ exp(−Hµ(τ µ))
P0(τ µ,yµ)
, (10)
where τ µ are the bits connected to chip µ, and the chip Hamiltonian is
ξiτi + yµ
, (11)
labelling each interacting (non-zero) component on the chip by i, L̃ being the chip
connectivity.
Working from this description we wish to compare the performance of ensembles
with different chip connectivities. To do this we consider the ensemble average mutual
information by averaging the mutual information over the connectivities (L̃), load factors,
and transmitted bits. This average is complicated, however it is possible to calculate the
dominant terms in the low and high PSD limits.
In the case of low noise (PSD → ∞) we find the asymptotically dominant terms
come first from the numerator
〈log2 exp−H(τ µ)〉
/ log(2) =
2 log(2)
, (12)
which is an average over the ground state energy, and also the logarithm of the denominator
which is
exp−H(τ µ)
ξi(bi − τi)
, (13)
where yµ has been decomposed into its bit ({bi}) and noise (ω) parts, and the averages are
now over the ensembles as well as yµ. The first part of (13) gives an energy contribution
cancelling (12). We call the remaining part the average over the chip entropy, by
Sparsely-spread CDMA - a statistical mechanics based analysis 8
comparison with (10) this determines the amount of information lost in decoding. The
chip entropy term contains an indicator function counting the ground states - the average
chip entropy is zero when τ µ = bµ is the only solution. For the case of BPSK however
there may be some degeneracy in ground states with two terms in the sum being non-zero
but cancelling one another. This degeneracy has a dependence on the distribution P (L̃)
for given L. Averaging over load factors and transmitted bits we find that in the zero
noise limit
I(τ , yµ)
ξi(bi − τi)
P (L̃)
, (14)
min(p,L̃−p)
L̃− p
P (L̃)
. (15)
By numerical evaluation of this function (see results section 4.2) we find that the optimal
ensemble is in fact the regular ensemble. This is because chip entropy, when averaged over
load factors and bits is a concave function in L̃, so that the information loss is minimised
when P (L̃) = δ
. This dependency on L̃ may be a peculiarity of the detector considered,
but many other aspects of the calculation may be generalised to give a similar result.
It is possible to consider the opposite limit σ20 → ∞ perturbatively. We found that
the leading four orders in 1/σ0 were identical for all code ensembles of the same mean chip
connectivity. We would anticipate the behaviour at non-extreme PSD to fall somewhere
between these two regimes and thus for the chip regular ensemble to be atleast as good as
the chip irregular ensembles.
We note here that another reason for considering the regular code optimal amongst
sparse random codes is to consider the field term when the Hamiltonian (11) is written
in canonical form with a set of couplings ({J〈ij〉}) and user specific external fields ({hi}).
In this representation the set of external fields are in expectation aligned with the sent
bit sequence, but subject to fluctuations for each code instance. The variance of these
fluctuations may be shown to be proportional to the excess chip connectivity over the
true chip connectivity [14], which amongst all ensembles is minimised by the regular chip
ensemble. The multi-user interference is larger in irregular codes and hence information
recovery is weaker as predicted in this section.†
3.2. Replica Method Outline
We determine the static properties of our model defined in section 2, including correlations
due to the full interaction structure, we use the replica method. From the expression of
the Hamiltonian (3) we may identify a free energy and partition function as:
f = − 1
lnZ Z = Trτ exp (−βH(τ )) .
† This argument is added since published version.
Sparsely-spread CDMA - a statistical mechanics based analysis 9
To progress we make use of the anticipated self-averaging properties of the system.
The assumption being that in the large system limit any two randomly selected instances
will, with high probability, have indistinguishable statistical properties. This assumption
has firm foundation in several related problems [15], and is furthermore intuitive after
some reflection. If this assumption is true then the statists of any particular instance can
be described completely by the free energy averaged over all instances of the disorder. We
are thus interested in the quantity
F = 〈f〉 = − lim
〈lnZ〉I , (16)
where the angled brackets represent the weighted averages over I (the instances). The
entropy density may be calculated from the free energy density by use of the relation
s = β(e− f) , (17)
where e is the energy density.
To determine the free energy we must average over disorder in (16), which is a difficult
problem except in special cases. This is why we make use of the replica identity
〈lnZ〉I = lim
〈Zn〉I . (18)
We can model the system now as one of interacting replicas, where Zn is decomposed as a
product of an integer number of partition functions with conditionally independent (given
the instance of the disorder) dynamical variables. The discreteness of replicas is essential
in the first part of the calculation, but a continuation to the real numbers is required
in taking n → 0+ – this is a notorious assumption, which rigorous mathematics can not
yet justify for the general case, in spite of the progress made in recent years [16, 17, 18].
However, we shall assume validity and since the methodology for the sparse structures is
well established [19, 20, 15] we omit our particular details. The final functional form for
the free energy is determlained from
〈Zn〉 =
dP (b,σ)dP̂ (b,σ)
exp{lnN +N(G1(n) +G2(n) +G3(n))} ; (19a)
G1(n) = ln
λ2α/2
P (b,σ)
λα(b− τα)
P (L̃)
; (19b)
G2(n) =
P (b,σ)P̂ (b,σ) ; (19c)
G3(n) = α ln
(−L)C̃
P̂ (b, τ )
P (C̃)
P0(b)
; (19d)
where N is a constant due to normalising the ensembles (6). This expression may be
evaluated at the saddle point to give an expression for the free energy. In the term (19d)
Sparsely-spread CDMA - a statistical mechanics based analysis 10
we account for the cases in which the marginalised probability distribution P0(b) and
assumed marginal probability distribution (described by H) are asymmetric. In the case
of maximal rate which we will consider, the b average is trivial and H = 0. Provided
that in addition the gain factor distribution is symmetric then it is possible to remove
the b dependence in the order parameters, since the symmetry P (b,σ) = P (−b,−σ) and
P̂ (b,σ) = P̂ (−b,−σ) leaves the free energy invariant.
3.3. Replica Symmetric Equations
The concise form for our equations is attained using the assumption of replica symmetry
(RS). This amounts to the assumption that the correlations amongst replicas are all
identical, and determined by a unique shared distribution. The validity of this assumption
may be self consistently tested (section 3.5). This assumption differs from that used
by Yoshida and Tanaka [2] where the correlations are described by only a handful of
parameters rather than a distribution once RS is assumed – this approach may therefore
miss some of the detailed structure although it is easier to handle numerically. The order
parameter in our case is given by
P (b, τ ) =
dπ(x)
(1 + bταx)
; (20a)
P̂ (b, τ ) = q̂
dπ̂(x̂)
(1 + bταx) ; (20b)
where q̂ is a variational normalisation constant and π, π̂ are normalised distributions on
the interval [−1, 1]. From here onwards we may consider the case in which the bit variables
τα and gain factors ξ are gauged to b (τb → τ , ξb → ξ).
Using Laplace’s method, this gives the following expression for the (RS) free energy
at the saddle point
FRS = −
Extrπ,bπ
G1,RS(L̃)(n) + G2,RS(n) + G3,RS(C̃)(n)
where
G1,RS(n)
= − L ln 2
[dπ(xl)]
ln Tr{τl=±1}χL̃(τ ; {ξ}, ω, {x})
Ω(ω),φ(ξ)
P (L̃)
; (22a)
χL̃(τ ; {ξ}, ω, {x}) = exp
(1− τl)ξl
(1 + τlxl) ; (22b)
G2,RS(n) = − L
dπ(xc)dπ̂(x̂c) ln(1 + xx̂c) ; (22c)
G3,RS(n) = α
[dπ̂(x̂c)] ln
(1 + x̂c) +
(1− x̂c)
P (C̃)
. (22d)
Sparsely-spread CDMA - a statistical mechanics based analysis 11
and the saddle point value for ŵ (= L) has been introduced. The averages over L̃ and C̃
encapsulate the differences amongst the ensembles.
Equation (22b) describes the interaction at a single chip in the factor graph (figure 1)
of connectivity L̃. The parameter ξl and variable τ are the gain factors, and reconstructed
bits respectively, both gauged to the transmitted bit, while ω is the instance of the chip
noise.
The order variational distributions {π, π̂} are chosen so as to extremise (21). The self
consistent equations attained by the saddle point method are:
π̂(x̂) =
[dπ(xl)]
Tr{τl=±1} τL̃+1 χ̄L̃(τ ; {ξ}, {x̂})
Tr{τl=±1} χ̄L̃(τ ; {ξ}, ω, {x})
{ξ},ω
P (L̃)
(23a)
χ̄L̃(τ ; {ξ}, ω, {x}) = exp
(1− τl)ξl
(1 + τlxl) (23b)
π(x) =
[dπ̂(x̂c)] δ
c=1(1 + x̂c)−
c=1(1− x̂c)
c=1(1 + x̂c) +
c=1(1− x̂c)
P (C̃)
. (23c)
The variables P (L̃) and P (C̃) are here the excess degree distributions of the particular
ensemble (6). For regularly constrained ensembles the chip and user excesses are L − 1
and C − 1 respectively. For Poissonian distributions the excess degree distribution is the
full degree distribution.
Aside from entropy, the other quantities of interest may be determined from the
probability distribution for the overlap of reconstructed and sent variables mk = 〈τk〉,
P (m) = lim
δmk,m
, (24)
[dπ̂(x̂c)] δ
c=1(1 + x̂c)−
c=1(1− x̂c)
c=1(1 + x̂c)+
c=1(1− x̂c)
P (C̃)
. (25)
We note finally that equivalent expressions to these found with the RS assumption
may be obtained by using the cavity method [6] with the assumption of a single pure state.
This approach is a probabilistic one and hence more intuitive on some levels.
3.4. Population Dynamics
Analysis of these equations is primarily constrained by the nature of equations (23a-
23c). No exact solutions are apparent, and perturbative regimes about the ferromagnetic
solution (which is only a solution for zero noise) are difficult to handle. Consequently
we use population dynamics [21] – representing the distributions {π(x), π̂(x̂)} by finite
populations (histograms) and iterating this distribution until convergence. It is hoped,
and observed, that each histogram captures sufficient detail to describe the continuous
Sparsely-spread CDMA - a statistical mechanics based analysis 12
function and the dynamics (described below) allow convergence towards a true solution
distribution with only small corrections due to finite size effects.
To solve the equations (23a,23c) with population dynamics finite histograms
constucted from M undirected cavity magnetisations are used. Histograms approximating
each function are formed
π(x) → W = {x1, . . . , xi, . . . , xM} , (27a)
π(x̂) → Ŵ = {x̂1, . . . , x̂a, . . . , x̂M} , (27b)
with M sufficiently large to provide good resolution in the desired performance measures.
The discrete minimisation dynamics of the histograms is derived from (23a-23c).
Histogram updates are undertaken alternately, with all magnetisation in the histogram
being updated sequentially. In the update of field xa the quenched parameters {L̃, ω, ξ}
are sampled, L̃ being the chip excess degree, and L̃ magnetisations are randomly chosen
from W , defining through (23a) the update
x̂a =
Tr{τl=±1} τL̃+1 χ̄L̃(τ ; {ξ}, ω, {x})
Tr{τl=±1} χ̄L̃(τ ; {ξ}, ω, , {x})
. (28)
The update of the other histogram follows dynamics in which C̃ is sampled, C̃ being
the user excess degree, along with C̃ randomly chosen magnetisations from Ŵ , defining
through (23c) the update
c=1(1 + x̂c)−
c=1(1− x̂c)
c=1(1 + x̂c) +
c=1(1− x̂c)
. (29)
There is a strong analogy between the population dynamics algorithm and that of
message passing on a particular instance of the graph. The iteration of the histograms
implicit in (28-29) is analogous to the propagation of a population of cavity magnetisations
between factor (a) and user (i) nodes, which may be written as the self consistent equations:
x̂a→i =
Tr{τl=±1}τi exp
l∈∂ari
(1− τl)ξal
l∈∂ari
(1 + τlxl→a) ; (30a)
xi→a =
c∈∂ira
(1 + x̂c→i)−
c∈∂ira
(1− x̂c→i)
; (30b)
where Nx,x̂ are the relevant normalisations, and the abbreviation ∂y indicates the set
of nodes connected to y. In population dynamics, the notion of a particular graph
with labelled edges is absent however, and the only the distribution of the two types
of magnetisations are relevant.
3.5. Stability Analysis
To test the stability of the obtained solutions we consider both the appearance of
non-negative entropy, and a stability parameter defined through a consideration of the
Sparsely-spread CDMA - a statistical mechanics based analysis 13
fluctuation dissipation theorem. The first criteria that the entropy be non-negative is
based on the fact that physically viable solutions in discrete systems must have non-
negative entropy so that any solution found not meeting this criteria must be based on
bad premises; replica symmetry is a likely source.
The stability parameter λ is defined in connection with the cavity method for spin
glasses [22] and tests local stability of the solutions. It is equivalent to testing the local
stability of belief propagation equations as proposed in [23]. A necessary condition for
the stability of the RS solution is that the corresponding susceptibility does not diverge.
This condition ensures that fields are not strongly correlated. The spin glass susceptibility
when averaged over instances may be defined
〈τ0τd〉2c
, (31)
where d is the distance between two nodes in the factor graph, the inner average denotes
the connected correlation function between these nodes, Xd describes the typical number
of variables at distance d, and the outer average is over instances of the disorder (self-
averaging part). This quantity is not divergent provided that
λ = ln
〈τ0τd〉2c
is negative, since this indicates an asympoptically exponential decrease in the terms of (31)
and hence convergence of the sum. In the thermodynamic limit the connected correlation
function is dominated by a single direct path which may be decomposed as a chain of local
linear susceptibilities
〈τ0τd〉c ∝
(i,j)
∂xi→a
∂x̂b→i
∂x̂b→i
∂xj→b
, (33)
where (i,j) indicate the set of variables on the shortest path between nodes 0 and d in a
particular instance of the graph (30a).
This representation allows us to construct an estimation for λ numerically based
on principles similar to population dynamics [24] – the directedness and fixed structure
implicit in a particular problem is removed with the self-averaging assumption leaving a
functional description similar to (23a-23c), which may be iterated. In order to approximate
the stability parameter λ one introduces additional positive numbers in the population
dynamics histograms (27b,27a), xi → {xi, vi} and x̂a → {x̂a, v̂a} respectively. These new
values represent the relative sizes of perturbations in each magnetisation, and are updated
in parallel to (28,29) as
v̂a =
, (34)
and with similar assignments for the field update of W
. (35)
Sparsely-spread CDMA - a statistical mechanics based analysis 14
The partial derivatives are calculated from (28-29) and evaluated at the corresponding
values in the sampled population. If the final fixed point is stable against small
perturbations in the initial field then these values {v, v̂} must decay exponentially on
average. Renormalisation of {vi} and {v̂a} such that the mean is 1 after each update is
necessary. The numerical renormalisation constant for each population yields (dependent)
estimations of λ, which can be sampled at a suitable convergence time (end of the {W, Ŵ}
minimisation process).
Like population dynamics we expect behaviour to be sensitive to initialisation
conditions and finite size effects in some circumstances. In addition the estimation requires
good resolution in the histograms W and Ŵ .
4. Results
Results are presented here for the canonical case of Binary Phase Shift Keying (BPSK)
where ξl ∈ {1,−1} with equal probability. Furthermore, we assume an AWGN model for
the true noise ω (of variance σ20). For evaluation purposes we assume the channel noise
level is known precisely, so that β = 1, employing the Nishimori temperature [5]. This
guarantees that the RS solution is thermodynamically dominant. Furthermore the energy
takes a constant value at the Nishimori temperature and hence the entropy is affine to
the free energy. Where of interest we plot the comparable statistics for the Single User
Gaussian channel (SUG), and the densely spread ensemble, each with MPM detectors –
equivalent to maximum likelihood for individual bits.
For population dynamics two parallel populations (27a,27b) are initialised either
uniformly at random, or in the ferromagnetic state. These two populations are known
to converge towards the unique solution, where one exists, from opposite directions, and
so we can use their convergence as a criteria for halting the algorithm and testing for the
appearance of multiple solutions. In the case where they converge to different solutions
we can usually identify the solution converged to from the ferromagnetic initial state as
a good solution - in the sense that it reconstructs well, and that arrived at from random
initial state as a bad solution. In the equivalent belief propagation algorithm one cannot
choose initial conditions equivalent to ferromagnetic – knowing the exact solution would
of course makes the decoding redundant. We therefore expect the properties of the bad
solution to be those realisable by belief propagation (though clever algorithms may be
able to escape to the good solution under some circumstances). The stability variables
{v, v̂} were initialised independently each as the square of a value drawn from a gaussian
distribution – and tests indicated other reasonable distributions produced similar results.
Computer resources restrict the cases studied in detail to an intermediate PSD
regime, and small L. In particular, the problem at low PSD, is the Gaussian noise
average, which is poorly estimated, while at high PSD a majority of the histogram is
concentrated at magnetisations x, x̂ ≈ 1 not allowing sufficient resolution in the rest of
the histogram.
Several different measures are calculated from the converged order parameter,
Sparsely-spread CDMA - a statistical mechanics based analysis 15
indicating the performance of sparsely-spread CDMA. Using the converged histograms
for the fields we are able to determine the following quantities: free energy, energy and a
histogram for the probability distribution, from discretisations of the previously presented
equations (23a-23c). Using the probability distribution we are also able to approximate
the decoding bit error rate
dP (m)
1− sign(m)
; (36)
multi-user efficiency
MuE =
erfc−1(Pb)
; (37)
and mutual information between sent and reconstructed bits per chip, I(b; τ )/N (taking
a factorised form given the RS assumption)
MI = α
dP (m)
1 + τm
1 + τm
. (38)
The spectral efficiency is the capacity I(τ ;y) per chip, which is affine to the entropy (and
the free energy at the Nishimori temperature)
ν = α− s/ ln 2 . (39)
Negative entropy can be identified when the measured spectral efficiency exceeds the load,
and thermodynamic transition points correspond to points of coincident spectral efficiency.
Figure 2‡ demonstrates some general properties of the regular ensemble in which the
variable and factor degree connectivities are C : L = 3 : 3, respectively. Equations (23a-
23c) were iterated using population dynamics and the relevant properties were calculated
using the obtained solutions; the data presented is averaged over 100 runs and error-bars,
which are typically small, are omitted for brevity. Figure 2(a) shows the bit error rate
in regular and Poissonian codes, the inset focuses on the range where the sparse-regular
and dense cases crossover. The sparse codes demonstrate similar trends to the dense case
except the irregular code, which show weaker performance in general, and in particular at
high PSD. Detailed trends can be seen in figure 2(b) that shows the multiuser efficiency.
Codes with regular user connectivity show superior performance with respect to the dense
case at low PSD. Figure 2(c) shows similar trends in the spectral efficiency and mutual
information (shown in the inset); the effect of the disconnected (user) component is clear
in the fact that the irregular code fails to reach capacity at high noise levels. In general
it appears the chip connectivity distribution is not critical in changing the trends present,
unlike the user connectivity distribution. It was found in these cases (and all cases with
unique solutions for given PSD), that the algorithm converged to non-negative entropy
values and to a stability measure fluctuating about a value less than 0, as shown in
figure 2(d). These points would indicate the suitability of the RS assumption.
‡ This figure has been modified from the published version, the difference being that the Poissonian chip
connectivity codes have everywhere weaker performance than the dense and sparse regular code ensemble.
Sparsely-spread CDMA - a statistical mechanics based analysis 16
The outperformance of dense codes by sparse ensembles with regular user connectivity
in the low PSD regime is new to our knowledge, although Poissonian chip connectivity
is everywhere inferior to both the dense and regular sparse codes. The difference between
codes disappears rapidly with increasing (connection) density at fixed α (figure 3). This
is inline with our prediction of the regular code being a high performance ensemble in
preceeding sections.
Figure 3 indicates the effect of increasing density at fixed α in the case of the
regular code. As density is increased the statistics of the sparse codes approach that
of the dense channel in all ensembles tested. For the irregular ensemble performance
increases monotonically with density at all PSD. The rapid convergence to the dense
case performance was elsewhere observed for partly regular ensembles, and ensembles
based on a Gaussian prior input [2, 7]. At all densities for which single solutions were
found the RS assumption appeared validated in the stability parameter and entropy.
Figure 4 indicates the effect of channel load α on performance. We first explain results
for codes in which only a single solution was found (no solution coexistence). For small
values of the load a monotonic increase in the bit error rate, and capacity are observed as α
is increased with C constant, as shown in figures 4(a) and 4(b), respectively. This matches
the trend in the dense case, the dense code becoming superior in performance to the sparse
codes as PSD increases. We found that for all sparse ensembles there existed regimes with
α > 1.49 for which only a single stable solution existed, although the equivalent dense
systems are known to have two stable solutions in some range of PSD [3]. In all single
valued regimes we observed positive entropy, and a negative stability parameter. However,
in cases of large α many features became more pronounced close to the dense case solution
coexistence regime: notably the cusp in the stability parameter, gap between MI and ν
and the gradient in Pb.
4.1. Solution Coexistence Regimes
As in the case of dense CDMA [3], also here we observe a regime where two solutions, of
quite different performance, coexist. In order to investigate the regime where two solutions
coexist we investigated the states arrived at from random and ferromagnetic initial
conditions (giving bad and good solutions respectively). Separate heuristic convergence
criteria were found for the histograms, and these seemed to work well for the good
solution. For the bad solution we simply present results after a fixed number of histogram
updates (500) as all convergence criteria tested appeared either too stringent, to require
experimentally inaccessible timescales, or did not capture the asymptotic values for
important quantities like entropy. We believe 500 updates to be sufficiently conservative
to capture the properties of these solutions however.
Figure 4(a) shows the dependence of the bit error rate on the load, which is also
equivalent to L/C. There is a monotonic increase in bit error rate with the load and the
emergence and coexistence of two separate solutions above a certain point; in the case
of the 6 : 3 code the point above which the two solutions coexist is PSD = 10.23dB as
Sparsely-spread CDMA - a statistical mechanics based analysis 17
−10 −8 −6 −4 −2 0 2 4 6 8 10
Spectral Power Density [dB]
Irreg.
P. Reg.
Dense0 1 2 3 4 5 6 7 8
−10 −8 −6 −4 −2 0 2 4 6 8 10
Spectral Power Density [dB]
Irreg.
P. Reg.
Dense(b)
−10 −8 −6 −4 −2 0 2 4 6 8 10
Spectral Power Density [dB]
Irreg.
P. Reg.
Dense
2 2.5 3 3.5 4 4.5 5 5.5 6
−10 −8 −6 −4 −2 0 2 4 6 8 10
Spectral Power Density [dB]
Irreg.
P. Reg.
Reg. Type 1
Reg. Type 2
Figure 2. Performance of the sparse CDMA configuration of variable and factor degree
connectivities C : L = 3 : 3, respectively; all data presented on the basis of 100 runs, error
bars are omitted and are typically small in subfigures (a)-(c) the smoothness of the curves
being characteristic of this level (numerical accuracy was excellent only at intermediate
PSDs). (a) The bit error rate is limited by the disconnected component in the case of
irregular codes, otherwise trends match the dense case, lower bounded by the SUG. Inset
- the range where the sparse-regular and dense cases crossover.(b) Multiuser efficiency
indicates the regular user connectivity codes outperform the dense case below some PSD.
(c) The spectral efficiency [——] demonstrates similar trends, the entropy being positive.
The gap between the mutual information [· · · · · ·] and spectral efficiency (shown in the
inset) is everywhere small and especially so at small and large PSD, indicating little
information loss in the decoding process. (d) The two markers show the mean results
for the two different stability estimates in the algorithm for the regular code. There are
systematic errors at small PSD, and convergence is good only at intermediate PSD.
The lines represent the average of these quantities for each ensemble – all ensembles show
a cusp at some PSD, for 3 : 3 codes the various ensembles shows very similar trends,
indicating local stability everywhere.
Sparsely-spread CDMA - a statistical mechanics based analysis 18
−10 −8 −6 −4 −2 0 2 4 6 8 10
Spectral Power Density [dB]
Dense
0 1 2 3 4 5 6 7 8
Spectral Power Density [dB]
Dense
Figure 3. The effect of increasing density for the regular ensemble: (a) Multiuser
efficiency, (b) spectral efficiency [——] and mutual information [– – –]. Data presented on
the basis of 10 runs, error bars are omitted but of a size comparable with the smoothness
of the curves. The performance of sparse codes rapidly approaches that of the dense code
everywhere. The PSD threshold beyond which the dense code outperforms the sparse
code is fairly stable.
indicated by the vertical dotted line.
We use the regular code 6 : 3 to demonstrate the solution coexistence found above
some PSD in various codes. The onset of the bimodal distribution can be identified by
the divergence in the convergence time in the single solution regime (the time for the
ferromagnetic and random histograms to converge to a common distribution). The time
for this to occur, in a heuristically chosen statistic and accuracy, is plotted in figure 4(b).
By a naive linear regression across 3 decades we found a power law exponent of 0.59 and
a transition point of PSD = 10.23dB, but cannot provide a goodness of fit measure to
this data. This would represent the point at which at least two stable solutions co-exist.
Beyond PSD ≈ 12dB only one stable solution is found from both random and
ferromagnetic initial conditions, corresponding statistically to a continuation of the good
solution. A solution which statistically resembles a continuation of the bad solution is
occasionally arrived at from both initial conditions, this solution had a positive stability
parameter and negative entropy – so is not a viable solution. Thus we predict a second
dynamical transition in the region of 12dB, as might be guessed by comparison with the
dense case and observation of the trend in the stability parameter (see figure 4(c)).
The stability results are presented in figure 4(c). Only two stable solutions were found
in the region beyond this critical point and upto 12dB, which we infer to be viable RS
solutions (where entropy is positive). The bad solution upto 12dB has a well resolved
negative value. The good solution has a negative value in its mean, but like other near
ferromagnetic solutions investigated results are very noisy due to numerical issues relating
to histogram resolution.
Both capacity and spectral efficiency monotonically increase with the load as shown in
figure 4(d). For the 6 : 3 code we see a separation of the two solutions at PSD = 10.23dB
Sparsely-spread CDMA - a statistical mechanics based analysis 19
−4 −2 0 2 4 6 8 10 12
Spectral Power Density [dB]
6:3 (Bad)
6:3 (Good)
 − PSD
Data Mean
Linear fit
Bounds
−4 −2 0 2 4 6 8 10 12
Spectral Power Density [dB]
6:3 (Bad)
6:3 (Good)
−4 −2 0 2 4 6 8 10 12
Spectral Power Density [dB]
6:3 (Bad)
6:3 (Good)
8   10  12  
Figure 4. The effect of channel load α on performance for the regular ensemble. Data
presented on the basis of 10 runs, error bars omitted but characterised by the smoothness
of curves. Dashed lines indicate the dense code analogues. The vertical dotted line
indicates the point beyond which 6 : 3 random and ferromagnetic initial conditions failed
to converge to the same solution, both dynamically stable solutions are shown beyond
this point. (a) There is a monotonic increase in bit error rate with the increasing load.
(b) Investigation of the 6 : 3 code (α = 2) indicates a divergence in convergence time as
PSD → 10.23dB with exponent 0.59 based on a simple linear regression of 15 points (each
point is the mean of 10 independent runs). Beyond this point different initial conditions
give rise to one of two solutions. (c) The stability parameter was found to be negative
for all convergent solutions, indicating the suitability of RS. Where the solution is near
ferromagnetic the stability measure becomes quickly very noisy (as shown for the 5 : 3
and 6 : 3 codes). (d) As load α is increased there is a monotonic increase in capacity.
The spectral efficiency for the ’bad’ solution exceeds 2 in a small interval (equivalent to
negative entropy), similar to the behaviour reported for the dense case.
Sparsely-spread CDMA - a statistical mechanics based analysis 20
(vertical dotted line.) The dashed lines correspond to a similar behaviour observed in the
dense case (the range of interest is magnified in the inset.) A cross over in the entropy of the
two distinct solutions, near PSD ≈ 11dB, is indicative of a second order phase transition.
As in the dense case, only the solution of smallest spectral efficiency is thermodynamically
relevant at a given PSD, although the other is likely to be important in decoding dynamics.
The trends in the sparse case follow the dense case qualitatively, with the good solution
having performance only slightly worse than the corresponding solution in the dense case
(and vice versa for the bad solution).
The entropy of the bad solution becomes negative in a small interval (spectral
efficiency exceeds 2) although no local instability is observed. The static and dynamic
properties of the histograms appear to be well resolved in this region. However, the
negative entropy indicates an instability towards either a type of solution not captured
within the RS assumption, or towards some metastable configuration. We will not
speculate further, the bad solution is in any case thermodynamically subdominant in
its low and negative entropy form.
Our hypothesis is therefore that the trends in the sparse ensembles match those in
the dense ensembles within the coexistence region and RS continues to be valid for each
of two distinct positive entropy solutions. The coexistence region for the sparse codes is
however smaller than in the corresponding dense ensembles. Since our histogram updates
mirror the properties of a belief propagation algorithm on a random graph we can suspect
that the bad solution may have implications for the performance of belief propagation
decoding in the coexistence region, and that convergence problems will appear near this
region. In the user regular codes investigated the bad solution of the sparse ensemble
outperforms the bad solution of the dense ensemble, and vice-versa for the good solution.
Thus regardless of whether sparse decoding performance is good or bad, the dynamical
transition point for the dense ensemble would corresponds to a PSD beyond which dense
CDMA outperforms sparse CDMA at a particular load.
4.2. Spectral Efficiency Lower Bound Numerical Results
Finally we present figure 5, which shows the the mutual information between a single
chip and transmitted bits for sparse ensembles of differing chip connectivity in the infinite
PSD (zero noise) limit (15). This shows that in expectation a chip drawn from the
regular ensemble contains more information on the transmitted bits than a chip drawn
from any other ensemble (including the Poissonian ensemble). The difference between the
regular and Poissonian ensembles becomes relatively smaller as L increases. This appears
consistent with the replica method results found at high PSD, although regular chip
connectivity under performed by comparison with Poisson distributed chip connectivity
in the low PSD regime, which was not anticipated by the single chip approximation.
Sparsely-spread CDMA - a statistical mechanics based analysis 21
0 5 10 15 20 25
Mean Chip Connectivity, L
Poissonian
Figure 5. A PSD → ∞ limit to the expected mutual information between a single chip,
and the transmitted bits. Mutual Information is highest for regular chip connectivities,
with the Poissonian chip connectivity result also shown, the discrepancy becoming
relatively small as L increases. The inset shows the mutual information/bit decoded
(〈I(τ ; yµ)〉 /L) on a log-log plot to demonstrate an asymptotic power law behaviour and
show more detail in the cases of small L.
5. Concluding Remarks
Our results demonstrate the feasibility of sparse regular codes for use in CDMA. At
moderate PSD it seems the performance of sparse regular codes may be very good. With
the replica symmetric assumption apparently valid at practical PSD it is likely that fast
algorithms based on belief propagation may be very successful in achieving the theoretical
results. Furthermore for lower density sparse codes the problem of the coexistence regime,
which limits the performance of practical decoding methods, seems to be less pervasive
than for dense ensembles in the over saturated regime.
A direct evaluation of the properties of belief propagation may prove similar results
to those shown here. In the absence of replica symmetry breaking states it is normally
true that belief propagation performs very well. However, to make best use of the channel
resources it may be preferable to implement high load regimes in cases of high PSD, and
so overcoming the algorithmic problems arising from the solution coexistence is a challenge
of practical importance in this case.
Other practical issues in implementation are certainly significant. Similar to the case
of dense CDMA there are considerable problems relating to multipath, fading and power
control, in fact it is known that these effects are more disruptive for the sparse codes,
especially regular codes. However, certain situations such as broadcasting (one to many)
channels and downlink CDMA, where synchronisation can be assumed, may be practical
points for future implementation. There are practical advantages of the sparse case over
dense and orthogonal codes in some regimes. The sparse CDMA method is likely to be
particularly useful in frequency-hopping and time-hopping code division multiple access
(FH and TH -CDMA) applications where the effect of these practical limitations is less
Sparsely-spread CDMA - a statistical mechanics based analysis 22
emphasised.
Extensions based on our method to cases without power control or synchronisation
have been attempted and are quite difficult. A consideration of priors on the inputs, in
particular the effects when sparse CDMA is combined with some encoding method may
also be interesting.
Acknowledgments
Support from EVERGROW, IP No. 1935 in FP6 of the EU is gratefully acknowledged.
DS would like to thank Ido Kanter for helpful discussions.
Bibliography
[1] S. Verdu. Multiuser Detection. Cambridge University Press, New York, NY, USA, 1998.
[2] M. Yoshida and T. Tanaka. Analysis of sparsely-spread cdma via statistical mechanics. In Proceedings
- IEEE International Symposium on Information Theory, 2006., pages 2378–2382, 2006.
[3] T. Tanaka. A statistical-mechanics approach to large-system analysis of cdma multiuser detectors.
Information Theory, IEEE Transactions on, 48(11):2888–2910, Nov 2002.
[4] D. Guo and S. Verdu. Communications, Information and Network Security, chapter Multiuser
Detection and Statistical Mechanics, pages 229–277. Kluwer Academic Publishers, 2002.
[5] H. Nishimori. Statistical Physics of Spin Glasses and Information Processing. Oxford Science
Publications, Oxford, UK, 2001.
[6] M. Mezard, G. Parisi, and M.A Virasoro. Spin Glass Theory and Beyond. World Scientific,
Singapore, 1987.
[7] A. Montanari and D. Tse. Analysis of belief propagation for non-linear problems: The example
of cdma (or: How to prove tanaka’s formula). In Proceedings IEEE Workshop on Information
Theory, 2006.
[8] Y. Kabashima. A statistical-mechanical approach to cdma multiuser detection: propagating beliefs
in a densely connected graph. cond-mat/0210535, 2002.
[9] J.P Neirotti and D. Saad. Improved message passing for inference in densely connected systems.
Europhys. Lett., 71(5):866–872, 2005.
[10] A. Montanari, B. Prabhakar, and D. Tse. Belief propagation based multiuser detection. In
Proceedings of the Allerton Conference on Communication, Control and Computing, Monticello,
USA, 2006.
[11] D. Guo and C. Wang. Multiuser detection of sparsely spread cdma. (unpublished), 2007.
[12] T. Tanaka and D. Saad. A statistical-mechanical analysis of coded cdma with regular ldpc codes.
In Proceedings - IEEE International Symposium on Information Theory, 2003., page 444, 2003.
[13] D.J. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press,
2004.
[14] J. Raymond and D. Saad. Randomness and metastability in cdma paradigms. arXiv:0711.4380,
2007.
[15] R. Vicente, D. Saad, and Y. Kabashima. Advances in Imaging and Electron Physics, volume 125,
chapter Low Density Parity Check Codes - A statistical Physics Perspective, pages 231–353.
Academic Press, 2002.
[16] M Talagrand. The generalized parisi formula. Comptes Rendus Mathematique, 337(2):111–114, 2003.
[17] S. Franz, M. Leone, and F.L. Toninelli. Replica bounds for diluted non-poissonian spin systems.
Journal of Physics A: Mathematical and General, 36(43):10967–10985, 2003.
[18] F. Guerra. Broken Replica Symmetry Bounds in the Mean Field Spin Glass Model. Communications
in Mathematical Physics, 233:1–12, 2003.
Sparsely-spread CDMA - a statistical mechanics based analysis 23
[19] R. Monasson. Optimization problems and replica symmetry breaking in finite connectivity spin
glasses. J. Phys. A, 31(2):513–529, 1998.
[20] K.Y.M. Wong and D. Sherrington. Graph bipartitioning and spin-glasses on a random network of
fixed finite valence. J. Phys. A, 20:L793–99, 1987.
[21] M. Mezard and G. Parisi. The bethe lattice spin glass revisited. Euro. Phys. Jour. B, 20(2):217–233,
2001.
[22] O. Rivoire, G. Biroli, O.C. Martin, and M. Mzard. Glass models on bethe lattices. Euro. Phys. J.
B, 37:55–78, 2004.
[23] Y. Kapashima. Propagating beliefs in spin glass models. J. Phys. Soc. Jpn., 72:1645–1649, 2003.
[24] J. Raymond, A. Sportiello, and L. Zdeborov. The phase diagram of random 1-in-3 satisfiability
problem. Phys. Rev. E., 76(1):011101, 2007.
	Background
	The model
	Code Ensembles
	Methodology
	Spectral Efficiency Lower Bound
	Replica Method Outline
	Replica Symmetric Equations
	Population Dynamics
	Stability Analysis
	Results
	Solution Coexistence Regimes
	Spectral Efficiency Lower Bound Numerical Results
	Concluding Remarks
ABSTRACT
  Sparse Code Division Multiple Access (CDMA), a variation on the standard CDMA
method in which the spreading (signature) matrix contains only a relatively
small number of non-zero elements, is presented and analysed using methods of
statistical physics. The analysis provides results on the performance of
maximum likelihood decoding for sparse spreading codes in the large system
limit. We present results for both cases of regular and irregular spreading
matrices for the binary additive white Gaussian noise channel (BIAWGN) with a
comparison to the canonical (dense) random spreading code.

<|endoftext|><|startoftext|>
Introduction
In [1], Ando proved the following inequalities for positive semidefinite (PSD)
matrices A, B, and any unitarily invariant (UI) norm. For any non-negative
operator monotone function f(t) on [0,∞):
|||f(A)− f(B)||| ≤ |||f(|A−B|)|||, (1)
and, when f(0) = 0 and f(∞) = ∞, and g is the inverse function of f ,
|||g(A)− g(B)||| ≥ |||g(|A−B|)|||. (2)
In a later paper [2], Ando and Zhan proved the related inequalities (with the
same conditions on f and g)
|||f(A) + f(B)||| ≥ |||f(A+B)|||, (3)
|||g(A) + g(B)||| ≤ |||g(A+B)|||. (4)
The conditions on f are satisfied by every operator concave function f with
f(0) = 0.
Inequality (4) was generalised by Kosem [7] to non-negative convex functions
g on [0,∞), with g(0) = 0. Inequality (3) was generalised very recently to any
non-negative concave function on [0,∞) by Bourin and Uchiyama [5], who
also gave a simpler proof of Kosem’s result.
In the same spirit, we consider the question whether inequalities (1) and (2)
can also be generalised to non-negative concave f and convex g, respectively.
After introducing the necessary prerequisites in Section 2, we present our main
results concerning this question in Section 3. Regrettably, most of our results
are negative answers, and we give counterexamples to this generalisation. The
answer is even negative for the special case A ≥ B, although the apparent
hardness of finding counterexamples had led us temporarily into believing
that in that case the generalisation might actually hold.
All is not bad news, however. In Section 4 we answer the question affirmatively
in the special case when A ≥ ||B||. In Section 5, we introduce the novel notion
of Y -dominated majorisation between the spectra of two Hermitian matrices,
where Y is itself a Hermitian matrix. We prove a certain property of this
relation, namely Proposition 3, which we subsequently use, first in a rather
destructive fashion. To wit, the Proposition has been instrumental in finally
discovering a counterexample to the generalisation of (1) for A ≥ B; this will
be reported in Section 6. On the more constructive side, the Proposition also
allows to strengthen the results of Bourin-Uchiyama and Kosem mentioned
above. This is the topic of the final Section, along with a few other applications.
2 Preliminaries
In this Section, we introduce the notations and necessary prerequisites; a more
detailed exposition can be found, e.g. in [4]. We will use the abbreviations LHS
and RHS for left-hand side and right-hand side, respectively.
We denote the vector of diagonal entries of a matrix A by Diag(A).
We denote the absolute value by | · |, both for scalars and for matrices. For
matrices this is defined as |A| := (A∗A)1/2. Similarly, we denote the positive
part of a real scalar or Hermitian matrix by (·)+, and define it by A+ :=
(A+ |A|)/2.
In this paper, we are mainly concerned with monotonously increasing convex
and concave functions from R to R. Kosem noted in [7] that any such function
can be approximated by a sum of angle functions x 7→ ax+ b(x−x0)
+, where
a ≥ 0, and b > 0 for a convex angle function (b < 0 for a concave one).
We are also concerned with the unitarily invariant (UI) matrix norms, which
we denote by ||| · |||, and which are defined in terms of the singular values
σj(·) of the matrix only. We adopt the customary convention that the singular
values are sorted in non-increasing order: σ1 ≥ σ2 ≥ . . . ≥ σd. Special cases
of these norms are the operator norm || · ||, which is just equal to the largest
singular value σ1(·), and the Ky Fan norms || · ||(k), which are defined as the
sum of the k largest singular values:
||A||(k) :=
σj(A).
The famous Ky Fan dominance theorem states that a matrix B dominates
another matrix A in all UI norms if and only if it does so in all Ky Fan norms.
The latter set of relations can be written as a weak majorisation relation
between the vectors of singular values of A and B:
σ(A) ≺w σ(B) :
σj(A) ≤
σj(B), ∀k.
For PSD matrices, the above domination relation translates to a weak majori-
sation between the vectors of eigenvalues: λ(A) ≺w λ(B).
Weyl’s monotonicity Theorem ([4], Corollary III.2.3) states that
k(A) ≤ λ
k(A +B), ∀k,
for Hermitian A and positive semi-definite B. Here, λ↓(A) denotes the (real)
vector of eigenvalues of A sorted in non-increasing order.
Finally, we refer the reader to Chapter 2 of [6] for an exposition of a number
of important functional analytic properties of eigenvalues and corresponding
eigenspaces of a Hermitian matrix, which we will need in the proof of Propo-
sition 2.
3 Main Results
The question we start with is about the straightforward generalisation of in-
equality (2) to non-negative convex functions.
Question 1 For all A,B,≥ 0, for all UI norms, and for non-negative convex
functions g on [0,∞) with g(0) = 0, does the inequality |||g(A) − g(B)||| ≥
|||g(|A− B|)||| hold?
The answer to this question is negative, as shown by the following counterex-
ample. We consider the convex angle function g(x) = x + (x − 1)+ and the
operator norm. For the 2× 2 PSD matrices
0.9 0
0 0.6
, B =
0.8 0.5
0.5 0.4
, (5)
the eigenvalues of g(|A− B|) are 0.65249 and 0.35249, while those of g(A)−
g(B) are 0.65010 and −0.48862. Thus, ||g(|A − B|)||∞ = 0.65249, which is
larger than ||g(A)− g(B)||∞ = 0.65010. ✷
Under the additional restriction A ≥ B, the absolute value in the argument
of g in the right-hand side vanishes, leading to a simplified statement, and a
second question, with better hopes for success. Introducing the matrix ∆ =
A− B,
Question 2 For all B,∆ ≥ 0, for all UI norms, and for non-negative convex
functions g on [0,∞) with g(0) = 0, does the inequality |||g(B+∆)−g(B)||| ≥
|||g(∆)||| hold?
This restricted case also turns out to have a negative answer. Counterexam-
ples, however, were much harder to find, and required a reduction of the prob-
lem based on certain results about a novel majorisation-like relation, which
we call the Y -dominated majorisation. This will be the subject of Sections 5
and 6, where a number of Propositions of independent interest are proven.
It is also very reasonable to ask:
Question 3 For all B,∆ ≥ 0, for all UI norms, and for non-negative concave
functions f on [0,∞), does the inequality |||f(B + ∆) − f(B)||| ≤ |||f(∆)|||
hold?
In fact, if this were true, a positive answer to Question 2 would easily follow,
using the same reasoning that was used in [5] to derive the generalisation of
(3) from the generalisation of (4).
Again, this statement is false, as the following counterexample shows. Consider
the concave angle function f(x) = min(x, 1) = x − (x − 1)+, and the 3 × 3
PSD matrices
0.701816 0.317887 0.198910
0.317887 1.014950 −0.093826
0.198910 −0.093826 0.274236
0.192713 0 0
0 0.446505 0
0 0 0.455416
One gets
||f(∆)||∞ = 0.455416
while
||f(B +∆)− f(B)||∞ = 0.455776.
In Section 4, we consider an even more restricted special case, in which the
inequalities (1) and (2) finally do hold. We actually prove that a stronger
relationship holds in this special case:
Proposition 1 For non-negative, monotonously increasing and concave func-
tions g, and A,B ≥ 0 such that A ≥ ||B||, we have
λ↓(g(A− B)) ≥ λ↓(g(A)− g(B)). (6)
An easy Corollary is the corresponding statement for monotonously increasing
convex functions.
Corollary 1 Let f be a non-negative convex function on [0,∞) with f(0) = 0.
Let A,B ≥ 0 such that A ≥ ||B||. Then
λ↓(f(A− B)) ≤ λ↓(f(A)− f(B)). (7)
Proof. Let f = g−1, with g satisfying the conditions of Proposition 1. Replace
in (6) A by f(A) and B by f(B), yielding
λ↓(g(f(A)− f(B))) ≥ λ↓(A− B).
Applying the function f on both sides does not change the ordering, because
of monotonicity of f , and yields validity of inequality (7). ✷
These two results obviously imply the corresponding majorisation relations,
and by Ky Fan dominance, relations in any UI norm.
4 Proof of Proposition 1
We want to prove inequality (6):
λ↓(g(A)− g(B)) ≤ λ↓(g(A−B)),
for A,B ≥ 0, A ≥ ||B||, and concave, monotonously increasing and non-
negative g.
W.l.o.g. we will assume ||B|| = 1, since any other value can be absorbed in
the definition of g.
It is immediately clear that if (6) holds for g that in addition satisfy g(0) = 0,
then it must also hold without that constraint, i.e. for functions g(x)+ c, with
c ≥ 0. This is because the additional constant c drops out in the LHS, while
λ↓(g(A−B) + c) ≥ λ↓(g(A− B)).
Furthermore, (6) remains valid when replacing g(x) with ag(x), for a > 0.
Thus, w.l.o.g. we can assume g(0) = 0 and g(1) = 1. Together with concavity
of g, this implies that, for 0 ≤ x ≤ 1, g(x) ≥ x, while for x ≥ 1, the derivative
g′(x) ≤ 1.
Since 0 ≤ B ≤ 11, and for 0 ≤ x ≤ 1, g(x) ≥ x holds, we have g(B) ≥ B,
or −g(B) ≤ −B. By Weyl monotonicity, this implies λ↓(g(A) − g(B)) ≤
λ↓(g(A)−B). Thus, statement (6) would be implied by the stronger statement
λ↓(g(A)−B) ≤ λ↓(g(A− B)). (8)
Now note that the argument of g in the LHS, A, is never below 1. Thus, in
principle, we could replace g(x) in the LHS by another function h(x) defined
h(x) =
g(x), if x ≥ 1
x, otherwise.
If we also do that in the RHS, we get a stronger statement than (8). Indeed,
h(x) ≤ g(x) for x ≥ 0 and A − B ≥ 0, and therefore h(A − B) ≤ g(A − B)
holds. By Weyl monotonicity again, we see that (8) is implied by
λ↓(h(A)−B) ≤ λ↓(h(A− B)). (10)
The importance of this move is that h(x) is still a monotonously increasing
and concave function (because g′(x) ≤ 1 for x ≥ 1), but now has gradient
h′(x) ≤ 1 for x ≥ 0.
Defining C = A−B, which is positive semi-definite, we now have to show the
inequality
k(h(C +B)− B) ≤ λ
k(h(C)) = h(λ
k(C)),
for every k. Fixing k, and introducing the shorthand x0 = λ
k(C), we can
exploit concavity of h to upper bound it as h(x) ≤ a(x − x0) + h(x0), where
a = h′(x0) ≤ 1. Again by Weyl monotonicity, we find
k(h(C +B)− B)≤λ
k(a(C +B − x0) + h(x0)− B)
k(aC + (a− 1)B − ax0 + h(x0))
k(aC)− ax0 + h(x0) = h(x0),
where in the second line we could remove the term (a−1)B because it is nega-
tive. This being true for all k, we have proved (10) and all previous statements
that follow from it, including the statement of the Theorem. ✷
5 On Y -dominated Majorisation
To answer Question 2, we have to consider the property that a convex function
f satisfies
λ(f(∆)) ≺w λ(f(B +∆)− f(B)) (11)
for all PSD B and ∆, which is equivalent to the statement
λ(f(A− B)) ≺w λ(f(A)− f(B)) (12)
for all A ≥ B ≥ 0.
The monotone convex angle functions x 7→ ax + (x − 1)+ (a ≥ 0) already
have proven their valour as a testing ground for similar statements, in Section
3. Numerical experiments using angle functions for inequality (11) did not
directly lead to any counterexamples, however. This temporarily increased
our belief that the inequality might actually hold, and led us to investigate,
as an initial step towards a “proof”, whether the inequality
j(aY +B) ≤
j(aY + C)
might be true for all a ≥ 0, where B = f(Y ) and C = f(X + Y )− f(X), and
f(x) = (x−1)+. The crucial observation is now that if this holds for all a ≥ 0,
then, actually, a much stronger relationship than just majorisation must hold
between aY +B and aY + C.
To describe this phenomenon, we’ll consider a somewhat broader setting. Let
G and C be Hermitian matrices, and let f1 and f2 be monotonously increasing
real functions on R. Suppose that for all a ≥ 0, the following holds:
j(aA +B) ≤
j(aA+ C), (13)
with A = f1(G) and B = f2(G).
It is easily seen that if (13) holds for a certain value of a, it also holds for all
smaller positive values. Let b be a scalar such that 0 ≤ b < a. Because both
A and B exhibit their eigenvalues as diagonal elements in the eigenbasis of G,
and both in non-increasing order, we get
j(aA +B) =
j (bA+B) + (a− b)
j (A).
On the other hand, for aA + C we only have the subadditivity inequality
j (aA+ C) ≤
j(bA + C) + (a− b)
j(A).
As a consequence, we obtain that, indeed,
j (bA+B) ≤
j(bA + C)
follows from (13).
We are therefore led to consider what happens when a tends to infinity, because
that value dominates all others. Subtracting
j=1 λ
j(aA) from both sides, and
substituting a = 1/t, we obtain
j(A + tB)− λ
j(A)) ≤
j(A+ tC)− λ
j(A)).
In the limit of t going to 0, this yields a comparison between derivatives:
j(A + tB) ≤
j(A+ tC). (14)
We will show below that the derivatives ∂
j (A + tC) are the diagonal
elements of C in a certain basis depending on G and C. Let us first introduce
the vector δ(C;A) whose entries satisfy the following relation:
δj(C;A) :=
j(A+ tC). (15)
With this notation, relation (14) becomes
δj(B;G) ≤
δj(C;G).
That is, the entries of δ(B;G) are “majorised” by those of δ(C;G). How-
ever, this is a much stronger relation than ordinary majorisation, since the
rearrangement of the entries in decreasing order is absent.
Introducing the symbol ≺dw for weak majorisation with missing rearrange-
ment:
a ≺dw b ⇐⇒
bj , (16)
relation (14) is expressed as
δ(B;G) ≺dw δ(C;G). (17)
To justify these notations, we now show:
Proposition 2 Let A and C be Hermitian matrices. With δ(C;A) defined by
(15), the entries of the vector δ(C;A) are the diagonal entries of C in a certain
basis in which A is diagonal and its diagonal entries appear sorted in non-
increasing order. When all eigenvalues of A are simple (i.e. have multiplicity
1), this basis is just the eigenbasis of A and does not depend on C.
Proof. There are three cases to consider, according to whether A is non-
degenerate, A + tC has an accidental degeneracy at t = 0, or A + tC is
permanently degenerate.
1. The most important case is when all eigenvalues of A are simple, i.e. when
they have multiplicity 1. We then show that the derivative is given by
j(A+ tC) = Tr[Pk(A) C],
where Pk(A) denotes the projector on the subspace spanned by the k eigen-
vectors of A corresponding to its k largest eigenvalues.
By the simplicity of the eigenvalues of A, the eigenvalues of A + tC are also
simple for small enough values of t. This follows easily fromWeyl’s inequalities:
j(A) + λ
n(tC) ≤ λ
j (A+ tC) ≤ λ
j(A) + λ
1(tC);
thus if t||C|| is strictly less than one half the minimal difference between
all pairs of eigenvalues of A, the difference between all pairs of eigenvalues of
A+tC is bounded away from 0. Therefore, for small enough t, every eigenvalue
of A+ tC has a unique eigenvector, and as a result Pk(A+ tC) is well-defined
as the sum of the projectors on the eigenvectors pertaining to the k largest
eigenvalues.
It is well-known that the eigenvalues of A+tC as functions of the real variable
t can be so ordered that they are analytic functions of t (see [6], Chapter 2),
and hence continuous. This implies that the k-th largest eigenvalue of A+ tC
is also a continuous function of t, for any k.
If, furthermore, an eigenvalue λ(t) of A + tC is simple in an interval of t,
then the projector P (t) on the eigenvector x(t) associated to it (with P (t) =
x(t)x(t)∗) is also analytic, and therefore continuous in t on this interval. We
conclude that Pk(A+ tC) is analytic in t, and therefore differentiable.
By the maximality of Pk(A) in the variational characterisation
j(A) = max
Tr[AQk] = Tr[APk(A)],
where Qk runs over all rank-k projectors, we have
Tr[APk(A + tC)] = 0,
which implies
j (A+ tC)
Tr[(A+ tC)Pk(A+ tC)]
Tr[APk(A+ tC)] +
Tr[(A+ tC)Pk(A)]
=Tr[C Pk(A)].
Let U be the unitary that diagonalises A, i.e. UAU∗ = Λ↓(A). Then
Tr[C Pk(A)] =
(UCU∗)jj,
and the statement of the Proposition follows.
2. When A has degenerate eigenvalues, the situation becomes somewhat more
complicated, but there are no really significant changes. There is no longer
a unique eigenbasis of A, so that Pk(A) is not well-defined for all k. We will
first consider the case where C is such that it removes the degeneracy of the
eigenvalues of A in A+ tC for small enough positive t. In that case Pk(A+ tC)
will be uniquely defined for all positive t less than some value t0, which is the
smallest positive t for which A + tC has an accidental degeneracy (which is
what also happens at t = 0).
This occurs, for instance, when C has simple eigenvalues. Indeed, by analyt-
icity of the eigenvalues of A + tC in t, degeneracy is either accidental (for
isolated values of t) or permanent (for all values of t). Since all eigenvalues
are simple for large enough t, they have to remain simple for all values of t
except possibly for some isolated values, such as t = 0, in this case. Let t0
be the smallest positive such value, then A + tC has simple eigenvalues for
0 < t < t0.
We can therefore define Pk(A) in a unique way as the limit limt→0 Pk(A +
tC). This is an allowed choice because of the continuity of the eigenvalues:
j=0 λ
k(A) = Tr[limt→0 Pk(A + tC) A]. Using the same argument as in the
previous case, we obtain δ(C;A) := Tr[limt→0 Pk(A+ tC) C].
Let λl be the eigenvalues of A (multiplicity not counted), and Ql the projec-
tions onto the corresponding eigenspaces of A (with Q∗l the corresponding in-
clusion operators); the rank of Ql equals the multiplicity of λl, which we denote
by ml. To obtain δ(C;A), we first construct the diagonal blocks Cl := QlCQ
(of size ml), then take the eigenvalues λ
↓(Cl) in non-increasing order of each
block, and then concatenate the obtained sequences of eigenvalues:
δ(C;A) := (λ↓(C1), . . . , λ
↓(Cm)).
If all eigenvalues of A are distinct, this reduces to the vector of diagonal
elements of C in the eigenbasis of A that we encountered in case 1.
For example, if λ↓(A) = (5, 5, 3, 1), then δ(C;A) = (λ
1(C1), λ
2(C1), C33, C44),
where C1 =
C11 C12
C21 C22
 and all entries of C are taken in the eigenbasis of A.
Let U be a unitary (which, in this case, is not unique) that diagonalises A as
UAU∗ = Λ↓, and take the diagonal blocks Cl of UCU
∗, as above. Each block
can be diagonalised using a unitary Vl. Together with U we obtain the total
basis rotation W := U(
l Vl). By construction,
l Vl leaves Λ invariant, and
resolves the ambiguity in U . We obtain that δ(C;A) is the vector of diagonal
entries of C in the basis obtained by applying the unitary W .
3. Finally, we look at the case when A + tC is permanently degenerate, i.e.
when it has degenerate eigenvalues for all values of t. W.l.o.g. we just have to
look at t in an interval [0, t0), where t0 is the smallest positive value for which
A + tC has an accidental degeneracy. Let us denote by λj(t) the eigenvalues
of A+ tC in non-increasing order, multiplicity mj not counted, and by Pj(t)
the projectors on the corresponding eigenspaces. In that case Pk(A + tC) is
only well-defined if there is a j′ such that k = m1 +m2 + . . . +mj′; then we
have Pk(A + tC) = P1(t) + P2(t) + . . .+ Pj′(t).
If there is no such j′, let j′ be the largest integer such that k > m1 + m2 +
. . .+mj′ =: k
′. Thus 0 < k − k′ < mj′+1. Then we have
j (A+ tC)
miλi(t) + (k − k
′)λj′+1(t)
=Tr[(A + tC) (P1(t) + . . .+ Pj′(t) +
k − k′
mj′+1
Pj′+1(t))]
=Tr[(A + tC) (
k − k′
mj′+1
Pk′+mj′+1(A+ tC) + (1−
k − k′
mj′+1
)Pk′(A+ tC))].
Thus, if we define α := (k − k′)/mj′+1,
j=1 λ
j(A + tC) interpolates linearly
between
j=1 λ
j(A+ tC) and
∑k′+mj′+1
j=1 λ
j(A + tC) with parameter α.
Proceeding in the same way as in the two previous cases, we obtain for the
derivative
j (A+ tC) = Tr[C(αPk′+mj′+1(A) + (1− α)Pk′(A))],
where the Pk(A) have to be replaced with the limits limt→0 Pk(A + tC) if in
addition there are accidental degeneracies at t = 0. Let us consider the entries
of C again as before, in an eigenbasis of A in which the eigenvalues of A appear
on the diagonal, in non-increasing order. We get
δ(C;A)k = (1− α)
Cii + α
k′+mj′+1
Cii. (18)
Because of the permanent degeneracy, an eigenbasis is determined up to “lo-
cal” rotations within the various eigenspaces. We consider a partitioning of
C in such an eigenbasis corresponding to these eigenspaces. That is, in C we
can single out diagonal blocks, each of which corresponds to the eigenspace of
eigenvalue λj . We can use our freedom to choose the local rotations to make
all diagonal elements of C equal within each diagonal block. This allows us to
get rid of the interpolation in (18), and we finally obtain that, again,
δ(C;A)k =
with the entries of C taken in the eigenbasis that we have just chosen. ✷
The upshot of this Proposition is that there exists a unitary U such that
UAU∗ = Λ↓(A) and δ(C;A) = Diag(UCU∗). In the generic case that all
λi(A) are distinct, U is unique and does not depend on C.
A number of easy consequences follow immediately from this Proposition:
Corollary 2 Let G and C be Hermitian matrices, f be any monotonously
increasing real function on R, and g any strictly increasing real function on
R, then
(i) δ(f(G);G) = f(λ↓(G)).
(ii) δ(C;G) obeys Schur’s majorisation Theorem: δ(C;G) ≺ λ↓(C).
(iii) δ(C;G) + aλ↓(f(G)) = δ(C + af(G);G), ∀a ≥ 0.
(iv) δ(C; f(A)) = δ(C;A).
Along with the previously demonstrated equivalence of (13) with (17), the
Corollary immediately leads to the following Proposition:
Proposition 3 For Hermitian G,C, monotonously increasing real functions
f1, f2 on R, and A = f1(G), B = f2(G), the following are equivalent:
λ(aA+B) ≺w λ(aA+ C), ∀a ≥ 0 (19)
δ(B;G)≺dw δ(C;G) (20)
δ(aA+ B;G)≺dw δ(aA+ C;G), ∀a ≥ 0. (21)
Proof.
(19) implies (20): This is just Proposition 2.
(20) implies (21): Add aλ↓(A) to both sides and invoke statement (iii) of the
Corollary.
(21) implies (19): By statement (i) of the Corollary, the LHS of (21) is equal
to λ↓(aA+B), while, by statement (ii) of the Corollary, its RHS is majorised
by λ(aA+ C). ✷
6 Counterexample to Question 2
If the answer to Question 2 is to be affirmative, it should at least hold for all
angle functions f(x) = ax+ b(x−x0)
+. By Proposition 3 this is equivalent to
the statement
δ((Y − 11)+; Y ) ≺dw δ((X + Y − 11)
+ − (X − 11)+; Y ).
Consider the 3× 3 matrices
0.35614 −0.053243 0.10116
−0.053243 0.87456 0.40559
0.10116 0.40559 0.82474
0.53642 0 0
0 0.42018 0
0 0 0.094866
The eigenbasis of Y is therefore the standard basis. Then δ((Y − 11)+; Y ) =
(0, 0, 0) and
(X + Y − 11)+ − (X − 11)+ =
−0.00018194 0.00052449 −0.0016345
0.00052449 0.2573 0.12368
−0.0016345 0.12368 0.04
so that δ((X+Y −11)+−(X−11)+; Y ) = (−0.00018194, 0.2573, 0.04). The first
entry is negative, violating the ≺dw relation, and thereby answering Question
2 in the negative. ✷
7 Further Applications of Y -dominated majorisation
One issue we had to address during our attempts at giving a positive answer
to Question 2 dealt with the possibility of reducing the question for convex
functions to convex angle functions. One way of doing so would have been
possible if the set of (monotonously increasing and convex) functions satisfying
(11) were closed under addition. While we were unable to prove this particular
statement (which is most likely false, anyway), Proposition 3 enables us to
prove the corresponding statement for the relation
δ(f(Y ); Y ) ≺dw δ(f(X + Y )− f(X); Y ). (22)
Proposition 4 Let all the eigenvalues of Y be distinct. Let f and g be func-
tions from R to R satisfying (22). Then f + g also satisfies (22).
Proof. By the assumption on the eigenvalues of Y , δ(A; Y ) equals Diag(A) in
a basis only depending on Y and is therefore a linear function of A. We can
therefore add up the inequalities (22) for f and g and obtain the corresponding
inequalities for f + g. ✷
A second application of Proposition 3 is a strengthening of the following Propo-
sition, which we also obtained in the course of our attempts at positively
answering Question 2.
Proposition 5 For X, Y ≥ 0 and ga(x) = ax+
, with a ≥ 0, the following
majorisation statement holds:
λ(ga(Y )) ≺w λ(ga(X + Y )− ga(X)).
Proof. From the proof of Lemma X.1.4 in [4], we have, for X, Y ≥ 0,
j ((X + 11)
−1 − (X + Y + 11)−1) ≤ λ
j(11 − (Y + 11)
Defining the function f(x) = x
= 1− (x+ 1)−1, this turns into:
j(f(X + Y )− f(X)) ≤ λ
j(f(Y )).
This implies the majorisation statement
j(f(X + Y )− f(X)) ≤
j(f(Y )). (23)
We want to prove a somewhat similar statement for the function ga(x). Since
both f and ga are monotonously increasing over R
+, and noting that ga(x) =
(a+ 1)x− f(x), we have
j(ga(Y ))= ga(λ
j (Y )) = (a+ 1)λ
j(Y )− f(λ
j(Y ))
j(f(Y ))= f(λ
j(Y )),
so that
j(ga(Y )) = (a+ 1)λ
j(Y )− λ
j(f(Y )).
This implies in particular
j (ga(Y )) = (a+ 1)
j (Y )−
j(f(Y ))
≤ (a+ 1)
j (Y )−
j(f(X + Y )− f(X)),
where we have inserted (23). Exploiting the well-known relation ([4], Th.
III.4.1)
j(A+B) ≤
j (A) +
j(B),
for A = (a+ 1)Y − f(X + Y ) + f(X) and B = f(X + Y )− f(X) then yields
j (ga(Y ))≤
j((a+ 1)Y − f(X + Y ) + f(X))
j(ga(X + Y )− ga(X)).
Proposition 3, with A = G = Y , B = f(Y ), C = f(X + Y ) − f(X), where
f(x) = x2/(x+ 1), then yields the following strengthening of Proposition 5:
Proposition 6 For X, Y ≥ 0, and ga(x) = ax+
, with a ≥ 0,
δ(ga(Y ); Y ) ≺dw δ(ga(X + Y )− ga(X); Y ).
Here we noted that ga(X + Y )− ga(X) = aY + f(X + Y )− f(X).
To end this Section, we present a third application of Proposition 3, namely
to the results of Kosem and Bourin-Uchiyama. Consider first inequality (3),
which holds for all non-negative concave functions f(x). In particular, it holds
for all functions f = ax+ f0(x), where f0 is non-negative concave, and a ≥ 0.
Inserting this in the eigenvalue-majorisation form of inequality (3), we get the
(A+B)-dominated majorisation relation
λ(a(A+B) + f0(A+B)) ≺w λ(a(A +B) + f0(A) + f0(B)),
for A,B ≥ 0. Proposition 3 then immediately yields the stronger form
δ(f(A+B);A+B) ≺dw δ(f(A) + f(B);A+B), (24)
for all non-negative concave functions f . The strengthening of inequality (4)
is performed in a completely identical way and yields the reversed inequality
of (24) for non-negative convex functions g such that g(0) = 0.
Acknowledgements
JSA thanks Professor Moin Uddin, Director of his institute for encourage-
ment and supporting his visit to attend the conference at Nova Southeastern
University, Fort Lauderdale, Florida, USA, which lead to his introduction to
Koenraad M.R. Audenaert and the completion of this work.
KA thanks the Institute for Mathematical Sciences, Imperial College London,
for support. His work is part of the QIP-IRC (www.qipirc.org) supported by
EPSRC (GR/S82176/0).
References
[1] T. Ando, “Comparison of norms |||f(A)− f(B)||| and |||f(|A−B|)|||,” Math. Z.
197, 403–409 (1988).
[2] T. Ando and X. Zhan, “Norm inequalities related to operator monotone
functions,” Math. Ann. 315, 771–780 (1999).
[3] J.S. Aujla and F.C. Silva, “Weak majorization inequalities and convex functions,”
Lin. Alg. Appl. 369, 217–233 (2003).
[4] R. Bhatia, Matrix Analysis, Springer, Heidelberg (1997).
[5] J.-C. Bourin and M. Uchiyama, “A matrix subadditivity inequality for f(A+B)
and f(A) + f(B),” Arxiv.org E-print math.FA/0702475 (2007).
[6] T. Kato, Perturbation theory for linear operators, Reprint of the 1980 edition,
Classics in Mathematics, Springer-Verlag, Berlin (1995).
[7] T. Kosem, “Inequalities between ||f(A + B)|| and ||f(A) + f(B)||,” Lin. Alg.
Appl. 418, 153–160 (2006).
http://arxiv.org/abs/math/0702475
	Introduction
	Preliminaries
	Main Results
	Proof of Proposition 1
	On Y-dominated Majorisation
	Counterexample to Question 2
	Further Applications of Y-dominated majorisation
	Acknowledgements
	References
ABSTRACT
  For positive semidefinite matrices $A$ and $B$, Ando and Zhan proved the
inequalities $||| f(A)+f(B) ||| \ge ||| f(A+B) |||$ and $||| g(A)+g(B) ||| \le
||| g(A+B) |||$, for any unitarily invariant norm, and for any non-negative
operator monotone $f$ on $[0,\infty)$ with inverse function $g$. These
inequalities have very recently been generalised to non-negative concave
functions $f$ and non-negative convex functions $g$, by Bourin and Uchiyama,
and Kosem, respectively.
  In this paper we consider the related question whether the inequalities $|||
f(A)-f(B) ||| \le ||| f(|A-B|) |||$, and $||| g(A)-g(B) ||| \ge ||| g(|A-B|)
|||$, obtained by Ando, for operator monotone $f$ with inverse $g$, also have a
similar generalisation to non-negative concave $f$ and convex $g$. We answer
exactly this question, in the negative for general matrices, and affirmatively
in the special case when $A\ge ||B||$.
  In the course of this work, we introduce the novel notion of $Y$-dominated
majorisation between the spectra of two Hermitian matrices, where $Y$ is itself
a Hermitian matrix, and prove a certain property of this relation that allows
to strengthen the results of Bourin-Uchiyama and Kosem, mentioned above.

<|endoftext|><|startoftext|>
Introduction
Black holes in space-times of greater than or equal to five dimensions have rich
topological structure. According to the well-known results of Hawking concerning
the topology of black holes in four-dimensional space-time, the apparent horizon
or the spatial section of the stationary event horizon is necessarily diffeomorphic
to a 2-sphere. [1, 2] This follows from the fact that the total curvature, which is
the integral of the intrinsic scalar curvature over the horizon, is positive under the
dominant energy condition and from the Gauss-Bonnet theorem. Alternative and
improved proofs of Hawking’s theorem have been given by several authors. [3, 4, 5, 6]
However in higher dimensional space-times, an apparent horizon or the spatial
section of the stationary event horizon may not be a topological sphere, [7, 8, 9, 10]
because the Gauss-Bonnet theorem does not hold in such cases. Nevertheless,
the positivity of the total curvature of the horizon still holds. This puts certain
topological restrictions on the black hole topology, though they are rather weak. For
example, the apparent horizon in five-dimensional space-time can consist of finitely
many connected sums of copies of S3/Γ and copies of S2 × S1. In fact, exact
solutions representing a black hole space-time possessing a horizon of nonspherical
topology have recently been found in five-dimensional general relativity. When such
black holes with nontrivial topologies are regarded as being formed in the course
of gravitational collapse, questions regarding the evolution of the topology of black
holes naturally arise. Our purpose here is to understand the time evolution of the
topology of event horizons in a general setting. The relation between the crease set,
where the event horizon is nondifferentiable, and the topology of the event horizon
is studied in Refs. [11, 12, 13] for four-dimensional space-times. In the present
work, we carry out a systematic investigation and find useful rules to determine
admissible processes of topological evolution for time slicing of a black hole.
Our approach is to utilize the Morse theory [14, 15] in differential topology. The
Morse theory is useful for the purpose of understanding the topology of smooth
http://arxiv.org/abs/0704.0100v3
manifolds. The basic tool used in this approach is a smooth function on a differ-
entiable manifold. The event horizon, however, is not a differentiable manifold but
has a wedge-like structure at the past endpoints of the null geodesic generators
of the horizon. For this reason, we first smooth the wedge. Then, the smooth
time function which is assumed to exist plays the role of the Morse function on the
smoothed event horizon. According to the Morse theory, the topological evolution
of the event horizon can then be decomposed into elementary processes called “han-
dle attachments.” In such a process, starting with a spherical horizon, one adds
several handles, each characterized by the index of the critical points of the Morse
function, which is an integer ranging from 0 to n (the dimension of the smoothed
horizon as a differentiable manifold).
The purpose of the present article is to show that there are several constraints
on the handle attachments for real black hole space-times.
2. The Morse theory for event horizons
Let M be an (n+1)-dimensional asymptotically flat space-time. We require the
existence of a global time function t : M → R that is smooth and has an everywhere
time-like and future-pointing gradient. The event horizon H is defined as the
boundary of the causal past of the future null infinity H = ∂J−(I +). [2] We treat
the event horizon defined with respect to a single asymptotic end, unless otherwise
stated. In other words, the future null infinity, I +, is assumed to be connected.
The black hole region B is defined as the interior region of H , specifically, as B =
M \ J−(I +), and the exterior region E of the black hole region is its complement,
E = int(J−(I +)). We refer to the intersection of the black hole region and the
time slice Σ(t0) = {t = t0} as the black hole B(t0) = B ∩ Σ(t0) at time t = t0.
The exterior region at time t = t0 is, accordingly, written E (t0) = E ∩ Σ(t0).
One of most basic properties of the event horizon is that it is generated by null
geodesics without future endpoints. In general, the event horizon is not smoothly
imbedded into the space-time manifold M , but it has a wedge-like structure at
the past endpoints of the null geodesic generators, where distinct null geodesic
generators intersect. We call the set of past endpoints of null geodesic generators
of H , from which two or more null geodesic generators emanate, the crease set
S. [11, 12] When no crease set S exists between the time slices t = t1 and t = t2, the
null geodesic generators of H naturally define a diffeomorphism ∂B(t1) ≈ ∂B(t2).
Hence, the topological evolution of a black hole can take place only when the time
slice intersects the crease set S. Of course, the event horizon itself is a gauge-
independent object. Nevertheless, we often understand the dynamics of space-time
by scanning it along time slices. Thus, the topological evolution of a black hole
depends on the time function.
It is expected that Morse theory [14] provides useful techniques to analyze such
a process of topological evolution. Because the Morse theory is concerned with
functions on smooth manifolds, we first regularize H around the crease set S. The
event horizon is not necessarily smooth, even on H \ S, in the case that the future
null infinity I + has a pathological structure. [16] Here it is assumed that H is
smooth on H \ S. Then, small deformations of H near the crease set S will make
H a smooth hypersurface H̃ in M , while B̃(t0) remains deformed in such a manner
that ∂B̃(t0) = H̃ ∪ Σ(t0) holds and B̃(t0) remains homeomorphic to the original
black hole for all t0 ∈ R. This deformation is assumed to be such that the time
Figure 1. An example in which no smoothing procedure makes
t| eH a Morse function on H̃ . Here, the intersection of the crease set
S of the event horizon and t = t0 hypersurface has an accumulation
point.
function t| eH , which is the restriction of t on H̃, gives a Morse function on H̃ that
has only nondegenerate critical points, where the gradient of t| eH defined on H̃
becomes zero and where also the Hessian matrix (∂i∂jt| eH) of t| eH is nondegenerate.
Though this assumption should hold for a wide class of systems, it does not always
hold. Figure 1 gives an example for which no smoothing procedure makes the
induced time function t| eH a Morse function on H̃ , because the intersection of the
crease set S of the event horizon and the t = t0 hypersurface has an accumulation
point. It is highly nontrivial to determine whether such a smoothing procedure is
generically possible. It is, however, not easy nor the primary purpose of this article
to assertain the realm of validity of the assumption, and therefore we make this
assumption without inquiring into its validity.
According to the Morse Lemma, there is a local coordinate system {x1, · · · , xn}
on H̃ in the neighborhood of the critical point p ∈ H̃ such that the restriction t| eH
of the time function t on H̃ takes the form
t| eH(x
1, · · · , xn) = t(p)− (x1)2 − · · · − (xλ)2 + (xλ+1)2 + · · ·+ (xn)2.
The integer λ, ranging from 0 to n, is called the index of the critical point p. The
topology of the black hole B̃(t) changes when Σ(t) pass through critical points,
or equivalently, when the time function t takes critical values. This implies that
critical points appears only near the crease set S.
The gradient-like vector field for t| eH is defined to be the tangent vector field X
on H̃ such that Xt| eH > 0 holds on H̃ , except for critical points, and has the form
X = −2x1
− · · · − 2xλ
+ 2xλ+1
∂xλ+1
+ · · ·+ 2xn
near the critical point of index λ, in terms of the standard coordinate system
appearing in the Morse Lemma. We choose a gradient-like vector field X such that
it coincides with the future-directed tangent vector field of null geodesic generators
of H , except in a small neighborhood of the crease set S (Fig. 2). The effect of
a critical point p of index λ is equivalent to the attachment of a λ-handle. [14, 15]
The handlebody is just a topological n-disk Dn ≈ In (I = [0, 1]), but it is regarded
as the product space Dn ≈ Dλ×Dn−λ (Fig. 3). The λ-handle attachment to an n-
dimensional manifold N with a boundary consists of the set hλ = (Dλ ×Dn−λ, f),
where the attaching map f induces the imbedding of ∂Dλ × Dn−λ ⊂ ∂Dn into
∂N (Fig. 4). The new manifold obtained through the λ-handle attachment to N is
Figure 2. The smoothing procedure of the event horizon H . The
gradient-like vector field on H̃ can be constructed through a slight
deformation of the null geodesic generators of H . Here, the effect
of the crease set S has been replaced by that of the critical points
p1, p2 and p3.
Figure 3. The local structure around the critical point p of index
λ. It can be seen that H̃t(p)+ǫ is homeomorphic to H̃t(p)−ǫ with a
λ-handle attached.
given by
N ∪ hλ = N ∪ (Dλ ×Dn−λ)/(x ∼ f(x)), (x ∈ ∂Dλ ×Dn−λ).
Let us denote by H̃t0 the t ≤ t0 part of H̃. Then, H̃t(p)+ǫ (ǫ > 0) just above the
critical point p of index λ is homeomorphic (in fact diffeomorphic, taking account
of the smoothing procedure) to that just below p, H̃t(p)−ǫ attached with a λ-handle,
H̃t(p)+ǫ ≈ H̃t(p)−ǫ ∪ h
if there are no other critical points satisfying t(p)−ǫ ≤ t ≤ t(p)+ǫ. The handlebody
itself is denoted by hλ as well.
Let us consider several examples. The 0-handle attachment does not need an
attaching map f . It simply corresponds to the emergence of the (n − 1)-sphere
Sn−1 ≈ ∂Dn as a black hole horizon ∂B(t). A typical example is the creation of a
black hole (Fig. 5): A black hole always emerges as 0-handle attachment. The other
Figure 4. The attachment of a 1-handle and a 2-handle to a
3-manifold N creates a new 3-manifold N ∪ h1 ∪ h2.
Figure 5. The emergence of a black hole through a 0-handle attachment.
Figure 6. The emergence of a bubble in the black hole region by
0-handle attachment, which does not occur in the real black hole
space-times.
Figure 7. The collision of a pair of black holes, creating a single
black hole, is realized through 1-handle attachment.
possiblity is the creation of a bubble that is subset of J−(I +) in a black hole region
(Fig. 6). One might think that this corresponds to wormhole creation between the
internal and external regions of the event horizon. Although in the framework of the
standard Morse theory on H̃ , these two examples are indistinguishable, we below
see that the latter process is in fact impossible.
Next, we consider 1-handle attachment. A typical example is the collision of two
black holes. A 1-handle serves as a bridge connecting black holes, or it corresponds
to taking the connected sum of each component of multiple black holes (Fig. 7).
Figure 8. The bifurcation of one black hole into two is represeted
by an (n − 1)-handle attachment. This, however, never occurs in
real black hole space-times.
Figure 9. The structure of λ-handle. The core Dλ × {0} corre-
sponds to the stable submanifold with respect to the flow gener-
ated by the gradient-like vector field, and the co-core {0} ×Dn−λ
corresponds to the unstable submanifold.
The time reversal of the collision of black holes consists of the bifurcation of one
black hole into two. This would be realized through an (n− 1)-handle attachment,
if such a process were possible (Fig. 8). It is, however, well known that such a
process is forbidden. [2] In general, the time reversal of the λ-handle attachment
corresponds to (n− λ)-handle attachment.
Before discussing general cases, let us consider the structure of a handlebody.
Recall that a λ-handle consists of the product space Dλ × Dn−λ. The subset
Dλ × {0} ⊂ Dλ × Dn−λ is called the core of the handlebody, and {0} ×Dn−λ ⊂
Dλ ×Dn−λ is called the co-core. The core and co-core intersect transversely at a
point. This point can be regarded as a critical point p.
Let us refer to the subset Ws(p) of H̃
(1) Ws(p) = {q ∈ M | lim
expq tX = p}
which consists of points that converge to p along the flow generated by the gradient-
like vector field X , as the stable manifold with respect to the critical point p. The
stable manifold Ws(p) is homeomorphic to R
λ if the index of p is given by λ. [17]
Similarly, let us refer to the subset Wu(p) ⊂ H̃ consisting of points which converge
to p along the flow generated by (−X) as the unstable manifold with respect to p.
For the unstable manifold, Wu(p) ≈ R
n−λ holds. The portions of the stable and
unstable manifolds in the handlebody can be regarded as corresponding to the core
and co-core, respectively.
The effect of smoothing the event horizonH to H̃ is to deform the null vector field
generating H into a gradient-like vector field X . The primary difference between
the null geodesic generators and the flow generated by X is that the former does
not have future endpoints, but the latter can. Thus, there are admissible and
inadmissible processes for the smoothed manifold H̃. An admissible process is
given by H̃ , which is obtained from an in priciple realizable event horizon, while an
inadmissible one is constructed from a spurious event horizon, i.e., one that consists
of the null hypersurface containing null geodesic generators with a future endpoint.
3. The structure of the critical points
The spatial topology of a black hole changes only when the time function takes a
critical value. The time evolution of the black hole topology can be understood by
considering its local structure around critical points. To determine whether a given
topological change is admissible or inadmissible, it is not sufficient to consider only
the intrinsic structure of the event horizon. Rather, it is required to take account
of its imbedding structure relative to the space-time.
In a time slice, any point separate from H̃ belongs to either of the black hole
or the exterior of the black hole region. It is useful to consider the local behavior
of the black hole region or the exterior region near the critical point p. Let us call
the exterior E of the black hole region simply the exterior region, for brevity. The
exterior region is slightly deformed by the smoothing procedure. The deformed
exterior region is denoted by Ẽ , and the deformed exterior region at the time t by
Ẽ (t) = Ẽ ∩ Σ(t) = Σ(t) \ B̃(t).(2)
The 0-handle is placed at some t ≥ t(p). Such an attachment describes the emer-
gence of the black hole region at the critical point p and its expansion with time.
The emergence of a bubble, which consists of a part of J−(I +), in the background
of the black hole region would also be described by a 0-handle attachment. This,
however, never occurs, as we explain below in detail. Hence, a 0-handle attachment
always describes the creation of a black hole homeomorphic to the n-disk.
An n-handle attachment corresponds to the time reversal of a 0-handle attache-
ment. This process, however, never occurs in real black hole space-time. An n-
handle is defined for t ≤ t(p), which means that it terminates at the critical point
p. The crease set is isolated into critical points during the course of the smoothing
procedure. The gradient-like vector field, which can be regarded as being tangent
to the generator of the deformed event horizon H̃ , may have several inward (con-
verging) directions at the critical point due to this smoothing procedure, while the
original null generator of the event horizon does not have an inward direction at
the crease set. In the case of the n-handle, all the directions become inward at the
critical point. This implies that the null generators of the event horizon H must
have future endpoints at the critical point, which is, of course, impossible. It is
thus seen that an n-handle attachment never occurs in real black hole space-times.
The remaining cases are λ-handle attachments for 1 ≤ λ ≤ n − 1. In these
cases, the λ-handle lies on either side of the critical point p both in the future
[t > t(p)] and in the past [t < t(p)]. Then, we consider the case in which the handle
exists during the sufficiently small time interval t ∈ [t(p) − δ, t(p) + δ] (δ > 0), to
understand the topological change of the black hole region at the critical point p.
Figure 10. The neighborhood U of p is separated by hλ into the
future region, U+, and the past region, U−.
First, we introduce a coordinate system {t, xi} (i = 1, · · · , n) in the neighborhood
U of p, where t is a given function of time, and {xi} is the extension over U of
the cannonical coordinate appearing in the Morse Lemma such that each curve
(x1, · · · , xn) = [const] is timelike in U . We assume that U is the solid cylinder
given by t ∈ [t(p)−δ, t(p)+δ],
(xi)2 ≤ δ. In this coordinate system, the λ-handle
hλ is given by the saddle surface
t = t(p)− (x1)2 − · · · − (xλ)2 + (xλ+1)2 + · · ·+ (xn)2
in U , which is an acausal set if the constant δ is taken sufficiently small, since hλ
is tangent to the space-like hypersurface t = t(p) at p. Therefore, hλ separates
U into two open subsets, the future and past regions U+ and U− of U , where
U+ and U− are the subsets lying chronological future and past, respectively, of
hλ: U± = I±(hλ) ∩ U . Explicitly, the future and past regions U± are the regions
satisfying
t ≷ t(p)− (x1)2 − · · · − (xλ)2 + (xλ+1)2 + · · ·+ (xn)2
in U , respectively (Fig. 10).
Because the λ-handle is a subset of the black hole boundary H̃ , one of U± is
contained in the black hole region, B̃, and the other in the exterior region, Ẽ .
However, the future region U+ of U is always included in the black hole region, i.e.
U+ ⊂ B̃, and hence we have U− ⊂ Ẽ , since the horizon is the boundary of the past
set, J−(I +). Therefore, the black hole region B̃(t(p) − ǫ) ∩ U in U at the time
t = t(p)− ǫ just before the critical time is given by
(x1)2 + · · ·+ (xλ)2 > (xλ+1)2 + · · ·+ (xn)2 + ǫ,
which is homotopic to the (λ − 1)-sphere Sλ−1. (For λ = 1, S0 simply consists of
two points.) Similarly, B̃(t(p) + ǫ) ∩ U just after the critical time is given by
(x1)2 + · · ·+ (xλ)2 + ǫ > (xλ+1)2 + · · ·+ (xn)2,
which is homotopic to the n-disk. In this way, the black hole region restricted to
the small neighborhood of the critical point p is initially homotopic to a sphere.
Then, the internal region of the sphere is filled up at the critical time t = t(p) and
eventually becomes homotopically trivial. The exterior region, Ẽ (t) ∩ U , in U is
initially homotopic to an n-disk for t = t(p) − ǫ. Then, its (n − λ)-dimensional
direction is penetrated by the black hole region at t = t(p), and thus it becomes
homotopic to an (n− λ− 1)-sphere Sn−λ−1 for t = t(p) + ǫ. If the spurious event
horizon is also taken into account, the future region U+ might be a subset of Ẽ , and
therefore the past region U− might be a subset of B̃. Then, the black hole region in
the λ-handle might be homotopic to an n-disk initially and become homotopic to an
(n−λ−1)-sphere finally, and vice versa for the exterior region. Let us refer to such
a topological change of the black hole region B̃(t)∩U from a region homotopic to a
sphere to a region homotopic to a disk as a black handle attachment, and that from
a region homotopic to a disk to a region homotopic to the sphere as a white handle
attachment. The above observation shows that only a black handle attachment
occurs if a sufficiently small neighborhood of the critical point is considered. For
example, a collision of black holes corresponds to a black 1-handle attachment, while
the bifurcation of a black hole corresponds to a white (n − 1)-handle attachment
in the sense that the homotopy type of the exterior region Ẽ (t) ∩ U changes from
that of Sn−2 to that of Dn. This local argument also elucidates te reason that a
black hole collision is admissible while a black hole bifurcation, which is its time
reversal, is inadmissible. We also note that the effect of time reversal is to convert
a black λ-handle attachment into a white (n− λ)-handle attachment.
It is appropriate to refer to the 0-handle attachment corresponding to the cre-
ation of a black hole as a black 0-handle attachment. Then, the proposition above
also applies to a 0-handle attachment.
4. Connectedness of the exterior region
There also exist processes that are unrealizable due to global conditions. Let
us, for a moment, consider the event horizon in maximally extended Schwarzschild
space-time. Though we are interested in the event horizon defined with respect to
a specific asymptotic end, for the purpose of explanation, we examine the event
horizon defined with respect to a pair of asymptotic ends in Schwarzschild space-
time (Fig. 11).
Let I +1 and I
2 be the pair of future null infinities of the maximally extended
Schwarzschild space-time. The event horizon here is defined by H = ∂J−(I +1 ∪
2 ), which is nondifferentiable at the bifurcate horizon F = ∂J
−(I +1 )∩∂J
−(I +2 ).
Let t be a global time function and χ be a global radial coordinate function such
that each two-surface t, χ = [const] is invariant under the SO(3) isometry. These
coordinates are chosen such that the bifurcation surface F is located at t = χ = 0
and the event horizon H is determined by t = |χ| around F . The smoothed event
horizon H̃ is also taken to be invariant under the SO(3) isometry. Due to the
symmetry of the configuration, the time function t has critical points of degenerate
type. In fact, any point on bifurcate horizon F is critical. Here, we are not interested
in such a nongeneric situation. Instead, we consider a slightly different time slicing
determined by the new time function
t′ = t+ ǫ sin2
where ǫ > 0 is a sufficiently small positive constant and ϑ, which satisfies 0 ≤ ϑ ≤ π,
is the usual polar coordinate of the 2-sphere. Then, there appears only a pair of
isolated critical points at the north pole (ϑ = 0) and the south pole (ϑ = π) on
the bifurcate horizon F , and the time function t′ becomes the Morse function on
H̃. At the time t′ = 0, the black hole appears at the north pole. This is the
0-handle attachment. The black hole formed there grows into a geometrically thick
spherical shell with a hole at the south pole, which is nevertheless a topological
3-disk. At the time t′ = ǫ, the puncture at the south pole is filled, and the black
hole region becomes topologically S2 × [0, 1]. The deformed event horizon H̃ splits
into a disjoint union of a pair of 2-spheres. This is the 2-handle attachment.
This kind of 2-handle attachment occurs because the event horizon is defined
with respect to the two asymptotic ends, which is in general inadmissible if the
future null infinity is connected, as we assume from this point. To understand the
above statement, it should be noted that there is no process through which the
several connected components of the exterior region Ẽ (t) = Ẽ ∩ Σ(t) at time t
merge together at a later time because such a handle attachment is not admissible.
It is also seen that no connected component of Ẽ (t) disappears, because possi-
ble n-handle attachments are inadmissible. These facts imply that the number of
connected components of the exterior region Ẽ (t) cannot decrease with the time
function t. On the other hand, there is only one connected component of the exte-
rior region Ẽ (t) for sufficiently large t, because of the connectedness of I +. This
observation shows that the exterior region Ẽ (t) remains connected in any process.
The only possible process through which the number of connected components
of the exterior region Ẽ (t) changes is an (n− 1)-handle attachment, as constructed
above in the Schwarzschild space-time. This is because the subset Dλ × ∂Dn−λ of
H2 H1
t' = t' (p)
B (t'(p))
II 12
Figure 11. The figure on the left is a conformal diagram of the
maximally extended Schwarzschild space-time. The structure of
the event horizon defined with respect to the two asymptotic ends
is depicted on the right, with one dimension omitted. The shaded
region represents the black hole region at the critical time t = t(p).
This corresponds to the 2-handle attachment, where the exterior
region is separated into a pair of connected components.
the boundary of the λ-handle
∂hλ ≈ (∂Dλ ×Dn−λ) ∪ (Dλ × ∂Dn−λ),
namely the part of ∂hλ which is the complement of the preimage of the attaching
f : ∂hλ ⊃ ∂Dλ ×Dn−λ → H̃t,
is disconnected only when λ = n − 1. In this case, the homotopy type of the
exterior region Ẽ (t) changes from that of an n-disk to that of S0, namely two points.
Note, however, that this does not imply that the exterior region Ẽ (t) is always
separated into two disconnected parts through the (n− 1)-handle attachment. For
example, a transition from the black ring horizon ≈ Sn−2 × S1 to the spherical
black hole horizon ≈ Sn−1 is realized through a black (n− 1)-handle attachment,
which pinches the longitude {a point}×S1 ⊂ Sn−2 ×S1 into a point. The exterior
region Ẽ (t) remains connected all the while. Thus, there are both admissible and
inadmissible processes for (n−1)-handle attachments. An (n−1)-handle attachment
is inadmissible if it separates the exterior region Ẽ (t).
5. Concluding remarks
The arguments given in this paper are summerized by the following rules. As-
sume that (i) an (n+ 1)-dimensional space-time M is asymptotically flat and the
future null infinity I + is connected, or the event horizon H = ∂J−(I +) is defined
with respect to a single asymptotic end, (ii) the space-time M admits a smooth
global time function t, (iii) the event horizon H can be deformed so that the black
hole B̃(t) deformed accordingly at each time t is smooth and homeomorphic to orig-
inal one B(t) at each time t and the time function t becomes the Morse function
on H̃. Then, the topological evolution of the event horizon can be regarded as a
λ-handle attachment (0 ≤ λ ≤ n) subject to the following rules:
(1) The n-handle attachment is inadmissible.
(2) Only the black λ-handle attachment (0 ≤ λ ≤ n − 1), where the black
hole region in the neighborhood of the critical point varies from the region
homotopic to the sphere Sλ−1 (regarded as the empty set for λ = 0) to the
n-disk Dn, is admissible.
(3) The (n − 1)-handle attachment which separates the spatial section of the
exterior region of the black hole is inadmissible.
The first rule simply states that no connected component of a black hole disap-
pears. It also implies that if a bubble of the exterior region forms within the black
hole region, it does not vanish.
The second rule is concerned with the imbedding structure of the event horizon
relative to the space-time manifold. The neighborhood of the critical point is sep-
arated into two regions by the event horizon. One changes homotopically from a
sphere to a disk and the other from a disk to a sphere. We call it a black handle at-
tachment when the former corresponds to the black hole region and a white handle
attachment otherwise. Then, the second rule states that a white handle attach-
ment never occurs. The reverse process, in which a black hole region homotopically
changes from a disk to a sphere, is ruled out. A white 0-handle attachment, which
Figure 12. Black ring formation from a spherical black hole must
be non-axisymmetric in real black hole space-times.
describes the emergence of the exterior region, is also forbidden. This gives an-
other reason for the well-known result that a black hole cannot bifurcate, because
it corresponds to a white (n− 1)-handle attachment.
The second rule applies to more general situations. For example, let us consider
the topological evolution of the event horizon from Sn−1 to Sn−2 × S1 in (n+ 1)-
dimensional space-time (n ≥ 3). When it is realized with a single critical point, it
corresponds to a 1-handle attachment. Here, one might expect two possibilities if
the second rule is not considered. One possibility is that the 1-handle is attached in
the exterior region of the black hole. This is locally equivalent to the merging of a
pair of black holes, where these two black holes are connected elsewhere irrelevant.
The other possibility is that it is attached from the inside such that the 1-handle
pierces the black hole region. In asymptotically flat space-times, only the latter
includes axisymmetric configurations such that a spherical black hole is pinched
out along the symmetric axis; here the axisymmetric configuration is such that
the space-time possesses the SO(n − 1) isometry and the time slicing respects
this symmetry. However, this latter possibility corresponds to a white 1-handle
attachment, which is impossible, and only the former, which corresponds to a black
1-handle attachment, is possible. In particular, a transition from a spherical event
horizon (≈ Sn−1) to a black ring horizon (≈ Sn−2×S1) in asymptotically flat space-
times is always non-axisymmetric in the sense that such a configuration cannot
possess SO(n− 1) symmetry (Fig. 12).
While the apparent horizon must be diffeomorphic to a two-sphere in four-
dimensional space-times under the dominant energy condition, a torus event hori-
zon may appear, even under the dominant energy condition, via a black 1-handle
attachment to the spherical horizon. More generally, an event horizon with an
arbitrary number of genura may be formed by several black 1-handle attachments.
The third rule is not directly determined by the local structure of the critical
point. It states that the exterior region E (t) = E ∩ Σ(t) at each time is always
connected under the assumption that I + is connected. Thus, the possibility that
there forms a bubble of the exterior region inside the black hole horizon is ruled
out. It should, however, be noted that such a process is possible if I + consists
of several connected components. This may also be related to the topological
censorship theorem. [19] The topological censorship theorem states that all causal
curves from I − to I + are homotopic under the null energy condition. This also
forbids the formation of a bubble of the exterior region inside the black hole, because
otherwise there would be two nonhomotopic causal curves from I − to I +, one
passing inside the horizon and the other outside. Our argument, however, does not
depend on energy conditions.
References
[1] S. W. Hawking, Commun. Math. Phys. 25 (1972), 152.
[2] S. W. Hawking and G. F. R. Ellis, The large scale structure of space-times (London, Cam-
bridge University Press, 1973).
[3] D. Gannon, Gen. Relat. Gravit. 7 (1974), 219.
[4] P. T. Chrusciel and R. M. Wald, Class. Quantum Grav. 11 (1994), L147; gr-qc/9410004.
[5] T. Jacobson and S. Venkataramani, Class. Quantum Grav. 12 (1995), 1055; gr-qc/9410023.
[6] S. F. Browdy and G. J. Galloway, J. Math. Phys. 36 (1995), 4952.
[7] M. l. Cai and G. J. Galloway, Class. Quantum Grav. 18 (2001), 2707; hep-th/0102149.
[8] C. Helfgott, Y. Oz and Y. Yanay, JHEP 0602 (2006), 025; hep-th/0509013.
[9] G. J. Galloway and R. Schoen, Commun. Math. Phys. 266 (2006), 571; gr-qc/0509107.
[10] G. J. Galloway, gr-qc/0608118.
[11] M. Siino, Phys. Rev. D 58 (1998), 104016; gr-qc/9701003.
[12] M. Siino and T. Koike, arXiv:gr-qc/0405056.
[13] S. L. Shapiro, S. A. Teukolsky and J. Winicour, Phys. Rev. D 52 (1995), 6982.
[14] J. Milnor, Lectures on h-cobordism theorem (Princeton, Princeton University Press, 1965).
[15] I. Tamura, The differential topology (Tokyo, Iwanami Shoten Publishers, 1978).
[16] P. T. Chrusciel and G. J. Galloway, Commun. Math. Phys. 193 (1998), 449 gr-qc/9611032.
[17] S. Smale, Ann. of Math. 74 (1961), 199.
[18] R. Emparan and H. S. Reall, Phys. Rev. Lett. 88 (2002), 101101; hep-th/0110260.
[19] J. L. Friedman, K. Schleich and D. M. Witt, Phys. Rev. Lett. 71 (1993), 1486 [Errata; 75
(1995), 1872]; gr-qc/9305017.
http://arxiv.org/abs/gr-qc/9410004
http://arxiv.org/abs/gr-qc/9410023
http://arxiv.org/abs/hep-th/0102149
http://arxiv.org/abs/hep-th/0509013
http://arxiv.org/abs/gr-qc/0509107
http://arxiv.org/abs/gr-qc/0608118
http://arxiv.org/abs/gr-qc/9701003
http://arxiv.org/abs/gr-qc/0405056
http://arxiv.org/abs/gr-qc/9611032
http://arxiv.org/abs/hep-th/0110260
http://arxiv.org/abs/gr-qc/9305017
	1. Introduction
	2. The Morse theory for event horizons
	3. The structure of the critical points
	4. Connectedness of the exterior region
	5. Concluding remarks
	References
ABSTRACT
  The topological structure of the event horizon has been investigated in terms
of the Morse theory. The elementary process of topological evolution can be
understood as a handle attachment. It has been found that there are certain
constraints on the nature of black hole topological evolution: (i) There are n
kinds of handle attachments in (n+1)-dimensional black hole space-times. (ii)
Handles are further classified as either of black or white type, and only black
handles appear in real black hole space-times. (iii) The spatial section of an
exterior of the black hole region is always connected. As a corollary, it is
shown that the formation of a black hole with an S**(n-2) x S**1 horizon from
that with an S**(n-1) horizon must be non-axisymmetric in asymptotically flat
space-times.

<|endoftext|><|startoftext|>
Introduction
The sixties was a period in which strong interacting processes were studied
in detail using the newly constructed accelerators at Cern and other places.
Many new hadronic states were found that appeared as resonant peaks in var-
ious cross sections and hadronic cross sections were measured with increasing
accuracy. In general, the experimental data for strongly interacting processes
were rather well understood in terms of resonance exchanges in the direct
channel at low energy and by the exchange of Regge poles in the transverse
channel at higher energy. Field theory that had been very successful in de-
scribing QED seemed useless for strong interactions given the big number of
hadrons to accomodate in a Lagrangian and the strength of the pion-nucleon
coupling constant that did not allow perturbative calculations. The only do-
main in which field theoretical techniques were successfully used was current
algebra. Here, assuming that strong interactions were described by an almost
chiral invariant Lagrangian, that chiral symmetry was spontaneously broken
and that the pion was the corresponding Goldstone boson, field theoretical
methods gave rather good predictions for scattering amplitudes involving pi-
ons at very low energy. Going to higher energy was, however, not possible
with these methods.
Because of this, many people started to think that field theory was use-
less to describe strong interactions and tried to describe strong interacting
http://arxiv.org/abs/0704.0101v1
2 Paolo Di Vecchia
processes with alternative and more phenomenological methods. The basic
ingredients for describing the experimental data were at low energy the ex-
change of resonances in the direct channel and at higher energy the exchange
of Regge poles in the transverse channel. Sum rules for strongly interacting
processes were saturated in this way and one found good agreement with the
experimental data that came from the newly constructed accelerators. Be-
cause of these successes and of the problems that field theory encountered to
describe the data, it was proposed to construct directly the S matrix without
passing through a Lagrangian. The S matrix was supposed to be constructed
from the properties that it should satisfy, but there was no clear procedure on
how to implement this construction1. The word “bootstrap” was often used
as the way to construct the S matrix, but it did not help very much to get an
S matrix for the strongly interacting processes.
One of the basic ideas that led to the construction of an S matrix was
that it should include resonances at low energy and at the same time give
Regge behaviour at high energy. But the two contributions of the resonances
and of the Regge poles should not be added because this would imply double
counting. This was called Dolen, Horn and Schmidt duality [2]. Another idea
that helped in the construction of an S matrix was planar duality [3] that
was visualized by associating to a certain process a duality diagram, shown in
Fig. (1), where each meson was described by two lines representing the quark
and the antiquark. Finally, also the requirement of crossing symmetry played
a very important role.
Fig. 1. Duality diagram for the scattering of four mesons
Starting from these ideas Veneziano [4] was able to construct an S matrix
for the scattering of four mesons that, at the same time, had an infinite number
of zero width resonances lying on linearly rising Regge trajectories and Regge
behaviour at high energy. Veneziano originally constructed the model for the
1 For a discussion of S matrix theory see Ref.s [1]
The birth of string theory 3
process ππ → πω, but it was immediately extended to the scattering of four
scalar particles.
In the case of four identical scalar particles, the crossing symmetric scat-
tering amplitude found by Veneziano consists of a sum of three terms:
A(s, t, u) = A(s, t) +A(s, u) +A(t, u) (1)
where
A(s, t) =
Γ (−α(s))Γ (−α(t))
Γ (−α(s)− α(t))
dxx−α(s)−1(1− x)−α(t)−1 (2)
with linearly rising Regge trajectories
α(s) = α0 + α
′s (3)
This was a very important property to implement in a model because it was
in agreement with the experimental data in a wide range of energies. s, t and
u are the Mandelstam variables:
s = −(p1 + p2)2 , t = −(p3 + p2)2 , u = −(p1 + p3)2 (4)
The three terms in Eq. (1) correspond to the three orderings of the four parti-
cles that are not related by a cyclic or anticyclic 2 permutation of the external
legs. They correspond, respectively, to the three permutations: (1234), (1243)
and (1324) of the four external legs. They have only simple pole singularities.
The first one has only poles in the s and t channels, the second only in the s
and u channels and the third only in the t and u channels. This property fol-
lows directly from the duality diagram that is associated to each inequivalent
permutation of the external legs. In fact, at that time one used to associate
to each of the three inequivalent permutations a duality diagram where each
particle was drawn as consisting of two lines that rappresented the quark and
antiquark making up a meson. Furthermore, the diagram was supposed to
have only poles singularities in the planar channels which are those involving
adjacent external lines. This means that, for instance, the duality diagram
corresponding to the permutation (1234) has only poles in the s and t chan-
nels as one can see by deforming the diagram in the plane in the two possible
ways shown in figure (2).
This was a very important property of the duality diagram that makes
it qualitatively different from a Feynman diagram in field theory where each
diagram has only a pole in one of the three s, t and u channels and not
simultaneously in two of them. If we accept the idea that each term of the
sum in Eq. (1) is described by a duality diagram, then it is clear that we
2 An anticyclic permutation corresponding, for instance, to the ordering (1234) is
obtained by taking the reverse of the original ordering (4321) and then performing
a cyclic permutation.
4 Paolo Di Vecchia
Fig. 2. The duality diagram contains both s and t channel poles
do not need to add terms corresponding to equivalent diagrams because the
corresponding duality diagram is the same and has the same singularities. It
is now clear that it was in some way implicit in this picture the fact that the
Veneziano model corresponds to the scattering of relativistic strings. But at
that time the connection was not obvious at all. The only S matrix property
that the Veneziano model failed to satisfy was the unitarity of the S matrix.
because it contained only zero width resonances and did not have the various
cuts required by unitarity. We will see how this property will be implemented.
Immediately after the formulation of the Veneziano model, Virasoro [5]
proposed another crossing symmetric four-point amplitude for scalar particles
that consisted of a unique piece given by:
A(s, t, u) ∼
Γ (−α(u)
)Γ (−α(s)
)Γ (−α(t)
Γ (1 +
)Γ (1 +
)Γ (1 +
where
α(s) = α0 + α
′s (6)
The model had poles in all three s, t and u channels and could not be written
as sum of three terms having poles only in planar diagrams. In conclusion,
the Veneziano model satisfies the principle of planar duality being a crossing
symmetric combination of three contributions each having poles only in the
planar channels. On the other hand, the Virasoro model consists of a unique
crossing symmetric term having poles in both planar and non-planar channels.
The attempts to construct consistent models that were in good agreement
with the strong interaction phenomenology of the sixties boosted enormously
the activity in this research field. The generalization of the Veneziano model to
the scattering ofN scalar particles was built, an operator formalism consisting
of an infinite number of harmonic oscillators was constructed and the complete
spectrum of mesons was determined. It turned out that the degeneracy of
states grew up exponentially with the mass. It was also found that the N
point amplitude had states with negative norm (ghosts) unless the intercept
of the Regge trajectory was α0 = 1 [6]. In this case it turned out that the
model was free of ghosts but the lowest state was a tachyon. The model was
called in the literature the “dual resonance model”.
The birth of string theory 5
The model was not unitary because all the states were zero width reso-
nances and the various cuts required by unitarity were absent. The unitarity
was implemented in a perturbative way by adding loop diagrams obtained by
sewing some of the external legs together after the insertion of a propagator.
The multiloop amplitudes showed a structure of Riemann surfaces. This be-
came obvious only later when the dual resonance model was recognized to
correspond to scattering of strings.
But the main problem was that the model had a tachyon if α0 = 1 or had
ghosts for other values of α0 and was not in agreement with the experimental
data: α0 was not equal to about
as required by experiments for the ρ
Regge trajectory and the external scalar particles did not behave as pions
satisfying the current algebra requirements. Many attempts were made to
construct more realistic dual resonance models, but the main result of these
attempts was the construction of the Neveu-Schwarz [7] and the Ramond [8]
models, respectively, for mesons and fermions. They were constructed as two
independent models and only later were recognized to be two sectors of the
same model. The Neveu-Schwarz model still contained a tachyon that only in
1976 through the GSO projection was eliminated from the physical spectrum.
Furthermore, it was not properly describing the properties of the physical
pions.
Actually a model describing ππ scattering in a rather satisfactory way
was proposed by Lovelace and Shapiro [9] 3. According to this model the
three isospin amplitudes for pion-pion scattering are given by:
[A(s, t) +A(s, u)]− 1
A(t, u)
A1 = A(s, t)−A(s, u) A2 = A(t, u) (7)
where
A(s, t) = β
Γ (1− α(s))Γ (1 − α(t))
Γ (1− α(t)− α(s))
; α(s) = α0 + α
′s (8)
The amplitudes in eq.(7) provide a model for ππ scattering with linearly rising
Regge trajectories containing three parameters: the intercept of the ρ Regge
trajectory α0, the Regge slope α
′ and β. The first two can be determined by
imposing the Adler’s self-consistency condition, that requires the vanishing of
the amplitude when s = t = u = m2π and one of the pions is massless, and the
fact that the Regge trajectory must give the spin of the ρ meson that is equal
to 1 when
s is equal to the mass of the ρ meson mρ. These two conditions
determine the Regge trajectory to be:
α(s) =
s−m2π
m2ρ −mπ2
= 0.48 + 0.885s (9)
3 See also Ref. [10].
6 Paolo Di Vecchia
Having fixed the parameters of the Regge trajectory the model predicts the
masses and the couplings of the resonances that decay in ππ in terms of a
unique parameter β. The values obtained are in reasonable agreement with
the experiments. Moreover, one can compute the ππ scattering lenghts:
a0 = 0.395β a2 = −0.103β (10)
and one finds that their ratio is within 10% of the current algebra ratio given
by a0/a2 = −7/2. The amplitude in eq.(8) has exactly the same form as that
for four tachyons of the Neveu-Schwarz model with the only apparently minor
difference that α0 = 1/2 (for mπ = 0) instead of 1 as in the Neveu-Schwarz
model. This difference, however, implies that the critical space-time dimension
of this model is d = 4 4 and not d = 10 as in the Neveu-Schwarz model. In
conclusion this model seems to be a perfectly reasonable model for describing
low-energy ππ scattering. The problem is, however, that nobody has been able
to generalize it to the multipion scattering and therefore to get the complete
meson spectrum.
As we have seen the S matrix of the dual resonance model was constructed
using ideas and tools of hadron phenomenology of the end of the sixties.
Although it did not seem possible to write a realistic dual resonance model
describing the pions , it was nevertheless such a source of fascination for those
who actively worked in this field at that time for its beautiful internal structure
and consistency that a lot of energy was used to investigate its properties and
for understanding its basic structure. It turned out with great surprise that
the underlying structure was that of a quantum relativistic string.
The aim of this contribution is to explain the logic of the work that was
done in the years from 1968 to 1974 5 in order to uncover the deep properties of
this model that appeared from the beginning to be so beautiful and consistent
to deserve an intensive study.
This seems to me a very good way of celebrating the 65th anniversary of
Gabriele who is the person who started and also contributed to develop the
whole thing with his deep physical intuition.
2 Construction of the N -point amplitude
We have seen that the construction of the four-point amplitude is not sufficient
to get information on the full hadronic spectrum because it contains only
those hadrons that couple to two ground state mesons and does not see those
intermediate states which only couple to three or to an higher number of
ground state mesons [12]. Therefore, it was very important to construct the
N -point amplitude involving identical scalar particles. The construction of
4 This can be checked by computing the coupling of the spinless particle at the
level α(s) = 2 and seeing that it vanishes for d = 4.
5 Reviews from this period can be found in Ref. [11]
The birth of string theory 7
the N -point amplitude was done in Ref. [13] (extending the work of Ref. [14])
by requiring the same principles that have led to the construction of the
Veneziano model, namely the fact that the axioms of S-matrix theory be
satisfied by an infinite number of zero width resonances lying on linearly
rising Regge trajectories and planar duality.
The fully crossing symmetric scattering amplitude of N identical scalar
particles is given by a sum of terms corresponding to the inequivalent permu-
tations of the external legs:
An (11)
Also in this case two permutations of the external legs are inequivalent if they
are not related by a cyclic or anticyclic permutation. Np is the number of
inequivalent permutations of the external legs and is equal to Np =
(N−1)!
and each term has only simple pole singularities in the planar channels. Each
planar channel is described by two indices (i, j), to mean that it includes the
legs i, i+ 1, i+ 2 . . . j − 1, j, by the Mandelstam variable
sij = −(pi + pi+1 + . . .+ pj)2 (12)
and by an additional variable uij whose role will become clear soon. It is
clear that the channels (ij) and (j + 1, i− 1) 6 are identical and they should
be counted only once. In the case of N identical scalar particles the number
of planar channels is equal to
N(N−3)
. This can be obtained as follows. The
independent planar diagrams involving the particle 1 are of the type (1, i)
where i = 2 . . .N − 2. Their number is N − 3. This is also the number of
planar diagrams involving the particle 2 and not the 1. The number of planar
diagrams involving the particle 3 and not the particles 1 and 2 is equal to
N − 4. In general the number of planar diagrams involving the particle i and
not the previous ones from 1 to i-1 is equal to N − 1− i. This means that the
total number of planar diagram is equal to:
2(N − 3) +
(N − 1− i) = 2(N − 3) +
= 2(N − 3) +
(N − 4)(N − 3)
N(N − 3)
If one writes down the duality diagram corresponding to a certain planar
ordering of the external particles, it is easy to see that the diagram can have
simultaneous pole singularities only in N − 3 channels. The channels that
allow simultaneous pole singularities are called compatible channels, the other
6 This channel includes the particles (j + 1, . . . , N, 1, . . . i− 1).
8 Paolo Di Vecchia
are called incompatible. Two channels (i,j) and (h,k) are incompatible if the
following inequalities are satisfied:
i ≤ h ≤ j ; j + 1 ≤ k ≤ i− 1 (14)
The aim is to construct the scattering amplitude for each inequivalent per-
mutation of the external legs that has only pole singularities in the
N(N−3)
planar channels. We have also to impose that the amplitude has simultaneous
poles only in N − 3 compatible channels. In order to gain intuition on how to
proceed we rewrite the four-point amplitude in Eq. (2) as follows:
A(s, t) =
du23 u
−α(s12)−1
−α(s23)−1
23 δ(u12 + u23 − 1) (15)
where u12 and u23 are the variables corresponding to the two planar chan-
nels (12) and (23) and the cancellation of simultaneous poles in incompatible
channels is provided by the δ-function which forbids u12 and u23 to vanish
simultaneously.
We will now extend this procedure to the N -point amplitude. But for the
sake of clarity let us start with the case of N = 5 [14]. In this case we have 5
planar channels described by u12, u13, u23, u24 and u34. Since we have only two
compatible channels only two of the previous five variables are independent.
We can choose them to be u12 and u13. In order to determine the depen-
dence of the other three variables on the two independent ones, we exclude
simultaneous poles in incompatible channels. This can be done by imposing
relations that prevent variables corresponding to incompatible channels to
vanish simultaneously. A sufficient condition for excluding simultaneous poles
in incompatible channels is to impose the conditions:
uP = 1−
uP̄ (16)
where the product is over the variables P̄ corresponding to channels that
are incompatible with P . In the case of the five-point amplitude we get the
following relations:
u23 = 1− u34u12 ; u24 = 1− u13u12
u13 = 1− u34u24 ; u34 = 1− u23u13 ; u12 = 1− u24u23 (17)
Solving them in terms of the two independent ones we get:
u23 =
1− u12
1− u12u13
; u34 =
1− u13
1− u12u13
; u24 = 1− u12u13 (18)
In analogy with what we have done for the four-point amplitude in Eq. (15)
we write the five-point amplitude as follows:
The birth of string theory 9
du34u
−α(s12)−1
−α(s13)−1
×u−α(s24)−124 u
−α(s23)−1
−α(s34)−1
δ(u23 + u12u34 − 1)δ(u24 + u12u13 − 1)δ(u34 + u13u23 − 1) (19)
Performing the integral over the variables u23, u24 and u34 we get:
du13u
−α(s12)−1
−α(s13)−1
× (1− u12)−α(s23)−1(1 − u13)−α(s13)−1(1− u12u13)−α(s24)+α(s23)+α(s34)(20)
We have implicitly assumed that the Regge trajectory is the same in all chan-
nels and that the external scalar particles have the same common mass m
and are the lowest lying states on the Regge trajectory. This means that their
mass is given by:
α0 − α′p2i = 0 ; p2i ≡ −m2 (21)
Using then the relation:
α(s23) + α(s34)− α(s24) = 2α′p2 · p4 (22)
we can rewrite Eq. (20) as follows:
−α(s2)−1
−α(s3)−1
3 (1− u2)−α(s23)−1×
× (1 − u3)−α(s34)−1
(1− xij)2α
′pi·pj (23)
where
si ≡ s1i , ui ≡ u1i ; i = 2, 3 ; xij = uiui+1 . . . uj−1. (24)
We are now ready to construct the N -point function [13]. In analogy with
what has been done for the four and five-point amplitudes we can write the
N -point amplitude as follows:
. . .
−α(sP )−1
δ(uQ − 1 +
uQ̄) (25)
10 Paolo Di Vecchia
where the first product is over the
N(N−3)
variables corresponding to all
planar channels, while the second one is over the
(N−3)(N−2)
independent
δ-functions. The product in the δ-function is defined in Eq. (16).
The solution of all the non-independent linear relations imposed by the
δ-functions is given by
uij =
(1 − xij)(1− xi−1,j+1)
(1− xi−1,j)(1 − xi,j+1)
where the variables xij are given in Eq. (24). Eliminating the δ-function from
Eq. (25) one gets:
−α(si)−1
i (1 − ui)
−α(si,i+1)−1
j=i+2
(1− xij)−γij(27)
where
γij = α(sij) + α(si+1;j−1)− α(si;j−1)− α(si+1;j) ; j ≥ i+ 2 (28)
It is easy to see that
α(si,i+1) = −α0 − 2α′pi · pi+1 ; γij = −2α′pi · pj ; j ≥ i+ 2 (29)
Inserting them in Eq. (27) we get:
−α(si)−1
i (1− ui)
j=i+1
(1− xij)2α
′pi·pj (30)
This is the form of the N -point amplitude that was originally constructed.
Then Koba and Nielsen [15] put it in the form that is more known nowadays.
They constructed it using the following rules. They associated a real variable
zi to each leg i. Then they associated to each channel (i, j) an anharmonic
ratio constructed from the variables zi, zi−1, zj, zj+1 in the following way
(zi, zi+1, zj, zj+1)
−α(sij)−1 =
(zi − zj)(zi−1 − zj+1)
(zi−1 − zj)(zi − zj+1)
]−α(sij)−1
and finally they gave the following expression for the N -point amplitude:
dV (z)
(i,j)
(zi, zi+1, zj, zj+1)
−α(sij)−1 (32)
where
dV (z) =
1 [θ(zi − zi+1)dzi]
i=1(zi − zi+2)dVabc
; dVabc =
dzadzbdzc
(zb − za)(zc − zb)(za − zc)
The birth of string theory 11
and the variables zi are integrated along the real axis in a cyclically ordered
way: z1 ≥ z2 . . . ≥ zN with a, b, c arbitrarily chosen.
The integrand of the N -point amplitude is invariant under projective
transformations acting on the leg variables zi:
αzi + β
γzi + δ
; i = 1 . . .N ; αδ − βγ = 1 (34)
This is because both the anharmonic ratio in Eq. (31) and the measure dVabc
are invariant under a projective transformation. Since a projective transfor-
mation depends on three real parameters, then the integrand of the N -point
amplitude depends only on N − 3 variables zi. In order to avoid infinities, one
has then to divide the integration volume with the factor dVabc that is also
invariant under the projective transformations. The fact that the integrand
depends only on N − 3 variables is in agreement with the fact that N − 3 is
also the maximal number of simultaneous poles allowed in the amplitude.
It is convenient to write the N -point amplitude in a form that involves the
scalar product of the external momenta rather than the Regge trajectories.
We distinguish three kinds of channels. The first one is when the particles
i and j of the channel (i, j) are separated by at least two particles. In this
case the channels that contribute to the exponent of the factor (zi − zj) are
the channels (i, j) with exponent equal to −α(sij) − 1, (i + 1, j − 1) with
exponent −α(si+1,j−1)− 1, (i+1, j) with exponent α(si+1,j)+1 and (i, j− 1)
with exponent α(si,j−1) + 1. Adding these four contributions one gets for the
channels where i and j are separated by at least two particles
− α(sij)− α(si+1,j−1) + α(si+1,j) + α(si,j−1) = 2α′pi · pj (35)
The second one comes from the channels that are separated by only one
particle. In this case only three of the previous four channels contribute. For
instance if j = i+2 the channel (i+1, j− 1) consists of only one particle and
therefore should not be included. This means that we would get:
− α(si;i+2)− 1 + α(s1+1;i+2) + 1 + α(si;i+1 + 1) = 1 + 2α′pi · pi+2 (36)
Finally the third one that comes from the channels whose particles are adja-
cent, gets only contribution from:
− α(si;i+1)− 1 = α0 − 1 + 2α′pi · pi+1 (37)
Putting all these three terms together in Eq. (32) and remembering the factor
in the denominator in the first equation of (33) we get:
1 dziθ(zi − zi+1)
dVabc
(zi − zi+1)α0−1
(zi − zj)2α
′pi·pj(38)
A convenient choice for the three variables to keep fixed is:
12 Paolo Di Vecchia
za = z1 = ∞ ; zb = z2 = 1 ; zc = zN = 0 (39)
With this choice the previous equation becomes:
dziθ(zi − zi+1)
(zi − zi+1)α0−1×
j=i+1
(zi − zj)2α
′pi·pj (40)
We now want to show that this amplitude is identical to the one given in Eq.
(30). This can be done by performing the following change of variables:
; i = 2, 3 . . .N − 2 (41)
that implies
zi = u2u3 . . . ui−1 ; i = 3, 4 . . .N − 1 (42)
Taking into account that the Jacobian is equal to:
uN−2−ii (43)
using the following two relations:
(zi − zi+1)α0−1 =
(N−1−i)α0−1
(1− ui)α0−1 (44)
j=i+1
(zj − zi)2α
′pi·pj =
j=i+1
(1− xij)2α
′pi·pj
−α(si)−(N−i−1)α0
i (45)
and the conservation of momentum
pi = 0 (46)
together with Eq. (21), one can easily see that Eq.s (30) and (40) are equal.
The birth of string theory 13
The N -point amplitude that we have constructed in this section corre-
sponds to the scattering of N spinless particles with no internal degrees of
freedom. On the other hand it was known that the mesons were classified
according to multiplets of an SU(3) flavour symmetry. This was implemented
by Chan and Paton [16] by multiplying the N -point amplitude with a factor,
called Chan-Paton factor, given by
Tr(λa1λa2 . . . λaN ) (47)
where the λ’s are matrices of a unitary group in the fundamental representa-
tion. Including the Chan-Paton factors the total scattering amplitude is given
Tr(λa1λa2 . . . λaN )BN (p1, p2, . . . pN ) (48)
where the sum is extended to the (N − 1)! permutations of the external legs,
that are not related by a cyclic permutations. Originally when the dual reso-
nance model was supposed to describe strongly interacting mesons, this factor
was introduced to represent their flavour degrees of freedom. Nowadays the
interpretation is different and the Chan-Paton factor represents the colour
degrees of freedom of the gauge bosons and the other massive particles of the
spectrum.
The N -point amplitude BN that we have constructed in this section con-
tains only simple pole singularities in all possible planar channels. They cor-
respond to zero width resonances located at non-negative integer values n of
the Regge trajectory α(M2) = n. The lowest state located at α(m2) = 0 cor-
responds to the particles on the external legs of BN . The spectrum of excited
particles can be obtained by factorizing the N -point amplitude in the most
general channel with any number of particles. This was done in Ref.s [17] and
[18] finding a spectrum of states rising exponentially with the mass M . Being
the model relativistic invariant it was found that many states obtained by
factorizing the N -point amplitude were ”ghosts”, namely states with negative
norm as one finds in QED when one quantizes the electromagnetic field in a
covariant gauge. The consistency of the model requires the existence of rela-
tions satisfied by the scattering amplitudes that are similar to those obtained
through gauge invariance in QED. If the model is consistent they must decou-
ple the negative norm states leaving us with a physical spectrum of positive
norm states. In order to study in a simple way these issues, we discuss in the
next section the operator formalism introduced already in 1969 [19, 20, 21].
Before concluding this section let us go back to the non-planar four-point
amplitude in Eq. (5) and discuss its generalization to an N -point amplitude.
Using the technique of the electrostatic analogue on the sphere instead of on
the disk Shapiro [22] was able to obtain a N -point amplitude that reduces
to the four-point amplitude in Eq. (5) with intercept α0 = 2. The N -point
amplitude found in Ref. [22] is:
14 Paolo Di Vecchia
i=1 d
dVabc
|zi − zj |α
′pi·pj (49)
where
dVabc =
d2zad
|za − zb|2|za − zc|2|zb − zc|2
The integral in Eq. (49) is performed in the entire complex plane.
3 Operator formalism and factorization
The factorization properties of the dual resonance model were first studied by
factorizing by brute force the N-point amplitude at the various poles [17, 18].
The number of terms that factorize the residue of the pole at α(s) = n,
increases rapidly with the value of n. In order to find their degeneracy it turned
out to be convenient to first rewrite the N-point amplitude in an operator
formalism. In this section we introduce the operator formalism and we rewrite
the N -point amplitude derived in the previous section in this formalism.
The key idea [19, 20, 21] is to introduce an infinite set of harmonic oscil-
lators and a position and momentum operators 7 which satisfy the following
commutation relations:
[anµ, a
mν ] = ηµνδnm ; [q̂µ, p̂ν ] = iηµν (51)
where ηµν is the flat Minkowski metric that we take to be ηµν = (−1, 1, . . . 1).
A state with momentum p is constructed in terms of a state with zero mo-
mentum as follows:
p̂|p〉 ≡ p̂eip·q̂|0〉 = p|p〉 ; p̂ |0〉 = 0 (52)
normalized as 8
〈p|p′〉 = (2π)dδ(d)(p+ p′) (53)
In order to avoid minus signs we use the convention that
〈p| = 〈0|eip·q̂ (54)
A complete and orthonormal basis of vectors in the harmonic oscillator space
is given by
|λ1, λ2, . . . λi; p〉 =
(a†µn;n)
λn;µn
λn,µn !
eipq̂|0, 0〉 (55)
7 Actually the position and momentum operators were introduced in Ref. [23].
8 Although we now use an arbitrary d we want to remind you that all original
calculations were done for d = 4.
The birth of string theory 15
where the first |0〉 corresponds to the one annihilated by all annihilation op-
erators and the second one to the state of zero momentum:
aµn;n|0, 0〉 = p̂|0, 0〉 = 0 (56)
Notice that Lorentz invariance forces to introduce also oscillators that create
states with negative norm due to the minus sign in the flat Minkowski metric.
This implies that the space spanned by the states in Eq. (55) is not positive
definite. This is, however, not allowed in a quantum theory and therefore if
the dual resonance model is a consistent quantum-relavistic theory we expect
the presence of relations of the kind of those provided by gauge invariance in
Let us introduce the Fubini-Veneziano [23] operator:
Qµ(z) = Q
µ (z) +Q
µ (z) +Q
µ (z) (57)
where
Q(+) = i
z−n ; Q(−) = −i
Q(0) = q̂ − 2iα′p̂ log z (58)
In terms of Q we introduce the vertex operator corresponding to the external
leg with momentum p:
V (z; p) =: eip·Q(z) :≡ eip·Q
(−)(z)eipq̂e+2α
′p̂·p log zeip·Q
(+)(z) (59)
and compute the following vacuum expectation value:
〈0, 0|
V (zi, pi)|0, 0〉 (60)
It can be easily computed using the Baker-Haussdorf relation
eAeB = eBeAe[A,B] (61)
that is valid if the commutator, as in our case, [A,B] is a c-number. In our
case the commutation relations to be used are:
[Q(+)(z), Q(−)(w)] = −2α′ log
and the second one in Eq. (51). Using them one gets:
V (z; p)V (w; k) =: V (z; p)V (w; k) : (z − w)2α
′p·k (63)
16 Paolo Di Vecchia
〈0, 0|
V (zi, pi)|0, 0〉 =
(zi − zj)2α
′pi·pj (2π)dδ(d)(
pi) (64)
where the normal ordering requires that all creation operators be put on the
left of the annihilation one and the momentum operator p̂ be put on the right
of the position operator q̂. This means that
(2π)dδ(d)(
pi)BN =
1 dziθ(zi − zi+1)
dVabc
(zi − zi+1)α0−1
× 〈0, 0|
V (zi, pi)|0, 0〉 (65)
By choosing the three variables za, zb and zc as in Eq. (39) we can rewrite the
previous equation as follows:
(2π)dδ(d)(
pi)BN =
θ(zi − zi+1)×
(zi − zi+1)α0−1
〈0, p1|
V (zi; pi)|0, pN 〉 (66)
where we have taken z2 = 1 and we have defined (α0 ≡ α′p2i ; i = 1 . . .N) :
V (zN ; pN)|0, 0〉 ≡ |0; pN〉 ; 〈0; 0| lim
z2α01 V (z1; p1) = 〈0, p1| (67)
Before proceeding to factorize the N -point amplitude let us study the prop-
erties under the projective group of the operators that we have introduced.
We have already seen that the projective group leaves the integrand of the
Koba-Nielsen representation of the N -point amplitude invariant. The projec-
tive group has three generators L0, L1 and L−1 corresponding respectively to
dilatations, inversions and translations. Assuming that the Fubini-Veneziano
fields Q(z) transforms as a field with weight 0 (as a scalar) we can immedi-
ately write the commutation relations that Q(z) must satisfy. This means in
fact that, under a projective transformation, Q(z) transforms as follows:
Q(z) → QT (z) = Q
αz + β
γz + δ
; αδ − βγ = 1 (68)
Expanding for small values of the parameters we get:
QT (z) = Q(z) + (ǫ1 + ǫ2z + ǫ3z
dQ(z)
+ o(ǫ2) (69)
The birth of string theory 17
This means that the three generators of the projective group must satisfy the
following commutation relations with Q(z):
[L0, Q(z)] = z
; [L−1, Q(z)] =
; [L1, Q(z)] = z
They are given by the following expressions in terms of the harmonic oscilla-
tors:
L0 = α
′p̂2 +
na†n · an ; L1 =
2α′p̂ · a1 +
n(n+ 1)an+1 · a†n (71)
L−1 = L
2α′p̂ · a†1 +
n(n+ 1)a
n+1 · an (72)
They annihilate the vacuum
L0|0, 0〉 = L1|0, 0〉 = L−1|0, 0〉 = 0 (73)
that is therefore called the projective invariant vacuum, and satisfy the algebra
that is called Gliozzi algebra [24]9:
[L0, L1] = −L1 ; [L0, L−1] = L−1 ; [L1, L−1] = 2L0 (74)
The vertex operator with momentum p is a projective field with weight equal
to α0 = α
′p2. It transforms in fact as follows under the projective group:
[Ln, V (z, p)] = z
n+1 dV (z, p)
+ α0(n+ 1)z
nV (z, p) ; n = 0,±1 (75)
or in finite form as follows:
UV (z, p)U−1 =
(γz + δ)2α0
αz + β
γz + δ
where U is the generator of an arbitrary finite projective transformation.
Since U leaves the vacuum invariant, by using Eq. (76) it is easy to show
that:
〈0, 0|
V (z′i, p)|0, 0〉 =
(γzi + δ)
2α0〈0, 0|
V (zi, p)|0, 0〉 (77)
that together with the following equation:
(z′i − z′i+1)α0−1 =
(zi − zi+1)α0−1
(γzi + δ)
−2α0(78)
9 See also Ref. [25].
18 Paolo Di Vecchia
implies that the integrand of the N -point amplitude in Eq. (65) is invariant
under projective transformations.
We are now ready to factorize the N -point amplitude and find the spec-
trum of mesons.
From Eq.s (75) and (76) it is easy to derive the transformation of the
vertex operator under a finite dilatation:
zL0V (1, p)z−L0 = V (z, p)zα0 (79)
Changing the integration variables as follows:
; i = 2, 3 . . .N − 2 ; det
= z3z4 . . . zN−2 (80)
where the last term is the jacobian of the trasformation from zi to xi, we get
from Eq.(66) the following expression:
AN ≡ 〈0, p1|V (1, p2)DV (1, p3) . . .DV (1, pN−1)|0, pN〉 (81)
where the propagator D is equal to:
dxxL0−1−α0(1− x)α0−1 = Γ (L0 − α0)Γ (α0)
Γ (L0)
AN = (2π)
dδ(d)
BN (83)
The factorization properties of the amplitude can be studied by inserting in
the channel (1,M) or equivalently in the channel (M +1, N) described by the
Mandelstam variable
s = −(p1 + p2 + . . . pM )2 = −(pM+1 + pM+2 . . .+ pN )2 ≡ −P 2 (84)
the complete set of states given in Eq. (55):
〈p(1,M)|λ, P 〉〈λ, P |D|µ, P 〉〈µ, P |p(M+1,N)〉 (85)
where
〈p(1,M)| = 〈0, p1|V (1, p2)DV (1, p3) . . . V (1, pM ) (86)
|p(M+1,N)〉 = V (1, pM+1)D . . . V (1, pN−1)|pN , 0〉 (87)
Introducing the quantity:
The birth of string theory 19
na†n · an (88)
it is possible to rewrite
〈λ, P |D|µ, P 〉 =
〈λ, P |
(−1)m
α0 − 1
R+m− α(s)
|µ, P 〉 (89)
where s is the variable defined in Eq. (84). Using this equation we can rewrite
Eq. (85) as follows
〈p(1,M)|λ, P 〉
〈λ, P |
(−1)m
α0 − 1
R+m− α(s)
|µ, P 〉〈µ, P |p(M+1,N)〉(90)
This expression shows that amplitude AN has a pole in the channel (1,M)
when α(s) is equal to an integer n ≥ 0 and the states |λ〉 that contribute to
its residue are those satisfying the relation:
R|λ〉 = (n−m)|λ〉 ; m = 0, 1 . . . n (91)
The number of independent states |λ〉 contributing to the residue gives the
degeneracy of states for each level n.
Because of manifest relativistic invariance the space spanned by the com-
plete system of states in Eq. (55) contains states with negative norm corre-
sponding to those states having an odd number of oscillators with timelike
directions (see Eq. (51)). This is not consistent in a quantum theory where
the states of a system must span a positive definite Hilbert space. This means
that there must exist a number of relations satisfied by the external states that
decouple a number of states leaving with a positive definite Hilbert space. In
order to find these relations we rewrite the state in Eq. (87) going back to the
Koba-Nielsen variables:
|p(1,M)〉 =
dziθ(zi − zi+1)]
(zi − zi+1)α0−1×
× V (1, p1)V (z2, p2) . . . V (zM−1, pM−1)|0, pM 〉 (92)
Let us consider the operator U(α) that generate the projective transformation
that leaves the points z = 0, 1 invariant:
1− α(z − 1) = z + α(z
2 − z) + o(α2) (93)
From the transformation properties of the vertex operators in Eq. (76) it
is easy to see that the previous transformation leaves the state in Eq. (92)
invariant:
20 Paolo Di Vecchia
U(α)|p(1,M)〉 = |p(1,M)〉 (94)
This means that the generator of the previous transformation annihilates the
state in Eq. (92):
W1|p(1,M)〉 = 0 ; W1 = L1 − L0 (95)
The explicit form of W1 follows from the infinitesimal form of the transforma-
tion in Eq. (93). This condition that is of the same kind of the relations that
on shell amplitudes with the emission of photons satisfy as a consequence of
gauge invariance, implies that the residue at the pole in Eq. (90) can be fac-
torized with a smaller number of states. It turns out, however, that a detailed
analysis of the spectrum shows that negative norm states are still present.
This can be qualitatively understood as follows. Due to the Lorentz metric
we have a negative norm component for each oscillator. In order to be able
to decouple all negative norm states we need to have a gauge condition of
the type as in Eq. (95) for each oscillator. But the number of oscillators is
infinite and, therefore, we need an infinite number of conditions of the type
as in Eq. (95). It was found in Ref. [6] that, if we take α0 = 1, then one can
easily construct an infinite number of operators that leave the state in Eq.
(92) invariant. In the next section we will concentrate on this case.
4 The case α0 = 1
If we take α0 = 1 many of the formulae given in the previous section simplify.
The N -point amplitude in Eq. (38) becomes:
1 dziθ(zi − zi+1)
dVabc
(zi − zj)2α
′pi·pj (96)
that can be rewritten in the operator formalism as follows:
(2π)4δ(
pi)BN =
1 dziθ(zi − zi+1)
dVabc
〈0, 0|
V (zi, pi)|0, 0〉 (97)
By choosing z1 = ∞, z2 = 1 and zN = 0 it becomes
(2π)4δ(
pi)BN =
θ(zi − zi+1)〈0, p1|
V (zi; pi)|0, pN 〉 (98)
The birth of string theory 21
where
V (zN ; pN )|0, 0〉 ≡ |0; pN 〉 ; 〈0; 0| lim
z21V (z1; p1) = 〈0, p1| (99)
Eq. (81) is as before, but now the propagator becomes:
dxxL0−2 =
L0 − 1
(100)
This means that Eq. (89) becomes:
〈λ, P |D|µ, P 〉 = 〈λ, P | 1
L0 − 1
|µ, P 〉 (101)
and Eq. (90) has the simpler form:
〈p(1,M)|λ, P 〉〈λ, P |
R − α(s)
|λ, P 〉〈λ, P |p(M+1,N)〉 (102)
BN has a pole in the channel (1,M) when α(s) is equal to an integer n ≥ 0 and
the states |λ〉 that contribute to its residue are those satisfying the relation:
R|λ〉 = n|λ〉 (103)
Their number gives the degeneracy of the states contributing to the pole at
α(s) = n. The N -point amplitude can be written as:
BN = 〈p(1,M)|D|p(M+1,N)〉 (104)
where
|p(1,M)〉 =
∫ M−1
[dziθ(zi − zi+1)]×
× V (1, p1)V (z2, p2) . . . V (zM−1, pM−1|0, pM 〉 (105)
Using Eq. (79) and changing variables from zi, i = 2 . . .M−1 to xi = zi+1zi , i =
1 . . .M − 2 with z1 = 1 we can rewrite the previous equation as follows:
|p(1,M)〉 = V (1, p1)DV (1, p2) . . .DV (1, pM−1)|0, pM 〉 (106)
where the propagator D is defined in Eq. (100).
We want now to show that the state in Eq.s (105) and (106) is not only
annihilated by the operator in Eq. (95), but, if α0 = 1 [6], by an infinite set
of operators whose lowest one is the one in Eq. (95). We will derive this by
using the formalism developed in Ref. [26] and we will follow closely their
derivation.
Starting from Eq.s (70) Fubini and Veneziano realized that the generators
of the projective group acting on a function of z are given by:
22 Paolo Di Vecchia
L0 = −z
; L−1 = −
; L1 = −z2
(107)
They generalized the previous generators to an arbitrary conformal transfor-
mation by introducing the following operators, called Virasoro operators:
Ln = −zn+1
(108)
that satisfy the algebra:
[Ln, Lm] = (n−m)Ln+m (109)
that does not contain the term with the central charge! They also showed that
the Virasoro operators satisfy the following commutation relations with the
vertex operator:
[Ln, V (z, p)] =
zn+1V (z, p)
(110)
More in general actually they define an operator Lf corresponding to an
arbitrary function f(ξ) and Lf = Ln if we choose f(ξ) = ξ
n. In this case the
commutation relation in Eq. (110) becomes:
[Lf , V (z, p)] =
(zf(z)V (z, p)) (111)
By introducing the variable:
ξf(ξ)
(112)
where A is an arbitrary constant, one can rewrite Eq. (111) in the following
form:
[Lf , zf(z)V (z, p)] =
(zf(z)V (z, p)) (113)
This implies that, under an arbitrary conformal transformation z → f(z),
generated by U = eαLf , the vertex operator transforms as:
eαLfV (z, p) zf(z) e−αLf = V (z′, p)z′f(z′) (114)
where the parameter α is given by:
ξf(ξ)
(115)
On the other hand, this equation implies:
zf(z)
z′f(z′)
(116)
The birth of string theory 23
that, inserted in Eq. (114), implies that the quantity V (z, p) dz is left invariant
by the transformation z → f(z):
eαLfV (z, p)dze−αLf = V (z′, p)dz′ (117)
Let us now act with the previous conformal transformation on the state in
Eq. (105). We get:
eαLf |p(1,M)〉 =
[dziθ(zi − zi+1)] eαLfV (1, p1)e−αLf×
×eαLfV (z2, p2)e−αLf . . . . . . eαLfV (zM−1, pM−1)e−αLf eαLf |0, pM 〉 =
θ(zi − zi+1)× eαLfV (1, p1)e−αLf×
× V (z′2, p2)dz′2 . . . V (z′M−1, pM−1)dz′M−1eαLf |0, pM 〉 (118)
where we have used Eq. (117). The previous transformation leaves the state
invariant if both z = 0 and z = 1 are fixed points of the conformal transfor-
mation. This happens if the denominator in Eq. (115) vanishes when ξ = 0, 1.
This requires the following conditions:
f(1) = 0 ; lim
ξf(ξ) = 0 (119)
Expanding ξ near the poinr ξ = 1 we can determine the relation between z
and z′ near z = z′ = 1. We get:
ze−αf
1− z + ze−αf ′(1)
(120)
and from it we can determine the conformal factor:
(1 − z + ze−αf ′(1))2
→ eαf
′(1) (121)
in the limit z → 1. Proceeding in the same near the point z = z′ = 0 we get:
zf(0)eαf(0)
f(0) + zf ′(0)(1− eαf(0)
→ zeαf(0) (122)
in the limit z → 0. This means that Eq. (118) becomes
eα(Lf−f
′(1)−f(0))|p(1,M)〉 = |p(1,M)〉 (123)
A choice of f that satisfies Eq.s (119) is the following:
24 Paolo Di Vecchia
f(ξ) = ξn − 1 (124)
that gives the following gauge operator:
Wn = Ln − L0 − (n− 1) (125)
that annihilates the state in Eq. (105):
Wn|p1...M 〉 = 0 ; n = 1 . . .∞ (126)
These are the Virasoro conditions found in Ref. [6]. There is one condition for
each negative norm oscillator and, therefore, in this case there is the possibility
that the physical subspace is positive definite. An alternative more direct
derivation of Eq. (126) can be obtained by acting with Wn on the state in Eq.
(106) and using the following identities:
WnV (1, p) = V (1, p)(Wn + n) ; (Wn + n)D = [L0 + n− 1]−1Wn (127)
The second equation is a consequence of the following equation:
L0 = xL0+nLn (128)
Eq.s (127) imply
WnV (1, p)D = V (1, p)[L0 + n− 1]−1Wn (129)
This shows that the operator Wn goes unchanged through all the product
of terms V D until it arrives in front of the term V (1, pM−1)|0, pM 〉. Going
through the vertex operator it becomes Ln − L0 + 1 that then annihilate the
state
(Ln − L0 + 1)|pM , 0〉 = 0 (130)
This proves Eq. (126).
Using the representation of the Virasoro operators given in Eq. (108) Fu-
bini and Veneziano showed that they satisfy the algebra given in eq. (109)
without the central charge. The presence of the central charge was recognized
by Joe Weis10 in 1970 and never published. Unlike Fubini and Veneziano [26]
he used the expression of the Ln operators in terms of the harmonic oscillators:
2α′np̂ · an +
m(n+m)an+m · am+
m(n−m)am−n · am ;n ≥ 0 Ln = L†n (131)
10 See noted added in proof in Ref. [26].
The birth of string theory 25
He got the following algebra:
[Ln, Lm] = (n−m)Ln+m +
n(n2 − 1)δn+m;0 (132)
where d is the dimension of the Minkowski space-time. We write here d for the
dimension of the Minkowski space, but we want to remind you that almost
everybody working in a model for mesons at that time took for granted that
the dimension of the space-time was d = 4. As far as I remember the first
paper where a dimension d 6= 4 was introduced was Ref. [27] where it was
shown that the unitarity violating cuts in the non-planar loop become poles
that were consistent with unitarity if d = 26.
In the last part of this section we will generalize the factorization procedure
to the Shapiro-Virasoro model whose N -point amplitude is given in Eq. (49).
In this case we must introduce two sets of harmonic oscillators commuting
with each other and only one set of zero modes satisfying the algebra [28] :
[anµ, a
mν ] = [ãnµ, ã
mν ] = ηµνδnm ; [q̂µ, p̂ν ] = iηµν (133)
In terms of them we can introduce the Fubini-Veneziano operator
Q(z, z̄) = q̂ − 2α′p̂ log(zz̄) + i
−n − a†nzn
ãnz̄
−n − ã†nz̄n
(134)
We can then introduce the vertex operator:
V (z, z̄; p) =: eip·Q(z,z̄) : (135)
and write the N -point amplitude in Eq. (95) in the following factorized form:
i=1 d
dVabc
V (zi, z̄i, pi))
|0〉 =
= (2π)4δ(4)(
i=1 d
dVabc
|zi − zj|α
′pi·pj (136)
where the radial ordered product is given by
V (zi, z̄i, pi))
V (zi, z̄i, pi))
θ(|zi| − |zi+1|) + . . . (137)
26 Paolo Di Vecchia
and the dots indicate a sum over all permutations of the vertex operators.
By fixing z1 = ∞, z2 = 1, zN = 0 we can rewrite the previous expression
as follows:
∫ N−1
d2zi〈0, p1|R
V (zi, z̄i, pi))
|0, pN〉 (138)
For the sake of simplicity let us consider the term corresponding to the per-
mutation 1, 2, . . .N . In this case the Koba-Nielsen variables are ordered in
such a way that |zi| ≥ |zi+1| for i = 1, . . .N −1. We can then use the formula:
V (zi, z̄i, pi)) = z
L̃0−1
i V (1, 1, pi)z
i (139)
and change variables:
; |wi| ≤ 1 (140)
to rewrite Eq. (138) as follows:
〈0, p1|V (1, 1, pi1)DV (1, 1, p2)D . . . V (1, 1, pN−1)|0, pN 〉 (141)
where
wL0−1w̄L̃0−1 =
L0 + L̃0 − 2
· sinπ(L0 − L̃0)
L0 − L̃0
(142)
We can now follow the same procedure for all permutations arriving at the
following expression:
〈0, p1|P [V (1, 1, p2)DV (1, 1, p3)D . . . V (1, 1, pN−1)]|0, pN〉 (143)
where P means a sum of all permutations of the particles.
If we want to consider the factorization of the amplitude on the pole at
s = −(p1 + . . . pM )2 we get only the following contribution:
〈p(1...M)|D|p(M+1...N)〉 (144)
where
|p(M+1...N)〉 = P [V (1, 1, pM+1)D . . . V (1, 1, pN−1]|0, pN 〉 (145)
〈p(1...M)| = 〈0, p1|P [V (1, 1, p2)D . . . V (1, 1, pM)] (146)
The amplitude is factorized by introducing a complete set of states and rewrit-
ing Eq. (141) as follows:
The birth of string theory 27
〈p1...M |λ, λ̃〉
2π〈λ, λ̃|δL0,L̃0|λ, λ̃〉
L0 + L̃0 − 2
〈λ, λ̃|p(M+1,...N)〉 (147)
By writing
p̂2 +R ; L̃0 =
p̂2 + R̃ (148)
na†n · an ; R̃ =
nã†n · ãn (149)
we can rewrite Eq. (147) as follows
〈p1...M |λ, λ̃〉
2π〈λ, λ̃|δR,R̃|λ, λ̃〉
R + R̃− α(s)
〈λ, λ̃|p(M+1,...N)〉 (150)
We see that the amplitude for the Shapiro-Virasoro model has simple poles
only for even integer values of αSV (s) = 2 +
s = 2n ≥ 0 and the residue at
the poles factorizes in a sum with a finite number of terms. Notice that the
Regge trajectory of the Shapiro-Virasoro model has double intercept and half
slope of that of the generalized Veneziano model.
5 Physical states and their vertex operators
In the previous section, we have seen that the residue at the poles of the N -
point amplitudes factorizes in a sum of a finite number of terms. We have also
seen that some of these terms, due to the Lorentz metric, correspond to states
with negative norm. We have also derived a number of ”Ward identities” given
in Eq. (126) that imply that some of the terms of the residue decouple. The
question to be answered now is: Is the space spanned by the physical states
a positive norm Hilbert space? In order to answer this question we need first
to find the conditions that characterize the on shell physical states |λ, P 〉
and then to determine which are the states that contribute to the residue
of the pole at α(s = −P 2) = n. In other words, we have to find a way of
characterizing the physical states and of eliminating the spurious states that
decouple in Eq. (102) as a consequence of Eq.s (126). A state |λ.P 〉 contributes
at the residue of the pole in Eq.(102) for α(s = −P 2) = n if it is on shell,
namely if it satisfies the following equations:
R|λ, P 〉 = n|λ, P 〉 ; α(−P 2) = 1− α′P 2 = n (151)
that can be written in a unique equation:
28 Paolo Di Vecchia
(L0 − 1)|λ, P 〉 = 0 (152)
Because of Eq. (126) we also know that a state of the type:
|s, P 〉 = W †m|µ, P 〉 (153)
is not going to contribute to the residue of the pole. We call it a spurious or
unphysical state. We start constructing the subspace of spurious states that
are on shell at the level n. Let us consider the set of orthogonal states |µ, P 〉
such that
R|µ, P 〉 = nµ|µ, P 〉 ; L0|µ, P 〉 = (1−m)|µ, P 〉 ; 1− α′P 2 = n (154)
where
m = n+ nµ (155)
In terms of these states we can construct the most general spurious state that
is on shell at the level n. It is given by
|s, P 〉 = W †m|µ, P 〉 ; (L0 − 1)|s, P 〉 = 0 (156)
per any positive integer m. Using Eq. (154), eq. (156) becomes:
|s, P 〉 = L†m|µ, P 〉 (157)
where |µ, P 〉 is an arbitrary state satisfying Eq.s (154).
A physical state |λ, P 〉 is defined as the one that is orthogonal to all spuri-
ous states appearing at a certain level n. This means that it must satisfy the
following equation:
〈λ.P |L†ℓ |µ, P 〉 = 0 (158)
for any state |µ, P 〉 satisfying Eq.s (154). In conclusion, the on shell physical
states at the level n are characterized by the fact that they satisfy the following
conditions:
Lm|λ, P 〉 = (L0 − 1)|λ, P 〉 = 0 ; 1− α′P 2 = n (159)
These conditions characterizing the physical subspace were first found by Del
Giudice and Di Vecchia [28] where the analysis described here was done.
In order to find the physical subspace one starts writing the most general
on shell state contributing to the residue of the pole at level n in Eq. (154).
Then one imposes Eq.s (159) and determines the states that span the physical
subspace. Actually, among these states one finds also a set of zero norm states
that are physical and spurious at the same time. Those states are of the form
given in Eq. (157), but also satisfy Eq.s (159). It is easy to see that they are
not really physical because they are not contributing to the residue of the pole
The birth of string theory 29
at the level n. This follows from the form of the unit operator given in the
space of the physical states by:
norm 6=0
|λ, P 〉〈λ, P |+
[|λ0, P 〉〈µ0, P |+ |µ0, P 〉〈λ0, P |] (160)
where |λ0, P 〉 is a zero norm physical and spurious state and |µ0, P 〉 its con-
jugate state. A conjugate state of a zero norm state is obtained by changing
the sign of the oscillators with timelike direction. Since |λ0, P 〉 is a spurious
state when we insert the unit operator, given in Eq. (160), in Eq. (102) we
see that the zero norm states never contribute to the residue because their
contribution is annihilated either from the state 〈p(1,M)| or from the state
|p(M+1,N)〉. In conclusion, the physical subspace contains only the states in
the first term in the r.h.s. of Eq. (160).
Let us analyze the first two excited levels. The first excited level corre-
sponds to a massless gauge field. It is spanned by the states ǫµa
1µ|0, P 〉. In
this case the only condition that we must impose is:
1µ|0, P 〉 = 0 =⇒ P · ǫ = 0 (161)
Choosing a frame of reference where the momentum of the photon is given by
Pµ ≡ (P, 0....0, P ) , Eq. (161) implies that the only physical states are:
1i |0, P 〉+ ǫ(a
1;0 − a
1;d−1)|0, P 〉 ; i = 1 . . . d− 2 (162)
where ǫi and ǫ are arbitrary parameters. The state in Eq. (162) is the most
general state of the level N = 1 satisfying the conditions in Eq. (159). The
first state in eq. (162) has positive norm, while the second one has zero norm
that is orthogonal to all other physical states since it can be written as follows:
1;0 − a
1;D−1)|0, P 〉 = L
1|0, P 〉 (163)
in the frame of reference where Pµ ≡ (P, ...0, P ). Because of the previous
property it is decoupled from the physical states together with its conjugate:
1,0 + a
1,d−1)|0, P 〉 (164)
In conclusion, we are left only with the transverse d− 2 states corresponding
to the physical degrees of freedom of a massless spin 1 state. At the next level
n = 2 the most general state is given by:
[αµνa
1,ν + β
2,µ]|0, P 〉 (165)
If we work in the center of mass frame where Pµ = (M,0) we get the following
most general physical state:
|Phys >= αij [a†1,ia
1,j −
(d− 1)
1,k]|0, P 〉+
30 Paolo Di Vecchia
+βi[a
2,i + a
1,i]|0, P >〉+
1,i +
1,0 − 2a
|0, P 〉 (166)
where the indices i, j run over the d− 1 space components. The first term in
(166) corresponds to a spin 2 in (d− 1) dimensional space and has a positive
norm being made with space indices. The second term has zero norm and is
orthogonal to the other physical states since it can be written as L+1 a
1,i|0, P 〉.
Therefore it must be eliminated from the physical spectrum together with its
conjugate, as explained above. Finally, the last state in (166) is spinless and
has a norm given by:
2(d− 1)(26− d) (167)
If d < 26 it corresponds to a physical spin zero particle with positive norm. If
d > 26 it is a ghost. Finally, if d = 26 it has a zero norm and is also orthogonal
to the other physical states since it can be written in the form:
2 + 3L
1 )|0 > (168)
It does not belong, therefore, to the physical spectrum. The analysis of this
level was done in Ref. [29] with d = 4. This did not allow the authors of
Ref. [29] to see that there was a critical dimension.
The analysis of the physical states can be easily extended [28] to the
Shapiro-Virasoro model. In this case the physical conditions given in Eq. (159)
for the open string, become [28]:
Lm|λ, λ̃〉 = L̃m|λ, λ̃〉 = (L0 − 1)|λ, λ̃〉 = (L̃0 − 1)|λ, λ̃〉 = 0 (169)
for any positive integer m. It can be easily seen from the previous equations
that the lowest state of the Shapiro-Virasoro model is the vacuum |0a, 0ã, p〉
corresponding to a tachyon with mass α′p2 = 4, while the next level described
by the state a
1ν |0a, 0ã, p〉 contains massless states corresponding to the
graviton, a dilaton and a two-index antisymmetric tensor Bµν .
Having characterized the physical subspace one can go on and construct
a N -point scattering amplitude involving arbitrary physical states. This was
done by Campagna, Fubini, Napolitano and Sciuto [30] where the vertex oper-
ator for an arbitrary physical state was constructed in analogy with what has
been done for the ground tachyonic state. They associated to each physical
state |α, P 〉 a vertex operator Vα(z, P ) that is a conformal field with conformal
dimension equal to 1:
[Ln, Vα(z, p)] =
zn+1Vα(z, p)
(170)
and reproduces the corresponding state acting on the vacuum as follows:
Vα(z; p)|0, 0〉 ≡ |α; p〉 ; 〈0; 0| lim
z2Vα(z; p) = 〈α, p| (171)
The birth of string theory 31
It satisfies, in addition, the hermiticity relation:
V †α (z, P ) = Vα(
,−P )(−1)α(−P
2) (172)
An excited vertex that will play an important role in the next section is the
one associated to the massless gauge field. It is given by:
Vǫ(z, k) ≡ ǫ ·
dQ(z)
eik·Q(z) ; k · ǫ = k2 = 0 (173)
Because of the last two conditions in Eq. (173) the normal order is not neces-
sary. It is convenient to give the expression of
dQ(z)
in terms of the harmonic
oscillators:
P (z) ≡ dQ(z)
−n−1 (174)
It is a conformal field with conformal dimension equal to 1. The rescaled
oscillators αn are given by:
nan ; α−n =
na†n ; n > 0 ; α0 =
2α′p̂ (175)
In terms of the vertex operators previously introduced the most general
amplitude involving arbitrary physical states is given by [30]:
(2π)4δ(
1 dziθ(zi − zi+1)
dVabc
〈0, 0|
Vαi(zi, pi)|0, 0〉(176)
In the case of the Shapiro-Virasoro model the tachyon vertex operator is
given in Eq. (135). By rewriting Eq. (134) as follows:
Q(z, z̄) = Q(z) + Q̃(z̄) (177)
where
Q(z) =
q̂ − 2α′p̂ log(z) + i
−n − a†nzn
(178)
Q̃(z̄) =
q̂ − 2α′p̂ log(z̄) + i
ãnz̄
−n − ã†nz̄n
(179)
we can write the tachyon vertex operator in the following way:
V (z, z̄, p) =: eip·Q(z)eip·Q̃(z̄) : (180)
32 Paolo Di Vecchia
This shows that the vertex operator corresponding to the tachyon of the
Shapiro-Virasoro model can be written as the product of two vertex oper-
ators corresponding each to the tachyon of the generalized Veneziano model.
Analogously the vertex operator corresponding to an arbitrary physical
state of the Shapiro-Virasoro model can always be written as a product of
two vertex operators of the generalized Veneziano model:
Vα,β(z, z̄, p) = Vα(z,
)Vβ(z̄,
) (181)
The first one contains only the oscillators αn, while the second one only the
oscillators α̃n. They both contain only half of the total momentum p and
the same zero modes p̂ and q̂. The two vertex operators of the generalized
Veneziano model are both conformal fields with conformal dimension equal
to 1. If they correspond to physical states at the level 2n, they satisfy the
following relation (n = ñ):
+ n = 1 (182)
They lie on the following Regge trajectory:
p2 ≡ αSV (−p2) = 2n (183)
as we have already seen by factorizing the amplitude in Eq. (150).
6 The DDF states and absence of ghosts
In the previous section we have derived the equations that characterize the
physical states and their corresponding vertex operators. In this section we
will explicitly construct an infinite number of orthonormal physical states with
positive norm.
The starting point is the DDF operator introduced by Del Giudice, Di Vec-
chia and Fubini [31] and defined in terms of the vertex operator corresponding
to the massless gauge field introduced in eq. (173):
Ai,n =
i Pµ(z)e
ik·Q(z) (184)
where the index i runs over the d−2 transverse directions, that are orthogonal
to the momentum k. We have also taken
= 1. Because of the log z term
appearing in the zero mode part of the exponential, the integral in Eq. (184),
that is performed around the origin z = 0, is well defined only if we constrain
the momentum of the state, on which Ai,n acts, to satisfy the relation:
2α′p · k = n (185)
The birth of string theory 33
where n is a non-vanishing integer.
The operator in Eq. (184) will generate physical states because it com-
mutes with the gauge operators Lm:
[Lm, An;i] = 0 (186)
since the vertex operator transforms as a primary field with conformal dimen-
sion equal to 1 as it follows from Eq. (170).
On the other hand it also satisfies the algebra of the harmonic oscillator
as we are now going to show. From Eq. (184) we get:
[An,i, Am,j] = −
dzǫi · P (z)eik·Q(ζ)ǫj · P (ζ)eik
′ ·Q(ζ) (187)
where
2α′p · k = n ; 2α′p · k′ = m (188)
and k and k′ are supposed to be in the same direction, namely
kµ = nk̂µ ; k
µ = mk̂µ (189)
2α′p · k̂ = 1 (190)
Finally the polarizations are normalized as:
ǫi · ǫj = δij (191)
Since k̂ · ǫi = k̂ · ǫj = k̂2 = 0 a singularity for z = ζ can appear only from the
contraction of the two terms P (ζ) and P ((z) that is given by:
〈0, 0|ǫi · P (z)ǫj · P (ζ)|0, 0〉 = −
2α′δij
(z − ζ)2
(192)
Inserting it in Eq. (187) we get:
[An,i, Am,j ] = δij in
dζk̂ · P (ζ)e−i(n+m))k̂·Q(ζ) =
= inδijδn+m;0
dζk̂ · P (ζ) (193)
where we have used the fact that the integrand is a total derivative and
therefore one gets a vanishing contribution unless n + m = 0. If n + m = 0
from Eq.s (174) and (190) we get:
[An,i, Am,j ] = nδijδn+m;0 ; i, j = 1 . . . d− 2 (194)
34 Paolo Di Vecchia
Eq. (194) shows that the DDF operators satisfy the harmonic oscillator alge-
In terms of this infinite set of transverse oscillators we can construct an
orthonormal set of states:
|i1, N1; i2, N2; . . . im, Nm〉 =
Aik,−Nk√
|0, p〉 (195)
where λh is the multiplicity of the operator Aih,−Nh in the product in Eq.
(195) and the momentum of the state in Eq. (195) is given by
P = p+
k̂Ni (196)
They were constructed in four dimensions where they were not a complete
system of states 11 and it took some time to realize that in fact they were a
complete system of states if d = 26 [32, 33] 12. Brower [32] and Goddard and
Thorn [33] showed also that the dual resonance model was ghost free for any
dimension d ≤ 26. In d = 26 this follows from the fact that the DDF operators
obviously span a positive definite Hilbert space (See Eq. (194)). For d < 26
there are extra states called Brower states [32]. The first of these states is
the last state in Eq. (166) that becomes a zero norm state for d = 26. But
also for d < 26 there is no negative norm state among the physical states.
The proof of the no-ghost theorem in the case α0 = 1 is a very important
step because it shows that the dual resonance model constructed generalizing
the four-point Veneziano formula, is a fully consistent quantum-relativistic
theory! This is not quite true because, when the intercept α0 = 1, the lowest
state of the spectrum corresponding to the pole in the N -point amplitude for
α(s) = 0, is a tachyon with mass m2 = − 1
. A lot of effort was then made
to construct a model without tachyon and with a meson spectrum consistent
with the experimental data. The only reasonably consistent models that came
out from these attempts, were the Neveu-Schwarz [7] for mesons and the
Ramond model [8] for fermions that only later were recognized to be part of a
unique model that nowadays is called the Neveu-Schwarz-Ramond model. But
this model was not really more consistent than the original dual resonance
11 Because of this Fubini did not want to publish our result, but then he went to a
meeting in Israel in spring 1971 giving a talk on our work where he found that
the audience was very interested in our result and when he came back to MIT we
decided to publish our result.
12 I still remember Charles Thorn coming into my office at Cern and telling me:
Paolo, do you know that your DDF states are complete if d = 26? I quickly redid
the analysis done in Ref. [29] with an arbitrary value of the space-time dimension
obtaining Eq.s (166) and (167) that show that the spinless state at the level
α(s) = 2 is decoupled if d = 26. I strongly regretted not to have used an arbitrary
space-time dimension d in the analysis of Ref. [29] .
The birth of string theory 35
model because it still had a tachyon with mass m2 = − 1
. The tachyon
was eliminated from the spectrum only in 1976 through the GSO projection
proposed by Gliozzi, Scherk and Olive [34].
Having realized that, at least for the critical value of the space-time dimen-
sion d = 26, the physical states are described by the DDF states having only
d− 2 = 24 independent components, open the way to Brink and Nielsen [35]
to compute the value α0 = 1 of the Regge trajectory with a very physical ar-
gument. They related the intercept of the Regge trajectory to the zero point
energy of a system with an infinite number of oscillators having only d − 2
independent components:
α0 = −
n (197)
This quantity is obviously infinite and, in order to make sense of it, they in-
troduced a cutoff on the frequencies of the harmonic oscillators obtaining an
infinite term that they eliminated by renormalizing the speed of light and a
finite universal constant term that gave the intercept of the Regge trajectory.
Instead of following their original approach we discuss here an alternative ap-
proach due to Gliozzi [36] that uses the ζ-function regularization. He rewrites
Eq. (197) as follows:
α0 = −
n = −
n−s = −
ζR(−1) = 1 (198)
where in the last equation we have used the identity ζR(−1) = − 112 and we
have put d = 26. Since the Shapiro-Virasoro model has two sets of trans-
verse harmonic oscillators it is obvious that its intercept is twice that of the
generalized Veneziano model.
Using the rules discussed in the previous section we can construct the
vertex operator corresponding to the state in Eq. (195). It is given by:
V(i;Ni)(z, P ) =
dziǫi · P (zi)eiNik̂·Q(zi) : eip·Q(z) : (199)
where the integral on the variable zi is evaluated along a curve of the complex
plane zi containing the point z. The singularity of the integrand for zi = z is
a pole provided that the following condition is satisfied.
2α′p · k̂ = 1 (200)
The last vertex in Eq. (199) is the vertex operator corresponding to the ground
tachyonic state given in Eq. (59) with α′p2 = 1.
Using the general form of the vertex one can compute the three-point
amplitude involving three arbitrary DDF vertex operators. This calculation
36 Paolo Di Vecchia
has been performed in Ref. [37] and since the vertex operators are conformal
fields with dimension equal to 1 one gets:
〈0, 0|V
(z1, P1)V(i(2)
(z2, P2)V(i(3)
(z3, P3)|0, 0〉 =
(z1 − z2)(z1 − z3)(z2 − z3)
(201)
where the explicit form of the coefficient C123 is given by:
C123 = 1〈0, 0|2〈0, 0|3〈0, 0|e
r.s=1
n,m=1
−n;iN
−m;i+
−n;i×
× eτ0
(α′Π2r−1)|N (1)k1 , i
〉1|N (2)k2 , i
〉2|N (3)k3 , i
〉3 (202)
where
N rsnm = −N rnNsm
nmα1α2α3
nαs +mαr
; N rn =
Γ (−nαr+1
αrn!Γ (1− nαr+1αr − n)
(203)
Π = Pr+1αr − Prαr+1 ; r = 1, 2, 3 (204)
Π is independent on the value of r chosen as a consequence of the equations:
Pr = 0 (205)
7 The zero slope limit
In the introduction we have seen that the dual resonance model has been
constructed using rules that are different from those used in field theory.
For instance, we have seen that planar duality implies that the amplitude
corresponding to a certain duality diagram, contains poles in both s and t
channels, while the amplitude corresponding to a Feynman diagram in field
theory contains only a pole in one of the two channels. Furthermore, the
scattering amplitude in the dual resonance model contains an infinite number
of resonant states that, at high energy, average out to give Regge behaviour.
Also this property is not observed in field theory. The question that was
natural to ask, was then: is there any relation between the dual resonance
model and field theory? It turned out, to the surprise of many, that the dual
resonance model was not in contradiction with field theory, but was instead
an extension of a certain number of field theories. We will see that the limit in
The birth of string theory 37
which a field theory is obtained from the dual resonance model corresponds
to taking the slope of the Regge trajectory α′ to zero.
Let us consider the scattering amplitude of four ground state particles in
Eq. (1) that we rewrite here with the correct normalization factor:
A(s, t, u) = C0N
0 (A(s, t) +A(s, u) +A(t, u)) (206)
where
2g(2α′)
4 (207)
is the correct normalization factor for each external leg, g is the dimensionless
open string coupling constant that we have constantly ignored in the previous
sections and C0 is determined by the following relation:
′ = 1 (208)
that is obtained by requiring the factorization of the amplitude at the pole
corresponding to the ground state particle whose mass is given in Eq. (21).
Using Eq. (21) in order to rewrite the intercept of the Regge trajectory in
terms of the mass of the ground state particle m2 and the following relation
satisfied by the Γ - function:
Γ (1 + z) = zΓ (z) (209)
we can easily perform the limit for α′ → 0 of A(s, t) obtaining:
A(s, t) =
m2 − s
m2 − s
(210)
Performing the same limit on the other two planar amplitudes we get the
following expression for the total amplitude in Eq. (206):
A(s, t, u) =
2g(2α′)
(α′)2
m2 − s
m2 − s
m2 − u
(211)
By introducing the coupling constant:
g3 = 4g(2α
4 (212)
Eq. (211) becomes
A(s, t, u) = g23
m2 − s
m2 − s
m2 − u
(213)
that is equal to the sum of the tree diagrams for the scattering of four particles
with mass m of Φ3 theory with coupling constant equal to g3. We have shown
that, by keeping g3 fixed in the limit α
′ → 0, the scattering amplitude of four
38 Paolo Di Vecchia
ground state particles of the dual resonance model is equal to the tree diagrams
of Φ3 theory. This proof can be extended to the scattering of N ground state
particles recovering also in this case the tree diagrams of Φ3 theory. It is also
valid for loop diagrams that we will discuss in the next section. In conclusion,
the dual resonance model reduces in the zero slope limit to Φ3 theory. The
proof that we have presented here is due to J. Scherk [38] 13
A more interesting case to study is the one with intercept α0 = 1. We will
see that, in this case, one will obtain the tree diagrams of Yang-Mills theory,
as shown by Neveu and Scherk [40] 14.
Let us consider the three-point amplitude involving three massless gauge
particles described by the vertex operator in Eq. (173). It is given by the sum
of two planar diagrams. The first one corresponding to the ordering (123) is
given by:
3Tr (λa1λa2λa3)
〈0, 0|Vǫ1(z1, p1)Vǫ2(z2, p2)Vǫ3(z3, p3)|0, 0〉
[(z1 − z2)(z2 − z3)(z1 − z3)]−1
(214)
Using momentum conservation p1+ p2+ p3 = 0 and the mass shell conditions
p2i = pi · ǫi = 0 one can rewrite the previous equation as follows:
0Tr(λ
a1λa2λa3)
× [(ǫ1 · ǫ2)(p1 · ǫ3) + (ǫ1 · ǫ3)(p3 · ǫ2) + (ǫ2 · ǫ3)(p2 · ǫ1)] (215)
The second contribution comes from the ordering 132 that can be obtained
from the previous one by the substitution
Tr(λa1λa2λa3) → −Tr(λa1λa3λa2) (216)
Summing the two contributions one gets
oTr(λ
a1 [λa2 , λa3 ])
× [(ǫ1 · ǫ2)(p1 · ǫ3) + (ǫ1 · ǫ3)(p3 · ǫ2) + (ǫ2 · ǫ3)(p2 · ǫ1)] (217)
The factor
N0 = 2g(2α
′)(d−2)/4 (218)
is the correct normalization factor for each vertex operator if we normalize
the generators of the Chan-Paton group as follows:
δij (219)
13 See also Ref. [39].
14 See also Ref. [41].
The birth of string theory 39
It is related to C0 through the relation
′ = 2 (220)
g is the dimensionless open string coupling constant. Notice that Eq.s (218)
and (220) differ from Eq.s (207) and (208) because of the presence of the
Chan-Paton factors that we did not include in the case of Φ3 theory.
By using the commutation relations:
[λa, λb] = ifabcλc (221)
and the previous normalization factors we get for the three-gluon amplitude:
igYMf
a1a2a3 [(ǫ1 · ǫ2)((p1 − p2) · ǫ3 +
+(ǫ1 · ǫ3)((p3 − p1) · ǫ2) + (ǫ2 · ǫ3)((p2 − p3) · ǫ1)] (222)
that is equal to the 3-gluon vertex that one obtains from the Yang-Mills action
LYM = −
F aαβF
a , F
αβ = ∂αA
β − ∂βAaα + gYMfabcAbαAcβ (223)
where
gYM = 2g(2α
4 (224)
The previous procedure can be extended to the scattering of N gluons finding
the same result that one gets from the tree diagrams of Yang-Mills theory.
In the next section, we will discuss the loop diagrams. Also, in this case one
finds that the h-loop diagrams involving N external gluons reproduces in the
zero slope limit the sum of the h-loop diagrams with N external gluons of
Yang-Mills theory.
We conclude this section mentioning that one can also take the zero slope
limit of a scattering amplitude involving three and four gravitons obtaining
agreement with what one gets from the Einstein Lagrangian of general rela-
tivity. This has been shown by Yoneya [43].
8 Loop diagrams
The N -point amplitude previously constructed satisfies all the axioms of S-
matrix theory except unitarity because its only singularities are simple poles
corresponding to zero width resonances lying on the real axis of the Mandel-
stam variables and does not contain the various cuts required by unitarity [1].
15 The determination of the previous normalization factors can be found in the
Appendix of Ref. [42].
40 Paolo Di Vecchia
In order to eliminate this problem it was proposed already in the early days of
dual theories to assume, in analogy with what happens for instance in pertur-
bative field theory, that the N -point amplitude was only the lowest order (the
tree diagram) of a perturbative expansion and, in order to implement unitar-
ity, it was necessary to include loop diagrams. Then, the one-loop diagrams
were constructed from the propagator and vertices that we have introduced
in the previous sections [44]. The planar one-loop amplitude with M external
particles was computed by starting from a (M + 2)-point tree amplitude and
then by sewing two external legs together after the insertion of a propagator
D given in Eq. (100). In this way one gets:
(2α′)d/2(2π)d
〈P, λ|V (1, p1)DV (1, p2) . . . V (1, pN)D|P, λ〉 (225)
where the sum over λ corresponds to the trace in the space of the harmonic os-
cillators and the integral in ddP corresponds to integrate over the momentum
circulating in the loop. The previous expression for the one-loop amplitude
cannot be quite correct because all states of the space generated by the oscil-
lators in Eq. (51) are circulating in the loop, while we know that we should
include only the physical ones. This was achieved first by cancelling by hand
the time and one of the space components of the harmonic oscillators reducing
the degrees of freedom of each oscillator from d to d − 2 as suggested by the
DDF operators at least for d = 26. This procedure was then shown to be cor-
rect by Brink and Olive [45]. They constructed the operator that projects over
the physical states and, by inserting it in the loop, showed that the reduction
of the degrees of freedom of the oscillators from d to d− 2 was indeed correct.
This was, at that time, the only procedure available to let only the physical
states circulate in the loop because the BRST procedure was discovered a bit
later also in the framework of the gauge field theories!
To be more explicit let us compute the trace in Eq. (225) adding also the
Chan-Paton factor. We get:
(2π)dδ(d)
NTr(λa1 . . . λaM )
(8π2α′)d/2
τd/2+1
[f1(k)]
12 (2π)M×
dνM−1 . . .
dν2 τ
eG(νji)
]2α′pi·pj
; k ≡ e−πτ(226)
where νji ≡ νj − νi,
G(ν) = log
ie−πν
2τ Θ1(iντ |iτ)
f31 (k)
; f1(k) = k
(1− k2n) (227)
The birth of string theory 41
Θ1(ν|iτ) = −2k1/4 sinπν
1− e2iπνk2n
1− e−2iπνk2n
(1− k2n)(228)
Finally the normalization factor N0 is given in Eq. (218). We have performed
the calculation for an arbitrary value of the space-time dimension d. However,
in this way one gets also the extra factor of k
12 appearing in the first line
of Eq. (226) that implies that our calculation is actually only consistent if
d = 26. In fact, the presence of this factor does not allow one to rewrite the
amplitude, originally obtained in the Reggeon sector, in the Pomeron sector
as explained below. In the following we neglect this extra factor, implicitly
assuming that d = 26, but, on the other hand, still keeping an arbitrary d.
Using the relations:
f1(k) =
tf1(q) ; Θ1(iντ |iτ) = iΘ1(ν|it)t1/2eπν
2/t (229)
where t = 1
and q ≡ e−πt, we can rewrite the one-loop planar diagram in the
Pomeron channel. We get:
(2π)dδ(d)
NTr(λa1 . . . λaM )
(8π2α′)d/2
dt[f1(q)]
2−d(2π)M×
dνM−1 . . .
−Θ1(νji|it)
f31 (q)
]2α′pi·pj
(230)
Notice that, by factorizing the planar loop in the Pomeron channel, one con-
structed for the first time what we now call the boundary state [46] 16. This
can be easily seen in the way that we are now going to describe. First of all,
notice that the last quantity in Eq. (230) can be written as follows:
Θ1(νji|it)
f31 (q)
]2α′pi·pj
−2 sin(πνji)
1− q2ne2πiνji
1− q2ne−2πiνji
(1− q2n)2
]2α′pi·pj
(231)
This equation can be rewritten as follows:
〈p = 0|q2R
i=1 : e
ipi·Q(e2iπνi ) : |p = 0〉
Tr (〈p = 0|q2N |p = 0〉)
; R =
na†n · an (232)
16 See also the first paper in Ref. [47].
42 Paolo Di Vecchia
where the trace is taken only over the non-zero modes and momentum con-
servation has been used. It must also be stressed that the normal ordering of
the vertex operators in the previous equation is such that the zero modes are
taken to be both in the same exponential instead of being ordered as in Eq.
(59). By bringing all annihilation operators on the left of the creation ones,
from the expression in Eq. (232) one gets (zi ≡ e2πiνi):
(2π)dδ(d)
(−2 sinπνji)2α
′pi·pj×
n=1 Tr
n·ane
2α′pj ·
znj e
2α′pi· an√
Tr (〈p = 0|q2N |p = 0〉)
(233)
The trace can be computed by using the completeness relation involving co-
herent states |f〉 = efa† |0〉:
e−|f |
|f〉〈f | = 1 (234)
Inserting the previous identity operator in Eq. (233) one gets after some cal-
culation:
(2π)dδ(d)
(−2 sinπνji)2α
′pi·pj×
i.j=1
−2α′pi·pje2πinνji q
n(1−q2n) (235)
Expanding the denominator in the last exponent and performing the sum over
n one gets:
(2π)dδ(d)
(−2 sinπνji)2α
′pi·pj×
2α′pi·pj
log(1−e2πiνji q2(m+1)) (236)
that is equal to the last line of Eq. (231) apart from the δ-function for mo-
mentum conservation. In conclusion, we have shown that Eq.s (231) and (232)
are equal.
Using Eq. (231) we can rewrite Eq. (230) as follows:
NNM0 Tr(λ
a1 . . . λaM )
(8π2α′)d/2
dt[f1(q)]
2−d(2πi)M
dνM−1 . . .
The birth of string theory 43
. . .
λ〈p = 0, λ|q2R
i=1 : e
ipi·Q(e2iπνi ) : |p = 0, λ〉
λ〈p = 0, λ|q2N |p = 0, λ〉
(237)
where the sum over any state |λ〉 corresponds to taking the trace over the
non-zero modes. If d = 26 we can rewrite Eq. (237) in a simpler form:
NNM0 Tr(λ
a1 . . . λaM )
(8π2α′)d/2
dt (2πi)M
dνM−1 . . .
〈p = 0, λ|q2R−2
: eipi·Q(e
2iπνi ) : |p = 0, λ〉 (238)
The previous equation contains the factor
dtq2R−2 that is like the propa-
gator of the Shapiro-Virasoro model, but with only one set of oscillators as
in the generalized Veneziano model. In the following we will rewrite it com-
pletely with the formalism of the Shapiro-Virasoro model. This can be done
by introducing the Pomeron propagator:
dt q2N−2 =
D̂ ; D̂ ≡ α
zL0−1z̄L̃0−1; |z| ≡ q = e−πt(239)
and rewriting the planar loop in the following compact form:
〈B0|D̂|BM 〉 ; |B0〉 ≡
n |p = 0, 0a, 0ã〉 (240)
where |B0〉 is the boundary state without any Reggeon on it,
Td−1 =
2(d−10)/4
α′)−d/2−1 (241)
and |BM 〉 is instead the one with M Reggeons given by:
|BM 〉 = NM0 Tr(λa1 . . . λaM )(2πi)M
dνM−1 . . .
: eipi·Q(e
2iπνi ) : |B0〉 (242)
We want to stress once more that the normal ordering in the previous equa-
tion is defined by taking the zero modes in the same exponential. Both the
boundary states and the propagator are now states of the Shapiro-Virasoro
model. This means that we have rewritten the one-loop planar diagram, where
the states of the generalized Veneziano model circulate in the loop, as a tree
44 Paolo Di Vecchia
diagram of the Shapiro-Virasoro model involving two boundary states and a
propagator. This is what nowadays is called open/closed string duality.
Besides the one-loop planar diagram in Eq. (225), that is nowadays called
the annulus diagram, also the non-planar and the non-orientable diagrams
were constructed and studied. In particular the non-planar one, that is ob-
tained as the planar one in Eq. (225) but with two propagators multiplied
with the twist operator
Ω = eL−1(−1)R , (243)
had unitarity violating cuts that disappeared [27] if the dimension of the
space-time d = 26, leaving behind additional pole singularities. The explicit
form of the non-planar loop can be obtained following the same steps done
for the planar loop. One gets for the non-planar loop the following amplitude:
〈BR|D̂|BM 〉 (244)
where now both boundary states contain, respectively, R and M Reggeon
states. The additional poles found in the non-planar loop were called Pomerons
because they occur in the Pomeron sector, that today is called the closed string
channel, to distinguish them from the Reggeons that instead occur in the
Reggeon sector, that today is called the open string sector of the planar and
non-planar loop diagrams. At that time in fact, the states of the generalized
Veneziano models were called Reggeons, while the additional ones appearing
in the non-planar loop were called Pomerons. The Reggeons correspond nowa-
days to open string states, while the Pomerons to closed string states. These
things are obvious now, but at that time it took a while to show that the
additional states appearing in the Pomeron sector have to be identified with
those of the Shapiro-Virasoro model. The proof that the spectrum was the
same came rather early. This was obtained by factorizing the non-planar dia-
gram in the Pomeron channel [46] as we have done in Eq. (244). It was found
that the states of the Pomeron channel lie on a linear Regge trajectory that
has double intercept and half slope of the one of the Reggeons. This follows
immediately from the propagator D̂ in Eq. (239) that has poles for values of
the momentum of the Pomeron exchanged given by:
p2 = 2n (245)
that are exactly the values of the masses of the states of the Shapiro-Virasoro
model [48], while the Reggeon propagator in Eq. (100) has poles for values of
momentum equal to:
1− α′p2 = n (246)
However, it was still not clear that the Pomeron states interact among them-
selves as the states of the Shapiro-Virasoro model. To show this it was first
The birth of string theory 45
necessary to construct tree amplitudes containing both states of the general-
ized Veneziano model and of the Shapiro-Virasoro model [49]. They reduced
to the amplitudes of the generalized Veneziano (Shapiro-Virasoro) model if
we have only external states of the generalized Veneziano (Shapiro-Virasoro)
model. Those amplitudes are called today disk amplitudes containing both
open and closed string states. They were constructed [49] by using for the
Reggeon states the vertex operators that we have discussed in Sect. (5) in-
volving one set of harmonic oscillators and for the Pomeron states the vertex
operators given in Eq. (181) that we rewrite here:
Vα,β(z, z̄, p) = Vα(z,
)Vβ(z̄,
) (247)
because now both component vertices contain the same set of harmonic os-
cillators as in the generalized Veneziano model. Furthermore, each of the two
vertices is separately normal ordered, but their product is nor normal ordered.
The amplitude involving both kinds of states is then constructed by taking
the product of all vertices between the projective invariant vacuum and inte-
grating the Reggeons on the real axis in an ordered way and the Pomerons in
the upper half plane, as one does for a disk amplitude.
We have mentioned above that the two vertices are separately normal
ordered, but their product is not normal ordered. When we normal order
them we get, for instance for the tachyon of the Pomeron sector, a factor
(z − z̄)α′p2/2 that describes the Reggeon-Pomeron transition. This implies a
direct coupling [51] between the U(1) part of gauge field and the two-index
antisymmetric field Bµν , called Kalb-Ramond field [50], of the Pomeron sector,
that makes the gauge field massive [51].
It was then shown that, by factorizing the non-planal loop in the Pomeron
channel, one reproduced the scattering amplitude containing one state of
the Shapiro-Virasoro and a number of states of the generalized Veneziano
model [52]. If we have also external states belonging to the generalized
Shapiro-Virasoro model, then by factorizing the non-planar one loop ampli-
tude in the pure Pomeron channel, one would obtain the tree amplitudes of
the Shapiro-Virasoro model [52].
All this implies that the generalized Veneziano model and the Shapiro-
Virasoro model are not two independent models, but they are part of the
same and unique model. In fact, if one started with the generalized Veneziano
model and added loop diagrams to implement unitarity, one found the ap-
pearence in the non-planar loop of additional states that had the same mass
and interaction of those of the Shapiro-Virasoro model.
The planar diagram, written in Eq. (230) in the closed string channel, is
divergent for large values of t. This divergence was recognized to be due to
exchange, in the Pomeron channel, of the tachyon of the Shapiro-Virasoro
model and of the dilaton [47]. They correspond, respectively, to the first two
terms of the expansion:
[f1(q)]
−24 = e2πt + 24 +O
e−2πt
(248)
46 Paolo Di Vecchia
The first one could be cancelled by an analytic continuation, while the second
one could be eliminated through a renormalization of the slope of the Regge
trajectory α′ [47].
We conclude the discussion of the one-loop diagrams by mentioning
that the one-loop diagram for the Shapiro-Virasoro model was computed by
Shapiro [53] who also found that the integrand was modular invariant.
The computation of multiloop diagrams requires a more advanced tech-
nology that was also developed in the early days of the dual resonance model
few years before the discovery of its connection to string theory. In order to
compute multiloop diagrams one needs first to construct an object that was
called the N -Reggeon vertex and that has the properties of containing N sets
of harmonic oscillators, one for each external leg, and is such that, when we
saturate it with N physical states, we get the corresponding N -point ampli-
tude. In the following we will discuss how to determine the N -Reggeon vertex.
The first step toward the N -Reggeon vertex is the Sciuto-Della Selva-
Saito [54] vertex that includes two sets of harmonic oscillators that we denote
with the indices 1 and 2. It is equal to:
VSDS = 2〈x = 0, 0| : exp
dzX ′2(z) ·X1(1− z)
: (249)
where X is the quantity that we have called Q in Eq. (57) and the prime
denotes a derivative with respect to z. It satisfies the important property
of giving the vertex operator Vα(z = 1) of an arbitrary state |α〉 when we
saturate it with the corresponding state:
VSDS |α〉2 = Vα(z = 1) (250)
A shortcoming of this vertex is that it is not invariant under a cyclic permu-
tation of the three legs. A cyclic symmetric vertex has been constructed by
Caneschi, Schwimmer and Veneziano [55] by inserting the twist operator in
Eq. (243). But the 3-Reggeon vertex is not enough if we want to compute an
arbitrary multiloop amplitude. We must generalize it to an arbitrary number
of external legs. Such a vertex, that can be obtained from the one in Eq. (249)
with a very direct procedure, or that can also be obtained by sewing together
three-Reggeon vertices, has been written in its final form by Lovelace [56] 17.
Here we do not derive it, but we give directly its expression written in Ref. [56]:
VN,0 =
i=1 dzi
dVabc
i=1[V
i (0)]
[i<x = 0, Oa|] δ(
i,j=1
n,m=0
a(i)n Dnm(ΓV
i Vj) a
(251)
17 See also Ref. [57]. Earlier papers on the N-Reggeon can be found in Ref.s [58].
The birth of string theory 47
where a
0 ≡ αi0 =
2α′p̂i is the momentum of particle i and the infinite
matrix:
Dnm(γ) =
∂mz [γ(z)]
n|z=0 ; n,m = 1.. : D00(γ) = − log |
AD −BC
Dn0 =
)n ; D0n =
)n ; γ(z) =
Az +B
Cz +D
(252)
is a ”representation” of the projective group corresponding to the conformal
weight ∆ = 0, that satisfies the eqs.:
Dnm(γ1γ2) =
Dnl(γ1)Dlm(γ2) +Dn0(γ1)δ0m +D0m(γ2)δn0 (253)
Dnm(γ) = Dmn(Γγ
−1Γ ) Γ (z) =
(254)
Finally Vi is a projective transformation that maps 0, 1 and ∞ into zi−1, zi
and zi+1.
The previous vertex can be written in a more elegant form as follows:
VN,0 =
i=1 dzi
dVabc
i=1[V
i (0)]
[i<x = 0, Oa|] δ(
dz∂X(i)(z)p̂i logV
i (z)
i,j=1
dy∂X(i)(z) log[Vi(z)− Vj(y)]∂X(j)(y)
(255)
where the quantities X(i) are what we called Q, namely the Fubini-Veneziano
field, in the previous sections. The N -Reggeon vertex that satisfies the impor-
tant property of giving the scattering amplitude of N physical particle when
we saturate it with their corresponding states, is the fundamental object for
computing the multiloop amplitudes. In fact, if we want to compute a M -loop
amplitude withN external states, we need to start from the (N+2M)-Reggeon
vertex and then we have to sew the M pairs together after having inserted a
propagator D. In this way we obtain an amplitude that is not only integrated
over the punctures zi (i = 1 . . . N) of the N external states, but also over
the additional 3h− 3 moduli corresponding to the punctures variables of the
48 Paolo Di Vecchia
states that we sew together and the integration variable of the M propaga-
tors. h is the number of loops. The multiloop amplitudes have been obtained
in this way already in 1970 [59, 60, 61] and, through the sewing procedure,
one obtained functions, as the period matrix, the abelian differentials, the
prime form, etc., that are well defined on Riemann surface! The only thing
that was missing, was the correct measure of integrations over the 3h−3 vari-
ables because it was technically not possible to let only the physical states to
circulate in the loops. This problem was solved only much later [62, 63] when
a BRST invariant formulation of string theory and the light-cone functional
integral could be used for computing multiloops. They are two very different
approaches that, however, gave the same result. For the sake of completeness
we write here the planar h-loop amplitude involving M tachyons:
M (p1, . . . , pM ) = N
h Tr(λa1 · · ·λaM ) Ch
2gs (2α
(d−2)/4
[dm]Mh
G(h)(zi, zj)
V ′i (0)V
j (0)
2α′pi·pj
, (256)
where Nh Tr(λa1 · · ·λaM ) is the appropriate U(N) Chan-Paton factor, g is
the dimensionless open string coupling constant, Ch is a normalization factor
given by
(2π)dh
g2h−2s
(2α′)d/2
, (257)
and G(h) is the h-loop bosonic Green function
G(h)(zi, zj) = logE(h)(zi, zj)−
ωµ (2πImτµν)
ων , (258)
with E(h)(zi, zj) being the prime form, ω
µ (µ = 1, . . . , h) the abelian differen-
tials and τµν the period matrix. All these objects, as well as the measure on
moduli space [dm]Mh , can be explicitly written in the Schottky parametrization
of the Riemann surface, and their expressions for arbitrary h can be found for
example in Ref. [64]. It is given by
[dm]Mh =
dVabc
V ′i (0)
dkµ dξµ dηµ
k2µ (ξµ − ηµ)2
(1− kµ)2
(259)
× [det (−iτµν)]−d/2
(1− knα)−d
(1− knα)2
where kµ are the multipliers, ξµ and ηµ are the fixed points of the generators
of the Schottky group,
The birth of string theory 49
9 From dual models to string theory
The approach presented in the previous sections is a real bottom-up ap-
proach. The experimental data were the driving force in the construction of
the Veneziano model and of its generalization to N external legs. The rest of
the work that we have described above consisted in deriving its properties. The
result is, except for a tachyon, a fully consistent quantum-relativistic model
that was a source of fascination for those who worked in the field. Although
the model grew out of S-matrix theory where the scattering amplitude is the
only observable object, while the action or the Lagrangian have not a central
role, some people nevertheless started to investigate what was the underly-
ing microscopic structure that gave rise to such a consistent and beautiful
model. It turned out, as we know today, that this underlying structure is that
of a quantum-relativistic string. However, the process of connecting the dual
resonance model (actually two of them the generalized Veneziano and the
Shapiro-Virasoro model) to string theory took several years from the origi-
nal idea to a complete and convincing proof of the conjecture. The original
conjecture was independently formulated by Nambu [20, 65], Nielsen [66] and
Susskind [21] 18. If we look at it in retrospective, it was at that time a fantastic
idea that shows the enormous physical intuition of those who formulated it.
On the other hand, it took several years to digest it before one was able to
derive from it all the deep features of the dual resonance model. Because of
this, the idea that the underlying structure was that of a relativistic string,
did not really influence most of the research in the field up to 1973. Let me
try to explain why.
A common feature of the work of Ref.s [20, 66, 21] is the suggestion that
the infinite number of oscillators, that one got through the factorization of
the dual resonance model, naturally comes out from a two-dimensional free
Lagrangian for the coordinate Xµ(τ, σ) of a one-dimensional string, that is an
obvious generalization of the Lagrangian that one writes for the coordinate
Xµ(τ) of a pointlike object in the proper-time gauge:
=⇒ L ∼
(260)
Being this theory conformal invariant the Virasoro operators were also con-
structed together with their algebra. In this very first formulation, however,
the Virasoro generators Ln were just the generators associated to the confor-
mal symmetry of the string world-sheet Lagrangian given in Eq. (260) as in
any conformal field theory. It was not clear at all why they should imply the
gauge conditions found by Virasoro or, in modern terms, why they should be
zero classically. The basic ingredient to solve this problem was provided by
Nambu [65] and Goto [68] who wrote the non-linear Lagrangian proportional
18 See also Ref. [67].
50 Paolo Di Vecchia
to the area spanned by the string in the external target space. They proceeded
in analogy with the point particle and wrote the following action:
−dσµνdσµν (261)
where
dσµν =
dζα ∧ dζβ = ∂Xµ
ǫαβdσdτ (262)
Xµ(σ, τ) is the string coordinate and ζ
0 = τ and ζ1 = σ are the coordinates of
the string worldsheet. ǫαβ is an antisymmetric tensor with ǫ01 = 1. Inserting
eq. (262) in (261) and fixing the proportionality constant one gets the Nambu-
Goto action [65, 68]:
S = −cT
(Ẋ ·X ′)2 − Ẋ2X ′2 (263)
where
Ẋµ ≡ ∂X
µ ≡ ∂X
(264)
and T ≡ 1
is the string tension, that replaces the mass appearing in the
case of a point particle. In this formulation, the string Lagrangian is invariant
under any reparametrization of the world-sheet coordinates σ and τ and not
only under the conformal transformations. This, in fact, implies that the two-
dimensional world-sheet energy-momentum tensor of the string is actually zero
as we will show later on. But it took still a few years to connect the Nambu-
Goto action to the properties of the dual resonance model. In the meantime
an analogue model was formulated [69] that reproduced the tree and loop
amplitudes of the generalized Veneziano model. This approach anticipated
by several years the path integral derivation of dual amplitudes. It was very
closely related to the functional integral formulation of Ref.s [70].
However, one needed to wait until 1973 with the paper of Goddard,
Goldstone, Rebbi and Thorn [71], where the Nambu-Goto action was cor-
rectly treated, all its consequences were derived and it became completely
clear that the structure underlying the dual resonance model was that of a
quantum-relativistic string. The equation of motion for the string were de-
rived from the action in Eq. (263) by imposing δS = 0 for variations such
that δXµ(τi) = δX
µ(τf ) = 0. One gets:
∂X ′µ
δXµ +
∂X ′µ
δXµ|σ=πσ=0
(265)
where L is the Lagrangian in Eq. (263). Since δXµ is arbitrary, from eq. (265)
one gets the Euler-Lagrange equation of motion
The birth of string theory 51
∂X ′µ
= 0 (266)
and the boundary conditions
∂X ′µ
= 0 or δXµ = 0 at σ = 0, π (267)
for an open string and
Xµ(τ, 0) = Xµ(τ, π) (268)
for a closed string. In the case of an open string, the first kind of boundary
condition in Eq.(267) corresponds to Neumann boundary conditions, while
the second one to Dirichlet boundary conditions. Only the Neumann bound-
ary conditions preserve the translation invariance of the theory and, there-
fore, they were mostly used in the early days of string theory. It must be
stressed, however, that Dirichlet boundary conditions were already discussed
and used in the early days of string theory for constructing models with off-
shell states [72].
From Eq. (263) one can compute the momentum density along the string:
≡ Pµ = cT
′2 −X ′µ(Ẋ ·X ′)
(Ẋ ·X ′)2 − Ẋ2X ′2
(269)
and obtain the following constraints between the dynamical variables Xµ and
c2T 2x′
+ P 2 = x′ · P = 0 (270)
They are a consequence of the reparametrization invariance of the string La-
grangian. Because of this one can choose the orthonormal gauge specified by
the conditions:
Ẋ2 +X ′
= Ẋ ·X ′ = 0 (271)
that nowadays is called conformal gauge. In this gauge eq. (269) becomes:
Pµ = cT Ẋµ
∂X ′µ
= −cTX ′µ (272)
and therefore the eq. of motion in eq.(266) becomes:
Ẍµ −X ′′µ = 0 (273)
while the boundary condition in eq.(267) becomes:
X ′µ(σ = 0, π) = 0 (274)
52 Paolo Di Vecchia
The most general solution of the eq. of motion and of the boundary conditions
can be written as follows:
Xµ(τ, σ) = qµ + 2α′pµτ + i
[aµne
−inτ − a+µn einτ ]
cosnσ√
(275)
for an open string and
Xµ(τ, σ) = qµ + 2α′pµτ +
[ãµne
−2in(τ+σ) − ã+µn e2in(τ+σ)]
[aµne
−2in(τ−σ) − a+µn e2in(τ−σ)]
(276)
for a closed string. This procedure really shows that, starting from the
Nambu-Goto action, one can choose a gauge (the orthonormal or confor-
mal gauge) where the equation of motion of the string becomes the two-
dimensional D’Alembert equation in Eq. (273). Furthermore, the invariance
under reparametrization of the Nambu-Goto action implies that the two-
dimensional energy-momentum tensor is identically zero at the classical level
(See Eq. (271)).
As the Lorentz gauge in QED the orthonormal gauge does not fix com-
pletely the gauge. We can still perform reparametrizations that leave in the
conformal gauge: they are conformal transformatiuons. Introducing the vari-
able z = eiτ the generators of the conformal transformations for the open
string can be written as follows:
dzzn+1
αn−m · αm = 0 (277)
where
αµn =
naµn if n > 0√
2α′pµ if n = 0√
na†µn if n < 0
(278)
They are zero as a consequence of Eq.s (270) that in the conformal gauge
become Eq.s (271). In the case of a closed string we get instead:
L̃n =
dzzn+1
= 0 (279)
dz̄z̄n+1
= 0 (280)
The birth of string theory 53
In terms of the harmonic oscillators introduced in eq. (276) we get
αm · αn−m = 0 ; L̃n =
α̃m · α̃n−m = 0 (281)
where for the non-zero modes we have used the convention in (278), while the
zero mode is given by:
0 = α̃
(282)
In conclusion, the fact that we have reparametrization invariance implies that
the Virasoro generators are classically identically zero. When we quantize the
theory one cannot and also does not need to impose that they are vanishing at
the operator level. They are imposed as conditions characterizing the physical
states.
〈Phys′|Ln|Phys〉 = 〈Phys′|(L0 − 1)|Phys〉 = 0 ; n 6= 0 (283)
These equations are satisfied if we require:
Ln|Phys >= (L0 − 1)|Phys >= 0 (284)
The extra factor −1 in the previous equations comes from the normal ordering
as explained in Eq. (198).
The authors of Ref. [71] further specified the gauge by fixing it completely.
They introduced the light-cone gauge specified by imposing the condition:
X+ = 2α′p+τ (285)
where
X0 ±Xd−1√
X0 ±Xd−1√
(286)
In this gauge the only physical degrees of freedom are the transverse ones.
In fact the components along the directions 0 and d − 1 can be expressed in
terms of the transverse ones by inserting Eq. (285) in the constraints in Eq.
(271) and getting:
Ẋ− =
4α′p+
(Ẋ2i +X
i ) X
2α′p+
Ẋi ·X ′i (287)
that up to a constant of integration determine completely X− as a function
of X i. In terms of oscillators we get
α+n = 0 ;
2α′α−n =
αin−mα
m n 6= 0 (288)
54 Paolo Di Vecchia
for an open string and
α+n = α̃
n = 0 n 6= 0 (289)
together with
2α′α−n =
αin−mα
2α′α̃−n =
α̃in−mα̃
m (290)
in the case of a closed string.
This shows that the physical states are described only by the transverse
oscillators having only d − 2 components. Those transverse oscillators corre-
spond to the transverse DDF operators that we have discussed in Section 6.
The authors of Ref. [71] also constructed the Lorentz generators only in terms
of the transverse oscillators and they showed that they satisfy the correct
Lorentz algebra only if the space-time dimension is d = 26. In this way the
spectrum of the dual resonance model was completely reproduced starting
from the Nambu-Goto action if d = 26! On the other hand, the choice of
d = 26 is a necessity if we want to keep Lorentz invariance!
Immediately after this, the interaction was also included either by adding
a term describing the interaction of the string with an external gauge field [73]
or by using a functional formalism [74, 75].
In the following we will give some detail only of the first approach for the
case of an open string. A way to describe the string interaction is by adding
to the free string action an additional term that describes the interaction of
the string with an external field.
SINT =
dDyΦL(y)JL(y) (291)
where ΦL(y) is the external field and JL is the current generated by the string.
The index L stands for possible Lorentz indices that are saturated in order to
have a Lorentz invariant action.
In the case of a point particle, such an interaction term will not give any
information on the self-interaction of a particle.
In the case of a string, instead, we will see that SINT will describe the
interaction among strings because the external fields that can consistently
interact with a string are only those that correspond to the various states of
the string, as it will become clear in the discussion below.
This is a consequence of the fact that, for the sake of consistency, we must
put the following restrictions on SINT :
• It must be a well defined operator in the space spanned by the string
oscillators.
The birth of string theory 55
• It must preserve the invariances of the free string theory. In particular, in
the ”conformal gauge” it must be conformal invariant.
• In the case of an open string, the interaction occurs at the end point of
a string (say at σ = 0). This follows from the fact that two open strings
interact attaching to each other at the end points.
The simplest scalar current generated by the motion of a string can be written
as follows
J(y) =
dσδ(σ)δ(d)[yµ − xµ(τ, σ)] (292)
where δ(σ) has been introduced because the interaction occurs at the end of
the string. For the sake of simplicity we omit to write a coupling constant g
in (292).
Inserting (292) in (291) and using for the scalar external field Φ(y) = eik·y
a plane wave, we get the following interaction:
SINT =
dτ : eik·X(τ,0) : (293)
where the normal ordering has been introduced in order to have a well defined
operator. The invariance of (293) under a conformal transformation τ → w(τ)
requires the following identity:
SINT =
dτ : eik·X(τ,0) : =
dw : eik·X(w,0) : (294)
or, in other words, that
: eik·X(τ,0) :=⇒ w′(τ) : eik·X(w,0) : (295)
This means that the integrand in Eq. (294) must be a conformal field with
conformal dimension equal to one and this happens only if α′k2 = 1. The
external field corresponds then to the tachyonic lowest state of the open string.
Another simple current generated by the string is given by:
Jµ(y) =
dσδ(σ)Ẋµ(τ, σ)δ
(d)(y −X(τ, σ)) (296)
Inserting (296) in (291) we get
SINT =
dτẊµ(τ, 0)ǫ
µeik·X(τ,0) (297)
if we use a plane wave for Φµ(y) = ǫµe
ik·y. The vertex operator in eq. (297)
is conformal invariant only if
k2 = ǫ · k = 0 (298)
56 Paolo Di Vecchia
and, therefore, the external vector must be the massless photon state of the
string. We can generalize this procedure to an arbitrary external field and
the result is that we can only use external fields that correspond to on shell
physical states of the string.
This procedure has been extended in Ref. [73] to the case of external
gravitons by introducing in the Nambu-Goto action a target space metric and
obtaining the vertex operator for the graviton that is a massless state in the
closed string theory. Remember that, at that time, this could have been done
only with the Nambu-Goto action because the σ-model action was introduced
only in 1976 first for the point particle [76] and then for the string [77]. As
in the case of the photon it turned out that the external field corresponding
to the graviton was required to be on shell. This condition is the precursor of
the equations of motion that one obtains from the σ-model action requiring
the vanishing of the β-function [78].
One can then compute the probability amplitude for the emission of a
number of string states corresponding to the various external fields, from an
initial string state to a final one. This amplitude gives precisely the N -point
amplitude that we discussed in the previous sections [73]. In particular, one
learns that, in the case of the open string, the Fubini-Veneziano field is just
the string coordinate computed at σ = 0:
Qµ(z) ≡ Xµ(z, σ = 0) ; z = eiτ (299)
In the case of a closed string we get instead:
Qµ(z, z̄) ≡ Xµ(z, z̄) ; z = e2i(τ−σ) , z̄ = e2i(τ+σ) (300)
Finally, let me mention that with the functional approach Mandelstam [74]
and Cremmer and Gervais [79] computed the interaction between three arbi-
trary physical string states and reproduced in this way the coupling of three
DDF states given in Eq. (202) and obtained in Ref. [37] by using the operator
formalism. At this point it was completely clear that the structure underlying
the generalized Veneziano model was that of an open relativistic string, while
that underlying the Shapiro-Virasoro model was that of a closed relativistic
string. Furthermore, these two theories are not independent because, if one
starts from an open string theory, one gets automatically closed strings by
loop corrections.
10 Conclusions
In this contribution, we have gone through the developments that led from
the construction of the dual resonance model to the bosonic string theory
trying as much as possible to include all the necessary technical details. This
is because we believe that they are not only important from an historical point
of view, but are also still part of the formalism that one uses today in many
The birth of string theory 57
string calculations. We have tried to be as complete and objective as possible,
but it could very well be that some of those who participated in the research
of these years, will not agree with some or even many of the statements we
made. We apologize to those we have forgotten to mention or we have not
mentioned as they would have liked.
Finally, after having gone through the developments of these years, my
thoughts go to Sergio Fubini who shared with me and Gabriele many of the
ideas described here and who is deeply missed, and to my friends from Flo-
rence, Naples and Turin for a pleasant collaboration in many papers discussed
here.
Acknowledgments
I thank R. Marotta and I. Pesando for a critical reading of the manuscript.
References
1. G.F. Chew, The analytic S matrix, W.A.Benjamin, Inc. (1966).
R.J. Eden, P.V. Landshoff, D.I. Olive and J.C. Polkinghorne, The analytic S
matrix, Cambridge University Press (1966).
2. R. Dolen, D. Horn and C. Schmid, Phys. Rev. 166, 1768 (1968).
C. Schmid, Phys. Rev. Letters 20, 689 (1968).
3. H. Harari, Phys. Rev. Letters 22, 562 (1969).
J.L. Rosner, Phys. Rev. Letters 22, 689 (1969).
4. G. Veneziano, Nuovo Cimento A 57, 190 (1968).
5. M. A. Virasoro, Phys. Rev. 177, 2309 (1969).
6. M.A. Virasoro, Phys. Rev. D 1, 2933 (1970).
7. A. Neveu and J.H. Schwarz, Nucl. Phys. B 31, 86 (1971) and
Phys. Rev. D 4, 1109 (1971).
8. P. Ramond, Phys. Rev. D 3, 2415 (1971).
9. C. Lovelace, Phys. Lett. B 28, 265 (1968).
J. Shapiro, Phys. Rev. 179, 1345 (1969).
10. P.H. Frampton, Phys. Lett. B 41, 364 (1972).
11. V. Alessandrini, D. Amati, M. Le Bellac and D. Olive, Phys. Rep. C 1, 269
(1971).
G. Veneziano, Phys. Rep. C 9, 199 (1974).
S. Mandelstam, Phys. Rep. C 13, 259 (1974).
C. Rebbi, Phys. Rep. C 12, 1 (1974).
J. Scherk, Rev. Mod. Phys. 47, 123 (1975).
12. F. Gliozzi, Lett. Nuovo Cimento 2, 1160 (1970).
13. K. Bardakçi and H. Ruegg, Phys. Rev. 181, 1884 (1969).
C.G. Goebel and B. Sakita, Phys. Rev. Letters 22, 257 (1969).
Chan Hong-Mo and T.S. Tsun, Phys. Lett. B 28, 485 (1969).
Z. Koba and H.B.Nielsen, Nucl. Phys. B 10, 633 (1969).
14. K. Bardakçi and H. Ruegg, Phys. Lett.B 28, 671 (1969).
M.A. Virasoro, Phys. Rev. Lett. 22, 37 (1969).
58 Paolo Di Vecchia
15. Z. Koba and H.B.Nielsen, Nucl. Phys. B 12, 517 (1969).
16. H. M. Chan and J.E. Paton, Nucl. Phys. B 10, 516 (1969).
17. S. Fubini and G. Veneziano, Nuovo Cimento A 64, 811 (1969).
18. Bardakçi and S. Mandelstam, Phys. Rev. 184, 1640 (1969).
19. S. Fubini, D. Gordon and G. Veneziano, Phys. Lett. B 29, 679 (1969)
20. Y. Nambu, Proc. Int. Conf. on Symmetries and Quark Models, Wayne State
University 1969 (Gordon and Breach, 1970) p. 269.
21. L. Susskind, Nuovo Cimento A 69, 457 (1970) and Phys. Rev. Letter 23, 545
(1969).
22. J. Shapiro, Phys. Lett. B 33, 361 (1970).
23. S. Fubini and G. Veneziano, Nuovo Cimento A 67, 29 (1970).
24. F. Gliozzi, Lettere al Nuovo Cimento 2, 846 (1969).
25. C.B. Chiu, S. Matsuda and C. Rebbi, Phys. Rev. Lett. 23, 1526 (1969).
C.B. Thorn, Phys. Rev. D 1, 1963 (1970).
26. S. Fubini and G. Veneziano, Annals of Physics 63, 12 (1971).
27. C. Lovelace, Phys. Lett. B 34, 500 (1971).
28. E. Del Giudice and P. Di Vecchia, Nuovo Cimento A 5, 90 (1971).
M. Yoshimura, Phys. Lett. B 34, 79 (1971).
29. E. Del Giudice and P. Di Vecchia, Nuovo Cimento A 70, 579 (1970).
30. P. Campagna, S. Fubini, E Napolitano and S. Sciuto, Nuovo Cimento A 2, 911
(1971).
31. E. Del Giudice, P. Di Vecchia and S. Fubini, Annals of Physics, 70, 378 (1972).
32. R.C. Brower, Phys. Rev. D 6, 1655 (1972).
33. P. Goddard and C.B. Thorn, Phys. Lett. B 40, 235 (1972).
34. F. Gliozzi, J. Scherk and D. Olive, Phys. Lett. B 65, 282 (1976) ; Nucl. Phys.
B 122, 253 (1977).
35. L. Brink and H.B. Nielsen, Phys. Lett. B 45, 332 (1973).
36. F. Gliozzi, unpublished.
See also P. Di Vecchia in Many Degrees of Freedom in Particle Physics, Edited
by H. Satz, Plenum Publishing Corporation, 1978, p. 493.
37. M. Ademollo, E. Del Giudice, P. Di Vecchia and S. Fubini, Nuovo Cimento A
19, 181 (1974).
38. J. Scherk, Nucl. Phys. B 31, 222 (1971).
39. N. Nakanishi, Prog. Theor. Phys. 48, 355 (1972).
P.H. Frampton and K.C. Wali, Phys. Rev. D 8, 1879 (1973).
40. A. Neveu and J. Scherk, Nucl. Phys.B 36, 155 (1973).
41. A. Neveu and J.L. Gervais, Nucl. Phys. B 46, 381 (1972).
42. P. Di Vecchia, A. Lerda, L. Magnea, R. Marotta and R. Russo, Nucl. Phys. B
469, 235 (1996)
43. T. Yoneya, Prog. of Theor. Phys. 51, 1907 (1974).
44. K. Kikkawa, B. Sakita and M. Virasoro, Phys. Rev. 184, 1701 (1969).
K. Bardakçi, M.B. Halpern and J. Shapiro, Phys. Rev. 185, 1910 (1969).
D. Amati, C. Bouchiat and J.L. Gervais, Lett. al Nuovo Cimento 2, 399 (1969).
A. Neveu and J. Scherk, Phys. Rev. D 1, 2355 (1970).
G. Frye and L. Susskind, Phys. Lett. B 31, 537 (1970).
D.J. Gross, A. Neveu, J. Scherk and J.H. Schwarz, Phys. Rev. D 2, 697 (1970).
45. L. Brink and D. Olive, Nucl. Phys. B 56, 253 (1973) and Nucl. Phys. B 58, 237
(1973).
The birth of string theory 59
46. E. Cremmer and J. Scherk, Nucl. Phys. B 50, 222 (1972).
L. Clavelli and J. Shapiro, Nucl. Phys. B 57, 490 (1973).
L. Brink, D.I. Olive and J. Scherk, Nucl. Phys. B 61, 173 (1973).
47. M. Ademollo, A. D’Adda, R. D’Auria, F. Gliozzi, E. Napolitano, S. Sciuto and
P. Di Vecchia, Nucl. Phys. B 94, 221 (1975).
J. Shapiro, Phys. Rev. D 11, 2937 (1975).
48. D.I.Olive and J. Scherk, Phys. Lett. B 44, 296 (1973).
49. M. Ademollo, A. D’Adda, R. D’Auria, E. Napolitano, P. Di Vecchia, F. Gliozzi
and S. Sciuto, Nucl. Phys. B 77, 189 (1974).
50. M. Kalb and P. Ramond, Phys. Rev. D 9, 2273 (1974).
51. E, Cremmer and J. Scherk, Nucl. Phys. B 72, 117 (1974).
52. A. D’Adda, R. D’Auria, E. Napolitano, P. Di Vecchia, F. Gliozzi and S. Sciuto,
Phys. Lett. B 68, 81 (1977).
53. J. Shapiro, Phys. Rev. D 5, 1945 (1975).
54. S. Sciuto, Lettere al Nuovo Cimento 2, 411 (1969).
A. Della Selva and S. Saito, Lett. al Nuovo Cimento 4, 689 (1970).
55. L. Caneschi, A. Schwimmer and G. Veneziano, Phys. Lett.B 30, 356 (1969).
L. Caneschi and A. Schwimmer, Lettere al Nuovo Cimento 3, 213 (1970).
56. C. Lovelace, Phys. Lett. B 32, 490 (1970).
57. D.I. Olive, Nuovo Cimento A 3, 399 (1971).
58. I. Drummond, Nuovo Cimento A 67, 71 (1970).
G. Carbone and S. Sciuto, Lett. Nuovo Cimento 3, 246 (1970).
L. Kosterlitz and D. Wray, Lett. al Nuovo Cimento 3, 491 (1970).
D. Collop, Nuovo Cimento A 1, 217 (1971).
L.P. Yu, Phys. Rev. D 2, 1010 (!970); Phys. Rev. D 2, 2256 (!970).
E. Corrigan and C. Montonen, Nucl. Phys. B 36, 58 (1972).
J.L. Gervais and B. Sakita, Phys. Rev. D 4, 2291 (1971).
59. C. Lovelace, Phys. Lett. B 32, 703 (1970).
60. V. Alessandrini, Nuovo Cimento A 2, 321 (1971).
61. D. Amati and V. Alessandrini, Nuovo Cimento A 4, 793 (1971).
62. P. Di Vecchia, M. Frau, A. Lerda and S. Sciuto, Phys. Lett. B 199, 49 (1987).
J.L. Petersen and J. Sidenius, Nucl. Phys. B 301, 247 (1988).
63. S. Mandelstam, In “Unified String Theories”, edited by M. Green and D. Gross,
World Scientific, p. 46.
64. P. Di Vecchia, F. Pezzella, M. Frau, K. Hornfeck, A. Lerda and S. Sciuto, Nucl.
Phys. B 322, 317 (1989).
65. Y. Nambu, Lectures at the Copenhagen Symposium, 1970, unpublished.
66. H. B. Nielsen, Paper submitted to the 15th Int. Conf. on High Energy Physics,
Kiev, 1970 and Nordita preprint (1969).
67. T. Takabayasi, Progr. Theor. Phys. 44 (1970) 1117.
O. Hara, Progr. Theor. Phys. 46, 1549 (1971).
L.N. Chang and J. Mansouri, Phys. Rev. D 5, 2535 (1972).
J. Mansouri and Y. Nambu, Phys. Lett. B 39, 357 (1972).
M. Minami, Prog. Theor. Phys. 48, 1308 (1972).
68. T. Goto, Progr. Theor. Phys. 46 (1971) 1560.
69. D. Fairlie and H.B. Nielsen, Nucl. Phys. B 20, 637 (1970) and 22, 525 (1970).
70. C.S. Hsue, B. Sakita and M.A. Virasoro, Phys. Rev. 2, 2857 (1970).
J.L. Gervais and B. Sakita, Phys. Rev. D 4, 2291 (1971).
71. P. Goddard, J. Goldstone, C. Rebbi and C. Thorn, Nucl. Phys. B 56, 109
(1973).
60 Paolo Di Vecchia
72. E.F. Corrigan and D.B. Fairlie, Nucl. Phys. B 91, 527 (1975).
73. M. Ademollo, A. D’Adda, R. D’Auria, P. Di Vecchia, F. Gliozzi, R. Musto, E.
Napolitano, F. Nicodemi and S. Sciuto, Nuovo Cimento A 21, 77 (1974).
74. S. Mandelstam, Nucl. Phys. B 64, 205 (1973).
75. J.L. Gervais and B. Sakita, Phys. Rev. Lett. 30, 716 (1973).
76. L. Brink, P. Di Vecchia, P. Howe, S. Deser and B. Zumino, Phys. Lett. B 64,
435 (1976).
77. L. Brink, P. Di Vecchia and P. Howe, Phys. Lett. B 65, 471 (1976).
S. Deser and B. Zumino, Phys. Lett. B 65, 369 (1976).
78. C. Lovelace, Phys. Lett. B 136, 75 (1984).
C.G. Callan, E.J.Martinec, M.J. Perry and D. Friedan, Nucl. Phys. B 262, 593
(1985).
79. E. Cremmer and J.L. Gervais, Nucl. Phys. 76, 209 (1974).
	The birth of string theory
	Paolo Di Vecchia
ABSTRACT
  In this contribution we go through the developments that in the years 1968 to
1974 led from the Veneziano model to the bosonic string.

<|endoftext|><|startoftext|>
Introduction
The purpose of this paper is to construct examples of strange behavior of local coho-
mology. In these constructions we follow a strategy that was already used in [CH] and
which relates, via a spectral sequence introduced in [HR], the local cohomology for the
two distinguished bigraded prime ideals in a standard bigraded algebra.
In the first part we consider algebras with rather general gradings and deduce a similar
spectral sequence in this more general situation. A typical example of such an algebra
is the Rees algebra of a graded ideal. The proof for the spectral sequence given here is
simpler than that of the corresponding spectral sequence in [HR].
In the second part of this paper we construct examples of standard graded rings A,
which are algebras over a field K, such that the function
(1) j 7→ dimK(H iA+(A)−j)
is an interesting function for j ≫ 0. In our examples, this dimension will be finite for all
Suppose that A0 is a Noetherian local ring, A =
j≥0Aj is a standard graded ring and
set A+ :=
j>0Aj. Let M be a finitely generated graded A-module and F := M̃ be the
sheafification of M on Y = Proj(A). We then have graded A-module isomorphisms
H i+1A+ (M)
H i(Y,F(n))
for i ≥ 1, and a similar expression for i = 0 and 1.
By Serre vanishing, H iA+(M)j = 0 for all i and j ≫ 0. However, the asymptotic
behaviour of H iA+(M)−j for j ≫ 0 is much more mysterious.
In the case when A0 = K is a field, the function (1) is in fact a polynomial for large
enough j. The proof is a consequence of graded local duality ([BS, 13.4.6] or [BH, 3.6.19])
or follows from Serre duality on a projective variety.
For more general A0, HA+(M)−j are finitely generated A0 modules, but need not have
finite length.
The following problem was proposed by Brodmann and Hellus [BrHe].
The second author was partially supported by NSF.
http://arxiv.org/abs/0704.0102v1
Tameness problem: Are the local cohomology modules H iA+(M) tame? That is, is it
true that either
{H iA+(M)j 6= 0, ∀j ≪ 0} or {H
A+(M)j = 0, ∀j ≪ 0}?
The problem has a positive solution for A0 of small dimension (some of the references
are Brodmann [Br], Brodmann and Hellus [BrHe], Lim [L], Rotthaus and Sega [RS]).
Theorem 0.1 ([BrHe]). If dim(A0) ≤ 2, then M is tame.
However, it has recently been shown by two of the authors that tameness can fail if
dim(A0) = 3.
Theorem 0.2 ([CH]). There are examples with dim(A0) = 3 where M is not tame.
The statement of this example is reproduced in Theorem 3.1 of this paper. The function
(1) is periodic for large j. Specifically, the function (1) is 2 for large even j and is 0 for
large odd j.
In Theorem 3.3 we construct an example of failure of tameness of local cohomology
which is not periodic, and is not even a quasi polynomial (in −j) for large j. Specifically,
we have for j > 0,
dimK(H
(A)−j) =
1 if j ≡ 0 (mod) (p+ 1),
1 if j = pt for some odd t ≥ 0,
0 otherwise,
where the characteristic of K is p. We have pt ≡ −1 (mod) (p + 1) for all odd t ≥ 0.
We also give an example (Theorem 3.5) of failure of tameness where (1) is a quasi
polynomial with linear growth in even degree and is 0 in odd degree.
In Theorem 3.6, we give a tame example, but we have
dimK(H
(A)−j)
so (1) is far from being a quasi polynomial in −j for large j.
While the example of [CH] is for M = ωA, where ωA is the canonical module of A, the
examples of the paper are all for M = A. This allows us to easily reinterpret our examples
as Rees algebras in Section 4, and thus we have examples of Rees algebras over local rings
for which the above failure of tameness holds.
In the final section, Section 5, we give an analysis of the explicit and implicit role of
bigraded duality in the construction of the examples, and some comments on how it effects
the geometry of the constructions.
1. Duality for polynomial rings in two sets of variables
Let K be any commutative ring (with unit). In later applications K will be mostly
a field. Furthermore let S = K[x1, . . . , xm, y1, . . . , yn], P = (x1, . . . , xm) and Q =
(y1, . . . , yn).
The homology of the Čech complex CP ( ) (resp. CQ( )) will be denoted by HP ( ) (resp.
HQ( )). Notice that for any commutative ring K, this homology is the local cohomology
supported in P (resp. Q), as P and Q are generated by a regular sequences.
Assume that S is Γ-graded for some abelian group Γ, and that deg(a) = 0 for a ∈ K.
If xsyp ∈ R, deg(xsyp) = l(s) + l′(p) with l(s) :=
i si deg(xi) and l
′(p) :=
j pj deg(yj).
Definition 1.1. Let I ⊂ S be a Γ-graded ideal. The Γ-grading of S is I-sharp if H iI(S)γ
is a finitely generated K-module, for every i and γ ∈ Γ.
Lemma 1.2. The following conditions are equivalent:
(i) the Γ-grading of S is P -sharp.
(ii) the Γ-grading of S is Q-sharp.
(iii) for all γ ∈ Γ, |{(α, β) : α ≥ 0, β ≥ 0, l(α) = γ + l′(β)}| < ∞.
Note that if K is Noetherian, M is a finitely generated Γ-graded S-module, and the
Γ-grading of S is I-sharp, then H iI(M)γ is a finite K-module, for every i and γ ∈ Γ. This
follows from the converging Γ-graded spectral sequence Hp−q(H
I (F)) ⇒ H
I (M), where F
is a Γ-graded free S-resolution of M with Fi finite for every i.
We will assume from now on that the Γ-grading of S is P -sharp (equivalently Q-
sharp). Set σ = deg(x1 · · · xmy1 · · · yn), and if N is a Γ-graded module, then let N∨ =
HomS(N,S(−σ)) and N∗ = ∗HomK(N,K) where the Γ-grading of N∗ is given by (N∗)γ =
HomK(N−γ ,K). More generally, we always denote the graded K-dual of a graded mod-
ule N (over what graded K-algebra soever) by N∗. Finally we denote by ϕαβ the map
S(−a) → S(−b) induced by multiplication by xαyβ where a = deg xα and b = − deg yβ.
Lemma 1.3. HmP (ϕαβ)γ
∼= HnQ(ϕ∨αβ)∗.
Proof. The free K-moduleHmP (S)γ is generated by the elements x
−s−1yp with s, p ≥ 0 and
−l(s)− l(1) + l′(p) = γ, and HnQ(S)γ′ is generated by the elements xty−q−1 with t, q ≥ 0
and l(t)− l′(q)− l′(1) = γ′.
Let dγ : H
P (S)γ → (HnQ(S∨)∗)γ = HnQ(S)−γ−σ be the K-linear map defined by
−s−1yp)(xty−q−1) =
1, if s = t and p = q,
0, else.
Then dγ is an isomorphism (because the Γ-grading of R is Q-sharp) and there is a com-
mutative diagram
HmP (S)γ−a
(ϕαβ)γ−−−−−−−→ HmP (S)γ−b
y dγ−b
(HnQ(S)−γ+a−σ)
−−−−−−−−−→ (HnQ(S)−γ+b−σ)∗.
The assertion follows. �
As an immediate consequence we obtain
Corollary 1.4. (a) Let f ∈ S be an homogeneous element of degree a−b, and ϕ : S(−a) →
S(−b) the graded degree zero map induced by multiplication with f . Then
HmP (ϕ) ≃ HnQ(ϕ∨)∗.
(b) Let F be a Γ-graded complex of finitely generated free S-modules. Then
(i) H iP (F) = 0 for i 6= m and H
Q(F) = 0 for j 6= n,
(ii) HmP (F) ≃ HnQ((F)∨)∗.
As the main result of this section we have
Theorem 1.5. Assume that K is Noetherian, the Γ-grading of S is P -sharp (equivalently
Q-sharp) and M is a finitely generated Γ-graded S-module. Set ωS/K := S(−σ). Let F be
a minimal Γ-graded S-resolution of M . Then,
(a) For all i, there is a functorial isomorphism
H iP (M) ≃ Hm−i(HmP (F)).
(b) There is a convergent Γ-graded spectral sequence,
H iQ(Ext
S(M,ωS/K)) ⇒ H
i+j−n(HmP (F)
In particular, if K is a field, there is a convergent Γ-graded spectral sequence,
H iQ(Ext
S(M,ωS)) ⇒ H
dimS−(i+j)
P (M)
Proof. Claim (a) is an immediate consequence of Corollary 1.4 via the Γ-graded spec-
tral sequence Hp−i(H
P (F)) ⇒ H iP (M). For (b), the two spectral sequences arising from
the double complex CQF∨ have as second terms respectively ′Eij2 = H iQ(Ext
S(M,ωS/K)),
2 = 0 for i 6= n and ′′E
2 = H
j(HnQ(F
∨)) ≃ Hj(HmP (F)∗). If further K is a field,
Hj(HmP (F)
∗) ≃ (Hj(HmP (F)))∗ ≃ H
P (F)
Corollary 1.6. Under the hypotheses of the theorem, if K is a field, then for any γ ∈ Γ,
there are convergent spectral sequences of finite dimensional K-vector spaces
H iQ(Ext
S(M,ωR))γ ⇒ H
dimS−(i+j)
P (M)−γ ,
H iP (Ext
S(M,ωR))γ ⇒ H
dimS−(i+j)
Q (M)−γ .
We now consider the special case that Γ = Z2, S := K[x1, . . . , xm, y1, . . . , yn] with
deg(xi) = (1, 0) and deg(yj) = (dj , 1) with dj ≥ 0. Set T := K[x1, . . . , xm] and let M be
a Γ-graded S-module. We view M as a Z-graded module by defining Mk =
j M(j,k).
Observe that each Mk itself is a graded T -module with (Mk)j = M(j,k) for all j. We also
note that H iP (M)k
∼= H iP0(Mk), as can been seen from the definition of local cohomology
using the Čech complex. Here P0 = (x1, . . . , xm) is the graded maximal ideal of T .
Corollary 1.7. With the notation introduced, let s := dimS = m + n and d := dimM .
(a) H0P (Ext
S (M,ωS))
∼= HdQ(M)∗ for any k,
(b) there is an exact sequence
0→H1P (Exts−dS (M,ωS))→H
Q (M)
∗→H0P (Exts−d+1S (M,ωS)).
(c) Let i ≥ 2. If ExtjS(M,ωS) is annihilated by a power of P for all s−d < j < s−d+i,
then there is an exact sequence
Exts−d+i−1S (M,ωS)→H
P (Ext
S (M,ωS))→H
Q (M)
∗→H0P (Exts−d+iS (M,ωS)).
In particular, if Ext
S(M,ωS) has finite length for all s − d < j ≤ s − d+ i0, for some
integer i0, then
H iP0(Ext
S (M,ωS)k)
∼= (Hd−iQ (M)−k)
∗ for all i ≤ i0 and k ≫ 0.
Consequently, if M is a generalized Cohen-Macaulay module (i.e. Exts−iS (M,ωS) has
finite length for all i 6= d), and if we set N = Exts−dS (M,ωS), then
H iP0(Nk)
∼= (Hd−iQ (M)−k)
∗ for all i and all k ≫ 0.
Proof. (a), (b) and (c) are direct consequences of Corollary 1.6. For the application, notice
that if γ = (ℓ, k) ∈ Γ with k ≫ 0 one has ExtjS(M,ωS)γ = 0 for all s− d < j ≤ s− d+ i0.
Therefore, for such γ, the desired conclusion follows. �
A typical example to which this situation applies is the Rees algebra of a graded ideal
I in the standard graded polynomial ring T = K[x1, . . . , xm]. Say, I is generated be the
homogeneous polynomials f1, . . . , fn with deg fj = dj for j = 1, . . . , n. Then the Rees
algebra R(I) ⊂ T [t] is generated the elements fjt. If we set deg fjt = (dj , 1) for all j
and deg xi = (1, 0) for all i, then R(I) becomes a Γ-graded S-module via the K-algebra
homomorphism S → R(I) with xi 7→ xi and yj 7→ fjt.
According to this definition we have R(I)k = Ik for all k.
Since dimR(I) = m+1, the module ωR(I) = Extn−1S (R(I), ωS) is the canonical module
of R(I) (in the sense of [HK, 5. Vortrag]). Recall that if a ring R is a finite S-module
of dimension m + 1, the natural finite map R→Hom(ωR, ωR) ∼= Extn−1S (ωR, ωS) is an
isomorphism if and only if R is S2. Thus in combination with Corollary 1.7 we obtain
Corollary 1.8. Let R := R(I). Suppose that Rp is Cohen-Macaulay for all p 6= (m, R+)
where m = (x1, . . . , xm) and R+ =
k>0 I
ktk. Then
H im(I
k) ∼= (Hm+1−iR+ (ωR)−k)
∗ for all i and all k ≫ 0.
Proof. Since ωR localizes, the conditions imply that (ωR)p is Cohen-Macaulay for all p 6=
(m, R+). Hence the natural into map R→R′ := Extn−1S (ωR, ωS) has a cokernel of finite
length. In particular, R′k = Rk = I
k for k ≫ 0. Thus Corollary 1.7 applied to M = ωR
gives the desired conclusion. �
Remark 1.9. Let R := R(I). If the cokernel of R→Hom(ωR, ωR) is annihilated by a
power of R+ (in other words, the blow-up is S2, as a projective scheme over Spec(T )),
then R′k = I
k for k ≫ 0 and therefore one has an exact sequence
0→H0m(T/Ik)→(HmR+(ωR)−k)
∗→H0m(ExtnS(ωR, ωS)k)→H1m(T/Ik)→(Hm−1R+ (ωR)−k)
for such a k.
2. A method of constructing examples
Suppose that R =
i,j≥0Rij is a standard bigraded algebra over a ring K = R00.
Define Ri =
j≥0Rij and Rj =
i≥0Rij . Define ideals P =
i and Q =
j>0Rj
in R. Suppose that M =
ij∈ZMij is a finitely generated, bigraded R-module. Define
M i =
j∈ZMij and Mj =
i∈ZMij. M
i is a graded R0-module and Mj is a graded
R0-module. Let Q0 = R01R
0, so that Q = Q0R. Let P0 = R10R0 so that P = R10R. We
have K module isomorphisms
H lQ(M)m,n
∼= H lQ0(M
for m,n ∈ Z. Let M̃m be the sheafification of the graded R0-module Mm on Proj(R0).
We have K module isomorphisms
H lQ0(M
m)n ∼= H l−1(Proj(R0), M̃m(n))
for l ≥ 2 and exact sequences
0 → H0Q0(M
m)n → (Rm)n = Rm,n → H0(Proj(R0), M̃m(n)) → H1Q0(M
m)n → 0.
We have similar formulas for the calculation of H lP (M).
Now assume that X is a projective scheme over K and F1 and F2 are very ample line
bundles on X. Let
Rm,n = Γ(X,F⊗m1 ⊗F
We require that R =
m,n≥0Rm,n be a standard bigraded K-algebra. We have
X ∼= Proj(R0) ∼= Proj(R0).
The sheafification of the graded R0-module Rm on X is R̃m = F⊗m1 , and the sheafification
of the graded R0-module Rn on X is R̃n ∼= F⊗n2 (Exercise II.5.9 [Ha]).
For l ≥ 2 we have bigraded isomorphisms
H lQ(R)
H lQ0(R
m)n ∼=
m≥0,n∈Z
H l−1(X,F⊗m1 ⊗F
Viewing R as a graded R0 algebra, we thus have graded isomorphisms
(2) H lQ(R)n
H l−1(X,F⊗m1 ⊗F
for l ≥ 2 and n ∈ Z. Let d = dim(R) = dim(X) + 2.
We now further assume that K is an algebraically closed field and X is a nonsingular
K variety. Let
V = P(F1 ⊕F2),
a projective space bundle over X with projection π : V → X. Since F1 ⊕ F2 is an ample
bundle on X, OV (1) is ample on V . Since
Γ(V,OV (t))
Γ(V,OV (t)) ∼= Γ(X,St(F1 ⊕F2)) ∼=
i+j=t
and R is generated in degree 1 with respect to this grading, OV (1) is very ample on V and
R is the homogeneous coordinate ring of the nonsingular projective variety V , so that R
is generalized Cohen Macaulay (all local cohomology modules H iR+(R) of R with respect
to the maximal bigraded ideal R+ of R have finite length for i < d). We further have that
V is projectively normal by this embedding (Exercise II.5.14 [Ha]) so that R is normal.
3. Strange behavior of local cohomology
In [CH], we constructed the following example of failure of tameness of local cohomology.
In the example, R0 has dimension 3, which is the lowest possible for failure of tameness
[Br].
Theorem 3.1. Suppose that K is an algebraically closed field. Then there exists a normal
standard graded K-algebra R0 with dim(R0) = 3, and a normal standard graded R0-algebra
R with dim(R) = 4 such that for j ≫ 0,
dimK(H
Q(ωR)−j) =
2 if j is even,
0 if j is odd,
where ωR is the canonical module of R, Q =
n>0Rn.
We first show that the above theorem is also true for the local cohomology of R.
Theorem 3.2. Suppose that K is an algebraically closed field. Then there exists a normal
standard graded K-algebra R0 with dim(R0) = 3, and a normal standard graded R0-algebra
R with dim(R) = 4 such that for j > 0,
dimK(H
Q(R)−j) =
2 if j is even,
0 if j is odd,
where Q =
n>0Rn.
Proof. We compute this directly for the R of Theorem 3.1 from (2) and the calculations of
[CH]. Translating from the notation of this paper to the notation of [CH], we have X = S
is an Abelian surface, F1 = OS(r2laH) and F2 = OS(r2(D + alH)).
By (2) of this paper, for n ∈ N, we have
dimK(H
Q(R)n) =
m≥0 h
1(X,F⊗m1 ⊗F
m≥0 h
1(S,OS((m+ n)r2alH + nr2D)).
Formula (1) of [CH] tells us that for m,n ∈ Z,
(3) h1(S,OS(mH + nD)) =
2 if m = 0 and n is even,
0 otherwise.
Thus for n < 0, we have
dimK(H
Q(R)n) =
2 if n is even,
0 if n is odd,
giving the conclusions of the theorem. �
The following example shows non periodic failure of tameness.
Theorem 3.3. Suppose that p is a prime number such that p ≡ 2 (mod) 3 and p ≥ 11.
Then there exists a normal standard graded K-algebra R0 over a field K of characteristic
p with dim(R0) = 4, and a normal standard graded R0-algebra R with dim(R) = 5 such
that for j > 0,
dimK(H
Q(R)−j) =
1 if j ≡ 0 (mod) (p+ 1),
1 if j = pt for some odd t ≥ 0,
0 otherwise,
where Q =
n>0Rn. We have p
t ≡ −1(mod)(p + 1) for all odd t ≥ 0.
To establish this, we need the following simple lemma.
Lemma 3.4. Suppose that C is a non singular curve of genus g over an algebraically closed
field K, and M, N are line bundles on C. If deg(M) ≥ 2(2g+1) and deg(N ) ≥ 2(2g+1),
then the natural map
Γ(C,M) ⊗ Γ(C,N ) → Γ(C,M⊗N )
is a surjection.
Proof. If L is a line bundle on C, then H1(C,L) = 0 if deg(L) > 2g − 2 and L is very
ample if deg(L) ≥ 2g + 1 (Chapter IV, Section 3 [Ha]).
Suppose that L is very ample and G is another line bundle on C. If deg(G) > 2g − 2−
deg(L), then G is 2-regular for L (Lecture 14, [M1]). Thus if deg(G) > 2g − 2 + deg(L),
Γ(C,G) ⊗ Γ(C,L) → Γ(C,G ⊗ L)
is a surjection by Castelnuovo’s Proposition, Lecture 14, page 99 [M1].
We now apply the above to prove the lemma. Write M ∼= A⊗q ⊗ B where A is a
line bundle such that deg(A) = 2g + 1, and 2g + 1 ≤ deg(B) < 2(2g + 1). deg(N ) >
2g − 2 + deg(A). Thus there exists a surjection
Γ(C,N ) ⊗ Γ(C,A) → Γ(C,A⊗N ).
We iterate to get surjections
Γ(C,A⊗i ⊗N )⊗ Γ(C,A) → Γ(C,A⊗(i+1) ⊗N )
for i ≤ q, and a surjection
Γ(C,A⊗q ⊗N )⊗ Γ(C,B) → Γ(C,M⊗N ).
We now prove Theorem 3.3. For the construction, we start with an example from
Section 6 of [CS]. There exists an algebraically closed field K of characteristic p, a curve
C of genus 2 over K, a point q ∈ C and a line bundle M on C of degree 0, such that for
n ≥ 0,
H1(C,OC (q)⊗M⊗n) =
1 if n = pt for some t ≥ 0,
0 otherwise.
Further, H1(C,OC (2q)⊗M⊗n) = 0 for all n > 0.
Let a = p+ 1. Let E be an elliptic curve over K, and let T = E × E, with projections
πi : T → E. Let b ∈ E be a point and let A = π∗1(OE(b)) ⊗ π∗2(OE(b)). Let X = T × C,
with projections ϕ1 : X → T , ϕ2 : X → C. Let L = OC(q). Let
F1 = ϕ∗1(A)⊗a ⊗ ϕ∗2(L)⊗a,
F2 = ϕ∗1(A)⊗(1+a) ⊗ ϕ∗2(L⊗(1+a) ⊗M−1).
For m,n ≥ 0, we have
Γ(X,F⊗m1 ⊗F
2 ) = Γ(T,A⊗(ma+n(1+a)))⊗ Γ(C,L⊗(ma+n(1+a)) ⊗M−⊗n)
= Γ(T,A⊗a)⊗m ⊗ Γ(T,A⊗(1+a))⊗n ⊗ Γ(C,La)⊗m ⊗ Γ(C,L⊗(1+a) ⊗M−1)⊗n
= Γ(X,F1)⊗m ⊗ Γ(X,F2)⊗n
by the Künneth formula (IV of Lecture 11 [M1]) and Lemma 3.4.
Let Rm,n = Γ(X,F⊗m1 ⊗F
2 ). R =
m,n≥0Rm,n is a standard bigraded K-algebra by
(4). Thus (2) holds.
By the Riemann Roch Theorem, we compute,
(5) h0(C,L⊗r ⊗M−⊗s) = h1(C,L⊗r ⊗M−⊗s) + r − 1,
and for s < 0,
(6) h1(C,L⊗r ⊗M−⊗s) =


1− r r < 0,
1 r = 0, s < 0,
1 r = 1, s = −pt, for some t ∈ N,
0 r = 1, s 6= −pt for some t ∈ N,
0 r = 2, s < 0,
0 r ≥ 3.
We further have
(7) h1(T,A⊗r) =
0 r 6= 0,
2 r = 0,
(8) h0(T,A⊗r) =
0 r < 0,
1 r = 0,
r2 r > 0.
By (2), for n ∈ Z, we have
dimK(H
Q(R)n) =
h1(X,F⊗m1 ⊗F
By the Künneth formula,
H1(X,F⊗m1 ⊗F
∼= H0(T,A⊗(ma+n(1+a)))⊗H1(C,L⊗(ma+n(1+a)) ⊗M−⊗n)
⊕H1(T,A⊗(ma+n(1+a)))⊗H0(C,L⊗(ma+n(1+a)) ⊗M−⊗n).
Thus by (5) - (8), we have for j > 0,
dimK(H
Q(R)−j) =
1 j ≡ 0 (mod) a,
1 j = pt for some odd t ∈ N,
0 otherwise.
and we have the conclusions of Theorem 3.3.
Theorem 3.5 gives an example of failure of tameness of local cohomology with larger
growth.
Theorem 3.5. Suppose that K is an algebraically closed field. Then there exists a normal
standard graded K-algebra R0 over K with dim(R0) = 4, and a normal standard graded
R0-algebra R with dim(R) = 5 such that for j > 0,
dimK(H
Q(R)−j) =
6j if j is even,
0 if j is odd,
where Q =
n>0Rn.
Proof. Let E be an elliptic curve over K, and let q ∈ E be a point. Let L = OE(3q). By
Proposition IV.4.6 [Ha], L is very ample on E, and
(9) ⊕n≥0Γ(E,L⊗n)
is generated in degree 1 as a K-algebra. For n ∈ N,
(10) h0(C,L⊗n) =
0 n < 0,
1 n = 0,
3n n > 0.
(11) h1(C,L⊗n) =
−3n n < 0,
1 n = 0,
0 n > 0.
Let X = E3, with the three canonical projections πi : X → E. Define
F1 = π∗1(L⊗2)⊗ π∗2(L⊗2)⊗ π∗3(L⊗2)
F2 = π∗1(L)⊗ π∗2(L)⊗ π∗3(L⊗2).
Rm,n = Γ(X,F⊗m1 ⊗F
m,n≥0
Rm,n.
By (9) and the Künneth formula, R is standard bigraded. By (2), the fact that ωX ∼= OX
and Serre duality,
dimK(H
Q(R)−j) =
h2(X,F⊗m1 ⊗F
2 ) =
h1(X,F⊗m1 ⊗F
for j ∈ Z.
Now by (10), (11) and the Künneth formula, we have that for n > 0,
h1(X,F⊗m1 ⊗F
2 ) =
0 if 2m+ n 6= 0,
2h0(X,L⊗n) if 2m+ n = 0.
Thus the conclusions of Theorem 3.5 hold. �
The following theorem gives an example of tame, but still rather strange local cohomol-
ogy. Let [x] be the greatest integer in a real number x.
Theorem 3.6. Suppose that K is an algebraically closed field. Then there exists a normal
standard graded K-algebra R0 with dim(R0) = 3, and a normal standard graded R0-algebra
R with dim(R) = 4 such that for j > 0,
dimK(H
Q(R)−j) = 162
dimK(H
Q(R)−j)
where Q =
n>0Rn.
Proof. We use the method of Example 1.6 [Cu]. Let E be an elliptic curve over an
algebraically closed field K, and let p ∈ E be a point. Let X = E × E with projections
πi : X → E. Let C1 = π∗1(p), C2 = π∗2(p) and
∆ = {(q, q) | q ∈ E}
be the diagonal of X. We compute (as in [Cu]) that
(12) (C21 ) = (C
2 ) = (∆
2) = 0
(13) (∆ · C1) = (∆ · C2) = (C1 · C2) = 1.
If N is an ample line bundle on X, then
(14) H i(X,N ) = 0 for i > 0
by the vanishing theorem of Section 16 [M2].
Suppose that L is a very ample line bundle on X, and M is a numerically effective (nef)
line bundle. Then M is 3 regular for L, so that
Γ(X,M⊗L⊗n)⊗ Γ(X,L) → Γ(X,M⊗L⊗(n+1))
is a surjection if n ≥ 3. C1 + 2C2 is an ample divisor by the Moishezon Nakai criterion
(Theorem V.1.10 [Ha]), so that 3(C1+2C2) is very ample by Lefschetz’s theorem (Theorem,
Section 17 [M2]). Let
F1 = OX(9(C1 + 2C2)).
Then OX is 3 regular for OX(3(C1 + 2C2)), so we have surjections
Γ(X,F⊗n1 )⊗ Γ(X,F1) → Γ(X,F
⊗(n+1)
for all n ≥ 1.
∆+C2 is ample by the Moishezon Nakai criterion. Let D = 3(∆+C2). D is very ample
by Lefschetz’s theorem, and thus OX(D)⊗F1 is very ample. Let
F2 = OX(3D)⊗F⊗31 .
OX is 3 regular for OX(D)⊗F1, so we have surjections
Γ(X,F2⊗n)⊗ Γ(X,F2) → Γ(X,F⊗(n+1)2 )
for all n ≥ 1.
for n > 0 and m ≥ 0, we have
F⊗m1 ⊗F
∼= OX(3nD)⊗F⊗(m+3n)1 .
Since D is nef, it is 3 regular for F1, and we have a surjection for all m ≥ 0, n > 0,
Γ(X,F⊗m1 ⊗F
2 )⊗ Γ(X,F1) → Γ(X,F
⊗(m+1)
Rm,n = Γ(X,F⊗m1 ⊗F
We have shown that ⊕m,n≥0Rm,n is a standard bigraded K-algebra. Thus (2) holds.
For m,n ∈ Z, let G = F⊗m1 ⊗ F
2 . As in Example 1.6 [Cu], and by (14) and Serre
Duality (ωX ∼= OX since X is an Abelian variety), we deduce that
1. (G2) > 0 and (G · F1) > 0 imply G is ample and h1(X,G) = h2(X,G) = 0.
2. (G2) < 0 implies h0(X,G) = h2(X,G) = 0.
3. (G2) > 0 and (G · F1) < 0 imply G−1 is ample and h0(X,G) = h1(X,G) = 0.
Let τ2 = −4−
and τ1 = −4 +
Using (12) and (13), we compute
(F21 ) = 2 · 162, (F2)2 = 31 · 162, (F1 · F2) = 8 · 162.
We have
(G2) = 324(m2 + 8mn+ 31
= 324(m− τ1n)(m− τ2n).
(G · F1) = 324(m + 4n).
Since τ2 < −4 < τ1 < 0, for n < 0 and m ∈ Z, we have
1. m > τ2n if and only if G2 > 0 and G · F1 > 0
2. τ1n < m < τ2n if and only if (G2) < 0
3. m < τ1n if and only if (G2) > 0 and (G · F1) < 0.
By the Riemann Roch Theorem for an Abelian surface (Section 16 [M2]),
χ(G) = 1
(G2).
Thus for m ∈ Z and n < 0,
h1(X,G) =
(G2) = −162(m2 + 8mn+ 31
n2) if τ1n < m < τ2n,
0 otherwise.
For n ∈ Z, let σ(n) = dimK(H2Q(Rn)). By (2),
σ(n) =
h1(X,F⊗m1 ⊗F
For n < 0, we have
σ(n) = −162(
τ1n<m<τ2n
(m2 + 8mn+
n2)).
Setting r = m+ 4n, we have
σ(n) = −162(
n<r<−
(r2 − 1
= −324
r=1 (r
2 − 1
n2) + 81n2
= −324
+ 81n2
= 162
We thus have the conclusions of the theorem. �
4. Strange examples of Rees Algebras
Let notation and assumptions be as in Section 2. Since F1 is ample, there exists l > 0
such that Γ(X,F⊗l1 ⊗ F
2 ) 6= 0. Thus we have an embedding F2 ⊗ F
1 ⊂ OX . Let
A = F2 ⊗F−l1 , which we have embedded as an ideal sheaf of X. For j ≥ 0 and i ≥ jl, let
Tij = Γ(X,F⊗i1 ⊗A⊗j) = Ri−jl,j.
For j ≥ 0, let Tj =
i≥jl Tij and T =
j≥0 Tj . Let B =
j>0 Tj. R
∼= T as graded rings
over R0 ∼= T0, although they have different bigraded structures. Thus for all i, j we have
(15) H iB(T )j
∼= H iQ(R)j .
T1 is a homogeneous ideal of T0, and T is the Rees algebra of T1. Thus all of the exam-
ples of Section 3 can be interpreted as Rees algebras over normal rings T0 with isolated
singularities.
We thus obtain the following theorems from Theorems 3.2 - 3.6. Theorems 4.1, 4.2 and
4.3 give examples of Rees algebras with non tame local cohomology.
Theorem 4.1. Suppose that K is an algebraically closed field. Then there exists a normal,
standard graded K algebra T0 with dim(T0) = 3 and a graded ideal A ⊂ T0 such that the
Rees algebra T = T0[At] of A is normal, and for j > 0,
dimK(H
B(T )−j) =
2 if j is even,
0 if j is odd.
where B is the graded ideal AtT of T .
Theorem 4.2. Suppose that p is a prime number such that p ≡ 2(mod)3 and p ≥ 11.
Then there exists a normal standard graded K-algebra T0 over a field K of characteristic
p with dim(T0) = 4, and a graded ideal A ⊂ T0 such that the Rees algebra T = T0[At] of
A is normal, and for j > 0,
dimK(H
Q(T )−j) =
1 if j ≡ 0(mod)(p + 1),
1 if j = pt for some odd t ≥ 0,
0 otherwise,
where B is the graded ideal AtT of T . We have pt ≡ −1(mod)(p + 1) for all odd t ≥ 0.
Theorem 4.3. Suppose that K is an algebraically closed field. Then there exists a normal,
standard graded K-algebra T0 with dim(T0) = 4 and a graded ideal A ⊂ T0 such that the
Rees algebra T = R0[At] of A is normal, and for j > 0,
dimK(H
B(T )−j) =
6j if j is even,
0 if j is odd,
where B is the graded ideal AtT of T .
Theorem 4.4. Suppose that K is an algebraically closed field. Then there exists a normal
standard graded K algebra T0 with dim(T0) = 3, and a graded ideal A ⊂ T0 such that the
Rees algebra T = T0[At] of A is normal, and for j > 0,
dimK(H
B(T )−j) = 162
dimK(H
B(T )−j)
where B is the graded ideal AtT of T .
By localizing at the graded maximal ideal of T0, we obtain examples of Rees algebras
of local rings with strange local cohomology.
In all of these examples, T0 is generalized Cohen Macaulay, but is not Cohen Macaulay.
This follows since in all of these examples,
H2P0(R0)0
∼= H1(X,OX ) 6= 0.
5. Local duality in the examples
The example of [CH], giving failure of tameness of local cohomology, is stated in The-
orem 3.1 of this paper. The proof of [CH] uses the bigraded local duality theorem of
[HR], which now follows from the much more general bigraded local duality theorem, The-
orem 1.5 and Corollary 1.7 of this paper, to conclude that in our situation, where R is
generalized Cohen Macaulay,
(16) (Hd−iQ (ωR)−j)
∗ ∼= H iP (R)j
for j ≫ 0.
In [CH], the formula
H iP (R)j
∼= H iP0(Rj)
i−1(X, R̃j(m))
i−1(X,F⊗m1 ⊗F
for i ≥ 2 and j ≥ 0 is then used with formula (1) of [CH] ((3) of this paper) to prove
Theorem 3.1.
In Section 2 we derive (2) from which we directly compute the local cohomology in the
examples of this paper. We make essential use of Serre duality on X in computing the
examples.
In this section, we show how (16) can be obtained directly from the geometry of X and
V , and how this formula can be directly interpreted as Serre duality on X.
Let notation be as in Section 2, so that K is an algebraically closed field, F1 and F2
are very ample line bundles on the nonsingular variety X.
Let ωR be the dualizing module of R, and let ωX be the canonical bundle of X (which
is a dualizing sheaf on X). For a K module W , let W ′ = HomK(W,K).
Lemma 5.1. We have that
(ωR)ij =
Γ(X,F⊗i1 ⊗F
2 ⊗ ωX) if i ≥ 1 and j ≥ 1
0 otherwise.
Set (ωR)
j∈Z(ωR)i,j , a graded R
0 module. The sheafification of (ωR)
i on X is
(18) (̃ωR)i =
F⊗i1 ⊗ ωX if i ≥ 1
0 if i ≤ 0.
Set (ωR)j =
i∈Z(ωR)i,j , a graded R0 module. The sheafification of (ωR)j on X is
(19) (̃ωR)j =
F⊗j2 ⊗ ωX if j ≥ 1
0 if j ≤ 0.
Proof. Give R the grading where the elements of degree e in R are [R]e =
i+j=eRij.
We have realized R (with this grading) as the coordinate ring of the projective embed-
ding of V = P(F1 ⊕F2) by the very ample divisor OV (1), with projection π : V → X.
Let ωV be the canonical line bundle on V . We first calculate ωV . Let f be a fiber of
the map π : V → X. By adjunction, we have that (f · ωV ) = −2. Since
Pic(V ) ∼= ZOV (1) ⊕ π∗(Pic(X)),
we see that there exists a line bundle G on X such that
ωV ∼= OV (−2)⊗ π∗(G).
The natural split exact sequence
(20) 0 → F2 → F1 ⊕F2 → F1 → 0
determines a section X0 of X, such that π∗ of the exact sequence
0 → OV (1)⊗OV (−X0) → OV (1) → OV (1) ⊗OX0 → 0
is (20) (Proposition II.7.12 [Ha]). Thus
OV (1) ⊗OV (−X0) ∼= π∗(F2)
OV (1)⊗OX0 ∼= F1.
By adjunction, we have that the canonical line bundle of X0 is
∼= ωV ⊗OV (X0)⊗OX0 .
Putting the above together, we see that
G ∼= F1 ⊗F2 ⊗ ωX .
ωV ∼= OV (−2)⊗ π∗(F1 ⊗F2 ⊗ ωX).
We realize R as a bigraded quotient of a bigraded polynomial ring
S = K[x1, . . . , xm, y1, . . . , yn],
with deg(xi) = (1, 0) for all i and deg(yj) = (0, 1) for all j. Viewing S as a graded K-
algebra with the grading determined by d(xi) = d(yj) = 1 for all i, j, we have a projective
embedding V ⊂ P = Proj(S). Since V is nonsingular, we see from Section III.7 of [Ha]
ωV ∼= ExtrP(OV ,Op(−e))
where e = m+ n is the dimension of S, and r = e− dim(R). ωR is defined as
ωR = *Ext
S(R,S(−e)) ∼=
ExtrP(OV ,OP(m− e)).
For m ≫ 0,
Γ(P, ExtrP(OV ,Op(m− e))) ∼= ExtrP(OV ,OP(m− e))
(by Proposition III.6.9 [Ha]). Thus ωR and
Γ∗(ωV ) =
Γ(V, ωV (m))
are isomorphic in high degree. Since both modules have depth≥ 2 at the maximal bigraded
ideal of R, we see that
ωR ∼= Γ∗(ωV ).
m∈Z Γ(V, ωV (m))
m∈Z Γ(V,OV (m− 2)⊗ π∗(F1 ⊗F2 ⊗ ωX)).
Since a fiber f of π satisfies (f ·OV (m− 2)⊗π∗(F1⊗F2)) < 0 if m < 2, we see that (with
this grading) [ωR]m = 0 if m < 2 and For m ≥ 2, we have
[ωR]m = Γ(X,S
m−2(F1 ⊕F2)⊗F1 ⊗F2)
i+j=m−2 Γ(X,F
⊗(i+1)
⊗(j+1)
2 ⊗ ωX).
The conclusions of the lemma now follow. �
Suppose that 2 ≤ i ≤ d − 2. Since F1 and F2 are ample, and d − (i + 1) > 0, there
exists a natural number n0 such that
(21) Hd−(i+1)(X,F⊗m1 ⊗Fn2 ⊗ ωX) = 0
for n ≥ n0 and all m ≥ 0.
By (18), we have graded isomorphisms
(22) H iQ(ωR)n
H i−1(X,F⊗m1 ⊗F
2 ⊗ ωX)
for n ∈ Z.
By Serre duality,
(23) H iQ(ωR)n
(Hd−i−1(X,F−⊗m1 ⊗F
By (21), there exists n0 such that
(24) H iQ(ωR)−n
(Hd−i−1(X,F−⊗m1 ⊗F
for n ≥ n0.
Now apply the functor L∗ = HomK(L,K) on graded R0-modules, with the grading
(L∗)i = HomK(L−i,K)
to (24), and compare with (17), to obtain
(25) Hd−iP (R)n
∼= (H iQ(ωR)−n)∗
for n ≥ n0, from which (16) immediately follows.
We can now verify that Theorem 3.1 is in fact true for all j > 0, using (22) and (3).
We finally comment that an alternate proof of Theorem 3.2 for j ≫ 0 is obtained
from Theorem 3.1, Formulas (2) and (22), the fact that X is an Abelian variety so that
ωX ∼= OX , and the observation that
h1(X,F−⊗n2 ) = h
1(X,F⊗n2 ) = 0
for n > 0.
References
[A] Aoyama, On the depth and the projective dimension of the canonical module, Japan J. Math.
6(1980), 61–69.
[BrHe] Brodmann, M. and Hellus, M., Cohomological patterns of coherent sheaves over projective
schemes, J. Pure and Appl. Alg. 172 (2002), 165–182.
[Br] Brodmann, M., Asymptotic behaviour of cohomology: tameness, supports and associated primes,
Joint International Meeting of the American Mathematical Society and the Indian Mathematical
Society on Commutative Algebra and Algebraic Geometry, Bangalore/India, December 17-20,
2003, Contemporary Mathematics 390(2005), 31-61.
[BS] Brodmann, M. and Sharp, R., Local cohomology, Cambridge Univ. Press, Cambridge, (1998).
[BH] Bruns, W. and Herzog, J., Cohen-Macaulay rings (Revised edition), Cambridge Studies in Ad-
vanced Mathematics 39, Cambridge University Press, 1998.
[Cu] Cutkosky, S.D., Zariski decomposition of divisors on algebraic varieties, Duke Math. J. 53 (1986),
149 -156.
[CH] Cutkosky, S.D. and Herzog, J., Failure of tameness of Local Cohomology, to appear in Journal
of Pure and Applied Algebra.
[CS] Cutkosky, S.D. and Srinivas, V., On a problem of Zariski on dimensions of linear systems,
Annals of Math. 137 (1993), 531 - 559.
[E] Eisenbud, D., Commutative algebra, with a view towards algebraic geometry, Springer Verlag,
New York, Heidelberg, Berlin (1995).
[Ha] Hartshorne, R., Algebraic Geometry, Springer Verlag, New York, Heidelberg, Berlin, 1977.
[HK] Herzog, J. and Kunz, E., Der kanonische Modul eines Cohen-Macaualy Rings, Lecture Notes in
Mathematics 238, Springer, 1971.
[HR] Herzog, J. and Rahimi, A., Local Duality for Bigraded Modules, math.AC/0604587.
[L] Lim, C.S., Tameness of graded local cohomology modules for dimension R0 = 2, the Cohen-
Macaulay case, Menumi Math 26, 11 - 21 (2004).
[M1] Mumford, D., Lectures on curves on an algebraic surface, Annals of Math Studies 59, Princeton
Univ. Press, princeton (1966).
[M2] Mumford, D., Abelian Varieties, Oxford University Press, Bombay, 1970.
[RS] Rotthaus, C. and Sega, L.M., Some properties of graded local cohomology modules, J. Algebra
283, 232 - 247 (2005).
Marc Chardin, Institut Mathématique de Jussieu Université Pierre et Marie Curie, Boite
247, 4, place Jussieu, F-75252 PARIS CEDEX 05
E-mail address: chardin@math.jussieu.fr
Dale Cutkosky, Mathematics Department, 202 Mathematical Sciences Bldg, University
of Missouri, Columbia, MO 65211 USA
E-mail address: dale@math.missouri.edu
Jürgen Herzog, Fachbereich Mathematik und Informatik, Universität Duisburg-Essen,
Campus Essen, 45117 Essen, Germany
E-mail address: juergen.herzog@uni-essen.de
Hema Srinivasan, Mathematics Department, 202 Mathematical Sciences Bldg, University
of Missouri, Columbia, MO 65211 USA
E-mail address: srinivasanh@math.missouri.edu <srinivasan@math.missouri.edu>
ABSTRACT
  We prove a duality theorem for certain graded algebras and show by various
examples different kinds of failure of tameness of local cohomology.

<|endoftext|><|startoftext|>
Introduction
Is it possible to define weak solutions of the Einstein equations of class
piecewise-C0, i.e. to generalize the compatibility conditions which replace
the field equations on a singular hypersurface to the case when the metric is
regularly discontinuous?
To reach this goal would probably mean to define the most general class
of regularly discontinuous weak solutions of the Einstein equations. It seems
that this problem was never studied before in the literature. But, before we
proceed, we need to discuss whether we are talking of something mathemat-
ically and physically consistent or not.
A fundamental concept of Riemannian geometry is that at any point of
a submanifold there are coordinate choices for which the metric reduces to
http://arxiv.org/abs/0704.0103v1
the Minkowski flat metric. Clearly, if this choice is made on both sides of
the discontinuity surface, any ”jump” in the metric disappears. Thus, the
metric discontinuity appears as a coordinate dependent concept, which is
neither geometrically (nor physically) acceptable in the context of General
Relativity.
But we also have to consider that regularity of the global coordinates plays
an important role in our approach, which is that of [1] and of the literature
cited therein. In particular, since here the spacetime is only C0, we are led to
considering (C0, piecewise C1) coordinate transformations. If the metric is
discontinuous in some globally C0 chart, it is in general impossible to obtain
the vanishing of the metric jump on both sides of a hypersurface with a C0
coordinate transformation (see section 2). Moeover in the following we are
led in a natural way to considering C1 coordinate transformations; the metric
discontinuity is a tensor with respect to such coordinate changes!
In other words the jump of the metric has a precise mathematical mean-
ing, if we consider it in connection with global regular coordinates.
In a well consolidated procedure, the assumption of continuity for the
metric across a gravitational interface is usually taken for granted; however
it follows from the limiting process of the thin sandwich modelization, in
consequence of the hypothesis that the external derivatives of the metric
are bounded [2]. Yet in this paper we are going to see that, even removing
the assumption of continuity, it is still possible to define a generalized inner
geometry of the discontinuity hypersurface; one thus can consistently find
a corresponding generalized set of compatibility conditions, which obviously
reduces to the usual ones when the continuity hypothesis is restored.
Yet, which are the physical motivations to move to such generalization?
Actually gravitational shock waves and thin shells are usually defined by the
presence of singular curvature with a “delta” component concentrated on a
hypersurface, situation which is well cast within the classic C0 piecewise-C1
match of metrics [1, 3].
We were originally led to consider solutions of class piecewise-C0, as pos-
sible generalizations of shock waves and thin shells, by the sake of mathe-
matical completeness, with the idea that phisical interpretation would follow.
We actually found a reacher framework than the usual one, with some in-
teresting new features (and even some rather undesiderable ones), which we
display in this paper.
There are two main theories in the literature for solutions of class C0
piecewise-C1, i.e. that in terms of the second fundamental form (heuristic
theory, see e.g. [4, 5]) and that in terms of the curvature tensor-distribution
(axiomatic theory, see e.g. [6, 1]); such are equivalent through appropiate
extensions (for a self-contained overview see e.g. [1]).
The axiomatic theory appears to be inappropriate to the study of general-
ized solutions, since the theory of distributions is basically linear. Even if we
could in principle replace the discontinuous metric with its associated distri-
bution gD, then it would be impossible to define, for example, replacements
for the Christoffel symbols, since this would involve product of distributions,
which, as it is generally believed, is impossible to define. In fact it was proved
by Schwarz [7] that, under reasonable hypothesis, there can be no definition
of commutative and associative operation on distributions which reduce to
ordinary multiplication on integrable distributions (say on regular functions);
thus in a word it is impossible to define product of distributions.
Or is it? Colombeau [8, 9, 10] developed a theory which apparently con-
tradicts Schwarz’s result. He introduced a very broad space of generalized
functions, which extends the usual space of distributions, a subspace of which
corresponds, in a certain sense (the correspondance is not 1 to 1), to usual
distributions. Colombeau’s formalism permits multiplication of generalized
functions; but the contradiction with the impossibility theorem is only ap-
parent, in fact Schwarz’s hypothesis are violated, since the operation does
not coincide with ordinary multiplication on regular functions nor with mul-
tiplication of a regular function times a distribution (although it does at least
for C∞ functions).
Such theory, however, does not fit in a natural way in general relativity,
since it is impossible to define covariantly invariant geometrical objects; in
fact Colombeau’s space is not invariant for smooth coordinate transforma-
tions, unless they are linear. Such difficulty, however, seems to have been
overcome in subsequent adjustments of the theory, with the introduction of
a richer mathematical framework [11, 12], so that the generalized functions
current apparatus can be used in general relativity, and indeed it has been
applied at least to the calculation of singular curvatures of the spacetimes
of Kerr [13], Reissner-Nordstrom [14], and so-called cosmic-string spacetime
[15]. In such literature Colombeau’s theory is adapted to the handling of
curvature when the metric has a singularity in the sense of functions, i.e. the
ordinary curvature would explode, at a singular event-point or at a singular
worldline. There seems to be no particular reason to forbid Colombeau’s
method also for defining the match of piecewise-C0 regularly discontinuous
metrics at a singular hypersurface; however, as far as the author is aware, no
attempt has been made yet to use it in this framework.
The direct method we will introduce in the following sections, however,
is so conceptually simple that we prefer not to experiment with Colombeau’s
generalized functions, which would instead mean introducing a far more com-
plicated and unfamiliar mathematical apparatus.
In this paper in fact we propose a new, generalized theory for regularly
discontinuous solutions, covering also the match of piecewise-C0 metrics. Our
theory is heuristic, as it is constructed in a way similar to the heuristic the-
ory of C0, piecewise-C1 solutions originated from the studies of Israel, but
we completely avoid the traditional or projectional Gauss-Codazzi framework
(which either does not include the lightlike case [4, 5], or needs a special adap-
tation for it [16, 1]) and introduce what we called “mean-value differential
geometry” framework, instead (see section 3). This is conceptually very sim-
ple, and permits to construct in a natural way a generalized theory, where
the main role (which used to be that of the jump of the secund fundamental
form) is here played by the jump of the Christoffel symbols. The theory is an
extension of the theory of gravitational discontinuity hypersurfaces we have
studied in [1], to which it reduces when the metric is C0. Even if we should
restrict to C0 solution, by adding the traditional assumption of continuity for
the metric, our theory would undoubtedly have at least the good qualities of
not needing the timelike and the lightlike case to be distinguished (different
from usual heuristic theory), and of just requiring C0 continuity for the co-
ordinates (different from the axiomatic theory). Moreover, it is completely
cast in the framework of general coordinates of the ambient (glued) space-
time, with no use of parametric equations of the hypersurface, nor of inner
coordinates and holonomic 3-basis, which could be considered a good quality
in some applications as well.
Piecewise-C0 weak solutions of the Einstein equations, as far as the au-
thor is aware, have never been considered previously in the literature. They
generalize the corresponding C0 solutions, as examples in this paper show;
however there is more. Apparently in fact the theory allows the propagation
of free gravitational discontinuity at lower speed than the speed of light (sec-
tion 8); or rather, we still have no general proof that the absence of stress
energy concentrated on Σ should, in the timelike case, necessarily imply the
degeneracy of a generalized solution to a boundary layer, although it does
at least for a wide class of spherical matchs (see section 6). Moreover, non-
simmetric stress-energy is allowed on the hypersurface (section 9), like e.g. in
Einstein-Cartan dynamics. This possible link to classical unification theories
is surprising, since in our framework we have nothing similar to Einstein-
Cartan torsion. We therefore see a lot of space for future investigation.
2 Discontinuous metrics
Let us suppose V4 an oriented differentiable manifold of dimension 4, of class
(C0, piecewise C2), provided with a strictly hyperbolic metric of signature
–+++ and class piecewise-C0. Let Ω ⊂ V4 be an open connected subset with
compact closure. Let units be chosen in order to have the speed of light in
empty space c ≡ 1. Greek indices run from 0 to 3.
Let Σ ⊂ Ω be a regular hypersuperface of equation f(x) = 0; let Ω+ and
Ω− denote the subdomains distinguished by the sign of f . We suppose the
metric and its first and second partial derivatives to be regularly discontin-
uous on Σ in all charts of class C0(Ω). Let f ∈ C0(Ω) ∩ C2(Ω\Σ), and let
second and third derivatives of f be regularly discontinuous on Σ. Finally,
let ℓα ≡ ∂αf denote the gradient of f .
Let the metric be a solution of the ordinary Einstein equations on each of
the two domains Ω+ and Ω−. In this situation Σ is the interface between two
general relativistic spacetimes and it is called a (generalized) gravitational
discontinuity hypersurface.
In the following we will develope a theory to justify the introduction of
suitable generalized compatibility conditions to replace the Einstein equa-
tions on Σ (section 5); if such conditions are satisfied the match across the
generalized gravitational hypersurface Σ will be called a generalized regularly
discontinuous solution of the Einstein equations.
Now let us briefly recall some basics notions on regularly discontinuous
fields, which we will use as tools. In any case, for notation and terminology
we refer to [1].
A field ϕ is said to be regularly discontinuous on Σ if its restrictions to
the two subdomains Ω+ and Ω− both have a finite limit for f −→ 0; we
denote such limits by ϕ+ and ϕ−, respectively.
In this case the jump [ϕ] across Σ and its arithmetic mean value ϕ are
well defined on the hypersurface:
[ϕ] = ϕ+ − ϕ−
ϕ = (1/2)(ϕ+ + ϕ−)
If ϕ is continuous across Σ, we obviously have: [ϕ] = 0, ϕ = ϕ. We also have
the converse formulae:
ϕ+ = ϕ+ (1/2)[ϕ]
ϕ− = ϕ− (1/2)[ϕ].
As for the product of two functions ϕ and ψ, we have:
[ϕψ] = [ϕ]ψ + ϕ[ψ]
ϕψ = ϕψ + (1/4)[ϕ][ψ]
If a field ϕ is regularly discontinuous on Σ, its jump [ϕ] is sometimes called
its discontinuity of order 0.
The jump of a regularly discontinuous function has support on Σ, but in
general, the partial derivative of the jump is well defined as the jump of the
derivative of the function (see [17, 18]). In particular, the derivative of the
jump of a continuous field is not null, unless the field is also C1.
Similarly, we define the partial derivative of the mean value as the mean
value of the partial derivative. We can also use regular prolongations to
extend, in a sense, the definition of ϕ and [ϕ] to the whole domain Ω. Thus
they can be regarded as regular and derivable fields in Ω, but their values
(and those of their derivatives) are well defined only on Σ, while in Ω\Σ
they depend on the choice of the prolongation. For details on the method of
regular prolongations see e.g. [17, 18].
We moreover define the covariant derivative of a field with support on Σ
by means of the mean value Γβρ
σ of the Christoffel symbols. For the jump of a
regularly discontinuous vector, for example, with this definition one has that
the jump of the covariant derivative is different than the covariant derivative
of the jump. Thus, by definition, we have:
β] = ∂α[V
β] + Γασ
β[V σ] (4)
and in consequence of (3):
β] = [∇αV
β]− [Γασ
, (5)
and similarly for the jump of any regularly discontinuous tensor.
Since the spacetime is only C0, we are led to considering (C0, piecewise
C1) coordinate transformations, with regularly discontinuous first deriva-
tives; the metric discontinuity [gαβ] is not a tensor with respect to such
changes of coordinates. In fact we have:
[gαβ] = [gα′β′]
+ qαβ′
+ qα′β
where:
qα′β =
[gα′β′ ]
+ ḡα′β′
We therefore can simulate all (C0, piecewise C1) coordinate changes by com-
bining C1 changes with metric gauge changes:
[gαβ ]←→ [gαβ] + qαβ′
+ qα′β
which generalize usual gravitational gauge changes of the theory of (C0, piece-
wise C1) solutions [1].
Is it always possible to make [gαβ] vanish with an appropriate C
0 trans-
formation? Clearly the answer is negative. In fact it suffices to consider the
case when [gαβ] and ḡαβ are both definite positive in a given chart to see that
the equation obtained from (6) by replacing the first hand side with 0 has no
solution for [∂xα
/∂xα] and ∂xα
/∂xα. Thus the set of effective generalized
gravitational discontinuity hypersurfaces is non empty.
Moreover it will occur in many applications to have ℓα ∈ C
0. Therefore
it will be often desiderable to work in the framework of (C1, piecewise C2)
coordinate transformations, which preserve such condition. The metric dis-
continuity is a tensor with respect to such changes of coordinates, but the
jump of the Christoffel symbols, which appear to play a main role in the
following, is not; we have in fact:
σ] = [Γα′β′
∂xα∂xβ
If the coordinates are C0 and so is the form ℓα we can write:
∂xα∂xβ
= ℓαℓβ∂
where ∂2 denotes the weak discontinuity of order 2 (see e.g. [17, 18]). Thus on
Σ we can generate all (C1, piecewise C2) transformations for [Γ] combining
C2 transformations (with respect to which Γ is a tensor) and Christoffel
gauges transformations, i.e. of the kind:
σ]↔ [Γαβ
σ] + ℓαℓβQ
σ (11)
with some analogy with the case of C0 metrics (where the main role is played
by the first order metric discontinuity ∂g, see [1] section 3).
In any case neither the mean value of the metric g or its jump [g] now
have null covariant derivatives. Consider in fact the identity ∇αgβρ = 0 in
the domain Ω+; from the limit f −→ 0+, on Σ we have:
βρ − (Γαβ
ν)+g+νρ − (Γαρ
ν)+g+βν = 0 (12)
Here, with obvious meaning of the symbols, we denote: g+βρ = (gβρ)
+, gβρ =
gβρ, etc. Consequently on Σ, from (2)1 we have:
∂αgβρ + (1/2)∂α[gβρ]− Γαβ
νgνρ − Γαρ
νgνβ+
−(1/2)([Γαβ
ν ]gνρ + Γαβ
ν [gρν ] + [Γαρ
ν ]gνβ + Γαρ
ν [gβν ])+
−(1/4)([Γαβ
ν ][gνρ] + [Γαρ
ν ][gβν ]) = 0
Similarly, from the limit f −→ 0− and from (2)2 we also have on Σ:
∂αgβρ − (1/2)∂α[gβρ]− Γαβ
νgνρ − Γαρ
νgνβ+
+(1/2)([Γαβ
ν ]gνρ + Γαβ
ν [gρν ] + [Γαρ
ν ]gνβ + Γαρ
ν [gβν ])+
−(1/4)([Γαβ
ν ][gνρ] + [Γαρ
ν ][gβν ]) = 0
From the sum of expressions (13) and (14) we thus have:
∇αgβρ = (1/4)([Γαβ
ν ][gνρ] + [Γαρ
ν ][gβν ]) (15)
and, from difference:
∂α[gβρ] = [Γαβρ] + [Γαρβ ] (16)
From (16), (3), and from the definition of covariant derivative over Σ, we
then have:
∇α[gβρ] = [Γαβ
ν ]gνρ + [Γαρ
ν ]gβν (17)
As for the jump and the mean value of the Christoffel symbols we have, from
ν = (1/2){gνσ(∂αgβσ + ∂βgσα − ∂σgαβ)+
+(1/4)[gνσ](∂α[gβσ] + ∂β[gσα]− ∂σ[gαβ ])}
ν ] = (1/2){gνσ(∂α[gβσ] + ∂β[gσα]− ∂σ[gαβ ])+
+[gνσ](∂αgβσ + ∂βgσα − ∂σgαβ)
or, from (15) and (17):
ν ]gνρ = (1/2)(∇α[gβρ] +∇β[gρα]−∇ρ[gαβ])
ν ][gνρ] = 2(∇αgβρ +∇βgρα −∇ρgαβ)
3 Mean-value geometry on a hypersurface
Let us consider a 4-vector V α, regularly discontinuous on Σ, the jump and the
mean value of which will work as a prototype of vectors with Σ as support.
We have, by definition:
[∇β∇αV
σ] = ∇β[∇αV
σ]− [Γβα
ν ]∇νV σ + [∇βν
σ]∇αV ν (21)
where [∇αV
σ] = ∇α[V
σ]+ [Γαν
and where, again by definition, we have:
∇νV σ =
{∂ν(V
+)σ + (Γ+)νλ
σ(V +)λ + ∂ν(V
−)σ + (Γ−)νλ
σ(V −)λ} (22)
Thus, from (2) we have:
∇νV σ = ∇νV σ + (1/4)[Γνλ
σ][V λ], (23)
which, incidentally, is the same result we could get from the formal applica-
tion of (3), wich actually can be applied to covariant derivatives, provided
one interpretes ∇ = ∇. We therefore have:
[∇α∇βV
σ] = ∇α∇β[V
σ] +∇β[Γαν
+ [Γαν
σ]∇βV
−[Γβα
ν ]∇νV
− (1/4)[Γβαν][Γνλ
σ][V λ]+
+[Γβν
σ]∇αV
+ (1/4)[Γβν
σ][Γαλ
ν ][V λ]
and thus, by antisymmetrization:
[∇[β∇α]V
σ] = ∇[β∇α][V
σ] +∇[β[Γα]νσ ]V
[Γν[β
σ][Γα]λ
ν ][V λ] (25)
Now, from the Ricci identity we have: [∇[β∇α]V
σ] = [Rαβρ
σV ρ] and then, by
[∇[β∇α]V
σ] = [Rαβρ
+Rαβρ
σ[V ρ], (26)
and thus from a well known identity which follows from (3) as a consequence
our definition (5) for the covariant derivative on Σ, i.e. (see [1]):
[Rαβρ
σ] = ∇β[Γαρ
σ]−∇α[Γβρ
σ] (27)
we have that the commutator of the covariant derivatives of the jump of a
generic regularly discontinuous vector obeys the following Ricci-like formula:
∇[βα][V
(1/2)Rαβρ
σ − (1/4)[Γν[β
σ][Γα]ρ
[V ρ]. (28)
Not surprisingly, working in a similar way starting from ∇β∇αV σ and anti-
symmetrizing, we find again:
∇[βα]V
(1/2)Rαβρ
σ − (1/4)[Γν[β
σ][Γα]ρ
; (29)
in fact any given field with support on Σ can be considered, by the help of
suitable prolongations, as the jump (or as the mean value of) some regularly
discontinuous field. Thus, for any vector V with support on Σ, we can
introduce the following mean-value geometry Ricci-like formula on Σ:
(∇[β∇α])V
σ = (RΣ)αβρ
σV ρ; (30)
where we have introduced the mean-value geometry curvature (RΣ), defined
by the following mean-value geometry first Gauss-Codazzi identity:
(RΣ)αβρ
σ = Rαβρ
σ − (1/4)([Γβν
σ][Γαρ
ν ]− [Γαν
σ][Γβρ
ν ]) (31)
Notice that, for the sake of simplicity, we have introduced a slight abuse
of notation, since in [1] and [16] the same symbol RΣ instead denotes the
inner curvature defined with the help of projections. Actually anything goes
like in [1] section 4 with the Gauss-Codazzi framework, with the difference
that here we don’t have to make projections, which would involve product
times a discontinuous tangent metric. Moreover here we don’t even have
to distinguish between the cases of Σ timelike or lightlike. In other words
our mean-value differential geometry on a hypersurface is a very simple, in
conceptual terms, analogue of the Gauss-Codazzi apparatus.
Thus, with the heuristic theory of [1] section 6 (see also [4] for the timelike
case) in mind as a prototype, we expect the jump of the Christoffel symbols to
play the main role, in place of the secund fundamental form, in the definition
of compatibility conditions for very weak solutions of the Einstein equations.
Indeed, this happens, as it will be shown in the following.
4 Complex mean-value formalism
The metric being dicontinuous on Σ, we are missing the fundamental tool
to rise and lower indices, and to construct curvature in the traditional way.
This is the reason why sometimes one is tempted to introduce some hybrid
metric object on Σ to replace the metric, even in the (C0, piecewise C1) case
(see e.g. [5]). It is reassuring to find out that the framework of the preceeding
section can be confirmed by such a kind of approach.
It would be desiderable to simply replace g with g on Σ, but it is easy to
check that g has not the necessary algebraic requisites; in particular we have
αρ 6= δβ
ρ. Consider instead:
g̃αβ = gαβ + i(1/2)[gαβ], g̃
αβ = gαβ − i(1/2)[gαβ] (32)
where i is the imaginary unit (i.e. we have i2 = −1). It is easy to check,
with the help of (3), that we have:
g̃αβ g̃
αρ = δα
ρ + i[gαβ ]g
αρ (33)
i.e., in particular: ℜ(g̃αβ g̃
αρ) = δβ
ρ. For the sake of brevity in the following
we will denote A ≈ B the relation ℜ(A) = ℜ(B). Thus the pair g̃αβ and
g̃αβ is a good candidate replacement for the metric on Σ, for the purposes of
rising and lowering indices. Now, similar to (32) let us introduce:
Γ̃αβν = Γαβν + i(1/2)[Γαβν ], Γ̃αβ
σ = Γαβ
σ − i(1/2)[Γαβ
σ] (34)
so that we have: Γ̃αβ
σ ≈ Γ̃αβν g̃
σν and conversely: Γ̃αβν ≈ Γ̃αβ
σg̃νσ. Let us
now introduce the differential operator ∇̃ on Σ, which makes use of Γ̃ in
place of Γ. As we could expect we have:
∇̃ρg̃αβ ≈ 0, ∇̃ρg̃
αβ ≈ 0 (35)
which is the replacement on Σ for the covariant conservation of the metric
tensor.
Now let us construct on Σ the complex curvature tensor R̃ in the familiar
way, but with Γ̃ in place of the ordinary Christoffel symbols (which are
undefined on Σ):
R̃αβρ
σ = ∂βΓ̃αρ
σ − ∂αΓ̃βρ
σ + Γ̃βµ
σΓ̃αρ
µ − Γ̃αµ
σΓ̃βρ
µ (36)
We rather unespectedly find out that
R̃αβρ
σ = (RΣ)αβρ
σ + i(1/2)[Rαβρ
σ] (37)
i.e. in particular we have: R̃αβρ
σ ≈ (RΣ)αβρ
σ, where RΣ is given by (31).
This is just another reason for identifying RΣ as the replacement for the
curvature tensor of Σ, which is the first step of our path to the generalized
compatibility conditions.
5 Generalized compatibility conditions
Let us now consider limit f → 0+ of the curvature tensor of the subdomain
Ω+; by (2) we have:
(Rαβρ
σ)+ = Rαβρ
σ + (1/2)[Rαβρ
σ] (38)
and, by (27):
(Rαβρ
σ)+ = Rαβρ
σ +∇[β[Γα]ρ
σ] (39)
We also have, by (31):
(Rαβρ
σ)+ = (RΣ)αβρ
σ +∇[β[Γα]ρ
σ] + [Γν[β
σ][Γα]ρ
ν ] (40)
We see that R and RΣ only differ by terms proportional to [Γ], and not
involving derivatives of it. Thus, in view of neglecting these tems, in the
following we will consider R instead ofRΣ; this simply avoids the introduction
of the symbol “ ∼= ”, with the meaning of equality but for terms not involving
derivatives of [Γ] (which here replaces the second fundamental form K) as in
[1] section 6. Then for the Ricci tensor Rβρ = Rαβρ
α we have:
(Rβρ)
+ = Rβρ + (1/2)∇µ
µ[Γνρ
ν ]− [Γβρ
and for the curvature scalar R = Rα
R+ = R + (1/2)∇µ
µν ]− [Γν
Now, to construct the Einstein tensor G+ we have to remember that, since
the metric is also discontinuous:
(gαβ)
+ = gαβ + (1/2)[gαβ] (43)
so that we have:
(Gβρ)
+ = Gβρ + (1/2)∇µ
µ − (1/8)[gβρ]
µν ]− [Γν
where we have denoted, for the sake of brevity:
µ[Γνρ
ν ]− [Γβρ
µ]− (1/2)gβρ
µν ]− [Γν
Let us fix a coordinate chart and consider a generic (for the moment) regular
prolongation for G, so that its mean value is defined in the whole Ω. Now
consider the Riemann 4-volume integral of G+ over the domain Ω+; from
the Green theorem we obtain (for the general definition of integral on a
hypersurface see [6] p. 6):
Gβρ =
Gβρ + (1/2)
ℓ+µHβρ
µ − (1/8)
ℓ+µ [gβρ]
µν ]− [Γν
The analogous formula for Ω− involves −ℓ− as the outgoing normal vector
and the metric g−αβ = gαβ − (1/2)[gαβ], so we have:
Gβρ =
Gβρ + (1/2)
ℓ−µHβρ
µ + (1/8)
ℓ−µ [gβρ]
µν ]− [Γν
and consequently we have:
Gβρ =
Gβρ +
ℓµHβρ
µ (48)
Thus reasons similar to those of the heuristic theory (see [4] and [1] section
6) lead to the reasonable hypothesis that G remain bounded in the neigh-
bourhood of Σ, for any admissible prolongation, so that from the volume
integral of the Einstein equations, with the presence of an eventual source
term concentrated on Σ:
Gαβ = −χ
Tαβ − χ
T̆αβ (49)
where χ denotes the gravitational constant, we conclude that
ℓµHβρ
µ = −χ
T̆βρ (50)
which is our heuristic reason for considering the following set of general-
ized compatibility conditions to hold on Σ as a replacement for the Einstein
equations:
ℓµHβρ
µ = −χT̆βρ (51)
Here T̆ represents the stress-energy content of the hypersurface.
In the simpler case ℓα ∈ C
0, it is very easy to check that the object ℓµHβρ
is gauge-invariant in the sense of (11), as we could hope.
Turning now to the comparison with the C0 case, we see from eq.s (71)
and (85) of [1] that our generalized conditions (51) are formally identical to
ordinary compatibility conditions [eq. (110) of the same paper], if expressed
in terms of [Γ] (which in the general case is a function of the jump of the
metric [g] as well as of its weak discontinuity ∂g). Therefore it is clear
that generalized compatibility conditions reduce to ordinary ones in case the
metric is continuous, i.e. in case [gαβ] = 0.
In particular, let us suppose g ∈(C0, piecewise C1) and f ∈ C0(Ω); let
us moreover suppose (ℓ · ℓ) > 0, i.e. Σ timelike. By definition of Christoffel
symbols, and from (11) of [1], we have:
σ] = (ℓ · ℓ)−1/2(NβGρ
σ +NρGβ
σ −NσGβρ) + (ℓ · ℓ)
1/2NβNρQ
σ (52)
Q is a vector which can be set to zero with a suitable gauge choice; it plays
no role in (51), as one would expect, in fact we have:
ℓµ[Γβρ
µ] = −Gβρ + (ℓ · ℓ)Nβρ(Q ·N)
ℓβ[Γνρ
ν ] = NβNρGν
ν + (ℓ · ℓ)NβNρ(Q ·N)
ℓµ[Γν
µν ] = Gν
ν + (ℓ · ℓ)(Q ·N)
ℓµ[Γν
νµ] = −Gν
ν + (ℓ · ℓ)(Q ·N)
and, since g = g = h(N) +N ⊗N , we have from (45):
ℓµHβρ
µ = Gβρ − h(N)βρGν
ν (54)
i.e., according to (88) of [1]:
ℓµHβρ
µ = Hβρ (55)
as expected. Now let us instead suppose (ℓ · ℓ) = 0, i.e. Σ lightlike. Let
u ∈ C0(Ω) be a given auxiliary reference frame. From eq.s (21) and (16) of
[1] we have:
σ] = (u ·ℓ)−1(−LβF(u)ρ
σ−LρF(u)β
σ+LσF(u)βρ)+(u ·ℓ)
2LβLρQ̂
σ (56)
and consequently, from (18) and (19) of the same paper:
ℓµ[Γβρ
µ] = LβB(u, n)ρ + LρB(u, n)β − (u · ℓ)
3LβLρ(Q̂ · L)
ℓβ[Γνρ
ν ] = LβLρG(u, n)ν
ν − (u · ℓ)3LβLρ(Q̂ · L)
ℓµ[Γν
µν ] = ℓµ[Γν
νµ] = 0
We therefore have:
ℓµHβρ
µ = G(u, n)ν
νLβLρ − LβB(u, n)ρ − LρB(u, n)β (58)
i.e. again, according to (83) of [1], we have: ℓµHβρ
µ = Hβρ, as expected.
Therefore the set (51) of compatibility conditions, together with ordinary
Einstein Equations to hold on each side of the discontinuity hypersurface,
defines the class of generalized regularly discontinuous solutions of the Ein-
stein equations. And in case [g] = 0, i.e. for continuous metric, from such
conditions we recover the ordinary compatibility conditions for regularly dis-
continuous weak solutions.
However, in the generic case we have some differences, as we are going to
show in the following.
6 A class of spherical boundary layers
Let us consider the match of two piecewise-C0 regularly discontinuous spher-
ical solutions of the vacuum Einstein equations, of the form
ds2 = −a(r, t)dt2 + b(r, t)dr2 + r2dΩ2 (59)
with dΩ2 = dθ2 + sin2 θdϕ2, across a spherical admissible gravitational dis-
continuity hypersurface Σ of equation r = ρ(t), with ρ(t) ∈ C1. Therefore
the form ℓα = δα
r − ρ̇δα
t is continuous (while ℓβ = gβαℓα in general is not).
We suppose globally C0 coordinates, the same form of the metric in both
domains Ω+ and Ω−, and the identification t+ = t−, r+ = r−, θ+ = θ−,
ϕ+ = ϕ− on Σ. Leta, b > 0 and let a, b ∈ piecewise-C0 be regularly dis-
continuous on Σ and with regularly discontinuous first derivatives. Let us
denote by a dot the partial derivative with respect to t, and by a prime that
with respect to r. Let moreover condition a− bρ̇ > 0, i.e. (ℓ · ℓ) > 0, hold on
both sides on Σ.
We have:
[gαβ ] = −[a]δα
t + [b]δα
r (60)
Now let us define the match as a generalized regularly discontinuous solution
by (51), with T̆ = 0, i.e. in the absence of stress-energy concentrated on Σ.
In this case our compatibility conditions reduce to:
ℓβ[Γµρ
µ]− ℓµ[Γβρ
µ] = 0 (61)
which, for a match of metrics of the kind (59), are equivalent to the following
system:
ρ̇[ḃb−1] + [a′b−1] = 0
ρ̇[ḃa−1] + [a′a−1] = 0
ρ̇[a′a−1] + [ȧa−1] = 0
ρ̇[b′b−1] + [ḃb−1] = 0
[b−1] = 0
i.e. we have [b] = 0 and consequently:
ρ̇[ḃ] + [a′] = 0
ρ̇[ḃa−1] + [a′a−1] = 0
ρ̇[a′a−1] + [ȧa−1] = 0
ρ̇[b′] + [ḃ] = 0
and from (3):
ρ̇[ḃ] + [a′] = 0
(ρ̇ḃ+ a′)[a−1] = 0
(ρ̇ a′ + ȧ)[a−1] + (ρ̇[a′] + [ȧ])a−1 = 0
ρ̇[b′] + [ḃ] = 0
Now if we had both ρ̇[ḃ] + [a′] = 0 and ρ̇ḃ + a′ = 0, by (2) we would have
ρ̇ḃ + a′ = 0 on both sides of the hypersurface. We discard for the moment
this singular situation, and from (64)2 we conclude that [a
−1] = 0.
Thus in this case our generalized compatibility conditions imply [a] =
[b] = 0, i.e. they force the match to be C0, piecewise-C1.
In [1] we have already studied some examples of C0, piecewise-C1 matchs
of metrics of the kind (59) at a hypersurface of constant radius r = rb, with
ℓα = δα
r. Namely, we have considered: external Schwarzschild - internal
Schwarzschild; external Schwarzschild - Tolman VI; external Schwarzschild -
Tolman V. Such matchs obviously have ℓα ∈ C
0; moreover condition ρ̇ ḃ+a′ 6=
0 reduce in this case to a′ 6= 0, which is obviously satisfied. In each case we
have verified that condition [a] = [b] = 0 imply ∂a = 0 (where ∂ denotes first
order discontinuity), which then define the match as a boundary layer [1] (it
actually also imply ∂b = 0, as one can verify). Such is a general result, since
for a metric of the kind (59) the completely temporal and radial components
of the Einstein tensor are independent from the second derivatives of the
metric:
Gtt = −a(b
′r + b2 − b)/r2b2
Grr = −(a
′r − ab+ a)/ar2
so that the corresponding vacuum Einstein equations reduce to:
b′r + b2 − b = 0,
a′r − ab+ a = 0.
Now, since in the match of (59) vacuum solutions equations (66) are satisfied
on each side of the interface Σ, their jump is in particular null, and from (3)
we have:
′] + (2b− 1)[b] = 0
′]− a[b]− (b+ 1)[a] = 0
from which it clearly follows that conditions [a] = [b] = 0 imply [a′] = [b′] = 0,
i.e. ∂a = ∂b = 0.
Summarizing, for the match of two piecewise-C0 regularly discontinuous
spherical solutions, in the above hypothesis, generalized compatibility con-
ditions (51) imply [a] = [b] = 0 i.e. they force the match to be C0. On
the other hand conditions [a] = [b] = 0 imply that Σ is a boundary layer.
Therefore for such spherical matchs generalized compatibility conditions (51)
are necessary and sufficient for the match to be a boundary layer.
7 Generalized gravitational shock waves
Let us consider the match of two plane wave metrics of the form
ds2 = −2dξdη + F (ξ)2dx2 +G(ξ)2dy2 (68)
across a hypersurface Σ of equation ξ = 0. Here ξ and η are the two null
coordinates. We suppose continuously matching coordinates and F,G reg-
ularly discontinuous, together with their first and second derivatives. The
gradient vector of Σ is the continuous characteristic (on each side of Σ) vector
ℓα = δα
Generalized compatibility conditions (51) in the case T̆ = 0 (i.e. no stress-
energy concentrated on the hypersurface) reduce to the following single scalar
equation:
[F−1F ′ +G−1G′] = 0 (69)
which characterize the generalized gravitational shock wave. Let us now
study compatibility of (69) with the Einstein Equations. Einstein vacuum
equations also reduce to a single scalar equation:
F−1F ′′ +G−1G′′ = 0 (70)
which we suppose to hold on each side of the hypersurface Σ; thus replacing
F+ and G+ by their expressions in terms of F , [F ], G and [G] according to
(2) gives rise to the following couple scalar conditions:
(2F ′′ + [F ′′])(2G+ [G]) + (2G′′ + [G′′])(2F + [F ]) = 0 (71)
(2F ′′ − [F ′′])(2G− [G]) + (2G′′ − [G′′])(2F − [F ]) = 0 (72)
Equations (71)-(72) are compatible with (69), i.e. the three equations set
can be solved algebraically with respect to F , [G] and to any member of the
pair (F , G), and the solution is not necessarily trivial.
Finally let us notice that, if the additional condition [F ] = [G] = 0 holds,
i.e. if the solution is C0, condition (69) reduces to F−1[F ′]+G−1[G′] = 0 i.e.:
F−1∂F +G−1∂G = 0 (73)
which is the analogous condition for the ordinary shock wave, according to
[1] section 10.5.
8 Slow generalized gravitational waves
Let us start trying to match two vacuum solutions of the kind (68) across
the timelike (on both sides) hypersurface Σ of equation ξ = ζ . Again we
suppose continuously matching coordinates, F,G regularly discontinuous to-
gether with their first and second derivatives, and T̆ = 0. This times gener-
alized compatibility conditions include (69) and the following two additional
scalar conditions:
[FF ′] = 0, [GG′] = 0 (74)
i.e., in terms of F , [F ], G and [G], according to (3):
F [F ′] + [F ]F ′ = 0 (75)
G[G′] + [G]G′ = 0 (76)
It is easy to check that the system (75)-(76) is not compatible with (71)-(72),
in the sense that the whole system does not admit non-trivial solutions for
F , [F ], G and [G].
On the other hand we have proved in section 6 that a wide class of gen-
eralized spherical matchs at a hypersurface of constant radius necessarily
degenerate to a C0 match.
Other examples of degeneracy have not been included in the paper for
the sake of brevity, but at least it seems to be a hard task to construct a non-
trivial generalized match across a timelike (on each side) hypersurface, with
no stress-energy content. Such difficulty is certainly not a proof that this is
an impossible task, but it makes us wonder whether such a solution should
necessarily degenerate to a boundary layer, just like it happens for ordinary
C0 solutions (see e.g. [1]). This would forbid the existence of generalized
solutions which propagate at a speed slower than light. Such would be a
desiderable prohibition under certain respect, since one could expect that
gravitational interactions in vacuo must necessarily propagate at the speed
of light also in a generalized theory.
In general terms, since for generalized solutions the metric is discontinu-
ous, a hypersurface can in principle have different signatures on the different
sides; for this reason we cannot simply distinguish between the timelike and
the lightlike case, as for usual C0 solutions. We should rather distinguish
between three cases: timelike-timelike, timelike-lightlike (or conversely) and
lightlike-lightlike.
In any case it is legitimate to expect that, at least in the timelike-timelike
case, similar to the timelike case of (C0, piecewise C1) solutions, absence of
stress-energy concentrated on Σ should imply the solution to degenerate to
a boundary layer [1].
Unfortunately for generalized solutions we still have no proof that absence
of stress energy concentrated on Σ does necessarily imply the degeneracy of
the solution to a boundary layer.
Therefore, although the examples considered in this paper seem to sug-
gest that such property could hold true also in the generalized case, for the
moment such result is still a conjecture; we thus have to admit that the the-
ory in principle allows propagation of generalized gravitational shock waves
at lower speed than the speed of light. We would call such waves “slow gen-
eralized gravitational shock waves”. It would be reasonable to forbid this
situation as unphysical, but for now this can only be done ad hoc, by means
of a corresponding additional hypothesis.
9 Non-symmetric stress-energy
Notice that ℓµHβρ
µ is not necessarily symmetric; from identity:
ν = (1/2)g−1∂αg (77)
where g denotes the determinant of the contravariant metric, we have:
ℓµH[βρ]
µ = (1/4)(ℓβ[g
−1∂ρg]− ℓρ[g
−1∂βg]) (78)
Thus the generalized scheme allows in principle the presence of non sym-
metric stress-energy on the discontinuity hypersurface. We will display non-
trivial examples of non-symmetry in the following section. Notice that the
right hand side of (78) is identically null in case g ∈ C0 and ℓα ∈ C
0, since
in this case we have [g−1∂βg] = ℓβg
−1∂g.
A non-symmetric Einstein tensor is a feature of Einstein-Cartan theory of
gravitation (see [19], see also [3] section 7.2), where it is due to the presence
of torsion in the non-symmetric connection used to construct generalized
curvature. Thus the generalized theory can be interpreted, at least to some
extent, as introducing a torsion equivalent tool on the shell only, even if there
are no geometrical objects in our theory which can be directly interpretated
as torsion. However, Einstein-Cartan theory also has a spin - angular mo-
mentum field equation in addition to the Einstein equations, which here is
missing.
In the literature, compatibility conditions for C0 solutions of boundary
layers [20], and recently of shock waves and thin shells [21], have been studied
also in the framework of Einstein-Cartan theory; actually this can lead to
non-symmetric stress-energy on the shell. But in that theory this feature
is inherited from the ambient spacetime, which is not here: non-symmetric
stress-energy arises on the shell only, in consequence of the theory. This
interesting feauture probably is worth investigating.
10 Generalized thin shells
Now let us consider a more general form of the spherical metric:
ds2 = −a(r, t)dt2 + b(r, t)dr2 + c(r, t)dΩ2 (79)
Let us consider a match of two spherical solutions of the Einstein equations
across a timelike (on each side) hypersurface of equation r = ρ(t). Again we
suppose ρ(t) ∈ C1 and therefore ℓα = δα
r − ρ̇δα
t ∈ C0. Let the coordinates
be C0 globally, and let the metric have the same form (79) in both domains
Ω+ and Ω−, with the identification t+ = t−, r+ = r−, θ+ = θ−, ϕ+ = ϕ− on
Let moreover a, b, c > 0 and let a, b, c ∈ piecewise-C0 be regularly dis-
continuous on Σ and with regularly discontinuous first derivatives. Again we
denote by a dot the partial derivative with respect to t, and by a prime that
with respect to r.
In this case for the left hand side of the generalized compatibility condi-
tions ℓµHβρ
µ we obtain:
ℓµHβρ
µ = −
[a′b−1/2] + ρ̇[ḃb−1/2 + ċc−1]
+([a′a−1/2 + c′c−1] + ρ̇[ḃa−1/2])δβ
+([ȧa−1/2 + ċc−1] + ρ̇[a′a−1/2])δβ
−([ḃb−1/2] + ρ̇[b′b−1/2 + c′c−1])δβ
+([c′b−1/2] + ρ̇[ċa−1/2])(δβ
θ + sin2 θδβ
[b−1(a′a−1/2 + c′c−1)] + ρ̇[a−1(ḃb−1/2 + ċc−1)]
where, obviously:
gβρ = −aδβ
t + bδβ
r + c(δβ
θ + sin2 θδβ
ϕ) (81)
We now mean to interpret (80) as the matter-energy of a thin shell. Let
us first get back to the particular case ρ̇ = 0 (static shell) and ȧ = ḃ =
ċ = 0, to make the interpretation simpler by eliminating the non-symmetric
component; rearranging some terms we in fact obtain:
ℓµHβρ
µ = (−[a′b−1/2] + ac−1[c′b−1/2])δβ
+([a′a−1/2 + c′c−1]− bc−1[c′b−1/2])δβ
c−1[c′b−1/2]− [b−1(a′a−1/2 + c′c−1)]
This can be interpretated as a perfect isotropic magneto-fluid thin shell with
infinite conductivity, i.e. we can solve the compatibility conditions by con-
sidering the following stress-energy as the right hand side:
T̆αβ = (ρ0 + p+ µh
2)UαUβ + (p+ (1/2)µh
2)gαβ − µhαhβ (83)
where ρ0 is the proper density, h the magnetic field and µ the magnetic
permeability [22, 23, 24, 25, 6]; here we define h2 = hαhβg
αβ . In fact it
suffices to define the following 4-velocity vector:
a− (1/4)[a2]a−1δα
t = (a−1)1/2δα
t (84)
which by construction is unitary on Σ, in the following sense: UαUβg
αβ = −1,
and the following magneto-hydrodynamical variables:
ρ0 = χ
−1a−1[a′b−1]/2 + χ−1c−1(b b−1/2− aa−1 + 1)[c′b−1]/2+
−χ−1[b−1](a′a−1/2 + c′c−1)− (3/2)χ−1b−1[a′a−1/2 + c′c−1]
p = χ−1c−1(bb−1/2− 1)[c′b−1/2] + (1/2)χ−1b−1[a′a−1/2 + c′c−1]+
+χ−1[b−1](a′a−1/2 + c′c−1)
hα = ±
b−1χ−1([a′a−1/2 + c′c−1]− bc−1[c′b−1/2])δα
to match (82) and (83) via ℓµHβρ
µ = −χT̆βρ. If [a] = [b] = [c] = 0 then the
generalized shell (85) degenerates to the C0 magnetohydrodynamical shell
considered in [1] section 10.1, in the particular case ρ̇ = 0.
The slightly more general case of ȧ = ḃ = ċ = 0, but ρ̇ 6= 0, displays
non-symmetric terms in (80); however it is not difficult to see that the per-
fect magnetofluid interpretation still holds, provided such additional non-
symmetric terms are interpreted, or neglected. In fact in this case we have:
ℓµHβρ
µ = (−[a′b−1/2] + a c−1[c′b−1/2])δβ
+([a′a−1/2 + c′c−1]− bc−1[c′b−1/2])δβ
+(1/2)ρ̇[a′a−1/2− b′b−1/2− c′c−1](δβ
t + δβ
+(1/2)ρ̇[a′a−1/2 + b′b−1/2 + c′c−1](δβ
t − δβ
c−1[c′b−1/2]− [b−1(a′a−1/2 + c′c−1)]
Now let us consider, for the sake of brevity, the following quantities:
ρ̇2[ a
]2b−1 − (a
]− [ a
])2a−1
]− [ a
]− [ a
and let us suppose that inequality α < 0 holds, which is necessary for the
physical interpretation. In fact in this case the following vector:
]− [ a
t + 1
ρ̇[ a
]− [ a
is a unit timelike vector on Σ, in the sense that UαUβg
αβ = −1. Rearranging
terms, (86) now reads:
ℓµHβρ
µ = αUβUρ + βδβ
r + 1
ρ̇[ a
t − δβ
c−1[ c
]− [b−1( a
which can be matched via ℓµHβρ
µ = −χT̆βρ with a stress-energy tensor of
the following kind:
T̆αβ = (ρ0 + p+ µh
2)UαUβ + (p+ (1/2)µh
2)gαβ − µhαhβ + Aαβ (91)
where A denotes the anti-symmetric term. We have:
ρ0 = −χ
−1α + χ−1 1
]− χ−1[b−1( a
)]− 1
p = −χ−1 1
] + χ−1[b−1( a
)]− 1
µh2 = χ−1βb−1
while the anti-symmetric term A reads:
Aαβ = −χ
t − δβ
r) (93)
The interpretation of such term is still missing; alternatively it could be
neglegted by adding the further hypothesys:
= 0 (94)
which is equivalent to [g−1g′] = 0, as we could expect from (78).
References
[1] G. Gemelli Gen. Rel. Grav. 34 1491-1540 (2002).
[2] S.O’Brien, J.L.Synge Comm. Dublin Inst. Adv. Stud. Ser. A 9 1-20
(1952).
[3] C. Barrabes, P.A. Hogan Singular null hypersurfaces in general relativity
World Scientific, Singapore (2003).
[4] W. Israel, Nuovo Cimento B 44 1 (1966); corrections in 48, 463.
[5] C. Barrabes, W. Israel Phys. Rev. D 43 1129 (1991).
[6] A. Lichnerowicz, Magnetohydrodynamics: waves and shock waves in
curved space-time,Mathematical physics studies vol. 14, Kluwer academic
publishers, Dordrecht, 1994.
[7] L. Schwarz, C. R. Acad. Sci. Paris 239 847 (1954).
[8] J.F. Colombeau J. of Math. Anal. and appl. 94 96 (1983).
[9] J.F. Colombeau New generalised functions and multiplication of distribu-
tions North-Holland (1984).
[10] J.F. Colombeau Multiplication of distributions Lecture notes in Mathe-
matics 1532, Springer (1992).
[11] J.F. Colombeau, A. Meril J. Math. Anal. Appl. 186 (1984).
[12] J.A. Vickers, J.P. Wilson Class. Quantum Grav. 16 579-588 (1999).
[13] H. Balasin, Class. Quantum Grav. 14 (1997).
[14] R. Steinbauer J. Math. Phys. 38 1614 (1997).
[15] C.J.S. Clarke, J.A. Vickers, J.P. Wilson Class. Quantum Grav. 13 2485
(1986).
[16] G. Gemelli, J. Geom. Phys. 43/4 371-383 (2002).
[17] C. Cattaneo Istit. Lombardo Accad. Sci. Lett. Rend. A 112 139 (1978).
[18] G. Gemelli J. Geom. Phys. 20 233 (1996).
[19] E. Cartan C. R. Acad. Sci. Paris 174 593-595 (1922).
[20] W. Arkuszewsky, W Kopczynski, V.N. Ponomariev Commun. Math.
Phys. 45 183-190 (1985).
[21] G.F. Bressange Class. Quantum Grav. 17 2509-2523 (2000). bibitem-
lich67
[22] A. Lichnerowicz, Ann. Inst. H.Poincaré 7 271 (1967).
[23] A. Lichnerowicz, Comm. Math. Phys. 12 145 (1969).
[24] A. Lichnerowicz, in “Centr. Int. Mat. Est. 1970,” Cremonese, Roma,
(1971).
[25] A. M. Anile Relativistic fluids and magneto - fluids Cambridge Univer-
sity Press, Cambridge (1989).
	Introduction
	Discontinuous metrics
	Mean-value geometry on a hypersurface
	Complex mean-value formalism
	Generalized compatibility conditions
	A class of spherical boundary layers
	Generalized gravitational shock waves
	Slow generalized gravitational waves
	Non-symmetric stress-energy
	Generalized thin shells
ABSTRACT
  The physical consistency of the match of piecewise-$C^0$ metrics is
discussed. The mathematical theory of gravitational discontinuity hypersurfaces
is generalized to cover the match of regularly discontinuous metrics. The
mean-value differential geometry framework on a hypersurface is introduced, and
corresponding compatibility conditions are deduced. Examples of generalized
boundary layers, gravitational shock waves and thin shells are studied.

<|endoftext|><|startoftext|>
Introduction
This paper is a step in a broader program, which aims at finding a geomet-
ric counterpart to the Mirror Symmetry phaenomenon, and possibly a geometric
language in which to formulate a physical theory interpolating between different
σ-models. While we direct the reader to [G2],[G3] for more details, we list here
only some aspects of this theory to put the present work into context.
In the Strominger-Yau-Zaslow approach to Mirror Symmetry you have that two
mirror dual Calabi-Yaus should posses (in some limiting sense) semi-flat special
lagrangian torus fibrations f : M → B, f̂ : M̂ → B which have as fibres flat tori
which are dual in the metric sense (see [SYZ], and [G2] for the terminology and the
definitions). As it is widely known, the major drawback of this approach is that it
is very difficult to build special lagrangian tori fibrations. Usually this construction
can be carried out only when the dual Calabi-Yau manifolds are actually hyper-
kahler, and the special lagrangian tori can be viewed as complex submanifolds (with
respect to a rotated complex structure), so that the methods of complex algebraic
geometry can be put to work.
When you do have the fibrations, then the idea is to construct the mirror map as
a sort of Fourier-Mukai transform (see for example [BMP]). This Fourier-Mukai
transform is a correspondence induced by pull-back and push forward from the
space X = M ×B M̂ . In the hyperkähler case this space is a complex manifold,
while in the general case (for example for Mirror Symmetry for Calabi-Yau three-
folds) it is just a real manifold of (real) dimension 3 · dimC(M).
Background. The notion of (Weakly) self-dual manifold (cf. [G2]) was con-
ceived in the first place to isolate the geometric aspects of the X above which are
needed to obtain Mirror Symmetry betweenM and M̂ . We reproduce here the def-
inition for the reader, while referring to [G2] and [G3] for all the remarks, examples
and observations:
Definition 1.1. A weakly self-dual manifold (WSD manifold for brevity) is given
by a smooth manifold X, together with two smooth 2-forms ω1, ω2 a Riemannian
metric and a third smooth 2-form ωD (the dualizing form) on it, which satisfy the
following conditions:
1) dω1 = dω2 = dωD = 0 and the distribution ω
1 + ω
2 is integrable.
2) For all p ∈ X there exist an orthogonal basis dx1, .., dxm, dy
1 , ..., dy
m, dy
1 , ..., dy
Date: October 24, 2006.
http://arxiv.org/abs/0704.0104v1
2 GIOVANNI GAIFFI, MICHELE GRASSI
dz1, ..., dzc, dw1, ..., dwc of T
pX such that the dx1, .., dxm, dy
1 , ..., dy
m, dy
1 , ..., dy
are orthonormal and
(ω1)p =
dxi ∧ dy
i , (ω2)p =
dxi ∧ dy
i , (ωD)p =
dy1i ∧ dy
dzi ∧ dwi
Any orthogonal basis of TpX dual to a basis of 1- forms as above is said to be
adapted to the structure, or standard. The number m is the rank of the structure.
For a more intrinsic definition of WSD manifolds the reader should refer to [G2].
Here we have chosen the quickest way to introduce them.
When the forms ω1, ω2, ωD are covariant constant with respect to the Levi-Civita
connection, we speak of 2-Kähler manifolds. An example of these comes from mirror
symmetry for abelian varieties.
Remark 1.2. The form ωD is symplectic once restricted to ω
1 + ω
2. We have
therefore that ω
dim(X)−m
6= 0.
Definition 1.3. 1) A WSD manifold is nondegenerate if dim(ω01 ∩ ω
2)p = 0 at all
points (equivalently if its dimension is 3 times the rank).
2) A WSD manifold is self-dual (SD manifold for brevity) if all the leaves of the
distribution ω01 + ω
2 have volume one (with respect to the volume form induced by
the metric)
Using Self dual manifolds, you can give a first näıve geometric definition of Mir-
ror Symmetry as follows:
Two Calabi-Yau manifolds with B-field (M,BM ) and (M̂,BM̂ ) are mirror dual
if there is a Self-dual manifold X together with surjections π : X → M and
π̂ : X → M̂ such that:
a) π∗(ωM ) = ω1, π̂
) = ω2.
b) The leaves of ω⊥1 are the fibres of π̂
c) The leaves of ω⊥2 are the fibres of π
d) The induced B-fields on M and M̂ are the ones given.
Here make their first appearence the B-fields BM and BM̂ , which are flat unitary
gerbes on M and M̂ respectively, and which are not relevant for the discussions of
this paper. In [G2] it was shown that this picture works well in the case of elliptic
curves, and for some other flat situations.
Physical motivation. One of the reasons to introduce SD manifolds however
was to get rid of special lagrangian fibrations, which are so difficult to construct,
and to be able to attack the problem of Mirror Symmetry also when these fibra-
tions are not expected to exist. In this more general context one expects that the
Mirror Symmetry phaenomenon will not be obtained directly from fibrations of
a SD manifold to the dual Calabi-Yaus, but via a more sophisticated procedure,
which involves a Gromov-Hausdorff type of limit. In [G3] it was shown that for
the family of anticanonical divisors in complex projective space one can build a
(real) two-dimensional family of WSD manifolds, which degenerate in a normalized
Gromov-Hausdorff sense to the correct limits of the mirror dual Calabi-Yaus. The
picture is the following:
A GEOMETRIC REALIZATION OF sl(6,C) 3
MB MAS
where MA and MB are the large Kähler and large complex structure limits of M
and M̂ respectively. To be precise, the manifolds which come out of the costruc-
tion of [G3] are 11 dimensional (degenerate) Weakly self-dual manifolds or rank 3.
Dimension 11 is very appealing in this context from a physical point of view, and
it brings us to the motivation for the present work.
The point of view of [G3] is very different from the current one in the main liter-
ature on mathematical Mirror Symmetry: instead of considering the fibre product
M×B M̂ (when it exists) as a device for proving Mirror Symmetry for Calabi-Yaus,
the limiting Calabi-Yaus of Mirror Symmetry are seen as very special limits of a
family of Self-Dual manifolds, which are the main objects of study. This is actually
more in line with what can be found in the physical literature, where the σ-models
defining the string theories from which Mirror Symmetry originates are seen as
just ”phases” of a unique theory, which is not necessarily in the form of a σ-model
but could very likely be similar to a quantized Gauge theory on an 11-dimensional
manifold. To make this circle of ideas more concrete (and hence more verifiable) at
the end of [G3] it is suggested that one should try to build a natural gauge theory
on Self-dual manifolds: the hope is that once quantized this gauge theory might in-
terpolate between the σ-models associated to the Calabi-Yau’s, and as a byproduct
prove Mirror Symmetry for them. Of course one can always put a gauge bundle on
the Self-dual manifolds ”artificially”, but a natural bundle which depends only on
the geometric structure would be much more appealing. We ignore here the issue
of which action to put on the theory, but it too should be a natural geometric one.
Finally, on [GG] we analyzed the situation for rank three WSD manifolds, and we
found that in this case the corresponding natural bundle is formed by complex Lie
superalgebras. We were able to find a geometrically motivated real form, and to
split it into simple factors. The results of [GG] confirm the suspicion that on a WSD
manifold of high enough rank there could be enough natural algebraic bundles of
operators to build interesting gauge theories.
The construction of LC. From a physical point of view the case of Calabi-
Yau threefolds (i.e. rank three WSD manifolds) or fourfolds (i.e. rank four WSD
manifolds) would be the most interesting one to start with. However, its technical
difficulty convinced us to start more modestly from the case of Calabi-Yau two-
folds (i.e. K3 surfaces) which correspond to rank two Self-dual manifolds. We also
considered only orientable nondegenerate Self-dual manifolds of rank two, hence of
dimension 6. This could be considered a proof of concept from a physicist’s point
of view, however Mirror Symmetry for K3’s is in itself very interesting mathemati-
cally, so we hope that our results could have some useful geometric consequences.
The rank three case is treated in our subsequent [GG], as mentioned in the previous
section of this introduction. The main result of the present paper is the following
4 GIOVANNI GAIFFI, MICHELE GRASSI
(which is a geometric restatement of Theorem 5.11):
The Lie algebra sl(6,C) acts via canonical operators (depending only on the geo-
metric structure) on the smooth differential forms of any orientable nondegenerate
WSD manifold of rank 2.
This action generalizes naturally the action of sl(2,C) on smooth differential
forms of any almost Kähler manifold, and is induced by a bundle action on the
exterior power of the cotangent bundle.
Recall that a Weakly self-dual manifold is a Riemannian manifold with three
”compatible” closed differential forms. We will build a Lie algebra of pointwise
operators on complex differential forms on X , as smooth sections of a bundle of Lie
algebras of operators on the complexified cotangent bundle of X . To start, one can
define the following operators:
Definition 1.4. For φ ∈ Ω∗
L0(φ) = ωD ∧ φ, L1(φ) = −ω2 ∧ φ, L2(φ) = ω1 ∧ φ
One can notice immediately the strong resemblance of the operators above with
the Lefschetz operator of Kähler geometry. Indeed, one can elaborate on this simi-
larity, and use the metric to define the adjoints Λj = L
j (using a pointwise proce-
dure, as in the almost Kähler case).
Simply using the Lj and the Λj , one can show that the algebra generated is iso-
morphic to SL(4,C) ([G2]). However, there are other natural differential forms on
a WSD manifold (which do not have a counterpart in the Kähler case), namely the
volume forms of the distributions ω⊥1 , ω
2 , ω
D of vectors which contract to zero
with the forms ω1, ω2 and ωD respectively. If one calls V0, V1, V2 the corresponding
wedge operators, and A0, A1, A2 their adjoints, the complexity of the calculations
to describe the generated Lie algebra grows a lot. We called L the algebra generated
by the Lj, Vj and their adjoints, and LC its complexification. To study LC we intro-
duced an operator J , which is a complex structure on each of the two-dimensional
distributions mentioned above and generates a group isomorphic to SO(2,R) (recall
that we are in the ”hyperkahler” case, corresponding to Mirror Symmetry for K3’s,
so an ”extra” complex structure shouldn’t be surprising; moreover the holonomy of
a WSD manifold in which all ω1, ω2, ωD are invariant is actually always included in
the group generated by J). One checks that all the operators introduced commute
with it:
∀j [Lj , J ] = [Λj , J ] = [Vj , J ] = [Aj , J ] = 0
and therefore one can try to decompose Λ∗T ∗
X with respect to J and then use
Shur’s Lemma to reduce to the study of the operators on the isotypical components.
One should mention that in the (very) good cases (for instance 2-Kähler manifolds)
the operators above are all covariant constant with respect to the metric connec-
tion, and define an action on the cohomology of X much in the same way as in the
Kähler setting the operators L and Λ do (due to Hodge-type identities). We don’t
explore this aspect here, although it may be relevant to the (homological) mirror
map construction.
Coming back to the construction, we point out the inclusion of the Lie algebra LC
inside a copy of the Clifford algebra Cl6,6.
Using this Clifford algebra one can identify ”degree two” or ”quadratic” operators
(in a way similar to the ones involved in the Spinor representations on standard
Spin manifolds) and among these the SO(2,R)-invariant ones. A posteriori, it turns
out that the operators of LC⊕ < J > are all the J-invariant operators of ”degree
A GEOMETRIC REALIZATION OF sl(6,C) 5
two”, and this strengthens the rationale in our selection of natural operators.
As a last step one finds that inside Λ∗T ∗X there is an SO(2,R)-isotypical compo-
nent of dimension 6, and by direct computation we prove that indeed the operators
restricted to this sub-representation determine a copy of sl(6,C) (with the defin-
ing representation). Using the bound on the dimension of LC obtained computing
”quadratic” invariants, one then shows that the representation on this isotypical
component is faithful. This provides as a byproduct a method for giving presenta-
tion of standard Serre generators of LC, explicitely written in terms of the natural
geometrical generators.
2. Basic operators
In this section we fix a point p in the WSD manifold X . The WSD structure
splits the cotangent space as T ∗pX =W0⊕W1⊕W2 where theWj are three mutually
orthogonal canonical distributions defined as:
W0 = {φ ∈ T
pX | φ ∧ ω
1 = φ ∧ ω
2 = 0}
W1 = {φ ∈ T
pX | φ ∧ ω
1 = φ ∧ ω
D = 0}
W2 = {φ ∈ T
pX | φ ∧ ω
2 = φ ∧ ω
D = 0}
The WSD structure also determines canonical pairwise linear identifications
among W0,W1 and W2, so that one can also write T
pX = W0 ⊗R R
3 or more
simply
T ∗pX =W ⊗R R
where W =W0 ∼=W1 ∼=W2.
Let us now come back to the canonical operators Lj mentioned in the introduction:
Definition 1.4 For φ ∈ Ω∗
L0(φ) = ωD ∧ φ, L1(φ) = −ω2 ∧ φ, L2(φ) = ω1 ∧ φ
We now choose a (non-canonical) orthonormal basis γ1, γ2 forW0, and this together
with the standard identifications of the Wj determines an orthonormal basis for
T ∗pX , which we write as {vij = γi ⊗ ej | i = 1, 2, j = 0, 1, 2}. We remark that
the vij are an adapted coframe for the WSD structure, and therefore we have the
explicit expressions:
ω1 = v10 ∧ v11 + v20 ∧ v21
ω2 = v10 ∧ v12 + v20 ∧ v22
ωD = v11 ∧ v12 + v21 ∧ v22
A different choice of the γ1, γ2 would be related to the previous one by an element in
O(2,R) or, taking into account the orientability ofX mentioned in the Introduction,
an element of SO(2,R). The Lie algebra of the group SO(2,R) expressing the
change from one oriented adapted basis to another is generated (point by point) by
the global operator J :
Definition 2.1. The operator J ∈ EndR(Ω
∗(X)) is induced by its pointwise action
on the Λ∗T ∗pX for varying p ∈ X, defined in terms of the standard basis vij as
J(v1j) = v2j , J(v2j) = −v1j for j ∈ {0, 1, 2}
and J(v ∧ w) = J(v) ∧ w + v ∧ J(w) for v, w ∈ Λ∗T ∗pX
Remark 2.2. As J commutes with itself, it is well defined, independently of the
choice of an oriented adapted basis.
Using the chosen (orthonormal) basis, one can define corresponding (non canon-
ical) wedge and contraction operators:
6 GIOVANNI GAIFFI, MICHELE GRASSI
Definition 2.3. Let i ∈ {1, 2} and j ∈ {0, 1, 2}. The operators Eij and Iij are
respectively the wedge and the contraction operator with the form vij on
(defined using the given basis); we use the notation ∂
to indicate the element of
TpX dual to vij ∈ T
Eij(φ) = vij ∧ φ, Iij(φ) =
Proposition 2.4. The operators Eij , Iij satisfy the following relations:
∀i, j, k, l EijEkl = −EklEij , IijIkl = −IklIij
∀i, j EijIij + IijEij = Id
∀(i, j) 6= (k, l) EijIkl = −IklEij
∀i, j E∗ij = Iij , I
ij = Eij
where ∗ is adjunction with respect to the metric.
Proof The proof is a simple direct verification, which we omit. �
It is then immediate to verify that:
Proposition 2.5. J can be expressed as
(E2jI1j − E1jI2j)
on the whole
T ∗pX. From this expression and the previous proposition one ob-
tains that J∗ = −J , i.e. for every p the Lie algebra generated by J is a subalgebra
of o(
T ∗pX) isomorphic to so(2,R)
∼= R. Moreover, the exponential images in-
side AutR(Ω
∗(X))of the operators of type tJ for t ∈ R form a group isomorphic to
SO(2,R) ∼= S1, as this isomorphism holds for the (faithful) restriction of the group
action to T ∗pX.
Using the (non canonical) operators Eij we can obtain simple expressions for the
pointwise action of the other canonical operators, the volume forms Vj :
Definition 2.6. For φ ∈
T ∗pX,
V0(φ) = E10E20(φ), V1(φ) = E11E21(φ), V2(φ) = E12E22(φ)
Remember however that the operators Vj do not depend on the choice of a basis,
as they are simply multiplication by the volume forms of the spaces Wj .
We use the vij also as a orthonormal basis for the complexified space T
p ⊗R C
(with respect to the induced hermitian inner product). We indicate with the same
symbols Vj the complexified operators acting on the spaces
T ∗pX .
The riemannian metric induces a Riemannian metric on T ∗pX and on the space
T ∗pX .
Definition 2.7. For j ∈ {0, 1, 2}
Λj = L
j , Aj = V
By construction the canonical operators Lj , Vj ,Λj, Aj on
T ∗pX are the point-
wise restrictions of corresponding global operators on smooth differential forms,
which we indicate with the same symbols: for j ∈ {0, 1, 2},
Lj , Vj ,Λj , Aj : Ω
∗(X) → Ω∗(X)
Summing up:
A GEOMETRIC REALIZATION OF sl(6,C) 7
Definition 2.8. The ∗-Lie algebra L is the ∗-Lie subalgebra of EndR (Ω
∗(X)) gen-
erated by the operators
{Lj, Vj ,Λj, Aj | for j = 0, 1, 2}
The ∗ operator on L is induced by the adjoint with respect to the Riemannian
metric. The ∗-Lie algebra LC is L ⊗ C, and is in a natural way a ∗-Lie subalgebra
of EndC (Ω
(X)). The ∗ operator on LC is induced by the adjoint with respect to
the induced Hermitian metric.
The canonical splitting T ∗pX = W0 ⊕ W1 ⊕ W2 together with the canonical
identifications W0 ∼=W1 ∼=W2 induce an action of the symmetric group S3, which
propagates to
T ∗X and to its C∞ sections. At every point, the action can be
written explicitly in terms of the basis as
σ(vij) = viσ(j)
The induced action on endomorphisms via conjugation, σ(φ) = σ◦φ◦σ−1, preserves
LC. Indeed, one can check directly using the basis vij at every point that for σ ∈ S3
σ(Vj) = Vσ(j), σ(Lj) = ǫ(σ)Lσ(j)
Since S3 acts on LC by conjugation with unitary operators, its action commutes
with adjunction (the ∗ operator), and therefore
σ(Aj) = Aσ(j), σ(Λj) = ǫ(σ)Λσ(j)
Moreover, one also has that σ(J) = J which means that the action of S3 commutes
with that of so(2,R).
3. The action of so(2,R)
When one deals with mirror simmetry for 2-Kähler manifolds (see the Introduc-
tion), the WSD manifolds which arise have the property that the forms ω1, ω2 and
ωD are covariant constant with respect to the metric. In this case, the maximal
possible holonomy of the WSD manifold X is included in the so(2,R) generated
by the operator J . We will show now that J commutes with LC. Our proof will
be strictly algebraic, so that the commutativity between so(2,R) and LC will hold
also on WSD manifolds for which the holonomy is more general.
Definition 3.1. Given n ∈ Z, we indicate with Vn the one dimensional complex
representation of SO(2,R) ∼= S1 ∼= R/Z given by the character:
θ → e2πınθ
Proposition 3.2. Under the SO(2,R) representation induced by the operator J ,
for any p ∈ X :
1) The space
Xp) splits as
V ⊕31
8 GIOVANNI GAIFFI, MICHELE GRASSI
2) The whole space
Xp) splits according to the following picture:
Xp) = V0
Xp) = V
V ⊕31
Xp) = V
V ⊕90
V ⊕32
Xp) = V−3
V ⊕91
Xp) = V
V ⊕90
V ⊕32
Xp) = V
V ⊕31
Xp) = V0
Proof 1) The space T ∗
Xp is a direct sum of the three Wj , and each one of these
is the standard two dimensional real representation of so(2,R). We therefore diag-
onalize the representation introducing a new basis for each Wj =< v1j , v2j >:
wj = v1j + ı v2j , wj = v1j − ıv2j
From the definition of J , one has then for every j ∈ {0, 1, 2}
J(wj) = −ıwj , J(wj) = ıwj
Therefore one has for every j ∈ {0, 1, 2}
< wj >∼= V−1, < wj >∼= V1
2) To prove the general case, we use the fact that the operator J determines an
almost complex structure on the manifold X , compatible with the metric. From
this, following standard arguments, the complex differential forms and also the
elements of
Xy for any y ∈ Y can be divided according to their type:
p+q=n
In the notation adopted in the proof of the first statement, one has
Xy =< wi1 ∧ · · · ∧ wip ∧ wj1 ∧ · · · ∧ wjq | i1, ..., jq ∈ {0, 1, 2} >
From the definition of the action of J one has therefore that for any p, q
Xy ∼= V
with k =
from which the second statement of the proposition can be esily
deduced. �
Theorem 3.3. The operators Lj, Vj for j ∈ {0, 1, 2} commute with the generator
J of so(2,R).
A GEOMETRIC REALIZATION OF sl(6,C) 9
Proof We prove the statements by a direct computation using the basis vij ;
moreover, using the action of S3 (which permutes the Lj, Vj and fixes J), it is
enough to prove the commutativity for L0 and V0. It useful to rewrite ω0 (and
hence L0 which is wedge with ω0) in terms of the basis generated by the wj :
ω0 = v11 ∧ v12 + v21 ∧ v22 =
(w1 ∧ w2 − w2 ∧ w1)
and then:
[J, L0](wi1 ∧ · · · ∧ wip ∧ wj1 ∧ · · · ∧ wjq ) =
(w1 ∧ w2 − w2 ∧ w1)
wi1 ∧ · · · ∧wip ∧ wj1 ∧ · · · ∧ wjq
(w1 ∧ w2 − w2 ∧ w1)
wi1 ∧ · · · ∧wip ∧ wj1 ∧ · · · ∧ wjq
(w1 ∧ w2 − w2 ∧w1) ∧ J(wi1 ∧ · · · ∧ wip ∧ wj1 ∧ · · · ∧ wjq )
Therefore the result follows from the fact that
(w1 ∧w2 − w2 ∧ w1) = 0
as wj and wk have opposite weight with respect to J for any j, k.
Similarly, [J, V0] = 0 follows from the fact that for any α
V0(α) = v10 ∧ v20 ∧ α =
w0 ∧ w0 ∧ α
From the previous theorem one obtains the following corollary, which holds on any
WSD manifold (not necessarily 2-Kähler ):
Corollary 3.4. The algebra LC commutes with the action of so(2,R) induced by
Proof We already know that [J, Lj] = [J, Vj ] = 0 for j ∈ {0, 1, 2}. The corre-
sponding commutation relations for the adjoint generators Λj , Aj of LC follow from
the fact that J∗ = −J , as noticed in Proposition 2.5. �
Remark 3.5. From Schurs’s lemma it follows that the columns of the diagram of
Proposition 3.2 are preserved by the action of LC.
4. An irreducible representation of LC
Looking at the table in Proposition 3.2 we notice that the second column from
the left is a representation of LC (by Remark 3.5) of dimension 6:
V ∼= V
−2 =< w0 ∧ w1, w0 ∧ w2, w1 ∧w2, w0 ∧ w1 ∧w2 ∧w0,
w0 ∧w1 ∧w2 ∧ w1, w0 ∧ w1 ∧ w2 ∧ w2 >
In this section we will compute explicitely this representation.
Using the above described basis, it is not difficult to compute the matrices by
hand:
Proposition 4.1. Indicating with β the ordered basis for V indicated above, the
matrices for the (restrictions to V of) the generators of LC are the following:
Mβ(L0) =
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0
0 0 0 0
, Mβ(Λ0) =
0 0 0 0 −2 0
0 0 0 0 0 −2
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
10 GIOVANNI GAIFFI, MICHELE GRASSI
Mβ(L1) =
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0
0 0 0 0 0 0
0 0 −
0 0 0
, Mβ(Λ1) =
0 0 0 2 0 0
0 0 0 0 0 0
0 0 0 0 0 −2
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
Mβ(L2) =
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0
0 0 0
0 0 0 0 0 0
, Mβ(Λ2) =
0 0 0 0 0 0
0 0 0 2 0 0
0 0 0 0 2 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
Mβ(V0) =
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
, Mβ(A0) =
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 −2ı 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
Mβ(V1) =
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0
0 0 0 0 0 0
, Mβ(A1) =
0 0 0 0 0 0
0 0 0 0 2ı 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
Mβ(V2) =
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0
, Mβ(A2) =
0 0 0 0 0 −2ı
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
Proof Direct computation using the basis generated by the wj . �
Corollary 4.2. The algebra generated by the restriction of LC to V is isomorphic
to sl(6,C), with V its natural representation.
One can sum up the computations above in the following theorem:
Theorem 4.3. There is an exact sequence of Lie algebras
0 → K → LC → sl(6,C) → 0
given by the restriction to V .
In the next section we will prove that K = {0}, and therefore the representation
V is faithful and LC ∼= sl(6,C).
5. Quadratic invariants
We begin by showing that the action of Lie algebra LC is induced by a (non-
canonical) Clifford algebra representation. We use for simplicity the canonical
identification T ∗∗Xp ∼= TXp without further comment, so that if {vij} is a basis
for T ∗pX , then {
} is the corresponding dual basis for TpX .
Definition 5.1. For p ∈ X, the Clifford algebra Cp is
Cp = Cl(TpX ⊕ T
pX, q)
A GEOMETRIC REALIZATION OF sl(6,C) 11
with the quadratic form q induced by the metric
∀i, j, h, k < vij , vhk >= 0
∀i, j, h, k < ∂
∀(i, j) 6= (h, k) < vij ,
∀i, j < vij ,
>= − 1
Remark 5.2. The Clifford algebras Cp for varying p define a Clifford bundle C
on X, as the definition of Cp is independent on the choice of a basis. Indeed, the
quadratic form used to define it is simply induced by − 1
times the natural bilinear
pairing TpX ⊗ T
pX → R.
Proposition 5.3. The Clifford algebra Cp has a canonical representation ρp on
T ∗pX, induced by the operators Eij and Iij via the map
ρp(vij) = Eij , ρp
= Iij
Proof The Clifford relations
φψ + ψφ = −2 < φ,ψ >
are precisely the content of Proposition 2.4. The representation is canonical, even if
the operators Eij and Iij are not, because it can be defined in a basis independent
way as
ρp(v)(α) = v ∧ α, ρp
Abusing slightly the notation, we will identify Cp with its (faithful) image inside
T ∗pX
, and we will omit any reference to the map ρp. Actually, as the
representation above is a real analogue of the Spinor representation, it is easy to
check that the map ρp is an isomorphism of associative algebras. One then has:
Definition 5.4. The linear subspace C2p of Cp is the image of the natural map
(TpX ⊕ T
pX) → Cp. The linear subspace C
p of Cp is the subspace generated by
Recall that C2p is a Lie subalgebra of Cp (with the commutator bracket).
Proposition 5.5. The Lie algebra Lp and the operator J sit inside C
p for all
p ∈ X.
Proof The operators Lj , the Λj, the Vj and the Aj lie inside C
p by Propo-
sition 2.4 and the fact that ω1, ω,ωD lie in
T ∗pX . The operator J lies inside
C2p ⊕ C
p by Proposition 2.5. By definition the elements C
p are commutators, and
therefore have trace zero in any representation, and hence also in the ρp. Moreover,
again by inspection all the generators of Lp have trace zero once represented via
ρp (they are nilpotent), and therefore they must lie inside C
p. The operator J is
in the Lie algebra of the isometry group, and therefore it too has trace zero and
hence sits inside C2p . As C
p is closed under the commutator bracket of Cp, and
this commutator coincides with the composition bracket of operators, we have the
conclusion. �
Remark 5.6. Giving degree 1 to the operators Eij and degree −1 to the opera-
tors Iij , we induce a Z-degree on Cp. This degree coincides with the degree of the
operators induced from the grading on the forms from
T ∗X.
12 GIOVANNI GAIFFI, MICHELE GRASSI
Remark 5.7. For any p ∈ X, the Clifford algebra Cp is isomorphic to Cl6,6, as
the metric used to define it has signature (6, 6). The previous proposition therefore
shows that Lp is a Lie subalgebra of Cl
∼= spin(6, 6) = so(6, 6), generated by
smooth global sections of the Clifford bundle C.
The operator J acts on all of Cp by adjunction with respect to the commutator
bracket, and sends its quadratic part C2p to itself from Proposition 5.5.
We will show that the space of J-invariants inside C2p (the “quadratic” J-invariants)
coincides with LC. To describe it explicitely, let us introduce the following notation:
Definition 5.8.
Ewj = E1j + ıE2j , Ewj = E1j − ıE2j
Iwj = I1j − ıI2j , Iwj = I1j + ıI2j
Lemma 5.9. The adjoint action of the operator J on Ewj , Iwj , Ewj , Iwj is:
[J,Ewj ] = −ıEwj , [J, Iwj ] = ıIwj
[J,Ewj ] = ıEwj , [J, Iwj ] = −ıIwj
Proof It is enough to consider the corresponding J-weights of the wj , wj . �
Proposition 5.10. The following 36 operators provide a linear basis for the qua-
dratic J-invariants:
(1) [Ew0 , Ew1 ], [Ew0 , Ew2 ], [Ew1 , Ew2 ], [Ew1 , Ew0 ], [Ew2 , Ew0 ], [Ew2 , Ew1 ]
(2) [Iw0 , Iw1 ], [Iw0 , Iw2 ], [Iw1 , Iw2 ], [Iw1 , Iw0 ], [Iw2 , Iw0 ], [Iw2 , Iw1 ]
(3) [Ew0 , Ew0 ], [Ew1 , Ew1 ], [Ew2 , Ew2 ]
(4) [Iw0 , Iw0 ], [Iw1 , Iw1 ], [Iw2 , Iw2 ]
(5) [Ew0 , Iw1 ], [Ew0 , Iw2 ], [Ew1 , Iw0 ], [Ew1 , Iw2 ], [Ew2 , Iw0 ], [Ew2 , Iw1 ]
(6) [Ew0 , Iw1 ], [Ew0 , Iw2 ], [Ew1 , Iw0 ], [Ew1 , Iw2 ], [Ew2 , Iw0 ], [Ew2 , Iw1 ]
(7) [Ew0 , Iw0 ], [Ew1 , Iw1 ], [Ew2 , Iw2 ], [Ew0 , Iw0 ], [Ew1 , Iw1 ], [Ew2 , Iw2 ]
Proof The J-weight of a bracket of J-homogeneous operators is the sum of
the respective weights. The quadratic ”monomials” (with respect to the bracket)
in the Ewj , Iwj , Ewj , Iwj are all J-homogeneous, and therefore to find a basis of
J-invariant quadratic operators it is enough to identify the J-invariant quadratic
monomials. To be J-invariant means simply to have weight zero, and the compu-
tation of the J-weight of the quadratic mononials follows immediately from those
of Ewj , Iwj , Ewj , Iwj , which are respectively −ı, ı, ı,−ı. �
We end this section with the following:
Theorem 5.11. In the exact sequence of Theorem 4.3 the kernel K is equal to {0}.
The algebra LC is therefore isomorphic to sl(6,C).
Proof Since LC is included in the Lie algebra of quadratic invariants, it is
enough to show that J 6∈ LC, as from this and the previous proposition it follows
that dimC(LC) ≤ 35. As LC maps surjectively to sl(6,C) which has dimension 35,
the kernel must be zero. When restricted to the subrepresentation V , the generators
of LC have all trace zero by inspection of their matrices. However, by definition
of V , J restricted to it is multiplication by −2ı, and has therefore trace equal to
−12ı. �
Corollary 5.12. The Lie algebra LC⊕ < J > equals the Lie algebra of quadratic
invariants inside C2p .
A GEOMETRIC REALIZATION OF sl(6,C) 13
6. A geometric presentation of Serre generators
In this section, to gain a better geometric understanding of the representation
LC of sl(6,C), we explore in greater detail its relation to the geometric structure of
a WSD manifold. In particular, we give a presentation of a natural choice of Cartan
subalgebra and Serre generators in terms on the geometric generators Lj ,Λj, Vj , Aj .
The Lj operators are similar in nature to the Lefschetz operators of a Kähler
manifold. This analogy is what provided the initial interest in the algebraic struc-
ture of LC. Similarly to the corresponding standard construction of a representation
of sl(2,C), we define
Definition 6.1. For j ∈ {0, 1, 2}
Hj = [Lj ,Λj]
These operators are self-adjoint, as L∗j = Λj by definition. As in the context of
Kählerian geometry, for every j the algebra < Lj ,Λj, Hj > turns out to be a copy
of sl(2,C). Moreover, the following proposition shows that the operators Hj are
semisimple on the whole algebra LC, and therefore generate a toral subalgebra of
Proposition 6.2. The geometric operators Hj generate a toral subalgebra of LC,
and the following relations hold: for j 6= k ∈ {0, 1, 2}
(1) [Hj , Lj] = 2Lj, [Hj ,Λj] = −2Λj
(2) [Hj , Lk] = Lk, [Hj ,Λk] = −Λk
(3) [Hj , Vj ] = 0, [Hj , Aj ] = 0
(4) [Hj , Vk] = 2Vk, [Hj , Ak] = −2Ak
Proof In view of Theorem 5.11, at this point the quickest method of proof of
this proposition is to refer to the explicit matrices of the (faithful) restriction of LC
to V . �
The whole algebra LC splits into a direct sum of weight spaces with respect to
< H0, H1, H2 >, as this subalgebra is toral. The weight of L0 with respect to the
basis dual to H0, H1, H2 is:
αL0 = (αL0(H0), αL0(H1), αL0(H2)) = (2, 1, 1)
The full list is:
αL0 = (2, 1, 1), αΛ0 = −αL0
αL1 = (1, 2, 1), αΛ1 = −αL1
αL2 = (1, 1, 2), αΛ2 = −αL2
αV0 = (0, 2, 2), αA0 = −αV0
αV1 = (2, 0, 2), αA1 = −αV1
αV2 = (2, 2, 0), αA2 = −αV2
To find a natural geometric expression for two ad-semisimple elements which com-
plete < H0, H1, H2 > to a Cartan subalgebra we look at the generators Vj and Aj .
However, it turns out that the natural candidates [Vj , Aj ] already lie in the algebra
< H0, H1, H2 >. We instead build the new operators by ”subtracting” from the Vj
their weight αVj :
Definition 6.3. We define
S0 = ı[[[V0,Λ1],Λ2], L0]
S1 = ı[[[V1,Λ2],Λ0], L1]
S2 = ı[[[V2,Λ0],Λ1], L2]
14 GIOVANNI GAIFFI, MICHELE GRASSI
and denote by H the Lie algebra (over C):
H =< H0, H1, H2, S0, S1, S2 >
The coefficients ı which appear in the formulas above are dictated by the fact
that with this choice the (diagonal) matrices of the Sj restricted to V have integer
entries.
Proposition 6.4. The algebra H is a Cartan subalgebra of LC. More precisely,
the following are the diagonals of the operators H0, ..., S2 once restricted to V
, H1 :
, H2 :
, S0 :
, S1 :
, S2 :
Proof The computation of the matrices above shows that, once restricted to
V , the algebra H spans the space of diagonal matrices of trace zero in the given
basis. �
Remark 6.5. The computation above shows also that operators S0, S1, S2 safisfy
the relation
S0 + S1 + S2 = 0
Even if from the previous proposition we know that H is maximal toral inside LC,
the natural geometric generators Lj ,Λj are not eigenvectors for the adjoint action
of the Sk. At this point however it is possible to single out in natural geometric
terms operators of LC which have ”pure” weight with respect to the algebra H and
which contain in their linear span the Lj,Λj :
Definition 6.6. For j ∈ {0, 1, 2}
L1j = −2Lj + [Sj , Lj], L2j = 2Lj + [Sj , Lj]
Λ1j = −2Λj − [Sj ,Λj], Λ2j = 2Λj − [Sj ,Λj]
Proposition 6.7. Indicating with ehk the 6× 6 matrix with a 1 in position k (row)
and h (column) and zero otherwise, the matrices of the operators Lij and Λij re-
stricted on V are:
L10 = 2e
6 L11 = −2e
4 L12 = −2e
L20 = −2e
5 L21 = −2e
6 L22 = 2e
Λ10 = 8e
2 Λ11 = −8e
1 Λ12 = −8e
Λ20 = −8e
1 Λ21 = −8e
3 Λ22 = 8e
Corollary 6.8. We have the following relations for the operators of LC restricted
to V :
[Hk, Lij ] = (1 + δkj)Lij , [Hk,Λij ] = −(1 + δkj)Λij
[Sk, Lij ] = (−1)
i+1(1− 3δkj)Lij , [Sk,Λij ] = (−1)
i(1− 3δkj)Λij
[Sk, Vj ] = 0, [Sk, Aj ] = 0
Guided by all the explicit computations of the action on the isotypical component
V = V ⊕6
−2 made up to this point, we now define in terms of the natural geometric
operators a set of Serre generators for the algebra LC.
A GEOMETRIC REALIZATION OF sl(6,C) 15
Definition 6.9.
[L20, A1] f1 =
[V1,Λ20]
[L22, A0] f2 =
[V0,Λ22]
e3 = V0 f3 = A0
[L12, A0] f4 =
[V0,Λ12]
[L10, A1] f5 =
[V1,Λ10]
Moreover, for all i ∈ {1, .., 5} we define hi = [ei, fi].
As the ei have by construction associated matrix e
i+1 once restricted to V and
the fi are their respective adjoints, one gets:
Proposition 6.10. The operators ei, fj ,hk satisfy the Serre relations for sl(6,C)
and the hi span the Cartan subalgebra H:
(H1 −H2 − S1 − S2)
(H0 −H1 + S2)
(−H0 +H1 +H2)
(H0 −H1 − S2)
(H1 −H2 + S1 + S2)
It would be interesting as a last remark to identify in the list of quadratic in-
variants the geometric operators Lij ,Λij , Vj , Aj , the algebra H and the so(2,R)
generator J . To do this one could of course use the explicit matrices for the qua-
dratic invariants once restricted to V , which are not difficult to compute. One can
however get very quickly a qualitative picture by using the notion of multidegree
which we now introduce.
The decomposition T ∗X = W0 ⊕ W1 ⊕ W2 induces naturally a multi-degree on
X with values in Z3, which we indicate with mdeg. This follows from the
equation
p+q+r=n
(W0 ⊗ C)⊕
(W1 ⊗ C)⊕
(W2 ⊗ C)
We notice furthermore that the (complexified) decomposition above is preserved by
the operator J , and therefore mdeg commutes with the action of so(2,R).
Proposition 6.11. The operators Lj, Vj ,Λj , Aj , Hj , Sj are mdeg-homogeneous,
with multi-degrees:
mdeg(L0) = (0, 1, 1) mdeg(L1) = (1, 0, 1) mdeg(L2) = (1, 1, 0)
mdeg(Λ0) = (0,−1,−1) mdeg(Λ1) = (−1, 0,−1) mdeg(Λ2) = (−1,−1, 0)
mdeg(V0) = (2, 0, 0) mdeg(V1) = (0, 2, 0) mdeg(V2) = (0, 0, 2)
mdeg(A0) = (−2, 0, 0) mdeg(A1) = (0,−2, 0) mdeg(A2) = (0, 0,−2)
mdeg(H0) = (0, 0, 0) mdeg(H1) = (0, 0, 0) mdeg(H2) = (0, 0, 0)
mdeg(S0) = (0, 0, 0) mdeg(S1) = (0, 0, 0) mdeg(S2) = (0, 0, 0)
Proof The values for mdeg for the Lj and the Vj follow immediately from mdeg
of the corresponding forms and the dual (contraction) operators have opposite value
16 GIOVANNI GAIFFI, MICHELE GRASSI
of mdeg. The remaing values can be computed using the additivity of mdeg with
respect to the bracket. �
Proposition 6.12. Let {j, k, l} = {0, 1, 2}. Then
Span (L1j, L2j) = Span ([Ewk , Ewl ], [Ewl , Ewk ])
Span (Λ1j ,Λ2j) = Span ([Iwk , Iwl ], [Iwl , Iwk ])
Span (Vj) = Span
[Ewj , Ewj ]
Span (Aj) = Span
[Iwj , Iwj ]
H⊕ Span (J) =
Span ([Ewm , Iwm ], [Ewm , Iwm ])
Proof The mdeg of the Lij is the same of the corresponding Lj, and sim-
ilarly for their adjoints. The mdegs of the quadratic monomials are immedi-
ately computed as they are the sum of those of their components. For example,
mdeg(Ew0) = mdeg(Ew0) = (1, 0, 0) , mdeg(Ew1) = mdeg(Ew1) = (0, 1, 0) and
therefore mdeg([Ew0 , Ew1 ] = (1, 1, 0), equal to that of L12 and L22. �
References
[B] V. Batyrev, Dual polyhedra and mirror symmetry for Calabi-Yau hypersurfaces in toric
varieties, J. Alg. Geom. 3 (1994) , 493-535
[BMP] U. Bruzzo, G. Marelli, F. Pioli A Fourier transform for sheaves on real tori Part II.
Relative theory J. of Geometry and Phy. 41 (2002) 312-329
[CDGP] P. Candelas, X.C. De la Ossa, P.S. Green, L. Parkes, A pair of Calabi-Yau manifolds
as an exactly soluble superconformal theory, Nucl. Phys. B359 (1991), p 21-74
[GG] G. Gaiffi, M. Grassi, A natural Lie superalgebra bundle on rank three WSD manifolds,
preprint (2007)
[G1] M. Grassi, Polysymplectic spaces, s-Kähler manifolds and lagrangian fibrations,
math.DG/0006154 (2000)
[G2] M. Grassi, Mirror symmetry and self-dual manifolds, math.DG/0202016 (2002)
[G3] M. Grassi, Self-dual manifolds and mirror symmetry for the quintic threefold, Asian
J. Math 9 (2005) 79-102
[GP] B.R. Greene, M.R. Plesser, Duality in Calabi-Yau moduli space, Nucl. Phys. B338
(1990), 15-37
[GVW] B. R. Greene, C. Vafa, N. P. Warner, Calabi-Yau manifolds and renormalization group
flows, Nucl. Phys. B324 (1989), 371-390
[Gr] M. Gromov, Metric structures for Riemannian and non-Riemannian spaces,
Birkhäuser P.M. 152, Boston 1999
[GW] M. Gross, P.M.H. Wilson, Large Complex Structure limits of K3 surfaces,
math.DG/0008018 (2001)
[Gu] V. Guillemin, Moment maps and combinatorial invariants of Hamiltonian Tn-spaces,
Birkhäuser P.M. 122 (1994)
[M] A. McInroy, Orbifold mirror symmetry for complex tori, preprint
[KS] M. Kontsevich, Y. Soibelman, Homological mirror symmetry and torus fibrations,
math.SG/0011041 (2001)
[SYZ] A. Strominger, S.T. Yau, E. Zaslow, Mirror Symmetry is T-Duality, Nucl. Phys. B479
(1996) 243-259; hep-th/9606040
http://arxiv.org/abs/math/0006154
http://arxiv.org/abs/math/0202016
http://arxiv.org/abs/math/0008018
http://arxiv.org/abs/math/0011041
http://arxiv.org/abs/hep-th/9606040
	1. Introduction
	2. Basic operators
	3. The action of so(2,R)
	4. An irreducible representation of LC
	5. Quadratic invariants
	6. A geometric presentation of Serre generators
	References
ABSTRACT
  Given an orientable weakly self-dual manifold X of rank two, we build a
geometric realization of the Lie algebra sl(6,C) as a naturally defined algebra
L of endomorphisms of the space of differential forms of X. We provide an
explicit description of Serre generators in terms of natural generators of L.
This construction gives a bundle on X which is related to the search for a
natural Gauge theory on X. We consider this paper as a first step in the study
of a rich and interesting algebraic structure.

<|endoftext|><|startoftext|>
Introduction and main results 3
1.1 Many facets of displaceability . . . . . . . . . . . . . . . . . . 3
1.2 Preliminaries on quantum homology . . . . . . . . . . . . . . . 8
1.3 An hierarchy of rigid subsets within Floer theory . . . . . . . 10
1.4 Hamiltonian torus actions . . . . . . . . . . . . . . . . . . . . 14
1.5 Super(heavy) monotone Lagrangian submanifolds . . . . . . . 19
1.6 An effect of semi-simplicity . . . . . . . . . . . . . . . . . . . . 23
1.7 Discussion and open questions . . . . . . . . . . . . . . . . . . 27
1.7.1 Strong displaceability beyond Floer theory? . . . . . . 27
1.7.2 Heavy fibers of Poisson-commutative subspaces . . . . 28
2 Detecting stable displaceability 32
3 Preliminaries on Hamiltonian Floer theory 33
3.1 Valuation on QH∗(M) . . . . . . . . . . . . . . . . . . . . . . 33
3.2 Hamiltonian Floer theory . . . . . . . . . . . . . . . . . . . . 34
3.3 Conley-Zehnder and Maslov indices . . . . . . . . . . . . . . . 36
3.4 Spectral numbers . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.5 Partial symplectic quasi-states . . . . . . . . . . . . . . . . . . 44
4 Basic properties of (super)heavy sets 45
5 Products of (super)heavy sets 48
5.1 Product formula for spectral invariants . . . . . . . . . . . . . 48
5.2 Decorated Z2-graded complexes . . . . . . . . . . . . . . . . . 49
5.3 Reduced Floer and Quantum homology . . . . . . . . . . . . . 50
5.4 Proof of Theorem 5.1 . . . . . . . . . . . . . . . . . . . . . . . 51
5.5 Proof of algebraic Theorem 5.2 . . . . . . . . . . . . . . . . . 52
6 Stable non-displaceability of heavy sets 57
7 Analyzing stable stems 59
8 Monotone Lagrangian submanifolds 61
9 Rigidity of special fibers of Hamiltonian actions 66
9.1 Calabi and mixed action-Maslov . . . . . . . . . . . . . . . . . 76
1 Introduction and main results
1.1 Many facets of displaceability
A well-studied and easy to visualize rigidity property of subsets of a symplec-
tic manifold (M,ω) is the rigidity of intersections: a subset X ⊂ M cannot
be displaced from the closure of a subset Y ⊂ M by a compactly supported
Hamiltonian isotopy:
φ(X) ∩ Y 6= ∅ ∀φ ∈ Ham(M) .
We say in such a case that X cannot be displaced from Y . If X cannot be
displaced from itself we call it non-displaceable. These properties become
especially interesting and purely symplectic when X can be displaced from
itself or from Y by a (compactly supported) smooth isotopy.
One of the main themes of the present paper is that “some non-displace-
able sets are more rigid than others.” To explain this, we need the following
ramifications of the notion of a non-displaceable set:
Strong non-displaceability: A subset X ⊂ M is called strongly non-
displaceable if one cannot displace it by any (not necessarily Hamiltonian)
symplectomorphism of (M,ω).
Stable non-displaceability: Consider T ∗S1 = R × S1 with the coordi-
nates (r, θ) and the symplectic form dr ∧ dθ. We say that X ⊂ M is stably
non-displaceable if X × {r = 0} is non-displaceable in M × T ∗S1 equipped
with the split symplectic form ω̄ = ω ⊕ (dr ∧ dθ). Let us mention that de-
tecting stably non-displaceable subsets is useful for studying geometry and
dynamics of Hamiltonian flows (see for instance [50] for their role in Hofer’s
geometry and [51] for their appearance in the context of kick stability in
Hamiltonian dynamics).
Formally speaking, the properties of strong and stable non-displaceability
are mutually independent and both are strictly stronger than displaceability.
In the present paper we refine the machinery of partial symplectic quasi-
states introduced in [23] and get new examples of stably non-displaceable
sets, including certain fibers of moment maps of Hamiltonian torus actions
as well as monotone Lagrangian submanifolds discussed by Albers [2] and
Biran-Cornea [15]. Further, we address the following question: given the class
of stably non-displaceable sets, can one distinguish those of them which are
also strongly non-displaceable by means of the Floer theory? Or, other way
around, what are the Floer-homological features of stably non-displaceable
but strongly displaceable sets? Toy examples are given by the equator of the
symplectic two-sphere and by the meridian on a symplectic two-torus. Both
are stably non-displaceable since their Lagrangian Floer homologies are non-
trivial. On the other hand, the equator is strongly non-displaceable, while
the meridian is strongly displaceable by a non-Hamiltonian shift. Later on we
shall explain the difference between these two examples from the viewpoint
of Hamiltonian Floer homology and present various generalizations.
The question on Floer-homological characterization of (strongly) non-displa-
ceable but stably displaceable sets is totally open, see Section 1.7.1 below for
an example involving Gromov’s packing theorem and discussion.
Leaving Floer-theoretical considerations for the next section, let us outline
(in parts, informally) the general scheme of our results: Given a symplectic
manifold (M,ω), we shall define (in the language of the Floer theory) two
collections of closed subsets of M , heavy subsets and superheavy subsets.
Every superheavy subset is heavy, but, in general, not vice versa. Formally
speaking, the hierarchy heavy-superheavy depends in a delicate way on the
choice of an idempotent in the quantum homology ring ofM . This and other
nuances will be ignored in this outline. The key properties of these collections
are as follows (see Theorems 1.2 and 1.5 below):
Invariance: Both collections are invariant under the group of all symplec-
tomorphisms of M .
Stable non-displaceability: Every heavy subset is stably non-displace-
able.
Intersections: Every superheavy subset intersects every heavy subset. In
particular, superheavy subsets are strongly non-displaceable. In contrast to
this, heavy subsets can be mutually disjoint and strongly displaceable.
Products: Product of any two (super)heavy subsets is (super)heavy.
What is inside the collections? The collections of heavy and superheavy
sets include the following examples:
Stable stems: Let A ⊂ C∞(M) be a finite-dimensional Poisson-commuta-
tive subspace (i.e. any two functions from A commute with respect to the
Poisson brackets). Let Φ :M → A∗ be the moment map: 〈Φ(x), F 〉 = F (x).
A non-empty fiber Φ−1(p), p ∈ A∗, is called a stem of A (see [23]) if all
non-empty fibers Φ−1(q) with q 6= p are displaceable and a stable stem if
they are stably displaceable. If a subset of M is a (stable) stem of a finite-
dimensional Poisson-commutative subspace of C∞(M), it will be called just
a (stable) stem. Clearly, any stem is a stable stem. The collection of
superheavy subsets includes all stable stems (see Theorem 1.6 below).
One readily shows that a direct product of stable stems is a stable stem and
that the image of a stable stem under any symplectomorphism is again a
stable stem.
The following example of a stable stem is borrowed (with a minor mod-
ification) from [23]: Let X ⊂ M be a closed subset whose complement is
a finite disjoint union of stably displaceable sets. Then X is a stable stem.
For instance, the codimension-1 skeleton of a sufficiently fine triangulation of
any closed symplectic manifold is a stable stem. Another example is given by
the equator of S2: it divides the sphere into two displaceable open discs and
hence is a stable stem. By taking products, one can get more sophisticated
examples of stable stems. Already the product of equators of the two-spheres
gives rise to a Lagrangian Clifford torus in S2× . . .×S2. To prove its rigidity
properties (such as stable non-displaceability) one has to use non-trivial sym-
plectic tools such as Lagrangian Floer homology, see e.g. [44]. Products of
the 1-skeletons of fine triangulations of the two-spheres can be considered as
singular Lagrangian submanifolds, an object which is currently out of reach
of the Lagrangian Floer theory.
Another example of stable stems comes from Hamiltonian torus actions.
Consider an effective Hamiltonian action ϕ : Tk → Ham(M) with the mo-
ment map Φ = (Φ1, . . . ,Φk) : M → R
k. Assume that Φi is a normalized
Hamiltonian, that is
Φi = 0 for all i = 1, . . . , k. A torus action is called
compressible if the image of the homomorphism ϕ♯ : π1(T
k) → π1(Ham(M)),
induced by the action ϕ, is a finite group. One can show that for compressible
actions the fiber Φ−1(0) is a stable stem (see Theorem 1.7 below).
Special fibers of Hamiltonian torus actions: Consider an effective
Hamiltonian torus action ϕ on a spherically monotone symplectic manifold.
Let I : π1(Ham(M)) → R be the mixed action-Maslov homomorphism intro-
duced in [49]. Since the target space Rk of the moment map Φ is naturally
identified with Hom(π1(T
k),R), the pull back pspec := −ϕ
♯I of the mixed
action-Maslov homomorphism with the reversed sign can be considered as a
point of Rk. The preimage Φ−1(pspec) is called the special fiber of the action.
We shall see below that the special fiber is always non-empty. For monotone
symplectic toric manifolds (that is when 2k = dimM) the special fiber is a
monotone Lagrangian torus. Note that when the action is compressible we
have pspec = 0 and therefore the special fiber is a stable stem according to the
previous example. It is unknown whether the latter property persists for gen-
eral non-compressible actions. Thus in what follows we treat stable stems
and special fibers as separate examples. The collection of superheavy
subsets includes all special fibers (see Theorem 1.9 below).
For instance, consider CP 2 and the Lagrangian Clifford torus in it (i.e.
the torus {[z0 : z1 : z2] ∈ CP
2 | |z0| = |z1| = |z2|}). Take the standard
Hamiltonian T2-action on CP 2 preserving the Clifford torus. It has three
global fixed points away from the Clifford torus. Make an equivariant sym-
plectic blow-up, M , of CP 2 at k of these fixed points, 0 ≤ k ≤ 3, so that
the obtained symplectic manifold is spherically monotone. The torus action
lifts to a Hamiltonian action on M . One can show that its special fiber is
the proper transform of the Clifford torus.
Monotone Lagrangian submanifolds: Let (M2n, ω) be a spherically
monotone symplectic manifold, and let L ⊂ M be a closed monotone La-
grangian submanifold with the minimal Maslov number NL ≥ 2. We say
that L satisfies the Albers condition [2] if the image of the natural morphism
H∗(L;Z2) → H∗(M ;Z2) contains a non-zero element S with
deg S > dimL+ 1−NL .
The collection of heavy sets includes all closed monotone Lagran-
gian submanifolds satisfying the Albers condition (see Theorem 1.15
below).
Specific examples include the meridian on T2, RP n ⊂ CP n and all La-
grangian spheres in complex projective hypersurfaces of degree d in CP n+1
with n > 2d − 3. In the case when the fundamental class [L] of L divides
a non-trivial idempotent in the quantum homology algebra of M , L is, in
fact, superheavy (see Theorem 1.18 below). For instance, this is the case
for RP n ⊂ CP n. Furthermore, a version of superheaviness holds for any
Lagrangian sphere in the complex quadric of even (complex) dimension.
However, there exist examples of heavy, but not superheavy, Lagrangian
submanifolds: For instance, the meridian of the 2-torus is strongly displa-
ceable by a (non-Hamiltonian!) shift and hence is not superheavy. Another
example of heavy but not superheavy Lagrangian submanifold is the sphere
arising as the real part of the Fermat hypersurface
M = {−zd0 + z
1 + . . .+ z
n+1 = 0} ⊂ CP
with even d ≥ 4 and n > 2d− 3. We refer to Section 1.5 for more details on
(super)heavy monotone Lagrangian submanifolds.
Motivation: Our motivation for the selection of examples appearing in the
list above is as follows. Stable stems provide a playground for studying
symplectic rigidity of singular subsets. In particular, no visible analogue of
the conventional Lagrangian Floer homology technique is applicable to them.
Detecting (stable) non-displaceability of Lagrangian submanifolds via La-
grangian Floer homology is one of the central themes of symplectic topology.
In contrast to this, detecting strong non-displaceabilty has at the moment the
status of art rather than science. That’s why we were intrigued by Albers’
observation that monotone Lagrangian submanifolds satisfying his condition
are in some situations strongly non-displaceable. In the present work we tried
to digest Albers’ results [2] and look at them from the viewpoint of theory
of partial symplectic quasi-states developed in [23]. In addition, our result
on superheaviness of the Lagrangian anti-diagonal in S2 × S2 allows us to
detect an “exotic” monotone Lagrangian torus in this symplectic manifold:
this torus does not intersect the anti-diagonal, and hence is not heavy in
contrast to the standard Clifford torus, see Example 1.20 below.
In [23] we proved a theorem which roughly speaking states that every
(singular) coisotropic foliation has at least one non-displaceable fiber. How-
ever, our proof is non-constructive and does not tell us which specific fibers
are non-displaceable. The notion of the special fiber arose as an attempt to
solve this problem for Hamiltonian circle actions.
Let us mention also that the product property enables us to produce even
more examples of (super)heavy subsets by taking products of the subsets
appearing in the list.
A few comments on the methods involved into our study of heavy and su-
perheavy subsets are in order. These collections are defined in terms of
partial symplectic quasi-states which were introduced in [23]. These are cer-
tain real-valued functionals on C∞(M) with rich algebraic properties which
are constructed by means of the Hamiltonian Floer theory and which conve-
niently encode a part of the information contained in this theory. In general,
the definition of a partial symplectic quasi-state involves the choice of an
idempotent element in the commutative part QH•(M) of the quantum ho-
mology algebra of M . Though the default choice is just the unity of the
algebra, there exist some other meaningful choices, in particular in the case
when QH•(M) is semi-simple. This gives rise to another theme discussed in
this paper: “visible” topological obstructions to semi-simplicity (see Corol-
lary 1.24 and Theorem 1.25 below). For instance, we shall show that if a
monotone symplectic manifold M contains “too many” disjoint monotone
Lagrangian spheres whose minimal Maslov numbers exceed n+ 1, the quan-
tum homology QH•(M) cannot be semi-simple.
Let us pass to the precise set-up. For the reader’s convenience, the ma-
terial presented in this brief outline will be repeated in parts in the next
sections in a less compressed form.
1.2 Preliminaries on quantum homology
The Novikov Ring: Let F denote a base field which in our case will be
either C or Z2, and let Γ ⊂ R be a countable subgroup (with respect to the
addition). Let s, q be formal variables. Define a field KΓ whose elements are
generalized Laurent series in s of the following form:
KΓ :=
θ, zθ ∈ F , ♯
θ > c | zθ 6= 0
<∞, ∀c ∈ R
Define a ring ΛΓ := KΓ[q, q
−1] as the ring of polynomials in q, q−1 with
coefficients in KΓ. We turn ΛΓ into a graded ring by setting the degree of s
to be 0 and the degree of q to be 2.
The ring ΛΓ serves as an abstract model of the Novikov ring associated to
a symplectic manifold. Let (M,ω) be a closed connected symplectic manifold.
Denote by HS2 (M) the subgroup of spherical homology classes in the integral
homology group H2(M ;Z). Abusing the notation we will write ω(A), c1(A)
for the results of evaluation of the cohomology classes [ω] and c1(M) on
A ∈ H2(M ;Z). Set
π̄2(M) := H
2 (M)/ ∼,
where by definition
A ∼ B iff ω(A) = ω(B) and c1(A) = c1(B).
Denote by Γ(M,ω) := [ω](HS2 (M)) ⊂ R the subgroup of periods of the
symplectic form on M on spherical homology classes. By definition, the
Novikov ring of a symplectic manifold (M,ω) is ΛΓ(M,ω). In what follows,
when (M,ω) is fixed, we abbreviate and write Γ, K and Λ instead of Γ(M,ω),
KΓ(M,ω) and ΛΓ(M,ω) respectively.
Quantum homology: Set 2n = dimM . The quantum homology QH∗(M)
is defined as follows. First, it is a graded module over Λ given by
QH∗(M) := H∗(M ;F)⊗F Λ,
with the grading defined by the gradings on H∗(M ;F) and Λ:
deg (a⊗ zsθqk) := deg (a) + 2k .
Second, and most important, QH∗(M) is equipped with a quantum prod-
uct: if a ∈ Hk(M ;F), b ∈ Hl(M ;F), their quantum product is a class
a ∗ b ∈ QHk+l−2n(M), defined by
a ∗ b =
A∈π̄2(M)
(a ∗ b)A ⊗ s
−ω(A)q−c1(A),
where (a ∗ b)A ∈ Hk+l−2n+2c1(A)(M) is defined by the requirement
(a ∗ b)A ◦ c = GW
A (a, b, c) ∀c ∈ H∗(M ;F).
Here ◦ stands for the intersection index and GWFA (a, b, c) ∈ F denotes the
Gromov-Witten invariant which, roughly speaking, counts the number of
pseudo-holomorphic spheres inM in the class A that meet cycles representing
a, b, c ∈ H∗(M ;F) (see [55], [56], [41] for the precise definition).
Extending this definition by Λ-linearity to the whole QH∗(M) one gets
a correctly defined graded-commutative associative product operation ∗ on
QH∗(M) which is a deformation of the classical ∩-product in singular ho-
mology [37], [41], [55], [56], [69]. The quantum homology algebra QH∗(M)
is a ring whose unity is the fundamental class [M ] and which is a module
of finite rank over Λ. If a, b ∈ QH∗(M) have graded degrees deg (a), deg (b)
deg (a ∗ b) = deg (a) + deg (b)− 2n. (1)
We will be mostly interested in the commutative part of the quantum
homology ring (which in the case F = Z2 is, of course, the whole quantum
homology ring). For this purpose we introduce the following notation:
We denote by QH•(M) the whole quantum homology QH∗(M) if
F = Z2 and the even-degree part of QH∗(M) if F = C.
In general, given a topological space X, we denote by H•(X ;F) the
whole singular homology group H∗(X ;F) if F = Z2 and the even-
degree part of H∗(X ;F) if F = C.
Thus, in our notation the ring QH•(M) = H•(M ;F)⊗F Λ is always a com-
mutative subring with unity of QH∗(M) and a module of finite rank over Λ.
We will identify Λ with a subring of QH•(M) by λ 7→ [M ]⊗ λ.
1.3 An hierarchy of rigid subsets within Floer theory
Fix a non-zero idempotent a ∈ QH2n(M) (by obvious grading considera-
tions the degree of every idempotent equals 2n). We shall deal with spectral
invariants c(a,H), where H = Ht : M → R, t ∈ R, is a smooth time-
dependent and 1-periodic in time Hamiltonian function on M , or c(a, φH),
where φH is an element of the universal cover H̃am (M) of Ham(M) rep-
resented by an identity-based path given by the time-1 Hamiltonian flow
generated by H . If H is normalized, meaning that
dimM/2 = 0 for all
t, then c(a,H) = c(a, φH). These invariants, which nowadays are standard
objects of the Floer theory, were introduced in [45] (cf. [59] in the aspherical
case; also see [42],[43] for an earlier version of the construction and [22] for a
summary of definitions and results in the monotone case).
Disclaimer: Throughout the paper we tacitly assume that (M,ω) (as well
as (M ×T2, ω̄), when we speak of stable displaceability) belongs to the class
S of closed symplectic manifolds for which the spectral invariants are well
defined and enjoy the standard list of properties (see e.g. [41, Theorem
12.4.4]). For instance, S contains all symplectically aspherical and spherically
monotone manifolds. Furthermore, S contains all symplectic manifolds M2n
for which, on one hand, either c1 = 0 or the minimal Chern number (on
HS2 (M)) is at least n − 1 and, on the other hand, [ω](H
2 (M)) is a discrete
subgroup of R (cf. [64]). The general belief is that the class S includes all
symplectic manifolds.
Define a functional ζ : C∞(M) → R by
ζ(H) := lim
c(a, lH)
It is shown in [23] that the functional ζ has some very special algebraic
properties (see Theorem 3.6) which form the axioms of a partial symplectic
quasi-state introduced in [23]. The next definition is motivated in part by
the work of Albers [2].
Definition 1.1. A closed subset X ⊂ M is called heavy (with respect to ζ
or with respect to a used to define ζ) if
ζ(H) ≥ inf
H ∀H ∈ C∞(M) , (3)
and is called superheavy (with respect to ζ or a) if
ζ(H) ≤ sup
H ∀H ∈ C∞(M) . (4)
The default choice of an idempotent a is the unity [M ] ∈ QH∗(M). In this
case, as we shall see below, the collections of heavy and superheavy sets
satisfy the properties listed in Section 1.1 and include the examples therein.
In view of potential applications (including geometric obstructions to semi-
simplicity of the quantum homology), we shall work, whenever possible, with
general idempotents.
The asymmetry between supX H and infX H is related to the fact that
the spectral numbers satisfy a triangle inequality c(a ∗ b, φFφG) ≤ c(a, φF ) +
c(b, φG), while there may not be a suitable inequality “in the opposite direc-
tion”. In the case when such an “opposite” inequality exists (e.g. when a = b
is an idempotent and ζ defined by it is a genuine symplectic quasi-state – see
Section 1.6 below) the symmetry between supX H and infX H gets restored
and the classes of heavy and superheavy sets coincide.
Let us emphasize that the notion of (super)heaviness depends on the
choice of a coefficient ring for the Floer theory. In this paper the coefficients
for the Floer theory will be either Z2 or C depending on the situation. Unless
otherwise stated, our results on (super)heavy subsets are valid for any choice
the coefficients.
The group Symp (M) of all symplectomorphisms of M acts naturally on
H∗(M ;F) and hence on QH∗(M) = H∗(M ;F) ⊗F Λ. Clearly, the identity
component Symp0(M) of Symp (M) acts trivially on QH∗(M) and hence for
any idempotent a ∈ QH∗(M) the corresponding ζ is Symp0(M)-invariant.
Thus the image of a (super)heavy set under an element of Symp0(M) is again
a (super)heavy set with respect to the same idempotent a. If a is invariant
under the action of the whole Symp (M) (for instance, if a = [M ]) the classes
of heavy and superheavy sets with respect to a are invariant under the action
of the whole Symp (M) in agreement with the invariance property presented
in Section 1.1 above.
Let us mention also that the collections of (super)heavy sets enjoy a
stability property under inclusions: If X, Y , X ⊂ Y , are closed subsets of M
and X is heavy (respectively, superheavy) with respect to an idempotent a
then Y is also heavy (respectively, superheavy) with respect to the same a.
We are ready now to formulate the main results of the present section.
Theorem 1.2. Assume a and ζ are fixed. Then
(i) Every superheavy set is heavy, but, in general, not vice versa.
(ii) Every heavy subset is stably non-displaceable.
(iii) Every superheavy set intersects every heavy set. In particular, a super-
heavy set cannot be displaced by a symplectic (not necessarily Hamil-
tonian) isotopy and if the idempotent a is invariant under the symplec-
tomorphism group of (M,ω) (e.g. if a = [M ]), every superheavy set is
strongly non-displaceable.
The following theorem discusses the relation between heaviness/super-
heaviness properties with respect to different idempotents. In particular, it
shows that [M ] plays a special role among all the other non-zero idempotents
in QH∗(M).
Theorem 1.3. Assume a is a non-zero idempotent in the quantum homology.
(i) Every set that is superheavy with respect to [M ] is also superheavy with
respect to a.
(ii) Every set that is heavy with respect to a is also heavy with respect to
[M ].
(iii) Assume that the idempotent a is a sum of non-zero idempotents
e1, . . . , el and assume that a closed subset X ⊂ M is heavy with re-
spect to a. Then X is heavy with respect to ei for at least one i.
The next proposition shows that, in general, the heaviness of a set does
depend on the choice of an idempotent in the quantum homology.
Proposition 1.4. Consider the torus T2n equipped with the standard sym-
plectic structure ω = dp∧dq. Let M2n = T2n♯CP n be a symplectic blow-up of
T2n at one point (the blow up is performed in a small ball around the point).
Assume that the Lagrangian torus L ⊂ T2n given by q = 0 does not intersect
the ball in T2n, where the blow up was performed.
Then the proper transform of L (identified with L) is a Lagrangian sub-
manifold of M , which is not heavy with respect to some non-zero idempotent
a ∈ QH∗(M) but heavy with respect to [M ]. (Here we work with F = Z2).
Next, consider direct products of (super)heavy sets. We start with the fol-
lowing convention on tensor products. Let Γi, i = 1, 2, be two countable
subgroups of R. Let Ei be a module over KΓi. We put
E1⊗̂KE2 =
E1 ⊗KΓ1 KΓ1+Γ2
⊗KΓ1+Γ2
E2 ⊗KΓ2 KΓ1+Γ2
. (5)
If E1, E2 are also rings we automatically assume that the middle tensor prod-
uct is the tensor product of rings. In simple words, we extend both modules
to KΓ1+Γ2-modules and consider the usual tensor product over KΓ1+Γ2 .
Given two symplectic manifolds, (M1, ω1) and (M2, ω2), note that the
subgroups of periods of the symplectic forms satisfy
Γ(M1 ×M2, ω1 ⊕ ω2) = Γ(M1, ω1) + Γ(M2, ω2) .
Furthermore, due to the Künneth formula for quantum homology (see e.g.
[41, Exercise 11.1.15] for the statement in the monotone case; the general
case in our algebraic setup can be treated similarly) there exists a natural
ring monomorphism linear over KΓ1+Γ2
QH2n1(M1)⊗̂KQH2n2(M2) →֒ QH2n1+2n2(M1 ×M2) ,
We shall fix a pair of idempotents ai ∈ QH∗(Mi), i = 1, 2. The notions
of (super)heaviness in M1,M2 and M1 ×M2 are understood in the sense of
idempotents a1, a2 and a1 ⊗ a2 respectively.
Theorem 1.5. Assume that Xi is a heavy (resp. superheavy) subset of Mi
with respect to some idempotent ai, i = 1, 2. Then the product X1 × X2
is a heavy (resp. superheavy) subset of M with respect to the idempotent
a1 ⊗ a2 ∈ QH•(M1 ×M2).
An important class of superheavy sets is given by stable stems introduced
and illustrated in Section 1.1.
Theorem 1.6. Every stable stem is a superheavy subset with respect to any
non-zero idempotent a ∈ QH∗(M). In particular, it is strongly and stably
non-displaceable.
In the next section we present an example of stable stems coming from Hamil-
tonian torus actions.
1.4 Hamiltonian torus actions
Fibers of the moment maps of Hamiltonian torus actions form an interesting
playground for testing the various notions of displaceability and heaviness
introduced above. Throughout the paper we deal with effective actions only,
that is we assume that the map ϕ : Tk → Ham(M) defining the action
is a monomorphism. Furthermore, we assume that the moment map Φ =
(Φ1, . . . ,Φk) : M → R
k of the action is normalized: Φi is a normalized
Hamiltonian for all i = 1, . . . , k. By the Atiyah-Guillemin-Sternberg theorem
[6], [30], the image ∆ = Φ(M) of Φ is a k-dimensional convex polytope,
called the moment polytope. The subsets Φ−1(p), p ∈ ∆, are called fibers of
the moment map. A torus action is called compressible if the image of the
homomorphism ϕ♯ : π1(T
k) → π1(Ham(M)), induced by the action ϕ, is a
finite group.
Theorem 1.7. Assume that (M,ω) is equipped with a compressible Hamilto-
nian Tk-action with moment map Φ and moment polytope ∆. Let Y ⊂ ∆ be
any closed convex subset which does not contain 0. Then the subset Φ−1(Y )
is stably displaceable. In particular, the fiber Φ−1(0) is a stable stem.
Note that for symplectic toric manifolds, that is when 2k = dimM , the point
0 is the barycenter of the moment polytope with respect to the Lebesgue
measure. This follows from our assumption on the normalization of the
moment map.
Theorems 1.6 and 1.7 imply that the fiber Φ−1(0) of a compressible torus
action is stably non-displaceable, and thus we get the complete description
of stably displaceable fibers for such actions.
In the case when the action is not compressible, the question of the com-
plete description of stably non-displaceable fibers remains open. We make a
partial progress in this direction by presenting at least one such fiber, called
the special fiber, explicitly in the case when (M,ω) is spherically monotone:
[ω]|HS2 (M)
= κ c1(TM)|HS2 (M)
, κ > 0 .
The special fiber can be described via the mixed action-Maslov homomor-
phism introduced in [49]: Let (M2n, ω) be a spherically monotone symplectic
manifold, and let {ft}, t ∈ [0, 1], be any loop of Hamiltonian diffeomorphisms,
with f0 = f1 = 1, generated by a 1-periodic normalized Hamiltonian func-
tion F (x, t). The orbits of any Hamiltonian loop are contractible due to the
standard Floer theory1. Pick any point x ∈ M and any disc u : D2 → M
spanning the orbit γ = {ftx}. Define the action
2 of the orbit by
AF (γ, u) :=
F (γ(t), t)dt−
u∗ω .
Trivialize the symplectic vector bundle u∗(TM) over D2 and denote by
mF (γ, u) the Maslov index of the loop of symplectic matrices corresponding
to {ft∗} with respect to the chosen trivialization. One readily checks that,
in view of the spherical monotonicity, the quantity
I(F ) := −AF (γ, u)−
mF (γ, u)
does not depend on the choice of the point x and the disc u, and is invariant
under homotopies of the Hamiltonian loop {ft}. In fact, I is a well defined
homomorphism from π1(Ham(M)) to R (see [49], [68]).
Assume again that ϕ : Tk → Ham(M,ω) is a Hamiltonian torus ac-
tion. Write ϕ♯ for the induced homomorphism of the fundamental groups.
Since the target space Rk of the moment map Φ is naturally identified with
Hom(π1(T
k),R), the pull back −ϕ∗♯I of the mixed action-Maslov homomor-
phism with the reversed sign can be considered as a point of Rk. We call
it a special point and denote by pspec. The preimage Φ
−1(pspec) is called the
special fiber of the moment map. In the case k = 1, when Φ is a real-valued
function on M , we will call pspec the special value of Φ.
1The Floer theory guarantees the existence of at least one contractible periodic orbit –
this is not obvious a priori if {ft} is not an autonomous flow. Since all the orbits of {ft}
are homotopic, all of them are contractible.
2Note that our action functional and the one in [49] are of opposite signs.
If k = n and M is a symplectic toric manifold, then pspec can be defined
in purely combinatorial terms involving only the polytope ∆. Namely, pick
a vertex x of ∆. Since ∆ in this case is a Delzant polytope [20], there is a
unique (up to a permutation) choice of vectors v1, . . . ,vn which
• originate at x;
• span the n rays containing the edges of ∆ adjacent to x;
• form a basis of Zn over Z.
Proposition 1.8.
pspec = x+ κ
vi. (6)
Proof. The vertices of the moment polytope are in one-to-one correspondence
with the fixed points of the action. Let x ∈ M be the fixed point corre-
sponding to the vertex x = (x1, . . . ,xn). Then the vectors vj = (v
j , . . . , v
j = 1, . . . , n, are simply the weights of the isotropy Tn-action on TxM . Since
the definition of the mixed action-Maslov invariant of a Hamiltonian circle
action does not depend on the choice of a 1-periodic orbit and a disc span-
ning it, let us compute all Ii, l = 1, . . . , n, using the constant periodic orbit
concentrated at the fixed point x and the constant disc u spanning it. Clearly,
AΦi(x, u) = Φi(x) = xi and mΦi(x, u) = 2
vij ∀i = 1, . . . , n,
which readily yields formula (6).
E.Shelukhin pointed out to us that by summing up equations (6) over all the
vertices x(1), . . . ,x(m) ∈ Rn of the moment polytope, one readily gets that
pspec =
Theorem 1.9. Assume M2n is a spherically monotone symplectic manifold
equipped with a Hamiltonian Tk-action. Then the special fiber of the moment
map is superheavy with respect to any (non-zero) idempotent a ∈ QH2n(M).
In particular, it is stably and strongly non-displaceable.
Let us mention that, in particular, the special fiber is non-empty and so
pspec ∈ ∆. Moreover pspec is an interior point of ∆ – otherwise Φ
−1(pspec) is
isotropic of dimension < n and hence displaceable (see e.g. [9]).
Remark 1.10. If dimM = 2dimTk (that is we deal with a symplectic toric
manifold), the special fiber, say L, is a Lagrangian torus. In fact, this torus
is monotone: for every D ∈ π2(M,L) we have
ω = κ ·mL(D) ,
where mL stands for the Maslov class of L. This is an immediate consequence
of the definitions.
Remark 1.11. Note that when M is spherically monotone and the action is
compressible Theorems 1.7 and 1.9 match each other: in this case pspec = 0
and therefore the special fiber is a stable stem by Theorem 1.7. It is unknown
whether this property persists for the special fibers of non-compressible ac-
tions.
Example 1.12. Let M be the monotone symplectic blow up of CP 2 at k
points (0 ≤ k ≤ 3) which is equivariant with respect to the standard T2-
action and which is performed away from the Clifford torus in CP 2. Since
the blow-up is equivariant, M comes equipped with a Hamiltonian T2-action
extending the T2-action on CP 2. The Clifford torus is a fiber of the moment
map of the T2-action on CP 2. Let L ⊂M be the Lagrangian torus which is
the proper transform of the Clifford torus under the blow-up – it is a fiber of
the moment map of the T2-action on M . Using Proposition 1.8 it is easy to
see that L is the special fiber of M . According to Theorem 1.9, it is stably
and strongly non-displaceable. In fact, it is a stem: the displaceability of
all the other fibers was checked for k = 0 in [10], for k = 1 in [23] and for
k = 2, 3 in [40].
We refer to Section 1.7.2 for further discussion of related problems and very
recent advances.
Digression: Calabi vs. action-Maslov. The method used to prove
Theorem 1.9 also allows to prove the following result involving the mixed
action-Maslov homomorphism. Denote by vol (M) the symplectic volume of
M . Consider the function µ : H̃am (M) → R defined by
µ(φH) := −vol (M) lim
c(a, φlH)/l.
In the case when a is the unity in a field that is a direct summand in the
decomposition of the K-algebra QH2n(M,ω), as an algebra, into a direct
sum of subalgebras, µ is a homogeneous quasi-morphism on H̃am (M) called
Calabi quasi-morphism [22],[24],[46]; in the general case it has weaker prop-
erties [23]. With this language the functional ζ (on normalized functions) is
induced (up to a constant factor) by the pull-back of µ to the Lie algebra of
H̃am (M).
Following P.Seidel we described in [22] the restriction of µ (in fact, for any
spherically monotoneM) on π1(Ham(M)) ⊂ H̃am (M) in terms of the Seidel
homomorphism π1(Ham(M)) → QH
∗ (M), where QH
∗ (M) denotes the
group of invertible elements in the ring QH∗(M). Here we give an alternative
description of µ|π1(Ham(M)) in terms of the mixed action-Maslov homomor-
phism I which, in turn, also provides certain information about the Seidel
homomorphism.
Theorem 1.13. Assume M is spherically monotone and let µ be defined as
above for some non-zero idempotent a ∈ QH∗(M). Then
µ|π1(Ham(M)) = vol (M) · I.
Note that, in particular, µ|π1(Ham(M)) does not depend on a used to de-
fine µ. The theorem also implies that µ descends to a quasi-morphism on
Ham(M) if and only if I : π1(Ham(M)) → R vanishes identically (since µ
descends to a quasi-morphism on Ham(M) if and only if µ|π1(Ham(M)) ≡ 0 –
see e.g. [22], Prop. 3.4). The proof of the theorem is given in Section 9.1.
Let us mention also that, interestingly enough, the homomorphism I
coincides with the restriction to π1(Ham(M)) of yet another quasi-morphism
on H̃am (M) constructed by P.Py (see [52, 53]).
Digression: Action-Maslov homomorphism and Futaki invari-
ant. This remark grew from an observation pointed out to us by Chris
Woodward – we are grateful to him for that. Assume that our symplectic
manifoldM is complex Kähler (i.e. the symplectic structure onM is induced
by the Kähler one) and Fano (by this we mean here that [ω] = c1). Assume
also that a Hamiltonian S1-action {ft} preserves the Kähler metric and the
complex structure. For instance, if M2n is a symplectic toric manifold it can
be equipped canonically with a complex structure and a Kähler metric invari-
ant under the Tn-action on M , hence under the action of any S1-subgroup
{ft} of T
Let V be the Hamiltonian vector field generating the Hamiltonian flow
{ft}. Since {ft} preserves the complex structure, one can associate to V its
Futaki invariant F(V ) ∈ C [29]. It has been checked by E.Shelukhin [63]
that, up to a universal constant factor, this Futaki invariant is equal to the
value of the mixed action-Maslov homomorphism on the loop {ft}:
F(V ) = const · I({ft}).
Note that if such an M admits a Kähler-Einstein metric then the Futaki
invariant has to vanish [29] – thus if I({ft}) 6= 0 the manifold does not admit
a Kähler-Einstein metric. Moreover, if M2n is toric the opposite is also true:
if the Futaki invariant vanishes for any V generating a subgroup of the torus
Tn acting onM thenM admits a Kähler-Einstein metric – this follows from a
theorem by Wang and Zhu [67], combined with a previous result of Mabuchi
[38]. In terms of the moment polytope, the vanishing of the Futaki invariant,
and accordingly the existence of a Kähler-Einstein metric, on a Kähler Fano
toric manifold means precisely that the special point of the polytope coincides
with the barycenter.
1.5 Super(heavy) monotone Lagrangian submanifolds
Let (M2n, ω) be a closed spherically monotone symplectic manifold with [ω] =
κ · c1(TM) on π2(M), κ > 0. Let L ⊂ M be a closed monotone Lagrangian
submanifold with the minimal Maslov number NL ≥ 2. As usually, we put
NL = +∞ if π2(M,L) = 0. As before, we work with the basic field F which
is either Z2 or C. In the case F = C, we assume that L is relatively spin, that
is L is orientable and the 2nd Stiefel-Whitney class of L is the restriction of
some integral cohomology class of M .
Disclaimer: In the case F = C the results of this section are conditional:
We take for granted that Proposition 8.1 below, which was proved by Biran
and Cornea [15] for homologies with Z2-coefficients, extends to homologies
with C-coefficients. In each of the specific examples below we will explicitly
state which F we are using and whenever we use F = C we assume that L
is relatively spin.
Denote by j the natural morphism j : H•(L;F) → H•(M ;F). We say that
L satisfies the Albers condition [2] if there exists an element S ∈ H•(L;F) so
that j(S) 6= 0 and
deg S > dimL+ 1−NL .
We shall refer to such S as to an Albers element of L.
Example 1.14. Assume [L] ∈ H•(L;F) and j([L]) ∈ H•(M ;F) is non-zero.
This means precisely that [L] is an Albers element of L.
A closed monotone Lagrangian submanifold L which satisfies this con-
dition (and whose minimal Maslov number is greater than 1) will be called
homologically non-trivial in M .
Theorem 1.15. Let L be a closed monotone Lagrangian submanifold satisfy-
ing the Albers condition. Then L is heavy with respect to [M ]. In particular,
any homologically non-trivial Lagrangian submanifold is heavy with respect
to [M ].
Example 1.16. Assume that π2(M,L) = 0. Then the homology class of a
point is an Albers element of L, and hence L is heavy. Note that in this
case heaviness cannot be improved to superheaviness: the meridian on the
two-torus is heavy but not superheavy. Here we took F = Z2.
Example 1.17 (Lagrangian spheres in Fermat hypersurfaces). More exam-
ples of heavy (but not necessarily superheavy) monotone Lagrangian sub-
manifolds can be constructed as follows3.
Let M ⊂ CP n+1 be a smooth complex hypersurface of degree d. The
pull-back of the standard symplectic structure from CP n+1 turns M into a
symplectic manifold (of real dimension 2n). If d ≥ 2, then, as it is explained,
for instance, in [12],M contains a Lagrangian sphere: M can be included into
a family of algebraic hypersurfaces of CP n+1 with quadratic degenerations at
isolated points and the vanishing cycle of such a degeneration can be realized
by a Lagrangian sphere following [5], [21], [60], [61], [62].
Let M ⊂ CP n+1 be a projective hypersurface of degree d, 2 ≤ d < n+ 2.
The minimal Chern number of M equals N := n+2− d > 0. Let Ln ⊂ M2n
be a simply connected Lagrangian submanifold (for instance, a Lagrangian
sphere).
First, consider the case when n is even, L is relatively spin and the Euler
characteristics of L does not vanish (this is the case for a sphere). Then the
3We thank P.Biran for his indispensable help with these examples.
homology class j([L]) ∈ Hn(M ;Z) is non-zero: its self-intersection number
in M up to the sign equals the Euler characteristic. Thus [L] is an Albers
element. (Here we use F = C). In view of Theorem 1.15, L is heavy with
respect to [M ].
Second, suppose that n is of arbitrary parity but n > 2d − 3, and no
restriction on the Euler characteristics of L is assumed anymore. This yields
NL = 2N > n+ 1 and thus L satisfies the Albers condition with the class of
a point P as an Albers element. Thus L is heavy with respect to [M ] – here
we use F = Z2.
Finally, fix n ≥ 3 and an even number d such that 4 ≤ d < n+2. Consider
a Fermat hypersurface of degree d
M = {−zd0 + z
1 + . . .+ z
n+1 = 0} ⊂ CP
n+1 .
Its real part L := M ∩ RP n+1 lies in the affine chart z0 6= 0 and is given by
the equation
xd1 + . . .+ x
n+1 = 1,
where xj := Re(zj/z0) . Since d is even, L is an n-dimensional sphere. As
it was explained above, L is heavy with respect to [M ] if either n is even
(and F = C) or n > 2d − 3 (and F = Z2). However, in either case L is not
superheavy with respect to [M ]. Indeed, let Σd ≈ Zd be the group of complex
roots of unity. Given a vector α = (α1, . . . , αn) ∈ (Σd)
n+1, denote by fα the
symplectomorphism of M given by
fα(z0 : z1 : . . . : zn+1) = (z0 : α1z1 : . . . : αn+1zn+1) . (7)
If all αj ∈ C\R, then αjx /∈ R whenever x ∈ R\{0}, and thus fα(L)∩L = ∅.
Therefore L is strongly displaceable and the claim follows from the part (iii)
of Theorem 1.2.
The next result gives a user-friendly sufficient condition of superheaviness.
Theorem 1.18. Assume L is homologically non-trivial in M and assume
a ∈ QH2n(M) is a non-zero idempotent divisible by j([L]) in QH•(M), that
is a ∈ j([L]) ∗QH•(M). Then L is superheavy with respect to a.
The homological non-triviality of L in the hypothesis of the theorem means
just that [L] is an Albers element of L (see Example 1.14). In fact, the
theorem can be generalized to the cases when L has other Albers elements –
see Remark 8.3 (ii).
Example 1.19 (Lagrangian spheres in quadrics). Here we work with F = C.
Let M be the real part of the Fermat quadric M = {−z20 +
j=1 z
j = 0}.
Assume that n is even and L is a simply connected Lagrangian submanifold
with non-vanishing Euler characteristic (e.g. a Lagrangian sphere). Under
this assumption, [L] ∈ H•(L) and j([L]) 6= 0, since L has non-vanishing self-
intersection. Denote by p ∈ H∗(M ;F) the class of a point. The quantum
homology ring of M was described by Beauville in [8]. In particular, p ∗ p =
w−2[M ], where w = sκnqn. Thus
a± :=
[M ]± pw
are idempotents. One can show that j([L]) divides a− and hence L is a−-
superheavy. Since a− is invariant under the action of Symp(M), the manifold
L is strongly non-displaceable.
For simplicity, we present the calculation in the case n = 2 – the general
case is absolutely analogous. The 2-dimensional quadric is symplectomorphic
to (S2 × S2, ω ⊕ ω). Denote by A and B the classes of [S2] × [point] and
[point] × [S2] respectively. Since the symplectic form vanishes on j([L]) we
get that j([L]) = l(B − A) with l 6= 0. It is known that A ∗ B = p and
B ∗B = w−1[M ]. Thus j([L]) ∗ 1
wB = a−, that is j([L]) divides a−.
In particular, the Lagrangian anti-diagonal
∆ := {(x, y) ∈ S2 × S2 : x = −y} ,
which is diffeomorphic to the 2-sphere, is superheavy with respect to a−. It is
unknown whether ∆ is super-heavy with respect to a+. Further information
on superheavy Lagrangian submanifolds in the quadrics can be extracted
from [15].
Example 1.20 (A non-heavy monotone Lagrangian torus in S2 × S2). Con-
sider the quadric M = S2 × S2 from the previous example. We will think of
S2 as of the unit sphere in R3 whose symplectic form is the area form divided
by 4π. We will work again with F = C. Interestingly enough, such an M
contains a monotone Lagrangian torus that is not heavy with respect to a−.
Namely, consider a submanifold K given by equations4
K = {(x, y) ∈ S2 × S2 : x1y1 + x2y2 + x3y3 = −
, x3 + y3 = 0} .
4We thank Frol Zapolsky for his help with calculations in this example.
One readily checks that K is a monotone Lagrangian torus with NK = 2
which represents a zero element inH2(M ;F) (both with F = C and F = Z2).
Thus H•(K;F) does not contain any Albers element. Furthermore, K is
disjoint from the Lagrangian anti-diagonal ∆ and hence is not heavy with
respect to a− since, as it was shown above, ∆ is superheavy with respect to
a−. In particular, K is an exotic monotone torus: it is not symplectomorphic
to the Clifford torus which is a stem and hence a−-superheavy. A further
study of exotic tori in products of spheres is currently being carried out by
Y.Chekanov and F.Schlenk.
It is an interesting problem to understand whether K is superheavy with
respect to a+, or at least non-displaceable. Identify M \ {the diagonal} with
the unit co-ball bundle of the 2-sphere. After such an identification ∆ corre-
sponds to the zero section, while K corresponds to a monotone Lagrangian
torus, say K ′. Interestingly enough, the Lagrangian Floer homology of K ′
in T ∗S2 (with F = Z2) does not vanish as was shown by Albers and Frauen-
felder in [3], and thus K is not displaceable in M \ {the diagonal}. Thus
the question on (non)-displaceability of K is related to understanding of the
effect of the compactification of the unit co-ball bundle to S2 × S2.
The proofs of theorems above are based on spectral estimates due to
Albers [2] and Biran-Cornea [15]. Furthermore, the results above admit
various generalizations in the framework of Biran-Cornea theory of quantum
invariants for monotone Lagrangian submanifolds, see [15] and the discussion
in Section 8 below.
1.6 An effect of semi-simplicity
Recall that a commutative (finite-dimensional) algebra Q over a field A is
called semi-simple if it splits into a direct sum of fields as follows: Q =
Q1 ⊕ . . .⊕Qd , where
• each Qi ⊂ Q is a finite-dimensional linear subspace over A;
• each Qi is a field with respect to the induced ring structure;
• the multiplication in Q respects the splitting:
(a1, . . . , ad) · (b1, . . . , bd) = (a1b1, . . . , adbd).
A classical theorem of Wedderburn (see e.g. [66], §96) implies that the semi-
simplicity is equivalent to the absence of nilpotents in the algebra.
Remark 1.21. Assume that the K-algebra QH2n(M,ω) splits, as an algebra,
into a direct sum of two algebras, at least one of which is a field, and let e
be the unity in that field. In particular, this is the case when QH2n(M,ω) =
Q1⊕ . . .⊕Qd is semi-simple and e is the unity in one of the fields Qi. A slight
generalization of the argument in [23, 46] (see [24], the remark on pp. 56-57)
shows that the partial quasi-state ζ(e, ·) associated to e is R-homogeneous
(and not just R+-homogeneous as in the general case).
This immediately yields that every set which is heavy with respect to e is
automatically superheavy with respect to e.
In fact, in this situation ζ is a genuine symplectic quasi-state in the sense of
[23] and, in particular, a topological quasi-state in the sense of Aarnes [1] (see
[23] for details). In [1] Aarnes proved an analogue of the Riesz representation
theorem for topological quasi-states which generalizes the correspondence
between genuine states (that is positive linear functionals on C(M)) and
measures. The object τζ corresponding to a quasi-state ζ is called a quasi-
measure (or a topological measure). With this language in place, the sets that
are (super)heavy with respect to ζ are nothing else but the closed sets of the
full quasi-measure τζ . Any two such sets have to intersect for the following
basic reason: any quasi-measure is finitely additive on disjoint closed subsets
and therefore if two closed subsets of M of the full quasi-measure do not
intersect, the quasi-measure of their union must be greater than the total
quasi-measure of M , which is impossible.
Example 1.22. In this example we again assume that F = Z2. Let M =
CP n be equipped with the Fubini-Study symplectic structure ω, normalized
so that [ω] = c1, and let A ∈ H2n−2(M) be the homology class of the hyper-
plane. One readily verifies the following K-algebra isomorphism
QH2n(M) ∼= K[X ]/〈X
n+1 − u−1〉,
where
K = Z2[[u] = {zku
k + zk−1u
k−1 + . . . , zi ∈ Z2 ∀i}
is the field of Laurent-type series in u := sn+1 with coefficients in Z2 and
X = qA. Since no root of degree 2 or more of u−1 is contained in K, the
polynomial P is irreducible over K for any n (see e.g. [34], Theorem 9.1) and
therefore QH2n(M) is a field. Hence the collections of heavy and superheavy
sets with respect to the fundamental class coincide.
We claim that L := RP n ⊂ CP n is superheavy. The case n = 1 cor-
responds to the equator of the sphere, which is known to be a stable stem.
For n ≥ 2, note that NL = n + 1 and S = [RP
2] is an Albers element of L.
Therefore, L is [M ]-heavy by Theorem 1.15, and hence superheavy.
The next result follows directly from Theorem 1.3 (iii) and Remark 1.21:
Theorem 1.23. Assume that QH2n(M) is semi-simple and splits into a
direct sum of d fields whose unities will be denoted by e1, . . . , ed. Assume that
a closed subset X ⊂M is heavy with respect to a non-zero idempotent a – as
one can easily see, such an idempotent has to be of the form a = ej1+ . . .+ejl
for some 1 ≤ j1 < . . . < jl ≤ d. Then X is superheavy with respect to some
eji, 1 ≤ i ≤ l.
The theorem yields the following geometric characterization of non-semi-
simplicity of QH2n(M). Namely, define the symplectic Torelli group as the
group of all symplectomorphisms of M which induce the identity map on
H•(M ;F). For instance, this group contains Symp0(M). Note that any ele-
ment of the symplectic Torelli group acts trivially on the quantum homology
of M and hence maps sets (super)heavy with respect to an idempotent a to
sets (super)heavy with respect to a.
Now Theorem 1.23 readily implies the following
Corollary 1.24. Assume that (M,ω) contains a closed subset X which is
heavy with respect to a non-zero idempotent and displaceable by a symplec-
tomorphism from the symplectic Torelli group. Then QH2n(M) is not semi-
simple.
The simplest examples are provided by sets of the form X×{a meridian}
in M × T2 with a heavy X .
Another result in the same vein is as follows5. Given a set Y of positive
integers, put βY (M) =
i∈Y βi(M), where βi(M) stands for the i-th Betti
number of M over F .
5In the case F = C, Theorem 1.25 is conditional, see the disclaimer in the previous
section.
Theorem 1.25. Assume that either of the following (not mutually excluding)
conditions holds:
(a) M contains m > βY (M) + 1 pair-wise disjoint closed monotone La-
grangian submanifolds whose minimal Maslov numbers are greater than n+1
and belong to a set Y of positive integers.
(b) M contains pair-wise disjoint homologically non-trivial Lagrangian sub-
manifolds6 whose fundamental classes, viewed as (non-zero) elements of
H•(M ;F), are linearly dependent over F .
(In the case F = C assume that all the Lagrangian submanifolds above are
also relatively spin.)
Then QH2n(M) is not semi-simple.
The proof is given in Section 8.
Example 1.26. For instance, if all the Lagrangian submanifolds from part
(a) of the theorem are simply connected, their minimal Maslov numbers are
equal to 2N , so that the set Y consists of one element: Y = {2N}. Thus
if 2N > n + 1 and QH2n(M) is semi-simple, M cannot contain more than
β2N(M)+ 1 pair-wise disjoint simply-connected Lagrangians (provided all of
them are relatively spin if we work with F = C).
Example 1.27. Set F = C. Fix n ≥ 11 and an even number d such that
6 ≤ d < (n + 3)/2. Consider a Fermat hypersurface of degree d
M = {−zd0 + z
1 + . . .+ z
n+1 = 0} ⊂ CP
As we already saw in Example 1.17, the manifold L := M ∩ RP n+1 is an
n-dimensional Lagrangian sphere. Consider the images fα(L), where sym-
plectomorphisms fα are defined by (7). Note that, as long as αj/βj 6= ±1
for all j, the Lagrangian spheres fα(L) and fβ(L) are disjoint. Using this
observation, it is easy to find d/2 disjoint Lagrangian spheres in M .
The minimal Chern number N ofM equals n+2−d, and so 2N lies in the
interval [n+2, 2n−4]. In this case β2N(M) = 1 (see e.g. [31]). Since d/2 > 2,
we conclude from the previous example that QH2n(M) is not semi-simple.
This conclusion agrees with the computation of QH∗(M) by Beauville [8].
6See Example 1.14 for the definition. As in that example we again assume that all our
Lagrangian submanifolds are closed, monotone and have minimal Maslov number greater
than 1.
It would be interesting to find examples of symplectic manifolds where the
quantum homology is not known a priori and where the above theorems are
applicable. Let us mention that different obstructions to the semi-simplicity
of QH•(M) coming from Lagrangian submanifolds were recently found by
Biran and Cornea [14].
1.7 Discussion and open questions
1.7.1 Strong displaceability beyond Floer theory?
Clearly, displaceability implies stable displaceability. The converse is not
true, as the next example shows:
Example 1.28. Consider the complex projective space CP n equipped with
the Fubini-Study symplectic form (in our normalization the area of a line
equals 1). Identify CP n with the symplectic cut of the Euclidean ball B(1) ⊂
Cn (that is the boundary of B(1) is collapsed to CP n−1 along the fibers of
the Hopf fibration, see [36]), where B(r) := {π|z|2 ≤ r}. Then B(r) ⊂ CP n
(i) displaceable for r < 1/2;
(ii) strongly non-displaceable but stably displaceable for r ∈ [1/2, n/n+1);
(iii) strongly and stably non-displaceable for r ≥ n/n+ 1.
It is instructive to analyze the techniques involved in the proofs: The strong
non-displaceability result in (ii) is an immediate consequence of Gromov’s
packing-by-two-balls theorem, which is proved via the J-holomorphic variant
of the theorem which states that there exists a J-holomorphic line in CP n
passing through any two points. In the case (iii) the ball B(r) contains the
Clifford torus, which is stably non-displaceable. This follows either from the
fact that the Clifford torus is a stem (see [10]), or from non-vanishing of its
Lagrangian Floer homology [16].
The displaceability of B(r) in (i) follows from the explicit construction
of the two balls packing (see [33]). The stable displaceability in (ii) is a
direct consequence of Theorem 1.7 above: Indeed, consider the standard Tn-
action on CP n. The normalized moment polytope ∆ ⊂ Rn has the form
∆ = ∆stand + w where ∆stand is the standard simplex {ρi ≥ 0,
ρi ≤ 1} in
Rn, where (ρ1, . . . , ρn) denote coordinates in R
n, and w = − 1
(1, . . . , 1).
Note that the ball B(r) equals to Φ−1(∆r) where ∆r := r ·∆stand + w. Note
that ∆r does not contain the origin exactly when r ≤
which yields the
stable displaceability in (ii) above.
A mysterious feature of Example 1.28 is as follows. On the one hand, we
believe in the following general empiric principle: whenever one can establish
the non-displaceability of a subset by means of the Floer homology theory,
one gets for free the stable non-displaceability. On the other hand, we be-
lieve, following a philosophical explanation provided by Biran, that Gromov’s
packing-by-two-balls theorem may be extracted from some “operations” in
Floer homology. Example 1.28 shows that at least one of these beliefs is
wrong. It would be interesting to clarify this issue.
1.7.2 Heavy fibers of Poisson-commutative subspaces
It was shown in [23] that for any finite-dimensional Poisson-commutative
subspace A ⊂ C∞(M) at least one of the fibers of its moment map Φ has to
be non-displaceable.
Question. Is it true that at least one fiber of Φ has to be heavy (with respect
to some non-zero idempotent a ∈ QH∗(M))?
It is easy to construct an example of A whose moment map Φ has no
superheavy fibers: take T2 with the coordinates p, q mod 1 on it and take A
to be the set of all smooth functions depending only on p – the corresponding
Φ defines the fibration of T2 by meridians none of which is superheavy.
Here is another question which concerns fibers of symplectic toric man-
ifolds, i.e. fibers of a moment map Φ of an effective Hamiltonian Tn-action
on (M2n, ω). Assume M is (spherically) monotone. Theorem 1.9 shows that
in such a case the special fiber ofM is superheavy, hence stably and strongly
non-displaceable. In all the examples where it has been checked this turns
out to be the only non-displaceable fiber of M .
Question. Is the special fiber for a monotone symplectic toric M always
a stem? In particular, is it the only non-displaceable fiber of the moment
In the monotone case the special fiber is clearly the only heavy fiber of
the moment map, because it is superheavy and any other heavy fiber would
have had to intersect it. On the other hand, if we consider a Hamiltonian Tk-
action on M2n with k < n there can be more than one non-displaceable fiber
of the moment map – for instance, because of purely topological obstructions:
the simplest Hamiltonian T1-action on CP 2 provides such an example. In
the case of monotone symplectic toric manifolds of dimension bigger than 4
the question above is absolutely open.
After the first draft of this paper appeared, a remarkable progress in this
direction has been achieved in the works by Cho [17] and Fukaya, Oh, Ohta
and Ono [28]: In particular, it turns out that a non-monotone symplectic
toric manifold can have more than one non-displaceable fiber – this happens
already for certain equivariant blowups of CP 2.
Organization of the paper:
In Section 2 we prove Theorem 1.7 which in particular states that the
special fiber of a compressible torus action is a stable stem.
In Section 3 we sum up various preliminaries from Floer theory including
basic properties of spectral invariants and partial symplectic quasi-states. In
addition we spell out a useful property of the Conley-Zehnder index: it is a
quasi-morphism on the universal cover of the symplectic group (see Propo-
sition 3.5). For completeness we extract a proof of this property from [54];
alternatively, one can use the results of [19].
In Section 4 we prove parts (i) and (iii) of Theorem 1.2 and Theorem 1.3
on basic properties of (super)heavy sets.
In Section 5 we prove Theorem 1.5 on products of (super)heavy sets. Our
approach is based on a quite general product formula for spectral invariants
(Theorem 5.1), which is proved by a fairly lengthy algebraic argument.
In Section 6 we prove Theorem 1.2 (ii) on stable non-displaceability of
heavy subsets. The argument involves a “baby version” of the above-men-
tioned product formula.
In Section 7 we prove superheaviness of stable stems.
In Section 8 we bring together the proofs of various results related to
(super)heaviness of monotone Lagrangian submanifolds satisfying the Albers
condition, including Theorems 1.15, 1.18, 1.25 and Proposition 1.4.
In Section 9 we prove Theorem 1.9 on superheaviness of special fibers of
Hamiltonian torus actions on monotone symplectic manifolds. The proof is
quite involved. In fact, two tricks enabled us to shorten our original argu-
ment: First, we use the Fourier transform on the space of rapidly decaying
functions on the Lie coalgebra of the torus in order to reduce the problem
to the case of Hamiltonian circle actions. Second, we systematically use the
quasi-morphism property of the Conley-Zehnder index for asymptotic calcu-
lations with Hamiltonian spectral invariants. Finally, in Section 9.1 we prove
Theorem 1.13.
Figure 1 sums up the hierarchy of the non-displaceability properties dis-
cussed above.
������������������������
MHierarchy of non−displaceability properties of a closed subset of 
Heavy
aidempotent 
Superheavy
idempotent  a
wrt a non−zero wrt a non−zero
(3) (4) (5) (6)
action on a spherically
torus action on a (not
a compressible Hamiltonian
necessarily monotone)
Special fiber of
monotone M
Always true.
True under certain conditions (see below)
? Question (under certain conditions − see below)
Monotone
Lagrangian
submanifold L
(14) (15)
(17) (18)
(21) (22)
(16b)
Product of codimension−1
skeletons of fine
triangulations
Strongly
non−displaceable
a Hamiltonian torus
Non−displaceable
a symplectic isotopy
Non−displaceable by
wrt [M]
Heavy
(16a)
Superheavy
wrt [M]
Stable stem
Stably
non−displaceable
Zero fiber of
Figure 1: Hierarchy of non-displaceability properties
(1),(2),(6),(19) - Trivial.
(3) True if a is invariant under the action of the whole group Symp (M) –
Theorem 1.2, part (iii).
(4), (9) Theorem 1.2, part (iii).
(5) True if the algebra QH2n(M) is semi-simple – see Corollary 1.24.
(7a) True if the algebra QH2n(M) splits, as an algebra, into a direct sum of
two algebras, at least one of which is a field, and a is the unity element in
that field – see Remark 1.21.
(7b), (16b) Theorem 1.2, part (i).
(8) Theorem 1.2, part (ii).
(10) Theorem 1.18 (see the assumptions on L there).
(11) True if the algebra QH2n(M) is semi-simple – see Corollary 1.24.
(12) Theorem 1.3, part (i).
(13) Theorem 1.3, part (ii).
(14) Theorem 1.18 (see the assumptions on L there) with a = [M ] – i.e. j(L)
is invertible in QH•(M).
(15) L satisfies the Albers condition – see Theorem 1.15.
(16a) True if QH2n(M) is a field – see Remark 1.21.
(17) Theorem 1.6.
(18) Theorem 1.9.
(20) Theorem 1.7.
(21) Is the special fiber for a monotone symplectic toric M always a stem?
See Section 1.7.2.
(22) True if M is spherically monotone and the torus action is compressible
– see Remark 1.11.
(23) See [23].
2 Detecting stable displaceability
For detecting stable displaceability of a subset of a symplectic manifold we
shall use the following result (cf. [48, Chapter 6]).
Theorem 2.1. Let X be a closed subset of a closed symplectic manifold
(M,ω). Assume that there exists a contractible loop of Hamiltonian diffeo-
morphisms of (M,ω) generated by a normalized time-periodic Hamiltonian
Ht(x) so that Ht(x) 6= 0 for all t ∈ [0, 1] and x ∈ X. Then X is stably
displaceable.
Proof. Denote by ht the Hamiltonian loop generated by H . Let h
t be its
homotopy to the constant loop: h
t = ht and h
t = 1. Write H
(s)(x, t) for
the corresponding normalized Hamiltonians. Consider the family of diffeo-
morphisms Ψs of M × T
∗S1 given by
Ψs(x, r, θ) = (h
θ x, r −H
(s)(h
θ x, θ), θ) .
One readily checks that Ψs, s ∈ [0, 1], is a Hamiltonian isotopy (not com-
pactly supported). We claim that Ψ1 displaces Y := X × {r = 0}. Indeed,
if Ψ1(x, 0, θ) ∈ Y we have hθx ∈ X and Hθ(hθx) = 0 which contradicts the
assumption of the theorem. This completes the proof.
Proof of Theorem 1.7: Choose a linear functional F : Rk → R with
rational coefficients which is strictly positive on Y . Then for some suffi-
ciently large positive integer N the Hamiltonian H := NΦ∗F generates a
contractible Hamiltonian circle action on M and H is strictly positive on
X := Φ−1(Y ). Thus X is stably displaceable in view of the previous theo-
3 Preliminaries on Hamiltonian Floer theory
3.1 Valuation on QH∗(M)
Define a function ν : K → Γ by
θ) = max{ θ | zθ 6= 0} .
The convention is that ν(0) = −∞. In algebraic terms, exp ν is a non-
Archimedean absolute value on K.
The function ν admits a natural extension to Λ and then to QH∗(M) –
abusing the notation we will denote all of them by ν. Namely, any element
of λ ∈ Λ can be uniquely represented as λ =
θ uθs
θ, where each uθ belongs
to F [q, q−1], and any non-zero a ∈ QH∗(M) can be uniquely represented as
i λibi, 0 6= λi ∈ Λ, 0 6= bi ∈ H∗(M ;F). Define
ν(λ) := max
θ | uθ 6= 0
ν(a) := max
ν(λi).
3.2 Hamiltonian Floer theory
We briefly recall the notation and conventions for the setup of the Hamilto-
nian Floer theory that will be used in the proofs.
Let L be the space of all smooth contractible loops γ : S1 = R/Z → M .
We will view such a γ as a 1-periodic map γ : R → M . Let D2 be the
standard unit disk in R2. Consider a covering L̃ of L whose elements are
equivalence classes of pairs (γ, u), where γ ∈ L, u : D2 → M , u|∂D2 = γ (i.e.
u(e2π
−1t) = γ(t)), is a (piecewise smooth) disk spanning γ in M and the
equivalence relation is defined as follows: (γ1, u1) ∼ (γ2, u2) if and only if γ1 =
γ2 and the 2-sphere u1#(−u2) vanishes in H
2 (M). The equivalence class of
a pair (γ, u) will be denoted by [γ, u]. The group of deck transformations of
the covering L̃ → L can be naturally identified with HS2 (M). An element
A ∈ HS2 (M) acts by the transformation
A([γ, u]) = [γ, u#(−A)]. (8)
Let F :M× [0, 1] → R be a Hamiltonian function (which is time-periodic
as we always assume). Set Ft := F (·, t). We will denote by ft the Hamiltonian
flow generated by F , meaning the flow of the time-dependent Hamiltonian
vector field Xt defined by the formula
ω(·, Xt) = dFt(·) ∀t.
(Note our sign convention!)
Let PF ⊂ L be the set of all contractible 1-periodic orbits of the Hamilto-
nian flow generated by F , i.e. the set of all γ ∈ L such that γ(t) = ft(γ(0)).
Denote by P̃F the full lift of PF to L̃.
Denote by Fix (F ) the set of those fixed points of f that are endpoints of
contractible periodic orbits of the flow:
Fix (F ) := {x ∈M | ∃γ ∈ PF , x = γ(0)}.
We say that F is regular if for any x ∈ Fix (F ) the map dxf : TxM → TxM
does not have eigenvalue 1.
Recall that the action functional is defined on L̃ by the formula:
AF ([γ, u]) =
F (γ(t), t)dt−
Note that
AF (Ay) = AF (y) + ω(A) (9)
for all y ∈ L̃ and A ∈ HS2 (M).
For a regular Hamiltonian F define a vector space C(F ) over F as the
set of all formal sums
λiyi, λi ∈ Λ, yi ∈ P̃F ,
modulo the relations
Ay = s−ω(A)q−c1(A)y,
for all y ∈ P̃F , A ∈ H
2 (M). The grading on Λ together with the Conley-
Zehnder index on elements of P̃F (see Section 3.3) defines a Z-grading on
C(F ). We will denote the i-th graded component by Ci(F ).
Given a loop {Jt}, t ∈ S
1, of ω-compatible almost complex structures,
define a Riemannian metric on L by
(ξ1, ξ2) =
ω(ξ1(t), Jtξ2(t))dt,
where ξ1, ξ2 ∈ TγL. Lift this metric to L̃ and consider the negative gradient
flow of the action functional AF . For a generic choice of the Hamiltonian
F and the loop {Jt} (such a pair (F, J) is called regular) the count of iso-
lated gradient trajectories connecting critical points of AF gives rise in the
standard way [26], [32], [58] to a Morse-type differential
d : C(F ) → C(F ), d2 = 0. (10)
The differential d is Λ-linear and has the graded degree −1. It strictly de-
creases the action. The homology, defined by d, is called the Floer homology
and will be denoted by HF∗(F, J). It is a Λ-module. Different choices of a
regular pair (F, J) lead to natural isomorphisms between the Floer homology
groups.
The following proposition summarizes a few basic algebraic properties of
Floer complexes and Floer homology that will be important for us further.
The proof is straightforward and we omit it.
Proposition 3.1.
1) Each Ci(F ) and each HFi(F, J), i ∈ Z, is a finite-dimensional vector
space over K.
2) Multiplication by q defines isomorphisms Ci(F ) → Ci+2(F ) and
HFi(F, J) → HFi+2(F, J) of K-vector spaces.
3) For each i ∈ Z there exists a basis of Ci(F ) over K consisting of the
elements of the form ql[γ, u], with [γ, u] ∈ P̃F .
4) A finite collection of elements of the form ql[γ, u], [γ, u] ∈ P̃F , lying
in C0(F ) ∪C1(F ) is a basis of the vector space C0(F )⊕C1(F ) over the field
K if and only if it is a basis of the module C(F ) over the ring Λ.
3.3 Conley-Zehnder and Maslov indices
In this section we briefly outline the definition and recall the relevant proper-
ties of the Conley-Zehnder index referring to [54, 58, 57] for details. In par-
ticular, we show that the Conley-Zehnder index is a quasi-morphism on the
universal cover S̃p (2k) of the symplectic group Sp(2k) (see Proposition 3.5
below), a fact which will be useful for asymptotic calculations with Floer
homology in the next sections. There are several routes leading to this fact,
which is quite natural since all homogeneous quasi-morphisms on S̃p (2k) are
proportional, and hence the same quasi-morphism admits quite dissimilar
definitions [7]. We extract the quasi-morphism property from the paper of
Robbin and Salamon [54] by bringing together several statements contained
therein7.
The Conley-Zehnder index assigns to each [γ, u] ∈ P̃F a number. Orig-
inally the Conley-Zehnder index was defined only for regular Hamiltonians
[18] – in this case it is integer-valued and gives rise to a grading of the ho-
mology groups in Floer theory. Later the definition was extended in different
ways by different authors to arbitrary Hamiltonians. We will use such an ex-
tension introduced in [54] (also see [57, 58]). In this case the Conley-Zehnder
index may take also half-integer values.
Let k be a natural number. Consider the symplectic vector space R2k
with a symplectic form ω2k on it. Denote by p = (p1, . . . , pk), q = (q1, . . . , qk)
the corresponding Darboux coordinates on the vector space R2k.
7We thank V.L. Ginzburg for stimulating discussions on the material of this section.
Robbin-Salamon index of Lagrangian paths: Let V ⊂ R2k be a
Lagrangian subspace. Consider the Grassmannian Lagr (k) of all Lagrangian
subspaces in R2k and consider the hypersurface ΣV ⊂ Lagr (k) formed by all
the Lagrangian subspaces that are not transversal to V . To such a V and
to any smooth path {Lt}, 0 ≤ t ≤ 1, in Lagr (k) Robbin and Salamon [54]
associate an index, which may take integer or half-integer values and which
we will denote by RS({Lt}, V ). The definition of the index can be outlined
as follows.
A number t ∈ [0, 1] is called a crossing if Lt ∈ ΣV . To each crossing t one
associates a certain quadratic form Qt on the space L(t) ∩ V – see [54] for
the precise definition. The crossing t is called regular if the quadratic form
Qt is non-degenerate. The index of such a regular crossing t is defined as the
signature of Qt if 0 < t < 1 and as half of the signature of Qt if t = 0, 1.
One can show that regular crossings are isolated. For a path {Lt} with only
regular crossings the index RS({Lt}, V ) is defined as the sum of the indices
of its crossings. An arbitrary path can be perturbed, keeping the endpoints
fixed, into a path with only regular crossings and the index of the perturbed
path does not depend on the perturbation – in fact, it depends only on the
fixed endpoints homotopy class of the path. Moreover, it is additive with
respect to the concatenation of paths and satisfies the naturality property:
RS({ALt}, AV ) = RS({Lt}, V ) for any symplectic matrix A.
Indices of paths in Sp (2k): Consider the group Sp (2k) of symplectic
2k × 2k-matrices. Denote by S̃p (2k) its universal cover. One can use the
index RS in order to define two indices on the space of smooth paths in
Sp (2k).
The first index, denoted by Ind2k, is defined as follows. Fix a Lagrangian
subspace V ⊂ R2k. For each smooth path {At}, 0 ≤ t ≤ 1, in Sp (2k) define
Ind2k ({At}, V ) as
Ind2k ({At}, V ) := RS({AtV }, V ).
The naturality of the RS index implies that
RS({BAtB
−1(BV )}, BV ) = RS({BAtV )}, BV ) =
= RS({AtV )}, V ) for any B ∈ Sp (2k)
and thus we get the following naturality condition for Ind2k:
Ind2k ({BAtB
−1}, BV ) = Ind2k ({At}, V ) for any B ∈ Sp (2k). (11)
The second index, which we will call the Conley-Zehnder index of a matrix
path and which will be denoted by CZmatr, is defined as follows. For each
A ∈ Sp (2k) denote by GrA the graph of A which is a Lagrangian subspace
of the symplectic vector space R4k = R2k×R2k equipped with the symplectic
structure ω4k = −ω2k ⊕ ω2k. Denote by ∆ the diagonal in R
4k = R2k × R2k
– it is a Lagrangian subspace with respect to ω4k. Now for any smooth path
{At}, 0 ≤ t ≤ 1, in Sp (2k) define CZmatr as
CZmatr({At}) := RS({GrAt},∆).
Equivalently, one can define CZmatr({At}) similarly to the index RS by look-
ing at the intersections of {A(t)} with the hypersurface Σ ⊂ Sp (2k) formed
by all the symplectic 2k× 2k-matrices with eigenvalue 1 and translating the
notions of a regular crossing and the corresponding quadratic form to this
setup.
Both indices Ind2k ({At}, V ) and CZmatr({At}) depend only on the fixed
endpoints homotopy class of the path {At} and are additive with respect to
the concatenation of paths in Sp (2k). The relation between the two indices
is as follows. Denote by I2k the 2k × 2k identity matrix. Given a smooth
path {At}, 0 ≤ t ≤ 1, in Sp (2k), set Ât := I2k ⊕ At ∈ Sp (4k). Then
CZmatr({At}) = Ind4k({Ât},∆). (12)
Remark 3.2. Note that near each W ∈ ΣV there exists a local coordinate
chart (on Lagr (k)) in which ΣV can be defined by an algebraic equation of
degree bounded from above by a constant C depending only on k and W .
Moreover, since for any two V, V ′ ∈ Lagr (k) there exists a diffeomorphism of
Lagr (k) mapping ΣV into ΣV ′ we can assume that C = C(k) is independent
of W and depends only on k. Therefore for any V , for any point W ∈ ΣV
and for any sufficiently small open neighborhood UW of W in Lagr (k) the
number of connected components of UW \(UW ∩ΣV ) is bounded by a constant
depending only on k.
Using these observations and the fact that regular crossings are isolated it
is easy to show that there exists a constant C(k), depending only on k, such
that for any Lagrangian subspace V ⊂ R2k and any path {At} ⊂ Sp (2k),
0 ≤ t ≤ 1, there exists a δ > 0 such that for any smooth path {A′t} ⊂ Sp (2k),
0 ≤ t ≤ 1, which is δ-close to {At} in the C
0-metric, one has
|Ind2k({At}, V )− Ind2k({A
t}, V | < C(k),
|CZmatr({At})− CZmatr({A
t}| < C(k).
Leray theorem on the index Ind2k: The following result follows from
Theorem 5.1 in [54] which Robbin and Salamon credit to Leray [35], p.52.
Denote by L the Lagrangian (q1, . . . , qk)-coordinate plane in R
2k. Any sym-
plectic matrix S ∈ Sp (2k) can be decomposed into k × k blocks as
where the blocks satisfy, in particular, the condition that
EF T − FET = 0. (13)
If SL ∩ L = 0 then the k × k-matrix F is invertible and multiplying (13)
by F−1 on the left and (F T )−1 = (F−1)T on the right, we get that F−1E −
ET (F−1)T = 0. Therefore the matrix QS := F
−1E is symmetric.
Theorem 3.3 ([54], Theorem 5.1; [35], p.52). Assume {At}, {Bt}, 0 ≤ t ≤ 1,
are two smooth paths in Sp (2k), such that A0 = B0 = I2k and A1L ∩ L = 0,
B1L ∩ L = 0, A1B1L ∩ L = 0. Then
Ind2k({AtBt}, L) = Ind2k({At}, L) + Ind2k({Bt}, L) +
sign (QA1 +QB1),
where sign (QA1 +QB1) is the signature of the quadratic form defined by the
symmetric k × k-matrix QA1 +QB1.
Corollary 3.4. Let V be any Lagrangian subspace of R2k. Then there exists
a positive constant C, depending only on k, such that for any smooth paths
{Xt}, {Yt}, 0 ≤ t ≤ 1, in Sp (2k), such that X0 = Y0 = I2k (there are no
assumptions on X1, Y1!),
|Ind2k({XtYt}, V )− Ind2k({Xt}, V )− Ind2k({Yt}, V )| < C.
Proof. We will write C1, C2, . . . for (possibly different) positive constants de-
pending only on k.
Pick a map Ψ ∈ Sp (2k) such that ΨV = L. Denote At = ΨXtΨ
Bt = ΨYtΨ
−1. Note that the paths {At}, {Bt} are based at the identity.
Using the naturality property (11) of Ind2k we get
|Ind2k({XtYt}, V )− Ind2k({Xt}, V )− Ind2k({Yt}, V )| =
= |Ind2k({ΨXtYtΨ
−1},ΨV )− Ind2k({ΨXtΨ
−1},ΨV )−
−Ind2k({ΨYtΨ
−1},ΨV )| =
= |Ind2k({(ΨXtΨ
−1)(ΨYtΨ
−1)}, L)− Ind2k({ΨXtΨ
−1}, L)−
−Ind2k({ΨYtΨ
−1}, L)| =
= |Ind2k({AtBt}, L)− Ind2k({At}, L)− Ind2k({Bt}, L)|.
|Ind2k({XtYt}, V )− Ind2k({Xt}, V )− Ind2k({Yt}, V )| =
= |Ind2k({AtBt}, L)− Ind2k({At}, L)− Ind2k({Bt}, L)|. (14)
Further on, Remark 3.2 implies that we can find sufficiently C0-close identity-
based perturbations {A′t}, {B
t} of {At}, {Bt} such that
A′1L ∩ L = 0, B
1L ∩ L = 0, A
1L ∩ L = 0. (15)
|Ind2k({AtBt}, L)− Ind2k({At}, L)− Ind2k({Bt}, L)|−
−|Ind2k({A
t}, L)− Ind2k({A
t}, L)− Ind2k({B
t}, L)| < C1, (16)
for some C1. On the other hand, since the three identity-based paths {A
{B′t}, {A
t}, satisfy the conditions (15), we can apply to them Theorem 3.3.
Hence there exists C2 such that
|Ind2k({A
t}, L)− Ind2k({A
t}, L)− Ind2k({B
t}, L)| < C2.
Combining it with (14) and (16) we get that there exists C3 such that
|Ind2k({XtYt}, V )− Ind2k({Xt}, V )− Ind2k({Yt}, V )| < C3,
which finishes the proof.
Conley-Zehnder index as a quasi-morphism: Recall that 2n = dimM .
Restricting CZmatr to the identity-based paths in Sp (2n) one gets a function
on S̃p (2n) that will be still denoted by CZmatr.
Proposition 3.5 (cf. [19]). The function CZmatr : S̃p (2n) → R is a quasi-
morphism. It means that there exists a constant C > 0 such that
|CZmatr(ab)− CZmatr(a)− CZmatr(b)| ≤ C ∀a, b ∈ S̃p (2n).
Proof. Represent a and b by identity-based paths {At}, {Bt}, 0 ≤ t ≤ 1, in
Sp (2n). Then use (12) and apply Corollary 3.4 for k = 2n, V = ∆ to {Ât},
{B̂t} in Sp (4n).
Maslov index of symplectic loops: The Conley-Zehnder index for
identity-based loops in Sp (2n) is called the Maslov index of a loop. Its
original definition, going back to [4], is the following: it is the intersection
number of an identity-based loop with the stratified hypersurface Σ whose
principal stratum is equipped with a certain co-orientation. Note that we do
not divide the intersection number by 2 and thus in our case the Maslov index
takes only even values; for instance, the Maslov index of a counterclockwise
2π-twist of the standard symplectic R2 is 2. We denote the Maslov index of
a loop {B(t)} by Maslov ({B(t)}).
Conley-Zehnder and Maslov indices of periodic orbits: The Con-
ley-Zehnder index for periodic orbits is defined by means of the Conley-
Zehnder index for matrix paths as follows. Given [γ, u] ∈ P̃F , build an
identity-based path {A(t)} in Sp (2n) as follows: take a symplectic trivial-
ization of the bundle u∗(TM) over D2 and use the trivialization to identify
the linearized flow dγ(0)ft, 0 ≤ t ≤ 1, along γ with a symplectic matrix
{A(t)}. Then the Conley-Zehnder index CZF ([γ, u]) is defined as
CZF ([γ, u]) := n− CZmatr ({A(t)}). (17)
With such a normalization of CZF for any sufficiently C
2-small autonomous
Morse Hamiltonian F , the Conley-Zehnder index of an element of P̃F , rep-
resented by a pair [x, u] consisting of a critical point x of F (viewed as a
constant path in M) and the trivial disk u, is equal to the Morse index of
x. Note that with such a normalization CZF (Sy) = CZF (y)+2
c1(M) for
every y ∈ P̃F and S ∈ H
2 (M).
Similarly, if the time-1 flow generated by F defines a loop in Ham(M) then
to each [γ, u] ∈ P̃F one can associate its Maslov index. Namely, trivialize the
bundle u∗(TM) over D2 and identify the linearized flow {dxft} along γ with
an identity-based loop of symplectic 2n × 2n-matrices. Define the Maslov
index mF ([γ, u]) as the Maslov index for the loop of symplectic matrices.
Under the action of HS2 (M) on P̃F the Maslov index changes as follows:
mF (S · [γ, u]) = mF ([γ, u])− 2
c1(M), S ∈ H
2 (M).
Let us make the following remark. Assume γ ∈ PF and assume that a
symplectic trivialization of the bundle γ∗(TM) over S1 identifies {dγ(0)ft}
with an identity-based path {A(t)} of symplectic matrices. Assume there
is another symplectic trivialization of the same bundle, coinciding with the
first one at γ(0), and denote by {B(t)} the identity-based loop of transition
matrices from the first symplectic trivialization to the second one. Use the
second trivialization to identify {dγ(0)ft} with an identity-based path {A
′(t)}.
CZmatr ({A
′(t)}) = CZmatr ({A(t)}) +Maslov ({B(t)}), (18)
and if {A(t)} is a loop then so is {A′(t)} and
Maslov ({A′(t)}) = Maslov ({A(t)}) +Maslov ({B(t)}). (19)
3.4 Spectral numbers
Given the algebraic setup as above, the construction of the Piunikhin-Sala-
mon-Schwarz (PSS) isomorphism [47] yields a Λ-linear isomorphism (PSS-
isomorphism) φM : QH∗(M) → HF∗(F, J) which preserves the grading and
which is actually a ring isomorphism (the pair-of-pants product defines a ring
structure on HF∗(F, J)).
Using the PSS-isomorphism one defines the spectral numbers c(a, F ),
where 0 6= a ∈ QH∗(M), in the usual way [45]. Namely, the action functional
AF defines a filtration on C(F ) which induces a filtration HF
∗ (F, J), α ∈ R,
on HF∗(F, J), with HF
∗ (F, J) ⊂ HF
∗ (F, J) as long as α < β. Then
c(a, F ) := inf {α | φM(a) ∈ HF
∗ (F, J)}.
Such spectral number is finite and well-defined (does not depend on J). Here
is a brief account of the relevant properties of spectral numbers – for details
see [45] (see also [65, 42, 59, 43] for earlier versions of this theory).
(Spectrality) c(a,H) ∈ spec (H), where the spectrum spec (H) of H is
defined as the set of critical values of the action functional AH , i.e.
spec (H) := AH(P̃H) ⊂ R.
(Quantum homology shift property) c(λa,H) = c(a,H) + ν(λ) for all
λ ∈ Λ, where ν is the valuation defined in Section 3.1.
(Hamiltonian shift property) c(a,H + λ(t)) = c(a,H) +
λ(t) dt for
any Hamiltonian H and function λ : S1 → R.
(Monotonicity) If H1 ≤ H2, then c(a,H1) ≤ c(a,H2).
(Lipschitz property) The map H 7→ c(a,H) is Lipschitz on the space of
(time-dependent) Hamiltonians H : M × S1 → R with respect to the
C0-norm.
(Symplectic invariance) c(a, φ∗H) = c(a,H) for every φ ∈ Symp0(M),
H ∈ C∞(M); more generally, Symp (M) acts on H∗(M ;F), and hence
on QH∗(M), and c(a, φ
∗H) = c(φ∗a,H) for any φ ∈ Symp (M).
(Normalization) c(a, 0) = ν(a) for every a ∈ QH∗(M).
(Homotopy invariance) c(a,H1) = c(a,H2) for any normalized H1, H2
generating the same φ ∈ H̃am (M). Thus one can define c(a, φ) for any
φ ∈ H̃am (M) as c(a,H) for any normalized H generating φ.
(Triangle inequality) c(a ∗ b, φψ) ≤ c(a, φ) + c(b, ψ).
The commutative ring QH•(M) admits a K-bilinear and K-valued form Ω
on QH•(M) which associates to a pair of quantum homology classes a, b ∈
QH•(M) the coefficient (belonging to K) at the class [point] = [point] · q
a point in their quantum product a ∗ b ∈ QH•(M) (the Frobenius structure).
Let τ : K → F be the map sending each series
θ∈Γ zθs
θ, zθ ∈ F , to its free
term z0. Define a non-degenerate F -valued F -linear pairing on QH•(M) by
Π(a, b) := τΩ(a, b) = τΩ(a ∗ b, [M ]) . (20)
Note that Π is symmetric and
Π(a ∗ b, c) = Π(a, b ∗ c) ∀a, b, c ∈ QH•(M). (21)
With this notion at hand, we can present another important property of
spectral numbers:
(Poincaré duality) c(b, φ) = − infa∈Υ(b) c(a, φ
−1) for all b ∈ QH•(M)\{0}
and φ. Here Υ(b) denotes the set of all a ∈ QH•(M) with Π(a, b) 6= 0.
The Poincaré duality can be extracted from [47] (cf. [22]) – for a proof see
[46].
The next property is an immediate consequence of the definitions (see [22]
for a discussion in the monotone case):
(Characteristic exponent property) Given 0 6= λ ∈ F , a, b ∈ QH∗(M),
a, b, a + b 6= 0, and a (time-dependent) Hamiltonian H , one has
c(λ · a,H) = c(a,H) and c(a+ b,H) ≤ max(c(a,H), c(b,H)).
3.5 Partial symplectic quasi-states
Given a non-zero idempotent a ∈ QH2n(M) and a time-independent Hamil-
tonian H :M → R, define
ζ(a,H) := lim
c(a, lH)
. (22)
When a is fixed, we shall often abbreviate ζ(H) instead of ζ(a,H). The limit
in the formula (22) always exists and thus the functional ζ : C∞(M) → R is
well-defined. The functional ζ on C∞(M) is Lipschitz with respect to the C0-
norm ‖H‖ = maxM |H| and therefore extends to a functional ζ : C(M) → R,
where C(M) is the space of all continuous functions on M . These facts were
proved in [23] in the case a = [M ] but the proofs actually go through for any
non-zero idempotent a ∈ QH2n(M).
Here we will list the properties of ζ for such an M . Again, these proper-
ties were proved in [23] in the case a = [M ] but the proof goes through for
any non-zero idempotent a ∈ QH2n(M). The additivity with respect to con-
stants property was not explicitly listed in [23] but follows immediately from
the definition of ζ and the Hamiltonian shift property of spectral numbers.
The triangle inequality follows readily from the definition of ζ and from the
triangle inequality for the spectral numbers.
Theorem 3.6. The functional ζ : C(M) → R satisfies the following prop-
erties:
Semi-homogeneity: ζ(αF ) = αζ(F ) for any F and any α ∈ R≥0.
Triangle inequality: If F1, F2 ∈ C
∞(M), {F1, F2} = 0 then ζ(F1 + F2) ≤
ζ(F1) + ζ(F2).
Partial additivity and vanishing: If F1, F2 ∈ C
∞(M), {F1, F2} = 0 and the
support of F2 is displaceable, then ζ(F1 + F2) = ζ(F1); in particular, if the
support of F ∈ C(M) is displaceable, ζ(F ) = 0.
Additivity with respect to constants and normalization: ζ(F +α) = ζ(F )+α
for any F and any α ∈ R. In particular, ζ(1) = 1.
Monotonicity: ζ(F ) ≤ ζ(G) for F ≤ G.
Symplectic invariance: ζ(F ) = ζ(F ◦ f) for every symplectic diffeomorphism
f ∈ Symp0 (M).
Characteristic exponent property: ζ(a1+a2, F ) ≤ max(ζ(a1, F ), ζ(a2, F )) for
each pair of non-zero idempotents a1, a2 with a1 ∗ a2 = 0, a1+ a2 6= 0 (in this
case a1 + a2 is also a non-zero idempotent), and for all F ∈ C(M) .
We will call the functional ζ : C(M) → R satisfying all the properties
listed in Theorem 3.6 a partial symplectic quasi-state.
4 Basic properties of (super)heavy sets
In this section we prove parts (i) and (iii) of Theorem 1.2, as well as The-
orem 1.3. We shall use that a partial symplectic quasi-state ζ extends by
continuity in the uniform norm to a monotone functional on the space of
continuous functions C(M), see Section 3.5 above. In particular, one can
use continuous functions instead of the smooth ones in the definition of (su-
per)heaviness in formulae (3) and (4).
Assume a partial quasi-state ζ defined by a non-zero idempotent is fixed
and we consider heaviness and superheaviness with respect to ζ . We start
with the following elementary
Proposition 4.1. A closed subset X ⊂ M is heavy if and only if for every
H ∈ C∞(M) with H|X = 0, H ≤ 0 one has ζ(H) = 0. A closed subset
X ⊂ M is superheavy if and only if for every H ∈ C∞(M) with H|X = 0,
H ≥ 0 one has ζ(H) = 0.
Proof. The “only if” parts follow readily from the monotonicity property of
ζ . Let us prove the “if” part in the “heavy case” – the “superheavy” case is
similar. Take a function H on M and put
F = min(H − inf
H, 0) .
Note that F |X = 0 and F ≤ 0. Thus ζ(F ) = 0 by the assumption of the
proposition. Thus
0 = ζ(F ) ≤ ζ(H − inf
H) = ζ(H)− inf
which yields heaviness of X .
The following proposition proves part (i) of Theorem 1.2.
Proposition 4.2. Every superheavy set is heavy.
Proof. Let X ⊂ M be a superheavy subset. Assume that H|X = 0, H ≤ 0.
By the triangle inequality for ζ we have ζ(H) + ζ(−H) ≥ 0. Note that
−H|X = 0, −H ≥ 0. Superheaviness yields ζ(−H) = 0, so ζ(H) ≥ 0. But
by monotonicity ζ(H) ≤ 0. Thus ζ(H) = 0 and the claim follows from
Proposition 4.1.
Superheavy sets have the following user-friendly property.
Proposition 4.3. Let X ⊂ M be a superheavy set. Then for every α ∈ R
and H ∈ C∞(M) with H|X ≡ α one has ζ(H) = α.
Proof. Since ζ(H + α) = ζ(H) + α it suffices to prove the proposition for
α = 0. Take any function H with H|X = 0. Since X is superheavy and, by
Proposition 4.2, also heavy, we have
0 = ζ(−|H|) ≤ ζ(H) ≤ ζ(|H|) = 0 ,
which yields ζ(H) = 0.
As an immediate consequence we get part (iii) of Theorem 1.2.
Proposition 4.4. Every superheavy set intersects with every heavy set.
Proof. Let X be a superheavy set and Y be a heavy set. Assume on the
contrary that X ∩ Y = ∅. Take a function H ≤ 0 with H|Y ≡ 0 and
H|X ≡ −1. Then ζ(H) = −1 by Proposition 4.3. On the other hand,
ζ(H) = 0 since Y is heavy, and we get a contradiction.
Note that two heavy sets do not necessarily intersect each other: a meridian
of T2 is heavy (see Corollary 6.4 below), while two meridians can be disjoint.
Proof of Theorem 1.3 (i) and (ii): The triangle inequality yields
c(a,H) = c(a ∗ [M ], 0 +H) ≤ c(a, 0) + c([M ], H) = ν(a) + c([M ], H).
Passing to the partial quasi-states ζ(a,H) and ζ([M ], H) we get:
ζ(a,H) = lim
c(a, kH)/k ≤
≤ lim
(ν(a) + c([M ], kH))/k = lim
c([M ], kH)/k = ζ([M ], H).
The result now follows from the definition of heavy and superheavy sets (see
Definition 1.1).
Proof of Theorem 1.3 (iii): By the characteristic exponent property of
spectral invariants,
ζ(a, F ) ≤ max
i=1,...,l
ζ(ei, F ) ∀F ∈ C
∞(M) . (23)
Choose a sequence of functions Gj ∈ C
∞(M), j → +∞, with the fol-
lowing properties: Gk ≤ Gj for k > j, Gj = 0 on X , Gj ≤ 0 and for
every function F ≤ 0 which vanishes on an open neighborhood of X there
exists j so that Gj ≤ F (existence of such a sequence can be checked easily).
In view of inequality (23), we have that for every j there exists i so that
ζ(a,Gj) ≤ ζ(ei, Gj). Passing, if necessary, to a subsequence Gjk , jk → +∞,
we can assume without loss of generality that i is the same for all j. In view
of heaviness of X with respect to a, we have that ζ(a,Gj) = 0. Therefore
ζ(ei, Gj) ≥ 0.
Choose any function F ≤ 0 onM which vanishes on an open neighborhood
of X . Then there exists j large enough so that F ≥ Gj. By monotonicity
combined with the previous estimate we have
0 ≥ ζ(ei, F ) ≥ ζ(ei, Gj) ≥ 0 ,
which yields ζ(ei, F ) = 0.
Now let F be any continuous function on M that vanishes on X . Take
a sequence of continuous functions Fj , converging to F in the C
0-norm, so
that each Fj vanishes on an open neighborhood of X . Then ζ(ei, Fj) =
limj→+∞ ζ(ei, Fj) = 0, because ζ(ei, ·) is Lipschitz with respect to the C
norm. The heaviness ofX with respect to ei now follows from Proposition 4.1.
This finishes the proof of the theorem.
5 Products of (super)heavy sets
In this section we prove Theorem 1.5 on products of (super)heavy subsets.
5.1 Product formula for spectral invariants
The proof of Theorem 1.5 is based on the following general result.
Theorem 5.1. For every pair of time-dependent Hamiltonians G1, G2 onM1
and M2, and all non-zero a1 ∈ QHi1(M1), a2 ∈ QHi2(M2) we have
c(a1 ⊗ a2, G1(z1, t) + G2(z2, t)) = c(a1, G1) + c(a1, G2) .
Here G1(z1, t) +G2(z2, t) is a time-dependent Hamiltonian on M1 ×M2.
Let us deduce Theorem 1.5 from Theorem 5.1.
Proof of Theorem 1.5: We show that the product of superheavy sets is
superheavy (the proof for heavy sets goes without any changes). We denote
by ζ1, ζ2 and ζ the partial quasi-states on M1,M2 and M := M1 ×M2 as-
sociated to the idempotents a1, a2 and a1 ⊗ a2 respectively. Let Xi ⊂ Mi,
i = 1, 2, be a superheavy set. By Proposition 4.1 it suffices to show that if a
non-negative function G ∈ C∞(M) vanishes on some neighborhood, say U ,
of X := X1 × X2 then ζ(G) = 0. (Since ζ is Lipschitz with respect to the
C0-norm this would imply that ζ(G) = 0 for any non-negative G ∈ C(M)
that vanishes on X). Put K := maxM G. Choose neighborhoods Ui of Xi so
that U1 ×U2 ⊂ U . Choose non-negative functions Gi on Mi which vanish on
Xi and such that Gi(z) > K for all z ∈Mi \Ui. Observe that G ≤ G1 +G2.
But, in view of Theorem 5.1 and superheaviness of Xi, we have
ζ(G1 +G2) = ζ1(G1) + ζ2(G2) = 0 .
By monotonicity
0 ≤ ζ(G) ≤ ζ(G1 +G2) = 0 ,
and thus ζ(G) = 0.
It remains to prove Theorem 5.1. Note that the left-hand side of the equality
stated in the theorem does not exceed the right-hand side: this is an imme-
diate consequence of the triangle inequality for spectral invariants. However,
we were unable to use this observation for proving the theorem. Our ap-
proach is based on a rather lengthy algebraic analysis which enables us to
calculate separately the left and the right-hand sides “on the chain level”. A
simple inspection of the results of this calculation yields the desired equality.
5.2 Decorated Z2-graded complexes
A Z2-complex is a Z2-graded finite-dimensional vector space V over a field
K equipped with a K-linear differential ∂ : V → V satisfying ∂2 = 0 and
shifting the grading. A decorated complex over K = KΓ includes the following
data:
• a countable subgroup Γ ⊂ R;
• a Z2-graded complex (V, d) over KΓ;
• a preferred basis x1, . . . , xn of V ;
• a function F : {x1, . . . , xn} → R (called the filter) which extends to V
λjxj) = max{ν(λj) + F (xj)
∣∣∣ λj 6= 0},
and satisfies F (dv) < F (v) for all v ∈ V \ {0}. The convention is that
F (0) = −∞. Here ν is the valuation defined in Section 3.1 above.
We shall use the notation
V := (V, {xi}i=1,...,n, F, d,Γ)
for a decorated complex.
The ⊗̂K-tensor product V = V1⊗̂KV2 of decorated complexes
Vi = (Vi, {x
j }j=1,...,ni, Fi, di,Γi) , i = 1, 2
is defined as follows. Consider the space V = V1⊗̂KV2 (see formula (5) above)
with the natural Z2-grading. Define the differential d on V by
d(x⊗ y) = d1x⊗ y + (−1)
deg xx⊗ d2y .
The preferred basis in V is given by {xpq := x
p ⊗ x
q } and the filter F is
defined by
F (xpq) = F1(x
p ) + F2(x
Finally, we put V := (V, {xpq}, F, d,Γ1 + Γ2) .
The (Z2-graded) homology of decorated complexes are denoted by H∗(V)
– they are K-vector spaces. By the Künneth formula, H(V1⊗̂KV2) =
H(V1)⊗̂KH(V2).
Next we define spectral invariants associated to a decorated complex V :=
(V, {xpq}, F, d) . Namely, for a ∈ H(V) put
c(a) := inf{F (v) | a = [v], v ∈ Ker d} .
We shall see below that c(a) > −∞ for each a 6= 0.
The purpose of this algebraic digression is to state the following result:
Theorem 5.2. For any two decorated complexes V1,V2
c(a1 ⊗ a2) = c(a1) + c(a2) ∀a1 ∈ H(V1), a2 ∈ H(V2)
5.3 Reduced Floer and Quantum homology
The 2-periodicity of the Floer complex and Floer homology defined by the
multiplication by q (see Proposition 3.1 above) allows to encode their al-
gebraic structure in a decorated Z2-complex. Consider a regular pair (G, J)
consisting of a Hamiltonian function and a compatible almost-complex struc-
ture on M (both, in general, are time-dependent). Let (C∗(G), dG,J) be the
corresponding Floer complex. Let us associate to it a Z2-complex: a Z2-
graded vector space VG over KΓ, defined as
VG := C0(G)⊕ C1(G),
with the obvious Z2-grading, and a differential ∂G,J : VG → VG, defined as
the direct sum of dG,J : C1(G) → C0(G) and qdG,J : C0(G) → C1(G). One
readily checks that this is indeed a Z2-complex because dG,J : C(G) → C(G)
is ΛΓ-linear. We will call (VG, ∂G,J) the Z2-complex associated to (G, J).
Note that the cycles and the boundaries of (VG, ∂G) having Z2-degree
i ∈ {0, 1} in VG coincide, respectively, with the cycles and the boundaries
having Z-degree i of (C(G), dG,J). Therefore the Floer homology HFi(G, J)
is isomorphic, as a vector space over KΓ, to the i-th degree component of the
homology of the complex (VG, ∂G,J).
The Z2-complex (VG, ∂G,J) carries a structure of the decorated complex
VG,J as follows. Let γi(t), i = 1, . . . , m, be the collection of all contractible
1-periodic orbits of the Hamiltonian flow generated by G. Choose disc ui
in M spanning γi. For each i there exists unique integer, say ri, so that
the Conley-Zehnder index of the element xi := q
ri · [γi, ui] lies in the set
{0, 1}. Clearly, the collection {xi} forms a basis of VG over KΓ. We shall
consider it as a preferred basis. Note that the preferred basis is unique up to
multiplication of xi’s by elements of the form s
αi , αi ∈ Γ. Finally, the action
functional associated to G defines a filtration on VG.
The homology of (VG, ∂G,J) can be canonically identified via the PSS-
isomorphism with the object which we call reduced quantum homology:
QHred(M) := QH0(M)⊕QH1(M) .
We call this isomorphism the reduced PSS-isomorphism and denote it by ψG,J .
Note that we have a natural projection p : QH∗(M) → QHred(M) which
sends any degree homogeneous element a to aqr with deg a + 2r ∈ {0, 1}.
With this notation, the usual Floer-homological spectral invariant c(a,G)
coincides with the spectral invariant c(p(a)) of the decorated complex VG,J .
5.4 Proof of Theorem 5.1
By the Lipschitz property of spectral numbers it is enough to consider the
case when G1 and G2 belong to regular pairs (Gi, Ji), i = 1, 2. Set
G(z1, z2, t) := G1(z1, t) +G(z2, t)
and J := J1 × J2. Then (G, J) is also a regular pair. Put Γi = Γ(Mi, ωi). It
is straightforward to see that the decorated complex VG,J is the ⊗̂K-tensor
product of the decorated complexes VGi,Ji for i = 1, 2.
Put (M,ω) = (M1×M2, ω1⊕ω2). An obvious modification of the Künneth
formula for quantum homology (see e.g. [41, Exercise 11.1.15] for the state-
ment in the monotone case) yields a natural monomorphism
ı : QHi1(M1, ω1)⊗̂KQHi2(M1, ω1) → QHi1+i2(M,ω) .
Since in our setting quantum homologies are 2-periodic, the collection of these
isomorphisms for all pairs (i1, i2) from the set {0, 1} induces an isomorphism
j : QHred(M1)⊗̂KQHred(M2) → QHred(M) .
It has the following properties: First, given two elements a1 ∈ QHi1(M1, ω1)
and a2 ∈ QHi2(M2, ω2) we have that
p(a1)⊗ p(a2) = p(a1 ⊗ a2) .
Second, the following diagram commutes:
H(VG1, ∂G1,J1)⊗̂KH(VG2 , ∂G2,J2)
ψG1,J1⊗ψG2,J2
H(VG, ∂G,J)
QHred(M1)⊗̂KQHred(M2)
// QHred(M)
Here k is the isomorphism coming from the Künneth formula for Z2-comple-
xes, and ψGi,Ji, ψG,J stand for the reduced PSS-isomorphisms. It follows that
the definition of c(ai, Gi), c(a1⊗a2, G) matches the definition of c(p(ai)) and
c(p(a1)⊗ p(a2)). By Theorem 5.2 we get that
c(a1⊗a2, G) = c(p(a1)⊗p(a2)) = c(p(a1))+ c(p(a2)) = c(a1, G1)+ c(a2, G2) .
This proves Theorem 5.1 modulo Theorem 5.2.
5.5 Proof of algebraic Theorem 5.2
A decorated complex is called generic if F (xi) − F (xj) /∈ Γ for all i 6= j
(recall that under our assumptions Γ, the group of periods of the symplectic
form ω over π2(M), is a countable subgroup of R). We start from some
auxiliary facts from linear algebra. Let V := (V, {xi}i=1,...,n, F, d,Γ) be a
generic decorated complex. We recall once again that for brevity we write K
instead of KΓ wherever it is clear what Γ is taken.
An element x ∈ V is called normalized if
x = xp +
i 6=p
λixi , λi ∈ K, F (xp) > max
i 6=p
F (λixi) .
We shall use the notation x = xp+o(xp). In generic complexes, every element
x 6= 0 can be uniquely written as x = λ(xp+o(xp)) for some p = 1, . . . , n and
λ ∈ K. A system of vectors e1, . . . , em in V is called normal if every ei has
the form ei = xji +o(xji) for ji ∈ {1, . . . , n} and the numbers ji are pair-wise
distinct.
Lemma 5.3. Let e1, . . . , em be a normal system. Then
λiei) = max
F (λiei) .
Proof. We prove the result using induction in m. For m = 1 the statement is
obvious. Let’s check the induction step m− 1 → m. Observe that it suffices
to check that
F (e1 +
λiei) ≥ F (e1) . (24)
Then obviously
λiei) ≥ max
F (λiei) ,
while the reversed inequality is an immediate consequence of the definitions.
By the induction step,
λiei) = max
i=2,...,n
F (λiei) .
In view of the genericity, the maximum at the right hand side can be uniquely
written as F (λi0xi0). Without loss of generality we shall assume that ei =
xi + o(xi) and i0 = 2.
λ−12 λiei = x2 + o(x2) .
Write
e1 = x1 + αx2 +X, v = x2 + βx1 + Y,
where α, β ∈ K and X, Y ∈ SpanK(x3, . . . , xn). Note that F (x1) > F (αx2),
F (x2) > F (βx1), which yields
ν(α) < F (x1)− F (x2) < −ν(β) = ν(β
−1) . (25)
In particular, ν(α) < ν(β−1). Note that
e1 + λ2v = (1 + λ2β)x1 + (α + λ2)x2 + Z, Z ∈ SpanK(x3, . . . , xn) .
F (e1 + λ2v) ≥ max(ν(1 + λ2β) + F (x1), ν(α + λ2) + F (x2)) .
If ν(1 + λ2β) ≥ 0 we have F (e1 + λ2v) ≥ F (x1) = F (e1) and inequality (24)
follows. Assume that ν(1+λ2β) < 0 = ν(1). Then ν(λ2β) = 0 = ν(λ2)+ν(β),
and hence ν(λ2) = ν(β
−1) 6= ν(α). Thus
ν(α + λ2) ≥ ν(λ2) = −ν(β) .
Combining this inequality with (25) we get that
F (e1 + λ2v) ≥ ν(α + λ2) + F (x1) + (F (x2)− F (x1))
≥ F (x1) + (ν(α + λ2) + ν(β)) ≥ F (x1) = F (e1) .
This completes the proof of inequality (24), and hence of the lemma.
It readily follows from the lemma that every normal system is linearly inde-
pendent.
Lemma 5.4. Every subspace L ⊂ V has a normal basis.
Proof. We use induction over m = dimK L. The case m = 1 is obvious,
so let us handle the induction step m − 1 → m. It suffices to show the
following: Let e1, . . . , em−1 be a normal basis in a subspace L
′, and let v /∈ L′
be any vector. Put L = SpanK(L
′ ∪ {v}). Then there exists em ∈ L so that
e1, . . . , em is a normal basis. Indeed, assume without loss of generality that
for all i = 1, . . . , m−1 one has ei = xi+ o(xi). Put W = SpanK(xm, . . . , xn).
We claim that L′ ∩W = {0}. Indeed, otherwise
λ1e1 + . . .+ λm−1em−1 = λmxm + . . .+ λnxn
where the linear combinations in the right and the left-hand sides are non-
trivial. Apply F to both sides of this equality. By Lemma 5.3
F (λ1e1 + . . .+ λm−1em−1) = F (xp) mod Γ, where 1 ≤ p ≤ m− 1 ,
while
F (λmxm + . . .+ λnxn) = F (xq) mod Γ, where q ≥ m .
This contradicts the genericity of our decorated complex, and the claim fol-
lows. Since dimL′+dimW = dimV , we have that V = L′⊕W . Decompose v
as u+w with u ∈ L′, w ∈ W , and note that w ∈ L. Note that e1, . . . , em−1, w
are linearly independent. Furthermore, w = λ(xp + o(xp)) for some p ≥ m.
Put em = λ
−1w. The vectors e1, . . . , em form a normal basis in L.
The same proof shows that if L1 ⊂ L2 are subspaces of V , every normal basis
in L1 extends to a normal basis in L2.
Now we turn to the analysis of the differential d. Choose a normal basis
g1, . . . , gq in Im d, and extend it to a normal basis g1, . . . , gq, h1, . . . , hp in
Ker d. Note that each of these p + q vectors has the form xj + o(xj) with
distinct j. Let us assume without loss of generality that the remaining n−p−q
elements of the preferred basis in V are x1, . . . , xq, and
gi = xi+q + o(xi+q), hj = xj+2q + o(xj+2q) .
Here we use that, by the dimension theorem, n = p+ 2q. Note that
x1, . . . , xq, g1, . . . , gq, h1, . . . , hp
is a normal system, and hence a basis in V . We call such a basis a spectral
basis of the decorated complex V.
Note that [h1], . . . , [hp] is a basis in the homology H(V). Consider any
homology class a =
λi[hi]. Every element v ∈ V with a = [v] can be
written as v =
λihi +
αjgj. Thus, by Lemma 5.3, F (v) ≥ maxi F (λihi)
and hence
c(a) = max
F (λihi) . (26)
This proves in particular that the spectral invariants are finite provided a 6= 0.
For finite sets A = {v1, . . . , vs} and B = {w1, . . . , ws} we write A⊗B for the
finite set {vi ⊗ wj}.
Assume now that V1,V2 are generic decorated complexes. We say that they
are in general position if their tensor product V = V1⊗̂KV2 is generic. Let
Bi = {x
1 , . . . , x
1 , . . . , g
1 , . . . , h
}, i = 1, 2
be a spectral basis in Vi. Obviously, B1 ⊗ B2 is a normal basis in V1⊗̂KV2.
We shall denote by d1, d2, d the differentials and by F1, F2, F the filters in
V1,V2 and V respectively. Put Gi = {g
1 , . . . , g
qi }, Hi = {h
1 , . . . , h
and K = G1 ⊗ B2 ∪B1 ⊗G2. Observe that
Im d ⊂W := Span(K) .
Take any two classes
j ] ∈ H(Vi) , i = 1, 2.
Suppose that a1 ⊗ a2 = [v]. Then v is of the form
λ(1)m λ
m ⊗ h
l + w
where w must lie in W . Observe that (H1 ⊗H2) ∩K = ∅. By Lemma 5.3,
F (v) ≥ max
F (λ(1)m λ
m ⊗ h
l ) ,
and hence
c(a1 ⊗ a2) = max
F (λ(1)m λ
m ⊗ h
= max
m ) + F2(λ
= max
m ) + max
l ) = c(a1) + c(a2) .
In the last equality we used (26). This completes the proof of Theorem 5.2
for decorated complexes in general position.
It remains to remove the general position assumption. This will be done
with the help of the following lemma. We shall work with a family of deco-
rated complexes
V := (V, {xi}i=1,...,n, F, d,Γ)
which have exactly the same data (preferred basis, grading, differential and
Γ) with the exception of the filter F which will be allowed to vary in the class
of filters. The corresponding spectral invariants will be denoted by c(a, F ).
Lemma 5.5.
(i) If filters F, F ′ satisfy F (xi) ≤ F
′(xi) for all i = 1, . . . , n, then c(a, F ) ≤
c(a, F ′) for all non-zero classes a ∈ H(V).
(ii) If F is a filter and θ ∈ R, then F + θ is again a filter and c(a, F + θ) =
c(a, F ) + θ for all non-zero classes a ∈ H(V).
The proof is obvious and we omit it. It follows that for any two filters F, F ′
|c(a, F )− c(a, F ′)| ≤ ||F − F ′||C0 ∀a ∈ H(V) \ {0} .
Assume now that V1,V2 are decorated complexes. Denote by F1, F2 their
filters. Fix ǫ > 0. By a small perturbation of the filters we get new filters,
F ′1 and F
2, on our complexes so that the complexes become generic and in
general position, and furthermore
||F1 − F
1||C0 ≤ ǫ , ||F2 − F
2||C0 ≤ ǫ .
Given homology classes ai ∈ H(Vi) we have
|c(a1, F1) + c(a2, F2)− c(a1 ⊗ a2, F1 + F2)| ≤
|c(a1, F
1) + c(a2, F
2)− c(a1 ⊗ a2, F
1 + F
2)|+ 4ǫ = 4ǫ .
Here we used that Theorem 5.2 is already proved for generic complexes in
general position. Since ǫ > 0 is arbitrary, we get that
c(a1, F1) + c(a2, F2)− c(a1 ⊗ a2, F1 + F2) = 0 ,
which completes the proof of Theorem 5.2 in full generality.
6 Stable non-displaceability of heavy sets
In this section we prove part (ii) of Theorem 1.2.
Proposition 6.1. Every heavy subset is stably non-displaceable.
For the proof we shall need the following auxiliary statement. Given R > 0,
consider the torus T2R obtained as the quotient of the cylinder T
∗S1 = R(r)×
S1 (θ mod 1) by the shift (r, θ) 7→ (r + R, θ). For α > 0 define the function
Fα(r, θ) := αf(r) on T
R, where f(r) is any R-periodic function having only
two non-degenerate critical points on [0, R]: a maximum point at r = 0 with
f(0) = 1, and a minimum point at r = R/2, f(R/2) =: −β < 0. We denote
by [T ] the fundamental class of T2R. We work with the symplectic form dr∧dθ
on T2R.
Lemma 6.2. c([T ], Fα) = α.
Proof. Note that the contractible closed orbits of period 1 of the Hamiltonian
flow generated by Fα are fixed points forming circles S+ = {r = 0} and
S− = {r = R/2}. The actions of the fixed points on S± equal respectively to
α and −αβ, and thus the spectral invariants of Fα lie in the set {α,−αβ}.
Recall from [59] that c([T ], Fα) > c([point], Fα). Thus c([T ], Fα) = α.
Lemma 6.3. Let H ∈ C∞(M) so that H−1(maxH) is displaceable. Then
ζ(H) < maxH.
Proof. Choose ǫ > 0 so that the set
H−1((maxH − ǫ,maxH ])
is displaceable. Choose a real-valued cut-off function ρ : R → [0, 1] which
equals 1 near maxH and which is supported in (maxH−ǫ,maxH+ǫ). Thus
ρ(H) is supported in H−1((maxH − ǫ; maxH ]) and ζ(ρ(H)) = 0. Since H
and ρ(H) Poisson-commute, the vanishing and the monotonicity axioms yield
ζ(H) = ζ(ρ(H)) + ζ(H − ρ(H)) ≤ max(H − ρ(H)) < maxH .
Proof of Proposition 6.1: It suffices to show that for every R > 0 the set
Y := X × {r = 0} ⊂M ′ :=M × T2R
is non-displaceable. Assume on the contrary that Y is displaceable. Choose
a function H on M with H ≤ 0, H−1(0) = X . Put
H ′ = H + F1 = H + f(r) :M
′ → R.
Assume that the partial quasi-state ζ on M is associated to some non-zero
idempotent a ∈ QH∗(M) by means of (2). Denote by ζ
′ the quasi-state on
M ′ associated to a⊗ T . Note that
Y = (H ′)−1(maxH ′) , where maxH ′ = 1 ,
while Theorem 5.1 and Lemma 6.2 imply that
ζ ′(H ′) = ζ(H) + 1 .
By Lemma 6.3 ζ ′(H ′) < 1 and so ζ(H) < 0. In view of Proposition 4.1, we
get a contradiction with the heaviness of X .
Lemma 6.2 also yields a simple proof of the following result which also follows
from Corollary 1.15:
Corollary 6.4. Any meridian of T2 is heavy (with respect to the fundamental
class [T ]).
Proof. In the notation as above identify T2 with T21 for R = 1. Since any
two meridians of T2 can be mapped into each other by a symplectic isotopy
and since such an isotopy preserves heaviness, it suffices to prove that the
meridian S := S+ = {r = 0} (see the proof of Lemma 6.2) is heavy.
Let H : T2 → R be a Hamiltonian and let us show that ζ(H) ≥ infSH ,
where ζ is defined using [T ]. Shifting H , if necessary, by a constant, we may
assume without loss of generality that infSH = 1. Pick f = f(r) : T
2 → R
as in the definition of Fα so that F1 = f ≤ H on T
2 (note that f equals 1 on
S). Then Lemma 6.2 yields
ζ(H) ≥ ζ(F1) = 1 = inf
7 Analyzing stable stems
Proof of Theorem 1.6: Assume that A is a Poisson-commutative subspace
of C∞(M), Φ : M → A∗ its moment map with the image ∆, and let X =
Φ−1(p) be a stable stem of A.
Take any functionH ∈ C∞(A∗) with H ≥ 0 andH(p) = 0. We claim that
ζ(Φ∗H) = 0. By an arbitrarily small C0-perturbation of H we can assume
that H = 0 in a small neighborhood, say U , of p. Choose an open covering
U0, U1, . . . , UN of ∆ so that U0 = U , and all Φ
−1(Ui) are stably displaceable
for i ≥ 1 (it exists by the definition of a stem). Let ρi : ∆ → R, i = 0, . . . , N ,
be a partition of unity subordinated to the covering {Ui}.
Take the two-torus T2R as in Section 6. Choose R > 0 large enough so that
Φ−1(Ui)× {r = const} is displaceable in M × T
R for all i ≥ 1. Choose now
a sufficiently fine covering Vj , j = 1, . . . , K, of the torus T
R by sufficiently
thin annuli {|r − rj | < δ} so that the sets Φ
−1(Ui) × Vj are displaceable in
M × T2R for all i ≥ 1 and all j. Let ̺j = ̺j(r), j = 1, . . . , K, be a partition
of unity subordinated to the covering {Vj}.
Denote by ζ ′ the partial quasi-state corresponding to a⊗T . Put F (r, θ) =
cos(2πr/R). Write
Φ∗H + F =
(Φ∗H + F ) · Φ∗ρi · ̺j =
Φ∗(Hρ0) + F · Φ
∗ρ0 +
(Φ∗H + F ) · Φ∗ρi · ̺j .
Note that Hρ0 = 0 and F · Φ
∗ρ0 ≤ 1. Applying partial quasi-additivity
and monotonicity we get that
ζ ′(Φ∗H + F ) = ζ ′(F · Φ∗ρ0) ≤ 1.
By Lemma 6.2 and the product formula (Theorem 5.1 above) we have
ζ ′(Φ∗H + F ) = ζ(Φ∗H) + 1 ≤ 1
and hence ζ(Φ∗H) ≤ 0. On the other hand, ζ(Φ∗H) ≥ 0 since H ≥ 0. Thus
ζ(Φ∗H) = 0 and the claim follows.
Further, given any function G on M with G ≥ 0 and G|X = 0, one can
find a function H on A∗ with H(p) = 0 so that G ≤ Φ∗H . By monotonicity
and the claim above
0 ≤ ζ(G) ≤ ζ(Φ∗H) = 0 ,
and hence ζ(G) = 0. Thus X is superheavy.
8 Monotone Lagrangian submanifolds
The main tool of proving (super)heaviness of monotone Lagrangian subman-
ifolds satisfying the Albers condition is the spectral estimate in Proposi-
tion 8.1(iii) below, which originated in the work by Albers [2]. Later on
Biran and Cornea pointed out a mistake in [2], and suggested a correction
together with a far reaching generalization in [15]. Let us mention that the
original Albers estimate was used in the first version of the present paper. We
thank Biran and Cornea for informing us about the mistake, explaining to
us their approach and helping us to correct a number of our results affected
by this mistake.
The main ingredient of Biran-Cornea techniques which is needed for our
purposes is the following result. Let (M,ω) be a closed monotone symplectic
manifolds with [ω] = κ·c1(M), κ > 0. WriteN for the minimal Chern number
of (M,ω). Let Ln ⊂M2n be a closed monotone Lagrangian submanifold with
the minimal Maslov number NL ≥ 2.
We shall treat slightly differently the cases when NL is even and odd. Let
us mention that for orientable L, NL is automatically even. Thus, due to
our convention, when NL is odd we work with the basic field F = Z2. Let
Γ = κN · Z be the group of periods of M . Recall that the quantum ring has
the form QH∗(M) = H∗(M ;F) ⊗F Λ, where the Novikov ring Λ is defined
as Λ = KΓ[q, q
−1] . Put Γ′ = (κN/2) · Z. Consider an extended Novikov ring
Λ′ := KΓ′ [q
2 , q−
2 ]. Define now QH ′∗(M) as QH∗(M) if NL is even, and as
H∗(M,Z2)⊗Z2 Λ
′ if NL is odd. In the latter case QH
∗(M) is an extension of
QH∗(M), and we shall consider without special mentioning QH∗(M), Λ, KΓ
as subrings of QH ′∗(M), Λ
′, KΓ′. The grading of QH
∗(M) is determined by
the condition deg q
2 = 1. As before, we shall use notation QH ′•(M), where
• = “even” when F = C and • = ∗ when F = Z2.
Note that the spectral invariants (and hence partial symplectic quasi-
states) are well-defined over the extended ring, and furthermore, their values
and properties, by tautological reasons, do not alter under such an extension
(cf. a discussion in [15], Section 5.4). Put w := sκNL/2qNL/2. Recall that j
stands for the natural morphism H•(L;F) → H•(M ;F).
Proposition 8.1. Assume that k > n+1−NL. If F = C assume in addition
that k is even. Then there exists a canonical homomorphism jq : Hk(L;F) →
QH ′k(M) with the following properties
8The letter “q” in jq stands for quantum.
(i) jq(x) = j(x) + w−1y, where y is a polynomial in w−1 with coefficients
in H•(M ;F);
(ii) jq([L]) = j([L]);
(iii) If jq(x) 6= 0 then c(jq(x), H) ≤ supLH for every H ∈ C
∞(M).
In particular, if S is an Albers element of L, we have jq(S) = j(S)+O(w−1) 6=
This proposition was proved by Biran and Cornea in [15] in the case
F = Z2: The map j
q is essentially the map iL appearing in Theorem A(iii)
in [15]. Proposition 8.1(i) above is a combination of Theorem A(iii) and
Proposition 4.5.1(i) in [15]. Our variable w corresponds to the variable t−1 in
[15], while our sNκqN corresponds to the variable s−1 in Section 2.1.2 of [15].
After such an adjustment of the notation, the formula w := sκNL/2qNL/2 above
can be extracted from Section 2.1.2 of [15]. For Proposition 8.1(ii) above
see Remark 5.3.2.a in [15]. Proposition 8.1(iii) above follows from Lemma
5.3.1(ii) in [15]. Finally, let us repeat the disclaimer made in Section 1.5: we
take for granted that Proposition 8.1 remains valid for the case F = C.
Let us pass to the proofs of our results on (super)-heaviness of monotone
Lagrangian submanifolds. We start with the following remark. Let S be an
Albers element of L. The Poincaré duality property of spectral invariants
(see Section 3.4 above) extends verbatim to the case of the ring QH ′(M)
with the following modification: When NL is odd, the pairing Π introduced
in Section 3.4 extends in the obvious way to a non-degenerate F -valued
pairing on QH ′•(M) which we still denote by Π. Applying Poincaré duality
and substituting H := −F into Proposition 8.1 (iii) above we get that for
every function F ∈ C∞(M)
c(T, F ) ≥ inf
F ∀T ∈ QH ′•(M) with Π(T, j
q(S)) 6= 0.
In particular, given a non-zero idempotent a ∈ QH ′•(M) and a class b ∈
QH ′•(M), so that Π(a∗b, j
q(S)) 6= 0, we get, using the normalization property
of spectral invariants, that
c(a, F ) + ν(b) ≥ c(a ∗ b, F ) ≥ inf
F ∀F ∈ C∞(M) . (27)
Therefore, applying (27) to kF for k ∈ N, dividing by k and passing to the
limit as k → +∞, we get that for the partial quasi-state ζ , defined by a,
ζ(F ) ≥ inf
F ∀F ∈ C∞(M),
meaning that L is heavy with respect to a.
Proof of Theorem 1.15: Let S be an Albers element of L. Let T ∈
H•(M ;F) be any singular homology class such that T ◦ j(S) 6= 0. Thus,
applying Proposition 8.1 (i) we see that Π([M ]∗T, jq(S)) = Π(T, jq(S)) 6= 0,
and hence inequality (27), applied to a = [M ], b = T , yields that L is heavy
with respect to [M ].
Let us pass to the proof of Theorem 1.25 on the effect of semi-simplicity
of the quantum homology. It readily follows from the next more general
statement. Let L1, . . . , Lm be Lagrangian submanifolds satisfying the Albers
condition. Let Si be any Albers element of Li. Denote by Zi = j
q(Si) ∈
QH ′•(M) its image under the inclusion morphism from Proposition 8.1 above.
Theorem 8.2. Given such L1, . . . , Lm and Z1, . . . , Zm, assume, in addition,
that QH2n(M) is semi-simple and the Lagrangian submanifolds L1, . . . , Lm
are pair-wise disjoint. Then the classes Z1, . . . , Zm are linearly independent
over KΓ′.
Proof. Arguing by contradiction, assume that
Z1 = α2Z2 + . . .+ αmZm (28)
for some α2, . . . , αm ∈ KΓ′ . Since QH2n(M) is semi-simple, it decomposes
into a direct sum of fields with unities e1, . . . , ed. Since the pairing Π (on
QH ′•(M ;F)) is non-degenerate, there exists T ∈ QH
•(M ;F) such that
Π(T, Z1) 6= 0. (29)
Let us write T as
T = [M ] ∗ T =
ei ∗ T. (30)
Equations (29), (30) imply that there exists l ∈ [1, d] such that
Π(el ∗ T, Z1) 6= 0 . (31)
Then (28) implies that there exists r ∈ [2, m] such that
Π(el ∗ T, αrZr) 6= 0.
Using (21) (for Π on QH ′•(M ;F)) we can rewrite the last equation as
Π(el ∗ αrT, Zr) 6= 0. (32)
Applying now formula (27) for S = Z1 ∈ H•(L1;F), a = el, b = T , and also
for S = Zr ∈ H•(Lr;F), a = el, b = αrT , we conclude that both L1 and Lr
are heavy with respect to el. Thus they are superheavy with respect to el,
because el is the unity in a field factor of QH2n(M) (see Section 1.6). Hence
they must intersect – in contradiction to the assumption of the theorem. This
finishes the proof of the first part of the theorem.
Proof of Theorem 1.25(a): Assume that L1, . . . , Lm are pair-wise disjoint
Lagrangian submanifolds satisfying the condition (a) from the formulation
of the theorem. Denote by Ni the minimal Maslov number of Li. Since
Ni > n + 1, the class of a point from H0(Li;F) is an Albers element for Li.
Let Zi ∈ QH
0(M) be its image under the Biran-Cornea inclusion morphism
associated to Li. Note that by Proposition 8.1(i) Zi = p + aiw
i , where
wi = s
κNi/2qNi/2, ai ∈ HNi(M ;F) and p ∈ H0(M ;F) is the homology class of
a point. Observe that degwi = Ni > n + 1, and hence the expression for Zi
cannot contain terms in w−1i of order two and higher, since HkNi(M ;F) = 0
for k ≥ 2.
Recall now that all Ni’s lie in some set Y of positive integers. Let W ⊂
QH ′0(M) be the span over KΓ′ of
H0(M ;F)⊕
s−κE/2q−E/2 ·HE(M ;F) .
Note that
dimKΓ′ W = βY (M) + 1 < m .
Thus the elements Zi, i = 1, . . . , m, are linearly dependent over KΓ′ . By
Theorem 8.2, QH2n(M) is not semi-simple.
Proof of Theorem 1.25(b): Assume that L1, . . . , Lm are pair-wise disjoint
homologically non-trivial Lagrangian submanifolds. By Proposition 8.1(ii)
jq([Li]) = j([Li]) for every i = 1, . . . , m. Since the classes j([Li]) are linearly
dependent, Theorem 8.2 implies that QH2n(M) is not semi-simple.
Proof of Theorem 1.18: Combining Proposition 8.1 (ii) and (iii) we get
that for any H ∈ C∞(M)
c(j([L]), H) ≤ sup
H ∀H ∈ C∞(M) .
By the hypothesis of the theorem, we can write j([L]) ∗ b = a for some b.
c(a,H) = c(j([L]) ∗ b,H) ≤ c(j([L]), H) + c(b, 0) .
c(a,H) ≤ sup
H + c(b, 0) .
Applying this inequality to E · H with E > 0, dividing by E and passing
to the limit as E → +∞ we get that ζ(H) ≤ supLH for all H . Thus L is
superheavy.
Remark 8.3. The results above admit the following generalizations in the
framework of the Biran-Cornea theory. The main object of this theory is the
quantum homology ring QH∗(L) of a monotone Lagrangian submanifold,
which is isomorphic to the Lagrangian Floer homology HF∗(L, L) of L up to
a shift of the grading.
(i) If QH∗(L) does not vanish then L is heavy (see Remark 1.2.9b in [15]).
In fact, it follows from [15] that if L satisfies the Albers condition,
QH∗(L) does not vanish.
(ii) The map jq of the Proposition 8.1 above is a footprint of the quan-
tum inclusion map iL : QH∗(L) → QH
∗(M) constructed in [15]. The
analogue of the action estimate in item (iii) of the proposition is ob-
tained in [15] for the classes iL(x) for elements x ∈ QH∗(L) of a certain
special form, yielding the following generalization of Theorem 1.18: for
these special classes x ∈ QH∗(L) the condition that the class iL(x)
does not vanish and divides a non-trivial idempotent a implies that L
is superheavy with respect to a. This enables, for instance, to general-
ize Example 1.19 on Lagrangian spheres in quadrics above to the case
when dimL is odd.
(iii) In [15] one can find another action estimate which comes from the
QH∗(M)-module structure on QH∗(L), which yields more results on
(super)heaviness of monotone Lagrangian submanifolds.
Proof of Proposition 1.4: The quantum homology QH2n(M) splits as an
algebra over K into a direct sum of two algebras one of which is a field. This
was proved by McDuff for the field F = C (see [39] and [24, Section 7]), but
the proof goes through for the case F = Z2 as well. Denote the unity of the
field by a. It is a non-zero idempotent in QH2n(M). As we already pointed
out in Remark 1.21, such an idempotent a defines a genuine symplectic quasi-
state and therefore the classes of heavy and superheavy sets with respect to
a coincide.
By Theorem 1.2, the Lagrangian torus L ⊂ M cannot be superheavy
with respect to a, since it can be displaced from itself by a symplectic (non-
Hamiltonian) isotopy. Indeed, take an obvious symplectic isotopy φt of T
that displaces L (a parallel shift) and compose it with a Hamiltonian isotopy
ψt so that for every t we have that ψt is constant on φt(L) and ψtφt is identity
on the ball where the blow up of T2n was performed. Clearly, the resulting
symplectic isotopy ψtφt extends to a symplectic isotopy of M that displaces
On the other hand, NL ≥ 2 because in this case NL = 2N , where N ≥ 1
is the minimal Chern number of M . Finally, note that L represents a non-
trivial homology class in Hn(M ;Z2). Therefore we can apply Theorem 1.15
and get that L is heavy with respect to [M ].
9 Rigidity of special fibers of Hamiltonian ac-
tions
In this section we prove Theorem 1.9. Denote the special fiber of Φ by
L := Φ−1(pspec).
Reduction to the case of T1-actions: First, we claim that it is enough
to prove the theorem for Hamiltonian T1-actions and the general case will
follow from it. Indeed, assume this is proved. The superheaviness of the
special fiber immediately yields that for any function H̄ : R → R
ζ(Φ∗H̄) = H̄(pspec), (33)
where Φ :M → R is the moment map of the T1-action.
Let us turn to the multi-dimensional situation and let Φ : M → Rk
be the normalized moment map of a Hamiltonian Tk-action on M . For a
v ∈ Rk denote by Φv(x) = 〈v,Φ(x)〉, where 〈·, ·〉 is the standard Euclidean
inner product on Rk. Note that if v ∈ Zk the function Φv is the normalized
moment map of a Hamiltonian circle action and its special value is 〈v, pspec〉.
Thus by (33)
K) = K(〈v, pspec〉) ∀K ∈ C
∞(R) . (34)
By homogeneity of ζ , equality (34) holds for all v ∈ Qk, and by continuity
for all v ∈ Rk.
Observe that for each pair of smooth functions P,Q ∈ C∞(R) and for each
pair of vectors v,w ∈ Rk the functions Φ∗
P and Φ∗
Q Poisson-commute on
M . Thus the triangle inequality for the spectral numbers (see Section 3.4)
yields
P + Φ∗
Q) ≤ ζ(Φ∗
P ) + ζ(Φ∗
Q) . (35)
Since M is compact, it suffices to assume that the function H̄ ∈ C∞(Rk) on
Rk is compactly supported. By the inverse Fourier transform we can write
H̄(p) =
sin〈v, p〉 · F (v) + cos〈v, p〉 ·G(v)
for some rapidly (say, faster than (|p| + 1)−N for any N ∈ N) decaying
functions F and G on Rk. For every v ∈ Rk define a function Kv ∈ C
Kv(s) := sin s · F (v) + cos s ·G(v) .
Observe that
Φ∗H̄ =
Kv dv .
Denote by B(R) the Euclidean ball of radius R in Rk with the center at the
origin. Put
H̄R(p) =
Kv(〈v, p〉) dv, p ∈ R
Since the functions F and G are rapidly decaying, we get that
||H̄R − H̄||C0(Rk) → 0 as R → ∞ . (36)
We claim that for every R
ζ(Φ∗H̄R) ≤ H̄R(pspec) . (37)
Indeed, for ǫ > 0 introduce the integral sum
H̄R,ε(p) =
v∈ ε·Zk∩B(R)
εk ·Kv(〈v, p〉) .
Φ∗H̄R,ε =
v∈ ε·Zk∩B(R)
εk · Φ∗
Applying repeatedly (35) and (34) we get that
ζ(Φ∗H̄R,ε) ≤ H̄R,ε(pspec) .
Note now that for fixed R the family H̄R,ǫ converges to H̄R as ε → 0 in
the uniform norm on C0(Rk). Using that ζ is Lipschitz with respect to the
uniform norm on C0(M) we readily get the inequality (37).
Combining the fact that ζ is Lipschitz with (36) and (37) we get that
ζ(Φ∗H̄) = lim
ζ(Φ∗H̄R) ≤ lim
H̄R(pspec) = H̄(pspec) .
Now, assume that H̄ ≥ 0 and H̄(pspec) = 0. We just have proved that
ζ(Φ∗H̄) ≤ 0, and hence ζ(H) = 0, which immediately yields the desired su-
perheaviness of the special fiber. This completes the reduction of the general
case to the 1-dimensional case.
From now on we will consider only the case of an effective Hamil-
tonian T1-action on M with a moment map Φ :M → R. Its moment
polytope ∆ is a closed interval in R and pspec = −I(Φ) ∈ R.
Reduction to the case of a strictly convex function: We claim
that it is enough to show the following proposition:
Proposition 9.1. Assume H̄ : R → R is a strictly convex smooth function
reaching its minimum at pspec. Set H := Φ
∗H̄. Then ζ(H) = H̄(pspec).
Postponing the proof of the proposition for a moment let us show that it
implies the theorem. Indeed, let F : M → R be a Hamiltonian on M . In
order to show the superheaviness of L = Φ−1(pspec) we need to show that
ζ(F ) ≤ supL F . Pick a very steep strictly convex function H̄ : R → R with
the minimum value supL F reached at pspec and such that Φ
∗H̄ =: H ≥ F
everywhere on M . Then using Proposition 9.1 and the monotonicity of ζ we
ζ(F ) ≤ ζ(H) = H̄(pspec) = sup
yielding the claim.
Preparations for the proof of Proposition 9.1: Given a (time-
dependent, not necessarily regular) Hamiltonian G, we associate to every
pair [γ, u] ∈ P̃G a number
DG([γ, u]) := AG([γ, u])−
· CZG([γ, u]).
(Recall that we defined the Conley-Zehnder index for all Hamiltonians and
not only the regular ones – see Section 3.3). The number DG([γ, u]) is in-
variant under a change of the spanning disc u – an addition of a sphere
jS ∈ HS2 (M) to the disc u changes both AG([γ, u]) and κ/2 · CZG([γ, u]) by
the same number. Thus we can write DG([γ, u]) = DG(γ).
Given [γ, u] ∈ P̃G and l ∈ N define γ
(l) and u(l) as the compositions
of γ and u with the map z → zl on the unit disc D2 ⊂ C (here z is a
complex coordinate on C). Denote by t 7→ gt the time-t flow of G and by
G(l) :M × R → R the Hamiltonian whose time-t flow is t 7→ (gt)
l and which
is defined by
G(l) := G♯ . . . ♯G (l times),
where G♯K(x, t) := G(x, t) +K(g−1t x, t) for any K :M × R → R.
Proposition 9.2. There exists a constant C > 0, depending only on n, with
the following property. Given a 1-periodic orbit γ ∈ PG of the flow t 7→ gt
generated by G, assume that γ(l) is a 1-periodic orbit of the flow t 7→ glt
generated by G(l), and therefore for any u such that [γ, u] ∈ P̃G we have
[γ(l), u(l)] ∈ P̃G(l). Then
|DG(l)([γ
(l), u(l)])− lDG([γ, u])| ≤ l · C.
Proof. The action term in DG gets multiplied by l as we pass from G to G
As for the Conley-Zehnder term, the quasi-morphism property of the Conley-
Zehnder index (see Proposition 3.5) implies that there exists a constant C > 0
(depending only on n) such that
|lCZG[γ, u]− CZG(l)([γ
(l), u(l)])| ≤ C.
This immediately proves the proposition.
Proposition 9.3. Let G :M × [0, 1] → R be a Hamiltonian as above. Then
one can choose ǫ > 0, depending on G, and a constant Cn > 0, depending
only on n = dimM/2, so that any function F : M × [0, 1] → R which is ǫ-
close to G in a C∞-metric on C∞(M×[0, 1]) satisfies the following condition:
for every γ0 ∈ PF there exists γ ∈ PG such that the difference between DF (γ0)
and DG(γ) is bounded by Cn.
Proof. Denote the flow of G by gt (as before) and the flow of F by ft. We
will view time-1 periodic trajectories of these flows both as maps of [0, 1] to
M having the same value at 0 and 1 and as maps from S1 to M .
First, consider the fibration D2×M →M and, slightly abusing notation,
denote the natural pullback of ω again by ω. Second, look at the fibration
pr : D2 ×M → D2. Denote by V ert the vertical bundle over D2 ×M formed
by the tangent spaces to the fibers of pr. For each loop σ : S1 →M define by
σ̂ : S1 → D2 ×M the map σ̂(t) := (t, γ(t)). The bundles σ∗TM and σ̂∗V ert
over S1 coincide. Similarly for each w : D2 →M denote by ŵ : D2 → D2×M
the map ŵ(z) := (z, w(z)).
There exists δ > 0, depending on G, such that for each γ ∈ PG a tubular
δ-neighborhood of the image of γ̂ in S1 ×M ⊂ D2 ×M , denoted by Ubγ, has
the following properties:
• there exists a 1-form λ on Ubγ satisfying dλ = ω;
• V ert admits a trivialization over Ubγ .
Given an ǫ > 0, we can choose F sufficiently C∞-close to G so that the
paths t 7→ ft and t 7→ gt in Ham(M) are arbitrarily C
∞-close and therefore
• for every x ∈ Fix (F ) there exists y ∈ Fix (G) which is ǫ-close to x
(think of the fixed points as points of intersection of the graph of a
diffeomorphism with the diagonal);
• the C∞-distance between the maps γ0 : t 7→ ft(x) and γ : t 7→ gt(y)
from [0, 1] to M is bounded by ǫ and the image of γ̂0 lies in Ubγ.
Pick a map u0 : D
2 → M , u|∂D2 = γ0. Since γ0 and γ are C
∞-close one
can enlarge D2 to a bigger disc D21 ⊃ D
2 and find a smooth map u : D21 →M
so that
• u|∂D21
= u0;
• u(D21 \ D
2) ⊂ Ubγ.
Rescaling D21 we may assume without loss of generality that [γ, u] ∈ PG.
Trivialize the vector bundles γ∗0TM and γ
∗TM so that the trivializations
extend to a trivialization of u∗TM over D21 (and hence of u
0TM over D
Using the trivializations we can identify the paths t 7→ dγ0(0)ft and t 7→ dγ(0)gt
with some identity-based paths of symplectic matrices A(t), B(t). Fixing a
small ǫ as above, we can also assume that F is chosen so C∞-close to G that,
in addition to all of the above, the C∞-distance between the paths t 7→ A(t)
and t 7→ B(t) in Sp (2n) is bounded by ǫ (for instance, make sure first that
the matrix paths obtained by writing the paths t 7→ dγ0(0)ft and t 7→ dγ(0)gt
using some trivialization of V ert over Ubγ are close enough – then the matrix
paths t 7→ A(t) and t 7→ B(t) will also be close enough).
We claim that by choosing ǫ sufficiently small in the construction above we
can bound the difference between DF ([γ0, u0]) and DG([γ, u]) by a quantity
depending only on dimM .
Indeed, the difference |
F (γ0(t), t)dt −
G(γ(t))dt| is bounded by a
quantity depending only on some universal constants and ǫ, because γ0 is
ǫ-close to γ and F is ǫ-close to G with respect to the C∞-metrics. It can be
made arbitrarily small by choosing a sufficiently small ǫ. The difference
u∗0ω −
u∗ω| = |
û∗0ω −
û∗ω|
is bounded by the difference |
γ̂∗0λ −
γ̂∗λ|. Since, γ0 and γ are ǫ-close
in the C∞-metric the later difference can be made less than 1 if we choose
a sufficiently small ǫ. Thus we have shown that by choosing a sufficiently
small ǫ we can bound |AF ([γ0, u0])−AG([γ, u])| by 1.
Now, as far as the Conley-Zehnder indices are concerned, our choice
of the trivializations means that the difference between CZF ([γ0, u0]) and
CZG([γ, u]) is just the difference between the Conley-Zehnder indices for the
matrix paths t 7→ A(t) and t 7→ B(t). But the latter paths in Sp (2n) are
ǫ-close in the C∞-sense, hence represent close elements of S̃p (2n) and if ǫ
was chosen sufficiently small, then, as we mentioned in Section 3.3, their
Conley-Zehnder indices differ at most by a constant depending only on n.
This finishes the proof of the claim and the proposition.
Plan of the proof of Proposition 9.1: We assume now that H̄ is
a fixed strictly convex function on R. Our calculations will feature E as a
large parameter. For quantities α, β depending on E we will write α � β if
α ≤ β+const holds for large enough E, where const depends only on (M,ω),
Φ and H̄ , and in particular does not depend on E. We will write α ≈ β if
α � β and β � α. Using this language the proposition can be restated as
c(a, EH) ≈ EH̄(pspec). (38)
In general, 1-periodic orbits of the flow of EH are not isolated and there-
fore the Hamiltonian is not regular. Let F be a regular (time-periodic)
perturbation of EH .
By the spectrality axiom, the spectral number c(a, F ) for a ∈ QH2n(M)
equals AF ([γ0, u0]) for some pair [γ0, u0] ∈ P̃F with CZF ([γ0, u0]) = 2n. Thus
c(a, F ) ≈ DF (γ0). Combining this with Proposition 9.3 we get that for some
γ ∈ PEH
EH̄(pspec) � c(a, EH) ≈ c(a, F ) ≈ DF (γ0) ≈ DEH(γ) . (39)
Thus it would be enough to show that
DEH(γ) � EH̄(pspec) for all γ ∈ PEH , (40)
which together with (39) would imply (38).
Inequality (40) will be proved in the following way. Note that each γ ∈
PEH lies in Φ
−1(p) for some p ∈ ∆. We will show that
DEH(γ) ≈ EH̄(p) + EH̄
′(p)(pspec − p). (41)
Note that (41) implies (40). Indeed, since H̄ is strictly convex and reaches
its minimum at pspec, it follows from (41) that
DEH(γ) ≈ EH̄(p) + EH̄
′(p)(pspec − p) ≤ EH̄(pspec),
which is true for any γ ∈ PEH thus yielding (40).
Proof of (41): Let the T1-action on M be given by a loop of sym-
plectomorphisms {φt}, t ∈ R, φt = φt+1. The flow of EH has the form
htx = φEH̄′(Φ(x))tx.
We view γ as a map γ : [0, 1] → M satisfying γ(0) = γ(1). Denote
x := γ(0). The curve γ lies in Φ−1(p).
Denote N := γ([0, 1]). This is the T1-orbit of x and it is either a point or
a circle.
In the first case γ is a constant trajectory concentrated at a fixed point
N ∈M of the action. Using this constant curve γ together with the constant
disc u spanning for the definitions of I(Φ) and DEH(γ) one gets
pspec − p = mΦ(γ, u) · κ/2,
DEH(γ) = EH̄(p)− κ/2 · CZEH([γ, u]).
Thus proving (41) reduces in this case to proving
−CZEH([γ, u]) ≈ EH̄
′(p) ·mΦ(γ, u).
Let us fix a symplectic basis of TNM and view each differential dNφt as a
symplectic matrix A(t), so that {A(t)} is an identity-based loop in Sp (2n).
−CZEH([γ, u]) ≈ CZmatr({A(EH̄
′(p)t)}),
while
EH̄ ′(p) ·mΦ(γ, u) ≈ EH̄
′(p)Maslov({A(t)}).
Thus we need to prove
CZmatr({A(EH̄
′(p)t)}) ≈ EH̄ ′(p)Maslov({A(t)}),
which follows easily from the definitions of the Conley-Zehnder index and
the Maslov class.
Thus from now on we will assume that N is a circle. Take any point
x ∈ N . The stabilizer of x under the T1-action is a finite cyclic group of
order k ∈ N. Thus the orbit of the T1-action turns k times along N . Since γ
is a non-constant closed orbit of the Hamiltonian flow generated by EΦ∗H̄ ,
it turns r times along N with r ∈ Z \ {0}. This implies that EH̄ ′(p) = r/k.
We claim that without loss of generality we may assume that l := r/k is an
integer.
Indeed, we can always pass to γ(k) ∈ PkEH , so that (kEH̄)
′(p) ∈ Z, and
if we can prove the proposition for γ(k), then
DkEH(γ
(k)) ≈ kEH̄(p) + kEH̄ ′(p)(pspec − p).
Applying Proposition 9.2 we get
kDEH(γ) ≈ kEH̄(p) + kEH̄
′(p)(pspec − p) + k · const ,
and hence
DEH(γ) ≈ EH̄(p) + EH̄
′(p)(pspec − p),
proving the claim for the original γ.
From now on we assume that l := EH̄ ′(p) ∈ Z\{0} and that [γ, u] ∈ P̃lΦ.
Consider the Hamiltonian vector field X := sgradΦ at a point x ∈ N . Since
N is a non-constant orbit we get X 6= 0. Then V = Tx(Φ
−1(p)) is the skew-
orthogonal complement to X . Choose a T1-invariant ω-compatible almost
complex structure J in a neighborhood of N . Together ω and J define a T1-
invariant Riemannian metric g. Decompose the tangent bundle TM along
N as follows. Put Z = Span(JX,X) and set W to be the g-orthogonal
complement to X in V . Thus we have a T1-invariant decomposition
TxM = W ⊕ Z , x ∈ N . (42)
Furthermore, W and Z carry canonical symplectic forms. Thus W and Z
define symplectic (and hence trivial) subbundles of TM over N . They induce
trivial subbundles of the bundle γ∗TM over S1.
We calculate
dht(x)ξ = dφEH′(Φ(x))t(x)ξ + EH
′′(Φ(x)) · dΦ(ξ) ·X . (43)
We consider two trivializations of the bundle γ∗TM over S1. The first trivi-
alization is defined by means of sections invariant under the T1-action. The
second one is chosen in such a way that it extends to a trivialization of u∗TM
over D2. Using these trivializations we can identify dht(x), respectively, with
two identity-based paths {Ct}, {C
t} of symplectic matrices. The decompo-
sition (42) induces a split
Ct = 1⊕ Bt .
We claim that |CZmatr({Bt})| is bounded by a constant independent of E.
Indeed, observe that in the basis (X, JX) of Z
1 b12(t)
Denote by L the line spanned by X = (1, 0). Perturb {Bt} to a path {B
RδtBt}, where Rt is the rotation by angle t, and δ > 0 is small enough.
Observe thatB′(t)L∩L = {0} for t > 0. It follows readily from the definitions
that |CZmatr(B
t)| and |CZmatr(Rδt)| do not exceed 2. Thus by the quasi-
morphism property of the Conley-Zehnder index (see Proposition 3.5) we
have that |CZmatr({Bt})| is bounded by a constant independent of E, which
yields the claim. Therefore
CZmatr ({Ct}) ≈ 0 .
On the other hand, by formula (18)
CZmatr ({C
t}) = CZmatr ({Ct}) +mlΦ([γ, u]) .
CZEH([γ, u]) := n− CZmatr ({C
t}) ≈ −mlΦ([γ, u]). (44)
Since the periodic trajectory γ lies inside Φ−1(p), we get
AEH([γ, u]) =
EH(γ(t))dt−
u∗ω = EH̄(p)−
u∗ω. (45)
Using (45) and (44) the precise equality
DEH([γ, u]) = AEH([γ, u])−
· CZEH([γ, u])
can be turned into an asymptotic inequality
DEH([γ, u]) ≈ EH̄(p)−
u∗ω +
mlΦ([γ, u]). (46)
Since the periodic trajectory γ lies inside Φ−1(p), we have
AlΦ([γ, u]) =
lΦ(γ(t))dt−
u∗ω = lp−
u∗ω. (47)
Adding and subtracting lp from the right-hand side of (46) and using (47)
we get
DEH(γ) = DEH([γ, u]) ≈
EH̄(p)− lp)
mlΦ([γ, u])
EH̄(p)− lp
AlΦ([γ, u])+
mlΦ([γ, u])
EH̄(p)− lp
− I(lΦ) =
= EH̄(p) + l(−I(Φ)− p) = EH̄(p) + l(pspec − p) .
Recalling that l = EH ′(p), we finally obtain that
DEH(γ) = EH̄(p) + EH
′(p)(pspec − p),
which is precisely the equation (41) that we wanted to get. This finishes the
proof of Proposition 9.1 and Theorem 1.9.
9.1 Calabi and mixed action-Maslov
Proof of Theorem 1.13.
Assume H :M × [0, 1] → R is a normalized Hamiltonian which generates
a loop in Ham(M) representing a class α ∈ π1(Ham(M)) ⊂ H̃am (M). Then
H(l) is also normalized and generates a loop representing αl. Let us compute
µ(α) = −vol (M) · liml→+∞ c(a,H
(l))/l.
Arguing as in the proof of (39) we get that there exists a constant C >
0 such that for each l ∈ N there exists γ ∈ PH(l) for which |c(a,H
(l)) −
DH(l)(γ)| ≤ C. But, as it follows from the definitions and from the fact that
I is a homomorphism, DH(l)(γ) does not depend on γ and equals −I(α
−lI(α). This immediately implies that µ(α) = vol (M) · I(α).
Acknowledgements. The origins of this paper lie in our joint work with
P.Biran on the paper [10] – we thank him for fruitful collaboration at an
early stage of this project, as well as for his crucial help with Example 1.17
on Lagrangian spheres in projective hypersurfaces. We also thank him and
O.Cornea for pointing out to us a mistake in the original version of this paper
and helping us with the correction (see Section 8). We thank F. Zapolsky
for his help with the “exotic” monotone Lagrangian torus in S2 × S2 dis-
cussed in Example 1.20. We thank C. Woodward for pointing out to us the
link between the special point in the moment polytope of a symplectic toric
manifold and the Futaki invariant, and E. Shelukhin for useful discussions
on this issue. We are also grateful to V.L. Ginzburg, Y. Karshon, Y. Long,
D. McDuff, M. Pinsonnault, D. Salamon and M. Sodin for useful discussions
and communications. We thank K. Fukaya, H. Ohta and K. Ono, the orga-
nizers of the Conference on Symplectic Topology in Kyoto (February 2006),
M. Harada, Y. Karshon, M. Masuda and T. Panov, the organizers of the Con-
ference on Toric Topology in Osaka (May 2006), O. Cornea, V.L. Ginzburg,
E. Kerman and F. Lalonde, the organizers of the Workshop on Floer theory
(Banff, 2007), and A. Fathi, Y.-G. Oh and C. Viterbo, the organizers of the
AMS Summer Conference on Symplectic Topology and Measure-Preserving
Dynamical Systems (Snowbird, July 2007), for giving us an opportunity to
present a preliminary version of this work and for the superb job they did
in organizing these conferences. Finally, we thank an anonymous referee for
helpful comments and corrections.
References
[1] Aarnes, J.F., Quasi-states and quasi-measures, Adv. Math. 86:1 (1991),
41-67.
[2] Albers, P., On the extrinsic topology of Lagrangian submanifolds, Int.
Math. Res. Not. 38, (2005), 2341-2371.
[3] Albers, P., Frauenfelder, U., A non-displaceable Lagrangian torus in
T ∗S2, Comm. Pure Appl. Math. 61:8 (2008), 1046-1051.
[4] Arnold, V.I., On a characteristic class entering into conditions of quan-
tization, (Russian) Funkcional. Anal. i Priložen. 1 1967, 1-14.
[5] Arnold, V. I., Some remarks on symplectic monodromy of Milnor fi-
brations, in The Floer memorial volume, 99-103, Progr. Math., 133,
Birkhäuser, 1995.
[6] Atiyah, M.F., Convexity and commuting Hamiltonians, Bull. London
Math. Soc. 14:1 (1981), 1-15.
[7] Barge, J., Ghys, E., Cocycles d’Euler et de Maslov, Math. Ann. 294:2
(1992), 235-265.
[8] Beauville, A., Quantum cohomology of complete intersections, Mat. Fiz.
Anal. Geom. 2:3-4 (1995), 384-398.
[9] Biran, P., Cieliebak, K., Symplectic topology on subcritical manifolds,
Comm. Math. Helv. 76:4 (2001), 712-753.
[10] Biran, P., Entov, M., Polterovich, L., Calabi quasimorphisms for the
symplectic ball, Commun. Contemp. Math. 6:5 (2004), 793-802.
[11] Biran, P., Geometry of Symplectic Intersections, in Proceedings of the
International Congress of Mathematicians (Beijing 2002), Vol. II, 241-
[12] Biran, P., Symplectic topology and algebraic families, in 4-th European
Congress of Mathematics (Stockholm 2004),pp. 827-836, Eur. Math.
Soc., Zürich, 2005.
[13] Biran, P., Lagrangian Non-Intersections, Geom. and Funct. Anal.
(GAFA), 16 (2006), 279-326.
[14] Biran, P., Cornea, O., Quantum structures for Lagrangian submanifolds,
preprint, arXiv:0708.4221, 2007.
[15] Biran, P., Cornea, O., Rigidity and uniruling for Lagrangian submani-
folds, arXiv:0808.2440, 2008.
[16] Cho, C.-H., Holomorphic disc, spin structures and Floer cohomology of
the Clifford torus, Int. Math. Res. Not. 35 (2004), 1803-1843.
[17] Cho, C.-H., Non-displaceable Lagrangian submanifolds and Floer coho-
mology with non-unitary line bundle, prperint, arXiv:0710.5454, 2007.
[18] Conley, C., Zehnder, E., Morse-type index theory for flows and peri-
odic solutions for Hamiltonian equations, Comm. Pure Appl. Math. 37:2
(1984), 207-253.
[19] de Gosson, M., de Gosson, S., Piccione, P., On a product formula for the
Conley–Zehnder index of symplectic paths and its applications, preprint,
math.SG/0607024, 2006.
[20] Delzant, T., Hamiltoniens périodiques et images convexes de l’appli-
cation moment, Bull. Soc. Math. France 116:3 (1988), 315-339.
[21] Donaldson, S. K., Polynomials, vanishing cycles and Floer homology, in
Mathematics: frontiers and perspectives, 55-64, AMS, 2000.
[22] Entov, M., Polterovich, L., Calabi quasimorphism and quantum homol-
ogy, Intern. Math. Res. Notices 30 (2003), 1635-1676.
[23] Entov, M., Polterovich, L., Quasi-states and symplectic intersections,
Comm. Math. Helv. 81:1 (2006), 75-99.
http://arxiv.org/abs/0708.4221
http://arxiv.org/abs/0808.2440
http://arxiv.org/abs/0710.5454
[24] Entov, M., Polterovich, L., Symplectic quasi-states and semi-simplicity
of quantum homology, in Toric Topology, pp. 47-70, Contemporary
Mathematics 460, AMS, 2008.
[25] Entov, M., Polterovich, L., Zapolsky, F., Quasi-morphisms and the Pois-
son bracket, Pure Appl. Math. Quarterly 3:4, part 1 (2007), 1037-1055.
[26] Floer, A., Symplectic fixed points and holomorphic spheres, Comm.
Math. Phys. 120:4 (1989), 575-611.
[27] Fukaya, K., Oh., Y.-G., Ohta, H., Ono, K., Lagrangian intersection
Floer theory – anomaly and obstruction, preprint.
[28] Fukaya, K., Oh., Y.-G., Ohta, H., Ono, K., Lagrangian Floer theory on
compact toric manifolds I, preprint, arXiv:0802.1703, 2008.
[29] Futaki, A., An obstruction to the existence of Einstein Kähler metrics,
Invent. Math. 73:3 (1983), 437-443.
[30] Guillemin, V., Sternberg, S., Convexity properties of the moment map-
ping, Invent. Math. 67:3 (1982), 491-513.
[31] Hirzebruch, F., Topological methods in algebraic geometry, Springer-
Verlag, Berlin, 1966.
[32] Hofer, H., Salamon, D., Floer homology and Novikov rings, in: The Floer
Memorial Volume, 483-524, Progr. Math., 133, Birkhäuser, 1995.
[33] Karshon, Y., Appendix to the paper “Symplectic packings and algebraic
geometry” by D.McDuff and L.Polterovich, Invent. Math. 115:3 (1994),
431-434.
[34] Lang, S., Algebra, 3rd edition, Springer-Verlag, 2002.
[35] Leray, J., Lagrangian Analysis and Quantum Mechanics, The MIT
Press, Cambridge, Massachusetts, 1981.
[36] Lerman, E., Symplectic cuts, Math. Res. Lett. 2:3 (1995), 247-258.
[37] Liu, G., Associativity of quantum multiplication, Comm. Math. Phys.
191:2 (1998), 265-282.
http://arxiv.org/abs/0802.1703
[38] Mabuchi, T., Einstein-Káhler forms, Futaki invariants and convex ge-
ometry on toric Fano varieties, Osaka J. Math. 24:4 (1987), 705-737.
[39] McDuff, D., Hamiltonian S1 manifolds are uniruled, preprint,
arXiv:0706.0675, 2007.
[40] McDuff, D., Private communication, 2007.
[41] McDuff, D., Salamon, D., J-holomorphic curves and symplectic topology,
AMS, 2004.
[42] Oh, Y.-G., Symplectic topology as the geometry of action functional I,
J. Differ. Geom. 46 (1997), 499-577.
[43] Oh, Y.-G., Symplectic topology as the geometry of action functional II,
Commun. Anal. Geom. 7 (1999), 1-55.
[44] Oh, Y.-G., Addendum to: “Floer cohomology of Lagrangian intersections
and pseudo-holomorphic disks. I”, Comm. Pure Appl. Math. 48 (1995),
no. 11, 1299-1302.
[45] Oh, Y.-G., Construction of spectral invariants of Hamiltonian diffeomor-
phisms on general symplectic manifolds, in The breadth of symplectic and
Poisson geometry, 525-570, Birkhäuser, 2005.
[46] Ostrover, Y., Calabi quasi-morphisms for some non-monotone symplec-
tic manifolds, Algebr. Geom. Topol. 6 (2006), 405-434.
[47] Piunikhin, S., Salamon, D., Schwarz, M., Symplectic Floer-Donaldson
theory and quantum cohomology, in: Contact and Symplectic Geometry,
171-200, Publ. Newton Inst., 8, Cambridge Univ. Press, 1996.
[48] Polterovich, L., The geometry of the group of symplectic diffeomor-
phisms, Birkhäuser, 2001.
[49] Polterovich, L., Hamiltonian loops and Arnold’s principle, Amer. Math.
Soc. Transl. (2) 180 (1997), 181-187.
[50] Polterovich, L., Hofer’s diameter and Lagrangian intersections, Internat.
Math. Res. Notices 4 1998, 217-223.
http://arxiv.org/abs/0706.0675
[51] Polterovich, L., Rudnick, Z., Kick stability in groups and dynamical
systems, Nonlinearity 14:5 (2001), 1331-1363.
[52] Py, P., Quasi-morphismes et invariant de Calabi, Ann. Sci. Ecole Norm.
Sup. 39 (2006), 177–195.
[53] Py, P., Quasi-morphismes et diffeomorphismes Hamiltoniens, PhD-
thesis, ENS-Lyon, 2008.
[54] Robbin, J., Salamon, D., The Maslov index for paths, Topology 32:4
(1993), 827-844.
[55] Ruan, Y., Tian, G., A mathematical theory of quantum cohomology,
Math. Res. Lett. 1:2 (1994), 269-278.
[56] Ruan, Y., Tian, G., A mathematical theory of quantum cohomology, J.
Diff. Geom. 42:2 (1995), 259-367.
[57] Salamon, D., Zehnder, E., Morse theory for periodic solutions of Hamil-
tonian systems and the Maslov index, Comm. Pure Appl. Math. 45:10
(1992), 1303-1360.
[58] Salamon, D., Lectures on Floer homology, in: Symplectic geometry and
topology (Park City, UT, 1997), 143-229, IAS/Park City Math. Ser., 7,
AMS, 1999.
[59] Schwarz, M., On the action spectrum for closed symplectically aspherical
manifolds, Pacific J. Math. 193:2 (2000), 419-461.
[60] Seidel, P., Floer homology and the symplectic isotopy problem, PhD the-
sis, Oxford University, 1997.
[61] Seidel, P., Graded Lagrangian submanifolds, Bull. Soc. Math. France
128:1 (2000), 103-149.
[62] Seidel, P., Vanishing cycles and mutation, in European Congress of
Mathematics, Vol. II (Barcelona, 2000), 65-85, Progr. Math., 202,
Birkhäuser, 2001.
[63] Shelukhin, E., PhD thesis (in preparation), Tel-Aviv University.
[64] Usher, M., Spectral numbers in Floer theories, preprint, arXiv:0709.1127,
2007.
[65] Viterbo, C., Symplectic topology as the geometry of generating functions,
Math. Ann. 292:4 (1992), 685-710.
[66] van der Waerden, B., Algebra. Vol. 2, Springer-Verlag, 1991.
[67] Wang, X.-J., Zhu, X., Kähler-Ricci solitons on toric manifolds with pos-
itive first Chern class, Adv. Math. 188:1 (2004), 87-103.
[68] Weinstein, A., Cohomology of symplectomorphism groups and critical
values of Hamiltonians, Math. Z. 201:1 (1989), 75-82.
[69] Witten, E., Two-dimensional gravity and intersection theory on moduli
space, Surveys in Diff. Geom. 1 (1991), 243-310.
Michael Entov Leonid Polterovich
Department of Mathematics School of Mathematical Sciences
Technion Tel Aviv University
Haifa 32000, Israel Tel Aviv 69978, Israel
entov@math.technion.ac.il polterov@post.tau.ac.il
http://arxiv.org/abs/0709.1127
	Introduction and main results
	Many facets of displaceability
	Preliminaries on quantum homology
	An hierarchy of rigid subsets within Floer theory
	Hamiltonian torus actions
	Super(heavy) monotone Lagrangian submanifolds
	An effect of semi-simplicity
	Discussion and open questions
	Strong displaceability beyond Floer theory? 
	Heavy fibers of Poisson-commutative subspaces
	Detecting stable displaceability
	Preliminaries on Hamiltonian Floer theory
	Valuation on QH* (M)
	Hamiltonian Floer theory
	Conley-Zehnder and Maslov indices
	Spectral numbers
	Partial symplectic quasi-states
	Basic properties of (super)heavy sets
	Products of (super)heavy sets 
	Product formula for spectral invariants
	Decorated Z2-graded complexes
	Reduced Floer and Quantum homology
	Proof of Theorem 5.1
	Proof of algebraic Theorem 5.2 
	Stable non-displaceability of heavy sets
	Analyzing stable stems
	Monotone Lagrangian submanifolds 
	Rigidity of special fibers of Hamiltonian actions
	Calabi and mixed action-Maslov
ABSTRACT
  We show that there is an hierarchy of intersection rigidity properties of
sets in a closed symplectic manifold: some sets cannot be displaced by
symplectomorphisms from more sets than the others. We also find new examples of
rigidity of intersections involving, in particular, specific fibers of moment
maps of Hamiltonian torus actions, monotone Lagrangian submanifolds (following
the works of P.Albers and P.Biran-O.Cornea), as well as certain, possibly
singular, sets defined in terms of Poisson-commutative subalgebras of smooth
functions. In addition, we get some geometric obstructions to semi-simplicity
of the quantum homology of symplectic manifolds. The proofs are based on the
Floer-theoretical machinery of partial symplectic quasi-states.

<|endoftext|><|startoftext|>
Introduction
Multiple parton scattering in a dense medium can be used as a useful tool to study
properties of both hot and cold nuclear matter. The success of such an approach has been
demonstrated by the discovery of strong jet quenching phenomena in central Au + Au
collisions at the Relativistic Heavy-ion Collider (RHIC) [1,2,3] and their implications
on the formation of a strongly coupled quark-gluon plasma at RHIC [4,5]. However,
for a convincing phenomenological study of the existing and future experimental data,
a unified description of all medium effects in hard processes involving nuclei, such as
electron-nucleus (e + A), hadron-nucleus (h + A) and nucleus-nucleus collisions (A +
A) has to be developed [6,7]. This must include the physics of transverse momentum
broadening [8], strong nuclear enhancement in DIS [9] and Drell-Yan production [10,11],
nuclear shadowing [12], and parton energy loss due to gluon radiation induced by multiple
scattering [13,14,15,16,17,18,19].
There exist many different frameworks in the literature to describe multiple scattering
in a nuclear medium [20,21,22]. Among them the twist expansion approach is based on
the generalized factorization in perturbative QCD as initially developed by Luo, Qiu
and Sterman (LQS) [23]. In the LQS formalism, multiple scattering processes generally
involve high-twist multiple-parton correlations in analogy to the parton distribution op-
erators in leading twist processes. Though the corresponding higher twist corrections are
suppressed by powers of 1/Q2, they are enhanced at least by a factor of A1/3 due to
multiple scattering in a large nucleus. This framework has been applied recently to study
medium modification of the fragmentation functions as the leading parton propagates
through the medium [18,19]. Because of the non-Abelian Landau-Pomeranchuck-Midgal
interference in the gluon bremsstrahlung induced by multiple parton scattering in nuclei,
the higher-twist nuclear modifications to the fragmentation functions are in fact enhanced
by A2/3, quadratic in the nuclear size [18,19]. Phenomenological study of parton energy
loss and nuclear modification of the fragmentation functions in cold nuclear matter [24]
gives a good description of the nuclear modification of the leading hadron spectra in semi-
inclusive deeply inelastic lepton-nucleus scattering observed by the HERMES experiment
[25,26]. The same framework also gives a compelling explanation for the suppression of
large transverse momentum hadrons discovered at RHIC [27].
The emphasis of recent studies of medium modification of fragmentation functions has
been on radiative parton energy loss induced by multiple scattering with gluons. Such
processes indeed are dominant relative to multiple scattering with quarks because of the
abundance of soft gluons in either cold nuclei or hot dense matter produced in heavy-ion
collisions. Since gluon bremsstrahlung induced by scattering with medium gluons is the
same for quarks and anti-quarks, one also expects the energy loss and fragmentation
modification to be identical for quarks and anti-quarks. However, in a medium with fi-
nite baryon density such as cold nuclei and the forward region of heavy-ion collisions, the
difference between quark and anti-quark distributions in the medium should lead to differ-
ent energy loss and modified fragmentation functions for quarks and antiquarks through
quark-antiquark annihilation processes. To study such an asymmetry, one must consider
systematically all possible quark-quark and quark-antiquark scattering processes, which
will be the focus of this paper.
In this study we will calculate the modifications of quark and antiquark fragmentation
functions (FF) due to quark-quark (antiquark) double scattering in a nuclear medium,
working within the LQS framework for generalized factorization in perturbative QCD.
For a complete description of nuclear modification of the single inclusive hadron spec-
tra, one still have to consider medium modification of gluon fragmentation functions in
addition to modified quark fragmentation function due to quark-gluon scattering [18].
The theoretical results presented in this paper will be a second step toward a complete
description of medium modified fragmentation functions. However, one can already find
that quark-quark (antiquark) double scattering will give different corrections to quark
and antiquark FF, depending on antiquark and quark density of the medium, respec-
tively. This difference between modified quark and antiquark FF may shed light on the
interesting observation by the HERMES experiment [25,26] of a large difference between
nuclear suppression of the leading proton and antiproton spectra in semi-inclusive DIS
off large nuclei. Such a picture of quark-quark (antiquark) scattering can provide a com-
peting mechanism for the experimentally observed phenomenon in addition to possible
absorption of final state hadrons inside nuclear matter [28,29].
The paper is organized as follows. In the next section we will present the general for-
malism of our calculation including the generalized factorization of twist-4 processes.
In Section III we will illustrate the procedure of calculating the hard partonic parts of
quark-quark double scattering in nuclei. In Section IV we will discuss the modifications
to quark and antiquark fragmentation functions due to quark-quark (antiquark) dou-
ble scattering in nuclei. In Section V, we will focus on the flavor dependent part of the
medium modification to the quark FF’s due to quark-antiquark annihilation and we will
discuss the implications for the flavor dependence of the leading hadron spectra in both
DIS off a nucleus and heavy-ion collisions. We will summarize our work in Section VI. In
the Appendix A-1, we collect the complete results for the hard partonic parts for different
cut diagrams of quark-quark (antiquark) double rescattering in nuclei. We also provide
an alternative calculation of the hard parts of the central-cut diagrams in Appendix A-3
through elastic quark-quark scattering or quark-antiquark annihilation as a cross check.
2 General formalism
In order to study quark and antiquark FF’s in semi-inclusive deeply inelastic lepton-
nucleus scattering, we consider the following processes,
e(L1) + A(p) −→ e(L2) + h(ℓh) +X ,
Ap Ap
Fig. 1. Lowest order and leading-twist contribution to semi-inclusive DIS.
where L1 and L2 are the four momenta of the incoming and outgoing leptons, and ℓh
is the observed hadron momentum. The differential cross section for the semi-inclusive
process can be expressed as
EL2Eℓh
dσhDIS
d3L2d3ℓh
LµνEℓh
dW µν
, (1)
where p = [p+, 0, 0⊥] is the momentum per nucleon in the nucleus, q = L2 − L1 =
[−Q2/2q−, q−, 0⊥] the momentum transfer carried by the virtual photon, s = (p + L1)2
the lepton-nucleon center-of-mass energy and αEM is the electromagnetic (EM) coupling
constant. The leptonic tensor is given by Lµν = 1/2Tr(γ · L1γµγ · L2γν) while the semi-
inclusive hadronic tensor is defined as,
〈A|Jµ(0)|X, h〉〈X, h|Jν(0)|A〉2πδ4(q + p− pX − ℓh) (2)
where
X runs over all possible final states and Jµ =
q eqψ̄qγµψq is the hadronic EM
current.
Assuming collinear factorization in the parton model, the leading-twist contribution to
the semi-inclusive cross section can be factorized into a product of parton distributions,
parton fragmentation functions and the partonic cross section. Including all leading log
radiative corrections, the lowest order contribution [O(α0s)] from a single hard γ∗ + q
scattering, as illustrated in Fig. 1, can be written as
dW Sµν
dxfAq (x, µ
µν (x, p, q)Dq→h(zh, µ
2) ; (3)
H(0)µν (x, p, q) =
Tr(γ · pγµγ · (q + xp)γν)
2p · q
δ(x− xB) , (4)
where the momentum fraction carried by the hadron is defined as zh = ℓ
−, xB =
Q2/2p+q− is the Bjorken scaling variable, µ2I and µ
2 are the factorization scales for
the initial quark distributions fAq (x, µ
I) in a nucleus and the fragmentation functions
in vacuum Dq→h(zh, µ
2), respectively. The renormalized quark fragmentation function
Dq→h(zh, µ
2) satisfies the DGLAP QCD evolution equations [30]:
∂Dq→h(zh, µ
∂ lnµ2
γq→qg(z)Dq→h(zh/z, µ
+ γq→gq(z)Dg→h(zh/z, µ
; (5)
∂Dg→h(zh, µ
∂ lnµ2
γg→qq̄(z)Dq→h(zh/z, µ
+ γg→gg(z)Dg→h(zh/z, µ
, (6)
where γa→bc(z) denotes the splitting functions of the corresponding radiative processes
[31,32].
In DIS off a nuclear target, the propagating quark will experience additional scatterings
with other partons from the nucleus. The rescatterings may induce additional parton
(quark or gluon) radiation and cause the leading quark to lose energy. Such induced
radiation will effectively give rise to additional terms in the DGLAP evolution equation
leading to a modification of the fragmentation functions in a medium. These are power-
suppressed higher-twist corrections and they involve higher-twist parton matrix elements.
We will only consider those contributions that involve two-parton correlations from two
different nucleons inside the nucleus. They are proportional to the thickness of the nucleus
[18,23,33] and thus are enhanced by a nuclear factor A1/3 as compared to two-parton
correlations in a nucleon. As in previous studies [18,19], we will limit our study to such
double scattering processes in a nuclear medium. These are twist-four processes and give
leading contributions to the nuclear effects. The contributions of higher twist processes or
contributions not enhanced by the nuclear medium will be neglected for the time being.
When considering double scattering with nuclear enhancement, a very important process
is quark-gluon double scattering as illustrated in Fig. 2. Such processes give the dominant
contribution to the leading quark energy loss and have been studied in detailed in Refs.
[18,19]. The modification to the vacuum quark fragmentation function from quark-gluon
scattering is,
qg→qg
q→h (zh)=
α2sCA
Dq→h(zh/z)
1 + z2
(1− z)+
TAqg(x, xL)
fAq (x)
+ δ(z − 1)
∆TAqg(x, ℓ
fAq (x)
+Dg→h(zh/z)
1 + (1− z)2
TAqg(x, xL)
fAq (x)
, (7)
Fig. 2. A typical diagram for quark-gluon double scattering with three possible cuts [central(C),
left(L) and right(R)].
where the +function is defined as
F (z)
(1 − z)+
F (z)− F (1)
for any F (z) that is sufficiently smooth at z = 1 and the twist-four quark-gluon correla-
tion function,
TAqg(x, xL) =
dy−1 dy
i(x+xL)p
+y−(1− e−ixLp+y
2 )(1− e−ixLp+(y−−y
×〈A|ψ̄q(0)
F +σ (y
+σ(y−1 )ψq(y
−)|A〉θ(−y−2 )θ(y− − y−1 ), (9)
has explicit interference included. The matrix element in the virtual correction [the term
with δ(z − 1)] is defined as
∆TAqg(x, ℓ
T ) ≡
2TAqg(x, xL)|z=1 − (1 + z2)TAqg(x, xL)
. (10)
Since TAqg(x, xL)/f
q (x) is proportional to gluon distribution and independent of the flavor
of the leading quark, the suppression of the hadron spectrum caused by quark-gluon or
antiquark-gluon scattering should be proportional to the gluon density of the medium
and is identical for quark and antiquark fragmentation. It was shown in Ref. [24] that
such modification of parton fragmentation functions by quark-gluon double scattering
and gluon bremsstrahlung in a nuclear medium describes very well the recent HERMES
data [25] on semi-inclusive DIS off nuclear targets.
Fig. 3. Diagram for leading order quark-antiquark annihilation with three possible cuts [cen-
tral(C), left(L) and right(R)].
Fig. 4. A typical diagram for next-to-leading order correction to quark-antiquark annihilation
with three possible cuts [central(C), left(L) and right(R)].
In this paper, we will consider quark-quark (antiquark) double scattering such as the
process shown in Fig. 3 and its radiative corrections at order O(α2s) in Fig. 4. The
contributions of quark-quark double scattering is proportional to the quark density in
a nucleon, while the contribution of quark-gluon double scattering is proportional to
the gluon density in a nucleon; and the gluon density is generally larger than the quark
density in a nucleon at small momentum fraction. However, as pointed out in earlier works
[18], quark-quark double scattering mixes quark and gluon fragmentation functions and
therefore gives rise to new nuclear effects. The annihilation processes as shown in Figs. 3
and 4 will lead to different modifications of quark and antiquark fragmentation functions
in a medium with finite baryon density (or valence quarks). Such differences will in
turn lead to flavor dependence of the nuclear modification of leading hadron spectra as
observed in HERMES experiment [25,26].
Quark-quark double scattering as well as quark-gluon double scattering are twist-4 pro-
cesses. We will apply the same generalized factorization procedure for twist-4 processes
as developed by LQS [23] for semi-inclusive processes in DIS. In general, the twist-four
contributions can be expressed as the convolution of partonic hard parts and two-parton
correlation matrix elements. In this framework, contributions from double quark-quark
scattering in any order of αs, e.g., the quark-antiquark annihilation process as illustrated
in Fig. 4, can be written in the following form,
dWDµν
p+dy−
dy−1 dy
−, y−1 , y
2 , p, q, zh)
×〈A|ψ̄q(0)
−)ψ̄q(y
2 )|A〉. (11)
Here we have neglected transverse momenta of all quarks in the hard partonic part.
Transverse momentum dependent contributions are higher twist and are suppressed by
〈k2⊥〉/Q2, Therefore, all quarks’ momenta are assumed collinear, k2 = x2p and k3 = x3p.
−, y−1 , y
2 , p, q, zh) is the Fourier transform of the partonic hard part H̃µν(x, x1, x2, p, q, zh)
in momentum space,
−, y−1 , y
2 , p, q, zh) =
eix1p
+y−+ix2p
+i(x−x1−x2)p
×H̃Dµν(x, x1, x2, p, q, zh)
dxH(0)µν (x, p, q)H
(y−, y−1 , y
2 , x, p, q, zh) , (12)
where, in collinear approximation, the hard partonic part H(0)µν (x, p, q) [Eq. (4)] in the
leading twist without multiple parton scattering can be factorized out of the high-twist
hard part H̃Dµν(x, x1, x2, p, q, zh). The momentum fractions x, x1, and x2 are fixed by
δ-functions of the on-shell conditions of the final state partons and poles of parton prop-
agators in the partonic hard part. The phase factors in H
−, y−1 , y
2 , p, q, zh) can then
be factored out, which in turn will be combined with the partonic fields in Eq. (11)
to give twist-four partonic matrix elements or two-parton correlations. The quark-quark
double scattering corrections in Eq. (11) can then be factorized as the convolution of
fragmentation functions, twist-four partonic matrix elements and the partonic hard scat-
tering cross sections. For scatterings (versus the annihilation) with quarks (antiquarks),
a summation over the flavor of the secondary quarks (antiquarks) should be included
in two-quark correlation matrix elements and both t, u channels and their interferences
should be considered for scattering of identical quarks in the hard partonic parts.
After factorization, we then define the twist-four correction to the leading twist quark
fragmentation function in the same form [Eq. (3)],
dWDµν
dxfAq (x)H
µν (x, p, q)∆Dq→h(zh) . (13)
3 Quark-quark double scattering processes
In this section we will discuss the calculation of the hard part of quark-quark double
scattering in detail. The lowest order process of quark-quark (antiquark) double scattering
in nuclei is quark-antiquark annihilation (or quark-gluon conversion) as shown in Fig. 3.
The hard partonic parts from the three cut diagrams in this figure are [18]:
0,C(y
−, y−1 , y
2 , x, p, q, zh)=Dg→h(zh)
×θ(−y−2 )θ(y− − y−1 ) , (14)
0,L(y
−, y−1 , y
2 , x, p, q, zh)=Dq→h(zh)
×θ(y−1 − y−2 )θ(y− − y−1 ) , (15)
0,R(y
−, y−1 , y
2 , x, p, q, zh)=Dq→h(zh)
×θ(−y−2 )θ(y−2 − y−1 ) . (16)
The main focus of this paper is about contributions from the next-leading order correc-
tions to the above lowest order process. There is a total of 12 diagrams for real corrections
at one-loop level as illustrated in Fig. 5 to Fig. 16 in the Appendix A-1, each having up
to three different cuts. In this section, we demonstrate the calculation of the hard parts
from the quark-antiquark annihilation in Fig. 4 in detail as an example. We will list the
complete results of all diagrams in Appendix A.
One can write down the hard partonic part of the central-cut diagram of Fig. 4 (Fig. 5
in Appendix A-1) according to the conventional Feynman rule,
C µν(y
−, y−1 , y
2 , p, q, zh)=
Dg→h(
eix1p
+y−+ix2p
×ei(x−x1−x2)p+y
(2π)4
γµĤγν
2πδ+(ℓ
2)2πδ+(ℓ
g) δ(1− z −
γ · (q + x1p)
(q + x1p)2 − iǫ
γ · (q + x1p− ℓ)
(q + x1p− ℓ)2 − iǫ
γ · (q + xp− ℓ)
(q + xp− ℓ)2 + iǫ
γ · (q + xp)
(q + xp)2 + iǫ
εαρ(ℓ) εβσ(ℓg) , (17)
where δ+ is a Dirac delta-function with only the positive solution in its functional variable,
εαρ(ℓ) = −gαρ + (nαℓρ + nρℓα)/n · ℓ is the polarization tensor of a gluon propagator in
an axial gauge (n · A = 0) with n = [1, 0−,~0⊥], ℓ and ℓg = q + (x1 + x2)p − ℓ are the
4-momenta carried by the two final gluons respectively. The fragmenting gluon carries
a fraction, z = ℓ−g /q
−, of the initial quark’s longitudinal momentum (the large minus
component).
To simplify the calculation in the case of small transverse momentum ℓT ≪ q−, p+, we
can apply the collinear approximation to complete the trace of the product of γ-matrices,
Ĥ ≈ γ · ℓq
γ · ℓqĤ
. (18)
According to the convention in Eqs. (11) and (12), contributions from quark-quark double
scattering in the nuclear medium to the semi-inclusive hadronic tensor in DIS off a nucleus
can be expressed in the general factorized form:
dWDqq̄,µν
dxH(0)µν (x, p, q)
p+dy−
dy−1 dy
(y−, y−1 , y
2 , x, p, q, zh)
× 〈A|ψ̄q(0)
ψq̄(y
−)ψ̄q(y
ψq̄(y
2 )|A〉 . (19)
After carrying out the momentum integration in x, x1, x2 and ℓ
± in Eq. (17) with the
help of contour integration and δ-functions, one obtains the hard partonic part, H
the rescattering for the central-cut diagram in Fig. 4 (Fig. 5) as
5,C(y
−, y−1 , y
2 , x, p, q, zh) =
α2sxB
Dg→h(zh/z)
× 2(1 + z
z(1− z)
I5,C(y
−, y−1 , y
2 , x, xL, p) , (20)
I5,C(y
−, y−1 , y
2 , x, xL, p) = e
i(x+xL)p
+y−θ(−y−2 )θ(y− − y−1 )
× (1− e−ixLp+y
2 )(1− e−ixLp+(y−−y
)) , (21)
where the momentum fractions xL is defined as
2p+q−z(1− z)
. (22)
Note that the function I5,C(y
−, y−1 , y
2 , x, xL, p) contains only phase factors. One can
combine these phase factors with the matrix elements of the quark fields to define a
special two-quark correlation function
A(5,C)
qq̄ (x, xL) =
p+dy−
dy−1 dy
2 〈A|ψ̄q(0)
ψq̄(y
−)ψ̄q(y
ψq̄(y
2 )|A〉
× I5,C(y−, y−1 , y−2 , x, xL, p) . (23)
The contribution from quark-antiquark annihilation in the central-cut diagram in Fig. 4
to the hadronic tensor can then be expressed as
dWDqq̄,µν
dxH(0)µν (x, p, q)
α2sxB
Dg→h(
2(1 + z2)
z(1 − z)
A(5,C)
qq̄ (x, xL). (24)
Contributions from all quark-quark (antiquark) double scattering processes can be cast
in the above factorized form.
The structure of the phase factors in I5,C(y
−, y−1 , y
2 , x, xL, p) is exactly the same as for
gluon bremsstrahlung induced by quark-gluon scattering as studied in Ref. [18,19]. It
resembles the cross section of dipole scattering and represents contributions from two
different processes and their interferences. It contains essentially four terms,
I5,C(y
−, y−1 , y
2 , x, xL, p) = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p
×[1 + e−ixLp+(y−+y
) − e−ixLp+y
2 − e−ixLp+(y−−y
)] . (25)
The first term corresponds to the so-called hard-soft processes where the gluon emission
is induced by the hard scattering between the virtual photon γ∗ and the initial quark
with momentum (x + xL)p. The quark then becomes on-shell before it annihilates with
a soft antiquark from the nucleus that carries zero momentum and converts into a real
gluon in the final state. The second term corresponds to a process in which the initial
quark with momentum xp is on-shell after the first hard γ∗-quark scattering. It then
annihilates with another antiquark and produces two final gluons in the final state. In
this process, the antiquark carries finite (hard) momentum xLp. Therefore one often refers
to this process as double-hard scattering as compared to the first process in which the
antiquark carries zero momentum. Set aside the change of flavors in the initial and final
states, the double-hard scattering corresponds essentially to two-parton elastic scattering
with finite momentum and energy transfer. This is in contrast to the hard-soft scattering
which is essentially the final state radiation of the γ∗-quark scattering and the total
energy and momentum of the two final state gluons all come from the initial quark. The
corresponding matrix elements of the two-quark correlation functions from these first two
terms are called ‘diagonal’ elements.
The third and fourth terms with negative signs in I5,C(y
−, y−1 , y
2 , x, xL, p) are interfer-
ences between hard-soft and double hard processes. The corresponding matrix elements
are called ‘off-diagonal’. The cancellation between the two diagonal and off-diagonal
terms essentially gives rise to the destructive interference which is very similar to the
Landau-Pomeranchuk-Migdal (LPM) interference in gluon bremsstrahlung induced by
quark-gluon double scattering [18,19]. One can similarly define the formation time of the
parton (quark or gluon) emission as
. (26)
In the limit of collinear emission (xL → 0) or when the formation time of the parton
emission, τf , is much larger than the nuclear size, the effective matrix element vanishes
because
I5,C(y
−, y−1 , y
2 , x, xL, p)|xL=0 → 0 , (27)
when the hard-soft and double hard processes have complete destructive interference.
We should note that in the central-cut diagram of Fig. 4, the final state partons are two
gluons. Therefore, in Eq. (20) the gluon fragmentation function in vacuum Dg→h(zh/z)
enters. If the other gluon (close to the γ∗-quark interaction) fragments, the contribution
to the semi-inclusive hadronic tensor is similar except that the corresponding effective
“splitting function” should be replaced by
1 + z2
z(1 − z)
→ 1 + (1− z)
z(1 − z)
. (28)
As we will show in Appendix A-1, the two gluons in the quark-antiquark annihilation
processes (central-cut diagrams) are symmetric when contributions from all possible an-
nihilation processes and their interferences are summed. Therefore, one can simply mul-
tiply the final results by a factor of 2 to take into account the hadronization of the second
final-state gluon.
In addition to the central-cut diagram, one should also take into account the asymmetrical-
cut diagrams in Fig. 4, which represent interference between gluon emission from single
and triple scattering. The hard partonic parts are mainly the same as for the central-cut
diagram. The only differences are in the phase factors and the fragmentation functions
since the fragmenting parton can be the final-state quark or gluon. These hard parts can
be calculated following a similar procedure and one gets,
5,L(R)(y
−, y−1 , y
2 , x, p, q, zh) =
α2sxB
Dq→h(
2(1 + z2)
z(1 − z)
× I5,L(R)(y−, y−1 , y−2 , x, xL, p) , (29)
I5,L(y
−, y−1 , y
2 , x, xL, p) =−ei(x+xL)p
+y−(1− e−ixLp+(y−−y
×θ(y−1 − y−2 )θ(y− − y−1 ) , (30)
I5,R(y
−, y−1 , y
2 , x, xL, p) =−ei(x+xL)p
+y−(1− e−ixLp+y
×θ(−y−2 )θ(y−2 − y−1 ) . (31)
In the asymmetrical cut diagrams, the above contributions come from the fragmentation
of the final-state quark. Therefore, quark fragmentation function Dq→h(zh/z) enters this
contribution. For gluon fragmentation into the observed hadron in this asymmetrical-cut
diagrams, the contribution can be obtained by simply replacing the quark fragmentation
function by the gluon fragmentation function Dg→h(zh/z) and replacing z by 1 − z.
Summing the contributions from three different cut diagrams of Fig. 4, we can observe
further examples of mixing (or conversion) of quark and gluon fragmentation functions.
This medium-induced mixing was first observed by Wang and Guo [18] and is a unique
feature of quark-quark (antiquark) double scattering among all multiple parton scattering
processes.
With the same procedure we can calculate contributions from all other cut diagrams
of quark-quark (antiquark) double scattering at order O(α2s), which are listed in Ap-
pendix A-1. There are three types of processes: two annihilation processes, qq̄ → gg
(central-cut diagrams in Figs. 5, 6, 7, 8 and 9), qq̄ → qiq̄i (central-cut diagram in Fig. 10)
and quark-quark (antiquark) scattering, qqi(q̄i) → qqi(q̄i) (central-cut diagram in Fig. 11).
One also has to consider the interference of s and t-channel amplitude for annihilation
into an identical quark pair, qq̄ → qq̄ (central-cut diagrams in Figs. 12 and 13) and the
interference between t and u channels of identical quark scattering qq → qq (central-cut
diagram in Fig. 14).
Contributions from left and right-cut diagrams correspond to interference between the
amplitude of gluon radiation from single γ∗-quark scattering and triple quark scattering.
The amplitudes of gluon radiation via triple quark scattering essentially come from ra-
diative corrections to the left and right-cut diagrams of the lowest-order quark-antiquark
annihilation in Fig. 3 (as shown in left and right-cut diagrams in Figs. 5, 6, 8, 9, 12,
13, 15 and 16). Two other triple quark scatterings with gluon radiation, shown as the
left and right-cut diagrams in Figs. 11 and 14, correspond to the case where one of the
final state quarks, after quark-quark scattering, annihilates with another antiquark and
converts into a final state gluon.
4 Modified Fragmentation Functions
In order to simplify the contributions from quark-quark (antiquark) scattering (annihi-
lation), one can first organize the results of the hard parts in terms of contributions from
central, left or right-cut diagrams, which are associated by contour integrals with specific
products of θ-functions,
= HDC θ(−y−2 )θ(y− − y−1 )−HDL θ(y−1 − y−2 )θ(y− − y−1 )
−HDR θ(−y−2 )θ(y−2 − y−1 ) . (32)
These θ-functions provide a space-time ordering of the parton correlation and will restrict
the integration range along the light-cone. For contributions from central, left and right-
cut diagrams that have identical hard partonic parts, H
C = H
L = H
R , they will
have a common combination of θ-functions that produces a path-ordered integral,
dy−2 =−
dy−1 dy
θ(−y−2 )θ(y− − y−1 )− θ(−y−2 )θ(y−2 − y−1 )
−θ(y− − y−1 )θ(y−1 − y−2 )
that is limited only by the spatial-spread y− of the first parton along the light-cone
coordinate. For a high-energy parton that carries momentum fraction xp+, y− ∼ 1/xp+
should be very small. Those contributions that are proportional to the above path-ordered
integral are referred to as contact contributions (or contact interactions).
Similarly, y−1 − y−2 is the spatial spread of the second parton and can only be limited
by the spatial size of its host nucleon even for small value of momentum fraction. The
spatial position of its host nucleon, y−1 + y
2 , however, can be anywhere within the nu-
cleus. Therefore, any contributions from double parton scattering that have unrestricted
integration over y−1 and y
2 should be proportional to the nuclear size of the target A
and therefore are nuclear enhanced. In this paper, we will only keep the nuclear enhanced
contributions and neglect the contact contributions. This will greatly simplify the final
results for double parton scattering.
4.1 qq̄ → g annihilation
For the lowest order of quark-antiquark annihilation in Eqs. (14)-(16), the hard parts
from the three cut diagrams are almost the same except for the parton fragmentation
functions. The central-cut diagram is proportional to the gluon fragmentation function
while the left and right-cut diagrams are proportional to quark fragmentation functions.
Rearranging the contributions from the three cut diagrams and neglecting the contact
term that is proportional to the path-ordered integral as in Eq. (33), the total contribution
can be written as
dWD(0)µν
qq̄ (x, 0)
H(0)µν (x, p, q)
× [Dg→h(zh)−Dq→h(zh)] . (34)
According to our definition in Eq. (13) of the twist-four correction to the quark fragmen-
tation functions, the modification to the quark fragmentation function from the lowest
order quark-antiquark annihilation is then,
(qq̄→g)
q→h (zh) =
[Dg→h(zh)−Dq→h(zh)]
qq̄ (x, 0)
fAq (x)
. (35)
Here the effective quark-antiquark correlation function T
qq̄ (x, 0) is defined as,
qq̄i (x, xL)≡
p+dy−
dy−1 dy
ixp+y−−ixLp
)θ(−y−2 )θ(y− − y−1 )
×〈A|ψ̄q(0)
−)ψ̄qi(y
ψqi(y
2 )|A〉 , (36)
with the antiquark q̄i carrying momentum fraction xL. This two-parton correlation func-
tion is always associated with double-hard rescattering processes. Similarly, we define
three other quark-antiquark correlation matrix elements
qq̄i (x, xL)≡
p+dy−
dy−1 dy
i(x+xL)p
+y−θ(y− − y−1 )
× θ(−y−2 )〈A|ψ̄q(0)
−)ψ̄qi(y
ψqi(y
2 )|A〉 , (37)
A(I−L)
qq̄i (x, xL)≡
p+dy−
dy−1 dy
i(x+xL)p
+y−−ixLp
+(y−−y
)θ(y− − y−1 )
× θ(−y−2 )〈A|ψ̄q(0)
−)ψ̄qi(y
ψqi(y
2 )|A〉 , (38)
A(I−R)
qq̄i (x, xL)≡
p+dy−
dy−1 dy
i(x+xL)p
+y−−ixLp
2 θ(y− − y−1 )
× θ(−y−2 )〈A|ψ̄q(0)
−)ψ̄qi(y
ψqi(y
2 )|A〉 , (39)
that are associated with hard-soft rescattering and interference between double hard
and hard-soft rescattering. In the first parton correlation T
qq̄i (x, xL), the antiquark q̄i
carries momentum fraction xL while the initial quark has the momentum fraction x. The
two-parton correlation T
qq̄i (x, xL) corresponds to the case when the leading quark has
x+ xL but the antiquark carries zero momentum. The two interference matrix elements
are approximately the same for small value of xL and will be denoted as T
qq̄i (x, xL).
4.2 qq̄ → qiq̄i annihilation
Contributions from the next-to-leading order quark-antiquark annihilation or quark-
quark (antiquark) scattering are more complicated since they involve many real and vir-
tual corrections. The simplest real correction comes from qq̄ → qiq̄i annihilation (qi 6= q)
[Fig. 10 and Eqs. (A-25) and (A-26)] which has only a central-cut diagram,
(qq̄→qiq̄i)
q→h (zh) =
α2sxB
[z2 + (1− z)2]
qi 6=q
[Dqi→h(zh/z) +Dq̄i→h(zh/z)]
qq̄ (x, xL)
fAq (x)
. (40)
This kind of qq̄ annihilation is truly a hard processes and thus requires the second an-
tiquark to carry finite initial momentum fraction xL. Furthermore, there are no other
interfering processes.
4.3 qqi(q̄i) → qqi(q̄i) scattering
Contributions from non-identical quark-quark scattering qq̄i → qq̄i (qi 6= q) are a little
complicated because they involve all three cut diagrams (central, left and right) [Eqs. (A-
28)-(A-32)]. One can factor out the θ-functions in the hard parts according to Eq. (32)
and re-organize the phase factors in each cut diagram,
I11,C = e
i(x+xL)p
+y−(1− e−ixLp+y
2 )(1− e−ixLp+(y−−y
= ei(x+xL)p
+y−[1− e−ixLp+y
2 − e−ixLp+(y−−y
) + e−ixLp
+(y−+y
I11,L = e
i(x+xL)p
+y−(1− e−ixLp+(y−−y
= ei(x+xL)p
+y−[1− e−ixLp+y
2 − e−ixLp+(y−−y
) + e−ixLp
2 ] ;
I11,R = e
i(x+xL)p
+y−(1− e−ixLp+y
= ei(x+xL)p
+y−[1− e−ixLp+y
2 − e−ixLp+(y−−y
) + e−ixLp
+(y−−y
)] , (41)
such that the first three terms in each amplitude are the same. These three common phase
factors will give rise to a contact contribution for all similar hard parts from the three
cut diagrams, which we will neglect since they are not nuclear enhanced. The remaining
part will have the following phase factors,
I11= e
i(x+xL)p
+y−[θ(−y−2 )θ(y− − y−1 )e−ixLp
+(y−+y
−θ(y−1 − y−2 )θ(y− − y−1 )e−ixLp
2 − θ(−y−2 )θ(y−2 − y−1 )e−ixLp
+(y−−y
)] . (42)
Note that the phase factors of the last two terms in the above equation give identi-
cal contributions to the matrix elements when intergated over y−1 and y
2 as they dif-
fer only by the substitution y−2 ↔ y−1 − y−. One therefore can combine them with
θ(−y−2 )θ(y− − y−1 )e−ixLp
+(y−−y
) to form another contact contribution (path-ordered)
which can be neglected. The final effective phase factor is then
I11 = e
ixp+y−−ixLp
)(1− eixLp+y
2 ) . (43)
Using the above effective phase factor, one can obtain the effective modification to the
quark fragmentation function due to quark-antiquark scattering, qq̄i → qq̄i,
(qq̄i→qq̄i)
q→h (zh) =
α2sxB
q̄i 6=q̄
Dq→h(zh/z)
1 + z2
(1− z)2
+ Dg→h(zh/z)
1 + (1− z)2
A(HI)
qq̄i (x, xL)
fAq (x)
+ [Dq̄i→h(zh/z))−Dg→h(zh/z)]
1 + (1− z)2
A(HS)
qq̄i (x, xL)
fAq (x)
α2sxB
q̄i 6=q̄
Dq→h(zh/z)
1 + z2
(1− z)2
+ Dq̄i→h(zh/z)
1 + (1− z)2
A(HI)
qq̄i (x, xL)
fAq (x)
+ [Dq̄i→h(zh/z)−Dg→h(zh/z)]
1 + (1− z)2
A(SI)
qq̄i (x, xL)
fAq (x)
 , (44)
where three types of two-parton correlations are defined:
A(HI)
qq̄i (x, xL)≡T
qq̄i (x, xL)− T
qq̄i (x, xL)
p+dy−
dy−1 dy
ixp+y−−ixLp
)(1− eixLp+y
×〈A|ψ̄q(0)
−)ψ̄qi(y
ψqi(y
2 )|A〉θ(−y−2 )θ(y− − y−1 ) , (45)
A(SI)
qq̄i (x, xL)≡T
qq̄i (x, xL)− T
qq̄i (x, xL)
p+dy−
dy−1 dy
i(x+xL)p
+y−(1− e−ixLp+y
×〈A|ψ̄q(0)
−)ψ̄qi(y
ψqi(y
2 )|A〉θ(−y−2 )θ(y− − y−1 ) , (46)
A(HS)
qq̄i (x, xL)≡T
A(HI)
qq̄i (x, xL) + T
A(SI)
qq̄i (x, xL)
p+dy−
dy−1 dy
i(x+xL)p
+y−(1− e−ixLp+y
× (1− e−ixLp+(y−−y
))θ(−y−2 )θ(y− − y−1 )
×〈A|ψ̄q(0)
−)ψ̄qi(y
ψqi(y
2 )|A〉 . (47)
One can similarly obtain the modification of quark fragmentation from non-identical
quark-quark scattering by replacing q̄i → qi in Eq. (44),
(qqi→qqi)
q→h (zh) =
α2sxB
qi 6=q
Dq→h(zh/z)
1 + z2
(1− z)2
+ Dqi→h(zh/z)
1 + (1− z)2
TA(HI)qqi (x, xL)
fAq (x)
+ [Dqi→h(zh/z)−Dg→h(zh/z)]
1 + (1− z)2
TA(SI)qqi (x, xL)
fAq (x)
. (48)
The two-quark correlations, TA(HI)qqi (x, xL) and T
A(SI)
(x, xL) can be obtained from T
A(HI)
qq̄i (x, xL)
and T
A(SI)
qq̄i (x, xL), respectively, by making the replacements ψqi(y2) → ψ̄qi(y2) and ψ̄qi(y1) →
ψqi(y1) in Eqs. (45) and (46),
TA(HI)qqi (x, xL)≡
p+dy−
dy−1 dy
ixp+y−−ixLp
)(1− eixLp+y
×〈A|ψ̄q(0)
−)ψ̄qi(y
ψqi(y
1 )|A〉θ(−y−2 )θ(y− − y−1 ) , (49)
TA(SI)qqi (x, xL) =
p+dy−
dy−1 dy
i(x+xL)p
+y−(1− e−ixLp+y
×〈A|ψ̄q(0)
−)ψ̄qi(y
ψqi(y
1 )|A〉θ(−y−2 )θ(y− − y−1 ) , (50)
and TA(HS)qqi (x, xL) = T
A(HI)
(x, xL) + T
A(SI)
(x, xL).
Note that the contribution from fragmentation of quark qi or antiquark q̄i only comes from
the central-cut diagram. This contribution is positive and is proportional to T
A(HI)
qq̄i (x, xL)+
A(SI)
qq̄i (x, xL), containing all four terms: hard-soft, double-hard and both interference
terms . The gluon fragmentation comes only from the single-triple interferences (left and
right-cut diagrams). Its contribution is therefore negative and partially cancels the pro-
duction of qi(q̄i) from the hard-soft rescattering. The cancellation is not complete since
the gluon and quark fragmentation functions are different. The structure of this hard-soft
rescattering (quark plus gluon) is very similar to the lowest order result of qq̄ → g in
Eq. (35). It contributes to the modification of the effective fragmentation function but
does not contribute to the energy loss. The energy loss of the leading quark comes only
from double-hard rescattering, since the leading quark fragmentation comes both from
the central-cut and single-triple interferences, and the single-triple interference terms can-
cel the effect of hard-soft scattering for the leading fragmentation. Its net contribution is
therefore proportional to T
A(HI)
qq̄i(q̄i)
. Since the double-hard rescattering amounts to elastic
qqi(q̄i) scattering, the effective energy loss is essentially elastic energy loss as shown in
Ref. [34]. There is, however, LPM suppression due to partial cancellation by single-triple
interference contributions. For long formation time, 1/xLp
+ ≫ RA, the cancellation is
complete. Therefore, LPM interference effectively imposes the lower limit xL ≥ 1/p+RA
on the fractional momentum carried by the second quark (antiquark).
4.4 qq → qq scattering
For identical quark-quark scattering, qq → qq, one has to include both t and u-channels,
their interferences, and the related single-triple interference contributions. Using the same
technique to identify and neglect the contact contributions, one can find the correspond-
ing modification to the quark fragmentation function from Eqs. (48) and (A-45)-(A-49),
(qq→qq)
q→h (zh) =
α2sxB
TA(HS)qq (x, xL)
fAq (x)
× [Dq→h(zh/z))−Dg→h(zh/z)]
1 + (1− z)2
z(1 − z)
Dq→h(zh/z)
1 + z2
(1− z)2
z(1 − z)
+Dg→h(zh/z)
1 + (1− z)2
z(1 − z)
TA(HI)qq (x, xL)
fAq (x)
α2sxB
TA(SI)qq (x, xL)
fAq (x)
P (s)qq→qq(z)[Dq→h(zh/z)
−Dg→h(zh/z)] +Dq→h(zh/z)Pqq→qq(z)
TA(HI)qq (x, xL)
fAq (x)
, (51)
where the effective splitting functions are defined as
P (s)qq→qq(z) =
1 + (1− z)2
z(1 − z)
, (52)
Pqq→qq(z) =
1 + (1− z)2
1 + z2
(1− z)2
z(1− z)
. (53)
4.5 qq̄ → qq̄, gg annihilation
The most complicated twist-four processes involving four quark field operators are quark-
antiquark annihilation into two gluons or an identical quark-antiquark pair. We have to
consider them together since they have similar single-triple interference processes and
they involve the same kind of quark-antiquark correlation matrix elements, T
qq̄ (x, xL),
(i = HI, SI,HS).
For notation purpose, we first factor out the common factor (CF/Nc)α
sxB/Q
2/fAq (x) and
the integration over ℓT and z and define
(qq̄→gg,qq̄)
q→h (zh) ≡
α2sxB
Q2fAq (x)
(qq̄→gg,qq̄)
q→h (zh, z, x, xL). (54)
After rearranging the phase factors and identifying (by combining central, left and right
cut diagrams) and neglecting contact contributions we can list in the following the twist-
four corrections to the quark fragmentation from the hard partonic parts of each cut
diagram (see Appendix A):
Fig. 5 (t-channel qq̄ → gg),
(qq̄→gg,qq̄)
q→h(5) =Dg→h(zh/z)2CF
1 + (1− z)2
z(1 − z)
1 + z2
z(1− z)
A(HI)
qq̄ (x, xL)
+ [Dg→h(zh/z)−Dq→h(zh/z)] 2CF
1 + z2
z(1− z)
A(SI)
qq̄ (x, xL) ; (55)
Fig. 6 (interference between u and t-channel of qq̄ → gg),
(qq̄→gg,qq̄)
q→h(6) =Dg→h(zh/z)
−4(CF − CA/2)
z(1 − z)
A(HI)
qq̄ (x, xL)
+ [Dg→h(zh/z)−Dq→h(zh/z)]
−2(CF − CA/2)
z(1 − z)
A(SI)
qq̄ (x, xL) ; (56)
Fig. 7 (s-channel of qq̄ → gg),
(qq̄→gg,qq̄)
q→h(7) = Dg→h(zh/z)4CA
(1− z + z2)2
z(1 − z)
qq̄ (x, xL) ; (57)
Figs. 8 and 9 (interference of s and t-channel qq̄ → gg),
(qq̄→gg,qq̄)
q→h(8+9) =Dg→h(zh/z)(−2CA)
1 + z3
z(1− z)
1 + (1− z)3
z(1 − z)
×TA(HI)qq̄ (x, xL) + CA
Dq→h(zh/z)
1 + z3
z(1 − z)
+Dg→h(zh/z)
1 + (1− z)3
z(1 − z)
× [TA(I2)qq̄ (x, xL)− T
qq̄ (x, xL)] ; (58)
Fig. 10 (s-channel of qq̄ → qq̄),
(qq̄→gg,qq̄)
q→h(10) = [Dq→h(zh/z) +Dq̄→h(zh/z)] [z
2 + (1− z)2]
×TA(H)qq̄ (x, xL) , (59)
Fig. 11 (t-channel of qq̄ → qq̄), similar to Eq. (44),
(qq̄→gg,qq̄)
q→h(11) =Dq→h(zh/z)
1 + z2
(1 − z)2
A(HI)
qq̄ (x, xL)
+Dq̄→h(zh/z)
1 + (1− z)2
A(HS)
qq̄ (x, xL)
−Dg→h(zh/z)
1 + (1− z)2
A(SI)
qq̄ (x, xL) ; (60)
Figs. 12 and 13 (interference between s and t-channel qq̄ → qq̄),
(qq̄→gg,qq̄)
q→h(12+13) =−4(CF − CA/2)
Dq→h(zh/z)
1 − z
+ Dq̄→h(zh/z)
(1− z)2
A(HI)
qq̄ (x, xL)
+2(CF − CA/2)
Dq→h(zh/z)
+Dg→h(zh/z)
(1− z)2
× [TA(I2)qq̄ (x, xL)− T
qq̄ (x, xL)] ; (61)
Figs. 15 and 16 (two additional single-triple interference diagrams),
(qq̄→gg,qq̄)
q→h(15+16) =−2CF
Dq→h(zh/z)
1 + z2
+ Dg→h(zh/z)
1 + (1− z)2
A(I2)
qq̄ (x, xL) . (62)
Most processes involve both TA(HI)(x, xL) for double-hard rescattering with interference
and TA(SI)(x, xL) for hard-soft rescattering with interference. All the s-channel (Figs. 7
and 10) processes involve double-hard scattering only. Therefore, they depend only on the
qq̄ (x, xL) = T
A(HI)
qq̄ (x, xL) + T
qq̄ (x, xL). For interference between single and triple
scattering (left and right-cut diagrams in Figs. 8, 9, 12 13, 15 and 16), where a hard
rescattering with the second quark (antiquark) follows a soft rescattering with the third
antiquark (quark), only interference matrix elements, T
qq̄ (x, xL) and T
A(I2)
qq̄ (x, xL), are
involved. Here,
A(I2)
qq̄ (x, xL)≡
p+dy−
dy−1 dy
ixp+y−+ixLp
×〈A|ψ̄q(0)
−)ψ̄q(y
2 )|A〉θ(−y−2 )θ(y− − y−1 )
p+dy−
dy−1 dy
ixp+y−+ixLp
+(y−−y
×〈A|ψ̄q(0)
−)ψ̄q(y
2 )|A〉θ(−y−2 )θ(y− − y−1 ) , (63)
is a new type of interference matrix elements that is only associated with this type of
single-triple interference processes. One can categorize the above contributions according
to the associated two-quark correlation matrix elements and rewrite the above contribu-
tions as,
qq̄→qq̄,gg
q→h(HI) = T
A(HI)
qq̄ (x, xL)[Dg→h(zh/z)Pqq̄→gg(z) +Dq→h(zh/z)Pqq̄→qq̄(z)
+Dq̄→h(zh/z)Pqq̄→qq̄(1− z)] (64)
qq̄→qq̄,gg
q→h(SI) = T
A(SI)
qq̄ (x, xL)
z(1 − z)
+ 2CF
1 + (1− z)2
×Dg→h(zh/z)−Dq→h(zh/z)
z(1− z)
+ 2CF
+ Dq̄→h(zh/z)
1 + (1− z)2
A(SI)
qq̄ (x, xL)
[Dq→h(zh/z)−Dg→h(zh/z)]
P (s)qq→qq(z)
− 2CF
1 + z2
z(1− z)
+ [Dq̄→h(zh/z)−Dq→h(zh/z)]
1 + (1− z)2
qq̄→qq̄,gg
q→h(I) = T
qq̄ (x, xL)
4(1− z + z2)2 − 1
z(1 − z)
− 2CF
(1− z)2
×Dg→h(zh/z) + [z2 + (1− z)2]Dq̄→h(zh/z)]
+ Dq→h(zh/z)
z2 + (1− z)2 −
z(1 − z)
− 2CF
A(I2)
qq̄ (x, xL)
Dq→h(zh/z)
z(1 − z)
− 2CF
+ Dg→h(zh/z)
z(1 − z)
− 2CF
, (66)
where P (s)qq→qq(z) is given in Eq. (52) and the effective splitting functions for qq̄ → gg and
qq̄ → qq̄ are defined as
Pqq̄→gg(z) = 2CF
z2 + (1− z)2
z(1− z)
− 2CA[z2 + (1− z)2] ; (67)
Pqq̄→qq̄(z) = z
2 + (1− z)2 +
1 + z2
(1− z)2
, (68)
which come from the complete matrix elements of qq̄ → gg and qq̄ → qq̄ scattering (see
Appendix A-3). Again, double-hard rescattering corresponds to the elastic scattering of
the leading quark with another antiquark in the medium and the interference contribu-
tions. The structure of the hard-soft rescattering contribution we identify above shows
the same kind of gluon-quark (or quark-antiquark) mixing in the fragmentation functions
and does not contribute to the energy loss of the leading quark. The unique contributions
in the qq̄ → qq̄, gg processes are the interference-only contributions. They mainly come
from single-triple interference processes in the multiple parton scattering.
5 Modification due to quark-gluon mixing
We have so far cast the modification of the quark fragmentation function due to quark-
quark (antiquark) scattering (or annihilation) in a form similar to the DGLAP evolution
equation in vacuum. In fact, one can also view the evolution of fragmentation functions in
vacuum as modification due to final-state gluon radiation. In both cases, the modification
at large zh is mainly determined by the singular behavior of the splitting functions for
z → 1, whereas the modifications at mall zh is dominated by the singular behavior of the
splitting function for z → 0.
Let us first focus on the modification at large zh. A careful examination of the contribu-
tions from all possible processes shows that the dominant modification to the effective
quark fragmentation function comes from the t-channel of double hard quark-quark scat-
tering processes,
∆Dq→h(zh)∼
α2sxB
Dq→h(
TA(HI)qqi (x, xL)
fAq (x)
× 1 + z
(1 − z)2+
+δ(1− z)∆qi(ℓ2T )
Dq→h(
A(HI)
(x, xL)
fAq (x)
×z(1 + z
(1 − z)+
+ δ(1− z)∆qi(ℓ2T )
, (69)
where the summation is over all possible quark and antiquark flavors including qi = q, q̄
and ∆qi(ℓ
T ) represents the contribution from virtual corrections. We have expressed the
modification in a form that it is proportional to the matrix elements xLT
A(HI)
(x, xL)/f
q (x) ∼
A1/3xLf
(xL) as compared to the modification from quark-gluon scattering where the
corresponding matrix element [Eq. (9)] is TA(HI)qg (x, xL)/f
q (x) ∼ A1/3xLGN(xL). Here,
fNqi (x) and G
N(x) are quark and gluon distributions, respectively, in a nucleon. This
leading contribution to the modification from quark-quark scattering is very similar in
form to that from quark-gluon scattering [see Eq. (7)]. However, it is smaller due to the
different color factors CF/CA = 4/9 and the different quark and gluon distributions,
fNqi (xL) and G
N(xL) in a nucleon. Because of LPM intereference, small angle scattering
with long formation time τf = 1/xLp
+ is suppressed, leading to a minimum value of
xL ≥ xA = 1/mNRA = 0.043 for a Kr target. For this value of xL, the ratio
fNqi (xL, Q
GN(xL, Q2)
≥ 1.40/1.85 ∼ 0.75, (70)
at Q2 = 2 GeV2 according the CTEG4HJ parameterization [35]. Therefore, one has to
include the effect of quark-quark scattering for a complete calculation of the total quark
energy loss and medium modification of quark fragmentation functions.
In a weakly coupled and fully equilibrated quark-gluon plasma, quark to gluon number
density ratio is ρq/ρg = nf (3/2)Nc/(N
c − 1) = 9nf/16. An asymptotically energetic
jet in an infinitely large medium actually probes the small x = 〈q2T 〉/2ET regime, where
quark-antiquark pairs and gluons are predominantly generated by thermal gluons through
pQCD evolution. In this ideal scenario one expects Nq/Ng ∼ 1/4CA = 1/12 and therefore
can neglect quark-quark scattering. The modification of quark fragmentation function will
be dominated by quark-gluon rescattering. However, for moderate jet energy E ≈ 20 GeV
and a finite medium L ∼ 5 fm, parton distributions in a quark-gluon plamsa are close to
the thermal distribution. In particular, if quark and gluon production is dominated by
non-perturbative pair production from strong color fields in the initial stage of heavy-ion
collisions [36], the quark to gluon ratio is comparable to the equilibrium value. In this
case, we should take into account the medium modification of the quark fragmentation
functions by quark-quark scattering.
An important double hard process in quark-quark (antiquark) scattering is qq̄ → gg
[Eq. (64)]. In this process, the annihilation converts the initial quark into two final gluons
that subsequently fragment into hadrons. This will lead to suppression of the leading
hadrons not only because of energy loss (energy carried away by the other gluon) but
also due to the softer behavior of gluon fragmentation functions at large zh. Even though
the leading behavior of the effective splitting function [Eq. (67)]
Pqq̄→gg(z) ≈ 2CF
z2 + (1− z)2
z(1− z)
is not as dominating as that of t-channel quark-quark scattering, it is enhanced by a color
factor 2CF = 8/3. One expects this to make a significant contribution to the medium
modification at intermediate zh.
In high-energy heavy-ion collisions, the ratios of initial production rates for valence
quarks, gluons and antiquarks vary with the transverse momentum pT . Gluon production
rate dominates at low pT while the fraction of valence quark jets increases at large pT .
Quarks are more likely to fragment into protons than antiprotons, while gluons fragment
into protons and antiprotons with equal probabilities. Therefore, the ratio of large pT
antiproton and proton yields in p + p collisions is smaller than 1 and decreases with pT
as the fraction of valence quark jets increases. Since gluons are expected to lose more
energy than quark jets, one would naively expect to see the antiproton to proton ratio
p̄/p becomes smaller due to jet quenching. However, if the quark-gluon conversion due
to qq̄ → gg becomes important, one would expect that the fractions of quark and gluon
jets are modified toward their equilibrium values. The final p̄/p ratio could be larger than
or comparable to that in p+ p collisions. Such a scenario of quark-gluon conversion was
recently considered in Ref. [37] via a master rate equation.
The mixing between quark and gluon jets also happens at the lowest order of quark-
antiquark annihilation as shown in Fig. 3. At NLO, all hard-soft quark-quark (antiquark)
scattering processes have this kind of mixing between quark and gluon fragmentation
functions. Their contributions generally have the form,
α2sxB
Dqi→h(
)−Dg→h(
×Pqqi→qqi(z)
TA(SI)qqi (x, xL)
fAq (x)
, (72)
where again the summation over the quark flavor includes qi = q, q̄. This mixing does
not occur on the probability but rather on on the amplitude level since it involves in-
terferences between single and triple scattering. Therefore, this contribution depends on
the difference between gluon and quark fragmentation functions [Eq. (35)] and can be
positive or negative in different region of zh. Nevertheless, they contribute to the mod-
ification of the effective quark fragmentation function and the flavor dependence of the
final hadron spectra.
6 Flavor dependence of the medium modified fragmentation
Summing all contributions to quark-quark (antiquark) double scattering as listed in Sec-
tion 4, we can express the total twist-four correction up to O(α2s) to the quark fragmen-
tation function as
∆Dq→h(zh)=
2 [Dg→h(zh)−Dq→h(zh)]
qq̄ (x, 0)
fAq (x)
a,b,i
Db→h(zh/z)P
qa→b(z)
TA(i)qa (x, xL)
fAq (x)
 , (73)
where the summation is over all possible q+a→ b+X processes and all different matrix
elements TA(i)qa (x, xL) (i = HI, SI, I, I2), which will be four basic matrix elements we will
use. The effective splitting functions P
qa→b(z) are listed in Appendix A-2. One should also
include virtual corrections which can be constructed from the real corrections through
unitarity constraints [18].
Similarly, we can also write down the twist-four corrections to antiquark fragmentation
in a nuclear medium,
∆Dq̄→h(zh)=
2 [Dg→h(zh)−Dq̄→h(zh)]
q̄q (x, 0)
fAq̄ (x)
a,b,i
Db→h(zh/z)P
q̄a→b(z)
q̄a (x, xL)
fAq̄ (x)
 , (74)
where the matrix elements T
q̄a (x, xL) and the effective splitting functions P
q̄a→b(z) can
be obtained from the corresponding ones for quarks. Given a model for the two-quark
correlation functions, one will be able to use the above expressions to numerically evaluate
twist-four corrections to the quark (antiquark) fragmentation functions. In this paper,
we will instead give a qualitative estimate of the flavor dependence of the correction in
DIS off a large nucleus.
For the purpose of a qualitative estimate, one can assume that all the twist-four two-quark
correlation functions can be factorized, as has been done in Refs. [18,19,23,33],
p+dy−
dy−1 dy
+y−+ix2p
)θ(−y−2 )θ(y− − y−1 )
×〈A|ψ̄q(0)
−)ψ̄qi(y
ψqi(y
2 )|A〉
fAq (x1) f
(x2) , (75)
p+dy−
dy−1 dy
+y−+ix2p
)±ixLp
2 θ(−y−2 )θ(y− − y−1 )
×〈A|ψ̄q(0)
ψq̄(y
−)ψ̄qi(y
ψqi(y
2 )|A〉
fAq (x1) f
(x2)e
A , (76)
where xA = 1/mNRA, mN is the nucleon mass, RA the nucleus size, f
(x2) is the
antiquark distribution in a nucleon and C is assumed to be a constant, parameterizing
the strength of two-parton correlations inside a nucleus. The integration over the position
of the antiquark (y−1 +y
2 )/2 in the twist-four two-quark correlation matrix elements gives
rise to the nuclear enhancement factor 1/xA = mNRA = 0.21A
We should note that we set kT = 0 for the collinear expansion. As a consequence, the
secondary quark field in the twist-four parton matrix elements will carry zero momentum
in the soft-hard process. Finite intrinsic transverse momentum leads to higher-twist cor-
rections. If a subset of the higher-twist terms in the collinear expansion can be resummed
to restore the phase factors such as eixT p
+y−, where xT ≡ 〈k2T 〉/2p+q−z(1 − z), the soft
quark fields in the parton matrix elements will carry a finite fractional momentum xT .
Under such an assumption of factorization, one can obtain all the two-quark correlation
matrix elements:
A(HI)
qq̄i (x, xL)≈
fAq (x) f
(xL + xT )[1− e−x
A], (77)
A(SI)
qq̄i (x, xL)≈
fAq (x+ xL) f
(xT )[1− e−x
A], (78)
qq̄i (x, xL)≈T
A(I2)
qq̄i (x, xL) ≈
fAq (x+ xL) f
(xT )e
fAq (x) f
(xL + xT )e
A. (79)
In the last approximation, we have assumed xL ∼ xT ≪ x. Similarly, one can obtain
TA(i)qqi (x, xL), T
q̄qi (x, xL) and T
q̄q̄i (x, xL). With these forms of two-quark correlation
matrix elements, we can estimate the flavor dependence of the nuclear modification to
the quark (antiquark) fragmentation functions.
The lowest order corrections [O(αs)] are very simple
q→h (zh) ∝ CA1/3[Dg→h(zh)−Dq→h(zh)]fNq̄ (xT ) , (80)
q̄→h̄
(zh) ∝ CA1/3[Dg→h̄(zh)−Dq̄→h̄(zh)]fNq (xT ) . (81)
We consider the dominant contribution from the fragmentation of a quark (antiquark)
which is one of the valence quarks (antiquarks) of the final particle h (antiparticle h̄).
The gluon fragmentation functions into h and h̄ are the same. For large zh, the gluon
fragmentation function is always softer than the valence quark (antiquark) fragmenta-
tion [38]. Therefore, the lowest order twist-four corrections are always negative for large
zh, leading to a suppression of the valence quark (antiquark) fragmentation function,
Dqv→h(zh) [Dq̄v→h̄(zh)]. Consider those quarks that are also valence quarks of a nucleon:
n = udd
p = uud , p̄ = ūūd̄ , (82)
K+ = us̄ ,K− = ūs . (83)
π+, π0, π− = ud̄ , (uū − dd̄ )/
2 , dū . (84)
One can find the following flavor dependence of the lowest order twist-four corrections
to the quark (antiquark) fragmentation functions,
q̄v→h̄
−|∆D(LO)
q̄v→h̄
(zh)|
−|∆D(LO)qv→h(zh)|
fNqv (xT )
fq̄v(xT )
> 1 , (85)
q̄v→h̄
1 + ∆D
q̄v→h̄
(zh)/Dq̄→h̄(zh)
1 + ∆D
(zh)/Dq→h(zh)
< 1 , (86)
where R
is the corresponding leading order suppression of the fragmentation function
at large zh for proton (anti-proton) and K
+ (K−). Since pions contain both valence
quark and antiquark, the suppression factors should be similar for all pions. For xT ≥
0.043, u(x)/ū(x) ≥ 3 and d(x)/d̄(x) ≥ 2 [35]. Therefore, the modification of antiquark
fragmentation functions due to quark-antiquark annihilation is significantly larger than
that of a quark.
The flavor dependence of the NLO results are more complicated since they involve scatter-
ing with both quarks and antiquarks in the medium. One can observe first that effective
splitting functions (or quark-quark scattering cross section) are the same for the t-channel
qq′ → qq′ and qq̄′ → qq̄′ (q′ 6= q) scatterings,
qq′→b(z) = P
q̄q′→b(z) = P
qq̄′→b(z) = P
q̄q̄′→b(z) . (87)
For identical quark-quark scattering or quark-antiquark annihilation, one can separate
the qq̄ annihilation splitting functions (or cross sections) into singlet and non-singlet
contributions by singling out the t-channel contributions,
qq̄→b(z)≡P
qq→b(z) + ∆P
qq̄→b(z), (88)
q̄q→b(z)≡P
q̄q̄→b(z) + ∆P
q̄q→b(z). (89)
These singlet contributions to the modified fragmentation functions are,
S(NLO)
q→h (zh)∝
b,q′,i
Db→h ⊗ P (i)qq′→b(zh)
×[fNq′ (xT ) + fNq̄′ (xT )]C(i) , (90)
S(NLO)
q̄→h̄
(zh)∝
b,q′,i
Db̄→h̄ ⊗ P
q̄q̄′→b̄
×[fNq′ (xT ) + fNq̄′ (xT )]C(i) , (91)
where the summation over q′ now includes q′=q and C(i)(xL) are flavor-independent
functions determined from Eqs. (77)-(79),
C(HI) =C(SI) = C(xL)(1− e−x
C(I) =C(I2) = C(xL)e
A , (92)
and C(xL) is a common coefficient that is a function of xL. Using P
q̄q̄→b̄
(z) = P
qq→b(z)
, one can conclude that the singlet contributions to the modified quark and antiquark
fragmentation functions are the same, ∆D
S(NLO)
q→h (zh) = ∆D
S(NLO)
q̄→h̄
(zh).
The non-singlet contributions, mainly from s-channel and s-t interferences, are,
N(NLO)
q→h (zh)∝
Db→h ⊗∆PN(i)qq̄→b(zh)fNq̄ (xT )C(i) , (93)
N(NLO)
q̄→h̄
(zh)∝
Db̄→h̄ ⊗∆P
q̄q→b̄
(zh)f
q (xT )C
(i) , (94)
where again ∆P
qq̄→b(z) = ∆P
q̄q→b̄
(z) due to crossing symmetry. We have listed all non-
vanishing nonsinglet splitting functions ∆P
qq̄→b(z) in Appendix A-2.
We again consider the limit zh → 1. In this region the convolution in the modified
fragmentation function is dominated by the large z → 1 behavior of the effective split-
ting functions. From the listed ∆P
qq̄→b(z) in Appendix A-2, we can obtain the leading
contributions,
C(i)∆P
qq̄→q(z)≈−4CF
C(xL)
C(i)∆P
qq̄→g(z)≈ 2
2CF + CF (1− e−x
A) + CAe
] C(xL)
, (95)
where we have also neglected terms proportional to 1/Nc. All ∆P
qq̄→q̄(z) are non-leading
in the limit z → 1 and therefore can be neglected. With these leading contributions,
the non-singlet modification to the quark and antiquark fragmentation functions can be
estimated as
N(NLO)
q→h (zh)∝
C(xL)
(1− z)+
CF (1− e−x
A) + CAe
+ δ(1− z)∆1(ℓT )
−Dq→h
C(xL)
(1− z)+
+ δ(1− z)∆2(ℓT )
fNq̄ (xT ) , (96)
N(NLO)
q̄→h̄
(zh)∝
Dg→h̄
C(xL)
(1− z)+
CF (1− e−x
A) + CAe
+ δ(1− z)∆1(ℓT )
Dg→h̄
−Dq̄→h̄
C(xL)
(1− z)+
+ δ(1− z)∆2(ℓT )
fNq (xT ) , (97)
where ∆1(ℓT ) and ∆2(ℓT ) are from virtual corrections,
∆1(ℓT )=
CFC(xL)|z=1 − [CF (1− e−x
+ CAe
A ]C(xL)
, (98)
∆2(ℓT )=
2CF [C(xL)|z=1 − C(xL)] . (99)
Because of momentum conservation, C(xL) = 0 when xL → ∞ for z = 1. Therefore,
the above virtual corrections are always negative. At large zh, these virtual corrections
dominate over the real ones.
There are two kinds of non-singlet contributions in the expressions given above. One that
is proportional to gluon fragmentation functions is due to quark-antiquark annihilation
into gluons which then fragment. The fragmenting gluon not only carries less energy than
the initial quark but also has a softer fragmentation function, leading to suppression of
the final leading hadrons. The second type of contributions is proportional to Dg→h(zh)−
Dq→h(zh) and therefore mixes quark and gluon fragmentation functions, similarly as the
lowest order quark-antiquark annihilation processes [see Eqs. (80) and (81)]. Since a gluon
fragmentation function is softer than a quark one, the real corrections from this type of
processes are positive for small zh and negative for large zh. The virtual corrections
have just the opposite behavior. Therefore, the second type of contributions will reduce
the total net modification. For intermediate values of zh where 2Dg→h(zh) > Dq→h(zh),
the net effect is still the suppression of the effective fragmentation functions for leading
hadrons.
Since fNq (xT ) > f
q̄ (xT ), we can conclude that the LO and NLO combined non-singlet
suppression for antiquark fragmentation into valence hadrons is larger than that for quark
fragmentation into valence hadrons. This qualitatively explains the flavor dependence of
nuclear suppression of leading hadrons in DIS off heavy nuclear targets as measured by
the HERMES experiment [25,26]. The ratio of differential semi-inclusive cross sections
for nucleus and deuteron targets were used to study the nuclear suppression of the frag-
mentation functions. It was observed that suppression of leading anti-proton is stronger
than for leading proton and K− suppression is stronger than K+. In the valence quark
fragmentation picture, the leading proton (K+) is produced mainly from u, d (u) quark
fragmentation while anti-protons come primarily from ū, d̄ (ū) fragmentation. Therefore,
HERMES data are consistent with stronger suppression of antiquark fragmentation.
Since gluon bremsstrahlung and the singlet qqi(q̄i) scattering also suppress quark and
antiquark fragmentation, but independently of quark flavor, one has to include all the
processes in order to have a complete and quantitative numerical evaluation of the flavor
dependence of the nuclear modification of the quark fragmentation functions. Further-
more, the NLO contributions are proportional to αs ln(Q
2)/2π. They are as important
as the lowest order correction for large values of Q2. In principle, one should resum these
higher order corrections via solving a set of coupled DGLAP evolution equations, in-
cluding medium modification for gluon fragmentation functions. The contributions from
quark-quark (antiquark) scattering derived in this paper will be an important part of the
complete dscription. Detailed numerical study of the effect of quark-quark (antiquark)
scattering will be possible only after the completion of this complete description in the
future.
7 Summary
Utilizing the generalized factorization framework for twist-four processes we have stud-
ied the nuclear modification of quark and antiquark fragmentation functions (FF) due
to quark-quark (antiquark) double scattering in dense nuclear matter up to order O(α2s).
We calculated and analyzed the complete set of all possible cut diagrams. The results
can be categorized into contributions from double-hard, hard-soft processes and their
interferences. The double-hard rescatterings correspond to elastic scattering of the lead-
ing quark with another medium quark. It requires the second quark to carry a finite
fractional momentum xL. Therefore, the energy loss of the leading quark through such
processes can be identified as elastic energy loss at order O(α2s). The quark energy loss
and modification of quark fragmentation functions are dominated by the t-channel of
quark-quark (antiquark) scattering and are shown to be similar to that caused by quark-
gluon scattering. The contribution from quark-quark scattering is smaller than that from
quark-gluon scattering by a factor of CF/CA times the ratio of quark and gluon distribu-
tion functions in the medium. We have shown that such contributions are not negligible
for realistic kinematics and finite medium size. The soft-hard rescatterings mix gluon and
quark scattering, in the same way as the lowest order qq̄ → g processes. Such processes
modifies the final hadron spectra or effective fragmentation functions but do not con-
tribute to energy loss of the leading quark. For qq̄ → qq̄, gg processes, there also exist
pure interference contributions mainly coming from single-triple-scattering interference.
With a simple model of a factorized two-quark correlation functions, we further investi-
gated the flavor dependence of the medium modified quark fragmentation functions in a
large nucleus. We identified the flavor dependent part of the modification and find that
the nuclear modification for an antiquark fragmentation into a valence hadron is larger
than that of a quark. This offers an qualitative explanation for the flavor dependence of
the leading hadron suppression in semi-inclusive DIS off nuclear targets as observed by
the HERMES experiment [25,26].
Acknowledgements
The authors thank Jian-Wei Qiu and Enke Wang for helpful discussion. This work
was supported by NSFC under project No. 10405011, by MOE of China under project
IRT0624, by Alexander von Humboldt Foundation, by BMBF, by the Director, Office
of Energy Research, Office of High Energy and Nuclear Physics, Divisions of Nuclear
Physics, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231,
and by the US NSF under Grant No. PHY-0457265, the Welch Foundation under Grant
No. A-1358.
A-1 Hard partonic parts for quark-quark double scattering
In Section 3 we have discussed the calculation of the hard part of one example cut-diagram
(Fig. 5) in detail. In this appendix we list the results for all possible real corrections to
quark-quark (antiquark) double scattering in the next-to-leading order O(α2s). There are
a total of 12 diagrams as illustrated in Figs. 5-16. For the purpose of abbreviation, we
will suppress the variables in the notations of partonic hard parts
D ≡ HD(y−, y−1 , y−2 , x, p, q, zh) , (A-1)
and phase factor functions
I ≡ I(y−, y−1 , y−2 , x, , xL, p) . (A-2)
We first consider all qq̄ → gg annihilation diagrams with different possible cuts. The
contributions of Fig. 5 are:
5,C =
α2sxB
I5,CDg→h(zh/z)
1 + z2
z(1 − z)
1 + (1− z)2
z(1 − z)
, (A-3)
Fig. 5. The t-channel of qq̄ → gg annihilation diagram with three possible cuts, central(C),
left(L) and right(R).
Fig. 6. The interference between t and u-channel of qq̄ → gg annihilation.
I5,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p
×(1− e−ixLp+y
2 )(1− e−ixLp+(y−−y
)) , (A-4)
5,L(R) =
α2sxB
I5,L(R)
Dq→h(zh/z)2
1 + z2
z(1 − z)
+Dg→h(zh/z)2
1 + (1− z)2
z(1 − z)
, (A-5)
I5,L =−θ(y−1 − y−2 )θ(y− − y−1 )ei(x+xL)p
+y−(1− e−ixLp+(y−−y
)) , (A-6)
I5,R =−θ(−y−2 )θ(y−2 − y−1 )ei(x+xL)p
+y−(1− e−ixLp+y
2 ) . (A-7)
Here we have included the fragmentation of both final-state partons.
The contributions from Fig. 6 are:
Fig. 7. The s-channel of qq̄ → gg annihilation diagram with only a central-cut.
6,C =
α2sxB
I6,C 2Dg→h(zh/z)
(1− z)z
CF (CF − CA/2)
, (A-8)
6,L(R) =
α2sxB
I6,L(R) [Dg→h(zh/z) +Dq→h(zh/z)]
(1− z)z
CF (CF − CA/2)
, (A-9)
I6,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p
× (1− e−ixLp+y
2 )(1− e−ixLp+(y−−y
)) , (A-10)
I6,L =−θ(y−1 − y−2 )θ(y− − y−1 )ei(x+xL)p
+y−(1− e−ixLp+(y−−y
)) . (A-11)
I6,R =−θ(−y−2 )θ(y−2 − y−1 )ei(x+xL)p
+y−(1− e−ixLp+y
2 ) , (A-12)
Note that the central-cut diagram in Fig. 6 corresponds to the interference between t and
u-channel of the qq̄ → gg annihilation processes in Fig. 5. Since the splitting function is
symmetric in z and 1 − z, a factor of 2 comes from the fragmentation of both gluons in
the central-cut diagram.
The s-channel of qq̄ → gg is shown in Fig. 7 which has only one central-cut. Its contri-
bution to the partonic hard part is,
7,C =
α2sxB
I7,C 2Dg→h(zh/z)
2(z2 − z + 1)2
z(1 − z)
, (A-13)
I7,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p
+y−e−ixLp
+(y−−y−
)e−ixLp
2 . (A-14)
Note that the splitting function 2(z2 − z + 1)2/z(1 − z) = 2[1 − z(1 − z)]2/z(1 − z) is
symmetric in z and 1− z. Therefore, fragmentation of the two final gluons gives rise to
the factor of 2 in front of the gluon fragmentation function.
Fig. 8. The interference between t and s-channel of qq̄ → gg annihilation.
Fig. 9. The complex conjugate of Fig. 8.
The interferences between t and s-channel of qq̄ → gg processes are shown in Figs. 8 and
9. There are only two possible cuts in these diagrams. The contributions from Fig. 8 are:
8,C =
α2sxB
I8,C Dg→h(zh/z)
1 + z3
z(1 − z)
1 + (1− z)3
z(1 − z)
, (A-15)
α2sxB
Dq→h(zh/z)2
1 + z3
z(1− z)
+ Dg→h(zh/z)2
1 + (1− z)3
z(1 − z)
, (A-16)
I8,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p
×(1− e−ixLp+y
2 )e−ixLp
+(y−−y−
) , (A-17)
I8,L = θ(y
1 − y−2 )θ(y− − y−1 )ei(x+xL)p
×(e−ixLp+(y−−y
) − e−ixLp+(y−−y
)) . (A-18)
Contributions from Fig. 9, which are just the complex conjugate of Fig. 8, are:
9,C =
α2sxB
I9,C Dg→h(zh/z)
1 + z3
z(1 − z)
1 + (1− z)3
z(1 − z)
, (A-19)
9,R =
α2sxB
Dq→h(zh/z)2
1 + z3
z(1 − z)
+ Dg→h(zh/z)2
1 + (1− z)3
z(1 − z)
, (A-20)
I9,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p
×(1− e−ixLp+(y−−y
))e−ixLp
2 , (A-21)
I9,R = θ(−y−2 )θ(y−2 − y−1 )ei(x+xL)p
×(1− e−ixLp+(y
))e−ixLp
1 . (A-22)
One can collect all contributions of the double hard qq̄ → gg processes from the central-
cut diagrams, which should have the common phase factor
ĪC = θ(−y−2 )θ(y− − y−1 )eixp
+y−e−ixLp
) , (A-23)
and obtain the total effective splitting function in the hard partonic part,
Pqq̄→gg(z) =
z(1 − z)
{C2F [1 + z2 + 1 + (1− z)2]− 2CF (CF − CA/2)
+2CFCA(1− z + z2)2 − CFCA[1 + z3 + 1 + (1− z)3]}
z2 + (1− z)2
z(1− z)
− 2CA[z2 + (1− z)2]
. (A-24)
We will find later in Appendix A-3 that the above result can also be obtained from the
total matrix elements squared for qq̄ → gg annihilation.
We now consider the annihilation processes qq̄ → qiq̄i with qi 6= q. There is only the
s-channel process with one central-cut diagram as shown in Fig. 10. Its contribution to
the hard part is
10,C =
α2sxB
I10,C
qi 6=q
[Dqi→h(zh/z) +Dq̄i→h(zh/z)]
Fig. 10. s-channel qq̄ → qiq̄i annihilation.
Fig. 11. t-channel qqi(q̄i) → qqi(q̄i) scattering.
×[z2 + (1− z)2]
, (A-25)
I10,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p
+y−e−ixLp
+(y−−y
)e−ixLp
2 . (A-26)
Here we define the effective splitting function for qq̄ → qiq̄i annihilation as,
Pqq̄→qiq̄i(z) =
[z2 + (1− z)2] . (A-27)
Similarly, for qq̄i → qq̄i scattering with qi 6= q, there is only the t-channel as shown in
Fig. 11. There are, however, three cut diagrams. Their contributions to the partonic hard
part are:
11,C =
α2sxB
I11,C
Dq→h(zh/z)
1 + z2
(1− z)2
+Dq̄i→h(zh/z)
1 + (1− z)2
, (A-28)
11,L(R) =
α2sxB
I11,L(R)
Dq→h(zh/z)
1 + z2
(1− z)2
+Dg→h(zh/z)
1 + (1− z)2
, (A-29)
I11,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p
× (1− e−ixLp+y
2 )(1− e−ixLp+(y−−y
)) , (A-30)
I11,L =−θ(y−1 − y−2 )θ(y− − y−1 )ei(x+xL)p
+y−(1− e−ixLp+(y−−y
)) , (A-31)
I11,R =−θ(−y−2 )θ(y−2 − y−1 )ei(x+xL)p
+y−(1− e−ixLp+y
2 ) . (A-32)
The twist-four two-parton correlation matrix element associated with the above quark-
antiquark scattering is the quark-antiquark correlator,
TAqq̄i(x, xL)∝ e
ixp+y−−ixLp
×〈A|ψ̄q(0)
−)ψ̄qi(y
ψqi(y
2 )|A〉 , (A-33)
and one should sum over all possible qi 6= q flavors. Note that in the above matrix element,
the momentum flow for the antiquark (q̄i) is opposite to that of the quark (q) fields.
For quark-quark scattering, qqi → qqi, the hard part is essentially the same. The only
difference is the associated matrix element for the quark-quark correlator which is ob-
tained from that of the quark-antiquark correlator via the exchange ψqi(y2) → ψ̄qi(y2)
and ψ̄qi(y1) → ψqi(y1),
TAqqi(x, xL)∝ e
ixp+y−+ixLp
×〈A|ψ̄q(0)
−)ψ̄qi(y
ψqi(y
1 )|A〉 . (A-34)
Note that the momentum flows of the two quarks (q and qi) point in the same direction.
The effective splitting function of this scattering process is defined through the fragmen-
tation of the quark in the central-cut diagram,
Pqqi(q̄i)→qqi(q̄i)(z) =
1 + z2
(1− z)2
. (A-35)
For annihilation qq̄ → qq̄ into identical quark and antiquark pairs, in addition to the
s-channel (Fig. 10 for qi = q) and t-channel (Fig. 11 for qi = q̄), one has also to consider
the interference between s and t-channel amplitudes as shown in Figs. 12 and 13, each
having two cuts. Their contributions to the hard partonic parts are, respectively:
Fig. 12. Interference between s and t-channel of qq̄ → qq̄ scattering
Fig. 13. The complex conjugate of Fig. 12.
12,C =
α2sxB
I12,C
Dq→h(zh/z)
(1− z)
+Dq̄→h(zh/z)
2(1− z)2
CF (CF − CA/2)
, (A-36)
12,L =
α2sxB
I12,L
Dq→h(zh/z)
(1− z)
+Dg→h(zh/z)
2(1− z)2
CF (CF − CA/2)
, (A-37)
I12,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p
×(1− e−ixLp+y
2 )e−ixLp
+(y−−y−
) , (A-38)
I12,L = θ(y
1 − y−2 )θ(y− − y−1 )ei(x+xL)p
×(e−ixLp+(y−−y
) − e−ixLp+(y−−y
)) ; (A-39)
Fig. 14. The interference between t and u-channel of identical quark-quark scattering qq → qq.
13,C =
α2sxB
I13,C
Dq→h(zh/z)
(1− z)
+Dq̄→h(zh/z)
2(1− z)2
CF (CF − CA/2)
, (A-40)
13,R =
α2sxB
I13,R
Dq→h(zh/z)
(1− z)
+Dg→h(zh/z)
2(1− z)2
CF (CF − CA/2)
, (A-41)
I13,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p
×(1− e−ixLp+(y−−y
))e−ixLp
2 , (A-42)
I13,R = θ(−y−2 )θ(y−2 − y−1 )ei(x+xL)p
×(e−ixLp+y
1 − e−ixLp+y
2 ) . (A-43)
One can again collect contributions from the central-cut diagrams of the double scattering
processes in Figs. 10, 11 12 and 13 and obtain the total effective splitting function for
qq̄ → qq̄,
Pqq̄→qq̄(z) =
[z2 + (1− z)2] +
1 + z2
(1− z)2
CF (CF − CA/2)
z2 + (1− z)2 +
1 + z2
(1− z)2
. (A-44)
Here we have used CF − CA/2 = −1/2Nc. For antiquark fragmentation, Pqq̄→q̄q(z) =
Pqq̄→qq̄(1 − z). One can also obtain the above result from qq̄ → qq̄ scattering matrix
squared as shown in Appendix A-3.
Similarly, for scattering of identical quarks qq → qq, one should set qi = q in Fig. 11[in
Eq. (A-28)]. In addition, one should also also include interference between t and u-channel
of the scattering as shown in Fig. 14. The contributions from such interference diagram
14,C =
α2sxB
I14,C
× 2Dq→h(zh/z)
z(1 − z)
CF (CF − CA/2)
, (A-45)
14,L(R) =
α2sxB
I14,L(R)
× [Dq→h(zh/z) +Dg→h(zh/z)]
z(1− z)
CF (CF − CA/2)
, (A-46)
I14,C = θ(−y−2 )θ(y− − y−1 )ei(x+xL)p
× (1− e−ixLp+y
2 )(1− e−ixLp+(y−−y
)) , (A-47)
I14,L =−θ(y−1 − y−2 )θ(y− − y−1 )ei(x+xL)p
+y−(1− e−ixLp+(y−−y
)) , (A-48)
I14,R =−θ(−y−2 )θ(y−2 − y−1 )ei(x+xL)p
+y−(1− e−ixLp+y
2 ) . (A-49)
Note again that the fragmentation of both quarks contributes to the factor 2 in Eq. (A-
45) since the splitting function is symmetric in z and 1 − z. The twist-four two-quark
correlation matrix element associated with qq → qq scattering is TAqq(x, xL) as compared
to TAqq̄(x, xL) for quark-antiquark annihilation processes.
We can sum contributions from the double hard scattering in all the central-cut diagrams
in Figs. 11 and 14 and obtain the total effective splitting function for qq → qq processes,
Pqq→qq(z) =
1 + z2
(1− z)2
1 + (1− z)2
CF (CF − CA/2)
z(1 − z)
1 + z2
(1− z)2
1 + (1− z)2
z(1 − z)
. (A-50)
There are two remaining cut diagrams that contribute to the quark-antiquark annihilation
at the order of O(α2s) as shown in Figs. 15 and 16. Their contributions are:
15,L =
α2sxB
I15,L
Dq→h(zh/z)2
1 + z2
+Dg→h(zh/z)2
1 + (1− z)2
, (A-51)
I15,L =−θ(y−1 − y−2 )θ(y− − y−1 )ei(x+xL)p
+y−e−ixLp
+(y−−y
) , (A-52)
Fig. 15. Interference between final-state gluon radiation from single and triple-quark scattering.
Fig. 16. The complex conjugate of Fig. 15.
16,R =
α2sxB
I16,R
Dq→h(zh/z)2
1 + z2
+Dg→h(zh/z)2
1 + (1− z)2
, (A-53)
I16,R =−θ(−y−2 )θ(y−2 − y−1 )ei(x+xL)p
+y−e−ixLp
1 . (A-54)
A-2 Effective splitting functions
In this Appendix, we list the effective splitting functions associated with each process
qa→ b and the double-hard (HI), hard-soft (SI) or their interferences (I, I2) according
to Eq. (73).
qqi(q̄i)→qi(q̄i)
(z) =
1 + (1− z)2
qqi(q̄i)→q
(z) =
1 + z2
(1− z)2
qqi(q̄i)→qi(q̄i)
(z) =
1 + (1− z)2
qqi(q̄i)→g
(z) = −1 + (1− z)
(A-55)
qq̄→qi(z) =P
qq̄→q̄i(z) = z
2 + (1− z)2,
qq̄→qi(z) =P
qq̄→q̄i(z) = z
2 + (1− z)2, (A-56)
P (HI)qq→q(z) =
1 + (1− z)2
1 + z2
(1− z)2
z(1 − z)
P (SI)qq→g(z) =−P (SI)qq→q(z) , (A-57)
P (SI)qq→q(z) =
1 + (1− z)2
z(1 − z)
qq̄→q(z) = z
2 + (1− z)2 + 1 + z
(1− z)2
qq̄→q̄(z) =P
qq̄→q(1− z) ,
qq̄→g(z) = 2CF
z2 + (1− z)2
z(1 − z)
− 2CA[z2 + (1− z)2], (A-58)
qq̄→q(z) =−
z(1 − z)
+ 2CF
qq̄→q̄(z) =
1 + (1− z)2
qq̄→g(z) =
z(1 − z)
+ 2CF
− 1 + (1− z)
(A-59)
qq̄→q(z) = z
2 + (1− z)2 −
z(1 − z)
− 2CF
qq̄→q̄(z) = z
2 + (1− z)2 ,
qq̄→g(z) =CA
4(1− z + z2)2 − 1
z(1 − z)
− 2CF
(1− z)2
, (A-60)
qq̄→q(z) =
z(1 − z)
− 2CF
qq̄→g(z) =
z(1 − z)
− 2CF
. (A-61)
The non-singlet splitting functions for qq̄ → b, defined as
qq̄→b(z) ≡ P
qq̄→b(z)− P
qq→b(z), (A-62)
are listed as below:
N(HI)
qq̄→qi(q̄i)
(z) =P
qq̄→qi(q̄i)
(z), ∆P
qq̄→qi(q̄i)
(z) = P
qq̄→qi(q̄i)
(z), (A-63)
N(HI)
qq̄→q (z) =−
(1− z2)(1 + z2 + (1− z)2)
1 + z3
z(1− z)
N(HI)
qq̄→q̄ (z) =P
qq̄→q̄(z), ∆P
N(HI)
qq̄→g (z) = P
qq̄→g(z), (A-64)
N(SI)
qq̄→q (z) =−
1 + z2
1 + (1− z)2
N(SI)
qq̄→q̄ (z) =P
qq̄→q̄(z)
N(SI)
qq̄→g (z) = 2CF
1 + z2
z(1 − z)
1 + (1− z)2
(A-65)
qq̄→b(z) =P
qq̄→b(z), ∆P
N(I2)
qq̄→b (z) = P
qq̄→b(z) (b = q, q̄, g) (A-66)
A-3 Alternative calculations of central-cut diagrams
As a cross-check of the hard partonic parts calculated from different cut diagrams in
Appendix A-1, we provide an alternative calculation of all the central-cut diagrams,
which correspond to quark-quark (antiquark) scattering.
Considering a parton (a) with momentum q scattering with another parton (b) that
carries a fractional momentum xp, a(q) + b(xp) → c(ℓ) + d(p′), the cross section can be
written as
dσab =
|M |2ab→cd(t̂/ŝ, û/ŝ)
(2π)32ℓ0
2πδ[(p+ q − ℓ)2]
(4π)2
|M |2ab→cd(t̂/ŝ, û/ŝ)
z(1− z)
dℓ2T δ
, (A-67)
where q = [0, q−, 0] and p = [xp+, 0, 0] are momenta of the initial partons and
, zq−, ~ℓT
(A-68)
is the momentum of one of the final partons. With the given kinematics, the on-shell
condition in the cross section can be recast as
(xp + q − ℓ)2 = 2(1− z)xp+q−
1− xL
, xL =
2z(1− z)p+q−
. (A-69)
The Mandelstam variables of the collision are,
ŝ=(q + xp)2 = 2xp+q− =
z(1 − z)
, û = (ℓ− xp)2 = −zŝ,
t̂=(ℓ− q)2 = −(1 − z)
ŝ = −(1− z)ŝ, (A-70)
where we have used the on-shell condition x = xL.
With Eq. (A-67) and parton distribution functions fNb (x), one can obtain the parton-
nucleon cross section,
dσaN =
dσabf
b (x)dx
fNb (xL)xL|M |2ab→cd(t̂/ŝ, û/ŝ)
z(1− z)
fNb (xL)
C0Pab→cd(z)dz
, (A-71)
where s = 2p+q− is the center-of-mass energy for aN collision, C0 is some common color
factor in the scattering matrix elements and
Pab→cd(z) = (1/C0)|M |2ab→cd(t̂/ŝ, û/ŝ) (A-72)
is what we have defined as the effective splitting function for the corresponding processes.
One can therefore easily obtain these effective splitting functions from the corresponding
matrix elements for elementary parton-parton scattering [39]. We will list them in the fol-
lowing. A common color factor for all quark-quark(antiquark) scattering is C0 = CF/Nc.
qq̄ → qiq̄i annihilation:
|M |2qq̄→qiq̄i =
t̂2 + û2
Pqq̄→qiq̄i(z) = z
2 + (1− z)2 . (A-73)
qq̄ → qq̄ annihilation:
|M |2qq̄→qq̄ =
û2 + ŝ2
û2 + t̂2
Pqq̄→qq̄(z) =
1 + z2
(1− z)2
+ z2 + (1− z)2 +
. (A-74)
qq̄ → gg annihilation:
|M |2qq̄→gg =
− 2CA
û2 + t̂2
Pqq̄→gg(z) = 2CF
z2 + (1− z)2
z(1− z)
− 2CA(z2 + (1− z)2) . (A-75)
qqi(q̄i) → qqi(q̄i) scattering:
|M |2qqi(q̄i)→qqi(q̄i)=
û2 + ŝ2
Pqqi(q̄i)→qqi(q̄i)(z) =
1 + z2
(1− z)2
. (A-76)
qq → qq scattering:
|M |2qq→qq =
û2 + ŝ2
ŝ2 + t̂2
Pqq→qq(z) =
1 + z2
(1− z)2
1 + (1− z)2
z(1− z)
. (A-77)
For quark-gluon Compton scattering, the relevant gluon distribution function is xLGN (xL).
One can therefore rewrite contribution from qg → qg to Eq. (A-71) as,
dσqN = xLGN(xL)πα
sz(1 − z)|M |2qg→qg(t̂/ŝ, û/ŝ)dz
≡xLGN(xL)πα2s
Pqg→qg(z)dz
. (A-78)
We have then for qg → qg scattering,
|M |2qg→qg =
ŝ2 + û2
û2 + ŝ2
Pqg→qg(z) = z(1 − z)
1 + z2
(1− z)2
1 + z2
. (A-79)
Comparing this result with that in Ref. [18] for the quark-gluon rescattering, we can see
that they agree in the limit 1 − z → 0. This is a consequence of the collinear approxi-
mation employed in Ref. [18] in the calculation of the hard partonic part in quark-gluon
rescattering.
We can also extend this calculation to the case of gluon-nucleon scattering. One can use
Eq. (A-71) to define the splitting function for gq → gq scattering,
|M |2gq→gq =
ŝ2 + t̂2
t̂2 + ŝ2
Pgq→gq(z) = z(1 − z)
1 + (1− z)2
1 + (1− z)2
(1− z)
. (A-80)
Here for gluon-parton scattering, there is no common color factor.
gg → qq̄ annihilation,
|M |2gg→qq̄ =
t̂2 + û2
t̂2 + û2
Pgg→qq̄(z) = z(1 − z)
z2 + (1− z)2
z(1− z)
[z2 + (1− z)2]
. (A-81)
gg → gg scattering
|M |2gg→gg =2
3− t̂û
− ûŝ
− t̂ŝ
Pgg→gg(z) = 2
(1− z + z2)3
z(1− z)
. (A-82)
One can use this technique to extend the study of modified fragmentation functions to
propagating gluons. Since the modification is dominated by quark-gluon and gluon-gluon
scattering, comparing the effective splitting functions,
Pqg→qg(z)≈
, (A-83)
Pgg→gg(z)≈
, (A-84)
in the limit z → 1, one can conclude that a gluon’s radiative energy loss is larger than
a quark by a factor of Nc/CF = CA/CF = 9/4. We will leave the complete derivation of
medium modification of gluon fragmentations to a future publication.
References
[1] K. Adcox et al., [PHENIX Collaboration], Phys. Rev. Lett. 88, 022301 (2002).
[2] C. Adler et al., [STAR Collaboration], Phys. Rev. Lett. 89 202301 (2002).
[3] C. Adler et al., [STAR Collaboration], Phys. Rev. Lett. 90, 082302 (2003).
[4] M. Gyulassy and L. McLerran, Nucl. Phys. A 750, 30 (2005).
[5] P. Jacobs and X. N. Wang, Prog. Part. Nucl. Phys. 54, 443 (2005).
[6] J. W. Qiu, [arXiv:hep-ph/0507268].
[7] J. W. Qiu and G. Sterman, Int. J. Mod. Phys. E 12 (2003) 149.
[8] X. F. Guo, Phys. Rev. D58 (1998) 114033.
[9] X. F. Guo, J. W. Qiu and W. Zhu, Phys. Lett. B 523 (2001) 88.
[10] R. J. Fries, A. Schäfer, E. Stein and B. Muller, Nucl. Phys. B 582, 537 (2000).
[11] J. W. Qiu and X. Zhang, Phys. Lett. B 525 (2002) 265.
[12] J. W. Qiu and I. Vitev, Phys. Rev. Lett. 93, 262301 (2004), J. W. Qiu and I. Vitev, Phys.
Lett. B 587, 52 (2004).
[13] M. Gyulassy and X.-N. Wang, Nucl. Phys. B 420, 583 (1994); X.-N. Wang, M. Gyulassy
and M. Plümer, Phys. Rev. D 51, 3436 (1995).
[14] R. Baier et al., Nucl. Phys. B 483, 291 (1997). Nucl. Phys. B 484, 265 (1997); Phys. Rev.
C 58, 1706 (1998).
[15] B. G. Zakharov, JETP Lett. 63, 952 (1996).
[16] M. Gyulassy, P. Lévai and I. Vitev, Nucl. Phys. B594, 371 (2001); Phys. Rev. Lett. 85,
5535 (2000).
[17] U. Wiedemann, Nucl. Phys. B588, 303 (2000); C. A. Salgado and U. A. Wiedemann, Phys.
Rev. Lett. 89, 092303 (2002).
[18] X. F. Guo and X.-N. Wang, Phys. Rev. Lett. 85, 3591 (2000);
X.-N. Wang and X. F. Guo, Nucl. Phys. A 696, 788 (2001).
[19] B. W. Zhang and X.-N. Wang, Nucl. Phys. A 720, 429 (2003); B. W. Zhang, E. Wang
and X.-N. Wang, Phys. Rev. Lett. 93, 072301 (2004); B. W. Zhang, E. K. Wang and
X.-N. Wang, Nucl. Phys. A 757, 493 (2005).
[20] B. Z. Kopeliovich, A. Schäfer and A. V. Tarasov , Phys. Rev. C 59 (1999) 1609
[arXiv:hep-ph/9808378].
[21] M. Gyulassy, I. Vitev, X. N. Wang and B. W. Zhang,Quark-Gluon Plasma 3, R. C. Hwa and
X.-N Wang, Eds. (World Scientific, Singapore, 2003), p123-191 [arXiv:nucl-th/0302077].
[22] A. Kovner and U. A. Wiedemann, arXiv:hep-ph/0304151.
[23] M. Luo, J. W. Qiu and G. Sterman, Phys. Lett. B 279 (1992) 377; Phys. Rev. D 50 (1994)
1951;
Phys. Rev. D 49, 4493 (1994).
[24] E. Wang and X.-N. Wang, Phys. Rev. Lett. 89, 162301 (2002) [arXiv:hep-ph/0202105].
[25] A. Airapetian et al. [HERMES Collaboration], Eur. Phys. J. C 20, 479 (2001)
[arXiv:hep-ex/0012049].
http://arxiv.org/abs/hep-ph/0507268
http://arxiv.org/abs/hep-ph/9808378
http://arxiv.org/abs/nucl-th/0302077
http://arxiv.org/abs/hep-ph/0304151
http://arxiv.org/abs/hep-ph/0202105
http://arxiv.org/abs/hep-ex/0012049
[26] A. Airapetian et al. [HERMES Collaboration], Phys. Lett. B 577, 37 (2003)
[arXiv:hep-ex/0307023].
[27] X. N. Wang, Phys. Lett. B 595, 165 (2004) [arXiv:nucl-th/0305010].
[28] T. Falter, W. Cassing, K. Gallmeister and U. Mosel, Phys. Rev. C 70, 054609 (2004)
[arXiv:nucl-th/0406023].
[29] B. Z. Kopeliovich, J. Nemchik, E. Predazzi and A. Hayashigaki, Nucl. Phys. A 740, 211
(2004) [arXiv:hep-ph/0311220].
[30] V. N. Gribov and L. N. Lipatov, Sov. J. Nucl. Phys. 15, 438 (1972); Yu. L. Dokshitzer,
Sov. Phys. JETP 46, 641 (1977); G. Altarelli and G. Parisi, Nucl. Phys. B126, 298 (1977);
[31] R. D. Field, Applications of Perturbative QCD, Frontiers in Physics Lecture, Vol. 77, Ch.
5.6 (Addison Wesley, 1989).
[32] M. E. Peskin and D. V. Schroeder, An Introduction to Quantuam Field Theory, (Addison-
Wesley Advanced Book Program, 1995).
[33] J. Osborne and X.-N. Wang, Nucl. Phys. A 710, 281 (2002) [arXiv:hep-ph/0204046].
[34] X. N. Wang, arXiv:nucl-th/0604040.
[35] H. L. Lai et al. [CTEQ Collaboration], Eur. Phys. J. C 12, 375 (2000)
[arXiv:hep-ph/9903282]; One can use the online parton distribution calculator at
http://durpdg.dur.ac.uk/HEPDATA/PDF.
[36] F. Gelis, K. Kajantie and T. Lappi, Phys. Rev. Lett. 96, 032304 (2006)
[arXiv:hep-ph/0508229].
[37] W. Liu, C. M. Ko and B. W. Zhang, arXiv:nucl-th/0607047.
[38] J. Binnewies, B. A. Kniehl and G. Kramer, Phys. Rev. D 52, 4947 (1995)
[arXiv:hep-ph/9503464].
[39] R. Cutler and D. W. Sivers, Phys. Rev. D 17, 196 (1978).
http://arxiv.org/abs/hep-ex/0307023
http://arxiv.org/abs/nucl-th/0305010
http://arxiv.org/abs/nucl-th/0406023
http://arxiv.org/abs/hep-ph/0311220
http://arxiv.org/abs/hep-ph/0204046
http://arxiv.org/abs/nucl-th/0604040
http://arxiv.org/abs/hep-ph/9903282
http://durpdg.dur.ac.uk/HEPDATA/PDF
http://arxiv.org/abs/hep-ph/0508229
http://arxiv.org/abs/nucl-th/0607047
http://arxiv.org/abs/hep-ph/9503464
	Introduction
	General formalism
	Quark-quark double scattering processes
	Modified Fragmentation Functions
	 q"7016q g annihilation
	q"7016q qi"7016qi annihilation
	qqi("7016qi) qqi("7016qi) scattering
	qqqq scattering
	q"7016q q"7016q, gg annihilation
	Modification due to quark-gluon mixing
	Flavor dependence of the medium modified fragmentation
	Summary
	Hard partonic parts for quark-quark double scattering
	Effective splitting functions
	Alternative calculations of central-cut diagrams
	References
ABSTRACT
  Modifications to quark and antiquark fragmentation functions due to
quark-quark (antiquark) double scattering in nuclear medium are studied
systematically up to order \cal{O}(\alpha_{s}^2)$ in deeply inelastic
scattering (DIS) off nuclear targets. At the order $\cal{O}(\alpha_s^2)$,
twist-four contributions from quark-quark (antiquark) rescattering also exhibit
the Landau-Pomeranchuck-Midgal (LPM) interference feature similar to gluon
bremsstrahlung induced by multiple parton scattering. Compared to quark-gluon
scattering, the modification, which is dominated by $t$-channel quark-quark
(antiquark) scattering, is only smaller by a factor of $C_F/C_A=4/9$ times the
ratio of quark and gluon distributions in the medium. Such a modification is
not negligible for realistic kinematics and finite medium size. The
modifications to quark (antiquark) fragmentation functions from quark-antiquark
annihilation processes are shown to be determined by the antiquark (quark)
distribution density in the medium. The asymmetry in quark and antiquark
distributions in nuclei will lead to different modifications of quark and
antiquark fragmentation functions inside a nucleus, which qualitatively
explains the experimentally observed flavor dependence of the leading hadron
suppression in semi-inclusive DIS off nuclear targets. The quark-antiquark
annihilation processes also mix quark and gluon fragmentation functions in the
large fractional momentum region, leading to a flavor dependence of jet
quenching in heavy-ion collisions.

<|endoftext|><|startoftext|>
Introduction
Among all dimensions, 2-SAT possesses many special properties unique in
the sense of computational complexity [1, 2, 3, 4, 5]. But in light of works
[6, 8, 7, 9] a problem arose: either those properties are accidental or there
are polynomial time reductions of SAT to 2-SAT of polynomial size. This
article describes one such reduction.
2 Presenting SAT with XOR
In [6] was described one of the ways to present SAT with a conjunction of
XOR. Let us summarize it.
Let Boolean formula f define a given SAT instance:
f = c1 ∧ c2 ∧ . . . ∧ cm. (1)
Clauses ci are disjunctions of literals:
ci = Li1 ∨ Li2 ∨ . . . ∨ Lini , i = 1, 2, . . . , m
- where ni is the number of literals in clause ci; and Lij are the literals. Using
distributive laws, formula (1) can be rewritten in disjunctive form:
f = d1 ∨ d2 ∨ . . . dp, p = n1n2 . . . nm.
Clauses dk in this presentation are conjunctions of m literals - one literal
from each clause ci, i = 1, 2, . . . , m:
dk = L1k1 ∧ L2k2 ∧ . . . ∧ Lmkm , k = 1, 2, . . . , p. (2)
∗Author’s email: sgubin@genesyslab.com
It is obvious that formula (1) is satisfiable iff there are clauses without com-
plimentary literals amongst conjunctive clauses (2). Disjunction of all those
clauses is the disjunctive normal form of formula (1). Thus, formula (1) is
satisfiable iff there are members in its disjunctive normal form.
There is a generator for conjunctive clauses (2):
(ξi1 ⊕ ξi2 ⊕ . . .⊕ ξini) = true, (3)
- where Boolean variable ξµν indicates whether literal Lµν participates in con-
junction (2). Solutions of equation (3) generate conjunctive clauses (2). Let’s
call the variables ξ the indicators. To select from all solutions of equation (3)
those without complimentary clauses, let’s use another Boolean equation.
For each of the combination of clauses (ci, cj), 1 ≤ i < j ≤ m, let’s build
a set of all couples of literals participating in the clauses:
Aij = { (Liµ, Ljν) | ci = Liµ ∨ . . . ; cj = Ljν ∨ . . . }.
Let Bij be a set of such couples of indicators (ξiµ, ξjν), that the literals they
present are complimentary:
Bij = { (ξiµ, ξjν) | (Liµ, Ljν) ∈ Aij, Liµ = L̄jν }.
There are C2m sets Bij , 1 ≤ i < j ≤ m, and
|Bij| ≤ min{ni, nj}.
Let’s mention that some of the sets can be empty. Then, the following equa-
tion will select from all solutions of equation (3) those without complimentary
clauses:
1≤i<j≤m
(ξ,ζ)∈Bij
(ξ̄ ∨ ζ̄) = true. (4)
Due to the above estimations of the number of sets Bij and of their sizes, the
number of clauses in formula (4) is
n = O(t2m
- where t2 is the second number in the row of clauses’ sizes sorted by value:
t1 = max{n1, n2, . . . , nm}, t2 = max
min{ni, nj}, . . .
Because satisfiability of formula (1) means that the disjunctive normal
form of formula (1) has conjunctive clauses, formula (1) is satisfiable iff the
following formula/equation is satisfiable:
g ∧ h = true. (5)
The reasons for replacing formula (1) with formula (5) are explained in
[6]. The number of true-strings in truth-tables of XOR clauses of formula
(3) is linear over initial input. The number of true-strings in truth-tables of
disjunctive clauses of formula (4) is just 3. The number of all clauses in (5)
is cubic over initial input. It can be estimated as
m+ n = O(t2m
Thus, application of the simplified compatibility matrices method [6] to equa-
tion (5) will produce a polynomial time algorithm for SAT. But let’s return
to the reduction.
3 SAT vs. 2-SAT
Let’s apply the simplified method of compatibility matrices [6] to equation
(5). The method consists of sequential Boolean transformations of compat-
ibility matrices of equation (5). Let’s mention that after m iterations, due
to the allocation of formula (4) at the end of formula (5), there will only be
compatibility matrices of equation (4) left in play. They will be grouped in
an upper triangular box matrix
S = (Fm+µ,m+ν)1≤µ<ν≤n. (6)
The matrix is displayed below:
Fm+1,m+2 Fm+1,m+3 . . . Fm+1,m+n
Fm+2,m+3 . . . Fm+2,m+n
. . .
Fm+n−1,m+n
If there are no complimentary literals in different clauses of formula (1),
then formula (4) is just missing. The size of matrix (6) is 0× 0. In this case,
formula (1) is reducible to 1-SAT instance
ω1 ∧ ω2 ∧ . . . ∧ ωm,
- where
ωi = ξi1 ⊕ ξi2 ⊕ . . .⊕ ξini, i = 1, 2, . . . , m.
This singularity belongs to the set of all 2-SAT instances.
If, during the first m iterations, a pattern of unsatisfiability arises (one of
the compatibility matrices becomes filled with false entirely), then formulas
(5) and (1) are both unsatisfiable [6]. This case may be thought of as a case
of formula (1) being reduced to an unsatisfiable formula
false.
Let’s include this singularity in the set of all 2-SAT instances.
Otherwise, boxes Fm+µ,m+ν in matrix (6) are what is left of the compati-
bility matrices of equation (4) after the first m iterations of the method.
Due to their construction [6], the boxes are 3× 3 matrices:
Fm+µ,m+ν = (xij)3×3, 1 ≤ µ < ν ≤ n (7)
- where xij ∈ {false, true}. The number of boxes is C
n. Thus, the number
of all elements in matrix (6) is
e = 9C2n = O(t
Let’s enumerate the elements arbitrarily:
y1, y2, . . . , ye.
Then, distribution of true/false in matrix (6) can be described with a 1-SAT
formula/equation
w = η1 ∧ η2 . . . ∧ ηe = true, (8)
- where ηi are literals over a set of Boolean variables
{ b1, b2, . . . , be }.
The literals are
bi, yi = true
b̄i, yi = false
, i = 1, 2, . . . , e.
Let’s take the following 2-SAT instance:
h ∧ w. (9)
Box matrix (6) is an initialization of the modified method of compatibility
matrices [6] for formula (9): compatibility matrices of formula (4) are de-
pleted to satisfy equation (8). Thus, continuation of the simplified method
of compatibility matrices for equation (5) from its Step m+ 1 to its finish is
an application of the modified method of compatibility matrices to system
(9) from its Step 1 to its finish [6]. After n−2 iterations, both methods must
result with the same version of satisfiability of formula (1). Thus, formulas
(5) and (1) are satisfiable iff 2-SAT formula (9) is satisfiable. The number of
clauses in formula (9) is
e+ n = O(t22m
According to [6], the time to deduce formula (9) can be safely estimated as
O(t41t
4 SAT vs. 1-SAT
Let’s take one step further. Applying to formula (1)/(5) either of the varia-
tions of the compatibility matrices method [6] will produce a Boolean matrix.
Let it be a matrix R:
R = (rij)a×b.
Size of the matrix depends on the method’s variation and the order of clauses
in formula (1). The size can be changed if permute the clauses and repeat the
method [6]. The formula (1) is satisfiable iff matrix R contains true-elements
[6] (elements which are true). The existence/absence of the true-elements is
the only invariant.
If formula (1) is unsatisfiable, then that formula is reducible to formula
“false”. Otherwise, formula (1) is reducible to a 1-SAT instance.
Proof. Let’s enumerate elements of matrix R in arbitrarily order:
z1, z2, . . . , zab.
Let B be a set of t = ab Boolean variables:
B = { bi ∈ {false, true} | i = 1, 2, . . . , t }.
Then the following 1-SAT formula describes distribution of true/false in
matrix R:
θ1 ∧ θ2 ∧ . . . ∧ θt, (10)
- where literals θi are
bi, zi = true
b̄i, zi = false
, i = 1, 2, . . . , t.
Thus, the compatibility matrices method reduces satisfiable formula (1) to
1-SAT formula (10).
In its turn, formula (10) can be rewritten as SAT of any dimension by
appropriate substitution of variables.
If use the simplified method of compatibility matrices, then matrix R is a
3× 3 Boolean matrix [6]. Let there be two clauses shorter than 3 in formula
(1). Let’s permute all clauses and make those shortest clauses to be the last
ones in formula (1). Then, result of the modified method [6] will be a matrix
R of size less than 3× 3. That proves the following theorem.
Theorem 1. Any SAT instance is reducible to a 1-SAT instance with 9
variables or less. A SAT instance is unsatisfiable iff its 1-SAT presentation
is “false” - there is not any variables in its 1-SAT presentation.
5 Conclusions
Formula (1) may be thought of as a “Business Requirements”. And any
appropriate computer program may be thought of as a solution of the SAT
instance. Then, theorem 1 can be an explanation of the remarkable efficiency
of the “natural programs”. From this point of view, the iterations of the
method of compatibility matrices may be thought of as a learning/modeling
of the business domain. In the artificial programming, the calculation of the
compatibility matrices - a virtual business domain - could be a conclusion of
the stage “Business Requirements Analysis/Mathematical Modeling”. That
would improve the programs’ performance. The resulting compatibility ma-
trices may be thought of as a fussy logic’s tables of rules for the domain.
The whole solution of formula (1) can be achieved, with one of the fol-
lowing approaches, for example. ANN approach is the applying of the com-
patibility matrices method backward, starting from matrix R. An example
of that can be found in [7]. DTM approach is the looping trough of the
following three steps: selection of any true-element from matrix R; substi-
tution of the appropriate true-assignments in formula (1); and repeating of
the compatibility matrices method. The last method is an implication of the
self-reducibility property of SAT [5].
In certain sense, theorem 1 may be seen as an answer to the Feasibility
Thesis [2].
References
[1] Stephen Cook. The complexity of theorem-proving procedures. In Con-
ference Record of Third Annual ACM Symposium on Theory of Com-
puting. p.151-158, 1971
[2] Stephen Cook. The P versus NP problem.
http://www.claymath.org/millennium/P_vs_NP/pvsnp.pdf
[3] Richard M. Karp. Reducibility Among Combinatorial Problems. In
Complexity of Computer Computations, Proc. Sympos. IBM Thomas
J. Watson Res. Center, Yorktown Heights, N.Y. New York: Plenum,
p.85-103, 1972.
[4] M.R. Garey and D.S. Johnson. Computers and Intractability, a Guide to
the Theory of NP-Completeness. W.H. Freeman and Co. San Francisco,
1979.
http://www.claymath.org/millennium/P_vs_NP/pvsnp.pdf
[5] Lane A. Hemaspaandra, Mitsunori Ogihara. The Complexity Theory
Companion. Springer-Verlag Berlin Heidelberg, 2002.
[6] Sergey Gubin. A Polynomial Time Algorithm for SAT.
http://www.arxiv.org/pdf/cs/0703146
[7] Sergey Gubin. A Polynomial Time Algorithm for 3-SAT. Examples of
use. http://www.arxiv.org/pdf/cs/0703098
[8] Sergey Gubin. A Polynomial Time Algorithm for 3-SAT.
http://www.arxiv.org/pdf/cs/0701023
[9] Sergey Gubin. A Polynomial Time Algorithm for The Traveling Sales-
man Problem. http://www.arxiv.org/pdf/cs/0610042
http://www.arxiv.org/pdf/cs/0703146
http://www.arxiv.org/pdf/cs/0703098
http://www.arxiv.org/pdf/cs/0701023
http://www.arxiv.org/pdf/cs/0610042
	Introduction
	Presenting SAT with XOR
	SAT vs. 2-SAT
	SAT vs. 1-SAT
	Conclusions
ABSTRACT
  Description of a polynomial time reduction of SAT to 2-SAT of polynomial
size.

<|endoftext|><|startoftext|>
Half-metallic silicon nanowires
E. Durgun,1, 2 D. Çakır,1, 2 N. Akman,2, 3 and S. Ciraci1, 2, ∗
Department of Physics, Bilkent University, Ankara 06800, Turkey
National Nanotechnology Research Center, Bilkent University, Ankara 06800, Turkey
Department of Physics, Mersin University, Mersin, Turkey
(Dated: November 19, 2021)
From first-principles calculations, we predict that transition metal (TM) atom doped silicon
nanowires have a half-metallic ground state. They are insulators for one spin-direction, but show
metallic properties for the opposite spin direction. At high coverage of TM atoms, ferromagnetic sil-
icon nanowires become metallic for both spin-directions with high magnetic moment and may have
also significant spin-polarization at the Fermi level. The spin-dependent electronic properties can
be engineered by changing the type of dopant TM atoms, as well as the diameter of the nanowire.
Present results are not only of scientific interest, but can also initiate new research on spintronic
applications of silicon nanowires.
PACS numbers: 73.22.-f, 68.43.Bc, 73.20.Hb, 68.43.Fg
Rod-like, oxidation resistant Si nanowires (SiNW) can
now be fabricated at small diameters[1] (1-7 nm) and dis-
play diversity of interesting electronic properties. In par-
ticular, the band gap of semiconductor SiNWs varies with
their diameters. They can serve as a building material in
many of electronic and optical applications like field effect
transistors [2] (FETs), light emitting diodes [3], lasers [4]
and interconnects. Unlike carbon nanotubes, the con-
ductance of semiconductor nanowire can be tuned easily
by doping during the fabrication process or by applying
a gate voltage in a SiNW FET.
In this letter, we report a novel spin-dependent elec-
tronic property of hydrogen terminated silicon nanowires
(H-SiNW): When doped by specific transition metal
(TM) atoms they show half-metallic[5, 6] (HM) ground
state. Namely, due to broken spin-degeneracy, energy
bands En(k, ↑) and En(k, ↓) split and the nanowire re-
mains to be insulator for one spin-direction of electrons,
but becomes a conductor for the opposite spin-direction
achieving 100% spin polarization at the Fermi level. Un-
der certain circumstances, depending on the dopant and
diameter, semiconductor H-SiNWs can be also either a
ferromagnetic semiconductor or metal for both spin di-
rections. High-spin polarization at the Fermi level can
be achieved also for high TM coverage of specific SiNWs.
Present results on the asymmetry of electronic states of
TM doped SiNWs are remarkable and of technological in-
terest since room temperature ferromagnetism is already
discovered in Mn-doped SiNW[8]. Once combined with
advanced silicon technology, these properties can be re-
alizable and hence can make ”known silicon” again a po-
tential material with promising nanoscale technological
applications in spintronics, magnetism.
Even though 3D ferromagnetic Heusler alloys and
transition-metal oxides exhibit half-metallic properties
[7], they are not yet appropriate for spintronics because
of difficulties in controlling stoichiometry and the defect
levels destroying the coherent spin-transport. Qian et
al. have proposed HM heterostructures composed of δ-
doped Mn layers in bulk Si [9]. Recently, Son et al.
[10] predicted HM properties of graphene nanoribbons.
Stable 1D half-metals have been also predicted for TM
atom doped arm-chair single-wall carbon nanotubes [11]
and linear carbon chains [12, 13]; but synthesis of these
nanostructures appears to be difficult.
Our results are obtained from first-principles plane
wave calculations [14] (using a plane-wave basis set up to
kinetic energy of 350 eV) within generalized gradient ap-
proximation expressed by PW91 functional[15]. All cal-
culations for paramagnetic, ferromagnetic and antiferro-
magnetic states are carried out using ultra-soft pseudopo-
tentials [16] and confirmed by using PAW potential[17].
All atomic positions and lattice constants are optimized
by using the conjugate gradient method where total en-
ergy and atomic forces are minimized. The convergence
for energy is chosen as 10−5 eV between two steps,
and the maximum force allowed on each atom is 0.05
eV/Å[18].
Bare SiNW(N)s (which are oriented along [001] direc-
tion and have N Si atoms in their primitive unit cell)
are initially cut from the ideal bulk Si crystal in rod-like
forms and subsequently their atomic structures and lat-
tice parameter are relaxed [19]. The optimized atomic
structures are shown for N=21, 25, and 57 in Fig. 2.
While bare SiNW(21) is a semiconductor, bare SiNW(25)
and SiNW(57) are metallic. The average cohesive energy
relative to a free Si atom (Ec) is comparable with the
calculated cohesive energy of bulk crystal (4.64 eV per Si
atom) and it increases with increasing N. The average co-
hesive energy relative to the bulk Si crystal, E
c, is small
but negative as expected. Upon passivation of dangling
bonds with hydrogen atoms all of these SiNWs (specified
as H-SiNW) become semiconductor with a band gap EG.
The binding energy of adsorbed hydrogen relative to the
free H atom (Eb), as well as relative to the free H2 (E
are both positive and increases with increasing N. Exten-
http://arxiv.org/abs/0704.0109v1
FIG. 1: (Color online) Upper curve in each panel with
numerals indicate the distribution of first, second, third,
fourth etc nearest neighbor distances of SiNW(N) as cut
from the ideal Si crystal, same for structure-optimized bare
SiNW(N)(middle curve) and structure optimized H-SiNW(N)
(bottom curve) for N=21, 57 and 81. Vertical dashed line cor-
responds to the distance of Si-H bond.
sive ab initio molecular dynamics calculations have been
carried out at 500 K using supercells, which comprise ei-
ther two or four primitive unit cells of nanowires to lift
artificial limitations imposed by periodic boundary con-
dition. After several iterations lasting 1 ps, the structure
of all SiNW(N) and H-SiNW(N) remained stable. Even
though SiNWs are cut from ideal crystal, their optimized
structures deviate substantially from crystalline coordi-
nation, especially for small diameters as seen in Fig.1.
Upon hydrogen termination the structure is healed sub-
stantially, and approaches the ideal case with increas-
ing N (or increasing diameter), as expected. The cal-
culated response of the wire to a uniaxial tensile force,
κ = ∂ET /∂c, ranging from 172 to 394 eV/cell indicates
that the strength of H-SiNW(N)s (N=21-57) is rather
high.
The adsorption of a single TM (TM=Fe, Ti, Co, Cr,
and Mn) atom per primitive cell, denoted by n = 1, have
been examined for different sites (hollow, top, bridge etc)
on the surface of H-SiNW(N) for N=21, 25 and 57. In
Fig. 2(c) we present only the most energetic adsorption
geometry for a specific TM atom for each N, which re-
sults in a HM state. These are Co-doped H-SiNW(21),
Cr-doped SiNW(25) and Cr-doped SiNW(57). These
nanowires have ferromagnetic ground state, since their
energy difference between calculated spin-unpolarized
and spin-polarized total energy, i.e. ∆Em = EsuT −
is positive. We calculated ∆Em =0.04, 0.92 and
0.94 eV for H-SiNW(21)+Co, H-SiNW(25)+Cr and H-
SiNW(57)+Cr, respectively [20]. Moreover, these wires
have the integer number of unpaired spin in their prim-
itive unit cell. In contrast to usually weak binding of
TM atoms on single-wall carbon nanotubes which can
lead to clustering [21], the binding energy of TM atoms
(EB) on H-SiNWs is high and involve significant charge
transfer from TM atom to the wire [22]. Mulliken anal-
ysis indicates that the charge transfer from Co to H-
FIG. 2: (Color online) Top and side views of optimized atomic
structures of various SiNW(N)’s. (a) Bare SiNWs; (b) H-
SiNWs; (c) single TM atom doped per primitive cell of H-
SiNW (n = 1); (d) H-SiNWs covered by n TM atom corre-
sponding to n > 1. Ec, E
c, Eb, E
b, EG, and µ, respectively
denote the average cohesive energy relative to free Si atom,
same relative to the bulk Si, binding energy of hydrogen atom
relative to free H atom, same relative to H2 molecule, energy
band gap and the net magnetic moment per primitive unit
cell. Binding energies in regard to the adsorption of TM
atoms, i.e. EB, E
B for n = 1 and average values EB , E
for n > 1 are defined in the text and in Ref[22]. The [001]
direction is along the axis of SiNWs. Small, large-light and
large-dark balls represent H, Si and TM atoms, respectively.
Side views of atomic structure comprise two primitive unit
cells of the SiNWs. Binding and cohesive energies are given
in eV/atom.
SiNW(21) is 0.5 electrons. The charge transfer from
Cr to H-SiNW(25) and H-SiNW(57) is even higher (0.8
and 0.9 electrons, respectively). Binding energies of ad-
sorbed TM atoms relative to their bulk crystals (E′B) are
negative and hence indicate endothermic reaction. Due
to very low vapor pressure of many metals, it is proba-
bly better to use some metal-precursor to synthesize the
structures predicted here.
The band structures of HM nanowires are presented
in Fig.3. Once a Co atom is adsorbed above the center
of a hexagon of Si atoms on the surface of H-SiNW(21)
the spin degeneracy is split and whole system becomes
magnetic with a magnetic moment of µ=1 µB (Bohr mag-
neton per primitive unit cell). Electronic energy bands
become asymmetric for different spins: Bands of major-
ity spins continue to be semiconducting with relatively
smaller direct gap of EG=0.4 eV. In contrast, two bands
of minority spins, which cross the Fermi level, become
metallic. These metallic bands are composed of Co-3d
and Si-3p hybridized states with higher Co contribution.
The density of majority and minority spin states, namely
D(E, ↑) and D(E, ↓), display a 100% spin-polarization
P = [D(EF , ↑)−D(EF , ↓)]/[D(EF , ↑)+D(EF , ↓)] at EF .
Cr-doped H-SiNW(25) is also HM. Indirect gap of major-
ity spin bands has reduced to 0.5 eV. On the other hand,
two bands constructed from Cr-3d and Si-3p hybridized
states cross the Fermi level and hence attribute metal-
licity to the minority spin bands. Similarly, Cr-doped
H-SiNW(57) is also HM. The large direct band gap of
undoped H-SiNW(57) is modified to be indirect and is
reduced to 0.9 eV for majority spin bands. The mini-
mum of the unoccupied conduction band occurs above
but close to the Fermi level. Two bands formed by Cr-3d
and Si-3p hybridized states cross the Fermi level. The
net magnetic moment is 4 µB . Using PAW potential
results, we estimated Curie temperature of half-metallic
H-SiNW+TMs as 8, 287, and 709 K for N=21, 25, and
57, respectively.
The well-known fact that density functional theory un-
derestimates the band gap, EG does not concern the
present HM states, since H-SiNWs are already verified
to be semiconductor experimentally[1] and upon TM-
doping they are predicted to remain semiconductor for
one spin direction. In fact, band gaps predicted here are
in fair agreement with experiment and theory. As for par-
tially filled metallic bands of the opposite spin, they are
properly represented. Under uniaxial compressive strain
the minimum of the conduction band of majority spin
states rises above the Fermi level. Conversely, it becomes
semi-metallic under uniaxial tensile strain. Since conduc-
tion and valence bands of both H-SiNW(21)+Co and H-
SiNW(25)+Cr are away from EF , their HM behavior is
robust under uniaxial strain. Also the effect of spin-orbit
coupling is very small and cannot destroy HM properties
[12]. The form of two metallic bands crossing the Fermi
level eliminates the possibility of Peierls distortion. On
FIG. 3: (Color online) Band structure and spin-dependent to-
tal density of states (TDOS) for N=21, 25 and 57. Left panels:
Semiconducting H-SiNW(N). Middle panels: Half-metallic H-
SiNW(N)+TM. Right panels: Density of majority and mi-
nority spin states of H-SiNW(N)+TM. Bands described by
continuous and dotted lines are majority and minority bands.
Zero of energy is set to EF .
FIG. 4: (Color online) D(E, ↓), density of minority (light) and
D(E, ↑), majority (dark) spin states. (a) H-SiNW(25)+Cr,
n = 8; (b) H-SiNW(25)+Cr, n = 16. P and µ indicate spin-
polarization and net magnetic moment (in Bohr magnetons
per primitive unit cell), respectively.
the other hand, HM ground state of SiNWs is not com-
mon to all TM doping. For example H-SiNW(N)+Fe is
consistently ferromagnetic semiconductor with different
EG,↑ and EG,↓. H-SiNW(N)+Mn(Cr) can be either fer-
romagnetic metal or HM depending on N.
To see whether spin-dependent GGA properly repre-
sents localized d-electrons and hence possible on-site re-
pulsive Coulomb interaction destroys the HM, we also
carried out LDA+U calculations[23]. We found that in-
sulating and metallic bands of opposite spins coexist up
to high values of repulsive energy (U = 4) for N=25. For
N=57, HM persists until U∼1. Clearly, HM character
of TM doped H-SiNW revealed in Fig.3 is robust and
unique behavior.
Finally, we note that HM state predicted in TM-doped
H-SiNWs occurs in perfect structures; complete spin-
polarization may deviate slightly from P=100% due to
the finite extent of devices. Even if the exact HM charac-
ter corresponding to n = 1 is disturbed for n > 1, the pos-
sibility that some H-SiNWs having high spin-polarization
at EF at high TM coverage can be relevant for spintronic
applications. We therefore investigated electronic and
magnetic structure of the above TM-doped H-SiNWs at
n > 1 as described in Fig. 2(d). Figure 4 presents the
calculated density of minority and majority spin states
of Cr covered H-SiNWs.
It is found that H-SiNW(21) covered by Co is non-
magnetic for both coverage of n = 4 and 12. H-SiNW(25)
is, however, ferromagnetic for different level of Cr cover-
age and has high net magnetic moment. For example,
n = 8 can be achieved by two different geometries; both
geometries are ferromagnetic with µ=19.6 and 32.3 µB
and are metallic for both spin directions. Interestingly,
while P is negligible for the former geometry, the lat-
ter one has P = 0.84 and hence is suitable for spin-
tronic applications (See Fig. 4). Similarly, Cr covered
H-SiNW(57) with n = 8 and 16 are both ferromagnetic
with µ= 34.3 (P =56) and µ=54.5 µB (P =0.33), respec-
tively. The latter nanostructure having magnetic mo-
ment as high as 54.5 µB can be a potential nanomagnet.
Clearly, not only total magnetic moment, but also the
spin polarization at EF of TM covered H-SiNMs exhibits
interesting variations depending on n, N and the type of
In conclusion, hydrogen passivated SiNWs can exhibit
half-metallic state when doped with certain TM atoms.
Resulting electronic and magnetic properties depend on
the type of dopant TM atom, as well as on the diam-
eter of the nanowire. As a result of TM-3d and Si-3p
hybridization two new bands of one type of spin direc-
tion are located in the band gap, while the bands of
other spin-direction remain to be semiconducting. Elec-
tronic properties of these nanowires depend on the type
of dopant TM atoms, as well as on diameter of the
H-SiNW. When covered with more TM atoms, perfect
half-metallic state of H-SiNW is disturbed, but for cer-
tain cases, the spin polarization at EF continues to be
high. High magnetic moment obtained at high TM
coverage is another remarkable result which may lead
to the fabrication of nanomagnets for various applica-
tions. Briefly, functionalizing silicon nanowires with TM
atoms presents us a wide range of interesting properties,
such as half-metals, 1D ferromagnetic semiconductors or
metals and nanomagnets. We believe that our findings
hold promise for the use of silicon -a unique material
of microelectronics- in nanospintronics including magne-
toresistance, spin-valve and non-volatile memories.
∗ Electronic address: ciraci@fen.bilkent.edu.tr
[1] D. D. D. Ma et al., Science 299, 1874 (2003).
[2] Y. Cui, Z. Zhong, D. Wang, W. U. Wang and C. M.
Lieber, Nano Lett. 3, 149 (2003).
[3] Y. Huang, X. F. Duan and C. M. Lieber, Small 1, 142
(2005).
[4] X. F. Duan, Y. Huang, R. Agarwal and C. M. Lieber,
Nature(London) 421, 241 (2003).
[5] R.A. de Groot et al., Phys. Rev. Lett. 50, 2024 (1983).
[6] W.E. Pickett and J. S. Moodera, Phys. Today 54, 39
(2001).
[7] J.-H. Park et al., Nature (London) 392, 794 (1998).
[8] W.H. Wu, J.C. Tsai and J.L. Chen, Appl. Phys. Lett.
90, 043121 (2007).
[9] M.C. Qian et al., Phys. Rev. Lett. 96, 027211 (2006).
[10] Y-W Son, M.L. Cohen and S.G. Louie, Nature 444,
(2006); Phys. Rev. Lett. 97, 216803 (2006).
[11] C. Yang, J. Zhao and J.P. Lu, Nano. Lett. 4, 561 (2004);
Y. Yagi et al., Phys. Rev. B 69, 075414 (2004).
[12] S. Dag et al., Phys. Rev. B 72, 155444 (2005).
[13] E. Durgun et al., Europhys. Lett. 73, 642 (2006).
[14] Numerical computations have been carried out by us-
ing VASP software: G. Kresse, J. Hafner, Phys Rev. B
47, R558 (1993). Calculations of charge transfer, orbital
contribution and local magnetic moments have been re-
mailto:ciraci@fen.bilkent.edu.tr
peated by SIESTA code using local basis set, P. Ordejon,
E. Artacho and J.M. Soler, Phys. Rev. B 53, R10441
(1996).
[15] J. P. Perdew et al., Phys. Rev. B 46, 6671 (1992).
[16] D. Vanderbilt, Phys. Rev. B 41, R7892 (1990).
[17] P.E. Bloechl, Phys. Rev. B 50, 17953 (1994).
[18] All structures have been treated within supercell geom-
etry using the periodic boundary conditions with lattice
constants of a and b ranging from 20 Å to 25 Å depending
on the diameter of the SiNW and c = co (co being the
optimized lattice constant of SiNW along the wire axis).
Some of the calculations have been carried out in dou-
ble and quadruple primitive unit cells of SiNW by taking
c = 2co and c = 4co, respectively. In the self-consistent
potential and total energy calculations the Brillouin zone
is sampled in the k-space within Monkhorst-Pack scheme
[H.J. Monkhorst and J.D. Pack, Phys. Rev. B 13, 5188
(1976)] by (1x1x15) mesh points.
[19] Numerous theoretical studies on SiNW have been pub-
lished in recent years. See for example: A. K. Singh et
al., Nano. Lett. 6, 920 (2006); Q. Wang et al., Phys. Rev.
Lett. 95, 167200 (2005); Nano Lett. 5, 1587 (2005).
[20] Spin-polarized calculations have been carried by relax-
ing the magnetic moment and by starting with different
initial µ values. Whether antiferromagnetic ground state
exists in H-SiNW(N)+TM’s has been explored by using
supercell including double primitive cells.
[21] E. Durgun et al., Phys. Rev. B 67, 201401(R) (2003); J.
Phys. Chem. B 108, 575 (2004).
[22] Binding energy corresponding to n=1 is calculated by
the following expression, EB = ET [H − SiNW (N)] +
ET [TM ] − ET [H − SiNW (N) + TM ] in terms of the
total energy of optimized H-SiNW(N) and TM-doped H-
SiNW(N) (i.e. H-SiNW(N)+TM) and the total energy of
the string of TM atoms having the same lattice parameter
co of H-SiNW(N)+TM, all calculated in the same super-
lattice. Hence EB can be taken as the binding energy of
single isoalted TM atom, since the coupling amaong ad-
sorbed TM atoms has been excluded. To calculate E′B,
ET (TM) is taken as the total energy of bulk TM crystal
per atom. For n¿1, ET (TM) is taken as the free TM atom
energy, and hence EB includes the coupling between TM
atoms. For this reason E
B > 0 for H-SiNW(21)+Co at
[23] S.L. Dudarev et al., Phys. Rev. B, 57, 1505 (1998).
ABSTRACT
  From first-principles calculations, we predict that transition metal (TM)
atom doped silicon nanowires have a half-metallic ground state. They are
insulators for one spin-direction, but show metallic properties for the
opposite spin direction. At high coverage of TM atoms, ferromagnetic silicon
nanowires become metallic for both spin-directions with high magnetic moment
and may have also significant spin-polarization at the Fermi level. The
spin-dependent electronic properties can be engineered by changing the type of
dopant TM atoms, as well as the diameter of the nanowire. Present results are
not only of scientific interest, but can also initiate new research on
spintronic applications of silicon nanowires.

<|endoftext|><|startoftext|>
Introduction
Noncommutative geometry (NCG) (a la Connes, see [2] ) and the C∗-
algebraic theory of quantum groups (see, for example, [11], [10]) are two
well-developed mathematical areas which share the basic idea of ‘noncom-
mutative mathematics’, namely, to view a general (noncommutative) C∗
algebra as noncommutative analogue of a topological space, equipped with
additional structures resembling and generalizing those in the classical (com-
mutative) situation, e.g. manifold or Lie group structure. A lot of fruitful
interaction between these two areas is thus quite expected. However, such
an interaction was not very common until recently, when a systematic effort
by a number of mathematicians for understanding C∗-algebraic quantum
groups as noncommutative manifolds in the sense of Connes triggered a
rapid and interesting development to this direction. However, quite sur-
prisingly, such an effort was met with a number of obstacles even in the
case of the simplest non-classical quantum group, namely SUq(2) and it was
not so clear for some time whether this (and other standard examples of
quantum groups) could be nicely fitted into the framework of Connes’ NCG
(see [6] and the discussion and references therein). The problem of finding
a nontrivial equivariant spectral triple for SUq(2) was finally settled in the
affirmative in the papers by Chakraborty and Pal ([4], see also [3] and [5]
for subsequent development), which increased the hope for a happy mar-
riage between NCG and quantum group theory. However, even in the case
of SUq(2), a few puzzling questions remain to be answered. One of them
is the issue of invariance of the Chern character, which we have addressed
in [7] and attempted to suggest a solution through the twisted version of
http://arxiv.org/abs/0704.0111v1
the entire cyclic cohomology theory, building on the ideas of [8]. In that
paper, we also made an attempt to study the connection between twisted
and the conventional NCG following a comment in [3]. The present article
is a follow-up of [7], and we mainly concentrate on SUq(2), considering it as
a test-case for comparing the twisted and conventional formulation of NCG.
2 Notation and background
Let A = SUq(2) (with 0 < q < 1) denote the C∗-algebra generated by two
elements α, β satisfying
α∗α+ β∗β = I, αα∗ + q2ββ∗ = I, αβ − qβα = 0,
αβ∗ − qβ∗α = 0, β∗β = ββ∗.
We also denote the ∗-algebra generated by α and β (without taking the
norm completion) by A∞. There is a Hopf ∗ algebra structure on A∞, as
can be seen from, for example, [10]. We denote the canonical coproduct
on A∞ by ∆. We shall also use the so-called Sweedler convention, which
we briefly explain now. For a ∈ A∞, there are finitely many elements
, i = 1, 2, ..., p (say), such that ∆(a) =
⊗a(2)
. For notational
convenience, we abbreviate this as ∆(a) = a(1)⊗a(2). For any positive integer
m, let A∞m be the m-fold algebraic tensor product of A∞. There is a natural
coaction of A∞ on A∞m given by
∆mA(a1 ⊗ a2 ⊗ ...⊗ am) := (a1(1) ⊗ ...am(1))⊗ (a1(2)...am(2)),
using the Sweedler notation, with summation being implied. Let us recall
the convolution ∗ defined in [7]. If φ : A∞m → C is an m-linear functional,
and ψ : A∞ → C is a linear functional, we define their convolution φ ∗ ψ :
A∞m → C by the following :
(φ ∗ ψ)(a1 ⊗ ...⊗ am) := φ(a(1)1 ⊗ ...⊗ a(1)m )ψ(a
1 ...a
using the Sweedler convention. We say that an m-linear functional φ is
invariant if φ ∗ ψ = ψ(1)φ for every functional ψ on A∞.
In [9], the K-homology K∗(A∞) has been explicitly computed. It has
been shown there that K0(A∞) = K1(A∞) = Z, and the Chern charac-
ters (in cyclic cohomology ) of the generators of these K-homology groups,
denoted by [τev] and [τodd] respectively, are also explicitly written down.
3 Main results
3.1 Chern characters are not invariant
In this subsection, we give detailed arguments for a remark made in [7]
about the impossibility of having an invariant Chern character for A∞ under
the conventional (non-twisted) framework of NCG. To make the notion of
invariance precise, we give the following definition (motivated by a comment
by G. Landi, which is gratefully acknowledged).
Definition 3.1 We say that a class [φ] ∈ HCn(A∞) is invariant if there is
an invariant n + 1-linear functional φ′ such that φ′ is a cyclic cocycle and
φ′ ∼ φ (i.e. [φ] = [φ′]).
It is easy to see that the Chern chracter [τev] cannot be invariant. Had it
been so, it would follow from the uniqueness of the Haar state (say h) on
SUq(2) that τev must be a scalar multiple of h. Since τev is a nonzero trace,
it would imply that h is a trace too. But it is known (see [10]) that h is not
a trace.
However, proving that [τodd] is not invariant requires little bit of detailed
arguments. We begin with the following observation.
Lemma 3.2 If τ is a trace on A∞, i.e. τ ∈ HC0(A∞), then we have that
(∂ξ) ∗ τ = ∂(ξ ∗ τ)
for every functional ξ on A∞, where the Hochschild coboundary operator ∂
is defined by
(∂ξ)(a, b) = ξ(ab)− ξ(ba).
Proof :
We shall use the Swedler notation. We have that for a0, a1 ∈ A∞,
(∂ξ ∗ τ)(a0, a1)
= (∂ξ)(a
0 ⊗ a
1 )τ(a
= ξ(a
1 )τ(a
1 )− ξ(a
0 )τ(a
= ξ(a
1 τ(a
1 )− ξ(a
0 )τ(a
0 ) (since τ is a trace)
= (ξ ∗ τ)(a0a1)− (ξ ∗ τ)(a1a0)
= ∂(ξ ∗ τ)(a0, a1).
The above lemma allows us to define the multiplication ∗ at the level
of cohomology classes. More precisely, for [φ] ∈ HC1(A∞) and [η] ∈
HC0(A∞), we set [φ] ∗ [η] := [φ ∗ η] ∈ HC1(A∞), which is well-defined
by the Lemma 3.2. Similarly [η] ∗ [φ] and [η] ∗ [η′] (where [η′] ∈ HC0(A∞))
can be defined. We now recall from [9] that
[τev] ∗ [τev] = [τev], [τev] ∗ [τodd] = [τodd] ∗ [τev] = 0.
We also note that τev(1) = 1 and that τev is a trace, i.e. τev(ab) = τev(ba).
Using this observation, we are now in a position to prove that the Chern
character of the generator of K1(A∞) is not an invariant class.
Theorem 3.3 [τodd] is not invariant.
Proof :
Suppose that there is φ ∼ τodd such that φ is invariant. Then we have
[φ ∗ τev] = [φ] ∗ [τev] = [τodd] ∗ [τev] = 0.
However, since we have φ ∗ τev = τev(1)φ = φ by the invariance of φ, it
follows that [φ] = [φ ∗ τev] = 0, that is, [τodd] = 0, which is a contradiction.
3.2 Nontrivial pairing with the twisted Chern character
As already mentioned in the introduction, in [7] we have made an attempt
to recover the desirable property of invariance by making a departure from
the conventional NCG and using the twisted entire cyclic cohomology. We
briefly recall here some of the basic concepts from that paper and refer the
reader to [7] and the references therein for more details of this approach. We
shall use the results derived in that paper wihout always giving a specific
reference.
Let us give the definition of twisted entire cyclic cohomology for Banach
algebras for simplicity, but note that the theory extends to locally convex
algebras, which we actually need. The extension to the locally convex al-
gebra case follows exactly as remarked in [1, page 370]. So, let A be a
unital Banach algebra, with ‖.‖∗ denoting its Banach norm, and let σ be
a continuous automorphism of A, σ(1) = 1. For n ≥ 0, let Cn be the
space of continuous n + 1-linear functionals φ on A which are σ-invariant,
i.e. φ(σ(a0), ..., σ(an)) = φ(a0, ..., an)∀a0, ..., an ∈ A; and Cn = {0} for
n < 0. We define linear maps Tn, Nn : C
n → Cn, Un : Cn → Cn−1 and
Vn : C
n → Cn+1 by,
(Tnf)(a0, ..., an) = (−1)nf(σ(an), a0, ..., an−1), Nn =
T jn,
(Unf)(a0, ..., an−1) = (−1)nf(a0, ..., an−1, 1),
(Vnf)(a0, ..., an+1) = (−1)n+1f(σ(an+1)a0, a1, ..., an).
Let Bn = Nn−1Un(Tn − I), bn =
j=0 T
n+1 VnT
n. Let B, b be maps on
the complex C ≡ (Cn)n given by B|Cn = Bn, b|Cn = bn. It is easy to
verify (similar to what is done for the untwisted case , e.g. in [2]) that
B2 = 0, b2 = 0 and Bb = −bB, so that we get a bicomplex (Cn,m ≡ Cn−m)
with differentials d1, d2 given by d1 = (n − m + 1)b : Cn,m → Cn+1,m,
: Cn,m → Cn,m+1. Furthermore, let Ce = {(φ2n)n ∈ IN ;φ2n ∈
C2n∀n ∈ IN}, and Co = {(φ2n+1)n ∈ IN ;φ2n+1 ∈ C2n+1∀n ∈ IN}. We
say that an element φ = (φ2n) of C
e is a σ-twisted even entire cochain if
the radius of convergence of the complex power series
‖φ2n‖z
is infinity,
where ‖φ2n‖ := sup‖aj‖∗≤1 |φ2n(a0, ...., a2n)|. Similarly we define σ-twisted
odd entire cochains, and let Ceǫ (A, σ) (Coǫ (A, σ) respectively) denote the set
of σ-twisted even (respectively odd) entire cochains. Let ∂̃ = d1 + d2 , and
we have the short complex Ceǫ (A, σ)
Coǫ (A, σ). We call the cohomology
of this complex the σ-twisted entire cyclic cohomology of A and denote it
by H∗ǫ (A, σ). Let Aσ = {a ∈ A : σ(a) = a} be the fixed point subalgebra
for the automorphism σ. There is a canonical pairing < ., . >σ,ǫ: K∗(Aσ)×
H∗ǫ (A, σ) → C. We shall need the pairing for the odd case, which we write
down :
< [u], [ψ] >≡< [u], [ψ] >σ,ǫ=
(−1)n n!
(2n + 1)!
ψ2n+1(u
−1, u, ..., u−1, u),
where [u] ∈ K1(Aσ) and [ψ] ∈ H1ǫ (A, σ).
Definition 3.4 Let H be a separable Hilbert space, A∞ be a ∗ subalgebra
(not necessarily complete) of B(H), R be a positive (possibly unbounded)
operator on H, D be a self-adjoint operator in H with compact resolvents
such that the following hold :
(i) [D, a] ∈ B(H) ∀a ∈ A∞,
(ii) R commutes with D,
(iii) For any real number s and a ∈ A∞, σs(a) := R−saRs is bounded and be-
longs to A∞. Furthermore, for any positive integer n, sups∈[−n,n] ‖σs(a)‖ <
Then we call the quadruple (A∞,H,D,R) an odd R-twisted spectral data.
We say that the odd twisted spectral data is Θ-summable if Re−tD
is trace-
class for all t > 0.
Let us now recall the construction of twisted Chern character from a
given odd twisted spectral data (A∞,H,D,R). Let B denote the set of
all A ∈ B(H) for which σs(A) := R−sARs ∈ B(H) for all real number s,
[D,A] ∈ B(H) and s 7→ ‖σs(A)‖ is bounded over compact subsets of the real
line. In particular, A∞ ⊆ B. We define for n ∈ IN an n+1-linear functional
Fn on B by the formula
Fn(A0, ..., An) =
Tr(A0e
...Ane
R)dt0...dtn,
where Σn = {(t0, ..., tn) : ti ≥ 0,
i=0 ti = 1}.
Let us now equip A∞ with the locally convex topology given by the fam-
ily of Banach norms ‖.‖∗,n, n = 1, 2, ..., where ‖a‖∗,n := sups∈[−n,n](‖σs(a)‖+
‖[D,σs(a)]‖). Let A denote the completion of A∞ under this topology, and
thus A is Frechet space. We can construct the (twisted) Chern character in
Hoǫ (A, σ), where σ = σ1, which extends on the whole of A by continuity.
Theorem 3.5 Let φo ≡ (φ2n+1)n be defined by
φ2n+1(a0, ..., a2n+1) =
2iF2n+1(a0, [D, a1], ..., [D, a2n+1]), ai ∈ A.
Then we have (b+B)φo = 0, hence ψo ≡ ((2n + 1)!φ2n+1)n ∈ Hoǫ (A, σ).
We shall also need some results from the theory of semifinite spectral
triples and the corresponding JLO cocycles and index formula, as discussed
in, for example, [1]. An odd semifinite spectral triple is given by (C,N ,K,D),
where K is a separable Hilbert space, N ⊆ B(K) is a von Neumann algebra
with a faithful semifinite normal trace (say τ), D is a self-adjoint operator
affiliated to N , C is a ∗-subalgebra of B(K) such that [D, c] ∈ B(K) for all
c ∈ C. In the terminology of [1], (N ,D) is also called an odd, unbounded
Breuer-Fredholm module for the norm-closure of C. It is called Θ-summable
if τ(e−tD
) < ∞ for all t > 0. For a Θ-summable semifinite spectral triple,
there is a canonical construction of JLO cocycle and index theorem (see [1]),
which are very similar to their counterparts in the conventional framework
of NCG.
Let us now settle in the affirmative conjecture made in [7] about the
nontriviality of the twisted Chern character of a natural twisted spectral data
obtained from the equivariant spectral triple of [4]. For reader’s convenience,
we briefly recall the construction of this equivariant spectral triple. Let us
index the space of irreducible (co-)representations of SUq(2) by half-integers,
i.e. n = 0, 1
, 1, ...; and index the orthonormal basis of the corresponding
(2n + 1)2 dimensional subspace of L2(SUq(2), h) by i, j = −n, ..., n, instead
of 1, 2, ..., (2n + 1). Thus, let us consider the orthonormal basis eni,j , n =
, ...; i, j = −n,−n + 1, ..., n in the notation of [4]. We consider any of
the equivariant spectral triples constructed by the authors of [4] and in the
associated Hilbert space H = L2(SUq(2), h) define the following positive
unbounded operator R :
R(eni,j) = q
−2i−2jeni,j ,
n = 0, 1
, , 1, ...; i, j = −n,−n+ 1, ..., n. Let us choose a spectral triple given
by the Dirac operator D on H, defined by
D(eni,j) = d(n, i)e
i,j ,
where d(n, i) are as in (3.12) of [4], i.e. d(n, i) = 2n + 1 if n = i, d(n, i) =
−(2n+ 1) otherwise. It can easily be seen that (A∞,H,D,R) is an odd R-
twisted spectral data and furthermore, the fixed point subalgebra SUq(2)σ
for σ(.) = R−1 ·R is the unital ∗-algebra generated by β, so it contains u =
∗β)(β−I)+I which can be chosen to be a generator of K1(SUq(2)) = Z
(see [4]). It is easily seen that the map from K1(C
∗(u)) to K1(SUq(2)),
induced by the inclusion map, is an isomorphism of the K1-groups (where
C∗(u) denotes the unital C∗-algebra generated by u). Thus, we can consider
the pairing of the twisted Chern character with K1(C
∗(u)), and in turn with
K1(SUq(2)) using the isomorphism noted before. The important question
raised in [7] is whether we recover the nontrivial pairing obtained in [4] in
our twisted framework, and in what follows, we shall give an affirmative
answer to this question.
Theorem 3.6 The pairing between K1(SUq(2)σ) ∼= K1(SUq(2)) and the
(twisted) Chern character of the above twisted spectral data coincides with
the pairing between K1(SUq(2)) and the Chern character of the (non-twisted)
spectral triple (A∞,H,D). In particular, this pairing is nontrivial.
Proof :
Let N be the von Neumann algebra in B(H) generated by β and f(D) for
all bounded measurable functions f : R → C. Since R commutes with both
β and D, it is easy to see that the functional N ∋ X 7→ τ(X) := Tr(XR)
defines a faithful, normal, semifinite trace on the von Neumann algebra N .
Moreover, (N ,D) is an unbounded Θ-summable Breuer-Fredholm module
for the norm-closure of the unital ∗-algebra (say C) generated by β.
Moreover, it follows from the fact that R commutes with D and u that
the pairing of [u] with the twisted Chern character (say ψo ≡ (ψ2n+1))
coming from the twisted spectral data (A∞,H,D,R) is given by
< [u], [ψo] >
(−1)n n!
(2n+ 1)!
ψ2n+1(u
−1, u, ..., u−1, u)
(−1)nn!
Σ2n+1
Tr(u−1e−t0D
[D,u]et1D
...[D,u]et2n+1D
R)dt0...dt2n+1,
(−1)nn!
Σ2n+1
τ(u−1e−t0D
[D,u]et1D
...[D,u]et2n+1D
)dt0...dt2n+1
which is nothing but the pairing between [u] ∈ K1(C) and the Breuer-
Fredholm module (N ,D) mentioned before. By Theorem 10.8 of [1] and
a straightforward but somewhat lengthy calculation along the lines of index
computation in [4], we can show that the value of this pairing is equal to
−indτ (A) ≡ −(τ(PA) − τ(QA)) for the following operator A : H0 → H0,
where H0 is the closed subspace spanned by {enn,j , n = 0, 12 , ..., j = −n,−n+
1, ..., n}, PA, QA are the orthogonal projections onto the kernel of A and
the kernel of A∗ respectively and where r is a positive integer such that
q2r < 1
< q2r−2 :
Aenn,j = −q(n+j)(2r+1)(1−q2(n−j))r(1−q2(n−j−1))
,j− 1
+(1−q2r(n+j)(1−q2(n−j))r)enn,j.
It can be verified by computations as in [4] that Ker(A) = {0} and Ker(A∗)
is the one dimensional subspace spanned by the vector ξ =
n,−n,
where p 1
= 1 and for n ≥ 3
1− (1− q4n−2)r
(1− q4n) 12 (1− q4n−2)r
1− (1− q2)r
(1 − q4) 12 (1− q2)r
Clearly, since Ren−n,n = e
n,−n, we have Rξ = ξ and thus
−indτ (A) =
‖ξ‖2 τ(|ξ >< ξ|) =
‖ξ‖2Tr(R|ξ >< ξ|) = 1,
which is the same as the value of the pairing between [u] ∈ K1(SUq(2))
and the conventional Chern character corresponding to the spectral triple
constructed in [4]. ✷
Thus we see that both the conventional and twisted frameworks of NCG give
essentially the same results for the example we considered, namely SUq(2).
The aparent weakness of the twisted NCG arising from the fact that the
twisted cyclic cohomology can be paired naturally with only the K theory
of the invariant subalgebra and not of the whole algebra, does not seem
to pose any essential difficulty for studying the noncommutative geometric
aspects of SUq(2), since by a suitable choice of the twisting operator R
as we did one could make sure that the K theory of the corresponding
invariant subalgebra is isomorphic with the K theory of the whole, and also
the pairing between the Chern character and the generator of the K theory
in the twisted framework is equal to the similar pairing in the ordinary
(non-twisted) framework of NCG. It will be important and interesting to
investigate whether a similar fact remains true for a larger class of quantum
groups, and we hope to pursue this in the future.
References
[1] A. Carey and J. Phillips, Spectral flow in Fredholm modules, eta
invariants and the JLO cocycle, K-Theory31 (2004), no. 2, 135–194.
[2] A. Connes, Noncommutative Geometry, Academic Press (1994).
[3] A. Connes, Cyclic Cohomology, Quantum group Symmetries and the
Local Index Formula for SUq(2), J. Inst. Math. Jussieu 3 (2004), no.
1, 17-68.
[4] P. S. Chakraborty and A. Pal, Equivariant spectral triples on the
quantum SU(2) group, K-Theory 28(2003), No. 2, 107-126.
[5] L. Dabrowski, G. Landi, A. Sitarz, W. van Suijlekom and J. C.
Varilly, The Dirac operator on SUq(2), Commun.Math.Phys. 259
(2005) 729-759.
[6] D. Goswami, Some Noncommutative Geometric Aspects of SUq(2),
preprint ( math-ph/0108003).
[7] D. Goswami, Twisted entire cyclic cohomology, J-L-O cocycles and
equivariant spectral triples, Rev. Math. Phys. 16 (2004), no. 5,
583-602.
[8] J. Kustermans, G.J. Murphy and L. Tuset, Differential Calculi over
Quantum Groups and Twisted Cyclic Cocycles, J. Geom. Phys. 44
(2003), no. 4, 570–594.
[9] T. Masuda, Y. Nakagami and J.Watanabe, Noncommutative Differ-
ential Geometry on the Quantum SU(2), I: An Algebraic Viewpoint,
K Theory 4 (1990), 157-180.
[10] S. L. Woronowicz, Twisted SU(2)-group : an example of a non-
commutative differential calculus, Publ. R. I. M. S. (Kyoto Univ.)
23(1987) 117-181.
[11] S. L. Woronowicz, Compact matrix pseudogroups, Commun. Math.
Phys. 111 (1987), no. 4, 613–665.
ABSTRACT
  We give details of the proof of the remark made in \cite{G2} that the Chern
characters of the canonical generators on the K homology of the quantum group
$SU_q(2)$ are not invariant under the natural $SU_q(2)$ coaction. Furthermore,
the conjecture made in \cite{G2} about the nontriviality of the twisted Chern
character coming from an odd equivariant spectral triple on $SU_q(2)$ is
settled in the affirmative.

<|endoftext|><|startoftext|>
7 Placeholder Substructures III: A
Bit-String-Driven “Recipe Theory” for
Infinite-Dimensional Zero-Divisor Spaces
Robert P. C. de Marrais ∗
Thothic Technology Partners, P.O.Box 3083, Plymouth MA 02361
October 29, 2018
Abstract
Zero-divisors (ZDs) derived by Cayley-Dickson Process (CDP) from N-
dimensional hypercomplex numbers (N a power of 2, and at least 4) can
represent singularities and, as N → ∞, fractals – and thereby, scale-free net-
works. Any integer > 8 and not a power of 2 generates a meta-fractal or
Sky when it is interpreted as the strut constant (S) of an ensemble of octahe-
dral vertex figures called Box-Kites (the fundamental ZD building blocks).
Remarkably simple bit-manipulation rules or recipes provide tools for trans-
forming one fractal genus into others within the context of Wolfram’s Class
4 complexity.
1 The Argument So Far
In Parts I[1] and II[2], the basic facts concerning zero-divisors (ZDs) as they arise
in the hypercomplex context were presented and proved. “Basic,” in the context of
this monograph, means seven things. First, they emerged as a side-effect of apply-
ing CDP a minimum of 4 times to the Real Number Line, doubling dimension to
the Complex Plane, Quaternion 4-Space, Octonion 8-Space, and 16-D Sedenions.
With each such doubling, new properties were found: as the price of sacrificing
∗Email address: rdemarrais@alum.mit.edu
http://arxiv.org/abs/0704.0112v3
counting order, the Imaginaries made a general theory of equations and solution-
spaces possible; the non-commutative nature of Quaternions mapped onto the re-
alities of the manner in which forces deploy in the real world, and led to vector
calculus; the non-associative nature of Octonions, meanwhile, has only come into
its own with the need for necessarily unobservable quantities (because of confor-
mal field-theoretical constraints)in String Theory. In the Sedenions, however, the
most basic assumptions of all – well-defined notions of field and algebraic norm
(and, therefore, measurement) – break down, as the phenomena correlated with
their absence, zero-divisors, appear onstage (never to leave it for all higher CDP
dimension-doublings).
Second thing: ZDs require at least two differently-indexed imaginary units
to be defined, the index being an integer larger than 0 (the CDP index of the
Real Unit) and less than 2N for a given CDP-generated collection of 2N-ions. In
“pure CDP,” the enormous number of alternative labeling schemes possible in
any given 2N-ion level are drastically reduced by assuming that units with such
indices interact by XOR-ing: the index of the product of any two is the XOR
of their indices. Signing is more tricky; but, when CDP is reduced to a 2-rule
construction kit, it becomes easy: for index u < G, G the Generator of the 2N-ions
(i.e., the power of 2 immediately larger than the highest index of the predecessor
2N−1-ions), Rule 1 says iu · iG = +i(u+G). Rule 2 says take an associative triplet
(a,b,c), assumed written in CPO (short for “cyclically positive order”: to wit,
a · b = +c, b · c = +a, and c · a = +b). Consider, for instance, any (u,G,G+ u)
index set. Then three more such associative triplets (henceforth, trips) can be
generated by adding G to two of the three, then switching their resultants’ places
in the CPO scheme. Hence, starting with the Quaternions’ (1,2,3) (which we’ll
call a Rule 0 trip, as it’s inherited from a prior level of CDP induction), Rule
1 gives us the trips (1,4,5), (2,4,6), and (3,4,7), while Rule 2 yields up the
other 4 trips defining the Octonions: (1,7,6), (2,5,7), and (3,6,5). Any ZD
in a given level of 2N-ions will then have units with one index < G, written in
lowercase, and the other index > G, written in uppercase. Such pairs, alternately
called “dyads” or “Assessors,” saturate the diagonal lines of their planes, which
diagonals never mutually zero-divide each other (or make DMZs, for ”divisors (or
dyads) making zero”), but only make DMZs with other such diagonals, in other
such Assessors. (This is, of course, the opposite situation from the projection
operators of quantum mechanics, which are diagonals in the planes formed by
Reals and dimensions spanned by Pauli spin operators contained within the 4-
space created by the Cartesian product of two standard imaginaries.)
Third thing: Such ZDs are not the only possible in CDP spaces; but they define
the “primitive” variety from which ZD spaces saturating more than 1-D regions
can be articulated. A not quite complete catalog of these can be found in our
first monograph on the theme [3]; a critical kind which was overlooked there,
involving the Reals (and hence, providing the backdrop from which to see the
projection-operator kind as a degenerate type), were first discussed more recently
[4]. (Ironically, these latter are the easiest sorts of composites to derive of any:
place the two diagonals of a DMZ pairing with differing internal signing on axes
of the same plane, and consider the diagonals they make with each other!) All the
primitive ZDs in the Sedenions can be collected on the vertices of one of 7 copies
of an Octahedron in the Box-Kite representation, each of whose 12 edges indicates
a two-way “DMZ pathway,” evenly divided between 2 varieties. For any vertex V,
and k any real scalar, indicate the diagonals this way: (V,/) = k · (iv+ iV ), while
(V, \)= k · (iv− iV ). 6 edges on a Box-Kite will always have negative edge-sign
(with unmarked ET cell entries: see the “sixth thing”). For vertices M and N,
exactly two DMZs run along the edge joining them, written thus:
(M,/) ·(N, \) = (M, \) · (N,/) = 0
The other 6 all have positive edge-sign, the diagonals of their two DMZs hav-
ing same slope (and marked – with leading dashes – ET cell entries):
(Z,/) ·(V, /)= (Z, \) ·(V,\)= 0
Fourth thing: The edges always cluster similarly, with two opposite faces
among the 8 triangles on the Box-Kite being spanned by 3 negative edges (con-
ventionally painted red in color renderings), with all other edges being positive
(painted blue). One of the red triangles has its vertices’ 3 low-index units forming
a trip; writing their vertex labels conventionally as A, B, C, we find there are in
fact always 4 such trips cycling among them: (a,b,c), the L-trip; and the three
U-trips obtained by replacing all but one of the lowercase labels in the L-trip with
uppercase: (a,B,C); (A,b,C); (A,B,c). Such a 4-trip structure is called a Sail, and
a Box-Kite has 4 of them: the Zigzag, with all negative edges, and the 3 Trefoils,
each containing two positive edges extending from one of the Zigzag vertices to
the two vertices opposite its Sailing partners. These opposite vertices are always
joined by one of the 3 negative edges comprising the Vent which is the Zigzag’s
opposite face. Again by convention, the vertices opposite A, B, C are written F,
E, D in that order; hence, the Trefoil Sails are written (A,D,E); (F,D,B), and
(F,C,E), ordered so that their lowercase renderings are equivalent to their CPO
L-trips. The graphical convention is to show the Sails as filled in, while the other 4
faces, like the Vent, are left empty: they show “where the wind blows” that keeps
the Box-Kite aloft. A real-world Box-Kite, meanwhile, would be held together
by 3 dowels (of wood or plastic, say) spanning the joins between the only vertices
left unconnected in our Octahedral rendering: the Struts linking the strut-opposite
vertices (A, F); (B, E); (C, D).
Fifth thing: In the Sedenions, the 7 isomorphic Box-Kites are differentiated by
which Octonion index is missing from the vertices, and this index is designated
by the letter S, for “signature,” “suppressed index,” or strut constant. This last
designation derives from the invariant relationship obtaining in a given Box-Kite
between S and the indices in the Vent and Zigzag termini (V and Z respectively)
of any of the 3 Struts, which we call the “First Vizier” or VZ1. This is one of
3 rules, involving the three Sedenion indices always missing from a Box-Kite’s
vertices: G, S, and their simple sum X (which is also their XOR product, since G
is always to the left of the left-most bit in S). The Second Vizier tells us that the
L-index of either terminus with the U-index of the other always form a trip with
G, and it true as written for all 2N-ions. The Third shows the relationship between
the L- and U- indices of a given Assessor, which always form a trip with X. Like
the First, it is true as written only in the Sedenions, but as an unsigned statement
about indices only, it is true universally. (For that reason, references to VZ1 and
VZ3 hereinout will be assumed to refer to the unsigned versions.) First derived in
the last section of Part I, reprised in the intro of Part II, we write them out now for
the third and final time in this monograph:
VZ1: v · z =V ·Z = S
VZ2: Z · v =V · z = G
VZ3: V · v = z ·Z = X.
Rules 1 and 2, the Three Viziers, plus the standard Octonion labeling scheme
derived from the simplest finite projective group, usually written as PSL(2,7), pro-
vide the basis of our toolkit. This last becomes powerful due to its capacity for
recursive re-use at all levels of CDP generation, not just the Octonions. The sim-
plest way to see this comes from placing the unique Rule 0 trip provided by the
Quaternions on the circle joining the 3 sides’ midpoints, with the Octonion Gen-
erator’s index, 4, being placed in the center. Then the 3 lines leading from the
Rule 0 trip’s (1, 2, 3) midpoints to their opposite angles – placed conventionally
in clockwise order in the midpoints of the left, right, and bottom sides of a triangle
whose apex is at 12 o’clock – are CPO trips forming the Struts, while the 3 sides
themselves are the Rule 2 trips. These 3 form the L-index sets of the Trefoil Sails,
while the Rule 0 trip provides the same service for the Zigzag. By a process analo-
gized to tugging on a slipcover (Part I) and pushing things into the central zone
of hot oil while wok-cooking (Part II), all 7 possible values of S in the Sedenions,
not just the 4, can be moved into the center while keeping orientations along all
7 lines of the Triangle unchanged. Part II’s critical Roundabout Theorem tells us,
moreover, that all 2N-ion ZDs, for all N > 3, are contained in Box-Kites as their
minimal ensemble size. Hence, by placing the appropriate G, S, or X in the center
of a PSL(2,7) triangle, with a suitable Rule 0 trip’s indices populating the circle,
any and all candidate primitive ZDs can be discovered and situated.
Sixth thing: The word “candidate” in the above is critical; its exploration was
the focus of Part II. For, starting with N = 5 and hence G = 16 (which is to say, in
the 32-D Pathions), whole Box-Kites can be suppressed (meaning, all 12 edges,
and not just the Struts, no longer serve as DMZ pathways). But for all N, the full
set of candidate Box-Kites are viable when S≤ 8 or equal to some higher power of
2. For all other S values, though, the phenomenon of carrybit overflow intervenes
– leading, ultimately, to the “meta-fractal” behavior claimed in our abstract. To
see this, we need another mode of representation, less tied to 3-D visualizing, than
the Box-Kite can provide. The answer is a matrix-like method of tabulating the
products of candidate ZDs with each other, called Emanation Tables or ETs. The
L-indices only of all candidate ZDs are all we need indicate (the U-indices being
forced once G is specified); these will saturate the list of allowed indices < G,
save for the value of S whose choice, along with that of G, fixes an ET. Hence,
the unique ET for given G and S will fill a square spreadsheet whose edge has
length 2N−1 −2. Moreover, a cell entry (r,c) is only filled when row and column
labels R and C form a DMZ, which can never be the case along an ET’s long
diagonals: for the diagonal starting in the upper left corner, R xor R = 0, and
the two diagonals within the same Assessor, can never zero-divide each other; for
the righthand diagonal, the convention for ordering the labels (ascending counting
order from the left and top, with any such label’s strut-opposite index immediately
being entered in the mirror-opposite positions on the right and bottom) makes R
and C strut-opposites, hence also unable to form DMZs.
For the Sedenions, we get a 6 x 6 table, 12 of whose cells (those on long
diagonals) are empty: the 24 filled cells, then, correspond to the two-way traffic
of “edge-currents” one imagines flowing between vertices on a Box-Kite’s 12
edges. A computational corollary to the Roundabout Theorem, dubbed the Trip-
Count Two-Step, is of seminal importance. It connects this most basic theorem of
ETs to the most basic fact of associative triplets, indicated in the opening pages
of Part I, namely: for any N, the number TripN of associative triplets is found,
by simple combinatorics, to be (2N −1)(2N −2)/3! – 35 for the Sedenions, 155
for the Pathions, and so on. But, by Trip-Count Two-Step, we also know that the
maximum number of Box-Kites that can fill a 2N-ion ET = TripN−2. For S a power
of 2, beginning in the Pathions (for S= 25−2 = 8), the Number Hub Theorem says
the upper left quadrant of the ET is an unsigned multiplication table of the 2N−2-
ions in question, with the 0’s of the long diagonal (indicated Real negative units)
replaced by blanks – a result effectively synonymous with the Trip-Count Two-
Step.
Seventh thing: We found, as Part II’s argument wound down, that the 2 classes
of ETs found in the Pathions – the “normal” for S ≤ 8, filled with indices for all
7 possible Box-Kites, and the “sparse” so-called Sand Mandalas, showing only
3 Box-Kites when 8 < S < 16, were just the beginning of the story. A simple
formula involving just the bit-string of s and g, where the lowercase indicates the
values of S and G modulo G/2, gave the prototype of our first recipe: all and
only cells with labels R or C, or content P ( = R xor C ), are filled in the ET. The
4 “missing Box-Kites” were those whose L-index trip would have been that of a
Sail in the 2N−1 realm with S = s and G = g. The sequence of 7 ETs, viewed in
S-increasing succession, had an obvious visual logic leading to their being dubbed
a flip-book. These 7 were obviously indistinguishable from many vantages, hence
formed a spectrographic band. There were 3 distinct such bands, though, each
typified by a Box-Kite count common to all band-members, demonstrable in the
ETs for the 64-D Chingons. Each band contained S values bracketed by multiples
of 8 (either less than or equal to the higher, depending upon whether the latter
was or wasn’t a power of 2). These were claimed to underwrite behaviors in all
higher 2N-ion ETs, according to 3 rough patterns in need of algorithmic refining
in this Part III. Corresponding to the first unfilled band, with ETs always missing
4N−4 of their candidate Box-Kites for N > 4, we spoke of recursivity, meaning
the ETs for constant S and increasing N would all obey the same recipe, properly
abstracted from that just cited above, empirically found among the Pathions for
S > 8. The second and third behaviors, dubbed, for S ascending, (s,g)-modularity
and hide/fill involution respectively, make their first showings in the Chingons, in
the bands where 16 < S ≤ 24, and then where 24 < S < 32. In all such cases,
we are concerned with seeing the “period-doubling” inherent in CDP and Chaotic
attractors both become manifest in a repeated doubling of ET edge-size, leading
to the fixed-S, N increasing analog of the fixed-N,S increasing flip-books first ob-
served in the Pathions, which we call balloon-rides. Specifying and proving their
workings, and combining all 3 of the above-designated behaviors into the “funda-
mental theorem of zero-division algebra,” will be our goals in this final Part III.
Anyone who has read this far is encouraged to bring up the graphical complement
to this monograph, the 78-slide Powerpoint show presented at NKS 2006 [5], in
another window. (Slides will be referenced by number in what follows.)
2 8 < S < 16,N → ∞ : Recursive Balloon Rides in
the Whorfian Sky
We know that any ET for the 2N-ions is a square whose edge is 2N−1 − 2 cells.
How, then, can any simply recursive rule govern exporting the structure of one
such box to analogous boxes for progressively higher N? The answer: include
the label lines – not just the column and row headers running across the top and
left margins, but their strut-opposite values, placed along the bottom and right
margins, which are mirror-reversed copies of the label-lines (LLs) proper to which
they are parallel. This increases the edge-size of the ET box to 2N−1.
Theorem 11. For any fixed S > 8 and not a power of 2, the row and column indices
comprising the Label Lines (LLs) run along the left and top borders of the 2N-
ion ET ”spreadsheet” for that S. Treat them as included in the spreadsheet, as
labels, by adding a row and column to the given square of cells, of edge 2N−1−2,
which comprises the ET proper. Then add another row and column to include
the strut-opposite values of these labels’ indices in “mirror LLs,” running along
the opposite edges of a now 2N−1-edge-length box, whose four corner cells, like
the long diagonals they extend, are empty. When, for such a fixed S, the ET for
the 2N+1-ions is produced, the values of the 4 sets of LL indices, bounding the
contained 2N-ion ET, correspond, as cell values, to actual DMZ P-values in the
bigger ET, residing in the rows and columns labeled by the contained ET’s G and
X (the containing ET’s g and g+S). Moreover, all cells contained in the box they
bound in the containing ET have P-values (else blanks) exactly corresponding to –
and including edge-sign markings of – the positionally identical cells in the 2N-ion
ET: those, that is, for which the LLs act as labels.
Proof. For all strut constants of interest, S < g(= G/2); hence, all labels up
to and including that immediately adjoining its own strut constant (that is, the
first half of them) will have indices monotonically increasing, up to and at least
including the midline bound, from 1 to g − 1. When N is incremented by 1,
the row and column midlines separating adjoining strut-opposites will be cut and
pulled apart, making room for the labels for the 2N+1-ion ET for same S, which
middle range of label indices will also monotonically increase, this time from
the current 2N-ion generation’s g (and prior generation’s G), up to and at least
including its own midline bound, which will be g plus the number of cells in
the LL inherited from the prior generation, or g/2− 1. The LLs are therefore
contained in the rows and columns headed by g and its strut opposite, g+S. To
say that the immediately prior CDP generation’s ET labels are converted to the
current generation’s P-values in the just-specified rows and columns is equivalent
to asserting the truth of the following calculation:
(g+u)+(sg) · (G+g+uopp)
g + (G+g+S)
−(vz) · (G+uopp) +(vz) · (sg) ·u
+u − (sg) · (G+uopp)
0 only if vz = (−sg)
Here, we use two binary variables, the inner-sign-setting sg, and the Vent-or-
Zigzag test, based on the First Vizier. Using the two in tandem lets us handle the
normal and “Type II” box-kites in the same proof. Recall (and see Appendix B of
Part II for a quick refresher) that while the “Type I” is the only type we find in the
Sedenions, we find that a second variety emerges in the Pathions, indistinguishable
from Type I in most contexts of interest to us here: the orientation of 2 of the
3 struts will be reversed (which is why VZ1 and VZ3 are only true generally
when unsigned). For a Type I, since S < g, we know by Rule 1 that we have the
trip (S,g,g+S); hence, g – for all 2N-ions beyond the Pathions, where the Sand
Mandalas’ g = 8 is the L-index of the Zigzag B Assessor – must be a Vent (and
its strut-opposite, g+S, a Zigzag). For a Type II, however, this is necessarily so
only for 1 of the 3 struts – which means, per the equation above, that sg must be
reversed to obtain the same result. Said another way, we are free to assume either
signing of vz means +1, so the “only if” qualifying the zero result is informative.
It is u and its relationship to g+ u that is of interest here, and this formulation
makes it easier to see that the products hold for arbitrary LL indices u or their
strut-opposites. But for this, the term-by-term computations should seem routine:
the left bottom is the Rule 1 outcome of (u,g,g+ u): obviously, any u index
must be less than g. To its right, we use the trip (uopp,g,g+ uopp) → (G+ g+
uopp,g,G+uopp), whose CPO order is opposite that of the multiplication. For the
top left, we use (u,S,uopp) as limned above, then augment by g, then G, leaving
uopp unaffected in the first augmenting, and g + u in the second. Finally, the
top right (ignoring sg and vz momentarily) is obtained this way: (u,S,uopp) →
(u,g+uopp,g+S)→ (u,G+g+ s,G+g+uopp); ergo, +u.
Note that we cannot eke out any information about edge-sign marks from this
setup: since labels, as such, have no marks, we have nothing to go on – unlike all
other cells which our recursive operations will work on. Indeed, the exact algo-
rithmic determination of edge-sign marks for labels is not so trivial: as one iterates
through higher N values, some segments of LL indexing will display reversals of
marks found in the ascending or descending left midline column, while other seg-
ments will show them unchanged – with key values at the beginnings and ends
of such octaves (multiples of 8, and sums of such multiples with S mod 8) some-
times being reversed or kept the same irrespective of the behavior of the terms
they bound. Fortunately, such behaviors are of no real concern here – but they are,
nevertheless, worth pointing out, given the easy predictability of other edge-sign
marks in our recursion operations.
Now for the ET box within the labels: if all values (including edge-sign marks)
remain unchanged as we move from the 2N-ion ET to that for the 2N+1-ions, then
one of 3 situations must obtain: the inner-box cells have labels u,v which belong to
some Zigzag L-trip (u,v,w); or, on the contrary, they correspond to Vent L-indices
– the first two terms in the CPO triplet (wopp,vopp,u), for instance; else, finally,
one term is a Vent, the other a Zigzag (so that inner-signs of their multiplied dyads
are both positive): we will write them, in CPO order, vopp and u, with third trip
member wopp. Clearly, we want all the products in the containing ET to indicate
DMZs only if the inner ET’s cells do similarly. This is easily arranged: for the
containing ET’s cells have indices identical to those of the contained ET’s, save
for the appending of g to both (and ditto for the U-indices).
Case 1: If (u,v,w) form a Zigzag L-index set, then so do (g+ v,g+u,w), so
markings remain unchanged; and if the (u,v) cell entry is blank in the contained,
so will be that for (g+ u,g+ v) in its container. In other words, the following
holds:
(g+ v)+(sg) · (G+g+ vopp)
(g+u) + (G+g+uopp)
−(G+wopp) − (sg) ·w
−w − (sg) · (G+wopp)
0 only if sg = (−1)
(g+u) · (g+ v) = P : (u,v,w)→ (g+ v,g+u,w); hence, (−w).
(g+ u) · (sg) · (G+ g+ vopp) = P : (u,wopp,vopp) → (g+ vopp,wopp,g+ u)
→ (G+wopp,G+g+ vopp,g+u); hence, (sg) · (−(G+wopp)).
(G+g+uopp) · (g+ v) = P : (uopp,wopp,v)→ (g+ v,wopp,g+uopp) → (g+
v,G+g+uopp,G+wopp); hence, (−(G+wopp)).
(G+g+uopp) ·(G+g+vopp) = P : Rule 2 twice to the same two terms yields
the same result as the terms in the raw, hence (−w).
Clearly, cycling through (u,v,w) to consider (g+ v) · (g+ w) will give the
exactly analogous result, forcing two (hence three) negative inner-signs in the
candidate Sail; hence, if we have DMZs at all, we have a Zigzag Sail.
Case 2: The product of two Vents must have negative edge-sign, and there’s
no cycling through same-inner-signed products as with the Zigzag, so we’ll just
write our setup as a one-off, with upper inner-sign explicitly negative, and claim
its outcome true.
(g+ vopp)− (G+g+ v)
(g+wopp) + (G+g+w)
+(G+uopp) +u
−u − (G+uopp)
(g+wopp) · (g+ vopp) = P : (wopp,vopp,u) → (g+ vopp,g+wopp,u); hence,
(−u).
(g+wopp) · (G+g+v) = P : (wopp,v,uopp)→ (g+v,g+wopp,uopp)→ (G+
uopp,g+wopp,G+ g+ v); but inner sign of upper dyad is negative, so (−(G+
uopp)).
(G+g+w) · (g+vopp) = P : (vopp,uopp,w)→ (g+w,uopp,g+vopp)→ (G+
uopp,G+g+w,g+ vopp); hence, (+(G+uopp)).
(G+g+w) · (G+g+ v) = P : Rule 2 twice to the same two terms yields the
same result as the terms in the raw; but inner sign of upper dyad is negative, so
(+u).
Case 3: The product of Vent and Zigzag displays same inner sign in both
dyads; hence the following arithmetic holds:
(g+u)+(G+g+uopp)
(g+ vopp) + (G+g+ v)
−(G+w) +wopp
−wopp +(G+w)
The calculations are sufficiently similar to the two prior cases as to make their
writing out tedious. It is clear that, in each of our three cases, content and marking
of each cell in the contained ET and the overlapping portion of the container ET
are identical. �
To highlight the rather magical label/content involution that occurs when N is
in- or de- cremented, graphical realizations of such nested patterns, as in Slides
60-61, paint LLs (and labels proper) a sky-blue color. The bottom-most ET being
overlaid in the central box has g = the maximum high-bit in S, and is dubbed the
inner skybox. The degree of nesting is strictly measured by counting the number
of bits B that a given skybox’s g is to the left of this strut-constant high-bit. If we
partition the inner skybox into quadrants defined by the midlines, and count the
number Q of quadrant-sized boxes along one or the other long diagonal, it is obvi-
ous that the inner skybox itself has B = 0 and Q = 1; the nested skyboxes contain-
ing it have Q = 2B. If recursion of skybox nesting be continued indefinitely – to
the fractal limit, which terminology we will clarify shortly – the indices contained
in filled cells of any skybox can be interpreted in B distinct ways, B → ∞, as rep-
resentations of distinct ZDs with differing G and, therefore, differing U-indices.
By obvious analogy to the theory of Riemann surfaces in complex analysis, each
such skybox is a separate “sheet”; as with even such simple functions as the log-
arithmic, the number of such sheets is infinite. We could then think of the infinite
sequence of skyboxes as so many cross-sections, at constant distances, of a flash-
light beam whose intensity (one over the ET’s cell count) follows Kepler’s inverse
square law. Alternatively, we could ignore the sheeting and see things another
Where we called fixed-N, S varying sequences of ETs flip-books, we refer to
fixed-S, N varying sequences as balloon rides: the image is suggested by David
Niven’s role as Phineas Fogg in the movie made of Jules Vernes’ Around the World
in 80 Days: to ascend higher, David would drop a sandbag over the side of his hot-
air balloon’s basket; if coming down, he would pull a cord that released some of
the balloon’s steam. Each such navigational tactic is easy to envision as a bit-
shift, pushing G further to the left to cross LLs into a higher skybox, else moving
Figure 1: ETs for S=15, N=5,6,7 (nested skyboxes in blue)· · · and “fractal limit.”
it rightward to descend. Using S = 15 as the basis of a 3-stage balloon-ride, we
see how increasing N from 5 to 6 to 7 approaches the white-space complement
of one of the simplest (and least efficient) plane-filling fractals, the Cesàro double
sweep [6, p. 65].
The graphics were programmatically generated prior to the proving of the the-
orems we’re elaborating: their empirical evidence was what informed (indeed,
demanded) the theoretical apparatus. And we are not quite finished with the cur-
rent task the apparatus requires of us. We need two more theorems to finish the
discussion of skybox recursion. For both, suppose some skybox with B = k, k any
non-negative integer, is nested in one with B = k + 1. Divide the former along
midlines to frame its four quadrants, then block out the latter skybox into a 4×4
grid of same-sized window panes, partitioned by the one-cell-thick borders of
its own midlines into quadrants, each of which is further subdivided by the out-
side edges of the 4 one-cell-thick label lines and their extensions to the window’s
frame. These extended LLs are themselves NSLs, and have R,C values of g and
g+S; for S = 15, they also adjoin NSLs along their outer edges whose R,C val-
ues are multiples of 8 plus S mod 8. These pane-framing pairs of NSLs we will
henceforth refer to (as a windowmaker would) as muntins. It is easy to calculate
that while the inner skybox has but one muntin each among its rows and columns,
each further nesting has 2B+1 − 1. But we are getting ahead of ourselves, as we
still have two proofs to finish. Let’s begin with Four Corners, or
Theorem 12. The 4 panes in the corners of the 16-paned B = k + 1 window are
identical in contents and marks to the analogously placed quadrants of the B = k
skybox.
Proof. Invoke the Zero-Padding Lemma with regard to the U-indices, as the labels
of the boxes in the corners of the B = k+1 ET are identical to those of the same-
sized quadrants in the B = k ET, all labels ≥ the latter’s g only occurring in the
newly inserted region. �
Remarks. For N = 6, all filled Four Corners cells indicate edges belonging to
3 Box-Kites, whose edges they in fact exhaust. These 3, not surprisingly, are
the zero-padded versions of the identically L-indexed trio which span the entirety
of the N = 5 ET. By calculations we’ll see shortly, however, the inner skybox,
when considered as part of the N = 6 ET, has filled cells belonging to all the
other 16 Box-Kites, even though the contents of these cells are identical to those
in the N = 5 ET. As B increases, then, the “sheets” covering this same central
region must draw upon progressively more extensive networks of interconnected
Box-Kites. As we approach the fractal limit – and “the Sky is the limit” – these
networks hence become scale-free. (Corollarily, for N = 7, the Four Corners’ cells
exhaust all the edges of the N = 6 ET’s 19 Box-Kites, and so on.)
Unlike a standard fractal, however, such a Sky merits the prefix “meta”: for
each empty ET cell corresponds to a point in the usual fractal variety; and each
pair of filled ET cells, having (r,c) of one = (c,r) of the other), correspond to
diagonal-pairs in Assessor planes, orthogonal to all other such diagonal-pairs be-
longing to the other cells. Each empty ET cell, in other words, not only corre-
sponds to a point in the usual plane-confined fractal, but belongs to the comple-
ment of the filled cells’ infinite number of dimensions framing the Sky’s meta-
fractal.
We’ve one last thing to prove here. The French Windows Theorem shows us
the way the cell contents of the pairs of panes contained between the B = k+ 1
skybox’s corners are generated from those of the analogous pairings of quadrants
in the B = k skybox, by adding g to L-indices.
Theorem 13. For each half-square array of cells created by one or the other midline
(the French windows), each cell in the half-square parallel to that adjoining the
midline (one of the two shutters), but itself adjacent to the label-line delimiting
the former’s bounds, has content equal to g plus that of the cell on the same line
orthogonal to the midline, and at the same distance from it, as it is from the label-
line. All the empty long-diagonal cells then map to g (and are marked), or g+S
(and are unmarked). Filled cells in extensions of the label-lines bounding each
shutter are calculated similarly, but with reversed markings; all other cells in a
shutter have the same marks as their French-window counterparts.
Preamble. Note that there can be (as we shall see when we speak of hide/fill
involution) cells left empty for rule-based reasons other than P = R⊻C = 0 | S.
The shutter-based counterparts of such French-window cells, unlike those of long-
diagonal cells, remain empty.
Proof. The top and left (bottom and right) shutters are equivalent: one merely
switches row for column labels. Top/left and bottom/right shutter-sets are likewise
equivalent by the symmetry of strut-opposites. We hence make the case for the
left shutter only. But for the novelties posed by the initially blank cells and the
label lines (with the only real subtleties involving markings), the proof proceeds
in a manner very similar to Theorem 11: split into 3 cases, based on whether (1)
the L-index trip implied by the R,C,P values is a Zigzag; (2) u,v are both Vents;
or, (3) the edge signified by the cell content is the emanation of same-inner-signed
dyads (that is, one is a Vent, the other a Zigzag).
Case 1: Assume (u,v,w) a Zigzag L-trip in the French window’s contained
skybox; the general product in its shutter is
v − (G+ vopp)
(g+u) + (G+g+uopp)
−(G+g+wopp) +(g+w)
−(g+w) +(G+g+wopp)
(g+u) · v = P : (u,v,w)→ (g+w,v,g+u); hence, (−(g+w)).
(g+ u) · (G+ vopp) = P : (u,wopp,vopp) → (g+wopp,g+ u,vopp) → (G+
vopp,g+u,G+g+wopp); dyads’ opposite inner signs make (G+g+wopp) pos-
itive.
(G+g+uopp) · v = P : (uopp,wopp,v) → (g+wopp,g+uopp,v) → (G+g+
uopp,G+g+wopp,v); hence, (−(G+g+wopp)).
(G+g+uopp) · (G+ vopp) = P : (vopp,uopp,w) → (vopp,g+w,g+uopp) →
(G+g+uopp,g+w,G+vopp); dyads’ opposite inner signs make (g+w) positive.
Case 2: The product of two Vents must have negative edge-sign, hence nega-
tive inner sign in top dyad to lower dyad’s positive. The shutter product thus looks
like this:
(uopp)− (G+u)
(g+ vopp) + (G+g+ v)
+(G+g+wopp) +(g+w)
−(g+w) − (G+g+wopp)
(g+vopp) ·uopp = P : (vopp,uopp,w)→ (g+w,uopp,g+vopp); hence, (−(g+
(g+ vopp) · (G+ u) = P : (vopp,u,wopp) → (g+wopp,u,g+ vopp) → (G+
u,G+ g+wopp,g+ vopp); but dyads’ inner signs are opposite, so (−(G+ g+
wopp)).
(G+g+v) ·uopp = P : (uopp,wopp,v)→ (uopp,g+v,g+wopp)→ (uopp,G+
g+wopp,G+g+ v); hence, (+(G+g+wopp)).
(G+g+v) ·(G+u)= P : (u,v,w)→ (u,g+w,g+v)→ (G+g+v,g+w,G+
u); but dyads’ inner signs are opposite, so (+(g+w)).
Case 3: The product of Vent and Zigzag displays same inner sign in both
dyads; hence the following arithmetic holds:
(uopp)+(G+u)
(g+ v) + (G+g+ vopp)
+(G+g+w) +(g+wopp)
−(g+wopp) − (G+g+w)
As with the last case in Theorem 11, we omit the term-by-term calculations for
this last case, as they should seem “much of a muchness” by this point. What is
clear in all three cases is that index values of shutter cells have same markings
as their French-window counterparts, at least for all cells which have markings in
the contained skybox; but, in all cases, indices are augmented by g.
The assignment of marks to the shutter-cells linked to blank cells in French
windows is straightforward for Type I box-kites: since any containing skybox
must have g > S, and since g+ s has g as its strut opposite, then the First Vizier
tells us that any g must be a Vent. But then the R,C indices of the cell containing
g must belong to a Trefoil in such a box-kite; hence, one is a Vent, the other a
Zigzag, and g must be marked. Only if the R,C,P entry in the ET is necessar-
ily confined to a Type II box-kites will this not necessarily be so. But Part II’s
Appendix B made clear that Type II’s are generated by excluding g from their
L-indices: recall that, in the Pathions, for all S ¡ 8, all and only Type II box-kites
are created by placing one of the Sedenion Zigzag L-trips on the “Rule 0” circle
of the PSL(2,7) triangle with 8 in the middle (and hence excluded). This is a box-
kite in its own right (one of the 7 “Atlas” box-kites with S = 8); its 3 sides are
“Rule 2” triplets, and generate Type II box-kites when made into zigzag L-index
sets. Conversely, all Pathion box-kites containing an ’8’ in an L-index (dubbed
”strongboxes” in Appendix B) are Type I. Whether something peculiar might oc-
cur for large N (where there might be multiple powers of 2 playing roles in the
same box-kite) is a matter of marginal interest to present concerns, and will be left
as an open question for the present. We merely note that, by a similar argument,
and with the same restrictions assumed, g+S must be a Zigzag L-index, and R,C
either both be likewise (hence, g+S is unmarked); or, both are Vents in a Trefoil
(so g+S must be unmarked here too).
The last detail – reversal of label-line markings in their g-augmented shutter-
cell extensions – is demonstrated as follows, with the same caveat concerning
Type II box-kites assumed to apply. Such cells house DMZs (just swap u for g+u
in Theorem 11’s first setup – they form a Rule 1 trip – and compute). The LL
extension on top has row-label g; that along the bottom, the strut-opposite g+S.
Given trip (u,v,w), the shutter-cell index for R,C = (g,u) corresponds to French-
window index for R,C = (g,g+u). But (u,g,g+u) is a Trefoil, since g is a Vent.
So if u is one too, g+u isn’t; hence marks are reversed as claimed. �
3 Maximal High-Bit Singletons: (s,g)-Modularity for
16 < S ≤ 24
The Whorfian Sky, having but one high bit in its strut constant, is the simplest
possible meta-fractal – the first of an infinite number of such infinite-dimensional
zero-divisor-spanned spaces. We can consider the general case of such single-
ton high-bit recursiveness in two different, complementary ways. First, we can
supplement the just-concluded series of theorems and proofs with a calculational
interlude, where we consider the iterative embeddings of the Pathion Sand Man-
dalas in the infinite cascade of boxes-within-boxes that a Sky oversees. Then, we
can generalize what we saw in the Pathions to consider the phenomenology of
strut constants with singleton high-bits, which we take to be any bits representing
a power of 2 ≥ 3 if S contains low bits (is not a multiple of 8), else a power of 2
strictly greater than 3 otherwise. Per our earlier notation, g = G/2 is the highest
such singleton bit possible. We can think of its exponential increments – equiva-
lent to left-shifts in bit-string terms – as the side-effects of conjoint zero-padding
of N and S. This will be our second topic in this section.
Maintaining our use of S = 15 as exemplary, we have already seen that NSLs
come in quartets: a row and column are each headed by S mod g (henceforth, s)
and g, hence 7 and 8 in the Sand Mandalas. But each recursive embedding of the
current skybox in the next creates further quartets. Division down the midlines
to insert the indices new to the next CDP generation induces the Sand Mandala’s
adjoining strut-opposite sets of s and g lines (the pane-framing muntins) to be
displaced to the borders of the four corners and shutters, with the new skybox’s
g and g+ s now adjoining the old s and g to form new muntins, on the right and
left respectively, while g+g/2 (the old G+g) and its strut opposite form a third
muntin along the new midlines. Continuing this recursive nesting of skyboxes
generates 1, 3, 7, · · ·, 2B+1 −1 row-and-column muntin pairs involving multiples
of 8 and their supplementings by s, where (recalling earlier notation) B = 0 for the
inner skybox, and increments by 1 with each further nesting. Put another way, we
then have a muntin number µ = (2N−4 −1), or 4µ NSL’s in all.
The ET for given N has (2N−1 −2) cells in each row and column. But NSLs
divvy them up into boxes, so that each line is crossed by 2µ others, with the 0,
2 or 4 cells in their overlap also belonging to diagonals. The number of cells in
the overlap-free segments of the lines, or ω , is then just 4µ · (2N−1 − 2− 2µ) =
24µ(µ +1): an integer number of Box-Kites. For our S = 15 case, the minimized
line shuffling makes this obvious: all boxes are 6 x 6, with 2-cell-thick boundaries
(the muntins separating the panes), with µ boundaries, and (µ + 1) overlap-free
cells per each row or column, per each quartet of lines.
The contribution from diagonals, or δ , is a little more difficult, but straight-
forward in our case of interest: 4 sets of 1,2,3, · · · ,µ boxes are spanned by mov-
ing along one empty long diagonal before encountering the other, with each box
contributing 6, and each overlap zone between adjacent boxes adding 2. Hence,
δ = 24 · (2N−3 − 1)(2N−3 − 2)/6 – a formula familiar from associative-triplet
counting: it also contributes an integer number of Box-Kites. The one-liner we
want, then, is this:
BKN, 8<S<16 = ω +δ = (2N−4)(2N−4 −1) + (2N−3 −1)(2N−3 −2)/6
For N = 4,5,6,7,8,9,10, this formula gives 0,3,19,91,395, 1643,6699. Add
4N−4 to each – the immediate side-effect of the offing of all four Rule 0 candidate
trips of the Sedenion Box-Kite exploded into the Sand Mandala that begins the
recursion – and one gets “déjà vu all over again”: 1, 7, 35, 155, 651, 2667, 10795
– the full set of Box-Kites for S ≤ 8.
It would be nice if such numbers showed up in unsuspected places, having
nothing to do with ZDs. Such a candidate context does, in fact, present itself, in Ed
Pegg’s regular MAA column on “Math Games” focusing on “Tournament Dice.”
[7] He asks us, “What dice make a non-transitive four player game, so that if three
dice are chosen, a fourth die in the set beats all three? How many dice are needed
for a five player non-transitive game, or more?” The low solution of 3 explicitly
involves PSL(2,7); the next solution of 19 entails calculations that look a lot like
those involved in computing row and column headers in ETs. No solutions to the
dice-selecting game beyond 19 are known. The above formulae, though, suggest
the next should be 91. Here, ZDs have no apparent role save as dummies, like the
infinity of complex dimensions in a Fourier-series convergence problem, tossed
out the window once the solution is in hand. Can a number-theory fractal, with
intrinsically structured cell content (something other, non-meta, fractals lack) be
of service in this case – and, if not in this particular problem, in others like it?
Now let’s consider the more general situation, where the singleton high-bit
can be progressively left-shifted. Reverting to the use of the simplest case as
exemplary, use S = g+1 = 9 in the Pathions, then do tandem left-shifts to pro-
duce this sequence: N = 6, S = g+1 = 17; N = 7, S = g+1 = 33; · · · ; N = K,
S = g+1 = 2K−2 + 1. A simple rule governs these ratchetings: in all cases, the
number of filled cells = 6 · (2N−1 − 4), since there are two sets of parallel sides
which are filled but for long-diagonal intersections, and two sets of g and 1 entries
distributed one per row along orthogonals to the empty long diagonals. Hence,
for the series just given, we have cell counts of 72, 168, · · · , 6 · (2N−1 − 4) for
BKN, S = 3, 7, · · · , 2
N−3 − 1, for g < S < g + 8 = G in the Pathions, and all
g < S ≤ g+8 in the Chingons, 27-ions, and general 2N-ions, in that order.
Algorithmically, the situation is just as easy to see: the splitting of dyads,
sending U- and L- indices to strut-opposite Assessors, while incorporating the S
and G of the current CDP generation as strut-opposites in the next, continues. For
S = 17 in the Chingons, there are now 2N−3 −1 = 7, not 3, Box-Kites sharing the
new g = 16 (at B) and S mod g = 1 (at E) in our running example. The U- indices
of the Sand Mandala Assessors for S = g+1 = 9 are now L-indices, and so on:
every integer < G and 6= S gets to be an L-index of one of the 30(= 2N−1 − 2)
Assessors, as 16 and S mod g = 1 appear in each of the 7 Box-Kites, with each
other eligible integer appearing once only in one of the 7 ·4= 28 available L-index
slots.
As an aside, in all 7 cases, writing the smallest Zigzag L-index at a mandates
all the Trefoil trips be “precessed” – a phenomenon also observed in the S = 8
Pathion case, as tabulated on p. 14 of [8]. For Zigzag L-index set (2,16,18),
for instance, (a,d,e) = (2,3,1) instead of (1,2,3); ( f ,c,e) = (19,18,1) not
(1,19,18); and ( f ,d,b) = (19,3,16). But otherwise, there are no surprises: for
N = 7, there are (27−3 − 1) = 15 Box-Kites, with all 62(= 2N−1 − 2) available
cells in the rows and columns linked to labels g and S mod g being filled, and so
Note that this formulation obtains for any and all S > 8 where the maximum
high-bit (that is, g) is included in its bitstring: for, with g at B and S mod g at
E, whichever R,C label is not one of these suffices to completely determine the
remaining Assessor L-indices, so that no other bits in S play a role in determining
any of them. Meanwhile, cell contents P containing either g or S mod g, but
created by XORing of row and column labels equal to neither, are arrayed in off-
diagonal pairs, forming disjoint sets parallel or perpendicular to the two empty
ones. If we write S mod g with a lower-case s, then we could call the rule in
play here (s,g)-modularity. Using the vertical pipe for logical or, and recalling the
special handling required by the 8-bit when S is a multiple of 8 (which we signify
with the asterisk suffixed to “mod”), we can shorthand its workings this way:
Theorem 14. For a 2N-ion inner skybox whose strut constant S has a singleton
high-bit which is maximal (that is, equal to g = G/2 = 2N−2), the recipe for its
filled cells can be condensed thus:
R | C| P = g | S mod∗ g
Under recursion, the recipe needs to be modified so as to include not just the
inner-skybox g and S mod∗ g (henceforth, simply lowercase s), but all integer
multiples k of g less than the G of the outermost skybox, plus their strut opposites
k ·g+ s.
Proof. The theorem merely boils down the computational arguments of prior para-
graphs in this section, then applies the last section’s recursive procedures to them.
The first claim of the proof is identical to what we’ve already seen for Sand Man-
dalas, with zero-padding injected into the argument. The second claim merely
assumes the area quadrupling based on midline splitting, with the side-effects al-
ready discussed. No formal proof, then, is called for beyond these points. �
Remarks. Using the computations from two paragraphs prior to the theorem’s
statement, we can readily calculate the box-kite count for any skybox, no matter
how deeply nested: recall the formula 6 · (2N−1 − 4) for BKN, S = 2
N−3 − 1. It
then becomes a straightforward matter to calculate, as well, the limiting ratio of
this count to the maximal full count possible for the ET as N → ∞, with each cell
approaching a point in a standard 2-D fractal. Hence, for any S with a singleton
high-bit in evidence, there exists a Sky containing all recursive redoublings of its
inner skybox, and computations like those just considered can further be used to
specify fractal dimensions and the like. (Such computations, however, will not
concern us.) Finally, recall that, by spectrographic equivalence, all such compu-
tations will lead to the same results for each S value in the same spectral band or
octave.
4 Hide/Fill Involution: Further-Right High-Bits with
24 < S < 32.
Recall that, in the Sand Mandala flip-book, each increment of S moved the two
sets of orthogonal parallel lines one cell closer toward their opposite numbers:
while S = 9 had two filled-in rows and columns forming a square missing its cor-
ners, the progression culminating in S = 15 showed a cross-hairs configuration:
the parallel lines of cells now abutted each other in 2-ply horizontal and vertical
arrays. The same basic progression is on display in the Chingons, starting with
S = 17. But now the number of strut-opposite cell pairs in each row and column
is 15, not 7, so the cross-hairs pattern can’t arise until S = 31. Yet it never arises
in quite the manner expected, as something quite singular transpires just after flip-
ping past the ET in the middle, for S = 24. Here, rows and columns labeled 8 and
16 constrain a square of empty cells in the center · · · quickly followed by an ET
which seems to continue the expected trajectory – except that almost all the non-
long-diagonal cells left empty in its predecessor ETs are now inexplicably filled.
More, there is a method to the “almost all” as well: for we now see not 2, but 4
rows and columns, all being blanked out while those labeled with g and S mod g
are being filled in.
This is an inevitable side effect of a second high-bit in S: we call this phe-
nomenon, first appearing in the Chingons, hide/fill involution. There are 4, not 2,
line-pairs, because S and G, modulo a lower power of 2 (because devolving upon
a prior CDP generation’s g), offer twice the possibilities: for S = 25, S mod 16 is
now 9, but S mod 8 can result in either 1 or 17 as well – with correlated multiples
of 8 (8 proper, and 24) defining the other two pairings. All cells with R |C | P
equal to one of these 4 values, but for the handful already set to “on” by the first
high-bit, will now be set to “off,” while all other non-long-diagonal cells set to
“off” in the Pathion Sand Mandalas are suddenly “on.” What results for each
Chingon ET with 24 < S < 32 is an ensemble comprised of 23 Box-Kites. (For
the flip-book, see Slides 40 – 54.) Why does this happen? The logic is as straight-
forward as the effect can seem mysterious, and is akin, for good reason, to the
involutory effect on trip orientation induced by Rule 2 addings of G to 2 of the
trip’s 3 indices.
In order to grasp it, we need only to consider another pair of abstract calcula-
tion setups, of the sort we’ve seen already many times. The first is the core of the
Two-Bit Theorem, which we state and prove as follows:
Theorem 15. 2N-ion dyads making DMZs before augmenting S with a new high-bit
no longer do so after the fact.
Proof. Suppose the high-bit in the bitstring representation of S is 2K, K < (N −
1). Suppose further that, for some L-index trip (u,v,w), the Assessors U and V
are DMZ’s, with their dyads having same inner signs. (This last assumption is
strictly to ease calculations, and not substantive: we could, as earlier, use one
or more binary variables of the sg type to cover all cases explicitly, including
Type I vs. Type II box-kites. To keep things simple, we assume Type I in what
follows.) We then have (u+ u ·X)(v+ v · X) = (u+U)(v+V ) = 0. But now
suppose, without changing N, we add a bit somewhere further to the left to S, so
that S < (2K = L) < G. The augmented strut constant now equals SL = S+L.
One of our L-indices, say v, belongs to a Vent Assessor thanks to the assumed
inner signing; hence, by Rule 2 and the Third Vizier, (V,v,X)→ (X +L,v,V +L).
Its DMZ partner u, meanwhile, must thereby be a Zigzag L-index, which means
(u,U,X)→ (u,X +L,U +L). We claim the truth of the following arithmetic:
v + (V +L)
u + (U +L)
+(W + L) +w
+ w − (W +L)
NOT ZERO (+w’s don’t cancel)
The left bottom product is given. The product to its right is derived as follows:
since u is a Zigzag L-index, the Trefoil U-trip (u,V,W) has the same orientation as
(u,v,w), so that Rule 2 → (u,W +L,V +L), implying the negative result shown.
The left product on the top line, though, has terms derived from a Trefoil U-trip
lacking a Zigzag L-index, so that only after Rule 2 reversal are the letters arrayed
in Zigzag L-trip order: (U +L,v,W +L). Ergo, +(W +L). Similarly for the top
right: Rule 2 reversal “straightens out” the Trefoil U-trip, to give (U+L,V +L,w);
therefore, (+w) results. If we explicitly covered further cases by using an sg
variable, we would be faced with a Theorem 2 situation: one or the other product
pair cancels, but not both. �
Remark. The prototype for the phenomenon this theorem covers is the “explo-
sion” of a Sedenion box-kite into a trio of interconnected ones in a Pathion sand
mandala, with the S of the latter = the X of the former. As part of this process, 4 of
the expected 7 are “hidden” box-kites (HBKs), with no DMZs along their edges.
These have zigzag L-trips which are precisely the L-trips of the 4 Sedenion Sails.
Here, an empirical observation which will spur more formal investigations in a
sequel study: for the 3 HBKs based on trefoil L-trips, exactly 1 strut has reversed
orientation (a different one in each of them), with the orientation of the triangular
side whose midpoint it ends in also being reversed. For the HBK based on the
zigzag L-trip, all 3 struts are reversed, so that the flow along the sides is exactly
the reverse of that shown in the “Rule 0” circle. (Hence, all possible flow patterns
along struts are covered, with only those entailing 0 or 2 reversals corresponding
to functional box-kites: our Type I and Type II designations.) It is not hard to show
that this zigzag-based HBK has another surprising property: the 8 units defined
by its own zigzag’s Assessors plus X and the real unit form a ZD-free copy of the
Octonions. This is also true when the analogous Type II situation is explored, al-
beit for a slightly different reason: in the former case, all 3 Catamaran “twistings”
take the zigzag edges to other HBKs; in the latter, though, the pair of Assessors in
some other Type II box-kite reached by “twisting” – (a,B) and (A,b), say, if the
edge be that joining Assessors A and B, with strut-constant copp = d – are strut
opposites, and hence also bereft of ZDs. The general picture seems to mirror this
concrete case, and will be studied in “Voyage by Catamaran” with this expecta-
tion: the bit-twiddling logic that generates meta-fractal “Skies” also underwrites
a means for jumping between ZD-free Octonion clones in an infinite number of
HBKs housed in a Sky. Given recent interest in pure “E8” models giving a privi-
leged place to the basis of zero-divisor theory, namely “G2” projections (viz., A.
Garrett Lisi’s “An Exceptionally Simple Theory of Everything”); a parallel vogue
for many-worlds approaches; and, the well-known correspondence between 8-D
closest-packing patterns, the loop of the 240 unit Octonions which Coxeter dis-
covered, and E8 algebras – given all this, tracking the logic of the links across
such Octonionic “brambles” might prove of great interest to many researchers.
Now, we still haven’t explained the flipside of this off-switch effect, to which
prior CDP generation Box-Kites – appropriately zero-padded to become Box-
Kites in the current generation until the new high-bit is added to the strut-constant
– are subjected. How is it that previously empty cells not associated with the sec-
ond high-bit’s blanked-out R, C, P values are now full? The answer is simple, and
is framed in the Hat-Trick Theorem this way.
Theorem 16. Cells in an ET which represent DMZ edges of some 2N-ion Box-
Kites for some fixed S, and which are offed in turn upon augmenting of S by a
new leftmost bit, are turned on once more if S is augmented by yet another new
leftmost bit.
Proof. We begin an induction based upon the simplest case (which the Chingons
are the first 2N-ions to provide): consider Box-Kites with S ≤ 8. If a high-bit
be appended to S, then the associated Box-Kites are offed. However, if another
high-bit be affixed, these dormant Box-Kites are re-awakened – the second half of
hide/fill involution. We simply assume an L-index set (u,v,w) underwriting a Sail
in the ET for the pre-augmented S, with Assessors (u,U) and (v,V ). Then, we
introduce a more leftified bit 2Q = M, where pre-augmented S < L < M < G, then
compute the term-by term products of (u+(U +L+M)) and (v+ sg · (V +L+
M)), using the usual methods. And as these methods tell us that two applications
of Rule 2 have the same effect as none in such a setup, we have no more to prove.
Corollary. The induction just invoked makes it clear that strut constants equal to
multiples of 8 not powers of 2 are included in the same spectral band as all other
integers larger than the prior multiple. The promissory note issued in the second
paragraph of Part II’s concluding section, on 64-D Spectrography, can now be
deemed redeemed.
In the Chingons, high-bits L and M are necessarily adjacent in the bitstring for
S < G = 32; but in the general 2N-ion case, N large, zero-padding guarantees that
things will work in just the same manner, with only one difference: the recursive
creation of “harmonics” of relatively small-g (s,g)-modular R,C,P values will
propagate to further levels, thereby effecting overall Box-Kite counts.
In general terms, we have echoes of the formula given for (s,g)-modular cal-
culations, but with this signal difference: there will be one such rule for each
high-bit 2H in S, where residues of S modulo 2H will generate their own near-
solid lines of rows and columns, be they hidden or filled. Likewise for multiples
of 2H <G which are not covered by prior rules, and multiples of 2H supplemented
by the bit-specific residue (regardless of whether 2H itself is available for treat-
ment by this bit-specific rule). In the simplest, no-zero-padding instances, all even
multiples are excluded, as they will have occurred already in prior rules for higher
bits, and fills or hides, once fixed by a higher bit’s rule, cannot be overridden.
Cases with some zero-padding are not so simple. Consider this two-bit in-
stance, S = 73,N = 8: the fill-bit is 64, the hide-bit is just 8, so that only 9 and 64
generate NSLs of filled values; all other multiples of 8, and their supplementing
by 1 (including 65) are NSLs of hidden values. Now look at a variation on this
example, with the single high-bit of zero-padding removed – i. e., S = 41,N = 8.
Here, the fill-bit is 32, and its multiples 64 and 96, as well as their supplements by
S modulo 32 = 9, or 9 and 73 and 105, label NSLs of filled values; but all other
multiples of 8, plus all multiples of 8 supplemented by 1 not equal to 9 or 73 or
105, label NSLs of hidden values. Cases with multiple fill and hide bits, with or
without additional zero-padding, are obviously even more complicated to handle
explicitly on a case-by-case basis, but the logic framing the rules remain simple;
hence, even such messy cases are programmatically easy to handle.
Hide/fill involution means, then, that the first, third, and any further odd-
numbered high-bits (counting from the left) will generate “fill” rules, whereas
all the even-numbered high-bits generate “hide” rules – with all cells not touched
by a rule being either hidden (if the total number of high-bits B is odd) or filled (B
is even).
Two further examples should make the workings of this protocol more clear.
First, the Chingon test case of S = 25: for (R | C | P = 9 | 16), all the ET cells are
filled; however, for (R | C | P = 1 | 8 | 17 | 24), ET cells not already filled by the
first rule (and, as visual inspection of Slide 48 indicates, there are only 8 cells in
the entire 840-cell ET already filled by the prior rule which the current rule would
like to operate on) are hidden from view. Because the 16- and 8- bits are the only
high-bits, the count of same is even, meaning all remaining ET cells not covered
by these 2 rules are filled.
We get 23 for Box-Kite count as follows. First, the 16-bit rule gives us 7 Box-
Kites, per earlier arguments; the 8-bit rule, which gives 3 filled Box-Kites in the
Pathions, recursively propagates to cover 19 hidden Box-Kites in the Chingons,
according to the formula produced last section. But hide/fill involution says that,
of the 35 maximum possible Box-Kites in a Chingon ET, 35− 19 = 16 are now
made visible. As none of these have the Pathion G = 16 as an L-index, and all the
7 Box-Kites from the 16-bit rule do, we therefore have a grand total of 7+16= 23
Box-Kites in the S = 25 ET, as claimed (and as cell-counting on the cited Slide
will corroborate).
The concluding Slides 76–78 present a trio of color-coded “histological slices”
of the hiding and filling sequence (beginning with the blanking of the long diago-
nals) for the simplest 3-high-bit case, N = 7,S = 57. Here, the first fill rule works
on 25 and 32; the first hide rule, on 9, 16, 41, and 48; the second fill rule, on 1, 8,
17, 24, 33, 40, 49, and 56; and the rest of the cells, since the count of high-bits is
odd, are left blank.
We do not give an explicit algorithmic method here, however, for computing
the number of Box-Kites contained in this 3,720-cell ET. Such recursiveness is
best handled programmatically, rather than by cranking out an explicit (hence,
long and tedious) formula, meant for working out by a time-consuming hand cal-
culation. What we can do, instead, is conclude with a brief finale, embodying
all our results in the simple “recipe theory” promised originally, and offer some
reflections on future directions.
5 Fundamental Theorem of Zero-Divisor Algebra
All of the prior arguments constitute steps sufficient to demonstrate the Funda-
mental Theorem of Zero-Divisor Algebra. Like the role played by its Gaussian
predecessor in the legitimizing of another “new kind of [complex] number the-
ory,” its simultaneous simplicity and generality open out on extensive new vistas
at once alien and inviting. The Theorem proper can be subdivided into a Proposi-
tion concerning all integers, and a “Recipe Theory” pragmatics for preparing and
“cooking” the meta-fractal entities whose existence the proposition asserts, but
cannot tell us how to construct.
Proposition: Any integer K > 8 not a power of 2 can uniquely be associated with
a Strut Constant S of ZD ensembles, whose inner skybox resides in the 2N-ions
with 2N−2 < K < 2N−1. The bitstring representation of S completely determines
an infinite-dimensional analog of a standard plane-confined fractal, with each of
the latter’s points associated with an empty cell in the infinite Emanation Table,
with all non-empty cells comprised wholly of mutually orthogonal primitive zero-
divisors, one line of same per cell.
Preparation: Prepare each suitable S by producing its bitstring representation,
then determining the number of high-bits it contains: if S is a multiple of 8, right-
shift 4 times; otherwise, right-shift 3 times. Then count the number B of 1’s
in the shortened bitstring that results. For this set {B} of B elements, construct
two same-sized arrays, whose indices range from 1 to B: the array {i} which
indexes the left-to-right counting order of the elements of {B}; and, the array
{P} which indexes the powers of 2 of the same element in the same left-to-right
order. (Example: if K = 613, the inner skybox is contained in the 211-ions; as
the number is not a multiple of 8, the bistring representation 1001100101 is right-
shifted thrice to yield the substring of high-bits 1001100; B = 3, and for 1 ≤ i ≤
3, P1 = 9, P2 = 6;P3 = 5.)
Cookbook Instructions:
[0] For a given strut-constant S, compute the high-bit count B and bitstring
arrays {i} and {P}, per preparation instructions.
[1] Create a square spreadsheet-cell array, of edge-length 2I , where I ≥G/2= g
of the inner skybox for S, with the Sky as the limit when I → ∞.
[2] Fill in the labels along all four edges, with those running along the right
(bottom) borders identical to those running along the left (top), except in
reversed left-right (top-bottom) order. Refer to those along the top as col-
umn numbers C, and those along the left edge, as row numbers R, setting
candidate contents of any cell (r,c) to R⊻C = P.
[3] Paint all cells along the long diagonals of the spreadsheet just constructed
a color indicating BLANK, so that all cells with R =C (running down from
upper left corner) else R⊻C = S (running down from upper right) have their
P-values hidden.
[4] For 1 ≤ i ≤ B, consider for painting only those cells in the spreadsheet
created in [1] with R | C | P = m · 2γ | m · 2γ + σ , where γ = Pi,σ =
S mod∗ 2γ , and m is any integer ≥ 0 (with m= 0 only producing a legitimate
candidate for the right-hand’s second option, as an XOR of 0 indicates a
long-diagonal cell).
[5] If a candidate cell has already been painted by a prior application of these
instructions to a prior value of i, leave it as is. Otherwise, paint it with R⊻C
if i = odd, else paint it BLANK.
[6] Loop to [4] after incrementing i. If i < B, proceed until this step, then
reloop, reincrement, and retest for i = B. When this last condition is met,
proceed to the next step.
[7] If B is odd, paint all cells not already painted, BLANK; for B even, paint
them with R⊻C.
In these pseudocode instructions, no attention is given to edge-mark gener-
ation, performance optimization, or other embellishments. Recursive expansion
beyond the chosen limits of the 2N-ion starting point is also not addressed. (Just
keep all painted cells as is, then redouble until the expanded size desired is at-
tained; compute appropriate insertions to the label lines, then paint all new cells
according to the same recipe.) What should be clear, though, is any optimization
cannot fail to be qualitatively more efficient than the code in the appendix to [9],
which computes on a cell-by-cell basis. For S > 8, N > 4, we’ve reached the
onramp to the Metafractal Superhighway: new kinds of efficiency, synergy, con-
nectedness, and so on, would seem to more than compensate for the increase in
dimension.
It is well-known that Chaotic attractors are built up from fractals; hence, our
results make it quite thinkable to consider Chaos Theory from the vantage of pure
Number · · · and hence the switch from one mode of Chaos to another as a bitstring-
driven – or, put differently, a cellular automaton-type – process, of Wolfram’s
Class 4 complexity. Such switching is of the utmost importance in coming to
terms with the most complex finite systems known: human brains. The late Fran-
cisco Varela, both a leading visionary in neurological research and its computer
modeling, and a long-time follower of Madhyamika Buddhism who’d collabo-
rated with the Dalai Lama in his “Tibetan Buddhists talk with brain scientists”
dialogues [10], pointed to just the sorts of problems being addressed here as the
next frontier. In a review essay he co-authored in 2001 just before his death [11,
p. 237], we read these concluding thoughts on the theme of what lies “Beyond
Synchrony” in the brain’s workings:
The transient nature of coherence is central to the entire idea of large-
scale synchrony, as it underscores the fact that the system does not be-
have dynamically as having stable attractors [e.g., Chaos], but rather
metastable patterns – a succession of self-limiting recurrent patterns.
In the brain, there is no “settling down” but an ongoing change marked
only by transient coordination among populations, as the attractor it-
self changes owing to activity-dependent changes and modulations of
synaptic connections.
Varela and Jean Petitot (whose work was the focus of the intermezzo conclud-
ing Part I, in which semiotically inspired context the Three Viziers were intro-
duced) were long-time collaborators, as evidenced in the last volume on Naturaliz-
ing Phenomenology [12] which they co-edited. It is only natural then to re-inscribe
the theme of mathematizing semiotics into the current context: Petitot offers sepa-
rate studies, at the “atomic” level where Greimas’ “Semiotic Square” resides; and
at the large-scale and architectural, where one must place Lévi-Strauss’s “Canon-
ical Law of Myths.” But the pressing problem is finding a smooth approach that
lets one slide the same modeling methodology from the one scale to the other:
a fractal-based “scale-free network” approach, in other words. What makes this
distinct from the problem we just saw Varela consider is the focus on the structure,
rather than dynamics, of transient coherence – a focus, then, in the last analysis, on
a characterization of database architecture that can at once accommodate meta-
chaotic transiency and structural linguists’ cascades of “double articulations.”
Starting at least with C. S. Peirce over a century ago, and receiving more
recent elaboration in the hands of J. M. Dunn and the research into the “Semantic
Web” devolving from his work, data structures which include metadata at the
same level as the data proper have led to a focus on “triadic logic,” as perhaps best
exemplified in the recent work of Edward L. Robertson. [13] His exploration of
a natural triadic-to-triadic query language deriving from Datalog, which he calls
Trilog, is not (unlike our Skies) intrinsically recursive. But his analysis depends
upon recursive arguments built atop it, and his key constructs are strongly resonant
with our own (explicitly recursive) ones. We focus on just a few to make the point,
with the aim of provoking interest in fusing approaches, rather than in proving any
particular results.
The still-standard technology of relational databases based on SQL statements
(most broadly marketed under the Oracle label) was itself derived from Peirce’s
triadic thinking: the creator of the relational formalism, Edgar F. “Ted” Codd, was
a PhD student of Peirce editor and scholar Arthur W. Burks. Codd’s triadic “re-
lations,” as Robertson notes (and as Peirce first recognized, he tells us, in 1885),
are “the minimal, and thus most uniform” representations “where metadata, that
is data about data, is treated uniformly with regular data.” In Codd’s hands (and in
those of his market-oriented imitators in the SQL arena), metadata was “relegated
to an essentially syntactic role” [13, p. 1] – a role quite appropriate to the appli-
cations and technological limitations of the 1970’s, but inadequate for the huge
and/or highly dynamic schemata that are increasingly proving critical in bioinfor-
matics, satellite data interpretation, Google server-farm harvesting, and so on. As
Robertson sums up the situation motivating his own work,
Heterogeneous situations, where diverse schemata represent semanti-
cally similar data, illustrate the problems which arise when one per-
son’s semantics is another’s syntax – the physical “data dependence”
that relational technology was designed to avoid has been replaced by
a structural data dependence. Hence we see the need to [use] a simple,
uniform relational representation where the data/metadata distinction
is not frozen in syntax. [13, pp. 1-2]
As in relational database theory and practice, the forming and exploiting of
inner and outer joins between variously keyed tables of data is seminal to Robert-
son’s approach as well as Codd’s. And while the RDF formalism of the Semantic
Web (the representational mechanism for describing structures as well as contents
of web artifacts on the World Wide Web) is likewise explicitly triadic, there has,
to date, been no formal mechanism put in place for manipulating information in
RDF format. Hence, “there is no natural way to restrict output of these mecha-
nisms to triples, except by fiat” [13, p. 4], much less any sophisticated rule-based
apparatus like Codd’s “normal forms” for querying and tabulating such data. It is
no surprise, then, that Robertson’s “fundamental operation on triadic relations is
a particular three-way join which takes explicit advantage of the triadic structure
of its operands.” This triadic join, meanwhile, “results in another triadic relation,
thus providing the closure required of an algebra.” [13, p. 6]
Parsing Robertson’s compact symbolic expressions into something close to
standard English, the trijoin of three triadic relations R, S, T is defined as some
(a,b,c) selected from the universe of possibilities (x,y,z), such that (a,x,z) ∈ R,
(x,b,y) ∈ S, and (z,y,c) ∈ T . This relation, he argues, is the most fundamental
of all the operators he defines. When supplemented with a few constant relations
(analogs of Tarski’s “infinite constants” embodied in the four binary relations of
universality of all pairs, identity of all equal pairs, diversity of all unequal pairs,
and the empty set), it can express all the standard monotonic operators (thereby
excluding, among his primitives, only the relative complement).
How does this compare with our ZD setup, and the workings of Skies? For one
thing, Infinite constants, of a type akin to Tarski’s, are embodied in the fact that
any full meta-fractal requires the use of an infinite G, which sits atop an endless
cascade of singleton leftmost bits, determining for any given S an indefinite tower
of ZDs. One of the core operators massaging Robertson’s triads is the flip, which
fixes one component of a relation while interchanging the other two · · · but our
Rule 2 is just the recursive analog of this, allowing one to move up and down
towers of values with great flexibilty (allowing, as well, on and off switching
effecting whole ensembles). The integer triads upon which our entire apparatus
depends are a gift of nature, not dictated “by fiat,” and give us a natural basis for
generating and tracking unique IDs with which to “tag” and “unpack” data (with
“storage” provided free of charge by the empty spaces of our meta-fractals: the
“atoms” of Semiotic Squares have four long-diagonal slots each, one per each of
the “controls” Petitot’s Catastrophe Theory reading calls for, and so on.)
Finally, consider two dual constructions that are the core of our own triadic
number theory: if the (a,b,c) of last paragraph, for instance, be taken as a Zigzag’s
L-index set, then the other trio of triples correlates quite exactly with the Zigzag
U-trips. And this 3-to-1 relation, recall, exactly parallels that between the 3 Tre-
foil, and 1 Zigzag, Sails defining a Box-Kite, with this very parallel forming the
support for the recursion that ultimately lifts us up into a Sky. We can indeed
make this comparison to Robinson’s formalism exceedingly explicit: if his X, Y,
Z be considered the angular nodes of PSL(2,7) situated at the 12 o’clock apex and
the right and left corners respectively, then his (a,b,c) correspond exactly to our
own Rule 0 trip’s same-lettered indices!
Here, we would point out that these two threads of reflection – on underwriting
Chaos with cellular-automaton-tied Number Theory, and designing new kinds of
database architectures – are hardly unrelated. It should be recalled that two years
prior to his revolutionary 1970 paper on relational databases [14], Codd published
a pioneering book on cellular automata [15]. It is also worth noting that one
of the earliest technologies to be spawned by fractals arose in the arena of data
compression of images, as epitomized in the work of Michael Barnsley and his
Iterative Systems company. The immediate focus of the author’s own commercial
efforts is on fusing meta-fractal mathematics with the context-sensitive adaptive-
parsing “Meta-S” technology of business associate Quinn Tyler Jackson. [16] And
as that focus, tautologically, is not mathematical per se, we pass it by and leave it,
like so many other themes just touched on here, for later work.
References
[1] Robert P. C. de Marrais, “Placeholder Substructures I: The Road From NKS
to Scale-Free Networks is Paved with Zero Divisors,” Complex Systems, 17
(2007), 125-142; arXiv:math.RA/0703745
[2] Robert P. C. de Marrais, “Placeholder Substructures II: Meta-Fractals, Made
of Box-Kites, Fill Infinite-Dimensional Skies,” arXiv:0704.0026 [math.RA]
[3] Robert P. C. de Marrais, “The 42 Assessors and the Box-Kites They Fly,”
arXiv:math.GM/0011260
[4] Robert P. C. de Marrais, “The Marriage of Nothing and All: Zero-Divisor
Box-Kites in a ‘TOE’ Sky,” in Proceedings of the 26th International Col-
loquium on Group Theoretical Methods in Physics, The Graduate Center
of the City University of New York, June 26-30, 2006, forthcoming from
Springer–Verlag.
[5] Robert P. C. de Marrais, “Placeholder Substructures: The Road from NKS
to Small-World, Scale-Free Networks Is Paved with Zero-Divisors,” http://
wolframscience.com/conference/2006/ presentations/materials/demarrais.ppt
(Note: the author’s surname is listed under “M,” not “D.”)
[6] Benoit Mandelbrot, The Fractal Geometry of Nature (W. H. Freeman and
Company, San Francisco, 1983)
[7] Ed Pegg, Jr., “Tournament Dice,” Math Games column for July 11, 2005, on
the MAA website at http://www.maa.org/editorial/ mathgames/mathgames
_07_11_05.html
[8] Robert P. C. de Marrais, “The ‘Something From Nothing’ Insertion Point,”
http://www.wolframscience.com/conference/2004/ presentations/
materials/rdemarrais.pdf
[9] Robert P. C. de Marrais, “Presto! Digitization,” arXiv:math.RA/0603281
[10] Francisco Varela, editor, Sleeping, Dreaming, and Dying: An Exploration
of Consciousness with the Dalai Lama (Wisdom Publications: Boston, 1997).
[11] F. J. Varela, J.-P. Lachauz, E. Rodrigues and J. Martinerie, “The brainweb:
phase synchronization and large-scale integration,” Nature Reviews Neuro-
science, 2 (2001), pp. 229-239.
http://arxiv.org/abs/math/0703745
http://arxiv.org/abs/0704.0026
http://arxiv.org/abs/math/0011260
http://www.maa.org/editorial/
http://www.wolframscience.com/conference/2004/
http://arxiv.org/abs/math/0603281
[12] Jean Petitot, Francisco J. Varela, Bernard Pachoud and Jean-Michel Roy,
Naturalizing Phenomenology: Issues in Contemporary Phenomenology and
Cognitive Science (Stanford University Press: Stanford, 1999)
[13] Edward L. Robertson, “An Algebra for Triadic Relations,” Technical Re-
port No. 606, Computer Science Department, Indiana University, Bloom-
ington IN 47404-4101, January 2005; online at http://www.cs.indiana.edu/
pub/techreports/TR606.pdf
[14] E. F. Codd, The Relational Model for Database Management: Version 2
(Addison-Wesley: Reading MA, 1990) is the great visionary’s most recent
and comprehensive statement.
[15] E. F. Codd, Cellular Automata (Academic Press: New York, 1968)
[16] Quinn Tyler Jackson, Adapting to Babel – Adaptivity and Context-Sensiti-
vity in Parsing: From anbncn to RNA (Ibis Publishing: P.O. Box3083, Ply-
mouth MA 02361, 2006; for purchasing information, contact Thothic Tech-
nology Partners, LLC, at their website, www.thothic.com).
http://www.cs.indiana.edu/
	The Argument So Far
	8 <S < 16, N  : Recursive Balloon Rides in the Whorfian Sky
	Maximal High-Bit Singletons: (s,g)-Modularity for 16 < S 24
	Hide/Fill Involution: Further-Right High-Bits with 24 < S < 32.
	Fundamental Theorem of Zero-Divisor Algebra
ABSTRACT
  Zero-divisors (ZDs) derived by Cayley-Dickson Process (CDP) from
N-dimensional hypercomplex numbers (N a power of 2, at least 4) can represent
singularities and, as N approaches infinite, fractals -- and thereby,scale-free
networks. Any integer greater than 8 and not a power of 2 generates a
meta-fractal or "Sky" when it is interpreted as the "strut constant" (S) of an
ensemble of octahedral vertex figures called "Box-Kites" (the fundamental
building blocks of ZDs). Remarkably simple bit-manipulation rules or "recipes"
provide tools for transforming one fractal genus into others within the context
of Wolfram's Class 4 complexity.

<|endoftext|><|startoftext|>
Langmuir-Blodgett Assembly of Densely Aligned Single-Walled Carbon Nanotubes 
from Bulk Materials
Xiaolin Li, Li Zhang, Xinran Wang, Iwao Shimoyama, Xiaoming Sun, Won-Seok Seo, Hongjie Dai*
Department of Chemistry, Stanford University, Stanford, CA 94305, USA.
RECEIVED DATE (automatically inserted by publisher); hdai@stanford.edu
Single-walled carbon nanotubes (SWNTs) exhibit advanced 
properties desirable for high performance nanoelectronics.
Important to future manufacturing of high-current, speed and 
density nanotube circuits is large-scale assembly of SWNTs into 
densely aligned forms.
 Despite progress in oriented synthesis and 
assembly including the Langmuir-Blodgett (LB) method,2-9 no 
method exists for producing assemblies of pristine SWNTs (free 
of extensive covalent modifications) with both high density and 
high degree of alignment of SWNTs. Here, we develop a LB 
method achieving monolayers of aligned non-covalently 
functionalized SWNTs from organic solvent with dense packing. 
The monolayer SWNTs are readily patterned for device 
integration by microfabrication, enabling the high currents 
(~3mA) SWNT devices with narrow channel widths. Our method 
is generic for different bulk materials with various diameters. 
 Suspensions of as-grown laser-ablation and Hipco SWNTs in 
1,2-dichloroethane (DCE) solutions of poly(m-phenylenevinylene 
-co-2,5-dioctoxy-p-phenylenevinylene) (PmPV) were prepared by 
sonication, ultra centrifugation and filtration (see supplementary 
information). The suspension contained mostly individual 
nanotubes (average diameter~1.3nm and ~1.8nm respectively for 
Hipco and laser-ablation materials, mean length ~500nm, Fig.1d
and 1e) well solubilized in DCE without free unbound PmPV. 
PmPV is known to exhibit high binding affinity to SWNT 
sidewall via  stacking of its conjugated backbone (Fig.1a) and 
thus impart solubility of nanotubes in organic solvents.10 Indeed, 
we obtained homogeneous suspensions of nanotubes in PmPV 
solutions. However, we found that DCE was the only solvent in 
which PmPV bound SWNTs remained stably suspended when 
free unbound PmPV molecules were removed (Inset of Fig.1b). 
The PmPV treated SWNTs exhibited no aggregation in DCE over 
several months. DCE without PmPV could suspend low 
concentrations of SWNTs (~50X lower than with PmPV 
functionalization), insufficient for LB formation, especially for 
larger SWNTs in laser materials with lower solubility.
The excitation and emission spectra of PmPV bound SWNTs 
(in PmPV-SWNT solution with excess PmPV removed) exhibited 
~20nm and ~3nm shifts respectively relative to those of pure 
PmPV in DCE (Fig.1b), providing spectroscopic evidence of 
strong interaction between PmPV and SWNTs. No change in the 
spectra was observed with the highly stable PmPV-SWNT/DCE 
suspension for months, indicating strong binding of PmPV on 
SWNT without detachment in DCE. The fact that PmPV-SWNTs 
were not stably suspended in other solvents without excess PmPV 
and that addition of large amounts of these solvents (e.g., 
chloroform) into a PmPV-SWNT/DCE suspension causing 
nanotube precipitation suggested significant detachment of PmPV 
from nanotubes in most organic solvents. The unique stability of 
PmPV coating on SWNT in DCE over other solvents is not fully 
understood currently. Nevertheless, it is highly desirable for 
chemical assembly of high quality nanotubes and integrated 
devices since it enables non-covalently functionalized SWNTs 
(both large diameter laser and small diameter Hipco materials) 
soluble in organics in nearly pristine form, as gleaned from the 
characteristic UV-vis-NIR absorbance (Fig.1c) and Raman 
signatures of non-covalently modified SWNTs (Fig.2c).
PmPV-SWNTs were spread on a water subphase from a DCE 
solution, compressed upon DCE vaporization to form a LB film 
using compression-retraction-compression cycles to reduce 
hysteresis (supplementary Fig.S1, S2&S3) and then vertically 
transferred onto a SiO2 or any other substrate (glass, plastic, etc.).
Organic solutions of stably suspended SWNTs without excess 
free polymer are critical to high density SWNT LB film 
formation. Microscopy (Fig.2a&2b) and spectroscopy 
(Fig.2c&2d) characterization revealed high quality densely 
aligned SWNTs (normal to the compression and substrate pulling 
Hipco
Figure 1. PmPV functionalized SWNTs. (a) Schematic drawings of a 
SWNT and two units of a PmPV chain. (b) Excitation and 
fluorescence spectra of pure PmPV in DCE vs. PmPV bound Hipco 
SWNTs in DCE. Inset: photograph of PmPV coated Hipco SWNTs 
suspended in DCE without excess PmPV in the solution. (c) UV-vis-
NIR spectrum of PmPV suspended Hipco SWNTs with no excess 
PMPV. (d) & (e) Atomic force microscopy (AFM) images of Hipco 
and laser-ablation SWNTs randomly deposited on a substrate from 
solution. Insets: Diameter distributions.
= OC8H17
(b) (c)
300 400 500
5.0x10
1.0x10
1.5x10
Wavelength (nm)
 PmPV
 Hipco-PmPV
600 800 1000
Wavelength (nm)
 Hipco-PmPV
1.0 1.5 2.03.00.5
d (nm)
1.0 1.5 2.03.00.5
d (nm)
1.0 1.5 2.0 2.50.5
d (nm)
1.0 1.5 2.0 2.50.5
d (nm)
(d) (e)
direction) formed uniformly over large substrates for both Hipco 
and laser ablation materials. Height of the film relative to tube-
free regions of the substrate was <2nm under AFM, suggesting 
monolayer of packed SWNTs. Micro-Raman spectra of the 
SWNTs showed ~ cos2polarization dependence of the G band 
(~1590cm-1) intensity (Fig. 2d), where  is the angle between the 
laser polarization and the SWNT alignment direction. The peak to 
valley ratio of the Raman intensities was ~8 with little variation 
over the substrate, indicating alignment of SWNTs over large 
areas. Nevertheless, imperfections existed in the quasi-aligned 
dense SWNT assembly including voids, bending and looping of 
nanotubes formed during the compression process for LB film 
formation due to the high aspect ratio (diameter <~2nm, length 
~200nm-1m) and mechanical flexibility of SWNTs.
Our aligned SWNT monolayers on oxide substrates can be 
treated as carbon-nanotube on insulator (CNT_OI) materials for 
patterning and integration into potential devices, much like how
Si on insulator (SOI) has been used for electronics. We used 
lithographic patterning techniques and oxygen plasma etching to 
remove unwanted nanotubes and form patterned arrays of squares 
or rectangles comprised of aligned SWNTs (Fig.3a and 3b). We 
then fabricated arrays of two-terminal devices with Ti/Au metal 
source (S) and drain (D) contacting massively parallel SWNTs in 
~10 m wide S-D regions with channel length ~250nm (Fig.3c 
and 3d). Current vs. bias voltage (I-V) measurements showed that 
such devices made from Hipco SWNTs were more than 25 times 
more resistive than similar devices made from laser-ablation 
SWNTs, with currents reaching ~0.13mA and ~3.5mA 
respectively at a bias of 3 V through collective current carrying of 
SWNTs in parallel (Fig.3e and 3f). Further, Hipco SWNT devices 
exhibited higher non-linearity in the I-V characteristics than laser 
ablation nanotubes (Fig.3e). These results were attributed to the 
diameter difference between Hipco and laser-ablation materials. 
Hipco SWNTs were small in diameter with many tubes ≤1.2nm, 
giving rise to high (non-ohmic) contact resistance for both 
semiconducting and metallic SWNTs.
 Smaller SWNTs could 
also be more susceptible to defects and disorder, contributing to 
degraded current carrying ability.
The LB assembly of densely aligned SWNTs can be combined 
with chemical separation and selective chemical reaction 
methods12 to afford purely metallic or semiconducting SWNTs in 
massive parallel configuration useful for interconnection or high 
speed transistor applications at large scale. The method is generic 
in terms of the type of nanotube materials and substrates. 
Acknowledgment. We thank Dr. Pasha Nikolaev for providing 
laser-ablation SWNTs and MARCO-MSD and Intel for support.
Supporting Information Available: Experimental details are 
available free of charge via the internet at http://pubs.acs.org. 
REFERENCES 
1. Guo, J., Hasan, S., Javey, A., Bosman, G. & Lundstrom, M. IEEE Trans. 
Nanotechnology 2005, 4, 715.
2. Zhang, Y., Chang, A. & Dai, H. J. Appl. Phys. Lett. 2001, 79, 3155.
3. Huang, S. M., Maynor, B., Cai, X. Y. & Liu, J. Adv. Mater. 2003, 15, 
1651.
4. Kocabas, C., Hur, S., Gaur, A., Meitl, M. A., Shim, M. and Rogers, J. A. 
Small 2005, 11, 1110.
5. Han, S., Liu, X. & Zhou, C. W. J. Am. Chem. Soc. 2005, 127, 5294.
6. Gao, J., Yu, A., Itkis M. E., Bekyarova, E., Zhao, B., Niyogi, S. & 
Haddon, R. C. J. Am. Chem. Soc. 2004, 126, 16698.
7. Rao, S. G., Huang, L., Setyawan, W. & Hong, S. Nature 2003, 425, 36.
8. Guo, Y., Wu, J., Zhang, Y. Chem. Phys. Lett. 2002, 362, 314.
9. Krstic, V., Duesberg, G. S., Muster, J., Burghard, M., Roth, S. Chem. 
Mater. 1998, 10, 2338.
10. Star, A., Stoddart, J. F., Steuerman, D., Diehl, M., Boukai, A., Wong, E. 
W., Yang, X., Chung, S. W., Choi, H. & Heath, J. R. Angew. Chem. Int. 
Ed. 2001, 40, 1721.
11. Kim, W. Javey, A., Tu, R., Cao, J., Wang, Q. & Dai, H. Appl. Phys. Lett.
2005, 87, 173101.
12. Zhang, G. Y., Qi, P. F., Wang, X. R., Lu, Y. R., Li, X. L., Tu, R., 
Bangsaruntip, S., Mann, D., Zhang, L. & Dai, H. Science 2006, 314, 
Figure 2. LB monolayers of aligned SWNTs. (a) AFM image of a LB 
film of Hipco SWNTs on a SiO2 substrate. (b) AFM image of a LB 
film of laser-ablation SWNTs. (c) Raman spectra of the G line of a 
Hipco SWNT LB film recorded at various angles () between the 
polarization of laser excitation and SWNT alignment direction. Inset:
Raman spectrum showing the radial breathing mode (RBM) region of
the Hipco LB film at ~0. (d) G line (1590cm-1) intensity vs. angle 
for the Hipco SWNT LB film in (c). The red curve is a cos2 fit.
(a) (b)
0 50 100 150
5.0x10
1.0x10
1.5x10
2.0x10
Angle (deg.)
ngle (deg.) Raman s
hift (cm
ngle (deg.) Raman s
hift (cm
150 200 250
Raman shift (cm
(c) (d)
Figure 3. Microfabrication patterning and device integration of SWNT 
LB films. (a) Optical image of a patterned SWNT LB film. The squares 
and rectangles are regions containing densely aligned SWNTs. Other 
areas are SiO2 substrate regions. (b) SEM image of a region 
highlighted in (a) with packed SWNTs aligned vertically. (c) SEM 
image showing a 10-micron-wide SWNT LB film between source and 
drain electrodes formed in a region marked in (b). (d) AFM image of a 
region in (c) showing aligned SWNTs and the edges of the S and D 
electrodes. (e) Current vs. bias (Ids-Vds) curve of a device made of 
Hipco SWNTs (10m channel width and 250nm channel length). (f)
Ids-Vds of a device made of laser-ablation SWNTs (10m channel 
width and 250nm channel length).
-3 -2 -1 0
-3 -2 -1 0
-0.15
-0.10
-0.05
(e) (f)
400m
400nm
80m(b)
Angle (deg.) Raman 
shift (cm
Angle (deg.) Raman 
shift (cm
ABSTRACT FOR WEB PUBLICATION.
Single-walled carbon nanotubes (SWNTs) exhibit advanced electrical and surface properties useful for high performance 
nanoelectronics. Important to future manufacturing of nanotube circuits is large-scale assembly of SWNTs into aligned forms. 
Despite progress in assembly and oriented synthesis, pristine SWNTs in aligned and close-packed form remain elusive and 
needed for high-current, -speed and -density devices through collective operations of parallel SWNTs. Here, we develop a 
Langmuir-Blodgett (LB) method achieving monolayers of aligned SWNTs with dense packing, central to which is a non-
covalent polymer functionalization by poly(m-phenylenevinylene-co-2,5-dioctoxy-p-phenylenevinylene) (PmPV) imparting high 
solubility and stability of SWNTs in an organic solvent 1,2-dichloroethane (DCE). Pressure cycling or ‘annealing’ during LB 
film compression reduces hysteresis and facilitates high-degree alignment and packing of SWNTs characterized by 
microscopy and polarized Raman spectroscopy. The monolayer SWNTs are readily patterned for device integration by 
microfabrication, enabling the highest currents (~3mA) through the narrowest regions packed with aligned SWNTs thus far.
ABSTRACT
  Single walled carbon nanotubes exhibit advanced electrical and surface
properties useful for high performance nanoelectronics. Important to future
manufacturing of nanotube circuits is large scale assembly of SWNTs into
aligned forms. Despite progress in assembly and oriented synthesis, pristine
SWNTs in aligned and close-packed form remain elusive and needed for high
current, speed and density devices through collective operations of parallel
SWNTs. Here, we develop a Langmuir Blodgett method achieving monolayers of
aligned SWNTs with dense packing, central to which is a non covalent polymer
functionalization by PmPV imparting high solubility and stability of SWNTs in
an organic solvent DCE. Pressure cycling or annealing during LB film
compression reduces hysteresis and facilitates high degree alignment and
packing of SWNTs characterized by microscopy and polarized Raman spectroscopy.
The monolayer SWNTs are readily patterned for device integration by
microfabrication, enabling the highest currents 3mA through the narrowest
regions packed with aligned SWNTs thus far.

<|endoftext|><|startoftext|>
Quantum Phase Transition in the Four-Spin Exchange Antiferromagnet
Valeri N. Kotov, Dao-Xin Yao, A. H. Castro Neto, and D. K. Campbell
Department of Physics, Boston University, 590 Commonwealth Avenue, Boston, MA 02215
We study the S=1/2 Heisenberg antiferromagnet on a square lattice with nearest-neighbor and plaquette four-
spin exchanges (introduced by A.W. Sandvik, Phys. Rev. Lett. 98, 227202 (2007).) This model undergoes a
quantum phase transition from a spontaneously dimerized phase to Néel order at a critical coupling. We show
that as the critical point is approached from the dimerized side, the system exhibits strong fluctuations in the
dimer background, reflected in the presence of a low-energy singlet mode, with a simultaneous rise in the triplet
quasiparticle density. We find that both singlet and triplet modes of high density condense at the transition,
signaling restoration of lattice symmetry. In our approach, which goes beyond mean-field theory in terms of the
triplet excitations, the transition appears sharp; however since our method breaks down near the critical point,
we argue that we cannot make a definite conclusion regarding the order of the transition.
PACS numbers: 75.10.Jm, 75.30.Kz, 75.50.Ee
I. INTRODUCTION
Problems related to quantum criticality in quantum spin
systems are of both fundamental and practical importance1.
Numerous materials, such as Mott insulators, exhibit either
antiferromagnetic (Néel) order or quantum disordered (spin
gapped) ground state depending on the distribution of Heisen-
berg exchange couplings and geometry. External perturba-
tions (such as doping or frustration) can also cause quantum
transitions between these phases. Systems with spin 1/2 are
indeed the most interesting as they are the most susceptible to
such transitions. It is well understood that the quantum transi-
tion between a quantum disordered and a Néel phase is in the
O(3) universality class1, where a triplet state condenses at the
quantum critical point (QCP).
A recent exciting development in our theoretical under-
standing of QCPs originated from the proposal that if the
quantum disordered (QD) phase spontaneously breaks lattice
symmetries (e.g. is characterized by spontaneous dimer or-
der), and the transition is of second order, then exactly at
the QCP spinon deconfinement occurs, i.e. the excitations
are fractionalized2. It is assumed that the Hamiltonian itself
does not break the lattice symmetries (i.e. does not have “triv-
ial” dimer order caused by some exchanges being stronger
than the others). We use the terms “dimer order” and “va-
lence bond solid (VBS) order” interchangeably. It is expected
that the dimer order vanishes exactly at the point where Néel
order appears, i.e. there is no coexistence between the two
phases. Deconfinement thus is intimately related to disap-
pearance of VBS order; indeed if the latter persisted in the
Néel phase it would be impossible to isolate a spinon, as
“pairing” would always take place. Spontaneous VBS or-
der driven by frustration has been a common theme in quan-
tum antiferromagnetism3, although its presence and the na-
ture of criticality in specific models, such as the 2D square-
lattice frustrated Heisenberg antiferromagnet, is still some-
what controversial4. It would be particularly useful to apply
unbiased numerical approaches, such as the Quantum Monte
Carlo (QMC) method, to study frustrated spin models; how-
ever due to the fatal “sign” problem5, frustrated Heisenberg
systems are beyond the QMC reach.
In a recent study, the QMC method was applied to a four-
spin exchange quantum spin model without frustration, which
was shown to exhibit columnar dimer VBS order and a mag-
netically ordered phase with a deconfined QCP separating
them6. These conclusions were later confirmed by further
QMC studies7. Extensions of the model, which include for
example additional (six-spin) interactions, provide additional
support for a continuous QCP8. A different VBS pattern (pla-
quette order) was also proposed for the four-spin exchange
model9. At the same time, the nature of the quantum phase
transition was challenged in Refs.[10,11], where arguments
were given that the transition is in fact of (weakly) first order.
It is the objective of the present work to study the Sand-
vik model6, by approaching the quantum transition from the
dimer VBS phase. Our approach uses as a starting point a
symmetry broken state (i.e. one out of four degenerate VBS
configurations), and we thus must search for signatures that
the system attempts to restore the lattice symmetry at the QCP.
Even though full restoration is impossible within the present
framework, we find a QCP characterized by condensation of
triplet modes of high density; this is in contrast to the conven-
tional situation when the condensing particles are in the dilute
Bose gas limit. The high density itself is due to the presence
of a singlet mode that condenses at the QCP, and reflects the
strong fluctuations of the background dimer order. The above
effects lead to the vanishing of the VBS order parameter; at
the same time our method, which accounts for the strong fluc-
tuations, leads to a rather sharp phase transition. It appears
that we cannot draw a definite conclusion about the order of
the transition because in the vicinity of the QCP the triplon
density increases uncontrollably, suggesting that other states
(such a plaquette states and larger clusters) are strongly ad-
mixed into the ground state. This is generally expected in a
situation where the lattice symmetry is restored at the quan-
tum critical point.
The model under consideration is
H = J
〈a,b〉
Sa.Sb −K
a,b,c,d
(Sa · Sb)(Sc · Sd), (1)
where J > 0,K > 0, and all spins are S = 1/2. Consider
the numbers 1,2,3,4 in Fig. 1. The summation in the four-
spin term is over indexes (a, b) = (1, 2), (c, d) = (3, 4) and
http://arxiv.org/abs/0704.0114v3
FIG. 1: (Color online) Dimer pattern in the quantum disordered
(VBS) phase, K/J > (K/J)c.
(a, b) = (1, 4), (c, d) = (2, 3) on a given plaquette, and then
summation is made over all plaquettes12. The range of pa-
rameters explored in Ref.[6] is K/J ≤ 2, and the QCP is at
(K/J)c ≈ 1.85. Our coupling notation is slightly different
from the one used in Refs.[6,7]; the coupling K is related to
the parameter Q6,7 via K = Q/(1 + Q/(2J)), and the criti-
cal point in that notation is (Q/J)c ≈ 25. The dimerization
pattern is proposed to be of the “columnar” type, as shown in
Fig. 1. Four such configurations exist. We will assume a con-
figuration of this type, will show that it is stable at K/J ≫ 1,
and will then search for an instability towards the Néel state
as K/J decreases.
The rest of the paper is organized as followed. In Sec-
tion II we present results based on the mean-field approach in
terms of the dimer (triplon) operators. In Section III we extend
our treatment beyond mean-field, and even further in Section
IV, where we also take into account low-energy singlet two-
triplon excitations. Section V contains our conclusions.
II. MEAN-FIELD TREATMENT
We start by rewriting Eq. (1) in the the bond-operator
representation13, where on a dimer i, the two spins forming it
are expressed as: Sα1,2 =
tiα±t†iαsi−iǫαβγt
tiγ), and
, α = x, y, z create a singlet and triplet of states. We re-
fer to the triplet (S=1) quasiparticle, t†
, as “triplon”. The
bold indexes i, j,m, l label the dimers (see Fig. 1). Summa-
tion over repeated Greek indexes is assumed, unless indicated
otherwise.
The hard-core constraint, s†
si + t
tiα = 1, must be en-
forced on every site, which at the mean-field (MF) level can be
done by introducing a term in the Hamiltonian, −µ
(s2 +
tiα − 1). Then µ and the (condensed) singlet amplitude
s ≡ 〈si〉, are determined by the MF equations13. We obtain at
the quadratic level, in momentum representation:
tkα +
−kα + h.c.
where
Ak = J/4− µ+ s2(ξ−k +K/2) + s
4Σ(k) ,
Bk = s
+ s4Σ(k) ,
= −(J/2) coskx + (J ±K/4) cosky . (3)
The four-spin interaction from (1) acting between two
dimers (e.g. i, j in Fig. 1) contributes to the “on-site” gap and
hopping (ξ−
) via Ak, as well as to the quantum fluctuations
term Bk. The part involving four dimers has been split in a
mean-field fashion, leading to the Hartree-Fock self-energy
−Σ(k)/K = 2Σx cos kx + 2Σy cos ky +Σxy cos kx cos ky,
tmα + t
t†mα〉 , (5)
where i,m are neighboring dimers in the x (horizontal) direc-
tion (Fig. 1), and similarly for the y and the diagonal contribu-
tions. The triplon dispersion is ω(k) =
, and has
a minimum at the Néel ordering wave-vector kAF = (0, π)
(since we work on a dimerized lattice). The ground state en-
ergy is then easily computed,
EGS = E0 + 〈H2〉 , (6)
where
E0/N = −
(Js2 +Ks4) + µ(−s2 + 1) + (7)
Σ2x +Σ
, (8)
〈H2〉 =
(ω(k)−Ak) . (9)
The mean-field equations require a numerical minimization
of EGS with respect to the parameters {µ, s,Σx,Σy,Σxy}.
This amounts to the self-consistent Hartree-Fock approxima-
tion for Σ(k). The result for the triplon gap ∆ = ω(kAF ) is
presented in Fig. 2 (black curve).
The MF result (K/J)c ≈ 0.6 substantially underestimates
the location of the critical point, compared to the the QMC
calculations, where (K/J)c ≈ 1.856,7. Interestingly, if one
solves the MF equations ignoring both the hard core and the
Σ(k), one finds (K/J)c = 1. Physically, in the full MF, the
hard core contribution increases the gap (and hence the stabil-
ity of the dimer phase) while at the same time suppressing the
antiferromagnetic fluctuations (which favor the Néel state).
We also note that a recent (hierarchical) MF treatment
based on the plaquette ground state also underestimates very
strongly the QCP location ((K/J)c ≈ 19), similarly to our
result. In our view this means that both mean field approaches
are not sufficient to attack the present problem, where fluctu-
ations are apparently very strong. We choose to accept that
the numerical QMC result gives the most accurate determina-
tion of the QCP, and therefore in what follows we extend our
treatment in several directions beyond mean-field theory.
0.5 1 1.5 2 2.5
Brueckner field theory (II)
Brueckner +Singlet fluctuations (III)
Mean-Field Theory (I)
(III)
FIG. 2: (Color online) Triplon excitation gap ∆ = ω(kAF ) in vari-
ous approximations. The point ∆ → 0 corresponds to transition to
the Néel phase.
III. BEYOND MEAN-FIELD: THE DILUTE TRIPLON GAS
APPROXIMATION
A more accurate treatment of fluctuations is possible by
taking into account the hard-core constraint beyond mean-
field. One can set the singlet amplitude s = 1 in the pre-
vious formulas, but introduce an infinite on-site repulsion be-
tween the triplons, U
tβitαi, U → ∞. As long as
the triplon density (determined by the quantum fluctuations) is
low, an infinite repulsion corresponds to a finite scattering am-
plitude between excitations and can be calculated by resum-
ming ladder diagrams for the scattering vertex14. This leads
to the effective triplon-triplon vertex Γ(k, ω) which was pre-
viously calculated15:
Γ−1(k, ω)=
ω(q) + ω(k− q)− ω
u → v
ω → −ω
This vertex in turn affects the triplon dispersion via (what we
call) the Brueckner self-energy15:
ΣB(k, ω) = 4
v2qΓ(k+ q, ω − ω(q)). (11)
The corresponding parameters in the quadratic Hamiltonian
(2) in this case are
Ak = J + 2K(1− 4nt/3) + ξ−k +Σ(k) + ΣB(k, 0),
Bk = ξ
+Σ(k). (12)
The Bogolubov coefficients are defined in the usual way
= 1/2 + Ak/(2ω(k)) = 1 + v
. The various terms
in Σ(k) can be expressed through them: for example Σx =
+ vkuk) cos kx, and so on. The density of triplons
is nt = 〈t†iαtiα〉 = 3
v2k. In addition, the renormaliza-
tion of the quasiparticle residue, Z−1
= 1 − ∂ΣB(k, 0)/∂ω,
−k y y
FIG. 3: Renormalization of quantum fluctuations by resummation of
a ladder series, with (13) at the vertices.
implies the replacement uk →
Zkuk, vk →
Zkvk in
all the formulas15, and the renormalized spectrum ω(k) =
An iterative numerical evaluation of the spectrum using
the above equations, which amounts to solution of the Dyson
equation, leads to the result shown in Fig. 2 (blue curve). The
above approach appears to be well justified since the quasi-
particle density nt < 0.1. The resulting critical point is still in
the “weak-coupling” regime K/J < 1, with about 100% de-
viation from the QMC result ((K/J)c ≈ 1.85). This suggests
that the on-site triplon fluctuations are not the dominant cause
for the disagreement with the QMC results; thus we proceed
to include two-particle fluctuations (in the triplon language),
which amounts to including dimer-dimer correlations.
IV. STRONG FLUCTUATIONS IN THE SINGLET
BACKGROUND: QCP BEYOND THE DILUTE TRIPLON
GAS APPROXIMATION
It is clear that “non-perturbative” effects are responsible for
driving the QCP towards the “strong-coupling” regionK/J ∼
2. To proceed we make two improvements to the previous
low-density, weak-coupling theory.
First, we take into account fluctuations in the singlet back-
ground, i.e. the manifold on which the triplons are built and
interact. The main effect originates from the action of the
four-spin K-term from (1) on two dimers, e.g. i, j in Fig. 1.
Part of this action has led to the on-site gap 2K in (12), fa-
voring dimerization. However, a strong attraction between the
two dimers is also present, since theK-term is symmetric with
respect to the index pair exchange (1, 2)(3, 4) ↔ (1, 4)(2, 3),
leading to a “plaquettization” tendency as well. In the triplon
language this is manifested by formation of bound states of
two triplons, due to their nearest-neighbor interactions
H4,y =
〈i,j〉y,αβ
tβitαj + γ2t
tβitβj
+ γ3t
tαitβj
, (13)
γ1 = −
, γ2 = −
, γ3 = −
We also checked that on the perturbative (Hartree-Fock) level,
the effect of this term on equations (3) and (12) was negligible
(and we did not write it explicitly).
An intuitive way of taking into account the effect of two-
triplon bound states (with total spin S=0) on the one-triplon
2.1 2.2 2.3 2.4 2.5
FIG. 4: (Color online) (a.) Singlet bound state energy Es (black),
binding energy ǫ = 2∆ − Es (blue), and the triplon gap ∆ (red).
(b.) Triplon density nt. (c.) Dimer order parameters. Dashed parts
of the lines represent points corresponding to rapid growth of the
quasiparticle density.
spectrum, is to work in the “local” approximation. This means
effectively neglecting the triplon dispersion and directly eval-
uating the ladder series that renormalizes the quantum fluctu-
ation term Bk in (2), corresponding to emission of a pair of
triplons with zero total momentum. This is illustrated graphi-
cally in Fig. 3, with the result
Bk = −
cos kx +
J +K/4
1− |γ|
 cos ky +Σ(k) ,
γ ≡ γ1 + 3γ2 + γ3 = −J −
K, (14)
where γ is the effective attraction of two triplons with total
S = 0, and
∆E = 2J +
K (15)
is the energy of two (non-interacting) triplons on adjacent
sites. This calculation is justified for K/J ≫ 1 and leads
to an increase of the quantum fluctuations, and from there to
almost doubling of the triplon density nt (see Fig. 4 below).
It contributes significantly to the shift of the QCP.
We can go beyond the “local” approximation by solving
the Bethe-Salpeter equation for the bound state, formed due
to the attraction (13), and taking into account the full triplon
dispersion. The equation for the singlet bound state energy
Es(Q), corresponding to total pair momentum Q is
1 = 2γ
u4q cos
Es(Q)− ω(Q/2 + q)− ω(Q/2− q)
. (16)
Here we have, for simplicity, written only the main contribu-
tion to pairing (Eq. (13)) in the limit K/J ≫ 1, and have
neglected the on-site repulsion (which leads to slightly di-
minished pairing), as well as small pairing due to the ex-
change J from dimers in the x-direction on Fig. 1. It is
easily seen that the lowest energy corresponds to Q = 0;
we define from now on Es ≡ Es(Q = 0). The bind-
ing energy is ǫ = 2∆ − Es, where ∆ is the one-particle
gap. The bound state wave-function corresponding to Es is
|Ψ〉 =
α,i,j,qy
iqy(i−j)t
|0〉. In the “local” limit
(nearest-neighbor pairing), Ψqy =
2 cos qy .
Second, we have made subtle changes to the resumma-
tion procedure concerning the quasiparticle renormalization
Z , based on both formal and physical grounds. On the one
hand it is clear that in the Brueckner approximation (Eq. (11)),
where the self energy is linear in the density (ΣB ∝ nt), the
dependence of the vertex Γ−1 on density is beyond the accu-
racy of the calculation, meaning one can put uq = 1, vq = 0
in (10), instead of determining them self-consistently. This
leads to a decreased influence of the hard-core ΣB (which fa-
vors the dimer state) on the Hartree-Fock self-energy Σ(k)
from (4) (which favors the Néel state). It is indeed the mutual
interplay between ΣB > 0 and Σ(k) < 0, that determines the
exact location of the QCP in the course of the Dyson’s equa-
tion iterative solution. While in the “weak-coupling” regime
K/J < 1, ΣB always dominates, in the “strong-coupling” re-
gion K/J > 2, Σ(k) starts playing a significant role, since
parametrically Σ ∝ Knt. It is physically consistent that in
the region where singlet fluctuations in the dimer background
are strong, the hard-core effect is less important, i.e. in ef-
fect the kinematic hard-core constraint is “relaxed”. We also
observe that in typical models with QCP driven by explicit
dimerization, such as the bilayer model, the described differ-
ence in approximation schemes makes a very small difference
on the location of the QCP16, since those models are always
in the “weak-coupling” regime, dominated by the hard-core
repulsion of excitations on a non-fluctuating dimer configura-
tion. The purpose of the above rather technical diversion is to
emphasize that care has been taken to take into account as ac-
curately as possible the effect of the (low-energy) two-particle
spectrum on the one-particle triplon gap.
Our results are summarized in Fig. 4 and Fig. 2 (red line) for
the gap. The critical point is shifted towards (K/J)c ≈ 2.16
(in much better agreement with QMC data), with a very
strong increase of the density towards Kc. This translates
into a decrease of the dimer order, as measured by the two
dimer order parameters that we compute from the expres-
sions: Dx = |〈S3 · S4〉 − 〈S5 · S4〉| = | − 34 + nt +
Dy = |〈S3 ·S4〉−〈S1 ·S4〉| = |− 34+nt−
Σy|. The spins are
labeled as in Fig. 1. The singlet bound state energy Es(0) also
tends towards zero at the QCP, with the corresponding binding
energy remaining quite large ǫ/J ≈ 1. All these effects point
towards a tendency of the system to restore the lattice sym-
metry, although it is certainly clear that as the critical point is
approached, our approximation scheme (low density of quasi-
particles) breaks down (dashed lines on figures). We should
point out that the sharpness of variation near Kc is not due to
divergence in any of the self-energies but is a result of rapid
cancellation at high orders (i.e. iterations in the Dyson equa-
tion). In fact cutting off our iterative procedure at finite order
gives a smooth curve, suggesting that additional classes of di-
agrams become important (although in practice their classifi-
cation is an insurmountable task). The merger of singlet and
triplet modes, which we find near the QCP, in principle reflects
a tendency towards quasiparticle fractionalization (spinon de-
confinement) and is also found in the 1D Heisenberg chain
with frustration17, where spinons are always deconfined.
Since we are now dealing with a situation where the density
is not very small nt ≈ 0.2, it is prudent to check how the next
order in the density may affect the above results. For example
at second order in the density, the self-energy Σ(k) changes
by amount δΣ(k), i.e. one has to add this contribution to the
right hand side of Eq.(12), namely:
Ak → Ak + δΣ(k), Bk → Bk + δΣ(k). (17)
We have found
δΣ(k) = −2K(Q2x − P 2x ) cos kx +
+2K(Q2y − P 2y ) cos ky −
−K(Q2xy − P 2xy) cos kx cos ky, (18)
and the following definitions are used:
Px = (1/3)
tmα〉 =
v2k cos kx,
Qx = (1/3)
〈tiαtmα〉 =
ukvk cos kx, (19)
and similarly for the other directions, for example Py =
v2k cos ky , Pxy =
v2k cos kx cos ky , etc. After includ-
ing these expressions in our numerical iterative procedure, we
have found that the QCP is shifted by a very small amount,
and the overall picture, as summarized in Fig. 4 and Fig. 2
(red line), still stands.
V. CONCLUSIONS
In conclusion, we have shown that the QCP between the
Néel and the dimer state in the model (1) is of unconven-
tional nature, in the sense that it is characterized by the pres-
ence of both triplet and singlet low-energy modes. Near the
QCP, whose location ((K/J)c ≈ 2.16) we find in fairly good
agreement with recent QMC studies, the system exhibits: (1.)
Strong rise of the triplon excitation density, due to increased
quantum fluctuations, (2.) Corresponding strong decrease
(and ultimately vanishing) of the dimer order at the QCP (3.)
Vanishing of a singlet energy scale, related to the destruction
of the dimer “columns” in Fig. 1. The above effects are all re-
lated and influence strongly one another, ultimately meaning
that the QCP reflects strong fluctuations and can not be de-
scribed in a mean-field theory framework. These results also
suggest a desire of the system to restore the lattice symmetry
at the QCP, as found in the QMC studies6.
At the same time all our improvements beyond mean-field
theory have also resulted in a very sharp transition, which ap-
pears to be first order. However in our view our approach
is not capable of addressing correctly the issue of the order
of the phase transition, basically because once we take the
strong (inter) dimer fluctuations in to account, the triplon den-
sity starts rising quickly beyond control. This is in a certain
sense natural in a situation where the system wants to restore
the lattice symmetry at the QCP and thus the ground state
acquires strong admixture of plaquette, etc. fluctuations as
the dimers begin to “disappear.” This is also manifested in the
fact that our procedure is sensitive to the number of iterations
in the Dyson equation; all presented results are for an “infi-
nite” number of iterations, so that a fixed point is reached,
but cutting off the procedure results in a smoother behavior
and a shift of the QCP, which becomes iteration dependent.
We have not previously encountered such volatile behavior in
any other spin model with a dimer to magnetic order transi-
tion. Since iterations translate into accounting of more and
more fluctuations, the sensitivity of the results seems to mean
that the situation starts spiraling out of control near the QCP,
quite likely because classes of fluctuations become important
that are not included in the dimer description, such as longer
range correlations, etc. All this suggests that the triplon quasi-
particle description breaks down near the QCP which indeed
appears natural in a model where spinon deconfinement is ex-
pected to take place at the QCP6. On the other hand, if we
put aside the arguments that our approach is not reliable near
the QCP, the natural conclusion would be that the transition is
first order.
Acknowledgments
We are grateful to A. W. Sandvik, K. S. D. Beach, S.
Sachdev, and O. P. Sushkov for numerous stimulating dis-
cussions. A.H.C.N. was supported through NSF grant DMR-
0343790; V.N.K., D.X.Y., and D.K.C. were supported by
Boston University.
1 S. Sachdev, Quantum Phase Transitions (Cambridge University
Press, Cambridge, 1999).
2 T. Senthil, A. Vishwanath, L. Balents, S. Sachdev, and M. P.
A. Fisher, Science 303, 1490 (2004); T. Senthil, L. Balents, S.
Sachdev, A. Vishwanath, and M. P.A. Fisher, Phys. Rev. B 70,
144407 (2004).
3 S. Sachdev and N. Read, Int. J. Mod. Phys. B 5, 219 (1991).
4 R. R. P. Singh, Z. Weihong, C. J. Hamer, and J. Oitmaa, Phys.
Rev. B 60, 7278 (1999); L. Capriotti, F. Becca, A. Parola, and
S. Sorella, Phys. Rev. Lett. 87, 097201 (2001); M. Mambrini,
A. Läuchli, D. Poilblanc, and F. Mila, Phys. Rev. B 74, 144422
(2006).
5 P. Henelius and A. W. Sandvik, Phys. Rev. B 62, 1102 (2000).
6 A. W. Sandvik, Phys. Rev. Lett. 98, 227202 (2007).
7 R. G. Melko and R. K. Kaul, Phys. Rev. Lett. 100, 017203 (2008).
8 J. Lou, A. W. Sandvik, and N. Kawashima, arXiv:0908.0740.
9 L. Isaev, G. Ortiz, and J. Dukelsky, arXiv:0903.1630.
10 A. B. Kuklov, M. Matsumoto, N. V. Prokof’ev, B. V. Svistunov,
and M. Troyer, Phys. Rev. Lett. 101, 050405 (2008).
11 F.-J. Jiang, M. Nyfeler, S. Chandrasekharan, and U.-J. Wiese, J.
Stat. Mech., P02009 (2008).
12 The possibility of four-spin exchange induced dimerization has
been discussed in the context of the full ring exchange, of which
the interaction (1) is part; see e.g. A. Läuchli, J. C. Domenge,
C. Lhuillier, P. Sindzingre, and M. Troyer, Phys. Rev. Lett. 95,
137206 (2005).
13 S. Sachdev and R. N. Bhatt, Phys. Rev. B 41, 9323 (1990).
14 A. L. Fetter and J. D. Walecka, Quantum Theory of Many-Particle
Systems (Dover Publications, Mineola, NY, 2003).
15 V. N. Kotov, O. P. Sushkov, Z. Weihong, and J. Oitmaa, Phys. Rev.
Lett. 80, 5790 (1998).
16 P. V. Shevchenko, A. W. Sandvik, and O. P. Sushkov, Phys. Rev.
B 61, 3475 (2000).
17 W. H. Zheng, C. J. Hamer, R. R. P. Singh, S. Trebst, and H.
Monien, Phys. Rev. B 63, 144411 (2001); ibid. 63, 144410 (2001).
http://arxiv.org/abs/0908.0740
http://arxiv.org/abs/0903.1630
ABSTRACT
  We study the S=1/2 Heisenberg antiferromagnet on a square lattice with
nearest-neighbor and plaquette four-spin exchanges (introduced by A.W. Sandvik,
Phys. Rev. Lett. {\bf 98}, 227202 (2007).)
  This model undergoes a quantum phase transition from a spontaneously
dimerized phase to N\'eel order at a critical coupling. We show that as the
critical point is approached from the dimerized side, the system exhibits
strong fluctuations in the dimer background, reflected in the presence of a
low-energy singlet mode, with a simultaneous rise in the triplet quasiparticle
density. We find that both singlet and triplet modes of high density condense
at the transition, signaling restoration of lattice symmetry. In our approach,
which goes beyond mean-field theory in terms of the triplet excitations, the
transition appears sharp; however since our method breaks down near the
critical point, we argue that we cannot make a definite conclusion regarding
the order of the transition.

<|endoftext|><|startoftext|>
Introduction
Let N and P be smooth (C∞) manifolds of dimensions n and p respectively. Let
Jk(N,P ) denote the k-jet space of the manifolds N and P with the projections
πkN and π
P onto N and P mapping a jet onto its source and target respectively.
The canonical fiber is the k-jet space Jk(n, p) of C∞-map germs (Rn, 0) →
(Rp, 0). Let K denote the contact group defined in [MaIII]. Let O(n, p) denote
a K-invariant nonempty open subset of Jk(n, p) and let O(N,P ) denote an
open subbundle of Jk(N,P ) associated to O(n, p). In this paper a smooth map
f : N → P is called an O-regular map if jkf(N) ⊂ O(N,P ).
We will study what is called the homotopy principle for O-regular maps.
As for the long history of the several types of homotopy principles and their
applications we refer to the Smale-Hirsch Immersion Theorem ([Sm] and [H]),
the Feit k-mersion Theorem ([F]), the Phillips Submersion Theorem ([P]) and
the general theorems due to Gromov ([G1]) and du Plessis ([duP1], [duP2] and
[duP3]). Furthermore, we should refer to the homotopy principle on the 1-jet
level for fold-maps due to Èliašberg ([E1] and [E2]) (see further references in
[G2]).
∗2000 Mathematics Subject Classification. Primary 58K30; Secondary 57R45, 58A20
†Key Words and Phrases: smooth map, singularity, homotopy principle
‡This research was partially supported by Grand-in-Aid for Scientific Research (No.
16540072).
http://arxiv.org/abs/0704.0115v2
Let C∞
(N,P ) denote the space consisting of all O-regular maps, N → P
equipped with the C∞-topology. Let ΓO(N,P ) denote the space consisting of all
continuous sections of the fiber bundle πkN |O(N,P ) : O(N,P ) → N equipped
with the compact-open topology. Then there exists a continuous map jO :
(N,P ) → ΓO(N,P ) defined by jO(f) = j
kf . If the following property (h-P)
holds, then we say in this paper that the relative homotopy principle on the
existence level holds for O-regular maps.
(h-P) Let C be a closed subset of N with ∂N = ∅. Let s be a section in
ΓO(N,P ) which has an O-regular map g defined on a neighborhood of C to P ,
where jkg = s. Then there exists an O-regular map f : N → P such that s
and jkf are homotopic relative to a neighborhood of C by a homotopy sλ in
ΓO(N,P ) with s0 = s and s1 = j
As important applications of [An7, Theorem 0.1] we will prove the following
relative homotopy principles in (h-P). Here, Σn−p+1,0(n, p) refers to the space
consisting of all fold jets in Jk(n, p).
Theorem 1.1 Let n and p be positive integers with n ≧ p ≧ 2 or n < p. Let k
be a positive integer with k ≧ n− |n− p|+ 2. Let O(n, p) denote a K-invariant
open subspace of Jk(n, p) containing all regular jets such that if n ≧ p ≧ 2,
then O(n, p) contains Σn−p+1,0(n, p) at least. Let N and P be connected smooth
manifolds of dimensions n and p respectively with ∂N = ∅. Let C be a closed
subset of N . Let s be a section in ΓO(N,P ) which has an O-regular map g
defined on a neighborhood of C to P , where jkg = s.
Then there exists an O-regular map f : N → P such that jkf is homotopic
to s relative to a neighborhood of C as sections in ΓO(N,P ).
Let ρ be an integer with ρ ≧ 1. Let W kρ denote the subset consisting of
all z ∈ Jk(n, p) such that the codimension of Kz in Jk(n, p) is not less than
ρ (k may be ∞). Let Okℓ (n, p) denote a K-invariant nonempty open subset of
Jk(n, p)\W kℓ+1. By applying Theorem 1.1 we will prove the following theorem.
Theorem 1.2 Let ℓ be a positive integer. Let k ≧ max{ℓ+1, n−|n−p|+2} or
k = ∞. Let Okℓ (n, p) denote a K-invariant open subspace of J
k(n, p) containing
all regular jets such that if n ≧ p ≧ 2, then Okℓ (n, p) contains Σ
n−p+1,0(n, p) at
least. Then the relative homotopy principle in (h-P) holds for Okℓ -regular maps.
It is well known that any smooth map f : N → P is homotopic to a smooth
map g : N → P such that j∞x g is of finite K-codimension for any x ∈ N (see,
for example, [W, Theorem 5.1]).
There have been described many important applications of the homotopy
principles in [G2]. We only refer to the recent applications of the relative ho-
motopy principle on the existence level to the problems in topology such as the
elimination of singularities and the existence of Okl -regular maps in [An1-7] and
[Sa] and the relation between the stable homotopy groups of spheres and higher
singularities in [An4].
Let P be a closed manifold of dimension p. Let h(P ) denote the group of all
homotopy classes of homotopy equivalences of P . Let hℓ(P ) denote the subset
of h(P ) which consists of all homotopy classes of maps which are homotopic
to O∞l -regular homotopy equivalences. In particular, h0(P ) is the subset of all
homotopy classes of maps which are homotopic to diffeomorphisms of P . In this
paper we will prove that the following filtration
h0(P ) ⊂ h1(P ) ⊂ · · · ⊂ hℓ(P ) ⊂ · · · ⊂ h(P ). (1.1)
is never trivial in general.
Theorem 1.3 For a given positive integer d, there exists a closed oriented p-
manifold P and a sequence of positive integers ℓ1, ℓ2, · · · , ℓd with ℓj < ℓj+1 for
1 ≤ j < d such that
h0(P ) & hℓ1(P ) & hℓ2(P ) & · · · & hℓd(P ) & h(P ).
In Section 2 we will review the results on the Boardman manifolds and the
fundamental properties of K-equivalence and K-determinacy which are neces-
sary in this paper. In Section 3 we will recall [An7, Theorem 0.1] and apply it in
the proofs of Theorems 1.1 and 1.2. In Section 4 we will study the nonexistence
problem of Okl -regular maps. In Section 5 we will study the filtration in (1.1)
and prove Theorem 1.3.
2 Boardman manifolds and K-orbits
Throughout the paper all manifolds are Hausdorff, paracompact and smooth of
class C∞. Maps are basically smooth (of class C∞) unless otherwise stated.
For a Boardman symbol (simply symbol) I = (i1, · · · , ik) with n ≧ i1 ≧ · · · ≧
ik ≧ 0, let Σ
I(n, p) denote the Boardman manifold of symbol I in Jk(n, p) which
has been defined in [T], [L], [Bo] and [MaTB]. Let An = R[[x1, · · · , xn]] denote
the formal power series of algebra on variables x1, · · · , xn. Letmn be its maximal
ideal and An(k) = An/m
n . Let z = j
0 f ∈ J
k(n, p) where f = (f1, · · · , fp) :
(Rn, 0) → (Rp, 0). We define I(z) to be the ideal in An(k) generated by the
image in An(k) of the Taylor expansions of f
1, · · · , fp. It has been proved in
[Bo] and [MaTB] that the Boardman symbol I(z) of z depends only on the
ideal I(z) by the notion of the Jacobian extension. Let ΣI(N,P ) denote the
subbundle of Jk(N,P ) overN×P associated to ΣI(n, p). Let ΣIx,y(N,P ) denote
the fiber of ΣI(N,P ) over (x, y) ∈ N × P .
Since codimΣi1(n, p) = (p−n+ i1)i1, the following proposition follows from
[An6, Remark 2.1], which has been proved by using the results in [Bo, Section
Proposition 2.1 Let I = (i1, · · · , iℓ) be a symbol such that i1 ≧ max{n− p+
1, 1} and ΣI(n, p) is nonempty. Then we have
codimΣI(n, p) ≧ (p− n+ i1)i1 + (1/2)Σ
j=2 ij(ij + 1).
In particular, if iℓ > 0, then we have codimΣ
I(n, p) ≧ |n− p|+ ℓ.
Let ΩI(n, p) denote the union of all Boardman manifolds ΣJ (N,P ) with
J ≤ I in the lexicographic order. We have the following lemma (see [duP1]).
Lemma 2.2 The space ΩI(n, p) is open in Jk(n, p).
Let us review the K-equivalence of two smooth map germs f, g : (N, x) →
(P, y), which has been introduced in [MaIII, (2.6)], by following [Mart, II, 1].
We say that the above two map germs f and g are K-equivalent if there exists a
smooth map germ φ : (N, x) → GL(Rp) and a local diffeomorphism h : (N, x) →
(N, x) such that f(x) = φ(x)g(h(x)). It is known that this K-equivalence is
nothing but the contact equivalence introduced in [MaIII]. The contact group
K is defined as a certain subgroup of the group of germs of local diffeomorphisms
(N, x)× (P, y) and acts on Jkx,y(N,P ). For a k-jet z in J
x,y(N,P ) let Kz denote
the orbit of K through z. As is well known, Kz is an orbit of a Lie group. Hence,
Kz is a submanifold of Jkx,y(N,P ). This fact is also observed from the above
definition. The following lemma is important in this paper.
Lemma 2.3 The Boardman manifold ΣIx,y(N,P ) in J
x,y(N,P ) is invariant
with respect to the action of K.
Proof. Let z = jkxf and w = j
xg be k-jets in J
x,y(N,P ) such that two map
germs f and g are K-equivalent as above. Let h∗ : Cx → Cx be the isomorphism
defined by h∗(φ) = φ◦h. By the definition of K-equivalence we have h∗(I(g)) =
I(f). The Thom-Boardman symbols of jkxf and j
xg are determined by I(f)
and I(g), and are the same by [MaTB, 2, Corollary]. This proves the assertion.
Let us review the results in [MaIII], [MaIV] and [MaV] which are necessary
in this paper. Let C∞(N, x) and C∞(P, y) denote the rings of smooth function
germs on (N, x) and (P, y) respectively. Let mx and my denote their maximal
ideals respectively. Let f : (N, x) → (P, y) be a germ of a smooth map. Let
f∗ : C∞(P, y) → C∞(N, x) denote the homomorphism defined by f∗(a) = a◦f .
Let θ(N)x denote the C
∞(N, x)-module of all germs at x of smooth vector fields
on (N, x). We define θ(P )y similarly for y ∈ P . Let θ(f)x denote the C
∞(N, x)-
module of germs at x of smooth vector fields along f , namely which consists of
all smooth germs ς : (N, x) → TP such that pP ◦ ς = f . Here, pP : TP → P is
the canonical projection. Then we have the homomorphisms
tf : θ(N)x → θ(f)x (2.1)
defined by tf(uN) = df ◦ uN for uN ∈ θ(N)x. For a singular jet z = j
0 f ∈
Jk(N,P ) there has been defined the isomorphism
x,y(N,P )) −→ mxθ(f)x/m
x θ(f)x (2.2)
in [MaIII, (7.3)] such that Tz(Kz) corresponds to tf(mxθ(N)x)+f
∗(my)(θ(f)x)
modulo mk+1x θ(f)x. We do not here explain the definition. According to [MaIII]
we define d(f,K) to be
dimmxθ(f)x/(tf(mxθ(N)x) + f
∗(my)(θ(f)x)), (2.3)
which is equal to codimKz.
3 Proofs of Theorems 1.1 and 1.2.
In this section we prove Theorems 1.1 and 1.2.
Let k be a positive integer. Let W kρ = W
ρ (n, p) denote the subset consisting
of all z ∈ Jk(n, p) such that the codimension of Kz in Jk(n, p) is not less than
ρ. The following lemma has been observed in [MaV, Section 7 and Proof of
Theorem 8.1].
Lemma 3.1 Let ρ be an integer with ρ ≧ 1. Then W kρ is an algebraic subset of
Jk(n, p).
The order of K-determinacy is estimated by the codimension of a K-orbit as
follows.
Proposition 3.2 Let k be an integer with k > ρ. Let z = jkf be a singular jet
in Jk(n, p)\W kρ+1. Then z is K-k-determined.
Proof. It follows from [W, Theorem 1.2 (iii)] that if d =codimKz, then z
is K-(d + 1)-determined. Hence, if z ∈ Jk(n, p)\W kρ+1, then d ≤ ρ and z is
K-k-determined.
We define the bundle homomorphism
d : (πkN )
∗(TN) −→ (πkk−1)
∗(TJk−1(N,P )), (3.1)
d1 : (π
∗(TN) −→ (πkP )
∗(TP ).
Let w = jkxf ∈ J
x,y(N,P ) and z = π
k(w). Then we have j
k−1f : (N, x) →
(Jk−1(N,P ), z) and d(jk−1f) : TxN → Tz(J
k−1(N,P )). We set
dz(w,v) = (w, d(j
k−1f)(v)) and (d1)z(w,v) = (w, df(v)).
Let I ′ be a symbol of length k. Let K(ΣI
) denote the kernel subbundle of
(πkN |Σ
I′(N,P ))∗(TN) defined by
)w = (w,Ker(dxf)).
The following theorem follows from the corresponding assertion for the case
k = ∞ in [B, (7.7)]. This is very important in the proof of Theorem 1.1.
Theorem 3.3 If I ′ = (i1, · · · , ik−2, 0, 0) and I = (i1, · · · , ik−2,0), then we have
d(K(ΣI
)w) ∩ (π
k−1|Σ
I′(N,P ))∗(T (ΣI(N,P ))w = {0}
for any w ∈ ΣI
(N,P ).
Let us review a general condition on O(n, p) for the relative homotopy prin-
ciple on the existence level in [An7]. We say that a nonempty K-invariant open
subset O(n, p) is admissible if O(n, p) consists of all regular jets and a finite
number of disjoint K-invariant nonempty submanifolds V i(n, p) of codimension
ρi (1 ≤ i ≤ ι) such that the following properties (H-i) to (H-v) are satisfied.
(H-i) V i(n, p) consists of singular k-jets of rank ri, namely, V
i(n, p) ⊂
Σn−ri(n, p).
(H-ii) For each i, the set O(n, p)\{∪ιj=iV
j(n, p)} is an open subset.
(H-iii) For each i with ρi ≤ n, there exists a K-invariant submanifold
V i(n, p)(k−1) of Jk−1(n, p) such that V i(n, p) is open in (πkk−1)
−1(V i(n, p)(k−1)).
(H-iv) If n ≧ p ≧ 2, then V 1(n, p) = Σn−p+1,0(n, p).
Here, Σn−p+1,0(n, p) denotes the Thom-Boardman manifold in Jk(n, p), which
consists of K-orbits of fold jets. Let V i(N,P ) denote the subbundle of Jk(N,P )
associated to V i(n, p). Let K(V i) be the kernel bundle in (πkN )
∗(TN)|V i(N,P )
defined by K(V i)z = (z,Ker(dxf)).
(H-v) For each i with ρi ≤ n and any z ∈ V
i(N,P ), we have
d(K(V i)z) ∩ (π
k−1|V
i(N,P ))∗(T (V i(N,P )(k−1))z = {0}. (3.2)
Then we have proved the following theorem in [An7, Theorem 0.1].
Theorem 3.4 Let k ≧ n − |n − p| + 2. Let n ≧ p ≧ 2 or n < p. Let O(n, p)
denote an admissible open subspace of Jk(n, p). Then the relative homotopy
principle in (h-P) holds for O-regular maps.
We set
VI(n, p) = O(n, p) ∩ Σ
I(n, p).
Let J = (j1, · · · , jk) be a symbol of a singular jet with codimΣ
J (n, p) ≤ n. If
k ≧ n − |n − p|+ 2, we have by Proposition 2.1 that ik−1 = ik = 0. Indeed, if
ik−1 > 0, then
codimΣJ (n, p) ≧ |n− p|+ k − 1 ≧ n+ 1.
So we set J = (j1, · · · , jk−2, 0, 0), J
∗ = (i1, · · · , ik−2,0) and
VJ∗(n, p)
(k−1) = πkk−1(O(n, p)) ∩ Σ
J∗(n, p).
Lemma 3.5 Let J = (j1, · · · , jk−2, 0, 0) and J
∗ = (j1, · · · , jk−2,0) be as above.
Then VJ (n, p) is open in (π
−1(VJ∗(n, p)
(k−1)).
Proof. It is evident that
ΣJ(n, p) = (πkk−1)
−1(ΣJ
(n, p)) and O(n, p) ⊂ (πkk−1)
−1(πkk−1(O(n, p))).
So we have VJ (n, p) ⊂ (π
−1(VJ∗(n, p)
(k−1)). Since πkk−1 is an open map, we
have that VJ (n, p) is an open subset of (π
−1(VJ∗(n, p)
(k−1)).
Let us prove Theorem 1.1.
Proof of Theorem 1.1. By Theorem 3.4 it is enough to prove that O(n, p)
is admissible. Let J be a symbol of length k. By Lemma 2.3, VJ (n, p) is K-
invariant. We have that
(H1) O(n, p) is decomposed into a finite union of all VJ (n, p),
(H2) For each symbol J , the set O(n, p) ∩ ΩJ(n, p) is an open subset of
O(n, p),
(H3) VJ (n, p) is open in (π
−1(VJ∗(n, p)
(k−1)) by lemma 3.5,
(H4) If n ≧ p ≧ 2, then O(n, p) ⊃ Σn−p+1,0(n, p) by the assumption,
(H5) Property (3.2) holds for VJ (n, p) by Theorem 3.3 and Lemma 3.5.
Since O(n, p) satisfies the properties (H1) to (H5), we have proved Theorem
We next prove Theorem 1.2.
Proof of Theorem 1.2. If ℓ is finite, then it follows from Lemma 3.2 that if
k > ℓ, then any k-jet z of Jk(n, p)\W kℓ+1 is K-k-determined and we have
(π∞k )
−1(Okℓ (n, p)) = O
ℓ (n, p).
Therefore, if k ≧ max{ℓ+1, n−|n−p|+2}, then the relative homotopy principle
in (h-P) holds for Okℓ -regular maps by Theorem 1.1 and also for O
ℓ -regular
maps.
Corollary 3.6 Under the same assumption of Theorem 1.2, given a map f :
N → P is homotopic to an Okℓ -regular map if and only if there exists a section
s ∈ Γ
(N,P ) such that πkP ◦ s is homotopic to f .
Corollary 3.7 Let hℓ(P ) be as in Introduction. Then the homotopy class of a
homotopy equivalence f : P → P lies in hℓ(P ) if and only if j
∞f is homotopic
to a section in ΓO∞
(N,P ).
Here we give two remarks.
Remark 3.8 Let W∞
denote the subspace of J∞(n, p) which consists of all jets
z such that any smooth map germ f with z = j∞f is not finitely determined. Let
(N,P ) is the subbundle of J∞(N,P ) associated to W∞
. It has been proved
(see, for example, [W, Theorem 5.1]) that W∞
is not of finite codimension
in J∞(n, p). Consequently, the space of all smooth maps f : N → P with
j∞f(N) ⊂ J∞(N,P )\W∞
(N,P ) is dense in C∞(N,P ). In other words if N
is compact, then a smooth map f : N → P has an integer ℓ such that f is
homotopic to an O∞ℓ -regular map.
Remark 3.9 It is very important to study the topology of the space W kℓ+1(n, p)
and obstructions for finding an Okℓ -regular map. The Thom polynomials related
to W kℓ+1(n, p) have been studied in the dimensions n = p ≦ 8 in [O] and [F-R].
4 Nonexistence theorems
In this section we will discuss the nonexistence of Okℓ -regular maps f : N → P .
Let W kℓ+1(N,P ) denote the subbundle of J
k(N,P ) associated to W kℓ+1(n, p). By
the homotopy principle for Okℓ -regular maps in Theorem 1.2, the existence of
a section of Jk(N,P )\W kℓ+1(N,P ) over N is equivalent to the existence of an
Okℓ -regular map. However, it is not so easy to find obstructions associated to
W kℓ+1(N,P ) such as Thom polynomials of W
ℓ+1(N,P ), and so we will adopt a
method applied in [An1], [I-K] and [duP4] in this section.
For k ≧ p+1, let Σ(n, p; k) denote the algebraic subset of all C∞-nonstable
k-jets of Jk(n, p) defined in [MaV]. Note that for k′ > k, (πk
−1(Σ(n, p; k)) =
Σ(n, p; k′). We have proved the following proposition in [An1, Corollary 5.6].
Proposition 4.1 Let k ≧ p+ 1. If
(p− n+ i)(
i(i+ 1)− p+ n)− i2 ≧ n,
then we have that Σi(n, p) ⊂ Σ(n, p; k).
In [I-K] the following proposition has been proved, while it has not been
stated explicitly and the proof has been given in the context without the details.
So we give a sketchy proof.
Proposition 4.2 ([I-K]) Let ℓ be a nonnegative integer and k ≧ p+ ℓ+ 1. If
(p− n+ i)(
i(i+ 1)− p+ n)− i2 ≧ n+ ℓ,
then we have that Σi(n, p) ⊂ W kℓ+1(n, p). In particular, if n = p and
i2(i−1) ≧
n+ ℓ, then we have that Σi(n, n) ⊂ W kℓ+1(n, n).
Proof. Take a jet z in Σi(n, p) such that z = jk0 f . Suppose that z /∈ W
ℓ+1, and
hence codimKz ≦ ℓ. By [MaIV] there exists a versal unfolding F : (Rn×Rℓ, 0) →
(Rp × Rℓ, 0) of f and jk
(0,0)
F /∈ Σ(n + ℓ, p+ ℓ; k). Here, we note that jk
(0,0)
of kernel rank i. By the assumption and Proposition 4.1 we have
Σi(n+ ℓ, p+ ℓ) ⊂ Σ(n+ ℓ, p+ ℓ; k).
This implies jk
(0,0)
F ∈ Σ(n + ℓ, p+ ℓ; k). This is a contradiction. Hence, z lies
in W kℓ+1.
We show the following proposition by applying Proposition 4.2.
Proposition 4.3 Let ℓ be a nonnegative integer and k ≧ p+ℓ+1. If Σi(n, p) ⊂
W kℓ+1(n, p), then we have that for any positive integer m, Σ
i(m + n,m + p) ⊂
W kℓ+1(m+ n,m+ p).
Proof. Let z = jk0f ∈ Σ
i(m + n,m+ p). Setting α = j10f , we identify α with
the homomorphism Rm+n → Rm+p. Let Ker(α)⊥ and Im(α)⊥ be the orthogonal
complement of the kernel Ker(α) and the image Im(α) of α respectively. Let L
and M be subspaces of Ker(α)⊥ and Im(α) of dimension m such that α maps
L onto M isomorphically. Let L⊥ and M⊥ be their orthogonal complements
in Ker(α)⊥ and Im(α) respectively. Then α is decomposed as in the following
exact sequence.
0 → Ker(α) → L⊕ L⊥ ⊕Ker(α)
→ M ⊕M⊥ ⊕ Im(α)⊥ → Im(α)⊥ → 0
Let us choose coordinates
(u1, · · · , um), (um+1, · · · , um+n−i) and (um+n−i+1, · · · , um+n)
of L, L⊥ and Ker(α), and coordinates
(y1, · · · , ym), (ym+1, · · · , ym+n−i) and (ym+n−i+1, · · · , ym+p)
of M , M⊥ and Im(α)⊥ respectively. Since α maps L onto M isomorphically,
there exist the new coordinates (x1, · · · , xm+n) of R
m+n such that
xj = xj(u1, · · · , um+n) (1 ≤ j ≤ m) and xj = uj (m+ 1 ≤ j ≤ m+ n)
and that
yj ◦ f(x1, · · · , xm+n) = xj (1 ≤ j ≤ m). (4.1)
Setting
x = (xm+1, · · · , xm+n), we define the map g : (R
n, 0) → (Rp, 0) by
yj ◦ g(
x) = yj ◦ f(0, · · · , 0,
x) (m+ 1 ≤ j ≤ m+ p).
Then f is an unfolding of g by (4.1) and g is of kernel rank i at the origin. We
next prove by following the argument and the notation used in [MaIV, Section
1] that d(g,K) is equal to d(f,K). Define π : θ(f) → θ(g) by
ajtf(
j=m+1
j=m+1
◦ g),
where aj ∈ C
∞(Rm+n, 0), a′j ∈ C
∞(Rn, 0) and a′j(
x) = aj(0, · · · , 0,
x). We note
tf(∂/∂xj) = (∂/∂yj) ◦ f +
t=m+1(∂yt ◦ f/∂xj)(∂/∂yt) ◦ f (1 ≤ j ≤ m),
tf(∂/∂xj) =
t=m+1(∂yt ◦ f/∂xj)(∂/∂yt) ◦ f (m+ 1 ≤ j ≤ m+ n),
(∂yt ◦ f/∂xj)(0, · · · , 0,
x) = (∂yt ◦ g/∂xj)(
x) (m+ 1 ≤ t ≤ m+ p).
Since
yt ◦ f(x1, · · · , xm+n)− yt ◦ f(0, · · · , 0,
xubu(x1, · · · , xm+n),
for some bj ∈ C
∞(Rm+n, 0), we have
∂yt ◦ f/∂xj − ∂yt ◦ g/∂xj =
xu(∂bu/∂xj) (m+ 1 ≤ j ≤ m+ n).
Hence, the assertion follows from an elementary calculation under the definition
in (2.3).
Since jk0 g ∈ Σ
i(n, p) ⊂ W kℓ+1(n, p), we have d(g,K) ≧ ℓ+ 1. Hence, we have
d(f,K) ≧ ℓ+ 1. This shows z ∈ W kℓ+1(m+ n,m+ p). This is what we want.
Let ξ be a stable vector bundle over a space. Let c(Σi, ξ) denote the de-
terminant of the (p− n+ i)-matrix whose (s, t)-component is the (i+ s− t)-th
Stiefel-Whitney class Wi+s−t(ξ). If n − p and i are even, say n − p = 2u and
i = 2v, and if ξ is orientable, then cZ(Σ
i, ξ) expresses the determinant of the
(v − u)-matrix whose (s, t)-component is the (v + s − t)-th Pontrjagin class
Pv+s−t(ξ).
Wi · · · Wn−p+1
. . .
Wp−n+2i−1 · · · Wi
Pv · · · Pu+1
. . .
P2v−u−1 · · · Pv
Let τX denote the stable tangent bundle of a manifold X . If f : N → P is a
smooth map transverse to Σi(N,P ) and ξ = τN − f
∗(τP ), then c(Σ
i, ξ) (resp.
i, ξ)) is equal to the (resp. integer) Thom polynomial of the topological
closure of (jkf)−1(Σi(N,P )) ([Po], [Ro] and see also [An1, Proposition 5.4]). If
it does not vanish, then (jkf)−1(Σi(N,P )) cannot be empty by the obstruction
theory in [St]. Hence, we have the following corollary of Propositions 4.2 and
Corollary 4.4 Let f : M → Q be a smooth map with dimM = m + n and
dimQ = m+p. Under the same assumption of Proposition 4.2. we assume that
either
(i) c(Σi, τM − f
∗(τQ)) does not vanish, or
(ii) M and τM −f
∗(τQ) are orientable, n−p and i are even and cZ(Σ
i, τM −
f∗(τQ)) does not vanish.
Then f is not homotopic to any Okℓ -regular map.
5 Homotopy equivalences
In this section we will study the filtration in (1.1) in Introduction by applying
Corollaries 3.7 and 4.4 and Remark 3.8.
Let us first review what is called the Sullivan’s exact sequence in the surgery
theory following [M-M] (see also [K-M], [Su] and [Br]).
In what follows P is a closed and oriented n-manifold. We define the set S(P )
to be the set of all equivalence classes of homotopy equivalences f : N → P of
degree 1 under the following equivalence relation. Let Nj be closed oriented n-
manifolds and let fj : Nj → P be homotopy equivalences of degree 1 (j = 1, 2).
We say that f1 and f2 are equivalent if there exists an h-cobordism W of N1
and N2 and a homotopy equivalence F : (W,N1 ∪ (−N2)) → (P × [0, 1], P × 0∪
(−P )× 1) of degree 1 such that F |Nj = fj (j = 1, 2).
Let O(k) denote the rotation group of Rk and let Gk denote the space of all
homotopy equivalence of the (k − 1)-sphere Sk−1 equipped with the compact-
open topology. By considering the canonical inclusions O(k) → O(k + 1) and
Gk → Gk+1, we set O = limk→∞ O(k) and G = limk→∞ Gk. Let BO and BG
denote the classifying spaces for O and G. Then we have the canonical maps
π(m) : BO(m) → BG(m) and π : BO → BG, which are regarded as fibrations
with fibers G(m)/O(m) and G/O respectively. For a sufficiently large number
m, let ηO(m) denote the universal vector bundle over BO(m) and let iG/O :
G(m)/O(m) → BO(m) be the inclusion of a fiber. Set ηG/O = (iG/O)
∗ηO(m).
Then ηG/O has a trivialization tG/O : ηG/O → R
m as a spherical fibration.
We next recall the surgery obstruction sP4q : [P,G/O] → Z only in the case
of n = 4q. For [α] ∈ [P,G/O] let η = α∗(ηG/O) with the canonical bundle map
α : η → ηG/O covering α and the projection πη onto P . We deform tG/O ◦ α
to a map transverse to 0 ∈ Rm and let M be the inverse image of 0 with a
map πη|M : M → P of degree 1. We define s
4q([α]) = (1/8)(σ(M) − σ(P )).
If P is simply connected in addition, then there have been defined an injection
jP : S(P ) → [P,G/O] such that if sP4q([α]) = 0, πη|M is deformed to a homotopy
equivalence f : N → P of degree 1 under a certain cobordism. The following is
the Sullivan’s exact sequence.
0 −→ S(P )
−→ [P,G/O]
Let us recall the cobordism group Ωh−eqn of homotopy equivalences of degree
1 in [An5]. Let Nj and Pj be oriented closed n-manifolds and let fj : Nj → Pj
be homotopy equivalences of degree 1 (j = 1, 2). We say that f1 and f2 are
cobordant if there exists an oriented (n + 1)-manifold W , V and a homotopy
equivalence F : (W,∂W ) → (V, ∂V ) of degree 1 such that ∂W = N1 ∪ (−N2),
∂V = P1 ∪ (−P2) and F |Nj = fj . The cobordism class of f : N → P is denoted
by [f : N → P ]. Let Ωh−eqn denote the set which consists of all cobordism
classes of homotopy equivalences of degree 1. We provide Ωh−eqn with a module
structure by setting
• [f1 : N1 → P1] + [f2 : N2 → P2] = [f1 ∪ f2 : N1 ∪N2 → P1 ∪ P2],
• −[f : N → P ] = [f : (−N) → (−P )].
The null element is defined to be [f : N → P ] which bound a homotopy equiv-
alence F : (W,∂W ) → (V, ∂V ) of degree 1 such that ∂W = N , ∂V = P and
F |N = f . Even if P is not simply connected, we can find f1 : N1 → P1 with P1
being simply connected in the same cobordism class by killing π1(N) ≈ π1(P )
by usual surgery.
Let cQ(Σ
2i, ηG/O) denote the image of cZ(Σ
2i, ηG/O) in H
4i2(G/O;Q). Let
α = jP ([f : N → P ]) and let cP : P → BSO be a classifying map of the
tangent bundle TP of P . Then it induces the homomorphism C2i : Ω
H4q−4i2(G/O;Q) defined by
C2i([f : N → P ]) = cQ(Σ
2i, ηG/O) ∩ α([P ])
= cQ(Σ
2i, ηG/O)⊗ 1 ∩ (α × cP )∗([P ]),
under the identification
H4q−4i2 (G/O;Q) = H4q−4i2(G/O;Q)⊗ 1
j=0 H4j(G/O;Q)⊗H4q−4i2−4j(BSO;Q). We have that
C2i(α) = cQ(Σ
2i, ηG/O) ∩ (α)∗([P ])
= cQ(Σ
2i, ηG/O) ∩ (α ◦ f)∗([N ])
= (α ◦ f)∗((α ◦ f)
∗(cQ(Σ
2i, ηG/O)) ∩ [N ])
= (α ◦ f)∗(cZ(Σ
2i, τN − f
∗(τP )) ∩ [N ]).
Furthermore, we have proved in [An5, Theorems 3.2 and 4.1] that for integers
q and i with q ≧ i2 ≧ 1,
4q /(Ω
4q ∩Ker(C2i))⊗ Q = dimH4q−4i2 (BSO;Q). (5.1)
The following theorem follows from (5.1), Proposition 4.2 and Corollary 4.4.
Theorem 5.1 Let ℓ, q and i be integers with ℓ ≧ 0 and q ≧ i2. Let k ≧
4q + ℓ + 1. There exists a cobordism class [f : N → P ] ∈ Ω
4q such that
2i, τN − f
∗(τP )) is not a torsion element and that if 4i
3 − 2i2 ≧ 4q + ℓ ≧
4i2 + ℓ, then f is not cobordant in Ω
4q to any O
ℓ -regular map.
We can prove the following theorem using Theorem 5.1 by applying the same
argument in the proof of [An5, Theorem 0.2]. However, Theorem 1.2 is very
important in the following and the situation is rather different. Therefore, we
give its proof.
Theorem 5.2 Let ℓ, q and i be given integers with ℓ ≧ 0 and q ≧ i2. Let
k ≧ 8q + ℓ + 1. If 4i3 − 2i2 ≧ 4q + ℓ ≧ 4i2 + ℓ, then there exists a closed
connected oriented 8q-manifold P and a homotopy equivalence f : P → P of
degree 1 such that cZ(Σ
2i, τP −f
∗(τP )) 6= 0 and that f is not cobordant in Ω
to any Okℓ -regular homotopy equivalence of degree 1.
Proof. It follows from Theorem 5.1 that there exists a homotopy equivalence
f : N → P of degree 1 between 4q-manifolds such that cZ(Σ
2i, τN − f
∗(τP )) is
not a torsion element. Let f−1 : P → N be a homotopy inverse of f . Define
g : N×P → N×P by g(x, y) = (f−1(y), f(x)). We have k ≥ dimN×P + ℓ+1.
If we prove that cZ(Σ
2i, τN×P − g
∗(τN×P )) does not vanish, then, by Corollary
4.4, g is not homotopic to any Okℓ -regular map. We set ξ = τN×P −g
∗(τN×P ) =
τN × τP − f
∗(τP )× (f
−1)∗(τN ). Then
pj(ξ) =
s+t=j
ps(τN × τP )pt(f
∗(τP )× (f
−1)∗(τN ))
s+t=j
s1 + s2 = s
t1 + t2 = t
ps1(τN )pt1(f
∗(τP ))⊗ ps2(τP )pt2((f
−1)∗(τN ))
modulo torsion in H∗(N ;Z) ⊗ H∗(P ;Z). The term of pj(ξ) which lies in
H4j(N ;Z)⊗H0(P ;Z) is equal modulo torsion to
s+t=j
ps(τN )pt(f
∗(τP ))⊗ 1 = pj(τN − f
∗(τP ))⊗ 1.
Hence, we have that cZ(Σ
2i, τN×P−g
∗(τN×P ) is equal to the sum of cZ(Σ
2i, τN−
f∗(τP )) ⊗ 1 and the other term which lies in Σ
4i2−4j(N ;Z) ⊗ H4j(P ;Z)
modulo torsion. Since cZ(Σ
2i, τN − f
∗(τP )) does not vanish, it follows that
2i, τN×P − g
∗(τN×P )) does not vanish. This completes the proof.
We are now ready to prove Theorem 1.3.
Proof of Theorem 1.3. In the proof k refers to a sufficiently large integer.
Let i0 = 2, which is the smallest integer such that 4i
3 − 2i2 ≧ 4i2 with q = 4
and ℓ = 8. Then we have, by Theorem 5.2, a closed connected oriented 8 · 4-
manifold P0 and a homotopy equivalence f0 : P0 → P0 of degree 1 such that
4, τP0 − f
0 (τP0)) 6= 0 and that f0 is not homotopic to any O
8 -regular map.
By Remark 3.8 there exists an integer ℓ such that f0 is homotopic to an O
regular map. Let ℓ1 be such a smallest integer.
We assume the following (A-t) for an integer t ≧ 0, where ℓ0 = 8.
(A-t) We have constructed integers ℓt, ℓt+1, it, a closed oriented 8·i
t -manifold
Pt and an O
-regular homotopy equivalence ft : Pt → Pt of degree 1 such
that 4i3t − 2i
t ≧ 4i
t + ℓt, ℓt+1 > ℓt, cZ(Σ
2it , τPt − f
t (τPt)) 6= 0 and that ft is
not homotopic to any Okℓt-regular map.
Under the assumption (A-t) we prove (A-(t+ 1)) with ℓt+1 < ℓt+2. Let it+1
be the smallest integer among the integers i > 0 with 4i3 − 2i2 ≧ 4i2 + ℓt+1.
Then it follows from Theorem 5.2 that there exist a closed connected oriented
8 · i2t+1-manifold Pt+1 and a homotopy equivalence ft+1 : Pt+1 → Pt+1 of degree
1 such that cZ(Σ
2it , τPt+1 − f
t+1(τPt+1)) 6= 0 and that ft+1 is not homotopic to
any Okℓt+1-regular map. It follows Remark 3.8 that there exists an integer ℓ such
that ft+1 is homotopic to an O
ℓ -regular map. Let ℓt+2 be the smallest integer
among those integers ℓ. Hence, we have ℓt+2 > ℓt+1. This proves (A-(t+ 1)).
Thus we have defined the sequences {it}, {ℓt}, closed connected oriented
manifolds {Pt} of dimensions {8 · i
t} and homotopy equivalences {ft} of degree
1 which satisfy the above properties.
Given a positive integer d, let
P = P0 × P1 × P2 × · · · × Pd,
Ft = idP0 × · · · × idPt−1 × ft × idPt+1 × · · · × idPd (0 ≦ t ≦ d),
and p =
t=0 8 · i
t . We show that Ft /∈ hℓt(P ) and Ft ∈ hℓt+1(P ). Let
qt : P → Pt be the canonical projection. Then the stable tangent bundle τP is
isomorphic to q∗0(τP0)⊕ q
1(τP1 )⊕ · · · ⊕ q
d(τPd). Hence, τP − F
t (τP ) is equal to
q∗0(τP0)⊕ q
1(τP1)⊕ · · · ⊕ q
d(τPd)
− ((q0 ◦ Ft)
∗(τP0 )⊕ (q1 ◦ Ft)
∗(τP1)⊕ · · · ⊕ (qd ◦ Ft)
∗(τPd))
= q∗0(τP0 )⊕ q
1(τP1)⊕ · · · ⊕ q
d(τPd)
− (q∗0(τP0 )⊕ · · · ⊕ q
t−1(τPt−1 )⊕ (ft ◦ qt)
∗(τPt)⊕ · · · ⊕ q
d(τPd))
= q∗t (τPt)− (ft ◦ qt)
∗(τPt)
= q∗t ((τPt)− f
t (τPt)).
This shows that
2it , τP − F
t (τP )) = cZ(Σ
2it , q∗t ((τPt)− f
t (τPt))
= q∗t (cZ(Σ
2it , τPt − f
t (τPt)),
which does not vanish in H2i
t (P ;Z) since cZ(Σ
2it , τPt − f
t (τPt)) 6= 0 and since
q∗t : H
t (Pt;Z) → H
t (P ;Z) is injective. Furthermore, it follows from Propo-
sition 4.3 that Σ2it(p, p) ⊂ W kℓ+1(p, p) and from Corollary 4.4 that Ft is not
homotopic to any Okℓt -regular map. However, since ft is homotopic to an O
regular map, Ft is also homotopic to an O
-regular map. This proves the
theorem.
We prepare further results which are necessary to study the filtration in
(1.1). The assertions (i) and (ii) in the following theorem have been proved in
[An2, Theorem 4.8] and [An4, Theorem 4.1] respectively, which are applications
of the relative homotopy principles for O-regular maps.
Theorem 5.3 Let P be orientable and f : P → P be a smooth map.
(i) A map f is homotopic to a fold-map if and only if τP is isomorphic to
f∗(τP ).
(ii) If a map f is Ω1-regular, then f is homotopic to an Ω(1,1,0)-regular map.
Let V (n, p) be an algebraic set of Jk(n, p) which is invariant with respect
to the actions of local diffeomorphisms of (Rn, 0) and (Rn, 0) and Let V (N,P )
be the subbundle of Jk(N,P ) associated to V (n, p). By [B-H] we have the
fundamental class of V (N,P ) under the coefficient group Z/2, and have the
Thom polynomial c(V (n, p), τN − f
∗(τP )) of V (N,P ).
Theorem 5.4 Let V (p, p) be as above. Let P be orientable and f : P → P be
a smooth map.
(i) If f is a homotopy equivalence, then c(V (p, p), τP − f
∗(τP )) vanishes.
(ii) cZ(W
p (p, p), τP − f
∗(τP )) = 0 for p = 5, 6, 7 and
8 (8, 8), τP − f
∗(τP )) = 9P2(τP − f
∗(τP )) + 3P
1 (τP − f
∗(τP ))
for p = 8.
(iii) Let 2 ≦ p ≦ 8. Then there exists a section s of Okp−1(P, P ) over P with
πkP ◦ s and f being homotopic if and only if cZ(W
p (p, p), τP − f
∗(τP )) = 0.
Proof. (i) Let S(νP ) denote the spherical normal fiber space of P . It follows
from [Sp] that S(νP ) is equivalent to f
∗(S(νP )). Hence, the associated spherical
spaces of τP and f
∗(τP ) are equivalent. In particular, the Stiefel-Whitney classes
of τP − f
∗(τP ) vanish.
(ii) If p ≦ 8, then a map f : P → P is homotopic to a smooth map with
only K-simple singularities by [MaVI]. According to [F-R], the integer Thom
polynomial ofW kp (p, p) is equal to the formula for p = 8 and vanish for p = 5, 6, 7
in Hp(P ;Z) ≈ Z.
(iii) It follows from the relative homotopy principle for Okp−1-regular maps
P → P that the primary obstruction in Hp(P ;πp−1(O
p−1(p, p)) is the unique
obstruction for finding the required section. By an elementary argument we
πp−1(O
p−1(p, p)) ≈ Hp−1(O
p−1(p, p);Z) ≈ H
dimWk
(p,p)(W kp (p, p);Z).
This shows the assertion.
Finally we study the filtration in (1.1) in the case of P being orientable and
p ≦ 8 by applying the homotopy principles in Theorems 1.2 and 5.3. We have
hp(P ) = h(P ).
Examples.
Case: p ≦ 3; h0(P ) ⊂ h1(P ) = h(P ).
Since P is parallelizable, TP and f∗(TP ) are trivial. So a map f : P → P
is homotopic to a fold-map. We refer the reader to [Ru, 1].
Case: p = 4; h0(P ) ⊂ h1(P ) ⊂ h2(P ) = h3(P ) ⊂ h4(P ).
It is known that cZ(Σ
4; τP − f
∗(τP )) = P2(τP − f
∗(τP )). If this class van-
ish, then there exists a section P → Ω1(P, P ) covering f , and hence an Ω1-
regular map by [F]. By Theorems 5.3 and 5.4 we obtain an Ω(1,1,0)-regular
map homotopic to f . It has been proved in [Ak] that h0(P ) 6= h(P ) for
P = S3 × S1#S2 × S2.
Case: 5 ≦ p ≦ 7; h0(P ) ⊂ h1(P ) ⊂ · · · ⊂ hp−1(P ) = hp(P ).
This follows from Theorems 1.2 and 5.4.
Case: p = 8; h0(P ) ⊂ h1(P ) ⊂ · · · ⊂ h7(P ) ⊂ h8(P ).
If 9P2(τP − f
∗(τP )) + 3P
1 (τP − f
∗(τP )) = 0, then the homotopy class of f
lies in h7(P ) by Theorems 1.2 and 5.4.
For more precise information we must investigate the obstructions for finding
sections in Γ
(P, P ) related to W kℓ+1(p, p).
References
[Ak] S. Akbulut, Scharlemann’s manifolds is standard, Ann. of Math.
149(1999), 497-510.
[An1] Y. Ando, Elimination of Thom-Boardman singularities of order two, J.
Math. Soc. Japan 37(1985), 471-487.
[An2] Y. Ando, Fold-maps and the space of base point preserving maps of
spheres, J. Math. Kyoto Univ. 41(2002), 691-735.
[An3] Y. Ando, Existence theorems of fold-maps, Japanese J. Math. 30(2004),
29-73.
[An4] Y, Ando, Stable homotopy groups of spheres and higher singularities,
J. Math. Kyoto Univ. 46(2006), 147-165.
[An5] Y. Ando, Nonexistence of homotopy equivalences which are C∞ stable
or of finite codimension, Topol. Appl. 153(2006), 2962-2970.
[An6] Y. Ando, A homotopy principle for maps with prescribed Thom-
Boardman singularities, Trans. Amer. Math. Soc. 359(2007), 489-515.
[An7] Y. Ando, The homotopy principle for maps with singularities of given
K-invariant class, J. Math. Soc. Japan 59(2007), 557-582.
[Bo] J. M. Boardman, Singularities of differentiable maps, IHES Publ. Math.
33(1967), 21-57.
[B-H] A. Borel and A. Haefliger, La classe d’homologie fundamental d’un es-
pace analytique, Bull. Soc. Math. France, 89(1961), 461-513.
[Br] W. Browder, Surgery on Simply-connected Manifolds, Springer-Verlag,
Berlin Heiderberg, 1972.
[duP1] A. du Plessis, Maps without certain singularities, Comment. Math.
Helv. 50(1975), 363-382.
[duP2] A. du Plessis, Homotopy classification of regular sections, Compos.
Math. 32(1976), 301-333.
[duP3] A. du Plessis, Contact invariant regularity conditions, Springer Lecture
Notes 535(1976), 205-236.
[duP4] A. du Plessis, On mappings of finite codimension, Proc. London Math.
Soc. 50(1985), 114-130.
[E1] Ja. M. Èliašberg, On singularities of folding type, Math. USSR. Izv.
4(1970), 1119-1134.
[E2] Ja. M. Èliašberg, Surgery of singularities of smooth mappings, Math.
USSR. Izv. 6(1972), 1302-1326.
[F] S. Feit, k-mersions of manifolds, Acta Math. 122(1969), 173-195.
[F-R] L. Fehér and R. Rimányi, Thom polynomials with integer coefficients,
Illinois J. Math. 46(2002), 1145-1158.
[G1] M. Gromov, Stable mappings of foliations into manifolds, Math. USSR.
Izv. 3(1969), 671-694.
[G2] M. Gromov, Partial Differential Relations, Springer-Verlag, Berlin, Hei-
delberg, 1986.
[H] M. Hirsch, Immersions of manifolds, Trans. Amer. Math. Soc. 93(1959),
242-276.
[I-K] S. Izumiya and Y. Kogo, Smooth mappings of bounded codimensions,
J. London Math. Soc. 26(1982), 567-576.
[K-M] M. A. Kervaire and J. W. Milnor, Groups of homotopy spheres: I, Ann.
Math. 77(1963), 504-537.
[L] H. I. Levine, Singularities of differentiable maps, Proc. Liverpool Singu-
larities Symposium, I, Springer Lecture Notes in Math. Vol. 192, 1-85,
Springer-Verlag, Berlin, 1971.
[M-M] I. Madsen and R. J. Milgram, The Classifying Spaces for Surgery and
Cobordism of Manifolds, Ann. Math. Studies 92, Princeton Univ. Press,
Princeton, 1979.
[Mart] J. Martinet, Déploiements versels des applications différentiables et clas-
sification des applications stables, Springer Lecture Notes in Math. Vol.
535, 1-44, Spribger-Verlag, Berlin, 1976.
[MaIII] J. N. Mather, Stability of C∞ mappings, III: Finitely determined map-
germs, Publ. Math. Inst. Hautes Étud. Sci. 35(1968), 127-156.
[MaIV] J. N. Mather, Stability of C∞ mappings, IV: Classification of stable
germs by R-algebra, Publ. Math. Inst. Hautes Étud. Sci. 37(1970), 223-
[MaV] J. N. Mather, Stability of C∞ mappings: V, Transversality, Adv. Math.
4(1970), 301-336.
[MaTB] J. N. Mather, On Thom-Boardman singularities, Dynamical Systems,
Academic Press, 1973, 233-248.
[O] T. Ohmoto, Vassiliev complex for contact classes of real smooth map-
germs, Res. Fac. Sci. Kagoshima Univ. 27(1994), 1-12.
[Ph] A. Phillips, Submersions of open manifolds, Topology 6(1967), 171-206.
[Po] I. R. Porteous, Simple singularities of maps, Proc. Liverpool Singulari-
ties Symp. I, Springer Lecture Notes in Math. 192(1971), 286-307.
[Ro] F. Ronga, Le calcul de la classe de cohomologie entière dual a Σk,
Proc. Liverpool Singularities Symp. I, Springer Lecture Notes in Math.
192(1971), 313-315.
[Ru] J. W. Rutter, Homotopy self-equivalences 1988-1999, Contemporary
Math. 274(2001), 1-11.
[Sa] O. Saeki, Fold maps on 4-manifolds, Comment. Math. Helv., 78(2003),
627-647.
[Sm] S. Smale, The classification of immersions of spheres in Euclidean
spaces, Ann. Math. 327-344, 69(1969).
[Sp] M. Spivak, Spaces satisfying Poincaré duality, Topology 6(1969), 77-
[St] N. Steenrod, The Topology of Fibre Bundles, Princeton Univ. Press,
Princeton, 1951.
[Su] D. Sullivan, Triangulating homotopy equivalences, Thesis, Princeton
Univ., 1965.
[T] R. Thom, Les singularités des applications différentiables, Ann. Inst.
Fourier 6(1955-56), 43-87.
[W] C. T. C. Wall, Finite determinacy of smooth map germs, Bull. London
Math. Soc. 13(1981), 481-539.
Department of Mathematical Sciences
Faculty of Science, Yamaguchi University
Yamaguchi 753-8512, Japan
E-mail: andoy@yamaguchi-u.ac.jp
	Introduction
	Boardman manifolds and K-orbits
	Proofs of Theorems 1.1 and 1.2.
	Nonexistence theorems
	Homotopy equivalences
ABSTRACT
  We will prove the relative homotopy principle for smooth maps with
singularities of a given {\cal K}-invariant class with a mild condition. We
next study a filtration of the group of homotopy self-equivalences of a given
manifold P by considering singularities of non-negative {\cal K}-codimensions.

<|endoftext|><|startoftext|>
arXiv:0704.0116v2  [math-ph]  21 May 2007
Stringy Jacobi fields in Morse theory
Yong Seung Cho∗
National Institute for Mathematical Sciences, 385-16 Doryong, Yuseong, Daejeon 305-340 Korea and
Department of Mathematics, Ewha Womans University, Seoul 120-750 Korea
Soon-Tae Hong†
Department of Science Education and Research Institute for Basic Sciences, Ewha Womans University, Seoul 120-750 Korea
(Dated: November 4, 2018)
We consider the variation of the surface spanned by closed strings in a spacetime manifold.
Using the Nambu-Goto string action, we induce the geodesic surface equation, the geodesic surface
deviation equation which yields a Jacobi field, and we define the index form of a geodesic surface as
in the case of point particles to discuss conjugate strings on the geodesic surface.
PACS numbers: 02.40.-k, 04.20.-q, 04.90.+e, 11.25.-w, 11.40.-q
Keywords: Nambu-Goto string action, geodesic surface, Jacobi field, index of geodesic surface, conjugate
strings
I. INTRODUCTION
It is well known that string theory [1, 2] is one of the
best candidates for a consistent quantum theory of grav-
ity to yield a unification theory of all the four basic forces
in nature. In D-brane models [2], closed strings represent
gravitons propagating on a curved manifold, while open
strings describe gauge bosons such as photons, or mat-
ter attached on the D-branes. Moreover, because the
two ends of an open string can always meet and con-
nect, forming a closed string, there are no string theories
without closed strings.
On the other hand, the supersymmetric quantum me-
chanics has been exploited by Witten [3] to discuss the
Morse inequalities [4, 5, 6]. The Morse indices for pair of
critical points of the symplectic action function have been
also investigated based on the spectral flow of the Hes-
sian of the symplectic function [7], and on the Hilbert
spaces the Morse homology [8] has been considered to
discuss the critical points associated with the Morse in-
dex [9]. The string topology was initiated in the seminal
work of Chas and Sullivan [10]. Using the Morse theoretic
techniques, Cohen in Ref. [11] constructs string topology
operations on the loop space of a manifold and relates
the string topology operations to the counting of pseudo-
holomorphic curves in the cotangent bundle. He also
speculates the relation between the Gromov-Witten in-
variant [12] of the cotangent bundle and the string topol-
ogy of the underlying manifold. Recently, the Jacobi
fields and their eigenvalues of the Sturm-Liouville oper-
ator associated with the particle geodesics on a curved
manifold have been investigated [13], to relate the phase
factor of the partition function to the eta invariant of
Atiyah [14, 15].
In this paper, we will exploit the Nambu-Goto string
∗Electronic address: yescho@ewha.ac.kr
†Electronic address: soonhong@ewha.ac.kr
action to investigate the geodesic surface equation and
the geodesic surface deviation equation associated with
a Jacobi field. The index form of a geodesic surface will
be also discussed for the closed strings on the curved
manifold.
In Section II, the string action will be introduced to
investigate the geodesic surface equation in terms of the
world sheet currents associated with τ and σ world sheet
coordinate directions. By taking the second variation of
the surface spanned by closed strings, the geodesic sur-
face deviation equation will be discussed for the closed
strings on the curved manifold. In Section III, exploiting
the orthonormal gauge, the index form of a geodesic sur-
face will be also investigated together with breaks on the
string tubes. The geodesic surface deviation equation in
the orthonormal gauge will be exploited to discuss the
Jacobi field on the geodesic surface.
II. STRINGY GEODESIC SURFACES IN
MORSE THEORY
In analogy of the relativistic action of a point parti-
cle, the action for a string is proportional to the area
of the surface spanned in spacetime manifold M by the
evolution of the string. In order to define the action
on the curved manifold, let (M, gab) be a n-dimensional
manifold associated with the metric gab. Given gab, we
can have a unique covariant derivative ∇a satisfying [6]
∇agbc = 0, ∇aω
b = ∂aω
b + Γbac ω
c and
(∇a∇b −∇b∇a)ωc = R
abc ωd. (2.1)
We parameterize the closed string by two world sheet
coordinates τ and σ, and then we have the correspond-
ing vector fields ξa = (∂/∂τ)a and ζa = (∂/∂σ)a. The
Nambu-Goto string action is then given by [1, 2, 16]
S = −
dτdσf(τ, σ) (2.2)
http://arxiv.org/abs/0704.0116v2
where the coordinates τ and σ have ranges 0 ≤ τ ≤ T
and 0 ≤ σ ≤ 2π respectively and
f(τ, σ) = [(ξ · ζ)2 − (ξ · ξ)(ζ · ζ)]1/2. (2.3)
We now perform an infinitesimal variation of the tubes
γα(τ, σ) traced by the closed string during its evolution in
order to find the geodesic surface equation from the least
action principle. Here we impose the restriction that the
length of the string circumference is τ independent. Let
the vector field ηa = (∂/∂α)a be the deviation vector
which represents the displacement to an infinitesimally
nearby tube, and let Σ denote the three-dimensional sub-
manifold spanned by the tubes γα(τ, σ). We then may
choose τ , σ and α as coordinates of Σ to yield the com-
mutator relations,
a = ξb∇bη
a − ηb∇bξ
a = 0,
a = ζb∇bη
a − ηb∇bζ
a = 0,
a = ξb∇bζ
a − ζb∇bξ
a = 0. (2.4)
Now we find the first variation as follows [17]
dτdσ ηb(ξ
τ + ζ
dσ P bτ ηb|
τ=0 −
dτ P bσηb|
σ=0 , (2.5)
where the world sheet currents associated with τ and σ
directions are respectively given by [17]
P aτ =
[(ξ · ζ)ζa − (ζ · ζ)ξa],
P aσ =
[(ξ · ζ)ξa − (ξ · ξ)ζa]. (2.6)
Using the endpoint conditions ηa(0) = ηa(T ) = 0 and pe-
riodic condition ηa(σ+2π) = ηa(σ), we have the geodesic
surface equation [17]
ξa∇aP
τ + ζ
σ = 0, (2.7)
and the constraint identities [17]
Pτ · ζ = 0, Pτ · Pτ + ζ · ζ = 0,
Pσ · ξ = 0, Pσ · Pσ + ξ · ξ = 0.
(2.8)
Let γα(τ, σ) denote a smooth one-parameter family
of geodesic surfaces: for each α ∈ R, the tube γα is
a geodesic surface parameterized by affine parameters τ
and σ. For an infinitesimally nearby geodesic surface in
the family, we then have the following geodesic surface
deviation equation
ξb∇b(η
τ ) + ζ
b∇b(η
+R abcd (ξ
bP dτ + ζ
bP dσ )η
c ≡ (Λη)a = 0. (2.9)
For a small variation ηa, our goal is to compare S(α) with
S(0) of the string. The second variation d2S/dα2(0) is
then needed only when dS/dα(0) = 0. Explicitly, the
second variation is given by
|α=0 = −
(ηc∇cP
τ )(ξ
a∇aηb)
+(ηc∇cP
a∇aηb)−R
acb (ξ
aP bτ + ζ
aP bσ)η
dσ P bτ η
a∇aηb|
τ=0 −
dτ P bση
a∇aηb|
σ=0 .
(2.10)
Here the boundary terms vanish for the fixed endpoint
and the periodic conditions, even though on the geodesic
surface we have breaks which we will explain later. After
some algebra using the geodesic surface deviation equa-
tion, we have
|α=0 =
dτdσ ηa(Λη)
a. (2.11)
III. JACOBI FIELDS IN ORTHONORMAL
GAUGE
The string action and the corresponding equations
of motion are invariant under reparameterization σ̃ =
σ̃(τ, σ) and τ̃ = τ̃ (τ, σ). We have then gauge degrees of
freedom so that we can choose the orthonormal gauge as
follows [17]
ξ · ζ = 0, ξ · ξ + ζ · ζ = 0, (3.1)
where the plus sign in the second equation is due to the
fact that ξ·ξ is timelike and ζ·ζ is spacelike. Note that the
gauge fixing (3.1) for the world sheet coordinates means
that the tangent vectors are orthonormal everywhere up
to a local scale factor [17]. In this parameterization the
world sheet currents (2.6) satisfying the constraints (2.8)
are of the form
P aτ = −ξ
a, P aσ = ζ
a. (3.2)
The geodesic surface equation and the geodesic surface
deviation equation read
− ξa∇aξ
b + ζa∇aζ
b = 0, (3.3)
−ξb∇b(ξ
a) + ζb∇b(ζ
−R abcd (ξ
bξd − ζbζd)ηc = (Λη)a = 0. (3.4)
We now restrict ourselves to strings on constant scalar
curvature manifold such as Sn. We take an ansatz that
on this manifold the string shape on the geodesic surface
γ0 is the same as that on a nearby geodesic surface γα at a
given time τ . We can thus construct the variation vectors
ηa(τ) as vectors associated with the centers of the string
of the two nearby geodesic surfaces at the given time
τ . We then introduce an orthonormal basis of spatial
vectors eai (i = 1, 2, ..., n−2) orthogonal to ξ
a and ζa and
parallelly propagated along the geodesic surface. The
geodesic surface deviation equation (3.4) then yields for
i, j = 1, 2, ..., n− 2
+ (R iτjτ −R
σjσ)η
j = 0. (3.5)
The value of ηi at time τ must depend linearly on the
initial data ηi(0) and dη
(0) at τ = 0. Since by con-
struction ηi(0) = 0 for the family of geodesic surfaces,
we must have
ηi(τ) = Aij(τ)
(0). (3.6)
Inserting (3.6) into (3.5) we have the differential equation
for Aij(τ)
d2Aij
+ (R iτkτ −R
σkσ)A
j = 0, (3.7)
with the initial conditions
Aij(0) = 0,
(0) = δij . (3.8)
Note that in (3.7) we have the last term originated from
the contribution of string property.
Next we consider the second variation equation (2.10)
under the above restrictions
|α=0 =
− (R iτjτ −R
σjσ)η
(3.9)
We define the index form Iγ of a geodesic surface γ as
the unique symmetric bilinear form Iγ : Tγ × Tγ → R
such that
Iγ(V, V ) =
|α=0 (3.10)
for V ∈ Tγ . From (3.9) we can easily find
Iγ(V,W ) =
−(R mτjτ −R
σjσ )W
. (3.11)
If we have breaks 0 = τ0 < · · · < τk+1 = T , and the
restriction of γ to each set [τi−1, τi] is smooth, then the
tube γ is piecewise smooth. The variation vector field
V of γ is always piecewise smooth. However dV/dτ will
generally have a discontinuity at each break τi (1 ≤ i ≤
k). This discontinuity is measured by
(τi) =
(τ+i )−
(τ−i ), (3.12)
where the first term derives from the restrictions
γ|[τi, τi+1] and the second from γ|[τi−1, τi]. If γ and
V ∈ Tγ have the breaks τ1 < · · · < τk, we have
∫ τi+1
dτ = −
(3.13)
to yield
Iγ(V,W ) = −
dτdσ V m
(3.14)
+(R mτjτ −R
σjσ )W
dσ Vm∆
(τi). (3.15)
Here note that if we do not have the breaks, (3.9) yields
|α=0 = −
dτdσ ηi
+ (R iτjτ −R
σjσ)η
(3.16)
A solution ηa of the geodesic surface deviation equation
(3.5) is called a Jacobi field on the geodesic surface γ. A
pair of strings p, q ⊂ γ defined by the centers of the closed
strings on the geodesic surface is then conjugate if there
exists a Jacobi field ηa which is not identically zero but
vanishes at both strings p and q. Roughly speaking, p
and q are conjugate if an infinitesimally nearby geodesic
surface intersects γ at both p and q. From (3.6), q will be
conjugate to p if and only if there exists nontrivial initial
data: dηi/dτ(0) 6= 0, for which ηi = 0 at q. This occurs
if and only if detAij = 0 at q, and thus detA
j = 0
is the necessary and sufficient condition for a conjugate
string to p. Note that between conjugate strings, we have
detAij 6= 0 and thus the inverse of A
j exists. Using (3.7)
we can easily see that
Aik −Aij
= 0. (3.17)
In addition, the quantity in parenthesis of (3.17) vanishes
at p, since Aij(0) = 0. Along a geodesic surface γ, we
thus find
Aik −Aij
= 0. (3.18)
If γ is a geodesic surface with no string conjugate to p
between p and q, then Aij defined above will be nonsingu-
lar between p and q. We can then define Y i = (A−1)ijη
or ηi = AijY
j . From (3.16) and (3.18), we can easily
verify
|α=0 =
≥ 0. (3.19)
Locally γ minimizes the Nambu-Goto string action, if γ is
a geodesic surface with no string conjugate to p between
p and q.
On the other hand, if γ is a geodesic surface but has a
conjugate string r between strings p and q, then we have
a non-zero Jacobi field J i along γ which vanishes at p
and r. Extend J i to q by putting it zero in [r, q]. Then
dJ i/dτ(r−) 6= 0, since J i is nonzero. But dJ i/dτ(r+) = 0
to yield
(r) = −
(r−) 6= 0. (3.20)
We choose any ki ∈ Tγ such that
(r) = c, (3.21)
with a positive constant c. Let ηi be ηi = ǫki + ǫ−1J i
where ǫ is some constant, then we have
Iγ(η, η) = ǫ
2Iγ(k, k) + 2Iγ(k, J) + ǫ
−2Iγ(J, J). (3.22)
By taking ǫ small enough, the first term in (3.22) vanishes
and the third term also vanishes due to the definition
of the Jacobi field and (3.15). Substituting (3.21) into
(3.15) we have Iγ(k, J) = −2πc and thus
|α=0 = −4πc, (3.23)
which is negative definite. From the above arguments, we
conclude that given a smooth timelike tube γ connecting
two strings p, q ⊂ M , the necessary and sufficient con-
dition that γ locally minimizes the surface of the closed
string tube between p and q over smooth one parameter
variations is that γ is a geodesic surface with no string
conjugate to p between p and q. It is also interesting
to see that on Sn, the first non-minimal geodesic sur-
face has n − 1 conjugate strings as in the case of point
particle. Moreover, on the Riemannian manifold with
the constant sectional curvature K, the geodesic surfaces
have no conjugate strings for K < 0 or K = 0, while
conjugate strings occur for K > 0 [18].
IV. CONCLUSIONS
The Nambu-Goto string action has been introduced to
study the geodesic surface equation in terms of the world
sheet currents associated with τ and σ directions. By
constructing the second variation of the surface spanned
by closed strings, the geodesic surface deviation equation
has been discussed for the closed strings on the curved
manifold.
Exploiting the orthonormal gauge, the index form of
a geodesic surface has been defined together with breaks
on the string tubes. The geodesic surface deviation equa-
tion in this orthonormal gauge has been derived to find
the Jacobi field on the geodesic surface. Given a smooth
timelike tube connecting two strings on the manifold,
the condition that the tube locally minimizes the sur-
face of the closed string tube between the two strings
over smooth one parameter variations has been also dis-
cussed in terms of the conjugate strings on the geodesic
surface.
In the Morse theoretic approach to the string theory,
one could consider the physical implications associated
with geodesic surface congruences and their expansion,
shear and twist. It would be also desirable if the string
topology and the Gromov-Witten invariant can be in-
vestigated by exploiting the Morse theoretic techniques.
These works are in progress and will be reported else-
where.
Acknowledgments
The work of YSC was supported by the Korea Re-
search Council of Fundamental Science and Technol-
ogy (KRCF), Grant No. C-RESEARCH-2006-11-NIMS,
and the work of STH was supported by the Korea Re-
search Foundation (MOEHRD), Grant No. KRF-2006-
331-C00071, and by the Korea Research Council of Fun-
damental Science and Technology (KRCF), Grant No.
C-RESEARCH-2006-11-NIMS.
[1] M.B. Green, J.H. Schwarz and E. Witten, Superstring
Theory Vol. 1 (Cambridge Univ. Press, Cambridge,
1987).
[2] J. Polchinski, String Theory Vol. 1 (Cambridge Univ.
Press, Cambridge, 1999).
[3] E. Witten, J. Diff. Geom. 17, 661 (1982).
[4] M. Morse, The Calculus of Variations in the Large
(Amer. Math. Soc., New York, 1934).
[5] J. Milnor, Morse Theory (Princeton Univ. Press, Prince-
ton, 1963).
[6] R.M. Wald, General Relativity (The Univ. of Chicago
Press, Chicago, 1984).
[7] A. Floer, Comm. Pure Appl. Math. 41, 393 (1988).
[8] M. Schwarz, Morse Homology, Vol. 111 of Prog. Math.
(Birkhäuser, Basel, 1993).
[9] A. Abbondandolo, P. Majer, Comm. Pure Appl. Math.
54, 689 (2001).
[10] M. Chas and D. Sullivan, String Topology, to appear in
Ann. Math., math.GT/9911159.
[11] P. Biran, O. Cornea and F. Lalonde, Morse Theoretic
Methods in Nonlinear Analysis and in Symplectic Topol-
ogy Series II: Mathematics, Physics and Chemistry, Vol.
217 of NATO Sci. Series (Springer, New York, 2004).
[12] D. McDuff and D. Salamon, J-holomorphic Curves and
Quantum Cohomology, Vol. 6 of Univ. Lecture Series
(Amer. Math. Soc., Providence, 1994).
[13] S.T. Hong, J. Geom. Phys. 48, 135 (2003).
[14] M.F. Atiyah, V. Patodi and I. Singer, Math. Proc. Camb.
Phil. Soc. 77, 43 (1975); Math. Proc. Camb. Phil. Soc.
78, 405 (1975); Math. Proc. Camb. Phil. Soc. 79, 71
(1976).
[15] E. Witten, Comm. Math. Phys. 121, 351 (1989).
[16] Y. Nambu, Lecture at the Copenhagen Symposium,
1970, unpublished; T. Goto, Prog. Theor. Phys. 46, 1560
(1971).
[17] J. Scherk, Rev. Mod. Phys. 47, 123 (1975); J. Govaerts,
Lectures given at Escuela Avanzada de Verano en Fisica,
Mexico City, Mexico (1986).
[18] J. Cheeger and D. Ebin, Comparison Theorems in Rie-
mannian Geometry (North-Holland, Amsterdam, 1975).
ABSTRACT
  We consider the variation of the surface spanned by closed strings in a
spacetime manifold. Using the Nambu-Goto string action, we induce the geodesic
surface equation, the geodesic surface deviation equation which yields a Jacobi
field, and we define the index form of a geodesic surface as in the case of
point particles to discuss conjugate strings on the geodesic surface.

<|endoftext|><|startoftext|>
Lower ground state due to counter-rotating wave interaction in trapped ion system
T. Liu1, K.L. Wang1,2, and M. Feng3 ∗
The School of Science, Southwest University of Science and Technology, Mianyang 621010, China
The Department of Modern Physics, University of Science and Technology of China, Hefei 230026, China
State Key Laboratory of Magnetic Resonance and Atomic and Molecular Physics,
Wuhan Institute of Physics and Mathematics, Chinese Academy of Sciences, Wuhan, 430071, China
(Dated: November 4, 2018)
We consider a single ion confined in a trap under radiation of two traveling waves of lasers. In
the strong-excitation regime and without the restriction of Lamb-Dicke limit, the Hamiltonian of
the system is similar to a driving Jaynes-Cummings model without rotating wave approximation
(RWA). The approach we developed enables us to present a complete eigensolutions, which makes
it available to compare with the solutions under the RWA. We find that, the ground state in our
non-RWA solution is energically lower than the counterpart under the RWA. If we have the ion in the
ground state, it is equivalent to a spin dependent force on the trapped ion. Discussion is made for
the difference between the solutions with and without the RWA, and for the relevant experimental
test, as well as for the possible application in quantum information processing.
PACS numbers: 32.80.Lg, 42.50.-p, 03.67.-a
I. INTRODUCTION
Ultracold ions trapped as a line are considered as a promising system for quantum information processing [1]. Since
the first quantum gate performed in the ion trap [2], there have been a series of experiments with trapped ions to
achieve nonclassical states [3], simple quantum algorithm [4], and quantum communication [5].
There have been also a number of proposals to employ trapped ions for quantum computing, most of which work
only in the weak excitation regime (WER), i.e., the Rabi frequency smaller than the trap frequency. While as bigger
Rabi frequency would lead to faster quantum gating, some proposals [6, 7, 8] have aimed to achieve operations in
the case of the Rabi frequency larger than the trap frequency, i.e., the so called strong excitation regime (SER). The
difference of the WER from the SER is mathematically reflected in the employment of the rotating wave approximation
(RWA), which averages out the fast oscillating terms in the interaction Hamiltonian. As the RWA is less valid with
the larger Rabi frequency, the treatment for the SER was complicated, imcomplete [9], and sometimes resorted to
numerics [10].
In addition, the Lamb-Dicke limit strongly restricts the application of the trapped ions due to technical challenge
and the slow quantum gating. We have noticed some ideas [11, 12] to remove the Lamb-Dicke limit in designing
quantum gates, which are achieved by using some complicated laser pulse sequences.
In the present work, we investigate, from another research angle, the system mentioned above in SER and in the
absence of the Lamb-Dicke limit. The main idea, based on an analytical approach we have developed, is to check the
eigenvectors and the eigenenergies of such a system, with which we hope to obtain new insight into the system for
more application. The main result in our work is a newly found ground state, energically lower than the ground state
calculated by standard Jaynes-Cummings model. We will also present the analytical forms of the eigenvectors and
the variance of the eigenenergies with respect to the parameters of the system, which might be used in understanding
the time evolution of the system.
The paper is organized as follows. In Section II we will solve the system in the absence of the RWA. Then some
numerical results will be presented in comparison with the RWA solutions in Section III. We will discuss about the
new results for their possible application. More extensive discussion and the conclusion are made in Section IV. Some
analytical deduction details could be found in Appendix.
II. THE ANALYTICAL SOLUTION OF THE SYSTEM
As shown in Fig. 1, we consider a Raman Λ-type configuration, which corresponds to the actual process in NIST
experiments. Like in [13], we will employ some unitary transformations to get rid of the assumption of Lamb-Dicke
limit and the WER. So our solution is more general than most of the previous work [14]. For a single trapped
∗ Electronic address: mangfeng@wipm.ac.cn
http://arxiv.org/abs/0704.0117v1
ion experiencing two off-resonant counter-propagating traveling wave lasers with frequencies ω1 and ω2, respectively,
and in the case of a large detuning δ, we have an effective two-level system with the lasers driving the electric-dipole
forbidden transition |g〉 ↔ |e〉 by the effective laser frequency ωL = ω1−ω2. So we have the dimensionless Hamiltonian
σz + a
iηx̂ + σ−e
−iηx̂), (1)
in the frame rotating with ωL, where ∆ = (ω0 − ωL)/ν, ω0 and ν are the resonant frequency of the two levels of the
ion and the trap frequency, respectively. Ω is the dimensionless Rabi frequency in units of ν and η the Lamb-Dicke
parameter. σ±,z are usual Pauli operators, and we have x̂ = a
† + a for the dimensionless position operator of the ion
with a† and a being operators of creation and annihilation of the phonon field, respectively. We suppose that both Ω
and ν are much larger than the atomic decay rate and the phonon dissipative rate so that no dissipation is considered
below.
Like in [13], we first carry out some unitary transformations on Eq. (1) to avoid the expansion of the exponentials.
So we have
HI = UHU † =
σz + a
†a+ g(a† + a)σx + ǫσx + g
2, (2)
where
F †(η) F (η)
−F †(η) F (η)
with F (η) = exp [iη(a† + a)/2], g = η/2, and ǫ = −∆/2. Eq. (2) is a typical driving Jaynes-Cummings model
including the counter-rotating wave terms. In contrast to the usual treatments to consider the Lamb-Dicke limit
by using the RWA in a frame rotation, we remain the counter-rotating wave interaction in the third term of the
right-hand side of Eq. (2) in our case. To go on our treatment, we make a further rotation with V = exp (iπσy/4),
yielding
= V HIV † = −
σx + a
†a+ g(a† + a)σz + ǫσz + g
2, (3)
where we have used exp (iθσy)σx exp (−iθσy) = cos(2θ)σx + sin(2θ)σz , and exp (iθσy)σz exp (−iθσy) = cos(2θ)σz −
sin(2θ)σx. For convenience of our following treatment, we rewrite Eq. (3) to be
= ǫ(|e〉〈e| − |g〉〈g|)−
(|e〉〈g|+ |g〉〈e|) + a†a+ g(a† + a)(|e〉〈e| − |g〉〈g|) + g2. (4)
Using Schrödinger equation, and the orthogonality between |e〉 and |g〉, we suppose
|〉 = |ϕ1〉|e〉+ |ϕ2〉|g〉, (5)
which yields
ǫ|ϕ1〉+ a†a|ϕ1〉+ g(a† + a)|ϕ1〉 −
|ϕ2〉+ g2|ϕ1〉 = E|ϕ1〉, (6)
− ǫ|ϕ2〉+ a†a|ϕ2〉 − g(a† + a)|ϕ2〉 −
|ϕ1〉+ g2|ϕ2〉 = E|ϕ2〉. (7)
To make the above equations concise, we apply the displacement operator D̂(g) = exp [g(a† − a)] on a† and a, which
givesA = D̂(g)†aD̂(g) = a+g, A† = D̂(g)†a†D̂(g) = a†+g, B = D̂(−g)†aD̂(−g) = a−g, and B† = D̂(−g)†a†D̂(−g) =
a† − g. So we have
(A†A+ ǫ)|ϕ1〉 −
|ϕ2〉 = E|ϕ1〉, (8)
(B†B − ǫ)|ϕ2〉 −
|ϕ1〉 = E|ϕ2〉. (9)
Obvious, the new operators work in different subspaces, which leads to different evolutions regarding different internal
levels |g〉 and |e〉. We will later refer to this feature to be relevant to spin-dependent force. The solution of the two
equations above can be simply set as
|ϕ1〉 =
cn|n〉A, (10)
|ϕ2〉 =
dn|n〉B, (11)
with N a large integer to be determined later, |n〉A = 1√
(a† + g)n|0〉A = 1√
(a† + g)nD̂(g)†|0〉 = 1√
(a† +
g)n exp{−ga† − g2/2}|0〉, and |n〉B = 1√
(a† − g)n|0〉B = 1√
(a† − g)nD̂(−g)†|0〉 = 1√
(a† − g)n exp{ga† − g2/2}|0〉.
Taking Eqs. (10) and (11) into Eqs. (8) and (9), respectively, and multiplying by A〈m| and B〈m|, respectively, we
have,
(m+ ǫ)cm −
(−1)nDmndn = Ecm, (12)
(m− ǫ)dm −
(−1)mDmncn = Edm, (13)
where we have set (−1)nDmn =A 〈m|n〉B and (−1)mDmn =B 〈m|n〉A, whose deduction can be found in Appendix.
Diagonizing the relevant determinants, we may have the eigenenergies Ei and the eigenvectors regarding c
n and d
(n = 0, · · · , N, i = 0, · · · , N). Therefore, as long as we could find a closed subspace with ciN+1 and diN+1 approaching
zero for a certain big integer N, we may have a complete eigensolution of the system.
III. DISCUSSION BASED ON NUMERICS
Before doing numerics, we first consider a treatment by involving the RWA. As the RWA solution could present
complete eigenenergy spectra, it is interesting to make a comparison between the RWA solution and our non-RWA
one. We consider a rotation in Eq. (2) with respect to exp{−i[(Ω/2)σz + a†a]t}, which results in
σz + a
†a+ g(aσ+ + a
†σ−) + g
2, (14)
where the RWA has been made by setting Ω = 1, and we have corresponding eigenenergies
E±n = (n+ g
2 + 1/2)± g
n+ 1. (15)
So the system is degenerate in the case of η = 0 and there are two eigenenergy spectra corresponding to E±n as long
as η 6= 0.
Figs. 2(a) and 2(b) demonstrate two spectra, respectively, and in each figure we compare the differences between the
RWA and non-RWA solutions [15]. In contrast to the two spectra in the RWA solution, the non-RWA solution includes
only one spectrum. Comparing the two eigensolutions, we find that the even-number and odd-number excited levels
in the non-RWA case correspond to E+n and E
n of the RWA case, respectively, and the difference becomes bigger and
bigger with the increase of η. It is physically understandable for these differences because the RWA solution, valid only
for small η, does not work beyond the Lamb-Dicke regime. Above comparison also demonstrates the change of the ion
trap system from an integrable case (i.e., with RWA validity) to the non-integrable case (i.e., without RWA validity).
But besides these differences, we find an unusual result in this comparison, i.e., a new level without the counterpart
in RWA solution appearing in our solution, which is lower than the ground state in RWA solution by ν + xη with x
a η-dependent coefficient. In the viewpoint of physics, due to additional counter-rotating wave interaction involved,
it is reasonable to have something more in our solution than the RWA case, although this does not surely lead to
a new level lower than the previous ground state. Anyway, this is a good news for quantum information processing
with trapped ions. As the situation in SER and beyond the Lamb-Dicke limit involves more instability, a stable
confinement of the ion requires a stronger trapping condition. In this sense, our solution, with the possibility to have
the ion stay in an energically lower state, gives a hope in this respect. We will come to this point again later.
Since no report of the new ground state had been found either theoretically or experimentally in previous pub-
lications, we suggest to check it experimentally by resonant absorption spectrum. As shown above, in the case of
non-zero Lamb-Dicke parameter, the degeneracy of the neighboring level spacing is released, and the bigger the η,
the larger the spacing difference between the neighboring levels. Therefore, an experimental test of the newly found
ground state should be available by resonant transition between the ground and the first excited states in Fig. 2, once
the SER is reached. We have noticed that the SER could be achieved by first cooling the ions within the Lamb-Dicke
limit and under the WER, and then by decreasing the trap frequency by opening the trap adiabatically [6].
Since it is lower in energy than the previously recognized ground states, the new ground state we found is more
stable, and thereby more suitable to store quantum information. Once the trapped ion is cooled down to the ground
state in the SER, it is, as shown in Eq. (5) with n = 0, actually equivalent to the effect of a spin-dependent force on
the trapped ion [16]. If we make Hadamard gate on the ion by |g〉 → (|g〉 + |e〉)/
2 and |e〉 → (|g〉 − |e〉)/
2, we
reach a Schrödinger cat state, i.e., (1/2){[D†(g)|0〉+D†(−g)|0〉]|g〉 − [D†(g)|0〉 −D†(−g)|0〉]|e〉}. Two ions confined
in a trap in above situation will yield two-qubit gates without really exciting the vibrational mode [11]. It is also the
way with this spin-dependent force towards scalable quantum information processing [12]. As in SER, we may have
larger Rabi frequency than in WER, the quantum gate could be in principle carried out faster in the SER.
In addition, as it is convergent throughout the parameter subspace, our complete eigensolution enables us to
accurately write down the state of the system at an arbitrary evolution time, provided that we have known the initial
state. This would be useful for future experiments in preparing non-classical states and in designing any desired
quantum gates with trapped ions in the SER and beyond the Lamb-Dicke limit. Moreover, as shown in Figs 3(a),
3(b) and 3(c), our present solution is helpful for us to understand the particular solutions in previous publication
[13]. The comparison in the figures shows that the results in [13] are actually mixtures of different eigensolutions. For
example, the lowest level in Fig. 2 in [13], corresponding to Ω = 2 and η = 0.2, is actually constituted at least by the
third, the fourth, and the fifth excited states of the eigensolution.
IV. FURTHER DISCUSSION AND CONCLUSION
The observation of the counter-rotating effects is an interesting topic discussed previously. In [17], a standard method
is used to study the observable effects regarding the rotating and the counter-rotating terms in the Jaynes-Cummings
model, including to observe Bloch-Siegert shift [18] and quantum chaos in a cavity QED by using differently polarized
lights. A recent work [19] for a two-photon Jaynes-Cummings model has also investigated the observability of the
counter-rotating terms. By using perturbation theory, the authors claimed that the counter-rotating effects, although
very small, can be in principle observed by measuring the energy of the atom going through the cavity. Actually, for
the cavity QED system without any external source involved, it is generally thought that the counter-rotating terms
only make contribution in some virtual fluctuations of the energy in the weak coupling regime. While the interference
between the rotating and counter-rotating contributions could result in some phase dependent effects [20]. Anyway,
if there is an external source, for example, the laser radiating a trapped ultracold ion, the counter-rotating terms will
show their effects, e.g., related to heating in the case of WER [21]. In this sense, our result is somewhat amazing
because the counter-rotating interaction in the SER, making entanglement between internal and vibrational states of
the trapped ion, plays positive role in the ion trapping.
We argue that our approach is applicable to different physical processes involving counter-rotating interaction. Since
the counter-rotating terms result in energy nonconservation in single quanta processes, usual techniques cannot solve
the Hamiltonian with eigenstates spanning in an open form. In this case, path-integral approach [22] and perturbation
approach [20], assisted by numerical techniques were employed in the weak coupling regime of the Jaynes-Cummings
model. In contrast, our method, based on the diagonalization of the coherent-state subspace, could in principle study
the Jaynes-Cummings model without the RWA in any cases. We have also noticed a recent publication [23] to treat
a strongly coupled two-level system to a quntum oscillator under an adiabatic approximation, in which something is
similar to our work in the solution of the Hamiltonian in the absence of the RWA. But due to the different features
in their system from our atomic case, the two-level splitting term, much smaller compared to other terms, can be
taken as a perturbation. So the advantage of that treatment is the possibility to analytically obtain good approximate
solutions. In contrast, not any approximation is used in our solution, which should be more efficient to do the relevant
In summary, we have investigated the eigensolution of the system with a single trapped ion, experiencing two
traveling waves of lasers, in the SER and in the absence of the Lamb-Dicke limit. We have found the ground state
in the non-RWA case to be energically lower than the counterpart of the solution with RWA, which would be useful
for quantum information storage and for quantum computing. The analytical forms of the eigenfunction and the
complete set of the eigensolutions would be helpful for us to understand a trapped ion in the SER and with a large
Lamb-Dicke parameter. We argue that our work would be applied to different systems in dealing with strong coupling
problems.
V. ACKNOWLEDGMENTS
This work is supported in part by NNSFC No. 10474118, by Hubei Provincial Funding for Distinguished Young
Scholars, and by Sichuan Provincial Funding.
VI. APPENDIX
We give the deduction of A〈m|n〉B and B〈m|n〉A below,
A〈m|n〉B =
〈0|e−ga−g
2/2(a+ g)m(a† − g)nega
†−g2/2|0〉
〈0|(a+ g)mega
e−ga(a† − g)n|0〉
〈0|(a+ 2g)m(a† − 2g)n|0〉 = (−1)nDmn,
Dmn = e
min[m,n]
(−1)−i
m!n!(2g)m+n−2i
(m− i)!(n− i)!i!
It is easily proven following a similar step to above that
B〈m|n〉A =
〈0|ega−g
2/2(a− g)m(a† + g)ne−ga
†−g2/2|0〉,
would finally get to (−1)mDmn.
[1] Cirac J I, Zoller P 1995 Phys. Rev. Lett. 74 4091
[2] Monroe C, Meekhof D M, King B E, Itano W M, Wineland D J 1995 Phys. Rev. Lett. 75 4714
[3] Turchette Q A, Wood C S, King B E, Myatt C J, Leibfried D, Itano W M, Monroe C, Wineland D J 1998 Phys. Rev.
Lett. 81 3631; Sackett C A, Kielpinski D, King B E, Langer C, Meyer V, Myatt C J, Rowe M, Turchette Q A, Itano W
M, Wineland D J, Monroe C 2000 Nature 404 256
[4] Gulde S, Riebe M, Lancaster G P T, Becher C, Eschner J, Haeffner H, Schmidt-Kaler F, Chuang I L, Blatt R 2003 Nature
421 48
[5] Riebe M, Haeffner H, Roos C F, Haensel W, Benhelm J, Lancaster G P T, Koerber T W, Becher C, Schmidt-Kaler F,
James D F V, Blatt R 2004 Nature 429 734; Barrett M D, Chiaverini J, Schaetz T, Britton J, Itano W M, Jost J D, Knill
E, Langer C, Leibfried D, Ozeri R, Wineland D J 2004 Nature 429 737
[6] Poyators J F, Cirac J I, Blatt R, Zoller P 1996 Phys. Rev. A 54 1532; Poyatos J F, Cirac J I, Zoller P 1998 Phys. Rev.
Lett. 81 1322
[7] Zheng S, Zhu X W, Feng M 2000 Phys. Rev. A 62 033807
[8] Feng M 2004 Eur. Phys. J. D 29 189
[9] Feng M, Zhu X, Fang X, Yan M, Shi L 1999 J. Phys. B 32 701; Feng M 2002 Eur. Phys. J. D 18 371
[10] Zeng H, Lin F, Wang Y, Segawa Y 1999 Phys. Rev. A 59 4589
[11] Garcia-Ripoll J J, Zoller P and Cirac J I 2003 Phys. Rev. Lett. 91 157901;
[12] Duan L -M 2004 Phys. Rev. Lett. 93 100502
[13] Feng M 2001 J. Phys. B 34 451
[14] Most of the previous work in this respect were carried out by cuting off the expansion of the exponentials regarding the
quantized phonon operators, which is only reasonable in the WER and within the Lamb-Dicke limit. In contrast, our
treatment can be used in both the SER and the WER cases.
[15] We take throughout this paper N = 40 in which the coefficients ci41 and d
41 with i = 0, 1, ..40 are negligible in the case
of Ω = 1 and 2. Although with the increase of values of Ω the diagonalization space has to be enlarged, our analytical
method generally works well in a wide range of parameters.
[16] Haljan P C, Brickman K -A, Deslauriers L, Lee P J and Monroe C 2005 Phys. Rev. Lett. 94 153602
[17] Crisp M D 1991 Phys. Rev. A 43 2430
[18] Bloch F and Siegert A 1940 Phys. Rev. 57 522
[19] Janowicz M and Orlowski A 2004 Rep.Math. Phys. 54 71
[20] Phoenix S J D 1989 J. Mod. Optics 3 127
[21] Leibfrid D, Blatt R, Monroe C, and Wineland D J 2003 Rev. Mod. Phys. 75 281
[22] Zaheer K and Zubairy M S 1998 Phys. Rev. A 37 1628
[23] Irish E K, Gea-Banacloche J, Martin I, and Schwab K C 2005 Phys. Rev. B 72 195410
The captions of the figures
Fig. 1 Schematic of a single trapped ion under radiation of two traveling wave lasers, where ω1 and ω2 are frequencies
regarding the two lasers, respectively, ω0 is the resonant frequency between |g〉 and |e〉, and δ and ∆ are relevant
detunings. This is a typical Raman process employed in NIST experiments, with for example Be+, for quantum
computing.
Fig. 2 The eigenenergy spectra with Ω = 1, where (a) and (b) correspond to two different sets of eigenenergies
with respect to Lamb-Dicke parameter. In (a) the comparison is made between E+n in the RWA case (dashed-dotted
curves) and En with n = even numbers in the non-RWA case (star curves for n = 0 and solid curves for others); In
(b) the comparison is for E−n in the RWA case (dashed-dotted curves) to En with n = odd numbers in the non-RWA
case (solid curves).
Fig. 3 The eigenenergy with respect to the detuning ∆, where for convenience of comparison we have used the same
parameter numbers as in [13]. For clarity, we plot the different levels with different lines. The parameter numbers are
Ω = 2, and (a) η = 0.2; (b) η = 0.4; (c) η = 0.6.
	introduction
	The analytical solution of the system
	discussion based on numerics
	further discussion and conclusion
	acknowledgments
	appendix
	References
ABSTRACT
  We consider a single ion confined in a trap under radiation of two traveling
waves of lasers. In the strong-excitation regime and without the restriction of
Lamb-Dicke limit, the Hamiltonian of the system is similar to a driving
Jaynes-Cummings model without rotating wave approximation (RWA). The approach
we developed enables us to present a complete eigensolutions, which makes it
available to compare with the solutions under the RWA. We find that, the ground
state in our non-RWA solution is energically lower than the counterpart under
the RWA. If we have the ion in the ground state, it is equivalent to a spin
dependent force on the trapped ion. Discussion is made for the difference
between the solutions with and without the RWA, and for the relevant
experimental test, as well as for the possible application in quantum
information processing.

<|endoftext|><|startoftext|>
Strained single-crystal Al2O3 grown layer-by-layer on Nb (110) thin films
Paul B. Welander and James N. Eckstein
Department of Physics and Frederick Seitz Materials Research Laboratory,
University of Illinois at Urbana-Champaign, Urbana, IL 61801
(Dated: April 1, 2007)
We report on the layer-by-layer growth of single-crystal Al2O3 thin-films on Nb (110). Single-
crystal Nb films are first prepared on A-plane sapphire, followed by the evaporation of Al in an O2
background. The first stages of Al2O3 growth are layer-by-layer with hexagonal symmetry. Electron
and x-ray diffraction measurements indicate the Al2O3 initially grows clamped to the Nb lattice with
a tensile strain near 10%. This strain relaxes with further deposition, and beyond about 50 Å we
observe the onset of island growth. Despite the asymmetric misfit between the Al2O3 film and the
Nb under-layer, the observed strain is surprisingly isotropic.
The present challenge of constructing solid-state quan-
tum bits with long coherence times [1] has ignited new
interest in Josephson junctions fabricated from single-
crystal materials. It has been found that critical-current
1/f noise cannot fully account for the observed deco-
herence times in junctions-based qubits [2]. However,
amorphous tunnel-barrier defects can give rise to two-
level charge fluctuations that destroy quantum coher-
ence across the junction [3, 4]. Oh et al have recently
found that tunnel-junctions from epitaxial Re/Al2O3/Al
tri-layers have a significantly reduced density of two-level
fluctuators [5].
The pairing of Re and Al2O3 is advantageous because
of the very small misfit between the basal planes and
because Re is less likely to oxidize compared with other
superconducting refractory metals. However, epitaxial
Re films develop domains due to basal-plane twinning,
causing the surface to be rough on the length scales of
a typical tunnel-junction [6]. An alternative to Al2O3
hetero-epitaxy on a close-packed metal surface is to grow
on bcc (110), where such twinning is absent. To date
single-crystal Al2O3 films have been grown on a number
of such metals: Ta [7], Mo [8], W [9], and more recently
Nb [10].
In a recent paper, Dietrich et al reported on their in-
vestigations of ultra-thin epitaxial α-Al2O3 (0001) films
on Nb using tunneling microscopy and spectroscopy [10].
Their films were grown on Nb (110) by evaporating Al in
an O2 background near room temperature. Crystalliza-
tion was achieved by annealing the sample up to 1000 ◦C.
Subsequent microscopy showed the film to be atomically
smooth, but spectroscopic scans found localized defect
states around ±1 eV, well below the 9 eV sapphire band
We report here on our findings concerning the hetero-
epitaxy of Al2O3 on Nb (110) films. Unlike the previ-
ous study, our Al2O3 films are grown layer-by-layer with
co-deposition of Al and O at elevated substrate tem-
peratures. Epitaxial bi-layers (Nb/Al2O3) and tri-layers
(Nb/Al2O3/Nb) are grown by molecular beam epitaxy
(MBE). Characterization techniques include in situ re-
∗This article has been submitted to Applied Physics Letters.
flection high-energy electron diffraction (RHEED) and
x-ray photo-electron spectroscopy (XPS), and ex situ
atomic force microscopy (AFM) and x-ray diffraction
(XRD).
The process for growing high-quality single-crystal Nb
films on sapphire is well understood [11]. Our samples
start with a thick Nb base layer (2000 Å) grown on A-
plane sapphire – α-Al2O3 (112̄0) – with a nominal miscut
of 0.1◦. Nb (99.99%) is evaporated via e-beam bombard-
ment at a rate of about 0.3 Å/s onto a substrate held
near 800 ◦C. The base pressure of our chamber is about
10−11 torr, with the growth pressure around 10−9 torr.
After deposition, the film is annealed above 1300 ◦C for
30 min. During growth and annealing the film surface is
monitored with RHEED.
Epitaxial Nb on A-plane sapphire grows in the (110)
orientation with Nb [11̄1] ‖ α-Al2O3 [0001], in accordance
with the well-established three-dimensional relationship
[11, 12]. Nb RHEED patterns (Figure 1) reveal a two-
dimensional, reconstructed film surface that takes one
form after growth [13], and a second one upon annealing
[14]. Annealed films also show a sharp specular spot in-
FIG. 1: Nb (110) on A-plane sapphire. Top: RHEED images
along the (a) [001] and (b) [11̄1] azimuths after growth at 800
◦C, and (c) [11̄1] after annealing above 1300 ◦C. Left: 5 × 5
µm2 AFM image of annealed Nb, 10 Å height scale. Right:
XRD radial scan of the Nb (110) Bragg peak.
http://arxiv.org/abs/0704.0118v1
FIG. 2: Top: RHEED images from epitaxial Al2O3 on Nb
(110), taken along the [11̄00] azimuth after deposition of (a)
4 Å, (b) 25 Å, and (c) 125 Å. Bottom: 5× 5 µm2 AFM scans
on Al2O3 films that are 20 Å (left, 10 Å height-scale) and 100
Å (right, 50 Å) thick.
dicating long-range film flatness, which is confirmed by
AFM measurements. Scans show large terraces about
2000 Å wide and monolayer step-edges that align them-
selves according to the substrate miscut (Figure 1). An-
nealed Nb films typically have an rms surface roughness
less than 2 Å.
XRD measurements on these Nb films show sharp
Bragg peaks and narrow rocking curves, both indicative
of single-crystal growth. Figure 1 shows a radial scan
(2θ-ω) of the Nb (110) Bragg peak from a 2000 Å-thick
film, with intensity fringes indicating a structural coher-
ence that extends over the entire film thickness. Rock-
ing curves typically have a FWHM of about 0.03◦. In
addition, measurements of specular and off-axis Bragg
peaks demonstrate that a 2000 Å-thick annealed Nb film
is strained 0.1% or less with respect to bulk.
Al2O3 is deposited in situ onto similar Nb films at a
substrate temperature of around 750 ◦C. Using a stan-
dard effusion cell, Al (99.9995%) is evaporated at about
0.1 Å/s in an O2 (99.995%) background up to 5 × 10
torr. Under these growth conditions we estimate that
the O2 flux is about 1000 times greater than that of Al
[15]. After deposition the sample is cooled before turning
the O2 off. Al2O3 films included in this report range in
thickness from 15 to 125 Å.
Chemical analysis of the Al2O3 is carried out in an
XPS system adjacent to the growth chamber. Measure-
ments of the Al 2p, O 1s and Nb 3d levels indicate that
the Al is completely oxidized with no measurable oxida-
tion of the underlying Nb. The observed energy differ-
ence between the O 1s and Al 2p levels is 457.1 eV, in
good agreement with what has been reported for sap-
phire (456.6 eV) [16]. The Nb 3d level shows no side
bands which would indicate oxide formation.
RHEED of the Al2O3 thin film reveals a hexagonal
FIG. 3: Strain vs. film thickness for epitaxial Al2O3 on Nb
(110). (•) Strain of a 100 Å film measured during deposition.
(⋄) Strain observed for a number of samples after deposition
and cooling, with error bars indicating the range of strain
values measured along different RHEED azimuths.
C-plane-like surface in the Nishiyama-Wasserman orien-
tation: α-Al2O3 (0001) [1̄100] ‖ Nb (110) [001] [17]. (Be-
cause both α-Al2O3 [18] and γ-Al2O3 [19] have close-
packed planes, no definitive crystal structure can be
inferred. Hexagonal Miller indices will be employed
for defining crystallographic orientations by convention
only.) Diffraction images from various stages of growth
are shown in Figure 2. Immediately after the oxide depo-
sition begins the Nb diffraction pattern and specular spot
disappear. After about 2 ML (4 Å) the Al2O3 diffraction
pattern becomes visible. At a thickness of 25 Å, RHEED
shows an elongated specular spot and well-defined first-
order streaks. Up to about 50 Å the Al2O3 growth is
layer-by-layer (Frank-van der Merwe mode). Beyond this
thickness the 2D streaks evolve into 3D spots, indicating
the growth of islands (Stranski-Krastanov mode).
As the transformation from 2D to 3D growth is
occurring, the measured spacing between RHEED
streaks/spots increases, indicating a shrinking of the
Al2O3 surface lattice. Using the RHEED from the base-
layer Nb as a ruler, we find that the Al2O3 film experi-
ences a tensile strain that relaxes with increasing thick-
ness, as shown in Figure 3. The strain-thickness curve is
determined from RHEED along the [1̄100] azimuth dur-
ing Al2O3 deposition near 750
◦C. With respect to C-
plane sapphire (a = 4.759 Å), the tensile strain is nearly
10% initially and by 20 Å has fallen to about 8%. After
100 Å of deposition, the Al2O3 exhibits a tensile strain
of around 3%.
After deposition and cooling in O2, Al2O3 films of var-
ious thicknesses show further lattice relaxation (Figure
3). On average, RHEED measurements near room tem-
perature show a strain reduction of about 1% when com-
pared to measurements just after Al deposition. Ther-
mal contraction accounts for a significant portion of the
strain change during cooling. (Both Nb and Al2O3 have
expansion coefficients in this temperature range around
7-8×10−6 K−1.) However, due to the limited precision
FIG. 4: XRD pole figure for an epitaxial Nb/Al2O3/Nb tri-
layer grown on A-plane sapphire. Both Nb layers have a (110)
surface-orientation. This scan shows the off-axis 〈110〉 Bragg
peaks. The four peaks connected by the dashed rectangle are
approximately four times stronger than the others.
of our measurements, the presence of other strain-relief
mechanisms cannot be determined.
Regardless, the measured tensile strain in epitaxial
Al2O3 films on Nb (110) is significant. What’s more,
the strain is fairly isotropic – RHEED patterns along the
{1̄100} azimuths reveal relatively small variations. The
strain for each azimuth is determined by averaging oppo-
site directions – eg. [1̄100] and [11̄00] – to reduce system-
atic errors. The mean and range of the measured tensile
strain for the three azimuths is shown in Figure 3. The
strain-isotropy is surprising since the misfit along the Nb
[001] or α-Al2O3 [1̄100] direction is rather large (20%),
while along the Nb [11̄0] or α-Al2O3 [1̄1̄20] it is much
smaller (−1.7%). Despite such an anisotropic misfit, the
Al2O3 films exhibit isotropic strain.
Thin Al2O3 films are also very flat. AFM imaging of
a 20 Å-thick film shows an atomically flat surface with
monolayer steps (c/6 = 2.165 Å) and an rms roughness
of about 2 Å (Figure 2). On the other hand, the surface
of a 100 Å-thick film is comprised of islands about 1000
Å wide and 50 Å in height. This agrees well with our
interpretation of Al2O3 RHEED - evidence for islands
in the diffraction images appeared after about 50 Å of
deposition.
For those samples where an epitaxial Nb over-layer
is deposited in situ, the substrate is warmed back up
above 700 ◦C. Under these conditions growth on C-plane
sapphire would yield (111)-oriented films [11, 12]. How-
ever, XRD analysis indicates that the top Nb layer is
(110)-oriented with Nb [001] ‖ α-Al2O3 [1̄100], [01̄10]
and [101̄0]. A pole scan of off-axis 〈110〉 Bragg peaks
is shown in Figure 4, and despite the surface orientation,
the Nb over-layer reproduces the hexagonal symmetry of
the Al2O3 film. The top Nb film grows in three domains
of roughly equal weight rotated with respect to one an-
other by 120◦, with one domain aligned to the base Nb
layer. This type of film structure has been observed for
Nb growth on C-plane sapphire, but only under the fol-
lowing conditions: evaporation above 1000 ◦C [20], post-
growth annealing up to 1500 ◦C [21], and niobium sput-
tering near 850 ◦C [22]. That we observe this growth
structure for evaporation near 700 ◦C suggests that the
surface lattice of the Al2O3 film, while hexagonal, is not
identical to that of C-plane sapphire.
Tunnel-junctions were fabricated from several of these
epitaxial tri-layers. The I-V characteristics showed
a large conductance shunting the Josephson junction.
While an inhomogeneous morphology may cause such
a conductance, no metallurgical pinholes were ever ob-
served in our Al2O3 films. Devices with 20 Å Al2O3 lay-
ers had critical current densities around 104 A/cm2 and
normal state conductances near 109 S/cm2. Assuming a
homogeneous barrier, the latter value gives an effective
barrier height of about 1.3 eV. This is similar to the en-
ergy of sub-gap states found spectroscopically by Dietrich
et al [10] in epitaxial Al2O3 on Nb.
Among the previous studies of Al2O3 epitaxy on bcc
(110) metals, only Chen et al reported any measure of
tensile strain [7]. For Al2O3 films 5-40 Å thick on Ta
(110) they measured a lattice enlargement of about 9%.
The agreement with our findings could be expected since
the lattice constants of Ta and Nb are nearly identi-
cal. One difference though is that Chen et al observed a
Kurdjumov-Sachs relationship, α-Al2O3 (0001) [1̄100] ‖
Nb (110) [11̄1] [7], instead of the Nishiyama-Wasserman
orientation we observe.
In summary, single-crystal Nb/Al2O3 and
Nb/Al2O3/Nb multi-layers were grown by MBE.
Various methods of materials analysis suggest these
layers are all high-quality. Our principal finding is that
epitaxial Al2O3 on Nb (110) grows under uniform tensile
strain, despite the anisotropic misfit. As the Al2O3 film
thickness is increased the strain relaxes and the surface
roughens. The over-layer Nb grows with a (110) surface
orientation under growth conditions that would yield
Nb (111) on C-plane sapphire.
AFM and XRD analysis was carried out in the Cen-
ter for Microanalysis of Materials, University of Illinois
at Urbana-Champaign, which is partially supported by
the U.S. Department of Energy under grant DEFG02-
91ER45439. This project was funded by the National
Science Foundation through grant EIA 01-21568.
[1] M. A. Nielson and I. L. Chuang, Quantum Computation
and Quantum Information (Cambridge University Press,
2000).
[2] D. J. van Harlingen, T. L. Robertson, B. L. T. Plourde,
P. A. Reichardt, T. A. Crane, and J. Clarke, Phys. Rev.
B 70, 064517 (2004).
[3] I. Martin, L. Bulaevskii, and A. Shnirman, Phys. Rev.
Lett. 95, 127002 (2005).
[4] J. M. Martinis, K. B. Cooper, R. McDermott, M. Steffen,
M. Ansmann, K. D. Osborn, K. Cicak, S. Oh, D. P. Pap-
pas, R. Simmonds, et al., Phys. Rev. Lett. 95, 210503
(2005).
[5] S. Oh, K. Cicak, J. S. Kline, M. A. Sillanpää, K. D.
Osborn, J. D. Whittaker, R. W. Simmonds, and D. P.
Pappas, Phys. Rev. B 74, 100502 (2006).
[6] S. Oh, D. A. Hite, K. Cicak, K. D. Osborn, R. W. Sim-
monds, R. McDermott, K. B. Cooper, M. Steffen, J. M.
Martinis, and D. P. Pappas, Thin Solid Films 389, 496
(2006).
[7] P. J. Chen and D. W. Goodman, Surf. Sci. Lett. 312,
L767 (1994).
[8] M.-C. Wu and D. W. Goodman, J. Phys. Chem. 98, 9874
(1994).
[9] J. Günster, M. Brause, T. Mayer, A. Hitzke, and
V. Kempter, Nuc. Instr. and Meth. in Phys. Res. B 100,
411 (1995).
[10] C. Dietrich, B. Koslowski, and P. Ziemann, J. Appl.
Phys. 97, 083515 (2005).
[11] S. M. Durbin, J. E. Cunningham, M. E. Mochel, and
C. P. Flynn, J. Phys. F: Met. Phys. 11, L223 (1981).
[12] J. Mayer, C. P. Flynn, and M. Rühle, Ultramicroscopy
33, 51 (1990).
[13] C. Sürgers and H. v. Löhneysen, Appl. Phys. A 54, 350
(1992).
[14] M. Ondrejcek, R. S. Appleton, W. Swiech, V. L. Petrova,
and C. P. Flynn, Phys. Rev. Lett. 87, 116102 (2001).
[15] K. G. Tscherich and V. von Bonin, J. Appl. Phys. 84,
4065 (1998).
[16] W. M. Mullins and B. L. Averbach, Surf. Sci. 206, 29
(1988).
[17] L. A. Bruce and H. Jaeger, Phil. Mag. A 38, 223 (1978).
[18] W. E. Lee and K. P. D. Lagerlof, J. Elec. Micros. Tech.
2, 247 (1985).
[19] F. H. Streitz and J. W. Mintmire, Phys. Rev. B 60, 773
(1999).
[20] T. Wagner, J. Mater. Res. 13, 693 (1998).
[21] T. Wagner, M. Lorenz, and M. Rühle, J. Mater. Res. 11,
1255 (1996).
[22] H.-G. B. Ch. Dietrich and B. Koslowski, J. Appl. Phys.
94, 1478 (2003).
ABSTRACT
  We report on the layer-by-layer growth of single-crystal Al2O3 thin-films on
Nb (110). Single-crystal Nb films are first prepared on A-plane sapphire,
followed by the evaporation of Al in an O2 background. The first stages of
Al2O3 growth are layer-by-layer with hexagonal symmetry. Electron and x-ray
diffraction measurements indicate the Al2O3 initially grows clamped to the Nb
lattice with a tensile strain near 10%. This strain relaxes with further
deposition, and beyond about 5 nm we observe the onset of island growth.
Despite the asymmetric misfit between the Al2O3 film and the Nb under-layer,
the observed strain is surprisingly isotropic.

<|endoftext|><|startoftext|>
Quasi-quartet crystal electric field ground state
in a tetragonal CeAg2Ge2 single crystal
A. Thamizhavel ∗, R. Kulkarni, S. K. Dhar
Department of condensed matter physics and materials science,
Tata institute of fundamental research, Colaba, Mumbai 400 005, India
Abstract
We have successfully grown the single crystals of CeAg2Ge2, for the first time, by flux method and studied the anisotropic physical
properties by measuring the electrical resistivity, magnetic susceptibility and specific heat. We found that CeAg2Ge2 undergoes
an antiferromagnetic transition at TN = 4.6 K. The electrical resistivity and susceptibility data reveal strong anisotropic magnetic
properties. The magnetization measured at T = 2 K exhibited two metamagnetic transitions at Hm1 = 31 kOe andHm2 = 44.7 kOe,
for H ‖ [100] with a saturation magnetization of 1.6 µB/Ce. The crystalline electric field (CEF) analysis of the inverse susceptibility
data reveals that the ground state and the first excited states of CeAg2Ge2 are closely spaced indicating a quasi-quartet ground
state. The specific heat data lend further support to the presence of closely spaced energy levels.
Key words: CeAg2Ge2; CEF; quartet ground state; antiferromagnetism
PACS: 81.10.-h, 71.27.+a, 71.70.Ch, 75.10.Dg, 75.50.Ee
Compounds crystallizing in the ThCr2Si2 type struc-
ture are the most extensively studied among the strongly
correlated electron systems. A wide range of compounds
crystallize in this type of tetragonal crystal structure
and exhibit novel physical properties. Some of the promi-
nent examples include the first heavy fermion supercon-
ductor CeCu2Si2, pressure induced superconductors like
CePd2Si2, CeRh2Si2, CeCu2Ge2, unconventional metam-
agnetic transition in CeRu2Si2 etc. CeAg2Ge2 also crys-
tallizes in the tetragonal ThCr2Si2 type crystal structure.
Previous reports of CeAg2Ge2 were on polycrystalline
samples and there were conflicting reports on the antifer-
romagnetic ordering temperature [1,2,3]. Furthermore, the
ground state properties of CeAg2Ge2 are also quite intrigu-
ing. Neutron scattering experiments on a polycrystalline
sample could detect only one excited state at 11 meV indi-
cating that the ground state and the first excited states are
closely spaced. In order to study the anisotropic physical
properties and to study the crystalline electric field ground
state, we have grown the single crystals of CeAg2Ge2.
Single crystals of CeAg2Ge2 were grown by self flux
method, using Ag:Ge (75.5: 24.5) binary alloy, which forms
an eutectic at 650 ◦C, as flux. The details about the crystal
∗ Corresponding author. Tel: (81)22-2280-4556
Email address: thamizh@tifr.res.in (A. Thamizhavel).
growth process are given elsewhere [4]. Figure 1(a) shows
the temperature dependence of electrical resistivity of
CeAg2Ge2 for the current direction parallel to both [100]
and [001] directions. There is a large anisotropy in the
electrical resistivity. The electrical resistivity shows a shal-
low minimum at 20 K, marginally increases with decrease
in temperature down to 4.6 K. With further decrease in
the temperature the electrical resistivity drops due to the
reduction in the spin-disorder scattering caused by the an-
tiferromagnetic ordering of the magnetic moments, as seen
in the inset of Fig. 1(a). The antiferromagnetic transition
can be clearly seen at 4.6 K as indicated by the arrow in
the figure.
Figure 1(b) shows the temperature dependence of the
magnetic susceptibility along the two principle directions.
As can be seen from the figure there is a large anisotropy in
the susceptibility due to tetragonal crystal structure. The
high temperature susceptibility does not obey the simple
Curie-Weiss law; on the other hand it can be very well fit-
ted to a modified Curie-Weiss law which is given by χ =
, where χ0 is the temperature independent part
of the magnetic susceptibility and C is the Curie constant.
The value of χ0 was estimated to be 1.33 × 10
−3 and
1.41× 10−3 emu/mol forH ‖ [001] and [100], respectively
such that an effective moment of 2.54 µB/Ce is obtained for
Preprint submitted to Elsevier 25 October 2018
http://arxiv.org/abs/0704.0119v1
) J || [001]
[100]
CeAg2Ge2
3002001000
Temperature (K)
u)H || [100]
[001]
H || [001]
[100]
CeAg2Ge2
[100]
[001]
6040200
Magnetic Field (kOe)
CeAg2Ge2
[001]
H || [100]
T = 2 K
Fig. 1. (a) The temperature dependence of electrical resistivity of
CeAg2Ge2, inset shows the low temperature part, (b) Temperature
dependence of the magnetic susceptibility together with inverse mag-
netic susceptibility plot, solid lines indicate the CEF fitting and (c)
Magnetization of CeAg2Ge2 measured at T = 2 K.
temperatures above 100 K. In order to perform the CEF
analysis of the susceptibility data, we plotted the inverse
susceptibility as 1/(χ− χ0) versus T . The solid line in fig-
ure 1(b) are the fitting to the inverse susceptibility with the
CEF Hamiltonian given byHCEF = B
where Bm
and Om
are CEF parameters and the Stevens
operators respectively. The level splitting energies are esti-
mated to be ∆1 = 5 K and ∆2 = 130 K. It may be noted
that the first excited state is very close to the ground state
indicating that the ground state is a quasi-quartet state.
Figure 1(c) shows the field dependence of magnetization at
2 K. For H ‖ [100], the magnetization increases linearly
with the field and exhibit two metamagnetic transition at
Hm1 = 31 kOe and Hm2 = 44.7 kOe before it saturates at
1.6 µB/Ce at 70 kOe, indicating the easy axis of magneti-
zation. On the other hand the magnetization along [001] is
very small and varies linearly with field reaching a value of
0.32 µB/Ce at 50 kOe.
Figure 2(a) shows the temperature dependence of the
specific heat of CeAg2Ge2 together with the specific heat
of a reference sample LaAg2Ge2. The antiferromagnetic or-
dering is manifested by the clear jump in the specific heat
at TN = 4.6 K as indicated by the arrow. The inset of
Fig. 2(a) shows the Cmag/T versus T together with the
20151050
Temperature(K)
H // [100]
 0   kOe
 20 kOe
 40 kOe
 50 kOe
 60 kOe
 80 kOe
 100 kOe
 120 kOe
40200
Temperature (K)
R ln 2
R ln 4
Fig. 2. (a) Temperature dependence of the specific heat of CeAg2Ge2
and LaAg2Ge2. The inset shows the magnetic entropy. (b) The field
dependence of the specific heat of CeAg2Ge2 for the field applied
along the easy axis of magnetization, namely [100].
magnetic entropy. The entropy reaches R ln 4 not too far
away from the magnetic ordering temperature leading to
the conclusion that the ground state and the first excited
states are closely spaced or nearly degenerate, thus corrob-
orating our CEF analysis of the inverse susceptibility data.
Figure 2(b) shows the field dependence of the specific heat
for the field applied parallel to the easy axis of magnetiza-
tion namely [100]. With the increase in the magnetic field
the Néel temperature decreases and the antiferromagnetic
ordering apparently vanishes at a critical field of 50 kOe
indicating a possibility of a field induced quantum critical
point in this compound. However, further low temperature
measurements are necessary to confirm this.
In summary, we have successfully grown the single crys-
tals of CeAg2Ge2 by the flux method. CeAg2Ge2 orders an-
tiferromagnetically at TN = 4.6 K. The CEF analysis of the
inverse susceptibility data indicate the ground state and
the first excited states are closely spaced. The heat capacity
data support this quasi-quartet ground state. Furthermore,
the heat capacity in applied magnetic fields revealed that
the Néel temperature vanishes at a critical field of 50 kOe
indicating a possible field induced quantum critical point
in this compound.
References
[1] R. Rauchschwalbe et al., J. Less Common. Metals 111, (1985)
[2] G. Knopp et al., J. Magn. Magn. Mater. 63 & 64, (1987) 88.
[3] E. Cordruwish et al., J. Phase Equilibria 20, (1999) 407.
[4] A. Thamizhavel et al., Phys. Rev. B (2007) to be published
	References
ABSTRACT
  We have successfully grown the single crystals of CeAg$_2$Ge$_2$, for the
first time, by flux method and studied the anisotropic physical properties by
measuring the electrical resistivity, magnetic susceptibility and specific
heat. We found that CeAg$_2$Ge$_2$ undergoes an antiferromagnetic transition at
$T_{\rm N}$ = 4.6 K. The electrical resistivity and susceptibility data reveal
strong anisotropic magnetic properties. The magnetization measured at $T$ = 2 K
exhibited two metamagnetic transitions at $H_{\rm m1}$ = 31 kOe and $H_{\rm
m2}$ = 44.7 kOe, for $H \parallel$ [100] with a saturation magnetization of 1.6
$\mu_{\rm B}$/Ce. The crystalline electric field (CEF) analysis of the inverse
susceptibility data reveals that the ground state and the first excited states
of CeAg$_2$Ge$_2$ are closely spaced indicating a quasi-quartet ground state.
The specific heat data lend further support to the presence of closely spaced
energy levels.

<|endoftext|><|startoftext|>
Strong Phase and D0 − D0 mixing at BES-III
Xiao-Dong Cheng1,2,∗ Kang-Lin He1,† Hai-Bo Li1,‡ Yi-Fang Wang1,§ and Mao-Zhi Yang1¶
Institute of High Energy Physics, P.O.Box 918, Beijing 100049, China
Department of Physics, Henan Normal University, XinXiang, Henan 453007, China
(Dated: October 25, 2018)
Most recently, both BaBar and Belle experiments found evidences of neutral D mixing. In this
paper, we discuss the constraints on the strong phase difference in D0 → Kπ decay from the
measurements of the mixing parameters, y′, yCP and x at the B factories. With CP tag technique at
ψ(3770) peak, the extraction of the strong phase difference at BES-III are discussed. The sensitivity
of the measurement of the mixing parameter y is estimated in BES-III experiment at ψ(3770) peak.
Finally, we also make an estimate on the measurements of the mixing rate RM .
PACS numbers: 13.25.Ft, 12.15.Ff, 13.20.Fc, 11.30.Er
Due to the smallness of ∆C = 0 amplitude in the
Standard Model (SM), D0 − D0 mixing offers a unique
opportunity to probe flavor-changing interactions which
may be generated by new physics. The recent measure-
ments from BaBar and Belle experiments indicate that
the D0 − D0 mixing may exist [1, 2]. At the B fac-
tories, the decay time information can be used to ex-
tract the neutral D mixing parameters. At t = 0 the
only term in the amplitude is the direct doubly-Cabibbo-
suppressed (DCS) mode D0 → K+π−, but for t > 0
D0 − D0 mixing may contribute through the sequence
D0 → D0 → K+π− , where the second stage is Cabibbo
favored (CF). The interference of this term with the DCS
contribution involves the lifetime and mass differences
of the neutral D mass eigenstates, as well as the final-
state strong phase difference δKπ between the CF and
the DCS decay amplitudes. This interference plays a
key role in the measurement of the mixing parameters at
time-dependent measurements.
With the assumption of CPT invariance, the mass
eigenstates of D0 −D0 system are |D1〉 = p|D0〉+ q|D0〉
and |D2〉 = p|D0〉−q|D0〉 with eigenvalues µ1 = m1−
and µ2 = m2 −
Γ2, respectively, where the m1 and Γ1
(m2 and Γ2) are the mass and width of D1 (D2). For
the method of detecting D0 − D0 mixing involving the
D0 → Kπ decay mentioned above, in order to separate
the DCS decay from the mixing signal, one must study
the time-dependent decay rate. The proper-time evolu-
tion of the particle states |D0
(t)〉 and |D0
(t)〉 are
given by
|D0phys(t)〉 = g+(t)|D
0〉 − q
g−(t)|D0〉,
|D0phys(t)〉 = g+(t)|D
0〉 − p
g−(t)|D0〉, (1)
where
(e−im2t−
Γ2t ± e−im1t− 12Γ1t), (2)
with definitions
m ≡ m1 +m2
, ∆m ≡ m2 −m1,
Γ ≡ Γ1 + Γ2
, ∆Γ ≡ Γ2 − Γ1, (3)
Note the sign of ∆m and ∆Γ is to be determined by
experiments.
In practice, one define the following mixing parameters
x ≡ ∆m
, y ≡ ∆Γ
. (4)
The time-dependent decay amplitudes for D0
(t) →
K+π− and D0
(t) → K−π+ are described as
〈K+π−|H|D0phys(t)〉 = g+(t)AK+π− −
g−(t)AK+π−
AK+π− [λg+(t)− g−(t)], (5)
〈K−π+|H|D0phys(t)〉 = g+(t)AK−π+ −
g−(t)AK−π+
AK−π+ [λg+(t)− g−(t)], (6)
where AK+π− = 〈K+π−|H|D0〉, AK+π− =
〈K+π−|H|D0〉, AK−π+ = 〈K−π+|H|D0〉, and
AK−π+ = 〈K−π+|H|D0〉. Here, λ and λ are de-
fined as:
λ ≡ p
AK+π−
AK+π−
, (7)
λ ≡ q
AK−π+
AK−π+
. (8)
From Eqs. (5) and (6), one can derive the general ex-
pression for the time-dependent decay rate, in agreement
http://arxiv.org/abs/0704.0120v3
with [3, 4]:
dΓ(D0
(t) → K+π−)
dtN = |AK
+π− |2
e−Γt ×
[(|λ|2 + 1)cosh(yΓt) +
(|λ|2 − 1)cos(xΓt) +
2Re(λ)sinh(yΓt) +
2Im(λ)sin(xΓt)] (9)
dΓ(D0
(t) → K−π+)
dtN = |AK−π+ |
e−Γt ×
[(|λ|2 + 1)cosh(yΓt) +
(|λ|2 − 1)cos(xΓt) +
2Re(λ)sinh(yΓt) +
2Im(λ)sin(xΓt)] (10)
where N is a common normalization factor. In order to
simplify the above formula, we make the following defi-
nition:
≡ (1 +AM )e−iβ , (11)
where β is the weak phase in mixing and AM is a real-
valued parameter which indicates the magnitude of CP
violation in the mixing. For f = K−π+ final state, we
define
AK+π−
AK+π−
r′e−iα
AK−π+
AK−π+
re−iα, (12)
where r′ and α′ (r and α) are the ratio and relative phase
of the DCS decay rate and the CF decay rate. Then, λ
and λ can be parameterized as
λ = −
1 +AM
e−i(α
′−β) , (13)
λ = −
r(1 +AM )e
−i(α+β). (14)
In order to demonstrate the CP violation in decay, we
define
RD(1 + AD) and
1 +AD
Thus, Eqs. (13) and (14) can be expressed as
λ = −
1 +AD
1 +AM
e−i(δ−φ) , (15)
λ = −
1 +AM
1 +AD
e−i(δ+φ) , (16)
where δ =
α′ + α
is the averaged phase difference be-
tween DCS and CF processes, and φ =
α− α′
We can characterize the CP violation in the mixing
amplitude, the decay amplitude, and the interference
between amplitudes with and without mixing, by real-
valued parameters AM , AD, and φ as in Ref [5, 6].
In the limit of CP conservation, AM , AD and φ are
all zero. AM = 0 means no CP violation in mixing,
namely, |q/p| = 1; AD = 0 means no CP violation in
decay, for this case, r = r′ = RD = |AK−π+/AK−π+ |2 =
|AK+π−/AK+π− |2; φ = 0 means no CP violation in the
interference between decay and mixing.
In experimental searches, one can define CF decay as
right-sign (RS) and DCS decay or via mixing followed
by a CF decay as wrong-sign (WS). Here, we define the
ratio of WS to RS decays as for D0:
R(t) =
dΓ(D0
(t) → K+π−)
dtN × e−Γ|t| × 2|AK+π− |2
, (17)
and for D0:
R(t) =
dΓ(D0
(t) → K−π+)
dtN × e−Γ|t| × 2|AK−π+ |2
, (18)
Taking into account that |λ|, |λ| ≪ 1 and x, y ≪ 1,
keeping terms up to order x2, y2 and RD in the ex-
pressions, neglecting CP violation in mixing, decay and
the interference between decay with and without mixing
(AM = 0, AD = 0, and φ = 0), expanding the time-
dependent for xt, yt <∼ Γ−1, combing Eqs. (9) and (10),
we can write Eqs. (17) and (18) as
R(t) = R(t) = RD +
(Γt)2, (19)
where
x′ = xcosδ + ysinδ, (20)
y′ = −xsinδ + ycosδ. (21)
In the limit of SU(3) symmetry, AK+π− and AK+π−
(AK−π+ and AK−π+) are simply related by CKM fac-
tors, AK+π− = (VcdV
us/VcsV
ud)AK+π− [7]. In particular,
AK+π− and AK+π− have the same strong phase, leading
to α′ = α = 0 in Eq. (12). But the SU(3) symmetry
is broken according to the recent precise measurements
from the B factories, the ratio [5]:
R = BR(D
0 → K+π−)
BR(D0 → K+π−)
, (22)
is unity in the SU(3) symmetry limit. But, the world
average for this ratio is
Rexp = 1.21± 0.03, (23)
computed from the individual measurements using the
standard method of Ref. [4]. Since the SU(3) is bro-
ken in D → Kπ decays at the level of 20%, in which
case the strong phase δ should be non-zero. Recently, a
time-dependent analysis in D → Kπ has been performed
based on 384 fb−1 luminosity at Υ (4S) [1]. By assuming
CP conservation, they obtained the following neutral D
mixing results
RD = (3.03± 0.16± 0.10)× 10−3,
= (−0.22± 0.30± 0.21)× 10−3,
y′ = (9.7± 4.4± 3.1)× 10−3. (24)
TABLE I: Experimental results used in the paper. Only one
error is quoted, we have combined in quadrature statistical
and systematic contributions.
Parameter BaBar (×10−3) Belle(×10−3) Technique
-0.22± 0.37 [1] 0.18+0.21−0.23 [8] Kπ
′ 9.7± 5.4 [1] 0.6+4.0−3.9 [8] Kπ
RD 3.03± 0.19 [1] 3.64 ± 0.17 [8] Kπ
yCP - 13.1 ± 4.1 [2] K
−, π+π−
x - 8.0± 3.4 [9] KSπ
y - 3.3± 2.8 [9] KSπ
The result is inconsistent with the no-mixing hypoth-
esis with a significance of 3.9 standard deviations. The
results from BaBar and Belle are in agreement within 2
standard deviation on the exact analysis of y′ measure-
ment by using D → Kπ as listed in Table I. As indicated
in Eq. (23), the strong phase δ should be non-zero due to
the SU(3) violation. One has to know the strong phase
difference exactly in order to extract the direct mixing
parameters, x and y as defined in Eqs. (4). However,
at the B factory, it is hard to do that with a model-
independent way [7, 10]. In order to extract the strong
phase δ we need data near the DD threshold to do a CP
tag as discussed in Ref. [7]. Here, we would like to figure
out the possible physics solution of the strong phase δ by
using the recent results from the B factories with differ-
ent decay modes, so that we can have an idea about the
sensitivity to measure the strong phase at the BES-III
project.
In Ref [2], Belle collaboration also reported the result
of yCP =
τ(D0→K+π−)
τ(D0→fCP )
− 1, where fCP = K+K− and
yCP = (13.1± 3.2± 2.5)× 10−3. (25)
The result is about 3.2σ significant deviation from zero
(non-mixing). In the limit of CP symmetry, yCP = y [11,
12]. In the decay of D0 → KSπ+π−, Belle experiment
has done a Dalitz plot (DP) analysis [9], they obtained
the direct mixing parameters x and y as
x = (8.0± 3.4)× 10−3, y = (3.3± 2.8)× 10−3, (26)
where the error includes both statistic and systematic un-
certainties. Since the parameterizations of the resonances
on the DP are model-dependent, the results suffer from
large uncertainties from the DP model. In this analysis,
they see a significance of 2.4 standard deviations from
non-mixing. Here, we will use the value of x measured in
the DP analysis for further discussion. As shown in Eq.
(21), once y, y′ and x are known, it is straightforward to
extract the strong phase difference between DCS and CF
decay in D0 → Kπ decay. If taking the measured central
values of x, yCP (≈ y) , and y′ as input parameters, we
found two-fold solutions for tanδ as below:
tanδ = 0.35± 0.63, or − 7.14± 29.13, (27)
which are corresponding to (19± 32)0 and (−820 ± 30)0,
respectively.
At ψ(3770) peak, to extract the mixing parameter y,
one can make use of rates for exclusive D0D0 combina-
tion, where both the D0 final states are specified (known
as double tags or DT), as well as inclusive rates, where
either the D0 or D0 is identified and the other D0 de-
cays generically (known as single tags or ST) [13]. With
the DT tag technique [14, 15], one can fully consider
the quantum correlation in C = −1 and C = +1 D0D0
pairs produced in the reaction e+e− → D0D0(nπ0) and
e+e− → D0D0γ(nπ0) [13, 16, 17], respectively.
For the ST, in the limit of CP conservation, the rate
of D0 decays into a CP eigenstate is given as [13]:
Γfη ≡ Γ(D0 → fη) = 2A2fη [1− ηy] , (28)
where fη is a CP eigenstate with eigenvalue η = ±1, and
Afη = |〈fη|H|D0〉| is the real-valued decay amplitude.
For the DT case, Gronau et. al. [7] and Xing [18]
have considered time-integrated decays into correlated
pairs of states, including the effects of non-zero final state
phase difference. As discussed in Ref. [7], the rate of
(D0D0)C=−1 → (l±X)(fη) is described as [7]:
Γl;fη ≡ Γ[(l±X)(fη)] = A2l±XA
(1 + y2)
≈ A2l±XA
, (29)
where Al±X = |〈l±X |H|D0〉| is real-valued amplitude for
semileptonic decays, here, we neglect y2 term since y ≪
For C = −1 initial D0D0 state, y can be expressed in
term of the ratios of DT rates and the double ratios of
ST rates to DT rates [13]:
Γl;f+Γf−
Γl;f−Γf+
Γl;f−Γf+
Γl;f+Γf−
. (30)
For a small y, its error, ∆(y), is approximately 1/
Nl±X ,
where Nl±X is the total number of (l
±X) events tagged
with CP -even and CP -odd eigenstates. The num-
ber Nl±X of CP tagged events is related to the to-
tal number of D0D0 pairs N(D0D0) through Nl±X ≈
N(D0D0)[BR(D0 → l± +X)× BR(D0 → f±)× ǫtag] ≈
1.5 × 10−3N(D0D0), here we take the branching ratio-
times-efficiency factor (BR(D0 → f±)× ǫtag) for tagging
CP eigenstates is about 1.1% (the total branching ratio
into CP eigenstates is larger than about 5% [4]). We find
∆(y) =
N(D0D0)
= ±0.003. (31)
If we take the central value of y from the measurement
of yCP at Belle experiment [2], thus, at BES-III exper-
iment [19], with 20fb−1 data at ψ(3770) peak, the sig-
nificance of the measurement of y could be around 4.3 σ
deviation from zero.
We can also take advantage of the coherence of the
D0 mesons produced at the ψ(3770) peak to extract the
strong phase difference δ between DCS and CF decay am-
plitudes that appears in the time-dependent mixing mea-
surement in Eq. (19) [7, 13]. Because the CP properties
of the final states produced in the decay of the ψ(3770)
are anti-correlated [16, 17], one D0 state decaying into a
final state with definite CP properties immediately iden-
tifies or tags the CP properties of the other side. As
discussed in Ref. [7], the process of one D0 decaying to
K−π+, while the other D0 decaying to a CP eigenstate
fη can be described as
ΓKπ;fη ≡ Γ[(K−π+)(fη)] ≈ A2A2fη |1 + η
−iδ|2
≈ A2A2fη (1 + 2η
RDcosδ),
where A = |〈K−π+|H|D0〉| and Afη = |〈fη|H|D0〉| are
the real-valued decay amplitudes, and we have neglected
the y2 terms in Eq. (32). In order to estimate the total
sample of events needed to perform a useful measurement
of δ, one defined [7, 10] an asymmetry
ΓKπ;f+ − ΓKπ;f−
ΓKπ;f+ + ΓKπ;f−
, (33)
where ΓKπ;f± is defined in Eq. (32), which is the rates for
the ψ(3770) → D0D0 configuration to decay into flavor
eigenstates and a CP -eigenstates f±. Eq. (32) implies a
small asymmetry, A = 2
RDcosδ. For a small asymme-
try, a general result is that its error ∆A is approximately
NK−π+ , where NK−π+ is the total number of events
tagged with CP -even and CP -odd eigenstates. Thus one
obtained
∆(cosδ) ≈ 1
NK−π+
. (34)
The expected number NK−π+ of CP -tagged events
can be connected to the total number of D0D0 pairs
N(D0D0) through NK−π+ ≈ N(D0D0)BR(D0 →
K−π+)×BR(D0 → f±)×ǫtag ≈ 4.2×10−4N(D0D0) [7],
here, as in Ref [7], we take the branching ratio-times-
efficiency factor BR(D0 → f±) × ǫtag = 1.1%. With
0 0.2 0.4 0.6 0.8 1
FIG. 1: Illustrative plot of the expected error (∆δ) of the
strong phase with various central values of cosδ. The expected
error of cosδ is 0.04 by ssuming 20fb−1 data at ψ(3770) peak
at BES-III. The two asterisks correspond to δ = 190 and
−820, respectively.
the measured RD = (3.03± 0.19)× 10−3 and BR(D0 →
K−π+) = 3.8% [4], one found [7]
∆(cosδ) ≈ ±444√
N(D0D0)
. (35)
At BESIII, about 72 × 106 D0D0 pairs can be collected
with 4 years’ running. If considering both K−π+ and
K+π− final states, we thus estimate that one may be
able to reach an accuracy of about 0.04 for cosδ. Fig-
ure 1 shows the expected error of the strong phase δ
with various central values of cosδ. With the expected
∆(cosδ) = ±0.04, the sensitivity of the strong phase
varies with the physical value of cosδ. For δ = 190 and
−820, the expected error could be ∆(δ) = ±8.70 and
±2.90, respectively.
By combing the measurements of x inD0 → KSππ and
yCP from Belle, one can obtain RM = (1.18±0.6)×10−4.
At the ψ(3770) peak, D0D0 pair are produced in a state
that is quantum-mechanically coherent [16, 17]. This en-
ables simple new method to measure D0 mixing param-
eters in a way similar proposed in Ref. [7]. At BES-III,
the measurement of RM can be performed unambigu-
ously with the following reactions [16]:
(i) e+e− → ψ(3770) → D0D0 → (K±π∓)(K±π∓),
(ii) e+e− → ψ(3770) → D0D0 → (K−e+ν)(K−e+ν),
(iii) e+e− → D−D∗+ → (K+π−π−)(π+
[K+e−ν]).
Reaction (i) in Eq. (36) can be normalized to D0D0 →
(K−π+)(K+π−), the following time-integrated ratio is
obtained by neglecting CP violation:
N [(K−π+)(K−π+)]
N [(K−π+)(K+π−)]
2 + y2
= RM . (37)
For the case of semileptonic decay, as (ii) in Eq. (36), we
N(l±l±)
N(l±l∓)
x2 + y2
= RM , (38)
The observation of reaction (i) would be definite evi-
dence for the existence of D0 −D0 mixing since the final
state (K±π∓)(K±π∓) can not be produced from DCS
decay due to quantum statistics [16, 17]. In particular,
the initial D0D0 pair is in an odd eigenstate of C which
will preclude, in the absence of mixing between the D0
and D0 over time, the formation of the symmetric state
required by Bose statistics if the decays are to be the
same final state. This final state is also very appealing
experimentally, because it involves a two-body decay of
both charm mesons, with energetic charged particles in
the final state that form an overconstrained system. Par-
ticle identification is crucial in this measurement because
if both the kaon and pion are misidentified in one of the
two D-meson decays in the event, it becomes impossi-
ble to discern whether mixing has occurred. At BESIII,
where the data sample is expected to be 20 fb−1 inte-
grated luminosity at ψ(3770) peak, the limit will be 10−4
at 95% C.L. for RM , but only if the particle identification
capabilities are adequate.
Reactions (ii) and (iii) offer unambiguous evidence
for the mixing because the mixing is searched for in the
semileptonic decays for which there are no DCS decays.
Of course since the time-evolution is not measured, obser-
vation of Reactions (ii) and (iii) actually would indicate
the violation of the selection rule relating the change in
charm to the change in leptonic charge which holds true
in the standard model [16].
In Table II, the sensitivity for RM measurements in
different decay modes are estimated with 4 years’ run at
BEPCII.
TABLE II: The sensitivity for RM measurements at BES-III
with different decay modes with 4 years’ run at BESPCII
0 Mixing
Reaction Events Sensitivity
RS(×104) RM (×10
ψ(3770) → (K−π+)(K−π+) 10.4 1.0
ψ(3770) → (K−e+ν)(K−e+ν) 8.9
ψ(3770) → (K−e+ν)(K−µ+ν) 8.1 3.7
ψ(3770) → (K−µ+ν)(K−µ+ν) 7.3
In the limit of CP conservation, by combing the mea-
surements of x in D0 → KSππ and yCP from Belle, one
can obtain RM = (1.18± 0.6)× 10−4. With 20fb−1 data
at BES-III, about 12 events for the precess D0D0 →
(K±π∓)(K±π∓) can be produced. One can observe 3.0
events after considering the selection efficiency at BE-
SIII, which could be about 25% for the four charged
particles. The background contamination due to double
particle misidentification is about 0.6 event with 20fb−1
data at BES-III [20]. Table III lists the expected mixing
signal for Nsig = N(K
±π∓)(K±π∓), background Nbkg ,
and the Poisson probability P (n), where n is the possible
number of observed events in experiment. In Table III,
we assume the RM = 1.18× 10−4, the expected number
of mixing signal events are estimated with 10fb−1 and
20fb−1, respectively.
TABLE III: The expected mixing signal for Nsig =
N(K±π∓)(K±π∓), background Nbkg , and the Poisson prob-
ability P (n) in 10 fb−1 and 20 fb−1 at BES-III at ψ(3770)
peak, respectively. Here, we take the mixing rate RM =
1.18× 10−4.
10 fb−1 (ψ(3770)) 20 fb−1 (ψ(3770))
36 million D0D0 72 million D0D0
Nsig 1.5 3.0
Nbkg 0.3 0.6
P (n = 0) 15.7% 2.5%
P (n = 1) 29.1% 9.1%
P (n = 2) 26.9% 16.9%
P (n = 3) 16.6% 20.9%
P (n = 4) 7.7% 19.3%
P (n = 5) 2.8% 14.3%
P (n = 6) 0.9% 8.8%
P (n = 7) 0.2% 4.7%
P (n = 8) 0.1% 2.2%
P (n = 9) 0.01% 0.9%
In conclusion, we discuss the constraints on the strong
phase difference in D0 → Kπ decay according to the
most recent measurements of y′, yCP and x from B fac-
tories. We estimate the sensitivity of the measurement of
mixing parameter y at ψ(3770) peak in BES-III experi-
ment. With 20 fb−1 data, the uncertainty ∆(y) could be
0.003. Thus, assuming y at a percent level, we can make
a measurement of y at a significance of 4.3σ deviation
from zero. The sensitivity of the strong phase differ-
ence at BES-III are obtained by using data near the DD
threshold with CP tag technique at BES-III experiment.
Finally, we estimated the sensitivity of the measurements
of the mixing rate RM , and find that BES-III experiment
may not be able to make a significant measurement of
RM with current luminosity by using coherent DD state
at ψ(3770) peak.
One of the authors (H. B. Li) would like to thank
David Asner and Zhi-Zhong Xing for stimulating dis-
cussion, Chang-Zheng Yuan for useful discussion on the
statistics used in this paper, and also thank Stephen
L. Olsen and Yang-Heng Zheng for commenting on this
manuscript. We thank BES-III collaboration for provid-
ing us many numerical results based on GEANT4 simula-
tion. This work is supported in part by the National Nat-
ural Science Foundation of China under contracts Nos.
10205017, 10575108,10521003, and the Knowledge Inno-
vation Project of CAS under contract Nos. U-612 and
U-530 (IHEP).
∗ Electronic address: chengxd@ihep.ac.cn
† Electronic address: hekl@ihep.ac.cn
‡ Electronic address: lihb@ihep.ac.cn
§ Electronic address: yfwang@ihep.ac.cn
¶ Electronic address: yangmz@ihep.ac.cn
[1] B. Aubert, et. al., (BaBar Collaboration),
hep-ex/0703020.
[2] K. Abe et. al., (Belle Collaboration), hep-ex/0703036.
[3] Y. Nir, hep-ph/0703235.
[4] W. M. Yao et. al., (Partcle Data Group), J. Phys.G 33,
1(2006).
[5] A. F. Falk, Y. Nir, and A. Petrov, JHEP12, 019 (1999).
[6] H. B. Li and M. Z. Yang, Phys. Rev. D74, 094016(2006).
[7] M. Gronau, Y. Grossman, J. L. Rosner, Phys. Lett.
B508, 37 (2001).
[8] L. M. Zhang et. al., (Belle Collaboration), Phys. Rev.
Lett. 96, 151801 (2006).
[9] M. Staric, Talk given at the 42th Renocontres De
Moriond On Electroweak Interactions And Unified The-
ories, March 10-17, 2007, La Thuile, Italy.
[10] G. Burdman and I. Shipsey, Ann. Rev. Nucl. Part. Sci.
53, 431 (2003).
[11] S. Bergmann, Y. Grossman, Z. Ligeti, Y. Nir and
A. A. Petrov, Phys. Lett. B486, 418(2000).
[12] D. Atwood, A. A. Petrov, Phys. Rev. D71, 054032
(2005).
[13] D. M. Asner and W. M. Sun Phys. Rev. D73,
034024 (2006);D. M. Asner et. al., Int. J. Mod. Phys.
A21, 5456 (2006); W. M. Sun, hep-ex/0603031, AIP
Conf. Proc. 842:693-695 (2006).
[14] R. M. Baltrusaitis, et. al., (MARK III Collaboration),
Phys. Rev. Lett. 56, 2140(1986).
[15] J. Adler, et. al., (MARK III Collaboration), Phys. Rev.
Lett. 60, 89 (1988).
[16] I. I. Bigi, Proceed. of the Tau-Charm Workshop,
L. V. Beers (ed.), SLAC-Report-343, page 169, (1989).
[17] I. Bigi, A. Sanda, Phys. Lett. B171, 320(1986).
[18] Z. Z. Xing, Phys. Rev. D55, 196(1997);
Z. Z. Xing, Phys. Lett. B372,317(1996).
[19] BES-III Collaboration, ”The Preliminary Design Report
of the BESIII Detector”, Report No. IHEP-BEPCII-SB-
[20] Y. Z. Sun et. al., to appear at HEP & NP 31, 1 (2007).
mailto:chengxd@ihep.ac.cn
mailto:hekl@ihep.ac.cn
mailto:lihb@ihep.ac.cn
mailto:yfwang@ihep.ac.cn
mailto:yangmz@ihep.ac.cn
http://arxiv.org/abs/hep-ex/0703020
http://arxiv.org/abs/hep-ex/0703036
http://arxiv.org/abs/hep-ph/0703235
http://arxiv.org/abs/hep-ex/0603031
ABSTRACT
  Most recently, both BaBar and Belle experiments found evidences of neutral
$D$ mixing. In this paper, we discuss the constraints on the strong phase
difference in $D^0 \to K\pi$ decay from the measurements of the mixing
parameters, $y^\prime$, $y_{CP}$ and $x$ at the $B$ factories. The sensitivity
of the measurement of the mixing parameter $y$ is estimated in BES-III
experiment at $\psi(3770)$ peak. We also make an estimate on the measurements
of the mixing rate $R_M$. Finally, the sensitivity of the strong phase
difference at BES-III are obtained by using data near the $D\bar{D}$ threshold
with CP tag technique at BES-III experiment.

<|endoftext|><|startoftext|>
Introduction
It is well-known that the N = 1 SU(Nc) QCD with fundamental flavors has a vanishing
superpotential before we deform this theory by mass term for quarks. The vanishing su-
perpotential in the electric theory makes it easier to analyze its nonvanishing dual magnetic
superpotential. Sometimes by tuning the various rotation angles between NS5-branes and
D6-branes appropriately, even if the electric theory has nonvanishing superpotential, one can
make the nonzero superpotential to vanish in the electric theory. Two procedures, deforming
the electric gauge theory by adding the mass for the quarks and taking the Seiberg dual
magnetic theory from the electric theory, are crucial to find out meta-stable supersymmetry
breaking vacua in the context of dynamical supersymmetry breaking [1, 2]. Some models
of dynamical supersymmetry breaking can be studied by gauging the subgroup of the flavor
symmetry group by either field theory analysis or using the brane configuration 1.
In this paper, starting from the known N = 1 supersymmetric electric gauge theories, we
construct the N = 1 supersymmetric magnetic gauge theories by brane motion and linking
number counting. The dual gauge group appears only on the first gauge group. Based on
their particular limits of corresponding magnetic brane configurations in the sense that their
electric theories do not have any superpotentials except the mass deformations for the quarks,
we describe the intersecting brane configurations of type IIA string theory for the meta-stable
nonsupersymmetric vacua of these gauge theories.
We focus on the cases where the whole gauge group is given by a product of two gauge
groups. One example can be realized by three NS5-branes with D4- and D6-branes, and the
other by four NS5-branes with D4- and D6-branes. For the latter, the appropriate orientifold
6-plane should be located at the center of this brane configuration in order to have two gauge
groups. Of course, it is also possible, without changing the number of gauge groups, to have
the brane configuration consisting of five NS5-branes and orientifold 6-plane, at which the
extra NS5-brane is located, with D4- and D6-branes, but we’ll not do this particular case in
this paper.
In section 2, we review the type IIA brane configuration that contains three NS5-branes,
corresponding to the electric theory based on the N = 1 SU(Nc) × SU(N ′c) gauge theory
[4, 5, 6] with matter contents and deform this theory by adding the mass term for the quarks.
Then we construct the Seiberg dual magnetic theory which is N = 1 SU(Ñc)×SU(N ′c) gauge
theory with corresponding dual matters as well as various gauge singlets, by brane motion
and linking number counting. We do not touch the part of second gauge group SU(N ′c) in
1For the type IIA brane configuration description of N = 1 supersymmetric gauge theory, see the review
paper [3].
this dual process.
In section 3, we consider the nonsupersymmetric meta-stable minimum by looking at
the magnetic brane configuration we obtained in section 2 and present the corresponding
intersecting brane configuration of type IIA string theory, along the line of [7, 8, 9, 10, 11](see
also [12, 13, 14]) and we describe M-theory lift of this supersymmetry breaking type IIA brane
configuration.
In section 4, we describe the type IIA brane configuration that contains four NS5-branes,
corresponding to the electric theory based on the N = 1 SU(Nc) × SO(N ′c) gauge theory
[15] with matter contents and deform this theory by adding the mass term for the quarks.
Then we take the Seiberg dual magnetic theory which is N = 1 SU(Ñc) × SO(N ′c) gauge
theory with corresponding dual matters as well as various gauge singlets, by brane motion
and linking number counting. The part of second gauge group SO(N ′c) in this dual process is
not changed under this process.
In section 5, the nonsupersymmetric meta-stable minimum by looking at the magnetic
brane configuration we obtained in section 4 is constructed and we present the corresponding
intersecting brane configuration of type IIA string theory and describe M-theory lift of this
supersymmetry breaking type IIA brane configuration, as we did in section 3.
In section 6, we describe the similar application to the N = 1 SU(Nc) × Sp(N ′c) gauge
theory [15] briefly and make some comments for the future directions.
2 The N = 1 supersymmetric brane configuration of
SU(Nc)× SU(N ′c) gauge theory
After reviewing the type IIA brane configuration corresponding to the electric theory based
on the N = 1 SU(Nc)×SU(N ′c) gauge theory [4, 5, 6], we construct the Seiberg dual magnetic
theory which is N = 1 SU(Ñc)× SU(N ′c) gauge theory.
2.1 Electric theory with SU(Nc)× SU(N ′c) gauge group
The gauge group is given by SU(Nc)×SU(N ′c) and the matter contents [4, 5, 6] are given by
• Nf chiral multiplets Q are in the fundamental representation under the SU(Nc), Nf
chiral multiplets Q̃ are in the antifundamental representation under the SU(Nc) and then Q
are in the representation (Nc, 1) while Q̃ are in the representation (Nc, 1) under the gauge
group
• N ′f chiral multiplets Q′ are in the fundamental representation under the SU(N ′c), N ′f
chiral multiplets Q̃′ are in the antifundamental representation under the SU(N ′c) and then Q
are in the representation (1,N′
) while Q̃′ are in the representation (1,N′
) under the gauge
group
• The flavor singlet field X is in the bifundamental representation (Nc,N′c) under the
gauge group and its complex conjugate field X̃ is in the bifundamental representation (Nc,N
under the gauge group
In the electric theory, since there exist Nf quarks Q, Nf quarks Q̃, one bifundamental
field X which will give rise to the contribution of N ′c and its complex conjugate X̃ which will
give rise to the contribution of N ′c, the coefficient of the beta function of the first gauge group
factor is
bSU(Nc) = 3Nc −Nf −N ′c
and similarly since there exist N ′f quarks Q
′, N ′f quarks Q̃
′, one bifundamental field X which
will give rise to the contribution of Nc and its complex conjugate X̃ which will give rise to
the contribution of Nc, the coefficient of the beta function of the second gauge group factor is
bSU(N ′c) = 3N
c −N ′f −Nc.
The anomaly free global symmetry is given by [SU(Nf ) × SU(N ′f )]2 × U(1)3 × U(1)R
[4, 5, 6] and let us denote the strong coupling scales for SU(Nc) as Λ1 and for SU(N
c) as Λ2.
The theory is asymptotically free when bSU(Nc) = 3Nc − Nf − N ′c > 0 for the SU(Nc) gauge
theory and when bSU(N ′c) = 3N
c −N ′f −Nc > 0 for the SU(N ′c) gauge theory.
The type IIA brane configuration for this theory can be described by Nc color D4-branes
(01236) suspended between a middle NS5-brane (012345) and the right NS5’-brane (012389)
(denoted by NS5′R-brane) along x
6 direction, together with Nf D6-branes (0123789) which are
parallel to NS5′R-brane and have nonzero (45) directions. Moreover, an extra left NS5’-brane
(denoted by NS5′L-brane) is located at the left hand side of a middle NS5-brane along the
x6 direction and there exist N ′c color D4-branes suspended between them, with N
f D6-branes
which have zero (45) directions. These are shown in Figure 1 explicitly. See also [3] for the
brane configuration.
By realizing that the two outer NS5′L,R-branes are perpendicular to a middle NS5-brane
and the fact that Nf D6-branes are parallel to NS5
R-brane and N
f D6-branes are parallel
to NS5′L-brane, the classical superpotential vanishes. However, one can deform this theory.
Then the classical superpotential by deforming this theory by adding the mass term for the
quarks Q and Q̃, along the lines of [1, 11, 10, 9, 8, 7], is given by
W = mQQ̃ (2.1)
and this type IIA brane configuration can be summarized as follows 2:
• One middle NS5-brane with worldvolume (012345).
• Two NS5’-branes with worldvolume (012389).
• Nf D6-branes with worldvolume (0123789) located at the positive region in v direction.
• Nc color D4-branes with worldvolume (01236). These are suspended between a middle
NS5-brane and NS5′R-brane.
• N ′c color D4-branes with worldvolume (01236). These are suspended between NS5′L-
brane and a middle NS5-brane.
Now we draw this electric brane configuration in Figure 1 and we put the coincident Nf
D6-branes in the nonzero v direction. If we ignore the left NS5′L-brane, N
c D4-branes and
N ′f D6-branes, then this brane configuration corresponds to the standard N = 1 SQCD with
the gauge group SU(Nc) with Nf massive flavors. The electric quarks Q and Q̃ correspond
to strings stretching between the Nc color D4-branes with Nf D6-branes, the electric quarks
Q′ and Q̃′ correspond to strings between the N ′c color D4-branes with N
f D6-branes and the
bifundamentals X and X̃ correspond to strings stretching between the Nc color D4-branes
with N ′c color D4-branes.
Figure 1: The N = 1 supersymmetric electric brane configuration of SU(Nc)×SU(N ′c) with
Nf chiral multiplets Q, Nf chiral multiplets Q̃, N
f chiral multiplets Q
′, N ′f chiral multiplets
Q̃′, the flavor singlet bifundamental field X and its complex conjugate bifundamental field X̃ .
The Nf D6-branes have nonzero v coordinates where v = m for equal massive case of quarks
Q, Q̃ while Q′ and Q̃′ are massless.
2We introduce two complex coordinates v ≡ x4 + ix5 and w ≡ x8 + ix9 for simplicity.
2.2 Magnetic theory with SU(Ñc)× SU(N ′c) gauge group
One can consider dualizing one of the gauge groups regarding as the other gauge group as a
spectator. One takes the Seiberg dual for the first gauge group factor SU(Nc) while remaining
the second gauge group factor SU(N ′c) unchanged. Also we consider the case where Λ1 >> Λ2,
in other words, the dualized group’s dynamical scale is far above that of the other spectator
group.
Let us move a middle NS5-brane to the right all the way past the right NS5′R-brane. For
example, see [12, 13, 14, 11, 10, 9, 8, 7]. After this brane motion, one arrives at the Figure 2.
Note that there exists a creation of Nf D4-branes connecting Nf D6-branes and NS5
R-brane.
Recall that the Nf D6-branes are perpendicular to a middle NS5-brane in Figure 1. The
linking number [16] of NS5-brane from Figure 2 is L5 =
− Ñc. On the other hand, the
linking number of NS5-brane from Figure 1 is L5 = −Nf2 +Nc−N
c. Due to the connection of
N ′c D4-branes with NS5
R-brane, the presence of N
c in the linking number arises. From these
two relations, one obtains the number of colors of dual magnetic theory
Ñc = Nf +N
c −Nc. (2.2)
The linking number counting looks similar to the one in [7] where there exists a contribution
from O4-plane.
Let us draw this magnetic brane configuration in Figure 2 and recall that we put the
coincident Nf D6-branes in the nonzero v directions in the electric theory, along the lines of
[12, 13, 14, 11, 10, 9, 8, 7]. The Nf created D4-branes connecting between D6-branes and
NS5′R-brane can move freely in the w direction. Moreover since N
c D4-branes are suspending
between two equal NS5′L,R-branes located at different x
6 coordinate, these D4-branes can
slide along the w direction also. If we ignore the left NS5′L-brane, N
c D4-branes and N
D6-branes(detaching these from Figure 2), then this brane configuration corresponds to the
standard N = 1 SQCD with the magnetic gauge group SU(Ñc = Nf −Nc) with Nf massive
flavors [12, 13, 14].
The dual magnetic gauge group is given by SU(Ñc) × SU(N ′c) and the matter contents
are given by
• Nf chiral multiplets q are in the fundamental representation under the SU(Ñc), Nf
chiral multiplets q̃ are in the antifundamental representation under the SU(Ñc) and then q
are in the representation (Ñc, 1) while q̃ are in the representation (Ñc, 1) under the gauge
group
• N ′f chiral multiplets Q′ are in the fundamental representation under the SU(N ′c), N ′f
chiral multiplets Q̃′ are in the antifundamental representation under the SU(N ′c) and then Q
Figure 2: The N = 1 supersymmetric magnetic brane configuration of SU(Ñc = Nf +N ′c −
Nc) × SU(N ′c) with Nf chiral multiplets q, Nf chiral multiplets q̃, N ′f chiral multiplets Q′,
N ′f chiral multiplets Q̃
′, the flavor singlet bifundamental field Y and its complex conjugate
bifundamental field Ỹ as well as Nf fields F
′, its complex conjugate Nf fields F̃ ′, N
f fields
M and the gauge singlet Φ. There exist Nf flavor D4-branes connecting D6-branes and
NS5′R-brane.
are in the representation (1,N′
) while Q̃′ are in the representation (1,N′
) under the gauge
group
• The flavor singlet field Y is in the bifundamental representation (Ñc,N′c) under the gauge
group and its complex conjugate field Ỹ is in the bifundamental representation (Ñc,N
) under
the gauge group
There are (Nf +N
2 gauge singlets in the first dual gauge group factor as follows:
• Nf -fields F ′ are in the fundamental representation under the SU(N ′c), its complex con-
jugate Nf -fields F̃ ′ are in the antifundamental representation under the SU(N
c) and then F
are in the representation (1,N′
) under the gauge group while F̃ ′ are in the representation
(1,N′
) under the gauge group
These additional Nf SU(N
c) fundamentals and Nf SU(N
c) antifundamentals are origi-
nating from the SU(Nc) chiral mesons X̃Q and XQ̃ respectively. It is clear to see that from
the Figure 2, since the Nf D6-branes are parallel to the NS5
R-brane, the newly created Nf
D4-branes can slide along the plane consisting of these D6-branes and NS5′R-brane arbitrar-
ily. Then strings stretching between the Nf D6-branes and N
c D4-branes will give rise to
these additional Nf SU(N
c) fundamentals and Nf SU(N
c) antifundamentals.
• N2f -fields M are in the representation (1, 1) under the gauge group
This corresponds to the SU(Nc) chiral meson QQ̃ and the fluctuations of the singlet M
correspond to the motion of Nf flavor D4-branes along (789) directions in Figure 2.
• The N ′2c -fields Φ is in the representation (1,N′2c − 1)⊕ (1, 1) under the gauge group
This corresponds to the SU(Nc) chiral meson XX̃ and note that X has a representation
of SU(N ′c) while X̃ has a representation N
of SU(N ′c). The fluctuations of the singlet
Φ correspond to the motion of N ′c D4-branes suspended two NS5
L,R-branes along the (789)
directions in Figure 2.
In the dual theory, since there exist Nf quarks q, Nf quarks q̃, one bifundamental field Y
which will give rise to the contribution of N ′c and its complex conjugate Ỹ which will give rise
to the contribution of N ′c, the coefficient of the beta function for the first gauge group factor
[6] is
SU( eNc)
= 3Ñc −Nf −N ′c = 2Nf + 2N ′c − 3Nc
where we inserted the number of colors given in (2.2) in the second equality and since there
exist N ′f quarks Q
′, N ′f quarks Q̃
′, one bifundamental field Y which will give rise to the
contribution of Ñc, its complex conjugate Ỹ which will give rise to the contribution of Ñc, Nf
fields F ′, its complex conjugate Nf fields F̃ ′ and the singlet Φ which will give rise to N
c, the
coefficient of the beta function of second gauge group factor [6] is
SU(N ′c)
= 3N ′c −N ′f − Ñc −Nf −N ′c = N ′c +Nc − 2Nf −N ′f .
Therefore, both SU(Ñc) and SU(N
c) gauge couplings are IR free by requiring the negativeness
of the coefficients of beta function. One can rely on the perturbative calculations at low energy
for this magnetic IR free region b
SU( eNc)
< 0 and b
SU(N ′c)
< 0. Note that the SU(N ′c) fields in
the magnetic theory are different from those of the electric theory. Since bSU(N ′c)−b
SU(N ′c)
SU(N ′c) is more asymptotically free than SU(N
mag [6]. Neglecting the SU(N ′c) dynamics,
the magnetic SU(Ñc) is IR free when Nf +N
Nc [6].
The dual magnetic superpotential, by adding the mass term (2.1) for Q and Q̃ in the
electric theory which is equal to put a linear term in M in the dual magnetic theory, is given
Wdual =
Mqq̃ + Y F ′q̃ + Ỹ qF̃ ′ + ΦY Ỹ
+mM (2.3)
where the mesons in terms of the fields defined in the electric theory are
M ≡ QQ̃, Φ ≡ XX̃, F ′ ≡ X̃Q, F̃ ′ ≡ XQ̃.
By orientifolding procedure(O4-plane) into this brane configuration, the q(Q) and q̃(Q̃) are
equivalent to each other, the Y (X) and Ỹ (X̃) become identical and F ′ and F̃ ′ become the
same. Then the reduced superpotential is identical with the one in [7]. Here q and q̃ are fun-
damental and antifundamental for the gauge group index respectively and antifundamentals
for the flavor index. Then, qq̃ has rank Ñc while m has a rank Nf . Therefore, the F-term
condition, the derivative the superpotential Wdual with respect to M , cannot be satisfied if
the rank Nf exceeds Ñc. This is so-called rank condition and the supersymmetry is broken.
Other F-term equations are satisfied by taking the vacuum expectation values of Y, Ỹ , F ′ and
F̃ ′ to vanish.
The classical moduli space of vacua can be obtained from F-term equations
qq̃ +m = 0, q̃M + F̃ ′Ỹ = 0,
Mq + Y F ′ = 0, F ′q̃ + Ỹ Φ = 0,
q̃Y = 0, qF̃ ′ + ΦY = 0,
Ỹ q = 0, Y Ỹ = 0.
Then, it is easy to see that there exist three reduced equations
q̃M = 0 = Mq, qq̃ +m = 0
and other F-term equations are satisfied if one takes the zero vacuum expectation values for
the fields Y, Ỹ , F ′ and F̃ ′. Then the solutions can be written as follows:
< q > =
meφ1 eNc
, < q̃ >=
me−φ1 eNc 0
, < M >=
0 Φ01Nf− eNc
< Y > = < Ỹ >=< F ′ >=< F̃ ′ >= 0. (2.4)
Let us expand around a point on (2.4), as done in [1]. Then the remaining relevant terms of
superpotential are given by
W reldual = Φ0 (δϕ δϕ̃+m) + δZ δϕ q̃0 + δZ̃ q0δϕ̃
by following the same fluctuations for the various fields as in [9]:
q01 eNc +
(δχ+ + δχ−)1 eNc
, q̃ =
q̃01 eNc +
(δχ+ − δχ−)1 eNc δϕ̃
δY δZ
δZ̃ Φ01Nf− eNc
as well as the fluctuations of Y, Ỹ , F ′ and F̃ ′. Note that there exist also three kinds of terms,
the vacuum < q > multiplied by δỸ δF̃ ′, the vacuum < q̃ > multiplied by δF ′δY , and the
vacuum < Φ > multiplied by δY δỸ . However, by redefining these, they do not enter the
contributions for the one loop result, up to quadratic order. As done in [17], one gets that
m2Φ0 will contain (log 4− 1) > 0 implying that these are stable.
3 Nonsupersymmetric meta-stable brane configuration
of SU(Nc)× SU(N ′c) gauge theory
Now we recombine Ñc D4-branes among Nf flavor D4-branes connecting between D6-branes
and NS5′R-brane with those connecting between NS5
R-brane and NS5-brane and push them
in +v direction from Figure 2. After this procedure, there are no color D4-branes between
NS5′R-brane and NS5-brane. For the flavor D4-branes, we are left with only (Nf − Ñc) flavor
D4-branes.
Then the minimal energy supersymmetry breaking brane configuration is shown in Figure
3, along the lines of [12, 13, 14, 11, 10, 9, 8, 7]. If we ignore the left NS5′L-brane, N
c D4-
branes and N ′f D6-branes(detaching these from Figure 3), as observed already, then this brane
configuration corresponds to the minimal energy supersymmetry breaking brane configuration
for the N = 1 SQCD with the magnetic gauge group SU(Ñc = Nf − Nc) with Nf massive
flavors [12, 13, 14].
Figure 3: The nonsupersymmetric minimal energy brane configuration of SU(Ñc = Nf +
N ′c −Nc)× SU(N ′c) with Nf chiral multiplets q, Nf chiral multiplets q̃, N ′f chiral multiplets
Q′, N ′f chiral multiplets Q̃
′, the flavor singlet bifundamental field Y and its complex conjugate
bifundamental field Ỹ and various gauge singlets.
The type IIA/M-theory brane construction for the N = 2 gauge theory was described
by [18] and after lifting the type IIA description to M-theory, the corresponding magnetic
M5-brane configuration 3 with equal mass for the quarks where the gauge group is given by
3The M5-brane lives in (0123) directions and is wrapping on a Riemann surface inside (4568910) directions.
The Taub-NUT space in (45610) directions is parametrized by two complex variables v and y and the flat two
dimensions in (89) directions by a complex variable w. See [14] for the relevant discussions.
SU(Ñc)×SU(N ′c), in a background space of xt = vN
k=1(v−ek) where this four dimensional
space replaces (45610) directions, is described by
t3 + (v
eNc + · · · )t2 + (vN ′c + · · · )t+ vN ′f
(v − ek) = 0 (3.1)
where ek is the position of the D6-branes in the v direction(for equal massive case, we can
write ek = m) and we have ignored the lower power terms in v in t
2 and t denoted by · · · and
the scales for the gauge groups in front of the first term and the last term, for simplicity. For
fixed x, the coordinate t corresponds to y.
From this curve (3.1) of cubic equation for t above, the asymptotic regions for three NS5-
branes can be classified by looking at the first two terms providing NS5-brane asymptotic
region, next two terms providing NS5′R-brane asymptotic region and the final two terms
giving NS5′L-brane asymptotic region as follows
1. v → ∞ limit implies
w → 0, y ∼ v eNc + · · · NS asymptotic region.
2. w → ∞ limit implies
v → m, y ∼ wNf+N ′f−N ′c + · · · NS ′L asymptotic region,
v → m, y ∼ wN ′c− eNc + · · · NS ′R asymptotic region.
Here the two NS5′L,R-branes are moving in the +v direction holding everything else fixed
instead of moving D6-branes in the +v direction, in the spirit of [14]. The harmonic function
sourced by the D6-branes can be written explicitly by summing over two contributions from
the Nf and N
f D6-branes and similar analysis to both solve the differential equation and
find out the nonholomorphic curve can be done [14, 10, 9, 8, 7]. An instability from a new
M5-brane mode arises.
4 The N = 1 supersymmetric brane configuration of
SU(Nc)× SO(N ′c) gauge theory
After reviewing the type IIA brane configuration corresponding to the electric theory based
on the N = 1 SU(Nc) × SO(N ′c) gauge theory [15], we describe the Seiberg dual magnetic
theory which is N = 1 SU(Ñc)× SO(N ′c) gauge theory.
4.1 Electric theory with SU(Nc)× SO(N ′c) gauge group
The gauge group is given by SU(Nc)× SO(N ′c) and the matter contents [15](similar matter
contents are found in [4]) are given by
• Nf chiral multiplets Q are in the fundamental representation under the SU(Nc), Nf
chiral multiplets Q̃ are in the antifundamental representation under the SU(Nc) and then Q
are in the representation (Nc, 1) while Q̃ are in the representation (Nc, 1) under the gauge
group
• 2N ′f chiral multiplets Q′ are in the fundamental representation under the SO(N ′c) and
then Q′ are in the representation (1,N′
) under the gauge group
• The flavor singlet field X is in the bifundamental representation (Nc,N′c) under the
gauge group and the flavor singlet X̃ is in the bifundamental representation (Nc,N
) under
the gauge group
In the electric theory, since there exist Nf quarks Q, Nf quarks Q̃, one bifundamental
field X which will give rise to the contribution of N ′c and its complex conjugate X̃ which will
give rise to the contribution of N ′c, the coefficient of the beta function of the first gauge group
factor is
bSU(Nc) = 3Nc −Nf −N ′c
and similarly, since there exist 2N ′f quarks Q
′, one bifundamental field X which will give rise
to the contribution of Nc and its complex conjugate X̃ which will give rise to the contribution
of Nc, the coefficient of the beta function of the second gauge group factor is
bSO(N ′c) = 3(N
c − 2)− 2N ′f − 2Nc.
The anomaly free global symmetry is given by SU(Nf )
2 × SU(2N ′f)×U(1)2 ×U(1)R and
let us denote the strong coupling scales for SU(Nc) as Λ1 and for SO(N
c) as Λ2, as in previous
section. The theory is asymptotically free when bSU(Nc) > 0 for the SU(Nc) gauge theory and
when bSO(N ′c) > 0 for the SO(N
c) gauge theory.
The type IIA brane configuration of N = 2 gauge theory [19] consists of four NS5-branes
(012345) which have different x6 values, Nc and N
c D4-branes (01236) suspended between
them, 2Nf and 2N
f D6-branes (0123789) and an orientifold 6 plane (0123789) of positive
Ramond charge 4. According to Z2 symmetry of orientifold 6-plane(O6-plane) sitting at
v = 0 and x6 = 0, the coordinates (x4, x5, x6) transform as −(x4, x5, x6), as usual. See also
[3] for the discussion of O6-plane.
4There are many different brane configurations with O6-plane in the literature and some of them are
present in [20, 21, 22, 23, 24].
By rotating the third and fourth NS5-branes which are located at the right hand side of
O6-plane, from v direction toward −w and +w directions respectively, one obtains N = 1
theory. Their mirrors, the first and second NS5-branes which are located at the left hand
side of O6-plane, can be rotated in a Z2 symmetric manner due to the presence of O6-plane
simultaneously. That is, if the first NS5-brane rotates by an angle −ω in (v, w) plane, denoted
by NS5−ω-brane [3], then the mirror image of this NS5-brane, the fourth NS5-brane, is rotated
by an angle ω in the same plane, denoted by NS5ω-brane. If the second NS5-brane rotates
by an angle θ in (v, w) plane, denoted by NS5θ-brane [3], then the mirror image of this
NS5-brane, the third NS5-brane, is rotated by an angle −θ in the same plane, denoted by
NS5−θ-brane. For more details, see the Figure 4
We also rotate the N ′f D6-branes which are located between the second NS5-brane and
an O6-plane and make them be parallel to NS5θ-brane and denote them as D6θ-brane with
zero v coordinate(the angle between the unrotated D6-branes and D6θ-branes is equal to
− θ) and its mirrors N ′f D6-branes appear as D6−θ-branes between the O6-plane and third
NS5-brane. There is no coupling between the adjoint field and the quarks since the rotated
D6θ-branes are parallel to the rotated NS5θ-brane [5, 3]. Similarly, the Nf D6-branes which
are located between the third NS5-brane and the fourth NS5-brane can be rotated and we
can make them be parallel to NS5ω-brane and denote them as D6ω-branes with nonzero v
coordinate(the angle between the unrotated D6-branes and D6ω-branes is equal to
−ω) and
its mirrors Nf D6-branes appear as D6−ω-branes between the first NS5-brane and the second
NS5-brane.
Moreover the Nc D4-branes are suspended between the first NS5-brane and the second
NS5-brane(and its mirrors) and the N ′c D4-branes are suspended between the second NS5-
brane and the third NS5-brane.
For this brane setup 6, the classical superpotential is given by [15]
W = −1
4 tan(ω − θ) +
tan 2θ
tr(XX̃)2 +
trXX̃X̃X
4 sin 2θ
(trXX̃)2
4Nc tan(ω − θ)
. (4.1)
It is easy to see that when θ approaches 0 and ω approaches π
, then this superpotential
vanishes.
5The angles of θ1 and θ2 in [15] are related to the angles θ and ω as follows: θ = θ1 and ω = θ2.
6For arbitrary angles θ and ω, the superpotential for the SU(Nc) sector is given by W = XφX̃ + tan(ω −
θ) trφ2 where φ ia an adjoint field for SU(Nc). There is no coupling between φ and Nf quarks because
D6±ω-branes are parallel to NS5±ω-branes. The superpotential for the SO(N
c) sector is given by W =
XφAX̃ +XφSX̃ + tan θ trφ
A − 1tan θ trφ
S where φA and φS are an adjoint field and a symmetric tensor for
SO(N ′c) [25]. After integrating out φ, φA and φS , the whole superpotential can be written as in (4.1).
Now one summarizes the supersymmetric electric brane configuration with their worldvol-
umes in type IIA string theory as follows.
• NS5−ω-brane with worldvolume by both (0123) and two spatial dimensions in (v, w)
plane and with negative x6.
• NS5θ-brane with worldvolume by both (0123) and two spatial dimensions in (v, w) plane
and with negative x6.
• NS5−θ-brane with worldvolume by both (0123) and two spatial dimensions in (v, w)
plane and with positive x6.
• NS5ω-brane with worldvolume by both (0123) and two spatial dimensions in (v, w) plane
and with positive x6.
• N ′f D6θ-branes with worldvolume by both (01237) and two spatial dimensions in (v, w)
plane and with negative x6 and v = 0.
• N ′f D6−θ-branes with worldvolume by both (01237) and two space dimensions in (v, w)
plane and with positive x6 and v = 0.
• Nf D6ω-branes with worldvolume by both (01237) and two spatial dimensions in (v, w)
plane and with positive x6. Before the rotation, the distance from Nc color D4-branes in the
+v direction is nonzero.
• Nf D6−ω-branes with worldvolume by both (01237) and two space dimensions in (v, w)
plane and with negative x6. Before the rotation, the distance from Nc color D4-branes in the
−v direction is nonzero.
• O6-plane with worldvolume (0123789) with v = 0 = x6.
•Nc D4-branes connecting NS5−ω-brane and NS5θ-brane, with worldvolume (01236) with
v = 0 = w(and its mirrors).
• N ′c D4-branes connecting NS5θ-brane and NS5−θ-brane, with worldvolume (01236) with
v = 0 = w.
We draw the type IIA electric brane configuration in Figure 4 which was basically given
in [15] already but the only difference is to put Nf D6-branes in the nonzero v direction in
order to obtain nonzero masses for the quarks which are necessary to obtain the meta-stable
vacua.
4.2 Magnetic theory with SU(Ñc)× SO(N ′c) gauge group
One takes the Seiberg dual for the first gauge group factor SU(Nc) while remaining the second
gauge group factor SO(N ′c), as in previous case. Also we consider the case where Λ1 >> Λ2,
in other words, the dualized group’s dynamical scale is far above that of the other spectator
group.
Figure 4: The N = 1 supersymmetric electric brane configuration of SU(Nc)×SO(N ′c) with
Nf chiral multiplets Q, Nf chiral multiplets Q̃, 2N
f chiral multiplets Q
′, the flavor singlet
bifundamental field X and its complex conjugate bifundamental field X̃ . The Nf D6ω-branes
have nonzero v coordinates where v = m(and its mirrors) for equal massive case of quarks
Q, Q̃ while Q′ is massless.
Let us move the NS5−θ-brane to the right all the way past the right NS5ω-brane(and
its mirrors to the left). After this brane motion, one arrives at the Figure 5. Note that
there exists a creation of Nf D4-branes connecting Nf D6ω-branes and NS5ω-brane(and its
mirrors). Recall that the Nf D6ω-branes are not parallel to the NS5−θ-brane in Figure 4(and
its mirrors). The linking number of NS5−θ-brane from Figure 5 is L5 =
− Ñc. On the
other hand, the linking number of NS5−θ-brane from Figure 4 is L5 = −Nf2 +Nc −N
c. From
these, one gets the number of colors in dual magnetic theory
Ñc = Nf +N
c −Nc. (4.2)
Let us draw this magnetic brane configuration in Figure 5 and remember that we put the
coincident Nf D6ω-branes in the nonzero v directions(and its mirrors). The Nf created D4-
branes connecting between D6ω-branes and NS5ω-brane can move freely in the w direction,
as in previous case. Moreover, since N ′c D4-branes are suspending between two unequal
NS5±ω-branes located at different x
6 coordinate, these D4-branes cannot slide along the w
direction, for arbitrary rotation angles. If we are detaching all the branes except NS5ω-brane,
NS5−θ-brane, D6ω-branes, Nf D4-branes and Ñc D4-branes from Figure 5, then this brane
configuration corresponds to N = 1 SQCD with the magnetic gauge group SU(Ñc = Nf−Nc)
with Nf massive flavors with tilted NS5-branes.
The dual magnetic gauge group is given by SU(Ñc) × SO(N ′c) and the matter contents
are given by
Figure 5: The N = 1 supersymmetric magnetic brane configuration of SU(Ñc = Nf +N ′c −
Nc) × SO(N ′c) with Nf chiral multiplets q, Nf chiral multiplets q̃, 2N ′f chiral multiplets Q′,
the flavor singlet bifundamental field Y and its complex conjugate bifundamental field Ỹ as
well as Nf fields F
′, its complex conjugate Nf fields F̃ ′, N
f fields M and the gauge singlet Φ.
There exist Nf flavor D4-branes connecting D6ω-branes and NS5ω-brane(and its mirrors).
• Nf chiral multiplets q are in the fundamental representation under the SU(Ñc), Nf
chiral multiplets q̃ are in the antifundamental representation under the SU(Ñc) and then q
are in the representation (Ñc, 1) while q̃ are in the representation (Ñc, 1) under the gauge
group
• 2N ′f chiral multiplets Q′ are in the fundamental representation under the SO(N ′c) and
then Q′ are in the representation (1,N′
) under the gauge group
• The flavor singlet field Y is in the bifundamental representation (Ñc,N′c) under the gauge
group and its complex conjugate field Ỹ is in the bifundamental representation (Ñc,N
) under
the gauge group
There are (Nf +N
2 gauge singlets in the first dual gauge group factor as follows:
• Nf -fields F ′ are in the fundamental representation under the SO(N ′c), Nf -fields F̃ ′ are
in the fundamental representation under the SO(N ′c) and then F
′ are in the representation
(1,N′
) under the gauge group while F̃ ′ are in the representation (1,N′
) under the gauge
group
These additional 2Nf SO(N
c) vectors are originating from the SU(Nc) chiral mesons X̃Q
and XQ̃ respectively. It is easy to see that from the Figure 5, since the D6−ω-branes are par-
allel to the NS5−ω-brane, the newly created Nf D4-branes can slide along the plane consisting
of D6−ω-branes and NS5−ω-brane arbitrarily(and its mirrors). Then strings connecting the
Nf D6−ω-branes and N
c D4-branes will give rise to these additional 2Nf SO(N
c) vectors.
• N2f -fields M are in the representation (1, 1) under the gauge group
This corresponds to the SU(Nc) chiral meson QQ̃ and the fluctuations of the singlet M
correspond to the motion of Nf flavor D4-branes along (789) directions in Figure 5.
• The N ′2c singlet Φ is in the representation (1, adj)⊕ (1, symm) under the gauge group
This corresponds to the SU(Nc) chiral meson XX̃ and note that both X and X̃ have
representation N′
of SO(N ′c). In general, the fluctuations of the singlet Φ correspond to the
motion of N ′c D4-branes suspended two NS5±ω-branes along the (789) directions in Figure 5.
In the dual theory, since there exist Nf quarks q, Nf quarks q̃, one bifundamental field Y
which will give rise to the contribution of N ′c and its complex conjugate Ỹ which will give rise
to the contribution of N ′c, the coefficient of the beta function of the first gauge group factor
with (4.2) is
SU( eNc)
= 3Ñc −Nf −N ′c = 2Nf + 2N ′c − 3Nc
and since there exist 2N ′f quarks Q
′, one bifundamental field Y which will give rise to the
contribution of Ñc, its complex conjugate Ỹ which will give rise to the contribution of Ñc, Nf
fields F ′, its complex conjugate Nf fields F̃ ′ and the singlet Φ which will give rise to N
c, the
coefficient of the beta function is
SO(N ′c)
= 3(N ′c − 2)− 2N ′f − 2Ñc − 2Nf − 2N ′c = −N ′c + 2Nc − 4Nf − 2N ′f − 6.
Therefore, both SU(Ñc) and SO(N
c) gauge couplings are IR free by requiring the negativeness
of the coefficients of beta function. One can rely on the perturbative calculations at low energy
for this magnetic IR free region b
SU( eNc)
< 0 and b
SO(N ′c)
< 0. Note that the SO(N ′c) fields in
the magnetic theory are different from those of the electric theory. Since bSO(N ′c)−b
SO(N ′c)
SO(N ′c) is more asymptotically free than SO(N
mag. Neglecting the SO(N ′c) dynamics, the
magnetic SU(Ñc) is IR free when Nf +N
Nc, as in previous case.
The dual magnetic superpotential, by adding the mass term for Q and Q̃ in the electric
theory which is equal to put a linear term in M in the dual magnetic theory, is given by 7
Wdual =
(Φ2 + · · · ) +Q′ΦQ′ +Mqq̃ + Y F̃ ′q̃ + Ỹ qF ′ + ΦY Ỹ
+mM (4.3)
where the mesons in terms of the fields defined in the electric theory are
M ≡ QQ̃, Φ ≡ XX̃, F ′ ≡ X̃Q, F̃ ′ ≡ XQ̃.
7There appears a mismatch between the number of colors from field theory analysis and those from brane
motion when we take the full dual process on the two gauge group factors simultaneously [15]. By adding
D4-branes to the dual brane configuration without affecting the linking number counting, this mismatch
can be removed. Similar phenomena occurred in [5, 26]. Then this turned out that there exists a deformation
∆W generated by the meson Q′XX̃Q′. This is exactly the second term, Q′ΦQ′, in (4.3). In previous example,
there is no such deformation term in (2.3).
We abbreviated all the relevant terms and coefficients appearing in the quartic superpotential
for the bifundamentals in electric theory (4.1) and denote them here by Φ2 + · · · . Here
q and q̃ are fundamental and antifundamental for the gauge group index respectively and
antifundamentals for the flavor index. Then, qq̃ has rank Ñc and m has a rank Nf . Therefore,
the F-term condition, the derivative the superpotential Wdual with respect to M , cannot be
satisfied if the rank Nf exceeds Ñc and the supersymmetry is broken. Other F-term equations
are satisfied by taking the vacuum expectation values of Y, Ỹ , F ′, F̃ ′ and Q′ to vanish.
The classical moduli space of vacua can be obtained from F-term equations and one gets
qq̃ +m = 0, q̃M + F ′Ỹ = 0,
Mq + Y F̃ ′ = 0, F̃ ′q̃ + Ỹ Φ = 0,
q̃Y = 0, qF ′ + ΦY = 0,
Ỹ q = 0, Q′Q′ + Y Ỹ = 0,
ΦQ′ = 0.
Then, it is easy to see that there exists a solution
q̃M = 0 = Mq, qq̃ +m = 0.
Other F-term equations are satisfied if one takes the zero vacuum expectation values for the
fields Y, Ỹ , F ′, Q′ and F̃ ′. Then the solutions can be written as
< q > =
meφ1 eNc
, < q̃ >=
me−φ1 eNc 0
, < M >=
0 Φ01Nf− eNc
< Y > = < Ỹ >=< F ′ >=< F̃ ′ >=< Q′ >= 0. (4.4)
Let us expand around a point on (4.4), as done in [1]. Then the remaining relevant terms of
superpotential are given by
W reldual = Φ0 (δϕ δϕ̃+m) + δZ δϕ q̃0 + δZ̃ q0δϕ̃
by following the similar fluctuations for the various fields as in [9]. Note that there exist also
four kinds of terms, the vacuum < q > multiplied by δỸ δF ′, the vacuum < q̃ > multiplied
by δF̃ ′δY , the vacuum < Φ > multiplied by δY δỸ , and the vacuum < Φ > multiplied by
δQ′δQ′. However, by redefining these, they do not enter the contributions for the one loop
result, up to quadratic order. As done in [17], one gets that m2Φ0 will contain (log 4 − 1) > 0
implying that these are stable.
5 Nonsupersymmetric meta-stable brane configuration
of SU(Nc)× SO(N ′c) gauge theory
Since the electric superpotential (4.1) vanishes for θ = 0 and ω = π
, the corresponding
magnetic superpotential in (4.3) does not contain the terms Φ2 + · · · and it becomes
Wdual =
Q′ΦQ′ +Mqq̃ + Y F̃ ′q̃ + Ỹ qF ′ + ΦY Ỹ
Now we recombine Ñc D4-branes among Nf flavor D4-branes connecting between D6ω=π
D6-branes and NS5ω=π
= NS5′R-brane with those connecting between NS5
R-brane and
NS5−θ=0 = NS5R-brane(and its mirrors) and push them in +v direction from Figure 5. Of
course their mirrors will move to−v direction in a Z2 symmetric manner due to the O6+-plane.
After this procedure, there are no color D4-branes between NS5′R-brane and NS5R-brane.
For the flavor D4-branes, we are left with only (Nf − Ñc) D4-branes(and its mirrors).
Then the minimal energy supersymmetry breaking brane configuration is shown in Figure
6. If we ignore all the branes except NS5′R-brane, NS5R-brane, D6-branes, (Nf − Ñc) D4-
branes and Ñc D4-branes, as observed already, then this brane configuration corresponds
to the minimal energy supersymmetry breaking brane configuration for the N = 1 SQCD
with the magnetic gauge group SU(Ñc) with Nf massive flavors [12, 13, 14]. Note that N
D4-branes can slide w direction for this brane configuration.
The type IIA/M-theory brane construction for the N = 2 gauge theory was described by
[19] and after lifting the type IIA description we explained so far to M-theory, the correspond-
ing magnetic M5-brane configuration with equal mass for the quarks where the gauge group
is given by SU(Ñc)×SO(N ′c), in a background space of xt = (−1)Nf+N
k=1(v
2− e2k)
where this four dimensional space replaces (45610) directions, is characterized by
t4 + (v
eNc + · · · )t3 + (vN ′c + · · · )t2 + (v eNc + · · · )t+ v2N ′f+4
(v2 − e2k) = 0.
From this curve of quartic equation for t above, the asymptotic regions can be classified
by looking at the first two terms providing NS5R-brane asymptotic region, next two terms
providing NS5′R-brane asymptotic region, next two terms providing NS5
L-brane asymptotic
region, and the final two terms giving NS5L-brane asymptotic region as follows:
1. v → ∞ limit implies
w → 0, y ∼ v eNc + · · · NS5R asymptotic region,
w → 0, y ∼ v2Nf+2N ′f− eNc+4 + · · · NS5L asymptotic region.
Figure 6: The nonsupersymmetric minimal energy brane configuration of SU(Ñc = Nf +
N ′c −Nc)× SO(N ′c) with Nf chiral multiplets q, Nf chiral multiplets q̃, 2N ′f chiral multiplets
Q′, the flavor singlet bifundamental field Y and its complex conjugate bifundamental field Ỹ
and gauge singlets. The N ′c D4-branes and 2(Nf − Ñc) D4-branes can slide w direction freely
in a Z2 symmetric way.
2. w → ∞ limit implies
v → −m, y ∼ w eNc−N ′c + · · · NS5′L asymptotic region,
v → +m, y ∼ wN ′c− eNc + · · · NS5′R asymptotic region.
Now the two NS5′L,R-branes are moving in the ±v direction holding everything else fixed
instead of moving D6-branes in the ±v direction. Then the mirrors of D4-branes are moved
appropriately. The harmonic function sourced by the D6-branes can be written explicitly by
summing of three contributions from the Nf and N
f D6-branes(and its mirrors) plus an O6-
plane, and similar analysis to solve the differential equation and find out the nonholomorphic
curve can be done [14, 10, 9, 8, 7]. In this case also, we expect an instability from a new
M5-brane mode.
6 Discussions
So far, we have dualized only the first gauge group factor in the gauge group SU(Nc)×SO(N ′c).
What happens if we dualize the second gauge group factor SO(N ′c)?(For the case SU(Nc)×
SU(N ′c), the behavior of dual for the second gauge group will be the same as when we take
the dual for the first gauge group factor.) This can be done by moving the NS5θ-brane and
N ′f D6θ-branes that can be located at the nonzero v coordinate for massive quarks Q
′, to
the right passing through O6-plane(and their mirrors to the left). According to the linking
number counting, one obtains the dual gauge group SU(Nc)×SO(Ñ ′c = 2Nc+2N ′f −N ′c+4).
One can easily see that there is a creation of N ′f D4-branes connecting NS5θ-brane and
D6θ-branes(and its mirrors). Then from the brane configuration, there exist the additional
2N ′f SU(Nc) quarks originating from the SO(N
c) chiral mesons Q
′X ≡ F̃ ′ and Q′X̃ ≡ F ′.
The deformed superpotential ∆W = Q′XX̃Q′ can be interpreted as the mass term of F ′F̃ ′.
Then one can write dual magnetic superpotential in this case. However, it is not clear how
the recombination of color and flavor D4-branes and splitting procedure between them in
the construction of meta-stable vacua arises since there is no extra NS5-brane between two
NS5±θ-branes. If there exists an extra NS5-brane at the origin of our brane configuration(then
the gauge group and matter contents will change), it would be possible to construct the
corresponding meta-stable brane configuration. It would be interesting to study these more
in the future.
As already mentioned in [8] and section 4, the matter contents in [4] are different from the
ones in section 4 with the same gauge group. In other words, the theory of SU(Nc)×SO(N ′c)
with X , which transform as fundamental in SU(Nc) and vector in SO(N
c), a antisymmetric
tensor A in SU(Nc), as well as fundamentals for SU(Nc) and vectors for SO(N
c) can confine
either SU(Nc) factor or SO(N
c) factor. This theory can be described by the web of branes in
the presence of O4−-plane and orbifold fixed points. With two NS5-branes and O4−-plane, by
modding out Z3 symmetry acting on (v, w) as (v, w) → (v exp(2πi3 ), w exp(
)), the resulting
gauge group will be SU(Nc)×SO(Nc+4) with above matter contents [27]. Similar analysis for
SU(Nc)×Sp(Nc2 −2) gauge group with opposite O4
+-plane can be done. Then in this case, the
matter in SU(Nc) will be a symmetric tensor S and other matter contents are present also. It
would be interesting to see whether this gauge theory and corresponding brane configuration
will provide a meta-stable vacuum.
Let us comment on other possibility where the gauge group is given by SU(Nc)× Sp(N ′c)
and the matter contents are given by
• Nf chiral multiplets Q are in the fundamental representation under the SU(Nc), Nf
chiral multiplets Q̃ are in the antifundamental representation under the SU(Nc) and then Q
are in the representation (Nc, 1) while Q̃ are in the representation (Nc, 1) under the gauge
group
• 2N ′f chiral multiplets Q′ are in the fundamental representation under the Sp(N ′c) and
then Q′ are in the representation (1, 2N′
) under the gauge group
• The flavor singlet field X is in the bifundamental representation (Nc, 2N′c) under the
gauge group and the flavor singlet X̃ is in the bifundamental representation (Nc, 2N
) under
the gauge group
One can compute the coefficients of beta functions of the each gauge group factor, as we
did for previous examples.
The type IIA brane configuration of an electric theory is exactly the same as the Figure
4 except the RR charge O6-plane with negative sign. The classical superpotential 8 is given
by [15]
W = −1
4 tan(ω − θ) +
tan 2θ
tr(XX̃)2 − trXX̃X̃X
4 sin 2θ
(trXX̃)2
4Nc tan(ω − θ)
. (6.1)
In this case, when θ approaches π
and ω approaches 0, then this superpotential vanishes.
The dual magnetic gauge group is given by SU(Ñc = Nf + 2N
c −Nc)× Sp(N ′c) with the
same number of colors of dual theory as those in previous cases and the matter contents are
given by
• Nf chiral multiplets q are in the fundamental representation under the SU(Ñc), Nf
chiral multiplets q̃ are in the antifundamental representation under the SU(Ñc) and then q
are in the representation (Ñc, 1) while q̃ are in the representation (Ñc, 1) under the gauge
group
• 2N ′f chiral multiplets Q′ are in the fundamental representation under the Sp(N ′c) and
then Q′ are in the representation (1, 2N′
) under the gauge group
• The flavor singlet field Y is in the bifundamental representation (Ñc, 2N′c) under the
gauge group and its complex conjugate field Ỹ is in the bifundamental representation (Ñc, 2N
under the gauge group
There are (Nf + 2N
2 gauge singlets in the first dual gauge group factor
• Nf -fields F ′ are in the fundamental representation under the Sp(N ′c), Nf -fields F̃ ′ are
in the fundamental representation under the Sp(N ′c) and then F
′ are in the representation
(1, 2N′
) under the gauge group while F̃ ′ are in the representation (1, 2N′
) under the gauge
group
• N2f -fields M are in the representation (1, 1) under the gauge group
• The 4N ′2c singlet Φ is in the representation (1, adj)⊕ (1, antisymm) under the gauge
group
The dual magnetic superpotential for arbitrary angles is given by (4.3) with appropriate
Sp(N ′c) invariant metric J . The stability analysis can be done similarly.
8The superpotential for the Sp(N ′c) sector is given by W = XφAX̃+XφSX̃+tan θ trφ
S− 1tan θ trφ
A where
φS and φA are an adjoint field(symmetric tensor) and an antisymmetric tensor for Sp(N
c) [25]. Note that
there is a sign change in the second trace term of the superpotential in (6.1), compared to (4.1).
After following the procedure from Figure 4 to Figure 5 with opposite RR charge for O6-
plane and by taking the limit where θ → π
and ω → 0, the minimal energy supersymmetry
breaking brane configuration is shown in Figure 7.
Figure 7: The nonsupersymmetric minimal energy brane configuration of SU(Ñc = Nf +
2N ′c −Nc)×Sp(N ′c) with Nf chiral multiplets q, Nf chiral multiplets q̃, 2N ′f chiral multiplets
Q′, the flavor singlet bifundamental field Y and its complex conjugate bifundamental field Ỹ
and gauge singlets. Note the RR charge of O6-plane is negative and its charge is equivalent
to −4 D6-branes. The 2N ′c D4-branes and 2(Nf − Ñc) D4-branes can slide w direction freely
in a Z2 symmetric way.
Compared to the previous nonsupersymmetric brane configuration in Figure 6, the role
of NS5-brane and NS5’-brane is interchanged to each other: undoing the Seiberg dual in the
context of [13]. This kind of feature of recombination and splitting between color D4-branes
and flavor D4-branes occurs in [8]. At the electric brane configuration, Nf D6-branes are
perpendicular to NS5-brane and this leads to the coupling between the quarks and adjoint
in the superpotential. However, the overall coefficient function including this extra terms
vanishes and eventually the whole electric superpotential will vanish according to the above
limit we take.
From the quartic equation with the presence of opposite RR charge for O6-plane, in a
background space of xt = (−1)Nf+N ′fv2N ′f−4
k=1(v
2 − e2k),
t4 + (v
eNc + · · · )t3 + (vN ′c + · · · )t2 + (v eNc + · · · )t+ v2N ′f−4
(v2 − e2k) = 0,
the asymptotic regions can be classified as follows:
1. v → ∞ limit implies
w → 0, y ∼ vN ′c− eNc · · · NS5R asymptotic region,
w → 0, y ∼ v eNc−N ′c + · · · NS5L asymptotic region.
2. w → ∞ limit implies
v → −m, y ∼ w2Nf+2N ′f− eNc−4 + · · · NS5′L asymptotic region,
v → +m, y ∼ w eNc + · · · NS5′R asymptotic region.
In [28], the SU(7)×S̃p(1) model and SU(9)×S̃p(2) model can be obtained by dualizing the
SU(7)× SU(2) model with a bifundamental and two antifundamentals for SU(7) and a fun-
damental for SU(2) and the SU(9)×SU(2) with a bifundamental and two antifundamentals
for SU(9) and a fundamental for Sp(1) respectively(Note that Sp(1) ∼ SU(2)). The matter
contents in an electric theory are different from those in previous paragraph. The matter
contents in the magnetic description are given by an antisymmetric tensor and a fundamen-
tal in the first gauge group as well as a bifundamental, a fundamental in the second gauge
group and two antifundamentals in the first gauge group. There exists a nonzero dual mag-
netic superpotential. Also the dual description the SU(7)× S̃p(1) model and SU(9)× S̃p(2)
model can be constructed from the antisymmetric models of Affleck-Dine-Seiberg by gauging
a maximal flavor symmetry and adding the extra matter to cancel all anomalies and extra
flavor.
On the other hand, the models SU(2Nc + 1)× SU(2) have its brane box model descrip-
tion in [29] where the above examples correspond to Nc = 3 and Nc = 4 respectively. In
particular, the case where Nc = 1(the gauge group is SU(3) × SU(2), i.e., (3, 2) model [30])
was described by brane box model with superpotential or without superpotential. Then it
would be interesting to obtain the Seiberg dual for these models using brane box model
and look for the possibility of having meta-stable vacua for these models. Moreover, this
gauge theory was generalized to SU(2Nc + 1) × Sp(N ′c) model with a bifundamental and
2N ′c antifundamentals for SU(2Nc + 1) and a fundamental for Sp(N
c) and its dual descrip-
tion SU(2Nc + 1)× Sp(Ñ ′c = Nc −N ′c − 1) with a bifundamental and 2N ′c antifundamentals
for SU(2Nc + 1) and a fundamental for Sp(N
c) as well as two gauge singlets [28]. For the
particular range of Nc, the dual theory is IR free, not asymptotically free.
According to [31], SU(2Nc) with antisymmetric tensor and antifundamentals can be de-
scribed by two gauge groups Sp(2Nc−4)×SU(2Nc) with bifundamental and antifundamentals
for SU(2Nc). Some of the brane realization with zero superpotential was given in the brane
box model in [29]. Similarly from the result of [32] by following the method of [31], the dual
description for SU(2Nc +1) with antisymmetric tensor and fundamentals can be represented
by two gauge group factors. This dual theory breaks the supersymmetry at the tree level.
Similar discussions are present in [33]. Then it would be interesting to construct the corre-
sponding Seigerg dual and see how the electric theory and its magnetic theory can be mapped
into each other in the brane box model.
Ther are also different directions concerning on the meta-stable vacua in different contexts
and some of the relevant works are present in [34]-[43] where some of them use anti D-branes
and some of them describe the type IIB theory and it would be interesting to find out how
similarities if any appear and what are the differences in what sense between the present work
and those works.
Acknowledgments
I would like to thank A. Hanany and K. Landsteiner for discussions. This work was
supported by grant No. R01-2006-000-10965-0 from the Basic Research Program of the Korea
Science & Engineering Foundation.
References
[1] K. Intriligator, N. Seiberg and D. Shih, “Dynamical SUSY breaking in meta-stable
vacua,” JHEP 0604, 021 (2006) [arXiv:hep-th/0602239].
[2] K. Intriligator and N. Seiberg, “Lectures on supersymmetry breaking,”
[arXiv:hep-ph/0702069].
[3] A. Giveon and D. Kutasov, “Brane dynamics and gauge theory,” Rev. Mod. Phys. 71,
983 (1999) [arXiv:hep-th/9802067].
[4] K. A. Intriligator, R. G. Leigh and M. J. Strassler, “New examples of duality in chi-
ral and nonchiral supersymmetric gauge theories,” Nucl. Phys. B 456, 567 (1995)
[arXiv:hep-th/9506148].
[5] J. H. Brodie and A. Hanany, “Type IIA superstrings, chiral symmetry, and N = 1 4D
gauge theory dualities,” Nucl. Phys. B 506, 157 (1997) [arXiv:hep-th/9704043].
[6] E. Barnes, K. Intriligator, B. Wecht and J. Wright, “N = 1 RG flows, product groups,
and a-maximization,” Nucl. Phys. B 716, 33 (2005) [arXiv:hep-th/0502049].
http://arxiv.org/abs/hep-th/0602239
http://arxiv.org/abs/hep-ph/0702069
http://arxiv.org/abs/hep-th/9802067
http://arxiv.org/abs/hep-th/9506148
http://arxiv.org/abs/hep-th/9704043
http://arxiv.org/abs/hep-th/0502049
[7] C. Ahn, “Meta-Stable Brane Configuration and Gauged Flavor Symmetry,”
[arXiv:hep-th/0703015].
[8] C. Ahn, “More on meta-stable brane configuration,” [arXiv:hep-th/0702038].
[9] C. Ahn, “Meta-stable brane configuration with orientifold 6 plane,”
[arXiv:hep-th/0701145], to appear in JHEP.
[10] C. Ahn, “M-theory lift of meta-stable brane configuration in symplectic and orthogonal
gauge groups,” Phys. Lett. B 647, 493 (2007) [arXiv:hep-th/0610025].
[11] C. Ahn, “Brane configurations for nonsupersymmetric meta-stable vacua in SQCD with
adjoint matter,” Class. Quant. Grav. 24, 1359 (2007) [arXiv:hep-th/0608160].
[12] H. Ooguri and Y. Ookouchi, “Meta-stable supersymmetry breaking vacua on intersecting
branes,” Phys. Lett. B 641, 323 (2006) [arXiv:hep-th/0607183].
[13] S. Franco, I. Garcia-Etxebarria and A. M. Uranga, “Non-supersymmetric meta-stable
vacua from brane configurations,” JHEP 0701, 085 (2007) [arXiv:hep-th/0607218].
[14] I. Bena, E. Gorbatov, S. Hellerman, N. Seiberg and D. Shih, “A note on (meta)stable
brane configurations in MQCD,” JHEP 0611, 088 (2006) [arXiv:hep-th/0608157].
[15] E. Lopez and B. Ormsby, “Duality for SU x SO and SU x Sp via branes,” JHEP 9811,
020 (1998) [arXiv:hep-th/9808125].
[16] A. Hanany and E. Witten, “Type IIB superstrings, BPS monopoles, and three-
dimensional gauge dynamics,” Nucl. Phys. B 492, 152 (1997) [arXiv:hep-th/9611230].
[17] D. Shih, “Spontaneous R-Symmetry Breaking in O’Raifeartaigh Models,”
[arXiv:hep-th/0703196].
[18] E. Witten, “Solutions of four-dimensional field theories via M-theory,” Nucl. Phys. B
500, 3 (1997) [arXiv:hep-th/9703166].
[19] K. Landsteiner and E. Lopez, “New curves from branes,” Nucl. Phys. B 516, 273 (1998)
[arXiv:hep-th/9708118].
[20] K. Landsteiner, E. Lopez and D. A. Lowe, “Supersymmetric gauge theories from branes
and orientifold six-planes,” JHEP 9807, 011 (1998) [arXiv:hep-th/9805158].
http://arxiv.org/abs/hep-th/0703015
http://arxiv.org/abs/hep-th/0702038
http://arxiv.org/abs/hep-th/0701145
http://arxiv.org/abs/hep-th/0610025
http://arxiv.org/abs/hep-th/0608160
http://arxiv.org/abs/hep-th/0607183
http://arxiv.org/abs/hep-th/0607218
http://arxiv.org/abs/hep-th/0608157
http://arxiv.org/abs/hep-th/9808125
http://arxiv.org/abs/hep-th/9611230
http://arxiv.org/abs/hep-th/0703196
http://arxiv.org/abs/hep-th/9703166
http://arxiv.org/abs/hep-th/9708118
http://arxiv.org/abs/hep-th/9805158
[21] C. Ahn, K. Oh and R. Tatar, “Comments on SO/Sp gauge theories from brane configu-
rations with an O(6) plane,” Phys. Rev. D 59, 046001 (1999) [arXiv:hep-th/9803197].
[22] K. Landsteiner, E. Lopez and D. A. Lowe, “Duality of chiral N = 1 supersymmetric
gauge theories via branes,” JHEP 9802, 007 (1998) [arXiv:hep-th/9801002].
[23] I. Brunner, A. Hanany, A. Karch and D. Lust, “Brane dynamics and chiral non-chiral
transitions,” Nucl. Phys. B 528, 197 (1998) [arXiv:hep-th/9801017].
[24] S. Elitzur, A. Giveon, D. Kutasov and D. Tsabar, “Branes, orientifolds and chiral gauge
theories,” Nucl. Phys. B 524, 251 (1998) [arXiv:hep-th/9801020].
[25] C. Csaki, M. Schmaltz, W. Skiba and J. Terning, “Gauge theories with tensors from
branes and orientifolds,” Phys. Rev. D 57, 7546 (1998) [arXiv:hep-th/9801207].
[26] C. Ahn, K. Oh and R. Tatar, “Branes, geometry and N = 1 duality with product gauge
groups of SO and Sp,” J. Geom. Phys. 31, 301 (1999) [arXiv:hep-th/9707027].
[27] J. D. Lykken, E. Poppitz and S. P. Trivedi, “M(ore) on chiral gauge theories from D-
branes,” Nucl. Phys. B 520, 51 (1998) [arXiv:hep-th/9712193].
[28] K. A. Intriligator and S. D. Thomas, “Dual descriptions of supersymmetry breaking,”
[arXiv:hep-th/9608046].
[29] A. Hanany and A. Zaffaroni, “On the realization of chiral four-dimensional gauge theories
using branes,” JHEP 9805, 001 (1998) [arXiv:hep-th/9801134].
[30] I. Affleck, M. Dine and N. Seiberg, “Dynamical Supersymmetry Breaking In Four-
Dimensions And Its Phenomenological Implications,” Nucl. Phys. B 256, 557 (1985).
[31] M. Berkooz, “The Dual of supersymmetric SU(2k) with an antisymmetric tensor and
composite dualities,” Nucl. Phys. B 452, 513 (1995) [arXiv:hep-th/9505067].
[32] P. Pouliot, “Duality in SUSY SU(N) with an Antisymmetric Tensor,” Phys. Lett. B
367, 151 (1996) [arXiv:hep-th/9510148].
[33] P. Pouliot and M. J. Strassler, “Duality and Dynamical Supersymmetry Breaking in
Spin(10) with a Spinor,” Phys. Lett. B 375, 175 (1996) [arXiv:hep-th/9602031].
[34] S. Murthy, “On supersymmetry breaking in string theory from gauge theory in a throat,”
[arXiv:hep-th/0703237].
http://arxiv.org/abs/hep-th/9803197
http://arxiv.org/abs/hep-th/9801002
http://arxiv.org/abs/hep-th/9801017
http://arxiv.org/abs/hep-th/9801020
http://arxiv.org/abs/hep-th/9801207
http://arxiv.org/abs/hep-th/9707027
http://arxiv.org/abs/hep-th/9712193
http://arxiv.org/abs/hep-th/9608046
http://arxiv.org/abs/hep-th/9801134
http://arxiv.org/abs/hep-th/9505067
http://arxiv.org/abs/hep-th/9510148
http://arxiv.org/abs/hep-th/9602031
http://arxiv.org/abs/hep-th/0703237
[35] R. Argurio, M. Bertolini, S. Franco and S. Kachru, “Metastable vacua and D-branes at
the conifold,” [arXiv:hep-th/0703236].
[36] A. Giveon and D. Kutasov, “Gauge symmetry and supersymmetry breaking from inter-
secting branes,” [arXiv:hep-th/0703135].
[37] Y. E. Antebi and T. Volansky, “Dynamical supersymmetry breaking from simple quiv-
ers,” [arXiv:hep-th/0703112].
[38] M. Wijnholt, “Geometry of particle physics,” [arXiv:hep-th/0703047].
[39] J. J. Heckman, J. Seo and C. Vafa, “Phase structure of a brane/anti-brane system at
large N,” [arXiv:hep-th/0702077].
[40] R. Tatar and B. Wetenhall, “Metastable vacua, geometrical engineering and MQCD
transitions,” JHEP 0702, 020 (2007) [arXiv:hep-th/0611303].
[41] H. Verlinde, “On metastable branes and a new type of magnetic monopole,”
[arXiv:hep-th/0611069].
[42] M. Aganagic, C. Beem, J. Seo and C. Vafa, “Geometrically induced metastability and
holography,” [arXiv:hep-th/0610249].
[43] R. Argurio, M. Bertolini, S. Franco and S. Kachru, “Gauge / gravity dual-
ity and meta-stable dynamical supersymmetry breaking,” JHEP 0701, 083 (2007)
[arXiv:hep-th/0610212].
http://arxiv.org/abs/hep-th/0703236
http://arxiv.org/abs/hep-th/0703135
http://arxiv.org/abs/hep-th/0703112
http://arxiv.org/abs/hep-th/0703047
http://arxiv.org/abs/hep-th/0702077
http://arxiv.org/abs/hep-th/0611303
http://arxiv.org/abs/hep-th/0611069
http://arxiv.org/abs/hep-th/0610249
http://arxiv.org/abs/hep-th/0610212
	Introduction
	The N=1 supersymmetric brane configuration of SU(Nc) SU(Nc') gauge theory
	Electric theory with SU(Nc) SU(Nc') gauge group
	Magnetic theory with SU(N"0365Nc) SU(Nc') gauge group
	Nonsupersymmetric meta-stable brane configuration of SU(Nc) SU(Nc') gauge theory 
	The N=1 supersymmetric brane configuration of SU(Nc) SO(Nc') gauge theory 
	Electric theory with SU(Nc) SO(Nc') gauge group 
	Magnetic theory with SU(N"0365Nc) SO(Nc') gauge group
	Nonsupersymmetric meta-stable brane configuration of SU(Nc) SO(Nc') gauge theory 
	Discussions
ABSTRACT
  Starting from the N=1 SU(N_c) x SU(N_c') gauge theory with fundamental and
bifundamental flavors, we apply the Seiberg dual to the first gauge group and
obtain the N=1 dual gauge theory with dual matters including the gauge
singlets. By analyzing the F-term equations of the superpotential, we describe
the intersecting type IIA brane configuration for the meta-stable
nonsupersymmetric vacua of this gauge theory. By introducing an orientifold
6-plane, we generalize to the case for N=1 SU(N_c) x SO(N_c') gauge theory with
fundamental and bifundamental flavors. Finally, the N=1 SU(N_c) x Sp(N_c')
gauge theory with matters is also described very briefly.

<|endoftext|><|startoftext|>
Spinor dipolar Bose-Einstein condensates; Classical spin approach
M. Takahashi1, Sankalpa Ghosh1,2, T. Mizushima1, K. Machida1
Department of Physics, Okayama University, Okayama 700-8530, Japan and
Department of Physics, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India
(Dated: October 26, 2018)
Magnetic dipole-dipole interaction dominated Bose-Einstein condensates are discussed under spin-
ful situations. We treat the spin degrees of freedom as a classical spin vector, approaching from
large spin limit to obtain an effective minimal Hamiltonian; a version extended from a non-linear
sigma model. By solving the Gross-Pitaevskii equation we find several novel spin textures where
the mass density and spin density are strongly coupled, depending upon trap geometries due to the
long-range and anisotropic natures of the dipole-dipole interaction.
PACS numbers: 03.75.Mn, 03.75.Hh, 67.57.Fg
Bose-Einstein condensates (BEC) with internal de-
grees of freedom, the so-called spinor BEC have attract
much attention experimentally and theoretically in re-
cent years [1]. Spinor BEC opens up a new paradigm
where the order parameter of condensates is described
by a multi-component vector [2, 3]. This can be possi-
ble by optically trapping cold atoms where all hyperfine
states are liberated, while magnetic trapping freezes its
freedom. So far 23Na (the hyperfine state F = 1), and
87Rb (F = 2) are extensively investigated.
Griesmaier et al. [4] have recently succeeded in achiev-
ing BEC of 52Cr atom gases whose magnetic moment per
atom is 3 µB (Bohr magneton). There has been already
emerging [5] several novel aspects associated with larger
magnetic moment in 52Cr atom even in this magnetic
trapping, where all spin moments are polarized along an
external magnetic field. Namely the magnetic dipole-
dipole (d-d) interaction, which is proportional to F 2 is
expected to play an important role in a larger spin atom.
It is natural to expect realization of BEC with still
larger spin atomic species under the spinful situations
by optical trapping or control the d-d interaction via the
Feshbach resonance relative to other interaction channels.
There has already been existing a large amount of theo-
retical studies for dipolar BEC [6]. Most of them treat
the polarized case where the dipolar moments are aligned
along an external field. The intrinsic anisotropic or ten-
sorial nature of the d-d interaction relative to the polar-
ization axis manifests itself in various properties. The
head-to-tail moment arrangement due to the d-d interac-
tion is susceptible to a shape instability by concentrating
atoms in the central region. We have seen already that
tensorial and long-ranged d-d interaction is responsible
for this kind of shape dependent phenomenon where the
mass density is constrained by the polarization axis.
In contrast the theoretical studies of the spinor dipolar
BEC are scarce, and just started with several impressive
works [7, 8, 9, 10]. They consider either the F = 1 spinor
BEC by taking into account the d-d interaction or F = 3
for 52Cr atom gases in a realistic situation. Here one must
handle a 7-component spinor with 5 different interaction
channels g0, g2, g4, g6, and gd. The parameter space to
hunt is large and difficult enough to find a stable con-
figuration. The situation becomes further hard towards
a larger F where the d-d interaction is more important
and eventually dominant one among various channels.
Here we investigate generic properties of the spinor
dipolar BEC under an optical trapping where the d-d
interaction dominates other interactions except for the
s-wave repulsive channel. A proposed model Hamilto-
nian is intended to capture essential properties of the
spinor BEC system. We note that this long-ranged and
anisotropic d-d interaction has fascinated researches for
a long time, for example, Luttinger and Tisza in their
seminal paper [11] theoretically discussed the stable spin
configurations of a spin model on a lattice where classi-
cal spins with a fixed magnitude are free to rotate on a
lattice. The present paper is designed to generalize this
lattice spin model to a dipolar BEC system. Here we are
interested in the interplay between the spin degrees of
freedom and the mass density through the d-d interac-
tion.
We approach this problem from atomic species with
large magnetic moment. This spinor dipole BEC with the
hyper fine state F (Fz = −F,−F+1, · · · , F ) is character-
ized by 2F+1 components Ψα(r). In general the number
of the interaction channels are F + 1. For example, the
F = 1 spinor BEC [2, 3] is characterized by the scattering
lengths a0 and a2, leading to the spin independent repul-
sive interaction g0 = 4πh̄
2(a0 + 2a0)/3m and the spin
dependent exchange interaction g2 = 4πh̄
2(a2 − a0)/3m.
Since a0 and a2 are comparable, g2 is actually small;
|g2|/g0 ∼ 1/10 for 23Na [12, 13] and ∼ 1/35 for 87Rb
[14, 15]. This tendency that, except for the dominant re-
pulsive part g0, other spin-dependent channels are nearly
cancelled is likely to be correct for other F ’s [16].
We can take a view in this paper that instead of work-
ing with ~Ψ(r) full quantum mechanical 2F + 1 compo-
nents (ΨF ,ΨF−1, · · · ,Ψ−F ) with the interaction param-
eters g0, g2, g4, ..., and g2F , the order parameter can be
simplified to ~Ψ(ri) = ψ(ri)~S(ri) where ~S(ri) is a clas-
sical vector with |~S(ri)|2 = 1. Namely we can treat it
http://arxiv.org/abs/0704.0122v2
as the classical spin vector whose magnitude |ψ(ri)|2 is
proportional to the local condensate density. In other
words, we focus on long-wavelength and low energy tex-
tured solutions of a dipolar system which will manifest
the interplay between the mass and spin density degrees
of freedom.
We start with the following minimal model Hamilto-
d3ri~Ψ
†(ri)H0(ri)~Ψ(ri)
d3rid
rjVdd(ri, rj)|ψ(ri)|2|ψ(rj)|2,(1)
H0 = −
∇2i + Vtrap(ri)− µ+
|~Ψ(ri)|2, (2)
Vdd(ri, rj) =
~Si · ~Sj − 3(~Si · ~eij)(~Sj · ~eij)
, (3)
where ~eij ≡ (ri − rj)/rij with rij = |ri − rj |. The uni-
axially symmetric trap potential is given by Vtrap(r) =
mω2{γ(x2 + y2) + z2} with γ being the anisotropy pa-
rameter. µ is the chemical potential. The repulsive
(g > 0) and the dipole-dipole (gd) interaction are intro-
duced. The classical spin vector ~Si ≡ ~S(ri) characterizes
the internal degrees of freedom of the system at the site
i and is denoted by spherical coordinates (ϕ(ri), θ(ri))
with |~Si|2 = 1. A dimensionless form of this Hamiltonian
may be written as
|∇ψ(ri)|2 + ni
∇θ(ri)
+ sin2 θ(ri)
∇ϕ(ri)
+ γ2(x2 + y2) + z2
− 2µni + gn2i
d3rid
~Si · ~Sj − 3(~Si · ~eij)(~Sj · ~eij)
ninj , (4)
with |ψ(ri)|2 = ni. We note that the spin gradient term
in the first line is a Non-linear sigma model[17]. Here it
is extended to include the dipole-dipole interaction be-
tween the different parts of the spin density. The energy
(length) is measured by the harmonic frequency ω (har-
monic length d ≡ 1/
mω) with h̄ = 1 The functional
derivatives with respect to ψ∗(ri), ϕ(ri) and θ(ri) lead
to the corresponding Gross-Pitaevskii equations.
In this paper under a fixed repulsive interaction
(g/ωd3 = 0.01) we vary the d-d interaction gd in a range
of 0 ≤ gd ≤ 0.4g, beyond which the system is unsta-
ble. We consider two types of the confinement: A pan-
cake (γ = 0.2) and a cigar (γ = 5.0) to see the shape
dependence of the d-d interaction, which is long-ranged
and anisotropic. The total particle number ∼ 104. The
three dimensional space is discretized into the lattice sites
∼ 2.5 × 104. Using the imaginary time (τ) evolution of
Gross-Pitaevskii equations e.g. ∂ψi/∂τ = −δH/δψ∗i , we
obtain stable configurations for spin and particle densi-
ties by starting with various initial patterns.
We start with the pancake shape (γ = 0.2). Figure 1
shows a stereographic image of the particle density and
spin distributions. We call it spin current texture, where
the spin direction circulates around the origin and is con-
fined into the x-y plane without the third component,
that is, a coplanar texture. It is seen that the particle
density distribution is strongly coupled to the spin one;
FIG. 1: Stereographic view of the spin current texture, dis-
playing simultaneously the number and spin densities. The
pancake (γ = 0.2) is distorted and at the center the number
density is depleted to give a doughnut like shape. gd = 0.2g.
All spins lie in the x-y plane, i.e. a coplanar spin structure,
circulating around the origin O. The length of the arrow is
proportional to its number density. Inset shows the schematic
spin configuration on z = 0 plane.
In the central region the particles are depleted over the
coherent length ξd of the d-d interaction. In the present
case ξd ∼ 2.0ξc (ξc is the ordinary coherent length of the
FIG. 2: The r-flare texture. Left (right) column shows the
cross-sectional density plots of the particle number (the corre-
sponding spin structure). The circular profile in the x-y plane
is spontaneously broken. gd = 0.2g, γ = 0.2.
repulsive interaction).
This spin current texture can be readily explained in
the following way: (1) Locally, along the stream line of
the spin current the head-to-tail configuration minimizes
the energy. (2) Globally, the spins at A and B which are
situated far apart about the origin O shown in inset of
Fig. 1 are orientated anti-parallel to minimize the d-d in-
teraction. (3) When the two antiparallel spins at A and
B come closer towards the origin O, the kinetic energy
due to the spin modulation increases. To avoid this en-
ergy loss, the particle number is depleted in the central
region at the cost of the harmonic potential energy.
For an alternative explanation of the spin current
texture we rewrite the d-d interaction as vdd(rij) ∝
Y2µ(cos θ)Σµ(ij) with Σµ(ij) being a rank 2
tensor consisting of the two spins at i and j sites, and
Y2µ(cos θ) a spherical harmonics [18]. θ is the polar angle
in spherical coordinates of the system. The spin current
texture shown in inset of Fig. 1 picks up the phase factor
e2iϕ when winding around the origin. This is coupled to
Y2±2(cos θ) ∝ sin2 θ, meaning that this orbital moment
dictates the number density depletion at the pancake cen-
ter. The spin-orbit coupling directly manifests itself here.
The total angular momentum consisting of the spin and
orbit ones is a conserved quantity of the present axis-
symmetric system, leading to the Einstein-de Haas effect
[7]. The spin current texture is stable for the wide range
of anisotropy γ: 0.01 ≤ γ ≤ 0.6, beyond which it becomes
unstable.
Figure 2 displays another stable configuration in a sim-
ilar situation. The left (right) column shows the den-
sity plots of the particle number (the corresponding spin
structure). The spins are almost parallel to the x-axis,
but at the outer region they bent away. We call it r-flare
texture, which is a non-coplanar spin arrangement. It
is clearly seen that the axis-symmetry in the x-y plane,
FIG. 3: Cross sections of the particle number in Fig. 2 along
the x and y-axis compared with Thomas-Fermi (TF) profile
for gd = 0. The profile is elongated (compressed) along the x
(y)-axis.
which was originally circular, is spontaneously broken so
that the circular shape is elongated along the x-axis and
compressed along the y-axis. Figure 3 displays the x
and y-axis cross-sections of the particle density, com-
pared with the Thomas-Fermi (TF) profile for gd = 0
with the same particle number. Because of the d-d in-
teraction which favors the head-to-tail arrangement, the
particle number is increased at the center. The bending
tendency at the circumference increases with increasing
gd. Beyond a certain critical value gd ∼= 0.27g for ∼ 104
particles, the r-flare texture becomes unstable, indicating
a quantum phase transition. Upon increasing the total
particle number the r-flare is replaced by the spin current
texture. We also note that the z-flare texture in which
the polarization points to the z-axis is equally stable as
we explain shortly.
Let us turn to the cigar shape case elongated along
the z-axis with the trap anisotropy γ = 5.0. The stable
configuration we obtain is shown in Fig. 4 where the spin
structure is basically a flare spin texture which is a non-
coplanar spin arrangement. Namely, the bending occurs
radially so that the spin texture is a three dimensional
object, but keeps axis-symmetry around the z-axis. The
particle density is modified from the TF profile for gd = 0,
elongated along the z direction and compressed to the z-
axis.
This can be understood by seeing Fig. 4 (b). The up-
spin density near the center exerts the d-d force so as to
align the outer spins parallel to the vector connecting the
center and its position, taking the head-to-tail configu-
ration. This results in a non-coplanar structure, but the
axis-symmetry about the z-axis is preserved. This spin
texture is stable for gd ≤ 0.3g and robust for different
aspect ratios: γ = 0.2 and 1.5. The bending angle of
FIG. 4: (a) The z-flare spin texture in the cigar trap along
the z-axis. The spins almost point to the z direction. In
the outer regions they bent. The bright region in background
corresponds to high number density. gd = 0.2g, γ = 5.0. (b)
Schematic figure to explain this spin configuration due to d-d
interaction.
FIG. 5: (a) The two-z-flare spin texture under the same
parameter set (gd = 0.2g, γ = 5.0.) as in Fig. 4 with different
initial spin configuration. The bright region in background
corresponds to high number density. (b) Schematic figure to
explain this spin structure. At the z = 0 plane two oppositely
aligned spins meet and the number density is depleted.
the flare spin texture increases and the elongation along
the z direction becomes larger as gd increases (= 0.1 and
0.2).
Finally we display an example to show how the model
Hamiltonian admits many subtle spin textures with com-
parable energies. Figure 5 (a) shows the two-z-flares op-
positely polarized stacked back to back. This configura-
tion is stabilized starting with a hedgehog spin config-
uration, or skyrmion at the center from which all the
spins point outward from the origin. In the end the
two-z-flares oppositely polarized become stable, but at
the central z = 0 plane the antiparallel spins meet as
seen from Fig. 5 (b). To avoid drastic changes of the
spin direction, or the spin kinetic energy loss, the parti-
cle density decreases there. As a result even though the
harmonic potential energy is minimal there, the two-z-
flare spin textures oppositely polarized are stacked back
to back, but two objects are almost split. This example
illustrates strong coupling between the particle number
and spin densities through the d-d interaction.
These spin textures can be observed directly via a novel
phase-sensitive in situ detection [1] or indirectly via con-
ventional absorption imaging for the number density. It
is interesting to examine the vortex properties under ro-
tation. For the spin current texture, the vortex entry
into a system should be easy because in the central region
the mass density is already depleted. We point out that
the collective modes might be also intriguing because the
mass density is tightly coupled with the spin degrees of
freedom. These problems belong to future work.
In summary, we have introduced a model Hamiltonian
to capture the essential nature of dipolar spinor BEC
where the spin magnitude is large enough, focusing on
long wavelength and low energy textured solutions. We
show several typical stable configurations by solving the
Gross-Pitaevshii equation where the spin and mass den-
sities are strongly coupled due to the dipole-dipole inter-
action. The shape of the harmonic potential trapping is
crucial to determine the spin texture. The model Hamil-
tonian is a minimal extension of the Non-linear sigma
model with the d-d interaction, and yet complicate and
versatile enough to explore further because it is expected
that there are many stable configurations with compara-
ble energies. Finally the model Hamiltonian is applica-
ble literally for electric dipolar systems without further
approximation. We expect that BEC of hetero-nuclear
molecules with permanent electric dipole moment might
be realized in near future [19] where the formation of such
textures may be possible.
We thank Tarun K. Ghosh and W. Pogosov for useful
discussions in the early stage of this research. This work
of S. G. was supported by a grant of the Japan Society
for the Promotion of Science.
[1] See for example, L. E. Sadler et al., Nature (London)
443, 312 (2006).
[2] T. Ohmi and K. Machida, J. Phys. Soc. Jpn. 67, 1822
(1998).
[3] T. -L. Ho, Phys. Rev. Lett. 81, 742 (1998).
[4] A. Griesmaier et al., Phys. Rev. Lett. 94, 160401 (2005).
[5] J. Stuhleret al., Phys. Rev. Lett. 95, 150406 (2005); L.
Santos and T. Pfau, Phys. Rev. Lett. 96, 190404 (2006);
A. Griesmaier et al., Phys. Rev. Lett. 97, 250402 (2006);
S. Giovanazzi et al., Phys. Rev. A 74, 013621 (2005).
[6] See for review, M. A. Baranov et al., Phys. Scr. T 102,
74 (2002).
[7] Y. Kawaguchi et al., Phys. Rev. Lett. 96, 080405 (2006),
97, 130404 (2006), and 98, 110406 (2007).
[8] S. Yi and H. Pu, Phys. Rev. Lett. 97, 020401 (2006).
[9] R. B. Diener and T. -L. Ho, Phys. Rev. Lett. 96, 190405
(2006).
[10] R. Cheng et al., J. Phys. B 38, 2569 (2005).
[11] J. M. Luttinger and L. Tisza, Phys. Rev. 70, 954 (1946).
[12] J. Stenger et al., Nature (London) 396, 345 (1998).
[13] J. P. Burke et al., Phys. Rev. Lett. 81, 3355 (1998).
[14] M. D. Barrett et al., Phys. Rev. Lett. 87, 010404 (2001).
[15] N. N. Klausen et al., Phys. Rev. A 64, 053602 (2001).
[16] For 87Rb (F = 2), the two spin dependent interactions
are 80 and 50 times smaller than the spin-independent
one. T. Kuwamoto et al., Phys. Rev. A 69, 063604 (2004).
[17] F.D.M. Haldane, Phys. Rev. Lett. 50, 1153(1983). R. Ra-
jaraman, Solitons and Instantons (North-Holland, Ams-
terdam, 1989).
[18] C. J. Pethick and H. Smith, in Bose-Einstein condensa-
tion in dilute gases (Cambridge University Press, Cam-
bridge, 2002). Chap. 5, (5.76).
[19] See the special issue on ultracold polar molecules; Eur.
Phys. J. D31, 149-445 (2004).
ABSTRACT
  Magnetic dipole-dipole interaction dominated Bose-Einstein condensates are
discussed under spinful situations. We treat the spin degrees of freedom as a
classical spin vector, approaching from large spin limit to obtain an effective
minimal Hamiltonian; a version extended from a non-linear sigma model. By
solving the Gross-Pitaevskii equation we find several novel spin textures where
the mass density and spin density are strongly coupled, depending upon trap
geometries due to the long-range and anisotropic natures of the dipole-dipole
interaction.

<|endoftext|><|startoftext|>
Introduction to synergetics.) - М.: 
Наука, 1990. - 272 с. 
12. Hopfield J.J. Neural networks and physical systems with emergent collective computation abilities // 
Proc. Natl. Acad. Sci. USA. - 1982. – Vol. 79. -P.2554-2558. 
13. Анисимов Б.В., Курганов В.Д., Злобин В.К. Распознавание и цифровая обработка 
изображений. (Recognition and digital processing of images.) - М.: Высшая школа, 1983. - 295 с. 
14. Павлидис Т. Алгоритмы машинной графики и обработки изображений. (Algorithms of computer 
graphics and image processing.) - М.: Радио и связь, 1986. - 400 с. 
15. Хорстхемке В., Лефевр Р. Индуцированные шумом переходы. (Noise-induced transitions.) - М.: 
Мир, 1987. - 400 с. 
16. Pecora L.M., Carroll T.L. Pseudoperiodic driving: Eliminating multiple domains of attraction using 
chaos // Phys. Rev. Letters. - 1991. – Vol. 67, No.8. - P.945-948. 
17. Маковецкий Д.Н. Критические явления в примесном парамагнетике Cr3+: Al2O3 при 
насыщении электронного парамагнитного резонанса (Critical phenomena in doped paramagnetic 
Cr3+:Al2O3 at saturation of electron paramagnetic resonance.) // XIX Всесоюз. конф. по физике 
магн. явлений (24-27 сент. 1991): Тез.докл. - Ташкент, 1991. - С.102. 
18. Маковецкий Д.Н. Критические явления и нелинейные резонансы в неравновесном примесном 
парамагнетике (Critical phenomena and non-linear resonances in a non-equilibrium doped 
paramagnetic.) // Тр. III Междунар. конф. “Физические явления в твердых телах” (21-23 янв. 
1997). - Харьков, 1997. - С.105. 
19. Tredicce J.R. e.a. On chaos in lasers with modulated parameters: A comparative analysis // Opt. 
Commun. - 1985. – Vol. 55, No.2. - P.131-134. 
20. Meucci R., Poggi A., Arecchi F.T., Tredicce J.R. Dissipativity of an optical chaotic system 
characterized via generalized multistability // Opt. Commun. - 1988. – Vol. 65, No.2. - P.151-156. 
21. Николис Г., Пригожин И. Самоорганизация в неравновесных системах. (Self-organization in 
non-equilibrium systems.) - М.: Мир, 1979. - 512 с. 
22. Grebogi C., Ott E., Yorke J.A. Crises, sudden changes in chaotic attractors, and transient chaos // 
Physica D. - 1983. – Vol. 7. - P.181-200. 
23. Дмитриев А.С., Кислов В.Я. Стохастические колебания в радиофизике и электронике. 
(Stochastic vibrations in radiophysics and electronics.) - М.: Наука, 1989. - 280 с.  
24. Маковецкий Д.Н., Лавринович А.А., Черпак Н.Т. Ветвление стационарных инверсионных 
состояний в квантовом парамагнитном усилителе с резонаторной накачкой (Branching of 
stationary inversion states in a cavity-pumped paramagnetic quantum amplifier.) // Журн. техн. 
физики. - 1999. – T. 69, № 5. - С.101-105.  
25. Makovetskii D.N., Lavrinovich A.A., Cherpack N.T. Branching of stationary inversion states in a 
cavity-pumped paramagnetic maser amplifier // Tech. Phys. - 1999. – Vol. 44, No.5. - P.570-574. 
26. Маковецкий Д.Н. Нелинейное спин-фононное взаимодействие и усиление гиперзвука в 
активной парамагнитной среде (Non-linear spin-phonon interaction and hypersound amplification 
in an active paramagnetic medium.) // Укр. физ. журн. - 1985. – T. 30, № 11. - С.1737-1740.
ABSTRACT
  The microwave phonon stimulated emission (SE) has been experimentally and
numerically investigated in a nonautonomous microwave acoustic quantum
generator, called also microwave phonon laser or phaser (see previous works
arXiv:cond-mat/0303188 ; arXiv:cond-mat/0402640 ; arXiv:nlin.CG/0703050)
Phenomena of branching and long-time refractority (absence of the reaction on
the external pulses) for deterministic chaotic and regular processes of SE were
observed in experiments with various levels of electromagnetic pumping. At the
pumping level growth, the clearly depined increasing of the number of
coexisting SE states has been observed both in real physical experiments and in
computer simulations. This confirms the analytical estimations of the branching
density in the phase space. The nature of the refractority of SE pulses is
closely connected with the pointed branching and reflects the crises of strange
attractors, i.e. their collisions with unstable periodic components of the
higher branches of SE states in the nonautonomous microwave phonon laser.

<|endoftext|><|startoftext|>
Introduction
The problem of embedding complex discs or general Riemann surfaces into complex manifolds
has been well-known for a long time. The interest to the case of almost complex manifolds
has grown due to a strong link with symplectic geometry (Gromov [13]). We present the
following result.
Theorem 1.1 Let (M,J) be an almost complex manifold of complex dimension 2 admitting
a strictly plurisubharmonic exhaustion function ρ. Then for every non-critical value c of ρ,
every point p ∈ Ωc = {ρ < c} and every vector v ∈ Tp(M) there exists a J-holomorphic
immersion f : ID −→ Ωc, where ID ⊂ IC is the unit disc, such that f(bID) ⊂ bΩc, f(0) = p,
and df0
∂Re ζ
= λv for some λ > 0.
For a domainM ⊂ ICn with the standard complex structure, the result is due to Forstnerič
and Globevnik [12]; there are various generalizations including embedding bordered Riemann
surfaces into singular complex spaces (see [7] and references there).
http://arxiv.org/abs/0704.0124v3
Recently Biolley [4] proved a similar result for an almost complex manifold M of any
dimension n, but under the additional hypothesis that the defining function ρ is subcritical.
The latter means that ρ does not have critical points of the maximum Morse index n. (A
plurisubharmonic function can not have critical points of index higher than n.) We don’t
impose such a restriction. Furthemore, Biolley [4] does not prescribe the direction of the disc.
Her method is based on the Floer homology and substantially uses recent work of Viterbo
[23] and Hermann [14]. Our proof is self-contained; we adapt the ideas of Forstnerič and
Globevnik [12] to the almost complex case using the methods of classical complex analysis
and PDE.
In most work on the existence of global discs with boundaries in prescribed totally real
manifolds ([2, 9, 10, 15, 17] and others) the authors use the continuity principle. By the
implicit function theorem and the linearized equation they show that any given disc generates
a family of nearby discs. Then the compactness argument allows for passing to the limit. In
contrast, we construct the discs by solving the almost Cauchy-Riemann equation directly.
Following [12], we start with a small disc passing through the given point in given direction
and push the boundary of the disc in the directions complex-tangent to the level sets of the
defining function ρ; it results in increasing ρ due to pseudoconvexity. This plan leads to
a problem of attaching J-holomorphic discs to totally real tori in a level set of ρ. The
problem is of independent interest and may occur elsewhere. It reduces in turn to the
existence theorem for a boundary value problem for a quasilinear elliptic system of partial
differential equations in the unit disc (Theorem 4.1). We prove it by the classical methods of
the Beltrami equations and quasiconformal mappings (Ahlfors, Bers, Boyarskii, Lavrentiev,
Morrey, Vekua; see [3, 21] and references there). The result can be viewed as a far reaching
generalization of the Riemann mapping theorem.
Since the almost Cauchy-Riemann equation is nonlinear, one can only hope to find a
solution close to a current disc f . By measuring the closeness in the Lp norm, we are able in
fact to construct a disc sufficiently far from f in the sup-norm. To make sure we are looking
for a disc close to f , we adapt the idea of [12] of adding to f(ζ) a term with a factor of ζn
(ζ ∈ ID) with big n. We develop a nonlinear version of this idea.
The above procedure works well in the absence of critical points of ρ. In order to push
the boundary of the disc through critical level sets, we use a method by Drinovec Drnovšek
and Forstnerič [7, 11], which consists of temporarily switching to another plurisubharmonic
function at each critical level set. We point out that adapting this method to the almost
complex case is not a major problem because the difficulties are localized near the critical
points, in which the almost complex structure can be closely approximated by the standard
complex structure.
Although higher dimension gives one more freedom for constructing J-holomorphic discs,
we must admit that our proof of the main result goes through in dimension 2 only. The reason
is that our main tool (Theorem 4.1) needs a special coordinate system in which coordinate
hyperplanes z = const are J-complex, which generally can be achieved only in dimension
2. For a domain in ICn with the standard complex structure, the result is obtained in [12]
by reduction to dimension 2 using sections by 2-dimensional complex hypersurfaces. Such a
reduction in not possible for almost complex structures.
We thank Franc Forstnerič and Josip Globevnik for helpful discussions, in particular, for
pointing out at some difficulties in the problem and for the important references [7, 11].
Parts of the work were completed when the third author was visiting Université de
Provence and Université des Sciences et Technologies de Lille in the spring of 2006. He
thanks these universities for support and hospitality.
2 Almost complex manifolds
Let (M,J) be an almost complex manifold. Denote by ID the unit disc in IC and by Jst the
standard complex structure of ICn; the value of n is usually clear from the context. Let f be a
smooth map from ID intoM . Recall that f is called J-holomorphic if df ◦Jst = J◦df . We also
call such a map f a J-holomorphic disc or a pseudoholomorphic disc or just a holomorphic
disc when a complex structure is fixed. We will often denote by ζ the standard complex
coordinate on IC.
A fundamental result of the analysis and geometry of almost complex structures is the
Nijehnuis–Woolf theorem which states that given point p ∈ M and given tangent vector
v ∈ TpM there exists a J-holomorphic disc f : ID −→M centered at p, that is, f(0) = p and
such that df(0)(∂/∂Re ζ) = λv for some λ > 0. This disc f depends smoothly on the initial
data (p, v) and the structure J . A short proof of this theorem is given in [19]. This result
will be used several times in the present paper.
It is well known that an almost complex manifold (M,J) of complex dimension n can be
locally viewed as the unit ball IB in ICn equipped with an almost complex structure which
is a small deformation of Jst. More precisely, let (M,J) be an almost complex manifold of
complex dimension n. Then for every p ∈M , δ0 > 0, and k ≥ 0 there exist a neighborhood U
of p and a smooth coordinate chart z : U −→ IB such that z(p) = 0, dz(p) ◦ J(p) ◦ dz−1(0) =
Jst, and the direct image z∗(J) := dz ◦ J ◦ dz
−1 satisfies the inequality ||z∗(J)− Jst||Ck(ĪB) ≤
δ0. For a proof we point out that there exists a diffeomorphism z from a neighborhood
U ′ of p ∈ M onto IB such that z(p) = 0 and dz(p) ◦ J(p) ◦ dz−1(0) = Jst. For δ > 0
consider the isotropic dilation dδ : t 7→ δ
−1t in ICn and the composite zδ = dδ ◦ z. Then
limδ→0 ||(zδ)∗(J)−Jst||Ck(ĪB) = 0. Setting U = z
δ (IB) for positive δ small enough, we obtain
the desired result. As a consequence we obtain that for every point p ∈ M there exists a
neighborhood U of p and a diffeomorphism z : U → IB with center at p (in the sense that
z(p) = 0) such that the function |z|2 is J-plurisubharmonic on U and z∗(J) = Jst +O(|z|).
Let u be a function of class C2 on M , let p ∈ M and v ∈ TpM . The Levi form of u at p
evaluated on v is defined by LJ(u)(p)(v) := −d(J∗du)(v, Jv)(p).
The following result is well known (see, for instance, [6]).
Proposition 2.1 Let u be a real function of class C2 on M , let p ∈M and v ∈ TpM . Then
LJ(u)(p)(v) = ∆(u◦f)(0) where f : rID −→M for some r > 0 is an arbitrary J-holomorphic
map such that f(0) = p and df(0)(∂/∂Re ζ) = v, ζ ∈ rID.
The Levi form is invariant with respect to J-biholomorphisms. More precisely, let u be a
C2 real function onM , let p ∈M and v ∈ TpM . If Φ is a (J, J
′)-holomorphic diffeomorphism
from (M,J) into (M ′, J ′), then LJ(u)(p)(v) = LJ
(u ◦ Φ−1)(Φ(p))(dΦ(p)(v)).
Finally, it follows from Proposition 2.1 that a C2 function u is J-plurisubharmonic on
M if and only if LJ (u)(p)(v) ≥ 0 for all p ∈ M , v ∈ TpM . Thus, similarly to the case of
the integrable structure one arrives in a natural way to the following definition: a C2 real
valued function u on M is strictly J-plurisubharmonic on M if LJ(u)(p)(v) is positive for
every p ∈M , v ∈ TpM\{0}.
Let J be a smooth almost complex structure on a neighborhood of the origin in ICn and
J(0) = Jst. Denote by z = (z1, ..., zn) the standard coordinates in IC
n (in matrix computations
below we view z as a column). Then a map z : ID −→ ICn is J-holomorphic if and only if it
satisfies the following system of partial differential equations
zζ − A(z)zζ = 0, (1)
where A(z) is the complex n× n matrix defined by
A(z)v = (Jst + J(z))
−1(Jst − J(z))v (2)
It is easy to see that right-hand side of (2) is IC-linear in v ∈ ICn with respect to the standard
structure Jst, hence A(z) is well defined. Since J(0) = Jst, we have A(0) = 0. Then in a
sufficiently small neighborhood U of the origin the norm ‖ A ‖L∞(U) is also small, which
implies the ellipticity of the system (1).
However, we will need a more precise choice of coordinates imposing additional restric-
tions on the matrix function A. The proof of the following elementary statement can be
found, for instance, in [6].
Lemma 2.2 After a suitable polynomial second degree change of local coordinates near the
origin
z 7→ z +
akjzkzj
we can achieve
A(0) = 0, Az(0) = 0
In these coordinates the Levi form of a given C2 function u with respect to J at the origin
coincides with its Levi form with respect to Jst that is
LJ(u)(0)(v) = LJst(u)(0)(v)
for every vector v ∈ T0IR
3 Integral transforms in the unit disc
Let Ω be a domain in IC. Let TΩ denote the Cauchy-Green transform
TΩf(ζ) =
f(τ)dτ ∧ dτ
τ − ζ
. (3)
Let RΩ denote the Ahlfors-Beurling transform
RΩf(ζ) =
f(τ)dτ ∧ dτ
(τ − ζ)2
, (4)
where the integral is considered in the sense of the Cauchy principal value. We omit the
index Ω if it is clear form the context. Denote by B the Bergman projection for ID.
Bf(ζ) =
f(τ)dτ ∧ dτ
(τζ − 1)2
We need the following properties of the above operators.
Proposition 3.1 (i) Let p > 2 and α = (p−2)/p. Then the linear operator T : Lp(ID) −→
Cα(IC) is bounded, in particular, T : Lp(ID) −→ L∞(ID) is compact. If f ∈ Lp(ID),
then ∂ζTf = f , ζ ∈ ID, as a Sobolev derivative.
(ii) Let m ≥ 0 be integer and let 0 < α < 1. Then the linear operators T : Cm,α(ID) −→
Cm+1,α(IC) and R : Cm,α(ID) −→ Cm,α(ID) are bounded. Furthermore,if f ∈ Cm,α(ID),
then ∂ζTf = f and ∂ζTf = Rf , ζ ∈ ID, in the usual sense.
(iii) The operator RΩ can be uniquely extended to a bounded linear operator RΩ : L
p(Ω) −→
Lp(Ω) for every p > 1. If f ∈ Lp(ID), p > 1 then ∂ζTf = Rf as a Sobolev derivative.
Moreover, the operator RIC is an isometry of L
2(IC), therefore ‖ RIC ‖L2(IC)= 1.
(iv) The Bergman projection B : Lp(ID) −→ Ap(ID) is bounded. Here Ap(ID) denotes the
space of all holomorphic functions in ID of class Lp(ID).
(v) The functions p 7→‖ T ‖Lp(Ω) and p 7→‖ R ‖Lp(Ω) are logarithmically convex and in
particular, continuous for p > 1.
The proofs of the parts (i)–(iii) are contained in [21]. The part (iv) follows from (iii); see e.
g. [8]. The part (v) follows by the classical interpolation theorem of M. Riesz–Torin (see e.
g. [24]).
We introduce modifications of the operators T and R for solving certain boundary value
problems in the unit disc ID. For f ∈ Lp(ID) we define
T0f(ζ) = Tf(ζ)− Tf(ζ
), ζ ∈ ID. (5)
By Proposition 3.1 for p > 2 and α = (p− 2)/p, the linear operator T0 : L
p(ID) −→ Cα(ID)
is bounded, in particular, T0 : L
p(ID) −→ L∞(ID) is compact. Since the function Tf is
holomorphic and bounded in IC\ID, then the function ζ 7→ (Tf)(ζ
) is holomorphic in ID.
Hence ∂ζT0f = ∂ζTf = f . Furthermore, for ζ ∈ bID, we have ζ = ζ
, therefore by (5),
ReT0f(ζ) = 0. Hence for f ∈ L
p(ID), the function u = T0f solves the boundary value
problem
∂ζu = f, ζ ∈ ID,
Reu|bID = 0
We further define
R0f := ∂ζT0f.
Since ∂ζTf = Rf and ∂ζTf = f , then
R0f(ζ) = ∂ζT0f(ζ) = Rf(ζ)− ∂ζTf(ζ
) = Rf(ζ) + ζ−2Rf(ζ
), (6)
and we obtain a nice formula
R0f = Rf +Bf,
where B is the Bergman projection. By Propositions 3.1(iv) and (v), the operator R0 :
Lp(ID) −→ Lp(ID) is bounded, and the map p 7→‖ R0 ‖Lp(ID) is continuous for p > 1. By
Proposition 3.1(iii), R is an isometry of L2(IC). The analogue of this result for the operator
R0 may have been used for the first time by Vinogradov [22]. In fact we came across [22]
after proving the following
Theorem 3.2 R0 is a IR-linear isometry of L
2(ID), in particular, ‖ R0 ‖L2(ID)= 1.
Since we could not find a proof in the literature, for completeness we include it here.
Proof : For a domain G ⊂ IC we use the inner product
(f, g)G = −
fgdζ ∧ dζ.
We put
σf(ζ) = ζ
), ψ(ζ) = ζ
Then σ2 = id. By substitution ζ 7→ ζ
we obtain
(σf, σg)ID = (g, f)IC\ID, Rσ = ψσR, R = ψσRσ. (7)
By (6) we have
R0f = Rf + ψσRf.
Let f ∈ L2(ID). Extend f to all of IC by putting f(ζ) = 0 for |ζ | > 1. Then
‖ R0f ‖
L2(ID)= (Rf + ψσRf,Rf + ψσRf)ID =
(Rf,Rf)ID + 2Re (Rf, ψσRf)ID + (ψσRf, ψσRf)ID.
Since |ψ| = 1, by (7) we obtain
(ψσRf, ψσRf)ID = (σRf, σRf)ID = (Rf,Rf)IC\ID,
(Rf, ψσRf)ID = (ψσRσf, ψσRf)ID = (Rσf,Rf)IC\ID = (ψσRf,Rf)IC\ID = (Rf, ψσRf)IC\ID.
Then by the previous line and because R is an isometry
2Re (Rf, ψσRf)ID = Re (Rf, ψσRf)IC = Re (Rf,Rσf)IC = Re (f, σf)IC = 0.
Hence
‖ R0f ‖
L2(ID)= (Rf,Rf)ID + (Rf,Rf)IC\ID =‖ Rf ‖
L2(IC)=‖ f ‖
L2(IC)=‖ f ‖
L2(ID),
which proves the theorem.
4 Riemann mapping theorem for an elliptic system
The Riemann mapping theorem asserts that for every simply connected domain G ⊂ IC there
exists a conformal map of G onto ID. If G is smooth, then there is a diffeomorphism f :
G −→ ID, which defines an almost complex structure J = f∗(Jst) in ID. Then the Riemann
mapping theorem reduces to constructing a J-holomorphic map z : (ID, Jst) −→ (ID, J). The
latter satisfies the Beltrami type equation ∂ζz = A(z)∂ζz, which is equivalent to the linear
Beltrami equaion ∂zζ + A(z)∂zζ = 0. We consider the following more general system
∂ζz = a(z, w)∂ζz,
∂ζw = b(z, w)∂ζz,
which cannot be reduced to a linear one. Here z, w are unknown functions of ζ ∈ ID and a, b
are C∞ coefficients. By eliminating ζ , the system reduces to a nonhomogeneous quasilinear
Beltrami type equation ∂zw + a∂zw = b, but we prefer to deal with (8) directly.
The following theorem is our main technical tool for constructing pseudoholomorphic
discs with boundaries in a prescribed torus. For r > 0 denote IDr := rID.
Theorem 4.1 Let a, b : ID× ID1+γ −→ IC (γ > 0) be smooth functions such that
a(z, 0) = b(z, 0) = 0 and |a(z, w)| ≤ a0 < 1.
Then there exists C > 0 such that for every integer n ≥ 1 the system (8) admits a smooth
solution (zn, wn) with the following properties:
(i) |zn(ζ)| = |wn(ζ)| = 1 for |ζ | = 1.
(ii) zn : ID −→ ID is a diffeomorphism with zn(0) = 0.
(iii) |wn(ζ)| ≤ C|ζ |
n, |wn(ζ)| < 1 + γ.
Proof : Shrinking γ > 0 if necessary, we extend the functions a and b to all of IC2
preserving their properties. We will look for a solution of (8) in the form
z = ζeu, w = ζnev.
Then for the new unknowns u and v we have the following boundary value problem
∂ζu = A(u, v, ζ)(1 + ζ∂ζu), ζ ∈ ID
∂ζv = B(u, v, ζ)(1 + ζ∂ζu), ζ ∈ ID
Re u(ζ) = Re v(ζ) = 0, |ζ | = 1
where
A = aζ−1eu−u,
B = bζ−neu−v.
Put ∂ζu = h and choose u in the form u = T0h. Then ∂ζu = R0h, which we plug into (9).
We obtain the following system of singular integral equations for u, v and h:
h = A(1 + ζR0h),
u = T0h,
v = T0(B(1 + ζR0h))
We denote by ‖ f ‖p the L
p-norm of f in ID. Since the function p 7→‖ R0 ‖p is continuous in
p and ‖ R0 ‖2= 1 we choose p > 2 such that
a0 ‖ R0 ‖p< 1.
For given u, v ∈ L∞(ID) the map h 7→ A(1 + ζR0h) is a contraction in L
p(ID) because
‖ ζA ‖∞‖ R0 ‖p< 1.
Hence there exists a unique solution h = h(u, v) of the first equation of (10) satisfying
‖ h ‖p≤
‖ A ‖p
1− a0 ‖ R0 ‖p
Consider the map F : L∞(ID)× L∞(ID) −→ L∞(ID)× L∞(ID) defined by
F : (u, v) 7→ (U, V ) = (T0h, T0(B(1 + ζR0h)))
where h = h(u, v) is determined above. Then F is continuous (even Lipschitz) map. Let
E = {(u, v) ∈ L∞(ID)× L∞(ID) :‖ u ‖∞≤ u0, ‖ v ‖∞≤ v0}
We need the following
Lemma 4.2 There exist u0 > 0, v0 > 0 such that E is invariant under F .
Assuming the lemma, we prove the existence of the solution of (10). Indeed, since
T0 : L
p(ID) −→ L∞(ID) is compact for p > 2, then F : E −→ E is compact. Since
E is a bounded, closed and convex, then the existence of the solution of (10) follows by
Schauder’s principle.
Proof of Lemma 4.2 : Since a(z, 0) = b(z, 0) = 0, we have
|a(z, w)| ≤ C1|w|, |b(z, w)| ≤ C1|w|.
Here and below we denote by Cj constants independent of n. We have
|a| = |a(ζeu, ζnev)| ≤ C1e
‖v‖∞ |ζ |n,
‖ A ‖p=‖ aζ
−1 ‖p≤ C2 ‖ ζ
n−1 ‖p e
‖v‖∞ ≤ C3e
‖v‖∞n−1/p.
By (11), ‖ h ‖p≤ C4e
‖v‖∞n−1/p, hence
‖ U ‖∞≤ C5e
‖v‖∞n−1/p.
Similarly
|B| = |b(ζeu, ζnev)ζ−neu−v| ≤ C1e
‖u‖∞ ,
‖ B ‖∞≤ C1e
‖ V ‖∞≤ C7(‖ B ‖p + ‖ B ‖∞‖ h ‖p) ≤ C8e
‖u‖∞ .
Let δ = n−1/p. Then
‖ U ‖∞≤ C9δe
‖v‖∞ ,
‖ V ‖∞≤ C9e
Consider the system
u0 = C9δe
v0 , v0 = C9e
with the unknowns u0, v0. Then
u0 = C9δe
For small δ > 0 this equation has two positive roots. Let u0 = u0(δ) be the smaller root and
v0 = v0(δ) = C9e
u0. Now if ‖ u ‖∞≤ u0, ‖ v ‖∞≤ v0, then
‖ U ‖∞≤ C9δe
‖v‖∞ ≤ C9δe
v0 ≤ u0,
‖ V ‖∞≤ C9δe
‖u‖∞ ≤ C9δe
u0 ≤ v0
Hence E is invariant under F , which proves the lemma.
Thus the solution of (10) in L∞(ID) exists for n big enough. Since h ∈ Lp(ID), p > 2,
the second and the third equations of (10) imply that u, v ∈ Cα(ID), α = (p − 2)/p. Since
∂ζu = h ∈ L
p(ID) and ∂ζu = R0h ∈ L
p(ID) as Sobolev’s derivatives, then u and v are
solutions of (9), hence z = ζeu and w = ζnev are solutions of (8). By the ellipticity of
the system, z, w ∈ C∞(ID). The smoothness up to the boundary can be derived directly
from the properties of the Beltrami equation; it also follows by the reflection principle for
pseudoholomorphic discs attached to totally real manifolds (see, e.g., [18]).
Since the winding number of z|bID about 0 equals 1 and
∣∂ζz/∂ζz
∣ = |a| ≤ a0 < 1 then
z : ID −→ ID is a homeomorphism by the classical properties of the Beltrami equation [21],
and (ii) follows.
Note that u0 −→ 0, v0 −→ C9 as n −→ ∞. Since T0 : L
p(ID) −→ Cα(ID) is bounded,
then we have
‖ v ‖Cα(ID)≤ C10, ‖ e
v ‖Cα(ID)≤ C11,
and |w(ζ)| ≤ C11|ζ |
n. Furthermore, since |ev| = 1 on bID, then |ev(ζ)| ≤ 1 + C11(1 − |ζ |)
for |ζ | < 1. Then |w(ζ)| ≤ |ζ |n(1 + C11(1 − |ζ |)
α), hence ‖ w ‖∞−→ 1 as n −→ ∞. Hence
‖ w ‖∞< 1+ γ for n big enough, and (iii) follows. This completes the proof of Theorem 4.1.
5 Pseudoholomorphic discs attached to real tori
This section concerns the geometrization of Theorem 4.1. We apply Theorem 4.1 in order
to obtain a crucial technical result on (approximately) attaching pseudoholomorphic discs
to a given real 2-dimensional torus in (M,J). We will use this result later for pushing discs
across level sets of the defining function ρ in Theorem 1.1.
The tori and the discs considered in this section are not arbitrary. We study a special
case which will suffice for the proof of the main result. Given a psedoholomorphic immersed
disc f , we associate with f a real 2-dimensional torus Λ formed by the boundary circles of
discs hζ centered at the boundary points f(ζ), ζ ∈ bID. Thus, our initial data is a pair (f,Λ).
Our goal is to construct a pseudoholomorphic disc with the boundary attached to the torus
Λ. First we find a suitable neighborhood of the disc f which can be parametrized by the
bidisc in IC2. We transport the structure J onto this bidisc and choose the coordinates there
such that the equations for J-holomorphic discs take the form used in Theorem 4.1. The
theorem will provide a pseudoholomorphic disc approximately attached to Λ.
5.1 Admissible parametrizations by the bidisc and generated tori
Let f : ID −→ (M,J) be a J-holomorphic disc of class C∞(ID). Suppose f is an immersion.
Let γ > 0. Given ζ ∈ D consider a J-holomorphic disc
hζ : (1 + γ)ID −→ M
satisfying the condition hζ(0) = f(ζ) and such that the direction dhζ(0)(
∂Re τ
) is not tangent
to f . Admitting some abuse of notation, we sometimes write hf(ζ) for hζ .
This allows to define a C∞ map
H : ID× (1 + γ)ID −→M, H(ζ, τ) = hζ(τ).
Then H has the following properties:
(i) For every ζ ∈ ID the map hζ := H(ζ, •) is J-holomorphic.
(ii) For every ζ ∈ ID we have H(ζ, 0) = f(ζ).
(iii) For every ζ ∈ ID the disc hζ is transversal to f at the point f(ζ).
We assume in addition that
(iv) H : ID× (1 + γ)ID −→M is locally diffeomorphic.
Then Λ = H(bID×(1+γ)bID) is a real 2-dimensional torus immersed intoM . It is formed
by a family of topological circles γζ = hζ((1 + γ)bID) parametrized by ζ ∈ bID. Every such a
circle bounds a J-holomorphic disc hζ : (1 + γ)ID −→ M centered at f(ζ). In particular the
torus Λ can be continuously deformed to the circle f(bID).
If the above conditions (i) - (iv) hold we say that a mapH is an admissible parametrization
of a neighborhood of f(ID) and Λ is the torus generated by H .
5.2 Ellipticity of admissible parametrizations
We prove the following consequence of Theorem 4.1.
Theorem 5.1 Let f : ID −→ (M,J) be a C∞ immersion J-holomorphic in ID. Suppose
that there exists an admissible parametrization H of a neighborhood of f(ID) and let Λ be
the generated torus. Then there exists an immersed J-holomorphic disc f̃ of class C∞(ID)
centered at f(0), tangent to f at f(0) and satisfying the boundary condition f̃(bID) ⊂ H(bID×
bID).
We stress that the boundary of f̃ is attached to the torus H(bID×bID) and not to Λ. However
since γ > 0 can be chosen arbitrarily close to 0, this leads to the following result sufficient
for applications.
Corollary 5.2 In the hypothesis of the former theorem for any positive integer n there exists
an immersed J-holomorphic disc fn of class C∞(ID) centered at f(0), tangent to f at f(0)
and such that dist(fn(bID),Λ) −→ 0 as n −→ ∞.
Here dist denotes any distance compatible with the topology of M .
We begin the proof of Theorem 5.1 with the remark that the discs hζ , ζ ∈ D, fill a subset
V of M containing f(ID) which can be viewed as a fiber space with the base f(ID) and the
generic fiber hζ((1 + γ)ID). Therefore the defined above map
H : ID× (1 + γ)ID −→ V
gives a natural parametrization of V by the bidisc Uγ := ID × (1 + γ)ID. Since H is locally
diffeomorphic (see (iv) above) the inverse map H−1 is defined in a neighborhood of every
point of V . This allows to define the almost complex structure J̃ = H∗(J) = dH−1 ◦ J ◦ dH
on Uγ . The structure J̃ has a special form. Indeed, in the standard basis of IR
4 we have
J̃11 J̃12
J̃21 J̃22
where J̃kj are real 2×2 matrices. We recall that in this basis the standard complex structure
st of IC has the form
It follows by the property (i) of H that the maps τ 7→ (c, τ) are J̃-holomorphic for every
fixed c. This implies that J̃12 = 0 and J̃22 = J
st . Furthermore, since the map ζ 7→ (ζ, 0) is
J̃-holomorphic, we have J̃11(z, 0) = J
st and J̃21(z, 0) = 0.
Let now g : ID −→ Uγ be a J̃ -holomorphic map. If we set ζ = ξ+iη, the Cauchy–Riemann
equations have expressing the J̃-holomorphicity of g have the form
= 0 (13)
Suppose now that the matrix Jst+J is invertible. Then the Cauchy–Riemann equations can
be rewritten in the form
gζ + A(g)gζ = 0 (14)
where A is defined by (2). If we use the notation g = (z, w), then the Cauchy–Riemann
equations (14) can be written in the form
∂ζz = a(z, w)∂ζz,
∂ζw = b(z, w)∂ζz
identical to (8). Furthermore, since J̃(z, 0) = Jst, the conditions a(z, 0) = b(z, 0) = 0 are
satisfied.
Proposition 5.3 We have ‖ a ‖∞< 1.
Proof : The proof consists of two steps. First we study the matrix J̃+Jst which determines
the matrix A in the Cauchy–Riemann equations (14).
Lemma 5.4 The matrix J̃(z, w) + Jst is non-degenerate for any (z, w) ∈ ID× (1 + γ)ID.
Proof : It suffices to verify the condition det(J̃11(z, w) + J
st ) 6= 0. For every fixed (z, w)
the matrix J̃11(z, w) defines a complex structure on the euclidean space IR
2 so there exists a
matrix P = P (z, w) such that
J̃11(z, w) = PJ
−1. (16)
Recall that the manifold J2 of all complex structures on IR
2 can be identified with the
quotient GL(2, IR)/GL(1, IC) and has two connected components: J +2 and J
2 . A structure
J̃11 belongs to J
2 (resp. to J
2 ) if in the representation (16) we have detP > 0 (resp.
detP < 0). Suppose now that det(PJ
st ) = 0 or equivalently det(PJ
st +J
st P ) =
0 at some point (z, w). If we denote by pjk the entries of the matrix P , the last equality means
jk=1 p
jk = 0 which together with the non-degeneracy of P implies that detP < 0 so
that J̃11(z, w) ∈ J
2 . On the other hand, for the point (z, 0) we have detP > 0 since
J̃11(z, 0) = J
st so J̃(z, 0) ∈ J
2 . But we can join the points (z, 0) and (z, w) by a real
segment, so this contradiction proves lemma.
Now we can conclude the proof of Proposition 5.3. It follows by Lemma 5.4 that the
Cauchy–Riemann equations (13) can be written in the form (15) on ID × (1 + γ)ID. The
Cauchy–Riemann equations are elliptic at every point and this condition is independent of
the choice of the coordinates. The system (15) is ellipitic at a point (z, w) if and only if
|a(z, w)| 6= 1. Since a(z, 0) = 0 we obtain by connectedness that |a| < 1 on ID × (1 + γ)ID,
which concludes the proof.
Now Theorem 5.1 follows by Theorem 4.1.
5.3 Construction of an admissible parametrization with a pre-
scribed generated torus
So far we studied a situation where an admissible parametrization of a neighborhood of
an immersed J-holomorphic disc was given and proved the existence of discs with bound-
aries close to the generated torus. In the proof of our main result, we need an admissible
parametrization of a neighborhood of a J-holomorphic disc with a given generated torus.
Let f : ID −→ M be an immersed J-holomorphic disc of class C∞(ID). We extend f
smoothly to a neighborhood of ID. Let U be a small neighborhood of bID. For every point
f(ζ), ζ ∈ U , consider a J-holomorphic disc hζ : 2ID −→ M . Suppose that the map hζ
smoothly depends on ζ ∈ U . Thus we obtain a smooth map
H : bID × ID −→ M, H : (ζ, τ) 7→ hζ(τ).
Then Λ := H(bID × bID) is a real 2-dimensional torus. In order to construct an admissible
parametrization with the generated torus Λ we need to extend the map H from the cylinder
bID × ID to the bidisc ID× ID.
Definition 5.5 We call the described above torus Λ admissible. We further put Xζ :=
Xf(ζ) = dhζ(0)(
∂Re τ
) for every ζ ∈ U .
Theorem 5.6 Let f : ID −→ (M,J) be an immersed J-holomorphic disc of class C∞(ID).
Let Λ be an admissible torus. Then there is a sequence of admissible tori Λn converging to
Λ such that for every n there exists an immersed J-holomorphic disc fn of class C∞(ID)
centered at f(0), tangent to f at f(0) and satisfying the boundary condition fn(bID) ⊂ Λn.
Proof : Let Λ be an admissible torus and let X be the vector field given by Definition
5.5. In general it is impossible to extend X as a non-vanishing vector field transversal to
f(ID) at every point. However, for any integer (not necessarily positive) n we can consider
the discs hnζ : τ 7→ hζ(ζ
nτ), where ζ ∈ bID. Their tangent vectors at the points f(ζ) are equal
to Xnζ := ζ
nXζ, where by multiplying a vector by a complex number ζ
n we mean applying
the operator (Re ζ + (Im ζ)J)n. We need the following
Lemma 5.7 After a suitable choice of n the vector field Xnζ can be extended on the disc as
a nonvanishing field transversal to f at every point.
Proof : First we look for a global parametrization of a neighborhood of f(ID). Fix an
arbitrary vector field Y transversal to f(ID) at every point. By Nijenhuis - Woolf theorem
we obtain a family of J-holomorphic discs gz : w 7→ gz(w), z ∈ ID so that gz(0) = f(z) and
Xf(ζ) is tangent to gz. Then the map G : (z, w) 7→ gz(w) is a local diffeomorphism from
a neighborhood of ID × ID onto a neighborhood of f(ID) and G(z, 0) = f(z) so we can use
the coordinates (z, w). We pull back the vector field X by G−1 and consider the vector field
(G−1)∗(X) : ζ 7→ (G
−1)∗(Xζ). Let m be the winding number of the w-component of the
vector field (G−1)∗(X) when ζ runs along the circle bID. We set n = −m. Then the field
(G−1)∗(X
n) extends on the disc (ζ, 0) as a smooth vector field Z transversal to this disc at
every point. Then the map G∗(Z) associates to every point of ID a vector transversal to
f(ID) and so defines the desired extension X̃n of the vector field Xn. This proves the lemma.
Now by the Nijenhuis - Woolf theorem there exists a map h̃ζ : ID −→ M which is J-
holomorphic on ID such that h̃ζ = hζ for every ζ in a neighborhood of bID and the vector X̃
is tangent to hζ at the origin. Thus we can extend H to a function defined on ID × ID such
that the map H(ζ, •) is J-holomorphic for any ζ ∈ ID. This map H is a local diffeomorphism
and so determines an admissible parametrization of a neighborhood of f(ID) such that the
generated torus coincides with Λ. Theorem 5.6 now follows by Theorem 5.1.
6 Pushing discs through non-critical levels
In this section we explain how to push a given disc through non-critical level sets of a strictly
plurisubharmonic function.
Proposition 6.1 Suppose that ρ does not have critical values in the closed interval [c1, c2].
Let f : ID −→ Ωc1 be an immersed J-holomorphic disc such that f(bID) ⊂ bΩc1 . Then there
exists an immersed J-holomorphic disc f̃ : ID −→ Ωc2 such that f̃(0) = f(0), df̃(0) = λdf(0)
for some λ > 0 and f̃(bID) ⊂ bΩc2.
For the proof we need some preparations. Let ρ be a strictly plurisubharmonic function
on an almost complex manifold (M,J). For real c consider the domain Ωc = {ρ < c}.
Suppose that its boundary has no critical points. Let f : ID −→ Ωc be a J-holomorphic
disc of class C∞(ID) and such that f(bID) ⊂ bΩc. For every point p ∈ f(bID) consider a
J-holomorphic disc hp : 2ID −→ M touching bΩc from outside such that ρ ◦ hp|2ID\{0} > c.
We call the discs hp the Levi discs. The map hp can be chosen smoothly depending on
p ∈ f(bID).
An explicit construction of the Levi discs is given in [12]. In the almost complex case the
proof is similar; the only thing which has to be justified is the existence of discs hp touching
a strictly pseudoconvex level set from outside. This was recently proved by Barraud and
Mazzilli [1] and Ivashkovich and Rosay [16]. In [6] the result is obtained in any dimension.
For reader’s convenience we include a simple proof (see [6]).
Lemma 6.2 For a point p ∈ bΩc there exists a J-holomorphic disc hp such that hp(0) = p
and hp(ID\{0}) is contained in M\Ωc.
Proof : We fix local coordinates z = (z1, z2) near p such that p = 0 and J(0) = Jst.
Denote by ej , j = 1, 2 the vectors of the standard basis of IC
2. By an additional change of
coordinates we may achieve that the map h : ζ 7→ ζe1 is J-holomorphic on ID. We can
assume that the Levi form LJr (0, e1) = 1 so that
r(z) = 2Re z2 + 2Re
ajkzjzk +
αjkzjzk + o(|z|
α11 = ∆(r ◦ h)(0) = 1.
Now for every δ > 0 consider the non-isotropic dilation Λδ : (z1, z2) 7→ (δ
−1/2z1, δ
−1z2). The
J-holomorphicity of the map h implies that the direct images Jδ := (Λδ)∗(J) converge to Jst
as δ −→ 0 in the Ck norm for every positive integer k on any compact subset of IC2. Similarly,
the functions rδ := δ
−1r ◦ Λ−1 converge to the function r0 := 2Re z2 + |z1|
2 + 2Re βz21 (for
some β ∈ IC).
Consider a Jst-holomorphic disc ĥ : ζ 7→ ζe1 − βζ
2e2. According to the Nijenhuis-
Woolf theorem for every δ ≥ 0 small enough there exists a Jδ-holomorphic discs h
δ such
that the family (hδ)δ≥0 depends smoothly on the parameter δ and for every δ ≥ 0 we have
hδ(ζ) = ζe1 + o(|ζ |) and h
0 = ĥ. Since (r0 ◦ h
0)(ζ) = |ζ |2, we obtain that for δ > 0 small
enough that (rδ ◦ h
δ)(ζ) = Aδ(ζ) + o(|ζ |
2), where Aδ is a positive definite quadratic form on
IR2. Since the structures Jδ and J are biholomorphic, then the lemma follows.
Thus we obtain a smooth map
H : bID× ID −→M, H : (ζ, τ) 7→ hf(ζ)(τ) =: hζ(τ).
For simplicity we assume here that H is a local diffeomorphism although the Levi discs hζ
can intersect even for close values of ζ . We prove in a forthcoming paper that the pullback
H∗(J) of J to the bidisc can be defined even if H is not a local diffeomorphism. Thus
Λ := H(bID × bID) is an admissible torus and ρ|Λ ≥ c + ε for some ε > 0. We stress that ε
depends only on ρ (more precisely on a constant separating the norm of the gradient of ρ
from zero) and the C2-norm of J .
Now Theorem 5.6 implies that there exists a disc f̃ with the same direction as f at the
center and with the boundary attached to a torus arbitrarily close to Λ. Now we cut off the
discs hζ by the level set {ρ = c+ε/2} and obtain a disc with boundary attached to this level
set. Indeed, we have the following
Lemma 6.3 Suppose that ρ ◦ f |bID ≥ c0 and c0 is a non-critical value of ρ. Then there
exists a J-holomorphic disc f̃ centered at f(0) and tangent to f at the center with boundary
attached to the level set {ρ = c0}.
Proof : By the Hopf lemma the disc f intersects the level set {ρ = c0} transversally at
every point. Therefore the open set Ω = {ζ ∈ ID : ρ ◦ f(ζ) < c0} has a smooth bound-
ary. The set Ω may be disconnected, but the connected component of 0 ∈ Ω is simply
connected by the maximum principle applied to the function ρ ◦ f . Now the lemma follows
via reparametrization by the Riemann mapping theorem.
Then we again consider the Levi discs for this level set etc. By iterating this argument
a finite number of times we obtain Proposition 6.1.
7 Pushing discs through a critical level
In order to push the boundary of the disc f through critical level sets of ρ, we use a method
of [11, 7], which consists of temporarily switching to another plurisubharmonic function at
each critical level set. We need a version of the Morse lemma for almost complex manifolds.
Proposition 7.1 Let (M,J) be an almost complex manifold of complex dimension 2. Let
ρ be a strictly plurisubharmonic Morse function on M . Then there exists another strictly
plurisubharmonic Morse function ρ̃ close to ρ with the same critical points, such that at each
critical point of Morse index k in local coordinates given by Lemma 2.2 one has
ρ̃(z) = ρ̃(0) + |z1|
2 + |z2|
2 − a1Re z
1 − a2Re z
2 (17)
where
(i) a1 = a2 = 0 if k = 0,
(ii) a1 = 2 and a2 = 0 if k = 1,
(iii) a1 = a2 = 2 if k = 2.
Remark. This is a weak version of the Morse lemma because we change the given
function ρ instead of reducing it to a normal form.
The following result must be well known. For convenience we include a proof.
Lemma 7.2 Let B be a complex symmetric n×n matrix. Then there exists a unitary matrix
U such that U tBU is diagonal with nonnegative elements.
Proof : Using coordinate-free language, given a hermitian positive definite form H and a
complex symmetric bilinear form B on a vector space V , dimIC V = n, we need u1, ..., un ∈ V
such that
H(ui, uj) = δij , B(ui, uj) = ciδij, ci ≥ 0.
If the above holds with just ci ∈ IC, then by rotation ui 7→ σiui, |σi| = 1, we obtain ci ≥ 0.
It suffices to find u1 ∈ V , H(u1, u1) = 1, such that for every x ∈ V ,
H(x, u1) = 0 implies B(x, u1) = 0. (18)
Then the rest of ui in the H-orthogonal complement of u1 are found by induction. Given
u ∈ V , by duality, there is a unique vector L(u) ∈ V such that for every x ∈ V ,
H(x, L(u)) = B(x, u). (19)
Then L : V → V is a IR-linear (IC-antilinear) transformation. Since B is symmetric, then by
(19), L is real symmetric (self-adjoint) with respect to the form ReH . Then the eigenvalues
of L are real and the eigenvectors are in V (generally they are in V ⊗IR IC). Let u1 ∈ V
be an eigenvector of L, that is L(u1) = λu1, for some λ ∈ IR. We normalize u1 so that
H(u1, u1) = 1. Then for u = u1, (19) implies (18), and the lemma follows.
Proof of Proposition 7.1 : Let p be a critical point of ρ. Introduce a coordinate system
with the origin at p given by Lemma 2.2. In these coordinates the function ρ is strictly
plurisubharmonic at the origin with respect to Jst. Then
ρ(z) = ρ(0) +
aijzizj + Re
bijzizj +O(|z|
where aij = aji and bij = bji. By a linear transformation we can reduce to the form aij = δij .
If we now make a unitary transformation z 7→ Uz preserving |z1|
2 + |z2|
2, then the matrix
B = (bij) changes to U
tBU . By Lemma 7.2 the expression of ρ reduces to
ρ(z) = ρ(0) + |z1|
2 + |z2|
2 − Re (a1z
1 + a2z
2) +O(|z|
where aj ≥ 0, j = 1, 2. The remainder ϕ = O(|z|
3) can be removed by changing ρ to
ρ̃ = ρ−ϕλ, where λ(z) = λ0(z/ε) is a smooth cut-off function with λ0 ≡ 1 in a neighborhood
of the origin and λ0(z) = 0 for |z| ≥ 1, ε > 0 small enough.
Since ϕ(z) = O(|z|3), then |d(ϕλ)| ≤ C|z|2, ‖ ϕλ ‖C2(IC2)≤ Cε where C > 0 is independent
of ε. Since |dρ| ≥ C|z| in a neighborhood of 0 for some C > 0, then for small ε > 0 the
function ρ̃ has only one critical point at the origin, is strictly plurisubharmonic and matches
with ρ for |z| > ε.
The coefficients aj can be reduced to the standard values 0 and 2 depending on the index
k of the critical point. We need a cut-off function that falls down from 1 to 0 sufficiently
slowly.
Lemma 7.3 Given δ > 0 there exists a smooth non-increasing function φ with a compact
support on IR+ such that
(i) φ = 1 near the origin.
(ii) |tφ′(t)| ≤ δ.
(iii) |t2φ′′(t)| ≤ δ
The lemma follows because
Let bj = 0 (resp. 2) if 0 ≤ aj < 1 (resp. aj > 1). Let λ(z) = φ(|z|/ε), where φ is provided
by Lemma 7.3 for sufficiently small δ. Then the function
ρ̃(z) = ρ(z) + λ[(a1 − b1)Re z
1 + (a2 − b2)Re z
for sufficiently small ε has all the desired properties. Proposition 7.1 is proved.
Thus in what follows we assume that ρ has the properties given by Proposition 7.1. Let p
be a critical point of ρ and ρ(p) = 0. Without loss of generality assume that the index k of p
is equal to 1 or 2 since the disc obtained by Proposition 6.1 cannot approach a minimum of
ρ. Choose a small neighborhood U of p. By (17) ρ is strictly plurisubharmonic with respect
to Jst.
We apply the construction of Lemma 6.7 of [11]. Consider c0 > 0 small enough such that
0 is the only critical value of ρ in the interval [−c0, 3c0]. We can assume that c0 is small
enough so that the set K(c0) := {z : ρ(z) ≤ 3c0, |x
′|2 ≤ c0} is compactly contained in a
neighborhood of the origin corresponding to U . Here we use the notation x′ = x1, x
′′ = x2
and |x′|2 = x21 (resp. x
′ = (x1, x2) and |x
′|2 = x21 + x
2 ) if k = 1 (resp. k = 2). We will use
similar notations for the coordinates x, y and the coordinates u, v introduced below. Let
E = {y′ = 0, z′′ = 0, |x′|2 ≤ c0}. (20)
Then E is a totally real submanifold with boundary and dimE = k. Consider the isotropic
dilations of coordinates
dc0 : z 7→ w = u+ iv = c
Set Jc0 = (dc0)∗(J). The structures Jc0 converge to Jst in any C
m norm on compact subsets
of IC2 as c0 −→ 0. Consider the function ρ̂(w) := c
0 ρ(c
0 w). This function has no critical
values in [−1, 3] and its expression in the coordinates w = u+iv is the same as the expression
(17) of ρ that is
ρ̂(w) = 3v21 + v
2 − u
1 + u
if k = 1 and
ρ̂(w) = 3v21 + 3v
2 − u
1 − u
if k = 2. In particular the set K = dc0(K(c0)) is given by {w : ρ̂(w) ≤ 3, |u
′|2 ≤ 1} and is a
fixed compact independent of c0.
It is important that the origin is a critical point of the function ρ and the local coordinates
and the function ρ are given by Proposition 7.1. This allows to use the isotropic dilations in
contrast with Lemma 6.2.
Since the function ρ̂ is strictly plurisubharmonic with respect to Jst, we can apply the
construction of [11] (Lemma 6.7 and section 6.4). We replace the function ρ̂ by a new
function ϕ defined by
ϕ(w) = 3v21 + v
2 − h(u
1) + u
if k = 1 and
ϕ(w) = 3v21 + 3v
2 − h(u
1 + u
if k = 2, where h ≥ 0 is a suitable function. The construction of h depends on the parameter
c0 only. In our “delated” coordinates w we apply this construction taking c0 = 1. Namely,
according to [11] there exist constants 0 < τ0 < τ1 < 1 depending on the eigenvalues of ρ̂
and a function ϕ strictly plurisubharmonic on IC2 with respect to Jst satisfying the following
properties:
(i) ρ̂ ≤ ϕ ≤ ρ̂+ τ1,
(ii) ρ̂+ τ0 ≤ ϕ on the set {|u
′|2 ≥ τ0}
(iii) ϕ = ρ̂+ τ1 on {|u
′|2 ≥ 1}
Since ρ̂ is strictly plurisubharmonic with respect to the structure Jc0 , the function ϕ also is
strictly Jc0-plurisubharmonic on {|u
′|2 ≥ 1} in view of (iii). On the other hand the structures
Jc0 converge to Jst in any C
m norm on compact subsets of IC2 as c0 −→ 0. Therefore, since
ϕ is strictly Jst-plurisubharmonic, it also is strictly Jc0-plurisubharmonic on K if c0 is small
enough. Thus, ϕ is strictly Jc0-plurisubharmonic on {ρ̂ ≤ 3}.
Now consider the function ρ̃(z) = c0ϕ(c
0 z) and set t0 = τ0c0.
The function ρ̃ satisfies the following properties:
(i) ρ̃ is strictly plurisubharmonic (with respect to Jst) in a neighborhood V ⊂ U of 0 and
ρ̃ = ρ+ t1 on the complement of V . Here t1 > 0 is a constant.
(ii) ρ̃ has no critical values on (0, 3c0)
(iii) There exists t0 ∈ (0, c0) such that
{ρ ≤ −c0} ∪ E ⊂ {ρ̃ ≤ 0} ⊂ {ρ ≤ −t0} ∪ E, (21)
where E is defined above by (20).
(iv) We have
{ρ ≤ c0} ⊂ {ρ̃ ≤ 2c0} ⊂ {ρ < 3c0} (22)
By Proposition 6.1 we construct an immersed J-holomorphic disc f such that −t0 <
ρ ◦ f |bID < 0. The boundary of f is contained in a torus Λ formed by discs complex tangent
to a level set of ρ. We will perturb the disc f slightly in order to avoid the intersection of
its boundary with E.
Proposition 7.4 Let f : ID −→ M be an immersed J-holomorphic disc in (M,J), where
dimICM = 2. Let E be a smooth submanifold in M . Then for every m ≥ 2 there exists a
J-holomorphic disc f̃ arbitrarily close to f in Cm(ID) such that f̃(0) = f(0), df̃(0) = df(0),
and f̃ |bID is transverse to E. In particular, if dimIR E ≤ 2, then f̃(bID) ∩ E = ∅.
Proof : By the implicit function theorem, the restriction f |bD admits infinitesimal pertur-
bations in all directions. Then the proposition follows by the proof of Thom’s transversality
theorem.
We now assume f(bID) ∩ E = ∅. In view of the inclusion (21) we conclude that ρ̃ > 0
on f(bID). By Lemma 6.3 we cut off the disc f by a level set {ρ̃ = c} for some c > 0 to
assume that now f(bID) is contained in this level set. The function ρ̃ has no critical values
in (0, 3c0). By Proposition 6.1 applied to the disc f and the function ρ̃ there exists a new
disc f̃ with the boundary contained in {ρ̃ > 2c0}. In view of (22) we have the inclusion
{ρ̃ > 2c0} ⊂ {ρ > c0}. Now the boundary of f̃ is outside the critical level {ρ = 0} as desired,
and we switch back to the original function ρ.
8 Proof of Theorem 1.1
Since the function ρ is strictly plurisubharmonic, then after a generic perturbation of ρ which
does not change the given level set, we can assume that ρ is a Morse function. Let p be the
given point in D. If p is not a point of minimum of ρ, we proceed as follows. Consider a
small J-holomorphic disc f centered at p with the given direction v. Consider a non-critical
level set ρ = c such that ρ(p) < c. Consider a foliation of a neighborhood of f by a complex
one-parameter family of J-holomorphic discs hq, q ∈ f(ID) such that the boundaries of these
discs are outside the sublevel set ρ < c. When q runs over the circle f(bID) these boundaries
form a torus. Applying Proposition 5.1 we obtain a new disc f̃ centered at p and still in the
same direction at p but with ρ ◦ f̃ |bID > 0.
If p is a point of minimum for ρ, we drop this first step and directly have this situation
with f̃ = f . Now the desired results follow by Proposition 6.1 combined with the above
argument allowing to push boundaries of discs through critical levels.
References
[1] J.-F. Barraud, E. Mazzilli, Regular type of real hypersurfaces in (almost) complex manifolds, Math.
Z. 248 (2004), 379–405.
[2] E. Bedford, B. Gaveau, Envelopes of holomorphy of certain 2-spheres in IC2, Amer. J. Math. 105
(1983), 975–1009.
[3] L. Bers, F. John, M.Schechter Partial differential equations J.Wiley and Sons, 1964.
[4] A.-L. Biolley, Floer homology, symplectic and complex hyperbolicities, Preprint, ArXiv
math.SG/0404551.
[5] B. Bojarski, Generalized solutions of a system of differential equations of first order of elliptic type
with discontinuous coefficients, Math. Sb. 43 (1957), 451–503.
[6] K. Diederich, A. Sukhov, Plurisubharmonic exhaustion functions and almost complex Stein structures,
Preprint, ArXiv math.CV/0603417
[7] B. Drinovec Drnovšek, F. Forstnerič, Holomorphic curves in complex spaces, Duke Math. J. 139 (2007),
203–254.
[8] P. Duren, A. Shuster, Bergman spaces, Math. Surveys and Monographs, 100, AMS, Providence, RI,
2004.
[9] Y. Eliashberg, Filling by holomorphic discs and its applications, London Math. Soc. Lecture Notes,
151 (1990), 45–67.
[10] F. Forstnerič, Polynomial hulls of sets fibered over the unit circle, Indiana Univ. Math. J. 37 (1988),
869–889.
[11] F. Forstnerič, Noncritical holomorphic functions on Stein manifolds, Acta Math. 191 (2003), 143–189.
[12] F. Forstnerič, J. Globevnik, Discs in pseudoconvex domains, Comment. Math. Helv. 67 (1992), 129–
[13] M. Gromov, Pseudo-holomorphic curves in symplectic manifolds, Invent. Math. 82 (1985), 307–347.
[14] D. Hermann, Holomorphic curves and hamiltonian systems in an open set with restricted contact type
boundary, Duke Math. J. 103 (2000),
[15] R. Hind, Filling by pseudoholomorphic discs with weakly pseudoconvex boundary conditions, GAFA 7
(1997), 462–495.
[16] S. Ivashkovich, J.-P. Rosay, Schwarz-type lemmas for solutions of ∂-inequalities and complete hyper-
bolicity of almost complex structures, Ann. Inst. Fourier 54 (2004), 2387–2435.
[17] N. Kruzhilin, Two-dimensional spheres on the boundaries of pseudoconvex domains in IC2, Izv. Akad.
Nauk SSSR Ser. Math. 52 (1988), 16–40.
[18] D. McDuff, D. Salamon, J-holomorphic curves and symplectic topology , AMS Colloquium Publ., 52,
AMS, Providence, RI, 2004.
[19] J.-C. Sikorav, Some properties of holomorphic curves in almost complex manifolds, in “Holomorphic
curves in Symplectic geometry”, Ed. M.Audin, J.Lafontane, Birkhauser (1994), 165–189.
[20] A. Sukhov, A. Tumanov, Filling hypersurfaces by discs in almost complex manifolds of dimension 2,
Indiana Univ. Math. J. 57 (2008), 509–544.
[21] I. Vekua, Generalized analytic functions , Fizmatgiz, Moscow (1959); English translation: Pergamon
Press, London, and Addison-Welsey, Reading, Massachuset (1962).
[22] V. S. Vinogradov, On a boundary value problem for linear first order elliptic systems of differential
equations in the plane (Russian), Dokl. Akad. Nauk SSSR, 118 (1958), 1059–1062.
[23] C. Viterbo, Functors and computations in Floer homology with applications, Part I, GAFA, 9 (1999),
[24] A. Zygmund, Trigonometric series, Vol. 2. Cambridge University Press, London 1959.
ABSTRACT
  We prove the existence of global Bishop discs in a strictly pseudoconvex
Stein domain in an almost complex manifold of complex dimension 2.

<|endoftext|><|startoftext|>
ANISOTROPIC THERMO-ELASTICITY IN 2D
PART I: A UNIFIED TREATMENT
MICHAEL REISSIG AND JENS WIRTH
Abstract. In this note we develop tools and techniques for the treatment of anisotropic
thermo-elasticity in two space dimensions. We use a diagonalisation technique to obtain
properties of the characteristic roots of the full symbol of the system in order to prove Lp–Lq
decay rates for its solutions.
Keywords: thermo-elasticity, a-priori estimates, anisotropic media
1. The problem under consideration
Systems of thermo-elasticity are hyperbolic-parabolic or hyperbolic-hyperbolic coupled sys-
tems (type-1, type-2 or type-3 models) describing the elastic and thermal behaviour of elastic,
heat-conducting media. The classical type-1 model of thermo-elasticity is based on Fourier’s law,
which means, that the heat flux is proportional to the gradient of the temperature. The present
paper is devoted to the study of type-1 systems for homogeneous but anisotropic media in R2.
There are different results in the literature for certain anisotropic media (cubic in [3]; rhombic
in [4]). Our goal is to present an approach, which allows to consider (an)isotropic models in R2
from a unified point of view.
We consider the type-1 system of thermo-elasticity
Utt +A(D)U + γ∇θ = 0, (1.1a)
θt − κ∆θ + γ∇TUt = 0. (1.1b)
Here A(D) denotes the elastic operator, which is assumed to be a homogeneous second order
2×2 matrix of (pseudo) differential operators and models the elastic properties of the underlying
medium. Furthermore, κ describes the conduction of heat and γ the thermo-elastic coupling of
the system. We assume κ > 0 and γ 6= 0. We solve the Cauchy problem for system (1.1) with
initial data
U(0, ·) = U1, Ut(0, ·) = U2, θ(0, ·) = θ0, (1.2)
for simplicity we assume U1, U2 ∈ S(R2,R2) and θ0 ∈ S(R2). We denote by A(ξ) the symbol of
the elastic operator and we set η = ξ/|ξ| ∈ S1. Then some basic examples for our approach are
given as follows. The material constants are always specified in such a way that the matrix A(η)
becomes positive.
Example 1.1. Cubic media in 2D are modelled by
A(η) =
(τ − µ)η21 + µ (λ+ µ)η1η2
(λ+ µ)η1η2 (τ − µ)η22 + µ
(1.3)
with constants τ, µ > 0, −2µ−τ < λ < τ . This case was treated e.g. in [3]. For the corresponding
elastic system see [12].
Example 1.2. Rhombic media in 2D are modelled by
A(η) =
(τ1 − µ)η21 + µ (λ + µ)η1η2
(λ+ µ)η1η2 (τ2 − µ)η22 + µ
(1.4)
http://arxiv.org/abs/0704.0125v3
2 MICHAEL REISSIG AND JENS WIRTH
with constants τ1, τ2, µ > 0 and −2µ−
τ1τ2 < λ <
τ1τ2. For this case we refer also to [4].
Example 1.3. Although it is not the main point of this note, we can consider isotropic media,
where
A(η) = µI + (λ + µ)η ⊗ η
(λ+ µ)η21 + µ (λ+ µ)η1η2
(λ+ µ)η1η2 (λ+ µ)η
2 + µ
(1.5)
with Lamé constants µ > 0 and λ+ µ > 0.
We will present a unified treatment of these cases of (in general) anisotropic thermo-elasticity.
For this we assume that the homogeneous symbol A = A(ξ) = |ξ|2A(η), η = ξ/|ξ|, is given as a
function
A : S1 → C2×2 (1.6)
subject to the conditions
(A1): A is real-analytic in η ∈ S1,
(A2): A(η) is self-adjoint and positive for all η ∈ S1.
For some results we require that
(A3): A(η) has two distinct eigenvalues, # specA(η) = 2.
Under assumption (A3) the direction η ∈ S1 is called (elastically) non-degenerate. In this case we
know that the elasticity equation Utt+A(D)U = 0 is strictly hyperbolic and can be diagonalised
smoothly using a corresponding system of normalised eigenvectors rj(η) to the eigenvalues κj(η)
of A(η).
If (A3) is violated we will call the corresponding directions η ∈ S1 degenerate. For these
directions we can use the one-dimensionality of S1 in connection with the analytic perturbation
theory of self-adjoint matrices (cf. [6]). So we can always find locally smooth eigenvalues κj(η)
and corresponding locally smooth normalised eigenvectors rj(η) of A(η). For the following we
assume for simplicity that these functions extend to global smooth functions on S1.
This classification of directions is not sufficient for a precise study of Lp–Lq decay estimates
for solutions to the thermo-elastic system. It turns out that different microlocal directions
η = ξ/|ξ| ∈ S1 from the phase space have different influence on decay estimates. But how to
distinguish these directions and how to understand their influence? In general, this can be done
by a refined diagonalisation procedure applied to a corresponding first order system (first order
with respect to time). Applying a partial Fourier transform and chosing a suitable energy (of
minimal dimension) this system reads as DtV = B(ξ)V . The properties of the matrix B(ξ) are
essential for our understanding:
• the notions of hyperbolic and parabolic directions depend on the behaviour of the eigen-
values of B(ξ) (see (2.3), (2.5), Definition 1);
• the matrix B(ξ) contains spectral data of A(ξ) together with certain coupling functions
(see (2.6)) between different components of the energy. The behaviour of these coupling
functions close to hyperbolic directions has an essential influence on decay rates (see
Theorems 3.1, 3.5 and 3.6).
It turns out that we have to exclude some exceptional values of the coupling constant γ by
assuming that (see Definition 1 for the notion of hyperbolic directions)
(A4): γ2 6= 2κj0(η̄)− trA(η̄) for all hyperbolic directions η̄ with respect to κj0 .
Basically this implies the non-degeneracy of the 1-homogeneous part of B(ξ). In the following
we will call a hyperbolic direction violating (A4) a γ-degenerate direction. Assumption (A4) is
used for the treatment of small hyperbolic frequencies and plays there a similar rôle like (A3)
for large frequencies.
ANISOTROPIC THERMO-ELASTICITY IN 2D 3
In Section 2 we will give the transformation of the thermo-elastic system (1.1) to a system
of first order and the diagonalisation procedure in detail. The proposed procedure generalises
those from [14], [15], [9], [17]. The obtained results are used to represent solutions of the original
system as Fourier integrals with complex phases. Based on these representations we give micro-
localised decay estimates for solutions in Section 3. They and their method of proof depend
• the classification of directions (to be hyperbolic or parabolic);
• the order of contact between Fresnel curves (coming from the elastic part) and their
tangents for hyperbolic directions;
• the vanishing order of the coupling functions in hyperbolic directions.
Let us formulate some of the results. The first one follows from Theorem 3.1 and Corollary 3.2.
Result. Under assumptions (A1) to (A4) and if the coupling functions vanish to first order in
hyperbolic directions the solutions U(t, x) and θ(t, x) to (1.1) satisfy the Lp–Lq estimate
‖DtU(t, ·)‖q + ‖
A(D)U(t, ·)‖q + ‖θ(t, ·)‖q
. (1 + t)−
) (‖U1‖p,r+1 + ‖U2‖p,r + ‖θ0‖p,r) (1.7)
for dual indices p ∈ (1, 2], pq = p+ q, and Sobolev regularity r > 2(1/p− 1/q).
If the coupling functions vanish to higher order we have to relate their vanishing order ℓ to
the order of contact γ̄ between the Fresnel curve and its tangent in the corresponding direction
and for the corresponding sheet. In Theorems 3.5 and 3.6 we show that in this case the 1/2 in
the exponent is changed to 1/min(2ℓ, γ̄).
Our main motivation to write this paper is to provide a unified way to treat anisotropic
models of thermo-elasiticity. New analytical tools presented in these notes generalise to higher
dimensions and allow to treat especially models in 3D (outside degenerate directions). The two-
dimensional results from [3] for cubic media and [4] for rhombic media are contained / extended;
general anisotropic media can be treated. This is discussed in some detail in the second part [16]
of this note.
Acknowledgements. The first author thanks Prof. Wang Ya-Guang (Shanghai Jiao Tong
University) for the discussion about some basic ideas of the approach presented in this paper
during his stay at the TU Bergakademie Freiberg in August 2004. The stay was supported by
the German-Chinese research project 446 CHV 113/170/0-2.
2. General treatment of the thermo-elastic system
We use a partial Fourier transform with respect to the spatial variables to reduce the Cauchy
problem for (1.1) to the system of ordinary differential equations
Ûtt + |ξ|2A(η)Û + iγξθ̂ = 0, (2.1a)
θ̂t + κ|ξ|2θ̂ + iγξ · Ût = 0, (2.1b)
Û(0, ·) = Û1, Ût(0, ·) = Û2, θ̂(0, ·) = θ̂0 (2.1c)
parameterised by the frequency variable ξ.
We denote by κ1,κ2 ∈ C∞(S1) the eigenvalues of A(η) and by r1, r2 ∈ C∞(S1, S1) corre-
sponding normalised eigenvectors. Both depend in a real-analytic way on η ∈ S1. In a first step
we reduce (2.1) to a first order system. For this we use the diagonaliser of the elastic operator,
i.e. the matrix M(η) = (r1(η)|r2(η)) build up from the normalised eigenvectors, and define
U (0)(t, ξ) =MT (η)Û (t, ξ). (2.2)
4 MICHAEL REISSIG AND JENS WIRTH
Then we define by the aid of
D1/2(η) = diag(ω1(η), ω2(η)), ωj(η) =
κj(η) ∈ C∞(S1), (2.3)
the vector-valued function
V (0)(t, ξ) =
(Dt +D1/2(ξ))U (0)(t, ξ)
(Dt −D1/2(ξ))U (0)(t, ξ)
θ̂(t, ξ)
 , (2.4)
where as usual Dt = −i∂t. It satisfies a first order system with apparently simple structure. A
short calculation yields DtV
(0)(t, ξ) = B(ξ)V (0)(t, ξ) with
B(ξ) =
ω1(ξ) iγa1(ξ)
ω2(ξ) iγa2(ξ)
−ω1(ξ) iγa1(ξ)
−ω2(ξ) iγa2(ξ)
a1(ξ) − iγ2 a2(ξ) −
a1(ξ) − iγ2 a2(ξ) iκ|ξ|
, (2.5)
where we used the coupling functions
aj(ξ) = rj(η) · ξ. (2.6)
For later use we introduce the notation B1(ξ) and B2(ξ) for the homogeneous components of
B(ξ) of order 1 and 2, respectively. The coupling functions aj(η) can be understood as the
co-ordinates of η with respect to the orthonormal eigenvector basis {r1(η), r2(η)}. Therefore, it
holds a21(η) + a
2(η) = 1. Furthermore, they are well-defined and real-analytic functions on S
In the following proposition we collect some information on the characteristic polynomial of
the matrix B(ξ).
Proposition 2.1. (1) trB(ξ) = iκ|ξ|2 and detB(ξ) = iκ|ξ|6 detA(η).
(2) The characteristic polynomial of B(ξ) is given by
det(νI −B(ξ)) =(ν − iκ|ξ|2)(ν2 − κ1(ξ))(ν2 − κ2(ξ))
− νγ2|ξ|2a21(η)(ν2 − κ2(ξ))− νγ2|ξ|2a22(η)(ν2 − κ1(ξ)). (2.7)
(3) An eigenvalue ν ∈ specB(ξ), ξ 6= 0, is real if and only if ν2 = κj0 (ξ) for an index
j0 = 1, 2. If the direction is non-degenerate this is equivalent to aj0(η) = 0.
(4) If aj(η) 6= 0, j = 1, 2 the eigenvalues ν ∈ specB(ξ) satisfy
iκ|ξ|2
a21(ξ)
ν2 − κ1(ξ)
a22(ξ)
ν2 − κ2(ξ)
. (2.8)
It turns out that the property of B(ξ) to have real eigenvalues depends only on the direction
η = ξ/|ξ| ∈ S1. We will introduce a notation.
Definition 1. We call a direction η ∈ S1 hyperbolic if B(ξ) has a real eigenvalue and parabolic
if all eigenvalues of B(ξ) have non-zero imaginary part.
In hyperbolic directions we always have a pair of real eigenvalues. If η ∈ S1 is hyperbolic
with ±ωj0(ξ) ∈ specB(ξ) for ξ = |ξ|η, we call η hyperbolic with respect to the index j0 (or
with respect to the eigenvalue κj0(η) of A(η)) and ν±(ξ) = ±ωj0(ξ) the corresponding pair of
hyperbolic eigenvalues of B(ξ).
A non-degenerate direction is parabolic if and only if aj(η) 6= 0, j = 1, 2, while for non-
degenerate hyperbolic directions one of the coupling functions aj0(η) vanishes. Degenerate di-
rections are always hyperbolic (in 2D), see (2.7).
ANISOTROPIC THERMO-ELASTICITY IN 2D 5
Example 2.1. If the medium is isotropic, A(η) = µI+(λ+µ)η⊗η, the eigenvalues of A are µ and
λ+µ with corresponding eigenvectors η and η⊥. Thus all directions are hyperbolic (with respect
to the second eigenvalue). In this case the matrix B(ξ) decomposes into a diagonal hyperbolic
2 × 2-block and a parabolic 3 × 3-block. This decomposition coincides with the Helmholtz
decomposition as used in the standard treatment of isotropic thermo-elasticity.
Example 2.2. For cubic media (where we assume in addition µ 6= τ and µ + λ 6= 0) there exist
eight hyperbolic directions determined by η1η2 = 0 or η
1 = η
2 . The functions aj(η) have simple
zeros at these directions.
Example 2.3. Weakly coupled cubic media with λ+µ = 0, µ 6= τ , have the degenerate directions
η21 = η
2 , media with µ = τ , λ + µ 6= 0, for η1η2 = 0. In both cases the coupling functions aj(η)
do not vanish in these directions.
If µ = τ = −λ, the elastic system decouples directly into two wave equations with propagation
speed
µ. In this case all directions are degenerate.
Example 2.4. For rhombic media we have to distinguish between three cases.
Case 1. If the material constants satisfy (λ+2µ− τ1)(λ+2µ− τ2) > 0, we are close to the cubic
case and there exist eight hyperbolic directions given by η1η2 = 0 and
η21(λ+ 2µ− τ1) = η22(λ+ 2µ− τ2).
Case 2. If we assume on the contrary that (λ+2µ−τ1)(λ+2µ−τ2) < 0, only the four hyperbolic
directions η1η2 = 0 exist. In the Cases 1 and 2 in each hyperbolic direction one of the coupling
functions aj(η) vanishes to first order.
Case 3. In the borderline case τ1 = λ+2µ or τ2 = λ+2µ, but τ1 6= τ2, three hyperbolic directions
collapse to one. We have the four hyperbolic directions η1η2 = 0, at two of them (ηj = ±1 if
τj = λ+ 2µ) the vanishing order of the coupling function is three.
Rhombic media are degenerate if a) µ = τ1 (or µ = τ2) with degenerate direction (0, 1)
T (or
(1, 0)T ) or b) λ+µ = 0 (weakly coupled case) and (µ−τ1)(µ−τ2) > 0 with degenerate directions
determined by η21(µ− τ1) = η22(µ− τ2) or c) τi = µ = −λ (exceptional case) where all directions
are degenerate.
Proposition 2.2. Let the direction η̄ ∈ S1 be non-degenerate and hyperbolic with respect to the
index j0. Then the corresponding eigenvalues ν±(ξ) satisfy
a2j0(ξ)
ν2±(ξ) − κj0(ξ)
= qη̄(|ξ|) = Cη̄ ∓ iDη̄|ξ| (2.9)
for all non-tangential limits with real constants Cη̄, Dη̄ ∈ R, Dη̄ > 0. Furthermore, the imaginary
part of the hyperbolic eigenvalue satisfies
Im ν±(ξ)
a2j0(η)
D2η̄|ξ|2
C2η̄ + |ξ|2D2η̄
> 0, (2.10)
and thus vanishes like Im ν(|ξ|η) ∼ a2j0(η) as η → η̄ for all ξ 6= 0.
6 MICHAEL REISSIG AND JENS WIRTH
Proof. Let for simplicity j0 = 1. We use the characteristic polynomial of B(ξ) to deduce
γ2|ξ|2a21(η)
ν2± − |ξ|2κ1(η)
iκ|ξ|2
γ2|ξ|2a22(η)
ν2± − |ξ|2κ2(η)
→ 1∓ iκ|ξ|
ω1(η̄)
κ1(η̄)− κ2(η̄)
(2.11)
= 1− γ
κ1(η̄)− κ2(η̄)
︸ ︷︷ ︸
γ2Cη̄
ω1(η̄)
︸ ︷︷ ︸
γ2Dη̄
|ξ| = γ2qη̄(|ξ|). (2.12)
The existence of the limit is implied by ν± 6= 0 and ν2± 6= κ2(ξ) as consequence of ν±(ξ) →
±|ξ|ω1(η) as η → η̄ by Proposition 2.1.
Obviously, Im qη̄(|ξ|) = ∓κ|ξ|/γ2ω1(η̄) = ∓Dη̄|ξ| is non-zero for ξ 6= 0 and considering the
imaginary part of the first limit expression
Im qη̄(|ξ|) = lim
a21(ξ)
ν2±(ξ)− κ1(ξ)
= lim
−2Re ν±(ξ) Im ν±(ξ) a21(ξ)
|ν2±(ξ)− κ1(ξ)|2
= ∓2ω1(|ξ|η̄)|qη̄(|ξ|)|2 lim
Im ν±(|ξ|η)
|ξ|2a21(η)
proves the second statement, limη→η̄
Im ν±(ξ)
a21(η)
Dη̄ |ξ|
2ω1(η̄)(C
η̄+|ξ|
2D2η̄)
In the case of isolated degenerate directions (others are not of interest, because then the
system is decoupled) we can find a replacement for Proposition 2.1.
Proposition 2.3. Let η̄ ∈ S1 be an isolated degenerate direction, κ1(η̄) = κ2(η̄). Then the
corresponding hyperbolic eigenvalues ν±(ξ) satisfy
ω1(ξ)− ν2±(ξ)
ω1(ξ)− ω2(ξ)
= a21(η̄) > 0, (2.13)
and, therefore,
Im ν±(ξ)
ω1(ξ)− ω2(ξ)
= 0, lim
ω1(ξ)− Re ν±(ξ)
ω1(ξ)− ω2(ξ)
= a21(η̄). (2.14)
Thus, if a1(η̄) 6= 0 then the eigenvalues ν±(ξ) approach ±ω1(ξ) at the contact order between
ω1(ξ) and ω2(ξ) (while they approach ±ω1(ξ) with a higher order if a1(ξ) = 0).
2.1. Asymptotic expansion of the eigenvalues as |ξ| → 0. If |ξ| is small the first order part
B1(ξ) dominates B2(ξ), so the properties of the eigenvalues are governed by spectral properties
of B1(ξ).
Proposition 2.4. (1) trB1(ξ) = 0 and detB1(ξ) = 0.
(2) If the direction η ∈ S1 is parabolic the nonzero eigenvalues ν̃ of B1(η) satisfy
γ−2 =
a21(η)
ν̃2 − κ1(η)
a22(η)
ν̃2 − κ2(η)
(2.15)
and are thus real and related to κj(η) by
0 < κ1(η) < ν̃
1(η) < κ2(η) < ν̃
2(η) (2.16)
(if κ1(η) < κ2(η)).
ANISOTROPIC THERMO-ELASTICITY IN 2D 7
(3) If η is non-degenerate and hyperbolic with respect to the index 1 we have κ1(η) = ν̃
1(η),
while for hyperbolic directions with respect to the index 2 three cases occur depending on
the size of the coupling constant γ:
γ2 < κ2(η̄)− κ1(η̄) : κ2(η) = ν̃22 (η),
γ2 = κ2(η̄)− κ1(η̄) : ν̃21(η) = κ2(η) = ν̃22 (η),
γ2 > κ2(η̄)− κ1(η̄) : ν̃21(η) = κ2(η).
(2.17)
(4) If the direction is degenerate, κ1(η) = κ2(η), we have the eigenvalues ±
κ1(η) and
κ1(η) + γ2.
The existence of five distinct eigenvalues of the homogeneous principal part B1(η) for all
parabolic and most hyperbolic directions allows us to calculate the full asymptotic expansion of
the eigenvalues ν(ξ) of B(ξ) as |ξ| → 0. We will give only the first terms in detail, but provide
the whole diagonalisation procedure. Assumption (A4) guarantees the non-degeneracy of B1(ξ).
Note that, even if (A4) is violated the matrix B1(ξ) is diagonalisable (as consequence of its block
structure).
Proposition 2.5. As |ξ| → 0 the eigenvalues of the matrix B(ξ) behave as
ν0(ξ) = iκ|ξ|2b0(η) +O(|ξ|3), (2.18a)
ν±j (ξ) = ±|ξ|ν̃j(η) + iκ|ξ|
2bj(η) +O(|ξ|3) (2.18b)
for all non-γ-degenerate directions, where the functions bj ∈ C∞(S1) are given by
b0(η) =
γ2a21(η)
κ1(η)
γ2a22(η)
κ2(η)
> 0 (2.19)
bj(η) =
1 + γ2a21(η)
ν̃2j + κ1(η)
(ν̃2j − κ1(η))2
+ γ2a22(η)
ν̃2j + κ2(η)
(ν̃2j − κ2(η))2
≥ 0. (2.20)
Furthermore, bj(η) > 0 if η is parabolic and bj0(η) = 0 if η is hyperbolic with respect to the index
Proof. We apply a diagonalisation scheme in order to extract the spectral information for B(ξ).
We assume that the eigenvalues are denoted such that κ1(η) ≤ κ2(η).
Step 1. By Proposition 2.4 we know that the homogeneous first order part B1(η) has the
distinct eigenvalues ν̃0 = 0 and ν̃
j (η) = ±ν̃j(η), which are ordered as κ1(η) ≤ ν̃1(η) ≤ κ2(η) ≤
ν̃2(η) (where equality holds only under the exceptions stated in Proposition 2.4). We denote
corresponding normalised and bi-orthogonal left and right eigenvectors of the matrix B1(η) by
±(η) and e±j (η). If we collect them in the matrices
L(η) = (0e(η)|1e+(η)|1e−(η)|2e+(η)|2e−(η)), (2.21a)
R(η) = (e0(η)|e+1 (η)|e
1 (η)|e
2 (η)|e
2 (η)), (2.21b)
we have L∗(η)R(η) = I and
L∗(η)B1(η)R(η)) = D1(η) = diag(0, ν̃1(η),−ν̃1(η), ν̃2(η),−ν̃2(η)). (2.22)
Further we get
L∗(η)B2(η)R(η) = iκb∗(η)⊗ ∗b(η), (2.23)
where b∗(η) and ∗b(η) are vectors collecting the last entries b0(η), b
j (η) and 0b(η), jb
±(η) of the
eigenvectors e0(η), e
j (η) and 0e(η), je
±(η), respectively.
8 MICHAEL REISSIG AND JENS WIRTH
The matrix
B(0)(ξ) = L∗(η)B(ξ)R(η) = |ξ|D1(η) + |ξ|2iκb∗(η)⊗ ∗b(η) (2.24)
is diagonalised modulo O(|ξ|2) as |ξ| → 0 and has a main part with distinct entries. We denote
R(2)(ξ) = |ξ|2iκb∗ ⊗ ∗b.
Step 2. We construct a diagonaliser of B(0)(ξ) as |ξ| → 0 of the form
Nk(ξ) = I +
|ξ|jN (j)(η). (2.25)
For this we denote the k-homogeneous part of R(k)(ξ) by R̃(k)(ξ) and its entries by R̃
ij (η).
Then we set for k = 1, 2, . . .
Dk+1(η) = diag R̃(k+1)(η), (2.26)
N (k)(η) =
(k+1)
12 (η)
d1(η)−d2(η)
· · · R̃
(k+1)
15 (η)
d1(η)−d5(η)
(k+1)
21 (η)
d2(η)−d1(η)
0 · · · R̃
(k+1)
25 (η)
d2(η)−d5(η)
. . .
(k+1)
51 (η)
d5(η)−d1(η)
(k+1)
52 (η)
d5(η)−d2(η)
· · · 0
, (2.27)
where dj(η) are the entries of D1(η). By construction we have the commutator relation
[D1(η), N (k)(η)] = Dk+1(η)− R̃(k+1)(η), (2.28)
such that
R(k+2)(ξ) = B(0)(ξ)Nk(ξ)−Nk(ξ)
|ξ|jDj(η)
= R(k+1)(ξ)
+ |ξ|kB(0)(ξ)N (k)(η)− |ξ|kN (k)(η)
|ξ|jDj(η) −Nk(ξ)|ξ|k+1Dk+1(η)
= R(2)(ξ)|ξ|kN (k)(η)− |ξ|kN (k)(η)
|ξ|jDj(η)− (Nk(ξ)− I)|ξ|k+1Dk+1(η)
= O(|ξ|k+2).
Using Nk(ξ) − I = O(|ξ|) we see that for |ξ| ≤ ck, ck sufficiently small, the matrix Nk(ξ) is
invertible and
N−1k (ξ)B
(0)(ξ)Nk(ξ) =
|ξ|jDj(η) +O(|ξ|k+2). (2.29)
Thus, the entries of Dj(η) contain the asymptotic expansion of the eigenvalues, while the rows
of Nk(ξ)R(η) (and L(η)N
k (ξ)) give asymptotic expansions of the right (and left) eigenvectors
of B(ξ).
Furthermore, the construction implies that all occurring matrices are smooth functions of
η ∈ S1.
Step 3. We calculate the first terms explicitly. For this we need the diagonal entries of the
matrix b∗ ⊗ ∗b. Therefore, we determine the left and right eigenvectors of B1(η). If we assume
ANISOTROPIC THERMO-ELASTICITY IN 2D 9
that the direction η is non-degenerate we get for e0(η) = (r
1 , r
1 , r
2 , r
2 , r0)
T and 0e(η) =
(ℓ+1 , ℓ
1 , ℓ
2 , ℓ
2 , ℓ0)
T the equations
±r±j ωj + iγajr0 = 0, ±ℓ
j ωj −
ajℓ0 = 0, (2.30a)
1 + a2r
2 + a1r
1 + a2r
2 = 0, a1ℓ
1 + a2ℓ
2 + a1ℓ
1 + a2ℓ
2 = 0, (2.30b)
together with the normalisation condition
r+1 ℓ
1 + · · · r
2 + r0ℓ0 = 1. (2.31)
The first equations imply the representation
±r±j (η) = −
iγaj(η)
ωj(η)
r0(η), ±ℓ±j (η) =
iγaj(η)
2ωj(η)
ℓ0(η), (2.32)
the second line of equations follows from the first, while the normalisation condition yields
b0(η) = r0(η) ℓ0(η) =
γ2a21(η)
κ1(η)
γ2a22(η)
κ2(η)
6= 0. (2.33)
To calculate the eigenvectors we can further require r0(η) = ℓ0(η) =
b0(η) > 0.
Similarly, we obtain for the eigenvectors e+k (η) = (r
1 , . . . , r
2 , r0)
T and ke
+(η) = (ℓ+1 , . . . , ℓ
2 , ℓ0)
the equations (we use the same notation as above in the hope that this will not lead to confusion
here)
±r±j ωj + iγajr0 = ±ν̃kr
j , ±ℓ
j ωj −
ajℓ0 = ±ν̃k(η)ℓ±j (2.34)
together with the normalisation condition. Thus for parabolic directions we get
±r±j (η) =
iγaj(η)
ν̃k(η)− ωj(η)
r0(η), ±ℓ±j (η) = −
iγaj(η)
2(ν̃k(η)− ωj(η))
ℓ0(η), (2.35)
and hence
b+k (η) = r0(η) ℓ0(η) =
j=1,2
γ2a2j(η)
2(ν̃k(η)− ωj(η))2
j=1,2
γ2a2j(η)
2(ν̃k(η) + ωj(η))2
1 + γ2a21(η)
ν̃2k(η) + κ1(η)
(ν̃2k(η) − κ1(η))2
+ γ2a22(η)
ν̃2k(η) + κ2(η)
(ν̃2k(η)− κ2(η))2
. (2.36)
For e−k (η) and ke
−(η) we have to replace ν̃k(η) by −ν̃k(η) and obtain b−k (η) = b
k (η) = bk(η).
If the direction η is non-degenerate and hyperbolic with respect to the index j0, the entries r
and ℓ±j0 are undetermined by (2.34), while the other entries of the vectors are zero. Together with
the normalisation condition this determines the eigenvectors and gives bj0(η) = 0. It remains to
consider degenerate directions. Then we have ν̃1 = ω1 and ν̃2 > ω1 such that for k = 1 we have
ℓ0 = r0 = 0, r
1 and ℓ
1 are non-zero while r
2 = ℓ
2 = 0, especially b1(η) = 0. For k = 2 we get
from the above expression for b2(η) = (2 + 2κ
2/γ2)−1 > 0. �
Remark. 1. Note that for all non-degenerate hyperbolic directions η̄ ∈ S1 with respect to the
index 1 the limit
a21(η)b
1 (η) =
κ1(η̄)
κ1(η̄)− κ2(η̄)
(2.37)
is taken and non-zero, while for hyperbolic directions with respect to the index 2 the corre-
sponding limit is non-zero only if γ2 6= κ2(η̄) − κ1(η̄), i.e. if the direction is not γ-degenerate.
Near γ-degenerate directions Step 1 of the previous proof is still valid. Similar to Step 2 we
10 MICHAEL REISSIG AND JENS WIRTH
can diagonalise to a (2, 2, 1) block structure. The eigenvalues of these blocks can be calculated
explicitely.
2. For degenerate directions we obtain similarly
b1(η)
(κ1(η)− κ2(η))2
a21(η̄)a
2(η̄)
2γ2κ1(η̄)
(2.38)
and b1(η) vanishes to the double contact order.
2.2. Asymptotic expansion of the eigenvalues as |ξ| → ∞. If we consider large frequencies
the second order part B2(ξ) dominates B1(ξ). This makes it necessary to apply a different two-
step diagonalisation scheme. We follow partly ideas from [9], [14], [15] adapted to our special
situation.
Proposition 2.6. As |ξ| → ∞ the eigenvalues of the matrix B(ξ) behave as
ν0(ξ) = iκ|ξ|2 −
+O(|ξ|−1), (2.39a)
ν±j (ξ) = ±|ξ|ωj(η) +
a2j (η) +O(|ξ|−1). (2.39b)
for all non-degenerate directions ξ/|ξ| ∈ S1.
Remark. 1. Note that, while in hyperbolic directions we always have ν±j0(ξ) = ±|ξ|ωj0(η) for
one index j0, in degenerate hyperbolic directions all aj(η) may be non-zero. Hence the statement
of the above theorem cannot be valid in such directions in general.
2. Degenerate directions play for large frequencies a similar rôle as γ-degenerate directions play
for small frequencies.
Proof. The proof will be decomposed into several steps. In a first step we use the main part
B2(ξ) = iκ|ξ|2 diag(0, 0, 0, 0, 1) to block-diagonalise B(ξ). In a second step we diagonalise the
upper 4× 4 block for all non-degenerate directions.
Step 1. For a matrix B ∈ C5×5 we denote by b-diag4,1B the block diagonal of B consisting
of the upper 4 × 4 block and the lower corner entry. We construct a diagonalisation scheme to
block-diagonalise B(ξ) as |ξ| → ∞.
We set R(−1)(ξ) = B(ξ) − B2(ξ) = B1(ξ) and B−2(ξ) = B2(ξ) and construct recursively a
diagonaliser modulo the upper 4× 4 block,
Mk(ξ) = I +
|ξ|−jM (j)(η). (2.40)
Again we denote by R̃(k)(η) the (−k)-homogeneous part of R(k)(ξ) (which exists because it exists
for R(−1)(ξ) and the existence is transfered by the construction). Then we introduce the recursive
scheme
Bk−2(η) = b-diag4,1 R̃(k−2)(η), (2.41)
M (k)(η) =
(k−2)
15 (η)
(k−2)
45 (η)
−R̃(k−2)51 (η) · · · −R̃
(k−2)
54 (η) 0
(2.42)
for k = 1, 2, . . ., such that the commutator relation
[B−2(η),M (k)(η)] = Bk−2(η)− R̃(k−2)(η), (2.43)
ANISOTROPIC THERMO-ELASTICITY IN 2D 11
holds. Thus it follows
R(k−1)(ξ) = B(ξ)Mk(ξ)−Mk(ξ)
|ξ|−jBj(η) = O(|ξ|1−k) (2.44)
and using that Mk(ξ) is invertible for |ξ| ≥ Ck, Ck sufficiently large, we obtain the block diago-
nalisation
M−1k (ξ)B(ξ)Mk(ξ) =
|ξ|−jBj(η) +O(|ξ|1−k), (2.45)
where Bj(η) = b-diag4,1 Bj(η) is (4, 1)-block diagonal.
Step 2. By Step 1 we constructed Mk(ξ) such that M
k (ξ)B(ξ)Mk(ξ) is (4, 1)-block diagonal
modulo O(|ξ|1−k). The upper 4×4 block has already diagonal main part |ξ|−1D−1(η) = B1(ξ). If
the direction η = ξ/|ξ| is non-degenerate, the diagonal entries ±ω1(η) and ±ω2(η) are mutually
distinct and thus we can apply the standard diagonalisation procedure (cf. proof of Proposi-
tion 2.5) in the corresponding subspace. This does not alter the lower corner entry and gives
only combinations of the entries of the last column and of the last row (without changing their
asymptotics).
Thus we can construct a matrix Nk−1(ξ) = I +
j=1 |ξ|−jN (j)(ξ), which is invertible for
|ξ| > C̃k−1, C̃k−1 sufficiently large, such that
N−1k−1(ξ)M
k (ξ)B(ξ)Mk(ξ)Nk−1(ξ) = |ξ|
2B−2(η) +
|ξ|−jDj(η) +O(|ξ|1−k) (2.46)
is diagonal modulo O(|ξ|1−k).
Step 3. We give the first matrices explicitly. Following Step 1 we get
M (1)(η) =
γa1(η)
γa2(η)
γa1(η)
γa2(η)
γa1(η)
γa2(η)
γa1(η)
γa2(η)
(2.47)
together with B−1(η) = diag(ω1(η), ω2(η),−ω1(η),−ω2(η), 0) and
R̃(0)(η) = B1(η)M
(1)(η)−M (1)(η)B−1(η)
γ2a21(η)
γ2a1(η)a2(η)
· · · γa1(η)ω1(η)
γ2a1(η)a2(η)
γ2a22(η)
· · · γa2(η)ω2(η)
. . .
γa1(η)ω1(η)
γa2(η)ω2(η)
· · · −iγ
, (2.48)
B0(η) = b-diag4,1 R̃(0)(η) (2.49)
in the first diagonalisation step. Applying a second step alters only the last row and column to
O(|ξ|−1). Following Step 2 we diagonalise the upper 4× 4 block to |ξ|B−1(η) +B0(η) +O(|ξ|−1)
modulo O(|ξ|−1) and the statement is proven. �
Remark. If the direction η is degenerate, i.e. ω1(η) = ω2(η), we can block-diagonalise in Step
2 to (2, 2, 1)-block form. To diagonalise further we have to know that the 0-homogeneous part
of these 2× 2-blocks has distinct eigenvalues.
12 MICHAEL REISSIG AND JENS WIRTH
One possible treatment of degenerate directions is given in the following proposition. Note
that a corresponding statement can be obtained for γ-degenerate directions as |ξ| → 0.
Proposition 2.7. Let η̄ be an isolated degenerate direction. Then the corresponding hyperbolic
eigenvalue satisfies in a small conical neighbourhood of η̄
ν−j0(ξ) =
ω1(ξ) + ω2(ξ)
(ω1(ξ)− ω2(ξ))2
iγ2(ω1(ξ)− ω2(ξ))(a21(η)− a22(η))
+O(|ξ|−1). (2.50)
Proof. We follow the treatment of the previous proof to (2, 2, 1)-block-diagonalise B(ξ) modulo
|ξ|−1. Now, we consider one of its 2× 2-blocks. (We use a similar notation as before in the hope
that it will not lead to any confusion.) Such a block is given by
B(ξ) = |ξ|B−1(η) + B0(η) +O(|ξ|−1), (2.51)
where
B−1(η) = diag
ω1(η), ω2(η)
, (2.52)
B0(η) =
a21(η) a1(η)a2(η)
a1(η)a2(η) a
. (2.53)
In the direction η̄ both diagonal entries of B−1 coincide. In a small conical neighbourhood we
denote the eigenvalues of |ξ|B−1(η) + B0(η) as δ+(ξ) and δ−(ξ). A simple calculation yields
δ±(ξ) =
ω1(ξ) + ω2(ξ)
(ω1(ξ)− ω2(ξ))2
iγ2(ω1(ξ)− ω2(ξ))(a21(η)− a22(η))
(2.54)
with δ−(ξ̄) = ω1(ξ̄) and δ+(ξ̄) = ω1(ξ̄) +
. The hyperbolic eigenvalue corresponds to δ−(ξ).
These eigenvalues are distinct in a sufficiently small neighbourhood of η̄ (and may coincide only if
a1(η) = a2(η) and (ω1(ξ)−ω2(ξ))2 = γ4/(4κ2), which gives eventually two parabolic directions).
Hence the perturbation theory of matrices implies ν+j0(ξ) = δ−(ξ) +O(|ξ|
−1) in a sufficiently
small neighbourhood of η̄ and the statement is proven. �
Remark. Note, that
ω1(ξ)− δ−(ξ)
ω1(ξ)− ω2(ξ)
= a21(η̄) (2.55)
for all fixed |ξ|, which coincides with the result (2.13) of Proposition 2.3 for the eigenvalue ν(ξ).
2.3. Collecting the results. The asymptotic expansions from Propositions 2.5 and Proposi-
tion 2.6 imply estimates for eigenvalues of B(ξ) and the proofs give representations of corre-
sponding eigenvectors. For the application of multiplier theorems and the proof of Lp–Lq decay
estimates it is essential to provide also estimates for derivatives of them.
Assume that the eigenvalues under consideration are simple. From the asymptotic expansions
we know that this is the case for small frequencies and also for large frequencies. For the middle
part it will be sufficient to know that the hyperbolic eigenvalues are separated, which follows for
sufficiently small conical neighbourhoods of these directions.
ANISOTROPIC THERMO-ELASTICITY IN 2D 13
In a first step we consider derivatives of the eigenvalues. Differentiating the characteristic
polynomial
0 = det(ν(ξ)I −B(ξ)) =
Ik(ξ)ν(ξ)
k (2.56)
with respect to ξ yields by Leibniz formula
ξ Ik(ξ)
ξ ν(ξ)
(2.57)
for all multi-indices α ∈ N20. Thus we can express the highest derivative of ν(ξ)k in terms of
lower ones and hence Faà di Bruno’s formula (see e.g. [5]) yields an expression
Dαν(ξ)
kIk(ξ)ν(ξ)
k−1 =
Ck,α,β
Dα−βIk(ξ)
Dβν(ξ)
(2.58)
with certain constants Ck,α,β . Because the eigenvalue has multiplicity one, the sum on the left-
hand side is nonzero and therefore we can calculate the derivatives of ν(ξ) by this expression.
Furthermore, it follows that for small and large frequencies the derivatives of the eigenvalue have
full asymptotic expansions and thus we are allowed to differentiate the asymptotic expansions
term by term.
It remains to consider the corresponding eigenprojections. Recall that if ν(ξ) is a eigenvalue of
multiplicity one and r(ξ) and l(ξ) are corresponding right and left eigenvectors, the correspond-
ing eigenprojection is given by the dyadic product Pν(ξ) = l(ξ) ⊗ r(ξ). Thus the constructed
diagonaliser matrices imply asymptotic expansions of these operators. Again we are only inter-
ested whether the derivatives of these eigenprojections also possess asymptotic expansions (in
order to see whether it is allowed to differentiate term by term).
For this we use the representation
Pν(ξ) =
ν̃∈specB(ξ)\{ν}
(ν̃(ξ)I −B(ξ))(ν̃(ξ)− ν(ξ))−1 (2.59)
given e.g. in [4], [7]. All terms on the right-hand side have full asymptotic expansions as |ξ| → 0
and |ξ| → ∞ together with all of their derivatives. Differentiating with respect to ξ yields the
same result for the eigenprojection. Thus we obtain
Proposition 2.8. The asymptotic expansions from Proposition 2.5 and Proposition 2.6 may be
differentiated term by term to get asymptotic expansions for the derivatives of the eigenvalues.
Furthermore, the same holds true for the corresponding eigenprojections.
From Proposition 2.1 we know that an eigenvalue ν(ξ) of the matrix B(ξ) is real if and only if
η = ξ/|ξ| is hyperbolic. We want to combine this information with the asymptotic expansions of
Proposition 2.5 and Proposition 2.6 and derive some estimates for the behaviour of the imaginary
part.
Proposition 2.9. Let c > 0 be a given constant.
(1) Let η = ξ/|ξ| ∈ S1 be parabolic. Then the eigenvalues of B(ξ) satisfy
Im ν(ξ) ≥ Cη > 0, |ξ| ≥ c, (2.60)
with a constant Cη depending on the direction η and c. Furthermore,
Im ν(ξ) ∼ b(η)|ξ|2, |ξ| ≤ c, (2.61)
where b(η) is one of the functions from Proposition 2.5.
14 MICHAEL REISSIG AND JENS WIRTH
(2) Let η̄ be non-degenerate and hyperbolic with respect to the index 1. Then ν±1 (|ξ|η̄) =
±|ξ|ω1(η̄) and ν0(ξ) and ν±2 (ξ) satisfy the statement of point 1. Furthermore,
Im ν±1 (ξ) ∼ a21(η), |ξ| ≥ c, |η − η̄| ≪ 1, (2.62)
Im ν±1 (ξ) ∼ |ξ|2a21(η), |ξ| ≤ c, |η − η̄| ≪ 1. (2.63)
Proof. The first point follows directly from the asymptotic expansions, we concentrate on the
second one. We know that the hyperbolic eigenvalues ν±1 (ξ) satisfy by Proposition 2.2
Im ν±1 (ξ) = a
1(η)N
1 (ξ) (2.64)
with a smooth non-vanishing function N±1 (ξ) defined in a neighbourhood of η̄. By Proposition 2.5
and 2.6 we see that N±1 (ξ) also has full asymptotic expansions and thus
N±1 (ξ) =
+O(|ξ|−1), |ξ| → ∞ (2.65a)
N±1 (ξ) = iκ|ξ|
2 b1(η)
a21(η)
+O(|ξ|3), |ξ| → 0. (2.65b)
Together with (2.37) we get upper and lower bounds on N±1 (ξ) and the desired statement follows.
Remark. A similar reasoning allows to replace the hyperbolic eigenvalue near degenerate di-
rections by the model expression obtained in Proposition 2.7, thus ν(ξ) ∼ δ−(ξ) uniformly in a
sufficiently small conical neighbourhood of η̄.
3. Decay estimates for solutions
Our strategy to give decay estimates for solutions to the thermo-elastic system (1.1) is to
micro-localise them. In principle we have to distinguish four different cases. On the one hand
we differentiate between small and large frequencies, on the other hand between hyperbolic
directions and parabolic ones.
We distinguish between two cases depending on the vanishing order of the coupling functions.
If the coupling functions vanish to first order at hyperbolic directions only, we rely on simple
multiplier estimates. Later on we discuss coupling functions with higher vanishing order, where
the decay rates are obtained by tools closely related to the treatment of the elasticity equation.
3.1. Coupling functions with simple zeros. In a first step we consider the first order system
DtV = B(D)V, V (0, ·) = V0 (3.1)
to Cauchy data V0 ∈ S(R2,C5). For a cut-off function χ ∈ C∞(R+) with χ(s) = 0, s ≤ ǫ, and
χ(s) = 1, s ≥ 2ǫ, we consider
Ppar(η) =
η̄ hyperbolic
χ(|η − η̄|), Phyp(η) = 1− Ppar(η). (3.2)
Then Phyp(D) localises in a conical neighbourhood of the set of hyperbolic directions, while
Ppar(D) localises to a compact set of parabolic directions.
The asymptotic formulae and representations for the characteristic roots of the full symbol
B(ξ) allow us to proof decay estimates for the solutions.
ANISOTROPIC THERMO-ELASTICITY IN 2D 15
Theorem 3.1. Assume that (A1) to (A4) are satisfied and the coupling functions vanish in
hyperbolic directions to first order.
Then the solution V (t, x) to (3.1) satisfies the following a-priori estimates:
‖χ(D)Ppar(D)V (t, ·)‖q . e−Ct‖V0‖p,r (3.3a)
‖(1− χ(D))Ppar(D)V (t, ·)‖q . (1 + t)−(
)‖V0‖p (3.3b)
‖χ(D)Phyp(D)V (t, ·)‖q . (1 + t)−
)‖V0‖p,r (3.3c)
‖(1− χ(D))Phyp(D)V (t, ·)‖q . (1 + t)−
)‖V0‖p (3.3d)
for dual indices p ∈ (1, 2], pq = p+ q, and with Sobolev regularity r > 2(1/p− 1/q).
Remark. If B(ξ) is diagonalisable for ξ 6= 0 (which is valid e.g. if B(ξ) has no double eigenvalues
for ξ 6= 0) we can make the result even more precise. To each eigenvalue ν(ξ) ∈ specB(ξ) we
have corresponding left and right eigenvectors and associated to them the eigenprojection Pν(D)
such that
Pν(D)V (t, ·) = eitν(D)Pν(D)V0. (3.4)
Thus, we can single out the influence of one eigenvalue in this way. Note, that this is only of
interest in the neighbourhood of hyperbolic directions and for the corresponding eigenvalue and
there the assumption of diagonalisability of B(ξ) may be skipped (real eigenvalues are always
simple, for small |ξ| diagonalisation works, for large |ξ| everything goes well under the assumption
of non-degeneracy and on the middle part we make the neighbourhood small enough to exclude
possible multiplicities).
Proof. We decompose the proof into four parts corresponding to the four estimates. The micro-
localised estimates are merely standard multiplier estimates. We do not use stationary phase
method.
Step 1. Parabolic directions, large frequencies. In this case we have uniformly in ξ ∈ suppχPpar ,
the estimate Im ν(ξ) ≥ C′ > 0. Taking 0 < C < C′ we obtain for these ξ
ImB(ξ) =
B(ξ)−B∗(ξ)
in the sense of self-adjoint operators and the estimate follows from the L2–L2 estimate
‖χ(D)Ppar(D)V (t, x)‖22 =
‖χ(ξ)Ppar(ξ)V̂ (t, ξ)‖22
= 2Re
χ(ξ)Ppar(ξ)V̂ (t, ξ), χ(ξ)Ppar(ξ)∂tV̂ (t, ξ)
= −2 Im
χ(ξ)Ppar(ξ)V̂ (t, ξ), χ(ξ)Ppar(ξ)B(ξ)V̂ (t, ξ)
≤ −2C‖χ(ξ)Ppar(ξ)V̂ (t, ξ)‖2 = −2C‖χ(D)Ppar(D)V (t, x)‖22,
viewed as Hs–Hs estimate and combined with Sobolev embedding.
Step 2. Parabolic directions, small frequencies. We know from Proposition 2.5 that in this case
the matrix B(ξ) has only simple eigenvalues. We will make use of the representation of solutions
V (t, x) =
ν(ξ)∈specB(ξ)
eitν(D)Pν(D)V0(x) (3.5)
with corresponding eigenprojections (amplitudes) Pν(ξ). The amplitudes are uniformly bounded
on the set of all occurring ξ and possess full asymptotic expansions in |ξ| as ξ → 0 together with
their derivatives. Especially, by Hörmander-Mikhlin multiplier theorem [10, p. 96, Theorem 3]
the operators Pν(D) are L
p-bounded for 1 < p <∞.
16 MICHAEL REISSIG AND JENS WIRTH
It remains to consider the model multiplier eitν(ξ). From Proposition 2.5 we know that
|eitν(ξ)| . e−Ct|ξ|
, |ξ| ≤ c, (3.6)
with suitable constants c and C and thus the L1–L∞ estimate
‖eitν(D)f‖∞ ≤ ‖eitν(ξ)f̂‖1 ≤ ‖eitν(ξ)‖L1({|ξ|≤c})‖f̂‖∞
. ‖f‖1
e−Ct|ξ|
|ξ|d|ξ| . (1 + t)−1 ‖f‖1
holds for all f ∈ L1(R2) with supp f̂ ⊆ {|ξ| ≤ c}. Riesz-Thorin interpolation [1, Chapter 4.2]
with the obvious L2–L2 estimate gives the desired decay result.
Step 3. Hyperbolic directions, large frequencies. We take the conical neighbourhoods small
enough to exclude all multiplicities (related to the eigenvalue which becomes real in the hyperbolic
direction). Then similar to (3.5) the solution is represented as
V (t, x) = eitν(D)P+ν (D)V0(x) + e
−itν(D)P−ν (D)V0(x) + Ṽ (t, x), (3.7)
where Ṽ (t, x) corresponds to the remaining parabolic eigenvalues of B(ξ) and satisfies the es-
timate from Step 1. Again we can use smoothness of P±ν (ξ) together with the existence of a
full asymptotic expansion as |ξ| → ∞ to get Lp-boundedness of P±ν (D) for 1 < p < ∞ from
Hörmander-Mikhlin multiplier theorem.
It remains to understand the model multiplier e±itν(ξ) for the hyperbolic eigenvalue related
to the hyperbolic direction η̄. Using the estimate from Proposition 2.9 we conclude for r > 2
‖e±itν(D)f‖∞ ≤ ‖e±itν(ξ)f̂‖1 ≤ ‖e±itν(ξ)|ξ|−r‖L1(S1)‖ |ξ|
r f̂‖∞
. ‖〈D〉rf‖1
|ξ|1−rd|ξ|
e−C1tφ
. t−1/2 ‖〈D〉rf‖1, t ≥ 1
for all f ∈ 〈D〉−rL1(R2) with supp f̂ ⊆ S1 = {|ξ| ≥ c, |η − η̄| ≤ ǫ}. Riesz-Thorin interpolation
with the L2–L2 estimate gives the desired decay result.
Step 4. Hyperbolic directions, small frequencies. Like for large hyperbolic frequencies we make use
of the representation (3.7) to separate hyperbolic and parabolic influences. As in the previous
cases the existence of full asymptotic expansions imply that the projections P±ν (D) are L
bounded for 1 < p <∞.
It remains to understand the model multiplier e±itν(ξ). Using Proposition 2.9 we have |e±itν(ξ)| .
e−C2t|ξ|
2φ2 such that after introducing polar co-ordinates
‖e±itν(D)f‖∞ ≤ ‖e±itν(ξ)f̂‖1 ≤ ‖e±itν(ξ)‖L1(S2)‖f‖1
. ‖f‖1
e−C2tφ
2|ξ|2dφ|ξ|d|ξ|
. t−1/2‖f‖1
d|ξ| . t−1/2 ‖f‖1, t ≥ 1
for all f ∈ L1(R2) with supp f̂ ⊆ S2 = {|ξ| ≤ c, |η − η̄| ≤ ǫ}. Riesz-Thorin interpolation with
the L2–L2 estimate gives the desired decay result. �
Remark. 1. In non-degenerate hyperbolic directions where the coupling function vanishes to
order ℓ (cf. Example 2.4 case 3) we obtain by the same reasoning the weaker Lp–Lq decay rate
‖χ(D)Phyp(D)V (t, ·)‖q . (1 + t)−
)‖V0‖p,r. (3.8)
ANISOTROPIC THERMO-ELASTICITY IN 2D 17
It remains to understand whether this weaker decay rate is also sharp or whether an application
of stationary phase method may be used to improve this. We will discuss this in Section 3.2.
2. We can extend the estimate of Theorem 3.1 to the limit case p = 1, if we include the eigen-
projections Pν(ξ) into the considered model multiplier and use just their boundedness instead
of Hörmander-Mikhlin multiplier theorem. This is possible, because all the multiplier estimates
were based on Hölder inequalities.
So far we understood properties of solutions to the transformed problem (3.1) for the vector-
valued function V (t, x) given by (2.4) as
V (t, x) =
(Dt +D1/2(D))U (0)(t, x)
(Dt −D1/2(D))U (0)(t, x)
θ(t, x)
 , (3.9)
where U (0)(t, x) =MT (D)U(t, x) is the elastic displacement after transformation with the diag-
onaliser of the elastic operator. Because M(η) is unitary and homogeneous of degree zero this
diagonaliser is Lp-bounded for 1 < p <∞ with bounded inverse. Thus we have
DtU(t, x) =M(D)
V (t, x) (3.10)
A(D)U(t, x) =M(D)
0 − 1
0 − 1
V (t, x) (3.11)
such that as corollary of Theorem 3.1 we obtain
Corollary 3.2. Assume (A1) to (A4) and that the coupling functions vanish of first order in all
hyperbolic directions. Then the solution U(t, x) and θ(t, x) to (1.1) satisfy the a-priori estimates
‖DtU(t, ·)‖q + ‖
A(D)U(t, ·)‖q + ‖θ(t, ·)‖q
. (1 + t)−
) (‖U1‖p,r+1 + ‖U2‖p,r + ‖θ0‖p,r) (3.12)
for dual indices p ∈ (1, 2], pq = p+ q, and Sobolev regularity r > 2(1/p− 1/q).
Remark. Including the diagonaliser M(ξ) into the model multipliers of Theorem 3.1 allows to
overcome the restriction on p and to extend this statement up to p = 1. We preferred this way
of presenting the results because they allow to decouple both statements. Theorem 3.1 gives
deeper insight into the asymptotic behaviour of solutions than Corollary 3.2 does.
3.2. Coupling functions vanishing to higher order. We have seen that under the assump-
tion that the coupling functions aj(η) ∈ C∞(S1), aj(η) = η · rj(η) related to the symbol of the
elastic operator A(η) have only zeros of first order, we can deduce Lp–Lq decay estimates without
relying on the method of stationary phase.
Now we want to discuss how to use the method of stationary phase to deduce decay estimates
in the remaining cases. From Proposition 2.2 we know that in hyperbolic directions the imaginary
part Im ν±j0(ξ) of the hyperbolic eigenvalues vanishes like the square of the corresponding coupling
function a2j0(η), while the real part Re ν
(ξ) is essentially described by ±ωj0(ξ). First, we make
this more precise and formulate an estimate for Re νj0(ξ)
± ∓ ωj0(ξ) and its derivatives.
Proposition 3.3. Let η̄ ∈ S1 be hyperbolic with respect to the index j0 and aj0(η) vanish to
order ℓ in η̄. Then there exists a conical neighbourhood of the direction η̄ such that for all
k = 0, 1, . . . , 2ℓ− 1 the estimates
∣∂kη (Re ν
(ξ) ∓ ωj0(ξ))
∣ ≤ c(|ξ|)|η − η̄|2ℓ−k, (3.13a)
∣∂kη Im ν
∣ ≤ c(|ξ|)|η − η̄|2ℓ−k, (3.13b)
18 MICHAEL REISSIG AND JENS WIRTH
hold uniformly on it, where c(|ξ|) ∼ 1 as |ξ| → ∞ and c(|ξ|) ∼ |ξ| as |ξ| → 0.
This estimate may be used to transfer the micro-localised decay estimate for the elasticity
equation to the thermo-elastic system on a sufficiently small conical neighbourhood of η̄. We
need one further notion to prepare the main theorem of this section. Decay rates for solutions to
the elasticity equation depend heavily on the order of contact between the sheets of the Fresnel
surface and its tangents, cf. [13] or point 2 of the concluding remarks on page 21.
Proposition 3.4. For η ∈ S1 we denote by γ̄j(η) the order of contact between the j-th sheet
Sj = {ω−1j (η)η | η ∈ S
1} (3.14)
of the Fresnel curve and its tangent in the point ω−1j (η)η. Then
∂kη (∂
ηωj(η) + ωj(η)) = 0, k = 0, . . . , γ̄j(η)− 2, (3.15)
(if γ̄j(η) > 2) and
∂γ̄j(η)η ωj(η) + ∂
γ̄j(η)−2
η ωj(η) 6= 0. (3.16)
Proof. Follows by straight-forward calculation. The curvature of the j-th Fresnel curve Sj at the
point ω−1j (η)η factorises as ωj(η) + ∂
ηωj(η) and a smooth non-vanishing analytic function. �
For the following we chose the conical neighbourhood of the hyperbolic direction η̄ (with
respect to the index j0) small enough in order that the curvature of Sj0 vanishes at most in η̄.
Theorem 3.5. Let η̄ be hyperbolic with respect to the index j0 and let aj0(η) vanish in η̄ to order
ℓ > 1. Let us assume further that γ̄j0(η̄) < 2ℓ. Then
||e±itνj0 (D)f ||q . (1 + t)
)||f ||p,r (3.17)
for dual indices p ∈ [1, 2], pq = p + q, regularity r > 2(1/p − 1/q) and for all f with supp f̂
contained in a sufficiently small conical neighbourhood of η̄.
Proof. We distinguish between two cases related to the Fourier support of f , for general f we
can use linearity. In both cases we apply the method of stationary phase. Key lemma will be
the Lemma of van der Corput, cf. [11, p.334], combined for large frequencies with a dyadic
decomposition. It suffices to consider t ≥ 1, the estimate for t ≤ 1 follows from the uniform
boundedness of the Fourier multiplier together with Sobolev embedding using the regularity
imposed on the data, r > 2(1/p− 1/q).
In the following we skip the index j0 of the eigenvalue. Thus ω(η) stands for ωj0(η) and γ̄(η)
for γ̄j0(η) to shorten the notation.
Large and medium frequencies. Assume that f̂ is supported in a sufficiently small conical neigh-
bourhood of η̄ bounded away from zero, |ξ| > 1. We will use a dyadic decomposition in the radial
variable. Let for this χ ∈ C∞0 (R) be chosen in such a way that χ(R) = [0, 1], suppχ ⊆ [1/2, 2]
j∈Z χ(2
js) = 1 for all s ∈ R+. We set χj(s) = χ(2−js) such that suppχj ⊆ [2j−1, 2j+1].
We follow the treatment of Brenner [2] and use Besov spaces. Due to our assumptions on the
Fourier support of f Besov norms are given by
||f ||Brp,q =
∥2jr||χj(|D|)f ||p
ℓq(N0)
(3.18)
and corresponding Besov spaces are related to Sobolev and Lp-spaces by the embedding relations
Hp,r →֒ Brp,2, B0q,2 →֒ Lq (3.19)
ANISOTROPIC THERMO-ELASTICITY IN 2D 19
for p ∈ (1, 2] and q ∈ [2,∞). Thus it suffices to prove an Brp,2 → B0q,2 estimate for dual p and q.
The case p = q = 2, r = 0 is trivial by Plancherel and the uniform boundedness of the multiplier,
we concentrate on the B21,2 → B0∞,2 estimate. Therefore, we use
||e±itν(D)f ||B0
||χj(D)e±itν(D)f ||2∞
≤ sup
Ij(t) ||f ||Br
, (3.20)
where Ij(t) are estimates of the dyadic components of the operator,
Ij(t) = sup
|F−1[eitν(ξ)ψ(ξ)χj(|ξ|)|ξ|−r]|. (3.21)
Here ψ(ξ) localises to a small conical neighbourhood of η̄ bounded away from zero, |ξ| > 1,
containing the support of f̂ . We assume |ψ(ξ)|+ |∂ηψ(ξ)|+ |∂|ξ|ψ(ξ)| . 1 and ψ ≡ 1 on a smaller
conic set around η̄. We introduce polar co-ordinates |ξ| and φ (with φ = 0 corresponding to η̄)
and set x = tz. Using that the hyperbolic directions η̄ ∈ S1 are isolated we can assume that
the conical neighbourhood under consideration contains no further hyperbolic direction. For the
calculation of Ij(t) we integrate over the interval [−ǫ, ǫ] for φ. This yields for all j ∈ N0
Ij(t) = sup
∫ 2j+1
eit(Re ν(ξ)+z·ξ)−t Im ν(ξ)ψ(ξ)χj(|ξ|)|ξ|1−rdφd|ξ|
= 2j(2−r) sup
jt(Re ν(2jξ)2−j+z·ξ)e−t Im ν(2
jξ)ψ(2jξ)χ(|ξ|)|ξ|1−rdφd|ξ|
Due to Proposition 3.3 (let us restrict to the − case) we have
Re ν(2jξ) = 2jω(ξ) + 2j|ξ|α(2jξ), (3.22)
where
∣∂kφα(2
∣ ≤ Ck |φ|2ℓ−k, k = 0, 1, . . . , 2ℓ− 1. (3.23)
By our assumptions 2ℓ− 1 ≥ γ̄(η̄). We use this to reduce the problem to properties of the elastic
eigenvalue ωj(η) and consider
Ij(t) = 2
j(2−r) sup
j |ξ|(ω(η)+z·η+α(2jξ))e−t Im ν(2
jξ)ψ(2jξ)dφχ(|ξ|)|ξ|1−rdξ
. (3.24)
Note that, |α(2jξ)| . |φ|2ℓ is small, if we choose the neighbourhood of η̄ small enough.
If |ω(η)+z ·η| ≥ δ for some small δ the outer integral has no stationary points and integration
by parts implies arbitrary polynomial decay. We restrict to the z ∈ R2 with |ω(η) + z · η| ≤ δ
and consider only the inner integral
Ij(t, |ξ|) =
j|ξ|(ω(η)+z·η+α(2jξ))e−t Im ν(2
jξ)ψ(2jξ)dφ
. (3.25)
It can be estimated using the Lemma of van der Corput. For this we need an estimate for deriva-
tives of the phase. We start by considering the unperturbed phase ω(η) + z · η. Differentiation
yields
∂γ̄(η̄)η (ω(η) + z · η) = ∂γ̄(η̄)−2η (∂2ηω(η) + ω(η))− ∂γ̄(η̄)−4η (∂2ηω(η) + ω(η)) +−
· · ·+−
ω(η) + z · η, 2|γ̄(η̄)
∂ηω(η) + z · ηT , 26 |γ̄(η̄)
(3.26)
20 MICHAEL REISSIG AND JENS WIRTH
and by Proposition 3.4 the first term is non-zero for η̄ while the others are small in a neighbour-
hood of η̄. Choosing δ and the neighbourhood small enough and using (3.23) this implies a lower
bound on the γ̄(η̄)-th derivative of the phase,
|∂γ̄(η̄)η (ω(η) + z · η + α(2jξ))| & 1 (3.27)
uniformly on this neighbourhood of η̄ and independent of j. Thus we conclude by [11, p. 334]
for all t ≥ 1 and uniformly in z with |ω(η) + z · η| ≤ δ
Ij(t, |ξ|) ≤ Ck(t2j |ξ|)−
γ̄(η̄)
−t Im ν(2jξ)ψ(2jξ)
. (3.28)
To estimate the remaining integral we use Proposition 3.3
−t Im ν(2jξ)ψ(2jξ)
∣ dφ .
e−ctφ
|φ|2ℓ−1tdφ = 2
e−ctφ
φ2ℓ−1ctdφ
e−ctφ
Integrating over ξ ∈ [1/2, 2] and choosing r ≥ 2 we obtain for j ∈ N0
Ij(t) . (1 + t)
γ̄(η̄) , (3.29)
where the occurring constant is independent of j and (3.20) implies the desired B21,2 → B0∞,2
estimate. Interpolation with the L2–L2 estimate gives the corresponding Brp,2 → B0q,2 estimates
with r ≥ 2(1/p− 1/q) and finally the embedding relations to Sobolev and Lp-spaces the desired
estimate
||eitν(D)f ||q . (1 + t)−
γ̄(η̄) ||f ||p,r (3.30)
for all f satisfying the required Fourier support conditions.
Small frequencies. Assume now that f̂ is supported in a small conical neighbourhood of η̄ with
|ξ| ≤ 2. In this case we estimate directly by the method of stationary phase. We sketch the main
ideas. It is sufficient to estimate
I(t) = sup
eitν(ξ)ψ(ξ)χ(|ξ|)
∣ , (3.31)
where the function ψ ∈ C∞(R+) localises to the small neighbourhood of η̄ with |ξ| ≤ 2. Again
we require |ψ(ξ)| + |∂ηψ(ξ)| + |∂|ξ|ψ(ξ)| . 1. In correspondence to large frequencies I(t) equals
I(t) = sup
eit|ξ|(ω(η)+z·η+α(ξ))e−t Im ν(ξ)ψ(ξ)χ(|ξ|)dφ|ξ|d|ξ|
. (3.32)
We distinguish between |ω(η) + z · η| ≥ δ, where the outer integral has no stationary points, and
|ω(η) + z · η| ≤ δ. In the first case we can apply one integration by parts and get t−1 using
|ξ| |∂|ξ|e−t Im ν(ξ)| . |ξ|2te−t|ξ|
2φ2ℓ . 1. (3.33)
In the second case we can reduce the consideration to
I(t, |ξ|) =
eit|ξ|(ω(η)+z·η+α(ξ))e−t Im ν(ξ)ψ(ξ)dφ
(3.34)
and an application of the Lemma of van der Corput. By Propositions 3.3 and 3.4 we have
γ̄(η̄)
φ (ω(η) + z · η + α(ξ)) 6= 0 (3.35)
ANISOTROPIC THERMO-ELASTICITY IN 2D 21
for all ξ in a sufficiently small conic neighbourhood of η̄. So we obtain
I(t, |ξ|) . (t|ξ|)−
γ̄(η̄)
−t Im ν(ξ)ψ(ξ)
. (t|ξ|)−
γ̄(η̄) . (3.36)
Integrating with respect to |ξ| yields the estimate for I(t)
I(t) ≤
I(t, |ξ|)|ξ|d|ξ| . t−
γ̄(η̄)
|ξ|1−
γ̄(η̄) d|ξ| . (1 + t)−
γ̄(η̄) , t ≥ 1 (3.37)
and by Hölder inequality the L1–L∞ estimate
||eitν(D)f ||∞ ≤ I(t)||f ||1 . (1 + t)−
γ̄(η̄) ||f ||1 (3.38)
for all f satisfying the required Fourier support condition.
By interpolation with the obvious L2–L2 estimate we get Lp–Lq estimates and combining
them with the estimate of the first part proves the theorem. �
If the order of contact exceeds the vanishing order of the coupling funtions, the best we can do
is to use the idea of Section 3.1 and to apply standard multiplier estimates. As already remarked
after the proof of Theorem 3.1 we obtain as decay rate in this case:
Theorem 3.6. Let η̄ be hyperbolic with respect to the index j0 and let aj0(η) vanish in η̄ of order
ℓ. Let us assume further that γ̄j0(η̄) ≥ 2ℓ, where γ̄j0(η) is defined in Proposition 3.4. Then
||e±itνj0 (D)f ||q . (1 + t)−
)||f ||p,r (3.39)
for dual indices p ∈ [1, 2], pq = p + q, regularity r > 2(1/p − 1/q) and for all f with supp f̂
contained in a sufficiently small conical neighbourhood of η̄.
4. Concluding remarks
1. In a second part of this note, [16], we will give concrete applications of the general treatment
presented so far. From the remarks and examples we made in the previous sections it follows that
we indeed cover the estimates of [3] for cubic media and [4] for rhombic media in the situations
of coupling vanishing to first order.
In [16] we will discuss the situation of higher order tangencies for rhombic media and give a
new estimate extending the results of [4]. Furthermore, we will give concrete examples of media
for all achievable decay rates in the case of a differential elastic operator A(D).
2. In general decay rates of solutions are determined by vanishing properties of the coupling
functions. If the coupling functions are nonzero, decay rates are parabolic. If they vanish to
sufficiently high order, decay rates are hyperbolic and determined by the elastic operator (micro-
localised to this direction). In the intermediate case simple multiplier estimates are sufficient.
3. In the case of isotropic media one of the coupling functions vanishes identically. In this case
one pair of eigenvalues of the matrix B(ξ) is purely real ν±(ξ) = ±(λ+µ)|ξ| and the components
related to these eigenvalues solve a wave equation. Thus they satisfy the usual Strichartz type
decay estimates, [2],
‖e±it(λ+µ)|D|P±(D)V0‖q . (1 + t)−
)‖V0‖p,r (4.1)
for p ∈ (1, 2], pq = p + q and r = 2(1/p − 1/q). More generally, if we know that one pair of
eigenvalues satisfies ν±(ξ) = ±|ξ|ω(η) for all η ∈ S1 with a smooth function ω : S1 → R+, the
decay rates for the corresponding components depend heavily on geometric properties of the
Fresnel curve S = {ω−1(η)η | η ∈ S1 } ⊂ R2. Following [8] the estimate (4.1) is valid in this case
as long as the curvature of S never vanishes. If there exist directions η where the curvature of S
vanishes, the constant 1
in the exponent has to be altered to 1
, where γ̄ denotes the maximal
order of contact of the curve S to its tangent, [13].
22 MICHAEL REISSIG AND JENS WIRTH
4. The main focus of this paper was on the treatment of non-degenerate cases, thus assumptions
(A1) to (A4) are required. Nevertheless, we showed how to obtain a control on the eigenvalues
of the matrix B(ξ) in the exceptional cases where either (A3) or (A4) is violated. In the first
case the treatment of large frequencies has to be replaced by the investigation of the expression
obtained in Proposition 2.7, while in the latter one a corresponding replacement has to be made
for small frequencies.
5. The results in this paper are essentially two-dimensional. For the study of anisotropic thermo-
elasticity in higher space dimensions there arise two essential problems. The first is that we can
not assume (A3). For example in three space dimensions and for cubic media the symbol of
the elastic operator A : S2 → C3×3 has degenerate directions with multiple eigenvalues related
to the crystal axes. Nevertheless, the multi-step diagonalisation scheme used in Section 2.2 can
be adapted to such a case. This will be done in the sequel. The second problem is that the
geometry of the set of hyperbolic directions becomes more complicated. In general we can not
expect isolated hyperbolic directions, it will be necessary to consider manifolds of hyberbolic
directions on S2.
References
[1] C. Bennett and R. Sharpley, Interpolation of Operators, Academic Press, Boston, 1988
[2] P. Brenner, On Lp–Lp′ estimates for the wave equation, Math. Z. 145(3):251–254, 1975.
[3] J. Borkenstein, Lp–Lq Abschätzungen der linearen Thermoelastizitätsgleichungen für kubische Medien
im R2, Diplomarbeit, Bonn, 1993.
[4] M. S. Doll, Zur Dynamik (magneto-) thermoelastischer Systeme im R2, Dissertation, Konstanz, 2004.
[5] W.P. Johnson, The curious history of Faà di Bruno’s formula, Amer. Math. Monthly, Vol. 109(3):
217–234, 2002.
[6] T. Kato, Perturbation Theory for linear Operators, Springer, 1980.
[7] O. Liess, Decay estimates for the solutions of the system of crystal optics, Asymptot. Anal. 4:61–95,
1991.
[8] R. Racke, Lectures on Nonlinear Evolution Equations – Initial Value Problems, Aspects of Mathemat-
ics: E, Vol. 19, Friedr. Vieweg & Sohn, Braunschweig/Wiesbaden, 1992.
[9] M. Reissig, Y.-G. Wang, Cauchy problems for linear thermoelastic systems of type III in one space
variable, Math. Meth. Appl. Sci. 28:1359–1381, 2005.
[10] E. M. Stein, Singular Integrals and Differentiability Properties of Functions, Princeton University
Press, Princeton, New Jersey, 1970.
[11] E. M. Stein, Harmonic Analysis: Real-Variable Methods, Orthogonality, and Oscillatory Integrals,
Princeton University Press, Princeton, New Jersey, 1993.
[12] M. Stoth, Lp–Lq Abschätzungen für eine Klasse von Lösungen linearer Cauchy-Probleme bei
anisotropen Medien, Dissertation, Bonn, 1994.
[13] M. Sugimoto, Estimates for hyperbolic equations with non-convex characteristics, Math. Z. 222(4):521–
531, 1996.
[14] Y.-G. Wang, Microlocal analysis in nonlinear thermoelasticity, Nonlinear Anal. 54:683–705, 2003 .
[15] Y.-G. Wang, A new approach to study hyperbolic-parabolic coupled systems, in R. Picard (ed.) et al.,
Evolution equations. Propagation phenomena, global existence, influence of non-linearities. Based on
the workshop, Warsaw, Poland, July 1-July 7, 2001, Warsaw: Polish Academy of Sciences, Institute
of Mathematics, Banach Cent. Publ. 60, p. 227-236, 2003.
[16] J. Wirth, Anisotropic thermo-elasticity in 2D - Part II: Applications, Asymptotic Anal. ??:???–
???,????.
[17] K. Yagdjian, The Cauchy problem for hyperbolic operators. Multiple characteristics, micro-local ap-
proach, Akademie-Verlag, Berlin, 1997.
Michael Reissig, Institut für Angewandte Analysis, Fakultät für Mathematik und Informatik, TU
Bergakademie Freiberg, Prüferstraße 9, 09596 Freiberg, Germany
Jens Wirth, Institut für Angewandte Analysis, Fakultät für Mathematik und Informatik, TU
Bergakademie Freiberg, Prüferstraße 9, 09596 Freiberg, Germany
current address: Department of Mathematics, Imperial College, London SW7 2AZ, UK
	1. The problem under consideration
	2. General treatment of the thermo-elastic system
	2.1. Asymptotic expansion of the eigenvalues as ||0
	2.2. Asymptotic expansion of the eigenvalues as ||
	2.3. Collecting the results
	3. Decay estimates for solutions
	3.1. Coupling functions with simple zeros
	3.2. Coupling functions vanishing to higher order
	4. Concluding remarks
	References
ABSTRACT
  In this note we develop tools and techniques for the treatment of anisotropic
thermo-elasticity in two space dimensions. We use a diagonalisation technique
to obtain properties of the characteristic roots of the full symbol of the
system in order to prove $L^p$--$L^q$ decay rates for its solutions.

<|endoftext|><|startoftext|>
I-V characteristics of the vortex state in MgB2 thin films
Huan Yang,1 Ying Jia,1 Lei Shan,1 Yingzi Zhang1 and Hai-Hu Wen1∗
National Laboratory for Superconductivity, Institute of Physics and National Laboratory for Condensed Matter Physics,
Chinese Academy of Sciences, P.O. Box 603, Beijing 100080, P. R. China
Chenggang Zhuang,2,3 Zikui Liu,4 Qi Li,2 Yi Cui2 and Xiaoxing Xi2,4
Department of Physics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
Department of Physics, Peking University, Beijing 100871, PR China and
Department of Materials Science and Engineering,
The Pennsylvania State University, University Park, Pennsylvania 16802, USA
(Dated: October 22, 2018)
The current-voltage (I-V ) characteristics of various MgB2 films have been studied at different
magnetic fields parallel to c-axis. At fields µ0H between 0 and 5 T, vortex liquid-glass transitions
were found in the I-V isotherms. Consistently, the I-V curves measured at different temperatures
show a scaling behavior in the framework of quasi-two-dimension (quasi-2D) vortex glass theory.
However, at µ0H ≥ 5 T, a finite dissipation was observed down to the lowest temperature here, T =
1.7 K, and the I-V isotherms did not scale in terms of any known scaling law, of any dimensionality.
We suggest that this may be caused by a mixture of σ band vortices and π band quasiparticles.
Interestingly, the I-V curves at zero magnetic field can still be scaled according to the quasi-2D
vortex glass formalism, indicating an equivalent effect of self-field due to persistent current and
applied magnetic field.
PACS numbers: 74.70.Ad, 74.25.Qt, 74.25.Sv
I. INTRODUCTION
Since the discovery of the two-gap superconductor
MgB2 in 2001,
1 the mechanism of its superconductiv-
ity and vortex dynamics has attracted considerable in-
terests. The two three-dimension (3D) π bands and
two quasi-two-dimension (quasi-2D) σ bands in this sim-
ple binary compound seem to play an important role
in the superconductivity,2 as well as the normal state
properties.3,4 The two sets of bands have different en-
ergy gaps, i.e., about 7 meV for the σ bands, and about
2 meV for the π bands.5,6 And the coherent length of
the π bands is much larger than that of the σ bands2.
Many experiments have demonstrated that the π-band
superconductivity is induced from the σ-band and there
is a rich evidence for both the interband and intraband
scattering. Owing to the complicated nature of super-
conductivity in this system, its vortex dynamics may ex-
hibit some interesting or novel features. Among various
experimental methods, measuring the current-voltage (I-
V ) characteristics at different temperatures and magnetic
fields can provide important information for understand-
ing the physics of the vortex state. Up to now, the
transport properties of MgB2 have been studied on both
polycrystalline bulk samples7 and thin films8. In both
cases, the I-V characteristics demonstrated good agree-
ment with the 3D vortex glass (VG) theory. This was
partially due to the limited magnetic fields in the experi-
ment. In addition, it has been shown that the properties
of MgB2 are very sensitive to the impurities and defects
introduced in the process of sample preparation, and the
vortex dynamics must be influenced, too. Therefore, it
is necessary to investigate the vortex dynamics in high
quality MgB2 epitaxial thin films and to reveal the in-
trinsic properties of the vortex matter in this interesting
multiband system. In this paper, we present the I-V
characteristics of high-quality MgB2 thin films measured
at various temperatures and magnetic fields. The vortex
dynamics in this system is then investigated in detail.
II. EXPERIMENT
The high-quality MgB2 thin films studied in this work
were prepared by the hybrid physical-chemical vapor de-
position technique9 on (0001) 6H-SiC substrates. All the
films had c-axis orientation with the thickness of about
100 nm. Fig. 1 (a) shows the θ-2θ scan of the MgB2 film,
and the sharp (000l) peaks indicate the pure phase of the
c-axis orientation of MgB2. In order to show the good
crystallinity of the film, we present in Fig. 1(b) the same
data in a semilogarithmic scale which enlarges the data
in the region of small magnitude. It is clear that, besides
the background noise, we can only observe the diffraction
peaks from MgB2 and the SiC substrate, i.e., there is no
trace of the second phase in the film. The c-axis lattice
constant calculated from the MgB2 peak positions was
about 3.517 Å(bulk value1: 3.524 Å). The φ scan (az-
imuthal scan) shown elsewhere9 indicated well the six-
fold hexagonal symmetry of the MgB2 film matching the
substrate. The full width at half maximum (FWHM)
of the 0002 peak taken on the film in θ-2θ scan [MgB2
0002 peak in Fig. 1(a)] and ω scan [rocking curve, shown
in Fig. 1(c)] is 0.15◦ and 0.39◦, respectively. The scan-
ning electron microscopy (SEM) image in Fig. 1(d) gave
a rather smooth top surface view without any observable
http://arxiv.org/abs/0704.0126v2
2x103
4x103
6x103
24 26
3x103
6x103
10 20 30 40 50 60 70 80
 (degrees)
FWHM=0.39o
 (0002)
500nm
2  (degrees)
FIG. 1: (a) X-ray diffraction pattern of the MgB2 film on a
(0001) 6H-SiC substrate in the θ-2θ scan, which shows only
the 000l peaks of MgB2 in addition to substrate peaks, in-
dicating a phase-pure c-axis-oriented MgB2 film. (b) The
semilogarithmic plot of the θ-2θ scan. (c) The rocking curve
of the 0002 MgB2 peak, which shows the FWHM of about
0.39◦. (d) The SEM image of the MgB2 film, which shows
the smooth surface without obvious granularity.
grain boundaries, which suggested that the film had a
homogeneous quality. Ion etching was used to pattern a
four-lead bridge with the effective size of 380× 20 µm2.
The resistance measurements were made in an Oxford
cryogenic system Maglab-Exa-12 with magnetic field up
to 12 T. Magnetic field was applied along the c axis of
the film for all the measurements. The temperature sta-
bilization was better than 0.1% and the resolution of the
voltmeter was about 10 nV. We have done all the mea-
surements on several MgB2 films, and the experimental
data and scaling behaviors are similar; so, in this paper,
we present the data from one film.
In Fig. 2, we present the resistive transitions (R-T re-
lations) of a MgB2 thin film measured at various mag-
netic fields in a semilogarithmic scale. The current den-
sity in the measurement was about 500 A/cm
, much
smaller than the critical value for low temperatures,
106 A/cm
2 10. It can be determined from Fig. 2 that
the sample had a superconducting transition tempera-
ture of Tc = 40.05 K, with a transition width of about
0.5 K. Its normal state resistivity was about 2.45 µΩcm
and the residual resistance ratio [≡ ρ(300 K)/ρ(42 K)]
was about 6.4. The I-V curves were measured at various
temperatures for each field, and then we got the electric
0 10 20 30 40 50
H=0 T
H=1 T
H=3 T
H=6 T
T (K)
FIG. 2: Temperature dependence of resistive transitions for
µ0H = 0, 1, 3, and 6 T, with the current density j =
500 A/cm
field (E) and the current density (j) according to the
sample dimension. The current density was swept from
5 to 105 A/cm
during the I-V measurements.
III. THEORETICAL MODELS
In the mixed state of high-Tc superconductors with
randomly distributed pointlike pinning centers, a second-
order phase transition is predicted between VG state and
vortex-liquid state.11 The I-V curves at different tem-
peratures near the VG transition temperature Tg can be
scaled onto two different branches12 by the scaling law
j (T − Tg)
ν(z+2−D)
|T − Tg|
ν(D−1)
. (1)
The scaling parameter z has the value of 4–7, and ν ≈ 1–
2; D denotes the dimension of the system with the value
3 for 3D and 2 for quasi-2D13; f+ and f− represent the
functions for two sets of the branches above and below
Tg. Above Tg, the linear resistivity is given by
ρlin = dE/dj|j→0 ∝ (T − Tg)
ν(z+2−D)
. (2)
At Tg, the electric field versus the current density curve
satisfies the relationship
E(j)|T=Tg ≈ j
(z+1)/(D−1). (3)
In 2D superconductors at µ0H = 0 T, a Berezinskii-
Kosterlitz-Thouless (BKT) transition was found at a spe-
cific temperature TBKT.
14 At TBKT, E ∝ j
3, which is a
sign of the BKT transition. A continuous change from
the BKT transition at zero field to a quasi-2D VG transi-
tion, and then to a true 2D VG transition with Tg = 0 K
was found in TlBaCaCuO film,15 which shows a field-
induced crossover of criticalities.
A 2D VG transition may exist in a true 2D system
with Tg = 0 K, i.e., there is no zero-resistance state at
any finite temperatures. The E-j curves can be scaled
T 1+ν2D
, (4)
where T0 is a characteristic temperature, ν2D ≈ 2, and
p ≥ 1, while g is a scaling function for all temperatures
at a given magnetic field. The linear resistance is given
ρlin ∝ exp[−(T0/T )
p]. (5)
This 2D scaling law can be achieved in the very thin
films17 or in highly anisotropic systems at high magnetic
fields.18,19
IV. EXPERIMENTAL RESULTS AND
DISCUSSIONS
A. Quasi-two-dimension vortex-glass scaling in the
low-field region (µ0H < 5 T)
The E-j characteristics have been measured at various
magnetic fields up to 12 T. In Fig. 3 we show the typical
example at µ0H = 1 T for (a) E-j curves and (b) the cor-
responding ρ-j curves in double-logarithmic scales. It is
obvious that when the temperature goes below some par-
ticular value (this is actually the vortex-glass transition
temperature Tg according to following discussions), the
resistivity falls rapidly with decreasing current density
and finally reaches the zero-resistance state which is the
characteristic of the so-called VG state. At the tempera-
tures above Tg, the resistivity remains constant in small
current limit. The current density of 500 A/cm
used in
ρ-T measurement shown in Fig. 2 lies in this linear resis-
tivity regime from about 10−3 to 1 µΩcm. Consequently,
these data sets provide the basic information on scaling
if the data are describable by the VG theory.
The inset in Fig. 4 shows the data of the ρlin versus
(T −Tg) and the fit to Eq. (2). The data are the same as
those shown in Fig. 2 for µ0H = 1 T, and the attempt Tg
value is 31.4 K. In this double-logarithmic plot, the slope
of the linear fitting gives just the exponent of ν(z+2−D),
and the determined value is 8.08± 0.05. In order to have
reasonable values for ν and z, the dimension parameterD
needs to be chosen as 2, i.e., the investigated system has
the property of quasi-2D, which is similar to the situation
found in BiSrCaCuO.13,20 This is further supported by
the VG scaling of the data at 1 T. As shown in the main
frame of Fig. 4, the scaling experimental E-j curves form
two universal branches corresponding to the data above
and below Tg (31.4 K) with ν = 1.32 and z = 6.12.
At very large current density or a temperature near the
onset of superconducting transition, the free flux flow
regime dominates and, hence, the data do not scale. The
101 102 103 104 105
10 nV
j (A/cm2)
H=1 T
FIG. 3: (Color online) (a) E-j characteristics measured at
fixed temperatures ranging from 30 to 36 K for µ0H = 1 T.
The increments are 0.30 K in the range from 30.00 to 31.20 K,
and 0.25 K in the range from 31.50 to 34.00 K respectively,
and finally 35 K on the top. The dashed line shows the po-
sition of Tg, and the symbols denote the segments that scale
well according to the quasi-2D VG theory. The thin solid
lines denote also the measured data, however, located outside
the scalable range. (b) ρ-j curves corresponding to the E-j
data in (a). The thick solid line in (b) denotes the voltage
resolution of 10 nV.
1 2 3 4 5 6
 From R(T)
 From E(j)
100 101 102 103 104 105 106
Quasi-2D VG Scaling for 
H=1 T
D=2, T
=31.4 K, =1.32, z=6.12
j/|T-T
| (D-1)
FIG. 4: (Color online) Quasi-2D VG scaling of the E-j curves
measured at 1 T. The inset shows a double-logarithmic plot
of the temperature dependence of the linear resistivity. The
dashed line is a guide for the eyes.
3 6 9 121518
 From R(T)
 From E(j)
10-2 10-1 100 101 102 103 104 105
10-10
Quasi-2D VG Scaling for H=3 T
D=2, T
=15.4 K, =1.00, z=7.70
j/(T|T-T
| (D-1))
10-1 100 101 102 103 104 105 106
10-11
Quasi-2D VG Scaling for H=3 T
D=2, T
=15.4 K, =1.17, z=6.58
j/|T-T
| (D-1)
FIG. 5: (Color online)(a) Scaling curves of the E-j data mea-
sured in 3 T based on the quasi-2D VG scaling theory. The
inset shows a log-log plot of the temperature dependence of
the linear resistivity. (b) VG scaling with another form of
scaling variable j/(T |T − Tg|
ν(D−1)
symbols in the figure denote the range of the data well
described by the scaling law.
The situation at µ0H = 3 T is similar to that at
µ0H = 1 T. As shown in Fig. 5(a), the determined pa-
rameters are Tg = 15.4 K, ν = 1.17, and z = 6.58. Inter-
estingly, the previous work on MgB2 film
8 indicated that
the 3D VG scaling theory (D = 3) is a better choice in
describing the I-V characteristics in this system, though
this experiment was done at magnetic fields lower than
1 T. Moreover, the I-V curves were demonstrated to scale
well by using the argument of j/(T |T − Tg|
). The same
conclusions were also drawn on the polycrystalline MgB2
samples7. In order to clarify this issue, we also analyzed
our data using the form suggested in Ref. 8. As shown in
Fig. 5(b), such a scaling with j/(T |T − Tg|
ν(D−1)
) as the
scaling variable is worse than that with j/ |T − Tg|
ν(D−1)
Most importantly, the dimension parameter D is still re-
quired to be 2 instead of 3 as proposed in Refs. 7 and
8. This confusion can be easily understood in terms of
the two-band superconductivity of MgB2. As we know,
there are two types of bands contributing to the super-
conductivity of MgB2, namely, the 3D π bands and the
101 102 103 104 105
H=6 T
j (A/cm2)
FIG. 6: (Color online) ρ-j data at temperatures 1.7 K and
4 K to 20 K with 2 K-step, for µ0H = 6 T. Temperature of
the isotherms increases from bottom to top.
2D σ bands. Therefore, the structure of the vortex mat-
ter must be affected by both of them. Although the su-
perconductivity of π bands, induced possibly by that of
σ bands, is much weaker, it provides a large coherence
length with 3D characteristics in the low-field region.
Therefore, the vortices in this system may be quasi-2D
like and, at the same time, they can possess large cores
characterized by the coherence length of the π band su-
perfluid. In this sense, the quasi-2D scaling should be
more appropriate than the 3D one. However, when a
higher disorder is induced in the system, especially in
the boron sites, the interband scattering gets stronger
and the anisotropy decreases, which may lead to a 3D
vortex scaling. In this case, a more rigid vortex line can
be observed, especially at low fields.21 The good quasi-2D
scaling at 1 and 3 T demonstrated here suggests that the
phase transition from VG to the vortex liquid in MgB2
resembles that in the high-Tc superconductors. Together
with the data shown below, we can safely conclude that
a vortex glass state with zero linear resistivity can be
achieved in the low field region due to the presence of the
finite superfluid density from the π bands. Regarding the
VG scaling22 a principal requirement is a proper deter-
mination of Tg, namely the temperature with a straight
logE-log j curve in the low dissipation part. The toler-
ance for Tg variation is very small (about ±0.3 K). With
an inappropriately chosen Tg, the scaling quality dra-
matically deteriorates and, simultaneously, the values of
ν and z quickly deviate from those reported above and
those proposed by theory. This validates our analysis
here.
10-3 10-1 101 103
A scaling attempt with quasi-2D VG for 
H=6 T
D=2, T
=0 K, =3, z=0.58
j/|T-T
| (D-1)
 Qusi-2D VG theory
 From R(T)
 From E(j)
FIG. 7: (Color online) The scaling of the E-j isotherms with
quasi-2D VG model for µ0H = 6 T. The inset shows the
deviation of the ρlin vs T − Tg (Tg = 0) relation from the
linearity in the double logarithmic scale.
B. Anomalous vortex properties in high field
region
As shown in Fig. 2, when the magnetic field reaches
6 T, no zero-resistance state can be observed down to
the lowest temperature, here 1.7 K. Consequently, no
VG transition exists above 1.7 K at this field, as shown in
Fig. 6. The shape of the curve at T = 1.7 K suggests that
the resistivity goes to a finite value as the current den-
sity approaches zero.23 As shown in Fig. 7, the ρlin versus
T − Tg seriously deviates from linearity for any possible
Tg value, indicating the inapplicability of Eq. (2) in the
present case. Correspondingly, the quasi-2D scaling law
fails here. A natural explanation is that, with increas-
ing field, the 3D supercurrent from π bands is seriously
suppressed6,24 and the quasi-2D vortex structure trans-
forms into a 2D-like one dominated by the σ band super-
fluid. In Fig. 8, we show our attempt to apply 2D VG
scaling on the data. Surprisingly, this attempt also failed,
even though this model has been successfully applied to
the layered superconductors with large anisotropy (or 2D
property) such as Tl- and Bi-based high-Tc thin films at
high magnetic fields.18,19
The most reasonable explanation for this anomaly is
that the supercurrent contribution from the π bands is
much easier to suppress by the magnetic field than that
from the σ bands, since the gap in the π bands is several
times smaller than that in the σ band. We suggest that
at high magnetic fields (above 5 T), a different vortex
matter state is formed, composed of quasi-particles from
the π bands and vortices formed mainly by the residual
superfluid from the σ band. The π-band quasiparticles
diminish the long range phase coherence of the supercon-
ducting phase, which leads to a finite dissipation. Once
the long range superconducting phase coherence is de-
stroyed by the proliferation of a large amount of these
5 10 15 20
10-4 10-3 10-2 10-1 100 101 102 103
A scaling attempt with 2D VG for 
H=6 T
=2, p=1, T
=20 K
j/T1+ 2D
 2D VG theory
 From R(T)
 From E(j)
T (K)
FIG. 8: (Color online) Attempted scaling of the data with
2D VG model [Eq. (4)] for µ0H = 6 T. The inset shows the
nonlinearity of the relationship between ρlin and temperature,
the solid line shows the theoretical curve of true 2D VG theory
[Eq. (5)].
π-band quasiparticles, neither 3D nor quasi-2D VG scal-
ing is applicable. Such a mixed state is obviously diffi-
cult to be simply described by any known scaling theory.
Recently, scanning tunneling microscopy studies showed
that the quasiparticles of the π bands disperse over all of
the superconductor, both within and outside the vortex
cores25, which strongly supports our arguments. This is
the basis for the explanation of the nonvanishing vortex
dissipation at high magnetic fields in a zero temperature
limit found recently on MgB2 thin films.
C. Self-field effect at µ0H = 0 T
For a 2D layered superconductor in zerofield, the above
mentioned BKT transition may exist and be reflected
in the I-V characteristics15. In the present MgB2 sam-
ples, we have not found any evidence of this transition
in low magnetic fields which would be consistent with
the quasi-2D (instead of 2D) configuration of the vor-
tex matter. Moreover, both the E-j curves and the ρ-j
curves (as presented in Fig. 9) are similar to the situation
of µ0H = 1 T. Considering the narrow transition width
at zero-field, we did the measurement carefully with an
increment of 0.05 K. Obviously, there is no E(j) curve
which satisfies the E ∝ j3 dependence, as expected by the
BKT theory. Since the current can induce self-generated
vortices, it might be interesting to look at whether the
quasi-2D VG model applies here.
Similar in Sec. IV A, we present ρlin versus (T − Tg)
in a double logarithmic plot. From this graph, we de-
termined the exponent in Eq. 2 (as shown by the inset
of Fig. 10(a)). A good quasi-2D scaling was obtained
with parameters Tg = 39.94 K, ν = 1.12, and z = 6.61,
as presented in Fig. 10(a). Using the parameters deter-
) 0H=0 T
101 102 103 104 105
j (A/cm2)
10 nV
FIG. 9: (Color online) (a) E-j data at various temperatures
from 39.7 K to 40.5 K with an interval of 0.05 K for µ0H =
0 T, the symbols denote the region, where the data are scaled
(from 39.70 K to 40.30 K). Temperature of the isotherms
increases from bottom to top. The dashed line shows the
position of Tg and the symbols denote the segments, which
scale well according to the quasi-2D VG theory. The thin solid
lines are also the measured data lying outside the scalable
range. (b) ρ-j curves corresponding to the E-j data in (a).
The thick solid line in (b) denotes the voltage resolution of
10 nV.
mined here, one finds a self-consistency with the value
of ν(z + 2 − D), as determined in fitting the linear re-
sistivity [Eq. (2)]. Both the temperature dependence of
ρlin and the scaling curves at µ0H = 0 T are similar
to the situation at small field µ0H = 0.1 T [shown in
Fig. 10(b)] and µ0H = 0.5 T (not shown in this pa-
per), except for the slight differences of the scaling pa-
rameters. The scaling parameters including the ones at
µ0H = 0.5 T are listed in Table I. It was proven that
current and magnetic field exhibit analogous effects in
suppressing superconductivity and generating quasipar-
ticles in conventional superconductors.26 Similarly, the
current-induced self-field may lead to a similar effect in
the vortex state as an applied magnetic field. Nonethe-
less, the good agreement of this simple scaling law with
the zero-field data is interesting and worth studying in
detail. Moreover, the values of ν and z for zero field are
very close to those for µ0H = 1 and 3 T, indicating a
similar vortex dynamics in the whole low-field region.
H=0 T
101 102 103 104 105
j (A/cm2)
10 nV
FIG. 10: (Color online) (a) Quasi-2D VG scaling of the data
measured at 0 T. The inset indicates a good linearity of the
temperature dependence of the linear resistivity. (b) Quasi-
2D VG scaling of the data measured at 0.1 T. The inset indi-
cates a good linearity of the temperature dependence of the
linear resistivity.
TABLE I: Quasi-2D VG scaling parameters at different fields.
µ0H (T) Tg(K) ν z
0.0 39.94 1.12 6.61
0.1 39.28 1.30 6.08
0.5 35.95 1.37 6.42
1.0 31.4 1.32 6.12
3.0 15.4 1.17 6.58
V. SUMMARY
We have measured I-V curves on high-quality MgB2
films at various magnetic fields and temperatures. At
magnetic fields below 5 T including the zero field, the
curves scaled well according to the quasi-2D VG theory
instead of the 3D model, in good agreement with the
multiband superconductivity of MgB2 contributed from
the strong 2D σ bands and weak 3D π bands. At the
fields above 5 T, the curves did not scale according to
any known VG scaling laws, accompanied by the disap-
pearance of a zero-resistance state. Based on our result
combined with recent tunneling experiments, a different
vortex state was suggested, namely, a state where the
vortices composed of the superfluid from the σ bands
move through the space filled with numerous quasiparti-
cles from π bands.
VI. ACKNOWLEDGMENTS
This work is supported by the National Science Foun-
dation of China, the Ministry of Science and Tech-
nology of China (973 project: 2006CB601000 and
2006CB921802), and the Knowledge Innovation Project
of the Chinese Academy of Sciences (ITSNEM). The
work at Penn State is supported by NSF under Grants
Nos. DMR-0306746 (X.X.X.), DMR-0405502 (Q.L.), and
DMR-0514592 (Z.K.L. and X.X.X.), and by ONR under
grant No. N00014-00-1-0294 (X.X.X.).
∗ Electronic address: hhwen@aphy.iphy.ac.cn
1 J. Nagamatsu, N. Nakagawa, T. Muranaka, Y. Zenitani,
and J. Akimitsu, Nature (London) 410, 63 (2001).
2 A. Rydh, U. Welp, A. E. Koshelev, W. K. Kwok, G. W.
Crabtree, R. Brusetti, L. Lyard, T. Klein, C. Marcenat, B.
Kang, K. H. Kim, K. H. P. Kim, H.S. Lee, and S. I. Lee,
Phys. Rev. B 70, 132503 (2004); A. E. Koshelev and A. A.
Golubov, Phys. Rev. Lett. 92, 107008 (2004); ibid, Phys.
Rev. B 68, 104503 (2003).
3 P. de la Mora, M. Castro, and G. Tavizon, J. Phys.: Con-
dens. Matter 17, 965 (2005); I. Pallecchi, M. Monni, C.
Ferdeghini, V. Ferrando, M. Putti, C. Tarantini, and E.
Galleani D’Agliano, Eur. Phys. J. B 52, 171 (2006).
4 Q. Li, B. T. Liu, Y. F. Hu, J. Chen, H. Gao, L. Shan, H.
H. Wen, A. V. Pogrebnyakov, J. M. Redwing, and X. X.
Xi, Phys. Rev. Lett. 96, 167003 (2006).
5 H. J. Choi, D. Roundy, H. Sun, M. L. Cohen, and S. G.
Louie, Nature (London) 418, 758 (2002).
6 M. Iavarone, G. Karapetrov, A. E. Koshelev, W. K. Kwok,
G. W. Crabtree, D. G. Hinks, W. N. Kang, E. M. Choi,
H. J. Kim, H. J. Kim, and S. I. Lee, Phys. Rev. Lett. 89,
187002 (2002).
7 K. H. P. Kim, W. N. Kang, M. S. Kim, C. U. Jung, H.
J. Kim, E. M. Choi, M. S. Park, and S. I. Lee, Physica C
370, 13 (2002).
8 S. K. Gupta, S. Sen, A. Singh, D. K. Aswal, J. V. Yakhmi,
E. M. Choi, H. J. Kim, K. H. P. Kim, S. Choi, H. S. Lee,
W. N. Kang, and S. I. Lee, Phys. Rev. B 66, 104525 (2002).
9 X. H. Zeng, A. V. Pogrebnyakov, A. Kotcharov, J. E.
Jones, X. X. Xi, E. M. Lysczek, J.M. Redwing, S. Y. Xu,
J. Lettieri, D. G. Schlom, W. Tian, X. Q. Pan, Z. K. Liu ,
Nature Mater. 1, 35 (2002).
10 H. H. Wen, S. L. Li, Z. W. Zhao, H. Jin, Y. M. Ni, W. N.
Kang, H. J. Kim, E. M. Choi, and S. I. Lee, Phys. Rev. B
64, 134505 (2001).
11 M. P. A. Fisher, Phys. Rev. Lett. 62, 1415 (1989); D. S.
Fisher, M. P. A. Fisher, and D. A. Huse, Phys. Rev. B 43,
130 (1991); D. A. Huse, M. P. A. Fisher, and D. S. Fisher,
Nature (London) 358, 553 (1992).
12 R. H. Koch, V. Foglietti, G. Koren, A. Gupta, and M. P.
A. Fisher, Phys. Rev. Lett. 63, 1511 (1989); R. H. Koch,
V. Foglietti, and M. P. A. Fisher, ibid. 64, 2586 (1990).
13 H. Yamasaki, K. Endo, S. Kosaka, M. Umeda, S. Yoshida,
and K. Kajimura, Phys. Rev. B 50, 12959 (1994).
14 V. L. Berezinskii, Sov. Phys. JETP 32, 493 (1970); J. M.
Kosterlitz, and D. J. Thouless, J. Phys. C 6, 1181 (1973).
15 H. H. Wen, P. Ziemann, H. A. Radovan, and S. L. Yan,
Europhys. Lett. 42, 319 (1998).
16 M. P. A. Fisher, T. A. Tokuyasu, and A. P. Young, Phys.
Rev. Lett. 66, 2931 (1991);
17 C. Dekker, P. J. M. Wöltgens, R. H. Koch, B. W. Hussey,
and A. Gupta, Phys. Rev. Lett. 69, 2717 (1992).
18 H. H. Wen, A. F. Th. Hoekstra, R. Griessen, S. L. Yan, L.
Fang, and M. S. Si, Phys. Rev. Lett. 79, 1559 (1997).
19 H. H. Wen, H. A. Radovan, F. M. Kamm, P. Ziemann, S.
L. Yan, L. Fang, and M. S. Si, Phys. Rev. Lett. 80, 3859
(1998).
20 Y. Z. Zhang, R. Deltour, J. F. de Marneffe, H. H. Wen, Y.
L. Qin, C. Dong, L. Li, and Z. X. Zhao, Phys. Rev. B 62,
11373 (2000).
21 H. Jin, H. H. Wen, H. P. Yang, Z. Y. Liu, Z. A. Ren, G. C.
Che, and Z. X. Zhao, App. Phys. Lett. 83, 2626 (2003).
22 D. R. Strachan, M. C. Sullivan, P. Fournier, S. P. Pai, T.
Venkatesan and C. J. Lobb, Phys. Rev. Lett. 87, 067007
(2001).
23 Y. Jia, H. Yang, Y. Huang, L. Shan, C. Ren, C. G. Zhuang,
Y. Cui, Q. Li, Z. K. Liu, X. X. Xi, and H. H. Wen,
arXiv:cond-mat/0703637 (unpubilished).
24 R. S. Gonnelli, D. Daghero, G. A. Ummarino, V. A.
Stepanov, J. Jun, S. M. Kazakov, and J. Karpinski, Phys.
Rev. Lett. 89, 247004 (2002).
25 M. R. Eskildsen, M. Kugler, S. Tanaka, J. Jun, S. M. Kaza-
kov, J. Karpinski, and Ø. Fischer, Phys. Rev. Lett. 89,
187003 (2002).
26 A. Anthore, H. Pothier, and D. Esteve, Phys. Rev. Lett.
90, 127001 (2003).
mailto:hhwen@aphy.iphy.ac.cn
http://arxiv.org/abs/cond-mat/0703637
ABSTRACT
  The current-voltage (I-V) characteristics of various MgB2 films have been
studied at different magnetic fields parallel to c-axis. At fields \mu0H
between 0 and 5T, vortex liquid-glass transitions were found in the I-V
isotherms. Consistently, the I-V curves measured at different temperatures show
a scaling behavior in the framework of quasi-two-dimension (quasi-2D) vortex
glass theory. However, at \mu0 H >= 5T, a finite dissipation was observed down
to the lowest temperature here, T=1.7K, and the I-V isotherms did not scale in
terms of any known scaling law, of any dimensionality. We suggest that this may
be caused by a mixture of \sigma band vortices and \pi band quasiparticles.
Interestingly, the I-V curves at zero magnetic field can still be scaled
according to the quasi-2D vortex glass formalism, indicating an equivalent
effect of self-field due to persistent current and applied magnetic field.

<|endoftext|><|startoftext|>
Introduction 
Deep sub-100 nm magnetic nanoelements have been the focus of intense research 
interest due to their fascinating fundamental properties and potential technological 
applications.1-6 At such small dimensions, comparable to the typical magnetic domain 
wall width, properties of the nanomagnets are rich and complex. It is known that well 
above the domain wall width, in micron and sub-micron sized patterns, magnetization 
reversal often occur via a vortex state (VS).7-12 At reduced sizes, single domain (SD) 
static states are energetically more favorable.13 However, even in SD nanoparticles, the 
magnetization reversal can be quite complex, involving thermally activated incoherent 
processes.14 The VS-SD crossover itself is fascinating. For example, recently Jausovec et 
al. have proposed that a third, metastable, state exists in 97 nm permalloy nanodots, 
based on minor loop and remanence curve studies.15 To date, direct observation of the 
VS-SD crossover, especially in the deep sub-100 nm regime, has been challenging. 
Fundamentally, the vortex core is expected to have a nanoscale size, comparable to the 
exchange length. Magnetic imaging techniques face resolution limits and often are 
limited to remanent state and room temperature studies. Practically, collections of 
nanomagnets inevitably have variations in size, shape, anisotropy, etc.16 The ensemble-
averaged properties obtained by collective measurements such as magnetometry no 
longer yield clear signatures of the nucleation / annihilation fields. Furthermore, 
quantitatively capturing the distributions of magnetic properties are essential to the 
understanding and application of magnetic and spintronic devices, which may consist of 
billions of nanomagnets. How to qualitatively and quantitatively investigate the 
properties of such nanomagnets remains a key challenge for condensed matter physics 
and materials science.  
 In this study, we investigate the VS-SD crossover in deep sub-100 nm Fe 
nanodots. We have captured “fingerprints” of such nanodots using a first-order reversal 
curve (FORC) method,17-21 which circumvents the resolution, remanent state and room 
temperature limits by measuring the collective magnetic responses of the dots. The 
“fingerprints”, shown as FORC diagrams, reveal remarkably rich information about the 
nanodots. A qualitatively different reversal pattern is observed as the dot size is increased 
from 52 to 67 nm, despite only subtle differences in their major hysteresis loops.  The 52 
nm nanodots behave as SD particles; the 67 nm ones exhibit VS reversal; and the 58 nm 
ones have both SD and VS characteristics. Quantitatively, the FORC diagram shows 
explicitly a coercivity distribution for the SD dots, which agrees well with calculations; it 
yields SD and VS phase fractions in the larger dots; it also extracts unambiguously the 
nucleation and annihilation fields for the VS dots and distinguishes annihilations from 
opposite sides of the dots. 
II. Experimental 
Samples for the study are Fe nanodots fabricated using a nanoporous alumina 
shadow mask technique in conjunction with electron beam evaporation.22,23 This method 
allows for fabrication of high density nanodots (~1010 /cm2) over macroscopic areas (~1 
cm2).  Three different types of samples have been made on Si and MgO substrates with 
mean nanodot sizes of 52±8, 58±8, and 67±13 nm, and a thickness of 20 nm, 15 nm, and 
20 nm, respectively. The nanodot center-to-center spacing is typically twice its diameter. 
The Fe nanodots thus made are polycrystalline, capped with an Al or Ag layer. A 
scanning electron microscopy (SEM) image of the 67 nm sample is shown in Fig. 1. A 
survey of the size distribution is illustrated in Fig. 1 inset.  
Magnetic properties have been measured using a Princeton Measurements Corp.  
2900 alternating gradient and vibrating sample magnetometer (AGM/VSM), with the 
applied field in the plane of the nanodots. Samples have been cut down to ~ 3 × 3 mm2 
pieces, which contains ~ 109 Fe nanodots each. Additionally, the FORC technique has 
been employed to study details of the magnetization reversal. After saturation, the 
magnetization M is measured starting from a reversal field HR back to positive saturation, 
tracing out a FORC. A family of FORC’s is measured at different HR, with equal field 
spacing, thus filling the interior of the major hysteresis loop [Figs. 2(a)-2(c)]. The FORC 
distribution is defined as a mixed second order derivative:17-21  
( ) ( )
−≡ ,     (1) 
which eliminates the purely reversible components of the magnetization. Thus any non-
zero ρ  corresponds to irreversible switching processes.19-21  The FORC distribution is 
plotted against (H, HR) coordinates on a contour map or a 3-dimensional plot. For 
example, along each FORC in Fig 4(a) with a specific reversal field HR, the 
magnetization M is measured with increasing applied field H; the corresponding FORC 
distribution ρ in Fig. 4(b) is represented by a horizontal line scan at that HR along H.  
Alternatively ρ can be plotted in coordinates of (HC, HB), where HC is the local coercive 
field and HB is the local interaction or bias field.  This transformation is accomplished by 
a simple rotation of the coordinate system defined by: HB=(H+HR)/2 and HC=(H-HR)/2.  
Both coordinate systems are discussed in this paper.  
III. Results 
Families of the FORC’s for the 52, 58, and 67 nm nanodots are shown in Figs. 
2(a)-2(c). The major hysteresis loops, delineated by the outer boundaries of the FORC’s, 
exhibit only subtle differences. The 52 nm nanodots show a regular major loop, with a 
remanence of 57 % and a coercivity of 475 Oe [Fig. 2(a)]. The 67 nm nanodots have a 
slight “pinching” in its loop near zero applied field, with a remanence of 27 % and a 
coercivity of 246 Oe [Fig. 2(c)]. The unique shape, small values of coercivity and 
remanence suggest that the magnetization reversal is via a VS.  Indeed, the VS is 
confirmed by polarized neutron reflectivity measurements on similarly prepared 65 nm 
Fe nanodots, which find an out of plane magnetic moment corresponding to a vortex core 
of 15 nm.24 However, due to the relatively gradual changes in magnetization along the 
major loop, averaged over signals from ~ 109 Fe nanodots, it is difficult to determine the 
vortex nucleation and annihilation fields.  In contrast, the relatively fuller major loop of 
the 52 nm nanodots is suggestive of a SD state. The loop of the 58 nm nanodots appears 
to have combined features from those of the other two samples [Fig. 2(b)].   
 The subtle differences seen in the major hysteresis loops manifest themselves as 
striking differences in the corresponding FORC distributions, shown in Figs. 2(d)-2(i).  
For the 52 nm nanodots, the only predominant feature is a narrow ridge along the local 
coercivity HC-axis with zero bias [Figs. 2(d) and 2(g)].  The ridge is peaked at HC = 525 
Oe, near the major loop coercivity value of 475 Oe. This pattern is characteristic of a 
collection of non-interacting SD particles.25  Given that the nanodot spacing is about 
twice its diameter and the random in-plane easy axes, dipolar interactions are expected to 
be small.26 The relative spread of the FORC distribution along the HB-axis actually gives 
a direct measure of the interdot interactions, as we have shown in single domain 
magnetite nanoparticles with different separations.27 The sharp ridge shown in Fig. 2(g) is 
similar to that of an assembly of well–dispersed magnetite nanoparticles with little 
dipolar interactions. In the present case, the ridge is localized between bias field of HB ~ 
±100 Oe and has a narrow FWHM (full width at half maximum) of about 136 Oe [Fig. 
3(a)].   A simple calculation of the dipolar fields yields a value similar to the FWHM.     
As the nanodot size is increased, the FORC distribution becomes much more 
complex.  The 67 nm sample is characterized by three main features, as shown in Figs. 
2(f) and 2(i): two pronounced peaks at HC= 650 Oe and HB = ± 750 Oe, and a ridge along 
HB = 0, forming a butterfly-like contour plot [Fig. 2(i)]. The ridge has changed 
significantly from that of the 52 nm sample: a peak corresponding to the coercivity of the 
major loop is now virtually absent; instead a large peak at HC =1500 Oe has appeared, 
accompanied by two small negative regions nearby.  The 58 nm sample shows a FORC 
pattern representative of both the 52 and 67 nm samples [Figs. 2(e) and 2(h)].  The 
overall distribution resembles that of the 67 nm sample, with two peaks centered at HC = 
650 Oe and HB = ±400 Oe.  A ridge along HB = 0, peaking at roughly the major loop 
coercivity, is similar to that seen in the 52 nm sample.  Note that the 58 nm sample, being 
thinner, would tend to inhibit the formation of a VS in the smaller dots and therefore 
show magnetic reversal via a SD.  However, significant fractions of the nanodots in the 
ensemble are apparently reversing via a VS.    
The FORC distribution also allows us to extract quantitative information about 
the reversal processes.  Since each sample measured consists of ~ 1 billion nanodots with 
a distribution of sizes, a coercivity spread is contained in the FORC distribution.  For the 
52 nm sample, we have indeed extracted this distribution by projecting the ridge in Fig. 
2(g) onto the HC-axis (HB = 0), as shown as the open circles in Fig. 3(b).  The relative 
height gives the appropriate weight of nanodots with a given coercivity.  This extracted 
coercivity distribution can be compared with a simple theoretical calculation.  The 
coercivity of a SD particle undergoing reversal via a curling mode increases strongly with 
decreasing particle size d, according to   
HC −∝ ,     (2) 
where C1 and C2 are constants.28  Based on the mean nanodot size and the size 
distribution determined from SEM, we have calculated a coercivity distribution [solid 
circles in Fig. 3(b)]. A good agreement is obtained with that determined from the FORC 
distribution, after a rescaling of the latter by an arbitrary weight.  Thus for other non-
interacting single-domain particle systems with unknown size distributions, the FORC 
method may be used to extract that information. This is particularly important in 3D 
distributions of nanostructures where there is no direct image access to the individual 
dots, as is the case here for a 2D distribution. 
As we have demonstrated earlier, the FORC distribution ρ is extremely sensitive 
to irreversible switching.19 This is most convenient to see in the (H, HR) coordinate 
system (meaningful data is in H>HR), as non-zero values of ρ correspond to the degree of 
irreversibility along a given FORC. We have employed this capability to analyze the VS 
nucleation and annihilation for the 67 nm sample.  The complex butterfly-like pattern of 
Fig. 2(i) now transforms into irreversible switching mainly along line scans 1 and 2 in 
Fig. 4(b), which correspond to FORC’s starting at HR= 100 Oe and -1450 Oe, 
respectively [marked as bold with large open circles as starting points in Fig. 4(a)]. Along 
line scan 1 (HR=100 Oe), when applied field H=100 Oe, vortices have already nucleated 
in most of the nanodots. With increasing field H, ρ becomes non-zero and increases with 
H and peaks at 1320 Oe.  This corresponds to the annihilation of the vortices in majority 
of the nanodots, and eventually ρ returns to zero near positive saturation.  Line scan 2 
starts at HR= -1450 Oe, where the majority, but not all, of the nanodots have been 
negatively saturated.  As H is increased, a first maximum in ρ is seen at H= -100 Oe, 
corresponding to the nucleation of vortices within the nanodots. Between -100 Oe < H < 
1450 Oe, ρ is essentially zero, indicating reversible motion of the vortices through the 
nanodots.  A second ρ  maximum is found at H = 1450 Oe, as the vortices are 
annihilated. This is again followed by reversible behavior near positive saturation.  Note 
that along line scan 1, the vortices are annihilated from the same side of the nanodot from 
which they first nucleated, and thus the net magnetization remains positive; along line 
scan 2, the vortices nucleate on one side of the dot and are annihilated from the other, and 
consequently the net magnetization changes sign [Fig. 4(a)].  Interestingly the 
annihilation field along line scan 2, 1450 Oe, is larger than that along line scan 1, 1320 
Oe.  It seems more difficult to drive a vortex across the nanodot and then annihilate it. 
The peaks in Fig. 4(b) are rather broad, which is a manifestation of vortex nucleation and 
annihilation field distributions.  Also note that the interactions among the VS dots are 
expected to be negligible due to the high degree of flux closure, as confirmed by 
simulations.29 
IV. Simulations and Discussions 
For comparison, micromagnetic simulations have been carried out on nearly 
circular nanodots with 60 nm diameters and 20 nm thicknesses.30  We have used 
parameters appropriate for Fe (exchange stiffness A = 2.1 × 10-11 J/m, saturation 
magnetization Ms = 1.7 × 106 A/m, and anisotropy constant K = 4.8 × 104 J/m3).  Each 
polycrystalline nanodot is composed of 2 nm square cells that are 20 nm thick, where 
each cell is a different grain with a random easy axis.  A small cut on one side of the 
nanodot generates two distinct annihilation fields that depend on which side of the 
nanodot the vortex annihilates from. This exercise models the fact that our fabricated dots 
are not perfectly circular.31 We have simulated FORC’s generated by two nanodots with 
different orientations [Fig. 4(c)]: the edge-cut in one is parallel, and in the other at a 45º 
angle, to the applied field.  The simulated M-H curves show abrupt magnetization 
changes, corresponding to the nucleation, propagation, and annihilation of vortices.  
The corresponding FORC distribution is shown in Fig. 4(d).  Peaks in the 
simulated FORC distribution clearly indicate the nucleation and annihilation fields of the 
vortices which are apparent in Fig 4(c).  Along line scan 1 of Fig. 4(d), a vortex is already 
nucleated at HR = 100 Oe and subsequently annihilated at H = 2300 Oe (upper right 
corner).  Along line scan 2 with HR = -2450 Oe, a vortex is nucleated at H = -100 Oe and 
finally annihilated at H = 2450 Oe (lower right corner).  It is clear that the simulated 
FORC reproduces the key features of the experimentally obtained one in Fig. 4(b).  Here 
the asymmetric dot shape is essential to obtain a different annihilation field along scan 2 
than that along scan 1.  We have simulated the angular dependence of nucleation and 
annihilation fields in such circular dot with a small cut as the cut orientation is varied in a 
field.  We find that for most angles it is harder to annihilate a vortex from the opposite 
side of its nucleation site.  However, for a small range of angles near 45° it is actually 
slightly easier to annihilate from the opposite side.  It is the combination of these two 
behaviors that gives rise to the negative-positive-negative trio of features in the lower 
right portion of the FORC distribution.  The presence of similar features in the 
experimental data shown in Fig. 4(b) thus illustrates that the FORC distribution is also 
sensitive to variations of dot shapes in the array. Because only two dots are simulated, the 
features in the FORC distribution are much sharper than the experimental data where 
distributions of vortex nucleation and annihilation fields are present.  Including more dots 
in the simulation with different applied field orientations and size distributions would 
tend to broaden the features generated by the two dots simulated.      
Additionally, by selectively integrating the normalized FORC distribution20,21
corresponding to the SD phase (the aforementioned ridge centered at low coercivity 
values in Fig. 2), we can quantitatively determine the percentage of nanodots in SD state 
for each sample.  The SD phase fraction is 100%, 43%, and 10% for the 52, 58, and 67 
nm sample, respectively. Thus the 58 nm nanodots have a significant co-existence of both 
SD and VS states. However, we do not observe clear evidence of any additional 
metastable phase.15 
V. Conclusions 
In summary, we have used the FORC method to “fingerprint” the rich 
magnetization reversal behavior in arrays of 52, 58, and 67 nm sized Fe nanodots. 
Distinctly different reversal mechanisms have been captured, despite only subtle 
differences in the major hysteresis loops.  The 52 nm nanodots are in SD states.  A 
coercivity distribution has been extracted, which agrees with calculations. The 67 nm 
dots reverse their magnetization via the nucleation and annihilation of vortices. Different 
fields are required to annihilate vortices from opposite sides of the dots.  Quantitative 
measures of the vortex nucleation and annihilation fields have been obtained. OOMMF 
simulations confirm the experimental FORC distributions.   The 58 nm sample shows 
coexistence of SD and VS reversal, without evidence of additional reversal mode.  These 
results further demonstrate the FORC method as a simple yet powerful technique for 
studying magnetization reversal, due to its capability of capturing distributions of 
magnetic properties, sensitivity to irreversible switching, and the quantitative phase 
information it can extract. 
Acknowledgements 
This work has been supported by ACS (PRF-43637-AC10), AFOSR, and the 
Alfred P. Sloan Foundation. We thank J. E. Davies, J. Olamit, M. Winklhofer, C. R. Pike, 
H. G. Katzgraber, R. T. Scalettar, G. T. Zimányi, and K. L. Verosub for helpful 
discussions. R.K.D. acknowledges support from the Katherine Fadley Pusateri Memorial 
Travel Award. 
References 
*   Corresponding author, email address: kailiu@ucdavis.edu. 
1 C. Chappert, H. Bernas, J. Ferre, V. Kottler, J.-P. Jamet, Y. Chen, E. Cambril, T. 
Devolder, F. Rousseaux, V. Mathet, and H. Launois, Science 280, 1919 (1998). 
2 B. Terris, L. Folks, D. Weller, J. Baglin, A. Kellock, H. Rothuizen, and P. Vettiger, 
Appl. Phys. Lett. 75, 403 (1999). 
3 S. H. Sun, C. B. Murray, D. Weller, L. Folks, and A. Moser, Science 287, 1989 
(2000). 
4 C. Ross, Annu. Rev. Mater. Res. 31, 203 (2001). 
5 J. I. Martin, J. Nogues, K. Liu, J. L. Vicent, and I. K. Schuller, J. Magn. Magn. Mater. 
256, 449 (2003). 
6 F. Q. Zhu, G. W. Chern, O. Tchernyshyov, X. C. Zhu, J. G. Zhu, and C. L. Chien, 
Phys. Rev. Lett. 96, 027205 (2006). 
7 T. Shinjo, T. Okuno, R. Hassdorf, K. Shigeto, and T. Ono, Science 289, 930 (2000). 
8 R. P. Cowburn, D. K. Koltsov, A. O. Adeyeye, M. E. Welland, and D. M. Tricker, 
Phys. Rev. Lett. 83, 1042 (1999). 
9 A. Wachowiak, J. Wiebe, M. Bode, O. Pietzsch, M. Morgenstern, and R. 
Wiesendanger, Science 298, 577 (2002). 
10 K. Y. Guslienko, V. Novosad, Y. Otani, H. Shima, and K. Fukamichi, Phys. Rev. B 
65, 024414 (2002). 
11 H. F. Ding, A. K. Schmid, D. Li, K. Y. Guslienko, and S. D. Bader, Phys. Rev. Lett. 
94, 157202 (2005). 
12 J. Sort, K. S. Buchanan, V. Novosad, A. Hoffmann, G. Salazar-Alvarez, A. Bollero, 
M. D. Baró, B. Dieny, and J. Nogués, Phys. Rev. Lett. 97, 067201 (2006). 
13 We use "single domain state" to refer to all magnetization configurations with no 
domain wall and a non-zero net magnetization. 
14 Y. Li, P. Xiong, S. von Molnár, Y. Ohno, and H. Ohno, Phys. Rev. B 71, 214425 
(2005). 
15 A.-V. Jausovec, G. Xiong, and R. P. Cowburn, Appl. Phys. Lett. 88, 052501 (2006). 
16 Even in nanomagnets with identical size and shape, there may exist a distribution of 
intrinsic anisotropy. See e.g., T. Thomson, G. Hu and B. D. Terris, Phys. Rev. Lett. 
96, 257204 (2006). 
17 C.R. Pike and A. Fernandez, J. Appl. Phys. 85, 6668 (1999). 
18 H. G. Katzgraber, F. Pazmandi, C. R. Pike, K. Liu, R. T. Scalettar, K. L. Verosub, 
and G. T. Zimanyi, Phys. Rev. Lett. 89, 257202 (2002). 
19 J. E. Davies, O. Hellwig, E. E. Fullerton, G. Denbeaux, J. B. Kortright, and K. Liu, 
Phys. Rev. B 70, 224434 (2004). 
20 J. E. Davies, J. Wu, C. Leighton, and K. Liu, Phys. Rev. B 72, 134419 (2005). 
21 J. Olamit, K. Liu, Z. P. Li, and I. K. Schuller, Appl. Phys. Lett. 90, 032510 (2007). 
22 K. Liu, J. Nogues, C. Leighton, H. Masuda, K. Nishio, I. V. Roshchin, and I. K. 
Schuller, Appl. Phys. Lett. 81, 4434 (2002).  
23    C. P. Li, I. V. Roshchin, X. Batlle, M. Viret, F. Ott, I. K. Schuller, J. Appl. Phys. 100, 
074318 (2006). 
24 I. V. Roshchin, C. P. Li, X. Battle, J. Mejia-Lopez, D. Altbir, A. H. Romero, S. Roy, 
S. K. Sinha, M. Fitzimmons, F. Ott, M. Viret, and I. K. Schuller, unpublished. 
25 S. J. Cho, A. M. Shahin, G. J. Long, J. E. Davies, K. Liu, F. Grandjean, and S. M. 
Kauzlarich, Chem. Mater. 18, 960 (2006). 
26 M. Grimsditch, Y. Jaccard, and I. K. Schuller, Phys. Rev. B 58, 11539 (1998). 
27 J. E. Davies, J. Y. Kim, F. E. Osterloh, and K. Liu, unpublished. 
28    A. Aharoni, Introduction to the Theory of Ferromagnetism, 2nd Edition (Oxford 
University Press, Oxford, 2000). This is an approximation for reversal via curling as 
the dot sizes are larger than the coherence radius. 
29 J. Mejía-López, D. Altbir, A. H. Romero, X. Batlle, I. V. Roshchin, C. P. Li, and I. K. 
Schuller, J. Appl. Phys. 100, 104319 (2006). 
30   OOMMF code. http://math.nist.gov/oommf.  
31   Simulations done on perfectly circular and slightly elliptical dots show only a single 
annihilation field.  To reproduce the two distinct annihilation fields the symmetry of 
the dot must be broken by some type of shape defect.   
Figure Captions 
Fig. 1. (color online). Scanning electron micrograph of the 67 nm diameter nanodot 
sample. Inset is a histogram showing the distribution of nanodot sizes. 
Fig. 2. First-order reversal curves and the corresponding distributions. Families of 
FORC’s (a-c), whose starting points are represented by black dots for the 52, 58, 
and 67 nm Fe nanodots, respectively. The corresponding FORC distributions are 
shown in 3-dimensional plots (d-f) and contour plots (g-i).  
Fig. 3. (color online). Projection of the FORC distribution ρ of the 52 nm nanodots onto 
(a) the HB-axis, showing weak dipolar interactions; and (b) the HC-axis (open 
circles), showing a coercivity distribution that agrees with a calculation based on 
measured size distribution (solid circles).   
Fig. 4. (a) A family of measured FORC’s for the 67nm diameter dots. (b) The 
corresponding experimental FORC distribution plotted against applied field H 
and reversal field HR.  (c) A family of simulated FORC’s generated using the 
OOMMF code.  Inset shows the orientations of the two dots simulated.  (d) The 
FORC distribution calculated from the simulated FORC’s shown in (c).  The two 
white dashed lines in (b) and (d) correspond to the two bold FORC’s whose 
starting points are large open circles in (a) and (c), respectively.  
Fig. 1, Dumas, et al. 
Fig. 2, Dumas, et al. 
Fig. 3, Dumas, et al. 
Fig. 4, Dumas, et al.
ABSTRACT
  Sub-100 nm nanomagnets not only are technologically important, but also
exhibit complex magnetization reversal behaviors as their dimensions are
comparable to typical magnetic domain wall widths. Here we capture magnetic
"fingerprints" of 1 billion Fe nanodots as they undergo a single domain to
vortex state transition, using a first-order reversal curve (FORC) method. As
the nanodot size increases from 52 nm to 67 nm, the FORC diagrams reveal
striking differences, despite only subtle changes in their major hysteresis
loops. The 52 nm nanodots exhibit single domain behavior and the coercivity
distribution extracted from the FORC distribution agrees well with a
calculation based on the measured nanodot size distribution. The 58 and 67 nm
nanodots exhibit vortex states, where the nucleation and annihilation of the
vortices are manifested as butterfly-like features in the FORC distribution and
confirmed by micromagnetic simulations. Furthermore, the FORC method gives
quantitative measures of the magnetic phase fractions, and vortex nucleation
and annihilation fields.

<|endoftext|><|startoftext|>
Introduction
The data from the Swift satellite (Gehrels et al. 2004), and partic-
ularly its X-ray Telescope (XRT, Burrows et al. 2005), are rev-
olutionising our understanding of Gamma Ray Bursts (GRBs,
see Zhang 2007 for a recent review). The XRT typically begins
observing a GRB ∼ 100 s after the trigger, and usually follows
it for several days, and occasionally for months (e.g., Grupe et
al. 2007). However, creating light curves of the XRT data is a
non-trivial process with many pitfalls. The UK Swift Science
Data Centre is automatically generating light curves of GRBs –
an example light curve is given in Fig. 1 – and making them im-
mediately available online. In this paper we detail how the light
curves are created, and particularly, how the complications spe-
cific to these data are treated.
1.1. Aspects of light curve generation
In general, creation of X-ray light curves is a relatively simple,
quick task using ftools such as the xselect and lcmath pack-
ages. Building Swift/XRT light curves of GRBs, however, has a
number of complications which can make the task difficult and
slower, as described below.
⋆ pae9@star.le.ac.uk
⋆⋆ http://www.swift.ac.uk/xrt curves
100 1000 104 105 106
Time since BAT trigger (s)
Fig. 1. Swift X-ray light curve of GRB 051117a (Goad et al.
2007), created using the software described in this paper and
obtained from the Swift Light Curve Repository.
1.1.1. GRBs fade
The standard light curve tools, such as those mentioned above,
produce light curves with uniform bin durations. Since GRBs
fade by many orders of magnitude, long-duration bins are
needed at late times in order to detect the source. However,
GRBs show rapid variability and evolution at early times, and
http://arxiv.org/abs/0704.0128v2
2 P.A. Evans et al.: An online repository of Swift/XRT light curves of GRBs.
1062×105 5×105 2×106 5×106
Time since BAT trigger (s)
Static region size
1062×105 5×105 2×106 5×106
Time since BAT trigger (s)
Dynamic region size
Fig. 2. Late-time Swift X-ray light curves of GRB 060614
(Mangano et al. 2007), showing the need for the source region
to be reduced as the data fades.
Top panel: Where the source extraction region remains large at
late times, the source cannot be detected after 600 ks.
Bottom panel: Using a smaller source extraction region at later
times suppresses the background, yielding 6 more datapoints on
the light curve.
short time bins are needed to resolve these features. A better ap-
proach to producing GRB light curves is to bin data based on
the number of counts in a bin, rather than the bin duration. This
is common practice for X-ray spectroscopy, however there are
no ftools available to do this for light curves. While this is our
chosen means of binning GRB light curves, it is not the only
option. For example, one could use the Bayesian blocks method
(Scargle 1998) to determine the bin size.
Another complication caused by the fading nature of GRBs
is that when the burst is bright, it is best to extract data for a
relatively large radius around the GRB position, to maximise the
number of counts measured. When the GRB has faded, using
such a large region means that the measured counts would be
dominated by background counts, making it harder to detect the
source, thus it is necessary to reduce the source region size as
the GRB fades. This is illustrated in Fig. 2.
1.1.2. Swift data contain multiple observations and
snapshots
The Swift observing schedule is planned on a daily basis, and
each day’s observation of a given target has its own observa-
tion identification (ObsID) and event list. Thus if Swift follows a
GRB for two weeks, it will produce up to fourteen event lists, all
of which need to be used in light curve creation. At late times it
may become necessary to combine several datasets just to detect
the GRB.
Also, Swift’s low-Earth orbit means that it is unable to ob-
serve most targets continuously. Thus, any given ObsID may
contain multiple visits to the target (‘snapshots’) which again
will need to be combined (this differentiation between observa-
tions – datasets with a unique ObsID – and snapshots – different
on-target times within an ObsID – will be used throughout this
paper). Combining snapshots/observations can result in bins on
a light curve where the fractional exposure is less than 1. This
must be taken into account in calculating the count rate.
The standard pipeline processing of Swift data1 ensures that
the sky coordinates are correctly attained for each event, how-
ever the position of the GRB on the physical detector can be
different each snapshot due to changes in the spacecraft attitude.
This becomes a problem when one considers the effects of bad
pixels and columns.
1.1.3. CCD Damage
On 2005-May-27 the XRT was struck by a micrometeoroid
(Abbey et al. 2005). Several of the detector columns became
flooded with charge (‘hot’), and have had to be permanently
screened out. Unfortunately, these lie near the centre of the CCD,
so the point spread function (PSF) of a GRB often extends over
these bad columns. As well as these columns there are individual
‘hot pixels’ which are screened out, and other pixels which be-
come hot when the CCD temperature rises, so may be screened
out in one event list, but not in the next. Exposure maps and
the xrtmkarf tool can be used to correct for this, however this
has to be done individually for each Swift snapshot (since the
source will not be at the same detector position from one snap-
shot to the next). A single day’s observation contains up to 15
snapshots, thus to do this manually is a slow, laborious task. The
forthcoming xrtlccorr program should make this process eas-
ier, however it will still need to be executed for each observation.
1.1.4. Automatic readout-mode switching
One of the XRT’s innovative features is that it changes readout
mode automatically depending on the source intensity (Hill et
al. 2004). At high count-rates it operates in Windowed Timing
(WT) mode, where some spatial information is sacrificed to
gain time resolution (∆t = 1.8 × 10−3 s). At lower count-rates
Photon Counting (PC) mode is used, yielding full spatial infor-
mation, but lower time resolution (∆t = 2.5 s). The XRT also has
Photodiode (PD) mode, which contains no spatial information,
but has very high time resolution (∆t = 1.4× 10−4 s). This mode
was designed to operate for higher count-rates than WT mode,
however it was disabled following the micrometeoroid impact.
Prior to this, the XRT produced very few PD mode frames be-
fore switching to WT so we have limited our software to WT
and PC modes.
1 http://swift.gsfc.nasa.gov/docs/swift/analysis/xrt swguide v1 2.pdf
P.A. Evans et al.: An online repository of Swift/XRT light curves of GRBs. 3
For a simple, decaying GRB the earliest data are in WT
mode and as the burst fades the XRT switches to PC mode.
This is not always the case; the XRT can toggle between modes.
GRB 060929 for example, had a count-rate of ∼ 0.1 counts s−1
and the XRT was in PC mode, when a giant flare pushed the
count-rate up to ∼ 100 counts s−1 and the XRT switched into
WT mode, causing a ∼ 200 s gap in the PC exposure (Fig. 3, up-
per panel). Since the initial CCD frames are taken in WT mode,
and the PC data both preceded and succeeded the WT data taken
during the flare, there are large overlaps between the WT and PC
data.
On occasions, such as when Swift was observing GRB
050315 (Vaughan et al. 2006), the XRT oscillates rapidly be-
tween WT and PC modes (Fig. 3, lower panel). This ‘mode
switching’ occurs when the count-rate in the central window
of the CCD changes rapidly. Such variation is usually due to
the rapid appearance and disappearance of hot pixels at high
(∼ −52◦C) CCD temperatures (the XRT is only passively cooled
due to the failure of the on-board thermoelectric cooler, Kennea
et al. 2005), although contamination by photons from the illumi-
nated face of the Earth can also induce mode switching. Recent
changes in the on-board calibration have significantly reduced
the effects of hot-pixel induced mode switching, however when
it does happen it complicates light curve production by causing
a variable fractional exposure. Also, during mode switching the
XRT does not stay in either mode long enough to collect suffi-
cient data to produce a light curve bin (see Section 2.2), thus the
WT and PC bins can overlap. The lower panel of Fig. 3 illus-
trates these points.
1.1.5. Pile-up
Pile-up occurs when two photons are incident upon on the same
or adjacent CCD pixels in the same CCD frame. Thus, when the
detector is read out the two photons are recorded as one event.
Pile-up in the Swift XRT has been discussed by Romano et al.
(2006) for WT mode and Vaughan et al. (2006) and Pagani et
al. (2006) for PC mode. Their quantitative analyses show the
effects of pile-up at different count rates, and we used these val-
ues to determine when we consider pile-up to be a problem (see
Section 2.2).
This problem is not unique to Swift, but because GRBs vary
by many orders of magnitude, pile-up must be identified and cor-
rected in a time-resolved manner. The standard way to correct
for pile-up is to use an annular source extraction region, discard-
ing the data near the centre of the PSF where pile-up occurs.
For constant sources, or those which vary about some roughly
constant mean, it is usually safe to use this annular region at all
times. This is not true for GRBs, which can span five decades in
brightness; using an annulus when the burst is faint would make
it almost undetectable!
In the following sections we detail the algorithm used to gen-
erate light curves automatically, and in particular we concentrate
on how the above issues are resolved.
Time since BAT trigger (s)
100 1000200 500 2000
Time since BAT trigger (s)
Fig. 3. Swift X-ray light curves of two GRBs, showing the
switching between readout modes. WT mode is blue, PC mode
Top panel: GRB 060929. The XRT changed from PC to WT
mode due to a large flare.
Bottom panel: GRB 050315. The XRT was ‘mode-switching’
during the second snapshot. The lower pane shows the fractional
exposure, which is highly variable due to this effect.
2. Light curve creation procedure
The raw Swift/XRT data are processed at the Swift Data Center
at NASA’s Goddard Space Flight Center, using the standard
Swift software developed at the ASI Science Data Center
(ASDC) in Italy. The processed data are then sent to the Swift
quick-look archives at Goddard, the ASDC, and the UK. As soon
as data for a new GRB arrive at the UK site, the light curve
generation software is triggered, and light curves made available
within minutes.
The light curve creation procedure can be broken down into
three phases. The preparation phase gathers together all of the
observations of the GRB, creating summed source and back-
ground event lists. The production phase converts these data
into time-binned ASCII files, applying corrections for the above-
mentioned problems in the process. The presentation phase then
produces light curves from the ASCII files, and transfers them
to the online light curve repository.
4 P.A. Evans et al.: An online repository of Swift/XRT light curves of GRBs.
2.1. Phase #1 – Preparation Phase
In overview: this phase collates all of the observations, de-
fines appropriate source and background regions (accounting for
pileup where necessary), and ultimately produces a source event
list and background event list for WT and PC mode, which are
then passed to the production phase.
The preparation phase begins by creating a list of ObsIDs
for the GRB, and then searching the file metadata to ascertain
the position of the burst, the trigger time, and the name. An im-
age is then created from the first PC-mode event list and the
ftool xrtcentroid is used to obtain a more accurate position. A
circular source region is then defined, centred on this position,
and initially 30 pixels (71”) in radius. A background region is
also defined, as an annulus centred on the burst with an inner ra-
dius of 60 pixels (142”) and an outer radius of 110 pixels (260”).
The image is also searched for serendipitous sources close to the
GRB (e.g. there is a flare star 40” from GRB 051117A; Goad et
al. 2007), and if any are found to encroach on the source region,
the source extraction radius is reduced to prevent contamination.
The software then takes each event list in turn. The bad pixel
information is obtained from the ‘BADPIX’ FITS extension and
stored for use in the production phase. An image of the back-
ground region is created, and the detect routine in ximage is
used to identify any sources with a count-rate ≥ 3σ above the
background level. For each source thus found, a circular region
centred on the source with a radius equal to the source extent
returned by ximage, is excluded from the background region.
The event list is then broken up into individual snapshots, the
mean count-rate during each snapshot is ascertained and used to
determine an appropriate source region size (Table 1) and ap-
pended to the event list for later use. The values in Table 1 were
determined by manual analysis of many GRB observations, and
reflect a compromise between minimising the background level
while maximising the proportion of source counts that are de-
tected.
For each snapshot, the detector coordinates of the object are
found using the pointxform ftool, and used to confirm that both
source and background regions lie within the CCD. If the back-
ground region falls off the edge of the detector it is simply shifted
by an appropriate amount (ensuring that the inner ring of the an-
nulus remains centred on the source). The source region must
remain centred on the source in order for later count-rate correc-
tions to be valid, however if this results in part of the extraction
region lying outside of the exposed CCD area, the source region
for this snapshot is reduced.
The first part of pile-up correction is carried out at this stage.
A simple, uniform time-bin light curve is created with bins of 1 s
(5 s) for WT (PC) mode, and then parsed to identify times where
the count rate climbs above 150 (0.6) counts s−1; such times are
considered to be at risk of pile-up. For WT mode we are un-
able to investigate further, since we have only one-dimensional
spatial information arranged at an arbitrary (albeit known) an-
gle in a two dimensional plane, and no tools currently exist to
extract a PSF from such data. Instead, the centre of the source
region is excluded, such that the count rate in the remaining pix-
els never rises above 150 counts s−1. The number of excluded
pixels is typically in the range ∼ 6–20, depending on the source
brightness. For PC mode, a PSF profile is obtained for the times
of interest, and the wings of this (from 25” outwards) are fitted
with a King function which accurately reproduces Swift’s PSF
(Moretti et al. 2005). This fit is then extrapolated back to the
PSF core, and if the model exceeds the data by more than the
1-σ error on the data, the source is classified as piled up (Fig. 4).
10 100
Radius (arc sec)
Fig. 4. The PSF of GRB 061121 (Page et al. 2007), during the
first snapshot of PC data. The model PSF was fitted to the data
more than 25” from the burst. The central 10” are clearly piled
Table 1. Source extraction radii used for given count rates. R is
the measured, uncorrected count rate.
Count rate R (counts s−1) Source radius in pixels (arc sec)
R > 0.5 30 (70.8”)
0.1 < R ≤ 0.5 25 (59.0”)
0.05 < R ≤ 0.1 20 (47.2”)
0.01 < R ≤ 0.05 15 (35.4”)
0.005 < R ≤ 0.01 12 (28.3”)
0.001 < R ≤ 0.005 9 (21.2”)
0.0005 < R ≤ 0.001 7 (16.5”)
R ≤ 0.0005 5 (11.8”)
The source region is then replaced with an annular region whose
inner radius is that at which the model PSF and the data agree
to within 1-σ of the data. Note that these annular regions are
only used during the intervals for which pile-up was detected,
the rest of the time a circular region is used (or a box-shaped
region for WT mode). If there are several separate intervals of
pile-up (e.g., pile-up lasts for several snapshots, or a flare causes
the count-rate to rise into the pile-up régime), they each have
their own annular region. The inner radii of the annuli (or size of
the excluded region in WT mode) are stored in the event list, so
that in the production phase the count-rate can be corrected for
events lost by the exclusion of the central part of the PSF.
The time-dependent region files thus created are used to
generate source and background event lists for this snapshot.
This process is performed for every snapshot in every observa-
tion of the GRB, and the event lists are then combined to yield
one source and one background event list for each XRT mode.
Additionally, all PC-mode event lists are merged for use in the
presentation phase (phase #3)
2.2. Phase #2 – Production Phase
In this phase the data are first filtered so that only events with
energy in the range 0.3–10 keV are included. For WT (PC)
mode, only events with grades 0–2 (0–12) are accepted. Each
mode is then processed separately: WT and PC mode data are
not merged. The process described in this section occurs three
times in parallel: once on the entire dataset, once binning only
soft photons (with energies in the range 0.3–1.5 keV), and once
binning only the hard photons (1.5–10 keV). The data are then
binned and background subtracted. Since the source region is
P.A. Evans et al.: An online repository of Swift/XRT light curves of GRBs. 5
dynamic and could change within a bin, each background pho-
ton is individually scaled to the source area (the source radius
used was saved in each event list during the preparation phase).
A bin (i.e. a point on the light curve) is defined as the small-
est possible collection of events which satisfies the following
criteria:
– There must be at least C counts from the source event list.
– The bin must span at least 0.5 (2.51) s in WT (PC) mode.
– The source must be detected at a significance of at least 3σ.
– There must be no more events within the source region in
this CCD frame
For the energy-resolved data, both the soft and hard data
must meet these criteria individually to complete a bin.
C, the minimum number of counts in the source region, is a
dynamic parameter. Its default value of 30 for WT mode and 20
for PC mode is valid when the source count-rate is one count per
second. It scales with count rate, such that an order of magnitude
change in count rate produces a factor of 1.5 change in C. This
is done discretely, i.e. where 1 ≤ rate < 10, C=30 counts (WT
mode), for 10 ≤ rate < 100, C = 45 counts etc. C must always
be above 15 counts, so that Gaussian statistics remain valid. Note
that C always refers to the number of measured counts, with no
corrections applied, however ‘rate’ refers to the corrected count
rate (see below). These values of C give poor signal-to-noise
levels in the hardness ratio, so for the energy-resolved data we
require 2C counts in each band in order to create a bin.
The second criterion (the bin duration) is in place to enable
reasonable sampling of the background. For the third criterion
we define the detection significance as σ = N/
B, where N is
the number of net counts from the source and B is the number
of background counts scaled to the source area. Thus we require
that a datapoint have a < 0.3% probability of being a background
fluctuation before we regard it as ‘real’.
The final criterion is used because the CCD is read out at dis-
crete times, thus all events that occur between successive read-
outs (i.e. within the same frame) have the same time stamp.
Thus, if the final event in one bin and the first event in the next
were from the same frame, those bins would overlap. Apart from
being cosmetically unpleasant, this will also make modelling the
light curves much harder, and is thus avoided.
At the end of a Swift snapshot, there may be events left over
which do not yet comprise a full bin. These will be appended
to the last full bin from this snapshot, if there is one, other-
wise they are carried over to the next snapshot. At the end of
the event list, if there are still spare events, this bin is replaced
with an upper limit on the count rate. This is calculated at the 3σ
(i.e. 99.7%) confidence level, using the Bayesian method cham-
pioned by Kraft, Burrows and Nousek (1991).
As the data are binned and background subtracted, the count-
rates are corrected for losses due to pile-up, dead zones on
the CCD (i.e. bad pixels and bad columns) and source pho-
tons which fell outside the source extraction region. This cor-
rection, which is applied on an event-by-event basis, is achieved
by numerically simulating the PSF for the relevant XRT mode
over a radius of 150 pixels, and summing it. It is then summed
again, however this time, the value of any pixel in the simu-
lated PSF which corresponds to a bad pixel in the data is set to
zero before the summation (the lists of bad pixels and the times
for which they were bad were saved in the preparation phase).
Furthermore, only the parts of the PSF which were within the
data extraction region are included. Taking the ratio of the com-
plete PSF to the partial PSF gives the correction factor. This
method is analogous to using exposure maps and the xrtmkarf
task, as is done when manually creating light curves. Alternative
methods of using xrtmkarf give correction factors which differ
by up to 5%; we compared our correction factors with these, and
found them to lie in the middle of this distribution.
In addition to these corrections, we need to ensure that the
exposure time is calculated correctly: mode switching, or bins
spanning multiple snapshots, will result in a bin duration which
is much longer than the exposure time. This is done by using
the Good Time Interval (GTI) information from the event lists:
if a bin spans multiple GTIs the dead-times between GTIs are
summed, and the result is subtracted from the bin duration to
give the exposure time, which is used to calculate the count rate.
The fractional exposure is defined as the exposure time divided
by the bin duration.
Finally, the data are written to ASCII files. The following
information is saved for each bin:
– Time in seconds (with errors). The bin time is defined as the
mean photon arrival time, and the (consequentially asym-
metric) errors span the entire time interval covered by the
bin. Time zero is defined as the BAT trigger time. For non-
Swift bursts, the trigger time given in the GCN circular which
announced the GRB is used as time zero.
– Source count rate (and error) in counts s−1. This is the final
count rate, background subtracted and fully corrected, with
a ±1-σ error.
– Fractional exposure.
– Background count rate (and error) in counts s−1. This is the
background count rate scaled to the source region, with a
±1-σ error.
– Correction factor applied to correct for to pile-up, dead zones
on the CCD, and source photons falling outside of the source
extraction region.
– Measured counts in the source region.
– Measured background counts, scaled to the source region.
– Exposure time
– Detection significance (σ), before corrections were applied.
If an upper limit is produced, the measured counts and de-
tection significance columns refer to the data which have been
replaced with an upper limit. The significance of the upper limit
is always 3σ.
σ is always calculated before the corrections are applied,
since it is a measure of how likely it is that the measured counts,
not corrected counts, were caused by a fluctuation in the back-
ground level.
2.2.1. Counts to flux conversion
The conversion from count rates (as in our light curves) to flux
requires spectral information. Since automatic spectral fitting is
prone to errors (e.g due to local minima of the fit statistic), we
refrain from doing this. Furthermore, accurate flux conversion
needs to take into account spectral variation as the flux evolves,
which is beyond the scope of this work.
The GCN reports issued by the Swift team contain a mean
conversion factor for a given burst. These tend to be around
5×10−11 erg cm−2 count−1(0.3–10 keV), suggesting such a value
could be used as an approximate conversion. For 10 Swift bursts
between GRB 070110 and GRB 070306, the mean flux conver-
sion is 5.04 × 10−11 erg cm−2 count−1, with a standard deviation
of 2.61 × 10−11 erg cm−2 count−1.
6 P.A. Evans et al.: An online repository of Swift/XRT light curves of GRBs.
2.3. Phase #3 – Presentation Phase
The final phase parses the output of the production phase to pro-
duce light curves. Three such curves are produced, and Fig. 5
shows an example of each; count-rates and the time since trig-
ger are plotted logarithmically. The first is a basic light curve,
simply showing count-rate against time. The second also shows
the background level and fractional exposure. In WT mode when
the GRB is bright, the background tends to be dominated by the
< 1% of the PSF which leaks into the background region, but
because of the high source count rate, this has negligible effects
on the corrected count rates. The PC mode background should
generally be approximately constant. If it shows large variations,
the data may be contaminated by enhanced background linked to
the sunlit Earth. Unfortunately, such contamination is currently
unpredictable and varies both spatially and temporally; it is thus
very difficult to correct for manually, and our automated pro-
cessing does not currently correct for this. PC mode data points
which occur during times of variable background should thus be
treated with caution. We note, however, that our testing proce-
dure (Section 3) does not show our light curves to be degraded
when bright Earth characteristics become apparent.
The third light curve produced in this phase is energy-
resolved. The hard- and soft-band light curves are shown sep-
arately, and the hard/soft ratio makes up the bottom panel of this
plot.
Also created in this phase is a deep PC mode image, using
the summed PC event list created in the preparation phase. This
image is split into three energy bands: 0.3–1.2 keV, 1.2–1.8 keV
and 1.8–10 keV. These bands were chosen based on the spectra
of the GRBs seen by Swift to date, to ensure that for a ‘typical’
burst, there will be approximately equal numbers of counts in
each band. These three energy-resolved images are plotted on
a logarithmic scale, and combined (using ImageMagick to pro-
duce a 3-colour image (with red, green and blue being the soft,
medium and hard bands respectively). This is then smoothed us-
ing ImageMagick.
Once created, these products are transferred to the online
repository.
2.4. Immediate light curve regeneration
Our light curve generation is a dynamic process: a light curve is
created when the first XRT data arrive in the UKSSDC archive
– typically 1.5–2 hours after the burst – and it is then up-
dated whenever new data have been received and undergone the
pipeline processing. Thus, a light curve should never be more
than ∼ 15 minutes older than the quick-look data. If the GRB is
being observed every orbit, new data can be received as often as
every 96 minutes.
The update procedure is identical to that described in
Sections 2.1–2.3 above, except that only the new data are pro-
cessed and the results appended to the existing light curve. In
the case where the existing light curves ends with an upper limit,
the data from this upper limit are reprocessed with the new data,
hopefully enabling that limit to be replaced with a detection.
3. Testing procedure
In order to confirm that our light curves are correct, we used
manually created light curves for every GRB detected by the
Swift XRT up to GRB 070306, which had been produced by one
of us (K.L. Page). We broke each light curve up into phases of
constant (power-law) decay, and compared the count-rate and
100 1000 104 105 106 107
Time since BAT trigger (s)
100 1000 104 105 106 107
Time since BAT trigger (s)
100 1000 104 105 106 107
Time since BAT trigger (s)
Fig. 5. Light curve images for GRB 060729. These data have
been discussed by Grupe et al. (2007).
Top panel: Basic light curve.
Centre panel: Detailed light curve, with background levels
shown below the light curve, and fractional exposure given in
the lower pane.
Bottom panel: Hardness ratio. The 3 panes are (top to bot-
tom) Hard: (1.5–10 keV) , Soft (0.3–1.5 keV) and the ratio
(Hard/Soft).
P.A. Evans et al.: An online repository of Swift/XRT light curves of GRBs. 7
Fig. 6. An example three colour image. This is GRB 060729, and
the exposure time is 1.2 Ms.
time at the start and end of each of these phases. We also con-
firmed that the shape of the decay was the same in both auto-
matic and manually created light curves. Where applicable, we
also confirmed that the transition between XRT read-out modes
looked the same in both sets of light curves. Once we were sat-
isfied that our light curves passed this test, we also compared
a random sample of 30 GRBs with those manually created by
other members of the Swift/XRT team and again found good
agreement.
4. Data availability and usage
Our light curve repository is publicly available via the internet,
http://www.swift.ac.uk/xrt curves/
Specific light curves can be accessed directly by appending
their Swift target ID to this URL2.
While every effort has been made to make this process com-
pletely automatic, there may be cases where the light curve gen-
eration fails (e.g. if the source is too faint to centroid on, or if
there are multiple candidates within the BAT error circle). In this
event, a member of the Swift/XRT team will manually instigate
the creation procedure as soon as possible. For GRBs detected
by other observatories which Swift subsequently observes, the
creation procedure will not be automatically triggered, however
the XRT team will trigger it manually in a timely manner.
These light curves, data and images may be used by anyone.
In any publication which makes use of these data, please cite this
paper in the body of your publication where the light curves are
presented. The suggested wording is:
“For details of how these light curves were produced, see
Evans et al. (2007).”
2 The target ID is the trigger number, given in the GCN notices and
circulars, but padded with leading zeroes to be 8 digits long. e.g. GRB
060729 had the trigger number 221755, so its target ID is 00221755
Please also include the following paragraph in the
Acknowledgements section:
“This work made use of data supplied by the UK Swift
Science Data Centre at the University of Leicester.”
5. Acknowledgements
PAE, APB, KLP, LGT and JPO acknowledge PPARC support.
LV, JR, PM and DNB are supported by NASA contract NAS5-
00136.
References
Abbey, A.F., Carpenter, J., Read, A., et al. 2005, in ESA-SP 604, ‘The X-ray
Universe 2005’, 943
Burrows, D.N., Hill, J.E., Nousek, J.A., et al. 2005, Sp. Sci. Rev, 120, 165
Gehrels N., Chincarini, G., Giommi, P., et al. 2004, ApJ, 611, 1005
Goad, M.R., Page, K.L., Godet, O., et al., 2007, A&A, in press
(astro-ph/0612661)
Grupe, D., Burrows, D.N., Patel, S.K., et al. 2007, ApJ, in press
(astro-ph/0611240)
Hill, J.E., Burrows, D.N., Nousek, J.A., et al. 2004, SPIE, 5165, 217
Kraft, R.P., Burrows, D.N., Nousek, J.A., et al. 1991, ApJ, 374, 344
Kennea, J.A., Burrows, D.N., Wells, A., et al. 2005, SPIE, 5898, 329
Moretti, A., Campana, S., Mineo, T., et al. 2005, SPIE, 5898, 360
Pagani, C., Morris, D.C., Kobayashi S., et al. 2006, ApJ, 645, 1315
Page, K.L., Willingale, R., Osborne, J.P., et al. 2007, ApJ, in press (astro-
ph/0704.1609)
Romano, P., Campana, S., Chincarini, G., et al. 2006, A&A, 456, 917
Scargle J.D., 1998, ApJ, 504, 405
Vaughan, S., Goad, M.R., Beardmore, A.P., et al., 2006, ApJ, 638, 920
Mangano, V., Holland, S.T., Malesani, D., et al., 2007, A&A, in press
Zhang, B., 2007, ChJAA, 7, 1
http://www.swift.ac.uk/xrt_curves/
http://arxiv.org/abs/astro-ph/0612661
http://arxiv.org/abs/astro-ph/0611240
	Introduction
	Aspects of light curve generation
	GRBs fade
	Swift data contain multiple observations and snapshots
	CCD Damage
	Automatic readout-mode switching
	Pile-up
	Light curve creation procedure
	Phase #1 – Preparation Phase
	Phase #2 – Production Phase
	Counts to flux conversion
	Phase #3 – Presentation Phase
	Immediate light curve regeneration
	Testing procedure
	Data availability and usage
	Acknowledgements
ABSTRACT
  Context. Swift data are revolutionising our understanding of Gamma Ray
Bursts. Since bursts fade rapidly, it is desirable to create and disseminate
accurate light curves rapidly.
  Aims. To provide the community with an online repository of X-ray light
curves obtained with Swift. The light curves should be of the quality expected
of published data, but automatically created and updated so as to be
self-consistent and rapidly available. Methods. We have produced a suite of
programs which automatically generates Swift/XRT light curves of GRBs. Effects
of the damage to the CCD, automatic readout-mode switching and pile-up are
appropriately handled, and the data are binned with variable bin durations, as
necessary for a fading source.
  Results. The light curve repository website
(http://www.swift.ac.uk/xrt_curves) contains light curves, hardness ratios and
deep images for every GRB which Swift's XRT has observed. When new GRBs are
detected, light curves are created and updated within minutes of the data
arriving at the UK Swift Science Data Centre.

<|endoftext|><|startoftext|>
Introduction.
	2. The Aubry set and the quotient Aubry set.
	3. The main result.
	4. Proof of the Main Lemma.
	5. Proof of a modified version of Kneser-Glaeser's Rough composition theorem.
	References
ABSTRACT
  In this paper we show that the quotient Aubry set associated to certain
Lagrangians is totally disconnected (i.e., every connected component consists
of a single point). Moreover, we discuss the relation between this problem and
a Morse-Sard type property for (difference of) critical subsolutions of
Hamilton-Jacobi equations.

<|endoftext|><|startoftext|>
Introduction
1.1. Setting. We use standard notations of [FH, S]; for the precise definition (algorithm) of
generalized Cartan-Tanaka–Shchepochkina (CTS) complete and partial prolongations, and
algorithms of their construction, see [Shch]. Hereafter K is an algebraically closed field
of characteristic p > 2, unless specified. Let g′ = [g, g], and c(g) = g ⊕ center, where
dim center = 1. Let n)g denote the incarnation of the Lie (super)algebra g with the n)th
Cartan matrix, cf. [GL4, BGL1]. On classification of simple vectorial Lie superalgebras
with polynomial coefficients (in what follows referred to as vectorial Lie superalgebras of
polynomial vector fields over C, see [LSh, K3]).
The works of S. Lie, Killing and È. Cartan, now classical, completed classification over C
(1) simple Lie algebras of finite dimension and of polynomial vector fields.
Lie algebras and Lie superalgebras over fields in characteristic p > 0, a.k.a. modular Lie
(super)algebras, were first recognized and defined in topology, in the 1930s. The simple
Lie algebras drew attention (over finite fields K) as a step towards classification of simple
finite groups, cf. [St]. Lie superalgebras, even simple ones and even over C or R, did not
draw much attention of mathematicians until their (outstanding) usefulness was observed by
physicists in the 1970s. Meanwhile mathematicians kept discovering new and new examples
of simple modular Lie algebras until Kostrikin and Shafarevich ([KS]) formulated a conjecture
embracing all previously found examples for p > 7. Its generalization reads: select a Z-form
gZ of every g of type
1) (1), take gK := gZ⊗Z K and its simple finite dimensional subquotient
si(gK) (there can be several such si(gK)). Together with deformations
2) of these examples
we get in this way all simple finite dimensional Lie algebras over algebraically closed fields
if p > 5. If p = 5, we should add to the above list Melikyan’s examples.
1991 Mathematics Subject Classification. 17B50, 70F25.
Key words and phrases. Cartan prolongation, nonholonomic manifold, Lie superalgebra.
We are thankful to I. Shchepochkina for help; DL is thankful to MPIMiS, Leipzig, for financial support
and most creative environment.
1)Observe that the algebra of divided powers (the analog of the polynomial algebra for p > 0) and hence
all prolongs (Lie algebras of vector fields) acquire one more — shearing — parameter: N , see [S].
2)It is not clear, actually, if the conventional notion of deformation can always be applied if p > 0 (for
the arguments, see [LL]; cf. [Vi]); to give the correct (better say, universal) notion is an open problem, but
in some cases it is applicable, see [BGL4].
http://arxiv.org/abs/0704.0130v1
2 Sofiane Bouarroudj, Pavel Grozman, Dimitry Leites
Having built upon ca 30 years of work of several teams of researchers, and having added
new ideas and lots of effort, Block, Wilson, Premet and Strade proved the generalized KSh
conjecture for p > 3, see [S]. For p ≤ 5, the above KSh-procedure does not produce all
simple finite dimensional Lie algebras; there are other examples. In [GL4], we returned to
É. Cartan’s description of Z-graded Lie algebras as CTS prolongs, i.e., as subalgebras of
vectorial Lie algebras preserving certain distributions; we thus interpreted the “mysterious”
at that moment exceptional examples of simple Lie algebras for p = 3 (the Brown, Frank,
Ermolaev and Skryabin algebras), further elucidated Kuznetsov’s interpretation [Ku1] of
Melikyan’s algebras (as prolongs of the nonpositive part of the Lie algebra g(2) in one of
its Z-gradings) and discovered three new series of simple Lie algebras. In [BjL], the same
approach yielded bj, a simple super versions of g(2), and Bj(1;N |7), a simple p = 3 super
Melikyan algebra. Both bj and Bj(1;N |7) are indigenous to p = 3, the case where g(2) is
not simple.
1.2. Classification: Conjectures and results. Recently, Elduque considered super analogs
of the exceptional simple Lie algebras; his method leads to a discovery of 10 new simple
(presumably, exceptional) Lie superalgebras for p = 3. For a description of the Elduque
superalgebras, see [CE, El1, CE2, El2]; for their description in terms of Cartan matrices and
analogs of Chevalley relations and notations we use in what follows, see [BGL1, BGL2].
In [L], a super analog of the KSh conjecture embracing all types of simple (finite dimen-
sional) Lie superalgebras is formulated based on an entirely different idea in which the CTS
prolongs play the main role:
F o r e v e r y s i m p l e f i n i t e d i m e n s i o n a l L i e ( s u p e r ) a l g e b r a o f t h e
f o r m g(A) , t a k e i t s n o n - p o s i t i v e p a r t w i t h r e s p e c t t o a c e r t a i n
s i m p l e s t Z - g r a d i n g , c o n s i d e r i t s c o m p l e t e a n d p a r t i a l p r o l o n g s
a n d t a k e t h e i r s i m p l e s u b q u o t i e n t s .
The new examples of simple modular Lie superalgebras (BRJ, Bj(3;N |3), Bj(3;N |5))
support this conjecture. (This is how Cartan got all simple Z-graded Lie algebras of poly-
nomial growth and finite depth — the Lie algebras of type (1) — at the time when the root
technique was not discovered yet.)
1.2.1. Yamaguchi’s theorem ([Y]). This theorem, reproduced in [GL4, BjL], states that
for almost all simple finite dimensional Lie algebras g over C and their Z-gradings g = ⊕
of finite depth d, the CTS prolong of g≤ = ⊕
−d≤i≤0
gi is isomorphic to g, the rare exceptions
being two of the four series of simple vectorial algebras (the other two series being partial
prolongs).
1.2.2. Conjecture. In the following theorems, we present the results of SuperLie-assisted
([Gr]) computations of the CTS-prolongs of the non-positive parts of the simple finite di-
mensional Lie algebras and Lie superalgebras g(A); we have only considered Z-grading cor-
responding to each (or, for larger ranks, even certain selected) of the simplest gradings
r = (r1, . . . , rrkg), where all but one coordinates of r are equal to 0 and only one — selected
— is equal to 1, and where we set degX±i = ±ri for the Chevalley generators X±i of g(A),
see [BGL1].
O t h e r g r a d i n g s ( a s w e l l a s a l g e b r a s g(A) o f h i g h e r r a n k s ) d o
n o t y i e l d n e w s i m p l e L i e ( s u p e r ) a l g e b r a s a s p r o l o n g s o f t h e n o n -
p o s i t i v e p a r t s.
New simple modular Lie superalgebras 3
1.3. Theorem. The CTS prolong of the nonpositive part of g returns g in the following
cases: p = 3 and g = f(4), e(6), e(7) and e(8) considered with the Z-grading with one selected
root corresponding to the endpoint of the Dynkin diagram.
1.3.1. Conjecture. [The computer got stuck here, after weeks of computations] To the
cases of Theorem 1.3, one can add the case for p = 5 and g = el(5) (see [BGL2]) in its
Z-grading with only one odd simple root and with one selected root corresponding to any
endpoint of the Dynkin diagram.
1.4. Theorem. Let p = 3. For the previously known (we found more, see Theorems 1.6,
1.7) simple finite dimensional Lie superalgebras g of rank ≤ 3 with Cartan matrix and for
their simplest gradings r, the CTS prolongs (of the non-positive part of g) different from g
are given in the following table elucidated below.
1.5. Melikyan superalgebras for p = 3. There are known the two constructions of the
Melikyan algebra Me(5;N) = ⊕
Me(5;N)i, defined for p = 5:
1) as the CTS prolong of the triple Me0 = cvect(1; 1), Me−1 = O(1; 1)/const and the
trivial module Me−2, see [S]; this construction would be a counterexample to our conjecture
were there no alternative:
2) as the complete CTS prolong of the non-negative part of g(2) in its grading r = (01),
with g(2) obtained now as a partial prolong, see [Ku1, GL4].
In [BjL], we have singled out Bj(1;N |7) as a p = 3 simple analog of Me(5;N) as a partial
CTS prolongs of the pair (the negative part of k(1;N |7), Bj(1;N |7)0 = pgl(3)), and bj as a
p = 3 simple analog of g(2) whose non-positive part is the same as that of Bj(1;N |7), i.e.,
bj and Bj(1;N |7) are analogs of the construction 2).
The original Melikyan’s construction 1) also has its super analog for p = 3 (only in the
situation described in Theorem 1.6) and it yields a new series of simple Lie superalgebras as
the complete prolongs, with another simple analog of g(2) as a partial prolong.
Recall ([BGL1]) that we normalize the Cartan matrix A so that Aii = 1 or 0 if the ith
root is odd, whereas if the ith root is even, we set Aii = 2 or 0 in which case we write 0̄
instead of 0 in order not to confuse with the case of odd roots.
1.6. Theorem. A p = 3 analog of the construction 1) of the Melikyan algebra is given by
setting g0 = ck(1; 1|1), g−1 = O(1; 1|1)/const and g−2 being the trivial module. It yields a
simple super Melikyan algebra that we denote by Me(3;N |3), non-isomorphic to a superMe-
likyan algebra Bj(1;N |7).
The partial prolong of the non-positive part of Me(3;N |3) is a new (exceptional) simple
Lie superalgebra that we denote by brj(2; 3). This brj(2; 3) has the three Cartan matrices:
and 2)
−1 0̄
joined by an odd reflection, and
−1 0̄
. It is a super
analog of the Brown algebra br(2) = brj(2; 3)0̄, its even part.
The CTS prolongs for the simplest gradings r of 1)brj(2; 3) returns known simple Lie
superalgebras, whereas the CTS prolong for a simplest grading r of 2)brj(2; 3) returns, as a
partial prolong, a new simple Lie superalgebra we denote BRJ.
Unlike br(2), the Lie superalgebra brj(2; 3) has analogs for p 6= 3, e.g., for p = 5, we
get a new simple Lie superalgebra brj(2; 5) such that brj(2; 5)0̄ = sp(4) with the two Cartan
matrices 1)
and 2)
. The CTS prolongs of brj(2; 5) for all its Cartan
matrices and the simplest r return brj(2; 5).
4 Sofiane Bouarroudj, Pavel Grozman, Dimitry Leites
Having got this far, it was impossible not to try to get classification of simple g(A)’s. Here
is its beginning part, see [BGL5].
1.7. Theorem. If p > 5, every finite dimensional simple Lie superalgebra with a 2 × 2
Cartan matrix is isomorphic to osp(1|4), osp(3|2), or sl(1|2). If p = 5, we should add
brj(2; 5). If p = 3, we should add brj(2; 3).
Remark. For details of description of the new simple Lie superalgebras of types Bj and
Me and their subalgebras, in particular, presentations of brj(2; 3) and brj(2; 5), and proof of
Theorem 1.7 and its generalization for higher ranks, see [BGL4, BGL5].
The new simple Lie superalgebras obtained are described in the next subsections.
g Cartan matrix r prolong
osp(3|2)
k(1|3)
k(1|3; 1)
osp(3|2)
k(1|3; 1)
sl(1|2)
vect(0|2)
vect(1|1)
vect(1|1)
osp(1|4)
k(3|1)
osp(1|4)
brj(2; 3)
0̄ −1
Me(3;N |3)
Brj(4|3)
Brj(4;N |3)
Brj(3;N |4) ⊃ BRJ
brj(2; 3)
0̄ −1
Brj(3;N |3)
Brj(3;N |4) ⊃ BRJ
brj(2; 5)
brj(2; 5)
brj(2; 5)
brj(2; 5)
brj(2; 5)
New simple modular Lie superalgebras 5
g Cartan matrix r prolong
sl(1|3)
0 −1 0
−1 2 −1
0 −1 2
(100)
(010)
(001)
vect(0|3)
sl(1|3)
vect(2|1)
0 −1 0
−1 0 −2
0 −1 2
(100)
(010)
(001)
vect(2|1)
sl(1|3)
vect(2|1)
psl(2|2) any matrix
(100)
(010)
(001)
svect(1|2)
psl(2|2)
svect(1|2)
osp(1|6)
2 −1 0
−1 2 −1
0 −1 1
(100)
(010)
(001)
k(5|1)
osp(1|6)
osp(1|6)
osp(3|4)
2 −1 0
−1 0 −1
0 −2 2
0 −1 0
−1 0 1
0 −1 1
(100)
(010)
(001)
k(3|3)
osp(3|4)
osp(3|4)
0 −1 0
−1 2 −1
0 −1 1
(100)
(010)
(001)
osp(3|4)
osp(5|2)
2 −1 0
−1 0 1
0 −1 1
0 −1 0
−1 0 1
0 −2 2
(100)
(010)
(001)
osp(5|2)
0 −1 0
−1 2 −1
0 −2 2
(100)
(010)
(001)
osp(5|2)
k(1|5)
osp(4|2;α)
α generic
2 −1 0
α 0 −1− α
0 −1 2
0 1 −1− α
−1 0 −α
−1− α α 0
(100)
(010)
(001)
osp(4|2;α)
osp(4|2;α)
α = 0,−1
1) The simple part of 1)osp(4|2;α) is sl(2|2);
for the CTS of psl(2|2), see above
2) 2)osp(4|2;α) ≃ sl(2|2);
for the CTS of sl(2|2), see above
6 Sofiane Bouarroudj, Pavel Grozman, Dimitry Leites
g Cartan matrix r prolong
osp(2|4)
0 1 0
−1 2 −2
0 −1 2
0 1 0
−1 0 2
0 −1 2
0 −2 1
−2 0 1
−1 −1 2
(100)
(010)
(001)
(100)
(010)
(001)
(100)
(010)
(001)
osp(2|4)
osp(2|4) if p > 3
Bj(3;N |3) if p = 3
osp(2|4)
k(3|2)
osp(2|4) if p > 3
Bj(3;N |3) if p = 3
osp(2|4)
osp(2|4)
osp(2|4)
k(3|2)
g(2|3) 3)
0 0 −1
0 0 −2
−1 −2 2
(100)
(010)
(001)
Bj(2|4)
Bj(3|5)
1.8. A description of Bj(3;N |3). For g = 1)osp(2|4) and r = (0, 1, 0), we have the
following realization of the non-positive part:
gi the generators (even | odd)
g−2 Y6 = ∂1 | Y8 = ∂4
g−1 Y2 = ∂2, Y5 = x2∂1 + x5∂4 + ∂3, | Y4 = ∂5, Y7 = 2x2∂4 + ∂6,
g0 ≃ Y3 = x22∂1 + x2x5∂4 + x2∂3 + 2x5∂6, Z3 = x32∂1 + 2x3x6∂4 + x3∂2 + 2x6∂5
sl(1|1) ⊕ sl(2) ⊕K H2 = 2x1∂1 + 2x2∂2 + x4∂4 + x5∂5 + 2x6∂6, H1 = [Z1, Y1],H3 = [Z3, Y3] |
Y1 = x1∂4 + 2x2∂5 + x3∂6, Z1 = 2x4∂1 + 2x5∂2 + x6∂3
The g0-module g−1 is irreducible, having one highest weight vector Y2.
Let p = 3. The CTS prolong gives sdim(g1) = 4|4. The g0-module g1 has the following
two lowest weight vectors:
V ′1 x1x2∂4 + 2x1∂6 + 2x2
2∂5 + x2x3∂6
V ′′1 x1x2∂1 + x1x5∂4 + 2x2x4∂4 + x1∂3 + x2
2∂2 + 2x2x5∂5 + x2x6∂6 + x3x5∂6 + x4∂6
Since g1 generates the positive part of the CTS prolong, [g−1, g1] = g0, and [g−1, g−1] = g−2,
the standard criteria of simplicity ensures that the CTS prolong is simple. Since none of
the Z-graded Lie superalgebras over C of polynomial growth and finite depth has grading
of this form (with g0 ≃ sl(1|1)⊕ sl(2)⊕ K), we conclude that this Lie superalgebra is new.
We denote it by Bj(3;N |3), where N is the shearing parameter of the even indeterminates.
Our calculations show that N2 = N3 = 1 always. For N1 = 1, 2, the super dimensions of the
positive components of Bj(3;N |3) are given in the following tables:
N = (1, 1, 1) g1 g2 g3 g4 g5 g6 – – –
sdim 4|4 5|5 4|4 4|4 2|2 0|3
N = (2, 1, 1) g1 g2 g3 g4 g5 g6 · · · g11 g12
sdim 4|4 5|5 4|4 5|5 4|4 5|5 · · · 2|2 0|3
New simple modular Lie superalgebras 7
Let V ′i , V
i and V
i be the lowest height vectors of gi with respect to g0. For N = (1, 1, 1),
these vectors are as follows:
gi lowest weight vectors
V ′2 x1
2∂4 + 2 x1x2∂5 + x1x3∂6 + x2x3
V ′′2 x1x2
2∂1 + x1x2x5∂4 + x1x2∂3 + 2 x1x5∂6 + x2
2x3∂3 + 2 x2
2x5∂5 + x2x3x5∂6
V ′′′2 x1
2∂1 + x2
2∂1 + x2
2x3∂2 + 2 x2
2x6∂5 + 2 x1x2∂2 + 2 x1x3∂3 + 2 x1x4∂4
+x2x3
2∂3 + x2x4∂5 + 2 x3x4∂6 + 2 x2
2x3x6∂4 + 2 x2x3x6∂6
V ′3 x1
2x2∂4 + 2 x1
2∂6 + 2 x1x2
2∂5 + x1x2x3∂6 + x2
V ′′3 x1
2x2∂1 + x1
2x5∂4 + x1
2∂3 + x1x2x3∂3 + 2 x1x2x5∂5 + x1x3x5∂6
+x2x3
2x5∂6 + x1x2x4∂4 + 2 x1x2
2∂2 + x1x2x3∂3 + 2 x1x2x6∂6
+2 x1x4∂6 + 2 x2
2∂3 + 2 x2
2x3x6∂6 + 2 x2
2x4∂5 + x2x3x4∂6
V ′4 x1
2∂1 + x1
2x2x5∂4 + x1
2x2∂3 + 2 x1
2x5∂6 + x1x2
2x3∂3 + 2 x1x2
2x5∂5
+x1x2x3x5∂6 + x2
2x5∂6
V ′′4 x1
2x4∂4 + x1
2x5∂5 + x1
2x6∂6 + x2
2x6∂6 + x1x2
2∂1 + x1x2
2x3∂2 + 2 x1x2
2x6∂5
+2 x1x3
2x5∂6 + 2 x1x2x3
2∂3 + 2 x1x2x4∂5 + x1x3x4∂6 + x2x3
2x4∂6 + 2 x1x2
2x3x6∂4
V ′5 x1
2∂2 + x1
2x4∂6 + 2 x1
2x2x3∂3 + 2 x1
2x2x4∂4 + x1
2x2x6∂6 + 2 x2
2x4∂6
+x1x2
2∂3 + x1x2
2x4∂5 + x1x2
2x3x6∂6 + 2 x1x2x3x4∂6
V ′6 x1
2x4∂1 + x1
2x5∂2 + 2 x1
2x6∂3 + x1
2x2x4∂3 + 2 x1
2x4x5∂6
+2 x1
2x2x3x5∂3 + x1
2x2x4x5∂4 + x1
2x2x5x6∂6 + x2
2x4x5∂6
+x1x2
2x5∂3 + x1x2
2x3x4∂3 + 2 x1x2
2x4x5∂5 + x1x2
2x3x5x6∂6 + x1x2x3x4x5∂6
For N = (2, 1, 1), the lowest hight vectors are as in the table above together with the
following ones
gi lowest weight vectors
V ′′′4 x1
3∂4 + 2 x1
2x2∂5 + x1
2x3∂6 + x1x2x3
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
V ′11 x1
2∂2 + x1
5x4∂6 + x1
2∂3 + x1
2x4∂5 + 2 x1
5x2x3∂3 + 2 x1
5x2x4∂4
5x2x6∂6 + 2 x1
2x4∂6 + x1
2x3x6∂6 + 2 x1
4x2x3x4∂6
V ′12 x1
2x4∂1 + x1
2x5∂2 + 2 x1
2x6∂3 + x1
5x2x4∂3 + 2 x1
5x4x5∂6 + x1
2x5∂3
2x3x4∂3 + 2 x1
2x4x5∂5 + 2 x1
5x2x3x5∂3 + x1
5x2x4x5∂4 + x1
5x2x5x6∂6
2x4x5∂6 + x1
2x3x5x6∂6 + x1
4x2x3x4x5∂6
Let us investigate if Bj(3;N |3) has partial prolongs as subalgebras:
(i) Denote by g′1 the g0-module generated by V
1 . We have sdim(g
1) = 2|2. The CTS par-
tial prolong (g−, g0, g
1)∗ gives a graded Lie superalgebra with the property that [g−1, g1] ≃
{Y1, h1} := aff. From the description of irreducible modules over solvable Lie superalge-
bras [Ssol], we see that the irreducible aff-modules are 1-dimensional. For irreducible aff-
submodules g′−1 in g−1 we have two possibilities: to take g
−1 = {Y4} or g′−1 = {Y7}; for both
of them, g′−1 is purely odd and we can never get a simple Cartan prolong.
(ii) Denote by g′′1 the g0-module generated by V
1 . We have sdim(g
1) = 2|2. The CTS
partial prolong (g−, g0, g
1)∗ returns osp(2|4).
1.9. A description of Bj(2|4). We consider 3)g(2|3) with r = (1, 0, 0). In this case,
sdim(g(2, 3)−) = 2|4. Since the g(2, 3)0-module action is not faithful, we consider the quo-
tient algebra g0 = g(2, 3)0/ann(g−1) and embed (g(2, 3)−, g0) ⊂ vect(2|4). This realization
8 Sofiane Bouarroudj, Pavel Grozman, Dimitry Leites
is given by the following table:
gi the generators (even | odd)
g−1 Y6 = ∂2, Y8 = ∂1 | Y11 = ∂3, Y10 = ∂4, Y4 = ∂5, Y1 = ∂6
g0 ≃ Y3 = x2∂1 + 2x4∂3 + x6∂5, Y9 = [Y2, [Y3, Y5]], Z3 = x1∂2 + 2x3∂4 + x5∂6, Z9 =
[Z2, [Z3, Z5]], H2 = [Z2, Y2],H3 = [Z3, Y3] | Y2 = x1∂4 + x5∂2, Y5 = [Y2, Y3],
osp(3|2) Y7 = [Y3, [Y2, Y3]], Z2 = x2∂5 + 2x4∂1, Z5 = [Z2, Z3], Z7 = [Z3, [Z2, Z3]]
The g0-module g−1 is irreducible, having one lowest weight vector Y11 and one highest weight
vector Y1. The CTS prolong (g−, g0)∗ gives a Lie superalgebra of superdimension 13|14.
Indeed, sdim(g1) = 4|4 and sdim(g2) = 1|0. The g0-module g1 has one lowest vector:
V1 = 2x1x2∂3 + x1x6∂1 + 2x2
2∂4 + x2x5∂1 + 2x2x6∂2 + x4x5∂3 + 2x4x6∂4 + x5x6∂5
The g2 is one-dimensional spanned by the following vector
2x2∂1 + x1
2x4∂3 + 2x1
2x6∂5 + x1x2
2∂2 + 2x1x2x3∂3 + x1x2x4∂4 + 2x1x2x5∂5 + x1x2x6∂6
+x1x3x6∂1 + 2x1x4x5∂1 + x1x4x6∂2 + 2x2
2x3∂4 + x2
2x5∂6 + x2x3x5∂1 + 2x2x3x6∂2 + x2x4x5∂2
+x3x4x5∂3 + 2x3x4x6∂4 + x3x5x6∂5 + 2x4x5x6∂6
Besides, if i > 2, then gi = 0 for all values of the sharing parameter N = (N1, N2). A
direct computation gives [g1, g1] = g2 and [g−1, g1] = g0. SuperLie tells us that this Lie
superalgebra has three ideals I1 ⊂ I2 ⊂ I3 with the same non-positive part but different
positive parts: sdim(I1) = 10|14, sdim(I2) = 11|14, sdim(I3) = 12|14. The ideal I1 is just
our bj, see [BjL, CE]. The partial CTS prolong with I1 returns I1 plus an outer derivation
given by the vector above (of degree 2). It is clear now that Bj(2|4) is not simple.
1.10. A description of Bj(3|5). We consider 3)g(2|3) and r = (0, 1, 0). In this case,
sdim(g(2, 3)−) = 3|5. Since the g(2, 3)0-module action is again not faithful, we consider
the quotient module g0 = g(2, 3)0/ann(g−1) and embed (g(2, 3)−, g0) ⊂ vect(3;N |5). This
realization is given by the following table:
gi the generators (even | odd)
g−2 Y9 = ∂1 | Y10 = ∂3, Y11 = ∂2
g−1 Y8 = ∂4, Y6 = ∂5 | Y5 = 2x4∂2 + 2x5∂3 + 2x7∂1 + ∂7, Y2 = x4∂3 − 2x6∂1 + ∂8
Y7 = x5∂2 + ∂6
g0 ≃ sl(1|2) H1 = [Z1, Y1],H3 = [Z3, Y3], Y3 = 2x3∂2 + 2x7x8∂1 + x5∂4 + 2x7∂6 + x8∂7,
Z3 = 2x2∂3 + 2x6x7∂1 + x4∂5 + x6∂7 + 2x7∂8 | Y4 = [Y1, Y3], Z4 = [Z1, Z3],
Y1 = 2 (2x1∂3 + 2x6x7∂2 + x6∂4 + x7∂5)
Z1 = 2
x3∂1 + 2x4x5∂2 + 2x5
2∂3 + 2x5x7∂1 + 2x4∂6 + x5∂7
The g0-module g−1 is irreducible, having one highest weight vector Y2. We have sdim(g1) =
6|4. The g0-module g1 has two lowest weight vectors given by
V ′1 x1x5∂2 + 2x5x6x8∂2 + x5x7x8∂3 + 2x1∂6 + 2x3∂4 + x5x7∂4 + x5x8∂5 + 2x7x8∂7
V ′′1 x6x7x8∂2 + 2x1∂4 + x7x8∂5
Now, the g0-module generated by the the vectors V
1 and V
1 is not the whole g1 but a g0-
module that we denote by g′′1, of sdim = 4|4. The CTS prolong (g−, g0, g1)∗ is not simple, so
New simple modular Lie superalgebras 9
consider the Lie subsuperalgebra (g−, g0, g
1)∗; the superdimensions of its positive part are
adig′′1
(g′′1) g
1 adg′′1 (g
1) ad
(g′′1) ad
(g′′1) ad
(g′′1)
sdim 4|4 4|4 4|4 3|2 2|1
The lowest weight vectors of the above components are precisely {V ′2 , V ′′2 , V3, V4, V5} described
bellow:
adig1(g1) lowest weight vectors
V ′2 x1
2∂2 + 2x1x7∂4 + 2x1x8∂5 + x1x6x8∂2 + 2x1x7x8∂3
V ′′2 2x1
2∂1 + x1x2∂2 + x1x3∂3 + x1x6∂6 + x1x7∂7 + x1x8∂8 + 2x2x7∂4 + 2x2x8∂5
+2x3x6∂4 + 2x3x7∂5 + x2x6x8∂2 + 2x2x7x8∂3 + x3x6x7∂2 + x6x7x8∂7
V3 x1
2∂4 + 2x1x7x8∂5 + 2x1x6x7x8∂2
V4 x1
2x3∂2 + 2x1
2x5∂4 + x1
2x7∂6 + 2x1
2x8∂7 + x1
2x7x8∂1 + 2x1x3x7∂4 + 2x1x3x8∂5
+x1x3x6x8∂2 + 2x1x3x7x8∂3 + x1x5x7x8∂5 + x1x5x6x7x8∂2
V5 x1
2x2∂4 + 2x1
2x3∂5 + 2x1
2x6x7∂6 + x1
2x6x8∂7 + 2x1
2x7x8∂8 + 2x1
2x6x7x8∂1
+2x1x2x7x8∂5 + 2x1x3x6x7∂4 + 2x1x3x6x8∂5 + 2x1x2x6x7x8∂2 + 2x1x3x6x7x8∂3
Since none of the known simple finite dimensional Lie superalgebra over (algebraically closed)
fields of characteristic 0 or > 3 has such a non-positive part in any Z-grading, it follows that
Bj(3;N |5) is new.
Let us investigate if Bj(3;N |5) has subalgebras — partial prolongs.
(i) Denote by g′1 the g0-module generated by V
1 . We have sdim(g
1) = 2|3. The CTS
partial prolong (g−1, g0, g
1)∗ gives a graded Lie superalgebra with sdim(g
2) = 2|2 and g′i = 0
for i > 3. An easy computation shows that [g−1, g
1] = g0 and [g
1] ( g
2. Since we are
investigating simple Lie superalgebra, we take the simple part of (g−1, g0, g
1)∗. This simple
Lie superalgebra is isomorphic to g(2, 3)/c = bj.
(ii) Denote by g′′1 the g0-module generated by V
1 . We just saw that sdim(g
1) = 4|4. The
CTS partial prolong (g−1, g0, g
1)∗ gives also Bj(3|5).
r = (0, 0, 1). In this case, sdim(g(2, 3)−) = 4|5. Since the g(2, 3)0-module action is not
faithful, we consider the quotient algebra g0 = g(2, 3)0/ann(g−1) and embed (g(2, 3)−, g0) ⊂
vect(4;N |5). The CTS prolong returns bj := g(2, 3)/c.
1.11. A description of Me(3;N |3). 1) Our first idea was to try to repeat the above
construction with a suitable super version of g(2). There is only one simple super analog of
g(2), namely ag(2), but our attempts [BjL] to construct a super analog of Melikyan algebra
in the above way as Kuznetsov suggested [Ku1] (reproduced in [GL4]) resulted in something
quite distinct from the Melikyan algebra: The Lie superalgebras we obtained, an exceptional
one bj (cf. [CE, BGL1]) and a series Bj, are indeed simple but do not resemble either g(2)
or Me.
2) Our other idea is based on the following observation. The anti-symmetric form
(3) (f, g) :=
fdg =
fg′dt,
on the quotient space F/const of functions (with compact support) modulo constants on the
1-dimensional manifolds, has its counterpart in 1|1-dimensional case in presence of a contact
structure a n d o n l y i n t h i s c a s e as follows from the description of invariant bilinear
differential operators, see [KLV]. Indeed, the Lie superalgebra k(1|1) does not distinguish
10 Sofiane Bouarroudj, Pavel Grozman, Dimitry Leites
between the space of volume forms (let its generator be denoted vol) and the quotient Ω1/Fα,
where α = dt+ θdθ is the contact form.
For any prime p therefore, on the space g−1 := O(1;N |1)/ const of “functions (with com-
pact support) in one even indeterminate u and one odd, θ modulo constants”, the superanti-
symmetric bilinear form
(4) (f, g) :=
(fdg mod Fα) =
0 − f1g1)dt,
where f = f0(t) + f1(t)θ and g = g0(t) + g1(t)θ and where
′ := d
, is nondegenerate.
Therefore, we may expect that, for p small and N = 1, the Melikyan effect will reappear.
Consider p = 5 as the most plausible.
We should be careful with parities. The parity of vol is a matter of agreement, let it be
even. Then the integral is an odd functional but the factorization modulo Fα makes the form
(4) even. (Setting p(vol) = 1̄ we make the integral an even functional and the factorization
modulo Fα makes the form (4) even again.)
Since the form (4) is even, we get the following realization of
k(1; 1|1) ⊂ osp(5|4) ≃ k(5; 1, ..., 1|5)
by generating functions of contact vector fields on the 5|5-dimensional superspace with the
contact form, where the coefficients are found from the explicit values of
(p̂idq̂i − q̂2dp̂i) +
ξ̂jdηj + η̂jdξ̂j
− θ̂dθ̂.
The coordinates on this 5|5-dimensional superspace are hatted in order not to confuse them
with generating functions of k(1; 1|1):
gi basis elements
g−2 1̂
g−1 p̂1 = t, p̂2 = t
2, q̂1 = t
3, q̂2 = t
ξ̂1 = θ, ξ̂2 = tθ, θ̂ = t
2θ, η̂2 = t
3θ, η̂1 = t
We explicitly have:
(t, t4) =
t · t3dtN =
4t4dtN = 4 = −(t4, t);
(t2, t3) =
t2 · t2dtN =
6t4dtN = 1 = −(t3, t2);
(t4θ, θ) = −
t4 · 1dtN = −1 = (θ, t4θ);
(t3θ, tθ) = −
t3 · tdtN = −4 = (tθ, t3θ);
(t2θ, t2θ) = −
t2 · t2dtN = −6 = −1.
New simple modular Lie superalgebras 11
Now, let us realize k(1; 1|1) by contact fields in hatted functions:
gi basis elements
g−2 1̂
g−1 p̂1 = t, p̂2 = t
2, q̂2 = 4t
3, q̂1 = t
ξ1 = θ, ξ2 = tθ, θ̂ = t
2θ, η2 = 4t
3θ, η1 = t
g0 1 = 2 p̂1·q̂2 + 2p̂22 + 3 ξ1η2 + 3 ξ2θ̂; t = 2 p̂1q̂1 + 4 p̂2q̂2 + 4 ξ1η1 + 2 ξ2η2;
t2 = 2 p̂2q̂1 + 4 q̂
2 + 4 ξ2η1 + θ̂η2; t
3 = 3 q̂1q̂2 + 4 θ̂η1; t
4 = q̂21 + η2η1;
θ = p̂1η2 + p̂2θ̂ + q̂1ξ1 + q̂2ξ2; tθ = p̂1η1 + 2 p̂2η2 + q̂1ξ2 + 2 q̂2θ̂;
t2θ = p̂2η1 + q̂1θ̂ + 2 q̂2η2; t
3θ = 4 q̂1η2 + 4 q̂2η1; t
4θ = q̂1η1
The CTS prolong gives that g1 = 0.
The case where p = 3 is more interesting because it will give us the series Me(3;N |3).
The non-positive part is as follows:
gi basis elements
g−2 1̂
g−1 p̂1 = t, q̂2 = t
2, ξ1 = θ, θ̂ = tθ, η1 = t
g0 1 = p̂
1 + 2ξ̂1η̂1; t = 2 p̂1q̂1 + 2 ξ̂1η̂1; t
2 = 2 q̂21 + 2 θ̂η̂1; θ = 2 p̂1θ̂ + q̂1ξ̂1;
tθ = p̂1η̂1 + q̂1θ̂; t
2θ = q̂1η̂1
The Lie superalgebra g0 is not simple because [g−1, g1] = g0\{t2θ = q̂1η̂1}. Denote g′0 :=
[g−1, g1] ≃ osp(1|2). The CTS partial prolong (g−, g′0)∗ seems to be very interesting. First,
our computation shows that the parameter M = (M1,M2,M3) depends only on the first
parameter (relative to t). Namely, M = (M1, 1, 1). For M1 = 1, 2, the super dimensions of
the positive components of Bj(3;M |3) are given in the following table:
M = (1, 1, 1) g′1 g
5 – – – – –
sdim 2|4 4|2 2|4 3|2 0|1
M = (2, 1, 1) g′1 g
5 · · · g′14 g′15 g′16 g′17
sdim 2|4 4|2 2|4 4|2 2|4 · · · 4|2 2|4 3|2 0|1
Here we have that [g−1, g1] = g
0 and the g
1 generates the positive part. The standard criteria
for simplicity ensures that Me(3;N |3) is simple. For N = (1, a, b), the lowest weight vectors
12 Sofiane Bouarroudj, Pavel Grozman, Dimitry Leites
are as follows:
gi lowest weight vectors
V ′1 2p1
(2)η1 + 2 p1q1θ + q1
(2)ξ1 + ξ1θη1
V ′′1 2 p1q1η1 + q1
(2)θ + η1t
V ′2 2 p1q1θη1 + q1
(2)ξ1η1 + tq1
(2) + tθη1
V ′′2 2 p1
(2)q1
(2) + p1
(2)θη1 + p1q1ξ1η1 + 2q1
(2)ξ1θ + t
V ′3 2 tp1
(2)η1 + tp1q1θ + 2 tq1
(2)ξ1 + tξ1θη1
V ′′3 2 p1
(2)q1
(2)η1 + 2 tp1q1η1 + 2 q1
(2)ξ1θη1 + tq1
(2)θ + t(2)η1
V ′4 2 p1
(2)q1
(2)θη1 + 2 tp1q1θη1 + tq1
(2)ξ1η1 + t
(2)q1
(2) + t(2)θη1
V ′5 p1
(2)q1
(2)ξ1θη1 + t
(2)p1
(2)η1 + t
(2)p1q1θ + 2 t
(2)q1
(2)ξ1 + 2 t
(2)ξ1θη1
For N = (2, a, b), the lowest weight vectors are as above together with:
gi lowest weight vectors
V ′′4 2 tp1
(2)q1
(2) + tp1
(2)θη1 + tp1q1ξ1η1 + 2 tq1
(2)ξ1θ + t
V ′′5 2 tp1
(2)q1
(2)η1 + 2 t
(2)p1q1η1 + 2 tq1
(2)ξ1θη1 + t
(2)q1
(2)θ + t(3)η1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
V ′15 2 t
(5)p1
(2)q1
(2)ξ1θη1 + 2t
(7)p1
(2)η1 + 2 t
(7)p1q1θ + t
(7)q1
(2)ξ1 + t
(7)ξ1θη1
2 t(6)p1
(2)q1
(2)η1 + 2 t
(7)p1q1η1 + 2 t
(6)q1
(2)ξ1θη1 + t
(7)q1
(2)θ + t(8)η1
V ′′16 2 t
(6)p1
(2)q1
(2)θη1 + 2 t
(7)p1q1θη1 + t
(7)q1
(2)ξ1η1 + t
(8)q1
(2) + t(8)θη1
V ′′17 2t
(6)p1
(2)q1
(2)ξ1θη1 + 2 t
(8)p1
(2)η1 + 2 t
(8)p1q1θ + t
(8)q1
(2)ξ1 + t
(8)ξ1θη1
Let us investigate the subalgebras of Me(3;N |3) — partial prolongs:
(i) Denote by g′1 the g0-module generated by V
1 . We have sdim(g
1) = 0|1 and gi = 0 for
all i > 1. The CTS partial prolong (g−, g0, g
1)∗ gives a graded Lie superalgebra with the
property that [g−1, g1] ≃ osp(1|2). The partial CTS prolong (g−, osp(1|2))∗ is not simple
(ii) Denote by g′′1 the g0-module generated by V
1 . We have sdim(g
1) = 3|2. The CTS
partial prolong (g−, g0, g
1)∗ returns brj(2; 3).
1.12. A description of Brj(4|3). We have the following realization of the non-positive
part inside vect(4|3):
gi the generators (even | odd)
g−4 Y8 = ∂1 | Y7 = ∂5
g−3 Y6 = ∂2 |
g−2 Y4 = ∂3 | Y5 = x3∂5 + x6∂1 + ∂6
g−1 Y3 = 2x2∂1 + 2x3∂2 + ∂4 |
Y2 = x2∂5 + 2x4
(2)x7∂1 + x4x6∂1 + x6x7∂5 + x4x7∂2 + x6∂2 + 2x4∂6 + 2x7∂3 + 1∂7,
g0 ≃ hei(0|2) ⊕ K H1 = [Z1, Y1], H2 = 2x5∂5 + x2∂2 + 2x3∂3 + 2x4∂4 + x7∂7, |
Y1 = 2x3
(2)∂5 + 2x3x6∂1 + 2x5∂1 + 2x3∂6 + x7∂4,
Z1 = x4
(2)∂6 + 2x4
(2)x6∂1 + x4
(2)x7∂2 + x4x7∂3 + 2x4x6x7∂5 + x1∂5 + 2x4∂7 + 2 x6∂3
The Lie superalgebra g0 is solvable, and hence the CTS prolong (g−, g0)∗ is NOT simple
since g1 does not generate the positive part. Our calculation shows that the prolong does
not depend on N , i.e., N = (1, 1, 1, 1). The simple part of this prolong is brj(2; 3). The sdim
New simple modular Lie superalgebras 13
of the positive parts are described as follows:
g1 g2 g3 g4 g5 g6 g7 g8 g9 g10
sdim 1|1 2|2 1|2 2|2 1|1 2|2 1|1 1|1 0|1 1|1
and the lowest weight vectors are as follows:
gi lowest weight vectors
V ′1 2x2x3∂5 + 2x2x6∂1 + x3x4∂6 + 2 x3x6∂2 + x3x7∂3 + x4x7∂4 + x3x4
(2)x7∂1 + 2x3x4x6∂1
+2x3x4x7∂2 + 2x3x6x7∂5 + 2x2∂6 + 2x3∂7 + 2x5∂2 + 2 x6∂4
V ′2 x4
(2)∂4 + x1x3∂5 + x1x6∂1 + x3x4
(2)∂6 + 2x3x4∂7 + 2x3x6∂3 + 2x4x6∂4 + 2x6x7∂7
+2x3x4
(2)x6∂1 + x3x4
(2)x7∂2 + x3x4x7∂3 + 2x3x4x6x7∂5 + x1∂6 + 2x5∂3
V ′′2 2x2
(2)∂1 + x3
(2)∂3 + 2x2x3∂2 + x3x4∂4 + x3x5∂5 + 2x3x7∂7 + x5x6∂1 + 2x6x7∂4 + x2∂4 + x5∂6
V ′3 2x1x2∂1 + 2x1x3∂2 + x2x3∂3 + 2 x2x4∂4 + 2x2x5∂5 + x2x6∂6 + 2x2x7∂7 + 2x3x6∂7 + x4x5∂6
+2x5x6∂2 + x5x7∂3 + x4
(2)x5x7∂1 + x3x4x6∂6 + x3x4x7∂7 + x3x6x7∂3 + 2x4x5x6∂1 + 2x4x5x7∂2
+2x4x6x7∂4 + 2x5x6x7∂5 + x3x4
(2)x6x7∂1 + 2x3x4x6x7∂2 + x1∂4 + 2x5∂7
V ′′3 2x3
(2)∂7 + x3
(2)x4∂6 + 2x3
(2)∂x6∂2 + x3
(2)x7∂3 + 2x2x3
(2)∂5 + 2x2x3∂6 + 2x2x5∂1 + x2x7∂4
+2x3x5∂2 + 2x3x6∂4 + x3
(2)x4
(2)x7∂1 + 2x3
(2)x4x6∂1 + 2x3
(2)x4x7∂2 + 2x3
(2)x6x7∂5 + 2x2x3x6∂1
+x3x4x7∂4 + x5∂4
V4 2x2
(2)∂6 + 2x2
(2)x3∂5 + 2x2
(2)x6∂1 + 2x3
(2)x4
(2)∂6 + x3
(2)x4∂7 + x3
(2)x6∂3 + 2x1x3
(2)∂5
+2x1x3∂6 + 2x1x5∂1 + x1x7∂4 + 2x2x3∂7 + 2x2x5∂2 + 2x2x6∂4 + 2 x3x5∂3 + x5x6∂6 + x5x7∂7
(2)x4
(2)x6∂1 + 2x3
(2)x4
(2)x7∂2 + 2x3
(2)x4x7∂3 + 2x1x3x6∂1 + x2x3x4∂6 + 2x2x3x6∂2
+x2x3x7∂3 + x2x4x7∂4 + x3x4
(2)x7∂4 + 2x3x4x6∂4 + x3
(2)x4x6x7∂5 + x2x3x4
(2)x7∂1 + 2 x2x3x4x6∂1
+2x2x3x4x7∂2 + 2x2x3x6x7∂5
V ′′4 x1
(2)∂1 + x4
(2)x5∂6 + 2x1x3∂3 + x1x4∂4 + x1x5∂5 + 2x1x6∂6 + x1x7∂7 + 2x4x5∂7 + 2x5x6∂3
(2)x5x6∂1 + x4
(2)x5x7∂2 + 2x4
(2)x6x7∂4 + x3x4
(2)x6∂6 + x3x4
(2)x7∂7 + 2x3x4x6∂7
+x4x5x7∂3 + x3x4
(2)x6x7∂2 + x3x4x6x7∂3 + 2x4x5x6x7∂5
V ′5 x1x2∂6 + x1x3∂7 + x1x5∂2 + x1x6∂4 + 2x2x5∂3 + x5x6∂7 + x1x2x3∂5 + x1x2x6∂1 + 2x1x3x4∂6
+x1x3x6∂2 + 2x1x3x7∂3 + 2x1x4x7∂4 + x2x4
(2)x7∂4 + x2x3x4
(2)∂6 + 2x2x3x4∂7 + 2x2x3x6∂3
+2x2x4x6∂4 + 2x2x6x7∂7 + 2x4x5x6∂6 + 2x4x5x7∂7 + 2x5x6x7∂3 + 2x4
(2)x5x6x7∂1
+2x1x3x4
(2)x7∂1 + x1x3x4x6∂1 + x1x3x4x7∂2 + x1x3x6x7∂5 + 2x2x3x4
(2)x6∂1 + x2x3x4
(2)x7∂2
+x2x3x4x7∂3 + 2x3x4x6x7∂7 + x4x5x6x7∂2 + 2x2x3x4x6x7∂5
V ′6 x1
(2)∂6 + x1
(2)x3∂5 + x1
(2)x6∂1 + 2x1x5∂3 + x4
(2)x5x6∂6 + x4
(2)x5x7∂7 + x1x4
(2)x7∂4
+x1x3x4
(2)∂6 + 2x1x3x4∂7 + 2x1x3x6∂3 + 2x1x4x6∂4 + 2x1x6x7∂7 + 2x4x5x6∂7
(2)x5x6x7∂2 + 2x1x3x4
(2)x6∂1 + x1x3x4
(2)x7∂2 + x1x3x4x7∂3 + x3x4
(2)x6x7∂7
+x4x5x6x7∂3 + 2x1x3x4x6x7∂5
V ′′6 x2
(2)x3∂3 + 2x2
(2)x4∂4 + 2x2
(2)x5∂5 + x2
(2)x6∂6 + 2x2
(2)x7∂7 + 2x1x2
(2)∂1 + x1x3
(2)∂3
+x1x2∂4 + x1x5∂6 + 2x2x5∂7 + 2x3
(2)x4
(2)x6∂6 + 2x3
(2)x4
(2)x7∂7 + x3
(2)x4x6∂7 + 2x1x2x3∂2
+x1x3x4∂4 + x1x3x5∂5 + 2x1x3x7∂7 + x1x5x6∂1 + 2x1x6x7∂4 + 2x2x3x6∂7 + x2x4x5∂6
+2x2x5x6∂2 + x2x5x7∂3 + x3x4
(2)x5∂6 + 2x3x4x5∂7 + 2x3x5x6∂3 + x5x6x7∂7
(2)x4
(2)x6x7∂2 + 2x3
(2)x4x6x7∂3 + x2x4
(2)x5x7∂1 + x2x3x4x6∂6 + x2x3x4x7∂7 + x2x3x6x7∂3
+2x2x4x5x6∂1 + 2x2x4x5x7∂2 + 2x2x4x6x7∂4 + 2x2x5x6x7∂5 + 2x3x4
(2)x5x6∂1 + x3x4
(2)x5x7∂2
+2x3x4
(2)x6x7∂4 + x3x4x5x7∂3 + x2x3x4
(2)x6x7∂1 + 2x2x3x4x6x7∂2 + 2x3x4x5x6x7∂5
14 Sofiane Bouarroudj, Pavel Grozman, Dimitry Leites
V ′7 x1
(2)∂4 + 2x1
(2)x2∂1 + 2x1
(2)x3∂2 + 2x1x5∂7 + x1x2x3∂3 + 2x1x2x4∂4 + 2x1x2x5∂5 + x1x2x6∂6
+2x1x2x7∂7 + 2x1x3x6∂7 + x1x4x5∂6 + 2x1x5x6∂2 + x1x5x7∂3 + 2x2x4
(2)x5∂6 + x2x4x5∂7
+x2x5x6∂3 + x1x4
(2)x5x7∂1 + x1x3x4x6∂6 + x1x3x4x7∂7 + x1x3x6x7∂3 + 2x1x4x5x6∂1 + 2x1x4x5x7∂2
+2x1x4x6x7∂4 + 2x1x5x6x7∂5 + x2x4
(2)x5x6∂1 + 2x2x4
(2)x5x7∂2 + x2x4
(2)x6x7∂4 + 2x2x3x4
(2)x6∂6
+2x2x3x4
(2)x7∂7 + x2x3x4x6∂7 + 2x2x4x5x7∂3 + x4x5x6x7∂7 + x1x3x4
(2)x6x7∂1 + 2x1x3x4x6x7∂2
+2x2x3x4
(2)x6x7∂2 + 2x2x3x4x6x7∂3 + x2x4x5x6x7∂5
V ′8 x1
(2)x3
(2)∂5 + x1
(2)x3∂6 + x1
(2)x5∂1 + 2x1
(2)x7∂4 + 2x2
(2)x5∂3 + x1x2
(2)∂6 + x1
(2)x3x6∂1
(2)x4
(2)x7∂4 + x2
(2)x3x4
(2)∂6 + 2x2
(2)x3x4∂7 + 2x2
(2)x3x6∂3 + 2x2
(2)x4x6∂4 + 2x2
(2)x6x7∂7
+x1x2
(2)x3∂5 + x1x2
(2)x6∂1 + x1x3
(2)x4
(2)∂6 + 2x1x3
(2)x4∂7 + 2 x1x3
(2)x6∂3 + x1x2x3∂7 + x1x2x5∂2
+x1x2x6∂4 + x1x3x5∂3 + 2x1x5x6∂6 + 2x1x5x7∂7 + x2x5x6∂7 + 2x2
(2)x3x4
(2)x6∂1 + x2
(2)x3x4
(2)x7∂2
(2)x3x4x7∂3 + x3
(2)x4
(2)x6x7∂7 + 2x1x3
(2)x4
(2)x6∂1 + x1x3
(2)x4
(2)x7∂2 + x1x3
(2)x4x7∂3
+2x1x2x3x4∂6 + x1x2x3x6∂2 + 2x1x2x3x7∂3 + 2x1x2x4x7∂4 + 2x1x3x4
(2)x7∂4 + x1x3x4x6∂4
+2x2x4x5x6∂6 + 2x2x4x5x7∂7 + 2x2x5x6x7∂3 + 2x3x4
(2)x5x6∂6 + 2x3x4
(2)x5x7∂7 + x3x4x5x6∂7
(2)x3x4x6x7∂5 + 2x1x3
(2)x4x6x7∂5 + 2x1x2x3x4
(2)x7∂1 + x1x2x3x4x6∂1 + x1x2x3x4x7∂2
+x1x2x3x6x7∂5 + 2x2x4
(2)x5x6x7∂1 + 2x2x3x4x6x7∂7 + x2x4x5x6x7∂2 + 2x3x4
(2)x5x6x7∂2+
2x3x4x5x6x7∂3
V ′9 x1
(2)x2x3∂5 + x1
(2)x2x6∂1 + 2x1
(2)x3x4
(2)x7∂1 + x1
(2)x3x4x6∂1 + x1
(2)x3x6x7∂5 + 2x1x2x3x4
(2)x6∂1
+2x1x2x3x4x6x7∂5 + 2x1x4
(2)x5x6x7∂1 + x1
(2)x3x4x7∂2 + x1
(2)x3x6∂2 + x1
(2)x5∂2
+x1x2x3x4
(2)x7∂2 + x1x4x5x6x7∂2 + x2x4
(2)x5x6x7∂2 + x1
(2)x2∂6 + 2x1
(2)x3x4∂6 + 2 x1
(2)x3x7∂3
+x1x2x3x4
(2)∂6 + x1x2x3x4x7∂3 + 2x1x2x3x6∂3 + 2x1x2x5∂3 + 2x1x4x5x6∂6 + 2x1x5x6x7∂3
+x2x4
(2)x5x6∂6 + x2x4x5x6x7∂3 + x1
(2)x3∂7 + 2x1
(2)x4x7∂4 + x1
(2)x6∂4 + 2x1x2x3x4∂7
+x1x2x4
(2)x7∂4 + 2x1x2x4x6∂4 + 2x1x2x6x7∂7 + 2x1x3x4x6x7∂7 + 2x1x4x5x7∂7 + x1x5x6∂7
+x2x3x4
(2)x6x7∂7 + x2x4
(2)x5x7∂7 + 2x2x4x5x6∂7
V ′10 x1
(2)x2
(2)∂1 + 2x1
(2)x3
(2)∂3 + 2 x1
(2)x2∂4 + 2x1
(2)x5∂6 + x1
(2)x2x3∂2 + 2x1
(2)x3x4∂4
(2)x3x5∂5 + x1
(2)x3x7∂7 + 2x1
(2)x5x6∂1 + x1
(2)x6x7∂4 + x2
(2)x4
(2)x5∂6 + 2x2
(2)x4x5∂7
(2)x5x6∂3 + 2x1x2
(2)x3∂3 + x1x2
(2)x4∂4 + x1x2
(2)x5∂5 + 2x1x2
(2)x6∂6 + x1x2
(2)x7∂7
+x1x2x5∂7 + 2x2
(2)x4
(2)x5x6∂1 + x2
(2)x4
(2)x5x7∂2 + 2x2
(2)x4
(2)x6x7∂4 + x2
(2)x3x4
(2)x6∂6
(2)x3x4
(2)x7∂7 + 2x2
(2)x3x4x6∂7 + x2
(2)x4x5x7∂3 + x1x3
(2)x4
(2)x6∂6 + x1x3
(2)x4
(2)x7∂7
+2x1x3
(2)x4x6∂7 + x1x2x3x6∂7 + 2x1x2x4x5∂6 + x1x2x5x6∂2 + 2x1x2x5x7∂3 + 2x1x3x4
(2)x5∂6
+x1x3x4x5∂7 + x1x3x5x6∂3 + 2x1x5x6x7∂7 + x2
(2)x3x4
(2)x6x7∂2 + x2
(2)x3x4x6x7∂3
(2)x4x5x6x7∂5 + x1x3
(2)x4
(2)x6x7∂2 + x1x3
(2)x4x6x7∂3 + 2x1x2x4
(2)x5x7∂1 + 2x1x2x3x4x6∂6
+2x1x2x3x4x7∂7 + 2x1x2x3x6x7∂3 + x1x2x4x5x6∂1 + x1x2x4x5x7∂2 + x1x2x4x6x7∂4 + x1x2x5x6x7∂5
+x1x3x4
(2)x5x6∂1 + 2x1x3x4
(2)x5x7∂2 + x1x3x4
(2)x6x7∂4 + 2x1x3x4x5x7∂3 + 2x2x4x5x6x7∂7
+2x3x4
(2)x5x6x7∂7 + 2x1x2x3x4
(2)x6x7∂1 + x1x2x3x4x6x7∂2 + x1x3x4x5x6x7∂5
New simple modular Lie superalgebras 15
1.13. A description ofBrj(3;N |4). We have the following realization of the non-positive
part inside vect(3;N |4):
gi the generators (even | odd)
g−3 | Y6 = ∂4
g−2 Y5 = ∂1, Y6 = ∂2, Y7 = ∂3 |
g−1 | Y2 = 2x3∂4 + ∂5, Y3 = x2∂4 + x6∂1 + ∂6
Y4 = 2x1∂4 + 2x5x7∂4 + x5∂1 + x6∂2 + 2x7∂3 + ∂7
g0 ≃ hei(2|0)⊂+ KH2 H1 = [Z1, Y1], H2 = 2x1∂1 + x3∂3 + x4∂4 + x6∂6 + 2x7∂7
Y1 = 2x5x6x7∂4 + 2x1∂2 + 2x2∂3 + 2x5x6∂1 + x6x7∂3 + 2x5∂6 + 2x6∂7,
Z1 = 2x2∂1 + 2x3∂2 + x6x7∂1 + 2x6∂5 + x7∂6 |
The Lie superalgebra g0 is solvable with the property that [g0, g0] = hei(2|0). The CTS
prolong (g−, g0)∗ is NOT simple since g1 does not generate the positive part. Our calculation
shows that the prolong does not depend on N , i.e., N = (1, 1, 1, 1). The simple part of this
prolong is brj. The sdim of the positive parts are described as follows:
g1 g2 g3
sdim 0|3 3|0 0|2
and the lowest weight vectors are
V ′1 2x1
(2)∂4 + 2x1x5x7∂4 + x1x5∂1 + x1x6∂2 + 2x1x7∂3 + x4∂3 + x1∂7 + 2x5x6∂6 + x5x7∂7
V ′2 2x1x4∂4 + x2x5x6x7∂4 + 2x4x5x7∂4 + 2x1
(2)∂1 + x1x2∂2 + x2
(2)∂3 + x2x5x6∂1 + 2x2x6x7∂3
+x4x5∂1 + x4x6∂2 + 2x4x7∂3 + 2x1x5∂5 + x1x6∂6 + x2x5∂6 + x2x6∂7 + x4∂7 + x5x6x7∂6
V ′3 x1
(2)x2∂4 + x1x2x5x7∂4 + 2x4x5x6x7∂4 + x1
(2)x6∂1 + 2x1x2x5∂1 + 2x1x2x6∂2 + x1x2x7∂3
+2x1x4∂2 + 2 x1x5x6x7∂1 + 2x2x4∂3 + 2 x4x5x6∂1 + x4x6x7∂3 + x1
(2)∂6 + 2x1x2∂7
+x1x5x6∂5 + 2x1x5x7∂6 + x2x5x6∂6 + 2x2x5x7∂7 + 2x4x5∂6 + 2x4x6∂7
V ′′3 x1
(2)x3∂4 + x1x2
(2)∂4 + x1x3x5x7∂4 + x2
(2)x5x7∂4 + x1x2x6∂1 + 2x1x3x5∂1 + 2x1x3x6∂2
+x1x3x7∂3 + 2x1x4∂1 + 2x2
(2)x5∂1 + 2x2
(2)x6∂2 + x2
(2)x7∂3 + 2x2x4∂2 + 2x2x5x6x7∂1
+2x3x4∂3 + 2 x1
(2)∂5 + x1x2∂6 + 2x1x3∂7 + x1x6x7∂6 + 2x2
(2)∂7 + x2x5x6∂5 + 2x2x5x7∂6
+x3x5x6∂6 + 2x3x5x7∂7 + x4x5∂5 + x4x6∂6 + x4x7∂7
Let us study now the case where g′0 = der0(g−). Our calculation shows that g
0 is generated
by the vectors Y1, Z1, H1, H2 above together with V = 2x3∂1 + x7∂5. The Lie algebra g
solvable of sdim = 5|0. The CTS prolong (g−, g′0)∗ gives a Lie superalgebra that is not simple
because g′1 does not generate the positive part. Its simple part is a new Lie superalgebra
that we denote by BRJ, described as follows (here also N = (1, 1, 1):
g′1 adg′1(g
1) ad
(g′1) ad
(g′1) ad
(g′1) ad
(g′1)
sdim 0|6 6|0 0|5 3|0 0|3 1|0
16 Sofiane Bouarroudj, Pavel Grozman, Dimitry Leites
1.14. A description ofBrj(3;N |3). We have the following realization of the non-positive
part inside vect(3;N |3):
gi the generators (even | odd)
g−2 Y7 = ∂1 | Y5 = ∂4, Y8 = ∂5
g−1 Y1 = ∂2, Y6 = 2x2∂1 + ∂3 | Y3 = x2∂4 + x3∂5 + 2x6∂1 + ∂6
g0 H2 = [X2, Y2], H1 = x1∂1 + x3∂3 + 2x4∂4 + 2x6∂6, X4 = [X2,X2], Y4 = [Y2, Y2] |
Y2 = x
2 ∂4 + x2 x3∂5 + 2x2 x6∂1 + x1∂5 + x2∂6 + x4∂1 + x6∂3
X2 = x
3 ∂5 + 2x1∂4 + x3∂6 + x5∂1 + 2x6∂2
The Lie superalgebra g0 is isomorphic to osp(1|2)⊕ K. The CTS prolong (g−, g0)∗ is NOT
simple since it gives back brj(2; 3) + an outer derivation. The sdim of the positive parts are
described as follows:
g1 g2 g3
sdim 2|1 1|2 0|1
and the lowest weight vectors are
V ′1 2 x1 x2∂1 + x2 x4∂4 + x3 x4∂5 + 2x4 x6∂1 + x1∂3 + 2x2 x3∂3 + x2 x6∂6 + x4∂6
V ′2 x
1 ∂5 + x1 x
2 ∂4 + x1 x2 x3∂5 + 2x1 x2 x6∂1 + x1 x4∂1 + 2x
3 ∂5 + 2x
2 x5∂1 + 2x4 x5∂5 + x1 x2∂6
+x1 x6∂3 + 2x
2 x3∂6 + x
2 x6∂2 + x2 x3 x6∂3 + x2 x4∂2 + x2 x5∂3 + 2x4 x6∂6
V ′3 x
1 x2∂4 + x
1 x3∂5 + 2x
1 x6∂1 + 2x1 x2 x
3 ∂5 + 2x1 x2 x5∂1 + x2 x
3 x4∂1 + 2x2 x4 x5∂4 + 2x3 x4 x5∂5
+x4 x5 x6∂1 + x
1 ∂6 + 2x1 x2 x3∂6 + x1 x2 x6∂2 + x1 x3 x6∂3 + x1 x4∂2 + x1 x5∂3 + 2x2 x
3 x6∂3 + 2x2 x3 x4∂2
+2x2 x3 x5∂3 + x2 x5 x6∂6 + 2x3 x4 x6∂6 + 2x4 x5∂6
1.15. A description ofBrj(3;N |4). We have the following realization of the non-positive
part inside vect(3;N |4):
gi the generators (even | odd)
g−3 | Y8 = ∂4
g−2 Y4 = ∂1, Y6 = ∂2, Y7 = ∂3 |
g−1 | Y2 = x3∂4 + 2x5∂1 + ∂5, Y3 = ∂6 + x5 x6∂4 + x2∂4 + x5∂2 + 2x6∂3
Y5 = x1∂4 + x5∂3 + ∂7
g0 ≃ hei(2|0)⊂+ KH2 H1 = [Z1, Y1], H2 = 2x1∂1 + x2∂2 + x4∂4 + x5∂5 + 2x7∂7
Y1 = x1∂2 + x2∂3 + x5 x6∂3 + 2x5∂6 + 2x6∂7
Z1 = 2x5 x6 x7∂4 + 2x2∂1 + x3∂2 + x5 x6∂1 + 2x6 x7∂3 + 2x6∂5 + x7∂6 |
The Lie superalgebra g0 is solvable with the property that [g0, g0] = hei(2|0). The CTS
prolong (g−, g0)∗ is NOT simple since g1 does not generate the positive part. Our calculation
shows that the prolong does not depend on N , i.e., N = (1, 1, 1, 1). The simple part of this
prolong is 3)brj(2; 3). The sdim of the positive parts are described as follows:
g1 g2 g3
sdim 0|3 3|0 0|2
New simple modular Lie superalgebras 17
and the lowest weight vectors are
V ′1 2x1 x3∂4 + x
2 ∂4 + x2 x5 x6∂4 + x1 x5∂1 + x2 x5∂2 + 2 x2 x6∂3 + 2x3 x5∂3 + x4∂3 + 2x1∂5 + x2∂6 + 2 x3∂7 + 2x5 x7∂7
V ′2 2x1 x4∂4 + x
1 ∂1 + 2x1 x2∂2 + 2x
2 ∂3 + 2x2 x5 x6∂3 + 2x4 x5∂3 + 2x1 x5∂5 + x1 x7∂7 + x2 x5∂6 + x2 x6∂7 + 2x4∂7
V ′3 x1 x
3 ∂4 + 2x
2 x3∂4 + 2x2 x3 x5 x6∂4 + 2x1 x3 x5∂1 + 2x1 x4∂1 + x
2 x5∂1 + 2x2 x3 x5∂2 + x2 x3 x6∂3 + 2x2 x4∂2
+2x2 x5 x6 x7∂3 + x
3 x5∂3 + 2x3 x4∂3 + x1 x3∂5 + x1 x6 x7∂6 + 2x
2 ∂5 + 2x2 x3∂6 + 2x2 x5 x6∂5 + x2 x5 x7∂6
+2x2 x6 x7∂7 + x
3 ∂7 + x3 x5 x7∂7 + x4 x5∂5 + x4 x6∂6 + x4 x7∂7
Let us study now the case where g′0 = der0(g−). The Lie algebra g
0 is solvable of sdim = 5|0.
The CTS prolong (g−, g
0)∗ gives a Lie superalgebra that is not simple because g
1 does not
generate the positive part. Its simple part is a new Lie superalgebra that we had denoted
by BRJ, described as follows (here also N = (1, 1, 1):
g′1 adg′1(g
1) ad
(g′1) ad
(g′1) ad
(g′1) ad
(g′1)
sdim 0|6 6|0 0|5 3|0 0|3 1|0
1.16. Constructing Melikyan superalgebras. Denote by F1/2 := O(1; 1)
dx the space
of semi-densities (weighted densities of weight 1
). For p = 3, the CTS prolong of the triple
(K,Π(F1/2), cvect(1; 1))∗ gives the whole k(1;N |3). For p = 5, let us realize the non-positive
part in k(1;N |5):
gi the generators
g−2 1
g−1 Π(F1/2)
g0 ∂1 ←→ 4 ξ1η2 + ξ2θ, x1∂1 ←→ 2 ξ1η1 + ξ2η2, x
1 ∂1 ←→ 2 ξ2η1 + 3 θη2, x
1 ∂1 ←→ 2 θη1
1 ∂1 ←→ 2 η2η1, t
The CTS prolong gives that gi=0 for all i > 0.
Consider now the case of (K,Π(F1/2), cvect(2; 1))∗, where p = 3. The non-positive part is
realized in k(1;N |9) as follows:
gi the generators
g−2 1
g−1 Π(F1/2)
g0 ∂1 ←→ 2 ξ1η3 + x2θ + 2 ξ3η4, x1∂1 ←→ ξ1η1 + ξ2η2 + 2 ξ4η4, x21∂1 ←→ ξ3η1 + ξ4η3 + θη2,
∂2 ←→ 2 ξ1η2 + ξ2ξ4 + ξ3θ, x2∂2 ←→ ξ1η1 + ξ3η3 + ξ4η4, x22∂2 ←→ ξ2η1 + θη3 + 2 η4η2,
x1x2∂1 ←→ ξ2η1 + η4η2, x1x2∂2 ←→ ξ3η1 + 2 ξ4η3, x21x2∂1 ←→ θη1 + 2 η3η2,
x21x2∂2 ←→ ξ4η1, x1x22∂1 ←→ 2 η4η1, x1x22∂2 ←→ θη1 + η3η2, x21x22∂1 ←→ η3η1,
2∂2 ←→ η2η1, t
The CTS prolong (g−, g0)∗ gives a Lie superalgebra that is not simple with the property that
sdim(g1) = 0|4 and gi = 0 for all i > 1. The generating functions of g1 are
ξ2η2η1 + 2 ξ3η3η1 + ξ4η4η1 + θη3η2, 2 ξ4η3η1 + θη2η1, θη3η1 + η4η2η1, η3η2η1.
1.17. Defining relations of the positive parts of brj(2; 3) and brj(2; 5). For the
presentations of the Lie superalgebras with Cartan matrix, see [GL1, BGL1]. The only non-
trivial part of these relations are analogs of the Serre relations (both the straightforward
18 Sofiane Bouarroudj, Pavel Grozman, Dimitry Leites
ones and the ones different in shape). Here they are:
brj(2; 3); sdim brj(2; 3) = 10|8.
1) [[x1, x2] , [x2, [x1, x2]]] = 0,
[[x2, x2] , [[x1, x2] , [x2, x2]]] = 0.
2) ad3x2(x1) = 0,
[[x1, x2] , [[x1, x2] , [x1, x2]]] = 0,
[[x2, [x1, x2]] , [[x1, x2] , [x2, [x1, x2]]]] = 0.
3) ad3x1(x2) = 0,
[x2, [x1, [x1, x2]]]− [[x1, x2], [x1, x2]] = 0,
[[x1, x2], [x2, x2]] = 0.
brj(2; 5); sdim brj(2; 5) = 10|12.
1) [[x2, [x1, x2]] , [x2, [x1, x2]]] = 2 [[x1, x2] , [[x1, x2] , [x2, x2]]],
[[x2, x2] , [[x1, x2] , [x2, x2]]] = 0,
[[x2, [x1, x2]] , [[x1, x2] , [x2, [x1, x2]]]] = 0.
2 )ad4x2(x1) = 0,
[[x2, [x1, x2]] , [x2, [x2, [x1, x2]]]] = 0,
[[[x1, x2] , [x1, x2]] , [[x1, x2] , [x2, [x1, x2]]]] = 0.
References
[BKK] Benkart, G.; Kostrikin, A. I.; Kuznetsov, M. I. The simple graded Lie algebras of characteristic three
with classical reductive component L0. Comm. Algebra 24 (1996), no. 1, 223–234.
[BL] Bernstein J., Leites D., Invariant differential operators and irreducible representations of Lie super-
algebras of vector fields. Selecta Math. Sov., v. 1, 1981, no. 2, 143–160
[BjL] Bouarroudj S., Leites D., Simple Lie superalgebras and non-integrable distributions in characteristic
p Zapiski nauchnyh seminarov POMI, t. 331 (2006), 15–29; Reprinted in J. Math. Sci. (NY), 141
(2007) no.4, 1390–98; math.RT/0606682
[BGL1] Bouarroudj S., Grozman P., Leites D., Cartan matrices and presentations of Elduque and Cunha
simple Lie superalgebras; MPIMiS preprint 124/2006 (www.mis.mpg.de)
[BGL2] Bouarroudj S., Grozman P., Leites D., Cartan matrices and presentations of the exceptional simple
Elduque Lie superalgebra; MPIMiS preprint 125/2006 (www.mis.mpg.de)
[BGL4] Bouarroudj S., Grozman P., Leites D., Infinitesimal deformations of the simple modular Lie super-
algebras with Cartan matrices for p = 3. IN PREPARATION
[BGL5] Bouarroudj S., Grozman P., Leites D., Simple modular Lie superalgebras with Cartan matrices. IN
PREPARATION
[C] Cartan É., Über die einfachen Transformationsgrouppen, Leipziger Berichte (1893), 395–420.
Reprinted in: Œuvres complètes. Partie II. (French) [Complete works. Part II] Algèbre, systèmes
différentiels et problèmes d’équivalence. [Algebra, differential systems and problems of equivalence]
Second edition. Éditions du Centre National de la Recherche Scientifique (CNRS), Paris, 1984.
[Cla] Clarke B., Decomposition of the tensor product of two irreducible sl(2)-modules in characteristic 3,
MPIMiS preprint 145/2006; for calculations, see
http://personal-homepages.mis.mpg.de/clarke/Tensor-Calculations.tar.gz
[CE] Cunha I., Elduque A., An extended Freudenthal magic square in characteristic 3; math.RA/0605379
[CE2] Cunha I., Elduque, A., The extended Freudenthal Magic Square and Jordan algebras;
math.RA/0608191
[El1] Elduque, A. New simple Lie superalgebras in characteristic 3. J. Algebra 296 (2006), no. 1, 196–233
[El2] Elduque, A. Some new simple modular Lie superalgebras. math.RA/0512654
[Er] Ermolaev, Yu. B. Integral bases of classical Lie algebras. (Russian) Izv. Vyssh. Uchebn. Zaved. Mat.
2004, , no. 3, 16–25; translation in Russian Math. (Iz. VUZ) 48 (2004), no. 3, 13–22.
http://arxiv.org/abs/math/0606682
http://personal-homepages.mis.mpg.de/clarke/Tensor-Calculations.tar.gz
http://arxiv.org/abs/math/0605379
http://arxiv.org/abs/math/0608191
http://arxiv.org/abs/math/0512654
New simple modular Lie superalgebras 19
[FH] Fulton, W., Harris, J., Representation theory. A first course. Graduate Texts in Mathematics, 129.
Readings in Mathematics. Springer-Verlag, New York, 1991. xvi+551 pp
[GK] Gregory, T.; Kuznetsov, M. On depth-three graded Lie algebras of characteristic three with classical
reductive null component. Comm. Algebra 32 (2004), no. 9, 3339–3371
[Gr] Grozman P., SuperLie, http://www.equaonline.com/math/SuperLie
[GL1] Grozman P., Leites D., Defining relations for classical Lie superalgebras with Cartan matrix, Czech.
J. Phys., Vol. 51, 2001, no. 1, 1–22; arXiv: hep-th/9702073
[GL2] Grozman P., Leites D., SuperLie and problems (to be) solved with it. Preprint MPIM-Bonn, 2003-39
(http://www.mpim-bonn.mpg.de)
[GL4] Grozman P., Leites D., Structures of G(2) type and nonintegrable distributions in characteristic p.
Lett. Math. Phys. 74 (2005), no. 3, 229–262; arXiv: math.RT/0509400
[GLS] Grozman P., Leites D., Shchepochkina I., Invariant operators on supermanifolds and standard mod-
els. In: In: M. Olshanetsky, A. Vainstein (eds.) Multiple facets of quantization and supersymme-
try. Michael Marinov Memorial Volume, World Sci. Publishing, River Edge, NJ, 2002, 508–555.
[math.RT/0202193; ESI preprint 1111 (2001)].
[KWK] Kac, V. G. Corrections to: ”Exponentials in Lie algebras of characteristic p” [Izv. Akad. Nauk SSSR
35 (1971), no. 4, 762–788; MR0306282 (46 #5408)] by B. Yu. Veisfeiler and Kac. (Russian) Izv. Ross.
Akad. Nauk Ser. Mat. 58 (1994), no. 4, 224; translation in Russian Acad. Sci. Izv. Math. 45 (1995),
no. 1, 229
[K2] Kac V., Lie superagebras, Adv. Math. v. 26, 1977, 8–96
[K3] Kac, V. Classification of supersymmetries. Proceedings of the International Congress of Mathemati-
cians, Vol. I (Beijing, 2002), Higher Ed. Press, Beijing, 2002, 319–344
Cheng, Shun-Jen; Kac, V., Addendum: “Generalized Spencer cohomology and filtered deformations
of Z-graded Lie superalgebras” [Adv. Theor. Math. Phys. 2 (1998), no. 5, 1141–1182; MR1688484
(2000d:17025)]. Adv. Theor. Math. Phys. 8 (2004), no. 4, 697–709.
Cantarini, N.; Cheng, S.-J.; Kac, V. Errata to: “Structure of some Z-graded Lie superalgebras of
vector fields” [Transform. Groups 4 (1999), no. 2-3, 219–272; MR1712863 (2001b:17037)] by Cheng
and Kac. Transform. Groups 9 (2004), no. 4, 399–400
[KKCh] Kirillov, S. A.; Kuznetsov, M. I.; Chebochko, N. G. Deformations of a Lie algebra of type G2 of
characteristic three. (Russian) Izv. Vyssh. Uchebn. Zaved. Mat. 2000, , no. 3, 33–38; translation in
Russian Math. (Iz. VUZ) 44 (2000), no. 3, 31–36
[KLV] Kochetkov Yu., Leites D., Vaintrob A. New invariant differential operators and pseudo-(co)homology
of supermanifolds and Lie superalgebras. In: S. Andima et. al. (eds.) General Topology and its Appl.,
June 1989, Marcel Dekker, NY, 1991, 217–238
[KS] Kostrikin, A. I., Shafarevich, I.R., Graded Lie algebras of finite characteristic, Izv. Akad. Nauk. SSSR
Ser. Mat. 33 (1969) 251–322 (in Russian); transl.: Math. USSR Izv. 3 (1969) 237–304
[Ku1] Kuznetsov, M. I. The Melikyan algebras as Lie algebras of the type G2. Comm. Algebra 19 (1991),
no. 4, 1281–1312.
[Ku2] Kuznetsov, M. I. Graded Lie algebras with the almost simple component L0. Pontryagin Conference,
8, Algebra (Moscow, 1998). J. Math. Sci. (New York) 106 (2001), no. 4, 3187–3211.
[LL] Lebedev A., Leites D., (with Appendix by P. Deligne) On realizations of the Steenrod algebras. J.
Prime Res. Math., v. 2, 2006,
[L] Leites D., Towards classification of simple finite dimensional modular Lie superalgebras in character-
istic p. IN PREPARATION
[LSh] Leites D., Shchepochkina I., Classification of the simple Lie superalgebras of vector fields, preprint
MPIM-2003-28 (http://www.mpim-bonn.mpg.de)
[Ssol] Sergeev, A. Irreducible representations of solvable Lie superalgebras. Represent. Theory 3 (1999),
435–443; math.RT/9810109
[Shch] Shchepochkina I., How to realize Lie algebras by vector fields. Theor. Mat. Fiz. 147 (2006) no. 3,
821–838; arXiv: math.RT/0509472
[Sk] Skryabin, S. M. New series of simple Lie algebras of characteristic 3. (Russian. Russian summary)
Mat. Sb. 183 (1992), no. 8, 3–22; translation in Russian Acad. Sci. Sb. Math. 76 (1993), no. 2, 389–406
[S] Strade, H. Simple Lie algebras over fields of positive characteristic. I. Structure theory. de Gruyter
Expositions in Mathematics, 38. Walter de Gruyter & Co., Berlin, 2004. viii+540 pp.
[St] Steinberg, R. Lectures on Chevalley groups. Notes prepared by John Faulkner and Robert Wilson.
Yale University, New Haven, Conn., 1968. iii+277 pp.
http://arxiv.org/abs/math/0202193
http://arxiv.org/abs/math/9810109
20 Sofiane Bouarroudj, Pavel Grozman, Dimitry Leites
[Vi] Viviani F., Deformations of Simple Restricted Lie Algebras I, II. math.RA/0612861,
math.RA/0702499;
Deformations of the restricted Melikian Lie algebra,math.RA/0702594;
Restricted simple Lie algebras and their infinitesimal deformations, math.RA/0702755
[WK] Weisfeiler, B. Ju.; Kac, V. G. Exponentials in Lie algebras of characteristic p. (Russian) Izv. Akad.
Nauk SSSR Ser. Mat. 35 (1971), 762–788.
[Y] Yamaguchi K., Differential systems associated with simple graded Lie algebras. Progress in differential
geometry, Adv. Stud. Pure Math., 22, Math. Soc. Japan, Tokyo, 1993, 413–494
1Department of Mathematics, United Arab Emirates University, Al Ain, PO. Box: 17551;
Bouarroudj.sofiane@uaeu.ac.ae, 2Equa Simulation AB, Stockholm, Sweden; pavel@rixtele.com,
3MPIMiS, Inselstr. 22, DE-04103 Leipzig, Germany, on leave from Department of Mathe-
matics, University of Stockholm, Roslagsv. 101, Kräftriket hus 6, SE-106 91 Stockholm,
Sweden; mleites@math.su.se, leites@mis.mpg.de
http://arxiv.org/abs/math/0612861
http://arxiv.org/abs/math/0702499
http://arxiv.org/abs/math/0702594
http://arxiv.org/abs/math/0702755
	1. Introduction
	References
ABSTRACT
  Over algebraically closed fields of characteristic p>2, prolongations of the
simple finite dimensional Lie algebras and Lie superalgebras with Cartan matrix
are studied for certain simplest gradings of these algebras. Several new simple
Lie superalgebras are discovered, serial and exceptional, including superBrown
and superMelikyan superalgebras. Simple Lie superalgebras with Cartan matrix of
rank 2 are classified.

<|endoftext|><|startoftext|>
Counterflow of electrons in two isolated quantum point contacts
V.S. Khrapai,1, 2 S. Ludwig,1 J.P. Kotthaus,1 H.P. Tranitz,3 and W. Wegscheider3
Center for NanoScience and Department für Physik, Ludwig-Maximilians-Universität,
Geschwister-Scholl-Platz 1, D-80539 München, Germany
Institute of Solid State Physics RAS, Chernogolovka, 142432, Russian Federation
Institut für Experimentelle und Angewandte Physik,
Universität Regensburg, D-93040 Regensburg, Germany
We study the interaction between two adjacent but electrically isolated quantum point contacts
(QPCs). At high enough source-drain bias on one QPC, the drive-QPC, we detect a finite electric
current in the second, unbiased, detector-QPC. The current generated at the detector-QPC always
flows in the opposite direction than the current of the drive-QPC. The generated current is maximal,
if the detector-QPC is tuned to a transition region between its quantized conductance plateaus and
the drive-QPC is almost pinched-off. We interpret this counterflow phenomenon in terms of an
asymmetric phonon-induced excitation of electrons in the leads of the detector-QPC.
PACS numbers: 73.23.-b, 73.23.Ad, 73.50.Lw
The state of a confined quantum system is modified
by interactions with an external field (or with exter-
nal sources of energy). In semiconductor nanostruc-
tures the energy and quasi-momentum of electrons act-
ing as probe are strongly influenced by the environment,
e. g. via electron-electron or electron-phonon interaction.
If driven out of equilibrium, Coulomb forces establish
the local equilibrium within the electron system whereas
electron-phonon interactions dominate the energy ex-
change with the environment [1]. Drag experiments in
semiconductor nanostructures provide a tool to study the
effect of external electrons or phonons onto a probe elec-
tron system.
Current drag between parallel two-dimensional (2D)
electron layers has been investigated in GaAs/AlGaAs
bilayer systems. At small interlayer separations, ob-
servations are consistent with the Coulomb drag phe-
nomenon [2]. At larger separations virtual-phonon ex-
change has been invoked to explain the data [3]. A neg-
ative sign of a current drag between 2D and 3D electron
gases in GaAs was explained by the Peltier effect [4]. At
high filling factors in a perpendicular magnetic field a
sign change of the longitudinal drag between parallel 2D
layers was found as a function of the imbalance of the
electron density in the two layers [5, 6].
Interactions between two lateral quantum wires in
GaAs have been investigated in Ref. [7]. The observed
frictional drag, strongly oscillating as a function of the
one-dimensional (1D) subband occupation, was inter-
preted in terms of Coulomb interaction between two
Luttinger liquids. Recently, the observation of negative
Coulomb drag between two disordered lateral 1D wires in
GaAs in perpendicular magnetic fields was reported [8].
Here we report on a novel interaction effect between
two neighboring quantum point contacts (QPCs), em-
bedded in mutually isolated electric circuits. When a
strong current is flowing through the partially transmit-
ting drive-QPC, we detect a small current in the sec-
ond, unbiased, detector-QPC. The detector current flows
in the opposite direction of the drive current and shows
a nonlinear dependence on the source-drain bias of the
drive-QPC. It oscillates as a function of the detector-
QPC transmission. We suggest an explanation of this
counterflow phenomenon in terms of asymmetric phonon-
induced excitation of ballistic electrons in the leads of the
detector-QPC.
Our samples are prepared on a GaAs/AlGaAs het-
erostructure containing a two-dimensional electron gas
90 nm below the surface, with an electron density of
nS = 2.8 × 10
11 cm−2 and a low-temperature mobility
of µ = 1.4 × 106 cm2/Vs. An AFM micrograph of the
split-gate nanostructure, produced with e-beam lithogra-
phy, is shown in the left inset of Fig. 1. The negatively
biased central gate C divides the electron system into two
separate circuits, and prevents leakage currents between
them. Two QPCs are defined on the upper and lower
side of the central gate, respectively, by biasing gates 8
and 3. Other gates are grounded if not stated otherwise.
The right inset of Fig. 1 shows a sketch of the coun-
terflow experiment. We use separate electric circuits for
the (upper) drive-QPC and (lower) detector-QPC. A dc
bias voltage, Vdrive, is applied to the left lead of the drive-
QPC, while the right lead is grounded. A current-voltage
amplifier with an input voltage-offset of about 10 µV is
connected to the right lead of the detector-QPC. Its left
lead is always maintained at the same offset potential
in order to assure zero voltage drop across the detector-
QPC. In both circuits, a positive sign of the current cor-
responds to electrons flowing to the left. For differential
counterflow conductance measurements, the drive bias is
modulated at a frequency of 21 Hz and the resulting ac
current component in the detector circuit is measured
with lock-in detection in the linear response regime. All
measurements are performed in a dilution refrigerator at
an electron temperature below 150 mK. The experimen-
tal results are the same if detector and drive QPC are
interchanged.
First, we characterize the QPCs using a standard dif-
ferential conductance measurement. Figure 1 displays
the differential conductances of both QPCs in linear re-
http://arxiv.org/abs/0704.0132v2
drive
-0.7 -0.5 -0.3
FIG. 1: Conductance of the drive-QPC (dashed line) and the
detector-QPC (solid line) in the linear response regime as a
function of respective gate voltages V8 and V3. Symbols on
the detector-QPC curve mark the V3 values used for coun-
terflow conductance measurement presented in Fig. 2b. Left
inset: AFM micrograph of the metal gates on the surface
of the heterostructure (bright tone). Crossed squares mark
contacted 2DEG regions. The scale bar equals 1 µm. Right
inset: sketch of the counterflow measurement. The directions
of currents are shown for the case of Vdrive > 0.
sponse, measured as a function of the respective gate
voltage V3, or V8. At low gate voltages, the QPCs are
pinched-off and the conductance is close to zero. With
increasing gate voltage, 1D channels successively open
up [9]. For both QPCs we observe three conductance
plateaus approximately quantized in units of G0 = 2e
With high bias spectroscopy [10] we find the spacing
between the two lowest subbands to be approximately
4 meV (3 meV) for the drive (detector) QPC. The half-
width of the energy window for opening a 1D subband is
∆ ≈ 0.5 meV in both QPCs.
Having characterized the QPCs, we turn to counter-
flow measurements. Fig. 2a shows the dc counterflow
current, Icf, through the detector-QPC and the differ-
ential counterflow conductance, gcf ≡ dIcf/dVdrive, as a
function of the bias on the drive-QPC. Here, the drive-
QPC is tuned to nearly half a conductance quantum
Gdrive = G0/2, while the detector-QPC is in the pinch-off
regime (i.e. the lowest 1D subband bottom is well above
the Fermi level) with Gdet ≃ 10 GΩ
−1. Surprisingly, for
|Vdrive| & 1 mV, a finite current is observed in the un-
biased detector circuit. The direction of Icf is opposite
to that of the drive-QPC current Idrive. The dc coun-
terflow current is a threshold-like, nearly odd function of
Vdrive. Correspondingly, the differential counterflow con-
ductance is negative and a nearly even function of Vdrive.
The sign of gcf expresses a phase shift of π between the
applied ac modulation of Vdrive and the detected ac com-
ponent of the counterflow current.
Figures 2c and 2d show the absolute value of Icf for the
nearly pinched-off detector as a function of the voltage
on gate 8, which tunes the drive-QPC transmission. The
-4 -2 0 2 4
-0.04
-0.02
-0.7 -0.6 -0.5 -0.4
drive
 (mV)
(d) G
FIG. 2: (a) - Icf and gcf for the nearly pinched-off detector-
QPC as a function of Vdrive. (b) - gcf measured for a set of
Gdet values marked by according symbols in Fig. 1. (c,d)
- Absolute value of Icf as a function of the drive-QPC gate
voltage V8, for Vdrive = ±2.25 mV (c) and Vdrive = ±4 mV(d).
Also shown is the drive-QPC’s conductance in linear response
(c) and its differential conductance at Vdrive = ±4 mV (d).
Solid (dotted) lines correspond to Vdrive < 0 (> 0). In (a),(b)
gates 7 and 9 are grounded, while in (c),(d) V7 = V9 = −0.4 V.
The drive bias modulation used to measure gcf is 92 µV rms.
corresponding drive-QPC differential conductance curves
are also shown. For not too high Vdrive (Fig. 2c), a non-
zero counterflow current is only detected in the region
between pinch-off and the first conductance plateau of
the drive-QPC. For higher Vdrive (Fig. 2d) Icf increases
superlinearly with Vdrive at its maximum and remains
finite at higher gate voltages V8. Since the source bias
effects the potential distribution near the constriction,
the nonlinear 1/2 conductance plateau of the drive-QPC
shifts when changing Vdrive [11]. This causes the shift of
the extrema on Fig. 2d as well as the asymmetry of gcf
in Fig. 2a when reversing the bias.
We proceed to study the counterflow effect in the
-0.6 -0.5 -0.4 -0.3
-0.10
-0.05
-0.65 -0.60 -0.55
=17 kΩ
=17 kΩ
FIG. 3: (a)- gcf as a function of the detector-QPC gate volt-
age V3. Filled symbols correspond to the gcf measured at a
finite external resistance Rext = 17 kΩ, while open symbols
show the corrected counterflow conductance Rext = 0 (see
text). Also shown are the transmission functions Tn(1 − Tn)
for the three lowest 1D subbands of the detector-QPC (dashed
lines), scaled to fit the corrected data. During the gcf mea-
surement the drive bias is modulated with a 230 µV rms signal
about the mean value Vdrive = +2.05 mV. (b) - Normalized gcf
(symbols as in (a)) and transmission function of the lowest 1D
detector-QPC subband 4T0(1−T0) (dashed line) as a function
of V3. The scale bar shows a gate voltage interval correspond-
ing to a change of the 1D subband energy by 0.5 meV. Inset:
Sketch of possible scattering processes of nonequilibrium elec-
trons and holes at a partially transmitting detector-QPC.
regime of a more opened detector-QPC. Figure 2b plots
gcf [12] as a function of Vdrive for several values of Gdet
between 0 and G0 (marked with the same symbols in
Fig. 1). The qualitative appearance of gcf(Vdrive) is in-
dependent of Gdet. However, the amplitude of gcf is a
strongly non-monotonic function of the detector trans-
mission. The counterflow conductance reaches its maxi-
mum for Gdet ≈ G0/2 and decreases rapidly with further
increasing Gdet. Note that the absolute value of gcf is
small, corresponding to a maximal ratio of the counter-
flow and drive currents |Icf/Idrive| . 10
In Fig. 3a gcf is plotted as a function of V3, controlling
the detector transmission. Vdrive and V8 are adjusted for
maximal gcf and kept fixed. Confirming the trend seen in
Fig. 2b, the measured gcf (solid symbols) strongly oscil-
lates with increasing V3 and displays three pronounced
maxima before the detector-QPC is fully opened. The
position of the n-th maximum (n = 0,1,2) is close to the
value of V3, where Gdet/G0 ≃ n + 0.5 (Fig. 1). Here,
the energy EnS of the bottom of the n-th 1D subband
-4 -2 0 2 4
��������������
drive
 (mV)
FIG. 4: Drive bias dependence of the counterflow current
through the pinched-off detector-QPC for the drive-QPC
formed with gate 6 (dotted line) or gate 10 (solid line). The
detector-QPC conductance is about Gdet = 5 GΩ
−1. The
drive-QPCs are tuned to provide the maximal effect. Insets:
sketches of the two counterflow measurements. The directions
of currents are shown for the case of Vdrive > 0.
of the detector-QPC aligns with the Fermi level of the
leads EnS ≃ EF. In contrast, gcf is close to zero for fully
transmitting 1D channels (Gdet/G0 ≃ n+1). The over-
all magnitude of gcf decreases with increasing V3, hence
Gdet. This is caused by a finite series resistance Rext of
the external circuit, which results in a measured gcf lower
than the case for an ideal ammeter [13]. The corrected
counterflow conductance, gidealcf ≡ gcf · (1 + Rext · Gdet),
corresponding to Rext = 0, is shown in Fig. 3a with open
symbols. The corrected maxima are roughly equal in size
and symmetric. Moreover, the shape of the n-th maxi-
mum compares quite well with the corresponding func-
tion of the equilibrium transmission Tn(1−Tn), extracted
from the detector conductance data Tn ≡ Gdet/G0 − n
(dashed lines in Fig. 3a).
In Figure 3b we plot the normalized gcf and the trans-
mission function 4T0(1− T0) on a logarithmic scale near
the detector pinch-off. In the pinch-off regime (i.e. for
T0 ≪ 1) the transmission probability of a QPC is ex-
pressed as T0(E) ∝ exp([E − E
S]/∆) [11]. Here E is
the kinetic energy of current carrying electrons and ∆
is the half-width of the energy window for opening a 1D-
subband. The energy E0S of the detector-QPC is con-
trolled by gate 3 via E0S ∝ −|e|V3. This explains a nearly
exponential drop of the transmission function with de-
creasing V3 (Fig. 3b). In contrast, the measured gcf
drops considerably slower and remains finite even where
the detector-QPC is already pinched-off in equilibrium.
This experimentally observed excess contribution of the
normalized gcf versus T0(EF ) signals that the counter-
flow current carrying electrons are excited well above the
Fermi level. Converting the shift in gate voltage (see the
bar in Fig. 3b) to energy, we find a characteristic excita-
tion energy of E∗ ≈ 0.5 meV. This is consistent with a
recently reported 1 meV bandwidth excitation provided
by the drive-QPC for electrons in a nearby double-dot
quantum ratchet [14].
Next we study the counterflow effect between spatially
shifted QPCs. Figure 4 shows Icf through the nearly
pinched-off detector-QPC as a function of the bias on the
drive-QPC, which is formed either with gate 10 or gate
6, while gate 8 is now grounded (Fig. 1). Despite the
shift of the drive-QPC position relative to the detector-
QPC by about 300 nm, the odd drive bias dependence
of the counterflow current found in Fig. 2 is practically
preserved. This indicates that the excitation of electrons
in one of the leads of the detector-QPC is not restricted
to the close vicinity of the drive-QPC.
The oscillations of the counterflow conductance gcf in
Fig. 3 are reminiscent of thermopower oscillations that
have been investigated on individual QPCs [15, 16]. This
suggests that Icf is caused by an energetic imbalance
across the detector-QPC. If the bottom of the n-th 1D-
subband of the detector-QPC is well separated from the
Fermi-energy in comparison to the characteristic excita-
tion energy, i. e. if |EnS −EF| ≫ E
∗, this subband is either
fully transmitting (Tn(E) = 1) or closed (Tn(E) = 0). In
both cases electrons (holes) excited by E∗ above (below)
EF are equally transmitted and gcf = 0. In contrast,
if EnS ≃ EF excited electrons are more likely transmit-
ted than excited holes (see inset of Fig. 3b), resulting in
gcf 6= 0.
The energetic imbalance across the detector-QPC we
propose to be caused by phonon-based energy transfer
from the drive-QPC. The excess energy of carriers in-
jected across the drive-QPC is mainly relaxed by emis-
sion of acoustic phonons. We consider the drive-QPC in
the non-linear regime near pinch-off where µS − µD ≫
∆ and the transmission probability is strongly energy-
dependent (the source and drain leads are defined so that
their chemical potentials satisfy µS > µD). In this case
electrons injected into the drain lead have an excess en-
ergy of about e|Vdrive| ≡ µS−µD whereas the source lead
remains essentially in thermal equilibrium [17]. Hence
acoustic phonons are predominantly generated in the
drain lead of the drive-QPC. Because of this asymme-
try electron-hole pairs are excited preferentially in the
adjacent lead of the detector-QPC [18]. This gives rise
to Icf directed opposite to the current through the drive-
QPC (and gcf < 0). The data in Fig. 2 clearly show, that
the counterflow effect is only observed in the non-linear
regime of the drive-QPC.
For a rough estimate we consider injected electrons
with a momentum relaxation time of 60 ps limited by
elastic scattering and an energy relaxation time of
1 ns [19, 20]. Assuming isotropic phonon emission we
estimate an energy transfer ratio which can account for
the observed value of Icf/Idrive within one order of mag-
nitude.
In summary, the current in a strongly biased drive-
QPC generates a current flowing in the opposite direc-
tion through an adjacent unbiased detector-QPC. This
counterflow current is maximal in between the conduc-
tance plateaus of the detector-QPC. The effect is most
pronounced near pinch-off of the drive-QPC, where it
behaves strongly non-linear. We interpret the results in
terms of an asymmetric phonon-based energy transfer.
The authors are grateful to V.T. Dolgopolov,
A.W. Holleitner, C. Strunk, F. Wilhelm, I. Favero,
A.V. Khaetskii, N.M. Chtchelkatchev, A.A. Shashkin,
D.V. Shovkun and P. Hänggi for valuable discussions
and to D. Schröer and M. Kroner for technical help. We
thank the DFG via SFB 631, the BMBF via DIP-H.2.1,
the Nanosystems Initiative Munich (NIM) and VSK the
A. von Humboldt foundation, RFBR, RAS, and the pro-
gram ”The State Support of Leading Scientific Schools”
for support.
[1] V. F. Gantmakher and Y. B. Levinson, in Carrier Scat-
tering in Metals and Semiconductors (North-Holland,
Amsterdam, 1987)
[2] T.J. Gramila et al., Phys. Rev. Lett. 66, 1216 (1991)
[3] T.J. Gramila et al., Phys. Rev. B 47, 12957 (1993);
H. Rubel et al., Semicond. Sci. Technol. 10, 1229 (1995)
[4] B. Laikhtman et al., Phys. Rev. B. 41, 9921 (1990)
[5] X.G. Feng et al., Phys. Rev. Lett. 81, 3219 (1998)
[6] J.G.S. Lok et al., Phys. Rev. B 63, 041305 (2001)
[7] P. Debray et al., J. Phys.: Condens. Matter 13, 3389,
(2001); P. Debray et al., Semicond. Sci. Technol. 17, R21,
(2002)
[8] M. Yamamoto et al., Science 313, 204, (2006)
[9] B.J. van Wees et al., Phys. Rev. Lett. 60, 848 (1988);
D.A. Wharam et al., J. Phys. C 21, L209 (1988)
[10] A. Kristensen et al., Phys. Rev. B 62, 10950 (2000)
[11] L.I. Glazman, A.V. Khaetskii JETP Lett. 48 591 (1988)
[12] For increasing Gdet the noises in the detector circuit in-
crease, making the dc measurements very difficult.
[13] The input resistance of the I-V amplifier, the ohmic con-
tacts and wiring resistances result in Rext = 17 kΩ. The
validity of the above formula has been checked by apply-
ing an additional 47 kΩ resistor in series to Rext.
[14] V.S. Khrapai et al., Phys. Rev. Lett. 97, 176803 (2006)
[15] L.W. Molenkamp et al., Phys. Rev. Lett. 68, 3765 (1992);
H. van Houten et al., Semicond. Sci. Technol. 7, B215
(1992)
[16] A.S. Dzurak et al., J. Phys.: Condens. Matter 5, 8055,
(1993)
[17] A. Palevski et al., Phys. Rev. Lett. 62, 1776 (1989); for
asymmetric heat production in 3D point contacts see
U. Gerlach-Meyer, H.J. Queisser Phys. Rev. Lett. 51,
1904 (1983)
[18] |Icf| is reduced for Vdrive < 0 (> 0) and the drive-QPC
shifted to the lh (rh) side of the detector-QPC (Fig. 4).
This is understood in terms of absorption of phonons in
both leads of the detector-QPC.
[19] B.K. Ridley, Rep. Prog. Phys 54, 169 (1991)
[20] A.A. Verevkin et al., Phys. Rev. B 53, R7592 (1996)
ABSTRACT
  We study the interaction between two adjacent but electrically isolated
quantum point contacts (QPCs). At high enough source-drain bias on one QPC, the
drive QPC, we detect a finite electric current in the second, unbiased,
detector QPC. The current generated at the detector QPC always flows in the
opposite direction than the current of the drive QPC. The generated current is
maximal, if the detector QPC is tuned to a transition region between its
quantized conductance plateaus and the drive QPC is almost pinched-off. We
interpret this counterflow phenomenon in terms of an asymmetric phonon-induced
excitation of electrons in the leads of the detector QPC.

<|endoftext|><|startoftext|>
Introduction
Redshifts ∼2.5 witness both the ‘quasar epoch’ with peak number density of luminous
accreting black holes (e.g. Schmidt et al. 1995) and the peak in number of the most intense
star forming events as traced by the submillimeter galaxy population (Chapman et al. 2005),
suggestive of a relation of the two phenomena. Detailed evolutionary connections between
massive starbursts and QSOs have been discussed for many years (e.g. Sanders et al. 1988;
Norman & Scoville 1988) and form an integral part of some recent models of galaxy and
merger evolution (e.g. Granato et al. 2004; Springel et al. 2005; Hopkins et al. 2006). Phases
of intense star formation coincident with the active phase of the quasars are a natural
postulate of such models but have been exceedingly difficult to confirm and quantify due to
the effects of the powerful AGN outshining tracers of star formation at most wavelengths.
Perhaps the strongest constraint on the potential significance of star formation in QSOs
comes from the far-infrared part of their spectral energy distribution (SED). Indeed, this far-
infrared emission has been interpreted as due to star formation (e.g. Rowan-Robinson 1995),
but alternative models successfully ascribe it to AGN heated dust, by postulating a dust
distribution in which relatively cold dust at large distance from the AGN has a significant
covering factor, for example in a warped disk configuration (Sanders et al. 1989). Additional
diagnostics are needed to break this degeneracy.
CO surveys of local QSOs (e.g. Evans et al. 2001; Scoville et al. 2003; Evans et al. 2006)
have produced a significant number of detections of molecular gas reservoirs that might power
star formation. Depending on the adopted ‘star formation efficiency’ SFE=LFIR/LCO the
detected gas masses may be sufficient or not for ascribing the QSO far-infrared emission to
star formation. Optical studies have identified significant ‘post-starburst’ stellar populations
in QSOs (Canalizo & Stockton 2001; Kauffmann et al. 2003). On the other hand, Ho (2005)
suggested low star formation in QSOs, perhaps actively inhibited by the AGN, on the basis
of observations of the [OII] 3727Å line. We have used the much less extinction sensitive mid-
infrared PAH emission features to infer that in a sample of local (PG) QSOs, star formation
is sufficient to power the observed far-infrared emission (Schweitzer et al. 2006).
– 3 –
The observational situation remains complex for high redshift QSOs. Metallicity studies
of the broad-line region suggest significant enrichment by star formation (e.g. Hamann & Ferland
1999; Shemmer et al. 2004) but may not be representative for the host as a whole. Submm
and mm studies of luminous radio quiet QSOs have produced significant individual detec-
tions of dust emission of some QSOs, as well as statistical detection of the entire popula-
tion (e.g. Omont et al. 2003; Priddey et al. 2003; Barvainis & Ivison 2002). These suggest
potential starburst luminosities up to and exceeding 1013L⊙. CO studies have detected
large gas reservoirs in many high-z QSOs (see summaries in Solomon & Vanden Bout 2005;
Greve et al. 2005). Emission from high density molecular gas tracers has been detected in
some of the brightest systems (Barvainis et al. 1997; Solomon et al. 2003; Carilli et al. 2005;
Riechers et al. 2006; Garćıa-Burillo et al. 2006; Guélin et al. 2007) and may well originate in
dense high pressure star forming regions, but AGN effects on chemistry and molecular line
excitation could also play a role (e.g. Maloney et al. 1996). Finally, the [CII] 157µm rest
wavelength fine structure line was detected in the z=6.42 quasar SDSS J114816.64+525150.3
(Maiolino et al. 2005) at a ratio to the rest frame far-infrared emission similar to the ratio
in local ULIRGs, consistent with massive star formation.
We have initiated a program extending the use of mid-infrared PAH emission as star
formation tracer to high redshift QSOs. In this Letter, we use Spitzer mid-infrared spectra to
detect and quantify star formation in one of the brightest and best studied z∼2.5 QSOs, the
lensed Cloverleaf (H1413+117, Hazard et al. 1984; Magain et al. 1988). We adopt Ωm = 0.3,
ΩΛ = 0.7 and H0 = 70 kms
−1 Mpc−1.
2. Observations and Results
We obtained low resolution (R∼ 60 − 120) mid-infrared spectra of the Cloverleaf QSO
using the Spitzer infrared spectrograph IRS (Houck et al. 2004) in staring mode on July 24,
2006, at J2000 target position RA 14h15m46.27s, DEC +11d29m43.40s. The IRS aperture
includes all lensed images. 30 cycles of 120sec integration time per nod position were taken
in the LL1 (19.5 to 38.0 µm) and 15 cycles in the LL2 (14.0 to 21.3 µm) module, leading to
effective on-source integration times of 2 and 1 hours, respectively. We use the pipeline 14.4.0
processed basic calibrated data, own deglitching and coaddition procedures, and SMART
(Higdon et al. 2004) for extraction. When combining the two orders into the final spectrum,
we scaled the LL2 spectrum by a factor 1.02 for best match in the overlapping region.
Fig. 1 shows the IRS spectrum embedded into the infrared to radio SED of the Clover-
leaf, and Fig. 2 the IRS spectrum proper, together with the location of key features in the
corresponding rest wavelength range. The rest frame mid-infrared emission is dominated by
– 4 –
a strong continuum, approximately flat in νFν , due to dust heated by the powerful active
nucleus to temperatures well above those reached in star forming regions. Superposed on
this continuum are emission features, which we identify with the 6.2µm and 7.7µm aromatic
‘PAH’ emission features normally detected in star-forming galaxies over a very wide range
of properties. As expected for a Type 1 AGN, there are no indications for the ice (6µm)
or silicate (9.6µm) absorptions seen in heavily obscured galaxies. None of the well-known
emission lines in this wavelength range is bright enough to be significantly detected in this
low resolution spectrum, although we cannot exclude a contribution of [NeVI] 7.64µm to
the 7.7µm feature. Adopting standard mid-infrared low resolution diagnostics (Genzel et al.
1998; Laurent et al. 2000), the weak PAH features on top of a strong continuum agree with
the notion that the Cloverleaf is energetically dominated by its AGN. The detection of PAH
features with several mJy peak flux density in a z∼2.6 galaxy, however, implies intense star
formation, which we discuss in conjunction with other properties of the Cloverleaf.
By fitting a Lorentzian superposed on a local polynomial continuum, we measure a
flux of 1.52×10−21Wcm−2 for the 6.2µm feature, with a S/N of 6. The 7.7µm feature is
more difficult to quantify. Schweitzer et al. (2006) have discussed PAHs as star formation
indicators in local (PG) QSOs, PAHs are also detected in the average QSO spectrum of
Hao et al. (2007). The AGN continuum of those QSOs shows superposed silicate emission
features at & 9µm (see also Siebenmorgen et al. 2005; Hao et al. 2005). If PAH emission
is additionally present, the PAH features partly ‘fill in’ the minimum in the AGN emission
before the onset of the silicate feature (see Fig. 2 of Schweitzer et al. (2006)), causing a
seemingly flat overall spectrum. In reality, there is simultaneous presence of AGN continuum,
the 6.2-8.6µmPAH complex, silicate emission, and more PAH emission at longer wavelengths.
From inspection of Fig. 2, a similar co-presence of PAH and silicate emission is observed for
the Cloverleaf. We note that the presence of silicate emission in the luminous Cloverleaf
Type 1 QSO, other high z Type 1 QSOs (Maiolino et al., in prep) as well as in luminous
Type 2 QSOs (Sturm et al. 2006; Teplitz et al. 2006) has implications for the location of
this cool silicate component (∼200K, Hao et al. 2005) in unified AGN schemes. We leave
further discussion of the properties of silicates to a future paper. For the 7.7µm feature of
the Cloverleaf, we adopt a flux of 6.1×10−21Wcm−2. This flux was determined by scaling a
PAH template (ISO spectrum of M82, Sturm et al. 2000) to the measured Cloverleaf 6.2µm
feature flux, and then fitting three lorentzians to represent the 6.2, 7.7, and 8.6 features plus
a local polynomial continuum to this template. Similar Lorentzian fits were also used for
local comparison objects discussed below. Brandl et al. (2006) have quantified the scatter
of the 6.2 to 7.7µm flux ratio in starbursts, with 0.07 in the log this scatter indicates the
modest uncertainty induced by tying the longer wavelength features to the 6.2µm one. The
result of subtracting the scaled M82 template from the Cloverleaf spectrum is indicated
– 5 –
in Fig. 2, and shows a combination of continuum and silicate emission very similar to local
QSOs. Directly measuring the 7.7µm flux by fitting a single Lorentzian plus local continuum
to the Cloverleaf spectrum gives a ∼40% lower feature flux, which would be a systematic
underestimate because of the complexity of the underlying continuum/silicates discussed
above.
3. Intense star formation in the host of the Cloverleaf QSO
The Cloverleaf SED (Fig. 1) shows strong rest frame far-infrared emission in addition
to the AGN heated dust emitting in the rest frame mid-infrared. Weiß et al. (2003) decom-
posed the SED into two modified blackbodies of temperature 50 and 115K, the rest frame
far-infrared (40-120µm) luminosity of 5.4 × 1012L⊙ is dominated by the colder component
and could largely originate in star formation. Comparison of PAH and far-infrared emission
can shed new light on this question. The bolometric (rather than rest-frame far-infrared)
luminosity of the Cloverleaf will still be dominated by the AGN. We estimate LBol extrap-
olating from the observed rest frame 6µm continuum which for a mid-infrared spectrum
with weak PAH but strong continuum will be AGN dominated (Laurent et al. 2000). Using
LBol ∼ 10× νLν(6µm) based on an Elvis et al. (1994) radio-quiet QSO SED, the AGN lu-
minosity is ∼ 7 × 1013L⊙. A similar estimate ∼ 5 × 10
13L⊙ is obtained from the rest frame
optical (observed near-infrared; Barvainis et al. 1995) continuum, tracing the AGN ionizing
continuum, and the same global SED.
Schweitzer et al. (2006) have measured PAH emission in local QSOs and compared the
PAH to far-infrared emission ratio to that for starbursting ULIRGs, i.e. those among a larger
ULIRG sample not showing evidence for dominant AGN and not having absorption domi-
nated mid-infrared spectra. Fig. 3 places the Cloverleaf on their relation between 7.7µm PAH
luminosity and far-infrared luminosity. L(PAH)/L(FIR) is 0.014 for the Cloverleaf, very close
to the mean value for the 12 starburst-dominated ULIRGs of < L(PAH)/L(FIR) >= 0.0130.
The scatter of this relation is 0.2 in the log for these 12 comparison ULIRGs, indicating the
minimum uncertainty of extrapolating from the PAH to far-infrared emission. The Clover-
leaf thus extends the relation between PAH and far-infrared luminosity for the local QSOs
and ULIRGs to ∼5 times larger luminosities. Its PAH emission is consistent with an ex-
tremely luminous starburst of ULIRG-like physical conditions powering essentially all of the
rest frame far-infrared emission.
Teplitz et al. (2006) present the IRS spectrum of the lensed FIR-bright Type 2 AGN
IRAS F10214+4724 at similar redshift. They report a marginal feature at 6.2µm rest wave-
length which they do not interpret as PAH given the lack of a 7.7µm maximum. Given
– 6 –
the strength of silicate emission in this target, PAH emission may be present in the blue
wing of the silicate feature without producing a maximum, and such a component may be
suggested by comparing their Fig. 1 with the later onset of silicate emission in the spectra
of local QSOs. The tentative 6.2µm peak in IRAS F10214+4724 has similar peak height as
the Cloverleaf PAH feature, in line with our interpretation and the similar rest frame FIR
fluxes of the two objects.
With ∼5-10% of its total luminosity originating in the rest frame far-infrared and by star
formation, the Cloverleaf is within the range of local QSOs, and not a pronounced infrared
excess object. Specifically, its ratio of FIR to total luminosity and the ratio of rest frame far-
infrared (60µm) to mid-infrared (6µm) continuum are about twice those of the Elvis et al.
(1994) radio-quiet QSO SED. Adopting the conclusion of Schweitzer et al. (2006) that star
formation already dominates the FIR emission of local PG QSOs and considering the modest
FIR ‘excess’ of the Cloverleaf compared to the Elvis et al. (1994) SED then suggests only
a small AGN contribution to its FIR luminosity. Other z∼2 QSOs may have lower ratios
of FIR and total luminosity, and conversely larger AGN contributions to their more modest
FIR emission, though. After correcting for lensing, the Cloverleaf submm flux is a factor ∼2
above the typical bright z∼2 QSOs of Priddey et al. (2003) whose rest frame B magnitudes
in addition are typically brighter than the delensed Cloverleaf.
Submillimeter galaxies host starbursts of similar luminosity as the Cloverleaf, at similar
redshift. Lutz et al. (2005) and Valiante et al. (2007) have obtained IRS mid-infrared spec-
tra of 13 SMGs with median redshift 2.8, finding mostly starburst dominated systems. A
comparison can be made between PAH peak flux density and flux density at rest wavelength
222µm which is obtained with minimal extrapolation from observed SCUBA 850µm fluxes.
Combining the 7.7µm feature peak of 5.1mJy (Fig. 2) with a ν3.5 extrapolation of the SCUBA
flux of Barvainis & Ivison (2002) places the Cloverleaf at Log(SPAH7.7/S222µm) ∼ −1.2, near
the center of the distribution of this quantity for the SMGs of Valiante et al. (2007, their
Fig.4). Like the SMGs, the Cloverleaf appears to host a scaled up ULIRG-like starburst,
but with superposition of a much more powerful AGN, also in comparison to the gas mass.
Tracers of high density gas, in particular HCN but also HCO+ have been detected in a
few high redshift QSOs including the Cloverleaf (Barvainis et al. 1997; Solomon et al. 2003;
Riechers et al. 2006). Their ratio to far-infrared emission is similar to the one for Galactic
dense star forming regions, and has been used to argue for dense, high pressure star forming
regions dominating the far-infrared luminosity of these QSOs as well as of local ULIRGs (e.g.
Solomon et al. 2003). Intense HCN emission is observed also from X-ray dominated regions
close to AGN (e.g. Tacconi et al. 1994), and there is ongoing debate as to the possible
contributions of chemistry and excitation in X-ray dominated regions, and other effects like
– 7 –
radiative pumping, to the emission of dense molecular gas tracers in ULIRGs and QSOs
(Kohno 2005; Imanishi et al. 2006; Graćıa-Carpio et al. 2006). Unlike HCN, PAH emission
is severely reduced in X-ray dominated regions close to AGN (Voit 1992) and provides an
independent check of the effects of the AGN on the molecular gas versus the role of the host
and its star formation. In a scenario where XDRs dominate the strong HCN emission and
the hosts PAH emission, reproducing the consistent ratios of these quantities to rest-frame
far-infrared over a wide range of far-infrared luminosities would thus require a considerable
amount of finetuning. In contrast, these consistent ratios are a natural implication if all
these components are dominated by ULIRG-like dense star formation.
Our detection of PAH emission is strong support to a scenario in which the Cloverleaf
QSO coexists with intense star formation. Applying the Kennicutt (1998) conversion from
infrared luminosity to star formation rate to LFIR = 5.4× 10
12L⊙ suggests a star formation
rate close to 1000 M⊙yr
−1, which can be maintained for a gas exhaustion timescale of only
3×107yr, for the molecular gas mass inferred by Weiß et al. (2003). At this time resolution,
the period of QSO activity coincides with what likely is the most significant star forming
event in the history of the Cloverleaf host.
This work is based on observations made with the Spitzer Space Telescope, which is
operated by the Jet Propulsion Laboratory, California Institute of Technology, under a con-
tract with NASA. Support for this work was provided by NASA under contracts 1287653
and 1287740 (S.V.,O.S.). We thank the referee for helpful comments.
REFERENCES
Alloin, D., Guilloteau, S., Barvainis, R., Antonucci, R., Tacconi, L. 1997, A&A, 321, 24
Aussel, H., Gerin, M., Boulanger, F., Désert, F.X., Casoli, F., Cutri, R.M., Signore, M. 1998,
A&A, 334, L73
Barvainis, R., Antonucci, R., Hurt, T., Coleman, P., Reuter, H.-P. 1995, ApJ, 451, L9
Barvainis, R., Lonsdale, C. 1997, AJ, 113, 144
Barvainis, R., Maloney, P., Antonucci, R., Alloin, D. 1997, ApJ, 484, 695
Barvainis, R., Ivison, R. 2002, ApJ, 571, 712
Benford, D. 1999, PhD Thesis, California Institute of Technology
– 8 –
Brandl, B., et al. 2006, ApJ, 653, 1129
Canalizo, G., Stockton, A. 201, ApJ, 555, 719
Carilli, C.L., et al. 2005, ApJ, 618, 586
Chapman, S.C., Blain, A.W., Smail, I., Ivison, R.J. 2005, ApJ, 622, 772
Elvis, M., et al. 1994, ApJS, 95, 1
Evans, A.S., Frayer, D.T., Surace, J.A., Sanders, D.B. 2001, AJ, 121, 3286
Evans, A.S., Solomon, P.M., Tacconi, L.J., Vavilkin, T., Downes, D. 2006, AJ, 132, 2398
Garćıa-Burillo, S., et al. 2006, ApJ, 645, L17
Genzel, R., et al. 1998, ApJ, 498, 579
Graćıa-Carpio, J., Garćıa-Burillo, S., Planesas, P., Colina, L. 2006, ApJ, 640, L135
Granato, G.L., de Zotti, G., Silva, L., Bressan, A., Danese, L. 2004, ApJ, 600, 580
Greve, T.R., et al. 2005, MNRAS, 359, 1165
Guélin, M., et al. 2007, A&A, 462, L45
Hao, L., et al. 2005, ApJ, 625, L75
Hao, L., Weedman, D.W., Spoon, H.W.W., Marshall, J.A., Levenson, N.A., Elitzur, M.,
Houck, J.R. 2007, ApJ, 655, L77
Hamann, F., Ferland, G. 1999, ARA&A, 47, 487
Hazard, C., Morton, D.C., Terlevich, R., McMahon, R. 1984, ApJ, 282, 33
Higdon, S.J.U., et al. 2004, PASP, 116, 975
Ho, L.C. 2005, ApJ, 629, 680
Hopkins, P.F., Hernquist, L., Cox, T.J., Di Matteo, T., Robertson, B., Springel, V. 2006,
ApJS, 163, 1
Houck, J.R., et al. 2004, ApJS, 154, 18
Hughes, D.H., Dunlop, J.S., Rawlings, S. 1997, MNRAS, 289, 766
Imanishi, M., Nakanishi, K., Kohno, K. 2006, AJ, 131, 2888
– 9 –
Kauffmann, G., et al. 2003, MNRAS, 346, 1055
Kennicutt, R.C. 1998, ARA&A, 36, 189
Kohno, K. 2005, astro-ph/0508420
Laurent, O., Mirabel, I.F., Charmandaris, V., Gallais, P., Madden, S.C., Sauvage, M., Vi-
groux, L., Cesarsky, C. 2000, A&A, 359, 887
Lutz, D., Valiante, E., Sturm, E., Genzel, R., Tacconi, L.J., Lehnert, M.D., Sternberg, A.,
Baker, A.J. 2005, ApJ, 625, L83
Magain, P., Surdej, J., Swings, J.-P., Borgeest, U., Kayser, R., Kühr, H., Refsdal, S., Remy,
M. 1988, Nature, 334, 325
Maiolino, R., et al. 2005 A&A, 440, L51
Maloney, P.R., Hollenbach, D.J., Tielens, A.G.G.M., 1996, ApJ, 466, 561
Norman, C., Scoville, N.Z., 1988, ApJ, 332, 124
Omont, A., Beelen, A., Bertoldi, F., Cox, P., Carilli, C.L., Priddey, R.S., McMahon, R.G.,
Isaak, K.G. 2003, A&A, 398, 857
Priddey, R.S., Isaak, K.G., McMahon, R.G., Omont, A. 2003, MNRAS, 339, 1183
Riechers, D.A., Walter, F., Carilli, C.L., Weiss, A., Bertoldi, F., Menten, K.M., Knudsen,
K.K., Cox, P. 2006, ApJ, 645, L13
Rowan-Robinson, M. 1995, MNRAS, 272, 737
Rowan-Robinson, M. 2000, MNRAS, 316, 885
Sanders, D.B., Soifer, B.T., Elias, J.H., Madore, B.F., Matthews, K., Neugebauer, G., Scov-
ille, N.Z. 1988, ApJ, 325, 74
Sanders, D.B., Phinney, E.S., Neugebauer, G., Soifer, B.T., Matthews, K. 1989, ApJ, 347,
Schmidt, M., Schneider, D., Gunn, J. 1995, AJ, 110, 68
Scoville, N.Z., et al. 2003, ApJ, 585, L105
Schweitzer, M., et al. 2006, ApJ, 649, 79
http://arxiv.org/abs/astro-ph/0508420
– 10 –
Shemmer, O., Netzer, H., Maiolino, R., Oliva, E., Croom, S., Corbett, E., di Fabrizio, L.
2004, ApJ, 614, 557
Siebenmorgen, R., Haas, M., Krügel, E., Schulz, B. 2005, A&A, 436, L5
Solomon, P., Vanden Bout, P., Carilli, C., Guélin, M. 2003, Nature, 426, 636
Solomon, P.M., Vanden Bout, P.A. 2005, ARA&A, 43, 677
Springel, V., Di Matteo, T., Hernquist, L. 2005, MNRAS, 361, 776
Sturm, E., Lutz, D., Tran, D., Feuchtgruber, H., Genzel, R., Kunze, D., Moorwood, A.F.M.,
Thornley, M.D. 2000, A&A, 358, 481
Sturm, E., Hasinger, G., Lehmann, I., Mainieri, V., Genzel, R., Lehnert, M.D., Tacconi,
L.J., 2006, ApJ, 642, 81
Tacconi, L.J., Genzel, R., Blietz, M., Cameron, M., Harris, A.I., Madden, S. 1994, ApJ, 416,
Teplitz, H.I., et al. 2006, ApJ, 638, L1
Valiante, E., Lutz, D., Sturm, E., Genzel, R., Tacconi, L.J., Lehnert, M., Baker, A.J. 2007,
ApJ, in press (astro-ph/0701816)
Venturini, S., Solomon, P.M. 2003, ApJ, 590, 740
Voit, G.M. 1992, MNRAS, 258, 841
Weiß, A., Henkel, C., Downes, D., Walter, F. 2003, A&A, 409, L41
This preprint was prepared with the AAS LATEX macros v5.2.
http://arxiv.org/abs/astro-ph/0701816
– 11 –
Table 1. Cloverleaf properties
Quantity Value Reference
Redshift z 2.55784 Weiß et al. (2003)
Amplification µL 11 Venturini & Solomon (2003)
F(PAH 6.2µm) 1.5× 10−21Wcm−2 this work
F(PAH 7.7µm) 6.1× 10−21Wcm−2 this work
L(PAH 7.7µm)a 7.6× 1010L⊙ this work
L(40-120µm)a 5.4× 1012L⊙ Weiß et al. (2003)
M(H2)
a 3.0× 1010M⊙ Weiß et al. (2003)
LBol(QSO)
∼ 7× 1013L⊙ this work, 10× νLν(6µm)
aCorrected for lensing amplification 11 and to our adopted cosmology Ωm = 0.3, ΩΛ = 0.7
and H0 = 70 kms
−1 Mpc−1 (DL=20.96 Gpc).
– 12 –
Fig. 1.— Infrared to radio spectral energy distribution for the Cloverleaf QSO. The
IRS spectrum (continuous line) is supplemented by photometric data from the literature
(Barvainis et al. 1995; Alloin et al. 1997; Barvainis & Lonsdale 1997; Hughes et al. 1997;
Benford 1999; Rowan-Robinson 2000; Solomon et al. 2003; Weiß et al. 2003). The ISOCAM-
CVF spectrum of Aussel et al. (1998) is indicated by the short dotted line. The ISO 12µm
flux appears too high while the other mid-infrared data are consistent within plausible cali-
bration uncertainties.
– 13 –
Fig. 2.— IRS spectrum of the Cloverleaf QSO. The PAH emission features as well as the
expected locations of strong spectral lines in this wavelength range are marked. The dotted
line shows the spectrum after the subtraction of a PAH template (spectrum of M82, see also
bottom of figure), redshifted and scaled to the measured strength of the Cloverleaf 6.2µm
PAH feature. Note that the noise in IRS low resolution spectra increases strongly from
∼33µm towards the long wavelength end.
– 14 –
Fig. 3.— Relation of 7.7µm PAH luminosity and rest frame FIR luminosity for the Cloverleaf
and for local PG QSOs and starbursting ULIRGs from Schweitzer et al. (2006).
	Introduction
	Observations and Results
	Intense star formation in the host of the Cloverleaf QSO
ABSTRACT
  We report the first detection of the 6.2micron and 7.7micron infrared `PAH'
emission features in the spectrum of a high redshift QSO, from the Spitzer-IRS
spectrum of the Cloverleaf lensed QSO (H1413+117, z~2.56). The ratio of PAH
features and rest frame far-infrared emission is the same as in lower
luminosity star forming ultraluminous infrared galaxies and in local PG QSOs,
supporting a predominantly starburst nature of the Cloverleaf's huge
far-infrared luminosity (5.4E12 Lsun, corrected for lensing). The Cloverleaf's
period of dominant QSO activity (Lbol ~ 7E13 Lsun) is coincident with an
intense (star formation rate ~1000 Msun/yr) and short (gas exhaustion time
~3E7yr) star forming event.

<|endoftext|><|startoftext|>
Causal dissipative hydrodynamics for QGP fluid in 2+1 dimensions
A. K. Chaudhuri∗
Variable Energy Cyclotron Centre, 1/AF, Bidhan Nagar, Kolkata 700 064, India
(Dated: February 21, 2013)
In 2nd order causal dissipative theory, space-time evolution of QGP fluid is studied in 2+1 di-
mensions. Relaxation equations for shear stress tensors are solved simultaneously with the energy-
momentum conservation equations. Comparison of evolution of ideal and viscous QGP fluid, ini-
tialized under the same conditions, e.g. same equilibration time, energy density and velocity profile,
indicate that in a viscous dynamics, energy density or temperature of the fluid evolve slowly, than
in an ideal fluid. Cooling gets slower as viscosity increases. Transverse expansion also increases in
a viscous dynamics. For the first time we have also studied elliptic flow of ’quarks’ in causal viscous
dynamics. It is shown that elliptic flow of quarks saturates due to non-equilibrium correction to
equilibrium distribution function, and can not be mimicked by an ideal hydrodynamics.
PACS numbers: 47.75.+f, 25.75.-q, 25.75.Ld
I. INTRODUCTION
One of the most important discoveries in Relativistic
Heavy ion collider (RHIC) at Brokhaven National Lab-
oratory is the large elliptic flow in non-central Au+Au
collisions [1, 2, 3, 4] . Elliptic flow measures the momen-
tum anisotropy of produced particles and is quantified by
the 2nd harmonic of the azimuthal distribution,
v2(pT ) =< cos(2φ) >=
dyd2pT
cos(2φ)dφ
dyd2pT
(1.1)
Elliptic flow is naturally explained in hydrodynamics.
Hydrodynamic pressure is built up from rescattering of
secondaries, and pressure gradients drive the subsequent
collective motion. In non-central Au+Au collisions, ini-
tially, the reaction zone is asymmetric (almond shaped).
The pressure gradient is large in one direction and small
in the other. The asymmetric pressure gradients gener-
ates the elliptic flow. Naturally, in a central collision,
reaction zone is symmetric and elliptic flow vanishes.
Observed elliptic flow then give the strongest indication
that in non-central Au+Au collisions, a collective QCD
matter is produced. Whether the formed matter can be
identified as the much sought after Quark-Gluon Plasma
(QGP) as predicted in Lattice QCD simulations [5] is
presently debatable.
Ideal hydrodynamics has been partly successful in ex-
plaining the observed elliptic flow, quantitatively [6]. El-
liptic flow of identified particles, up to pT ∼1.5 GeV are
well reproduced in ideal hydrodynamics. Ideal hydrody-
namics also explains the transverse momentum spectra
of identified particles (up to pT ∼ 1.5 GeV). Success of
ideal hydrodynamics in explaining bulk of the data [6],
together with the string theory motivated lower limit of
∗E-mail:akc@veccal.ernet.in
shear viscosity η/s ≥ 1/4π [7, 8] has led to a paradigm
that in Au+Au collisions, a nearly perfect fluid is created.
However, the paradigm of ”perfect fluid” produced in
Au+Au collisions at RHIC need to be clarified. As in-
dicated above, the ideal hydrodynamics is only partially
successful and in a limited pT range (pT ≤1.5 GeV) [9].
The transverse momentum spectra of identified particles
also starts to deviate form ideal fluid dynamics prediction
beyond pT ≈ 1.5 GeV. Experimentally determined HBT
radii are not reproduced in the ideal fluid dynamic mod-
els, the famous ”HBT puzzle” [10]. It also do not repro-
duce the experimental trend that elliptic flow saturates
at large transverse momentum. These shortcomings of
ideal fluid dynamics indicate greater importance of dis-
sipative effects in the pT ranges greater than 1.5 GeV
or in more peripheral collisions. Indeed, ideal fluid is a
concept, never realized in nature. As suggested in string
theory motivated models [7, 8], QGP viscosity could be
small, η/s ≥ 1/4π, nevertheless it is non-zero. It is im-
portant to study the effect of viscosity, even if small, on
space-time evolution of QGP fluid and quantify its effect.
This requires a numerical implementation of relativistic
dissipative fluid dynamics. Furthermore, if QGP fluid
is formed in heavy ion collisions, it has to be charac-
terized by measuring its transport coefficients, e.g. heat
conductivity, bulk and shear viscosity. Theoretically, it
is possible to obtain those transport coefficients in a ki-
netic theory model. However, in the present status of
theory, the goal can not be achieved immediately, even
more so for a strongly interacting QGP (sQGP). Alter-
natively, one can use the experimental data to obtain
a ”phenomenological” limit of transport coefficients of
sQGP. It will also require a numerical implementation of
relativistic dissipative fluid dynamics. There is another
incentive to study dissipative hydrodynamics. Ideal hy-
drodynamics depends on the assumption of local equilib-
rium. Before local equilibrium is attained, the system has
to pass through a non-equilibrium stage, where (if non-
equilibrium effects are small) dissipative hydrodynamics
may be applicable. Indeed, we can explore early times of
fluid evolution better in a dissipative hydrodynamics.
http://arxiv.org/abs/0704.0134v2
mailto:akc@veccal.ernet.in
Theory of dissipative relativistic fluid has been formu-
lated quite early. The original dissipative relativistic fluid
equations were given by Eckart [11] and Landau and Lif-
shitz [12]. They are called 1st order theories. Formally,
relativistic dissipative hydrodynamics are obtained from
an expansion of entropy 4-current, in terms of dissipative
fluxes. In 1st order theories, entropy 4-current contains
terms linear in dissipative quantities. 1st order theory
of dissipative hydrodynamics suffer from the problem of
causality violation. Signal can travel faster than light.
Causality violation is unwarranted in any theory, even
more in a relativistic theory. The problem of causal-
ity violation is removed in the Israel-Stewart’s 2nd order
theory of dissipative fluid [13]. In 2nd order theory, ex-
pansion of entropy 4-current contains terms 2nd order in
dissipative fluxes. However, these leads to complications
that dissipative fluxes are no longer function of the state
variables only. They become dynamic. The space of ther-
modynamic variables has to be extended to include the
dissipative fluxes (e.g. heat conductivity, bulk and shear
viscosity).
Even though 2nd order theory was formulated some
30 years back, significant progress towards its numer-
ical implementation has only been made very recently
[14, 15, 16, 17, 18, 19, 20, 21, 22, 23]. At the Cyclotron
Centre, Kolkata, we have developed a code ”AZHYDRO-
KOLKATA” to simulate the hydrodynamic evolution of
dissipative QGP fluid. Presently only dissipative ef-
fect included is the shear viscosity. Some results of
AZHYDRO-KOLKATA, for first order dissipative hydro-
dynamics have been published earlier [19, 20, 21]. In the
present paper, for the first time, we will present some
results for 2nd order dissipative hydrodynamics in 2+1
dimensions. In the present paper, we will consider effect
of dissipation in the QGP phase only. Effect of phase
transition will be studied in a later publication.
The paper is organized as follows: In section II we
briefly review relativistic dissipative fluid dynamics. In
section III we derive the relevant equations in 2+1 di-
mension (assuming boost-invariance). Required inputs
e.g. the equation of state, viscosity coefficient and initial
conditions are discussed in section IV. Simulation re-
sults from AZHYDRO-KOLKATA are shown in section
V. In section VII we compare the transverse momentum
spectra and elliptic flow of quarks in ideal and viscous
dynamics. The concluding section IX summarizes our
results.
II. DISSIPATIVE FLUID DYNAMICS
In this section, I briefly discuss the phenomenological
theory of dissipative hydrodynamics. More detailed ex-
position can be found in [13].
A simple fluid, in an arbitrary state, is fully speci-
fied by primary variables: particle current (Nµ), energy-
momentum tensor (T µν) and entropy current (Sµ) and
a number of additional (unknown) variables. Primary
variables satisfies the conservation laws;
µ = 0, (2.1)
µν = 0, (2.2)
and the 2nd law of thermodynamics,
µ ≥ 0. (2.3)
In relativistic fluid dynamics, one defines a time-like
hydrodynamic 4-velocity, uµ (normalized as u2 = 1).
One also define a projector, ∆µν = gµν − uµuν , orthog-
onal to the 4-velocity (∆µνuν = 0). In equilibrium, an
unique 4-velocity (uµ) exists such that the particle den-
sity (n), energy density (ε) and the entropy density (s)
can be obtained from,
Nµeq = nuµ (2.4)
T µνeq = εu
µuν − p∆µν (2.5)
Sµeq = suµ (2.6)
An equilibrium state is assumed to be fully specified
by 5-parameters, (n, ε, uµ) or equivalently by the thermal
potential, α = µ/T (µ being the chemical potential) and
inverse 4-temperature, βµ = uµ/T . Given a equation of
state, s = s(ε, n), pressure p can be obtained from the
generalized thermodynamic relation,
Sµeq = pβ
µ − αNµeq + βλT λµeq (2.7)
Using the Gibbs-Duhem relation, d(pβµ) = Nµeqdα −
T λµeq dβλ, following relations can be established on the
equilibrium hyper-surface Σeq(α, β
dSµeq = −αdNµeq + βλdT λµeq (2.8)
In a non-equilibrium system, no 4-velocity can be
found such that Eqs.2.4,2.5,2.6 remain valid. Tensor de-
composition leads to additional terms,
Nµ = Nµeq + δN
µ = nuµ + V µ (2.9)
T µν = T µνeq + δT
= [εuµuν − p∆µν ] + Π∆µν + πµν
+(Wµuν +W νuµ) (2.10)
Sµ = Sµeq + δS
µ = suµ + Φµ (2.11)
The new terms describe a net flow of charge V µ =
∆µνNν , heat flow, W
µ = (ε + p)/nV µ + qµ (where qµ
is the heat flow vector), and entropy flow Φµ. Π =
µν − p is the bulk viscous pressure and πµν =
(∆µσ∆ντ +∆νσ∆µτ − 1
∆µν∆στ ]Tστ is the shear stress
tensor. Hydrodynamic 4-velocity can be chosen to elimi-
nate either V µ (the Eckart frame, uµ is parallel to particle
flow) or the heat flow qµ (the Landau frame, uµ is par-
allel to energy flow). In relativistic heavy ion collisions,
central rapidity region is nearly baryon free and Lan-
dau’s frame is more appropriate than the Eckart’s frame.
Dissipative flows are transverse to uµ and additionally,
shear stress tensor is traceless. Thus a non-equilibrium
state require 1+3+5=9 additional quantities, the dissi-
pative flows Π, qµ (or V µ) and πµν . In kinetic theory,
Nµ and T µν are the 1st and 2nd moment of the distri-
bution function. Unless the function is known a-priori,
two moments do not furnish enough information to enu-
merate the microscopic states required to determine Sµ,
and in an arbitrary non-equilibrium state, no relation
exists between, Nν , T µν and Sµ. Only in a state, close
to a equilibrium one, such a relation can be established.
Assuming that the equilibrium relation Eq.2.8 remains
valid in a ”near equilibrium state” also, the entropy cur-
rent can be generalized as,
Sµ = Sµeq + dS
µ = pβµ − αNµ + βλT λµ +Qµ (2.12)
where Qµ is an undetermined quantity in 2nd order in
deviations, δNµ = Nµ − Nµeq and δT µν = T µν − T µνeq .
Detail form of Qµ is constrained by the 2nd law ∂µS
0. With the help of conservation laws and Gibbs-Duhem
relation, entropy production rate can be written as,
µ = −δNµ∂µα+ δT µν∂µβν + ∂µQµ (2.13)
Choice of Qµ leads to 1st order or 2nd order theories
of dissipative hydrodynamics. In 1st order theories the
simplest choice is made, Qµ = 0, entropy current con-
tains terms up to 1st order in deviations, δNµ and δT µν.
Entropy production rate can be written as,
µ = ΠX − qµXµ + πµνXµν (2.14)
where, X = −∇.u; Xµ = ∇
− uν∂νuµ and Xµν =
∇<µuν>.
The 2nd law, ∂µS
µ ≥ 0 can be satisfied by postulat-
ing a linear relation between the dissipative flows and
thermodynamic forces,
Π = −ζθ, (2.15)
qµ = −λ nT
∇µ(µ/T ), (2.16)
πµν = 2η∇<µuν> (2.17)
where ζ, λ and η are the positive transport coefficients,
bulk viscosity, heat conductivity and shear viscosity re-
spectively.
In 1st order theories, causality is violated. If, in a given
fluid cell, at a certain time, thermodynamic forces vanish,
corresponding dissipative fluxes also vanish instantly. Vi-
olation of causality is unwanted in any theory, even more
so in relativistic theory. Causality violation of dissipative
hydrodynamics is corrected in 2nd order theories [13]. In
2nd order theories, entropy current contain terms up to
2nd order in the deviations, Qµ 6= 0. The most general
Qµ containing terms up to 2nd order in deviations can
be written as,
Qµ = −(β0Π2−β1qνqν+β2πνλπνλ)
(2.18)
As before, one can cast the entropy production rate
(T∂µS
µ) in the form of Eq.2.14. Neglecting the terms
involving dissipative flows with gradients of equilib-
rium thermodynamic quantities (both are assumed to be
small) and demanding that a linear relation exists be-
tween the dissipative flows and thermodynamic forces,
following relaxation equations for the dissipative flows
can be obtained,
Π = −ζ(θ + β0DΠ) (2.19)
qµ = −λ
) − β1Dqµ
(2.20)
πµν = 2η
∇<µuν> − β2Dπµν
, (2.21)
where D = uµ∂µis the convective time derivative. Unlike
in the 1st order theories, in 2nd order theories, dynamical
equations control the dissipative flows. Even if thermo-
dynamic forces vanish, dissipative flows do not vanish
instantly.
Before we proceed further, it may be mentioned that
the parameters, α and βλ are not connected to the actual
state (Nµ, T µν). The pressure p in Eq.2.12 is also not
the ”actual” thermodynamics pressure, i.e. not the work
done in an isentropic expansion. Chemical potential α
and 4-inverse temperature βλ has meaning only for the
equilibrium state. Their meaning need not be extended
to non-equilibrium states also. However, it is possible to
fit a fictitious ”local equilibrium” state, point by point,
such that pressure p in Eq.2.12 can be identified with
the thermodynamic pressure, at least up to 1st order.
The conditions of fit fixes the underlying non-equilibrium
phase-space distribution.
III. (2+1)-DIMENSIONAL VISCOUS
HYDRODYNAMICS WITH LONGITUDINAL
BOOST INVARIANCE
Complete dissipative hydrodynamics is a numerically
challenging problem. It requires simultaneous solution of
14 partial differential equations (5 conservation equations
and 9 relaxation equations for dissipative flows). We re-
duce the problem to solution of 6 partial differential equa-
tions (3 conservation equations and 3 relaxation equa-
tions). In the following, we will study boost-invariant
evolution of baryon free QGP fluid, including the dissi-
pative effect due to shear viscosity only. Shear viscosity
is the most important dissipative effect. For example, in
a baryon free QGP, heat conduction is zero and we can
disregard Eq.2.20. Bulk viscosity is also zero for the QGP
fluid (point particles) and Eq.2.19 can also be neglected.
Shear pressure tensor has 5 independent components but
the assumption of boost invariance reduces the number
of independent components to three. For a baryon free
fluid, we can also disregard the conservation equation
Eq.2.1. With the assumption of boost-invariance, energy-
momentum conservation equation ∂µT
µη = 0 become re-
dundant and only three energy-momentum conservation
equations are required to be solved.
-10 -5 0 5 10
X (fm)
-10 -5 0 5 10
_______ η/s=0.08
_______ η/s=0.135
_______ η/s=0
-10 -5 0 5 10
energy density in x-y plane, τ =2.6 fm.
FIG. 1: (color online). Constant energy density contours in
x-y plane at τ=2.6 fm. The black lines are for ideal fluid
(η/s=0). The red and blue lines are for viscous fluid with
ADS/CFT and perturbative estimate of viscosity, η/s=0.08
and 0.135.
Heavy ion collisions are best described in (τ, x, y, η)
coordinates, where τ =
t2 − z2 is the longitudinal
proper time and η = 1
ln t+z
is the space-time rapid-
ity. r⊥ = (x, y) are the usual cartisan coordinate in
the plane, transverse to the beam direction. Relevant
equations concerning this coordinate transformations are
given in the appendix A.
Explicit equations for energy-momentum conservation
in (τ ,x,y,η) coordinates are given in the appendix B. We
note that unlike in ideal fluid, in viscous fluid dynam-
ics, conservation equations (see Eqs.B1-B3) contain addi-
tional pressure gradients due to shear viscosity. Both T τx
and T τy components of energy-momentum tensor now
evolve under additional pressure gradients. The right-
most term of Eq.B3 also indicate that in viscous dynam-
ics, longitudinal pressure is effectively reduced (note that
the πηη component is negative). Since pressure can not
be negative, shear viscosity is limited by the condition,
p+ τ2πηη ≥ 0.
As evident from the Eqs.B1-B3, in boost-invariant dis-
sipative hydrodynamics, with shear viscosity taken into
account, fluid evolution depends only on seven compo-
nents of the shear stress tensors. They are πττ , πτx,
πτy, πxx, πyy, πxy and πηη. However, all the seven com-
ponents are not independent. Tracelessness, transver-
sality to uµ and the assumption of boost-invariance re-
duces the independent components to three. Presently,
we choose πxx πyy and πxy as the independent compo-
nents. Relaxation equations for the independent compo-
nents are given in the appendix C (see Eqs.C4-C6). They
are solved simultaneously with the three energy momen-
tum conservation equations Eqs.B1-B3, with inputs as
discussed below.
-10 -5 0 5 10
X (fm)
-10 -5 0 5 10
_______ η/s=0.08
_______ η/s=0.135
_______ η/s=0
-10 -5 0 5 10
energy density in x-y plane, τ =8.6 fm
FIG. 2: (color online). same as Fig.1 but at time τ=8.6 fm.
IV. EQUATION OF STATE, VISCOSITY
COEFFICIENT AND INITIAL CONDITIONS
A. Equation of state
One of the most important inputs of a hydrodynamic
model is the equation of state. Through this input, the
macroscopic hydrodynamic models make contact with
the microscopic world. In the present demonstrative cal-
culation we will show results for the QGP phase only.
In the QGP phase, we use the simple equation of state,
p = 1
ε, with energy density given as,
gqgpT
4 (4.1)
where gqgp = ggluon+
gquark is the degeneracy factor for
QGP. ggluon = 2(helicity) × 8(color) is the degeneracy
factor for gluons and gquark = 2(spin)× 3(color)× 2(q+
q̄) ×Nf is the degeneracy factor for Nf flavored quarks.
For Nf ≈ 2.5, the degeneracy factor is gqgp = 42.25
B. Shear viscosity coefficient
Shear viscosity coefficient (η) of QGP or sQGP is quite
uncertain. In a strongly coupled QGP, shear viscosity
can not be computed. Recently, using the ADS/CFT
correspondence [7, 8] shear viscosity of a strongly coupled
gauze theory, N=4 SUSY YM, has been evaluated, η =
N2c T
3 and the entropy is given by s = π
N2c T
3. Thus
in the strongly coupled field theory,
ADS/CFT
≈ 0.08, (4.2)
Shear viscosity is quite uncertain in perturbative QCD
also. At high temperature, shear viscosity, in leading log,
can be written as [24, 25],
η = κ
g4 ln g−1
, (4.3)
where g is the strong coupling constant. The leading
log shear viscosity coefficient κ depend on the number of
fermion flavors (Nf ). For example, for two flavored QGP,
κ = 86.47 and κ = 106.7 for a three flavored QGP. With
entropy density of QGP, s = π
gqgpT
3. For two flavored
QGP and αs ≈0.5, the ratio of viscosity over the entropy,
in the perturbative regime is estimated as,
≈ 0.135, (4.4)
For lower αs , perturbative estimation of η/s could be
even higher.
Shear viscosity can also be expressed in terms of sound
attenuation length, Γs, defined as,
(4.5)
Γs is equivalent to mean free path and for a valid hy-
drodynamic description Γs/τ << 1, i.e. mean free path
is much less than the system size. Initial conditions of
the fluid must be chosen carefully such that the validity
condition Γs/τ << 1 remains valid initially as well as at
later time also. In the present work, we have treated vis-
cosity as a parameter. To explore the effect of viscosity,
we have used both the ADS/CFT estimate η/s=0.08 and
perturbative estimate η/s=0.135. We have also run the
code with a higher value of viscosity η/s=0.2.
C. Initial conditions
Solution of Eqs.B1-B3 require initial conditions, the
initial time τi, the transverse distribution of energy den-
sity ε(x, y) and the velocities vx(x, y) and vy(x, y). Fol-
lowing [6], initial transverse energy density is parame-
terized geometrically. At an impact parameter ~b, trans-
verse distribution of wounded nucleons NWN (x, y,~b) and
-10 -5 0 5 10
X axis Title
-0.78
-0.56
-0.33
-0.11
-10 -5 0 5 10
contour plot of Vx at τ=8.6 fm
X axis Title
-10 -5 0 5 10
X axis Title
FIG. 3: (color online). contours of constant vx in x-y plane
at τ=8.6 fm. The black lines are for ideal fluid (η/s=0). The
red and blue lines are for viscous fluid with ADS/CFT and
perturbative estimate of viscosity, η/s=0.08 and 0.135.
-10 -5 0 5 10
X axis Title
-0.78
-0.56
-0.33
-0.11
-10 -5 0 5 10
contour plot of Vy at τ=8.6 fm
X axis Title
-10 -5 0 5 10
X axis Title
FIG. 4: (color online). contours of constant vy in x-y plane
at τ=8.6 fm. The black lines are for ideal fluid (η/s=0). The
red and blue lines are for viscous fluid with ADS/CFT and
perturbative estimate of viscosity, η/s=0.08 and 0.135.
of binary NN collisions NBC(x, y,~b) to are calculated in
a Glauber model. A collision at impact parameter ~b is
assumed to contain 25% hard scattering (proportional
to number of binary collisions) and 75% soft scattering
(proportional to number of wounded nucleons). Trans-
verse energy density profile at impact parameter~b is then
obtained as,
ε(x, y,~b) = ε0(0.75×NWN (x, y,~b) + 0.25×NBC(x, y,~b))
(4.6)
with central energy density ε0=30GeV/fm
−3. The equi-
libration time is chosen as τi=0.6 fm [6]. The initial ve-
locities vx and vy are assumed to be zero initially.
______ η/s=0.08
______ η/s=0
______ η/s=0
Temperature in τ-x plane. y=0.
X (fm)0 2 4 6 8 10
FIG. 5: (color online). Constant temperature contour in x −
τ plane, for fixed y=0. The black, red and blue lines are
for ideal, viscous fluid with η/s=0.08 and viscous fluid with
η/s=0.135.
In dissipative hydrodynamics, one requires initial con-
ditions for the viscous pressures also. Due to longitudinal
boost invariance of the problem, we assume that viscous
pressures have attained their boost-invariant values at
the time of equilibration. Boost invariant values of the
three independent shear stress-tensors can be easily ob-
tained from Eqs.C4-C6, σxx = σyy = θ = 1
and σxy = 0
( at the initial time τi, u
µ = (1, 0, 0, 0), Duµ = 0. The
initial distribution of shear pressure tensors are then ob-
tained as,
πxx(x, y,~b) = 2ησxx = 2η/τi (4.7)
πyy(x, y,~b) = 2ησyy = 2η/τi (4.8)
πxy(x, y,~b) = 2ησxy = 0 (4.9)
As stated earlier, the viscous coefficient η is obtained
using the relation, η/s = const, const=0.08, 0.135 or 0.2.
For these values of shear viscosity, the validity condition
Γs/τ << 1 is satisfied initially. The validity condition is
better satisfied at later time.
V. RESULTS
VI. STABILITY OF NUMERICAL SOLUTIONS
A. Evolution of the viscous QGP fluid
The energy-momentum conservation equations B1-B3,
and the relaxation equations C2-C4 are solved simultane-
ously using the code, AZHYDRO-KOLKATA, developed
at the Cyclotron Centre, Kolkata. As mentioned earlier,
we have solved the equations in the QGP phase only and
did not consider any phase transition. In the following
we will show the results for central Au+Au collisions (im-
pact parameter b = 0 fm). To understand the effect of
shear viscosity, with the same initial conditions, we have
solved the energy-momentum conservation equations for
ideal fluid and viscous fluid. As mentioned earlier, we
have considered two values of viscosity, the ADS/CFT
motivated value η/s=0.08 and the perturbative estimate,
η/s=0.135.
0 2 4 6 8 10
X (fm)
_____ η/s=0.08
_____ η/s=0.135
_____ η/s=0
Temperature   in τ-x plane. y=5 fm.
FIG. 6: (color online). same as fig.5 but at y=5.0
In Fig.1, we have shown the contours of constant en-
ergy density in x-y plane, after an evolution of 2.6 fm.
The black lines are for ideal fluid evolution. The red and
blue lines are for viscous fluid with ADS/CFT (η/s=0.08)
and perturbative (η/s=0.135) estimate of viscosity. Con-
stant energy density contours, as depicted in Fig.1, indi-
cate that with viscosity fluid cools slowly. Cooling gets
slower as viscosity increases. Thus at any point in the x-y
plane, energy density of viscous fluid is higher than that
of an ideal fluid. At later time also, compared to an ideal
fluid, viscous fluid evolve slowly. In Fig.2, contours of
constant energy density at time τ=8.6 fm is shown. Here
also we find than at any point energy density of viscous
fluid is higher than its ideal counter part. The result is
in accordance with our expectation. For dissipative fluid,
equation of motion can be written as,
Dε = −(ε+ p)∇µuµ + πµν∇<µuν> (6.1)
Due to viscosity, evolution of energy density (or tem-
perature) is slowed down.
In Fig.3 and 4, we have shown the contour plot of the
fluid velocity, vx and vy, after evolution of 8.6 fm. As
before the black lines are for the ideal fluid evolution.
The red and blue lines are for viscous fluid with η/s=0.08
and 0.135 respectively. Fluid velocities in viscous and
ideal fluid differ very little. Even at late time, as shown
in Fig.3 and 3, we find that for η/s=0.08-0.135, x and y
component of the fluid velocity show marginal difference.
However, there is an indication that in a viscous fluid,
velocity grow faster than in ideal fluid. But as mentioned
earlier, the difference is marginal.
η/s=0.08
η/s=0.135
τ (fm)
0 2 4 6 8 10
FIG. 7: (color online). In the upper panel, temporal evolu-
tion of the shear pressure tensor πxx at the fluid cell x=y=0
is shown. In the lower panel, evolution of πxy at the fluid
cell x=y=5 fm is shown. The black and red lines are for
ADS/CFT motivated viscosity η/s=0.08 and perturbative es-
timate η/s=0.135 respectively.
As seen in Fig.1-2, in viscous dynamics, QGP fluid
evolves slowly. Thus life-time of the QGP phase is en-
hanced in viscous dynamics. To obtain an idea about the
enhanced life-time, in Fig.5, we have shown the constant
temperature contours in τ − x plane , at a fixed value
of y=0 fm. As seen in Fig.5, temperature evolves slowly
in a viscous fluid and life-time of the QGP phase is ex-
tended. For small viscosity η/s=0.08-0.135, the increase
is not large. At the center of the fluid, for η/s=0.135,
QGP life-time is increased approximately by 5% only. It
is even less for the ADS/CFT estimate of viscosity. How-
ever, enhancement of QGP life-time depends on the fluid
cell position. It could be more. In Fig.6, constant tem-
perature contours at y=5 fm is shown. For η/s=0.135, at
x=0,y=5 fm, the QGP life-time is enhanced by ∼ 10%.
We conclude that in a viscous dynamics, with moder-
ate viscosity η/s=0.08-0.135, QGP life-time could be en-
hanced by 5-10%. Enhanced lifetime of QGP in a viscous
fluid can have significant effect on observables produced
early in the collisions e.g. direct photon production or in
J/ψ suppression.
1.01.21.4
0 2 4 6 8 10
0.030
0.180.21
0 2 4 6 8 10
(d) π
 at τ=2.6 fm(c) π
 at τ=0.6 fm
(b) π
 at τ=2.6 fm(a) π
 at τ=0.6 fm
1.01.21.4
0 2 4 6 8 10
0.030
0.150.18
0.210.24
0 2 4 6 8 10
X (fm)
FIG. 8: (color online). In panel (a) and (b), contours of
constant pressure tensor πxx at initial time τi=0.6 fm and at
time τ=2.6 fm is shown. In panel (c) and (d) same results for
shear pressure tensor πyy is shown.
B. Evolution of shear pressure tensors
We have assumed that initially the shear pressure ten-
sors πxx, πyy and πxy attained their longitudinal boost-
invariant values. As the fluid evolve, pressure tensors also
evolve. Here we investigate the evolution of shear pres-
sure tensors with time. In the top panel of Fig.7 evolu-
tion of shear pressure tensor πxx at the fluid cell position
x=y=0 is shown. The black line is for the ADS/CFT
motivated viscosity, η/s=0.08 and the red line is for the
perturbative estimate of viscosity η/s=0.135. Just after
the start of the evolution the shear pressure tensor πxx
increases, but for a short duration and then steadily de-
creases with time. By 4 fm of evolution, πxx at the center
of the fluid reduces to negligibly small values. Identical
behavior is seen for the shear pressure tensor πyy. In
the bottom panel of Fig.7 we have shown the evolution
of the third independent shear pressure tensor πxy. Ini-
tially πxy is zero. As the fluid evolve, it grow in the
negative direction. We find that at the centre of the fluid
(x=y=0), it never grows. In Fig.7, temporal evolution
of πxy at the fluid cell position x = y = 5fm is shown.
From the initial zero value, πxy rapidly increases in the
negative direction. It reaches its maximum around τ ≈1
fm and then decreases again. We also note that πxy never
grows to large values. Compared to πxx or πyy stress ten-
sor πxy is negligible. The results indicate that in a QGP
fluid, viscous effect persist for a short duration (3-4 fm)
only. At late time the fluid evolve essentially as an ideal
fluid. The result is understandable. Shear viscosity de-
pend strongly on temperature (η ∝ T 3). As the fluid
cools, effect of viscosity decreases rapidly.
τ (fm)
0 2 4 6 8 10
η/s=0.08
η/s=0.135
FIG. 9: Evolution of average entropy with time, for two val-
ues of viscosity, the ADS/CFT motivated viscosity η/s=0.08
and perturbative estimate η/s=0.135 are shown.
To show the spatial distribution of the stress tensors,
in Fig.8, πxx and πyy at initial time τi=0.6 fm and after
an evolution of τ = 2.6 fm are shown. As shown earlier,
πxx and also πyy rapidly decreases with time. By 2 fm
of evolution they are reduced by approximately by a fac-
tor 6. It is also interesting to note that the initial x-y
symmetric distribution of πxx and πyy quickly evolves to
asymmetric distribution. With time πxx evolves faster in
the x-direction than in y-direction. Similarly, πyy evolve
faster in the y-direction than in the x-direction. For cen-
tral collisions the asymmetric evolution of πxx and πyy
counter balance each other. As shown in Fig.1 and 2,
the contour plots of energy density do not show any in-
dication of asymmetry even at late time. However, the
asymmetric pressure tensors can have important effects
on elliptic flow of observables produced early in the col-
lisions, say in elliptic flow of direct photons.
C. Entropy generation
In a viscous fluid dynamics, entropy is generated. We
can easily calculate the entropy generated during the evo-
lution,
πµνπµν
[(πττ )2 + (πxx)2 + (πyy)2 + (τ2πηη)2
−2(πτx)2 − 2(πτy)2 + 2(πxy)2] (6.2)
Evolution of spatially averaged entropy is shown in
Fig.9, for the two values of viscosity coefficients η/s=0.08
and 0.135. As expected, entropy generation is more if vis-
cosity is more. For both the values of viscosity, we find
that entropy generation saturates after ≈ 3 fm of evolu-
tion. It is expected also. As shown previously, viscous
fluxes reduces to very small values after τ=3 fm. Natu-
rally, entropy generation is negligible thereafter.
0.190.22
 0 2 4 6 8 10
10 Temperature in τ-x plane, y= 5m.
_____ viscous fluid (2nd order)
_____ ideal fluid
_____ viscous fluid (1st order)
X (fm)
FIG. 10: (color online) constant temperature contours in
x − τ plane at y=5 fm. The black lines are for ideal fluid.
The red and blue lines are for viscous fluid in 1st order and
in 2nd order theory respectively. η/s=0.135.
D. 1st order theory vs. 2nd order theory
As mentioned earlier, 1st order theory of dissipative
hydrodynamics is acausal, signal can travel faster than
light. This is corrected in 2nd order theory, but we
have to pay the price, relaxation equations for dissipa-
tive fluxes are required to be solved. It is interesting to
τ (fm)
0 2 4 6 8 10 12
1st order theory
2nd order theory
η/s=0.135
FIG. 11: Evolution of average entropy production in a 1st
order (solid line) and 2nd order (dashed line) theory. 2nd
order theory generate more entropy.
compare the difference we can expect in a first order the-
ory and in a 2nd order theory of dissipation. In Fig. 10,
we have shown the contours of constant temperature in
x−τ , for a fixed y = 5fm. The black lines are for an ideal
fluid. The red lines are for a viscous fluid treated in the
1st order theory. The blue lines are for viscous fluid in
2nd order theory. In 2nd order theory fluid evolve more
slowly than in a first order theory. Entropy generation is
also more in a 2nd order theory. In Fig.11, average en-
tropy evolution with proper time is shown, both for the
1st order theory (the solid line) and the 2nd order theory.
In 2nd order theory, approximately 80% more entropy is
generated.
VII. TRANSVERSE MOMENTUM AND
ELLIPTIC FLOW OF QUARKS
Presently we can not compare predictions from viscous
hydrodynamics with experimental data. Hadrons are not
included in the model. The initial QGP fluid evolve and
cools but remain in the QGP phase, it did not undergo
a phase transition to hadronic gas. However, from the
momentum distribution of quarks we can get some idea
about the viscous effect on particle production. Viscos-
ity generates entropy, which will be reflected in enhanced
multiplicity. We use the standard Cooper-Frey prescrip-
tion to obtained the transverse momentum distribution
of quarks. In Cooper-Frey prescription, particle distribu-
tion is obtained by convoluting the one body distribution
function over the freeze-out surface,
µf(x, p) (7.1)
where dΣµ is the freeze-out hyper-surface and f(x, p) is
the one-body distribution function. Now in a viscous dy-
namics, the fluid is not in equilibrium and f(x, p) can not
be approximated by the equilibrium distribution func-
tion,
f (0)(x, p) =
exp[β(uµpµ − µ)] ± 1
, (7.2)
with inverse temperature β = 1/T and chemical poten-
tial µ. In a highly non-equilibrium system, distribution
function f(x, p) is unknown. If the system is slightly
off-equilibrium, then it is possible to calculate correction
to equilibrium distribution function due to (small) non-
equilibrium effects. Slightly off-equilibrium distribution
function can be approximated as,
f(x, p) = f (0)(x, p)[1 + φ(x, p)], (7.3)
φ(x, p) is the deviation from equilibrium distribution
function f (0). With shear viscosity as the only dissi-
pative forces, φ(x, p) can be locally approximated by a
quadratic function of 4-momentum,
φ(x, p) = εµνp
µpν . (7.4)
Without any loss of generality εµν can be written as
εµν =
2(ε+ p)T 2
πµν , (7.5)
completely specifying the non-equilibrium distribution
function. As expected, correction factor increases with
increasing viscosity. We also note that non-equilibrium
correction is more on large momentum particles. The
effect of viscosity is more on large momentum parti-
cles. The correction factor reduces if freeze-out occurs
at higher temperature.
With the corrected distribution function, we can cal-
culate the quark momentum spectra at freeze-out sur-
face Σµ. In appendix D, relevant equations are given.
The quark momentum distribution has two parts, (i)
dyd2pT
, obtained by convoluting the equilibrium distribu-
tion function over the freeze-out surface and (ii) dN
dyd2pT
obtained by convoluting the correction to the equilibrium
distribution function over the freeze-out surface. Since
the correction factor is obtained under the assumption
that non-equilibrium effects are small, φ(x, p) << 1, it
necessarily imply that, dN
dyd2pT
<< dN
dyd2pT
. The ratio,
dNneq
dNneq
dyd2pT
dyd2pT
, (7.6)
could at best be unity or less. If the ratio exceeds
unity, it will imply that non-equilibrium effects are large
 (GeV)
0 1 2 3 4 5
η/s=0.2 
η/s=0.135 
η/s=0.08 
Au+Au@b=6.5 fm
τi=0.6fm,Sini=110fm-3,TF=160MeV
FIG. 12: Ratio of quark spectra with non-equilibrium distri-
bution function to that with equilibrium distribution function.
and the distribution function f(x, p) can not be approx-
imated as in Eq.7.3. Using AZHYDRO-KOLKATA, we
have simulated a b=6.5 fm Au+Au collision. dN
dyd2pT
dNneq
dyd2pT
at freeze-out temperature TF =160 MeV are cal-
culated. The ratio dN
for η/s=0.08,0.135 and 0.2, are
shown in Fig.12. With ADS/CFT estimate of viscosity,
η/s=0.08, non-equilibrium correction to particle produc-
tion become comparable to equilibrium contribution only
beyond pT =5 GeV. However, with perturbative estimate,
η/s=0.135, non-equilibrium correction become compara-
ble to or exceeds the equilibrium contribution at pT ∼
4 GeV. pT range is further reduced for higher viscosity
η/s=0.2. Thus with perturbative estimate of viscosity
(η/s = 0.135 − 0.2), hydrodynamic description remain
valid upto transverse momentum pT ∼ 3.5-4 GeV.
In Fig.13, we have compared the transverse momen-
tum spectra of quarks in ideal hydrodynamics with that
in a viscous dynamics. In Fig.13, the dotted line is the
spectra obtained in ideal dynamics (η/s = 0). The pT
spectra in viscous dynamics are shown by black lines.
We have shown the spectra for three values of viscos-
ity η/s=0.08,0.135 and 0.2. Compared to ideal dynam-
ics, quarks yield in viscous dynamics increases. The in-
crease is more at large pT . For low values of viscosity
the increase is modest, a factor of 2 at pT =3 GeV. But
yield increase by a factor or 4(10) if viscosity increases
to η/s=0.135 (2). Please note that even though we have
shown pT spectra upto 5 GeV, for η/s=0.2 and 0.135,
hydrodynamic description fails beyond pT ∼ 3.5 and 4
We have also studied the effect of viscosity on quark
elliptic flow. Effect of viscosity is very prominent on
elliptic flow. In Fig.14, pT dependence of elliptic flow
of quarks, in a b=6.5 fm collision is shown. The black
line is v2 in ideal dynamics. In ideal dynamics, ellip-
pT (GeV)
0 1 2 3 4 5
η/s=0.0(ID.Fluid)
η/s=0.2 
η/s=0.135 
η/s=0.08 
Au+Au@b=6.5 fm
τi=0.6fm,Sini=110fm-3,TF=160MeV
FIG. 13: Qurak transverse momentum spectra at freeze-out
temperature of 160 MeV. The dotted line is the quarks spectra
in ideal hydrodynamics. The solid lines (top to bottom) are
in viscous dynamics with η/s=0.08,0.135 and 0.2.
tic flow continually increases with pT . It well known, in
contrast to experiments, where elliptic flow saturates at
large pT , in ideal hydrodynamics, elliptic flow continue
to increase with pT . Indeed, this is a major problem in
ideal hydrodynamics. The renewed the interest in dis-
sipative hydrodynamics is partly due to the inability of
ideal hydrodynamics to predict the trend of elliptic flow
in Au+Au collisions. In Fig.14, the blue lines are v2
in viscous dynamics with η/s=0.08,0.135 and 0.2 respec-
tively. In a viscous dynamics, pT dependence of v2 is
drastically changed. In contrast to ideal dynamics where
v2 continue to increase with pT , in viscous dynamics, v2
continue to increase only upto pT ∼ 1.5− 2GeV . There-
after v2 decreases. For perturbative estimate of viscos-
ity η/s=0.135 and beyond, v2 even become negative at
large pT . Veering about of v2 after pT ∼1.5-2 GeV is
due to viscous effect only or more explicitly due to the
non-equilibrium correction to the equilibrium distribu-
tion function. This is clearly manifested from the red
lines in Fig.14. The red lines are calculated ignoring the
non-equilibrium corrections to the equilibrium distribu-
tion function. If non-equilibrium correction is ignored, in
viscous dynamics also, v2 continue to increase with pT ,
albeit its magnitude is reduced compared to ideal dy-
namics. The result is very important. It imply that the
experimental trend of elliptic flow (saturation at large
pT ) could only be explained if the QGP fluid is viscous.
An ideal QGP, will not be able to explain the saturation
trend of the experimental data.
As stated earlier, non-equilibrium correction to the
equilibrium distribution function depends on the freeze-
out condition. To show the effect of freeze-out condition,
on v2, in Fig.15 we have shown v2 for a values of freeze-
out temperature TF =160,150,140,130 and 120 MeV. As
pT (GeV)
0 1 2 3 4 5
vis.fluid with Fneq
vis.fluid with Feq
id. fluid
Au+Au@b=6.5 fm
τi=0.6fm,Sini=110fm-3,TF=160MeV
FIG. 14: (color online) Elliptic flow as a function of transverse
momentum. The black line is v2 in ideal hydrodynamics. The
blue lines are v2 in viscous dynamics with viscosity to entropy
ratio η/s=0.08,0.135 and 0.2 (top to bottom) respectively,
including the correction to equilibrium distribution function.
The red lines are same as the blues lines but ignoring the
non-equilibrium correction to the distribution function.
freeze-out occur at higher and higher temperature, the
veering of v2 takes place at larger and larger pT and for
TF =120 MeV, the elliptic flow saturates. The result is
understood easily. With decreasing freeze-out tempera-
ture, the fluid evolves for longer time, the shear stress-
tensor’s at the freeze-out surface is reduced and the non-
equilibrium correction, proportional to shear stress ten-
sors, decreases.
VIII. STABILITY OF NUMERICAL
SOLUTIONS IN AZHYDRO-KOLKATA
Before we summarise our results, we would like to
comment on the stability of numerical solutions in
AZHYDRO-KOLKATA. As indicated above, with shear
viscosity as the only dissipative force, boost-invariant
causal hydrodynamics require simultaneous solution of
six partial differential equations. Numerical solution of
six partial differential equations is non-trivial and it is
important to check for the numerical stability of the
solutions. Analytical solutions of viscous hydrodynam-
ics, even in restrictive conditions are not available, and
we can not check the solutions against analytical re-
sults. However, we can check for the stability of the
numerical solutions. The standard procedure of check-
ing the numerical stability is to change the integration
step lengths and look for the difference in the solution.
In Fig.16, for viscosity η/s=0.135, we have shown the
constant temperature contours in x − τ plane at a fixed
pT (GeV)
0 1 2 3 4 5
Au+Au@b=6.5 fm
τi=0.6fm,η/s=.2,Sini=110fm-3
TF=160,150,140,130 and 120 MeV
(bottom to top)
FIG. 15: Dependence of elliptic flow on the freeze-out tem-
perature. The solid lines (from bottom to top) are elliptic
flow (v2) in viscous dynamics with TF =160,150,140,130 and
120 MeV respectively.
0.250.28
0 2 4 6 8 10
____dx=dy=0.1,dτ=0.01
____dx=dy=0.2,dτ=0.02Au+Au@b=0
X (fm)
FIG. 16: (color online) constant temperature contours in
x − τ plane at a fixed y=0 fm. The black lines are obtained
with integration step lengths dx=dy=0.2 fm and dτ=0.02 fm.
The blue lines are obtained with integration step lengths,
dx=dy=0.1 fm and dτ=0.01 fm. Halving the step lengths do
not change the evolution. The numerical solutions are stable.
y=0 fm. The black and blue lines are obtained when
integration step lengths are dx=dy=0.2fm,dτ=0.02 fm,
and dx=dy=0.1 fm,dτ=0.01fm respectively. Evolution
of QGP fluid donot alter by changing the step lengths,
the solutions are stable against mesh size.
IX. SUMMARY AND CONCLUSIONS
In Israel-Stewart’s 2nd order theory of dissipative rel-
ativistic hydrodynamics, we have studied evolution QGP
fluid. In 2nd order theory, in addition to usual thermo-
dynamic quantities e.g. energy density, pressure, hydro-
dynamic velocities, dissipative flows are treated as ex-
tended thermodynamic variables. Relaxation equations
for dissipative flows are solved, simultaneously with the
energy-momentum conservation equations. This greatly
enhances the complexity of the problem. Altogether 14
partial differential equations are required to be solved.
We simplify the problem to solution of six partial differ-
ential equations by considering the evolution of baryon
free QGP fluid with longitudinal boost-invariance. We
also consider dissipation due to shear viscosity only, dis-
regarding the bulk viscosity and the heat conduction
(for a baryon free QGP fluid they do not contribute).
The six partial differential equations are solved using
the code AZHYDRO-KOLKATA, developed at the Cy-
clotron Centre, Kolkata.
To bring out the effect of viscosity, we have considered
the evolution of ideal as well as viscous QGP fluid. Both
ideal and viscous fluid are initialized similarly, at initial
time τi=0.6 fm, the central entropy density is 110 fm
Viscous dynamics require initial conditions for the shear-
stress tensor components. It is assumed that at the equi-
libration time, the shear stress tensors components have
attained their boost-invariant values.
Explicit simulation of ideal and viscous fluids confirms
that energy density of a viscous fluid, evolve slowly than
its ideal counterpart. Thus in a viscous fluid, lifetime of
the QGP phase will be enhanced. Transverse expansion
is also more in viscous dynamics. For a similar freeze-out
condition freeze-out surface is extended in viscous fluid.
As the fluid evolve, shear pressure tensors also evolve.
Explicit simulations indicate that shear pressure tensors
πxx and πyy which are initially non zero, rapidly de-
creases as the fluid evolve. By 3-4 fm of evolution they
reduced to very small values. The other independent
shear tensor πxx is zero initially. At later time it grow
in the negative direction but never grow to large value
and is always order of magnitude smaller than the stress
tensors (πxx and πyy). Spatial distribution of shear pres-
sure tensors πxx and πyy reveal an interesting feature
of viscous dynamics. Initially πxx and πyy have sym-
metric distribution. As the fluid evolve, pressure tensors
quickly become asymmetric, e.g. πxx evolve faster in the
x-direction than in the y-direction, πyy evolve faster in
y direction than in x-direction. However, in a central
collision, we did not see any effect of asymmetry in the
energy density distribution. In a central b=0 collision,
the two opposite asymmetry cancels each other.
We could not study effect of shear viscosity on par-
ticle production. However, we have explored the effect
of viscosity on parton momentum distribution and ellip-
tic flow. We have simulated b=6.5 fm Au+Au collision.
Using the Cooper-Frey prescription, transverse momen-
tum spectra as well as elliptic flow of quarks at freeze-out
temperature of TF =160 MeV are obtained. Viscous dy-
namics flattens the quark yield at large pT . At pT =3
GeV, even a small viscosity, η/s=0.8, increase the yield
by a factor of 2. The increase is even more if viscosity is
large. Viscous effect is most prominent on elliptic flow.
In ideal hydrodynamics, elliptic flow continue to increase
with pT . But in viscous dynamics v2 veer about around
pT =1.5-2 and even become negative at large pT . With
appropriate choice of viscosity, freeze-out condition, el-
liptic flow show saturation. The saturation effect is es-
sentially due to non-equilibrium correction to the equi-
librium distribution function and can not be mimicked
in an ideal hydrodynamics. Only in viscous dynamics,
saturation of elliptic flow can be explained.
APPENDIX A: COORDINATE
TRANSFORMATIONS
Instead of Cartesian coordinates xµ = (t, x, y, z) we
use curvilinear coordinates in longitudinal proper time
and rapidity, x̄m = (τ, x, y, η):
t = τ cosh η; τ =
t2 − z2 (A1)
z = τ sinh η; η =
. (A2)
The differentials
dt = dτ cosh η + dη τ sinh η, (A3)
dz = dτ sinh η + dη τ cosh η, (A4)
and the metric tensor is easily read off from
ds2 = gµνdx
µdxν = dt2 − dx2 − dy2 − dz2
= ḡmndx̄
mdx̄n = dτ2 − dx2 − y2 − τ2dη2,(A5)
namely
ḡmn =
1 0 0 0
0 −1 0 0
0 0 −1 0
0 0 0 −τ2
, ḡmn =
1 0 0 0
0 −1 0 0
0 0 −1 0
0 0 0 −1/τ2
In curvilinear coordinates we must replace the partial
derivatives with respect to xµ by covariant derivatives
(denoted by a semicolon) with respect to x̄m:
T̄ ik;p =
∂T̄ ik
+ ΓipmT̄
mk + T̄ imΓkmp.
The only non-vanishing Christoffel symbols are
Γτηη = τ ; Γ
τη = Γ
ητ = 1/τ. (A7)
The hydrodynamic 4-velocity uµ = γ(1, vx, vy, vz) is
transformed to ūm = γ(1, vx, vy, 0), with γ⊥ =
1−v2r . From here on, we drop the bars over tensor
components in x̄-coordinates for simplicity.
The projector can be easily calculated,
∆µν = gµν − uµuν
1 − γ2⊥ −γ2⊥vx γ2⊥vy 0
−γ2⊥vx −1 − γ2⊥v2x −γ2⊥vxvy 0
−γ2⊥vy −γ2⊥vxvy −1 − γ2⊥v2y 0
0 0 0 1
.(A8)
In (τ, x, y, η) coordinate system, the convective time
derivative can be obtained as,
D = u · ∂ = γ(∂τ + vx∂x + vy∂y). (A9)
For future reference,we also write down the the scalar
expansion rate
θ = ∂·u = ∂τuτ + ∂xux + ∂yuy +
(A10)
APPENDIX B: ENERGY-MOMENTUM
CONSERVATION
With longitudinal boost-invariance the energy-
momentum conservation equations Tmn;n = 0 yield
∂τ T̃
ττ + ∂x(T̃
ττvx) + ∂y(T̃
ττvy) = − (p+ τ2πηη) (B1)
∂τ T̃
τx + ∂x(T̃
τxvx) + ∂y(T̃
τxvy) = −∂x(p̃+ π̃xx − π̃τxvx) − ∂y(π̃xy − π̃τxvy) (B2)
∂τ T̃
τy + ∂x(T̃
τyvx) + ∂y(T̃
τyvy) = −∂x(π̃xy − π̃τyvx) − ∂y(p̃+ π̃yy − π̃τyvy) (B3)
where Ãmn ≡ τAmn, p̃ ≡ τp, and vx ≡ T τx/T ττ , vy ≡
T τy/T ττ .
The components of the energy momentum tensors, in-
cluding the shear pressure tensor are,
T ττ = (ε+ p)γ2⊥ − p+ πττ (B4)
T τx = (ε+ p)γ2⊥vx + π
τx (B5)
T τy = (ε+ p)γ2⊥vy + π
τy (B6)
In causal dissipative hydrodynamics, energy momen-
tum conservation equations are solved simultaneously
with the relaxation equations. Given an equation of
state, if energy density (ε) and fluid velocity (vx and
vy) distributions, at any time τi are known, Eqs.B1,B2
and B3 can be integrated to obtain ε, vx and vy at the
next time step τi+1. While for ideal hydrodynamics, this
procedure works perfectly, viscous hydrodynamics poses
a problem that shear stress-tensor components contains
time derivatives, ∂τγ⊥, ∂τu
x, ∂τu
x etc. Thus at time step
τi one needs the still unknown time derivatives. Numer-
ically, time derivatives at step τi could be obtained if ve-
locities at time step τi and τi+1 are known. One possible
way to circumvent the problem, is to use time derivatives
of the previous step, i.e. use velocities at time step τi−1
and τi to calculate the derivatives at time step τi [18].
The underlying assumption that fluid velocity changes
slowly with time. In 1st order theories, this problem is
circumvented by calculating the time derivatives from the
ideal equation of motion ,
Duµ =
, (B7)
Dε = −(ε+ p)∇µuµ. (B8)
With the help of these two equations all the time
derivatives can be expressed entirely in terms of spa-
tial gradients [15, 26]. 1st order theories are restricted
to contain terms at most linear in dissipative quantities.
Neglect of viscous terms can contribute only in 2nd or-
der corrections, which are neglected in 1st order theories.
While the procedure is not correct in 2nd order theory,
we still use it in the present calculations. The alternative
procedure of using the derivative of earlier time step is
not correct either.
APPENDIX C: RELAXATION EQUATIONS FOR
THE VISCOUS PRESSURE TENSOR
Being symmetric and traceless, the viscous pressure
tensor πµν has 9 independent components. The as-
sumption of boost invariance reduces this number by
3 (∇〈mu η〉 =0, m 6= η). The transversality condition
mn = 0 eliminates another three components ( uη
vanish and thus yield no constraint). Thus, with boost-
invariance the viscous pressure tensor has only three in-
dependent components. As seen in Eqs.B1,B2 and B3 in
a boost-invariant evolution only seven pressure tensors
πττ , πxx, πyy, πηη, πτx, πτy and πxy are of importance.
Only three of these seven are independent. In an ear-
lier publication [17], we have debated about the choice
of the independent components and suggested use of ei-
ther (πττ , πηη, ∆ = πxx − πyy) or (πττ ,πηη,πτx, πτy)
(which will require solution of an additional relaxation
equation) as choice of independent components. How-
ever, while computing we find that the three pressure
tensors πxx and πyy and πxy as independent components
are computationally more convenient. The choice has
the advantage that the dependent shear stress tensors
can be obtained from the 3 independent stress tensors by
multiplying them by fluid velocity, vx and vy (see Eqs.
C7-C10). In any other choice of independent components
(e.g. πττ ,πηη,∆ = πxx − πyy), the evaluation of depen-
dent stress tensors requires division by fluid velocities.
Since initially, fluid velocities are assumed to be zero and
they grow slowly, these choices will involve division by
very small numbers. Unless proper care is not taken, di-
vision by small numbers can lead to unrealistically large
values for the dependent stress tensors and ruin the com-
putation.
The relaxation equations for the independent shear
stress tensors πxx, πyy and πxy, in (τ ,x,y,η) co-ordinate
can be written as,
xx + vx∂xπ
xx + vy∂yπ
xx = − 1
(πxx − 2ησxx) (C1)
yy + vx∂xπ
yy + vy∂yπ
yy = − 1
(πyy − 2ησyy) (C2)
xy + vx∂xπ
xy + vy∂yπ
xy = − 1
(πxy − 2ησxy) (C3)
where τπ is the relaxation time, τπ = 2ηβ2 (see
Eq.2.21). In ultra-relativistic limit, for a Boltzman gas,
β2 can be evaluated, β2 ≈ 34p where p is the pressure
[13]. In the present paper, we use this limit to obtain the
relaxation time τπ.
The viscous pressure tensor relaxes on a time scale τπ
to 2η times the shear tensor σµν = ∇〈µ u ν〉. The xx,
yy and xy components of the shear tensor σµν can be
written as
σxx = −∂xux − uxDux −
∆xxθ (C4)
σyy = −∂yuy − uyDuy −
∆yyθ (C5)
σxy = −1
y − ∂yux − uxDuy − uyDux]
∆xyθ (C6)
The dependent shear stress tensors can easily be ob-
tained from the independent ones as,
πτx = vxπ
xx + vyπ
xy (C7)
πτy = vxπ
xy + vyπ
yy (C8)
πττ = v2xπ
xx + v2yπ
yy + 2vxvyπ
xy (C9)
τ2πηη = −(1 − v2x)πxx − (1 − v2y)πyy
+2vxvyπ
xy (C10)
The expressions for the convective time derivative D
and expansion scalar θ = ∂u̇, in (τ ,x,y,η) are given in
Eqs. A9 and A10.
APPENDIX D: PARTICLE SPECTRA
With the non-equilibrium distribution function thus
specified, it can be used to calculate the particle spectra
from the freeze-out surface. In the standard Cooper-Frye
prescription, particle distribution is obtained as,
dyd2pT
µf(x, p) (D1)
In (τ, x, y, ηs) coordinate, the freeze-out surface is pa-
rameterised as,
Σµ = (τf (x, y) cosh ηs, x, y, τf (x, y) sinh ηs), (D2)
and the normal vector on the hyper surface is,
dΣµ = (cosh ηs,−
,−∂τf
,− sinh ηs)τfdxdydηs
At the fluid position (τ, x, y, ηs) the particle 4-
momenta are parameterised as,
pµ = (mT cosh(ηs − Y ), px, py,mT sinh(ηs − Y )) (D4)
The volume element pµdΣµ become,
pµdΣµ = (mT cosh(η − Y ) − ~pT .~∇T τf )τfdxdydη (D5)
Equilibrium distribution function involve the term
which can be evaluated as,
γ(mT cosh(η − Y ) − ~vT .~pT − µ/γ)
The non-equilibrium distribution function require the
sum pµpνπµν ,
pµpνπ
µν = a1cosh
2(η − Y ) + a2cosh(η − Y ) + a3 (D7)
a1 = m
ττ + τ2πηη) (D8)
a2 = −2mT (pxπτx + pyπτy) (D9)
a3 = p
xx + p2yπ
yy + 2pxpyπ
xy −m2T τ2πηη(D10)
Inserting all the relevant formulas in Eq.D1 and inte-
grating over spatial rapidity one obtains,
dyd2pT
dyd2pT
dNneq
dyd2pT
(D11)
with,
dyd2pT
(2π)3
dxdyτf [mTK1(nβ) − pT ~∇T τfK0(nβ)] (D12)
dNneq
dyd2pT
(2π)3
dxdyτf [mT {
K3(nβ) +
K2(nβ) + (
+ a3)K1(nβ) +
K0(nβ)}
−~pT .~∇T τf{
K2(nβ) + a2K1(nβ) + (
+ a3)K0(nβ)}] (D13)
where K0, K1, K2 and K3 are the modified Bessel func-
tions.
We will also show results for elliptic flow v2. It is de-
fined as,
dyd2pT
cos(2φ)dφ
dyd2pT
(D14)
Expanding to the 1st order, elliptic flow as a function
of transverse momentum can be obtained as,
v2(pT ) = v
2 (pT )
2Nneq
pT dpT dφ
pT dpT dφ
dφcos(2φ) d
2Nneq
pT dpT dφ
pT dpT dφ
(D15)
where v
2 is the elliptic flow calculated with the equi-
librium distribution feq.
[1] BRAHMS Collaboration, I. Arsene et al., Nucl. Phys. A
757, 1 (2005).
[2] PHOBOS Collaboration, B. B. Back et al., Nucl. Phys.
A 757, 28 (2005).
[3] PHENIX Collaboration, K. Adcox et al., Nucl. Phys. A
757 (2005), in press [arXiv:nucl-ex/0410003].
[4] STAR Collaboration, J. Adams et al., Nucl. Phys. A 757
(2005), in press [arXiv:nucl-ex/0501009].
[5] Karsch F, Laermann E, Petreczky P, Stickan S and Wet-
zorke I, 2001 Proccedings of NIC Symposium (Ed. H.
Rollnik and D. Wolf, John von Neumann Institute for
Computing, Jülich, NIC Series, vol.9, ISBN 3-00-009055-
X, pp.173-82,2002.)
[6] P. F. Kolb and U. Heinz, in Quark-Gluon Plasma 3,
edited by R. C. Hwa and X.-N. Wang (World Scientific,
Singapore, 2004), p. 634.
[7] G. Policastro, D. T. Son and A. O. Starinets, Phys. Rev.
Lett. 87, 081601 (2001) [arXiv:hep-th/0104066].
[8] G. Policastro, D. T. Son and A. O. Starinets, JHEP
0209, 043 (2002) [arXiv:hep-th/0205052].
[9] U. Heinz, J. Phys. G 31, S717 (2005).
[10] U. W. Heinz and P. F. Kolb, arXiv:hep-ph/0204061.
[11] C. Eckart, Phys. Rev. 58, 919 (1940).
[12] L. D. Landau and E. M. Lifshitz, Fluid Mechanics, Sect.
127, Pergamon, Oxford, 1963.
[13] W. Israel, Ann. Phys. (N.Y.) 100, 310 (1976); W. Israel
and J. M. Stewart, Ann. Phys. (N.Y.) 118, 349 (1979).
[14] A. Muronga, Phys. Rev. Lett. 88, 062302 (2002) [Er-
ratum ibid. 89, 159901 (2002)]; and Phys. Rev. C 69,
034903 (2004).
[15] D. A. Teaney, J. Phys. G 30, S1247 (2004).
[16] A. Muronga and D. H. Rischke, nucl-th/0407114 (v2).
[17] U. W. Heinz, H. Song and A. K. Chaudhuri, Phys. Rev.
C 73, 034904 (2006) [arXiv:nucl-th/0510014].
[18] A. K. Chaudhuri and U. W. Heinz, J. Phys. Conf. Ser.
50, 251 (2006) [arXiv:nucl-th/0504022].
[19] A. K. Chaudhuri, Phys. Rev. C 74, 044904 (2006)
[arXiv:nucl-th/0604014].
[20] A. K. Chaudhuri, arXiv:nucl-th/0703029.
[21] A. K. Chaudhuri, arXiv:nucl-th/0703027.
[22] T. Koide, G. S. Denicol, Ph. Mota and T. Kodama, Phys.
Rev. C 75, 034909 (2007).
[23] R. Baier and P. Romatschke, arXiv:nucl-th/0610108.
[24] P. Arnold, G. D. Moore and L. G. Yaffe, JHEP 0011,
001 (2000) [arXiv:hep-ph/0010177].
[25] G. Baym, H. Monien, C. J. Pethick and D. G. Ravenhall,
Phys. Rev. Lett. 64, 1867 (1990).
[26] S. R. de Groot, W. A. van Leeuwen and Ch. G. van
Weert, Relativistic Kinetic Theory ( North-Holland, Am-
sterdam, 1980) p.36
http://arxiv.org/abs/nucl-ex/0410003
http://arxiv.org/abs/nucl-ex/0501009
http://arxiv.org/abs/hep-th/0104066
http://arxiv.org/abs/hep-th/0205052
http://arxiv.org/abs/hep-ph/0204061
http://arxiv.org/abs/nucl-th/0407114
http://arxiv.org/abs/nucl-th/0510014
http://arxiv.org/abs/nucl-th/0504022
http://arxiv.org/abs/nucl-th/0604014
http://arxiv.org/abs/nucl-th/0703029
http://arxiv.org/abs/nucl-th/0703027
http://arxiv.org/abs/nucl-th/0610108
http://arxiv.org/abs/hep-ph/0010177
ABSTRACT
  In 2nd order causal dissipative theory, space-time evolution of QGP fluid is
studied in 2+1 dimensions. Relaxation equations for shear stress tensors are
solved simultaneously with the energy-momentum conservation equations.
Comparison of evolution of ideal and viscous QGP fluid, initialized under the
same conditions, e.g. same equilibration time, energy density and velocity
profile, indicate that in a viscous dynamics, energy density or temperature of
the fluid evolve slowly, than in an ideal fluid. Cooling gets slower as
viscosity increases. Transverse expansion also increases in a viscous dynamics.
For the first time we have also studied elliptic flow of 'quarks' in causal
viscous dynamics. It is shown that elliptic flow of quarks saturates due to
non-equilibrium correction to equilibrium distribution function, and can not be
mimicked by an ideal hydrodynamics.

<|endoftext|><|startoftext|>
A Single Trapped Ion as a Time-Dependent Harmonic Oscillator
Nicolas C. Menicucci1, 2, ∗ and G. J. Milburn2
Department of Physics, Princeton University, Princeton, NJ 08544, USA
School of Physical Sciences, The University of Queensland, Brisbane, Queensland 4072, Australia
(Dated: November 4, 2018)
We show how a single trapped ion may be used to test a variety of important physical mod-
els realized as time-dependent harmonic oscillators. The ion itself functions as its own motional
detector through laser-induced electronic transitions. Alsing et al. [Phys. Rev. Lett. 94, 220401
(2005)] proposed that an exponentially decaying trap frequency could be used to simulate (thermal)
Gibbons-Hawking radiation in an expanding universe, but the Hamiltonian used was incorrect. We
apply our general solution to this experimental proposal, correcting the result for a single ion and
showing that while the actual spectrum is different from the Gibbons-Hawking case, it nevertheless
shares an important experimental signature with this result.
PACS numbers: 03.65.-w, 32.80.Pj
I. INTRODUCTION
The time-dependent quantum harmonic oscillator has
long served as a paradigm for nonadiabatic time-
dependent Hamiltonian systems and has been applied to
a wide range of physical problems by choosing the mass,
the frequency, or both, to be time-dependent. The ear-
liest application is to squeezed state generation in quan-
tum optics [1, 2, 3], in which the effect of a second-order
optical nonlinearity on a single-mode field can be mod-
eled by a harmonic oscillator with a frequency that is
harmonically modulated at twice the bare oscillator fre-
quency. It was subsequently shown that any modulation
of the frequency could produce squeezing [4], and thus the
same model could be used to approximately describe the
generation of photons in a cavity with a time-dependent
boundary [5, 6].
The model has been used in a number of quantum
cosmological models. In Ref. [7], a time-dependent fre-
quency has been used to explain entropy production in a
quantum mini-superspace model. The model, with both
mass and frequency time-dependent, has been particu-
larly important in developing an understanding of how
quantum fluctuations in a scalar field can drive classical
metric fluctuation during inflation [8, 9]. In a cosmo-
logical setting the time-dependence is not harmonic and
is usually exponential. In all physical applications, of
course, the model is only an approximation to the true
physics, and its validity can be tested only with consid-
erable difficulty, especially in the cosmological setting.
Here we propose a realistic experimental context in which
the time-dependent quantum harmonic oscillator can be
studied directly.
Many decades of effort to refine spectroscopic measure-
ments for time standards now enable a single ion to be
confined in three dimensions, its vibrational motion re-
stricted effectively to one dimension, and the ion cooled
∗Electronic address: nmen@princeton.edu
to the vibrational ground state with a probability greater
than 99% [10]. Laser cooling is based on the ability to
couple an internal electronic transition to the vibrational
motion of the ion [11]. These methods can easily be ex-
tended to more than one ion and their collective normal
modes of vibration [12]. Indeed so carefully can the cou-
pling between the electronic and vibrational states be
engineered that is is possible to realise simple quantum
information processing tasks [13, 14]. We use the control
of trapping potential afforded by ion traps, together with
the ability to reach quantum limited motion, to propose
a simple experimental test of quantum harmonic oscilla-
tors with time-dependent frequencies. We also make use
of the ability to make highly efficient quantum measure-
ments, based on fluorescent shelving [10], to propose a
practical means to test our predictions.
In this paper, we calculate the excitation probability
of a trapped ion in a general time-dependent potential.
When beginning in the vibrational ground state of the
unchirped trap and starting the chirping process adia-
batically, the excitation probability is simply related to
the Fourier transform of the solution of the Heisenberg
equations of motion (which is also the same as the trajec-
tory of the equivalent classical oscillator). We compare
our result with that of Ref. [15] for the case of a single
ion undergoing an exponential frequency chirp. The cited
work attempts to use this experimental setup to model
a massless scalar field during an inflating (i.e., de Sitter)
universe, which would give a thermal excitation spectrum
as a function of the detector response frequency [16]. The
analysis is incorrect, however, because the wrong Hamil-
tonian was used. Nevertheless, the corrected calculation
presented here also gives an excitation spectrum with
a thermal signature, although the particular functional
form is different.
II. GENERAL SOLUTION
The quantum Hamiltonian for a single ion in a time-
dependent harmonic trap can be well-approximated in
http://arxiv.org/abs/0704.0135v2
mailto:nmen@princeton.edu
one dimension by
ν(t)2q2 , (1)
where ν(t) is time-dependent but always assumed to
be much slower than the timescale of the micromo-
tion [10]. For emphasis, we have indicated the explicit
time-dependence of the frequency ν; we will often omit
this from now on. Working in the Heisenberg picture, we
get the following equations of motion for q and p:
, (2)
ṗ = −Mν2q . (3)
Dots indicate total derivatives with respect to time. Dif-
ferentiating again and plugging in these results gives
0 = q̈ + ν2q , (4)
0 = p̈− 2
ṗ+ ν2p . (5)
As we shall see, only Eq. (4) is necessary for calculating
excitation probabilities, so we will focus only on it. These
equations are operator equations, but they are identical
to the classical equations of motion for the analogous
classical system. Interpreting them as such, we will la-
bel the two linearly independent c-number solutions as
h(t) and g(t), where the following initial conditions are
satisfied:
h(0) = ġ(0) = 1 and ḣ(0) = g(0) = 0 , (6)
Writing q(0) = q0 and p(0) = p0, the unique solution for
q to the initial value problem above is
q(t) = q0h(t) +
g(t) . (7)
By differentiating and using the relations above, we know
also that
p(t) = Mq0ḣ(t) + p0ġ(t) . (8)
To check our math, we can verify that [q(t), p(t)] = i~,
which is fulfilled if and only if the Wronskian W (h, g) of
the two solutions is one for all times—specifically,
hġ − ḣg = 1 , (9)
where we have assumed that [q0, p0] = i~.
Moreover, if the initial state at t = 0 is symmetric with
respect to phase-space rotations, then we have additional
rotational freedom in choosing the initial quadratures.
(This would be the case, for instance, if we start in the
instantaneous ground state.) Notice that Eq. (7) can be
written as the inner product of two vectors:
q(t) =
h(t), ν0g(t)
(and similarly for Eq. (8)), where we have normalized
the quadrature operators to have the same units. As an
inner product, this expression is invariant under simulta-
neous rotations of both vectors. Thus, if the initial state
possesses rotational symmetry in the phase plane, then
the rotated quadratures are equally as valid as the orig-
inal ones for representing the initial state, which means
that an arbitrary rotation can be applied to the second
vector above without changing any measurable property
of the system. This freedom can be used, for instance, to
define new functions h′(t) and g′(t) that are more con-
venient for calculations, where the linear transformation
between them and the original ones (with prefactors as
in Eq. (10)) is a rotation. We will use this freedom in the
next section.
One reason why ion traps have become a leading im-
plementation for quantum information processing is the
ability to efficiently read out the internal electronic state
using a fluorescence shelving scheme [10]. As the internal
state can become correlated with the vibrational motion
of the ion, this scheme can be configured as a way to
measure the vibrational state directly [17]. To correlate
the internal electronic state with the motion of the ion,
an external laser can be used to drive an electronic tran-
sition between two levels |g〉 and |e〉, separated in energy
by ~ωA. The interaction between an external classical
laser field and the ion is described, in the dipole and
rotating-wave approximation, by the interaction-picture
Hamiltonian [10]
HL = −i~Ω0
σ+(t)e
ik cos θq(t) − σ−(t)e
−ik cos θq(t)
where Ω0 is the Rabi frequency for the laser-atom inter-
action, ωL is the laser frequency, k is the magnitude of
the wave vector ~k, which makes an angle θ with the trap
axis, q(t) is given in Eq. (7), and
σ±(t) = e
±i∆tσ± . (12)
The electronic-state raising and lowering operators are
defined as σ+ = |e〉〈g| and σ− = |g〉〈e|, respectively, and
∆ = ωA − ωL (13)
is the detuning of the laser below the atomic transi-
tion. We can construct a meaningful quantity that char-
acterizes the “size” of q(t) based on the width of the
ground-state wave packet for an oscillator with frequency
ν(t), namely
~/2Mν(t). As long as this quantity is
much smaller than k cos θ throughout the chirping pro-
cess, then we can expand the exponentials in Eq. (11) to
first order and define the interaction Hamiltonian HI be-
tween the electronic states and vibrational motion (still
in the interaction picture) by
HI = ~Ω0k cos θq(t)
e−i∆tσ− + e
+i∆tσ+
. (14)
where we have assumed that ωL is far off-resonance, and
thus ∆ 6≃ 0.
Using first-order time-dependent perturbation theory,
the probability to find the ion in the excited state is
P (1) =
dt2 〈HI(t1)PeHI(t2)〉
= Ω20k
2 cos2 θ
dt2 e
−i∆(t1−t2) 〈q(t1)q(t2)〉 ,
where Pe = 1vib ⊗ |e〉〈e| is the projector onto the ex-
cited electronic state (and the identity on the vibrational
subspace). We always assume that the ion begins in the
electronic ground state. If the ion also starts out in the
instantaneous vibrational ground state for a static trap
of frequency ν0 = ν(0) at t = 0 (which is most useful
when the chirping begins in the adiabatic regime), then
we can evaluate the two-time correlation function as
〈q(t1)q(t2)〉ground =
h(t1)h(t2) +
g(t1)g(t2)
〈q0p0〉
h(t1)g(t2)− h(t2)g(t1)
h(t1)− iν0g(t1)
h(t2) + iν0g(t2)
f(t1)f
∗(t2) , (16)
where we have used the facts that for the vibrational
ground state,
(p0/Mν0)
= ~/2Mν0 and
〈q0p0〉 =
〈{q0, p0}+ [q0, p0]〉 = i~/2, and we have de-
fined the complex function
f(t) = h(t)− iν0g(t) , (17)
which is the solution to Eq. (4) with initial the conditions,
f(0) = 1 and ḟ(0) = −iν0. Plugging this into Eq. (15)
gives, quite simply,
P (1) → (Ω0η0)
2 |F̧|
, (18)
where
dt e−i∆tf(t) , (19)
and we have defined the unitless, time-dependent Lamb-
Dicke parameter [10] as
η(t) =
~k2 cos2 θ
2Mν(t)
, (20)
and η0 = η(0). Recalling that f(t) can be considered a
complex c-number solution to the equations of motion for
the equivalent classical Hamiltonian, Eq. (18) shows that
the excitation probability is simply related to the Fourier
transform of the classical trajectories when beginning in
the vibrational ground state.
III. EXPONENTIAL CHIRPING
Recent work [15] has suggested that an exponen-
tially decaying trap frequency has the same effect on
the phonon modes of a string of ions as an expand-
ing (i.e., de Sitter) spacetime does on a one-dimensional
scalar field [18]. An inertial detector that responds to
such an expanding scalar field would register a thermal
bath of particles, called Gibbons-Hawking radiation [16].
Ref. [15] suggests that the acoustic analog [19] of this
radiation could be seen in an ion trap, causing each ion
to be excited with a thermal spectrum with temperature
~κ/2πkB, as a function of the detuning ∆, where κ is
the trap-frequency decay rate. The analysis used an in-
correct Hamiltonian that neglected squeezing and source
terms that have no analog in the expanding scalar field
model but which are present when considering trapped
ions in this way, and the results are incorrect. In this sec-
tion, we revisit this problem and calculate the excitation
probability for a single ion in an exponentially decaying
harmonic potential, as a function of the detuning ∆.
We write the time-dependent frequency as [20]
ν(t) = ν0e
−κt . (21)
This results in
q̈ + ν20e
−2κtq = 0 . (22)
Solutions with initial conditions (6) are
h(t) =
, (23)
g(t) =
where the time dependence is carried in ν = ν(t) from
Eq. (21), and Jn and Yn are Bessel functions. We could
plug these directly into the formulas from the last section,
but we will simplify the calculations by considering the
limits of slow and long-time frequency decay, represented
ν0 ≫ κ and ν0e
−κT ≪ κ , (25)
respectively. This allows us to do several things. First, it
allows us to use the usual ground state of the unchirped
trap at frequency ν0 as a good approximation to the
ground state of the expanding trap at t = 0, since at that
time the system is being chirped adiabatically. This is
important because it allows the experiment to begin with
a static potential, which is useful for cooling. Second, it
allows us to simplify h(t) and g(t) using the phase-space
rotation freedom discussed above. Using asymptotic ap-
proximations for the Bessel functions in the coefficients,
≃ −Y1
, (26)
, (27)
we get
h(t) ≃
sinϕY0
+ cosϕJ0
, (28)
ν0g(t) ≃
− cosϕY0
+ sinϕJ0
. (29)
where ϕ = ν0/κ − π/4. Since we are taking the initial
state to be the ground state, which is symmetric with
respect to phase-space rotations, we can use the freedom
discussed in the previous section to undo the rotation
represented by Eqs. (28) and (29) and define the simpler
functions
h(t) → h′(t) =
, (30)
g(t) → g′(t) =
. (31)
The primes are unnecessary due to the symmetry of the
initial state, so we drop them from now on and plug di-
rectly into Eq. (17):
f(t) =
− iJ0
, (32)
where H
n is a Hankel function of the first kind. The
integral in Eq. (19) can be evaluated in the limits (25)
using techniques similar to those used in Ref. [15]. First,
define
, τ = α− κt , u = eτ , and x = ∆/κ .
The integral in question then becomes (neglecting the
prefactor)
dt e−i∆tH
dt e−i∆tH
α−κt)
dτ e−ix(α−τ)H
e−ixα
dτ eixτH
e−ixα
du uix−1H
0 (u) . (34)
Inserting a convergence factor with x → x− iǫ, and then
taking the limit ǫ → 0+, we can use the formula
du uix−1H
0 (u) = −2
ix Γ(ix/2)
(eπx − 1)Γ(1− ix/2)
to evaluate
Γ(ix/2)
Γ(1− ix/2)
(eπx − 1)2
(eπx − 1)2
. (36)
When plugging in for the dummy variables (33), this
gives
P (1) = (Ω0η0)
2 2πν0
(eπ∆/κ − 1)2
. (37)
The calculated result from Ref. [15] for a single ion is
GH = (Ω0η0)
e2π∆/κ − 1
, (38)
which contains a Planck factor with Gibbons-
Hawking [16] temperature T = ~κ/2πkB but is
different from the actual result for a single ion, given by
Eq. (37).
Several things should be noted about these functions.
First, they both break down as ∆ → 0 because of the ap-
proximation made in obtaining Eq. (14). They also fail
if the time-dependent Lamb-Dicke parameter (20) ever
becomes too large throughout the chirping process. Fur-
thermore, most cases of interest will be ∆ ≃ ν0 (the first
red sideband) and near ∆ ≃ −ν0 (the first blue side-
band), which means that |∆| ≫ κ, since ν0 ≫ κ. The
first red sideband represents a detector that requires the
absorption of one phonon (plus one laser photon) in order
to excite the atom—the usual thing we mean by “particle
detector” when the particles are phonons. The first blue
sideband, on the other hand, represents a detector that
emits a phonon in order to excite the atom (along with
absorbing one laser photon).
There are a couple of ways to compare these functions.
First, we can take the ratio of the two for both the red-
and blue-sideband cases. In both cases, we obtain
P (1)
(1 + 2e−π|∆|/κ) (39)
plus terms of order O(e−2π|∆|/κ). Since |∆| ≃ ν0, the
prefactor is close to one, and the second term is very
small (since ν0 ≫ κ). Furthermore, it is cumbersome
to directly compare the measured probability to the full
function (with all the prefactors). It is often easier in-
stead to make measurements on both the first red side-
band and the first blue sideband and then take the ratio
of the two. The constant prefactors disappear in this
calculation, and both functions then have the same ex-
perimental signature:
P (1)(∆)
P (1)(−∆)
GH(∆)
GH(−∆)
= e−2π∆/κ , (40)
which is that of a thermal distribution with tempera-
ture T = ~κ/2πkB, which is of the Gibbons-Hawking
form [16] with the expansion rate given by κ. There-
fore, although the Hamiltonian used in the calculations
in Ref. [15] was missing terms, the intuition (at least for
a single ion) was correct in that the actual experimental
signature in this case matches that of an ion undergoing
thermal motion in a static trap, where the temperature
is proportional to κ.
To see whether this experiment is feasible, we must ex-
amine the validity of our approximations. For a typical
trap, we expect that ν0 ≃ 1 MHz, and thus if we take
κ ≃ 1 kHZ, we easily satisfy the first of conditions (25),
namely ν0 ≫ κ. The second of these conditions gives a
constraint on the modulation time T . For these param-
eters we expect that T ≃ a few msec. This is compat-
ible with typical cooling and readout time scales and is
less than those for heating due to fluctuating patch po-
tentials [10]. Thus, this is a realizable experiment with
current technology.
IV. CONCLUSION
We have shown that a single trapped ion in a modu-
lated trapping potential can serve as an experimentally
accessible implementation of a quantum harmonic oscilla-
tor with time-dependent frequency, including robust con-
trol over state preparation, manipulation, and measure-
ment. The ion itself serves both as the oscillating particle
and as the local detector of vibrational motion via cou-
pling to internal electronic states by an external laser.
For the case of a general time-dependent trap frequency,
we calculated the first-order excitation probability for the
ion in terms of the solution to the classical equations of
motion for the equivalent classical oscillator. We applied
this general result to the case of exponential chirping and
corrected the calculation in Ref. [15] for a single ion. We
found that while the results from the two calculations dif-
fer, the experimental signature in both cases is the same
and equivalent to that of a thermal ion in a static trap.
We thank Dave Kielpinski for invaluable help with the
experimental details. We also thank Paul Alsing, Bill
Unruh, John Preskill, Jeff Kimble, Greg Ver Steeg, and
Michael Nielsen for useful discussions and suggestions.
NCM extends much appreciation to the faculty and staff
of the Caltech Institute for Quantum Information for
their hospitality during his visit, which helped bring this
work to fruition. NCM was supported by the United
States Department of Defense, and GJM acknowledges
support from the Australian Research Council.
[1] D. Stoler, Phys. Rev. D 1, 3217 (1970).
[2] H. P. Yuen, Phys. Rev. A 13, 2226 (1976).
[3] J. N. Hollenhorst, Phys. Rev. D 19, 1669 (1979).
[4] X. Ma and W. Rhodes, Phys. Rev. A 39, 1941 (1989).
[5] V. V. Dodonov and A. B. Klimov, Phys. Rev. A 53, 2664
(1996).
[6] G. T. Moore, J. Math. Phys. 11, 2679 (1970).
[7] S. P. Kim and S.-W. Kim, Phys. Rev. D 51, 4254 (1995).
[8] D. Polarski and A. A. Starobinsky, Classical and Quan-
tum Gravity 13, 377 (1996).
[9] C. Kiefer, J. Lesgourgues, D. Polarski, and A. A.
Starobinsky, Classical and Quantum Gravity 15, L67
(1998).
[10] D. Leibfried, R. Blatt, C. Monroe, and D.Wineland, Rev.
Mod. Phys. 75, 281 (2003).
[11] C. Monroe, D. M. Meekhof, B. E. King, S. R. Jefferts,
W. M. Itano, D. J. Wineland, and P. L. Gould, Phys.
Rev. Lett. 75, 4011 (1995).
[12] D. F. V. James, Applied Physics B: Lasers and Optics
66, 181 (1998).
[13] D. Leibfried, B. De Marco, V. Meyer, D. Lucas, M. Bar-
rett, J. Britton, W. M. Itano, B. Jelenkovic, C. Langer,
T. Rosenband, et al., Nature 422, 412 (2003).
[14] F. Schmidt-Kaler, H. Häffner, M. Riebe, G. P. T. Lan-
caster, T. Deuschle, C. Becher, C. F. Roos, J. Eschner,
and R. Blatt, Nature 422, 408 (2003).
[15] P. M. Alsing, J. P. Dowling, and G. J. Milburn, Phys.
Rev. Lett. 94, 220401 (2005).
[16] G. W. Gibbons and S. W. Hawking, Phys. Rev. D 15,
2738 (1977).
[17] S. Wallentowitz and W. Vogel, Phys. Rev. A 54, 3322
(1996).
[18] A. M. de M. Carvalho, C. Furtado, and I. A. Pedrosa,
Phys. Rev. D 70, 123523 (pages 6) (2004).
[19] W. G. Unruh, Phys. Rev. Lett. 46, 1351 (1981).
[20] The authors of Ref. [15] consider both signs in the ex-
ponential, but we will restrict ourselves to the case that
allows us to begin chirping in the adiabatic limit.
ABSTRACT
  We show how a single trapped ion may be used to test a variety of important
physical models realized as time-dependent harmonic oscillators. The ion itself
functions as its own motional detector through laser-induced electronic
transitions. Alsing et al. [Phys. Rev. Lett. 94, 220401 (2005)] proposed that
an exponentially decaying trap frequency could be used to simulate (thermal)
Gibbons-Hawking radiation in an expanding universe, but the Hamiltonian used
was incorrect. We apply our general solution to this experimental proposal,
correcting the result for a single ion and showing that while the actual
spectrum is different from the Gibbons-Hawking case, it nevertheless shares an
important experimental signature with this result.

<|endoftext|><|startoftext|>
Introduction 
Nowadays, there still exist some movement phenomena which can’t be explained by current 
sub-quark field theories. Therefore, some scientists bring forward some new field theories to 
explain the strange phenomena in the strong interaction and weak interaction etc. 
A new insight on the problem of the sub-quark movement and their interactions can be 
given by the concept of trigintaduonion space. According to previous research results and the 
‘SpaceTime Equality Postulation’ [1-5], the eight sorts of interactions in the paper can all be 
described by quaternion spacetimes. Based on the conception of space verticality etc., these 
eight types of quaternion spacetimes can be united into the 32-dimensional trigintaduonion 
space. In the trigintaduonion space, the characteristics of eight sorts of interactions can be 
described by single trigintaduonion space uniformly. 
By analogy with the octonionic and sedenion fields, four sorts of trigintaduonion fields 
which consist of octonionic fields H-S, S-W and E-G etc., can be obtained in the paper. The 
paper describes the trigintaduonion fields and their quantum theory, and deduces some 
predicts and new conclusions which are consistent with the current sub-quark theories etc. 
_________ 
E-mail Addresses: xmuwzh@hotmail.com, xmuwzh@xmu.edu.cn 
2. Compounding fields in trigintaduonion spaces 
Through the analysis of the different fields in the octonionic and sedenion spaces, we find that 
each interaction possesses its own spacetime, field and operator in accordance with the 
‘SpaceTime Equality Postulation’. In the sedenion spaces, sixteen sorts of sedenion fields can 
be tabulated in Table 1, including their operators, spaces and fields. 
Table 1.  The compounding fields and operators in the different sedenion spaces 
operator X H-S / k
 H-S +
+A S-W / k
 S-W 
A H-S / k
B E-G / k
 E-G 
+B H-S / k
+S H-W / k
S H-S / k
space
octonion space 
sedenion space 
SW-HS 
sedenion space 
EG-HS
sedenion space 
HW-HS 
field H-S SW-HS EG-HS HW-HS 
operator
X H-S / k
+X S-W / k
A S-W / k
 S-W +
B E-G / k
 E-G 
+B S-W / k
+S H-W / k
S S-W / k
space
sedenion space 
HS-SW
octonion space 
sedenion space 
EG-SW
sedenion space 
HW-SW 
field HS-SW S-W EG-SW HW-SW 
     
operator
X H-S / k
+X E-G / k
+A S-W / k
A E-G / k
B E-G / k
 E-G +
+S H-W / k
S E-G / k
space
sedenion space 
HS-EG
sedenion space 
SW-EG 
octonion space 
sedenion space 
HW-EG 
field HS-EG SW-EG E-G HW-EG 
operator
X H-S / k
+X H-W / k
+A S-W / k
A H-W / k
B E-G / k
 E-G 
+B H-W / k
S H-W / k
 H-W +
space
sedenion space 
HS-HW
sedenion space 
SW-HW 
sedenion space 
EG-HW
octonion space 
field HS-HW SW-HW EG-HW H-W
In the Cayley-Dickson algebra, there exists the Cayley-Dickson construction [6]. This is the 
process based on which the 2n-dimensional hypercomplex number is constructed from a pair 
of (2n-1)-dimensional hypercomplex numbers, where n is a positive integer. This is 
accomplished by defining the multiplication rule for the two 2n-dimensional hypercomplex 
numbers in terms of the four (2n-1)-dimensional hypercomplex numbers. The 2-dimensional 
complex numbers (n = 1), 4-dimensional quaternions (n = 2), 8-dimensional octonions (n = 3), 
16-dimensional sedenions (n = 4), 32-dimensional trigintaduonions (n = 5), etc., can all be 
constructed from real numbers by the iterations of this process [7]. At each iteration some 
new basal elements, e k , are introduced with the property, e k
 = 1.
We define the product and conjugate on the trigintaduonions, (u, v) and (x, y), in terms of 
the sedenions, u, v, x and y, as follows: 
(u, v) (x, y) = (u x  y* v, y u + v x*) ,    (u, v)* = (u*, v)
where, the mark (*) denotes the conjugate. 
In the trigintaduonion space, there exist different constructions of fields in the terms of 
different operators. By analogy with the cases in the different octonionic spaces and sedenion 
spaces, the operators and fields in the different trigintaduonion spaces can be written in Table 
2. There exist four sorts of compounding fields in the trigintaduonion spaces. 
Table 2. The compounding fields and operators in the trigintaduonion space 
operator
X H-S / k
 H-S 
+X S-W / k
 S-W 
X E-G / k
 E-G 
+X H-W / k
A H-S / k
 H-S 
+A S-W / k
 S-W 
A E-G / k
 E-G 
+A H-W / k
B H-S / k
 H-S 
+B S-W / k
 S-W 
B E-G / k
 E-G 
+B H-W / k
S H-S / k
 H-S 
+S S-W / k
 S-W 
S E-G / k
 E-G 
+S H-W / k
space T-X T-A T-B T-S 
field T-X T-A T-B T-S 
3. Compounding field in trigintaduonion space T-X 
It is believed that hyper-strong field, strong-weak field, electromagnetic-gravitational field 
and hyper-weak field are unified, equal and interconnected. By means of the conception of the 
space expansion etc., four types of octonionic spaces can be combined into a trigintaduonion 
space T-X. In trigintaduonion space, some properties of eight sorts of interactions including 
strong, weak, electromagnetic and gravitational interactions etc. can be described uniformly. 
In the trigintaduonion space T-X, the displacement r should be extended to the new 
displacement R = (r + krx X ) and be consistent with the definition of momentum M.
In the octonionic space H-S, the base E H-S can be written as 
E H-S = (1, e 1 , e 2 , e 3 , e 4 , e 5 , e 6 , e 7 )               (1) 
The displacement R H-S = ( R0 , R1 , R2 , R3 , R4 , R5 , R6 , R7 ) in the octonionic space H-S 
is consist of the displacement r H-S = ( r0 , r1 , r2 , r3 , r4 , r5 , r6 , r7 ) and physical quantity X H-S
= ( x0 , x1 , x2 , x3 , x4 , x5 , x6 , x7 ). 
         R H-S = r H-S + k
rx X H-S
= R0 + e 1 R1 + e 2 R2 + e 3 R3 + e 4 R4 + e 5 R5 + e 6 R6 + e 7 R7       (2) 
where, R j = r j + k
rx x j ; j = 0, 1, 2, 3, 4, 5, 6, 7. r0 = c H-S t H-S , r4 = c H-S T H-S . c H-S is the 
speed of intermediate particle in the hyper-strong field, t H-S and T H-S denote the time. 
The octonionic differential operator T-X1 and its conjugate operator are defined as 
T-X1 = 0 + e 1  1 + e 2  2 + e 3  3 + e 4  4 + e 5  5 + e 6  6 + e 7  7       (3) 
T-X1 = 0 e 1  1 e 2  2 e 3  3 e 4  4 e 5  5 e 6  6 e 7  7       (4) 
where, j =  /  R j . The mark (*) denotes the octonionic conjugate. 
In the octonionic space S-W, the base E S-W can be written as 
E S-W = ( e 8 , e 9 , e 10 , e 11 , e 12 , e 13 , e 14 , e 15 )            (5) 
The displacement R S-W = (R8 , R9 , R10 , R11 , R12 , R13 , R14 , R15 ) in the octonionic space 
S-W is consist of the displacement r S-W = ( r8 , r9 , r10 , r11 , r12 , r13 , r14 , r15 ) and the physical 
quantity X S-W = ( x8 , x9 , x10 , x11 , x12 , x13 , x14 , x15 ). 
             R S-W = r S-W + k
 rx X S-W
= e 8 R8 + e 9 R9 + e 10 R10 + e 11 R11
+ e 12 R12 + e 13 R13 + e 14 R14 + e 15 R15               (6) 
where, R j = r j + k
 rx x j ; j = 8, 9, 10, 11, 12, 13, 14, 15. r8 = c S-W t S-W , r12 = c S-W T S-W .    
c S-W is the speed of intermediate particle in strong-weak field, t S-W and T S-W denote the time. 
The octonionic differential operator T-X2 and its conjugate operator are defined as 
T-X2 = e 8  8 + e 9  9 + e 10  10 + e 11  11
+ e 12  12 + e 13  13 + e 14  14 + e 15  15                (7) 
T-X2 = e 8  8 e 9  9 e 10  10 e 11  11
e 12  12 e 13  13 e 14  14 e 15  15                (8) 
where, j =  /  R j . 
In the octonionic space E-G, the base E E-G can be written as 
E E-G = ( e 16 , e 17 , e 18 , e 19 , e 20 , e 21 , e 22 , e 23 )           (9) 
The displacement R E-G = ( R16 , R17 , R18 , R19 , R20 , R21 , R22 , R23 ) in the octonionic 
space E-G is consist of the displacement r E-G = ( r16 , r17 , r18 , r19 , r20 , r21 , r22 , r23 ) and the 
physical quantity XE-G = ( x16 , x17 , x18 , x19 , x20 , x21 , x22 , x23 ). 
       R E-G = r E-G + k
 rx X E-G
= e 16 R16 + e 17 R17 + e 18 R18 + e 19 R19
+ e 20 R20 + e 21 R21 + e 22 R22 + e 23 R23               (10) 
where, R j = r j + k
 rx x j ; j = 16, 17, 18, 19, 20, 21, 22, 23. r16 = c E-G t E-G , r20 = c E-G T E-G .
c E-G is the speed of intermediate particle in electromagnetic-gravitational field, t E-G and T E-G
denote the time. 
The octonionic differential operator T-X3 and its conjugate operator are defined as 
T-X3 = e 16  16 + e 17  17 + e 18  18 + e 19  19
+ e 20  20 + e 21  21 + e 22  22 + e 23  23            (11) 
T-X3 = e 16  16 e 17  17 e 18  18 e 19  19
e 20  20 e 21  21 e 22  22 e 23  23            (12) 
where, j =  /  R j . 
In the octonionic space H-W, the base E H-W can be written as 
E H-W = ( e 24 , e 25 , e 26 , e 27 , e 28 , e 29 , e 30 , e 31 )           (13) 
The displacement R H-W = ( R24 , R25 , R26 , R27 , R28 , R29 , R30 , R31 ) in octonionic space 
H-W is consist of the displacement r H-W = ( r24 , r25 , r26 , r27 , r28 , r29 , r30 , r31 ) and the 
physical quantity XH-W = ( x24 , x25 , x26 , x27 , x28 , x29 , x30 , x31 ). 
           R H-W = r H-W + k
 rx X H-W
= e 24 R24 + e 25 R25 + e 26 R26 + e 27 R27
+ e 28 R28 + e 29 R29 + e 30 R30 + e 31 R31             (14) 
where, R j = r j + k
 rx x j ; j = 24, 25, 26, 27, 28, 29, 30, 31. r24 = c H-W t H-W , r28 = c H-W T H-W .
c H-W is the speed of intermediate particle in hyper-weak field, t H-W and T H-W denote the time. 
The octonionic differential operator T-X4 and its conjugate operator are defined as, 
T-X4 = e 24  24 + e 25  25 + e 26  26 + e 27  27
+ e 28  28 + e 29  29 + e 30  30 + e 31  31                (15) 
T-X4 = e 24  24 e 25  25 e 26  26 e 27  27
e 28  28 e 29  29 e 30  30 e 31  31                (16) 
where, j =  /  R j . 
In the trigintaduonion space T-X, the base E T-X can be written as 
    E T-X = E T-X1 + E T-X2 + E T-X3 + E T-X4
= (1, e 1, e 2, e 3, e 4, e 5, e 6, e 7, e 8, e 9, e 10,
e 11, e 12, e 13, e 14, e 15, e 16, e 17, e 18, e 19, e 20,
e 21, e 22, e 23, e 24, e 25, e 26, e 27, e 28, e 29, e 30, e 31)        (17) 
The displacement R T-X = ( R0 , R1 , R2 , R3 , R4 , R5 , R6 , R7 , R8 , R9 , R10 , R11 , R12 , R13 , 
R14 , R15 , R16 , R17 , R18 , R19 , R20 , R21 , R22 , R23 , R24 , R25 , R26 , R27 , R28 , R29 , R30 , R31 ) 
in trigintaduonion space T-X is 
R T-X = R T-X1 + R T-X2 + R T-X3 + R T-X4
= R0 + e 1 R1 + e 2 R2 + e 3 R3 + e 4 R4 + e 5 R5 + e 6 R6
+ e 7 R7 + e 8 R8 + e 9 R9 + e 10 R10 + e 11 R11
+ e 12 R12 + e 13 R13 + e 14 R14 + e 15 R15 + e 16 R16
+ e 17 R17 + e 18 R18 + e 19 R19 + e 20 R20 + e 21 R21
+ e 22 R22 + e 23 R23 + e 24 R24 + e 25 R25 + e 26 R26
                     + e 27 R27 + e 28 R28 + e 29 R29 + e 30 R30 + e 31 R31           (18) 
The trigintaduonion differential operator T-X and its conjugate operator are defined as 
T-X = T-X1 + T-X2 + T-X3 + T-X4                   (19) 
T-X =
T-X1 +
T-X2 + 
T-X3 +
T-X4                (20) 
In the trigintaduonion space T-X, there exists one kind of field (trigintaduonion field T-X, 
for short) can be obtained related to the operator (X/K + ).
In the trigintaduonion field T-X, by analogy with the octonion and sedenion fields, the 
trigintaduonion differential operator needs to be generalized to the operator (X H-S / k
 H-S 
X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + ). This is because the trigintaduonion field T-X 
includes the hyper-strong, strong-weak, electromagnetic-gravitational and hyper-weak fields. 
It can be predicted that the eight sorts of interactions are interconnected each other. The 
physical features of each subfield in the trigintaduonion field T-X meet the requirements of 
the equations set in the Table 3. 
In the trigintaduonion field T-X, the field potential A = (a0 , a1 , a2 , a3 , a4 , a5 , a6 , a7 , a8 , 
a9 , a10 , a11 , a12 , a13 , a14 , a15 , a16 , a17 , a18 , a19 , a20 , a21 , a22 , a23 , a24 , a25 , a26 , a27 , a28 , a29 , 
a30 , a31 ) is defined as 
        A = (X/K + )* X
= (X H-S / k
 H-S X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + )
= a0 + a1 e 1 + a2 e 2 + a3 e 3 + a4 e 4 + a5 e 5 + a6 e 6
+ a7 e 7 + a8 e 8 + a9 e 9 + a10 e 10 + a11 e 11
+ a12 e 12 + a13 e 13 + a14 e 14 + a15 e 15 + a16 e 16  
+ a17 e 17 + a18 e 18 + a19 e 19 + a20 e 20 + a21 e 21
+ a22 e 22 + a23 e 23 + a24 e 24 + a25 e 25 + a26 e 26
                + a27 e 27 + a28 e 28 + a29 e 29 + a30 e 30 + a31 e 31               (21) 
where, the mark (*) denotes the trigintaduonion conjugate. krx X = krx XT-X = k
rx XH-S + 
rx XS-W + k
rx XE-G + k
rx XH-W . K = KT-X , k
 H-S , k
 S-W , k
 E-G , k
 H-W , k
rx , k
rx , 
rx and k
rx are coefficients. XH-S is the physical quantity in the octonionic space H-S; 
XS-W is the physical quantity in octonionic space S-W; XE-G is the physical quantity in the 
octonionic space E-G; XH-W is the physical quantity in the octonionic space H-W. 
The field strength B of the trigintaduonion field T-X can be defined as 
B = (X H-S / k
 H-S X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + ) A       (22)
The field source and force of the trigintaduonion field T-X can be defined respectively as 
S = (X H-S / k
 H-S X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + )
* B      (23) 
Z = K (X H-S / k
 H-S X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + ) S     (24) 
where, the coefficient  is interaction intensity of the trigintaduonion field T-X. 
The angular momentum of trigintaduonion field can be defined as (k rx is the coefficient) 
M = S (r + k rx X)                           (25) 
and the energy and power in the trigintaduonion field can be defined respectively as 
W = K (X H-S / k
 H-S X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + )
* M    (26) 
N = K (X H-S / k
 H-S X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + ) W     (27) 
Table 3.  Equations set of trigintaduonion field T-X 
Spacetime trigintaduonion space T-X 
X physical quantity X = XT-X
Field potential A = (X H-S / k
 H-S X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + )
Field strength B = (X H-S / k
 H-S X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + ) A
Field source S = (X H-S / k
 H-S X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + )
Force Z = K (X H-S / k
 H-S X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + ) S
Angular momentum M = S (r + k rx X)
Energy W = K (X H-S / k
 H-S X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + )
Power N = K (X H-S / k
 H-S X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + ) W
In the trigintaduonion space T-X, the wave functions of the quantum mechanics are the 
trigintaduonion equations set. The Dirac and Klein-Gordon equations of quantum mechanics 
are actually the wave equations set which are associated with particle’s angular momentum. 
In the trigintaduonion field T-X, the Dirac equation and Klein-Gordon equation can be 
attained respectively from the energy equation (26) and power equation (27) after substituting 
the operator K (X H-S / k
 H-S X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + ) for the operator 
(W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ).
The coefficients b
 H-S , b
 S-W , b
 E-G and b
 H-W are the Plank-like constant. 
The U equation of the quantum mechanics can be defined as 
U = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
* M        (28) 
The L equation of the quantum mechanics can be defined as 
L = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) U        (29) 
The four sorts of Dirac-like equations can be obtained from the Eqs.(21), (22), (23) and (24) 
respectively. 
The D equation of quantum mechanics can be defined as 
D = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
* X        (30) 
The G equation of quantum mechanics can be defined as 
G = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) D        (31) 
The T equation of quantum mechanics can be defined as 
T = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
* G        (32) 
The O equation of quantum mechanics can be defined as 
O = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) T        (33) 
In the trigintaduonion field T-X, the intermediate and field source particles can be obtained. 
We can find that the intermediate particles and other kinds of new and unknown particles may 
be existed in the nature. 
Table 4.  Quantum equations set of trigintaduonion field T-X 
Energy quantum 
U = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
Power quantum 
L = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) U
Field potential quantum 
D = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
Field strength quantum 
G = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) D
Field source quantum 
T = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
Force quantum 
O = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) T
4. Compounding field in trigintaduonion space T-A 
It is believed that strong-weak field, hyper-strong field, electromagnetic-gravitational field 
and hyper-weak field are unified, equal and interconnected. By means of the conception of the 
space expansion etc., four types of octonionic spaces can be combined into a trigintaduonion 
space T-A. In trigintaduonion space, some properties of eight sorts of interactions including 
strong, weak, electromagnetic and gravitational interactions etc. can be described uniformly. 
In the trigintaduonion space T-A, there exists one kind of field (trigintaduonion field T-A, 
for short) which is different to the trigintaduonion field T-X, can be obtained related to the 
operator (A/K + ). In the trigintaduonion space T-A, the base E T-A can be written as 
E T-A = E T-X                                (34) 
The displacement R T-A in trigintaduonion space T-A is 
R T-A = R T-X                               (35) 
The trigintaduonion differential operator T-A and its conjugate operator are defined as 
T-A = T-X   ,   
T-A =
T-X                     (36) 
In the trigintaduonion field T-A, by analogy with the octonion and sedenion fields, the 
trigintaduonion differential operator needs to be generalized to the operator (A H-S / k
 H-S 
A S-W / k
 S-W A E-G / k
 E-G +A H-W / k
 H-W + ). This is because the trigintaduonion field 
T-A includes hyper-strong, strong-weak, electromagnetic-gravitational and hyper-weak fields. 
It can be predicted that the eight sorts of interactions are interconnected each other. The 
physical features of each subfield in the trigintaduonion field T-A meet the requirements of 
the equations set in the Table 5. 
In the trigintaduonion field T-A, the field potential A = (a0 , a1 , a2 , a3 , a4 , a5 , a6 , a7 , a8 , 
a9 , a10 , a11 , a12 , a13 , a14 , a15 , a16 , a17 , a18 , a19 , a20 , a21 , a22 , a23 , a24 , a25 , a26 , a27 , a28 , a29 , 
a30 , a31 ) is defined as 
                  A = * X
= a0 + a1 e 1 + a2 e 2 + a3 e 3 + a4 e 4 + a5 e 5 + a6 e 6
+ a7 e 7 + a8 e 8 + a9 e 9 + a10 e 10 + a11 e 11
+ a12 e 12 + a13 e 13 + a14 e 14 + a15 e 15 + a16 e 16  
+ a17 e 17 + a18 e 18 + a19 e 19 + a20 e 20 + a21 e 21
+ a22 e 22 + a23 e 23 + a24 e 24 + a25 e 25 + a26 e 26
                      + a27 e 27 + a28 e 28 + a29 e 29 + a30 e 30 + a31 e 31         (37) 
where, the mark (*) denotes the trigintaduonion conjugate. X = XT-A = XT-X . 
The field strength B of the trigintaduonion field T-A can be defined as 
         B = (A/K + ) A
= (A H-S / k
 H-S A S-W / k
 S-W A E-G / k
 E-G +A H-W / k
 H-W + ) A    (38)
where, K= KT-A , k
 H-S , k
 S-W , k
 E-G and k
 H-W are coefficients in the trigintaduonion space. 
The field potentials are 
A H-S = a0 + a1 e 1 + a2 e 2 + a3 e 3 + a4 e 4 + a5 e 5 + a6 e 6 + a7 e 7
A S-W = a8 e 8 + a9 e 9 + a10 e 10 + a11 e 11 + a12 e 12 + a13 e 13 + a14 e 14 + a15 e 15
A E-G = a16 e 16 + a17 e 17 + a18 e 18 + a19 e 19 + a20 e 20 + a21 e 21 + a22 e 22 + a23 e 23
A H-W = a24 e 24 + a25 e 25 + a26 e 26 + a27 e 27 + a28 e 28 + a29 e 29 + a30 e 30 + a31 e 31
The field source and force of the trigintaduonion field T-A can be defined respectively as 
S = (A H-S / k
 H-S A S-W / k
 S-W A E-G / k
 E-G +A H-W / k
 H-W + )
* B      (39) 
Z = K (A H-S / k
 H-S A S-W / k
 S-W A E-G / k
 E-G +A H-W / k
 H-W + ) S     (40) 
where, the coefficient  is interaction intensity of the trigintaduonion field T-A. 
The angular momentum of trigintaduonion field can be defined as (k rx is the coefficient) 
M = S (r + k rx X)                           (41) 
and the energy and power in the trigintaduonion field can be defined respectively as 
W = K (X H-S / k
 H-S X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + )
* M    (42) 
N = K (X H-S / k
 H-S X S-W / k
 S-W X E-G / k
 E-G +X H-W / k
 H-W + ) W     (43) 
In the trigintaduonion space T-A, the wave functions of the quantum mechanics are the 
trigintaduonion equations set. The Dirac and Klein-Gordon equations of quantum mechanics 
are actually the wave equations set which are associated with particle’s angular momentum. 
In the trigintaduonion field T-A, the Dirac equation and the Klein-Gordon equation can be 
attained respectively from the energy equation (42) and power equation (43) after substituting 
the operator K (A H-S / k
 H-S A S-W / k
 S-W A E-G / k
 E-G +A H-W / k
 H-W + ) for the 
operator (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W 
+ ). The coefficients b
 H-S , b
 S-W , b
 E-G and b
 H-W are the Plank-like constant. 
Table 5.  Equations set of trigintaduonion field T-A 
Spacetime trigintaduonion space T-A 
X physical quantity X = XT-X
Field potential A = * X
Field strength B = (A H-S / k
 H-S A S-W / k
 S-W A E-G / k
 E-G +A H-W / k
 H-W + ) A
Field source S = (A H-S / k
 H-S A S-W / k
 S-W A E-G / k
 E-G +A H-W / k
 H-W + )
Force Z = K (A H-S / k
 H-S A S-W / k
 S-W A E-G / k
 E-G +A H-W / k
 H-W + ) S
Angular momentum M = S (r + k rx X)
Energy W = K (A H-S / k
 H-S A S-W / k
 S-W A E-G / k
 E-G +A H-W / k
 H-W + )
Power N = K (A H-S / k
 H-S A S-W / k
 S-W A E-G / k
 E-G +A H-W / k
 H-W + ) W
The U equation of the quantum mechanics can be defined as 
U = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
* M        (44) 
The L equation of the quantum mechanics can be defined as 
L = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) U        (45) 
Table 6.  Quantum equations set of trigintaduonion field T-A 
Energy quantum 
U = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
Power quantum 
L = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) U
Field strength quantum 
G = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) A
Field source quantum 
T = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
Force quantum 
O = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) T
The three sorts of Dirac-like equations can be obtained from Eqs.(38), (39) and (40) 
respectively. 
The G equation of the quantum mechanics can be defined as 
G = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) A        (46) 
The T equation of the quantum mechanics can be defined as 
T = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
* G        (47) 
The O equation of the quantum mechanics can be defined as 
O = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) T        (48) 
In the trigintaduonion field T-A, the intermediate and field source particles can be obtained. 
We can find that the intermediate particles and other kinds of new and unknown particles may 
be existed in the nature. 
5. Compounding field in trigintaduonion space T-B 
It is believed that electromagnetic-gravitational field, strong-weak field, hyper-strong field 
and hyper-weak field are unified, equal and interconnected. By means of the conception of the 
space expansion etc., four types of octonionic spaces can be combined into a trigintaduonion 
space T-B. In trigintaduonion space, some properties of eight sorts of interactions including 
strong, weak, electromagnetic and gravitational interactions etc. can be described uniformly. 
In the trigintaduonion space T-B, there exists one kind of field (trigintaduonion field T-B, 
for short) which is different to the trigintaduonion field T-X or T-A, can be obtained related to 
the operator (B/K + ). In the trigintaduonion space T-B, the base E T-B can be written as 
E T-B = E T-X                                 (49) 
The displacement R T-B in trigintaduonion space T-B is 
R T-B = R T-X                                (50) 
The trigintaduonion differential operator T-B and its conjugate operator are defined as 
T-B = T-X   ,   
T-B =
T-X                     (51) 
In the trigintaduonion field T-B, by analogy with the octonion and sedenion fields, the 
trigintaduonion differential operator needs to be generalized to the operator (B H-S / k
 H-S 
B S-W / k
 S-W B E-G / k
 E-G +B H-W / k
 H-W + ). This is because the trigintaduonion field T-B 
includes hyper-strong, strong-weak, electromagnetic-gravitational and hyper-weak fields. 
It can be predicted that the eight sorts of interactions are interconnected each other. The 
physical features of each subfield in the trigintaduonion field T-B meet the requirements of 
the equations set in the Table 7. 
In the trigintaduonion field T-B, the field potential A = (a0 , a1 , a2 , a3 , a4 , a5 , a6 , a7 , a8 , 
a9 , a10 , a11 , a12 , a13 , a14 , a15 , a16 , a17 , a18 , a19 , a20 , a21 , a22 , a23 , a24 , a25 , a26 , a27 , a28 , a29 , 
a30 , a31 ) is defined as 
                 A = * X
= a0 + a1 e 1 + a2 e 2 + a3 e 3 + a4 e 4 + a5 e 5 + a6 e 6
+ a7 e 7 + a8 e 8 + a9 e 9 + a10 e 10 + a11 e 11
+ a12 e 12 + a13 e 13 + a14 e 14 + a15 e 15 + a16 e 16  
+ a17 e 17 + a18 e 18 + a19 e 19 + a20 e 20 + a21 e 21
+ a22 e 22 + a23 e 23 + a24 e 24 + a25 e 25 + a26 e 26
                      + a27 e 27 + a28 e 28 + a29 e 29 + a30 e 30 + a31 e 31         (52) 
where, the mark (*) denotes the trigintaduonion conjugate. X = XT-B = XT-X . 
The field strength B of the trigintaduonion field T-B can be defined as 
B = A                               (53)
The field source of the trigintaduonion field T-B can be defined as 
S = (B/K + ) * B
= (B H-S / k
 H-S B S-W / k
 S-W B E-G / k
 E-G +B H-W / k
 H-W + )
* B      (54) 
where, K = KT-B , k
 H-S , k
 S-W , k
 E-G and k
 H-W are coefficients in the trigintaduonion space. 
The coefficient  is interaction intensity of trigintaduonion field T-B. The field strengths are 
B H-S = b0 + b1 e 1 + b2 e 2 + b3 e 3 + b4 e 4 + b5 e 5 + b6 e 6 + b7 e 7
B S-W = b8 e 8 + b9 e 9 + b10 e 10 + b11 e 11 + b12 e 12 + b13 e 13 + b14 e 14 + b15 e 15
B E-G = b16 e 16 + b17 e 17 + b18 e 18 + b19 e 19 + b20 e 20 + b21 e 21 + b22 e 22 + b23 e 23
B H-W = b24 e 24 + b25 e 25 + b26 e 26 + b27 e 27 + b28 e 28 + b29 e 29 + b30 e 30 + b31 e 31
The force of the trigintaduonion field T-B can be defined as 
Z = K (B H-S / k
 H-S B S-W / k
 S-W B E-G / k
 E-G +B H-W / k
 H-W + ) S     (55) 
The angular momentum of trigintaduonion field can be defined as (k rx is the coefficient) 
M = S (r + k rx X)                           (56) 
and the energy and power in the trigintaduonion field can be defined respectively as 
W = K (B H-S / k
 H-S B S-W / k
 S-W B E-G / k
 E-G +B H-W / k
 H-W + )
* M    (57) 
N = K (B H-S / k
 H-S B S-W / k
 S-W B E-G / k
 E-G +B H-W / k
 H-W + ) W     (58) 
In the trigintaduonion space T-B, the wave functions of the quantum mechanics are the 
trigintaduonion equations set. The Dirac and Klein-Gordon equations of quantum mechanics 
are actually the wave equations set which are associated with particle’s angular momentum. 
In the trigintaduonion field T-B, the Dirac equation and the Klein-Gordon equation can be 
attained respectively from the energy equation (57) and power equation (58) after substituting 
the operator K (B H-S / k
 H-S B S-W / k
 S-W B E-G / k
 E-G +B H-W / k
 H-W + ) for the operator 
(W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ). The 
coefficients b
 H-S , b
 S-W , b
 E-G and b
 H-W are the Plank-like constant. 
Table 7.  Equations set of trigintaduonion field T-B 
Spacetime trigintaduonion space T-B 
X physical quantity X = XT-X
Field potential A = * X
Field strength B = A
Field source S = (B H-S / k
 H-S B S-W / k
 S-W B E-G / k
 E-G +B H-W / k
 H-W + )
Force Z = K (B H-S / k
 H-S B S-W / k
 S-W B E-G / k
 E-G +B H-W / k
 H-W + ) S
Angular momentum M = S (r + k rx X)
Energy W = K (B H-S / k
 H-S B S-W / k
 S-W B E-G / k
 E-G +B H-W / k
 H-W + )
Power N = K (B H-S / k
 H-S B S-W / k
 S-W B E-G / k
 E-G +B H-W / k
 H-W + ) W
The U equation of the quantum mechanics can be defined as 
U = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
* M        (59) 
The L equation of the quantum mechanics can be defined as 
L = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) U        (60) 
The two sorts of Dirac-like equations can be obtained from the field source equation (54) 
and force equation (55) respectively. 
The T equation of the quantum mechanics can be defined as 
T = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
* B        (61) 
The O equation of the quantum mechanics can be defined as 
O = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) T        (62) 
Table 8.  Quantum equations set of trigintaduonion field T-B 
Energy quantum 
U = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
Power quantum 
L = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) U
Field source quantum 
T = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
Force quantum 
O = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) T
In the trigintaduonion field T-B, the intermediate and field source particles can be obtained. 
We can find that the intermediate particles and other kinds of new and unknown particles may 
be existed in the nature. 
6. Compounding field in trigintaduonion space T-S 
It is believed that hyper-weak field, electromagnetic-gravitational field, strong-weak field and 
hyper-strong field are unified, equal and interconnected. By means of the conception of the 
space expansion etc., four types of octonionic spaces can be combined into a trigintaduonion 
space T-S. In trigintaduonion space, some properties of eight sorts of interactions including 
strong, weak, electromagnetic and gravitational interactions etc. can be described uniformly. 
In the trigintaduonion space T-S, there exists one kind of field (trigintaduonion field T-S, 
for short) which is different to trigintaduonion field T-X, T-A or T-B, can be obtained related 
to operator (S/K + ). In the trigintaduonion space T-S, the base E T-S can be written as 
E T-S = E T-X                                 (63) 
The displacement R T-S in trigintaduonion space T-S is 
R T-S = R T-X                                (64) 
The trigintaduonion differential operator T-S and its conjugate operator are defined as 
T-S = T-X   ,   
T-S =
T-X                     (65) 
In the trigintaduonion field T-S, by analogy with the octonion and sedenion fields, the 
trigintaduonion differential operator needs to be generalized to a new operator (S H-S / k
 H-S 
S S-W / k
 S-W S E-G / k
 E-G +S H-W / k
 H-W + ). This is because the trigintaduonion field T-S 
includes the hyper-strong, strong-weak, electromagnetic-gravitational and hyper-weak fields. 
It can be predicted that the eight sorts of interactions are interconnected each other. The 
physical features of each subfield in the trigintaduonion field T-S meet the requirements of the 
equations set in the Table 9. 
In the trigintaduonion field T-S, the field potential A = (a0 , a1 , a2 , a3 , a4 , a5 , a6 , a7 , a8 , 
a9 , a10 , a11 , a12 , a13 , a14 , a15 , a16 , a17 , a18 , a19 , a20 , a21 , a22 , a23 , a24 , a25 , a26 , a27 , a28 , a29 , 
a30 , a31 ) is defined as 
                  A = * X
= a0 + a1 e 1 + a2 e 2 + a3 e 3 + a4 e 4 + a5 e 5 + a6 e 6
+ a7 e 7 + a8 e 8 + a9 e 9 + a10 e 10 + a11 e 11
+ a12 e 12 + a13 e 13 + a14 e 14 + a15 e 15 + a16 e 16  
+ a17 e 17 + a18 e 18 + a19 e 19 + a20 e 20 + a21 e 21
+ a22 e 22 + a23 e 23 + a24 e 24 + a25 e 25 + a26 e 26
                      + a27 e 27 + a28 e 28 + a29 e 29 + a30 e 30 + a31 e 31         (66) 
where, the mark (*) denotes the trigintaduonion conjugate. X = XT-S = XT-X . 
The field strength B of the trigintaduonion field T-S can be defined as 
B = A                               (67)
The field source of the trigintaduonion field T-S can be defined as 
S = * B                               (68) 
where, the coefficient  is interaction intensity of the trigintaduonion field T-S. 
The force of the trigintaduonion field T-S can be defined as 
Z = K (S H-S / k
 H-S S S-W / k
 S-W S E-G / k
 E-G +S H-W / k
 H-W + ) S     (69) 
where, K = KT-S , k
 H-S , k
 S-W , k
 E-G and k
 H-W are coefficients in the trigintaduonion space. 
And the field sources are 
S H-S = s0 + s1 e 1 + s2 e 2 + s3 e 3 + s4 e 4 + s5 e 5 + s6 e 6 + s7 e 7
S S-W = s8 e 8 + s9 e 9 + s10 e 10 + s11 e 11 + s12 e 12 + s13 e 13 + s14 e 14 + s15 e 15
S E-G = s16 e 16 + s17 e 17 + s18 e 18 + s19 e 19 + s20 e 20 + s21 e 21 + s22 e 22 + s23 e 23
S H-W = s24 e 24 + s25 e 25 + s26 e 26 + s27 e 27 + s28 e 28 + s29 e 29 + s30 e 30 + s31 e 31
Table 9.  Equations set of trigintaduonion field T-S 
Spacetime trigintaduonion space T-S 
X physical quantity X = XT-X
Field potential A = * X
Field strength B = A
Field source S = * B
Force Z = K (S H-S / k
 H-S S S-W / k
 S-W S E-G / k
 E-G +S H-W / k
 H-W + ) S
Angular momentum M = S (r + k rx X)
Energy W = K (S H-S / k
 H-S S S-W / k
 S-W S E-G / k
 E-G +S H-W / k
 H-W + )
Power N = K (S H-S / k
 H-S S S-W / k
 S-W S E-G / k
 E-G +S H-W / k
 H-W + ) W
The angular momentum of trigintaduonion field can be defined as (k rx is the coefficient) 
M = S (r + k rx X)                           (70) 
and the energy and power in the trigintaduonion field can be defined respectively as 
W = K (S H-S / k
 H-S S S-W / k
 S-W S E-G / k
 E-G +S H-W / k
 H-W + )
* M    (71) 
N = K (S H-S / k
 H-S S S-W / k
 S-W S E-G / k
 E-G +S H-W / k
 H-W + ) W     (72) 
In the trigintaduonion space T-S, the wave functions of the quantum mechanics are the 
trigintaduonion equations set. The Dirac and Klein-Gordon equations of quantum mechanics 
are actually the wave equations set which are associated with particle’s angular momentum. 
In the trigintaduonion field T-S, the Dirac equation and the Klein-Gordon equation can be 
attained respectively from the energy equation (71) and power equation (72) after substituting 
the operator K (S H-S / k
 H-S S S-W / k
 S-W S E-G / k
 E-G +S H-W / k
 H-W + ) for the new 
operator (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W 
+ ). The coefficients b
 H-S , b
 S-W , b
 E-G and b
 H-W are the Plank-like constant. 
The U equation of the quantum mechanics can be defined as 
U = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
* M        (73) 
The L equation of the quantum mechanics can be defined as 
L = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) U        (74) 
The Dirac-like equation can be obtained from the force equation (69). The O equation of 
the quantum mechanics can be defined as 
O = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) S        (75) 
Table 10.  Quantum equations set of trigintaduonion field T-S 
Energy quantum 
U = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + )
Power quantum 
L = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) U
Force quantum 
O = (W H-S / k
 H-S b
 H-S W S-W / k
 S-W b
 S-W  
W E-G / k
 E-G b
 E-G +W H-W / k
 H-W b
 H-W + ) S
In the trigintaduonion field T-S, the intermediate and field source particles can be obtained. 
We can find that the intermediate particles and other kinds of new and unknown particles may 
be existed in the nature. 
7. Special case of compounding field in trigintaduonion space 
It is believed that different sorts of interactions are all unified, equal and interconnected. By 
means of the conception of the space expansion etc., four types of the octonionic spaces can 
be combined into a trigintaduonion space T-C. In the trigintaduonion space, some properties 
of eight sorts of interactions including the strong, weak, electromagnetic and gravitational 
interactions etc. can be described uniformly. 
In the trigintaduonion space T-C, there exists one kind of field (trigintaduonion field T-C, 
for short) which is the special case of the trigintaduonion fields T-X, T-A, T-B or T-S, can be 
obtained related to the operator .
In the trigintaduonion space T-C, the base E T-C can be written as 
E T-C = E T-X                                 (76) 
The displacement R T-C in trigintaduonion space T-C is 
R T-C = R T-X                                (77) 
The trigintaduonion differential operator T-C and its conjugate operator are defined as 
T-C = T-X   ,   
T-C =
T-X                     (78) 
It can be predicted that the eight sorts of interactions are interconnected each other. The 
physical features of each subfield in the trigintaduonion field T-C meet the requirements of 
the equations set in the Table 11. 
In the trigintaduonion field T-C, the field potential A = (a0 , a1 , a2 , a3 , a4 , a5 , a6 , a7 , a8 , 
a9 , a10 , a11 , a12 , a13 , a14 , a15 , a16 , a17 , a18 , a19 , a20 , a21 , a22 , a23 , a24 , a25 , a26 , a27 , a28 , a29 , 
a30 , a31 ) is defined as 
A = * X                               (79) 
where, the mark (*) denotes the trigintaduonion conjugate. X = XT-C = XT-X . 
The field strength B of the trigintaduonion field T-C can be defined as 
B = A                               (80)
The field source of the trigintaduonion field T-C can be defined as 
S = * B                               (81) 
where, the coefficient  is interaction intensity of the trigintaduonion field T-C. 
The force of the trigintaduonion field T-C can be defined as 
Z = K S                             (82) 
where, K = KT-C is the coefficient in the trigintaduonion space.  
The angular momentum of trigintaduonion field can be defined as (k rx is the coefficient) 
M = S (r + k rx X)                          (83) 
and the energy and power in the trigintaduonion field can be defined respectively as 
W = K * M                             (84) 
N = K W                             (85) 
Table 11.  Equations set of trigintaduonion field T-C 
Spacetime trigintaduonion space T-C 
X physical quantity X = XT-X
Field potential A = * X
Field strength B = A
Field source S = * B
Force Z = K S
Angular momentum M = S (r + k rx X)
Energy W = K * M
Power N = K W
In the trigintaduonion space T-C, the wave functions of the quantum mechanics are the 
trigintaduonion equations set. The Dirac and Klein-Gordon equations of quantum mechanics 
are actually the wave equations set which are associated with particle’s angular momentum 
M = b . The coefficient b is the Plank-like constant. 
In the trigintaduonion field T-C, the Dirac equation and the Klein-Gordon equation can be 
attained respectively from the energy equation (84) and the power equation (85). 
The U equation of the quantum mechanics can be defined as 
U = (b )*  (M / b)                              (86) 
The L equation of the quantum mechanics can be defined as 
L = (b )  (U / b)                               (87) 
The four sorts of Dirac-like equations can be obtained from Eqs.(79), (80), (81) and (82) 
respectively. 
The D equation of the quantum mechanics can be defined as 
D = (b )*  (X / b)                               (88) 
The G equation of the quantum mechanics can be defined as 
G = (b )  (D / b)                               (89) 
The T equation of the quantum mechanics can be defined as 
T = (b )*  (G / b)                               (90) 
The O equation of the quantum mechanics can be defined as 
O = (b )  (T / b)                                (91) 
Table 12.  Quantum equations set of trigintaduonion field T-C 
Energy quantum U = (b )*  (M /b)
Power quantum L = (b )  (U /b)
Field potential quantum D = (b )*  (X /b)
Field strength quantum G = (b )  (D /b)
Field source quantum T = (b )*  (G /b)
Force quantum O = (b )  (T /b)
8. Conclusions 
By analogy with the four sorts of octonionic fields and twelve sorts of sedenion fields, four 
sorts of trigintaduonion fields and their special case have been developed, including their field 
equations, quantum equations and some new unknown particles. 
In trigintaduonion field T-X, the study deduces the Dirac equation, Schrodinger equation, 
Klein-Gordon equation and some newfound equations of sub-quarks etc. It infers four sorts of 
Dirac-like equations of intermediate particles among sub-quarks etc. It predicts that there are 
some new particles of field sources (sub-quarks etc.) and their intermediate particles. 
In trigintaduonion field T-A, the paper draws the Yang-Mills equation, Dirac equation, 
Schrodinger equation and Klein-Gordon equation of the quarks and leptons etc. It infers three 
sorts of Dirac-like equations of intermediate particles among quarks and leptons. It draws 
some conclusions of field source particles and intermediate particles which are consistent 
with current electro-weak theory. It predicts that there are some new unknown particles of 
field sources (quarks and leptons) and their intermediate particles. 
In trigintaduonion field T-B, the research infers the Dirac equation, Schrodinger equation, 
Klein-Gordon equation and some newfound equations of electrons and masses etc. It deduces 
two sorts of Dirac-like equations of intermediate particles among electrons and masses etc. It 
draws some conclusions of field source particles and intermediate particles which are 
consistent with current electromagnetic and gravitational theories etc. It predicts that there are 
some new unknown particles of field sources (electrons and masses etc.) and their 
intermediate particles. 
In trigintaduonion field T-S, the thesis concludes the Dirac equation, Schrodinger equation 
and Klein-Gordon equation of the galaxies etc. It infers Dirac-like equation of intermediate 
particles among galaxies. It predicts that there are some new unknown particles of field 
sources and their intermediate particles. 
In the trigintaduonion field theory, we can find that the interplays among all eight sorts of 
interactions are much more mysterious and complicated than we found and imagined before. 
Acknowledgements 
The author thanks Shaohan Lin, Minfeng Wang, Yun Zhu, Zhimin Chen and Xu Chen for 
helpful discussions. This project was supported by National Natural Science Foundation of 
China under grant number 60677039, Science & Technology Department of Fujian Province 
of China under grant number 2005HZ1020 and 2006H0092, and Xiamen Science & 
Technology Bureau of China under grant number 3502Z20055011. 
References 
[1] Zihua Weng. Octonionic electromagnetic and gravitational interactions and dark matter. 
arXiv: physics /0612102. 
[2] Zihua Weng. Octonionic quantum interplays of dark matter and ordinary matter. arXiv: 
physics /0702019. 
[3] Zihua Weng. Octonionic strong and weak interactions and their quantum equations. arXiv: 
physics /0702054. 
[4] Zihua Weng. Octonionic hyper-strong and hyper-weak fields and their quantum equations. 
arXiv: physics /0702086. 
[5] Zihua Weng. Compounding fields and their quantum equations in the sedenion space. 
arXiv: physics /0703055. 
[6] S. Kuwata, H. Fujii, A. Nakashima. Alternativity and reciprocity in the Cayley-Dickson 
algebra. J. Phys. A, 39 (2006) 1633-1644. 
[7] Yongge Tian, Yonghui Liu. On a group of mixed-type reverse-order laws for generalized 
inverses of a triple matrix product with applications. J. Linear Algebra, 16 (2007) 73-89.
ABSTRACT
  The 32-dimensional compounding fields and their quantum interplays in the
trigintaduonion space can be presented by analogy with octonion and sedenion
electromagnetic, gravitational, strong and weak interactions. In the
trigintaduonion fields which are associated with the electromagnetic,
gravitational, strong and weak interactions, the study deduces some conclusions
of field source particles (quarks and leptons) and intermediate particles which
are consistent with current some sorts of interaction theories. In the
trigintaduonion fields which are associated with the hyper-strong and
strong-weak fields, the paper draws some predicts and conclusions of the field
source particles (sub-quarks) and intermediate particles. The research results
show that there may exist some new particles in the nature.

<|endoftext|><|startoftext|>
Topological defects, geometric phases, and the angular momentum of light
S. C. Tiwari
Institute of Natural Philosophy
c/o 1 Kusum Kutir Mahamanapuri,Varanasi 221005, India
Recent reports on the intriguing features of vector vortex bearing beams are analyzed using
geometric phases in optics. It is argued that the spin redirection phase induced circular birefringence
is the origin of topological phase singularities arising in the inhomogeneous polarization patterns.
A unified picture of recent results is presented based on this proposition. Angular momentum
shift within the light beam (OAM) has exact equivalence with the angular momentum holonomy
associated with the geometric phase consistent with our conjecture.
PACS numbers: 42.25.-p, 41.20.Jb, 03.65.Vf
Topological defects in continuous field theoretic frame-
work are usually associated with field singularities, how-
ever in analogy with crystal defects wavefront disloca-
tions for scalar waves [1] and disclinations for vector
waves [2] have been discussed in the literature. An im-
portant advancement was the realization that topological
charge was related with the orbital angular momentum
(OAM) of finite sized (transverse) light beams: typically
for the Laguerre-Gaussian (LG) beams helicoidal spatial
structure of the wavefront with azimuthal phase exp(ilφ)
gives rise to OAM per photon of lh̄ where l is the topolog-
ical charge, see review [3]. Adopting the fluid dynamical
paradigm topological defects in optics are termed vor-
tices; singularities in the polarization patterns are called
vector vortices [4].
The aim of this Letter is to present a unified picture
of the underlying physics of intriguing aspects of recent
reports [5, 6, 7] in terms of the transformation of topo-
logical charges due to spin redirection phase (SRP) such
that OAM is exchanged within the beams [8]. The role
of Pancharatnam phase (PP) invoked in [4, 6, 7] is also
critically examined.
For the sake of clarity we briefly review the essentials
of geometric phases (GP) in optics which are primarily
of two types, see [8, 9] for details and original references.
Rytov-Vladimirskii phase rediscovered by Chiao and Wu
in 1986 (inspired by the Berry phase) arises in the wave
vector or momentum space of light. The unit wave vector
κ and polarization vector ǫ(κ) describe the intrinsic prop-
erties of the light wave. A plane wave propagating along
a slowly varying path in the real space can be mapped on
to the surface of a unit sphere in the wave vector space,
and under parallel transport along a curve in this space
preserving the spin helicity, ǫ.κ, the polarization vector is
found to be rotated after the completion of a closed cycle
on the sphere. The magnitude of the rotation is given
by the solid angle enclosed by the cycle, and the sign is
determined by the initial polarization state. Since left
circular |L > and right circular |R > polarization states
acquire equal but opposite geometric phases, Berry terms
this effect as geometric circular birefringence [10].
A polarized light wave propagating in a fixed direction
passed through optical elements traversing a polariza-
tion cycle on the Poincare sphere acquires Pancharatnam
phase equal to half of the solid angle of the cycle. Berry
points out that [11] Pancharatnam actually made two
important contributions. One,a notion of Pancharatnam
connection was introduced for the phase difference be-
tween two arbitrary nonorthogonal polarizations which
can be written as arg(E1
∗.E2) for complex electric field
vectors. Secondly this connection is nontransitive result-
ing into the Pancharatnam phase for a geodesic triangle
on the sphere. Note that a parallel transport on the
Poincare sphere is made with fixed direction of propaga-
tion for the occurrence of PP.
In the case of space varying polarization patterns,
defining a direction of propagation is not easy, however
Nye [12] has used Pancharatnam connection to define
propagation vector kδ as a gradient of phase difference
between fields at spatial locations r1 and r2 given by
dδ = Im(E∗.dE)/|E|2 (1)
In [4] authors correctly use Pancharatnam connection to
obtain the phase difference of light at two locations in
the space varying polarization plane, however the GP
involved is not Pancharatnam phase as one cannot com-
plete polarization cycle without changing the wave vec-
tor direction. As discussed above we have to construct
appropriate wave vector space for the case of vector vor-
tices. For an initial beam propagating along z-axis, at
each point (r, φ) on the inhomogeneous polarization plane
there will correspond a k-space, and spin helicity preserv-
ing parallel transport will give SRP for closed cycles. It
is known that for a linearly polarized plane wave repre-
sented by
|P >= e−iα|R > +eiα|L > (2)
the SRP corresponding to a cycle with solid angle Ω re-
sults into [10]
|P t >= e
−i(α+Ω)|R > +ei(α+Ω)|L > (3)
For the optical vortices this equation has to be general-
ized: we suggest a spatially evolving GP embodied in the
solid angle as a function of (r, φ). This is one of the main
contributions of this Letter leading to Eq.(4) below. For
http://arxiv.org/abs/0704.0137v1
the special case in which only the azimuthal angle de-
pendence matters we obtain Ω in the following way. In
the transverse plane consider a point A on a circle spec-
ified by φ, then the area enclosed by the great circles
in k-space corresponding to this point and the reference
point O specified by φ = 0 would subtend a solid an-
gle which varies linearly with φ; the solid angle will vary
from 0 to 4π as φ varies from 0 to 2π. Thus we obtain
the generalization of Eq.(3) to
|P v >= e
−i(α+2φ)|R > +ei(α+2φ|L > (4)
Spatially evolving SRP [13] is crucial to understand in-
teresting features observerd in vector vortices; we state
our second principal contribution in the form of a propo-
sition.
Proposition: Geometric phase induced circular bire-
fringence is the origin of topological charge transforma-
tion in vector vortex carrying beams, and angular mo-
mentum holonomy is manifested as OAM.
We demonstrate in the following that this proposition
offers transparent physical mechanism to explain the re-
cent reports on inhomogeneous polarization patterns.
Backscattered polarization [5] : Theory and experi-
ments on the backscattered light for linearly polarized
light from random media have been of current interest.
Interesting features for the backscattering geometry have
been observed. Authors of [5] give insightful treatment
of the observations invoking GP in wave vector space,
and this is in agreement with our proposition. Note that
backscattered light wave vector could be treated similar
to the discrete transformation for reflection from a mir-
ror, see discussion in [9] one can envisage an adiabatic
path in a modified k-space. It may be noted that essen-
tially spatially evolving SRP is used in [5]. It seems the
term ’geometrical phase vortex’ introduced by them for
scalar vortices appearing in space varying polarization
pattern is quite revealing.
The q-plate experiment [6]: An inhomogeneous
anisotropic optical element called q-plate has novel addi-
tion to HWP : inhomogeneity is introduced orienting the
fast (or slow) optical axis making an angle of α with
the x-axis in the xy-plane for a planar slab given by
α(r, φ) = qφ+α0. Jones calculus applied at each point of
the q-plate shows that the output beam for an incident
|L > state is not only converted to |R > state but it also
acquires an azimuthal phase factor of exp(2iqφ). Simi-
lar to the LG beams this phase is interpreted as OAM
of 2qh̄ per photon in the output wave. Experiment is
carried out using nematic liquid crystal planar cell for
q = 1 and the measurements on the interference pattern
formed by the superposition of the output beam with the
reference beam display the wave front singularities and
helical modes in the output beam.
We argue that in the light of our proposition SRP in-
duced circular birefringence is the origin of topological
phase singularity and OAM in q-plates. We picture he-
licity preserving transformation in the wave vector space
defined by kδ. Since the polarization variation is confined
in the transverse plane for the q-plates the constraint of
the spin helicity preserving process in the q-plate with
the azimuthal dependence of α would lead to a spiral
path for the wave vector. In q=1 plate the circular plus
linear propagation along z-axis will result into a helical
path and the width of the plate ensures that the input
and output ends of the helix are parallel. On the unit
sphere in the wave vector space this will correspond to a
great circle, and the solid angle would be 2π. Since the
SRP equals the solid angle for the evolving paths, our
Eq.(4) above is in agreement with Eq.(3) of [6]. The im-
portant observation emphasized by the authors that the
incident polarization controls the sign of the orbital helic-
ity or topological charge is easily explained in view of the
property of the geometric birefingence in which handed-
ness decides the sign of the phase. Thus both magnitude
and sign of the azimuthal phase have been explained in
accordance with our proposition.
Tightly focused beams [7]: Analysis of the light field
at the focal plane of a high numerical aperture lens for
the incoming circularly polarized plane wave shows the
existence of inhomogeneous polarization pattern. Pan-
charatnam connection at two different points on the cir-
cle around the focus shows φ-dependence of the phase
difference. The field can be decomposed into |L > and
|R > states, Eq.(7) in [7], and it is found that the compo-
nent with the spin same as that in the object plane does
not change phase while the one with opposite spin ac-
quires an azimuthal phase of 2φ, i. e. topological charge
2 and OAM of 2h̄ per photon. Application of Eq. (4)
immediately leads to this result in conformity with our
proposition. We may remark that the construction used
by the authors to derive PP, namely the geodesic trian-
gle on the Poincare sphere formed by the pole, E(r, 0),
and E(r, φ) cannot be completed with a fixed direction of
propagation for space varying polarization pattern, and
therefore it is SRP not PP that arises.
Having established first part of the proposition, we dis-
cuss the role of angular momentum (AM) holonomy con-
jecture [8, 9]. Transfer of spin AM of light to matter
was measured long ago by Beth [14], and there are many
reports of OAM transfer to particles in recent years [3].
Since polarization cycle for PP requires spin exchange
with optical elements, it is natural to envisage a role of
AM in GP; however it would be trivial. In the AM holon-
omy conjecture, we argued AM level shifts within the
light beams as physical mechanism for GP. This implies
exact equivalence between AM shift and GP. Indirectly
the backscattered light experiments and their interpre-
tation in terms of SRP supports our conjecture. In the
context of the AM conservation [5] the redistribution of
total AM within the beam is also indicative of AM level
shifts. The q=1 plate is a special optical element in which
no transfer of AM to the crystal takes place, and total
AM is conserved within the light beam. We argue that
spin is intrinsic, and the OAM is a manifestation of the
GP with exact equivalence between them in this case.
The counter-intuitive interpretation in terms of spin to
OAM conversion claimed in [6] is clearly ruled out. In
fact, Marrucci et al experiment offers first direct evidence
in support of our conjecture. It is remarkable that the
light field structure calculated for tightly focused beams
shows strong resemblence with the action of q-plates on
light wave, and offers another setting to test our propo-
sition.
To conclude the Letter we make few observations. First
let us note that even without the existence of phase singu-
larities it should be possible to exchange AM within the
light beam accompanied with GP: as argued earlier trans-
verse shifts in the beam would account for the change in
OAM [9]. Secondly the interplay of evolving GP in space
and time domains could be of interest. A simple rotat-
ing q-plate experiment is suggested: polarized light beam
after traversing the q-plate is made to pass through a ro-
tating HWP. Another variant with nonintegral q for this
arrangement, i.e. q-plate plus rotating HWP, is also sug-
gested. Analysis of the emerging beams may delineate
the role of SRP and PP as well as provide further test to
AM holonomy conjecture.
The physical mechanism proposed here for space vary-
ing polarization pattern of light could find important
application in ’all optical information processing’. The
angular momentum holonomy associated with GP, and
the strong evidence of its proof discussed here will have
significance in the context of the controversy surround-
ing ’the hidden momentum’ and Aharanov-Bohm effect.
We believe present ideas also hold promise to address
some fundamental questions in physics. An important
recent example is that of birefringence of the vacuum
in quantum electrodynamics in strong external magnetic
field. Though this has been known since long, last year
PVLAS experiment reported polarization rotation [15]
apparently very much in excess than the expected one.
This has led to a controversy on the interpretation of
QED birefringence in external rotating magnetic field,
see [16] for a short review. As remarked by Adler essen-
tially it involves light wave propagation in a nontrivial
refracive media, and he finds that to first order there
should be no rotation of the polarization of light. Could
there be a role of GP in this case? It would be interesting
to use Pancharatnam connection to calculate the phase of
propagating light, and see if evolving GP in time domain
will arise due to rotating magnetic field. It is interesting
to note that the magnetic field direction rotates in the
plane transverse to the direction of the propagation of
the light. Obviously it would give additional polariza-
tion rotation. This problem is being investigated, and
will be reported elewhere.
The Library facility at Banaras Hindu University is
acknowledged.
[1] J. F. Nye and M. V. Berry, Dislocations in wave trains,
Proc. R. Soc. Lond. A 336,165 (1974).
[2] J. F. Nye, Polarization effects in the diffraction of electro-
magnetic waves: the role of disclinations, Proc. R. Soc.
Lond. A 387, 105 (1983).
[3] L. Allen, M. J. Padgett, and M. Babiker, The orbital
angular momentum of light, Prog. Opt. 39,291 (1999).
[4] A. Niv, G. Biener, V. Kleiner, and E. Hasman, Manip-
ulation of the Pancharatnam phase in vectorial vortices,
Opt. Express, 14, 4208 (2006).
[5] C. Schwartz and A. Dogariu, Backscattered polarization
patterns, optical vortices, and the angular momentum of
light, Opt. Lett. 31,1121(2006).
[6] L. Marrucci, C. Manzo, and D. Paparo, Optical spin-to-
orbital angular momentum conversion in inhomogeneous
anisotropic media, Phys. Rev. Lett. 96,163905(2006).
[7] Z. Bomzon, M. Gu, and J. Shamir, Angular momentum
and geometrical phases in tight-focused circularly polar-
ized plane waves, Appl. Phys. Lett. 89,241 (2006).
[8] S. C. Tiwari, Geometric phase in optics: quantal or clas-
sical?, J. Mod. Opt. 39,1097(1992).
[9] S. C. Tiwari, Geometric phase in optics and angular mo-
mentum of light, J.Mod. Opt. 51,2297(2004).
[10] M. V. Berry, Quantum adiabatic anholonomy, Lectures
Ferrara School on Anomalies, defects, phases..., June
1989.
[11] M. V. Berry, The adiabatic phase and Pancharatnam’s
phase for polarized light, J. Mod. Opt. 34, 1401 (1987).
[12] J. F. Nye, Phase gradient and crystal-like geometry in
electromagnetic and elastic wavefields, in Sir Charles
Frank OBE, FRS:An eightieth birthday tribute (IOP,UK
1991)pp220-231.
[13] S. C. Tiwari, Nature of the angular momentum of
light: rotational energy and geometric phase, arxiv.org
: quant-ph/0609015.
[14] R. A. Beth, Direct detection of the angular momentum
of light, Phys. Rev. 48, 471 (1935).
[15] E. Zavattini et al, Experimental observation of optical
rotation generated in vacuum by a magnetic field, Phys.
Rev. Lett. 96, 110406 (2006).
[16] S. L. Adler, Vacuum birefringence in a rotating magnetic
field, J. Phys. A: Math. Theor. 40, F143 (2007).
http://arxiv.org/abs/quant-ph/0609015
ABSTRACT
  Recent reports on the intriguing features of vector vortex bearing beams are
analyzed using geometric phases in optics. It is argued that the spin
redirection phase induced circular birefringence is the origin of topological
phase singularities arising in the inhomogeneous polarization patterns. A
unified picture of recent results is presented based on this proposition.
Angular momentum shift within the light beam (OAM) has exact equivalence with
the angular momentum holonomy associated with the geometric phase consistent
with our conjecture.

<|endoftext|><|startoftext|>
Circular and non-circular nearly horizon-skimming orbits in Kerr spacetimes
Enrico Barausse∗
SISSA, International School for Advanced Studies and INFN, Via Beirut 2, 34014 Trieste, Italy
Scott A. Hughes
Department of Physics and MIT Kavli Institute, MIT, 77 Massachusetts Ave., Cambridge, MA 02139 USA
Luciano Rezzolla
Max-Planck-Institut für Gravitationsphysik, Albert-Einstein-Institut, 14476 Potsdam, Germany and
Department of Physics, Louisiana State University, Baton Rouge, LA 70803 USA
(Dated: November 1, 2018)
We have performed a detailed analysis of orbital motion in the vicinity of a nearly extremal Kerr black hole.
For very rapidly rotating black holes — spin parameter a ≡ J/M > 0.9524M — we have found a class of
very strong field eccentric orbits whose orbital angular momentum Lz increases with the orbit’s inclination with
respect to the equatorial plane, while keeping latus rectum and eccentricity fixed. This behavior is in contrast
with Newtonian intuition, and is in fact opposite to the “normal” behavior of black hole orbits. Such behavior
was noted previously for circular orbits; since it only applies to orbits very close to the black hole, they were
named “nearly horizon-skimming orbits”. Our current analysis generalizes this result, mapping out the full
generic (inclined and eccentric) family of nearly horizon-skimming orbits. The earlier work on circular orbits
reported that, under gravitational radiation emission, nearly horizon-skimming orbits exhibit unusual inspiral,
tending to evolve to smaller orbit inclination, toward prograde equatorial configuration. Normal orbits, by
contrast, always demonstrate slowly growing orbit inclination — orbits evolve toward the retrograde equatorial
configuration. Using up-to-date Teukolsky-based fluxes, we have concluded that the earlier result was incorrect
— all circular orbits, including nearly horizon-skimming ones, exhibit growing orbit inclination under radiative
backreaction. Using kludge fluxes based on a Post-Newtonian expansion corrected with fits to circular and
to equatorial Teukolsky-based fluxes, we argue that the inclination grows also for eccentric nearly horizon-
skimming orbits. We also find that the inclination change is, in any case, very small. As such, we conclude that
these orbits are not likely to have a clear and peculiar imprint on the gravitational waveforms expected to be
measured by the space-based detector LISA.
PACS numbers: 04.30.-w
I. INTRODUCTION
The space-based gravitational-wave detector LISA [1] will
be a unique tool to probe the nature of supermassive black
holes (SMBHs), making it possible to map in detail their
spacetimes. This goal is expected to be achieved by observing
gravitational waves emitted by compact stars or black holes
with masses µ ≈ 1 − 100M⊙ spiraling into the SMBHs
which reside in the center of galaxies [2] (particularly the low
end of the galactic center black hole mass function, M ≈
105− 107M⊙). Such events are known as extreme mass ratio
inspirals (EMRIs). Current wisdom suggests that several tens
to perhaps of order a thousand such events could be measured
per year by LISA [3].
Though the distribution of spins for observed astrophysi-
cal black holes is not very well understood at present, very
rapid spin is certainly plausible, as accretion tends to spin-
up SMBHs [4]. Most models for quasi-periodic oscillations
(QPOs) suggest this is indeed the case in all low-mass x-ray
binaries for which data is available [5]. On the other hand,
continuum spectral fitting of some high-mass x-ray binaries
indicates that modest spins (spin parameter a/M ≡ J/M2 ∼
∗Electronic address: barausse@sissa.it
0.6− 0.8) are likewise plausible [6]. The continuum-fit tech-
nique does find an extremely high spin of a/M & 0.98 for the
galactic “microquasar” GRS1915+105 [7]. This argues for a
wide variety of possible spins, depending on the detailed birth
and growth history of a given black hole.
In the mass range corresponding to black holes in galactic
centers, measurements of the broad iron Kα emission line in
active galactic nuclei suggest that SMBHs can be very rapidly
rotating (see Ref. [8] for a recent review). For instance, in
the case of MCG-6-30-15, for which highly accurate observa-
tions are available, a has been found to be larger than 0.987M
at 90% confidence [9]. Because gravitational waves from EM-
RIs are expected to yield a very precise determination of the
spins of SMBHs [10], it is interesting to investigate whether
EMRIs around very rapidly rotating black holes may possess
peculiar features which would be observable by LISA. Should
such features exist, they would provide unambiguous infor-
mation on the spin of SMBHs and thus on the mechanisms
leading to their formation [11].
For extremal Kerr black holes (a = M ), the existence of
a special class of “circular” orbits was pointed out long ago
by Wilkins [12], who named them “horizon-skimming” or-
bits. (“Circular” here means that the orbits are of constant
Boyer-Lindquist coordinate radius r.) These orbits have vary-
ing inclination angle with respect to the equatorial plane and
have the same coordinate radius as the horizon, r = M . De-
http://arxiv.org/abs/0704.0138v2
mailto:barausse@sissa.it
spite this seemingly hazardous location, it can be shown that
all these r = M orbits have finite separation from one another
and from the event horizon [13]. Their somewhat pathological
description is due to a singularity in the Boyer-Lindquist co-
ordinates, which collapses a finite span of the spacetime into
r = M .
Besides being circular and “horizon-skimming”, these or-
bits also show peculiar behavior in their relation of angular
momentum to inclination. In Newtonian gravity, a generic or-
bit has Lz = |L| cos ι, where ι is the inclination angle relative
to the equatorial plane (going from ι = 0 for equatorial pro-
grade orbits to ι = π for equatorial retrograde orbits, passing
through ι = π/2 for polar orbits), and L is the orbital angular
momentum vector. As a result, ∂Lz(r, ι)/∂ι < 0, and the an-
gular momentum in the z-direction always decreases with in-
creasing inclination if the orbit’s radius is kept constant. This
intuitively reasonable decrease of Lz with ι is seen for almost
all black hole orbits as well. Horizon-skimming orbits, by
contrast, exhibit exactly the opposite behavior: Lz increases
with inclination angle.
Reference [14] asked whether the behavior ∂Lz/∂ι > 0
could be extended to a broader class of circular orbits than just
those at the radius r = M for the spin value a = M . It was
found that this condition is indeed more general, and extended
over a range of radius from the “innermost stable circular or-
bit” to r ≃ 1.8M for black holes with a > 0.9524M . Or-
bits that show this property have been named “nearly horizon-
skimming”. The Newtonian behavior ∂Lz(r, ι)/∂ι < 0 is
recovered for all orbits at r & 1.8M [14].
A qualitative understanding of this behavior comes from
recalling that very close to the black hole all physical pro-
cesses become “locked” to the hole’s event horizon [15], with
the orbital motion of point particles coupling to the horizon’s
spin. This locking dominates the “Keplerian” tendency of an
orbit to move more quickly at smaller radii, forcing an or-
biting particle to slow down in the innermost orbits. Lock-
ing is particularly strong for the most-bound (equatorial) or-
bits; the least-bound orbits (which have the largest inclination)
do not strongly lock to the black hole’s spin until they have
very nearly reached the innermost orbit [14]. The property
∂Lz(r, ι)/∂ι > 0 reflects the different efficiency of nearly
horizon-skimming orbits to lock with the horizon.
Reference [14] argued that this behavior could have ob-
servational consequences. It is well-known that the incli-
nation angle of an inspiraling body generally increases due
to gravitational-wave emission [16, 17]. Since dLz/dt <
0 because of the positive angular momentum carried away
by the gravitational waves, and since “normal” orbits have
∂Lz/∂ι < 0, one would expect dι/dt > 0. However, if
during an evolution ∂Lz/∂ι switches sign, then dι/dt might
switch sign as well: An inspiraling body could evolve towards
an equatorial orbit, signalling the presence of an “almost-
extremal” Kerr black hole.
It should be emphasized that this argument is not rigorous.
In particular, one needs to consider the joint evolution of or-
bital radius and inclination angle; and, one must include the
dependence of these two quantities on orbital energy as well
as angular momentum1. As such, dι/dt depends not only on
dLz/dt and ∂Lz/∂ι, but also on dE/dt, ∂E/∂ι, ∂E/∂r and
∂Lz/∂r.
In this sense, the argument made in Ref. [14] should be
seen as claiming that the contribution coming from dLz/dt
and ∂Lz/∂ι are simply the dominant ones. Using the nu-
merical code described in [17] to compute the fluxes dLz/dt
and dE/dt, it was then found that a test-particle on a circu-
lar orbit passing through the nearly horizon-skimming region
of a Kerr black hole with a = 0.998M (the value at which
a hole’s spin tends to be buffered due to photon capture from
thin disk accretion [19]) had its inclination angle decreased
by δι ≈ 1◦ − 2◦ [14] in the adiabatic approximation [20].
It should be noted at this point that the rate of change of in-
clination angle, dι/dt, appears as the difference of two rel-
atively small and expensive to compute rates of change [cf.
Eq. (3.8) of Ref. [17]]. As such, small relative errors in those
rates of change can lead to large relative errors in dι/dt. Fi-
nally, in Ref. [14] it was speculated that the decrease could
be even larger for eccentric orbits satisfying the condition
∂Lz/∂ι > 0, possibly leading to an observable imprint on
EMRI gravitational waveforms.
The main purpose of this paper is to extend Ref. [14]’s anal-
ysis of nearly horizon-skimming orbits to include the effect
of orbital eccentricity, and to thereby test the speculation that
there may be an observable imprint on EMRI waveforms of
nearly horizon-skimming behavior. In doing so, we have re-
visited all the calculations of Ref. [14] using a more accurate
Teukolsky solver which serves as the engine for the analysis
presented in Ref. [21].
We have found that the critical spin value for circular
nearly horizon-skimming orbits, a > 0.9524M , also delin-
eates a family of eccentric orbits for which the condition
∂Lz(p, e, ι)/∂ι > 0 holds. (More precisely, we consider vari-
ation with respect to an angle θinc that is easier to work with
in the extreme strong field, but that is easily related to ι.) The
parameters p and e are the orbit’s latus rectum and eccentric-
ity, defined precisely in Sec. II. These generic nearly horizon-
skimming orbits all have p . 2M , deep in the black hole’s
extreme strong field.
We next study the evolution of these orbits under
gravitational-wave emission in the adiabatic approximation.
We first revisited the evolution of circular, nearly horizon-
skimming orbits using the improved Teukolsky solver which
was used for the analysis of Ref. [21]. The results of this anal-
ysis were somewhat surprising: Just as for “normal” orbits,
we found that orbital inclination always increases during in-
spiral, even in the nearly horizon-skimming regime. This is in
stark contrast to the claims of Ref. [14]. As noted above, the
inclination’s rate of change depends on the difference of two
expensive and difficult to compute numbers, and thus can be
strongly impacted by small relative errors in those numbers.
1 In the general case, one must also include the dependence on “Carter’s
constant” Q [18], the third integral of black hole orbits (described more
carefully in Sec. II). For circular orbits, Q = Q(E,Lz): knowledge of E
and Lz completely determines Q.
A primary result of this paper is thus to retract the claim of
Ref. [14] that an important dynamic signature of the nearly
horizon-skimming region is a reversal in the sign of incli-
nation angle evolution: The inclination always grows under
gravitational radiation emission.
We next extended this analysis to study the evolution of
generic nearly horizon-skimming orbits. The Teukolsky code
to which we have direct access can, at this point, only com-
pute the radiated fluxes of energy E and angular momentum
Lz; results for the evolution of the Carter constant Q are just
now beginning to be understood [22], and have not yet been
incorporated into this code. We instead use “kludge” expres-
sions for dE/dt, dLz/dt, and dQ/dt which were inspired by
Refs. [23, 24]. These expression are based on post-Newtonian
flux formulas, modified in such a way that they fit strong-field
radiation reaction results obtained from a Teukolsky integra-
tor; see Ref. [24] for further discussion. Our analysis indicates
that, just as in the circular limit, the result dι/dt > 0 holds
for generic nearly horizon-skimming orbits. Furthermore, and
contrary to the speculation of Ref. [14], we do not find a large
amplification of dι/dt as orbits are made more eccentric.
Our conclusion is that the nearly horizon-skimming regime,
though an interesting curiosity of strong-field orbits of nearly
extremal black holes, will not imprint any peculiar observa-
tional signature on EMRI waveforms.
The remainder of this paper is organized as follows. In Sec.
II, we review the properties of bound stable orbits in Kerr, pro-
viding expressions for the constants of motion which we will
use in Sec. III to generalize nearly horizon-skimming orbits to
the non-circular case. In Sec. IV, we study the evolution of the
inclination angle for circular nearly horizon-skimming orbits
using Teukolsky-based fluxes; in Sec. V we do the same for
non-circular orbits and using kludge fluxes. We present and
discuss our detailed conclusions in Sec. VI. The fits and post-
Newtonian fluxes used for the kludge fluxes are presented in
the Appendix. Throughout the paper we have used units in
which G = c = 1.
II. BOUND STABLE ORBITS IN KERR SPACETIMES
The line element of a Kerr spacetime, written in Boyer-
Lindquist coordinates reads [25]
ds2 = −
1− 2Mr
dt2 +
dr2 +Σ dθ2
r2 + a2 +
2Ma2r
sin2 θ
sin2 θ dφ2
−4Mar
sin2 θ dt dφ, (1)
where
Σ ≡ r2 + a2 cos2 θ, ∆ ≡ r2 − 2Mr + a2. (2)
Up to initial conditions, geodesics can then be labelled by four
constants of motion: the mass µ of the test particle, its energy
E and angular momentum Lz as measured by an observer at
infinity and the Carter constant Q [18]. The presence of these
four conserved quantities makes the geodesic equations sepa-
rable in Boyer-Lindquist coordinates. Introducing the Carter
time λ, defined by
≡ Σ , (3)
the geodesic equations become
= Vr(r), µ
= Vt(r, θ),
= Vθ(θ), µ
= Vφ(r, θ) , (4)
Vt(r, θ) ≡ E
− a2 sin2 θ
+ aLz
Vr(r) ≡
E̟2 − aLz
)2 −∆
µ2r2 + (Lz − aE)2 +Q
Vθ(θ) ≡ Q− L2z cot2 θ − a2(µ2 − E2) cos2 θ, (5c)
Vφ(r, θ) ≡ Lz csc2 θ + aE
, (5d)
where we have defined
̟2 ≡ r2 + a2 . (6)
The conserved parameters E, Lz , and Q can be remapped
to other parameters that describe the geometry of the orbit. We
have found it useful to describe the orbit in terms of an angle
θmin — the minimum polar angle reached by the orbit — as
well as the latus rectum p and the eccentricity e. In the weak-
field limit, p and e correspond exactly to the latus rectum and
eccentricity used to describe orbits in Newtonian gravity; in
the strong field, they are essentially just a convenient remap-
ping of the orbit’s apoastron and periastron:
rap ≡
, rperi ≡
1 + e
. (7)
Finally, in much of our analysis, it is useful to refer to
z− ≡ cos2 θmin , (8)
rather than to θmin directly.
To map (E,Lz, Q) to (p, e, z−), use Eq. (4) to impose
dr/dλ = 0 at r = rap and r = rperi, and to impose
dθ/dλ = 0 at θ = θmin. (Note that for a circular orbit,
rap = rperi = r0. In this case, one must apply the rules
dr/dλ = 0 and d2r/dλ2 = 0 at r = r0.) Following this ap-
proach, Schmidt [26] was able to derive explicit expressions
for E, Lz and Q in terms of p, e and z−. We now briefly
review Schmidt’s results.
Let us first introduce the dimensionless quantities
Ẽ ≡ E/µ , L̃z ≡ Lz/(µM) , Q̃ ≡ Q/(µM)2 , (9)
ã ≡ a/M , r̃ ≡ r/M , ∆̃ ≡ ∆/M2 , (10)
Figure 1: Left panel: Inclination angles θinc for which bound stable orbits exist for a black hole with spin a = 0.998M . The allowed range
for θinc goes from θinc = 0 to the curve corresponding to the eccentricity under consideration, θinc = θ
inc . Right panel: Same as left but for
an extremal black hole, a = M . Note that in this case θmaxinc never reaches zero.
and the functions
f(r̃) ≡ r̃4 + ã2
r̃(r̃ + 2) + z−∆̃
, (11)
g(r̃) ≡ 2 ã r̃ , (12)
h(r̃) ≡ r̃(r̃ − 2) +
1− z−
∆̃ , (13)
d(r̃) ≡ (r̃2 + ã2z−)∆̃ . (14)
Let us further define the set of functions
(f1, g1, h1, d1) ≡{
(f(r̃p), g(r̃p), h(r̃p), d(r̃p)) if e > 0 ,
(f(r̃0), g(r̃0), h(r̃0), d(r̃0)) if e = 0 ,
(f2, g2, h2, d2) ≡{
(f(r̃a), g(r̃a), h(r̃a), d(r̃a)) if e > 0 ,
(f ′(r̃0), g
′(r̃0), h
′(r̃0), d
′(r̃0)) if e = 0 ,
and the determinants
κ ≡ d1h2 − d2h1 , (17)
ε ≡ d1g2 − d2g1 , (18)
ρ ≡ f1h2 − f2h1 , (19)
η ≡ f1g2 − f2g1 , (20)
σ ≡ g1h2 − g2h1 . (21)
The energy of the particle can then be written
κρ+ 2ǫσ − 2D
σ(σǫ2 + ρǫκ− ηκ2)
ρ2 + 4ησ
. (22)
The parameter D takes the values ±1. The angular momen-
tum is a solution of the system
2 − 2g1ẼL̃z − h1L̃2z − d1 = 0 , (23)
2 − 2g2ẼL̃z − h2L̃2z − d2 = 0 . (24)
By eliminating the L̃2z terms in these equations, one finds the
solution
L̃z =
ρẼ2 − κ
for the angular momentum. Using dθ/dλ = 0 at θ = θmin,
the Carter constant can be written
Q̃ = z−
ã2(1− Ẽ2) +
1− z−
. (26)
Additional constraints on p, e, z− are needed for the orbits
to be stable. Inspection of Eq. (4) shows that an eccentric orbit
is stable only if
(rperi) > 0 . (27)
It is marginally stable if ∂Vr/∂r = 0 at r = rperi. Similarly,
the stability condition for circular orbits is
(r0) < 0 ; (28)
marginally stable orbits are set by ∂2Vr/∂r
2 = 0 at r = r0.
Finally, we note that one can massage the above solutions
for the conserved orbital quantities of bound stable orbits to
rewrite the solution for L̃z as
L̃z = −
g21Ẽ
2 + (f1Ẽ2 − d1)h1 . (29)
From this solution, we see that it is quite natural to refer to
orbits with D = 1 as prograde and to orbits with D = −1
as retrograde. Note also that Eq. (29) is a more useful form
than the corresponding expression, Eq. (A4), of Ref. [21]. In
that expression, the factor 1/h1 has been squared and moved
inside the square root. This obscures the fact that h1 changes
sign for very strong field orbits. Differences between Eq. (29)
and Eq. (A4) of [21] are apparent for a & 0.835, although
only for orbits close to the separatrix (i.e., the surface in the
space of parameters (p, e, ι) where marginally stable bound
orbits lie).
III. NON-CIRCULAR NEARLY HORIZON-SKIMMING
ORBITS
With explicit expressions for E, Lz and Q as functions of
p, e and z−, we now examine how to generalize the condi-
tion ∂Lz(r, ι)/∂ι > 0, which defined circular nearly horizon-
skimming orbits in Ref. [14], to encompass the non-circular
case. We recall that the inclination angle ι is defined as [14]
cos ι =
Q+ L2z
. (30)
Such a definition is not always easy to handle in the case of
eccentric orbits. In addition, ι does not have an obvious phys-
ical interpretation (even in the circular limit), but rather was
introduced essentially to generalize (at least formally) the def-
inition of inclination for Schwarzschild black hole orbits. In
that case, one has Q = L2x+L
y and therefore Lz = |L| cos ι.
A more useful definition for the inclination angle in Kerr
was introduced in Ref. [21]:
θinc =
−D θmin , (31)
where θmin is the minimum reached by θ during the orbital
motion. This angle is trivially related to z− (z− = sin
2 θinc)
and ranges from 0 to π/2 for prograde orbits and from π/2 to
π for retrograde orbits. It is a simple numerical calculation to
convert between ι and θinc; doing so shows that the differences
between ι and θinc are very small, with the two coinciding for
a = 0, and with a difference that is less than 2.6◦ for a = M
and circular orbits with r = M .
Bearing all this in mind, the condition we have adopted to
generalize nearly horizon-skimming orbits is
∂Lz(p, e, θinc)
∂θinc
> 0 . (32)
We have found that certain parts of this calculation, particular
the analysis of strong-field geodesic orbits, are best done us-
ing the angle θinc; other parts are more simply done using the
angle ι, particularly the “kludge” computation of fluxes de-
scribed in Sec. V. (This is because the kludge fluxes are based
on an extension of post-Newtonian formulas to the strong-
field regime, and these formulas use ι for inclination angle.)
Accordingly, we often switch back and forth between these
two notions of inclination, and in fact present our final results
for inclination evolution using both dι/dt and dθinc/dt.
Before mapping out the region corresponding to nearly
horizon-skimming orbits, it is useful to examine stable or-
bits more generally in the strong field of rapidly rotating black
holes. We first fix a value for a, and then discretize the space
of parameters (p, e, θinc). We next identify the points in this
space corresponding to bound stable geodesic orbits. Suffi-
ciently close to the horizon, the bound stable orbits with spec-
ified values of p and e have an inclination angle θinc ranging
from 0 (equatorial orbit) to a maximum value θmaxinc . For given
p and e, θmaxinc defines the separatrix between stable and unsta-
ble orbits.
Example separatrices are shown in Fig. 1 for a = 0.998M
and a = M . This figure shows the behavior of θmaxinc as a
function of the latus rectum for the different values of the ec-
centricity indicated by the labels. Note that for a = 0.998M
the angle θmaxinc eventually goes to zero. This is the general
behavior for a < M . On the other hand, for an extremal black
hole, a = M , θmaxinc never goes to zero. The orbits which re-
side at r = M (the circular limit) are the “horizon-skimming
orbits” identified by Wilkins [12]; the a = M separatrix has
a similar shape even for eccentric orbits. As expected, we
find that for given latus rectum and eccentricity the orbit with
θinc = 0 is the one with the lowest energy E (and hence is the
most-bound orbit), whereas the orbit with θinc = θ
inc has the
highest E (and is least bound).
Having mapped out stable orbits in (p, e, θinc) space, we
then computed the partial derivative ∂Lz(p, e, θinc)/∂θinc and
identified the following three overlapping regions:
• Region A: The portion of the (p, e) plane for which
∂Lz(p, e, θinc)/∂θinc > 0 for 0 ≤ θinc ≤ θmaxinc . This
region is illustrated in Fig. 2 as the area under the heavy
solid line and to the left of the dot-dashed line (green in
the color version).
• Region B: The portion of the (p, e) plane
for which (Lz)most bound(p, e) is smaller than
(Lz)least bound(p, e). In other words,
Lz(p, e, 0) < Lz(p, e, θ
inc ) (33)
in Region B. Note that Region B contains Region A. It
is illustrated in Fig. 2 as the area under the heavy solid
line and to the left of the dotted line (red in the color
version).
• Region C: The portion of the (p, e) plane for which
∂Lz(p, e, θinc)/∂θinc > 0 for at least one angle θinc
between 0 and θmaxinc . Region C contains Region B, and
is illustrated in Fig. 2 as the area under the heavy solid
line and to the left of the dashed line (blue in the color
version).
Figure 2: Left panel: Non-circular nearly horizon-skimming orbits for a = 0.998M . The heavy solid line indicates the separatrix between
stable and unstable orbits for equatorial orbits (ι = θinc = 0). All orbits above and to the left of this line are unstable. The dot-dashed line
(green in the color version) bounds the region of the (p, e)-plane where ∂Lz/∂θinc > 0 for all allowed inclination angles (“Region A”). All
orbits between this line and the separatrix belong to Region A. The dotted line (red in the color version) bounds the region (Lz)most bound <
(Lz)least bound (“Region B”). Note that B includes A. The dashed line (blue in the color version) bounds the region where ∂Lz/∂θinc > 0 for
at least one inclination angle (“Region C”); note that C includes B. All three of these regions are candidate generalizations of the notion of
nearly horizon-skimming orbits. Right panel: Same as the left panel, but for the extreme spin case, a = M . In this case the separatrix between
stable and unstable equatorial orbits is given by the line p/M = 1 + e.
Orbits in any of these three regions give possible general-
izations of the nearly horizon-skimming circular orbits pre-
sented in Ref. [14]. Notice, as illustrated in Fig. 2, that the
size of these regions depends rather strongly on the spin of
the black hole. All three regions disappear altogether for
a < 0.9524M (in agreement with [14]); their sizes grow with
a, reaching maximal extent for a = M . These regions never
extend beyond p ≃ 2M .
As we shall see, the difference between these three regions
is not terribly important for assessing whether there is a strong
signature of the nearly horizon-skimming regime on the inspi-
ral dynamics. As such, it is perhaps most useful to use Region
C as our definition, since it is the most inclusive.
IV. EVOLUTION OF θinc: CIRCULAR ORBITS
To ascertain whether nearly horizon-skimming orbits can
affect an EMRI in such a way as to leave a clear imprint in the
gravitational-wave signal, we have studied the time evolution
of the inclination angle θinc. To this purpose we have used the
so-called adiabatic approximation [20], in which the infalling
body moves along a geodesic with slowly changing parame-
ters. The evolution of the orbital parameters is computed us-
ing the time-averaged fluxes dE/dt, dLz/dt and dQ/dt due
to gravitational-wave emission (“radiation reaction”). As dis-
cussed in Sec. II, E, Lz and Q can be expressed in terms of p,
e, and θinc. Given rates of change of E, Lz and Q, it is then
straightforward [23] to calculate dp/dt, de/dt, and dθinc/dt
(or dι/dt).
We should note that although perfectly well-behaved for all
bound stable geodesics, the adiabatic approximation breaks
down in a small region of the orbital parameters space very
close to the separatrix, where the transition from an inspiral to
a plunging orbit takes place [27]. However, since this region
is expected to be very small2 and its impact on LISA wave-
forms rather hard to detect [27], we expect our results to be
at least qualitatively correct also in this region of the space of
parameters.
Accurate calculation of dE/dt and dLz/dt in the adiabatic
approximation involves solving the Teukolsky and Sasaki-
Nakamura equations [28, 29]. For generic orbits this has been
done for the first time in Ref. [21]. The calculation of dQ/dt
for generic orbits is more involved. A formula for dQ/dt has
been recently derived [22], but has not yet been implemented
(at least in a code to which we have access).
On the other hand, it is well-known that a circular orbit will
remain circular under radiation reaction [30, 31, 32]. This
constraint means that Teukolsky-based fluxes for E and Lz
2 Its width in p/M is expected to be of the order of ∆p/M ∼ (µ/M)2/5 ,
where µ is the mass of the infalling body [27].
are sufficient to compute dQ/dt. Considering this limit, the
rate of change dQ/dt can be expressed in terms of dE/dt and
dL/dt as
= −N1(p, ι)
N5(p, ι)
− N4(p, ι)
N5(p, ι)
where
N1(p, ι) ≡ E(p, ι) p4 + a2 E(p, ι) p2
− 2 aM (Lz(p, ι)− aE(p, ι)) p , (35)
N4(p, ι) ≡ (2M p− p2)Lz(p, ι)− 2M aE(p, ι) p , (36)
N5(p, ι) ≡ (2M p− p2 − a2)/2 . (37)
(These quantities are for a circular orbit of radius p.) Using
this, it is simple to compute dθinc/dt (or dι/dt).
This procedure was followed in Ref. [14], using the code
presented in Ref. [17], to determine the evolution of ι; this
analysis indicated that dι/dt < 0 for circular nearly horizon-
skimming orbits. As a first step to our more general analy-
sis, we have repeated this calculation but using the improved
Sasaki-Nakamura-Teukolsky code presented in Ref. [21]; we
focused on the case a = 0.998M .
Rather to our surprise, we discovered that the fluxes dE/dt
and dLz/dt computed with this more accurate code indicate
that dι/dt > 0 (and dθinc/dt > 0) for all circular nearly
horizon-skimming orbits — in stark contrast with what was
found in Ref. [14]. As mentioned in the introduction, the rate
of change of inclination angle appears as the difference of two
quantities. These quantities nearly cancel (and indeed cancel
exactly in the limit a = 0); as such, small relative errors in
their values can lead to large relative error in the inferred in-
clination evolution. Values for dE/dt, dLz/dt, dι/dt, and
dθinc/dt computed using the present code are shown in Ta-
ble I in the columns with the header “Teukolsky”.
V. EVOLUTION OF θinc: NON-CIRCULAR ORBITS
The corrected behavior of circular nearly horizon-
skimming orbits has naturally led us to investigate the evo-
lution of non-circular nearly horizon-skimming orbits. Since
our code cannot be used to compute dQ/dt, we have resorted
to a “kludge” approach, based on those described in Refs.
[23, 24]. In particular, we mostly follow the procedure de-
veloped by Gair & Glampedakis [24], though (as described
below) importantly modified.
The basic idea of the “kludge” procedure is to use the func-
tional form of 2PN fluxes E, Lz and Q, but to correct the
circular part of these fluxes using fits to circular Teukolsky
data. As developed in Ref. [24], the fluxes are written
= (1− e2)3/2
(1− e2)−3/2
(p, e, ι)
(p, 0, ι) +
fit circ
(p, ι)
, (38)
= (1 − e2)3/2
(1 − e2)−3/2
(p, e, ι)
(p, 0, ι) +
fit circ
(p, ι)
, (39)
= (1− e2)3/2
Q(p, e, ι) ×
(1− e2)−3/2
dQ/dt√
(p, e, ι)−
dQ/dt√
(p, 0, ι)
dQ/dt√
fit circ
(p, ι)
. (40)
The post-Newtonian fluxes (dE/dt)2PN, (dLz/dt)2PN and
(dQ/dt)2PN are given in the Appendix [particularly Eqs.
(A.1), (A.2), and (A.3)].
Since for circular orbits the fluxes dE/dt, dLz/dt and
dQ/dt are related through Eq. (34), only two fits to circu-
lar Teukolsky data are needed. One possible choice is to fit
dLz/dt and dι/dt, and then use the circularity constraint to
obtain3 [24]
dQ/dt√
fit circ
(p, ι) =
2 tan ι
fit circ
Q(p, 0, ι)
sin2 ι
fit circ
, (41)
fit circ
(p, ι) = −
N4(p, ι)
N1(p, ι)
fit circ
(p, ι)
− N5(p, ι)
N1(p, ι)
Q(p, 0, ι)
dQ/dt√
fit circ
(p, ι) . (42)
As stressed in Ref. [24], one does not expect these fluxes to
work well in the strong field, both because the post-Newtonian
approximation breaks down close to the black hole, and be-
cause the circular Teukolsky data used for the fits in Ref. [24]
was computed for 3M ≤ p ≤ 30M . As a first attempt to
improve their behavior in the nearly horizon-skimming re-
gion, we have made fits using circular Teukolsky data for
orbits with M < p ≤ 2M . In particular, for a black hole
with a = 0.998M , we computed the circular Teukolsky-based
fluxes dLz/dt and dι/dt listed in Table I (columns 8 and 10).
These results were fit (with error . 0.2%); see Eqs. (A.4) and
(A.6) in the Appendix.
3 This choice might seem more involved than fitting directly dLz/dt and
dQ/dt, but, as noted by Gair & Glampedakis, ensures more sensible re-
sults for the evolution of the inclination angle. This generates more physi-
cally realistic inspirals [24].
Despite using strong-field Teukolsky fluxes for our fit, we
found fairly poor behavior of these rates of change, particu-
larly as a function of eccentricity. To compensate for this, we
introduced a kludge-type fit to correct the equatorial part of
the flux, in addition to the circular part. We fit, as a function
of p and e, Teukolsky-based fluxes for dE/dt and dLz/dt for
orbits in the equatorial plane, and then introduce the following
kludge fluxes for E and Lz:
(p, e, ι) =
(p, e, ι)
(p, e, 0) +
fit eq
(p, e) (43)
(p, e, ι) =
(p, e, ι)
(p, e, 0) +
fit eq
(p, e) . (44)
[Note that Eq. (40) for dQ/dt is not modified by this proce-
dure since dQ/dt = 0 for equatorial orbits.] Using equatorial
non-circular Teukolsky data provided by Drasco [21, 33] for
a = 0.998 and M < p ≤ 2M (the ι = 0 “Teukolsky” data in
Tables II, III and IV), we found fits (with error . 1.5%); see
Eqs. (A.9) and (A.10). Note that the fits for equatorial fluxes
are significantly less accurate than the fits for circular fluxes.
This appears to be due to the fact that, close to the black hole,
many harmonics are needed in order for the Teukolsky-based
fluxes to converge, especially for eccentric orbits (cf. Figs. 2
and 3 of Ref. [21], noting the number of radial harmonics that
have significant contribution to the flux). Truncation of these
sums is likely a source of some error in the fluxes themselves,
making it difficult to make a fit of as high quality as we could
in the circular case.
These fits were then finally used in Eqs. (43) and (44) to
calculate the kludge fluxes dE/dt and dLz/dt for generic or-
bits. This kludge reproduces to high accuracy our fits to the
Teukolsky-based fluxes for circular orbits (e = 0) or equato-
rial orbits (ι = 0). Some residual error remains because the
ι = 0 limit of the circular fits do not precisely equal the e = 0
limit of the equatorial fits.
Table I compares our kludge to Teukolsky-based fluxes for
circular orbits; the two methods agree to several digits. Tables
II, III and IV compare our kludge to the generic Teukolsky-
based fluxes for dE/dt and dLz/dt provided by Drasco
[21, 33]. In all cases, the kludge fluxes dE/dt and dLz/dt
have the correct qualitative behavior, being negative for all
the orbital parameters under consideration (a = 0.998M ,
1 < p/M ≤ 2, 0 ≤ e ≤ 0.5 and 0◦ ≤ ι ≤ 41◦). The
relative difference between the kludge and Teukolsky fluxes
is always less than 25% for e = 0 and e = 0.1 (even for
orbits very close to separatrix). The accuracy remains good
at larger eccentricity, though it degrades somewhat as orbits
come close to the separatrix.
Tables I, II, III and IV also present the kludge values of the
fluxes dι/dt and dθinc/dt as computed using Eqs. (43) and
(44) for dE/dt and dLz/dt, plus Eq. (40) for dQ/dt. Though
certainly not the last word on inclination evolution (pending
rigorous computation of dQ/dt), these rates of change proba-
bly represent a better approximation than results published to
date in the literature. (Indeed, prior work has often used the
crude approximation dι/dt = 0 [21] to estimate dQ/dt given
dE/dt and dLz/dt.)
Most significantly, we find that (dι/dt)kludge > 0 and
(dθinc/dt)kludge > 0 for all of the orbital parameters we con-
sider. In other words, we find that dι/dt and dθinc/dt do not
ever change sign.
Finally, in Table V we compute the changes in θinc and ι
for the inspiral with mass ratio µ/M = 10−6. In all cases,
we start at p/M = 1.9. The small body then inspirals through
the nearly horizon-skimming region until it reaches the sep-
aratrix; at this point, the small body will fall into the large
black hole on a dynamical timescale ∼ M , so we terminate
the calculation. The evolution of circular orbits is computed
using our fits to the circular-Teukolsky fluxes of E and Lz;
for eccentric orbits we use the kludge fluxes (40), (43) and
(44). As this exercise demonstrates, the change in inclination
during inspiral is never larger than a few degrees. Not only is
there no unique sign change in the nearly horizon-skimming
region, but the magnitude of the inclination change remains
puny. This leaves little room for the possibility that this class
of orbits may have a clear observational imprint on the EMRI-
waveforms to be detected by LISA.
VI. CONCLUSIONS
We have performed a detailed analysis of the orbital mo-
tion near the horizon of near-extremal Kerr black holes. We
have demonstrated the existence of a class of orbits, which we
have named “non-circular nearly horizon-skimming orbits”,
for which the angular momentum Lz increases with the or-
bit’s inclination, while keeping latus rectum and eccentricity
fixed. This behavior, in stark contrast to that of Newtonian
orbits, generalizes earlier results for circular orbits [14].
Furthermore, to assess whether this class of orbits can pro-
duce a unique imprint on EMRI waveforms (an important
source for future LISA observations), we have studied, in
the adiabatic approximation, the radiative evolution of incli-
nation angle for a small body orbiting in the nearly horizon-
skimming region. For circular orbits, we have re-examined
the analysis of Ref. [14] using an improved code for comput-
ing Teukolsky-based fluxes of the energy and angular momen-
tum. Significantly correcting Ref. [14]’s results, we found no
decrease in the orbit’s inclination angle. Inclination always
increases during inspiral.
We next carried out such an analysis for eccentric nearly
horizon-skimming orbits. In this case, we used “kludge”
fluxes to evolve the constants of motion E, Lz and Q [24].
We find that these fluxes are fairly accurate when compared
with the available Teukolsky-based fluxes, indicating that they
should provide at least qualitatively correct information re-
garding inclination evolution. As for circular orbits, we find
that the orbit’s inclination never decreases. For both circular
and non-circular configurations, we find that the magnitude of
the inclination change is quite paltry — only a few degrees at
most.
Quite generically, therefore, we found that the inclination
angle of both circular and eccentric nearly horizon-skimming
orbits never decreases during the inspiral. Revising the results
obtained in Ref. [14], we thus conclude that such orbits are
not likely to yield a peculiar, unique imprint on the EMRI-
waveforms detectable by LISA.
Acknowledgments
It is a pleasure to thank Kostas Glampedakis for enlight-
ening comments and advice, and Steve Drasco for useful
discussions and for also providing the non-circular Teukol-
sky data that we used in this paper. The supercomputers
used in this investigation were provided by funding from the
JPL Office of the Chief Information Officer. This work was
supported in part by the DFG grant SFB TR/7, by NASA
Grant NNG05G105G, and by NSF Grant PHY-0449884. SAH
gratefully acknowledges support from the MIT Class of 1956
Career Development Fund.
Appendix
In this Appendix we report the expressions for the post-
Newtonian fluxes and the fits to the Teukolsky data necessary
to compute the kludge fluxes introduced in Sec. V. In partic-
ular the 2PN fluxes are given by [24]
= −32
(1− e2)3/2
g1(e)− ã
g2(e) cos ι−
g3(e) + π
g4(e)
g5(e) + ã
g6(e)−
sin2 ι
, (A.1)
= −32
(1− e2)3/2
g9(e) cos ι+ ã
(ga10(e)− cos2 ιgb10(e)) −
g11(e) cos ι
g12(e) cos ι−
g13(e) cos ι+ ã
cos ι
g14(e)−
sin2 ι
, (A.2)
)7/2 √
Q sin ι (1− e2)3/2
g9(e)− ã
cos ιgb10(e)−
g11(e)
g12(e)−
g13(e) + ã
g14(e)−
sin2 ι
, (A.3)
where µ is the mass of the infalling body and where the various e-dependent coefficients are
g1(e) ≡ 1 +
e4 , g2(e) ≡
e6 , g3(e) ≡
g4(e) ≡ 4 +
e2 , g5(e) ≡
44711
172157
e2 , g6(e) ≡
e2 , g9(e) ≡ 1 +
ga10(e) ≡
e4 , gb10(e) ≡
e4 , g11(e) ≡
g12(e) ≡ 4 +
e2 , g13(e) ≡
44711
302893
e2 , g14(e) ≡
The fits to the circular-Teukolsky data of Table I are instead given by
fit circ
(p, ι ) = −32
)7/2 {
cos ι+
)3/2 (
cos2 ι+ 4π cos ι
− 1247
cos ι
cos ι
−1625
sin2 ι
d̃1(p/M) + d̃2(p/M) cos ι + d̃3(p/M) cos
+ d̃4(p/M) cos
3 ι + d̃5(p/M) cos
4 ι+ d̃6(p/M) cos
5 ι+ cos ι
)3/2 (
A+B cos2 ι
, (A.4)
(A.5)
fit circ
(p, ι ) =
sin2 ι√
Q(p, 0, ι)
d̃1(p/M) + cos ι
a7d + b
+ c7d
)3/2]
+ cos2 ι
d̃8(p/M) + cos ι
)5/2 [
h̃1(p/M) + cos
2 ι h̃2(p/M)
, (A.6)
where
d̃i(x) ≡ aid + bid x−1/2 + cid x−1 , i = 1, . . . , 8, h̃i(x) ≡ aih + bih x−1/2 , i = 1, 2 (A.7)
and the numerical coefficients are given by
a1h = −278.9387 , b1h = 84.1414 , a2h = 8.6679 , b2h = −9.2401 , A = −18.3362 , B = 24.9034 , (A.8)
and by the following table
i 1 2 3 4 5 6 7 8
aid 15.8363 445.4418 −2027.7797 3089.1709 −2045.2248 498.6411 −8.7220 50.8345
bid −55.6777 −1333.2461 5940.4831 −9103.4472 6113.1165 −1515.8506 −50.8950 −131.6422
cid 38.6405 1049.5637 −4513.0879 6926.3191 -4714.9633 1183.5875 251.4025 83.0834
Note that the functional form of these fits was obtained from Eqs. (57) and (58) of Ref. [24] by setting ã (i.e., q in their
notation) to 1. Finally, we give expressions for the fits to the equatorial Teukolsky data of tables II, III and IV (data with ι = 0,
columns with header “Teukolsky”):
fit eq
(p, e) =
(p, e, 0)− 32
)2 (M
(1− e2)3/2
g̃1(e) + g̃2(e)
+ g̃3(e)
+ g̃4(e)
+ g̃5(e)
, (A.9)
fit eq
(p, e) =
(p, e, 0)− 32
(1− e2)3/2
f̃1(e) + f̃2(e)
+ f̃3(e)
+ f̃4(e)
+ f̃5(e)
, (A.10)
g̃i(e) ≡ aig + big e2 + cig e4 + dig e6 , f̃i(e) ≡ aif + bif e2 + cif e4 + dif e6 , i = 1, . . . , 5 (A.11)
where the numerical coefficients are given by the following table
i aig b
1 6.4590 −2038.7301 6639.9843 227709.2187 5.4577 −3116.4034 4711.7065 214332.2907
2 -31.2215 10390.6778 −27505.7295 −1224376.5294 −26.6519 15958.6191 −16390.4868 −1147201.4687
3 57.1208 −19800.4891 39527.8397 2463977.3622 50.4374 -30579.3129 15749.9411 2296989.5466
4 -49.7051 16684.4629 −21714.7941 −2199231.9494 −46.7816 25968.8743 656.3460 −2038650.9838
5 16.4697 −5234.2077 2936.2391 734454.5696 15.6660 −8226.3892 −4903.9260 676553.2755
e θinc ι
dθinc
dθinc
(deg.) (deg.) (kludge) (Teukolsky) (kludge) (Teukolsky) (kludge) (Teukolsky) (kludge) (Teukolsky)
1.3 0 0 0 −9.108×10−2 −9.109×10−2 −2.258×10−1 −2.259×10−1 0 0 0 0
1.3 0 10.4870 11.6773 −9.328×10−2 −9.332×10−2 −2.304×10−1 −2.306×10−1 1.837×10−2 1.839×10−2 6.462×10−3 6.475×10−3
1.3 0 14.6406 16.1303 −9.588×10−2 −9.588×10−2 −2.359×10−1 −2.360×10−1 2.397×10−2 2.400×10−2 8.645×10−3 8.667×10−3
1.3 0 17.7000 19.3172 −9.875×10−2 −9.876×10−2 −2.420×10−1 −2.421×10−1 2.728×10−2 2.731×10−2 1.007×10−2 1.010×10−2
1.3 0 20.1636 21.8210 −1.019×10−1 −1.019×10−1 −2.486×10−1 −2.488×10−1 2.943×10−2 2.950×10−2 1.111×10−2 1.117×10−2
1.4 0 0 0 −8.700×10−2 −8.709×10−2 −2.311×10−1 −2.312×10−1 0 0 0 0
1.4 0 14.5992 16.0005 −9.062×10−2 −9.070×10−2 −2.386×10−1 −2.386×10−1 2.316×10−2 2.319×10−2 8.823×10−3 8.848×10−3
1.4 0 20.1756 21.7815 −9.520×10−2 −9.526×10−2 −2.482×10−1 −2.482×10−1 2.875×10−2 2.877×10−2 1.141×10−2 1.143×10−2
1.4 0 24.1503 25.7517 −1.006×10−1 −1.007×10−1 −2.595×10−1 −2.596×10−1 3.140×10−2 3.141×10−2 1.289×10−2 1.288×10−2
1.4 0 27.2489 28.7604 −1.067×10−1 −1.068×10−1 −2.725×10−1 −2.725×10−1 3.274×10−2 3.275×10−2 1.378×10−2 1.377×10−2
1.5 0 0 0 −8.009×10−2 −7.989×10−2 −2.270×10−1 −2.265×10−1 0 0 0 0
1.5 0 16.7836 18.1857 −8.401×10−2 −8.383×10−2 −2.348×10−1 −2.343×10−1 2.360×10−2 2.351×10−2 9.602×10−3 9.545×10−3
1.5 0 23.0755 24.6167 −8.917×10−2 −8.897×10−2 −2.454×10−1 −2.449×10−1 2.872×10−2 2.863×10−2 1.228×10−2 1.222×10−2
1.5 0 27.4892 28.9670 −9.537×10−2 −9.516×10−2 −2.583×10−1 −2.579×10−1 3.091×10−2 3.082×10−2 1.372×10−2 1.367×10−2
1.5 0 30.8795 32.2231 −1.025×10−1 −1.023×10−1 −2.733×10−1 −2.728×10−1 3.184×10−2 3.173×10−2 1.452×10−2 1.443×10−2
1.6 0 0 0 −7.181×10−2 −7.156×10−2 −2.168×10−1 −2.162×10−1 0 0 0 0
1.6 0 18.3669 19.7220 −7.568×10−2 −7.545×10−2 −2.242×10−1 −2.237×10−1 2.240×10−2 2.229×10−2 9.600×10−3 9.515×10−3
1.6 0 25.1720 26.6245 −8.084×10−2 −8.062×10−2 −2.346×10−1 −2.341×10−1 2.701×10−2 2.685×10−2 1.223×10−2 1.210×10−2
1.6 0 29.9014 31.2625 −8.708×10−2 −8.687×10−2 −2.474×10−1 −2.470×10−1 2.889×10−2 2.872×10−2 1.363×10−2 1.349×10−2
1.6 0 33.5053 34.7164 −9.425×10−2 −9.399×10−2 −2.622×10−1 −2.616×10−1 2.964×10−2 2.951×10−2 1.441×10−2 1.432×10−2
1.7 0 0 0 −6.332×10−2 −6.317×10−2 −2.034×10−1 −2.031×10−1 0 0 0 0
1.7 0 19.6910 20.9859 −6.702×10−2 −6.687×10−2 −2.101×10−1 −2.098×10−1 2.057×10−2 2.052×10−2 9.202×10−3 9.171×10−3
1.7 0 26.9252 28.2884 −7.197×10−2 −7.184×10−2 −2.199×10−1 −2.196×10−1 2.467×10−2 2.456×10−2 1.170×10−2 1.162×10−2
1.7 0 31.9218 33.1786 −7.794×10−2 −7.782×10−2 −2.319×10−1 −2.316×10−1 2.632×10−2 2.620×10−2 1.306×10−2 1.296×10−2
1.7 0 35.7100 36.8118 −8.475×10−2 −8.465×10−2 −2.457×10−1 −2.455×10−1 2.698×10−2 2.686×10−2 1.384×10−2 1.373×10−2
1.8 0 0 0 −5.531×10−2 −5.528×10−2 −1.888×10−1 −1.887×10−1 0 0 0 0
1.8 0 20.8804 22.1128 −5.879×10−2 −5.874×10−2 −1.948×10−1 −1.946×10−1 1.858×10−2 1.858×10−2 8.635×10−3 8.639×10−3
1.8 0 28.5007 29.7791 −6.343×10−2 −6.336×10−2 −2.036×10−1 −2.035×10−1 2.221×10−2 2.223×10−2 1.098×10−2 1.101×10−2
1.8 0 33.7400 34.9034 −6.901×10−2 −6.894×10−2 −2.146×10−1 −2.144×10−1 2.368×10−2 2.371×10−2 1.228×10−2 1.232×10−2
1.8 0 37.6985 38.7065 −7.533×10−2 −7.533×10−2 −2.271×10−1 −2.271×10−1 2.429×10−2 2.427×10−2 1.306×10−2 1.303×10−2
1.9 0 0 0 −4.809×10−2 −4.811×10−2 −1.740×10−1 −1.740×10−1 0 0 0 0
1.9 0 21.9900 23.1615 −5.132×10−2 −5.134×10−2 −1.792×10−1 −1.793×10−1 1.666×10−2 1.664×10−2 8.022×10−3 8.007×10−3
1.9 0 29.9708 31.1702 −5.562×10−2 −5.564×10−2 −1.872×10−1 −1.872×10−1 1.986×10−2 1.987×10−2 1.019×10−2 1.020×10−2
1.9 0 35.4385 36.5176 −6.078×10−2 −6.077×10−2 −1.971×10−1 −1.970×10−1 2.118×10−2 2.122×10−2 1.143×10−2 1.148×10−2
1.9 0 39.5592 40.4847 −6.659×10−2 −6.658×10−2 −2.082×10−1 −2.082×10−1 2.177×10−2 2.182×10−2 1.222×10−2 1.228×10−2
2.0 0 0 0 −4.174×10−2 −4.175×10−2 −1.598×10−1 −1.598×10−1 0 0 0 0
2.0 0 23.0471 24.1605 −4.471×10−2 −4.472×10−2 −1.643×10−1 −1.643×10−1 1.489×10−2 1.489×10−2 7.425×10−3 7.424×10−3
2.0 0 31.3715 32.4978 −4.867×10−2 −4.871×10−2 −1.713×10−1 −1.714×10−1 1.773×10−2 1.770×10−2 9.436×10−3 9.411×10−3
2.0 0 37.0583 38.0608 −5.341×10−2 −5.345×10−2 −1.801×10−1 −1.801×10−1 1.893×10−2 1.889×10−2 1.062×10−2 1.057×10−2
2.0 0 41.3358 42.1876 −5.873×10−2 −5.875×10−2 −1.900×10−1 −1.900×10−1 1.950×10−2 1.948×10−2 1.141×10−2 1.138×10−2
Table I: Teukolsky-based fluxes and kludge fluxes [computed using Eqs. (40), (43) and (44)] for circular orbits about a hole with a = 0.998M ;
µ represents the mass of the infalling body. The Teukolsky-based fluxes have an accuracy of 10−6.
[1] http://lisa.nasa.gov/; http://sci.esa.int/home/lisa/
[2] J. Kormendy and D. Richstone, Ann. Rev. Astron. Astrophys.
33, 581 (1995).
[3] J. R. Gair, L. Barack, T. Creighton, C. Cutler, S. L. Larson, E.
S. Phinney, and M. Vallisneri, Class. Quantum Grav. 21, S1595
(2004).
[4] S. L. Shapiro, Astrophys. J. 620, 59 (2005).
[5] L. Rezzolla, T. W. Maccarone, S. Yoshida, and O. Zanotti, Mon.
Not. Roy. Astron. Soc 344, L37 (2003).
[6] R. Shafee, J. E. McClintock, R. Narayan, S. W. Davis, L.-X. Li,
and R. A. Remilland, Astrophys. J. 636, L113 (2006).
[7] J. E. McClintock, R. Shafee, R. Narayan, R. A. Remilland, S.
W. Davis, and L.-X. Li, Astrophys. J. 652, 518 (2006).
[8] A. C. Fabian and G. Miniutti, G. 2005, to appear in Kerr Space-
time: Rotating Black Holes in General Relativity, edited by D.
L. Wiltshire, M. Visser, and S. M. Scott; astro-ph/0507409.
[9] L. W. Brenneman and C. S. Reynolds, Astrophys. J. 652, 1028
(2006).
[10] L. Barack and C. Cutler, Phys. Rev. D 69, 082005 (2004).
[11] M. Volonteri, P. Madau, E. Quataert, and M. J. Rees, Astrophys.
J. 620, 69 (2005).
[12] D. C. Wilkins, Phys. Rev. D 5, 814 (1972).
http://lisa.nasa.gov/
http://sci.esa.int/home/lisa/
http://arxiv.org/abs/astro-ph/0507409
e θinc ι
dθinc
(deg.) (deg.) (kludge) (Teukolsky) (kludge) (Teukolsky) (kludge) (kludge)
1.3 0.1 0 0 −8.804×10−2 −8.804×10−2 −2.098×10−1 −2.098×10−1 0 0
1.4 0.1 0 0 −8.728×10−2 −8.719×10−2 −2.274×10−1 −2.275×10−1 0 0
1.4 0.1 8 8.8664 −9.110×10−2 −8.736×10−2 −2.355×10−1 −2.273×10−1 4.066×10−2 2.938×10−2
1.4 0.1 16 17.4519 −1.030×10−1 −8.958×10−2 −2.602×10−1 −2.309×10−1 7.428×10−2 5.475×10−2
1.4 0.1 24 25.5784 −1.243×10−1 −9.771×10−2 −3.037×10−1 −2.415×10−1 9.663×10−2 7.316×10−2
1.5 0.1 0 0 −8.069×10−2 −8.095×10−2 −2.255×10−1 −2.260×10−1 0 0
1.5 0.1 8 8.7910 −8.323×10−2 −8.133×10−2 −2.310×10−1 −2.264×10−1 2.996×10−2 2.070×10−2
1.5 0.1 16 17.3490 −9.121×10−2 −8.395×10−2 −2.483×10−1 −2.314×10−1 5.512×10−2 3.888×10−2
1.5 0.1 24 25.5197 −1.059×10−1 −8.980×10−2 −2.792×10−1 −2.423×10−1 7.255×10−2 5.264×10−2
1.6 0.1 0 0 −7.255×10−2 −7.281×10−2 −2.161×10−1 −2.168×10−1 0 0
1.6 0.1 8 8.7195 −7.430×10−2 −7.321×10−2 −2.201×10−1 −2.173×10−1 2.258×10−2 1.502×10−2
1.6 0.1 16 17.2437 −7.986×10−2 −7.533×10−2 −2.323×10−1 −2.212×10−1 4.179×10−2 2.839×10−2
1.6 0.1 24 25.4388 −9.025×10−2 −8.040×10−2 −2.547×10−1 −2.309×10−1 5.554×10−2 3.886×10−2
1.6 0.1 32 33.2683 −1.082×10−1 −9.435×10−2 −2.920×10−1 −2.551×10−1 6.316×10−2 4.559×10−2
1.7 0.1 0 0 −6.427×10−2 −6.440×10−2 −2.036×10−1 −2.040×10−1 0 0
1.7 0.1 8 8.6555 −6.552×10−2 −6.478×10−2 −2.065×10−1 −2.045×10−1 1.742×10−2 1.124×10−2
1.7 0.1 16 17.1454 −6.953×10−2 −6.651×10−2 −2.154×10−1 −2.075×10−1 3.240×10−2 2.134×10−2
1.7 0.1 24 25.3531 −7.707×10−2 −7.052×10−2 −2.317×10−1 −2.150×10−1 4.342×10−2 2.948×10−2
1.7 0.1 32 33.2416 −9.009×10−2 −7.959×10−2 −2.590×10−1 −2.324×10−1 4.998×10−2 3.512×10−2
1.8 0.1 0 0 −5.640×10−2 −5.640×10−2 −1.897×10−1 −1.897×10−1 0 0
1.8 0.1 8 8.5991 −5.732×10−2 −5.676×10−2 −1.918×10−1 −1.902×10−1 1.371×10−2 8.640×10−3
1.8 0.1 16 17.0562 −6.028×10−2 −5.817×10−2 −1.984×10−1 −1.925×10−1 2.562×10−2 1.647×10−2
1.8 0.1 24 25.2693 −6.588×10−2 −6.139×10−2 −2.105×10−1 −1.983×10−1 3.456×10−2 2.291×10−2
1.8 0.1 32 33.2018 −7.555×10−2 −6.849×10−2 −2.307×10−1 −2.120×10−1 4.020×10−2 2.765×10−2
1.9 0.1 0 0 −4.915×10−2 −4.911×10−2 −1.753×10−1 −1.751×10−1 0 0
1.9 0.1 8 8.5494 −4.985×10−2 −4.945×10−2 −1.768×10−1 −1.755×10−1 1.097×10−2 6.791×10−3
1.9 0.1 16 16.9760 −5.208×10−2 −5.064×10−2 −1.817×10−1 −1.774×10−1 2.055×10−2 1.298×10−2
1.9 0.1 24 25.1898 −5.633×10−2 −5.328×10−2 −1.908×10−1 −1.819×10−1 2.788×10−2 1.816×10−2
1.9 0.1 32 33.1555 −6.364×10−2 −5.870×10−2 −2.059×10−1 −1.920×10−1 3.272×10−2 2.214×10−2
2.0 0.1 0 0 −4.263×10−2 −4.264×10−2 −1.607×10−1 −1.608×10−1 0 0
2.0 0.1 8 8.5057 −4.316×10−2 −4.292×10−2 −1.619×10−1 −1.611×10−1 8.862×10−3 5.424×10−3
2.0 0.1 16 16.9042 −4.488×10−2 −4.390×10−2 −1.656×10−1 −1.625×10−1 1.666×10−2 1.039×10−2
2.0 0.1 24 25.1156 −4.815×10−2 −4.604×10−2 −1.724×10−1 −1.660×10−1 2.271×10−2 1.459×10−2
2.0 0.1 32 33.1064 −5.376×10−2 −5.031×10−2 −1.838×10−1 −1.736×10−1 2.684×10−2 1.793×10−2
2.0 0.1 40 40.8954 −6.339×10−2 −6.236×10−2 −2.027×10−1 −1.967×10−1 2.917×10−2 2.036×10−2
Table II: As in Table I but for non-circular orbits; the Teukolsky-based fluxes for E and Lz have an accuracy of 10
−3. Note that our code, as
all the Teukolsky-based code that we are aware of, presently does not have the capability to compute inclination angle evolution for generic
orbits.
[13] J. M. Bardeen, W. H. Press, and S. A. Teukolsky, Astrophys. J.
178, 347 (1972).
[14] S. A. Hughes, Phys. Rev. D 63, 064016 (2001).
[15] K. S. Thorne, R. H. Price, and D. A. MacDonald, Black Holes:
The Membrane Paradigm (Yale University Press, New Haven,
CT, 1986).
[16] F. D. Ryan, Phys. Rev. D 52, R3159 (1995).
[17] S. A. Hughes, Phys. Rev. D 61, 084004 (2000).
[18] B. Carter, Phys. Rev. 174, 1559 (1968).
[19] K. S. Thorne, Astrophys. J. 191, 507 (1974).
[20] Y. Mino, Phys. Rev. D 67, 084027 (2003)
[21] S. Drasco and S. A. Hughes, Phys. Rev. D 73, 024027 (2006).
[22] N. Sago, T. Tanaka, W. Hikida, and H. Nakano, Prog. Theor.
Phys. 114, 509 (2005); N. Sago, T. Tanaka, W. Hikida, K. Ganz,
and H. Nakano, Prog. Theor. Phys. 115, 873 (2006).
[23] K. Glampedakis, S. A. Hughes, and D. Kennefick, Phys. Rev.
D 66, 064005 (2002).
[24] J. R. Gair and K. Glampedakis, Phys. Rev. D 73, 064037
(2006).
[25] C. W. Misner, K. S. Thorne, and J. A. Wheeler, Gravitation
(Freeman, San Francisco, 1973).
[26] W. Schmidt, Class. Quantum Grav. 19, 2743 (2002).
[27] A. Ori and K. S. Thorne, Phys. Rev. D 62, 124022 (2000)
[28] S. A. Teukolsky, Astrophys. J. 185, 635 (1973).
[29] M. Sasaki and T. Nakamura, Prog. Theor. Phys. 67, 1788
(1982).
[30] F. D. Ryan, Phys. Ref. D 53, 3064 (1996).
[31] D. Kennefick and A. Ori, Phys. Rev. D 53, 4319 (1996).
[32] Y. Mino, unpublished Ph. D. thesis, Kyoto University, 1996.
[33] Data available at http://gmunu.mit.edu/sdrasco/snapshots/
http://gmunu.mit.edu/sdrasco/snapshots/
e θinc ι
dθinc
(deg.) (deg.) (kludge) (Teukolsky ) (kludge) (Teukolsky) (kludge) (kludge)
1.4 0.2 0 0 −8.636×10−2 −8.642×10−2 −2.119×10−1 −2.121×10−1 0 0
1.4 0.2 8 8.8215 −9.853×10−2 −8.240×10−2 −2.374×10−1 −2.015×10−1 1.148×10−1 9.714×10−2
1.5 0.2 0 0 −8.362×10−2 −8.349×10−2 −2.236×10−1 −2.230×10−1 0 0
1.5 0.2 8 8.7595 −9.141×10−2 −8.276×10−2 −2.410×10−1 −2.206×10−1 7.893×10−2 6.549×10−2
1.5 0.2 16 17.2957 −1.145×10−1 −8.394×10−2 −2.915×10−1 −2.215×10−1 1.466×10−1 1.230×10−1
1.5 0.2 24 25.4608 −1.524×10−1 −9.230×10−2 −3.712×10−1 −2.357×10−1 1.952×10−1 1.661×10−1
1.6 0.2 0 0 −7.596×10−2 −7.616×10−2 −2.171×10−1 −2.176×10−1 0 0
1.6 0.2 8 8.6935 −8.111×10−2 −7.641×10−2 −2.292×10−1 −2.177×10−1 5.520×10−2 4.502×10−2
1.6 0.2 16 17.1994 −9.649×10−2 −7.798×10−2 −2.647×10−1 −2.198×10−1 1.032×10−1 8.500×10−2
1.6 0.2 24 25.3891 −1.221×10−1 −8.314×10−2 −3.212×10−1 −2.288×10−1 1.388×10−1 1.160×10−1
1.7 0.2 0 0 −6.765×10−2 −6.799×10−2 −2.057×10−1 −2.068×10−1 0 0
1.7 0.2 8 8.6329 −7.116×10−2 −6.813×10−2 −2.144×10−1 −2.066×10−1 3.963×10−2 3.176×10−2
1.7 0.2 16 17.1064 −8.171×10−2 −6.995×10−2 −2.398×10−1 −2.096×10−1 7.441×10−2 6.024×10−2
1.7 0.2 24 25.3085 −9.948×10−2 −7.443×10−2 −2.806×10−1 −2.178×10−1 1.009×10−1 8.290×10−2
1.7 0.2 32 33.2037 −1.257×10−1 −8.558×10−2 −3.371×10−1 −2.366×10−1 1.175×10−1 9.806×10−2
1.8 0.2 0 0 −5.965×10−2 −5.962×10−2 −1.927×10−1 −1.926×10−1 0 0
1.8 0.2 8 8.5789 −6.211×10−2 −5.997×10−2 −1.990×10−1 −1.930×10−1 2.919×10−2 2.300×10−2
1.8 0.2 16 17.0211 −6.953×10−2 −6.147×10−2 −2.175×10−1 −1.954×10−1 5.504×10−2 4.380×10−2
1.8 0.2 24 25.2283 −8.216×10−2 −6.502×10−2 −2.474×10−1 −2.016×10−1 7.515×10−2 6.068×10−2
1.8 0.2 32 33.1656 −1.009×10−1 −7.410×10−2 −2.890×10−1 −2.190×10−1 8.839×10−2 7.258×10−2
1.9 0.2 0 0 −5.218×10−2 −5.210×10−2 −1.786×10−1 −1.783×10−1 0 0
1.9 0.2 8 8.5312 −5.394×10−2 −5.244×10−2 −1.833×10−1 −1.787×10−1 2.197×10−2 1.704×10−2
1.9 0.2 16 16.9441 −5.928×10−2 −5.373×10−2 −1.970×10−1 −1.807×10−1 4.156×10−2 3.254×10−2
1.9 0.2 24 25.1518 −6.843×10−2 −5.669×10−2 −2.192×10−1 −1.858×10−1 5.706×10−2 4.535×10−2
1.9 0.2 32 33.1207 −8.213×10−2 −6.277×10−2 −2.502×10−1 −1.966×10−1 6.767×10−2 5.475×10−2
2.0 0.2 0 0 −4.528×10−2 −4.530×10−2 −1.637×10−1 −1.638×10−1 0 0
2.0 0.2 8 8.4891 −4.657×10−2 −4.557×10−2 −1.671×10−1 −1.641×10−1 1.679×10−2 1.283×10−2
2.0 0.2 16 16.8749 −5.049×10−2 −4.664×10−2 −1.774×10−1 −1.657×10−1 3.184×10−2 2.457×10−2
2.0 0.2 24 25.0802 −5.725×10−2 −4.904×10−2 −1.941×10−1 −1.696×10−1 4.391×10−2 3.440×10−2
2.0 0.2 32 33.0730 −6.743×10−2 −5.427×10−2 −2.175×10−1 −1.793×10−1 5.243×10−2 4.184×10−2
1.5 0.3 0 0 −8.481×10−2 −8.478×10−2 −2.094×10−1 −2.094×10−1 0 0
1.5 0.3 8 8.7037 −1.006×10−1 −7.824×10−2 −2.442×10−1 −1.934×10−1 1.484×10−1 1.301×10−1
1.5 0.3 16 17.2003 −1.469×10−1 −7.811×10−2 −3.435×10−1 −1.864×10−1 2.766×10−1 2.440×10−1
1.6 0.3 0 0 −8.144×10−2 −8.123×10−2 −2.183×10−1 −2.178×10−1 0 0
1.6 0.3 8 8.6498 −9.182×10−2 −7.807×10−2 −2.426×10−1 −2.095×10−1 1.028×10−1 8.918×10−2
1.6 0.3 16 17.1246 −1.223×10−1 −8.089×10−2 −3.122×10−1 −2.144×10−1 1.928×10−1 1.683×10−1
1.6 0.3 24 25.3046 −1.716×10−1 −8.666×10−2 −4.197×10−1 −2.229×10−1 2.607×10−1 2.295×10−1
1.7 0.3 0 0 −7.362×10−2 −7.314×10−2 −2.104×10−1 −2.095×10−1 0 0
1.7 0.3 8 8.5953 −8.060×10−2 −7.224×10−2 −2.277×10−1 −2.065×10−1 7.240×10−2 6.224×10−2
1.7 0.3 16 17.0415 −1.013×10−1 −7.369×10−2 −2.774×10−1 −2.084×10−1 1.365×10−1 1.180×10−1
1.7 0.3 24 25.2339 −1.349×10−1 −7.800×10−2 −3.547×10−1 −2.153×10−1 1.861×10−1 1.622×10−1
1.8 0.3 0 0 −6.488×10−2 −6.484×10−2 −1.973×10−1 −1.972×10−1 0 0
1.8 0.3 8 8.5454 −6.970×10−2 −6.480×10−2 −2.099×10−1 −1.966×10−1 5.206×10−2 4.436×10−2
1.8 0.3 16 16.9628 −8.402×10−2 −6.671×10−2 −2.461×10−1 −1.998×10−1 9.857×10−2 8.445×10−2
1.8 0.3 24 25.1601 −1.075×10−1 −7.030×10−2 −3.026×10−1 −2.056×10−1 1.353×10−1 1.169×10−1
1.8 0.3 32 33.1047 −1.404×10−1 −8.153×10−2 −3.762×10−1 −2.255×10−1 1.600×10−1 1.394×10−1
1.9 0.3 0 0 −5.669×10−2 −5.690×10−2 −1.829×10−1 −1.832×10−1 0 0
1.9 0.3 8 8.5010 −6.010×10−2 −5.683×10−2 −1.922×10−1 −1.824×10−1 3.823×10−2 3.229×10−2
1.9 0.3 16 16.8911 −7.025×10−2 −5.818×10−2 −2.189×10−1 −1.844×10−1 7.263×10−2 6.165×10−2
1.9 0.3 24 25.0887 −8.701×10−2 −6.054×10−2 −2.609×10−1 −1.874×10−1 1.003×10−1 8.579×10−2
1.9 0.3 32 33.0624 −1.106×10−1 −6.912×10−2 −3.157×10−1 −2.034×10−1 1.195×10−1 1.032×10−1
2.0 0.3 0 0 −4.953×10−2 −4.946×10−2 −1.683×10−1 −1.683×10−1 0 0
2.0 0.3 8 8.4616 −5.199×10−2 −4.970×10−2 −1.753×10−1 −1.685×10−1 2.862×10−2 2.395×10−2
2.0 0.3 16 16.8262 −5.932×10−2 −5.079×10−2 −1.954×10−1 −1.699×10−1 5.452×10−2 4.585×10−2
2.0 0.3 24 25.0215 −7.150×10−2 −5.328×10−2 −2.269×10−1 −1.737×10−1 7.564×10−2 6.411×10−2
2.0 0.3 32 33.0172 −8.878×10−2 −6.003×10−2 −2.682×10−1 −1.864×10−1 9.077×10−2 7.771×10−2
Table III: As in Table II, but for additional values of eccentricity e; the Teukolsky-based fluxes for E and Lz have an accuracy of 10
e θinc ι
dθinc
(deg.) (deg.) (kludge) (Teukolsky ) (kludge) (Teukolsky) (kludge) (kludge)
1.6 0.4 0 0 −7.766×10−2 −7.772×10−2 −1.918×10−1 −1.919×10−1 0 0
1.6 0.4 8 8.5863 −9.433×10−2 −7.645×10−2 −2.297×10−1 −1.881×10−1 1.528×10−1 1.370×10−1
1.6 0.4 16 17.0151 −1.432×10−1 −7.651×10−2 −3.382×10−1 −1.837×10−1 2.873×10−1 2.584×10−1
1.7 0.4 0 0 −7.882×10−2 −7.953×10−2 −2.097×10−1 −2.115×10−1 0 0
1.7 0.4 8 8.5426 −9.002×10−2 −7.408×10−2 −2.367×10−1 −1.978×10−1 1.087×10−1 9.656×10−2
1.7 0.4 16 16.9502 −1.229×10−1 −7.682×10−2 −3.143×10−1 −2.025×10−1 2.054×10−1 1.830×10−1
1.7 0.4 24 25.1282 −1.760×10−1 −8.090×10−2 −4.336×10−1 −2.075×10−1 2.809×10−1 2.514×10−1
1.8 0.4 0 0 −7.107×10−2 −7.007×10−2 −2.013×10−1 −1.988×10−1 0 0
1.8 0.4 8 8.4989 −7.877×10−2 −7.001×10−2 −2.209×10−1 −1.981×10−1 7.788×10−2 6.879×10−2
1.8 0.4 16 16.8817 −1.015×10−1 −7.009×10−2 −2.774×10−1 −1.965×10−1 1.478×10−1 1.309×10−1
1.8 0.4 24 25.0646 −1.383×10−1 −7.314×10−2 −3.646×10−1 −2.003×10−1 2.036×10−1 1.810×10−1
1.8 0.4 32 33.0184 −1.887×10−1 −9.193×10−2 −4.755×10−1 −2.319×10−1 2.414×10−1 2.156×10−1
1.9 0.4 0 0 −6.187×10−2 −6.267×10−2 −1.861×10−1 −1.881×10−1 0 0
1.9 0.4 8 8.4591 −6.728×10−2 −6.216×10−2 −2.006×10−1 −1.861×10−1 5.666×10−2 4.980×10−2
1.9 0.4 16 16.8173 −8.328×10−2 −6.222×10−2 −2.424×10−1 −1.844×10−1 1.079×10−1 9.506×10−2
1.9 0.4 24 25.0006 −1.094×10−1 −6.486×10−2 −3.071×10−1 −1.878×10−1 1.495×10−1 1.322×10−1
1.9 0.4 32 32.9804 −1.452×10−1 −7.884×10−2 −3.896×10−1 −2.158×10−1 1.787×10−1 1.588×10−1
2.0 0.4 0 0 −5.483×10−2 −5.457×10−2 −1.735×10−1 −1.729×10−1 0 0
2.0 0.4 8 8.4235 −5.871×10−2 −5.445×10−2 −1.844×10−1 −1.720×10−1 4.222×10−2 3.686×10−2
2.0 0.4 16 16.7586 −7.020×10−2 −5.555×10−2 −2.158×10−1 −1.733×10−1 8.064×10−2 7.057×10−2
2.0 0.4 24 24.9396 −8.902×10−2 −5.844×10−2 −2.645×10−1 −1.778×10−1 1.122×10−1 9.860×10−2
2.0 0.4 32 32.9389 −1.150×10−1 −6.536×10−2 −3.267×10−1 −1.896×10−1 1.351×10−1 1.193×10−1
1.7 0.5 0 0 −7.421×10−2 −7.401×10−2 −1.815×10−1 −1.810×10−1 0 0
1.7 0.5 8 8.4736 −8.957×10−2 −7.168×10−2 −2.173×10−1 −1.750×10−1 1.379×10−1 1.256×10−1
1.7 0.5 16 16.8300 −1.347×10−1 −6.999×10−2 −3.201×10−1 −1.676×10−1 2.611×10−1 2.378×10−1
1.8 0.5 0 0 −7.589×10−2 −7.620×10−2 −1.993×10−1 −2.000×10−1 0 0
1.8 0.5 8 8.4395 −8.644×10−2 −6.929×10−2 −2.254×10−1 −1.829×10−1 1.005×10−1 9.076×10−2
1.8 0.5 16 16.7776 −1.175×10−1 −7.210×10−2 −3.004×10−1 −1.880×10−1 1.911×10−1 1.726×10−1
1.8 0.5 24 24.9413 −1.678×10−1 −7.395×10−2 −4.158×10−1 −1.881×10−1 2.638×10−1 2.385×10−1
1.9 0.5 0 0 −6.646×10−2 −6.620×10−2 −1.855×10−1 −1.849×10−1 0 0
1.9 0.5 8 8.4059 −7.386×10−2 −6.320×10−2 −2.048×10−1 −1.768×10−1 7.312×10−2 6.579×10−2
1.9 0.5 16 16.7233 −9.572×10−2 −6.551×10−2 −2.603×10−1 −1.809×10−1 1.395×10−1 1.255×10−1
1.9 0.5 24 24.8877 −1.312×10−1 −7.087×10−2 −3.461×10−1 −1.909×10−1 1.937×10−1 1.744×10−1
1.9 0.5 32 32.8741 −1.795×10−1 −8.247×10−2 −4.544×10−1 −2.091×10−1 2.320×10−1 2.092×10−1
2.0 0.5 0 0 −5.987×10−2 −5.995×10−2 −1.761×10−1 −1.763×10−1 0 0
2.0 0.5 8 8.3750 −6.516×10−2 −5.918×10−2 −1.906×10−1 −1.738×10−1 5.456×10−2 4.882×10−2
2.0 0.5 16 16.6725 −8.081×10−2 −5.817×10−2 −2.324×10−1 −1.694×10−1 1.044×10−1 9.343×10−2
2.0 0.5 24 24.8347 −1.063×10−1 −6.254×10−2 −2.970×10−1 −1.776×10−1 1.456×10−1 1.304×10−1
2.0 0.5 32 32.8378 −1.412×10−1 −6.993×10−2 −3.787×10−1 −1.893×10−1 1.756×10−1 1.576×10−1
Table IV: As in Tables II and III, but for different values of eccentricity e; the Teukolsky-based fluxes for E and Lz have an accuracy of 10
e θinc ι ∆t/M ∆θinc ∆ι
(deg.) (deg.) (deg.) (deg.)
0 0 0 1.250×106 0 0
0 5 5.355510 1.217×106 1.949×10−1 4.954×10−1
0 10 10.679331 1.118×106 3.468×10−1 8.631×10−1
0 15 15.943192 9.574×105 4.236×10−1 1.019
0 20 21.125167 7.446×105 4.109×10−1 9.440×10−1
0 25 26.211779 4.981×105 3.158×10−1 6.860×10−1
0 30 31.199048 2.528×105 1.732×10−1 3.527×10−1
0 35 36.092514 6.584×104 4.636×10−2 8.806×10−2
0.1 0 0 1.228×106 0 0
0.1 5 5.351602 1.198×106 4.517×10−1 7.766×10−1
0.1 10 10.671900 1.103×106 6.900×10−1 1.236
0.1 15 15.932962 9.426×105 7.283×10−1 1.344
0.1 20 21.113129 7.315×105 6.433×10−1 1.187
0.1 25 26.199088 4.900×105 4.780×10−1 8.547×10−1
0.1 30 31.186915 2.513×105 2.730×10−1 4.585×10−1
0.1 35 36.082095 6.589×104 8.385×10−2 1.279×10−1
0.2 0 0 1.173×106 0 0
0.2 5 5.339916 1.150×106 1.204 1.598
0.2 10 10.649670 1.064×106 1.698 2.331
0.2 15 15.902348 9.043×105 1.618 2.293
0.2 20 21.077081 6.980×105 1.324 1.900
0.2 25 26.161046 4.693×105 9.545×10−1 1.351
0.2 30 31.150481 2.486×105 5.674×10−1 7.711×10−1
0.2 35 36.050712 7.562×104 2.070×10−1 2.648×10−1
0.3 0 0 1.087×106 0 0
0.3 5 5.320559 1.069×106 2.307 2.788
0.3 10 10.612831 1.001×106 3.256 4.007
0.3 15 15.851572 8.454×105 2.984 3.741
0.3 20 21.017212 6.483×105 2.375 2.998
0.3 25 26.097732 4.408×105 1.700 2.129
0.3 30 31.089639 2.493×105 1.040 1.276
0.3 35 35.997987 1.108×105 4.626×10−1 5.569×10−1
Table V: Variation in the inclination angles ι and θinc as well as time needed to reach the separatrix for several inspirals through the nearly
horizon-skimming regime. In all of these cases, the binary’s mass ratio was fixed to µ/M = 10−6, the large black hole’s spin was fixed to
a = 0.998M , and the orbits were begun at p = 1.9M . The time interval ∆t is the total accumulated time it takes for the inspiralling body
to reach the separatrix (at which time it rapidly plunges into the black hole). The angles ∆θinc and ∆ι are the total integrated change in these
inclination angles that we compute. For the e = 0 cases, inspirals are computed using fits to the circular-Teukolsky fluxes of E and Lz ; for
eccentric orbits we use the kludge fluxes (40), (43) and (44). Notice that ∆θinc and ∆ι are always positive — the inclination angle always
increases during the inspiral through the nearly horizon-skimming region. The magnitude of this increase never exceeds a few degrees.
ABSTRACT
  We have performed a detailed analysis of orbital motion in the vicinity of a
nearly extremal Kerr black hole. For very rapidly rotating black holes (spin
a=J/M>0.9524M) we have found a class of very strong field eccentric orbits
whose angular momentum L_z increases with the orbit's inclination with respect
to the equatorial plane, while keeping latus rectum and eccentricity fixed.
This behavior is in contrast with Newtonian intuition, and is in fact opposite
to the "normal" behavior of black hole orbits. Such behavior was noted
previously for circular orbits; since it only applies to orbits very close to
the black hole, they were named "nearly horizon-skimming orbits". Our analysis
generalizes this result, mapping out the full generic (inclined and eccentric)
family of nearly horizon-skimming orbits. The earlier work on circular orbits
reported that, under gravitational radiation emission, nearly horizon-skimming
orbits tend to evolve to smaller orbit inclination, toward prograde equatorial
configuration. Normal orbits, by contrast, always demonstrate slowly growing
orbit inclination (orbits evolve toward the retrograde equatorial
configuration). Using up-to-date Teukolsky-fluxes, we have concluded that the
earlier result was incorrect: all circular orbits, including nearly
horizon-skimming ones, exhibit growing orbit inclination. Using kludge fluxes
based on a Post-Newtonian expansion corrected with fits to circular and to
equatorial Teukolsky-fluxes, we argue that the inclination grows also for
eccentric nearly horizon-skimming orbits. We also find that the inclination
change is, in any case, very small. As such, we conclude that these orbits are
not likely to have a clear and peculiar imprint on the gravitational waveforms
expected to be measured by the space-based detector LISA.

<|endoftext|><|startoftext|>
The Blue Straggler Population of the Globular Cluster M5 1
B. Lanzoni1,2, E. Dalessandro1,2, F.R. Ferraro1, C. Mancini3, G. Beccari2,4,5, R.T. Rood6,
M. Mapelli7, S. Sigurdsson8
1 Dipartimento di Astronomia, Università degli Studi di Bologna, via Ranzani 1, I–40127
Bologna, Italy
2 INAF–Osservatorio Astronomico di Bologna, via Ranzani 1, I–40127 Bologna, Italy
3 Dipartimento di Astronomia e Scienza dello Spazio, Università degli Studi di Firenze,
Largo Enrico Fermi 2, I– 50125 Firenze, Italy
4 Dipartimento di Scienze della Comunicazione, Università degli Studi di Teramo, Italy
5 INAF–Osservatorio Astronomico di Collurania, Via Mentore Maggini, I–64100 Teramo,
Italy
6 Department of Astronomy and Astrophysics, The Pennsylvania State University, 525
Davey Lab, University Park, PA 16802
7 S.I.S.S.A., Via Beirut 2 - 4, I–34014 Trieste, Italy
8 Astronomy Department, University of Virginia, P.O. Box 400325, Charlottesville, VA,
22904
20 March, 07
ABSTRACT
By combining high-resolution HST and wide-field ground based observations,
in ultraviolet and optical bands, we study the Blue Stragglers Star (BSS) popula-
tion of the galactic globular cluster M5 (NGC 5904) from its very central regions
up to its periphery. The BSS distribution is highly peaked in the cluster center,
decreases at intermediate radii and rises again outward. Such a bimodal dis-
tribution is similar to those previously observed in other globular clusters (M3,
47 Tucanae, NGC 6752). As for these clusters, dynamical simulations suggest
that, while the majority of BSS in M5 could be originated by stellar collisions,
a significant fraction (20-40%) of BSS generated by mass transfer processes in
primordial binaries is required to reproduce the observed radial distribution. A
candidate BSS has been detected beyond the cluster tidal radius. If confirmed,
this could represent an interesting case of an ”evaporating” BSS.
http://arxiv.org/abs/0704.0139v1
– 2 –
Subject headings: Globular clusters: individual (M5); stars: evolution – binaries:
general - blue stragglers
1. INTRODUCTION
In globular cluster (GC) color-magnitude diagrams (CMD) blue straggler stars (BSS)
appear to be brighter and bluer than the Turn-Off (TO) stars and lie along an extension of
the Main Sequence. Since BSS mimic a rejuvenated stellar population with masses larger
than the normal cluster stars (this is also confirmed by direct mass measurements; e.g. Shara
et al. 1997), they are thought to be objects that have increased their initial mass during
their evolution by means of some process. Two main scenarios have been proposed for their
formation: the collisional scenario suggests that BSS are the end-products of stellar mergers
induced by collisions (COL-BSS), while in the mass-transfer scenario BSS form by the mass-
transfer activity between two companions in a binary system (MT-BSS), possibly up to the
complete coalescence of the two stars. Hence, understanding the origin of BSS in stellar
clusters provides valuable insight both on the binary evolution processes and on the effects
of dynamical interactions on the (otherwise normal) stellar evolution.
The relative efficiency of the two formation mechanisms is thought to depend on the en-
vironment (Fusi Pecci et al. 1992; Ferraro et al. 1999a; Bellazzini et al. 2002; Ferraro et al.
2003). COL-BSS are expected to be formed preferentially in high-density environments (i.e.,
the GC central regions), where stellar collisions are most probable, and MT-BSS should
mainly populate lower density environments (the cluster peripheries), where binary systems
can more easily evolve in isolation without suffering exchanges or ionization due to gravita-
tional encounters. The overall scenario is complicated by the fact that primordial binaries
can also sink to the core due to mass segregation processes, and “new” binaries can be formed
in the cluster centers by gravitational encounters. The two formation mechanisms are likely
to be at work simultaneously in every GC (see the case of M3 as an example; Ferraro et al.
1993, 1997), but the identification of the cluster properties that mainly affect their relative
efficiency is still an open issue.
One possibility for distinguishing between the two types of BSS is offered by high-
resolution spectroscopic studies. Anomalous chemical abundances are expected at the surface
1Based on observations with the NASA/ESA HST, obtained at the Space Telescope Science Institute,
which is operated by AURA, Inc., under NASA contract NAS5-26555. Also based on WFI observations
collected at the European Southern Observatory, La Silla, Chile, within the observing programs 62.L-0354
and 64.L-0439.
– 3 –
of BSS resulting from MT activity (Sarna & de Greve 1996), while they are not predicted
in case of a collisional formation (Lombardi, Rasio & Shapiro 1995). Such studies have just
become feasible, and the results found in the case of 47 Tucanae (47 Tuc; Ferraro et al.
2006a) are encouraging. The detection of unexpected properties of stars along standard
evolutionary sequences (e.g., variability, anomalous population fractions, or peculiar radial
distributions) can help estimating the fraction of binaries within a cluster (see, e.g., Bailyn
1994; Albrow et al. 2001; Bellazzini et al. 2002; Beccari et al. 2006), but such evidence does
not directly allow the determination of the relative efficiency of the two BSS formation
processes.
The most widely applicable tool to probe the origin of BSS is their radial distribution
within the clusters (see Ferraro 2006, for a review). This has been observed to be bimodal
(i.e., highly peaked in the cluster centers and peripheries, and significantly lower at inter-
mediate radii) in at least 4 GCs: M3 (Ferraro et al. 1997), 47 Tuc (Ferraro et al. 2004),
NGC 6752 (Sabbi et al. 2004), and M5 (Warren, Sandquist & Bolte 2006, hereafter W06).
Preliminary evidence of bimodality has also been found in M55 (Zaggia, Piotto & Capaccioli
1997). Dynamical simulations suggest that the bimodal radial distributions observed in M3,
47 Tuc and NGC 6752 (Mapelli et al. 2004, 2006) result from ∼ 40− 50% of MT-BSS with
the balance being COL-BSS. In this context, the case of ω Cen is atypical: the BSS radial
distribution in this cluster is flat (Ferraro et al. 2006b), and mass segregation processes have
not yet played a major role, thus implying that this system is populated by a vast majority
of MT-BSS (Mapelli et al. 2006). These results demonstrate that detailed studies of the BSS
radial distribution within GCs are very powerful tools for better understanding the complex
interplay between dynamics and stellar evolution in dense stellar systems.
In the present paper we extend this kind of investigation to M5 (NGC 5904). With HST-
WFPC2 and -ACS ultraviolet and optical high-resolution images of the core we have been
able to efficiently detect the BSS population even in the severely crowded central regions.
Moreover, with wide-field optical observations performed with ESO-WFI we sampled the
entire cluster extension. The combination of these two data sets allowed us to study the
dynamical properties of M5, accurately redetermining its center of gravity, its surface density
profile, and the BSS radial distribution over the entire cluster. The BSS population of M5
has been recently studied by W06, but we have extended the analysis to larger distances
from the cluster center, and we have used Monte-Carlo dynamical simulations to interpret
the observational results.
– 4 –
2. OBSERVATIONS AND DATA ANALYSIS
2.1. The data sets
The present study is based on a combination of two different photometric data sets:
1. The high-resolution set – It consists of a series of ultraviolet (UV) and optical images
of the cluster center obtained with HST-WFPC2 (Prop. 6607, P.I. Ferraro). To efficiently
resolve the stars in the highly crowded central regions, the Planetary Camera (PC, being the
highest resolution instrument: 0.′′046/pixel) has been pointed approximately on the cluster
center, while the three Wide Field Cameras (WF, having a lower resolution: 0.′′1/pixel) have
been used to sample the surrounding regions. Observations have been performed through
filter F255W (medium UV) in order to efficiently select the BSS and horizontal branch
(HB) populations, and through filters F336W (approximately corresponding to an U filter)
and F555W (V ) for the red giant branch (RGB) population and to guarantee a proper
combination with the ground-based data set (see below). The photometric reduction of the
high-resolution images was carried out using ROMAFOT (Buonanno et al. 1983), a package
developed to perform accurate photometry in crowded fields and specifically optimized to
handle under-sampled Point Spread Functions (PSFs; Buonanno & Iannicola 1989), as in
the case of the HST-WF chips.
To obtain a better coverage of the innermost regions of the cluster, we have also used a
set of public HST-WFPC2 and HST-ACS observations. The HST-WFPC2 data set has been
obtained through filters F439W (B) and F555W (V ) by Piotto et al. (2002), and because
of the different orientation of the camera, it is complementary to ours. Additional HST-ACS
data in filters F435W (B), F606W (V ), and F814W (I) have been retrieved from the ESO-
STECF Science Archive, and have been used to sample the central area not covered by the
WFPC2 observations. All the ACS images were properly corrected for geometric distortions
and effective flux (over the pixel area) following the prescriptions of Sirianni et al. (2005).
The photometric analysis was performed independently in the three drizzled images by using
the aperture photometry code SExtractor (Source-Extractor; Bertin & Arnouts 1996), and
adopting a fixed aperture radius of 2.5 pixels (0.125′′). The magnitude lists were finally
cross-correlated in order to obtain a combined catalog. The adopted combination of the
three HST data sets is sketched in Figure 1 and provided a good coverage of the cluster up
to r = 115′′.
2. The wide-field set - A complementary set of wide-field B and V images was secured by
using the Wide Field Imager (WFI) at the 2.2m ESO-MPI telescope during an observing run
in April 2000. Thanks to the exceptional imaging capabilities of WFI (each image consists of
a mosaic of 8 CCDs, for a global field of view of 34′×34′), these data cover the entire cluster
– 5 –
extension (see Figure 2, where the cluster is roughly centered on CCD #7). The raw WFI
images were corrected for bias and flat field, and the overscan regions were trimmed using
IRAF2 tools. The PSF fitting procedure was performed independently on each image using
DoPhot (Schechter, Mateo & Saha 1993). All the uncertain detections, usually caused by
photometric blends, stars near the CCD gaps or saturated stars, have been checked one by
one using ROMAFOT (Buonanno et al. 1983).
2.2. Astrometry and center of gravity
The HST+WFI catalog has been placed on the absolute astrometric system by adopting
the procedure already described in Ferraro et al. (2001, 2003). The new astrometric Guide
Star Catalog (GSC-II3) was used to search for astrometric standard stars in the WFI field of
view (FoV), and a cross-correlation tool specifically developed at the Bologna Observatory
(Montegriffo et al. 2003, private communication) has been employed to obtain an astrometric
solution for each of the 8 CCDs. Several hundred GSC-II reference stars were found in each
chip, thus allowing an accurate absolute positioning of the stars. Then, a few hundred stars
in common between the WFI and the HST FoVs have been used as secondary standards to
place the HST catalog on the same absolute astrometric system. At the end of the procedure
the global uncertainties in the astrometric solution are of the order of ∼ 0.′′2, both in right
ascension (α) and declination (δ).
Given the absolute positions of individual stars in the innermost regions of the cluster,
the center of gravity Cgrav has been determined by averaging coordinates α and δ of all
stars lying in the PC FoV following the iterative procedure described in Montegriffo et
al. (1995; see also Ferraro et al. 2003, 2004). In order to correct for spurious effects
due to incompleteness in the very inner regions of the cluster, we considered two samples
with different limiting magnitudes (m555 < 19.5 and m555 < 20), and we computed the
barycenter of stars for each sample. The two estimates agree within ∼ 1′′, giving Cgrav at
α(J2000) = 15h 18m 33.s53, δ(J2000) = +2o 4′ 57.′′06, with a 1σ uncertainty of 0.′′5 in both α
and δ, corresponding to about 10 pixels in the PC image. This value of Cgrav is located at
∼ 4′′ south-west (∆α = −4′′, ∆δ = −0.′′9) from that previously derived by Harris (1996) on
the basis of the surface brightness distribution.
2IRAF is distributed by the National Optical Astronomy Observatory, which is operated by the Associa-
tion of Universities for Research in Astronomy, Inc., under cooperative agreement with the National Science
Foundation.
3Available at http://www-gsss.stsci.edu/Catalogs/GSC/GSC2/GSC2.htm.
http://www-gsss.stsci.edu/Catalogs/GSC/GSC2/GSC2.htm
– 6 –
2.3. Photometric calibration and definition of the catalogs
The optical HST magnitudes (i.e., those obtained through the WFPC2 filters F439W
and F555W, and through ACS filters F435W, F606W, F814W), as well as the WFI B and
V magnitudes have been all calibrated on the catalog of Sandquist et al. (1996). The UV
magnitudes m160 and m255 have been calibrated to the Holtzman et al. (1995) zero-points
following Ferraro et al. (1997, 2001), while the U magnitude m336 has been calibrated to
Dolphin (2000).
In order to reduce spurious effects due to the low resolution of the ground-based obser-
vations in the most crowded regions of the cluster, we use only the HST data for the inner
115′′, this value being imposed by the FoV of the WFPC2 and ACS cameras (see Figure
1). In particular, we define as HST sample the ensemble of all the stars in the WFPC2
and ACS combined catalog having r ≤ 115′′ from the center, and as WFI sample all stars
detected with WFI at r > 115′′ (see Figure 2). The CMDs of the HST and WFI samples in
the (V, U − V ) and (V, B − V ) planes are shown in Figure 3.
2.4. Density profile
We have determined the projected density profile over the entire cluster extension, from
Cgrav out to ∼ 1400
∼ 23.′3, by direct star counts, considering only stars brighter than
V = 20 (see Figure 3) in order to avoid incompleteness biases. The brightest RGB stars that
are strongly saturated in the ACS data set have been excluded from the analysis, but since
they are few in number, the effect on the resulting density profile is completely negligible.
Following the procedure already described in Ferraro et al. (1999a, 2004), we have divided
the entire HST+WFI sample in 27 concentric annuli, each centered on Cgrav and split in
an adequate number of sub-sectors. The number of stars lying within each sub-sector was
counted, and the star density was obtained by dividing these values by the corresponding
sub-sector areas. The stellar density in each annulus was then obtained as the average of
the sub-sector densities, and its standard deviation was estimated from the variance among
the sub-sectors.
The radial density profile thus derived is plotted in Figure 4, where we also show the best-
fit mono-mass King model and the corresponding values of the core radius and concentration:
rc = 27
′′ (with a typical error of ∼ ±2′′) and c = 1.68, respectively. These values confirm
that M5 has not yet experienced core collapse, and they are in good agreement with those
quoted by McLaughlin & van der Marel (2005, rc = 26.
′′3 and c = 1.71), and marginally
consistent with those listed by Harris (1996, rc = 25.
′′2 and c = 1.83), both derived from
– 7 –
the surface brightness profile. Our value of rc corresponds to ∼ 1 pc assuming the distance
modulus (m−M)0 = 14.37 (d ∼ 7.5 Kpc, Ferraro et al. 1999b).
3. DEFINITION OF THE SAMPLES
In order to study the BSS radial distribution and detect possible peculiarities, both the
BSS and a reference population must be properly defined. Since the HST and the WFI data
sets have been observed in different photometric bands, different selection boxes are needed
to separate the samples in the CMDs. The adopted strategy is described in the following
sections (see also Ferraro et al. 2004 for a detailed discussion of this issue).
3.1. The BSS selection
At UV wavelengths BSS are among the brightest objects in a GC, and RGB stars are
particularly faint. By combining these advantages with the high-resolution capability of HST,
the usual problems associated with photometric blends and crowding in the high density
central regions of GCs are minimized, and BSS can be most reliably recognized and separated
from the other populations in the UV CMDs. For these reasons our primary criterion for the
definition of the BSS sample is based on the position of stars in the (m255, m255 −U) plane.
In order to avoid incompleteness bias and the possible contamination from TO and sub-giant
branch stars, we have adopted a limiting magnitude m255 = 18.35, roughly corresponding
to 1 magnitude brighter than the cluster TO. This is also the limiting magnitude used by
W06, facilitating the comparison with their study. The resulting BSS selection box in the
UV CMD is shown in Figure 5. Once selected in the UV CMD, the bulk of the BSS lying in
the field in common with the optical-HST sample has been used to define the selection box
and the limiting magnitude in the (B, B − V ) plane. The latter turns out to be B ≃ 17.85,
and the adopted BSS selection box in the optical CMD is shown in Figure 6. The two stars
lying outside the selection box (namely BSS-19 and BSS-20 in Table 1) have been identified
as BSS from the (m255, m255−U) CMD. Indeed, they are typical examples of how the optical
magnitudes are prone to blend/crowding problems, while the BSS selection in UV bands is
much more secure and reliable. An additional BSS (BSS-47 in Table 1) lies near the edge
of the ACS FoV and has only V and I observations; thus it was selected in the (V, V − I)
plane (see Figure 7, where this BSS is shown together with the other 5 identified in the ACS
complementary sample).
With these criteria we have identified 60 BSS: 47 BSS in the HST sample (r ≤ 115′′)
– 8 –
and 13 in the WFI one. Their coordinates and magnitudes are listed in Table 1. Out of the
47 BSS identified in the HST sample, 41 are from the WFPC2 data set, and 6 from the ACS
catalog. As shown in Figure 1 their projected distribution is quite asymmetric with the N-E
sector seemingly underpopulated. The statistical significance of such an asymmetry appears
even higher if only the BSS outside the core are considered. However a quantitative discussion
of this topic is not warranted unless additional evidences supporting this anomalous spatial
distribution are collected. One of the inner BSS (BSS-29 in Table 1) lying at 21.′′76 from the
center, corresponds to the low-amplitude variable HST-V28 identified by Drissen & Shara
(1998)4. In the WFI sample (r > 115′′) we find 13 BSS, with a more symmetric spatial
distribution (see Figure 2). The most distant BSS (BSS-60 in Table 1, marked with an
empty triangle in Fig.6) lies at ∼ 24′ from the center, i.e., beyond the cluster tidal radius.
Hence, it might be an evaporating BSS previously belonging to the cluster. However, further
investigations are needed before firmly assessing this issue.
In order to perform a proper comparison with W06 study, we have transformed their
BSS catalog in our astrometric system, and we have found that 50 BSS of their bright sample
lie at r ≤ 115′′: 35 are from the HST sample, 13 from the Canada France Hawaii Telescope
(CFHT) data set, and 2 from the Cerro Tololo Inter-American Observatory (CTIO) sample;
in the outer regions (115′′ < r <∼ 425
′′) 9 BSS are identified, all from the CTIO data set.
By cross correlating W06 bright sample with our catalog we have found 43 BSS in
common (see Table 1), 37 at r ≤ 115′′ and 6 outward. In particular, 33 BSS out of the 41
(i.e., 80% of the total) that we have identified in the WFPC2-HST sample5 are found in
both catalogs, while 3 of our BSS belong 5 to their faint BSS sample (namely, BSS-27, 34,
and 40, corresponding to their Core BSS 70, 79, and 76, respectively), 5 of our BSS have
been missed in W06 paper, and 2 objects in their sample are classified as HB stars in our
study. This is probably due to different selection criteria, and/or small differences in the
measured magnitudes, caused by the different data reduction procedures and photometric
analysis. For example, W06 identify the BSS on the basis of both the UV and the optical
observations, while we select the BSS only in the UV plane whenever possible. Out of the
other 15 BSS found at r ≤ 115′′ in the ground-based CFHT/CTIO sample of W06, 8 BSS
(Core BSS 38–45 in their Table 2) clearly are false identifications. They are arranged in a
very unlikely ring around a strongly saturated star, as can be seen in Figure 8, where the
position in the sky of the 8 spurious BSS are overplotted on the CFHT image. Though they
4The observations presented here do not have the time coverage needed to properly search for BSS
variability.
5Note that the WFPC2-HST observations used in W06 and in the present study are the same.
– 9 –
clearly are spurious identifications, they still define a clean sequence in the (B, B−I) CMD,
nicely mimicking the BSS magnitudes and colors. As already discussed in previous papers,
this once again demonstrates how automatic procedures for the search of peculiar objects
are prone to errors, especially when using ground-based observations to probe very crowded
stellar regions. We emphasize that all the candidate BSS listed in our Table 1 have been
visually inspected evaluating the quality and the precision of the PSF fitting. This procedure
significantly reduces the possibility of introducing spurious objects in the sample. Out of the
remaining 7 BSS, 4 objects (namely their Core BSS 32, 30, 37 and 28) are also confirmed
by our ACS observations (BSS-42, 43, 44, and 45 respectively), while 2 others (their Core
BSS 27 and Ground BSS 6) are not found in the ACS data set, and the remaining one (their
Ground BSS 7) is not included in our observation FoV. In turn, two BSS identified in our
ACS data set (BSS 46 and 47) are missed in their sample. Concerning the BSS lying at
115′′ < r < 450′′, 6 objects (out of 9 found in both samples) are in common between the two
catalogs (see Table 1), one (BSS-55) belongs to W06 faint sample (their Ground BSS 23),
while the remaining 2 do not coincide. Moreover, 4 additional BSS have been identified at
r > 450′′ in our study.
3.2. The reference population
Since the HB sequence is bright and well separable in the UV and optical CMDs, we
chose these stars as the primary representative population of normal cluster stars to be used
for the comparison with the BSS data set. As with the BSS, the HB sample was first defined
in the (m255, m255 −U) plane, and the corresponding selection box in (B, B − V ) has then
been determined by using the stars in common between the UV and the optical samples. The
resulting selection boxes in both diagrams are shown in Figures 5 and 6, and are designed
to include the bulk of HB stars6. Slightly different selection boxes would include or exclude
a few stars only without affecting the results.
We have used WFI observations to roughly estimate the impact of possible foreground
field stars contamination on the cluster population selection. As shown in the right-hand
panel of Figure 6, field stars appear to define an almost vertical sequence at 0.4 < B−V < 1
in the (B, B − V ) CMD. Hence, they do not affect the BSS selection box, but marginally
contaminate the reddest end of the HB. In particular, 5 objects have been found to lie within
the adopted HB box in the region at r > rt sample by our observations (∼ 194 arcmin
this corresponds to 0.026 spurious HB stars per arcmin2. On the basis of this, 11 field stars
6The large dispersion in the redder HB stars arises because RR Lyrae variables are included.
– 10 –
are expected to ”contaminate” the HB population over the sampled cluster region (r < rt).
4. THE BSS RADIAL DISTRIBUTION
The radial distribution of BSS in M5 has been studied following the same procedure
previously adopted for other clusters (see references in Ferraro 2006; Beccari et al. 2006).
First, we have compared the BSS cumulative radial distribution to that of HB stars. A
Kolmogorov-Smirnov test gives a ∼ 10−4 probability that they are extracted from the same
population (see Figure 9). BSS are more centrally concentrated than HB stars at ∼ 4σ level.
For a more quantitative analysis, the surveyed area has been divided into 8 concentric
annuli, with radii listed in Table 2. The number of BSS (NBSS) and HB stars (NHB), as
well as the fraction of sampled luminosity (Lsamp) have been measured in the 8 annuli and
the obtained values are listed in Table 2. Note that HB star counts listed in the table are
already decontaminated from field stars, according to the procedure described in Section 3.2
(1, 2, and 8 HB stars in the three outer annuli have been estimated to be field stars). The
listed values have been used to compute the specific frequency FHBBSS ≡ NBSS/NHB, and the
double normalized ratio (see Ferraro et al. 1993):
Rpop =
(Npop/N
(Lsamp/L
tot )
, (1)
with pop = BSS, HB.
In the present study luminosities have been calculated from the surface density profile
shown in Figure 4. The surface density has been transformed into luminosity by means of a
normalization factor obtained by assuming that the value obtained in the core (r ≤ 27′′) is
equal to the sum of the luminosities of all the stars with V ≤ 20 lying in this region. The
distance modulus quoted in Section 2.4 and a reddening E(B−V ) = 0.03 have been adopted
(Ferraro et al. 1999b). The fraction of area sampled by the observations in each annulus has
been carefully computed, and the sampled luminosity in each annulus has been corrected for
incomplete spatial coverage (in the case of annuli 3 and 8; see Figures 1 and 2).
The resulting radial trend of RHB is essentially constant with a value close to unity
over the surveyed area (see Figure 10). This is just what expected on the basis of the
stellar evolution theory, which predicts that the fraction of stars in any post-main sequence
evolutionary stage is strictly proportional to the fraction of the sampled luminosity (Renzini
& Fusi Pecci 1988). Conversely, BSS follow a completely different radial distribution. As
shown in Figure 10 the specific frequency RBSS is highly peaked at the cluster center (a
– 11 –
factor of ∼ 3 higher than RHB in the innermost bin), decreases to a minimum
7 at r ≃ 10 rc,
and rises again outward. The same behavior is clearly visible also in Figure 11, where the
population ratio NBSS/NHB is plotted as a function of r/rc.
Note that the region between 800′′ and rt ≃ 1290
′′ (and thus also BSS-59, that lies at
r ≃ 995.′′5) has not been considered in the analysis, since our observations provide a poor
sampling of this annulus: only 35% of its area, corresponding to ∼ 0.4% of the total sampled
light, is covered by the WFI pointing. However, for sake of completeness, we have plotted in
Figure 12 the corresponding value of FHBBSS even for this annulus (empty circle in the upper
panel): as can be seen, there is a hint for a flattening of the BSS radial distribution in the
cluster outskirts.
4.1. Dynamical simulations
Following the same approach as Mapelli et al. (2004, 2006), we now exploit dynamical
simulations to derive some clues about the BSS formation mechanisms from their observed
radial distribution. We use the Monte-Carlo simulation code originally developed by Sig-
urdsson & Phinney (1995) and upgraded in Mapelli et al. (2004, 2006). In any simulation
run we follow the dynamical evolution of N BSS within a background cluster, taking into
account the effects of both dynamical friction and distant encounters. We identify as COL-
BSS those objects having initial positions ri
∼ rc, and as MT-BSS stars initially lying at
ri ≫ rc (this because stellar collisions are most probable in the central high-density regions
of the cluster, while primordial binaries most likely evolve in isolation in the periphery).
Within these two radial ranges, all initial positions are randomly generated following the
probability distribution appropriate for a King model. The BSS initial velocities are ran-
domly extracted from the cluster velocity distribution illustrated in Sigurdsson & Phinney
(1995), and an additional natal kick is assigned to the COL-BSS in order to account for the
recoil induced by the encounters. Each BSS has characteristic mass M and lifetime tlast. We
follow their dynamical evolution in the cluster (fixed) gravitational potential for a time ti
(i = 1, N), where each ti is a randomly chosen fraction of tlast. At the end of the simulation
we register the final positions of BSS, and we compare their radial distribution with the
observed one. We repeat the procedure until a reasonable agreement between the simulated
and the observed distributions is reached; then, we infer the percentage of collisional and
mass-transfer BSS from the distribution of the adopted initial positions in the simulation.
For a detailed discussion of the ranges of values appropriate for these quantities and
7Note that no BSS have been found between 3.′5 and 5′.
– 12 –
their effects on the final results we refer to Mapelli et al. (2006). Here we only list the
assumptions made in the present study:
– the background cluster is approximated with a multi-mass King model, determined as
the best fit to the observed profile8. The cluster central velocity dispersion is set to
σ = 6.5 km s−1 (Dubath et al. 1997), and, assuming 0.5M⊙ as the average mass of the
cluster stars, the central stellar density is nc = 2× 10
4 pc−3 (Pryor & Meylan 1993);
– the COL-BSS are distributed with initial positions ri ≤ rc and are given a natal kick
velocity of 1× σ;
– initial positions ranging between 5 rc and rt (with the tidal radius rt ≃ 48 rc) have been
considered for MT-BSS in different runs;
– BSS masses have been fixed to M = 1.2M⊙ (Ferraro et al. 2006a), and their charac-
teristic lifetime to tlast = 2 Gyr;
– in each simulation run we have followed the evolution of N = 10, 000 BSS.
The simulated radial distribution that best reproduces the observed one (with a reduced
χ2 ≃ 0.6) is shown in Figure 11 (solid line) and is obtained by assuming that ∼ 80% of the
BSS population was formed in the core through stellar collisions, while only ∼ 20% is made
of MT-BSS. A higher fraction ( >∼ 40%) of MT-BSS does not correctly reproduce the steep
decrease of the distribution and seriously overpredict the number of BSS at r ∼ 10 rc, where
no BSS at all are found, but it nicely matches the observed upturning point at r ≃ 13 rc (see
the dashed line in Figure 11). On the other hand, a population of only COL-BSS is unable to
properly reproduce the external upturn of the distribution (see the dotted line in Figure 11),
and 100% of MT-BSS is also totally excluded. Assuming heavier BSS (up to M = 1.5M⊙) or
different lifetimes tlast (between 1 and 4 Gyr) does not significantly change these conclusions,
since both these parameters mainly affect the external part of the simulated BSS distribution.
Thus, an appreciable effect can be seen only in the case of a relevant upturn, and negligible
variations are found in the best-fit case and when assuming 100% COL-BSS. The effect starts
to be relevant in the simulations with 40% or more MT-BSS, which are however inconsistent
with the observations at intermediate radii (see above).
By using the simulations and the dynamical friction timescale (from, e.g., Mapelli et al.
2006), we have also computed the radius of avoidance of M5. This is defined as the char-
acteristic radial distance within which all MT-BSS are expected to have already sunk to
8By adopting the same mass groups as those of Mapelli et al. (2006), the resulting value of the King
dimensionless central potential is W0 = 9.7
– 13 –
the cluster core, because of mass segregation processes. Assuming 12 Gyr for the age of
M5 (Sandquist et al. 1996) and 1.2M⊙ for the BSS mass, we find that ravoid ≃ 10 rc. This
nicely corresponds to the position of the minimum in the observed BSS radial distribution,
in agreement with the findings of Mapelli et al. (2004, 2006).
5. SUMMARY AND DISCUSSION
In this paper we have used a combination of HST UV and optical images of the cluster
center and wide-field ground-based observations covering the entire cluster extension to de-
rive the main structural parameters and to study the BSS population of the galactic globular
cluster M5.
The accurate determination of the cluster center of gravity from the high-resolution
data gives α(J2000) = 15h 18m 33.s53, δ(J2000) = +2o 4′ 57.′′06, with a 1σ uncertainty of 0.′′5
in both α and δ. The cluster density profile, determined from direct star counts, is well fit
by a King model with core radius rc = 27
′′ and concentration c = 1.68, thus suggesting that
M5 has not yet suffered the core collapse.
The BSS population of M5 amounts to a total of 59 objects, with a quite asymmetric
projected distribution (see Figure 1) and a high degree of segregation in the cluster center.
With respect to the sampled luminosity and to HB stars, the BSS radial distribution is
bimodal: highly peaked at r <∼ rc, decreasing to a minimum at r ≃ 10 rc, and rising again
outward (see Figures 10 and 11).
The comparison with results of W06 has revealed that 43 (out of 59) bright BSS iden-
tified by these authors at r <∼ 450
′′ are in common with our sample. Moreover, 4 additional
stars classified as faint BSS in their study are in common with our BSS sample at r <∼ 450
Considering that we find 56 BSS within the same radial distance from the center, this corre-
sponds to 84% matching of our catalogue. The discrepancies are explained by different data
reduction procedures, photometric analysis, and adopted selection criteria, other than the
spurious identification of 8 BSS by W06, due a strongly saturated star in their sample. The
central peak of the RBSS distribution in our study is slightly higher (but compatible within
the error bar) compared to that of W06, and we extend the analysis to larger distance from
the center (out to r > 800′′), thus unveiling the external upturn and the possible flattening
of the BSS distribution in the cluster outskirts.
Moreover, we have compared the BSS radial distribution of M5 with that observed
in other GCs studied in a similar way. In Figure 12 we plot the specific frequency FHBBSS
as a function of (r/rc) for M5, M3, 47 Tuc, and NGC 6752. Such a comparison shows
– 14 –
that the BSS radial distributions in these clusters are only qualitatively similar, with a
high concentration at the center and an upturn outward. However, significant quantitative
differences are apparent: (1) the FHBBSS peak value, (2) the steepness of the decreasing branch of
the distribution, (3) the radial position of the minimum (marked by arrows in the figure), and
(4) the extension of the “zone of avoidance,” i.e., the intermediate region poorly populated
by BSS. In particular M5 shows the smallest FHBBSS peak value: it turns out to be ∼ 0.24,
versus a typical value >∼ 0.4 in all the other cases. It also shows the mildest decreasing slope:
at r ≈ 2 rc the specific frequency in M5 is about a half of the peak value, while it decreases
by a factor of 4 in all the other clusters. Conversely, it is interesting to note that the value
reached by FHBBSS in the external regions is ∼ 50-60% of the central peak in all the studied
clusters. Another difference between M5 and the other systems concerns the ratio between
the radius of avoidance and the tidal radius: ravoid ≃ 0.2 rt for M5, while ravoid
∼ 0.13 rt for
47 Tuc, M3, and NGC 6752 (see Tables 1 and 2 in Mapelli et al. 2006).
The dynamical simulations discussed in Section 4.1 suggest that the majority of BSS in
M5 are collisional, with a content of MT-BSS ranging between 20% and 40% of the overall
population. This fraction seems to be smaller than that (40-50%) derived for M3, 47 Tuc
and NGC 6752 by Mapelli et al. (2006), in qualitative agreement with the smaller value of
ravoid/rt estimated for M5, which indicates that the fraction of cluster currently depopulated
of BSS is larger in this system than in the other cases. More in general, the results shown
in Figure 11 exclude a pure collisional BSS content for M5.
Our study has also revealed the presence of a candidate BSS at ∼ 24′ from the center,
i.e., beyond the cluster tidal radius (see Figures 2 and 6 and BSS-59 in Table 1). If confirmed,
this could represent a very interesting case of a BSS previously belonging to M5 and then
evaporating from the cluster (a BSS kicked off from the core the because of dynamical
interactions?).
This research was supported by Agenzia Spaziale Italiana under contract ASI-INAF
I/023/05/0, by the Istituto Nazionale di Astrofisica under contract PRIN/INAF 2006, and
by the Ministero dell’Istruzione, dell’Università e della Ricerca. RTR is partially funded by
NASA through grant number GO-10524 from the Space Telescope Science Institute. We
thank the referee E. Sandquist for the careful reading of the manuscript and the useful
comments and suggestions that significantly improved the presentation of the paper.
REFERENCES
Albrow, M. D., et al. 2001, ApJ, 559, 1060
– 15 –
Bailyn, C. D. 1994, AJ, 107, 1073
Beccari, G., Ferraro, F. R., Lanzoni, B., & Bellazzini, M., 2006, ApJ, 652, L121
Bellazzini, M., Fusi Pecci, F., Messineo, M., Monaco, L., & Rood, R. T. 2002, AJ, 123, 1509
Bertin, E., & Arnouts, S. 1996, A&AS, 117, 393
Buonanno, R., Buscema, G., Corsi, C. E., Ferraro, I., & Iannicola, G. 1983, A&A, 126, 278
Buonanno, R., Iannicola, G. 1989, PASP, 101, 294
Dolphin, A. E. 2000, PASP, 112, 1383
Drissen, L., & Shara, M. M. 1998, AJ, 115, 725
Dubath P., Meylan G., Mayor M., 1997, A&A, 324, 505
Ferraro, F. R., Fusi Pecci, F., Cacciari, C., Corsi, C., Buonanno, R., Fahlman, G. G., &
Richer, H. B. 1993, AJ, 106, 2324
Ferraro, F. R., Paltrinieri, B., Fusi Pecci, F., Cacciari, C., Dorman, B., Rood, R. T., Buo-
nanno, R., Corsi, C. E., Burgarella, D., & Laget, M., 1997, A&A, 324, 915
Ferraro, F. R., Paltrinieri, B., Rood, R. T., Dorman, B. 1999a, ApJ 522, 983
Ferraro F. R., Messineo M., Fusi Pecci F., De Palo M. A., Straniero O., Chieffi A., Limongi
M. 1999b, AJ, 118, 1738
Ferraro, F. R., D’Amico, N., Possenti, A., Mignani, R. P., & Paltrinieri, B. 2001, ApJ, 561,
Ferraro, F. R., Sills, A., Rood, R. T., Paltrinieri, B., & Buonanno, R. 2003, ApJ, 588, 464
Ferraro, F. R., Beccari, G., Rood, R. T., Bellazzini, M., Sills, A., & Sabbi, E. 2004, ApJ,
603, 127
Ferraro, F. R., 2006, in Resolved Stellar Populations, ASP Conference Series, 2005, D. Valls-
Gabaud & M. Chaves Eds., astro-ph/0601217
Ferraro, F. R., et al. 2006a, ApJ, 647, L53
Ferraro, F. R., Sollima, A., Rood, R. T., Origlia, L., Pancino, E., & Bellazzini, M. 2006b,
ApJ, 638, 433
http://arxiv.org/abs/astro-ph/0601217
– 16 –
Fusi Pecci, F., Ferraro, F. R., Corsi, C. E., Cacciari, C., Buonanno, R. 1992, AJ, 104, 1831
Harris, W.E. 1996, AJ, 112, 1487
Holtzman, J. A., Burrows, C. J., Casertano, S., Hester, J. J., Trauger, J. T., Watson, A. M.,
& Worthey, G. 1995, PASP, 107, 1065
Lombardi, J. C. Jr., Rasio, F. A., Shapiro, S. L. 1995, ApJ, 445, L117
Mapelli, M., Sigurdsson, S., Colpi, M., Ferraro, F. R., Possenti, A., Rood, R. T., Sills, A.,
& Beccari, G. 2004, ApJ, 605, L29
Mapelli, M., Sigurdsson, S., Ferraro, F. R., Colpi, M., Possenti, A., & Lanzoni, B. 2006,
MNRAS, 373, 361
McLaughlin, D. E., & van der Marel, R. P. 2005, ApJS, 161, 304
Montegriffo, P., Ferraro, F. R., Fusi Pecci, F., & Origlia, L. 1995, MNRAS, 276, 739
Piotto, G., et al. 2002, A&A, 391, 945
Pryor C., & Meylan G., 1993, Structure and Dynamics of Globular Clusters. Proceedings of a
Workshop held in Berkeley, California, July 15-17, 1992, to Honor the 65th Birthday of
Ivan King. Editors, S.G. Djorgovski and G. Meylan; Publisher, Astronomical Society
of the Pacific, Vol. 50, 357
Renzini, A., & Fusi Pecci, F. 1988, ARA&A, 26, 199
Sabbi, E., Ferraro, F. R., Sills, A., Rood, R. T., 2004, ApJ 617, 1296
Sandquist, E. L., Bolte, M., Stetson, P. B.; Hesser, J. E. 1996, ApJ, 470, 910
Sarna, M. J., & de Greve, J. P. 1996, QJRAS, 37, 11
Schechter, P. L., Mateo, M., & Saha, A. 1993, PASP, 105, 1342
Shara, M. M., Saffer, R. A., & Livio, M. 1997, ApJ, 489, L59
Sigurdsson S., Phinney, E. S., 1995, ApJS, 99, 609
Sirianni, M., et al. 2005, PASP, 117, 1049
Warren, S. R., Sandquist, E. L., & Bolte, M., 2006, ApJ 648, 1026 (W06)
Zaggia, S. R., Piotto, G., & Capaccioli, M., 1997, A&A, 327, 1004
– 17 –
Fig. 1.— Map of the HST sample. The heavy solid line delimits the HST-WFPC2 FoV
of our UV observations (Prop. 6607), the dashed line bounds the FoV of the optical HST-
WFPC2 observations by Piotto et al. (2002), and the dotted line marks the edge of the
complementary ACS data set. The derived center of gravity Cgrav is marked with a cross.
BSS (heavy dots) and the concentric annuli used to study their radial distribution (cfr. Table
2) are also shown. The inner and outer annuli correspond to r = rc = 27
′′ and r = 115′′,
respectively.
This preprint was prepared with the AAS LATEX macros v5.2.
– 18 –
Fig. 2.— Map of the WFI sample. All BSS detected in the WFI sample are marked as heavy
dots, and the concentric annuli used to study their radial distribution are shown as solid lines,
with the inner and outer annuli corresponding to r = 115′′ and r = 800′′, respectively (cfr.
Table 2). The circle corresponding to the tidal radius (rt ≃ 21.
′5) is also shown as dashed-
dotted line. The BSS lying beyond rt might represent a BSS previously belonging to M5
and now evaporating from the cluster.
– 19 –
– 20 –
Fig. 3.— Optical CMDs of the WFPC2-HST and the WFI samples. The hatched regions
indicate the magnitude limit (V ≤ 20) adopted for selecting the stars used to construct the
cluster surface density profile.
– 21 –
Fig. 4.— Observed surface density profile (dots and error bars) and best-fit King model (solid
line). The radial profile is in units of number of stars per square arcseconds. The dotted line
indicates the adopted level of the background, and the model characteristic parameters (core
radius rc, concentration c, dimensionless central potential W0) are marked in the figure. The
lower panel shows the residuals between the observations and the fitted profile at each radial
coordinate.
– 22 –
Fig. 5.— CMD of the ultraviolet HST sample. The adopted magnitude limit and selection
box used for the definition of the BSS population are shown. The resulting fiducial BSS are
marked with empty circles. The open square corresponds to the variable BSS identified by
Drissen & Shara (1998). The box adopted for the selection of HB stars is also shown.
– 23 –
Fig. 6.— CMD of the optical HST-WFPC2 and WFI samples. The adopted BSS and HB
selection boxes are shown, and all the BSS identified in these samples are marked with the
empty circles. The two BSS not included in the box in the left-hand panel lie well within
the selection box in the UV plane and are therefore considered as fiducial BSS. The empty
triangle in the right-hand panel corresponds to the BSS identified beyond the cluster tidal
radius, at r ≃ 24′.
– 24 –
Fig. 7.— CMD of the ACS complementary sample. The BSS selection box is shown, and
the resulting fiducial BSS are marked with empty circles.
– 25 –
Fig. 8.— Left-hand panel: position of the 8 false BSS (marked with white circles) as derived
from Table 2 of W06, overplotted to the CFHT image (units are the same as in their Figure
1). As can be seen, a heavily saturated star is responsible for the false identification. Right-
hand panel: location of the 8 false BSS (empty circles) in the (B, B − I) plane, as derived
from Table 2 of W06 (cfr. to their Fig. 2).
– 26 –
Fig. 9.— Cumulative radial distribution of BSS (solid line) and HB stars (dashed line) as
a function of the projected distance from the cluster center for the combined HST+WFI
sample. The two distributions differ at ∼ 4σ level.
– 27 –
Fig. 10.— Radial distribution of the BSS and HB double normalized ratios, as defined in
equation (1), plotted as a function of the radial coordinate expressed in units of the core
radius. RHB (with the size of the rectangles corresponding to the error bars computed
as described in Sabbi et al. 2004) is almost constant around unity over the entire cluster
extension, as expected for any normal, non-segregated cluster population. Instead, the radial
trend of RBSS (dots with error bars) is completely different: highly peaked in the center (a
factor of ∼ 3 higher than RHB), decreasing at intermediate radii, and rising again outward.
– 28 –
Fig. 11.— Observed radial distribution of the specific frequency NBSS/NHB (filled circles
with error bars), as a function of r/rc. The simulated distribution that best reproduces the
observed one is shown as a solid line and is obtained by assuming 80% of COL-BSS and 20%
of MT-BSS. The simulated distributions obtained by assuming 40% of MT-BSS (dashed line)
and 100% COL-BSS (dotted line) are also shown.
– 29 –
Fig. 12.— Radial distribution of the population ratio NBSS/NHB for M5, M3, 47 Tuc, and
NGC 6752, plotted as a function of the radial distance from the cluster center, normalized to
the core radius rc (from Mapelli et al. 2006, rc ≃ 30
′′, 21′′, 28′′ for M3, 47 Tuc, and NGC 6752,
respectively). The arrows indicate the position of the minimum of the distribution in each
case. The outermost point shown for M5 (empty circle) corresponds to BSS-58, lying at
r ≃ 995′′. This star has not been considered in the quantitative study of the BSS radial
distribution since only a negligible fraction of the annuls between 800′′ and rt is sampled by
our observations.
– 30 –
Table 1. The BSS population of M5
Name RA DEC m255 U B V I W06
[degree] [degree]
BSS-1 229.6354506 2.0841090 16.52 16.15 15.88 15.71 - CR2
BSS-2 229.6388102 2.0849660 17.95 17.38 17.40 17.04 - CR4
BSS-3 229.6383433 2.0842640 18.21 17.63 17.64 17.32 - CR3
BSS-4 229.6416234 2.0851791 17.59 17.22 17.05 16.90 - CR5
BSS-5 229.6416518 2.0836794 16.28 15.99 15.79 15.70 - CR1
BSS-6 229.6381953 2.0810119 17.36 16.99 16.81 16.65 - CR21
BSS-7 229.6403657 2.0824062 17.40 17.07 16.97 16.76 - CR12
BSS-8 229.6412279 2.0823768 17.91 17.47 17.41 17.15 - CR13
BSS-9 229.6376256 2.0793288 17.84 17.12 16.99 16.77 - CR23
BSS-10 229.6401139 2.0794858 17.57 16.98 16.87 16.62 - CR22
BSS-11 229.6396566 2.0784944 17.51 17.20 17.12 16.92 - CR24
BSS-12 229.6432834 2.0797197 18.12 17.64 17.78 17.54 - -
BSS-13 229.6384406 2.0776614 17.36 16.88 16.88 16.59 - CR25
BSS-14 229.6274500 2.0864896 18.07 17.63 17.64 17.33 - CR8
BSS-15 229.6204246 2.0879629 18.33 17.61 17.75 17.36 - CR11
BSS-16 229.6209379 2.0917858 17.80 17.28 17.26 16.98 - CR18
BSS-17 229.6264834 2.0960870 16.32 16.22 16.20 16.13 - CR20
BSS-18 229.6368731 2.0896002 16.56 16.30 16.11 16.01 - CR14
BSS-19 229.6367309 2.0917639 18.27 17.35 17.58 17.07 - CR17
BSS-20 229.6345837 2.0906438 17.88 16.81 16.96 16.43 - CR16
BSS-21 229.6382677 2.0934706 18.25 17.58 17.71 17.35 - CR19
BSS-22 229.6340227 2.0853879 17.67 17.32 17.22 17.03 - CR7
BSS-23 229.6332685 2.0875294 17.69 17.34 17.21 17.08 - CR10
BSS-24 229.6366685 2.0807168 18.23 17.78 17.67 17.37 - -
BSS-25 229.6393544 2.0762832 18.11 17.79 17.72 17.50 - -
BSS-26 229.6378381 2.0779999 17.86 17.52 17.43 17.27 - -
BSS-27 229.6349851 2.0807202 18.17 17.51 17.74 17.30 - CR70
BSS-28 229.6397645 2.0736403 18.19 17.60 17.69 17.28 - CR33
BSS-29 229.6370495 2.0770798 16.83 16.56 16.57 17.75 - CR26
BSS-30 229.6358816 2.0747883 18.25 17.81 17.79 17.51 - CR31
BSS-31 229.6361653 2.0720147 18.29 17.77 17.81 17.47 - CR36
BSS-32 229.6339822 2.0723032 16.73 16.10 16.16 15.95 - CR35
BSS-33 229.6281392 2.0756490 17.74 17.41 17.22 17.09 - CR29
BSS-34 229.6241278 2.0750261 18.21 17.50 17.65 17.27 - CR79
BSS-35 229.6332759 2.0603761 17.48 17.17 16.95 16.86 - CR48
BSS-36 229.6270877 2.0662947 17.33 17.18 17.06 16.95 - CR47
BSS-37 229.6244175 2.0693612 16.89 16.41 16.51 15.71 - CR46
BSS-38 229.6180419 2.0724090 17.37 17.23 17.12 17.00 - CR34
– 31 –
Table 1—Continued
BSS-39 229.6311963 2.0857800 18.31 17.33 17.40 16.76 - -
BSS-40 229.6297499 2.0664961 18.16 17.58 - 17.27 - CR76
BSS-41 229.6443367 2.0872809 - - 17.50 17.23 - CR9
BSS-42 229.6448646 2.0738335 - - 16.53 16.06 15.95 CR32
BSS-43 229.6460645 2.0748695 - - 16.64 16.44 16.66 CR30
BSS-44 229.6481631 2.0718829 - - 16.72 16.61 16.87 CR37
BSS-45 229.6433942 2.0760163 - - 17.03 16.79 16.91 CR28
BSS-46 229.6439884 2.0775670 - - 17.44 16.99 16.81 -
BSS-47 229.6180420 2.0598328 - - - 17.18 17.12 -
BSS-48 229.6092873 2.1680914 - - 16.85 16.68 - OR2
BSS-49 229.6723094 2.0882827 - - 16.94 16.64 - OR9
BSS-50 229.6006551 2.0814678 - - 17.00 16.74 - OR10
BSS-51 229.6669956 1.9781808 - - 17.20 16.74 - OR1
BSS-52 229.5949935 2.0469325 - - 17.69 17.46 - OR4
BSS-53 229.6706625 2.0695464 - - 17.82 17.50 - -
BSS-54 229.6667908 2.1149550 - - 17.82 17.72 - -
BSS-55 229.7370667 2.0323392 - - 17.80 17.42 - OR23
BSS-56 229.5476990 2.0112610 - - 16.88 16.60 - OR5
BSS-57 229.6711255 1.9415566 - - 16.98 16.64 - -
BSS-58 229.4381714 2.0302088 - - 17.75 17.33 - -
BSS-59 229.7408412 2.3399166 - - 17.49 17.08 - -
BSS-60 229.3218200 2.3271022 - - 16.34 16.09 - -
Note. — The first 41 BSS have been identified in the WFPC2 sample; BSS-42–
46 are from the complementary ACS observations; BSS-47–59 are from the WFI
data-set. BSS-59 lies beyond the cluster tidal radius, at ∼ 24′ from the center. The
last column list the corresponding BSS in W06 sample, with ”CR” indicating their
”Core BSS” and ”OR” their ”Outer Region BSS”.
– 32 –
Table 2. Number counts of BSS and HB
stars
′′ re
′′ NBSS NHB L
samp/L
0 27 22 94 0.14
27 50 15 94 0.16
50 115 10 135 0.26
115 150 3 46 0.09
150 210 2 52 0.10
210 300 0 45† 0.10
300 450 4 42† 0.09
450 800 2 38† 0.06
Note. — † The NHB values listed here
are those corrected for field contamination
(i.e., 1, 2 and 8 stars have been subtracted
to the observed number counts in these
three external annuli, respectively).
	INTRODUCTION
	OBSERVATIONS AND DATA ANALYSIS
	The data sets
	Astrometry and center of gravity
	Photometric calibration and definition of the catalogs
	Density profile
	DEFINITION OF THE SAMPLES
	The BSS selection
	The reference population
	THE BSS RADIAL DISTRIBUTION
	Dynamical simulations
	SUMMARY AND DISCUSSION
ABSTRACT
  By combining high-resolution HST and wide-field ground based observations, in
ultraviolet and optical bands, we study the Blue Stragglers Star (BSS)
population of the galactic globular cluster M5 (NGC 5904) from its very central
regions up to its periphery. The BSS distribution is highly peaked in the
cluster center, decreases at intermediate radii and rises again outward. Such a
bimodal distribution is similar to those previously observed in other globular
clusters (M3, 47Tucanae, NGC6752). As for these clusters, dynamical simulations
suggest that, while the majority of BSS in M5 could be originated by stellar
collisions, a significant fraction (20-40%) of BSS generated by mass transfer
processes in primordial binaries is required to reproduce the observed radial
distribution. A candidate BSS has been detected beyond the cluster tidal
radius. If confirmed, this could represent an interesting case of an
"evaporating" BSS.

<|endoftext|><|startoftext|>
Entanglement Entropy of two-dimensional anti-de Sitter black holes
Mariano Cadoni∗
Dipartimento di Fisica, Università di Cagliari, and INFN sezione di Cagliari,
Cittadella Universitaria 09042 Monserrato, ITALY
Using the AdS/CFT correspondence we derive a formula for the entanglement entropy of the
anti-de Sitter black hole in two spacetime dimensions. The leading term in the large black hole
mass expansion of our formula reproduces exactly the Bekenstein-Hawking entropy SBH , whereas
the subleading term behaves as lnSBH . This subleading term has the universal form typical for
the entanglement entropy of physical systems described by effective conformal fields theories (e.g.
one-dimensional statistical models at the critical point). The well-known form of the entanglement
entropy for a two-dimensional conformal field theory is obtained as analytic continuation of our
result and is related with the entanglement entropy of a black hole with negative mass.
Quantum entanglement is a fundamental feature of quantum systems. It is related to the existence of correlations
between parts of the system. The degree of entanglement of a quantum system is measured by the entanglement
entropy Sent. In quantum field theory (QFT), or more in general in many body systems, we can localize observable
and unobservable degrees of freedom in spatially separated regions Q and R. Sent is then defined as the von Neumann
entropy of the system when the degrees of freedom in the region R are traced over, Sent = −TrQρ̂Q ln ρ̂Q, where the
trace is taken over states in the observable region Q and the reduced density matrix ρ̂Q = TrRρ̂ is obtained by tracing
the density matrix ρ̂ over states in the region R.
Investigation of the entanglement entropy (EE) has become relevant in many research areas. Apart from quantum
information theory, the field that gave birth to the notion of entanglement entropy, it plays a crucial role in condensed
matter systems, where it helps to understand quantum phases of matter (e.g spin chains and quantum liquids)[1, 2,
3, 4, 5]. Entanglement (geometric) entropy is also an useful concept for investigating general features of QFT, in
particular two-dimensional conformal field theory (CFT) and the Anti-de Sitter/conformal field theory (AdS/CFT)
correspondence [6, 7, 8, 9, 10, 11, 12] . Last but not least entanglement may held the key for unraveling the mystery
of black hole entropy [13, 14, 15, 16, 17, 18, 19, 20, 21, 22].
We will be mainly concerned with the entanglement entropy of two-dimensional (2D) CFT and its relationship with
the entropy of 2D black holes. It is an old idea that black hole entropy may be explained in terms of the EE of the
quantum state of matter fields in the black hole geometry [13]. The main support to this conjecture comes from the
fact that both the EE of matter fields and the Bekenstein-Hawking (BH) entropy depend on the area of the boundary
region. On the other hand any attempt to explain the BH entropy as originating from quantum entanglement has to
solve conceptual and technical difficulties.
The usual statistical paradigm explains the BH entropy in terms of a microstate gas. This is conceptually different
from the EE that measures the observer’s lack of information about the quantum state of the system in a inaccessible
region of spacetime. Moreover, the EE depends both on the number of species ns of the matter fields, whose
entanglement should reproduce the BH entropy, and on the value of the UV cutoff δ arising owing to the presence
of a sharp boundary between the accessible and inaccessible regions of the spacetime. Conversely, the BH entropy is
meant to be universal, hence independent from ns and δ. Some conceptual difficulties can be solved using Sakharov’s
induced gravity approach [23, 24, 25], but the problem of the dependence on ns and δ still remains unsolved.
In this letter we will show that in the case of two-dimensional AdS black hole these difficulties can be completely
solved. We will derive an expression for the black hole EE that in the large black hole mass limit reproduces exactly
the BH entropy. Moreover, we will show that the subleading term has the universal behavior typical for CFTs and in
particular for critical phenomena. The reason of this success is related to the peculiarities of 2D AdS gravity, namely
the existence of an AdS/CFT correspondence and the fact that 2D Newton constant can be considered as wholly
induced by quantum fluctuations of the dual CFT.
Most of the progress in understanding the EE in QFT has been achieved in the case of 2D CFT. Conformal
invariance in two space-time dimension is a powerful tool that allows us to compute the EE in closed form. The
entanglement entropy for the ground state of a 2D CFT originated from tracing over correlations between spacelike
separated points has been calculated by Holzhey, Larsen and Wilckzek [6]. Introducing an infrared cutoff Λ the
spacelike coordinate of our 2D universe will belong to C = [0,Λ[. The subsystem where measurements are performed
is Q = [0,Σ[, whereas the outside region where the degrees of freedom are traced over is R = [Σ,Λ[. Because of the
contribution of localized excitations arbitrarily near to the boundary the entanglement entropy diverges. Introducing
http://arxiv.org/abs/0704.0140v2
an ultraviolet cutoff δ, the regularized entanglement entropy turns out to be [6]
Sent =
c+ c̄
, (1)
where c and c̄ are the central charges of the 2D CFT. The expression (1) emphasizes the characterizing features of
the entanglement entropy, namely subadditivity and invariance under the transformation which exchanges the inside
and outside regions
Σ → Λ− Σ. (2)
Moreover, Sent is not a monotonic function of Σ, but increases and reaches its maximum for Σ = Λ/2 and then
decreases as Σ increases further. This behavior has an obvious explanation. When the subsystem begins to fill most
of the universe there is lesser information to be lost and the entanglement entropy decreases.
Let us now consider 2D AdS black holes. As classical solutions of a 2D gravity theory they are endowed with a
non-constant scalar field, the dilaton Φ. In the Schwarzschild gauge the 2D AdS black hole solutions are [26],
ds2 = −
dt2 +
dr2, Φ = Φ0
, (3)
where the length L is related to cosmological constant of the AdS spacetime (λ = 1/L2), Φ0 is the dimensionless 2D
inverse Newton constant and a is an integration constants related to the black hole mass M and horizon radius rh by
. (4)
The thermodynamical, Bekenstein-Hawking, entropy of the black hole is [26]
SBH = 2πΦ0a = 2π
2Φ0ML, (5)
whereas the black hole temperature is T = a/2πL. Setting a = 0 in Eq. (3) we have the AdS black hole ground state
( in the following called AdS0) with zero mass, temperature and entropy. The AdS black hole (3) can be considered
as the thermalization of the AdS0 solution at temperature a/2πL [26].
It has been shown that the 2D black hole has a dual description in terms of a CFT with central charge [27, 28, 29, 30]
c = 12Φ0. (6)
The dual CFT can have both the form of a 2D [29, 30] or a 1D [27, 28] conformal field theory. This AdS2/CFT2 ( or
AdS2/CFT1) correspondence has been used to give a microscopical meaning to the thermodynamical entropy of 2D
AdS black holes. Eq. (5) has been reproduced by counting states in the dual CFT.
In Ref. [16] (see also Refs. [24, 25, 31]) it was observed that in two dimensions black hole entropy can be ascribed
to quantum entanglement if 2D Newton constant is wholly induced by quantum fluctuations of matter fields. On
the other hand the AdS2/CFT2 correspondence, and in particular Eq. (6), tells us that the 2D Newton constant is
induced by quantum fluctuations of the dual CFT. It follows that the black hole entropy (5) should be explained as
the entanglement of the vacuum of the 2D CFT of central charge given by Eq. (6) in the gravitational black hole
background (3).
At first sight one is tempted to use Eq. (1) to calculate the entanglement entropy of the vacuum of the dual CFT.
The exterior region of the 2D black hole can be easily identified with the region Q, whereas the black hole interior
has to be identified with the R region where the degrees of freedom are traced over. There are two obstacles that
prevents direct application of Eq. (1). First, Eq. (1) holds for a 2D flat spacetime, whereas we are dealing with a
curved 2D background. Second, the calculations leading to Eq. (1) are performed for spacelike slice Q, whereas in
our case the coordinate singularities at r = rh (the horizon) and r = ∞ (the timelike asymptotic boundary of the
AdS spacetime) do not allow for a global notion of spacelike coordinate (a coordinate system covering the whole black
hole spacetime in which the metric is non-singular and static). Owing to these geometrical features, in the black hole
case we cannot give a direct meaning to both the measures Σ and (Λ−Σ) of the subsystems Q,R. As a consequence
invariance under the transformation (2) is meaningless in the black hole case.
The second difficulty can be circumvented using appropriate coordinate system and regularization procedure, the
first using instead of Eq. (1) the formula derived by Fiola et al. [16], which gives the EE of the vacuum of matter
fields in the case of a curved gravitational background.
σ= ∞σ=ε 
FIG. 1: Regularized euclidean instanton corresponding to the 2D AdS black hole in the coordinate system (t, σ) covering
only the black hole exterior. The euclidean time is periodic. The point σ = ∞ correspond to the black hole horizon. σ = 0
corresponds to the asymptotic timelike boundary of AdS2.
In the coordinate system used to define the vacuum of scalar fields in AdS2, the 2D black hole metric (3) is [26]
ds2 =
sinh2(aσ
−dt2 + dσ2
. (7)
The coordinate system (t, σ) covers only the black hole exterior. The black hole horizon corresponds to σ = ∞
where the conformal factor of the metric vanishes. The asymptotic r = ∞ timelike conformal boundary of the AdS2
spacetime is located at σ = 0, where the conformal factor diverges.
The entanglement entropy of the CFT vacuum in the curved background (7) can be calculated, using the formula
of Ref. [16] as the half line entanglement entropy seen by an observer in the 0 < σ < ∞ region. From the CFT
point of view the AdS black hole has to be considered as the AdS0 vacuum seen by the observer using the black hole
coordinates (7) [26]. Moreover, this observer sees the the AdS0 vacuum as filled with thermal radiation with negative
flux [26]. It follows that the black hole entanglement entropy is given by the formula of Ref. [16] with reversed sign,
ent = −
ρ(σ = 0)− ln
, (8)
where ρ defines the conformal factor of the metric in the conformal gauge (ds2 = exp(2ρ)(−dt2+dσ2)), c is the central
charge given by Eq. (6) and δ,Λ are respectively UV and IR cutoffs. Notice that in Eq. (8) we have only contributions
from only one sector (e.g. right movers) of the CFT. In Ref. [29, 30] it has been shown that the 2D AdS black hole
is dual to an open string with appropriate boundary conditions. These boundary conditions are such that only one
sector of the CFT2 is present. The same is obviously true for the AdS2/CFT1 realization of the correspondence
[27, 28].
The conformal factor of the metric (7), hence the entanglement entropy (8) blows up on the σ = 0 boundary of the
AdS spacetime. The simplest regularization procedure that solves this problem is to consider a regularized boundary
at σ = ǫ. Notice that ǫ plays the role of a UV cutoff for the coordinate σ, which is the natural spacelike coordinate
of the dual CFT. ǫ is an IR cutoff for the coordinate r, which is the natural spacelike coordinate for the AdS2 black
hole. The regularized euclidean instanton corresponding to the black hole (7) is shown in figure (1). The regularizing
parameter ǫ can be set equal to the UV cutoff, δ = ǫ. Moreover, the regularized boundary is at finite proper distance
from the horizon so that ǫ acts also as IR regulator, making the presence of the IR cutoff Λ in Eq. (8) redundant. It
follows that the regularized EE is given by S
ent = −
ρ(ǫ)− ln ǫ
, which using equations (7) and (4) becomes
ent =
. (9)
As a check of the validity of our formula we note that in the case of AdS0 (rh = 0) the entanglement entropy vanishes.
The AdS/CFT correspondence enable us to identify the cutoff ǫ as the UV cutoff of the CFT : ǫ ∝ L. The
proportionality factor can be determined by requiring that the analytical continuation of Eq. (9) is invariant under
the transformation (2) (see later). This requirement fixes ǫ = πL. With this position we get
ent =
. (10)
This formula is our main result, it gives the entanglement entropy of the 2D AdS black hole. This entanglement
entropy has the expected behavior as a function of the horizon radius rh or, equivalently, of the black hole mass M .
ent becomes zero in the AdS0 ground state, rh = 0 (M = 0), whereas it grows monotonically for rh > 0 (M > 0).
In order to compare the black hole EE (10) with the BH entropy (5) let us consider the limit of macroscopic black
holes, that is the limit a → ∞ or equivalently rh >> L or also M >> 1/L. Expanding Eq. (10) and using Eqs. (4)
and (6) we get
ent = 2π
2Φ0ML− Φ0 lnLM +O(1) = SBH − 2Φ0 lnSBH +O(1). (11)
We have obtained the remarkable result that the leading term in the large mass expansion of the black hole en-
tanglement entropy reproduces exactly the Bekenstein-Hawking entropy. Moreover, the subleading term behaves as
the logarithm of the BH entropy and describes quantum corrections to SBH . It is an universally accepted result
that the quantum corrections to the BH entropy behave as lnSBH [32, 33, 34, 35, 36, 37, 38, 39, 40, 41]. However,
there is no general consensus about the value of the prefactor of this term. For the microcanonical ensemble this
term has to be negative, whereas there are positive contributions coming from thermal fluctuation. Equation (11)
fixes the prefactor of lnSBH in terms of the 2D Newton constant. This result contradicts some previous results
supporting a Φ0-independent value of the prefactor. Our result is consistent with the approach followed in this
paper, which considers 2D gravity as induced from the quantum fluctuations of a CFT with central charge 12Φ0.
The first (Bekenstein-Hawking) term in Eq. (11) is the induced entanglement entropy, whereas the second term,
−(c/6) ln(rh/L), is determined by the conformal symmetry. It gives the entanglement entropy (1) of a CFT in 2D flat
spacetime with central charge 12Φ0 and Σ = rh in the limit Σ << Λ [6]. The subleading term in Eq. (11) represents
therefore an universal behavior shared with other systems described by 2D QFTs, such as one-dimensional statistical
models near to the critical point (with the black hole radius rh corresponding to the correlation length) or free scalars
fields [7, 9].
Eq. (10) shows a close resemblance with the CFT entanglement entropy (1). Eqs. (10) and (1) differs in two
main points: the absence in the black hole case of something corresponding to the measure of the whole space (the
parameter Λ in Eq. (1)) and the appearance of hyperbolic instead of trigonometric functions. These are expected
features for the entanglement entropy of a black hole. They solve the problems concerning the application of formula
(1) to the black hole case. For a black hole one cannot define a measure of the whole space analogue to Λ. For
static solutions the coordinate system covers only the black hole exterior. The appearance of hyperbolic instead of
trigonometric functions allows for monotonic increasing of S
ent (rh), eliminating the unphysical decreasing behavior
of Sent(Σ) in the region Σ > Λ/2.
It is interesting to see how Eq. (1) can be obtained as the analytic continuation rh → irh of our formula (10),
i.e by considering an AdS black hole with negative mass. The analytically continued black hole solution is given
by Eq. (3) with a2 < 0. In the conformal gauge the solution reads now ds2 = [a2/ sin2(aσ/L)](−dt2 + dσ2). The
range of the spacelike coordinate, corresponding to 0 < r < ∞, is now 0 < σ < πL/2a. Regularizing the solution
at σ = 0 by introducing the cutoff ǫ we get the euclidean instanton shown in Fig. (2). In terms of the 2D CFT
we have to trace over the degrees of freedom outside the spacelike slice ǫ < σ < πL/2a. The related entanglement
entropy can be calculated using the formula of Ref. [16] in the case of a spacelike slice with two boundary points:
Sent = −c/6[ρ(ǫ)+ρ(πL/2a)− ln(δ/Λ)]. Applying this formula to the case of the black hole solution of negative mass,
identifying ǫ in terms of the IR cutoff Λ, ǫ = πL2/Λ, and redefining appropriately the UV cutoff δ, we get
Sent =
. (12)
σ= πL/2a    σ=ε 
FIG. 2: Regularized euclidean instanton corresponding to the 2D AdS black hole with negative mass. The euclidean time
is periodic. The point σ = πL/2a corresponds to the black hole singularity at r = 0. σ = 0 corresponds to the asymptotic
timelike boundary of AdS2.
Thus, the entanglement entropy of the 2D CFT in the curved background given by the AdS black hole of negative
mass has exactly the form given by Eq. (1) with the horizon radius rh playing the role of Σ. Notice that the presence
of the factor π in the argument of the sin-function is necessary if one wants invariance under the transformation
(2). The requirement that equation (12) is the analytic continuation of Eq. (10) fixes, as previously anticipated, the
proportionality factor between ǫ and L in the calculations leading to Eq. (10).
In this letter we have derived a formula for the entanglement entropy of 2D AdS black holes that has nice striking
features. The leading term in the large black hole mass expansion reproduces exactly the BH entropy. The subleading
term has the right lnSBH , behavior of the quantum corrections to the BH formula and represents an universal term
typical of CFTs. Analytic continuation to negative black hole masses give exactly the entanglement entropy of 2D
CFT with the black hole radius playing the role of the measure of the observable spacelike slice in the CFT. Our
results rely heavily on peculiarities of 2D AdS gravity, namely the existence of an AdS/CFT correspondence and
on the fact that 2D Newton constant arises from quantum fluctuation of the dual CFT. The generalization of our
approach to higher dimensional gravity theories is therefore far from being trivial. A related problem is the form of
the coefficient of the lnSBH term. In the 2D context our result, stating that this coefficient is given in terms of the
2D Newton constant (or equivalently the central charge of the dual CFT) is rather natural. For higher dimensional
gravity theories this is again a rather subtle point.
I thank G. D’Appollonio for discussions and valuable comments.
∗ Electronic address: mariano.cadoni@ca.infn.it
[1] G. Vidal, J. I. Latorre, E. Rico and A. Kitaev, Phys. Rev. Lett. 90 (2003) 227902 [arXiv:quant-ph/0211074].
[2] A. R. Its, B. Q. Jin, V. E. Korepin, J. Phys. A 38 (2005) 2975 [arXiv:quant-ph/0409027].
[3] A. Kitaev and J. Preskill, Phys. Rev. Lett. 96 (2006) 110404 [arXiv:hep-th/0510092].
[4] J. I. Latorre, C. A. Lutken, E. Rico and G. Vidal, Phys. Rev. A 71 (2005) 034301 [arXiv:quant-ph/0404120].
[5] V. E. Korepin, Phys. Rev. Lett. 92 (2003) 964021.
[6] C. Holzhey, F. Larsen and F. Wilczek, Nucl. Phys. B 424 (1994) 443 [arXiv:hep-th/9403108].
mailto:mariano.cadoni@ca.infn.it
http://arxiv.org/abs/quant-ph/0211074
http://arxiv.org/abs/quant-ph/0409027
http://arxiv.org/abs/hep-th/0510092
http://arxiv.org/abs/quant-ph/0404120
http://arxiv.org/abs/hep-th/9403108
[7] P. Calabrese and J. L. Cardy, J. Stat. Mech. 0406 (2004) P002 [arXiv:hep-th/0405152].
[8] H. Casini and M. Huerta, Phys. Lett. B 600 (2004) 142 [arXiv:hep-th/0405111].
[9] D. V. Fursaev, Phys. Rev. D 73 (2006) 124025 [arXiv:hep-th/0602134].
[10] S. N. Solodukhin, Phys. Rev. Lett. 97 (2006) 201601 [arXiv:hep-th/0606205].
[11] S. Ryu and T. Takayanagi, Phys. Rev. Lett. 96 (2006) 181602 [arXiv:hep-th/0603001].
[12] S. Ryu and T. Takayanagi, JHEP 0608 (2006) 045 [arXiv:hep-th/0605073].
[13] G. ’t Hooft, Nucl. Phys. B 256 (1985) 727.
[14] L. Bombelli, R. K. Koul, J. H. Lee and R. D. Sorkin, Phys. Rev. D 34 (1986) 373.
[15] V. P. Frolov and I. Novikov, Phys. Rev. D 48 (1993) 4545 [arXiv:gr-qc/9309001].
[16] T. M. Fiola, J. Preskill, A. Strominger and S. P. Trivedi, Phys. Rev. D 50 (1994) 3987 [arXiv:hep-th/9403137].
[17] F. Belgiorno and S. Liberati, Phys. Rev. D 53 (1996) 3172 [arXiv:gr-qc/9503022].
[18] S. Hawking, J. M. Maldacena and A. Strominger, JHEP 0105 (2001) 001 [arXiv:hep-th/0002145].
[19] J. M. Maldacena, JHEP 0304 (2003) 021 [arXiv:hep-th/0106112].
[20] R. Brustein, M. B. Einhorn and A. Yarom, JHEP 0601 (2006) 098 [arXiv:hep-th/0508217].
[21] R. Emparan, JHEP 0606 (2006) 012 [arXiv:hep-th/0603081].
[22] P. Valtancoli, arXiv:hep-th/0612049.
[23] T. Jacobson, arXiv:gr-qc/9404039.
[24] V. P. Frolov, D. V. Fursaev and A. I. Zelnikov, Nucl. Phys. B 486 (1997) 339 [arXiv:hep-th/9607104].
[25] V. P. Frolov and D. V. Fursaev, Phys. Rev. D 56 (1997) 2212 [arXiv:hep-th/9703178].
[26] M. Cadoni and S. Mignemi, Phys. Rev. D 51 (1995) 4319 [arXiv:hep-th/9410041].
[27] M. Cadoni and S. Mignemi, Phys. Rev. D 59 (1999) 081501 [arXiv:hep-th/9810251].
[28] M. Cadoni and S. Mignemi, Nucl. Phys. B 557 (1999) 165 [arXiv:hep-th/9902040].
[29] M. Cadoni and M. Cavaglia, Phys. Lett. B 499 (2001) 315 [arXiv:hep-th/0005179].
[30] M. Cadoni and M. Cavaglia, Phys. Rev. D 63 (2001) 084024 [arXiv:hep-th/0008084].
[31] L. Susskind and J. Uglum, Phys. Rev. D 50 (1994) 2700 [arXiv:hep-th/9401070].
[32] D. V. Fursaev, Phys. Rev. D 51 (1995) 5352 [arXiv:hep-th/9412161].
[33] R. B. Mann and S. N. Solodukhin, Nucl. Phys. B 523 (1998) 293 [arXiv:hep-th/9709064].
[34] R. K. Kaul and P. Majumdar, Phys. Rev. Lett. 84 (2000) 5255 [arXiv:gr-qc/0002040].
[35] S. Carlip, Class. Quant. Grav. 17 (2000) 4175 [arXiv:gr-qc/0005017].
[36] A. Ghosh and P. Mitra, Phys. Rev. Lett. 73 (1994) 2521 [arXiv:hep-th/9406210].
[37] S. Mukherji and S. S. Pal, JHEP 0205 (2002) 026 [arXiv:hep-th/0205164].
[38] M. R. Setare, Phys. Lett. B 573 (2003) 173 [arXiv:hep-th/0311106].
[39] M. Domagala and J. Lewandowski, Class. Quant. Grav. 21 (2004) 5233 [arXiv:gr-qc/0407051].
[40] A. J. M. Medved, Class. Quant. Grav. 22 (2005) 133 [arXiv:gr-qc/0406044].
[41] D. Grumiller, arXiv:hep-th/0506175.
http://arxiv.org/abs/hep-th/0405152
http://arxiv.org/abs/hep-th/0405111
http://arxiv.org/abs/hep-th/0602134
http://arxiv.org/abs/hep-th/0606205
http://arxiv.org/abs/hep-th/0603001
http://arxiv.org/abs/hep-th/0605073
http://arxiv.org/abs/gr-qc/9309001
http://arxiv.org/abs/hep-th/9403137
http://arxiv.org/abs/gr-qc/9503022
http://arxiv.org/abs/hep-th/0002145
http://arxiv.org/abs/hep-th/0106112
http://arxiv.org/abs/hep-th/0508217
http://arxiv.org/abs/hep-th/0603081
http://arxiv.org/abs/hep-th/0612049
http://arxiv.org/abs/gr-qc/9404039
http://arxiv.org/abs/hep-th/9607104
http://arxiv.org/abs/hep-th/9703178
http://arxiv.org/abs/hep-th/9410041
http://arxiv.org/abs/hep-th/9810251
http://arxiv.org/abs/hep-th/9902040
http://arxiv.org/abs/hep-th/0005179
http://arxiv.org/abs/hep-th/0008084
http://arxiv.org/abs/hep-th/9401070
http://arxiv.org/abs/hep-th/9412161
http://arxiv.org/abs/hep-th/9709064
http://arxiv.org/abs/gr-qc/0002040
http://arxiv.org/abs/gr-qc/0005017
http://arxiv.org/abs/hep-th/9406210
http://arxiv.org/abs/hep-th/0205164
http://arxiv.org/abs/hep-th/0311106
http://arxiv.org/abs/gr-qc/0407051
http://arxiv.org/abs/gr-qc/0406044
http://arxiv.org/abs/hep-th/0506175
	Acknowledgments
	References
ABSTRACT
  Using the AdS/CFT correspondence we derive a formula for the entanglement
entropy of the anti-de Sitter black hole in two spacetime dimensions. The
leading term in the large black hole mass expansion of our formula reproduces
exactly the Bekenstein-Hawking entropy S_{BH}, whereas the subleading term
behaves as ln S_{BH}. This subleading term has the universal form typical for
the entanglement entropy of physical systems described by effective conformal
fields theories (e.g. one-dimensional statistical models at the critical
point). The well-known form of the entanglement entropy for a two-dimensional
conformal field theory is obtained as analytic continuation of our result and
is related with the entanglement entropy of a black hole with negative mass.

<|endoftext|><|startoftext|>
Towards self-consistent definition of instanton liquid
parameters
S.V. Molodtsov1,2, G.M. Zinovjev3
1Joint Institute for Nuclear Research, RU-141980, Dubna, Moscow region, Russia
2Institute of Theoretical and Experimental Physics, RU-117259, Moscow, Russia
3Bogolyubov Institute for Theoretical Physics, National Academy of Sciences of Ukraine, UA-03680,
Kiev-143, Ukraine
The possibility of self-consistent determination of instanton liquid parameters is discussed to-
gether with the definition of optimal pseudo-particle configurations and comparing the various
pseudo-particle ensembles. The weakening of repulsive interactions between pseudo-particles is
argued and estimated.
The problem of finding the most effective pseudo-particle profile for instanton liquid (IL) model
of the QCD vacuum [1] has already been formulated in the first papers treating the pseudo-particle
superposition as the quasi-classical configuration saturating the generating functional [2] of the fol-
lowing form
D[A] e−S(A) , (1)
where S(A) is the Yang-Mills action. Although the solution proposed in Ref. [2] was quite acceptable
phenomenologically the consequent more accurate analysis discovered several imperfect conclusions
putting into doubt the assertion about the instanton ensemble getting stabilization and some addi-
tional mechanism should be introduced to fix such an ensemble [3]. In this note we revisit the task
formulated in Ref. [2] within the self-consistent approach proposed in our previous paper [4]. We are
not speculating on the detailed mechanism of stabilizing and are based on one crucial assumption
which is the existence of non-zero gluon condensate in the QCD vacuum. This idea is not very orig-
inal but turns out far reaching in the context of our approach. The particular form and properties
of this condensate will be discussed in the following paper.
Thus, as the configuration saturating the generating functional (1) we take the following super-
position
Aaµ(x) = Baµ(x) +
Aaµ(x; γi) , (2)
here Aaµ stands for the (anti-)instanton field in the singular gauge
Aaµ(x; γ) =
ωabη̄bµν
f(y), y = x− z , (3)
γi = (ρi, zi, ωi) denotes all the parameters describing the i-th (anti-)instanton, in particular, its size
ρ, colour orientation ω, center position z and as usual g is the coupling constant of gauge field.
The function f(y) introduces the pseudo-particle profile and will be fixed by resolving the suitable
variational problem. For example, for the conventional singular instanton it looks like
f(y) =
. (4)
http://arxiv.org/abs/0704.0141v1
In analogy with this form we consider the function f depending on y2 or, more precisely, on the
variable x =
at some characteristic mean pseudo-particle size ρ̄. Dealing with the anti-instanton
one should make the substitution of the ’t Hooft symbol η̄ → η. It is seen from (2) we ’singled out’
one pseudo-particle of ensemble and introduced the special symbol B for its field which actually has
the same form as Eq. (3).
The strength tensor of this ’external’ field and the field of every separate pseudo-particle A can
be written as
Gaµν = G
µν(B) +G
µν(A) +G
µν(A,B) , (5)
where two first terms are given by the standard definition of field strength
Gaµν(A) = ∂µA
ν − ∂νAaµ + g fabcAbµAcν , (6)
with the entirely antisymmetric tensor fabc. In particular, for the singular instanton of Eq. (3) it
takes the form
Gaµν = −
η̄kαβ
f(1− f)
+ (η̄kµβ yν − η̄kνα yµ)
f ′ − f(1− f)
, (7)
where f ′ means the derivative over y2. The third term of Eq. (5) presents the ’mixed’ component of
field strength and is
Gaµν(A,B) = g f
abc(BbµA
ν − BbνAcµ) = g fabcωcd
(Bbµ η̄dνα −Bbν η̄dµα)
f. (8)
It was shown in Ref. [4] that in quasi-classical regime which is of particular interest for appli-
cations, the generating functional (1) could be essentially simplified if reformulated in terms of the
field BA averaged over ensemble A. Performing the cluster decomposition [5] of stochastic exponent
in Eq. (1)
〈exp(−S)〉ωz = exp
(−1)k
〈〈Sk〉〉ωz
, (9)
where 〈S1〉 = 〈〈S1〉〉, 〈S1S2〉 = 〈S1〉〈S2〉 + 〈〈S1S2〉〉, . . . (the first cumulant is simply defined by
averaging the action) the higher terms of effective action for the ’external’ field in IL could be
presented as
〈〈S[BA]〉〉A =
G(BA) G(BA)
, (10)
and the mass m is defined by the IL parameters developing for the standard singular pseudo-particles
(4) the following form (see, also below)
m2 = 9π2 n ρ̄2
N2c − 1
, (11)
with n = N/V where N is the total number of pseudoparticles in the volume V and Nc is the number
of colours. The small magnitude of characteristic IL parameter (packing fraction) nρ̄4 allows us at
decomposing to keep the contributions of one pseudo-particle term (∼ n) only.
The effective action in Eq. (10) implies a functional integration in which the vacuum stochastic
fields are not destroyed by the external field. Then there is no reason to develop the detailed
description of the field B driven by the symmetries of initial gauge invariant Lagrangian for the
Yang-Mills fields. In practice it could be understood as an argument to do use the averaged action
dealing with the field B. It means the colourless binary (and similar even) configurations only of
field B survive in the effective action. In other words the decomposition B ≃ BA + · · · is used
(in what follows we are not maintaining the index for the field B). Obviously, if there is any need
of more detailed description including, for example, information on the fluctuations of field B one
should operate with the correlation functions of higher order and the corresponding chain of the
Bogolyubov equations.
The selfconsistent description of pseudo-particle ensemble may not be developed based on Eq.
(10) only because in such a form the pseudo-particles of zero size ρ = 0 are most advantageous.
In Ref. [4] the version of variational principle was proposed which makes it possible to determine
the selfconsistent solution in long wave-length approximation for the pseudo-particle ensemble (anti-
instantons in the singular gauge with standard profile (4)) and external field. Here it adapts to the
saturating configuration (2) also and its more optimal (than standard) profile is defined, as suggested
in Ref. [2], taking into account the IL parameter change while the pseudo-particle field is present.
The contribution of saturating configuration into the generating functional is evaluated as (see
[2] for the denotions)
Z ≃ Y =
dγi e
−S(B,γ) . (12)
The following terms should be taken into consideration
S(B, γ) = −
ln d(ρi) + β Uint +
U iext(B) + S(B) , (13)
(the details of deducing this expression can be found in [4]). Here we remind only that to obtain it
one should average over the pseudo-particle parameters and to hold the highest contributions only
at summing up the pseudo-particles. If the saturating configurations are the instantons in singular
gauge with the standard profile (4) the first term describing the one instanton contributions takes
the form of distribution function over (anti-)instanton sizes
d(ρ) = CNcΛ
b ρb−5β̃2Nc , (14)
where
Nf , (15)
β̃ = −b ln(Λρ̄),
CNc ≈
4.66 exp(−1.68Nc)
π2(Nc − 1)!(Nc − 2)!
If one considers the profile of Eq. (3) the change of one pseudo-particle action which has the form
Si = 3
(y2f ′)2 + f 2(1− f)2
, (16)
should be absorbed while calculating. Here β = 8π2/g2 is the characteristic action of single pseudo-
particle (4) which is defined at the scale of average pseudo-particle size β = β(ρ̄) where β(ρ) =
− lnCNc−b ln(Λρ). The coefficient b enters the corresponding equations (in particular the distribution
function (14)) always with the additional factor s =
. It means that in all the formula containing
the one instanton contribution the following substitution
b → b s . (17)
should be done. The penultimate term of Eq. (13) accumulates the partial pseudo-particle contribu-
tions coming from the ’mixed’ component of the strength tensor (8) and describing the interaction
of pseudo-particle ensemble with the detached one, i.e.
U iext(B) =
Gaµν(Ai, B) G
µν(Ai, B)
The other terms at the characteristic IL parameters are small as it was shown in Ref. [4]. The
average value of ’mixed’ component is given by the following formula
〈Gaµν(A,B) Gaµν(A,B)〉ωz =
N2c − 1
I Bbµ B
µ , B
, (18)
here I is defined by the integrated profile function of pseudo-particle
Iα,β = δα,β I =
f 2 , I =
dx f 2 , x =
In particular, for the standard form of pseudo-particle we have
dx f 2 = 1 .
The corresponding constant (see [4]) ζ0 =
N2c−1
should be changed for the modified one
ζ = λζ0 , λ =
dx f 2 ,
in all terms describing the interaction of IL with detached pseudo-particle if the profile function
f is arbitrary. Eq. (18) demonstrates that we are formally dealing with non-zero value of gluon
condensate which is given by the correlation function
〈Aaµ(x; γ)Aaµ(y; γ)〉ωz =
N2c − 1
|x− y|
. (19)
For the pseudo-particle of standard form the function F (∆) equals to
F (∆) =
∆2 + 2
∆2 + 4 ln
∣∣∣∣∣
∆2 + 4(∆2 + 1) + ∆3 + 3∆√
∆2 + 4−∆
∣∣∣∣∣−
− π2 (∆
2 + 1)2
ln(1 + ∆2) + π2 ∆2 ln |∆| ,
with the asymptotic behaviours
F (∆) → π2 − π
∆2 + π2 ∆2 ln |∆| , lim
F (∆) → π
The presence of this condensate (19) which leads, in particular, to the mass definition as in (11) just
signifies the assumption mentioned at the beginning this note.
The second term of (13) describes the repulsive interaction between the pseudo-particles of en-
semble
β Uint =
Gaµν(Ai, Aj) G
µν(Ai, Aj)
γi,γj
and actually presents the same contribution as Uext but being integrated with the field B of every
individual pseudo-particle as β Uint =
d4x m
2. It results in the change of coupling constant
ξ20 =
27 π2
N2c−1
describing the pseudo-particle interaction (see [2]) for new form
ξ2 = λ2 ξ20 ,
(similar to the change of constant ζ). And eventually the last term of Eq. (13) presents simply the
Yang-Mills action of the B field
S(B) =
Gaµν(B) G
µν(B)
It is worthwhile to notice that the topological charge of the configuration (4) is retained to be equal
GaµνG̃
dx f ′f(1− f) = 1 , G̃aµν =
εµναβ G
here εµναβ is an entirely antisymmetric tensor, ε1234 = 1.
The generating functional (12) might be estimated with the approximating functional (see [2]) as
Y ≥ Y1 exp(−〈S − S1〉) , (21)
where
dγi e
−S1(B,γ)−S(B) , S1(B, γ) = −
lnµ(ρi) ,
and µ(ρ) is an effective one particle distribution function defined by solving the variational problem.
In our particular situation the average value of difference of the actions is given as follows
〈S − S1〉 =
dγi [β Uint + Uext(γ, B)−
ln d(ρi) +
lnµ(ρi)] e
lnµ(ρi) =
dρ µ(ρ) ln
dγ1dγ2 Uint(γ1, γ2) µ(ρ1)µ(ρ2) +
ρ2ζ B2 =
d4x n
+ ζρ2 B2
, (22)
with µ0 =
dρ µ(ρ). In this note we estimate the functionals in the long wave length (adiabatic)
approximation, i.e. consider the IL elements to be equilibrated by the external fixed field B. After-
wards, with finding the optimal IL parameters out we receive the effective action for the external
field in the selfconsistent form. Eq. (22) is taken just in such a form in order to underline the inte-
gration is executed over the IL elements and the parameters describing their states are the functions
of external field (i.e. could finally be the functions of a coordinate x). The physical meaning of such
a functional is quite transparent and implies that each separate IL element develops its characteristic
screening of the attached field.
Now calculating the variation of action difference 〈S − S1〉 over µ(ρ) we obtain
µ(ρ) = C d(ρ) e−(nβξ
2ρ2+ζB2)ρ2 ,
where C is an arbitrary constant and its value is fixed by requiring the coincidence of the distribution
function when the external field is switched off (B = 0) with vacuum distribution function then
µ(ρ) = CNcβ̃
2NcΛbsρbs−5 e−(nβξ
2ρ2+ζB2)ρ2 . (23)
With defining the average size as
dρ ρ2 µ(ρ)
we come to the practical interrelation between the IL density and average size of pseudo-particles
(n β ξ2 ρ2 + ζ B2) ρ2 ≃ ν , (24)
where ν = bs− 42 . Apparently, the size distribution of pseudo-particles can be presented by the
well-known form as
µ(ρ) = CNcβ̃
2NcΛbsρbs−5 e
ρ2 . (25)
Figure 1: The energy E(α) when the profile function includes a screening effect (29) with the pa-
rameter λ (s = 1) only taken into consideration (lower curve) and with both parameters used (upper
curve) (see the text).
Eqs. (22) and (25) allow us to get the estimate of generating functional (21) in the following form
D[B] e−S(B) e−E , (26)
d4x n
− 1− ν
ζ ρ2 B2
CNc β̃
− ν ln ρ
Now taking into account Eq. (24) and fixing a field B, parameters s and λ the maximum of functional
(26) over the IL parameters can be calculated by solving the corresponding transcendental equation
= 0) numerically. Here it is a worthwhile place to notice the presence of new factor in the
denominator of
2 what is caused by the Gaussian form of the corresponding integral over ρ
squared and, hence, the integration element requires the introduction of 2ρ dρ. In Ref. [2] this factor
was missed. However, this fact has not generated a serious consequence because any application of
these results is actually related to the choice of suitable quantity of the parameter Λ entering the
observables (the pion decay constant, for example). It means we should make the proper choice of
basic scale. Besides, we should also keep in mind the approximate character of IL model. Further we
give the results for both versions to demonstrate the dependence of final results on the renormalized
constant CNc .
Searching the optimal configuration f we take the effective action in the form of nonlinear func-
tional as
Seff =
Gaµν(B) G
µν(B)
+ E[B]
, (27)
in which the IL state is described by solutions ρ̄[B, s, λ], n[B, s, λ]. In practice the following differ-
ential equation should be resolved
= − 1
f(1− f)(1− 2f)
, (28)
at fixed initial magnitude of f(x0) putting up the derivative in the initial point f
′(x0) in such a way to
have the solution going to zero when x is going to infinity. Parameter β0 is introduced to fix a priori
Figure 2: The IL density as the function of x = y2/ρ̄2. Three dashed curves correspond to the
different profile functions. The lowest dashed line corresponds to the standard form (4). The top
dashed line corresponds to the profile function with the screening factor (29) and one parameter λ
(s = 1) included and the middle line presents the same function but with two parameters included.
The solid line presents the selfconsistent solution of variational problem.
unknown value of coupling constant in the pseudo-particle definition (3). If the profile function has
been fixed the configuration should be found in the form in which the starting values of parameters
s, λ and β0 coincide (within the given precision) with the parameters obtained from the solution
f . Nowadays this approach looks the most optimal one among other existing possibilities not only
because of the computational arguments but in view of the poor current level of understanding the
interrelation between perturbative and non-perturbative contributions while calculating the effective
Lagrangian. In fact, it was mentioned in Ref. [2] that in more general (realistic) formulation of
this problem Eq. (28) should include the term responsible for the change of ’quantum’ constant
CNc with the function f changing. In principle, it could imply that the problem of pseudo-particle
ensemble stabilization is connected at the fundamental dynamics level with the anticipated smallness
of the
contribution and, apparently, should be addressed not so much to the description of
the interacting pseudo-particles and their interactions with the perturbative fields but rather to
investigation of the time hierarchy corresponding to the breakdown of quasi-stationary behaviour
of the vacuum fluctuations which will certainly lead to the changes of suitable effective Lagrangian
(10).
In order to receive the preliminary parameter estimates we consider the simplified model with
the profile function containing only one additional parameter for describing the screening effect as
regards
f(y) =
1 + x
, x =
. (29)
The energy E as the function of the screening parameter α is depicted in Fig. 1. The lowest dashed
curve shows the behaviour when the changes related to weakening of repulsive interaction are taken
into account by switching on the parameter λ only (at s = 1). The top dashed curve was obtained
with both parameters switched on. The optimal value of the screening parameter α is determined by
the minimum point of function E(α). Besides, this figure demonstrates the stability of variational
procedure of extracting the IL parameters. For the first calculation the values of characteristic
parameters for corresponding solution were taken as α = 0.06, λ = 0.775, s = 1.0067 with the
following set of the IL parameters ρ̄Λ = 0.3305, n/Λ4 = 0.919, β = 17.186. These values give for the
ratio of average pseudo-particle size and average distance between pseudo-particles the quite suitable
quantity ρ̄/R = 0.324. For another calculation we have treated the parameter set characterizing
the solution as α = 0.02, λ = 0.888, s = 1.0015 and for the IL parameters the following values
ρ̄Λ = 0.315, n/Λ4 = 0.829, β = 17.67, ρ̄/R = 0.3. In order to get more orientation we would like to
mention that for the ensemble of standard pseudo-particles (α = 0, λ = 1, s = 1) the corresponding
values are ρ̄Λ = 0.301, n/Λ4 = 0.769, β = 18.103, ρ̄/R = 0.282.
Figure 3: The average size of IL pseudo-particles as the function of x = y2/ρ̄2. Three dashed curves
correspond to different profile functions. The lowest curve corresponds to the standard form (4). The
top dashed curve corresponds to the profile function with the screening factor (29) which includes
one parameter λ (s = 1) and the middle line shows the same function with two parameters included.
The solid curve corresponds to the selfconsistent solution of the variational problem.
Now we examine the impact of correction introduced in Eq. (26) when we changed the term
which has been obtained in Ref. [2]. For the first calculation with the set of solution parameters as
α = 0.24, λ = 0.546, s = 1.029 we have for the IL parameters ρ̄Λ = 0.331, n/Λ4 = 1.844, β = 17.173
which lead to the ratio discussed equal to ρ̄/R = 0.386. For another calculation we have the following
results α = 0.05, λ = 0.799, s = 1.0053 and ρ̄Λ = 0.291, n/Λ4 = 1.356, β = 18.483, ρ̄/R = 0.314.
And for the ensemble of standard pseudo-particles (α = 0, λ = 1, s = 1) these parameters are
ρ̄Λ = 0.265, n/Λ4 = 1.186, β = 19.305, ρ̄/R = 0.277.
The Fig. 2 and Fig. 3 show the behaviours of IL density and average pseudo-particle size as the
functions of distance x. The dashed lines on both plots correspond to the similar ensembles. The
lowest curves demonstrate the behaviours for the ensembles of standard pseudo-particles (4). The
top curves present the ensemble of pseudo-particles with the profile function (29) at α = 0.06 and
s = 1. And the middle dashed lines correspond to the profile functions with α = 0.02 and s ∼ 1.03.
Obviously, it may be concluded that including even small change of the second parameter value
(s ∼ 1.03) leads to the noticeable change of ensemble characteristics (for example, the IL density)
because the highest contribution to the action when the coupling constant becomes the function of
ρ is essentially modified.
Let us make now several comments as to the ’complete’ formulation of the problem of analyzing the
equation (28). It was numerically resolved by the Runge-Kutta method. This approach combined
with numerical calculation of the derivative dE
at every point of consequent integration interval
allows us to avoid the problems which appear when searching the minimum of complicated functional
in multidimensional space.
The initial data were fixed at the point x0 =
= 0.1. Since the IL density value at the
coordinate origin is inessential the initial form of pseudo-particle profile function is taken without
any deformations as f(x0) =
1 + x0
. Then at fixed values of the parameters λ, s and β0 the coefficient
c is calculated. It allows to set the slope of trajectory f ′(x0) = −cf(1− f)/x0 at initial point in such
a form in order to have the solution going to zero at large distances. Afterwards we find out the
values of parameters λ and s requiring the input data to coincide with the output ones within the
fixed precision. The parameter values which obey the imposed constraints are the following (input
values) λ = 0.69099, s = 1.049, β0 = 16.26 at c = 1.361 and λ = 0.691, s = 1.049, β0 = 16.263
(at the output of variational procedure). The solid line in Fig. 4 shows the obtained profile f as
the function of x =
. The differences of profiles are smoothed over if they are presented as the
functions of y because the large magnitude of the screening coefficient, for example α = 0.06, is
compensated by enlargening the pseudo-particle size. The dashed lines on this plot show the profile
functions for the standard form (4) (top dashed line), with the screening factor (29) including one
parameter only α (s = 1) (lowest dashed curve) and two parameters included (middle dashed line).
Figure 4: The various profile functions. The top dashed curve corresponds to the standard form (4),
the lowest dashed curve shows the function with the screening factor (29) including one parameter
λ (s = 1) and the middle line presents the same function with two parameters included. The solid
line corresponds to the selconsistent solution of variational problem.
Another calculation (with modified Γ-function contribution) was based on the slightly different set
of relevant parameters which are for the input values λ = 0.607, s = 1.0515, β0 = 17.04 at c = 1.545
and λ = 0.6066, s = 1.0515, β0 = 17.042 for the output one at the finish of variational procedure.
The behaviours of IL density and average pseudo-particle size for selfconsistent solution are plotted
in Fig. 2 and Fig. 3 (solid lines, respectively)1. In the Table 1 we present the IL parameters at
1It is interesting to notice that considering IL (ensemble of pseudo-particles in the singular gauge) in the field of
regular pseudo-particle we obtain the IL density value in the center of regular pseudo-particle which is larger than its
value at large distances what looks like the anti-screening effect.
the large distances from pseudo-particle (the first line) together with the data for the ensemble of
pseudo-particles with the standard profile function (the second line). The third and fourth lines of
this Table 1 are devoted to the calculations with the second set of parameters (with factor 2 absent in
Eq. (26)). The fourth line, in particular, presents the calculations for pseudo-particles with standard
form of profile function.
Table 1. Parameters of IL.
ρ̄Λ n/Λ4 β ρ̄/R nρ̄4
0.381 0.743 16.263 0.354 1.582·10−2
0.331 0.769 18.103 0.282 6.277·103
0.354 1.245 17.042 0.379 1.955·10−2
0.265 1.186 19.305 0.277 5.849·10−3
It is quite obvious that the utilization of optimal pseudo-particle profile function leads to the larger
pseudo-particle size but the packing fraction parameter holds, nevertheless, a small quantity which is
quite suitable for the perturbative expansion. Besides, the results obtained allow us to conclude that
with tuning Λ a fully satisfactory agreement our calculations of pseudo-particle size, the ensemble
diluteness and gluon condensate value with their phenomenological magnitudes extracted from the
other models are easily reachable. The calculations of several dimensional quantities in our approach
are also very indicative. The values of the screening mass (11), average pseudo-particle size and IL
density obtained for two values of Λ (200 MeV and 280 MeV) are shown in Table 2. The sequence of
line meanings is identical to that in Table 1 as well as the meanings of last four lines which present
the results of calculations with the second set of parameters (with factor 2 absent in Eq. (26)).
Table 2. Screening mass and IL parameters
Λ MeV m MeV ρ̄ GeV−1 n fm−4
200. 381 1.906 0.7496
304 1.503 0.7688
280. 533 1.361 2.88
426 1.074 2.95
200. 456 1.77 1.245
333 1.325 1.186
280. 638 1.264 4.78
466 0.946 4.56
Another interesting feature of this calculation is the weakening of pseudo-particle interaction. This
effect is driven by the coefficient ξ2 (∼ λ2). Our estimates for the first set of parameters give
λ = 0.691 and, hence, λ2 ∼ 0.48 and for the second set we have (λ = 0.607) and λ2 ∼ 0.37. Let us
mention here that the reasonable description of instanton ensemble can be reached in the framework
of two-component models [6] as well.
Our calculations enable us to conclude that dealing with IL model (formulated in one-loop ap-
proach) one is able to reach quite reasonable description of gluon condensate even being constrained
by the values of average pseudo-particle size and other routine phenomenological parameters. More-
over, the ensemble of pseudo-particles with standard profile functions turns out to be very practical
because introducing the other configurations to make the similar estimates is simply unoperable.
With such an approximation of the vacuum configurations the coefficient of interaction weakening
develops the magnitude about λ2 ∼ 0.3 — 0.5. Including this effect leads to the enlargening of
pseudo-particle size. It allows us to conclude that nowadays the instantons in the singular gauge is
the only serious instrument for effective practising.
The authors are sincerely grateful to A.E. Dorokhov and S.B. Gerasimov for interesting discussions
and practical remarks. The financial support of the Grants INTAS-04-84-398 and NATO PDD(CP)-
NUKR980668 is also acknowledged.
References
[1] C.G. Callan, R. Dashen, and D.J. Gross, Phys. Lett. B 66 (1977) 375;
C.G. Callan, R. Dashen, and D.J. Gross, Phys. Rev. D 17, (1978) 2717.
[2] D.I. Diakonov and V.Yu. Petrov, Nucl. Phys. B 245, (1984) 259.
[3] I.I. Balitsky and A.V. Yung, Phys. Lett. B 168, (1986) 113;
D. Förster, Phys. Lett. B 66, (1977) 279;
E.V. Shuryak and J.J.M. Verbaarschot, Nucl. Phys. B 364, (1991) 255;
T. Schäfer and E.V. Shuryak, Rev. Mod. Phys. 70, (1998) 323.
[4] S.V. Molodtsov, G.M. Zinovjev, Yad. Fiz. 70, N0 6, (2007).
[5] N.G. Van Kampen, Phys. Rep. 24 (1976) 171; Physica 74 (1974) 215, 239;
Yu.A. Simonov, Phys. Lett. B 412 (1997) 371.
[6] A.E. Dorokhov, S.V. Esaibegyan, A.E. Maximov, and S.V. Mikhailov,
Eur. Phys.J C 13 (2000) 331;
N.O. Agasian and S.M. Fedorov, JHEP 12 (2001) 019.
ABSTRACT
  The possibility of self-consistent determination of instanton liquid
parameters is discussed together with the definition of optimal pseudo-particle
configurations and comparing the various pseudo-particle ensembles. The
weakening of repulsive interactions between pseudo-particles is argued and
estimated.

<|endoftext|><|startoftext|>
Introduction
The renormalization group (RG) approach, perhaps, is the most extensively used one in
numerous studies of critical phenomena [1, 2]. Particularly, the perturbative RG approach
to the ϕ4 or Ginzburg–Landau model is widely known [3, 4, 5, 6]. However, the pertur-
bative approach suffers from some problems [7]. Therefore it is interesting to look for a
nonperturbative approach. Historically, nonperturbative RG equations have been devel-
oped in parallel to the perturbative ones. These are so called exact RG equations (ERGE).
The method of deriving such RG equations is close in spirit to the famous Wilson’s ap-
proach, where the basic idea is to integrate out the short–wave fluctuations corresponding
to the wave vectors within Λ/s < q < Λ with the upper (or ultraviolet) cutoff parameter
Λ and the renormalization scale s > 1. The oldest nonperturbative equation of this kind,
originally presented by Wegner and Houghton [8], uses the sharp momentum cutoff. Later,
a similar equation with smooth momentum cutoff has been proposed by Polchinski [9]. The
RG equations of this class are reviewed in [10].
According to the known classification [10, 11], there is another class of nonperturbative
RG equations proposed by [12] and reviewed in [11]. Some relevant discussion can be found
in [10], as well. Such equations describe the variation of an average effective action Γk[φ]
depending on the running cutoff scale k. Here φ(x) = 〈ϕ(x)〉 is the averaged order–
parameter field (for simplicity, we refer to the case of scalar field). According to [12], the
averaging is performed over volume ∼ k−d such that the fluctuation degrees of freedom
E–mail: kaupuzs@latnet.lv
http://arxiv.org/abs/0704.0142v2
with momenta q > k are effectively integrated out. In fact, the averaging over volume
∼ k−d is the usual block–spin–averaging procedure of the real–space renormalization.
At the same time, the fluctuations with q . k are suppressed by a smooth infrared
cutoff. As one can judge from [11], the existence of a deterministic relation between the
configuration of external source {J(x)} and that of the averaged order parameter {φ(x)} is
(implicitly) assumed in the nonperturbative derivation of the RG flow equation. Namely,
it is stated (see the text between (2.28) and (2.29) in [11]) that δJ(x)/δφ(y) is the inverse
of δφ(x)/δJ(y), which has certain meaning as a matrix identity. To make this point
clearer, let us consider a toy example ~J = A~φ, where ~J = (J(x1), J(x2), . . . , J(xN)) and
~φ = (φ(x1), φ(x2), . . . , φ(xN)) are N–component vectors and A is a matrix of size N ×N .
In this case ∂J(xi)/∂φ(xj) is the element Aij of matrix A, whereas ∂φ(xi)/∂J(xj) is the
element
of the inverse matrix A−1. In the continuum limit N → ∞, this toy
example corresponds to a linear dependence between {J(x)} and {φ(x)}. The calculation
of derivative always implies the linearisation around some point, so that the matrix identity
used in [11] (as a continuum limit in the above example) has a general meaning. However,
it makes sense only if there exists a deterministic relation between the configurations of
φ(x) and J(x) or, in a mathematical notation, if there exist mappings f : {J(x)} → {φ(x)}
and f−1 : {φ(x)} → {J(x)}. On the other hand, according to the block–averaging, the
values of φ(x) should be understood as the block–averages. These, of course, are not
uniquely determined by the external sources, but are fluctuating quantities. So, we are
quite sceptical about the exactness of such an approach of averaged effective action.
The integration over fluctuation degrees with momenta q > k does not alter the be-
havior of the infrared modes, directly related to the critical exponents. From this point of
view, the approach based on the equations of Wegner–Houghton and Polchinski type seems
to be more natural. These are widely believed to be the exact RG equations, although, in
view of our currently presented results, it turns out to be questionable in which sense they
are really exact. In any case, the nonperturbative RG equations cannot be solved exactly,
therefore a suitable truncation is used. The convergence of several truncation schemes and
of the derivative expansion has been widely studied in [13, 14, 15, 16, 17]. Here [17] refers
to the specific approach of [12]. A review about all the methods of approximate solution
can be found in [18].
Another problem is to test and verify the nonperturbative RG equations, comparing
the results with the known exact and rigorous solutions, as well as with the results of
the perturbation theory. In [15], the derivative expansion of the RG β–function has been
considered, showing the agreement up to the second order between the perturbative results
and those obtained from the Legendre flow equation, which also belongs to the same class
of RG equations as the Wegner–Houghton and Polchinski equations. It has been stated
in [13] that the critical exponent ν, extracted from the Wegner–Houghton equation in the
local potential approximation, agrees with the ε–expansion up to the O(ε) order, as well
as with the 1/n (1/N in the notations of [13]) expansion in the leading order. However,
looking carefully on the results of [13], one should make clear that “the leading order of
the 1/n expansion” in this case is no more than the zeroth order, whereas the expansion
coefficient at 1/n is inconsistent with that proposed by the perturbative RG calculation
at any fixed dimension d except only d = 4. The inconsistency could be understood from
the point of view that the Wegner–Houghton equation has been solved approximately.
Therefore it would be interesting to verify whether the problem is eliminated beyond the
local potential approximation. One should also take into account that the perturbative
RG theory is not rigorous and, therefore, we think that a possible inconsistency still would
not prove that something is really wrong with the nonperturbative RG equation. In any
case, it is a remarkable fact that correct RG eigenvalue spectrum and critical exponents
are obtained in the local potential approximation at n → ∞ from the Wegner–Houghton
equation [13], as well as from similar RG equations [19], in agreement with the known
exact and rigorous results for the spherical model. It shows that some solutions, being
not exact, nevertheless can lead to exact critical exponents. From this point of view, it
seems also possible that some kind of approximations, made in the derivation of an RG
equation, are not harmful for the critical exponents.
We propose a simple test of the Wegner–Houghton equation: to verify the expansion of
the renormalized action S of the ϕ4 model in powers of the coupling constant u at u → 0
in the high–temperature phase. Such a test is rigorous, in the sense that the natural
domain of validity of the perturbation theory is considered. We think that it would be
quite natural to start with such a relatively simple and straightforward test before passing
to more complicated ones, considered in [13, 15, 19]. We have made this simplest test
in our paper and have found that the Wegner–Houghton equation fails to give all correct
expansion coefficients. We have also proposed another derivation of the Wegner–Houghton
equation (Secs. 2, 3). It is helpful to clarify the origin of the mentioned inconsistency. It
is also less obscure from the point of view that the used assumptions and approximations
are clearly stated. As regards the derivation in [8], at least one essential step is obscure
and apparently contains an implicit approximation which, in very essence, is analogous to
that pointed out in our derivation. We will discuss this point in Sec. 3.
2 An elementary step of renormalization
To derive a nonperturbative RG equation for the ϕ4 model, we should start with some
elementary steps, as explained in this section.
Consider the action S[ϕ] which depends on the configuration of the order parameter
field ϕ(x) depending on coordinate x. By definition, it is related to the Hamiltonian H of
the model via S = H/T , where T is the temperature measured in energy units. In general,
ϕ(x) is an n–component vector with components ϕj(x) given in the Fourier representation
as ϕj(x) = V
k<Λ ϕj,ke
ikx, where V = Ld is the volume of the system, d is the spatial
dimensionality, and Λ is the upper cutoff of the wave vectors. We consider the action of
the Ginzburg–Landau form. For simplicity, we include only the ϕ2 and ϕ4 terms. The
action of such ϕ4 model is given by
S[ϕ] =
Θ(k)ϕj,kϕj,−k + uV
j,l,k1,k2,k3
ϕj,k1ϕj,k2ϕl,k3ϕl,−k1−k2−k3 , (1)
where Θ(k) is some function of wave vector k, e. g., Θ(k) = r0 + ck
2 like in theories of
critical phenomena [4, 5, 6, 7]. In the sums we set ϕl,k = 0 for k > Λ.
The renormalization group (RG) transformation implies the integration over ϕj,k for
some set of wave vectors with Λ′ < k < Λ, i. e., the Kadanoff’s transformation, followed
by certain rescaling procedure [4]. The action under the Kadanoff’s transformation is
changed from S[ϕ] to Stra[ϕ] according to the equation
e−Stra[ϕ] =
e−S[ϕ]
j,Λ′<k<Λ
dϕj,k . (2)
Alternatively, one often writes −Stra[ϕ]+AL
d instead of −Stra[ϕ] to separate the constant
part of the action ALd. This, however, is merely a redefinition of Stra, and for our purposes
it is suitable to use (2). Note that ϕj,k = ϕ
j,k+iϕ
j,k is a complex number and ϕj,−k = ϕ
holds (since ϕj(x) is always real), so that the integration over ϕj,k means in fact the
integration over real and imaginary parts of ϕj,k for each pair of conjugated wave vectors
k and −k.
The Kadanoff’s transformation (2) can be split in a sequence of elementary steps
S[ϕ] → Stra[ϕ] of the repeated integration given by
e−Stra[ϕ] =
e−S[ϕ]dϕ′j,qdϕ
j,q (3)
for each j and q ∈ Ω, where Ω is the subset of independent wave vectors (±q represent
one independent mode) within Λ′ < q < Λ. Thus, in the first elementary step of renor-
malization we have to insert the original action (1) into (3) and perform the integration
for one chosen j and q ∈ Ω. In an exact treatment we must take into account that the
action is already changed in the following elementary steps.
For Λ′ > Λ/3, we can use the following exact decomposition of (1)
S[ϕ] = A0 +A1ϕj,q +A
1ϕj,−q +A2ϕj,qϕj,−q +B2ϕ
j,q +B
j,−q +A4ϕ
j,−q , (4)
where
A0 = S|ϕj,±q=0 , (5)
∂ϕj,q
ϕj,±q=0
= 4uV −1
l,k1,k2
ϕj,k1ϕl,k2ϕl,−q−k1−k2 , (6)
∂ϕj,q∂ϕj,−q
ϕj,±q=0
= Θ(q) + Θ(−q) + 4uV −1
(1 + 2δlj) | ϕl,k |
2 , (7)
∂ϕ2j,q
∣∣∣∣∣
ϕj,±q=0
= 2uV −1
(1 + 2δlj)ϕl,kϕl,−2q−k , (8)
∂2ϕj,q∂2ϕj,−q
ϕj,±q=0
= 6uV −1 . (9)
Here the sums are marked by a prime to indicate that terms containing ϕj,±q are omitted.
This is simply a splitting of (1) into parts with all possible powers of ϕj,±q. The condition
Λ′ > Λ/3, as well as the existence of the upper cutoff for the wave vectors, ensures that
terms of the third power are absent in (4). Besides, the derivation is performed formally
considering all ϕl,k as independent variables.
Taking into account (4), as well as the fact that A1 = A
1+ iA
1 and B2 = B
2+ iB
2 are
complex numbers, the transformed action after the first elementary renormalization step
reads
Stra[ϕ] = A0 − ln
j,q −A
ϕ′j,q
+ ϕ′′j,q
× exp
−2B′2
ϕ′j,q
− ϕ′′j,q
+ 4B′′2ϕ
× exp
ϕ′j,q
+ ϕ′′j,q
dϕ′j,qdϕ
. (10)
Considering only the field configurations which are relevant in the thermodynamic
limit V → ∞, Eq. (10) can be simplified, omitting the terms with B2 and A4. Really,
using the coordinate representation ϕl,k = V
ϕl(x) e
−ikxdx, we can write
B2 = 2uV
(1 + 2δlj)
ϕ2l (x)e
iqx dx
− 3ϕ2j,−q
 . (11)
The quantity V −1
ϕ2l (x)e
iqx dx is an average of ϕ2l (x) over the volume with oscillating
weight factor eiqx. This quantity vanishes for relevant configurations in the thermody-
namic limit: due to the oscillations, positive and negative contributions are similar in
magnitude and cancel at V → ∞. Since 〈| ϕj,−q |
2〉 = 〈| ϕj,q |
2〉 is the Fourier transform
of the two–point correlation function, it is bounded at V → ∞ and, hence, ϕ2j,−q also
is bounded for relevant configurations giving nonvanishing contribution to the statistical
averages 〈·〉 in the thermodynamic limit. Consequently, for these configurations, A2 is
a quantity of order O(1), whereas V −1ϕ2j,−q and B2 vanish at V → ∞. Note, however,
that the term with A4 = O
in (10) cannot be neglected unless A2 is positive. One
can judge that the latter condition is satisfied for the relevant field configurations due to
existence of the thermodynamic limit for the RG flow.
Omitting the terms with B2 and A4, the integrals in (10) can be easily calculated. It
yields
Stra[ϕ] = S
′[ϕ] + ∆Seltra[ϕ] , (12)
where S′[ϕ] = A0 is the original action, where only the ±q modes of the j-th field com-
ponent are omitted, whereas ∆Seltra[ϕ] represents the elementary variation of the action
given by
∆Seltra[ϕ] = ln
| A1 |
. (13)
According to the arguments provided above, this equation is exact for the relevant field
configurations with A2 > 0 in the thermodynamic limit.
The contributions to (6) and (7) provided by modes with wave vectors k, obeying two
relations Λ − dΛ < k < Λ and k 6= ±q, are irrelevant in the thermodynamic limit at
dΛ → 0. It can be verified by the method of analysis introduced in Sec. 2. Hence, Eq. (13)
can be written as
∆Seltra[ϕ] = ln
| Ã1 |
+ δSeltra[ϕ] , (14)
where
Ã1 = P
∂ϕj,q
= 4uV −1
Λ−dΛ∑
l,k1,k2
ϕj,k1ϕl,k2ϕl,−q−k1−k2 , (15)
Ã2 = P
∂ϕj,q∂ϕj,−q
= Θ(q) + Θ(−q) + 4uV −1
Λ−dΛ∑
(1 + 2δlj) | ϕl,k |
2 , (16)
and δSeltra[ϕ] is a vanishingly small correction in the considered limit. Here the operators
P set to zero all ϕj,k within the shell Λ− dΛ < k < Λ (i. e., the derivatives are evaluated
at zero ϕj,k for k within the shell), and the upper border Λ− dΛ for sums implies that we
set ϕl,k = 0 for k > Λ− dΛ. The above replacements are meaningful, since they allow to
obtain easily the Wegner–Houghton equation, as discussed in the following section.
3 Superposition hypothesis and the
Wegner–Houghton equation
Intuitively, it could seem very reasonable that the result of integration over Fourier modes
within the shell Λ − dΛ < k < Λ at dΛ → 0 can be represented as a superposition of
elementary contributions given by (14), neglecting the irrelevant corrections δSeltra[ϕ]. We
will call this idea the superposition hypothesis.
We remind, however, that strictly exact treatment requires a sequential integration of
exp(−S[ϕ]) over a set of ϕj,q. The renormalized action changes after each such integration,
and these changes influence the following steps. A problem is to estimate the discrepancy
between the results of two methods: (1) the exact integration and (2) the superposition
approximation. Since it is necessary to perform infinitely many integration steps in the
thermodynamic limit, the problem is nontrivial and the superposition hypothesis cannot
be rigorously justified.
Nevertheless, the summation of elementary contributions in accordance with the su-
perposition hypothesis leads to the known Wegner–Houghton equation [8]. In this case
the variation of the action due to the integration over shell reads
∆Stra[ϕ] =
Λ−dΛ<q<Λ
Ã2(j,q)
| Ã1(j,q) |
Ã2(j,q)
. (17)
It is exactly consistent with Eq. (2.13) in [8]. The factor 1/2 appears, since only half of
the wave vectors represent independent modes. Here we have indicated that the quantities
Ã1 and Ã2 depend on the current j and q. They depend also on the considered field con-
figuration [ϕ]. If Ã1 and Ã2 are represented by the derivatives of S[ϕ] (see (15) and (16)),
then the equation is written exactly as in [8].
To avoid possible confusion, one has to make clear that the operators P in (15) and (16)
influence the result, as discussed further on. It means that the equation where these
operators are simply omitted, referred in the review paper [10] as the Wegner–Houghton
equation, is not really the Wegner–Houghton equation.
The derivation in [8] is somewhat different. Instead of performing only one elementary
step of integration first, the expansion of Hamiltonian in terms of all shell variables is made
there. The basic method of [8] is to show that, in the thermodynamic limit at dΛ → 0, the
expansion consists of terms containing no more than two derivatives with respect to the
field components. Moreover, it is assumed implicitly that only the diagonal terms with
k′ = −k are important finally, when performing the summation over the wave vectors
k,k′. It leads to Eq. (2.12) in [8]. The omitting of nondiagonal terms is equivalent to the
superposition assumption we discussed already. Indeed, in this and only in this case the
integration over the shell variables can be performed independently, as if the superposition
principle were hold. Hence, essentially the same approximation is used in [8] as in our
derivation, although it is not stated explicitly.
Our derivation refers to the ϕ4 model, whereas in the form with derivatives the equation
may have a more general validity, as supposed in [8]. Indeed, (14) remains correct for a
generalized model provided that higher than second order derivatives of S[ϕ] vanish for
relevant field configurations in the thermodynamic limit. It, in fact, has been assumed
and shown in [8]. Based on similar arguments we have used already, the latter assumption
can be justified for certain class of models, for which the action is represented by a linear
combination of ϕm–kind terms with wave–vector dependent weights and vanishing sum of
the wave vectors
l=1 kl = 0 related to the ϕ factors. In this case we have
Ã1(j,q) =
∂ϕj,q
− ϕj,−q
∂ϕj,qϕj,−q
, (18)
Ã2(j,q) =
∂ϕj,qϕj,−q
for the relevant configurations at V → ∞ and dΛ → 0. The second term in (18) appears
because the derivative ∂S/∂ϕj,q contains relevant terms with ϕj,−q, which have to be
removed. The influence of the operators P is seen from (15) and (18).
Here we do not include the second, i. e., the rescaling step of the RG transformation.
It, however, can be easily calculated for any given action, as described, e. g., in [4]. It is
not relevant four our further considerations.
4 The weak coupling limit
Here we consider the weak coupling limit u → 0 of the model with Θ(k) = r0 + ck
a given positive r0, i. e., in the high temperature phase. In this case ∆Stra[ϕ] can be
expanded in powers of u. It is the natural domain of validity of the perturbation theory,
and the expansion coefficients can be calculated exactly by the known methods applying
the Feynman diagram technique and the Wick’s theorem [4, 5, 11]. On the other hand, the
expansion can be performed in (17). Our aim is to compare the results of both methods
to check the correctness of (17), since the latter equation is based on assumptions.
Let us denote by ∆S̃tra[ϕ] the variation of S[ϕ] omitting the constant (independent of
the field configuration) part. Then the expansion in powers of u reads
∆S̃tra[ϕ] = ∆S1[ϕ]u+
2 [ϕ] + ∆S
2 [ϕ] + ∆S
2 [ϕ]
u2 +O
, (20)
where the expansion coefficient at u2 is split in three parts ∆S
2 [ϕ], ∆S
2 [ϕ], and ∆S
2 [ϕ]
corresponding to the ϕ2, ϕ4, and ϕ6 contributions, respectively. The contribution of order
u is related to the diagram r✐ , whereas the three second–order contributions — to the
diagrams q q❦ , ❛✦q q✦❛ , and ❛✦q q✦❛ . The diagram technique represents the
expansion of −S[ϕ] in terms of connected Feynman diagrams, where the coupled lines are
associated with the Gaussian averages. In particular, the Fourier transformed two–point
correlation function in the Gaussian approximation G0(k) = 〈ϕj,kϕj,−k〉0 = 1/[2Θ(k)]
appears due to the integration over ϕ′j,k and ϕ
j,k. It is represented as the coupling of
lines, in such a way that each line related to the wave vector k and vector–component j is
coupled with another line having the wave vector −k and the same component j. Thus, if
we integrate over ϕj,k within Λ−dΛ < k < Λ in (2), then it corresponds to the coupling of
lines in the same range of wave vectors in the diagram technique. According to the Wick’s
theorem, one has to sum over all possible couplings, which finally yields the summation
(integration) over the wave vectors obeying the constraint Λ− dΛ < k < Λ for each of the
coupled lines associated with the factors G0(k). In the n–component case, it is suitable
to represent the ϕ4 vertex as ❛✦q q✦❛ , where the same index j is associated with two
solid lines connected to one node. The above diagrams are given by the sum of all possible
couplings of the vertices ❛✦q q✦❛ , yielding the corresponding topological pictures when
the dashed lines shrink to points. In this case factor n corresponds to each closed loop
of solid lines, which comes from the summation over j. For a complete definition of the
diagram technique, one has to mention that factors −uV −1 are related to the dashed
lines, G0(k) – to the coupled solid lines, and the fields ϕj,k – to the outer uncoupled solid
lines. Besides, each diagram contains a combinatorial factor. For a diagram consisting
of m vertices ❛✦q q✦❛ , it is the number of all possible couplings of (numbered) lines,
divided by m!.
At dΛ → 0, the diagrammatic calculation for the n–component case yields
∆S1[ϕ] =
(n + 2) dΛ
Λ−dΛ∑
| ϕj,k |
2 (21)
2 [ϕ] = −4V
Λ−dΛ∑
j,l,k1,k2,k3
ϕj,k1ϕj,k2ϕl,k3ϕl,−k1−k2−k3 (22)
× [(n+ 4)Q (k1 + k2,Λ, dΛ) + 4Q (k1 + k3,Λ, dΛ)]
2 [ϕ] = −8V
Λ−dΛ∑
i,j,l,k1,k2,k3,k4,k5
ϕi,k1ϕi,k2ϕj,k3ϕj,k4ϕl,k5ϕl,−k1−k2−k3−k4−k5
×G0 (k1 + k2 + k3) F (| k1 + k2 + k3 |,Λ, dΛ) , (23)
where Kd = S(d)/(2π)
d, S(d) = 2πd/2/Γ(d/2) is the area of unit sphere in d dimensions,
Θ(Λ) is the value of Θ(k) at k = Λ, whereas F(k,Λ, dΛ) is a cutoff function which has the
value 1 within Λ− dΛ < k < Λ and zero otherwise. The quantity Q is given by
Q(k,Λ, dΛ) = V −1
Λ−dΛ<q<Λ
G0(q)G0(k− q)F(| k− q |,Λ, dΛ) . (24)
Below we will give some details of calculation of (22), which is the most important term
in our further discussion. To obtain this result, we have dechipered the ❛✦q q✦❛ diagram
as a sum of three diagrams of different topologies made of vertices ❛✦q q✦❛ , i. e.,
q q q q✐ ✦
✍ , and q
, providing the same topological picture
❛ when shrinking the dashed lines to points. Recall that any loop made of solid
lines of ❛✦q q✦❛ gives a factor n, and one needs also to compute the combinatorial
factors. For the above three diagrams, the resulting factors are 4n, 16, and 16, which
enter the prefactors of Q in (22). To obtain the correct sign, we recall that the diagram
expansion is for −S[ϕ]. The other diagrams are calculated in a similar way.
The expansion of (17) gives no contribution ∆S
2 [ϕ], and we have skipped it in the
diagrammatic calculation as an irrelevant term, which vanishes faster than ∝ dΛ at dΛ → 0
in the thermodynamic limit V → ∞. The expansion of the logarithm term in (17) yields
∆S1[ϕ] exactly consistent with (21). Similarly, ∆S
2 [ϕ] is exactly consistent with (23).
One has to remark that two propogators are involved in (24) and, therefore, the volume
of summation region with nonvanishing cut function F shrinks as (dΛ)2 for a given nonzero
wave vector k at dΛ → 0. However, there is a contribution linear in dΛ for k = 0. As a
result, a contribution proportional to dΛ appears in (22).
Note that the contributions (21) and (23) come from diagrams with only one coupled
line. The term (22) is related to the diagram with two coupled lines. The expansion
of (17) provides a different result for the corresponding part of ∆S̃tra[ϕ]:
2 [ϕ] = −
d−1 dΛ
Θ2(Λ)
Λ−dΛ∑
j,l,k1,k2
(n+ 4 + 4δjl) | ϕj,k1 |
2 | ϕl,k2 |
2 . (25)
Note that (25) comes from the ln Ã2 term in (17), and the calculation is particularly simple
in this case, since the related sum in (16) is independent of q. Eq. (25) is obtained if we
set Q(k,Λ, dΛ) → δk,0Q(0,Λ, dΛ) in (22) (in this case only the diagonal terms j = l are
relevant when summing up the contributions with Q (k1 + k3,Λ, dΛ), as it can be shown by
an analysis of relevant real–space configurations, since 〈ϕj(x)ϕl(x)〉 = 0 holds for j 6= l).
It means that a subset of terms is missing in (25), as compared to (22). The following
analysis will show that this discrepancy between (22) and (25) is important.
It is interesting to mention that (25) is obtained also by the diagrammatic perturbation
method if we first integrate out only the mode with ϕj,±q and then formally apply the
superposition hypothesis, as in the derivation of the Wegner–Houghton equation. It shows
that the discrepancy between (25) and (22) arises because in one case the superposition
hypothesis is applied, whereas in the other case it is not used.
The difference between (22) and (25) can be better seen in the coordinate representa-
tion. In this case (22) reads
2 [ϕ] = − (4n+ 16)
ϕ2(x1)R
2(x1 − x2)ϕ
2(x2) dx1dx2 (26)
ϕj(x1)ϕl(x1)R
2(x1 − x2)ϕj(x2)ϕl(x2) dx1dx2 ,
where
R(x) = V −1
G0(q)F(q,Λ, dΛ)e
iqx (27)
is the Fourier transform of G0F , and ϕ
2(x) =
l (x). In three dimensions we have
R(x) =
(2π)2Θ(Λ)
sin(Λx) (28)
for any given x at dΛ → 0 and L → ∞, where L is the linear system size. The continuum
approximation (28), however, is not correct for x ∼ L and therefore, probably, should not
be used for the evaluation of (26).
The coordinate representation of (25) is
2 [ϕ] = −
d−1 dΛ
Θ2(Λ)
(n+ 4)
ϕ2(x1)V
−1 ϕ2(x2) dx1dx2
ϕ2j (x1)V
−1 ϕ2j (x2) dx1dx2
 . (29)
Eq. (29) represents a relevant contribution at dΛ → 0, as it is proportional to dΛ. It is ob-
viously not consistent with (26). In fact, the term (29) represents a mean-field interaction,
which is proportional to 1/V and independent of the distance, whereas (26) corresponds
to another non-local interaction given by R2(x1 − x2). Hence, the Wegner–Houghton
equation (17) does not yield all correct expansion coefficients at u → 0.
5 Discussion
The results of our test, stated at the end of Sec. 4, reveal some inconsistency between
the Wegner–Houghton equation and the diagrammatic perturbation theory in the high
temperature phase at u → 0. Since this is the natural domain of validity of the per-
turbation theory, there should be no doubts that it produces correct results here, which
agree with (2). So, the results of our test point to some inconsistency between the Wegner–
Houghton equation and (2), which causes a question in which sense the Wegner–Houghton
equation is really exact. The same can be asked about the equations of Polchinski type,
since these (as it is believed) are generalizations of the Wegner–Houghton equation to
the case of smooth momentum cutoff. There is no contradiction with the tests of consis-
tency made in [13, 15], since our test is independent and quite different. According to our
derivation of the Wegner–Houghton equation and the related discussion, it turns out that
the reason of the inconsistency, likely, is the superposition approximation (defined at the
beginning of Sec. 3) used in our paper and analogous approximation implicitly used in [8].
Despite of this problem, the Wegner–Houghton equation is able to reproduce the exact
RG eigenvalue spectrum and critical exponents of the spherical model at n → ∞ [13].
This fact can be interpreted in such a way that the superposition approximation (or an
analogous approximation) is valid to derive such nonperturbative RG equations, which can
produce correct (exact) critical exponents in some limit cases, at least. From a general
point of view, it concerns the fundamental question about the relation between the form
of RG equation and the universal quantities. It has been verified in several known studies
that the universal quantities are invariant with respect to some kind of variations in the
RG equation, like changes in the shape of the momentum cutoff function. This property
is known as the reparametrisation invariance [10]. Probably, the universal quantities are
invariant also with respect to such a variation of the Wegner–Houghton equation, which
makes it exactly consistent with (2). However, this is only a hypothesis.
6 Conclusions
1. The nonperturbative Wegner–Houghton RG equation has been rederived (Secs. 2
and 3), discussing explicitly some assumptions which are used here. In particular,
our derivation assumes the superposition of small contributions provided by elemen-
tary integration steps over the short–wave fluctuation modes. We consider it as
an approximation. As discussed in Sec. 3, the original derivation by Wegner and
Houghton includes essentially the same approximation, although not stated explic-
itly.
2. According to our calculation in Sec. 4, the Wegner–Houghton equation is not com-
pletely consistent with the diagrammatic perturbation theory in the limit of small
ϕ4 coupling constant u in the high temperature phase. This fact, together with
some other important results known from literature, is discussed in Sec. 5. Apart
from critical remarks, a hypothesis has been proposed that the equations of Wegner–
Houghton type, perhaps, can give exact universal quantities.
References
[1] D. J. Amit, Field theory, the renormalization group, and critical phenomena, World
Scientific, Singapore, 1984
[2] D. Sornette, Critical Phenomena in Natural Sciences, Springer, Berlin, 2000
[3] K. G. Wilson, M. E. Fisher, Phys. Rev. Lett. 28, 240 (1972)
[4] Shang–Keng Ma, Modern Theory of Critical Phenomena, W.A. Benjamin, Inc., New
York, 1976
[5] J. Zinn–Justin, Quantum Field Theory and Critical Phenomena, Clarendon Press,
Oxford, 1996
[6] H. Kleinert, V. Schulte–Frohlinde, Critical properties of φ4 theories, World Scientific,
[7] J. Kaupužs, Ann. Phys. (Leipzig) 10, 299 (2001)
[8] F. Wegner, A. Houghton, Phys. Rev. A 8, 401 (1973)
[9] J. Polchinski, Nucl. Phys. B 231, 269 (1984)
[10] C. Bagnuls, C. Bervillier, Phys. Rep. 348, 91 (2001)
[11] J. Berges, N. Tetradis, C. Wetterich, Phys. Rep. 363, 223 (2002)
[12] C. Wetterich, Phys. Lett. B 301, 90 (1993)
[13] K.-I. Aoki, K. Morikava, W. Souma, J.-I. Sumi, H. Terao, Progress of Theoretical
Physics 95, 409 (1996)
[14] K.-I. Aoki, K. Morikava, W. Souma, J.-I. Sumi, H. Terao, Progress of Theoretical
Physics 99, 451 (1998)
[15] T. R. Morris, J. F. Tighe, Journal of High Energy Physics 9908, 007 (1999)
[16] T. R. Morris, J. F. Tighe, Int. J. Mod. Phys. A 16, 2095 (2001)
[17] L. Canet, B. Delamotte, D. Mouhanna, J. Vidal, Phys. Rev. B 68, 064421 (2003)
[18] B. Delamotte, D. Mouhanna, M. Tissier, Phys. Rev. B 69, 134413 (2004)
[19] M. D’Attanasio, T. R. Morris, Physics Letters B 409, 363 (1997)
	1 Introduction
	2 An elementary step of renormalization
	3 Superposition hypothesis and the  Wegner–Houghton equation
	4 The weak coupling limit
	5 Discussion
	6 Conclusions
ABSTRACT
  A nonperturbative renormalization of the phi^4 model is considered. First we
integrate out only a single pair of conjugated modes with wave vectors +/- q.
Then we are looking for the RG equation which would describe the transformation
of the Hamiltonian under the integration over a shell Lambda - d Lambda < k <
Lambda, where d Lambda -> 0. We show that the known Wegner--Houghton equation
is consistent with the assumption of a simple superposition of the integration
results for +/- q. The renormalized action can be expanded in powers of the
phi^4 coupling constant u in the high temperature phase at u -> 0. We compare
the expansion coefficients with those exactly calculated by the diagrammatic
perturbative method, and find some inconsistency. It causes a question in which
sense the Wegner-Houghton equation is really exact.

<|endoftext|><|startoftext|>
Instanton Liquid at Finite Temperature and Chemical
Potential of Quarks
S.V. Molodtsov1,3, G.M. Zinovjev2
1Joint Institute for Nuclear Research, Dubna, 141980 RUSSIA
2Bogolyubov Institute for Theoretical Physics, ul. Metrolohichna 14-b, Kiev, 03680 UKRAINE
3Institute of Theoretical and Experimental Physics, Moscow, 117259 RUSSIA
Instanton liquid in heated and strongly interacting matter is studied using the variational princi-
ple. The dependence of the instanton liquid density (gluon condensate) on the temperature and
the quark chemical potential is determined under the assumption that, at finite temperatures,
the dominant contribution is given by an ensemble of calorons. The respective one-loop effective
quark Lagrangian is used.
In current studies of strong-interacting matter under extreme conditions, primary attention is
focused on a description of its phase state at given temperature and chemical potential. For def-
initeness, we consider that T is the temperature of quarks and µ is the quark chemical potential
(it is assumed that gluons are in thermodynamical equilibrium with quarks). However, there is no
approach making it possible to describe main features of the expected phase diagram of quark-gluon
matter at least qualitatively.
In the present study, we argue that the instanton liquid model of the QCD vacuum [1] can shed
light on some important features of a full picture. It is frequently noted that this model offers a useful
tool for obtaining phenomenologically plausible estimates in spite of the fact that it is poorly justified
because the typical size of an instanton is not properly fixed. As of now, this fact is considered as
inessential because a connection has been revealed between limitations on the instanton size due
to repulsion [2] and generation of mass of the gluon field in the framework of the quasi-classical
approximation [3]. The latter mechanism is a more general property of stochastic gluon fields than
the former one. We will discuss this question later. Here we assume that the problem of instanton
size is solved in one of the following scenarios: self-stabilization of the saturating ensemble [2],[4],
freezing of the coupling constant [5], or influence of the confining component [6]. In the present
study, primary attention is focused on a plausible qualitative model describing a behavior of the
gluon condensate.
In the beginning, we recollect the variational principle proposed in [2] and the method of determi-
nation of the size of pseudoparticles and the density of the instanton liquid and introduce notation for
further considerations. In the model of instanton liquid describing the QCD vacuum, it is assumed
that the leading contribution to the QCD generating functional is given by the background fields
representing superposition of instantons in the singular gauge:
Aaµ(x; γ) =
ωabη̄bµν aν(y) , aν(y) =
y2 + ρ2
, y = x− z , µ, ν = 1, 2, 3, 4 . (1)
where ρ is the size, ω is the matrix of color rotation, and z is the position of the center of a
pseudoparticle (in the case of anti-instanton, the ’t Hooft symbol should be replaced as follows:
η̄ → η). This being so, the QCD generating functional takes the form
dγi d(ρi) e
−β Uint(γ) =
dγi e
−E(γ) , (2)
http://arxiv.org/abs/0704.0143v1
E(γ) = β Uint(γ)−
ln d(ρi) ,
where
d(ρ) =
β̃2Nc e−β(ρ) , (3)
is the instanton size distribution [7]; dγi = dzi dωi dρi, and
β(ρ) =
= −b ln(C1/bNc Λρ)
is the action of a single instanton, where (Λ = ΛMS = 0.92ΛP.V.) CNc , depends on the renormalization
scheme and, in the case under consideration, is given by CNc ≈
4.66 exp(−1.68Nc)
π2(Nc − 1)!(Nc − 2)!
, and b =
11 Nc − 2 Nf
. We assume that Nf=2 here because the leading contribution to renormalization
comes from hard massless gluons and quarks. The auxiliary function
β̃ = −b ln(Λρ̄) ,
is evaluated at the scale ρ̄ defined by an average size of pseudoparticles, Uint(γ) is considered assuming
pair interaction dominance. Its contribution has the form [2]
dω1 dω2 dz1 dz2 Uint(γ1, γ2) = V ξ
2 ρ21 ρ
where ξ2 = 27 π
N2c − 1
. The factor β that appears in the exponent in formula (2) is also evaluated
at the scale of an average size of pseudoparticles ρ̄. Assuming that the instanton liquid is topologically
neutral, we do not introduce notation to distinguish between instantons and anti-instantons, N
denotes the ovarall number of pseudoparticles in volume V .
Since the interaction is independent of coordinates or orientation in color space, it is natural to
calculate the generating functional Y on the basis of the effective one-particle distribution function
µ(ρ), which can be determined from the solution of the variational problem
dρi µ(ρ) =
dγi e
−E1(γ) , (4)
E1(γ) = −
lnµ(ρi) ,
where the factor V N in (4) is isolated in order that the result be expressed in terms of the respective
density and convenience in interpretation of the function µ(ρ). With regard to convexity of the
exponential function, the generating functional (2) for every fixed N partial contribution can be
estimated using the approximating inequality
Y ′ ≥ Ya = Y ′1 exp(−〈E − E1〉) , (5)
where an average over approximate ensemble is implied. In the case under consideration, the average
of difference 〈E − E1〉 is given by:
〈E −E1〉 =
dγi [β Uint −
ln d(ρi) +
lnµ(ρi)] e
lnµ(ρi) =
dρ µ(ρ) ln
dρ1dρ2 ξ
2 ρ21ρ
2 µ(ρ1)µ(ρ2)
where µ0 =
dρ µ(ρ).
Variation of the functional 〈E −E1〉 with respect to µ(ρ) results formally in the equation µ(ρ) =
e−1 d(ρ) e−nβξ
2ρ2ρ2 (where n = N/V is the density of the instanton liquid). Here an unwanted factor
of e−1, emerges. It can be excluded due to the fact that the approximate functional Ya is independent
of the constant factor of C that can be added to the expressionfor µ(ρ). For convenience, we set
C = e, and therefore, arrive at
µ(ρ) = d(ρ) e−nβξ
2ρ2ρ2 . (6)
Substituting this solution to the approximate functional, we obtain
V N µN0
(ρ2)2 .
Defining suitable parameter ν the integral for determination µ0 can be represented in the form
µ0 = Λ
dρΛ CNcβ̃
2Nc (ρΛ)b−5 e
ρ2 . (7)
From the comparison of which with formula (6) we obtain
= βξ2nρ2 . (8)
Provided that ν is known, this formula offers a relation between the average instanton size and the
density of the instanton liquid. To find this relation, we consider the equation
dρ ρb−3 e
dρ ρb−5 e
ρ2 ν−1 Γ( b−4
Γ( b−4
which gives ν = b−4
, and therefore, µ0 = Λ
4 CNcβ̃
2Nc (ρΛ)
2 . It should be noted that the factor
of two in the denominator of this expression stems from the integration measure 2ρdρ, which, in its
turn, emerges in transformation to the Gaussian integral with respect to ρ squired. This factor was
omited in [2]; however, this fact has no noticeable consequences. The reason is that the parameter
Λ is determined from a fit to some observable, for example, to the pion decay constant. In so doing,
everything is governed by a choice of scale. Moreover, it should be remembered that the instanton
liquid model is merely a rough approximation. From the above, we derive an approximate expression
for the functional as follows:
Ya = exp
[ln(n/Λ4)− 1] +N ln
CNcβ̃
2Nc(βξ2ν)−ν/2
. (9)
Now we find the value of n at which the argument of the exponential approaches its maximum. To
do this, we should solve the equation
ln(n/Λ4) + ln
CNcβ̃
2Nc(βξ2ν)−ν/2
− n ν
= 0 . (10)
From the relation (8) we obtain
= 0 .
On the other hand,
= − bρ̄ ,
. We represent the derivative of β with respect to the density
in the form
, and obtain
4β − b ,
. (11)
Thus we derive the expressionfor the instanton liquid density
n/Λ4 =
CNcβ̃
2Nc(βξ2ν)−ν/2
ν + 2
ν − 2
ν + 2
2β − ν − 2
. (12)
The contribution of the derivatives of the functions β and β̃ with respect to the density was disre-
garded in [2]. This contribution compensates for the above-mentioned factor of 2 though, as was
noted above, this is not essential. The obtained formula for the instanton liquid density by itself
does not provide a solution to the problem because it remains to solve the transcendental equation
(8) in ρ̄, where the function β involves the logarithm of ρ̄. To solve this equation, it is convenient to
reformulate the problem without resort to the explicit formula (12) for the instanton liquid density.
By definition of the function β, the action of an isolated pseudoparticle must be positive. This gives a
limitation to the maximum size of an (anti-)instanton as follows: ρ̄ΛC
≤ 1 (actually, ρ̄Λ ≤ 1). Now
we can solve the transcendental equation (10), by bisection of the segment. In so doing, a stationary
value of ρ̄ is determined at each step and the respective instanton liquid density is determined from
equation (8). In the calculation of the generating functional, the contributions of the type (ρ̄Λ)2ν are
used rather than the expression for the instanton liquid density.
Now we modify the variational principle in order to extend our description to the case of finite
temperatures. For this purpose, we employ calorons — solutions of the Yang–Mills equations periodic
in the Euclidean time. The background field should be replaced by a superposition of calorons and
anti-calorons as follows [8]:
Aaµ(x, γ) = −
ωab η̄bµν ∂ν lnΠ,
Π = 1 +
sinh(2πrT )
cosh(2πrT )− cos(2πτT ) ,
where T−1 is the period of the caloron, r = |x−z| is the distance from the center of the caloron z in
three dimensional space, and τ = x4 − z4 – is the respective interval of ”time”. As the temperature
tends to zero, such solutions go over to (anti-)instantons in the singular gauge. Yet another modifi-
cation of the variational principle is the replacement of the distribution (3) in the instanton size by
the function
d(ρ, T ) =
β̃2Nc exp[−β(ρ)− ANcT 2ρ2] , (14)
where the coefficient ANc =
6 Nc − 1
π2 accounts for the additional contribution to the action
of each individual pseudoparticle. It provides an approximation to a more exact expression
d(ρ, T ) = d(ρ, 0) exp
g2T 2
(Nc +Nf/2)
4π2ρ2
+ 12 A(πρT ) [1 + (Nc −Nf)/6]
, (15)
constructed from the respective determinants [9]. For our purposes it is sufficient to say that the
function A(πρT ) is determined by a shape of the pseudoparticle (13). This function was studied in
the cited work; however, we do not use it in the present article. It should be mentioned that the
expansion up to the terms of the order T 2 can be used as an approximate expression for the function
A(πρT ) because, within the accuracy of the variational principle, only the terms up to order ρ2
should be kept in the argument of the exponential in formula (14). The first term in formula (15)
is represented as a product of two factors; each factor was interpreted in [9]. The first factor is
the square of the electric mass, that is, the temporal component of the gluon polarization tensor
evaluated at the zero energy and momentum. It has the form
m2el = Π44(ω = 0,p = 0) = g
(Nc +Nf/2)
. (16)
The remaining components being equal to zero at zero energy-momentum. Therefore, the magnetic
mass vanishes. Note that the one-loop quark and gluon contributions to the polarization tensor
are taken into account [10], the resulting sum being rearranged in order that the quark and gluon
contributions in the medium sum up to a finite value. This, formally, gives rise to a generation of
the mass of the gluon field. The second factor is the integral of the square of the fourth component
A4 of the field in formula (13) ∫
dy Aa4(y)A
4(y) =
4π2ρ2
. (17)
It is independent of the temperature [11]. It is seen that one can take into account only one-loop
contribution 12m
4 to the Lagrangian of the gluon field and neglect other corrections. It was
demonstrated [3] that the term Uint describing the interaction of pseudoparticles can be brought in the
form 12m
2 AaµA
µ, wherem
2 = 9π2 n ρ̄2
N2c − 1
. Thus the interaction term also describes generation of
the mass of the gluon field in the instanton–anti-instanton medium in quasi-classical approximation.
This being so, chromoelectric and chromomagnetic fields are screened equally well provided that
the instanton liquid density is not equal to zero. It was shown that screening is a consequence of
stochastic character of the ensemble of gluon fields being unrelated to a specific instanton solution
of the type (1) or details of the repulsion mechanism responsible for stabilization of the ensemble
[2]. An application of these considerations to the (anti-)instanton solution (1) leads precisely to the
formula for Uint. It turns out that, in the caloron ensemble, screening of chromomagnetic fields and
the interaction term depends only weakly on the temperature. However, the anisotropy is negligible
small and the interaction term coincides with that obtained for the (anti-)instanton solution. First
it was found in [11], where the instanton liquid was studied at non zero temperature.
The one-loop contribution of Plank gluons is proportional to Nc (see formula (16)) and does not
vary as the chemical potential becomes different from zero. On the other hand, it is known that
the one-loop fermion contribution in the medium can be calculated exactly. It has no dangerous
singularities [12], [13]. The ”temporal” component of the polarization tensor generated by a quark
of definite flavor has the form
44(k4, ω) = g
dp p2
4ε2p − k2
(k2 + 2pω)2 + 4ε2pk
(k2 − 2pω)2 + 4ε2pk24
− εpk4
arctan
8pω εpk4
4ε2pk
4 − 4p2ω2 + k4
where ω = |k|, k2 = ω2 + k24, εp = (m2 + p2)1/2, where m – is the quark mass, np = n−p + n+p ,
n−p = (e
T + 1)−1, n+p = (e
T + 1)−1. After summation over all components, the polarization
tensor takes the form
Πf(k4, ω) = g
dp p2
2m2 − k2
(k2 + 2pω)2 + 4ε2pk
(k2 − 2pω)2 + 4ε2pk24
. (18)
It is seen that, at k4 = 0, and small values of ω, the first term (that is, unit) gives the dominant
contribution to the gluon mass. The spatial components are negligibly small. In particular, at ω = 0
we obtain
Πf(0, 0) = Π
44(0, 0) = g
dp p2
np , (19)
and at T = 0 we arrive at Πf (0, 0) = g2
(µ2 −m2)1/2µ
µ+ (µ2 −m2)1/2
The ultimate expression for the electric mass has the form
m2el =
g2T 2
Πf (0, 0)
 . (20)
In this approximation, the effect of the instanton liquid is completely accounted for by the quark
mass dynamically generated in the instanton medium. With such definition of mass, the formula (16)
at µ = 0 and T 6= 0 should be modified. The coefficient 16 at Nf should be replaced by
. However,
this replacement has only a little effect; self-consistency of our calculations will be discussed below.
Using the integral (17), which is also valid for the caloron solution, we derive the expression for
the distribution of pseudoparticles:
d(ρ;µ, T ) = d(ρ; 0, 0) e−η
2(µ,T ) ρ2 , η2 = 2 π2
Πf(0, 0)
 . (21)
The one-loop quark contribution to the instanton action at zero temperature, finite chemical poten-
tial, and ω 6= 0 was studied in detail in [14] (see also [15], [16]). These studies make it possible to
improve our description, however, we work within the approximation (21) and, moreover, we consider
the limit of massless quarks. A self-consistent calculation for the quark with dynamically generated
mass can be the subject of a separable study.
Necessary modifications in the variational principle are as follows. It was revealed that only
the distribution function d(ρ;µ, T ) of pseudoparticles changes, whereas the repulsion interaction
Uint between pseudoparticles remains as before. Similar to the case of instantons, we introduce the
parameter ν satisfying the relation
= η2 + βξ2nρ2 , (22)
instead of (8). Since the instanton liquid density is greater than zero, a new limitation on the average
size of pseudoparticle emerges ρ̄Λ ≤ ν
η . If this limit is smaller than the limit descussed above,
then it must be the starting point for the determination of the equilibrium size of pseudoparticles by
the bisection method. The derivative of the function β with respect to the density of the instanton
liquid can be determined from the relation (22). The result is
4β − b+ 2η2 ρ̄2β
ν−η2 ρ̄2
, (23)
it should be substituted in Eq. (10) that determines the saddle-point. The integral (19) should be
evaluated numerically because it cannot be calculated analytically at arbitrary temperatures even
though the quark mass equals zero. Thus we are ready to determine the parameters of the instanton
liquid everywhere over the µ – T plane. For simplicity, the calculations are performed at zero quark
masses. We neglect the light-quark contribution to the respective determinants [17] (see also [18]).
We also disregard a possible temperature molecular behavior of instanton–anti-instanton pairs [19].
The results of the calculations are shown in Fig. 1 by the lines of constant density. The instanton
liquid density is plotted in Fig 2. versus the temperature (at zero chemical potential) and versus the
chemical potential (at zero temperature). Though the conventional natation for the instanton liquid
density at nonzero temperature is n = TN/V3, we use the label n which is more simple. At T 6= 0,
and µ = 0, our results coincide with the results obtained in [11] and [9]. It sould be noted that our
results are consistent with recent calculations on a lattice at finite temperatures [20], [21], where a
rapid decrease of the chromoelectric components in the respective correlation functions was found.
In our model, such suppression is due to the term 12m
4 in the effective action; with neglect of
this factor, the chromoelectric and chromomagnetic correlators coincide. From this point of view, our
calculations may seem inconsistent. We use the caloron solution (13), which is symmetric under an
interchange of chromoelectric and chromomagnetic fields. However, the caloron components manifest
themthelves in the observables differently because of the anisotropy of the weight function. In fact,
our method of taking the gluon mass term into account is consistent only in perturbation theory. In
a complete study, one must find an analogue of the solution (13) for the effective Lagrangian with the
Figure 1: Lines of equal density of the instanton liquid in the temperature-chemical potential plane.
Curve 1 corresponds to the density n = 0.75 n0, where n0 is the density at zero temperature and
chemical potential. Also shown are the densities from (curve 2) n = 0.5 n0 to (curve 6) n = 0.1 n0.
Curves 3–5 correspond to intermediate densities at intervals of 0.1.
Figure 2: Instanton liquid density versus (curve 1) temperature and (curve 2) chemical potential.
gluon mass generated for the chromoelectric field and gain a self-consistent description of ensemble
of pseudoparticles in the long-wave approximation [3].
It is of interest that the data on correlation functions for cooled configurations [20] are fitted
well by the instanton ensemble [22]. In so doing, the contribution of the terms of the second order
in the instanton liquid density (∼ n2) is in excellent agreement with the effect of the standard
instanton ensemble with the respective admixture of the perturbative component everywhere over the
distance range chosen for a fit [23]. This agreement indicates that the confining component is absent
from the lattice configurations isolated by cooling. It is surprising because lattice simulations with
cooling were aimed at the searches for a long-wave confining component. However, an interpretation
of lattice simulations at finite temperature presents difficulties because it is not clear what scale
corresponds to the configarations used for the measurements. The magnitude of deformation of the
chromoelectric component of the solution for the effective Lagrangian with the mass term is also
poorly known. The scale of lattice configurations can, in principle, be estimated using the scale at
which the chromoelectric field decrease since only this scale has emerged in our calculations.
In conclusion we note that, though we used only a rough approximation, the most important
features of the behavior of the instanton liquid density (gluon condensate) in the medium have been
revealed. The lines of equal density are markedly extended along the µ axis because, according to
the formula (20), the most substantial gluon component of screening vanishes at small temperatures.
Typical values of T and µ at which the effects of the medium become significant are related to each
other by the formula
(Nc +Nf/2)
(T/Λ)2 ∼ Nf
(µ/Λ)2
∼ 1, which leads to a plausible coefficient
of oblongness along the µ axis
2π Tc ,
(at Nc = 3 and Nf = 2). A fall in density evaluated with allowance for the dynamically generated
quark mass should begin at a greater value and be more steep. The reason is that, at chemical
potentials less than the quark mass, the quark contribution to screening is reduced. This gives rise
to formation of a plateau and concentration of the lines of equal density. The dependence of the
dynamical quark mass on the momentum ω is significant at small temperatures leading to a decrease
of screening approximately by a factor of two [14].
We are grateful to A.E. Dorokhov for helful discussions.
This work was supported in part by grants STCU #P015c, CERN-INTAS 2000-349, NATO
2000-PST.CLG 977482.
References
[1] C.G. Callan, R. Dashen, and D.J. Gross, Phys. Lett. B66 (1977) 375;
C.G. Callan, R. Dashen, and D.J. Gross, Phys. Rev. D17 (1978) 2717.
A. Schäfer and E.V. Shuryak, Rev. Mod. Phys. 70 (1998) 323.
[2] D. I. Diakonov, V. Yu. Petrov, Nucl. Phys. B245 (1984) 259.
[3] S.V. Molodtsov, G.M. Zinovjev, hep-ph/0510015
[4] I.V. Musatov, A.N. Tavkhelidze and V.F. Tokarev, Theor. Math. Phys. 86, 20 (1991);
A.N. Tavkhelidze and V.F. Tokarev, Fiz. Elem. Chast. Atom. Yadra 21, 1126 (1990).
[5] E.V. Shuryak, Phys. Rev. D52, 5370 (1995).
[6] A.E. Dorokhov, S.V. Esaibegian, A.E. Maximov and S.V. Mikhailov,
Eur. Phys. J. C 13, 331 (2000).
[7] G.’t Hooft, Phys.Rev.D14 (1976) 3432.
[8] B.J. Harrington, H.K. Shepard, Phys. Rev. D17 (1978) 2122.
[9] D.J. Gross, R.D. Pisarski, and L.G. Yaffe, Rev. Mod. Phys. 53 (1981) 43.
[10] E.V. Shuryak, JETP 74 (1978) 408.
[11] D. I. Diakonov, A. D. Mirlin, Phys. Lett. B203 (1988) 299.
[12] I.A. Akhiezer, S.V. Peletminsky, JETP 38 (1960) 1829.
[13] B.A. Freedman, L.D. McLerran, Phys. Rev. D16 (1977) 1130, 1147, 1169.
[14] C.A. Carvalho, Nucl. Phys. B183 (1981) 182.
[15] A.A. Abrikosov (Jr), Yad. Fiz. 37 (1983) 772;
V. Baluni, Phys. Lett. B106 (1981) 491.
http://arxiv.org/abs/hep-ph/0510015
[16] E.V. Shuryak, Preprint INP, N0 82-03, 1982.
[17] M.A. Novak, J.J.M. Verbaarschot, and I. Zahed, Nucl. Phys. B325 (1989) 581.
[18] G.V. Dunne, J. Hur, Ch. Lee, H. Min, Phys. Rev. D71 (2005) 085019;
G.V. Dunne, J. Hur, Ch. Lee, H. Min, Phys. Rev. Lett. 94 (2005) 072001.
[19] E.-M. Ilgenfritz, E.V. Shuryak, Phys. Lett. B325 (1994) 263.
[20] A. Di Giacomo, E. Meggiolaro, H. Panagopoulos, Nucl. Phys. B483 (1997) 371.
[21] M. DÉlia, A. Di Giacomo and E. Meggiolaro, Phys. Rev. D67 (2003) 114504.
[22] A.E. Dorokhov, S.V. Esaibegyan, and S.V. Mikhailov, Phys. Rev. D56 (1997) 4062;
E.-M. Ilgenfritz, B.V. Martemyanov, S.V. Molodtsov, M. Müller-Preussker, and Yu.A. Simonov,
Phys. Rev. D58 (1998) 114508.
[23] E.-M. Ilgenfritz, B.V. Martemyanov, M. Müller-Preussker, Phys. Rev. D62 (2000) 096004.
ABSTRACT
  Instanton liquid in heated and strongly interacting matter is studied using
the variational principle. The dependence of the instanton liquid density
(gluon condensate) on the temperature and the quark chemical potential is
determined under the assumption that, at finite temperatures, the dominant
contribution is given by an ensemble of calorons. The respective one-loop
effective quark Lagrangian is used.

<|endoftext|><|startoftext|>
Eternal inflation and localization on the landscape
D. Podolsky1∗ and K. Enqvist1,2
1 Helsinki Institute of Physics, P.O. Box 64 (Gustaf Hällströmin katu 2), FIN-00014, University of Helsinki, Finland and
2 Department of Physical Sciences, P.O. Box 64, FIN-00014, University of Helsinki, Finland
(Dated: November 4, 2018)
We model the essential features of eternal inflation on the landscape of a dense discretuum of
vacua by the potential V (φ) = V0 + δV (φ), where |δV (φ)| ≪ V0 is random. We find that the
diffusion of the distribution function ρ(φ, t) of the inflaton expectation value in different Hubble
patches may be suppressed due to the effect analogous to the Anderson localization in disordered
quantum systems. At t → ∞ only the localized part of the distribution function ρ(φ, t) survives
which leads to dynamical selection principle on the landscape. The probability to measure any but
a small value of the cosmological constant in a given Hubble patch on the landscape is exponentially
suppressed at t → ∞.
PACS numbers: 98.80.Bp,98.80.Cq,98.80.Qc
String theory is believed to imply a wide landscape [1]
of both metastable vacua with a positive cosmological
constant and true vacua with a vanishing or a negative
cosmological constant; the latter are called anti-de Sitter
or AdS vacua, where space-time collapses into a singular-
ity. In regions with positive cosmological constant, or in
de Sitter (dS) vacua, the universe inflates, and because
of the possibility of tunneling between different de Sitter
vacua inflation is eternal.
The problem of calculating statistical distributions
of the landscape vacua is very complicated [2] and is
even considered to be NP-hard [3] (the total number
of vacua on the landscape is estimated to be of order
10100 ÷ 101000). Our aim is to consider how eternal in-
flation proceeds on the landscape by using the mere fact
that the number of vacua within the landscape is ex-
tremely large, so that their distribution can have signif-
icant disorder. The dynamics of eternal inflation is then
described by the Fokker-Planck equations in the disor-
dered effective potential.1 In that case, the landscape
dynamics may have some interesting parallels in solid
state physics, as we will discuss in the present paper.
Eternal inflation on the landscape can be modeled as
follows [5, 6]. Let us numerate vacua on the landscape
by the discrete index i and define Pi(t) as the probability
to measure a given (positive) value of the cosmological
constant Λi in a given Hubble patch. If the rates of
tunneling between the metastable minima i and j on the
landscape are given by the time independent matrix Γij ,
then the probabilities Pi satisfy the system of “vacuum
dynamics” equations [7]
Ṗi =
j 6=i
(ΓjiPj − ΓijPi)− ΓisPi. (1)
The last term in this equation corresponds to tunneling
∗On leave from Landau Institute for Theoretical Physics, 119940,
Moscow, Russia.
1 An approach somewhat similar to ours was also presented in [4].
between the metastable de Sitter vacuum i and a true
vacuum with a negative cosmological constant (an AdS
vacuum), i.e. tunneling into a collapsing AdS space-time
[8]. The collapse time tcol ∼ MP /V
is much shorter
than the characteristic time trec ∼ exp
M4P /VdS
for tun-
neling back into a de Sitter metastable vacuum, so that
the AdS true vacua effectively play the role of sinks for
the probability current (1) describing eternal inflation on
the landscape [5].
In what follows we will assume that the effect of the
AdS sinks is relatively small; otherwise the landscape will
be divided into almost unconnected “islands” of vacua
[6], preventing the population of the whole landscape by
eternal inflation.
In the limit of weak tunneling only the vacua closest
to each other are important. It is convenient to classify
parts (islands) of the landscape according to the typical
number of adjacent vacua within each part. Technically,
the landscape of vacua of the string theory can be repre-
sented as a graph with 10100÷101000 nodes and a number
of connections between them of the same order. By an
island on the landscape, we mean a subgraph relatively
weakly connected to the major ”tree”. The dimension-
ality of the island can then be defined as the Hausdorff
dimension NH of the corresponding subgraph [17]. For
example, if there are only two adjacent vacua for any
vacuum in a given island, then NH = 1 for this island
and we denote it as quasi-one-dimensional; a domain of
vacua with NH = 2 is quasi-two-dimensional, and so on.
In the quasi-one-dimensional case (neglecting the AdS
sinks) the system (1) reduces to
Ṗi = −Γi,i+1Pi+Γi+1,iPi+1−Γi,i−1Pi+Γi−1,iPi−1. (2)
While in general Γij 6= Γji, we will take 〈Γij〉 = 〈Γji〉
on the average.2 Furthermore, suppose that the initial
2 This condition is never satisfied for the Bousso-Polchinski land-
scape [9], where the adjacent vacua are those with closest values
http://arxiv.org/abs/0704.0144v3
condition for Eq. (2) is
Pi(0) = 1, Pj 6=i(0) = 0. (3)
so that the initial state is well localized. Naively, one may
expect that the distribution function Pi(t) would start
to spread out according to the usual diffusion law and
the system of vacua would exponentially quickly reach a
“thermal” equilibrium distribution of probabilities for a
given Hubble patch to be in a given dS vacuum. However,
there exists a well known theorem [10] from the theory of
diffusion on random lattices stating that the distribution
function Pi remains localized near the initial distribution
peak for a very long time, with its characteristic width
behaving as
〈i2(t)〉 ∼ log
t . (4)
This is a surprising result when applied to eternal infla-
tion where the general lore (see for example [11]) is that
the initial conditions for eternal inflation will be forgot-
ten almost immediately after its beginning. Instead, in
what follows we will argue that the memory about the
initial conditions may survive during a very long time on
the quasi-one-dimensional islands of the landscape.
We will model the landscape by a continuous inflaton
potential
V (φ) = V0 + δV (φ), (5)
where V0 is constant, and δV (φ) is a random contribution
such that |δV (φ)| ≪ V0, and φ is the inflaton or the or-
der parameter describing the transitions. As in stochas-
tic inflation [16], in different causally connected regions
fluctuations have a randomly distributed amplitude and
observers living in different Hubble patches see differ-
ent expectation values of the inflaton. When stochastic
fluctuations of the inflaton are large enough, the expec-
tation value of the inflaton in a given Hubble patch is
determined by the Langevin equation [16]
φ̇ = −
+ f(t), (6)
where the stochastic force f(φ, t) is Gaussian with corre-
lation properties
〈f(t)f(t′)〉 =
δ(t− t′). (7)
From (6) one can derive the Fokker-Planck equation,
which controls the evolution of the probability distribu-
tion ρ(φ, t) describing how the values of φ are distributed
of the effective cosmological constant. However, the spectrum
of states on Bousso-Polchinski landscape is not disordered, so
that the analysis based on averaging over disorder is not appli-
cable. Disorder appears in more realistic multithroat models of
the string theory landscape.
among different Hubble patches in the multiverse. One
finds [16]
∂ρ(φ, t)
. (8)
The general solution to Eq. (8) is given by
ρ = e
4π2δV (φ)
cnψn(φ)e
0 (t−t0)
4π2 , (9)
where ψn and En are respectively the eigenfunctions and
the eigenvalues of the effective Hamiltonian
Ĥ = −
+W (φ). (10)
W (φ) =
is a functional of the scalar field potential V (φ). It is
often denoted as the superpotential due to its “super-
symmetric” form: the Hamiltonian (10) can be rewrit-
ten as Ĥ = Q̂†Q̂, where Q̂ = −∂/∂φ + v′(φ) with
v(φ) = 4π2δV (φ)/(3H40 ).
The eigenfunctions of the Hamiltonian (10) satisfy the
Schrödinger equation
+ (En −W (φ))ψn = 0, (12)
and its solutions have the following well known features
[16]:
1. The eigenvalues of the Hamiltonian (10) are all pos-
itive definite.
2. The contributions from eigenfunctions of excited
states ψn>0(φ) to the solution Eq. (9) become ex-
ponentially quickly damped with time. However,
if one is interested in what happens at time scales
∆t . 1/En, the first n eigenfunctions should be
taken into account. In particular, if the spectrum
of the Hamiltonian (10) is very dense, as in the case
of the string theory landscape, knowing the ground
state alone is not enough for complete understand-
ing dynamics of eternal inflation.
We now recall that the potential V (φ) is a random func-
tion of the inflaton field and has extremely large number
of minima. This allows us to draw several conclusions
about the form of the eigenfunctions ψn(φ) using the for-
mal analogy between Eq. (12) and the time-independent
Schrödinger equation describing the motion of carriers
in disordered quantum systems such as semiconductors
with impurities. The physical quantities in disordered
systems can be calculated by averaging over the random
potential of the impurities.3
A famous consequence of the random potential gener-
ated by impurities in crystalline materials is the strong
suppression of the conductivity, known as Anderson lo-
calization [12, 13]. This effect is essential in dimensions
lower than 3 and completely defines the kinetics of carri-
ers in one-dimensional systems. There, impurities create
a random potential for Bloch waves with the correlation
properties
〈u(r)u(r′)〉 =
δ(r − r′), 〈u(r)〉 = 0, (13)
where τ is the mean free path for electrons and ν is the
density of states per one spin degree of freedom of the
electron gas at the Fermi surface. As a consequence, in
the one-dimensional case all eigenstates of the electron
hamiltonian become localized with
ψn(r) ∼ exp
|r − rn|
at t → ∞, where rn are the positions of localization
centers, and the localization length L is of the order of
the mean free path lτ = 〈v〉τ . As a result, the probability
density ρ(R, t) to find electron at the point R at time t
asymptotically approaches the limit ρ(R) ∼ exp(−R/L)
for R ≫ L, or ρ(R) ∼ Const for R ≪ L at t → ∞.
The one-dimensional Anderson localization takes place
for an arbitrarily weak disorder and arbitrary correlation
properties of the random potential u(r) [13].
Also, in a two-dimensional case all the electron eigen-
states in a random potential remain localized. However,
the localization length grows exponentially with energy,
the rate of growth being related to the strength of the dis-
order. In three-dimensional case, the localization prop-
erties of eigenstates are defined by the Ioffe-Regel-Mott
criterion: if the corresponding eigenvalue of the Hamil-
tonian of electrons En satisfies the condition En < Eg
where Eg is so called mobility edge, then the eigenstate
is localized. The mobility edge Eg is a function of the
strength of the disorder. In higher dimensional cases the
situation is unknown.
Let us now return to the discussion of eternal infla-
tion described by the Fokker-Planck equation (8). Since
the localization is the property of the eigenfunctions of
the time-independent hamiltonian (10), it is also a nat-
ural consequence of the effective randomness of the po-
tential of the string theory landscape.4 The diffusion
3 Observe that the typical number of these impurities varies be-
tween 1012 to 1017 per cm3 while the number of vacua on the
string theory landscape is 10100 ÷ 101000.
4 The Anderson localization on the landscape of string theory was
discussed before in [14] in the context of the Wheeler-deWitt
equation in the minisuperspace. The possibility to have the An-
derson localization on the landscape was also mentioned in [15].
of the probability distribution (4) is suppressed due to
the localization of the eigenfunctions ψn(φ) contributing
to the overall solution (9). This counteracts the general
wisdom that eternal inflation rapidly washes out any in-
formation of the initial conditions. Indeed,in the quasi-
one-dimensional case all the wave functions ψn(φ) are
localized, i.e., for a particular realization of disorder they
behave as
ψn(φ) ∼ exp
|φ− φn|
. (15)
where φn define the ”localization centers” as in the Eq.
(14), and L is the localization length which is of the same
order of magnitude as the “mean free path” related to the
strength of the disorder in the superpotential W (φ).
Let us now discuss how eternal inflation proceeds on
islands where the typical number of adjacent vacua is
larger than two. In the quasi-two-dimensional case the
network of vacua within a given island is described by a
composite index ~i = (i, j). The distribution function ρ
for finding a given value of the cosmological constant in a
given Hubble patch is a two-dimensional matrix. Again,
all the eigenstates of the corresponding tunneling hamil-
tonian Ĥ are localized. However, since the localization
length grows exponentially with energy, the distribution
function effectively spreads out almost linearly with
〈~i2(t)〉 ∼ t
1 + c1
logα t
+ · · ·
, (16)
where α > 0 are constants depending on the correlation
properties of the disorder on the landscape [18]. The
low energy eigenstates (namely, the states with E < Eg
where Eg is the mobility edge) are localized with a rela-
tively small localization length.
In the quasi-higher-dimensional cases the distribution
function spreads out according to the linear diffusion law
at intermediate times. Again, there exists a mobility edge
Eg such that the eigenstates of the tunneling Hamilto-
nian with energies E < Eg are localized. These low
energy eigenstates define the asymptotics of the distri-
bution function ρ at
t≫ E−1g . (17)
The value of the mobility edge Eg strongly depends on
the dimensionality of the island and the strength of the
disorder, and the higher is the dimensionality, the lower
is the mobility edge [17].
Localization of the low energy eigenstates in two- and
higher-dimensional cases introduces an effective dynam-
ical selection principle for different vacua on the land-
scape (5): in the asymptotic future, not all of them will
be populated, but only those near the localization centers
φn, and the probability to populate other minima will be
suppressed exponentially according to the Eq. (15).
It is interesting to note that in condensed matter sys-
tems the localization centers are typically located near
the points where the effective potential has its deepest
minima [13]. In the case of eternal inflation, it means
that the probability to measure any but very low value of
the cosmological constant in a given Hubble patch will be
exponentially suppressed in the asymptotic future [17].
Finally, we discuss the effect of sinks on the dynamics
of tunneling between the vacua. On the string theory
landscape, dS metastable vacua are typically realized by
uplifting stable AdS vacua (as, for example, in the well
known KKLT model [19]). The probability to tunnel
from the uplifted dS state i back into the AdS vacuum
is related to the value of gravitino mass m3/2 in the dS
state [8] and given by
tAdS ∼ Γ
is ∼ exp
Const.M2P
3/2,i
. (18)
The gravitino mass after uplifting [20] has the order of
magnitude m3/2,i ∼ |VAdS,i|
1/2/MP . Since at long time
scales VAdS,i can also be regarded as a random quantity,
our analysis of the general solution of “vacuum dynam-
ics” equations (1) does not have to be modified in any
essential way [17].
In addition to AdS sinks, Hubble patches where eternal
inflation has ended (stochastic fluctuations of the infla-
ton expectation value became smaller than the effect of
classical force) also effectively play a role of sinks for the
probability current described by the Eq. (8). In par-
ticular, the Hubble patch we live in is one of such sinks.
Related to the effect of sinks, there exists a time scale tend
for eternal inflation on the landscape (5) such that the
unitarity of the evolution of the probability distribution
ρ breaks down at t ≫ tend [17]. Our discussion remains
valid if t ≪ tend. It is unclear whether the probability
distribution ρ has achieved the late time asymptotics in
the corner of the landscape we live in.
In summary, we have argued that eternal inflation
on the landscape may lead to a strong localization of
the inflaton distribution function among different Hub-
ble patches. This is a consequence of the high density
of the vacua, which effectively implies a random poten-
tial for the order parameter responsible for inflation. We
found that the inflaton motion is analogous to the mo-
tion of carriers in disordered quantum systems, and there
exists an analogue of the Anderson localization for eter-
nal inflation on the landscape. Physically, this means
that not all the vacua on the landscape are populated by
eternal inflation in the asymptotic future, but only those
near the localization centers of the inflaton effective po-
tential. They are located near the deepest minima of the
potential, which implies that the probability to measure
any but very low value of the cosmological constant in
a given Hubble patch is exponentially suppressed at late
times.
Acknowledgements
The authors belong to the Marie Curie Research Train-
ing Network HPRN-CT-2006-035863. D.P. is thankful to
I. Burmistrov, N. Jokela, J. Majumder, M. Skvortsov,
K. Turitsyn ad especially to A.A. Starobinsky for the
discussions. K.E. is supported partly by the Ehrnrooth
foundation and the Academy of Finland grant 114419.
[1] L. Susskind, hep-th/0302219.
[2] M.R. Douglas, JHEP 0305 046 (2003); F. Denef and
M.R. Douglas, JHEP 0405 072 (2004).
[3] F. Denef and M.R. Douglas, hep-th/0602072.
[4] A. Aazami, R. Easther, JCAP 0603 013 (2006); R. Eas-
ther, L. McAllister, JCAP 0605 018 (2006).
[5] A. Linde, JCAP 0701 022 (2007).
[6] T. Clifton, A. Linde, N. Sivanandam, JHEP 0702 024
(2007).
[7] J. Garriga, D. Schwartz-Perlov, A. Vilenkin, and S.
Winitzki, JCAP 0601 017 (2006).
[8] A. Ceresole, G. Dall’Agata, A. Giryavets, R. Kallosh, and
A. Linde, Phys. Rev. D 74 086010 (2006).
[9] R. Bousso and J. Polchinski, JHEP 0006, 006 (2000).
[10] Ia.G. Sinai, in Proceedings of the Berlin Conference on
Mathematical Problems in Theoretical Physics, edited
by R. Schrader, R. Seiler, D.A. Ohlenbrock (Springer-
Verlag, 1982), p. 12.
[11] D.S. Goldwirth and T. Piran, Phys. Rept. 214 223
(1992); A. Linde, Phys. Lett. B 129 177 (1983); A. Linde,
Mod. Phys. Lett. A1 81 (1986).
[12] The volume of the literature regarding this subject is
extremely large. The original publications, where the
effect of Anderson localization was introduced, include
P.W. Anderson, Phys. Rev. 109 1492 (1958); N.F. Mott,
W.D. Twose, Adv. Phys. 10 107 (1961). The suppres-
sion of conductivity in one-dimensional disordered sys-
tems was originally proven by diagrammatic methods
in V. Berezinsky, Sov. Phys. JETP 38 620 (1974); A.
Abrikosov and I. Ryzhkin, Adv. Phys. 27 147 (1978); V.
Berezinsky, L. Gorkov, Sov. Phys. JETP 50 1209 (1979).
[13] K. Efetov, Supersymmetry in Disorder and Chaos (Cam-
bridge University Press, 1999).
[14] L. Mersini-Houghton, Class.Quant.Grav. 22 3481 (2005);
A. Kobakhidze, L. Mersini-Houghton, Eur.Phys.J. C49
869 (2007).
[15] S.H. Henry Tye, arXiv:hep-th/0611148.
[16] A.A. Starobinsky, in Field Theory, Quantum Gravity and
Strings, edited by H.J. de Vega and N. Sanchez (Springer-
Verlag, 1986), p. 107.
[17] D. Podolsky (in preparation); D. Podolsky, J. Majumder,
and N. Jokela (in preparation).
[18] D.S. Fisher, Phys. Rev. A 30 960 (1984).
[19] S. Kachru, R. Kallosh, A. Linde, S.P. Trivedi, Phys. Rev.
D 68 046005 (2003).
[20] R. Kallosh, A. Linde, JHEP 0412 004 (2004); J.J.
Blanco-Pillado, R. Kallosh, A. Linde, JHEP 0605 053
(2006).
http://arxiv.org/abs/hep-th/0302219
http://arxiv.org/abs/hep-th/0602072
http://arxiv.org/abs/hep-th/0611148
ABSTRACT
  We model the essential features of eternal inflation on the landscape of a
dense discretuum of vacua by the potential $V(\phi)=V_{0}+\delta V(\phi)$,
where $|\delta V(\phi)|\ll V_{0}$ is random. We find that the diffusion of the
distribution function $\rho(\phi,t)$ of the inflaton expectation value in
different Hubble patches may be suppressed due to the effect analogous to the
Anderson localization in disordered quantum systems. At $t \to \infty$ only the
localized part of the distribution function $\rho (\phi, t)$ survives which
leads to dynamical selection principle on the landscape. The probability to
measure any but a small value of the cosmological constant in a given Hubble
patch on the landscape is exponentially suppressed at $t\to \infty$.

<|endoftext|><|startoftext|>
Introduction to Modern Canonical Quantum General Relativity,
[gr-qc/0110034]; Ashtekar A and Lewandowski J, 2004, Background Independent Quantum
Gravity: A Status Report, Class. Quant. Grav., 21, R53, gr-qc/0404018;
[3] Bojowald M, 2005, Loop Quantum Cosmology, Living Rev. Relativity, 8, 11,
http://www.livingreview.org/lrr-2005-11, [gr-qc/0601085].
[4] Domagala M and Lewandowski J, 2004, Black hole entropy from Quantum Geometry, Class.
Quant. Grav., 21, 5233-5244, [gr-qc/0407051]; Krzysztof A. Meissne K A, 2004, Black hole
entropy in Loop Quantum Gravity, Class. Quant. Grav., 21, 5245-5252, [gr-qc/0407052];
[5] Ashtekar A, Bojowald M and Lewandowski J, 2003, Mathematical structure of loop quantum
cosmology, Adv. Theor. Math. Phys., 7, 233-268, [gr-qc/0304074].
[6] Ashtekar A, Pawlowski T and Singh P, 2006, Quantum Nature of the Big Bang, Phys. Rev.
Lett., 96, 141301, [gr-qc/0602086].
[7] Bojowald M, 2001, The Semiclassical Limit of Loop Quantum Cosmology, Class. Quant. Grav.,
18, L109-L116, [gr-qc/0105113].
[8] Date G and Hossain G M, 2004, Effective Hamiltonian for Isotropic Loop Quantum Cos-
mology, Class. Quant. Grav., 21, 4941-4953, [gr-qc/0407073]; Banerjee K and Date G, 2005,
Discreteness Corrections to the Effective Hamiltonian of Isotropic Loop Quantum Cosmology,
Class. Quant. Grav., 22, 2017-2033, [gr-qc/0501102].
http://arxiv.org/abs/gr-qc/0110034
http://arxiv.org/abs/gr-qc/0404018
http://www.livingreview.org/lrr-2005-11
http://arxiv.org/abs/gr-qc/0601085
http://arxiv.org/abs/gr-qc/0407051
http://arxiv.org/abs/gr-qc/0407052
http://arxiv.org/abs/gr-qc/0304074
http://arxiv.org/abs/gr-qc/0602086
http://arxiv.org/abs/gr-qc/0105113
http://arxiv.org/abs/gr-qc/0407073
http://arxiv.org/abs/gr-qc/0501102
[9] Willis J 2002, On the Low-Energy Ramifications and a Mathematical Extension of Loop
Quantum Gravity , Ph. D. Dissertation, Penn State,
http://cgpg.gravity.psu.edu/archives/thesis/index.shtml.
[10] Date G and Hossain G M, 2004, Genericness of Big Bounce in isotropic loop quantum cos-
mology, Phys. Rev. Lett., 94, 011302, [gr-qc/0407074].
[11] Vandersloot K, 2005, On the Hamiltonian Constraint of Loop Quantum Cosmology, Phys.
Rev., D 71, 103506, [gr-qc/0502082]; Perez A, 2006, On the regularization ambiguities in loop
quantum gravity, Phys. Rev., D 73, 044007, [gr-qc/0509118].
[12] Bojowald M, 2006, Loop quantum cosmology and inhomogeneities Gen. Rel. Grav., 38, 1771-
1795, [gr-qc/0609034].
[13] Ashtekar A, Pawlowski T and Singh P, 2006, Quantum Nature of the Big Bang: An Analytical
and Numerical Investigation, Phys. Rev. D, 73, 124038, [gr-qc/0604013].
[14] Bojowald M, 2006, Large scale effective theory for cosmological bounces, gr-qc/0608100.
[15] Ashtekar A, Pawlowski T and Singh P, 2006, Quantum Nature of the Big Bang: Improved
dynamics, Phys. Rev. D, 74, 084003, [gr-qc/0607039].
[16] Ashtekar A, Pawlowski T, Singh P and Vandersloot K, 2006, Loop quantum cosmology of
k=1 FRW models [gr-qc/0612104].
[17] Hossain G M, 2005, On Energy Conditions and Stability in Effective Loop Quantum Cosmol-
ogy, Class. Quant. Grav., 22, 2653, [gr-qc/0503065].
[18] Bojowald M and Kagan M, 2006, Singularities in Isotropic Non-Minimal Scalar Field Models,
Class. Quant. Grav., 23, 4983-4990, [gr-qc/0604105]; Bojowald M and Kagan M, 2006, Loop
cosmological implications of a non-minimally coupled scalar field, Phys. Rev. D, 74, 044033,
[gr-qc/0606082].
[19] Singh P, Vandersloot K and Vereshchagin G V, 2006, Non-Singular Bouncing Universes in
Loop Quantum Cosmology, Phys. Rev. D, 74, 043510 [gr-qc/0606032].
[20] Hossain G M, 2005, Primordial Density Perturbation in Effective Loop Quantum Cosmology,
Class. Quant. Grav., 22, 2511, [gr-qc/0411012].
[21] Calcagni G and Cortes M, 2007, Inflationary scalar spectrum in loop quantum cosmology,
Class. Quantum Grav., 24, 829, [gr-qc/0607059].
[22] Bojowald M and Skirzewski A, 2006, Effective Equations of Motion for Quantum Systems
Rev. Math. Phys., 18, 713-746, [math-ph/0511043].
http://cgpg.gravity.psu.edu/archives/thesis/index.shtml
http://arxiv.org/abs/gr-qc/0407074
http://arxiv.org/abs/gr-qc/0502082
http://arxiv.org/abs/gr-qc/0509118
http://arxiv.org/abs/gr-qc/0609034
http://arxiv.org/abs/gr-qc/0604013
http://arxiv.org/abs/gr-qc/0608100
http://arxiv.org/abs/gr-qc/0607039
http://arxiv.org/abs/gr-qc/0612104
http://arxiv.org/abs/gr-qc/0503065
http://arxiv.org/abs/gr-qc/0604105
http://arxiv.org/abs/gr-qc/0606082
http://arxiv.org/abs/gr-qc/0606032
http://arxiv.org/abs/gr-qc/0411012
http://arxiv.org/abs/gr-qc/0607059
http://arxiv.org/abs/math-ph/0511043
[23] Date G, 2005, Absence of the Kasner singularity in the effective dynamics from loop quantum
cosmology, Phys. Rev. D, 72, 067301 [gr-qc/0505002].
[24] Chiou D, 2006, Loop Quantum Cosmology in Bianchi Type I Models: Analytical Investigation,
[gr-qc/0609029].
[25] Bojowald M, Hernndez H. H, Kagan M, Singh P and Skirzewski A, 2006, Hamiltonian cosmo-
logical perturbation theory with loop quantum gravity corrections Phys. Rev. D, 74, 123512,
[gr-qc/0609057]; Bojowald M, Hernndez H. H, Kagan M, and Skirzewski A, 2006, Effective
constraints of loop quantum gravity, Phys. Rev. D, 74, [gr-qc/0611112]; Bojowald M, Hern-
ndez H. H, Kagan M, Singh P and Skirzewski A, 2006, Formation and Evolution of Structure
in Loop Cosmology Phys. Rev. Lett., 98, 031301, [astro-ph/0611685].
http://arxiv.org/abs/gr-qc/0505002
http://arxiv.org/abs/gr-qc/0609029
http://arxiv.org/abs/gr-qc/0609057
http://arxiv.org/abs/gr-qc/0611112
http://arxiv.org/abs/astro-ph/0611685
	Cosmology, quantum cosmology, loop quantum cosmology
	Summary of pre 2005 LQC
	Post 2004 Isotropic LQC
	Physical quantities and Singularity Resolution
	Improved Quantization
	Close Isotropic Model
	Open Issues and Out look
	References
ABSTRACT
  Since the past Iagrg meeting in December 2004, new developments in loop
quantum cosmology have taken place, especially with regards to the resolution
of the Big Bang singularity in the isotropic models. The singularity resolution
issue has been discussed in terms of physical quantities (expectation values of
Dirac observables) and there is also an ``improved'' quantization of the
Hamiltonian constraint. These developments are briefly discussed.
  This is an expanded version of the review talk given at the
24$^{\mathrm{th}}$ IAGRG meeting in February 2007.

<|endoftext|><|startoftext|>
arXiv:0704.0146v1  [cond-mat.other]  2 Apr 2007
Vortices in Bose-Einstein Condensates: Theory
N. G. Parker1, B. Jackson2, A. M. Martin1, and C. S. Adams3
1 School of Physics, University of Melbourne, Parkville, Victoria 3010, Australia.
ngparker@ph.unimelb.edu.au,amm@ph.unimelb.edu.au
2 School of Mathematics and Statistics, Newcastle University, Newcastle upon
Tyne, NE1 7RU, United Kingdom. brian.jackson@newcastle.ac.uk
3 Department of Physics, Durham University, South Road, Durham, DH1 3LE,
United Kingdom. c.s.adams@durham.ac.uk
1 Quantized vortices
Vortices are pervasive in nature, representing the breakdown of laminar fluid
flow and hence playing a key role in turbulence. The fluid rotation associated
with a vortex can be parameterized by the circulation Γ =
dr · v(r) about
the vortex, where v(r) is the fluid velocity field. While classical vortices can
take any value of circulation, superfluids are irrotational, and any rotation
or angular momentum is constrained to occur through vortices with quan-
tized circulation. Quantized vortices also play a key role in the dissipation of
transport in superfluids. In BECs quantized vortices have been observed in
several forms, including single vortices [1, 2], vortex lattices [3, 4, 5, 6] (see
also Chap. VII), and vortex pairs and rings [7, 8, 9]. The recent observation
of quantized vortices in a fermionic gas was taken as a clear signature of the
underlying condensation and superfluidity of fermion pairs [10]. In addition to
BECs, quantized vortices also occur in superfluid Helium [11, 12], nonlinear
optics, and type-II superconductors [13].
1.1 Theoretical Framework
Quantization of circulation
Quantized vortices represent phase defects in the superfluid topology of the
system. Under the Madelung transformation, the macroscopic condensate
‘wavefunction’ ψ(r, t) can be expressed in terms of a fluid density n(r, t) and a
macroscopic phase S(r, t) via ψ(r) =
n(r, t) exp[iS(r, t)]. In order that the
wavefunction remains single-valued, the change in phase around any closed
contour C must be an integer multiple of 2π,
∇S · dl = 2πq, (1)
http://arXiv.org/abs/0704.0146v1
2 N. G. Parker, B. Jackson, A. M. Martin, and C. S. Adams
where q is an integer. The gradient of the phase S defines the superfluid
velocity via v(r, t) = (h̄/m)∇S(r, t). This implies that the circulation about
the contour C is given by,
v · dl = q
. (2)
In other words, the circulation of fluid is quantized in units of (h/m). The
circulating fluid velocity about a vortex is given by v(r, θ) = qh̄/(mr)θ̂, where
r is the radius from the core and θ̂ is the azimuthal unit vector.
Theoretical model
The Gross-Pitaevskii equation (GPE) provides an excellent description of
BECs at the mean-field level in the limit of ultra-cold temperature [14]. It
supports quantized vortices, and has been shown to give a good description of
the static properties and dynamics of vortices [14, 15]. Dilute BECs require a
confining potential, formed by magnetic or optical fields, which typically varies
quadratically with position. We will assume an axially-symmetric harmonic
trap of the form V = 1
m(ω2rr
2 + ω2zz
2), where ωr and ωz are the radial and
axial trap frequencies respectively. Excitation spectra of BEC states can be
obtained using the Bogoliubov equations, and specify the stability of station-
ary solutions of the GPE. For example, the presence of the so-called anomalous
modes of a vortex in a trapped BEC are indicative of their thermodynamic
instability. The GPE can also give a qualitative, and sometimes quantitative,
understanding of vortices in superfluid Helium [11, 12].
Although this Chapter deals primarily with vortices in repulsively-inte-
racting BECs, vortices in attractively-interacting BECs have also received
theoretical interest. The presence of a vortex in a trapped BEC with attractive
interactions is less energetically favorable than for repulsive interactions [16].
Indeed, a harmonically-confined attractive BEC with angular momentum is
expected to exhibit a center-of-mass motion rather than a vortex [17]. The
use of anharmonic confinement can however support metastable vortices, as
well as regimes of center-of-mass motion and instability [18, 19, 20].
Various approximations have been made to incorporate thermal effects
into the GPE to describe vortices at finite temperature (see also Chap. XI).
The Popov approximation self-consistently couples the condensate to a normal
gas component using the Bogoliubov-de-Gennes formalism [21] (cf. Chap. I
Sec. 5.2). Other approaches involve the addition of thermal/quantum noise to
the system, such as the stochastic GPE method [22, 23, 24] and the classical
field/truncated Wigner methods [25, 26, 27, 28]. Thermal effects can also be
simulated by adding a phenomenological dissipation term to the GPE [29].
Basic properties of vortices
In a homogeneous system, a quantized vortex has the 2D form,
Vortices in Bose-Einstein Condensates: Theory 3
ψ(r, θ) =
nv(r) exp(iqθ). (3)
The vortex density profile nv(r) has no analytic solution, although approx-
imate solutions exist [30]. Vortex solutions can be obtained numerically by
propagating the GPE in imaginary time (t→ −it) [31], whereby the GPE con-
verges to the lowest energy state of the system (providing it is stable). By en-
forcing the phase distribution of Eq. (3), a vortex solution is generated. Figure
1 shows the solution for a q = 1 vortex at the center of a harmonically-confined
BEC. The vortex consists of a node of zero density with a width characterized
by the condensate healing length ξ = h̄/
mn0g, where g = 4πh̄
2a/m (with a
the s-wave scattering length) and n0 is the peak density in the absence of the
vortex. For typical BEC parameters [3], ξ ∼ 0.2 µm. For a q = 1 vortex at
the center of an axially-symmetric potential, each particle carries h̄ of angular
momentum. However, if the vortex is off-center, the angular momentum per
particle becomes a function of position [15].
1.2 Vortex structures
Increasing the vortex charge widens the core due to centrifugal effects. In
harmonically-confined condensates a multiply-quantized vortex with q > 1 is
energetically unfavorable compared to a configuration of singly-charged vor-
tices [32, 33]. Hence, a rotating BEC generally contains an array of singly-
charged vortices in the form of a triangular Abrikosov lattice [3, 4, 5, 6, 34]
(see also Chap. VII), similar to those found in rotating superfluid helium
[11]. A q > 1 vortex can decay by splitting into singly-quantized vortices via
a dynamical instability [35, 36], but is stable for some interaction strengths
[37]. Multiply-charged vortices are also predicted to be stabilized by a suitable
localized pinning potential [38] or the addition of quartic confinement [33].
Two-dimensional vortex-antivortex pairs (i.e. two vortices with equal but
opposite circulation) and 3D vortex rings arise in the dissipation of superflow,
and represent solutions to the homogeneous GPE in the moving frame [39, 40],
with their motion being self-induced by the velocity field of the vortex lines.
When the vortex lines are so close that they begin to overlap, these states are
no longer stable and evolves into a rarefaction pulse [39].
Having more than one spin component in the BECs (cf. Chap. IX) pro-
vides an additional topology to vortex structures. Coreless vortices and vortex
‘molecules’ in coupled two-component BECs have been probed experimentally
[41] and theoretically [42]. More exotic vortex structures such as skyrmion ex-
citations [43] and half-quantum vortex rings [44] have also been proposed.
2 Nucleation of vortices
Vortices can be generated by rotation, a moving obstacle, or phase imprinting
methods. Below we discuss each method in turn.
4 N. G. Parker, B. Jackson, A. M. Martin, and C. S. Adams
2.1 Rotation
As discussed in the previous section, a BEC can only rotate through the
existence of quantized vortex lines. Vortex nucleation occurs only when the
rotation frequency Ω of the container exceeds a critical value Ωc [15, 32, 46].
Consider a condensate in an axially-symmetric trap which is rotating about
the z-axis at frequency Ω. In the Thomas-Fermi limit, the presence of a vortex
becomes energetically favorable when Ω exceeds a critical value given by [47],
0.67R
. (4)
This is derived by integrating the kinetic energy density mn(r)v(r)2/2 of the
vortex velocity field in the radial plane. The lower and upper limits of the
integration are set by the healing length ξ and the BEC Thomas-Fermi radius
R, respectively. Note that Ωc < ωr for repulsive interactions, while Ωc > ωr
for attractive interactions [16]. In a non-rotating BEC the presence of a vortex
raises the energy of the system, indicating thermodynamic instability [48].
In experiments, vortices are formed only when the trap is rotated at a
much higher frequency than Ωc [3, 4, 5], demonstrating that the energetic
criterion is a necessary, but not sufficient, condition for vortex nucleation.
There must also be a dynamic route for vorticity to be introduced into the
condensate, and hence Eq. (4) provides only a lower bound for the critical
frequency.
The nucleation of vortices in rotating trapped BECs appears to be linked to
instabilities of collective excitations. Numerical simulations based on the GPE
have shown that once the amplitude of these excitations become sufficiently
large, vortices are nucleated that subsequently penetrate the high-density bulk
of the condensate [23, 27, 29, 49, 50].
One way to induce instability is to resonantly excite a surface mode by
adding a rotating deformation to the trap potential. In the limit of small
perturbations, this resonance occurs close to a rotation frequency Ωr = ωℓ/ℓ,
where ωℓ is the frequency of a surface mode with multipolarity ℓ. In the
Thomas-Fermi limit, the surface modes satisfy ωℓ =
ℓωr [51], so Ωr =
ℓ. For example, an elliptically-deformed trap, which excites the ℓ = 2
quadrupole mode, would nucleate vortices when rotated at Ωr ≈ ωr/
This value has been confirmed in both experiments [3, 4, 5] and numerical
simulations [23, 27, 29, 49, 50]. Higher multipolarities were resonantly excited
in the experiment of Ref. [6], finding vortex formation at frequencies close to
the expected values, Ω = ωr/
ℓ, and lending further support to this picture.
A similar route to vortex nucleation is revealed by considering stationary
states of the BEC in a rotating elliptical trap, which can be obtained in the
Thomas-Fermi limit by solving hydrodynamic equations [52]. At low rotation
rates only one solution is found; however at higher rotations (Ω > ωr/
bifurcation occurs and up to three solutions are present. Above the bifurcation
point one or more of the solutions become dynamically unstable [53], leading
Vortices in Bose-Einstein Condensates: Theory 5
to vortex formation [54]. Madison et al. [55] followed these stationary states
experimentally by adiabatically introducing trap ellipticity and rotation, and
observed vortex nucleation in the expected region.
Surface mode instabilities can also be induced at finite temperature by
the presence of a rotating noncondensed “thermal” cloud. Such instabilities
occur when the thermal cloud rotation rate satisfies Ω > ωℓ/ℓ [56]. Since all
modes can potentially be excited in this way, the criterion for instability and
hence vortex nucleation becomes Ωc > min(ωℓ/ℓ), analogous to the Landau
criterion. Note that such a minimum exists at Ωc > 0 since the Thomas-Fermi
result ωℓ =
ℓωr becomes less accurate for high ℓ [57]. This mechanism may
have been important in the experiment of Haljan et al. [34], where a vortex
lattice was formed by cooling a rotating thermal cloud to below Tc.
2.2 Nucleation by a moving object
Vortices can also be nucleated in BECs by a moving localized potential. This
problem was originally studied using the GPE for 2D uniform condensate flow
around a circular hard-walled potential [58, 59], with vortex-antivortex pairs
being nucleated when the flow velocity exceeded a critical value.
In trapped BECs a similar situation can be realized using the optical dipole
force from a laser, giving rise to a localized repulsive Gaussian potential. Under
linear motion of such a potential, numerical simulations revealed vortex pair
formation when the potential is moved at a velocity above a critical value [60].
The experiments of [61, 62] oscillated a repulsive laser beam in an elongated
condensate. Although vortices were not observed directly, the measurement
of condensate heating and drag above a critical velocity was consistent with
the nucleation of vortices [63].
An alternative approach is to move the laser beam potential in a circular
path around the trap center [64]. By “stirring” the condensate in this way one
or more vortices can be created. This technique was used in the experiment
of Ref. [6], where vortices were generated even at low stirring frequencies.
2.3 Other mechanisms and structures
A variety of other schemes for vortex creation have been suggested. One of
the most important is that by Williams and Holland [65], who proposed a
combination of rotation and coupling between two hyperfine levels to create
a two-component condensate, one of which is in a vortex state. The non-
vortex component can then either be retained or removed with a resonant
laser pulse. This scheme was used by the first experiment to obtain vortices
in BEC [1]. A related method, using topological phase imprinting, has been
used to experimentally generate multiply-quantized vortices [66].
Apart from the vortex lines considered so far, vortex rings have also been
the subject of interest. Rings are the decay product of dynamically unstable
dark solitary waves in 3D geometries [7, 8, 67, 68]. Vortex rings also form
6 N. G. Parker, B. Jackson, A. M. Martin, and C. S. Adams
in the quantum reflection of BECs from surface potentials [69], the unstable
motion of BECs through an optical lattice [70], the dragging of a 3D object
through a BEC [71], and the collapse of ultrasound bubbles in BECs [72].
The controlled generation of vortex rings [73] and multiple/bound vortex ring
structures [74] have been analyzed theoretically.
A finite temperature state of a quasi-2D BEC, characterized by the ther-
mal activation of vortex-antivortex pairs, has been simulated using classical
field simulations [75]. This effect is thought to be linked to the Berezinskii-
Kosterlitz-Thouless phase transition of 2D superfluids, recently observed ex-
perimentally in ultracold gases [76]. Similar simulations in a 3D system have
also demonstrated the thermal creation of vortices [77, 78].
3 Dynamics of vortices
The study of vortex dynamics has long been an important topic in both clas-
sical [79] and quantum [12] hydrodynamics. Helmholtz’s theorem for uniform,
inviscid fluids, which is also applicable to quantized vortices in superfluids
near zero temperature, states that the vortex will follow the motion of the
background fluid. So, for example, in a superfluid with uniform flow velocity
vs, a single straight vortex line will move with velocity vL, such that it is
stationary in the frame of the superfluid.
Vortices similarly follow the “background flow” originating from circulat-
ing fluid around a vortex core. Hence vortex motion can be induced by the
presence of other vortices, or by other parts of the same vortex line when it is
curved. Most generally, the superfluid velocity vi due to vortices at a partic-
ular point r is given by the Biot-Savart law [12], in analogy with the similar
equation in electromagnetism,
(s − r) × ds
|s− r|3
; (5)
where s(ζ, t) is a curve representing the vortex line with ζ the arc length.
Equation (5) suffers from a divergence at r = s, so in calculations of vortex
dynamics this must be treated carefully [80]. Equation (5) also assumes that
the vortex core size is small compared to the distance between vortices. In
particular, it breaks down when vortices cross during collisions, where recon-
nection events can occur. These reconnections can either be included manually
[81], or by solving the full GPE [82]. The latter method also has the advantage
of including sound emission due to vortex motion or reconnections [83, 84].
In a system with multiple vortices, motion of one vortex is induced by the
circulating fluid flow around other vortices, and vice-versa [11]. This means
that, for example, a pair of vortices of equal but opposite charge will move
linearly and parallel to each other with a velocity inversely proportional to
the distance between them. Two or more vortices of equal charge, meanwhile,
Vortices in Bose-Einstein Condensates: Theory 7
will rotate around each other, giving rise to a rotating vortex lattice as will be
discussed in Chap. VII. When a vortex line is curved, circulating fluid from
one part of the line can induce motion in another. This effect can give rise to
helical waves on the vortex, known as Kelvin modes [85]. It also has interesting
consequences for a vortex ring, which will travel in a direction perpendicular
to the plane of the ring, with a self-induced velocity that decreases with in-
creasing radius. Classically, this is most familiar in the motion of smoke rings,
though similar behavior has also been observed in superfluid helium [86].
This simple picture is complicated in the presence of density inhomo-
geneities or confining walls. In a harmonically-trapped BEC the density is
a function of position, and therefore the energy, E, of a vortex will also de-
pend on its position within the condensate. To simplify matters, let us con-
sider a quasi-2D situation, where the condensate is pancake-shaped and the
vortex line is straight. In this case, the energy of the vortex depends on its
displacement r from the condensate center [87], and a displaced vortex feels a
force proportional to ∇E. This is equivalent to a Magnus force on the vortex
[88, 89, 90] and to compensate the vortex moves in a direction perpendicular
to the force, leading it to precess around the center of the condensate along a
line of constant energy. This precession of a single vortex has been observed
experimentally [2], with a frequency in agreement with theoretical predictions.
In more 3D situations, such as spherical or cigar-shaped condensates, the vor-
tex can bend [91, 92, 93, 94] leading to more complicated motion [15]. Kelvin
modes [95, 96] and vortex ring dynamics [88] are also modified by the density
inhomogeneity in the trap.
In the presence of a hard-wall potential, a new constraint is imposed such
that the fluid velocity normal to the wall must be zero, vs ·n̂ = 0. The resulting
problem of vortex motion is usually solved mathematically [79] by invoking
an “image vortex” on the other side of the wall (i.e. in the region where there
is no fluid present), at a position such that its normal flow cancels that of the
real vortex at the barrier. The motion of the real vortex is then simply equal
to the induced velocity from the image vortex circulation.
4 Stability of vortices
4.1 Thermal instabilities
At finite temperatures the above discussion is modified by the thermal oc-
cupation of excited modes of the system, which gives rise to a noncondensed
normal fluid in addition to the superfluid. A vortex core moving relative to the
normal fluid scatters thermal excitations, and will therefore feel a frictional
force leading to dissipation. This mutual friction force can be written as [11],
fD = −nsΓ{αs′ × [ s′ × (vn − vL)] + α′s′ × (vn − vL)}, (6)
where ns is the background superfluid density, s
′ is the derivative of s with
respect to arc length ζ, α and α′ are temperature dependent parameters,
8 N. G. Parker, B. Jackson, A. M. Martin, and C. S. Adams
while vL and vn are the velocities of the vortex line and normal fluid respec-
tively. The mutual friction therefore has two components perpendicular to the
relative velocity vn − vL.
To consider an example discussed in the last section, an off-center vortex in
a trapped BEC at zero temperature will precess such that its energy remains
constant. In the presence of a non-condensed component, however, dissipation
will lead to a loss of energy. Since the vortex is topological it cannot simply
vanish, so this lost energy is manifested as a radial drift of the vortex towards
lower densities. In Eq. (6) the α term is responsible for this radial motion,
while α′ changes the precession frequency. The vortex disappears at the edge
of the condensate, where it is thought to decay into elementary excitations
[97]. Calculations based upon the stochastic GPE have shown that thermal
fluctuations lead to an uncertainty in the position of the vortex, such that
even a central vortex will experience thermal dissipation and have a finite
lifetime [24]. This thermodynamic lifetime is predicted to be of the order of
seconds [97], which is consistent with experiments [1, 3, 94].
4.2 Hydrodynamic instabilities
Experiments indicate that the crystallization of vortex lattices is temperature-
independent [5, 98]. Similarly, vortex tangles in turbulent states of superfluid
Helium have been observed to decay at ultracold temperature, where thermal
dissipation is virtually nonexistent [99]. These results highlight the occurrence
of zero temperature dissipation mechanisms, as listed below.
Instability to acceleration
The topology of a 2D homogeneous superfluid can be mapped on to a (2+1)D
electrodynamic system, with vortices and phonons playing the role of charges
and photons respectively [100]. Just as an accelerating electron radiates ac-
cording to the Larmor acceleration squared law, a superfluid vortex is inher-
ently unstable to acceleration and radiates sound waves.
Vortex acceleration can be induced by the presence of an inhomogeneous
background density, such as in a trapped BEC. Sound emission from a vortex
in a BEC can be probed by considering a trap of the form [45],
Vext = V0
1 − exp
mω2rr
2. (7)
This consists of a gaussian dimple trap with depth V0 and harmonic frequency
component ωd, embedded in an ambient harmonic trap of frequency ωr. A 2D
description is sufficient to describe this effect. This set-up can be realized with
a quasi-2D BEC by focussing a far-off-resonant red-detuned laser beam in the
center of a magnetic trap. The vortex is initially confined in the inner region,
where it precesses due to the inhomogeneous density. Since sound excitations
Vortices in Bose-Einstein Condensates: Theory 9
Fig. 1. Profile of a singly-quantized (q = 1) vortex at the center of a harmonically-
confined BEC: (a) condensate density along the y = 0 axis (solid line) and the
corresponding density profile in the absence of the vortex (dashed line). (b) 2D
density and (c) phase profile of the vortex state. These profiles are calculated nu-
merically by propagating the 2D GPE in imaginary time subject to an azimuthal
2π phase variation around the trap center.
−6 −4 −2 0 2 4 6
x (ξ)
(i) (ii)
Fig. 2. Vortex path in the dimple trap geometry of Eq. (7) with ωd = 0.28(c/ξ).
Deep V0 = 10µ dimple (dotted line): mean radius is constant, but modulated by the
sound field. Shallow V0 = 0.6µ dimple and homogeneous outer region ωr = 0 (dot-
ted line): vortex spirals outwards. Outer plots: Sound excitations (with amplitude
∼ 0.01n0) radiated in the V0 = 0.6µ system at times indicated. Top: Far-field distri-
bution [−90, 90]ξ×[−90, 90]ξ. Bottom: Near-field distribution [−25, 25]ξ×[−25, 25]ξ,
with an illustration of the dipolar radiation pattern. Copyright (2004) by the Amer-
ican Physical Society [45].
have an energy of the order of the chemical potential µ, the depth of the dimple
relative to µ leads to two distinct regimes of vortex-sound interactions.
V0 ≫ µ: The vortex effectively sees an infinite harmonic trap - it precesses
and radiates sound but there is no net decay due to complete sound reabsorp-
tion. However, a collective mode of the background fluid is excited, inducing
slight modulations in the vortex path (dotted line in Fig 2).
V0 < µ: Sound waves are radiated by the precessing vortex. Assuming ωr =
0, the sound waves propagate to infinity without reinteracting with the vortex.
10 N. G. Parker, B. Jackson, A. M. Martin, and C. S. Adams
The ensuing decay causes the vortex to drift to lower densities, resulting in a
spiral motion (solid line in Fig. 2), similar to the effect of thermal dissipation.
The sound waves are emitted in a dipolar radiation pattern, perpendicularly
to the instantaneous direction of motion (subplots in Fig. 2), with a typical
amplitude of order 0.01n0 and wavelength λ ∼ 2πc/ωV [15], where c is the
speed of sound and ωV is the vortex precession frequency. The power radiated
from a vortex can be expressed in the form [45, 101, 102],
P = βmN
, (8)
where a is the vortex acceleration, N is the total number of atoms, and β is
a dimensionless coefficient. Using classical hydrodynamics [101] and by map-
ping the superfluid hydrodynamic equations onto Maxwell’s electrodynamic
equations [102], it has been predicted that β = π2/2 under the assumptions
of a homogeneous 2D fluid, a point vortex, and perfect circular motion. Full
numerical simulations of the GPE based on a realistic experimental scenario
have derived a coefficient of β ∼ 6.3 ± 0.9 (one standard deviation), with the
variation due to a weak dependence on the geometry of the system [45].
When ωr 6= 0, the sound eventually reinteracts with the vortex, slowing but
not preventing the vortex decay. By varying V0 it is possible to control vortex
decay, and in suitably engineered traps this decay mechanism is expected to
dominate over thermal dissipation [45].
Vortex acceleration (and sound emission) can also be induced by the pres-
ence of other vortices. A co-rotating pair of two vortices of equal charge has
been shown to decay continuously via quadrupolar sound emission, both an-
alytically [103] and numerically [104]. Three-body vortex interactions in the
form of a vortex-antivortex pair incident on a single vortex have also been sim-
ulated numerically, with the interaction inducing acceleration in the vortices
with an associated emission of sound waves [104].
Simulations of vortex lattice formation in a rotating elliptical trap show
that vortices are initially nucleated in a turbulent disordered state, before
relaxing into an ordered lattice [50]. This relaxation process is associated
with an exchange of energy from the sound field to the vortices due to these
vortex-sound interactions. This agrees with the experimental observation that
vortex lattice formation is insensitive to temperature [5, 98].
Kelvin wave radiation and vortex reconnections
In 3D a Kelvin wave excitation will induce acceleration in the elements of
the vortex line, and therefore local sound emission. Indeed, simulations of
the GPE in 3D have shown that Kelvin waves excitations on a vortex ring
lead to a decrease in the ring size, indicating the underlying radiation process
[84]. Kelvin wave excitations can be generated from a vortex line reconnection
[83, 84] and the interaction of a vortex with a rarefaction pulse [105].
Vortices in Bose-Einstein Condensates: Theory 11
Vortex lines which cross each other can undergo dislocations and reconnec-
tions [106], which induce a considerable burst of sound emission [83]. Although
they have yet to be probed experimentally in BECs, vortex reconnections are
hence thought to play a key role in the dissipation of vortex tangles in Helium
II at ultra-low temperatures [11].
5 Dipolar BECs
A BEC has recently been formed of chromium atoms [107], which feature a
large dipole moment. This opens the door to studying of the effect of long-
range dipolar interactions in BECs.
5.1 The Modified Gross-Pitaevskii Equation
The interaction potential Udd(r) between two dipoles separated by r, and
aligned by an external field along the unit vector ê is given by,
Udd(r) =
êiêj
(δij − 3r̂ir̂j)
. (9)
For low energy scattering of two atoms with dipoles induced by a static electric
field E = Eê, the coupling constant Cdd = E
2α2/ǫ0 [108, 109], where α is the
static dipole polarizability of the atoms and ǫ0 is the permittivity of free space.
Alternatively, if the atoms have permanent magnetic dipoles, dm, aligned in
an external magnetic field B = Bê, one has Cdd = µ0d
m [110], where µ0 is the
permeability of free space. Such dipolar interactions give rise to a mean-field
potential
Φdd(r) =
d3rUdd (r − r′) |ψ (r′) |2, (10)
which can be incorporated into the GPE to give,
ih̄ψt =
∇2 + g|ψ|2 + Φdd + V
ψ. (11)
For an axially-symmetric quasi-2D geometry (ωz ≫ ωr) rotating about the
z -axis, the ground state wavefunction of a single vortex has been solved numer-
ically [111]. Considering 105 chromium atoms and ωr = 2π × 100Hz, several
solutions were obtained depending on the strength of the s-wave interactions
and the alignment of the dipoles relative to the trap.
For the case of axially-polarized dipoles the most striking results arise
for attractive s-wave interactions g < 0. Here the BEC density is axially
symmetric and oscillates in the vicinity of the vortex core. Similar density
oscillations have been observed in numerical studies of other non-local inter-
action potentials, employed to investigate the interparticle interactions in 4He
12 N. G. Parker, B. Jackson, A. M. Martin, and C. S. Adams
[112, 113, 114, 115], with an interpretation that relates to the roton structure
in a superfluid [115]. For the case of transversely-polarized dipoles, where the
polarizing field is co-rotating with the BEC, and repulsive s-wave interactions
(g > 0), the BEC becomes elongated along the axis of polarization [116] and
as a consequence the vortex core is anisotropic.
5.2 Vortex Energy
Assuming a dipolar BEC in the TF limit (cf. Sec. 5.1 in Chap. I), the en-
ergetic cost of a vortex, aligned along the axis of polarization (z-axis), has
been derived using a variational ansatz for the vortex core [117], and thereby
the critical rotation frequency Ωc at which the presence of a vortex becomes
energetically favorable has been calculated. For an oblate trap (ωr < ωz),
dipolar interactions decrease Ωc, while for prolate traps (ωr > ωz) the pres-
ence of dipolar interactions increases Ωc. A formula resembling Eq. (4) for
the critical frequency of a conventional BEC can be used to explain these
results, with R being the modified TF radius of the dipolar BEC. Indeed,
using the TF radius of a vortex-free dipolar BEC [118, 119] and the conven-
tional s-wave healing length ξ, it was found that Eq. (4) closely matches the
results from the energy cost calculation. Deviations become significant when
the dipolar interactions dominate over s-wave interactions. In this regime the
s-wave healing length ξ is no longer the relevant length scale of the system,
and the equivalent dipolar length scale ξd = Cddm/(12πh̄
) will characterize
the vortex core size.
For g > 0 and in the absence of dipolar interactions, the rotation frequency
at which the vortex-free BEC becomes dynamically unstable, Ωdyn, is always
greater than the critical frequency for vortex stabilization Ωc. However in
the presence of dipolar interactions, Ωdyn can become less than Ωc, leading
to an intriguing regime in which the dipolar BEC is dynamically unstable
but vortices will not enter [117, 120]. As with attractive condensates [17], the
angular momentum may then be manifested as center of mass oscillations.
6 Analogs of Gravitational Physics in BECs
There is growing interest in pursuing analogs of gravitational physics in con-
densed matter systems [121], such as BECs. The rationale behind such models
can be traced back to the work of Unruh [122, 123], who noted the analogy
between sound propagation in an inhomogeneous background flow and field
propagation in curved space-time. This link applies in the TF limit of BECs
where the speed of sound is directly analogous to the speed of light in the
corresponding gravitational system [124]. This has led to proposals for exper-
iments to probe effects such as Hawking radiation [125, 126] and superradiance
[127]. For Hawking radiation it is preferable to avoid the generation of vortices
[121, 128], and as such will not be discussed here. However, the phenomena
Vortices in Bose-Einstein Condensates: Theory 13
of superradiance in BECs, which can be considered as stimulated Hawking
radiation, relies on the presence of a vortex [129, 130, 131, 132], which is
analogous to a rotating black hole.
Below we outline the derivation of how the propagation of sound in a BEC
can be considered to be analogous to field propagation [121]. From the GPE
it is possible to derive the continuity equation for an irrotational fluid flow
with phase S(r, t) and density n(r, t), and a Hamilton-Jacobi equation whose
gradient leads to the Euler equation. Linearizing these equations with respect
to the background it is found that
′ = − 1
∇S · ∇S′ − gn′ + h̄
, (12)
′ = − 1
∇ · (n∇S′) − 1
∇ · (n′∇S) , (13)
where n′ and S′ are the perturbed values of the density n and phase S respec-
tively. Neglecting the quantum pressure ∇2-terms, the above equations can
be rewritten as a covariant differential equation describing the propagation of
phase oscillations in a BEC. This is directly analogous to the propagation of
a minimally coupled massless scalar field in an effective Lorentzian geometry
which is determined by the background velocity, density and speed of sound in
the BEC. Hence, the propagation of sound in a BEC can be used as an analogy
for the propagation of electromagnetic fields in the corresponding space-time.
Of course one has to be aware that this direct analogy is only valid in the TF
regime, which breaks down on scales of the order of a healing length, i.e. the
theory is only valid on large length scales, as is general relativity.
6.1 Superradiance
Superradiance in BECs relies on sound waves incident on a vortex structure
and is characterized by the reflected sound energy exceeding the incident
energy. This has been studied using Eqs. (12) and (13) for monochromatic
sound waves of frequency ωs and angular wave number qs incident upon a
vortex [129] and a ‘draining vortex” (a vortex with outcoupling at its center)
[130, 131, 132].
For the vortex case, a vortex velocity field v(r, θ) = (β/r)θ̂ and a density
profile ansatz was assumed. Superradiance then occurs when βqs > Ac∞,
where A is related to the vortex density ansatz and c∞ is the speed of sound
at infinity [129]. Interestingly, this condition is frequency independent.
For the case of a draining vortex, an event horizon occurs at a distance
a from the vortex core, where the fluid circulates at frequency Ω. Assuming
a homogeneous density n and a velocity profile v(r, θ) =
−car̂ +Ωa2θ̂
where c is the homogeneous speed of sound, superradiance occurs when 0 <
ωs < qsΩ [130, 131, 132].
The increase in energy of the outgoing sound is due to an extraction of
energy from the vortex and as such it is expected to lead to slowing of the
14 N. G. Parker, B. Jackson, A. M. Martin, and C. S. Adams
vortex rotation. However, such models do not include quantized vortex angular
momentum, and as such it is expected that superradiance will be suppressed
[132]. This raises tantalizing questions, such as whether superradiance can
occur if vorticity is quantized, if such effects can be modeled with the GPE,
and whether the study of quantum effects in condensate superradiance will
shed light on quantum effects in general relativity.
References
1. M. R. Matthews, B. P. Anderson, P. C. Haljan, D. S. Hall, C. E. Wieman, and
E. A. Cornell, Phys. Rev. Lett. 83, 2498 (1999).
2. B. P. Anderson, P. C. Haljan, C. E. Wieman, and E. A. Cornell, Phys. Rev.
Lett. 85, 2857 (2000).
3. K. W. Madison, F. Chevy, W. Wohlleben, and J. Dalibard, Phys. Rev. Lett.
84, 806 (2000).
4. J. R. Abo-Shaeer, C. Raman, J. M. Vogels, and W. Ketterle, Science 292, 476
(2001).
5. E. Hodby, C. Hechenblaikner, S. A. Hopkins, O. M. Maragò, and C. J. Foot,
Phys. Rev. Lett. 88, 010405 (2002).
6. C. Raman, J. R. Abo-Shaeer, J. M. Vogels, K. Xu, and W. Ketterle, Phys. Rev.
Lett. 87, 210402 (2001).
7. B. P. Anderson, P. C. Haljan, C. A. Regal, D. L. Feder, L. A. Collins, C. W.
Clark, and E. A. Cornell, Phys. Rev. Lett. 86, 2926 (2001).
8. Z. Dutton, M. Budde, C. Slowe, and L. V. Hau, Science 293, 663 (2001).
9. S. Inouye, S. Gupta, T. Rosenband, A. P. Chikkatur, A. Görlitz, T. L. Gus-
tavson, A. E. Leanhardt, D. E. Pritchard, and W. Ketterle, Phys. Rev. Lett.
87, 080402 (2001).
10. M. W. Zwierlein, J. R. Abo-Shaeer, A. Schirotzek, C. H. Schunck, and W.
Ketterle, Nature 435, 1047 (2005).
11. R. J. Donnelly: Quantized vortices in Helium II (Cambridge University Press,
Cambridge, 1991).
12. C. F. Barenghi, R. J. Donnelly, and W. F. Vinen (Eds.): Quantized Vortex
Dynamics and Superfluid Turbulence (Springer Verlag, Berlin, 2001).
13. D. R. Tilley and J. Tilley: Superfluidity and Superconductivity (IOP, Bristol,
1990).
14. F. Dalfovo, S. Giorgini, L. P. Pitaevskii, and S. Stringari, Rev. Mod. Phys. 71,
463 (1999).
15. A. L. Fetter and A. A. Svidzinsky, J. Phys.: Condens. Matter 13, R135 (2001).
16. F. Dalfovo and S. Stringari, Phys. Rev. A 53, 2477 (1996).
17. N. K. Wilkin, J. M. F. Gunn, and R. A. Smith, Phys. Rev. Lett. 80, 2265
(1998).
18. H. Saito and M. Ueda, Phys. Rev. A 69, 013604 (2004).
19. E. Lundh, A. Collin, and K-A. Suominen, Phys. Rev. Lett. 92, 070401 (2004).
20. G. M. Kavoulakis, A. D. Jackson, and G. Baym, Phys. Rev. A 70, 043603
(2004).
21. S. M. M. Virtanen, T. P. Simula, M. M. Salomaa, Phys. Rev. Lett. 86, 2704
(2001).
Vortices in Bose-Einstein Condensates: Theory 15
22. C. W. Gardiner, J. R. Anglin, and T. I. A. Fudge, J. Phys. B 35, 1555 (2002).
23. A. A. Penckwitt, R. J. Ballagh, and C. W. Gardiner, Phys. Rev. Lett. 89,
260402 (2002).
24. R. A. Duine, B. W. A. Leurs, and H. T. C. Stoof, Phys. Rev. A 69, 053623
(2004).
25. M. J. Steel, M. K. Olsen, L. I. Plimak, P. D. Drummond, S. M. Tan, M. J.
Collett, D. F. Walls, and R. Graham, Phys. Rev. A 58, 4824 (1998).
26. M. J. Davis, S. A. Morgan, and K. Burnett, Phys. Rev. A 66, 053618 (2002).
27. C. Lobo, A. Sinatra, and Y. Castin, Phys. Rev. Lett. 92, 020403 (2004).
28. T. P. Simula and P. B. Blakie, Phys. Rev. Lett. 96, 020404 (2006).
29. M. Tsubota, K. Kasamatsu, and M. Ueda, Phys. Rev. A 65, 023603 (2002).
30. C. J. Pethick and H. Smith: Bose-Einstein Condensation in Dilute Gases (Cam-
bridge, 2002).
31. A. Minguzzi, S. Succi, F. Toschi, M. P. Tosi, and P. Vignolo, Phys. Rep. 395,
223 (2004).
32. D. A. Butts and D. S. Rokshar, Nature 397, 327 (1999).
33. E. Lundh, Phys. Rev. A 65, 043604 (2002).
34. P. C. Haljan, I. Coddington, P. Engels, and E. A. Cornell, Phys. Rev. Lett. 87,
210403 (2001).
35. M. Möttönen, T. Mizushima, T. Isoshima, M. M. Salomaa, and K. Machida,
Phys. Rev. A 68, 023611 (2003).
36. Y. Shin, M. Saba, A. Schirotzek, T. A. Pasquini, A. E. Leanhardt, D. E.
Pritchard, and W. Ketterle, Phys. Rev. Lett. 93, 160406 (2004).
37. H. Pu, C. K. Law, J. H. Eberly, and N. P. Bigelow, Phys. Rev. Lett. 59, 1533
(1999).
38. T. P. Simula, S. M. M. Virtanen, and M. M. Salomaa, Phys. Rev. A 65, 033614
(2002).
39. C. A. Jones and P. H. Roberts, J. Phys. A 15, 2599 (1982).
40. C. A. Jones, S. J. Putterman, and P. H. Roberts, J. Phys. A 19, 2991 (1986).
41. A. E. Leanhardt, Y. Shin, D. Kielpinski, D. E. Pritchard, and W. Ketterle,
Phys. Rev. Lett. 90, 140403 (2003).
42. K. Kasamatsu, M. Tsubota, and M. Ueda, Phys. Rev. Lett. 93, 250406 (2004).
43. J. Ruostekoski and J. R. Anglin, Phys. Rev. Lett. 86, 003934 (2001).
44. J. Ruostekoski and J. R. Anglin, Phys. Rev. Lett. 91, 190402 (2003).
45. N. G. Parker, N. P. Proukakis, C. F. Barenghi, and C. S. Adams, Phys. Rev.
Lett. 92, 160403 (2004).
46. P. Nozieres and D. Pines: The Theory of Quantum Liquids (Perseus Publishing,
New York, 1999).
47. E. Lundh, C.J. Pethick and H. Smith, Phys. Rev. A 55, 2126 (1997).
48. D. S. Rokhsar, Phys. Rev. Lett. 79, 2164 (1997).
49. E. Lundh, J. P. Martikainen, and K. A. Suominen, Phys. Rev. A 67, 063604
(2003).
50. N. G. Parker and C. S. Adams, Phys. Rev. Lett. 95, 145301 (2005); J. Phys. B
39, 43 (2006).
51. S. Stringari, Phys. Rev. Lett. 77, 2360 (1996).
52. A. Recati, F. Zambelli, and S. Stringari, Phys. Rev. Lett. 86, 377 (2001).
53. S. Sinha and Y. Castin, Phys. Rev. Lett. 87, 190402 (2001).
54. N.G. Parker, R.M.W. van Bijnen and A.M. Martin, Phys. Rev. A 73, 061603(R)
(2006).
16 N. G. Parker, B. Jackson, A. M. Martin, and C. S. Adams
55. K. W. Madison, F. Chevy, V. Bretin, and J. Dalibard, Phys. Rev. Lett. 86,
4443 (2001).
56. J. E. Williams, E. Zaremba, B. Jackson, T. Nikuni, and A. Griffin,
Phys. Rev. Lett. 88, 070401 (2002).
57. F. Dalfovo and S. Stringari, Phys. Rev. A 63, 011601(R) (2001).
58. T. Frisch, Y. Pomeau, and S. Rica, Phys. Rev. Lett. 69, 1644 (1992).
59. T. Winiecki, J. F. McCann, and C. S. Adams, Phys. Rev. Lett. 82, 5186 (1999).
60. B. Jackson, J. F. McCann, and C. S. Adams, Phys. Rev. Lett. 80, 3903 (1998).
61. C. Raman, M. Köhl, R. Onofrio, D. S. Durfee, C. E. Kuklewicz, Z. Hadzibabic,
and W. Ketterle, Phys. Rev. Lett. 83, 2502 (1999).
62. R. Onofrio, C. Raman, J. M.Vogels, J. R. Abo-Shaeer, A. .P. Chikkatur, and
W. Ketterle, Phys. Rev. Lett. 85, 2228 (2000).
63. B. Jackson, J. F. McCann, and C. S. Adams, Phys. Rev. A 61, 051603(R)
(2000).
64. B. M. Caradoc-Davies, R. J. Ballagh, and K. Burnett, Phys. Rev. Lett. 83, 895
(1999).
65. J. E. Williams and M. J. Holland, Nature 401, 568 (1999).
66. A. E. Leanhardt, A. Görlitz, A. Chikkatur, D. Kielpinski, Y. Shin, D. E.
Pritchard, and W. Ketterle, Phys. Rev. Lett. 89, 190403 (2002).
67. N. S. Ginsberg, J. Brand, and L. V. Hau, Phys. Rev. Lett. 94, 040403 (2005).
68. S. Komineas and N. Papanicolaou, Phys. Rev. A 68, 043617 (2003).
69. R. G. Scott, A. M. Martin, T. M. Fromhold, and F. W. Sheard, Phys. Rev.
Lett. 95, 073201 (2005).
70. R. G. Scott, A. M. Martin, S. Bujkiewicz, T. M. Fromhold, N. Malossi, O.
Morsch, M. Cristiani, and E. Arimondo, Phys. Rev. A 69, 033605 (2004).
71. B. Jackson, J. F. McCann, and C. S. Adams, Phys. Rev. A 60, 4882 (1999).
72. N. G. Berloff and C. F. Barenghi, Phys. Rev. Lett. 93, 090401 (2004).
73. J. Ruostekoski and Z. Dutton, Phys. Rev. A 70, 063626 (2005).
74. L. C. Crasovan, V. M. Pérez-Garćıa, I. Danaila, D. Mihalache, and L. Torner,
Phys. Rev. A 70, 033605 (2004).
75. T. P. Simula and P. B. Blakie, Phys. Rev. Lett. 96, 020404 (2006).
76. Z. Hadzibabic, P. Krüger, M. Cheneau, B. Battelier, and J. Dalibard, Nature
441, 1118 (2006).
77. M. J. Davis, S. A. Morgan, and K. Burnett, Phys. Rev. Lett. 66, 053618 (2002).
78. N. G. Berloff and B. V. Svistunov, Phys. Rev. A 66, 013603 (2002).
79. H. Lamb: Hydrodynamics (Cambridge University Press, 1932).
80. M. Tsubota, T. Araki, and S. K. Nemirovskii, Phys. Rev. B 62, 11751 (2000).
81. K. W. Schwarz, Phys. Rev. B 31, 5782 (1985).
82. J. Koplik and H. Levine, Phys. Rev. Lett. 71, 1375 (1993).
83. M. Leadbeater, T. Winiecki, D. C. Samuels, C. F. Barenghi, and C. S. Adams,
Phys. Rev. Lett. 86, 1410 (2001).
84. M. Leadbeater, D. C. Samuels, C. F. Barenghi, and C. S. Adams, Phys. Rev.
A 67, 015601 (2003).
85. W. Thomson (Lord Kelvin), Philos. Mag. 10, 155 (1880).
86. G. W. Rayfield and F. Reif, Phys. Rev. 136, A1194 (1964).
87. A. A. Svidzinsky and A. L. Fetter, Phys. Rev. Lett. 84, 5919 (2000).
88. B. Jackson, J. F. McCann, and C. S. Adams, Phys. Rev. A 61, 013604 (2000).
89. E. Lundh and P. Ao, Phys. Rev. A 61, 063612 (2000).
90. S. A. McGee and M.J. Holland, Phys. Rev. A 63, 043608 (2001).
Vortices in Bose-Einstein Condensates: Theory 17
91. J. J. Garćıa-Ripoll and V. M. Pérez-Garćıa, Phys. Rev. A 63, 041603 (2001).
92. J. J. Garćıa-Ripoll and V. M. Pérez-Garćıa, Phys. Rev. A 64, 053611 (2001).
93. A. Aftalion and T. Riviere, Phys. Rev. A 64, 043611 (2001).
94. P. Rosenbusch, V. Bretin, and J. Dalibard, Phys. Rev. Lett. 89, 200403 (2002).
95. V. Bretin, P. Rosenbusch, F. Chevy, G. V. Shlyapnikov, and J. Dalibard, Phys.
Rev. Lett. 90, 100403 (2003).
96. A. L. Fetter, Phys. Rev. A 69, 043617 (2004).
97. P. O. Fedichev and G. V. Shylapnikov, Phys. Rev. A 60, R1779 (1999).
98. J. R. Abo-Shaeer, C. Raman, and W. Ketterle, Phys. Rev. Lett. 88, 070409
(2002).
99. S. I. Davis, P. C. Hendry, and P. V. E. McClintock, Physica B 280, 43 (2000).
100. D. P. Arovas and J. A. Freire, Phys. Rev. B 55, 3104 (1997).
101. W. F. Vinen, Phys. Rev. B 61, 1410 (2000).
102. E. Lundh and P. Ao, Phys. Rev. A 61, 063612 (2000).
103. L. M. Pismen: Vortices in Nonlinear Fields (Clarendon Press, Oxford, 1999).
104. C. F. Barenghi, N. G. Parker, N. P. Proukakis, and C. S. Adams, J. Low. Temp.
Phys. 138, 629 (2005).
105. N. G. Berloff, Phys. Rev. A 69, 053601 (2004).
106. B. M. Caradoc-Davies, R. J. Ballagh, and P. B. Blakie, Phys. Rev. A 62,
011602 (2000).
107. A. Griesmaier, J. Werner, S. Hensler, J. Stuhler, and T. Pfau,
Phys. Rev. Lett. 94, 160401 (2005).
108. M. Marinescu and L. You, Phys. Rev. Lett. 81, 4596 (1998).
109. S. Yi and L. You, Phys. Rev. A 61, 041604 (2000).
110. K. Góral, K. Rza̧żewski, and T. Pfau, Phys. Rev. A 61, 051601 (2000).
111. S. Yi and H. Pu, Phys. Rev. A 73, 061602(R) (2006).
112. G. Oritz and D. M. Ceperley, Phys. Rev. Lett. 75, 4642 (1995).
113. M. Sadd, G.V. Chester, and L. Reatto, Phys. Rev. Lett. 79, 2490 (1997).
114. N. G. Berloff and P. H. Roberts, J. Phys. A 32, 5611 (1999).
115. F. Dalfovo, Phys. Rev. B 46, 5482 (1992).
116. J. Stuhler, A. Griesmaier, T. Koch, M. Fattori, T. Pfau, S. Giovanazzi, P. Pedri,
and L. Santos, Phys. Rev. Lett. 95, 150406 (2005).
117. D.H.J. O’Dell and C. Eberlein, Phys. Rev. A 75, 013604 (2007).
118. D.H.J. O’Dell, S. Giovanazzi, and C. Eberlein, Phys. Rev. Lett. 92, 250401
(2004).
119. C. Eberlein, S. Giovanazzi, and D.H.J. O’Dell, Phys. Rev. A 71, 033618 (2005).
120. R.M.W. van Bijnen, D. H. J. O’Dell, N.G. Parker, and A.M. Martin, Phys.
Rev. Lett. accepted, cond-mat/0602572 (2006).
121. C. Barceló, S. Liberati and M. Visser, Living Rev. Rel. 8, 12 (2005).
122. W.G. Unruh, Phys. Rev. Lett. 46, 1351 (1981).
123. W.G. Unruh, Phys. Rev. D 27, 2827 (1995).
124. C. Barceló, S. Liberati and M. Visser, Class. Quant. Gav. 18, 1137 (2001).
125. S.W. Hawking, Nature 248, 30 (1974).
126. S.W. Hawking, Commun. Math. Phys. 43, 199 (1975).
127. J.D. Bekenstein and M. Schiffer, Phys. Rev. 58, 064014 (1998).
128. C. Barceló, S. Liberati, and M. Visser, Phys. Rev. A 68, 053613 (2003).
129. T.R. Slatyer and C.M. Savage, Class. Quant. Grav. 22, 3833 (2005).
130. S. Basak and P. Majumdar, Class. Quant. Grav. 20, 2929 (2000).
131. S. Basak and P. Majumdar, Class. Quant. Grav. 20, 3907 (2000).
132. F. Federici, C. Cherubini, S. Succi, and M.P. Tosi, Phys. Rev. A 73, 033604
(2006).
ABSTRACT
  Vortices are pervasive in nature, representing the breakdown of laminar fluid
flow and hence playing a key role in turbulence. The fluid rotation associated
with a vortex can be parameterized by the circulation $\Gamma=\oint {\rm d}{\bf
r}\cdot{\bf v}({\bf r})$ about the vortex, where ${\bf v}({\bf r})$ is the
fluid velocity field. While classical vortices can take any value of
circulation, superfluids are irrotational, and any rotation or angular momentum
is constrained to occur through vortices with quantized circulation. Quantized
vortices also play a key role in the dissipation of transport in superfluids.
In BECs quantized vortices have been observed in several forms, including
single vortices, vortex lattices, and vortex pairs and rings. The recent
observation of quantized vortices in a fermionic gas was taken as a clear
signature of the underlying condensation and superfluidity of fermion pairs. In
addition to BECs, quantized vortices also occur in superfluid Helium, nonlinear
optics, and type-II superconductors.

<|endoftext|><|startoftext|>
Introduction
The intensity and polarization of a beam of light passing through an isolated optical de-
vice undergoes a linear transformation. But this is an ideal situation because, in general,
the optical system is embedded in some media such as atmosphere or other ambient mate-
rial, which further modifies the polarization properties of the light beam passing through
it. A statistical ensemble model describing random linear optical media was formulated
two decades ago by Kim, Mandel and Wolf [1], but is not examined in any detail in the
literature, to the best of our knowledge. The purpose of the present paper is to pursue
this avenue in a new way arising from the realization of a relationship, presented here, with
the positive operator valued measures (POVM) of quantum measurement theory. This is
because the transformation of the polarization states of a light beam propagating through
an ensemble of deterministic optical devices exhibits a structural similarity with the POVM
transformation of quantum density matrices. This connection motivates, in view of the re-
cent interest in the implementations of POVMs on single photon density matrix employing
linear optics elements [2], identification of experimental schemes to realize various kinds of
Muller transformations. The properties of the transformation of the polarization states of
light form a much studied topic in literature [3 – 17]. Thus the power of the ensemble
approach becomes evident in elucidating the known optical devices as well as some hitherto
unknown types [17], which had remained only a mathematical possibility.
The contents of this paper are organized as follows. In Sec. 2, a concise formulation of
the Jones and Mueller matrix theory, along with a summary of main results of Gopala Rao
et al. [17] is given. Based on the approach of Kim, Mandel and Wolf [1] suitable Jones en-
sembles, corresponding to various types of Mueller transformations are identified in Sec. 3.
In Sec. 4, a structural equivalence between Jones ensemble and POVMs of quantum mea-
surement theory is established. Following the linear optics scheme of Ahnert and Payne [2]
for the implementation of POVMs on single photon density matrix, experimental setup for
realizing Mueller matrices of types I and II are suggested in Sec. 5. The final section has
some concluding remarks.
2. Brief summary of known results on the Jones and the Mueller formalism.
Following the standard procedure, let E1 and E2, defined here as a column matrix E =
, denote two components of the transverse electric field vector associated with a light
beam. The coherency matrix (or the polarization matrix) of the light beam is a positive
semidefinite 2x2 hermitian matrix defined by,
C = 〈E⊗E†〉. (1)
Expressing this in terms of the standard Pauli matrices σ1 =
 , σ3 =
 and the unit matrix σ0 =
, we have
siσi =
s0 + s3 s1 − is2
s1 + is2 s0 − s3
 (2)
The physical significance of the quantities arising here are
s0 = Tr (Cσ0) = Intensity of the beam
si = Tr (Cσi) = Components of Polarization vector ~s of the beam
Thus the coherency matrix completely specifies the physical properties of the light beam.
The four-vector S =
 defined by Eq. (3) is the well known Stokes vector, which
represents the state of polarization of the light beam. Because C is hermitian, the Stokes
vector is real. The positive semidefiniteness of C implies that the Stokes vector must satisfy
the properties
s0 > 0, s
0 − |~s|2 ≥ 0 (4)
A 2x2 complex matrix J, called the Jones matrix, represents the so-called deterministic
optical device [18] or medium. When a light beam represented by E passes through such a
medium, the transformed light beam is given by E′ = JE. Correspondingly, the coherency
matrix C transforms as
′ = JCJ† (5)
(Here J† is the hermitian conjugate of J.)
Alternatively, instead of the 2 × 2 matrix transformation of the coherency matrix, as
given by Eq. (5), a transformation
′ = MS (6)
of the four componental Stokes column S through a real 4x4 matrix M, called the Mueller
matrix, is found be more useful [18].
Using Eq. (3) and Eq. (5 we have,
s′i = Tr(C
′σi) = Tr(JCJ
† σi) =
Tr(J†σiJσj)sj
which leads to the well-known relationship [1]
Mij =
Tr(J†σiJσj)
between the elements of a Jones matrix and that of corresponding Mueller matrix.
But in the case where medium cannot be represented by a Jones matrix, it is not possible
to characterize the change in the state of polarization of the light beam through Eq. (5).
In such a situation, Mueller formalism provides a general approach for the polarization
transformation of the light beam. The Mueller matrix M is said to be non-deterministic
when it has no corresponding Jones characterization.
Mathematically, a Mueller device can be represented by any 4 × 4 matrix such that the
Stokes parameters of the outgoing light beam satisfy the physical constraint Eq. (4). In other
words, a Mueller matrix is any 4×4 real matrix that transforms a Stokes vector into another
Stokes vector. There are many aspects of the relationships between these two formulations
of the polarization optics and a complete characterization of Mueller matrices has been the
subject matter of Ref. [1, 3-17]. It was Gopala Rao et al. [17] who presented a complete set
of necessary and sufficient conditions for any 4x4 real matrix to be a Mueller matrix. In so
doing, they found that there are two algebraic types of Mueller matrices called type I and
type II; and it has been shown [17] that only a subset of the type-I Mueller matrices - called
deterministic or pure Mueller matrices - have corresponding Jones characterization. All the
known polarizing optical devices such as retarders, polarizers, analyzers, optical rotators are
pure Mueller type and are well understood. Mueller matrices of the Type II variety are yet
to be physically realized and have remained as mere mathematical possibility. For the sake
of completeness, we present here the characterization as well as categorization of these two
types of Mueller matrices as is given in Ref. [17]. This will enable us to show that both
Type I and II Mueller devices are realizable in an unified manner in terms of the proposed
ensemble approach [1].
I. A 4× 4 real matrix M is called a type-I Mueller matrix iff
(i) M00 ≥ 0
(ii) The G-eigenvalues ρ0, ρ1, ρ2, ρ3 of the matrix N=M̃GM are all real. (Here, M̃
stands for the transpose of M; G-eigenvalues are the eigenvalues of the matrix GN,
with G = diag(1, −1, −1, −1)).
iii) The largest G-eigenvalue ρ0 possesses a time-like G-eigenvector and the G-eigenspace
of N contains one time-like and three space-like G-eigenvectors.
II. A 4× 4 real matrix M is called a type-II Mueller matrix iff
(i) M00 > 0.
(ii) The G-eigenvalues ρ0, ρ1, ρ2, ρ3 of N=M̃GM are all real.
(iii) The largest G-eigenvalue ρ0 possesses a null G-eigenvector and the G-eigenspace of N
contains one null and two space-like G-eigenvectors.
(iv) If X0 = e0 + e1 is the null G-eigenvector of N such that e0 is a time-like vector
with positive zeroth component, e1 is a space-like vector G-orthogonal to e0 then
ẽ0Ne0 > 0.
Despite the knowledge of these new category of Mueller matrices [15, 17], not much
attention is paid for realizing the corresponding devices. An experimental arrangement
involving a parallel combination of deterministic (pure Mueller) optical devices is proposed
in Ref. [17] for realizing type-II Mueller devices. The physical situations, where the beam of
light is subjected to the influence of a medium such as atmosphere was addressed in Ref. [1].
In the next section, we discuss this ensemble approach for random optical media, proposed
by Kim, Mandel and Wolf [1] .
3. Mueller matrices as ensemble of Jones devices
Kim et. al. [1] associate a set of probabilities {pe,
pe = 1} to describe the stochastic
medium. Then a Jones device Je associated with each element e of the ensemble gives a
corresponding coherency matrix C′e = JeCJ
e. The ensemble averaged coherency matrix
Cav =
pe(JeCJ
e) (7)
then describes the effects of the medium on the beam of light. In a similar fashion, the
corresponding ensemble of Mueller matrices {Me} associated with the ensemble of Jones
matrices {Je} is constructed and its ensemble averaged Mueller matrix is similarly formed
as Mav =
e peMe. Since a linear combination of Mueller matrices with non-negative
coefficients is also a Mueller matrix, the ensemble averaged Mueller matrix Mav is a Mueller
matrix 1.
We now turn to the question of constructing an appropriate ensemble designed to describe
a given physical situation. The simplest example of an ensemble is one where the elements
are chosen entirely randomly, i.e., the system is described by a chaotic ensemble where the
probabilities are all equal, pe =
, where n denotes the number of elements in the ensemble.
The coherency matrix Cav of the light beam passing through such a chaotic assembly is just
an arithmetic average of the coherency matrices C′e = JeCJ
e and hence
Cav =
e (8)
More general models can be constructed depending on the medium for the propagation of
the beam of light. For example, one may employ various types of filters or solid state systems
through which the light passes; the assignment of the Jones matrices and the corresponding
probabilities will then differ depending on the weights placed on these elements.
Restricting ourselves to an ensemble consisting of only two Jones devices which occur
with equal probability p1 = 1/2, p2 = 1/2, we have found out that the resultant Mueller
matrices can either be deterministic or non-deterministic. We give in the foregoing (see Table
I) some examples of Mueller matrices corresponding to different choices of Jones matrices
in an ensemble Je, e = 1, 2, for some representative cases. This will also serve to show the
1 This is because, each Mueller matrix M
transforms an initial Stokes vector into a final Stokes vector and
a linear combination of Stokes vectors with non-negative coefficients p
is again a Stokes vector.
generality of the ensemble procedure in capturing the physical realizations for the Mueller
devices discussed in Ref. [17].
Table 1. Mueller matrices resulting from 2-element Jones ensemble.
J1 J2 M = p1M1 + p2M2, Type of M
p1 = p2 =
1. 1√
1 1− i
1 + i −1
1 1− i
1 + i −1
3 0 0 0
0 −1 2 2
0 2 −1 2
0 2 2 −1
Pure Mueller
1 0 0 0
1 0 0 0
0 0 0 0
0 0 0 0
Type-I
1 0 0 0
0 0 0 0
0 0 0 0
0 0 0 −1
Type-I
4. 1√
3 0 0 0
0 −1 0 2
0 0 −1 0
0 2 0 −1
Type-I
1 1− i
1 + i −1
5 0 0 0
0 −1 2 4
0 2 −3 2
0 4 2 −1
Type-I
1 −1 0 0
0 0 0 0
0 0 0 0
0 0 0 0
Type-II
2 0 0 1
0 0 1 0
0 1 0 0
1 0 0 0
Type-II
In Table I, the Jones matrices chosen are so as to give pure Mueller (deterministic)
and non-deterministic type-I, type-II matrices respectively. We observe that an assembly
of Jones matrices can result in a pure Mueller matrix if and only if all elements of the
assembly correspond to the same optical device. This is because, with all Je’s are same,
a transformation of the form Cav =
e pe(JeCJ
e) is equivalent to a transformation of the
Stokes vector S through a Mueller matrix Mav =
peMe = Mpure. When the medium is
represented by a pure Mueller matrix, the outgoing light beam will have the same degree of
polarization as the incoming light beam. In fact, pure Mueller matrix is the simplest among
type-I Mueller matrices. Not all type-I Mueller matrices preserve the degree of polarization
of the incident light beam. To see this, note that the type-I matrix of example 2 (see Table
I) converts any incident light beam into a linearly polarized light beam; the other three
type-I matrices (examples 3 to 5) transform completely polarized light beams into partially
polarized light beams. Similarly, type-II Mueller matrices do not, in general, preserve the
degree of polarization of the incident light beam. It may be seen that the type-II Mueller
matrix of example 7 is a depolarizer matrix, since it converts any incident light beam into
an unpolarized light beam.
Though one cannot a priori state which choices of Jones matrices result in type-I or type-
II, it is interesting to observe that all types of Mueller matrices result - even in 2-element
ensembles. It is not difficult to conclude that an ensemble, with more Jones devices and
with different weight factors, can give rise to a variety of Mueller matrices of all possible
algebraic types. It would certainly be interesting to physically realize such systems.
In the following section, a connection between the ensemble approach for optical devices
and the POVMs of quantum measurement theory is established.
4. A connection to Positive Operator Valued Measures
We will now show that the phenomenology of the ensemble construction of Kim, Mandel
and Wolf [1] described above has a fundamental theoretical underpinning, if we make a
formal identification of the coherency matrix with the density matrix description of the
subsystem of a composite quantum system. The coherency matrix defined by Eqs. (1)
and (2) resembles a quantum density matrix in that both describe a physical system by
a hermitian, trace-class, and positive semi-definite matrix. While the quantum density
matrix has unit trace, the coherency matrix has intensity of the beam as the value of the
trace. The Jones matrix transformation is a general transformation of the coherency matrix,
which preserves its hermiticity and positive semi-definiteness - but changes the values of the
elements of the coherency matrix. The most general transformation of the density matrix
ρ, which preserves its hermiticity, positive semi-definiteness and also the unit trace is the
positive operator valued measures (POVM) [19]:
iVi = I (9)
where Vi’s are general matrices and I is the unit element in the Hilbert space. More generally,
one could relax the condition of preservation of the unit trace of the density matrix by
examining the possibility of a contracting transformation, where the unit matrix condition
on the POVM operators is replaced by an inequality.
This mathematical theorem has a physical basis in the Kraus operator formalism [19]
when we consider the Hamiltonian description of a composite interacting system A, B
described by a density matrix ρ(A, B) and deduce the subsystem density matrix of A given
by, ρ(A) = TrB ρ(A, B). In this case, the Kraus operators are the explicit expressions of
the POVM operators and contain the effects of interaction between the systems A and B
in the description of the subsystem A. It is thus clear that the phenomenology of Ref. [1]
has a correspondence with the Kraus formulation and the POVM theory. In order to make
this association complete, we compare Eq. (9) with the expression given by Eq. (7). Apart
from a phase factor, the Kraus operators {Vi}, associated with POVMs, may be related to
the Jones assembly {Ji}, chosen in the form
iVi =
iJi (10)
In the construction of the Table I presented earlier, a simple model was proposed where all
probabilities were chosen to be equal and the condition on the sum over the Jones matrix
combinations was set equal to unit matrix. In such cases, the intensity of the beam gets
reduced by 1/n and the polarization properties of the beam gets changed as was described
earlier. With this identification, we have provided here an important interpretation and
meaning to the phenomenology of the ensemble approach of Kim et al.[1].
Recently Ahnert and Payne [2] proposed an experimental scheme to implement all possible
POVMs on single photon polarization states using linear optical elements. In view of the
connection between the ensemble formalism for Jones and Mueller matrices with the POVMs,
a possible experimental realization of the two types of Mueller matrices is suggested in the
next section.
5. Possible experimental realization of types I and II Mueller matrices.
We first observe that the density matrix of a single photon polarization state,
ρ = ρHH |H〉 〈H|+ ρHV |H〉 〈V |+ ρ∗HV |V 〉 〈H|+ ρV V |V 〉 〈V | (11)
is nothing but the coherency matrix of the photon [20]
〈â†H âH〉 〈â
H âV 〉
〈â†V âH〉 〈â
V âV 〉
 , (12)
where âH and âV are the creation operators of the polarization states of the single photon;
{|H〉, |V 〉} denote the transverse orthogonal polarization states of photon. This is seen
explicitly by noting that the average values of the Stokes operators are obtained as,
s0 = 〈Ŝ0〉 = 〈(â†H âH + â
V âV )〉 = ρHH + ρV V = Tr(ρ),
s1 = 〈Ŝ1〉 = 〈(â†H âV + â
V âH)〉 = ρHV + ρ∗HV = Tr(ρ σ1),
s2 = 〈Ŝ2〉 = i 〈(â†V âH − â
H âV )〉 = i (ρHV − ρ∗HV ) = Tr(ρ σ2),
s3 = 〈Ŝ3〉 = 〈(â†H âH − â
V âV )〉 = ρHH − ρV V = Tr(ρ σ3). (13)
Hence the proposed setup [2], involving only linear optics elements such as polarizing beam
splitters, rotators and phase shifters, that promises to implement all possible POVMs on a
single photon polarization state leads to all possible ensemble realizations for the Mueller
matrices. More specifically, this provides a general experimental scheme to realize varieties
of Mueller matrices - including the hitherto unreported type-II Mueller matrices. We briefly
describe the scheme proposed in Ref. [2] and illustrate, by way of examples, how it leads to
both type-I and type-II Mueller matrices.
In Ref. [2], a module corresponds to an arrangement having polarization beam splitters,
polarization rotators, phase shifters and unitary operators. For an n element POVM, a
setup involving n − 1 modules are needed. That means, a single module is enough for a 2
element POVM; a setup involving two modules is required for a 3 element POVM and so on.
We describe two, three element POVMs by specifying the optical elements in the respective
modules and by specifying the corresponding Kraus operators in terms of these elements.
For any two operator POVM, the Kraus operators V1, V2 are given by V1 = U
and V2 = U
′′D2U. Here U, U
′, U′′ are the three unitary operators in a single module.
Denoting θ, φ as the angles of rotation of the two variable polarization rotators and γ, ξ,
the angles of the two variable phase shifters in the module, the diagonal matrices D1, D2
are given by,
eiγ cos θ 0
0 cos φ
 , D2 =
eiξ sin θ 0
0 sin φ
 (14)
The POVM elements
F1 = V
1V1 = U
1D1U, F2 = V
2V2 = U
2D2U (15)
satisfy the condition
i=1,2 Fi = F1 + F2 = I.
For any three operator POVM, the Kraus operators are given by
V1 = U
IDIUI,
V2 = U
IIDIIUIIU
V3 = U
IIUIIU
Here, the diagonal D matrices are
eiγI cos θI 0
0 cosφI
 , D′I =
eiξI sin θI 0
0 sinφI
 (17)
DII =
eiγII cos θII 0
0 cos φII
 , D′II =
eiξII sin θII 0
0 sinφII
 (18)
(θI, φI), (γI, ξI) are respectively the pair of angles corresponding to variable polarization
rotators and variable phase shifters in the first module. Similarly, (θII, φII), (γII, ξII) are the
pairs of angles corresponding to variable polarization rotators and variable phase shifters
respectively in the second module. UI, U
I are the unitary operators used in the first
module and UII, U
II, U
II are the unitary operators used in the second module. (Notice
that all the unitary operators in the above schemes are arbitrary and a particular choice of
the associated unitary operators gives rise to a different experimental arrangement). The
extension of this scheme to n operator POVM involving n-1 modules is quite similar and is
given in [2].
We had identified, in Sec. 3, that an ensemble average of Jones devices will lead to all
possible types of Mueller matrices, some examples of which are given in Table 1. We now
show that the experimental set up proposed in Ref. [2] can also be used to realize varieties
of Mueller devices. To substantiate our claim, we identify here the linear optical elements
needed in the single module set up of Ahnert and Payne [2], which lead to the physical
realization of two typical Mueller matrices given in Table 1.
To obtain the type-I Mueller matrix M= 1

1 0 0 0
0 0 0 0
0 0 0 0
0 0 0 −1

of example 3 (see Table I), we
use U = I, U′ =
 and U′′ =
 as the required unitary Jones devices and
both the variable polarization rotators are set with their rotation angles θ=φ = π/4. There
is no need of phase shifter devices in this case i.e, γ=ξ=0.
Similarly for the type-II Mueller matrix M= 1

2 0 0 1
0 0 1 0
0 1 0 0
1 0 0 0

of example 7, we find that
U = I, U′ = 1√
 andU′′ = 1√
 are the required unitary Jones devices. The
rotation angles of the variable polarization rotators are, as in the earlier case, θ=φ = π/4
and there is no need of phase shifter devices i.e., γ=ξ=0. Notice that in both the above
examples the unitary operators U′, U′′ correspond to linear and circular retarders [18].
These two examples illustrate that the experimental set up given in Ref. [2] may be
utilized to realize the required non-determinisitc Mueller devices. In fact Mueller matri-
ces corresponding to an ensemble with more than two Jones devices may also be realized
by employing larger number of modules as given in the experimental scheme proposed by
Ref. [2].
6. Conclusion
We have established here a connection between the phenomenological ensemble approach [1]
for the coherency matrix and the POVM transformation of quantum density matrix. This
opens up a fresh avenue to physically realize types I and II of the Mueller matrix classification
of Ref. [17]. We have also given experimental setup to implement Mueller transformations
corresponding to ensemble average of Jones devices by employing the POVM scheme on the
single photon density matrix suggested in Ref. [2], in the context of quantum measurement
theory. It is gratifying to note that two decades after the introduction of the ensemble
approach, which had remained obscure and only received passing reference in textbooks such
as [20], its value is revealed in this paper through its connection with the new developments
in quantum measurement theory. We plan on exploring further the POVM transformation
in the description of quantum polarization optics.
References
1. K. Kim, L. Mandel, and E. Wolf, “Relationship between Jones and Mueller matrices
for random media”, J. Opt. Soc. Am. A 4, 433–437 (1987).
2. S. E. Ahnert and M. C. Payne, “General implementation of all possible positive-
operator-value measurements of single photon polarization states”, Phys. Rev. A 71,
012330-33, (2005).
3. R. Barakat, “Bilinear constraints between elements of the 4× 4 Mueller-Jones transfer
matrix of polarization theory”, Opt. Commun. 38,159–161 (1981).
4. R. Simon, “The connection between Mueller and Jones matrices of Polarization Op-
tics”, Opt. Commun. 42, 293–297 (1982).
5. A. B. Kostinski, B. James, and W. M. Boerner, “Optimal reception of partially polar-
ized waves” J. Opt. Soc. Am. A 5, 58–64 (1988).
6. A. B. Kostinski, Depolarization criterion for incoherent scattering” Appl. Optics 31,
3506–3508 (1992).
7. J. J. Gil, and E. Bernabeau, “A depolarization criterion in Mueller matrices” Optica
Acta, 32, 259–261 (1985).
8. R. Simon,“ Mueller matrices and depolarization criteria” J. Mod. Optics 34, 569–575
(1987).
9. R. Simon, “Non-depolarizing systems and degree of polarization” Opt. Commun. 77,
349–354 (1990)
10. M. Sanjay Kumar, and R. Simon, “Characterization of Mueller matrices in Polarizatio
Optics”, Optics Commun. 88, 464–470 (1992).
11. R. Sridhar and R. Simon, “Normal form for Mueller matrices in Polarization Optics”
J. Mod. Optics 41, 1903–1915 (1994).
12. D. G. M. Anderson, and R. Barakat,“Necessary and sufficient conditions for a Mueller
matrix to be derivable from a Jones matrix” J. Opt. Soc. Am. A 11, 2305–2319 (1994).
13. C. V. M. van der Mee, and J. W. Hovenier, “Structure of matrices transforming Stokes
parameters”, J. Math. Phys. 33, 3574–3584 (1992).
14. C. R. Givens, and A. B. Kostinski, “A simple necessary and sufficient criterion on
physically realizable Mueller matrices”, J. Mod. Opt. 40, 471–481 (1993).
15. C. V. M. van der Mee, “An eigenvalue criterion for matrices transforming Stokes pa-
rameters”, J. Math. Phys. 34, 5072–5088 (1993).
16. S. R. Cloude, “Group Theory and Polarization algebra”, Optik 75, 26–36 (1986).
17. A. V. Gopala Rao, K. S. Mallesh, and Sudha, “On the algebraic characterization of a
Mueller matrix in polarization optics I. Identifying a Mueller matrix from its N matrix”
J. Mod. Optics, 45, 955–987 (1998).
18. R. M. A. Azzam and N. M. Bashara, Ellipsometry and Polarized light, (North Holland
Publishing Co., Amsterdam, 1977)
19. M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information,
(Cambridge University Press, Cambridge, 2002).
20. L. Mandel and E. Wolf, Quantum Coherence and quantum optics, (Cambridge Univer-
sity Press, Cambridge, 1995).
ABSTRACT
  Statistical ensemble formalism of Kim, Mandel and Wolf (J. Opt. Soc. Am. A 4,
433 (1987)) offers a realistic model for characterizing the effect of
stochastic non-image forming optical media on the state of polarization of
transmittedlight. With suitable choice of the Jones ensemble, various Mueller
transformations - some of which have been unknown so far - are deduced. It is
observed that the ensemble approach is formally identical to the positive
operator valued measures (POVM) on the quantum density matrix. This
observation, in combination with the recent suggestion by Ahnert and Payne
(Phys. Rev. A 71, 012330, (2005)) - in the context of generalized quantum
measurement on single photon polarization states - that linear optics elements
can be employed in setting up all possible POVMs, enables us to propose a way
of realizing different types of Mueller devices.

<|endoftext|><|startoftext|>
Reexamination of spin decoherence in semiconductor quantum dots from
equation-of-motion approach
J. H. Jiang,1, 2 Y. Y. Wang,2 and M. W. Wu1, 2, ∗
Hefei National Laboratory for Physical Sciences at Microscale,
University of Science and Technology of China, Hefei, Anhui, 230026, China
Department of Physics, University of Science and Technology of China, Hefei, Anhui, 230026, China
(Dated: November 2, 2018)
The longitudinal and transversal spin decoherence times, T1 and T2, in semiconductor quantum
dots are investigated from equation-of-motion approach for different magnetic fields, quantum dot
sizes, and temperatures. Various mechanisms, such as the hyperfine interaction with the surrounding
nuclei, the Dresselhaus spin-orbit coupling together with the electron–bulk-phonon interaction, the
g-factor fluctuations, the direct spin-phonon coupling due to the phonon-induced strain, and the
coaction of the electron–bulk/surface-phonon interaction together with the hyperfine interaction
are included. The relative contributions from these spin decoherence mechanisms are compared in
detail. In our calculation, the spin-orbit coupling is included in each mechanism and is shown to
have marked effect in most cases. The equation-of-motion approach is applied in studying both the
spin relaxation time T1 and the spin dephasing time T2, either in Markovian or in non-Markovian
limit. When many levels are involved at finite temperature, we demonstrate how to obtain the
spin relaxation time from the Fermi Golden rule in the limit of weak spin-orbit coupling. However,
at high temperature and/or for large spin-orbit coupling, one has to use the equation-of-motion
approach when many levels are involved. Moreover, spin dephasing can be much more efficient
than spin relaxation at high temperature, though the two only differs by a factor of two at low
temperature.
PACS numbers: 72.25.Rb, 73.21.La,71.70.Ej
I. INTRODUCTION
One of the most important issues in the growing
field of spintronics is quantum information processing
in quantum dots (QDs) using electron spin.1,2,3,4,5 A
main obstacle is that the electron spin is unavoidably
coupled to the environment (such as, the lattice) which
leads to considerable spin decoherence (including lon-
gitudinal and transversal spin decoherences).6,7 Vari-
ous mechanisms, such as, the hyperfine interaction with
the surrounding nuclei,8,9 the Dresselhaus/Rashba spin-
orbit coupling (SOC)10,11 together with the electron-
phonon interaction, g-factor fluctuations,12 the direct
spin-phonon coupling due to the phonon-induced strain,9
and the coaction of the hyperfine interaction and the
electron-phonon interaction can lead to the spin deco-
herence. There are quite a lot of theoretical works
on spin decoherence in QD. Specifically, Khaetskii and
Nazarov analyzed the spin-flip transition rate using a
perturbative approach due to the SOC together with the
electron-phonon interaction, g-factor fluctuations, the di-
rect spin-phonon coupling due to the phonon-induced
strain qualitatively.13,14,15 After that, the longitudinal
spin decoherence time T1 due to the Dresslhaus and/or
the Rashba SOC together with the electron-phonon in-
teraction were studied quantitatively in Refs. 16,17,18,19,
20,21,22,23,24,25,26. Among these works, Cheng et al.18
developed an exact diagonalization method and showed
that due to the strong SOC, the previous perturbation
method14,15,16 is inadequate in describing T1. Further-
more, they also showed that, the perturbation method
previously used missed an important second-order en-
ergy correction and would yield qualitatively wrong re-
sults if the energy correction is correctly included and
only the lowest few states are kept as those in Refs.
14,15,16. These results were later confirmed by Deste-
fani and Ulloa.21 The contribution of the coaction of the
hyperfine interaction and the electron-phonon interac-
tion to longitudinal spin decoherence was calculated in
Refs. 27 and 28. In contrast to the longitudinal spin de-
coherence time, there are relatively fewer works on the
transversal spin decoherence time, T2, also referred to as
the spin dephasing time (while the longitudinal spin de-
coherence time is referred to as the spin relaxation time
for short). The spin dephasing time due to the Dressel-
haus and/or the Rashba SOC together with the electron-
phonon interaction was studied by Semenov and Kim29
and by Golovach et al..20 The contributions of the hyper-
fine interaction and the g-factor fluctuation were studied
in Refs. 30,31,32,33,34,35,36,37,38,39,40,41,42,43,44 and
in Ref. 45 respectively. However, a quantitative calcula-
tion of electron spin decoherence induced by the direct
spin-phonon coupling due to phonon-induced strain in
QDs is still missing. This is one of the issues we are
going to present in this paper. In brief, the spin re-
laxation/dephasing due to various mechanisms has been
studied previously in many theoretical works. However,
almost all of these works only focus individually on one
mechanism. Khaetskii and Nazarov discussed the ef-
fects of different mechanisms on the spin relaxation time.
Nevertheless, their results are only qualitative and there
is no comparison of the relative importance of the dif-
http://arxiv.org/abs/0704.0148v5
ferent mechanisms.13,14,15 Recently, Semenov and Kim
discussed various mechanisms contributed to the spin
dephasing,46 where they gave a “phase diagram” to in-
dicate the most important spin dephasing mechanism in
Si QD where the SOC is not important. However, the
SOC is very important in GaAs QDs. To fully under-
stand the microscopic mechanisms of spin relaxation and
dephasing, and to achieve control over the spin coherence
in QDs,47,48,49 one needs to gain insight into the relative
importance of each mechanism to T1 and T2 under vari-
ous conditions. This is one of the main purposes of this
paper.
Another issue we are going to address relates to differ-
ent approaches used in the study of the spin relaxation
time. The Fermi-Golden-rule approach, which is widely
used in the literature, can be used in calculation of the re-
laxation time τi→f between any initial state |i〉 and final
state |f〉.12,13,14,15,16,17,18,19,21,23,24,25,27,28,50,51,52 How-
ever, the problem is that when the process of the spin
relaxation relates to many states, (e.g., when tempera-
ture is high, the electron can distribute over many states),
one should find a proper way to average over the relax-
ation times (τi→f ) of the involved processes to give the
total spin relaxation time (T1). What makes it difficult
in GaAs QDs, is that all the states are impure spin states
with different expectation values of spin. In the existing
literature, spin relaxation time is given by the average of
the relaxation times of processes from the initial state |i〉
to the final state |f〉 (with opposite majority spin of |i〉)
weighted by the distribution of the initial states fi,
18,51,52
i.e.,
T−11 =
i→f . (1)
This is a good approximation in the limit of small SOC
as each state only carries a small amount of minority
spin. However, when the SOC is very strong which hap-
pens at high levels, it is difficult to find the proper way
to perform the average. We will show that Eq. (1) is
not adequate any more. Thus, to investigate both T1
and T2 at finite temperature for arbitrary strength of
SOC, we develop an equation-of-motion approach for the
many-level system via projection operator technique56 in
the Born approximation. With the rotating wave ap-
proximation, we obtain a formal solution to the equation
of motion. By assuming a proper initial distribution,
we can calculate the evolution of the expectation value
of spin. We thus obtain the spin relaxation/dephasing
time by the 1/e decay of the expectation value of spin
operator 〈Sz〉 or |〈S+〉| (to its equilibrium value), with
S+ ≡ Sx+ iSy. With this approach, we are able to study
spin relaxation/dephasing for various temperature, SOC
strength, and magnetic field.
For quantum information processing based on electron
spin in QDs, the quantum phase coherence is very im-
portant. Thus, the spin dephasing time is a more rel-
evant quantity. There are two kinds of spin dephasing
times: the ensemble spin dephasing time T ∗2 and the
irreversible spin dephasing time T2. For a direct mea-
surement of an ensemble of QDs58 or an average over
many measurements at different times where the config-
urations of the environment have been changed,59,60,61
it gives the ensemble spin dephasing time T ∗2 . The ir-
reversible spin dephasing time T2 can be obtained by
spin echo measurement.60,61 A widely discussed source
which leads to both T ∗2 and T2 is the hyperfine interac-
tion between the electron spin and the nuclear spins of
the lattice. As it has been found that T ∗2 is around 10 ns,
which is too short and makes a practical quantum infor-
mation processing difficult in electron spin based qubits
in QDs. Thus a spin echo technique is needed to remove
the free induction decay and to elongate the spin dephas-
ing time. Fortunately, this technique has been achieved
first by Petta et al. for two electron triplet-singlet system
and then by Koppens et al. for a single electron spin sys-
tem. The achieved spin dephasing time is ∼ 1 µs, which
is much longer than T ∗2 . We therefore discuss only the ir-
reversible spin dephasing time T2 throughout the paper,
i.e., we do not consider the free induction decay in the
hyperfine-interaction-induced spin dephasing.
It is further noticed that Golovach et al. have shown
that the spin dephasing time T2 is two times the spin
relaxation time T1.
20 However, as temperature increases,
this relation does not hold. Semenov and Kim on the
other hand reported that the spin dephasing time is much
smaller than the spin relaxation time.29 In this paper,
we calculate the temperature dependence of the ratio of
the spin relaxation time to the spin dephasing time and
analyze the underlying physics.
This paper is organized as follows: In Sec. II, we
present our model and formalism of the equation-of-
motion approach. We also briefly introduce all the spin
decoherence mechanisms considered in our calculations.
In Sec. III we present our numerical results to indicate
the contribution of each spin decoherence mechanism to
spin relaxation/dephasing time under various conditions
based on the equation-of-motion approach. Then we
study the problem of how to obtain the spin relaxation
time from the Fermi Golden rule when many levels are
involved in Sec. IV. The temperature dependence of T1
and T2 is investigated in Sec. V. We conclude in Sec. VI.
II. MODEL AND FORMALISM
A. Model and Hamiltonian
We consider a QD system, where the QD is confined
by a parabolic potential Vc(x, y) =
m∗ω20(x
2 + y2) in
the quantum well plane. The width of the quantum well
is a. The external magnetic field B is along z direction,
except in Sec. IV. The total Hamiltonian of the system
of electron together with the lattice is:
HT = He +HL +HeL , (2)
where He, HL, HeL are the Hamiltonians of the elec-
tron, the lattice and their interaction, respectively. The
electron Hamiltonian is given by
+ Vc(r) +HZ +HSO (3)
where P = −i~∇+ e
A with A = (B⊥/2)(−y,x) (B⊥ is
the magnetic field along z direction), HZ =
gµBB · σ
is the Zeeman energy with µB the Bohr magneton,
and HSO is the Hamiltonian of SOC. In GaAs, when
the quantum well width is small or the gate-voltage
along the growth direction is small, the Rashba SOC is
unimportant.53 Therefore, only the Dresselhaus term10
contributes to HSO. When the quantum well width is
smaller than the QD radius, the dominant term in the
Dresselhaus SOC reads
Hso =
〈P 2z 〉λ0(−Pxσx + Pyσy) , (4)
with γ0 denoting the Dresselhaus coefficient, λ0 being
the quantum well subband index of the lowest one and
〈P 2z 〉λ ≡ −~2
ψ∗zλ(z)∂
2/∂z2ψzλ(z)dz. The Hamiltonian
of the lattice consists of two parts HL = Hph +Hnuclei,
where Hph =
qη ~ωqηa
qηaqη (a
†/a is the phonon cre-
ation/annihilation operator) describes the vibration of
the lattice and Hnuclei =
j γIB · Ij (γI is the gyro-
magnetic ratios of the nuclei and Ij is the spin of the j-
th nucleus) describes the precession of the nuclear spins
of the lattice in the external magnetic field. We focus
on the spin dynamics due to hyperfine interaction at a
time scale much smaller than the nuclear dipole-dipole
correlation time (10−4 s in GaAs33,40), where the nuclear
dipole-dipole interaction can be ignored. Under this ap-
proximation, the equation of motion for the reduced elec-
tron system can be obtained which only depends on the
initial distribution of the nuclear spin bath.33 The in-
teraction between the electron and the lattice also has
two parts HeL = HeI + He−ph, where HeI is the hy-
perfine interaction between the electron and nuclei and
He−ph represents the electron-phonon interaction which
is further composed of the electron–bulk-phonon (BP)
interaction Hep, the direct spin-phonon coupling due to
the phonon-induced strain Hstrain and phonon-induced
g-factor fluctuation Hg.
B. Equation-of-motion approach
The equations of motion can describe both the co-
herent and the dissipative dynamics of the electron sys-
tem. When the quasi-particles of the bath relax much
faster than the electron system, the Markovian approx-
imation can be made; otherwise the kinetics is the non-
Markovian. For electron-phonon coupling, due to the
fast relaxation of the phonon bath and the weak electron-
phonon scattering, the kinetics of the electron is Marko-
vian. Nevertheless, as the nuclear spin bath relaxes much
slower than the electron spin, the kinetics due to the
coupling with nuclei is of non-Markovian type.30,32,33 It
is further noted that there is also a contribution from
the coaction of the electron-phonon and electron-nuclei
couplings, which is a fourth order coupling to the bath.
For this contribution, the decoherence of spin is mainly
controlled by the electron-phonon scattering while the
hyperfine (Overhauser) field54 acts as a static magnetic
field. Thus, this fourth order coupling is also Markovian.
Finally, since the electron orbit relaxation is much faster
than the electron spin relaxation,55 we always assume a
thermo-equilibrium initial distribution of the orbital de-
grees of freedom.
Generally, the interaction between the electron and the
quasi-particle of the bath is weak. Therefore the first
Born approximation is adequate in the treatment of the
interaction. Under this approximation, the equation of
motion for the electron system coupled to the lattice envi-
ronment can be obtained with the help of the projection
operator technique.56 We then assume a sudden approxi-
mation so that the initial distribution of the whole system
is ρ(t = 0) = ρe(0) ⊗ ρL(0), where ρe, ρL is the density
matrix of the system and bath respectively. This approx-
imation corresponds to a sudden injection of the electron
into the quantum dot, which is reasonable for genuine
experimental setup.33 As the initial distribution of the
the lattice ρL(0) commutates with the Hamiltonian of
the lattice HL, the equation of motion can be written as
dρe(t)
= − i
[He +TrL(HeLρ
L(0)), ρe(t)]
dτTrL[HeL, U0(τ)(P̂ [HeL, ρe(t− τ)
⊗ρL(0)])U †0 (τ)] , (5)
where ρe(t) is the density operator of the electron sys-
tem at time t, TrL stands for the trace over the lattice
degree of freedom, and U0(τ) = e
−i(HL+He)τ is time-
evolution operator without HeL. P̂ = 1̂ − ρL(0) ⊗ TrL
is the projection operator. The initial distribution of the
phonon system is chosen to be the thermo-equilibrium
distribution.20 It has been shown by previous theoretical
studies that the initial state of the nuclear spin bath is
crucial to the spin dephasing and relaxation.30,32,33 Al-
though it may take a long time (e.g., seconds) for the
nuclear spin system to relax to its thermo-equilibrium
state, one can still assume that its initial state is the
thermo-equilibrium one. This assumption corresponds
to the genuine case of enough long waiting time during
every individual measurement. For a typical setup at
above 10 mK and with about 10 T external magnetic
field, the thermo-equilibrium distribution is a distribu-
tion with equal probability on every state. For these ini-
tial distributions of phonons and nuclear spins, the term
TrL(HeLρ
L(0)) is zero. Thus,
P̂ [HeL, ρe(t−τ)⊗ρL(0)] = [HeL, ρe(t−τ)⊗ρL(0)] . (6)
The equation of motion is then simplified to,
dρe(t)
= − i
[He, ρ
e(t)] − 1
dτTrL[HeL, [H
eL(−τ),
Ue0 (t)ρ
Ie(t− τ)Ue0
(t)ρL(0)]] , (7)
where HIeL and ρ
Ie are the corresponding operators (HeL
and ρe ) in the interaction picture, and Ue0 (t) = e
−iHet
is the time-evolution operator of He. It should be fur-
ther noted that the first Born approximation can not
fully account for the non-Markovian dynamics due to the
hyperfine interaction with nuclear spins.33,57 Only when
the Zeeman splitting is much larger than the fluctuating
Overhauser shift, the first Born approximation is ade-
quate. For GaAs QDs, this requires B ≫ 3.5 T.33 In this
paper, we focus on the study of spin dephasing for the
high magnetic field regime of B > 3.5 T under the first
Born approximation, where the second Born approxima-
tion only affects the long-time behavior.33 Later we will
argue that this correction of long time dynamics changes
the spin dephasing time very little.
1. Markovian kinetics
The kinetics due to the coupling with phonons can be
investigated within the Markovian approximation, where
the equation of motion reduces to,
dρe(t)
= − i
[He, ρ
e(t)]− 1
dτTrph[He−ph,
[HIe−ph(−τ), ρe(t)⊗ ρph(0)]] . (8)
Here Trph is the trace over phonon degrees of freedom
and ρph(0) is the initial distribution of the phonon bath.
Within the basis of the eigen-states of the electron Hamil-
tonian, {|ℓ〉}, the above equation reads,
ρeℓ1ℓ2= −i
(εℓ1 − εℓ2)
ρeℓ1ℓ2
Trp(H
I e−ph
ρeℓ4ℓ2 ⊗ ρ
−HI e−phℓ1ℓ3 ρ
⊗ ρpeqH
) +H.c.
. (9)
Here H
= 〈ℓ1|He−ph|ℓ3〉 and HI e−phℓ1ℓ3 =
〈ℓ1|HIe−ph(−τ)|ℓ3〉. A general form of the electron-
phonon interaction reads
He−ph =
Φqη(aqη + a
−qη)Xqη(r,σ) . (10)
Here, η represents the phonon branch index; Φqη is the
matrix element of the electron-phonon interaction; aqη
is the phonon annihilation operator; Xqη(r,σ) denotes
a function of electron position and spin. Substituting
this into Eq. (9), we obtain, after integration within the
Markovian approximation,49
ρeℓ1ℓ2 = i
(εℓ1 − εℓ2)
ρeℓ1ℓ2
|Φqη|2{Xqηℓ1ℓ3X
ρeℓ4ℓ2
×Cqη(εℓ4 − εℓ3)−X
ρeℓ3ℓ4
×Cqη(εℓ3 − εℓ1)} +H.c.
in which X
= 〈ℓ1|Xqη(r,σ)|ℓ2〉, and Cqη(∆ε) =
n̄(ωqη)δ(∆ε+ωqη)+[n̄(ωqη)+1]δ(∆ε−ωqη). Here n̄(ωqη)
represents the Bose distribution function. Equation (11)
can be written in a more compact form
ρeℓ1ℓ2 = −
Λℓ1ℓ2ℓ3ℓ4ρ
, (12)
which is a linear differential equation. This equation can
be solved by diagonalizing Λ. Given an initial distribu-
tion ρeℓ1ℓ2(0), the density matrix ρ
(t) and the expec-
tation value of any physical quantity 〈O〉t = Tr(Ôρe(t))
at time t can be obtained:49
〈O〉t = Tr(Ôρe)
ℓ1···ℓ6
〈ℓ2|Ô|ℓ1〉P(ℓ1ℓ2)(ℓ3ℓ4)
× e−Γ(ℓ3ℓ4)tP−1
(ℓ3ℓ4)(ℓ5ℓ6)
ρeℓ5ℓ6(0) (13)
with Γ = P−1ΛP being the diagonal matrix and P repre-
senting the transformation matrix. To study spin dynam-
ics, we calculate 〈Sz〉t (|〈S+〉t|) and define the spin relax-
ation (dephasing) time as the time when 〈Sz〉t (|〈S+〉t|)
decays to 1/e of its initial value (to its equilibrium value).
2. Non-Markovian kinetics
Experiments have already shown that for a large en-
semble of quantum dots or for an ensemble of many mea-
surements on the same quantum dot at different times,
the spin dephasing time due to hyperfine interaction is
quite short, ∼ 10 ns.58,59,60,61 This rapid spin dephas-
ing is caused by the ensemble broadening of the preces-
sion frequency due to the hyperfine fields.40 When the
external magnetic field is much larger than the random
Overhauser field, the rotation due to the Overhauser field
perpendicular to the magnetic field is blocked. Only
the broadening of the Overhauser field parallel to the
magnetic field contribute to the spin dephasing. To de-
scribe this free induction decay for this high magnetic
field case, we write the hyperfine interaction into two
parts: HeI = h · S = HeI1 +HeI2. Here h = (hx, hy, hz)
and S = (Sx, Sy, Sz) are the Overhauser field and the
electron spin respectively. HeI1 = hzSz and HeI2 =
(h+S− + h−S+) with h± = hx ± ihy. The longitudi-
nal part HeI1 is responsible for the free induction de-
cay, while the transversal part HeI2 is responsible for
high order irreversible decay. As the rapid free induction
decay can be removed by spin echo,60,61 elongating the
spin dephasing time to ∼ 1 µs which is more favorable
for quantum computation and quantum information pro-
cessing, we then discuss only the irreversible decay. We
first classify the states of the nuclear spin system with its
polarization. Then we reconstruct the states within the
same class to make it spatially uniform. These uniformly
polarized pure states, |n〉’s, are eigen-states of hz. They
also form a complete-orthogonal basis of the nuclear spin
system. A formal expression of |n〉 is33
|n〉 =
m1···mN
αnm1···mN
|I,mj〉 . (14)
Here |I,mj〉 denotes the eigen-state of the z-component
of the j-th nuclear spin Ijz with the eigenvalue ~mj. N
denotes the number of the nuclei. The equation of motion
for the case with initial nuclear spin state ρns1 (0) = |n〉〈n|
is given by33
dρe(t)
[He +Trns(HeIρ
1 (0)), ρ
e(t)]
dτTrns[HeI2, U
0 (τ)
×[HeI2, ρe(t− τ)⊗ ρns1 (0)]UeI0
(τ)] . (15)
As in traditional projection operator technique, the dy-
namics of the nuclear spin subsystem is incorporated
self-consistently in the last term.33,56 Here Trns is the
trace over nuclear spin degrees of freedom. UeI0 (τ) =
exp[−iτ(He+HI +HeI1)]. The Overhauser field is given
by h =
j Av0Ijδ(r − Rj), where the constants A and
v0 are given later. Ij and Rj are the spin and posi-
tion of j-th nucleus respectively. As mentioned above,
the initial state of the nuclear spin bath is chosen to be
a state with equal probability of each state, therefore
ρns(0) =
n 1/Nw|n〉〈n|, with Nw =
n 1 being the
number of states of the basis {|n〉}. To quantify the ir-
reversible decay, we calculate the time evolution of S
for every case with initial nuclear spin state |n〉. We then
sum over n and obtain
||〈S+〉t|| =
|〈S(n)+ 〉t|. (16)
It is noted that the summation is performed after the
absolute value of 〈S(n)+ 〉t. Therefore, the destructive in-
terference due to the difference in precession frequency
ωzn, which originates from the longitudinal part of the
hyperfine interaction (HeI1), is removed. We thus use
1/e decay of the envelope of ||〈S+〉t|| to describe the irre-
versible spin dephasing time T2. Similar description has
been used in the irreversible spin dephasing in semicon-
ductor quantum wells62 and the irreversible inter-band
optical dephasing in semiconductors.63,64
Expanding Eq. (15) in the basis of {|n〉}, one obtains,
ρeℓ1ℓ2= −
(εℓ1δℓ1ℓ3 +H
nℓ1;nℓ3
)ρeℓ3ℓ2
−ρeℓ1ℓ3(εℓ3δℓ3ℓ2 +H
nℓ3;nℓ2
[HeI2nℓ1;n1ℓ3H
I eI2
n1ℓ3;nℓ4
ρeℓ4ℓ2(t− τ)
−HI eI2nℓ1;n1ℓ3ρ
(t− τ)HeI2n1ℓ4;nℓ2 ] +H.c.
. (17)
Here HeI2nℓ1;n1ℓ3 = 〈nℓ1|HeI2|n1ℓ3〉 and H
I eI2
nℓ1;n1ℓ3
〈nℓ1|HIeI2(−τ)|n1ℓ3〉. For simplicity, we neglect the
terms concerning different orbital wavefunctions which
are much smaller. For small spin mixing, assuming an
equilibrium distribution in orbital degree of freedom, un-
der rotating wave approximation, and trace over the or-
bital degree of freedom, we finally arrive at
〈S(n)+ 〉t = iωzn〈S
+ 〉t −
fk([h+]knn′
× [h−]kn′n + [h−]knn′ [h+]kn′n)
× exp[iτ(ωkn − ωkn′)]}〈S(n)+ 〉t−τ . (18)
Here ωzn =
k fk(Ezk/~ + ωkn) with Ezk representing
the electron Zeeman splitting of the k-th orbital level.
[hi]knn′ = 〈n|〈k|hi|k〉|n′〉 (i = ±, z). ωkn = [hz]knn + ǫnz
with ǫnz denoting the nuclear Zeeman splitting which is
very small and can be neglected. By solving the above
equation, we obtain |〈S(n)+ 〉t| for a given |n〉. We then sum
over n and determine the irreversible spin dephasing time
T2 as 1/e decay of the envelop of ||〈S+〉t||. By noting that
only the polarization of nuclear spin state |n〉 determines
the evolution of |〈S(n)+ 〉t|, the summation over n is then
reduced to summation over polarization which becomes a
integration for large N . This integration can be handled
numerically.
In the limiting case of zero SOC and very low tem-
perature, only the lowest two Zeeman sublevels are con-
cerned. The equation for 〈S+〉t with initial nuclear spin
state ρns1 (0) = |n〉〈n| reduce to
〈S+〉t = iωzn〈S+〉t −
([h+]nn′
× [h−]n′n + [h−]nn′ [h+]n′n) exp[iτ(ωn − ωn′)]}〈S+〉t−τ
= iωz〈S+〉t −
dτΣ(τ)〈S+〉t−τ . (19)
In this equation ωzn = (gµBB + [hz]nn′)/~, [hξ]nn′ =
〈n|〈ψ1|hξ|ψ1〉|n′〉 (ξ = ±, z and ψ1 is the orbital quantum
number of the ground state), and ωn = [hz]nn. Similar
equation has been obtained by Coish and Loss,33 and
later by Deng and Hu35 at very low temperature such
that only the lowest two Zeeman sublevels are considered.
Coish and Loss also presented an efficient way to evaluate
Σ(τ) in terms of their Laplace transformations, Σ(s) =
dτe−sτΣ(τ). They gave,
Σ(s) =
([h+]nn′ [h−]n′n
+ [h−]nn′ [h+]n′n)/(s− iδωnn′) , (20)
with δωnn′ =
(ωn − ωn′). With the help of this tech-
nique, we are able to investigate the spin dephasing due
to the hyperfine interaction.
C. Spin decoherence mechanisms
In this subsection we briefly summarize all the spin de-
coherence mechanisms. It is noted that the SOC modifies
all the mechanisms. This is because the SOC modifies the
Zeeman splitting18 and the spin-resolved eigen-states of
the electron Hamiltonian, it hence greatly changes the
effect of the electron-BP scattering.18 These two modifi-
cations, especially the modification of the Zeeman split-
ting, also change the effect of other mechanisms, such
as, the direct spin-phonon coupling due to the phonon-
induced strain, the g-factor fluctuation, the coaction of
the electron-phonon interaction and the hyperfine inter-
action. In the literature, except for the electron-BP scat-
tering, the effects from the SOC are neglected except for
the work by Woods et al.16 in which the spin relaxation
time between the two Zeeman sub-levels of the lowest
electronic state due to the phonon-induced strain is in-
vestigated. However, the perturbation method they used
does not include the important second-order energy cor-
rection. In our investigation, the effects of the SOC are
included in all the mechanisms and we will show that
they lead to marked effects in most cases.
1. SOC together with electron-phonon scattering
As the SOC mixes different spins, the electron-BP scat-
tering can induce spin relaxation and dephasing. The
electron-BP coupling is given by
Hep =
Mqη(aqη + a
−qη)e
iq·r , (21)
where Mqη is the matrix element of the electron-phonon
interaction. In the general form of the electron phonon
interaction He−ph, Φqη = Mqη and Xqη(r,σ) = e
iq·r.
|Mqsl|2 = ~Ξ2q/2ρvslV for the electron-BP coupling due
to the deformation potential. For the piezoelectric cou-
pling, |Mqpl|2 = (32~π2e2e214/κ2ρvslV )[(3qxqyqz)2/q7]
for the longitudinal phonon mode and
j=1,2 |Mqptj |2 =
[32~π2e2e214/(κ
2ρvstq
5V )][q2xq
y + q
z + q
(3qxqyqz)
2/q2] for the two transverse modes. Here
Ξ stands for the acoustic deformation potential; ρ is the
GaAs volume density; V is the volume of the lattice;
e14 is the piezoelectric constant and κ denotes the
static dielectric constant. The acoustic phonon spectra
ωqql = vslq for the longitudinal mode and ωqpt = vstq
for the transverse mode with vsl and vst representing
the corresponding sound velocities.
Besides the electron-BP scattering, electron also cou-
ples to vibrations of the confining potential, i.e., the
surface-phonons,28
δV (r) = −
2ρωqηV
(aqη + a
−qη)ǫqη · ∇rVc(r) ,
in which ǫqη is the polarization vector of a phonon mode
with wave-vector q in branch η. However, this contri-
bution is much smaller than the electron-BP coupling.
Compared to the coupling due to the deformation poten-
tial for example, the ratio of the two coupling strengths is
≈ ~ω0/Ξql0 , where l0 is the characteristic length of the
quantum dot and ~ω0 is the orbital level splitting. The
phonon wave-vector q is determined by the energy differ-
ence between the final and initial states of the transition.
Typically phonon transitions between Zeeman sublevels
and different orbital levels, ql0 ranges from 0.1 to 10.
Bearing in mind that ~ω0 is about 1 meV while Ξ = 7 eV
in GaAs, ~ω0/Ξql0 is about 10
−3. The piezoelectric cou-
pling is of the same order as the deformation potential.
Therefore spin decoherence due to the electron–surface-
phonon coupling is negligible.
2. Direct spin-phonon coupling due to phonon-induced
strain
The direct spin-phonon coupling due to the phonon-
induced strain is given by65
Hstrain =
s(p) · σ , (23)
where hsx = −Dpx(ǫyy − ǫzz), hsy = −Dpy(ǫzz − ǫxx) and
hsz = −Dpz(ǫxx− ǫyy) with p = (px, py, pz) = −i~∇ and
D being the material strain constant. ǫij (i, j = x, y, z)
can be expressed by the phonon creation and annihilation
operators:
ǫij =
qη=l,t1,t2
2ρωqηV
(aq,η + a
−q,η)(ξiηqj
+ ξjηqi)e
iq·r , (24)
in which ξil = qi/q for the longitudinal phonon
mode and (ξxt1 , ξyt1 , ξzt1) = (qxqz, qyqz,−q2‖)/qq‖,
(ξxt2 , ξyt2 , ξzt2) = (qy,−qx, 0)/q‖ for the two trans-
verse phonon modes with q‖ =
q2x + q
y . There-
fore, in the general form of electron-phonon interaction
He−ph, Φqη = −iD
~/(32ρωqηV ) and Xqη(r,σ) =
ijk ǫijk(ξjηqj − ξkηqk)pieiq·rσi with ǫijk denoting the
Levi-Civita tensor.
3. g-factor fluctuation
The spin-lattice interaction via phonon modulation of
the g-factor is given by12
ijkl=x,y,z
AijklµBBiσjǫkl , (25)
where ǫkl is given in Eq. (24) and Aijkl is a
tensor determined by the material. Therefore in
He−ph, Φqη = i
~/(32ρωqηV ) and Xqη(r,σ) =
i,j,k,l Ai,j,k,lµBBi(ξkηqk − ξlηql)σjeiq·r. Due to the
axial symmetry with respect to the z-axis, and keep-
ing in mind that the external magnetic field is along
the z direction, the only finite element of Hg is Hg =
[(A33−A31)ǫzz+A31
i ǫii]~µBBσz/2 with A33 = Azzzz ,
A31 = Azzxx and A66 = Axyxy. A33 + 2A31 = 0.
4. Hyperfine interaction
The hyperfine interaction between the electron and nu-
clear spins is66
HeI(r) =
Av0S · Ijδ(r−Rj) , (26)
where S = ~σ/2 and Ij are the electron and nucleus
spins respectively, v0 = a
0 is the volume of the unit cell
with a0 representing the crystal lattice parameter, and r
(Rj) denotes the position of the electron (the j-th nu-
cleus). A = 4µ0µBµI/(3Iv0) is the hyperfine coupling
constant with µ0, µB and µI representing the perme-
ability of vacuum, the Bohr magneton and the nuclear
magneton separately.
As the Zeeman splitting of the electron is much larger
(three orders of magnitude larger) than that of the nu-
cleus spin, to conserve the energy for the spin relax-
ation processes, there must be phonon-assisted transi-
tions when considering the spin-flip processes. Tak-
ing into account directly the BP induced motion of nu-
clei spin of the lattice leads to a new spin relaxation
mechanism:28
eI−ph(r) = −
Av0S · Ij(u(R0j) ·∇r)δ(r−Rj) , (27)
where u(R0j) =
~/(2ρωqηv0)(aqη + a
qη)ǫqηe
iq·R0
is the lattice displacement vector. Therefore using the
notation of Eq. (10), Φ =
~/(2ρV ωqη) and Xqη =
j Av0S·Ij∇rδ(r−Rj). The second-order process of the
surface phonon and the BP together with the hyperfine
interaction also leads to spin relaxation:
eI−ph(r) = |ℓ2〉
m 6=ℓ1
〈ℓ2|δVc(r)|m〉〈m|HeI (r)|ℓ1〉
εℓ1 − εm
m 6=ℓ2
〈ℓ2|HeI(r)|m〉〈m|δVc(r)|ℓ1〉
εℓ2 − εm
〈ℓ1| , (28)
eI−ph = |ℓ2〉
m 6=ℓ1
〈ℓ2|Hep|m〉〈m|HeI(r)|ℓ1〉
εℓ1 − εm
m 6=ℓ2
〈ℓ2|HeI(r)|m〉〈m|Hep|ℓ1〉
εℓ2 − εm
〈ℓ1| , (29)
in which |ℓ1〉 and |ℓ2〉 are the eigen states ofHe. By using
the notations in He−ph, Φqη =
~/(2ρωqηv0) and
Xqη = |ℓ2〉ǫqη ·
m 6=ℓ1
εℓ1 − εm
〈ℓ2|[He,P]|m〉
× 〈m|S · Ijδ(r−Rj)|ℓ1〉+
m 6=ℓ2
εℓ2 − εm
〈m|[He,P]|ℓ1〉
Av0〈ℓ2|S · Ijδ(r−Rj)|m〉
〈ℓ1| (30)
for V
eI−ph. Similarly Φqη =Mqη and
Xqη = |ℓ2〉
m 6=ℓ1
〈ℓ2|eiq·r|m〉
εℓ1 − εm
Av0〈m|S · Ij
× δ(r−Rj)|ℓ1〉+
m 6=ℓ2
εℓ2 − εm
〈m|eiq·r|ℓ1〉
Av0〈ℓ2|S · Ijδ(r−Rj)|m〉
〈ℓ1| (31)
for V
eI−ph. Again as the contribution from the surface
phonon is much smaller than that of the BP, V
eI−ph can
be neglected. It is noted that, the direct spin-phonon
coupling due to the phonon-induced strain together with
the hyperfine interaction gives a fourth-order scattering
and hence induces a spin relaxation/dephasing. The in-
teraction is
eI−ph = |ℓ2〉
m 6=ℓ1
〈ℓ2|Hzstrain|m〉〈m|HeI(r)|ℓ1〉
εℓ1 − ǫm
m 6=ℓ2
〈ℓ2|HeI(r)|m〉〈m|Hzstrain|ℓ1〉
ǫℓ2 − ǫm
〈ℓ1| , (32)
with Hzstrain = h
sσz/2 only changing the electron energy
but conserving the spin polarization. It can be written
hzs = −
2ρωq,ηV
(ξyηqy−ξzηqz)qzeiq·r . (33)
Comparing this to the electron-BP interaction Eq. (21),
the ratio is ≈ ~Dq/Ξ, which is about 10−3. Therefore,
the second-order term of the direct spin-phonon coupling
due to the phonon-induced strain together with the hy-
perfine interaction is very small and can be neglected.
Also the coaction of the g-factor fluctuation and the hy-
perfine interaction is very small compared to that of the
electron-BP interaction jointly with the hyperfine inter-
action as µBB/Ξ is around 10
−5 when B = 1 T. There-
fore it can also be neglected. In the following, we only
retain the first and the third order terms V
eI−ph and
eI−ph in calculating the spin relaxation time.
The spin dephasing time induced by the hyperfine in-
teraction can be calculated from the non-Markovian ki-
netic Eq. (18), for unpolarized initial nuclear spin state
|n0〉, resulting in
〈S(n0)+ 〉t ∝
dr|ψk(r)|4 cos(
|ψk(r)|2t) ,
where fk is the thermo-equilibrium distribution of the
orbital degree of freedom. When only the lowest two
Zeeman sublevels are considered, assuming a simple
form of the wavefunction, |Ψ(r)|2 = 1
exp(−r2
/d20)
with d‖/az representing the QD diameter/quantum well
width, and r‖ = x
2 + y2, the integration can be carried
〈S(n0)+ 〉t ∝
cos(t/t0)− 1
(t/t0)2
sin(t/t0)
. (35)
Here, t0 = (2πazd
‖)/(Av0) determines the spin dephas-
ing time. Note that t0 is proportional to the factor azd
where az/d
is the characteristic length/area of the QD
along z direction / in the quantum well plane. By solving
Eq. (18) for various n, and summing over n, we obtain
||〈S+〉t|| =
n |〈S
+ 〉t|. We then define the time when
the envelop of ||〈S+〉t|| decays to 1/e of its initial value
as the spin dephasing time T2. As mentioned above the
hyperfine interaction can not transfer an energy of the
order of the Zeeman splitting, thus the hyperfine inter-
action alone can not lead to any spin relaxation.43
In the above discussion, the nuclear spin dipole-dipole
interaction is neglected. Recently, more careful exami-
nations based on quantum cluster expansion method or
pair correlationmethod have been performed.41,42,43,47 In
these works, the nuclear spin dipole-dipole interaction is
also included. This interaction together with the hyper-
fine mediated nuclear spin-spin interaction is the origin
of the fluctuation of the nuclear spin bath. To the lowest
order, the fluctuation is dominated by nuclear spin pair
flips.41,42,43,47 This fluctuation provides the source of the
electron spin dephasing, as the electron spin is coupled
to the nuclear spin system via hyperfine interaction. Our
method used here includes only the hyperfine interaction
to the second order in scattering. However, it is found
that the dipole-dipole-interaction–induced spin dephas-
ing is much weaker than the hyperfine interaction for a
QD with a = 2.8 nm and d0 = 27 nm until the par-
allel magnetic field is larger than ∼ 20 T.42 Therefore,
for the situation in this paper, the nuclear dipole-dipole-
interaction–induced spin dephasing can be ignored.67
III. SPIN DECOHERENCE DUE TO VARIOUS
MECHANISMS
Following the equation-of-motion approach developed
in Sec. II, we perform a numerical calculation of the spin
relaxation and dephasing times in GaAs QDs. Two mag-
netic field configurations are considered: i.e., the mag-
netic fields perpendicular and parallel to the well plane
(along x-axis). The temperature is taken to be T = 4
K unless otherwise specified. For all the cases we con-
sidered in this manuscript, the orbital level splitting is
larger than an energy corresponding to 40 K. Therefore,
the lowest Zeeman sublevels are mainly responsible for
the spin decoherence. When calculating T1, the initial
distribution is taken to be in the spin majority down
state of the eigen-state of the Hamiltonian He with a
Maxwell-Boltzmann distribution fk = C exp[−ǫk/(kBT )]
for different orbital levels (C is the normalization con-
stant). For the calculation of T2, we assign the same
distribution between different orbital levels, but with a
superposition of the two spin states within the same or-
bital level. The parameters used in the calculation are
listed in Table I.8,68,69
TABLE I: Parameters used in the calculation
ρ 5.3× 103 kg/m3 κ 12.9
vst 2.48 × 10
3 m/s g −0.44
vsl 5.29 × 10
3 m/s Ξ 7.0 eV
e14 1.41 × 10
9 V/m m∗ 0.067m0
A 90 µeV A33 19.6
γ0 27.5 Å
3·eV I 3
D 1.59 × 104 m/s a0 5.6534 Å
A. Spin Relaxation Time T1
We now study the spin relaxation time and show how
it changes with the well width a, the magnetic field B
and the effective diameter d0 =
~π/m∗ω0. We also
compare the relative contributions from each relaxation
mechanism.
1. Well width dependence
In Fig. 1(a) and (b), the spin relaxation times induced
by different mechanisms are plotted as function of the
width of the quantum well in which the QD is confined
for perpendicular magnetic field B⊥ = 0.5 T and parallel
magnetic field B‖ = 0.5 T respectively. We first concen-
trate on the perpendicular magnetic field case. In Fig.
1(a), the calculation indicates that the spin relaxation
due to each mechanism decreases with the increase of well
g-factor
strain
eI−ph
eI−ph
B⊥ = 0.5 T
a (nm)
1098765432
10−10
B‖ = 0.5 T
a (nm)
1098765432
10−10
FIG. 1: (Color online) T−11 induced by different mechanisms
vs. the well width for (a): perpendicular magnetic field B⊥ =
0.5 T with (solid curves) and without (dashed curves) the
SOC; (b) parallel magnetic field B‖ = 0.5 T with the SOC.
The effective diameter d0 = 20 nm, and temperature T = 4 K.
Curves with � — T−11 induced by the electron-BP scattering
together with the SOC; Curves with • — T−11 induced by
the second-order process of the hyperfine interaction together
with the BP (V
eI−ph); Curves with N — T
1 induced by the
first-order process of the hyperfine interaction together with
the BP (V
eI−ph); Curves with H — T
1 induced by the direct
spin-phonon coupling due to phonon-induced strain; Curves
with � — T−11 induced by the g-factor fluctuation.
width. Particularly the electron-BP scattering mecha-
nism decreases much faster than the other mechanisms.
It is indicated in the figures that when the well width is
small (smaller than 7 nm in the present case), the spin re-
laxation time is determined by the electron-BP scattering
together with the SOC. However, for wider well widths,
the direct spin-phonon coupling due to phonon-induced
strain and the first-order process of hyperfine interac-
tion combined with the electron-BP scattering becomes
more important. The decrease of spin relaxation due to
each mechanism is mainly caused by the decrease of the
SOC which is proportional to a−2. The SOC has two
effects which are crucial. First, in the second order per-
turbation the SOC contributes a finite correction to the
Zeeman splitting which determines the absorbed/emitted
phonon frequency and wave-vector.18 Second, it leads to
spin mixing. The decrease of the SOC thus leads to the
decrease of Zeeman splitting and spin mixing. The for-
mer leads to small phonon wave-vector and small phonon
absorption/emission efficiency.18 Therefore the electron-
BP mechanism decreases rapidly with increasing a. On
the other hand, the other two largest mechanisms can flip
spin without the help of the SOC. The spin relaxations
due to these two mechanisms decrease in a relatively mild
way. It is further confirmed that without SOC they de-
creases in a much milder way with increasing a (dashed
curves in Fig. 1). It is also noted that the spin relaxation
rate due to the g-factor fluctuation is at least six orders
of magnitude smaller than that due to the leading spin
decoherence mechanisms and can therefore be neglected.
It is noted that in the calculation, the SOC is always
included as it has large effect on the eigen-energy and
eigen-wavefunction of the electron.18 We also show the
spin relaxation times induced by the hyperfine interac-
tions (V
eI−ph and V
eI−ph) and the direct spin-phonon
coupling due to the phonon-induced strain but without
the SOC as in the literature.27,28,45 It can be seen clearly
that the spin relaxation that includes the SOC is much
larger than that without the SOC. For example, the spin
relaxation induced by the second-order process of the hy-
perfine interaction together with the BP (V
eI−ph) is at
least one order of magnitude larger when the SOC is in-
cluded than that when the SOC is neglected. This is
because when the SOC is neglected, 〈m|HeI(r)|ℓ1〉 and
〈ℓ2|HeI(r)|m〉 in Eq. (29) are small as the matrix el-
ements of HeI(r) between different orbital energy lev-
els are very small. However, when the SOC is taken
into account, the spin-up and -down levels with differ-
ent orbital quantum numbers are mixed and therefore
|ℓ〉 and |m〉 include the components with the same or-
bital quantum number. Consequently the matrix ele-
ments of 〈m|HeI(r)|ℓ1〉 and 〈ℓ2|HeI(r)|m〉 become much
larger. Therefore, spin relaxation induced by this mech-
anism depends crucially on the SOC.
It is emphasized from the above discussion that the
SOC should be included in each spin relaxation mecha-
nism. In the following calculations it is always included
unless otherwise specified. In particular in reference to
the mechanism of electron-BP interaction, we always con-
sider it together with the SOC.
We further discuss the parallel magnetic field case. In
Fig. 1(b) the spin relaxation times due to different mech-
anisms are plotted as function of the quantum well width
for same parameters as Fig. 1(a), but with a parallel mag-
netic field B‖ = 0.5 T. It is noted that the spin relaxation
rate due to each mechanism becomes much smaller for
small a compared with the perpendicular case. Another
feature is that the spin relaxation due to each mecha-
nism decrease in a much slower rate with increasing a.
The electron-BP mechanism is dominant even at a = 10
nm but decrease faster than other mechanisms with a. It
is expected to be less effective than the V
eI−ph mecha-
nism or V
eI−ph mechanism or the direct spin-phonon cou-
pling due to phonon-induced strain mechanism for large
enough a. The g-factor fluctuation mechanism is negli-
gible again. These features can be explained as follows.
For parallel magnetic field the contribution of the SOC to
Zeeman splitting is much less than in the perpendicular
magnetic field geometry.21 Moreover, this contribution is
negative which makes Zeeman splitting smaller.21 There-
fore, the phonon absorption/emission efficiency becomes
much smaller for small a, i.e., large SOC. When a in-
creases, the Zeeman splitting increases. However, the
spin mixing decreases. The former effect is weak, and
only cancels part of the latter, thus the spin relaxation
due to each mechanism decrease slowly with a.
g-factor
strain
eI−ph
eI−ph
a = 5 nm
B⊥ (T)
543210
a = 10 nm
B⊥ (T)
543210
FIG. 2: (Color online) T−11 induced by different mechanisms
vs. the perpendicular magnetic field B⊥ for d0 = 20 nm and
(a) a = 5 nm and (b) 10 nm. T = 4 K. Curves with � —
T−11 induced by the electron-BP scattering; Curves with •
— T−11 induced by the second-order process of the hyperfine
interaction together with the BP (V
eI−ph); Curves with N
— T−11 induced by the first-order process of the hyperfine
interaction together with the BP (V
eI−ph); Curves with H
— T−11 induced by the direct spin-phonon coupling due to
phonon-induced strain; Curves with � — T−11 induced by the
g-factor fluctuation.
g-factor
strain
eI−ph
eI−ph
a = 5 nm
B‖ (T)
543210
10−10
a = 10 nm
B‖ (T)
543210
FIG. 3: (Color online) T−11 induced by different mechanisms
vs. the parallel magnetic field B‖ for d0 = 20 nm and (a)
a = 5 nm and (b) 10 nm. T = 4 K. Curves with � —
T−11 induced by the electron-BP scattering; Curves with •
— T−11 induced by the second-order process of the hyperfine
interaction together with the BP (V
eI−ph); Curves with N
— T−11 induced by the first-order process of the hyperfine
interaction together with the BP (V
eI−ph); Curves with H
— T−11 induced by the direct spin-phonon coupling due to
phonon-induced strain; Curves with � — T−11 induced by the
g-factor fluctuation.
2. Magnetic Field Dependence
We first study the perpendicular-magnetic-field case.
The magnetic field dependence of T1 for two different
well widths are shown in Fig. 2(a) and Fig. 2(b). In the
calculation, d0 = 20 nm. It can be seen that the ef-
fect of each mechanism increases with the magnetic field.
Particularly the electron-BP mechanism increases much
faster than other ones and becomes dominant at high
magnetic fields. For small well width (5 nm in Fig. 2a),
the spin relaxation induced by the electron-BP scattering
is dominant except at very low magnetic fields (0.1 T in
the figure) where contributions from the first-order pro-
cess of hyperfine interaction together with the electron-
BP scattering and the direct spin-phonon coupling due to
phonon-induced strain also contribute. It is interesting
to see that when a is increased to 10 nm, the electron-BP
scattering is the largest spin relaxation mechanism only
at high magnetic fields (>1.1 T). For 0.4 T < B⊥ < 1.1
T (B⊥ < 0.4 T), the direct spin-phonon coupling due
to the phonon-induced strain (the first order hyperfine
interaction together with the BP ) becomes the largest
relaxation mechanism. It is also noted that there is no
single mechanism which dominates the whole spin relax-
ation. Two or three mechanisms are jointly responsible
for the spin relaxation. It is indicated that the spin relax-
ations induced by different mechanisms all increase with
B⊥. This can be understood from a perturbation theory:
when the magnetic field is small the spin relaxation be-
tween two Zeeman split states for each mechanism is pro-
portional to n̄(∆E)(∆E)m (∆E is the Zeeman splitting)
with m = 7 for electron-BP scattering due to the de-
formation potential18,25 and for the second-order process
of the hyperfine interaction together with the electron-
BP scattering due to the deformation potential V
eI−ph;
m = 5 for electron-BP scattering due to the piezoelec-
tric coupling15,18,25 and for the second-order process of
the hyperfine interaction together with the electron-BP
scattering due to the piezoelectric coupling V
eI−ph;
27 and
m = 5 for the direct spin-phonon coupling due to phonon-
induced strain;15 m = 1 for the first-order process of the
hyperfine interaction together with the BP V
eI−ph. The
spin relaxation induced by the g-factor fluctuation is pro-
portional to n̄(∆E)(∆E)5B2⊥. For most of the cases stud-
ied, ∆E is smaller than kBT , hence n̄(∆E) ∼ kBT/∆E,
and n̄(∆E)(∆E)m ∼ (∆E)m−1. m > 1 hold for all mech-
anism except the V
eI−ph mechanism, therefore the spin
relaxation due to these mechanisms increases with in-
creasing B⊥. However, from Eq. (27) one can see that
it has a term with ∇r, which indicates that the effect of
this mechanism is proportional to 1/d0. As the vector
potential of the magnetic field increases the confinement
of the QD and gives rise to smaller effective diameter d0,
this mechanism also increases with the magnetic field in
the perpendicular magnetic field geometry.
We then study the case with the magnetic field par-
allel to the quantum well plane. In Fig. 3 the spin re-
laxation induced by different mechanisms are plotted as
function of the parallel magnetic field B‖ for two dif-
ferent well widths. In the calculation, d0 = 20 nm. It
can be seen that, similar to the case with perpendicular
magnetic field, the effects of most mechanisms increase
with the magnetic field. Also the electron-BP mechanism
increases much faster than the other ones and becomes
dominant at high magnetic fields. However, without the
orbital effect of the magnetic field in the present con-
figuration, the effect of V
eI−ph changes very little with
the magnetic field. For both small (5 nm in Fig. 3(a))
and large (10 nm in Fig. 3(b)) well widths, the electron-
BP scattering is dominant except at very low magnetic
field (0.1 T in the figure), where the first-order process of
the hyperfine interaction together with the electron-BP
interaction V
eI−ph also contributes.
a = 5 nm
d0 (nm)
3025201510
g-factor
strain
eI−ph
eI−ph
a = 10 nm
d0 (nm)
3025201510
10−10
FIG. 4: (Color online) T−11 induced by different mechanisms
vs. the effective diameter d0 for B⊥ = 0.5 T and (a) a = 5 nm
and (b) 10 nm. T = 4 K. Curves with � — T−11 induced by
the electron-BP scattering; Curves with • – T−11 induced by
the second-order process of the hyperfine interaction together
with the BP (V
eI−ph); Curves with N — T
1 induced by the
first-order process of the hyperfine interaction together with
the BP (V
eI−ph); Curves with H — T
1 induced by the direct
spin-phonon coupling due to phonon-induced strain; Curves
with � — T−11 induced by the g-factor fluctuation.
3. Diameter Dependence
We now turn to the investigation of the diameter de-
pendence of the spin relaxation. We first concentrate on
the perpendicular magnetic field geometry. The spin re-
laxation rate due to each mechanism is shown in Fig. 4a
for a small (a = 5 nm) and Fig. 4b for a large (a = 10
nm) well widths respectively with a fixed perpendicu-
lar magnetic field B⊥ = 0.5 T. In the figure, the spin
relaxation rate due each mechanism except V
eI−ph in-
creases with the effective diameter. Specifically, the ef-
fect of the electron-BP mechanism increases very fast,
while the effect of the direct spin-phonon coupling due to
phonon-induced strain mechanism increases very mildly.
The V
eI−ph decreases with d0 slowly. Other mechanisms
are unimportant. The electron-BP mechanism eventu-
ally dominates spin relaxation when the diameter is large
a = 5 nm
d0 (nm)
3025201510
10−10
g-factor
strain
eI−ph
eI−ph
a = 10 nm
d0 (nm)
3025201510
FIG. 5: (Color online) T−11 induced by different mechanisms
vs. the effect diameter d0 with B‖ = 0.5 T and (a) a = 5 nm
and (b) 10 nm. T = 4 K. Curves with � — T−11 induced by
the electron-BP scattering; Curves with • — T−11 induced by
the second-order process of the hyperfine interaction together
with the BP (V
eI−ph); Curves with N — T
1 induced by the
first-order process of the hyperfine interaction together with
the BP (V
eI−ph); Curves with H — T
1 induced by the direct
spin-phonon coupling due to phonon-induced strain; Curves
with � — T−11 induced by the g-factor fluctuation.
enough. The threshold increases from 12 nm to 26 nm
when the well width increases from 5 nm to 10 nm. For
small diameter the V
eI−ph and the direct spin-phonon
coupling due to phonon-induced strain mechanism dom-
inate the spin relaxation. The increase/decrease of the
spin relaxation due to these mechanisms can be under-
stood from the following. The effect of the SOC on the
Zeeman splitting is proportional to d20 for small magnetic
field.18 The increase of d0 thus leads to a increase of Zee-
man splitting, therefore the efficiency of the phonon ab-
sorption/emission increases. Another effect is that the in-
crease of d0 will increase the phonon absorption/emission
efficiency due to the increase of the form factor.18 Thus
the spin relaxation increases. Moreover, the spin mix-
ing is also proportional to d0 also.
18 This leads to much
faster increasing of the effect of the electron-BP mecha-
nism and the V
eI−ph mechanism. However, the spin re-
laxation due to V
eI−ph decreases with the diameter. This
is because V
eI−ph contains a term ∇r [Eq. (27)] which
decreases with the increase of d0. Physically speaking,
the decrease of the effect of V
eI−ph is due to the fact
that the spin mixing due to the hyperfine interaction de-
creases with the increase of the number of nuclei within
the dot N as the random Overhauser field is proportional
to 1/
N . The spin relaxation induced by the g-factor is
also negligible here for both small and large well width.
We then turn to the parallel magnetic field case. In
the calculation, B‖ = 0.5 T. The results are shown for
both small well width (a = 5 nm in Fig. 5(a)) and large
well width (a = 10 nm in Fig. 5(b)) respectively. Simi-
lar to the perpendicular magnetic field case, the effect of
every mechanism except the V
eI−ph mechanism increases
with increasing diameter. The effect of the electron-BP
mechanism increases fastest and becomes dominant for
d0 > 12 nm for both small and large well width. For
d0 < 12 nm for the two cases the first-order process
of the V
eI−ph mechanism becomes dominant. The ef-
fect of the V
eI−ph mechanism become larger than that of
the direct spin-phonon coupling due to phonon-induced
strain mechanism. However, these two mechanism are
still unimportant and becomes more and more unimpor-
tant for larger d0. Here, the spin relaxation induced by
the g-factor is negligible.
4. Comparison with Experiment
In this subsection, we apply our analysis to experiment
data in Ref. 7. We first show that our calculation is in
good agreement with the experimental results. Then we
compare contributions from different mechanisms to spin
relaxation as function of the magnetic field. In the cal-
culation we choose the quantum dot diameter d0 = 56
nm (~ω0 = 1.1 meV as in experiment). The quantum
well is taken to be an infinite-depth well with a = 13
nm. The Dresselhaus SOC parameter γ0〈k2z〉 is taken
to be 4.5 meV·Åand the Rashba SOC parameter is 3.3
meV·Å. T = 0 K as kBT ≪ gµBB in the experiment.
The magnetic field is applied parallel to the well plane in
[110]-direction. The Dresselhaus cubic term is also taken
into consideration. All these parameters are the same
with (or close to) those used in Ref. 24 in which a cal-
culation based on the electron-BP scattering mechanism
agrees well with the experimental results. For this mech-
anism, we reproduce their results. The spin relaxation
time measured by the experiments (black dots with er-
ror bar in the figure) almost coincide with the calculated
spin relaxation time due to the electron-BP scattering
mechanism (curves with � in the figure).71 It is noted
from the figure that other mechanisms are unimportant
for small magnetic field. However, for large magnetic
field the effect of the direct-spin phonon coupling due
to phonon-induced strain becomes comparable with that
of the electron-BP mechanism. At B‖ = 10 T, the two
differs by a factor of ∼ 5.
g-factor
strain
eI−ph
eI−ph
B‖ (T)
15129630
10−10
10−12
FIG. 6: (Color online) T−11 induced by different mechanisms
vs. the parallel magnetic field B‖ in the [110] direction for
d0 = 56 nm and a = 13 nm with both the Rashba and Dres-
selhaus SOCs. T = 0 K. The black dots with error bar is the
experimental results in Ref. 7. Curves with � — T−11 induced
by the electron-BP scattering; Curves with • — T−11 induced
by the second-order process of the hyperfine interaction to-
gether with the BP (V
eI−ph); Curves with N — T
1 induced
by the first-order process of the hyperfine interaction together
with the BP (V
eI−ph); Curves with H — T
1 induced by the
direct spin-phonon coupling due to phonon-induced strain;
Curves with � — T−11 induced by the g-factor fluctuation.
B. Spin Dephasing Time T2
In this subsection, we investigate the spin dephasing
time for different well widths, magnetic fields and QD
diameters. As in the previous subsection, the contribu-
tions of the different mechanisms to spin dephasing are
compared.70 To justify the first Born approximation in
studying the hyperfine interaction induced spin dephas-
ing, we focus mainly on the high magnetic field regime
of B > 3.5 T. A typical magnetic field is 4 T. We also
demonstrate via extrapolation that in the low magnetic
field regime spin dephasing is dominated by the hyperfine
interaction.
1. Well Width Dependence
In Fig. 7 the well width dependence of the spin dephas-
ing induced by different mechanisms is presented under
the perpendicular (a) and parallel (b) magnetic fields. In
the calculations B⊥ = 4 T/B‖ = 4 T and d0 = 20 nm. It
can be seen in both figures that the spin dephasing due to
each mechanism decreases with a. Moreover, the spin de-
phasing due to the electron-BP scattering decreases much
faster than that due to the hyperfine interaction. These
features can be understood as following. The spin de-
phasing due to electron-BP scattering depends crucially
on the SOC. As the SOC is proportional to a−2, the spin
dephasing decreases fast with a. For the hyperfine inter-
action, from Eq. (35) one can deduce that the decay rate
of ||〈S+〉t|| is mainly determined by the factor 1/(azd2‖)
(here az = a), which thus decreases with a, but in a very
mild way. The fast decrease of the electron-BP mecha-
nism makes it eventually unimportant. For the present
perpendicular-magnetic-field case the threshold is around
2 nm. For parallel magnetic field it is even smaller. A
higher temperature may enhance the electron-BP mech-
anism (see discussion in Sec. V) and make it more im-
portant than the hyperfine mechanism. It is noted that
other mechanisms contribute very little to the spin de-
phasing. Thus, in the following discussion, we do not con-
sider these mechanisms. Comparing Figs 7(a) and (b),
one finds that a main difference is that the electron-BP
mechanism is less effective for the parallel-magnetic-field
case. As has been discussed in the previous subsection,
the spin mixing and the Zeeman splitting in the parallel
filed case is smaller than those in the perpendicular field
case. Therefore, the electron-BP mechanism is weakened
markedly.
Similar to Fig. 1, the SOC is always included in the
computation as it has large effect on the eigen-energy and
eigen-wavefunction of the electrons. The spin dephasings
calculated without the SOC for the hyperfine interaction,
the direct spin-phonon coupling due to phonon-induced
strain and the g-factor fluctuation are also shown in Fig.
7(a) as dashed curves. It can be seen from the figure
that for the spin dephasings induced by the direct spin-
phonon coupling due to phonon-induced strain and by
the g-factor fluctuation, the contributions with the SOC
are much larger than those without. This is because when
the SOC is included, the fluctuation of the effective field
induced by both mechanisms becomes much stronger and
more scattering channels are opened. However, what
should be emphasized is that the spin dephasings induced
by the hyperfine interaction with and without the SOC
are nearly the same (the solid and the dashed curves
nearly coincide). That is because the change of the wave-
function Ψ(r) due to the SOC is very small (less than 1
% in our condition) and therefore the factor 1/(azd
‖) is
almost unchanged when the SOC is neglected. Thus the
spin dephasing rate is almost unchanged.
In the inset of Fig. 7(a), the time evolution of ||〈S+〉t||
induced by the hyperfine interaction is shown, with a = 2
nm. It can be seen that ||〈S+〉t|| decays very fast and de-
creases to less than 10 % of its initial value within the
first two oscillating periods. Therefore, T2 is determined
by the first two or three periods of ||〈S+〉t||. Thus the
correction of the long time dynamics due to higher or-
der scattering33 contributes little to the spin dephasing
time. For quantum computation and quantum informa-
tion processing, the initial, e.g., 1 % decay of ||〈S+〉t||
may be more important than the 1/e decay.42,43 Indeed,
the spin dephasing time defined by the exponential fitting
of 1 % decay is short than that defined by the 1/e decay.
However, the two differs less than 5 times. For a rough
B⊥ = 4 T
t (µs)
6543210
a (nm)
1098765432
eI−ph
eI−ph
g-factor
strain
hyperfine
B‖ = 4 T
a (nm)
1098765432
10−10
10−15
10−20
10−25
FIG. 7: (Color online) T−12 induced by different mechanisms
vs. the well width for d0 = 20 nm. T = 4 K. (a): B⊥ = 4 T
with (solid curves) and without (dashed curves) the SOC; (b):
B‖ = 4 T only with the SOC. Curve with � — T
2 induced
by the electron-BP interaction; Curves with • — T−12 induced
by the hyperfine interaction; Curves with H — T−12 induced
by the direct spin-phonon coupling due to phonon-induced
strain; Curves with � — T−12 induced by g-factor fluctuation;
N — T−12 induced by the second-order process of the hyper-
fine interaction together with the BP (V
eI−ph); Curves with �
— T−12 induced by the first-order process of the hyperfine in-
teraction together with the BP (V
eI−ph). The time evolution
of ||〈S+〉t|| induced by the hyperfine interaction with a = 2
nm is shown in the inset of (a).
comparison of contributions from different mechanisms to
spin dephasing where only the order-of-magnitude differ-
ence is concerned (see Figs. 7-9), this difference due to
the definition does not jeopardize our conclusions.
2. Magnetic Field Dependence
We then investigate the magnetic field dependence of
the spin dephasing induced by the electron-BP scatter-
ing and by the hyperfine interaction for two different well
widths (a = 3 nm and a = 5 nm) with both perpendicular
and parallel magnetic field. From Fig. 8(a) and (b) one
hyperfine
B⊥ (T)
87.576.565.554.543.5
B‖ (T)
87.576.565.554.543.5
FIG. 8: (Color online) T−12 induced by the electron-BP scat-
tering and the hyperfine interaction vs. (a): the perpendic-
ular magnetic field B⊥ ; (b): the parallel magnetic field B‖
for a = 3 nm (solid curves) and 5 nm (dashed curves). T = 4
K, and d0 = 20 nm. Curves with � — T
2 induced by the
electron-BP interaction; Curves with • — T−12 induced by the
hyperfine interaction.
can see that the spin dephasing due to the electron-BP
scattering increases with magnetic field, whereas that due
to the hyperfine interaction decreases with magnetic field.
Thus, the electron-BP mechanism eventually dominates
the spin dephasing for high enough magnetic field. The
threshold is Bc⊥ = 4 T / B
= 7 T for a = 3 nm with per-
pendicular/parallel magnetic field. For larger well width,
e.g., a = 5 nm with parallel magnetic field or perpendicu-
lar magnetic field, the threshold magnetic fields increase
to larger than 8 T. The different magnetic field depen-
dences above can be understood as following. Besides
spin relaxation, the spin-flip scattering also contributes
to spin dephasing.20 As has been demonstrated in Sec.
IIIA, the electron-BP scattering induced spin-flip tran-
sition rate increases with the magnetic field. Therefore
the spin dephasing rate increases with the magnetic field
also. In contrast, spin dephasing induced by the hyper-
fine interaction decreases with the magnetic field. This
is because when the magnetic field becomes larger, the
fluctuation of the effective magnetic field due to the sur-
rounding nuclei becomes insignificant compared. There-
fore, the hyperfine-interaction-induced spin dephasing is
reduced. Similar results have been obtained by Deng and
Hu.44
hyperfine
B⊥ = 4 T
d0 (nm)
3025201510
B‖ = 4 T (b)
d0 (nm)
3025201510
FIG. 9: (Color online) T−12 induced by the electron-BP scat-
tering and the hyperfine interaction vs. the effective diameter
d0 T = 4 K. (a): B⊥ = 4 T ; (b): B‖ = 4 T for a = 3 nm
(solid curves) and 5 nm (dashed curves). Curves with � —
T−12 induced by the electron-BP interaction; Curves with •
— T−12 induced by the hyperfine interaction.
3. Diameter Dependence
In Fig. 9 the spin dephasing times induced by the
electron-BP scattering and the hyperfine interaction are
plotted as function of the diameter d0 for a small (a = 3
nm) and a large (a = 5 nm) well widths. In the cal-
culation, B⊥ = 4 T in Fig. 9(a) and B‖ = 4 T in (b).
It is noted that the effect of the electron-BP mechanism
increases rapidly with d0, whereas the effect of the hy-
perfine mechanism decreases slowly. Consequently, the
electron-BP mechanism eventually dominates the spin
dephasing for large enough d0. The threshold is d
0 = 19
(27) nm for a = 3 (5) nm case with the perpendicular
magnetic field and dc0 = 26 (30) nm for a = 3 (5) nm
case under the parallel magnetic field. As has been dis-
cussed in Sec. IIIA, both the effect of the SOC and the ef-
ficiency of the phonon absorption/emission increase with
d0. Therefore, the spin dephasing due to the electron-BP
mechanism increases rapidly with d0.
18,21 The decrease
of the effect of the hyperfine interaction is due to the de-
crease of the factor 1/(azd
) [Eq. 35] with the diameter
IV. SPIN RELAXATION TIMES FROM FERMI
GOLDEN RULE AND FROM EQUATION OF
MOTION
In this section, we will try to find a proper method to
average over the transition rates from the Fermi Golden
rule, τ−1i→f , to give the spin relaxation time T1. In the
limit of small SOC, we rederive Eq. (1) from the equation
of motion. We further show that Eq. (1) fails for large
SOC where a full calculation from the equation of motion
is needed.
T = 12 K
10987654321
γ = γ0
T (K)
4035302520151050
FIG. 10: (Color online) Spin relaxation time T1 calculated
from the equation-of-motion approach (�) v.s. that obtained
from Eq. (1) (•) as function of (a): the strength of the SOC
for T=12 K; (b): the temperature for γ = γ0. The well width
a = 5 nm, perpendicular magnetic field B⊥ = 0.5 T, QD
diameter d0 = 30 nm. The ratio of the two R is also plotted
in the figure. Note the scale of T−11 is at the right hand side
of the frame.
We first rederive Eq. (1) for small SOC from the equa-
tion of motion. In QDs, the orbital level splitting is
usually much larger than the Zeeman splitting. Each
Zeeman sublevel has two states: one with majority up-
spin, the other with majority down-spin. We call the
former “minus state” (as it corresponds to a lower en-
ergy) while the latter “plus state”. For small SOC,
the spin mixing is small. Thus we neglect the much
smaller contribution from the off-diagonal terms of the
density matrix to Sz. Therefore Sz(t) =
z fi±(t)
where i± denotes the plus/minus state of the i-th or-
bital state. For small SOC, the spin relaxation is
much slower than the orbital relaxation.25,55 This im-
plies that the time takes to establish equilibrium within
the plus/minus states is much smaller than the spin relax-
ation time. Thus we can assume a equilibrium (Maxwell-
Boltzmann) distribution between the plus/minus states
at any time. The distribution function is therefore given
by fi±(t) = N±(t) exp(−εi±/kBT )/Z±. Here N±(t) =
i fi±(t) is the total probability of the plus/minus states
with N+(t) + N−(t) = 1 for single electron in QD and
i exp(−εi±/kBT ) is the partition function for
the plus/minus state. At equilibrium, N± = N
± . The
equation for Sz(t) is hence,
Sz(t) =
[Sz(t)− Seqz ]
Si±z exp(−εi±/kBT )/Z±
δN±(t) , (36)
with δN±(t) = N±(t) − Neq± . As the orbital level
splitting is usually much larger than the Zeeman split-
ting, the factor exp(−εi±/kBT )/Z± can be approximated
by exp(−εi0/kBT )/Z0 with εi0 = 12 (εi+ + εi−) and
i exp[−εi0/kBT ]. Further using the particle-
conservation relation
± δN±(t) = 0, one has
Sz(t) = [
(Si+z −Si−z ) exp(−εi0/kBT )/Z0]
δN+(t) .
As Sz(t) − Seqz = [δN+(t)/Z0]
Si−z ) exp(−εi0/kBT ), one finds that the spin relax-
ation time is nothing but the relaxation time of N+.
The next step is to derive the equation of d
δN+(t),
which is given in our previous work:49
δN+(t) =
δfi+(t)
[τ−1i+→f−δfi+(t)− τ
i−→f+δfi−(t)]
[τ−1i+→f− + τ
i−→f+]
e−εi0/kBT
δN+(t) .(38)
Thus spin relaxation time is given by,
(τ−1i+→f− + τ
i−→f+)
e−εi0/kBT
. (39)
Furthermore, substituting e−εi0/kBT /Z0 by f
exp(−εi±/kBT )/Z±, we have
(τ−1i+→f−f
i+ + τ
i−→f+f
i−) . (40)
This is exactly Eq. (1).
T1/T2
T (K)
10−10
10−11
10−12
10−13
2520151050
FIG. 11: (Color online) Spin relaxation time T1, spin dephas-
ing time T2 and T1/T2 against temperature T . B⊥ = 4 T,
a = 5 nm and d0 = 30 nm. Note the scale of T1 and T2 is at
the right hand side of the frame.
For large SOC, or large spin mixing due to anticrossing
of different spin states,19,25 the spin relaxation rate be-
comes comparable with the orbital relaxation rate. Fur-
thermore, the decay of the off-diagonal term of the den-
sity matrix should contribute to the decay of Sz. There-
fore, the above analysis does not hold. In this case, it is
difficult to obtain such a formula, and a full calculation
from the equation-of-motion is needed.
In Fig. 10(a), we show that (for T = 12 K, a = 5
nm, B⊥ = 0.5 T, d0 = 30 nm) the spin relaxation times
T1 calculated from equation-of-motion approach and that
obtained from Eq. (40). Here, for simplicity and without
loss of generality, we consider only the electron-BP scat-
tering mechanism. The discrepancy of T1 obtained from
the two approaches increases with γ. At γ = 10γ0, the
ratio of the two becomes as large as ∼ 3. In Fig. 10(b),
we plot the spin relaxation times obtained via the two
approaches as function of temperature for γ = γ0 with
other parameters remaining unchanged. It is noted that
the discrepancy of T1 obtained from the two approaches
increases with temperature. For high temperature, the
higher levels are involved in the spin dynamics where the
SOC becomes larger. At 40 K, the discrepancy is as large
as 60 %. The ratio increases very slowly for T < 20 K
where only the lowest two Zeeman sublevels are involved
in the dynamics.
g-factor
strain
eI−ph
eI−ph
B⊥ = 0.5 T
T (K)
2520151050
10−10
10−15
B⊥ = 0.9 T
T (K)
2520151050
FIG. 12: (Color online) Spin relaxation time T1 against tem-
perature T for (a): B⊥ = 0.5 T; (b): B⊥ = 0.9 T. a = 10
nm and d0 = 20 nm. Curves with � — T
1 induced by the
electron-BP scattering together with the SOC; Curves with •
— T−11 induced by the second-order process of the hyperfine
interaction together with the BP (V
eI−ph); Curves with N —
T−11 induced by the first-order process of the hyperfine inter-
action together with the BP (V
eI−ph); Curves with H — T
induced by the direct spin-phonon coupling due to phonon-
induced strain; Curves with � — T−11 induced by the g-factor
fluctuation.
V. TEMPERATURE DEPENDENCE OF SPIN
RELAXATION TIME T1 AND SPIN DEPHASING
TIME T2
We first study the relative magnitude of the spin re-
laxation time T1 and the spin dephasing time T2. We
consider a QD with d0 = 30 nm and a = 5 nm at B⊥ = 4
T where the largest contribution to both spin relaxation
and dephasing comes from the electron-BP scattering
(see Fig. 4(a) and Fig. 9(a), we have checked that the
electron-BP scattering mechanism is dominant through-
out the temperature range). From Fig. 11, one finds that
when the temperature is low (T < 5 K in the figure),
T2 = 2T1, which is in agreement with the discussion in
Ref. 20. However, T1/T2 increases very quickly with T
and for T = 20 K, T1/T2 ∼ 2 × 102. This is understood
from the fact that when T is low, the electron mostly dis-
tributes in the lowest two Zeeman sublevels. For small
SOC, Golovach et al. have shown via perturbation the-
ory that phonon induces only the spin-flip noise in the
leading order. Consequently, T2 = 2T1.
20 When the tem-
perature becomes comparable with the orbital level split-
ting ~ω0, the distribution over the upper orbital levels is
not negligible any more. As mentioned previously, the
SOC contributes a non-trivial part to the Zeeman split-
ting. Specifically, the second order energy correction due
to the SOC contributes to the Zeeman splitting. The
energy correction for different orbital levels is generally
unequal (always larger for higher levels). When the elec-
tron is scattered by phonons randomly from one orbital
state to another one with the same major spin polariza-
tion, the frequency of its precession around z direction
changes. Continuous scattering leads to random fluctu-
ation of the precession frequency and thus leads to spin
dephasing.29,46 Note that this fluctuation only leads to a
phase randomization of S+, but not flips the z compo-
nent spin Sz, i.e., not leads to spin relaxation. There-
fore, the spin dephasing becomes stronger than the spin
relaxation for high temperatures. Moreover, this effect
increases with temperature rapidly as the distribution
over higher levels and the phonon numbers both increase
with temperature.
We further study the temperature dependence of spin
relaxation for lower magnetic field and larger quantum
well width where other mechanisms may be more im-
portant than the electron-BP mechanism. In Fig. 12(a),
the spin relaxation time is plotted as function of tem-
perature for B⊥ = 0.5 T, a = 10 nm and d0 = 20 nm.
It is seen from the figure that the direct spin-phonon
coupling due to phonon-induced strain mechanism dom-
inates the spin relaxation throughout the temperature
range. It is also noted that for T ≤ 4 K the spin relax-
ation rates induced by different mechanisms all increase
with temperature according to the phonon number fac-
tor 2n̄(Ez1) + 1 with Ez1 being the Zeeman splitting of
the lowest Zeeman sublevels. However, for T > 4 K, the
spin relaxation rates induced by the direct spin-phonon
coupling due to phonon-induced strain and the electron-
BP interaction increase rapidly with temperature, while
the spin relaxation rates induced by V
eI−ph and V
eI−ph
increase mildly according to 2n̄(Ez1) + 1 throughout the
temperature range. These features can be understood as
what follows. For T ≤ 4 K, the distribution over the high
levels is negligible. Only the lowest two Zeeman sublevels
involve in the spin dynamics. The spin relaxation rates
thus increase with 2n̄(Ez1) + 1 and the relative impor-
tance of each mechanism does not change. Therefore, our
previous analysis on comparison of relative importance of
different spin decoherence mechanisms at 4 K holds true
for the range 0 ≤ T ≤ 4 K. When the temperature gets
higher, the contribution from higher levels becomes more
important. Although the distribution at the higher lev-
els is still very small, for the direct spin-phonon coupling
mechanism, the transition rates between the higher lev-
els and that between higher levels and the lowest two
sublevels are very large. For the electron-BP mechanism
the transition rates between the higher levels are very
large due to the large SOC in these levels. Therefore, the
contribution from the higher levels becomes larger than
that from the lowest two sublevels. Consequently, the in-
crease of temperature leads to rapid increase of the spin
relaxation rates. However, for the two hyperfine mech-
anisms: the V
eI−ph and the V
eI−ph, the spin relaxation
rates does not change much when the higher levels are
involved. They thus increase by the phonon number fac-
In Fig. 12(b) we show the temperature dependence of
the spin relaxation time for the same condition but with
B = 0.9 T. It is noted that the spin relaxation rate due
to the electron-BP mechanism catches up with that in-
duced by the direct spin-phonon coupling due to phonon-
induced strain at T = 9 K and becomes larger for higher
temperature. This indicates that the temperature depen-
dence of the two mechanisms are quite different.
In Fig. 13 we show the spin dephasing induced by
electron-BP scattering and the hyperfine interaction as
function of temperature for B⊥ = 4 T, a = 10 nm and
d0 = 20 nm. We choose the conditions so that the spin
dephasing is dominated by the hyperfine interaction at
low temperature. However, the effect of the electron-BP
mechanism increases with temperature quickly while that
of the hyperfine interaction remains nearly unchanged.
The fast increase of the effect from the electron-BP scat-
tering is due to three factors: 1) the increase of the
phonon number; 2) the increase of scattering channels;
and 3) the increase of the SOC induced spin mixing in
higher levels. On the other hand, from Eq. 35, one can
deduce that the spin dephasing rate of the hyperfine in-
teraction depends mainly on the factor 1/(azd
) with
is the characteristic length/area along the z di-
rection / in the quantum well plane. For higher levels,
the d2‖ is larger, but only about a factor smaller than 10.
Thus the effect of the hyperfine interaction increases very
slowly with temperature.
It should be noted that in the above discussion, we
neglected the two-phonon scattering mechanism,15,46,50
which may be important at high temperature. The con-
tribution of this mechanism should be calculated via
the equation-of-motion approach developed in this paper,
and compared with the contribution of other mechanisms
showed here.
VI. CONCLUSION
In conclusion, we have investigated the longitudinal
and transversal spin decoherence times T1 and T2, called
spin relaxation time and spin dephasing time, in differ-
ent conditions in GaAs QDs from the equation-of-motion
approach. Various mechanisms, including the electron-
BP scattering, the hyperfine interaction, the direct spin-
hyperfine
T (K)
2520151050
FIG. 13: (Color online) Spin relaxation time T1 against tem-
perature T . B⊥ = 4 T, a = 10 nm and d0 = 20 nm. Curves
with � — T−12 induced by the electron-BP scattering together
with the SOC; Curves with • — T−12 induced by the hyperfine
interaction
phonon coupling due to phonon-induced strain and the
g-factor fluctuation are considered. Their relative im-
portance is compared. There is no doubt that for spin
decoherence induced by electron-BP scattering, the SOC
must be included. However, for spin decoherence induced
by the hyperfine interaction, the direct spin-phonon cou-
pling due to phonon-induced strain, g-factor fluctua-
tion, and hyperfine interaction combined with electron-
phonon scattering, the SOC is neglected in the existing
literature.27,28,45 Our calculations have shown that, as
the SOC has marked effect on the eigen-energy and the
eigen-wavefunction of the electron, the spin decoherence
induced by these mechanisms with the SOC is larger than
that without it. Especially, the decoherence from the
second-order process of hyperfine interaction combined
with the electron-BP interaction increases at least one
order of magnitude when the SOC is included. Our cal-
culations show that, with the SOC, in some conditions
some of these mechanisms (except g-factor fluctuation
mechanism) can even dominate the spin decoherence.
There is no single mechanism which dominates spin re-
laxation or spin dephasing in all parameter regimes. The
relative importance of each mechanism varies with the
well width, magnetic field and QD diameter. In particu-
lar, the electron-BP scattering mechanism has the largest
contribution to spin relaxation and spin dephasing for
small well width and/or high magnetic field and/or large
QD diameter. However, for other parameters the hyper-
fine interaction, the first-order process of the hyperfine
interaction combined with electron-BP scattering, and
the direct spin-phonon coupling due to phonon-induced
strain can be more important. It is noted that the g-
factor fluctuation always has very little contribution to
spin relaxation and spin dephasing which can thus be
neglected all the time. For spin dephasing, the electron-
BP scattering mechanism and the hyperfine interaction
mechanism are more important than other mechanisms
for magnetic field higher than 3.5 T. For this regime,
other mechanisms can thus be neglected. It is also shown
that spin dephasing induced by the electron-BP mecha-
nism increases rapidly with temperature. Extrapolated
from our calculation, the hyperfine interaction mecha-
nism is believed to be dominant for small magnetic field.
We also discussed the problem of finding a proper
method to average over the transition rates τ−1i→f obtained
from the Fermi Golden rule, to give the spin relaxation
time T1 at finite temperature. For small SOC, we red-
erived the formula for T1 at finite temperature used in
the existing literature18,51,52 from the equation of mo-
tion. We further demonstrated that this formula is in-
adequate at high temperature and/or for large SOC. For
such cases, a full calculation from the equation-of-motion
approach is needed. The equation-of-motion approach
provides an easy and powerful way to calculate the spin
decoherence at any temperature and SOC.
We also studied the temperature dependence of spin re-
laxation T1 and dephasing T2. We show that for very low
temperature if the electron only distributes on the low-
est two Zeeman sublevels, T2 = 2T1. However, for higher
temperatures, the electron spin dephasing increases with
temperature much faster than the spin relaxation. Con-
sequently T1 ≫ T2. The spin relaxation and dephasing
due to different mechanisms are also compared.
Acknowledgments
This work was supported by the Natural Science
Foundation of China under Grant Nos. 10574120 and
10725417, the National Basic Research Program of China
under Grant No. 2006CB922005 and the Innovation
Project of Chinese Academy of Sciences. Y.Y.W. would
like to thank J. L. Cheng for valuable discussions.
∗ Author to whom correspondence should be addressed;
Electronic address: mwwu@ustc.edu.cn
† Mailing Address
Semiconductor Spintronics and Quantum Computation,
edited by D. D. Awschalom, D. Loss, and N. Samarth
(Springer-Verlag, Berlin, 2002); I. Zutic, J. Fabian, and
S. Das Sarma, Rev. Mod. Phys. 76, 323 (2004).
2 H.-A. Engel, L. P. Kouwenhoven, D. Loss, and C. M. Mar-
cus, Quantum Information Processing 3, 115 (2004); D.
Heiss, M. Kroutvar, J. J. Finley, and G. Abstreiter, Solid
State Commun. 135, 591 (2005); and references therein.
3 D. Loss and D. P. DiVincenzo, Phys. Rev. A 57, 120
(1998).
4 R. Hanson, L. P. Kouwenhoven, J. R. Petta, S. Tarucha,
and L. M. K. Vandersypen, Rev. Mod. Phys. 79, 1217
(2007).
5 J. M. Taylor, H.-A. Engel, W. Dür, A. Yacoby, C. M.
Marcus, P. Zoller, and M. D. Lukin, Nature Phys. 1, 177
(2005).
6 S. Amasha, K. MacLean, I. Radu, D. M. Zumbuhl,
M. A. Kastner, M. P. Hanson, and A. C. Gossard,
cond-mat/0607110.
7 J. M. Elzerman, R. Hanson, L. H. Willems van Beveren, B.
Witkamp, L. M. K. Vandersypen and L. P. Kouwenhoven,
Nature (London) 430, 431 (2004).
8 D. Paget, G. Lample, B. Sapoval, and V. I. Safarov, Phys.
Rev. B 15, 5780 (1977).
Optical Orientation, edited by F. Meier and B. P. Za-
kharchenya (North-Holland, Amsterdam, 1984).
10 G. Dresselhaus, Phys. Rev. 100, 580 (1955).
11 Y. Bychkov and E. I. Rashba, J. Phys. C 17, 6039 (1984).
12 L. M. Roth, Phys. Rev. 118, 1534 (1960).
13 A. V. Khaetskii and Y. V. Nazarov, Physica E 6, 470
(2000).
14 A. V. Khaetskii and Y. V. Nazarov, Phys. Rev. B 61, 12639
(2000).
15 A. V. Khaetskii and Y. V. Nazarov, Phys. Rev. B 64,
125316 (2001).
16 L. M. Woods, T. L. Reinecke, and Y. Lyanda-Geller, Phys.
Rev. B 66, 161318 (2002).
17 R. de Sousa and S. Das Sarma, Phys. Rev. B 68, 155330
(2003).
18 J. L. Cheng, M. W. Wu, and C. Lü, Phys. Rev. B 69,
115318 (2004).
19 D. V. Bulaev and D. Loss, Phys. Rev. B 71, 205324 (2005).
20 V. N. Golovach, A. Khaetskii, and D. Loss, Phys. Rev.
Lett. 93, 016601 (2004).
21 C. F. Destefani and S. E. Ulloa, Phys. Rev. B 72, 115326
(2005).
22 P. San-Jose, G. Zarand, A. Shnirman, and G. Schön, Phys.
Rev. Lett. 97, 076803 (2006).
23 V. I. Fal’ko, B. L. Altshuler, and O. Tsyplyatyev, Phys.
Rev. Lett. 95, 076603 (2005).
24 P. Stano and J. Fabian, Phys. Rev. Lett. 96, 186602 (2006).
25 P. Stano and J. Fabian, Phys. Rev. B 74, 045320 (2006).
26 H. Westfahl. Jr., A. O. Caldeira, G. Medeiros-Ribeiro, and
M. Cerro, Phys. Rev. B 70, 195320 (2004).
27 S. I. Erlingsson, and Yuli V. Nazarov, Phys. Rev. B 66,
155327 (2002).
28 V. A. Abalmassov and F. Marquardt, Phys. Rev. B 70,
075313 (2004).
29 Y. G. Semenov and K. W. Kim, Phys. Rev. Lett. 92,
026601 (2004).
30 A. V. Khaetskii, D. Loss, and L. Glazman, Phys. Rev. Lett.
88, 186802 (2002).
31 A. Khaetskii, D. Loss, and L. Glazman, Phys. Rev. B 67,
195329 (2003).
32 J. Schliemann, A. Khaetskii, and D. Loss, J. Phys.: Con-
dens. Matter 15, R1809 (2003) and references there in.
33 W. A. Coish and D. Loss, Phys. Rev. B 70, 195340 (2004).
34 Ö. Cakir and T. Takagahara, cond-mat/0609217.
35 C. Deng and X. Hu, cond-mat/0608544.
36 S. I. Erlingsson and Yuli V. Nazarov, Phys. Rev. B 70,
205327 (2004).
37 N. Shenvi, R. de Sousa, and K. B. Whaley, Phys. Rev. B
71, 224411 (2005).
38 R. de Sousa, in Electron spin resonance and related phe-
nomena in low dimensional structures, edited by M. Fan-
mailto:mwwu@ustc.edu.cn
http://arxiv.org/abs/cond-mat/0607110
http://arxiv.org/abs/cond-mat/0609217
http://arxiv.org/abs/cond-mat/0608544
ciulli (Springer-Verlag, Berlin, to be published.)
39 Y. V. Pershin and V. Privman, Nano Lett. 3, 695 (2003).
40 I. A. Merkulov, Al. L. Efros, and M. Rosen, Phys. Rev. B
65, 205309 (2002).
41 W. M. Witzel, R. de Sousa, and S. Das Sarma, Phys. Rev.
B 72, 161306 (2005).
42 W. Yao, R.-B. Liu, and L. J. Sham, Phys. Rev. B 74,
195301 (2006).
43 W. M. Witzel and S. Das Sarma, Phys. Rev. B 74, 035322
(2006).
44 C. Deng and X. Hu, Phys. Rev. B 73, 241303 (2006).
45 Y. G. Semenov and K. W. Kim, Phys. Rev. B 70, 085305
(2004).
46 Y. G. Semenov and K. W. Kim, Phys. Rev. B 75, 195342
(2007).
47 W. M. Witzel and S. Das Sarma, Phys. Rev. Lett. 98,
077601 (2007).
48 R. de Sousa, N. Shenvi, and K. B. Whaley, Phys. Rev. B
72 045330 (2005).
49 J. H. Jiang and M. W. Wu, Phys. Rev. B 75, 035307
(2007).
50 B. A. Glavin and K. W. Kim, Phys. Rev. B 68, 045308
(2003).
51 C. Lü, J. L. Cheng, and M. W. Wu, Phys. Rev. B 71,
075308 (2005).
52 Y. Y. Wang and M. W. Wu, Phys. Rev. B 74, 165312
(2006).
53 W. H. Lau and M. E. Flatté, Phys. Rev. B 72, 161311(R)
(2005).
54 C. P. Slichter, Principles of Magnetic Resonance,
(Springer-Verlag, Berlin, 1990).
55 T. Fujisawa, D. G. Austing, Y. Tokura, Y. Hirayama, and
S. Tarucha, Nature 419, 278 (2002).
56 see, e.g., P. N. Argyres and P. L. Kelley, Phys. Rev. 134,
A98 (1964).
57 R. L. Fulton, J. Chem. Phys. 41, 2876 (1964).
58 P.-F. Braun, X. Marie, L. Lombez, B. Urbaszek, T.
Amand, P. Renucci, V. K. Kalevich, K. V. Kavokin, O.
Krebs, P. Voisin, and Y. Masumoto, Phys. Rev. Lett. 94,
116601 (2005).
59 F. H. L. Koppens, J. A. Folk, J. M. Elzerman, R. Hanson,
L. H. Willems van Beveren, I. T. Vink, H. P. Tranitz, W.
Wegscheider, L. P. Kouwenhoven, L. M. K. Vandersypen,
Science 309, 1346 (2005).
60 J. R. Petta, A. C. Johnson, J. M. Taylor, E. A. Laird, A.
Yacoby, M. D. Lukin, C. M. Marcus, M. P. Hanson, A. C.
Gossard, Science 309, 2180 (2005).
61 F. H. L. Koppens, K. C. Nowack, and L. M. K. Vander-
sypen, arXiv:0711.0479.
62 M. W. Wu and H. Metiu, Phys. Rev. B 61, 2945 (2000).
63 T. Kuhn and F. Rossi, Phys. Rev. Lett. 69, 977 (1992).
64 J. Shah, Ultrafast Spectroscopy of Semiconductors and
Semiconductor Nanostructures (Springer, Berlin, 1996).
65 M. I. D’yakonov and V. I. Perel’, Zh. Eksp. Teor. Fiz. 60,
1954 (1971) [Sov. Phys. JETP 33, 1053 (1971)].
66 A. Abragam, The Principles of Nuclear Magnetism (Ox-
ford University Press, Oxford, 1961), Chaps. VI and IX.
67 This can be obtained from Eq. (17) in Ref. 42.
Numerical Data and Functional Relationships in Science
and Technology, edited by O. Madelung, M. Schultz, and
H. Weiss, Landolt-Börnstein, New Series, Group III, Vol.
17, Pt. a (Springer-Verlag, Berlin, 1982).
69 W. Knap, C. Skierbiszewski, A. Zduniak, E. Litwin-
Staszewska, D. Bertho, F. Kobbi, J. L. Robert, G. E.
Pikus, F. G. Pikus, S. V. Iordanskii, V. Mosser, K.
Zekentes, and Yu. B. Lyanda-Geller, Phys. Rev. B 53, 3912
(1996).
70 It should be mentioned that one effect is not included :
when electron is scattered by phonon from one orbital state
to another, it feels a difference in the spin precession fre-
quency since the strength of longitudinal (along the exter-
nal magnetic field) component of Overhauser field differs
with orbital states. This effect randomizes the spin preces-
sion phase and leads to a pure spin dephasing. However,
this effect is negligible in our manuscript.
71 The deviation of our calculation from the experiment data
at T = 14 T is due to the fact that we do not include the
cyclotron effect along the-z direction. For B & 10 T, the
cyclotron orbit length is smaller than the quantum well
width, which makes our model unrealistic.
http://arxiv.org/abs/0711.0479
ABSTRACT
  The longitudinal and transversal spin decoherence times, $T_1$ and $T_2$, in
semiconductor quantum dots are investigated from equation-of-motion approach
for different magnetic fields, quantum dot sizes, and temperatures. Various
mechanisms, such as the hyperfine interaction with the surrounding nuclei, the
Dresselhaus spin-orbit coupling together with the electron--bulk-phonon
interaction, the $g$-factor fluctuations, the direct spin-phonon coupling due
to the phonon-induced strain, and the coaction of the
electron--bulk/surface-phonon interaction together with the hyperfine
interaction are included. The relative contributions from these spin
decoherence mechanisms are compared in detail. In our calculation, the
spin-orbit coupling is included in each mechanism and is shown to have marked
effect in most cases. The equation-of-motion approach is applied in studying
both the spin relaxation time $T_1$ and the spin dephasing time $T_2$, either
in Markovian or in non-Markovian limit. When many levels are involved at finite
temperature, we demonstrate how to obtain the spin relaxation time from the
Fermi Golden rule in the limit of weak spin-orbit coupling. However, at high
temperature and/or for large spin-orbit coupling, one has to use the
equation-of-motion approach when many levels are involved. Moreover, spin
dephasing can be much more efficient than spin relaxation at high temperature,
though the two only differs by a factor of two at low temperature.

<|endoftext|><|startoftext|>
Introduction
The 3+1 formalism is the basis of most modern numerical relativity and has lead,
along with alternative approaches [82], to the recent successes in the binary black hole
merger problem [6, 7, 99, 25, 26, 27, 28] (see [24, 69, 86] for a review). Thanks to
the 3+1 formalism, the resolution of Einstein equation amounts to solving a Cauchy
problem, namely to evolve “forward in time” some initial data. However this is a
Cauchy problem with constraints. This makes the set up of initial data a non trivial
task, because these data must fulfill the constraints. In this lecture, we present the
most wide spread methods to deal with this problem. Notice that we do not discuss
the numerical techniques employed to solve the constraints (see e.g. Choptuik’s lecture
for finite differences [32] and Grandclément and Novak’s review for spectral methods
[58]).
Standard reviews about the initial data problem are the articles by York [106] and
Choquet-Bruhat and York [36]. Recent reviews are the articles by Cook [37], Pfeiffer
[79] and Bartnik and Isenberg [10].
2. The initial data problem
2.1. 3+1 decomposition of Einstein equation
In this lecture, we consider a spacetime (M, g), where M is a four-dimensional
smooth manifold and g a Lorentzian metric on M. We assume that (M, g) is globally
http://arxiv.org/abs/0704.0149v2
Construction of initial data for 3+1 numerical relativity 2
hyperbolic, i.e. that M can be foliated by a family (Σt)t∈R of spacelike hypersurfaces.
We denote by γ the (Riemannian) metric induced by g on each hypersurface Σt and
K the extrinsic curvature of Σt, with the same sign convention as that used in the
numerical relativity community, i.e. for any pair of vector fields (u,v) tangent to Σt,
g(u,∇vn) = −K(u,v), where n is the future directed unit normal to Σt and ∇ is
the Levi-Civita connection associated with g.
The 3+1 decomposition of Einstein equation with respect to the foliation (Σt)t∈R
leads to three sets of equations: (i) the evolution equations of the Cauchy problem
(full projection of Einstein equation onto Σt), (ii) the Hamiltonian constraint (full
projection of Einstein equation along the normal n), (iii) the momentum constraint
(mixed projection: once onto Σt, once along n). The latter two sets of equations do
not contain any second derivative of the metric with respect to t. They are written‡
R+K2 −KijKij = 16πE (Hamiltonian constraint), (1)
DjKij −DiK = 8πpi (momentum constraint), (2)
where R is the Ricci scalar (also called scalar curvature) associated with the 3-metric
γ, K is the trace of K with respect to γ: K = γijKij , D stands for the Levi-Civita
connection associated with the 3-metric γ, and E and pi are respectively the energy
density and linear momentum of matter, both measured by the observer of 4-velocity
n (Eulerian observer). In terms of the matter energy-momentum tensor T they are
expressed as
E = Tµνn
µnν and pi = −Tµνnµγνi. (3)
Notice that Eqs. (1)-(2) involve a single hypersurface Σ0, not a foliation (Σt)t∈R. In
particular, neither the lapse function nor the shift vector appear in these equations.
2.2. Constructing initial data
In order to get valid initial data for the Cauchy problem, one must find solutions to
the constraints (1) and (2). Actually one may distinguish two problems:
• The mathematical problem: given some hypersurface Σ0, find a Riemannian
metric γ, a symmetric bilinear form K and some matter distribution (E,p) on Σ0
such that the Hamiltonian constraint (1) and the momentum constraint (2) are
satisfied. In addition, the matter distribution (E,p) may have some constraints
from its own. We shall not discuss them here.
• The astrophysical problem: make sure that the solution to the constraint equations
has something to do with the physical system that one wish to study.
Facing the constraint equations (1) and (2), a naive way to proceed would be to
choose freely the metric γ, thereby fixing the connection D and the scalar curvature
R, and to solve Eqs. (1)-(2) for K. Indeed, for fixed γ, E, and p, Eqs. (1)-(2) form
a quasi-linear system of first order for the components Kij . However, as discussed
by Choquet-Bruhat [45], this approach is not satisfactory because we have only four
equations for six unknowns Kij and there is no natural prescription for choosing
arbitrarily two among the six components Kij .
In 1944, Lichnerowicz [70] has shown that a much more satisfactory split of
the initial data (γ,K) between freely choosable parts and parts obtained by solving
‡ we are using the standard convention for indices, namely Greek indices run in {0, 1, 2, 3}, whereas
Latin ones run in {1, 2, 3}
Construction of initial data for 3+1 numerical relativity 3
Eqs. (1)-(2) is provided by a conformal decomposition of the metric γ. Lichnerowicz
method has been extended by Choquet-Bruhat (1956, 1971) [45, 33], by York and
Ó Murchadha (1972, 1974, 1979) [103, 104, 76, 106] and more recently by York and
Pfeiffer (1999, 2003) [107, 80]. Actually, conformal decompositions are by far the most
widely spread techniques to get initial data for the 3+1 Cauchy problem. Alternative
methods exist, such as the quasi-spherical ansatz introduced by Bartnik in 1993 [8] or
a procedure developed by Corvino (2000) [39] and by Isenberg, Mazzeo and Pollack
(2002) [63] for gluing together known solutions of the constraints, thereby producing
new ones. Here we shall limit ourselves to the conformal methods.
2.3. Conformal decomposition of the constraints
In the conformal approach initiated by Lichnerowicz [70], one introduces a conformal
metric γ̃ and a conformal factor Ψ such that the (physical) metric γ induced by the
spacetime metric on the hypersurface Σt is
γij = Ψ
4γ̃ij . (4)
We could fix some degree of freedom by demanding that det γ̃ij = 1. This would
imply Ψ = (det γij)
1/12. However, in this case γ̃ and Ψ would be tensor densities.
Moreover the condition det γ̃ij = 1 has a meaning only for Cartesian-like coordinates.
In order to deal with tensor fields and to allow for any type of coordinates, we proceed
differently and introduce a background Riemannian metric f on Σt. If the topology of
Σt allows it, we shall demand that f is flat. Then we replace the condition det γ̃ij = 1
by det γ̃ij = det fij . This fixes
det γij
det fij
)1/12
. (5)
Ψ is then a genuine scalar field on Σt (as a quotient of two determinants). Consequently
γ̃ is a tensor field and not a tensor density.
Associated with the above conformal transformation, there are two decomposi-
tions of the traceless part Aij of the extrinsic curvature, the latter being defined by
Kij =: Aij +
Kγij . (6)
These two decompositions are
Aij =: Ψ−10Âij , (7)
Aij =: Ψ−4Ãij . (8)
The choice −10 for the exponent of Ψ in Eq. (7) is motivated by the following identity,
valid for any symmetric and traceless tensor field,
ij = Ψ−10D̃j
Ψ10Aij
, (9)
where D̃j denotes the covariant derivative associated with the conformal metric γ̃.
This choice is well adapted to the momentum constraint, because the latter involves
the divergence of K. The alternative choice, i.e. Eq. (8), is motivated by time
evolution considerations, as we shall discuss below. For the time being, we limit
ourselves to the decomposition (7), having in mind to simplify the writing of the
momentum constraint.
Construction of initial data for 3+1 numerical relativity 4
By means of the decompositions (4), (6) and (7), the Hamiltonian constraint (1)
and the momentum constraint (2) are rewritten as (see Ref. [51] for details)
D̃iD̃
iΨ− 1
ÂijÂ
ij Ψ−7 + 2πẼΨ−3 − 1
K2Ψ5 = 0, (10)
D̃jÂ
ij − 2
Ψ6D̃iK = 8πp̃i, (11)
where R̃ is the Ricci scalar associated with the conformal metric γ̃ and we have
introduced the rescaled matter quantities
Ẽ := Ψ8E and p̃i := Ψ10pi. (12)
Equation (10) is known as Lichnerowicz equation, or sometimes Lichnerowicz-York
equation. The definition of p̃i is such that there is no Ψ factor in the right-hand side
of Eq. (11). On the contrary the power 8 in the definition of Ẽ is not the only possible
choice. As we shall see in § 3.4, it is chosen (i) to guarantee a negative power of Ψ in
the Ẽ term in Eq. (10), resulting in some uniqueness property of the solution and (ii)
to allow for an easy implementation of the dominant energy condition.
3. Conformal transverse-traceless method
3.1. Longitudinal/transverse decomposition of Âij
In order to solve the system (10)-(11), York (1973,1979) [104, 105, 106] has decomposed
Âij into a longitudinal part and a transverse one, setting
Âij = (L̃X)ij + Â
TT, (13)
where Â
TT is both traceless and transverse (i.e. divergence-free) with respect to the
metric γ̃:
γ̃ijÂ
TT = 0 and D̃jÂ
TT = 0, (14)
and (L̃X)ij is the conformal Killing operator associated with the metric γ̃ and acting
on the vector field X:
(L̃X)ij := D̃iXj + D̃jX i − 2
k γ̃ij . (15)
(L̃X)ij is by construction traceless:
γ̃ij(L̃X)
ij = 0 (16)
(it must be so because in Eq. (13) both Âij and Â
TT are traceless). The kernel of
L̃ is made of the conformal Killing vectors of the metric γ̃, i.e. the generators of
the conformal isometries (see e.g. Ref. [51] for more details). The symmetric tensor
(L̃X)ij is called the longitudinal part of Âij , whereas Â
TT is called the transverse part.
Given Âij , the vector X is determined by taking the divergence of Eq. (13):
taking into account property (14), we get
D̃j(L̃X)
ij = D̃jÂ
ij . (17)
The second order operator D̃j(L̃X)
ij acting on the vector X is the conformal vector
Laplacian ∆̃L:
∆̃L X
i := D̃j(L̃X)
ij = D̃jD̃
jX i +
D̃iD̃jX
j + R̃i jX
j , (18)
Construction of initial data for 3+1 numerical relativity 5
where the second equality follows from the Ricci identity applied to the connection D̃,
R̃ij being the associated Ricci tensor. The operator ∆̃L is elliptic and its kernel is, in
practice, reduced to the conformal Killing vectors of γ̃, if any. We rewrite Eq. (17) as
∆̃L X
i = D̃jÂ
ij . (19)
The existence and uniqueness of the longitudinal/transverse decomposition (13)
depend on the existence and uniqueness of solutions X to Eq. (19). We shall consider
two cases:
• Σ0 is a closed manifold, i.e. is compact without boundary;
• (Σ0,γ) is an asymptotically flat manifold, i.e. is such that the background metric
f is flat (except possibly on a compact sub-domain B of Σt) and there exists a
coordinate system (xi) = (x, y, z) on Σt such that outside B, the components
of f are fij = diag(1, 1, 1) (“Cartesian-type coordinates”) and the variable
x2 + y2 + z2 can take arbitrarily large values on Σt; then when r → +∞,
the components of γ and K with respect to the coordinates (xi) satisfy
γij = fij +O(r
−1) and
= O(r−2), (20)
Kij = O(r
−2) and
= O(r−3). (21)
In the case of a closed manifold, one can show (see Appendix B of Ref. [51] for details)
that solutions to Eq. (19) exist provided that the source D̃jÂ
ij is orthogonal to all
conformal Killing vectors of γ̃, in the sense that
∀C ∈ ker L̃,
γ̃ijC
iD̃kÂ
γ̃ d3x = 0. (22)
But the above property is easy to verify: using the fact that the source is a pure
divergence and that Σ0 is closed, we may integrate the left-hand side by parts and
get, for any vector field C,
γ̃ijC
i D̃kÂ
γ̃ d3x = −1
γ̃ij γ̃kl(L̃C)
ikÂjl
γ̃ d3x. (23)
Then, obviously, when C is a conformal Killing vector, the right-hand side of the
above equation vanishes. So there exists a solution to Eq. (19) and this solution is
unique up to the addition of a conformal Killing vector. However, given a solution
X, for any conformal Killing vector C, the solution X +C yields to the same value
of L̃X, since C is by definition in the kernel of L̃. Therefore we conclude that the
decomposition (13) of Âij is unique, although the vector X may not be if (Σ0, γ̃)
admits some conformal isometries.
In the case of an asymptotically flat manifold, the existence and uniqueness is
guaranteed by a theorem proved by Cantor in 1979 [30] (see also Appendix B of
Ref. [87] as well as Refs. [35, 51]). This theorem requires the decay condition
∂2γ̃ij
∂xk∂xl
= O(r−3) (24)
in addition to the asymptotic flatness conditions (20). This guarantees that
R̃ij = O(r
−3). (25)
Then all conditions are fulfilled to conclude that Eq. (19) admits a unique solution X
which vanishes at infinity.
To summarize, for all considered cases (asymptotic flatness and closed manifold),
any symmetric and traceless tensor Âij (decaying as O(r−2) in the asymptotically flat
case) admits a unique longitudinal/transverse decomposition of the form (13).
Construction of initial data for 3+1 numerical relativity 6
3.2. Conformal transverse-traceless form of the constraints
Inserting the longitudinal/transverse decomposition (13) into the constraint equations
(10) and (11) and making use of Eq. (19) yields to the system
D̃iD̃
iΨ− 1
(L̃X)ij + Â
(L̃X)ij + Â
+ 2πẼΨ−3 − 1
K2Ψ5 = 0, (26)
∆̃L X
i − 2
Ψ6D̃iK = 8πp̃i, (27)
where
(L̃X)ij := γ̃ikγ̃jl(L̃X)
kl and ÂTTij := γ̃ikγ̃jlÂ
TT. (28)
With the constraint equations written as (26) and (27), we see clearly which part
of the initial data on Σ0 can be freely chosen and which part is “constrained”:
• free data:
– conformal metric γ̃;
– symmetric traceless and transverse tensor Â
TT (traceless and transverse are
meant with respect to γ̃: γ̃ijÂ
TT = 0 and D̃jÂ
TT = 0);
– scalar field K;
– conformal matter variables: (Ẽ, p̃i);
• constrained data (or “determined data”):
– conformal factor Ψ, obeying the non-linear elliptic equation (26)
(Lichnerowicz equation)
– vector X, obeying the linear elliptic equation (27) .
Accordingly the general strategy to get valid initial data for the Cauchy problem is
to choose (γ̃ij , Â
TT,K, Ẽ, p̃
i) on Σ0 and solve the system (26)-(27) to get Ψ and X
Then one constructs
γij = Ψ
4γ̃ij (29)
Kij = Ψ−10
(L̃X)ij + Â
Ψ−4Kγ̃ij (30)
E = Ψ−8Ẽ (31)
pi = Ψ−10p̃i (32)
and obtains a set (γ,K, E,p) which satisfies the constraint equations (1)-(2). This
method has been proposed by York (1979) [106] and is naturally called the conformal
transverse traceless (CTT ) method.
3.3. Decoupling on hypersurfaces of constant mean curvature
Equations (26) and (27) are coupled, but we notice that if, among the free data, we
choose K to be a constant field on Σ0,
K = const, (33)
then they decouple partially : condition (33) implies D̃iK = 0, so that the momentum
constraint (27) becomes independent of Ψ:
∆̃L X
i = 8πp̃i (K = const). (34)
Construction of initial data for 3+1 numerical relativity 7
The condition (33) on the extrinsic curvature of Σ0 defines what is called a constant
mean curvature (CMC ) hypersurface. Indeed let us recall that K is nothing but
(minus three times) the mean curvature of (Σ0,γ) embedded in (M, g). A maximal
hypersurface, having K = 0, is of course a special case of a CMC hypersurface. On a
CMC hypersurface, the task of obtaining initial data is greatly simplified: one has first
to solve the linear elliptic equation (34) to get X and plug the solution into Eq. (26)
to form an equation for Ψ. Equation (34) is the conformal vector Poisson equation
discussed above (Eq. (19), with D̃jÂ
ij replaced by 8πp̃i). We know then that it is
solvable for the two cases of interest mentioned in Sec. 3.1: closed or asymptotically
flat manifold. Moreover, the solutions X are such that the value of L̃X is unique.
3.4. Lichnerowicz equation
Taking into account the CMC decoupling, the difficult problem is to solve Eq. (26)
for Ψ. This equation is elliptic and highly non-linear§. It has been first studied
by Lichnerowicz [70, 71] in the case K = 0 (Σ0 maximal) and Ẽ = 0 (vacuum).
Lichnerowicz has shown that given the value of Ψ at the boundary of a bounded domain
of Σ0 (Dirichlet problem), there exists at most one solution to Eq. (26). Besides, he
showed the existence of a solution provided that ÂijÂ
ij is not too large. These early
results have been much improved since then. In particular Cantor [29] has shown that
in the asymptotically flat case, still with K = 0 and Ẽ = 0, Eq. (26) is solvable if
and only if the metric γ̃ is conformal to a metric with vanishing scalar curvature (one
says then that γ̃ belongs to the positive Yamabe class) (see also Ref. [74]). In the
case of closed manifolds, the complete analysis of the CMC case has been achieved by
Isenberg (1995) [62].
For more details and further references, we recommend the review articles by
Choquet-Bruhat and York [36] and Bartnik and Isenberg [10]. Here we shall simply
repeat the argument of York [107] to justify the rescaling (12) of E. This rescaling is
indeed related to the uniqueness of solutions to the Lichnerowicz equation. Consider
a solution Ψ0 to Eq. (26) in the case K = 0, to which we restrict ourselves. Another
solution close to Ψ0 can be written Ψ = Ψ0 + ǫ, with |ǫ| ≪ Ψ0:
D̃iD̃
i(Ψ0 + ǫ)−
(Ψ0 + ǫ) +
ÂijÂ
ij (Ψ0 + ǫ)
−7 + 2πẼ(Ψ0 + ǫ)
−3 = 0. (35)
Expanding to the first order in ǫ/Ψ0 leads to the following linear equation for ǫ:
D̃iD̃
iǫ− αǫ = 0, (36)
ÂijÂ
ijΨ−80 + 6πẼΨ
0 . (37)
Now, if α ≥ 0, one can show, by means of the maximum principle, that the solution
of (36) which vanishes at spatial infinity is necessarily ǫ = 0 (see Ref. [34] or § B.1 of
Ref. [35]). We therefore conclude that the solution Ψ0 to Eq. (26) is unique (at least
locally) in this case. On the contrary, if α < 0, non trivial oscillatory solutions of
Eq. (36) exist, making the solution Ψ0 not unique. The key point is that the scaling
(12) of E yields the term +6πẼΨ−40 in Eq. (37), which contributes to make α positive.
If we had not rescaled E, i.e. had considered the original Hamiltonian constraint, the
§ although it is quasi-linear in the technical sense, i.e. linear with respect to the highest-order
derivatives
Construction of initial data for 3+1 numerical relativity 8
contribution to α would have been instead −10πEΨ40, i.e. would have been negative.
Actually, any rescaling Ẽ = ΨsE with s > 5 would have work to make α positive. The
choice s = 8 in Eq. (12) is motivated by the fact that if the conformal data (Ẽ, p̃i)
obey the “conformal” dominant energy condition
γ̃ij p̃ip̃j, (38)
then, via the scaling (12) of pi, the reconstructed physical data (E, pi) will
automatically obey the dominant energy condition
γijpipj. (39)
4. Conformally flat initial data by the CTT method
4.1. Momentarily static initial data
In this section we search for asymptotically flat initial data (Σ0,γ,K) by the CTT
method exposed above. As a purpose of illustration, we shall start by the simplest
case one may think of, namely choose the freely specifiable data (γ̃ij , Â
TT,K, Ẽ, p̃
to be a flat metric:
γ̃ij = fij , (40)
a vanishing transverse-traceless part of the extrinsic curvature:
TT = 0, (41)
a vanishing mean curvature (maximal hypersurface)
K = 0, (42)
and a vacuum spacetime:
Ẽ = 0, p̃i = 0. (43)
Then D̃i = Di, where D denotes the Levi-Civita connection associated with f , R̃ = 0
(f is flat) and the constraint equations (26)-(27) reduce to
(LX)ij(LX)
ij Ψ−7 = 0 (44)
i = 0, (45)
where ∆ and ∆L are respectively the scalar Laplacian and the conformal vector
Laplacian associated with the flat metric f :
∆ := DiDi and ∆LX i := DjDjX i +
DiDjXj . (46)
Equations (44)-(45) must be solved with the boundary conditions
Ψ = 1 when r → ∞ (47)
X = 0 when r → ∞, (48)
which follow from the asymptotic flatness requirement. The solution depends on the
topology of Σ0, since the latter may introduce some inner boundary conditions in
addition to (47)-(48).
Let us start with the simplest case: Σ0 = R
3. Then the unique solution of Eq. (45)
subject to the boundary condition (48) is
X = 0. (49)
Construction of initial data for 3+1 numerical relativity 9
Figure 1. Hypersurface Σ0 as R
3 minus a ball, displayed via an embedding
diagram based on the metric γ̃, which coincides with the Euclidean metric on
3. Hence Σ0 appears to be flat. The unit normal of the inner boundary S with
respect to the metric γ̃ is s̃. Notice that D̃ · s̃ > 0.
Consequently (LX)ij = 0, so that Eq. (44) reduces to Laplace equation for Ψ:
∆Ψ = 0. (50)
With the boundary condition (47), there is a unique regular solution on R3:
Ψ = 1. (51)
The initial data reconstructed from Eqs. (29)-(30) is then
γ = f (52)
K = 0. (53)
These data correspond to a spacelike hyperplane of Minkowski spacetime.
Geometrically the condition K = 0 is that of a totally geodesic hypersurface [i.e. all
the geodesics of (Σt,γ) are geodesics of (M, g)]. Physically data with K = 0 are said
to be momentarily static or time symmetric. Indeed, if we consider a foliation with
unit lapse around Σ0 (geodesic slicing), the following relation holds: Ln g = −2K,
where Ln denotes the Lie derivative along the unit normal n. So if K = 0, Ln g = 0.
This means that, locally (i.e. on Σ0), n is a spacetime Killing vector. This vector
being timelike, the configuration is then stationary. Moreover, the Killing vector n
being orthogonal to some hypersurface (i.e. Σ0), the stationary configuration is called
static. Of course, this staticity properties holds a priori only on Σ0 since there is no
guarantee that the time development of Cauchy data with K = 0 at t = 0 maintains
K = 0 at t > 0. Hence the qualifier ‘momentarily’ in the expression ‘momentarily
static’ for data with K = 0.
4.2. Slice of Schwarzschild spacetime
To get something less trivial than a slice of Minkowski spacetime, let us consider a
slightly more complicated topology for Σ0, namely R
3 minus a ball (cf. Fig. 1). The
sphere S delimiting the ball is then the inner boundary of Σ0 and we must provide
boundary conditions for Ψ and X on S to solve Eqs. (44)-(45). For simplicity, let us
choose
= 0. (54)
Altogether with the outer boundary condition (48), this leads to X being identically
zero as the unique solution of Eq. (45). So, again, the Hamiltonian constraint reduces
to Laplace equation
∆Ψ = 0. (55)
Construction of initial data for 3+1 numerical relativity 10
Figure 2. Same hypersurface Σ0 as in Fig. 1 but displayed via an embedding
diagram based on the metric γ instead of γ̃. The unit normal of the inner
boundary S with respect to that metric is s. Notice that D · s = 0, which
means that S is a minimal surface of (Σ0,γ).
If we choose the boundary condition Ψ|
= 1, then the unique solution is Ψ = 1
and we are back to the previous example (slice of Minkowski spacetime). In order to
have something non trivial, i.e. to ensure that the metric γ will not be flat, let us
demand that γ admits a closed minimal surface, that we will choose to be S. This
will necessarily translate as a boundary condition for Ψ since all the information on
the metric is encoded in Ψ (let us recall that from the choice (40), γ = Ψ4f ). S is a
minimal surface of (Σ0,γ) iff its mean curvature vanishes, or equivalently if its unit
normal s is divergence-free (cf. Fig. 2):
= 0. (56)
This is the analog of ∇ · n = 0 for maximal hypersurfaces, the change from minimal
to maximal being due to the change of metric signature, from the Riemannian to the
Lorentzian one. Expressed in term of the connection D̃ = D (recall that in the present
case γ̃ = f), condition (56) is equivalent to
Di(Ψ6si)
= 0. (57)
Let us rewrite this expression in terms of the unit vector s̃ normal to S with respect
to the metric γ̃ (cf. Fig. 1); we have
s̃ = Ψ−2s, (58)
since γ̃(s̃, s̃) = Ψ−4γ̃(s, s) = γ(s, s) = 1. Thus Eq. (57) becomes
Di(Ψ4s̃i)
fΨ4s̃i
= 0. (59)
Let us introduce on Σ0 a coordinate system of spherical type, (x
i) = (r, θ, ϕ), such
that (i) fij = diag(1, r
2, r2 sin2 θ) and (ii) S is the sphere r = a, where a is some
positive constant. Since in these coordinates
f = r2 sin θ and s̃i = (1, 0, 0), the
minimal surface condition (59) is written as
= 0, (60)
Construction of initial data for 3+1 numerical relativity 11
Figure 3. Extended hypersurface Σ′
obtained by gluing a copy of Σ0 at the
minimal surface S; it defines an Einstein-Rosen bridge between two asymptotically
flat regions.
= 0 (61)
This is a boundary condition of mixed Newmann/Dirichlet type for Ψ. The unique
solution of the Laplace equation (55) which satisfies boundary conditions (47) and
(61) is
Ψ = 1 +
. (62)
The parameter a is then easily related to the ADM mass m of the hypersurface Σ0.
Indeed for a conformally flat 3-metric (and more generally in the quasi-isotropic gauge,
cf. Chap. 7 of Ref. [51]), the ADM mass m is given by the flux of the gradient of the
conformal factor at spatial infinity:
m = − 1
r=const
r2 sin θ dθ dϕ
= − 1
= 2a. (63)
Hence a = m/2 and we may write
Ψ = 1 +
. (64)
Therefore, in terms of the coordinates (r, θ, ϕ), the obtained initial data (γ,K) are
γij =
diag(1, r2, r2 sin θ) (65)
Kij = 0. (66)
So, as above, the initial data are momentarily static. Actually, we recognize on (65)-
(66) a slice t = const of Schwarzschild spacetime in isotropic coordinates.
The isotropic coordinates (r, θ, ϕ) covering the manifold Σ0 are such that the
range of r is [m/2,+∞). But thanks to the minimal character of the inner boundary
S, we can extend (Σ0,γ) to a larger Riemannian manifold (Σ′0,γ ′) with γ′|Σ0 = γ
and γ′ smooth at S. This is made possible by gluing a copy of Σ0 at S (cf. Fig. 3).
Construction of initial data for 3+1 numerical relativity 12
Figure 4. Extended hypersurface Σ′
depicted in the Kruskal-Szekeres
representation of Schwarzschild spacetime. R stands for Schwarzschild radial
coordinate and r for the isotropic radial coordinate. R = 0 is the singularity and
R = 2m the event horizon. Σ′
is nothing but a hypersurface t = const, where
t is the Schwarzschild time coordinate. In this diagram, these hypersurfaces are
straight lines and the Einstein-Rosen bridge S is reduced to a point.
The topology of Σ′0 is S
2×R and the range of r in Σ′0 is (0,+∞). The extended metric
γ ′ keeps exactly the same form as (65):
γ′ij dx
i dxj =
dr2 + r2dθ2 + r2 sin2 θdϕ2
. (67)
By the change of variable
r 7→ r′ = m
it is easily shown that the region r → 0 does not correspond to some “center” but is
actually a second asymptotically flat region (the lower one in Fig. 3). Moreover the
transformation (68), with θ and ϕ kept fixed, is an isometry of γ ′. It maps a point
p of Σ0 to the point located at the vertical of p in Fig. 3. The minimal sphere S is
invariant under this isometry. The region around S is called an Einstein-Rosen bridge.
(Σ′0,γ
′) is still a slice of Schwarzschild spacetime. It connects two asymptotically flat
regions without entering below the event horizon, as shown in the Kruskal-Szekeres
diagram of Fig. 4.
4.3. Bowen-York initial data
Let us select the same simple free data as above, namely
γ̃ij = fij , Â
TT = 0, K = 0, Ẽ = 0 and p̃
i = 0. (69)
For the hypersurface Σ0, instead of R
3 minus a ball, we choose R3 minus a point:
Σ0 = R
3\{O}. (70)
The removed point O is called a puncture [21]. The topology of Σ0 is S
2×R; it differs
from the topology considered in Sec. 4.1 (R3 minus a ball); actually it is the same
topology as that of the extended manifold Σ′0 (cf. Fig. 3).
Construction of initial data for 3+1 numerical relativity 13
Thanks to the choice (69), the system to be solved is still (44)-(45). If we choose
the trivial solution X = 0 for Eq. (45), we are back to the slice of Schwarzschild
spacetime considered in Sec. 4.1, except that now Σ0 is the extended manifold
previously denoted Σ′0.
Bowen and York [20] have obtained a simple non-trivial solution to the momentum
constraint (45) (see also Ref. [15]). Given a Cartesian coordinate system (xi) =
(x, y, z) on Σ0 (i.e. a coordinate system such that fij = diag(1, 1, 1)) with respect to
which the coordinates of the puncture O are (0, 0, 0), this solution writes
X i = − 1
7f ijPj +
k, (71)
where r :=
x2 + y2 + z2, ǫ
k is the Levi-Civita alternating tensor associated with
the flat metric f and (Pi, Sj) = (P1, P2, P3, S1, S2, S3) are six real numbers, which
constitute the six parameters of the Bowen-York solution. Notice that since r 6= 0 on
Σ0, the Bowen-York solution is a regular and smooth solution on the entire Σ0.
The conformal traceless extrinsic curvature corresponding to the solution (71) is
deduced from formula (13), which in the present case reduces to Âij = (LX)ij ; one
Âij =
xiP j + xjP i −
f ij − x
ǫiklSkx
lxj + ǫ
, (72)
where P i := f ijPj . The tensor Â
ij given by Eq. (72) is called the Bowen-York extrinsic
curvature. Notice that the Pi part of Â
ij decays asymptotically as O(r−2), whereas
the Si part decays as O(r
Remark : Actually the expression of Âij given in the original Bowen-York article
[20] contains an additional term with respect to Eq. (72), but the role of this
extra term is only to ensure that the solution is isometric through an inversion
across some sphere. We are not interested by such a property here, so we have
dropped this term. Therefore, strictly speaking, we should name expression (72)
the simplified Bowen-York extrinsic curvature.
The Bowen-York extrinsic curvature provides an analytical solution of the
momentum constraint (45) but there remains to solve the Hamiltonian constraint
(44) for Ψ, with the asymptotic flatness boundary condition Ψ = 1 when r → ∞.
Since X 6= 0, Eq. (44) is no longer a simple Laplace equation, as in Sec. 4.1, but a
non-linear elliptic equation. There is no hope to get any analytical solution and one
must solve Eq. (44) numerically to get Ψ and reconstruct the full initial data (γ,K)
via Eqs. (29)-(30).
The parameters Pi of the Bowen-York solution are nothing but the three
components of the ADM linear momentum of the hypersurface Σ0 Similarly, the
parameters Si of the Bowen-York solution are nothing but the three components of
the angular momentum of the hypersurface Σ0, the latter being defined relatively to
the quasi-isotropic gauge, in the absence of any axial symmetry (see e.g. [51]).
Remark : The Bowen-York solution with P i = 0 and Si = 0 reduces to the
momentarily static solution found in Sec. 4.1, i.e. is a slice t = const of the
Schwarzschild spacetime (t being the Schwarzschild time coordinate). However
Bowen-York initial data with P i = 0 and Si 6= 0 do not constitute a slice of Kerr
spacetime. Indeed, it has been shown [47] that there does not exist any foliation
of Kerr spacetime by hypersurfaces which (i) are axisymmetric, (ii) smoothly
Construction of initial data for 3+1 numerical relativity 14
reduce in the non-rotating limit to the hypersurfaces of constant Schwarzschild
time and (iii) are conformally flat, i.e. have induced metric γ̃ = f , as the
Bowen-York hypersurfaces have. This means that a Bowen-York solution with
Si 6= 0 does represent initial data for a rotating black hole, but this black hole is
not stationary: it is “surrounded” by gravitational radiation, as demonstrated by
the time development of these initial data [22, 49].
5. Conformal thin sandwich method
5.1. The original conformal thin sandwich method
An alternative to the conformal transverse-traceless method for computing initial data
has been introduced by York in 1999 [107]. The starting point is the identity
K = − 1
LNnγ = −
γ, (73)
where N is the lapse function and β is the shift vector associated with some 3+1
coordinates (t, xi). The traceless part of Eq. (73) leads to
Ãij =
γ̃ij − 2
k γ̃ij
, (74)
where Ãij is defined by Eq. (8). Noticing that
− Lβ γ̃ij = (L̃β)ij +
k, (75)
and introducing the short-hand notation
γ̃ij , (76)
we can rewrite Eq. (74) as
Ãij =
+ (L̃β)ij
. (77)
The relation between Ãij and Âij is [cf. Eqs. (7)-(8)]
Âij = Ψ6Ãij . (78)
Accordingly, Eq. (77) yields
Âij =
+ (L̃β)ij
, (79)
where we have introduced the conformal lapse
Ñ := Ψ−6N. (80)
Equation (79) constitutes a decomposition of Âij alternative to the longitudi-
nal/transverse decomposition (13). Instead of expressing Âij in terms of a vector
X and a TT tensor Â
TT, it expresses it in terms of the shift vector β, the time
derivative of the conformal metric, ˙̃γ
, and the conformal lapse Ñ .
The Hamiltonian constraint, written as the Lichnerowicz equation (10), takes the
same form as before:
D̃iD̃
iΨ− R̃
ÂijÂ
ij Ψ−7 + 2πẼΨ−3 − K
Ψ5 = 0, (81)
Construction of initial data for 3+1 numerical relativity 15
except that now Âij is to be understood as the combination (79) of βi, ˙̃γ
and Ñ .
On the other side, the momentum constraint (11) becomes, once expression (79) is
substituted for Âij ,
(L̃β)ij
+ D̃j
Ψ6D̃iK = 16πp̃i. (82)
In view of the system (81)-(82), the method to compute initial data consists in
choosing freely γ̃ij , ˙̃γ
, K, Ñ , Ẽ and p̃i on Σ0 and solving (81)-(82) to get Ψ and β
This method is called conformal thin sandwich (CTS ), because one input is the time
derivative ˙̃γ
, which can be obtained from the value of the conformal metric on two
neighbouring hypersurfaces Σt and Σt+δt (“thin sandwich” view point).
Remark : The term “thin sandwich” originates from a previous method devised in
the early sixties by Wheeler and his collaborators [4, 101]. Contrary to the
methods exposed here, the thin sandwich method was not based on a conformal
decomposition: it considered the constraint equations (1)-(2) as a system to be
solved for the lapse N and the shift vector β, given the metric γ and its time
derivative. The extrinsic curvature which appears in (1)-(2) was then considered
as the function of γ, ∂γ/∂t, N and β given by Eq. (73). However, this method
does not work in general [9]. On the contrary the conformal thin sandwich method
introduced by York [107] and exposed above was shown to work [35].
As for the conformal transverse-traceless method treated in Sec. 3, on CMC
hypersurfaces, Eq. (82) decouples from Eq. (81) and becomes an elliptic linear equation
for β.
5.2. Extended conformal thin sandwich method
An input of the above method is the conformal lapse Ñ . Considering the astrophysical
problem stated in Sec. 2.2, it is not clear how to pick a relevant value for Ñ . Instead
of choosing an arbitrary value, Pfeiffer and York [80] have suggested to compute Ñ
from the Einstein equation giving the time derivative of the trace K of the extrinsic
curvature, i.e.
K = −Ψ−4
D̃iD̃
iN + 2D̃i lnΨ D̃
4π(E + S) + ÃijÃ
, (83)
where S is the trace of the matter stress tensor as measured by the Eulerian observer:
S = γµνTµν . This amounts to add this equation to the initial data system. More
precisely, Pfeiffer and York [80] suggested to combine Eq. (83) with the Hamiltonian
constraint to get an equation involving the quantity NΨ = ÑΨ7 and containing no
scalar products of gradients as the D̃i lnΨD̃
iN term in Eq. (83), thanks to the identity
D̃iD̃
iN + 2D̃i lnΨ D̃
iN = Ψ−1
D̃iD̃
i(NΨ) +ND̃iD̃
. (84)
Expressing the left-hand side of the above equation in terms of Eq. (83) and
substituting D̃iD̃
iΨ in the right-hand side by its expression deduced from Eq. (81),
Construction of initial data for 3+1 numerical relativity 16
we get
D̃iD̃
i(ÑΨ7)− (ÑΨ7)
K2Ψ4 +
ÂijÂ
ijΨ−8 + 2π(Ẽ + 2S̃)Ψ−4
K̇ − βiD̃iK
Ψ5 = 0, (85)
where we have used the short-hand notation
K̇ :=
and have set
S̃ := Ψ8S. (87)
Adding Eq. (85) to Eqs. (81) and (82), the initial data system becomes
D̃iD̃
iΨ− R̃
ÂijÂ
ij Ψ−7 + 2πẼΨ−3 − K
Ψ5 = 0 (88)
(L̃β)ij
+ D̃j
Ψ6D̃iK = 16πp̃i (89)
D̃iD̃
i(ÑΨ7)− (ÑΨ7)
K2Ψ4 +
ÂijÂ
ijΨ−8 + 2π(Ẽ + 2S̃)Ψ−4
K̇ − βiD̃iK
Ψ5 = 0, (90)
where Âij is the function of Ñ , βi, γ̃ij and ˙̃γ
defined by Eq. (79). Equations (88)-(90)
constitute the extended conformal thin sandwich (XCTS ) system for the initial data
problem. The free data are the conformal metric γ̃, its coordinate time derivative ˙̃γ,
the extrinsic curvature trace K, its coordinate time derivative K̇, and the rescaled
matter variables Ẽ, S̃ and p̃i. The constrained data are the conformal factor Ψ, the
conformal lapse Ñ and the shift vector β.
Remark : The XCTS system (88)-(90) is a coupled system. Contrary to the CTT
system (26)-(27), the assumption of constant mean curvature, and in particular
of maximal slicing, does not allow to decouple it.
5.3. XCTS at work: static black hole example
Let us illustrate the extended conformal thin sandwich method on a simple example.
Take for the hypersurface Σ0 the punctured manifold considered in Sec. 4.3, namely
Σ0 = R
3\{O}. (91)
For the free data, let us perform the simplest choice:
γ̃ij = fij , ˙̃γ
= 0, K = 0, K̇ = 0, Ẽ = 0, S̃ = 0, and p̃i = 0, (92)
i.e. we are searching for vacuum initial data on a maximal and conformally flat
hypersurface with all the freely specifiable time derivatives set to zero. Thanks to
(92), the XCTS system (88)-(90) reduces to
ÂijÂ
ij Ψ−7 = 0 (93)
(Lβ)ij
= 0 (94)
∆(ÑΨ7)−
ÂijÂ
ijΨ−1Ñ = 0. (95)
Construction of initial data for 3+1 numerical relativity 17
Aiming at finding the simplest solution, we notice that
β = 0 (96)
is a solution of Eq. (94). Together with ˙̃γ
= 0, it leads to [cf. Eq. (79)]
Âij = 0. (97)
The system (93)-(95) reduces then further:
∆Ψ = 0 (98)
∆(ÑΨ7) = 0. (99)
Hence we have only two Laplace equations to solve. Moreover Eq. (98) decouples
from Eq. (99). For simplicity, let us assume spherical symmetry around the puncture
O. We introduce an adapted spherical coordinate system (xi) = (r, θ, ϕ) on Σ0. The
puncture O is then at r = 0. The simplest non-trivial solution of (98) which obeys
the asymptotic flatness condition Ψ → 1 as r → +∞ is
Ψ = 1 +
, (100)
where as in Sec. 4.1, the constant m is the ADM mass of Σ0 [cf. Eq. (63)]. Notice
that since r = 0 is excluded from Σ0, Ψ is a perfectly regular solution on the entire
manifold Σ0. Let us recall that the Riemannian manifold (Σ0,γ) corresponding to
this value of Ψ via γ = Ψ4f is the Riemannian manifold denoted (Σ′0,γ) in Sec. 4.1
and depicted in Fig. 3. In particular it has two asymptotically flat ends: r → +∞
and r → 0 (the puncture).
As for Eq. (98), the simplest solution of Eq. (99) obeying the asymptotic flatness
requirement ÑΨ7 → 1 as r → +∞ is
ÑΨ7 = 1 +
, (101)
where a is some constant. Let us determine a from the value of the lapse function at
the second asymptotically flat end r → 0. The lapse being related to Ñ via Eq. (80),
Eq. (101) is equivalent to
Ψ−1 =
r + a
r +m/2
. (102)
Hence
. (103)
There are two natural choices for limr→0 N . The first one is
N = 1, (104)
yielding a = m/2. Then, from Eq. (102) N = 1 everywhere on Σ0. This value of N
corresponds to a geodesic slicing. The second choice is
N = −1. (105)
This choice is compatible with asymptotic flatness: it simply means that the
coordinate time t is running “backward” near the asymptotic flat end r → 0. This
contradicts the assumption N > 0 in the standard definition of the lapse function.
However, we shall generalize here the definition of the lapse to allow for negative
values: whereas the unit vector n is always future-oriented, the scalar field t is allowed
to decrease towards the future. Such a situation has already been encountered for the
Construction of initial data for 3+1 numerical relativity 18
part of the slices t = const located on the left side of Fig. 4. Once reported into
Eq. (103), the choice (105) yields a = −m/2, so that
. (106)
Gathering relations (96), (100) and (106), we arrive at the following expression of the
spacetime metric components:
gµνdx
µdxν = −
1 + m
dt2 +
dr2 + r2(dθ2 + sin2 θdϕ2)
. (107)
We recognize the line element of Schwarzschild spacetime in isotropic coordinates.
Hence we recover the same initial data as in Sec. 4.1 and depicted in Figs. 3 and 4.
The bonus is that we have the complete expression of the metric g on Σ0, and not
only the induced metric γ.
Remark : The choices (104) and (105) for the asymptotic value of the lapse both lead
to a momentarily static initial slice in Schwarzschild spacetime. The difference
is that the time development corresponding to choice (104) (geodesic slicing) will
depend on t, whereas the time development corresponding to choice (105) will
not, since in the latter case t coincides with the standard Schwarzschild time
coordinate, which makes ∂t a Killing vector.
5.4. Uniqueness of solutions
Recently, Pfeiffer and York [81] have exhibited a choice of vacuum free data
(γ̃ij , ˙̃γ
,K, K̇) for which the solution (Ψ, Ñ , βi) to the XCTS system (88)-(90) is not
unique (actually two solutions are found). The conformal metric γ̃ is the flat metric
plus a linearized quadrupolar gravitational wave, as obtained by Teukolsky [92], with
a tunable amplitude. ˙̃γ
corresponds to the time derivative of this wave, and both
K and K̇ are chosen to zero. On the contrary, for the same free data, with K̇ = 0
substituted by Ñ = 1, Pfeiffer and York have shown that the original conformal thin
sandwich method as described in Sec. 5.1 leads to a unique solution (or no solution at
all if the amplitude of the wave is two large).
Baumgarte, Ó Murchadha and Pfeiffer [14] have argued that the lack of uniqueness
for the XCTS system may be due to the term
− (ÑΨ7)7
ÂijÂ
ijΨ−8 = − 7
Ψ6γ̃ikγ̃jl
+ (L̃β)ij
+ (L̃β)kl
(ÑΨ7)−1 (108)
in Eq. (90). Indeed, if we proceed as for the analysis of Lichnerowicz equation in
Sec. 3.4, we notice that this term, with the minus sign and the negative power of
(ÑΨ7)−1, makes the linearization of Eq. (90) of the type D̃iD̃
iǫ+αǫ = σ, with α > 0.
This “wrong” sign of α prevents the application of the maximum principle to guarantee
the uniqueness of the solution.
The non-uniqueness of solution of the XCTS system for certain choice of free data
has been confirmed by Walsh [100] by means of bifurcation theory.
5.5. Comparing CTT, CTS and XCTS
The conformal transverse traceless (CTT) method exposed in Sec. 3 and the
(extended) conformal thin sandwich (XCTS) method considered here differ by the
choice of free data: whereas both methods use the conformal metric γ̃ and the trace
Construction of initial data for 3+1 numerical relativity 19
of the extrinsic curvature K as free data, CTT employs in addition Â
TT, whereas
for CTS (resp. XCTS) the additional free data is ˙̃γ
, as well as Ñ (resp. K̇).
Since Â
TT is directly related to the extrinsic curvature and the latter is linked to
the canonical momentum of the gravitational field in the Hamiltonian formulation of
general relativity, the CTT method can be considered as the approach to the initial
data problem in the Hamiltonian representation. On the other side, ˙̃γ
being the
“velocity” of γ̃ij , the (X)CTS method constitutes the approach in the Lagrangian
representation [108].
Remark : The (X)CTS method assumes that the conformal metric is unimodular:
det(γ̃ij) = f (since Eq. (79) follows from this assumption), whereas the CTT
method can be applied with any conformal metric.
The advantage of CTT is that its mathematical theory is well developed, yielding
existence and uniqueness theorems, at least for constant mean curvature (CMC) slices.
The mathematical theory of CTS is very close to CTT. In particular, the momentum
constraint decouples from the Hamiltonian constraint on CMC slices. On the contrary,
XCTS has a much more involved mathematical structure. In particular the CMC
condition does not yield to any decoupling. The advantage of XCTS is then to be
better suited to the description of quasi-stationary spacetimes, since ˙̃γ
= 0 and
K̇ = 0 are necessary conditions for ∂t to be a Killing vector. This makes XCTS
the method to be used in order to prepare initial data in quasi-equilibrium. For
instance, it has been shown [57, 43] that XCTS yields orbiting binary black hole
configurations in much better agreement with post-Newtonian computations than the
CTT treatment based on a superposition of two Bowen-York solutions. Indeed, except
when they are very close and about to merge, the orbits of binary black holes evolve
very slowly, so that it is a very good approximation to consider that the system is in
quasi-equilibrium. XCTS takes this fully into account, while CTT relies on a technical
simplification (Bowen-York analytical solution of the momentum constraint), with no
direct relation to the quasi-equilibrium state.
A detailed comparison of CTT and XCTS for a single spinning or boosted black
hole has been performed by Laguna [68].
6. Initial data for binary systems
A major topic of contemporary numerical relativity is the computation of the merger
of a binary system of black holes [24] or neutron stars [84], for such systems are
among the most promising sources of gravitational radiation for the interferometric
detectors either groundbased (LIGO, VIRGO, GEO600, TAMA) or in space (LISA).
The problem of preparing initial data for these systems has therefore received a lot of
attention in the past decade.
6.1. Helical symmetry
Due to the gravitational-radiation reaction, a relativistic binary system has an inspiral
motion, leading to the merger of the two components. However, when the two bodies
are sufficiently far apart, one may approximate the spiraling orbits by closed ones.
Moreover, it is well known that gravitational radiation circularizes the orbits very
efficiently, at least for comparable mass systems [18]. We may then consider that the
motion is described by a sequence of closed circular orbits.
Construction of initial data for 3+1 numerical relativity 20
Figure 5. Action of the helical symmetry group, with Killing vector ℓ. χτ (P )
is the displacement of the point P by the member of the symmetry group of
parameter τ . N and β are respectively the lapse function and the shift vector
associated with coordinates adapted to the symmetry, i.e. coordinates (t, xi) such
that ∂t = ℓ.
The geometrical translation of this physical assumption is that the spacetime
(M, g) is endowed with some symmetry, called helical symmetry. Indeed exactly
circular orbits imply the existence of a one-parameter symmetry group such that the
associated Killing vector ℓ obeys the following properties [46]: (i) ℓ is timelike near
the system, (ii) far from it, ℓ is spacelike but there exists a smaller number T > 0 such
that the separation between any point P and its image χT (P ) under the symmetry
group is timelike (cf. Fig. 5). ℓ is called a helical Killing vector, its field lines in a
spacetime diagram being helices (cf. Fig. 5).
Helical symmetry is exact in theories of gravity where gravitational radiation does
not exist, namely:
• in Newtonian gravity,
• in post-Newtonian gravity, up to the second order,
• in the Isenberg-Wilson-Mathews (IWM) approximation to general relativity,
based on the assumptions γ̃ = f and K = 0 [61, 102].
Moreover helical symmetry can be exact in full general relativity for a non-
axisymmetric system (such as a binary) with standing gravitational waves [44]. But
notice that a spacetime with helical symmetry and standing gravitational waves cannot
be asymptotically flat [48].
To treat helically symmetric spacetimes, it is natural to choose coordinates (t, xi)
that are adapted to the symmetry, i.e. such that
∂t = ℓ. (109)
Then all the fields are independent of the coordinate t. In particular,
= 0 and K̇ = 0. (110)
Construction of initial data for 3+1 numerical relativity 21
If we employ the XCTS formalism to compute initial data, we therefore get some
definite prescription for the free data ˙̃γ
and K̇. On the contrary, the requirements
(110) do not have any immediate translation in the CTT formalism.
Remark : Helical symmetry can also be useful to treat binary black holes outside the
scope of the 3+1 formalism, as shown by Klein [67], who developed a quotient
space formalism to reduce the problem to a three dimensional SL(2,R)/SO(1, 1)
sigma model.
Taking into account (110) and choosing maximal slicing (K = 0), the XCTS
system (88)-(90) becomes
D̃iD̃
ÂijÂ
ij Ψ−7 + 2πẼΨ−3 = 0 (111)
(L̃β)ij
− 16πp̃i = 0 (112)
D̃iD̃
i(ÑΨ7)− (ÑΨ7)
ÂijÂ
ijΨ−8 + 2π(Ẽ + 2S̃)Ψ−4
= 0, (113)
where [cf. Eq. (79)]
Âij =
(L̃β)ij . (114)
6.2. Helical symmetry and IWM approximation
If we choose, as part of the free data, the conformal metric to be flat,
γ̃ij = fij , (115)
then the helically symmetric XCTS system (111)-(113) reduces to
ÂijÂ
ij Ψ−7 + 2πẼΨ−3 = 0 (116)
∆βi +
DiDjβj − (Lβ)ijDj ln Ñ = 16πÑp̃i (117)
∆(ÑΨ7)− (ÑΨ7)
ÂijÂ
ijΨ−8 + 2π(Ẽ + 2S̃)Ψ−4
= 0, (118)
where
Âij =
(Lβ)ij (119)
and D is the connection associated with the flat metric f , ∆ := DiDi is the flat
Laplacian [Eq. (46)], and (Lβ)ij := Diβj +Djβi− 2
Dkβk f ij [Eq. (15) with D̃i = Di].
We remark that the system (116)-(118) is identical to the system defining
the Isenberg-Wilson-Mathews approximation to general relativity [61, 102] (see e.g.
Sec. 6.6 of Ref. [51]). This means that, within helical symmetry, the XCTS system
with the choice K = 0 and γ̃ = f is equivalent to the IWM system.
Remark : Contrary to IWM, XCTS is not some approximation to general relativity:
it provides exact initial data. The only thing that may be questioned is the
astrophysical relevance of the XCTS data with γ̃ = f .
Construction of initial data for 3+1 numerical relativity 22
6.3. Initial data for orbiting binary black holes
The concept of helical symmetry for generating orbiting binary black hole initial data
has been introduced in 2002 by Gourgoulhon, Grandclément and Bonazzola [52, 57].
The system of equations that these authors have derived is equivalent to the XCTS
system with γ̃ = f , their work being previous to the formulation of the XCTS method
by Pfeiffer and York (2003) [80]. Since then other groups have combined XCTS with
helical symmetry to compute binary black hole initial data [38, 1, 2, 31]. Since all
these studies are using a flat conformal metric [choice (115)], the PDE system to be
solved is (116)-(118), with the additional simplification Ẽ = 0 and p̃i = 0 (vacuum).
The initial data manifold Σ0 is chosen to be R
3 minus two balls:
Σ0 = R
3\(B1 ∪ B2). (120)
In addition to the asymptotic flatness conditions, some boundary conditions must be
provided on the surfaces S1 and S2 of B1 and B2. One choose boundary conditions
corresponding to a non-expanding horizon, since this concept characterizes black holes
in equilibrium. We shall not detail these boundary conditions here; they can be found
in Refs. [38, 40, 41, 54, 65]. The condition of non-expanding horizon provides 3
among the 5 required boundary conditions [for the 5 components (Ψ, Ñ , βi)]. The two
remaining boundary conditions are given by (i) the choice of the foliation (choice of
the value of N at S1 and S2) and (ii) the choice of the rotation state of each black
hole (“individual spin”), as explained in Ref. [31].
Numerical codes for solving the above system have been constructed by
• Grandclément, Gourgoulhon and Bonazzola (2002) [57] for corotating binary
black holes;
• Cook, Pfeiffer, Caudill and Grigsby (2004, 2006) [38, 31] for corotating and
irrotational binary black holes;
• Ansorg (2005, 2007) [1, 2] for corotating binary black holes.
Detailed comparisons with post-Newtonian initial data (either from the standard post-
Newtonian formalism [17] or from the Effective One-Body approach [23, 42]) have
revealed a very good agreement, as shown in Refs. [43, 31].
An alternative to (120) for the initial data manifold would be to consider the
twice-punctured R3:
Σ0 = R
3\{O1, O2}, (121)
where O1 and O2 are two points of R
3. This would constitute some extension to the
two bodies case of the punctured initial data discussed in Sec. 5.3. However, as shown
by Hannam, Evans, Cook and Baumgarte in 2003 [60], it is not possible to find a
solution of the helically symmetric XCTS system with a regular lapse in this case‖.
For this reason, initial data based on the puncture manifold (121) are computed within
the CTT framework discussed in Sec. 3. As already mentioned, there is no natural
way to implement helical symmetry in this framework. One instead selects the free
data Â
TT to vanish identically, as in the single black hole case treated in Secs. 4.1 and
4.3. Then
Âij = (L̃X)ij . (122)
‖ see however Ref. [59] for some attempt to circumvent this
Construction of initial data for 3+1 numerical relativity 23
The vector X must obey Eq. (45), which arises from the momentum constraint. Since
this equation is linear, one may choose for X a linear superposition of two Bowen-York
solutions (Sec. 4.3):
X = X(P (1),S(1)) +X(P (2),S(2)), (123)
where X(P (a),S(a)) (a = 1, 2) is the Bowen-York solution (71) centered on Oa. This
method has been first implemented by Baumgarte in 2000 [11]. It has been since then
used by Baker, Campanelli, Lousto and Takashi (2002) [5] and Ansorg, Brügmann and
Tichy (2004) [3]. The initial data hence obtained are closed from helically symmetric
XCTS initial data at large separation but deviate significantly from them, as well
as from post-Newtonian initial data, when the two black holes are very close. This
means that the Bowen-York extrinsic curvature is bad for close binary systems in
quasi-equilibrium (see discussion in Ref. [43]).
Remark : Despite of this, CTT Bowen-York configurations have been used as initial
data for the recent binary black hole inspiral and merger computations by Baker
et al. [6, 7, 99] and Campanelli et al. [25, 26, 27, 28]. Fortunately, these initial
data had a relative large separation, so that they differed only slightly from the
helically symmetric XCTS ones.
Instead of choosing somewhat arbitrarily the free data of the CTT and XCTS
methods, notably setting γ̃ = f , one may deduce them from post-Newtonian results.
This has been done for the binary black hole problem by Tichy, Brügmann, Campanelli
and Diener (2003) [94], who have used the CTT method with the free data (γ̃ij , Â
given by the second order post-Newtonian (2PN) metric. This work has been improved
recently by Kelly, Tichy, Campanelli and Whiting (2007) [66]. In the same spirit,
Nissanke (2006) [75] has provided 2PN free data for both the CTT and XCTS methods.
6.4. Initial data for orbiting binary neutron stars
For computing initial data corresponding to orbiting binary neutron stars, one must
solve equations for the fluid motion in addition to the Einstein constraints. Basically
this amounts to solving ∇νT µν = 0 in the context of helical symmetry. One can then
show that a first integral of motion exists in two cases: (i) the stars are corotating,
i.e. the fluid 4-velocity is colinear to the helical Killing vector (rigid motion), (ii) the
stars are irrotational, i.e. the fluid vorticity vanishes. The most straightforward way
to get the first integral of motion is by means of the Carter-Lichnerowicz formulation
of relativistic hydrodynamics, as shown in Sec. 7 of Ref. [50]. Other derivations have
been obtained in 1998 by Teukolsky [93] and Shibata [83].
From the astrophysical point of view, the irrotational motion is much more
interesting than the corotating one, because the viscosity of neutron star matter is
far too low to ensure the synchronization of the stellar spins with the orbital motion.
On the other side, the irrotational state is a very good approximation for neutron
stars that are not millisecond rotators. Indeed, for these stars the spin frequency is
much lower than the orbital frequency at the late stages of the inspiral and thus can
be neglected.
The first initial data for binary neutron stars on circular orbits have been
computed by Baumgarte, Cook, Scheel, Shapiro and Teukolsky in 1997 [12, 13] in
the corotating case, and by Bonazzola, Gourgoulhon and Marck in 1999 [19] in the
irrotational case. These results were based on a polytropic equation of state. Since
then configurations in the irrotational regime have been obtained
Construction of initial data for 3+1 numerical relativity 24
• for a polytropic equation of state [73, 96, 97, 53, 90, 91] (the configurations
obtained in Ref. [91] have been used as initial data by Shibata [84] to compute
the merger of binary neutron stars);
• for nuclear matter equations of state issued from recent nuclear physics
computations [16, 77];
• for strange quark matter [78, 72].
All these computation are based on a flat conformal metric [choice (115)], by
solving the helically symmetric XCTS system (116)-(118), supplemented by an elliptic
equation for the velocity potential. Only very recently, configurations based on a non
flat conformal metric have been obtained by Uryu, Limousin, Friedman, Gourgoulhon
and Shibata [98]. The conformal metric is then deduced from a waveless approximation
developed by Shibata, Uryu and Friedman [85] and which goes beyond the IWM
approximation.
6.5. Initial data for black hole - neutron star binaries
Let us mention briefly that initial data for a mixed binary system, i.e. a system
composed of a black hole and a neutron star, have been obtained very recently by
Grandclément [55] and Taniguchi, Baumgarte, Faber and Shapiro [88, 89]. Codes
aiming at computing such systems have also been presented by Ansorg [2] and Tsokaros
and Uryu [95].
Acknowledgments
I warmly thank the organizers of the VII Mexican school, namely Miguel Alcubierre,
Hugo Garcia-Compean and Luis Urena, for their support and the success of the school.
I also express my gratitude to Marcelo Salgado for his help and many discussions and
to Nicolas Vasset for the careful reading of the manuscript.
References
[1] M. Ansorg : Double-domain spectral method for black hole excision data, Phys. Rev. D 72,
024018 (2005).
[2] M. Ansorg: Multi-Domain Spectral Method for Initial Data of Arbitrary Binaries in General
Relativity, Class. Quantum Grav. 24, S1 (2007).
[3] M. Ansorg, B. Brügmann and W. Tichy : Single-domain spectral method for black hole puncture
data, Phys. Rev. D 70, 064011 (2004).
[4] R.F. Baierlein, D.H Sharp and J.A. Wheeler : Three-Dimensional Geometry as Carrier of
Information about Time, Phys. Rev. 126, 1864 (1962).
[5] J.G. Baker, M. Campanelli, C.O. Lousto and R. Takahashi : Modeling gravitational radiation
from coalescing binary black holes, Phys. Rev. D 65, 124012 (2002).
[6] J.G. Baker, J. Centrella, D.-I. Choi, M. Koppitz, and J. van Meter : Gravitational-Wave
Extraction from an Inspiraling Configuration of Merging Black Holes, Phys. Rev. Lett. 96,
111102 (2006).
[7] J.G. Baker, J. Centrella, D.-I. Choi, M. Koppitz, and J. van Meter : Binary black hole merger
dynamics and waveforms, Phys. Rev. D 73, 104002 (2006).
[8] R. Bartnik : Quasi-spherical metrics and prescribed scalar curvature, J. Diff. Geom. 37, 31
(1993).
[9] R. Bartnik and G. Fodor : On the restricted validity of the thin sandwich conjecture, Phys. Rev.
D 48, 3596 (1993).
[10] R. Bartnik and J. Isenberg : The Constraint Equations, in The Einstein Equations and the
Large Scale Behavior of Gravitational Fields — 50 years of the Cauchy Problem in General
Relativity, edited by P.T. Chruściel and H. Friedrich, Birkhäuser Verlag, Basel (2004), p. 1.
Construction of initial data for 3+1 numerical relativity 25
[11] T.W. Baumgarte : Innermost stable circular orbit of binary black holes, Phys. Rev. D 62, 024018
(2000).
[12] T.W. Baumgarte, G.B. Cook, M.A. Scheel, S.L. Shapiro, and S.A. Teukolsky : Binary neutron
stars in general relativity: Quasiequilibrium models, Phys. Rev. Lett. 79, 1182 (1997).
[13] T.W. Baumgarte, G.B. Cook, M.A. Scheel, S.L. Shapiro, and S.A. Teukolsky : General
relativistic models of binary neutron stars in quasiequilibrium, Phys. Rev. D 57, 7299 (1998).
[14] T.W. Baumgarte, N. Ó Murchadha, and H.P. Pfeiffer : Einstein constraints: Uniqueness and
non-uniqueness in the conformal thin sandwich approach, Phys. Rev. D 75, 044009 (2007).
[15] R. Beig and W. Krammer : Bowen-York tensors, Class. Quantum Grav. 21, S73 (2004).
[16] M. Bejger, D. Gondek-Rosińska, E. Gourgoulhon, P. Haensel, K. Taniguchi, and J. L. Zdunik :
Impact of the nuclear equation of state on the last orbits of binary neutron stars, Astron.
Astrophys. 431, 297-306 (2005).
[17] L. Blanchet : Innermost circular orbit of binary black holes at the third post-Newtonian
approximation, Phys. Rev. D 65, 124009 (2002).
[18] L. Blanchet : Gravitational Radiation from Post-Newtonian Sources and Inspiralling Compact
Binaries, Living Rev. Relativity 9, 4 (2006); http://www.livingreviews.org/lrr-2006-4
[19] S. Bonazzola, E. Gourgoulhon, and J.-A. Marck : Numerical models of irrotational binary
neutron stars in general relativity, Phys. Rev. Lett. 82, 892 (1999).
[20] J.M. Bowen and J.W. York : Time-asymmetric initial data for black holes and black-hole
collisions, Phys. Rev. D 21, 2047 (1980).
[21] S. Brandt and B. Brügmann : A Simple Construction of Initial Data for Multiple Black Holes,
Phys. Rev. Lett. 78, 3606 (1997).
[22] S.R. Brandt and E. Seidel : Evolution of distorted rotating black holes. II. Dynamics and
analysis, Phys. Rev. D 52, 870 (1995).
[23] A. Buonanno and T. Damour : Effective one-body approach to general relativistic two-body
dynamics, Phys. Rev. D 59, 084006 (1999).
[24] M. Campanelli : The dawn of a golden age for binary black hole simulations, in these
proceedings.
[25] M. Campanelli, C. O. Lousto, P. Marronetti, and Y. Zlochower : Accurate Evolutions of Orbiting
Black-Hole Binaries without Excision, Phys. Rev. Lett. 96, 111101 (2006).
[26] M. Campanelli, C. O. Lousto, and Y. Zlochower : Last orbit of binary black holes, Phys. Rev.
D 73, 061501(R) (2006).
[27] M. Campanelli, C. O. Lousto, and Y. Zlochower : Spinning-black-hole binaries: The orbital
hang-up, Phys. Rev. D 74, 041501(R) (2006).
[28] M. Campanelli, C. O. Lousto, and Y. Zlochower : Spin-orbit interactions in black-hole binaries,
Phys. Rev. D 74, 084023 (2006).
[29] M. Cantor: The existence of non-trivial asymptotically flat initial data for vacuum spacetimes,
Commun. Math. Phys. 57, 83 (1977).
[30] M. Cantor : Some problems of global analysis on asymptotically simple manifolds, Compositio
Mathematica 38, 3 (1979); available at http://www.numdam.org/item?id=CM_1979__38_1_3_0
[31] M. Caudill, G.B. Cook, J.D. Grigsby, and H.P. Pfeiffer : Circular orbits and spin in black-hole
initial data, Phys. Rev. D 74, 064011 (2006).
[32] M.W. Choptuik : Numerical analysis for numerical relativists, in these proceedings.
[33] Y. Choquet-Bruhat : New elliptic system and global solutions for the constraints equations in
general relativity, Commun. Math. Phys. 21, 211 (1971).
[34] Y. Choquet-Bruhat and D. Christodoulou : Elliptic systems of Hs,δ spaces on manifolds which
are Euclidean at infinity, Acta Math. 146, 129 (1981)
[35] Y. Choquet-Bruhat, J. Isenberg, and J.W. York : Einstein constraints on asymptotically
Euclidean manifolds, Phys. Rev. D 61, 084034 (2000).
[36] Y. Choquet-Bruhat and J.W. York : The Cauchy Problem, in General Relativity and
Gravitation, one hundred Years after the Birth of Albert Einstein, Vol. 1, edited by A. Held,
Plenum Press, New York (1980), p. 99.
[37] G.B. Cook : Initial data for numerical relativity, Living Rev. Relativity 3, 5 (2000);
http://www.livingreviews.org/lrr-2000-5
[38] G.B. Cook and H.P. Pfeiffer : Excision boundary conditions for black-hole initial data, Phys.
Rev. D 70, 104016 (2004).
[39] J. Corvino : Scalar curvature deformation and a gluing construction for the Einstein constraint
equations, Commun. Math. Phys. 214, 137 (2000).
[40] S. Dain : Trapped surfaces as boundaries for the constraint equations, Class. Quantum Grav.
21, 555 (2004); errata in Class. Quantum Grav. 22, 769 (2005).
[41] S. Dain, J.L. Jaramillo, and B. Krishnan : On the existence of initial data containing isolated
Construction of initial data for 3+1 numerical relativity 26
black holes, Phys.Rev. D 71, 064003 (2005).
[42] T. Damour : Coalescence of two spinning black holes: An effective one-body approach, Phys.
Rev. D 64, 124013 (2001).
[43] T. Damour, E. Gourgoulhon, and P. Grandclément : Circular orbits of corotating binary black
holes: comparison between analytical and numerical results, Phys. Rev. D 66, 024007 (2002).
[44] S. Detweiler : Periodic solutions of the Einstein equations for binary systems, Phys. Rev. D 50,
4929 (1994).
[45] Y. Fourès-Bruhat (Y. Choquet-Bruhat) : Sur l’Intégration des Équations de la Relativité
Générale, J. Rational Mech. Anal. 5, 951 (1956).
[46] J.L. Friedman, K. Uryu and M. Shibata : Thermodynamics of binary black holes and neutron
stars, Phys. Rev. D 65, 064035 (2002); erratum in Phys. Rev. D 70, 129904(E) (2004).
[47] A. Garat and R.H. Price : Nonexistence of conformally flat slices of the Kerr spacetime, Phys.
Rev. D 61, 124011 (2000).
[48] G.W. Gibbons and J.M. Stewart : Absence of asymptotically flat solutions of Einstein’s
equations which are periodic and empty near infinity, in Classical General Relativity,
Eds. W.B. Bonnor, J.N. Islam and M.A.H. MacCallum Cambridge University Press,
Cambridge (1983), p. 77.
[49] R.J. Gleiser, C.O. Nicasio, R.H. Price, and J. Pullin : Evolving the Bowen-York initial data for
spinning black holes, Phys. Rev. D 57, 3401 (1998).
[50] E. Gourgoulhon : An introduction to relativistic hydrodynamics, in Stellar Fluid Dynamics
and Numerical Simulations: From the Sun to Neutron Stars, edited by M. Rieutord & B.
Dubrulle, EAS Publications Series 21, EDP Sciences, Les Ulis (2006), p. 43; available as
arXiv:gr-qc/0603009.
[51] E. Gourgoulhon : 3+1 Formalism and Bases of Numerical Relativity, lectures at Institut Henri
Poincaré (Paris, Sept.-Dec. 2006), arXiv:gr-qc/0703035.
[52] E. Gourgoulhon, P. Grandclément, and S. Bonazzola : Binary black holes in circular orbits. I.
A global spacetime approach, Phys. Rev. D 65, 044020 (2002).
[53] E. Gourgoulhon, P. Grandclément, K. Taniguchi, J.-A. Marck, and S. Bonazzola :
Quasiequilibrium sequences of synchronized and irrotational binary neutron stars in general
relativity: Method and tests, Phys. Rev. D 63, 064029 (2001).
[54] E. Gourgoulhon and J.L. Jaramillo : A 3+1 perspective on null hypersurfaces and isolated
horizons, Phys. Rep. 423, 159 (2006).
[55] P. Grandclément : Accurate and realistic initial data for black hole-neutron star binaries, Phys.
Rev. D 74, 124002 (2006); erratum in Phys. Rev. D 75, 129903(E) (2007).
[56] P. Grandclément, S. Bonazzola, E. Gourgoulhon, and J.-A. Marck : A multi-domain spectral
method for scalar and vectorial Poisson equations with non-compact sources, J. Comput.
Phys. 170, 231 (2001).
[57] P. Grandclément, E. Gourgoulhon, and S. Bonazzola : Binary black holes in circular orbits. II.
Numerical methods and first results, Phys. Rev. D 65, 044021 (2002).
[58] P. Grandclément and J. Novak : Spectral methods for numerical relativity, Living Rev. Relativity,
submitted, preprint arXiv:0706.2286.
[59] M.D. Hannam : Quasicircular orbits of conformal thin-sandwich puncture binary black holes,
Phys. Rev. D 72, 044025 (2005).
[60] M.D. Hannam, C.R. Evans, G.B Cook and T.W. Baumgarte : Can a combination of
the conformal thin-sandwich and puncture methods yield binary black hole solutions in
quasiequilibrium?, Phys. Rev. D 68, 064003 (2003).
[61] J.A. Isenberg : Waveless Approximation Theories of Gravity, preprint University of Maryland
(1978), unpublished but available as arXiv:gr-qc/0702113; an abridged version can be found
in Ref. [64].
[62] J. Isenberg : Constant mean curvature solutions of the Einstein constraint equations on closed
manifolds, Class. Quantum Grav. 12, 2249 (1995).
[63] J. Isenberg, R. Mazzeo, and D. Pollack : Gluing and wormholes for the Einstein constraint
equations, Commun. Math. Phys. 231, 529 (2002).
[64] J. Isenberg and J. Nester : Canonical Gravity, in General Relativity and Gravitation, one
hundred Years after the Birth of Albert Einstein, Vol. 1, edited by A. Held, Plenum Press,
New York (1980), p. 23.
[65] J.L. Jaramillo, M. Ansorg, F. Limousin : Numerical implementation of isolated horizon boundary
conditions, Phys. Rev. D 75, 024019 (2007).
[66] B.J. Kelly, W. Tichy, M. Campanelli, and B.F. Whiting : Black-hole puncture initial data with
realistic gravitational wave content, Phys. Rev. D 76, 024008 (2007).
[67] C. Klein : Binary black hole spacetimes with a helical Killing vector, Phys. Rev. D 70, 124026
http://arxiv.org/abs/gr-qc/0603009
http://arxiv.org/abs/gr-qc/0703035
http://arxiv.org/abs/0706.2286
http://arxiv.org/abs/gr-qc/0702113
Construction of initial data for 3+1 numerical relativity 27
(2004).
[68] P. Laguna : Conformal-thin-sandwich initial data for a single boosted or spinning black hole
puncture, Phys. Rev. D 69, 104020 (2004).
[69] P. Laguna : Two and three body encounters: Astrophysics and the role of numerical relativity,
in these proceedings.
[70] A. Lichnerowicz : L’intégration des équations de la gravitation relativiste et le problème des n
corps, J. Math. Pures Appl. 23, 37 (1944); reprinted in A. Lichnerowicz : Choix d’œuvres
mathématiques, Hermann, Paris (1982), p. 4.
[71] A. Lichnerowicz : Sur les équations relativistes de la gravitation, Bulletin de la S.M.F. 80, 237
(1952); available at http://www.numdam.org/item?id=BSMF_1952__80__237_0
[72] F. Limousin, D. Gondek-Rosińska, and E. Gourgoulhon : Last orbits of binary strange quark
stars, Phys. Rev. D 71, 064012 (2005).
[73] P. Marronetti, G.J. Mathews, and J.R. Wilson : Irrotational binary neutron stars in
quasiequilibrium, Phys. Rev. D 60, 087301 (1999).
[74] D. Maxwell : Initial Data for Black Holes and Rough Spacetimes, PhD Thesis, University of
Washington (2004).
[75] S. Nissanke : Post-Newtonian freely specifiable initial data for binary black holes in numerical
relativity, Phys. Rev. D 73, 124002 (2006).
[76] N. Ó Murchadha and J.W. York : Initial-value problem of general relativity. I. General
formulation and physical interpretation, Phys. Rev. D 10, 428 (1974).
[77] R. Oechslin, H.-T. Janka and A. Marek : Relativistic neutron star merger simulations with
non-zero temperature equations of state I. Variation of binary parameters and equation of
state, Astron. Astrophys. 467, 395 (2007).
[78] R. Oechslin, K. Uryu, G. Poghosyan, and F. K. Thielemann : The Influence of Quark Matter
at High Densities on Binary Neutron Star Mergers, Mon. Not. Roy. Astron. Soc. 349, 1469
(2004).
[79] H.P. Pfeiffer : The initial value problem in numerical relativity, in Proceedings Miami Waves
Conference 2004 [preprint arXiv:gr-qc/0412002].
[80] H.P. Pfeiffer and J.W. York : Extrinsic curvature and the Einstein constraints, Phys. Rev. D
67, 044022 (2003).
[81] H.P. Pfeiffer and J.W. York : Uniqueness and Nonuniqueness in the Einstein Constraints, Phys.
Rev. Lett. 95, 091101 (2005).
[82] F. Pretorius : Evolution of Binary Black-Hole Spacetimes, Phys. Rev. Lett. 95, 121101 (2005).
[83] M. Shibata : Relativistic formalism for computation of irrotational binary stars in
quasiequilibrium states, Phys. Rev. D 58, 024012 (1998).
[84] M. Shibata : Merger of binary neutron stars in full general relativity, in these proceedings.
[85] M. Shibata, K. Uryu, and J.L. Friedman : Deriving formulations for numerical computation of
binary neutron stars in quasicircular orbits, Phys. Rev. D 70, 044044 (2004); errata in Phys.
Rev. D 70, 129901(E) (2004).
[86] D. Shoemaker : Binary Black Hole Simulations Through the Eyepiece of Data Analysis, in these
proceedings.
[87] L. Smarr and J.W. York : Radiation gauge in general relativity, Phys. Rev. D 17, 1945 (1978).
[88] K. Taniguchi, T.W. Baumgarte, J.A. Faber, and S.L. Shapiro : Quasiequilibrium sequences of
black-hole-neutron-star binaries in general relativity, Phys. Rev. D 74, 041502(R) (2006).
[89] K. Taniguchi, T.W. Baumgarte, J.A. Faber, and S.L. Shapiro : Quasiequilibrium black hole-
neutron star binaries in general relativity, Phys. Rev. D 75, 084005 (2007).
[90] K. Taniguchi and E. Gourgoulhon : Quasiequilibrium sequences of synchronized and irrotational
binary neutron stars in general relativity. III. Identical and different mass stars with γ = 2,
Phys. Rev. D 66, 104019 (2002).
[91] K. Taniguchi and E. Gourgoulhon : Various features of quasiequilibrium sequences of binary
neutron stars in general relativity, Phys. Rev. D 68, 124025 (2003).
[92] S.A. Teukolsky : Linearized quadrupole waves in general relativity and the motion of test
particles, Phys. Rev. D 26, 745 (1982).
[93] S.A Teukolsky : Irrotational binary neutron stars in quasi-equilibrium in general relativity,
Astrophys. J. 504, 442 (1998).
[94] W. Tichy, B. Brügmann, M. Campanelli, and P. Diener : Binary black hole initial data for
numerical general relativity based on post-Newtonian data, Phys. Rev. D 67, 064008 (2003).
[95] A.A. Tsokaros and K. Uryu : Numerical method for binary black hole/neutron star initial data:
Code test, Phys. Rev. D 75, 044026 (2007).
[96] K. Uryu and Y. Eriguchi : New numerical method for constructing quasiequilibrium sequences
of irrotational binary neutron stars in general relativity, Phys. Rev. D 61, 124023 (2000).
http://arxiv.org/abs/gr-qc/0412002
Construction of initial data for 3+1 numerical relativity 28
[97] K. Uryu, M. Shibata, and Y. Eriguchi : Properties of general relativistic, irrotational binary
neutron stars in close quasiequilibrium orbits: Polytropic equations of state, Phys. Rev. D
62, 104015 (2000).
[98] K. Uryu, F. Limousin, J.L. Friedman, E. Gourgoulhon, and M. Shibata : Binary Neutron Stars:
Equilibrium Models beyond Spatial Conformal Flatness, Phys. Rev. Lett. 97, 171101 (2006).
[99] J.R. van Meter, J.G. Baker, M. Koppitz, D.I. Choi : How to move a black hole without excision:
gauge conditions for the numerical evolution of a moving puncture, Phys. Rev. D 73, 124011
(2006).
[100] D. Walsh : Non-uniqueness in conformal formulations of the Einstein Constraints, Class.
Quantum Grav 24, 1911 (2007).
[101] J.A. Wheeler : Geometrodynamics and the issue of the final state, in Relativity, Groups and
Topology, edited by C. DeWitt and B.S. DeWitt, Gordon and Breach, New York (1964),
p. 316.
[102] J.R. Wilson and G.J. Mathews : Relativistic hydrodynamics, in Frontiers in numerical
relativity, edited by C.R. Evans, L.S. Finn and D.W. Hobill, Cambridge University Press,
Cambridge (1989), p. 306.
[103] J.W. York : Mapping onto Solutions of the Gravitational Initial Value Problem, J. Math. Phys.
13, 125 (1972).
[104] J.W. York : Conformally invariant orthogonal decomposition of symmetric tensors on
Riemannian manifolds and the initial-value problem of general relativity, J. Math. Phys.
14, 456 (1973).
[105] J.W. York : Covariant decompositions of symmetric tensors in the theory of gravitation, Ann.
Inst. Henri Poincaré A 21, 319 (1974);
available at http://www.numdam.org/item?id=AIHPA_1974__21_4_319_0
[106] J.W. York : Kinematics and dynamics of general relativity, in Sources of Gravitational
Radiation, edited by L.L. Smarr, Cambridge University Press, Cambridge (1979), p. 83.
[107] J.W. York : Conformal “thin-sandwich” data for the initial-value problem of general relativity,
Phys. Rev. Lett. 82, 1350 (1999).
[108] J.W. York : Velocities and Momenta in an Extended Elliptic Form of the Initial Value
Conditions, Nuovo Cim. B119, 823 (2004).
	Introduction
	The initial data problem
	3+1 decomposition of Einstein equation
	Constructing initial data
	Conformal decomposition of the constraints
	Conformal transverse-traceless method
	Longitudinal/transverse decomposition of "705EAij
	Conformal transverse-traceless form of the constraints
	Decoupling on hypersurfaces of constant mean curvature
	Lichnerowicz equation
	Conformally flat initial data by the CTT method
	Momentarily static initial data
	Slice of Schwarzschild spacetime
	Bowen-York initial data
	Conformal thin sandwich method
	The original conformal thin sandwich method
	Extended conformal thin sandwich method
	XCTS at work: static black hole example
	Uniqueness of solutions
	Comparing CTT, CTS and XCTS
	Initial data for binary systems
	Helical symmetry
	Helical symmetry and IWM approximation
	Initial data for orbiting binary black holes
	Initial data for orbiting binary neutron stars
	Initial data for black hole - neutron star binaries
ABSTRACT
  This lecture is devoted to the problem of computing initial data for the
Cauchy problem of 3+1 general relativity. The main task is to solve the
constraint equations. The conformal technique, introduced by Lichnerowicz and
enhanced by York, is presented. Two standard methods, the conformal
transverse-traceless one and the conformal thin sandwich, are discussed and
illustrated by some simple examples. Finally a short review regarding initial
data for binary systems (black holes and neutron stars) is given.

<|endoftext|><|startoftext|>
Magnetism and Thermodynamics of Spin-1/2 Heisenberg Diamond Chains in a
Magnetic Field
Bo Gu and Gang Su∗
College of Physical Sciences, Graduate University of Chinese Academy of Sciences, P. O. Box 4588, Beijing 100049, China
The magnetic and thermodynamic properties of spin-1/2 Heisenberg diamond chains are investi-
gated in three different cases: (a) J1, J2, J3 > 0 (frustrated); (b) J1, J3 < 0, J2 > 0 (frustrated);
and (c) J1, J2 > 0, J3 < 0 (non-frustrated), where the bond coupling Ji (i = 1, 2, 3) > 0 stands
for an antiferromagnetic (AF) interaction, and < 0 for a ferromagnetic (F) interaction. The density
matrix renormalization group (DMRG) technique is invoked to study the properties of the system
in the ground state, while the transfer matrix renormalization group (TMRG) technique is applied
to explore the thermodynamic properties. The local magnetic moments, spin correlation functions,
and static structure factors are discussed in the ground state for the three cases. It is shown that
the static structure factor S(q) shows peaks at wavevectors q = aπ/3 (a = 0, 1, 2, 3, 4, 5) for different
couplings in a zero magnetic field, which, however in the magnetic fields where the magnetization
plateau with m = 1/6 pertains, exhibits the peaks only at q = 0, 2π/3 and 4π/3, which are found
to be couplings-independent. The DMRG results of the zero-field static structure factor can be
nicely fitted by a linear superposition of six modes, where two fitting equations are proposed. It is
observed that the six modes are closely related to the low-lying excitations of the system. At finite
temperatures, the magnetization, susceptibility and specific heat show various behaviors for different
couplings. The double-peak structures of the susceptibility and specific heat against temperature
are obtained, where the peak positions and heights are found to depend on the competition of the
couplings. It is also uncovered that the XXZ anisotropy of F and AF couplings leads the system of
case (c) to display quite different behaviors. In addition, the experimental data of the susceptibility,
specific heat and magnetization for the compound Cu3(CO3)2(OH)2 are fairly compared with our
TMRG results.
PACS numbers: 75.10.Jm, 75.40.Cx
I. INTRODUCTION
Low-dimensional quantum spin systems with compet-
ing interactions have become an intriguing subject in the
last decades. Among many achievements in this area,
the phenomenon of the topological quantization of mag-
netization has attracted much attention both theoreti-
cally and experimentally. A general necessary condi-
tion for the appearance of the magnetization plateaus
has been proposed by Oshikawa, Yamanaka and Affleck
(OYA) [1], stating that for the Heisenberg antiferromag-
netic (AF) spin chain with a single-ion anisotropy, the
magnetization curve may have plateaus at which the
magnetization per site m is topologically quantized by
n(S − m) = integer, where S is the spin, and n is the
period of the ground state determined by the explicit
spatial structure of the Hamiltonian. As one of fasci-
nating models which potentially possesses the magne-
tization plateaus, the Heisenberg diamond chain, con-
sisting of diamond-shaped topological unit along the
chain, as shown in Fig. 1, has also gained much atten-
tion both experimentally and theoretically (e.g. Refs.
[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]).
It has been observed that the compounds,
A3Cu3(PO4)4 with A = Ca, Sr[2], and Bi4Cu3V2O14[3]
can be nicely modeled by the Heisenberg diamond chain.
Another spin-1/2 compound Cu3Cl6(H2O)2·2H8C4SO2
was initially regarded as a model substance for the
spin-1/2 diamond chain [4], but a later experimental
research reveals that this compound should be described
by a double chain model with very weak bond alterna-
tions, and the lattice of the compound is found to be
Cu2Cl4·H8C4SO2 [5]. Recently, Kikuchi et al.[6] have
reported the experimental results on a spin-1/2 com-
pound Cu3(CO3)2(OH)2, where local Cu
2+ ions with
spin S = 1/2 are arranged along the chain direction, and
the diamond-shaped units consist of a one-dimensional
(1D) lattice. The 1/3 magnetization plateau and the
double peaks in the magnetic susceptibility as well as
the specific heat as functions of temperature have been
observed experimentally[6, 7], which has been discussed
in terms of the spin-1/2 Heisenberg diamond chain with
AF couplings J1, J2 and J3 > 0.
On the theoretical aspect, the frustrated diamond spin
chain with AF interactions J1, J2 and J3 > 0 was studied
by a few groups. The first diamond spin chain was ex-
plored under a symmetrical condition J1 = J3[8]. Owing
to the competition of AF interactions, the phase diagram
in the ground state of the spin-1/2 frustrated diamond
chain was found to contain different phases, in which the
magnetization plateaus at m = 1/6 as well as 1/3 are
predicted[9, 10, 11, 12, 13]. Another frustrated diamond
chain with ferromagnetic (F) interactions J1, J3 < 0 and
AF interaction J2 > 0 was also investigated theoretically,
which can be experimentally realized if all angles of the
exchange coupling bonds are arranged to be around 90◦,
a region where it is usually hard to determine safely the
coupling constants and even about their signs[14, 15].
Despite of these works, the investigations on the Heisen-
http://arxiv.org/abs/0704.0150v1
berg diamond spin chain with various competing inter-
actions are still sparse.
Motivated by the recent experimental observation on
the azurite compound Cu3(CO3)2(OH)2[6, 7], we shall
explore systematically the magnetic and thermodynamic
properties of the spin-1/2 Heisenberg diamond chain with
various competing interactions in a magnetic field, and
attempt to fit into the experimental observation on the
azurite in a consistent manner. The density matrix renor-
malization group (DMRG) as well as the transfer ma-
trix renormalization group (TMRG) techniques will be
invoked to study the ground-state properties and ther-
modynamics of the model under interest, respectively.
The local magnetic moments, spin correlation functions,
and static structure factors will be discussed for three
cases at zero temperature. It is found that the static
structure factor S(q) shows peaks in zero magnetic field
at wavevectors q = aπ/3 (a = 0, 1, 2, 3, 4, 5) for different
couplings, while in the magnetic fields where the mag-
netization plateau with m = 1/6 remains, the peaks ap-
pear only at wavevectors q = 0, 2π/3 and 4π/3, which
are found to be couplings-independent. These informa-
tion could be useful for further neutron studies. The
double-peak structures of the susceptibility and specific
heat against temperature are obtained, where the peak
positions and heights are found to depend on the com-
petition of the couplings. It is uncovered that the XXZ
anisotropy of F and AF couplings leads the system with-
out frustration (see below) to display quite different be-
haviors. In addition, the experimental data of the sus-
ceptibility, specific heat and magnetization for the com-
pound Cu3(CO3)2(OH)2 are fairly compared with our
TMRG results.
The rest of this paper is outlined as follows. In Sec. II,
we shall introduce the model Hamiltonian for the spin-
1/2 Heisenberg diamond chain with three couplings J1,
J2 and J3, where three particular cases are identified. In
Sec. III, the magnetic and thermodynamic properties of
a frustrated diamond chain with AF interactions J1, J2
and J3 > 0 will be discussed. In Sec. IV, the physi-
cal properties of another frustrated diamond chain with
F interactions J1, J3 < 0 and AF interaction J2 > 0
will be considered. In Sec. V, the magnetism and ther-
modynamics of a non-frustrated diamond chain with AF
interactions J1, J2 > 0 and F interaction J3 < 0 will
be explored, and a comparison to the experimental data
on the azurite compound will be made. Finally, a brief
summary and discussion will be presented in Sec. VI.
FIG. 1: (Color online) Sketch of the Heisenberg diamond
chain. The bond interactions are denoted by J1, J2, and J3.
Three cases will be considered: (a) J1, J2, J3 > 0 (a frustrated
diamond chain); (b) J1, J3 < 0, J2 > 0 (a frustrated diamond
chain with competing interactions); and (c) J1, J2 > 0, J3 < 0
(a diamond chain without frustration). Note that Ji > 0
stands for an antiferromagnetic interaction while Ji < 0 for a
ferromagnetic interaction, where i = 1, 2, 3.
II. MODEL
The Hamiltonian of the spin-1/2 Heisenberg diamond
chain reads
(J1S3i−2 · S3i−1 + J2S3i−1 · S3i
+J3S3i−2 · S3i + J3S3i−1 · S3i+1
+J1S3i · S3i+1)−H ·
Sj , (1)
where Sj is the spin operator at the jth site, L is the to-
tal number of spins in the diamond chain, Ji (i = 1, 2, 3)
stands for exchange interactions, H is the external mag-
netic field, gµB = 1 and kB = 1. Ji > 0 represents the
AF coupling while Ji < 0 the F interaction. There are
three different cases particularly interesting, as displayed
in Fig. 1, which will be considered in the present paper:
(a) a frustrated diamond chain with J1, J2, J3 > 0; (b)
a frustrated diamond chain with competing interactions
J1, J3 < 0, J2 > 0; and (c) a diamond chain without
frustration with J1, J2 > 0, J3 < 0. It should be re-
marked that in the case (c) of this model, since the two
end points of the J2 bond represent the two different lat-
tice sites, it is possible that J1 and J3 can be different,
even their signs.
The magnetic properties and thermodynamics for
the aforementioned three spin-1/2 Heisenberg diamond
chains in the ground states and at finite temperatures
will be investigated by means of the DMRG and TMRG
methods, respectively. As the DMRG and TMRG tech-
niques were detailed in two nice reviews[16, 17], we
shall not repeat the technical details for concise. In the
ground-state calculations, the total number of spins in
the diamond chain is taken at least as L = 120. At finite
temperatures, the thermodynamic properties presented
below are calculated down to temperature T = 0.05 (in
units of |J1|) in the thermodynamic limit. In our calcu-
lations, the number of kept optimal states is taken as 81;
the width of the imaginary time slice is taken as ε = 0.1;
the Trotter-Suzuki error is less than 10−3; and the trun-
cation error is smaller than 10−6.
III. A FRUSTRATED HEISENBERG DIAMOND
CHAIN (J1, J2, J3 > 0)
A. Local Magnetic Moment and Spin Correlation
Function
Figure 2(a) manifests the magnetization process of a
frustrated spin-1/2 Heisenberg diamond chain with the
couplings satisfying J1 : J2 : J3 = 1 : 2 : 2 at zero tem-
perature. The plateau of magnetization per site m = 1/6
is observed. According to the OYA necessary condition
[1], the m = 1/6 plateau of a spin-1/2 Heisenberg chain
corresponds to the period of the ground state n = 3.
Beyond the magnetization plateau region, the magnetic
curve goes up quickly with increasing the magnetic field
H . Above the upper critical field, the magnetic curve
shows a s-like shape. To further look at how this magneti-
zation plateau appears, the spatial dependence of the av-
eraged local magnetic moment 〈Szj 〉 in the ground states
under different external fields is presented, as shown in
Fig. 2(b). It is seen that in the absence of external
field, the expectation value 〈Szj 〉 changes its sign at every
three sites within a very small range of (−10−3, 10−3)
because of quantum fluctuations, resulting in the magne-
tization per site m =
j=1〈Szj 〉/L = 0. 〈Szj 〉 increases
with increasing the magnetic field, and oscillates with
increasing j, whose unit of three spins is gradually di-
vided into a pair and a single, as displayed in Fig. 2(c).
At the field H/J1 = 1.5, as demonstrated in Fig. 2(d),
the behavior of 〈Szj 〉 falls into a perfect sequence such as
{..., (Sa, Sa, Sb), ...} with Sa = 0.345 and Sb = −0.190,
giving rise to the magnetization per site m = 1/6. In ad-
dition, such a sequence remains with increasing the mag-
netic field till H/J1 = 2.5, implying that the m = 1/6
plateau appears in the range of H/J1 = 1.5 ∼ 2.5, as
manifested in Fig. 2(a). When the field is promoted fur-
ther, the sequence changes into a waved succession with
smaller swing of (Sa − Sb), as shown in Fig. 2(e), which
corresponds to the fact that the plateau state ofm = 1/6
is destroyed, and gives rise to a s-like shape of M(H). It
is noting that when the plateau state of m = 1/6 is de-
stroyed, the increase of m at first is mainly attributed to
a rapid lift of Sb, and later, the double Sa start to flimsily
increase till Sa = Sb = 0.5 at the saturated field.
The physical picture for the above results could be un-
derstood as follows. For the m = 1/6 plateau state at
0 1 2 3 4 5
0 40 80
0 40 80
T = 0
 = 1 : 2 : 2,
-1E-3
T = 0, H = 0
(d) H/J
 = 1.5, 2.5
(e) H/J
FIG. 2: (Color online) For a spin-1/2 frustrated Heisenberg
diamond chain with fixed couplings J1 : J2 : J3 = 1 : 2 : 2,
(a) the magnetization per site m as a function of magnetic
field H in the ground states; and the spatial dependence of
the averaged local magnetic moment 〈Szj 〉 in the ground states
with external field (b) H/J1 = 0, (c) 1, (d) 1.5 and 2.5, and
(e) 4.
J1 : J2 : J3 = 1 : 2 : 2, we note that if an approximate
wave function defined by[13]
(2| ↑3i−2↑3i−1↓3i〉 ± | ↑3i−2↓3i−1↑3i〉
± | ↓3i−2↑3i−1↑3i〉), (i = 1, ..., L/3) (2)
where ↑j (↓j) denotes spin up (down) on site j, is applied,
one may obtain 〈ψi|Sz3i−2|ψi〉 = 1/3, 〈ψi|Sz3i−1|ψi〉 = 1/3,
〈ψi|Sz3i|ψi〉 = −1/6, giving rise to a sequence {..., (13 ,
), ...}, andm = (1
)/3 = 1/6, which is in agree-
ment with our DMRG results {..., (0.345, 0.345, −0.190),
...}. This observation shows that the ground state of this
plateau state might be described by trimerized states.
Let Hc1 and Hc2 be the lower and upper critical mag-
netic field at which the magnetization plateau appears
and is destructed, respectively. For Hc1 ≤ H ≤ Hc2, the
magnetization m = mp = 1/6, namely, the system falls
into the magnetization plateau state. For 0 ≤ H ≤ Hc1
and J1 : J2 : J3 = 1 : 2 : 2, the magnetization curve
shows the following behavior
m(H) = mp(
)[1+α1(1−
)−α2(1−
)2/3], (3)
where Hc1/J1 = 1.44, the parameters α1 = 2/3 and α2 =
1. Obviously, when H = 0, m = 0; H = Hc1 , m = mp.
A fair comparison of Eq. (3) to the DMRG results is
presented in Fig. 3(a). For Hc2 ≤ H ≤ Hs and J1 : J2 :
J3 = 1 : 2 : 2, where Hs is the saturated magnetic field,
the magnetization curve has the form of
m(H) = mp + (H −Hc2){kc + (Hs −H)[
(H −Hc2)1/3
(Hs −H)1/3
]}, (4)
where kc = (ms −mp)/(Hs −Hc2) with ms the satura-
tion magnetization, and β1, β2 the parameters. One may
see that when H = Hc2 , m = mp; H = Hs, m = ms.
A nice fitting to the DMRG result gives the parameters
Hc2/J1 = 3.15, ms = 1/2, Hs/J1 = 4.55, β1 = 0.143,
β2 = 0.178 and kc = 0.238, as shown in Fig. 3(b).
It should be remarked that from the phenomenological
Eqs.(3) and (4), we find that, away from the plateau
region, the magnetic field dependence of the magnetiza-
tion of this model differs from those of Haldane-type spin
chains where m(H) ∼ (H −Hc1)1/2.
To explore further the magnetic properties of the frus-
trated spin-1/2 Heisenberg diamond chain in the ground
states with the couplings J1 : J2 : J3 = 1 : 2 : 2 at dif-
ferent external fields, let us look at the static structure
factor S(q) which is defined as
S(q) =
eiqj〈Szj Sz0 〉, (5)
where q is the wave vector, and 〈Szj Sz0 〉 is the spin corre-
lation function in the ground state. As demonstrated in
Fig. 4(a), in the absence of the external field, S(q) shows
three peaks: two at q = π/3, 5π/3, and one at q = π,
which is quite different from that of the spin S = 1/2
Heisenberg AF chain, where S(q) only diverges at q = π.
As indicated by Eq. (5), the peaks of S(q) reflect the
periods of the spin correlation function 〈Szj Sz0 〉, i.e., the
peaks at q = π/3 (5π/3) and π reflect the periods of 6
and 2 for 〈Szj Sz0 〉, respectively. As shown in Fig. 4(b),
in the absence of the external field, 〈Szj Sz0 〉 changes sign
every three sites, which corresponds really to the peri-
ods of 6 and 2. With increasing the magnetic field, the
small peak of S(q) at q = π becomes a round valley while
the peak at q = π/3 (5π/3) continuously shifts towards
q = 2π/3 (4π/3) with the height enhanced, indicating
the corruption of the periods of 6 and 2 but the emer-
gence of the new period ∈ (3, 6) for 〈Szj Sz0 〉, as shown
in Fig. 4(c). At the field H/J1 = 1.5, two peaks shift
to q = 2π/3 and 4π/3 respectively, and merge into the
peaks already existing there, showing the existence of
3.0 3.5 4.0 4.5
0.0 0.5 1.0 1.5
 DMRG
=1, J
=2, J
 Eq. (4)
= 1/6, k
= 0.238,
= 3.15, H
= 4.55,
= 0.143, 
= 0.178
 DMRG
 Eq.(3)
=1, J
=2, J
= 1/6, H
= 1.44, 
=2/3, 
FIG. 3: (Color online) For a spin-1/2 frustrated Heisenberg
diamond chain with fixed couplings J1 : J2 : J3 = 1 : 2 : 2, the
DMRG results of the magnetization per site m as a function
of magnetic field H away from the plateau state can be fairly
fitted by Eqs. (3) and (4), (a) for 0 ≤ H ≤ Hc1 , and (b) for
Hc2 ≤ H ≤ Hs.
the period 3 for 〈Szj Sz0〉, as clearly displayed in Fig. 4(d).
The valley and peaks of S(q) keep intact in the plateau
state at m = 1/6. When the plateau state is destroyed
at the field H/J1 = 4, the peaks at q = 2π/3 and 4π/3
are depressed dramatically while the peaks at q = π/3,
5π/3 and π appear again with very small heights, reveal-
ing the absence of the period 3 and the slight presence of
the period 2 and 6, as shown in Fig. 4(e). At the field
H/J1 = 4.8, all peaks disappear and become zero, ex-
cept for the peak at q = 0, which is the saturated state.
Therefore, the static structure factor S(q) shows differ-
ent characteristics in different magnetic fields[18]. On the
other hand, it is known that S(q) also reflects the low-
lying excitations of the system. It is thus reasonable to
expect that the low-lying excitations of the frustrated di-
amond chain will behave differently in different magnetic
fields.
To investigate the zero-field static structure factor S(q)
in the ground state for the frustrated spin-1/2 diamond
chains with various AF couplings, the four cases with
J1 = 1, J3 > 0, and J2 = 0.5, 1, 2, and 4 are shown
in Figs. 5(a)-5(d), respectively. For J2 = 0.5, as shown
in Fig. 5(a), S(q) displays a sharp peak at q = π when
J3 < 0.5, and three peaks at q = 0, 2π/3 and 4π/3
when J3 > 0.5. It is shown from the ground state phase
-0.01
-0.01
0 40 80
0 40 80
(a) J1 = 1,
 = 2,
 = 2,
T = 0
=1.5, 
T = 0, H = 0
(c) H/J
(d) H/J
 = 1.5, 2.5
FIG. 4: (Color online) For a spin-1/2 frustrated Heisenberg
diamond chain with fixed couplings J1 : J2 : J3 = 1 : 2 : 2,
(a) the static structure factor S(q) in the ground states under
different external fields; and the spatial dependence of the
spin correlation function 〈SzjS
0 〉 in the ground states under
external field (b) H/J1 = 0, (c) 1, (d) 1.5 and 2.5, and (e) 4.
diagram [10] that the system is in the spin fluid (SF)
phase when J1 = 1, J2 = 0.5, J3 < 0.5, and enters into
the ferrimagnetic (FRI) phase when J1 = 1, J2 = 0.5,
J3 > 0.5. For J2 = 1, as indicated in Fig. 5(b), the
incommensurate peaks exist, such as the case of J3 = 0.8,
where the system is in the dimerized (D) phase [10]. For
J2 = 2, as manifested in Fig. 5(c), S(q) has a sharp
peak at q = π and two ignorable peaks at π/3 and 5π/3
when J3 < 1; three sharp peaks at q = 0, 2π/3 and 4π/3
at J3 = 1; a round valley (J3 = 1.5) or a small peak
(J3 = 4) at q = π and two mediate peaks at π/3 and
5π/3 when J3 > 1. The system with J1 = 1 and J2 = 2
is in the D phase when 1 < J3 < 2.8, and in the SF
phase when J3 < 1 or J3 > 2.8 [10]. For J2 = 4 revealed
in Fig. 5(d), the situations are similar to that of Fig.
5(c), but here only the SF phase exists for the system
with J1 = 1, J2 = 4 and J3 > 0 [10]. It turns out that
even in the same phase, such as the SF phase, the zero-
field static structure factor S(q) could display different
characteristics for different AF couplings. In fact, we
note that the exotic peak of S(q) has been experimentally
observed in the diamond-typed compound Sr3Cu3(PO4)4
[19].
0 1 2
(a) T = 0,
H = 0,
 = 1,
 = 0.5
(b) J1 = 1,
 = 1J3=0.3
(c) J1 = 1,
 = 2J3=0.5
(d) J1 = 1,
FIG. 5: (Color online) The zero-field static structure factor
S(q) in the ground states for the spin-1/2 frustrated Heisen-
berg diamond chains with length L = 120, J1 = 1, J3 > 0
and J2 taken as (a) 0.5, (b) 1, (c) 2 and (d) 4.
If the spin correlation function for the spin-S chain can
be expressed as 〈Szj Sz0 〉 = α(−1)je−jβ , where α and β are
two parameters, its static structure factor will take the
form of
S(q) =
S(S + 1)
− α(cos q + e
cos q + coshβ
, (6)
which can recover exactly the S(q) of the spin-S AKLT
chain S(q) = S+1
1−cos q
1+cos q+2/S(S+2)
[20] with α = (S +
1)2/3 and β = ln(1+2/S). Eq.(6) has a peak at wavevec-
tor q = π. By noting that the zero-field static struc-
ture factor S(q) for the frustrated diamond chains dis-
plays peaks at wavevectors q = aπ/3 (a = 0, 1, 2, 3, 4, 5)
for different AF couplings, the spin correlation func-
tion 〈Szj Sz0 〉 could be reasonably divided into six modes
〈Sz6m+lSz0 〉 = cl + αle−(6m+l)β or 〈Sz6m+lSz0 〉 = αl(6m +
l)−β with j = 6m+ l and l = 1, 2, ..., 6, whose contribu-
tions to the static structure factor should be considered
separately[21]. Thus, the static structure factor for the
present systems could be mimicked by a superposition of
six modes, which leads to
S(q) =
e(6−l)β cos(lq)− e−lβ cos[(6 − l)q]
cosh(6β)− cos(6q)
cos(lq)− cos[(6− l)q]
1− cos(6q)
, (7)
S(q) =
2(6m+ l)−β cos[(6m+ 1)q] +
, (8)
0 1 2
0 1 2
 DMRG
T= 0, H= 0,
= 0.5,
= 0.3
 Eq.(8)
1,3,5
= -0.17,
2,4,6
= 0.17,
 = 1.4  
 DMRG
T=0, H=0,
=0.5,
 Eq.(7)
= -0.15,
= -0.01,
= 0.16,
= 0.27,
= -0.0007,
= -0.0013,
= 0.0117
 Eq.(8)
= -0.11,
= 0.04,
= -0.21,
= 1.3
 DMRG
T= 0, H= 0,
= 0.5  
 DMRG
 Eq.(8)
= -0.04,
= -0.16,
= -0.14,
 = 1.1
T=0, H=0, J
FIG. 6: (Color online) The DMRG results of the zero-field
static structure factor as a function of wavevector for the spin-
1/2 frustrated Heisenberg diamond chains are fitted: (a) J1 =
1, J2 = 0.5, J3 = 0.3 by Eq. (8); (b) J1 = 1, J2 = 0.5, J3 = 4
by Eq. (7); (c) J1 = 1, J2 = 4, J3 = 0.5, and (d) J1 = 1,
J2 = 4, J3 = 4 by Eq. (8).
respectively, depending on which phase the system falls
into, where αl, cl and β are couplings-dependent param-
eters.
As presented in Fig. 6, the DMRG results of the static
structure factor as a function of wavevector are fitted
by Eqs. (7) and (8) for the spin-1/2 frustrated Heisen-
berg diamond chains with various AF couplings in zero
magnetic field. It can be found that the characteristic
peaks can be well fitted by Eqs. (7) and (8), with only
a slightly quantitative deviation, showing that the main
features of the static structure factor for the present sys-
tems can be reproduced by a linear superposition of six
modes. The fitting results give six different modes in
general, as shown in Fig. 4(b).
To further understand the above-mentioned behav-
iors of the zero-field static structure factor, S(q), we
have applied the Jordan-Wigner (JW) transformation
to study the low-lying excitations of the spin-1/2 frus-
trated Heisenberg diamond chain with various AF cou-
plings (see Appendix A for derivations). It can be seen
that the zero-field low-lying fermionic excitation ε(k) be-
haves differently for different AF couplings, as shown in
Figs. 7(a)-(d). Obviously, these low-lying excitations are
responsible for the DMRG calculated behaviors of S(q),
where the positions of minimums of ε(k) for different AF
couplings, as indicated by arrows in Figs. 7(a)-(c), are
exactly consistent with the locations of the peaks of zero-
field static structure factor S(q) shown in Figs. 6(a)-(c),
respectively, although there is a somewhat deviation for
Fig. 7(d) and Fig. 6(d). It also shows that the six
0 1 2
0 1 2
T=0, H=0,
 = 1, 
 = 0.5, 
 = 0.3
(b) T=0, H=0,
 = 1, 
 = 0.5, 
(c) T=0, H=0,
 = 1, 
 = 4, 
 = 0.5
(d) T=0, H=0,
 = 1, 
 = 4, 
FIG. 7: (Color online) The zero-field low-lying fermionic ex-
citation as a function of wavevector for the spin-1/2 frustrated
Heisenberg diamond chain with (a) J1 = 1, J2 = 0.5, J3 = 0.3,
(b) J1 = 1, J2 = 0.5, J3 = 4, (c) J1 = 1, J2 = 4, J3 = 0.5,
and (d) J1 = 1, J2 = 4, J3 = 4. The arrows indicate the
locations of minimums of ε(k).
modes suggested by Eqs. (7) and (8) is closely related to
the low-lying excitations of the system.
B. Magnetization, Susceptibility and Specific Heat
The magnetization process for the spin-1/2 frustrated
diamond chain with J1 = 1, J3 > 0, and J2 = 0.5 and
2 is shown in Fig. 8(a) and 8(b), respectively, where
temperature is fixed as T/J1 = 0.05. It is found that
the magnetization exhibits different behaviors for differ-
ent AF couplings: a plateau at m = 1/6 is observed, in
agreement with the ground state phase diagram [12, 13];
for J2 = 0.5, as shown in Fig. 8(a), the larger J3 is, the
larger the width of the plateau at m = 1/6 becomes; for
J2 = 2, as presented in Fig. 8(b), the width of the plateau
at m = 1/6 becomes larger with increasing J3 < 1, and
then turns smaller with increasing J3 > 1; for J3 < 1, the
larger J2, the larger the width of the plateau atm = 1/6;
for J3 = 2, the larger J2, the smaller the width of the
plateau at m = 1/6. The saturated field is obviously
promoted with increasing AF J3 and J2.
Figures 8(c) and 8(d) give the susceptibility χ as a
function of temperature T for the spin-1/2 frustrated
diamond chain with J1 = 1, J3 > 0 and J2 = 0.5
and 2, respectively, while the external field is taken as
H/J1 = 0.01. For J2 = 0.5, as shown in Fig. 8(c), the
low temperature part of χ(T ) keeps finite when J3 < 0.5,
and becomes divergent when J3 > 0.5. As clearly man-
ifested in the inset of Fig. 8(c), J3 = 0.5 is the critical
value, which is consistent with the behaviors of static
structure factor S(q) in Fig. 5(a). For J2 = 2, as shown
in Fig. 8(d), an unobvious double-peak structure at low
temperature is observed at small and large J3 such as 0.2
and 3. The temperature dependence of the specific heat
C with J1 = 1, J3 > 0 and J2 = 0.5 and 2 is shown in
Fig. 8(e) and 8(f), respectively, while the external field
is fixed as H/J1 = 0.01. For J2 = 0.5, as given in Fig.
8(e), a double-peak structure of C(T ) is observed at low
temperature for small and large J3 such as 0.3 and 1.
The case with J2 = 2 shown in Fig. 8(f) exhibits the
similar characteristics. Thus, the thermodynamics of the
system demonstrate different behaviors for different AF
couplings. As manifested in Fig. 5, the low-lying excita-
tions behave differently for different AF couplings. The
double-peak structure of the susceptibility as well as the
specific heat could be attributed to the excited gaps in
the low-lying excitation spectrum[22].
IV. A FRUSTRATED DIAMOND CHAIN WITH
COMPETING INTERACTIONS (J1, J3 < 0, J2 > 0)
A. Local Magnetic Moment and Spin Correlation
Function
Figure 9(a) shows the magnetization process of a frus-
trated spin-1/2 Heisenberg diamond chain in the ground
states with the couplings J1 : J2 : J3 = −1 : 4 : −0.5.
The plateau of magnetization per site m = 1/6 is clearly
obtained. To understand the occurrence of the magne-
tization plateau, the spatial dependence of the averaged
local magnetic moment 〈Szj 〉 in the ground states at dif-
ferent external fields is calculated. It is seen that in the
absence of the magnetic field, as presented in Fig. 9(b),
the expectation values of 〈Szj 〉 change sign every three
sites within a very small range of (−2× 10−4, 2× 10−4),
resulting in the magnetization per site m = 0. With
increasing the field, the expectation values of 〈Szj 〉 in-
crease, whose unit of three spins is gradually divided into
a pair and a single, as displayed in Fig. 9(c). At the field
H/|J1| = 0.05, as illustrated in Fig. 9(d), the behavior of
〈Szj 〉 shows a perfect sequence such as {..., (Sa, Sb, Sb), ...}
with Sa = 0.496 and Sb = 0.002, giving rise to the mag-
netization per site m = 1/6. In addition, the sequence
is fixed with increasing the field until H/|J1| = 3.2, cor-
responding to the plateau of m = 1/6. As the field is
enhanced further, the double Sb begin to rise, and the
sequence becomes a waved series with smaller swing of
(Sa − Sb) as revealed in Fig. 9(e), which corresponds to
the plateau state at m = 1/6 that is destroyed. It is
noting that the increase of m is mainly attributed to the
promotion of double Sb, as Sa is already saturated until
Sa = Sb = 0.5 at the saturated field.
As discussed above, the physical picture of them = 1/6
plateau state at J1 : J2 : J3 = −1 : 4 : −0.5 can be
0 2 4
0 2 4
0.0 0.4 0.8
0 1 2
0.0 0.2 0.4 0.6
0 1 2
0 1 2
=0.2 1 2
 = 1, J
 = 0.5,
 = 0.05
1 2J3 = 0.2
 = 1, J
 = 2,
 = 0.05
 = 1, J
 = 0.5,
        H/J
=0.01
=1, J
 = 0.01
 = 0.2
=0.5,
 = 0.01
3  = 3
=1, J
 = 0.01
 = 0.2
FIG. 8: (Color online) For the spin-1/2 frustrated Heisenberg
diamond chains with J1 = 1 and J3 > 0, the magnetization
process m(H) at temperature T/J1 = 0.05 with (a) J2 = 0.5
and (b) J2 = 2; the susceptibility χ(T ) at field H/J1 = 0.01
with (c) J2 = 0.5 and (d) J2 = 2; the specific heat C(T ) at
field H/J1 = 0.01 with (e) J2 = 0.5 and (f) J2 = 2.
understood by the following approximate wave function
(| ↑3i−2↑3i−1↓3i〉 ± | ↑3i−2↓3i−1↑3i〉).
By use of this wave function, we have 〈ψi|Sz3i−2|ψi〉 =
1/2, 〈ψi|Sz3i−1|ψi〉 = 0, 〈ψi|Sz3i|ψi〉 = 0, leading to a
sequence of {..., (1/2, 0, 0), ...}, and m = (1/2 + 0 +
0)/3 = 1/6. This is in agreement with our DMRG re-
sults {..., (0.496, 0.002, 0.002), ...}.
The static structure factor S(q) of the frustrated spin-
1/2 Heisenberg diamond chain with the competing cou-
plings J1 : J2 : J3 = −1 : 4 : −0.5 in the ground states
is considered in different external fields. At zero field,
shown in Fig. 10(a), S(q) has three peaks at q = π/3,
5π/3 and π with mediate heights, which reflects the pe-
riods of 6 and 2 for 〈Szj Sz0 〉, respectively. As shown in
0 1 2 3
0 40 80
0 40 80
T = 0
 = -1:4:-0.5,
-2E-3
T = 0, H = 0
| = 0.025
(d) H/|J
| = 0.05, 3.2
(e) H/|J
| = 3.27
FIG. 9: (Color online) For a spin-1/2 frustrated Heisenberg
diamond chain with fixed couplings J1 : J2 : J3 = −1 : 4 :
−0.5, (a) the magnetization per site m as a function of mag-
netic field H in the ground states; and the spatial dependence
of the averaged local magnetic moment 〈Szj 〉 in the ground
states with external field (b) H/|J1| = 0, (c) 0.025, (d) 0.05
and 3.2, and (e) 3.27.
Fig. 10(b), in the absence of the external field, 〈Szj Sz0 〉
changes sign every three sites, corresponding to the pe-
riods of 6 and 2. With increasing the field, the peak at
q = π becomes flat with height depressed forwardly, while
the peak at q = π/3 (5π/3) is divided into two peaks
shifting oppositely from q = π/3 (5π/3) with height de-
creased, indicating the corruption of the periods of 6 and
2 and the emergence of new periods for 〈Szj Sz0 〉, as shown
in Fig. 10(c). At the field H/|J1| = 0.05, two shift-
ing peaks have respectively reached q = 2π/3 and 4π/3,
and are merged with the existing peaks, showing the oc-
currence of period 3 for 〈Szj Sz0 〉, as clearly displayed in
Fig. 10(d). The flat and peaks keep constant during the
plateau state at m = 1/6. When the plateau state is
destroyed at field H/J1 = 3.27, the peaks at q = 2π/3
and 4π/3 are depressed sharply, revealing the decay of
the period 3, as shown in Fig. 10(e). At the saturated
field of H/J1 = 3.3, all peaks disappear and become flat
with the value zero, which is the saturated state. So, the
static structure factor S(q) shows different characteristics
in different magnetic fields. Similar to the discussions in
-0.02
0 40 80
0 40 80
 = -1,
 = 4,
 = -0.5,
T = 0
0.05, 3.2
| = 3.3
T = 0, H = 0
| = 0.025
| = 0.05, 3.2
| = 3.27
FIG. 10: (Color online) For a spin-1/2 frustrated Heisenberg
diamond chain with fixed couplings J1 : J2 : J3 = −1 : 4 :
−0.5, (a) the static structure factor S(q) in the ground states
with different external fields; and the spatial dependence of
the spin correlation function 〈Szj S
0 〉 in the ground states with
external field (b) H/|J1| = 0, (c) 0.025, (d) 0.05 and 3.2, and
(e) 3.27.
Fig. 4, the low-lying excitations of this frustrated di-
amond chain would also behave differently in different
magnetic fields.
To further investigate the zero-field static structure
factor S(q) in the ground state for the frustrated spin-1/2
diamond chains with various J1, J3 < 0 and J2 > 0, two
cases with J1 = −1, J3 < 0 and J2 = 1 and 4 are illus-
trated in Fig. 11(a) and 11(b), respectively. For J2 = 1,
S(q) shows a round peak at q = π, two sharp peaks at π/3
and 5π/3 when |J3| < 1, and a very sharp peak at q = 0
when |J3| > 1. For J2 = 4, S(q) shows three peaks at
q = π/3, π and 5π/3 as |J3| < 1, and a very sharp peak
at q = π and nearly ignorable peaks at π/3 and 5π/3
as |J3| > 1. In general, the zero-field static structure
factor S(q) shows different characteristics with different
competing couplings, whose exotic characteristics could
be experimentally observed in the related diamond-typed
compounds.
As shown in Fig. 12, the DMRG results of the static
structure factor as a function of wavevector are fitted by
Eq.(7) for the spin-1/2 frustrated Heisenberg diamond
chains with J1, J3 < 0 and J2 > 0. It can be found
0 1 2
(b) J
 = -1, J
 = 4J3 = -16
(a) T = 0, H = 0, J
 = -1, J
 = -0.1
FIG. 11: (Color online) The zero-field static structure factor
S(q) in the ground states for the spin-1/2 frustrated Heisen-
gberg diamond chains with length L = 120, J1 = −1, J3 < 0
and J2 taken as (a) 1; (b) 4.
that the characteristic behaviors can be nicely fitted by
Eq.(7), with only a slightly quantitative deviation, show-
ing that the main features of the static structure factor
for the present systems can be captured by a superposi-
tion of six modes. It is consistent with the fact that the
spin correlation function for the present systems has six
different modes, as manifested in Fig. 10(b).
Similar to the case (a) with all AF couplings in the
last section, the characteristics of zero-field S(q) for the
spin-1/2 frustrated Heisenberg diamond chain with J1,
J3 < 0 and J2 > 0 can be further undersood in terms
of the low-lying excitations of the system (see Appendix
A). By means of the JW transformation, the zero-field
low-lying fermionic excitation ε(k) of the present case is
calculated, as shown in Figs. 13(a)-(b). It is found that
the zero-field low-lying fermionic excitation ε(k) differs
for different couplings, but the positions of minimums
of ε(k) for different couplings, as indicated by arrows in
Figs. 13(a) and (b), appear to be the same. One may see
that these positions coincide exactly with the locations of
peaks of the zero-field static structure factor S(q) mani-
fested in Figs. 12(a) and (b), respectively, showing that
our fitting equation is qualitatively consistent with the
low-lying excitations of the system.
0 1 2
(b)  DMRG
T = 0, H = 0,
 = -1, 
 = 4,
 = -0.5
 Eq. (7)
= -0.07,
 = 0.3, c
1,2,...,6
= 0.0
= -0.07,
= -0.21,
(a)  DMRG
T = 0, H = 0, 
 = -1, 
 = -0.1
 Eq. (7)
= -0.03,
= -0.14,
= -0.12,
 = 0.19,
1,2,...,6
= 0.0
FIG. 12: (Color online) The DMRG results of the zero-field
static structure factor as a function of wavevector are fitted
by Eq. (7) for the spin-1/2 frustrated Heisenberg diamond
chains with (a) J1 = −1, J2 = 1, J3 = −0.1, and (b) J1 = −1,
J2 = 4, J3 = −0.5.
0 1 2
-3.35
-3.30
-3.25
(b) T=0, H=0,
 = -1, 
 = 4, 
 = -0.5
(a) T=0, H=0, 
 = -1, 
 = 1, 
 = -0.1
FIG. 13: (Color online) The zero-field low-lying fermionic
excitation as a function of wavevector for the spin-1/2 frus-
trated Heisenberg diamond chain with (a) J1 = −1, J2 = 1,
J3 = −0.1, and (b) J1 = −1, J2 = 4, J3 = −0.5. The arrows
indicate the locations of minimums of ε(k).
B. Magnetization, Susceptibility and Specific Heat
Figures 14(a) and 14(b) show the magnetization pro-
cess for the spin-1/2 frustrated diamond chain at a finite
temperature T/|J1| = 0.05 with J1 = −1, J3 < 0, and
J2 = 1 and 4, respectively. It is shown that the magneti-
zation exhibits different behaviors for different J1, J3 < 0
and J2 > 0. A plateau at m = 1/6 is observed at small
|J3|; with a fixed J2, the larger |J3|, the smaller the width
of the plateau at m = 1/6, and the plateau disappears
when |J3| exceeds the critical value; for a fixed J3, the
larger J2, the wider the width of the plateau at m = 1/6;
the saturation field is obviously depressed with the in-
crease of |J3| at a fixed J2, and is enhanced with the
increase of J2 at a fixed J3.
Figures 14(c) and 14(d) manifest the susceptibility χ
as a function of temperature T for the spin-1/2 frus-
trated diamond chain with J1 = −1, J3 < 0 and J2 = 1
and 4, respectively, where the external field is taken as
H/|J1| = 0.01. For J2 = 1, the low temperature part of
χ(T ) keeps finite when |J3| < 1, and becomes divergent
when |J3| > 1. As clearly revealed in the inset of Fig.
14(c), J3 = −1 is the critical value, which is in agreement
with the behaviors of static structure factor S(q) shown
in Fig. 11(a). For J2 = 4, a clear double-peak structure
of χ(T ) is obtained at |J3| = 8. The temperature depen-
dence of the specific heat C with J1 = −1, J3 < 0 and
J2 = 1 and 4 is shown in Figs. 14(e) and 14(f), respec-
tively, where the external field is fixed as H/|J1| = 0.01.
For J2 = 1, a double-peak structure of C(T ) is observed
for the case of |J3| = 0.5. The case with J2 = 4 shown in
Fig. 14(f) exhibits the similar characteristics. It is also
found that, owing to the competitions among J1, J3 and
J2, the thermodynamics demonstrate rich behaviors at
different couplings. As reflected in Fig. 11, the low-lying
excitations behave differently with various F interactions
J1, J3 and AF interaction J2, while the excitation gaps
could induce the double-peak structure in the suscepti-
bility as well as in the specific heat[22].
V. A DIAMOND CHAIN WITHOUT
FRUSTRATION (J1, J2 > 0, J3 < 0)
A. Local Magnetic Moment and Spin Correlation
Function
Figure 15(a) shows the magnetization process of a
non-frustrated spin-1/2 Heisenberg diamond chain in the
ground states with the couplings satisfying J1 : J2 : J3 =
1 : 2 : −0.5. The plateau of magnetization per site
m = 1/6 is observed. The appearance of the magneti-
zation plateau can be understood from the spatial de-
pendence of the averaged local magnetic moment 〈Szj 〉 in
the ground states under different external fields. In ab-
sence of the external field, the expectation values of 〈Szj 〉
change sign every one site with a waved swing within a
very small range of (−2× 10−4, 2× 10−4), giving rise to
0.0 0.4 0.8
0 1 2 3 4
0 1 2
0 1 2
0.0 0.6 1.2
0 1 2
0 1 2
-1 -0.5 J3 = -0.1
 = -1, J
 = 1,
| = 0.05
 = -8
 = -1, J
 = 4,
| = 0.05
-0.5-4
= -1.5
 = -1, J
 = 1,
|=0.01
 = -1
=-1, J
| = 0.01
 = -1
=-1, J
| = 0.01
 = -1
=-1, J
| = 0.01
 = -0.1
FIG. 14: (Color online) For the spin-1/2 frustrated Heisen-
berg diamond chains with J1 = −1 and J3 < 0, the mag-
netization process m(H) at temperature T/J1 = 0.05 with
(a) J2 = 1 and (b) J2 = 4; the susceptibility χ(T ) at field
H/J1 = 0.01 with (c) J2 = 1 and (d) J2 = 4; the specific heat
C(T ) at field H/J1 = 0.01 with (e) J2 = 1 and (f) J2 = 4.
the magnetization per site m = 0. Under a finite field,
every three successive spins have gradually cooperated
into a pair and a single, as shown in Fig. 15(c). At
H/J1 = 0.9, as given in Fig. 15(d), 〈Szj 〉 shows a perfect
sequence such as {..., (Sa, Sb, Sb), ...} with Sa = 0.393
and Sb = 0.053, resulting in the magnetization per site
m = 1/6. Moreover, the sequence is fixed with the field
increased until H/J1 = 1.8, corresponding to the plateau
state ofm = 1/6. As the field is increased further, double
Sb begin to increase, and the sequence becomes a waved
succession with a smaller swing of (Sa − Sb), as mani-
fested in Fig. 15(e), which corresponds to the fact that
the plateau state at m = 1/6 is destroyed. It is observed
that the increase of m at first is mainly attributed to the
speedy boost of double Sb, and later, Sa starts to increase
weakly until Sa = Sb = 0.5 at the saturated field.
0 1 2 3
0 40 80
0 40 80
T = 0
 = 1:2:-0.5,
-2E-4
(b) T = 0, H = 0
 = 0.7
(d) H/J
 = 0.9, 1.8
 = 2.5
FIG. 15: (Color online) For a spin-1/2 non-frustrated Heisen-
berg diamond chain with fixed couplings J1 : J2 : J3 = 1 :
2 : −0.5, (a) the magnetization per site m as a function of
magnetic field H in the ground states; and the spatial de-
pendence of the averaged local magnetic moment 〈Szj 〉 in the
ground states with external field (b) H/J1 = 0, (c) 0.7, (d)
0.9 and 1.8, and (e) 2.5.
For this non-frustrated case with couplings J1 : J2 :
J3 = 1 : 2 : −0.5, the obtained perfect sequence of
{..., (0.393, 0.053, 0.053), ...} for the m = 1/6 plateau
state could be understood by the following approximate
trimerized wave function
(2| ↑3i−2↑3i−1↓3i〉 ± 2| ↑3i−2↓3i−1↑3i〉
± | ↓3i−2↑3i−1↑3i〉). (9)
According to this function, we find 〈ψi|Sz3i−2|ψi〉 = 7/18,
〈ψi|Sz3i−1|ψi〉 = 1/18, 〈ψi|Sz3i|ψi〉 = 1/18, giving rise to a
sequence of {..., (7/18, 1/18, 1/18), ...}. It turns out that
m = (7/18 + 1/18 + 1/18)/3 = 1/6. This observation
implies that the ground state of the plateau state can
also be described by the trimerized states.
The static structure factor S(q) of the non-frustrated
spin-1/2 Heisenberg diamond chain in the ground states
with the couplings J1 : J2 : J3 = 1 : 2 : −0.5 is probed
in different external fields. As shown in Fig. 16(a), in
absence of the external field, S(q) shows a sharp peak at
q = π, similar to the behaviors of the S = 1/2 Heisen-
berg AF chain, which reflects the period of 2 for 〈Szj Sz0 〉.
0 1 2
-0.02
0 40 80
0 40 80
(a) J1 = 1,
 = 2,
 = -0.5,
T = 0
0.9, 1.8
T = 0, H = 0
 = 0.7
(d) H/J
 = 0.9, 1.8
 = 2.5
FIG. 16: (Color online) For a spin-1/2 non-frustrated Heisen-
berg diamond chain with fixed couplings J1 : J2 : J3 = 1 : 2 :
−0.5, (a) the static structure factor S(q) in the ground states
with different external fields; and the spatial dependence of
the spin correlation function 〈Szj S
0 〉 in the ground states with
external field (b) H/J1 = 0, (c) 0.7, (d) 0.9 and 1.8, and (e)
As displayed in Fig. 16(b), at zero external field, 〈Szj Sz0 〉
changes sign every one lattice site, corresponding to the
period of 2. When the field is increased, the peak at
q = π becomes a flat with the height depressed greatly,
while two new peaks with small heights at q = 2π/3 and
4π/3 appear, indicating the corruption of the period of 2
and the emergence of the new period of 3 for 〈Szj Sz0 〉, as
demonstrated in Fig. 16(c). At the field H/J1 = 0.9,
the peaks at q = 0, 2π/3 and 4π/3 become sharper.
The flat and peaks of S(q) keep unchanged during the
plateau state at m = 1/6. When the plateau state is de-
stroyed at the field H/J1 = 2.5, the peaks at q = 2π/3
and 4π/3 are suppressed dramatically, revealing the de-
cay of the period 3, as shown in Fig. 16(e). At the field
H/J1 = 2.8, all peaks disappear, and become a flat with
the value zero, except for the peak at q = 0, which is the
saturated state. It can be stated that the static struc-
ture factor S(q) shows various characteristics in different
magnetic fields. Similar to what discussed in Figs. 4
and 10, the low-lying excitations of this non-frustrated
diamond chain would also behave differently in different
magnetic fields.
The zero-field static structure factor S(q) in the ground
state for the present system displays a peak at q = π with
different couplings, but whose static correlation function
〈Szj Sz0 〉 varies with the couplings. As illustrated in Fig.
17, only are the values of 〈Szj Sz0 〉 larger than zero pre-
sented for convenience. In order to gain deep insight
into physics, for a comparison we also include the static
correlation function for the S = 1/2 Heisenberg antifer-
romagnetic (HAF) chain in Fig. 17(a), whose asymptotic
behavior has the form of [23, 24]
〈Sj · S0〉 ∝ (−1)j
(2π)3/2
. (10)
Eq. (10) is depicted as solid lines in Figs. 17. To take the
finite-size effect into account, the length of the diamond
chain is taken as L = 90, 120 and 160, respectively. As
revealed in Fig. 17(a), the DMRG result of the S = 1/2
HAF chain with an infinite length agrees well with the
solid line. Fig. 17(b) shows that the static correlation
function for the spin-1/2 diamond chain with frustrated
couplings J1 : J2 : J3 = 1 : 1.2 : 0.5 decays faster than
that of the HAF chain. Compared with Figs. 17(c)-(f),
it can be found that all the static correlation functions
for the spin-1/2 non-frustrated diamond chain with AF
interactions J1, J2 and F interaction J3 fall more slowly
than that of the S = 1/2 HAF chain; for fixed AF inter-
actions J1 and J2, the static correlation functions drop
more leisurely with increasing the F interaction |J3|; for
fixed AF interaction J1 and F interaction J3, the static
correlation functions decrease more rapidly with increas-
ing the AF interaction J2.
B. Magnetization, Susceptibility and Specific Heat
Figures 18(a) and 18(b) show the magnetization pro-
cess for the spin-1/2 non-frustrated diamond chain at a
finite temperature T/J1 = 0.05 with J1 = 1, J3 < 0, and
J2 = 0.5 and 2, respectively. It is found that the magneti-
zation behaves differently with different AF interactions
J1, J2 and F interaction J3. A plateau at m = 1/6 is ob-
tained at small |J3|; for fixed J1 and J2, the larger |J3|,
the narrower the width of the plateau at m = 1/6, and
after |J3| exceeds a critical value, the plateau at m = 1/6
is eventually smeared out; for fixed J1 and J3, the larger
J2, the wider the width of the plateau at m = 1/6; the
saturated field is obviously unchanged with changing the
F interaction J3. The coupling-dependence of the spin-
1/2 non-frustrated diamond chain with AF interactions
J1, J2 and F interaction J3 is similar to that of trimerized
F-F-AF chains[25].
Figures 18(c) and 18(d) present the susceptibility χ
as a function of temperature T for the spin-1/2 frus-
trated diamond chain with J1 = 1, J3 < 0 and J2 = 0.5
and 2, respectively, where the external field is taken as
H/J1 = 0.01. A double-peak structure of χ(T ) is ob-
served at small F interaction |J3| and disappears at large
|J3|. The temperature dependence of the specific heat C
with J1 = 1, J3 < 0 and J2 = 0.5 and 2 is shown in Figs.
0 50 100 150
0 50 100 150
(a) J
=1:1:0 (HAF)
 Eq. (10)
L=90   (DMRG)
L=120 (DMRG)
L=160 (DMRG)
T = 0, H = 0
 = 1:1.2:0.5
(frustrated)
 = 1:2:-0.5
 = 1:2:-2
 = 1:0.5:-0.5
 = 1:0.5:-2
FIG. 17: (Color online) The zero-field static correlation func-
tion 〈Szj S
0 〉 versus site j in the ground state for a spin-1/2
diamond chain with different lengths and various couplings.
The couplings ratio J1 : J2 : J3 is taken as (a) 1 : 1 : 0 (HAF),
(b) 1 : 1.2 : 0.5 (frustrated), (c) 1 : 2 : −0.5, (d) 1 : 2 : −2, (e)
1 : 0.5 : −0.5, (f) 1 : 0.5 : −2. The length is taken as L = 90,
120 and 160, respectively.
18(e) and 18(f), respectively, where H/J1 = 0.01. It is
seen that, when J2 is small, C(T ) exhibits only a single
peak; when J2 is large, a double-peak structure of C(T )
is observed. In the latter case, the double-peak structure
is more obvious for small F interaction |J3|, and tend to
disappear at large |J3|. Therefore, the thermodynamics
demonstrate various behaviors with different AF interac-
tions J1, J2 and F interaction J3.
C. Effect of Anisotropy of Bond Interactions
Some magnetic materials show different behaviors un-
der longitudinal and transverse magnetic fields, showing
that the anisotropy plays an important role in the physi-
cal properties of the system. First, let us investigate the
XXZ anisotropy of the AF interaction J2 on the proper-
ties of the spin-1/2 non-frustrated diamond chain with
the couplings J1 : J2z : J3 = 1 : 2 : −0.5 for vari-
ous anisotropy parameter defined by γ2 = J2x/J2z =
J2y/J2z, where the z axis is presumed to be perpendic-
ular to the chain direction. For γ2 ≥ 1, the magneti-
zation m(H), susceptibility χ(T ) and specific heat C(T )
are presented in Figs. 19(a), (b) and (c), respectively.
0.0 0.8 1.6
0 1 2 3
0 1 2 3
0 1 2 3
0 1 2
0 1 2
 = 0.05,
 = 1, J
 = 0.5
 = -2
 = 1, J
 = 0.05,
 = -2
 = 1, J
 = 0.5
 = 0.01,
 = -2
-0.5-0.2
 = 1, J
 = 0.01,
 = -0.2
 = 1, J
 = 0.5
 = 0.01,
 = -2
-0.2   
 = 0.01,
 = 1, J
-0.2 -0.5
 = -2
FIG. 18: (Color online) For the spin-1/2 non-frustrated
Heisenberg diamond chains with J1 = 1 and J3 < 0, the mag-
netization process m(H) at temperature T/J1 = 0.05 with
(a) J2 = 0.5 and (b) J2 = 2; the susceptibility χ(T ) at field
H/J1 = 0.01 with (c) J2 = 0.5 and (d) J2 = 2; the specific
heat C(T ) at field H/J1 = 0.01 with (e) J2 = 0.5 and (f)
J2 = 2.
With increasing γ2, it is found that when the magnetic
field H is along the z direction, the width of the mag-
netization plateau at m = 1/6 as well as the saturation
field are enlarged, while those are more increased for H
along the x direction than along the z direction; the peak
of the susceptibility χ(T ) for H along the z direction
at lower temperature side is promoted, and the second
round peak at high temperature side is depressed with a
little shift, while χ(T ) for H along the x direction shows
the similar varying trend; the peak of the specific heat
C(T ) for H along the z direction at lower temperature
side leaves almost unchanged, and the second round peak
at high temperature side moves towards the higher tem-
perature side, while C(T ) for H along the x direction
coincides with those for H along the z direction. For
0 < γ2 < 1, the anisotropy just shows very reverse effect
on the thermodynamic properties in comparison to what
we discussed above.
Now let us discuss the effect of the XXZ anisotropy of
J3 < 0 on the magnetic and thermodynamic properties
of the spin-1/2 non-frustrated diamond chain with the
couplings J1 : J2 : J3z = 1 : 2 : −0.5. Recall that as
the J2 bond connects two different lattice sites, as shown
in Fig. 1, J1 and J3 can be different, even their signs.
Define a parameter γ3 to characterize the anisotropy as
γ3 = J3x/J3z = J3y/J3z, where the z axis is perpendicu-
lar to the chain direction. For γ3 ≥ 1, the magnetization
m(H), susceptibility χ(T ) and specific heat C(T ) are de-
picted in Figs. 19(d), (e) and (f), respectively. With
increasing γ3, it is seen that the width of the plateau at
m = 1/6 for H along the z direction becomes slightly
wider, while it goes smaller for H along the x direc-
tion; the saturation field is not changed with γ3 along
both directions; the peak of χ(T ) for H along the z di-
rection at lower temperature side is promoted, and the
second round peak at higher temperature side is slightly
depressed, while the situations along the x direction are
just reverse, namely, the peak at lower temperature side
is depressed, and the second peak at higher tempera-
ture side is slightly promoted; the peak of C(T ) for H
along the z direction at lower temperature side leaves al-
most unchanged, and the second round peak at higher
temperature side moves slightly to the higher tempera-
ture side, while C(T ) along the x direction coincides with
that along the z direction. For 0 < γ3 < 1, the situation
just becomes reverse in comparison to what we discussed
above.
D. Comparison to Experimental Results
Recently, Kikuchi et al. [6] have performed a nice
measurement on a spin-1/2 diamond-chain compound
Cu3(CO3)2(OH)2, i.e., azurite. They have observed the
1/3 magnetization plateau, unambiguously confirming
the previous theoretical prediction. The two broad peaks
both in the magnetic susceptibility and the specific heat
are observed. We note that in Ref. [6], the experimen-
tal data at finite temperatures are fitted by the zero-
temperature theoretical results obtained by the exact di-
agonalization and DMRG methods, while the result of
the high temperature series expansion fails to fit the
low-temperature behavior of the susceptibility. In ac-
cordance with our preceding discussions, by using the
TMRG method, we have attempted to re-analyse the ex-
perimental data presented in Ref. [6] to fit the experi-
ments for the whole available temperature region.
Our fitting results for the temperature dependence of
the susceptibility χ of the compound Cu3(CO3)2(OH)2
are presented in Fig. 20(a). For a comparison, we
have also included the TMRG result calculated by us-
ing the parameters given in Ref. [6]. Obviously, our
TMRG results with J1 : J2 : J3z = 1 : 1.9 : −0.3 and
J3x/J3z = J3y/J3z = 1.7 fit very well the experimental
0 1 2 3 4
0 1 2 3 4
0 1 2 3
0 1 2 3
0 1 2
0 1 2
 = 0.05
=1:2:-0.5
=2, z
=2, x
 = 0.05
=1:2:-0.5
=2, z
=2, x
 = 0.01
=2, z
=2, x
 = 0.01
=2, z
=2, x
 = 0.01
 = 0.01
FIG. 19: (Color online) For the spin-1/2 non-frustrated dia-
mond chain with the couplings satisfying J1 : J2z : J3 = 1 :
2 : −0.5 for various anisotropy γ2 = J2x/J2z ≥ 1: (a) the
magnetization process m(H) at temperature T/J1 = 0.05;
(b) the susceptibility χ(T at field H/J1 = 0.01; and (c) the
specific heat C(T ) at field H/J1 = 0.01. For the spin-1/2
non-frustrated diamond chain with couplings J1 : J2 : J3z =
1 : 2 : −0.5 for various anisotropy γ3 = J3x/J3z ≥ 1: (d) the
magnetization process m(H) at temperature T/J1 = 0.05;
(e) the susceptibility χ(T ) at field H/J1 = 0.01; and (f) the
specific heat C(T ) at field H/J1 = 0.01.
data of χ, and the two round peaks at low temperatures
are nicely reproduced, while the result with J1 : J2 :
J3 = 1 : 1.25 : 0.45 obtained in Ref. [6] cannot fit the
low-temperature behavior of χ [26]. On the other hand,
the fitting results for the temperature dependence of the
specific heat C(T ) of the compound Cu3(CO3)2(OH)2
are shown in Fig. 20(b). The lattice contribution, which
is included in the raw experimental data in Ref.[6], is
subtracted according to C(T ) = CExp(T ) − αT 3, where
α is a parameter. Obviously, our TMRG result with the
same set of parameters J1 : J2 : J3z = 1 : 1.9 : −0.3
and J3x/J3z = J3y/J3z = 1.7 fits also remarkably well
the experimental data of C(T ), and the two round peaks
at low temperatures are nicely reproduced, while the re-
sult with J1 : J2 : J3 = 1 : 1.25 : 0.45 given in Ref. [6]
cannot fit the low-temperature behavior of C(T ), even
qualitatively. In addition, the sharp peak of C(T ) ex-
perimentally observed at temperature around 2K cannot
be reproduced by both sets of the coupling parameters,
which might be a three-dimensional long-range ordering
due to interchain interactions. The fitting results for the
magnetization m(H) of the compound Cu3(CO3)2(OH)2
are shown in Fig. 20(c). We would like to point out
that the quantitative fitting by our above parameters to
the width of the plateau is not so good, but the quali-
tative behavior is quite consistent with the experiments
both in the transverse and longitudinal magnetic fields,
say, H
c1 > H
c1, H
c2 < H
c2, and the saturation field is
fixed along both directions, suggesting that our fitting
parameters capture the main characteristics. It is worth
pointing out that if the anisotropy ratio is increased up
to J3x/J3z = J3y/J3z = 2.5 with the same couplings
J1 : J2 : J3z = 1 : 1.9 : −0.3, the width of the 1/3
plateau for H ‖ b will be decreased to about one-half of
that for H ⊥ b.
Therefore, our calculations show that (i) the best cou-
plings obtained by fitting the experimental data of the
susceptibility for the azurite could be J1 : J2 : J3z = 1 :
1.9 : −0.3 with the anisotropic ratio for the ferromag-
netic interaction J3x/J3z = J3y/J3z = 1.7, where z ⊥ b;
(ii) the compound may not be a spin frustrated magnet;
(iii) the double peaks of the susceptibility and the specific
heat are not caused by the spin frustration effect, but by
the two kind of gapless and gapful excitations owing to
the competition of the AF and F interactions.
One might argue that for this diamond chain com-
pound, from the point of the lattice distance it is unlikely
that J1 is AF without XXZ anisotropy while J3 is F with
strong XXZ anisotropy. We may offer another possibil-
ity to support our findings, namely, the case of J1 and
J3 with opposite signs is not excluded from the lattice
structure of the compound. A linear relationship exists
between the exchange energy and the metal-ligand-metal
bridge angle: the coupling energy, positive (ferromag-
netic) at angles near 90o, becomes increasingly smaller
(more antiferromagnetic) as the angle increases[27]. As
the ferromagnetic coupling J3 is determined by fitting
the experimental low-temperature behaviors of χ(T ) and
C(T ), this fitting coupling parameters should not be im-
possible if one considers the angle of J1 bridge to keep the
antiferromagnetic coupling while the angle of J3 bridge
to induce the ferromagnetic coupling. On the other
hand, we note that there is another compound with Cu
ions, Cu2(abpt)(SO4)2(H2O)·H2O, whose g factors in XY
plane are different from that in z direction [28]. Besides,
someone might argue that the condition J2 ≫ J1, |J3| is
necessary to explain the double peak behavior of the dia-
mond chain. In fact, such an argument is not necessarily
true, as manifested in Fig. 18(c), where the double peaks
of χ(T ) at low temperatures can also be produced with
0 10 20 30 40
0 10 20 30 40
0.020
0.025
0.030
H (T)
(c) H b (Exp.) H//b (Exp.)
 (TMRG) (Ref.[12])
 H b (TMRG) (present work)
 H//b (TMRG) (present work)
T (K)
(b) CExp- T
=0.0005
=0.00055
=0.0006
 (TMRG) (Ref.[12])
 (TMRG) (present work)
H b (Exp.)
 (TMRG)
=1:1.25:0.45,
 H b (TMRG)
 H//b (TMRG)
(a) H//b (Exp.)
=19K (Ref.[12])
=1:1.9:-0.3,
=23K, J
(present work)
FIG. 20: (Color online) A comparison of experimental re-
sults for (a) the magnetic susceptibility, (b) the specific heat
and (c) the magnetization process for the spin-1/2 diamond
compound Cu3(CO3)2(OH)2 with the TMRG results. The
experimental data are taken from Ref. [6]. See the context
for details.
the parameters J1 : J2 : J3 = 1 : 0.5 : −0.1. In other
words, the double-peak behavior of the diamond chain
may not depend on whether J2 ≫ J1, J3 or not, but may
be strongly dependent on the competition of AF and F
interactions, as discussed above.
VI. SUMMARY AND DISCUSSION
In this paper, we have numerically studied the mag-
netic and thermodynamic properties of spin-1/2 Heisen-
berg diamond chains with three different cases (a) J1,
J2, J3 > 0 (frustrated), (b) J1, J3 < 0, J2 > 0 (frus-
trated), and (c) J1, J2 > 0, J3 < 0 (non-frustrated) by
means of the DMRG and TMRG methods. In the ground
states, the local magnetic moment, spin correlation func-
tion, and static structure factor are explored. The static
structure factor S(q) at zero field shows peaks at wave
vector q = 0, π/3, 2π/3, π, 4π/3 and 5π/3 for different
couplings, in which the peaks at q = 0, 2π/3 and 4π/3
in the magnetization plateau state with m = 1/6 are ob-
served to be couplings independent. The DMRG results
of the zero-field static structure factor can be nicely fitted
by a linear superposition of six modes, where two fitting
equations are proposed. It is seen that the six modes
are closely related to the low-lying excitations of the sys-
tem. At finite temperatures, the magnetization, suscep-
tibility and specific heat are calculated, which show var-
ious behaviors for different couplings. The double-peak
structure of the susceptibility and specific heat can be
procured, whose positions and heights are found to be
dependent on competing couplings. It has been shown
that the XXZ anisotropy of F and AF couplings can have
remarkable effect on the physical behaviors of the sys-
tem. In addition, the experimental susceptibility, specific
heat and magnetization of the diamond chain compound
Cu3(CO3)2(OH)2[6] can be nicely fitted by our TMRG
results.
For the spin-1/2 frustrated Heisenberg diamond chains
with AF couplings J1, J2 and J3, the magnetization
plateau at m = 1/6 in the ground state coincides with
a perfect fixed sequence of the averaged local magnetic
moment such as {..., (Sa, Sa, Sb), ...} with 2Sa+Sb = 1/2,
which might be described by trimerized states. On the
other hand, the static structure factor S(q) shows peaks
at wave vectors q = 0, π/3 (5π/3), and 2π/3 (4π/3)
for different external fields and different AF couplings.
We note that the similar behavior of S(q) has been
experimentally observed in diamond-typed compound
Sr3Cu3(PO4)4 [19]. In addition, the DMRG results of
the zero-field static structure factor can be nicely fitted
by a linear superposition of six modes. It is observed
that the six modes are closely related to the low-lying
excitations of the present case. At finite temperatures,
the magnetizationm(H), susceptibility χ(T ) and specific
heat C(T ) demonstrate different behaviors at different
AF couplings, say, the magnetization plateau atm = 1/6
is observed whose width is found to be dependent on the
couplings; the double peak structure is observed for the
susceptibility χ(T ) and specific heat C(T ) as a function of
temperature, and the heights and positions of the peaks
are found to be dependent on the AF couplings.
For the spin-1/2 frustrated Heisenberg diamond chains
with F couplings J1, J3 and AF coupling J2, the mag-
netization plateau at m = 1/6 in the ground state cor-
responds to a perfect fixed sequence of the averaged lo-
cal magnetic moment such as {..., (Sa, Sb, Sb), ...} with
Sa+2Sb = 1/2, which could be understood by trimerized
states. The static structure factor S(q) shows peaks also
at wave vectors q = 0, π/3 (5π/3), and 2π/3 (4π/3) for
different external fields and different F couplings J1, J3
and AF coupling J2, which is expected to be experimen-
tally observed in the related diamond-type compound.
In addition, the DMRG results of the zero-field static
structure factor can be nicely fitted by a linear superpo-
sition of six modes with the fitting equations mentioned
above. The six modes are closely related to the low-
lying excitations of the system. At finite temperatures,
the magnetization m(H), susceptibility χ(T ) and spe-
cific heat C(T ) demonstrate various behaviors for differ-
ent couplings, namely, the magnetization plateau at m =
1/6 is observed whose width is found to depend on the
couplings; the double-peak structure is also observed for
the susceptibility χ(T ) and specific heat C(T ), and the
heights and positions of the peaks are found dependent
on F couplings J1, J3 and AF coupling J2.
For the spin-1/2 non-frustrated Heisenberg diamond
chains with AF couplings J1, J2 and F coupling J3, the
magnetization plateau at m = 1/6 in the ground state
coincides with a perfect fixed sequence of the averaged
local magnetic moment such as {..., (Sa, Sb, Sb), ...} with
Sa+2Sb = 1/2, which could be understood by trimerized
states. The static structure factor S(q) is observed to ex-
hibit the peaks at wave vectors q = 0 and 2π/3 (4π/3)
for different external fields and different AF couplings
J1, J2 and F coupling J3, which could be experimen-
tally detected in the related diamond-type compound. In
addition, it is found that the zero-field spin correlation
function 〈Szj Sz0 〉 is similar to that of the S = 1/2 Heisen-
berg AF chain. At finite temperatures, the magnetiza-
tion m(H), susceptibility χ(T ) and specific heat C(T )
are found to reveal different behaviors for different cou-
plings, i.e., the magnetization plateau at m = 1/6 is ob-
tained, whose width is found to depend on the couplings;
the double-peak structure is observed for the tempera-
ture dependence of the susceptibility χ(T ) and specific
heat C(T ), where the heights and positions of the peaks
depend on different AF couplings J1, J2 and F coupling
The effect of the anisotropy of the AF and F inter-
actions on the physical properties of the non-frustrated
Heisenberg diamond chain is also investigated. For the
case of the couplings satisfying J1 : J2z : J3 = 1 :
2 : −0.5, when the anisotropic ratio γ2 = J2x/J2z =
J2y/J2z 6= 1, it is found that the width of the plateau at
m = 1/6, the saturation field, and the susceptibility χ(T )
show the same tendency, but quantitatively different, un-
der the external field H along the z and x directions,
while the specific heat C(T ) for H along the z direction
coincides with that along the x direction. For the case
of the couplings satisfying J1 : J2 : J3z = 1 : 2 : −0.5,
when the anisotropic ratio γ3 = J3x/J3z = J3y/J3z 6= 1,
it is seen that the width of the plateau at m = 1/6, the
saturation field, and the susceptibility χ(T ) exhibit the
opposite trends for H along the z and x directions, while
the specific heat C(T ) for H along the z direction also
coincides with that along the x direction.
For all the three cases, plateau states of m = 1/6 are
observed during the magnetization, whose static struc-
ture factor S(q) shows peaks at wavevectors q = 0, 2π/3
and 4π/3. But in absence of the magnetic field, the static
structure factor S(q) in the ground state displays peaks
at q = 0, π/3, 2π/3, π, 4π/3, and 5π/3 for the frustrated
case with J1, J2, J3 > 0; peaks at q = 0, π/3, π, and
5π/3 for the frustrated case with J1, J3 < 0, J2 > 0;
and a peak at q = π for the non-frustrated case with J1,
J2 > 0, J3 < 0. In addition, the DMRG results of the
zero-field static structure factor can be nicely fitted by a
linear superposition of six modes, where the fitting equa-
tion is proposed. At finite temperatures, the double-peak
structure of the susceptibility and specific heat against
temperature can be obtained for all the three cases. It
is found that the susceptibility shows ferrimagnetic char-
acteristics for the two frustrated cases with some cou-
plings, while no ferrimagnetic behaviors are observed for
the non-frustrated case.
The compound Cu3(CO3)2(OH)2 is regarded as a
model substance for the spin-1/2 Heisenberg diamond
chain. The 1/3 magnetization plateau and the two broad
peaks both in the magnetic susceptibility and the spe-
cific heat have been observed experimentally[6]. Our
TMRG calculations with J1 : J2 : J3z = 1 : 1.9 : −0.3
and J3x/J3z = J3y/J3z = 1.7 capture well the main
characteristics of the experimental susceptibility, spe-
cific heat and magnetization, indicating that the com-
pound Cu3(CO3)2(OH)2 may not be a spin frustrated
magnet[26].
APPENDIX A: LOW-LYING EXCITATIONS OF
SPIN-1/2 FRUSTRATED HEISENBERG
DIAMOND CHAINS
In this Appendix, the low-lying excitations of the spin-
1/2 frustrated Heisenberg diamond chain are investigated
by means of the Jordan-Wigner (JW) transformation.
The Hamiltonian of the system reads
(J1S3i−2 · S3i−1 + J2S3i−1 · S3i + J3S3i−2 · S3i
+J3S3i−1 · S3i+1 + J1S3i · S3i+1)−H
Szj ,(A1)
where 3N is the total number of spins in the diamond
chain, Ji > 0 (i = 1, 2, 3) represent the AF coupling
while Ji < 0 the F interaction, and H is the external
magnetic field along the z direction. In accordance with
the spin configuration of the diamond chain, we start
from the Jordan-Wigner (JW) transformation with spin-
less fermions
S+j = a
j exp[iπ
a+mam],
Szj = a
j aj −
, (A2)
where j = 1, · · · , 3N . Because the period of the present
system is 3, three kinds of fermions in moment space can
be introduced through the Fourier transformations
a3i−2 =
eik(3i−2)a1k,
a3i−1 =
eik(3i−1)a2k,
a3i =
eik(3i)a3k. (A3)
Ignoring the interactions between fermions, the Hamilto-
nian takes the form of
H = E0 +
[(ω1a
1ka1k + ω2a
2ka2k + ω3a
3ka3k)
+(γ1a1ka
2k + γ2a2ka
3k + γ3a3ka
1k + h.c.)],(A4)
where E0 =
(2J1+J2+2J3−6H), ω1=−(J1+J3)−H ,
ω2=− 12 (J1+J2+J3)−H , ω3=ω2, γ1=(J1e
ik+J3e
−i2k)/2,
γ2=(J2e
ik)/2, and γ3=(J3e
ik + J1e
−ik)/2.
Via the Bogoliubov transformation
a1k = u11(k)α1k + u12(k)α2k + u13(k)α3k,
a2k = u21(k)α1k + u22(k)α2k + u23(k)α3k,
a3k = u31(k)α1k + u32(k)α2k + u33(k)α3k, (A5)
the Hamiltonian can be diagonalized as
H = Eg +
ikαik. (A6)
The coefficients of the Bogoliubov transformation can be
found through equations of motion i~ȧik = [aik, H ]:
ω1 γ1 γ3
γ∗1 ω2 γ2
γ∗3 γ
 = ǫik
 . (A7)
For a given k, the eigenvalues εik and eigenvectors
(u1i, u2i, u3i) can be numerically calculated by the driver
ZGEEV.f of the LAPACK, which is available on
the website[29]. Figs.7 show the zero-field low-lying
fermionic excitation ε(k) for the frustrated diamond
chain with different AF coupling, while Figs.13 present
the zero-field low-lying fermionic excitation ε(k) for the
frustrated diamond chain with J1, J3 < 0 and J2 > 0.
Acknowledgments
We are grateful to Prof. D. P. Arovas for useful com-
munication. This work is supported in part by the Na-
tional Science Fund for Distinguished Young Scholars
of China (Grant No. 10625419), the National Science
Foundation of China (Grant Nos. 90403036, 20490210,
10247002), and by the MOST of China (Grant No.
2006CB601102).
[] ∗Corresponding author. E-mail: gsu@gucas.ac.cn
[1] M.Oshikawa, M. Yamanaka, and I. Affleck, Phys. Rev.
Lett. 78, 1984 (1997).
[2] M. Drillon, E. Coronado, M. Belaiche and R. L. Carlin,
J. Appl. Phys. 63, 3551 (1988); M. Drillon, M. Belaiche,
P. Legoll, J. Aride, A. Boukhari and A. Moqine, J. Magn.
Magn. Mater. 128, 83 (1993).
[3] H. Sakurai, K. Yoshimura, K. Kosuge, N. Tsujii, H. Abe,
H. Kitazawa, G. Kido, H. Michor and G. Hilscher, J.
Phys. Soc. Japan 71, 1161 (2002).
[4] M. Ishii, H. Tanaka, M. Mori, H. Uekusa, Y. Ohashi, K.
Tatani, Y. Narumi, and K. Kindo, J. Phys. Soc. Jpn. 69,
340 (2000).
[5] M. Fujisawa, J. Yamaura, H. Tanaka, H. Kageyama, Y.
Narumi, and K. Kindo, J. Phys. Soc. Jpn. 72, 694 (2003).
[6] H. Kikuchi, Y. Fujii, M. Chiba, S. Mitsudo, T. Klehara,
T. Tonegawa, K. Okamoto, T. Sakai, T. Kuwai, and H.
Ohta, Phys. Rev. Lett. 94, 227201 (2005).
[7] H. Kikuchi, Y. Fujii, M. Chiba, S. Mitsudo, and T. Ide-
hara, Physica B 329, 967 (2003).
[8] K. Takano, K. Kubo, and H. Sakamoto, J. Phys.: Con-
dens. Matter 8, 6405 (1996).
[9] K. Okamoto, T. Tonegawa, Y. Takahashi, and M.
Kaburagi, J. Phys.: Condens. Matter 11, 10485 (1999).
[10] T. Tonegawa, K. Okamoto, T. Hikihara, Y. Takahashi,
and M. Kaburagi, J. Phys. Soc. Jpn. 69, 332 (2000).
[11] K. Sano and K. Takano, J. Phys. Sco. Jpn. 69, 2710
(2000).
[12] T. Tonegawa, K. Okamoto, T. Hikihara, Y. Takahashi,
and M. Kaburagi, J. Phys. Chem. Solids 62, 125 (2001).
[13] K. Okamoto, T. Tonegawa, and M. Kaburagi, J. Phys.
Condens. Matter 15, 5979 (2003).
[14] A. Honecker and A. Lauchli, Phys. Rev. B 63, 174407
(2001).
[15] D. D. Swank and R. D.Willett, Inorganica Chimica Acta,
8, 143 (1974).
[16] S. White; T. Xiang and X. Wang, Density-Matrix Renor-
malization, Lecture Notes in Physics, Vol. 528, edited
by I. Peschel, X. Wang, M. Kaulke and K. Hallberg
(Springer-Verlag, New York, 1999).
[17] U. Schollwock, Rev. Mod. Phys. 77, 259 (2005).
[18] We have checked that in the present situation, the static
structure factor S(q) calculated from 〈Szj S
0 〉 coincides
with that from 〈(Szj − 〈S
j 〉)(S
0 − 〈S
0 〉)〉.
[19] Y. Ajiro, T. Asano, K. Nakaya, M. Mekata, K. Ohoyama,
Y. Yamaguchi, Y. Koike, Y. Morii, K. Kamishima, H. A.
Katori, and T. Goto, J. Phys. Soc. Jpn. 70, Suppl. A,
186 (2001).
[20] D. P. Arovas, A. Auerbach, and F. D. M. Haldane, Phys.
Rev. Lett. 60, 531 (1988).
[21] The system under interest involves possibly ferrimag-
netic, dimerized and spin liquid phases, leading to the
spin-spin correlation functions exhibit different behav-
iors including exponential or power-law decaying. The
constant cl in Eq. (7) characterizes the long-rang order
in ferrimagnetic phase.
[22] B. Gu, G. Su, and S. Gao, Phys. Rev. B 73, 134427
(2006).
[23] I. Affleck, D. Gepner, H. J. Schulz, and T. Ziman, J.
Phys. A: Math. Gen. 22, 511 (1989).
[24] R. P. Singh, M. E. Fisher, and R. Shanker, Phys. Rev. B
39, 2562 (1989).
[25] B. Gu, G. Su, and S. Gao, J. Phys.:Condens. Matter 17,
6081 (2005).
[26] B. Gu and G. Su, Phys. Rev. Lett. 97, 089701 (2006).
[27] J. C. Livermore, R. D. Willett, R. M. Gaura, and C. P.
Landee, Inorg. Chem. 21, 1403 (1982).
[28] P. J. van Koningsbruggen, D. Gatteschi, R. A. G. de
Graaff, J. G. Haasnoot, J. Reedijk, and C. Zanchini, In-
org. Chem. 34, 5175 (1995).
[29] http://www.netlib.org/lapack/
http://www.netlib.org/lapack/
ABSTRACT
  The magnetic and thermodynamic properties of spin-1/2 Heisenberg diamond
chains are investigated in three different cases: (a) J1, J2, J3>0
(frustrated); (b) J1, J3<0, J2>0 (frustrated); and (c) J1, J2>0, J3<0
(non-frustrated). The density matrix renormalization group (DMRG) technique is
invoked to study the properties of the system in the ground state, while the
transfer matrix renormalization group (TMRG) technique is applied to explore
the thermodynamic properties. The local magnetic moments, spin correlation
functions, and static structure factors are discussed in the ground state for
the three cases. It is shown that the static structure factor S(q) shows peaks
at wavevectors $q=a\pi /3$ (a=0,1,2,3,4,5) for different couplings in a zero
magnetic field, which, however in the magnetic fields where the magnetization
plateau with m=1/6 pertains, exhibits the peaks only at q=0, $2\pi /3$ and
$4\pi /3$, which are found to be couplings-independent. The DMRG results of the
zero-field static structure factor can be nicely fitted by a linear
superposition of six modes, where two fitting equations are proposed. It is
observed that the six modes are closely related to the low-lying excitations of
the system. At finite temperatures, the double-peak structures of the
susceptibility and specific heat against temperature are obtained, where the
peak positions and heights are found to depend on the competition of the
couplings. It is also uncovered that the XXZ anisotropy of F and AF couplings
leads the system of case (c) to display quite different behaviors. In addition,
the experimental data of the susceptibility, specific heat and magnetization
for the compound Cu$_{3}$(CO$_{3}$)$_{2}$(OH)$_{2}$ are fairly compared with
our TMRG results.

<|endoftext|><|startoftext|>
Introduction
The progress of natural sciences depends on advancement
in the fields of experimental techniques and modeling of
relations between experimental data in terms of physical
laws.[1,2] By utilizing computers a revolution appeared
in the acquisition of experimental data while modeling
still awaits a corresponding progress. For this purpose the
modeling process should be generally described in terms
of operations that could be autonomously performed by a
computer. A step in this direction was taken recently by a
nonparametric statistical modeling of the probability dis-
tribution of measured data.[3] The nonparametric model-
ing requires no a priori assumptions about the probability
density function (PDF) of measured data and therefore
provides for a fairly general and autonomous experimen-
tal modeling of physical laws by a computer.[1,4] More-
over, the inaccuracy of measurement caused by stochastic
influences can be properly accounted for in the nonpara-
metric modeling that further leads to the expression of ex-
perimental information, redundancy of repeated measure-
ments and model cost function in terms of entropy of infor-
mation. These variables have already been applied when
formulating an optimal nonparametric modeling of PDF,
in the most simple case of a one–dimensional variable.[3]
However, more frequently than modeling of a PDF the
problem is to extract a physical law from joint data about
various variables and to analyze its properties. Therefore,
the aim of this article is to propose a general statistical
approach also to the solution of this problem.
As an optimal statistical estimator of an experimen-
tal physical law we propose the conditional average (CA)
that is determined by the conditional PDF.[1] This esti-
mator represents a nonparametric regression whose struc-
ture is case independent; hence it can be generally pro-
grammed and autonomously determined by a computer.
Due to these convenient properties, we consider CA as a
basis for the autonomous extraction of experimental phys-
ical laws in data acquisition systems.
The fundamental steps of the proposed approach to
extraction of experimental physical laws from given data
are explained in the second section. We first define the
estimators of the joint, the marginal and the conditional
PDFs and derive from them the conditional average as
an optimal estimator of a physical law that is hidden in
joint data. In order to estimate the number of data ap-
propriate for the extraction of a physical law, we further
introduce the statistics that characterize the information
provided by joint measurements. In the third section of
the article the properties of the CA estimator and the
other introduced statistics are demonstrated on cases of
deterministically and randomly related data.
2 Statistics of joint measurements
2.1 Uncertainty of experimental observation
Without loss of generality we consider a phenomenon that
can be quantitatively characterized by two scalar valued
variables x and y comprising a vector z = (x, y). We fur-
ther assume that the phenomenon can be experimentally
Igor Grabec: Extraction of physical laws from joint experimental data 3
explored by repetition of joint measurements on a two–
channel instrument having equal spans Sx = (−L,L),
Sy = (−L,L). Their Cartesian product Sxy = Sx ⊗ Sy
determines the joint span. We treat a measurement of a
joint datum as a process in which the measured object
generates the instrument output z = (x, y). The basic
properties of the instrument and measurement procedure
can be characterized by a calibration based on a set of
objects {wkl = (uk, vl); k = 1, . . . l = 1, . . .} that repre-
sent joint physical units. Using these units, a scale net can
be determined in the joint span Sxy of the instrument. In
order to simplify the notation, we further omit the indices
of units.
A common property of measurements is that the out-
put of the instrument fluctuates even when calibration
is repeated.[1,2] We describe this property by the joint
PDF ψ(z|w), which characterizes the scattering of the in-
strument output at a given joint unit w. For the sake
of simplicity, we consider an instrument whose channels
can be calibrated mutually independently. In this case the
instrument scattering function is expressed by the prod-
uct of scattering functions corresponding to both channels
ψ(z|w) = ψ(x|u)ψ(y|v). Their mean values u, v, and stan-
dard deviations σx, σy represent an element of the instru-
ment scale and the scattering of instrument output at the
joint calibration. These values can be estimated statisti-
cally by the sample mean and variance of both components
measured during repeated calibration by a joint unit w.
The standard deviation σ characterizes the uncertainty
of the measurement procedure performed on a unit.[1,2]
We further consider the most frequent case in which the
output scattering does not depend on the channel index
and the position w = (u, v) on the joint scale. In this
case it can be expressed as a function of the difference
z − w = (x − u, y − v) and a common standard devia-
tion σ = σx = σy as ψ(z|w) = ψ(z −w, σ). We consider
scattering of instrument output during calibration as a
consequence of random disturbances in the measurement
system. When these disturbances are caused by contribu-
tions from mutually independent sources, the central limit
theorem of the probability theory leads us to the Gaussian
scattering function ψ(z−w, σ) = g(x−u, σ)g(y−v, σ), in
which the scattering of a single component is determined
ψ(x|u) = g(x− u, σ) = 1√
− (x− u)
. (1)
2.2 Estimation of probability density functions
Let us consider a single measurement which yields a joint
datum z1 = (x1, y1). We assume that this joint datum
appears at the outputs of instrument channels, since it is
the most probable at a given state z of the observed phe-
nomenon and the instrument during measurement. There-
fore, we utilize the measured datum z1 as the center of the
probability distribution ψ(z− z1, σ) = ψ(x− x1, σ)ψ(y −
y1, σ) that represents the corresponding state.
Consider next a series of N repeated measurements
which yield the basic data set {zi; i = 1, . . . , N}. In ac-
cordance with the above–given interpretation of measured
data we adapt to them the distributions {ψ(z−zi, σ); i =
1, . . . , N}. If the data z1, . . . , zN are spaced more than σ
4 Igor Grabec: Extraction of physical laws from joint experimental data
apart, we assume that their scattering is caused by varia-
tion of the state z in repeated measurements and generally
consider z as a random vector variable. Its joint PDF is
determined by the statistical average over distributions
{ψ(z− zi, σ); i = 1, . . . , N} as:
fN (z) =
ψ(z− zi, σ). (2)
This function represents an experimental model of PDF
and resembles Parzen’s kernel estimator, which is often
used in statistical modeling of PDFs.[5,4] However, in Parzen’s
modeling the kernel width σ plays the role of a smooth-
ing parameter whose value decreases with the number of
data N , which is not consistent with the general proper-
ties of measurements. In opposition to this, we consider σ
as an instrumental parameter that is determined by the
inaccuracy of measurement.[3,4] In the majority of experi-
mental observations σ is a constant during measurements,
and hence need not be further indicated in the scattering
function ψ.
From the joint PDF f(z) = f(x, y) the marginal PDF
f(x) of a component x is obtained by integration over the
other component, for example:
f(x) =
f(x, y)dy (3)
The conditional PDF of the variable y at a given condition
x is then defined by the ratio of the joint PDF and the
marginal PDF of the condition:
f(y|x) = f(x, y)
Using the experimental model of joint PDF (2) we obtain
for the marginal and conditional PDFs the following kernel
estimators:
fN(x) =
ψ(x− xi, σ) (5)
fN (y|x) =
i=1 ψ(x− xi, σ)ψ(y − yi, σ)
i=1 ψ(x− xi, σ)
2.3 Estimation of a physical law
It is often observed that the joint PDF resembles a crest
along some line y = ŷ(x). We consider ŷ(x) as an estimator
of a hidden physical law y = yo(x) that provides for a
prediction of a value y from the given value x. If we repeat
joint measurements, and consider only those that yield
the value x, we can generally observe that corresponding
values of the variable y are scattered, at least due to the
stochastic character of the measurements. As an optimal
predictor of the variable y at the given value x, we consider
the value ŷ that yields the minimum of the mean square
prediction error D at a given x:
D = E[(ŷ − y)2|x] = min(ŷ) (7)
The minimum takes place when dD/dŷ = 0. The solu-
tion of this equation yields as the optimal predictor ŷ the
conditional average
ŷ(x) = E[y|x] =
y f(y|x)dy (8)
By using Eq. 6 for the conditional probability, we obtain
for CA the superposition
ŷN (x) =
i=1 yiψ(x− xi, σ)
i=1 ψ(x− xi, σ)
yiCi(x) (9)
The coefficients
Ci(x) =
ψ(x− xi, σ)
i=1 ψ(x − xi, σ)
Igor Grabec: Extraction of physical laws from joint experimental data 5
represent a normalized measure of similarity between the
given value x and sample values xi and satisfy the condi-
tions:
Ci(x) = 1 , (11)
0 ≤ Ci(x) ≤ 1. (12)
The more similar given value x is to a datum xi, the larger
the coefficient Ci(x) is and the contribution of the corre-
sponding term yiCi(x) to the sum in Eq.(9). The pre-
diction of the value ŷN (x), which best corresponds to the
given value x, thus resembles the associative recall of mem-
orized items in the brains of intelligent beings, and there-
fore could be treated as a basis for the development of
computerized autonomous modelers of physical laws and
related machine intelligence.[1]
The predictor Eq. (9) is completely determined by the
set of measured data {z − zi; i = 1, . . . , N} and the in-
strument scattering function ψ. The predictor is not based
on any a priori assumption about the functional relation
between the variables x and y, as is done for example
when a physical law is described by some regression func-
tion in which parameters are adapted to given data. The
conditional average Eq. (9) can thus be treated as a non-
parametric regression, although the scattering functions
ψ(z−zi, σ) still depend on the parameters zi, σ. However,
these parameters, as well as the form of the function ψ,
are totally specified by measurements. They represent a
property of the observed phenomenon and not an assumed
auxiliary of the modeling. Since the form of the CA pre-
dictor does not depend on a specific phenomenon under
consideration, it could be considered as a generally ap-
plicable basis for statistical modeling of physical laws in
terms of experimental data in an autonomous computer.
It is convenient that Eq. (9) can be simply generalized to a
multi–dimensional case by substituting the condition and
the estimated variable by the corresponding vectors.[1]
Moreover, it is convenient that the ordering into depen-
dent and independent variables is done automatically by
a specification of the condition.
2.3.1 Description of predictor quality
We can interpret a phenomenon which is characterized by
the vector z = (x, y) as a process that maps the vari-
able x to the variable y. When the variables x and y are
stochastic, we most generally describe this mapping by the
joint PDF f(x, y). Similarly, we can interpret the predic-
tion of the variable ŷ(x) from the given value x as a pro-
cess that runs in parallel with the observed phenomenon.
This process is also generally characterized by the PDF
f(x, ŷ), while the relation between the variables y and ŷ
is characterized by the PDF f(y, ŷ). The better the pre-
dictor is, the more the distribution f(y, ŷ) is concentrated
along the line y = ŷ(x). For a good predictor we generally
expect that the prediction error Er = y − ŷ is close to
0. Since both variables are considered as stochastic ones,
we expect that the first and second moments of the pre-
diction error E[y − ŷ], E[(y − ŷ)2] are small, while for
an exact prediction E[y − ŷ] = 0, and E[(y − ŷ)2] = 0.
The second moment of the error is equal to E[(y − ŷ)2] =
Var(y)+Var(ŷ)−2Cov(y, ŷ)+(my−mŷ)2, wheremy = E[y]
andmŷ = E[ŷ] denote mean values. If the variables y and ŷ
6 Igor Grabec: Extraction of physical laws from joint experimental data
are statistically independent and have equal mean values,
the covariance vanishes: Cov(y, ŷ) = 0, and my −mŷ = 0,
so that E[(y − ŷ)2] = Var(y) + Var(ŷ). Based upon this
property we introduce a relative statistic called the pre-
dictor quality with the formula
Q = 1− E[(y − ŷ)
Var(y) + Var(ŷ)
2Cov(y, ŷ)
Var(y) + Var(ŷ)
− (my −mŷ)
Var(y) + Var(ŷ)
Its value equals 1 for an exact prediction: ŷ = y, while it
equals 0, if the variables y, ŷ are statistically independent
and have equal mean values. If the mean values differ:
my −mŷ 6= 0, the quality Q can also be negative.
When the predictor is determined by the conditional
average (8), we obtain for its mean value
mŷ = E[ŷ] =
ŷf(x)dx =
yf(y|x)f(x)dxdy
yf(y, x)dxdy = E[y] = my. (14)
Since in this case my −mŷ = 0, we further get
2Cov(y, ŷ)
Var(y) + Var(ŷ)
Similarly we get for the covariance
Cov(y, ŷ) =
(y −my)(ŷ(x) −mŷ(x)])f(y, x)dxdy
(ŷ(x)−mŷ(x))(y −my)f(y|x)dyf(x)dx
(ŷ(x)−mŷ(x))2f(x)dx = Var(ŷ), (16)
so that the expected quality of the CA predictor is
2Var(ŷ)
Var(y) + Var(ŷ)
. (17)
In the case when the relation between both components of
the vector z is determined by some physical law yo(x), and
only the measurement procedure introduces an additive
noise ν with zero mean E[ν] = 0, and variance E[ν2] = σ2,
we can express the variable y as y = yo(x) + ν. In this
case the following equations: E[(y − ŷ)2] = σ2, Var(y) =
Var(ŷ) + σ2 hold, and we get for the expected predictor
quality the expression:
2Var(ŷ)
2Var(ŷ) + σ2
. (18)
For Var(ŷ) ≫ σ2/2 we have Q ≈ 1, while for Var(ŷ) ≪
σ2/2 we have Q ≈ 0. In the last case ŷ ≈ constant, while
y fluctuates around this constant, and consequently the
prediction quality is low.
Since generally Var(y) ≥ Var(ŷ) and Var(ŷ) ≥ 0, we
obtain from Eq. (17) the inequality 0 ≤ Q ≤ 1. It describes
a mean property, which need not be fulfilled exactly if the
conditional average is statistically estimated from a finite
number of samples N ; but we can expect that it holds
ever more with an increasing N . However, we can gen-
erally expect that with an increasing N , the statistically
estimated CA ever better represents the underlying physi-
cal law y = yo(x). However, with an increasing N , the cost
of experiments increases, and consequently there generally
appears the question: ”How to specify a number of sam-
ples N that is reasonable for the experimental estimation
of a hidden law yo(x)?”
2.4 Experimental information
In order to answer the last question, we proceed with the
description of the indeterminacy of the vector variable z
in terms of the entropy of information. Following the def-
initions given for a scalar random variable in the previous
Igor Grabec: Extraction of physical laws from joint experimental data 7
article,[3] we first describe the indeterminacy of the com-
ponent x. For this purpose we introduce a uniform refer-
ence PDF ρ(x) = 1/(2L) that hypothetically corresponds
to the most indeterminate noninformative observation of
variable x; or to equivalently prepared initial states of the
instrument before executing the experiments in a series
of observations. By using this reference and the marginal
PDF f(x), we first define the indeterminacy of a continu-
ous random variable by the negative value of the relative
entropy[6,7]
Hx = −
f(x) log
(f(x)
dx. (19)
Using the expressions for the reference, instrumental scat-
tering function, and experimentally estimated PDF, we
obtain the expressions for the uncertainty Hu of calibra-
tion performed on a unit u, the uncertainty Hx of the
component x, experimental information Ix provided by
N measurements of x, and the redundancy Rx of these
measurements as follows [3]:
Hu = −
ψ(x, u) log(ψ(x, u)) dx − log(2L),
Hx = −
fN (x) log(fN (x)) dx − log(2L),
Ix(N) = Hx −Hu,
Rx(N) = log(N)− Ix(N), (20)
Similar equations are obtained for the component y by
substituting x→ y.
In order to describe the uncertainty of the random vec-
tor z, we utilize the reference PDF that is uniform inside
the joint span Sxy: ρ(z) = ρ(x)ρ(y) = 1/(2L)
2, and van-
ishes elsewhere. By analogy with the scalar variable we
define the indeterminacy of the random vector z by the
negative value of the relative entropy:[6]
Hxy = −
f(z) log
(f(z)
dxdy. (21)
In the case of a uniform reference PDF we obtain
Hxy = −
f(z) log(f(z)) dxdy − 2 log(2L). (22)
With this formula we then express the uncertainty of the
joint instrument calibration as
ψ(z,w) log(ψ(z,w)) dxdy − 2 log(2L).
For σ ≪ L we obtain from the Gaussian scattering func-
tion ψ(z, zi) = g(x− xi, σ)g(y − yi, σ) the approximation
≈ log
+ log
+ 1, (24)
The uncertainty of calibration depends on the ratio be-
tween the scattering width 2σ and the instrument span 2L
in both directions. The number 2 log(σ/L) determines the
lowest possible uncertainty of measurement on the given
two–channel instrument, as achieved at its joint calibra-
tion.
The indeterminacy of the random vector z, which char-
acterizes the scattering of experimental data, is defined by
the estimated joint PDF as
Hxy = −
fN (z) log(fN (z)) dxdy − 2 log(2L) (25)
and is generally greater than the uncertainty of calibra-
tion described by H
. Since H
denotes the lowest possi-
ble indeterminacy of observation carried out over a given
instrument, we define the joint experimental information
8 Igor Grabec: Extraction of physical laws from joint experimental data
Ixy about vector z = (x, z) by the difference
Ixy(N) = Hxy −Hw
fN (z) log(fN (z)) dxdy
ψ(z,w) log(ψ(z,w)) dxdy. (26)
Most properties of the uncertainty and information apper-
taining to a random vector are similar to those in the case
of a scalar variable. For example, the reference density ρ(z)
can be arbitrarily selected since it is excluded from the
specification of the experimental information.[3] Further-
more, the joint experimental information Ixy(1) provided
by a single measurement is zero. For a measurement which
yields multiple samples z1, . . . , zN that are mutually sep-
arated by several σ in both directions, the distributions
ψ(z, z1) = g(x− xi, σ)g(y− yi, σ) are nonoverlapping and
the first integral on the right of Eq. 26 can be approxi-
mated as
ψ(z, zi) log
ψ(z, zi)
≈ log(N)−
ψ(z, z1) logψ(z, z1) dxdy (27)
so that we get Ixy(N) ≈ log(N). If the distributions ψ(z, zi)
are overlapping but not concentrated at a single point, the
inequality 0 ≤ Ixy(N) ≤ log(N) holds generally. Similarly
as the entropy of information for a discrete random vari-
able, the experimental information describes how much
information is provided by N experiments performed by
an instrument that is not infinitely accurate.[6] In accor-
dance with these properties the experimental information
describes the complexity of experimental data in units of
information entropy, which are here nats.
When the distributions ψ(z, zi) are nonoverlapping, N
repeated experiments yield the maximal possible informa-
tion log(N). However, with an increasing number N , ever
more overlapping of distributions ψ(z, zi) takes place, and
therefore the experimental information Ixy(N) increases
more slowly than log(N). Consequently, the repetition of
joint measurements becomes on average ever more redun-
dant with an increasing number N . The difference
Rxy(N) = log(N)− Ixy(N) . (28)
thus represents the redundancy of repeated joint measure-
ments in N experiments. Since the overlapping of distri-
butions ψ(z, zi) increases with an increasing number of ex-
periments, the experimental information on average tends
to a constant value Ixy(∞), and along with this, the re-
dundancy increases with N .
The number
Kxy(N) = e
Ixy(N) (29)
describes how many nonoverlapping distributions are needed
to represent the experimental observation. With an in-
creasing N , the number Kxy(N) tends to a fixed value
Kxy(∞) that can be well estimated already from a finite
number of experiments. We could conjecture thatKxy(∞)
approximately determines a reasonable number of experi-
ments that provide sufficient data for an acceptable mod-
eling of the joint PDF. However, it is still better to de-
termine such a number from a properly introduced cost
function of the experimental observation. With this aim
we consider the difference Dxy(N) = Ixy(∞)− Ixy(N) as
the measure of the discrepancy between the experimen-
Igor Grabec: Extraction of physical laws from joint experimental data 9
tally observed and the true properties of the phenomenon.
An information cost function is then comprised of the re-
dundancy and the discrepancy measure:
Cxy(N) = Rxy(N) +Dxy(N). (30)
Since the redundancy on average increases, while the dis-
crepancy measure decreases with the number of measure-
ments N , we expect that the cost function Cxy(N) ex-
hibits a minimum at a certain number No, which could be
considered as an optimal one for the experimental model-
ing of a phenomenon. From the definition of redundancy
and the discrepancy measure we further obtain Cxy(N) =
Rxy(N)+Dxy(N) = log(N)−2Ixy(N)+Ixy(∞). Since the
last term is a constant for a given phenomenon, it is not
essential for the determination of No, and can be omitted
from the definition of the cost function. This yields a more
simple version
Cxy(N) = log(N)− 2Ixy(N), (31)
which is more convenient for application since it does not
include the limit value Ixy(∞). In a previous article [3]
we have proposed a cost function that is comprised from
the redundancy and the information measure of the dis-
crepancy between the hypothetical and experimentally ob-
served PDFs. However, such a definition is less convenient
than the present one, although the values of No deter-
mined from both cost functions do not differ essentially.
Numerical investigations also show that the optimal num-
ber No approximately corresponds to Kxy(∞) = eIxy(∞)
if the distribution of the data points is approximately uni-
form.
Although the experimental information of a vector vari-
able and its scalar components exhibits similar properties,
their values generally do not coincide since the overlapping
of distributions ψ(z, zi) generally differs from that of dis-
tributions ψ(x, xi) or ψ(y, yi). Therefore, the experimen-
tal information provided by joint measurements generally
differs from that provided by measurements of single com-
ponents.
2.5 Mutual information and determination of one
variable by the other
In order to describe the information corresponding to the
relation between variables x, y we introduce conditional
entropy. At a given value x we express the entropy per-
taining to the variable y by the conditional PDF as
Hy|x = −
f(y|x) log
(f(y|x)
dy (32)
If we express in Eq. (21) the joint PDF by the conditional
one f(z) = f(y|x)f(x) we obtain the following equation:
Hxy = Hy|x +Hx (33)
in which Hy|x denotes the average conditional entropy of
information
Hy|x = −
Hy|xf(x) dx. (34)
When we exchange the meaning of the variables we get
Hxy = Hx|y +Hy. (35)
Based on these equations and Eq. (26) we obtain the fol-
lowing relation between the joint and the conditional in-
10 Igor Grabec: Extraction of physical laws from joint experimental data
formation
Ixy = Hx|y +Hy −Hu −Hv
= Iy|x + Ix = Ix|y + Iy (36)
where the conditional information is defined by
Ix|y = Hx|y −Hu or Iy|x = Hy|x −Hv. (37)
When the components of the vector z are statistically
independent, the joint PDF is equal to the product of
marginal probabilities and the joint information is given
by the sum Ixy = Ix + Iy, which represents the maxi-
mal possible information that could be provided by joint
measurements. However, when x and y are not statisti-
cally independent, the joint information is less than the
maximal possible one: Ixy < Ix + Iy. The difference
Im = Ix + Iy − Ixy = Ix − Ix|y = Iy − Iy|x. (38)
can be interpreted as the experimental information that
a measurement of one variable provides about another one
and is consequently called the mutual information.[6,8,9,10]
In accordance with the previous interpretation of the re-
dundancy, it follows from the last two terms in Eq. (38)
that the mutual information also describes how redun-
dant on average is a measurement of the variable y at a
given x or vice versa. In accordance with the definition of
the redundancy of a certain number N of measurements
Rx(N) = log(N) − Ix, we further define also the mutual
redundancy of N joint measurements
Rm(N) = log(N)− Im(N) . (39)
If we then take into account all the definitions of the re-
dundancies and types of information, we obtain the for-
mula:
Rxy(N) = Rx(N) +Ry(N)−Rm(N) (40)
It should be pointed out that redundanciesRxy(N), Rx(N),
Ry(N), and Rm(N) generally increase with N , while the
corresponding experimental information tends to fixed val-
ues that correspond to the amount of data needed for pre-
senting related variables.
In order to describe quantitatively how well determined
the value of the variable y by the value of x is on aver-
age, we propose a relative measure of determination by
the ratio
Dy|x =
. (41)
If Dy|x > Dx|y, the value of the variable x better deter-
mines the value of y than vice versa. In this case the vari-
able x could be considered as more fundamental for the
description of the phenomenon, and consequently as an
independent one. In the case of functional dependence de-
scribed by a physical law y = yo(x), the relative measure
of determination is Dy|x = 1, while for the statistically
independent variables x and y it is Dy|x = 0.
The entropy of information is generally decreased if
the distribution of scattered experimental data at a given
x is compressed to the estimated physical law ŷ(x). The
corresponding information gain is in drastic contrast to
the information loss that is caused by the noise in a mea-
surement system.[11]
Igor Grabec: Extraction of physical laws from joint experimental data 11
3 Illustration of statistics
3.1 Data with a hidden law
The purpose of this section is to demonstrate graphically
the basic properties of the statistics introduced above. For
this purpose it is most convenient to generate data nu-
merically since in this case the relation between the vari-
ables x and y, as well as the properties of the scatter-
ing function ψ(z), can be simply set. For our demonstra-
tion we arbitrarily selected a third order polynomial law
yo(x) = [x(x − 5)(x + 10)]/100 and the Gaussian scatter-
ing function with standard deviation σ = 0.2. To simulate
the basic data set {xi, yi; i = 1, . . . , N}, we first calcu-
lated 50 sample values xi by summing two random terms
obtained from a generator with a uniform distribution in
the interval [−8,+8] and from a Gaussian generator hav-
ing the mean value 0 and standard deviation σ = 0.2.
The corresponding sample values yi were then calculated
as a sum of terms obtained from the selected law yo(xi)
and the same random Gaussian generator with a different
seed. The generated data {xi, yi; i = 1, . . . , 50} were used
as centers of scattering function when estimating the joint
PDF based on Eq. (2). An example of such PDF is shown
in Fig. 1, while the corresponding joint data of the basic
set are shown by points in the top curve of Fig. 2 together
with the underlying law yo(x).
The conditional average predictor, which corresponds
to the presented example, was modeled by inserting data
from the basic data set into Eq. (9). To demonstrate its
performance, we additionally generated a test data set by
N=50, σ=0.2 
Fig. 1. The joint PDF f(x, y) utilized to demonstrate the
properties of the conditional average predictor.
−10 −8 −6 −4 −2 0 2 4 6 8 10
TESTING OF CA PREDICTOR
σ = 0.2   N=50  Q = 0.977
Fig. 2. Testing of CA predictor. Curves representing the un-
derlying law and given data yo, y – (top), test and predicted
data yt, yp – (middle), and prediction error Er = yp − yt –
(bottom) are displaced in vertical direction for a better visu-
alization.
the same procedure as in the case of the basic data set, but
with different seeds of all the random generators. Using
the values xi,t of the test set, we then predicted the cor-
responding values ŷi by the modeled CA predictor. With
this procedure we simulated a situation that is normally
12 Igor Grabec: Extraction of physical laws from joint experimental data
met when a natural law is modeled and tested based upon
experimental data. The test and predicted data are shown
by the middle two curves in Fig. 2. From both data sets
the prediction error Er = ŷ − yt was calculated that is
presented by the bottom curve (..*..) in Fig. 2. The curve
representing the predicted data (–o–) is smoother than the
curve representing the original test data (..·..). This prop-
erty is a consequence of smoothing caused by estimating
the conditional mean value from various data included in
the modeled CA predictor. In spite of this smoothing, it is
obvious that the characteristic properties of the relation
between the variables x and y is approximately extracted
from the given data by the CA predictor. This further
means that the properties of the hidden law y = yo(x) can
be approximately described in the region where measured
data appear based on a finite number of joint samples.
The quality of estimation of the hidden law yo(x) de-
pends on the values and number N of statistical samples
utilized in Eq. (9) in the modeling of CA and its testing. To
demonstrate this property, we repeated the complete pro-
cedure three times, using various statistical data sets with
increasing N and determined the dependence of predic-
tor quality Q on N . The result is presented in Fig. 3. The
quality statistically fluctuates with the increasing N , but
the fluctuations are ever less pronounced, so that quality
determined from different data sets converges to a com-
mon limit value at a large N . In our example with σ = 0.2
the limit value is approximately Q = 0.98. With increas-
ing N , the curves corresponding to different data sets join
approximately at NCA ≈ 30. At a higher N the fluctua-
0 5 10 15 20 25 30 35 40 45 50
PREDICTOR QUALITY
σ = 0.2
Fig. 3. Dependence of predictor quality Q on number of sam-
ples N determined by various statistical data sets.
tions of Q are ever less expressive. We could conjecture
that about 30 data values are needed to model the CA
predictor in the presented case approximately.
The smaller the scattering width σ is, the higher gen-
erally the limit value of the predictor quality is, but on
average Q is still less than 1 if 1/σ and N are finite. This
property is in tune with the well–known fact that it is
impossible to determine exactly the law y = yo(x) from
joint data that are measured by an instrument which is
subject to output scattering due to inherent stochastic
disturbances.
The properties of the statistics that are formulated
based upon the entropy of information are demonstrated
for the case with σ = 0.2 in Fig. 4. It shows the depen-
dence of experimental information Ixy, mutual informa-
tion Im, redundancy Rxy, and cost function Cxy on the
number of samples N for three different sample sets. In
the same figure the maximal possible information, which
Igor Grabec: Extraction of physical laws from joint experimental data 13
0 10 20 30 40 50 60 70 80 90 100
log(N)
Ixy   
Im    
Rxy   
Cxy   
σ=0.2 
Fig. 4. Dependence of log(N), experimental information Ixy,
mutual information Im, redundancy Rxy, and cost function
Cxy on the number of samples N determined by various sta-
tistical data sets.
corresponds to the ideal case with no scattering, is also
presented by the curve log(N), since it represents the ba-
sis for defining the redundancy. Similarly as in the one–
dimensional case [3], the experimental information Ixy in
the two–dimensional case also converges with increasing
N to a fixed value. In the presented case the limit value
is Ixy(∞) ≈ 3.2, which yields the number K∞ ≈ 25. This
number is approximately equal to the ratio of standard
deviation of variable x and the scattering width σ and
describes how many uniformly distributed samples are
needed to represent the PDF of the data.[3] Due to the
convergence of experimental information to a fixed value,
the curve Ixy(N) starts to deviate from log(N) with the in-
creasingN . Consequently the redundancyRxy = log(N)−
Ixy(N) starts to increase, which further leads to the min-
imum of the cost function Cxy(N) = log(N) − 2Ixy(N).
0 10 20 30 40 50 60 70 80 90 100
log(N)
Ixy   
Ix    
Iy    
Im    
σ=0.2 
Fig. 5. Dependence of log(N), experimental information Ixy,
marginal informations Ix, Iy, and mutual information Im on
the number of samples N .
The minimum is not well pronounced due to statistical
variations, but it takes place at approximately No ≈ 30.
Not surprisingly, the optimal number No approximately
corresponds to K∞ and also to NCA.
Similarly as the joint experimental information Ixy, the
marginal experimental information Ix, Iy also converges
to fixed values with increasing N .[3] These statistics are
presented in Fig. 5 for the same data generator as applied
in the case of Fig. 4. The sample values of variable x take
place in a larger interval than those of variable y. Hence
there is less overlapping of scattering functions comprising
the marginal PDF of x and consequently Ix is larger than
Iy. It is also characteristic that Ixy is larger than Ix since
the data points in the joint span Sxy are more separated
than in the marginal span Sx. Since the mutual informa-
tion Im is defined as Im = Ix + Iy − Ixy, its properties
depend on both the marginal and the joint information,
14 Igor Grabec: Extraction of physical laws from joint experimental data
0 10 20 30 40 50 60 70 80 90 100
log(N)
Ixy   
Rxy   
Cxy   
σ=0.1 
σ=0.4 
σ=0.4 
σ=0.4 
σ=0.1 
σ=0.1 
Fig. 6. Dependence of log(N), experimental information Ixy,
redundancy Rxy, and cost function Cxy on the number of sam-
plesN determined from various data sets and scattering widths
and consequently Im converges more quickly to the limit
value than the experimental information Ixy.
To demonstrate the influence of scattering width on
the presented statistics the calculations were repeated with
σ = 0.1 and 0.4. The results are presented in Fig. 6. For
the sake of clear presentation, the curves representing the
mutual information Im are omitted. As could be expected,
the limit value of Ixy increases with decreasing σ. This
property is consistent with the well–known fact that more
information can be obtained by experimental observation
when using an instrument of higher accuracy that corre-
sponds to a lesser scattering width. In opposition to this,
the redundancy of measurement decreases, and along with
it, the optimal number No increases with the decreasing
scattering width.
0 10 20 30 40 50 60 70 80 90 100
Dx|yσ=0.2 
Fig. 7. Dependence of relative measure of determination Dy|x
– (top lines) and Dx|y – (bottom lines) on the number of sam-
ples N determined from various statistical data sets.
From the calculated mutual and marginal information,
the relative measures of determinationDy|x andDx|y were
further determined using various statistical data sets. The
results are presented in Fig. 7 for the case of scattering
width σ = 0.2. When the number of data N surpasses the
interval around the optimal number No, statistical varia-
tions of Dy|x and Dx|y become less pronounced and their
values settle close to limit ones. The limit value Dx|y is
essentially lower than Dy|x. This is the consequence of the
fact that in our case the variable y is uniquely determined
by the underlying law yo(x) based upon the variable x, but
not vice versa. In our case, there are three values of the
variable x corresponding to a value of y in a certain inter-
val. Consequently, y is better determined by a given x than
vice versa, which further yields Dy|x > Dx|y. Hence the
relative measure of determination indicates that variable x
Igor Grabec: Extraction of physical laws from joint experimental data 15
N=500, σ=0.2
random data 
Fig. 8. The joint PDF f(x, y) of N = 500 statistically inde-
pendent random data with σ = 0.2.
could be considered more fundamental for the description
of the relation between the variables x and y.
3.2 Data without a hidden law
To support the last conclusion let us examine an exam-
ple in which the sample values of the variables x and
y were calculated by two statistically independent ran-
dom generators. The corresponding joint PDF is shown
in Fig. 8, while the properties of the other statistics are
demonstrated by Figs. 9, 10 and 11.
The properties of the presented statistics could be un-
derstood, if the overlapping of scattering functions com-
prising the estimator of the joint PDF is examined. In
the previous case with the underlying law yo(x), the joint
data are distributed along the corresponding line where
−8 ≤ x ≤ +8, while in the last case, they take place in
the square region −8 ≤ x ≤ +8,−8 ≤ y ≤ +8. Conse-
quently, the number of samples with nonoverlapping scat-
tering functions in the last case is approximately L/σ = 16
0 50 100 150 200 250 300 350 400 450 500
log(N)
Ixy   
Im    
Rxy   
Cxy   
σ=0.2
random data 
Fig. 9. Dependence of log(N), experimental information Ixy,
redundancy Rxy, and cost function Cxy on the number of sam-
ples N determined by various statistical data sets and scatter-
ing widths σ.
0 50 100 150 200 250 300 350 400 450 500
log(N)
Ixy   
Ix    
Iy    
Im    
σ=0.2
random data 
Fig. 10. Dependence of log(N), experimental information Ixy,
marginal informations Ix, Iy, and mutual information Im on
the number of samples N in the case of statistically indepen-
dent random variables x, y.
16 Igor Grabec: Extraction of physical laws from joint experimental data
0 50 100 150 200 250 300 350 400 450 500
σ=0.2
random data 
Fig. 11. Dependence of relative measure of determinationDy|x
– (top lines) and Dx|y – (bottom lines) on the number of ran-
dom samples N in the case of statistically independent random
data with σ = 0.2.
times larger than in the previous case. In the last case
we can therefore expect the optimal number of samples
in the interval around Nro ≈ 16No = 480. Since in the
last case a larger region is covered by the joint PDF, the
overlapping of scattering functions is less probable than
previously, and therefore, the joint experimental informa-
tion Ixy deviates less quickly from the line log(N) with
the increasingN . Therefore, the redundancy increases less
quickly and the minimum of the cost function takes place
at a much higher number of Nro = 480, which corre-
sponds well to our estimation. Since in the last case the
experimental information Ixy converges less quickly to the
limit value than the marginal information Ix, Iy, the mu-
tual information Im first increases and later decreases to
its limit value. Related to this is the approach of rela-
tive measures of determination Dy|x, Dx|y to much lower
limit values as in the previous case. Since the marginal
information Ix, Iy is approximately equal, the curves rep-
resenting Dy|x, Dx|y join with increasing N , and there is
no argument to consider any variable as a more funda-
mental one for the description of the phenomenon under
examination. This conclusion is consistent with the fact
that the centers of the scattering functions are determined
by two statistically independent random generators. How-
ever, the limit values of the statistics Dy|x, Dx|y are not
equal to zero since the region −8 ≤ x ≤ +8,−8 ≤ y ≤ +8
where the data appear is limited, while the characteristic
region −σ ≤ x ≤ +σ,−σ ≤ y ≤ +σ covered by the joint
scattering function does not vanish.
4 Conclusions
Following the procedures proposed in the previous article
[3], we have shown how the joint PDF of a vector variable
z = (x, y) can be estimated nonparametrically based upon
measured data. For this purpose the inaccuracy of joint
measurements was considered by including the scattering
function in the estimator. It is essential that the properties
of the scattering function need not be a priori specified,
but could be determined experimentally based upon cali-
bration procedure. The joint PDF was then transformed
into the conditional PDF that provides for an extraction
of the law yo(x) that relates the measured variables x, y.
For this purpose the estimation by the conditional average
yo(x) ≈ E[y|x] is proposed. The quality of the prediction
by the conditional average is described in terms of the es-
timation error and the variance of the measured data. It
is outstanding that the quality exhibits a convergence to
Igor Grabec: Extraction of physical laws from joint experimental data 17
some limit value that represents the measure of applicabil-
ity of the proposed approach. Examination of the quality
convergence makes it feasible to estimate an appropriate
number of joint data needed for the modeling of the law.
It is important that the conditional average makes feasi-
ble a nonparametric autonomous extraction of underlying
law from the measured data.
Using the joint PDF estimator we have also defined
the experimental information, the redundancy of measure-
ment and the cost function of experimental exploration. It
is characteristic that experimental information converges
with an increasing number of joint samples to a certain
limit value which characterizes the number of nonoverlap-
ping scattering distributions in the estimator of the joint
PDF. The most essential terms of the cost function are
the experimental information and the redundancy. Dur-
ing cost minimization the experimental information pro-
vides for a proper adaptation of the joint PDF model to
the experimental data, while the redundancy prevents an
excessive growth of the number of experiments. By the
position of the cost function minimum we introduced the
optimal number of the data that is needed to represent the
phenomenon under exploration. This number roughly cor-
responds to the ratio between the magnitude of the charac-
teristic region where joint data appear and the magnitude
of the characteristic region covered by the joint scattering
function. It also corresponds to the appropriate number
estimated from the quality of prediction by the conditional
average. Based upon the experimental information corre-
sponding to the joint and marginal PDFs, the mutual in-
formation has been introduced and further utilized in the
definition of the relative measure of determination of one
variable by another. This statistic provides an argument
for considering one variable as a fundamental one for the
description of the phenomenon.
In this article we graphically present the properties of
the proposed statistics by two characteristic examples that
represent data related by a certain law and statistically
independent random data. The exhibited properties agree
well with the expectations given by experimental science.
The problems related to the extraction of laws represent-
ing relations such as y2 + x2 = 1 and the expression of
physical laws by differential equations or analytical mod-
eling were not considered. For this purpose the statistical
methods are developed in the fields of pattern recognition,
system identification and artificial intelligence.
Acknowledgment
The research was supported by the Ministry of Science
and Technology of Slovenia and EU COST.
References
1. I. Grabec and W. Sachse, Synergetics of Measurement, Pre-
diction and Control (Springer-Verlag, Berlin, 1997).
2. J. C. G. Lesurf, Information and Measurement (Institute of
Physics Publishing, Bristol, 2002)
3. I. Grabec, Experimental modeling of physical laws, Eur.
Phys. J., B, 22 129-135 (2001)
4. R. O. Duda and P. E. Hart, Pattern Classification and Scene
Analysis (J. Wiley and Sons, New York, 1973), Ch. 4.
5. E. Parzen, Ann. Math. Stat., 35 1065-1076 (1962).
18 Igor Grabec: Extraction of physical laws from joint experimental data
6. T. M. Cover and J. A. Thomas Elements of Information
Theory (John Wiley & Sons, New York, 1991).
7. A. N. Kolmogorov, IEEE Trans. Inf. Theory, IT-2 102-108
(1956).
8. B. S. Clarke, A. R. Barron, IEEE Trans. Inf. Theory, 36 (6)
453-471 (1990)
9. D. Haussler, M. Opper, Annals of Statistics, 25 (6) 2451-
2492 (1997)
10. D. Haussler, IEEE Trans. Inform. Theory, 43 (4) 1276-
1280 (1997)
11. C. E. Shannon, Bell. Syst. Tech. J., 27 379-423 (1948).
	Introduction
	Statistics of joint measurements
	Illustration of statistics
	Conclusions
ABSTRACT
  The extraction of a physical law y=yo(x) from joint experimental data about x
and y is treated. The joint, the marginal and the conditional probability
density functions (PDF) are expressed by given data over an estimator whose
kernel is the instrument scattering function. As an optimal estimator of yo(x)
the conditional average is proposed. The analysis of its properties is based
upon a new definition of prediction quality. The joint experimental information
and the redundancy of joint measurements are expressed by the relative entropy.
With the number of experiments the redundancy on average increases, while the
experimental information converges to a certain limit value. The difference
between this limit value and the experimental information at a finite number of
data represents the discrepancy between the experimentally determined and the
true properties of the phenomenon. The sum of the discrepancy measure and the
redundancy is utilized as a cost function. By its minimum a reasonable number
of data for the extraction of the law yo(x) is specified. The mutual
information is defined by the marginal and the conditional PDFs of the
variables. The ratio between mutual information and marginal information is
used to indicate which variable is the independent one. The properties of the
introduced statistics are demonstrated on deterministically and randomly
related variables.

<|endoftext|><|startoftext|>
Introduction
The problem of extending the Vlasov equation to systems in which pairing
correlations play an important role has been tackled some time ago by Di
Toro and Kolomietz [1] in a nuclear physics context and, more recently, by
Urban and Schuck [2] for trapped fermion droplets. These last authors derived
∗ Corresponding author
Email address: della@fi.infn.it (A. Dellafiore).
Preprint submitted to Elsevier 16 November 2021
http://arxiv.org/abs/0704.0152v2
the TDHFB equations for the Wigner transform of the normal density matrix
ρ and of the pair correlation function κ (plus their time-reversal conjugates)
and used them to study the dynamics of a spin-saturated trapped Fermi gas.
In the time-dependent theory one obtains a system of four coupled differential
equations for ρ, κ, and their conjugates [2] and, if one wants an analytical so-
lution, some approximation must be introduced. Here we try to find a solution
of the equations of motion derived by Urban and Schuck in the approximation
in which the pairing field ∆(r,p, t) is treated as a constant. It is well known
that such an approximation violates both particle-number-conservation and
gauge invariance (see e.g. sect. 8-5 of [3] and [4]), nonetheless we study it
because of its simplicity, with the aim of correcting the final results for its
shortcomings. Moreover, the constant-∆ approximation is not satisfactory for
describing long wavelength pairing modes in a large system. Such modes have
frequencies which are much less than the pairing frequency ∆/~ and for their
study it is essential to use a self consistent theory where the gap ∆ is related
to the pair density κ through the pairing interaction. The phases of ∆ and
κ are particularly important because they describe the collective superfluid
currents. On the other hand nuclei are small systems. Shell gaps are large
compared with ∆, or equivalently giant resonance frequencies are large com-
pared with the pairing frequency. The constant-∆ approximation is much more
reasonable in such systems.
In Sect. 2, the basic equations are recalled and reformulated in terms of the
even and odd components of the normal density ρ. In Sect. 3, the static limit
is studied by following the approach of [5] and the constant-∆ approximation
is introduced. In Sect. 4, the simplified dynamic equations resulting from the
constant-∆ approxmation are derived and their solutions are determined in lin-
ear approximation. In Sect. 5, these solutions are studied in a one-dimensional
model and the problem of particle-number conservation is examined in detail.
By studying the energy-weighted sum rule (in the Appendix), we find that the
constant-∆ approximation introduces some spurious strength into the density
response of the system. A simple prescription, based on the continuity equa-
tion, is proposed in order to eliminate the spurious strength. The resulting
strength function gives the same energy-weighted sum rule as for the uncor-
related systems. In Sect. 6, the general solution found in Sect. 4 is re-written
for spherical systems, where the angular integrations can be performed ex-
plicitly, leading to expressions containing only radial integrations. In Sect.
7, the collective response function of spherical nuclei is derived for a simple
multipole-multipole residual interaction. In Sect. 8, the quadrupole and oc-
tupole channels, that are the ones most affected by the pairing correlations
are shown explicitly. Finally, in Sect. 9 conclusions are drawn.
2 Basic equations
We assume that our system is saturated both in spin and isospin space and
do not distinguish between neutrons and protons, so we can use directly the
equations of motion of Urban and Schuck.
We start from the equations of motion derived in Ref. [2] for the Wigner-
transformed density matrices ρ = ρ(r,p, t) and κ = κ(r,p, t), with the warning
that the sign of κ that we are using agrees with that of Ref. [1], hence it is
opposite to that of [2]. Moreover we find convenient to use the odd and even
combinations of the normal density introduced in [2]:
ρev =
[ρ(r,p, t) + ρ(r,−p, t)] , (1)
ρod =
[ρ(r,p, t)− ρ(r,−p, t)] . (2)
Thus, the equations of motion given by Eqs.(15a...d) of Ref. [2] read
i~∂tρev = i~{h, ρod} − 2iIm[∆∗(r,p, t)κ] (3)
i~∂tρod = i~{h, ρev}+ i~Re{∆∗(r,p, t), κ} (4)
i~∂tκ=2(h− µ)κ−∆(r,p, t)(2ρev − 1) + i~{∆(r,p, t), ρod} . (5)
Here h is the Wigner-transformed Hartree–Fock hamiltonian h(r,p, t), while
∆(r,p, t) is the Wigner-transformed pairing field. Since the time-dependent
part of κ is complex, κ = κr+iκi, the last equation gives two separate equations
for the real and imaginary parts of κ.
Moreover, from the supplementary normalization condition ([6], p. 252)
R2 = R (6)
satisfied by the generalized density matrix R, the two following independent
equations are obtained:
ρodκ+ i
{ρev, κ}=0 , (7)
ρev(ρev − 1) + ρ2od + κκ∗=0 . (8)
We shall use the equations of motion (3–5), together with these equations, as
our starting point, but first we notice that, in the limit of no pairing, both ∆
and κ vanish, the third equation of motion reduces to a trivial identity, while
the first two give the Vlasov equation for normal systems, expressed in terms
of the even and odd components of ρ:
∂tρev = {h, ρod} , (9)
∂tρod = {h, ρev} . (10)
A solution of the linearized Vlasov equation for normal systems (i. e. without
pairing) has been obtained in Ref. [7] and our aim here is to study the changes
introduced by the pairing interaction in the solution of [7].
Moreover, before studying the time-dependent problem, it is useful to look at
the static limit.
3 Static limit
In this section we follow the approach of Ref. [5]. At equilibrium we have
ρev = ρ0(r,p) , (11)
ρod =0 , (12)
h=h0(r,p) , (13)
κ=κ0(r,p) (14)
∆0 =∆0(r,p) (15)
and equations (3–5) give
0=−2iIm(∆∗0κ0) , (16)
0= i~{h0, ρ0}+ i~Re{∆∗0, κ0} , (17)
0=2(h0 − µ)κ0 −∆0(2ρ0 − 1) , (18)
while Eqs. (7, 8) give
{ρ0, κ0}=0 , (19)
ρ0(ρ0 − 1) + |κ0|2=0 . (20)
Equation (16) is satisfied if we assume that ∆0 and κ0 are real quantities,
while Eqs. (18) and (20), taken as a system, have the solution:[5]
ρ0(r,p) =
h0(r,p)− µ
E(r,p)
κ0(r,p) = −
∆0(r,p)
2E(r,p)
, (22)
with the quasiparticle energy
E(r,p) =
∆20(r,p) + (h0(r,p)− µ)2 . (23)
It can be easily checked that Eqs. (21, 22) satisfy also Eqs. (17) and (19), that
{h0, ρ0}+ {∆0, κ0} = 0 (24)
{ρ0, κ0} = 0 . (25)
The (semi)classical equilibrium phase-space distribution is closely related to
ρ0(r,p):
f0(r,p) =
(2π~)3
ρ0(r,p) (26)
and the statistical factor 4 takes into account the fact that there are two kinds
of fermions.
The parametrer µ is determined by the condition
drdpf0(r,p) , (27)
where A is the number of particles. This integral should keep the same value
also out of equilibrium (global particle-number conservation).
3.1 Constant-∆ approximation
In a fully self-consistent approach, the pairing field ∆(r,p, t) is related to
κ(r,p, t), however here we introduce an approximation and replace the pairing
field of the HFB theory with the phenomenological pairing gap of nuclei, hence
in all our equations we put
∆(r,p, t) ≈ ∆0(r,p) ≈ ∆ = const. , (28)
with ∆ ≈ 1MeV.
In the constant-∆ approximation the equilibrium distributions become
ρ0(ǫ) =
1− ǫ− µ
, (29)
κ0(ǫ) =−
2E(ǫ)
and the quasiparticle energy
E(ǫ) =
∆2 + (ǫ− µ)2 , (31)
ǫ = h0(r,p) =
+ V0(r)
the particle energy in the equilibrium mean field.
In the following we shall use the relation:
κ0(ǫ) =
E2(ǫ)
dρ0(ǫ)
. (32)
4 Dynamic equations
Always in the approximation where ∆ is constant and real, the time-dependent
equations (3–5) become
i~∂tρev = i~{h, ρod} − 2i∆Im(κ) (33)
i~∂tρod = i~{h, ρev} (34)
i~∂tκ=2(h− µ)κ−∆(2ρev − 1) . (35)
This is the simplified set of equations that we want to study here. The sum of
the first two equations gives an equation that is similar to the Vlasov equation
of normal systems, only with the extra term −2i∆Im(κ). This extra term
couples the equation of motion of ρ with that of κ, thus, instead of a single
differential equation (Vlasov equation), now we have a system of two coupled
differential equations (for ρ and κi).
Our aim here is that of determining the effects of pairing on the linear response
of nuclei, thus we assume that our system is initially at equilibrium, with
densities given by Eqs. (29,30), and that at time t = 0 a weak external field
of the kind
δV ext(r, t) = βδ(t)Q(r) (36)
is applied to it. This simple time-dependence is sufficient to determine the
linear response of the system. In a self-consistent approach, we should take
into account also the changes of the mean field surrounding each particle
induced by the external force and consider a perturbing hamiltonian of the
δh = δV ext + δV int , (37)
however we start with the zero-order approximation
δh = δV ext (38)
and will consider collective effects in a second stage.
Since we want to solve Eqs. (33–35) in linear approximation, we consider small
fluctuations of the time-dependent quantities about their equilibrium values
and neglect terms that are of second order in the fluctuations. Hence, in Eqs.
(33–35) we put:
h=h0 + δh , (39)
ρev = ρ0 + δρev , (40)
ρod = δρod , (41)
κ=κ0 + δκ = κ0 + δκr + iδκi . (42)
Then, the linearized form of Eqs. (33–35) is
i~∂tδρev = i~{h0, δρod} − 2i∆δκi (43)
i~∂tδρod= i~{h0, δρev}+ i~{δh, ρ0} (44)
−~∂tδκi=2(ǫ− µ)δκr + 2κ0δh− 2∆δρev , (45)
~∂tδκr =2(ǫ− µ)δκi . (46)
Taking the sum of the first two equations gives
i~∂tδρ(r,p, t) = (47)
i~{h0, δρ(r,p, t)}+ i~{δh(r,p, t), ρ0} − 2i∆δκi(r,p, t) ,
which can be regarded as an extension of the linearized Vlasov equation stud-
ied in [7].
In order to make the comparison with [7] easier, from now on we change the
normalization of the phase-space densities and define
f(r,p, t)=
(2π~)3
ρ(r,p, t) (48)
χ(r,p, t)=
(2π~)3
κ(r,p, t) , (49)
moreover, we put
F (ǫ) =
(2π~)3
ρ0(ǫ) . (50)
In terms of the new functions Eqs. (47) and (45) read
i~∂tδf(r,p, t) = i~{h0, δf(r,p, t)}+ i~{δh(r,p, t), f0}
− 2i∆δχi(r,p, t) , (51)
−~∂tδχi(r,p, t) = 2(ǫ− µ)δχr(r,p, t) + 2χ0δh(r,p, t)
− 2∆δfev(r,p, t) . (52)
The function fev is given by the obvious extension of Eq. (1). In order to get
a closed system of equations, we still need an extra equation for δχr(r,p, t).
This can be obtained from the linearized form of the supplementary condition
(8) that reads
δρev(2ρ0 − 1) = −2κ0δκr , (53)
δκr(r,p, t) =
1− 2ρ0(ǫ)
δρev(r,p, t) = −
δρev(r,p, t) . (54)
The last expression has been obtained with the help of Eq. (18). In terms of
the new functions f and χ, the last equation reads
δχr(r,p, t) = −
δfev(r,p, t) . (55)
Equations (51, 52) and (55) are the set of coupled equations for the phase-
space densities that we have to solve.
Replacing Eq. (55) into Eq. (52), and using Eq. (30), gives the following system
of coupled differential equations:
∂tδf(r,p, t)= {h0, δf}+ {δh, f0} − 2
δχi(r,p, t) , (56)
∂tδχi(r,p, t)=
E2(ǫ)
[δf(r,p, t) + δf(r,−p, t)]− 2
δh(r,p, t) . (57)
Taking the Fourier transform in time, gives
−iωδf(r,p, ω)= {h0, δf}+ {δh, f0} − 2
δχi(r,p, ω) , (58)
−iωδχi(r,p, ω))=
E2(ǫ)
[δf(r,p, ω) + δf(r,−p, ω)]
− 2χ0
δh(r,p, ω) , (59)
or (for ω 6= 0)
−iωδf(r,p, ω) + {δf, h0}=−iωd2
[δf(r,p, ω) + δf(r,−p, ω)]
+F ′(ǫ)[{δh, h0}+ iωd2δh] , (60)
(Ω(ǫ)
Ω(ǫ) = 2
. (62)
This frequency plays a crucial role in our approach, its minimum value is
2∆/~.
In Eq. (60) we have used the relation {f0, δh} = F ′(ǫ){ho, δh} as well as Eq.
(32).
By comparing Eq. (60) with the analogous equation for normal systems
−iωδf(r,p, ω) + {δf, h0} = F ′(h0){δh, h0} , (63)
we can see that the only effect of pairing in the constant-∆ approximation is
that of adding the terms proportional to d2.
The normal Vlasov equation (63) can be solved in a very compact way by using
the method of action-angle variables [8], [7]. In that approach one expands
δhn(I)e
in·Φ , (64)
where I and Φ are the action and angle variables, respectively. Moreover
δf(r,p, ω) =
δfn(I, ω)e
in·Φ (65)
{δf, h0} =
i(n · ~ω)δfn(I, ω)ein·Φ , (66)
where the vector ~ω has components
. (67)
Then Eq.(63) gives
δfn(I, ω) =
(n · ~ω)
(n · ~ω)− (ω + iε)
F ′(ǫ)δhn(I) . (68)
The (zero-order) eigenfrequencies of the (normal) physical system are
ωn = n · ~ω . (69)
Here want to use the same method to solve the more complicated equation
(60). Since that equation contains also the function δf(r,−p, ω), we need also
the analogous equation for this other quantity:
−iωδf(r,−p, ω)− {δf, h0}r,p=−iωd2
[δf(r,p, ω) + δf(r,−p, ω)]
+F ′(ǫ)[−{δh, h0}r,p + iωd2δh] . (70)
By expanding δf(r,p, ω) and δf(r,−p, ω) as
δf(r,±p, ω) =
δf±n (I, ω)e
in·Φ , (71)
Eqs. (60) and (70) give
[−ω(1− d
) + ωn]δf
n + ω
δf−n = F
′(ǫ)[ωn + ωd
2]δhn , (72)
]δf+n + [−ω(1−
)− ωn]δf−n = F ′(ǫ)[−ωn + ωd2]δhn , (73)
which is a system of two coupled algebraic equations for the coefficients δf+n
and δf−n . Its solution is
δf+n =
ω̄2n + ωωn
ω̄2n − ω2
F ′(ǫ)δhn , (74)
δf−n =
ω̄2n − ωωn
ω̄2n − ω2
F ′(ǫ)δhn , (75)
where
ω̄2n = ω
n + Ω
2(ǫ) (76)
are the (squared) eigenfrequencies of the correlated system. These eigenfre-
quencies are in agreement with the enegy spectrum of a superfluid infinite ho-
mogeneous Fermi gas (see e. g. Sect. 39 of [10]) and they lead to a low-energy
gap of 2∆ in the excitation spectrum of the correlated systems. However,
as anticipated, we expect problems with particle-number conservation. These
problems are better discussed in one dimension, where formulae are simpler.
5 One-dimensional systems and particle-number conservation
In one dimension, Eq.(60) reads
−iωδf(x, p, ω) + ẋ∂xδf −
dV0(x)
∂pδf = (77)
−iωd2 1
[δf(x, p, ω) + δf(x,−p, ω)] + F ′(ǫ)( p
∂xδh + iωd
2δh) .
In zero-order approximation δh(x, p, ω) = βQ(x), moreover, in one dimension
F ′(ǫ) =
. (78)
The vectors ~ω and Φ have only one component:
ω0(ǫ) =
T (ǫ)
Φ(x) = ω0τ(x) , (80)
τ(x) =
v(ǫ, x′)
, (81)
v(ǫ, x) =
[ǫ− V0(x)] . (82)
The time T (ǫ) is the period of the bound motion of particles with enegy ǫ
in the equilibrium potential well V0(x): T = 2τ(x2). The points x1,2 are the
classical turning points for the same particles. Instead of the action variable
I(ǫ) = 1
dxp(ǫ, x), it is more convenient to use the particle energy ǫ as
constant of motion. As pointed out in [7], the range of values of τ can be
extended to the whole interval (0, T ), by defining
τ(x) =
v(ǫ, x′)
when τ > T
. With this extension, the angle variable Φ(x) takes values between
0 and 2π, as it should.
In one dimension, Eqs. (74, 75) give
δf±n (ǫ, ω) =
ω̄2n ± ω ωn
ω̄2n − ω2
F ′(ǫ)δhn , (84)
δhn = βQn
v(ǫ, x)
e−iωnτ(x)
v(ǫ, x)
cos[ωnτ(x)] , (85)
The frequencies ωn are the eigenfrequencies of the uncorrelated system:
ωn = nω0 , (86)
while ω̄n are the new eigenfrequencies modified by the pairing correlations:
ω̄n = ±
ω2n + Ω
2(ǫ) . (87)
Note that, since δh−n = δhn, then δf
n = δf
By using the solutions (84), we can also obtain an expansion for the even and
odd parts of δf :
δfev(x, ǫ, ω) =
An(ω) cosnω0τ(x) , (88)
δfod(x, ǫ, ω) =
Bn(ω) sinnω0τ(x) , (89)
An(ω) =
ω̄2n − ω2
F ′(ǫ)δh′n , (90)
Bn(ω) = iω
ω̄2n − ω2
F ′(ǫ)δh′n (91)
δh′n =2δhn , n 6= 0 , (92)
= δhn , n = 0 . (93)
Note that, while Bn=0(ω) = 0, we have An=0(ω) 6= 0, and this fact leads to
an unphysical fluctuation of the number of particles, induced by the applied
external field. These fluctuations are given by
δA(ω) =
dxδ̺(x, ω) , (94)
where δ̺(x, ω) is the density fluctuation at point x:
δ̺(x, ω) =
dp δf(x, p, ω) = 2
v(ǫ, x)
δfev(x, ǫ, ω) . (95)
Equation (88) gives
δA(ω) = 2
An(ω)
dτ cos nω0τ . (96)
Since the integrals
∫ T/2
0 dτ cosnω0τ vanish when n 6= 0, the term with n = 0
is the only one contributing to this sum, thus givig an unphysical fluctuation
of the number of particles. This problem could be solved simply by excluding
the mode n = 0 from the sum in Eq. (88), however this would not be sufficient
to solve all problems with particle-number conservation, since we can easily
check that the solutions (88, 89) do not satisfy the continuity equation
iω̺(x, ω) = ∂xj(x, ω) . (97)
The density fluctuation involves only the even part of δf , while the current
density j(x, ω) involves only the odd part:
j(x, ω) =
δf(x, p, ω) = 2
dǫδfod(x, ǫ, ω) . (98)
The fact that the continuity equation is violated is a very serious shortcom-
ing of the constant-∆ approximation. However, since we have seen that this
approximation leads to very simple equations and to rather satisfactory ex-
pressions for the eigenfrequencies of the correlated systems, we still use it,
but with the following prescription: when calculating the longitudinal response
function, the density fluctuations should be evaluated by using Eq. (97), instead
of Eq, (95). Then, the density fluctuations (95) should be replaced by
δ ¯̺(x, ω) =
dǫ∂xδfod(x, ǫ, ω) . (99)
In practice we are proposing to evaluate the longitudinal response function
in terms of the transverse response function. It is well known that also the
more familiar BCS approximation gives a more accurate description of the
transverse response (see e.g. sect. 8-5 of [3]). In the Appendix we show that the
longitudinal response function resulting from the present prescription satisfies
the same energy-weighted sum rule as the uncorrelated response function. This
would not necessarily happen if, instead of changing only the even part of δf ,
we had modified also its odd part.
It is interesting to see how the solutions (84) are changed by our prescription.
By using Eq. (89) forfod, Eq. (99) gives
δ ¯̺(x, ω) = 2
v(ǫ, x)
δf̄ev(x, ǫ, ω) , (100)
δf̄ev(x, ǫ, ω) =
Ān(ω) cosnω0τ(x) , (101)
Ān(ω) =
Bn(ω) , (102)
(note that Ān=0(ω) = 0). Then
δf̄(x,±p, ω) = δf̄ev(x, ǫ, ω)± δfod(x, ǫ, ω) (103)
and Eq. (84) is replaced by
δf̄±n (ǫ, ω) =
ω2n ± ω ωn
ω̄2n − ω2
F ′(ǫ)δhn . (104)
By comparing this expression to Eq. (84), we can see that the fluctuations
of the phase-space density given by the constant-∆ approximation contain an
extra contribution that we identify as spurious:
δf±n (ǫ, ω) = δf̄
n (ǫ, ω) + δf
n (ǫ, ω) , (105)
δf spurn (ǫ, ω) =
Ω2(ǫ)
ω̄2n − ω2
F ′(ǫ)δhn . (106)
The spurious character of δf spurn is suggested also by sum-rule arguments
(see Appendix). The term f spurn (ǫ, ω) contributes to all modes of the density
strength function: the contribution to the mode n = 0 gives a fluctuation of
the particle-number integral (global paticle-number violation), while the other
modes give a spurious contribution to the density strength function, increas-
ing the sum rule and violating the continuity equation (local particle-number
violation). Note that the spurious contribution (106) affects only the even part
of the pase-space density, not the odd part.
6 Spherical Systems
The method of action-angle variables gives a very compact solution of the lin-
earized Vlasov equation both in the uncorrelated and correlated cases, however
it may be useful to make a connection between the results given by this method
and the more explicit treatment of spherical nuclei given in [7]. For uncorre-
lated system this has been done in [11]. Here we follow that approach in order
to derive useful expressions for correlated spherical systems. The components
of the vecor n are (n1, n2, n3), the first point to notice is that, because of the
degeneracy associated with any central-force field, the vector ~ω has only two
non vanishing components:
~ω = (0, ωϕ(ǫ, λ), ω0(ǫ, λ)) . (107)
With λ we denote the magnitude of the particle angular momentum. According
to Eq. (69), the eigenfrequencies of the uncorrelated system are [7]
ωn = ωn3,n2(ǫ, λ) = n3ω0 + n2ωϕ , (108)
while Eq. (76) gives the correlated eigenfrequencies
ω̄n = ω̄n3,n2 = ±
ω2n3,n2 + Ω
2(ǫ) . (109)
In three dimensions, the Fourier coefficients analogous to (85) are
Qn(I) =
(2π)3
dΦe−in·ΦQ(r) . (110)
The external field Q(r) can be expanded in partial waves as
Q(r) =
QL(r)YLM(r̂) , (111)
giving
Qn(I) =
Q(LM)n , (112)
with [11]
Q(LM)n =
dLMN(β
′)δM,n1δN,n2Q
. (113)
By using this last equation (and changing n3 → n), the expansion (71) be-
comes
δf(r,±p, ω) =
(114)
δfL±nN (ǫ, λ, ω)e
iφnN (r)
DLMN(α, β ′, γ)
δfL±nN (ǫ, λ, ω) =
ω̄2nN ± ωωnN
ω̄2nN − ω2
βF ′(ǫ)QLnN (115)
and QLnN the semiclassical limit of the radial matrix elements:
QLnN =
vr(r)
e−iφnN (r)QL(r) ,
vr(r)
cos[φnN(r)]QL(r) . (116)
Here T is the period of radial motion, vr(r) the radial velocity
vr(r) =
ǫ− V0(r)−
(117)
and the phases φnN(r) are given by
φnN(r) = ωnNτ(r)−Nγ(r) , (118)
where
τ(r) =
vr(r′)
(119)
γ(r) =
vr(r′)
. (120)
The frequencies ω0 and ωϕ are given by
τ(r2)
, (121)
γ(r2)
τ(r2)
. (122)
The Wigner rotation matrix elements in Eq. (114) are given by [9]
DLMN(α, β ′, γ) = e−iMαdLMN(β ′)e−iNγ , (123)
where (α, β ′, γ) are the Euler angles introduced in [7].
On the basis of the discussion in Sect. (5), we expect that the solution (114)
will contain some spurious strength introduced by the constant-∆ approxima-
tion. In order to eliminate the spurious contributions, we should replace the
coefficients (115) with
δf̄L±nN (ǫ, λ, ω) =
ω2nN ± ωωnN
ω̄2nN − ω2
βF ′(ǫ)QLnN . (124)
These modified coefficients allow us to obtain the modified zero-order propa-
gator
D̄0L(r, r
′, ω) =
dǫF ′(ǫ)
d̄LnN(r, r
ω − ω̄nN + iε
, (125)
d̄LnN(r, r
′) = (126)
(4π)2
2L+ 1
|YLN(
(−2ωnN
)(ωnN
)cosφnN(r)
r2vr(r)
cosφnN(r
r′2vr(r′)
and the corresponding response and strength functions:
R̄0L(ω)=
drdr′r2QL(r)D̄
L(r, r
′, ω)r′2QL(r
′) , (127)
S̄0L(ω)=−
ImR̄0L(ω) . (128)
For multipole response: QL(r) = r
For normal systems, the zero-order propagator D0L(r, r
′, ω) is given by Eqs.
(125) and(126) where ω̄nN is replaced by ωnN and F
′(ǫ) is proportional to a
δ-function[7].
7 Collective response
Up to now, we have been concerned only with the zero-order approximation,
which corresponds to the single-particle approximation of the quantum ap-
proach. In this approximation, the perturbing part of the hamiltonian is given
only by the external field, while a more consistent approach would require
taking into account also the mean-field fluctuation induced by the external
force, so that
δh = δV ext(r, ω) + δV int(r, ω) . (129)
In the Hartree approximation,
δV int(r, ω) =
dr′v(r− r′)δ̺(r′, ω) . (130)
where v(r− r′)is the (long-range) interaction between constituents.
For consistency, we take
δV int(r, ω) =
dr′v(r− r′)δ ¯̺(r′, ω) , (131)
then the collective propagator for correlated systems satisfies the same kind
of integral equation as for normal systems [7]:
D̄L(r, r
′, ω) = (132)
D̄0L(r, r
′, ω) +
dyy2D̄0L(r, x, ω)vL(x, y)D̄L(y, r
′, ω) .
Here vL(x, y) is the partial-wave component of the interaction between par-
ticles. We assume that this interaction can be approximated by a separable
form of the kind
vL(x, y) = κLx
LyL , (133)
where κL is a parameter that determines the strength of the interaction. Then,
the integral equation (132) gives an algebraic equation for the collective cor-
related response function
R̄L(ω) =
drr2rL
dr′r′2r′LD̄L(r, r
′, ω)
leading to the expression
R̄L(ω) =
R̄0L(ω)
1− κLR̄0L(ω)
. (134)
8 Results
Here we compare the multipole strength functions given by our simplified
model of pairing correlations with that of the corresponding uncorrelated sys-
tem. This comparison is made for the quadrupole and octupole strength func-
tions, since these channels are the ones that are most affected.
The static nuclear mean field is approximated with a spherical cavity of radius
R = 1.2A
3 fm and the A nucleons are treated on the same footing, i. e., we do
not distinguish between neutrons and protons. Moreover, we chose A = 208
for ease of comparison with previous calculations of uncorrelated response
functions [12,13]. Shell effects are not included in our semiclassical picture and
the results shown below should be considered as an indication of the qualitative
effects to be expected in heavy nuclei. For the uncorrelated calculations, the
Fermi energy is determined by the parametrization chosen for the radius as
ǫF ≈ 33.33 MeV, while for the correlated case, the parameter µ is determined
by the condition (27); with the value of ∆ = 1 MeV used here, the value of µ
is practically coincident with that of ǫF , so we have used µ = ǫF = 33.33 MeV
in the calculations below. Moreover, the small parameter ε appearing in Eq.
(125) has been given the value ε = 0.1 MeV. This value is chosen to simplify
the evaluation of the response function by smoothing out discontinuities in its
dependence on ω.
In the evaluation of the collective response, the value of parameters κL is
the same as in [12,13], that is: κ2 = −1 × 10−3 MeV/fm4 and κ3 = −2 ×
10−5 MeV/fm6.
8.1 Quadrupole response
Figure 1 show the longitudinal quadrupole strength function evaluated in the
zero-order approximation (corresponding to the quantum single-particle ap-
proximation). The dashed curve shows the uncorrelated response evaluated
according to the theory of [7], while the full curve shows the result of the
present correlated calculation. As we can see the effect of pairing correlations
on this zero-order strength function is rather small, however, since pairing af-
fects also the real part of the zero-order response function,in Fig. 2 we plot
0 5 10 15 20 25 30
ÑΩ @MeVD
Fig. 1. Quadrupole stength function in zero-order approximation. The dashed curve
gives the response of a normal system of A = 208 nucleons contained in a spherical
cavity, while the solid curve includes the effects of pairing correlations in constant-∆
approximation.
0 5 10 15 20 25 30
ÑΩ @MeVD
Fig. 2. Collective quadrupole strength function showing the giant quadrupole res-
onance. The solid curve involves also pair correlations, the dashed curve has no
pairing.
also the collective strength function given by Eq. (134). Again, the effect of
pairing correlations is very small, in agreement with the results of [1].
0 1 2 3 4 5
ÑΩ @MeVD
0.0025
0.005
0.0075
0.0125
0.015
0.0175
Fig. 3. Same as Fig.1, at low excitation energy.
The main difference between the uncorrelated and correlated responses occurs
at small excitation energy, Fig. 3 shows a detail of Fig. 1 at low excitation
energy. The correlated strength function displays a gap of about 2 ∆ , the
very small strength extending below 2 MeV is entirely due to the finite value
of the small parameter ε used in the numerical evaluation of the propagator
(125).
8.2 Octupole response
Figures 4 and 5 show the zero-order and collective octupole strength functions,
both correlated and uncorrelated. As we can see, in this case too the effect is
rather small.
0 5 10 15 20 25 30
ÑΩ @MeVD
Fig. 4. Zero-order octupole strength function. The solid curve involves pair correla-
tions, the dashed curve has no pairing.
0 5 10 15 20 25 30
ÑΩ @MeVD
Fig. 5. Collective octupole strength function. The solid curve involves pair correla-
tions, the dashed curve has no pairing.
9 Conclusions
The solutions of the semiclassical time-dependent Hartree–Fock–Bogoliubov
equations have been studied in a simplified model in which the pairing field
∆(r,p, t) is treated as a constant phenomenological parameter. Such an ap-
proximation is known to violate some important constraints, like global (particle-
number integral) and local (continuity equation) particle-number conserva-
tion. In a linearized approach , we have shown that the global particle-number
violation is related only to one particular mode of the density fluctuations,
while the violation of the continuity equation gives a spurious contribution
to all modes of the density response. Both global and local particle-number
conservation can be restored by introducing a new density fluctuation that is
related to the current density by the continuity equation. This prescription
changes the strength associated with the various eigenmodes of the density
fluctuations, but not the eigenfrequencies of the system. We have shown in a
one-dimensional model that the energy-weighted sum rule calculated accord-
ing to this prescription has exactly the same value as for normal, uncorrelated
systems, thus we conclude that our prescription eliminates all the spurious
strength introduced by the constant-∆ approximation.
In a simplified model of nuclei, the effects of pairing correlations on the
isoscalar strength functions has been studied in detail for the quadrupole and
octupole channels, in the region of giant resonances. In both cases the effects
of pairing are rather small. More sizable effects are found at lower excitation
energy, in the region of surface modes which have not been included in the
present model, but will certainly be more affected by pairing correlations.
Appendix
In this Appendix we show that the correlated zero-order response function
given by the modified density fluctuation (99) satisfies the same energy-weighted
sum rule (EWSR) as the uncorrelated response function. We assume that par-
ticles move in one-dimensional square-well potential, so that formulae become
simpler because the particle velocity does not depend on position: v(ǫ, x) =
v(ǫ) =
Uncorrelated sum rule
The uncorrelated propagator [7]
D0(x, x′, ω) = (135)
dǫF ′(ǫ)
−2nω0
cos[nω0τ(x)]
v(ǫ, x)
ω − nω0(ǫ) + iε
cos[nω0τ(x
v(ǫ, x′)
gives the uncorrelated strength function S0(ω) = − 1
dxdx′Q(x)D0(x, x′, ω)Q(x′)
and the first moment
dωωS0(ω)
dǫF ′(ǫ)
(−2nω0(ǫ)
T (ǫ)
)(T (ǫ)
Q2nnω0(ǫ). (136)
Correlated sum rule
The modified density fluctuation (99) allows us to evaluate the correlated
propagator D̄0 through the relation
δ ¯̺(x, ω) = β
dx′D̄0(x, x′, ω)Q(x′) , (137)
giving
D̄0(x, x′, ω) = (138)
dǫF ′(ǫ)
−2nω0
)cosnω0τ(x)
v(ǫ, x)
ω − ω̄n + iε
cosnω0τ(x
v(ǫ, x′)
and the correlated first moment
M̄1 = 2
dǫF ′(ǫ)
(−2nω0(ǫ)
T (ǫ)
)(T (ǫ)
(nω0(ǫ)
ω̄n(ǫ)
ω̄n(ǫ) . (139)
The only difference between this expression and Eq. (136) is in the form of
F ′(ǫ), which is proportional to a δ-function in (136), while it is smoother in
the correlated case, however, if the parameter µ is determined by the one-
dimensional version of Eq. (27), then it can be easly found that, for a square-
well mean field,
M̄1 = M1 . (140)
The detailed argument goes as follows: both for correlated and uncorrelated
systems, the number of particles is given by
dxdpF (ǫ) =
dǫT (ǫ)F (ǫ) , (141)
with F (ǫ) = 4
θ(ǫF − ǫ) for uncorrelated fermions and F (ǫ) = 42π~ρ0(ǫ) in the
correlated case, while the moments (136, 139) are given by
dǫF ′(ǫ)G(ǫ) , (142)
G(ǫ) = −
(2π)2
T (ǫ)
Q2n (143)
in both cases, F ′(ǫ) obviously differs in the two cases.
Integrating by parts the last expression in (141), gives
A = J(ǫ)F (ǫ)
F ′(ǫ)J(ǫ) , (144)
J(ǫ) =
dǫT (ǫ) . (145)
For a square-well potential of size L:
T (ǫ) =
, (146)
J(ǫ) =
2mL 2
ǫ . (147)
Since
J(ǫ)F (ǫ) = lim
J(ǫ)F (ǫ) = 0 , (148)
both for the correlated and uncorrelated distributions, we have
A = −
dǫF ′(ǫ)J(ǫ) . (149)
The explicit expressions of J(ǫ) and T (ǫ) give
A=−2L
dǫF ′(ǫ)
ǫ , (150)
(2π)2√
dǫF ′(ǫ)
ǫ (151)
for both distributions. From these relations follows that, for a square-well mean
field, the relation (140) is exact. If we had used the density fluctuation (95),
instead of (99), to evaluate the correlated propagator, we would have obtained
a different value of the first moment because the additional term (106) in
the phase-space density gives an extra contribution to the density response
function and hence to the EWSR. Because of the fundamental character of
the EWSR, as well as of the continuity equation, we conclude that this term
is a spurious contribution generated by the constant-∆ approximation.
References
[1] M. Di Toro and V.M. Kolomietz, Zeit. Phys. A-Atomic Nuclei 328 (1987) 285
[2] M. Urban and P. Schuck, Phys. Rev. A 73 (2006) 013621
[3] J.R. Schrieffer, Theory of superconductivity, (W.A. Benjamin, Inc., New York,
1964)
[4] R. Combescot, M. Yu. Kagan, S. Stringari, Phys. Rev. A 74, (2006) 042717
[5] R. Bengtsson and P. Schuck, Phys. Lett. 89B (1980) 321
[6] P. Ring and P. Schuck, The Nuclear Many-Body Problem, (Springer, New
York, 1980)
[7] D.M. Brink, A. Dellafiore, M. Di Toro, Nucl. Phys. A456 (1986) 205
[8] V.L. Polyachenko, I.G. Schuckman, Sov. Astron. 25 (1981) 533
[9] D.M. Brink and G.R. Satchler, Angular Momentum (Oxford University Press,
Oxford,U.K., 1968), p. 147
[10] E.M. Lifshitz and L.P. Pitaevskii, Statistical Physics, Part 2 (Pergamon Press,
Oxford, 1980)
[11] A. Dellafiore, F. Matera, D.M. Brink, Phys. Rev. A 51 (1995) 914
[12] V.I. Abrosimov, A. Dellafiore, F. Matera, Nucl. Phys. A697 (2002) 748
[13] V.I. Abrosimov, O.I. Davidovskaya, A. Dellafiore, F. Matera, Nucl. Phys. A727
(2003) 748220
ABSTRACT
  The solutions of the Wigner-transformed time-dependent
Hartree--Fock--Bogoliubov equations are studied in the constant-$\Delta$
approximation. This approximation is known to violate particle-number
conservation. As a consequence, the density fluctuation and the longitudinal
response function given by this approximation contain spurious contributions. A
simple prescription for restoring both local and global particle-number
conservation is proposed. Explicit expressions for the eigenfrequencies of the
correlated systems and for the density response function are derived and it is
shown that the semiclassical analogous of the quantum single--particle spectrum
has an excitation gap of $2\Delta$, in agreement with the quantum result. The
collective response is studied for a simplified form of the residual
interaction.

<|endoftext|><|startoftext|>
Introduction 
The differential equation  
=                                                         (1.1) 
has a unique solution. The corresponding finite difference equation has more solutions1. 
When the function represents a harmonic oscillator, different solutions will contribute to 
oscillator energy in different ways. We intend to study these contributions and compare 
them to the corresponding quantum mechanical values.     
2. Oscillator Finite Difference Equation   
Classical simple harmonic oscillator function f (with angular speed w) satisfies 
differential equation (1.1) 
To exploit its symmetry properties we replace the above differential equation by the 
corresponding symmetric finite difference equation2   
± = giW
),( δ
                                                   (2.1)  
where  
),(),(
),( −−+
= ±±±
twgtwg
                                    (2.2) 
The above difference quotient has the following symmetry under the change δδ −→  
),(),( δδ tD
Dg ±± =
                                                (2.3) 
We require that at least one of the solutions, +g , of (2.1) should go over to (1.1) in the 
limit 0→δ  
⎯⎯→⎯=
0)(),( δδ
=                              (2.4) 
With  
fg ⎯⎯→⎯
→+ 0δ  and  wW ⎯⎯→⎯ →0δ                                      (2.5) 
3. Reciprocal Symmetry 
Let ±g be of the form 
δ/)( tag ±± ±= so that 
),()1(),( / twgtwg t −+ −=
δ                                            (3.1) 
Consider equation (2.1) 
),(),( −−+ ++ twgtwg ),(. twgiW +=                                (3.2) 
Using (3.1) we find that −g also satisfies the equation. This establishes reciprocal 
symmetry of (2.1), that the equation remains invariant under transformation (3.1).   
4. Reciprocal symmetric Solutions 
(2.1) has a pair of solutions 
( )iwt
±=± exp)1(
sin(1
sin(1
)2/( /
                    (4.1) 
+g and −g  satisfy (2.1) with  
δ )sin(w
W =                                                      (4.2) 
We may write 
).exp()exp(
exp tiwiwti
g ++ =⎟
                             (4.3) 
).exp()exp(
exp tiwiwti
g −− =−⎟
                         (4.4) 
where  
wywnw +=+= ++ δπ /)2(                                        (4.5) 
wywnw −=−+= −− δπ /)12(                                      (4.6) 
5. Classical and Half-Integral Energy Levels 
The energy of the oscillator is proportional to 
22222 }/)2{(2}/)2{(2)( wwnnwwyyw ++=++= +++ δπδπ               (5.1) 
22222 }/)2/1{(4}/)12{(2)( wwnnwwyyw ++++=++= −+− δπδπ          (5.2) 
For n=0 (5.1) gives the classical value. The middle term of (5.2) is a product of half-
integers and w. To this extent it corresponds to quantum mechanical value.  
4. Conclusion 
We have replaced oscillator differential equation by the corresponding symmetric 
discrete equation (2.1). This has brought to surface important parts of oscillator function, 
which were lost in the conventional solution. These parts contain discrete – integral and 
half integral -- energy levels.
                                                                                                                                                                             
1 Mushfiq Ahmad. Reciprocal Symmetry and Equivalence between Relativistic and 
Quantum Mechanical Concepts.  http://www.arxiv.org/abs/math-ph/0611024 
2  Mushfiq AhmadReciprocal Symmetric and Origin of Quantum Statistics.  
http://www.arxiv.org/abs/physics/0703194
ABSTRACT
  Classical oscillator differential equation is replaced by the corresponding
(finite time) difference equation. The equation is, then, symmetrized so that
it remains invariant under the change d going to -d, where d is the smallest
span of time. This symmetric equation has solutions, which come in reciprocally
related pairs. One member of a pair agrees with the classical solution and the
other is an oscillating solution and does not converge to a limit as d goes to
0. This solution contributes to oscillator energy a term which is a multiple of
half-integers.

<|endoftext|><|startoftext|>
Introduction
The study of in-medium properties of hadrons has attracted quite some inter-
est among experimentalists and theorists alike because of a possible connection with
chiral symmetry restoration in hot and/or dense matter. Experiments using ultrarel-
ativistic heavy ions reach not only very high densities, but connected with that also
very high temperatures. In their dynamical evolution they run through various –
physically quite different – states, from an initial high-nonequilibrium stage through
a very hot stage of – possibly - a new state of matter (QGP) to an equilibrated ’clas-
sical’ hadronic stage at moderate densities and temperatures. Any observed signal
necessarily represents a time-integral over all these physically quite distinct states of
matter. On the contrary, in experiments with microscopic probes on cold nuclei one
tests interactions with nuclear matter in a well-known state, close to cold equilib-
rium. Even though the density probed is always smaller than the nuclear saturation
density, the expected signals are as large as those from ultrarelativistic heavy-ion
collisions.1), 2)
In this talk we discuss as an example the theoretical situation concerning the
ω meson in medium and use it to point out various essential points both in the
theoretical framework as well as in the interpretation of data (for further refs see the
reviews in3)–5)).
§2. In-medium Properties: Theory
The interest in in-medium properties arose suddenly in the early 90’s when sev-
eral authors6), 7) predicted a close connection between in-medium masses and chiral
symmetry restoration in hot and/or dense matter. This seemed to establish a direct
link between nuclear properties on one hand and QCD symmetries on the other.
Later on it was realized that the connection between the chiral condensates of QCD
and hadronic spectral functions is not as direct as originally envisaged. The only
∗) Speaker, e-mail address: mosel@physik.uni-giessen.de
typeset using PTPTEX.cls 〈Ver.0.9〉
http://arxiv.org/abs/0704.0154v1
2 U. Mosel
strict connection is given by QCD sum rules which restrict only an integral over the
hadronic spectral function by the values of the quark and gluon condensates which
themselves are known only for the lowest twist configurations. Indeed a simple, but
more realistic analysis of QCD sum rules showed that these do not make precise
predictions for hadron masses or widths, but can only serve to constrain hadronic
spectral functions.8)–11) Thus hadronic models are needed for a more specific pre-
diction of hadronic properties in medium.
For example, in the past a lively discussion has been going on about a pos-
sible mass shift of the ω-meson in a nuclear medium. While there seems to be a
general agreement that the ω acquires a certain width of the order of 40-60 MeV
in the medium, the mass shift is not so commonly agreed on. While some groups
have predicted a dropping mass,12)–14) there have also been suggestions for a rising
mass15)–18) or even a structure with several peaks.19), 20) In this context a recent
experiment by the CBELSA/TAPS collaboration is of particular interest, since it
is the first indication of a downward shift of the mass of the ω-meson in a nuclear
medium.21) Since Klingl et al.13) were among the first to predict such a downward
shift it is worthwhile to look into their approach again.
The central quantity that contains all the information about the properties of
an ω meson in medium is the spectral function
Amed(q) = −
q2 − (m0ω)
2 −Πvac(q)−Πmed(q)
, (2.1)
with the bare mass m0ω of the ω. The vacuum part of the ω selfenergy Πvac is
dominated by the decay ω → π+π0π−.22) For the calculation of the in-medium part
one can employ the low-density-theorem13), 19), 20) which states that at sufficiently
small density of the nuclear medium one can expand the selfenergy in orders of the
density ρ
Πmed(ν, ~q = 0; ρ) = −ρT (ν) , (2.2)
where T (ν) is the ω-nucleon forward-scattering amplitude. We note that a priori it
is not clear up to which densities this low-density-theorem is reliable.23)
To obtain the imaginary part of the forward scattering amplitude via Cutkosky’s
Cutting Rules Klingl et. al12), 13) used an effective Lagrangian that combined chiral
SU(3) dynamics with VMD. The ω selfenergy was evaluated on tree-level which
needs as input the inelastic reactions ωN → πN (1π channel) and ωN → 2πN (2π
channel) to determine the effective coupling constants. The amplitude ωN → πN
is more or less fixed by the measurable and measured back reaction.25) This is in
contrast to the reaction ωN ↔ ρN which – in the calculations of ref.12), 13) – is not
constrained by any data and which dominates the 2π channel. Furthermore, Klingl
et. al12), 13) employed a heavy baryon approximation (HBA) to drop some of the
tree-level diagrams generated by their Lagrangian. All the calculations were made
for isospin-symmetric nuclear matter at temperature T = 0. The scattered ω was
taken to have ~q = 0 relative to the nuclear medium.
We have repeated these calculations without, however, invoking the HBA.∗) For
∗) For further details of the present calculations we refer to ref.26)
Hadrons in Medium 3
the 2π channel which decides about the in-medium mass shift of the ω in the calcu-
lations of ref.12), 13) we find considerable differences – up to one order of magnitude
in the imaginary part of the selfenergy – when comparing calculations using the full
model with those using the HBA.26) We thus have to conclude already at this point
that the HBA is unjustified for the processes considered here and leads to grossly
incorrect results.
We show our resulting in-medium spectral function of the ω (where HBA was
not employed) in figure 1. Note that in the medium the peak is shifted to 544 MeV
which is due to the large effects of a relativistic, full treatment of the imaginary and
real parts of the amplitudes obtained in the present model. This has to be compared
with the results obtained by Klingl et al.12) Since Klingl et al. find an in-medium
peak at about 620 MeV it is obvious that in the relativistic calculation the physical
picture changes drastically. It is also obvious that the correct treatment of the same
 0.001
 0.01
 0.2  0.3  0.4  0.5  0.6  0.7  0.8  0.9  1
ω [GeV]
vacuum
Fig. 1. Spectral function of the ω meson in the vacuum and at normal nuclear density.
Lagrangian as used in ref.12) on tree-level leads to an unrealistic lowering of the ω
spectral function.
It is, therefore, worthwhile to look into another method to calculate the ω self-
energy that takes experimental constraints as much as possible into account and –
in contrast to the tree-level calculations of ref.12) – respects unitarity. A first study
in this direction has been performed by Lutz et al.19) who solved the Bethe-Salpeter
equation with local interaction kernels. These authors found a rather complex spec-
tral function with a second peak at lower energies due to a coupling to nucleon
resonances with masses of about ≈ 1500 MeV. We have recently used a large-scale
K-matrix analysis of all available γN and πN data27)–29) that does respect unitarity
and thus constrains the essential 2π channel by the inelasticities in the 1π chan-
nel.20) By consistently using the low-density-approximation we have obtained the
result shown in Fig. 2. Fig. 2 clearly exhibits a broadened ω spectral function with
only a small (upwards) shift of the peak mass. In agreement with the calculations of
Lutz et al.,19) although with less strength, it also exhibits a second peak at masses
around 550 MeV that is due to a coupling to a N*(1535)-nucleon hole configuration.
Such a resonance-hole coupling is known to play also a major role in the determi-
4 U. Mosel
0.5 0.6 0.7 0.8 0.9 1.0
 [GeV]
Fig. 2. ω spectral function for an ω meson at rest, i.e. q0 =
q2 (from ref.20)). The appropriately
normalized data points correspond to the reaction e+e− → ω → 3π in vacuum. Shown are
results for densities ρ = 0, ρ = ρ0 = 0.16 fm
−3 (solid) and ρ = 2ρ0 (dashed).
nation of the ρ meson spectral function;23), 24) in the context of QCD sum rules it
has been examined in ref.17) It is obviously quite sensitive to the detailed coupling
strength of this resonance to the ωN channel which energetically opens up only at
much higher masses.
As mentioned earlier, there is general consensus among different theories, that
the on-shell width of the ω meson in medium reaches values of about 50 MeV at
saturation density. To illustrate this point we show in Fig. 3 the width as a function
of omega momentum relative to the nuclear matter restframe both for the transverse
and the longitudinal polarization degree of freedom. It is clearly seen that the
0.0 0.1 0.2 0.3 0.4 0.5 0.6
|q| [GeV]
Fig. 3. On-shell width of the ω in nuclear matter at nuclear matter density ρ0 (from ref.
20)). The
open (solid) points give the width for the transverse (longitudinal) degree of freedom.
transverse width increases strongly as a function of momentum. At values of about
Hadrons in Medium 5
500 MeV, i.e. the region, where CBELSA/TAPS measures, the transverse width
has already increased to about 125 MeV and even the polarization averaged width
amounts to 100 MeV.
§3. Spectral Functions and Observables
Apart from invariant mass measurements, there is another possibility to exper-
imentally constrain the in-medium broadening of the ω-meson. The total width
plotted in Fig. 3 is the sum of elastic and inelastic widths. In general, the inelastic
width alone is determined by the imaginary part of the selfenergy and the latter
determines the amount of reabsorption of ω mesons in the medium. In a Glauber
approximation the cross section for ω production on a nucleus reads
dσγ+A→ω+X
d3x ρ(~x)
dσγ+N→ω+X
ℑΠ(p, ρ(~x ′))
(3.1)
The ratio of this cross section on the nucleus to that on the nucleon then deter-
mines the nuclear transmission T which depends on the imaginary part of the omega
selfenergy ℑΠ
T (A) ≈
d3x ρ(~x) exp
ℑΠ(p, ρ(~x ′))
. (3.2)
Using in addition the low-density-approximation
ℑΠ(p, ~x) = −pρ(~x)σinelωN (3
one obtains the usual Glauber result
T (A) =
d3x ρ(~x) exp
dz′ ρ(~x ′)σinelωN
. (3.4)
We show the calculated transmission T in Fig. 4 together with the data obtained
by CBELSA/TAPS. The measured cross section dependence on massnumber A is
reproduced very well30) if the inelastic ωN cross section is increased by 25% over the
usually used parametrization. This may indicate a problem with the usually used
cross section, or - more interesting - it may indicate a breakdown of the low-density-
approximation.
It is, furthermore, important to realize that the spectral functions themselves are
not observable. What can be observed are the decay products of the meson under
study. It is thus obvious that even in vacuum the invariant mass distribution of the
decaying resonance (V → X1X2), reconstructed from the four-momenta of the decay
products (X1,X2), involves a product of spectral function and partial decay width
into the channel being studied
dRV→X1X2
∼ A(q2)×
ΓV→X1X2(q
Γtot(q2)
. (3.5)
Since in general the branching ratio also depends on the invariant mass of the decay-
ing resonance this dependence may distort the observed invariant mass distribution
6 U. Mosel
0 50 100 150 200
Fig. 4. Transparency of nuclei for ω production. Calculations and preliminary data are normalized
to 12C. Dashed lines reflect error estimates obtained from the spread of the data. Data are
from CBELSA/TAPS.31)
compared with the spectral function itself. This effect is obviously the more impor-
tant the broader the decaying resonance is and the stronger the widths depend on
While these branching ratios are usually well known in vacuum there is con-
siderable uncertainty about their value in the nuclear medium. This uncertainty is
connected with the lack of knowledge about the in-medium vertex corrections, i.e.
the change of coupling constants with density. Even if we assume that these quan-
tities stay the same, then at least the total width appearing in the denominator of
the branching ratio has to be changed, consistent with the change of the width in
the spectral function. This point has only rarely been discussed so far, but it has
far-reaching consequences.
For example, for the ρ meson the partial decay width into the dilepton channel
goes like
Γρ→e+e− ∼
, (3.6)
where the first factor on the rhs originates in the photon propagator and the last
factor M comes from phase-space. On the other hand, the total decay width of the
ρ meson in vacuum is given by (neglecting the pion masses for simplicity)
Γtot ≈ Γρ→ππ ∼ M , (3.7)
so that the branching ratio in vacuum goes like
Γρ→e+e−
. (3.8)
This strong M -dependence distorts the spectral function, in particular, for a broad
resonance such as the ρ meson. This effect is contained and clearly seen in theoretical
simulations of the total dilepton yield from nuclear reactions (see, e. g., Figs. 8− 10
Hadrons in Medium 7
in32)); it leads to a considerable shift of strength in the dilepton spectrum towards
lower masses.
For the semileptonic decay channel π0γ that has been exploited in the CBELSA
TAPS experiment again a strong mass-dependence of the branching ratio shows up
because just at the resonance the decay channel ω → ρπ opens up.
In both of these cases the in-medium broadening changes the total widths in the
denominator of the branching ratios even if the partial decay width stays the same as
in vacuum. Such an in-medium broadening of the total width, which should be the
same as in the spectral function, will tend to weaken the M -dependence of the total
width and thus the branching ratio as a whole. In medium another complication
arises: the spectral function no longer depends on the invariant mass alone, but
– due to a breaking of Lorentz-invariance because of the presence of the nuclear
medium – in addition also on the three-momentum of the hadron being probed.
Again, this p-dependence of the vector meson selfenergy has only rarely been taken
into account (see, however, refs.20), 24), 33)). In addition, final state interactions do
affect hadronic decay channels. A quantitatively reliable treatment of these FSI thus
has to be integral part of any trustable theory that aims at describing these data.
§4. Conclusions
QCD sum rules establish a very useful link between the chiral condensates, both
in vacuum and in medium, but their connection to hadronic spectral functions is
indirect. The latter can thus only be constrained by the QCDSR, but not be fixed;
for a detailed determination hadronic models are needed. We have pointed out in
this talk that the low-density-approximation nearly always used in these studies does
not answer the question up to which densities it is applicable. First studies23) have
shown that this may be different from particle to particle.
While the in-medium properties of all vector mesons ρ, ω, and φ are the subject
of intensive experimental and theoretical research, in this talk we have concentrated
on the ω meson for which recent experiments indicate a lowering of the mass by
about 60 MeV in photon-produced experiments on nuclei. A tree-level calculation,
based on an effective Lagrangian, that predicted such a lowering, has been shown
to be incorrect because of the heavy-baryon approximation used in that calculation.
A correct tree-level calculation with the same Lagrangian gives strong contributions
from the ω → 2πN channel, which, however, is unconstrained by any data; in effect,
the spectral function is softened by an unreasonable pole mass shift. This problem
might partially be based on the fact that all the inelastic processes ωN → πN and
ωN → 2πN are only treated at tree-level. Here an improved calculation is needed,
which incorporates coupled-channels and rescattering, e.g. a Bethe-Salpeter19) or a
K-matrix approach.27), 34), 35)
We have indeed shown that a better calculation that again starts out from an
effective Lagrangian and takes unitarity, channel-coupling and rescattering into ac-
count yields a significantly different in-medium spectral function in which the pole
mass hardly changes, but a broadening of about 60 MeV at nuclear saturation density
takes place, which increases with momentum, primarily in the transverse channel.
8 U. Mosel
Finally, we have pointed out that any measurement of the spectral function
necessarily involves also a branching ratio into the channel being studied. The ex-
perimental in-medium signal thus contains changes of both the spectral function and
the branching ratio.
Acknowledgements
The authors acknowledge discussions with Norbert Kaiser and Wolfram Weise.
They have also benefitted a lot from discussions with Vitaly Shklyar. This work has
been supported by DFG through the SFB/TR16 ”Subnuclear Structure of Matter”.
References
1) U. Mosel, in: QCD Phase Transitions, Proc. Int. Workshop Hirschegg 1997, GSI Darm-
stadt, p. 201, arXiv:nucl-th/9702046.
2) U. Mosel, in: Hadrons in Dense Matter, Proc. Int. Workshop Hirschegg 2000, GSI Darm-
stadt, p. 11, arXiv:nucl-th/0002020.
3) T. Falter, J. Lehr, U. Mosel, P. Muehlich and M. Post, Prog. Part. Nucl. Phys. 53, 25
(2004).
4) L. Alvarez-Ruso, T. Falter, U. Mosel and P. Muehlich, Prog. Part. Nucl. Phys. 55, 71
(2005).
5) R. Rapp and J. Wambach, Adv. Nucl. Phys. 25, 1 (2000).
6) G. E. Brown and M. Rho, Phys. Rev. Lett. 66, 2720 (1991).
7) T. Hatsuda and S. H. Lee, Phys. Rev. C 46, 34 (1992).
8) S. Leupold, W. Peters and U. Mosel, Nucl. Phys. A 628 (1998) 311
9) S. Leupold and U. Mosel, Phys. Rev. C 58, 2939 (1998).
10) S. Leupold, Phys. Rev. C 64, 015202 (2001).
11) S. Leupold and M. Post, Nucl. Phys. A 747, 425 (2005).
12) F. Klingl, T. Waas and W. Weise, Nucl. Phys. A 650, 299 (1999).
13) F. Klingl, N. Kaiser and W. Weise, Nucl. Phys. A 624, 527 (1997).
14) T. Renk, R. A. Schneider and W. Weise, Phys. Rev. C 66, 014902 (2002).
15) A. K. Dutt-Mazumder, R. Hofmann and M. Pospelov, Phys. Rev. C 63, 015204 (2001).
16) M. Post and U. Mosel, Nucl. Phys. A 699, 169 (2002).
17) B. Steinmueller and S. Leupold, Nucl. Phys. A 778, 195 (2006).
18) S. Zschocke, O. P. Pavlenko and B. Kampfer, Phys. Lett. B 562, 57 (2003).
19) M. F. M. Lutz, G. Wolf and B. Friman, Nucl. Phys. A 706, 431 (2002) [Erratum-ibid. A
765, 431 (2006)].
20) P. Muehlich, V. Shklyar, S. Leupold, U. Mosel and M. Post, Nucl. Phys. A 780, 187 (2006).
21) D. Trnka et al. [CBELSA/TAPS Collaboration], Phys. Rev. Lett. 94, 192303 (2005).
22) F. Klingl, N. Kaiser and W. Weise, Z. Phys. A 356, 193 (1996).
23) M. Post, S. Leupold and U. Mosel, Nucl. Phys. A 741, 81 (2004).
24) W. Peters, M. Post, H. Lenske, S. Leupold and U. Mosel, Nucl. Phys. A 632, 109 (1998).
25) B. Friman, arXiv:nucl-th/9801053.
26) F. Eichstaedt, Diploma Thesis, Institut fuer Theoretische Physik, JLU Giessen, 2006,
http://theorie.physik.uni-giessen.de/documents/diplom/eichstaedt.pdf .
27) G. Penner and U. Mosel, Phys. Rev. C 66, 055211 (2002).
28) G. Penner and U. Mosel, Phys. Rev. C 66, 055212 (2002).
29) V. Shklyar, H. Lenske, U. Mosel and G. Penner Phys.Rev. C72, 015210 (2005).
30) P. Muehlich and U. Mosel, Nucl.Phys.A773,156 (2006).
31) M. Kotulla, nucl-ex/0609012.
32) M. Effenberger, E. L. Bratkovskaya and U. Mosel, Phys. Rev. C 60, 044614 (1999).
33) M. Post, S. Leupold and U. Mosel, Nucl. Phys. A 689, 753 (2001).
34) T. Feuster and U. Mosel, Phys. Rev. C 59, 460 (1999).
35) V. Shklyar, H. Lenske, U. Mosel and G. Penner, Phys. Rev. C 71, 055206 (2005) [Erratum-
ibid. C 72, 019903 (2005)].
0 50 100 150 200
ABSTRACT
  In this talk we briefly summarize our theoretical understanding of in-medium
selfenergies of hadrons. With the special case of the $\omega$ meson we
demonstrate that earlier calculations that predicted a significant lowering of
the mass in medium are based on an incorrect treatment of the model Lagrangian;
more consistent calculations lead to a significant broadening, but hardly any
mass shift. We stress that the experimental reconstruction of hadron spectral
functions from measured decay products always requires knowledge of the decay
branching ratios which may also be strongly mass-dependent. It also requires a
quantitatively reliable treatment of final state interactions which has to be
part of any reliable theory.

<|endoftext|><|startoftext|>
Introduction
Observations of spectral lines at radio, (sub)millimeter
and infrared wavelengths are a powerful tool to in-
vestigate the physical and chemical conditions in the
dilute gas of astronomical sources where thermody-
namic equilibrium is a poor approximation (e.g., Genzel
1991; Black 2000). To extract astrophysical parame-
ters from the data, the excitation and optical depth of
the lines need to be estimated, for which various meth-
ods may be used, depending on the available observa-
tions (Van Dishoeck & Hogerheijde 1999; Van der Tak
2005).
If only one or two lines of a molecule1 have been ob-
served, the excitation must be deduced from observa-
tions of other species or from theoretical considerations.
An example is the assumption that the excitation tem-
perature equals the kinetic temperature, a case known as
Local Thermodynamic Equilibrium (LTE) which holds
at high densities.
If many lines have been observed, a popular method
is the ‘rotation diagram’, also called ‘Boltzmann
plot’ or ‘population diagram’ (e.g., Blake et al. 1987;
Helmich et al. 1994; Goldsmith & Langer 1999). This
method describes the excitation by a single tempera-
ture, obtained by a fit to the line intensities as a func-
tion of upper level energy. Provided that beam sizes are
similar and optical depths are low, or that appropriate
corrections are made, this method yields estimates of
the excitation temperature and column density of the
molecule. The excitation temperature approaches the ki-
netic temperature in the high-density limit, but generally
depends on both kinetic temperature and volume den-
sity. Spectral line surveys are often analyzed with ro-
tation diagrams, although more advanced methods are
also used (Helmich & van Dishoeck 1997; Comito et al.
2005).
More sophisticated methods retain the assumption of
a local excitation, but solve for the balance of ex-
citation and de-excitation rates from and to a given
http://arxiv.org/abs/0704.0155v1
http://www.sron.rug.nl/~vdtak/radex/index.shtml
2 Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra
probability method and the Large Velocity Gradient
(LVG) method (Sobolev 1960; De Jong et al. 1975;
Goldreich & Scoville 1976). These ‘intermediate-level’
methods require knowledge of molecular collisional
data, whereas the previous ‘basic-level’ methods only
required spectroscopic and dipole moment information.
This extra requirement limits the use of these methods
to some extent, because collisional data do not exist
for all astrophysically relevant species. The advantage
is that column density, kinetic temperature and volume
density can be constrained, if accurate collision rates
are known. As with rotation diagrams, this method can
be used to compute synthetic spectra to be compared
with data with a χ2 statistic (Jansen 1995; Leurini et al.
2004).
The most advanced methods drop the local approxi-
mation and solve for the intensities (or the radiative
rates) as functions of depth into the cloud, as well as of
velocity. Such methods are usually of the Accelerated
Lambda Iteration (ALI) or Monte Carlo (MC) type, al-
though hybrids also exist. The performance and con-
vergence of such programs have recently been tested
by Van Zadelhoff et al. (2002). Using such programs
one can constrain temperature, density, and velocity
gradients within sources (e.g., Van der Tak et al. 1999;
Tafalla et al. 2002; Jakob et al. 2007), and, if enough
observations are available, even molecular abundance
profiles (e.g., Van der Tak et al. 2000a; Schöier et al.
2002; Maret et al. 2005), especially when coupled to
chemical networks (e.g., Doty et al. 2004; Evans et al.
2005; Goicoechea et al. 2006).
This paper presents the public version of a radiative
transfer code at the ‘intermediate’ level. The assump-
tion of a homogeneous medium limits the number of
free parameters and makes the program a useful tool
in rapidly analyzing a large set of observational data,
in order to provide constraints on physical conditions,
such as density and kinetic temperature (Jansen 1995).
The program can be used for any molecule for which
collisional rate coefficients are available. The input for-
mat for spectroscopic and collisional data is that of the
LAMDA database (Schöier et al. 2005)2 where an on-
line calculator for molecular line intensities3, based on
our program, can also be found4.
The paper is set up as follows. Section 2 describes the
radiative transfer formalism and introduces our notation
of the key quantities. Section 3 describes the formalism
which the program actually uses, and discusses its im-
plementation. Section 4 compares the results of the pro-
gram to those of other programs. The paper concludes
in § 5 with suggested future directions of astrophysical
radiative transfer modeling.
2 http://www.strw.leidenuniv.nl/∼moldata
3 In this paper, the ‘strength’ of a line is an intrinsic quantity
2. Radiative transfer and molecular
excitation
This section summarizes the formalism to analyze
molecular line observations which our program adopts.
For more detailed discussions of radiative transfer see,
e.g., Cannon (1985) or Rybicki & Lightman (1979).
2.1. Basic formalism
Describing the transfer of radiation requires a quantity
which is conserved along its path as long as no local
absorption or emission processes take place, and which
includes the direction of travel. The quantity that sat-
isfies this requirement is the specific intensity Iν, de-
fined as the amount of energy passing through a surface
normal to the path, per unit time, surface, bandwidth
(measured here in frequency units), and solid angle. The
transfer equation for radiation propagating a distance ds
can then be written as
= jν − αν Iν, (1)
where jν and αν are the local emission and extinction
coefficients, respectively. The two terms on the right-
hand side may be combined into the source function,
defined by
S ν ≡
. (2)
Writing the transport equation in its integral form and
defining the optical depth, dτν ≡ αν ds, measured along
the ray5 one arrives at
Iν = Iν (0)e
−τν +
S ν(τ
−(τν−τ′ν) dτ′ν, (3)
where Iν is the radiation emerging from the medium and
Iν(0) is the ‘background’ radiation entering the medium.
The above equations hold both for continuum radiation,
which is emitted over a large bandwidth, and for spectral
lines, which arise when the local emission and absorp-
tion properties change drastically over a very small fre-
quency interval, due to the presence of molecules. From
this point the discussion will focus on bound-bound
transitions within a multi-level molecule consisting of
N levels with spontaneous downward rates Aul, Einstein
coefficients for stimulated transitions Bul and Blu, and
collisional rates Cul and Clu, between upper levels u and
lower levels ℓ.
The rate of collision is equal to
Cul = ncolγul, (4)
where ncol is the number density of the collision partner
(in cm−3) and γul is the downward collisional rate coeffi-
cient (in cm3 s−1). The rate coefficient is the Maxwellian
average of the collision cross section, σ,
γul =
8kTkin
)−1/2 (
kTkin
σEe−E/kTkin dE, (5)
http://www.strw.leidenuniv.nl/~moldata
http://www.sron.rug.nl/~vdtak/radex.php
Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra 3
where E is the collision energy, k is the Boltzmann con-
stant, Tkin is the kinetic temperature, and µ is the re-
duced mass of the system. The upward rates are ob-
tained through detailed balance
γlu = γul
e−hν/kTkin , (6)
where gi is the statistical weight of level i.
The local emission in transition u→l with laboratory
frequency νul, can be expressed as
nu Aul φν, (7)
where nu is the number density of molecules in level u
and φν is the frequency-dependent line emission profile.
The absorption coefficient reads
(nl Blu ϕν − nu Bul χν) , (8)
where φν and χν are the line profiles for absorption and
stimulated emission (counted as negative extinction),
respectively.
From here on we assume complete angular and fre-
quency redistribution of the emitted photons, so that
φν=ϕν=χν, which is strictly only valid when collisional
excitation dominates. This assumption allows the source
function to be written as
S νul =
nuAul
nlBlu − nuBul
, (9)
where we have used the Einstein relations. It is com-
mon to introduce an excitation temperature Tex defined
through the Boltzmann equation
exp [−(Eu − El)/kTex] , (10)
where Ei is the energy of level i, such that S νul =
Bν(Tex), the specific intensity of a blackbody radiating
at Tex.
In the interstellar medium, the dominant line broadening
mechanism is Doppler broadening. Except in very cold
and dark cloud cores, observed line widths are much
larger than expected from the kinetic temperature: this
effect is commonly ascribed to random macroscopic gas
motions or ‘turbulence’. The result is a Gaussian line
profile
ν − νul − v · n
, (11)
where νD is the Doppler width, v is the velocity vector
of the moving gas at the position of the scattering, n is
a unit vector in the direction of the propagating beam of
radiation, and c is the speed of light. The Doppler width
is the 1/e half-width of the profile, equal to ∆V/2
where ∆V is its full width at half-maximum.
If the level populations ni are known, the radiative trans-
fer equation can be solved exactly. In particular, under
circumstellar media, the density is too low to attain LTE,
but statistical equilibrium (SE) can often be assumed:
= 0 =
n jP ji − ni
Pi j = Fi − niDi , (12)
where Pi j, the destruction rate coefficient of level i, and
its formation rate coefficient P ji are given by
Pi j =
Ai j + Bi j J̄ν + Ci j (i > j)
Bi j J̄ν +Ci j (i < j).
In Eq. 13,
Bi j J̄ν = Bi j
Jν φ(ν) dν (14)
is the number of induced radiative (de-)excitations from
state i to state j per second per particle in state i, and
Iν dΩ (15)
is the specific intensity Iν integrated over solid angle dΩ
and averaged over all directions. The SE equations thus
include the effects of non-local radiation.
This discussion assumes that the state-specific rates of
formationFi [cm3 s−1] and destructionDi [s−1] are zero
to ensure that the radiative transfer is solved indepen-
dently of assumptions about chemical processes. In gen-
eral, formation and destruction processes should be in-
cluded explicitly to be able to deal with situations in
which the chemical time scales are very short or the
radiative lifetimes very long. For example, the forma-
tion temperature (in Fi) affects the rotational excitation
of C3 (Roueff et al. 2002) and the vibrational excitation
of H2 (Black & van Dishoeck 1987; Burton et al. 1990;
Takahashi & Uehara 2001), systems for which line ra-
diation only occurs as slow electric quadrupole transi-
tions. The rotational excitation of reactive ions like CO+
(Fuente et al. 2000; Black 1998) is also sensitive to Fi
and Di because the rates of reactions with H and H2
rival the inelastic collision excitation rates. Similar con-
siderations apply to the excitation of H+3 in the Sgr A
region close to the Galactic Center (Van der Tak 2006),
where electron impact excitation competes with disso-
ciative recombination.
2.2. Molecular line cooling
Once the radiative transfer problem has been solved and
the level populations are known, the cooling (or heating)
from molecular line emission can be estimated. Since
the level populations contain all the information of the
radiative transfer, a general expression for the cooling is
obtained from considering all possible collisional tran-
sitions
(nlγlu − nuγul)hνul, (16)
4 Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra
are in detailed balance at the kinetic temperature; there-
fore it is possible for net heating to occur (Λ < 0) in
cases where the crucial level populations have Tex>Tkin,
owing to strong radiative excitation in a hot external ra-
diation field.
2.3. Escape probability
The difficulty in solving radiative transfer problems is
the interdependence of the molecular level populations
and the local radiation field, requiring iterative solution
methods. In particular, for inhomogeneous or geometri-
cally complex objects, extensive calculations with many
grid points are required. However, if only the global
properties of an interstellar cloud are of interest, the cal-
culation can be greatly simplified through the introduc-
tion of a geometrically averaged escape probability β,
the probability that a photon will escape the medium
from where it was created. This probability depends
only on the optical depth τ and is related to the inten-
sity within the medium, ignoring background radiation
and any local continuum, through
Jνul = S νul (1 − β). (17)
Several authors have developed detailed relations be-
tween β and τ for specific geometrical assumptions.
Our program offers the user a choice of three such ex-
pressions. The first is the expression derived for an ex-
panding spherical shell, the so-called Sobolev or large
velocity gradient (LVG) approximation (Sobolev 1960;
Castor 1970; Elitzur 1992, p. 42-44). This method is
also widely applied for moderate velocity gradients, to
mimic turbulent motions. Our program uses the formula
by Mihalas (1978) and De Jong et al. (1980) for this ge-
ometry:
βLVG =
dτ′ =
1 − e−τ
. (18)
Second, in the case of a static, spherically symmet-
ric and homogeneous medium the escape probability is
(Osterbrock & Ferland 2006, Appendix 2)6
βsphere =
. (19)
Third, for a plane-parallel ‘slab’ geometry, applicable
for instance to shocks,
βslab =
1 − e−3τ
is derived (De Jong et al. 1975). Figure 1 plots the be-
haviour of β as a function of τ for these three cases; for
more detailed comparisons see Stutzki & Winnewisser
(1985) and Ossenkopf et al. (2001). Users of our pro-
gram can select either expression for their calculations.
The on-line version of the program uses the formula for
the uniform sphere, Eq. (19).
 0  1  2  3  4  5  6  7  8  9  10
Optical depth τ
Sphere
Fig. 1. Escape probability β as a function of optical
depth τ for three different geometries: uniform sphere
(solid line), expanding sphere (dotted line) and plane-
parallel slab (dashed line).
3. The program
RADEX is a non-LTE radiative transfer code, written
originally by J. H. Black, that uses the escape proba-
bility formulation assuming an isothermal and homo-
geneous medium without large-scale velocity fields.
With the current increase of observational possibilities
in mind, we have developed a version of this program
which is suitable for public use. A guide for using the
code in practice is provided in Appendix A and on-line7;
Appendix B describes the adopted coding style. This
section focuses on the implementation of the formalism
of § 2 in the program.
3.1. Basic capabilities
For a homogeneous medium with no global velocity
field, the optical depth at line centre can be expressed
using Eqs. (2, 7, 9, 11), as
AulNmol
1.064∆V
, (21)
where Nmol is the total column density,∆V the full width
at half-maximum of the line profile in velocity units,
and xi the fractional population of level i. The formal-
ism is analogous to the LVG method, with the global
n/(dV/dR) replaced by the local N/∆V , as in microtur-
bulent codes (Leung & Liszt 1976). The program itera-
tively solves the statistical equilibrium equations start-
ing from optically thin statistical equilibrium (§ 3.4) for
the initial level populations.
The program can handle up to seven collision partners
simultaneously. In dense molecular clouds, H2 is the
main collision partner for most species, but in some
cases, separate cross sections may exist for collisions
with the ortho and para forms of H2, and electron col-
lisions may be important for ionic species. In diffuse
molecular clouds and PDRs, excitation by atomic H be-
comes important, particularly for fine structure lines,
while for comets, H2O is the main collision partner. We
Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra 5
refer to Flower (1989) for the basic theory of molecular
collisions, and to Dubernet (2005) for an update of the
latest results.
The output of the program is the background-subtracted
line intensity in units of the equivalent radiation temper-
ature in the Rayleigh-Jeans limit. The background sub-
traction follows traditional cm- and mm-wave spectro-
scopic observations where the differences between on-
source and off-source measurements are recorded, such
Iemν − I
. (22)
The radiation peak temperature TR can be directly com-
pared to the observed antenna temperature corrected
for the optical efficiency of the telescope. However, it
should be emphasized that RADEX contains no informa-
tion about the geometry or length scale and that it is
assumed that the source fills the antenna beam. If the
source is expected to be smaller than the observational
beam, computed line fluxes must be corrected before
comparing to observed fluxes.
In other types of observations, the continuum may
not be subtracted from the data. In (sub-)millimeter
and THz observations, for example with ESA’s fu-
ture Herschel space observatory, the dust continuum
of many sources will be much stronger than any in-
strumental error, and baseline subtraction may not be
needed. The same is true for interferometer data, where
the instrumental passband is well characterized.
3.2. Background radiation field
The average Galactic background (interstellar radia-
tion field, ISRF) adopted in RADEX consists of several
components. The main contribution is the cosmic mi-
crowave background (CMB) whose absolute tempera-
ture is taken to be TCBR = 2.725±0.001K based on the
full COBE data set as analyzed by Fixsen & Mather
(2002). This model of the microwave background rep-
resents the broadband continuum only and does not in-
clude the strong emission lines, several of which contain
significant power in the far-infrared and (sub-)millime-
ter part of the spectrum (see, e.g., Fixsen et al. 1999).
The ultraviolet/visible/near-infrared part of the spec-
trum is based on the model of average Galactic starlight
in the solar neighborhood of Mathis et al. (1983). The
far-infrared and (sub-)millimeter part of the spectrum is
based on the single-temperature fit to the Galactic ther-
mal dust emission of Wright et al. (1991). At frequen-
cies below 10 cm−1 (30 GHz), there is a background
contribution from non-thermal radiation in the Galaxy.
A tabulation of this spectrum in ASCII format is avail-
able on-line8, and a graphical representation is shown in
Black (1994).
One subtle aspect of the calculation is the distinction
between the background seen by the observer and the
background seen by the molecules. The continuum con-
tribution to the rate equations may be composed of (1)
an external component which arises outside the emitting
region and (2) an internal continuum that arises within
the emitting region. The CMB and ISRF are examples
of external continuum components; dust emission from
the line-emitting region is an example of an internal
continuum. While an external continuum always fills
the entire sky, an internal continuum may only fill a frac-
tion of it, for example in the case of a circumstellar disk.
With this distinction in mind, the internal intensity be-
comes
Jintν = β[Bν(TCBR)+ηI
ν ]+(1−β)[Bν(Tex)+θ(1−η)I
where Iuserν is the continuous spectrum defined by the
user. The factor η, is the fraction of local continuum
which arises outside the line emitting region, and the
factor θ is the fraction of local sky filled by the internal
continuum.
3.3. Chemical formation and destruction rates
The equations of statistical equilibrium (12) include
source and sink terms. By default, RADEX sets the
destruction rates equal to the same small value,
Di ≡ D = 10−15 s−1, appropriate for cosmic-ray
ionization plus cosmic-ray induced photodissociation
(Prasad & Tarafdar 1983; Gredel et al. 1989). The cor-
responding formation rates are
Fi = 10−24ntotalgi exp(−Ei/kTform)/Q(Tform) (24)
where ntotal is the sum of the densities of all collision
partners, Tform is a formation temperature (default value
300 K), and
Q(T ) =
gi exp(−Ei/kTform) (25)
is the partition function. These assumptions imply a
nominal fractional abundance of every molecule
ntotal
ntotalD
= 10−9 . (26)
The value of the nominal abundance is inconsequential
because the results in RADEX depend on Nmol/∆V , but
not on the fractional abundance. For most molecules
currently in the associated database (LAMDA) and for
the most commonly encountered interstellar conditions,
these choices will not affect the observable excitation.
The formation and destruction rates are computed in a
subroutine that can be modified by the user to provide
a more realistic description of chemical processes. For
example, users may treat the combined ortho/para forms
of molecules by introducing a realistic Tform, especially
in cases where no o/p interchange processes is likely
to be effective. Other cases of potential interest include
the photodissociation of large molecules into smaller
molecules, or the evaporation of icy grain mantles into
the gas phase. Our formulation in terms of a volume
rate of formation is chosen to be independent of the de-
tails of the formation process. In general, formation and
destruction processes are important for molecules that
http://www.oso.chalmers.se/~jblack/RESEARCH/
6 Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra
3.4. Calculation
The input parameters of RADEX and its output are de-
scribed in Appendix A. Calculations with RADEX pro-
ceed as follows. A first guess of the populations of the
molecular energy levels is produced by solving statisti-
cal equilibrium in the optically thin case. The only ra-
diation taken into account is the unshielded background
radiation field; internally produced radiation is not yet
available. The solution for the level populations allows
calculation of the optical depths of all the lines, which
are then used to re-calculate the molecular excitation.
The new calculation treats the background radiation in
the same manner as the internally produced radiation.
The program iteratively finds a consistent solution for
the level populations and the radiation field. When the
optical depths of the lines with τ > 10−2 are stable from
one iteration to the next to a given tolerance (default
10−6), the program writes output and stops.
3.5. Results
There are several ways in which RADEX can be used
to analyze molecular line observations. In most of
these applications, the modeled quantity is the velocity-
integrated line intensity, as the excitation is assumed
to be independent of velocity. As a consequence, self-
absorbed lines cannot be modeled satisfactorily with
RADEX. In the simplest case, the temperature and den-
sity are known from other observations and only the col-
umn density of the molecule under consideration needs
to be varied to get the best agreement with the observed
line intensity. If the H2 column density is known from
other observations, for example from an optically thin
CO isotopic line, the ratio of the two column densi-
ties gives the molecular abundance, averaged over the
source. The RADEX distribution contains a Python script
to automate this procedure which is further described in
Appendix D.
Another often-used application of RADEX is to deter-
mine temperatures and densities from the observed in-
tensity ratios of lines of the same molecule. If the abun-
dance of the molecule is constant throughout the source,
the ratios should give source-averaged physical con-
ditions independent of the specific chemistry of the
molecule. Appendix C presents illustrative plots of line
ratios for commonly observed molecules and lines in
the optically thin case. For higher optical depths, the
qualitative trends remain the same but there are quanti-
tative differences. RADEX can readily be used to generate
similar plots for moderately thick cases. Again, Python
scripts are made available to automate this procedure
(Appendix D).
To illustrate the use of RADEX on actual observations, we
take the observations of the HCO+ 1–0 and 3–2 lines to-
ward a relatively simple source, the photon-dominated
region IC 63 (Jansen et al. 1994). The observed 1–0/3–2
ratio corrected for beam dilution is 5.5±1.5. The kinetic
temperature of the source is constrained from CO ob-
servations to be ∼50 K. Fig. C.3 shows that the inferred
Fig. 2. Comparison of the predicted line strengths for
the 10 lowest rotational transitions of CO for a homo-
geneous isothermal sphere, with nH2 = 10
5 cm−3 and
Tkin = 50 K, using different methods. Upper panel: The
total optical depth through the sphere at line centre τ
and excitation temperature Tex as a function of the up-
per rotational level J involved in the transition. Middle
panel: The radiation temperature TR obtained for each
transition using RADEX with different prescriptions of
the escape probability β and compared with the result
from the Monte-Carlo code (MCC) of Schöier (2000).
Also shown are the results for optically thin emission in
LTE. Lower panel: TR obtained from RADEX compared
with the results from MCC, δTR = T
R − TR.
inferred column density from the absolute intensities is
8 × 1012 cm−2, which, together with the overall H2 col-
umn density of 5×1021 cm−2, gives an HCO+ abundance
with respect to H2 of 1.6 × 10−9.
A slightly more complicated situation arises for the
Orion Bar PDR (Hogerheijde et al. 1995). For this
source, both HCO+ 1–0, 3–2 and 4–3 lines have been
observed. The 1–0/3–2 ratio gives an order of magni-
tude lower density than the 3–2/4–3 ratio. This differ-
Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra 7
density tracers such as CO are usually much less sensi-
tive to density. One possible solution is a clumpy PDR
model in which the 1–0 line is mostly produced in the
low-density interclump gas containing 90% of the mate-
rial and the 4–3 line in the high-density clumps. Within
this clumpy model, a single column density fits all three
lines and an accurate abundance can be derived. Note
that this technique of adding the results of two models
is only applicable at low optical depths.
If only two lines of a molecule have been observed,
the line ratio can be used as indicator of tempera-
ture or density, depending on molecule and transition
(Appendix C). A single line ratio is never enough
to constrain both temperature and density, though.
For multi-line observations, a comparison of data and
models in terms of χ2 is preferred. See for example
Van der Tak et al. (2000b), Schöier et al. (2002), and
Leurini et al. (2004) for details of such calculations.
3.6. Limitations of the program
The current version of the program does not in-
clude a contribution from continuous (dust or free-free)
opacity to the escape probability, as for example in
Takahashi et al. (1983). Continuum radiation from dust
is generally negligible at long wavelengths ( >∼ 1 mm)
but becomes important for regions with very high col-
umn densities (such as protoplanetary disks) and at
far-infrared and shorter wavelengths ( <∼ 100µm). Free-
free radiation may become important for the calcula-
tion of atomic fine structure lines from H II regions;
other programs such as CLOUDY (Ferland 2003)9 may
be more suitable for this purpose. The absence of con-
tinuous opacity limits the applicability of the program
particularly in situations where infrared pumping is im-
portant, either directly through rotational transitions or
via vibrational transitions (Carroll & Goldsmith 1981;
Hauschildt et al. 1993).
Another limitation of the program is that only one
molecule is treated at a time, so that the effects of line
overlap are not taken into account. Such overlaps may
occur both at radio and at infrared wavelengths (e.g.,
Expósito et al. 2006). In special cases, overlap between
lines of the same molecule may influence their excita-
tion, for example the hyperfine components of HCN or
+ (Daniel et al. 2006).
For certain molecules under certain physical conditions
(especially low density and/or strong radiation field),
population inversions occur, which cause negative op-
tical depth and hence nonlinear amplification of the
incoming radiation (Elitzur 1992). This phenomenon,
known as ‘maser’ action, requires non-local treatment
of the radiative transfer, in particular a fine sampling of
directions, for which RADEX is not set up. Generally, the
escape probability approximation is justified until the
masers saturate, which occurs at τ ≈ −1. In practice,
the computed intensities of lines with τ <∼ − 0.1 are not
as accurate as those of other lines, and the intensities of
lines with τ <∼ − 1 should be disregarded altogether. If
Fig. 3. Excitation temperature of the HCO+ 1–0 tran-
sition as a function of radius for the model cloud of
§ 4.1.2, calculated with RADEX assuming static spher-
ical, expanding spherical, and slab geometry (dashed
/ dash-dotted / dotted lines), with a multi-zone es-
cape probability program (long/short dashes) and with
a Monte Carlo code (solid lines). The panels are for dif-
ferent column densities, hence optical depths. Note the
different vertical scales.
of non-maser lines may also be affected. While special-
ized programs should be used to calculate the intensities
of maser lines (e.g., Spaans & van Langevelde 1992;
Gray & Field 1995; Yates et al. 1997), RADEX may well
be used to predict which lines of a molecule may display
http://www.nublado.org/
8 Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra
which are pumped by infrared radiation (Leurini et al.
2004).
4. Comparison with other methods
This section shows a comparison of RADEX with other
programs, first for the case of constant physical con-
ditions (§ 4.1) and second for variable conditions
(§ 4.2). Comparison is with the analytical rotation di-
agram method and with Monte Carlo methods, which
have been benchmarked to high accuracy, both for
the case of HCO+ (Van Zadelhoff et al. 2002)10 and of
H2O (Van der Tak et al. 2005)
11. Throughout this sec-
tion, molecular data have been taken from the LAMDA
database (Schöier et al. 2005).
4.1. Homogeneous models
4.1.1. The case of CO
To test the RADEX code, we have compared its output
both to an optically thin LTE analysis (rotation diagram
method) and a full radiative transfer analysis using a
Monte-Carlo method (Schöier 2000). The test problem
consists of a spherically symmetric cloud with a con-
stant density, n(H2), of 1 × 105 cm−3 within a radius of
100 AU. In this example only the CO emission is treated
using a fractional abundance of 1 × 10−4 relative to H2
yielding a central CO column density of NCO = 3 ×
1016 cm−2 and an average value of N=2×1016 cm−2. The
kinetic temperature is set to 50 K, the background tem-
perature to 2.73 K, and the line width to ∆V=1.0 km s−1.
Fig. 2 presents the results of the calculations for the
ten lowest rotational transitions. The excitation temper-
atures of the lines vary from being close to thermalized
for transitions involving low J-levels, to sub-thermally
excited for the higher-lying lines. The optical depth in
the lines is moderate (∼ 1 − 2) to low. It is seen that the
expressions of the escape probability for the uniform
sphere and the expanding sphere give almost identical
solutions which are close to that obtained from the full
radiative transfer (MCC in Fig. 2). The slab geometry
gives slightly higher intensities, in particular for high-
lying lines. The optically thin approximation, where the
gas is assumed to be in LTE at 50 K, produces much
larger discrepancies, up to a factor of∼ 2, and only gives
the correct intensity for the J = 1 → 0 line, where the
LTE conditions are met.
4.1.2. The case of HCO+
To further verify the performance of the RADEX
program, we have compared its results to that
of another program that does not use the local
approximation: the Monte Carlo program RATRAN
(Hogerheijde & van der Tak 2000). We also compare
the results to those from the multi-zone escape proba-
bility program by Poelman & Spaans (2005). The test
case is a cloud with n(H2) = 1×104 cm−3, Tkin=10 K,
Tbg=2.73 K, and a line width of ∆V=1.0 km s
−1, equiv-
alent to bD=0.6 km s
−1. The pure rotational emission
spectrum of HCO+ was calculated for column densi-
ties of 1012, 1013, 1014 and 1015 cm−2, which for RADEX
were given directly as input parameters. For the multi-
zone programs, a cloud radius of 1018 cm was specified
along with abundances of 10−10–10−7, distributed over
50 cells.
Figure 3 shows the calculated excitation temperature
of the HCO+ 1–0 transition as a function of radius
for these physical conditions. For N(HCO+) <∼ 10
12 cm−2,
the excitation is independent of radius and the calcu-
lations for the various geometries agree to ≈10%. The
dependence of the excitation on radius and on geom-
etry increases with increasing column density, and for
N(HCO+) >∼ 10
15 cm−2, the curvature of the Tex distri-
bution becomes too large to ignore. The corresponding
line optical depth is ≈100, with ≈20% spread between
the various estimates (Fig. 4). The curvature arises be-
cause at the cloud center, photon trapping thermalizes
the excitation, while at the edge, the emission can es-
cape the cloud (Bernes 1979). We do not recommend to
use RADEX at line optical depths >∼ 100, because the cal-
culated excitation temperature may not be representa-
tive of the emitting region. However, even if some lines
are highly optically thick, RADEX may well be used to
analyze other lines which are optically thin. For exam-
ple for H2O, the ground state lines often have τ ∼ 1000,
but RADEX is well capable of computing intensities for
higher-lying transitions which are not as optically thick.
At low optical depth, variations in Tex translate directly
into changes in emergent line intensity. Thus, differ-
ences as large as 20% in calculated line flux can arise
depending on the choice of escape probability descrip-
tion, even for moderately thick cases. At high optical
depth, the direct connection between Tex and line flux
is lost because of the dependence on the adopted veloc-
ity field. The assumption in the program that the opti-
cal depth is independent of velocity breaks down in this
case. In this limit, the peak line temperature TR gives
the value of Tex at the τ = 1 surface of the cloud in this
specific transition.
The results shown in this section do not translate easily
to other HCO+ lines such as J=3→2, because the ex-
citation is governed by several competing effects. The
optical depth of the J=3→2 line may be higher or lower
than that of the J=1→0 line, depending on temperature
and density. Observers are encouraged to use RADEX to
study the excitation of their lines as a function of these
parameters, and also consider geometric variations.
4.2. Observations of a Young Stellar Object
To compare a typical RADEX analysis with other meth-
ods for a situation which varying physical conditions,
we choose a molecule for which many lines can be
observed: para-formaldehyde, p-H2CO. Figure 5 shows
observations of p-H2CO (sub-)millimeter emission lines
http://www.strw.leidenuniv.nl/astrochem/
http://www.sron.rug.nl/~vdtak/H2O/
Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra 9
Fig. 4. Optical depth of the HCO+ 1–0 transition for
the model of § 4.1.2, calculated with RADEX assuming
static spherical, expanding spherical, and slab geometry
(filled triangles/filled squares/open triangles), with the
multi-zone escape probability code (open circles) and
with the Monte Carlo code (filled circles).
Fig. 5. Line strengths of p-H2CO observed towards
the embedded low-mass protostar IRAS 16293–2422
(squares with error bars), modeled assuming LTE (solid
line), using RADEX (triangles), and using a Monte Carlo
program assuming a constant abundance (solid circles)
and an abundance varying with radius (open circles).
Fig. 6. Distributions of the χ2 parameter correspond-
ing to the models in Figure 5. The RADEX results are
for n(H2) = 10
6 cm−3 as found by Van Dishoeck et al.
(1995).
Van Dishoeck et al. (1995). The data are analyzed us-
ing three methods: assuming LTE (with a rotation dia-
gram), assuming SE (using RADEX), and using a Monte
Carlo program. The free parameters for the LTE fit are
the excitation temperature Tex and the column density
Figure 6 shows the distributions of the χ2 parameter for
the LTE and SE fits, calculated in the standard way (see,
e.g., Van der Tak et al. 2000b; Schöier et al. 2002) as-
suming a 20% uncertainty for all observed points except
the line with the highest upper level energy where a 30%
uncertainty was used. As seen from the figure, the non-
LTE method gives a better fit to the data, as quantified
by the lower minimum χ2 value. This result is not nec-
essarily surprising given that more free parameters are
available. A more important difference is that the esti-
mates of temperature and column density between the
two methods are substantially different, in particular the
temperature (50 vs 150 K). Since the non-LTE method
involves fewer assumptions about the physical state of
the cloud, its results are to be preferred.
These results illustrate that rotation diagrams may give
misleading results when determining physical proper-
ties of interstellar gas clouds (cf. Johnstone et al. 2003
for the case of CH3OH). Figure 5 also demonstrates that
temperatures and column densities derived from rota-
tion diagrams tend to depend on which lines happen to
have been observed (cf. the HCO+ case in § 4.1.2). From
other data, IRAS 16293–2422 is actually known to have
a gradient in temperature and density throughout its en-
velope, which cannot be modelled properly with either
technique. For such situation, a full Monte Carlo radia-
tive transfer method is needed in which both the physi-
cal conditions and the abundances can vary with radius
(Fig. 5, circles). Nevertheless, the column densities and
abundances inferred with RADEX using the physical con-
ditions inferred from the line ratios differ by only a fac-
tor of a few from those found with the more sophis-
ticated analysis, at least for the particular zone of the
source to which those conditions apply (Schöier et al.
2002).
5. Conclusions
We have presented a computer program to analyze spec-
tral line observations at radio and infrared wavelengths,
based on the escape probability approximation. The pro-
gram can be used for any molecule for which collisional
data exist; such input data are available in the required
format from the LAMDA database. The program can be
used for optical depths from ≈–0.1 to ≈100.
The limited number of free parameters makes RADEX
very useful to rapidly analyze large datasets. As an
example, observed line intensity ratios may be com-
pared with the plots in Appendix C to estimate density
and kinetic temperature. Ratios of other lines and other
molecules may be easily computed using the Python
scripts included in the RADEX distribution. The program
may also be used to create synthetic spectra. This capa-
bility will be important to model the THz line surveys
from the HIFI instrument onboard the Herschel space
observatory.
In the future, we plan to incorporate a multi-zone es-
cape probability formalism (Poelman & Spaans 2005;
Elitzur & Asensio Ramos 2006) which will enable
10 Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra
depths, the calculation may also start from LTE condi-
tions rather than from optically thin statistical equilib-
rium. Robust convergence may be achieved by starting
from either initial condition and requiring the two an-
swers to be equal. For the modeling of crowded spec-
tra, the effects of line overlap will also need to be con-
sidered, for instance in the ‘all or nothing’ approach
(Cesaroni & Walmsley 1991). Such spectra will be rou-
tinely observed with the superb resolution and sensitiv-
ity of ALMA.
Our program is free for anybody to use for science, pro-
vided that appropriate reference is made to this paper.
For any other purpose such as to incorporate the pro-
gram into other packages which may be distributed to
the public, prior agreement with the authors is needed.
Acknowledgements. The authors wish to thank Huib Jan van
Langevelde for his efforts in documenting RADEX, and Erik
Deul for computing support at Leiden Observatory. JHB and
FLS acknowledge the Swedish Research Council for financial
support. FvdT and EvD thank the Netherlands Organization
for Scientific Research (NWO) and the Netherlands Research
School for Astronomy (NOVA). Finally we thank Volker
Ossenkopf, Marco Spaans, and an anonymous referee for
helpful comments on the manuscript.
References
Bernes, C. 1979, A&A, 73, 67
Black, J. H. 1994, in ASP Conf. Ser. 58: The First
Symposium on the Infrared Cirrus and Diffuse
Interstellar Clouds, ed. R. M. Cutri & W. B. Latter,
Black, J. H. 1998, in Chemistry and Physics of
Molecules and Grains in Space. Faraday Discussions
No. 109, 257
Black, J. H. 2000, in IAU Symposium 197 –
Astrochemistry: From Molecular Clouds to Planetary
Systems, ed. Y. C. Minh & E. F. van Dishoeck, 81
Black, J. H. & van Dishoeck, E. F. 1987, ApJ, 322, 412
Blake, G. A., Sutton, E. C., Masson, C. R., & Phillips,
T. G. 1987, ApJ, 315, 621
Blake, G. A., van Dishoek, E. F., Jansen, D. J.,
Groesbeck, T. D., & Mundy, L. G. 1994, ApJ, 428,
Burton, M. G., Hollenbach, D. J., & Tielens, A. G. G. M.
1990, ApJ, 365, 620
Cannon, C. J. 1985, The transfer of spectral line radia-
tion (Cambridge: University Press)
Carroll, T. J. & Goldsmith, P. F. 1981, ApJ, 245, 891
Castor, J. I. 1970, MNRAS, 149, 111
Cesaroni, R. & Walmsley, C. M. 1991, A&A, 241, 537
Comito, C., Schilke, P., Phillips, T. G., et al. 2005, ApJS,
156, 127
Daniel, F., Cernicharo, J., & Dubernet, M.-L. 2006, ApJ,
648, 461
De Jong, T., Boland, W., & Dalgarno, A. 1980, A&A,
91, 68
De Jong, T., Dalgarno, A., & Chu, S.-I. 1975, ApJ, 199,
Dubernet, M. L. 2005, in IAU Symposium, ed. D. C.
Lis, G. A. Blake, & E. Herbst, 235
Elitzur, M. 1992, Astronomical masers (Kluwer
Academic Publishers)
Elitzur, M. & Asensio Ramos, A. 2006, MNRAS, 365,
Evans, II, N. J., Lee, J.-E., Rawlings, J. M. C., & Choi,
M. 2005, ApJ, 626, 919
Expósito, J. P. F., Agúndez, M., Tercero, B., Pardo, J. R.,
& Cernicharo, J. 2006, ApJ, 646, L127
Ferland, G. J. 2003, ARA&A, 41, 517
Fixsen, D. J., Bennett, C. L., & Mather, J. C. 1999, ApJ,
526, 207
Fixsen, D. J. & Mather, J. C. 2002, ApJ, 581, 817
Flower, D. R. 1989, Physics Reports, 174, 1
Fuente, A., Black, J. H., Martı́n-Pintado, J., et al. 2000,
ApJ, 545, L113
Genzel, R. 1991, in NATO ASIC Proc. 342: The Physics
of Star Formation and Early Stellar Evolution, ed.
C. J. Lada & N. D. Kylafis, 155
Goicoechea, J. R., Pety, J., Gerin, M., et al. 2006, A&A,
456, 565
Goldreich, P. & Scoville, N. 1976, ApJ, 205, 144
Goldsmith, P. F. & Langer, W. D. 1999, ApJ, 517, 209
Gray, M. D. & Field, D. 1995, A&A, 298, 243
Gredel, R., Lepp, S., Dalgarno, A., & Herbst, E. 1989,
ApJ, 347, 289
Hauschildt, H., Güsten, R., Phillips, T. G., et al. 1993,
A&A, 273, L23
Helmich, F. P., Jansen, D. J., de Graauw, T., Groesbeck,
T. D., & van Dishoeck, E. F. 1994, A&A, 283, 626
Helmich, F. P. & van Dishoeck, E. F. 1997, A&AS, 124,
Hogerheijde, M. R., Jansen, D. J., & van Dishoeck, E. F.
1995, A&A, 294, 792
Hogerheijde, M. R. & van der Tak, F. F. S. 2000, A&A,
362, 697
Jakob, H., Kramer, C., Simon, R., et al. 2007, A&A,
461, 999
Jansen, D. J. 1995, Ph.D. Thesis, Leiden University
Jansen, D. J., van Dishoeck, E. F., & Black, J. H. 1994,
A&A, 282, 605
Johnstone, D., Boonman, A. M. S., & van Dishoeck,
E. F. 2003, A&A, 412, 157
Leung, C.-M. & Liszt, H. S. 1976, ApJ, 208, 732
Leurini, S., Schilke, P., Menten, K. M., et al. 2004,
A&A, 422, 573
Mangum, J. G. & Wootten, A. 1993, ApJS, 89, 123
Maret, S., Ceccarelli, C., Tielens, A. G. G. M., et al.
2005, A&A, 442, 527
Mathis, J. S., Mezger, P. G., & Panagia, N. 1983, A&A,
128, 212
Mihalas, D. 1978, Stellar atmospheres (2nd edition)
(San Francisco, W. H. Freeman and Co.)
Müller, H. S. P., Thorwirth, S., Roth, D. A., &
Winnewisser, G. 2001, A&A, 370, L49
Ossenkopf, V., Trojan, C., & Stutzki, J. 2001, A&A,
378, 608
Osterbrock, D. E. & Ferland, G. J. 2006, Astrophysics
Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra 11
Poelman, D. R. & Spaans, M. 2005, A&A, 440, 559
Prasad, S. S. & Tarafdar, S. P. 1983, ApJ, 267, 603
Roueff, E., Felenbok, P., Black, J. H., & Gry, C. 2002,
A&A, 384, 629
Rybicki, G. B. & Lightman, A. P. 1979, Radiative
processes in astrophysics (New York, Wiley-
Interscience)
Schöier, F. L. 2000, Ph.D. Thesis, Stockholm University
Schöier, F. L., Jørgensen, J. K., van Dishoeck, E. F., &
Blake, G. A. 2002, A&A, 390, 1001
Schöier, F. L., van der Tak, F. F. S., van Dishoeck, E. F.,
& Black, J. H. 2005, A&A, 432, 369
Sobolev, V. 1960, Moving envelopes of stars (Harvard
University Press)
Spaans, M. & van Langevelde, H. J. 1992, MNRAS,
258, 159
Stutzki, J. & Winnewisser, G. 1985, A&A, 144, 13
Tafalla, M., Myers, P. C., Caselli, P., Walmsley, C. M.,
& Comito, C. 2002, ApJ, 569, 815
Takahashi, J. & Uehara, H. 2001, ApJ, 561, 843
Takahashi, T., Silk, J., & Hollenbach, D. J. 1983, ApJ,
275, 145
Van der Tak, F., Neufeld, D., Yates, J., et al. 2005, in
The Dusty and Molecular Universe: A Prelude to
Herschel and ALMA, ed. A. Wilson, 431–432
Van der Tak, F. F. S. 2005, in IAU Symposium
227: Massive Star Birth, ed. R. Cesaroni, M. Felli,
E. Churchwell, & M. Walmsley (Cambridge:
University Press), 70–79
Van der Tak, F. F. S. 2006, Phil. Trans. R. Soc. Lond.,
364, 3101
Van der Tak, F. F. S., van Dishoeck, E. F., & Caselli, P.
2000a, A&A, 361, 327
Van der Tak, F. F. S., van Dishoeck, E. F., Evans, II,
N. J., Bakker, E. J., & Blake, G. A. 1999, ApJ, 522,
Van der Tak, F. F. S., van Dishoeck, E. F., Evans, II,
N. J., & Blake, G. A. 2000b, ApJ, 537, 283
Van Dishoeck, E. F., Blake, G. A., Jansen, D. J., &
Groesbeck, T. D. 1995, ApJ, 447, 760
Van Dishoeck, E. F. & Hogerheijde, M. R. 1999, in
NATO ASIC Proc. 540: The Origin of Stars and
Planetary Systems, ed. C. J. Lada & N. D. Kylafis,
Van Zadelhoff, G.-J., Dullemond, C. P., van der Tak,
F. F. S., et al. 2002, A&A, 395, 373
Wright, E. L., Mather, J. C., Bennett, C. L., et al. 1991,
ApJ, 381, 200
Yates, J. A., Field, D., & Gray, M. D. 1997, MNRAS,
285, 303
Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra, Online Material p 1
Online Material
Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra, Online Material p 2
Appendix A: Program input and output
A.1. Program input
The input parameters to RADEX are the following:
1. The name of the molecular data file to be used.
2. The name of the file to write the output to.
3. The frequency range for the output file [GHz]. All
transitions from the molecular data file are always
taken into account in the calculation, but often it is
practical to write only a limited set of lines to the
output.
4. The kinetic temperature of the cloud [K].
5. The number of collision partners to be used. Most
users will want H2 as only collision partner, but in
more specialized cases, additional collisions with
H or electrons may for instance play a role. See
the molecular datafiles for details. For some species
(CO, atoms) separate collision data for ortho and
para H2 exist; the program then uses the thermal or-
tho/para ratio unless the user specifies otherwise.
6. The name (case-insensitive) and the density [cm−3]
of each collision partner. Possibilities are H2, p-H2,
o-H2, electrons, atomic H, He, and H
7. The temperature of the background radiation field
– If >0, a black body at this temperature is used.
Most users will adopt the cosmic microwave
background at TCMB=2.725(1+z) K for a galaxy
at redshift z.
– If =0, the average interstellar radiation field
(ISRF) is used, taken from Black (1994) with
modifications described in § 3.2. This spectrum
is not adjustable by a scale factor because it con-
sists of several components that are not expected
to scale linearly with respect to each other.
– If <0, a user-defined radiation field is used, spec-
ified by values of frequency [cm−1], intensity
[Jy nsr−1], and dilution factor [dimensionless].
Spline interpolation and extrapolation are ap-
plied to this table. The intensity need not be
specified at all frequencies of the line list, but
a warning message will appear if extrapolation
(rather than interpolation) is required.
8. The column density of the molecule [cm−2].
9. The FWHM line width [km s−1].
A.2. Program output
The output file written by RADEX first replicates the in-
put parameters, and then lists the following quantities
for each spectral line within the specified frequency
range.
1. Quantum numbers, upper state energy [K], fre-
quency [GHz], and wavelength [µm]. These num-
bers are just copied from the molecular data file,
which usually comes from the LAMDA database.
Frequencies from this database are generally of
line catalogs such as CDMS (Müller et al. 2001)12
should be consulted.
2. The excitation temperature [K] as defined in
Eq. (10). In general, different lines have different
excitation temperatures. Lines are thermalized if
Tex=Tkin; in LTE, all lines are thermalized.
3. The line optical depth, defined as the optical depth
of the equivalent rectangular line shape (φν = 1/∆ν).
4. The line intensity, defined as the Rayleigh-Jeans
equivalent temperature TR [K].
5. The line flux, defined as the velocity-integrated
intensity, both in units of K km s−1 (common in
radio astronomy) and of erg cm−2 s−1 (common
in infrared astronomy). The line flux is calcu-
lated as 1.0645TR∆V, where the factor 1.0645 =√
ln 2) converts the adopted rectangular line
profile into a Gaussian profile with an FWHM of
∆V. The integrated profile is useful to estimate the
total emission in the line, but it has limited mean-
ing at high optical depths, because the change of
optical depth over the line profile is not taken into
account. Proper modeling of optically thick lines re-
quires programs that resolve the source both spec-
trally and spatially (see § 4.1.2 for further discus-
sion).
Auxiliary output files can be generated, for example to
display the adopted continuum spectrum.
Appendix B: Coding standards
The original version of RADEX was written in such a
way as to minimize the use of machine memory which
was expensive until a decade ago. Nowadays, clarity
and easy maintenance are more important requirements,
which is why the source code has been re-written fol-
lowing the rules below. We hope that these rules will be
useful for the development of other ‘open source’ as-
tronomical software. For further guidelines on scientific
programming we recommend the Software Carpentry13
on-line course.
1. All the action is in subroutines; the sole purpose of
the main program is to show the structure of the pro-
gram. The subroutines are grouped into several files
for a better overview; compilation instructions for
automated builds on a variety of platforms are in a
Makefile.
2. The program text is interspersed with comments at
a ratio of ≈1:1. In particular, each subroutine starts
with a description of its contents, its input and out-
put, where incoming calls come from, and which
calls go out. Then the properties of each variable are
described: contents, units and type.
3. Variables and subroutines have descriptive names
with a length of 5–10 letters. Names of integer vari-
ables start with the letters i..n; names of floating-
point variables (always of double precision) with
a..h or o..z. There are no specific namings for vari-
ables of character or logical type, as for example
http://cdms.de
http://www.swc.scipy.org/
Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra, Online Material p 3
in CLOUDY. Names are always based on the English
language. We do not use upper-, lower-, or camel-
case to distinguish types of identifiers, as some pro-
grams do.
4. Loops are marked by indenting the program text.
The loop variables are always called ilev, iline ..,
never just i.
5. Subroutines start with a check whether the input pa-
rameters have reasonable values. Such checks force
soft landings if necessary, and avoid runtime errors.
6. Statements with calculations use spaces around
the =, + and – symbols, but not around others.
Calculations that consist of multiple steps are split
over as many program lines. Multiple assignments
in a row are aligned at the = sign.
7. The program text avoids “magic numbers” both in
calculations and in definitions. Numbers that are of-
ten used such as Planck’s constant are defined at one
central place in the program. Similarly, often-used
variables such as physical parameters are stored in
shared memory rather than passed on via subroutine
calls.
Appendix C: Diagnostic plots of molecular
line ratios
We have used the RADEX program and the LAMDA
database to calculate line ratios of several commonly
observed molecules for a range of kinetic tempera-
tures and H2 densities. The plots in this Appendix may
be used by observers to estimate physical conditions
from their data. Line ratios have the advantage of be-
ing less sensitive to calibration errors than absolute line
strengths, especially when the two lines have been mea-
sured with the same telescope, receiver and spectrome-
The calculations assume a column density of 1012 cm−2,
a line width of 1.0 km s−1, and use a 2.73 K blackbody
as background radiation field. Under these conditions,
the lines are optically thin, so that the line ratios do not
depend on column density. The calculations also assume
that the emission in both lines fills the telescope beams
equally, which may be the case if the lines are close
in frequency. However, lines are generally measured in
beams of different sizes, and the observations need to
be corrected to account for this effect, if the source is
known to be compact.
Linear molecules such as CO (Fig. C.1) are tracers
of density at low densities, when collisions compete
with radiative decay. At higher densities, the excitation
becomes thermalized and the line ratios are sensitive
to temperature. For a given molecule, moving up the
J−ladder means probing higher temperatures and densi-
ties. Note that for the column densities of typical dense
interstellar clouds, the CO lines are optically thick, and
observations of 13CO or even rarer isotopologues must
be used to probe physical conditions.
The critical densities of molecular lines scale as µ2ν3,
where µ is the permanent dipole moment of the
molecule and ν is the frequency of the line. Indeed, the
CS molecule (Fig. C.2) has a larger dipole moment than
CO, and its line ratios are mainly probes of the den-
sity. The small frequency spacing between the lines of
CS makes this molecule very useful to probe density
structure (e.g., Van der Tak et al. 2000b). The HCO+
and HCN molecules (Fig. C.3) display similar trends to
CS, although their line spacing is not as small.
Non-linear molecules such as H2CO (Figs. C.4 and C.5)
have the advantage that both temperature and density
may be probed within the same frequency range. Ratios
of lines from different J−states tend to be density tracers
(left panels), while ratios of lines from the same J−state
but different K−states are mostly temperature probes
(right panels). The lines of H2CO are often quite strong,
making this molecule a favourite tracer of temperature
and density (Mangum & Wootten 1993). Other asym-
metric molecules have also been used, such as H2CS
(Blake et al. 1994) and CH3OH (Leurini et al. 2004) al-
though abundance variations from source to source or
even within sources often complicate the interpretation
(Johnstone et al. 2003).
Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra, Online Material p 4
Appendix D: The Python scripts
The RADEX distribution comes with two scripts,
radex line.py and radex grid.py, to automate
standard modeling procedures. The scripts are written
in Python and are run from the Unix shell command
line after manual editing of parameters.
The first script, radex line.py, calculates the column
density of a molecule from an observed line intensity,
given estimates of kinetic temperature and H2 volume
density. The input parameters are:
1. The kinetic temperature [K]
2. The number density of H2 molecules [cm
3. The temperature of the background radiation field
[K], usually 2.73 (CMB).
4. The name of the molecule (or molecular data file)
5. The frequency of the line [GHz]
6. The observed line intensity [K]
7. The observed line width [km s−1]
Furthermore, two numerical parameters have good de-
fault values but will need to be changed occasionally:
1. The free spectral range around the line (default
10%): this number must be smaller for molecules
with many line close in frequency, such as CH3OH.
The program uses this parameter to find the ob-
served line from the list of lines in the molecular
model.
2. The required accuracy (default 10%): The default
corresponds to the calibration uncertainty of most
telescopes.
The script iterates on column density until the observed
and modeled line fluxes agree to within the desired ac-
curacy. The best-fit column density is directly written
to the screen. The file radex.out gives details of the
best-fit model.
The second script, radex grid.py, runs a series of
RADEX models to estimate the kinetic temperature
and/or the volume density from an observed line ratio.
The user needs to set the following input parameters:
1. The grid boundaries: minimum and maximum ki-
netic temperature [K] and minimum and maximum
H2 volume density [cm
2. The temperature of the background radiation field
[K], usually 2.73 (CMB).
3. The molecular column density [cm−2]. For the
illustrative plots in Appendix C, a low value
(N=1012 cm−2) was used, so that the line ratios are
independent of column density (optically thin limit).
However, in modeling specific observations, it is
worth varying this parameter to assess the sensitivity
of the line ratio to column density.
4. The observed line width [km s−1], usually an aver-
age of the widths of the two lines.
1. The number of grid points along the temperature and
density axes.
2. The free spectral range around the line (see above)
The name of the molecule and a list of observed line
ratios and names of associated output files are given at
the start of the main program. The script produces a file
radex.out which is a tabular listing of temperature,
log density, and line ratio. This results may be plotted
with the user’s favourite plotting program.
Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra, Online Material p 5
Fig. C.1. Line ratios of CO in the optically thin limit
as a function of kinetic temperature and H2 density.
Contours are spaced linearly and some contours are la-
beled for easy identification.
Fig. C.2. Line ratios of CS in the optically thin limit
as a function of kinetic temperature and H2 density.
Contours are spaced linearly and some contours are la-
beled for easy identification.
Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra, Online Material p 6
Fig. C.3. Line ratios of HCO+ and HCN in the optically
thin limit as a function of kinetic temperature and H2
density. Contours are spaced linearly and some contours
are labeled for easy identification.
Fig. C.4. Line ratios of o-H2CO in the optically thin
limit as a function of kinetic temperature and H2 den-
sity. Contours are spaced linearly and some contours are
labeled for easy identification.
Van der Tak et al.: Fast non-LTE analysis of interstellar line spectra, Online Material p 7
Fig. C.5. Line ratios of p-H2CO in the optically thin
limit as a function of kinetic temperature and H2 den-
sity. Contours are spaced linearly and some contours are
labeled for easy identification.
	Introduction
	Radiative transfer and molecular excitation
	Basic formalism
	Molecular line cooling
	Escape probability
	The program
	Basic capabilities
	Background radiation field
	Chemical formation and destruction rates
	Calculation
	Results
	Limitations of the program
	Comparison with other methods
	Homogeneous models
	The case of CO
	The case of HCO+
	Observations of a Young Stellar Object
	Conclusions
	Program input and output
	Program input
	Program output
	Coding standards
	Diagnostic plots of molecular line ratios
	The Python scripts
ABSTRACT
  The large quantity and high quality of modern radio and infrared line
observations require efficient modeling techniques to infer physical and
chemical parameters such as temperature, density, and molecular abundances. We
present a computer program to calculate the intensities of atomic and molecular
lines produced in a uniform medium, based on statistical equilibrium
calculations involving collisional and radiative processes and including
radiation from background sources. Optical depth effects are treated with an
escape probability method. The program is available on the World Wide Web at
http://www.sron.rug.nl/~vdtak/radex/index.shtml . The program makes use of
molecular data files maintained in the Leiden Atomic and Molecular Database
(LAMDA), which will continue to be improved and expanded. The performance of
the program is compared with more approximate and with more sophisticated
methods. An Appendix provides diagnostic plots to estimate physical parameters
from line intensity ratios of commonly observed molecules. This program should
form an important tool in analyzing observations from current and future radio
and infrared telescopes.

<|endoftext|><|startoftext|>
Introduction
In the statistical thermodynamic approach to the theory of simple liquids,
there is a close connection between the thermodynamic and structural prop-
erties [1–4]. These properties depend on the intermolecular potential of the
system, which is generally assumed to be well represented by pair interactions.
The simplest model pair potential is that of a hard-core fluid (rods, disks,
spheres, hyperspheres) in which attractive forces are completely neglected. In
fact, it is a model that has been most studied and has rendered some analytical
results, although up to this day no general (exact) explicit expression for the
equation of state is available, except for the one-dimensional case. Something
similar applies to the structural properties. An interesting feature concerning
the thermodynamic properties is that in hard-core systems the equation of
state depends only on the contact values of the radial distribution functions.
In the absence of a completely analytical approach, the most popular methods
http://arxiv.org/abs/0704.0157v1
2 M. López de Haro, S. B. Yuste and A. Santos
to deal with both kinds of properties of these systems are integral equation
theories and computer simulations.
It is well known that in real gases and liquids at high temperatures the
state and thermodynamic properties are determined almost entirely by the
repulsive forces among molecules. At lower temperatures, attractive forces
become significant, but even in this case they affect very little the configu-
ration of the system at moderate and high densities. These facts are taken
into account in the application of the perturbation theory of fluids, where
hard-core fluids are used as the reference systems in the computation of the
thermodynamic and structural properties of real fluids. However, successful
results using perturbation theory are rather limited due to the fact that, as
mentioned above, there are in general no exact (analytical) expressions for
the thermodynamic and structural properties of the reference systems which
are in principle required in the calculations. On the other hand, in the realm
of soft condensed matter the use of the hard-sphere model in connection, for
instance, with sterically stabilized colloidal systems is quite common. This is
due to the fact that nowadays it is possible to prepare (almost) monodisperse
spherical colloidal particles with short-ranged harshly repulsive interparticle
forces that may be well described theoretically with the hard-sphere potential.
This chapter presents an overview of the efforts we have made over the
last few years to compute the thermodynamic and structural properties of
hard-core systems using relatively simple (approximate) analytical methods.
It is structured as follows. In Section 2 we describe our proposals to derive
the contact values of the radial distribution functions of a multicomponent
mixture (with an arbitrary size distribution, either discrete or continuous) of
d-dimensional hard spheres from the use of some consistency conditions and
the knowledge of the contact value of the radial distribution function of the
corresponding single component system. In turn, these contact values lead to
equations of state both for additive and non-additive hard spheres. Some con-
sequences of such equations of state, in particular the demixing transition, are
briefly analyzed. This is followed in Section 3 by the description of the Ratio-
nal Function Approximation method to obtain analytical expressions for the
structural quantities of three-dimensional single component and multicompo-
nent fluids. The only required inputs in this approach are the contact values of
the radial distribution functions and so the connection with the work of the
previous section follows naturally. Structural properties of related systems,
like sticky hard spheres or square-well fluids, that may also be tackled with
the same philosophy are also discussed in Section 4. Section 5 provides an
account of the reformulation of the perturbation theory of liquids using the
results of the Rational Function Approximation method for a single compo-
nent hard-sphere fluid and its illustration in the case of the Lennard–Jones
fluid. In the final section, we provide some perspectives of the achievements
obtained so far and of the challenges that remain ahead.
Alternative Approaches to Hard-Sphere Liquids 3
2 Contact Values and Equations of State for Mixtures
As stated in the Introduction, a nice feature of hard-core fluids is that the
expressions of all their thermodynamic properties in terms of the radial distri-
bution functions (RDF) are particularly simple. In fact, for these systems the
internal energy reduces to that of the ideal gas and in the pressure equation
it is only the contact values rather than the full RDF which appear explicitly.
In this section we present our approach to the derivation of the contact values
of hard-core fluid mixtures in d dimensions.
2.1 Additive Systems in d Dimensions
If σij denotes the distance of separation at contact between the centers of
two interacting fluid particles, one of species i and the other of species j,
the mixture is said to be additive if σij is just the arithmetic mean of the
hard-core diameters of each species. Otherwise, the system is non-additive.
We deal in this subsection and in Subsection 2.2 with additive systems, while
non-additive hard-core mixtures will be treated in Subsection 2.3.
Definitions
Let us consider an additive mixture of hard spheres (HS) in d dimensions with
an arbitrary number N of components. In fact, our discussion will remain valid
for N → ∞, i.e., for polydisperse mixtures with a continuous distribution of
sizes.
The additive hard core of the interaction between a sphere of species i and
a sphere of species j is σij =
(σi + σj), where the diameter of a sphere of
species i is σii = σi. Let the number density of the mixture be ρ and the mole
fraction of species i be xi = ρi/ρ, where ρi is the number density of species i.
From these quantities one can define the packing fraction η = vdρMd, where
vd = (π/4)
d/2/Γ (1 + d/2) is the volume of a d-dimensional sphere of unit
diameter and
Mn ≡ 〈σn〉 =
i (1)
denotes the nth moment of the diameter distribution.
In a HS mixture, the knowledge of the contact values gij(σij) of the RDF
gij(r), where r is the distance, is important for a number of reasons. For
example, the availability of gij(σij) is sufficient to get the equation of state
(EOS) of the mixture via the virial expression
Z(η) = 1 +
i,j=1
xixjσ
ijgij(σij), (2)
4 M. López de Haro, S. B. Yuste and A. Santos
where Z = p/ρkBT is the compressibility factor of the mixture, p being the
pressure, kB the Boltzmann constant, and T the absolute temperature.
The exact form of gij(σij) as functions of the packing fraction η, the set
of diameters {σk}, and the set of mole fractions {xk} is only known in the
one-dimensional case, where one simply has [5]
gij(σij) =
, (d = 1). (3)
Consequently, for d ≥ 2 one has to resort to approximate theories or empirical
expressions. For hard-disk mixtures, an accurate expression is provided by
Jenkins and Mancini’s (JM) approximation [6, 7],
gJMij (σij) =
(1− η)2
σiσjM1
σijM2
, (d = 2). (4)
The associated compressibility factor is
ZJM(η) =
1 + η/8
(1 − η)2
, (d = 2). (5)
In the case of three-dimensional systems, some important analytical expres-
sions for the contact values and the corresponding compressibility factor also
exist. For instance, the expressions which follow from the solution of the
Percus–Yevick (PY) equation of additive HS mixtures by Lebowitz [8] are
gPYij (σij) =
(1− η)2
σiσjM2
σijM3
, (d = 3), (6)
ZPY(η) =
(1− η)2
(1 − η)2
, (d = 3). (7)
Also analytical are the results obtained from the Scaled Particle Theory (SPT)
[9–12],
gSPTij (σij) =
(1 − η)2
σiσjM2
σijM3
(1 − η)3
σiσjM2
σijM3
, (d = 3),
ZSPT(η) =
(1− η)2
(1− η)3
, (d = 3). (9)
Neither the PY nor the SPT lead to particularly accurate values and so
Boubĺık [13] and, independently, Grundke and Henderson [14] and Lee and
Levesque [15] proposed an interpolation between the PY and the SPT contact
values, that we will refer to as the BGHLL values:
gBGHLLij (σij) =
(1− η)2
σiσjM2
σijM3
(1− η)3
σiσjM2
σijM3
, (d = 3).
Alternative Approaches to Hard-Sphere Liquids 5
This leads through Eq. (2) to the widely used and rather accurate Boubĺık–
Mansoori–Carnahan–Starling–Leland (BMCSL) EOS [13,16] for HS mixtures:
ZBMCSL(η) =
(1− η)2
η2(3− η)
(1 − η)3
, (d = 3). (11)
Refinements of the BGHLL values have been subsequently introduced, among
others, by Henderson et al. [17–22], Matyushov and Ladanyi [23], and Barrio
and Solana [24] to eliminate some drawbacks of the BMCSL EOS in the so-
called colloidal limit of binary HS mixtures. On a different path, but also
having to do with the colloidal limit, Viduna and Smith [25] have proposed
a method to obtain contact values of the RDF of HS mixtures from a given
EOS. However, none of these proposals may be easily generalized so as to be
valid for any dimensionality and any number of components. Therefore, if one
wants to have a more general framework able to deal with arbitrary d and N
an alternative strategy is called for.
Universality Ansatz
In order to follow our alternative strategy, it is useful to make use of exact
limit results that can help one in the construction of approximate expressions
for gij(σij). Let us consider first the limit in which one of the species, say i,
is made of point particles, i.e., σi → 0. In that case, gii(σi) takes the ideal
gas value, except that one has to take into account that the available volume
fraction is 1− η. Thus,
gii(σi) =
. (12)
An even simpler situation occurs when all the species have the same size,
{σk} → σ, so that the system becomes equivalent to a single component
system. Therefore,
{σk}→σ
gij(σij) = gs, (13)
where gs is the contact value of the RDF of the single component fluid at
the same packing fraction η as that of the mixture. Table 1 lists some of
the most widely used proposals for the contact value gs and the associated
compressibility factor
Zs = 1 + 2
d−1ηgs (14)
in the case of the single component HS fluid.
Equations (12) and (13) represent the simplest and most basic conditions
that gij(σij) must satisfy. There is a number of other less trivial consistency
conditions [11, 17, 19, 20, 23, 24,32–34], some of which will be used later on.
In order to proceed, in line with a property shared by earlier proposals
[see, in particular, Eqs. (4), (6), (8), and (10)], we assume that, at a given
packing fraction η, the dependence of gij(σij) on the parameters {σk} and
{xk} takes place only through the scaled quantity
6 M. López de Haro, S. B. Yuste and A. Santos
Table 1. Some expressions of gs and Zs for the single component HS fluid. In the
SHY proposal, ηcp = (
3/6)π is the crystalline close-packing fraction for hard disks.
In the LM proposal, b3 and b4 are the (reduced) third and fourth virial coefficients,
ζ(η) = 1.2973(59)−0.062(13)η/ηcp for d = 4, and ζ(η) = 1.074(16)+0.163(45)η/ηcp
for d = 5, where the values of the close-packing fractions are ηcp = π
2/16 ≃ 0.617
and ηcp = π
2/30 ≃ 0.465 for d = 4 and d = 5, respectively.
d gs Zs Label Ref.
1− 7η/16
(1− η)2
1 + η2/8
(1− η)2
H [26]
1− η(2ηcp − 1)/2η2cp
1− 2η + η2(ηcp − 1)/2η2cp
1− 2η + η2(ηcp − 1)/2η2cp
SHY [27]
2 gHs −
27(1− η)4
ZHs −
26(1− η)4
L [28]
1 + η/2
(1− η)2
1 + 2η + 3η2
(1− η)2
PY [29]
1− η/2 + η2/4
(1− η)3
1 + η + η2
(1− η)3
SPT [9]
1− η/2
(1− η)3
1 + η + η2 − η3
(1− η)3
CS [30]
1 + [21−db3 − ζ(η)b4/b3]η
1− ζ(η)(b4/b3)η + [ζ(η)− 1] 21−db4η2
1 + 2d−1ηgLMs LM [31]
zij ≡
. (15)
More specifically, we assume
gij(σij) = G(η, zij), (16)
where the function G(η, z) is universal in the sense that it is a common func-
tion for all the pairs (i, j), regardless of the composition and number of compo-
nents of the mixture. Of course, the function G(η, z) is in principle different for
each dimensionality d. To clarify the implications of this universality ansatz,
let us imagine two mixtures M and M′ having the same packing fraction
η but strongly differing in the set of mole fractions, the sizes of the parti-
cles, and even the number of components. Suppose now that there exists a
pair (i, j) in mixture M and another pair (i′, j′) in mixture M′ such that
zij = zi′j′ . Then, according to Eq. (16), the contact value of the RDF for the
pair (i, j) in mixture M is the same as that for the pair (i′, j′) in mixture
M′, i.e., gij(σij) = gi′j′ (σi′j′ ). In order to ascribe a physical meaning to the
parameter zij , note that the ratio Md−1/Md can be understood as a “typ-
ical” inverse diameter (or curvature) of the particles of the mixture. Thus,
z−1ij =
(σ−1i + σ
j )/(Md−1/Md) represents the arithmetic mean curvature,
in units of Md−1/Md, of a particle of species i and a particle of species j.
Alternative Approaches to Hard-Sphere Liquids 7
Once the ansatz (16) is adopted, one may use the limits in (12) and (13)
to get G(η, z) at z = 0 and z = 1, respectively. Since zii → 0 in the limit
σi → 0, insertion of Eq. (12) into (16) yields
G(η, 0) = 1
≡ G0(η). (17)
Next, if all the diameters are equal, zij → 1, so that Eq. (13) implies that
G(η, 1) = gs. (18)
Linear Approximation
As the simplest approximation [35], one may assume a linear dependence of
G on z that satisfies the basic requirements (17) and (18), namely
G(η, z) = 1
z. (19)
Inserting this into Eq. (16), one has
ge1ij (σij) =
. (20)
Here, the label “e1” is meant to indicate that (i) the contact values used are
an extension of the single component contact value gs and that (ii) G(η, z) is
a linear polynomial in z. This notation will become handy below. Although
the proposal (20) is rather crude and does not produce especially accurate
results for gij(σij) when d ≥ 3, it nevertheless leads to an EOS that exhibits
an excellent agreement with simulations in 2, 3, 4, and 5 dimensions, provided
that an accurate gs is used as input [35–39]. This EOS may be written as
Ze1(η) = 1 +
2d−1(Ω0 −Ω1) + [Zs(η)− 1]Ω1, (21)
where the coefficients Ωm depend only on the composition of the mixture and
are defined by
Ωm = 2
−(d−m)
Mmd−1
Mm+1d
Mn+mMd−n. (22)
In particular, for d = 2 and d = 3,
Ze1(η) =
Zs(η) −
, (d = 2), (23)
8 M. López de Haro, S. B. Yuste and A. Santos
Ze1(η) =
Zs(η)−
, (d = 3). (24)
As an extra asset, from Eq. (21) one may write the virial coefficients of
the mixture Bn, defined by
Z = 1 +
Bn+1ρ
n, (25)
in terms of the (reduced) virial coefficients of the single component fluid bn
defined by
Zs = 1 +
bn+1η
n. (26)
The result is
Bn = v
Ω1bn + 2
d−1(Ω0 −Ω1)
. (27)
In the case of binary mixtures, these coefficients are in very good agreement
with the available exact and simulation results [35,37], except when the mix-
ture involves components of very disparate sizes, especially for high dimen-
sionalities. One may perform a slight modification such that this deficiency is
avoided and thus get a modified EOS [37, 40]. For d = 2 and d = 3 it reads
Z(η) = Zs(η) + x1
1− η2
1− η2
− Zs(η)
σ2 − σ1
1− η1
1− η1
− Zs(η)
σ1 − σ2
, (d = 2, 3),
where ηi = vdρiσ
i is the partial volume packing fraction due to species i. In
contrast to most of the approaches (PY, SPT, BMCSL, e1, . . . ), the proposal
(28) expresses Z(η) in terms not only of Zs(η) but also involves Zs
and Zs
. Equation (28) should in principle be useful in particular for
binary mixtures involving components of very disparate sizes. However, it is
slightly less accurate than the one given in Eq. (21) for ordinary mixtures [37].
Quadratic Approximation
In order to improve the proposal contained in Eq. (20), in addition to the
consistency requirements (12) and (13), one may consider the condition stem-
ming from a binary mixture in which one of the species (say i = 1) is much
Alternative Approaches to Hard-Sphere Liquids 9
larger than the other one (i.e., σ1/σ2 → ∞), but occupies a negligible volume
(i.e., x1(σ1/σ2)
d → 0). In that case, a sphere of species 1 is felt as a wall by
particles of species 2, so that [17, 20, 41]
σ1/σ2→∞
x1(σ1/σ2)d→0
g12(σ12)− 2d−1ηg22(σ2)
= 1. (29)
Hence, in the limit considered in Eq. (29), we have z22 → 1, z12 → 2. Conse-
quently, under the universality ansatz (16), one may rewrite Eq. (29) as
G(η, 2) = 1 + 2d−1ηG(η, 1). (30)
Thus, Eqs. (17), (18), and (30) provide complete information on the function
G at z = 0, z = 1, and z = 2, respectively, in terms of the contact value gs of
the single component RDF.
The simplest functional form of G that complies with the above consistency
conditions is a quadratic function of z [42]:
G(η, z) = G0(η) + G1(η)z + G2(η)z2, (31)
where the coefficients G1(η) and G2(η) are explicitly given by
G1(η) = (2− 2d−2η)gs −
2− η/2
1− η , (32)
G2(η) =
1− η/2
− (1− 2d−2η)gs. (33)
Therefore, the explicit expression for the contact values is
ge2ij (σij) =
(2− 2d−2η)gs −
2− η/2
1− η/2
− (1− 2d−2η)gs
. (34)
Following the same criterion as the one used in connection with Eq. (20), the
label “e2” is meant to indicate that (i) the resulting contact values represent
an extension of the single component contact value gs and that (ii) G(η, z)
is a quadratic polynomial in z. Of course, the quadratic form (31) is not the
only choice compatible with conditions (17), (18), and (30). For instance, a
rational function was also considered in Ref. [42]. However, although it is
rather accurate, it does not lead to a closed form for the EOS. In contrast,
when Eq. (34) is inserted into Eq. (2), one gets a closed expression for the
compressibility factor in terms of the packing fraction η and the first few
moments Mn, n ≤ d. The result is
Ze2(η) = 1 + 2
d−2 η
1− η [2(Ω0 − 2Ω1 +Ω2) + (Ω1 −Ω2)η]
+ [Zs(η)− 1]
2Ω1 −Ω2 + 2d−2(Ω2 −Ω1)η
, (35)
10 M. López de Haro, S. B. Yuste and A. Santos
where the quantities Ωm are defined in Eq. (22). Quite interestingly, in the
two-dimensional case Eq. (35) reduces to Eq. (23), i.e.,
Ze1(η) = Ze2(η), (d = 2). (36)
This illustrates the fact that two different proposals for the contact values
gij(σij) can yield the same EOS when inserted into Eq. (2). On the other
hand, for three-dimensional mixtures Eq. (35) becomes
Ze2(η) =
1− η + M
Zs(η)−
, (d = 3),
which differs from Eq. (24). In fact,
Ze1(η)− Ze2(η) =
1 + η
− (1 − 2η)Zs(η)
, (d = 3).
Specific Examples
In this subsection, rather than carrying out an exhaustive comparison with
the wealth of results available in the literature, we will consider only a few
representative examples. In particular, for d = 3, we will restrict ourselves
to a comparison with classical proposals (say BGHLL, PY, and SPT for the
contact values). The comparison with more recent ones may be found in Refs.
[35, 42, 43].
Thus far the development has been rather general since gs remains free in
Eqs. (20) and (34). In order to get specific results, it is necessary to fix gs [cf.
Table 1]. In the one-dimensional case, one has gs = 1/(1− η) and so one gets
the exact result (3) after substitution into Eq. (20). Similarly Eqs. (32) and
(33) lead to G1 = G2 = 0 and so we recover again the exact result.
If in the two-dimensional case we take Henderson’s value [26] gs = g
then the linear approximation (20) reduces to the JM approximation, Eq. (4).
This equivalence can be symbolically represented as geH1ij = g
ij , where the
label “eH1” refers to the extension of Henderson’s single component value in
the linear approximation. While gJMij is very accurate, even better results are
provided by the quadratic form (34), especially if Luding’s value [28] gs = g
is used [44].
In the three-dimensional case, Eq. (20) is of the form of the solution of the
PY equation [8]. In fact, insertion of gs = g
s leads to Eq. (6), i.e., g
gPYij . Similarly, if the SPT expression [9] gs = g
s is used for the single
component contact value in the quadratic approximation (34), we reobtain
the SPT expression for the mixture, Eq. (8). In other words, geSPT2ij = g
On the other hand, if the much more accurate CS [30] expression gs = g
used as input, we arrive at the following expression:
Alternative Approaches to Hard-Sphere Liquids 11
geCS2ij =
η(1 − η/3)
(1 − η)2
σiσjM2
σijM3
η2(1− η/2)
(1− η)3
σiσjM2
σijM3
, (d = 3),
which is different from the BGHLL one, Eq. (10), improves the latter for
zij > 1, and leads to similar results for zij < 1, as comparison with computer
simulations shows [42]. The four approximations (6), (8), (10), and (39) are
consistent with conditions (12) and (13), but only the SPT and eCS2 are also
consistent with condition (29). It should also be noted that if one considers
a binary mixture in the infinite solute dilution limit, namely x1 → 0, so that
z12 → 2/(1 + σ2/σ1), Eq. (39) yields the same result for g12(σ12) as the one
proposed by Matyushov and Ladanyi [23] for this quantity on the basis of
exact geometrical relations. However, the extension that the same authors
propose when there is a non-vanishing solute concentration, i.e., for x1 6= 0,
is different from Eq. (39).
Equation (34) can also be used in the case of hyperspheres (d ≥ 4) [42]. In
particular, a very good agreement with available computer simulations [38] is
obtained for d = 4 and d = 5 by using Luban and Michels [31] value gs = g
0.0 0.1 0.2 0.3 0.4 0.5
-0.02
 eCS1
 eCS2
Fig. 1. Deviation of the compressibility factor from the BMCSL value, as a function
of the packing fraction η for an equimolar three-dimensional binary mixture with
σ2/σ1 = 0.6. The open (Ref. [18]) and closed (Ref. [45]) circles are simulation data.
The lines are the PY EOS (– · · –), the SPT EOS (– · – ·), the eCS1 EOS (· · · ), and
the eCS2 EOS (– – –).
12 M. López de Haro, S. B. Yuste and A. Santos
0.00 0.05 0.10 0.15 0.20 0.25
 4D, / =1/2
 4D, / =1/3
 5D, / =2/5
Fig. 2. Compressibility factor for three equimolar mixtures in 4D and 5D systems.
Lines are the eLM1 predictions, while symbols are simulation data [38].
Now we turn to the compressibility factors (21) and (35), which are ob-
tained from the contact values (20) and (34), respectively. Since they depend
on the details of the composition through the d first moments, they are mean-
ingful even for continuous polydisperse mixtures.
As said above, in the two-dimensional case both Eqs. (21) and (35) reduce
to Eq. (23), which yield very accurate results when a good Zs is used as
input [39, 42, 44]. For three-dimensional mixtures, insertion of Zs = Z
Eqs. (24) and (37) yields
ZeCS1(η) = ZBMCSL(η) +
(1 − η)3M23
M1M3 −M22
, (d = 3), (40)
ZeCS2(η) = ZBMCSL(η)−
(1 − η)2M23
M1M3 −M22
, (d = 3), (41)
where ZBMCSL(η) is given by Eq. (11). Note that ZeCS1(η) > ZBMCSL(η) >
ZeCS2(η). Since simulation data indicate that the BMCSL EOS tends to un-
derestimate the compressibility factor, it turns out that, as illustrated in Fig.
1 for an equimolar binary mixture with σ2/σ1 = 0.6, the performance of ZeCS1
is, paradoxically, better than that of ZeCS2 [42], despite the fact that the un-
derlying linear approximation for the contact values is much less accurate than
the quadratic approximation. This shows that a rather crude approximation
such as Eq. (20) may lead to an extremely good EOS [35, 37–39], which, as
Alternative Approaches to Hard-Sphere Liquids 13
clearly seen in Fig. 1, represents a substantial improvement over the classical
proposals. Interestingly, the EOS corresponding to ZeCS1 has recently been
independently derived as the second order approximation of the Fundamental
Measure Theory for the HS fluid by Hansen-Goos and Roth [46].
In the case of d = 4 and d = 5, use of Zs(η) = Z
s (η) in Eq. (21)
produces a simple extended EOS of a mixture of hard additive hyperspheres
in these dimensionalities. The accuracy of these two EOS for hard hypersphere
mixtures in the fluid region has been confirmed by simulation data [38] for a
wide range of compositions and size ratios. In Fig. 2, this accuracy is explicitly
exhibited in the case of three equimolar mixtures, two in 4D and one in 5D.
2.2 A More Consistent Approximation for Three-Dimensional
Additive Mixtures
Up to this point, we have considered an arbitrary dimensionality d and have
constructed, under the universality assumption (16), the acurate quadratic
approximation (34), which fulfills the consistency conditions (12), (13), and
(29). However, there exist extra consistency conditions that are not necessarily
satisfied by (34). In particular, when the mixture is in contact with a hard wall,
the state of equilibrium imposes that the pressure evaluated near the wall by
considering the impacts with the wall must be the same as the pressure in the
bulk evaluated from the particle-particle collisions. This consistency condition
is especially important if one is interested in deriving accurate expressions for
the contact values of the particle-wall correlation functions.
Since a hard wall can be seen as a sphere of infinite diameter, the contact
value gwj of the correlation function of a sphere of diameter σj with the wall
can be obtained from gij(σij) as
gwj = lim
gij(σij). (42)
Note that gwj provides the ratio between the density of particles of species
j adjacent to the wall and the density of those particles far away from the
wall. The sum rule connecting the pressure of the fluid and the above contact
values is [47]
Zw(η) =
xjgwj, (43)
where the subscript w in Zw has been used to emphasize that Eq. (43) repre-
sents a route alternative to the virial one, Eq. (2), to get the EOS of the HS
mixture. The condition Z = Zw is equivalent to (29) in the special case where
one has a single fluid in the presence of the wall. However, in the general case
of a mixture plus a wall, the condition Z = Zw is stronger than Eq. (29).
In the two-dimensional case, it turns out that the quadratic approximation
(34) already satisfies the requirement Z = Zw, regardless of the density and
composition of the mixture [44]. However, this is not the case for d ≥ 3.
14 M. López de Haro, S. B. Yuste and A. Santos
Our problem now consists of computing gij(σij) and the associated gwj for
the HS mixture in the presence of a hard wall, so that the condition Z = Zw
is satisfied for an arbitrary mixture [43]. Due to the mathematical complexity
of the problem, here we will restrict ourselves to three-dimensional systems
(d = 3). Similarly to what we did in the preceding subsection, we consider
a class of approximations of the universal type (16), so that conditions (12)
and (13) lead again to Eqs. (17) and (18), respectively. Notice that Eq. (16)
implies in particular that
gwj = G(η, zwj), zwj = 2σj
. (44)
Assuming that z = 0 is a regular point and taking into account condition
(17), G(η, z) can be expanded in a power series in z:
G(η, z) = G0(η) +
Gn(η)zn. (45)
After simple algebra, using the ansatz (16) and Eq. (45) in Eqs. (2) (with
d = 3) and (43) one gets
Z = G0 + 3η
G0 + 4η
Mn+13
i,j=1
xixjσ
ij , (46)
Zw = G0 +
Mn. (47)
Notice that if the series (45) is truncated after a given order n ≥ 3, Zw is given
by the first n moments of the size distribution only. On the other hand, Z still
involves an infinite number of moments if the truncation is made after n ≥ 4
due to the presence of terms like
i,j xixjσ
4/σij ,
i,j xixjσ
5/σ2ij , . . . .
Therefore, if we want the consistency condition Z = Zw to be satisfied for any
discrete or continuous polydisperse mixture, either the whole infinite series
(45) needs to be considered or it must be truncated after n = 3. The latter is
of course the simplest possibility and thus we make the approximation
G(η, z) = G0(η) + G1(η)z + G2(η)z2 + G3(η)z3. (48)
As a consequence, Z and Zw depend functionally on the size distribution of
the mixture only through the first three moments (which is in the spirit of
Rosenfeld’s Fundamental Measure Theory [48]).
Using the approximation (48) in Eqs. (46) and (47) we are led to
Z = G0 + η
(3G0 + 2G1) + 2
(G1 + 2G2 + 2G3)
, (49)
Alternative Approaches to Hard-Sphere Liquids 15
Zw = G0 + 2
G1 + 4
(G2 + 2G3) . (50)
Thus far, the dependence of both Z and Zw on the momentsM1,M2, andM3
is explicit and we only lack the packing-fraction dependence of G1, G2, and
G3. From Eqs. (49) and (50) it follows that the difference between Z and Zw
is given by
Z−Zw =
[3ηG0 − 2(1− η)G1]+2
[ηG1 − 2(1− η)G2 − 2(2− η)G3] .
Therefore, Z = Zw for any dispersity provided that
G1(η) =
2 (1− η)2
, (52)
G2(η) =
4 (1− η)3
− 2− η
G3(η), (53)
where use has been made of the definition of G0, Eq. (17). To close the problem,
we use the equal size limit given in Eq. (18), which yields G0+G1+G2+G3 = gs.
After a little algebra we are led to
G2(η) = (2− η)gs −
2 + η2/4
(1− η)2
, (54)
G3(η) = (1− η)
gSPTs − gs
. (55)
This completes the derivation of our improved approximation, which we will
call “e3”, following the same criterion as the one used to call “e1” and “e2” to
the approximations (20) and (34), respectively. In Eq. (55), gSPTs is the SPT
contact value for a single fluid, whose expression appears in Table 1. From
Eq. (55) it is obvious that the choice gs = g
s makes our e3 approximation
to become the e2 approximation, both reducing to the SPT for mixtures, Eq.
(8). This means that the SPT is fully internally consistent with the require-
ment Z = Zw, although it has the shortcoming of not being too accurate in
the single component case. The e3 proposal, on the other hand, satisfies the
condition Z = Zw and has the flexibility of accommodating any desired gs.
For the sake of concreteness, let us write explicitly the contact values in
the e3 aproximation:
ge3ij (σij) =
2 (1− η)2
(2 − η)gs −
2 + η2/4
(1− η)2
+ (1− η)
gSPTs − gs
, (56)
16 M. López de Haro, S. B. Yuste and A. Santos
ge3wj =
1− η +
(1− η)2
σj + 4
(2 − η)gs −
2 + η2/4
(1− η)2
+8(1− η)
gSPTs − gs
. (57)
With the above results the compressibility factor may be finally written in
terms of Zs as
Ze3(η) =
(1− η)
(1− η)2
Zs(η)−
. (58)
A few comments are in order at this stage. First, from Eq. (49) we can
observe that, for the class of approximations (48), the compressibility factor
Z does not depend on the individual values of the coefficients G2 and G3,
but only on their sum. As a consequence, two different approximations of
the form (48) sharing the same density dependence of G1 and G2 + G3 also
share the same virial EOS. For instance, if one makes the choice gs = g
then ZePY3 = ZPY, even though g
ij (σij) 6= gPYij (σij). Furthermore, if one
makes the more accurate choice gs = g
s , then ZeCS3 = ZBMCSL, but again
geCS3ij (σij) 6= gBGHLLij (σij). The eCS3 contact values are
geCS3ij (σij) =
2 (1− η)2
η2(1 + η)
4(1− η)3
4(1− η)2
, (59)
geCS3wj =
(1− η)2
η2(1 + η)
(1− η)3
(1 − η)2
. (60)
In Figs. 3 and 4 we display the performance of the contact values as given
by Eqs. (59) and (60), respectively, by comparison with results of computer
simulations for both discrete and polydisperse mixtures. In both figures we
have also included the results that follow from the classical proposals as well
as those of the eCS1 and eCS2 approximations. It is clear that for the wall-
particle contact values the eCS3 approximation yields the best performance,
while for the particle-particle contact values both the eCS2 and eCS3 are of
comparable accuracy. A further feature to be pointed out is that the practical
collapse on a common curve of the simulation data in Figs. 3 and 4 provide a
posteriori support for the universality ansatz made in Eq. (16).
As mentioned earlier, there exist extra consistency conditions (see for in-
stance Ref. [12]) that one might use as well within our approach. Assuming
Alternative Approaches to Hard-Sphere Liquids 17
0.0 0.5 1.0 1.5 2.0
 eCS1
 eCS2 
 eCS3 
Fig. 3. Plot of the difference gij(σij)− gBGHLLij (σij) as a function of the parameter
zij = (σiσj/σij)M2/M3 for hard spheres (d = 3) at a packing fraction η = 0.49.
The symbols are simulation data for the single fluid (circle, Ref. [36]), three binary
mixtures (squares, Ref. [49]) with σ2/σ1 = 0.3 and x1 = 0.0625, 0.125, and 0.25, and
a ternary mixture (triangles, Ref. [50]) with σ2/σ1 =
, σ3/σ1 =
, and x1 = 0.1,
x2 = 0.2. The lines are the PY approximation (– · · –), the SPT approximation
(– · – ·), the eCS1 approximation (· · · ), the eCS2 approximation (– – –), and the
eCS3 approximation (—).
that the ansatz (16) still holds, some of these conditions are related to the
derivatives of G with respect to z, namely
∂G(η, z)
2(1− η)2
, (61)
∂2G(η, z)
gPYs −
, (62)
∂3G(η, z)
= 0. (63)
Interestingly enough, as shown by Eq. (52), condition (61) is already satisfied
by our e3 approximation without having to be imposed. On the other hand,
condition (63) implies G3 = 0 in the e3 scheme and thus it is only satisfied if
gs = g
s , in which case we recover the SPT. Condition (62) is not fulfilled
either by the SPT or by the e3 approximation (except for a particular ex-
pression of gs which is otherwise not very accurate). Thus, fulfilling the extra
18 M. López de Haro, S. B. Yuste and A. Santos
0.0 0.5 1.0 1.5 2.0 2.5
 eCS1
 eCS2
 eCS3
Fig. 4. Plot of the difference gwj − gBGHLLwj as a function of the parameter zwj/2 =
σjM2/M3 for hard spheres (d = 3) at a packing fraction η = 0.4. The symbols are
simulation data for a polydisperse mixture with a narrow top-hat distribution (open
squares, Ref. [51]), a polydisperse mixture with a wide top-hat distribution (open
circles, Ref. [51]), a polydisperse mixture with a Schulz distribution (open triangles,
Ref. [51]), and a binary mixture (closed circles, Ref. [52]). The lines are the PY
approximation (– · · –), the SPT approximation (– · – ·), the eCS1 approximation
(· · · ), the eCS2 approximation (– – –), and the eCS3 approximation (—).
conditions (62) and (63) with a free gs requires either considering a higher
order polynomial in z (in which case the consistency condition Z = Zw can-
not be satisfied for arbitrary mixtures, as discussed before) or not using the
universality ansatz at all. In the first case, we have checked that a quartic or
even a quintic polynomial does not improve matters, whereas giving up the
universality assumption increases significantly the number of parameters to
be determined and seems not to be adequate in view of the behavior observed
in the simulation data.
An additional comment has to do with the restriction to d = 3 in this
subsection. As noted before, the approximation e1 reduces to the exact result
(3) for d = 1. For d = 2, the approximation e2 already fulfills the condition
Z = Zw and so there is no real need to go further in that case. Since we
have needed the approximation e3 to satisfy Z = Zw for d = 3, it is tempting
to speculate that a polynomial form for G(z) of degree d could be found to
be consistent with the condition Z = Zw for d ≥ 4. However, a detailed
analysis shows that this is not the case for an arbitrary mixture, since the
Alternative Approaches to Hard-Sphere Liquids 19
number of conditions exceeds the number of unknowns, unless the universality
assumption is partially relaxed.
As a final comment, let us stress that, although the discussion in this
section has referred, for the sake of simplicity, to discrete mixtures, all the
dependence on the details of the composition occurs through a finite number
of moments, so that the results remain meaningful even for continuous poly-
disperse mixtures [53]. In that case, instead of a set of mole fractions {xi} and
a set of diameters {σi}, one has to deal with a distribution function w(σ) such
that w(σ)dσ is the fraction of particles with a diameter comprised between σ
and σ + dσ. Therefore, the moments (1) are now defined as
dσ σnw(σ), (64)
and with such a change the results we have derived for discrete mixtures also
hold for polydisperse systems.
2.3 Non-Additive Systems
Non-additive hard-core mixtures, where the distance of closest approach be-
tween particles of different species is no longer the arithmetic mean of the
diameters of both particles, have received much less attention than additive
mixtures, in spite of their in principle more versatility to deal with interesting
aspects occurring in real systems (such as fluid-fluid phase separation) and
of their potential use as reference systems in perturbation calculations on the
thermodynamic and structural properties of, say, Lennard–Jones mixtures.
Nevertheless, the study of non-additive systems goes back fifty years [54–56]
and is still a rapidly developing and challenging problem.
As mentioned in the paper by Ballone et al. [57], where the relevant
references may be found, experimental work on alloys, aqueous electrolyte
solutions, and molten salts suggests that hetero-coordination and homo-
coordination may be interpreted in terms of excluded volume effects due to
non-additivity of the repulsive part of the intermolecular potential. In particu-
lar, positive non-additivity leads naturally to demixing in HS mixtures, so that
some of the experimental findings of phase separation in the above mentioned
(real) systems may be accounted for by using a model of a binary mixture of
(positive) non-additive HS. On the other hand, negative non-additivity seems
to account well for chemical short-range order in amorphous and liquid binary
mixtures with preferred hetero-coordination [58].
Some Preliminary Definitions
Let us consider an N -component mixture of non-additive HS in d dimensions.
In this case, σij =
(σi + σj)(1 + ∆ij), where ∆ij ≥ −1 is a symmetric
matrix with zero diagonal elements (∆ii = 0) that characterizes the degree
20 M. López de Haro, S. B. Yuste and A. Santos
of non-additivity of the interactions. If ∆ij > 0 the non-additivity character
of the ij interaction is said to be positive, while it is negative if ∆ij < 0. In
the case of a binary mixture (N = 2), the only non-additivity parameter is
∆ ≡ ∆12 = ∆21. The virial EOS (2) remains being valid in the non-additive
case.
The contact values gij(σij) can be expanded in a power series in density
gij(σij) = 1 + vdρ
xkck;ij + (vdρ)
k,ℓ=1
xkxℓckℓ;ij +O(ρ3). (65)
The coefficients ck;ij , ckℓ;ij , . . . are independent of the composition of the mix-
ture, but they are in general complicated nonlinear functions of the diameters
σij , σik, σjk, σkℓ, . . . . Insertion of the expansion (65) into Eq. (2) yields the
virial expansion of Z, namely
Z(ρ) = 1 +
Bn(vdρ)
= 1 + vdρ
i,j=1
Bijxixj + (vdρ)
i,j,k=1
Bijkxixjxk
+(vdρ)
i,j,k,ℓ=1
Bijkℓxixjxkxℓ +O(ρ4). (66)
Note that, for further convenience, we have introduced the coefficients Bn ≡
−(n−1)
d Bn, where Bn are the usual virial coefficients [cf. Eq. (25)]. The
composition-independent second, third, and fourth (barred) virial coefficients
are given by
Bij = 2
d−1σdij , (67)
Bijk =
ck;ijσ
ij + cj;ikσ
ik + ci;jkσ
, (68)
Bijkℓ =
ckℓ;ijσ
ij + cjℓ;ikσ
ik + ciℓ;jkσ
jk + cjk,iℓσ
iℓ + cik,jℓσ
+cij;kℓσ
. (69)
A Simple Proposal for the Equation of State of d-Dimensional
Non-Additive Mixtures
Our goal now is to generalize the e1 proposal given by Eq. (20) to the non-
additive case [59]. We will not try to extend the e2 and e3 proposals, Eqs. (34)
and (56), because of two reasons. First, given the inherent complexity of non-
additive systems, we want to keep the approach as simple as possible. Second,
Alternative Approaches to Hard-Sphere Liquids 21
we are more interested in the EOS than in the contact values themselves and,
as mentioned earlier, the e1 proposal provides excellent EOS, at least in the
additive case, despite the simplicity of the corresponding contact values.
As the simplest possible extension, we impose again the point particle and
equal size consistency conditions, Eqs. (12) and (13), and thus keep in this
case also the ansatz (16) and the linear structure of Eq. (19). However, instead
of using Eq. (15), we determine the parameters zij as to reproduce Eq. (65)
to first order in the density. The result is readily found to be [59]
zij =
)−1(∑
k xkck;ij
. (70)
Here b2 = 2
d−1 and b3 are the second and third virial coefficients for the
single component fluid, as defined by Eq. (26). The proposal of Eq. (19) sup-
plemented by Eq. (70) is, by construction, accurate for densities low enough
as to justify the truncated approximation gij(σij) ≈ 1 + vdρ
k xkck;ij . On
the other hand, the limitations of this truncated expansion for moderate and
large densities may be compensated by the use of gs. When Eqs. (16), (19),
and (70) are inserted into Eq. (2) one gets
Z(η) = 1 +
b3MdB2 − b2B3
(b3 − b2)M2d
+ [Zs(η) − 1]
B3 −MdB2
(b3 − b2)M2d
. (71)
Equation (71) is the sought generalization of Eq. (21) to non-additive hard-
core systems. As in the additive case, the the density dependence in the EOS
of the mixture is rather simple: Z(η)− 1 is expressed as a linear combination
of η/(1 − η) and Zs(η) − 1, with coefficients such that the second and third
virial coefficients are reproduced. Again, Eq. (71) is bound to be accurate for
sufficiently low densities, while the limitations of the truncated expansion for
moderate and large densities are compensated by the use of the EOS of the
pure fluid.
The exact second virial coefficient B2 is known from Eq. (67). In principle,
one should use the exact coefficients ck;ij to compute B3. However, to the
best of our knowledge they are only known for d ≤ 3. Since our objective is to
have a proposal which is explicit for any d, we can make use of a reasonable
approximation for them [59], as described below.
An Approximate Proposal for ck;ij
The values of the coefficients ck;ij are exactly known for d = 1 and d = 3 and
from these results one may approximate them in d dimensions as [59]
ck;ij = σ
k;ij +
σd−1k;ij
σi;jkσj;ik, (72)
where we have called
22 M. López de Haro, S. B. Yuste and A. Santos
σk;ij ≡ σik + σjk − σij (73)
and it is understood that σk;ij ≥ 0 for all sets ijk. Clearly, σi;ij = σi. For a
binary mixture Eq. (72) yields
c1;11 = (b3/b2)σ
c2;11 = (2σ12 − σ1)d + (b3/b2 − 1)σ1(2σ12 − σ1)d−1,
c1;12 = σ
1 + (b3/b2 − 1) (2σ12 − σ1)σd1/σ12.
Of course, Eqs. (72) and (74) reduce to the exact results for d = 1 (b2 = b3 = 1)
and for d = 3 (b2 = 4, b3 = 10).
The quantities σk;ij may be given a simple geometrical interpretation.
Assume that we have three spheres of species i, j, and k aligned in the sequence
ikj. In such a case, the distance of closest approach between the centers of
spheres i and j is σik + σjk. If the sphere of species k were not there, that
distance would of course be σij . Therefore σk;ij as given by Eq. (73) represents
a kind of effective diameter of sphere k, as seen from the point of view of the
interaction between spheres i and j.
Inserting Eq. (72) into Eq. (70), one gets
zij =
)−1(∑
k xkσ
k xkσ
k;ij σi;jkσj;ik
Mdσij
. (75)
It can be easily checked that in the additive case (σk;ij → σk), Eq. (75) reduces
to Eq. (15).
Equations (72) and (74) are restricted to the situation σk;ij ≥ 0 for any
choice of i, j, and k, i.e., 2σ12 ≥ max(σ1, σ2) in the binary case. This excludes
the possibility of dealing with mixtures with extremely high negative non-
additivity in which one sphere of species k might “fit in” between two spheres
of species i and j in contact. Since for d = 3 and N = 2 the coefficients ck;ij
are also known for such mixtures [60], we may extend our proposal to deal
with these cases:
c1;11 = (b3/b2)σ
c2;11 = σ̂
2 + (b3/b2 − 1)σ1σ̂d−12 ,
c1;12 = (2σ12 − σ̂2)d + (b3/b2 − 1) σ̂2σd1/σ12,
where we have defined
σ̂2 = max (2σ12 − σ1, 0) . (77)
With such an extension, we recover the exact values of ck;ij for a binary
mixture of hard spheres (d = 3), even if σ1 > 2σ12 or σ2 > 2σ12.
The EOS (71) becomes explicit when B3 is obtained from Eq. (68) by
using the approximation (72). The resulting virial coefficient is the exact one
for d = 1 and d = 3. For hard disks (d = 2), it turns out that the approximate
third virial coefficient is practically indistinguishable from the exact one [59].
Alternative Approaches to Hard-Sphere Liquids 23
-0.6 -0.4 -0.2 0.0 0.2 0.4 0.6
Fig. 5. Plot of the compressibility factor versus the non-additivity parameter ∆ for
a symmetric binary mixture of non-additive hard spheres (d = 3) at η = π/30 and
two different compositions. The solid lines are our proposal, Eq. (71), with Zs = Z
while the dashed lines are Hamad’s proposal (Refs. [61–63]). The symbols are results
from Monte Carlo simulations (Refs. [64,65]).
When the approximate B3 is used, Eq. (71) reduces to Eq. (21) in the additive
case.
From the comparison with simulation results, both for the compressibility
factor and higher order virial coefficients, we find that the EOS (71) does a
good job for non-additive mixtures, thus representing a reasonable compro-
mise between simplicity and accuracy, provided that Zs is accurate enough.
This is illustrated in Fig. 5, where the proposal (71) with Zs = Z
s and a
similar proposal by Hamad [61–63] are compared with simulation data [64,65]
for some three-dimensional symmetric mixtures. A more extensive compari-
son [59] shows that Eq. (71) seems to work better (especially as the density
is increased) in the case of positive non-additivities, at least for d = 1, d = 2,
and d = 3, but its performance is also reasonably good in highly asymmetric
mixtures, even for negative ∆. Of course the full assessment of this proposal
is still pending since it involves many facets (non-additivity parameters, size
ratios, density, and composition). Without this full assessment and given its
rather satisfactory performance so far, going beyond the approximation given
by Eq. (19) (taking similar steps to the ones described in Subsections 2.1 and
2.2 for additive systems) does not seem to be necessary at this stage, although
it is in principle feasible.
24 M. López de Haro, S. B. Yuste and A. Santos
2.4 Demixing
Demixing is a common phase transition in fluid mixtures usually originated
on the asymmetry of the interactions (e.g., their strength and/or range) be-
tween the different components in the mixture. In the case of athermal systems
such as HS mixtures in d dimensions, if fluid-fluid separation occurs, it would
represent a neat example of an entropy-driven phase transition, i.e., a phase
separation based only on the size asymmetry of the components. The exis-
tence of demixing in binary additive three dimensional HS mixtures has been
studied theoretically since decades, and the issue is still controversial. In this
subsection we will present our results following different but related routes
that attempt to clarify some aspects of this problem.
Binary Mixtures of Additive d-Dimensional Spheres (d = 3, d = 4
and d = 5)
Now we look at the possible instability of a binary fluid mixture of HS of
diameters σ1 and σ2 (σ1 > σ2) in d dimensions by looking at the Helmholtz
free energy per unit volume, f , which is given by
= −1 +
xi ln
Z(η′)− 1
, (78)
where λi is the thermal de Broglie wavelength of species i. We locate the
spinodals through the condition f11f22−f212 = 0, with fij ≡ ∂2f/∂ρi∂ρj . Due
to the spinodal instability, the mixture separates into two phases of different
composition. The coexistence conditions are determined through the equality
of the pressure p and the two chemical potentials µ1 and µ2 in both phases
(µi = ∂f/∂ρi), leading to binodal (or coexistence) curves.
We begin with the case d = 3. It is well known that the BMCSL EOS, Eq.
(11), does not lead to demixing. However, other EOS for HS mixtures have
been shown to predict demixing [41, 66], including the EOS that is obtained
by truncating the virial series after a certain number of terms [67, 68]. In
particular, it turns out that both Z = ZeCS1, Eq. (40), and Z = ZeCS2,
Eq. (41), lead to demixing for certain values of the parameter γ ≡ σ2/σ1
that measures the size asymmetry. The critical values of the pressure, the
composition, and the packing fraction are presented in Table 2 for a few
values of γ.
As discussed earlier, the eCS1 EOS and, to a lesser extent, the eCS2 EOS
are both in reasonably good agreement with the available simulation results
for the compressibility factor [18, 36, 45] and lead to the exact second and
third virial coefficients but differ in the predictions for Bn with n ≥ 4. The
scatter in the values for the critical constants shown in Table 2 is evident and
so there is no indication as to whether one should prefer one equation over
the other in connection with this problem. Notice, for instance, that the eCS2
Alternative Approaches to Hard-Sphere Liquids 25
Table 2. Critical constants pcσ
1/kBT , x1c, and ηc for different γ-values as obtained
from the two extended CS equations (40) and (41).
eCS1 eCS2
γ pcσ
1/kBT x1c ηc pcσ
1/kBT x1c ηc
0.05 3599 0.0093 0.822 1096 0.0004 0.204
0.1 1307 0.0203 0.757 832.0 0.0008 0.290
0.2 653.4 0.0537 0.725 — — —
0.3 581.9 0.0998 0.738 — — —
0.4 663.4 0.1532 0.766 — — —
does not predict demixing for γ ≥ 0.2, while both the values of the critical
pressures and packing fractions for which it occurs according to the eCS1 EOS
suggest that the transition might be metastable with respect to a fluid-solid
transition.
Now we turn to the cases d = 4 and d = 5. Here we use the extended
Luban–Michels equation (eLM1) described in Subsection 2.1 [see Eq. (21)
and Table 1]. As seen in Fig. 6, the location of the critical point tends to go
down and to the right in the η2 vs η1 plane as γ decreases for d = 4 [69].
On the other hand, while it also tends to go down as γ decreases if d = 5,
its behavior in the η2 vs η1 plane is rather more erratic in this case. Also,
the value of the critical pressure pc (in units of kBT/σ
1) is not a monotonic
function of γ; its minimum value lies between γ = 1/3 and γ = 1/2 when
d = 4, and it is around γ = 3/5 for d = 5. This non-monotonic behavior is
also observed for three-dimensional HS [66, 68].
It is conceivable that the demixing transition in binary mixtures of hard
hyperspheres in four and five dimensions described above may be metastable
with respect to a fluid-solid transition, as it may also be the case of 3D HS.
In fact, the value of the pressure at the freezing transition for the single
component fluid is [31] pfσ
d/kBT ≃ 12.7 (d = 3), 11.5 (d = 4), and 12.2
(d = 5), i.e., pfσ
d/kBT does not change appreciably with the dimensionality
but is clearly very small in comparison with the critical pressures pcσ
1/kBT
we obtain for the mixture; for instance, pcσ
1/kBT ≃ 600 (d = 3, γ = 3/10),
300 (d = 4, γ = 1/3) and 123 (d = 5, γ = 3/5). However, one should also
bear in mind that, if the concentration x1 of the bigger spheres decreases, the
value of the pressure at which the solid-fluid transition in the mixture occurs
in 3D is also considerably increased with respect to pf [cf. Fig. 6 of Ref. [66]].
Thus, for concentrations x1 ≃ 0.01 corresponding to the critical point of the
fluid-fluid transition, the maximum pressure of the fluid phase greatly exceeds
pf. If a similar trend with composition also holds in 4D and 5D, and given that
the critical pressures become smaller as the dimensionality d is increased, it is
not clear whether the competition between the fluid-solid and the fluid-fluid
transitions in these dimensionalities will always be won by the former. The
point clearly deserves further investigation.
26 M. López de Haro, S. B. Yuste and A. Santos
0.0 0.1 0.2 0.3 0.4 0.5 0.6
1400  2/ 1=3/4
4 1/k
0.0 0.1 0.2 0.3 0.4
5 1/k
Fig. 6. Spinodal curves (upper panels: lines) and binodal curves (upper panels: open
symbols; lower panels: lines) in a 4D system (left panels) and in a 5D system (right
panels). The closed symbols are the critical consolute points.
An interesting feature must be mentioned. There is a remarkable similarity
between the binodal curves represented in the pσdi –η1 and in the µi–η1 planes
[69]. By eliminating η1 as if it were a parameter, one can represent the binodal
curves in a µi vs pσ
i plane. Provided the origin of the chemical potentials
is such as to make λi = σi, the binodals in the µi–pσ
i plane practically
collapse into a single curve (which is in fact almost a straight line) for each
dimensionality (d = 3, d = 4, and d = 5) [69]. A closer analysis of this
Alternative Approaches to Hard-Sphere Liquids 27
phenomenon shows, however, that it is mainly due to the influence on µi
of terms which are quantitatively dominant but otherwise irrelevant to the
coexistence conditions.
Binary Mixtures of Non-Additive Hard Hyperspheres in the Limit
of High Dimensionality
Let us now consider a binary mixture of non-additive HS of diameters σ1
and σ2 in d dimensions. Thus in this case σ12 ≡ 12 (σ1 + σ2)(1 +∆) where as
before ∆ may be either positive or negative. Further assume (something that
will become exact in the limit d → ∞ [70]) that the EOS of the mixture is
described by the second virial coefficient only, namely
p = ρkBT [1 +B2(x1)ρ] , (79)
where, according to Eq. (67),
B2(x1) = vd2
1 + x
2 + 2x1x2σ
. (80)
The Helmholtz free energy per unit volume is given by f/ρkBT = −1 +∑2
i=1 xi ln
+B2ρ, where Eq. (78) has been used. The Gibbs free energy
per particle is
g = (f + p)/ρ =
xi ln
+ 2B2(x1)ρ, (81)
where without loss of generality we have set kBT = 1. Given a size ratio γ, a
value of ∆, and a dimensionality d, the consolute critical point (x1c, pc) is the
solution to
∂2g/∂x21
∂3g/∂x31
= 0, provided of course it exists. Then,
one can get the critical density ρc from Eq. (79).
We now introduce the scaled quantities [71]
p̃ ≡ 2d−1vdd−2pσd1/kBT, u ≡ d−1B2ρ. (82)
Consequently, Eqs. (79) and (81) can be rewritten as
p̃ = u
u+ d−1
/B̃2, (83)
xi ln (xiΛi) + ln
Adu/B̃2
+ 2du, (84)
where B̃2 ≡ B2/2d−1vdσd1 , Λi ≡ (λi/σ1)d, and Ad ≡ d/2d−1vd. Next we take
the limit d → ∞ and assume that the volume ratio γ̃ ≡ γd is kept fixed and
that there is a (slight) non-additivity ∆ = d−2∆̃ such that the scaled non-
additivity parameter ∆̃ is also kept fixed in this limit. Thus, the second virial
coefficient can be approximated by
28 M. López de Haro, S. B. Yuste and A. Santos
B̃2 = B̃
2 +B̃
−1+O(d−2), B̃(0)2 =
x1 + x2γ̃
2 = x1x2γ̃
1/2J,
J ≡ 1
(ln γ̃)
+ 2∆̃. (86)
Let us remark that, in order to find a consolute critical point, it is essential
to keep the term of order d−1 if ∆̃ ≤ 0. The EOS (83) can then be inverted
to yield
u = u(0)+u(1)d−1+O(d−2), u(0) =
2 , u
(1) = −1
1− u(0) B̃
In turn, the Gibbs free energy (84) becomes
g = g(0)d+ g(1) +O(d−1),
g(0) = 2u(0), g(1) =
i=1 xi ln (xiΛi) + ln
(0)/B̃
+ 2u(1),
while the chemical potentials µ1 = g+x2 (∂g/∂x1)p and µ2 = g−x1 (∂g/∂x1)p
are given by
µi = µ
i d+ µ
i +O(d−1), µ
1 = 2p̃
1 = ln
Adx1Λ1
p̃/B̃
2 + (x2/x1)(γ̃p̃)
1/2B̃
2 /B̃
where µ2 is obtained from µ1 by the changes x1 ↔ x2, Λ1 → Λ2/γ̃, γ̃ → 1/γ̃,
p̃→ p̃γ̃, B̃2 → B̃2/γ̃.
The coordinates of the critical point are readily found to be
x1c =
γ̃3/4
1 + γ̃3/4
, p̃c =
1 + γ̃1/4
4γ̃J2
. (90)
Note that x1c is independent of ∆̃. The coexistence curve, which has to be
obtained numerically, follows from the conditions µ
i (xA, p̃) = µ
i (xB , p̃)
(i = 1, 2) where x1 = xA and x1 = xB are the mole fractions of the co-
existing phases. Once the critical consolute point has been identified in the
pressure/concentration plane, we can obtain the critical density. The domi-
nant behaviors of B̃2 and u at the critical point are
2 (x1c) =
1− γ̃1/4 + γ̃1/2
)2 , u
1 + γ̃1/4
1− γ̃1/4 + γ̃1/2
. (91)
Hence, the critical density readily follows after substitution in the scaling
relation given in Eq. (82). It is also convenient to consider the scaled version
η̃ ≡ d−12dη of the packing fraction η = vdρσd1 (x1 + x2γ̃). At the critical point,
it takes the nice expression
Alternative Approaches to Hard-Sphere Liquids 29
0.00 0.04 0.08 0.12 0.16
p  =-0.1
 =0.1
=0.01
Fig. 7. Binodal curves in the planes ep vs x1 and eη vs x1 corresponding to eγ = 0.01
and e∆ = −0.1, e∆ = 0, and e∆ = 0.1.
η̃c =
γ̃1/8 + γ̃−1/8
. (92)
The previous results clearly indicate that a demixing transition is possible
not only for additive or positively non-additive mixtures but even for negative
non-additivities. The only requirement is J > 0, i.e., ∆̃ > − 1
(ln γ̃)
equivalently, ∆ > − 1
(ln γ)
. Figure 7 shows the binodal curves corresponding
to γ̃ = 0.01 and ∆̃ = −0.1 (negative non-additivity), ∆̃ = 0 (additivity), and
∆̃ = 0.1 (positive non-additivity).
While the high dimensionality limit has allowed us to address the prob-
lem in a mathematically simple and clear-cut way, the possibility of demixing
with negative non-additivity is not an artifact of that limit. As said before,
demixing is known to occur for positive non-additive binary mixtures of HS
in three dimensions and there is compelling evidence on the existence of this
phenomenon in the additive case, at least in the metastable fluid region. Even
though in a three-dimensional mixture the EOS is certainly more complicated
than Eq. (79) and the demixing transition that we have just discussed for neg-
ative non-additivity is possibly metastable with respect to the freezing transi-
tion, the main effects at work (namely the competition between depletion due
to size asymmetry and hetero-coordination due to negative non-additivity)
are also present. In fact, it is interesting to point out that Roth et al. [72],
using the approximation of an effective single component fluid with pair inter-
30 M. López de Haro, S. B. Yuste and A. Santos
actions to describe a binary mixture of non-additive 3D HS and employing an
empirical rule based on the effective second virial coefficient, have also sug-
gested that demixing is possible for small negative non-additivity and high
size asymmetry. Our exact results lend support to this suggestion and con-
firm that, in some cases, the limit d→ ∞ highlights features already present
in real systems.
3 The Rational Function Approximation (RFA) Method
for the Structure of Hard-Sphere Fluids
The RDF g(r) and its close relative the (static) structure factor S(q) are the
basic quantities used to discuss the structure of a single component fluid [1–4].
The latter quantity is defined as
S(q) = 1 + ρh̃(q), (93)
where
h̃(q) =
dr e−iq·rh(r) (94)
is the Fourier transform of the total correlation function h(r) ≡ g(r)−1, i being
the imaginary unit. An important related quantity is the direct correlation
function c(r), which is defined in Fourier space through the Ornstein–Zernike
(OZ) relation [1–4]
c̃(q) =
h̃(q)
1 + ρh̃(q)
, (95)
where c̃(q) is the Fourier transform of c(r)
The usual approach to obtain g(r) is through one of the integral equa-
tion theories, where the OZ equation is complemented by a closure relation
between c(r) and h(r) [1]. However, apart from requiring in general hard nu-
merical labor, a disappointing aspect is that the substitution of the (necessar-
ily) approximate values of g(r) obtained from them in the (exact) statistical
mechanical formulae may lead to the thermodynamic inconsistency problem.
The two basic routes to obtain the EOS of a single component fluid of HS
are the virial route, Eq. (14), and the compressibility route
χs ≡ kBT
= [1− ρc̃(0)]−1 = S(0)
= 1 + 2ddησ−d
dr rd−1h(r). (96)
Thermodynamic consistency implies that
χ−1s (η) =
[ηZs(η)], (97)
Alternative Approaches to Hard-Sphere Liquids 31
but, in general, this condition is not satisfied by an approximate RDF. In
the case of a HS mixture, the virial route is given by Eq. (2), while the
compressibility route is indicated below [cf. Eq. (145)].
In this section we describe the RFA method, which is an alternative to
the integral equation approach and in particular leads by construction to
thermodynamic consistency.
3.1 The Single Component HS Fluid
We begin with the case of a single component fluid of HS of diameter σ.
The following presentation is equivalent to the one given in Refs. [73, 74],
where all details can be found, but more suitable than the former for direct
generalization to the case of mixtures.
The starting point will be the Laplace transform
G(s) =
dr e−srrg(r) (98)
and the auxiliary function Ψ(s) defined through
G(s) =
[ρ+ esσΨ(s)]
. (99)
The choice of G(s) as the Laplace transform of rg(r) and the definition of
Ψ(s) from Eq. (99) are suggested by the exact form of g(r) to first order in
density [73].
Since g(r) = 0 for r < σ while g(σ+) = finite, one has
g(r) = Θ(r − σ)
g(σ+) + g′(σ+)(r − σ) + · · ·
, (100)
where g′(r) ≡ dg(r)/dr. This property imposes a constraint on the large s
behavior of G(s), namely
eσssG(s) = σg(σ+) +
g(σ+) + σg′(σ+)
s−1 +O(s−2). (101)
Therefore, lims→∞ e
sσsG(s) = σg(σ+) = finite or, equivalently,
s−2Ψ(s) =
2πσg(σ+)
= finite. (102)
On the other hand, according to Eq. (96) with d = 3,
χs = 1− 24ησ−3 lim
dr e−srr [g(r) − 1]
= 1− 24ησ−3 lim
G(s)− s−2
. (103)
Since the (reduced) isothermal compressibility χs is also finite, one has∫∞
dr r2 [g(r) − 1] = finite, so that the weaker condition
dr r [g(r)− 1] =
lims→0[G(s)− s−2] = finite must hold. This in turn implies
32 M. López de Haro, S. B. Yuste and A. Santos
Ψ(s) = −ρ+ ρσs−
ρσ2s2+
ρσ3 +
ρσ3 +
σs4+O(s5).
(104)
First-Order Approximation (PY Solution)
An interesting aspect to be remarked is that the minimal input we have just
described on the physical requirements related to the structure and thermo-
dynamics of the system is enough to determine the small and large s limits of
Ψ(s), Eqs. (102) and (104), respectively. While infinite choices for Ψ(s) would
comply with such limits, a particularly simple form is a rational function. In
particular, the rational function having the least number of coefficients to be
determined is
Ψ(s) =
E(0) + E(1)s+ E(2)s2 + E(3)s3
L(0) + L(1)s
, (105)
where one of the coefficients can be given an arbitrary non-zero value. We
choose E(3) = 1. With such a choice and in view of Eq. (104), one finds
E(0) = −ρL(0), E(1) = −ρ(L(1) − σL(0)), E(2) = ρ(σL(1) − 1
σ2L(0)), and
L(0) = 2π
1 + 2η
(1− η)2
, (106)
L(1) = 2πσ
1 + η/2
(1− η)2
. (107)
Upon substitution of these results into Eqs. (99) and (105), we get
G(s) =
L(0) + L(1)s
ϕ2(σs)σ3L(0) + ϕ1(σs)σ2L(1)
] , (108)
where
ϕn(x) ≡ x−(n+1)
(−x)m
− e−x
. (109)
In particular,
ϕ0(x) =
1− e−x
, ϕ1(x) =
1− x− e−x
, ϕ2(x) =
1− x+ x2/2− e−x
(110)
Note that limx→0 ϕn(x) = (−1)n/(n+ 1)!.
It is remarkable that Eq. (108), which has been derived here as the sim-
plest rational form for Ψ(s) complying with the requirements (102) and (104),
coincides with the solution to the PY closure, c(r) = 0 for r > σ, of the OZ
equation [29]. Application of Eq. (102) yields the PY contact value gPYs and
compressibility factor ZPYs shown in Table 1. Analogously, Eq. (103) yields
χPYs =
(1− η)4
(1 + 2η)2
. (111)
It can be easily checked that the thermodynamic relation (97) is not satisfied
by the PY theory.
Alternative Approaches to Hard-Sphere Liquids 33
Second-Order Approximation
In the spirit of the RFA, the simplest extension of the rational approximation
(105) involves two new terms, namely αs4 in the numerator and L(2)s2 in the
denominator, both of them necessary in order to satisfy Eq. (102). Such an
addition leads to
Ψ(s) =
E(0) + E(1)s+ E(2)s2 + E(3)s3 + αs4
L(0) + L(1)s+ L(2)s2
. (112)
Applying Eq. (104), it is possible to express E(0), E(1), E(2), E(3), L(0), and
L(1) in terms of α and L(2). This leads to
G(s) =
L(0) + L(1)s+ L(2)s2
1 + αs− ρ
ϕ2(σs)σ3L(0) + ϕ1(σs)σ2L(1) + ϕ0(σs)σL(2)
(113)
where
L(0) = 2π
1 + 2η
(1− η)2
, (114)
L(1) = 2πσ
1 + 1
(1 − η)2
1 + 2η
α− 3ηL
. (115)
Thus far, irrespective of the values of the coefficients L(2) and α, the condi-
tions lims→∞ e
sσsG(s) = finite and lims→0[G(s) − s−2] = finite are satisfied.
Of course, if L(2) = α = 0, one recovers the PY approximation. More gen-
erally, we may determine these coefficients by prescribing the compressibility
factor Zs (or equivalently the contact value gs) and then, in order to ensure
thermodynamic consistency, compute from it the isothermal compressibility
χs by means of Eq. (97). From Eqs. (102) and (103) one gets
L(2) = 2πασgs, (116)
1− 12η
1 + 2
αL(2)
. (117)
Clearly, upon substitution of Eqs. (114) and (116) into Eq. (117) a quadratic
algebraic equation for α is obtained. The physical root is
α = − 12η(1 + 2η)E4
(1− η)2 + 36η [1 + η − Zs(1− η)]E4
, (118)
where
Zs − 13
Zs − 13
Zs − ZPYs
)]1/2}
. (119)
The other root must be discarded because it corresponds to a negative value
of α, which, according to Eq. (116), yields a negative value of L(2). This would
34 M. López de Haro, S. B. Yuste and A. Santos
imply the existence of a positive real value of s at which G(s) = 0 [73, 74],
which is not compatible with a positive definite RDF. However, according to
the form of Eq. (119) it may well happen that, once Zs has been chosen, there
exists a certain packing fraction ηg above which α is no longer positive. This
may be interpreted as an indication that, at the packing fraction ηg where α
vanishes, the system ceases to be a fluid and a glass transition in the HS fluid
occurs [74–76].
Expanding (113) in powers of s and using Eq. (101) one can obtain the
derivatives of the RDF at r = σ+ [77]. In particular, the first derivative is
g′(σ+) =
L(1) − L(2)
, (120)
which may have some use in connection with perturbation theory [15].
It is worthwhile to point out that the structure implied by Eq. (113) coin-
cides in this single component case with the solution of the Generalized Mean
Spherical Approximation (GMSA) [78], where the OZ relation is solved under
the ansatz that the direct correlation function has a Yukawa form outside the
core.
For a given Zs, once G(s) has been determined, inverse Laplace trans-
formation yields rg(r). First, note that Eq. (99) can be formally rewritten
G(s) = −
ρn−1 [−Ψ(s)]−n e−nsσ. (121)
Thus, the RDF is then given by
g (r) =
ρn−1ψn (r − nσ)Θ (r − nσ) , (122)
with Θ (x) denoting the Heaviside step function and
ψn (r) = −L−1
s [−Ψ (s)]−n
, (123)
L−1 denoting the inverse Laplace transform. Explicitly, using the residue the-
orem,
ψn (r) = −
(n−m)!(m− 1)!
rn−m, (124)
where
a(i)mn = lim
s [−Ψ (s) /(s− si)]−n , (125)
si (i = 1, . . . , 4) being the poles of 1/Ψ(s), i.e., the roots of E
(0) + E(1)s +
E(2)s2 + E(3)s3 + αs4 = 0. Explicit expressions of g(r) up to the second
coordination shell σ ≤ r ≤ 3σ can be found in Ref. [79].
Alternative Approaches to Hard-Sphere Liquids 35
On the other hand, the static structure factor S(q) [cf. Eq. (93)] and the
Fourier transform h̃(q) may be related to G(s) by noting that
h̃(q) =
dr r sin(qr)h(r) = −2π G(s)−G(−s)
. (126)
Therefore, the basic structural quantities of the single component HS fluid,
namely the RDF and the static structure factor, may be analytically deter-
mined within the RFA method once the compressibility factor Zs, or equiva-
lently the contact value gs, is specified. In Fig. 8 we compare simulation data
of g(r) for a density ρσ3 = 0.9 [80] with the RFA prediction and a recent
approach by Trokhymchuk et al. [81], where Zs = Z
s [cf. Table 1] and the
associated compressibility
χCSs =
(1− η)4
1 + 4η + 4η2 − 4η3 + η4
(127)
are taken in both cases. Both theories are rather accurate, but the RFA cap-
tures better the maxima and minima of g(r) [82].
It is also possible to obtain within the RFA method the direct correlation
function c(r). Using Eqs. (95) and (126), and applying the residue theorem,
one gets, after some algebra,
1 2 3 4
1 2 3 4
Fig. 8. Radial distribution function of a single component HS fluid for ρσ3 = 0.9.
The solid lines represent simulation data [80]. The dashed lines represent the results
of the approach of Ref. [81], while the dotted lines refer to those of the RFA method.
The inset shows the oscillations of g(r) in more detail.
36 M. López de Haro, S. B. Yuste and A. Santos
c(r) =
+K0 +K1r +K3r
Θ(1 − r) +K
(128)
where
12αηL(2)/π + 1− 12α(1 + 2α)η/(1− η), (129)
4α2(1 − η)4κ6
2 [1 + 2(1 + 3α)η]± [2 + η + 2α(1 + 2η)]κ
+(1− η)
κ2 − η (12 + (κ± 6)κ)
L(2)/π
12η [1 + 2(1 + 3α)η]
±6η [3η − 2α(1− 4η)]κ− 6η(1 + 2α)(1− η)κ2 − (1 − η)2κ3(ακ∓ 1)
+6η(1− η)
κ2 − η (12 + (κ± 6)κ)
L(2)/π
, (130)
K−1 = −
κ +K−e
−κ +K0 +K1 +K3
, (131)
K0 = −
1 + 2 (1 + 3α) η − 6η (1− η)L(2)/π
ακ (1− η)2
, (132)
2α2κ2 (1− η)4
[2 + η + 2α(1 + 2η)]
2 − 4 (1− η) [1 + η
×(7 + η + 6α (2 + η))]L(2)/π + 12η (2 + η) (1− η)2L(2)
,(133)
K0, (134)
K = − (K+ +K− +K−1) . (135)
In Eqs. (129)–(135) we have taken σ = 1 as the length unit. Note that Eq.
(135) guarantees that c(0) = finite, while Eq. (131) yields c(σ+) − c(σ−) =
L(2)/2πα = g(σ+). The latter equation proves the continuity of the indirect
correlation function γ(r) ≡ h(r) − c(r) at r = σ. With the above results,
Eqs. (122) and (128), one may immediately write the function γ(r). Finally,
we note that the bridge function B(r) is linked to γ(r) and to the cavity
(or background) function y(r) ≡ eφ(r)/kBT g(r), where φ(r) is the interaction
potential, through
B(r) = ln y(r) − γ(r), (136)
and so, within the RFA method, the bridge function is also completely speci-
fied analytically for r > σ once Zs is prescribed.
If one wants to have B(r) also for 0 ≤ r ≤ σ, then an expression for the
cavity function is required in that region. Here we propose such an expression
using a limited number of constraints. First, since the cavity function and its
first derivative are continuous at r = σ, we have
Alternative Approaches to Hard-Sphere Liquids 37
0.0 0.2 0.4 0.6 0.8 1.0
1000 3=0.3
3=0.5
3=0.7
Fig. 9. Cavity function of a single component HS fluid in the overlap region for
ρσ3 = 0.3, 0.5, and 0.7. The solid lines represent our proposal (140) with Zs = Z
while the symbols represent Monte Carlo simulation results [84].
y(1) = gs,
y′(1)
− 1, (137)
where Eqs. (116) and (120) have been used and again σ = 1 has been taken.
Next, we consider the following exact zero-separation theorems [83]:
ln y(0) = Zs(η)− 1 +
′)− 1
, (138)
y′(0)
= −6ηy(1). (139)
The four conditions (137)–(139) can be enforced by assuming a cubic poly-
nomial form for ln y(r) inside the core, namely
y(r) = exp
Y0 + Y1r + Y2r
2 + Y3r
, (0 ≤ r ≤ 1), (140)
where
Y0 = Zs(η)− 1 +
′)− 1
, (141)
Y1 = −6ηy(1), (142)
38 M. López de Haro, S. B. Yuste and A. Santos
0 2 4 6 8 10
 RFA, =0.3
 RFA, =0.49
Fig. 10. Parametric plot of the bridge function B(r) versus the indirect correlation
function γ(r). The dashed line refers to the RFA for η = 0.3, while the solid line
refers to the RFA for η = 0.49. In each case, the branch of the curve to the right
of the circle corresponds to r ≤ 1, while that to the left corresponds to r ≥ 1. For
comparison, the PY closure B(r) = ln[1 + γ(r)]− γ(r) is also plotted (dash-dotted
line).
Y2 = 3 ln y(1)−
y′(1)
− 3Y0 − 2Y1, (143)
Y3 = −2 ln y(1) +
y′(1)
+ 2Y0 + Y1. (144)
The proposal (140) is compared with available Monte Carlo data [84] in Fig.
9, where an excellent agreement can be observed.
Once the cavity function y(r) provided by the RFA method is comple-
mented by (140), the bridge function B(r) can be obtained at any distance.
Figure 10 presents a parametric plot of the bridge function versus the indirect
correlation function as given by the RFA method for two different packing
fractions, as well as the result associated with the PY closure. The fact that
one gets a smooth curve means that within the RFA the oscillations in γ(r)
are highly correlated to those of B(r). Further, the effective closure relation in
the RFA turns out to be density dependent, in contrast with what occurs for
the PY theory. Note that the absolute value |B(r)| for a given value of γ(r)
is smaller in the RFA than the PY value and that the RFA and PY curves
become paradoxically closer for larger densities. Since the PY theory is known
Alternative Approaches to Hard-Sphere Liquids 39
to yield rather poor values of the cavity function inside the core [85, 86], it
seems likely that the present differences may represent yet another manifes-
tation of the superiority of the RFA method, a point that certainly deserves
to be further explored.
3.2 The Multicomponent HS Fluid
The method outlined in the preceding subsection will be now extended to an
N -component mixture of additive HS. Note that in a multicomponent system
the isothermal compressibility χ is given by
χ−1 =
T,{xj}
T,{xj}
= 1− ρ
i,j=1
xixj c̃ij(0), (145)
where c̃ij(q) is the Fourier transform of the direct correlation function cij(r),
which is defined by the OZ equation
h̃ij(q) = c̃ij(q) +
ρkh̃ik(q)c̃kj(q), (146)
where hij(r) ≡ gij(r) − 1. Equations (145) and (146) are the multicompo-
nent extensions of Eqs. (96) and (95), respectively. Introducing the quantities
ĥij(q) ≡
ρiρj h̃ij(q) and ĉij(q) ≡
ρiρj c̃ij(q), the OZ relation (146) be-
comes, in matrix notation,
ĉ(q) = ĥ(q) · [I+ ĥ(q)]−1, (147)
where I is the N ×N identity matrix. Thus, Eq. (145) can be rewritten as
χ−1 =
i,j=1
xixj [δij − ĉij(0)] =
i,j=1
I+ ĥ(0)
. (148)
Similarly to what we did in the single component case, we introduce the
Laplace transforms of rgij(r):
Gij(s) =
dr e−srrgij(r). (149)
The counterparts of Eqs. (100) and (101) are
gij(r) = Θ(r − σij)
gij(σ
ij) + g
ij)(r − σij) + · · ·
, (150)
40 M. López de Haro, S. B. Yuste and A. Santos
eσijssGij(s) = σijgij(σ
ij) +
gij(σ
ij) + σijg
s−1 +O(s−2). (151)
Moreover, the condition of a finite compressibility implies that h̃ij(0) = finite.
As a consequence, for small s,
s2Gij(s) = 1 +H
3 + · · · (152)
with H
ij = finite and H
ij = −h̃ij(0)/4π = finite, where
dr (−r)nrhij(r). (153)
We are now in the position to generalize the approximation (113) to the
N -component case [87]. While such a generalization may be approached in a
variety of ways, two motivations are apparent. On the one hand, we want to
recover the PY result [8] as a particular case in much the same fashion as in
the single component system. On the other hand, we want to maintain the
development as simple as possible. Taking all of this into account, we propose
Gij(s) =
e−σijs
L(s) · [(1 + αs)I− A(s)]−1
, (154)
where L(s) and A(s) are the matrices
Lij(s) = L
ij + L
ij s+ L
2, (155)
Aij(s) = ρi
ϕ2(σis)σ
ij + ϕ1(σis)σ
ij + ϕ0(σis)σiL
, (156)
the functions ϕn(x) being defined by Eq. (109). We note that, by construc-
tion, Eq. (154) complies with the requirement lims→∞ e
σijssGij(s) = finite.
Further, in view of Eq. (152), the coefficients of s0 and s in the power series
expansion of s2Gij(s) must be 1 and 0, respectively. This yields 2N
2 condi-
tions that allow us to express L(0) and L(1) in terms of L(2) and α. The solution
is [87]
ij = ϑ1 + ϑ2σj + 2ϑ2α− ϑ1
ρkσkL
kj , (157)
ij = ϑ1σij +
ϑ2σiσj + (ϑ1 + ϑ2σi)α−
ρkσkL
kj , (158)
where ϑ1 ≡ 2π/(1− η) and ϑ2 ≡ 6π(M2/M3)η/(1− η)2.
In parallel with the development of the single component case, L(2) and α
can be chosen arbitrarily. Again, the choice L
ij = α = 0 gives the PY solution
[8, 88]. Since we want to go beyond this approximation, we will determine
those coefficients by taking prescribed values for gij(σij), which in turn, via
Eq. (2), give the EOS of the mixture. This also leads to the required value of
Alternative Approaches to Hard-Sphere Liquids 41
χ−1 = ∂(ρZ)/∂ρ, thus making the theory thermodynamically consistent. In
particular, according to Eq. (151),
ij = 2πασijgij(σ
ij). (159)
The condition related to χ is more involved. Making use of Eq. (152), one can
get h̃ij(0) = −4πH(1)ij in terms of L(2) and α and then insert it into Eq. (148).
Finally, elimination of L
ij in favor of α from Eq. (159) produces an algebraic
equation of degree 2N , whose physical root is determined by the requirement
that Gij(s) is positive definite for positive real s. It turns out that the physical
solution corresponds to the smallest of the real roots. Once α is known, upon
substitution into Eqs. (154), (157), (158), and (159), the scheme is complete.
Also, using Eq. (151), one can easily derive the result
g′ij(σ
ij) =
2πασij
ij − L
. (160)
It is straightforward to check that the results of the preceding subsection are
recovered by setting σi = σ, regardless of the values of the mole factions.
Once Gij(s) has been determined, inverse Laplace transformation directly
yields rgij(r). Although in principle this can be done analytically, it is more
practical to use one of the efficient methods discussed by Abate and Whitt [89]
to numerically invert Laplace transforms [90].
In Fig. 11 we present a comparison between the results of the RFA method
with the PY theory and simulation data [50] for the RDF of a ternary mixture.
In the case of the RFA, we have used the eCS2 contact values and the cor-
responding isothermal compressibility. The improvement of the RFA over the
PY prediction, particularly in the region near contact, is noticeable. Although
the RFA accounts nicely for the observed oscillations, it seems to somewhat
overestimate the depth of the first minimum.
Explicit knowledge of Gij(s) also allows us to determine the Fourier trans-
form h̃ij(q) through the relation
h̃ij(q) = −2π
Gij(s)−Gij(−s)
. (161)
The structure factor Sij(q) may be expressed in terms of h̃ij(q) as [4]
Sij(q) = xiδij + ρxixj h̃ij(q). (162)
In the particular case of a binary mixture, rather than the individual structure
factors Sij(q), it is some combination of them which may be easily associated
with fluctuations of the thermodynamic variables [91, 92]. Specifically, the
quantities [4]
Snn(q) = S11(q) + S22(q) + 2S12(q), (163)
42 M. López de Haro, S. B. Yuste and A. Santos
2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8
10.750.50.250
10.750.50.250
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
Fig. 11. Radial distribution functions gij(r) for a ternary mixture with diameters
σ1 = 1, σ2 = 2, and σ3 = 3 at a packing fraction η = 0.49 with mole fractions
x1 = 0.7, x2 = 0.2, and x3 = 0.1. The circles are simulation results [50], the solid
lines are the RFA predictions, and the dotted lines are the PY predictions.
Snc(q) = x2S11(q)− x1S22(q) + (x2 − x1)S12(q), (164)
Scc(q) = x
2S11(q) + x
1S22(q)− 2x1x2S12(q) (165)
are sometimes required.
After replacement of ĥij(q) =
ρiρj h̃ij(q) in Eq. (147), one easily gets
c̃ij(q). Subsequent inverse Fourier transformation yields cij(r). The result
gives cij(r) for r > σij as the superposition of N Yukawas [93], namely
Alternative Approaches to Hard-Sphere Liquids 43
cij(r) =
e−κℓr
, (166)
where q = ±iκℓ with ℓ = 1, . . . , N are the zeros of det
I+ ĥ(q)
and the
amplitudes K
ij are obtained by applying the residue theorem as
q→iκℓ
c̃ij(q)(q − iκℓ). (167)
The indirect correlation functions γij(r) ≡ hij(r) − cij(r) readily follow
from the previous results for the RDF and direct correlation functions. Finally,
in this case the bridge functions Bij(r) for r > σij are linked to gij(r) and
cij(r) through
Bij(r) = ln gij(r) − γij(r) (168)
and so once more we have a full set of analytical results for the structural
properties of a multicomponent fluid mixture of HS once the contact values
gij(σij) are specified.
4 Other Related Systems
The philosophy behind the RFA method to derive the structural properties
of three-dimensional HS systems can be adapted to deal with other related
systems. The main common features of the RFA can be summarized as follows.
First, one chooses to represent the RDF in Laplace space. Next, using as a
guide the low-density form of the Laplace transform, an auxiliary function is
defined which is approximated by a rational or a rational-like form. Finally,
the coefficients are determined by imposing some basic consistency conditions.
In this section we consider the cases of sticky-hard-sphere, square-well, and
hard-disk fluids. In the two former cases the RFA program is followed quite
literally, while in the latter case it is done more indirectly through the RFA
method as applied to hard rods (d = 1) and hard spheres (d = 3).
4.1 Sticky Hard Spheres
The sticky-hard-sphere (SHS) fluid model has received a lot of attention since
it was first introduced by Baxter in 1968 [94] and later extended to multi-
component mixtures by Perram and Smith [95] and, independently, by Bar-
boy [96]. In this model, the molecular interaction may be defined via square-
well (SW) potentials of infinite depth and vanishing width, thus embodying
the two essential characteristics of real molecular interactions, namely a harsh
repulsion and an attractive part. In spite of their known shortcomings [97], an
important feature of SHS systems is that they allow for an exact solution of the
44 M. López de Haro, S. B. Yuste and A. Santos
OZ equation in the PY approximation [94,95]. Furthermore, they are thought
to be appropriate for describing structural properties of colloidal systems, mi-
celles, and microemulsions, as well as some aspects of gas-liquid equilibrium,
ionic fluids and mixtures, solvent mediated forces, adsorption phenomena,
polydisperse systems, and fluids containing chainlike molecules [98–102].
Let us consider an N -component mixture of spherical particles interacting
according to the SW potential
φij(r) =
∞, r < σij ,
−ǫij , σij < r < Rij ,
0, r > Rij .
(169)
As in the case of additive HS, σij = (σi + σj)/2 is the distance between the
centers of a sphere of species i and a sphere of species j at contact. In addition,
ǫij is the well depth and Rij − σij indicates the well width. We now take the
SHS limit [94], namely
Rij → σij , ǫij → ∞, τij ≡
Rij − σij
e−ǫij/kBT = finite, (170)
where the τij are monotonically increasing functions of the temperature T and
their inverses measure the degree of “adhesiveness” of the interacting spheres i
and j. Even without strictly taking the mathematical limits (170), short-range
SW fluids can be well described in practice by the SHS model [103].
The virial EOS for the SHS mixture is given by
Z = 1 +
i,j=1
dr ryij(r)
e−φij(r)/kBT
= 1 +
i,j=1
xixjσ
ijyij(σij)
12τij
y′ij(σij)
yij(σij)
, (171)
where yij(r) ≡ gij(r)eφij(r)/kBT is the cavity function and y′ij(r) = dyij(r)/dr.
Since yij(r) must be continuous, it follows that
gij(r) = yij(r)
Θ(r − σij) +
12τij
δ(r − σij)
. (172)
The case of a HS system is recovered by taking the limit of vanishing adhe-
siveness τ−1ij → 0, in which case Eq. (171) reduces to the three-dimensional
version of Eq. (2). On the other hand, the compressibility EOS, Eq. (145), is
valid for any interaction potential, including SHS.
As in the case of HS, it is convenient to define the Laplace transform (149).
The condition yij(σij) = finite translates into the following large s behavior
of Gij(s):
eσijsGij(s) = σ
ijyij(σij)
12τij
+ σ−1ij s
+O(s−2), (173)
Alternative Approaches to Hard-Sphere Liquids 45
which differs from (151): while eσijsGij(s) ∼ s−1 for HS, eσijsGij(s) ∼ s0 for
SHS. However, the small s behavior is still given by Eq. (152), as a consequence
of the condition χ−1 = finite.
The RFA proposal for SHS mixtures [104] keeps the form (154), except
that now
Lij(s) = L
ij + L
ij s+ L
2 + L
3, (174)
Aij(s) = ρi
ϕ2(σis)σ
ij + ϕ1(σis)σ
ij + ϕ0(σis)σiL
ij − e
−σisL
(175)
instead of Eqs. (155) and (156). By construction, Eqs. (154), (174), and (175)
comply with the requirement lims→∞ e
σijsGij(s) = finite. Further, in view
of Eq. (152), the coefficients of s0 and s in the power series expansion of
s2Gij(s) must be 1 and 0, respectively. This yields 2N
2 conditions that allow
us to express L(0) and L(1) in terms of L(2), L(3), and α as [104]
ij = ϑ1+ϑ2σj+2ϑ2α−ϑ1
kj − L
ρkσkL
kj , (176)
ij = ϑ1σij +
ϑ2σiσj + (ϑ1 + ϑ2σi)α−
kj − L
(ϑ1 + ϑ2σi)
ρkσkL
kj , (177)
where ϑ1 and ϑ2 are defined below Eq. (158). We have the freedom to choose
(3) and α, but L(2) is constrained by the condition (173), i.e., the ratio
between the first and second terms in the expansion of eσijsGij(s) for large s
must be exactly equal to σij/12τij.
First-Order Approximation (PY Solution)
The simplest approximation consists of making α = 0. In view of the condition
eσijsGij(s) ∼ s0 for large s, this implies L(3)ij = 0. In that case, the large s
behavior that follows from Eq. (154) is
2πeσijsGij(s) = L
(2) · D
s−1 +O(s−2), (178)
where
Dij ≡ ρi
σ2i L
ij − σiL
ij + L
. (179)
Comparison with Eq. (173) yields
yij(σij) =
πσ2ij
ij , (180)
46 M. López de Haro, S. B. Yuste and A. Santos
12τijL
ik Dkj . (181)
Taking into account Eqs. (176) and (177) (with L
ij = L
ji and of course also
with α = 0 and L(3) = 0), Eq. (181) becomes a closed equation for L(2):
12τijL
= ϑ1σij+
ϑ2σiσj−
ki σj + L
kj σi
(182)
The physical root L(2) of Eq. (182) is the one vanishing in the HS limit τij →
∞. Once known, Eq. (180) gives the contact values.
This first-order approximation obtained from the RFA method turns out
to coincide with the exact solution of the PY theory for SHS [95].
Second-Order Approximation
As in the case of HS mixtures, a more flexible proposal is obtained by keeping
α (and, consequently, L
ij ) different from zero. In that case, instead of Eq.
(178), one has
2πeσijsGij(s) =
+O(s−2). (183)
This implies
πσ2ij
αyij(σij), (184)
12τijL
. (185)
If we fix yij(σij), Eqs. (176), (177), (184), and (185) allow one to express L
(1), L(2), and L(3) as linear functions of α. Thus, only the scalar parameter
α remains to be fixed, analogously to what happens in the HS case. As done
in the latter case, one possibility is to choose α in order to reproduce the
isothermal compressibility χ given by Eq. (148). To do so, one needs to find
the coefficients H
ij appearing in Eq. (152). The result is [104]
(0) = C(0) ·
I− A(0)
, (186)
(1) = C(1) ·
I− A(0)
, (187)
where
Alternative Approaches to Hard-Sphere Liquids 47
αδkj −A(1)kj
δkj −A(0)kj
(188)
σ2ik +H
αδkj −A(1)kj
σ3ik + σikH
δkj −A(0)kj
, (189)
ij = (−1)
σn+3i
(n+ 3)!
σn+2i
(n+ 2)!
σn+1i
(n+ 1)!
(190)
Equation (187) gives H(1) in terms of α: H
ij = Pij(α)/[Q(α)]
2, where Pij(α)
denotes a polynomial in α of degree 2N and Q(α) denotes a polynomial of
degree N . It turns out then that, seen as a function of α, χ is the ratio of
two polynomials of degree 2N . Given a value of χ, one may solve for α. The
physical solution, which has to fulfill the requirement that Gij(s) is positive
definite for positive real s, corresponds to the smallest positive real root.
Once α is known, the scheme is complete: Eq. (184) gives L(3), then L(2) is
obtained from Eq. (185), and finally L(1) and L(0) are given by Eqs. (176) and
(177), respectively. Explicit knowledge of Gij(s) through Eqs. (154), (174),
and (175) allows one to determine the Fourier transform h̃ij(q) and the struc-
ture factor Sij(q) through Eqs. (161) and (162), respectively. Finally, inverse
Laplace transformation of Gij(s) yields gij(r) [90].
Single Component SHS Fluids
The special case of single component SHS fluids [105, 106] can be obtained
from the multicomponent one by taking σij = σ and τij = τ . Thus, the
Laplace transform of rg(r) in the RFA is
G(s) =
L(0) + L(1)s+ L(2)s2 + L(3)s3
1 + αs− ρ
ϕ2(s)L(0) + ϕ1(s)L(1) + ϕ0(s)L(2) − e−sL(3)
(191)
where we have taken σ = 1. Equations (176) and (177) become
L(0) = 2π
1 + 2η
(1− η)2
− L(2)
(1 − η)2
(1 − 4η)L(3), (192)
L(1) = 2π
1 + 1
(1− η)2
1 + 2η
α− 3ηL(2)
− 18η
(1− η)2
L(3). (193)
The choice α = L(3) = 0 makes Eq. (191) coincide with the exact solution
to the PY approximation for SHS [94], where L(2) is the physical root (i.e., the
one vanishing in the limit τ → ∞) of the quadratic equation [see Eq. (182)]
48 M. López de Haro, S. B. Yuste and A. Santos
12τL(2) = 2π
1 + 2η
(1− η)2
− 12η
L(2) +
ηL(2)
. (194)
We can go beyond the PY approximation by prescribing a contact value
y(1), so that, according to Eqs. (184) and (185),
L(3) =
y(1), (195)
L(2) =
12τ +
L(3). (196)
By prescribing the isothermal compressibility χ, the parameter α can be ob-
tained as the physical solution (namely, the one remaining finite in the limit
τ → ∞) of a quadratic equation [106]. Thus, given an EOS for the SHS
fluid, one can get the thermodynamically consistent values of y(1) and χ and
determine from them all the coefficients appearing in Eq. (191).
Figure 12 shows the cavity function for η = 0.164 and τ = 0.13 as obtained
from Monte Carlo simulations [101] and as predicted by the PY and RFA
theories, the latter making use of the EOS recently proposed by Miller and
Frenkel [102]. It can be observed that the RFA is not only more accurate than
the PY approximation near r = 1 but also near r = 2. On the other hand,
none of these two approximations account for the singularities (delta-peaks
and/or discontinuities) of y(r) at r =
8/3, 5/3,
3, 2, . . . [100, 101].
1.0 1.5 2.0 2.5
=0.164, =0.13 MC
Fig. 12. Cavity function of a single component SHS fluid for η = 0.164 and τ = 0.13.
The solid line represents simulation data [101]. The dotted and dashed lines represent
the PY and RFA approaches, respectively.
Alternative Approaches to Hard-Sphere Liquids 49
4.2 Single Component Square-Well Fluids
Now we consider again the SW interaction potential (169) but for a single
fluid, i.e., σij = σ, ǫij = ǫ, Rij = R. Since no exact solution of the PY theory
for the SW potential is known, the application of the RFA method is more
challenging in this case than for HS and SHS fluids.
As in the cases of HS and SHS, the key quantity is the Laplace transform
of rg(r) defined by Eq. (98). It is again convenient to introduce the auxiliary
function Ψ(s) through Eq. (99). As before, the conditions g(r) = finite and
χ = finite imply Eqs. (102) and (104), respectively. However, the important
difference between HS and SHS fluids is that in the latter case G(s) must
reflect the fact that g(r) is discontinuous at r = R as a consequence of the
discontinuity of the potential φ(r) and the continuity of the cavity function
y(r). This implies that G(s), and hence Ψ(s), must contain the exponential
term e−(R−σ)s. This manifests itself in the low-density limit, where the con-
dition limρ→0 y(r) = 1 yields
Ψ(s) =
(1 + s)− e−(R−1)s(e1/T∗ − 1)(1 +Rs)
, (197)
where T ∗ ≡ kBT/ǫ and we have taken σ = 1.
In the spirit of the RFA method, the simplest form that complies with Eq.
(102) and is consistent with Eq. (197) is [107]
Ψ(s) =
−12η + E1s+ E2s2 + E3s3
1 +Q0 +Q1s− e−(R−1)s (Q0 +Q2s)
, (198)
where the coefficients Q0, Q1, Q2, E1, E2, and E3 are functions of η, T
∗, and
R. The condition (104) allows one to express the parameters Q1, E1, E2, and
E3 as linear functions of Q0 and Q2 [107, 108]:
1 + 2η
+ 2η(R3 − 1)Q2 −
(R − 1)2(R2 + 2R+ 3)Q0
+Q2 − (R − 1)Q0, (199)
1 + 2η
3− 4(R3 − 1)Q2 + (R− 1)2(R2 + 2R+ 3)Q0
, (200)
1 + 2η
{1− η − 2(R− 1) [1− 2ηR(R+ 1)]Q2
+(R− 1)2
(1− η(R + 1)2
, (201)
1 + 2η
(1 − η)2 + 6η(R− 1)
R+ 1− 2ηR2
−η(R − 1)2[4 + 2R− η(3R2 + 2R+ 1)]Q0
. (202)
50 M. López de Haro, S. B. Yuste and A. Santos
From Eq. (102), we have
g(1+) =
. (203)
The complete RDF is given by Eq. (122), where now Eq. (198) must be used
in Eq. (123). In particular, ψ1(r) and ψ2(r) are
ψ1(r) = ψ10(r)Θ(r) + ψ11(r + 1−R)Θ(r + 1−R), (204)
ψ2(r) = ψ20(r)Θ(r)+ψ21(r+1−R)Θ(r+1−R)+ψ22(r+2−2R)Θ(r+2−2R),
(205)
where
ψ1k(r) = 2π
W1k(si)
E′(si)
six, (206)
ψ2k(r) = −4π2
rW2k(si) +W
2k(si)−W2k(si)
E′′(si)
E′(si)
[E′(si)]2
. (207)
Here, si are the three distinct roots of E(s) ≡ −12η+E1s+E2s2 +E3s3 and
W10(s) ≡ 1 +Q0 +Q1s, W11(s) ≡ −(Q0 +Q2s). (208)
W20(s) ≡ s[W10(s)]2, W21(s) ≡ 2sW10(s)W11(s), W22(s) ≡ s[W11(s)]2.
(209)
To close the proposal, we need to determine the parameters Q0 and Q2 by
imposing two new conditions. An obvious condition is the continuity of the
cavity function at r = R, what implies
g(R+) = e1/T
g(R−). (210)
This yields (
1− e−1/T
ψ10(R− 1) = −ψ11(0) = 2π
. (211)
As an extra condition, we could enforce the continuity of the first derivative
y′(r) at r = R [109]. However, this complicates the problem too much without
any relevant gain in accuracy. In principle, it might be possible to impose
consistency with a given EOS, via either the virial route, the compressibility
route, or the energy route. But this is not practical since no simple EOS for
SW fluids is at our disposal for wide values of density, temperature, and range.
As a compromise between simplicity and accuracy, we fix the parameter Q0
at its exact zero-density limit value, namely Q0 = e
1/T∗ − 1 [107]. Therefore,
Eq. (211) becomes a transcendental equation for Q2 that needs to be solved
numerically. For narrow SW potentials, however, it is possible to replace the
exact condition (210) by a simpler one allowing Q2 to be obtained analytically
[108], which is especially useful for determining the thermodynamic properties
[108, 110].
Alternative Approaches to Hard-Sphere Liquids 51
R=1.5, 3=0.4, T*=1.5
1.0 1.5 2.0 2.5 3.0
2.0 R=2, 3=0.4, T*=3
R=1.05, 3=0.8, T*=0.5
Fig. 13. Radial distribution function of a single component SW fluid for R = 1.05,
ρσ3 = 0.8, and T ∗ = 0.5 (top panel), for R = 1.5, ρσ3 = 0.4, and T ∗ = 1.5 (middle
panel), and for R = 2.0, ρσ3 = 0.4, and T ∗ = 3.0 (bottom panel). The circles
represent simulation data [111] and the solid lines refer to the results obtained from
the RFA method.
It can be proven that the RFA proposal (198) reduces to the exact solutions
of the PY equation [29, 94] in the HS limit, i.e., ǫ → 0 or R → 1, and in the
SHS limit, i.e., ǫ→ ∞ and R → 1 with (R− 1)e1/T∗ = finite [107, 108].
Comparison with computer simulations [107, 108, 110, 111] shows that the
RFA for SW fluids is rather accurate at any fluid density if the potential well
is sufficiently narrow (say R ≤ 1.2), as well as for any width if the density
is small enough (say ρσ3 ≤ 0.4). However, as the width and/or the density
increase, the RFA predictions worsen, especially at low temperatures. As an
52 M. López de Haro, S. B. Yuste and A. Santos
illustration, Fig. 13 compares the RDF provided by the RFA with Monte Carlo
data [111] for three representative cases.
4.3 Hard Disks
As is well known, the PY theory is exactly solvable for HS fluids with an
odd number of dimensions [112–114]. In particular, in the case of hard rods
(d = 1), the PY theory provides the exact RDF g(r) or, equivalently, the exact
cavity function y(r) outside the hard core (i.e., for r > σ). However, it does
not reproduce the exact y(r) in the overlapping region (i.e., for r < σ) [85].
The full exact one-dimensional cavity function is [85]
yHR(r|η) =
e−(r−1)η/(1−η)
ηn−1e−(r−n)η/(1−η)
(1 − η)n(n− 1)!
(r − n)n−1Θ(r − n),
(212)
where the subscript HR stands for hard rods and, as usual, σ = 1 has been
taken. Consequently, one has
gHR(1
+|η) = 1
dr rhHR(r|η) ≡ H(0)HR(η) = −
η2. (213)
When d is even, the PY equation is not analytically solvable for the HS
interaction. In particular, in the important case of hard disks (d = 2), one
must resort to numerical solutions of the PY equation [1, 115]. Alternatively,
a simple heuristic approach has proven to yield reasonably good results [116].
Such an approach is based on the näıve assumption that the structure and
spatial correlations of a hard-disk fluid share some features with those of a
hard-rod and a hard-sphere fluid. This fuzzy idea becomes a more specific one
by means of the following simple model [116]:
gHD(r|η) = ν(η)gHR(r|ω1(η)η) + [1− ν(η)]gHS(r|ω3(η)η). (214)
Here, the subscript HD stands for hard disks (d = 2) and the subscript HS
stands for hard spheres (d = 3). The parameter ν(η) is a density-dependent
mixing parameter, while ω1(η)η and ω3(η)η are the packing fractions in one
and three dimensions, respectively, which are “equivalent” to the packing
fraction η in two dimensions. In Eq. (214), it is natural to take for gHR(r|η)
the exact solution, Eq. (212). As for gHR(r|η), one might use the RFA recipe
described in Section 3. However, in order to keep the model (214) as simple
as possible, it is sufficient for practical purposes to take the PY solution, Eq.
(108). In the latter approximation,
gHS(1
+|η) = 1 + η/2
(1− η)2
dr rhHS(r|η) ≡ H(0)HS (η) = −
10− 2η + η2
20(1 + 2η)
(215)
Alternative Approaches to Hard-Sphere Liquids 53
In order to close the model (214), we still need to determine the parameters
ν(η), ω1(η), and ω3(η). To that end, we first impose the condition that Eq.
(214) must be consistent with a prescribed contact value gHD(1
+|η) or, equiv-
alently, with a prescribed compressibility factor ZHD(η) = 1 + 2ηgHD(1
+|η),
with independence of the choice of the mixing parameter ν(η). In other words,
gHD(1
+|η) = gHR(1+|ω1(η)η) = gHS(1+|ω3(η)η). (216)
Making use of Eqs. (213) and (215), this yields
ω1(η) =
gHD(1
+|η)− 1
ηgHD(1+|η)
, ω3(η) =
4gHD(1
+|η) + 1−
24gHD(1+|η) + 1
4ηgHD(1+|η)
(217)
Once ω1(η) and ω3(η) are known, we can determine ν(η) by imposing that
the model (214) reproduces the isothermal compressibility χHD(η) thermody-
namically consistent with the prescribed ZHD(η) [cf. Eq. (97)]. From Eqs. (96)
and (214) one has
χHD(η) = 1 + 8η
dr r {ν(η)hHR(r|ω1(η)η) + [1− ν(η)] hHS(r|ω3(η)η)} ,
(218)
so that
ν(η) =
[χHD(η)− 1] /8η −H(0)HS (ω3(η)η)
HR(ω1(η)η) −H
HS (ω3(η)η)
, (219)
where H
HR(η) and H
HS (η) are given by Eqs. (213) and (215), respectively.
Once a sensible EOS for hard disks is chosen [see, for instance, Table 1],
Eqs. (217) and (219) provide the parameters of the model (214). The results
show that the scaling factor ω1(η) is a decreasing function, while ω3(η) is
an increasing function [116]. As for the mixing parameter ν(η), it is hardly
dependent of density and takes values around ν(η) ≃ 0.35–0.40.
Comparison of the interpolation model (214) with computer simulation re-
sults shows a surprisingly good agreement, despite the crudeness of the model
and the absence of empirical fitting parameters, especially at low and mod-
erate densities [116]. The discrepancies become important only for distances
beyond the location of the second peak and for densities close to the stability
threshold.
5 Perturbation Theory
When one wants to deal with realistic intermolecular interactions, the prob-
lem of deriving the thermodynamic and structural properties of the system
becomes rather formidable. Thus, perturbation theories of liquids have been
devised since the mid twentieth century. In the case of single component flu-
ids, the use of an accurate and well characterized RDF for the HS fluid in a
54 M. López de Haro, S. B. Yuste and A. Santos
perturbation theory opens up the possibility of deriving a closed theoretical
scheme for the determination of the thermodynamic and structural proper-
ties of more realistic models, such as the Lennard–Jones (LJ) fluid. In this
section, we will consider this model system, which captures the basic physical
properties of real non-polar fluids, to illustrate the procedure.
In the application of the perturbation theory of liquids, the stepping stone
has been the use of the HS RDF obtained from the solution to the PY equa-
tion. Unfortunately, the absence of thermodynamic consistency present in the
PY approximation (as well as in other integral equation theories) may clearly
contaminate the results derived from its use within a perturbative treatment.
In what follows we will reanalyze the different theoretical schemes for the ther-
modynamics of LJ fluids that have been constructed with perturbation theory,
taking as the reference system the HS fluid. This includes the consideration
of the RDF as obtained with the RFA method, which embodies thermody-
namic consistency, as well as the proposal of a unifying framework in which
all schemes fit in. With our development, we will be able to present a formula-
tion which lends itself to relatively easy numerical calculations while retaining
the merits that analytical results provide, namely a detailed knowledge and
control of all the approximations involved.
Let us consider a three-dimensional fluid system defined by a pair inter-
action potential φ(r). The virial and energy EOS express the compressibility
factor Z and the excess part of the Helmholtz free energy per unit volume
f ex, respectively, in terms of the RDF of the system as
Z = 1− 2
∂φ(r)
g(r)r3, (220)
= 2πρβ
dr φ(r)g(r)r2 , (221)
where β ≡ 1/kBT . Let us now assume that φ(r) is split into a known (ref-
erence) part φ0(r) and a perturbation part φ1(r). The usual perturbative
expansion for the Helmholtz free energy to first order in β leads to [2]
+ 2πρβ
dr φ1(r)g0(r)r
, (222)
where f0 and g0(r) are the free energy and the RDF of the reference system,
respectively.
The LJ potential is
φLJ(r) = 4ǫ
r−12 − r−6
, (223)
where ǫ is the depth of the well and, for simplicity, we have taken the distance
at which the potential vanishes as the length unit, i.e., φLJ(r = 1) = 0. For
this potential the reference system may be forced to be a HS system, i.e., one
can set
Alternative Approaches to Hard-Sphere Liquids 55
φ0(r) = φHS(r) =
∞, r ≤ σ0,
0, r > σ0,
(224)
where σ0 is a conveniently chosen effective HS diameter. In this case the
Helmholtz free energy to this order is approximated by
≈ fHS
+ 2πρβ
dr φLJ(r)gHS(r/σ0)r
2. (225)
Note that Eq. (225) may be rewritten in terms of the Laplace transform G(s)
of (r/σ0)gHS(r/σ0) as
≈ fHS
+ 2πρβσ30
ds ΦLJ(s)G(s), (226)
where ΦLJ(s) satisfies
rφLJ(r) = σ0
ds e−rs/σ0ΦLJ(s), (227)
so that
ΦLJ(s) = 4ǫσ
(s/σ0)
− (s/σ0)
. (228)
Irrespective of the value of the diameter σ0 of the reference system, the
right hand side of Eq. (226) represents always an upper bound for the value of
the free energy of the real system. Therefore, it is natural to determine σ0 so
as to provide the least upper bound. This is precisely the variational scheme of
Mansoori and Canfield [117,118] and Rasaiah and Stell [119], usually referred
to as MC/RS, and originally implemented with the PY theory for G(s), Eq.
(108). In our case, however, we will considerG(s) as given by the RFA method,
Eq. (113). Therefore, at fixed ρ and β, the effective diameter σ0 in the MC/RS
scheme is obtained from the conditions
{∫ η0
ZHS(η)− 1
+ 48βǫσ−20
dsG(s|η0)
(s/σ0)
− (s/σ0)
= 0, (229)
{∫ η0
ZHS(η)− 1
+ 48βǫσ−20
dsG(s|η0)
(s/σ0)
− (s/σ0)
> 0. (230)
In these equations, use has been made of the thermodynamic relationship
between the free energy and the compressibility factor, Eq. (78). Moreover, we
have called η0 ≡ (π/6)ρσ30 and have made explicit with the notation G(s|η0)
the fact that the HS RDF depends on the packing fraction η0.
56 M. López de Haro, S. B. Yuste and A. Santos
Even if the reference system is not forced to be a HS fluid, one can still use
Eq. (226) provided an adequate choice for σ0 is made such that the expansion
involved in the right hand side of Eq. (222) yields the right hand side of Eq.
(226) to order β2. This is the idea of the Barker and Henderson [120] first
order perturbation scheme (BH1), where the effective HS diameter is
1− e−βφLJ(r)
. (231)
The same ideas may be carried out to higher order in the perturbation
expansion. The inclusion of the second order term in the expansion yields the
so-called macroscopic compressibility approximation [2] for the free energy,
namely
+ 2πρβ
dr φ1(r)g0(r)r
−πρβ2χ0
dr φ21(r)g0(r)r
, (232)
where χ0 is the (reduced) isothermal compressibility of the reference system
[121].
To implement a particular perturbation scheme in this approximation un-
der a unifying framework that eventually leads to easy numerical evaluation,
two further assumptions may prove convenient. First, the perturbation poten-
tial φ1(r) ≡ φLJ(r)−φ0(r) may be split into two parts using some “molecular
size” parameter ξ ≥ σ0 such that
φ1(r) =
φ1a(r), 0 ≤ r ≤ ξ,
φ1b(r), r > ξ.
(233)
Next, a choice for the RDF for the reference system is done in the form
g0 (r) ≈ θ(r)yHS(r/σ0), (234)
where yHS is the cavity (background) correlation function of the HS system
and θ(r) is a step function defined by
θ(r) =
θa(r), 0 ≤ r ≤ ξ,
θb(r), r > ξ,
(235)
in which the functions θa(r) and θb(r) depend on the scheme.
With these assumptions the integrals involved in Eq. (232) may be rewrit-
ten as
dr φn1 (r)g0(r)r
dr φn1a(r)θa(r)yHS(r/σ0)r
dr φn1a(r)θa(r)gHS(r/σ0)r
dr φn1b(r)θb(r)gHS(r/σ0)r
2, (236)
Alternative Approaches to Hard-Sphere Liquids 57
with n = 1, 2 and where the fact that yHS(r/σ0) = gHS(r/σ0) when r > σ0
has been used. Decomposing the last integral as
and applying
the same step as in Eq. (226), Eq. (236) becomes
In = σ
ds Φnb(s)G(s) +
dr φn1a(r)θa(r)yHS(r/σ0)r
dr [φn1a(r)θa(r) − φn1b(r)θb(r)] gHS(r/σ0)r2, (237)
where the functions Φ1b(s) and Φ2b(s) are defined by the relation
rφn1b(r)θb(r) = σ0
ds e−rs/σ0Φnb(s). (238)
In the Barker–Henderson second order perturbation scheme (BH2), one
takes
θa(r) = 0, θb(r) = 1, ξ = σ0, φ1a(r) = 0, φ1b(r) = 4ǫ
r−12 − r−6
(239)
and σ0 is computed according to Eq. (231). This choice ensures that
+ 2πρβ
dr φ1(r)gHS(r/σ0)r
−πρβ2χHS
dr φ21(r)gHS(r/σ0)r
. (240)
On the other hand, if one chooses
θa(r) = exp [−β (φLJ(r) + ǫ)] , θb(r) = 1, ξ = 21/6, (241)
φ1a(r) = −ǫ, φ1b(r) = 4ǫ
r−12 − r−6
, (242)
the scheme leads to the Weeks–Chandler–Andersen (WCA) theory [122] if one
determines the HS diameter through the condition χ0 = χHS [123], which in
turn implies
dr r2e−βφ0(r)yHS(r/σ0) =
∫ 21/6
dr r2gHS(r/σ0)
1− e−βφ0(r)
. (243)
To close the scheme, the HS cavity function has to be provided in the range 0 ≤
r ≤ σ0. Fortunately, relatively simple expressions for yHS(r/σ0) are available
in the literature [124–126], apart from our own proposal, Eq. (140).
Note that θb(r) and φ1b(r), and thus also Φnb(s), are the same functions in
the BH2 and WCA schemes. It is convenient, in order to have all the quantities
needed to evaluate fLJ in these schemes, to provide explicit expressions for
Φ1b(s) and Φ2b(s). These are given by [cf. Eq. (228)]
58 M. López de Haro, S. B. Yuste and A. Santos
Φ1b(s) = ΦLJ(s), (244)
Φ2b(s) = 16ǫ
2σ−20
(s/σ0)
− 2(s/σ0)
(s/σ0)
. (245)
Up to this point, we have embodied the most popular perturbation schemes
within a unified framework that requires as input only the EOS of the HS fluid
in order to compute the Helmholtz free energy of the LJ system and leads
to relatively easy numerical computations. It should be clear that a variety
of other possible schemes, requiring the same little input, fit in our unified
framework, which is based on the RFA method for gHS(r/σ0) and G(s). Once
fLJ has been determined, the compressibility factor of the LJ fluid at a given
order of the perturbation expansion readily follows from Eqs. (222) or (232)
through the thermodynamic relation
ZLJ = ρ
. (246)
Taking into account that the HS fluid presents a fluid-solid transition at
a freezing packing fraction ηf ≃ 0.494 [127] and a solid-fluid transition at
a melting packing fraction ηm ≃ 0.54 [127], the fluid-solid and solid-fluid
coexistence lines for the LJ system may be computed from the values (ρ, T )
determined from the conditions (π/6)ρσ30(ρ, T ) = ηf and (π/6)ρσ
0(ρ, T ) = ηm,
respectively, with the effective diameter σ0(ρ, T ) obtained using any of the
perturbative schemes. Similarly, admitting that there is a glass transition in
the HS fluid at the packing fraction ηg ≃ 0.56 [128], one can now determine
the location of the liquid-glass transition line for the LJ fluid in the (ρ, T )
plane from the simple relationship (π/6)ρσ30(ρ, T ) = ηg. With a proper choice
for ZHS, it has been shown [76,129,130] that the critical point, the structure,
and the phase diagram (including a glass transition) of the LJ fluid may be
adequately described with this approach.
6 Perspectives
In this chapter we have given a self-contained account of a simple (mostly
analytical) framework for the study of the thermodynamic and structural
properties of hard-core systems. Whenever possible, the developments have
attempted to cater for mixtures with an arbitrary number of components (in-
cluding polydisperse systems) and arbitrary dimensionality. We started con-
sidering the contact values of the RDF because they enter directly into the
EOS and are required as input in the RFA method to compute the structural
properties. With the aid of consistency conditions, we were able to devise var-
ious approximate proposals which, when used in conjunction with a sensible
choice for the contact value of the RDF of the single component fluid (re-
quired in the formulation but otherwise chosen at will), have been shown to
Alternative Approaches to Hard-Sphere Liquids 59
be in reasonably good agreement with simulation results and lead to accurate
EOS both for additive and non-additive mixtures. Some aspects of the results
that follow from the use of these EOS were illustrated by looking at demixing
problems in these mixtures, including the far from intuitive case of a binary
mixture of non-additive hard spheres in infinite dimensionality.
After that, restricting ourselves to three-dimensional systems, we described
the RFA method as applied to a single component hard-sphere fluid and to
a multicomponent mixture of HS. Using this approach, we have been able to
obtain explicit analytical results for the RDF, the direct correlation function,
the static structure factor, and the bridge function, in the end requiring as
input only the contact value of the RDF of the single component HS fluid
(or equivalently its compressibility factor). One of the nice assets of the RFA
approach is that it eliminates the thermodynamic consistency problem which
is present in most of the integral equation formulations for the computation
of structural quantities. Once again, when a sensible choice for the single
component EOS is made, we have shown, through the comparison between
the results of the RFA approach and simulation data for some illustrative
cases, the very good performance of our development. Also, the use of the
RFA approach in connection with some other related systems (sticky hard
spheres, square-well fluids, and hard disks) has been addressed.
The final part of the chapter concerns the use of HS results for more
realistic intermolecular potentials in the perturbation theory of liquids. In
this instance we have been able to provide a unifying scheme in which the
most popular perturbation theory formulations may be expressed and which
was devised to allow for easy computations. We illustrated this for a LJ fluid
but it should be clear that a similar approach might be followed for other fluids
and in fact it has recently been done in connection with the glass transition
of hard-core Yukawa fluids [131].
Finally, it should be clear that there are many facets of the equilibrium and
structural properties of hard-core systems that may be studied with a simi-
lar approach but that up to now have not been considered. For instance, the
generalizations of the RFA approach for systems such as hard hyperspheres,
non-additive hard spheres, square-well mixtures, penetrable spheres [132], or
the Jagla potential [133] appear as interesting challenges. Similarly, the ex-
tension of the perturbation theory scheme to the case of LJ mixtures seems
a worthwhile task. We hope to address some of these problems in the future
and would be very much rewarded if some others were taken up by researchers
who might find these developments also a valuable tool for their work.
References
1. J. A. Barker and D. Henderson, Rev. Mod. Phys. 48, 587 (1976).
2. D. A. McQuarrie, Statistical Mechanics (Harper & Row, N. Y., 1976).
3. H. L. Friedman, A Course in Statistical Mechanics (Prentice Hall, Englewood
Cliffs, 1985).
60 M. López de Haro, S. B. Yuste and A. Santos
4. J.-P. Hansen and I. R. McDonald, Theory of Simple Liquids, (Academic Press,
London, 1986).
5. J. L. Lebowitz and D. Zomick, J. Chem. Phys. 54, 3335 (1971).
6. J. T. Jenkins and F. Mancini, J. Appl. Mech. 54, 27 (1987).
7. C. Barrio and J. R. Solana, J. Chem. Phys. 115, 7123 (2001); 117, 2451(E)
(2002).
8. J. L. Lebowitz, Phys. Rev. A 133, 895 (1964).
9. H. Reiss, H. L. Frisch, and J. L. Lebowitz, J. Chem. Phys. 31, 369 (1959); E.
Helfand, H. L. Frisch, and J. L. Lebowitz, J. Chem. Phys. 34, 1037 (1961); J.
L. Lebowitz, E. Helfand, and E. Praestgaard, J. Chem. Phys. 43, 774 (1965).
10. M. J. Mandell and H. Reiss, J. Stat. Phys. 13, 113 (1975).
11. Y. Rosenfeld, J. Chem. Phys. 89, 4272 (1988).
12. M. Heying and D. S. Corti, J. Phys. Chem. B 108, 19756 (2004).
13. T. Boubĺık, J. Chem. Phys. 53, 471 (1970).
14. E. W. Grundke and D. Henderson, Mol. Phys. 24, 269 (1972).
15. L. L. Lee and D. Levesque, Mol. Phys. 26, 1351 (1973).
16. G. A. Mansoori, N. F. Carnahan, K. E. Starling, and J. T. W. Leland, J. Chem.
Phys. 54, 1523 (1971).
17. D. Henderson, A. Malijevský, S. Lab́ık, and K. Y. Chan, Mol. Phys. 87, 273
(1996).
18. D. H. L. Yau, K.-Y. Chan, and D. Henderson, Mol. Phys. 88, 1237 (1996); 91,
1137 (1997).
19. D. Henderson and K. Y. Chan, J. Chem. Phys. 108, 9946 (1998); Mol. Phys.
94, 253 (1998); 98, 1005 (2000).
20. D. Henderson, D. Boda, K. Y. Chan, and D. T. Wasan, Mol. Phys. 95, 131
(1998).
21. D. Matyushov, D. Henderson, and K.-Y. Chan, Mol. Phys. 96, 1813 (1999).
22. D. Cao, K.-Y. Chan, D. Henderson, and W. Wang, Mol. Phys. 98, 619 (2000).
23. D. V. Matyushov and B. M. Ladanyi, J. Chem. Phys. 107, 5815 (1997).
24. C. Barrio and J. R. Solana, J. Chem. Phys. 113, 10180 (2000).
25. D. Viduna and W. R. Smith, Mol. Phys. 100, 2903 (2002); J. Chem. Phys.
117, 1214 (2002).
26. D. Henderson, Mol. Phys. 30, 971 (1975).
27. A. Santos, M. López de Haro, and S. B. Yuste, J. Chem. Phys. 103, 4622
(1995); M. López de Haro, A. Santos, and S. B. Yuste, Eur. J. Phys. 19, 281
(1998).
28. S. Luding, Phys. Rev. E 63, 042201 (2001); S. Luding, Adv. Compl. Syst.
4, 379 (2002); S. Luding and O. Strauß, in Granular Gases, T. Pöschel and
S. Luding, eds. (LNP 564, Springer-Verlag, Berlin, 2001), pp. 389–409.
29. M. S. Wertheim, Phys. Rev. Lett. 10, 321 (1963); E. Thiele, J. Chem. Phys.
39, 474 (1963).
30. N. F. Carnahan and K. E. Starling, J. Chem. Phys. 51, 635 (1969).
31. M. Luban and J. P. J. Michels, Phys. Rev. A 41, 6796 (1990).
32. E. Hamad, J. Chem. Phys. 101, 10195 (1994).
33. C. Vega, J. Chem. Phys. 108, 3074 (1998).
34. N. M. Tukur, E. Z. Hamad, and G. A. Mansoori, J. Chem. Phys. 110, 3463
(1999).
35. A. Santos, S. B. Yuste, and M. López de Haro, Mol. Phys. 96, 1 (1999).
36. A. Malijevský and J. Veverka, Phys. Chem. Chem. Phys. 1, 4267 (1999).
Alternative Approaches to Hard-Sphere Liquids 61
37. A. Santos, S. B. Yuste, and M. López de Haro, Mol. Phys. 99, 1959 (2001).
38. M. González-Melchor, J. Alejandre, and M. López de Haro, J. Chem. Phys.
114, 4905 (2001).
39. M. López de Haro, S. B. Yuste, and A. Santos, Phys. Rev. E 66, 031202
(2002).
40. A. Santos, Mol. Phys. 96, 1185 (1999); 99, 617(E) (2001).
41. C. Regnaut, A. Dyan, and S. Amokrane, Mol. Phys. 99, 2055 (2001); 100,
2907(E) (2002).
42. A. Santos, S. B. Yuste, and M. López de Haro, J. Chem. Phys. 117, 5785
(2002).
43. A. Santos, S. B. Yuste and M. López de Haro, J. Chem. Phys. 123, 234512
(2005); M. López de Haro, S. B. Yuste, and A. Santos, Mol. Phys. 104, 3461
(2006).
44. S. Luding and A. Santos, J. Chem. Phys. 121, 8458 (2004).
45. M. Barošová, A. Malijevský, S. Lab́ık, and W. R. Smith, Mol. Phys. 87, 423
(1996).
46. H. Hansen-Goos and R. Roth, J. Chem. Phys. 124, 154506 (2006).
47. R. Evans, in Liquids and Interfaces, edited by J. Charvolin, J. F. Joanny, and
J. Zinn-Justin (North-Holland, Amsterdam, 1990).
48. Y. Rosenfeld, Phys. Rev. Lett. 63, 980 (1989).
49. A. Malijevský, M. Barošová, and W. R. Smith, Mol. Phys. 91, 65 (1997).
50. Al. Malijevský, A. Malijevský, S. B. Yuste, A. Santos, and M. López de Haro,
Phys. Rev. E 66, 061203 (2002).
51. M. Buzzacchi, I. Pagonabarraga, and N. B. Wilding, J. Chem. Phys. 121, 11362
(2004).
52. Al. Malijevský, S. B. Yuste, A. Santos, and M. López de Haro, preprint arXiv:
cond-mat/0702284.
53. F. Lado, Phys. Rev. E 54, 4411 (1996).
54. I. Prigogine and S. Lafleur, Bull. Classe Sci. Acad. Roy. Belg. 40, 484, 497
(1954).
55. S. Asakura and F. Oosawa, J. Chem. Phys. 22, 1255 (1954); J. Polym. Sci. 33,
183 (1958).
56. R. Kikuchi, J. Chem. Phys. 23, 2327 (1955).
57. P. Ballone, G. Pastore, G. Galli, and D. Gazzillo, Mol. Phys. 59, 275 (1986).
58. D. Gazzillo, G. Pastore, and S. Enzo, J. Phys.: Condens. Matter 1, 3469 (1989);
D. Gazzillo, G. Pastore, and R. Frattini, J. Phys.: Condens. Matter 2,8465
(1990).
59. A. Santos, M. López de Haro, and S. B. Yuste, J. Chem. Phys. 122, 024514
(2005).
60. E. Z. Hamad, J. Chem. Phys. 105, 3229 (1996).
61. E. Z. Hamad, J. Chem. Phys. 105, 3222 (1996).
62. H. Hammawa and E. Z. Hamad, J. Chem. Soc. Faraday Trans. 92, 4943 (1996).
63. M. Al-Naafa, J. B. El-Yakubu, and E. Z. Hamad, Fluid Phase Equilibria 154,
33 (1999).
64. J. Jung, M. S. Jhon, and F. H. Ree, J. Chem. Phys. 100, 528 (1994).
65. J. Jung, M. S. Jhon, and F. H. Ree, J. Chem. Phys. 100, 9064 (1994).
66. T. Coussaert and M. Baus, J. Chem. Phys. 109, 6012 (1998).
67. A. Yu. Vlasov and A. J. Masters, Fluid Phase Equilibria 212, 183 (2003).
68. M. López de Haro and C. F. Tejero, J. Chem. Phys. 121, 6918 (2004).
62 M. López de Haro, S. B. Yuste and A. Santos
69. S. B. Yuste, A. Santos, and M. López de Haro, Europhys. Lett. 52, 158 (2000).
70. H.-O. Carmesin, H. L. Frisch, and J. K. Percus, J. Stat. Phys. 63, 791 (1991).
71. A. Santos and M. López de Haro, Phys. Rev. E 72, 010501(R) (2005).
72. R. Roth, R. Evans, and A. A. Louis, Phys. Rev. E 64, 051202 (2001).
73. S. B. Yuste and A. Santos, Phys. Rev. A 43, 5418 (1991).
74. S. B. Yuste, M. López de Haro, and A. Santos, Phys. Rev. E 53, 4820 (1996).
75. M. Robles, M. López de Haro, A. Santos, and S. B. Yuste, J. Chem. Phys. 108,
1290 (1998).
76. M. Robles and M. López de Haro, Europhys. Lett. 62, 56 (2003).
77. M. Robles and M. López de Haro, J. Chem. Phys. 107, 4648 (1997).
78. E. Waisman, Mol. Phys. 25, 45 (1973); D. Henderson and L. Blum, Mol. Phys.
32, 1627 (1976); J. S. Høye and L. Blum, J. Stat. Phys. 16, 399 (1977).
79. A. Dı́ez, J. Largo, and J. R. Solana, J. Chem. Phys. 125, 074509 (2006).
80. J. Kolafa, S. Lab́ık, and A. Malijevský, Phys. Chem. Chem. Phys. 6, 2335
(2004). See also http://www.vscht.cz/fch/software/hsmd/ for molecular dy-
namics results of g(r).
81. A. Trokhymchuk, I. Nezbeda, J. Jirsák, and D. Henderson, J. Chem. Phys.
123, 024501 (2005).
82. M. López de Haro, A. Santos, and S. B. Yuste, J. Chem. Phys. 124, 236102
(2006).
83. L. L. Lee, J. Chem. Phys. 103, 9388 (1995); L. L. Lee, D. Ghonasgi, and E.
Lomba, J. Chem. Phys. 104, 8058 (1996); L. L. Lee and A. Malijevský, J.
Chem. Phys. 114, 7109 (2001).
84. S. Lab́ık and A. Malijevský, Mol. Phys. 53, 381 (1984).
85. Al. Malijevský and A. Santos, J. Chem. Phys. 124, 074508 (2006).
86. A. Santos and Al. Malijevský, Phys. Rev. E 75, 021201 (2007).
87. S. B. Yuste, A. Santos, and M. López de Haro, J. Chem. Phys. 108, 3683
(1998).
88. L. Blum and J. S. Høye, J. Phys. Chem. 81, 1311 (1977).
89. J. Abate and W. Whitt, Queuing Systems 10, 5 (1992).
90. A code using the Mathematica computer algebra system to obtain Gij(s)
and gij(r) with the present method is available from the web page
http://www.unex.es/eweb/fisteor/santos/filesRFA.html.
91. N. W. Ashcroft and D. C. Langreth, Phys. Rev. 156, 685 (1967).
92. A. B. Bathia and D. E. Thornton, Phys. Rev. B 8, 3004 (1970).
93. S. B. Yuste, A. Santos, and M. López de Haro, Mol. Phys. 98, 439 (2000).
94. R. J. Baxter, J. Chem. Phys. 49, 2270 (1968).
95. J. W. Perram and E. R. Smith, Chem. Phys. Lett. 35, 138 (1975).
96. B. Barboy, Chem. Phys. 11, 357 (1975); B. Barboy and R. Tenne, Chem. Phys.
38, 369 (1979).
97. G. Stell, J. Stat. Phys. 63, 1203 (1991); B. Borštnik, C. G. Jesudason, and G.
Stell, J. Chem. Phys. 106, 9762 (1997).
98. B. Barboy, J. Chem. Phys. 61, 3194 (1974).
99. J. W. Perram and E. R. Smith, Chem. Phys. Lett. 39, 328 (1975); P. T.
Cummings, J. W. Perram, and E. R. Smith, Mol. Phys. 31, 535 (1976); E. R.
Smith and J. W. Perram, J. Stat. Phys. 17, 47 (1977); J. W. Perram and E.
R. Smith, Proc. R. Soc. London A353, 193 (1977); W. G. T. Kranendonk and
D. Frenkel, Mol. Phys. 64, 403 (1988); C. Regnaut and J. C. Ravey, J. Chem.
Phys. 91, 1211 (1989); G. Stell and Y. Zhou, J. Chem. Phys. 91, 3618 (1989);
Alternative Approaches to Hard-Sphere Liquids 63
J. N. Herrera and L. Blum, J. Chem. Phys. 94, 6190 (1991); A. Jamnik, D.
Bratko, and D. J. Henderson, J. Chem. Phys. 94, 8210 (1991); S. V. G. Menon,
C. Manohar, and K. S. Rao, J. Chem. Phys.; 95, 9186 (1991); Y. Zhou and G.
Stell, J. Chem. Phys. 96, 1504 (1992); E. Dickinson, J. Chem. Soc. Faraday
Trans. 88, 3561 (1992); C. F. Tejero and M. Baus, Phys. Rev. E, 48, 3793
(1993); K. Shukla and R. Rajagopalan, Mol. Phys. 81, 1093 (1994); C. Regnaut,
S. Amokrane, and Y. Heno, J. Chem. Phys. 102, 6230 (1995); C. Regnaut, S.
Amokrane, and P. Bobola, Prog. Colloid Polym. Sci. 98, 151 (1995); Y. Zhou,
C. K. Hall, and G. Stell, Mol. Phys. 86, 1485 (1995); J. N. Herrera-Pacheco
and J. F. Rojas-Rodŕıguez, Mol. Phys. 86, 837 (1995); Y. Hu, H. Liu, and J. M.
Prausnitz, J. Chem. Phys. 104, 396 (1996); O. Bernard and L. Blum, J. Chem.
Phys. 104, 4746 (1996); L. Blum, M. F. Holovko, and I. A. Protsykevych, J.
Stat. Phys. 84, 191 (1996); S. Amokrane, P. Bobola and C. Regnaut, Prog.
Colloid Polym. Sci. 100, 186 (1996); S. Amokrane and C. Regnaut, J. Chem.
Phys. 106, 376 (1997); C. Tutschka, G. Kahl, and E. Riegler, Mol. Phys. 100,
1025 (2002); D. Gazzillo and A. Giacometti, Mol. Phys. 100, 3307 (2002); M.
A. Miller and D. Frenkel, Phys. Rev. Lett. 90, 135702 (2003); D. Gazzillo and
A. Giacometti, J. Chem. Phys. 120, 4742 (2004); R. Fantoni, D. Gazzillo, and
A. Giacometti, Phys. Rev. E 72, 011503 (2005); A. Jamnik, Chem. Phys. Lett.
423, 23 (2006).
100. A. J. Post and E. D. Glandt, J. Chem. Phys. 84, 4585 (1986); N. A. Seaton
and E. D. Glandt, J. Chem. Phys. 84, 4595 (1986); 86, 4668 (1986); 87, 1785
(1987).
101. M. A. Miller and D. Frenkel, J. Phys.: Condens. Matter 16, S4901 (2004).
102. M. A. Miller and D. Frenkel, J. Chem. Phys. 121, 535 (2004).
103. Al. Malijevský, S. B. Yuste, and A. Santos, J. Chem. Phys. 125, 074507 (2006).
104. A. Santos, S. B. Yuste, and M. López de Haro, J. Chem. Phys. 109, 6814
(1998).
105. S. B. Yuste and A. Santos, J. Stat. Phys. 72, 703 (1993).
106. S. B. Yuste and A. Santos, Phys. Rev. E 48, 4599 (1993).
107. S. B. Yuste and A. Santos, J. Chem. Phys. 101, 2355 (1994).
108. L. Acedo and A. Santos, J. Chem. Phys. 115, 2805 (2001).
109. L. Acedo, J. Stat. Phys. 99, 707 (2000).
110. J. Largo, J. R. Solana, L. Acedo, and A. Santos, Mol. Phys. 101, 2981 (2003).
111. J. Largo, J. R. Solana, S. B. Yuste, and A. Santos, J. Chem. Phys. 122, 084510
(2005).
112. C. Freasier and D. J. Isbister, Mol. Phys. 42, 927 (1981).
113. E. Leutheusser, Physica A 127, 667 (1984).
114. M. Robles, M. López de Haro, and A. Santos, J. Chem. Phys. 120, 9113 (2004).
115. D. G. Chae, F. H. Ree, and T. Ree, J. Chem. Phys. 50, 1581 (1976).
116. S. B. Yuste and A. Santos, J. Chem. Phys. 99, 2020 (1993).
117. G. A. Mansoori and F. B. Canfield, J. Chem. Phys. 51, 4958 (1969).
118. G. A. Mansoori, J. A. Provine, and F. B. Canfield, J. Chem. Phys. 51, 5295
(1969).
119. J. Rasaiah and G. Stell, Mol. Phys. 18, 249 (1970).
120. J. A. Barker and D. Henderson, J. Chem. Phys. 47, 2856 (1967).
121. The macroscopic compressibility approach is only one of the possibilities of ap-
proximation to the second order Barker–Henderson perturbation theory term.
Another successful approach is the local-compressibility approximation (see
64 M. López de Haro, S. B. Yuste and A. Santos
Ref. [2], p. 308). This expresses the free energy in terms of φ1(r) and HS
quantities.
122. J. D. Weeks, D. Chandler, and H. C. Andersen, J. Chem. Phys. 53, 149 (1971).
123. A simple algorithm to compute a rather accurate approximation for the HS
diameter σ0 in the WCA theory has been given in L. Verlet and J. J. Weis,
Phys. Rev. A 5, 939 (1972).
124. D. Henderson and E. W. Grundke, J. Chem. Phys. 63, 601 (1975).
125. J. A. Ballance and R. J. Speedy, Mol. Phys. 54, 1035 (1985).
126. Y. Zhou and G. Stell, J. Stat. Phys. 52, 1389 (1988).
127. J.-P. Hansen and L. Verlet, Phys. Rev. 184, 151 (1969).
128. R. J. Speedy, J. Chem. Phys. 100, 6684 (1994).
129. M. Robles and M. López de Haro, Phys. Chem. Chem. Phys. 3, 5528 (2001).
130. M. López de Haro and M. Robles, J. Phys.: Condens. Matt. 16, S2089 (2004).
131. M. López de Haro and M. Robles, Physica A 372, 307 (2006).
132. C. N. Likos, Phys. Rep. 348, 267 (2001).
133. E. A. Jagla, J. Chem. Phys. 111, 8980 (1999).
ABSTRACT
  An overview of some analytical approaches to the computation of the
structural and thermodynamic properties of single component and multicomponent
hard-sphere fluids is provided. For the structural properties, they yield a
thermodynamically consistent formulation, thus improving and extending the
known analytical results of the Percus-Yevick theory. Approximate expressions
for the contact values of the radial distribution functions and the
corresponding analytical equations of state are also discussed. Extensions of
this methodology to related systems, such as sticky hard spheres and
square-well fluids, as well as its use in connection with the perturbation
theory of fluids are briefly addressed.

<|endoftext|><|startoftext|>
APS/xxx
Complexities of Human Promoter Sequences
Fangcui Zhao1,∗ Huijie Yang2,3,† and Binghong Wang4
College of Life Science and Bioengineering, Beijing University of Technology, Beijing 100022, China
Department of Physics, National University of Singapore, Science Drive 2, Singapore 117543
School of Management, University of Shanghai for Science and Technology,
and Shanghai Institute for Systematic Science, Shanghai 200093, China
Department of Modern Physics, University of Science and Technology of China, Anhui Hefei 230026, China
(Dated: October 26, 2018)
By means of the diffusion entropy approach, we detect the scale-invariance characteristics embed-
ded in the 4737 human promoter sequences. The exponent for the scale-invariance is in a wide range
of [0.3, 0.9], which centered at δc = 0.66. The distribution of the exponent can be separated into
left and right branches with respect to the maximum. The left and right branches are asymmetric
and can be fitted exactly with Gaussian form with different widths, respectively.
PACS numbers: 82.39.Pj, 05.45.Tp
I. INTRODUCTION
Understanding gene regulation is one of the most excit-
ing topics in molecular genetics [1]. Promoter sequences
are crucial in gene regulation. The analysis of these re-
gions is the first step towards complex models of regula-
tory networks.
A promoter is a combination of different regions with
different functions [2, 3, 4, 5]. Surrounding the tran-
scription start site is the minimal sequence for initiat-
ing transcription, called core promoter. It interacts with
RNA polymerase II and basal transcription factors. Few
hundred base pairs upstream of the core promoter are
the gene-specific regulatory elements, which are recog-
nized by transcription factors to determine the efficiency
and specificity of promoter activity. Far distant from
the transcription start site there are enhancers and dis-
tal promoter elements which can considerably affect the
rate of transcription. Multiple binding sites contribute
to the functioning of a promoter, with their position and
context of occurrence playing an important role. Large-
scale studies show that repeats participate in the regula-
tion of numerous human and mouse genes [6]. Hence, the
promoter’s biological function is a cooperative process of
different regions such as the core promoter, the gene-
specific regulatory elements, the enhancers/silencers, the
insulators, the CpG islands and so forth. But how they
cooperate with each other is still a problem to be inves-
tigated carefully.
The structures of DNA sequences determine their bio-
logical functions [7]. Recent years witness an avalanche
of finding nontrivial structure characteristics embedded
in DNA sequences. Detailed works show that the non-
coding sequences carry long-range correlations [8, 9, 10].
The size distributions of coding sequences and non-
coding sequences obey Gaussian or exponential and
∗Electronic address: yangzhaon@eyou.com
†Electronic address: huijieyangn@eyou.com; Corresponding author
power-law [11, 12], respectively. Theoretical model-based
simulations [13, 14, 15, 16] tell us that the parts of the
promoters where the RNA transcription has started are
more active than a random portion of the DNA. By
means of the nonlinear modeling method it is found that
along the putative promoter regions of human sequences
there are some segments much more predictable com-
pared with other segments [17]. All the evidences suggest
that the nontrivial structure characteristics of a promoter
determine its biological functions. The statistical prop-
erties of a promoter may shed light on the cooperative
process of different regions.
Experimental knowledge of the precise 5’ ends of cD-
NAs should facilitate the identification and characteri-
zation of regulatory sequence elements in proximal pro-
moters [18]. Using the oligocapping method, Suzuki et
al. identify the transcriptional start sites from cDNA
libraries enriched in full-length cDNA sequences. The
identified transcriptional start sites are available at the
Database, http://dbtss.hgc.jp/. [19]. Consequently,
Leonardo et al. have used this data set and aligned
the full-length cDNAs to the human genome, thereby
extracting putative promoter regions (PPRs) [20]. Us-
ing the known transcriptional start sites from over 5700
different human full-length cDNAs, a set of 4737 distinct
PPRs are extracted from the human genome. Each PPR
consists nucleotides from −2000 to +1000bp, relative to
the corresponding transcriptional start site. They have
also counted eight-letter words within the PPRs, using
z-scores and other related statistics to evaluate the over-
and under- representations.
In this paper, by means of the concept of diffusion
entropy (DE) we try to detect the scale-invariant char-
acteristics in these putative promoter regions.
II. DIFFUSION ENTROPY ANALYSIS
The diffusion entropy (DE) method is firstly designed
to capture the scale-invariance embedded in time series
[21, 22, 23]. To keep the description as self-contained as
http://arxiv.org/abs/0704.0158v1
mailto:yangzhaon@eyou.com
mailto:huijieyangn@eyou.com
http://dbtss.hgc.jp/
possible, we review briefly the procedures.
We consider a PPR denoted with Y =
(y1, y2, · · · , y3001), where ys is the element at the
position s and ys = A, T,C or G. Replacing A, T and
C,G with −1 and +1, respectively, the original PPR is
mapped to a time series X = (x1, x2, · · · , x3001). There
is not a trend in this series, i.e., X is stationary.
Connecting the starting and the end of X , we can ob-
tain a set of delay-register vectors, which reads,
T1(t) = (x1, x2, · · · , xt)
T2(t) = (x2, x3, · · · , xt+1)
T3001(t) = (x3001, x1, · · · , xt−1)
Regarding each vector as a trajectory of a particle in
duration of t time units, all the vectors can be described
as a diffusion process of a system containing 3001 parti-
cles. The initial state of the system is
T1(0)
T2(0)
T3001(0)
Accordingly, at each time step t we can calculate dis-
placements of all the particles. The probability distribu-
tion function (PDF) of the displacements can be approx-
imated with p(m, t) ∼ Km/3001, where m = −t,−t +
1, · · · , t and Km is the number of the particles whose
displacements are m. It can represent the state of the
system at time t.
As a tenet of complexity theory [24, 25], complexity is
related with the concept of scaling invariance. For the
constructed diffusion process, the scaling invariance is
defined as,
p(m, t) ≈
, (2)
where δ is the scaling exponent and can be regarded as a
quantitative description of the PPR’s complexity. If the
elements in the PPR are positioned randomly, the result-
ing PDF obeys a Gaussian form and δ = 0.5. Complexity
of the PPR is expected to generate a departure from this
ordinary condition, that is, δ 6= 0.5.
The value of δ can tell us the pattern characteristics of
a PPR. The departure from the ordinary condition can
be described with a preferential effect. Let the element
is A, T (or C,G), the preferential probability for the fol-
lowing element’s being A, T (or C,G) is Wpre. A positive
preferential effect, i.e, Wpre > 0.5, leads to the value of δ
larger than 0.5. While a negative preferential effect, i.e,
Wpre < 0.5, can induce the value of δ smaller than 0.5.
101 102 103
PPR-1
 Calculate
=0.662
101 102 103
PPR-1000
 Calculate
=0.760
101 102 103
PPR-2000
 Calculate
=0.703
101 102 103
PPR-3000
 Calculate
=0.500
FIG. 1: (Color online) Typical DE results. The results for
the PPRs numbered 1, 1000, 2000 and 3000 are presented. In
considerable wide regions of t , the curves of DE can be fitted
almost exactly with the linear relation in Eq.(4).
Hence, a large value of δ implies that A, T or C,G accu-
mulate strongly in a scale-invariance way, respectively.
However, correct evaluation of the scaling exponent is a
nontrivial problem. In literature, variance-based method
is used to detect the scale-invariance. But the obtained
Hurst exponent Hmay be different from the real δ, that
is, generally we haveH 6= δ. And for some conditions,
the variance is divergent, which leads the invalidation of
the variance-method at all. To overcome these shortages,
the Shannon entropy for the diffusion can be used, which
reads,
S(t) = −
p(m, t) ln p(m, t)
This diffusion-based entropy is called diffusion entropy
(DE). A simple computation leads the relation between
the scaling invariance defined in Eq.2 and the DE as,
S(t) = A+ δ ln t, (4)
where A is a constant depends on the PDF. Detailed
works show that DE is a reliable method to search the
correct value of δ, regardless the form of the PDF [26,
27, 28, 29].
The complexity in the PDF can be catalogued into
two levels [30], the primary one due to the extension of
the probability to all the possible displacements m, and
the secondary one due to the internal structures. Conse-
quently, we should consider also the corresponding shuf-
fling sequences as comparison.
0 200 400 600 800 1000 1200 1400 1600
FIG. 2: Distribution of the maximum interval ∆t in which one
can find scale-invariant characteristics. Keeping the standard
deviation of the fitting result in the range of ≤ 0.05 , we
can find the maximum intervals ∆t for all the PPRs. The
distribution tells us that generally the scale-invariance can be
found over two to three decades of the scale t .
0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
600  Calculate
 Gauss fit
 Gauss fit
left=0.67
   wleft=0.17
right=0.65
   wright=0.10
FIG. 3: (Color online) The complex index δ distributes in
a wide range of [0.3, 0.9]. The distribution can be sepa-
rated into two branches with respect to the center δc = 0.66.
The two branches are asymmetric and obey exactly the Gau-
usian function, respectively. The widths and centers of
the left and right branches are (wleft, xleftc ) = (0.17, 0.67),
(wright, xrightc ) = (0.10, 0.65). The centers coincide with each
other, wleft ≈ wright ≈ δc = 0.66. The right branch dis-
tributes in a significant narrow region.
III. RESULTS AND DISCUSSIONS
The DEs for all the 4737 PPRs are calculated. As a
typical example, Fig.1 presents the DE results for the
PPRs numbered 1, 1000, 2000 and 3000. In considerable
wide regions of t, the curves of DE can be fitted almost
exactly with the linear relation in Eq.4.
For each PPR, there exists an interval, t0 ∼ t0+∆t, in
which the PDF behaves scale-invariance. Keeping simul-
taneously the standard deviation and the error of the
scaling exponent for the fitting result in the range of
≤ 0.05 and ≤ 0.03, we can find the maximum intervals
∆t for all the PPRs. In the fitting procedure, the confi-
dence level is set to be 95%. The distribution of ∆t, as
shown in Fig.2, tells us that generally the scale-invariance
can be found over two to three decades of the scale t.
The concept of DE is based upon statistical theory, that
is, t0 should be large enough so that the statistical as-
sumptions are valid. To cite an example, we consider a
random series, whose elements obey a homogenous dis-
tribution in [0, 1]. Only the length of the delay-register
vectors, t, in Eq.(1) is large enough, the corresponding
PDF for the displacements, i.e, the summation value of
each delay-register vector, approaches the Gaussian dis-
tribution. Consequently, t0 is not a valuable parameter.
The values of t0 for different PPRs are not presented.
The resulting scaling exponent δ ± 0.03 distributes
in a wide range of [0.3, 0.9]. The distribution can
be separated into two branches with respect to the
center δc = 0.66. The two branches are asymmet-
ric and can be fitted exactly with the Gauusian func-
tion, respectively. The widths and centers of the left
and right branches are (wleft, xleftc ) = (0.17, 0.67),
(wright, xrightc ) = (0.10, 0.65). That is to say, the centers
coincide with each other, wleft ≈ wright ≈ δc = 0.66.
Comparatively, the right branch distributes in a signifi-
cant narrow region.
The PPRs are shuffled also. For each PPR, the shuf-
fling result is obtained by averaging over ten shuffling
samples. The scaling exponents are almost same, i.e.,
δshuffling = 0.5±0.03. The detected scale-invariant char-
acteristics are internal-structure-related.
How to understand the asymmetric characteristic of
the distribution of the complexity index δ is an in-
teresting problem. In literature, some statistical char-
acteristics of DNA sequences are captured with evo-
lution models, such as the long-range correlations and
the over- and under-representation of strings and so on
[31, 32, 33]. From the perspective of evolution, per-
haps the distribution characteristics may favor a stochas-
tic evolution model. The initial sequences have same
complexity δinitial = δc = 0.66. With the evolution
processes the sequences diffuse along two directions, in-
creasing complexity and decreasing complexity, i.e, the
index δ increases and decreases, respectively. The diffu-
sion coefficients for the two directions are significantly
different, denoted with Dleft 6= Dright. Based upon
the widths of the two branches we can estimate that,
Dleft
Dright =
δleft
δright = 1.7. It should be noted
that, the complexity is regarded as the departure from
the ordinary condition, δ = 0.5. In the totally 4737 values
of δ, only a small portion of them are less than 0.5. Ac-
cordingly, the PPRs may be catalogued into two classes,
the PPRs with high complexity and the PPRs with low
complexity. The former class evolves averagely with a
slow speed while the later one with a high speed.
In summary, by means of the DE method, we calculate
the complexities of the 4737 PPRs. The distribution of
the complexity index includes two asymmetric branches,
which obey Gaussian form with different widths, respec-
tively. A stochastic evolution model may provide us a
comprehensive understand of these characteristics.
IV. ACKNOWLEDGEMENTS
This work is funded by the National Natural Sci-
ence Foundation of China under Grant Nos. 70571074,
10635040 and 70471033, by the National Basic Re-
search Program of China (973 Program) under grant
No.2006CB705500), by the President Funding of Chinese
Academy of Science, and by the Specialized Research
Fund for the Doctoral Program of Higher Education of
China. One of the authors (H. Yang) would like to thank
Prof. Y. Zhuo for stimulating discussions.
[1] Ohler,U. and Niemann,H. (2001) Identification and anal-
ysis of eukaryotic promoters: recent computational ap-
proaches. Trends Genet., 17, 56-60.
[2] Werner,T. (1999) Models for prediction and recognition
of eukaryotic promoters. Mammalian Genome, 10, 168-
[3] Pedersen,A.G., Baldi,P., Chauvin,Y., Brunak,S. (1999)
The biology of eukaryotic promoter prediction - a review.
Comput. Chem., 23, 191-207.
[4] Zhang,M.Q. (2002) Computational methods for promoter
recognition. In: Jiang T, Xu Y, Zhang,M.Q., editors.
Current topics in computational molecular biology. Cam-
bridge, Massachusetts: MIT Press; p. 249-268.
[5] Narang,V., Sung,W.-K., Mittal A. (2005) Computational
modeling of oligonucleotides positional densities for hu-
man promoter prediction. Art. Intel. Med., 35, 107-119.
[6] Rosenberg, N. and Jolicoeur, P. (1997) Retroviral patho-
genesis. In Retroviruses (Coffin, J.M. et al., eds), pp.
475–586, Cold Spring Harbor Press.
[7] Buldyrev,S.V., Goldberger,A.L., Havlin,S., Man-
tegna,R.N., Matsa,M.E., Peng,C.-K., Simons, M. and
Stanley,H.E. (1995) Long-range correlation properties
of coding and noncoding DNA sequences: GenBank
analysis. Phys. Rev. E 51, 5084-5091.
[8] Peng,C.K., Buldyrev,S., Goldberger,A., Havlin, S.,
Sciortino,F., Simons,M. and Stanley,H.E.(1992) Long-
range correlations in nucleotide sequences. Nature 356,
168-171.
[9] Li,W., Kaneko,K. (1992) Long-range correlations and
partial 1/f −α spectrum in a noncoding DNA sequence.
Europhys. Lett. 17, 655.
[10] Yang,H., Zhao,F., Zhuo,Y., et al. (2002) Analysis of DNA
chains by means of factorial moments. Phys. Lett. A
292,349-356.
[11] Provata, A., Almirantis,Y. (1997) Scaling properties of
coding and non-coding DNA sequences. Physica A 247,
[12] Provata,A., Almirantis,Y. (2000) Fractal cantor patterns
in the sequence structure of DNA, Fractals 8 ,15-27.
[13] Yang,H., Zhuo,Y., Wu,X. (1994) Investigation of ther-
mal denaturation of DNA molecules based upon non-
equilibrium transport approach. J. Phys. A 27, 6147-
6156.
[14] Salerno,M. (1991) Discrete model for DNA-promoter dy-
namics. Phys. Rev. A 44, 5292-5297.
[15] Lennholm,E., Homquist,M. (2003) Revisiting Salerno’s
sine-Gordon model of DNA: active regions and robust-
ness. Physica D 177, 233-241.
[16] Kalosakas,G., Rasmussen,K.O. and Bishop,A.R. (2004)
Sequence-specific thermal fluctuations identify start sites
for DNA transcription. Europhys. Lett. 68, 127-133.
[17] Yang,H., Zhao,F., Gu,J. and Wang,B. (2006) Nonlin-
ear modeling approach to human promoter sequences. J.
Theo. Bio. 241, 765-773.
[18] Trinklein, N.D., Aldred, S.J., Saldanha, A.J., My-
ers, R.M., 2003. Identification and functional analysis
of human transcriptional promoters. Genome Res. 13,
308C312.
[19] Suzuki,Y., Yamashita,R., Nakai,K. and Sugano,S. (2002)
DBTSS:DataBase of human transcriptional start sites
and full-length cDNAs. Nucleic Acids Res., 30, 328-331.
[20] Leonardo,M.–R., John,L.S., Gavin,C.K. and
David,L. (2004) Statistical analysis of over-
represented words in human promoter sequences.
Nucleic Acids Res., 32, 949-958. See also,
ftp://ftp.ncbi.nlm.nih.gov/pub/marino/published/
hs promoters/fasta/.
[21] Grigolini,P., Palatella,L. and Raffaelli,G. (2001) Complex
Geometry, Patterns, and Scaling in Nature and Society.
Fractals 9, 439-449.
[22] Scafetta,N., Hamilton,P. and Grigolini,P. (2001) The
thermodynamics of social processes: the teen birth phe-
nomenon. Fractals 9, 193-208.
[23] Scafetta,N. and Grigolini,P. (2002) Scaling detection in
time series: Diffusion entropy analysis. Phys. Rev. E 66,
036130.
[24] Bar-Yam, Y. (1997) Dynamics of Complex Systems.
Addison-Wesley, Reading, MA.
[25] Mandelbrot,B. B. (1988) Fractal Geometry of Nature.
W.H. Freeman, San Francisco, CA.
[26] Scafetta,N. and West, B. J. (2003) Solar flare intermit-
tency and the earth’s temperature anomalies. Phys. Rev.
Lett. 90, 248701.
[27] Scafetta,N., Latora,V. and Grigolini,P. (2002) Levy scal-
ing: The diffusion entropy analysis applied to DNA se-
quences. Phys. Rev. E 66, 031906.
[28] Yang,H., Zhao,F., Zhang,W. and Li,Z. (2005) Diffusion
entropy approach to complexity for a Hodgkin–Huxley
neuron. Physica A 347, 704-710.
[29] Yang,H., Zhao,F., Qi,L. and Hu,B. (2004) Temporal se-
ries analysis approach to spectra of complex networks.
Phys. Rev. E 69, 066104.
[30] Pipek,J. and Varga,I. (1992) Universal classification
scheme for the spatial-localization properties of one-
ftp://ftp.ncbi.nlm.nih.gov/pub/marino/published/
particle states in finite, d-dimensional systems. Phys.
Rev. A 46,3148-3163.
[31] Hsieh,L.-C., Luo,L., Ji F. and Lee,H.C. (2003) Minimal
model for genome evolution and growth. Phys. Rev. Lett.
90, 018101.
[32] Kloster,M. (2005) Analysis of evolution through compet-
itive selection. Phys. Rev. Lett. 95, 168701.
[33] Messer,P.W., Arndt,P.F. and Lassig,M. (2005) Solvable
sequence evolution models and genomic correlations.
Phys. Rev. Lett. 94, 138103.
ABSTRACT
  By means of the diffusion entropy approach, we detect the scale-invariance
characteristics embedded in the 4737 human promoter sequences. The exponent for
the scale-invariance is in a wide range of $[ {0.3,0.9} ]$, which centered at
$\delta_c = 0.66$. The distribution of the exponent can be separated into left
and right branches with respect to the maximum. The left and right branches are
asymmetric and can be fitted exactly with Gaussian form with different widths,
respectively.

<|endoftext|><|startoftext|>
APS/123-QED
Evidence for an excitonic insulator phase in 1T -TiSe2
H. Cercellier,∗ C. Monney, F. Clerc, C. Battaglia, L. Despont, M. G. Garnier, H. Beck, and P. Aebi
Institut de Physique, Université de Neuchâtel, CH-2000 Neuchâtel, Switzerland
L. Patthey
Swiss Light Source, Paul Scherrer Institute, CH-5232 Villigen, Switzerland
H. Berger
Institut de Physique de la Matière Complexe, EPFL, CH-1015 Lausanne, Switzerland
(Dated: October 22, 2018)
We present a new high-resolution angle-resolved photoemission study of 1T -TiSe2 in both, its
room-temperature, normal phase and its low-temperature, charge-density wave phase. At low tem-
perature the photoemission spectra are strongly modified, with large band renormalisations at high-
symmetry points of the Brillouin zone and a very large transfer of spectral weight to backfolded
bands. A theoretical calculation of the spectral function for an excitonic insulator phase reproduces
the experimental features with very good agreement. This gives strong evidence in favour of the
excitonic insulator scenario as a driving force for the charge-density wave transition in 1T -TiSe2.
PACS numbers:
Transition-metal dichalcogenides (TMDC’s) are lay-
ered compounds exhibiting a variety of interesting phys-
ical properties, mainly due to their reduced dimension-
ality [1]. One of the most frequent characteristics is a
ground state exhibiting a charge-density wave (CDW),
with its origin arising from a particular topology of the
Fermi surface and/or a strong electron-phonon coupling
[2]. Among the TMDC’s 1T -TiSe2 shows a commensu-
rate 2×2×2 structural distortion below 202 K, accom-
panied by the softening of a zone boundary phonon and
with changes in the transport properties [3, 4]. In spite
of many experimental and theoretical studies, the driv-
ing force for the transition remains controversial. Sev-
eral angle-resolved photoelectron spectroscopy (ARPES)
studies suggested either the onset of an excitonic insula-
tor phase [5, 6] or a band Jahn-Teller effect [7]. Further-
more, TiSe2 has recently attracted strong interest due to
the observation of superconductivity when intercalated
with Cu [8]. In systems showing exotic properties, such
as Kondo systems for example [9], the calculation of the
spectral function has often been a necessary and deci-
sive step for the interpretation of the ARPES data and
the determination of the ground state of the systems. In
the case of 1T -TiSe2, such a calculation for an excitonic
insulator phase lacked so far.
In this letter we present a high-resolution ARPES
study of 1T -TiSe2, together with theoretical calculations
of the excitonic insulator phase spectral function for this
compound. We find that the experimental ARPES spec-
tra show strong band renormalisations with a very large
transfer of spectral weight into backfolded bands in the
low-temperature phase. The spectral function calculated
for the excitonic insulator phase is in strikingly good
Electronic address: herve.cercellier@unine.ch
agreement with the experiments, giving strong evidence
for the excitonic origin of the transition.
The excitonic insulator model was first introduced in
the sixties, for a semi-conductor or a semi-metal with a
very small indirect gap EG [10, 11, 12, 13]. Thermal ex-
citations lead to the formation of holes in the valence
band and electrons in the conduction band. For low
free carrier densities, the weak screening of the electron-
hole Coulomb interaction leads to the formation of sta-
ble electron-hole bound states, called excitons. If the
exciton binding energy EB is larger than the gap energy
EG, the system becomes unstable upon formation of exci-
tons. This instability can drive a transition to a coherent
ground state of condensed excitons, with a periodicity
given by the spanning vector w that connects the va-
lence band maximum to the conduction band minimum.
In the particular case of TiSe2, there are three vectors
(wi, i = 1, 2, 3) connecting the Se 4p-derived valence
band maximum at the Γ point to the three symmetry-
equivalent Ti 3d-derived conduction band minima at the
L points of the Brillouin zone (BZ) (see inset of fig. 1b)).
Our calculations are based on the BCS-like model of
Jérome, Rice and Kohn [12], adapted for multiple wi.
The band dispersions for the normal phase have been
chosen of the form
ǫv(k) = ǫ
v + ~
k2x + k
+ tv cos(
c(k,wi) = ǫ
c + ~
( (kx − wix)
(ky − wiy)
+tc cos
(2π(kz − wiz)
for the valence (ǫv) and the three conduction (ǫ
c) bands
respectively, with c the lattice parameter perpendicular
to the surface in the normal (1×1×1) phase, tv and tc the
amplitudes of the respective dispersions perpendicular to
the surface and mv, mc the effective masses.
http://arxiv.org/abs/0704.0159v1
mailto:herve.cercellier@unine.ch
The parameters for equations 1 were derived from pho-
ton energy dependent ARPES measurements carried out
at the Swiss Light Source on the SIS beamline, using
a Scienta SES-2002 spectrometer with an overall energy
resolution better than 10 meV, and an angular resolution
better than 0.5◦. The fit to the data gives for the Se 4p
valence band maximum -20 ± 10 meV, and for the Ti 3d
conduction band a minimum -40 ± 5 meV [14]. From our
measurements we then find a semimetallic band structure
with a negative gap (i.e. an overlap) EG=-20 ± 15 meV
for the normal phase of TiSe2, in agreement with the lit-
erature [15]. The dispersions deduced from the ARPES
data are shown in fig. 1a) (dashed lines).
Within this model the one-electron Green’s functions
of the valence and the conduction bands were calculated
for the excitonic insulator phase. For the valence band,
one obtains
Gv(k, z) =
z − ǫv(k)−
|∆|2(k,wi)
z − ǫc(k+wi)
. (2)
This is a generalized form of the equations of Ref. [12]
for an arbitrary number of wi. The order parameter ∆ is
related to the number of excitons in the condensed state
at a given temperature. For the conduction band, there
is a system of equations describing the Green’s functions
Gic corresponding to each spanning vector vector wi:
z − ǫic(k+wi)
c(k+wi, z) = 1 +∆
∗(k,wi)
∆(k,wj)G
c(k+wj, z)
z − ǫv(k)
This model and the derivation of the Green’s functions
will be further described elsewhere [16].
The spectral function calculated along several high-
symmetry directions of the BZ is shown in fig. 1a) for an
order parameter ∆=0.05 eV. Its value has been chosen for
best agreement with experiment. The color scale shows
the spectral weight carried by each band. For presen-
tation purposes the δ-like peaks of the spectral function
have been broadened by adding a constant 30 meV imagi-
nary part to the self-energy. In the normal phase (dashed
lines), as previously described we consider a semimetal
with a 20 meV overlap, with bands carrying unity spec-
tral weight. In the excitonic phase, the band structure
is strongly modified. The first observation is the appear-
ance of new bands (labeled C1, V2 and C3), backfolded
with the spanning vector w = ΓL. The C1, V2 and C3
branches are the backfolded replicas of branches C2, V3
and C4 respectively. In this new phase the Γ and L points
are now equivalent, which means that the excitonic state
has a 2×2×2 periodicity of purely electronic origin, as ex-
pected from theoretical considerations [10, 12]. Another
effect of exciton condensation is the opening of a gap in
the excitation spectrum. This results in a flattening of
the valence band near Γ in the ΓM direction (V1 branch)
and in the AΓ direction (V3 branch), and also an upward
bend of the conduction band near L and M (C2 and C4
Γ Μ Α ΓL
FIG. 1: : a) Spectral function of the excitonic insulator in a
1T structure calculated for a 20 meV overlap and an order
parameter ∆=0.05 eV. The V1-V3 (resp. C1-C4) branches
refer to the valence (resp. conduction) band. Dashed lines
correspond to the normal phase (∆=0). The path in recipro-
cal space is shown in red in the inset. b) Spectral weight of
the different bands. Inset : bulk Brillouin zone of 1T -TiSe2.
branches). It is interesting to notice that in the vicinity of
these two points, the conduction band is split (arrows).
This results from the backfolding of the L points onto
each other, according to the new periodicity of the exci-
tonic state [17]. The spectral weight carried by the bands
is shown in fig. 1b). The largest variations occur near
the Γ, L and M points, where the band extrema in the
normal phase are close enough for excitons to be created.
Away from these points, the spectral weight decreases in
the backfolded bands (C1, V2, C3) and increases in the
others. The intensity of the V1 branch, for example, de-
creases by a factor of 2 when approaching Γ, whereas
the backfolded C1 branch shows the opposite behaviour.
Such a large transfer of spectral weight into the back-
folded bands is a very uncommon and striking feature.
Indeed, in most compounds with competing potentials
(CDW systems, vicinal surfaces,...), the backfolded bands
carry an extremely small spectral weight [18, 19, 20]. In
these systems the backfolding results mainly from the in-
fluence of the modified lattice on the electron gas, and
the weight transfer is related to the strength of the new
crystal potential component. Here, the case of the exci-
tonic insulator is completely different, as the backfolding
is an intrinsic property of the excitonic state. The large
Τ=250 Κ Τ=250 Κ
Τ=65 Κ Τ=65 Κ
FIG. 2: : ARPES spectra of 1T -TiSe2 for a) the normal and b)
the low temperature phase. Thick dotted lines are parabolic
fits to the bands in the normal phase and thin dotted lines
are guides to the eye for the CDW phase. Fine lines follow
the dispersion of the 4p sidebands (see text).
transfer of spectral weight is then a purely electronic ef-
fect, and turns out to be a characteristic feature of the
excitonic insulator phase.
Fig. 2 shows ARPES spectra recorded at a photon
energy hν=31 eV as a function of temperature. At this
photon energy, the normal emission spectra correspond
to states located close to the Γ point. For the sake of sim-
plicity the description is in terms of the surface BZ high-
symmetry points Γ̄ and M̄ . The 250 K spectra exhibit
the three Se 4p-derived bands at Γ̄ and the Ti 3d-derived
band at M̄ widely described in the literature [5, 6, 7].
The thick dotted lines (white) are fits by equation 1, giv-
ing for the topmost 4p band a maximum energy of -20
± 10 meV, and for the Ti 3d a minimum energy of -40
± 5 meV. The small overlap EG=-20 ± 15 meV in the
normal phase is consistent with the excitonic insulator
scenario, as the exciton binding energy is expected to be
close to that value. [5, 6]. The position of both band
maxima in the occupied states is most probably due to
a slight Ti overdoping of our samples [3]. In our case, a
transition temperature of 180 ± 10 K was found from dif-
ferent ARPES and scanning tunneling microscopy mea-
surements, indicating a Ti doping of less than 1 %. On
the 250 K spectrum at Γ̄, the intensity is low near normal
emission. This reduced intensity and the residual inten-
sity at M̄ around 150 meV binding energy (arrows) may
arise from exciton fluctuations (see reduction of spectral
Τ=250 Κ Τ=250 Κ
Τ=65 Κ Τ=65 Κ
∆=0.05 eV
FIG. 3: : Theoretical spectral function of 1T -TiSe2, calcu-
lated along the path given by the free electron final state
approximation shown in the inset. a) normal state and b)
low temperature phase (see text).
weight near Γ in the V1 branch in fig. 1b). Matrix ele-
ments do not appear to play a role as the intensity vari-
ation only depends very slightly on photon energy and
polarization. In the 65 K data (fig. 2b)), the topmost
4p band flattens near Γ̄ and shifts to higher binding en-
ergy by about 100 meV (thin white, dotted line). This
shift is accompanied by a larger decrease of the spectral
weight near the top of the band. The two other bands
(fine black lines) are only slightly shifted and do not ap-
pear to participate in the transition. In the M̄ spectrum
strong backfolded valence bands can be seen, and the
conduction band bends upwards, leading to a maximum
intensity located about 0.25 Å−1 from M̄ (thin white dot-
ted line). This observation is in agreement with Kidd et
al. [6], although in their case the conduction band was
unoccupied in the normal phase.
The calculated spectral functions corresponding to the
data of fig. 2 are shown in fig. 3, using the free-electron fi-
nal state approximation with a 10 eV inner potential and
a 4.6 eV work function (see inset). The effect of tempera-
ture was taken into account via the order parameter and
the Fermi function. Only the topmost valence band was
considered, as the other two are practically not influenced
by the transition (see above, fig. 2). The behavior of this
band is extremely well reproduced by the calculation. In
the 65 K calculation the valence band is flattened near Γ̄,
and the spectral weight at this point is reduced to 44 %,
close to the experimental value of 35 %. The agreement
E-EF (eV)Model ARPES
−0.025
FIG. 4: : Near-EF constant energy cuts in the vicinity of the
Γ point. The theoretical data correspond to fig. 3b and the
ARPES data are taken from the low-temperature data of fig.
is very satisfying, considering that the calculation takes
into account only the lowest excitonic state. The exper-
imental features appear broader than in the calculation,
but at finite temperatures one may expect the existence
of excitons with non-zero momentum, leading to a spread
of spectral weight away from the high-symmetry points.
In the near-M̄ spectral function, the backfolded va-
lence band is strongly present in the 65 K calculation,
with comparable spectral weight as at Γ̄ and as the con-
duction band at M̄ . The conduction band maximum
intensity is located away from M̄ as in the experiment.
The small perpendicular dispersion of the free-electron
final state causes an asymmetry of the intensity of the
conduction band on each side of M̄ , which is also visible
in fig. 2. In our calculation, as opposed to the ARPES
spectra, the conduction band is unoccupied and only the
occupied tail of the peaks is visible. This difference may
be simply due to the final state approximation used in
the calculation, a slight shift of the chemical potential
due to the transition, or to atomic displacements that
would shift the conduction band [6, 7, 21]. Such atomic
displacements, in terms of a band Jahn-Teller effect, were
suggested as a driving force for the transition. However,
the key point is that, although the lattice distortion may
shift the conduction band, the very small atomic displace-
ments (≈ 0.02 Å [3]) in 1T -TiSe2 are expected to lead
to a negligable spectral weight in the backfolded bands
[20]. As an example, 1T -TaS2, another CDW compound
known for very large atomic displacements [22] (of or-
der > 0.1 Å) introduces hardly detectable backfolding of
spectral weight in ARPES. Clearly, an electronic origin
is necessary for obtaining such strong backfolding in the
presence of such small atomic displacements. Therefore,
our results allow to rule out a Jahn-Teller effect as the
driving force for the transition of TiSe2.
Furthermore, the ARPES spectra also show evidence
for the backfolded conduction band at the Γ̄ point. Fig.
4 shows constant energy cuts around the Fermi energy,
taken from the data of fig. 2b and 3b (arrows). In the
ARPES data two slightly dispersive peaks, reproduced
in the calculation, clearly cross the Fermi level. These
features turn out to be the populated tail of the back-
folded conduction band, whose centroid is located just
above the Fermi level. To our knowledge no evidence
for the backfolding of the conduction band had been put
forward so far.
In summary, by comparing ARPES spectra of 1T -
TiSe2 to theoretical predictions for an excitonic insula-
tor, we have shown that the superperiodicity of the ex-
citonic state with respect to the lattice results in a very
large transfer of spectral weight into backfolded bands.
This effect, clearly evidenced by photoemission, turns
out to be a characteristic feature of the excitonic insula-
tor phase, thus giving strong evidence for the existence
of this phase in 1T -TiSe2 and its prominent role in the
CDW transition.
Skillfull technical assistance was provided by the work-
shop and electric engineering team. This work was sup-
ported by the Fonds National Suisse pour la Recherche
Scientifique through Div. II and MaNEP.
[1] J. A. Wilson et al., Adv. Phys. 24, 117 (1975).
[2] F. Clerc et al., Phys. Rev. B 74, 155114 (2006).
[3] F. J. Di Salvo et al., Phys. Rev. B 14, 4321 (1976).
[4] M. Holt et al., Phys. Rev. Lett. 86, 3799 (2001).
[5] T. Pillo et al., Phys. Rev. B 61, 16213 (2000).
[6] T. E. Kidd et al., Phys. Rev. Lett. 88, 226402 (2002).
[7] K. Rossnagel et al., Phys. Rev. B 65, 235101 (2002).
[8] E. Morosan et al., Nature Physics 2, 544 (2006).
[9] D. Malterre et al., Adv. Phys. 45, 299 (1996).
[10] W. Kohn, Phys. Rev. Lett. 19, 439 (1967).
[11] B. I. Halperin and T. M. Rice, Rev. Mod. Phys. 40, 755
(1968).
[12] D. Jérome et al., Phys. Rev. 158, 462 (1967).
[13] F. X. Bronold and H. Fehske, Phys. Rev. B 74, 165107
(2006).
[14] The fit parameters are : ǫ0v=-0.08±0.005 eV, mv=-
0.23±0.02 me, where me is the free electron mass,
tv=0.06±0.005 eV ; ǫ
c=-0.01±0.0025 eV, m
c=5.5±0.2
me, m
c=2.2±0.1 me, tc=0.03±0.0025 eV
[15] O. Anderson et al., Phys. Rev. Lett. 55, 2188 (1985).
[16] C. Monney et al., to be published
[17] J. A. Wilson et al., Phys. Rev. B 18, 2866 (1978).
[18] C. Didiot et al., Phys. Rev. B 74, 081404(R) (2006).
[19] C. Battaglia et al., Phys. Rev. B 72, 195114 (2005).
[20] J. Voit et al., Science 290, 501 (2000).
[21] M. H. Whangbo and E. Canadell, J. Am. Chem. Soc.
114, 9587 (1992).
[22] A. Spijkerman et al., Phys. Rev. B 56, 13757 (1997).
ABSTRACT
  We present a new high-resolution angle-resolved photoemission study of
1\textit{T}-TiSe$_{2}$ in both, its room-temperature, normal phase and its
low-temperature, charge-density wave phase. At low temperature the
photoemission spectra are strongly modified, with large band renormalisations
at high-symmetry points of the Brillouin zone and a very large transfer of
spectral weight to backfolded bands. A theoretical calculation of the spectral
function for an excitonic insulator phase reproduces the experimental features
with very good agreement. This gives strong evidence in favour of the excitonic
insulator scenario as a driving force for the charge-density wave transition in
1\textit{T}-TiSe$_{2}$.

<|endoftext|><|startoftext|>
Introduction
There is no doubt that galaxies suffer chemical enrichment during their lives (see e.g.
Cid Fernandes et al. 2006 for a recent systematic approach using a large data base of
galaxies from the Sloan Digital Survey Data Release 5 – Adelman-Mac Carthy J.K. et
al., 2007). The main source of oxygen production has since long been identified as due
to supernovae from massive stars (type II supernovae). Yet, the exact process by which
chemical enrichment proceeds is poorly known (see a review by Scalo & Elmegreen 2004).
Ten years ago, Tenorio-Tagle (1996, hereafter T-T96) proposed a scenario in which the
metal-enhanced ejecta from supernovae follow a long excursion in galactic haloes before
falling down on the galaxies in the form of oxygen-rich droplets.
In the present work (the full version of which has been submitted to Astronomy &
Astrophysics, Stasińska et al. 2007), we suggest that the discrepancy between the oxygen
abundances derived from optical recombination lines (ORLs) and from collisionally ex-
cited lines (CELs) in HII regions (see e.g. Garćıa-Rojas et al. 2006 and references therein)
might well be the signature of those oxygen-rich droplets. In fact, Tsamis et al. (2003)
and Péquignot & Tsamis (2005) already suggested that the ORL/CEL discrepancy in
HII regions is the result of inhomegeneities in the chemical composition in these objects.
Our aim is to explicit the link between the ORL/CEL discrepancy and the T-T96 sce-
nario, and to check whether what is known of the oxygen yields allows one to explain
the ORL/CEL discrepancy in a quantitative way.
2. The Tenorio-Tagle (1996) scenario
Figures 1–5 present the T-T96 scenario in cartoon format.
http://arxiv.org/abs/0704.0160v1
2 Stasińska et al.: Oxygen-rich droplets and the enrichment of the ISM
Figure 1. Sketch of the T-T96 scenario: At time t=0, a burst of star formation occurs and a
giant HII region forms.
Figure 2. Sketch of the T-T96 scenario: During the next ∼ 40 Myr, supernovae explode, creating
a hot superbubble confined within a large expanding supershell that bursts into the galactic halo.
The superbubble contains the matter from the oxygen-rich supernova ejecta mixed with the
matter from the stellar winds and with the matter thermally evaporated from the surrounding
supershell.
3. The ORL-CEL discrepancy in the context of the T-T96 scenario
The details of the physical arguments concerning the amount of oxygen available in the
droplets, the mixing processes, as well as the simulation of the ORL-CEL discrepancy
with a multizone photoionization model are described in Stasińska et al. (2007). Here,
we simply give the most important conclusions.
Photoionization of the oxygen-rich droplets predicted by the T-T96 scenario can repro-
duce the observed abundance discrepancy factors (ADFs, i.e. the ratios of abundances
obtained from ORLs and from CELs) derived for Galactic and extragalactic HII regions.
The recombination lines arising from the highly metallic droplets thus show mixing at
work.
Stasińska et al.: Oxygen-rich droplets and the enrichment of the ISM 3
Figure 3. Sketch of the T-T96 scenario: after the last supernova has exploded, the gas in the
superbubble begins to cool down. Loci of higher densities cool down quicker. Due to a sequence
of fast repressurizing shocks, this leads to the formation of metal-rich cloudlets. The cooling
timescale is of the order of 100 Myr.
Figure 4. Sketch of the T-T96 scenario: The now cold metal-rich cloudlets fall unto the galactic
disk. They are further fragmented into metal-rich droplets by Raighleigh-Taylor instabilities.
This metal-rich rain affects a region whose extension is of the order of kiloparsecs, i.e. much
larger than the size of the initial HII region
We find that, if our scenario holds, the recombination lines strongly overestimate the
metallicities of the fully mixed HII regions. The collisionally excited lines may also over-
estimate them, although in much smaller proportion. In absence of any recipe to correct
for these biases, we recommend to discard objects showing large ADFs to probe the
chemical evolution of galaxies.
4 Stasińska et al.: Oxygen-rich droplets and the enrichment of the ISM
Figure 5. Sketch of the T-T96 scenario: When a next generation of massive stars form, they
photoionize the surrounding interstellar medium, including the metal-rich droplets. It is only
after the droplets have been photoionized that their matter is intimately mixed with the matter
from the ISM, and that proper chemical enrichment has occured. The whole process since the
explosion of the supernovae that provided fresh oxygen has taken at least 100 Myr.
To proceed further with this question of inhomogeneities, one needs as many observa-
tional constraints as possible.
On the theoretical side, one needs more robust estimates of the integrated stellar yields
as well as a better knowledge of the impact of massive stars on the ISM and of the role
of turbulence. All these issues are relevant to our understanding of the metal enrichment
of the Universe.
REFERENCES
Adelman-Mac Carthy J.K. et al., 2007, in preparation
Cid Fernandes, R., Vala Asari, N., Sodré Jr. L., Stasińska, G., Mateus, A., Torres-Papaqui, J.P.,
Schnoell, W., 2006, MNRAS in press (astro-ph/0610815)
Garćıa-Rojas, J., Esteban, C., Peimbert, M., Costado, M. T., Rodŕıguez, M., Peimbert, A., &
Ruiz, M. T. 2006, MNRAS, 368, 253
Scalo, J., & Elmegreen, B. G. 2004, ARA&A, 42, 275
Stasińska, G., Tenorio-Tagle, G., Rodŕıguez, M., Henney, W.J., 2007, A&A submitted
Tenorio-Tagle, G. 1996, AJ, 111, 1641 (T-T96)
Tsamis, Y. G., Barlow, M. J., Liu, X.-W., Danziger, I. J., & Storey, P. J. 2003, MNRAS, 338,
Tsamis, Y. G., & Péquignot, D. 2005, MNRAS, 364, 687
http://arxiv.org/abs/astro-ph/0610815
	Introduction
	The Tenorio-Tagle (1996) scenario
	The ORL-CEL discrepancy in the context of the T-T96 scenario
ABSTRACT
  We argue that the discrepancies observed in HII regions between abundances
derived from optical recombination lines (ORLs) and collisionally excited lines
(CELs) might well be the signature of a scenario of the enrichment of the
interstellar medium (ISM) proposed by Tenorio-Tagle (1996). In this scenario,
the fresh oxygen released during massive supernova explosions is confined
within the hot superbubbles as long as supernovae continue to explode. Only
after the last massive supernova explosion, the metal-rich gas starts cooling
down and falls on the galaxy within metal-rich droplets. Full mixing of these
metal-rich droplets and the ISM occurs during photoionization by the next
generations of massive stars. During this process, the metal-rich droplets give
rise to strong recombination lines of the metals, leading to the observed
ORL-CEL discrepancy. (The full version of this work is submitted to Astronomy
and Astrophysics.)

<|endoftext|><|startoftext|>
Microsoft Word - word_Zn_CN__high_pressure
Soft modes and NTE in Zn(CN)2 from  
Raman spectroscopy and first principles calculations 
T. R. Ravindran*, A. K. Arora, Sharat Chandra, M. C. Valsakumar and N. V. Chandra 
Shekar 
Materials Science Division, Indira Gandhi Centre for Atomic Research, Kalpakkam 603 
102, India 
We have studied Zn(CN)2 at high pressure using Raman spectroscopy, and report 
Gruneisen parameters of the soft phonons.  The phonon frequencies and eigen vectors 
obtained from ab-initio calculations are used for the assignment of the observed phonon 
spectra.   Out of the eleven zone-centre optical modes, six modes exhibit negative 
Gruneisen parameter.  The calculations suggest that the soft phonons correspond to the 
librational and translational modes of C≡N rigid unit, with librational modes contributing 
more to thermal expansion.  A rapid disordering of the lattice is found above 1.6 GPa 
from X-ray diffraction. 
PACS numbers: 62.50.+p, 63.20.Dj, 78.30.–j, 78.20.Bh 
*Corresponding author: Email: trr@igcar.gov.in 
Interest in materials that exhibit negative thermal expansion (NTE) was renewed 
after the report [1] of high and isotropic NTE in Zr(WO4)2 over a wide temperature range, 
leading to extensive work and several reviews on the subject [2-5].  The structure of 
Zr(WO4)2 and several other NTE materials consist of corner sharing tetrahedral and 
octahedral units.  From Raman spectroscopic investigations on Zr(WO4)2 as a function of 
pressure and temperature, the phonons responsible for NTE have been identified, and it 
has been shown that in addition to the librational (rigid-unit) mode at 5 meV, several 
other phonons of much higher energy also contribute significantly to the NTE in this 
material [6-9].  Based on structural analysis transverse displacements of the shared 
oxygen atoms and consequent rotation of polyhedra [1] was suggested as the cause of 
NTE in Zr(WO4)2.  In the context of corner linked structures, Zn(CN)2 is remarkable, as it 
has C≡N as the linking species between tetrahedral units instead of a single atom and 
exhibits twice as much coefficient of NTE (-17x10-6 K-1) [10] as that of Zr(WO4)2.  The 
structure of Zn(CN)2 consists of three-dimensional, inter-penetrating, tetrahedral 
frameworks of Zn-CN-Zn chains  [11].  Two different cubic structures, mP 34  (CN-
ordered) [11] and mPn3 (CN-disordered) [10] have been reported to fit well to the 
diffraction patterns.  In the ‘ordered structure’ the CN ions lying along the body diagonal 
are orientationally ordered such that they form ZnC4 and ZnN4 coordination tetrahedra 
around alternate cations.  On the other hand, in the ‘disordered structure’, C and N atoms 
are randomly flipped so as to occupy the sites with equal probability.  It was shown 
recently from factor group analysis in conjunction with Raman and IR spectroscopic 
measurements that the structure is indeed disordered [12]. 
From a topological model treating ZnN4/ZnC4 as rigid units the structure was 
argued to support a large number of low frequency rigid unit phonon modes (ωph< 2 THz, 
≈70 cm-1) that contribute to NTE [13].  On the other hand, it was shown recently by 
spectroscopic measurements [12] that the lowest energy optical mode in Zn(CN)2 is an IR 
mode at 178 cm-1.  It has been shown by atomic pair distribution function analysis of X-
ray diffraction data and suitable modelling of the displacements of C/N away from the 
body-diagonal that this displacement increases as a function of temperature [14].  This is 
in any case expected from the increased amplitude of atomic vibrations as temperature is 
increased.  However, there is no report of the role of different phonons to thermal 
expansion.  Since phonon modes and their Gruneisen parameters are directly responsible 
for thermal expansion in a material, it becomes vitally important to study them. 
Here we report the first study of phonons in Zn(CN)2 at high pressure using 
Raman spectroscopy and ab-initio calculations. High pressure X-ray diffraction 
measurements are also carried out for obtaining the bulk modulus to calculate Gruneisen 
parameters.  From high pressure Raman measurements soft phonons are identified.  In 
addition, first-principles ab-initio density functional calculations are performed at 
different volumes and phonon dispersion curves obtained using frozen phonon 
approximation with SIESTA code [15].  The phonon eigen vectors are used for the 
assignment of phonon modes.  The thermal expansion coefficient is calculated from 
Gruneisen parameters of all the phonons and compared with the reported value. 
Zn(CN)2 (>99.5%) was obtained from Alfa Aesar.  X-ray diffraction pattern of 
this powder sample showed no observable impurity phases.  A small piece of sample of 
lateral dimensions ~100 μm was loaded into a gasketed, Mao-Bell type diamond anvil 
cell.  Raman spectra were recorded at different pressures in the backscattering geometry 
using the 488-nm line of an argon ion laser.   Methanol-ethanol (4:1) mixture was used as 
pressure transmitting medium.  Ruby fluorescence was used to measure pressure.  
Scattered light from the sample was analyzed by a SPEX double monochromator, and 
detected with a cooled photomultiplier tube operated in the photon counting mode. 
Scanning of the spectra and data acquisition were carried out using a PSoC 
(Programmable System on Chip) hardware controlled by LabVIEW® 7.1 program [16].  
The spectral range covered was 10-2400 cm-1 that also includes the C≡N stretch mode 
around 2220 cm-1.  High pressure X-ray diffraction (HPXRD) was carried out in an angle 
dispersive mode using Guinier diffractometer [17].  The incident Mo Kα1 radiation is 
obtained from a Rigaku 18 kW rotating anode x-ray generator. 
Ab-initio calculations were carried out in the framework of the density functional 
theory using the Perdew–Burke–Ernzerhof generalized gradient approximation for 
exchange and correlation [18].  A 3×3×3 supercell of Zn(CN)2 unit cell was used for 
determining the relaxed atomic configuration, and phonon frequencies calculated using 
the SIESTA code.  The calculations were performed using a Monkhorst-Pack grid of 
8×8×8 k-points with a shift of 0.5.  The energy cut off was 350 Rydbergs and a double 
zeta plus polarization (DZP) basis set was used.  Standard norm-conserving, fully 
relativistic Troullier-Martins TM2 pseudopotentials were used.  The computations were 
performed in a 16-node linux cluster. 
The 30 degrees of freedom arising from the 10 atoms in the cubic unit cell of 
Zn(CN)2 result in 3 acoustic and 27 optical branches.  Out of the three structural units 
viz., C≡N ion, ZnC4 tetrahedron and ZnN4 tetrahedron, C≡N is the most strongly bound 
unit and hence taken as a ‘rigid molecular unit’.  The 6 degrees of freedom corresponding 
to the linear molecular ion C≡N can be divided into 1-internal (stretching vibration), 3 
rigid-translations and 2 rigid rotational degrees of freedom.  The ‘disordered’ structure of 
zinc cyanide has the following irreducible representations of optical phonons [12]: 
.OptΓ = A1g + Eg + F1g + 3F2g + A2u + Eu + 2F1u + F2u 
Out of these, the A1g, Eg and F2g modes are Raman active and F1u mode is IR active.  The 
remaining four modes are optically inactive. 
  Figure 1 shows the Raman spectra of Zn(CN)2 at several pressures including 
ambient.  There are three Raman modes clearly seen at 2221, 342, and 200 cm-1.  Out of 
these, the asymmetric peak about 342 cm-1 is actually a doublet that can be resolved into 
339 and 343 cm-1 [12].  The linewidth of all three modes increase and their intensities 
reduce at high pressures.  While the C≡N stretch mode about 2220 cm-1 hardens as 
pressure is increased, the other two modes are seen to soften.  The modes at 342 cm-1 and 
200 cm-1 are too weak to follow above 1 GPa.  Figure 2 depicts the phonon frequency (ω) 
vs. pressure (P) for the three modes observed by Raman spectroscopy.  Most 
measurements were carried out under hydrostatic conditions using (methanol + ethanol) 
as pressure transmitting medium.  One set of measurements in which no medium was 
used (open circles in Fig. 2) resulted in a weaker P dependence for the 2220 cm-1 mode 
up to a pressure of 1.5 GPa and a negative coefficient above 2 GPa.  The reason for this 
change of slope from positive to negative could be a structural transition occurring under 
non-hydrostatic pressure.  However, high pressure X-ray diffraction measurements 
(discussed later) under hydrostatic pressure have not indicated any structural phase 
transition between 1.5 and 2 GPa or at any pressure up to 5.2 GPa, the highest pressure 
up to which measurements were made. 
X-ray diffraction patterns were recorded at several pressures up to 5.2 GPa.  Only 
three reflections, viz., (110), (211) and (321) could be observed.   As pressure is 
increased, the intensity of all lines reduces drastically.  At a pressure as low as 0.2 GPa 
the intensities of the peaks reduce by about 50%.  Above 0.6 GPa the (321) line 
disappears and above 1.6 GPa, only the (110) line is present, which continues up to the 
highest pressure of 5.2 GPa, indicating possible disordering of C≡N has taken place 
above 1.6 GPa.  Such a partial/sublattice amorphization has been reported earlier also in 
other compounds [19, 20].  Lattice parameters at several pressures were obtained from 
the three lines using a disordered cubic space group ( mPn3 ) structure.  The unit cell 
volume obtained as a function of pressure was fitted to Murnaghan equation of state and 
resulted in a bulk modulus B0=25±11 GPa.  The large error in B0 is due to the scatter in 
the XRD data and also the small number of reflections that were used to calculate the 
lattice parameters.  With this input of B0 the mode Gruneisen parameters (γi  = B0ωi−1 
∂ωi/∂P) of the three Raman modes could be calculated (Table 1, last column).  In the 
absence of γi values of other vibrational modes, the thermal expansion coefficient has 
been calculated using simulation data as detailed in the next paragraph. 
It is not straight forward to incorporate random disorder in ab-initio calculations.  
When such a disorder is introduced by randomly flipping half the C≡N species in the 
supercell, it is found that for this disordered structure of cubic Zn(CN)2 - when the system 
is allowed to relax - the ground state energy does not converge to a stable configuration 
but evolves into a tetragonal structure (space group nmP 24 ) with c-parameter ~0.5% 
larger than the a- and b-parameters.  Upon further relaxation, the structure slowly 
becomes triclinic.  Additionally, the inter-atomic forces are large and do not converge to 
small values.  On the other hand, for the ‘ordered structure’ the forces converged to 
values less than 10-6 eV/Å due to geometrical considerations.  Hence the ordered 
structure of Zn(CN)2 is used for computational purposes.  It should be pointed out that the 
values of the vibrational frequencies obtained from either of the space groups are not 
expected to be different from each other, since the same kind of atomic motions are 
involved in the vibrational modes.  The number of zone-centre optical phonon modes is 
also the same in either space group.  The total energy of the system was computed in the 
relaxed configuration for different volumes of the cell up to V/V0=0.844.  The energy vs. 
volume data was fitted to Murnaghan equation and the bulk modulus obtained is 88 GPa.  
A similar result (90 GPa) is obtained when WIEN2K is used to calculate the bulk 
modulus.  Phonon dispersion curves at different volumes were calculated using the 
frozen-phonon method using the VIBRA module in the SIESTA package.  Eigen 
frequencies for the various modes were obtained by diagonalizing the dynamical matrix.  
The phonon dispersion curves obtained at ambient volume from simulations are shown in 
Figure 3.  Eigenvectors were viewed using the Visual Molecular Dynamics (VMD) 
package [21].  The highest compression corresponds to a pressure of 8.3 GPa.  From the 
pressure dependence of the various zone centre optical phonons (inset in Fig. 2) the mode 
Gruneisen parameters were obtained [Table I].  Using Einstein’s specific heat Ci = R 
[xi2exp(xi)]/[exp(xi)-1]2, where xi=ħωi/kBT, for the various modes the total specific heat 
CV was obtained.  Here R is the universal gas constant.  Thermal expansion coefficient 
α=(γavCV)/(3VmB0), (where γav=½∑piCiγi)/CV, pi are the degeneracies of the respective ωi 
phonon branches at the Brillouin zone centre, Vm is the molar volume and B0 taken as 88 
GPa) is calculated to be -22×10-6 K-1, in good agreement with the reported value.   
In view of the non-availability of polarized Raman measurements on oriented 
single crystals of Zn(CN)2, the observed modes were assigned (Table I) based on eigen 
vectors of calculated phonons.  The CN stretching mode at 2200 cm-1 can be assigned to 
A1g.  In the internal mode region A1g and F2g modes arise due to correlation splitting [12] 
and are often degenerate.  It is noteworthy that six out of the eleven optical modes exhibit 
negative Gruneisen parameters.  Furthermore, all the modes of energy lower than 360 cm-
1 have negative γi.  Figure 4 shows the displacement vectors of different atoms for the 
phonons that exhibit large negative γi.  The 143 cm-1 mode corresponds to translational 
motion of CN ions whereas the other three modes involve librations of CN ions about the 
axis joining Zn-Zn’ atoms.  The difference in the values of the calculated and the 
observed values of γi could partly arise from the different values of B0 used.  However, 
this does not affect the calculation of α, since B0 gets cancelled in the definition of α.  
Further, the total Gruneisen parameter (∑piγi) for the C≡N librational modes is -57 
whereas for the translational modes this value is -41. 
As mentioned earlier, using a topological model the network structure of Zn(CN)2 
has been argued to have a large number of low-frequency rigid units modes of 
ZnC4/ZnN4 in analogy with Zr(WO4)2.  On the other hand, in the present lattice 
dynamical calculations the lowest frequency mode turns out to be a CN-translational 
mode.  This is because the topological model treated ZnC4/ZnN4 as rigid units, whereas 
actually only the strongly bound CN ions should be considered as rigid units.  Recent 
atomic pair distribution function analysis shows that the displacements of C/N away from 
the line joining Zn…Zn’ increases as a function of temperature.  Though this appears 
physically reasonable, this displacement, when extrapolated to 0 K, remains as large as 
0.42 Å (Fig.9 of Ref.[14]) suggesting inconsistency between the Rietweld refined 
structure and that obtained from PDF analysis.  Furthermore, the reason for Zn…Zn’ 
distance (which is directly related to the lattice parameters) estimated from PDF analysis 
being different from that obtained from XRD analysis [14] remains unclear.  On the other 
hand, the present phonon calculations and Raman measurements at high pressure provide 
the first insight into the relative role of the different phonons in causing negative thermal 
expansion in Zn(CN)2. 
In conclusion, we have identified the optical phonons responsible for NTE in 
Zn(CN)2 from high pressure Raman spectroscopic studies and from first principles 
density functional simulation studies at different volumes.  Gruneisen parameters of all 
the vibrational modes were obtained from simulations.  A large number of phonon modes 
in Zn(CN)2 are soft, and all contribution to NTE arises from C≡N librational and 
translational modes.  The value of thermal expansion coefficient α calculated from the 
Gruneisen parameters is in good agreement with experimental value.  X-ray diffraction 
investigations suggest growth of  disorder at high pressure.   
References: 
1. J. S. O. Evans, T. A. Mary, T. Vogt, M. A. Subramanian, and A. W. Sleight, Chem. 
Mater. 8, 2809 (1996). 
2. M. G. Tucker, A. L. Goodwin, M. T. Dove, D. A. Keen, S. A. Wells, and J. S. O. 
Evans, Phys. Rev. Lett. 95, 255501 (2005) 
3. A. W. Sleight, Curr. Opin. in Solid State Mater. Sci. 3, 128 (1998). 
4. S. K. Sikka, J. Phys.: Condens. Matter 16, S1033 (2004). 
5. G. D. Barrera, J. A. O. Bruno, T. H. K. Barron, and N. L. Allan, J. Phys.: Condens. 
Matter 17, R217 (2005). 
6. T. R. Ravindran, A. K. Arora, and T. A. Mary, Phys. Rev. Lett. 84, 3879 (2000). 
7. T. R. Ravindran, A. K. Arora, and T. A. Mary, Phys. Rev. Lett. 86, 4977 (2001) 
8. T. R. Ravindran, A. K. Arora, and T. A. Mary, J. Phys.: Condens. Matter 13, 11573 
(2001). 
9. T. R. Ravindran, A. K. Arora, and T. A. Mary, Phys. Rev. B 67, 064301 (2003). 
10. D. J. Williams, D. E. Partin, F. J. Lincoln, J. Kouvetakis, and M. O’Keefe, J. Solid 
State Chem. 134, 164 (1997) 
11. B. F. Hoskins and R. Robson, J. Am. Chem. Soc. 112, 1546 (1990). 
12. T. R. Ravindran, A. K. Arora, and T. N. Sairam, J. Raman Spectrosc. 38, 283 (2007). 
13. A. L. Goodwin, and C. K. Kepert, Phys. Rev. B 71, R140301 (2005). 
14. K. W. Chapman, P. J. Chupas, and C. J. Kepert, J. Am. Chem. Soc. 127, 15630 
(2005) 
15. J. M. Soler, E. Artacho, J. D. Gale, A. Garcia, J. Junquera, P. Ordejon and D. 
Sanchez-Portal, J. Phys.: Condens. Matter 14, 2745 (2002). 
16. J. Jayapandian, R. Kesavamoorthy and A. K. Arora, J. Instrum. Soc. India (2007) in 
press. 
17. P. Ch. Sahu, M. Yousuf, N. V. C. Shekar, N. Subramanian, and K. G. Rajan, Rev. 
Sci. Instrum. 66, 2599 (1995). 
18. J. P. Perdew, K. Burke and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996). 
19. A. K. Arora, R. Nithya, T. Yagi, N. Miyajima and T.A. Mary, Solid State Commun. 
129, 9 (2004) 
20. J.B. Parise, J.S. Loveday, R.J. Nelmes, H. Kagi, Phys. Rev. Lett. 83, 328 (1999). 
21. W. Humphrey, A. Dalke and K. Schulten, J. Mol. Graphics 14, 33 (1996) 
Table I 
Modes 
(deg. of 
freedom) 
metry 
Calc. 
freq. 
(cm-1) 
Obs. 
freq. 
(cm-1) 
Calc. 
Obs. γi 
Zn-trans. 
F2g 388 343 (R) 0.45 - 
F1u 143 178 (IR) -14.3 - 
Eu 255 Inactive -1.5 - 
F2g  352 216 (R) -0.13 -0.50 
(15) 
A2u 564 Inactive 1.1 - 
CN-trans. 
(12) 
F1u 596 461 (IR) 1.4 - 
F1g 288 Inactive -8.0 - 
Eg 357 339 (R) -6.2 -0.54 
CN-libr. 
F2u 326 Inactive -7 - 
F2g 2232 2218(IR) 1.5 0.14(1) CN-int. 
(4) A1g 2245 2221 (R) 1.5 - 
Figure and table captions: 
Table 1. Calculated and observed phonon frequencies in Zn(CN)2, their classification, 
mode assignments and Gruneisen parameters.  Observed IR frequencies are from [12]. 
Figure 1.  Raman spectra of Zn(CN)2 at several pressures.  Spectra are scaled and shifted 
for clarity.  The modes at 342 and 200 cm-1 could not be followed above 1 GPa due to 
weak intensities. 
Figure 2.  Mode frequency vs. Pressure for the observed Raman modes in Zn(CN)2.  
Open symbols: results without pressure medium.  The inset shows ω vs. P for all the 
eleven modes obtained from the phonon calculations.  Though data were generated up to 
8.3 GPa, the trend after the first three pressures (shown here) is non-linear, and hence not 
considered for obtaining γi. 
Figure 3.  Phonon dispersion curves obtained from First Principles density functional 
simulations on a 3×3×3 supercell of Zn(CN)2.  Note that the acoustic phonon branch 
interacts with the lowest energy optical phonon branch at 143 cm-1.  Both the branches 
change character due to the non-crossing rule. 
Figure 4. Atomic displacements of  vibrational modes corresponding to (a) 143 cm-1, (b) 
288 cm-1, (c) 326 cm-1 and (d) 357 cm-1.  The arrows fixed to the atoms are proportional 
to the amplitude of atomic motion.  In the 326 cm-1 mode neighbouring Zn atoms also 
move (opposite direction) 
200 250 300 350 2200 2250
CN-translation
CN stretch
CN libration
x0.6x1.6
0.8 GPa
2.4 GPa
1.2 GPa
0.1 GPa
Raman shift (cm-1)  
Figure 1. Ravindran et al 
Figure 2. Ravindran et al. 
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
2 2 0 0
2 3 0 0
Figure 3. Ravindran et al. 
Figure 4. Ravindran et al.
ABSTRACT
  We have studied Zn(CN)2 at high pressure using Raman spectroscopy, and report
Gruneisen parameters of the soft phonons. The phonon frequencies and eigen
vectors obtained from ab-initio calculations are used for the assignment of the
observed phonon spectra. Out of the eleven zone-centre optical modes, six modes
exhibit negative Gruneisen parameter. The calculations suggest that the soft
phonons correspond to the librational and translational modes of CN rigid unit,
with librational modes contributing more to thermal expansion. A rapid
disordering of the lattice is found above 1.6 GPa from X-ray diffraction.

<|endoftext|><|startoftext|>
Introduction
The basic task of experimental physical exploration of nat-
ural phenomena is to provide quantitative data on mea-
sured variables and, from them extract physical laws [1].
Related to this task, experimenters must decide how many
experiments to perform in order to provide proper exper-
http://arxiv.org/abs/0704.0162v1
2 Igor Grabec: Estimation of experimental data redundancy and related statistics
imental data. We know that it is reasonable to repeat ex-
periments as long as they yield essentially new data, and
to stop repetition when the data become redundant. In
order to describe this concept objectively, we have intro-
duced in previous articles [2,3] two statistics called exper-
imental information I and redundancy R of experimental
data based on the entropy of information [4]. Their differ-
ence C = R−I can be interpreted as the information cost
function of the experimental exploration. From the cost
function minimum, the proper number N◦ of experiments
can be determined in an objective way. The entropy of in-
formation is defined by the integral of a nonlinear function
of the probability density function of experimental data,
and consequently its calculation is numerically demand-
ing. This property represents a serious obstacle, especially
when treating multivariate data. Therefore, our aim is to
show how this obstacle can be effectively avoided by es-
timating data redundancy without integration. For this
purpose we first briefly repeat the route to the definition
of redundancy [2,3] and subsequently show how the inte-
gral in the corresponding expression can be approximated.
The performance of the derived approximate method of
calculation is demonstrated using two–dimensional nor-
mally distributed random data.
2 Redundancy of experimental data
Let us consider a phenomenon characterized by N mea-
surements of a variable x using an instrument with span
Sx = (−L,L). Properties of the instrument are specified
by calibration on a unit u. The probability density func-
tion (PDF) of the instrument’s output scattering during
calibration is described by the scattering function ψ(x, u).
When the scattering is caused by mutually independent
disturbances in the experimental system, the scattering
function is Gaussian [1,4] :
ψ(x, u) = g(x− xi, σ) =
(x− u)2
. (1)
We apply this function in our further treatment. The mean
value u and standard deviation σ can be estimated statis-
tically by repetition of calibration.
Let xi denote the most probable instrument output
in the i–th experiment. Using ψ(x, xi) we describe the
properties of the explored phenomenon during the i–th
experiment. Similarly, the properties in a series of N re-
peated experiments, which yield the basic data set {xi; i =
1, . . . , N}, are described by the experimentally estimated
fN (x) =
ψ(x, xi). (2)
In addition, we introduce a uniform reference PDF ρ(x) =
1/(2L) indicating that all outcomes of the experiment are
hypothetically equally probable before executing the ex-
periments.
Based upon functions fN (x) and ρ(x) we describe the
indeterminacy of variable x by the negative value of the
relative entropy [5,6,7]:
Hx = −
f(x) log
(fN (x)
dx. (3)
Similarly, we describe the uncertainty Hu of calibration
performed on a unit u by:
Hu = −
ψ(x, u) log
(ψ(x, u)
dx. (4)
Igor Grabec: Estimation of experimental data redundancy and related statistics 3
Using the difference of these statistics we define the ex-
perimental information:
I = Hx −Hu
f(x) log(fN (x)) dx
ψ(x, u) log(ψ(x, u)) dx. (5)
Using Eq. 2 in this expression we get:
I = log(N) − 1
ψ(x, xi) log
ψ(x, xj)
ψ(x, u) log
ψ(x, u)
dx. (6)
If we express the logarithm in the second term as:
ψ(x, xj)
= logψ(x, xi) + log
ψ(x, xj)
ψ(x, xi)
we obtain:
I = log(N) +
ψ(x, xi) log
ψ(x, xi)
ψ(x, xi) log
ψ(x, xj)
ψ(x, xi)
ψ(x, u) log
ψ(x, u)
dx. (8)
The second and the fourth term on the right side of this
equation yield 0 and we get:
I = log(N)− 1
ψ(x, xi) log
ψ(x, xj)
ψ(x, xi)
With the last term we introduce the statistic called redun-
dancy of data:
ψ(x, xi) log
ψ(x, xj)
ψ(x, xi)
dx (10)
with which we get the basic relation:
I = log(N)−R (11)
If |xi − xj | ≫ σ for all pairs i#j, there is no overlapping
of functions ψ(x, xi), ψ(x, xj); therefore, the sum in the
logarithm is ∼ 0, and consequently the redundancy is R ∼
0. In the opposite case, when |xi − xj | ≪ σ, it follows
that ψ(x, xi) ∼ ψ(x, xj). Due to good overlapping in this
case, the corresponding term in the expression of R yields
log(2)/N and R > 0.
This property indicates that experimental information
is increasing with increasing N as I ∼ log(N) if the ac-
quired data are well separated with respect to σ. However,
with an increasing number of data, they are ever more
densely distributed, which results in an increasing overlap-
ping of distributions that causes increasing redundancy of
measurements. Although the expression in Eq. 10 for re-
dundancy R is rather cumbersome due to the included
integral, we expect that R could be estimated without
integration by the simpler function of distances between
data points. For this purpose we next consider the prop-
erties of the scattering function ψ(x, xi).
If the Gaussian function ψ(x, xi) = g(x−xi, σ) is con-
sidered as an approximation of the delta function δ(x−xi),
and the logarithm as a slowly changing function, the inte-
gration in Eq. 10 can be carried out, which yields for the
redundancy the first order approximate expression with-
out the integral:
ψ(xi, xj)
ψ(xi, xi)
If we take into account Eq. 1, we get for the redundancy
the following approximate expression that depends only
4 Igor Grabec: Estimation of experimental data redundancy and related statistics
on standard functions of distances between data points:
− (xi − xj)
However, this first order approximation is rather rough
because the distribution ψ(xi, xj) has the width σ > 0 and
the logarithm in Eq. 10 includes the fraction of functions
ψ(x, xj)/ψ(x, xi). To proceed to a better approximation,
we have examined the case of just two data points, since
it mainly determines the property of the redundancy. In
this case the integration of the first three terms in a Taylor
series expansion of the logarithm yields the second approx-
imation:
− (xi − xj)
, (14)
which is obtained from the previous one by merely chang-
ing 2σ2 → 4σ2. This property indicates that a still better
approximation could be obtained by properly adapting
2σ2 in Eq. 13. For this purpose we have proceeded with
numerical investigations which have shown that a nearly
optimal approximation is obtained if 2σ2 in Eq. 13 is re-
placed by ∼ 5.1σ2:
− (xi − xj)
5.1σ2
. (15)
Numerical investigations have further shown that this for-
mula also yields good results in cases with many data
points.
Since the integral is excluded from Eq. 15, the redun-
dancy R can be estimated from Eq. 15 with essentially less
computational effort than from Eq. 10. This advantage is
especially outstanding in a multivariate case where the
redundancy is defined by multiple integrals, while in the
approximate formula in Eq. 15 only the term (xi − xj) in
the exponential function has to be replaced by the norm
of corresponding vectors. Due to this advantage, it is also
reasonable to estimate approximately the experimental in-
formation using the basic formula I = log(N) − R. The
experimental information I converges with the increasing
number of data N to a certain limit value from which the
complexity of the phenomenon under investigation can be
estimated using the formula K ≈ exp(IN→∞) introduced
previously [2,3]. The complexity K indicates how many
non–overlapping scattering distributions are needed in the
estimator Eq. 2 to describe the PDF of the observed phe-
nomenon.
The information cost function is the difference of the
redundancy and experimental information: C = R − I.
During minimization of this cost, the experimental infor-
mation provides for a proper adaptation of the PDF es-
timator to the experimental data, while the redundancy
prevents excessive growth of the number of data points. By
the position of the cost function minimum we introduce
the proper number No of the data and the corresponding
experiments that are needed to judiciously represent the
phenomenon under exploration. By inserting the expres-
sion I = log(N) − R into C = R − I, we obtain for the
information cost function the formula:
C = 2R− log(N). (16)
Therefore the proper number No can also be determined
from the approximately estimated redundancy Ro. This
number roughly corresponds to the ratio between the mag-
nitude of the characteristic region where experimental data
Igor Grabec: Estimation of experimental data redundancy and related statistics 5
appear and the magnitude of the characteristic region cov-
ered by the scattering function [2,3].
3 Numerical examples
To demonstrate the properties of the approximations R1,
R2, Ro let us first consider the case of just two data
points separated by a distance x1 − x2. Fig. 1 shows the
dependence of redundancy R on relative distance d =
(x1−x2)/σ as determined by the integral in Eq. 10 and ap-
proximations in Eqs. 13,14,15. Improvement achieved by
subsequent steps of approximation and a fairly good agree-
ment between approximation Ro and R calculated by the
integral is evident. However, in a case with more data
points we can generally expect slightly worse agreement
due to overlapping of more than two scattering functions
in the sum of the approximation formula in Eq. 15. The
performance in such a case is demonstrated in the next
example.
In order to provide for reproduction of the demon-
strated example, we consider a two–dimensional Gaussian
random phenomenon with zero mean value. The stan-
dard deviation of both components is equal to s = 2.5,
while their covariance is zero. The data generated by a
standard Gaussian generator are represented in the two-
dimensional span (−10,+10)⊗ (−10,+10) using the scat-
tering width σ = 0.5. In such a case we can theoretically
predict that the proper number of data samples should be
No ≈ (s/σ)2 = 25.
For the demonstration, a set of Nmax = 100 two-
dimensional data samples {(xi, yi); i = 1 . . .Nmax} was
0 1 2 3 4 5 6
Fig. 1. Dependence of redundancy R on relative distance d =
(x1−x2)/σ between data points as determined by the integral
in Eq. 10, and approximations in Eqs. 13,14,15.
0.005
0.015
0.025
0.035
Fig. 2. PDF determined by 100 data points xi, yi.
generated. The corresponding probability density function
was estimated using Eq. 2 adapted to the two–dimensional
case with statistically independent components:
fN (x, y) =
ψ(x, xi)ψ(y, yi). (17)
The resulting PDF with N = 100 is graphically repre-
sented in Fig. 2.
6 Igor Grabec: Estimation of experimental data redundancy and related statistics
0 10 20 30 40 50 60 70 80 90 100
REDUNDANCY VERSUS NUMBER OF DATA
Fig. 3. Dependence of redundancy R on number N of data
points as determined by the integral in Eq. 10 – (R), and ap-
proximation in Eq. 15 – (Ro) adapted to the two–dimensional
case.
From the generated data the redundancy was calcu-
lated using Eqs. 10 and 15 adapted to the two–dimensional
case. The dependence of redundancy R on the number N
of accounted data points is shown in Fig. 3. Fairly good
agreement between both statistics is again evident.
Approximately estimated redundancy was further uti-
lized in the calculation of statistics I and C. They are
shown as functions of the number of data points N in
Fig. 4 together with R(N) and log(N). Agreement with
the same statistics calculated more exactly by integration
can be established by comparing this figure with Fig. 4.
In both cases we obtain for the proper number the value
No = 28. This value depends on the statistical properties
of the data set used in its calculation; a statistical esti-
mation from 100 different data sets yields the estimate
No ≈ 25±13 which agrees well with the theoretically pre-
0 10 20 30 40 50 60 70 80 90 100
INFORMATION STATISTICS VERSUS NUMBER OF DATA
log( N)
Fig. 4. Dependence of information statistics on the number N
of data points as approximately determined from Eq. 15. The
minimum of the cost function occurs at N = 28.
0 10 20 30 40 50 60 70 80 90 100
log( N)
Fig. 5. Dependence of information statistics on the number N
of data points as determined based on integration.
dicted value No = 25. Similarly as in the one–dimensional
case [2], it turns out that the function fNo(x, y) is only a
rough estimator of the hypothetical PDF. This property is
a consequence of the fact that experimental information I
and redundancy R have equal weights in the cost function
C = R− I.
Igor Grabec: Estimation of experimental data redundancy and related statistics 7
Figs. 4 and 5 indicate that experimental information I
converges with increasing N to a certain limit value from
which the complexity of the phenomenon under investiga-
tion can be approximately estimated as K ≈ exp INmax .
In our case we get the estimate K ≈ 21. The number
of non–overlapping scattering distributions that represent
the PDF of the observed phenomenon is thus slightly smaller
than the proper number No of experiments needed for its
exploration.
4 Conclusions
From the statistics introduced in the previous articles [2,3]
based on information entropy, we have here derived an ap-
proximate formula for the calculation of redundancy R of
experimental data. It is important that this formula does
not include the integral by which the information entropy
is defined. This makes feasible a simplified and fairly good
estimation of redundancy and, with it, the related exper-
imental information and cost function. The advantage of
the approximate calculation becomes outstanding in mul-
tivariate cases because multiple integration is not needed
there. A serious obstacle for the application of the con-
cept of experimental information and redundancy of data
can thus be avoided. Efficient estimation of the experi-
mental information and cost function, and with them the
determined complexity of the phenomenon and the proper
number of experiments needed for its exploration, could
be considered valuable in planning experimental work. In
addition, the complexityK or the proper numberNo could
be applied in the field of neural networks [1,8] to deter-
mine the appropriate number of cells needed to deal with
a certain phenomenon.
Acknowledgment
This work was supported by the Ministry of Higher Edu-
cation, Science and Technology of the Republic of Slovenia
and EU – COST.
References
1. I. Grabec and W. Sachse, Synergetics of Measurement, Pre-
diction and Control (Springer-Verlag, Berlin, 1997).
2. I. Grabec, Experimental modeling of physical laws, Eur.
Phys. J., B, 22 129-135 (2001)
3. I. Grabec, Extraction of physical laws from joint exper-
imental data, Eur. Phys. J., B, 48 279-289 (2005) (DOI:
10.1140/epjb/e2005-00391-0)
4. J. C. G. Lesurf, Information and Measurement (Institute of
Physics Publishing, Bristol, 2002)
5. T. M. Cover and J. A. Thomas, Elements of Information
Theory (John Wiley & Sons, New York, 1991).
6. A. N. Kolmogorov, IEEE Trans. Inf. Theory, IT-2 102-108
(1956).
7. D. J. C. MacKay, Information Theory, Inference, and Learn-
ing Algorithms (Cambridge University Press, Cambridge,
UK, 2003)
8. S. Haykin, Neural Networks, (Prentice Hall International,
Inc., Upper Saddle River, New Jersey, 1999)
	Introduction
	Redundancy of experimental data
	Numerical examples
	Conclusions
ABSTRACT
  Redundancy of experimental data is the basic statistic from which the
complexity of a natural phenomenon and the proper number of experiments needed
for its exploration can be estimated. The redundancy is expressed by the
entropy of information pertaining to the probability density function of
experimental variables. Since the calculation of entropy is inconvenient due to
integration over a range of variables, an approximate expression for redundancy
is derived that includes only a sum over the set of experimental data about
these variables. The approximation makes feasible an efficient estimation of
the redundancy of data along with the related experimental information and
information cost function. From the experimental information the complexity of
the phenomenon can be simply estimated, while the proper number of experiments
needed for its exploration can be determined from the minimum of the cost
function. The performance of the approximate estimation of these statistics is
demonstrated on two-dimensional normally distributed random data.

<|endoftext|><|startoftext|>
Introduction
Large-scale molecular dynamics simulations are possible only with classical effective po-
tentials, which reduce the quantum-mechanical interactions of electrons and nuclei to an
effective interaction between the atom cores. The computational task is thereby greatly
simplified. Whereas ab-initio simulations are limited to a few hundred atoms at most,
classical simulations can be done routinely with multi-million atom systems. For many
purposes, such system sizes are indispensable. For example, fracture studies of quasicrys-
tals require samples with several million atoms at least [1]. Diffusion studies, on the other
∗e-mail: p.brommer@itap.physik.uni-stuttgart.de
hand, can be done with a few thousand atoms (or even less), but require very large simu-
lated times of the order of nanoseconds [2], which also makes them infeasible for ab-initio
simulations.
While physically justified effective potentials have been constructed for many elemen-
tary solids, such potentials are rare for complex intermetallic alloys. For this reason, molec-
ular dynamics simulations of these materials have often been done with simple model po-
tentials, resulting in rather limited reliability and predictability. In order to make progress,
better potentials are needed to accurately simulate complex materials.
The force matching method [3] provides a way to construct physically reasonable po-
tentials also for more complex solids, where a larger variety of local environments has to
be described correctly, and many potential parameters need to be determined. The idea is
to compute forces and energies from first principles for a suitable selection of small refer-
ence systems, and to fit the potential parameters so that they optimally reproduce these
reference data. Hereafter, potentials generated in this way will be referred to as fitted
potentials. Thus, the force matching method allows to make use of the results of ab-initio
simulations also for large-scale classical simulations, thereby bridging the gap between the
sample sizes supported by these two methods.
2 Force Matching
As we intend to construct potentials for complex intermetallic alloys, we have to assume
a functional form which is suitable for metals. A good choice are EAM (Embedded Atom
Method) potentials [4], also known as glue potentials [5]. Such potentials have been used
very successfully for many metals, and are still efficient to compute, even though they
include many-body terms. In contrast, pure pair potentials show a number of deficiencies
when it comes to describe metals [5]. The functional form of EAM potentials is given by
i,j<i
φkikj(rij) +
Uki(ni), with ni =
j 6=i
ρkj(rij), (1)
where φkikj is a pair potential term depending on the two atom types kl. Uki describes
the embedding term that represents an additional energy for each atom. This energy is a
function of a local density ni determined by contributions ρkj of the neighbouring atoms.
It is tempting to view this as embedding each atom into the electron sea provided by
its neighbours. Such an interpretation is not really meaningful, however. The potential
(1) is invariant under a family of “gauge” transformations [5], by which one can move
contributions from the embedding term to the pair term, and vice versa, so that it makes
little sense to give any of them an individual physical interpretation.
In order to allow for maximal flexibility, and to avoid any bias, the potential functions
in (1) are represented by tabulated values and spline interpolation, the tabulated values
acting as potential parameters. This makes it unnecessary to guess the right analytic form
beforehand. The sampling points can be chosen freely, which is useful for functions which
vary rapidly in one region, but only slowly in another region.
The forces and energies in the reference structures are computed with VASP, the Vi-
enna Ab-Initio Simulation Package [6, 7], using the Projector Augmented Wave (PAW)
method [8, 9]. Like all plane wave based ab-initio codes, VASP requires periodic bound-
ary conditions. For quasicrystals, this means that periodic approximants have to be used
as reference structures. As ab-initio methods are limited to a few hundred atoms, those
approximants must be rather small. For the systems studied so far, this was not a major
problem, as the relevant local environments in the quasicrystal all occur also in reasonably
small approximants. Icosahedral quasicrystals with F-type lattice may be more problem-
atic in this respect. For these, small approximants are rare, and the force matching method
requires a sufficient variety of reference structures.
Given the reference data (forces, energies, and stresses in the reference structures),
the potential parameters (in our case: up to about 120 EAM potential sampling points
for spline interpolation) then are optimised in a non-linear least square fit, so that the
fitted potential reproduces the reference data as well as possible. The target function to be
minimised is a weighted sum of the squared deviations between the reference data, denoted
by the subscript 0 below, and the corresponding data computed from the fitted effective
potential. It is of the form
Z = ZF + ZC, with (2)
α=x,y,z
(fjα − f0,jα)
f 20,j + εj
, and ZC =
(Ak − A0,k)
A20,k + εk
, (3)
where ZF represents the contributions of the forces f j, and ZC those of some collective
quantities like total stresses and energies, but also additional constraintsAk on the potential
one would like to impose. The denominators of the fractions ensure that the target function
measures the relative deviations from the reference data, except for really tiny quantities,
where the εl prevent extremely small denominators. The Wl are the weights of the different
terms. It proves useful for the fitting to give the total stresses and the cohesion energies a
higher weight, although in principle they should be reproduced correctly already from the
forces.
We developed a programme named potfit, which optimises the potential parameters to
a set of reference data. It consists of two largely independent parts. The first part imple-
ments a particular parametrised potential model. It takes a list of potential parameters
and computes from it the target function, i.e., the deviations of the forces, energies, and
stresses from the reference data. Wrapped around this part is a second, potential indepen-
dent part, which implements a least square minimisation module, using a combination of a
deterministic conjugate gradient algorithm [10] and a stochastic simulated annealing algo-
rithm [11]. This part knows nothing about the details of the potential, and only deals with
a list of potential parameters. The programme architecture thus makes it easy to replace
the potential dependent part by a different one, e.g., one which implements a different
potential model, or a different way to parametrise it.
3 Results and Applications
We generated several fitted potentials for decagonal Al-Ni-Co and icosahedral Ca-Cd qua-
sicrystals, as well as Mg-Zn potentials suitable for both icosahedral and decagonal phases.
In a first step, classical molecular dynamics simulations with simple model potentials were
used to create reference configurations from small approximants (80–250 atoms). These in-
cluded samples at different temperatures, but also samples which were scaled and strained
in different ways. The approximants were carefully selected, so that all relevant local envi-
ronments are represented. For those reference structures, the forces, stresses and energies
were computed with ab-initio methods, and a first version of the fitted effective potential
given by sampling points with cubic spline interpolation was fitted to the reference data.
In a second step, molecular dynamics simulations with the newly determined potential
were used to create new reference structures, which are better representatives of the struc-
tures actually appearing in that system. The new reference structures complemented and
partially replaced the previous ones, and the fitting procedure was repeated. This second
iteration resulted in a significantly better fit to the reference data. In order to test the
transferability of the fitted potentials, further samples similar to the reference structures
were created, and their ab-initio forces and energies were compared to those determined
by the classical potentials. The deviations were of the same order as the deviations found
in the potential fit, which shows that the fitted potentials transfer well to similar struc-
tures. For Al-Ni-Co, a force-matched potential is displayed in figure 1. Fitted potentials
for Ca-Cd and Mg-Zn are not displayed here for space constraints, but are available from
the authors.
The potentials developed for decagonal Al-Ni-Co quasicrystals are intended to be used
in high-temperature diffusion simulations [2]. It is therefore important that they describe
high temperature states well, which is achieved by selecting the reference structures ac-
cordingly. By using high temperature reference structures, the fitted potential is especially
trained to such situations. As part of the potential validation, the melting temperature was
determined by slowly heating the sample at constant pressure, and the elastic constants of
decagonal Al-Ni-Co were determined. We actually have constructed two potential variants:
Variant A gives excellent values for the elastic constants (Table 1), but produces a melting
temperature which is somewhat too high. Conversely, variant B shows larger deviations
in the elastic constants, but gives a very reasonable value of the melting temperature of
about 1300 K. It is a general experience that with an effective potential it is often not
possible to reproduce all desired quantities equally well at the same time.
In complex intermetallic systems there are many competing candidates for the ground
state structure. This is the case also for complex crystalline systems. In principle, the
ground state of these can be determined directly by ab-initio simulations, but for large
unit cells this is extremely time-consuming, or even impossible. Classical potentials can be
used to select the most promising candidates, and to pre-relax them, so that the time for
ab-initio relaxation can be dramatically reduced. Potentials used for this purpose must be
able to discriminate energy differences of the order of a meV/atom. This has been largely
achieved with fitted potentials for the Mg-Zn and Ca-Cd systems, by using mainly near
 1  2  3  4  5  6  7  8
distance [ Å ]
Al−Al
Al−Co
Al−Ni
 1  2  3  4  5  6  7  8
distance [ Å ]
Co−Co
Co−Ni
Ni−Ni
−0.10
−0.05
 1  2  3  4  5  6  7  8tr
distance [ Å ]
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
(total) local "electron density"
Figure 1: Potential functions for decagonal Al-Ni-Co
ground state structures as reference structures. Also, for this application it is important to
choose a small εj in equation (3), so that small forces are also reproduced accurately. The so
constructed Ca-Cd potentials have been used successfully for structure optimisations [13].
4 Discussion and Conclusion
The selection of the reference structures used for the potential fit largely determines the
capabilities of the resulting potential. For a precise determination of the ground state,
low temperature structures should be dominant in the reference structures, and it must be
assured that even small forces and energy differences are reproduced accurately. For high
temperature simulations, on the other hand, typical high temperature structures must be
predominant in the reference structures. This opens the possibility to design specialised
potentials for certain purposes by a suitable selection of reference structures. It should
be kept in mind, however, that a fitted potential can only deal with situations it has
been trained to. For instance, one should not expect a fitted potential to handle surfaces
correctly, if it was trained only with bulk systems. Clearly, there is always a trade-off
Table 1: Elastic constants of decagonal Al-Ni-Co
[GPa] c11 c33 c44 c66
a c12 c13
Exp. [12] 234 232 70 88 57 67
Pot. A 230 231 55 70 91 91
Pot. B 197 187 49 58 86 84
aIn decagonal QC: c66 = 12 (c11 − c12)
between the transferability and the accuracy of a fitted potential. A potential can be
made more versatile by training it with many different kinds of structures, but the more
versatile it becomes, the less accurate it will be on average. Conversely, very accurate fitted
potentials will probably have limited transferability.
For practical applications, the range of a potential is also an important issue, as it
enters in the third power in the computational effort of molecular dynamics. Allowing for
a larger potential range results in greater flexibility of the potential, which might improve
its accuracy, but this comes at the price of a slower simulation. We therefore need a
compromise between speed and accuracy. The potential range should only be increased as
long as this can improve the potential quality. In a first step, our fitted potentials were
constructed with a fairly generous range of about 7Å. It turned out, however, that especially
the transfer function ρi did not make effective use of this range, and was essentially zero
beyond 5Å. In a second fit we therefore restricted the range of ρi to 5Å, without significant
loss of accuracy. This is one of the advantages of using tabulated functions: The system
itself chooses the optimal functions, including the optimal range.
Force Matching has proven to be a versatile method to construct physically reasonable,
accurate effective potentials even for structures as complicated as quasicrystals and their
approximants. Our potfit programme makes it easy to apply this method to different sys-
tems, and is also easy to adapt for the support of further potential models. The potentials
constructed so far have successfully been used in high temperature diffusion simulations
of decagonal Al-Ni-Co [2], and in structure optimisation of approximants in the Zn-Mg
and Ca-Cd systems. Further fruitful applications of the fitted potentials can certainly be
expected, and we hope to apply our methods also to other complex alloy systems, where
reliable potentials are still lacking.
Acknowledgement
This work was funded by the Deutsche Forschungsgemeinschaft through Sonderforschungs-
bereich 382. Special thanks go to Marek Mihalkovič for supplying approximants and feed-
back in the Ca-Cd and Mg-Zn systems, and to Hans-Rainer Trebin for supervising the
thesis work of the first author.
References
[1] F. Rösch, Ch. Rudhart, J. Roth, H.-R. Trebin, and P. Gumbsch, Phys. Rev. B 72,
014128 (2005).
[2] S. Hocker, F. Gähler, and P. Brommer, Phil. Mag. 86, 1051 (2006).
[3] F. Ercolessi and J. B. Adams, Europhys. Lett. 26, 583 (1994).
[4] M. S. Daw and M. I. Baskes, Phys. Rev. B 29, 6443 (1984).
[5] F. Ercolessi, M. Parrinello, and E. Tosatti, Phil. Mag. A 58, 213 (1988).
[6] G. Kresse and J. Hafner, Phys. Rev. B 47, 558 (1993).
[7] G. Kresse and J. Furthmüller, Phys. Rev. B 54, 11169 (1996).
[8] P. E. Blöchl, Phys. Rev. B 50, 17953 (1994).
[9] G. Kresse and D. Joubert, Phys. Rev. B 59, 1758 (1999).
[10] M. J. D. Powell, Comp. J. 7, 303 (1965).
[11] A. Corana, M. Marchesi, C. Martini, and S. Ridella, ACM Trans. Math. Soft. 13, 262
(1987).
[12] M. A. Chernikov, H. R. Ott, A. Bianchi, A. Migliori, and T. W. Darling, Phys. Rev.
Lett. 80, 321 (1998).
[13] M. Mihalkovič and M. Widom, Phil. Mag. 86, 519 (2006).
	Introduction
	Force Matching
	Results and Applications
	Discussion and Conclusion
ABSTRACT
  Classical effective potentials are indispensable for any large-scale
atomistic simulations, and the relevance of simulation results crucially
depends on the quality of the potentials used. For complex alloys like
quasicrystals, however, realistic effective potentials are practically
inexistent. We report here on our efforts to develop effective potentials
especially for quasicrystalline alloy systems. We use the so-called force
matching method, in which the potential parameters are adapted so as to
optimally reproduce the forces and energies in a set of suitably chosen
reference configurations. These reference data are calculated with ab-initio
methods. As a first application, EAM potentials for decagonal Al-Ni-Co,
icosahedral Ca-Cd, and both icosahedral and decagonal Mg-Zn quasicrystals have
been constructed. The influence of the potential range and degree of
specialisation on the accuracy and other properties is discussed and compared.

<|endoftext|><|startoftext|>
On smooth foliations with Morse singularities
Lilia Rosati
Università di Firenze,
Dipartimento di Matematica “U. Dini”,
viale Morgagni 67/A, 50134 Firenze
e-mail: rosati@math.unifi.it
Abstract
Let M be a smooth manifold and let F be a codimension one, C∞ foliation on M , with isolated singularities
of Morse type. The study and classification of pairs (M,F) is a challenging (and difficult) problem. In this
setting, a classical result due to Reeb [Reeb] states that a manifold admitting a foliation with exactly two center-
type singularities is a sphere. In particular this is true if the foliation is given by a function. Along these lines
a result due to Eells and Kuiper [Ee-Kui] classify manifolds having a real-valued function admitting exactly
three non-degenerate singular points. In the present paper, we prove a generalization of the above mentioned
results. To do this, we first describe the possible arrangements of pairs of singularities and the corresponding
codimension one invariant sets, and then we give an elimination procedure for suitable center-saddle and some
saddle-saddle configurations (of consecutive indices).
In the second part, we investigate if other classical results, such as Haefliger and Novikov (Compact Leaf) the-
orems, proved for regular foliations, still hold true in presence of singularities. At this purpose, in the singular
set, Sing(F) of the foliation F , we consider weakly stable components, that we define as those components
admitting a neighborhood where all leaves are compact. If Sing(F) admits only weakly stable components,
given by smoothly embedded curves diffeomorphic to S1, we are able to extend Haefliger’s theorem. Finally,
the existence of a closed curve, transverse to the foliation, leads us to state a Novikov-type result.
Acknoledgements
I am very grateful to prof. Bruno Scárdua for proposing me such an interesting subject and for his valuable
advice. My hearthy good thanks to prof. Graziano Gentili for his suggestions on the writing of this article.
1 Foliations and Morse Foliations
Definition 1.1 A codimension k, foliated manifold (M,F) is a manifold M with a differentiable structure,
given by an atlas {(Ui, φi)}i∈I , satisfying the following properties:
(1) φi(Ui) = B
n−k × Bk;
(2) in Ui ∩ Uj 6= ∅, we have φj ◦ φ
i (x, y) = (fij(x, y), gij(y)),
where {fij} and {gij} are families of, respectively, submersions and diffeomorphisms, defined on natural
domains. Given a local chart (foliated chart) (U, φ), ∀x ∈ Bn−k and y ∈ Bk, the set φ−1(·, y) is a plaque and
the set φ−1(x, ·) is a transverse section.
The existence of a foliated manifold (M,F) determines a partition of M into subsets, the leaves, defined
by means of an equivalence relation, each endowed of an intrinsic manifold structure. Let x ∈ M ; we denote
by Fx or Lx the leaf of F through x. With the intrinsic manifold structure, Fx turns to be an immersed (but
not embedded, in general) submanifold of M .
In an equivalent way, a foliated manifold (M,F) is a manifold M with a collection of couples {(Ui, gi)}i∈I ,
http://arxiv.org/abs/0704.0164v1
where {Ui}i∈I is an open covering of M , gi : Ui → B
k is a submersion, ∀i ∈ I , and the gi’s satisfy the cocycle
relations, gi = gij ◦ gj , gii = id, for suitable diffeomorphisms gij : B
k → Bk, defined when Ui ∩ Uj 6= ∅.
Each Ui is said a foliation box, and gi a distinguished map. The functions γij = dgij are the transition maps
[Stee] of a bundle NF ⊂ TM , normal to the foliation. More completely, there exists a G-structure on M
[Law], which is a reduction of the structure group GL(n, R) of the tangent bundle to the subgroup of the
matrices
, where A ∈ GL(n− k, R) and C ∈ GL(k, R).
A codimension one, C∞ foliation of a smooth manifold M , with isolated singularities, is a pair F =
(F∗, Sing(F)), where Sing(F) ⊂ M is a discrete subset and F∗ is a codimension one, C∞ foliation (in the
ordinary sense) of M∗ = M \Sing(F). The leaves of F are the leaves of F∗ and Sing(F) is the singular set
of F . A point p is a Morse singularity if there is a C∞ function, fp : Up ⊂ M → R, defined in a neighborhood
Up of p, with a (single) non-degenerate critical point at p and such that fp is a local first integral of the foliation,
i.e. the leaves of the restriction F|Up are the connected components of the level hypersurfaces of fp in Up\{p}.
A Morse singularity p, of index l, is a saddle, if 0 < l < n (where n = dimM ), and a center, if l = 0, n. We
say that the foliation F has a saddle-connection when there exists a leaf accumulated by at least two distinct
saddle-points. A Morse foliation is a foliation with isolated singularities, whose singular set consists of Morse
singularities, and which has no saddle-connections. In this way if a Morse foliation has a (global) first integral,
it is given by a Morse function.
Of course, the first basic example of a Morse foliation is indeed a foliation defined by a Morse function on M .
A less evident example is given by the foliation depicted in figure 2.
In the literature, the orientability of a codimension k (regular) foliation is determined by the orientability of
the (n− k)-plane field tangent to the foliation, x → TxFx. Similarly transverse orientability is determined by
the orientability of a complementary k-plane field. A singular, codimension one foliation, F , is transversely
orientable [Cam-Sc] if it is given by the natural (n − 1)-plane field associated to a one-form, ω ∈ Λ1(M),
which is integrable in the sense of Frobenius. In this case, choosing a Riemannian metric on M , we may find
a global vector field transverse to the foliation, X = grad(ω), ωX ≥ 0, and ωxXx = 0 if and only if x is a
singularity for the foliation (ω(x) = 0). A transversely orientable, singular foliation F of M is a transversely
orientable (regular) foliation F∗ of M∗ in the sense of the classical definition. Viceversa, if F∗ is transversely
orientable, in general, F is not.
Thanks to the Morse Lemma [Mil 1], Morse foliations reduce to few representative cases. On the other
hand, Morse foliations describe a large class among transverseley orientable foliations. To see this, let F be a
foliation defined by an integrable one-form, ω ∈ Λ1(M), with isolated singularies. We proceed with a local
analysis; using a local chart around each singularity, we may suppose ω ∈ Λ1( Rn), ω(0) = 0, and 0 is the
only singularity of ω. We have ω(x) =
hi(x)dx
i and, in a neighborhood of 0 ∈ Rn, we may write ω(x) =
ω1(x) +O(|x|
2), where ω1 is the linear part of ω, defined by ω1(x) =
idxj , aij = ∂h
i(x)/∂xj . We
recall that the integrability of ω implies the integrability of ω1 and that the singularity 0 ∈ R
n is said to be non
degenerate if and only if (aij) ∈ R(n) is non degenerate; in this latter case (aij) is symmetric: it is the hessian
matrix of some real function f , defining the linearized foliation (ω1 = df ). We have
{transverseley orientable foliations, with Morse singularities} =
{foliations, defined by non degenerate linear one-forms} ⊂
{foliations, defined by non degenerate one-forms}.
Let (σ, τ) be the space σ of integrable one-forms in Rn, with a singularity at the origin, endowed with the
C1-Whitney topology, τ . If ω, ω′ ∈ σ, we say ω equivalent ω′ (ω ∼ ω′) if there exists a diffeomorphism
φ : Rn → Rn, φ(0) = 0, which sends leaves of ω into leaves of ω′. Moreover, we say ω is structurally stable,
if there exists a neighborhood V of ω in (σ, τ) such that ω′ ∼ ω, ∀ω′ ∈ V .
Theorem 1.2 (Wagneur)[Wag] The one-form ω ∈ σ is structurally stable, if and only if the index of 0 ∈
Sing(ω) is neither 2 nor n− 2.
Let us denote by S the space of foliations defined by non degenerate one-forms with singularities, whose
index is neither 2 nor n− 2. If S1 ⊂ S is the subset of foliations defined by linear one-forms, then we have:
Corollary 1.3 There exists a surjective map,
s : S1 → S/∼.
PSfrag replacementsF1
L0 L1
Figure 1: F1,F2 foliations on RP
Hol(L,F1) = {e}, Hol(L0,F1) = {e, g0},
g20 = e, Hol(L1,F2) = {e, g1}, g1
orientation reversing diffeomorphism,
Hol(L2,F2) = {e, g2}, g2 generator of
unilateral holonomy.
PSfrag replacements
Figure 2: A singular foliation of the sphere S2,
which does not admit a first integral. With the
same spirit, a singular foliation on S3 may be
given.
2 Holonomy and Reeb Stability Theorems
It is well known that a basic tool in the study of foliations is the holonomy of a leaf (in the sense of Ehresmann).
If L is a leaf of a codimension k foliation (M,F), the holonomy Hol(L,F) = Φ(π1(L)), is the image of a
representation, Φ : π1(L) → Germ( R
k, 0), of the fundamental group of L into the germs of diffeomorphisms
of Rk, fixing the origin. Let x ∈ L and Σx be a section transverse to L at x; with abuse of notation, we will
write that a diffeomorphism g : Dom(g) ⊂ Σx → Σx, fixing the origin, is an element of the holonomy group.
For codimension one foliations (k = 1), we may have: (i) Hol(L,F) = {e}, (ii) Hol(L,F) = {e, g}, with
g2 = e, g 6= e, (iii) Hol(L,F) = {e, g}, where gn 6= e, ∀n, and g is a (orientation preserving or reversing)
diffeomorphism. In particular, among orientation preserving diffeomorphisms, we might find a g : Σx → Σx,
such that g is the identity on one component of Σx \ {x} and it is not the identity on the other; in this case, we
say that L has unilateral holonomy (see figure 1 for some examples). We recall Reeb Stability Theorems (cfr.,
for example, [Cam-LN] or [Mor-Sc]).
Theorem 2.1 (Reeb Local Stability) Let F be a C1, codimension k foliation of a manifold M andF a compact
leaf with finite holonomy group. There exists a neighborhood U of F , saturated in F (also called invariant), in
which all the leaves are compact with finite holonomy groups. Further, we can define a retraction π : U → F
such that, for every leaf F ′ ⊂ U , π|F ′ : F
′ → F is a covering with a finite number of sheets and, for each
y ∈ F , π−1(y) is homeomorphic to a disk of dimension k and is transverse to F . The neighborhood U can be
taken to be arbitrarily small.
The last statement means in particular that, in a neighborhood of the point corresponding to a compact leaf
with finite holonomy, the space of leaves is Hausdorff.
Under certain conditions the Reeb Local Stability Theorem may replace the Poincaré Bendixon Theorem
[Pal-deM] in higher dimensions. This is the case of codimension one, singular foliations (Mn,F), with n ≥ 3,
and some center-type singularity in Sing(F).
Theorem 2.2 (Reeb Global Stability) Let F be a C1, codimension one foliation of a closed manifold, M . If
F contains a compact leaf F with finite fundamental group, then all the leaves of F are compact, with finite
fundamental group. If F is transversely orientable, then every leaf of F is diffeomorphic to F ; M is the total
space of a fibration f : M → S1 over S1, with fibre F , and F is the fibre foliation, {f−1(θ)|θ ∈ S1}.
This theorem holds true even when F is a foliation of a manifold with boundary, which is, a priori, tangent
on certain components of the boundary and transverse on other components [God]. In this setting, let H l =
{(x1, . . . , xl) ∈ Rl|xl ≥ 0}. Taking into account definition 1.1, we say that a foliation of a manifold with
boundary is tangent, respectively transverse to the boundary, if there exists a differentiable atlas {(Ui, φi)}i∈I ,
such that property (1) of the above mentioned definition holds for domains Ui such that Ui ∩ ∂M = ∅, while
φi(Ui) = B
n−k × H k, respectively, φi(Ui) = H
n−k × Bk for domains such that Ui ∩ ∂M 6= ∅. Moreover,
we ask that the change of coordinates has still the form described in property (2). Recall that F|∂M is a regular
codimension k − 1 (respectively, k) foliation of the (n− 1)-dimensional boundary. After this, it is immediate
to write the definition for foliations which are tangent on certain components of the boundary and transverse
PSfrag replacements b
Figure 3: n = 2: a singular foliation with center-
type singularities, having no first integral.
PSfrag replacements
Ws(q))
Wu(q)
Figure 4: A trivial couple center-saddle (p, q)
(Theorem 3.5, case (i)).
on others.
Observe that, for foliations tangent to the boundary, we have to replace S1 with [0, 1] in the second statement
of the Reeb Theorem 2.2 (see Lemma 5.6).
We say that a component of Sing(F) is weakly stable if it admits a neighborhood, U , such that F|U is a
foliation with all leaves compact. The problem of global stability for a foliation with weakly stable singular
components may be reduced to the case of foliations of manifolds with boundary, tangent to the boundary. It is
enough to cut off an invariant neighborhood of each singular component.
Holonomy is related to transverse orientability by the following:
Proposition 2.3 Let L be a leaf of a codimension one (Morse) foliation (M,F). If Hol(L,F) = {e, g}, where
g2 = e, g 6= e, then F is non-transversely orientable. Moreover, if π : M → M/F is the projection onto the
space of leaves, then ∂(M/F) 6= ∅ and π(L) ∈ ∂(M/F).
Proof. We choose x ∈ L and a segment Σx, transverse to the foliation at x. Then g : Σx → Σx turns out to
be g(y) = −y. Let y → Ny a 1-plane field complementary to the tangent plane field y → TyFy . Suppose
we may choose a vector field y → X(y) such that Ny = span{X(y)}. Then it shoud be X(x) = −X(x) =
(dg)x(X(x)), a contraddiction. Consider the space of leaves near L; this space is the quotient of Σx with
respect to the equivalence relation ∼ which identifies points on Σx of the same leaf. Then Σx/∼ is a segment
of type (z, x] or [x, z), where π−1(x) = L.
At last we recall a classical result due to Reeb.
Theorem 2.4 (Reeb Sphere Theorem) [Reeb] A transversely orientable Morse foliation on a closed manifold,
M , of dimension n ≥ 3, having only centers as singularities, is homeomorphic to the n-sphere.
This result is proved by showing that the foliation considered must be given by a Morse function with only two
singular points, and therefore thesis follows by Morse theory. Notice that the theorem still holds true for n = 2,
with a different proof. In particular, the foliation need not to be given by a function (see figure 3).
3 Arrangements of singularities
In section 4 we will study the elimination of singularities for Morse foliations. To this aim we will describe here
how to identify special “couples” of singularities and we will study the topology of the neighbouring leaves.
Definition 3.1 Let n = dimM,n ≥ 2. We define the set C(F) ⊂ M as the union of center-type singularities
and leaves diffeomorphic to Sn−1 (with trivial holonomy if n = 2) and for a center singularity, p, we denote
by Cp(F) the connected component of C(F) that contains p.
Proposition 3.2 Let F be a Morse foliation on a manifold M . We have:
(1) C(F) and Cp(F) are open in M .
(2) Cp(F) ∩ Cq(F) 6= ∅ if and only if Cp(F) = Cq(F). Cp(F) = M if and only if ∂ Cp(F) = ∅. In this
case the singularities of F are centers and the leaves are all diffeomorphic to Sn−1.
(3) If q ∈ Sing(F) ∩ ∂ Cp(F), then q must be a saddle; in this case ∂ Cp(F) ∩ Sing(F) = {q}. Moreover,
for n ≥ 3 and F transversely orientable, ∂ Cp(F) 6= ∅ if and only if ∂ Cp(F) ∩ Sing(F) 6= ∅. In these
hypotheses, ∂ Cp(F) contains at least one separatrix of the saddle q.
(4) ∂ Cp(F) \ {q} is closed in M \ {q}.
PSfrag replacements
Figure 5: A saddle q of index 1 (n− 1), accumu-
lating one center p (Theorem 3.5, case (ii)).
PSfrag replacements
Figure 6: A saddle q of index 1 (n− 1), accumu-
lating one center p (Theorem 3.5, case (iii)).
Figure 7: Two saddles in trivial coupling for the
foliation defined by the function fǫ = −
ǫy + z
, (ǫ > 0).
PSfrag replacements
L2p q
no intersection
legenda
Figure 8: A dead branch of a trivial couple of sad-
dles for a foliated manifold (Mn,F), n ≥ 3.
Proof. (1) C(F) is open by the Reeb Local Stability Theorem 2.1. (3) If non-empty, ∂ Cp(F) ∩ Sing(F)
consists of a single saddle q, as there are no saddle connections. The second part follows by the Reeb Global
Stability Theorem for manifolds with boundary and the third by the Morse Lemma. (4) By the Transverse
Uniformity Theorem (see, for example, [Cam-LN]), it follows that the intrinsic topology of ∂ Cp(F) \ {q}
coincides with its natural topology, as induced by M \ {q}.
We recall the following (cfr., for example [Mor-Sc]):
Lemma 3.3 (Holonomy Lemma) Let F be a codimension one, transversely orientable foliation on M , let A
be a leaf of F and K be a compact and path-connected set. If g : K → A is a C1 map homotopic to a constant
in A, then g has a normal extension i.e. there exist ǫ > 0 and a C1 map G : K × [0, ǫ] → M such that
Gt(x) = G
x(t) = G(x, t) has the following properties: (i) G0(K) = g, (ii) Gt(K) ⊂ A(t) for some leaf A(t)
of F with A(0) = A, (iii) ∀x ∈ K the curve Gx([0, ǫ]) is normal to F .
For the case of center-saddle pairings we prove the following descriptions of the separatrix:
Theorem 3.4 Let F be a C∞, codimension one, transversely orientable, Morse foliation of a compact n-
manifold, M , n ≥ 3. Let q be a saddle of index l /∈ {1, n− 1}, accumulating to one center p. Let L ⊂ Cp(F)
be a spherical leaf intersecting a neighborhood U of q, defined by the Morse Lemma. Then ∂ Cp(F) \ {q}
has a single connected component (see figure 13) and is homeomorphic to Sn−1/Sl−1. If F is a leaf such that
U \ Cp(F)
6= ∅, then F is homeomorphic to Bl×Sn−l−1∪φ B
l×Sn−l−1, where φ is a diffeomorphism
of the boundary (for example, we may have F ≃ Sl × Sn−l−1, but also F ≃ Sn−1, for l = n/2).
Proof. Let ω ∈ Λ1(M) be a one-form defining the transversely orientable foliation. We choose a riemannian
metric on M and we consider the transverse vector field Xx = grad(ω)x. We suppose ||X || = 1. In U , we
have X = h · grad(f) for some real function h > 0 defined on U . Further, we may suppose that ∂U follows
the orbits of X in a neighborhood of ∂ Cp(F).
The Morse Lemma gives a local description of the foliation near its singularities; in particular the local topology
of a leaf near a saddle of index l is given by the connected components of the level sets of the function f(x) =
−x21 − · · · − x
l + x
l+1 + · · · + x
n. If, for c ≥ 0, we write f
−1(c) = {(x1, . . . , xn) ∈ R
n|x21 + · · · +
x2l + c = x
l+1 + · · · + x
n}, it is easy to see that f
−1(0) is homeomorphic to a cone over Sl−1 × Sn−l−1
and f−1(c) ≃ Bl × Sn−l−1 (c > 0). Similarly, we obtain f−1(c) ≃ Bn−l × Sl−1 for c < 0. Therefore,
by our hypothesis on l, the level sets are connected; in particular the separatrix S ⊃ f−1(0) is unique and
∂ Cp(F) = S ∪ {q}; moreover U is splitted by f
−1(0) in two different components. A priori, a leaf may
intersect more than one component. As F is transversely orientable, the holonomy is an orientation preserving
diffeomorphism, and then a leaf may intersect only non adiacent components; then this is not the case, in our
hypotheses.
Let L be a spherical leaf ⊂ Cp(F) enough near q. Then L ∩ U 6= ∅ and it is not restrictive to suppose it
is given by f−1(c) for some c < 0. We define the compact set K = Sn−1 \ Bn−l × Sl−1 ≃ L \ U . As
n ≥ 3, the composition K
// L \ U
// L is homotopic to a constant in its leaf. By
the proof of the Holonomy Lemma 3.3, L \ U projects diffeomorphically onto A(ǫ) = ∂ Cp(F), by means
of the constant-speed vector field, X . Together with the Morse Lemma, this gives a piecewise description of
∂ Cp(F), which is obtained by piecing pieces toghether. It comes out ∂ Cp(F) ≃ S
n−1/Sl−1, a set with the
homotopy type of Sn−1 ∨ Sl (where ∨ is the wedge sum), simply connected in our hypotheses. Consequently,
the map K×{ǫ} → ∂ Cp(F), obtained with the extension, admits, on turn, a normal extension. This completes
the piecewise description of F .
In case of presence of a saddle of index 1 or n− 1, we have:
Theorem 3.5 Let F be a C∞, codimension one, transversely orientable, Morse foliation of a compact n-
manifold, M , n ≥ 3. Let q be a saddle of index 1 or n − 1 accumulating to one center p. Let L ⊂ Cp(F) be
a spherical leaf intersecting a neighborhood U of q, defined by the Morse Lemma. We may have: (i) ∂ Cp(F)
contains a single separatrix of the saddle (see figure 4) and is homeomorphic to Sn−1; (ii) ∂ Cp(F) contains
both separatrices S1 and S2 of the saddle (see figure 5) and is homeomorphic to S
n−1/Sn−2 ≃ Sn−1 ∨Sn−1.
If this is the case, there exist two leaves Fi (i = 1, 2), such that Fi and L intersect different components of
U \ Si and we have that Fi is homeomorphic to S
n−1 (i = 1, 2); (iii) q is a self-connected saddle (see figure
6) and ∂ Cp(F) is homeomorphic to S
n−1/S0. In this case we will refer to the couple
Cp(F),F| Cp(F )
a singular Reeb component. Moreover, U \ ∂ Cp(F) has three connected components and L intersects two of
them. If F is a leaf intersecting the third component of U \ ∂ Cp(F), then F is homeomorphic to S
1 × Sn−2,
or to R× Sn−2.
Proof. The proof is quite similar to the proof of the previous theorem. Nevertheless we give a brief sketch
here. The three cases arise from the fact that q has two local separatrices, S1 and S2, but not necessarily
∂ Cp(F) contains both of them. When this is the case, we may have that S1 and S2 belong to distinct leaves,
or to the same leaf (in this case all spherical leaves contained in Cp(F) intersect two different components of
U \ (S1 ∪ S2) ). Using the Morse lemma, we construct the set K for the application of the Holonomy Lemma
3.3. We have, respectively: K = Bn−1, K = K1 ⊔K2 = S
0 ×Bn−1 (we apply twice the Holonomy Lemma),
K = B1 × Sn−2. In the first two cases, as K is simply connected, the map K → L, to be extended, is clearly
homotopic to a constant in its leaf. Then L \U projects onto ∂Cp(F) and on neighbour leaves. This completes
the piecewise description in case (i) and (ii).
In the third case, piecing pieces together after a first application of the Holonomy Lemma, we obtain ∂ Cp(F) ≃
Sn−1/S0 and ∂ Cp(F) \ {q} ≃ B
1 × Sn−2, simply connected for n 6= 3. With a second application of the
Holonomy Lemma (n 6= 3), K projects diffeomorphically onto any neighbour leaf, F . The same also happens
for n = 3, because a curve γ : S1 → ∂ Cp(F), as the one depicted in figure 6, is never a generator of the
holonomy, which is locally trivial (a consequence of the Morse lemma). Nevertheless, there are essentially two
ways to piece pieces together. We may have F ≃ S1 × Sn−2 or F ≃ R× Sn−2.
The last result gives the motivation for a new concept.
Definition 3.6 In a codimension one singular foliation F it may happen that, for some leaf L and q ∈ Sing(F),
the set L ∪ {q} is arcwise connected. Let C = {q ∈ Sing(F)|L ∪ {q} is arcwise connected}. If for some
leaf L the set C 6= ∅, we define the corresponding singular leaf [Wag] S(L) = L ∪ C. In particular, if F is a
transversely orientable Morse foliation, each singular leaf is given by S(L) = L∪{q}, for a single saddle-type
singularity q, either selfconnected or not.
In the case of a transversely orientable Morse foliation F on M (n = dimM ≥ 3), given a saddle q and
a separatrix L of q, we may define a sort of holonomy map of the singular leaf S(L). This is done in the
following way.
As the foliation is Morse, in a neighborhoodU ⊂ M of q there exists a (Morse) local first integral f : U → R,
with f(q) = 0. Keeping into account the structure of the level sets of the Morse function f (see Theorem 3.4
and Theorem 3.5) we observe that there are at most three connected components in U \ S(L) = U \ {f−1(0)}
(notice that the number of components depends on the Morse index of q).
Let γ : [0, 1] → S(L) be a C1 path through the singularity q. At first, we consider the case γ([0, 1]) ⊂ U ,
q = γ(t) for some 0 < t < 1. For a point x ∈ M \ Sing(F), let Σx be a transverse section at x. The
set Σx \ {x} is the union of two connected components, Σ
x and Σ
x that we will denote by semi-transverse
sections at x. For x = γ(0) ∈ S(L) we have f(x) = 0 and we can choose semi-transverse sections at x in a
way that f(Σ+x ) > 0 and f(Σ
x ) < 0. We repeat the construction for y = γ(1), obtaining four semi-transverse
sections, which are contained in (at most) three connected components of U \S(L). As a consequence, at least
two of them are in the same component. By our choices, this happens for Σ−x and Σ
y (but we cannot exclude it
happens also for Σ+x and Σ
y ). We define the semi-holonomy map h
− : Σ−
∪γ(0) → Σ−
∪γ(1) by setting
h−(γ(0)) = γ(1) and h−(z) = h(z) for z ∈ Σ−
, where h : Σ−
is a classic holonomy map (i.e.
such that for a leaf F , it is h(F ∩ Σ−
) = F ∩ Σ−
). In the same way, if it is the case, we define h+.
Consider now any curve γ : [0, 1] → S(L). As F is transversely orientable, the choice of a semi-transverse
section for the curve γ([0, 1]) ∩ U , may be extended continuously on the rest of the curve, γ([0, 1]) \ U ; with
this remark, we use classic holonomy outside U . To complete the definition, it is enough to say what a semi-
transverse section at the saddle q is. In this way we allow q ∈ γ(∂[0, 1]). To this aim, we use the orbits of
the transverse vector field, grad(f). By the property of gradient vector fields, there exist points t, v such that
α(t) = ω(v) = q. Let Σ+q (Σ
q ) be the negative (positive) semi-orbit through t (v). Each of Σ
q and Σ
transverse to the foliation and such that Σ+q ∩ Σ
q = {q}, is a semi-transverse section at the saddle q.
In this way, the semi-holonomy of a singular leaf Hol+(S(L),F) is a representation of the fundamental
group π1(S(L)) into the germs of diffeomorphisms of R≥0 fixing the origin, Germ( R≥0, 0).
Now we consider the (most interesting) case of a selfconnected separatrix S(L) = ∂ Cp(F), with ∂ Cp(F)
satisfying the description of Theorem 3.5, case (iii). The singular leaf ∂ Cp(F), homeomorphic to S
n−1/S0,
has the homotopy type of Sn−1 ∨ S1. We have Hol+(∂ Cp(F),F) = {e, h
γ }, where γ is the non trivial
generator of the homotopy, and h−γ is a map with domain contained in the complement ∁ Cp(F). The two
options h−γ = e, h
γ 6= e give an explanation of the two possible results about the topology of the leaves near
the selfconnected separatrix.
4 Realization and elimination of pairings of singularities
Let us describe one of the key points in our work, i.e. the elimination procedure, which allows us to delete
pairs of singularities in certain configurations, and, this way, to lead us back to simple situations as in the Reeb
Sphere Theorem (2.4). We need the following notion [Cam-Sc]:
Definition 4.1 Let F be a codimension one foliation with isolated singularities on a manifold Mn. By a
dead branch of F we mean a region R ⊂ M diffeomorphic to the product Bn−1 × B1, whose boundary,
∂R ≈ Bn−1×S0∪Sn−2×B1, is the union of two invariant components (pieces of leaves of F , not necessarily
distinct leaves in F ) and, respectively, of transverse sections, Σ ≈ {t} × B1, t ∈ Sn−2.
Let Σi, i = 1, 2 be two transverse sections. Observe that the holonomy from Σ1 → Σ2 is always trivial, in the
sense of the Transverse Uniformity Theorem [Cam-LN], even if Σi ∩ S(L) 6= ∅ for some singular leaf S(L).
In this case we refer to the holonomy of the singular leaf, in the sense above.
A first result includes known situations.
Proposition 4.2 Given a foliated manifold (Mn,F), with F Morse and transversely orientable, with Sing(F) ∋
p, q, where p is a center and q ∈ ∂ Cp(F) is a saddle of index 1 or n− 1, there exists a new foliated manifold
(M, F̃), such that: (i) F̃ and F agree outside a suitable region R of M , which contains the singularities p, q;
(ii) F̃ is nonsingular in a neighborhood of R.
Proof. We are in the situations described by Theorem 3.5. If we are in case (i), the couple (p, q) may be
eliminated with the technique of the dead branch, as illustrated in [Cam-Sc]. If we are in case (ii), we observe
that the two leaves Fi, i = 1, 2 bound a region, A, homeomorphic to an anulus, S
n−1 × [0, 1]. We may now
replace the singular foliation F|A with the trivial foliation F̃|A, given by S
n−1 × {t}, t ∈ [0, 1]. If we are in
case (iii), we may replace the singular Reeb component with a regular one, in the spirit of [Cam-Sc]. Even in
this case, we may think the replacing takes place with the aid of a new sort of dead branch, the dead branch
PSfrag replacements
Figure 9: On the left: the height function on the
plane V defines a foliation of the torus; on the
right: a possible description of the foliation.
PSfrag replacements
∂ Cp(F)
Figure 10: On the left, a dead branch for the self-
connected saddle q of figure 9; on the right, the
foliation obtained after the elimination of the two
couples of singularities.
of the selfconnected saddle, that we describe with the picture of figure 10, for the case of the foliation of the
torus of figure 9, defined by the height Morse function [Mil 1]. Observe that the couples (p, q) and (r, s) of this
foliation may be also seen as an example of the coupling described in Theorem 3.5, case (ii). In this case the
elimination technique and the results are completely different (see figure 11).
Definition 4.3 If the couple (p, q) satisfies the description of Theorem 3.5, case (i) (and therefore may be elim-
inated with the technique of the dead branch), we will say that (p, q) is a trivial couple.
A new result is the construction of saddle-saddle situations:
Proposition 4.4 Given a foliation F on an n-manifold Mn, there exists a new foliation F̃ on M , with
Sing(F̃) = Sing(F) ∪ {p, q}, where p and q are a couple of saddles of consecutive indices, connecting
transversely (i.e. such that the stable manifold of p, Ws(p), intersects transversely the unstable manifold of q,
Wu(q)).
Proof. We choose the domain of (any) foliated chart, (U, φ). Observe that R′ = U (≃ φ(U)) is a dead branch
for a foliation F ǫ′ , given (up to diffeomorphisms) by the submersion fǫ = −x
1/2− · · · − x
k−1/2 + (x
ǫxk) + xk+1/2+ · · ·+ x
n/2, for some ǫ = ǫ
′ < 0. We consider F ǫ′′ , given by taking ǫ = ǫ
′′ > 0 in fǫ, which
presents a couple of saddles of consecutive indices, and we choose a dead branch R′′ around them. We also
choose a homeomorphism between R′ and R′′ which sends invariant sets of F ǫ′ into invariant sets of F ǫ′′ in a
neighborhood of the boundary. With a surgery, we may replace F ǫ′ with F ǫ′′ .
The converse of the above poposition is preceded by the following
Remark 4.5 Given a foliation F on Mn with two complementary saddle singularities p, q ∈ Sing(F), having
a strong stable connection γ, there exist a neighborhood U of p, q and γ in Mn, a δ ∈ R+ and a coordinate
system φ : U → Rn taking p onto (0, . . . , φk = −δ, . . . , 0), q onto (0, . . . , φk = δ, . . . , 0), γ onto the
xk-axis, {xl = 0}l 6=k, and such that: (i) the stable manifold of p is tangent to φ
−1({xl = 0}l>k) at p, (ii)
the unstable manifold of q is tangent to φ−1({xl = 0}l<k) at q (we are led to the situation considered in
[Mil 2], A first cancelation theorem). So using the chart φ : U → Rn we may assume that we are on a
dead branch of Rn and the foliation F|U is defined by fǫ, for ǫ = δ
2. In this way the vector field grad(fǫ)
defines a transverse orientation in U . For a suitable µ > 0, the points r1 = (0, . . . , φ
k = −δ − µ, . . . , 0)
and r2 = (0, . . . , φ
k = δ + µ, . . . , 0) are such that the modification takes place in a region of U delimited by
Lri , i = 1, 2.
Proposition 4.6 Given a foliation F on Mn with a couple of saddles p, q of complementary indices, having
a strong stable connection, there exists a dead branch of the couple of saddles, R ⊂ M and we can obtain
a foliation F̃ on M such that: (i) F̃ and F agree on M \ R; (ii) F̃ is nonsingular in a neighborhood of R;
indeed F̃ |R is conjugated to a trivial fibration; (iii) the holonomy of F̃ is conjugate to the holonomy of F in
the following sense: given any leaf L of F such that L ∩ (M \R) 6= ∅, then the corresponding leaf L̃ of F̃ is
such that Hol(L̃, F̃) is conjugate to Hol(L,F).
Example 4.7 (Trivial Coupling of Saddles) Let M = Sn, n ≥ 3. For l = 1, . . . , n − 2 we may find a
Morse foliation of M = Sn, invariant for the splitting Sn = Bn−l × Sl ∪φ S
n−l−1 × Bl+1, where φ is a
diffeomorphism of the boundary. In fact, by theorem 3.4 or 3.5, case (iii), Bn−l × Sl admits a foliation with
one center and one saddle of index l. Similarly, Sn−l−1 × Bl+1 admits a foliation with a saddle of index
n− l− 1, actually a saddle of index l+ 1, after the attachment. We may eliminate the trivial couple of saddles
and we are led to the well-known foliation of Sn, with a couple of centers and spherical leaves.
Remark 4.8 The elimination of saddles of consecutive indices is actually a generalization of the elimination of
couples center-saddle, (p, q) with q ∈ ∂ Cp(F). Indeed, we may eliminate (p, q) only when the saddle q has
index 1 or n−1. This means the singularities of the couple must have consecutive indices and, as q ∈ ∂ Cp(F),
there exists an orbit of the transverse vector field having p as α-limit (backward) and q as ω-limit (forward), or
viceversa. Such an orbit is a strong stable connection.
5 Reeb-type theorems
We shall now describe how to apply our techniques to obtain some generalizations of the Reeb Sphere Theorem
(2.4) for the case of Morse foliations admitting both centers and saddles.
A first generalization is based on the following notion:
Definition 5.1 We say that an isolated singularity, p, of a C∞, codimension one foliation F on M is a stable
singularity, if there exists a neighborhood U of p in M and a C∞ function, f : U → R, defining the foliation
in U , such that f(p) = 0 and f−1(a) is compact, for |a| small. The following characterization of stable
singularities can be found in [Cam-Sc].
Lemma 5.2 An isolated singularity p of a function f : U ⊂ Rn → R defines a stable singularity for df ,
if and only if there exists a neighborhood V ⊂ U of p, such that, ∀x ∈ V , we have either ω(x) = {p} or
α(x) = {p}, where ω(x) (respectively α(x)) is the ω-limit (respectively α-limit) of the orbit of the vector field
grad(f) through the point x.
In particular it follows the well-known:
Lemma 5.3 If a function f : U ⊂ Rn → R has an isolated local maximum or minimum at p ∈ U then p is a
stable singularity for df .
The converse is also true:
Lemma 5.4 If p is a stable singularity, defined by the function f , then p is a point of local maximum or minimum
for f .
Proof. It follows immediately by Lemma 5.2 and by the fact that f is monotonous, strictly increasing, along
the orbits of grad(f).
With this notion, we obtain
Lemma 5.5 Let F be a codimension one, singular foliation on a manifold Mn. In a neighborhood of a stable
singularity, the leaves of F are diffeomorphic to spheres.
Proof. Let p ∈ Sing(F) be a stable singularity. By Lemma 5.4, we may suppose p is a minimum (otherwise
we use −f ). Using a local chart around p, we may suppose we are on Rn and we may write the Taylor-
Lagrange expansion around p for an approximation of the function f : U → R at the second order. We
have f(p + h) = f(p) + 1/2〈h,H(p + θh)h〉, where H is the Hessian of f and 0 < θ < 1. It follows
〈h,H(p+ θh)h〉 ≥ 0 in U . Then f is convex and hence the sublevels, f−1(c), are also convex.
We consider the flow φ : D(φ) ⊂ R × U → U of the vector field grad(f). By the properties of gradient
vector fields, in our hypothesis, D(φ) ⊃ (−∞, 0] × U and ∀x ∈ U there exists the α-limit, α(x) = p. For
any x ∈ f−1(c), the tangent space, Txf
−1(c), to the sublevels of f does not contain the radial direction, −→px.
This is obvious otherwise, for the convexity of f−1(c), the singularity p should lie on the sublevel f−1(c), a
contraddiction because, in this case, p should be a saddle. Equivalently, the orbits of the vector field grad(f)
are transverse to spheres centered at p. An application of the implicit function theorem shows the existence
of a smooth function x → tx, that assigns to each point x ∈ f
−1(c) the (negative) time at which φ(t, x)
intersects Sn−1(p, ǫ), where ǫ is small enough to have Bn(p, ǫ) ( R(f−1(c)), the compact region bounded by
f−1(c) . The diffeomorphism between the leaf f−1(c) and the sphere Sn−1(p, ǫ) is given by the composition
x → φ(tx, x). The lemma is proved.
Lemma 5.6 Let F be a codimension one, transversely orientable foliation of M , with all leaves closed,
π : M → M/F the projection onto the space of leaves. Then we may choose a foliated atlas on M and
a differentiable structure on M/F , such that M/F is a codimension one compact manifold, locally diffeomor-
phic to the space of plaques, and π is a C∞ map.
Proof. At first we notice that the space of leaves M/F (with the quotient topology) is a one-dimensional Hau-
sorff topological space, as a consequence of the Reeb Local Stability Theorem 2.1. As all leaves are closed and
with no holonomy, we may choose a foliated atlas {(Ui, φi)} such that, for each leaf L ∈ F , L∩Ui consists, at
most, of a single plaque. Let π : M → M/F be the projection onto the space of leaves and πi : Ui → R the
projection onto the space of plaques. With abuse of notation, we may write πi = p2 ◦ φi, where p2 is the pro-
jection on the second component. As there is a 1-1 correspondence between the quotient spaces π|Ui(Ui) and
πi(Ui), then, are homeomorphic. Let V ⊂ M/F be open. The set π
−1(V ) is an invariant open set. We may
find a local chart (Ui, φi) such that π(Ui) = V . We say that (V, πi ◦ (π|Ui )
−1) is a chart for the differentiable
atlas with the required property. To see this, it is enough to prove that, if (V, πj ◦(π|Uj )
−1) is another chart with
the same domain, V , there exists a diffeomorphism between the two images of V , i.e. between πi◦(π|Ui)
−1(V )
and πj ◦ (π|Uj )
−1(V ). This is not obvious when Ui ∩ Uj = ∅. Indeed, the searched diffeomorphism exists,
and it is given by the Transverse Uniformity Theorem [Cam-LN]. Observe that, in coordinates, π coincides
with the projection on the second factor.
Lemma 5.7 Let n ≥ 2. A weakly stable singularity for a foliation (Mn,F) is a stable singularity.
Proof. Let p be a weakly stable singularity, U a neighborhood of p with all leaves compact. We need a
local first integral near p. As a consequence of the Reeb Local Stability Theorem 2.1, we can find an (invari-
ant) open neighborhood V ⊂ U of p, whose leaves have all trivial holonomy. The set V \ {p} is open in
M∗ = M \ Sing(F). Let F∗ = F \ Sing(F); the projection π∗ : M∗ → M∗/F∗ is an open map (see, for
example [Cam-LN]). As a consequence of Lemma 5.6, the connected (as n ≥ 2) and open set π∗(V \ {p})
is a 1-dimensional manifold with boundary, i.e. it turns out to be an interval, for example (0, 1). Now, we
extend smoothly π∗ to a map π on U . In particular, let W ( V be a neighborhood of p. If (for example)
π∗(W \ {p}) = (0, b) for some b < 1, we set π(p) = 0. Thesis follows by lemma 5.3.
Theorem 5.8 Let Mn be a closed n-dimensional manifold, n ≥ 3. Suppose that M supports a C∞, codimen-
sion one, transversely orientable foliation, F , with non-empty singular set, whose elements are, all, weakly
stable singularities. Then M is homeomorphic to the sphere, Sn.
Proof. By hypothesis, ∀p ∈ Sing(F), p is a weakly stable singularity. Then it is a stable singularity. By lemma
5.5, in an invariant neighborhood Up of p, the leaves are diffeomorphic to spheres. Now we can proceed as in
the proof of the Reeb Sphere Theorem 2.4.
Theorem 5.9 (Classification of codimension one foliations with all leaves compact) Let F be a (possibly
singular, with isolated singularities) codimension one foliation of M , with all leaves compact. Then all pos-
sible singularities are stable. If F is (non) transversely orientable, the space of leaves is (homeomorphic to
[0, 1]) diffeomorphic to [0, 1] or S1. In particular, this latter case ocurs if and only if ∂M,Sing(F) = ∅. In
all the other cases, denoting by π : M → [0, 1] the projection onto the space of leaves, it is Hol(π−1(x),F) =
{e}, ∀x ∈ (0, 1). Moreover, if x = 0, 1, we may have: (i) π−1(x) ⊂ ∂M 6= ∅ and Hol(π−1(x),F) = {e};
(ii) π−1(x) is a (stable) singularity; (iii) Hol(π−1(x),F) = {e, g}, g 6= e, g2 = e (in this case, ∀y ∈ (0, 1),
the leaf π−1(y) is a two-sheeted covering of π−1(x).
Proof. If F is transversely orientable, by the Reeb Global Stability Theorem 2.2 and Lemma 5.6, the space of
leaves is either diffeomorphic to S1 or to [0, 1]. In particular, M/F ≈ S1 if and only if M is closed and F non
singular. When this is not the case, M/F ≈ [0, 1], and there are exactly two points (∂[0, 1]) which come from
a singular point and/or from a leaf of the boundary.
If F is non transversely orientable, there is at least one leaf with (finite) non trivial holonomy, which corre-
sponds a boundary point in M/F to (by Proposition 2.3). By the proof of Lemma 5.6, the projection is not
differentiable and the space of leaves M/F , a Hausdorff topological 1-dimensional space, turns out to be an
orbifold (see [Thu]). We pass to the transversely orientable double covering, p : (M̃, F̃) → (M,F). The fo-
liation F̃ , pull-back of F , has all leaves compact, and singular set empty or with stable components; therefore
we apply the first part of the classification to M̃/F̃ . Both if M̃/F̃ is diffeomorphic to S1 or to [0, 1], M/F is
homeomorphic to [0, 1], but (clearly) with different orbifold structures.
Before going on with our main generalization of the Reeb Sphere Theorem 2.4, which extends a similar
result of Camacho and Scárdua [Cam-Sc] concerning the case n = 3, we need to recall another result, that we
are going to generalize.
As we know, the Reeb Sphere Theorem, in its original statement, consideres the effects (on the topology of a
manifold M ) determined by the existence, on M , of a real valued function with exactly two non-degenerate
singular points. A very similar problem was studied by Eells and Kuiper [Ee-Kui]. They considered manifolds
admitting a real valued function with exactly three non-degenerate singular points.They obtained very interest-
ing results. Among other things, it sticks out the obstruction they found about the dimension of M , which must
be even and assume one of the values n = 2m = 2, 4, 8, 16. Moreover, the homotopy type of the manifold
turns out to vary among a finite number of cases, including (or reducing to, if n = 2, 4) the homotopy tupe of
the projective plane over the real, complex, quaternion or Cayley numbers.
Definition 5.10 In view of the results of Eells and Kuiper [Ee-Kui], if a manifold M admits a real-valued func-
Figure 11: Elimination technique applied in case
(ii) (Theorem 3.5) for the foliation of figure 9.
PSfrag replacements q
Figure 12: A foliation of RP2 with three singular
points.
tion with exactly three non-degenerate singular points, we will say that M is an Eells-Kuiper manifold.
We have (see [Cam-Sc] for the case n = 3):
Theorem 5.11 (Center-Saddle Theorem) Let Mn be an n-dimensional manifold, with n ≥ 2 such that
(M,F) is a foliated manifold, by means of a transversely orientable, codimension-one, Morse, C∞ folia-
tion F . Moreover F is assumed to be without holonomy if n = 2. Let Sing(F) be the singular set of F , with
#Sing(F) = k + l, where k, l are the numbers of, respectively, centers and saddles. If we have k ≥ l + 1,
then there are two possibilities:
(1) k = l + 2 and M is homeomorphic to an n-dimensional sphere;
(2) k = l + 1 and M is an Eells-Kuiper manifold.
Proof. If l = 0, assertion is proved by the Reeb Sphere Theorem 2.4. Let l ≥ 1; we prove our thesis by
induction on the number l of saddles. We set F l = F .
So let l = 1 and F1 = F . By hypothesis, in the set Sing(F) there exist at least two centers, p1, p2, with
p1 6= p2, and one saddle q. We have necessarily q ∈ ∂ Cp1(F) ∩ ∂ Cp2(F). In fact, if this is not the case and,
for example q /∈ ∂ Cp1(F), then (keeping into account that for n = 2, the foliation F is assumed to be without
holonomy) ∂ Cp1 = ∅ and M = Cp1(F). A contraddiction. Let i(q) the Morse index of the saddle q.
For n ≥ 3 we apply the results of Theorems 3.4 and 3.5 to the couples (p1, q) and (p2, q). In particular, by
Theorem 3.5, (iii), it follows that the saddle q cannot be selfconnected. We now have the following two possi-
bilities:
(a) i(q) = 1, n− 1 and (p1, q) or (and) (p2, q) is a trivial couple,
(b) i(q) 6= 1, n− 1 and there are no trivial couples.
For n = 2, we have necessarily i(q) = 1 and, in our hypotheses, q is always selfconnected. With few changes,
we adapt Theorem 3.5, to this case, obtaining ∂ Cp(F) ≃ S
1 or ∂ Cp(F) ≃ S
1 ∨ S1; in this latter case we will
say that the saddle q is selfconnected with respect to p. We obtain:
(a’) (p1, q) or (and) (p2, q) is a trivial couple;
(b’) q is selfconnected both with respect to p1 and to p2.
In cases (a) and (a’) we proceed with the elimination of a trivial couple, as stated in Proposition 4.2, and then
we obtain the foliated manifold (M,F0), with no saddle-type and some center-type singularities. We apply the
Reeb Sphere Theorem 2.4 and obtain #Sing(F) = 2 and M ≃ Sn.
In case (b) (n ≥ 3), as a consequence of Theorem 3.4, we necessarily have i(q) = n/2 (and therefore n must
be even!). Moreover Cp1(F) ≈ Cp2(F) and M = Cp1(F) ∪φ Cp2(F) may be thought as two copies of the
same (singular) manifold glued together along the boundary, by means of the diffeomorphism φ.
In case (b’) (n = 2), we obtain the same result as above, i.e. Cp1(F) ≈ Cp2(F) and M = Cp1(F)∪φ Cp2(F).
We notice that case (b’) occurs when the set Cpi(F) ≃ B
2/S0 is obtained by identifying two points of the
boundary in a way that reverses the orientation.
In cases (b) and (b’), it turns out that #Sing(F1) = 3. Moreover, F1 has a first integral, which is given by
the projection of M onto the space of (possibly singular) leaves. In fact, by Lemma 5.6, the space of leaves is
diffeomorphic to a closed interval of R. In this way M turns out to be an Eells-Kuiper manifold. This ends the
case l = 1.
Let l > 1 (and #Sing(F) > 3). As above, in Sing(F) there exist at least one saddle q and two (distinct) cen-
ters, p1, p2 such that q ∈ ∂ Cp1(F)∩ ∂ Cp2(F); we are led to the same possibilities (a), (b) for n ≥ 3 and (a)’,
(b)’ for n = 2. Anyway (b) and (b’) cannot occur, otherwise M = Cp1(F) ∪φ Cp2(F) and #Sing(F) = 3,
a contraddiction. Then we may proceed with the elimination of a trivial couple. In this way we obtain the
foliated manifold (M,F l−1), which we apply the inductive hypothesis to. The theorem is proved, observing
that, a posteriori, case (1) holds if k = l + 2 and case (2) if k = l + 1.
6 Haefliger-type theorems
In this paragraph, we investigate the existence of leaves of singular foliations with unilateral holonomy. Keep-
ing into account the results of the previous paragraph, for Morse foliations, we may state or exclude such an
occurrence, according to the following theorem:
Theorem 6.1 Let F be a C∞, codimension one, Morse foliation on a compact manifold Mn, n ≥ 3, assumed
to be transversely orientable, but not necessarily closed. Let k be the number of centers and l the number of
saddles. We have the following possibilities: (i) if k ≥ l + 1, then all leaves are closed in M \ Sing(F); in
particular, if ∂M 6= ∅ or k ≥ l + 2 each regular (singular) leaf of F , is diffeomorphic (homeomorphic) to a
sphere (in the second option, it is diffeomorphic to a sphere with a pinch at one point); (ii) if k = l there are
two possibilities: all leaves are closed in M \Sing(F), or there exists some compact (regular or singular) leaf
with unilateral holonomy.
Example 6.2 The foliation of example 4.7 is an occurrence of theorem 6.1, case (ii) with all leaves closed. The
Reeb foliation of S3 and each foliation we may obtain from it, with the introduction of l = k trivial couples
center-saddle, are examples of theorem 6.1, case (ii), with a leaf with unilateral holonomy.
Now we consider other possibilities for Sing(F).
Definition 6.3 Let F be a C∞, codimension one foliation on a compact manifold Mn, n ≥ 3, with singular
set Sing(F) 6= ∅. We say that Sing(F) is regular if its connected components are either isolated points or
smoothly embedded curves, diffeomorphic to S1. We extend the definition of stability to regular components,
by saying that a connected component Γ ⊂ Sing(F) is (weakly) stable, if there exists a neighborhood of Γ,
where the foliation has all leaves compact (notice that we can repeat the proof of Lemma 5.7 and obtain that a
weakly stable component is a stable component).
In the case Sing(F) is regular, with stable isolated singularities, when n ≥ 3 we may exclude a Haefliger-
type result, as a consequence of Lemma 5.5 and the Reeb Global Stability Theorem for manifolds with bound-
ary. Then we study the case Sing(F) regular, with stable components, all diffeomorphic to S1. Let J be a set
such that for all j ∈ J , the curve γj : S
1 → M , is a smooth embedding and Γj := γj(S
1) ⊂ Sing(F) is
stable. Then J is a finite set. This is obvious, otherwise ∀j ∈ J , we may select a point xj ∈ Γj and obtain
that the set {xj}j∈J has an accumulation point. But this is not possible because the singular components are
separated. We may regard a singular component Γj , as a degenerate leaf, in the sense that we may associate to
it, a single point of the space of leaves.
We need the following definition
Definition 6.4 Let F be a C∞, codimension one foliation on a compact manifold M . Let D2 be the closed
2-disc and g : D2 → M be a C∞ map. We say that p ∈ D2 is a tangency point of g with F if (dg)p( R
Tg(p)Fg(p).
We recall a proposition which Haefliger’s theorem (cfr. the book [Cam-LN]) is based upon.
Proposition 6.5 Let A : D2 → M be a C∞ map, such that the restriction A|∂D2 is transverse to F , i.e.
∀x ∈ ∂D2, (dA)x(Tx(∂D
2)) + TA(x)FA(x) = TA(x)M . Then, for every ǫ > 0 and every integer r ≥ 2,
there exists a C∞ map, g : D2 → M , ǫ-near A in the Cr-topology, satisfying the following properties: (i)
g|∂D2 is transverse to F . (ii) For every point p ∈ D
2 of tangency of g with F , there exists a foliation box
U of F with g(p) ∈ U and a distinguished map π : U → R such that p is a non-degenerate singularity of
π ◦ g : g−1(U) → R. In particular there are only a finite number of tangency points of g with F , since they
are isolated, and they are contained in the open disc D2 = {z ∈ R2 : ||z|| < 1}. (iii) If T = {p1, . . . , pt}
is the set of tangency points of g with F , then g(pi) and g(pj) are contained in distinct leaves of F , for every
i 6= j. In particular, the singular foliation g∗(F) has no saddle connections.
We are now able to prove a similar result, in the case of existence of singular components.
Proposition 6.6 Let F be a codimension one, C∞ foliation on a compact manifold Mn, n ≥ 3, with regular
singular set, Sing(F) = ∪j∈JΓj 6= ∅, where Γj are all stable components diffeomorphic to S
1 and J is finite.
Let A : D2 → M be a C∞ map, such that the restriction A|∂D2 is transverse to F . Then, for every ǫ > 0 and
every integer r ≥ 2, there exists a C∞ map, g : D2 → M , ǫ-near A in the Cr-topology, satisfying properties
(i) and (iii) of proposition 6.5, while (ii) is changed in: (ii’) for every point p ∈ D2 of tangency of g with F , we
have two cases: (1) if Lg(p) is a regular leaf of F , there exists a foliation box, U of F , with g(p) ∈ U , and a
distinguished map, π : U → R, satisfying properties as in (ii) of Proposition 6.5; (2) if Lg(p) is a degenerate
leaf of F , there exists a neighborhood, U of p, and a singular submersion, π : U → R, satisfying properties
as in (ii) Proposition 6.5.
Proof. We start by recalling the idea of the classical proof.
We choose a finite covering of A(D2) by foliation boxes {Qi}
i=1. In each Qi the foliation is defined by
a distinguished map, the submersion πi : Qi → R. We choose an atlas, {(Qi, φi)}
i=1, such that the last
component of φi : Qi → R
n is πi, i.e. φi = (φ
i , φ
i , . . . , φ
i , πi). We construct the finite cover of D
{Wi = A
−1(Qi)}
i=1; the expression of A in coordinates is A|Wi = (A
i , . . . , A
i , πi ◦A). We may choose
covers of D2, {Ui}
i=1, {Vi}
i=1, such that Ui ⊂ Vi ⊂ Vi ⊂ Wi, i = 1, . . . , r; then we proceed by induction
on the number i. Starting with i = 1 and setting g0 = A, we apply a result ([Cam-LN], Cap. VI, §2, Lemma
1, pag. 120) and we modify gi−1 in a new function gi, in a way that gi(Wi) ⊂ Qi and πi ◦ gi : Wi → R is
Morse on the subset Ui ⊂ Wi. At last we set g = gr.
In the present case, essentially, it is enough to choose a set of couples, {(Uk, πk)}k∈K , where {Uk}k∈K is
an open covering of M , πk : Uk → R, for k ∈ K , is a (possibly singular) submersion and, if Uk ∩ Ul 6= ∅
for a couple of indices k, l ∈ K , there exists a diffeomorphism plk : πk(Uk ∩ Ul) → πl(Uk ∩ Ul), such
that πl = plk ◦ πk. By hypothesis, there exists the set of couples {(Ui, πi)}i∈I , where {Ui}i∈I , is an open
covering of M \ Sing(F), and, for i ∈ I , the map πi : Ui → R, is a distinguished map, defining the foliated
manifold (M \ Sing(F),F∗). Let y ∈ Sing(F), then y ∈ Γj , for some j ∈ J . As y ∈ M , there exists
a neighborhood C ∋ y, homeomorphic to an n-ball. Let h : C → Bn be such a homeomorphism. As the
map γj : S
1 → Γj is a smooth embedding, we may suppose that, locally, Γj is sent in a diameter of the
ball Bn, i.e. h(C ∩ Γj) = {x2 = · · · = xn = 0}. For each singular point z = h
−1(b, 0, . . . , 0), the set
D = h−1(b, x2, . . . , xn), homeomorphic to a small (n−1)-ball, is transverse to the foliation at z. Moreover, if
z1 6= z2, then D1 ∩D2 = ∅. The restriction F|D is a singular foliation with an isolated stable singularity at z.
By lemma 5.5, the leaves of F|D are diffeomorphic to (n− 2)-spheres. It turns out that y has a neighborhood
homeomorphic to the product (−1, 1)×Bn−1, where the foliation is the image of the singular trivial foliation of
(−1, 1)×Bn−1, given by (−1, 1)×Sn−2×{t}, t ∈ (0, 1), with singular set (−1, 1)×{0}. Let πy : Uy → [0, 1)
be the projection. If, for a couple of singular points y, w ∈ Sing(F), we have Uy ∩Uw 6= ∅, we may suppose
they belong to the same connected component, Γj . We have πw ◦ π
y (0) = 0 and, as a consequence of lemma
5.6, there exists a diffeomorphism between πy(Uy ∩ Uw \ Γj) and πw(Uy ∩ Uw \ Γj). The same happens if
Uy ∩Ui 6= ∅ for some Ui ⊂ M \Sing(F). It comes out that πy is singular on Uy ∩Sing(F) and non-singular
on Uy \ Sing(F), i.e. (dπy)z = 0 ⇔ z ∈ Uy ∩ Sing(F). At the end, we set K = I ∪ Sing(F).
Let g : D2 → M be a map. Then g defines the foliation g∗(F), pull-back of F , on D2. Observe that if
Sing(F) = ∅, then Sing(g∗(F)) = {tangency points of g with F}, but in the present case, as Sing(F) 6= ∅,
we have Sing(g∗(F)) = {tangency points of g with F} ∪ g∗(Sing(F)). Either if p is a point of tangency of
g with F or if p ∈ g∗(Sing(F)), we have d(πk)p = 0. With this remark, we may follow the classical proof.
As a consequence of proposition 6.6, we have:
Theorem 6.7 (Haefliger’s theorem for singular foliations) Let F be a codimension one, C2, possibly singular
foliation of an n-manifold M , with Sing(F), (empty or) regular and with stable components diffeomorphic to
S1. Suppose there exists a closed curve transverse to F , homotopic to a point. Then there exists a leaf with
unilateral holonomy.
7 Novikov-type theorems
We end this article with a result based on the original Novikov’s Compact Leaf Theorem and on the notion of
stable singular set. To this aim, we premise the following remark. Novikov’s statement establishes the existence
of a compact leaf for foliations on 3-manifolds with finite fundamental group. This result actually proves the
existence of an invariant submanifold, say N ⊂ M , with boundary, such that F|N contains open leaves whose
universal covering is the plane. Moreover these leaves accumulate to the compact leaf of the boundary. In what
follows, a submanifold with the above properties will be called a Novikov component. In particular a Novikov
component may be a Reeb component, i.e. a solid torus endowed with its Reeb foliation. We recall that two
PSfrag replacements
∂ Cp(F)
Figure 13: p − q is not a trivial coupling when
1 < l < n− 1, where l is the index of the saddle
PSfrag replacements ST1
Figure 14: A singular foliation of S3, with no van-
ishing cycles.
Reeb components, glued together along the boundary by means of a diffeomorphism which sends meridians in
parallels and viceversa, give the classical example of the Reeb foliation of S3.
If F is a Morse foliation of a 3-manifold, as all saddles have index 1 or 2, we are always in conditions of
proposition 4.2 and then we are reduced to consider just two (opposite) cases: (i) all singularities are centers,
(ii) all singularities are saddles. In case (i), by the proof of the Reeb Sphere Theorem 2.4, we know that all
leaves are compact; in case (ii), all leaves may be open and dense, as it is shown by an example of a foliation
of S3 with Morse singularities and no compact leaves [Ros-Rou].
As in the previous paragraph, we study the case in which Sing(F) is regular with stable components,Γj , j ∈ J ,
where J is a finite set. We have:
Theorem 7.1 Let F be a C∞, codimension one foliation on a closed 3-manifold M3. Suppose Sing(F) is
(empty or) regular, with stable components. Then we have two possibilities: (i) all leaves of F are compact;
(ii) F has a Novikov component.
Proof. If Sing(F) = ∅, thesis (case (ii)) follows by Novikov theorem. Let Sing(F) 6= ∅. We may suppose
that F is transversely orientable (otherwise we pass to the transversely orientable double covering). If Sing(F)
contains an isolated singularity, as we know, we are in case (i). Then we suppose Sing(F) contains no isolated
singularity, i.e. Sing(F) =
Γj . Set D(F) = {Γj, j ∈ J} ∪ { compact leaves with trivial holonomy}.
By the Reeb Local Stability Theorem 2.1, D(F) is open. We may have ∂D(F) = ∅, and then we are in
case (i), or ∂D(F) 6= ∅, and in this case it contains a leaf with unilateral holonomy, F . It is clear that F
bounds a Novikov component, and then we are in case (ii); in fact, from one side, F is accumulated by open
leaves. If F ′ is one accumulating leaf, then its universal covering is p : R2 → F ′. Suppose, by contraddiction,
that the universal covering of F ′ is p : S2 → F ′. By the Reeb Global Stability Theorem for manifolds with
boundary, all leaves are compact, diffeomorphic to p(S2). This concludes the proof since F must have infinite
fundamental group.
The last result may be reread in terms of the existence of closed curves, transverse to the foliation. We have:
Lemma 7.2 LetF be a codimension one, C∞ foliation on a closed 3-manifoldM , with singular set, Sing(F) 6=
∅, regular, with stable components. Then F is a foliation with all leaves compact if and only if there exist no
closed transversals.
Proof. (Sufficiency) If the foliation admits an open (in M \ Sing(F)) leaf, L, it is well known that we may
find a closed curve, intersecting L, transverse to the foliation. Viceversa (necessity), let F be a foliation with
all leaves compact. If necessary, we pass to the transversely orientable double covering p : (M̃, F̃) → (M,F).
In this way, we apply Lemma 5.6 and obtain, as Sing(F̃) 6= ∅, that the projection onto the space of leaves is
a (global) C∞ first integral of F̃ , f : M̃ → [0, 1] ⊂ R. Suppose, by contraddiction, that there exists a C1
closed transversal to the foliation F , the curve γ : S1 → M . The lifting of γ2 is a closed curve, Γ : S1 → M̃ ,
transverse to F̃ . The set f(Γ(S1)) is compact and then has maximum and minimum, m1,m2 ∈ R. A contrad-
diction, because Γ cannot be transverse to the leaves {f−1(m1)}, {f
−1(m2)}.
With this result, we may rephrase the previous theorem.
Corollary 7.3 Let F be a codimension one, C∞ foliation on a 3-manifold M , such that Sing(F) is regular
with stable components. Then (i) there are no closed transversals, or equivalently, F is a foliation by compact
leaves, (ii) there exists a closed transversal, or equivalently, F has a Novikov component.
Remark 7.4 In the situation we are considering, we cannot state a singular version of Auxiliary Theorem I (see,
for example [Mor-Sc]). In fact, even though a singular version of Haefliger Theorem is given, the existence of
a closed curve transverse the foliation, homotopic to a constant, does not lead, in general, to the existence of a
vanishing cycle, as it is shown by the following counterexample.
Example 7.5 We consider the foliation of S3 given by a Reeb component, ST1, glued (through a diffeomor-
phism of the boundary which interchanges meridians with parallels) to a solid torus ST2 = S
1 × D2 =
T 2 × (0, 1) ∪ S1. The torus ST2 is endowed with the singular trivial foliation F|ST2 = T
2 × {t}, for
t ∈ (0, 1), where Sing(F|ST2) = S
1 = Sing(F). As a closed transversal to the foliation, we consider
the curve γ : S1 → ST1 ⊂ S
3, drawed in figure 14. Let f : D2 → S3 be an extension of γ; the extension
f is assumed to be in general position with respect to F , as a consequence of proposition 6.5. As γ(S1) is
linked to the singular component S1 ⊂ ST2, then f(D2) ∩ Sing(F) 6= ∅. As a consequence, we find a
decreasing sequence of cycles, {βn}, (the closed curves of the picture) which does not admit a cycle, β∞, such
that βn > β∞, for all n. In fact the “limit” of the sequence is not a cycle, but the point f(D2) ∩ Sing(F).
Example 7.6 The different situations of Theorem 7.1 or Corollary 7.3 may be exemplified as follows. It is
easy to see that S3 admits a singular foliation with all leaves compact (diffeomorphic to T 2) and two singular
(stable) components linked together, diffeomorphic to S1. In fact one can verify that S3 is the union of two
solid tori, ST1 and ST2, glued together along the boundary, both endowed with a singular trivial foliation.
We construct another foliation on S3, modifying the previous one. We set S̃T1 = S
1 × {0} ∪ T 2 × (0, 1/2].
In this way, ST1 = S̃T1 ∪ T
2 × (1/2, 1]. We now modify the foliation in ST1 \ S̃T1, by replacing the trivial
foliation with a foliation with cylindric leaves accumulating to the two components of the boundary.
References
[Cam-LN] C. Camacho, A. Lins Neto: Geometric theory of foliations, Boston, Birkhauser, 1985
[Cam-Sc] C. Camacho, B. Scárdua: On codimension one foliations with Morse singularities on three-
manifolds, Topology and its Applications 154 (2007) 1032-1040.
[Ee-Kui] J. Eells, N.H. Kuiper: Manifolds which are like projective planes, Pub. Math. de l’I.H.E.S., 14, 1962.
[God] C. Godbillon: Feuilletages, etudies geometriques, Basel, Birkhauser, 1991
[Law] H.B. Lawson, jr.: Foliations, Bull. Amer. Math. Soc., Vol. 80, N. 3, May 1974.
[Mil 1] J. Milnor: Morse theory, Princeton, NJ, Princeton University Press, 1963.
[Mil 2] J. Milnor: Lectures on the h-cobordism theorem, Princeton, NJ, Princeton University Press, 1965.
[Mor-Sc] C.A. Morales, B. Scárdua: Geometry and Topology of foliated manifolds.
[Nov] S.P. Novikov: Topology of foliations. Trudy Moskov. Mat. Obshch. 14 (1965), 248-278.
[Pal-deM] J. Palis, jr., W. de Melo: Geometric theory of dinamical systems: an introduction, New-York,
Springer,1982.
[Reeb] G. Reeb: Sur les points singuliers d’une forme de Pfaff complètement intégrable ou d’une fonction
numérique. CRAS 222 (1946), 847-849.
[Ros-Rou] H. Rosemberg, R. Roussarie: Some remarks on stability of foliations, J. Diff. Geom. 10, 1975,
207-219.
[Stee] N. Steenrod: The topology of fiber bundles, Princeton, NJ, Princeton University Press, 1951
[Thu] W.P. Thurston: Three-dimensional geometry and topology, Princeton, NJ, Princeton University Press,
1997.
[Wag] E. Wagneur: Formes de Pfaff à singularités non dégénérées, Annales de l’institut Fourier, tome 28 n. 3
(1978), p. 165-176.
	Foliations and Morse Foliations
	Holonomy and Reeb Stability Theorems
	Arrangements of singularities
	Realization and elimination of pairings of singularities
	Reeb-type theorems
	Haefliger-type theorems
	Novikov-type theorems
ABSTRACT
  Let $M$ be a smooth manifold and let $\F$ be a codimension one, $C^\infty$
foliation on $M$, with isolated singularities of Morse type. The study and
classification of pairs $(M,\F)$ is a challenging (and difficult) problem. In
this setting, a classical result due to Reeb \cite{Reeb} states that a manifold
admitting a foliation with exactly two center-type singularities is a sphere.
In particular this is true if the foliation is given by a function. Along these
lines a result due to Eells and Kuiper \cite{Ku-Ee} classify manifolds having a
real-valued function admitting exactly three non-degenerate singular points. In
the present paper, we prove a generalization of the above mentioned results. To
do this, we first describe the possible arrangements of pairs of singularities
and the corresponding codimension one invariant sets, and then we give an
elimination procedure for suitable center-saddle and some saddle-saddle
configurations (of consecutive indices). In the second part, we investigate if
other classical results, such as Haefliger and Novikov (Compact Leaf) theorems,
proved for regular foliations, still hold true in presence of singularities. At
this purpose, in the singular set, $Sing(\F)$ of the foliation $\F$, we
consider {\em{weakly stable}} components, that we define as those components
admitting a neighborhood where all leaves are compact. If $Sing(\F)$ admits
only weakly stable components, given by smoothly embedded curves diffeomorphic
to $S^1$, we are able to extend Haefliger's theorem. Finally, the existence of
a closed curve, transverse to the foliation, leads us to state a Novikov-type
result.

<|endoftext|><|startoftext|>
Introduction
Classically Frobenius-Schur indicators were defined for irreducible representa-
tions of finite groups over the field of complex numbers. The interest in doing so
came from the second indicator which determines whether an irreducible represen-
tation is real, complex or quaternionic. Namely, a classical theorem of Frobenius
and Schur asserts that an irreducible representation is real, complex or quaternionic
if and only if its second indicator is 1, 0 or −1, respectively (see e.g. [S]). However,
no representation-theoretic interpretation of the higher indicators is known.
Recently, Frobenius-Schur indicators of irreducible representations of complex
semisimple finite dimensional (quasi-)Hopf algebras H were defined by Linchenko
and Montgomery [LM] and Mason and Ng [MN] (see also [KSZ]), generalizing
the definition in the group case. The values of the mth indicator are cyclotomic
integers in Qm. Moreover, an analog of the Frobenius-Schur theorem on the second
indicator was proved, and in general it has been shown that the indicators carry
rich information on H , as well as on its representation category (see also [NS2]).
In fact, one can generalize the definition of Frobenius-Schur indicators to simple
objects of any semisimple tensor categories which admit a pivotal structure (=
tensor isomorphism id →∗∗), thus showing in particular that the indicators are
categorical invariants (see e.g. [FGSV], [NS1]).
The category of finite dimensional representations of a finite dimensional complex
semisimple Lie algebra is a pivotal semisimple tensor category, and hence one can
define the Frobenius-Schur indicators of its simple objects. The second indicator
was already defined and known to be nonzero if and only if the simple representation
is self-dual, and 1 or −1 if and only if the representation is orthogonal or symplectic,
respectively. Furthermore, Tits gave an explicit formula for it in representation-
theoretic terms (see Section 3).
The purpose of this paper is to study Frobenius-Schur indicators (of all degrees)
for semisimple Lie algebras. More specifically to find a closed formula for the
indicators in representation-theoretic terms and deduce its asymptotical behavior.
In particular we obtain that the indicators take integer values.
The organization of the paper is as follows.
Section 2 is devoted to preliminaries. We recall some basic definitions and facts
from Lie theory which we need (e.g. the Weyl integration formula). Next we
define the mth Frobenius-Schur indicator of the representation categories of finite
dimensional complex semisimple Lie algebras.
Date: February 11, 2007.
http://arxiv.org/abs/0704.0165v1
2 MOHAMMAD ABU-HAMED AND SHLOMO GELAKI
In section 3 we recall the properties of the second indicator. For the benefit of
the reader we also give a proof of Tits’ theorem.
Section 4 is dedicated to the proof of our main results. In 4.1 we prove the
formula for the mth Frobenius-Schur indicator νm, m ≥ 2, which is given by the
following theorem.
Theorem 1.1. Let g be a finite dimensional complex semisimple Lie algebra. Let
V (λ) be an irreducible representation of g with highest weight λ, W the Weyl group
of g, ρ the half sum of positive roots, and V (λ)
ρ−σ·ρ
the weight space of the
weight ρ−σ·ρ
where m ≥ 2 is an integer. Then the mth Frobenius-Schur indicator
νm(V (λ)) of V (λ) is given by
νm(V (λ)) =
sn(σ) dimV (λ)
ρ− σ · ρ
Our proof of Theorem 1.1 is analytic. Namely, we work with the equivalent
representation category of the associated simply connected Lie group and use the
Weyl integration formula to obtain our formula.
Next, in 4.2 we prove the following corollary of Theorem 1.1.
Corollary 1.2. For large enough m, νm(V (λ)) = dimV (λ)[0] (which is not zero if
and only if λ belongs to the root lattice). In particular for the classical Lie algebras
sl(n,C), so(2n,C), so(2n + 1,C) and sp(2n,C), νm(V (λ)) = dimV (λ)[0] for m
greater or equal to 2n− 1, 4n− 5, 4n− 3 and 2n+ 1, respectively.
Finally in 4.3 we use our formula and Kostant’s theorem to compute explicitly
the Frobenius-Schur indicators for the representation category of sl(3,C). More
specifically, we prove:
Theorem 1.3. Let V (a, b) be an irreducible representation of sl(3,C). Then
(1) ν2(V (a, b)) = 1 if a = b, and ν2(V (a, b)) = 0 if a 6= b.
(2) ν3(V (a, b)) = 1 +min{a, b}.
(3) For m > 3 we have, νm(V (a, b)) = 1 + min{a, b} if (a, b) is in the root
lattice and νm(V (a, b)) = 0 otherwise.
Acknowledgments. This research was supported by the Israel Science Foundation
(grant No. 125/05).
2. Preliminaries
Throughout let g be a finite dimensional complex semisimple Lie algebra of rank
r, ( , ) its Killing form, h a Cartan subalgebra (CSA) of g, Φ the root system
corresponding to h, ∆ a fixed base, {h1, ..., hr} the corresponding coroot system,
and W the Weyl group.
Let λ ∈ h∗ be a dominant integral weight (i.e. λ(hi) is a nonnegative integer
for all i), V (λ) the finite dimensional irreducible representation of g with highest
weight λ and Π(λ) the set of integral weights occurring in V (λ); it is a finite
set which is invariant under the action of the Weyl group. For µ ∈ Π(λ), let
mλ(µ) = dimV (λ)[µ] be the multiplicity of µ in V (λ). Recall that the multiplicities
are invariant under the Weyl group action. Let ρ =
α (half sum of positive
roots); it is a strongly dominant integral weight.
FROBENIUS-SCHUR INDICATORS FOR SEMISIMPLE LIE ALGEBRAS 3
Let us recall Kostant’s theorem on the multiplicities of weights (for a proof see
[Hu]). Let µ ∈ h∗ and define p(µ) to be the number of sets of non-negative integers
{kα|α ≻ 0} for which µ =
kαα (p is called the Kostant’s partition function).
Of course, p(µ) = 0 if µ is not in the root lattice.
Theorem 2.1. (Kostant) Let λ be a dominant weight and µ ∈ Π(λ). Then the
multiplicities of V (λ) are given by the formula
mλ(µ) =
sn(σ)p(σ(λ + ρ)− µ− ρ).
Let gc be the compact real form of g, and G the corresponding simply connected
compact matrix Lie group with Lie algebra gc. It is known that Rep(g), Rep(gc)
and Rep(G) are equivalent symmetric tensor categories.
Let t be a CSA of gc; it corresponds to a maximal torus T of G. Then h = t⊕ it.
It is known that α(h) is purely imaginary for all h ∈ t and α ∈ Φ. If t∗ denotes
the space of real-valued linear functionals on t, then the roots are contained in
it∗ ⊂ h∗. It is then convenient to introduce the real roots, which are simply 1
times
the ordinary roots, the real coroots hα which are the elements of t corresponding
to the elements 2α
(α,α)
where α is a real root, and the real weights of an irreducible
representation of G. An element µ of t∗ is said to be integral if µ(hα) ∈ Z for each
real coroot hα. The real weights of any finite dimensional representation of g are
integral. (See [Ha].)
The Weyl denominator is the function Aρ : T −→ C given by
Aρ(t) = Aρ(e
sn(ω)ei(ω·ρ)(h).
Theorem 2.2. (Weyl integration formula) Let G be a simply connected compact
Lie group. Let f be a continuous class function on G, dg the normalized Haar
measure on G, and dt the normalized Haar measure on T . Then
f(g)dg =
f(t)|Aρ(t)|
Let us now define the Frobenius-Schur indicators of an irreducible representation
of g.
Definition 2.3. Let V be an irreducible representation of g and m ≥ 2 be an inte-
ger. The mth Frobenius-Schur indicator of V is the number νm(V ) = tr(c|(V ⊗m)g),
where c is the cyclic automorphism of V ⊗m given by v1⊗· · ·⊗vm 7→ vm⊗· · ·⊗vm−1.
Remark 2.4. In fact, as we mentioned in the introduction, the indicators can be
defined categorically. Applying the categorical definition to Rep(g) yields the above
definition, while applying it to Rep(G) yields tr(c|(V ⊗m)G). Since the indicators of
V regarded as a g-module coincide with the indicators of V regarded as a G-module
we have νm(V ) = tr(c|(V ⊗m)G).
3. Tits’ theorem on the second indicator
Theorem 3.1. (See [B]) Let G be a compact Lie group. Let V be an irreducible
complex representation of G, and set ǫV =
χ(g2)dg. Then V is self dual if and
only if ǫV 6= 0. Furthermore, suppose V is self dual and let B be a (unique up to
4 MOHAMMAD ABU-HAMED AND SHLOMO GELAKI
scalar) G-invariant non-degenerate bilinear form on V . Then B is either symmetric
or skew-symmetric, and it is such if and only if ǫV = 1,−1, respectively.
Remark 3.2. In Proposition 4.4 we will prove that ǫV = ν2(V ) as defined above.
Historically ν2(V ) was defined by ǫV .
Example 3.3. Let us use Theorem 1.1 to calculate νm(V ) in the representation
category of sl(2,C). Let sl(2,C) = sp{h, x, y}, where h =
, x =
. The root system is Φ = {α,−α}, where α(h) = 2. The Weyl group
is W = {1, σα} , ρ =
α and σα(ρ) = −
α. Let V (n) = ⊕nj=0V [n − 2j] be
the irreducible representation of highest weight λ(h) = n with its weight space
decomposition. By Theorem 1.1,
νm(V (n)) = dimV (n)[0]− dimV (n)
Let m = 2. By the formula above, if n is odd, then dimV (n)[0] = 0 and
dimV (n)[α
] = 1. Hence ν2(V (n)) = −1. Similarly, if n is even, ν2(V (n)) = 1.
Consequently ν2(V (n)) = (−1)
n = (−1)λ(h).
For m ≥ 3,
is not an integer and hence νm(V (n)) = dimV (n)[0].
Therefore we have
νm(V (n)) =
1 if n is even
0 if n is odd
Let g = h
(⊕α∈Φgα) be the root space decomposition of g and ∆ = {α1, ..., αr}
a fixed base. Fix a standard set of generators for g: xi ∈ gαi , yi ∈ g−αi so that
[xi, yi] = hi. Let ρ̌ := 1/2
α∈Φ+ hα be the half sum of positive coroots.
Proposition 3.4. Let E := x1 + ...+ xr and H := 2ρ̌. Then there exist constants
a1, ..., ar such that the subalgebra P generated by H,E, F := a1y1 + ... + aryr is
isomorphic to sl(2,C).
The Lie subalgebra P ⊆ g is called a principal sl(2,C)-subalgebra of g (see [K]
or [D]).
Lemma 3.5. Let V = V (λ) be an irreducible representation of g. Let P be a
principal sl(2,C)-subalgebra of g. Consider V as a P-module. Then its highest
weight is λ(H), and it contains the irreducible sl(2,C)-representation V (λ(H)) with
multiplicity one.
Proof. Let v+ be a highest weight vector of V considered as a g-module. Then
obviously we have Hv+ = λ(H)v+ and Ev+ = 0. Hence v+ is a highest weight
vector with weight λ(H) for V considered as a P -module. Therefore we can write
V = V (λ(H))
V (nj). Now it remains to show that λ(H) > nj for any j. Let
V = V [λ] ⊕
V [µ] be the weight space decomposition of V as a g-module. It
is also a weight space decomposition of V considered as a P -module, so V [µ] is a
weight space of P with weight µ(H). Recall that µ = λ−
j=1 kjαj where kj ∈ Z
Note that λ(H) > µ(H) if and only if λ(H) > λ(H)−
j=1 kjαj(H) if and only if
j=1 kjαj(2ρ̆) > 0 if and only if
j=1 kj(αj , 2ρ) > 0. But 2ρ is strongly dominant,
i.e., (αj , 2ρ) > 0 for all 1 ≤ j ≤ r. The proof is complete. �
FROBENIUS-SCHUR INDICATORS FOR SEMISIMPLE LIE ALGEBRAS 5
Let ω0 ∈ W be the unique element sending ∆ to −∆.
Theorem 3.6. (Tits) Let V = V (λ) be a finite dimensional irreducible represen-
tation of g. If λ+ ω0λ 6= 0 then ν2(V ) = 0. Otherwise, ν2(V ) = (−1)
λ(2ρ̌).
Proof. It is known that the dual of V (λ) is V (−ω0λ), so if V (λ) is not self dual
(i.e., λ+ ω0λ 6= 0) then ν2(V ) = 0.
Suppose that V is self dual as a g-module. Then V admits a non-degenerate
g-invariant bilinear form, and we have to decide if it is symmetric or skew sym-
metric. To do so, consider the principal sl(2,C)-subalgebra P as in Lemma 3.5.
The restriction of V to P has a unique copy of the largest representation of P
occurring in V , with highest weight λ(2ρ̌). We already proved that this represen-
tation has indicator (−1)λ(2ρ̆). Now we can use Theorem 3.1 to prove that V has
a symmetric (skew-symmetric) g-invariant form if and only if it has a symmetric
(skew-symmetric) P -invariant form. The first direction is obvious. Conversely, sup-
pose that V has a symmetric P -invariant form and suppose on the contrary that V
admits a skew-symmetric g-invariant form. Then if we restrict the bilinear g-form
to P we get that V has a skew-symmetric P -invariant form which is a contradiction.
Similar considerations are applied when V has a skew-symmetric P -invariant form.
We conclude that ν2(V ) = (−1)
λ(2ρ̆). �
4. The Main results
4.1. Proof of Theorem 1.1. Let G be the associated simply connected compact
Lie group. From now on we will consider V (λ) as a G-module. For convenience
set V = V (λ), N = V (λ)⊗m, and let π : G −→ GL(V ) be the irreducible represen-
tation.
The following lemma is easily derived from linear algebra.
Lemma 4.1. Let T ∈ End(V ) be a projection, W = ImT and S ∈ End(V ) an
operator preserving W. Then tr|W (S) = tr|V (S ◦ T ).
Proof. Fix a basis A = {w1, ..., wk} for W , and let Ã = {w1, ..., wk, wk+1, ..., wn}
be a completion to a basis for V . Let C = [S|W ]A be the matrix representing S|W
with respect to the basis A. Since T |W = idW and S(W ) ⊆ W we find out that
, [S]
, and hence [S]
. The lemma
follows easily now. �
Proposition 4.2. We have,
νm(V ) = tr|NG(c) =
tr|V (c ◦ π
⊗m(g))dg.
Proof. We follow the lines of the proof of the first formula for Frobenius-Schur
indicators in the Hopf case, given in Section 2.3 of [KSZ].
Set τ = π⊗m. Consider the operator
τ(g)dg : N −→ N . Let us first show that
the image of this operator is NG. Indeed, by the invariance of the Haar measure,
τ(g)vdg =
τ(hg)vdg =
τ(g)vdg for all h ∈ G and v ∈ N . Hence
τ(g)dg
⊆ NG.
Conversely, suppose that u ∈ NG, then
τ(g)udg =
udg = u
dg = u.
Hence NG ⊆ Im
τ(g)dg
and we are done.
6 MOHAMMAD ABU-HAMED AND SHLOMO GELAKI
In fact, the above shows also that the operator
τ(g)dg is a projection onto
Finally, c ∈ Aut(NG), so by Lemma 4.1,
tr|NG(c) = tr|N
τ(g)dg
tr|N (c ◦ τ(g))dg,
as claimed. �
The following lemma is a particular case of a lemma in Section 2.3 of [KSZ] and
its proof replicates the proof of that lemma.
Lemma 4.3. Let f1, ..., fm ∈ End(V ). Then,
tr|V ⊗m(c ◦ (f1 ⊗ ...⊗ fm)) = tr|V (f1 ◦ ... ◦ fm).
Proof. Let v1, ..., vn be a basis of V with dual basis v
1 , ..., v
n. For l = 1, ...,m, fl is
presented by the matrix
i,j=1
, where alij = (v
i , fl(vj)). Therefore, tr
i=1(v
i , fl(vi)). We now have
tr|V ⊗m(c ◦ (f1 ⊗ ...⊗ fm)) =
i1,...,im=1
(v∗i1 ⊗ v
⊗ ...⊗ v∗im , c(f1(vi1)⊗ f2(vi2)⊗ ...⊗ fm(vim))) =
i1,...,im=1
(v∗i1 , f2(vi2)) · · · (v
, fm(vim ))(v
, f1(vi1 )) =
i1,...,im=1
a2i1,i2a
i2,i3
· · · amim−1,ima
im,i1
= tr|V (f2 ◦ f3 ◦ · · · ◦ fm ◦ f1) =
tr|V (f1 ◦ f2 ◦ · · · ◦ fm),
as desired. �
Consequently we have the following proposition which is analogous to the finite
group case.
Proposition 4.4. Let χ be the irreducible character of V . Then
νm(V ) =
χ(gm)dg.
Proof. We follow the lines of the proof of the first formula for Frobenius-Schur
indicators in the Hopf case, given in Section 2.3 of [KSZ].
It follows immediately from Proposition 4.2 and Lemma 4.3 that
νm(V ) =
tr|N (c ◦ π
⊗m(g))dg =
tr|N (c ◦ (π(g) ⊗ ...⊗ π(g))dg =
tr|V (π(g) ◦ .. ◦ π(g))dg =
χ(gm)dg.
FROBENIUS-SCHUR INDICATORS FOR SEMISIMPLE LIE ALGEBRAS 7
Recall the integral real elements which are those elements µ of t∗ for which
2(µ,α)
(α,α)
is an integer for any simple real root α. For each real integral element µ, there is a
function µ̃ on T given by
µ̃(eh) = eiµ(h)
for all h in t. Functions of this form are called torus characters and they have the
following property.
Lemma 4.5.
µ̃(t)dt =
eiµ(h)deh =
1 µ = 0,
0 otherwise.
Proof. Suppose that µ 6= 0, then there exists t0 ∈ t such that µ̃(t0) 6= 1. Therefore
µ̃(t)dt =
µ̃(t0t)dt = µ̃(t0)
µ̃(t)dt,
hence
µ̃(t)dt = 0. �
Let χ be the character of V . Before we begin the proof of Theorem 1.1, recall
that if t = eh ∈ T then for all t ∈ T ,
(1) χ(t) = χ(eh) =
µ∈Π(V )
dim(V [µ])eiµ(h).
We can now prove our main result.
Proof of Theorem 1.1: By Proposition 4.4 and the Weyl integration formula we
have,
(2) νm(V ) =
χ(gm)dg =
χ(tm)|Aρ(t)|
On the other hand,
(3) χ(tm) = χ(emh) =
µ∈Π(V )
dim(V [µ])eimµ(h).
Hence by (2) and (3) we have,
(4) νm(V ) =
µ∈Π(V )
dimV [µ]
eimµ(h)|Aρ(e
h)|2deh.
Now let us calculate the last integral. We have
eimµ(h)|Aρ(e
h)|2deh =
eimµ(h)Aρ(e
h)Aρ(eh)de
eimµ(h)
ω∈W sn(ω)e
i(ω·ρ)(h)
τ∈W sn(τ)e
−i(τ ·ρ)(h)
deh =
ω,τ∈W
sn(ωτ)
ei(mµ+ω·ρ−τ ·ρ)(h)deh.
But from Lemma 4.5 we have
ei(mµ+ω·ρ−τ ·ρ)(h)deh =
1 if mµ+ ω · ρ− τ · ρ = 0
0 otherwise.
8 MOHAMMAD ABU-HAMED AND SHLOMO GELAKI
Hence (4) becomes,
νm(V ) =
ω,τ∈W
τ·ρ−ω·ρ
sn(ωτ)dimV [µ]
ω,τ∈W
sn(ωτ)dimV
τ · ρ− ω · ρ
Since dimV [ζ] = dimV [τ · ζ] for all ζ ∈ Π(V ) and τ ∈ W , we can write,
νm(V ) =
ω,τ∈W
sn(ωτ)dimV
ρ− τ−1ω · ρ
Now if we fix ω ∈ W , substitute σ = τ−1ω and use the fact that sn(ωτ) =
sn(τ−1ω), we get
sn(ωτ)dimV
ρ− τ−1ω · ρ
sn(τ−1ω)dimV
ρ− τ−1ω · ρ
sn(σ)dimV
ρ− σ · ρ
Consequently,
νm(V ) =
ω,σ∈W
sn(σ)dimV
ρ− σ · ρ
sn(σ)dimV
ρ− σ · ρ
as desired. �
It may be interesting to state the following immediate consequence of Theorem
1.1 and Theorem 3.6.
Corollary 4.6. Let V (λ) be an irreducible self dual representation of g, then
sn(σ)dimV (λ)
ρ− σ · ρ
= (−1)λ(2ρ̌).
If V (λ) is not self dual, the sum equals 0.
4.2. Proof of Corollary 1.2. Since ρ is strongly dominant, σ · ρ = ρ only when
σ = 1. Write
νm(V ) = dimV [0] +
σ 6=1
sn(σ)dimV
ρ− σ · ρ
We wish to show that for large enough m, ρ−σ·ρ
is not a weight of V when σ 6= 1.
Indeed, suppose that σ 6= 1. Recall that ρ− σ · ρ is an integral element, hence if we
fix some coroot hα, we have the following set of integers: Uα = {(ρ− σ · ρ)(hα)|σ ∈
W , σ 6= 1}. Therefore if we take mα = 1 + uα, where uα is the maximal element
of Uα, then
ρ−σ·ρ
/∈ Π(V ). Hence dimV
ρ−σ·ρ
= 0 for all σ 6= 1, and therefore
νm(V ) = dimV [0], for all m ≥ mα. �
Note that by the procedure of the above proof, m =: min{mα|α ∈ ∆} is a better
bound. Let us now give an explicit such lower bound.
FROBENIUS-SCHUR INDICATORS FOR SEMISIMPLE LIE ALGEBRAS 9
Lemma 4.7. If ω ∈ W then
ω · ρ = ρ−
−1(α)∈Φ−
In particular, sα(ρ) = ρ− α for α ∈ ∆.
Proof. Evidently, ω · ρ is half sum of the set {ω(α)|α ∈ Φ+}. Like Φ+, this is a
set of exactly half of the roots, containing each root or its negative but not both.
More precisely, this set is obtained from Φ+ by replacing each α ∈ Φ+ such that
ω−1 · α ∈ Φ− by its negative. Now,
ω · ρ = ρ−
−1(α)∈Φ−
is evident , and sα(ρ) = ρ− α is a special case since one shows that if α ∈ ∆ and
β ∈ Φ+, then either β = α or sα(β) ∈ Φ
Proposition 4.8. Let V be an irreducible representation of G. Then νm(V ) =
dimV [0] for all m ≥ M := minα∈∆{
|β(hα)|+ 1}.
Proof. Let h = hα be a simple coroot. For all 1 6= ω ∈ W we have,
|(ρ− ω · ρ)(h)| =
β∈Φ+,
−1(β)∈Φ−
β∈Φ+,
−1(β)∈Φ−
|β(h)| ≤
|β(h)|.
Therefore if we choose m =
β∈Φ+ |β(h)|+ 1 then
ρ−ω·ρ
(h) /∈ Z, namely, ρ−ω·ρ
not a weight. Consequently,
σ 6=1
sn(σ)dimV
ρ− σ · ρ
= 0, and we are done. �
Let us calculate the bound M defined in Proposition 4.8 for sl(n,C). Let the
Cartan subalgebra be the set of diagonal matrices in sl(n,C). Let the set of positive
roots be Φ+ = {βi,j|1 ≤ i < j ≤ n}, where βi,j(diag(a1, . . . , an)) = ai − aj . The
subset ∆ = {βi,i+1|1 ≤ i ≤ n− 1} is a base. With respect to this base the simple
coroots are {hi|1 ≤ i ≤ n− 1}, where hi is the matrix with 1 in the (i, i) position,
−1 in the (i+1, i+1) position and 0 elsewhere. Then, by an elementary calculation,
we get that for any simple coroot h,
1≤i<j≤n
|βi,j(h)|+ 1 = 2n− 1.
Consequently we obtain that M = 2n− 1.
Let us calculate the bound M defined in Proposition 4.8 for so(2n+ 1,C). Let
the Cartan subalgebra be the set of diagonal matrices in so(2n + 1,C). Let the
set of positive roots be Φ+ = {βi ± βj |1 ≤ i < j ≤ n} ∪ {βi|1 ≤ i ≤ n}, where
βi(hj) = δij . The subset ∆ = {βi − βi+1, βn|1 ≤ i ≤ n − 1} is a base. With
respect to this base the simple coroots are {hi − hi+1, 2hn|1 ≤ i ≤ n − 1}, where
hi is the matrix with 1 in the (i, i) position, −1 in the (n+ i, n+ i) position and 0
10 MOHAMMAD ABU-HAMED AND SHLOMO GELAKI
elsewhere. Then, by an elementary calculation, we get that for any simple coroot
h := hk − hk+1, 1 ≤ k ≤ n− 1, the sum
β∈Φ+ |β(h)|+ 1 equals
1≤i<j≤n
|(βi + βj)(h)|+
1≤i<j≤n
|(βi − βj)(h)| +
|βi(h)|+ 1 = 4n− 3,
while for the simple coroot h := 2hn it equals 4n− 1. Consequently we obtain that
M = 4n− 3.
Applying similar arguments to the other classical simple Lie algebras yields the
following result.
Proposition 4.9. The bound M for sl(n,C), so(2n,C), so(2n+1,C) and sp(2n,C)
is equal to 2n− 1, 4n− 5, 4n− 3 and 2n+ 1, respectively.
4.3. The proof of Theorem 1.3. Let h be the CSA of sl(3,C) generated by the
two elements h1 = diag(1,−1, 0) and h2 = diag(0, 1,−1). We will identify any
functional α on h with the pair (α(h1), α(h2)). Under this identification the six
roots of sl(3,C) are α1 = (2,−1), α2 = (−1, 2), α1 + α2 = (1, 1), −α1 = (−2, 1),
−α2 = (1,−2) and −α1 − α2 = (−1,−1). The roots α1 = (2,−1), α2 = (−1, 2)
form a base and the corresponding simple coroots are h1, h2, respectively.
Recall that if V = V (λ) is an irreducible representation of sl(3,C) of highest
weight λ, then λ is of the form (a, b) with a and b non-negative integers.
Recall thatW ∼= S3 and it acts on h by σ·diag(d1, d2, d3) = diag(dσ(1), dσ(2), dσ(3)).
Therefore, (12) · α1 = −α1, (12) · α2 = α1 + α2; (13) · α1 = −α2, (13) · α2 = −α1;
(23) · α1 = α1 + α2, (23) · α2 = −α2; (123) · α1 = −α1 − α2, (123) · α2 = α1; and
(132) · α1 = α2, (123) · α2 = −α1 − α2.
The half sum of positive roots is ρ = 1
(2α1 + 2α2) = α1 + α2. We have,
ρ− (12)ρ = (2,−1), ρ− (13)ρ = (2, 2), ρ− (23)ρ = (−1, 2), ρ− (123)ρ = (0, 3), and
ρ− (132)ρ = (3, 0).
Let m = 2. Considering our formula, we cancel all the summands which include
roots that one of their two components is not divisible by 2. Consequently we get
ν2(V ) = dimV [(0, 0)]− dimV [(1, 1)].
Recall that an irreducible representation V (a, b) is self dual if and only if a = b.
Since λ = (s, s) = sα1 + sα2, (λ, 2ρ̌) = (λ, 2h1 + 2h2) = (λ, 2h1) + (λ, 2h2) = 4s, it
follows from Tits’ theorem that
ν2(V (a, b)) =
0 a 6= b,
1 otherwise.
Similar considerations for m ≥ 3 yield,
ν3(V ) = dimV [(0, 0)] + dimV [(1, 0)] + dimV [(0, 1)]
νm≥4(V ) = dimV [(0, 0)].
In particular, if λ does not belong to the root lattice, νm≥4(V ) = 0.
We now calculate dimV [(0, 0)], dimV [(1, 0)] and dimV [(0, 1)]. Recall that for η ∈
h∗, p(η) ≥ 1 if and only if η belongs to the root lattice and η ≻ 0. If η = kα1 + lα2
with nonnegative integers k and l, then p(η) = 1 +min{k, l}. Write λ = kα1 + lα2
where k and l are real numbers and identify it with the pair (λ(h1), λ(h2)) = (a, b) =
(2k − l, 2l− k).
FROBENIUS-SCHUR INDICATORS FOR SEMISIMPLE LIE ALGEBRAS 11
Note that (0, 1) = 1
α2 and (1, 0) =
α2. Therefore by Kostant’s
formula (see Theorem 2.1),
dimV [(0, 0)] =
sn(ω)p((k + 1)ω · α1 + (l + 1)ω · α2 − α1 − α2),
dimV [(0, 1)] =
sn(ω)p
(k + 1)ω · α1 + (l + 1)ω · α2 −
dimV [(1, 0)] =
sn(ω)p
(k + 1)ω · α1 + (l + 1)ω · α2 −
It is straightforward to verify that in each of the three cases the surviving
terms correspond to ω = 1, (12), (23). For example, in the first case calculating
(k + 1)ω · α1 + (l + 1)ω · α2 − α1 − α2 for ω = (12), (13), (23), (123), (132), yields
(l − k − 1)α1 + lα2, −(l + 2)α1 − (k + 2)α2 (hence p = 0), kα1 + (k − l − 1)α2,
(l−k−1)α1− (k+2)α2 (hence p = 0), and −(l+2)α1+(k− l−1)α2 (hence p = 0),
respectively.
Therefore we have that dimV [(0, 0)] equals
b+ 2a
2b+ a
b− a− 3
2b+ a
b+ 2a
−b+ a− 3
dimV [(0, 1)] equals
b+ 2a− 1
2b+ a− 2
b− a− 4
2b+ a− 2
b+ 2a− 1
−b+ a− 5
and dimV [(1, 0)] equals
b+ 2a− 2
2b+ a− 1
b− a− 5
2b+ a− 1
b+ 2a− 2
−b+ a− 4
Now, modulo 3, exactly one of the following holds: 1) b+2a = 0 and 2b+a = 0 (in
this case λ belongs to the root lattice), 2) b+2a = 1 and 2b+a = 2 and 3) b+2a = 2
and 2b + a = 1. Hence by the above and elementary calculations, we obtain that
in the first case dimV [(0, 1)] = dimV [(1, 0)] = 0, in the second case dimV [(0, 0)] =
dimV [(1, 0)] = 0 and in the third case dimV [(0, 0)] = dimV [(0, 1)] = 0. Therefore,
in the first case ν3(V (a, b)) equals
1 +min
b+ 2a
2b+ a
1 +min
b− a− 3
2b+ a
1 +min
b+ 2a
−b+ a− 3
12 MOHAMMAD ABU-HAMED AND SHLOMO GELAKI
in the second case it equals
1 +min
b+ 2a− 1
2b+ a− 2
1 +min
b− a− 4
2b+ a− 2
1 +min
b+ 2a− 1
−b+ a− 5
and in the third case it equals
1 +min
b+ 2a− 2
2b+ a− 1
1 +min
b− a− 5
2b+ a− 1
1 +min
b+ 2a− 2
−b+ a− 4
Finally, it is easy to check that in each case the sum equals 1 + min{a, b}, as
claimed. This completes the proof of the theorem. �
References
[B] D. Bump, Lie Groups, Springer-Verlag NY, LLC, (2004).
[D] E. Dynkin, Semisimple subalgebras of semisimple Lie algebras (Russian) Mat.Sbornik
N.S. 30 (27) (1952) 349-462, English: AMS Translations 6 (1957), 111-244.
[FGSV] J. Fucs, C. Ganchev, K. Szlachányi, and P. Vescernyes, S4-symmetry of 6j-sympols and
Frobenius-Schur indicators in rigid monoidal C∗-categories. J.Math Phys. 40 (1999),
408-426.
[Ha] B. Hall, Lie groups, Lie algebras and representations, Springer-Verlag, Berlin-Heidelberg-
New York, (2006).
[Hu] J. Humphreys, Introdution to Lie algebras and representation theory, Springer-Verlag,
Berlin-Heidelberg-New York, (1972).
[K] B. Kostant, The principal three dimensional subgroup and betti numbers of complex
simple Lie group, Amer.J.Math. 81 (1959), 973-1032.
[KSZ] Y. Kashina, Y. Sommerhaeuser, and Y. Zhu, On higher Frobenius-Schur indicators,
Memoirs of the AMS 181, no 855 (2006).
[LM] V. Linchenko and S. Montgomery, A Frobenius-Schur theorem for Hopf algebras, Algebr.
Represent. Theory 3 (2000), no. 4, 347-355, Special issue dedicated to Klaus Roggenkamp
on the occasion of his 60th birthday.
[MN] G. Mason and S-H. Ng, Central invariants and Frobenius-Schur indicators for semi-
simple qusi-Hopf algebras, Adv. Math. 190 (2005), 161-195.
[NS1] S-H. Ng and P. Schauenburg, Higher Frobenius-Schur indicators for pivotal categories,
preprint arXiv:math,QA/0503167.
[NS2] S-H. Ng and P. Schauenburg, Central invariants and higher indicators for semisimple
quasi-Hopf algebras, Transactions of the AMS, to appear, arXiv:math,QA/0508140.
[S] J-P. Serre, Linear Representation of Finite Groups, Springer-Verlag, New York, (1977).
Department of Mathematics, Technion-Israel Institute of Technology, Haifa 32000,
Israel
E-mail address: mohammad@tx.technion.ac.il, mohammad.abu-hamed@weizmann.ac.il
Department of Mathematics, Technion-Israel Institute of Technology, Haifa 32000,
Israel
E-mail address: gelaki@math.technion.ac.il
	1. Introduction
	2. Preliminaries
	3. Tits' theorem on the second indicator
	4. The Main results
	4.1. Proof of Theorem ??
	4.2. Proof of Corollary ??
	4.3. The proof of Theorem ??
	References
ABSTRACT
  Let g be a finite dimensional complex semisimple Lie algebra, and let V be a
finite dimensional represenation of g. We give a closed formula for the mth
Frobenius-Schur indicator, m>1, of V in representation-theoretic terms. We
deduce that the indicators take integer values, and that for a large enough m,
the mth indicator of V equals the dimension of the zero weight space of V. For
the classical Lie algebras sl(n), so(2n), so(2n+1) and sp(2n), this is the case
for m greater or equal to 2n-1, 4n-5, 4n-3 and 2n+1, respectively.

<|endoftext|><|startoftext|>
Introduction
Systems of D-branes at singularities provide a very interesting setup to realize and study
diverse non-perturbative gauge dynamics phenomena in string theory. In the context
of N = 1 supersymmetric gauge field theories, systems of D3-branes at Calabi-Yau
singularities lead to interesting families of tractable 4d strongly coupled conformal field
theories, which extend the AdS/CFT correspondence [1, 2, 3] to theories with reduced
(super)symmetry [4, 5, 6] and enable non-trivial precision tests of the correspondence
(see for instance [7, 8]). Addition of fractional branes leads to families of non-conformal
gauge theories, with intricate RG flows involving cascades of Seiberg dualities [9, 10,
11, 12, 13], and strong dynamics effects in the infrared.
For instance, fractional branes associated to complex deformations of the singular
geometry (denoted deformation fractional branes in [12]), correspond to supersym-
metric confinement of one or several gauge factors in the gauge theory [9, 12]. The
generic case of fractional branes associated to obstructed complex deformations (de-
noted DSB branes in [12]), corresponds to gauge theories developing a non-perturbative
Affleck-Dine-Seiberg superpotential, which removes the classical supersymmetric vacua
[14, 15, 16]. As shown in [15] (see also [17, 18]), assuming canonical Kahler potential
leads to a runaway potential for the theory, along a baryonic direction. A natural
suggestion to stop this runaway has been proposed for the particular example of the
dP1 theory (the theory on fractional branes at the complex cone over dP1) in [19]. It
was shown that, upon the addition of D7-branes to the configuration (which intro-
duce massive flavors), the theory develops a meta-stable minimum (closely related to
the Intriligator-Seiberg-Shih (ISS) model [20]), parametrically long-lived against decay
to the runaway regime (see [21] for an alternative suggestion to stop the runaway, in
compact models).
In this paper we show that the appearance of meta-stable minima in gauge theories
on DSB fractional branes, in the presence of additional massless flavors, is much more
general (and possibly valid in full generality). We use the tools of [15] to introduce
D7-branes on general toric singularities, and give masses to the corresponding flavors.
Since quiver gauge theories are rather involved, we develop new techniques to efficiently
analyze the one-loop stability of the meta-stable minima, via the direct computation
of Feynman diagrams. These tools can be used to argue that the results plausibly
hold for general systems of DSB fractional branes at toric singularities. It is very
satisfactory to verify the correspondence between the existence of meta-stable vacua
and the geometric property of having obstructed complex deformations.
The present work thus enlarges the class of string models realizing dynamical su-
persymmetry breaking in meta-stable vacua (see [22, 23, 24, 25, 26] for other proposed
realizations, and [27, 28, 29] for models of dynamical supersymmetry breaking in ori-
entifold theories). Although we will not discuss it in the present paper, these results
can be applied to the construction of models of gauge mediation in string theory as
in [30] (based on the additional tools in [31]), in analogy with [32]. This is another
motivation for the present work.
The paper is organized as follows. In Section 2 we review the ISS model, evaluating
one-loop pseudomoduli masses directly in terms of Feynman diagrams. In Section 3
we study the theory of DSB branes at the dP1 and dP2 singularities upon the addition
of flavors, and we find that metastable vacua exist for these theories. In Section 4
we extend this analysis to the general case of DSB branes at toric singularities with
massive flavors, and we illustrate the results by showing the existence of metastable
vacua for DSB branes at some well known families of toric singularities. Finally, the
Appendix provides some technical details that we have omitted from the main text in
order to improve the legibility.
2 The ISS model revisited
In this Section we review the ISS meta-stable minima in SQCD, and propose that the
analysis of the relevant piece of the one-loop potential (the quadratic terms around the
maximal symmetry point) is most simply carried out by direct evaluation of Feynman
diagrams. This new tool will be most useful in the study of the more involved examples
of quiver gauge theories.
2.1 The ISS metastable minimum
The ISS model [20] (see also [33] for a review of these and other models) is given by
N = 1 SU(Nc) theory with Nf flavors, with small masses
Welectric = mTrφφ̃, (2.1)
where φ and φ̃ are the quarks of the theory. The number of colors and flavors are
chosen so as to be in the free magnetic phase:
Nc + 1 ≤ Nf <
Nc. (2.2)
This condition guarantees that the Seiberg dual is infrared free. This Seiberg dual is
the SU(N) theory (with N = Nf −Nc) with Nf flavors of dual quarks q and q̃ and the
meson M . The dual superpotential is given by rewriting (2.1) in terms of the mesons
and adding the usual coupling between the meson and the dual quarks:
Wmagnetic = h (Tr q̃Mq − µ
2TrM), (2.3)
where h and µ can be expressed in terms of the parameters m and Λ, and some
(unknown) information about the dual Kähler metric1. It was also argued in [20] that
it is possible to study the supersymmetry breaking minimum in the origin of (dual)
field space without taking into account the gauge dynamics (their main effect in this
discussion consists of restoring supersymmetry dynamically far in field space). In the
following we will assume that this is always the case, and we will forget completely
about the gauge dynamics of the dual.
Once we forget about gauge dynamics, studying the vacua of the dual theory be-
comes a matter of solving the F-term equations coming from the superpotential (2.3).
The mesonic F-term equation reads:
− FMij = hq̃i · qj − hµ2δij = 0, (2.4)
where i and j are flavor indices and the dot denotes color contraction. This has no
solution, since the identity matrix δij has rank Nf while q̃
i · qj has rank N = Nf −Nc.
Thus this theory breaks supersymmetry spontaneously at tree level. This mechanism
for F-term supersymmetry breaking is called the rank condition.
The classical scalar potential has a continuous set of minima, but the one-loop
potential lifts all of the non-Goldstone directions, which are usually called pseudomod-
uli. The usual approach to study the one-loop stabilization is the computation of the
complete one-loop effective potential over all pseudomoduli space via the Coleman-
Weinberg formula [34]:
M4B log
−M4F log
. (2.5)
This approach has the advantage that it allows the determination of the one-loop
minimum, without a priori information about its location, and moreover it provides
the full potential around it, including higher terms. However, it has the disadvantage
1The exact expressions can be found in (5.7) in [20], but we will not need them for our analysis.
We just take all masses in the electric description to be small enough for the analysis of the metastable
vacuum to be reliable.
of requiring the diagonalization of the mass matrix, which very often does not admit a
closed expression, e.g. for the theories we are interested in.
In fact, we would like to point out that to determine the existence of a meta-stable
minimum there exists a computationally much simpler approach. In our situation, we
have a good ansatz for the location of the one-loop minimum, and are interested just in
the one-loop pseudomoduli masses around such point. This information can be directly
obtained by computing the one-loop masses via the relevant Feynman diagrams. This
technique is extremely economical, and provides results in closed form in full generality,
e.g. for general values of the couplings, etc. The correctness of the original ansatz for
the vacuum can eventually be confirmed by the results of the computation (namely
positive one-loop squared masses, and negligible tadpoles for the classically massive
fields 2).
Hence, our strategy to study the one-loop stabilization in this paper is as follows:
• First we choose an ansatz for the classical minimum to become the one-loop
vacuum. It is natural to propose a point of maximal enhanced symmetry (in
particular, close to the origin in the space of vevs for M there exist and R-
symmetry, whose breaking by gauge interactions (via anomalies) is negligible in
that region). Hence the natural candidate for the one-loop minimum is
q = q̃T =
, (2.6)
with the rest of the fields set to 0. This initial ansatz for the one-loop minimum
is eventually confirmed by the positive square masses at one-loop resulting from
the computations described below. In our more general discussion of meta-stable
minima in runaway quiver gauge theories, our ansatz for the one-loop minimum
is a direct generalization of the above (and is similarly eventually confirmed by
the one-loop mass computation).
• Then we expand the field linearly around this vacuum, and identify the set of
classically massless fields. We refer to these as pseudomoduli (with some abuse
of language, since there could be massless fields which are not classically flat
directions due to higher potential terms)
2Since supersymmetry is spontaneously broken the effective potential will get renormalized by
quantum effects, and thus classically massive fields might shift slightly. This appears as a one loop
tadpole which can be encoded as a small shift of µ. This will enter in the two loop computation of
the pseudomoduli masses, which are beyond the scope of the present paper.
• As a final step we compute one-loop masses for these pseudomoduli by evaluating
their two-point functions via conventional Feynman diagrams, as explained in
more detail in appendix A.1 and illustrated below in several examples.
The ISS model is a simple example where this technique can be illustrated. Con-
sidering the above ansatz for the vacuum, we expand the fields around this point as:
µ+ 1√
(ξ+ + ξ−)
(ρ+ + ρ−)
, q̃T =
µ+ 1√
(ξ+ − ξ−)
(ρ+ − ρ−)
, M =
Z̃T Φ
(2.7)
where we have taken linear combinations of the fields in such a way that the bosonic
mass matrix is diagonal. This will also be convenient in section 2.2, where we discuss
the Goldstone bosons in greater detail.
We now expand the superpotential (2.3) to get
2µξ+Y +
µZρ+ +
µZρ− +
µρ+Z̃ −
µρ−Z̃
ρ2+Φ−
ρ2−Φ− µ2Φ+ . . . , (2.8)
where we have not displayed terms of order three or higher in the fluctuations, unless
they contain Φ, since they are irrelevant for the one loop computation we will perform.
Note also that we have set h = 1 and we have removed the trace (the matricial structure
is easy to restore later on, here we just set Nf = 2 for simplicity). The massless bosonic
fluctuations are given by Re ρ+, Im ρ−, Φ and ξ−. The first two together with Im ξ− are
Goldstone bosons, as explained in section 2.2. Thus the pseudomoduli we are interested
in are given by Φ and Re ξ−. Let us focus on Φ (the case of Re ξ− admits a similar
discussion). In this case the relevant terms in the superpotential simplify further, and
just the following superpotential contributes:
W = µZ
(ρ+ + ρ−) + µZ̃
(ρ+ − ρ−) +
ρ2+Φ−
ρ2−Φ− µ2Φ+ . . . ,
which we recognize, up to a field redefinition, as the symmetric model of appendix A.2.
We can thus directly read the result
δm2Φ =
|h|4µ2
(log 4− 1). (2.9)
This matches the value given in [20], which was found using the Coleman-Weinberg
potential.
2.2 The Goldstone bosons
One aspect of our technique that merits some additional explanation concerns the
Goldstone bosons. The one-loop computation of the masses for the fluctuations associ-
ated to the symmetries broken by the vacuum, using just the interactions described in
appendix A.1, leads to a non-vanishing result. This puzzle is however easily solved by
realizing that certain (classically massive) fields have a one-loop tadpole. This leads to
a new contribution to the one-loop Goldstone two-point amplitude, given by the dia-
gram in Figure 1. Adding this contribution the total one-loop mass for the Goldstone
bosons is indeed vanishing, as expected. This tadpole does not affect the computation
of the one-loop pseudomoduli masses (except for Re ξ+, but its mass remains positive)
as it is straightforward to check.
Re ξ+
Figure 1: Schematic tadpole contribution to the Im ξ− two point function. Both bosons and
fermions run in the loop.
The structure of this cancellation can be understood by using the derivation of the
Goldstone theorem for the 1PI effective potential, as we now discuss. The proof can
be found in slightly more detail, together with other proofs, in [35]. Let us denote by
V the 1PI effective potential. Invariance of the action under a given symmetry implies
∆φi = 0, (2.10)
where we denote by ∆φi the variation of the field φi under the symmetry, which will
in general be a function of all the fields in the theory. Taking the derivative of this
equation with respect to some other field φk
δφiδφk
∆φi +
· δ∆φi
= 0. (2.11)
Let us consider how this applies to our case. At tree level, there is no tadpole and
the above equation (truncated at tree level) states that for each symmetry generator
broken by the vacuum, the value of ∆φi gives a nonvanishing eigenvector of the mass
matrix with zero eigenvalue. This is the classical version of the Goldstone theorem,
which allows the identification of the Goldstone bosons of the theory.
For instance, in the ISS model in the previous section (for Nf = 2), there are
three global symmetry generators broken at the minimum described around (2.6). The
SU(2) × U(1) symmetry of the potential gets broken down to a U(1)′, which can be
understood as a combination of the original U(1) and the tz generator of SU(2). The
Goldstone bosons can be taken to be the ones associated to the three generators of
SU(2), and correspond (for µ real) to Im ξ−, Im ρ− and Re ρ+, in the parametrization
of the fields given by equation (2.7).
Even in the absence of tree-level tadpoles, there could still be a one-loop tadpole.
When this happens, there should also be a non-trivial contribution to the mass term
for the Goldstone bosons in the one-loop 1PI potential, related to the tadpole by the
one-loop version of (2.11). This relation guarantees that the mass term in the physical
(i.e. Wilsonian) effective potential, which includes the 1PI contribution, plus those of
the diagram in Figure 1, vanishes, as we described above.
In fact, in the ISS example, there is a non-vanishing one-loop tadpole for the real
part of ξ+ (and no tadpole for other fields). The calculation of the tadpole at one loop
is straightforward, and we will only present here the result
iM = −i|h|
(4π)2
(2 log 2). (2.12)
The 1PI one-loop contribution to the Goldstone boson mass is also simple to calculate,
giving the result
iM = −i|h|
(4π)2
(log 2). (2.13)
Using the variations of the relevant fields under the symmetry generator, e.g. for tz,
∆Re ξ+ = −Im ξ− (2.14)
∆Im ξ− = Re ξ+ + 2µ. (2.15)
we find that the (2.11) is satisfied at one-loop.
δφiδφk
∆φi +
· δ∆φi
= m2Im ξ− · 2µ+ (Re ξ+tadpole) · (−1) = 0. (2.16)
A very similar discussion applies to tx and ty.
The above discussion of Goldstone bosons can be similarly carried out in all ex-
amples of this paper. Hence, it will be enough to carry out the computation of the
1PI diagrams discussed in appendix A.1, and verify that they lead to positive squared
masses for all classically massless fields (with Goldstone bosons rendered massless by
the additional diagrams involving the tadpole).
3 Meta-stable vacua in quiver gauge theories with
DSB branes
In this section we show the existence of a meta-stable vacuum in a few examples
of gauge theories on DSB branes, upon the addition of massive flavors. As already
discussed in [19], the choice of fractional branes of DSB kind is crucial in the result.
The reason is that in order to have the ISS structure, and in particular supersymmetry
breaking by the rank condition, one needs a node such that its Seiberg dual satisfies
Nf > N , with N = Nf − Nc with Nc, Nf the number of colors, flavors of that gauge
factor. Denoting Nf,0, Nf,1 the number of massless and massive flavors (namely flavors
arising from bi-fundamentals of the original D3-brane quiver, or introduced by the D7-
branes), the condition is equivalent to Nf,0 < Nc. This is precisely the condition that
an ADS superpotential is generated, and is the prototypical behavior of DSB branes
[14, 15, 16, 18].
Another important general comment, also discussed in [19], is that theories on DSB
branes generically contain one or more chiral multiplets which do not appear in the
superpotential. Being decoupled, such fields remain as accidental flat directions at
one-loop, so that the one-loop minimum is not isolated. The proper treatment of these
flat directions is beyond the reach of present tools, so they remain an open question.
However, it is plausible that they do not induce a runaway behavior to infinity, since
they parametrize a direction orthogonal to the fields parametrizing the runaway of
DSB fractional branes.
3.1 The complex cone over dP1
In this section we describe the most familiar example of quiver gauge theory with DSB
fractional branes, the dP1 theory. In this theory, a non-perturbative superpotential
removes the classical supersymmetric vacua [14, 15, 16]. Assuming canonical Kähler
potential the theory has a runaway behavior [15, 17]. In this section, we revisit with
our techniques the result in [19] that the addition of massive flavors can induce the ap-
pearance of meta-stable supersymmetry breaking minima, long-lived against tunneling
to the runaway regime. As we show in coming sections, this behavior is prototypical
and extends to many other theories with DSB fractional branes. The example is also
representative of the computations for a general quiver coming from a brane at a toric
singularity, and illustrates the usefulness of the direct Feynman diagram evaluation of
one-loop masses.
Consider the dP1 theory, realized on a set ofM fractional D3-branes at the complex
cone over dP1. In order to introduce additional flavors, we introduce sets of Nf,1
D7-branes wrapping non-compact 4-cycles on the geometry and passing through the
singular point. We refer the reader to [19], and also to later sections, for more details on
the construction of the theory, and in particular on the introduction of the D7-branes.
Its quiver is shown in Figure 2, and its superpotential is
W = λ(X23X31Y12 −X23Y31X12)
+ λ′(Q3iQ̃i2X23 +Q2jQ̃j1X12 +Q1kQ̃k3X31)
+ m3Q3iQ̃k3δik +m2Q2jQ̃i2δji +m1Q1kQ̃j1δkj , (3.1)
where the subindices denote the groups under which the field is charged. The first
line is the superpotential of the theory of fractional brane, the second line describes
77-73-37 couplings between the flavor branes and the fractional brane, and the last line
gives the flavor masses. Note that there is a massless field, denoted Z12 in [19], that
does not appear in the superpotential. This is one of the decoupled fields mentioned
above, and we leave its treatment as an open question.
SU(3M)
SU(2M) SU(M)
PSfrag replacements Q3i
Q2j Q̃j1
Figure 2: Extended quiver diagram for a dP1 theory with flavors, from [19].
We are interested in gauge factors in the free magnetic phase. This is the case for
the SU(3M) gauge factor in the regime
M + 1 ≤ Nf,1 <
M. (3.2)
To apply Seiberg duality on node 3, we introduce the dual mesons:
M21 =
X23X31 ; Nk1 =
Q̃k3X31
M ′21 =
X23Y31 ; N
Q̃k3Y31
N2i =
X23Q3i ; Φki =
Q̃k3Q3i
(3.3)
and we also replace the electric quarks Q3i, Q̃k3, X23, X31, Y31 by their magnetic duals
Q̃i3, Q3k, X32, X13, Y13. The magnetic superpotential is given by rewriting the confined
fields in terms of the mesons and adding the coupling between the mesons and the dual
quarks,
W = h (M21X13X32 + M
21Y13X32 + N2iQ̃i3X32
+ Nk1X13Q3k + N
k1Y13Q3k + ΦkiQ̃i3Q3k )
+ hµ0 (M21Y12 − M ′21X12 ) + µ′Q1kNk1 + µ′N2iQ̃i2
− hµ 2TrΦ + λ′Q2jQ̃j1X12 + m2Q2iQ̃i2 + m1Q1iQ̃i1. (3.4)
This is the theory we want to study. In order to simplify the treatment of this example
we will disregard any subleading terms in mi/µ
′, and effectively integrate out Nk1 and
N2i by substituting them by 0. This is not necessary, and indeed the computations in
the next sections are exact. We do it here in order to compare results with [19].
As in the ISS model, this theory breaks supersymmetry via the rank condition. The
fields Q̃i3, Q3k and Φki are the analogs of q, q̃ and M in the ISS case discussed above.
This motivates a vacuum ansatz analogous to (2.6) and the following linear expansion:
φ00 φ01
φ10 φ11
; Q̃i3 =
µeθ +Q3,1
Q̃3,2
; QT3i =
µe−θ +Q3,1
Q̃k1 =
Q̃1,1
; Q2j =
Q2,11 x
Q2,21 x
; M21 =
M21,1
M21,2
Y13 = (Y13) ; X
X12,1
X12,2
; XT32 =
X32,1
X32,2
Y T12 =
Y12,1
Y12,2
; N ′k1 =
N ′k1,1
; M ′21 =
M ′21,1
M ′21,2
X13 = (X13) .
(3.5)
Note that we have chosen to introduce the nonlinear expansion in θ in order to re-
produce the results found in the literature in their exact form3. Note also that for
the sake of clarity we have not been explicit about the ranks of the different matrices.
They can be easily worked out (or for this case, looked up in [19]), and we will restrict
ourselves to the 2 flavor case where the matrix structure is trivial. As a last remark,
we are not being explicit either about the definitions of the different couplings in terms
of the electric theory. This can be done easily (and as in the ISS case they involve
3A linear expansion would lead to identical conclusions concerning the existence of the meta-stable
vacua, but to one-loop masses not directly amenable to comparison with results in the literature.
an unknown coefficient in the Kähler potential), but in any event, the existence of
the meta-stable vacua can be established for general values of the coefficients in the
superpotential. Hence we skip this more detailed but not very relevant discussion.
The next step consists in expanding the superpotential and identifying the massless
fields. We get the following quadratic contributions to the superpotential:
Wmass = 2hµφ00Q̃3,1 + hµφ01Q̃3,2 + hµφ10Q3,2
+ hµ0M21,1Y12,1 + hµ0M21,2Y12,2 − λ′M ′21,1X12,1 − λ′M ′21,2X12,2
+ hµN ′k1,1Y13 − h1µQ̃1,1X13 − h2µQ2,11X32,1 − h2µQ2,21X32,2. (3.6)
The fields massless at tree level are x, x′, y, z, φ11, θ, Q3,2 and Q̃3,2. Three of these
are Goldstone bosons as described in the previous section. For real µ they are Im θ,
Re (Q̃3,2 +Q3,2) and Im (Q̃3,2 −Q3,2). We now show that all other classically massless
fields get masses at one loop (with positive squared masses).
As a first step towards finding the one-loop correction, notice that the supersym-
metry breaking mechanism is extremely similar to the one in the ISS model before, in
particular it comes only from the following couplings in the superpotential:
Wrank = hQ3,2Q̃3,2φ11 − hµ2φ11 + . . . (3.7)
This breaks the spectrum degeneracy in the multiplets Q3,2 and Q̃3,2 at tree level, so
we refer to them as the fields with broken supersymmetry.
Let us compute now the correction for the mass of x, for example. For the one-loop
computation we just need the cubic terms involving one pseudomodulus and at least
one of the broken supersymmetry fields, and any quadratic term involving fields present
in the previous set of couplings. From the complete expansion one finds the following
supersymmetry breaking sector:
Wsymm. = hφ11Q3,2Q̃3,2 + hµφ01Q̃3,2 + hµφ10Q3,2 − hµ2φ11. (3.8)
The only cubic term involving the pseudomodulus x and the broken supersymmetry
fields is
Wcubic = −h2 x Q̃3,2X32,1, (3.9)
and there is a quadratic term involving the field X32,1
Wmass coupling = −h2µQ2,11X32,1. (3.10)
Assembling the three previous equations, the resulting superpotential corresponds to
the asymmetric model in appendix A.2, so we can directly obtain the one-loop mass
for x:
δm2x =
|h|4µ2C
|h2|2
. (3.11)
Proceeding in a similar way, the one-loop masses for φ11, x
′, y and z are:
δm2φ11 =
|h|4µ2(log 4− 1)
δm2x′ =
|h|4µ2C
|h2|2
δm2y =
|h|4µ2C
|h1|2
δm2z =
|h|4µ2(log 4− 1). (3.12)
There is just one pseudomodulus left, Re θ, which is qualitatively different to the
others. With similar reasoning, one concludes that it is necessary to study a superpo-
tential of the form
W = h(Xφ1φ2 + µe
θφ1φ3 + µe
−θφ2φ4 − µ2X). (3.13)
Due to the non-linear parametrization, the expansion in θ shows that there is a term
quadratic in θ which contributes to the one-loop mass via a vertex with two bosons and
two fermions, the relevant diagram is shown in Figure 16d. The result is a vanishing
mass for Im θ, as expected for a Goldstone boson (the one-loop tadpole vanishes in this
case), and a non-vanishing mass for Re θ
δm2Re θ =
|h|4µ4(log 4− 1). (3.14)
We conclude by mentioning that all squared masses are positive, thus confirming
that the proposed point in field space is the one-loop minimum. As shown in [19], this
minimum is parametrically long-lived against tunneling to the runaway regime.
3.2 Additional examples: The dP2 case
Let us apply these techniques to consider new examples. In this section we consider
a DSB fractional brane in the complex cone over dP2, which provides another quiver
theory with runaway behavior [15]. The quiver diagram for dP2 is given in Figure 3,
with superpotential
W = X34X45X53 −X53Y31X15 −X34X42Y23 + Y23X31X15X52
+ X42X23Y31X14 −X23X31X14X45X52 (3.15)
Figure 3: Quiver diagram for the dP2 theory.
We consider a set of M DSB fractional branes, corresponding to choosing ranks
(M, 0,M, 0, 2M) for the corresponding gauge factors. The resulting quiver is shown in
Figure 4, with superpotential
W = −λX53Y31X15 (3.16)
U(2M)
U(M)U(M)
Figure 4: Quiver diagram for the dP2 theory with M DSB fractional branes.
Following [19] and appendix B, one can introduce D7-branes leading to D3-D7
open strings providing (possibly massive) flavors for all gauge factors, and having cubic
couplings with diverse D3-D3 bifundamental chiral multiplets. We obtain the quiver
in Figure 5. Adding the cubic 33-37-73 coupling superpotential, and the flavor masses,
the complete superpotential reads
Wtotal = −λX53Y31X15 − λ′(Q1iQ̃i3Y31 +Q3jQ̃j5X53 +Q5kQ̃k1X15)
+ m1Q1iQ̃k1 +m2Q3jQ̃i3 +m5Q5kQ̃j5 (3.17)
where 1, 2, 3 are the gauge group indices and i, j, k are the flavor indices.
We consider the U(2M) node in the free magnetic phase, namely
M + 1 ≤ Nf,1 < 2M (3.18)
U(M) U(M)
U(2M)
PSfrag replacements
Q1i Qi3
Figure 5: Quiver for the dP2 theory with M fractional branes and flavors.
After Seiberg Duality the dual gauge factor is SU(N) withN = Nf,1−M and dynamical
scale Λ. To get the matter content in the dual, we replace the microscopic flavors Q5k,
Q̃j5, X53, X15 by the dual flavors Q̃k5, Q5j , X35, X51 respectively. We also have the
mesons related to the fields in the electric theory by
M1k =
X15Q5K ; Ñj3 =
Q̃j5X53
M13 =
X15X53 ; Φ̃jk =
Q̃j5Q5k
(3.19)
There is a cubic superpotential coupling the mesons and the dual flavors
Wmes. = h (M1kQ̃k5X51 + M13X35X51 + Ñj3X35Q5j + Φ̃jkQ̃k5Q5j ) (3.20)
where h = Λ/Λ̂ with Λ̂ given by Λ
3Nc−Nf
elect Λ
3(Nf−Nc)−Nf = Λ̂Nf , where Λelect is the
dynamical scale of the electric theory. Writing the classical superpotential terms of the
new fields gives
Wclas. = −hµ0M13Y31 + λ′Q1iQ̃i3Y31 + µ′ Ñj3Q3j + µ′M1kQ̃k1
+ m1Q1iQ̃k1 + m3Q3jQ̃i3 − hµ 2TrΦ (3.21)
where µ0 = λΛ, µ
′ = λ′Λ, and µ 2 = −m5Λ̂. So the complete superpotential in the
Seiberg dual is
Wdual = −hµ0M13Y31 + λ′Q1iQ̃i3Y31 + µ′ Ñj3Q3j + µ′M1kQ̃k1
+ m1Q1iQ̃k1 + m3Q3jQ̃i3 − hµ 2TrΦ
+ h (M1kQ̃k5X51 + M13X35X51 + Ñj3X35Q5j + Φ̃jkQ̃k5Q5j ) (3.22)
This superpotential has a sector completely analogous to the ISS model, triggering
supersymmetry breaking by the rank condition. This suggests the following ansatz for
the point to become the one-loop vacuum
Q5k = Q̃
, (3.23)
with all other vevs set to zero. Following our technique as explained above, we expand
fields at linear order around this point. Focusing on Nf,1 = 2 and Nc = 1 for simplicity
(the general case can be easily recovered), we have
Q̃k5 =
µ+ δQ̃5,1
δQ̃5,2
; Q5k = (µ+ δQ5,1 ; δQ5,2) ; Φ =
δΦ0,0 δΦ0,1
δΦ1,0 δΦ1,1
Q̃k1 =
δQ̃1,1
δQ̃1,2
; Q1i = (δQ1,1 ; δQ1,2) ; Q̃i3 =
δQ̃3,1
δQ̃3,2
; Q3j = (δQ3,1 ; δQ3,2)
Ñj3 =
δÑ3,1
δÑ3,2
; M1k = (δM1,1 ; δM1,2) ; M13 = δM13 ; Y31 = δY31 ; X51 = δX51
X35 = δX35
(3.24)
Inserting this into equation (3.22) gives
Wdual = −hµ0 δM13δY31 + λ′ δQ1,1δQ̃3,1δY31 + λ′ δQ1,2δQ̃3,2δY31
+ µ′ δÑ3,1δQ3,1 + µ
′ δÑ3,2δQ3,2 + µ
′ δM1,1δQ̃1,1 + µ
′ δM1,2δQ̃1,2
+ m1δQ1,1δQ̃1,1 + m1δQ1,2δQ̃1,2 + m3δQ3,1δQ̃3,1 + m3δQ3,2δQ̃3,2
− hµ 2δΦ11 + h (µδM1,1δX51 + δM1,1δQ̃5,1δX51 + δM1,2δQ̃5,2δX51
+ δM13δX35δX51 + µδX35δÑ3,1 + δX35δÑ3,1δQ5,1 + δX35δÑ3,2δQ5,2
+ µδQ̃5,1δΦ00 + µδQ5,1δΦ00 + δQ5,1δQ̃5,1δΦ00 + µδΦ01δQ̃5,2
+ δQ5,1δΦ01δQ̃5,2 + µδΦ10δQ5,2 + δQ̃5,1δΦ10δQ5,2 + δQ̃5,2δΦ11δQ5,2).
We now need to identify the pseudomoduli, in other words the massless fluctuations at
tree level. We focus then just on the quadratic terms in the superpotential
Wmass = −hµ0 δM13δY31
+ µ′ δÑ3,1δQ3,1 +m3δQ3,1δQ̃3,1 + hµδX35δÑ3,1
+ µ′ δÑ3,2δQ3,2 +m3δQ3,2δQ̃3,2
+ µ′ δM1,1δQ̃1,1 + m1δQ1,1δQ̃1,1 + hµδM1,1δX51
+ µ′ δM1,2δQ̃1,2 + m1δQ1,2δQ̃1,2
+ hµδQ̃5,1δΦ00 + hµδQ5,1δΦ00
+ hµδΦ01δQ̃5,2 + µδΦ10δQ5,2. (3.25)
We have displayed the superpotential so that fields mixing at the quadratic level appear
in the same line. In order to identify the pseudomoduli we have to diagonalize4 these
fields. Note that the structure of the mass terms corresponds to the one in appendix C,
in particular around equation (C.9). From the analysis performed there we know that
upon diagonalization, fields mixing in groups of four (i.e., three mixing terms in the
superpotential, for example the δM1,1, δQ̃1,1, δQ1,1, δX51 mixing) get nonzero masses,
while fields mixing in groups of three (two mixing terms in the superpotential, for
example δM1,2, δQ̃1,2 and δQ1,2) give rise to two massive perturbations and a massless
one, a pseudomodulus. We then just need to study the fate of the pseudomoduli. From
the analysis in appendix C, the pseudomoduli coming from the mixing terms are
Y1 = m3δÑ3,2 − µ′δQ̃3,2 ,
Y2 = m1δM1,2 − µ′δQ1,2 ,
Y3 = hµ(δQ5,1 − δQ̃5,1) . (3.26)
In order to continue the analysis, one just needs to change basis to the diagonal fields
and notice that the one loop contributions to the pseudomoduli are described again by
the asymmetric model of appendix A.2, so they receive positive definite contributions.
The exact analytic expressions can be easily found with the help of some computer
algebra program, but we omit them here since they are quite unwieldy.
4 The general case
In the previous section we showed that several examples of quiver gauge theories on
DSB fractional branes have metastable vacua once additional flavors are included.
In this section we generalize the arguments for general DSB branes. We will show
how to add D7–branes in a specific manner so as to generate the appropriate cubic
flavor couplings and mass terms. Once this is achieved, we describe the structure of
the Seiberg dual theory. The results of our analysis show that, with the specified
configuration of D7–branes, the determination of metastability is greatly simplified
and only involves looking at the original superpotential. Thus, although we do not
prove that DSB branes on arbitrary singularities generate metastable vacua, we show
how one can determine the existence of metastability in a very simple and systematic
4As a technical remark, let us note that it is possible to set all the mass terms to be real by an
appropriate redefinition of the fields, so we are diagonalizing a real symmetric matrix.
manner. Using this analysis we show further examples of metastable vacua on systems
of DSB branes.
4.1 The general argument
4.1.1 Construction of the flavored theories
Consider a general quiver gauge theory arising from branes at singularities. As we have
argued previously, we focus on DSB branes, so that there is a gauge factor satisfying
Nf,0 < Nc, which can lead to supersymmetry breaking by the rank condition in its
Seiberg dual. To make the general analysis more concrete, let us consider a quiver
like that in Figure 6, which is characteristic enough, and let us assume that the gauge
factor to be dualized corresponds to node 2. In what follows we analyze the structure
of the fields and couplings in the Seiberg dual, and reduce the problem of studying the
meta-stability of the theory with flavors to analyzing the structure of the theory in the
absence of flavors.
PSfrag replacements
X21 Y21
Y32 Z32
X43 Y43
Figure 6: Quiver diagram used to illustrate general results. It does not correspond to any
geometry in particular.
The first step is the introduction of flavors in the theory. As discussed in [19], for any
bi-fundamental Xab of the D3-brane quiver gauge theory there exist a supersymmetric
D7-brane leading to flavors Qbi, Q̃ia in the fundamental (antifundamental) of the b
(ath) gauge factor. There is also a cubic coupling XabQbiQ̃ia. Let us now specify a
concrete set of D7-branes to introduce flavors in our quiver gauge theory. Consider a
superpotential coupling of the D3-brane quiver gauge theory, involving fields charged
under the node to be dualized. This corresponds to a loop in the quiver, involving node
2, for instance X32X21X14Y43 in Figure 6. For any bi-fundamental chiral multiplet in
this coupling, we introduce a set of Nf,1 of the corresponding D7-brane. This leads
to a set of flavors for the different gauge factors, in a way consistent with anomaly
cancellation, such as that shown in Figure 7. The description of this system of D7-
branes in terms of dimer diagrams is carried out in Appendix B. The cubic couplings
described above lead to the superpotential terms5
Wflavor = λ
′ (X32Q2bQb3 + X21Q1aQa2 + X14Q4dQd1 + Y43Q3cQc4 ) (4.1)
Finally, we introduce mass terms for all flavors of all involved gauge factors:
Wmass = m2Qa2Q2b + m3Qb3Q3c + m4Qc4Q4d + m1Qd1Q1a (4.2)
These mass terms break the flavor group into a diagonal subgroup.
PSfrag replacements
X21 Y21
Y32 Z32
X43 Y43
Q2b Qb3
Q4dQd1
Figure 7: Quiver diagram with flavors. White nodes denote flavor groups.
4.1.2 Seiberg duality and one-loop masses
We consider introducing a number of massive flavors such that node 2 is in the free
magnetic phase, and consider its Seiberg dual. The only relevant fields in this case are
those charged under gauge factor 2, as shown if Figure 8. The Seiberg dual gives us
Figure 9 where the M ’s are mesons with indices in the gauge groups, R’s and S’s are
5Here we assume the same coupling, but the conclusions hold for arbitrary non-zero couplings.
PSfrag replacements
X21 Y21
Y32 Z32Qa2
Figure 8: Relevant part of quiver before Seiberg duality.
PSfrag replacements
X̃12 Ỹ12
Ỹ23Z̃23
M1, . . . ,M6
Figure 9: Relevant part of the quiver after Seiberg duality on node 2.
mesons with only one index in the flavor group, and Xab is a meson with both indices
in the flavor groups. The original cubic superpotential and flavor mass superpotentials
become
Wflavor dual = λ
′ (S13bQb3 + R
a1Q1a + X14Q4dQd1 + Y43Q3cQc4 )
Wmass dual = m2Xab + m3Qb3Q3c + m4Qc4Q4d + m1Qd1Q1a (4.3)
In addition we have the extra meson superpotential
Wmesons = h (XabQ̃b2Q̃2a + R
a1X̃12Q̃2a + R
a1Ỹ12Q̃2a + S
3bQ̃b2X̃23 + S
3bQ̃b2Ỹ23
+ S33bQ̃b2Z̃23 + M
31X̃12X̃23 + M
31X̃12Ỹ23 + M
31X̃12Z̃23
+ M431Ỹ12X̃23 + M
31Ỹ12Ỹ23 + M
31Ỹ12Z̃23 ). (4.4)
The crucial point is that we always obtain terms of the kind underlined above, namely
a piece of the superpotential reading m2Xab + hXabQ̃b2Q̃2a. This leads to tree level
supersymmetry breaking by the rank condition, as announced. Moreover the superpo-
tential fits in the structure of the generalized asymmetric O’Raifeartaigh model studied
in appendix A.2, with Xab, Q̃b2, Q̃2a corresponding to X , φ1, φ2 respectively. The mul-
tiplets Q̃b2 and Q̃2a are split at tree level, and Xab is massive at 1-loop. From our
study of the generalized asymmetric case, any field which has a cubic coupling to the
supersymmetry breaking fields Q̃b2 or Q̃2a is one-loop massive as well. Using the gen-
eral structure of Wmesons, a little thought shows that all dual quarks with no flavor
index (e.g. X̃ , Ỹ ) and all mesons with one flavor index (e.g. R or S) couple to the
supersymmetry breaking fields.
Thus they all get one-loop masses (with positive squared mass). Finally, the flavors
of other gauge factors (e.g. Qb3) are massive at tree level from Wmass.
The bottom line is that the only fields which do not get mass from these interac-
tions are the mesons with no flavor index, and the bi-fundamentals which do not get
dualized (uncharged under node 2). All these fields are related to the theory in the
absence of extra flavors, so they can be already stabilized at tree-level from the original
superpotential. So, the criteria for a metastable vacua is that the original theory, in
the absence of flavors leads, after dualization of the node with Nf < Nc, to masses for
all these fields (or more mildly that they correspond to directions stabilized by mass
terms, or perhaps higher order superpotential terms).
For example, if we apply this criteria to the dP2 case studied previously, the original
superpotential for the fractional DSB brane is
W = −λX53Y31X15 (4.5)
so after dualization we get
W = −λM13Y31 (4.6)
which makes these fields massive. Hence this fractional brane, after adding the D7-
branes in the appropriate configuration, will generate a metastable vacua will all moduli
stabilized.
The argument is completely general, and leads to an enormous simplification in
the study of the theories. In the next section we describe several examples. A more
rigorous and elaborate proof is provided in the appendix where we take into account the
matricial structure, and show that all fields, except for Goldstone bosons, get positive
squared masses at tree-level or at one-loop.
4.2 Additional examples
4.2.1 The dP3 case
Let us consider the complex cone over dP3, and introduce fractional DSB branes of the
kind considered in [15]. The quiver is shown in Figure 10 and the superpotential is
W = X13X35X51 (4.7)
Node 1 has Nf < Nc so upon addition of massive flavors and dualization will lead
to supersymmetry breaking by the rank condition. Following the procedure of the
previous section, we add Nf,1 flavors coupling to the bi-fundamentals X13, X35 and
X51. Node 1 is in the free magnetic phase for P +1 ≤ Nf,1 < 32P +
. Dualizing node
1, the above superpotential becomes
W = X35M53 (4.8)
where M53 is the meson X51X13. So, following the results of the previous section, we
can conclude that this DSB fractional brane generates a metastable vacua with all
pseudomoduli lifted.
4.2.2 Phase 1 of PdP4
Let us consider the PdP4 theory, and introduce the DSB fractional brane of the kind
considered in [15]. The quiver is shown in Figure 11 . The superpotential is
W = −X25X51X12 (4.9)
U(P+1)
Figure 10: Quiver diagram for the dP3 theory with a DSB fractional brane.
U(M+P)
Figure 11: Quiver diagram for the dP4 theory with a DSB fractional branes.
Node 1 has Nf < Nc and will lead to supersymmetry breaking by the rank condition in
the dual. Following the procedure of the previous section, we add Nf,1 flavors coupling
to the bi-fundamentals X12, X25 and X51. Node 1 is in the free magnetic phase for
P + 2 ≤M +Nf,1 < 32(M + P ). Dualizing node 1, the above superpotential becomes
W = X25M52, where M53 is the meson X51X12. Again we conclude that this DSB
fractional brane generates a metastable vacua with all pseudomoduli lifted.
4.2.3 The Y p,q family
Consider D3-branes at the real cones over the Y p,q Sasaki-Einstein manifolds [36, 37,
38, 39], whose field theory were determined in [8]. The theory admits a fractional brane
[13] of DSB kind, which namely breaks supersymmetry and lead to runaway behavior
[15, 18]. The analysis of metastability upon addition of massive flavors for arbitrary
Y p,q’s is much more involved than previous examples. Already the description of the
field theory on the fractional brane is complicated. Even for the simpler cases of Y p,q
and Y p,p−1 the superpotential contains many terms. In this section we do not provide a
general proof of metastability, but rather consider the more modest aim of showing that
all directions related to the runaway behavior in the absence of flavors are stabilized by
the addition of flavors. We expect that this will guarantee full metastability, since the
fields not involved in our analysis parametrize directions orthogonal to the runaway at
infinity.
The dimer for Y p,q is shown in Figure 12 and consists of a column of n hexagons and
2m quadrilaterals which are just halved hexagons [18]. The labels (n,m) are related
to (p, q) by
n = 2q ; m = p− q (4.10)
• The Y p,1 case
The dimer for the theory on the DSB fractional brane in the Y p,1 case is shown
in Figure 13, a periodic array of a column of two full hexagons, followed by p− 1 cut
hexagons (the shaded quadrilateral has Nc = 0). As shown in [18], the top quadrilateral
which has Nf < Nc, and induces the ADS superpotential triggering the runaway. The
relevant part of the dimer is shown in Figure 14, where V1 and V2 are the fields that run
to infinity [18]. This node will lead to supersymmetry breaking by the rank condition
in the dual. It is in the free magnetic phase for M + 1 ≤ Nf,1 < pM + M2 . The piece
Figure 12: The generic dimer for Y p,q, from [18].
of the superpotential involving the V1 and V2 terms is
W = Y U2V2 − Y U1V1. (4.11)
In the dual theory, the dual superpotential makes the fields massive. Hence, the theory
has a metastable vacua where the runaway fields are stabilized.
Figure 13: The dimer for Y p,1.
(p−1)M
(p−2)M
(2p−1)M
(p−1)M
(p+1)M
Figure 14: Top part of the dimer for Y p,1. The hexagons are labeled by the ranks of
the respective gauge groups
• The Y p,p−1 case
The analysis for Y p,p−1 is similar but in this case it is the bottom quadrilateral
which has the highest rank and thus gives the ADS superpotential [18]. The relevant
part of the dimer is shown in Figure 15, and the runaway direction is described by the
fields V1 and V2. Upon addition of Nf,1 flavors, the relevant node in the in the free
magnetic phase for M + 1 ≤ Nf,1 < pM + M2 Considering the superpotential, it is
straightforward to show that the runaway fields become massive. Complementing this
with our analysis in previous section, we conclude that the theory has a metastable
vacua where the runaway fields are stabilized.
We have thus shown that we can obtain metastable vacua for fractional branes at
cones over the Y p,1 and Y p,p−1 geometries. Although there is no obvious generalization
for arbitrary Y p,q’s, our results strongly suggest that the existence of metastable vacua
extends to the complete family.
5 Conclusions and outlook
The present work introduces techniques and computations which suggest that the ex-
istence of metastable supersymmetry breaking vacua is a general property of quiver
gauge theories on DSB fractional branes, namely fractional branes associated to ob-
structed complex deformations. It is very satisfactory to verify the correlation between
a non-trivial dynamical property in gauge theories and a geometric property in their
(p−1)M
(p−1)M
(2p−1)M
(p−2)M
(2p−2)M
(2p−2)M
Figure 15: Bottom part of the dimer for Y p,p−1. The hexagons are labeled by the ranks
of the respective gauge groups
string theory realization. The existence of such correlation fits nicely with the remark-
able properties of gauge theories on D-branes at singularities, and the gauge/gravity
correspondence for fractional branes.
Beyond the fact that our arguments do not constitute a general proof, our analysis
has left a number of interesting open questions. In fact, as we have mentioned, all
theories on DSB fractional branes contain one or several fields which do not appear
in the superpotential. We expect the presence of these fields to have a direct physical
interpretation, which has not been uncovered hitherto. It would be interesting to find
a natural explanation for them.
Finally, a possible extension of our results concerns D-branes at orientifold singular-
ities, which can lead to supersymmetry breaking and runaway as in [27]. Interestingly,
in this case the field theory analysis is more challenging, since they would require
Seiberg dualities of gauge factors with matter in two-index tensors. It is very possible
that the string theory realization, and the geometry of the singularity provide a much
more powerful tool to study the system.
Overall, we expect other surprises and interesting relations to come up from further
study of D-branes at singularities.
Acknowledgments
We thank S. Franco for useful discussions. A.U. thanks M. González for encouragement
and support. This work has been supported by the European Commission under RTN
European Programs MRTN-CT-2004-503369, MRTN-CT-2004-005105, by the CICYT
(Spain), and by the Comunidad de Madrid under project HEPHACOS P-ESP-00346.
The research by I.G.-E. is supported by the Gobierno Vasco PhD fellowship program.
The research of F.S is supported by the Ministerio de Educación y Ciencia through an
FPU grant. I.G.-E. and F.S. thank the CERN Theory Division for hospitality during
the completion of this work.
A Technical details about the calculation via Feyn-
man diagrams
A.1 The basic amplitudes
In the main text we are interested in computing two point functions for the pseudo-
moduli at one loop, and in section 2.2 also tadpole diagrams. There are just a few kinds
of diagrams entering in the calculation, which we will present now for the two-point
function, see Figure 16. The (real) bosonic fields are denoted by φi and the (Weyl)
fermions by ψi. The pseudomodulus we are interested in is denoted by ϕ.
c) d)
a) b)
ϕ ϕ ϕ
ϕ ϕ ϕ
Figure 16: Feynman diagrams contributing to the one-loop two point function. The dashed
line denotes bosons and the solid one fermions.
Bosonic contributions
These come from two terms in the Lagrangian. First there is a diagram coming from
terms of the form (Figure 16b):
L = . . .+ λϕ2φ2 − 1
m2φ2, (A.1)
giving an amplitude (we will be using dimensional regularization)
iM = −2iλ
(4π)2
− γ + 1 + log 4π − logm2
. (A.2)
The other contribution comes from the diagram in Figure 16a:
L = . . .+ λϕφ1φ2 −
2, (A.3)
which contributes to the two point function with an amplitude:
iM = iλ
(4π)2
− γ + log 4π −
dx log∆
, (A.4)
where here and in the following we denote ∆ ≡ xm21 + (1− x)m22.
Fermionic contributions
The relevant vertices here are again of two possible kinds, one of which is nonrenor-
malizable. The cubic interaction comes from terms in the Lagrangian given by the
diagram in Figure 16c:
L = . . .+ ϕ(aψ1ψ2 + a∗ψ̄1ψ̄2) +
1 + ψ̄
2 + ψ̄
2). (A.5)
We are assuming real masses for the fermions here, in the configurations we study this
can always be achieved by an appropriate field redefinition. The contribution from
such vertices is given by:
−2im1m2
(4π)2
(a2 + (a2)∗)
− γ + log 4π − log∆
− 8i|a|
(4π)2
− γ + log 4π + 1
− log∆
. (A.6)
The other fermionic contribution, which one does not need as long as one is dealing
with renormalizable interactions only (but we will need in the main text when analyzing
the pseudomodulus θ), is given by terms in the Lagrangian of the form (Figure 16d):
L = . . .+ λϕ2(ψ2 + ψ̄2) + 1
m(ψ2 + ψ̄2), (A.7)
which contributes to the total amplitude with:
iM = 8λmi
(4π)2
− γ + 1 + log 4π − logm2
. (A.8)
A.2 The basic superpotentials
The previous amplitudes are the basic ingredients entering the computation, but in
general the number of diagrams contributing to the two point amplitudes is quite
big, so calculating all the contributions by hand can get quite involved in particular
examples6. Happily, one finds that complicated models (such as dP1 or dP2, studied in
the main text) reduce to performing the analysis for only two different superpotentials,
which we analyze in this section.
The symmetric case
We want to study in this section a superpotential of the form:
W = h(Xφ1φ2 + µφ1φ3 + µφ2φ4 − µ2X). (A.9)
6The authors wrote the computer program in http://cern.ch/inaki/pm.tar.gz which helped
greatly in the process of computing the given amplitudes for the relevant models.
http://cern.ch/inaki/pm.tar.gz
This model is a close cousin of the basic O’Raifeartaigh model. We are interested in
the one loop contribution to the two point function of X , which is massless at tree
level.
From the (F-term) bosonic potential one obtains the following terms entering the
one loop computation:
|hXφ2|2 + |h|2µ(Xφ2φ∗3 +X∗φ∗2φ3) + |h|2µ(Xφ1φ∗4 +X∗φ∗1φ4)
+ |h|2µ2(φ1φ2 + φ∗1φ∗2) +
|h|2µ2|φi|2 (A.10)
In order to do the computation it is useful to diagonalize the mass matrix by
introducing φ+ and φ− such that:
(φ+ + iφ−) φ2 =
(φ+ − iφ−) (A.11)
and φa, φb such that:
φ∗3 =
(φa + iφb) φ
(φa − iφb). (A.12)
With these redefinitions the bosonic scalar potential decouples into identical φ+ and
φ− sectors, giving two decoupled copies of:
V = |h|2|X|2|φ+|2 + |h|2µ2(|φ+|2 + |φa|2)
+|h|2µ(Xφ+φa +X∗φ∗+φ∗a)−
|h|2µ2
φ2+ + (φ
. (A.13)
Calculating the amplitude consists simply of constructing the (very few) two point
diagrams from the potential above and plugging the formulas above for each diagram
(the fermionic part is even simpler in this case). The final answer is that in this model
the one loop correction to the mass squared of X is given by:
δm2X =
|h4|µ2
(log 4− 1). (A.14)
The generalized asymmetric case
The next case is slightly more complicated, but will suffice to analyze completely all
the models we encounter. We will be interested in the one loop contribution to the
mass of the pseudomoduli Y in a theory with superpotential:
W = h(Xφ1φ2 + µφ1φ3 + µφ2φ4 − µ2X) + k(rY φ1φ5 + µφ5φ7), (A.15)
with k and r arbitrary complex numbers. The procedure is straightforward as above,
so we will just quote the result. We obtain an amplitude given by:
iM = −i
(4π)2
|h2rµ|2C
, (A.16)
where we have defined C(t) as:
C(t) = t
log 4− t
t− 1 log t
. (A.17)
Note that this is a positive definite function, meaning that the one loop correction
to the mass is always positive, and the pseudomoduli get stabilized for any (nonzero)
value of the parameters. Also note that the limit of vanishing t with |r|2t fixed (i.e.,
vanishing masses for φ5 and φ7, but nonvanishing coupling of Y to the supersymmetry
breaking sector) gives a nonvanishing contribution to the mass of Y .
B D7–branes in the Riemann surface
The gauge theory of D3-branes at toric singularities can be encoded in a dimer diagram
[40, 41, 42, 43, 44]. This corresponds to a bi-partite tiling of T 2, where faces corre-
spond to gauge groups, edges correspond to bi-fundamentals, and nodes correspond to
superpotential terms. As an example, the dimer diagram of D3–branes on the cone
over dP2 is shown in Figure 17. As shown in [43], D3–branes on a toric singularity are
mirror to D6–branes on intersecting 3-cycles in a geometry given by a fibration of a
Riemann surface Σ with punctures. This Riemann surface is just a thickening of the
web diagram of the toric singularity [45, 46, 47], with punctures associated to external
legs of the web diagram. The mirror D6-branes wrap non-trivial 1-cycles on this Rie-
mann surface, with their intersections giving rise to bi-fundamental chiral multiplets,
and superpotential terms arising from closed discs bounded by the D6-branes. In [19],
it was shown that D7–branes passing through the singular point can be described in
the mirror Riemann surface Σ by non-compact 1-cycles which come from infinity at one
puncture and go to infinity at another. Figure 18 shows the 1-cycles corresponding to
some D3- and D7-branes in the Riemann surface in the geometry mirror to the complex
cone over dP2. A D7-brane leads to flavors for the two D3-brane gauge factors whose
1-cycles are intersected by the D7-brane 1-cycle, and there is a cubic coupling among
the three fields (related to the disk bounded by the three 1-cycles in the Riemann
surface).
Figure 17: Dimer diagram for D3–branes at a dP2 singularity.
Figure 18: Riemann surface in the geometry mirror to the complex cone over dP2, shown
as a tiling of a T 2 with punctures (denoted by capital letters). The figure shows the non-
compact 1-cycles extending between punctures, corresponding to D7-branes, and a piece of
the 1-cycles that correspond to the mirror of the D3-branes.
U(M) U(M)
U(2M)
PSfrag replacements
Q1i Qi3
Figure 19: Quiver for the dP2 theory with M fractional branes and flavors.
As stated in Section 4, given a gauge theory of D3-branes at a toric singularity,
we introduce flavors for some of the gauge factors in a specific way. We pick a term
in the superpotential, and we introduce flavors for all the involved gauge factors, and
coupling to all the involved bifundamental multiplets. For example, the quiver with
flavors for the dP2 theory is shown in Figure 19.
On the Riemann surface, this procedure amounts to picking a node and introducing
D7-branes crossing all the edges ending on the node, see Figure 18. In this example
we obtain the superpotential terms
Wflavor = λ
′(Q1iQ̃i3Y31 +Q3jQ̃j5X53 +Q5kQ̃k1X15) (B.1)
In addition we introduce mass terms
Wmass = m1Q1iQ̃k1 +m2Q3jQ̃i3 +m5Q5kQ̃j5 (B.2)
This procedure is completely general and applies to all gauge theories for branes at
toric singularities7.
C Detailed proof of Section 4
Recall that in Section 4 we considered the illustrative example of the gauge theory
given by the quiver in Figure 20. Since node 2 is the one we wish to dualize, the only
relevant part of the diagram is shown in Figure 21. We show the Seiberg dual in Figure
22. The above choice of D7–branes, which we showed in appendix B can be applied
to arbitrary toric singularities, gives us the superpotential terms
Wflavor = λ
′ (X32Q2bQb3 + X21Q1aQa2 + X14Q4dQd1 + Y43Q3cQc4 )
Wmass = m2Qa2Q2b + m3Qb3Q3c + m4Qc4Q4d + m1Qd1Q1a (C.1)
Taking the Seiberg dual of node 2 gives
Wflavor dual = λ
′ (S13bQb3 + R
a1Q1a + X14Q4dQd1 + Y43Q3cQc4 )
Wmass dual = m2Xab + m3Qb3Q3c + m4Qc4Q4d + m1Qd1Q1a
Wmesons = h (XabQ̃b2Q̃2a
+ R1a1X̃12Q̃2a + R
a1Ỹ12Q̃2a
7This procedure does not apply if the superpotential (regarded as a loop in the quiver) passes twice
through the node which is eventually dualized in the derivation of the metastable vacua. However we
have found no example of this for any DSB fractional branes.
PSfrag replacements
X21 Y21
Y32 Z32
X43 Y43
Q2b Qb3
Q4dQd1
Figure 20: Quiver diagram with flavors. White nodes denote flavor groups
PSfrag replacements
X21 Y21
Y32 Z32Qa2
Figure 21: Relevant part of quiver before Seiberg duality.
PSfrag replacements
X̃12 Ỹ12
Ỹ23Z̃23
M1, . . . ,M6
Figure 22: Relevant part of the quiver after Seiberg duality on node 2.
+ S13bQ̃b2X̃23 + S
3bQ̃b2Ỹ23 + S
3bQ̃b2Z̃23
+ M131X̃12X̃23 + M
31X̃12Ỹ23 + M
31X̃12Z̃23
+ M431Ỹ12X̃23 + M
31Ỹ12Ỹ23 + M
31Ỹ12Z̃23 ) (C.2)
where we have not included the original superpotential. The crucial point is that
the underlined terms appear for any quiver gauge theory with flavors introduced as
described in appendix B. As described in the main text, supersymmetry is broken
by the rank condition due to the F-term of the dual meson associated to the massive
flavors. Our vacuum ansatz is (we take Nf = 2 and Nc = 1 for simplicity; this does
not affect our conclusions)
Q̃b2 =
; Q̃2a = (µ1Nc ; 0) (C.3)
with all other vevs set to zero. We parametrize the perturbations around this minimum
Q̃b2 =
µ+ φ1
; Q̃2a = (µ+ φ3 ; φ4) ; Xab =
X00 X01
X10 X11
(C.4)
and the underlined terms give
hXabQ̃b2Q̃2a − hµ2Xab = hX11 φ2 φ4 − hµ2X11 + hµ φ2X01 + hµ φ4X10
+ hµ φ1X00 + hµ φ3X00 + hφ1 φ3X00 + hφ2 φ3X01
+ hφ1 φ4X10 (C.5)
It is important to note that all the fields in (C.4) will have quadratic couplings only in
the underlined term (C.5). Thus, one can safely study this term, and the conclusions
are independent of the other terms in the superpotential. Diagonalizing (C.5) gives
hXabQ̃b2Q̃2a − hµ2Xab = hX11 φ2 φ4 − hµ2X11 + hµ φ2X01 + hµ φ4X10
2hµ φ+X00 +
φ2+X00 −
φ2−X00
(ξ+ − ξ−)φ2X01 +
(ξ+ + ξ−)φ4X10 (C.6)
where
(φ1 + φ3) ; ξ− =
(φ1 − φ3) (C.7)
This term is similar to the generalized asymmetric case studied in appendix A.2 with
X11 → X ; φ4 → φ1 ; φ2 → φ2 ; X10 → φ3 ; X01 → φ4 (C.8)
So here X11 is the linear term that breaks supersymmetry, and φ2, φ4 are the broken
supersymmetry fields. In (C.6), the only massless fields at tree-level are X11 and
ξ−. Comparing to the ISS case in Section 2.1 shows that Im ξ− is a Goldstone boson
and X11, Re ξ− get mass at tree-level. As for φ2 and φ4, setting ρ+ =
(φ2 + φ4) and
(φ2−φ4) gives us Re(ρ+) and Im (ρ−) massless and the rest massive. Following
the discussion in Section 2.1, Re(ρ+) and Im (ρ−) are just the Goldstone bosons of the
broken SU(Nf ) symmetry
8. We have thus shown that the dualized flavors (e.g. Q̃b2,
Q̃2a) and the meson with two flavor indices (e.g. Xab) get mass at tree-level or at 1-loop
unless they are Goldstone bosons. Now, we need to verify that this is the case for the
remaining fields.
PSfrag replacements
Q4dQd1
X̃12 Ỹ12
Ỹ23Z̃23
M1..M6
Figure 23: Quiver after Seiberg duality on node 2.
The Seiberg dual of the original quiver diagram is shown in Figure 23. The dual-
ized bi-fundamentals come in two classes. The first are the ones that initially (before
dualizing) had cubic flavor couplings, there will always be only two of those (e.g. X̃12,
X̃23). The second are those that did not initially have cubic couplings to flavors, there
is an arbitrary number of those (e.g. Ỹ12, Ỹ23, Z̃23). Figure 24 shows the relevant part
of the quiver for the first class. Recalling the superpotential terms (C.2), there are
several possible sources of tree-level masses. For instance, these can arise in Wflavor dual
and Wmass dual. Also, remembering our assignation of vevs in (C.3), tree-level masses
can also arise in Wmesons from cubic couplings involving the broken supersymmetry
fields (e.g. Q̃b2, Q̃2a). The first class of bi-fundamentals (e.g. X̃12, X̃23) only appear in
Wmesons coupled to their respective mesons (e.g. R
1, S1). In turn these mesons will ap-
8In the case where the flavor group is SU(2), these Goldstone bosons are associated to the gener-
ators tx and ty.
PSfrag replacements
M1, . . . ,M6
Figure 24: Relevant part of dual quiver for first class of bi-fundamentals.
pear in quadratic terms in Wflavor dual coupled to flavors (e.g. S
3bQb3 and R
a1Q1a), and
these flavors each appear in one term in Wmass. Thus there are two sets of three terms
which are coupled at tree-level and which always couple in the same way. Consider for
instance the term
λ′ S13bQb3 + m3Qb3Q3c + hS
3bQ̃b2X̃23 = λ
′ (S1 S2)
+m1(C1 C2)
+ h (S1 S2)
µ+ φ1
= λ′(S1B1 + S2B2) +m1(B1C1 +B2C2)
+ hµS1 X̃23 + hS1 φ1 X̃23 + hS2 φ2 X̃23
(C.9)
where Si, Bi, Ci and X̃23 are the perturbations around the minimum. Diagonalizing
(which can be done analytically for any values of the couplings), we get that all terms
except one get tree-level masses, the massless field being:
Y = m1S2 − λ′C2 (C.10)
This massless field has a cubic coupling to φ2 X̃23 and gets mass at 1-loop since φ2 is
a broken supersymmetry field, as described in appendix A.2.
Figure 25 shows the relevant part of the quiver for the second class of bi-fundamentals
(i.e. those that are dualized but do not have cubic flavor couplings).
These fields and their mesons only appear in one term, so will always couple in the
same way. Taking as an example
hR2a1Ỹ12Q̃2a =
Ỹ12 (µ+ φ3 ; φ4)
= µR1 Ỹ12 + R1 φ3 Ỹ12 +R2 φ4 Ỹ12 (C.11)
PSfrag replacements
M1, . . . ,M6
Figure 25: Relevant part of dual quiver for second class of bi-fundamentals.
This shows that R1 and Ỹ12 get tree-level masses and R2 gets a mass at 1-loop since
it couples to the broken supersymmetry field φ4. The only remaining fields are flavors
like Qc4, Q4d, which do not transform in a gauge group adjacent to the dualized node
(i.e. not adjacent in the quiver loop corresponding to the superpotential term used to
introduce flavors). These are directly massive from the tree-level Wmass term.
So, as stated, all fields except those that appear in the original superpotential (i.e.
mesons with gauge indices and bi-fundamentals which are not dualized) get masses
either at tree-level or at one-loop. So we only need to check the dualized original
superpotential to see if we have a metastable vacua.
References
[1] J. M. Maldacena, Adv. Theor. Math. Phys. 2, 231 (1998) [Int. J. Theor. Phys. 38,
1113 (1999)] [arXiv:hep-th/9711200].
[2] S. S. Gubser, I. R. Klebanov and A. M. Polyakov, Phys. Lett. B 428, 105 (1998)
[arXiv:hep-th/9802109].
[3] E. Witten, Adv. Theor. Math. Phys. 2, 253 (1998) [arXiv:hep-th/9802150].
[4] S. Kachru and E. Silverstein, Phys. Rev. Lett. 80, 4855 (1998)
[arXiv:hep-th/9802183].
[5] I. R. Klebanov and E. Witten, Nucl. Phys. B 536, 199 (1998)
[arXiv:hep-th/9807080].
http://arxiv.org/abs/hep-th/9711200
http://arxiv.org/abs/hep-th/9802109
http://arxiv.org/abs/hep-th/9802150
http://arxiv.org/abs/hep-th/9802183
http://arxiv.org/abs/hep-th/9807080
[6] D. R. Morrison and M. R. Plesser, Adv. Theor. Math. Phys. 3, 1 (1999)
[arXiv:hep-th/9810201].
[7] M. Bertolini, F. Bigazzi and A. L. Cotrone, JHEP 0412, 024 (2004)
[arXiv:hep-th/0411249].
[8] S. Benvenuti, S. Franco, A. Hanany, D. Martelli and J. Sparks, JHEP 0506, 064
(2005) [arXiv:hep-th/0411264].
[9] I. R. Klebanov and M. J. Strassler, JHEP 0008, 052 (2000)
[arXiv:hep-th/0007191].
[10] S. Franco, A. Hanany, Y. H. He and P. Kazakopoulos, arXiv:hep-th/0306092.
[11] S. Franco, Y. H. He, C. Herzog and J. Walcher, Phys. Rev. D 70, 046006 (2004)
[arXiv:hep-th/0402120].
[12] S. Franco, A. Hanany and A. M. Uranga, JHEP 0509, 028 (2005)
[arXiv:hep-th/0502113].
[13] C. P. Herzog, Q. J. Ejaz and I. R. Klebanov, JHEP 0502, 009 (2005)
[arXiv:hep-th/0412193].
[14] D. Berenstein, C. P. Herzog, P. Ouyang and S. Pinansky, JHEP 0509, 084 (2005)
[arXiv:hep-th/0505029].
[15] S. Franco, A. Hanany, F. Saad and A. M. Uranga, JHEP 0601 (2006) 011
[arXiv:hep-th/0505040].
[16] M. Bertolini, F. Bigazzi and A. L. Cotrone, Phys. Rev. D 72, 061902 (2005)
[arXiv:hep-th/0505055].
[17] K. Intriligator and N. Seiberg, JHEP 0602, 031 (2006) [arXiv:hep-th/0512347].
[18] A. Brini and D. Forcella, arXiv:hep-th/0603245.
[19] S. Franco and A. M. Uranga, JHEP 0606 (2006) 031 [arXiv:hep-th/0604136].
[20] K. Intriligator, N. Seiberg and D. Shih, JHEP 0604 (2006) 021
[arXiv:hep-th/0602239].
[21] B. Florea, S. Kachru, J. McGreevy and N. Saulina, arXiv:hep-th/0610003.
http://arxiv.org/abs/hep-th/9810201
http://arxiv.org/abs/hep-th/0411249
http://arxiv.org/abs/hep-th/0411264
http://arxiv.org/abs/hep-th/0007191
http://arxiv.org/abs/hep-th/0306092
http://arxiv.org/abs/hep-th/0402120
http://arxiv.org/abs/hep-th/0502113
http://arxiv.org/abs/hep-th/0412193
http://arxiv.org/abs/hep-th/0505029
http://arxiv.org/abs/hep-th/0505040
http://arxiv.org/abs/hep-th/0505055
http://arxiv.org/abs/hep-th/0512347
http://arxiv.org/abs/hep-th/0603245
http://arxiv.org/abs/hep-th/0604136
http://arxiv.org/abs/hep-th/0602239
http://arxiv.org/abs/hep-th/0610003
[22] H. Ooguri and Y. Ookouchi, Phys. Lett. B 641 (2006) 323 [arXiv:hep-th/0607183].
[23] R. Argurio, M. Bertolini, S. Franco and S. Kachru, JHEP 0701 (2007) 083
[arXiv:hep-th/0610212].
[24] S. Franco, I. Garcia-Etxebarria and A. M. Uranga, JHEP 0701 (2007) 085
[arXiv:hep-th/0607218].
[25] I. Bena, E. Gorbatov, S. Hellerman, N. Seiberg and D. Shih, JHEP 0611 (2006)
088 [arXiv:hep-th/0608157].
[26] R. Argurio, M. Bertolini, S. Franco and S. Kachru, arXiv:hep-th/0703236.
[27] J. D. Lykken, E. Poppitz and S. P. Trivedi, Nucl. Phys. B 543, 105 (1999)
[arXiv:hep-th/9806080].
[28] M. Wijnholt, arXiv:hep-th/0703047.
[29] Y. E. Antebi and T. Volansky, arXiv:hep-th/0703112.
[30] I. Garcia-Etxebarria, F. Saad and A. M. Uranga, JHEP 0608, 069 (2006)
[arXiv:hep-th/0605166].
[31] I. Garcia-Etxebarria, F. Saad and A. M. Uranga, JHEP 0606, 055 (2006)
[arXiv:hep-th/0603108].
[32] D. E. Diaconescu, B. Florea, S. Kachru and P. Svrcek, JHEP 0602, 020 (2006)
[arXiv:hep-th/0512170].
[33] K. Intriligator and N. Seiberg, arXiv:hep-ph/0702069.
[34] S. R. Coleman and E. Weinberg, Phys. Rev. D 7 (1973) 1888.
[35] S. Weinberg, Cambridge, UK: Univ. Pr. (1996) 489 p
[36] J. P. Gauntlett, D. Martelli, J. Sparks and D. Waldram, Class. Quant. Grav. 21,
4335 (2004) [arXiv:hep-th/0402153].
[37] J. P. Gauntlett, D. Martelli, J. Sparks and D. Waldram, Adv. Theor. Math. Phys.
8, 711 (2004) [arXiv:hep-th/0403002].
[38] J. P. Gauntlett, D. Martelli, J. F. Sparks and D. Waldram, Adv. Theor. Math.
Phys. 8, 987 (2006) [arXiv:hep-th/0403038].
http://arxiv.org/abs/hep-th/0607183
http://arxiv.org/abs/hep-th/0610212
http://arxiv.org/abs/hep-th/0607218
http://arxiv.org/abs/hep-th/0608157
http://arxiv.org/abs/hep-th/0703236
http://arxiv.org/abs/hep-th/9806080
http://arxiv.org/abs/hep-th/0703047
http://arxiv.org/abs/hep-th/0703112
http://arxiv.org/abs/hep-th/0605166
http://arxiv.org/abs/hep-th/0603108
http://arxiv.org/abs/hep-th/0512170
http://arxiv.org/abs/hep-ph/0702069
http://arxiv.org/abs/hep-th/0402153
http://arxiv.org/abs/hep-th/0403002
http://arxiv.org/abs/hep-th/0403038
[39] D. Martelli and J. Sparks, Commun. Math. Phys. 262, 51 (2006)
[arXiv:hep-th/0411238].
[40] A. Hanany and K. D. Kennaway, arXiv:hep-th/0503149.
[41] S. Franco, A. Hanany, K. D. Kennaway, D. Vegh and B. Wecht,
arXiv:hep-th/0504110.
[42] A. Hanany and D. Vegh, arXiv:hep-th/0511063.
[43] B. Feng, Y. H. He, K. D. Kennaway and C. Vafa, arXiv:hep-th/0511287.
[44] S. Franco and D. Vegh, arXiv:hep-th/0601063.
[45] O. Aharony and A. Hanany, Nucl. Phys. B 504, 239 (1997)
[arXiv:hep-th/9704170].
[46] O. Aharony, A. Hanany and B. Kol, JHEP 9801, 002 (1998)
[arXiv:hep-th/9710116].
[47] N. C. Leung and C. Vafa, Adv. Theor. Math. Phys. 2, 91 (1998)
[arXiv:hep-th/9711013].
http://arxiv.org/abs/hep-th/0411238
http://arxiv.org/abs/hep-th/0503149
http://arxiv.org/abs/hep-th/0504110
http://arxiv.org/abs/hep-th/0511063
http://arxiv.org/abs/hep-th/0511287
http://arxiv.org/abs/hep-th/0601063
http://arxiv.org/abs/hep-th/9704170
http://arxiv.org/abs/hep-th/9710116
http://arxiv.org/abs/hep-th/9711013
	Introduction
	The ISS model revisited
	The ISS metastable minimum
	The Goldstone bosons
	Meta-stable vacua in quiver gauge theories with DSB branes
	The complex cone over dP1
	Additional examples: The dP2 case
	The general case
	The general argument
	Construction of the flavored theories
	Seiberg duality and one-loop masses
	Additional examples
	The dP3 case
	Phase 1 of PdP4
	The Yp,q family
	Conclusions and outlook
	Technical details about the calculation via Feynman diagrams
	The basic amplitudes
	The basic superpotentials
	D7–branes in the Riemann surface
	Detailed proof of Section 4
ABSTRACT
  In this paper we consider quiver gauge theories with fractional branes whose
infrared dynamics removes the classical supersymmetric vacua (DSB branes). We
show that addition of flavors to these theories (via additional non-compact
branes) leads to local meta-stable supersymmetry breaking minima, closely
related to those of SQCD with massive flavors. We simplify the study of the
one-loop lifting of the accidental classical flat directions by direct
computation of the pseudomoduli masses via Feynman diagrams. This new approach
allows to obtain analytic results for all these theories. This work extends the
results for the $dP_1$ theory in hep-th/0607218. The new approach allows to
generalize the computation to general examples of DSB branes, and for arbitrary
values of the superpotential couplings.

<|endoftext|><|startoftext|>
Introduction
In this paper we consider non-leptonic “heavy meson to heavy meson(s)”
transitions, for instance B − B-mixing [1], B → DD̄ [2] and with only one
D-meson in the final state, like B → Dη′ [3] and B → γ D∗ [4, 5, 6].
The methods [7] used to describe heavy to light tansitions like B → ππ
and B → Kπ are not suited for the decays we consider. We use heavy-light
chiral perturbation theory (HLχPT). Lagrangian terms corresponding to
factorization are then determined to zeroth order in 1/mQ, where mQ is the
mass of the heavy quark (b or c). For B−B-mixing we have also calculated
1/mb corrections [1].
Colour suppressed 1/Nc terms beyond factorization can be written down,
but their coefficients are unknown. However, these coefficients can be cal-
culated within a heavy-light chiral quark model (HLχQM) [8] based on the
heavy quark effective theory (HQEFT) [9] and HLχPT [10]. The 1/Nc
suppressed non-factorizable terms calculated in this way will typically be
proportional to a model dependent gluon condensate [1, 2, 3, 6, 8, 11].
Presented at the Euridice meeting in Kazimierz, Poland, 24-27th of august 2006
http://arxiv.org/abs/0704.0167v1
2 KazProc printed on November 4, 2018
2. Quark Lagrangians for non-leptonic decays
The effective non-leptonic Lagrangian at quark level has the form [12]:
Ci(µ) Q̂i(µ) , (1)
where the Wilson coefficients Ci contain GF and KM factors. Typically, the
operators are four quark operators being the product of two currents:
Q̂i = j
W (q1 → q2) j
µ (q3 → q4) , (2)
where j
W (qi → qj) = (qj)L γ
µ (qi)L, and some of the quarks qi,j are heavy.
To leading order in 1/Nc, matrix elements of Q̂i factorize in products of
matrix elements of currents. Non-factorizable 1/Nc suppressed terms are
obtained from “coloured quark operators”. Using Fierz transformations
δijδln =
δinδlj + 2 t
lj , (3)
where ta are colour matrices, we may rewrite the operator Q̂i as
Q̂Fi =
W (q1 → q4) j
µ (q3 → q2) + 2 j
W (q1 → q4)
a jWµ (q3 → q2)a , (4)
where j
W (qi → qj)a = (qj)L γµ ta (qi)L is a left-handed coloured current.
The quark operators in Q̂Fi give 1/Nc suppressed terms.
3. Heavy-light chiral perturbation theory
The QCD Lagrangian involving light and heavy quarks is:
LQuark = ±Q
v iv ·DQ(±)v +O(m−1Q ) + q̄iγ ·Dq + ... (5)
where Q
v are the quark fields for a heavy quark and a heavy anti-quark
with velocity v, q is the light quark triplet, and iDµ = i∂µ− eqAµ− gstaAaµ.
The bosonized Lagrangian have the following form, consistent with the un-
derlying symmetry [10]:
Lχ(Bos) = ∓Tr
a (iv · Dfa)H
− gATr
f γµγ5A
+ ...(6)
where the covariant derivative is iDµ
≡ δaf (i∂µ− eHAµ)−Vµfa ; a, f being
SU(3) flavour indices. The axial coupling is gA ≃ 0.6. Furthermore,
Vµ(orAµ) = ±
(ξ†∂µξ ± ξ∂µξ†) , (7)
KazProc printed on November 4, 2018 3
where ξ = exp(iΠ/f), and Π is a 3 by 3 matrix containing the light mesons
(π,Kη), and the heavy (1−, 0−) doublet field (Pµ, P5) is
H(±) = P±(P
µ − iP (±)5 γ5) , P± = (1± γ · v)/2 , (8)
where superscripts (±) means meson and anti-meson respectively. To bosonize
the non-leptonic quark Lagrangian, we need to bosonize the currents. Then
the b, c, and c quarks are treated within HQEFT, which means the replace-
ments b → Q(+)vb , c → Q
vc , and c → Q
v̄ . Then the bosonization of
currents within HQEFT for decay of a heavy B-meson will be:
µQ(+)vb −→
ξ†γµLH
≡ Jµb , (9)
where L is the left-handed projector in Dirac space, and αH = fH
for H = B,D before pQCD and chiral corrections are added. Here, H
represents the heavy meson (doublet) containing a b-quark. For creation of
a heavy anti-meson B or D, the corresponding currents J
and J
c̄ are given
by (9) with H
b replaced by H
b and H
c , repectively. For the B → D
transition we have
µ LQ(+)vc −→ −ζ(ω)Tr
≡ Jµb→c , (10)
where ζ(ω) is the Isgur-Wise function, and ω = vb · vc. For creation of DD
pair we have the same expression for the current J
cc̄ with H
replaced
c , and ζ(ω) replaced by ζ(−λ), where λ = v̄ · vc . In addition there
are 1/mQ corrections for Q = b, c. The low velocity limit is ω → 1 . For
B → DD and B → D∗γ one has ω ≃ 1.3 , and ω ≃ 1.6 , respectively.
3.1. Factorized lagrangians for non-leptonic processes
For B −B mixing, the factorized bosonized Lagrangian is
LB = CB J
b (Jb̄)
µ , (11)
where CB is a short distance Wilson coefficient (containing (GF )
2), which
is taken at µ = Λχ ≃ 1 GeV, and the currents are given by (9).
For processes obtained from two different four quark operators for b →
cc̄q (q = d, s), we find the factorized Lagrangian corresponding to Fig. 1:
LSpecFact = (C2 +
b→c (Jc̄)µ , (12)
4 KazProc printed on November 4, 2018
B0 D+
Fig. 1. Factorized contribution for B0
→ D+D−s through the spectator mechanism,
which does not exist for decay mode B0
→ D+s D−s .
Fig. 2. Factorized contribution for B0
→ D+s D−s through the annihilation mecha-
nism, which give zero contributions if both D+s and D
s are pseudoscalars.
where Ci =
GFVcbV
cq ai, and [13] a1 ≃ −0.35 − 0.07i, a2 ≃ 1.29 +
0.08i. We have considered the process B0d → D
s . Note that there is no
factorized contribution to this process if both D-mesons in the final state
are pseudoscalars! But the factorized contribution to B0
→ D+D−s will be
the starting point for chiral loop contributions to the process B0
→ D+s D−s .
The factorizable term from annihilation is shown in Fig. 2, and is:
LAnnFact = (C1 +
cc̄ (Jb)µ . (13)
Because (C1 + C2/Nc) is a non-favourable combination of the Wilson coef-
ficients, this term will give a small non-zero contribution if at least one of
the mesons in the final state is a vector.
3.2. Possible 1/Nc suppressed tree level terms
For B − B̄ mixing, we have for instance the 1/Nc suppressed term
ξ†σµαLH
ξ†σµαRH
. (14)
KazProc printed on November 4, 2018 5
π0, η8
π0, η8
π0, η8
π0, η8
Fig. 3. Chiral corrections to B −B mixing, i.e the bag parameter BBq for q = d, s.
The black boxes are weak vertices.
B0 B∗0
Fig. 4. Two classes of non-factorizable chiral loops for B0
→ D+s D−s based on the
factorizable amplitude proportional to the IW function ∼ ζ(ω).
For B → DD̄, we have for instance the terms
ξ†σµαLH
c γαLH
c̄ γµ
, (15)
ξ†σµαLH
c γαLH
(v̄ − vc)µ . (16)
One needs a framework to estimate the coefficients of such terms. We use
the HLχQM, which will pick a certain linear combination of 1/Nc terms.
3.3. Chiral loops for non-leptonic processes
Within HLχPT, the leading chiral corrections are proportional to
χ(M) ≡ (
)2 ln(
) , (17)
where mM is the appropriate light meson mass and Λχ is the chiral symme-
try breaking scale, which is also the matching scale within our framework.
For B − B mixing there are chiral loops obtained from (6) and (11)
shown in Fig. 3. These have to be added to the factorized contribution.
6 KazProc printed on November 4, 2018
Fig. 5. The HLχQM ansatz: Vertex for quark meson interaction
For the process B0d → D
s we obtain a chiral loop amplitude cor-
responding to Fig. 4. This amplitude is complex and depend on ω and λ
defined previously. It has been recently shown [5] that (0+, 1+) states in
loops should also be added to the result.
4. The heavy-light chiral quark model
The Lagrangian for HLχQM [8] contains the Lagrangian (5):
LHLχQM = LHQET + LχQM + LInt , (18)
where LHQET is the heavy quark part of (5), and the light quark part is
LχQM = χ [γµ(iDµ + Vµ + γ5Aµ)−m]χ . (19)
Here χL = ξ
†qL and χR = ξqR are flavour rotated light quark fields, and m
is the light constituent mass. The bosonization of the (heavy-light) quark
sector is performed via the ansatz:
LInt = −GH
v Qv +Qv H
. (20)
The coupling GH is determined by bosonization through the loop diagrams
in Fig 6. The bosonization lead to relations between the model depen-
dent parameters GH , m, and 〈 αsπ G
2 〉, and the quadratic-, linear, and
logarithmic- divergent integrals I1, I3/2, I1, and the physical quantities fπ,
〈 qq 〉, gA and fH (H = B,D). For example, the relation obtained for iden-
tifying the kinetic term is:
− iG2HNc (I3/2 + 2mI2 +
i(8 − 3π)
384Ncm3
G2 〉) = 1 , (21)
where we have used the prescription:
αβ → 4π2〈
G2 〉 1
(gµαgνβ − gµβgνα) . (22)
The parameters are fitted in strong sector, with 〈 αs
G2 〉 = [(0.315 ± 0.020)
GeV]4 , and GH
2 = 2m
ρ , where ρ ≃ 1. For details , see [8].
KazProc printed on November 4, 2018 7
Fig. 6. Diagrams generating the strong chiral lagrangian at mesonic level. The
kinetic term and and the axial vector term ∼ gA.
Fig. 7. Non-factorizable contribution to B −B mixing; Γ ≡ ta γµ L
5. 1/N
terms from HLχQM
To obtain the 1/Nc terms for B − B mixing in Fig. 7 , we need the
bosonization of colored current in the quark operators of eq. (4):
a γαQ(+)vb
GH gs
GaµνTr
ξ†γαLH
b Σµν
, (23)
Σµν = σµν − 2πf
[σµν , γ · vb]+ . (24)
This coloured current is also used for B → DD in Fig. 8, for B → Dη′ in
Fig. 9, and for B → γD∗ in Fig. 10 In addition there are more complicated
bosonizations of coloured currents as indicated in Fig. 8.
For B → Dη′ and B → γD∗ decays there are two different four quark
operators, both for b → cūq and b → c̄uq, respectively. At µ = 1 GeV they
have Wilson coefficients a2 ≃ 1.17 , a1 ≃ −0.37 (up to prefactors GF and
8 KazProc printed on November 4, 2018
Fig. 8. Non-factorizable 1/Nc contribution for B0 → D+s D−s through the annihila-
tion mechanism with additional soft gluon emision.
Fig. 9. Diagram for B → Dη′ within HLχQM . Γ = γµ(1− γ5)
KM-factors). For B → Dη′, we must also attach a propagating gluon to the
η′gg∗-vertex. Note that for B0
→ γD0∗, the 1/Nc suppressed mechanism
in Fig. 10 dominates, unlike B0
→ γD0∗. Factorized contributions are
proportional to either the favourable contribution af = a2 + a1/Nc ≃ 1.05
or the non-favourable contribution anf = a1 + a2/Nc ≃ 0.02.
5.1. 1/mc correction terms
For the B → D transition we have the 1/mc suppressed terms:
c + Z1γ
c γα + Z2H
c γ · vb
, (25)
where the Zi’s are calculable within HLχQM. The relative size of 1/mc
corrections are typically of order 20− 30%.
6. Results
6.1. B −B mixing
The result for the B(ag) parameter in B −B-mixing has the form [1]
B̂Bq =
1− δBG
32π2f2
, (26)
KazProc printed on November 4, 2018 9
B D B D
Fig. 10. Non-factorizable contributions to B → γD∗ from the coloured operators
similar to the K − K-mixing case [11]. From perturbative QCD we have
b̃ ≃ 1.56 at µ = Λχ = 1 GeV. From calculations within the HLχQM we
obtain, δBG = 0.5±0.1 and τb = (0.26±0.04)GeV, and from chiral corrections
τχ,s = (−0.10 ± 0.04)GeV2, and τχ,d = (−0.02 ± 0.01)GeV2 . We obtained
B̂Bd = 1.51± 0.09 B̂Bs = 1.40 ± 0.16 , (27)
in agreement with lattice results.
6.2. B → DD decays
Keeping the chiral logs and the 1/Nc terms from the gluon condensate,
we find the branching ratios in the “leading approximation”. For decays of
B̄0d (∼ VcbV ∗cd) and B̄0s (∼ VcbV ∗cs) we obtain branching ratios of order few
×10−4 and ×10−3, respectively Then we have to add counterterms ∼ ms
for chiral loops. These may be estimated in HLχQM.
6.3. B → Dη′ and B → γD∗ decays
The result corresponding to Fig. 9 is:
Br(B → Dη′) ≃ 2× 10−4 . (28)
The partial branching ratios from the mechanism in Fig. 10 are [6]
Br(B0d → γ D
∗0)G ≃ 1× 10−5 ; Br(B0s → γ D∗0)G ≃ 6× 10−7 . (29)
The corresponding factorizable contribibutions are roughly two orders of
magnitude smaller. Note that the process B0d → γ D∗0 has substantial
meson exchanges (would be chiral loops for ω → 1), and is different.
7. Conclusions
Our low energy framework is well suited to B −B mixing, and to some
extent to B → DD. Work continues to include (0+, 1+), states, countert-
erms, and 1/mc terms. Note that the amplitude for B
→ D+s D−s is zero
10 KazProc printed on November 4, 2018
in the factorized limit. For processes like B → Dη′ and B → Dγ we
can give order of magnitude estimates when factorization give zero or small
amplitudes.
* * *
JOE is supported in part by the Norwegian research council and by
the European Union RTN network, Contract No. HPRN-CT-2002-00311
(EURIDICE). He thanks his collaborators : A. Hiorth, S. Fajfer, A. Polosa,
A. Prapotnik Brdnik, J.A. Macdonald Sørensen, and J. Zupan
REFERENCES
[1] A. Hiorth and J. O. Eeg, Eur. Phys. J. direct C30, 006 (2003) (see also
references therein).
[2] J.O. Eeg, S. Fajfer , and A. Hiorth, Phys.Lett. B570, 46-52 (2003);
J. O. Eeg, S. Fajfer, and A. Prapotnik Eur. Phys. J. C42, 29-36 (2005).
See also: J.O. Eeg, S. Fajfer, J. Zupan, Phys. Rev. D 64, 034010 (2001).
[3] J. O. Eeg, A. Hiorth, A. D. Polosa, Phys. Rev. D 65, 054030 (2002).
[4] B.Grinstein and R.F. Lebed, Phys.Rev. D60, 031302(R) (1999).
[5] O. Antipin and G. Valencia, Phys.Rev. D74, 054015 (2006), hep-ph/0606065.
[6] J.A. Macdonald Sørensen and J.O. Eeg, hep-ph/0605078.
[7] M. Beneke et. al, Phys. Rev. Lett. 83, 1914 (1999); C. W. Bauer et al. Phys.
Rev. D 70, 054015 (2004)
[8] A. Hiorth and J. O. Eeg, Phys. Rev. D 66, 074001 (2002), and references
therein.
[9] For a review, see M. Neubert, Phys. Rep. 245, 259 (1994).
[10] For a review, see: R. Casalbuoni et al. Phys. Rep. 281, 145 (1997).
[11] S. Bertolini, J.O. Eeg and M. Fabbrichesi, Nucl. Phys. B449, 197 (1995);
V. Antonelli et al. Nucl. Phys. B469, 143 (1996); S. Bertolini et al. Nucl.
Phys. B514, 63 (1998); ibid B514, 93 (1998).
[12] See for example: G. Buchalla, A. J. Buras, M. E. Lautenbacher, Rev. Mod.
Phys. 68, 1125 (1996), and references therein.
[13] B. Grinstein et al., Nucl. Phys. B363 , 19 (1991). R. Fleischer, Nucl. Phys. B
412, 201 (1994).
http://arxiv.org/abs/hep-ph/0606065
http://arxiv.org/abs/hep-ph/0605078
	Introduction
	Quark Lagrangians for non-leptonic decays
	Heavy-light chiral perturbation theory
	Factorized lagrangians for non-leptonic processes
	Possible 1/Nc suppressed tree level terms 
	Chiral loops for non-leptonic processes
	The heavy-light chiral quark model
	1/Nc terms from HLQM
	1/mc correction terms 
	Results
	B- B mixing
	B D  D decays
	B D  ' and B D* decays
	Conclusions
ABSTRACT
  I discuss low energy aspects of heavy meson decays, where there is at least
one heavy meson in the final state. Examples are $B -\bar{B}$ mixing, $B \to D
\bar{D}$, $B \to D \eta'$, and $B \to D \gamma$. %and $B \to D W $ (Isgur-Wise
function). The analysis is performed in the heavy quark limit within
heavy-light chiral perturbation theory. Coefficients of $1/N_c$ suppressed
chiral Lagrangian terms (beyond factorization) have been estimated by means of
a heavy-light chiral quark model.

<|endoftext|><|startoftext|>
Introduction
	Nonrelativistic Shocks
	Momentum Cut-off
	The Integrated Distribution Function and Synchrotron Spectra
	Relativistic Shock Acceleration with Losses
	Determining the Eigenfunctions
	Shock matching conditions
	The Spatially Integrated Distribution
	Discussion
	Inverse Laplace Transforms
	Deriving the eigensystem differential equations
ABSTRACT
  We investigate the acceleration and simultaneous radiative losses of
electrons in the vicinity of relativistic shocks. Particles undergo pitch angle
diffusion, gaining energy as they cross the shock by the Fermi mechanism and
also emitting synchrotron radiation in the ambient magnetic field. A
semi-analytic approach is developed which allows us to consider the behaviour
of the shape of the spectral cut-off and the variation of that cut-off with the
particle pitch angle. The implications for the synchrotron emission of
relativistic jets, such as those in gamma ray burst sources and blazars, are
discussed.

<|endoftext|><|startoftext|>
arXiv:0704.0169v1  [hep-ph]  2 Apr 2007
VERY STRONG AND SLOWLY VARYING MAGNETIC FIELD
AS SOURCE OF AXIONS
Giorgio CALUCCI*
Dipartimento di Fisica Teorica dell’Università di Trieste, Trieste, I 34014 Italy
INFN, Sezione di Trieste, Italy
Abstract
The investigation on the production of particles in slowly varying but
extremely intense magnetic field in extended to the case of axions. The
motivation is, as for some previously considered cases, the possibility that
such kind of magnetic field may exist around very compact astrophysical
objects.
* E-mail: giorgio@ts.infn.it
http://arxiv.org/abs/0704.0169v1
1. Statement of the problem
A magnetic field of huge strength can give rise to real particles even if
its rate of variation is very small: this posibility could be of some interest
from a pure theoretical point of view, but it gains more physical relevance
if one accepts that such kind of field configurations may be present around
some very compact astrophysical objects[1-3]. In this case the time vari-
ation is related to the evolution of the source, by collapse, rotations or
else, and it is therefore very slow, in comparison with the times typical
of elementary-particle processes. We can call the former time the macro-
scopic time and the latter the microscopic one. The production of light
particles in these processes has been analyzed in some detail in some pre-
vious papers[4],with the suggestion that it is one of the mechanisms at
work in the phenomenon of gamma-ray bursts[5]; the typical microscopic
time is related to the electron mass since photons are produced through
real or virtual intermediate states of e− − e+-pairs. The lightest particles
that could be produced are massive neutrinos, but the magnetic-moment
coupling induced by the standard electroweak interactions is extremely
small.
There is, at least in the theoretical realm, another very light parti-
cle, that is the axion[6]: owing to its dynamical characteristics it must
be coupled also to the electromagnetic field[7], even more its electomag-
netic coupling is being actively studied from an experimental side[8] and
the possibility of detecting such particles as coming from nonterrestrial
sources has already been foreseen[9]. It is immediately seen that the pro-
duction of axions by a varying magnetic field must be realized through a
mechanism different from the previously considered one, in fact the axions
are coupled[7] only to the pseudoscalar density E ·B so the presence of
an electric field is necessary as a starting point, but a nonstatic magnetic
field creates always an electric field and even though the rate of variation
is small, the very large magnetic strength makes the electric field not a
tiny one.
In the present paper the coupling of the axions field with a given
E ·B density is written in standard second-quantized formalism, then the
effect of time variation of that density on the axion vacuum is determined
and the consequent production is calculated. The result depends both
on the spatial shape and on the time variation of the magnetic field: in
accordance with the prevailing astrophysical hypotheses[1-3] the magnetic
field is seen as a bundle of lines of force which may safely be considered
straight in comparison with the microscopic scale. The time-variation
could affect both the shape and the strength of the fields, both are effective
in the production process. The calculation procedure is not the standard
adiabatic approximation [10] as used in previous investigations [4], but the
feature that one has to deal with a two-scale problem is still fully relevant.
2. General form of the production probability
The starting point is a second-quantized axion field in presence of
given, classical, magnetic and electric fields. The axion field φ(x) is cou-
pled to the pseudoscalar density * G(x) = E(x) · B(x) and the coupling
constant, of dimension of length, is here indicated by C. We assume that
the interaction lasts from an initial time to until a final time t.
So we have for the axion field the expression:
φ(x) = φo(x) + χ(x) = φo(x) + C
∆R(x− y)G(y)d4y (1)
Here ∆R(x − y) is the standard retarded Green function, the source is a
c−number, so the same holds for χ. The field φ is free before to, where it
has the standard expansion:
φo(x) =
(2π)3/2
ao(k)e
ik·r−iωt + a†o(k)e
−ik·r+iωt
, (2)
then it acquires a contribution from χ, this term has the following actual
expression:
χ(x) =
(2π)3/2
e−iωkt
eiωkτgk(τ)dτe
ik·r − c.c.
(3.1)
gk(t) =
(2π)3/2
G(r, t)e−ik·r (3.2)
The reality condition for G, that gives g∗k = g−k have been used, together
with the initial condition χ̇(to) = 0. The separation into positive and
negative frequencies in χ(x) is unambiguous until the typical frequencies
of G are small.
Having found the time evolution of the field the total production of
axions is calculated in Heisenberg description of motion, i.e. we take an
initial state ao(k)|◦ >= 0 ∀k, i.e. the vacuum in the absence of inter-
action then we express the mean particle number as the time dependent
N (k, t) =< ◦|a†(k, t) a(k, t)|◦ >
* the same kind of coupling is possible also for the neutral pion, but in this case
other channels of production are present [11]
Since all the effect of the interaction is a c−number shift on the opera-
tors a(k, t) = ao(k) + b(k) the calculation is easy, in particular when the
interaction no longer acts we get
Nf (k) = |bf (k)|2 (4)
The expression of bf (k) can be read off from eq.s (2,3.1,3.2), it is:
bf (k) =
eiωkτgk(τ)dτ (5)
3. Detailed calculations
As anticipated in the introduction a more definite model of the mag-
netic field can by a field of uniform direction (at a given time) with some
transverse shape, both the direction and the shape may vary in time, one
possible restriction is the conservation of the total flux. More explicitly
these conditions are realized by giving:
n ∧ r
1− w(µr⊥)
B = − F
n ∂⊥w(µr⊥) . (6)
The unit vector n gives the instantaneous direction of the magnetic field,
r2 − n · r2 and ∂⊥ the corresponding derivative, F is the total
magnetic flux; the parameter µ defines the size of the field in the transverse
directions, w is taken to be cylindrically symmetric and, obviously it must
go to zero at infinity, the requirement w(0) = 1 avoids singularities in E.
Since E = −Ȧ and A ·B = 0 only the terms coming from the time
variation of the direction of A contributes to G = E ·B so we get:
)2J · r
(1− w)∂⊥w
Here J gives the angular velocity of n i.e. J = n ∧ ṅ. In this configuration
the Fourier transform of the source is
gk(t) =
(2π)3/2
C(2F )2δ(n · k)J · kS(k⊥/µ) (8.1)
S(k⊥/µ) = (µ/k⊥)
J1(ρk⊥/µ)
(1− w(ρ))w′(ρ)
dρ/ρ (8.2)
Here J1 is the Bessel function of order one and w
′ indicates the derivative
with respect to the argument. It is useful to remember that, owing to the
presence of the factor δ(n · k) in the expression of S we can substitute k2⊥
simply with k2.
We now remember that the model of magnetic field we have at hand
is such that it is uniform along one direction, but this direction is contin-
uously varying, so a most significant quantity is obtained by an angular
integration
dΩkNf (k) (9)
So we need a quantity like
dΩkgk(τ)g
′) which contains a singularity
due to the presence, in the domain of integration, of a δ-square term which
arises for τ = τ ′. This is clearly due to the unphysical assumption that
at every time there is a direction in which the pseudoscalar density G
is absolutely uniform. Since we are integrating over the direction at the
end the effect of the singularity is mild enough, it results in a logarithmic
divergence, however this fact must be explicitly death with, considering a
finite extension of the fields. A more careful treatment is mathemetically
heavier, so it is presented in an Appendix.
The final result is given by
≈ C2(2F )4 k
Ψ[S(k/µ)]2
2 ln 4kL . (10)
If we give a definite transverse shape to the fields we get, evidently,
a definite answer. In the situation where the fields change direction but
not intensities, i.e. we keep µ constant, we may give, tentatively the form
w(ρ) = e−ρ
. Then the expression for the function S(k/µ) is[12]:
S(k/µ) =
− exp
. (11)
The limit k → 0 of this expression is finite, so the whole production goes
to zero only owing to the phase-space factor k2 in front of the expression in
eq.(10). This result, as it appears from the whole derivation, can be valid
only for axion masses and so energies which are definitely larger than the
typical frequencies of the astrophysical phenomena, no resonant dynamics
is included.
4. Some conclusions
The rate of production of axions by a slowly varying but very strong
magnetic field has been calculated. The conditions are such that the as-
trophysical frequencies are very much lower than the proper frequencies of
the axion field. In fact with an axion mass mA of the order of 1eV[13] the
typical frequencies of the field shall exceed 1015 Hz and so no resonance
conditions appear realistic. The relevant parameter turns out to be the
dimension of the spatial inhomogeneity, in the present model 1/µ, here
also is seems very reasonable to assume that µ << mA. It is also possible
to give an expression for the total number of produced particles.
A = ζ C
2F 4µ3
Ψ lnLµ (12)
Some of the factors owe their origin to the general form of the interaction,
eq(1) and to dimensional requirament, it appers clear the role of the total
rotation of the field (Ψ) in determining the overall production; so when
the rotation is uniform the rate is proportional to the angular velocity.
The numerical factor is more model dependent, in the chosen case it
is ζ = 8
3− 2−
2] = 0.704 . . ..
The transformation from the incoming Heisenberg field to the outgo-
ing field can be implemented by the simple unitary operator
U = exp
d3k[a(k)b∗(k) − a†(k)b(k)]
Ua(k)U† = a(k) + b(k) .
The actual form of the evolution operator gives the further information
that the axion are produced with a Poissonian distribution of multiplicity,
strictly speaking this is true for a production in a totally defined state,
in operations like the one leading to eq. (9) this particular form can be
blurred.
A very short mention on this problem was made at the XXXVI In-
ternational Symposium on Multiparticle Dynamics, Paraty, R.J. - Brazil,
September 2006
Appendix
We want to calculate
dΩkgk(τ)g
′), with the functions g given by
eq.(8.1) and taking care of the finite extension of the magnetic field. This
is implemented by substituting the δ−functions as:
δ(n · k) → Lπ−1/2 exp[−L2(n · k)2] (A.1)
The calculation is performed in the particular case in which the rotation
takes place in a constant plane, so J‖J′. The integration over the angles is
performed in Cartesian coordinates, it is useful to introduce the unit vector
of the three-momentum direction k = kv, then
dΩk = 2δ(v
2 − 1)d3v
and through standard although lengthy calculations the representation is
obtained:
dΩkδ(n · k)J · kδ(n′ · k)J′ · k →
JJ ′×
dλ exp[iλ(w2 − 1)]w2dw
(n ∧ n′)2 − 2iλ/(Lk)2 − λ2/(Lk)4
]−1/2
(A.2)
In the limit L→ ∞ it results
I = JJ ′/|n ∧ n′|
which can be obtained in a simpler way.
Now we must integrate over τ and τ ′, times an oscillating factor
eiωk(τ−τ
′). In the conditions that have been chosen the motion takes
place in a plane, so n is characterized by a unique angle ψ and n′ by
ψ′ hence the integration over time amounts at an angular integration, in
fact (n ∧ n′)2 = (sin(ψ − ψ′))2 and moreover J = ψ̇ and J ′ = ψ̇′. So
we must integrate I in dψ , dψ′ from 0 to some final angle Ψf . Defin-
ing γ = ψ − ψ′ we see that the integrand shows, in the limit L → ∞,
a singularity for γ = 0, so we perform the integration from −1
π to 1
because the domain which do not include zero has no singular behavior,
the oscillating factor is approximated with its value on the singular point
τ = τ ′, so that the exponential factor reduces to 1. Then the integral is a
complete elliptic integral,[12] which can be conveniently expressed in term
of the hypergeometric function. In fact the integration in dγ is:
sin2 γ +Q
with Q = − 2iλ
(Lk)2
(Lk)4
(A.3)
The result of the integration is
and in the limit L→ ∞, which gives Q→ 0 we get;
2 ln 2 + lnLk
Since the dominant term in the limit is independent of the auxiliary pa-
rameter λ the rest of the integration in eq. (A.2) is straightforward and it
gives
I = Ψ
2 ln 4kL (A.4)
A comment: what is excluded is the possibility that the magnetic field
should perform more than a complete rotation, this would destroy the
correspondence ψ = ψ′ ↔ τ = τ ′.
References
1. C. Thompson, R.C. Duncan, Astrophys.J. 408 (1993) 194
2. H. Hanami, Astrophys.J. 491 (1997) 687
3. C. Kouveliotou, R.C. Duncan, C. Thompson, Sci.Am. 288,2 (2003)
4. G. Calucci, Mod.Phys.Lett. A14 (1999) 1183;
A. DiPiazza, G.Calucci, Phys.Rev.D 65 (2002) 125019;
A. DiPiazza, G.Calucci, Phys.Rev.D 66 (2002) 123006;
A. DiPiazza, Eur. Phys. J.C 36 (2004) 25
5. T. Piran, Phys.Rep.314 (1999) 375;
J. van Paradijs, C. Kouveliotou, R.A.M.J. Wijers, Annu.Rev. As-
tron.Astrophys. 38 (2000) 279:
P. Mészáros, Annu.Rev.Astron.Astrophys. 40 (2002)137;
T. Piran, Rev.Mod.Phys. 76 (2004) 1143
R. Ruffini, Nuovo Cimento B 119 (2004) 785
6. R.D.Peccei, H.R.Quinn, Phys.Rev.Lett. 38 (1977) 1440;
R.D.Peccei, H.R.Quinn, Phys.Rev.D 16 (1977) 1791;
F. Wilczek, Phys.Rev.Lett. 40 (1978) 279;
S. Weinberg, Phys.Rev.Lett. 40 (1978) 223
7. L. Maiani, R.Petronzio, E.Zavattini Phys.Lett.B 175 (1986) 359
8. E. Zavattini, G. Zavattini, G. Ruoso, E. Polacco, E. Milotti, M.
Karuza, U. Gastaldi, G. DiDomenico, F. DellaValle, R. Cimino, S.
Carusotto, G.Cantatore, M. Bregant Phys.Rev.Lett. 96 (2006) 110406
S. Lamoreaux Nature 41 (2006) 31
9. The CERN Axion solar telescope (C.E.Asleth et al.) Nucl.Phys.B
(proc.suppl.) 110 (2002) 85
10. A.B. Migdal, V. Krainov Approximation methods in quantum mechan-
ics (Ch. 2) (Benjamin, NewYork 1969)
D.R.Bates ed. Quantum theory, vol 1 (Ch. 8)Academic press, London
11. A. DiPiazza, G.Calucci, Mod.Phys.Lett. A20 (2005), 117
12. M.Abramowitz and I.A.Stegun, Handbook of mathematical functions
(Dover, 1964).
13. S. Hannestad, A. Mirizzi,G. Raffelt, J. Cosm. Astrop. Phys. 07002
(2005)
ABSTRACT
  The investigation on the production of particles in slowly varying but
extremely intense magnetic field in extended to the case of axions. The
motivation is, as for some previously considered cases, the possibility that
such kind of magnetic field may exist around very compact astrophysical
objects.

<|endoftext|><|startoftext|>
Introduction
	Experiment
	Results and Discussion
	High Temperature Phase
	Low Temperature Phases
	Magnetic Interactions
	Electronic Excitations and Comparison with VOCl
	Conclusion
	Appendix: Details of the spring model calculation
	References
ABSTRACT
  The sequence of phase transitions and the symmetry of in particular the low
temperature incommensurate and spin-Peierls phases of the quasi one-dimensional
inorganic spin-Peierls system TiOX (TiOBr and TiOCl) have been studied using
inelastic light scattering experiments. The anomalous first-order character of
the transition to the spin-Peierls phase is found to be a consequence of the
different symmetries of the incommensurate and spin-Peierls (P$2_{1}/m$)
phases.
  The pressure dependence of the lowest transition temperature strongly
suggests that magnetic interchain interactions play an important role in the
formation of the spin-Peierls and the incommensurate phases. Finally, a
comparison of Raman data on VOCl to the TiOX spectra shows that the high energy
scattering observed previously has a phononic origin.

<|endoftext|><|startoftext|>
Introduction
Shell-type supernova remnants (SNRs) have been identified
as particle accelerators via their very-high-energy (VHE;
E > 100 GeV) γ-ray and non-thermal X-ray emission (see
e.g. Aharonian et al. (2006a) and Koyama et al. (1997)). It
has been suggested that interactions of particles acceler-
ated in SNR with nearby molecular clouds should produce
detectable γ-ray emission (Aharonian et al. 1994). For this
reason the well-known Monoceros Loop SNR (G 205.5+0.5,
distance∼1.6 kpc (Graham et al. 1982; Leahy et al. 1986)),
with its apparent interaction with the Rosette Nebula (a
young stellar cluster/molecular cloud complex, distance
http://arxiv.org/abs/0704.0171v1
2 F. A. Aharonian et al.: A point-like γ-ray source in Monoceros
1.4± 0.1 kpc (Hensberge et al. 2000)) is a prime target for
observations with VHE γ-ray instruments.
For the case of hadronic cosmic rays (CRs) interact-
ing in the interstellar medium to produce pions and hence
γ-rays via π0 decay, a spatial correlation between γ-ray
emission and tracers of interstellar gas is expected. Such a
correlation was used to infer the presence of a population of
recently accelerated CR hadrons in the Galactic Centre re-
gion (Aharonian et al. 2006b). This discovery highlights the
importance of accurate mapping of available target material
for the interpretation of TeV γ-ray emission. The NANTEN
4 m diameter sub-mm telescope at Las Campanas observa-
tory, Chile, has been conducting a 12CO (J=1→0) survey
of the Galactic plane since 1996 (Mizuno & Fukui 2004).
The Monoceros region is covered by this survey and the
NANTEN data are used here to trace the target material
for interactions of accelerated hadrons.
2. H.E.S.S. Observations and Results
The observations described here took place between March
2004 and March 2006 and comprise 13.5 hours of data after
data quality selection and dead-time correction. The data
were taken over a wide range of zenith angles from 29 to
59 degrees, leading to a mean energy threshold of 400 GeV
with so-called standard cuts used here for spectral analysis
and 750 GeV with the hard cuts used here for the source
search and position fitting. These cuts are described in de-
tail in Aharonian et al. (2006c).
A search in this region for point-like emission was made
using a 0.11◦ On source region and a ring of mean radius
0.5◦ for Off source background estimation (see Berge et al.
(2006) for details). Fig. 1 shows the resulting significance
map, together with CO data from NANTEN, radio con-
tours and the positions of all Be-stars in this region. The
peak significance in the field is 7.1σ. The number of sta-
tistical trials associated with a search of the entire field of
view, in 0.01◦ steps along both axes, is≈ 105. The measured
peak significance corresponds to 5.3σ after accounting for
these trials. A completely independent analysis based on a
fit of camera images to a shower model (Model Analysis de-
scribed in de Naurois (2006)), yields a significance of 7.3σ
(5.6σ post-trials).
The best fit position of the new source is 6h32m58.3s,
+5◦48′20′′ (RA/Dec. J2000) with 28′′ statistical errors
on each axis, and is hence identified as HESSJ0632+057.
Systematic errors are estimated at 20′′ on each axis. There
is no evidence for intrinsic extension of the source and we
derive a limit on the rms size of the emission region of 2′
(at 95% confidence), under the assumption that the source
follows a Gaussian profile. This source size upper limit is
shown as a dashed circle in the bottom panel of Fig. 1. Fig. 2
demonstrates the point-like nature of the source. The an-
gular distribution of excess γ-ray-like events with respect
to the best fit position is shown together with the expected
distribution for a point-like source.
The reconstructed energy spectrum of the source is con-
sistent with a power-law: dN/dE = k(E/1TeV)−Γ with
photon index Γ = 2.53± 0.26stat ± 0.20sys and a flux nor-
malisation k = 9.1±1.7stat±3.0sys×10
−13 cm−2s−1TeV−1.
Fig. 3 shows the H.E.S.S. spectrum together with that
for the unidentified EGRET source 3EGJ0634+0521 (dis-
cussed below) and an upper limit derived for TeV emis-
sion from 3EGJ0634+0521 using the HEGRA telescope
s00m30h06s00m35h06
s00m30h06s00m35h06
G 205.5+0.5
Rosette Nebula
HESS J0632+057
SAX J0635.2+0533
3EG 0634+0521
5.6407
5.7507
5.8607
5.9707
s30m32h06s00m33h06s30m33h06
s30m32h06s00m33h06s30m33h06
H.E.S.S.
Fig. 1. Top: the Monoceros SNR / Rosette Nebula re-
gion. The grey-scale shows velocity integrated (0-30 km
s−1) 12CO (J=1→0) emission from the NANTEN Galactic
Plane Survey (white areas have highest flux). Yellow con-
tours show 4 and 6 σ levels for the statistical significance of
a point-like γ-ray excess. Radio observations at 8.35 GHz
from Langston et al. (2000) are overlaid as cyan contours,
and illustrate the extent of the Rosette Nebula. The nomi-
nal Green (2004) Catalogue position/size of the Monoceros
SNR is shown as an (incomplete) dashed circle. 95% and
99% confidence regions for the position of the EGRET
source 3EG0634+0521 are shown as dotted green contours.
The binary pulsar SAXJ0635.2+0533 is marked with a
square and Be-stars with pink stars. Bottom: an expanded
view of the centre of the top panel showing H.E.S.S. sig-
nificance as a colour scale. The rms size limit derived for
the TeV emission is shown as a dashed circle. The unidenti-
fied X-ray source 1RXSJ063258.3+054857 is marked with
a triangle and the Be-star MCW 148 with a star.
F. A. Aharonian et al.: A point-like γ-ray source in Monoceros 3
 (square degrees)2θ
0 0.005 0.01 0.015 0.02 0.025
Fig. 2. Distribution of excess (candidate γ-ray) events as a
function of squared angular distance from the best fit posi-
tion of HESS J0632+057 (points), compared to the expecta-
tion for this dataset from Monte-Carlo simulations (smooth
curve).
array (Aharonian et al. 2004), converted from an integral
to a differential flux using the spectral shape measured
by H.E.S.S. We find no evidence for flux variability of
HESS J0632+057 within our dataset. However, we note
that due to the weakness of the source and sparse sam-
pling of the light-curve, intrinsic variability of the source
is not strongly constrained. The bulk of the available data
was taken in two short periods in December 2004 (P1, 4.7
hours) and November/December 2005 (P2, 6.2 hours). The
integral fluxes (above 1 TeV) in these two periods were:
6.3±1.8×10−13 cm−2 s−1 (P1) and 6.4±1.5×10−13 cm−2
s−1 (P2).
Energy (GeV)
-110 1 10 210 310 410
-1010
HEGRA
HESS J0623+057
3EG J0634+0521
Fig. 3. Reconstructed VHE γ-ray spectrum of
HESS J0632+057 compared to the HE γ-ray source
3EGJ0634+0521. An upper limit derived for
3EGJ0634+0521 at TeV energies using the HEGRA
instrument is also shown.
Amongst the candidate VHE sources in this field is the
34 ms binary pulsar SAXJ0635.2+0533. There is no signif-
icant γ-ray emission at the position of this object and we
derive a 99% confidence upper limit on the integral flux,
F (> 1TeV), of 2.6 × 10−13 cm−2 s−1, assuming an E−2
type spectrum.
3. Possible Associations of HESS J0632+057
The new VHE source HESS J0632+057 lies in a complex
region and several associations with objects known at other
wavelengths seem plausible. We therefore consider each of
these potential counterparts in turn.
The Monoceros Loop SNR is rather old in
comparison to the known VHE γ-ray shell-type
SNRs RXJ1713.7−3946 (Aharonian et al. 2006a),
RXJ0852.0−4622 (Aharonian et al. 2005b) and Cas-
A (Aharonian et al. 2001). All these objects have estimated
ages less than ∼ 2000 years, in contrast the Monoceros
Loop SNR has an age of ∼ 3 × 104 years (Leahy et al.
1986). This supernova remnant therefore appears to be in
a different evolutionary phase (late Sedov or Radiative)
compared to these known VHE sources. However, CR
acceleration may occur even at this later evolutionary stage
(see for example Yamazaki et al. (2006)). The principal
challenge for a scenario involving the Monoceros Loop
is to explain the very localised VHE emission at only
one point on the SNR limb. The interaction of the SNR
with a compact molecular cloud is one possible solution.
In this scenario (and indeed any π0 decay scenario) for
the observed γ-ray emission, a correlation is expected
between the TeV emission and the distribution of target
material. An unresolved molecular cloud listed in a CO
survey at 115 GHz (Oliver et al. 1996) lies rather close to
HESS J0623+057, at l = 205.75 b = −1.31. The distance
estimate for this cloud (1.6 kpc) is consistent with that
for the Monoceros SNR, making it a potential target for
hadrons accelerated in the SNR. However, as can be seen
clearly in the NANTEN data in Fig. 1, the intensity peak
of this cloud is significantly shifted to the East of the
H.E.S.S. source. We find no evidence in the NANTEN
data for any clouds along the line of sight to the H.E.S.S.
source.
3EGJ0634+0521 is an unidentified EGRET source
(Hartman et al. 1999) with positional uncertainties such
that HESS J0632+057 lies close to the 99% confidence con-
tour. Given that this source is flagged as possibly extended
or confused, a positional coincidence of these two objects
seems plausible. Furthermore, the reported third EGRET
catalogue flux above 100 MeV ((25.5± 5.1)× 10−8 photons
cm−2 s−1 with a photon index of 2.03 ± 0.26, see Fig. 3),
is consistent with an extrapolation of the H.E.S.S. spec-
trum. A global fit of the two spectra gives a photon index
of 2.41±0.06.
1RXSJ063258.3+054857 is a faint ROSAT source
(Voges et al. 2000) which lies 36′′ from the H.E.S.S. source
with a positional uncertainty of 21′′ (see Fig. 1 bottom).
Given the uncertainties on the positions of both objects
this X-ray source can certainly be considered a potential
counterpart of HESS J0632+057. The chance probability of
the coincidence of a ROSAT Faint Source Catalogue source
within the H.E.S.S. error circle is estimated as 0.1% by
scaling the total number of sources in the field of view. The
ROSAT source is rather weak, with only 4 counts detected
above 0.9 keV, spectral comparison is therefore rather diffi-
cult. In the scenario where the γ-ray emission is interpreted
as inverse Compton emission from a population of energetic
electrons, the ROSAT source could be naturally ascribed to
the synchrotron emission of the same electron population.
However, the low level of the X-ray emission (∼ 10−13 erg
cm−2 s−1) in comparison with the TeV flux (∼ 10−12 erg
4 F. A. Aharonian et al.: A point-like γ-ray source in Monoceros
cm−2 s−1) implies a very low magnetic field (≪ 3µG) un-
less a strong radiation source exists in the neighbourhood
of the emission region and/or the X-ray emission suffers
from substantial absorption. Observations at > 4 keV are
required to resolve this absorption issue. In a π0 decay sce-
nario for the γ-ray source, secondary electron production
via muon decay is expected along with γ-ray emission. The
synchrotron emission of these secondary electrons would in
general produce a weaker X-ray source than the IC scenario,
probably compatible with the measured ROSAT flux.
MWC148 (HD 259440) is a massive emission-line star
of spectral type B0pe which lies within the H.E.S.S. error
circle. The chance probability of this coincidence is hard to
assess, as there was no a-priori selection of stellar objects
as potential γ-ray sources. However, given the presence of
only 3 Be-type stars in the field of view of the H.E.S.S.
observation (see Fig. 1) and the solid angle of the H.E.S.S.
error circle, the naive chance probability of the associa-
tion is 10−4. Stars of this spectral type have winds with
typical velocities and mass loss rates of 1000 km s−1 and
10−7M⊙/year, respectively. Plausible acceleration sites are
in strong internal or external shocks of the stellar wind.
We estimate that an efficiency of 1-10% in the conversion
of the kinetic energy of the wind into γ-ray emission would
be required to explain the H.E.S.S. flux (assuming this star
lies at the distance of the Rosette Nebula). However, as no
associations of similar stars with point-like γ-ray sources
were found in the H.E.S.S. survey of the inner Galaxy, this
scenario seems rather unlikely.
A related possibility is that MWC148 is part of a binary
system with an, as yet undetected, compact companion.
Such a system might then resemble the known VHE γ-ray
source PSRB1259-63/SS2883 (Aharonian et al. 2005a).
Further multi-wavelength observations are required to con-
firm or refute this scenario.
Acknowledgements. The support of the Namibian authorities and of
the University of Namibia in facilitating the construction and op-
eration of H.E.S.S. is gratefully acknowledged, as is the support by
the German Ministry for Education and Research (BMBF), the Max
Planck Society, the French Ministry for Research, the CNRS-IN2P3
and the Astroparticle Interdisciplinary Programme of the CNRS, the
U.K. Particle Physics and Astronomy Research Council (PPARC),
the IPNP of the Charles University, the South African Department
of Science and Technology and National Research Foundation, and
by the University of Namibia. We appreciate the excellent work of
the technical support staff in Berlin, Durham, Hamburg, Heidelberg,
Palaiseau, Paris, Saclay, and in Namibia in the construction and oper-
ation of the equipment. The NANTEN project is financially supported
from JSPS (Japan Society for the Promotion of Science) Core-to-Core
Program, MEXT Grant-in-Aid for Scientific Research on Priority
Areas, and SORST-JST (Solution Oriented Research for Science and
Technology: Japan Science and Technology Agency). We would also
like to thank Stan Owocki and James Urquhart for very useful dis-
cussions.
References
Aharonian, F., Akhperjanian, A., Barrio, J., et al. 2001, A&A, 370,
Aharonian, F., Akhperjanian, A. G., Aye, K.-M., et al. 2005a, A&A,
442, 1
Aharonian, F., Akhperjanian, A. G., Bazer-Bachi, A. R., et al. 2006a,
A&A, 449, 223
Aharonian, F., Akhperjanian, A. G., Bazer-Bachi, A. R., et al. 2006b,
Nature, 439, 695
Aharonian, F., Akhperjanian, A. G., Bazer-Bachi, A. R., et al. 2006c,
Aharonian, F., Akhperjanian, A. G., Bazer-Bachi, A. R., et al. 2005b,
A&A, 437, L7
Aharonian, F. A., Akhperjanian, A. G., Beilicke, M., et al. 2004, A&A,
417, 973
Aharonian, F. A., Drury, L. O., & Voelk, H. J. 1994, A&A, 285, 645
Berge, D., Funk, S., & Hinton, J. 2006, astro-ph/0610959
de Naurois, M. 2006, astro-ph/0607247
Graham, D. A., Haslam, C. G. T., Salter, C. J., & Wilson, W. E.
1982, A&A, 109, 145
Green, D. A. 2004, Bulletin of the Astronomical Society of India, 32,
Hartman, R. C., Bertsch, D. L., Bloom, S. D., et al. 1999, ApJS, 123,
Hensberge, H., Pavlovski, K., & Verschueren, W. 2000, A&A, 358, 553
Koyama, K., Kinugasa, K., Matsuzaki, K., et al. 1997, PASJ, 49, L7
Langston, G., Minter, A., D’Addario, L., et al. 2000, AJ, 119, 2801
Leahy, D. A., Naranan, S., & Singh, K. P. 1986, MNRAS, 220, 501
Mizuno, A. & Fukui, Y. 2004, in ASP Conf. Ser. 317: Milky
Way Surveys: The Structure and Evolution of our Galaxy, ed.
D. Clemens, R. Shah, & T. Brainerd, 59
Oliver, R. J., Masheder, M. R. W., & Thaddeus, P. 1996, A&A, 315,
Voges, W., Aschenbach, B., Boller, T., et al. 2000, IAU Circ., 7432, 1
Yamazaki, R., Kohri, K., Bamba, A., et al. 2006, MNRAS, 371, 1975
1 Max-Planck-Institut für Kernphysik, P.O. Box 103980, D
69029 Heidelberg, Germany
2 Yerevan Physics Institute, 2 Alikhanian Brothers St.,
375036 Yerevan, Armenia
3 Centre d’Etude Spatiale des Rayonnements, CNRS/UPS,
9 av. du Colonel Roche, BP 4346, F-31029 Toulouse Cedex 4,
France
4 Universität Hamburg, Institut für Experimentalphysik,
Luruper Chaussee 149, D 22761 Hamburg, Germany
5 Institut für Physik, Humboldt-Universität zu Berlin,
Newtonstr. 15, D 12489 Berlin, Germany
6 LUTH, UMR 8102 du CNRS, Observatoire de Paris,
Section de Meudon, F-92195 Meudon Cedex, France
7 DAPNIA/DSM/CEA, CE Saclay, F-91191 Gif-sur-Yvette,
Cedex, France
8 University of Durham, Department of Physics, South
Road, Durham DH1 3LE, U.K.
9 Unit for Space Physics, North-West University,
Potchefstroom 2520, South Africa
10 Laboratoire Leprince-Ringuet, IN2P3/CNRS, Ecole
Polytechnique, F-91128 Palaiseau, France
11 Laboratoire d’Annecy-le-Vieux de Physique des Particules,
IN2P3/CNRS, 9 Chemin de Bellevue - BP 110 F-74941
Annecy-le-Vieux Cedex, France
12 APC, 11 Place Marcelin Berthelot, F-75231 Paris Cedex
05, France UMR 7164 (CNRS, Université Paris VII, CEA,
Observatoire de Paris)
13 Dublin Institute for Advanced Studies, 5 Merrion Square,
Dublin 2, Ireland
14 Landessternwarte, Universität Heidelberg, Königstuhl, D
69117 Heidelberg, Germany
15 Laboratoire de Physique Théorique et Astroparticules,
IN2P3/CNRS, Université Montpellier II, CC 70, Place
Eugène Bataillon, F-34095 Montpellier Cedex 5, France
16 Universität Erlangen-Nürnberg, Physikalisches Institut,
Erwin-Rommel-Str. 1, D 91058 Erlangen, Germany
17 Laboratoire d’Astrophysique de Grenoble, INSU/CNRS,
Université Joseph Fourier, BP 53, F-38041 Grenoble Cedex
9, France
18 Institut für Astronomie und Astrophysik, Universität
Tübingen, Sand 1, D 72076 Tübingen, Germany
19 Laboratoire de Physique Nucléaire et de Hautes Energies,
IN2P3/CNRS, Universités Paris VI & VII, 4 Place Jussieu,
F-75252 Paris Cedex 5, France
20 Institute of Particle and Nuclear Physics, Charles
University, V Holesovickach 2, 180 00 Prague 8, Czech
Republic
21 Institut für Theoretische Physik, Lehrstuhl IV: Weltraum
und Astrophysik, Ruhr-Universität Bochum, D 44780
F. A. Aharonian et al.: A point-like γ-ray source in Monoceros 5
Bochum, Germany
22 University of Namibia, Private Bag 13301, Windhoek,
Namibia
23 European Associated Laboratory for Gamma-Ray
Astronomy, jointly supported by CNRS and MPG
24 Department of Astrophysics, Nagoya University, Chikusa-
ku, Nagoya 464-8602, Japan
25 Nagoya University Southern Observatories, Nagoya 464-
8602, Japan
	Introduction
	H.E.S.S. Observations and Results
	Possible Associations of HESSJ0632+057
ABSTRACT
  The complex Monoceros Loop SNR/Rosette Nebula region contains several
potential sources of very-high-energy (VHE) gamma-ray emission and two as yet
unidentified high-energy EGRET sources. Sensitive VHE observations are required
to probe acceleration processes in this region. The H.E.S.S. telescope array
has been used to search for very high-energy gamma-ray sources in this region.
CO data from the NANTEN telescope were used to map the molecular clouds in the
region, which could act as target material for gamma-ray production via
hadronic interactions. We announce the discovery of a new gamma-ray source,
HESS J0632+058, located close to the rim of the Monoceros SNR. This source is
unresolved by H.E.S.S. and has no clear counterpart at other wavelengths but is
possibly associated with the weak X-ray source 1RXS J063258.3+054857, the
Be-star MWC 148 and/or the lower energy gamma-ray source 3EG J0634+0521. No
evidence for an associated molecular cloud was found in the CO data.

<|endoftext|><|startoftext|>
Thermal entanglement of qubit pairs on the Shastry-Sutherland lattice
S. El Shawish
J. Stefan Institute, Ljubljana, Slovenia
A. Ramšak and J. Bonča
Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia and
J. Stefan Institute, Ljubljana, Slovenia
(Dated: 2 April 2007)
We show that temperature and magnetic field properties of the entanglement between spins on the two-
dimensional Shastry-Sutherland lattice can be qualitatively described by analytical results for a qubit tetramer.
Exact diagonalization of clusters with up to 20 sites reveals that the regime of fully entangled neighboring pairs
coincides with the regime of finite spin gap in the spectrum. Additionally, the results for the regime of vanishing
spin gap are discussed and related to the Heisenberg limit of the model.
PACS numbers: 75.10.Jm, 03.65.Yz, 03.67.Mn
I. INTRODUCTION
In any physical system with subsystems in interaction, indi-
vidual parts of the system are to some extent entangled, even
if they are far apart, as realized already at the beginning of
modern quantum mechanics sixty years ago. Today it has be-
come appreciated that the ability to establish entanglement be-
tween quantum particles in a controlled manner is a crucial in-
gredient of any quantum information processing system1. On
the other hand, it turned out that the analysis of appropriately
quantified entanglement between parts of the system can also
be a very useful tool in the study of many body phenomena,
as is, e.g., the behavior of correlated systems in the vicinity of
crossovers between various regimes or even points of quantum
phase transition2.
Quantum entanglement of two distinguishable particles in a
pure state can be quantified through von Neuman entropy3,4,5.
Entanglement between two spin- 1
particles – qubit pair – can
be considered a physical resource, an essential ingredient of
algorithms suitable for quantum computation. For a pair of
subsystems A and B, each occupied by a single electron, an
appropriate entanglement measure is the entanglement of for-
mation, which can be quantified from the Wootters formula6.
In general, electron-qubits have the potential for even richer
variety of entanglement measure choices due to both their
charge and spin degrees of freedom. When entanglement is
quantified in systems of indistinguishable particles, the mea-
sure must account for the effect of exchange and it must ade-
quately deal with multiple occupancy states7,8,9,10,11,12. A typ-
ical example is the analysis of entanglement in lattice fermion
models (the Hubbard model, e.g.) where double occupancy
plays an essential role11.
In realistic hardware designed for quantum information
processing, several criteria for qubits must be fulfilled13: the
existence of multiple identifiable qubits, the ability to initial-
ize and manipulate qubits, small decoherence, and the ability
to measure qubits, i.e., to determine the outcome of compu-
tation. It seems that among several proposals for experimen-
tal realizations of such quantum information processing sys-
tems the criteria for scalable qubits can be met in solid state
structures consisting of coupled quantum dots14,15. Due to the
ability to precisely control the number of electrons in such
structures16, the entanglement has become experimentally ac-
cessible quantity. In particular, recent experiments on semi-
conductor double quantum dot devices have shown the evi-
dence of spin entangled states in GaAs based heterostuctures17
and it was shown that vertical-lateral double quantum dots
may be useful for achieving two-electron spin entanglement18.
It was also demonstrated recently that in double quantum dot
systems coherent qubit manipulation and projective readout is
possible19.
Qubit pairs to be used for quantum information processing
must be to a high degree isolated from their environment, oth-
erwise small decoherence requirement from the DiVincenzo’s
checklist can not be fulfilled. The entanglement, e.g., between
two antiferromagnetically coupled spins in contact with ther-
mal bath, is decreased at elevated temperatures and external
magnetic field20,21,22, and will inevitably vanish at some fi-
nite temperature23. Entanglement of a pair of electrons that
are confined in a double quantum dot is collapsed due to the
Kondo effect at low temperatures and for a very weak tunnel-
ing to the leads. At temperatures below the Kondo temper-
ature a spin-singlet state is formed between a confined elec-
tron and conduction electrons in the leads24. For other open
systems there are many possible sources of decoherence or
phase-breaking, for example coupling to phonon degrees of
freedom25.
The main purpose of the present paper is to analyze the ro-
bustness of the entanglement of spin qubit pairs in a planar
lattice of spins (qubits) with respect to frustration in magnetic
couplings, elevated temperatures as well as due to increasing
external magnetic field. The paper is organized as follows.
Sec. II introduces the model for two coupled qubit pairs –
qubit tetramer – and presents exact results for temperature and
magnetic field dependence of the entanglement between near-
est and next-nearest-neighboring spins in a tetrahedron topol-
ogy. In Sec. III the model is extended to infinite lattice of
qubit pairs described by the Shastry-Sutherland model26. This
model is convenient firstly, because of the existence of sta-
ble spin-singlet pairs in the ground state in the limit of weak
coupling between the qubit pairs, and secondly, due to a rel-
atively good understanding of the physics of the model in the
http://arxiv.org/abs/0704.0172v1
thermodynamic limit. Entanglement properties of the Shastry-
Sutherland model were so far not considered quantitatively.
Neverteless, several results concerning the role of entangle-
ment at a phase transition in other low-dimensional spin lattice
systems2,27,28,29,30,31,32, as well as in fermionic systems33,34,35
have been reported recently. Near a quantum phase transi-
tion in some cases entanglement even proves to be more ef-
ficient precursor of the transition compared to standard spin-
spin correlations35,36. In Sec. IV we discuss entanglement be-
tween nearest neighbors in the Heisenberg model, represent-
ing a limiting case of the Shastry-Sutherland model. Results
are summarized in Sec. V and some technical details are given
in Appendix A.
II. THERMAL ENTANGLEMENT OF A QUBIT
TETRAMER IN MAGNETIC FIELD
Consider first a double quantum dot composed of two adja-
cent quantum dots weakly coupled via a controllable electron-
hopping integral. By adjusting a global back-gate voltage,
precisely two electrons can be confined to the dots. The inter-
dot tunneling matrix element t determines the effective anti-
ferromagnetic (AFM) superexchange interaction J ∼ 4t2/U ,
where U is the scale of Coulomb interaction between two
electrons confined on the same dot. There are several possi-
ble configurations of coupling between such double quantum
dots. One of the simplest specific designs is shown schemat-
ically in Fig. 1(a): four qubits at vertices of a tetrahedron. In
addition to the coupling A-B, by appropriate arrangements of
gate electrodes the tunneling between A-C and A-D can as
well be switched on.
We consider here the case where J/U ≪ 1, thus double oc-
cupancy of individual dot is negligible and appropriate Hilbert
space is spanned by two dimers (qubit pairs): spins at sites
A-B and C-D are coupled by effective AFM Heisenberg mag-
netic exchange J and at sites A-C, B-C, A-D, B-D by J ′. The
corresponding hamiltonian of such a pair of dimers is given as
H4 = J(SA · SB + SC · SD) + (1)
+ 2J ′(SA · SC + SB · SC + SA · SD + SB · SD)−
− B(SzA + SzB + SzC + SzD),
where Si =
σi is spin operator corresponding to the site i
and B is external homogeneous magnetic field in the direction
of the z-axis. Factor 2 in Eq. (1) is introduced for convenience
– such a parameterization represents the simplest case of finite
Shastry-Sutherland lattice with periodic boundary conditions
studied in Sec. III.
A. Concurrence
We focus here on the entanglement properties of two cou-
pled qubit dimers. The entanglement of a pair of spin
qubits A and B may be defined through concurrence3, C =
2|α↑↑α↓↓ − α↑↓α↓↑|, if the system is in a pure state |ΨAB〉 =
ss′ αss′ |s〉A|s′〉B, where |s〉i corresponds to the basis | ↑ 〉i,
Figure 1: (Color online) (a) Two coupled qubit pairs (dimers) in tetra-
hedral topology. (b) Shastry-Sutherland lattice as realized, e.g., in the
SrCu2(BO3)2 compound.
| ↓ 〉i. Concurrence varies from C = 0 for an unentan-
gled state (for example | ↑ 〉A| ↑ 〉B) to C = 1 for com-
pletely entangled Bell states3 1√
(| ↑ 〉A| ↑ 〉B ± | ↓ 〉A| ↓ 〉B)
or 1√
(|↑ 〉A|↓ 〉B ± |↓ 〉A|↑ 〉B).
For finite inter-pair coupling J ′ 6= 0 or at elevated tempera-
tures the A-B pair can not be described by a pure state. In the
case of mixed states describing the subsystem A-B the concur-
rence may be calculated from the reduced density matrix ρAB
given in the standard basis |s〉i|s′〉j6. Concurrence can be fur-
ther expressed in terms of spin-spin correlation functions2,27,
where for systems that are axially symmetric in the spin space
the concurrence may conveniently be given in a simple closed
form37, which for the thermal equilibrium case simplifies fur-
ther,
CAB = 2max(0, |〈S+AS
〉〈P ↓
〉). (2)
Here S+i = (S
† = Sxi + ıS
i is the spin raising operator
for dot i and P ↑i =
(1 + 2Szi ), P
(1 − 2Szi ) are the
projection operators onto the state | ↑ 〉i or | ↓ 〉i, respectively.
We consider the concurrence at fixed temperature, therefore
the expectation values in the concurrence formula Eq. (2) are
evaluated as
〈O〉 =
〈n|O|n〉e−βEn , (3)
whereZ =
−βEn is the partition function, β = 1/T , and
{|n〉} is a complete set of states of the system. Note that due
to the equilibrium and symmetries of the system, several spin-
spin correlation functions vanish, 〈S+
〉 = 0, for example.
Figure 2: (Color online) (a) Zero-temperature concurrence CAB as a function of J
′/J and B/J . Different regimes are characterized by
particular ground state functions |φn〉 defined in Appendix A. (b) T/J = 0.1 results for CAB. (c) Next nearest concurrence CAC for T = 0,
and (d) for T/J = 0.1. Dashed lines separate CAB(C) > 0 from CAB(C) = 0.
In vanishing magnetic field, where the SU(2) symmetry is
restored, the concurrence formula Eq. (2) simplifies further
and is completely determined by only one38 spin invariant
〈SA · SB〉,
CAB = max(0,−2〈SA · SB〉 −
). (4)
The concurrence may be expected to be significant whenever
enhanced spin-spin correlations indicate A-B singlet forma-
tion.
B. Analytical results
There are several known results related to the model Eq. (1).
In the special case of J ′ = 0, for example, the tetramer con-
sists of two decoupled spin dimers with concurrence CAB
(or the corresponding thermal entanglement) as derived in
Refs. 20,21. Entanglement of a qubit pair described by the
related XXZ Heisenberg model with Dzyaloshinskii-Moriya
anisotropic interaction can be also obtained analytically22.
Hamiltonian H4 with additional four-spin exchange interac-
tion but in the absence of magnetic field was considered re-
cently in the various limiting cases39.
Tetramer model Eq. (1) considered here is exactly solvable
and in Appendix A we present the corresponding eigenvec-
tors and eigenenergies. The concurrence CAB is for this case
determined from Eq. (2) with
〉 = 1
− e3j/2/2− ej/2
eb + e−b
+ e−j/2+4j
/6 + e−j/2+2j
eb + e−b
+ e−j/2−2j
eb/4 + 1/3 + e−b/4
, (5)
where j = βJ , j′ = βJ ′, b = βB, and with
〈P ↑↓
〉 = 1
+ e−j/2+4j
+ e−j/2+2j
1 + e±b
+ e−j/2−2j
1/6 + e±b/2 + e±2b
. (6)
Z = e3j/2 + 2ej/2
eb + 1 + e−b
+ e−j/2+4j
+ e−j/2+2j
eb + 1 + e−b
+ e−j/2−2j
e2b + eb + 1 + e−b + e−2b
is the partition function.
Alternatively, one can define and analyze also the entangle-
ment between spins at sites A and C and the corresponding
concurrence CAC can be expressed from Eq. (2) by applying
Figure 3: (Color online) (a) Temperature and magnetic field depen-
dence of CAB for J
′/J = 0.4 and (b) J ′ = J . Dashed lines separate
CAB > 0 from CAB = 0.
additional correlators with replaced B→C,
− e−j/2+4j
− e−j/2+2j
eb + e−b
+ e−j/2−2j
eb/4 + 1/3 + e−b/4
, (8)
〈P ↑↓
〉 = 1
e3j/2/4 + ej/2
1/2 + e±b
+ e−j/2+4j
/12 + e−j/2+2j
e±b/2
+ e−j/2−2j
1/6 + e±b/2 + e±2b
. (9)
The line 2J ′ = J represents a particularly interesting spe-
cial case where two dimers are coupled symmetrically form-
ing a regular tetrahedron. An important property of this sys-
tem is the (geometrical) frustration of, e.g., qubits C-A-B.
Such a frustration is the driving force of the quantum phase
transition found in the Shastry-Sutherland model and is the
reason for similarity of the results for two coupled dimers and
a large planar lattice studied in the next Section.
C. Examples
In the low temperature limit the concurrence is determined
by the ground state properties while transitions between var-
ious regimes are determined solely by crossings of eigenen-
ergies, which depend on two parameters (J ′/J,B/J). There
are 5 distinct regimes for CAB shown in Fig. 2(a): (i) com-
pletely entangled dimers (singlets A-B and C-D, state |φ1〉
from Appendix A), CAB = 1; (ii) for B > J and smaller
J ′/J the concurrence is zero because the energy of the state
consisted of a product of fully polarized A-B and C-D triplets,
|φ12〉, is the lowest energy in this regime; (iii) concurrence
is zero also for J ′ > J/2 and low B/J , with the ground
state |φ2〉. There are two regimes corresponding to 12 step in
CAB where the ground state is either (iv) any linear combina-
tion of degenerate states |φ6,7〉, i.e., simultaneous A-B singlet
(triplet) and C-D triplet (singlet) for J ′ < J/2, or (v) state
|φ5〉 at J ′ > J/2 and larger B. Qubits A-C are due to special
topology never fully entangled, and the corresponding CAC is
presented in Fig. 2(c). In the limit of J ′ ≫ J the tetramer
corresponds to a Heisenberg model ring consisted of 4 spins
and in this case qubit A is due to tetramer symmetry equally
entangled to both neighbors (C and D), thus CAC =
At elevated temperatures the concurrence is smeared out as
shown in Figs. 2(b,d). Note the dip separating the two dif-
ferent regimes with CAB =
, seen also in the CAC =
case. This dip clearly separates different regimes discussed in
the previous T = 0 limit and signals a proximity of a disen-
tangled excited state. For sufficiently high temperatures van-
ishing concurrence is expected23. The critical temperature Tc
denoted by a dashed line is set by the magnetic exchange scale
J , since at higher temperatures local singlets are broken irre-
spectively of the magnetic field.
A rather unexpected result is shown in Fig. 3(a) where at
B & 2J and low temperatures the concurrence slightly in-
creases with increasing temperature due to the contribution of
excited A-B singlet components that are absent in the ground
state. Similar behavior is found for J ′ ∼ 0 around B ∼ J ,
which is equivalent to the case of a single qubit dimer20,21
(not shown here). There is no distinctive feature in tempera-
ture and magnetic field dependence of CAB when J
′ > J/2
and a typical results is shown in Fig. 3(b) for J ′ > J .
III. PLANAR ARRAY OF QUBIT PAIRS: THE
SHASTRY-SUTHERLAND LATTICE
A. Preliminaries
The central point of this paper is the analysis of pair en-
tanglement for the case of a larger number of coupled qubit
pairs. In the following it will be shown that the results corre-
sponding to tetramers considered in the previous Section can
be very helpful for better understanding pair-entanglement of
N > 4 qubits. There are several possible generalizations of
coupled dimers and one of the simplest in two dimensions is
the Shastry-Sutherland lattice shown in Fig. 1(b). Neighbor-
ing sites A-B are connected with exchange interaction J and
next-neighbors with J ′. The corresponding hamiltonian for
N/2 dimers (N sites) is given with
HN = J
Si · Sj + J ′
Si · Sj −B
Szi . (10)
Periodic boundary conditions are used. For the special case
N = 4 the model reduces to Eq. (1) where due to periodic
boundary conditions sites A-C (and other equivalent pairs) are
doubly connected, therefore a factor of 2 in Eq. (1), as men-
tioned in Sec. II.
The Shastry-Sutherland model (SSM) was initially pro-
posed as a toy model possessing an exact dimerized eigenstate
known as a valence bond crystal26. Recently, the model has
experienced a sudden revival of interest by the discovery of
the two-dimensional spin-liquid compound SrCu2(BO3)2
40,41
since it is believed that magnetic properties of this compound
are reasonably well described by the SSM42. In fact, several
generalizations of the SSM have been introduced to account
better for recent high-resolution measurements revealing the
magnetic fine structure of SrCu2(BO3)2
42,43,44,45. Soon af-
ter the discovery of the SrCu2(BO3)2 system, the SSM thus
became a focal point of theoretical investigations in the field
of frustrated AFM spin systems, particularly low-dimensional
quantum spin systems where quantum fluctuations lead to
magnetically disordered ground states (spin liquids) with a
spin gap in the excitation spectrum.
The SSM is a two-dimensional frustrated antiferromagnet
with a unique spin-rotation invariant exchange topology that
leads in the limit J ≫ J ′ to an exact gapped dimerized ground
state with localized spin singlets on the dimer bonds (dimer
phase). In the opposite limit, J ≪ J ′, the model becomes
ordinary AFM Heisenberg model with a long-range Néel or-
der and a gapless spectrum (Néel phase). While two of the
phases are known, there are still open questions regarding the
existence and the nature of the intermediate phases. Several
possible scenarios have been proposed, e.g.: either a direct
transition between the two states occurs at the quantum criti-
cal point near J ′/J ∼ 0.746,47, or a transition via an interme-
diate phase that exists somewhere in the range of J ′/J > 0.6
and J ′/J < 0.948. Although different theoretical approaches
have been applied, a true nature of the intermediate phase (if
any) has still not been settled. As will be evident later on, our
exact-diagonalization results support the first scenario.
The SSM phase diagram reveals interesting behavior also
for varying external magnetic field. In particular, experiments
on SrCu2(BO3)2 in strong magnetic fields show formation of
magnetization plateaus41,49, which are believed to be a con-
sequence of repulsive interaction between almost localized
spin triplets. Several theoretical approaches support the idea
that most of these plateaus are readily explained within the
(bare) SSM46,50,51. Recent variational treatment based on en-
tangled spin pairs revealed new insight into various phases of
the SSM48.
Although extensively studied, the zero-temperature phase
diagram of the SSM remains elusive. This lack of reliable
solutions is even more pronounced when considering thermal
fluctuations in SSM as only few methods allow for the inclu-
0 0.2 0.4 0.6 0.8 1
T/J=0.1-2 <S
> - 1/2.
-2 <S
> - 1/2
AFM limit
dimer limit
Figure 4: (Color online) Results for the Shastry-Sutherland lattice
with N = 20 sites and periodic boundary conditions. Presented are
renormalized spin-spin correlation functions −2〈SA · SB,C〉 − 12 as
a function of J ′/J and for various temperatures. Asterisk indicates
critical J ′c which roughly separates the dimer and Néel phase.
sion of finite temperatures in frustrated spin systems. In this
respect, the calculation of thermal entanglement between the
spin pairs would also provide a new insight into the complex-
ity of the SSM.
B. Numerical method
We use the low-temperature Lanczos method52 (LTLM), an
extension of the finite-temperature Lanczos method53 (FTLM)
for the calculation of static correlation functions at low tem-
peratures. Both methods are nonperturbative, based on the
Lanczos procedure of exact diagonalization and random sam-
pling over different initial wave functions. A main advan-
tage of LTLM is that it accurately connects zero- and finite-
temperature regimes with rather small numerical effort in
comparison to FTLM. On the other hand, while FTLM is lim-
ited in reaching arbitrary low temperatures on finite systems,
it proves to be computationally more efficient at higher tem-
peratures. A combination of both methods therefore provides
reliable results in a wide temperature regime with moderate
computational effort. We note that FTLM was in the past suc-
cessfully used in obtaining thermodynamic as well as dynamic
properties of different models with correlated electrons as are:
the t-J model,53 the Hubbard model,54 as well as the SSM
model.43,45
In comparison with the conventional Quantum Monte Carlo
(QMC) methods LTLM possesses the following advantages:
(i) it does not suffer from the minus-sign problem that usually
hampers QMC calculations of many-electron as well as frus-
trated spin systems, (ii) the method continuously connects the
zero- and finite-temperature regimes, (iii) it incorporates as
well as takes the advantage of the symmetries of the prob-
lem, and (iv) it yields results of dynamic properties in the
real time in contrast to QMC calculations where imaginary-
time Green’s function is obtained. The LTLM (FTLM) is on
the other hand limited to small lattices which usually leads
to sizable finite-size effects. To account for these, we ap-
Figure 5: (Color online) (a) Zero-temperature concurrence CAB for a 20-site cluster for various J
′/J and B/J . Shaded area represents the
regime of fully entangled dimers, CAB = 1. (b) The corresponding results for T/J = 0.1. (c) Next nearest concurrence CAC for T = 0, and
(d) for T/J = 0.1. Note qualitative and even quantitative similarity with the tetramer results, Fig. 2. Dashed lines separate CAB(C) > 0 from
CAB(C) = 0.
plied LTLM to different square lattices with N = 8, 16 and
20 sites using periodic boundary conditions (we note that
next-larger system, N = 32, was too large to be handled
numerically). Another drawback of the LTLM (FTLM) is
the difficulty of the Lanczos procedure to resolve degener-
ate eigenstates that emerge also in the SSM. In practice, this
manifests itself in severe statistical fluctuations of the cal-
culated amplitude for T → 0 since in this regime only a
few (degenerate) eigenstates contribute to thermal average.
The simplest way to overcome this is to take a larger num-
ber of random samples R ≫ 1, which, however, requires
a longer CPU time. We have, in this regard, also included
a small portion of anisotropy in the SSM (in the form of
the anisotropic interdimer Dzyaloshinskii-Moriya interaction
{AC}(S
C − S
C), D
z/J ∼ 0.01), which slightly
splits the doubly degenerate single-triplet levels. In this way,
R ∼ 30 per Sz sector was enough for all calculated curves
to converge within ∼ 1% for T/J < 1. Here, the number
of Lanczos iterations M = 100 was used along with the full
reorthogonalization of Lanczos vectors at each step.
C. Entanglement
Entanglement in the absence of magnetic field is most
prominently reflected in spin-spin correlation functions, e.g.,
〈SA ·SB〉 and 〈SA ·SC〉. In zero temperature limit due to quan-
tum phase transition at J ′c these correlations change sign. In
Fig. 4 are presented renormalized spin-spin correlation func-
tions (for positive values identical to concurrence) as a func-
tion of J ′/J : (i) CAB > 0 in dimer phase and (ii) CAC > 0
in the Néel phase. Critical J ′c is indicated by asterisk. The re-
sults for N = 16 are qualitatively and quantitatively similar to
the N = 20 case presented here. At finite temperatures spin
correlations are smeared out as shown in Fig. 4 for various T .
Limiting Heisenberg case, J ′ → ∞, is discussed in more de-
tail in the next Section. J ′ = 0 case corresponds to the single
dimer limit21 and Sec. II.
Complete phase diagram of the SSM at T = 0 but with fi-
nite magnetic field can be classified in terms of concurrence
instead of spin correlations. In Fig. 5(a)CAB is presented as
a function of (J ′/J,B/J) as in the case of a single tetramer,
Fig. 2(a). Presented results correspond to the N = 20 case,
while N = 16 system exhibits very similar structure (not
shown here). N = 8 and N = 4 cases are qualitatively sim-
ilar, the main difference being the value of critical J ′c which
increases with N . Remarkable similarity between all these
cases can be interpreted by local physics in the regime of fi-
nite spin gap, J ′ < J ′c. Qubit pairs are there completely entan-
gled, CAB = 1, and CAB ∼ 12 for magnetic field larger than
the spin gap, but B < J+2J ′. For even larger B concurrence
approaches zero, similar to the N = 4 case. Concurrence is
Figure 6: (Color online) Temperature and magnetic field dependence
of CAB for J
′/J = 0.4 and N = 20. Note the similarity with
the corresponding tetramer results, Fig. 3(a). Dashed lines separate
CAB > 0 from CAB = 0.
zero also for J ′ > J ′c, except along the B ∼ 4J ′ line where
weak finite concurrence could be the finite size effect. Similar
results are found also for N = 16, 8 cases, and are most pro-
nounced in the N = 4 case. At finite temperature the structure
of concurrence is smeared out [Fig. 5(b)] similar to Fig. 2(b).
Concurrence CAC corresponding to next-nearest neighbors
is, complementary to CAB, increased in the Néel phase of the
diagram, Fig. 5(c). The similarity with N = 4, Fig. 2(c) is
somewhat surprising because in this regime long-range cor-
relations corresponding to the gapless spectrum of AFM-like
physics are expected to change also short range correlations.
The only quantitative difference compared to N = 4 is the
maximum value of CAC ∼ 0.3 instead of 0.5 (beside the crit-
ical value J ′c discussed in the previous paragraph). Concur-
rence is very small for B > J + 2J ′. At finite temperatures
fine fluctuations in the concurrence structure are smeared out,
Fig. 5(d).
Temperature and magnetic field dependence of CAB in the
dimer phase is presented in Fig. 6 for fixed J ′/J = 0.4. Sim-
ilarity with the corresponding N = 4 tetramer case, Fig. 3(a),
is astonishing and is again the consequence of local physics in
the presence of a finite spin gap. Finite size effects (in com-
parison with N = 16 and N = 8 cases) are very small (not
shown). Dashed line represents the borderline of the CAB = 0
region: critical Tc ≈ 0.75J valid for B/J . 3, that is in this
regime nearly independent of B, is slightly larger than in the
single tetramer case where its insensitivity to B is even more
pronounced.
IV. HEISENBERG LIMIT
The concurrence corresponding to next-nearest neighbors
in SSM, CAC, is non zero in the Néel phase for J
′ > J ′c.
Typical result for concurrence in this regime (for fixed J ′/J =
1) in terms of temperature and magnetic field is presented in
Fig. 7(a). At zero temperature the concurrence is zero for B >
4J ′ [compare with Fig. 2(c) and Fig. 5(c)].
Figure 7: (Color online) (a) Next nearest neighbor concurrence CAC
for J ′ = J . (b) Heisenberg lattice result as a special case of the
SSM, J = 0. Shaded region represents CAC = 0. In the line shaded
region (low finite temperature and large magnetic field) our numer-
ical results set only the upper limit CAC < 5 · 10
−4. Dashed lines
separate CAC > 0 from CAC = 0.
In the limit J = 0 the model simplifies to the AFM Heisen-
berg model on a square lattice of N sites,
HAC = J
Si · Sj −B
Szi . (11)
Several results for this model have already been presented for
very small clusters55,56,57, however the temperature and mag-
netic field dependence of the concurrence for systems with
sufficiently large number of states and approaching thermo-
dynamic limit has not been presented so far.
In Fig. 7(b) we further presented temperature and magnetic
field dependence of concurrence for the Heisenberg model for
N = 20 (results for N = 16 are quantitatively similar, but not
shown here). Temperature and magnetic field dependence of
CAC exhibits peculiar semi-island shape where at fixed value
of B the concurrence increases with increasing temperature.
This effect is to some extent seen in all cases and is the con-
sequence of exciting local singlet states, which do not appear
in the ground state. At T → 0 finite steps with increasing B
correspond to gradual transition from the singlet ground state
to totally polarized state with total spin S = 10 and vanish-
ing concurrence. This is in more detail presented in Fig. 8(a)
for various N = 4, 8, 16, 20. At B = 0 and for N = 20
we get CAC = 0.19. It is interesting to compare this results
0 0.05 0.1
/2N=8
1/6+2N
-3/2C
Figure 8: (Color online) (a) Zero-temperature concurrence CAC in
the Heisenberg limit as a function of B/J ′ and for various N = 4,
8, 16, 20. Sections with different total spin values are addition-
ally labeled. (b) Finite-size scaling of concurrence in the absence of
magnetic field. Full line represents the fit corresponding to Ref. 58,
CAC ≈
+ 2N−3/2.
with the known finite-size analysis scaling for the ground state
energy of the Heisenberg model58. The same scaling gives
CAC ≈ 16 + 2N
−3/2. Our finite-size scaling, Fig. 8(b), is in
perfect agreement with this result for N → ∞ at T = 0 and
B = 0.
In the opposite limit of high magnetic fields, the vanishing
concurrence CAC = 0, is observed for B above the critical
value Bc = 4J
′ for all system sizes shown in Fig. 8(a). This
result can be deduced also analytically. Since in a fully po-
larized state CAC = 0, this Bc actually denotes a transition
from S1 = N/2 − 1 to S0 = N/2 ferromagnetic ground
state with energy E0 = N(J
′ − B)/2. The energy of the
one-magnon excitation above the ferromagnetic ground state
is given by the spin wave theory, which is in this case exact, as
E1 = E0 −J ′(2− cos kxa− cos kya)+B, where (kx, ky) is
the magnon wave vector and a denotes the lattice spacing. Ev-
idently, a transition to a fully polarized state occurs precisely
at Bc = 4J
′ at (π/a, π/a) point in the one-magnon Brillouin
zone.
V. SUMMARY
The aim of this paper was to analyze and understand
how concurrence (and related entanglement) of qubit pairs
(dimers) is affected by their mutual magnetic interactions. In
particular, we were interested in a planar array of qubit dimers
described by the Shastry-Sutherland model. This model is
suitable due to very robust ground state composed of entan-
gled qubit pairs which breaks down by increasing the in-
terdimer coupling. It is interesting to study both, the en-
tanglement between nearest and between next-nearest spins
(qubits) at finite temperature and magnetic field. The results
are based on numerical calculations using low-temperature
Lanczos methods on lattices of 4, 8, 16 and 20 sites with pe-
riodic boundary conditions.
A comprehensive analysis of concurrence for various pa-
rameters revealed two general conclusions:
(1) For a weak coupling between qubit dimers, J ′ < J ′c,
qubit pairs are locally entangled in accordance with the local
nature of the dimer phase. This is due to a finite singlet-triplet
gap (spin gap) in the excitation spectrum that is a consequence
of strong geometrical frustration in magnetic couplings. The
regime of fully entangled neighbors perfectly coincides with
the regime of finite spin gap as presented in Fig. 9. Calcu-
lated lines for various system sizes N in Fig. 9(a) denote re-
gions (shaded for N = 20) in the (J ′/J,B/J) plane where
CAB = 1 at T = 0. In the lower panel [Fig. 9(b)] the lines
represent the energy gap E1 − EGS between the first excited
state with energy E1 with total spin projection S
z = 1 and
the ground state with energy EGS and total spin projection
Sz = 0, calculated for B = 0. For J ′ < J ′c (full lines)
E1 − EGS corresponds to the value of the spin gap. With an
increasing magnetic field the spin gap closes (shaded region
for N = 20) and eventually vanishes at the CAB = 1 border
line. Shaded regions in Figs. 9(a),(b) therefore coincide. Note
also that the results for N = 16 and 20 sites differ mainly in
J ′c.
As a consequence of finite spin gap and local character of
correlations it is an interesting observation that even N = 4
results as a function of temperature and magnetic field quali-
tatively correctly reproduce N = 20 results in the regime of
J ′ < J ′c. The main quantitative difference is in a renormal-
ized value of J ′c = J/2 for N = 4, as is evident from the
comparison of Figs.2,3 and Figs.5,6. This similarity of the re-
sults appears very useful due to the fact that concurrence for
tetrahedron-like systems (N = 4) is given analytically (Sec.
(2) In the opposite, strong interdimer coupling regime, J ′ >
J ′c, the excitation spectrum is gapless and the concurrence be-
tween next-nearest qubits, CAC, exhibits a similar behavior as
in the antiferromagnetic Heisenberg model J/J ′ → 0. Our
B = 0 results coincide with the known result extrapolated to
the thermodynamic limit CAC ≈ 16 . In finite magnetic field
and T = 0 the concurrence vanishes at Bc = 4J
′ when the
system becomes fully polarized (ground state with the total
spin S = N/2). However, at elevated temperatures the con-
currence increases due to excited singlet states and eventually
drops to zero at temperatures above Tc ≈ J ′.
We can conclude with the observation that our analysis
of concurrence and related entanglement between qubit pairs
was also found to be a very useful measure for classify-
ing various phases of the Shastry-Sutherland model. As our
numerical method is based on relatively small clusters, we
were unable to unambiguously determine possible interme-
diate phases of the model in the regime J ′ ∼ J ′c, but we be-
lieve that concurrence will prove to be a useful probe for the
classification of various phases also in this regime using alter-
native approaches. However, we were able to sweep through
all other dominant regimes of the parameters including finite
temperature and magnetic field.
0 0.2 0.4 0.6 0.8 1
0 0.2 0.4 0.6 0.8 1
singlet - triplet gap
Figure 9: (Color online) (a) Zero-temperature CAB = 1 region in the
plane (J ′/J, B/J) for various N . (b) The corresponding spin gap
at B = 0 (the energy of the lowest total Sz = 1 state relative to the
ground state energy).
VI. ACKNOWLEDGMENTS
The authors acknowledge J. Mravlje for useful discussions
and the support from the Slovenian Research Agency under
Contract No. P1-0044.
Appendix A: EIGENENERGIES AND EIGENVECTORS FOR
PERIODICALLY COUPLED TWO QUBIT DIMERS
Consider two qubit dimers coupled into a tetramer and de-
scribed with the Hamiltonian Eq. (1) and Fig. 1(a). The model
is exactly solvable in the separate {S, Sz} subspaces corre-
sponding to different values of the total spin S and its z com-
ponent Sz . Following the abbreviations for singlet and triplet
states on nearest-neighbor (dimer) sites i and j,
|sij〉 =
|↑i↓j − ↓i↑j〉,
|t0ij〉 =
|↑i↓j + ↓i↑j〉,
|t+ij〉 = |↑i↑j〉,
|t−ij〉 = |↓i↓j〉, (A1)
the resulting eigenstates |φk〉 and eigenenergies Ek corre-
sponding to the hamiltonian Eq. (1) are:
S = 0, Sz = 0 :
|φ1〉 = |sAB〉|sCD〉,
E1 = −3J/2, (A2)
|φ2〉 =
− |t0AB〉|t0CD〉+ |t+AB〉|t
〉+ |t−
E2 = J/2− 4J ′. (A3)
S = 1, Sz = −1 :
|φ3〉 = |sAB〉|t−CD〉,
|φ4〉 = |t−AB〉|sCD〉,
E3,4 = −J/2−B, (A4)
|φ5〉 =
|t0AB〉|t
〉 − |t−
〉|t0CD〉
E5 = J/2− 2J ′ −B. (A5)
S = 1, Sz = 0 :
|φ6〉 = |sAB〉|t0CD〉,
|φ7〉 = |t0AB〉|sCD〉,
E6,7 = −J/2, (A6)
|φ8〉 =
〉 − |t−
E8 = J/2− 2J ′. (A7)
S = 1, Sz = 1 :
|φ9〉 = |sAB〉|t+CD〉,
|φ10〉 = |t+AB〉|sCD〉,
E9,10 = −J/2 +B, (A8)
|φ11〉 =
− |t0AB〉|t+CD〉+ |t
〉|t0CD〉
E11 = J/2− 2J ′ +B. (A9)
S = 2, Sz = −2 :
|φ12〉 = |t−AB〉|t
E12 = J/2 + 2J
′ − 2B. (A10)
S = 2, Sz = −1 :
|φ13〉 =
|t0AB〉|t−CD〉+ |t
〉|t0CD〉
E13 = J/2 + 2J
′ −B. (A11)
S = 2, Sz = 0 :
|φ14〉 =
2|t0AB〉|t0CD〉+ |t+AB〉|t
〉+ |t−
E14 = J/2 + 2J
′. (A12)
S = 2, Sz = 1 :
|φ15〉 =
|t0AB〉|t+CD〉+ |t
〉|t0CD〉
E15 = J/2 + 2J
′ +B. (A13)
S = 2, Sz = 2 :
|φ16〉 = |t+AB〉|t
E16 = J/2 + 2J
′ + 2B. (A14)
1 M. A. Nielsen and I. A. Chuang, Quantum Information and
Quantum Computation (Cambridge University Press, Cambridge,
2001).
2 A. Osterloh, L. Amico, G. Falci, and R. Fazio, Nature 416, 608
(2002).
3 C. H. Bennett, H. J. Bernstein, S. Popescu, and B. Schumacher,
Phys. Rev. A 53, 2046 (1996); C. H. Bennett, D. P. DiVincenzo,
J. A. Smolin, and W.K. Wootters, ibid. 54, 3824 (1996).
4 S. Hill and W. K. Wootters, Phys. Rev. Lett. 78, 5022 (1997).
5 V. Vedral, M. B. Plenio, M. A. Rippin, and P. L. Knight, Phys.
Rev. Lett. 78, 2275 (1997).
6 W. K. Wootters, Phys. Rev. Lett. 80, 2245 (1998).
7 J. Schliemann, D. Loss, and A. H. MacDonald, Phys. Rev. B 63,
085311 (2001); J. Schliemann, J. I. Cirac, M. Kuś, M. Lewenstein,
and D. Loss, Phys. Rev. A 64, 022303 (2001).
8 G.C. Ghirardi and L. Marinatto, Phys. Rev. A 70, 012109 (2004).
9 K. Eckert, J. Schliemann, G. Brus, and M. Lewenstein, Ann. Phys.
299, 88 (2002).
10 J. R. Gittings and A. J. Fisher, Phys. Rev. A 66 032305 (2002).
11 P. Zanardi, Phys. Rev. A 65, 042101 (2002).
12 V. Vedral, Cent. Eur. J. Phys. 2, 289 (2003); D. Cavalcanti, M. F.
Santos, M. O. TerraCunha, C. Lunkes, V. Vedral, Phys. Rev. A 72,
062307 (2005).
13 D. P. DiVincenzo, Mesoscopic Electron Transport, NATO Ad-
vanced Studies Institute, Series E: Applied Science, edited by L.
Kouwenhoven, G. Schön, and L. Sohn (Kluwer Academic, Dor-
drecht, 1997); cond-mat/9612126.
14 D. P. DiVincenzo, Science 309, 2173 (2005).
15 W. A. Coish and D. Loss, cond-mat/0603444.
16 J. M. Elzerman, R. Hanson, J. S. Greidanus, L. H. Willems van
Beveren, S. DeFranceschi, L. M. K. Vandersypen, S. Tarucha, and
L. P. Kouwenhoven, Phys. Rev. B 67, 161308 (2003).
17 J. C. Chen, A. M. Chang, and M. R. Melloch, Phys. Rev. Lett. 92,
176801 (2004).
18 T. Hatano, M. Stopa, and S. Tarucha, Science 309, 268 (2005).
19 J. R. Petta, A. C. Johnson, J. M. Taylor, E. A. Laird, A. Yacoby,
M. D. Lukin, C. M. Marcus, M. P. Hanson, and A. C. Gossard,
Science 309, 2180 (2005).
20 M. A. Nielsen, Ph.D. thesis, University of New Mexico, 1998;
quant-ph/0011036.
21 M. C. Arnesen, S. Bose, and V. Vedral, Phys. Rev. Lett. 87,
017901 (2001).
22 X. Wang, Phys. Lett. A 281, 101 (2001).
23 B. V. Fine, F. Mintert, and A. Buchleitner Phys. Rev. B 71, 153105
(2005).
24 A. Ramšak, J. Mravlje, R. Žitko, and J. Bonča, Phys. Rev. B 74,
241305(R) (2006).
25 T. Yu and J. H. Eberly, Phys. Rev. B 66, 193306 (2002).
26 B. S. Shastry and B. Sutherland, Physica B 108, 1069 (1981).
27 O. F. Syljuåsen, Phys. Rev. A 68, 060301(R) (2003)
28 L. Amico, A. Osterloh, F. Plastina, R. Fazio, and G.M. Palma,
Phys. Rev. A 69, 022304 (2004).
29 T. J. Osborne and M. A. Nielsen, Phys. Rev. A 66, 032110 (2002).
30 T. Roscilde, P. Verrucchi, A. Fubini, S. Haas, and V. Tognetti,
Phys. Rev. Lett. 93, 167203 (2004).
31 T. Roscilde, P. Verrucchi, A. Fubini, S. Haas, and V. Tognetti,
Phys. Rev. Lett. 94, 147208 (2005).
32 D. Larsson and H. Johannesson, Phys. Rev. Let. 95, 196406
(2005).
33 Shi-Jian Gu, Shu-Sa Deng, You-Quan Li, and Hai-Qing Lin Phys.
Rev. Lett. 93, 086402 (2004).
34 S. S. Deng, S. J. Gu, and H. Q. Lin Phys. Rev. B 74, 045103
(2006).
35 Ö. Legeza and J. Sólyom, Phys. Rev. Lett. 96, 116401 (2006).
36 F. Verstraete, M. Popp, and J. I. Cirac, Phys. Rev. Lett. 92, 027901
(2004).
37 A. Ramšak, I. Sega, and J. H. Jefferson Phys. Rev. A 74,
010304(R) (2006).
38 R. F. Werner, Phys. Rev. A 40, 4277 (1989).
39 I. Bose and A. Tribedi, Phys. Rev. A 72, 022314 (2005).
40 R. W. Smith and D. A. Keszler, J. Solid State Chem. 93, 430
(1991).
41 H. Kageyama, K. Yoshimura, R. Stern, N. V. Mushnikov, K.
Onizuka, M. Kato, K. Kosuge, C. P. Slichter, T. Goto, and Y.
Ueda, Phys. Rev. Lett. 82, 3168 (1999).
42 S. Miyahara and K. Ueda, J. Phys.: Condens. Matter 15, R327
(2003); and references therein.
43 G. A. Jorge, R. Stern, M. Jaime, N. Harrison, J. Bonča, S. El
Shawish, C. D. Batista, H. A. Dabkowska, and B. D. Gaulin, Phys
Rev. B 71, 092403 (2005).
44 S. El Shawish, J. Bonča, C. D. Batista, and I. Sega, Phys. Rev. B
71, 014413 (2005).
45 S. El Shawish, J. Bonča, and I. Sega, Phys. Rev. B 72, 184409
(2005).
46 S. Miyahara and K. Ueda, Phys. Rev. Lett. 82, 3701 (1999).
47 E. Müller-Hartmann, R. R. P. Singh, C. Knetter, and G. S. Uhrig,
Phys. Rev. Lett. 84, 1808 (2000).
48 A. Isacsson and O. F. Syljuåsen, Phys. Rev. E 74, 026701 (2006);
and references therein.
49 K. Onizuka, H. Kageyama, Y. Narumi, K. Kindo, Y. Ueda, and T.
Goto, J. Phys. Soc. Jpn. 69, 1016 (2000).
50 T. Momoi and K. Totsuka, Phys. Rev. B 61, 3231 (2000).
51 G. Misguich, Th. Jolicoeur, and S. M. Girvin, Phys. Rev. Lett., 87,
097203 (2001).
52 M. Aichhorn, M. Daghofer, H. G. Evertz, and W. von der Linden,
Phys. Rev. B 67, 161103(R) (2003).
53 J. Jaklič and P. Prelovšek, Adv. Phys. 49, 1 (2000); Phys. Rev.
Lett. 77, 892 (1996); Phys. Rev. B 49, 5065 (1994).
54 J. Bonča and P. Prelovšek, Phys. Rev. B 67, 085103 (2003).
55 X. Wang, Phys. Rev. A 64, 012313 (2001); ibid. 66, 044305
(2002).
56 M. Cao and S. Zhu, Phys. Rev. A 71, 034311 (2005).
57 G. F. Zhang and S. S. Li, Phys. Rev. A 72, 034302 (2005).
58 E. Manousakis, Rev. Mod. Phys. 63, 1 (1991).
http://arxiv.org/abs/cond-mat/9612126
http://arxiv.org/abs/cond-mat/0603444
http://arxiv.org/abs/quant-ph/0011036
ABSTRACT
  We show that temperature and magnetic field properties of the entanglement
between spins on the two-dimensional Shastry-Sutherland lattice can be
qualitatively described by analytical results for a qubit tetramer. Exact
diagonalization of clusters with up to 20 sites reveals that the regime of
fully entangled neighboring pairs coincides with the regime of finite spin gap
in the spectrum. Additionally, the results for the regime of vanishing spin gap
are discussed and related to the Heisenberg limit of the model.

<|endoftext|><|startoftext|>
Bonding of H in O vacancies of ZnO
H. Takenaka and D.J. Singh
Materials Science and Technology Division and Center for Radiation Detection Materials and Systems,
Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6032
(Dated: October 25, 2018)
We investigate the bonding of H in O vacancies in ZnO using density functional calculations. We
find that H is anionic and does not form multicenter bonds with Zn in this compound.
PACS numbers: 71.20.Ps,71.55.Gs
ZnO is of importance as an extremely fast inorganic
scintillator material when doped with Ga or In. It
is useful in alpha particle detection, e.g. for devices
such as deuterium-tritium neutron generators used in
radiography.1,2,3,4,5,6 In this application, H treatment has
been shown to improve properties. ZnO has also at-
tracted much recent attention motivated by potential
applications as an oxide electronic material,7,8,9,10 and
in optoelectronic and lighting applications.11,12,13,14,15 H
has been implicated as playing an important role in the
electronic properties for ZnO for those applications as
well.16,17,18 From a fundamental point of view, the be-
havior, and especially bonding of H, is of great inter-
est; H plays an exceptionally important role in chemistry,
and shows unique bonding characteristics. For example,
it readily forms compounds where it behaves as a halo-
gen ion and forms structures similar to fluorides, such as
rutile, perovskite, rocksalt, etc.,19,20,21 and at the same
time readily occurs a cation in other chemical environ-
ments. Polar covalent bonds involving H and hydrogen
bonds are central to much of organic chemistry as well
the properties of important substances such as water.22
Thus the recent report by Janotti and Van de Walle (JV)
that H forms a new type of strong multicenter bond in
O vacancies in ZnO is of wide ranging interest.23
In this paper, we present standard local density ap-
proximation (LDA) calculations of the electronic proper-
ties and structure of H containing O vacancies in ZnO.
We do not find the multicenter covalent bonds claimed by
JV, and instead characterize the behavior of H as quite
conventional in that it occurs as an anion on the anion
site in a polar crystalline environment.
Our calculations were done within the standard local
density approximation using the general potential lin-
earized augmented planewave (LAPW) method, includ-
ing local orbitals.24,25 Specifically, we constructed a 72
atom 3x3x2 wurtzite supercells of ZnO, with one O atom
removed and replaced by H. The calculations were done
using the bulk lattice parameters of ZnO, but the inter-
nal coordinates of all atoms in the supercell were fully
relaxed. No symmetry was assumed in the relaxations.
The LAPW method is an all electron method that makes
no shape approximations to either the potential or charge
density. It divides space into non-overlapping atom cen-
tered spheres and an interstitial region. The method then
employs accurate basis sets appropriate for each region.24
In the present calculations, LAPW sphere radii of 2.0 a0,
1.6 a0 and 1.2 a0 were used for Zn, O, and H respec-
tively, along with a basis set consisting of more than 8500
LAPW functions and local orbitals. Convergence tests
were done with a larger basis set of approximately 12000
functions, but no significant changes were found. The
relaxations were done without any imposed symmetry,
with a 2x2x2 special k-point zone sampling. A sampling
using only the Γ point was found to yield slightly different
quantitative results, due to the limited size of our super-
cell, but would lead to the same conclusions. The calcu-
lated value of the internal parameter is u=0.119, which
agrees almost exactly with the experimental value. The
densities of states used to analyze the electronic proper-
ties were obtained using the linear tetrahedron method
based on eigenvalues and wavefunctions at 36 k-points in
the half zone (k and -k are connected by time reversal).
In our relaxed structure for a neutral cell, we find that
H occurs in a slightly asymmetric position, with three Zn
neighbors at 2.03 Å, and one Zn neighbor (the one along
the c-axis direction) at 2.17 Å. For the singly charged
cell, we obtain a very similar result, specifically three Zn
neighbors at 2.02 Å, and the apical Zn at 2.21 Å. In the
following, we focus on the neutral cell except as noted.
Fig. 1 shows the projection of the electronic density of
states onto the H LAPW sphere, of radius 1.2 a0. The
Fermi energy for our neutral cell lies at a position one
electron per cell into the conduction bands, correspond-
ing to the valence difference of one between O and H.
As may be seen, there are two prominent peaks in the
H component of the density of states, one, denoted “B”,
at ∼ -8 eV with respect to the Fermi level (-6 eV with
respect to the valence band maximum), and the other,
denoted “A”, high in the conduction bands at ∼ 6 eV. JV
identified these peaks, “B” and “A”, respectively, as the
bonding and antibonding combinations of metal and H
orbitals giving rise to the multicenter bond. In addition,
there is significant H s character distributed over the va-
lence bands, especially near the valence band maximum.
We note that the very large bonding-antibonding split-
ting of 14 eV implied by the assignment of JV indicates
extremely strong covalent bonds, which is somewhat sur-
prising considering the Zn-H distances. In any case, such
a large covalent gap would imply that the bonding and
antibonding states should have mixed character. In other
words, the bonding state should be of roughly half H s
character, while the remaining H 1s character should oc-
cur in the unoccupied antibonding level, so that the oc-
http://arxiv.org/abs/0704.0173v1
-8 -6 -4 -2  0  2  4  6  8
E(eV)
H pH in O vacancy
FIG. 1: Projection of the electronic density of states onto the
s and p components inside the H LAPW sphere, radius 1.2 a0,
for the 72 atom neutral cell. The two peaks identified by JV as
bonding and antibonding combinations are indicated by “B”
and “A”, respectively. The Fermi level lies in the conduction
bands. The position of the valence band maximum is denoted
by “VBM” (note that the LDA strongly underestimates the
3.3 eV band gap of ZnO).
cupancy of the H 1s orbital should be roughly 1 e, and
certainly significantly less than 2 e.
To analyze the bonding further it is convenient to com-
pare the charge density with an ionic model, as was done
for some alanates.30,31 As mentioned, H is known to en-
ter some solids as an anion, including tetragonal MgH2.
ZnH2 also exists though it is not as well characterized.
Furthermore, the simplest hydride, LiH, is of this ioni-
cally bonded type and includes H− anions coordinated
by six metal atoms.33,34 In these hydrogen anion based
materials, the negative H ion is stabilized by the Ewald
field. In fact, the importance of the Ewald field is one of
the essential differences between chemistry in solid state
and the chemistry of molecules. The long range Coulomb
interaction stabilizes ionic bonding for species that would
generally be largely covalent in small molecules, and in
particular stabilizes anions such as O2− and H−, which
are common in solid state chemistry but much less so
in gas phase molecules. This stabilization by the Ewald
field is reflected in the variability of the effective size of
H in crystal structure data for anionic hydrides.27,28,29 In
view of the common occurrence of H as an anion in many
metal hydrides, it would not be surprising if H− were sta-
bilized by the Ewald field of an anion vacancy in a polar
crystal such as ZnO. Thus we consider an ionic model,
based on the charge density of a H− ion stabilized by the
Ewald field, as simulated by a Watson sphere,35 as in Ref.
31. For such a H− ion, 0.525 e out of 2.0 e, i.e. ∼ 26% of
the charge, is inside a radius of 1.2 a0, so the majority of
the charge is outside. Because of the small sphere radius
used for H in our calculations the amount of charge in-
-8 -6 -4 -2  0  2  4  6  8
E(eV)
FIG. 2: Integration of the projection of the H s projected
density of states as in Fig. 1 normalized according to the
fraction of charge inside a 1.2 a0 sphere for a Watson sphere
stabilized H− anion (see text).
side the sphere is only weakly dependent on the Watson
sphere radius, which reflects the environment. For a non-
spin polarized neutral H in free space as described in the
LDA, 0.378 e (38%) would be inside a radius of 1.2 a0,
showing that there is a strong dependence on the charge
state, though not precise proportionality. Fig. 2 shows
the integral of the H s character as a function of energy
normalized by the fraction of the H− charge inside a 1.2
a0 sphere (0.525/2). Over the valence band region the
p contribution is less than 2%, and the d contribution is
less than 0.2%. The conduction bands, which are more
Zn sp derived, show a larger proportion of H p character,
as may be seen in Fig. 1. Thus the charge inside the
sphere, which comes from the occupied valence bands,
is mainly due to H s states, and not from orbitals on
neighboring atoms.
Using the ionic model for H−, i.e. incorporating the
factor of 0.525/2 as the fraction of charge inside the H
LAPW sphere, and integrating, one finds that the peak
“B” contains ∼ 0.8 H s electrons. Integrating over the
remaining valence bands brings the H s count to 2.0 elec-
trons, i.e. what is expected for H−. This leads to an
interpretation of the electronic structure, where the peak
“B” comes from the H 1s state. This hybridizes with
valence band states, which have mixed Zn d and O p
character. The second peak “A”, 14 eV higher, is then
the H s resonance. This is a very reasonable position for
the resonance of H−. In particular, the H−− resonance
of atomic H− is at ∼ 14.5 eV.36,37,38 JV emphasized the
shape of the charge density associated with the states in
the peak “B” and argued for bonding based partly on real
space images of this charge. As mentioned, in our pro-
jected density of states we find that this peak contains
0.8 s electrons (i.e. 40% H s character), which would
be consistent with a bonding orbital. However, the hy-
bridization is with other occupied states, and when the
integration is done over all the valence bands, we find 2
s electrons, consistent with H−. We emphasize that mix-
ing of occupied states does not contribute to the energy,
and that such hybridizations do not constitute bonds.
Our calculated binding energy relative H2 and a relaxed
neutral supercell with an O vacancy is 87 kJ/mol H.39
This may be an overestimate due to LDA errors,40 but
in any case is much smaller than the binding that would
be suggested by a 14 eV bonding-antibonding splitting.
We also calculated the positron wavefunction and life-
times for ZnO with an O vacancy and with the H contain-
ing O vacancy. This was done using the LAPW method
in the full inverted self-consistent Coulomb potential plus
the correlation and enhancement factors of Boronski and
Nieminen41 as calculated from the full charge density. We
obtain a bulk positron lifetime for ZnO of 144 ps, which
is at the lower end of the experimental range. Reported
experimental values are 151 ps (Ref. 42), 170 ps (Ref.
43), 141 ps - 155 ps (Ref. 44), and 182 ps (Ref. 45). Sig-
nificantly, positrons, which are positively charged, tend
to localize in voids and in sites that are favorable for
cations, and localize weakly if at all in anion sites, due
to the unfavorable Coulomb potential. We do not find
positron localization at the O vacancy in our ZnO super-
cell, indicating that the O is indeed an anion as expected,
nor do we find positron localization or a significant life-
time increase in the cell with a H containing O vacancy.
We also find no significant change in lifetime for H in an
O vacancy within a charged supercell with one electron
removed. In contrast, we obtain a bound positron state
for Zn vacancies, both with and without H, reflecting the
fact that Zn is on a cation site. The calculated lifetime
in a supercell with a Zn vacancy is 212 ps, while with a H
filled Zn vacancy we obtain 175 ps (in this case H bonds
to a single adjacent O to form a hydroxyl like unit with
H-O bond length of 1.01 Å).46
To summarize, we have performed density functional
calculations for ZnO supercells with both empty and H
filled O vacancies. Based on an analysis of the electronic
structure we do not find any evidence for hydrogen mul-
ticenter bonds, but rather find that H occurs as H−.
We are grateful for helpful discussions with L.A. Boat-
ner, J.S. Neal, and L.E. Halliburton. This work was sup-
ported by the Department of Energy, Office of Nonpro-
liferation Research and Development, NA22.
1 W. Lehmann, Solid State Electronics 9, 1107 (1966).
2 D. Luckey, Nucl. Instr. and Meth. 62, 119 (1968).
3 T. Batsch, B. Bengtson, and M. Moszynski, Nucl. Instr.
and Meth. 125, 443 (1975).
4 S.E. Derenzo, E. Bourret-Courchesne, M.J. Weber, and
M.K. Klintenberg, Nucl. Instr. Meth. Phys. Res. A 537,
261 (2005).
5 J.S. Neal, L.A. Boatner, N.C. Giles, L.E. Halliburton, S.E.
Derenzo, and E.D. Bourret-Courchesne, Nucl. Inst. Meth.
Phys. Res. A 568, 803 (2006).
6 E.D. Bourret-Courchesne, and S.E. Derenzo, 2006 IEEE
Nuclear Science Symposium Conference Record N40-5,
1541 (2006).
7 K. Nomura, H. Ohta, K. Ueda, T. Kamiya, M. Hirano, and
H. Hosono, Science 300, 5623 (2003).
8 R.L. Hoffman, B.J. Norris, and J.F. Wager, Appl. Phys.
Lett. 82, 733 (2003).
9 A. Suzuki, T. Matsushita, T. Aoki, Y. Yoneyama, and M.
Okuda, Jpn. J. Appl. Phys., Part 2 38, L71 (1999).
10 D.C. Look, D.C. Reynolds, C.W. Litton, R.L. Jones, D.B.
Eason, and G. Cantwell, Appl. Phys. Lett. 81, 1830 (2002).
11 F.H. Nicoll, Appl. Phys. Lett. 9, 13 (1966).
12 D.M. Bagnall, Y.F. Chen, Z. Zhu, T. Yao, S. Koyama,
M.Y. Shen, and T. Goto, Appl. Phys. Lett. 70, 2230
(1997).
13 D.C. Look, J.W. Hemsky, and J.R. Sizelove, Phys. Rev.
Lett. 82, 2552 (1999).
14 D.C. Look, Mater. Sci. Eng. B 80, 383 (2001).
15 M.H. Huang, S. Mao, H. Feick, H.Q. Yan, Y.Y. Wu, H.
Kind, E. Weber, R. Russo, and P. Yang, Science 292, 1897
(2001).
16 C.G. Van de Walle, Phys. Rev. Lett. 85, 1012 (2000).
17 S.F.J. Cox, E.A. Davis, S.P. Cottrell, P.J.C. King, J.S.
Lord, J.M. Gil, H.V. Alberto, R.C. Vilao, J. Pironto
Duarte, N. Ayres de Campos, A. Weidinger, R.L. Lichti,
and S.J.C. Irvine, Phys. Rev. Lett. 86, 2601 (2001).
18 D.M. Hoffmann, A. Hofstaetter, F. Leiter, H. Zhou, F.
Henecker, B.K. Meyer, S.B. Orlinskii, J. Schmidt, and P.G.
Baranov, Phys. Rev. Lett. 88, 045504 (2002).
19 B. Bertheville, T. Herrmannsdorfer, and K. Yvon, J. Alloys
Compd. 325, L13 (2001).
20 F. Gingle, T. Vogt, E. Akiba, and K. Yvon, J. Alloys
Compd. 282, 125 (1999).
21 K. Yvon, Chimia 52, 613 (1998).
22 L. Pauling, Nature of the Chemical Bond (Cornell Univer-
sity Press, Ithaca, 1960).
23 A. Janotti, and C.G. Van de Walle, Nature Materials 6,
44 (2007).
24 D.J. Singh and L. Nordstrom, Planewaves, Pseudopoten-
tials and the LAPW Method, 2nd. Ed. (Springer, Berlin,
2006).
25 D. Singh, Phys. Rev. B 43, 6388 (1991).
26 R. Yu and P.K. Lam, Phys. Rev. B 15, 8730 (1988).
27 D.F.C. Morris and G.L. Reed, J. Inorg. Nucl. Chem. 27,
1715 (1965).
28 R.D. Shannon, Acta Cryst.A32, 751 (1976).
29 L. Pauling, Acta Cryst., Sect. B 34, 746 (1978).
30 A. Aguayo and D.J. Singh, Phys. Rev. B 69, 155103
(2004).
31 D.J. Singh, Phys. Rev. B 71, 216101 (2005).
32 E. Wiberg, W. Henle, and R. Bauer, Z. Naturforsch. B 6,
393 (1951).
33 A.B. Kunz and D.J. Mickish, Phys. Rev. B 11, 1700 (1975).
34 R. Dovesi, C. Ermond, E. Ferrero, C. Pisani, and C. Roetti,
Phys. Rev. B 29, 3591 (1984).
35 R.E. Watson, Phys. Rev. 111, 1108 (1958); we used a H
anion stabilized by a sphere of radius 1.62 Å.
36 D.S. Walton, B. Peart, and K. Dolder, J. Phys. B 3, L148
(1970).
37 H.S. Taylor and L.D. Thomas, Phys. Rev. Lett. 28, 1091
(1972).
38 G.J. Schulz, Rev. Mod. Phys. 45, 378 (1973).
39 The value used for the energy of H2 is -2.294 Ry; D.J.
Singh, M. Gupta, and R. Gupta, Phys. Rev. B 75, 035103
(2007).
40 The LDA generally overbinds solids, and this leads to over-
estimates of the binding of H in solids, typically in the
range of 0 to 20 kJ/mol H; H. Smithson, C.A. Marianetti,
D. Morgan, A. Van der Ven, A. Predith, and G. Ceder,
Phys. Rev. B 66, 144107 (2002); S.V. Halilov, D.J. Singh,
M. Gupta, and R. Gupta, Phys. Rev. B 70, 195117 (2004);
K. Miwa and A. Fukumoto, Phys. Rev. B 65, 155114
(2002).
41 E. Boronski and R.M. Nieminen, Phys. Rev. B 34, 3820
(1986); see also M.J. Puska and R.M. Nieminen, Rev. Mod.
Phys. 66, 841 (1994); P. Schultz and K.G. Lynn, Rev. Mod.
Phys. 60, 701 (1988).
42 G. Bauer, W. Anwand, W. Skorupa, J. Kuriplach, O. Me-
likhova, C. Moisson, W. von Wenckstern, H. Schmidt, M.
Lorenz, and M. Grundmann, Phys. Rev. B 74, 045208
(2006).
43 F. Tuomisto, V. Ranki, K. Saarinen and D.C. Look, Phys.
Rev. Lett. 91, 205502 (2003).
44 S. Dutta, M. Chakrabarti, S. Chattopadhyay, D. Jana, D.
Sanyal, and A. Sarkar, J. Appl. Phys. 98, 053513 (2005).
45 Z.Q. Chen, S. Yamamoto, M. Maekawa, A. Kawasuso, X.L.
Yuan, and T. Sekiguchi, J. Appl. Phys. 94, 4807 (2003).
46 The calculations for H in a Zn vacancy were done using a
different set of LAPW sphere radii, as was necessary due
to the short H-O bond length.
ABSTRACT
  We investigate the bonding of H in O vacancies of ZnO using density
functional calculations. We find that H is anionic and does not form
multicenter bonds with Zn in this compound.

<|endoftext|><|startoftext|>
Introduction
The extraction of the CP violating phase α [1] has lead to some recent con-
troversy confronting the results and statistical methods of two different col-
laborations: the frequentist approach advocated in references [2, 4] and the
bayesian approach employed in reference [3]. In reference [2] J. Charles et al.
presented an important criticism to the bayesian methods used by the UTfit
collaboration in order to extract the angle α of the unitarity triangle b–d from
ππ and ρρ data. The criticism relies heavily on the statistical treatment of
data: frequentist vs. bayesian. The answer of the UTfit collaboration [3]
rises some interesting points, both on the interpretation of the results and
on the importance of the physical assumptions on the hadronic amplitudes.
The authors of [2] have recently answered to this UTfit reply in [4]. The
aim of the present work is to clarify several issues central to an adequate
understanding of the physics at stake. We also want to call the attention
on the importance of reparametrization invariance (RpI) in the sense intro-
duced by F.J.B. and J. Silva in reference [5] to do so. We will not enter
the polemic arena of statistical confrontation. With regard to this, we will
instead illustrate the compatibility of results obtained in both approaches as
long as things are done properly; notwithstanding, we will not ignore some
“obscure” aspects of both approaches that are somehow swept under the
rug as the statistical confrontation rages on, they illustrate that rather than
sticking to one approach and deprecating the other it may be wiser to learn
lessons from both.
This work is organized as follows. We start section 2 with a short re-
minder on reparametrization invariance and its implications, then we use the
exclusion or inclusion of B → π0π0 data together with RpI to clarify the ori-
gin of our knowledge on α. In section 3 we study critically Standard Model
inspired parametrizations. We devote section 4 to a detailed analysis of the
impact on the results of allowed ranges for some parameters. The lessons
from previous sections set up the stage for an adequate extraction of α, to
which section 5 is dedicated, especially in the presence of New Physics (NP)
in loops. Several appendices deal with aspects left out of the main flow of
the discussion.
2 Reparametrization invariance and B → ππ
2.1 Weak Phases
We start this section with a short reminder of the findings presented in
reference [5] concerning the parametrization of decay amplitudes and the
election of weak phases. A generic parametrization of the decay amplitude
of a B meson to a given final state and the CP-conjugate amplitude is the
following1:
A = M1 e
+iφ1 eiδ1 +M2 e
+iφ2 eiδ2 ,
Ā = M1 e
−iφ1 eiδ1 +M2 e
−iφ2 eiδ2 , (1)
where φj are CP-odd weak phases, δj are CP-even strong phases and Mj
the magnitudes of the different contributions. The first property to consider
is the full generality, as long as φ1 − φ2 6= 0 mod [π], of Eq. (1), i.e. any
additional contribution M3e
±iφ3eiδ3 can be recast into the previous form as
e±iφ3 =
sin(φ3 − φ2)
sin(φ1 − φ2)
e±iφ1 +
sin(φ3 − φ1)
sin(φ2 − φ1)
e±iφ2 , (2)
and thus
A′ = A +M3e
+iφ3eiδ3 = M ′1e
+iφ1eiδ
1 +M ′2e
+iφ2eiδ
Ā′ = Ā +M3e
−iφ3eiδ3 = M ′1e
−iφ1eiδ
1 +M ′2e
−iφ2eiδ
2 , (3)
M ′1e
1 = M1e
iδ1 +M3e
sin(φ3 − φ2)
sin(φ1 − φ2)
M ′2e
2 = M2e
iδ2 +M3e
sin(φ3 − φ1)
sin(φ2 − φ1)
. (4)
We can also use Eq. (2) to change our basic set {φ1, φ2} of weak phases to any
other arbitrary set of weak phases {ϕ1, ϕ2}, as long as ϕ1−ϕ2 6= 0 mod [π]:
A = M1 e+iϕ1 ei∆1 +M2 e+iϕ2 ei∆2 ,
Ā = M1 e−iϕ1 ei∆1 +M2 e−iϕ2 ei∆2 , (5)
where
M1ei∆1 = M1eiδ1
sin(φ1 − ϕ2)
sin(ϕ1 − ϕ2)
sin(φ2 − ϕ2)
sin(ϕ1 − ϕ2)
M2ei∆2 = M1eiδ1
sin(φ1 − ϕ1)
sin(ϕ2 − ϕ1)
sin(φ2 − ϕ1)
sin(ϕ2 − ϕ1)
. (6)
1If the final state is ±1 CP eigenstate, Ā should include an additional ±1 factor.
This change in the basic set of chosen weak phases should have no physical
implications, hence the name reparametrization invariance. We remind two
main consequences of RpI in the absence of hadronic inputs. For an extensive
discussion see [5]:
1. Consider two basic sets of weak phases {φ1, φ2} and {φ1, ϕ2} with φ2 6=
ϕ2; if an algorithm allows us to write φ2 as a function of physical
observables then, owing to the functional similarity of equation (1) and
(5), we would extract ϕ2 with exactly the same function, leading to
φ2 = ϕ2, in contradiction with the assumptions; then, a priori, the
weak phases in the parametrization of the decay amplitudes have no
physical meaning, or cannot be extracted without hadronic input.
2. If, experimentally, the direct CP asymmetry C = (|A|2 − |Ā|2)/(|A|2+
|Ā|2) is C = 0, then the decay amplitudes can be expressed in terms of a
single weak phase, which could be sensibly extracted, up to discrete am-
bigüities, through the indirect CP asymmetry S = 2 Im(ĀA∗)/(|A|2 +
|Ā|2). Additionally, if the theoretical description of the decay ampli-
tudes only involves a single weak phase from a basic Lagrangian, then
it can be identified with the phase measured through S.
As we will see, this two results apply respectively to the π+π− and π+π0
channels. Essentially, the first one will be operative in the ∆I = 1/2 piece
and the second one in the ∆I = 3/2.
2.2 Removing π0π0 information
To make our point transparent we will start by studying the extraction – in
fact the non-extraction – of α from ππ data when B → π0π0 experimental
information is removed. Let us start with a widely used [2, 3], Standard
Model inspired, parametrization of the decay amplitudes:
A+− ≡ A(B0d → π+π−) = e−iαT+− + P ,√
2A+0 ≡
2A(B+ → π+π0) = e−iα(T+− + T 00) ,
2A00 ≡
2A(B0d → π0π0) ≡
2A+0 −A+− = e−iαT 00 − P ,
Ā+− ≡ A(B̄0d → π+π−) = e+iαT+− + P ,√
2Ā+0 ≡
2A(B− → π−π0) = e+iα(T+− + T 00) ,
2Ā00 ≡
2A(B̄0d → π0π0) ≡
2Ā+0 − Ā+− = e+iαT 00 − P . (7)
When π0π0 experimental information is removed we have two decoupled de-
cays:
1. π+π0 data, i.e. the average branching ratio B+0 and the direct CP
asymmetry C+0, provide, respectively, |T+− + T 00| and a consistency
check C+0 = 0; α is irrelevant there.
2. π+π− data, i.e. B+−, C+− and the mixing induced CP asymmetry
S+−, give information on α decoupled from π+π0, on |T+−|, |P | and
the relative (strong) phase δPT+− between T
+− and P .
With three observables and four parameters everybody knows or suspects
that one cannot really extract α: we have C+− 6= 0, as reminded in section
2.1, α cannot be extracted from B → π+π− in this limited case. One can try,
nevertheless, to obtain a probability distribution function (PDF) for α as in
reference [2]. This PDF, obtained in an analysis with three observables and
four unknowns, has obviously a strong dependence in the priors, as in figure
2 of [2]. Even worse, reparametrization invariance [5] tells us that A+−, Ā+−
can also be written as
A+− = e
−iα′T ′+− + P ′, Ā+− = e
+iα′T ′+− + P ′ , (8)
where α′ is any weak phase – known or unknown, α′ 6= 0, π –. In this scenario
the conclusion is clear: any information one would get for α would also be
valid for any α′ and thus it cannot be assigned to α. This solves the puzzle
raised in the MA and RI parametrizations within figure 4 of reference [2]:
those PDFs cannot be attributable to α. Just with that data alone we cannot
extract α′ – whatever it is –, as we have emphasized in 2.1. To illustrate this
issue we compute the PDFs of figure 1 in the following parametrization:
A+− ≡ A(B0d → π+π−) = e−iα
T+− + P ,
2A+0 ≡
2 A(B+ → π+π0) = e−iα(T+− + T 00) ,
2A00 ≡
2 A(B0d → π0π0) ≡
2A+0 − A+− ,
Ā+− ≡ A(B̄0d → π+π−) = e+iα
T+− + P ,
2Ā+0 ≡
2 A(B− → π−π0) = e+iα(T+− + T 00) ,
2Ā00 ≡
2 A(B̄0d → π0π0) ≡
2Ā+0 − Ā+− .
Notice that just with α = α′, Eq. (9) recovers the parametrization in Eq. (7).
The phase of T+− is set to zero (i.e. all strong phases are relative to arg(T+−))
and flat priors are used for all the parameters2, that is, moduli |T+−|, |T 00|,
2The allowed ranges for the different moduli and the sensitivity to them in this and
other cases will be addressed later, for instance, for this example, they are all limited to
lie in the range [0; 10]× 10−3 ps−1/2.
|P | and phases δP = arg(P ), δ0 = arg(T 00), α and α′. Results in other
parametrizations, being equally illustrative, are relegated to appendix C.
25 50 75 100 125 150 175
(a) α PDF
25 50 75 100 125 150 175
(b) α′ PDF
0 25 50 75 100 125 150 175
α = α′
(c) Joint (α′, α) PDF
25 50 75 100 125 150 175
(d) α = α′ PDF
Figure 1: PDFs of α and α′ from B → ππ without π0π0 data.
The lesson of this example is rather obvious: the set of observables being
insensitive to α, its PDF is uninformative (just the flat prior in this case);
the PDF in figure 1(d), erroneously identified with α, is nothing else than α′
itself, whatever it could be.
2.3 Including back π0π0 information
When we incorporateB → π0π0 data to the isospin construction, |A00| (|Ā00|)
gives the angle among A+0 (Ā+0) and A+− (Ā+−); using then the known
phase difference between A+− and Ā+−, the angle among A+0 and Ā+0 is
obtained. This is just the isospin analysis giving α. Knowing α, i.e. with α
fixed, A+− = e
−iαT+− + P would have full meaning and {B+−, C+−, S+−}
would fix the three hadronic parameters. Unfortunately the isospin analysis
as explained above yields allowed values for α spanning a wide range. The
degeneracy of solutions together with the experimental errors do not fix α,
just exclude some region. In this situation {B+−, C+−, S+−} do not really fix
the hadronic parameters and, consequently, they tend to generate a spurious
PDF for α as we have seen. The final “α” is thus a sort of convolution of the
α obtained from the isospin analysis and the spurious one “extracted” purely
from π+π− data. This is illustrated with the PDFs of figure 2, making use
of the parametrization in Eq. (9).
25 50 75 100 125 150 175
(a) α PDF
25 50 75 100 125 150 175
(b) α′ PDF
0 25 50 75 100 125 150 175
α = α′
(c) Joint (α′, α) PDF
25 50 75 100 125 150 175
(d) α = α′ PDF
Figure 2: PDFs of α and α′ from B → ππ.
To stress the importance of this issue we repeat the previous example
while arbitrarily reducing all experimental uncertainties by a common factor
of 5. The PDFs corresponding to this fake scenario are displayed in figure 3.
The results shown in figures 1, 2 and 3 deserve some comment:
1. Figures 1(b) and 2(b) are almost identical; in the former we were not
using B → π0π0 information while in the later we were doing so. This
similarity is a dramatic illustration of the spurious nature of the “ex-
tracted” α′.
2. Figure 2(d) is the cut of the joint PDF in figure 2(c) along the line α =
α′. Therefore the so called MA extraction of α is a sort of convolution
25 50 75 100 125 150 175
(a) α PDF
25 50 75 100 125 150 175
(b) α′ PDF
25 50 75 100 125 150 175
(c) α = α′ PDF
Figure 3: PDFs of α and α′ from B → ππ with experimental uncertainties
reduced by a factor of 5.
of the Gronau-London α – figure 2(a) – and the spurious one.
3. This α′ PDF basically allows any value of α′ except the neighborhoods
of 0 and π, which are a priori forbidden by S+−, C+− 6= 0: obviously
there is no way to produce CP violation in the π+π− channel without
two weak phases in the amplitude that controls it. The exclusion of
α′ = 0, π is the only physical information one can extract in the SM
from the PDF of α′.
4. The deep in the α distributions around α ∼ π/4, which is transmitted
to the α = α′ PDF, is senseful. The exclusion of α ∼ 0, π is also
physical inside the SM. Nevertheless, how strongly these 0, π regions
are excluded is highly sensitive to the allowed ranges for |T+−|, |T 00|
and |P | – see section 4 –. As we move away form the α = 0, π points, the
final PDF of α would be more influenced by the spurious α′ distribution.
One can see that in the shape of the α distribution for α < 25◦ or
α > 75◦.
5. As uncertainties are reduced, even with α ≡ α′, the valid ranges for
the “real” α emerge, despite the α′ distribution. That is, as experimen-
tal uncertainties are reduced, the α′ “pollution” of α through α ≡ α′
becomes increasingly ineffective, as it should, and just transmits the
physical exclusion of α = 0, π inside the SM.
The main lesson from the previous example is: α is obtained from purely
∆I = 3/2 amplitudes, without additional hadronic input. Including it in
∆I = 1/2 pieces, as reparametrization invariance shows, pollutes the legiti-
mate extraction with information that one cannot claim is concerning α.
3 Standard Model inspired parametrizations
As stated above, following the consequences of reparametrization invariance,
the really legitimate sources of our knowledge on α are A+0, Ā+0. We have
referred to the parametrization in Eq. (7) as a “SM inspired parametrization”
of the amplitudes and we have discussed how the inclusion of α in A+−, Ā+−
is dangerous with present uncertainties. Nevertheless, it is clear that the
exclusion of α ∼ 0, π inside the SM is a valid physical consequence that comes
from having α in A+− and Ā+−. To further illustrate the importance and the
subtlety of this issue let us consider in detail what can be interpreted as a
“SM inspired parametrization”. Once we take into account reparametrization
invariance, we only need3 to focus on A+− and Ā+−:
1. RpI allows us to write {A+−, Ā+−} in terms of any pair of weak phases
{φ1, φ2} (as long as φ1 − φ2 6= 0 mod [π]), nothing enforces the use of
{0, α}.
2. SM compliance of any parametrization only requires that the vanishing
of all the SM phases leads to no CP violation, once again nothing singles
out or requires the use of {0, α}.
Consequently, as we have at our disposal other SM phases that we can choose
to parametrize A+−, Ā+−, namely
4 γ, β, χ, χ′, instead of A+− = e
−iαT+−+P
and Ā+− = e
iαT+− + P , we can for example write, on equal footing,
A+− = M1e
iδ1e−iχ +M2e
iδ2e−iβ, Ā+− = M1e
iδ1e+iχ +M2e
iδ2e+iβ , (10)
A+− = e
−iχT+− + P, Ā+− = e
+iχT+− + P . (11)
Within the SM χ ∼ O(λ2), had we used this last parametrization (Eq. (11)),
we would have found extreme compatibility problems5 that would be absent
with another SM inspired parametrization: this is a dramatic illustration
of the consequences of RpI mentioned in section 2.1. In other words, pre-
tending that one obtains information on SM “theoretical” phases just by
parametrizing A+− and Ā+− with them is in general senseless. In this case
we would have obtained that figure 2(b) is the PDF of the phase χ, the one
that appears in Bs–B̄s mixing [8, 9, 10, 11, 12, 13].
3A+0 and Ā+0 can be parametrized with a single weak phase, identifiable with α, A00
and Ā00 will follow from the isospin relations.
4γ = arg(−VudVcbV ∗ubV ∗cd), β = arg(−VcdVtbV ∗cbV ∗td), χ = arg(−VcbVtsV ∗csV ∗tb) and χ′ =
arg(−VusVcdV ∗udV ∗cs) [6].
5Just look, for example, to the O(λ2) ∼ 2 − 3◦ region of the different α′ PDFs in the
plots of previous sections [7].
4 Physics and parametrical problems
In section 2 we mentioned that the exclusion of the “dangerous” α′ near 0 and
π depended on the allowed ranges for the parameters |T ij| and |P |. Figure 4
shows the PDFs of α, α′ and α = α′ for four different sets of allowed ranges of
|T ij| and |P |. On the one hand, the PDFs of α in figures 4(a), 4(d), 4(g) and
4(j) are quite similar. On the other hand, the PDFs of α′ in figures 4(b), 4(e),
4(h) and 4(k) are completely different: the “dangerous” α′, especially in the
regions close to 0,π, is sensitive to the applied bounds. This is automatically
transmitted to the α = α′ PDF and it is in this way that the region with
“α” close to 0,π is suppressed (even wipped out as in figures 4(c) and 4(i))
through the cuts on the spurious α′, induced by the cuts on |T ij| and |P |.
One could think that this is particular to the bayesian statistical approach,
figure 5 shows the frequentist confidence level curves for α computed under
the same parametric restrictions. As we use the parametrization of Eq. (7),
they correspond to the α = α′ plots in Figure 4. It is rather clear that
without regard to the statistical approach, limiting the values of |T ij| and
|P | has observable effects in the extraction of α. Note that figure 5(c) differs
from figure 5(a) not by a cut but by a change in the shape, even if it is not
a dramatic change.
The authors of reference [2] pointed out that there is some peculiar limit
with α → 0 together with P/T+−, T 00/T+− → −1, |T+−| → ∞ – using the
parametrization of Eq. (7) – that keeps all the observables “in place”: it is
in fact a question of having α′ → 0 rather than α → 0. This peculiar limit
is useful to understand the α ∼ 0, π exclusion above mentioned. To obtain
parameter configurations with high likelihood when α(′) approaches 0 or π,
the required values of |T ij| and |P | are increasingly large. Imposing bounds
on |T ij| and |P | automatically limits how close to 0, π one can push the weak
phase while producing likely branching ratios and asymmetries. The use of
the parametrization in Eq. (9) shows how this works for the dangerous α′
and is then transmitted to α.
25 50 75 100 125 150 175
(a) α PDF
25 50 75 100 125 150 175
(b) α′ PDF
25 50 75 100 125 150 175
(c) α = α′ PDF
Allowed ranges: |T ij| ∈ [0; 10]× 10−3 ps−1/2, |P | ∈ [0; 2.5]× 10−3 ps−1/2
25 50 75 100 125 150 175
(d) α PDF
25 50 75 100 125 150 175
(e) α′ PDF
25 50 75 100 125 150 175
(f) α = α′ PDF
Allowed ranges: |T ij| ∈ [0; 10]× 10−3 ps−1/2, |P | ∈ [0; 10]× 10−3 ps−1/2
25 50 75 100 125 150 175
(g) α PDF
25 50 75 100 125 150 175
(h) α′ PDF
25 50 75 100 125 150 175
(i) α = α′ PDF
Allowed ranges: |T ij| ∈ [0; 5]× 10−3 ps−1/2, |P | ∈ [0; 1.25]× 10−3 ps−1/2
25 50 75 100 125 150 175
(j) α PDF
25 50 75 100 125 150 175
(k) α′ PDF
25 50 75 100 125 150 175
(l) α = α′ PDF
Allowed ranges: |T ij| ∈ [0; 25]× 10−3 ps−1/2, |P | ∈ [0; 25]× 10−3 ps−1/2
Figure 4: PDFs obtained using the parametrization in Eq. (9) and different
allowed ranges for |T ij| and |P |.
25 50 75 100 125 150 175
(a) |T ij| < 10, |P | < 2.5
25 50 75 100 125 150 175
(b) |T ij | < 10, |P | < 10
25 50 75 100 125 150 175
(c) |T ij | < 5, |P | < 1.25
25 50 75 100 125 150 175
(d) |T ij | < 25, |P | < 25
Figure 5: α CL; as usual |T ij| and |P | in units of 10−3 ps−1/2.
5 The extraction of α from B → ππ and New
Physics
Recently the UTfit collaboration has proposed to add information on the
moduli of the amplitudes in order to extract α inside the SM. In particular, to
add reasonable QCD based cuts on the moduli of T ij and P . Even if we agree
with this procedure, we must stress that the resulting PDF of α – see figures
4(c) or 4(i) – in the non zero region mixes ∆I = 3/2 information with spurious
∆I = 1/2 information. In this case it does not seem dramatic, but it can be so
in the B → ρρ case – see [2] –. In addition, if one is trying to make a general
fit of the SM it is more natural to use the ∆I = 3/2 piece of B → ππ to
get reliable bounds on α and once α is fixed by the general unitarity triangle
analysis, use the ∆I = 1/2 piece of B → ππ to obtain better information
on the hadronic parameters. In fact, the UTfit collaboration presents results
along this line in [3]. This implies our recommendation of using α in the A+0
amplitude and another phase in A+− or in the ∆I = 1/2 piece.
After confronting the SM à la CKM with data, the most important ob-
jective in overconstraining the unitarity triangle is in fact to look for New
Physics (NP) [14, 15, 16, 8, 9, 10, 11, 12, 13]. When there is NP – just in the
mixings or also in the ∆I = 1/2 decay amplitudes6 – it is not appropriate
to use a SM inspired parametrization. In the limit where all SM phases
go to zero, C+− and S+− can still be reproduced by NP loops. So, if we
want to interpret the α PDF as7 ᾱ we have to use a different CP-violating
phase in the ∆I = 1/2 piece or in A+−. Parametrizations that fulfill these
requirements are the so-called PLD, ES, the ’τ ’ parametrization in [2] and
even our SM-like parametrization with α′ in Eq. (9) despite having one more
parameter. A similar one, which additionally factorizes an overall scale of
the amplitudes, is the following, that we call ’1i’:
A+− ≡ e−iαT3/2(T + iP ),
2A00 ≡ e−iαT3/2(1− T − iP ),√
2A+0 ≡ e−iαT3/2,
2Ā+0 ≡ e+iαT3/2,
Ā+− ≡ e+iαT3/2(T − iP ),
2Ā00 ≡ e+iαT3/2(1− T + iP ).
Notice that a global weak phase in A+− is irrelevant in C
+− and amounts to
a global shift of arg(Ā+−A
In this section we will “extract” α in a bayesian approach making use of
different parametrizations; we will show the consistency of all those results
6With great accuracy – up to small electroweak penguins – this case corresponds to
having NP everywhere except in tree level amplitudes.
7Where ᾱ = π − β̄ − γ, β̄ = β − φd and the NP phase in B0d–B̄0d mixing is defined by
= r2de
−i2φd [Md
]SM .
and then compare to frequentist results. From a fundamental point of view,
as stressed in previous sections, we are not willing to use information beside
assuming the triangular isospin relations, the single “tree level” weak phase of
the ∆I = 3/2 piece and experimental results themselves. Reparametrization
invariance and the presence of a single weak phase, α, in the ∆I = 3/2
amplitudes A+0 and Ā+0 imply that all the results to be presented in this
section will be valid in the presence of New Physics in loops.
Figure 6 shows the PDF of α in three different cases: the ’PLD’ [17] and
’1i’ (Eq. (12)) parametrizations, and the explicit extraction (as in [17] or [5]).
Corresponding 68%, 90% and 95% probability regions are displayed in table
1, together with the frequentist 68%, 90% and 95% CL regions (in the fol-
lowing, frequentist calculations are carried with the ’PLD’ parametrization).
These regions are represented in figure 7. Despite some small differences in
the 68% regions, somehow expectable as they are more sensitive to details,
the results are consistent, they coincide rather well. B → ππ data are still
too uncertain to really provide important constraints on α, the only relevant
feature being the exclusion of the α ∼ π/4 region, which could be understood
(see section C.1 in appendix C) in terms of the smallness of B00.
25 50 75 100 125 150 175
(a) PLD parametrization
25 50 75 100 125 150 175
(b) 1i parametrization
25 50 75 100 125 150 175
(c) Explicit extraction
Figure 6: α PDFs.
68% 90% 95%
PLD [0; 5]◦ ∪ [85; 101]◦∪ [0; 8]◦ ∪ [82; 107]◦∪ [0; 9]◦ ∪ [82; 110]◦
[121; 150]◦ ∪ [168; 180]◦ [114; 157]◦ ∪ [162; 180]◦ ∪[113; 180]◦
1i [95; 174]◦ [0; 1]◦ ∪ [89; 180]◦ [0; 5]◦ ∪ [85; 180]◦
[2; 8]◦ ∪ [82; 88]◦∪ [0; 9]◦ ∪ [81; 91]◦
Explicit [100; 120]◦ ∪ [125; 145]◦∪ [95; 175]◦ ∪ [179; 180]◦ [0; 10]◦ ∪ [80; 180]◦
[150; 170]◦
CL [0; 7]◦ ∪ [83; 104]◦ [0; 12]◦ ∪ [78; 180]◦ [0; 14]◦ ∪ [76; 180]◦
[115; 154]◦ ∪ [166; 180]◦
Table 1: α regions within [0; 180◦].
25 50 75 100 125 150 175
Figure 7: α regions (the ordering, top to bottom, is in each case: ’PLD’ pa-
rametrization, ’1i’ parametrization, Explicit extraction and frequentist anal-
ysis).
Conclusions
To our knowledge the discrepancies between frequentist and bayesian ap-
proaches using the so-called MA and RI parametrizations with Eq. (7) have
not been previously understood. We explain that with present experimen-
tal uncertainties it is extremely unsecure to introduce the phase α in the
∆I = 1/2 piece. To a great extent a spurious PDF of α tends to be gener-
ated. The Gronau and London analysis is critically based on the appearance
of one weak phase in the ∆I = 3/2 piece (C+− = 0). Introducing α in the
∆I = 1/2 piece – or A+− – (C
+− 6= 0) brings this “second” α to the category
of ’not observable’ even if one is using a Standard Model inspired parametri-
zation. This difficulty is operative in the so-called MA and RI parametriza-
tions. The introduction of α in the ∆I = 1/2 piece and some QCD-based
bounds on the amplitudes allows – as done by the UTfit collaboration – to
eliminate the solutions around α ∼ 0, π inside the SM. The PDF can still be
partially contaminated with the spurious α distribution. In B → ππ it is not
dramatic but it could be so in other channels. This last procedure cannot be
applied to an analysis with NP in loops. Therefore, we strongly recommend
to use parametrizations where α is just included in the ∆I = 3/2 piece. We
partially agree with the UTfit collaboration that, in spite of the differences
among the frequentist and bayesian methods, both approaches give similar
results if one uses parametrizations with a clear physical meaning. In this
sense the most relevant result is the exclusion of the region ᾱ ∼ 25◦ − 75◦.
Acknowledgments
This research has been supported by European FEDER, Spanish MEC under
grant FPA 2005-01678, Generalitat Valenciana under GVACOMP 2007-172,
by Fundação para a Ciência e a Tecnologia (FCT, Portugal) through the
projects PDCT/FP/63912/2005, PDCT/FP/63914/2005, CFTP-FCT UNIT
777, and by the Marie Curie RTN MRTN-CT-2006-035505. M.N. acknowl-
edges financial support from FCT. The authors thank J. Bernabéu and P.
Paradisi for reading the manuscript and useful comments.
A Inputs and numerical methods
Along this work we use the set of experimental measurements [18, 19, 20, 21,
22,23,24,25,26,27,28], combined by the Heavy Flavour Averaging Group [29],
in table 2.
B+−ππ B
5.2± 0.2 1.31± 0.21 5.7± 0.4
C+−ππ S
−0.39± 0.07 −0.59± 0.09 −0.37± 0.32
Table 2: Experimental results, branching ratios are multiplied by 10−6.
In terms of B → ππ amplitudes,
Bij = τBi+j
|Aij|2 + |Āij |2
, C ij =
|Aij|2 − |Āij|2
|Aij|2 + |Āij|2
, Sij =
2 Im(ĀijAij
|Aij|2 + |Āij |2
All frequentist CL computations are performed by: (1) minimizing χ2 with
respect to all parameters except the one of interest which is fixed (in this case
α), (2) computing the corresponding CL through an incomplete Γ function.
All bayesian PDFs are computed using especially adapted Markov Chain
MonteCarlo techniques.
B Experimental results and isospin relations
The isospin relations
A+− +
2A00 =
2A+0 ,
Ā+− +
2Ā00 =
2Ā+0 , (14)
define two triangles in the complex plane whose relative orientation fixes α.
The sizes of the different sides follow from Eq. (14).
|A+−|
2|A00|
2|A+0| |Ā+−|
2|Ā00|
2|Ā+0|
1.441 1.040 2.634 2.176 1.533 2.634
Table 3: Numerical values of the sides of the isospin triangles computed with
experimental central values, to be multiplied by 10−3 ps−1/2.
This allows the reconstruction, up to a number of discrete ambigüities -
namely up to eight -, of both triangles. Central values of present measure-
ments yield the values of the sides in table 3. One straightforward question
is mandatory: do those would-be triangles “close”? The answer is in the
negative because
|A+−|+
2|A00| = 2.481 ≯ 2.634 =
2|A+0| ,
|Ā+−|+
2|Ā00| = 3.709 > 2.634 =
2|Ā+0| .
In fact, for those central values, the first triangle is not a triangle [30]. In
terms of likelihood, the closest configuration to that situation, the most likely
one, is having the first triangle flat, a feature which naturally explains the
reduced – by a factor of two, from eight to four – degeneracy of α “solutions”.
That is, while for old data the almost flatness of this same isospin triangle
yielded eight different solutions distributed in four almost-degenerate pairs,
those pairs are now degenerate and rather than exact solutions for the central
values of the observables they produce best-fitting points.
Consequently, the use of explicit solution constructions requires the rejec-
tion of the joint regions of experimental input incompatible with the isospin
relations Eqs. (14). For old data, this meant rejecting some 48.2% of allowed
experimental input (weighting each observable with a gaussian with mean
and standard deviation given by the corresponding central value and uncer-
tainty), for the new data set this rejection rate is 70.9%. In the bayesian
and frequentist treatments the isospin relations are assumed valid and all
the subsequent analyses are “normalized” to that assumption.
C Removing B → π0π0 information
C.1 Explicit extraction of α
This appendix is devoted to some complementary results extending what is
presented in section 2.2. The first issue we will address is the explicit extrac-
tion8 of α when B → π0π0 information is removed, that is, no knowledge of
B00 and C00. The explicit extraction of α assumes the isospin relations in
Eqs. (14) so to start with, the ignorance on B00 is not ”just plain ignorance”
(whatever this could stand for) as it will operatively mean that for any ex-
perimental set of results {B+−, B+0, C+−, S+−}, B00 and C00 should be such
8Beside the explicit formula for α in terms of the available observables presented in
reference [17] we also make use of the extraction of α explained in [5]; the results are com-
pletely equivalent, however the later does not make any use of a particular parametrization
of the amplitudes and is easily interpreted in terms of the isospin construction.
that both would-be isospin triangles are in fact isospin triangles. C00 is obvi-
ously restricted to be in the range [−1; 1]; what about B00? One could argue
that if there is no information on B → π0π0 it should be smaller than a given
bound or one can just let it be as large as allowed by other data and isospin
constraints. This rather trivial fact is apparently at the origin of the discrep-
ancy in the results presented in references [2,3] for the explicit extraction of
α “without” B → π0π0 information: figure 8 shows two PDFs of α. They
are obtained by generating known experimental sets {B+−, B+0, C+−, S+−}
according to gaussian distributions with central values and standard devia-
tions given by the quoted measurements and uncertainties (C+− and S+− are
also restricted to be within [−1; 1]), then C00 and B00 are generated through
flat distributions, C00 in the range [−1; 1] and B00 in a range [0;B00Max]. Sets
{B+−, B+0, C+−, S+−, B00, C00} which fulfill the isospin relations Eqs. (14)
are retained and used to extract α. The PDFs of α represented in figure 8
only differ in the value of B00Max, Fig. 8(a) was obtained with B
Max equal to
two times the present measurement while Fig. 8(b) was obtained with B00Max
equal to twenty times the present measurement. On the one hand, the PDF
in figure 8(a) coincides with the one presented in figure 4 ’ES’ of reference [2];
on the other hand the PDF in figure 8(b) agrees, more or less, with figure
4 of reference [3]. It is now clear that the difference among both may be
just due to the numerical procedure. Figure 8(b) shows that the removal of
B → π0π0 information leads to a loss of knowledge on α. Ironically, there is
a lesson in this example: numerics apart, the smallness of B00 is responsible
for the exclusion of values α ∼ π/4.
25 50 75 100 125 150 175
(a) α PDF
25 50 75 100 125 150 175
(b) α PDF
Figure 8: Explicit extraction without B → π0π0; lighter curves correspond
to the different individual contributions related by the discrete ambigüities.
C.2 Parametrizations
To complete the picture we now proceed to repeat the extraction of α when
B → π0π0 information is removed in several parametrizations. We will make
use of the ’PLD’ parametrization [17], of the ’1i’ parametrization with fixed
weak phases in {A+−, Ā+−} (Eq. (12)) and, finally, of the parametrization in
Eq. (9) but in this case, apart from α and α′, instead of moduli and phases we
will use real and imaginary parts of T+−, P and T 00 (the RI parametrization
in reference [2]). The PDFs of α obtained for the first two parametrizations
are shown in figure 9, they are eloquent: no knowledge on α.
25 50 75 100 125 150 175
(a) α PDF, PLD parametrization
25 50 75 100 125 150 175
(b) α PDF, 1i parametrization
Figure 9: Extraction without B → π0π0.
For the RI parametrization we show the PDFs of α, α′ and the one
obtained by setting α = α′ in figure 10. Once again it is clear that there
is no information on α and that inappropriately insisting on including it in
{A+−, Ā+−} produces the senseless result of figure 10(c).
25 50 75 100 125 150 175
(a) α PDF
25 50 75 100 125 150 175
(b) α′ PDF
25 50 75 100 125 150 175
(c) α = α′ PDF
Figure 10: Extraction without B → π0π0, RI parametrization.
The conclusion of this appendix is straightforward: just dealing with
a reduced scenario in which B → π0π0 information is removed, a proper
understanding of the subtleties involved in the parametrization of B → ππ
amplitudes avoids peculiar results as for instance the ’MA’ and ’RI’ ones
included in figure 4 of reference [2]. We have shown here that starting with a
flat prior for α consistently gives highly non-informative posteriors in several
sensible parametrizations.
D Using the RI parametrization
In section 2.3 we used the parametrization in Eq. (9) to obtain figure 2 with
flat |T+−|, |P |, |T 00|, arg(P ), arg(T 00), α and α′ priors. For completness
we also show – figure 11 – the PDFs of α, α′ and α = α′ in case one uses
flat |T+−|, Re [P ], Im [P ], Re [T 00], Im [T 00], α and α′ priors. Beside the
effect of the spurious α′ in the PDF of α = α′, we can also appreciate the
influence of the change in the priors: the integration domain is the same as in
figure 2 but the integration measure is now different. The main effect is the
relative enhancement of the contributions from regions with large parameters,
including the contributions from the α′ → 0 driven region.
25 50 75 100 125 150 175
(a) α PDF
25 50 75 100 125 150 175
(b) α′ PDF
25 50 75 100 125 150 175
(c) α = α′ PDF
Figure 11: α extraction, RI parametrization.
E One short statistical comment
Leaving completely aside philosophical aspects of probability, both frequen-
tist and bayesian approaches start with a common likelihood function. Each
approach reduces the information provided by the likelihood function in a
different manner. Consequently, they do not yield strictly coincident results:
• Bayesian posteriors obviously depend on the priors, for example the
allowed ranges or the shape. As we have seen, we obtain different
posteriors with different priors. However, as long as one is using sen-
sible parametrizations and reasonable priors, we end up finding rather
compatible results.
• Frequentist CL curves do depend on the parametrization, to be precise,
they depend on the allowed ranges for the parameters; once sensible
parametrizations and adequate ranges are used, CL curves obtained
with them are identical. The α → 0 limit in the SM inspired parame-
trization of Eq. (7) illustrates this issue.
Beside those well known issues, we may find troublesome that:
1. Most probable values in the bayesian PDFs do not coincide with the
analytical solutions for α.
2. Intimately related to this aspect, bayesian PDFs seem unable to dis-
tinguish among degenerate solutions.
We remind that these statements concern one dimensional PDFs of α. Fre-
quentist one dimensional CL curves distinguish α solutions because they are
obtained through best fitting points for fixed α. Bayesian PDFs do not dis-
tinguish them as the uncertainties produce distributions for the degenerate
solutions which overlap and add up in the complete PDF. One can still have
a hint of the proximity of different solutions from this kind of overlap, but
this is not the point here. For reduced experimental uncertainties, bayesian
PDFs would not overlap and would distinguish among those different solu-
tions. This could be sufficient to think that, per se, there is no discriminating
advantage in using one or the other approach. With present uncertainties,
bayesian analyses seem incapable of pinning down the right location of the
solutions in α and telling us something about their degeneracy. It is not a
fundamental problem of bayesian methods as reduced uncertainties would
overcome these “difficulties”. If it is not a fundamental problem, could we
somehow overcome these “difficulties” with present uncertainties? The an-
swer is in the positive as the problem only arises because we are insisting
in the reduction of the available experimental information to obtain one-
dimensional PDFs of α; let us take a look to the joint PDFs in figure 12.
These are the joint PDFs of (δ, α) and (αeff , α) obtained with the ’PLD’
parametrization. They are quite illustrative, one can see the different solu-
tions in α concentrated around the values of α dictated by the analytical
expectations. The pretended fundamental drawbacks of bayesian methods
to adequately place and distinguish the solutions are just a consequence of
pushing too far, for the present level of experimental uncertainty in the re-
sults, the statistical “reduction of information process”. A simultaneous look
to both frequentist and bayesian results will not put an end to the statistical
discrepancies, notwithstanding it will be very helpful to understand the phys-
ical results we are interested in. Both approaches are “information reduction
processes” and strictly sticking to one and deprecating the other may not be
the wiser strategy.
0 25 50 75 100 125 150 175
(a) Joint (δ, α) PDF
0 25 50 75 100 125 150 175
(b) Joint (αeff , α) PDF
Figure 12: Joint PDFs obtained with the ’PLD’ parametrization.
References
[1] M. Gronau and D. London, Phys. Rev. Lett. 65, 3381 (1990).
[2] J. Charles, A. Höcker, H. Lacker, F. R. Le Diberder, and S. T’Jampens,
hep-ph/0607246.
[3] UTfit, M. Bona et al., hep-ph/0701204.
[4] J. Charles, A. Höcker, H. Lacker, F. Le Diberder, and S. T’Jampens,
hep-ph/0703073.
[5] F. J. Botella and J. P. Silva, Phys. Rev. D71, 094008 (2005),
hep-ph/0503136.
[6] F. J. Botella, G. C. Branco, M. Nebot, and M. N. Rebelo, Nucl. Phys.
B651, 174 (2003), hep-ph/0206133.
[7] J. A. Aguilar-Saavedra, F. J. Botella, G. C. Branco, and M. Nebot,
Nucl. Phys. B706, 204 (2005), hep-ph/0406151.
[8] Z. Ligeti, M. Papucci, and G. Perez, Phys. Rev. Lett. 97, 101801 (2006),
hep-ph/0604112.
[9] P. Ball and R. Fleischer, Eur. Phys. J. C48, 413 (2006),
hep-ph/0604249.
[10] Y. Grossman, Y. Nir, and G. Raz, Phys. Rev. Lett. 97, 151801 (2006),
hep-ph/0605028.
[11] UTfit, M. Bona et al., Phys. Rev. Lett. 97, 151803 (2006),
hep-ph/0605213, http://www.utfit.org/.
[12] J. Charles, hep-ph/0606046.
[13] F. J. Botella, G. C. Branco, and M. Nebot, Nucl. Phys. B768, 1 (2007),
hep-ph/0608100.
[14] CKMfitter Group, J. Charles et al., Eur. Phys. J. C41, 1 (2005),
hep-ph/0406184.
[15] UTfit, M. Bona et al., JHEP 07, 028 (2005), hep-ph/0501199.
[16] F. J. Botella, G. C. Branco, M. Nebot, and M. N. Rebelo, Nucl. Phys.
B725, 155 (2005), hep-ph/0502133.
http://arxiv.org/abs/hep-ph/0607246
http://arxiv.org/abs/hep-ph/0701204
http://arxiv.org/abs/hep-ph/0703073
http://arxiv.org/abs/hep-ph/0503136
http://arxiv.org/abs/hep-ph/0206133
http://arxiv.org/abs/hep-ph/0406151
http://arxiv.org/abs/hep-ph/0604112
http://arxiv.org/abs/hep-ph/0604249
http://arxiv.org/abs/hep-ph/0605028
http://arxiv.org/abs/hep-ph/0605213
http://arxiv.org/abs/hep-ph/0606046
http://arxiv.org/abs/hep-ph/0608100
http://arxiv.org/abs/hep-ph/0406184
http://arxiv.org/abs/hep-ph/0501199
http://arxiv.org/abs/hep-ph/0502133
[17] M. Pivk and F. R. Le Diberder, Eur. Phys. J. C39, 397 (2005),
hep-ph/0406263.
[18] BABAR Collaboration, B. Aubert et al., hep-ex/0703016.
[19] BABAR Collaboration, B. Aubert et al., Phys. Rev. D75, 012008
(2007), hep-ex/0608003.
[20] BABAR Collaboration, B. Aubert et al., Phys. Rev. Lett. 95, 151803
(2005), hep-ex/0501071.
[21] BABAR Collaboration, B. Aubert et al., Phys. Rev. Lett. 94, 181802
(2005), hep-ex/0412037.
[22] BELLE Collaboration, K. Abe et al., hep-ex/0608035.
[23] BELLE Collaboration, K. Abe et al., Phys. Rev. Lett. 95, 101801 (2005),
hep-ex/0502035.
[24] BELLE Collaboration, K. Abe et al., Phys. Rev. Lett. 94, 181803 (2005),
hep-ex/0408101.
[25] BELLE Collaboration, K. Abe et al., Phys. Rev. Lett. 93, 021601 (2004),
hep-ex/0401029.
[26] BELLE Collaboration, Y. Chao et al., Phys. Rev. D69, 111102 (2004),
hep-ex/0311061.
[27] BELLE Collaboration, K. Abe et al., Phys. Rev. Lett. 91, 261801 (2003),
hep-ex/0308040.
[28] BELLE Collaboration, K. Abe et al., Phys. Rev. D68, 012001 (2003),
hep-ex/0301032.
[29] The Heavy Flavour Averaging Group,
http://www.slac.stanford.edu/xorg/hfag/.
[30] F. J. Botella, D. London, and J. P. Silva, Phys. Rev. D73, 071501
(2006), hep-ph/0602060.
http://arxiv.org/abs/hep-ph/0406263
http://arxiv.org/abs/hep-ex/0703016
http://arxiv.org/abs/hep-ex/0608003
http://arxiv.org/abs/hep-ex/0501071
http://arxiv.org/abs/hep-ex/0412037
http://arxiv.org/abs/hep-ex/0608035
http://arxiv.org/abs/hep-ex/0502035
http://arxiv.org/abs/hep-ex/0408101
http://arxiv.org/abs/hep-ex/0401029
http://arxiv.org/abs/hep-ex/0311061
http://arxiv.org/abs/hep-ex/0308040
http://arxiv.org/abs/hep-ex/0301032
http://www.slac.stanford.edu/xorg/hfag/
http://arxiv.org/abs/hep-ph/0602060
	Introduction
	Reparametrization invariance and bold0mu mumu BBBBBB
	Weak Phases
	Removing bold0mu mumu 000000000000 information
	Including back bold0mu mumu 000000000000 information
	Standard Model inspired parametrizations
	Physics and parametrical problems
	The extraction of bold0mu mumu  from bold0mu mumu BBBBBB and New Physics
	Inputs and numerical methods
	Experimental results and isospin relations
	Removing bold0mu mumu B00B00B00B00B00B00 information
	Explicit extraction of bold0mu mumu 
	Parametrizations
	Using the RI parametrization
	One short statistical comment
ABSTRACT
  The extraction of the weak phase $\alpha$ from $B\to\pi\pi$ decays has been
controversial from a statistical point of view, as the frequentist vs. bayesian
confrontation shows. We analyse several relevant questions which have not
deserved full attention and pervade the extraction of $\alpha$.
Reparametrization Invariance proves appropriate to understand those issues. We
show that some Standard Model inspired parametrizations can be senseless or
inadequate if they go beyond the minimal Gronau and London assumptions: the
single weak phase $\alpha$ just in the $\Delta I=3/2$ amplitudes, the isospin
relations and experimental data. Beside those analyses, we extract $\alpha$
through the use of several adequate parametrizations, showing that there is no
relevant discrepancy between frequentist and bayesian results. The most
relevant information, in terms of $\alpha$, is the exclusion of values around
$\alpha\sim \pi/4$; this result is valid in the presence of arbitrary New
Physics contributions to the $\Delta I=1/2$ piece.

<|endoftext|><|startoftext|>
Introduction
Supernovae measurements [1] indicate that our universe has entered a phase of late-
time acceleration. One can question the magnitude of the acceleration and its equation
of state, although given the concordance of different cosmological data, acceleration
seems a robust observation (although see [2] for criticisms). Commonly, in order to
explain this phenomenon one postulates the existence of a minute cosmological constant
Λ ∼ 10−12 eV4. This fits the data well and is the most economic explanation in terms of
parameter(s). However such a tiny value is extremely unnatural from a particle physics
point of view [3]. Given the theoretical problems of a cosmological constant, one hopes
that the intriguing phenomenon of acceleration is a window to new observable physics.
This could be in the matter sector, in the form of dark energy [4, 5], or in the gravity
sector, in the form of a large distance modification of Einstein gravity [6, 7, 8, 9].
Scalar field driven dark energy, or quintessence [4] is one of the most popular of the
former possibilities. However these models have important drawbacks, such as the
fine tuning of the mass of the quintessence field (which has to be smaller than the
actual Hubble parameter, H0 ∼ 10−33 eV), and stability of radiative corrections from
http://arxiv.org/abs/0704.0175v2
Solar system constraints on Gauss-Bonnet mediated dark energy 2
the matter sector [10] (see however [11]). Modified gravity models have the potential
to avoid these problems, and can give a more profound explanation of the acceleration.
However, these are far more difficult to obtain since Einstein’s theory is experimentally
well established [12], and the required modifications happen at very low (classical)
energy scales which are (supposed to be) theoretically well understood. Furthermore,
many apparently successful modified gravity models suffer from instabilities or are
incompatible with gravity experiments. For example the self-accelerating solutions of
DGP [8] suffer from perturbative ghosts [13], and f(R) gravity theories [9] can conflict
with solar system measurements and present instabilities [14].
In this paper we will consider observational constraints on a class of gravity theories
which feature both dark energy and modified gravity. Specifically, we will examine
solar system and laboratory constraints resulting from the response of gravity to a
quintessence-like scalar field, which couples to quadratic order curvature terms such
as the Gauss-Bonnet term. Such couplings arise naturally [15], and modify gravity at
local and cosmological scales [15, 16]. Although the Gauss-Bonnet invariant shares
many of the properties of the Einstein-Hilbert term, the resulting theory can have
substantially different features, see for example [17]. It is a promising candidate for
a consistent explanation of cosmological acceleration, but as we will show, can also
produce undesirable effects at solar system scales.
In particular, we will determine constraints from deviations in planetary orbits
around the sun, the frequency shift of signals from the Cassini probe, and table-
top experiments. In contrast to some previous efforts in the field [18], we will not
suppose a priori the order of the Gauss-Bonnet correction or the scalar field potential.
Instead we will calculate leading-order gravity corrections for each of them, and obtain
constraints on the relevant coupling constants (checking they fall within the validity of
our perturbative expansion). Hence our analysis will apply for large couplings, which
as we will see, are in accord with Gauss-Bonnet driven effective dark energy models. In
this way we will show such models generally produce significant deviations from general
relativity at local scales. We also include higher-order scalar field kinetic terms, although
for the solutions we consider they turn out to be subdominant.
In the next section we will present the theory in question and calculate the
corrections to a post-Newtonian metric for a distributional point mass source. In
section 3, we derive constraints from planetary motion, the Cassini probe, and a table-
top experiment. For the Cassini constraint, we have to explicitly derive the predicted
frequency shift for our theory, as it does not fall within the usual Parametrised Post-
Newtonian (PPN) analysis. We discuss the implications of our results in section 4.
2. Quadratic Curvature Gravity
We will consider a theory with the second-order gravitational Lagrangian
R− (∇φ)2 − 2V (φ)
Solar system constraints on Gauss-Bonnet mediated dark energy 3
ξ1(φ)LGB + ξ2(φ)Gµν∇µφ∇νφ+ ξ3(φ)(∇φ)2∇2φ+ ξ4(φ)(∇φ)4
, (1)
which includes the Gauss-Bonnet term LGB = R2 − 4RµνRµν + RµνρσRµνρσ. Note for
example that such a Lagrangian with given ξ’s arises naturally from higher dimensional
compactification of a pure gravitational theory [15]. On its own, in four dimensions, the
Gauss-Bonnet term does not contribute to the gravitational field equations. However
we emphasise that when coupled to a scalar field (as above), it has a non-trivial effect.
Throughout this paper we take the dimensionless couplings ξi and their derivatives
to be O(1). There is then only one scale for the higher curvature part of the action,
given by the parameter α, with dimensions of length squared. Similarly we assume that
all derivatives of the potential V are of O(V ), which in our conventions has dimensions
of inverse length squared. These two simplifying assumptions will hold for a wide range
of theories, including those in which ξi and V arise from a toroidal compactification of
a higher dimensional space [15]. On the other hand it is perfectly conceivable that they
do not apply for our universe, in which case the corresponding gravity theories will not
be covered by the analysis in this article.
Using the post-Newtonian limit, the metric for the solar system can be written [12]
ds2 = −(1− h00)(c dt)2 + (δij + hij)dxidxj +O(ǫ3/2) . (2)
with h00, hij = O(ǫ). The dimensionless parameter ǫ is the typical gravitational strength,
given by ǫ = Gm/(rc2) where m is the typical mass scale and r the typical length scale
(see below). For the solar system ǫ is at most 10−5, while for cosmology, or close to the
event horizon of a black hole, it is of order unity. The scale of planetary velocities v,
is of order ǫ1/2, and so the h0i components of the metric are O(ǫ
3/2), as are ∂th00 and
∂thij . In what follows, we will take φ = φ0 +O(ǫ). For the linearised approximation we
are using, we can adopt a post-Newtonian gauge in which the off-diagonal components
of hij are zero. We can then write
hij = −2Ψδij , h00 = −2Φ , (3)
and so c2Φ is the Newtonian potential.
In this paper we will consider the leading-order corrections in ǫ without assumptions
on the magnitude of V and α. To leading order in ǫ, the Einstein equations take the
nice compact form,
ρm − V − 2αξ′1D(Φ + Ψ, φ) + O(ǫ2, αǫ3/r2, V ǫr2) (4)
2ξ′1D(Ψ, φ) +
D(φ, φ)
+O(ǫ2, αǫ3/r2, V ǫr2) (5)
where primes denote ∂/∂φ, and V , ξ′1, etc. are evaluated at φ = φ0. The matter energy
density in the solar system is ρm, and G0 is its bare coupling strength (without quadratic
gravity corrections). Other components of the energy-momentum tensor are higher order
in ǫ. The scalar field equation is
∆φ = V ′ − α [4ξ′1D(Φ,Ψ) + ξ2D(Φ−Ψ, φ) + ξ3D(φ, φ)] + O(ǫ2, αǫ3/r2, V ǫr2) . (6)
Solar system constraints on Gauss-Bonnet mediated dark energy 4
We have defined the operators
X,ii , D(X, Y ) =
X,ijY,ij −∆X∆Y . (7)
with i, j = 1, 2, 3 where to leading order, the Gauss-Bonnet term is then LGB =
8D(Φ,Ψ). For standard Einstein gravity (V = α = 0), the solution of the above
equations is
Φ = Ψ = −Um , φ = φ0 , (8)
where
ρm(~x
′, t)
|~x− ~x′|
. (9)
We will now study solutions which are close to the post-Newtonian limit of general
relativity, and take
Φ = −Um + δΦ , Ψ = −Um + δΨ , φ = φ0 + δφ , (10)
where δφ, etc. are the leading-order α- and V -dependent corrections.
Note that the Laplacian carries a distribution and therefore we have to be careful
with the implementation of the D operator. We see that δφ is O(V, αǫ2), and so, to
leading order, we have
∆ δφ = V ′ − 4αξ′1D(Um, Um) . (11)
Having calculated δφ, we obtain
∆ δΦ = −V + 4αξ′1D(Um, δφ) (12)
∆ δΨ =
+ 2αξ′1D(Um, δφ) . (13)
In the case of a spherical distributional source ρm = mδ
(3)(x),
. (14)
In accordance to our estimations for ǫ the solar system Newtonian potentials are
Um . 10
−5, and the velocities satisfy v2 . Um. For planets we have Um . 10
(with the maximum attained by Mercury).
With the aid of the relation
D(r−n, r−m) = 2nm
n+m+ 2
∆r−(n+m+2) (15)
the above expressions evaluate, at leading order, to
φ = φ0 +
r2V ′
− 2ξ′1
α(G0m)
Φ = −G0m
− 64(ξ
α2(G0m)
Ψ = −G0m
− 32(ξ
α2(G0m)
. (18)
Solar system constraints on Gauss-Bonnet mediated dark energy 5
We find that there are now non-standard corrections to the Newtonian potential which
do not follow the usual parametrised expansion, in agreement with [19], but not [18]
(which uses different assumptions on the form of the theory). First of all note that the
Gauss-Bonnet coupling α couples to the running of the dark energy potential V ′, giving
a 1/r contribution to the modified Newtonian potential (17). We absorb this into the
gravitational coupling,
G = G0
. (19)
The corresponding term in (18) gives a constant contribution to the effective γ PPN
parameter. The r2V terms in (17), (18) are typical of a theory with a cosmological
constant, whereas the final, 1/r7 terms are the leading pure Gauss-Bonnet correction,
which is enhanced at small distances. If we take the usual expression for the PPN
parameter γ = Ψ/Φ, we see that it is r dependent. In using the Cassini constraint on γ
we must be careful to calculate the frequency shift from scratch.
For the above derivation we have assumed δφ ≪ Um, which implies V ≪ Um/r2
and α ≪ r2/Um. This will hold in the solar system if
V ≪ 10−36m−2 and α ≪
1023m2 (everywhere)
1029m2 (planets only)
in geometrised units. Note that strictly speaking there is also a lower bound on our
coupling constants, if the above analysis is to be valid. Indeed, if we were to find
corrections of order ǫ2 ∼ 10−14, then it would imply that higher-order corrections from
general relativity were just as important as the ones appearing in (17), (18).
3. Constraints
3.1. Planetary motion
Deviations from the usual Newtonian potential will affect planetary motions, which
provides a way of bounding them. This idea has been used to bound dark matter in the
solar system [20], and also the value of the cosmological constant [21]. We will apply the
same arguments to our theory. From the above gravitational potential (17), we obtain
the Newtonian acceleration
gacc(r) = −c2
64(αξ′1)
Gmeff
where rg ≡ Gm/c2 is gravitational radius of the mass m. The above expression gives
the effective mass meff felt by a body at distance r. If the test body is a planet with
semi-major axis a, we can use this formula at r ≈ a. Its mean motion n ≡
Gm/a3
will then be changed by δn = (n/2)(δmeff/m). By evaluating the statistical errors of
the mean motions of the planets, δn = −(3n/2)δa/a, we can derive a bound on δmeff
and hence our deviations from general relativity
δmeff
64(αξ′1)
. (22)
Solar system constraints on Gauss-Bonnet mediated dark energy 6
The values of a for the planets are determined using Kepler’s third law, with a constant
sun’s mass m⊙. Constraints on δΦ then follow from the errors δa, in the measure of a.
These can be found in [22], and are also listed in the appendix for convenience. Given
their different r-dependence, the two corrections to δmeff are unlikely to cancel. We will
therefore bound them separately, giving constraints on α and V .
The strongest bound on the combination ξ′1α comes from Mercury, with
. 1.8× 10−12 . (23)
Neglecting the cosmological constant term, and using a ≈ 5.8×107 km and rg ≈ 1.5 km,
we find
|ξ′1α| .
(3a5δa)
≈ 3.8× 1022m2 . (24)
We see that this is within range of validity (20) for our perturbative treatment of gravity.
In cosmology, the density fraction corresponding to the Gauss-Bonnet term is [15]
ΩGB = 4ξ
. (25)
If this is to play the role of dark energy in our universe, it needs to take, along with the
contribution of the potential, a value around 0.7 at cosmological length scales (and for
redshift z ∼ 1).
If we wish to accurately apply the bound on α (24) to cosmological scales, details of
the dynamical evolution of φ will be required. These will depend on the form of V and
the ξi, and are expected to involve complex numerical analysis, all of which is beyond
the scope of this work. Here we will instead assume that the cosmological value of φ
is also φ0, which, while crude, will allow us to estimate the significance of the above
result. Given the hierarchy between cosmological and solar system scales it is natural
to question this assumption but we will make it here, and discuss it in more detail in
the concluding section.
Making the further, and less controversial, assumption that dφ/dt ≈ H , we obtain
a very stringent constraint on ΩGB:
|ΩGB| ≈ 4|ξ′1α|H20 . 8.8× 10−30 . (26)
Hence we see that solar system constraints on Gauss-Bonnet fraction of the dark energy
are potentially very significant, despite the fact that the Gauss-bonnet term is quadratic
in curvature.
Since we are assuming that all the ξi are of the same order, the above bound
also applies to the dark energy fractions arising from the final three terms in (1).
Clearly there are effective dark energy models for which the analysis leading to the
above bound (26) does not apply. However any successful model will require a huge
variation of ξ1 between local and cosmological scales, or a very substantial violation of
one of our other assumptions.
Solar system constraints on Gauss-Bonnet mediated dark energy 7
For comparison, we apply similar arguments to obtain a constraint on the potential.
The strongest bound comes from the motion of Mars [21], and is
|V | .
9rgδa
≈ 1.2× 10−40m−2 . (27)
This suggests ΩV = V/(3H
0) . 7.3×1011, which is vastly weaker than the corresponding
cosmological constraint (ΩV . 1). Hence planetary orbits tell us little of significance
about dark energy arising from a potential, in sharp contrast to the situation for Gauss-
Bonnet dark energy.
3.2. Cassini spacecraft
The most stringent constraint on the PPN parameter γ was obtained from the Cassini
spacecraft in 2002 while on its way to Saturn. The signals between the spacecraft
and the earth pass close to the sun, whose gravitational field produces a time delay.
The smallest value of r on the light ray’s path defines the impact parameter b. A
small impact parameter maximises the light delay. During that year’s superior solar
conjunction the spacecraft was re = 8.43AU = 1.26 × 1012m away from the sun, and
the impact parameter dropped as low as bmin = 1.6R⊙. A PPN analysis of the system
produced the strong constraint
δγ ≡ γ − 1 = (2.1± 2.3)× 10−5 . (28)
Given that our theory is not PPN we have to undertake the calculation from scratch.
The above constraint comes from considering a round trip, in which the light travels
from earth, grazes the sun’s ‘surface’, reaches the spacecraft, and then returns by the
same route. We take the path of the photon to be the straight line between the earth
and the spacecraft, ~x = (x, b, 0) with x varying from −xe to x⊕. For a round trip (there
and back), the additional time delay for a light ray due to the gravitational field of the
sun is then
c∆t = 2
h00(r) + hxx(r)
dx = −2
(Φ + Ψ)|r=√x2+b2 dx . (29)
For the solution (17) and (18), this evaluates to
c∆t = 4rg
1− 2αξ
a3⊕ + r
+ b2(a⊕ + re)
1024(αξ′1)
, (30)
where we have assumed x⊕ ≈ a⊕ ≫ b, and similarly for the spacecraft.
Rather than directly measure ∆t, the Cassini experiment actually found the
frequency shift in the signal [23]
ygr =
. (31)
The results obtained were
ygr = −
10−5 s
(2 + δγ) . (32)
Solar system constraints on Gauss-Bonnet mediated dark energy 8
If gravitation were to be described by the standard PPN formalism, then δγ would be
the possible deviation of the PPN parameter γ from the general relativity value of 1.
From (30) we obtain
ygr = −
b2V (a⊕ + re)
1536(αξ′1)
4αξ′1V
. (33)
Requiring that the corrections are within the errors (28) of (32), implies
|ξ′1α| .
. 1.6× 1020m2 . (34)
This suggests the dark energy bound
|ΩGB| . 3.6× 10−32 , (35)
although obtaining this bound from solar system data requires major assumptions about
the cosmological behaviour of φ, as we will point out in section 4.
The data obtained by the spacecraft were actually for a range of impact parameters
b, but we have just used the most conservative value b = bmin = 1.6R⊙. The above
constraint is even stronger than (24), which was obtained for planetary motion. This
is because the experiment involved smaller r, and so the possible Gauss-Bonnet effects
were larger.
Taking the above expression for ygr (33) at face value, we can also constrain the
potential to be |V | . 10−22m2 and the cross-term |αξ′1V ′| . 10−5. However these are
of little interest as they are much weaker than the planetary motion constraints (24),
(27), and also the former is far outside the range of validity (20) of our analysis.
3.3. A table-top experiment
Laboratory experiments can also be used to obtain bounds on deviations from Newton’s
law. For illustration we will consider the table-top experiment described in [24]. It
consists of a 60 cm copper bar, suspended at its midpoint by a tungsten wire. Two 7.3 kg
masses are placed on carts far (105 cm) from the bar, and another mass of m ≈ 43 g is
placed near (5 cm) to the side of bar. Moving the masses to the opposite sides of the bar
changes in the torque felt by it. The experiment measures the torques N105 and −N5
produced respectively by the far and near masses. The masses and distances are chosen
so that the two torques roughly cancel. The ratio R = N105/N5 is then determined, and
compared with the theoretical value. The deviation from the Newtonian result is
Rexpt
RNewton
− 1 = (1.2± 7)× 10−4 . (36)
In fact, to help reduce errors, additional measurements were taken. To account for the
gravitational field of the carts that the far masses sit on, the experiment was repeated
with only the carts and a m′ ≈ 3 g near mass. The measured torque was then subtracted
from the result for the loaded carts.
The Gauss-Bonnet corrections to the Newton potential (17) will alter the torques
produced by all four masses, as well as the carts. Furthermore, since δΦ is non-linear
Solar system constraints on Gauss-Bonnet mediated dark energy 9
in mass, there will be further corrections coming from cross terms. The expressions
derived in section 2 are just for the gravitational field of a single mass, and so will not
fully describe the above table-top experiment. However, we find that the contribution
from the mass m will dominate the other corrections, and so we can get a good estimate
of the Gauss-Bonnet contribution to the ratio R by just considering m.
The torque experienced by the copper bar, due to a point mass at ~X = (X, Y, Z) is
d3x (~x ∧ ~F )z = ρCu
yX − xY
r=| ~X−~x|
, (37)
where ρCu is the bar’s density. A full list of parameters for the experiment is given
in table I of [24]. The bar’s dimensions are 60 cm × 1.5 cm × 0.65 cm. Working in
coordinates with the origin at the centre of the bar, the mass m = 43.58 g is at
~X = (24.42,−4.77,−0.03) cm. Treating m as a point mass, Newtonian gravity implies
a torque of N5 ≈ (8.2 cm2)GmρCu is produced. The Gauss-Bonnet correction is
δN5 = ρCu
64G3m3(αξ′1)
yX − xY
| ~X − ~x|9
≈ −(0.025 cm−4)
(Gm)3(αξ′1)
. (38)
To be consistent with the bound (36), we require δN5/N5 to be within the range of δR.
This implies
|αξ′1| . (18 cm3)
. 1.3× 1022m2 , (39)
which is comparable to the planetary constraint (24). Extrapolating it to cosmological
scales gives
|ΩGB| . 3.1× 10−30 . (40)
There are of course many more recent laboratory tests of gravity, and we expect
that stronger constraints can be obtained from them. Table-top experiments frequently
involve multiple gravitational sources, or gravitational fields which cannot reasonably be
treated as point masses. A more detailed calculation than the one presented in section 2
will then be required. For example, the gravitational field inside a sphere or cylinder
will not receive corrections of the form (17), and so any experiment involving a test
mass moving in such a field requires a different analysis.
4. Discussion
We have shown that significant constraints on Gauss-Bonnet gravity can be derived from
both solar system measurements and table-top laboratory experiments (note that further
constraints arise when imposing theoretical constraints like absence of superluminal or
ghost modes, see [25]). The fact that the corrections to Einstein gravity are second
order in curvature suggests they will automatically be small. However this does not
take into account the fact that the dimensionfull coupling of the Gauss-Bonnet term
must be large if it is to have any hope of producing effective dark energy. Additional
constraints will come from the perihelion precession of Mercury, although the linearised
Solar system constraints on Gauss-Bonnet mediated dark energy 10
analysis we have used is inadequate to determine this, and higher-order (in ǫ) effects
will need to be calculated.
Performing an extrapolation of our results to cosmological scales suggests that the
density fraction ΩGB will be far too small to explain the accelerated expansion of our
universe. This agrees with the conclusions of [19]. Hence if Gauss-Bonnet gravity is to
be a viable dark energy candidate, one needs to find a loophole in the above arguments.
This is not too difficult, and we will now turn to this question.
In particular, we have assumed no spatial or temporal evolution of the field φ
between cosmological and solar system scales, even though the supernova measurements
correspond to a higher redshift and a far different typical distance scale. A varying φ
would of course imply that different values of ξi, and their derivatives, would be perceived
by supernovas and the planets. It is interesting to note that the size of the bound we have
found (26) is of order the square of the ratio of the solar system and the cosmological
horizon scales, s = (1AUH0)
2 ∼ 10−30. Therefore one could reasonably argue that the
small number appearing in (26) could in fact be due to the hierarchy scale, s, rather
than a very stringent constraint on ΩGB. This could perhaps be concretely realised with
something similar to the chameleon effect [26] giving some constraint on the running of
the quintessence theory. One other possibility is that the baryons (which make up the
solar system) and dark matter (which is dominant at cosmological scales) have different
couplings to φ [27]. Again, this would alter the relation between local and cosmological
constraints.
Alternatively, it may be that our assumptions on the form of the theory should
be changed. The scalar field could be coupled directly to the Einstein-Hilbert term, as
in Brans-Dicke gravity. Additionally, the couplings ξi and their derivatives could be of
different orders. The same could be true of the potential. In particular, if φ were to have
a significant mass, this would suppress the quadratic curvature effects, as they operate
via the scalar field. This would be similar to the situation in scalar-tensor gravity with
a potential, where the strong constraints on the theory can be avoided by giving the
scalar a large mass (which, however, would inhibit acceleration).
Finally, the behaviour of the scalar field could be radically different. We took it to
be O(ǫ), like the metric perturbations. However since our constraints are on the metric,
and not φ, this need not be true. Furthermore, since the theory is quadratic, there may
well be alternative solutions of the field equations, and not just the one we studied.
Hence to obtain a viable Gauss-Bonnet dark energy model, which is compatible
with solar system constraints, at least one of the above assumptions must be broken.
For many of the above ideas the higher-order scalar kinetic terms will play a significant
role. This then opens up the possibility that the higher-gravity corrections will cancel
each other, further weakening the constraints. We hope to address some of these issues
in the near future.
Solar system constraints on Gauss-Bonnet mediated dark energy 11
Acknowledgments
CC thanks Martin Bucher, Gilles Esposito-Farese and Lorenzo Sorbo for discussions.
SCD thanks the Netherlands Organisation for Scientific Research (NWO) for financial
support.
Appendix
For the benefit of readers without an astronomical background, we list relevant solar
system parameters. The values for δa come from table 4 of [22]. We take the Hubble
constant to be H0 = 70 kms
−1 Mpc−1.
R⊙ = 6.96× 108m
r⊙g ≡ Gm⊙/c2 = 1477m
H0/c = 7.566× 10−27m−1
1AU ≡ a⊕ = 149597870691m
G = 6.6742× 10−11m3 s−2 kg−1
c = 299792458m s−1
m⊙ = 1.989× 1030 kg
name a (109m) δa (m)
Mercury 57.9 0.105
Venus 108 0.329
Earth 149 0.146
Mars 228 0.657
Jupiter 778 639
Saturn 1433 4.22× 103
Uranus 2872 3.85× 104
Neptune 4495 4.79× 105
Pluto 5870 3.46× 106
References
[1] A. G. Riess et al. [Supernova Search Team Collaboration],Observational Evidence from Supernovae
for an Accelerating Universe and a Cosmological Constant, Astron. J. 116, 1009 (1998)
[astro-ph/9805201]
S. Perlmutter et al. [Supernova Cosmology Project Collaboration], Measurements of Omega and
Lambda from 42 High-Redshift Supernovae, Astrophys. J. 517, 565 (1999) [astro-ph/9812133]
A. G. Riess et al. [Supernova Search Team Collaboration], Type Ia Supernova Discoveries at z > 1
From the Hubble Space Telescope: Evidence for Past Deceleration and Constraints on Dark
Energy Evolution, Astrophys. J. 607, 665 (2004) [astro-ph/0402512]
[2] A. Blanchard, M. Douspis, M. Rowan-Robinson and S. Sarkar, An alternative to the cosmological
concordance model, Astron. Astrophys. 412, 35 (2003) [astro-ph/0304237]
[3] S. Weinberg, The cosmological constant problem, Rev. Mod. Phys. 61, 1 (1989)
[4] C. Wetterich, Cosmology and the Fate of Dilatation Symmetry, Nucl. Phys. B 302, 668 (1988)
B. Ratra and P. J. E. Peebles, Cosmological Consequences of a Rolling Homogeneous Scalar Field,
Phys. Rev. D 37, 3406 (1988)
R. R. Caldwell, R. Dave and P. J. Steinhardt, Cosmological Imprint of an Energy Component with
General Equation-of-State, Phys. Rev. Lett. 80, 1582 (1998) [astro-ph/9708069]
C. Armendariz-Picon, V. F. Mukhanov and P. J. Steinhardt, A dynamical solution to the problem
of a small cosmological constant and late-time cosmic acceleration, Phys. Rev. Lett. 85, 4438
(2000) [astro-ph/0004134]
C. Armendariz-Picon, V. F. Mukhanov and P. J. Steinhardt, Essentials of k-essence, Phys. Rev.
D 63, 103510 (2001) [astro-ph/0006373]
T. Chiba, T. Okabe and M. Yamaguchi, Kinetically driven quintessence, Phys. Rev. D 62, 023511
(2000) [astro-ph/9912463]
http://arxiv.org/abs/astro-ph/9805201
http://arxiv.org/abs/astro-ph/9812133
http://arxiv.org/abs/astro-ph/0402512
http://arxiv.org/abs/astro-ph/0304237
http://arxiv.org/abs/astro-ph/9708069
http://arxiv.org/abs/astro-ph/0004134
http://arxiv.org/abs/astro-ph/0006373
http://arxiv.org/abs/astro-ph/9912463
Solar system constraints on Gauss-Bonnet mediated dark energy 12
N. Arkani-Hamed, L. J. Hall, C. F. Kolda and H. Murayama, A New Perspective on Cosmic
Coincidence Problems, Phys. Rev. Lett. 85, 4434 (2000) [astro-ph/0005111]
[5] V. K. Onemli and R. P. Woodard, Quantum effects can render w < −1 on cosmological scales,
Phys. Rev. D 70, 107301 (2004) [gr-qc/0406098]
V. K. Onemli and R. P. Woodard, Super-acceleration from massless, minimally coupled phi**4,
Class. Quant. Grav. 19, 4607 (2002) [gr-qc/0204065]
[6] A. Padilla, Cosmic acceleration from asymmetric branes, Class. Quant. Grav. 22, 681 (2005)
[hep-th/0406157]
A. Padilla, Infra-red modification of gravity from asymmetric branes, Class. Quant. Grav. 22, 1087
(2005) [hep-th/0410033]
N. Kaloper and D. Kiley, Charting the Landscape of Modified Gravity, JHEP 0705, 045 (2007)
[hep-th/0703190]
N. Kaloper, A new dimension hidden in the shadow of a wall, Phys. Lett. B 652, 92 (2007)
[hep-th/0702206]
[7] G. R. Dvali, G. Gabadadze and M. Porrati, 4D gravity on a brane in 5D Minkowski space, Phys.
Lett. B 485, 208 (2000) [hep-th/0005016]
G. R. Dvali and G. Gabadadze, Gravity on a brane in infinite-volume extra space, Phys. Rev. D
63, 065007 (2001) [hep-th/0008054]
C. Deffayet, G. R. Dvali and G. Gabadadze, Accelerated universe from gravity leaking to extra
dimensions, Phys. Rev. D 65, 044023 (2002) [astro-ph/0105068]
A. Lue, The phenomenology of Dvali-Gabadadze-Porrati cosmologies, Phys. Rept. 423, 1 (2006)
[astro-ph/0510068]
[8] C. Deffayet, Cosmology on a brane in Minkowski bulk, Phys. Lett. B 502, 199 (2001)
[hep-th/0010186]
[9] S. Capozziello, S. Carloni and A. Troisi, Quintessence without scalar fields, astro-ph/0303041
S. Capozziello, V. F. Cardone, S. Carloni and A. Troisi, Curvature quintessence matched with
observational data, Int. J. Mod. Phys. D 12, 1969 (2003) [astro-ph/0307018]
S. M. Carroll, V. Duvvuri, M. Trodden and M. S. Turner, Is cosmic speed-up due to new
gravitational physics?, Phys. Rev. D 70, 043528 (2004) [astro-ph/0306438]
[10] S. M. Carroll, Quintessence and the rest of the world, Phys. Rev. Lett. 81, 3067 (1998)
[astro-ph/9806099]
C. F. Kolda and D. H. Lyth, Quintessential difficulties, Phys. Lett. B 458, 197 (1999)
[hep-ph/9811375]
T. Chiba, Quintessence, the gravitational constant, and gravity, Phys. Rev. D 60, 083508 (1999)
[gr-qc/9903094]
[11] J. A. Frieman, C. T. Hill, A. Stebbins and I. Waga, Cosmology with ultralight pseudo Nambu-
Goldstone bosons, Phys. Rev. Lett. 75, 2077 (1995) [astro-ph/9505060]
Y. Nomura, T. Watari and T. Yanagida, Quintessence axion potential induced by electroweak
instanton effects, Phys. Lett. B 484, 103 (2000) [hep-ph/0004182]
J. E. Kim and H. P. Nilles, A quintessential axion, Phys. Lett. B 553, 1 (2003) [hep-ph/0210402]
K. Choi, String or M theory axion as a quintessence, Phys. Rev. D 62, 043509 (2000)
[hep-ph/9902292]
N. Kaloper and L. Sorbo, Of pNGB QuiNtessence, JCAP 0604, 007 (2006) [astro-ph/0511543]
[12] C. M. Will, The confrontation between general relativity and experiment, gr-qc/0510072
C. M. Will, Theory and experiment in gravitational physics, Cambridge University Press (1993)
G. Esposito-Farese, Tests of Alternative Theories of Gravity, Proceedings of 33rd SLAC Summer
Institute on Particle Physics (SSI 2005): Gravity in the Quantum World and the Cosmos, Menlo
Park, California, 25 Jul – 5 Aug 2005, pp T025
[13] D. Gorbunov, K. Koyama and S. Sibiryakov, More on ghosts in DGP model, Phys. Rev. D 73,
044016 (2006) [hep-th/0512097]
C. Charmousis, R. Gregory, N. Kaloper and A. Padilla, DGP specteroscopy, JHEP 0610, 066
http://arxiv.org/abs/astro-ph/0005111
http://arxiv.org/abs/gr-qc/0406098
http://arxiv.org/abs/gr-qc/0204065
http://arxiv.org/abs/hep-th/0406157
http://arxiv.org/abs/hep-th/0410033
http://arxiv.org/abs/hep-th/0703190
http://arxiv.org/abs/hep-th/0702206
http://arxiv.org/abs/hep-th/0005016
http://arxiv.org/abs/hep-th/0008054
http://arxiv.org/abs/astro-ph/0105068
http://arxiv.org/abs/astro-ph/0510068
http://arxiv.org/abs/hep-th/0010186
http://arxiv.org/abs/astro-ph/0303041
http://arxiv.org/abs/astro-ph/0307018
http://arxiv.org/abs/astro-ph/0306438
http://arxiv.org/abs/astro-ph/9806099
http://arxiv.org/abs/hep-ph/9811375
http://arxiv.org/abs/gr-qc/9903094
http://arxiv.org/abs/astro-ph/9505060
http://arxiv.org/abs/hep-ph/0004182
http://arxiv.org/abs/hep-ph/0210402
http://arxiv.org/abs/hep-ph/9902292
http://arxiv.org/abs/astro-ph/0511543
http://arxiv.org/abs/gr-qc/0510072
http://arxiv.org/abs/hep-th/0512097
Solar system constraints on Gauss-Bonnet mediated dark energy 13
(2006) [hep-th/0604086]
K. Koyama, Are there ghosts in the self-accelerating brane universe?, Phys. Rev. D 72, 123511
(2005) [hep-th/0503191]
[14] T. Chiba, 1/R gravity and scalar-tensor gravity, Phys. Lett. B 575, 1 (2003) [astro-ph/0307338]
A. D. Dolgov and M. Kawasaki, Can modified gravity explain accelerated cosmic expansion?, Phys.
Lett. B 573, 1 (2003) [astro-ph/0307285]
L. Amendola, D. Polarski, S. Tsujikawa, Are f(R) dark energy models cosmologically viable?
Phys. Rev. Lett. 98, 131302 (2007) [astro-ph/0603703]
[15] L. Amendola, C. Charmousis and S. C. Davis, Constraints on Gauss-Bonnet gravity in dark energy
cosmologies, JCAP 0612, 020 (2006) [hep-th/0506137]
[16] T. Koivisto and D. F. Mota, Cosmology and Astrophysical Constraints of Gauss-Bonnet Dark
Energy, Phys. Lett. B 644, 104 (2007) [astro-ph/0606078]
T. Koivisto and D. F. Mota, Gauss-Bonnet quintessence: Background evolution, large scale
structure and cosmological constraints, Phys. Rev. D 75, 023518 (2007) [hep-th/0609155]
B. M. Leith and I. P. Neupane, Gauss-Bonnet cosmologies: Crossing the phantom divide and the
transition from matter dominance to dark energy, JCAP 0705, 019 (2007) [hep-th/0702002]
S. Tsujikawa and M. Sami, String-inspired cosmology: Late time transition from scaling matter
era to dark energy universe caused by a Gauss-Bonnet coupling, JCAP 0701, 006 (2007)
[hep-th/0608178]
[17] P. Binetruy, C. Charmousis, S. C. Davis and J. F. Dufaux, Avoidance of naked singularities
in dilatonic brane world scenarios with a Gauss-Bonnet term, Phys. Lett. B 544, 183 (2002)
[hep-th/0206089]
C. Charmousis, S. C. Davis and J. F. Dufaux, Scalar brane backgrounds in higher order curvature
gravity, JHEP 0312, 029 (2003) [hep-th/0309083]
[18] T. P. Sotiriou and E. Barausse, Post-Newtonian expansion for Gauss-Bonnet gravity, Phys. Rev.
D 75, 084007 (2007) [gr-qc/0612065]
[19] G. Esposito-Farese, Scalar-tensor theories and cosmology and tests of a quintessence-Gauss-Bonnet
coupling, gr-qc/0306018
G. Esposito-Farese, Tests of scalar-tensor gravity, AIP Conf. Proc. 736, 35 (2004) [gr-qc/0409081]
[20] J. D. Anderson, E. L. Lau, A. H. Taylor, D. A. Dicus D. C. Teplitz and V. L. Teplitz Bounds on
Dark Matter in Solar Orbit, Astrophys. J. 342, (1989) 539
[21] M. Sereno and P. Jetzer, Solar and stellar system tests of the cosmological constant, Phys. Rev. D
73, 063004 (2006) [astro-ph/0602438]
[22] E. V. Pitjeva, High-Precision Ephemerides of Planets–EPM and Determination of Some
Astronomical Constants, Solar System Research 39, 176 (2005)
[23] B. Bertotti, L. Iess and P. Tortora, A test of general relativity using radio links with the Cassini
spacecraft, Nature 425, 374 (2003)
[24] J. K. Hoskins, R. D. Newman, R. Spero and J. Schultz, Experimental tests of the gravitational
inverse square law for mass separations from 2-cm to 105-cm, Phys. Rev. D 32, 3084 (1985)
[25] G. Calcagni, A. De Felice, B. de Carlos, Ghost conditions for Gauss-Bonnet cosmologies, Nucl.
Phys. B 752, 404 (2006) [hep-th/0604201]
[26] P. Brax, C. van de Bruck, A. C. Davis, J. Khoury and A. Weltman, Detecting dark energy in orbit:
The cosmological chameleon, Phys. Rev. D 70, 123518 (2004) [astro-ph/0408415]
[27] L. Amendola, Coupled quintessence, Phys. Rev. D 62, 043511 (2000) [astro-ph/9908023]
http://arxiv.org/abs/hep-th/0604086
http://arxiv.org/abs/hep-th/0503191
http://arxiv.org/abs/astro-ph/0307338
http://arxiv.org/abs/astro-ph/0307285
http://arxiv.org/abs/astro-ph/0603703
http://arxiv.org/abs/hep-th/0506137
http://arxiv.org/abs/astro-ph/0606078
http://arxiv.org/abs/hep-th/0609155
http://arxiv.org/abs/hep-th/0702002
http://arxiv.org/abs/hep-th/0608178
http://arxiv.org/abs/hep-th/0206089
http://arxiv.org/abs/hep-th/0309083
http://arxiv.org/abs/gr-qc/0612065
http://arxiv.org/abs/gr-qc/0306018
http://arxiv.org/abs/gr-qc/0409081
http://arxiv.org/abs/astro-ph/0602438
http://arxiv.org/abs/hep-th/0604201
http://arxiv.org/abs/astro-ph/0408415
http://arxiv.org/abs/astro-ph/9908023
	Introduction
	Quadratic Curvature Gravity 
	Constraints
	Planetary motion
	Cassini spacecraft
	A table-top experiment
	Discussion
ABSTRACT
  Although the Gauss-Bonnet term is a topological invariant for general
relativity, it couples naturally to a quintessence scalar field, modifying
gravity at solar system scales. We determine the solar system constraints due
to this term by evaluating the post-Newtonian metric for a distributional
source. We find a mass dependent, 1/r^7 correction to the Newtonian potential,
and also deviations from the Einstein gravity prediction for light-bending. We
constrain the parameters of the theory using planetary orbits, the Cassini
spacecraft data, and a laboratory test of Newton's law, always finding
extremely tight bounds on the energy associated to the Gauss-Bonnet term. We
discuss the relevance of these constraints to late-time cosmological
acceleration.

<|endoftext|><|startoftext|>
Swit
hing me
hanism of photo
hromi
 diarylethene derivatives
mole
ular jun
tions
Jing Huang,
Qunxiang Li,
Hao Ren,
Haibin Su,
Q.W.Shi,
and Jinlong Yang
Hefei National Laboratory for Physi
al S
ien
es at Mi
ros
ale,
University of S
ien
e and Te
hnology of China,
Hefei, Anhui 230026, People's Republi
 of China
Division of Materials S
ien
e, Nanyang Te
hnologi
al University,
50 Nanyang Avenue, 639798, Singapore
(Dated: November 4, 2018)
Abstra
t
The ele
troni
 transport properties and swit
hing me
hanism of single photo
hromi
 diarylethene
derivatives sandwi
hed between two gold surfa
es with 
losed and open 
on�gurations are inves-
tigated by a fully self-
onsistent nonequilibrium Green's fun
tion method 
ombined with density
fun
tional theory. The 
al
ulated transmission spe
tra of two 
on�gurations are strikingly distin
-
tive. The open form la
ks any signi�
ant transmission peak within a wide energy window, while the

losed stru
ture has two signi�
ant transmission peaks on the both sides of the Fermi level. The
ele
troni
 transport properties of the mole
ular jun
tion with 
losed stru
ture under a small bias
voltage are mainly determined by the tail of the transmission peak 
ontributed unusually by the
perturbed lowest perturbed uno

upied mole
ular orbital. The 
al
ulated on-o� ratio of 
urrents
between the 
losed and open 
on�gurations is about two orders of magnitude, whi
h reprodu
es
the essential features of the experimental measured results. Moreover, we �nd that the swit
hing
behavior within a wide bias voltage window is extremely robust to both substituting F or S for H
or O and varying end an
horing atoms from S to Se and Te.
PACS numbers: 73.63.-b, 85.65.+h, 82.37.Vb
http://arxiv.org/abs/0704.0176v1
I. INTRODUCTION
A 
riti
al mission of the mole
ular ele
troni
s is to develop innovative devi
es at single
mole
ular s
ale. The representative mole
ular wires, re
ti�ers, swit
hes, and transistors
have been intensively studied in the past years.
Obviously, a single mole
ular swit
h holds
great promise sin
e the swit
h is a 
ru
ial element of any modern design of memory and
logi
 appli
ations. Now various s
hemes have been proposed to realize mole
ular swit
hing
pro
ess in
luding relative motion of mole
ule internal stru
ture,
3,4,5,6,7

hange of mole
ule

harge states,
and bond �u
tuation between the mole
ule and their ele
tri
al 
onta
ts.
Re
ently, an alternative routine has been suggested to design swit
hes based on single stably
existing mole
ule whi
h 
an reversibly transform between two 
ondu
tive states in response
to external triggers.
11,12,13,14
Among various triggers, light is a very attra
tive external stim-
ulus be
ause of the ease of addressability, fast response times, and 
ompatibility with a wide
range of 
ondensed phases.
The swit
hing properties through the so-
alled photo
hromi
 mole
ules have been 
ar-
ried out by several experimental and theoreti
al groups.
15,16,17,18,19,20,21,22
In parti
ular, the
dithienyl
y
lopentene (DTC) derivatives (as the 
entral swit
hing unit) hold great promise
as arti�
ial photoele
troni
 swit
hing mole
ules be
ause of their reversible photo-indu
ed
transformations that modulate ele
tri
al 
ondu
tivity and their ex
eptional thermal sta-
bility and fatigue resistan
e.
15,16,17,18,19
Using the me
hani
ally 
ontrollable break-jun
tion
te
hnique, Dulić et al.16 designed mole
ular swit
hes based on DTC mole
ules (1,2-bis[5′-
-a
etylsulfanylthien-2
- yl)-2
-methylthien-3
-yl℄ 
y
lopentene) with two thiophene link-
ers, however, whi
h operates only one-way, i.e. from 
ondu
ting to the insulating state under
visible light with λ=546 nm with resistan
e 
hange at 2-3 orders of magnitude. Interestingly,
He et al.
found that the transition 
an be opti
al two-way for DTC mole
ules where H
atoms in 
y
lopentene are substituted by six F atoms (�uorined-DTC). The single-mole
ule
resistan
e in the open form is about 130 times larger than that of in the 
losed stru
ture
measured by using s
anning tunneling mi
ros
opy with a gold tip. In a parallel study, Kat-
sonis et al.
used aromati
 (meta-phenyl) linkers and observed that the light-
ontrolled
swit
hing of single DTC mole
ules 
onne
ted to gold nanoparti
les was reversible. Very
re
ently, to improve the poor stability of su
h kind of 
onjugated mole
ules with thiols on
both ends, Tanigu
hi et al.
developed an inter
onne
t method in solution for diarylethene
photo
hromi
 mole
ular swit
hes that 
an ameliorate ele
trode-mole
ule binding, mole
u-
lar orientation, and devi
e fun
tions. In their experiment, one light-
ontrolled swit
hing
mole
ule 
onsists of a 
entral �uorined-DTC mole
ule, diaryls on two sides and two thiol
groups at both ends. The 
orresponding 
losed and open forms are shown Figure 1(a).
Consequently, the 
urrent through the mole
ular jun
tion with 
losed stru
ture is about 20
times larger than that of the open form measured by STM.
To gain better understanding of these experimental observations, several theoreti
al work
have been 
arried out.
20,21,22
Li et al.
performed quantum mole
ular dynami
s and density
fun
tional theory (DFT) 
al
ulations on the ele
troni
 stru
tures and transport properties
through several photo
hromi
 mole
ules with several di�erent spa
ers sandwi
hed between
gold 
onta
ts. They 
hose dithienylethene (DTE) derivatives to model the experimental
measured mole
ules and predi
ted an about 30 times 
ondu
tion enhan
ement when 
on-
verting the open form into a 
losed one by opti
al te
hnique. Subsequently, two resear
h
groups independently investigated the swit
hing properties of DTC mole
ular jun
tions,
and found that the transmission peak originates from the highest o

upied mole
ular orbital
(HOMO) of the 
losed form lying near the ele
trode Fermi level.
21,22
In their 
al
ulations,
the ele
trodes are simulated with 
luster models and the e�e
ts on the transport properties

oming from six H hydrogen atoms substituted by F atom are not 
onsidered. Till now, to
our best knowledge, there is no theoreti
al study about the swit
hing me
hanism through
exa
tly the same measured mole
ules (�uorined-DTC with two diaryls on two sides, named
diarylethene in Tanigu
hi et al.'s experiment). In this paper, we employ the non-equilibrium
green's fun
tion te
hnique (NEGF) 
ombined with DFT method to address ele
troni
 trans-
port and swit
hing behavior of diarylethene based mole
ular jun
tion. Moreover, we examine
the robustness of this type of swit
hing devi
e against various 
hemi
al substitution (where
six F atoms in the peripheral of 
y
lopentene and S atoms in thienyl are substituted by H
and O atoms, respe
tively) and alternations of an
horing atoms.
II. COMPUTATIONAL MODEL AND METHOD
The 
omputational model system is s
hemati
ally illustrated in Fig. 1(b). The mole
ules
with open and 
losed 
on�gurations are sandwi
hed between two gold ele
trodes through
S-Au bonds. The Au (111) surfa
e is represented by a (4×4) 
ell with periodi
 boundary 
on-
ditions. Sin
e the hollow site 
on�guration is energeti
ally preferable by 0.2 and 0.6 eV than
the bridge and atop sites, respe
tively,
the diarylethene mole
ule 
onne
ts to sulfur atoms
whi
h are lo
ated at hollow sites of two Au (111) surfa
es. Both ele
trodes are repeated by
three layers (A, B, and C). The whole system is arranged as (BCA)-(BC-mole
ule-CBA)-
(CBA), whi
h 
an be divided into three regions in
luding the left lead (BCA), the s
attering
region, and the right lead (CBA). The s
attering regions in
lude a diarylethene mole
ule,
two surfa
e layers of the left (BC), and three surfa
e layers of the right lead (CBA), where
all the s
reening e�e
ts are in
luded into the 
onta
t region, within whi
h the 
harge-density
matrix is solved self-
onsistently with the NEGF method.
The ele
troni
 transport properties are studied by the NEGF 
ombined with DFT 
al-

ulations, whi
h are implemented in ATK pa
kage.
This methodology has been adopted
to explain various experimental results su

essfully.
26,27
In our 
al
ulations, Ceperley-Alder
lo
al-density approximation is used.
Core ele
trons are modeled with Troullier-Matrins
nonlo
al pseudopotential, and valen
e ele
trons are expanded in a SIESTA lo
alized basis
29,30
A energy 
uto� of 150 Ry for the grid integration is set to present the a

urate 
harge
density. The optimized ele
trode-ele
trode distan
e is 39.5 Å for the 
losed 
on�guration
whi
h is 0.7 Å longer than that of the open one. All atomi
 positions are relaxed and the

orresponding gold-sulfur distan
e is 2.5 Å, whi
h is 
lose to the typi
al theoreti
al values.
In addition, we �nd that the geometri
 
hanges of two diarylethene mole
ule sandwi
hed
between two Au(111) surfa
es are negligible 
omparing to the 
orresponding free mole
ules.
III. RESULTS AND DISCUSSION
A. Ele
troni
 stru
tures of free diarylethene mole
ules with 
losed and open stru
-
tures
Atomi
 positions of two free diarylethene mole
ules with 
losed and open stru
tures are
optimized by Gaussion03 pa
kage at general gradient approximation level.
In the ground
ele
troni
 states, both optimized 
on�gurations are featured by out-of-plane distortions. The

entral dihedral angle is 60 degrees for the open form, while only about 8 degrees for the

losed one. This distortion leads the distan
e between 
arbon and 
arbon bond (the bond

an be broken by photon) 
lose to 4.0 Å in the open 
ase 
ompared to 1.5 Å for the 
losed

on�guration. These important geometri
 parameters are 
onsistent with the previous DFT
predi
tions for bisbenzothienylethene mole
ules.
Experimental studies have demonstrated
that the mole
ule 
an transform reversibly between the 
losed and open forms by shining
ultraviolet and visible lights, respe
tively.
Drawing from the 
hemi
al intuition, one would expe
t that the ele
troni
 stru
tures
have distin
tive 
hara
teristi
s due to the signi�
ant geometri
 di�eren
e between 
losed
and open stru
tures. For example, it is 
lear that both single and double bonds appearing
in the 
entral swit
hing unit get almost swapped within the 
losed and open 
on�gurations
as shown in Fig. 1(a). The number of double bonds is 9 in the open form in 
ontrast to 8 in
the 
losed one. Thus, the energy of HOMO in the open form is expe
ted to be lower than
that in the 
losed one.
In deed, the energies of the HOMO and lowest uno

upied mole
ular
orbital (LUMO) of the 
losed form are -4.6 and -3.3 eV, respe
tively, whereas the HOMO
and LUMO energies of the open one are -4.9 and -2.7 eV. The frontier orbital lo
alizes
primarily on ea
h 
onjugated unit of the mole
ule or on the 
entral swit
hing unit for the
diarylethene mole
ule with open 
on�guration. The mole
ule in the 
losed form belongs to a

onjugated system, whose HOMO and LOMO orbitals are essentially delo
alizated π orbitals
extending over the entire mole
ule. More interesting, when six F atoms in the peripheral of

y
lopentene are substituted by H atoms, we �nd that the HOMO and LUMO energies of
this modi�ed mole
ule shift dramati
ally to -4.2 and -2.5 eV for the 
losed form, and to -4.7
and -1.6 eV for the open one respe
tively. These remarkable di�eren
es of the geometries
and ele
troni
 stru
tures are expe
ted to a�e
t signi�
antly transport properties.
B. Transport properties of diarylethene mole
ular jun
tions
The 
urrents through the mole
ular jun
tion with 
losed and open 
on�gurations in the
bias voltage range [-1.0, 1.0V℄ are 
al
ulated by the Landauer-Bütiker formalism.
It should
be pointed out that at ea
h bias voltage, the 
urrent is determined self-
onsistently under
the nonequilibrium 
ondition. The 
al
ulated I-V 
urves are presented in Figure 2. The
triangles linking with bla
k solid lines are for the diarylethene mole
ular jun
tion, while
the 
ir
les linking with short red dotted lines stand for the jun
tion where six F atoms
in the peripheral of 
y
lopentene are substituted by H atoms. The �lled (empty) symbols

orrespond to the 
losed (open) stru
tures. Our 
al
ulations 
apture the key features of the
experimental results.
The 
urrent through the 
losed form is remarkably higher than that
of the open one. When the diarylethene mole
ule in the jun
tion 
hanges from a 
losed

on�guration to the open one, the mole
ular wire is predi
ted to swit
h from the on (low
resistan
e) state to the o� (high resistan
e) state. The 
urrent enhan
ement is quanti�ed by
the on-o� ratio of 
urrent de�ned as R(V ) = Iclosed(V )/Iopen(V ). For example, the 
urrent
of the 
losed form at 1.0 V is about 4.5 µA, whi
h is about 500 times larger than that of
the open 
ase. Su
h a large on-o� ratio in this given range of bias voltage 
an be readily
measured and is desirable for the real appli
ation. Note that the predi
ted on-o� ratio at
1.0 V is larger by about one order of magnitude 
ompared to experiment.
We think one
possible reason for this dis
repan
y is the limitation of the 
omputational method. It is
well known that the 
al
ulated value of the 
urrent through mole
ular jun
tion using NEGF

ombined with DFT is larger about 1-2 orders of magnitude than that of these experimental
measured result.
26,31
Other two possible reasons are environment e�e
t and geometry dif-
feren
e. Firstly, solvent e�e
t is not 
onsidered in presented 
al
ulations. Se
ondly, in our

omputational model, diarylethene mole
ules are dire
tly bound to gold ele
trodes through
Au-S bonds in va
uum. In the experimental setup, the 
entral swit
hing mole
ules bind to
the long orientation 
ontrol mole
ules (polyrotaxane), whi
h 
onne
t to the interfa
e 
ontrol
mole
ules (4-iodobenzenethiol) an
hored with gold nanoele
trodes in solution (the distan
e
between two ele
trodes is about 30 nm).
Note that the slight geometri
 distortion due to
the mole
ule-ele
trode intera
tion 
an result in a slight asymmetry in the 
al
ulated I-V

urves at small bias voltage range as shown in the inset below right of Fig. 2 in small s
ale
for 
larity.
To understand the dramati
 di�eren
e in 
ondu
tivities of the 
losed and open 
on�gu-
rations, we 
ompute the energy dependen
e of total zero-bias voltage transmission spe
tra
shown in Figure 3, where the Fermi level (EF ) is set to be zero for 
larity. Generally
speaking, the 
ondu
tan
e of the mole
ular jun
tion is determined by the number of the
eigen
hannel, the properties of the perturbed frontier orbitals of the mole
ule due to the
presen
e of the gold ele
trodes and the alignment of the metal Fermi level within the per-
turbed HOMO-LUMO gap.
Applying an e�e
tive s
heme named mole
ular proje
ted self-

onsistent Hamiltonian (MPSH) method,
the orbital energies and eigenstates (referred as
perturbed MOs) of the MPSH are obtained and plotted in Fig. 3. The energy positions of
these perturbed MOs relative to the EF are denoted in Figs. 3(a) and 3(b) with red short
verti
al lines, whi
h mat
h ni
ely with the transmission peaks. The spatial distributions of
the perturbed-HOMOs and -LUMOs are presented in Fig. 3 lo
ating on the right and left
sides of the EF , respe
tively. Both 
al
ulated 
ondu
tan
es are very small at zero bias. It
is 4.2×10
G0 (G0=2e
/h) for the 
losed 
on�guration at the EF , and only 5.4×10
for the open one whi
h is about 800 times smaller than the former one. The diarylethene
mole
ule with a 
losed stru
ture has two broad and strong transmission peaks lo
ating at
-0.8 and 0.5 eV, respe
tively. For the open form, note that the la
k of any signi�
ant peaks
in between -1.5 and 1.7 eV 
learly elu
idates its lower 
ondu
tivity.
More importantly, the transmission spe
tra display extraordinarily dis
repant 
hara
ter-
isti
s. It is 
lear that for the diarylethene mole
ular jun
tion with 
losed stru
ture, the
signi�
ant transmission peaks lo
ating below and above the EF (about -0.8 and 0.5 eV)
are mainly 
ontributed by the perturbed-HOMO and -LUMO, respe
tively. Notably, the
perturbed HOMOs and LUMOs of the 
losed 
on�guration in Fig. 3(a) are delo
alized π-

onjugated orbitals, whi
h provide good 
hannels for ele
tron tunneling through the mole
u-
lar jun
tion and lead to two signi�
ant transmission peaks. Very interestingly, the transport
properties are predominated by the tail of the perturbed LUMO 
ontributed transmission
peak at small bias voltage (for example, less than 1.0 V), sin
e the transmission 
oming from
the perturbed LUMO is just 0.5 eV away from EF , whi
h is 0.3 eV 
loser than that of the
perturbed HOMO. Note that this �nding is di�erent from the mi
ros
opi
 pi
tures of other
existing mole
ular jun
tions based on photo
hromi
 DTE and DTC swit
hing mole
ules,
21,22
azobenzene,
and quintuple bond [PhCrCrPh℄ mole
ules,
whose transport properties are
prevailed by the transmission peak 
ontributed by the perturbed HOMO.
Yet 
ontradi
torily, the spatial pro�les shown in Fig.3 (b) of the perturbed LUMO
strongly lo
alizes at the 
entral swit
hing unit with open 
on�guration. This leads to no
appre
iable transmission peaks in the wide bias window (i.e. from -1.5 to 1.7 eV). The sig-
ni�
ant transmission peak at -1.7 eV originates from the perturbed HOMO and HOMO-1
(both are π orbitals) for the open stru
ture, however, it is lo
ated too far away from the EF .
Here, 
omparing to the 
losed 
ase, we note that the position of the perturbed HOMO for
the open 
on�guration is buried deeply below the EF , whi
h is 
onsistent with the previous
theoreti
al results of DTC mole
ules.
21,22
These theoreti
al �ndings ensure us to 
on
lude
that the sharp 
ontrast of the alignment of the perturbed orbital energies with respe
t to
the ele
trode Fermi level and the shape of these perturbed frontier mole
ular orbitals are
the essential 
auses for the striking 
ontrast in transport properties through diarylethene
mole
ular jun
tions with 
losed and open 
on�gurations.
It should be pointed out that the number of transmission paths 
an not a

ount for
the dramati
 di�eren
e in 
ondu
tivities of the 
losed and open 
on�gurations sin
e the
eigen
hannel analysis indi
ates that there is a single eigen
hannel for both 
ases within a
wide window (i.e. [-1.5, 1.5 eV℄). A

ording to the features mentioned above of the 
al
ulated
zero-bias transmission spe
tra (Fig. 3), One 
an spe
ulate that this type of mole
ular
swit
h 
an operate robustly in a pretty wide range of bias voltages with fairly large on -o�
ratio. Additional 
urrents through the diarylethene mole
ular jun
tions with two di�erent

on�gurations at -2.0, -1.5, 1.5 and 2.0 V are also 
al
ulated. The on-o� ratios of 
urrent are
predi
ted to be about two orders of magnitude. This suggests that the bias voltage window
of this kind of mole
ular swit
h (in Tanigu
hi et al.'s experiments
) with reasonably large
on-o� ratio is surprisingly wider than that of other photo-sensitive mole
ules.
13,21
Experimentalist found that diarylethene mole
ular swit
h is reversible when the mole
ules
are sandwi
hed through aromati
 linkers.
18,23
Theoreti
al 
al
ulations argued that whether
it 
an be swit
hed reversibly or not depending on the mole
ule-ele
trode hybridization.
The weak intera
tion between mole
ule and ele
trode is required to fa
ilitate the desired
reversible transition. A

ording to these �ndings, the reversible transition between the open
and 
losed 
on�gurations in this diarylethtene derivatives based mole
ular jun
tion is highly
possible, sin
e the mole
ules are sandwi
hed with phenyl linkers and the mole
ule-ele
trode
hybridization is weak.
C. Substituting e�e
t on diarylethene mole
ule
Previous theoreti
al 
al
ulations fo
us on the end linking groups,
no attempts so far
have been made to examine the side substituting e�e
t on transport properties through the
diarylethene derivations. It is important to investigate the 
ondu
tan
e of the mole
ular
jun
tion, where six F atoms in the peripheral of 
y
lopentene are substituted by H atoms.
The 
al
ulated transmission spe
tra for the H-substituted mole
ular jun
tion with the 
losed
and open forms at zero bias voltage are shown by bla
k solid lines in Figures. 4(a) and
4(b), and two 
orresponding I-V 
urves are presented in Fig. 2 with �lled and empty

ir
les (linked by red dotted lines), respe
tively. The 
urrent through the H-substituted
mole
ular jun
tion with 
losed 
on�guration is about half of that of the 
y
lopentene with
six F atoms in the peripheral. The reasons are summarized in the following three points.
(1) The repla
ement of H with F on the swit
hing unit results in the variation of band
gaps. The energy gap of the H-substituted diarylethene mole
ule is about 1.7 eV, while the
gap of diarylethene (F) mole
ule is about 1.3 eV. (2) The alignment of the Fermi level is
di�erent for two systems. For the jun
tion with the H-substituted mole
ule, the peak 
oming
from the perturbed HOMO lo
ates at -0.7 eV, whi
h is 
loser to the Fermi level than the
perturbed LUMO transmission peak (at 1.0 eV). This result is 
onsistent with these previous
theoreti
al studies on other DTC and DTE derivations.
21,22
However, the Fermi level lies

lose to the transmission 
ontributed by the perturbed LUMO for the diarylethene (F)
mole
ular jun
tion, as shown in Fig. 3. (3) The transport properties under small bias voltage
are mainly determined by the tail of the transmission peak 
oming from the perturbed LUMO
for the 
losed diarylethene (F) mole
ular jun
tion. However, the 
ondu
tivity of the 
losed
H-substituted one is 
ontrolled the tail of transmission peak 
ontributed by the perturbed
HOMO. Nonetheless, the light-
ontrolled swit
hing feature is undoubtedly retained.
Experimental and theoreti
al results revealed that the visible adsorption spe
tra 
hanged
when two S atoms of the swit
hing unit were substituted by O atoms.
Thus, the trans-
mission spe
tra of the mole
ular jun
tion shown in Fig.1 (b) where two S atoms in 
entral
swit
hing unit are repla
ed by O atoms are also 
al
ulated here, as shown in Fig. 4 with
red dotted lines. Clearly, the swit
hing behavior does not depend sensitively on the O-
substituent. However, it should be pointed out that the positions of signi�
ant transmission
peaks obviously shift when 
ompared to Fig. 3. Parti
ularly, the transmission peaks 
oming
from the perturbed HOMO and LUMO lo
ates at -0.7 and 0.8 eV, respe
tively. Again, the
tail of perturbed HOMO transmission peak 
ontributes largely to the low bias ele
troni

ondu
tan
e.
D. E�e
t of varing end an
horing atoms
In general, the transport properties of mole
ular jun
tions depend nontrivially on the end
linking atoms.
36,37
Now we turn to explore the e�e
t of alternating end an
horing atoms.
The 
al
ulated transmission spe
tra at zero bias voltage are shown in Figure 5. The bla
k
solid and red dotted lines stand for end Se- and Te-an
hored 
ases, respe
tively. It is 
lear
that the main 
hara
teristi
s of the transmission spe
tra are maintained and the 
losed
stru
ture is undoubtedly more 
ondu
tive. For the end Se-an
hored 
ase, the energies of
perturbed MO are quite 
lose to the data presented in Fig. 3 of the end S-an
hored one.
Interestingly, 
learly observable 
hanges have been shown for the end Te-an
hored 
ase. The
transmission peaks originating from the perturbed HOMO and LUMO for the mole
ular
jun
tion 
onne
ting to gold ele
trodes through Te atoms lo
ates at -1.0 (-0.8 for S-an
hored
one) and 0.3 eV (0.5 for S-an
hored one), respe
tively. The very interesting �nding of this
study is that the swit
hing behavior of diarylethene derivatives based mole
ular jun
tions
is robust to vary end an
horing atoms from S to Se and Te.
To examine the sensitivity of results shown in Fig. 3 to small 
hange of the ele
trode-
ele
trode distan
e, we 
ompute the zero-bias transmission spe
tra of diarylethene swit
hes
with the 
losed and open stru
tures as elongating and shortening ele
trode-ele
trode separa-
tion up to 0.3 Å. We �nd that the transmission spe
tra experien
e little 
hange, and trans-
port properties of this kind of diarylethene mole
ular jun
tion is not dete
tably sensitive
to the ele
trode-ele
trode distan
e. It indi
ates that this kind of light-
ontrolled swit
hing
based on diarylethene derivatives is stable as a mole
ular swit
hing devi
e. Note that the
transport behavior is des
ribed by the ele
tron elasti
 s
attering theory in our 
al
ulations.
The e�e
t arising from the ele
troni
 vibration and the a

ompanying heat dissipation on
the 
al
ulated on-o� ratio 
an be negle
ted be
ause of the remarkable di�eren
e of the I-V

urves.
IV. CONCLUSION
In summary, we investigate the transport properties of the diarylethene with 
losed and
open stru
tures using the NEGF 
ombined the DFT method. The zero-bias transmission
fun
tion of two di�erent forms is strikingly distin
tive. The open form la
ks any signi�-

ant transmission peak within a wide energy window, while the 
losed stru
ture has two
signi�
ant transmission peaks on the both sides of the Fermi level. The ele
troni
 trans-
port properties of the mole
ular jun
tion with 
losed stru
ture under a small bias voltage
are mainly determined by the tail of the transmission peak 
ontributed unusually by the
perturbed lowest perturbed uno

upied mole
ular orbital. The 
al
ulated on-o� ratio of 
ur-
rents between the 
losed and open 
on�gurations is about two orders of magnitude, whi
h
reprodu
es the essential features of the experimental measured results. Moreover, although
the alignments of the perturbed mole
ular orbitals's energies with respe
t to the ele
trode's
Fermi level are not exa
tly the same, we �nd that the swit
hing behavior within a wide bias
voltage window is extremely robust to both substituting F or S for H or O and varying end
an
horing atoms from S to Se and Te.
ACKNOWLEDGMENTS
This work was partially supported by the National Natural S
ien
e Foundation of China
under Grants 10674121, 10574119, 50121202, and 20533030, by National Key Basi
 Resear
h
Program under Grant No. 2006CB922004, by the USTC-HP HPC proje
t, and by the
SCCAS and Shanghai Super
omputer Center.Work at NTU is supported in part by A*STAR
SERC grant (No. 0521170032).
Corresponding author. E-mail: liqun�ust
.edu.
n
Corresponding author. E-mail: jlyang�ust
.edu.
n
A. Aviram and M. A. Ratner, Mole
ular Ele
troni
s: S
ien
e and Te
hnology (The New York
A
ademy of S
ien
es, New York, 1999); A. Nitzan and M. A. Ratner, S
ien
e 300, 1384 (2003).
C. Joa
him, J. K. Gimzewski, and A. Aviram, Nature 408, 541 (2000).
B. Y. Choi, S. J. Kahng, S. Kim, H. Kim, H. W. Kim, Y. J. Song, J. Ihm, and Y. Kuk, Phys.
Rev. Lett. 96, 156106 (2006).
J. Henzl, M. Mehlhorn, H. Gawronski, K. H. Rieder, and K. Morgenstern, Angew. Chem. Int.
Ed. 45, 603 (2006).
J. Chen, M. A. Reed, A. M. Rawlett, and J. M. Tour, S
ien
e 286, 1550 (1999).
Y. Chen, D. A. A. Ohlberg, X. M. Li, D. R. Stewart, R. S. Williams, J. O. Jeppesen, K. A.
Nielsen, J. F. Stoddart, D. L. Olyni
k, and E. Anderson, Appl. Phys. Lett. 82, 1610 (2003).
A. S. Blum, J. G. Kushmeri
k, D. P. Long, C. H. Patterson, J. C. Yang, J. C. Henderson, Y. X.
Yao, J. M. Tour, R. Shashidhar, and B. R. Ratna, Nature Materials 4, 167 (2005).
J. M. Seminario, A. G. Za
arias, and J. M. Tour, J. Am. Chem. So
. 122, 3015 (2000).
J. M. Seminario, P. A. Derosa, and J. L. Bastos, J. Am. Chem. So
. 124, 10266 (2002).
G. K. Rama
handran, T. J. Hopson, A. M. Rawlett, L. A. Nagahara, A. Primak, and S. M.
Lindsay, S
ien
e 300, 1413 (2003).
H. Tian and S. J. Yang, Chem. So
. Rev. 33, 85 (2004).
K. Matsuda and M. Irie, J. Photo
h. Photobio. C 5, 169 (2004).
C. Zhang, M. H. Du, H. P. Cheng, X. G. Zhang, A. E. Roitberg, and J. L. Krause, Phys. Rev.
Lett. 92, 158301 (2004); C. Zhang, Y. He, H. P. Cheng, Y. Q. Xue, M. A. Ratner, X. G. Zhang,
and P. Krsti
, Phys. Rev. B 73, 125445 (2006).
Jing Huang, Qunxiang Li, Hao Ren, Haibin Su, and Jinlong Yang, J. Chem. Phys. 125, 184713
(2006).
S. Fraysse, C. Coudret, and J. P. Launay, Eur. J. Inorg. Chem. 7, 1581 (2000).
D. Duli¢, S. J. van der Molen, T. Kuderna
, H. T. Jonkman, J. J. D. de Jong, T. N. Bowden,
J. van Es
h, B. L. Feringa, and B. J. van Wees, Phys. Rev. Lett. 91, 207402 (2003).
J. He, F. Chen, P. A. Liddell, J. Andréasson, S. D. Straight, D. Gust, T. A. Moore, A. L. Moore,
J. Li, O. F. Sankey, and S. M. Lindsay, Nanote
hnology, 16, 695 (2005).
N. Katsonis, T. Kuderna
, M. Walko, S. J. van der Molen, B. J. van Wees, and B. L. Feringa,
Adv. Mater. 18, 1397 (2006)
T. Kuderna
, S. J. van der Molen, B. J. van Wees, and B. L. Feringa, Chem. Commun. 34, 3597
(2006).
J. Li, G. Speyer, and O. F. Sankey, Phys. Rev. Lett. 93, 248302 (2004); G. Speyer, J. Li, and
O. F. Sankey, Phys. Stat. Sol.(b) 241, 2326 (2004).
M. Zhuang and M. Ernzerhof, Phys. Rev. B 72, 073104 (2005).
M. Kondo, T. Tada, K. Yoshizawa, Chem. Phys. Lett. 412, 55 (2005).
M. Tanigu
hi, Y. Nojima, K. Yokota, J. Terao, K. Sato, N. Kambe, and T. Kawai, J. Am. Chem.
So
. 128, 15062 (2006).
We have performed DFT 
al
ulations for the diarylethene mole
ule adsorbing on the hollow,
bridge, and atop sites. The 
al
ulated results show that the mole
ule prefers to the hollow site.
Its adsorption energy is lower about 0.2 and 0.6 eV than that of the bridge and atop adsorption

on�guration, respe
tively.
M. Brandbyge, J. L. Mozos, P. Ordejón, J. Taylor, and K. Stokbro, Phys. Rev. B 65, 165401
(2002); J. Taylor, H. Guo, and J. Wang, Phys. Rev. B 63, 245407 (2001).
Xiaojun Wu, Qunxiang Li, Jing Huang, and Jinlong Yang, J. Chem. Phys. 123, 184712 (2005);
Xiaojun Wu, Qunxiang Li, Jing Huang, and Jinlong Yang, Phys. Rev. B 72, 115438 (2005);
Qunxiang Li, Xiaojun Wu, Jing Huang, and Jinlong Yang, Ultrami
ros
opy, 105, 293 (2005).
S. K. Nielsen, M. Brandbyge, K. Hansen, K. Stokbro, J. M. van Ruitenbeek, and F. Besenba
her,
Phys. Rev. Lett. 89, 066804 (2002).
D. M. Ceperley, and B. J. Alder, Phys. Rev. Lett. 45, 566 (1980).
J. M. Soler, E. Arta
ho, J. D. Gale, A. Gar
ia, J. Junquera, P. Ordejón, and D. S.-Portal, J.
Phys.: Condens. Matter 14, 2745 (2002).
Single-zeta plus polarization (SZP) basis set for Au atoms and double zeta plus polarization
(DZP) basis set for other atoms are adopted. Test 
al
ulations show that the very similar results
are obtained by using DZP basis set for all atoms.
K. Stokbro, J. Taylor, M. Brandbyge, J. -L. Mozos, and P. Ordejón, Comput. Mater. S
i. 27,
151 (2003).
Our 
al
ulations are 
ondu
ted by using Gassian03 pa
kage with 6-31+G basis at BLYP level.
(M. J. Fris
h, M. J. Fris
h, G. W. Tru
ks, H. B. S
hlegel, G. E. S
useria, M. A. Robb, J. R.
Cheeseman, J. A. Montgomery, Jr., T. Vreven, K. N. Kudin, J. C. Burant, J. M. Millam, S. S.
Iyengar, J. Tomasi, V. Barone, B. Mennu

i, M. Cossi, G. S
almani, N. Rega, G. A. Petersson,
H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima,
Y. Honda, O. Kitao, H. Nakai, M. Klene, X. Li, J. E. Knox, H. P. Hrat
hian, J. B. Cross, V.
Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R.
Cammi, C. Pomelli, J. W. O
hterski, P. Y. Ayala, K. Morokuma, G. A. Voth, P. Salvador, J.
J. Dannenberg, V. G. Zakrzewski, S. Dappri
h, A. D. Daniels, M. C. Strain, O. Farkas, D. K.
Mali
k, A. D. Rabu
k, K. Raghava
hari, J. B. Foresman, J. V. Ortiz, Q. Cui, A. G. Baboul, S.
Cli�ord, J. Cioslowski, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. L.
Martin, D. J. Fox, T. Keith, M. A. Al-Laham, C. Y. Peng, A. Nanayakkara, M. Challa
ombe,
P. M. W. Gill, B. Johnson, W. Chen, M. W. Wong, C. Gonzalez, and J. A. Pople, Gaussian03,
revision A.1; Gaussian, In
., Pittsburgh PA, 2003.)
A. E. Clark, J. Phys. Chem. A 100, 3790 (2006).
R. Landauer, Philos. Mag. 21, 863 (1970).
M. Irie and M. Mohri, J. Org. Chem. 53, 803 (1988); D. Ja
quemin and E. A. Perpète, Chem.
Phys. Lett. 429, 147 (2006).
S. H. Ke, H. U. Baranger, and W. T. Yang, J. Am. Chem. So
. 126, 15897 (2004).
Y. Q. Xue, and M. A. Ratner, Phys. Rev. B 69, 085403 (2004).
J. K. Viljas, J. C. Cuevas, F. Pauly, and M. Häfner, Phys. Rev. B 72, 245415 (2005).
Figure 1: (Color online) (a) The diarylethene derivative in 
losed and open 
on�gurations. (b) A
s
hemati
 of the swit
hing jun
tion. Diarylethene mole
ules are sandwi
hed between two Au (111)
surfa
es, and two S an
horing atoms are lo
ated at the hollow site. The verti
al blue line denotes
the interfa
e between the s
attering region and the left or right gold ele
trode.
Figure 2: (Color online) The 
al
ulated 
urrent-voltage 
hara
teristi
s of the diarylethene and its
derivative mole
ular jun
tions with two di�erent 
on�gurations. The triangles linking with bla
k
solid lines are for the diarylethene mole
ular jun
tion, while the 
ir
les linking with short red dotted
lines stand for the jun
tion where six F atoms in the peripheral of 
y
lopentene are substituted by
H atoms. The �lled (empty) symbols 
orrespond to the 
losed (open) stru
tures. The inset below
right is the I-V 
urve for the open stru
tures (with F and H atoms in the peripheral, respe
tively)
in small s
ale for 
larity.
Figure 3: (Color online) The zero-bias voltage transmission spe
tra versus the energy E-EF of
diarylethene mole
ular jun
tions with the 
losed (a) and open (b) 
on�gurations. Here, EF is the
Fermi level of ele
trodes. The red short verti
al lines stand for the positions of MPSH mole
ular
energy levels. The spatial distributions of the perturbed HOMOs and LUMOs are inserted in the
�gure, and pla
ed at the right and left sides of the EF , respe
tively.
Figure 4: (Color online) The 
al
ulated transmission spe
tra versus the energy E-EF at zero-bias
voltage for diarylethene mole
ular jun
tions with 
losed (a) and open (b) forms, respe
tively. One

ase is that six F atoms in the peripheral of 
entral 
y
lopentene are substituted by H atoms (with
bla
k solid lines); the other is that two S atoms are repla
ed by O atoms (with red dotted lines).
The red short verti
al lines stand for the positions of MPSH mole
ular energy levels.
Figure 5: (Color online) The 
al
ulated transmission spe
tra for diarylethene mole
ular jun
tions
with 
losed (a) and open (b) stru
tures. The bla
k solid and red dotted lines stand for the end
an
horing Se and Te atoms, respe
tively. Here, the red short verti
al lines stand for the positions
of MPSH mole
ular energy levels.
Fig.1 of Huang et al.
Fig.2 of Huang et al.
Fig.3 of Huang et al.
Fig.4 of Huang et al.
Fig.5 of Huang et al.
	INTRODUCTION
	COMPUTATIONAL MODEL AND METHOD
	RESULTS AND DISCUSSION
	Electronic structures of free diarylethene molecules with closed and open structures
	Transport properties of diarylethene molecular junctions
	Substituting effect on diarylethene molecule
	Effect of varing end anchoring atoms 
	CONCLUSION
	ACKNOWLEDGMENTS
	References
ABSTRACT
  The electronic transport properties and switching mechanism of single
photochromic diarylethene derivatives sandwiched between two gold surfaces with
closed and open configurations are investigated by a fully self-consistent
nonequilibrium Green's function method combined with density functional theory.
The calculated transmission spectra of two configurations are strikingly
distinctive. The open form lacks any significant transmission peak within a
wide energy window, while the closed structure has two significant transmission
peaks on the both sides of the Fermi level. The electronic transport properties
of the molecular junction with closed structure under a small bias voltage are
mainly determined by the tail of the transmission peak contributed unusually by
the perturbed lowest perturbed unoccupied molecular orbital. The calculated
on-off ratio of currents between the closed and open configurations is about
two orders of magnitude, which reproduces the essential features of the
experimental measured results. Moreover, we find that the switching behavior
within a wide bias voltage window is extremely robust to both substituting F or
S for H or O and varying end anchoring atoms from S to Se and Te.

<|endoftext|><|startoftext|>
Robust manipulation of electron spin coherence in an ensemble of singly charged
quantum dots
A. Greilich, M. Wiemann, F. G. G. Hernandez † , D. R. Yakovlev § , I. A. Yugova ‡ , and M. Bayer
Experimentelle Physik II, Universität Dortmund, D-44221 Dortmund, Germany
A. Shabaev ⋆ and Al. L. Efros
Naval Research Laboratory, Washington, DC 20375, USA
D. Reuter and A. D. Wieck
Angewandte Festkörperphysik, Ruhr-Universität Bochum, D-44780 Bochum, Germany
(Dated: November 4, 2018, robustcontrol-03-27-07-fin.tex)
Using the recently reported mode locking effect [1] we demonstrate a highly robust control of
electron spin coherence in an ensemble of (In,Ga)As quantum dots during the single spin coherence
time. The spin precession in a transverse magnetic field can be fully controlled up to 25 K by
the parameters of the exciting pulsed laser protocol such as the pulse train sequence, leading to
adjustable quantum beat bursts in Faraday rotation. Flipping of the electron spin precession phase
was demonstrated by inverting the polarization within a pulse doublet sequence.
PACS numbers: 72.25.Dc, 72.25.Rb, 78.47.+p, 78.55.Cr
The spin of an electron in a quantum dot (QD) is an
attractive quantum bit candidate [2, 3, 4, 5] due to its
favorable coherence properties [1, 6, 7, 8]. As the inter-
action strength is rather small for direct spin manipula-
tion, the idea to swap spin into charge has been furbished
[6, 9, 10]. For example, the electron may be converted
into a charged exciton by optical injection of an electron-
hole pair [10], depending on the residual electron’s spin
orientation, leading to distinctive polarization selection
rules.
The fundamental quantity regarding spin coherence is
the transverse relaxation time T2 . In a QD ensemble,
this time is masked by dephasing, mostly caused by dot-
to-dot variations of the spin dynamics. The dephasing
time does not exceed 10 ns, much shorter than T2 . This
leads to the general believe that manipulations ought to
-1 0 1 2 3 4 5 -1 0 1 2 3 4 5
= 1.86 ns
3.26 ns
3.66 ns
3.76 ns
3.86 ns
= 4.26 ns
B = 6 T, T = 6 K
4.92 ns
5.22 ns
5.42 ns
5.62 ns
5.92 ns
Time (ns)
FIG. 1: Faraday rotation traces measured as function of delay
between probe and first pump pulse at time zero. A second
pump pulse was applied, delayed relative to the first one by
TD , indicated at each trace. The top left trace gives the FR
without second pump.
be performed on a single spin. Measurement of a single
electron spin polarization, however, also results in de-
phasing due to temporal sampling of varying nuclear spin
configurations [11, 12], as statistically significant mea-
surements on a single QD may require multiple repetition
of the experiment. The dephasing can be overcome by
spin-echo techniques, which give a single electron spin co-
herence time on the scale of micro-seconds [8]. This long
coherence time derived by spin-echo is result of a refo-
cusing of the electron spin and possibly the nuclear spin
configuration [11], and it is viewed as an upper bound on
the free-induction decay of spin coherence [11, 13].
Recently, however, we have shown that mode locking
of electron spin coherence allows one to overcome the en-
semble dephasing [14] and to measure the single electron
spin relaxation time T2 without applying spin-echo re-
focusing [1]. For monitoring the coherence, pump-probe
Faraday rotation (FR) measurements [15] on a QD en-
semble were used: after optical alignment of the spins
normal to an external magnetic field the electron spins
precess about this field. Due to precession frequency
variations the ensemble phase coherence is quickly lost.
However, a periodic train of circularly polarized pulses
emitted by a mode-locked laser synchronizes those spin
precession modes, for which the precession frequency is
a multiple of the laser repetition rate. This synchroniza-
tion leads to constructive interference (CI) of these modes
in the ensemble spin polarization before arrival of each
pump pulse (see Fig. 1, upper left trace). The limit for
spin mode locking is set by the single electron spin co-
herence time which can last up to a few microseconds [1]
reaching the low bound on echo-like decays [16].
Here we develop a detailed understanding of the de-
gree of control which can be reached for the electron spin
coherence in an ensemble of singly charged QDs by ex-
ploiting the mode locking. For this purpose trains of
excitation pump pulse doublets were designed to vary
http://arxiv.org/abs/0704.0177v1
the phase synchronization condition (PSC) for electron
spin precession frequencies. The PSC selects a QD sub-
set, whose contribution to the ensemble spin polarization
shows a well controlled phase recovery. Variation of the
pulse separation results in tunable patterns of quantum
oscillation bursts in time-resolved FR, in good agreement
with our calculation, which rely on a newly developed
theoretical model. This tailoring of electron spin coher-
ence is very robust, as the spin mode locking is stable up
to 25 K. For higher temperatures the coherence ampli-
tude decreases due to phonon-assisted scattering of holes
during the laser pulse excitation by which the spin co-
herence is created.
The studied self-assembled (In,Ga)As/GaAs QDs were
fabricated by molecular beam epitaxy on a (001)-oriented
GaAs substrate. The sample contains 20 QD layers with
a layer dot density of about 10 10 cm −2 , separated by 60
nm wide barriers [17]. For average occupation by a sin-
gle electron per dot, the structures were n -modulation
doped 20 nm below each layer with a Si-dopant density
matching roughly the dot density. The sample was held
in the insert of an optical magneto-cryostat, allowing
temperature variation from T = 6 to 50 K.
FR with picosecond time resolution was used for study-
ing the spin dynamics: Thereby spin polarization along
the growth direction ( z -axis) is generated by a circu-
larly polarized pump pulse hitting the sample along z ,
and its precession in a transverse magnetic field B ≤ 7 T
along the x -axis is tested by the rotation of the linear
polarization of a probe pulse. For optical excitation, a Ti-
sapphire laser was used emitting pulses with a duration
of ∼ 1.5 ps (full width at half maximum of ∼ 1 meV) at
75.6 MHz repetition rate (corresponding to a repetition
period TR = 13.2 ns). The laser energy was tuned into
resonance with the QD ground state transition and the
laser pulses were split into pump and probe. The pump
beam was split further into two pulses with variable de-
lay TD in between. The circular polarization of the two
pumps could be controlled individually. For detecting
the rotation angle of the probe beam linear polarization,
a homodyne technique was used.
Figure 1 shows FR traces excited by the two-pulse
train with a repetition period TR = 13.2 ns, in which
both pulses have the same intensity and polarization,
and the delay between these pulses TD was varied be-
tween ∼ TR/7 and ∼ TR/2 . The FR pattern varies
strongly for the case when the delay time TD is commen-
surate with the repetition period TR : TD = TR/i with
i = 2, 3, ... , and for the case TD 6= TR/i . For commen-
surability TD = TR/i , the FR signal shows strong peri-
odic bursts of quantum oscillations only at times equal
to multiples of TD , as seen in the left panel of Fig. 1 for
TD = 1.86 ns≈ TR/7 . Commensurability is also given to
a good approximation for delays TD = TR/4 ≈ 3.26 ns
and TD = TR/3 ≈ 4.26 ns.
For incommensurability of TD and TR , TD 6= TR/i ,
the FR signal shows bursts of quantum oscillations be-
tween the two pulses of each pump doublet, in addition
to the bursts outside of the doublet. For example, one
can see a single burst in the mid between the pumps
for TD = 3.76 and 5.22 ns. Two bursts, each equidis-
tant from the closest pump and also equidistant from
one another, appear at TD = 4.92 and 5.62 ns. Three
equidistant bursts occur at TD = 5.92 ns. Note also that
the FR amplitude before the second pump arrival is al-
ways significantly larger than before the first pump for
any TD .
Although the time dependencies of the FR signal look
very different for commensurate and incommensurate
TD and TR , in both cases they can be fully controlled
by designing the synchronization of electron spin pre-
cession modes in order to reach CI of their contribu-
tions to the FR signal [1]. A train of circularly polar-
ized pump pulse singlets synchronizes those spin preces-
sions for which the precession frequency satisfies the PSC
[1, 18]: ωe = 2πN/TR . Then the electron spin under-
goes an integer number, N ≫ 1 , of full 2π rotations in
the interval TR between the pump pulses.
For a train of pump pulse doublets the PSC has to be
extended to account for the intervals TD and TR − TD
in the laser excitation protocol
ωe = 2πNK/TD = 2πNL/(TR − TD) , (1)
where K and L are integers. On first glance this con-
dition imposes severe limitations on the TD values, for
which synchronization is obtained:
TD = [K/(K + L)]TR , (2)
which for TD < TR/2 leads to K < L . When Eq. (2)
is satisfied, the contribution of synchronized precession
modes to the average electron spin polarization Sz(t) is
proportional to −0.5 cos[N(2πKt/TD)] . Summing over
all relevant oscillations leads to CI of their contributions
with a period TD/K in time [1]. The rest of QDs does
not contribute to Sz(t) at times longer than the ensem-
ble dephasing time. The PSC Eq. (1) explains the posi-
tion of all bursts in the FR signal for commensurate and
incommensurate ratios of TD and TR . For commensu-
rability, K ≡ 1 and TD = TR/(1 + L) according to Eq.
(2). In this case CIs should occur with period TD as
seen in Fig. 1 for TD = 1.86 ns (L = 6 ).
For incommensurability of TD and TR the number of
FR bursts between the two pulses within a pump doublet
and the delays at which they appear can be tailored.
There should be just one burst between the pulses, when
K ≡ 2 , because then the CI must have a period TD/2 .
A single burst is seen in Fig. 1 for TD = 3.76 and
5.22ns. The corresponding ratios TD/TR are 0.285 and
0.395, respectively. At the same time Eq.(2) gives a ratio
TD/TR = 2/(L+ 2) , which is equal to 0.285 and 0.4 for
L=5 and 3, respectively, in good accord with experiment.
Next, two FR bursts are seen for TD = 4.92 and
5.62ns, corresponding to TD/TR ≈ 0.372 and 0.426. The
corresponding CI period TD/3 is reached for K ≡ 3 .
Then from Eq.(2) TD/TR = 3/(L + 3) , giving 0.375
-1 0 1 2 3 4 5 6
305 310 315 305 310 315
TD=2TR/7  
Time (ns)
TD=TR/3
t = TD
t = 0
TD=TR/3
 Precession frequency (GHz)
t = 0
t = TD
TD=2TR/7
FIG. 2: (a,b): Spectra of electron spin precession modes,
−Sz(t) , which are phase synchronized by the two-pulse train
calculated for TD = TR/3 and TD = 2TR/7 at the moments
of first ( t = 0 ) and second ( t = TD ) pulse arrival (red).
Single-pump spectra are shown in blue. (c): FR traces cal-
culated for two ratios of TD/TR . Laser pulse area Θ = π .
TR = 13.2 ns. Electron g -factor | ge |= 0.57 and its disper-
sion ∆g = 0.005 . B = 6 T.
and 0.429 for L = 5 and 4, respectively. Finally, the
FR signal with TD = 5.92 ns ( TD/TR ≈ 0.448) shows
three FR bursts between the two pumps. The CI period
TD/4 is obtained for K ≡ 4 , for which Eq.(2) gives
TD/TR = 4/(L + 4) ≈ 0.444 with L = 5 . Obviously
good general agreement between experiment and theory
is established, highlighting the high flexibility of the laser
protocol. In turn, this understanding can be used to in-
duce FR bursts at wanted delays TD/K , so that at these
times further coherent manipulation of all electron spins
involved in the burst is facilitated.
However, the question arises how accurate condition
Eq. (2) for the TD/TR ratio must be fulfilled to reach
phase synchronization. Formally, one can find for any
arbitrary TD/TR large K and L values such that Eq.
(2) is satisfied with high accuracy. But the above analysis
shows, that only the smallest of all available L leads to
PSC matching. Experimentally, the facilities to address
this point are limited, as the largest TD for which FR
signal can be measured are delays around 5 ns between
the two pumps. For larger delays the FR bursts shift out
of the scanning range. For short TD , on the other hand,
the bursts are overlapping with the FR signal from the
pump pulses.
To answer this question, we have modeled the FR sig-
nal for commensurate and incommensurate ratios of TD
and TR . Figure 2 shows the results together with spec-
tra of synchronized spin precession modes (SSPM) at the
moment of the first and second pump pulse arrival. The
SSPM were calculated similar those induced by a single
pulse train [1]. Figure 2(a) gives the SSPM for com-
mensurate TD = TR/3 superimposed on the SSPM cre-
ated by a single pulse train with the same TR . Panel
(c) shows the FR signal created by such a two pulse
train. The SSPM for the considered strong excitation
are considerably broadened and contain modes for which
ωe = 2πM/TD = 2π3M/TR with integer M , which co-
incide with each third mode created by a single pulse
train. However, the SSPM given by ωe = 2πN/TR ,
which do not satisfy the PSC for a two pulse train, are
not completely suppressed, because the train synchro-
nizes the electron spin precession in some frequency range
around the PSC. One sees also, that at t = 0 the two
pulse train leads to a significant alignment of electron
spins opposite to the direction of spins satisfying the
PSC. This ”negative” alignment decreases the CI mag-
nitude and therefore the FR signal before the first pulse
arrival, and is also responsible for a significantly larger
magnitude of the FR signal before the second pulse ar-
rival [see Figs. 1 and 2 (c)].
For incommensurate ratios of TD and TR the SSPM
become much more complex. Still we are able to recover
the modes which satisfy the PSC at the pulse arrival
times. In Fig. 2 (b) we show the SSPM at t = 0 and t =
TD for TD = 2TR/7 (K = 2 , L = 5 ), where the arrows
indicate the frequencies which satisfy the PSC for the
two pulse train. Only a small number of such modes fall
within the average distribution of electron spin precession
modes, because the distance between the PSC modes is
proportional to 2πK/TD = 2π(K +L)/TR . The diluted
spectra of PSC modes for incommensurability decrease
the magnitude of the FR bursts between the pump pulses,
in accord with experiment. This shows, that although
any ratio of TD/TR can be satisfied by large K and
L , the FR signal between the pulses should be negligibly
small in this case. Consequently, not any ratio of TD/TR
leads to pronounced FR bursts.
To obtain further insight into the tailoring of electron
spin coherence, which can be reached by a two-pulse
train, we have turned from co- to counter-circularly po-
larized pumps. The delay between pumps TD was fixed
at TR/6 ≈ 2.2 ns. The time dependencies of the corre-
sponding FR signals are similar, as shown in Fig. 3. Be-
sides the two FR bursts directly connected to the pump
pulses, one sees a burst +1 due to CI of spin synchro-
nized modes. The insets in Fig. 3 (a) show closeups of
the different FR bursts. The sign, κ , of the FR am-
plitude for the counter-circular configuration undergoes
2TD -periodic changes in time relative to the co-circular
case, as seen in Fig. 3 (b), which demonstrates optical
switching of the electron spin precession phase by π in
an ensemble of QDs.
The observed effect of sign reversal is well described
by our model. Let us consider first a two-pulse train
with delay time TD = TR/2 for which the two pumps
-1 0 1 2 3 4 5
0 1 2 3 4 5
 co, 
 counter, 
 Time (ns)
pump 1 pump 2 +1 burst
- (b)  
 Time (ns)
B = 3 T T = 6 K (c)
FIG. 3: (a): Faraday rotation traces in the co-circularly (blue
traces) or counter-circularly (red traces) polarized two pump
pulse experiments measured for TD = 2.2 ns and B = 6T.
Insets give close-ups showing the relative sign, κ , of the FR
amplitude between the two traces. κ is plotted in (b) vs
time. (c): Effect of temperature on the FR amplitude in two-
pump-pulse experiment. TD = 1.88 ns.
are counter-circularly polarized. In this case an electron
spin can be synchronized only if at the moment of pulse
arrival it has an orientation opposite to the orientation
at the previous pulse. This leads to the PSC ωe =
2π(N + 1/2)/TD . The contribution of such precession
modes to the electron spin polarization is proportional
to cos[2π(N+1/2)t/TD)] = cos(2πNt/TD) cos(πt/TD)−
sin(2πNt/TD) sin(πt/TD) . Summing these contribu-
tions, only the first term gives a CI, whose modulus has
period TD , while the sign of cos(πt/TD) changes with
period 2TD . Only each third of the precession frequen-
cies can be synchronized by a counter-circularly polarized
two pulse train when the delay time is TR/6 as in our ex-
periment. However, the corresponding PSC has the same
dependence on TD . The CI modulus also has period TD
and its sign changes with period 2TD . The relative sign
of the FR amplitude for the counter- and co-circularly
case, κ = sgn{cos[πt/TD]} , is in accord with the exper-
imental dependence in Fig. 3 (b).
The CIs of the electron spin contributions can be seen
only as long as the coherence of the electron spins is main-
tained. In this respect the temperature stability of the
CI is especially important. Fig. 3 (c) shows FR traces
in a two-pump-pulse configuration with TD = 1.88 ns at
different temperatures. For both positive and negative
delays, the FR amplitude at a fixed delay is about con-
stant for temperatures up to 25 K, irrespective of slight
variations which might arise from changes in the phase
synchronization of QD subsets. Above 30 K a sharp drop
occurs, which can be explained by thermally activated
destruction of the spin coherence.
The electron spin coherence in charged QDs is initi-
ated by generation of a superposition of an electron and
a charged exciton state by resonant pump pulses [17, 19].
The simultaneous decrease of the FR magnitude before
each pump pulse and afterwards (when the CI signal is
controlled by the excitation pulse) suggests that the co-
herence at elevated temperatures is lost already during its
generation. The 30K temperature threshold corresponds
to an activation energy of ∼ 2.5 meV. This energy may
be assigned only to the splitting between the two lowest
confined hole levels, because the electron level splitting
dominates the 20 meV splitting between p- and s-shell
emission in photoluminescence and is much larger than
2.5 meV. The decoherence of the hole spin results from
two phonon scattering, which is thermally activated and
should occur on a sub-picosecond time scale, i.e. within
the laser pulse [20]. The fast decoherence of the hole spin
at T > 30 K suppresses formation of the electron-trion
superposition state. ps-pulses as used here are therefore
not sufficiently short for initialization of the superposi-
tion and creation of a long-lived electron spin coherence.
In summary, we have demonstrated that the mode-
locking effect allows a far-reaching control of electron
spin coherence in QD ensembles during the spin coher-
ence time of microseconds [1]. Two-pulse train mode-
locking selects QD subsets which give a non-dephasing
contribution to the ensemble spin precession. The tech-
nique shows remarkable stability with respect to temper-
ature increase up to 25 K, a property which is important
for utilizing it in quantum information processing. The
robustness of this control technique is provided by the
dispersion of the spin precession frequencies in the QD
ensemble.
Acknowledgments. This work was supported by the
BMBF program nanoquit, the DARPA program QuIST,
the ONR, the DFG (FOR485) and FAPESP.
[ † ] on leave from the Instituto de Fisica Gleb Wataghin,
Campinas, SP , Brazil.
[ § ] also at Ioffe Physico-Technical Institute, 194021, St. Pe-
tersburg, Russia.
[ ‡ ] also at Institute of Physics, St. Petersburg State Univer-
sity, 198504, St. Petersburg, Russia.
[ ⋆ ] also at School of Computational Sciences, George Mason
University, Fairfax VA 22030.
[1] A. Greilich et al., Science 313, 341 (2006).
[2] D. Loss and D. P. DiVincenzo, Phys. Rev. A 57, 120
(1998).
[3] A. Imamoglu et al., Phys. Rev. Lett. 83, 4204 (1999).
[4] Semiconductor Spintronics and Quantum Computation,
ed. by D. D. Awschalom, D. Loss, and N. Samarth
(Springer-Verlag, Heidelberg 2002).
[5] S. M. Clark et al., cond-mat/0610152.
[6] J. M. Elzerman et al., Nature 430, 431 (2004).
[7] M. Kroutvar et al., Nature 432, 81 (2004).
[8] J. R. Petta et al., Science 309, 2180 (2005).
[9] see, for example, R. Hanson et al., Phys. Rev. Lett. 94,
196802 (2005).
[10] see, for example, T. Calarco et al., Phys. Rev. A 68,
012310 (2003); P. Chen et al., Phys. Rev. B 69, 075320
(2004).
[11] W. Yao, R. Liu, and L. J. Sham, Phys. Rev. B 74, 195301
(2006).
[12] R. Liu, S. E. Economou, L. J. Sham, and D. G. Steel,
Phys. Rev. B 75, 085322 (2007).
[13] W. A. Coish et al., Phys. Stat. Sol. (b) 243, 3658 (2006).
[14] For a general treatment on suppression of phase noise
see A. G. Kofman and G. Kurizki, Phys. Rev. Lett. 93,
130406 (2004).
[15] J. M. Kikkawa and D. D. Awschalom, Science 287, 473
(2000).
[16] R. Hanson et al., cond-mat/0610433.
[17] A. Greilich et al., Phys. Rev. Lett. 95, 227401 (2006).
[18] A. Shabaev et al., Phys. Rev. B 68, 201305(R) (2003).
[19] T. A. Kennedy et al., Phys. Rev. B 73, 045307 (2006).
[20] T. Takagahara, Phys. Rev. B 62, 16840 (2000).
http://arxiv.org/abs/cond-mat/0610152
http://arxiv.org/abs/cond-mat/0610433
ABSTRACT
  Using the recently reported mode locking effect we demonstrate a highly
robust control of electron spin coherence in an ensemble of (In,Ga)As quantum
dots during the single spin coherence time. The spin precession in a transverse
magnetic field can be fully controlled up to 25 K by the parameters of the
exciting pulsed laser protocol such as the pulse train sequence, leading to
adjustable quantum beat bursts in Faraday rotation. Flipping of the electron
spin precession phase was demonstrated by inverting the polarization within a
pulse doublet sequence.

<|endoftext|><|startoftext|>
Introduction
The equation of state (EOS) of hydrogen and helium at high pressures is of great relevance for models of the
interior of giant planets and other astrophysical objects as well as for inertial confinement fusion experiments.
For detailed calculations accurate knowledge of the EOS over a wide range of densities and temperatures is
needed. Especially, in the range of warm dense matter with high densities characteristic for condensed matter
and at temperatures of a few eV the EOS is crucial for modelling giant planets. This region is challenging
for many-particle theory because strong correlations dominate the physical behavior. Progress in shock-wave
experimental technique has allowed to study this region only recently.
To probe the EOS, experimental investigations were performed statically with diamond anvil cells or dynami-
cally by using shock waves, see [1] for a recent review. The experimental data indicate that a nonmetal-to-metal
transition occurs at about 1 Mbar which is identified by a strong increase of the conductivity [2] and reflectiv-
ity [3]. Some theoretical models yield a thermodynamic instability in this transition region, the plasma phase
transition (PPT) [4, 5, 6, 7, 8], which would strongly affect models for planetary interiors and the evolution of
giant planets [9, 10, 11]. After a long period of controversial discussions, new results of shock wave experiments
on deuterium support the existence of such a PPT [12]. This fundamental problem of high-pressure physics will
also be studied with the FAIR facility at GSI Darmstadt within the LAPLAS project, see [13, 14].
In this paper we present new results for the EOS of dense hydrogen within the chemical picture. We treat the
reactions pressure dissociation and ionization self-consistently via respective mass action laws. We identify the
region of thermodynamic instability and calculate the phase diagram as well as the reflectivity in order to verify
the corresponding nonmetal-to-metal transition. The EOS data is used to model the interior of Jupiter within a
three-layer model. The agreement with astrophysical constraints such as the core mass and the fraction of heavier
elements can serve as an additional test of the theoretical EOS.
2 Equation of state for dense hydrogen
Warm dense hydrogen is considered as a partially ionized plasma in the chemical picture. A mixture of a neutral
component (atoms and molecules) and a plasma component (electrons and protons) is in chemical equilibrium
∗ Corresponding author: e-mail: bastian.holst@uni-rostock.de, Phone: +49 381 498 6919, Fax: +49 381 498 6912
cpp data will be provided by the publisher
http://arxiv.org/abs/0704.0178v1
2 EOS for dense hydrogen: EOS for dense hydrogen
with respect to dissociation and ionization. The EOS is derived from an expression for the free energy of the
neutral (F0) and charged particles (F±), see [15, 16]:
F (T, V,N) = F0 + F± + Fpol. (1)
The first two terms consist of ideal and interaction contributions and can be written as F0 = F
0 and F± =
+F int
. Fpol contains interaction terms between charged and neutral components caused by polarization [17].
Applying fluid variational theory (FVT), the EOS is determined by calculating the free energy F int0 (T, V,N)
via the Gibbs-Bogolyubov inequality [18]. This method has been generalized to two-component systems with a
reaction [19, 20, 21] so that also molecular systems at high pressure can be treated where pressure dissociation
occurs, e.g. H2 ⇀↽ 2H for hydrogen. In chemical equilibrium, µH2 = 2µH is fulfilled, and the number of
atoms and molecules can be determinded self-consistently via the chemical potentials µc = (∂F/∂Nc)T . The
effective interactions between the neutral species are modeled by exp-6 potentials, and the free energy of a multi-
component reference system of hard spheres has to be known; for details, see [19, 20, 22].
The charged component is treated by using efficient Padé approximations for the free energy developed by
Chabrier and Potekhin [23]. The coupling with the neutral component occurs via the ionization equilibrium,
H⇀↽e+p. In chemical equilibrium, the relation µH = µe + µp determines the degree of ionization.
Since atoms and molecules are particles of finite size there is an additional interaction between the charged
component and the neutral fluid. According to the concept of reduced volume, point-like particles cannot pene-
trate into the volume occupied by atoms and molecules. This leads to a correction in the description of the ideal
gas of the charged component [24, 25] so that the ideal free energy of protons and electrons F id
is dependent on
the reduced volume V ∗ = V · (1− η),
(T, V ∗, N) = N±kBT · f
, (2)
where η is the ratio of the volume which cannot be penetrated by point-like particles to the total volume. It is
derived from hard sphere diameters obtained within the FVT self-consistently. The free energy density f id,∗
given by Fermi integrals which take into account quantum effects. In order to avoid an intersection of pressure
isotherms, which is important for modelling planetary interiors, a minimum diameter dmin has been introduced.
It was determined starting at low temperatures where it remains almost constant up to 15.000 K, then it increases
up to 20.000 K and remains constant again for higher temperatures, see Fig. 1. These values are in the range of
the results for the diameter of the hydrogen atom derived from the confined atom model [26].
5000 10000 15000 20000
T [K]
Fig. 1 Minimum diameter for expanded particles (atoms, molecules) introduced within the reduced volume concept.
Consequently, the reduced volume concept changes the chemical potential of each component drastically at
higher densities and results in pressure ionization. This is due to the fact that additional terms appear in the
chemical potential, which is the particle number derivative of the free energy, and thermodynamic functions of
degenerate plasmas are very sensitive to changes in density.
This current model FVT+ includes all interaction contributions to the chemical potentials, thus being a gener-
alization of earlier work [22] where only ideal plasma contributions have been treated (FVT+
cpp header will be provided by the publisher 3
mass density ρ [g/cm
20000 K
mass density ρ [g/cm
50000 K
Fig. 2 Composition of dense hydrogen for 20.000 K (left) and 50.000 K (right).
In Fig. 2 the composition of hydrogen derived from the present approach is shown for two temperatures.
Hydrogen is an atomic gas at low temperatures (left) and low densities. With increasing densities molecules
are formed due to the mass action law. Pressure dissociation and ionization can be observed in the high-density
region. The nonideality corrections to the free energy force a transition from a molecular fluid to a fully ionized
plasma. At higher temperatures (right) the formation of molecules is suppressed and pressure ionization becomes
the dominating process. At low densities and high temperatures a fully ionized plasma is produced due to thermal
ionization.
We show pressure isotherms over a wide range of temperatures and densities in Fig. 3. At low densities the
system behaves like a neutral fluid. Between densities of 10−3 g/cm3 and 10−1 g/cm3 nonideality corrections to
the free energy of atoms and molecules lead to a nonlinear behavior of the isotherms. For still higher densities a
phase transition occurs which is treated by a Maxwell construction. The thermodynamic instability vanishes with
increasing temperatures, and the critical point is located at 16.800 K, 0.35 g/cm3, and 45 GPa.
 ρ [g/cm
100000 K
  50000 K
  30000 K
  20000 K
  15000 K
    5000 K
Fig. 3 Pressure isotherms for dense hydrogen.
The critical point and the related coexistence line are shown in Fig. 4 and compared with results of other EOS.
The critical point itself lies within the range of other predictions, whereas the coexistence line is lower than most
of the other results. For a comparison of data concerning the PPT, see Table 1.
New shock-wave experiments [12] imply that a PPT occurs in deuterium at densities of 1.5 g/cm3 and a
coexistence pressure of about 1 megabar. Each of these values is twice as high as evaluated in the recent model.
4 EOS for dense hydrogen: EOS for dense hydrogen
0 5000 10000 15000 20000
T [K]
Fig. 4 Phase diagram for dense hydrogen. Present results of the FVT+ (red) are compared with other predictions for the
PPT: SC [4, 5], RK [27], MH [28], ER [29], SBT [30], RRN [31], BEF [32], MCPB [33].
Tc pc ρc Method Authors Reference
(103 K) (GPa) (g/cm3)
12.6 95 0.95 PIP Ebeling/Sändig (1973) [34]
19 24 0.14 PIP Robnik/Kundt (1983) [27]
16.5 22.8 0.13 PIP Ebeling/Richert (1985) [29]
16.5 95 0.43 PIP Haronska et al. (1987) [35]
15 64.6 0.36 PIP Saumon/Chabrier (1991) [4]
15.3 61.4 0.35 PIP Saumon/Chabrier (1992) [5]
14.9 72.3 0.29 PIP Schlanges et al. (1995) [30]
16.5 57 0.42 PIP Reinholz et al. (1995) [31]
11 55 0.25 PIMC Magro et al. (1996) [33]
20.9 0.3 0.002 Kitamura/Ichimaru (1998) [36]
16.8 45 0.35 PIP present FVT+
Table 1 Theoretical results for the critical point of the hypothetical plasma phase transition (PPT) in hydrogen which was
predicted by Zeldovich and Landau [37] and Norman and Starostin [38].
3 Conductivity and reflectivity
The PPT is an instability driven by the nonmetal-to-metal transition (pressure ionization). We calculate the
electrical conductivity as well as the reflectivity by applying the COMPTRA04 program package [39, 40] in
order to locate this transition in the density-temperature plane.
Optical properties are calculated within the Drude model. The reflectivity R(ω) is given in the long-wavelength
limit via the dielectric function ε(ω) which is determined by a dynamic collision frequency ν(ω) or, alternatively,
by the dynamic conductivity σ(ω) [41]:
R(ω) =
ε(ω)− 1
ε(ω)− 1
, (3)
ε(ω) = 1−
ω [ω + iν(ω)]
= 1 +
σ(ω), (4)
σ(ω) = σ(0)
. (5)
ωpl =
nee2/(ε0me) is the plasma frequency of the electrons.
The reflectivity was determined along the Hugoniot curve and is compared with experimental results [3] and
those of the earlier model FVT+
[42] in Fig. 5. The results of the current model show a much better agreement
with the experiment. The characteristic and abrupt rise with increasing pressure was reproduced more accurately.
cpp header will be provided by the publisher 5
This drastic increase appears due to pressure ionization in the vicinity of the criotical point of the PPT. As a
result, the reflectivity advances from very low values to metallic-like ones almost instantly.
10 100 1000
P [GPa]
808 nm
1064 nm
Celliers et al. 2000
Fig. 5 Reflectivity of dense hydrogen within the models FVT+ and FVT+
along the Hugoniot curve in comparison with
experiments [3].
4 Planetary interiors
Modelling the interiors of giant planets and comparison with their observational parameters offers an alternative
tool besides laboratory experiments of probing the EOS of the components the planets are predominantly made
of. Giant planets such as Jupiter and Saturn consist mainly of hydrogen and, in decreasing order, of helium,
water and rocks, covering a wide range of pressures and temperatures. Independently from the H-EOS used for
modelling, the simplest interior structure that is compatible with the observational constraints requires at least
three homogenous layers with a transition from a cold molecular fluid in the outer envelope to a pressure ionized
plasma in the deep interior and a dense solid core of ices and rocks. A solid core may be explained as a result
of the formation process and the seperation into two fluid envelopes with different particle abundances by an
existence of a PPT as provided by the FVT+ EOS. The constraining observational parameters are the total mass
of the planet M , its equatorial radius Req , the temperature T at the outer boundary, the average helium content
Ȳ , the period of rotation ω and the gravitational moments J2, J4, J6. From measurements of the luminosity it
has been argued [43] that the temperature profile should be adiabatic. For a given EOS, the interior profiles of
pressure P and density ρ are calculated by integration of the equation of hydrostatic equilibrium
ρ(r, θ)
∇~rP (r, θ) = ∇~r
ρ(r, θ)
|~r − ~r′|
ω2r2 sin θ2
along an isentrope defined by the outer boundary. The first term on the right hand side of eq. (6) is the gravitational
potential and the second term the centrifugal potential assuming axialsymmetric rotation. We apply the theory
of figures [44] up to third order to solve this equation and to calculate the gravitational moments. They are
defined as the coefficients of the expansion of the gravitational potential into Legendre polynomials, taken at the
outer boundary. Being integrals of the density distribution weighted by some power of the radius, they are very
sensitive with respect to the amount and distribution of helium and heavier elements within the planet.
In accordance with previous calculations [45, 46], mixtures of hydrogen with helium and heavier elements
have been derived from the EOS of the pure materials via the additive volume rule. It states that the entropy of
mixing can be neglected.
6 EOS for dense hydrogen: EOS for dense hydrogen
Assuming a three-layer structure, we present results for Jupiter for the profiles of temperature, density, and
pressure along the radius in Fig. 6 using two different H-EOS, the standard Sesame table 5251 for hydrogen [47]
and the FVT+ model presented above.
 0.0001
 0.001
 0.01
 1000
 10000
 100000
 0  2  4  6  8  10  12
radius [RE]
profiles inside Jupiter
density
pressure
temperature
Sesame
Fig. 6 Profiles of temperature, density, pressure along the radius within Jupiter using two different H-EOS, FVT+ (solid)
and Sesame 5251 (dashed).
The profiles of temperature appear very similar, meaning a small uncertainty about the real profiles. Contrary,
the density and pressure profiles exhibit more differences and require some explanation. In the fluid part of
Jupiter, the presence of a PPT leads to a jump in density between the envelopes. Since the gravitational moments
as integrals over the density have to be the same for both H-EOS, the density profile of a H-EOS with PPT has to
be smaller in the outer envelope and larger in the inner envelope. The different size and composition of the core
for these specific H-EOS are a consequence of their different compressibility in the regime of pressure ionization
at about 1 Mbar, where the gravitational moments are most sensitive to the density distribution.
In case of a stiff H-EOS like Sesame, a larger amount of heavy elements is needed in the two fluid envelopes
to compensate for the smaller hydrogen density at a given pressure. As a result, this material is added to the
well-known density-pressure relation of degenerate electrons in the deep interior, leaving less material for the
core. Thus, in case of the Sesame-EOS, the amount of heavy elements becomes with 10% very large and an
unlikely solution with a very small core of light material (e.g. water) can be found.
In case of the FVT+ EOS which is more compressible than the Sesame EOS at about 1 Mbar, the helium
content is below the value of 27.5% for the protosolar cloud in order to reproduce the lowest gravitational moment
J2. Furthermore, the next gravitational moment J4 cannot be reproduced correctly because the transition to the
metallic envelope occurs already at about 90% of the radius and, thus, at too low densities. For opposite reasons,
both the Sesame and FVT+ EOS applied in a three-layer model of Jupiter are not compatible with all of the
observational constraints. While Sesame is probably too stiff, the FVT+ model is likely too soft in the WDM
region at about 1 Mbar.
5 Conclusions
In this paper, we have extended the earlier chemical model FVT+
to calculate the EOS of dense hydrogen.
The current model FVT+ includes nonideality corrections to the free energy of each commponent of the partially
ionized plasma. We have shown results for the composition and the thermodynamic properties of dense hydrogen.
The PPT was located in the phase diagram, its critical point coincides with earlier results. Furthermore, we
have determined optical properties such as reflectivity and conductivity, within linear response theory using the
program package COMPTRA04. The calculated reflectivity along the experimental Hugoniot curve shows a
good agreement with the experiments. However, application of the FVT+ EOS to the interior structure of Jupiter
indicates that the behavior at about 1 Mbar is probably too soft. The same conclusion can be drawn from a
cpp header will be provided by the publisher 7
comparison with shock-wave experiments that indicate the existence of a PPT [12]. FVT+ predicts the PPT at
too low pressures as well as at too low densities. Further efforts to solve this problem, especially concerning the
reduced volume concept, are necessary.
Acknowledgements We thank P. M. Celliers, W. Ebeling, V. E. Fortov, V. K. Gryaznov, W.-D. Kraeft, and G. Röpke for
stimulating discussions. This work was supported by the DFG within the SFB 652 Strongly Correlated Matter in Radiation
Fields and the GRK 567 Strongly Correlated Many Particle Systems.
References
[1] W. J. Nellis, Rep. Prog. Phys. 69, 1479 (2006).
[2] S. T. Weir, A. C. Mitchell, W. J. Nellis, Phys. Rev. Lett. 76, 1860 (1996).
[3] P. M. Celliers et al., Phys. Rev. Lett. 84, 5564 (2000).
[4] D. Saumon and G. Chabrier, Phys. Rev. A 44, 5122 (1991).
[5] D. Saumon and G. Chabrier, Phys. Rev. A 46, 2084 (1992).
[6] W. Ebeling and G. Norman, J. Stat. Phys. 110 861 (2003).
[7] W. Ebeling, H. Hache, H. Juranek, R. Redmer, and G. Röpke, Contrib. Plasma Phys. 45 160 (2005).
[8] V. S. Filinov et al., J. Phys. A: Math. Gen. 39, 4421 (2006).
[9] D. Saumon, W. B. Hubbard, G. Chabrier, and H. M. Van Horn, Astrophys. J. 391 827 (1992).
[10] G. Chabrier, D. Saumon, W. B. Hubbard, and J. I. Lunine, Astrophys. J. 391 817 (1992).
[11] D. J. Stevenson, J. Phys.: Condens. Matt. 10 11227 (1998).
[12] V. E. Fortov et al., unpublished.
[13] N. A. Tahir, H. Juranek, A. Shutov, R. Redmer, A. R. Piriz, M. Temporal, D. Varentsov, S. Udrea, D. H. H. Hoffmann,
C. Deutsch, I. Lomonosov, V. E. Fortov, Phys. Rev. B 67, 184101 (2003).
[14] N. A. Tahir et al., Contrib. Plasma Phys. (this issue).
[15] H. Juranek, N. Nettelmann, S. Kuhlbrodt, V. Schwarz, B. Holst, and R. Redmer, Contrib. Plasma Phys. 45, 432 (2005).
[16] R. Redmer, B. Holst, H. Juranek, N. Nettelmann, and V. Schwarz, J. Phys. A: Math. Gen. 39, 4479 (2006).
[17] R. Redmer and G. Röpke, Physica A 130, 523 (1985).
[18] M. Ross, F. H. Ree, and D. A. Young, J. Chem. Phys. 79, 1487 (1983).
[19] H. Juranek and R. Redmer, J. Chem. Phys. 112, 3780 (2000).
[20] H. Juranek, R. Redmer, and Y. Rosenfeld, J. Chem. Phys. 117, 1768 (2002).
[21] Qi-Feng Chen et al., J. Chem. Phys. 124, 074510 (2006).
[22] V. Schwarz, H. Juranek, R. Redmer, Phys. Chem. Chem. Phys. 7, 1990 (2005).
[23] G. Chabrier and A. Y. Potekhin, Phys. Rev. E 58, 4941 (1998).
[24] A. G. McLellan and B. J. Alder, J. Chem. Phys. 24, 115 (1956).
[25] T. Kahlbaum and A. Förster, Fluid Phase Equilibria 76, 71 (1992).
[26] H. C. Graboske, Jr., D. J. Harwood, and F. J. Rogers, Phys. Rev. 186, 210 (1969).
[27] M. Robnik and W. Kundt, Astron. Astrophys 120, 227 (1983).
[28] M. S. Marley and W. B. Hubbard, Icarus 88, 536 (1988).
[29] W. Ebeling and W. Richert, phys. stat. sol. (b) 128, 467 (1985); Phys. Lett. A 108, 80 (1985); Contrib. Plasma Phys. 25,
1 (1985).
[30] M. Schlanges, M. Bonitz, and A. Tschttschjan, Contrib. Plasma Phys. 35, 109 (1995).
[31] H. Reinholz, R. Redmer, and S. Nagel, Phys. Rev. E 52, 5368 (1995).
[32] D. Beule, W. Ebeling, A. Förster, H. Juranek, S. Nagel, R. Redmer, and G. Röpke, Phys. Rev. B 59, 14 177 (1999).
[33] W. R. Magro, D. M. Ceperley, C. Pierleoni, and B. Bernu, Phys. Rev. Lett. 76, 1240 (1996).
[34] W. Ebeling and R. Sändig, Annalen der Physik 28, 289 (1973).
[35] P. Haronska, D. Kremp, and M. Schlanges, Wiss. Zeit. Univ. Rostock 36, 98 (1987).
[36] H. Kitamura and S. Ichimaru, J. Phys. Soc. Jap. 67, 950 (1998).
[37] Ya. B. Zeldovich and L. D. Landau, Zh. Eksp. Teor. Fiz. 14, 32 (1944).
[38] G. E. Norman and A. N. Starostin, High Temp. 6, 394 (1968); ibid. 8, 381 (1970).
[39] S. Kuhlbrodt, B. Holst, R. Redmer, Contrib. Plasma Phys. 45, 73 (2005).
[40] The COMPTRA04 source code and data files can be found at http://www.mpg.uni-rostock.de/sp/pages/comptra.
[41] H. Reinholz et al., Phys. Rev. E 68, 036403 (2003).
[42] R. Redmer, H. Juranek, N. Nettelmann, and B. Holst, AIP Conf. Proc. 845, 127 (2006).
[43] W. B. Hubbard, Astrophys. J. 152, 745 (1968).
[44] V. N. Zharkov and V. P. Trubytsin, Physics of planetary Interiors, in: Astronomy and Astrophysics Series (Pachart,
Tucson/AZ, 1978)
[45] G. Chabrier, D. Saumon, W. B. Hubbard, and J. I. Lunine, Astrophys. J. 391, 817 (1992).
[46] D. Saumon and T. Guillot, Astrophys. J. 609, 1170 (2004).
[47] Sesame table 5251 (1982), derived from Sesame table 5263, G. Kerley, Report LA-4776 (1972).
http://www.mpg.uni-rostock.de/sp/pages/comptra
	Introduction
	Equation of state for dense hydrogen
	Conductivity and reflectivity
	Planetary interiors
	Conclusions
	References
ABSTRACT
  We calculate the equation of state of dense hydrogen within the chemical
picture. Fluid variational theory is generalized for a multi-component system
of molecules, atoms, electrons, and protons. Chemical equilibrium is supposed
for the reactions dissociation and ionization. We identify the region of
thermodynamic instability which is related to the plasma phase transition. The
reflectivity is calculated along the Hugoniot curve and compared with
experimental results. The equation-of-state data is used to calculate the
pressure and temperature profiles for the interior of Jupiter.

<|endoftext|><|startoftext|>
Experimental nonclassicality of single-photon-added thermal light states
Alessandro Zavatta,1, 2, ∗ Valentina Parigi,2,3 and Marco Bellini1, 3, †
1Istituto Nazionale di Ottica Applicata (CNR), L.go E. Fermi, 6, I-50125, Florence, Italy
2Department of Physics, University of Florence, I-50019 Sesto Fiorentino, Florence, Italy
3LENS, Via Nello Carrara 1, 50019 Sesto Fiorentino, Florence, Italy
(Dated: October 29, 2018)
We report the experimental realization and tomographic analysis of novel quantum light states
obtained by exciting a classical thermal field by a single photon. Such states, although completely
incoherent, possess a tunable degree of quantumness which is here exploited to put to a stringent
experimental test some of the criteria proposed for the proof and the measurement of state non-
classicality. The quantum character of the states is also given in quantum information terms by
evaluating the amount of entanglement that they can produce.
PACS numbers: 42.50.Dv, 03.65.Wj
INTRODUCTION
The definition and the measurement of the nonclassi-
cality of a quantum light state is a hot and widely dis-
cussed topic in the physics community; nonclassical light
is the starting point for generating even more nonclas-
sical states [1, 2] or producing the entanglement which
is essential to implement quantum information protocols
with continuous variables [3, 4]. A quantum state is said
to be nonclassical when it cannot be written as a mixture
of coherent states. In terms of the Glauber-Sudarshan
P representation [5, 6], the P function of a nonclassi-
cal state is highly singular or not positive, i.e. it cannot
be interpreted as a classical probability distribution. In
general however, since the P function can be badly be-
haved, it cannot be connected to any observable quan-
tity. In recent years, a nonclassicality criterion based on
the measurable quadrature distributions obtained from
homodyne detection has been proposed by Richter and
Vogel [7]. Moreover, a variety of nonclassical states has
recently been characterized by means of the negative-
ness of their Wigner function [8, 9, 10, 11], this however
being just a sufficient and not necessary condition for
nonclassicality [12]. It is still an open question which
is the universal way to experimentally characterize the
nonclassicality of a quantum state.
A conceptually simple way to generate a quantum light
state with a varying degree of nonclassicality consists in
adding a single photon to any completely classical one.
This is quite different from photon subtraction which, on
the other hand, produces a nonclassical state only when
starting from an already nonclassical one [13, 14].
In this Letter we report the generation and the analy-
sis of single-photon-added thermal states (SPATSs), i.e.,
completely classical states excited by a single photon,
∗Electronic address: azavatta@inoa.it
†Electronic address: bellini@inoa.it
first described by Agarwal and Tara in 1992 [15]. We
use the techniques of conditioned parametric amplifica-
tion recently demonstrated by our group [10, 11] to gen-
erate such states, and we employ ultrafast pulsed ho-
modyne detection and quantum tomography to investi-
gate their character. The peculiar nonclassical behavior
of SPATSs has recently triggered an interesting debate
[7, 16] and has been described in several theoretical pa-
pers [14, 15, 16, 17, 18]; their experimental generation
has already been proposed, although with more complex
schemes [14, 18, 19], but never realized. Thanks to their
adjustable degree of quantumness, these states are an
ideal benchmark to test the different experimental crite-
ria of nonclassicality recently proposed, and to investi-
gate the possibility of multi-photon entanglement gener-
ation. The nonclassicality of SPATSs is here analyzed by
reconstructing their negative-valued Wigner functions,
by using the quadrature-based Richter-Vogel (RV) crite-
rion, and finally comparing these with two other methods
based on quantum tomography. In particular, we show
that the so-called entanglement potential [20] is a sensi-
tive measurement of nonclassicality, and that it provides
quantitative data about the possible use of the states for
quantum information applications in terms of the entan-
glement that they would generate once sent to a 50-50
beam-splitter.
EXPERIMENTAL
The main source of our apparatus is a mode-locked
Ti:Sa laser which emits 1.5 ps pulses with a repetition
rate of 82 MHz. The pulse train is frequency-doubled to
393 nm by second harmonic generation in a LBO crystal.
The spatially-cleaned UV beam then serves as a pump for
a type-I BBO crystal which generates spontaneous para-
metric down-conversion (SPDC) at the same wavelength
of the laser source. Pairs of SPDC photons are emitted
in two distinct spatial channels called signal and idler.
Along the idler channel the photons are strongly filtered
http://arxiv.org/abs/0704.0179v1
mailto:azavatta@inoa.it
mailto:bellini@inoa.it
in the spectral and spatial domain by means of etalon
cavities and by a single-mode fiber which is directly con-
nected to a single-photon-counting module (further de-
tails are given in [9, 11]). The signal field is mixed with
a strong local oscillator (LO, an attenuated portion of
the main laser source) by means of a 50% beam-splitter
(BS). The BS outputs are detected by two photodiodes
connected to a wide-bandwidth amplifier which provides
the difference (homodyne) signal between the two pho-
tocurrents on a pulse-to-pulse basis [21]. Whenever a
single photon is detected in the idler channel, an homo-
dyne measurement is performed on the correlated spatio-
temporal mode of the signal channel by storing the corre-
sponding electrical signal (proportional to the quadrature
operator value) on a digital scope.
FIG. 1: (color online) Experimental setup. HR (HT) is a high
reflectivity (transmittivity) beam splitter; SPCM is a single-
photon-counting module; all other symbols are defined in the
text. The mode-cleaning fiber used to inject the thermal state
coming from the rotating ground glass disk (RD) into the
parametric crystal is not shown here for clarity.
When no field is injected in the SPDC crystal, con-
ditioned single-photon Fock states are generated from
spontaneous emission in the signal channel [8, 9]. We
have recently shown that, if the SPDC crystal is injected
with a coherent state, stimulated emission comes into
play and single-photon excitation of such a pure state is
obtained [10, 11]. However, a coherent state is still at the
border between the quantum and the classical regimes;
it is therefore extremely interesting to use a truly clas-
sical state, like the thermal one, as the input, and to
observe its degaussification [13]. In order to avoid the
technical problems connected to the handling of a true
high-temperature thermal source, we use pseudo-thermal
one, obtained by inserting a rotating ground glass disk
(RD) in a portion of the laser beam (see Fig.1). By cou-
pling a fraction (much smaller than the typical speckle
size) of the randomly scattered light into a single-mode
fiber, at the output we obtain a clean spatial mode with
random amplitude and phase yielding the photon distri-
bution typical of a thermal source [22] which is then used
to inject the parametric amplifier.
PROPERTIES OF SPATSS
In order to describe the state generated in our exper-
iment, we give a general treatment of photon addition
based on conditioned parametric amplification. By first-
order perturbation theory, the output of the parametric
amplifier when a pure state |ϕm〉 is injected along the
signal channel is given by
|ψm〉 = [1 + (gâ†sâ
i − g
∗âsâi)] |ϕm〉s |0〉i , (1)
where g accounts for the coupling and the amplitude of
the pump and â, â† are the usual noncommuting annihi-
lation and creation operators. For a generic signal input,
the output state of the parametric amplifier can be writ-
ten as
ρ̂out =
Pm |ψm〉 〈ψm| (2)
where the input mixed state is ρ̂s =
Pm |ϕm〉 〈ϕm| and
Pm is the probability for the state |ϕm〉. If we condition
the preparation of the signal state to single-photon de-
tection on the idler channel, we obtain the prepared state
ρ̂ = Tri(ρ̂out |1〉i 〈1|i) = |g|
2â†sρ̂sâs. (3)
When the input state ρ̂s is a thermal state with mean
photon number n̄, we obtain that the single-photon-
added thermal state is described by the following density
operator expressed in the Fock base:
n̄(n̄+ 1)
1 + n̄
n |n〉 〈n|. (4)
The lack of the vacuum term and the rescaling of higher
excited terms is evident in this expression. The P phase-
space representation can be easily calculated and is given
by (see also [15])
P (α) =
[(1 + n̄)|α|2 − n̄]e−|α|
2/n̄, (5)
while the corresponding Wigner function reads as
W (α) =
|2α|2(1 + n̄)− (1 + 2n̄)
(1 + 2n̄)3
e−2|α|
2/(1+2n̄) (6)
where α = x + iy. SPATSs have a well-behaved P func-
tion which is always negative around α = 0; this feature
is also present in the Wigner function and assures their
nonclassicality, however both P (0) andW (0) tend to zero
in the limit of n̄→ ∞.
DATA ANALYSIS AND DISCUSSION
After the acquisition of about 105 quadrature values
with random phases, we have performed the reconstruc-
tion of the diagonal density matrix elements using the
maximum likelihood estimation [23]. This method gives
the density matrix that most likely represents the mea-
sured homodyne data. Firstly, we build the likelihood
function contracted for a density matrix truncated to 25
diagonal elements (with the constraints of Hermiticity,
positivity and normalization), then the function is max-
imized by an iterative procedure [24, 25] and the errors
on the reconstructed density matrix elements are evalu-
ated using the Fisher information [25]. The results are
shown in Fig. 2, together with the corresponding recon-
structed [11] Wigner functions for two different temper-
FIG. 2: (color online) Experimentally reconstructed diagonal
density matrix elements (reconstruction errors of statistical
origin are of the order of 1%) and Wigner functions for ther-
mal states (left) and SPATSs (right): a) n̄ = 0.08; b) n̄ = 1.15.
Filled circles indicate the density matrix elements calculated
for thermal states and SPATSs with the expected efficiencies.
atures of the injected thermal state. Since in the low-
gain regime the count rate in the idler channel is given
by 〈n̂〉 = Tr(ρ̂outâ†i âi) = |g|2(1 + n̄), the mean photon
number values n̄ reported in Fig. 2 and in the following
are obtained from the ratio between the trigger count
rates when the thermal injection is present and when it
is blocked (see Ref. [11] and references therein).
The finite experimental efficiency in the preparation
and homodyne detection of SPATSs is fully accounted for
by a loss mechanism which can be modeled by the trans-
mission of the ideal state ρ̂ of Eq.(4) through a beam
splitter of trasmittivity η coupling vacuum into the de-
tection mode, such that the detected state ρ̂η is finally
found as:
ρ̂η = TrR{Uη(ρ̂ |0〉 〈0|)U †η} (7)
where Uη is the beam splitter operator acting on two in-
put modes containing the state ρ̂ and the vacuum, and
the states of the reflected mode (indicated by R) are
traced out. In the case of finite efficiency the expression
for the Wigner function thus results:
Wη(α) =
1 + 2η[n̄+ 2(1 + n̄)|α|2 − 2n̄η − 1]
(1 + 2n̄η)3
−2|α|2
1+2n̄η .
It should be noted that the value of experimental ef-
ficiency which best fits the data is the same (η = 0.62)
as that obtained for single-photon Fock states (i.e., with-
out injection), and implies that only a portion of vac-
uum due to losses enters the mode during the generation
of SPATS. Thanks to a very low rate of dark counts in
the trigger detector, the portion of the injected thermal
state which survives the conditional preparation proce-
dure and contributes to degradation of the SPATSs is in
fact completely negligible. However, since the nonclassi-
cal features of the state get weaker for large n̄, a limited
efficiency (η < 1) has the effect of progressively hiding
them among unwanted vacuum components.
Indeed, the measured negativity of the Wigner func-
tion at the origin (see Fig.3a and b) rapidly gets smaller
as the mean photon number of the input thermal state is
increased. With the current level of efficiency and recon-
struction accuracy we are able to prove the nonclassical-
ity of all the generated states (up to n̄ = 1.15), but one
may expect to experimentally detect negativity above the
reconstruction noise, and thus prove state nonclassical-
ity, up to about n̄ ≈ 1.5 (also see Fig.6a). It should
be noted that, even for a single-photon Fock state, the
Wigner function loses its negativity for efficiencies lower
than 50%, so that surpassing this experimental threshold
is an essential requisite in order to use this nonclassicality
criterion.
After having experimentally proved the nonclassical-
ity of the states for all the investigated values of n̄, it
is interesting to verify the nonclassical character of the
measured SPATSs also using different criteria.
The first one has been recently proposed by Richter
and Vogel [7] and is based on the characteristic func-
tion G(k, θ) = 〈eikx̂(θ)〉 of the quadratures (i.e., the
Fourier transform of the quadrature distribution), where
x̂(θ) = (âe−iθ + â†eiθ)/2 is the phase-dependent quadra-
ture operator. At the first-order, the criterion defines
a phase-independent state as nonclassical if there is a
value of k such that |G(k, θ)| ≡ |G(k)| > Ggr(k), where
-2 -1 0 1 2
0.0 0.4 0.8 1.2
-0.16
-0.12
-0.08
-0.04
 0.08
 0.34
 0.70
 1.15
Classical limit
FIG. 3: (color online) a) Sections of the experimentally re-
constructed Wigner functions for SPATSs with different n̄; b)
Experimental values for the minimum of the Wigner func-
tion W (0) as a function of n̄ for SPATSs (solid squares)
and for single-photon Fock states (empty circles) obtained
by blocking the injection; the values calculated from Eq.(8)
for η = 0.62 (solid curves) are in very good agreement with
experimental data and clearly show the appropriateness of
the model. Negativity of the Wigner function is a sufficient
condition for affirming the nonclassical character of the state.
Ggr(k) is the characteristic function for the vacuum mea-
sured when the signal beam is blocked before homodyne
detection. In other words, the evidence of structures nar-
rower than those associated to vacuum in the quadrature
distribution is a sufficient condition to define a nonclas-
sical state [12]. However, it has been shown that non-
classical states exist (as pointed out by Diósi [16] for a
vacuum-lacking thermal state [17], which is very similar
to SPATSs) which fail to fulfil such inequality; when this
happens, the first-order Richter-Vogel (RV) criterion has
to be extended to higher orders: the second-order RV
inequality reads as
2G2(k/2)Ggr(k/
2)−G(k) > Ggr(k). (9)
It is evident that, as higher orders are investigated,
the increasing sensitivity to experimental and statistical
noise may soon become unmanageable.
The measured |G(k)| and left hand side of Eq. (9) are
plotted in Fig. 4a) and b), together with the Ggr(k) char-
acteristic function, also obtained from the experimental
quadrature distribution of vacuum. While the detected
0 2 4 6 8
0 2 4 6 8
1.0a)
 Ggr(k)
 0.53
 0.70
 0.90
 1.15
4 6 8
 Ggr(k)
 0.08
 0.34
FIG. 4: (color online) Experimental characteristic functions
involved in the RV nonclassicality criterion for the detected
SPATSs: a) first order; b) second order (the inset shows a
magnified view of the region where the state with n̄ = 0.53 is
just slightly fulfilling the criterion).
SPATSs satisfy the nonclassical first-order RV criterion
only for the two lowest values of n̄, it is necessary to
extend the criterion to the second order to just barely
show nonclassicality at large values of k for n̄ = 0.53 (see
the inset of Fig.4b, where the shaded region indicates the
error area of the experimental Ggr(k)).
At higher temperatures, no sign of nonclassical be-
havior is experimentally evident with this approach, al-
though the Wigner function of the corresponding states
still clearly exhibits a measurable negativity (see Fig.3).
It should be noted that the second-order RV criterion for
the ideal state of Eq. (4) is expected to prove the nonclas-
sicality of SPATSs up to n̄ ≈ 0.6 [7]; however, when the
limited experimental efficiency and the statistical noise
is taken into account, it will start to fail even earlier.
The tomographic reconstruction of the state that was
earlier used for the nonclassicality test based on the neg-
ativity of the Wigner function, can also be exploited to
test alternative criteria: for example by reconstructing
the photon-number distribution ρn = 〈n| ρ̂meas |n〉 and
then looking for strong modulations in neighboring pho-
ton probabilities by the following relationship [26, 27]
B(n) ≡ (n+ 2)ρnρn+2 − (n+ 1)ρ2n+1 < 0, (10)
introduced by Klyshko in 1996, which is known to hold
for nonclassical states. In the ideal situation of unit ef-
ficiency SPATSs should always give B(0) < 0 due to
the absence of the vacuum term ρ0, in agreement with
Ref. [17]. The experimental results obtained for B(0)
by using the reconstructed density matrix ρ̂meas are pre-
sented in Fig.5a) together with those calculated for the
state described by ρ̂η (see Eq.(7)) with η = 0.62. The
agreement between the experimental data and the ex-
pected ones is again very satisfactory, showing that our
model state ρ̂η well represents the experimental one. Our
current efficiency should in principle allow us to find neg-
ative values of B(0) even for much larger values of n̄;
however, if one takes the current reconstruction errors
due to statistical noise into account, the maximum n̄ for
which the corresponding SPATS can be safely declared
nonclassical is of the order of 2. It should be noted that,
differently from the Wigner function approach, here the
nonclassicality can be proved even for experimental effi-
ciencies much lower than 50%, as far as the mean photon
number of the thermal state is not too high (see Fig.6b).
Finally, it is particularly interesting to measure the en-
tanglement potential (EP) of our states as recently pro-
posed by Asboth et al. [20]. This measurement is based
on the fact that, when a nonclassical state is mixed with
vacuum on a 50-50 beam splitter, some amount of entan-
glement (depending on the nonclassicality of the input
state) appears between the BS outputs. No entangle-
ment can be produced by a classical initial state. For
a given single-mode density operator ρ̂, one calculates
the entanglement of the bipartite state at the BS out-
puts ρ̂′ = UBS(ρ̂|0〉〈0|)U †BS by means of the logarithmic
negativity EN (ρ̂
′) based on the Peres separability cri-
terion and defined in [28], where UBS is the 50-50 BS
transformation. The computed entanglement potentials
for the reconstructed SPATS density matrices ρ̂meas are
shown in Fig. 5b) together with those expected at the
experimentally-evaluated efficiency (i.e., obtained from
ρ̂η with η = 0.62). The EP is definitely greater than zero
(by more than 13σ) for all the detected states, thus con-
firming that they are indeed nonclassical, in agreement
with the findings obtained by the measurement of B(0)
and W (0). As a comparison, the EP would be equal to
unity for a pure single-photon Fock state, while it would
reduce to 0.43 for a single-photon state mixed with vac-
uum ρ̂ = (1− η) |0〉 〈0|+ η |1〉 〈1| with η = 0.62.
To summarize, the three tomographic approaches to
test nonclassicality have all been able to experimentally
prove it for all the generated states (i.e., SPATSs with
0.0 0.4 0.8 1.2
0.0 0.4 0.8 1.2
Classical limit
Classical limit
FIG. 5: (color online) a) Experimental data (squares) and
calculated values (solid curve) of B(0) as a function of n̄;
negative values indicate nonclassicality of the state. b) The
same as above for the entanglement potential (EP) of the
SPATSs; here nonclassicality is demonstrated by EP values
greater than zero.
an average number of photons in the seed thermal state
up to n̄ = 1.15) for a global experimental efficiency of
η = 0.62. In order to gain a better view of the range of
values for n̄ and for the global experimental efficiency η
which allow to prove the nonclassical character of single-
photon-added thermal states under realistic experimen-
tal conditions, we have calculated the indicators W (0),
B(0), and EP from the model state described by ρ̂η. The
results are shown in Fig.6: the contour plots define the
regions of parameters where the detected state is classi-
cal (white areas), where it would result nonclassical if the
reconstruction errors coming from statistical noise could
be neglected (grey areas) and, finally, where it is defi-
nitely nonclassical even with the current level of noise
(black areas). From such plots it is evident that, as al-
ready noted, the Wigner function negativity only works
for sufficiently high efficiencies, while both B(0) and EP
are able to detect nonclassical behavior even for η < 50%.
In particular, the entanglement potential is clearly seen
to be the most powerful criterion, at least for these par-
ticular states, and to allow for an experimental proof of
a) b)
W(0) EPB(0)
FIG. 6: Calculated regions of nonclassical behavior of SPATSs as a function of n̄ and η according to: a) the negativity
of the Wigner function at the origin W (0); b) the Klyshko criterion B(0); c) the entanglement potential EP. White areas
indicate classical behavior; grey areas indicate where a potentially nonclassical character is not measurable due to experimental
reconstruction noise (estimated as the average error on the experimentally reconstructed parameters); black areas indicate
regions where the nonclassical character is measurable given the current statistical uncertainties.
nonclassicality for all combinations of n̄ and η, as long as
reconstruction errors can be neglected. Also considering
the current experimental parameters, EP should show
the quantum character of SPATSs even for n̄ > 3, thus
demonstrating its higher immunity to noise.
Although at a different degree, all three indicators are
however very sensitive to the presence of reconstruction
noise of statistical origin which may completely mask the
nonclassical character of the states, even for relatively low
values of n̄ or for low efficiencies. In order to unambigu-
ously prove the quantum character of higher-temperature
SPATSs in these circumstances the only possibility is to
reduce the “grey zone” by significantly increasing the
number of quadrature measurements.
CONCLUSIONS
In conclusion, we have generated a completely incoher-
ent light state possessing an adjustable degree of quan-
tumness which has been used to experimentally test and
compare different criteria of nonclassicality. Although
the direct analysis of quadrature distributions, done fol-
lowing the criterion proposed by Richter and Vogel, has
been able to show the nonclassical character of some of
the states with lower mean photon numbers, quantum to-
mography, with the reconstruction of the density matrix
and the Wigner function from the homodyne data, has
allowed us to unambiguously show the nonclassical char-
acter of all the generated states: three different criteria,
the negativity of the Wigner function, the Klyshko crite-
rion and the entanglement potential, have been used with
varying degree of effectiveness in revealing nonclassical-
ity. Besides being a useful tool for the measurement of
nonclassicality through the definition of the entanglement
potential, the combination of nonclassical field states -
such as those generated here - with a beam-splitter, can
be viewed as a simple entangling device generating multi-
photon states with varying degree of purity and entangle-
ment and allowing the future investigation of continuous-
variable mixed entangled states [29].
ACKNOWLEDGMENTS
The authors gratefully acknowledge Koji Usami for
giving the initial stimulus for this work and Milena
D’Angelo and Girish Agarwal for useful discussions and
comments. This work was partially supported by Ente
Cassa di Risparmio di Firenze and MIUR, under the
PRIN initiative and FIRB contract RBNE01KZ94.
[1] A. P. Lund, H. Jeong, T. C. Ralph, and M. S. Kim, Phys.
Rev. A 70, 020101(R) (2004).
[2] H. Jeong, A. P. Lund, and T. C. Ralph, Phys. Rev. A
72, 013801 (2005).
[3] M. S. Kim, W. Son, V. Bužek, and P. L. Knight, Phys.
Rev. A 65, 032323 (2002).
[4] S. L. Braunstein and P. van Loock, Rev. Mod. Phys. 77,
513 (2005).
[5] R. J. Glauber, Phys. Rev. 131, 2766 (1963).
[6] E. C. G. Sudarshan, Phys. Rev. Lett. 10, 277 (1963).
[7] W. Vogel, Phys. Rev. Lett. 84, 1849 (2000).
[8] A. I. Lvovsky, H. Hansen, T. Aichele, O. Benson,
J. Mlynek, and S. Schiller, Phys. Rev. Lett. 87, 050402
(2001).
[9] A. Zavatta, S. Viciani, and M. Bellini, Phys. Rev. A 70,
053821 (2004).
[10] A. Zavatta, S. Viciani, and M. Bellini, Science 306, 660
(2004).
[11] A. Zavatta, S. Viciani, and M. Bellini, Phys. Rev. A 72,
023820 (2005).
[12] A. I. Lvovsky and J. H. Shapiro, Phys. Rev. A 65, 033830
(2002).
[13] J. Wenger, R. Tualle-Brouri, and P. Grangier, Phys. Rev.
Lett. 92, 153601 (2004).
[14] M. S. Kim, E. Park, P. L. Knight, and H. Jeong, Phys.
Rev. A 71, 043805 (2005).
[15] G. S. Agarwal and K. Tara, Phys. Rev. A 46, 485 (1992).
[16] L. Diósi, Phys. Rev. Lett. 85, 2841 (2000).
[17] C. T. Lee, Phys. Rev. A 52, 3374 (1995).
[18] G. N. Jones, J. Haight, and C. T. Lee, Quantum Semi-
class. Opt. 9, 411 (1997).
[19] M. Dakna, L. Knöll, and D.-G. Welsch, Eur. Phys. J. D
3, 295 (1998).
[20] J. K. Asboth, J. Calsamiglia, and H. Ritsch, Phys. Rev.
Lett. 94, 173602 (2005).
[21] A. Zavatta, M. Bellini, P. L. Ramazza, F. Marin, and
F. T. Arecchi, J. Opt. Soc. Am. B 19, 1189 (2002).
[22] F. T. Arecchi, Phys. Rev. Lett. 15, 912 (1965).
[23] K. Banaszek, G. M. D’Ariano, M. G. A. Paris, and M. F.
Sacchi, Phys. Rev. A 61, 010304 (1999).
[24] A. I. Lvovsky, J. Opt. B: Quantum Semiclass. Opt. 6,
556 (2004).
[25] Z. Hradil, D. Mogilevtsev, and J. Rehacek, Phys. Rev.
Lett. 96, 230401 (2006).
[26] D. N. Klyshko, Phys. Lett. A 231, 7 (1996).
[27] G. M. D’Ariano, M. F. Sacchi, and P. Kumar, Phys. Rev.
A 59, 826 (1999).
[28] G. Vidal and R. F. Werner, Phys. Rev. A 65, 032314
(2002).
[29] M. Horodecki, P. Horodecki, and R. Horodecki, Phys.
Rev. Lett. 80, 5239 (1998).
ABSTRACT
  We report the experimental realization and tomographic analysis of novel
quantum light states obtained by exciting a classical thermal field by a single
photon. Such states, although completely incoherent, possess a tunable degree
of quantumness which is here exploited to put to a stringent experimental test
some of the criteria proposed for the proof and the measurement of state
non-classicality. The quantum character of the states is also given in quantum
information terms by evaluating the amount of entanglement that they can
produce.

<|endoftext|><|startoftext|>
Introduction
Accurate experimental data on the neutron skin in neutron rich nuclei would allow to further constrain
model parameters involved in the calculations of the nuclear symmetry energy [1]. The latter plays
a central role in a variety of nuclear phenomena. The value a4 ≈ 30 MeV of the nuclear symmetry
energy S(ρ0) = a4+
(ρ− ρ0)+ . . . at nuclear saturation density ρ0 ≈ 0.17 fm
−3 seems reasonably well
established. On the other hand, the density dependence of the symmetry energy can vary substantially
with the many-body approximations employed.
Several authors have pointed out [2, 3] a strong correlation between the neutron skin, ∆R =
〈r2〉n−
〈r2〉p = Rn−Rp, and the symmetry energy of neutron matter near saturation density. In the framework
of a mean field approach Furnstahl [3] demonstrated that in heavy nuclei there exists an almost linear
empirical correlation between theoretical predictions in terms of various mean field approaches to S(ρ)
(i.e., a bulk property) and the neutron skin, ∆R (a property of finite nuclei).
This observation has contributed to a renewed interest in an accurate determination of the neutron
skin in neutron rich nuclei. Besides, a precise value of the neutron skin is required as an input in several
processes of physical interest, e.g. the analysis of energy shifts in deeply bound pionic atoms [4], and
in the analysis of atomic parity violation experiments (weak charge) [5]. It is worth to stress that to
experimentally determine the skin in heavy nuclei is extremely challenging as ∆R is just about few
percents of the nuclear radius.
The present contribution is partially based upon the results published previously in [6].
2 Relationship between the symmetry energy and ∆R
Brown [2] and Furnstahl [3] have pointed out that within the framework of mean field models there
exists an almost linear empirical correlation between theoretical predictions for both a4 and its density
http://arxiv.org/abs/0704.0180v1
dependence, p0, and the neutron skin ∆R in heavy nuclei. This observation suggests an intriguing
relationship between a bulk property of infinite nuclear matter and a surface property of finite systems.
Here, following the analysis of [6], this question is addressed from a point of view of the Landau-Migdal
approach.
Let us consider a simple mean-field model with the Hamiltonian consisting of the single-particle
mean field part Ĥ0 and the residual particle-hole interaction Ĥp−h:
Ĥ = Ĥ0 + Ĥp−h, Ĥph =
(F ′ +G′~σa~σb)~τa~τbδ(~ra − ~rb), (1)
Ĥ0 =
(Ta + U(xa)), U(x) = U0(x) + U1(x) + UC(x), (2)
U0(x) = U0(r) + Uso(x); U1(x) =
Spot(r)τ
(3); UC(x) =
UC(r)(1− τ
(3)). (3)
Here, U0(x) is the phenomenological isoscalar part of the mean field potential U(x) (x = {~r, ~σ, ~τ}),
U0(r) and Uso(x) are the central and spin-orbit parts, respectively; F
′ and G′ are the phenomenological
Landau-Migdal parameters. The isovector part U1(x) and the Coulomb mean field UC(x) are both
calculated consistently in the Hartree approximation, Spot(r) is the symmetry potential (r-dependent
symmetry energy in finite nuclei).
The model Hamiltonian Ĥ (1) preserves the isospin symmetry within the RPA if a selfconsistency
relation between the symmetry potential and the Landau-Migdal parameter F ′ is fulfilled:
Spot(r) = 2F
′n(−)(r), (4)
where n(−)(r) = nn(r)− np(r) is the neutron excess density. Thus, in this model the depth of the sym-
metry potential is controlled by the Landau-Migdal parameter F ′ (analogous role plays the parameter
g2ρ in relativistic mean field models). Spot(r) is obtained from Eq.(4) by an iterative procedure; the
resulting dependence of ∆R on the dimensionless parameter f ′ = F ′/(300 MeV fm3) shown in fig. 1
indeed illustrates that ∆R depends almost linearly on f ′. Then with the use of the Migdal relation
(1 + 2f ′) [7] relating the symmetry energy and f ′, a similar, almost linear, correlation between
a4 and ∆R is obtained.
To get more insight in the role of f ′ we consider small variations δF ′. Neglecting the varia-
tion of n(−)(r) with respect to δF ′, the corresponding linear variation of the symmetry potential
is δSpot(r) = 2δF
′n(−)(r). Then in the first order perturbation theory, such a variation of Spot
causes the following variation of the ground-state wave function |δ0〉 = δF ′
〈s|N̂(−)|0〉
E0−Es
|s〉, with “s”
labeling the eigenstates of the nuclear Hamiltonian and a single-particle operator N̂ (−) defined as
N̂ (−) =
n(−)(ra)τ
a . Consequently, the variation of the expectation value 〈0|V̂
(−)|0〉 = NR2n − ZR
of another single-particle operator V̂ (−) =
a can be written as
Rpδ(∆R) = δF
Re〈0|N̂ (−)|s〉〈s|V̂ (−)|0〉
E0 −Es
. (5)
In practice the sum in Eq. (5) is exhausted mainly by the isovector monopole resonance (IMR) which
high excitation energy (about 24 MeV in 208Pb) justifies the perturbative consideration. Eq. (5) is able
to reproduce directly calculated δ(∆R) shown in Fig. 1 with the accuracy of about 10%. As a result,
a simple microscopic interpretation of the linear correlation between ∆R and Landau parameter F ′ is
obtained.
0.6 0.8 1 1.2 1.4
Figure 1: Neutron skin in 208Pb versus the Landau-Migdal parameter f ′.
3 Extracting neutron skin from properties of isovector giant
resonances
Parity violating electron scattering off nuclei is probably the least model dependent approach to probe
the neutron distribution [8]. The weak electron-nucleus potential is Ṽ (r) = V (r) + γ5A(r), where
the axial potential A(r) = GF
ρW (r). The weak charge is mainly determined by neutrons ρW (r) =
(1 − 4 sin2 θW )ρp(r)− ρn(r), with sin
2 θW ≈ 0.23. In a scattering experiment using polarized electrons
one can determine the cross section asymmetry [8] which comes from the interference between the A and
V contributions. Using the measured neutron form factor at small finite value of Q2 and the existing
information on the charge distribution one can uniquely extract the neutron skin. Some slight model
dependence comes from the need to assume a certain radial dependence for the neutron density, to
extract Rn from a finite Q
2 form factor.
However, the best claimed accuracy of the experimental determination of neutron radii would be on
the level of 1%, that translates to relatively large uncertainty of 20-30% in the neutron skin. On such
accuracy level, some indirect experimental probes of ∆R still can be competitive.
A variety of experimental approaches have been employed to obtain indirect information on ∆R.
To some extent all the analysis contain a certain model dependence, which in many cases is difficult
to estimate quantitatively. For choosing an indirect probe it is very important to address the question
how sensitive is the proposed physical quantity with respect to a variation of ∆R in a single nucleus.
The higher is the sensitivity, the better is the choice of the correlation for the indirect deducing ∆R
from the measured values.
It is not intended here to give a comprehensive review of the existing methods. In particular, the
results from the analysis of the antiprotonic atoms, elastic proton and neutron scattering reactions, and
the pygmy dipole resonance are completely left out. Here, special emphasis will be put on proposals to
provide accurate information on the neutron skin from properties of isovector giant resonances.
3.1 Spin-dipole Giant Resonance
In [9] it has been proposed to utilize the excitation probability of the spin-dipole resonance in charge
exchange reactions for determining the neutron skin. The method has been applied to obtain information
on the variation of the neutron skin in the Sn isotopes [9]. For the relevant operator,
a [~σa ⊗
~ra]JM , (J = 0, 1, 2) the summed ∆L = 1 strength is
S(−) − S(+) = C(NR2n − ZR
p). (6)
Here S(−) and S(+) are the spin-dipole total strengths in β(−) and β(+) channels, respectively; C is the
factor depending on the normalization of the spin-dipole operator (in the definition of Ref. [2] C = 1/4π,
we use here C = 1). Because S(+) could not be measured experimentally, the model-dependent energy-
weighted sum rule was invoked in the analysis of [9] to eliminate S(+). However, the used analytical
representation for the sum rule was oversimplified and led in some cases, e.g. for 208Pb, to absurdly
negative S(+). In [10] another way was proposed, namely, to use for the analysis the ratio S(+)/S(−)
calculated within the pn-RPA. The parameterization of the RPA calculation results for tin isotopes in
the form
S(+)/S(−) = 0.388− 0.012(N − Z)
was used later in [11] to reanalyze the experimental data and led to a marked change in the extracted
∆R’s.
Let us now assess the experimental accuracy for S(−) needed to determine the neutron skin to a
given accuracy. Putting S(+) = 0 (that seems to be a very good approximation for 208Pb) and one has
S(−) = (N − Z)R2p + 2NRp∆R. (7)
The ratio of the second term on the rhs to the first one in case of 208Pb is
2N∆R/((N − Z)Rp) ≈ 5.7∆R/Rp.
Therefore, for Rp = 5.5 fm and ∆R = 0.2 fm the second term is only 25% of the first one and one needs
5% accuracy in S(−) to determine ∆R with 20% accuracy. Because the SD strength is spread out and
probably has a considerable strength at low-energy, the results for the ∆R can be only considered as
qualitative with a relatively large uncertainty (up to 30-50%).
3.2 Isobaric analogue state
The dominant contribution to the energy weighted sum rule (EWSR) for Fermi excitations by the
operator T (−) =
a comes from the Coulomb mean field
(EWSR)F =
UC(r)n
(−)(r)d3r, (8)
The Coulomb mean field UC(r) resembles very much that of the uniformly charged sphere, being inside
a nucleus a quadratic function: UC(r) =
(3 − (r/Rc)
2), r ≤ Rc. It turns out that if one extends
such a quadratic dependence also to the outer region r > Rc (instead of proportionality to Rc/r), it
gives numerically just a very small deviation in (EWSR)F (less than 0.5%, due to the fact, that both
the difference and its first derivative go to zero at r = Rc and n
(−)(r) is exponentially decreasing for
r > Rc). Using such an approximation, one gets:
(EWSR)F ≈ (N − Z)∆C
3(N − Z)R2c
with ∆C =
, and S(−) given in Eq.(7).
Since the IAS exhausts almost 100% of the NEWSR and EWSR, one may hope to extract S(−) from
the IAS energy. However, the term depending on S(−) contributes only about 20% to (EWSR)F , and as
a result, the part of S(−) depending on ∆R contributes only about 4% to (EWSR)F (in
208Pb). ¿From
the experimental side, the IAS energy can be determined with unprecendently high accuracy, better
than 0.1%. Also, from the experimentally known charge density distribution the Coulomb mean field
UC(r) can be calculated rather accurately, and hence one can determine the small difference between
Eqs.(9) and (8). But at the level of 1% accuracy several theoretical effects discarded in Eq.(8) come
into play that makes such an accurate description of the IAS energy very difficult (the Nolen-Schiffer
anomaly).
Also in [12] it was stated that the Coulomb displacement energies (CDE) are sensitive to ∆R. A
gross estimate ∆R = 0.80(5)(N −Z)/A fm was obtained from a four-parameter fit of the experimental
Rp and observed mirror CDE’s. The authors claimed 127 keV to be the rms error of the fit, but they
assumed the nuclear wave functions calculated within the Nuclear Shell Model to be isospin pure. Thus,
the important effect of the Coulomb mixing of the IAS and the IMR was not taken into account, which
is known to decrease the IAS energy by a few percents. Therefore, the Nolen-Schiffer anomaly does not
seem to have been resolved yet.
3.3 Is the energy spacing between GTR and IAS a good candidate for
determining the neutron skin in isotopic chains?
In a recent paper [13] a proposal has been put forward to use the isotopic dependence of the energy
spacing, ∆E, between the Gamow-Teller resonance (GTR) and the IAS as a tool for determining the
evolution of the neutron skin in nuclei along an isotopic chain. Here, we would like to present some
physical arguments which question the physical relevance of this method.
The authors of [13] have used the fact that both functions, ∆R and ∆E, are monotonic functions
(increasing and decreasing, respectively) of the neutron excess (N−Z) to state that “isotopic dependence
of the energy spacings between the GTR and IAS provides direct information on the evolution of neutron
skin-thickness along the Sn isotopic chain”. Arguing in such a way one can find a correlation between
any two monotonic functions of a single physical parameter and plot them as a function of one another
like is done in Fig. 2 of [13] 1. However, it does not imply automatically a real physical correlation
between the functions which are determined also by many other model parameters which are kept fixed
while performing calculations (the calculations in [13] have been performed within the relativistic mean
field (RMF) and relativistic QRPA (RQRPA) approaches).
Again, the relevant question to be addressed is how sensitive is one physical quantity with respect to
a variation of another in a single nucleus? In other words, one has to evaluate what variation of ∆E
is produced by varying ∆R in a single nucleus. Imaging an extreme situation (which is actually not
far from reality) that ∆E were not sensitive to ∆R at all, one would get by varying ∆R a family of
different calculated dependences (like shown in the upper panel of Fig. 2 of [13]) which would give no
clue about the real dependence seen in nature.
Thus, it is quite important to understand the physical reasons which cause the energy splitting
between the GTR and the IAS. It is well-known that if the nuclear Hamiltonian possessed Wigner
SU(4) symmetry then the GTR and the IAS would be degenerate, ∆E = 0. In such a case any
variation of ∆R, not violating the symmetry, would not affect ∆E at all. However, it is also known
that the spin-isospin SU(4) symmetry is broken in nuclei. Hence, ∆E is determined by those terms in
1note that, to avoid confusion in comparing the measured and calculated dependences, the authors should have plotted
the experimental points in the upper panel as the function of the measured ∆R rather than calculated ∆R and should
have added the horizontal error bars to them reflecting the experimental uncertainty in ∆R shown in the lower panel.
the nuclear Hamiltonian which violate the symmetry. Their qualitative and semi-quantitative estimates
in terms of the energy weighted sum rules for the Gamow-Teller (EWSRGT ) and the Fermi (EWSRF )
excitations have been already known for more than 20 years (see, e.g., [14]). The analysis of these
authors as well as a quantitative analysis performed recently in [15] has shown that there are three
basic sources in the Hamiltonian which violate SU(4) symmetry and contribute to the difference of the
sum rules: spin-orbit mean field and both particle-particle and particle-hole residual charge-exchange
interactions. One sees that none of the sources explicitly refers to the symmetry potential, to which
∆R is especially sensitive.
An estimate of ∆E as ∆E =
EWSRGT − EWSRF
N − Z
can be calculated according to [15] in the
Sn isotopes. From the sources violating SU(4) symmetry, spin-orbit mean field represents the major
one and contributes about 5 MeV to the splitting. The contribution of the particle-hole interaction is
negative and about 1–2 MeV in the absolute value. The contribution of the particle-particle interaction
is rather difficult to evaluate (due to uncertainty in the strength of the spin-dependent particle-particle
interaction) but it seems to be of minor importance (very probably no more than 0.5 MeV, especially
for large (N − Z)) and can safely be neglected.
Now let us turn to the discussion of the sensitivity of the contributions to the variation of ∆R. We
could reproduce the corresponding analytical expressions from [15] explicitly, but it is enough for our
purpose just to mention that the dominating contribution to ∆E from the spin-orbit mean field is given
by its expectation value in the ground state and is determined basically only by the unfilled spin-orbit
doublets. This expectation value is completely insensitive to the variation of ∆R.
Within the Landau-Migdal approach described above, the particle-hole contribution
∆Eph =
2(G′ − F ′)
N − Z
(n(−)(r))2d3r (10)
is given by the product of the volume integral of the neutron excess density squared and the difference
of the p-h strengths G′ and F ′ [15]. In the SU(4)-symmetric limit one has G′ = F ′ and ∆Eph = 0
explicitly. Still, in this limit one has a freedom to choose different F ′ that produces a variation in ∆R,
similar to shown in fig 1. Therefore, as already mentioned, one can get no clue about the actual ∆R
from ∆E = 0 in the SU(4)-symmetric limit.
In a realistic situation G′ 6= F ′ (f ′ = 1.0 and g′ = 0.8 were taken in [15]), but ∆Eph depends only on
the difference G′ − F ′. One usually fixes G′ in order to reproduce the GTR energy in some nuclei (the
authors of [13] have followed this way, too) and possible information from ∆E about the absolute value
of F ′ is lost. Furthermore, one can a priori think that a degree of violation of the SU(4) symmetry
should be a sort of a fundamental property of the residual interaction.Therefore, the difference G′ − F ′
should stay more stable in different models as compared to some possible variation of F ′ producing
different ∆R.
Considering value of G′ − F ′ fixed, one can employ a simple model varying only ρn(r) to see how a
change of ∆R affects ∆Eph via variation of the neutron excess density n
(−)(r). A small variation of ρn(r)
can be approximately represented as δρn(r) = −
(3ρn(r) +R
dρn(r)
), where δRn is a change of the
rms neutron radius Rn, R is the nuclear radius (with R
n ≈ 0.6R
2). Assuming the proton and neutron
densities be constant inside a nucleus, the final estimate is
δ∆Eph
N + Z − 2γN
N − Z
), where
γ = n(−)(R)/n(−)(0).
Thus, in Sn isotopes with the experimental charge radii about Rp =4.6 fm a rather significant
variation of ∆R about 0.1 fm, that is of the order of magnitude of δR, would cause
δ∆Eph
= 0.3 and
δ∆Eph
= 0.15 for 112Sn and 132Sn, respectively (γ = 0.5), that corresponds to the absolute change about
0.3 MeV in ∆E, to be compared with the experimental uncertainties in ∆E of the same order. It
is clear that to draw any conclusion about ∆R from the measured ∆E would be premature. Even if
the experimental errors in ∆E were exactly zero, the accuracy of the theoretical model itself would be
hardly believed to be of the necessary level. For instance, apart from the obvious uncertainties in the
isotopic dependence of the spin-orbit potential, the GTR does not exhaust 100% of the corresponding
sum rules and the shell-structure effects such as configurational and isospin splitting of the GTR can
have some effect on the calculated GTR energy.
It is also noteworthy that, in spite of the claimed self-consistency of the calculations, the slope of the
calculated isotopic dependence of the IAS energy is about 3 times larger than the experimental one (see
inset in Fig. 1 of [13]). Note, that the isospin self-consistent continuum-QRPA calculations of [15] were
able to nicely reproduce the slope (while overall underestimated the IAS energy by about 0.5 MeV, the
well-known Nolen-Schiffer anomaly).
To conclude, we believe that the suggested in [13] method to deduce the neutron skin from the
energy spacing between GTR and IAS is rather questionable in its origin and does not fairly provide
“direct information on the evolution of neutron skin-thickness”.
4 Some implications of ∆R
In several processes of physical interest knowledge of ∆R plays a crucial role and in fact a more accurate
value could lead to more stringent tests:
(i) The pion polarization operator [4] (the s-wave optical potential) in a heavy nucleus Π(ω, ρp, ρn) =
−T+(ω)ρ−T−(ω)(ρn−ρp) has mainly an isovector character (T
+(mπ) ∼ 0). Parameterizing the densities
by Fermi shapes for the case of 208Pb the main nuclear model dependence in the analysis comes from
the uncertainty in the value of ∆R multiplying T−.
(ii) The parity violation in atoms is dominated by Z−boson exchange between the electrons and the
neutrons [5]. Taking the proton distribution as a reference there is a small so-called neutron skin (ns)
correction to the parity non-conserving amplitude, δEnspnc, for, say, a 6s1/2 → 7s1/2 transition, which is
related to ∆R as (independent of the electronic structure)
δEnspnc
(αZ)2
. (11)
In 133Cs it amounts to a δE/E ≈ −(0.1−0.4)% depending on whether the non-relativistic or relativistic
estimates for ∆R are used [5]. The corresponding uncertainty in the weak charge QW is −(0.2− 0.8)σ.
(iii) The pressure in a neutron star matter can be expressed as in terms of symmetry energy and its
density dependence
P (ρ, x) = ρ2
∂E(ρ, x)
= ρ2[E ′(ρ, 1/2) + S ′(ρ)(1− 2x)2 + . . .]. (12)
By using beta equilibrium in a neutron star, µe = µn − µp = −
∂E(ρ,x)
, and the result for the electron
chemical potential, µe = 3/4h̄cx(3π
2ρx)1/3, one finds the proton fraction at saturation density, ρ0, to
be quite small, x0 ∼ 0.04. Hence, the pressure at saturation density can be approximated as
P (ρ0) = ρs(1− 2x0)(ρ0S
′(ρ0)(1− 2x0) + S(ρ0)x0) ∼ ρ
′(ρ0). (13)
At higher densities the proton fraction increases; this increase is more rapid in case of larger p0 [1]. While
for the pressure at higher densities contributions from other nuclear quantities like compressibility will
play a role in it was argued that that there is a correlation of the neutron star radius and the pressure
which does not depend on the EoS at the highest densities. Numerically the correlation can be expressed
in the form of a power law, RM ∼ C(ρ,M)(
P (ρ)
MeVfm−3
)0.25 km, where C(ρ = 1.5ρs,M = 1.4Msolar) ∼ 7.
This shows that a determination of a neutron star radius would provide some constraint on the symmetry
properties of nuclear matter.
5 Conclusion
In this contribution we discuss some aspects of extracting the neutron skin from properties of isovector
giant resonances and critically review existing proposals. The theoretical method relying on the energy
difference between the GTR and IAS is shown to lack sensitivity to ∆R. It is also shown that the phe-
nomenological, almost linear, relationship between the symmetry energy and the neutron skin in finite
nuclei, observed in mean field calculations, can be understood in terms the Landau-Migdal approach.
Acknowledgments
The work is supported in part by the Deutsche Forschungsgemeinschaft (grant FA67/28-2) and by the
EU ILIAS project (contract RII3-CT-2004-506222). The author would like to thank Profs. L. Dieperink
and M. Urin for useful discussions.
References
[1] C.J. Horowitz and J. Piekarewicz, Phys. Rev. Lett. 86 (2001) 5647 ;
Phys. Rev. C 66 (2002) 055803 .
[2] B.A. Brown, Phys. Rev. Lett. 85 (2000) 5296 .
[3] R. J. Furnstahl, Nucl. Phys. A 706 (2002) 85 .
[4] E.E. Kolomeitsev, N. Kaiser and W. Weise, Phys. Rev. Lett. 90 (2003) 092501 .
[5] S. J. Pollock and M. C. Welliver, Phys. Lett. B 464 (1998) 177 .
[6] A.E.L. Dieperink, Y. Dewulf, D. Van Neck, M. Waroquier, V. Rodin, Phys. Rev. C 68 (2003)
064307 .
[7] A.B. Migdal, Theory of finite Fermi-systems and properties of atomic nuclei (Moscow, Nauka,
1983) (in Russian).
[8] C. J. Horowitz, S. J. Pollock, P. A. Souder and R. Michaels, Phys. Rev. C 63 (2001) 025501 ;
http://hallaweb.jlab.org/parity/prex.
[9] A. Krasznohorkay et al., Phys. Rev. Lett. 82 (1999) 3216 .
[10] V. Rodin, M. Urin, KVI annual report (2000).
[11] M. Csatlós et al., Acta Phys. Polonica B33 (2002) 331 .
[12] J. Duflo and A. P. Zuker, Phys. Rev. C 66 (2002) 051304 .
[13] D. Vretenar, N. Paar, T. Nikšić, P. Ring, Phys. Rev. Lett. 91 (2003) 262502 .
[14] Yu.V. Gaponov, Yu.S. Lyutostansky, V.G. Aleksankin, JETP Lett. 34 (1981) 386 ; T. Suzuki, Phys.
Lett. B 104 (1981) 92 ; K. Nakayama, A. Pio Galeao, F. Krmpotic, Phys. Lett. B 114 (1982) 217 .
[15] V.A. Rodin and M.H. Urin, Phys. At. Nuclei 66 (2003) 2128 , nucl-th/0201065.
http://hallaweb.jlab.org/parity/prex
http://arxiv.org/abs/nucl-th/0201065
	Introduction
	Relationship between the symmetry energy and R
	Extracting neutron skin from properties of isovector giant resonances 
	Spin-dipole Giant Resonance
	Isobaric analogue state
	Is the energy spacing between GTR and IAS a good candidate for determining the neutron skin in isotopic chains?
	Some implications of R
	Conclusion
ABSTRACT
  Some aspects, both experimental and theoretical, of extracting the neutron
skin $\Delta R$ from properties of isovector giant resonances are discussed.
Existing proposals are critically reviewed. The method relying on the energy
difference between the GTR and IAS is shown to lack sensitivity to $\Delta R$.
A simple explanation of the linear relation between the symmetry energy and the
neutron skin is also given.

<|endoftext|><|startoftext|>
Introduction
Photonic crystals (PCs) describe a class of semiconductor structures which exhibit a periodic
variation of refractive index in 1, 2, or 3 dimensions. As a result of this periodic variation,
PCs possess a photonic band gap – a range of frequencies in which the propagation of light
is forbidden [1, 2]. This is the analog of the electronic bandgap in traditional semiconductors.
This unique characteristic of PCs enables them to be used to effectively manipulate light. PCs
have already been used for applications such as modifying the spontaneous emission rate of
emitters [3, 4], slowing down the group velocity of light [5, 6], and designing highly efficient
nanoscale lasers [7].
Given that Photonic Crystals find applications in a myriad of areas, we proceed to investigate
the question: What is the best possible PC design for a given application? Traditionally, the
design of optimal PC structures has been largely done by either trial-and-error, iterative searches
through a design space, by physical intuition, or some combination of the above methods [8, 9].
However, such methods of design have their limitations, and recent developments in PC design
optimization have instead taken on a more systematic and algorithmic nature [10, 11, 12, 13].
In this work, we report the results of a Genetic Algorithm to optimize the design of a set of
one and two-dimensional PC structures. We show that the Genetic Algorithm can effectively
optimize PC structures for any given design objective, and is thus a highly robust and useful
design tool.
2. Genetic Algorithms
Genetic Algorithms (also known as Evolutionary Algorithms) are a class of optimization algo-
rithms that apply principles of natural evolution to optimize a given objective [14, 15, 16]. In
the genetic optimization of a problem, different solutions to the problem are picked (usually
randomly), and a measure of fitness is assigned to each solution. On a given generation of the
design, a set of operations, analogous to mutation and reproduction in natural selection, are per-
formed on these solutions to create a new generation of solutions, which should theoretically
be “fitter” than their parents. This process is repeated until the algorithm terminates, typically
after a pre-defined number of generations, or after a particularly “fit” solution is found, or more
generally, when a generation of solutions meets some pre-defined convergence criterion.
3. Implementation
Genetic Algorithms have already been used in PC design - to find non-intuitive large-bandgap
designs [12, 17] and for designing PC fibers [18]. In our work, we performed the genetic opti-
mization by varying the sizes of circular holes in a triangular lattice. This approach was chosen
because the search space is conveniently constrained in this paradigm, and the optimized struc-
tures can be easily fabricated, if desired. A freely available software package [19] was used to
simulate the designed structures.
In addition, we used the following parameters for the implementation of our Genetic Algo-
rithm:
Chromosome Encoding. We used a direct-chromosome encoding, where the various opti-
mization parameters were stored in a vector. For the current simulations, for simplicity,
we only varied the radii of cylindrical holes in a triangular lattice. Our implementation
can be easily modified to include other optimization parameters as well, such as the po-
sitions of the various holes, or the refractive index of the dielectric material.
Selection. We used fitness-proportionate selection (also known as roulette-wheel selection),
to choose parent chromosomes for mating. In this selection scheme, a chromosome is
selected with a probability Pi that is proportional to its fitness fi, as shown in Eq. (1).
Mating. After a pair of parent chromosomes vparent,1 and vparent,2 were selected, they were
mated to produce a child chromosome vchild by taking a random convex combination of
the parent vectors, as in Eq. (2).
λ ∼ U(0,1)
~vchild = λ~vparent,1 +(1−λ )~vparent,2 (2)
Mutation. Mutation was used to introduce diversity in the population. We used two types of
mutation in our simulations, a random-point crossover and a gaussian mutation.
1) Random-point crossover: For an original chromosome vector ~vorig of length N, we
select a random index, k, from 0 to N as the crossover point, and swap the two halves of
~vorig to produce the mutated vector,~vmut , as represented in Eq. (3).
~vorig = (v1,v2, . . . ,vN)
k ∼ U{0,1,2, ....,N}
~vmut = (vk+1,vk+2, . . . ,vN ,v1,v2, . . . ,vk−1)
T (3)
2) Gaussian mutation: To mutate a chromosome vector by Gaussian mutation, we define
each element of ~vmut to be independent and identically distributed Gaussian Random
Variables with mean ~vorig and a standard deviation proportional to the corresponding
elements of ~vorig. This searches the space in the vicinity of the original chromosome
vector~vorig.
vmuti ∼ N
vorigi ,σ
, i ∈ {0,1,2, ....,N} (4)
σ2 is a algorithm-specific variance, and can be tuned to change the extent of parameter-
space exploration due to mutation.
Cloning. To ensure that the maximum fitness of the population was would never decrease, we
copied (cloned) the top few chromosomes with the highest fitness in each generation and
inserted them into the next generation.
4. Simulation Results
4.1. Optimizing Planar Photonic Cavity Cavities
4.1.1. Q-factor Maximization
One problem of interest in PC design is the inverse problem, where one tries to find a dielectric
structure to confine a given (target) electromagnetic mode. Here we consider the inverse design
problem of optimizing a linear-defect cavity in a planar photonic crystal cavity. The Q-factor is
a common figure of merit measuring how well a cavity can confine a given mode, and can be
approximated (assuming no material absorption) by the following expression:
Qtotal
where Q|| represents the Q-factor in the direction parallel to the slab, and Q⊥ represents
the Q-factor perpendicular to the slab. Q⊥ is usually the limiting factor for Qtotal. As was
shown previously [10, 20], the vertical mode confinement, which occurs through total internal
reflection (TIR), can be improved if the mode has minimal k-space components inside the light
cone.
In the subsequent sections, we report the results where we employed our GA to minimize
the light cone radiation of such cavities. We used one-dimensional photonic crystals as approx-
imations to these cavities [21], and simulated these cavities using the standard Transfer Matrix
method for the E-field [22].
4.1.2. Matching to a Target Function
In [10] it was noted that minimization of light cone radiation could be performed via mode-
matching to a target function which already possessed such a property. We therefore used a
fitness function that was equal (up to a normalizing factor) to the reciprocal of the mean-squared
difference between our simulated mode and a target mode (see Eq (6)). For this simulation, our
chromosome encoded the thicknesses of the dielectric slabs in the structure, and was a vector
of length 10. We used 100 chromosomes in each generation and allowed them to evolve for 80
generations.
f itness ∝
| fsim(x)− ftarget (x)|
We used target modes that were sinusoidal functions multiplied by sinc and sinc-squared
envelope respectively. Such target modes have theoretically no radiation at or near the Gamma
point and are therefore ideal candidates as target functions. The results, shown in Fig. 1, clearly
feature a suppression of k-vector components at low spatial-frequencies. Matching using the
the sinc-squared envelope target function appeared to produce a better match. From the k-
space plots, the GA evidently had difficulty matching the sharp edges for the sinc-envelope
target mode.
4.1.3. Direct Minimization of Light Cone Radiation
In the preceding subsection, we observed that when we formulated our objective as a matching
problem, in the case of the sinc-envelope, the GA sacrificed the desired low spatial-frequency
suppression in an effort to match the overall shape of the function. The preceding formulation
therefore poses an implicit constraint on our optimization. By reformulating the optimization
problem, we were able to effectively remove this constraint, and obtain a better result.
0 100 200 300 400 500
−3 −2 −1 0 1 2 3
k (a/λ)
0 100 200 300 400 500
−3 −2 −1 0 1 2 3
k (a/λ)
Fig. 1: Top-left: Real-space mode profile after optimizing for closest-match to a sinc-envelope
target mode. Top-right: k-space mode profile of optimized simulated mode and a sinc-envelope
target mode. Bottom: Real-space and k-space mode profiles for matching against a sinc2-
envelope target mode.
Our reformulation directly minimized the k-vector components in the light cone, by min-
imizing the integrated square-magnitude of the simulated E-field mode in k-space inside the
light cone. The fitness function that we used is given as in Eq (7), where V represents the set of
k-vectors within the light cone.
f itness =
|F(k)|2dk
The final, evolved structure, together with the corresponding real-space and k-space mode
profiles are shown in Fig 2. The k-space mode profile features a strong suppression of radiation
at low frequencies, to a greater extent as compared to the optimized fields from the preced-
ing simulations. By relaxing our constraint and performing a direct optimization, our GA has
designed a structure that achieves better light cone suppression than before. Our direct opti-
mization paradigm has exploited the extreme generality of the GA, which simply requires that
a fitness function be defined, with little further constraint thereafter.
4.2. Maximal Gap at any k-vector Point
Moving on to the more generic case of 2D photonic crystals, we will proceed to show the results
of simulations for maximizing the TE bandgap at any point in k-space for a 2-Dimensional PC
structure with a triangular lattice of air holes. This could be useful for PC design applications
where the target mode to be confined is centered around a particular point in k-space [10]. By
maximizing the bandgap at that k-space point, we would effectively design a better mirror for
a mode resonating along this k-space direction.
0 50 100 150 200 250 300 350 400 450
Optimized Mode − real space
−3 −2 −1 0 1 2 3
k (a/λ)
Optimized Mode − k space
Fig. 2: Top: Real-space mode profile of optimized resonant E-field mode. Bottom: Correspond-
ing k-space mode profile of optimized mode
We used a supercell which was three periods wide in each dimension and varied the radii of
the nine holes in total, and we encoded the chromosome as a vector of these nine holes. We
used a population size of 60 chromosomes for each generation, and allowed the optimization to
run for a total of 100 generations.
To evaluate the fitness of each chromosome, we used the eigensolver in Ref [19] to calculate
the gap-to-midgap ratio at the K-point of the band diagram. We then scaled the calculated ratio
exponentially to tune the selection pressure of the optimization. Figure 3 shows the variation of
the gap-to-midgap ratio of our structures as the algorithm progressed.
Our Genetic Algorithm performs as expected, and we get a general increase of fitness as
the algorithm progresses. All the four runs do not show any significant increase in fitness af-
ter Generation 80, at which point they have maximum fitnesses (i.e. ratio of their bandgap to
midgap value) of around 72%. All the optimized structures after the run have similar dielectric
structures and band diagrams. The dielectric structures and a sample band diagram is shown in
Figure 4.
4.3. Optimal dual PC structures
As a more complex example, let us consider two similar PC designs, (1) a triangular lattice of
air holes in a dielectric slab, and (2) a triangular lattice of dielectric rods in air. Structure (1)
possesses a bandgap for TE light, but no bandgap for TM light, while structure (2) possesses a
bandgap for TM light, but not for TE light.
Our objective is to use the Genetic Algorithm to find a PC design in which the TE eigenmode
for structure (1) and the TM eigenmode for structure (2) are most similar. Maxwell’s equations
can be cast as eigenproblems for the Electric or Magnetic fields, and our approach could be po-
tentially useful in future PC design, because solving the inverse problem is analytically simpler
(at least intuitively) for the eigenproblem involving the E-field.
We used a 3x3 supercell for the optimization, and we minimize the mean-square difference
of the z-components of the electric and magnetic fields of the dual structures at the K-point
of the band diagram. We recognize a priori that a trivial solution, which we wish to avoid, is
0 10 20 30 40 50 60 70 80 90 100
Generation number
Fig. 3: Fitness (gap-to-midgap ratio at K-point of the band diagram) of maximally-fit struc-
ture of each generation for 100 generations. The maximum fitness is a monotonically non-
decreasing function due to cloning. A general increase in fitness arises as a result of various
genetic operations (selection, mating, mutation).
a structure that has a uniform refractive index (either dielectric or air) throughout, and so we
prevent the genetic algorithm from obtaining this by restricting our mutation to only a Gaussian
mutation (see Eq. 4). This preferentially searches the locality of points, and is a necessary trade-
off for obtaining a reasonable solution. This illustrates the versatility of the Genetic approach
- the extent of the search can be easily modified by a simple change of algorithm parameters.
Fig. 5 shows the optimal dual structures with the corresponding simulated fields.
5. Conclusion
From the results above, we have shown that our Genetic Algorithm is able to effectively opti-
mize PC designs to meet specific design criteria. Furthermore, by our choice of encoding, we
could easily impose constraints upon the design space to ensure that every design searched by
the algorithm could be realistically fabricated. Between different optimizations, all that needed
to be changed was the measure of how well a given structure complied with our design crite-
rion - the ”fitness function” in Genetic Algorithm parlance. Our Genetic Algorithm is therefore
highly robust and can be easily modified to optimize any user-defined objective function.
(a) Run 1 (b) Run 2
(c) Run 3 (d) Run 4
(e) Band Diagram - optimized
(f) Band Diagram - uniform holes, r/a = 0.3
Fig. 4: Dielectric structures (a-d), showing the optimal PC structures predicted by 4 runs our
Genetic Algorithm. The unit cell for each structure is depicted by the yellow bounding box. A
sample band diagram (for Run 3) is shown in (e). The optimized TE-bandgap, calculated as the
ratio of the size of the gap to the midgap value, was found to be ≃ 72%. The TE-bandgap for a
triangular lattice with uniform air holes (r/a = 0.3) is shown in (f ) for reference.
(a) Band 1, E-field (b) Band 1, H-field
(c) Band 2, E-field (d) Band 2, H-field
(e) Band 3, E-field (f) Band 3, H-field
(g) Band 4, E-field (h) Band 4, H-field
Fig. 5: Genetic Algorithm prediction of PC structures that have optimally matched E and H
fields, for the lowest 4 bands, at the K point. The E-fields are shown for structure with dielectric
rods, and have a TM bandgap, while the H-fields are shown for structures with air holes, and
have a TE bandgap. The shown fields are in the direction aligned with the rods. The fields for
the lowest 3 bands are very well matched, but begin to deviate significantly from each other at
band 4.
	Introduction
	Genetic Algorithms
	Implementation
	Simulation Results
	Optimizing Planar Photonic Cavity Cavities
	Q-factor Maximization
	Matching to a Target Function
	Direct Minimization of Light Cone Radiation
	Maximal Gap at any k-vector Point
	Optimal dual PC structures
	Conclusion
ABSTRACT
  We investigate the use of a Genetic Algorithm (GA) to design a set of
photonic crystals (PCs) in one and two dimensions. Our flexible design
methodology allows us to optimize PC structures which are optimized for
specific objectives. In this paper, we report the results of several such
GA-based PC optimizations. We show that the GA performs well even in very
complex design spaces, and therefore has great potential for use as a robust
design tool in present and future applications.

<|endoftext|><|startoftext|>
Introduction, such a large
anisotropy is unusual in 3d metals since the spin–orbit
coupling is rather weak, e.g. the calculated magneto-
crystalline anisotropy energy in bulk fcc Co is only 2 µeV
per atom.34 Also the XMCD spectrum, which depends
essentially only on integral quantities, namely spin and
orbital moments, exhibits a very small anisotropy.9 As
pointed out by one of us and P. M. Oppeneer, the XMLD
signal in metallic Co depends only weakly on the small
valence band spin-orbit coupling. The major contribu-
tion to XMLD comes from the exchange splitting of the
2p levels (≈ 1 eV).9 The magneto-crystalline anisotropy
then arises from the fact that different final 3d states are
probed for different orientations of the sample magneti-
zation.
To assess the feasibility of our experimental data, we
used the calculated XMLD spectrum of Ref. 9 for the
[100] direction (what is referred to in Ref. 9 as “full cal-
culation”) and augmented these with equal calculations
for the [110] magnetization on the same system (see Fig.
6). In the calculations performed on bulk fcc Co a siz-
able anisotropy of XMLD is found, however, the [100]
exhibits larger XMLD magnitude contrary to the exper-
iment. Before dismissing these results as a disagreement
a few remarks are in order. First, the calculations were
done on bulk material while the experiment is performed
on a thin layer sandwiched by other materials, therefore a
good quantitative agreement is unlikely. Second, we can-
not judge the calculated anisotropy based on the present
data only. Note that due to a slight mutual shift of the
calculated spectra, the [100] contrast at the maximum
amplitude of the [110] XMLD would be rather small.
Such a shift is not present in Fig. 5, where [100] and
[110] spectra obtained on slightly different samples are
compared. Third, a possible non-collinearity of Co spins
due to the presence of the AF FeMn layer would lead to
local moments pointing neither along [110] nor fully along
[100]. Taking these uncertainties into account we draw a
modest, nevertheless non-trivial, conclusion that the the-
ory does not prohibit a magneto-crystalline anisotropy of
XMLD as large as observed in our experiment.
IV. CONCLUSIONS
We have presented a spectromicroscopic PEEM inves-
tigation of the magnetic domain pattern on Co/FeMn
bilayers using XMCD and XMLD as the contrast mech-
anism. The sensitivity of the method allows to visualize
even the tiny XMLD signal of the induced ferromagnetic
moments in the FeMn layer. We have found a factor
of 3.6 difference in the XMLD contrast between the Co
L3 signal from 〈110〉 and 〈100〉 domains in a single sam-
ple. We argue that this huge difference is mainly due to
an intrinsic magneto-crystalline anisotropy of XMLD of
the Co layer. Comparison of experimental XMLD spec-
tra obtained from different samples published previously
and ab initio calculations on bulk fcc Co suggest that
such an anisotropy is indeed possible.
Acknowledgments
We thank B. Zada and W. Mahler for technical assis-
tance, and S. S. Dhesi for providing the data from Ref. 30.
Financial support by the German Minister for Education
and Research (BMBF) under grant No. 05 SL8EF19 is
gratefully acknowledged. J. K. acknowledges the support
by an Alexander von Humboldt Research Fellowship.
∗ Electronic address: kuch@physik.fu-berlin.de; URL: http:
//www.physik.fu-berlin.de/~ag-kuch
† Present address: CNISM and Dipartimento di Fisica, Uni-
versità Roma Tre, Via della Vasca Navale 84, I-00146
Roma, Italy.
‡ Present address: Universität Duisburg–Essen, Institut für
Experimentelle Physik, Lotharstraße 1, D-47057 Duisburg,
Germany.
§ Present address: Department of Physics and HKU-CAS
Joint Lab on New Materials, The University of Hong Kong,
Hong Kong, China.
¶ Present address: SPring-8, 1–1–1 Kouto, Sayo-cho, Sayo-
gun, Hyogo 679-5198, Japan.
∗∗ Present address: Hiroshima Synchrotron Radiation Cen-
ter, 2–313 Kagamiyama, Higashi-Hiroshima, 739-8526 Hi-
roshima, Japan.
1 J. Nogués and I. K. Schuller, J. Magn. Magn. Mater. 192,
203 (1999).
2 W. H. Meiklejohn and C. P. Bean, Phys. Rev. 102, 1413
(1956).
3 B. Dieny, V. S. Speriosu, S. S. P. Parkin, B. A. Gurney,
D. R. Wilhoit, and D. Mauri, Phys. Rev. B 43, 1297 (1991).
4 J. C. S. Kools, IEEE Trans. Magn. 32, 3165 (1996).
5 G. van der Laan, B. T. Thole, G. A. Sawatzky, J. B.
Goedkoop, J. C. Fuggle, J.-M. Esteva, R. Karnatak, J. P.
Remeika, and H. A. Dabkowska, Phys. Rev. B 34, 6529
(1986).
6 M. W. Haverkort, S. I. Csiszar, Z. Hu, S. Altieri,
A. Tanaka, H. H. Hsieh, H.-J. Lin, C. T. Chen, T. Hibma,
and L. H. Tjeng, Phys. Rev. B 69, 020408(R) (2004).
7 E. Arenholz, G. van der Laan, R. V. Chopdekar, and
Y. Suzuki, Phys. Rev. B 74, 094407 (2006).
8 G. van der Laan, Phys. Rev. Lett. 82, 640 (1999).
9 J. Kuneš and P. M. Oppeneer, Phys. Rev. B 67, 024431
(2003).
10 W. Kuch, F. Offi, L. I. Chelaru, M. Kotsugi, K. Fukumoto,
and J. Kirschner, Phys. Rev. B 65, 140408(R) (2002).
11 C. Won, Y. Z. Wu, H. W. Zhao, A. Scholl, A. Doran,
W. Kim, T. L. Owens, X. F. Jin, and Z. Q. Qiu, Phys.
Rev. B 71, 024406 (2005).
12 F. Offi, W. Kuch, and J. Kirschner, Phys. Rev. B 66,
064419 (2002).
13 W. Kuch, R. Frömter, J. Gilles, D. Hartmann, C. Zi-
ethen, C. M. Schneider, G. Schönhense, W. Swiech, and
J. Kirschner, Surf. Rev. Lett. 5, 1241 (1998).
14 W. Kuch, L. I. Chelaru, F. Offi, M. Kotsugi, and
J. Kirschner, J. Vac. Sci. Technol. B 20, 2543 (2002).
15 M. Kotsugi, W. Kuch, F. Offi, L. I. Chelaru, and
J. Kirschner, Rev. Sci. Instrum. 74, 2754 (2003).
16 J. Stöhr, Y. Wu, B. D. Hermsmeier, M. G. Samant, G. R.
Harp, S. Koranda, D. Dunham, and B. P. Tonner, Science
259, 658 (1993).
17 W. Kuch, Appl. Phys. A 76, 665 (2003).
18 Y. Endoh and Y. Ishikawa, J. Phys. Soc. Jpn. 30, 1614
(1971).
19 F. Offi, W. Kuch, L. I. Chelaru, K. Fukumoto, M. Kotsugi,
and J. Kirschner, Phys. Rev. B 67, 094419 (2003).
20 W. Kuch, L. I. Chelaru, and J. Kirschner, Surf. Sci. 566–
568, 221 (2004).
21 W. Kuch, L. I. Chelaru, F. Offi, J. Wang, M. Kotsugi, and
J. Kirschner, Phys. Rev. Lett. 92, 017201 (2004).
22 W. Kuch, L. I. Chelaru, F. Offi, J. Wang, M. Kotsugi, and
J. Kirschner, Nature Mater. 5, 128 (2006).
23 W. Kuch, J. Gilles, F. Offi, S. S. Kang, S. Imada, S. Suga,
and J. Kirschner, J. Electron Spectrosc. Relat. Phenom.
109, 249 (2000).
24 M. R. Weiss, R. Follath, K. J. S. Sawhney, F. Senf,
J. Bahrdt, W. Frentrup, A. Gaupp, S. Sasaki, M. Scheer,
H.-C. Mertins, et al., Nucl. Instr. and Meth. A 467–468,
449 (2001).
25 J. Stöhr, A. Scholl, T. J. Regan, S. Anders, J. Lüning,
M. R. Scheinfein, H. A. Padmore, and R. L. White, Phys.
Rev. Lett. 83, 1862 (1999).
26 A. Scholl, J. Stöhr, J. Lüning, J. W. Seo, J. Fompeyrine,
H. Siegwart, J.-P. Locquet, F. Nolting, S. Anders, E. E.
Fullerton, et al., Science 287, 1014 (2000).
27 F. Nolting, A. Scholl, J. Stöhr, J. W. Seo, J. Fompeyrine,
H. Siegwart, J.-P. Locquet, S. Anders, J. Lüning, E. E.
Fullerton, et al., Nature 405, 767 (2000).
28 H. Ohldag, A. Scholl, F. Nolting, S. Anders, F. U. Hille-
brecht, and J. Stöhr, Phys. Rev. Lett. 86, 2878 (2001).
29 M. M. Schwickert, G. Y. Guo, M. A. Tomaz, W. L.
O’Brien, and G. R. Harp, Phys. Rev. B 58, R4289 (1998).
30 S. S. Dhesi, G. van der Laan, and E. Dudzik, Appl. Phys.
Lett. 80, 1613 (2002).
31 W. A. A. Macedo, B. Sahoo, V. Kuncser, J. Eisenmenger,
I. Felner, J. Nogués, K. Liu, W. Keune, and I. K. Schuller,
Phys. Rev. B 70, 224414 (2004).
32 Normal incidence, room temperature, degree of linear po-
larization 95% (Ref. 30), > 97% (Ref. 21), photon energy
resolution 400 meV (Ref. 30), 300 meV (Ref. 21), measured
under 7 T magnetic field (Ref. 30) and in remanence (Ref.
33 The left axis of Fig. 4 gives the difference of the raw XMLD
asymmetry between the two orthogonal domains. To com-
pare to the spectra of Figs. 6 and 5, one has to keep in mind
that the intensity at these two photon energies, in particu-
lar the one 1.0 eV below the L3 peak maximum, is less than
the intensity in the peak maximum. In the case of Co the
average intensity of the denominator of the asymmetry is
about 75% of the peak maximum. Because the difference
was normalized to the sum, this has to be doubled and
gives a factor of 1.5. The pre-edge background, which is
included here, has also to be taken into account. From im-
mailto:kuch@physik.fu-berlin.de
http://www.physik.fu-berlin.de/~ag-kuch
http://www.physik.fu-berlin.de/~ag-kuch
ages acquired in the Co pre-edge region it was determined
to make up for about 35% of the intensity measured at the
L3 maximum. This leads roughly to another factor of 1.7.
An asymmetry value of 0.017 in Fig. 4 (the amplitude of
the fit curve for 〈110〉 magnetization) thus corresponds to
a peak-to-peak amplitude of the linear dichroism of about
4.3% of the L3 peak height. The curve of Dhesi et al., for
comparison, shows a peak-to-peak dichroism of 5.3% of the
L3 peak height.
34 J. Trygg, B. Johansson, O. Eriksson, and J. M. Wills, Phys.
Rev. Lett. 75, 2871 (1995).
	Introduction
	Experiment
	Results and discussion
	 Conclusions
	Acknowledgments
	References
ABSTRACT
  We present an x-ray spectromicroscopic investigation of single-crystalline
magnetic FeMn/Co bilayers on Cu(001), using X-ray magnetic circular (XMCD) and
linear (XMLD) dichroism at the Co and Fe L3 absorption edges in combination
with photoelectron emission microscopy (PEEM). Using the magnetic coupling
between the ferromagnetic Co layer and the antiferromagnetic FeMn layer we are
able to produce magnetic domains with two different crystallographic
orientations of the magnetic easy axis within the same sample at the same time.
We find a huge difference in the XMLD contrast between the two types of
magnetic domains, which we discuss in terms of intrinsic magneto-crystalline
anisotropy of XMLD of the Co layer. We also demonstrate that due to the high
sensitivity of the method, the small number of induced ferromagnetic Fe moments
at the FeMn-Co interface is sufficient to obtain magnetic contrast from XMLD in
a metallic system.

<|endoftext|><|startoftext|>
Introduction 
Tensile properties of SWCNTs have been widely investigated by experimental and theoretical 
techniques. Experimentally, the Young’s modulus of  SWCNTs are measured as ranging from 
0.9 to 1.9 TPa in [1]. In the SEM measurements of [2], SWCNT ropes broke at the strain 
values of 5.3 % or lower and the determined mean values of breaking strength and Young’s 
modulus are 30 GPa and 1002 GPa, respectively. AFM and SGM measurements [3] show that  
SWCNTs can sustain elongations as great as 30% without breaking.  
On the other hand, the ab initio simulation study of SWCNTs [4] showed that Young’s 
modulus and Poisson ratio values of the tubes are ranging from 0.5 TPa to 1.1 TPa and from 
0.11 to 0.19, respectively. The Young modulus and Poisson ratio of armchair nanotubes is 
given as 0.764 TPa and 0.32 respectively in [5]. Young modulus of (10,0); (8,4) and (10,10) 
tubes are calculated as 1.47 TPa; 1.10 TPa and 0.726 TPa, respectively in [6].  The results of 
[7] proposed that the structural failure should occur at 16% for zigzag and above 24% for 
armchair tubes. An empirical force-constant model of [8] gave the Young’s modulus between  
0.971 TPa- 0.975 TPa and Poisson ratio between 0.277 - 0.280. An empirical pair potential 
simulations of [9]  gave the Young’s modulus between 1.11 TPa -1.258 TPa and the Poisson 
ratio between  0.132-0.151. Continuum shell model of [10], calculated the elastic modulus as 
0.94 TPa.and the maximum stress and failure strain values as 70 GPa, 88 GPa; 11%, 15% for 
(17,0) and (10,10) tubes, respectively. Finite element method [11] determined  the strength of 
CNTs  between 77 GPa to 101 GPa and Poisson ratio between 0.31-0.35. Analytical model in 
[12] found the tensile strength as 126.2 GPa of armchair tubes to be stronger than that (94.56 
GPa) of zigzag tubes and the failure strains are 23.1% for armchair and 15.6-17.5% for zigzag 
tubes.  MD simulations of [13-17]  determined Young’s modulus between 0.311 to 1.017 TPa 
for SWCNT. We found the Young’s modulus, tensile strength and the Poisson ratio as  0.311 
TPa, 4.92 GPA and 0.287 for (10,10) tubes in [16]. C.Goze et al. [18]  calculated the Young’s 
modulus as 0.423 TPa and Poisson ratio as 0.256 for (10,10) tube. Nonlinear elastic properties 
of SWCNTs under axial tension and compression were studied by T.Xiao et al. [19,20] using 
MD simulations with the second-generation Brenner potential. They showed that the energy 
change of the nanotubes are a cubic function of the tensile strains, both in tension and under 
compression. The maximum elongation strains are 15% and 17% for zigzag and armchair 
tubes, respectively. Also the maximum compression strain decreases with increasing tube 
diameter, and it is almost 4% for (10,10) tube. M.Sammalkorpi et al. [21] studied the effects 
of vacancy-related defects on the mechanical characteristics of SWCNTs by employing MD 
simulations and continuum theory. They calculated the Young’s modulus for perfect 
SWCNTs as 0.7 TPa. They showed that at 10K temperature, the critical strains of (5,5) and 
(10,10) tubes are 26% and 27%, respectively; also tensile strength is 120 GPa. On the other 
hand, for (9,9) and (17,0) tubes, the critical strains are found as 22% and 21%, respectively, 
and tensile strength is 110 GPa.  Y.Wang et al. [22] investigated the compression deformation 
of SWCNTs by MD simulations using the Tersoff-Brenner potential to describe the 
interactions of carbon atoms. They determined that the SWCNTs whose diameters range from 
0.5 nm to 1.7 nm and length ranges from 7 nm to 19 nm, the Young’s modulus range from 
1.25 TPa to 1.48 TPa.  S.H.Yeak et al. [23] used MD and TBMD method to examine the 
mechanical properties of SWCNTs under axial tension and compression. Their results showed 
that the Young’s modulus of the tubes are around 0.53 TPa; the maximum strain under axial 
tension is 20% for (12,12) and (7,7) tubes and also under this strain rate, the tensile stresses 
are 100 GPa and 90 GPa, respectively. Many elastic characteristics like the Young’s modulus 
show a wide variations (0.3 TPa- 1.48 TPa) in all reported results in literature. These results 
are obtained at room temperature or without the mention of the temperature. The following 
reasons may be given for the variety of results: i) Young’s modulus depends on the tube 
diameter and the chirality ii) different values are used for the wall thickness iii) different 
procedures are applied to represent the strain iv) accuracy of the applied methods (first 
principle methods in comparison with emprical model potentials) 
SWCNTs will be locally subject to abrupt temperature increases in electronics circuits and the 
temperature increase affects their structural stability and the mechanical properties. MD 
simulation studies on the mechanical properties of the SWCNTs at various temperatures 
under tensile loading simulations can be followed in [24-28].  M.B.Nardelli et al. [24] showed 
that all tubes are brittle at high strains and low temperatures, while at low strains and high 
temperatures armchair nanotubes can be completely or partially ductile. In zigzag tubes 
ductile behavior is expected for tubes with n<14 while larger tubes are completely brittle. 
N.R.Raravikar et al. [25]  showed between  0-800K temperature range radial Young’s 
modulus of nanotubes decreases with increasing temperature and its slope is -7.5x10-5 (1/K) . 
C.Wei et al. [26,27] studied the tensile yielding of SWCNTs and MWCNTs under continuous 
stretching using MD simulations and a transition state theory based model. They showed that 
the yield strain decreases at higher temperatures and at slower strain rates. The tensile yield 
strain of SWCNT has linear dependence on the temperature and has a logarithmic dependence 
on the strain rate. The slope of the linear dependence increases with temperature. From their 
results it is shown that the yield strain of (10,0) tube decreased from 18% to 5% for the 
temperature range increasing from 300K to 2400K and for the different strain rates. Another 
MD simulation study was performed by Y.-R.Jeng et al. [28] investigated the effect of 
temperature and vacancy defects on tensile deformation of (10,0); (8,3); (6,6) tubes of similar 
radii. Their Young’s modulus and Poisson ratio values  range from 0.92 to 1.03 TPa and 0.36-
0.32, respectively. Their simulations also demonstrate that the values of the majority of the 
considered mechanical properties decrease with increasing temperature and increasing 
vacancy percentage. 
In this study, the effect of temperature increase on the structural stability and mechanical 
properties of (10,10) armchair SWCNT under tensile loading is investigated by using O(N) 
tight-binding molecular dynamics (TBMD) simulations. Extensive literature survey is given 
in order to show the importance of our present study. The armchair 20 layers (10,10) SWCNT 
is chosen  in the present work because it is one of the most synthesized nanotube in the 
experiments. For the first time we questioned how the strain energy of these nanotubes 
changes for positive and negative strain values at high temperatures. Along with the high 
temperature stress-strain curves for the first time we displayed the bond-breaking strain values 
through total energy graphs. Mechanical properties (Young’s modulus, Poisson ratio, tensile 
strength and elastic limit) of this nanotube are reported at high temperatures. 
2. Method  
Traditional TB theory solves the Schrödinger equation by direct matrix diagonalization, 
which results in cubic scaling with respect to the number of atoms O(N3). The O(N) methods, 
on the other hand, make the approximation that only the local environment contributes to the 
bonding and hence the bond energy of each atom. In this case the run time would be in linear 
scaling with respect to the number of atoms. G.Dereli et al. [29,30] have improved and  
succesfully applied the O(N) TBMD technique to SWCNTs. In this work, using the same 
technique, we performed SWCNT simulations depending on conditions of temperature and 
unaxial strain. The electronic structure of the simulated system is calculated by a TB 
Hamiltonian so that the quantum mechanical many body nature of the interatomic forces  is 
taken into account.  Within a semi-empirical TB, the matrix elements of the Hamiltonian  are 
evaluated by fitting a suitable database. TB hopping integrals, repulsive potential and scaling 
law is fixed in the program  [31,32]. Application of the technique to SWCNTs can be seen in 
our previous studies [16,29,30]. 
An armchair (10,10) SWCNT consisting of 400 atoms with 20 layers is simulated. Periodic 
boundary condition is  applied along the tube axis. Velocity Verlet algorithm along with the 
canonical ensemble molecular dynamics (NVT) is used. Our simulation procedure is as 
follows: i.) The tube is simulated at a specified temperature during a 3000 MD steps of run 
with a time step of 1 fs. This eliminates the possibility of the system to be trapped in a 
metastable state. We wait for the total energy per atom  to reach the equilibrium state. ii.) 
Next, uniaxial strain is applied to the tubes. We further simulated the deformed tube structure 
(the under uniaxial strain) for another 2000 MD steps. In our study, while the nanotube is 
axially elongated or contracted, reduction or enlargement of the radial dimension is observed. 
Strain is obtained from 00 /)( LLL −=ε , where  and 0L L  are the tube lengths before and 
after the strain, respectively. We applied the elongation and compression and calculated the 
average total energy per atom.  Following this procedure we examined the structural stability, 
total energy per atom, stress-strain curves, elastic limit,  Young’s modulus,  tensile strength, 
Poisson ratio of the (10,10) tube as a function of temperature.     
The stress is determined from the resulting force acting on the tube per  cross sectional area 
under stretching. The cross sectional area of the tube, is defined by S RRS δπ2= , 
where R and Rδ  are the radius and the wall thickness of the tube, respectively. We have used  
3.4 Å for wall thickness. Mechanical properties are calculated from the stress-strain curves. 
Elastic limit is obtained from the linear regions of the stress-strain curves. Young’s modulus, 
which shows the resistivity of a material to a change in its length, is determined from the 
slope of the stress-strain curve at studied temperatures. The tensile strength can be defined as 
the maximum stress which may be applied to the tube without perturbing its stability. Poisson 
ratio which is a measure of the radial reduction or expansion of a material under tensile 
loading can be defined as 
    ⎟⎟
where R  and  are the tube radius at the strainoR ε  and before the strain, respectively.  
3. Results and Discussion 
In Figure 1, we present the total energy per atom of the  (10,10) SWCNT as a function of 
strain. Several strain values are applied. The positive values of strain corresponds to 
elongation and the negative values to compression. We obtained the total energy per atom vs 
strain curves in the temperature range between 300K-1800K in steps of 300K. Total energy 
per atom increases as we increase the temperature. An asymmetric pattern is observed in these 
curves. Repulsive forces are dominant in the case of compression. SWCNT does not have a 
high strength for compression as much as for elongation. (10,10) SWCNT is stable up to 0.06 
strain in compression in the temperature range between 300K-1500K and 0.03 at 1800K. In 
elongation (10,10) SWCNT is stable up to 0.23 strain at 300K.  As we increase the 
temperature the tube is stable up to 0.15 in elongation until 1800K. At 1800K we can only 
apply the strain of 0.08 in elongation before bond breakings. Figure 2a shows the variation of 
the total energy per atom during simulations for the strain values of 0.23 in elongation and 
0.06 in compression at 300K. This figure indicates that the tube can sustain its structural 
stability up to these strain values. Beyond these, bond-breakings between the carbon atoms 
are observed at the strain values of 0.24 in elongation and 0.07 in compression as given in 
Figure 2b. In Figure 2b. sharp peaks represent the disintegrations of atoms from the tube. 
Next, bond-breaking strains are studied with increasing temperature. In Figure 3, we show the 
bond-breaking strain values with respect to temperature: as the temperature increases,  
disintegration of atoms from their places is possible at lower strain values due to the thermal 
motion of atoms. But this is not the case for compression as can be seen in Figure 3. Some 
examples of the variation of the total energy as a function of MD Steps under uniaxial strain 
values at various  temperatures are given in Figure 4 and Figure 5. Figure 4a and Figure 5a 
shows that the tube can sustain its structural stability for strain values of 0.14 in elongation 
and 0.06 in compression at 900K; 0.08 in elongation and 0.03 in compression at 1800K, 
respectively. Beyond these points, bond-breakings between carbon atoms are observed at the 
strain values of 0.15 in elongation and 0.07 in compression, at 900 K (Figure 4b) and 0.09 in 
elongation and 0.04 in compression, at 1800K (Figure 5b). 
The stress-strain curves of the tube are given in Figure 6. at studied temperatures. Our results 
show that the temperature have a significant influence on the stress-strain behaviour of the 
tubes. The stress-strain curves are in the order of increasing temperatures between 300K-
900K. Stress value is increasing with increasing temperature . On the other hand between 
1200K-1800K the stress value decreases with increasing temperature. This is due to the 
smaller energy difference under tensile loading with respect to 300K-900K temperature range.  
This result can also be followed in the total energy changes observed in  Figures 2b , 4b and 
5b.    
Table1. gives a summary of the variations of the mechanical properties of (10,10) SWCNT 
with temperature. As given in Table1, elastic limit has the same value (0.10) in the 300K-
900K temperature range. It drops to 0.09 in the 1200K-1500K temperature range and to 0.08 
at 1800K. Young’s modulus, Poisson ratio and the tensile strength of the tube have been 
found to be sensitive to the temperature (Table 1.). Our calculated value at 300K is 0.401 TPa. 
It decreases to 0.370 TPa at 600K and to 0.352 TPa at 900K. In this temperature range 
Young’s modulus decreases 12 %. After 1200K as we increase the temperature to 1800K 
there is 3% increase in the Young’s modulus. We determined the tensile strength of (10,10) 
tube as 83.23 GPa at 300K. There is an abrupt decrease in tensile strength as we increase the 
temperature to 900K. Between 900K-1500K temperature range tensile strength does not 
change appreciably. At 1800K, it  drops to 43.78 GPa. We specified the Poisson ratio at 300K 
as 0.3. Between 300K-900K temperature range Poisson ratio increases to 0.339 (12.5 %). This 
corresponds to the increase in the radial reduction. As we increase the temperature to 1200K 
its value drops to 0.315 and  at 1800K to 0.289. We can conclude that for 20 layer (10,10) 
SWCNT in the 300K-900K temperature range : Young’s modulus, the tensile strengths are 
decreasing with increasing temperature while the Poisson ratio is increasing. At higher 
temperatures, Young’s modulus and the tensile strengths start to increase while the Poisson 
ratio decreases. In the 1200K-1800K temperature range, the SWCNT is already deformed and 
softened. Applying strain on these deformed and softened SWCNT do not follow the same 
pattern of 300K- 900K temperature range.  
4. Conclusion 
This paper reports for the first time the effect of temperature on the stress-strain curves, 
Young’s modulus, tensile strength, Poisson ratio and elastic limit of (10,10) SWCNT. Total 
energy per atom of the (10,10) tube increases with axial strain under elongation. We propose 
that SWCNTs do not have a high strength for compression as much as for elongation. This is 
due to the dominant behavior of repulsive forces in compression. At room temperature, the 
bond breaking strain values of the tube are 0.24 in elongation and 0.07 in compression. We 
showed that as the temperature increases, the disintegration of atoms from their places is 
possible at lower strain values (0.09 at 1800K) in elongation due to the thermal motion of 
atoms. But this is not the case for compression. For 20 layers SWCNT bond-breaking 
negative strain values are temperature independent between 300K-1500K temperature range. 
Bond breakings occurs at 0.07 compression  in this temperature range. When we increase the   
number of layers to 50, bond-breaking negative strain value decrease from 0.07 to 0.05 and 
remains the same in this temperature range. However, this is not a robust property for 
negative strains. When we decrease on the other  hand the layer size to 10; bond-breaking 
negative strain values vary with increasing temperature. We note that for short tubes the 
critical strain values for compressive deformations are dependent on the size of the employed 
supercell and therefore they are an artifact of the calculation. In literature, various critical 
strain values were mentioned for the tube deformations. Our room temperature critical strain 
values are in aggrement with the experimental results of [3] and the computational results of  
[7,10,12]. MD simulations of [21] determined the critical strain value of (10,10) tube as 0.27 
at 10K. To our knowledge the only reported temperature simulation study on tensile property 
comes from the MD simulation results of  [26]. They showed that the yield strain of (10,0) 
tube decreases from 0.18 to 0.05 for the temperature range increasing from 300K to 2400K. 
Our results follow the same trend such that the bond breaking strain values decrease with 
increasing temperature. In [20] the maximum compression strain of (10,10) tube is given as 
0.04 using Brenner potential without the mention of temperature, we obtained this value at 
1800K.  
We obtained the stress-strain curves in the temperature range between 300K-1800K. Our 
results show that the temperature have a significant influence on the stress-strain behavior of 
the tubes. (10,10) tube is brittle between 300K-900K and soft after 1200K.  The elastic limit 
decreased from 0.10 to 0.08 with increasing temperature. There is a wide range of values 
given in literature for Young’s modulus of SWCNTs due to the accuracy of the method and 
the choice of the wall thickness of the tube. The experimental results are in the range from 0.9 
TPa to 1.9 TPa [1,2], ab initio results are in the range from 0.5 TPa to 1.47 TPa [4-6], 
empirical results are in the range from 0.971 TPa to 0.975 TPa [8] and from 1.11 TPa to 1.258 
TPa [9], and also MD simulation results are in the range from 0.311 TPa to 1.48 TPa [13-28]. 
Our calculated value at 300K is 0.401 TPa is consistent with [4, 18,23]. We determined the 
tensile strength of (10,10) tube as 83.23 GPa at 300K and it decreases with increasing 
temperature. Maximum stress value of (10,10) tube is reported as 88 GPa in [10], 77 to 101 
GPa in [11]. At 300K,  we calculated the Poisson ratio of (10,10) tube as 0.3. This is in accord 
with the ab initio [5]; empirical [8,11], tight binding [18] results.  M.B.Nardelli et al. [24] 
showed that all tubes are brittle at high strains and low temperatures, while at low strains and 
high temperatures armchair nanotubes can be completely or partially ductile. Our findings 
agree with this (10,10) armchair SWCNT is brittle at low temperatures and ductile at higher 
temperatures. Contrary to [28] our extensive temperature study has shown that Young’s 
modulus changes with temperature.  
5. Comments 
Carbon nanotubes have the highest tensile strength of any material yet measured, with labs 
producing them at a tensile strength of 63 GPa, still well below their theoretical limit of 300 
GPa.  Carbon nanotubes are one of the strongest and stiffest materials known, in terms of their 
tensile stress and Young’s  modulus. This strength results from the covalent sp2 bonds formed 
between the individual carbon atoms. Our simulation study using the interactions between 
electrons and ions also predicts a similar tensile strength and also shows that when exposed to 
heat they still keep their tensile strength around this value until very high temperatures like 
1800K. CNTs are not nearly as strong under compression. Because of their hollow structure 
and high aspect ratio, they tend to undergo buckling when placed under compressive stress. 
The elastic limit is the maximum stress a material can undergo at which all strain are 
recoverable. (i.e., the material will return to its original size after removal of the stress). At 
stress levels below the elastic limit the material is said to be elastic.Once the material exceeds 
this limit, it is said to have undergone plastic deformation (also known as permanent 
deformation). When the stress is removed, some permanent strain will remain, and the 
material will be a different size. Our study shows that when the nanotube is exposed to heat 
this property does not change appreciably until 1800K. Through our tight-binding molecular 
dynamics simulation study we reported the high temperature positive/negative bond- breaking 
strain values and stress-strain curves of (10,10) SWCNTs . As far as we are aware, the strain 
energy values corresponding to positive/negative strain values at different temperatures are 
given here for the first time. We hope this extensive study of high temperature mechanical 
properties  will be useful for aerospace applications of CNTs.  
Acknowledgement 
The research reported here is supported through the Yildiz Technical University Research 
Fund Project No: 24-01-01-04. The calculations are performed at the Carbon Nanotubes 
Simulation Laboratory at the Department of Physics, Yildiz Technical University, Istanbul, 
Turkey.  
References 
[1] A. Krishnan, E. Dujardin, T.W. Ebbesen, P.N. Yianilos, M.M.J.Treacy, Phys. Rev. B 58, 
14013 (1998). 
[2] M.F. Yu, B.S. Files, S. Arepalli, R.S. Ruoff, Phys. Rev. Lett. 84, 5552 (2000). 
[3] D. Bozovic, M. Bockrath, J.H. Hafner, C.M. Leiber, H. Park, M. Tinkham, Phys. Rev. B    
67, 033407 (2003). 
[4] D.Sanchez-Portal, E. Artacho, J.M. Soler, A. Rubio, P. Ordejon, Phys. Rev. B 59, 12678 
(1999). 
[5] G. Zhou, W. Duan, B. Gu, Chem. Phys. Lett. 333, 344 (2001). 
[6] A. Pullen, G.L. Zhao, D. Bagayoko, L.Yang, Phys. Rev. B 71, 205410 (2005). 
[7] T.Dumitrica, T.Belytschko, B.I.Yakobson, Journal of Chem. Phys. 118, 9485 (2003). 
[8] J.P. Lu, Phys. Rev. Lett. 79, 1297 (1997). 
[9] S. Gupta, K.Dharamvir, V.K.Jindal, Phys. Rev. B 72, 165428 (2005). 
[10] T. Natsuki, M. Endo, Carbon 42, 2147 (2004). 
[11] X. Sun, W. Zhao, Mater. Sci. And Engineering A 390, 366 (2005). 
[12] J.R. Xiao, B.A. Gama, J.W. Gillespie Jr., Int. Journal of Solids and Structures 42, 3075 
(2005). 
[13] T. Ozaki, Y. Iwasa, T. Mitani, Phys. Rev. Lett 84, 1712 (2000). 
[14] B. Ni, S.B. Sinnott, P.T. Mikulski, J.A. Harrison,  Phys. Rev. Lett. 88, 205505 (2002). 
[15] L.G. Zhou, S.Q. Shi, Comp. Mater. Sci. 23, 166 (2002). 
[16] G. Dereli, C. Özdogan, Phys. Rev. B 67, 035416 (2003). 
[17] S. Ogata, Y. Shibutani, Phys. Rev. B 68, 165409 (2003). 
[18] C.Goze, L.Vaccarini, L. Henrard, P. Bernier, E. Hernandez, A.Rubio, Synthetic         
Metals, 103, 2500 (1999). 
[19] T.Xiao, K.Liao, Phys. Rev. B 66, 153407 (2002). 
[20] T.Xiao, X.Xu, Journal of Appl. Phys. 95, 8145 (2004). 
[21] M.Sammalkorpi, A.Krasheninnikov, A.Kuronen, K.Nordlund, K.Kaski, Phys. Rev. B 71, 
169906(E) (2005). 
[22] Y.Wang, X.Wang, X.Ni, H.Wu, Comput. Mater. Sci. 32, 141 (2005). 
[23] S.H.Yeak, T.Y.Ng, K.M.Liew, Phys. Rev. B 72, 165401 (2005). 
[24] M.B. Nardelli, B.I. Yakobson, J. Bernholc, Phys. Rev. Lett. 81, 4656 (1998). 
[25] N.R. Raravikar, P. Keblinski, A.M. Rao, M.S. Dresselhaus, L.S. Schadler, P.M. Ajayan,   
Phys. Rev. B 66, 235424 (2002). 
 [26] C. Wei, K. Cho, D. Srivastava, Phys. Rev. B 67, 115407 (2003). 
[27] C. Wei, K. Cho, D. Srivastava, Appl. Phys. Lett. 82, 2512 (2003). 
[28] Y.R. Jeng, P.C. Tsai, T.H. Fang, Journal of Physics and Chemistry of Solids 65, 1849  
(2004). 
[29] C. Özdoğan, G. Dereli, T. Çağın, Comp. Phys. Comm. 148, 188 (2002). 
[30] G. Dereli, C. Özdogan, Phys. Rev. B 67, 035415 (2003). 
[31] L.Colombo, Comput. Mater. Sci. 12, 278 (1998). 
[32] C.H. Xu,C.Z. Wang, C.T. Chan, K.M. Ho, J. Phys.:Cond. Matt. 4, 6047 (1992). 
Temperature (K) Elastic Limit Young’s Modulus
(TPa) 
Tensile Strength 
(GPa) 
Poisson Ratio 
300 0.10 0.401 83.23 0.300 
600 0.10 0.370 69.78 0.332 
900 0.10 0.352 67.62 0.339 
1200 0.09 0.360 67.33 0.315 
1500 0.09 0.356 68.14 0.320 
1800 0.08 0.365 43.78 0.289 
Table 1. High Temperature Mechanical Properties of (10,10) SWCNT 
Figure Captions 
Figure 1. “Color online” Total energy per atom curves as a function of strain at different 
temperatures (negative strain values correspond to compression). 
Figure 2 a. “Color online” (10,10) SWCNT is stable for the strains of 0.23  and -0.06 at 300K. 
Figure 2 b. “Color online” Bond- breakings are observed between the carbon atoms for the 
strains of 0.24 and -0.07 at 300K. System is not in equilibrium. 
Figure 3. “Color online” Bond- breaking strain variations as a function of temperature for a) 
tension, b) compression. 
Figure 4 a. “Color online” (10,10) SWCNT is stable for the strains of 0.14 and -0.06 at 900K. 
Figure 4 b. “Color online” Bond- breakings are observed between the carbon atoms for the 
strains of  0.15 and -0.07 at 900 K. System is not in equilibrium. 
Figure 5 a. “Color online” (10,10) SWCNT is stable for the strains of  0.08 and -0.03 at 
1800K. 
Figure 5 b. “Color online” Bond- breakings are observed between the carbon atoms for the 
strains of 0.09 and -0.04 at 1800K. System is not in equilibrium. 
Figure 6. “Color online” The stress-strain curves of (10,10) SWCNT at different temperatures. 
ABSTRACT
  This paper examines the effect of temperature on the structural stability and
mechanical properties of 20 layered (10,10) single walled carbon nanotubes
(SWCNTs) under tensile loading using an O(N) tight binding molecular dynamics
(TBMD) simulation method. We observed that (10,10) tube can sustain its
structural stability for the strain values of 0.23 in elongation and 0.06 in
compression at 300K. Bond breaking strain value decreases with increasing
temperature under streching but not under compression. The elastic limit,
Young's modulus, tensile strength and Poisson ratio are calculated as 0.10,
0.395 TPa, 83.23 GPa, 0.285, respectively, at 300K. In the temperature range
from 300K to 900K; Young's modulus and the tensile strengths are decreasing
with increasing temperature while the Poisson ratio is increasing. At higher
temperatures, Young's modulus starts to increase while the Poisson ratio and
tensile strength decrease. In the temperature range from 1200K to 1800K, the
SWCNT is already deformed and softened. Applying strain on these deformed and
softened SWCNTs do not follow the same pattern as in the temperature range of
300K to 900K.

<|endoftext|><|startoftext|>
arXiv:0704.0184v1  [astro-ph]  2 Apr 2007
Gamma-ray emitting AGN and GLAST
P. Padovani
European Southern Observatory, Karl-Schwarzschild-Str. 2, 85748 Garching bei München, Germany
Abstract. I describe the different classes of Active Galactic Nuclei (AGN) and the basic tenets of unified schemes. I then
review the properties of the extragalactic sources detected in the GeV and TeV bands, showing that the vast majority of them
belong to the very rare blazar class. I further discuss the kind of AGN GLAST is likely to detect, making some predictions
going from the obvious to the likely, all the way to the less probable.
Keywords: active galactic nuclei, radio sources, gamma-ray sources
PACS: 98.54.Cm, 98.54.Gr, 98.70.Dk, 98.70.Rz
THE ACTIVE GALACTIC NUCLEI ZOO
Active Galactic Nuclei (AGN) are extragalactic sources, in some cases clearly associated with nuclei of galaxies
(although generally the host galaxy light is swamped by the nucleus), whose emission is dominated by non-stellar
processes in some waveband(s).
Based on a variety of observations, we believe that the inner parts of AGN are not spherically symmetric and
therefore that emission processes are highly anisotropic [4, 28]. The current AGN paradigm includes a central engine,
almost certainly a massive black hole, surrounded by an accretion disk and by fast-moving clouds, which under the
influence of the strong gravitational field emit Doppler-broadened lines. More distant clouds emit narrower lines.
Absorbing material in some flattened configuration (usually idealized as a torus) obscures the central parts, so that
for transverse lines of sight only the narrow-line emitting clouds are seen and the source is classified as a so-called
"Type 2" AGN. The near-infrared to soft-X-ray nuclear continuum and broad-lines, including the UV bump typical
of classical quasars, are visible only when viewed face-on, in which case the object is classified as a "Type 1" AGN.
In radio-loud objects, which constitute ≈ 10% of all AGN, we have the additional presence of a relativistic jet, likely
perpendicular to the disk (see Fig. 1 of [28]).
This axisymmetric model of AGN implies widely different observational properties (and therefore classifications)
at different aspect angles. Hence the need for "Unified Schemes" which look at intrinsic, isotropic properties, to unify
fundamentally identical (but apparently different) classes of AGN. Seyfert 2 galaxies are though to be the "parent"
population of, and have been "unified" with, Seyfert 1 galaxies, whilst low-luminosity (Fanaroff-Riley type I [FR
I][7]) and high-luminosity (Fanaroff-Riley type II [FR II] ) radio galaxies have been unified with BL Lacs and radio
quasars respectively [28]. In other words, BL Lacs are thought to be FR I radio galaxies with their jets at relatively
small ( <
15− 20◦) angles w.r.t. the line of sight. Similarly, we believe flat-spectrum radio quasars (FSRQ) to be FR
II radio galaxies oriented at small ( <
15◦) angles, while steep-spectrum radio quasars (SSRQ) should be at angles in
between those of FSRQ and FR II’s (15◦ <∼ θ <∼ 40◦; a spectral index value αr = 0.5 at a few GHz [where fν ∝ ν−α ]
is usually taken as the dividing line between FSRQ and SSRQ). BL Lacs and FSRQ, that is radio-loud AGN with their
jets practically oriented towards the observer, make up the blazar class. Blazars, as I show below, play a very important
role in γ-ray astronomy and it is therefore worth expanding on their properties.
Blazars
Blazars are the most extreme variety of AGN. Their signal properties include irregular, rapid variability, high
polarization, core-dominant radio morphology (and therefore flat [αr <∼ 0.5] radio spectra), apparent superluminal
motion, and a smooth, broad, non-thermal continuum extending from the radio up to the γ-rays [28]. Blazar properties
are consistent with relativistic beaming, that is bulk relativistic motion of the emitting plasma at small angles to the line
of sight, which gives rise to strong amplification and collimation in the observer’s frame. Adopting the usual definition
of the relativistic Doppler factor δ = [Γ(1−β cosθ )]−1, Γ = (1−β 2)−1/2 being the Lorentz factor, β = v/c being the
Gamma-ray emitting AGN and GLAST November 4, 2018 1
http://arxiv.org/abs/0704.0184v1
0 10 20 30 40 50 60 70 80 90
FIGURE 1. The dependence of the Doppler factor on viewing angle. Different curves correspond to different Lorentz factors Γ.
The expanded scale on the inset shows the angles for which δ = 1.
ratio between jet speed and the speed of light, and θ the angle w.r.t. the line of sight, and applying simple relativistic
transformations, it turns out that the observed luminosity at a given frequency is related to the emitted luminosity in
the rest frame of the source via Lobs = δ pLem with p ∼ 2−3. For θ ∼ 0◦, δ ∼ 2Γ (Fig. 1) and the observed luminosity
can be amplified by factors 400 – 10,000 (for Γ ∼ 10 and p ∼ 2−3, which are typical values). That is, for jets pointing
almost towards us the emitted luminosity can be overestimated by up to four orders of magnitude. For more typical
angles θ ∼ 1/Γ, δ ∼ Γ and the amplification is ∼ 100− 1,000.
In a nut-shell, blazars can be defined as sites of very high energy phenomena, with bulk Lorentz factors up to Γ ≈ 30
[6] (corresponding to velocities ∼ 0.9994c) and photon energies reaching the TeV range (see below).
Given their peculiar orientation, blazars are very rare. Assuming that the maximum angle w.r.t. the line of sight
an AGN jet can have for a source to be called a blazar is ∼ 15◦, only ∼ 3% of all radio-loud AGN, and therefore
≈ 0.3% of all AGN, are blazars. For a ∼ 1− 10% fraction of galaxies hosting an AGN, this implies that only 1 out of
≈ 3,000− 30,000 galaxies is a blazar!
Blazar spectral energy distributions (SEDs) are usually explained in terms of synchrotron and inverse Compton
emission, the former dominating at lower energies, the latter being relevant at higher energies. Blazars have a large
range in synchrotron peak frequency, νpeak, which is the frequency at which the synchrotron energy output is maximum
(i.e., the frequency of the peak in a ν − ν fν plot). Although the νpeak distribution appears now to be continuous, it
is still useful to divide blazars into low-energy peaked (LBL), with νpeak in the IR/optical bands, and high-energy
peaked (HBL) sources, with νpeak in the UV/X-ray bands [21]. The location of the synchrotron peaks suggests in
fact a different origin for the X-ray emission of the two classes. Namely, an extension of the synchrotron emission
responsible for the lower energy continuum in HBL, which display steep (αx ∼ 1.5) X-ray spectra [29], and inverse
Compton (IC) emission in LBL, which have harder (αx ∼ 1) spectra [20]. This distinction applies almost only to
BL Lacs, as most known FSRQ are of the low-energy peak type and, therefore, with the X-ray band dominated by
Gamma-ray emitting AGN and GLAST November 4, 2018 2
inverse Compton emission. Very few “HFSRQ” (as these sources have been labelled), i.e., FSRQ with high (UV/X-ray
energies) νpeak are in fact known. Moreover, νpeak for all these sources (apart from one) appears to be ∼ 10−100 times
smaller than the values reached by BL Lacs (see [19] for a review).
THE GEV AND TEV SKIES
Before moving on to GLAST we need to assess the present status of the γ-ray sky. I do this first at GeV and then TeV
energies.
The third EGRET catalogue [15] includes 271 sources (E > 100 MeV), out of which 95 were identified as
extragalactic (including 28 lower confidence sources). Further work [17, 24, 25], which provided more identifications,
allows us to say that EGRET has detected at least ∼ 130 extragalactic sources (since a large fraction of sources is still
unidentified), all of them AGN apart from the Large Magellanic Cloud. Furthermore, all the AGN are radio-loud and
∼ 97% of them are blazars, with the remaining sources including a handful of radio galaxies (e.g., Centaurus A, NGC
6251). Most of the blazars are FSRQ, in a ratio ∼ 3/1 with BL Lacs. Finally, ∼ 80% of the BL Lacs are LBL and the
few HBL are all local (z < 0.12). As all of the FSRQ are also of the LBL type, ∼ 93% of EGRET detected blazars are
of the low-energy peak type.
The situation at TeV energies is at first order similar to that in the GeV band, with some significant differences.
All confirmed extragalactic TeV sources are radio-loud AGN and include 16 BL Lacs and one radio galaxy (M87)
(a starburst galaxy is also a possible TeV source) [18, 3]. That is, the blazar fraction is ∼ 94%. Unlike the GeV
band, however, no FSRQ is detected and all but one BL Lacs are HBL. This is due to the fact that in HBL the very
high-energy flux is higher than in LBL, as both peaks of the two humps in their SED are shifted to higher frequencies.
The fact that the GeV and TeV skies are dominated by blazars seems to be at odds with these sources being extremely
rare (see previous section). The explanation has to be found in the peculiar properties of the blazar class and rests on
the fact that blazars are characterized by:
1. high-energy particles, which can produce GeV and TeV photons;
2. relativistic beaming, to avoid photon-photon collision and amplify the flux;
3. strong non-thermal (jet) component.
Point 1 is obvious. We know that in some blazars synchrotron emission reaches at least the X-ray range, which
reveals the presence of high-energy electrons which can produce γ-rays via inverse Compton emission (although other
processes can also be important: e.g., [5]). Point 2 is vital, as otherwise in sources as compact as blazars all GeV
photons, for example, would be absorbed through photon-photon collisions with target photons in the X-ray band (see,
e.g., [16]). Beaming means that the intrinsic radiation density is much smaller than the observed one and therefore γ-
ray photons manage to escape from the source. The flux amplification in the observer’s frame makes also the sources
more easily detectable. Point 3 is also very important. γ-ray emission is clearly non-thermal (although we still do not
know for sure which processes are responsible for it) and therefore related to the jet component. The stronger the jet
component, the stronger the γ-ray flux.
GLAST AND AGN
We can know ask which (and how many) AGN GLAST will detect. This I describe in the following, in decreasing
order of "obviousness".
Blazars
Given that blazars are well know γ-ray sources, GLAST will certainly detect many flat-spectrum radio quasars and
BL Lacs. How many exactly depends on a variety of factors. These include blazar evolution and intrinsic number
density (which can to some extent be estimated from deep surveys in other bands), their duty cycle in the γ-ray band
(as we know that EGRET was detecting mostly sources in outburst), and their SED (see below). Finally, any prediction
will have not to violate the extragalactic γ-ray background.
Gamma-ray emitting AGN and GLAST November 4, 2018 3
To get an order of magnitude estimate, I make the following simple assumptions: a) EGRET has detected 130
blazars, which is likely to be a lower limit given the still unidentified sources; b) the number counts are Euclidean, that
is N(> S) ∝ S−1.5, where S is the flux density; this is a very likely upper limit as we know that, after the initial steep
rise, number counts of extragalactic sources tend to flatten out at lower fluxes; c) GLAST is 30 times more sensitive
than EGRET. The total number of blazars GLAST will detect over the whole sky is then <
20,000. This corresponds
to <∼ 0.5 objects/deg
2, which, interestingly enough, is the surface density of blazars down to ∼ 50 mJy at 5 GHz in the
Deep X-ray Radio Blazar Survey (DXRBS) [22]. Note also that by means of Monte Carlo simulations a value around
5,000 has been predicted all-sky (extrapolating from the high Galactic latitude value of [10]; see also [11]).
As discussed above, EGRET has detected very few blazars of the high-energy peak type (HBL). This is because the
EGRET band was sampling the "valley" between the two (synchrotron and IC) humps in their SED. A look at the SED
of some of the TeV detected HBL [1, 2, 27] shows that many, if not all, of them should be easily detected by GLAST.
Radiogalaxies
Unified schemes predict that the "parent" population of blazars is made up of radio-galaxies, a much more numerous
class (by a factor ≈ 30 for a dividing angle between the two classes ∼ 15◦). However, at large angles w.r.t. the
line of sight, jet emission is not only not-amplified but actually de-amplified. Fig. 1 shows that for typical Lorentz
factors δ < 1 for viewing angles >
20− 30◦. This implies that radio-galaxies on average are weaker sources (by
factors ≈ 1,000) than blazars, in all bands. And indeed, the handful of GeV/TeV-detected radio-galaxies are all local
(z < 0.02).
Large scale, that is kpc-scale jet emission, as opposed to the small, pc-scale, one, is also unlikely to be relevant in
the γ-ray band for the bulk of radio-galaxies [26, 23].
However, the radio-galaxy cause might not be totally lost. It has been proposed that blazar jets are structured or
decelerated. The first scenario [9], which ties in with Very Long Baseline Interferometry (VLBI) observations of limb
brightening [12], suggests the presence of a fast spine surrounded by a slower external layer. In the other case [8],
which tries to reconcile the low δ values from VLBI observations of TeV BL Lacs with the high values inferred from
SED modeling of the same sources, the jet is supposed to decelerate from a Lorentz factor Γ ∼ 20 down to Γ ∼ 5 over
a length of ∼ 0.1 pc. In both instances the presence of the two velocity fields implies that each of the two components
sees an enhanced radiation field produced by the other. The net result is that IC emission gets boosted and therefore the
GeV flux is higher than that predicted in the simpler case of an homogeneous jet (at the price of having a larger number
of free parameters). Assuming that the γ-ray/radio flux ratio observed for the three GeV/TeV-detected radio-galaxies
sources is typical, at least 10 3CR radio-galaxies should to be detected by GLAST [9].
Note that some Broad Line Radio Galaxies (BLRG), which are Type 1 sources in which the jet is at angles
intermediate between those of blazars and radio-galaxies, are also likely to be detected by GLAST [13, 14].
Radio-Quiet AGN
The large majority of AGN are of the radio-quiet type, that is they are characterized by very weak radio emission,
on average ∼ 1,000 times fainter than in radio-loud sources. Radio-quiet does not mean radio-silent, however, and the
nature of radio emission in these sources is still debated. Two extreme options ascribe it either to processes related to
star-formation (synchrotron emission from relativistic plasma ejected from supernovae) or to a scaled down version
of the non-thermal processes associated with energy generation and collimation present in radio-loud AGN. In the
latter case, one would expect also radio-quiet AGN to be (faint) γ-ray sources. Assuming their GeV flux to scale
roughly as the radio flux this would be, on average, a factor ≈ 30 below the GLAST detection limit. Detection might
be possible, however, for the (few) high core radio flux radio-quiet AGN. Even a negative detection, supported by
detailed calculations, could prove very valuable in constraining the nature of radio-emission in these sources.
Gamma-ray emitting AGN and GLAST November 4, 2018 4
SUMMARY
The main conclusions are as follows:
1. Blazars, even though they make up a small minority of AGN, dominate the γ-ray sky;
2. GLAST will certainly detect "many thousand" blazars, with the exact number being somewhat model dependent;
3. GLAST will most likely detect "many" high-energy peaked blazars, which have so far escaped detection at GeV
energies due to the fact that EGRET was sampling the "valley" between the two (synchrotron and IC) humps in
their spectral energy distribution;
4. GLAST will possibly detect a "fair" number of radio-galaxies;
5. GLAST might also detect some radio-quiet AGN, depending on the nature of their radio emission.
In any case, GLAST will constrain (radio-loud) AGN physics and populations, as described very well at this
conference!
ACKNOWLEDGMENTS
It is a pleasure to thank Paolo Giommi for useful discussions and Annalisa Celotti for reading the manuscript.
REFERENCES
1. J. Albert, et al., The Astrophysical Journal 648, L105–L108 (2006).
2. J. Albert, et al., The Astrophysical Journal 654, L199–L122 (2007).
3. J. Albert, et al., The Astrophysical Journal in press (2007) (arXiv:astro-ph/0703084).
4. R. Antonucci, Annual Review of Astronomy and Astrophysics 31, 473–521 (1993).
5. A. Celotti, these proceedings (2007).
6. M. H. Cohen, et al., The Astrophysical Journal in press (2007) (arXiv:astro-ph/0611642).
7. B. L. Fanaroff, and J. M. Riley, Monthly Notices of the Royal Astronomical Society 167, 31p–36p (1974).
8. M. Georganopoulos, E. S. Perlman, and D. Kazanas, The Astrophysical Journal 643, L33–L36 (2005).
9. G. Ghisellini, F. Tavecchio, and M. Chiaberge, Astronomy & Astrophysics 432, 401–410 (2005).
10. P. Giommi, and S. Colafrancesco, "Non-thermal Cosmic Backgrounds and prospects for future high-energy observations of
blazars" in Gamma-Wave 2005 in press (2007) (arXiv:astro-ph/0602243).
11. P. Giommi, these proceedings (2007).
12. M. Giroletti, et al., The Astrophysical Journal 600, 127–140 (2004).
13. P. Grandi, and G. Palumbo, The Astrophysical Journal in press (2007) (arXiv:astro-ph/0611342).
14. P. Grandi, and G. Palumbo, these proceedings (2007).
15. R. C. Hartman, et al., The Astrophysical Journal Supplement Series 123, 79–202 (1999).
16. L. Maraschi, G. Ghisellini, and A. Celotti, The Astrophysical Journal 397, L5–L9, (1992).
17. J. R. Mattox, R. C. Hartman, and O. Reimer, The Astrophysical Journal Supplement Series 135, 155–175 (2001).
18. D. Mazin, these proceedings (2007).
19. P. Padovani, "Blazar Sequence: Validity and Predictions" in The Multi-messenger approach to high energy gamma-ray sources
in press (2007) (arXiv:astro-ph/0610545).
20. P. Padovani, L. Costamante, P. Giommi, G. Ghisellini, A. Celotti, and A. Wolter, Monthly Notices of the Royal Astronomical
Society 347, 1282–1293 (2004).
21. P. Padovani, and P. Giommi, The Astrophysical Journal 444, 567–581 (1995).
22. P. Padovani, P. Giommi, H. Landt, and E. S. Perlman, The Astrophysical Journal in press (2007) (arXiv:astro-ph/0702740).
23. R. Sambruna, these proceedings (2007).
24. D. Sowards-Emmerd, R. W. Romani, and P. F. Michelson, The Astrophysical Journal 590, 109–122 (2003).
25. D. Sowards-Emmerd, R. W. Romani, P. F. Michelson, and J. S. Ulvestad, The Astrophysical Journal 609, 564–575 (2004).
26. Ł. Stawarz, M. Sikora, and M. Ostrowski, The Astrophysical Journal 597, 186–201 (2003).
27. F. Tavecchio, et al., The Astrophysical Journal 554, 725–733 (2001).
28. C. M. Urry, and P. Padovani, Publications of the Astronomical Society of the Pacific 107, 803–845 (1995).
29. A. Wolter, et al., Astronomy & Astrophysics 335, 899–911 (1998).
Gamma-ray emitting AGN and GLAST November 4, 2018 5
ABSTRACT
  I describe the different classes of Active Galactic Nuclei (AGN) and the
basic tenets of unified schemes. I then review the properties of the
extragalactic sources detected in the GeV and TeV bands, showing that the vast
majority of them belong to the very rare blazar class. I further discuss the
kind of AGN GLAST is likely to detect, making some predictions going from the
obvious to the likely, all the way to the less probable.

<|endoftext|><|startoftext|>
Introduction
Classical effective potentials reduce the quantum-mechanical interactions of electrons
and nuclei in a solid to an effective interaction between atom cores. This greatly
reduces the computational effort in molecular dynamics (MD) simulations. Whereas
first principles simulations are limited to a few hundred atoms at most, classical
MD calculations with many millions of atoms are routinely performed. Such system
sizes are possible, because molecular dynamics with short-range interactions scales
linearly with the number of atoms. Moreover, it can easily be parallelized using a
geometrical domain decomposition scheme [1, 2], thereby achieving linear scaling also
in the number of CPUs.
The study of many problems in materials science and nanotechnology indeed
requires simulations of systems with millions of atoms. Quite generally, this is the case
whenever long-range mechanical stresses are involved. Examples of such problems are
the study of fracture propagation [3], nano-indentation, or the motion and pinning
of dislocations. Other problems may be simulated with more moderate numbers of
atoms, but require very long simulated times, of the order of nanoseconds, an example
of which is the study of atomic diffusion [4]. In either case, if large systems and/or
long time scales are required, classical effective potentials are the only way to make
molecular dynamics simulations possible.
The reliability and predictive power of classical MD simulations depend cruicially
on the quality of the effective potentials employed. In the case of elementary solids,
such potentials are usually obtained by adjusting a few potential parameters to
http://arxiv.org/abs/0704.0185v2
mailto:p.brommer@itap.physik.uni-stuttgart.de
http://stacks.iop.org/ms/15/295
http://dx.doi.org/10.1088/0965-0393/15/3/008
Potfit: effective potentials from ab-initio data 2
optimally reproduce a set of reference data, which typically includes a number of
experimental values like lattice constants, cohesive energies, or elastic constants,
sometimes supplemented with ab-initio cohesive energies and stresses [5, 6]. In the
case of more complex systems with a large variety of local environments and many
potential parameters to be determined, such an approach cannot help, however; there
is simply not enough reference data available.
The force matching method [7] provides a way to construct physically justified
potentials even under such circumstances. The idea is to compute forces and energies
from first principles for a suitable selection of small reference systems and to adjust
the parameters of the potential to optimally reproduce them.
For that purpose, we developed a program called potfit‡. By separating the
process of optimization from the form of the potential, potfit allows for maximal
flexibility in the choice of potential model and parametrization.
The underlying algorithms are described in section 2. Section 3 focuses on
the implementation of the algorithms, followed by details on employing potfit in
section 4. We discuss advantages and limitations of the force matching method and
our implementation in section 5, and present our conclusions in the final section 6.
2. Algorithms
As mentioned above, potfit consists of two separate parts. The first one implements
a particular parametrized potential model and calculates from a set of potential
parameters ξi the target function that quantifies the deviations of the forces, stresses
and energies from the reference values. Wrapped around is a second, potential
independent part which implements a least squares minimization module. As this
part is completely independent of the potential model and just deals with the list
of parameters ξi, it is fairly straightforward to change the parametrization of the
potential (tabulated or analytic), or even to switch to a different potential model.
2.1. Optimization
From a mathematical point of view, force matching is a basic optimization problem:
There is a set of parameters ξi, a set of values bk(ξi) depending on them, and a
set of reference values b0,k which the bk have to match. This leads to the well-
known method of least squares, where one tries to minimize the sum of squares of
the deviations between the bk and the b0,k. In our case, the reference values can either
be the components of the force vector ~f0,j acting on each individual atom j, or global
data A0,k like stresses, energies, or certain external constraints. We found it helpful
to measure the relative rather than the absolute deviations from the reference data,
except for very small reference values. The least squares target function thus becomes
Z = ZF + ZC, (1)
with ZF =
α=x,y,z
(fjα − f0,jα)
0,j + εj
, (2)
and ZC =
(Ak −A0,k)
, (3)
‡ http://www.itap.physik.uni-stuttgart.de/%7Eimd/potfit
http://www.itap.physik.uni-stuttgart.de/%7Eimd/potfit
Potfit: effective potentials from ab-initio data 3
where ZF represents the contributions of the forces, and ZC that of the global data.
The (small and positive) εℓ impose a lower bound on the denominators, thereby
avoiding a too accurate fitting of small quantities which are actually not known to
such a precision. The Wℓ are the weights of the different terms. It proves useful for
the fitting to give the total stresses and the cohesion energies an increased weight,
although in principle they should be reproduced correctly already from the forces.
Even if all forces are matched with a small deviation only, those deviations can add
up in an unfortunate way when determining stresses, thus leading to potentials giving
wrong elastic constants. Including global quantities in the fit with a sufficiently high
weight supresses such undesired behaviour of the fitting process.
As the evaluation of the highly nonlinear target function (1) is computationally
rather expensive, a careful choice of the minimization method has to be made. We
chose a combination of a conjugate-gradient-like deterministic algorithm [8] and a
stochastic simulated annealing algorithm [9].
For the deterministic algorithm we take the one described by Powell [8], which
takes advantage of the form of the target function (which is a sum of squares). By
re-using data obtained in previous function calls it arrives at the minimum faster
than standard least squares algorithms. It also does not require any knowledge of the
gradient of the target function. The algorithm first determines the gradient matrix
at the starting point in the high-dimensional parameter space by finite differences.
The gradient matrix is assumed to be slowly varying around the starting point. A
new optimal search direction towards the minimum is determined by the method of
conjugate gradients. Then, the target function is minimized along this direction. This
operation is called line minimization. When the minimum is found, the direction unit
vector replaces one of the basis vectors spanning the parameter space. The gradient
matrix is updated only with respect to this new direction, using the finite differences
calculated in the line minimization. In this way, no finite differences have to be
calculated explicitly except in the very first step. The line minimization is performed
by Brent’s algorithm [10] in an implementation taken from the GNU Scientific Library
[11].
The algorithm is restarted (including a calculation of the full gradient matrix)
when either a step has been too large to maintain the assumption of a constant
gradient matrix, the basis vectors spanning the parameter space become almost
linearly dependent, or the linear equation involved in Powell’s algorithm cannot be
solved with satisfactory numerical precision.
The other minimization method implemented is a simulated annealing[12]
algorithm proposed by Corana [9]. While the deterministic algorithm mentioned above
will always find the closest local minimum, simulated annealing samples a larger part
of the parameter space and thus has a chance to end up in a better minimum. The
price to pay is a computational burdon which can be several orders of magnitude
larger.
For the basic Monte Carlo move, we chose adding Gaussian-shaped bumps to the
potential functions. The bump heights are normally distributed around zero, with
a standard deviation adjusted so that on average half of the Monte Carlo steps are
accepted. This assures optimal progress: Neither are too many calculations wasted
because the changes are too large to be accepted, nor are the steps too small to make
rapid progress.
Potfit: effective potentials from ab-initio data 4
2.2. Potential models and parametrizations
The simplest effective potential is a pair potential, which only depends on interatomic
distances. It takes the form
i,j<i
φsisj (rij), (4)
where rij is the distance between atoms i and j, and φsisj is a potential function
depending on the two atom types si and sj . This function can either be given in
analytic form, using a small number of free parameters, like for a Lennard-Jones
potential, or in tabulated form together with an interpolation scheme for distances
between the tabulation points. Whereas the parameters of an analytic potential can
often be given a physical meaning, such an interpretation is usually not possible
for tabulated potentials. On the other hand, an inappropriate form of an analytic
potential may severely constrain the optimization, leading to a poor fit. For this
reason, we chose the functions φ to be defined by tabulated values and spline
interpolation, thus avoiding any bias introduced by an analytic potential. This choice
results in a relatively high number of potential parameters, compared to an analytic
description of the potentials. This is not too big a problem, however. Force matching
provides enough reference data to fit even a large number of parameters. The potential
functions φ only need to be defined at pair distances r between a minimal distance
rmin and a cutoff radius rcut, where the function should go to zero smoothly.
We found pair potentials to be insufficient for the simulation of complex metallic
alloys. More suited are EAM (Embedded Atom Method [13, 5]) potentials, also
known as glue potentials [14], which have many advantages over pair potentials in
the description of metals [14].
EAM potentials include a many-body term depending on a local density ni:
i,j<i
φsisj (rij) +
Usi(ni) with ni =
j 6=i
ρsj (rij). (5)
ni is a sum of contributions from the neighbours through a transfer function ρsj , and
Usi is the embedding function that yields the energy associated with placing atom i at
a density ni. Again, all functions are specified by their values at a number of sampling
points.
The parameters ξi specifying a tabulated potential are naturally the values at the
sampling points. Due to the nature of spline interpolation, either the gradient or the
curvature at the exterior sampling points of each function can also be chosen freely.
Depending on the type of potential one can keep the gradients fixed, or adapt them
dynamically by adding them to the set of parameters ξi.
The EAM potential described by (5) has two gauge degrees of freedom, i.e., two
sets of parameter changes which do not alter the physics of the potential:
ρs(r) → κρs(r),
Usi(ni) → Usi(
φsisj (r) → φsisj (r) + λsiρsj (r) + λsjρsi(r),
Usi(ni) → Usi(ni)− λsini.
According to (6), the units of the density ni can be chosen arbitrarily. We use this
degree of freedom to set the units such that the densities ni computed for the reference
Potfit: effective potentials from ab-initio data 5
configurations are contained in the interval (−1; 1], but not in any significantly smaller
interval. The transformation (7) states that certain energy contributions can be moved
freely between the pair and the embedding term. An embedding function U which is
linear in the density n can be gauged away completely. This also makes any separate
interpretation of the pair potential part and the embedding term void; the two must
only be judged together. The latter degeneracy is usually lifted by choosing the
gradients of the Ui(ni) to vanish at the average density for each atom type. potfit also
uses this convention when exporting potentials for plotting and MD simulation. As
the average density might change during minimization, potfit internally uses a slightly
different gauge: It requires that the gradient vanishes at the center of the domain of
the respective embedding function.
potfit can perform the transformations (6,7) periodically on its own, thus
eliminiating the need to fix the gauge by an additional term in the target function
(1). Unfortunately, for tabulated functions the transformations cannot be performed
exactly due to the nature of spline interpolation. A change of gauge therefore can
lead to an increase of the target function, which is why we suppress such gauge
transformations in the very late stages of a minimization.
3. Implementation
potfit is implemented in ANSI C. While the user may specify most options in a
parameter file read when running the program, some fundamental choices must be
made at compile time, like for example the potential model used, or whether to allow
for automatic gauge transformations in EAM potentials. This is a compromise between
convenience and computation speed. Compile time options can be selected by passing
them to the make command, and thus do not require any changes of the source files.
For solving the linear equations in Powell’s minimization algorithm, potfit makes use of
routines from the LAPACK library [15], which must be installed separately, probably
together with the BLAS library [16] LAPACK is based on.
3.1. Parallelization and optimization
The program spends almost all CPU time in calculating the forces for a given potential;
finding a new potential to be tested against the reference data takes only a tiny
fraction of that time. Thus, the only way to improve performance is to reduce the
total time needed for the force computations, either by minimizing their number, or by
making each force computation faster. Powell’s algorithm leaves only little room for
further reduction of the number of force evaluations. One could for instance adjust the
precision required in a line minimization. If the tolerance is too small, time is wasted
in refining a minimum beyond need, whereas an insufficent precision may stop too far
from the minimum, thus requiring more steps in total. The choice of this tolerance
was made empirically.
Much more time can be saved by parallelizing the calculation of forces, energies,
and stresses for a given potential. This is done in a straightforward way: As
the forces, energies, and stresses of the different reference configurations can be
computed independently, we simply distribute the reference configurations on several
processes. Before the force computation, the potential parameters are distributed to all
processes, and afterwards the computed forces, energies and stresses are collected. The
communication is performed using the standard Message Passing Interface (MPI [17]).
Potfit: effective potentials from ab-initio data 6
This simple parallelization scheme works well as long as the number of configurations
per process does not drop below 10 to 15. Otherwise, the communication overhead
starts to show up, and load balancing problems may appear. A shared memory
OpenMP parallelization also exists, but produces inferior results.
In force matching, the reference configurations stay fixed. Therefore, all distances
between atoms remain fixed, and potfit can use neighbour lists, which need to be
computed only once at startup. In fact, for each neighbour pair all data required
for spline interpolation are pre-computed, allowing for a fast lookup of the tabulated
functions. This data needs to be recomputed only when the tabulation points of a
function are changed.
3.2. Input and output files
Tabulated potential functions can be specified with equidistant or with arbitrary
tabulation points. For equidistant tabulation points, the boundaries of the domain
and the number of sampling points of each function are read from the potential file,
followed by a list of function values at the sampling points and the gradients at the
domain boundaries. In the case of free tabulation points, only their number is specified
at the beginning of the potential file, followed by a list of argument-value pairs and
again the gradients of the potential functions at the domain boundaries.
Reference configuration files contain the number of atoms, the box vectors, the
cohesive energy, and the stresses on the unit cell, followed by a list of atoms, with
atom species, position and reference force for each atom. Such reference configuration
files can simply be concatenated.
potfit was designed to cooperate closely with the first-principles code VASP
[18, 19] and with IMD [20], our own classical MD code. VASP, which is a plane
wave code implementing ultrasoft pseudopotentials and the Projector-Augmented
Wave (PAW) method [21, 22], is used to compute the reference data for the force
matching, whereas the resulting potentials are intended to be used with IMD. For this
reason, potfit provides import and export filters for potentials and configurations to
communicate with these programs. These filters are implemented as scripts, which
can easily be modified to interface with other programs.
4. Results and validation
As a first test, potfit should be able to recover a classical potential from reference data
computed with that potential. For this test, we used snapshots from several molecular
dynamics runs as reference structures, first for a Lennard-Jones fcc solid, then for a
complex Ni-Al alloy simulated with EAM potentials [23]. In order to ensure that all
reference data presented to potfit is consistent, the potentials were approximated by
cubic spline polynomials, in the same way as potfit represents the potentials. With
such reference data and starting with vanishing potential functions, potfit could in both
cases perfectly recover the potentials. This test therefore demonstrates the correctness
of the program. One should keep in mind, however, that reference data from ab-inito
computations often cannot be reproduced perfectly by any classical potential.
Our primary research interest are quasicrystals [24] and other complex metal
alloys, for which good potentials are hardly available. potfit has been developed in
order to generate effective potentials for such complex metal alloys, which feature large
(or even infinite) unit cells, several atom species, and a wide variety of different local
Potfit: effective potentials from ab-initio data 7
environments. So far, force matching had been used mainly to determine potentials
for monoatomic metals and a small selection of relatively simple binary alloys.
As a first application beyond simple alloys, we have developed potentials for the
quasicrystalline and nearby crystalline phases in the systems Al-Ni-Co, Ca-Cd, and
Mg-Zn. Due to the complexity of the structures and also due to the choice of tabulated
potential functions, a relatively large number of potential parameters is required. This
is especially true for ternary EAM potentials, which comprise 12 tabulated functions,
with 10–15 tabulation points each. Correspondingly, a relatively large amount of
reference data is required. A computationally efficient implementation of the force
matching method is therefore essential. It turned out that potfit scales well under
those circumstances and is up to its task.
Although the potentials to be generated are intended for (aperiodic) quasicrystals
and crystals with large unit cells, all reference structures have to be periodic crystals
with unit cell sizes suitable for the ab-initio computation of the reference data. On
the other hand, the reference structures should approximate the quasicrystal in the
sense, that all their unit cells together accommodate all relevant structural motifs.
To do so, they must be large enough. For instance, the quasicrystalline and related
crystalline phases of Ca-Cd and Mg-Zn consist of packings of large icosahedral clusters
in different arrangements. Reference structures must be able to accomodate such
clusters. A further constraint is, that the unit cell diameter must be larger than the
range of potentials. We found that reference structures with 80–200 atoms represent
a good compromise between these requirements.
Starting from a selection of basic reference structures, further ones were obtained
by taking snapshots of MD simulations with model potentials at various temperatures
and pressures. Also samples which were strained in different ways were included.
For all these reference structures, the ab-initio forces, stresses and energies werde
determined with VASP, and a potential was fitted to reproduce these data. As
reference energy, the cohesive energy was used, i.e., the energies of the constituent
atoms was subtracted from the VASP energies. Instead of absolute cohesive energies
one can also use the energy relative to some reference structure. Once a first version of
the fitted potential was available, the MD snapshots were replaced or complemented
with better ones obtained with the new potential, and the procedure was iterated.
As expected, no potential could be found which would reproduce the reference
data exactly. During the optimization, the target function (1) does not converge
to zero, which indicates that quantum mechanical reality (taking density functional
theory as reality) is not represented perfectly by the potential model used. The forces
computed from the optimal potential typically differ by about 10% from the reference
forces, which seems acceptable. For the energies and stresses a much higher agreement
could be reached. Cohesion energy differences for instance can be reproduced with an
accuracy better than 1%.
The generated potentials were then used in molecular dynamics simulations to
determine various material properties, such as the melting temperature and the elastic
constants, for which values consistent with experiment were obtained. The Ca-Cd
potentials were especially tuned towards ground-state like structures, whose energies
are reproduced with high accuracy, in agreement with ab-initio results. Details of
these applications can be found in [4, 25, 26, 27]. Probably the best tested EAM
potential constructed with potfit was obtained by Rösch, Trebin and Gumbsch [28].
This potential is intended for the simulation of crack propagation in the C15 Laves
Phase of NbCr2, and has undergone a broad validation. These authors calculated the
Potfit: effective potentials from ab-initio data 8
lattice constant, the elastic constants and the melting temperature and compared
these values to experimental and ab-initio results with reasonable success. They
also studied relaxation of surface atoms, surface energy and the crack propagation
in NbCr2. According to the authors [28], the force-matched potentials created with
potfit clearly outperform previously published potentials. But this example also shows
[28] that a large number of fitting-validation cycles are usually required, before a usable
and satisfactory potential is obtained. This makes force matching a time-consuming
and tedious process.
5. Discussion
5.1. Transferability
It should be kept in mind that force-matched potentials will only work well in
situations they have been trained to. Therefore, all local environments that might
occur in the simulation should also be present in the set of reference configurations.
Otherwise the results may not be reliable. Using a very broad selection of reference
configurations will make the potential more transferable, making it usable for many
different situations, e.g. for different phases of a given alloy. On the other hand,
giving up some transferability may lead to a higher precision in special situations. By
carefully constraining the variety of reference structures one may generate a potential
that is much more precise in a specific situation than a general purpose potential,
which was trained on a broader set of reference structures. The latter potential, on the
other hand, will be more versatile, but less accurate on average. Finding sufficiently
many suitable reference structures might not always be trivial. For certain complex
structure like quasicrystals, there may be only very few (if any) approximating periodic
structures with small enough unit cells.
5.2. Optimal number and location of sampling points
Each reference database has an optimum number of parameters it can support. Using
too few parameters, the potential functions lack flexibility. On the other hand,
exceeding this number may lead to overfitting beyond the limit of the potential
model. potfit cannot determine that optimal number automatically, but there is a
simple strategy the user can employ. The set of reference configurations is split in
two subsets, one of which is used for fitting and the other for testing the potential.
If the root-mean-square (rms) deviation of the test set significantly exceeds that of
the fitting set, the database is probably overfitted [29]. By starting with a relatively
low number of parameters, that is increased as long as the rms of the testing stage
decreases, one can arrive at the optimal number of parameters [30].
This strategy also helps in dealing with oscillatory artefacts of the spline
interpolation: If the sampling points are not spaced too densely, and there is enough
data to support each tabulation point, artificial wiggles are suppressed. potfit provides
the frequency with which each tabulation interval is accessed during an evaluation of
the target function (1). With this information, sampling point density can be reduced
for distances that do not appear frequently enough in the reference configurations.
Potfit: effective potentials from ab-initio data 9
5.3. Number of atom types and choice of reference structures
The most obvious impact of an increasing number of atom types is the corresponding
increase in the number of potential parameters. For instance, an EAM potential
for n atom types requires n(n + 1)/2 + 2n tabulated functions, each with 10 to 15
tabulation points. For a ternary system, this already amounts to the order of 150
potential parameters. Whereas such a number of parameters can still be handled, an
increasing number of atom types leads to yet another problem, which is more serious.
To see this, it must be kept in mind that any potential function depending on the
interatomic distance must be determined for the entire argument range between rmin
and rcut. If tabulated functions are used, for each tabulation interval there must be
distances actually occuring in the reference structures, for otherwise there are potential
parameters which do not affect the target function, and which conseqently cannot be
determined in the fit. The requirement that all distances for all combinations of atom
types actually occur in the reference structures becomes especially problematic if the
atoms of one type form only a small minority, in which case some distances between
such atoms might be completely absent in all reasonable reference structures. If the
number of atom types is large, there is unfortunately always at least one element
which is a minority constituent. In such situations it might be unavoidable to use a
much broader selection of reference structures with varying stoichiometry, instead of a
fixed stoichiometry with a minority constituent. It might even be necessary to include
energetically less favourable configurations to provide a complete set of reference data.
Another solution would be to use a non-local (or less local) parametrisation of
the potential functions, like a superposition of broad gaussians or functions given by
analytic formulae. Changing one parameter can then affect the function over a broader
range of arguments, making it again possible to fit the function even if only sparse
information on it is provided by the reference data. Potentials represented in this way
would also not suffer from the wiggle artefacts of spline interpolation described above.
5.4. Experimental values as reference data
potfit does currently not use experimental data during force matching. The potentials
are determined exclusively from ab-initio data, which means they cannot exceed the
accuracy of the first principles calculations. While it is possible, in principle, to support
also the comparison to experimental values, we decided against such an addition.
For once, available experimental values can often not be calculated directly from the
potentials, so determining them would considerably slow down the target function (1)
evaluation. Secondly, experimental values often also depend on the exact structure
of the system, which in most cases is not completely known beforehand for complex
structures, for instance due to fractional occupancies in the experimentally determined
structure model. A better way to use experimental data is to test whether the newly
generated potentials lead to structures that under MD simulation show the behaviour
known from experiment.
6. Conclusion
Large scale molecular dynamics simulations are possible only with classical effective
potentials, but for many complex systems physically justified potentials do not exist so
far. Our program potfit allows the generation of effective potentials even for complex
Potfit: effective potentials from ab-initio data 10
binary and ternary intermetallics, adjusting them to ab-initio determined reference
data using the force matching method. Potentials for several complex intermetallic
compounds have been generated, and were successfully used in molecular dynamics
studies of various properties [4, 25, 26, 28]. It should be emphasized, however, that
constructing potentials is still tedious and time-consuming. Potentials have to be
thoroughly tested against quantities not included in the fit. In this process, candiate
potentials often need to be rejected or refined. Many iterations of the fitting-validation
cycle are usually required. It takes experience and skill to decide when a potential is
finished and ready to be used for production, and for which conditions and systems
it is suitable. potfit is only a tool that assists in this process. Flexibility and easy
extensibility was one of the main design goals of potfit. While at present only pair
and EAM potentials with tabulated potential functions are implemented in potfit, it
would be easy to complement these by other potential models, or to add support for
differently represented potential functions.
Acknowledgments
This work was funded by the Deutsche Forschungsgemeinschaft through Collaborative
Research Centre (SFB) 382, project C14. Special thanks go to Stephen Hocker and
Frohmut Rösch for fruitful discussion and feedback, and to Hans-Rainer Trebin for
supervising the thesis work of the first author.
References
[1] Allen M P and Tildesley D J 1987 Computer Simulation of Liquids, Oxford Science Publications,
(Oxford: Clarendon)
[2] Beazley D M, Lohmdahl P S, Grønbech-Jensen N, Giles R and Tamayou P 1995 Parallel
algorithms for short-range molecular dynamics vol III of Annual Reviews of Computational
Physics (Singapore: World Scientific) pp 119–175 ISBN 981–02–2427–3
[3] Rösch F, Rudhart C, Roth J, Trebin H R and Gumbsch P 2005 Phys. Rev. B 72 014128
[4] Hocker S, Gähler F and Brommer P 2006 Phil. Mag. 86(6–8) 1051–1057
[5] Daw M S, Foiles S M and Baskes M I 1993 Mater. Sci. Rep. 9(7–8) 251–310
[6] Chantasiriwan S and Milstein F 1996 Phys. Rev. B 53(21) 14080–14088
[7] Ercolessi F and Adams J B 1994 Europhys. Lett. 26(8) 583–588
[8] Powell M J D 1965 Comp. J. 7(4) 303–307
[9] Corana A, Marchesi M, Martini C and Ridella S 1987 ACM Trans. Math. Soft. 13(3) 262–280
[10] Brent R P 1973 Algorithms for minimization without derivatives Prentice-Hall series in
automatic computation (Englewood Cliffs, NJ: Prentice-Hall) ISBN 0–13–022335–2
[11] Galassi M, Davies J, Theiler J, Gough B, Jungman G, Booth M and Rossi F 2005 GNU Scientific
Library Reference Manual - Revised Second Edition (Bristol: Network Theory Ltd) ISBN 0–
9541617–3–4
[12] Kirkpatrick S, Gelatt C D and Vecci M P 1983 Science 220(4598) 671–680
[13] Daw M S and Baskes M I 1984 Phys. Rev. B 29(12) 6443–6453
[14] Ercolessi F, Parrinello M and Tosatti E 1988 Phil. Mag. A 58(1) 213–226
[15] Anderson E, Bai Z, Bischof C, Blackford S, Demmel J, Dongarra J, Du Croz J, Greenbaum A,
Hammarling S, McKenney A and Sorensen D 1999 LAPACK Users’s Guide, Third Edition
(Philadelphia, PA: Society for Industrial and Applied Mathematics) ISBN 0–89871–447–8
[16] Lawson C L, Hanson R J, Kincaid D R and Krogh F T 1979 ACM Trans. Math. Soft.
5(3) 308–323
[17] Gropp W, Lusk E and Skjellum A 1999 Using MPI - 2nd Edition (Cambridge, MA: MIT Press)
ISBN 0–262–57132–3
[18] Kresse G and Hafner J 1993 Phys. Rev. B 47(1) 558–561
[19] Kresse G and Furthmüller J 1996 Phys. Rev. B 54(16) 11169–11186
[20] Stadler J, Mikulla R and Trebin H R 1997 Int. J. Mod. Phys. C 8(5) 1131–1140
[21] Blöchl P E 1994 Phys. Rev. B 50(24) 17953–17979
http://dx.doi.org/10.1103/PhysRevB.72.014128
http://dx.doi.org/10.1080/14786430500259734
http://dx.doi.org/10.1016/0920-2307(93)90001-U
http://dx.doi.org/10.1103/PhysRevB.53.14080
http://dx.doi.org/10.1145/29380.29864
http://dx.doi.org/10.1126/science.220.4598.671
http://dx.doi.org/10.1103/PhysRevB.29.6443
http://dx.doi.org/10.1145/355841.355847
http://dx.doi.org/10.1103/PhysRevB.47.558
http://dx.doi.org/10.1103/PhysRevB.54.11169
http://dx.doi.org/10.1142/S0129183197000990
http://dx.doi.org/10.1103/PhysRevB.50.17953
Potfit: effective potentials from ab-initio data 11
[22] Kresse G and Joubert D 1999 Phys. Rev. B 59(3) 1758–1775
[23] Ludwig M and Gumbsch P 1995 Modelling Simul. Mater. Sci. Eng. 3(4) 533–542
[24] Trebin H R, ed 2003 Quasicrystals. Structure and Physical Properties (Weinheim: Wiley-VCH)
[25] Brommer P and Gähler F 2006 Phil. Mag. 86(6–8) 753–758
[26] Mihalkovič M and Widom M 2006 Phil. Mag. 86(3–5) 519–527
[27] Brommer P, Gähler F and Mihalkovič M 2007 Phil. Mag. (at press)
[28] Rösch F, Trebin H R and Gumbsch P 2006 Int. Journal Fracture 139(3–4) 517–526
[29] Robertson I J, Heine V and Payne M C 1993 Phys. Rev. Lett. 70(13) 1944–1947
[30] Mishin Y, Farkas D, Mehl M J and Papaconstantopoulos D A 1999 Phys. Rev. B
59(5) 3393–3407
http://dx.doi.org/10.1103/PhysRevB.59.1758
http://dx.doi.org/10.1088/0965-0393/3/4/008
http://dx.doi.org/10.1080/14786430500333349
http://dx.doi.org/10.1080/14786430500333356
http://dx.doi.org/10.1007/s10704-006-0065-8
http://dx.doi.org/10.1103/PhysRevLett.70.1944
http://dx.doi.org/10.1103/PhysRevB.59.3393
	Introduction
	Algorithms
	Optimization
	Potential models and parametrizations
	Implementation
	Parallelization and optimization
	Input and output files
	Results and validation
	Discussion
	Transferability
	Optimal number and location of sampling points
	Number of atom types and choice of reference structures
	Experimental values as reference data
	Conclusion
ABSTRACT
  We present a program called potfit which generates an effective atomic
interaction potential by matching it to a set of reference data computed in
first-principles calculations. It thus allows to perform large-scale atomistic
simulations of materials with physically justified potentials. We describe the
fundamental principles behind the program, emphasizing its flexibility in
adapting to different systems and potential models, while also discussing its
limitations. The program has been used successfully in creating effective
potentials for a number of complex intermetallic alloys, notably quasicrystals.

<|endoftext|><|startoftext|>
Introduction
Cosmological observations have provided the strong evidence that the Universe
is flat and its energy density is dominated by the dark energy component whose
negative pressure causes the cosmic expansion to accelerate.1 In order to clarify the
origin of the dark energy, one has tried to understand the connection of the dark
energy with particle physics.
In the Mass Varying Neutrinos (MaVaNs) scenario proposed by Fardon, Nelson
and Weiner, relic neutrinos could form a negative pressure fluid and cause the
present cosmic acceleration.2 In the model, an unknown scalar field, which is called
“acceleron”, is introduced and neutrinos are assumed to interact through a new
scalar force. The acceleron sits at the instantaneous minimum of its potential, and
the cosmic expansion only modulates this minimum through changes in the number
density of neutrinos. Therefore, the neutrino mass is given by the acceleron, in
other words, it depends on the number density of neutrinos and changes with the
expansion of the Universe. The equation of state parameterw and the energy density
of the dark energy also evolve with the neutrino mass. Those evolutions depend
∗talked at the International Workshop on Neutrino Masses and Mixings, University of Shizuoka,
Shizuoka, Japan, December 17-19, 2006
http://arxiv.org/abs/0704.0186v1
October 31, 2018 9:21 WSPC/INSTRUCTION FILE shizuoka
2 Ryo Takahashi
on the form of a scalar potential and the relation between the acceleron and the
neutrino mass strongly. Some examples of the potential have been considered.3
The idea of the variable neutrino mass was considered at first in a model of
neutrino dark matter and was discussed for neutrino clouds.4 Interacting dark en-
ergy scalar with neutrinos was considered in the model of a sterile neutrino.5 The
coupling to the left-handed neutrino and its implication on the neutrino mass limit
from baryogenesis was discussed.6 In the context of the MaVaNs scenario, there
have been a lot of works. 7,8,9,10,11,12
In this talk, we present a MaVaNs model including the supersymmetry breaking
effect mediated by the gravity. Then we show evolutions of the neutrino mass and
the equation of state parameter in the model.
2. MaVaNs Model in Supersymmetric Theory
We discuss the Mass Varying Neutrinos scenario in supersymmetric theory and
present a model.
We assume a chiral superfield A in dark sector. A is assumed to be a singlet
under the gauge group of the standard model. It is difficult to construct a viable
MaVaNs model without fine-tunings in some parameters when one assumes one
chiral superfield in dark sector, which couples to only the left-handed lepton doublet
superfield. 8 Therefore, we assume that the superfield A couples to both the left-
handed lepton doublet superfield L and the right-handed neutrino superfield R. For
simplicity, we consider the MaVaNs scenario in one generation of neutrinos.a
In such framework, we suppose the following superpotential,
AA+mDLA+MDLR+
RR, (1)
where λ is a coupling constant of O(1) and MA, MD, MR and mD are mass pa-
rameters.b The scalar and the spinor component of A are represented by φ and ψ,
respectively. The scalar component corresponds to the acceleron which cause the
present cosmic acceleration. The spinor component is a sterile neutrino. The third
term of the right-hand side in Eq. (1) is derived from the Yukawa coupling such as
yLAH with y < H >= mD, where H is the Higgs doublet.
In the MaVaNs scenario, the dark energy is assumed to be composed of the
neutrinos and the scalar potential for the acceleron. Therefore, the energy density
of the dark energy is given as
ρDE = ρν + V (φ). (2)
Since only the acceleron potential contributes to the dark energy, we assume the
vanishing vacuum expectation values of sleptons, and thus we find the following
aThree generations model of this scenario has presented in non supersymmetric theory.9
bOther supersymmetric model so called “hybrid model” has been proposed.10
October 31, 2018 9:21 WSPC/INSTRUCTION FILE shizuoka
Dark energy and neutrino model in SUSY– Remarks on active and sterile neutrinos mixing – 3
effective scalar potential,
V (φ) =
|φ|4 +M2A|φ|
2 +m2D|φ|
2. (3)
We can write down a lagrangian density from Eq. (1),
L = λφψψ +MAψψ +mDνLψ +MDνLνR +MRνRνR + h.c.. (4)
It is noticed that the lepton number conservation in the dark sector is violated
because this lagrangian includes both MAψψ and mDνLψ. After integrating out
the right-handed neutrino, the effective neutrino mass matrix is given by
mD MA + λφ
, (5)
in the basis of (νL, ψ), where c ≡ −M
D/MR and we assume λφ≪ MD ≪MR. The
first term of the (1, 1) element of this matrix corresponds to the usual term given
by the seesaw mechanism in the absence of the acceleron. We obtain masses of the
left-handed and a sterile neutrino as follows,
mνL =
c+MA + λ < φ >
[c− (MA + λ < φ >)]2 + 4m
, (6)
c+MA + λ < φ >
[c− (MA + λ < φ >)]2 + 4m
. (7)
It is remarked that only the mass of a sterile neutrino is variable in the case of
the vanishing mixing (mD = 0) between the left-handed and a sterile neutrino on
cosmological time scale. The finite mixing (mD 6= 0) makes the mass of the left-
handed neutrino variable. We will consider these two cases of mD = 0 and mD 6= 0
later.
In the MaVaNs scenario, there are two constraints on the scalar potential. The
first one comes from cosmological observations. It is that the magnitude of the
present dark energy density is about 0.74ρc. ρc is the critical density. Thus, the
first constraint turns to
V (φ0) = 0.74ρc − ρ
ν , (8)
where “0” means the present value.
The second one is the stationary condition:
∂V (φ)
= 0. (9)
In this scenario, the neutrino mass is represented by a function of the acceleron;
mν = f(φ). Since the energy density of the neutrino varies on cosmological times
scale, the vacuum expectation value of the acceleron also varies. This property
makes the neutrino mass variable. If ∂mν/∂φ 6= 0, Eq. (9) is equivalent to
∂V (φ(mν))
= 0. (10)
October 31, 2018 9:21 WSPC/INSTRUCTION FILE shizuoka
4 Ryo Takahashi
Eq. (10) is rewritten by using the cosmic temperature T :
∂V (φ)
= −T 3
∂F (ξ)
, (11)
where ξ ≡ mν/T , ρν = T
4F (ξ) and
F (ξ) ≡
y2 + ξ2
ey + 1
. (12)
We can get the time evolution of the neutrino mass from Eq. (11). Since the sta-
tionary condition should be always satisfied in the evolution of the Universe, this
one at the present epoch is the second constraint on the scalar potential:
∂V (φ)
mν=m0ν
= −T 3
∂F (ξ)
mν=m0ν ,T=T0
. (13)
In addition to two constraints for the potential, we also have two relations between
the vacuum expectation value of the acceleron and the neutrino masses at the
present epoch:
m0νL =
c+MA + λ < φ >
[c− (MA + λ < φ >0)]2 + 4m
, (14)
m0ψ =
c+MA + λ < φ >
[c− (MA + λ < φ >0)]2 + 4m
. (15)
Next, let us consider the dynamics of the acceleron field. In order that the
acceleron does not vary significantly on distance of inter-neutrino spacing, the ac-
celeron mass at the present epoch must be less than O(10−4eV) 2. Here and below,
we fix the present acceleron mass as
m0φ = 10
−4 eV. (16)
Once we adjust parameters which satisfy five equations (8) and (13)∼(16), we can
have evolutions of the neutrino masses by using the Eq. (11).
The dark energy is characterized by the evolution of the equation of state pa-
rameter w. The equation of state is derived from the energy conservation law and
the stationary condition Eq. (11):
w + 1 =
[4− h(ξ)]ρν
, (17)
where
h(ξ) ≡
∂F (ξ)
F (ξ)
. (18)
It seems that w in this scenario depend on the neutrino mass and the cosmic
temperature. This means that w varies with the evolution of the Universe unlike
the cosmological constant.
October 31, 2018 9:21 WSPC/INSTRUCTION FILE shizuoka
Dark energy and neutrino model in SUSY– Remarks on active and sterile neutrinos mixing – 5
In the last of this section, we comment on the hydrodynamical stability of the
dark energy in the MaVaNs scenario. The speed of sound squared in the neutrino-
acceleron fluid is given by
c2s =
ẇρDE + wρ̇DE
, (19)
where pDE is the pressure of the dark energy. Recently, it was argued that when
neutrinos are non-relativistic, this speed of sound squared becomes negative in this
scenario.11 The emergence of an imaginary speed of sound means that the MaVaNs
scenario with non-relativistic neutrinos is unstable, and thus the fluid in this sce-
nario cannot acts as the dark energy. However, finite temperature effects provide a
positive contribution to the speed of sound squared and avoid this instability. 12
Then, a model should satisfy the following condition,
5aT 2
25aT 20 (z + 1)
> 0, (20)
where z is the redshift parameter, z ≡ (T/T0)− 1, and
≃ 6.47. (21)
The first and the second term of left hand side in Eq. (20) are negative and positive
contributions to the speed of sound squared, respectively. We find that a model
which leads to small ∂mν/∂z is favored. A model with a small power-law scalar
potential; V (φ) = Λ4(φ/φ0)k, k ≪ 1, and a constant dominant neutrino mass;
mν = C + f(φ), f(φ) ≪ C, leads to small ∂mν/∂z.
c Actually, some models have
been presented.9
3. Effect of supersymmetry breaking
Let us consider effect of supersymmetry breaking in the dark sector. We assume
a superfield X , which breaks supersymmetry, in the hidden sector, and the chiral
superfield A in the dark sector is assumed to interact with the hidden sector only
through the gravity. This framework is shown graphically in Fig. 1. Once supersym-
metry is broken at TeV scale, its effect is transmitted to the dark sector through
the following operators:
X† +X
A†A, (22)
where Mpℓ is the Planck mass. Then, the scale of soft terms FX(TeV
2)/Mpℓ ∼
O(10−3-10−2eV) is expected. In the “acceleressence” scenario, this scale is identi-
fied with the dark energy scale.14 We consider only one superfield which breaks
cA model with the masses of the left-handed neutrinos given by the see-saw mechanism is unstable
even if it has a small power-law scalar potential.13
October 31, 2018 9:21 WSPC/INSTRUCTION FILE shizuoka
6 Ryo Takahashi
Fig. 1. The illustration of interactions among three sectors. The dark sector couples to the left-
handed neutrino through a new scalar force in the MaVaNs scenario. The dark sector is also
assumed to be related with the hidden sector only through the gravity.
supersymmetry for simplicity. If one extends the hidden sector, one can consider a
different mediation mechanism between the standard model and the hidden sector
from one between the dark and the hidden sector.
In this framework, taking supersymmetry breaking effect into account, the scalar
potential is given by
V (φ) =
|φ|4 −
(φ3 + h.c.) +M2A|φ|
2 +m2D|φ|
2 −m2|φ|2 + V0, (23)
where κ and m are supersymmetry breaking parameters, and V0 is a constant
determined by the condition that the cosmological constant is vanishing at the true
minimum of the acceleron potential.
We consider two types of the neutrino mass matrix in this scalar potential. They
are the cases of the vanishing and the finite mixing between the left-handed and a
sterile neutrino.
3.1. Case of the Vanishing Mixing
When the mixing between the left-handed and a sterile neutrino is vanishing,mD =
0 in the neutrino mass matrix (5). Then we have the masses of the left-handed and
a sterile neutrino as
mνL = c, (24)
mψ = MA + λ < φ > . (25)
In this case, we find that only the mass of a sterile neutrino is variable on cosmo-
logical time scale due to the second term of the right hand side in Eq. (25).
Let us adjust parameters which satisfy Eqs. (8) and (13)∼(16). In Eq. (8), the
scalar potential Eq. (23) is used. Putting typical values for four parameters by hand
as follows:
λ = 1, mD = 0, m
= 2× 10−2 eV, m0ψ = 10
−2 eV, (26)
October 31, 2018 9:21 WSPC/INSTRUCTION FILE shizuoka
Dark energy and neutrino model in SUSY– Remarks on active and sterile neutrinos mixing – 7
we have
< φ >0≃ −1.31× 10−5 eV, c = 2× 10−2 eV, MA ≃ 10
−2 eV,
m ≃ 10−2 eV, κ ≃ 4.34× 10−3 eV. (27)
We need fine-tuning between MA and m in order to satisfy the constraint on the
present accerelon mass of Eq. (16).
We show evolutions of the mass of a sterile neutrino and the equation of state
parameter in Figs. 2, 3 and 4. The behavior of the mass of a neutrino near the
present epoch is shown in Fig. 3. We find that the mass of a sterile neutrino have
varied slowly in this epoch. This means that the first term of the left hand side in
Eq. (20), which is a negative contribution to the speed of sound squared, is tiny.
We can also check the positive speed of sound squared in a numerical calculation.
Therefore, the neutrino-acceleron fluid is hydrodynamically stable and acts as the
dark energy.
3.2. Case of the Finite Mixing
Next, we consider the case of the finite mixing between the left-handed and a sterile
neutrino (mD 6= 0). In this case, the left-handed and a sterile neutrino mass are
given by
mνL =
c+MA + λ < φ >
[c− (MA + λ < φ >)]2 + 4m
, (28)
c+MA + λ < φ >
[c− (MA + λ < φ >)]2 + 4m
. (29)
We find that both masses of the left-handed and a sterile neutrino are variable on
cosmological time scale due to the term of the acceleron dependence.
Taking typical values for four parameters as
λ = 1, mD = 10
−3 eV, m0νL = 2× 10
−2 eV, m0ψ = 10
−2 eV, (30)
we have
< φ >0≃ −1.31× 10−5 eV, c ≃ 1.99× 10−2 eV, MA ≃ 1.01× 10
−2 eV,
m ≃ 1.02× 10−2 eV, κ ≃ 4.34× 10−3 eV. (31)
where we required that the mixing between the active and a sterile neutrino is
tiny. In our model, the small present value of the acceleron is needed to satisfy the
constraints on the scalar potential in Eqs. (8) and (13).
Values of parameters in (31) are almost same as the case of the vanishing mixing
(27). However, the mass of the left-handed neutrino is variable unlike the vanishing
mixing case. The time evolution of the left-handed neutrino mass is shown in Fig. 5.
The mixing does not affect the evolution of a sterile neutrino mass and the equation
of state parameter, which are shown in Figs. 6, 7. Since the variation in the mass of
the left-handed neutrino is not vanishing but extremely small, the model can also
avoid the instability of speed of sound.
October 31, 2018 9:21 WSPC/INSTRUCTION FILE shizuoka
8 Ryo Takahashi
Finally, we comment on the smallness of the evolution of the neutrino mass at
the present epoch. In our model, the mass of the left-handed and a sterile neutrino
include the constant part. A variable part is a function of the acceleron. In the
present epoch, the constant part dominates the neutrino mass because the present
value of the acceleron should be small. This smallness of the value of the acceleron
is required from the cosmological observation and the stationary condition in Eqs.
(8) and (13).
4. Summary
We presented a supersymmetric MaVaNs model including effects of the supersym-
metry breaking mediated by the gravity. Evolutions of the neutrino mass and the
equation of state parameter have been calculated in the model. Our model has
a chiral superfield in the dark sector, whose scalar component causes the present
cosmic acceleration, and the right-handed neutrino superfield. In our framework,
supersymmetry is broken in the hidden sector at TeV scale and the effect is assumed
to be transmitted to the dark sector only through the gravity. Then, the scale of
soft parameters of O(10−3-10−2)(eV) is expected.
We considered two types of model. One is the case of the vanishing mixing
between the left-handed and a sterile neutrino. Another one is the finite mixing
case. In the case of the vanishing mixing, only the mass of a sterile neutrino had
varied on cosmological time scale. In the epoch of 0 ≤ z ≤ 20, the sterile neutrino
mass had varied slowly. This means that the speed of sound squared in the neutrino
acceleron fluid is positive, and thus this fluid can act as the dark energy. In the
finite mixing case, the mass of the left-handed neutrino had also varied. However,
the variation is extremely small and the effect of the mixing does not almost affect
the evolution of the sterile neutrino mass and the equation of state parameter.
Therefore, this model can also avoid the instability.
References
1. A. G. Riess et al., Astron. J. 116, 1009 (1998); S. Perlmutter et al., Astrophys. J. 517,
565 (1999); P. de Bernardis et al., Nature 404, 955 (2000); A. Baldi et al., Astrophys.
J. 545, L1 (2000), [Erratum-ibid. 558, L145 (2001)]; A. T. Lee et al., Astrophys. J.
561, L1 (2001); R. Stompor et al., Astrophys. J. 561, L7 (2001); N. W. Halverson
et al., Astrophys. J. 568, 38 (2002); C. L. Bennet et al., Astrophys. J. Suppl. 148, 1
(2003); D. N. Spergel et al., Astrophys. J. Suppl. 148, 175 (2003), [astro-ph/0603449];
J. A. Peacock et al., Nature 410, 169 (2001); W. J. Percival et al., Mon. Not. Roy.
Astron. Soc. 327, 1297 (2001); M. Tegmark et al., Phys. Rev. D 69, 103501 (2004); K.
Abazajian et al., Astron. J. 128, 502 (2004); U. Seljak et al., Phys. Rev. D 71, 103515
(2005); P. McDonald, U. Seljak, R. Cen, P. Bode, and J. P. Ostriker, Mon. Not. Roy.
Astron. Soc. 360, 1471 (2005).
2. R. Fardon, A. E. Nelson and N. Weiner, JCAP. 0410, 005 (2004).
3. R. D. Peccei, Phys. Rev. D 71, 023527 (2005).
4. M. Kawasaki, H. Murayama and T. Yanagida, Mod. Phys. Lett. A 7, 563 (1992); G. J.
http://arxiv.org/abs/astro-ph/0603449
October 31, 2018 9:21 WSPC/INSTRUCTION FILE shizuoka
Dark energy and neutrino model in SUSY– Remarks on active and sterile neutrinos mixing – 9
Stephenson, T. Goldman and B. H. J. McKellar, Int. J. Mod. Phys. A 13, 2765 (1998),
Mod. Phys. Lett. A 12, 2391 (1997).
5. P. Q. Hung, hep-ph/0010126.
6. P. Gu, X-L. Wang and X-Min. Zhang, Phys. Rev. D 68, 087301 (2003).
7. D. B. Kaplan, A. E. Nelson, N. Weiner, Phy. Rev. Lett. 93, 091801 (2004); V. Barger,
D. Marfatia and K. Whisnant, Phys. Rev. D73, 013005 (2006); P-H. Gu, X-J. Bi, B.
Feng, B-L. Young and X. Zhang, hep-ph/0512076; X-J. Bi, P. Gu, X-L. Wang and
X-Min. Zhang, Phys. Rev. D 69, 113007 (2004); P. Gu and X-J. Bi, Phys. Rev. D 70,
063511 (2004); P. Q. Hung and H. Päs, Mod. Phys. Lett. A 20, 1209 (2005); V. Barger,
P. Huber and D. Marfatia, Phys. Rev. Lett. 95, 211802 (2005); M. Cirelli and M. C.
Gonzalez-Garcia and C. Peña-Garay, Nucl. Phys. B 719, 219 (2005); X-J. Bi, B. Feng,
H. Li and X-Min. Zhang, Phys. Rev. D72 123523, (2005); A. W. Brookfield, C. van de
Bruck, D. F. Mota and D. Tocchini-Valentini, Phys. Rev. Lett. 96, 061301 (2006); R.
Horvat, JCAP 0601, 015 (2006); R. Barbieri, L. J. Hall, S. J. Oliver and A. Strumia,
Phys. Lett. B 625, 189 (2005); N. Weiner and K. Zurek, Phys. Rev. D 74, 023517
(2006); H. Li, B. Feng, J-Q. Xia and X-Min. Zhang, Phys. Rev. D 73, 103503 (2006);
A. W. Brookfield, C. van de Bruck, D. F. Mota and D. Tocchini-Valentini, Phys. Rev.
D 73, 083515 (2006); P-H. Gu, X-J. Bi and X. Zhang, hep-ph/0511027; E. Ma and
U. Sarkar, Phys. Lett. B638, 356 (2006); A. Zanzi, Phys. Rev. D 73, 124010 (2006);
R. Takahashi and M. Tanimoto, Phys. Rev. D 74, 055002 (2006); A. Ringwald and L.
Schrempp, JCAP. 0610, 012 (2006); R. Takahashi and M. Tanimoto, hep-ph/0610347;
S. Das and N. Weiner, astro-ph/0611353; C.T. Hill, I. Mocioiu, E.A. Paschos and U.
Sarkar, hep-ph/0611284; L. Schrempp, astro-ph/0611912.
8. R. Takahashi and M. Tanimoto, Phys. Lett. B633, 675 (2006).
9. M. Honda, R. Takahashi and M. Tanimoto, JHEP 0601, 042 (2006).
10. R. Fardon, A. E. Nelson and N. Weiner, JHEP 0603, 042 (2006).
11. N. Afshordi, M. Zaldarriaga and K. Kohri, Phys. Rev. D 72, 065024 (2005).
12. R. Takahashi and M. Tanimoto, JHEP 0605, 021 (2006).
13. C. Spitzer, astro-ph/0606034.
14. Z. Chacko, L. J. Hall and Y. Nomura, JCAP. 0410, 011 (2004).
http://arxiv.org/abs/hep-ph/0010126
http://arxiv.org/abs/hep-ph/0512076
http://arxiv.org/abs/hep-ph/0511027
http://arxiv.org/abs/hep-ph/0610347
http://arxiv.org/abs/astro-ph/0611353
http://arxiv.org/abs/hep-ph/0611284
http://arxiv.org/abs/astro-ph/0611912
http://arxiv.org/abs/astro-ph/0606034
October 31, 2018 9:21 WSPC/INSTRUCTION FILE shizuoka
10 Ryo Takahashi
Fig. 2. Evolution of the mass of a sterile neutrino (0 ≤ z ≤ 2000)
Fig. 3. Evolution of the mass of a sterile neutrino (0 ≤ z ≤ 20)
October 31, 2018 9:21 WSPC/INSTRUCTION FILE shizuoka
Dark energy and neutrino model in SUSY– Remarks on active and sterile neutrinos mixing – 11
Fig. 4. Evolution of w (0 ≤ z ≤ 50)
Fig. 5. Evolution of the mass of the left-handed neutrino (0 ≤ z ≤ 2000)
October 31, 2018 9:21 WSPC/INSTRUCTION FILE shizuoka
12 Ryo Takahashi
Fig. 6. Evolution of the mass of a sterile neutrino (0 ≤ z ≤ 2000)
Fig. 7. Evolution of w (0 ≤ z ≤ 50)
	Introduction
	MaVaNs Model in Supersymmetric Theory
	Effect of supersymmetry breaking
	Case of the Vanishing Mixing
	Case of the Finite Mixing
	Summary
ABSTRACT
  We consider a Mass Varying Neutrinos (MaVaNs) model in supersymmetric theory.
The model includes effects of supersymmetry breaking transmitted by the
gravitational interaction from the hidden sector, in which supersymmetry was
broken, to the dark energy sector. Then evolutions of the neutrino mass and the
equation of state parameter of the dark energy are presented in the model. It
is remarked that only the mass of a sterile neutrino is variable in the case of
the vanishing mixing between the left-handed and a sterile neutrino on
cosmological time scale. The finite mixing makes the mass of the left-handed
neutrino variable.

<|endoftext|><|startoftext|>
Introduction
After reionisation the intergalactic medium (IGM) is kept highly
photoionised by the metagalactic UV radiation field generated
by the overall population of quasars and star-forming galax-
ies (e.g. Haardt & Madau 1996; Fardal et al. 1998; Bianchi et al.
2001; Sokasian et al. 2003). The intensity and spectral shape of
the UV background determines the ionisation state of the observ-
able elements in the IGM. In particular, the remaining fraction
of intergalactic neutral hydrogen and singly ionised helium is
responsible for the Lyα forest of H i and He ii.
On lines of sight passing near quasars the IGM will be sta-
tistically more ionised due to the local enhancement of the UV
flux that should result in a statistically higher IGM transmission
(’void’) in the QSO’s vicinity (Fardal & Shull 1993; Croft 2004;
McDonald et al. 2005). This so-called proximity effect has been
found with high statistical significance on lines of sight towards
luminous quasars (e.g. Bajtlik et al. 1988; Giallongo et al. 1996;
Scott et al. 2000). On the other hand, a transverse proximity ef-
fect created by foreground ionising sources nearby the line of
sight has not been clearly detected in the H i forest, except the
recent detection at z = 5.70 by Gallerani et al. (2007). While
two large H i voids have been claimed to be due to the transverse
Send offprint requests to: G. Worseck, e-mail: gworseck@aip.de
⋆ Based on observations collected at the European Southern
Observatory, Chile (Proposals 070.A-0425 and 074.A-0273). Data col-
lected under Proposals 068.A-0194, 070.A-0376 and 116.A-0106 was
obtained from the ESO Science Archive.
proximity effect by Dobrzycki & Bechtold (1991a, however
see Dobrzycki & Bechtold 1991b) and Srianand (1997), other
studies find at best marginal evidence (Fernández-Soto et al.
1995; Liske & Williger 2001), and most attempts resulted
in non-detections (Crotts 1989; Møller & Kjærgaard 1992;
Crotts & Fang 1998; Schirber et al. 2004; Croft 2004). This
has led to explanations involving the systematic effects of
anisotropic radiation, quasar variability (Schirber et al. 2004),
intrinsic overdensities (Loeb & Eisenstein 1995; Rollinde et al.
2005; Hennawi & Prochaska 2007; Guimarães et al. 2007) and
finite quasar lifetimes (Croft 2004).
Intergalactic He ii Lyα absorption (λrest = 303.7822 Å)
can be studied only towards the few quasars at z > 2 whose
far UV flux is not extinguished by intervening Lyman limit
systems (Picard & Jakobsen 1993; Jakobsen 1998). Of the six
quasars successfully observed so far, the lines of sight to-
wards HE 2347−4342 (z = 2.885) and HS 1700+6416 (z =
2.736) probe the post-reionisation era of He ii with an emerg-
ing He ii forest that has been resolved with FUSE (Kriss et al.
2001; Shull et al. 2004; Zheng et al. 2004; Fechner et al. 2006;
Fechner & Reimers 2007a).
In a highly ionised IGM a comparison of the H i with the
corresponding He ii absorption yields an estimate of the spectral
shape of the UV radiation field due to the different ionisation
thresholds of both species. The amount of He ii compared to H i
gives a measure of the spectral softness, generally expressed via
the column density ratio η = NHe ii/NH i. Typically, η <∼ 100 indi-
cates a hard radiation field generated by the surrounding quasar
http://arxiv.org/abs/0704.0187v2
2 G. Worseck et al.: The transverse proximity effect in spectral hardness towards HE 2347−4342
population, whereas η >∼ 100 requires a significant contribu-
tion of star-forming galaxies or heavily softened quasar radiation
(e.g. Haardt & Madau 1996; Fardal et al. 1998; Haardt & Madau
2001).
The recent FUSE observations of the He ii Lyα forest re-
vealed large η fluctuations (1 <∼ η <∼ 1000) on small scales
of 0.001 <∼ ∆z <∼ 0.03 with a median η ≃ 80–100. Apart
from scatter due to the low-quality He ii data at S/N ∼ 5
(Fechner et al. 2006; Liu et al. 2006) and possible systematic er-
rors due to the generally assumed line broadening mechanism
(Fechner & Reimers 2007a), several physical reasons for these
η variations have been proposed. A combination of local den-
sity variations (Miralda-Escudé et al. 2000), radiative transfer
effects (Maselli & Ferrara 2005; Tittley & Meiksin 2006) and lo-
cal differences in the properties of quasars may be responsible
for the fluctuations. In particular, at any given point in the IGM
at z > 2 only a few quasars with a range of spectral indices
(Telfer et al. 2002; Scott et al. 2004) contribute to the UV back-
ground at hν ≥ 54.4 eV (Bolton et al. 2006).
Already low-resolution He ii spectra obtained with HST in-
dicate a fluctuating radiation field, which has been interpreted
as the onset of He ii reionisation in Strömgren spheres around
hard He ii photoionising sources along or near the line of
sight (Reimers et al. 1997; Heap et al. 2000; Smette et al. 2002).
Jakobsen et al. (2003) found a quasar coinciding with the promi-
nent He ii void at z = 3.05 towards Q 0302−003, thereby pre-
senting the first clear case of a transverse proximity effect. In
Worseck & Wisotzki (2006), hereafter Paper I, we revealed the
transverse proximity effect as a systematic increase in spectral
hardness around all four known foreground quasars along this
line of sight. This suggests that a hard radiation field is a sensi-
tive probe of the transverse proximity effect even if there is no
associated void in the H i forest, either because of the weakness
of the effect, or because of large-scale structure.
Along the line of sight towards HE 2347−4342 several He ii
voids have been claimed to be due to nearby unknown AGN
(Smette et al. 2002). Likewise, some forest regions with a de-
tected hard radiation field may correspond to proximity ef-
fect zones of putative foreground quasars (Fechner & Reimers
2007a). Here we report on results from a slitless spectroscopic
quasar survey in the vicinity of HE 2347−4342 and on spectral
shape fluctuations of the UV radiation field probably caused by
foreground quasars towards the sightline of HE 2347−4342. The
paper is structured as follows. Sect. 2 presents the observations
and the supplementary data employed for the paper. Although
we do not detect any transverse proximity effect in the H i for-
est (Sect. 3), the fluctuating UV spectral shape along the line
of sight indicates a hard radiation field in the projected vicinity
of the foreground quasars (Sect. 4). In Sect. 5 we study three
nearby metal line systems which could further constrain the ion-
ising field. We interpret the statistically significant excesses of
hard radiation as being due to the transverse proximity effect
(Sect. 6). We present our conclusions in Sect. 7. Throughout
the paper we adopt a flat cosmological model with Ωm = 0.3,
ΩΛ = 0.7 and H0 = 70 km s
−1 Mpc−1.
2. Observations and data reduction
2.1. Search for QSO candidates near HE 2347−4342
In October 2002 we observed a 25′ × 33′ field centered on
HE 2347−4342 (z = 2.885) with the ESO Wide Field Imager
(WFI, Baade et al. 1999) at the ESO/MPI 2.2 m Telescope (La
Silla) in its slitless spectroscopic mode (Wisotzki et al. 2001) as
part of a survey for faint quasars in the vicinity of established
high-redshift quasars. A short summary of the survey is given in
Paper I; a detailed description will follow in a separate paper.
A semi-automated search for emission line objects among
the slitless spectra of the ∼ 1400 detected objects in the field
resulted in 10 prime quasar candidates.
2.2. Spectroscopic follow-up
Follow-up spectroscopy of these 10 quasar candidates was ob-
tained with the Focal Reducer/Low Dispersion Spectrograph 2
(FORS2, Appenzeller et al. 1998) on ESO VLT UT1/Antu in
Visitor Mode on November 17 and 19, 2004 under variable see-
ing but clear conditions. The spectra were taken either with
the 300V grism or the 600B grism and a 1′′ slit kept at the
parallactic angle, resulting in a spectral resolution of ∼ 10 Å
FWHM and ∼ 4.5 Å FWHM, respectively. No order separa-
tion filter was employed, leading to possible order overlap at
λ > 6600 Å in the spectra taken with the 300V grism. Exposure
times were adjusted to yield S/N ∼ 20 in the quasar con-
tinuum. The spectra were calibrated in wavelength against the
FORS2 He/Ne/Ar/HgCd arc lamps and spectrophotometrically
calibrated against the HST standard stars Feige 110 and GD 108.
Data reduction was performed with standard IRAF tasks us-
ing the optimal extraction algorithm by Horne (1986). Figure 1
shows the spectra of the quasars together with 4 quasars from an-
other survey (Sect. 2.3). Table 1 summarises our spectroscopic
follow-up observations.
2.3. Additional quasars
We checked the ESO Science Archive for additional quasars in
the vicinity of HE 2347−4342 and found several unpublished
quasars from a deeper slitless spectroscopic survey using the
ESO VLT, the results of which (on the field of Q 0302−003) are
described in Jakobsen et al. (2003). We obtained their follow-up
spectra of quasars surrounding HE 2347−4342 from the archive
and publish them here in agreement with P. Jakobsen. In the
course of their survey FORS1 spectra of 10 candidates were
taken with the 300V grism crossed with the GG435 order separa-
tion filter and a 1′′ slit, calibrated against the standards LTT 7987
and GD 50. Seven of their candidates are actually quasars, of
which 3 were also found independently by our survey. The re-
maining 4 quasars are beyond our redshift-dependent magnitude
limit. The FORS1 spectra of the 4 additional quasars are dis-
played in Fig. 1 and listed separately in Table 1. According to
the quasar catalogue by Véron-Cetty & Véron (2006) there are
no other previously known quasars within a radius < 30′ around
HE 2347−4342.
2.4. Redshifts and magnitudes
Redshifts of the 14 quasars were determined by taking every de-
tectable emission line into account. Line peaks were measured
by eye and errors were estimated taking into account the S/N
of the lines, line asymmetries and the presence of absorption
systems. The quasar redshifts were derived by weighting the
measurements of detected lines. Since high-ionisation lines suf-
fer from systematic blueshifts with respect to the systemic red-
shift (Gaskell 1982; Tytler & Fan 1992; McIntosh et al. 1999),
a higher weight was given to low-ionisation lines. Obviously
blueshifted lines were discarded. Redshift errors were estimated
G. Worseck et al.: The transverse proximity effect in spectral hardness towards HE 2347−4342 3
Fig. 1. VLT/FORS spectra of quasars in the vicinity of HE 2347−4342. The spectra are shown in black together with their 1σ noise
arrays (green lines). The small inserts show the corresponding discovery spectra from our slitless survey in the same units.
from the redshift differences of the remaining lines and their es-
timated errors.
The 14 discovered quasars lie in the broad redshift range
0.720 ≤ z ≤ 3.542. Fig. 2 shows their angular separations with
respect to HE 2347−4342. We find three background quasars
to HE 2347−4342 and we identify a pair of bright quasars at
z ≃ 1.763 separated by 7.′8. Three foreground quasars (labelled
A–C in Table 1 and Fig. 2) are located in the redshift range
to study the transverse proximity effect. Table 2 provides the
redshift measurements for the detected emission lines in their
spectra. The redshift of QSO J23503−4328 was based on Lyα
and C iv. The measurement of the Mg ii is uncertain because of
the decline of the resolving power of the 300V grism towards
the red, but yields a slightly higher redshift than the adopted
one. For QSO J23500−4319 we measured a consistent redshift
from the C iv and the C iii] line. The redshift measurement of
QSO J23495−4338 was difficult due to several metal absorp-
tion line systems of which only two Mg ii systems at z = 0.921
and z = 1.518 could be identified. In particular, Fe ii absorp-
tion from the z = 0.912 system hampered a redshift measure-
4 G. Worseck et al.: The transverse proximity effect in spectral hardness towards HE 2347−4342
Table 1. Quasars observed near the line of sight of HE 2347−4342. The first 10 listed quasars have been found in our survey, the
remaining 4 quasars result from the previously unpublished survey by P. Jakobsen. Quasar magnitudes are B and V magnitudes for
our survey and Jakobsen’s survey, respectively.
Object α (J2000) δ (J2000) z Magnitude Night Grism Exposure Airmass Seeing Abbr.
QSO J23510−4336 23h51m05.s50 −43◦36′57.′′2 0.720 ± 0.002 20.74 ± 0.27 19 Nov 2004 300V 1200 s 1.30 1.′′3
QSO J23507−4319 23h50m44.s97 −43◦19′26.′′0 0.850 ± 0.003 19.90 ± 0.07 17 Nov 2004 600B 360 s 1.28 0.′′7
QSO J23507−4326 23h50m45.s39 −43◦26′37.′′0 1.635 ± 0.003 21.05 ± 0.14 17 Nov 2004 300V 200 s 1.23 1.′′0
QSO J23509−4330 23h50m54.s80 −43◦30′42.′′2 1.762 ± 0.004 18.23 ± 0.03 17 Nov 2004 600B 300 s 1.08 0.′′7
QSO J23502−4334 23h50m16.s18 −43◦34′14.′′7 1.763 ± 0.003 18.95 ± 0.04 17 Nov 2004 300V 60 s 1.18 0.′′7
QSO J23503−4328 23h50m21.s55 −43◦28′43.′′7 2.282 ± 0.003 20.66 ± 0.11 17 Nov 2004 300V 400 s 1.20 0.′′7 A
QSO J23495−4338 23h49m34.s53 −43◦38′08.′′7 2.690 ± 0.006 20.21 ± 0.17 19 Nov 2004 300V 360 s 1.13 1.′′2 C
QSO J23511−4319 23h51m09.s44 −43◦19′41.′′6 3.020 ± 0.004 21.00 ± 0.14 17 Nov 2004 600B 1000 s 1.09 1.′′1
QSO J23514−4339 23h51m25.s54 −43◦39′02.′′9 3.240 ± 0.004 21.57 ± 0.29 17 Nov 2004 300V 1400 s 1.14 1.′′2
QSO J23503−4317 23h50m21.s94 −43◦17′30.′′0 3.542 ± 0.005 21.94 ± 0.62 19 Nov 2004 300V 1800 s 1.23 1.′′2
600B 1800 s 1.33 1.′′2
QSO J23515−4324 23h51m33.s05 −43◦24′45.′′2 1.278 ± 0.002 20.82 ± 0.14 06 Oct 2002 300V 900 s 1.24 0.′′7
QSO J23512−4332 23h51m15.s18 −43◦32′34.′′3 1.369 ± 0.001 21.52 ± 0.25 06 Oct 2002 300V 900 s 1.18 0.′′7
QSO J23508−4335 23h50m52.s91 −43◦35′06.′′8 1.778 ± 0.002 22.01 ± 0.32 06 Oct 2002 300V 900 s 1.11 0.′′9
QSO J23500−4319 23h50m00.s28 −43◦19′46.′′1 2.302 ± 0.002 22.61 ± 0.83 06 Oct 2002 300V 900 s 2.37 0.′′8 B
Table 2. Detected emission lines and redshifts of QSOs A–C.
Object Emission line λobs [Å] z
QSO J23503−4328 Lyα 3989 ± 4 2.281 ± 0.003
N v 4070 ± 8 2.282 ± 0.006
Si iv+O iv] 4585 ± 8 2.276 ± 0.006
C iv 5082 ± 4 2.281 ± 0.003
C iii] 6253 ± 7 2.276 ± 0.004
Mg ii 9196 ± 12 2.286 ± 0.004
2.282 ± 0.003
QSO J23500−4319 Si iv+O iv] 4613 ± 6 2.296 ± 0.004
C iv 5115 ± 3 2.302 ± 0.002
C iii] 6305 ± 2 2.303 ± 0.001
2.302 ± 0.002
QSO J23495−4338 Lyα 4513 ± 10 2.712 ± 0.008
O i+Si ii 4823 ± 10 2.694 ± 0.008
C ii 4930 ± 10 2.692 ± 0.007
Si iv+O iv] 5135 ± 15 2.669 ± 0.011
C iv 5691 ± 10 2.674 ± 0.006
C iii] 7028 ± 10 2.682 ± 0.005
2.690 ± 0.006
ment of the Lyα line. The C iv and the C iii] lines show unidenti-
fied absorption features. Thus, the redshift of QSO J23495−4338
is heavily weighted towards the very noisy low-ionisation lines
O i+Si ii and C ii. However, redshift uncertainties of the fore-
ground quasars do not significantly affect our results.
Apparent magnitudes were derived from target aquisition
images photometrically calibrated against the standard star fields
PG 2213−006 or Mark A (Landolt 1992). Unfortunately the
aquisition exposures of the faintest quasars were too short to de-
termine their magnitudes accurately. Magnitudes derived from
integration of the spectra are consistent with the photometric
ones after correcting for slit losses.
We note that QSO J23507−4326 is variable. This quasar
has been detected in both slitless surveys and had V ≃ 20.3
in October 2001, V ≃ 20.7 in October 2002 and V ≃ 21.0
in November 2004. We were able to discover this quasar
in its bright phase while missing the slightly fainter quasar
QSO J23515−4324 detected only in the survey by P. Jakobsen.
2.5. Optical spectra of HE 2347−4342
From the ESO Science Archive we retrieved the optical spectra
of HE 2347−4342 taken with UVES at VLT UT2/Kueyen in the
Large Programme “The Cosmic Evolution of the Intergalactic
Medium” (Bergeron et al. 2004). Data reduction was performed
Fig. 2. Distribution of separation angles ϑ vs. redshift z of the
quasars from Table 1 with respect to HE 2347−4342. Symbol
size indicates apparent optical magnitude.
using the UVES pipeline provided by ESO (Ballester et al.
2000). The vacuum-barycentric corrected co-added spectra yield
a S/N ∼ 100 in the Lyα forest at R ∼ 45000. The spectrum
was normalised in the covered wavelength range 3000 <∼ λ <∼
10000 Å using a cubic spline interpolation algorithm.
2.6. Far-UV spectra of HE 2347−4342
HE 2347−4342 is one of the two high-redshift quasars observed
successfully in the He ii Lyα forest below 303.7822 Å rest frame
wavelength with the Far Ultraviolet Spectroscopic Explorer
(FUSE) at a resolution of R ∼ 20000, although at a
S/N <∼ 5 (Kriss et al. 2001; Zheng et al. 2004). G. Kriss
and W. Zheng kindly provided the reduced FUSE spec-
trum of HE 2347−4342 described in Zheng et al. (2004). We
adopted their flux normalisation with a power law fλ =
3.3 × 10−15
λ/1000Å
)−2.4
erg cm−2 s−1 Å−1 reddened by the
Cardelli et al. (1989) extinction curve assuming E(B − V) =
0.014 (Schlegel et al. 1998).
3. The Lyα forest near the foreground quasars
Aiming to detect the transverse proximity effect as an underden-
sity (’void’) in the Lyα forest towards HE 2347−4342 we ex-
amined the forest regions in the projected vicinity of the three
foreground quasars labelled A–C in Table 1. The H i forest of
G. Worseck et al.: The transverse proximity effect in spectral hardness towards HE 2347−4342 5
Fig. 3. The Lyα forest of HE 2347−4342 in the vicinity of the foreground quasars A–C from Table 1. The upper panels show the
normalised optical spectrum of HE 2347−4342 including Lyβ and metal lines (red) and the H i Lyα transmission obtained from the
line list by T.-S. Kim (black). The binned blue line shows the mean H i Lyα transmission in ∆z = 0.005 bins towards HE 2347−4342,
whereas the dashed green line indicates the expected mean transmission 〈T 〉exp. The lower panels display the corresponding He ii
transmission from the FUSE spectrum. [See the online edition of the Journal for a colour version of this figure.]
HE 2347−4342 has been analysed in several studies, e.g. by
Zheng et al. (2004) and Fechner & Reimers (2007a), hereafter
called Z04 and FR07, respectively. Since the line list from FR07
is limited to z > 2.29, T.-S. Kim (priv. comm.) kindly provided
an independent line list including the lower redshift Lyα forest
(z > 1.79). Both line lists agree very well in their overlapping
redshift range 2.29 < z < 2.89.
Figure 3 displays the H i and the He ii forest regions near
the foreground quasars A–C. The H i Lyα forest is contaminated
by metals. In particular at z < 2.332 there is severe contam-
ination due to the O vi absorption of the associated system of
HE 2347−4342 (Fechner et al. 2004). Because the strong O vi
absorption overlaps with the projected positions of QSO A and
QSO B it is very difficult to obtain a well-determined H i line
sample in this region. Furthermore, there is Lyβ absorption of
H i and He ii at z < 2.294. We also overplot in Fig. 3 the mean H i
Lyα transmission in ∆z = 0.005 bins obtained from T.-S. Kim’s
line list and the generally expected mean transmission over sev-
eral lines of sight 〈T 〉exp = e−τ
eff with τ
eff = 0.0032 (1 + z)
(Kim et al. 2002).
We do not detect a significant void near the three foreground
quasars, neither in the H i forest nor in the He ii forest. In the
vicinity of QSO A and QSO B, even a careful decontamina-
tion of the optical spectrum does not reveal a significant H i
underdensity. Instead, the transmission is fluctuating around the
mean. Due to the poor quality of the FUSE data in this region
(S/N <∼ 2) and the He ii Lyβ absorption from higher redshifts, a
simple search for He ii voids near QSO A and QSO B is impos-
sible. In the vicinity of QSO C the H i Lyα absorption is slightly
higher than on average. There is a small void at z ≃ 2.702 that
can be identified in the forests of both species. The probability of
chance occurrence of such small underdensities is high, so link-
ing this void to QSO C seems unjustified. However, note that the
He ii absorption in the vicinity of QSO C (z ∼ 2.69) is lower
than at z ∼ 2.71 in spite of the same H i absorption. This points
to fluctuations in the spectral shape of the ionising radiation near
the quasar (Sect. 4.3).
Given the luminosities and distances of our foreground
quasars to the sightline of HE 2347−4342, could we expect to
detect the transverse proximity effect as voids in the H i forest?
As in Paper I, we modelled the impact of the foreground quasars
on the line of sight towards HE 2347−4342 with the parameter
ω (z) =
fνLL , j
4πJν (z)
1 + z′j
)−α j+1
1 + z j
αJν + 3
α j + 3
z j, 0
z j, z
which is the ratio between the summed photoionisation rates of
n quasars at redshifts z j with rest frame Lyman limit fluxes fνLL , j,
penetrating the absorber at redshift z and the overall UV back-
ground with Lyman limit intensity Jν. dL(z j, 0) is the luminosity
distance of QSO j, and dL(z j, z) is its luminosity distance as seen
at the absorber; the redshift of the quasar as seen at the absorber
is z′j (Liske 2000). A value ω ≫ 1 predicts a highly significant
proximity effect.
We assumed a constant UV background at 1 ryd of Jν =
7×10−22 erg cm−2 s−1 Hz−1 sr−1 (Scott et al. 2000) with a power-
law shape Jν ∝ ν
−αJν and αJν = 1.8. The quasar Lyman limit
fluxes were estimated from the spectra by fitting a power law
fν ∝ ν
−α to the quasar continuum redward of the Lyα emis-
sion line, excluding the emission lines. The spectra were scaled
to yield the measured photometric magnitudes. Table 3 lists the
resulting spectral indices, the H i Lyman limit fluxes, and the
transverse distances.
The combined effects of QSOs A and B result in a peak
ωmax ≃ 0.89, while QSO C yields ωmax ≃ 0.11. So we expect
only a weak signature of the transverse proximity effect that can
be easily diluted by small-scale transmission fluctuations around
〈T 〉exp. Thus, the apparent lack of a transverse proximity effect
in the H i forest is no surprise.
We can also roughly estimate the amplitude of the proximity
effect in the He ii forest. Extrapolating the power laws (QSOs
and background) above 4 ryd at η = 50 (Haardt & Madau 1996,
hereafter HM96) we get ωmax ≃ 20 near QSO A and ωmax ≃ 2
near QSO C. A softer background would result in higher values
of ω, whereas absorption of ionising photons in the He ii forest
would decrease ω. However, due to the arising He ii Lyβ forest
and the low S/N in the FUSE data near QSOs A and B, even
high ω values do not necessarily result in a visible He ii void.
6 G. Worseck et al.: The transverse proximity effect in spectral hardness towards HE 2347−4342
Table 3. Rest frame Lyman limit fluxes of foreground QSOs. A
power law fν ∝ ν
−α is fitted to the QSO continua and fνLL is the
extrapolated H i Lyman limit flux in the QSO rest frame. d⊥(z)
denotes the transverse proper distance to the line of sight towards
HE 2347−4342.
QSO Abbr. z α fνLL [µJy] d⊥(z) [Mpc]
QSO J23503−4328 A 2.282 0.21 16 1.76
QSO J23500−4319 B 2.302 0.84 1 4.33
QSO J23495−4338 C 2.690 0.24 29 7.75
In the direct vicinity of QSO C the He ii data is not saturated,
but shows no clear void structure either. We will show in the
following sections that the spectral shape of the radiation field is
a more sensitive indicator of the transverse proximity effect than
the detection of voids in the forests.
4. The fluctuating shape of the UV radiation field
towards HE 2347−4342
4.1. Diagnostics
If both hydrogen and helium are highly ionised in the IGM
with roughly primordial abundances, the column density ra-
tio η = NHe ii/NH i indicates the softness of the UV radiation
field impinging on the absorbers. Theoretically, η can be de-
rived numerically via photoionisation models of the IGM with
an adopted population of ionising sources. At the redshifts of
interest, 50 <∼ η <∼ 100 is predicted for a UV background gen-
erated by quasars (HM96; Fardal et al. 1998), whereas higher
values indicate a contribution of star-forming galaxies (e.g.
Haardt & Madau 2001, hereafter HM01).
The He ii forest has been resolved with FUSE towards
HE 2347−4342 and HS 1700+6416, allowing a direct estimation
of η by fitting the absorption lines (Kriss et al. 2001; Zheng et al.
2004; Fechner et al. 2006, FR07). Due to the low S/N and the
strong line blending in the He ii forest the He ii lines have to be
fitted with absorber redshifts and Doppler parameters fixed from
the fitting of the H i data of much higher quality. Generally, pure
non-thermal line broadening (bHe ii = bH i) is assumed (how-
ever, see FR07 and Sect. 6 below). The He ii forest towards
HE 2347−4342 was fitted independently by Z04 and FR07. In
the following, we rely on the line fitting results from FR07,
which at any rate are consistent with those obtained by Z04 in
the redshift ranges near the quasars.
All current studies indicate that η is strongly fluctuating on
very small scales in the range 1 <∼ η <∼ 1000. The median column
density ratio towards HE 2347−4342 is η ≃ 62 (Z04), whereas
Fechner et al. (2006) find a higher value of η ≃ 85 towards
HS 1700+6416. Both studies find evidence for an evolution of
η towards smaller values at lower redshifts. However, only part
of the scatter in η is due to redshift evolution and statistical er-
rors, so the spectral shape of the UV radiation field has to fluc-
tuate (FR07). Although the analyses of both available lines of
sight give consistent results, cosmic variance may bias the de-
rived median η and its evolution. This is of particular interest for
our study, since we want to reveal local excesses of low η near
the quasars with respect to the median (Sect. 4.3). Clearly, more
lines of sight with He ii absorption would be required to yield
tighter constraints on the redshift evolution of η.
The detailed results of visual line fitting may be subjective
and may depend on the used fitting software. In particular, am-
biguities in the decomposition of blended H i lines can affect the
derived η values (Fechner & Reimers 2007b). Therefore we also
analyse the UV spectral shape variations using the ratio of the
effective optical depths
τeff,He ii
τeff,H i
. (2)
As introduced in Paper I, this parameter is a resolution-
independent estimator of the spectral shape of the UV radiation
field with small (high) R values indicating hard (soft) radiation
on a certain redshift scale ∆z. Shull et al. (2004) followed a sim-
ilar approach by taking η ≃ 4τHe ii/τH i for a restricted τ range
on scales of ∆z = 1.6× 10−4 and ∆z = 6.6× 10−4. However, this
scaling relation between τ and η is only valid at the centre of
an absorption line (Miralda-Escudé 1993). The column density
ratio is defined per absorption line and not as a continuous quan-
tity, whereas R can be defined on any scale. While R and η are
correlated (see below), there is no simple conversion between
R and η and the correlation will depend on the adopted redshift
scale of R.
4.2. Fluctuations in R and η along the line of sight
We obtained R(z) by binning both normalised Lyα forest spec-
tra of H i and He ii into aligned redshift bins of ∆z = 0.005
in the range 2.3325 < z < 2.8975 and computed R =
ln 〈THe ii〉/ ln 〈TH i〉 with the mean transmission 〈THe ii〉 and
〈TH i〉. The choice of the redshift binning scale was motivated by
the typical scale of η fluctuations 0.001 <∼ ∆z <∼ 0.03 (Kriss et al.
2001, FR07). We adopted the binning procedure by Telfer et al.
(2002) in order to deal with original flux bins that only partly
overlap with the new bins. The errors were computed accord-
ingly. Due to the high absorption and the low S/N of the He ii
data we occasionally encountered unphysical values 〈THe ii〉 ≤ 0.
These were replaced by their errors, yielding lower limits on R.
We mostly neglected the usually small metal contamination in
the computation of 〈TH i〉 in the Lyα forest because the errors in
R are dominated by the low S/N and the more uncertain contin-
uum level of the He ii spectrum. The FUSE data in the redshift
bins at z = 2.375, 2.380, 2.730, 2.735, 2.845 and 2.850 are con-
taminated by galactic H2 absorption, so no R measurement on
the full scale of ∆z = 0.005 can be performed there.
At 2.29 <∼ zLyα <∼ 2.33 the H i Lyα forest is severely con-
taminated by O vi from the associated system of HE 2347−4342
(Fechner et al. 2004). Furthermore, the Lyβ forest of both
species emerges at zLyα < 2.294. Because this excess absorp-
tion would bias the direct estimation of R in the spectra, we tried
to decontaminate the forests at z < 2.332. 〈TH i〉 was computed
from the H i Lyα forest reconstructed from the line list by T.-
S. Kim (Sect. 3). The corresponding 〈THe ii〉 was obtained after
dividing the FUSE data by the simulated Lyβ absorption of the
lines at higher redshift. Since the decontamination depends on
the validity of the He ii line parameters as well as on the com-
pleteness of the H i line list in the complex region contaminated
by O vi, the derived R values at z < 2.332 have to be regarded as
rough estimates.
The resulting R(z) is shown in the upper panel of Fig. 4. The
optical depth ratio strongly fluctuates around its median value
R ≃ 4.8 obtained for uncontaminated redshift bins, indicating
spectral fluctuations in the UV radiation field. We also show in
Fig. 4 the median η(z) on the same redshift bins based on the
line fitting results in FR07. Also the median η strongly fluctuates
with a slight trend of an increase with redshift (Z04). Clearly,
the data is inconsistent with a spatially uniform UV background,
but the median η ≃ 70 of the line sample is consistent with
G. Worseck et al.: The transverse proximity effect in spectral hardness towards HE 2347−4342 7
Fig. 4. The fluctuating spectral shape of the UV
background towards HE 2347−4342. The up-
per panel shows the ratio of effective optical
depths R vs. redshift z in ∆z = 0.005 bins.
Data points at z < 2.332 (crosses) have been
decontaminated from O vi and Lyβ absorp-
tion (see text). Foreground quasars are marked
with letters and vertical dotted lines as well
as HE 2347−4342 (star symbol). The green
dashed line indicates the median R ≃ 4.8 ob-
tained at z > 2.332 in uncontaminated bins. The
lower panel shows the median η from FR07 in
the same redshift bins. The red dashed line in-
dicates the median η ≃ 70 of the line sample.
quasar-dominated models of the UV background. A comparison
of R(z) and η(z) reveals that both quantities are correlated. The
Spearman rank order correlation coefficient is rS = 0.67 with a
probability of no correlation PS = 6 × 10
There is a scatter in the relation between R and η, which is
due to noise in the He ii data and due to the fact that R is a spec-
tral softness indicator that is smoothed in redshift. Therefore, in
addition to the UV spectral shape, R will depend on the den-
sity fluctuations of the Lyα forest on the adopted scale. In or-
der to estimate the scatter in R due to these density fluctua-
tions, we simulated H i and He ii Lyα forest spectra. We gen-
erated 100 H i forests with the same overall redshift evolution of
eff,H i = 0.0032 (1 + z)
3.37 (Kim et al. 2002) based on the empir-
ical line distribution functions in redshift z, column density NH i
and Doppler parameter bH i (e.g. Kim et al. 2001). We modelled
each forest as a composition of lines with Voigt profiles using the
approximation by Tepper-Garcı́a (2006). The spectral resolution
(R ∼ 42000) and quality (S/N ∼ 100) closely matches the opti-
cal data of HE 2347−4342. The corresponding He ii forests were
generated at FUSE resolution with a S/N = 4 for four constant
values of η = 10, 20, 50 and 100. We assumed pure non-thermal
broadening of the lines. Then we computed R at 2 ≤ z ≤ 3 on our
adopted scale ∆z = 0.005, yielding 20000 R measurements for
each considered η. For convenience we took out the general red-
shift dependence of τeff,H i via dividing by the expected effective
optical depth τ
eff,H i, so
τeff,H i
eff,H i
is a measure of H i overdensity (D > 1) or underdensity (D < 1).
In Fig. 5 we show the relation R(D) obtained from the Monte
Carlo simulations and compare it to the distribution observed
towards HE 2347−4342. The simulated R(D) can be fitted rea-
sonably with a 3rd order polynomial in logarithmic space, yield-
ing a general decrease of R with D for every η. The root-mean-
square scatter increases from 0.13 dex for η = 10 to 0.18 dex for
η = 100. At D >∼ 3 the R(D) distribution flattens due to saturation
of high-column density absorbers on the flat part of the curve of
growth. The flattening causes substantial overlap between the
simulated R distributions at D >∼ 5, making R increasingly in-
sensitive to the underlying η. However, at D <∼ 3 hard radiation
Fig. 5. Dependence of R on D = τeff,H i/τ
eff,H i for different sim-
ulated values of η. The black lines indicate the polynomial fits
to the simulated distributions in logarithmic space. Red filled
circles represent the measured R(D) towards HE 2347−4342 in
uncontaminated bins at z > 2.332. The horizontal dotted line de-
notes R = 2. [See the online edition of the Journal for a colour
version of this figure.]
and soft radiation can be reasonably well distinguished. We also
overplot the measured R(D) towards HE 2347−4342 in Fig. 5.
The observed distribution is inconsistent with a constant η, but
the majority of values falls into the modelled η range. While
many high R values indicate η > 100, values with R <∼ 2 corre-
spond to η <∼ 20 at D <∼ 3. Thus, the very low R values always in-
dicate a hard radiation field up to moderate overdensities. As we
will see in the next section, the saturation effect probably does
not play a role in relating a hard radiation field to the nearby
quasars.
4.3. The UV radiation field near the quasars
We now investigate in greater detail the spectral shape of the UV
radiation field near the four quasars with available data on R and
η: the background quasar HE 2347−4342 and the foreground
QSOs A–C. Due to the small number of comparison values de-
rived from only two lines of sight, we will adopt η = 100 as
a characteristic value for the overall UV background at z > 2.6
8 G. Worseck et al.: The transverse proximity effect in spectral hardness towards HE 2347−4342
(HE 2347−4342, QSO C) and a value of η = 50 at z ∼ 2.3 (QSOs
A and B). The former value is close to the median η = 102
obtained by Fechner et al. (2006) at 2.58 < z < 2.75 towards
HS 1700+6416, whereas the latter η value accounts for the prob-
able evolution of η with redshift. Furthermore, we will compare
the η values in the vicinity of the quasars to models of the UV
background.
4.3.1. HE 2347−4342
A close inspection of Fig. 4 reveals a strongly fluctuating radi-
ation field near HE 2347−4342 with some very small, but also
high R values. Also the column density ratio shows large fluc-
tuations (1 <∼ η <∼ 1000) with six η <∼ 10 absorbers out of the
20 absorbers at z > 2.86. These strong variations of the spec-
tral shape are likely due to radiative transfer effects in the asso-
ciated absorption system causing an apparent lack of the prox-
imity effect of HE 2347−4342 (Reimers et al. 1997). The high
He ii column densities of the associated system may soften the
quasar radiation with increasing distance and Fig. 4 supports this
interpretation. Due to the probable strong softening of the hard
quasar radiation on small scales, the relative spectral hardness
near HE 2347−4342 is only revealed by individual low η val-
ues instead of robust median values. However, also the highly
ionised metal species of the associated system (Fechner et al.
2004) favour the presence of hard QSO radiation. Thus we
conclude that despite the lack of a radiation-induced void near
HE 2347−4342, its impact onto the IGM can be detected via the
relative spectral hardness of the UV radiation field. The three
R < 2 values near HE 2347−4342 have D < 3, so they are prob-
ably not affected by saturation.
4.3.2. QSOs A and B
If our decontamination of the Lyα forests near the two z ∼ 2.3
QSOs A and B is correct, R should reflect UV spectral shape
variations also in that region. Indeed, the redshift bin at z =
2.280 next to QSO A (z = 2.282) is a local R minimum with
R ≃ 1.5. At z = 2.270 we find R ≃ 0.8. At the redshift of QSO B
(z = 2.302) the radiation field is quite soft, but we note a low
R ∼ 1 at z = 2.310. We obtain D < 3 for the four R <∼ 2 values
near QSO A and QSO B, so saturation is not relevant, and the
low R values correspond to low η values.
The measured η values in this redshift region are presented
in Fig. 6. The error bars are only indicative, since blended line
components are not independent and the He ii column densities
are derived with constraints from the H i forest. Lower limits on η
result from features detected in He ii but not in H i. Due to ambi-
guities in the line profile decomposition at the H i detection limit
and the present low quality of the He ii data it is hard to judge the
reality of most of these added components (Fechner & Reimers
2007b). Nevertheless, since η for adjacent lines may be not inde-
pendent due to line blending, we must include the lower limits
in the analysis. At z < 2.294 the fitting of He ii lines becomes
unreliable due to the arising Lyβ forest. Therefore, no direct es-
timates of η can be obtained in the immediate vicinity of QSO A.
Furthermore, the H i line sample may be incomplete or the line
parameters may be not well constrained due to blending with the
O vi of the associated system of HE 2347−4342.
Considering these caveats, the median η ≃ 19 obtained for
the values at z < 2.332 shown in Fig. 6 is only an estimate.
Nevertheless, this is much lower than the typical values η ∼ 50
found at z ∼ 2.3 towards HS 1700+6416 (Fechner et al. 2006).
Fig. 6. Column density ratio η vs. redshift z in the vicinity of
QSO A and QSO B. The long (short) dashed line indicates the
median η ≃ 19 in this redshift range (η = 50 for a UV back-
ground generated by quasars). At z < 2.294 the He ii Lyβ forest
sets in.
Moreover, it is also lower than at slightly higher redshifts to-
wards HE 2347−4342. For instance, the median η increases to
η = 79 in the redshift range 2.35 ≤ z ≤ 2.40. This is inconsis-
tent with the smooth redshift evolution of η on large scales in-
ferred by Z04 and Fechner et al. (2006) for both available sight-
lines. Thus, we infer an excess of hard radiation in the vicinity of
QSO A and QSO B. The most extreme η values are located in the
projected vicinity of QSO B, with 6 lines reaching η < 1. If es-
timated correctly, these low η values require local hard sources
and cannot be generated by the diffuse UV background. Both
foreground quasars could be responsible for the hard radiation
field because of similar light travel times to the probably affected
absorbers (tA ≃ 2tB).
4.3.3. QSO C
Since metal contamination of the H i forest is small in the pro-
jected vicinity of QSO C (Fig. 3), the UV spectral shape is bet-
ter constrained here than near QSO A and QSO B. From Fig. 4
we note a local R minimum (R ≃ 1.3) that exactly coincides
with the redshift of QSO C (z = 2.690). At higher redshifts
R rises, possibly indicating a softer ionising field. However, at
2.63 <∼ z <∼ 2.695 the optical depth ratio is continuously below
the median with R < 2 in five redshift bins. Due to the H i over-
densities near QSO C, all R < 2 values have D > 1, but only
the bin at z = 2.635 has D ≃ 4, so the remaining ones may still
indicate low column density ratios η.
Figure 7 displays the η values from FR07 in the redshift
range 2.63 < z < 2.73 in the projected vicinity of QSO C.
For comparison, we also indicate η = 100 that is consistent
with the median η = 102 towards HS 1700+6416 in this red-
shift range (Fechner et al. 2006). While the data generally shows
strong fluctuations around the median over the whole covered
redshift range (Z04; Fechner et al. 2006), there is an apparent
excess of small η values near QSO C indicating a predominantly
hard radiation field. From the data, a median η ≃ 46 is obtained
at 2.63 < z < 2.73 including the lower limits on η. The median η
near QSO C is lower than the median η towards HS 1700+6416
by a factor of two and also slightly lower than the η obtained for
spatially uniform UV backgrounds generated by quasars. The
relative agreement of the median η near QSO C and hard ver-
sions of quasar UV background models may result from the soft-
ening of the quasar radiation by the IGM at the large proper dis-
tances d >∼ 7.75 Mpc considered here (Table 3). This will be
G. Worseck et al.: The transverse proximity effect in spectral hardness towards HE 2347−4342 9
Fig. 7. Column density ratio η vs. redshift z
in the vicinity of QSO C. The short dashed
line denotes η = 100 that is consistent with
the median η ≃ 102 obtained in the range
2.58 < z < 2.75 towards HS 1700+6416
(Fechner et al. 2006). The long dashed line in-
dicates the median η ≃ 46 obtained for the
shown η values (2.63 < z < 2.73).
further explored in Sect. 6.2. The larger contrast between the
median η near QSO C and the median η towards HS 1700+6416
yields stronger evidence for a local hardening of the UV radia-
tion near QSO C. However, this comparison value derived from
the single additional line of sight tracing this redshift range may
be biased itself.
Near QSO C the column density ratio still fluctuates and is
not homogeneously low as naively expected. We also note an
apparent offset of the low η region near QSO C towards lower
redshift due to fewer absorbers with low η at z > 2.69. While
some of the fluctuations can be explained by uncertainties to
recover η reliably from the present data, the very low η ≤ 10
values (≃ 24% of the data in Fig. 7) are likely intrinsically low.
These η values are in conflict with a homogeneous diffuse UV
background, and are likely affected by a local hard source. In
Sect. 6.4 we will estimate the error budget of η by Monte Carlo
simulations.
If QSO C creates a fluctuation in the spectral shape of the
UV background, the distance between the quasar and the line of
sight implies a light travel time of t = 25 Myr. The low (high)
redshift end of the region shown in Fig. 7 corresponds to a light
travel time of 64 Myr (44 Myr). Since these light travel times are
comparable, we argue that it is important to consider not only
the immediate projected vicinity of QSO C to be affected by the
proximity effect (see also Fig. 10 below).
In summary, both spectral shape indicators R and η indicate
a predominantly hard UV radiation field near all four known
quasars in this field. Many η values in the projected vicinity
of the quasars indicate a harder radiation than expected even
for model UV backgrounds of quasars alone. This points to a
transverse proximity effect detectable via the relative spectral
hardness. However, there are other locations along the line of
sight with an inferred hard radiation field, but without an asso-
ciated quasar, most notably the regions at z ∼ 2.48 and z ∼ 2.53
(Fig. 4). Before discussing these in detail (Sect. 6.3), we search
for additional evidence for hard radiation near the foreground
quasars by analysing nearby metal line systems.
5. Constraints from metal line systems
Observed metal line systems provide an additional tool to con-
strain the spectral shape of the ionising radiation. Since pho-
toionisation modelling depends on several free parameters, ap-
propriate systems should preferably show many different ionic
species. Fechner et al. (2004) analysed the associated metal line
system of HE 2347−4342 and found evidence for a hard quasar
spectral energy distribution at the absorbers with highest veloc-
ities that are probably closest to the quasar. Their large He ii
column densities probably shield the other absorbers which are
better modelled with a softer radiation field. The results by
Table 4. Measured column densities of the metal line system at
z = 2.2753. Several components of H i remain unresolved.
# v [km s−1] H i C iv Nv
1 −106.2 13.25 ± 0.59 12.70 ± 0.27
2 −94.3
13.634 ± 0.005
13.26 ± 0.57 12.40 ± 0.42
3 −45.8 12.76 ± 0.39 12.43 ± 0.14
4 −32.0
13.319 ± 0.015
12.80 ± 0.40 12.88 ± 0.06
5 0.0 13.232 ± 0.019 13.17 ± 0.03 12.94 ± 0.02
6 44.8 12.604 ± 0.020 12.51 ± 0.09 12.21 ± 0.05
7 91.5 13.042 ± 0.007 12.79 ± 0.05 11.98 ± 0.11
Fechner et al. (2004) are consistent with the more direct hard-
ness estimators R and η near HE 2347−4342 (Sect. 4.3.1).
In the spectrum of HE 2347−4342 an intervening metal line
system is detected at z = 2.7119 which is close to the redshift
of QSO C (∆z = 0.022) at a proper distance of d ≃ 10.0 Mpc.
At z = 2.2753 there is another system showing multiple com-
ponents of C iv and N v as well as only weak H i absorption
(NH i < 10
13.7 cm−2). The presence of N v and weak H i features
with associated metal absorption are characteristic of intrinsic
absorption systems exposed to hard radiation. Due to the small
proper distance to QSO A (d ≃ 3.1 Mpc) this system is proba-
bly illuminated by the radiation of the close-by quasar. A third
suitable metal line system at z = 2.3132 is closer to QSO B
(d ≃ 6.1 Mpc) than to QSO A (d ≃ 12.0 Mpc). But since
QSO A is much brighter than QSO B (Table 3), the metal line
system at z = 2.3132 might be affected by both quasars. Due
to their small relative velocities with respect to the quasars of
< 3000 km s−1 the systems are likely associated to the quasars
(e.g. Weymann et al. 1981).
In order to construct CLOUDY models (Ferland et al. 1998,
version 05.07) we assumed a single-phase medium, i.e. all ob-
served ions arise from the same gas phase, as well as a solar
abundance pattern (Asplund et al. 2005) at a constant metallicity
throughout the system. Furthermore, we assumed pure photoion-
isation and neglected a possible contribution of collisional ion-
isation. The absorbers were modelled as distinct, plane-parallel
slabs of constant density testing different ionising spectra.
5.1. The system at z = 2.275 near QSO A
The system at z = 2.2753 shows seven components of C iv
and N v along with unsaturated features of H i (Fig. 8). The ab-
sorber densities are constrained by the C iv/N v ratio. For the
HM01 background scaled to yield log Jb = −21.15 at the H i
Lyman limit (Scott et al. 2000), we derive densities in the range
10−4.38 to 10−3.35 cm−3 at a metallicity of ∼ 0.6 solar. The es-
10 G. Worseck et al.: The transverse proximity effect in spectral hardness towards HE 2347−4342
Fig. 8. Metal line system at z = 2.2753 towards HE 2347−4342.
The displayed profiles assume the rescaled HM01 background.
Zero velocity corresponds to z = 2.2753.
timated absorber sizes are ∼ 5 kpc or even smaller, where the
sizes are computed according to NH = nH l with the absorbing
path length l. With an additional contribution by QSO A, mod-
elled as a power law with α = 0.21 and H i Lyman limit intensity
log Jq = −21.9 at the location of the absorber, we obtain an even
higher metallicity of ∼ 11 times solar. Densities in the range
10−2.95 to 10−2.01 cm−3 are found leading to very small absorbers
of . 10 pc.
Both models lead to unusually high metallicities and very
small absorber sizes. However, Schaye et al. (2007) recently re-
ported on a large population of compact high-metallicity ab-
sorbers. Using the HM01 background they found typical sizes of
∼ 100 pc and densities of 10−3.5 cm−3 for absorbers with nearly
solar or even super-solar metallicities. In fact, this system is part
of the sample by Schaye et al. (2007).
Since the system exhibits only a few different species, it is
impossible to discriminate between the soft and the hard radi-
ation model. In principle, both models lead to a consistent de-
scription of the observed metal lines. The soft HM01 UV back-
ground yields η ∼ 170 for the modelled absorbers, whereas the
model including the hard radiation of QSO A leads to η ∼ 10.
Recall that the He ii forest cannot be used to measure η directly
due to blending with Lyβ features and very low S/N.
5.2. The systems near QSO B and QSO C
The systems at z = 2.3132 and z = 2.7119 are located near
QSO B and QSO C, respectively. Only few ions are observed and
some of them may even be blended. Therefore, no significant
conclusions based on CLOUDY models can be drawn. Using the
column density estimates we find that the system at z = 2.7119
close to QSO C exhibiting C iv and O vi can be described con-
sistently with a HM01+QSO background. Models assuming a
quasar flux of log Jq & −22.5 seen by the absorber yield η . 40,
consistent with the direct measurements from the He ii forest.
However, the metal transitions alone do not provide strong con-
straints.
The system at z = 2.3132 shows C iv in six components
along with Si iv and Si iii. The Lyman series of this system suf-
fers from severe blending preventing a reliable estimation of the
H i column density. Therefore metallicities and absorber sizes
cannot be estimated. Adopting our column density estimates we
infer that this system can be reasonably modelled with or with-
out a specific quasar contribution.
6. Discussion
6.1. The transverse proximity effect in spectral hardness
Fourteen quasars have been found in the vicinity of
HE 2347−4342 of which three are located in the usable part of
the H i Lyα forest towards HE 2347−4342. No H i underdensity
is detected near these foreground quasars even when correcting
for contamination by the O vi absorption from the associated
system of HE 2347−4342. An estimate of the predicted effect
confirms that even if existing, the classical proximity effect is
probably too weak to be detected on this line of sight due to the
high UV background at 1 ryd and small-scale variance in the H i
transmission (Sect. 3).
However, the analysis of the spectral shape of the UV radia-
tion field near the foreground quasars yields a markedly different
result. The spectral shape is fluctuating, but it is predominantly
hard near HE 2347−4342 and the known foreground quasars.
Close to QSO C, both estimators R and η are consistent with a
significantly harder radiation field than on average. There is a
sharp R minimum located precisely at the redshift of the quasar,
but embedded in a broader region of low R values statistically
consistent with a hard radiation field of η <∼ 10 (Fig. 4). The
column density ratio η is also lower than on average and indi-
cates a harder radiation field than obtained for quasar-dominated
models of the UV background (Fig. 7). Because of line blend-
ing, only one of the three metal line systems detected near the
foreground quasars can be used to estimate the shape of the
ionising field. The metal line system at z = 2.275 can be de-
scribed reasonably by the HM01 background with or without a
local ionising component by QSO A. The He ii forest does not
provide independent constraints for this absorber. Line blending
prevents an unambiguous detection of O vi at z = 2.712, leaving
the shape of the ionising field poorly constrained without tak-
ing into account the He ii forest. Thus, the systems show highly
ionised metal species, but our attempts to identify a local quasar
radiation component towards them remain inconclusive.
The most probable sources for the hard radiation field at
z ∼ 2.30 and z ∼ 2.69 towards HE 2347−4342 are the nearby
foreground quasars. In particular, the absorbers with η <∼ 10 have
to be located in the vicinity of an AGN, since the filtering of
quasar radiation over large distances results in η >∼ 50. Also star-
forming galaxies close to the line of sight cannot yield the low
η values, since they are unable to produce significant numbers
of photons at hν > 54.4 eV (Leitherer et al. 1999; Smith et al.
2002; Schaerer 2003). We conclude that there is evidence for a
transverse proximity effect of QSO C detectable via the relative
spectral hardness. There are also indications that QSO A and
QSO B show the same effect, although contamination adds un-
certainty to the spectral shape variations in their projected vicin-
Given these incidences of a hard radiation field near the
quasars, how do these results relate to those of Paper I, in which
we investigated the line of sight towards Q 0302−003? Both
lines of sight show He ii absorption and on both lines of sight
we find evidence for a predominantly hard radiation field near
the quasars in the background and the foreground. However, the
decrease of η near quasars towards Q 0302−003 appears to be
much smoother than towards HE 2347−4342.
There are several reasons for the lack of small-scale spec-
tral shape variations on the line of sight to Q 0302−003. First,
the low-resolution STIS spectrum of Q 0302−003 does not re-
solve the He ii lines and limits the visible scale of fluctuations
to ∆z & 0.006 (Paper I). Much smaller scales can be probed
in the resolved He ii forest of HE 2347−4342, but the fitting of
G. Worseck et al.: The transverse proximity effect in spectral hardness towards HE 2347−4342 11
blended noisy He ii features may result in artifical η variations.
We will discuss the uncertainties of η below (Sect. 6.4). Second,
Q 0302−003 (z = 3.285) probes higher redshifts, where the He ii
fraction in the IGM is significantly higher and the inferred radi-
ation field is very soft (η ∼ 350 in the Gunn-Peterson trough).
Therefore, the impact of a hard source on the spectral shape is
likely to be more pronounced than at lower redshifts after the
end of He ii reionisation, where η of the UV background gradu-
ally decreases.
6.2. The decrease of η near QSO C
We now investigate quantitatively whether the foreground
quasars are capable of creating a hardness fluctuation on the
sightline towards HE 2347−4342. Unfortunately, since only one
quasar is located in an uncontaminated region of the Lyα forests,
we can present sufficient evidence only for QSO C. For the other
two quasars the data is too sparse and contamination adds uncer-
tainty to the derived η, but in principle QSO A should also show
a strong effect, because its Lyman limit flux penetrating the line
of sight is ∼ 8 times higher than the one of QSO C.
Heap et al. (2000) and Smette et al. (2002) presented simple
models of the decrease of η in front of a quasar taking into ac-
count the absorption of ionising photons by the IGM. In a highly
photoionised IGM with helium mass fraction Y ≃ 0.24 and tem-
perature T ≃ 2 × 104 K we have
4 (1 − Y)
αHe ii
ΓHe ii
≃ 0.42
ΓHe ii
, (4)
where Γi and αi are the photoionisation rate and the radiative
recombination coefficient for species i (Fardal et al. 1998). The
photoionisation rate is Γi = Γi,b + Γi,q with a contribution of the
background and the quasar. The contribution of the quasar to the
photoionisation rate of species i at the jth absorber in front of it
(z j > z j+1) is
Γi,q(z j) =
σi fν,i
h(1 + zq)
1 + zq
1 + z j
)−α+1 ( dL(zq, 0)
dL(zq, z j)
x−α−4exp
Ni,kσix
1 + zk
1 + z j
dx, (5)
with the photoionisation cross section at the Lyman limit σi, the
observed Lyman limit flux fν,i and x = ν/νi with the Lyman limit
frequency νi. Extrapolating the power law continuum flux to the
He ii Lyman limit yields fν,He ii = fν,H i4
−α. With the spectral in-
dex α from Table 3 we obtain ηmin ≃ 2.3 for QSO C.
We simulated η(z) for a set of 1000 Monte Carlo Lyα for-
est spectra generated with the procedure discussed in Sect. 4.2.
We assumed ΓH i,b = 1.75 × 10
−12 s−1 corresponding to the UV
background from Sect. 3 and ηb = 100, which agrees with the
median η towards HS 1700+6416 in the redshift range under
consideration (Fechner et al. 2006). The intervening absorbers
successively block the quasar flux. Especially, every absorber
with log NH i > 15.8 will truncate the quasar flux at hν > 4 ryd
due to a He ii Lyman limit system, leading to an abrupt softening
of the radiation field.
Figure 9 presents the simulated decrease of the median η
approaching QSO C assuming a constant quasar luminosity,
isotropic radiation and an infinite quasar lifetime together with
the upper and lower percentiles of the η distribution obtained
in bins of proper distance ∆d = 2 Mpc. The spread in the
simulated η is due to line-of-sight differences in the absorber
Fig. 9. Column density ratio η vs. proper distance d. The black
line shows the modelled decrease of the median η approaching
QSO C with respect to the ambient ηb = 100 (dashed line).
Green lines mark the upper and lower quartiles of the simu-
lated η distribution in bins of ∆d = 2 Mpc. QSO C is located at
7.75 Mpc. Filled circles show the median η from FR07 in con-
centric rings of ∆d = 2 Mpc around the quasar. Error bars are the
quartile distances to the median. The arrow marks the metal line
system at z = 2.7122 at d = 12.03 Mpc. [See the online edition
of the Journal for a colour version of this figure.]
properties. Since we consider the transverse proximity effect,
we are limited to a proper distance d >∼ 7.75 Mpc (Table 3).
The model agrees reasonably with the median η of the data ob-
tained in concentric rings around the quasar. As expected, in-
dividual η values strongly deviate from this simple model due
to the assumptions of the quasar properties (constant luminosity
and spectral index, isotropic radiation) and due to the unknown
real distribution of absorbers in transverse direction. Recently,
Hennawi & Prochaska (2007) found evidence for excess small-
scale clustering of high-column density systems in transverse
direction to quasar sightlines. In Sect. 4.3.3 we found indica-
tions that the η distribution around QSO C is not symmetric,
which could be due to such anisotropic shielding. However, this
does not imply an intrinsic anisotropy due to the unknown mat-
ter distribution around the quasar and the large uncertainties in
individual η values. Moreover, the line-of-sight variance at a
constant η = 100 is too small to explain the large observed
spread of the η values. Clearly, a self-consistent explanation
of the small-scale η fluctuations would require hydrodynami-
cal simulations of cosmological radiative transfer in order to in-
vestigate possible shielding effects and the statistical distribu-
tion of η values near quasars. While there is recent progress in
case of the UV background (Sokasian et al. 2003; Croft 2004;
Maselli & Ferrara 2005; Bolton et al. 2006), a proper treatment
of three-dimensional radiative transfer in the IGM around a
quasar is still in its infancy. However, our simplified approach
suggests that QSO C is capable of changing the spectral shape
of the UV radiation field by the right order of magnitude to ex-
plain the low η values in its vicinity. Also a variation in the sizes
and the centres of the bins chosen for Fig. 9 does not drastically
change the indicated excess of low η at d <∼ 14 Mpc. Figure 9
also shows very clearly that the sphere of influence for the trans-
verse proximity effect is not limited to the immediate vicinity of
the quasar.
Figure 10 shows a two-dimensional cut in comoving space
near QSO C in the plane spanned by both lines of sight. The
minimum separation of both lines of sight corresponds to a light
travel time of ≃ 25.2 Myr, but the lifetime of QSO C could be
>∼ 40 Myr due to the low η values at larger distances. The fluctua-
12 G. Worseck et al.: The transverse proximity effect in spectral hardness towards HE 2347−4342
Fig. 10. Transverse comoving separation ∆r⊥ vs. line-of-sight
comoving separation ∆r‖ with respect to QSO C. Black (green)
points denote absorbers with η < 100 (η ≥ 100) on the line of
sight towards HE 2347−4342 (curved line) with indicated red-
shifts. The blue arrow points to the metal line system at z =
2.712. The half circles show the distance travelled by light emit-
ted at the indicated times prior to our observation. The minimum
light travel time between the two lines of sight is 25.19 Myr. [See
the online edition of the Journal for a colour version of this fig-
ure.]
tions of the UV spectral shape could be explained by shadowing
of the hard QSO radiation by unknown intervening structures
between both lines of sight.
6.3. Other regions with an inferred hard UV radiation field
In Fig. 4 we note two additional regions at z ∼ 2.48 and z ∼ 2.53
where R is prominently small and where there is no nearby
quasar. Also the fitted η(z) shows very low values apparently
unrelated to a known foreground quasar. Figure 11 displays the
redshift distribution of the η ≤ 10 subsample. The low η values
are clustered with two peaks near the foreground quasars, but
also at z ∼ 2.40, z ∼ 2.48 and z ∼ 2.53. At the first glance the ex-
istence of such regions seems to undermine the relation between
the foreground quasars and a low η in their vicinity. However,
there are several plausible explanations for the remaining low η
values:
1. Unknown quasars: We can conclude from Paper I that the
quasars responsible for hardness fluctuations may be very
faint (like Q 0302-D113 in Paper I) or may reside at large
distances (Q 0301−005 in Paper I). QSO C is located near
the edge of our survey area centred on HE 2347−4342, so
other quasars capable of influencing the UV spectral shape
might be located outside the field of view. Moreover, in or-
der to sample the full quasar luminosity domain (MB ≤ −23)
at z ∼ 2.5 our survey is still too shallow by ∼ 1 mag-
nitude. Therefore, a larger and/or deeper survey around
HE 2347−4342 is desirable.
2. Quasar lifetime: Assuming that quasars are long-lived and
radiate isotropically, every statistically significant low η fluc-
tuation should be due to a nearby quasar. On the other hand,
short-lived quasars will not be correlated with a hard radia-
tion field due to the light travel time from the quasar to the
background line of sight. Quasar lifetimes are poorly con-
strained by observations to 1 <∼ tq <∼ 100 Myr (Martini 2004).
This could be short enough to create relic light echoes from
extinct quasars. The comoving space density of quasars with
Fig. 11. Redshift distribution of low-η absorbers. The open
(hashed) histogram shows all (NH i ≤ 10
14 cm−2) absorbers with
η ≤ 10. Letters and dotted lines mark foreground quasars. The
horizontal dashed line denotes the estimated average number of
absorbers scattered from η = 80 to η ≤ 10 (≃ 0.64 per bin).
MB < −23 at z ≃ 2.5 is ≃ 3.7×10
−6 Mpc−3 (Wolf et al. 2003)
resulting in an average proper separation of ∼ 18.5 Mpc be-
tween two lines of sight. This translates into a light travel
time of ∼ 60 Myr which is of the same order as the quasar
lifetime. So it is quite possible that some quasars have al-
ready turned off, but their hard radiation is still present.
3. Obscured quasars: Anisotropic emission of type I quasars
may lead to redshift offsets between regions with an inferred
hard radiation field and quasars close to the line of sight. In
the extreme case the putative quasar radiates in transverse di-
rection, but is obscured on our line of sight (type II quasar).
The space density of type II AGN at z > 2 is very uncer-
tain due to the challenging optical follow-up that limits the
survey completeness (e.g. Barger et al. 2003; Szokoly et al.
2004; Krumpe et al. 2007). Thus, the fraction of obscured
AGN at high redshift is highly debated (Akylas et al. 2006;
Treister & Urry 2006), but may well equal that of type I
AGN in the luminosity range of interest (Ueda et al. 2003).
We believe that a combination of the above effects is respon-
sible for the loose correlation between low η values and active
quasars. In particular, at z ∼ 2.4 we infer a hard radiation field in
a H i void (FR07), which may have been created by a luminous
quasar that is unlikely to be missed by our survey (V <∼ 22).
In Fig. 11 we also indicate the error level due to inaccurate
line fitting and noise in the He ii data (dashed line) obtained from
simulated data (see below). The low number of η values scattered
from a simulated η = 80 to η ≤ 10 implies that the overdensities
of such small η values are statistically significant. Constraining
the sample to lines with log(NH i) < 14 due to a possible bias
caused by thermal broadening does not remove the significant
clusters of lines with small η.
6.4. Uncertainties in the spectral hardness
Our findings are likely to be affected by random errors and possi-
bly also by systematic errors mostly related to the He ii data. The
poor quality of the FUSE spectrum of HE 2347−4342 (S/N <∼ 5)
contributes to the fluctuations in η even if the η value was con-
stant (Fechner et al. 2006, see also below). The optical depth
ratio R should be less affected by noise, since it is an average
over a broader redshift range ∆z = 0.005. The low S/N and the
generally high absorption at η ≫ 1 provide uncertainty for the
continuum determination in the He ii spectrum. The extrapolated
reddened power law is certainly an approximation.
Although the η fitting results from FR07 are broadly consis-
tent with those of Z04 and agree well in the regions near the fore-
G. Worseck et al.: The transverse proximity effect in spectral hardness towards HE 2347−4342 13
ground quasars, there are substantial differences in some redshift
ranges. This is probably due to the combined effects of low He ii
data quality, different data analysis software and ambiguities in
the deblending of lines. At present, η cannot be reliably deter-
mined at individual absorbers unless metal transitions provide
further constraints.
In order to assess the random scatter in η due to the low S/N
He ii data and ambiguities in the line deblending of both species,
we again used Monte Carlo simulations. Ten H i Lyα forest spec-
tra were generated in the range 2 < z < 3 via the Monte Carlo
procedure outlined in Sect. 4.2. The resolution R ∼ 42000 and
S/N = 100 closely resembles the optical data of HE 2347−4342.
We also generated the corresponding He ii forests at FUSE res-
olution and S/N = 4. We assumed pure non-thermal line broad-
ening and η = 80. Voigt profiles were automatically fitted to
the H i spectra using AUTOVP 1 (Davé et al. 1997). The He ii
spectra were then automatically fitted with redshifts z and non-
thermal Doppler parameters bH i fixed from the fitted H i line
lists, yielding 7565 simulated η values. On average the recov-
ered η is slightly higher than the simulated one (median η ≃ 89)
with a large spread (0 < η <∼ 8000), but only 285 lines have
η ≤ 10. Thus, we estimate a probability P ≃ 3.8% that η is
scattered randomly from η = 80 to η ≤ 10 if the assumption of
non-thermal broadening is correct. Note that this probability is
likely an upper limit due to the fact that only H i Lyα was used
to obtain the line parameters, which results in large error bars
for saturated lines on the flat part of the curve of growth. In the
real data, these errors were avoided by fitting unsaturated higher
orders of the Lyman series wherever possible.
In the line sample by FR07, 94 out of the 526 absorbers have
η ≤ 10, whereas our simulation implies that only ∼ 20 are ex-
pected to be randomly scattered to η ≤ 10 if η was constant.
Thus, the major part of the scatter of η in the data is due to real
fluctuations in the UV spectral shape. The majority of the low
η ≤ 10 values is inconsistent with η ≥ 80, so they indicate a
hard radiation field in spite of the low S/N in the He ii data. Yet,
due to the large intrinsic scatter obtained from the simulations,
individual η values hardly trace the variations of the UV spectral
shape. Local spatial averages should be more reliable (FR07).
Since the transverse proximity effect zones always extend over
some redshift range, this requirement is fulfilled and on average
we reveal a harder radiation field than expected.
Concerning the high tail of the simulated distribution at
η = 80, ∼ 15% of the lines are returned with η >∼ 200. This
may indicate that a fraction of the observed high η values is still
consistent with a substantially harder radiation field, underlining
that single η values poorly constrain the spectral shape.
Possibly, some η values are systematically too low due to
the assumption of non-thermal broadening (bHe ii = bH i) when
fitting the He ii forest. FR07 found that this leads to underes-
timated η values at NH i >∼ 10
13 cm−2 if the lines are in fact
thermally broadened (bHe ii = 0.5bH i). Non-thermal broaden-
ing is caused by turbulent gas motions or the differential Hubble
flow, with the latter affecting in particular the low-column den-
sity forest. Thermal broadening becomes important in collapsed
structures at high column densities. In simulations of the Lyα
forest, non-thermal broadening has been found to dominate
(Zhang et al. 1995, 1998; Hernquist et al. 1996; Weinberg et al.
1997; Bolton et al. 2006; Liu et al. 2006). This has been con-
firmed observationally for the low-column density forest (Z04;
Rauch et al. 2005). On the other hand, eight out of eleven ab-
sorbers with NH i > 10
14 cm−2 in the vicinity of QSO C have
1 http://ursa.as.arizona.edu/˜rad/autovp.tar
η ≤ 10 (Fig. 7). Although the column density ratio of these
absorbers could be underestimated due to an unknown contri-
bution of thermal broadening, the statistical evidence for a hard
radiation field is based on the vast majority of low-column den-
sity lines. The median η obtained in this region does not in-
crease significantly after excluding the suspected lines (∼ 53
vs. ∼ 40). This is still much lower than the median η ∼ 100
towards HS 1700+6416 in this redshift range (Fechner et al.
2006). Therefore, it is unlikely that our results are biased due
to the assumed line broadening.
7. Conclusions
Traditionally, the transverse proximity effect of a quasar has
been claimed to be detectable as a radiation-induced void in the
H i Lyα forest. But due to several systematic effects like quasar
variability, finite quasar lifetime, intrinsic overdensities around
quasars, or anisotropic radiation, most searches yielded negative
results (e.g. Schirber et al. 2004; Croft 2004).
In this paper, we have analysed the fluctuating spectral shape
of the UV background in the projected vicinity of the three
foreground quasars QSO J23503−4328, QSO J23500−4319 and
QSO J23495−4338 (dubbed QSO A, B and C) on the line of
sight towards HE 2347−4342 (z = 2.885). By comparing the
H i absorption and the corresponding He ii absorption, we have
presented evidence for a statistical excess of hard UV radiation
near the foreground quasars. However, due to contamination of
the forests near QSO A (z = 2.282) and QSO B (z = 2.302),
the evidence is strongest for QSO C (z = 2.690). We interpret
these indicators for an excess of hard radiation near the fore-
ground quasars as a manifestation of the transverse proximity
effect. A simple model indicates that the foreground quasars are
capable of generating the observed hard radiation over the ob-
served distances of several Mpc. Furthermore, we tried to model
the ionising radiation field of three metal line systems close to
the foreground quasars. Two of those are strongly affected by
line blending and do not allow for reliable photoionisation mod-
els. The remaining system can be modelled reasonably with or
without a contribution by a local quasar. Future larger samples of
highly ionised unblended metal systems near foreground quasars
may provide evidence for local hardness fluctuations.
In Worseck & Wisotzki (2006) we revealed the transverse
proximity effect as a systematic local hardness fluctuation
around four foreground quasars near Q 0302−003 and pointed
out that the relative UV spectral hardness is a sensitive phys-
ical indicator of the proximity effect over distances of several
Mpc. In this study we are able to confirm this on a second line
of sight. Evidently, small-scale transmission fluctuations in the
H i forest can dilute the small predicted signature of the effect.
However, the hard spectral shape of the UV radiation field still
indicates the transverse proximity effect despite the H i density
fluctuations. Thus, we confirm our previous result that the spec-
tral hardness breaks the density degeneracy that affects the tradi-
tional searches for the proximity effect. Moreover, the predicted
transverse proximity effect of the quasars in the H i forest is weak
due to the high UV background at 1 ryd. Still the UV spectral
shape is able to discriminate local UV sources independent of
the amplitude of the UV background.
Bolton et al. (2006) find that the large UV spectral shape
fluctuations in the IGM are likely due to the small number of
quasars contributing to the He ii ionisation rate at any given
point, whereas the H i ionisation rate is rather homogeneous
due to the probable contribution of star-forming galaxies (e.g.
Bianchi et al. 2001; Sokasian et al. 2003; Shapley et al. 2006).
14 G. Worseck et al.: The transverse proximity effect in spectral hardness towards HE 2347−4342
Our findings confirm the picture that AGN create the hard part of
the intergalactic UV radiation field. If the quasar is active long
enough, its hard radiation field can be observed penetrating a
background line of sight. It is also likely that light echoes from
already extinguished quasars are responsible for some locations
of hard radiation without an associated quasar. The transverse
proximity effect of QSO C implies a minimum quasar lifetime
of ∼ 25 Myr (probably even ∼ 40 Myr), providing additional
constraints to more indirect estimates (e.g. Martini 2004, and
references therein).
However, the UV radiation field near the foreground quasars
is not homogeneously hard as naively expected, but still shows
fluctuations. Apart from substantial measurement uncertainties,
the unknown density structure around the quasar could shield the
ionising radiation in some directions, maybe even preferentially
in transverse direction to the line of sight (Hennawi & Prochaska
2007). Thus, radiative transfer effects may become important
to explain a fluctuating UV spectral shape in the presence of
a nearby quasar. Large-scale simulations of cosmological radia-
tive transfer with discrete ionising sources are required to adress
these issues in detail.
Moreover, the He ii forest has been resolved so far only to-
wards two quasars at a very low S/N <∼ 5. While the low data
quality primarily creates uncertainties in the spectral shape on
small spatial scales, large scales could be affected by cosmic
variance. Thus, the general redshift evolution of the UV spec-
tral shape is not well known and estimates obtained from single
lines of sight may well be biased by local sources.
Acknowledgements. We thank the staff of the ESO observatories La Silla and
Paranal for their professional assistance in obtaining the optical data discussed in
this paper. We are grateful to Peter Jakobsen for agreeing to publish the quasars
from his survey. We thank Gerard Kriss and Wei Zheng for providing the re-
duced FUSE spectrum of HE 2347−4342. Tae-Sun Kim kindly supplied an addi-
tional line list of HE 2347−4342. GW and ADA acknowledge support by a HWP
grant from the state of Brandenburg, Germany. CF is supported by the Deutsche
Forschungsgemeinschaft under RE 353/49-1. We thank the anonymous referee
for helpful comments.
References
Akylas, A., Georgantopoulos, I., Georgakakis, A., Kitsionas, S., &
Hatziminaoglou, E. 2006, A&A, 459, 693
Appenzeller, I., Fricke, K., Furtig, W., et al. 1998, The Messenger, 94, 1
Asplund, M., Grevesse, N., & Sauval, A. J. 2005, in ASP Conf. Ser. 336: Cosmic
Abundances as Records of Stellar Evolution and Nucleosynthesis, 25, astro–
ph/0410214
Baade, D., Meisenheimer, K., Iwert, O., et al. 1999, The Messenger, 95, 15
Bajtlik, S., Duncan, R. C., & Ostriker, J. P. 1988, ApJ, 327, 570
Ballester, P., Mondigliani, A., Boitquin, O., et al. 2000, The Messenger, 101, 31
Barger, A. J., Cowie, L. L., Capak, P., et al. 2003, AJ, 126, 632
Bergeron, J., Petitjean, P., Aracil, B., et al. 2004, The Messenger, 118, 40
Bianchi, S., Cristiani, S., & Kim, T.-S. 2001, A&A, 376, 1
Bolton, J. S., Haehnelt, M. G., Viel, M., & Carswell, R. F. 2006, MNRAS, 366,
Cardelli, J. A., Clayton, G. C., & Mathis, J. S. 1989, ApJ, 345, 245
Croft, R. A. C. 2004, ApJ, 610, 642
Crotts, A. P. S. 1989, ApJ, 336, 550
Crotts, A. P. S. & Fang, Y. 1998, ApJ, 502, 16
Davé, R., Hernquist, L., Weinberg, D. H., & Katz, N. 1997, ApJ, 477, 21
Dobrzycki, A. & Bechtold, J. 1991a, ApJ, 377, L69
Dobrzycki, A. & Bechtold, J. 1991b, in ASP Conf. Ser. 21: The Space
Distribution of Quasars, 272
Fardal, M. A., Giroux, M. L., & Shull, J. M. 1998, AJ, 115, 2206
Fardal, M. A. & Shull, J. M. 1993, ApJ, 415, 524
Fechner, C., Baade, R., & Reimers, D. 2004, A&A, 418, 857
Fechner, C. & Reimers, D. 2007a, A&A, 461, 847
Fechner, C. & Reimers, D. 2007b, A&A, 463, 69
Fechner, C., Reimers, D., Kriss, G. A., et al. 2006, A&A, 455, 91
Ferland, G. J., Korista, K. T., Verner, D. A., et al. 1998, PASP, 110, 761
Fernández-Soto, A., Barcons, X., Carballo, R., & Webb, J. K. 1995, MNRAS,
277, 235
Gallerani, S., Ferrara, A., Fan, X., & Roy Choudhury, T. 2007, MNRAS, sub-
mitted, arXiv:0706.1053
Gaskell, C. M. 1982, ApJ, 263, 79
Giallongo, E., Cristiani, S., D’Odorico, S., Fontana, A., & Savaglio, S. 1996,
ApJ, 466, 46
Guimarães, R., Petitjean, P., Rollinde, E., et al. 2007, MNRAS, 377, 657
Haardt, F. & Madau, P. 1996, ApJ, 461, 20
Haardt, F. & Madau, P. 2001, in Clusters of Galaxies and the High Redshift
Universe Observed in X-rays, ed. D. M. Neumann & J. T. T. Van, 64
Heap, S. R., Williger, G. M., Smette, A., et al. 2000, ApJ, 534, 69
Hennawi, J. F. & Prochaska, J. X. 2007, ApJ, 655, 735
Hernquist, L., Katz, N., Weinberg, D. H., & Miralda-Escudé, J. 1996, ApJ, 457,
Horne, K. 1986, PASP, 98, 609
Jakobsen, P. 1998, A&A, 335, 876
Jakobsen, P., Jansen, R. A., Wagner, S., & Reimers, D. 2003, A&A, 397, 891
Kim, T.-S., Carswell, R. F., Cristiani, S., D’Odorico, S., & Giallongo, E. 2002,
MNRAS, 335, 555
Kim, T.-S., Cristiani, S., & D’Odorico, S. 2001, A&A, 373, 757
Kriss, G. A., Shull, J. M., Oegerle, W., et al. 2001, Sci, 293, 1112
Krumpe, M., Lamer, G., Schwope, A. D., et al. 2007, A&A, 466, 41
Landolt, A. U. 1992, AJ, 104, 340
Leitherer, C., Schaerer, D., Goldader, J. D., et al. 1999, ApJS, 123, 3
Liske, J. 2000, MNRAS, 319, 557
Liske, J. & Williger, G. M. 2001, MNRAS, 328, 653
Liu, J., Jamkhedkar, P., Zheng, W., Feng, L.-L., & Fang, L.-Z. 2006, ApJ, 645,
Loeb, A. & Eisenstein, D. J. 1995, ApJ, 448, 17
Martini, P. 2004, in Carnegie Observatories Astrophysics Series Vol. 1:
Coevolution of Black Holes and Galaxies, ed. L. C. Ho (Cambridge
University Press), 170
Maselli, A. & Ferrara, A. 2005, MNRAS, 364, 1429
McDonald, P., Seljak, U., Cen, R., Bode, P., & Ostriker, J. P. 2005, MNRAS,
360, 1471
McIntosh, D. H., Rix, H.-W., Rieke, M. J., & Foltz, C. B. 1999, ApJ, 517, L73
Miralda-Escudé, J. 1993, MNRAS, 262, 273
Miralda-Escudé, J., Haehnelt, M., & Rees, M. J. 2000, ApJ, 530, 1
Møller, P. & Kjærgaard, P. 1992, A&A, 258, 234
Picard, A. & Jakobsen, P. 1993, A&A, 276, 331
Rauch, M., Becker, G. D., Viel, M., et al. 2005, ApJ, 632, 58
Reimers, D., Köhler, S., Wisotzki, L., et al. 1997, A&A, 327, 890
Rollinde, E., Srianand, R., Theuns, T., Petitjean, P., & Chand, H. 2005, MNRAS,
361, 1015
Schaerer, D. 2003, A&A, 397, 527
Schaye, J., Carswell, R. F., & Kim, T.-S. 2007, MNRAS, submitted, astro-
ph/0701761
Schirber, M., Miralda-Escudé, J., & McDonald, P. 2004, ApJ, 610, 105
Schlegel, D. J., Finkbeiner, D. P., & Davis, M. 1998, ApJ, 500, 525
Scott, J., Bechtold, J., Dobrzycki, A., & Kulkarni, V. P. 2000, ApJS, 130, 67
Scott, J., Kriss, G. A., Brotherton, M., et al. 2004, ApJ, 615, 135
Shapley, A. E., Steidel, C. C., Pettini, M., Adelberger, K. L., & Erb, D. K. 2006,
ApJ, 651, 688
Shull, J. M., Tumlinson, J., Giroux, M. L., Kriss, G. A., & Reimers, D. 2004,
ApJ, 600, 570
Smette, A., Heap, S. R., Williger, G. M., et al. 2002, ApJ, 564, 542
Smith, L. J., Norris, R. P. F., & Crowther, P. A. 2002, MNRAS, 337, 1309
Sokasian, A., Abel, T., & Hernquist, L. 2003, MNRAS, 340, 473
Srianand, R. 1997, ApJ, 478, 511
Szokoly, G. P., Bergeron, J., Hasinger, G., et al. 2004, ApJS, 155, 271
Telfer, R. C., Zheng, W., Kriss, G. A., & Davidsen, A. F. 2002, ApJ, 565, 773
Tepper-Garcı́a, T. 2006, MNRAS, 369, 2025
Tittley, E. R. & Meiksin, A. 2006, astro-ph/0605317
Treister, E. & Urry, C. M. 2006, ApJ, 652, L79
Tytler, D. & Fan, X. 1992, ApJS, 79, 1
Ueda, Y., Akiyama, M., Ohta, K., & Miyaji, T. 2003, ApJ, 598, 886
Véron-Cetty, M.-P. & Véron, P. 2006, A&A, 455, 773
Weinberg, D. H., Hernquist, L., Katz, N., Croft, R., & Miralda-Escudé, J. 1997,
in Proceedings of the 13th IAP Astrophysics Colloquium: Structure and
Evolution of the Intergalactic Medium from QSO Absorption Line Systems,
ed. P. Petitjean & S. Charlot (Paris: Editions Frontières), 133
Weymann, R. J., Carswell, R. F., & Smith, M. G. 1981, ARA&A, 19, 41
Wisotzki, L., Selman, F., & Gilliotte, A. 2001, The Messenger, 104, 8
Wolf, C., Wisotzki, L., Borch, A., et al. 2003, A&A, 408, 499
Worseck, G. & Wisotzki, L. 2006, A&A, 450, 495
Zhang, Y., Anninos, P., & Norman, M. L. 1995, ApJ, 453, L57
Zhang, Y., Meiksin, A., Anninos, P., & Norman, M. L. 1998, ApJ, 495, 63
G. Worseck et al.: The transverse proximity effect in spectral hardness towards HE 2347−4342 15
Zheng, W., Kriss, G. A., Deharveng, J.-M., et al. 2004, ApJ, 605, 631
List of Objects
‘HE 2347−4342’ on page 1
‘HS 1700+6416’ on page 1
‘Q 0302−003’ on page 2
‘QSO J23510−4336’ on page 4
‘QSO J23507−4319’ on page 4
‘QSO J23507−4326’ on page 4
‘QSO J23509−4330’ on page 4
‘QSO J23502−4334’ on page 4
‘QSO J23503−4328’ on page 4
‘QSO J23495−4338’ on page 4
‘QSO J23511−4319’ on page 4
‘QSO J23514−4339’ on page 4
‘QSO J23503−4317’ on page 4
‘QSO J23515−4324’ on page 4
‘QSO J23512−4332’ on page 4
‘QSO J23508−4335’ on page 4
‘QSO J23500−4319’ on page 4
‘Q 0302-D113’ on page 12
‘Q 0301−005’ on page 12
	Introduction
	Observations and data reduction
	Search for QSO candidates near HE 2347-4342
	Spectroscopic follow-up
	Additional quasars
	Redshifts and magnitudes
	Optical spectra of HE 2347-4342
	Far-UV spectra of HE 2347-4342
	The Ly forest near the foreground quasars
	The fluctuating shape of the UV radiation field towards HE 2347-4342
	Diagnostics
	Fluctuations in R and  along the line of sight
	The UV radiation field near the quasars
	HE 2347-4342
	QSOs A and B
	QSO C
	Constraints from metal line systems
	The system at z=2.275 near QSO A
	The systems near QSO B and QSO C
	Discussion
	The transverse proximity effect in spectral hardness
	The decrease of  near QSO C
	Other regions with an inferred hard UV radiation field
	Uncertainties in the spectral hardness
	Conclusions
ABSTRACT
  We report the discovery of 14 quasars in the vicinity of HE2347-4342, one of
the two quasars whose intergalactic HeII forest has been resolved with FUSE. By
analysing the HI and the HeII opacity variations separately, no transverse
proximity effect is detected near three foreground quasars of HE2347-4342:
QSOJ23503-4328 (z=2.282, $\vartheta=3.59$ arcmin), QSOJ23500-4319 (z=2.302,
$\vartheta=8.77$ arcmin) and QSOJ23495-4338 (z=2.690, $\vartheta=16.28$
arcmin). This is primarily due to line contamination and overdensities probably
created by large-scale structure. By comparing the HI absorption and the
corresponding HeII absorption, we estimated the fluctuating spectral shape of
the extragalactic UV radiation field along this line of sight. We find that the
UV spectral shape near HE2347-4342 and in the projected vicinity of the three
foreground quasars is statistically harder than expected from UV background
models dominated by quasars. In addition, we find three highly ionised metal
line systems near the quasars. However, they do not yield further constraints
on the shape of the ionising field. We conclude that the foreground quasars
show a transverse proximity effect that is detectable as a local hardening of
the UV radiation field, although the evidence is strongest for QSOJ23495-4338.
Thus, the relative spectral hardness traces the proximity effect also in
overdense regions prohibiting the traditional detection in the HI forest.
Furthermore, we emphasise that softening of quasar radiation by radiative
transfer in the intergalactic medium is important to understand the observed
spectral shape variations. From the transverse proximity effect of
QSOJ23495-4338 we obtain a lower limit on the quasar lifetime of ~25 Myr.

<|endoftext|><|startoftext|>
Introduction
The behaviour of random walks on random combs is of interest from a number of points
of view. Condensed matter physicists have studied such structures because they serve
as a model for diffusion in more complicated fractals and percolation clusters [1, 2, 3, 4].
In the context of quantum gravity, random combs are a tractable example of a random
manifold ensemble and understanding their geometric properties can provide insight
into higher dimensional problems [5, 6, 7]. Most of the literature concerns approximate
analytical techniques and numerical solutions, although there are exact calculations of
leading order behaviour in some cases [8]. To this end, it is desirable to have rigorous
methods for determining the geometric quantities of interest and that is the purpose of
this paper.
One such quantity is the dimensionality of the ensemble. On a sufficiently smooth
manifold all definitions of dimension will agree, but for fractal geometries like random
combs this is not necessarily true. The spectral dimension is defined to be ds provided
the ensemble average probability of a random walker being back at the origin at time t,
takes the asymptotic form t−ds/2. This concept of dimension does not in general agree
with the Hausdorff dimension dH , which is defined when the expectation value of the
volume enclosed within a geodesic distance R from a marked point scales like RdH as
R → ∞.
http://arxiv.org/abs/0704.0188v2
Biased random walks on random combs 2
We know that for diffusion on regular structures the mean square displacement at
large times is proportional to t, but for a fractal substrate there is anomalous diffusion
and the mean square displacement behaves like t2/dw , where dw represents the fractal
dimension of the walk and depends sensitively on the nature of the random structure.
Biased random walks on combs have also been studied in connection with disordered
materials, since such a system is a paradigm for diffusion on fractal structures in the
presence of an applied field [9, 10]. As we discuss later there are several different bias
regimes. Topological bias, where at every vertex in the comb there is an increased
probability of moving away from the origin was first studied for a random comb with a
power-law distribution of tooth lengths in [11]. Other works have discussed the effects
of bias away from the origin only in the teeth [12] and only in the spine [13]. The effect
of going into the teeth can be viewed as creating a waiting time for the walk along the
spine; the distribution of the waiting time depends on both the bias and the length of
the teeth and the outcome is the result of subtle interplay between the two.
In [14] some new, rigorous techniques were developed to study random walks on
combs. This enabled an exact, but very simple calculation of the spectral dimension of
random combs. The principal idea is to split both random combs and random walks
into subsets that give either strictly controllable or exponentially decaying contributions
to the calculation of physical characteristics. These methods were later reinforced to
prove that the spectral dimension of generic infinite tree ensembles is 4/3 [15, 16]. In
this paper we use and extend the techniques of [14] to deal with biased walks on combs.
Some of our results are new; some qualify statements made in the literature; and some
merely confirm results already derived by other, usually less rigorous, methods.
The random combs, the bias scenario, some useful generating functions and the
critical exponents are defined in the next section. In Section 3 we introduce some
deterministic combs, discuss general properties of the generating functions and establish
bounds that will be instrumental when studying random ensembles. Section 4 looks at
regions of bias where the large time behaviour is independent of the comb ensemble or
simply dependent on the expectation value of the first return generating function in the
teeth. In Section 5 we compute the spectral dimension in regions of bias where it is
influenced by the probability measure on the teeth. Two specific cases are considered:
the random comb with nonzero probability of an infinitely long tooth at each vertex on
the spine and the random comb with a power law distribution of tooth lengths. Section
6 examines transport properties along the spine for these same probability measures and
in the final section we review the main results, compare with the literature and discuss
their significance. Some exact calculations and proofs omitted from the main text are
outlined in the appendices.
2. Definitions
Wherever possible we use the definitions and notation of [14]; we repeat them here for
the reader’s convenience but mostly refer back to [14] for proofs and derived properties.
Biased random walks on random combs 3
2.1. Random combs
LetN∞ denote the nonnegative integers regarded as a graph so that n has the neighbours
n ± 1 except for 0 which only has 1 as a neighbour. Let Nℓ be the integers 0, 1, . . . , ℓ
regarded as a graph so that each integer n ∈ Nℓ has two neighbours n± 1 except for 0
and ℓ which only have one neighbour, 1 and ℓ− 1, respectively. A comb C is an infinite
Figure 1. A comb.
rooted tree-graph with a special subgraph S called the spine which is isomorphic to N∞
with the root, which we denote r, at n = 0. At each vertex of S, except the root r, there
is attached by their endpoint 0 one of the graphs Nℓ or N∞. The linear graphs attached
to the spine are called the teeth of the comb, see figure 1. We will denote by Tn the
tooth attached to the vertex n on S, and by Ck the comb obtained by removing the links
(0, 1), . . . , (k − 1, k), the teeth T1, . . . , Tk and relabelling the remaining vertices on the
spine in the obvious way. An arbitrary comb is specified by a list of its teeth {T1, . . .}
and |Tk| denotes the length of the tooth. Note that we have excluded the possibility
of a tooth of zero length. This is for technical convenience in what follows and can be
relaxed [17].
In this paper we are interested in random combs for which the length ℓ of each
tooth is identically and independently distributed with probability µℓ. This induces a
probability measure µ on the positive integers and expectation values with respect to
this measure will be denoted 〈·〉µ. In particular we will consider the two measures
µAℓ =
p, ℓ = ∞,
1− p, ℓ = 1,
0, otherwise;
µBℓ =
, a > 1. (1)
Biased random walks on random combs 4
However, the results proved for µB apply to any measure with the same behaviour at
large ℓ and we note in passing that the methods used here will work for any distribution
that is reasonably smooth, for example the exponential distribution. The measure µB
has been discussed quite extensively in the literature but µA has not.
2.2. Biased random walks
We regard time as integer valued and consider a walker who makes one step on the
graph for each unit time interval. If the walker is at the root or at the end-point of a
tooth then she leaves with probability 1. If at any other vertex the probabilities are
parametrized by two numbers ǫ1 and ǫ2 as shown in figure 2a and the allowed range of
these parameters is shown in figure 2b. For walks in the teeth there is bias away from or
towards the spine depending on whether ǫ2 is positive or negative; similarly a walk on
the spine is biased away from or towards the root depending on whether ǫ1 is positive
or negative. When there is no bias we say that the walk is ‘critical’; the fully critical
case ǫ1 = ǫ2 = 0 was covered in [14]. The notation
b− = 1− ǫ1 − ǫ2,
b+ = 1 + ǫ1 − ǫ2,
bT = 1 + 2ǫ2, (2)
will be used where applicable since these combinations appear often in our analysis. We
denote by B,B′, B1, B2 etc constants which depend on ǫ1 and ǫ2 and may vary from line
to line but are positive and finite on the relevant range; other constants will be denoted
c, c′ etc.
The generating function for the probability pC(t) that the walker on C is back at
the root at time t having left it at t = 0 is defined by
QC(x) =
(1− x)t/2pC(t). (3)
Letting ω be a walk on C starting at r, ω(t) the vertex where the walker is to be found
at time t, and ρω(t) the probability for the walker to step from ω(t) to ω(t+1), we have
QC(x) =
ω:r→r
(1− x)
|ω|−1
ρω(t). (4)
A similar relation gives the generating function for probabilities for first return to the
root, PC(x), except that the trivial walk of duration 0 is excluded. The two functions
are related by
QC(x) =
1− PC(x)
, (5)
and it is straightforward to show that PC(x) satisfies the recurrence relation
PC(x) =
(1− x)b−
3− b+PC1(x)− bTPT1(x)
. (6)
Biased random walks on random combs 5
PSfrag replacements
(1 + 2ǫ2)
(1− 2ǫ2)1
(1 + 2ǫ2)
(1− ǫ1 − ǫ2) 13(1 + ǫ1 − ǫ2)
PSfrag replacements
(1 + 2ǫ2)
(1− 2ǫ2)
(1 + 2ǫ2)
(1− ǫ1 − ǫ2)
(1 + ǫ1 − ǫ2) ǫ1
Figure 2. Bias parameterisation.
Note that PC(x) and QC(x) depend upon ǫ1 and ǫ2; to avoid clutter we will normally
suppress this dependence but if necessary it will appear as superscripts. It is important
for what follows that QC is a convex function of PC which is itself a convex function of
PT1 , ...PTk , PCk for any k > 0.
For an ensemble of combs, we will denote the expectation values of the generating
functions for return and first return probabilities as
Q(x) = 〈QC(x)〉µ
P (x) = 〈PC(x)〉µ . (7)
We will say that g(x) ∼ f(x) if there exist positive constants c, c′, σ, σ′ and x0 such
c f(x) exp
−σ(log |f(x)|)1/a
< g(x) < c′ f(x)| log f(x)|σ′ (8)
for 0 < x ≤ x0. The tactic of this paper is to prove bounds of this form for the generating
functions; in almost all cases our results are in fact a little stronger having σ = σ′ = 0
Biased random walks on random combs 6
when we will say that g(x) ≈ f(x).
The random walk on C is recurrent if PC(0) = 1 in which case we define the
exponent β through
1− PC(x) ∼ xβ . (9)
If β is an integer then we expect logarithmic corrections and define β̃ if
1− PC(x) ≈ xβ | log x|−β̃. (10)
It follows that QC(x) diverges as x → 0 and we define α by
QC(x) ∼ x−α, (11)
and if α is an integer, α̃ when
QC(x) ≈ x−α| log x|α̃. (12)
If PC(0) < 1 then the random walk is non-recurrent, or transient, and QC(x) is finite
as x → 0. Then if, as x → 0, the first k − 1 derivatives of QC(x) are finite but the kth
derivative diverges we define the exponent αk by
C (x) ∼ x
−αk , (13)
and if αk is an integer, α̃k when
C (x) ≈ x
−αk | log x|α̃k . (14)
In considering the ensemble of combs µ, we define all these exponents in exactly
the same way simply replacing PC(x) with 〈PC(x)〉µ and so on. Note that for a single
recurrent comb β = α but in an ensemble this is no longer necessarily the case; applying
Jensen’s inequality to (5) we see that β ≤ α.
If Q(k)(x) ∼ x−αk then it is straightforward to show that
Rk(λ) =
tk 〈pC(t)〉µ ∼ λ
−αk . (15)
It follows that if the sequence decays uniformly at large t, which we do not prove, then
it falls off as tαk−1−k. Thus we define ds = 2(1 + k − αk). Similarly if Q(x) ≈ | log x|α̃
then R(λ) ≈ | log λ|α̃ and, again assuming uniformity, p(t) falls off as t−1 | log t|α̃−1.
2.3. Two-point functions
Let p1C(t;n) denote the probability that the walker on C, having left r at t = 0 and not
subsequently returned there, is at point n on the spine at time t. The corresponding
generating function, which we will call the two-point function, is defined by
GC(x;n) =
(1− x)t/2p1C(t;n). (16)
Letting ω be a walk on C starting at r and ending at n without returning to r we have
GC(x;n) =
ω:r→n
(1− x)
|ω|−1
ρω(t). (17)
Biased random walks on random combs 7
Following the discussion in section 2.2 of [14] this leads us to the representation
GC(x;n) =
b+(1− x)n/2
PCk(x). (18)
2.4. The Heat kernel
Let KC(t;n, ℓ) denote the probability that the walker on C, having left r at t = 0, is at
point ℓ in tooth Tn at time t. KC(t;n, ℓ) satisfies the diffusion equation on C so we call
it the heat kernel. The probability that the walker has travelled a distance n along the
spine at time t is given by
KC(t;n) =
KC(t;n, ℓ), (19)
and has generating function
HC(x;n) =
(1− x)t/2KC(t;n). (20)
HC(x;n) can be written as
HC(x;n) =
GC(x;n)
1− PC(x)
D|Tn|(x), (21)
where
Dℓ(x) = 1 +
GNℓ(x; k), (22)
and we define
H(x;n) = 〈HC(x;n)〉µ . (23)
Note that, because KC(t;n) is a probability,
H(x;n) =
. (24)
The exponent dk is defined through the moments in n
nk H(x;n) ≈ x−1−dk , (25)
and in the case dk = 0 the exponent d̃k is defined when
nk H(x;n) ≈ x−1 | log x|d̃k . (26)
If ǫ1 ≥ 0 one can show that on any comb 〈n〉ω:|ω|=t is a non-decreasing sequence and
thus that there is some constant T0 such that for T > T0
| log T |
〈n〉ω:|ω|=T + 〈n〉ω:|ω|=T+1
< c T d1, d1 6= 0
c | log T |d̃1 <
〈n〉ω:|ω|=T + 〈n〉ω:|ω|=T+1
< c | log T |d̃1, d1 = 0.
Biased random walks on random combs 8
If ǫ1 < 0 (for which we always have d̃1 = 0) then we have only the weaker result that
for T > T0
| log T |
)1+d1
〈n〉ω:|ω|=t
< c T 1+d1 . (28)
3. Basic properties
3.1. Results for simple regular combs
The relation (6) can be used to compute the generating functions for a number of simple
regular graphs which will be important in our subsequent analysis [14].
(i) An infinitely long tooth, N∞:
P∞(x) =
2 if ǫ2 = 0;
1− 2|ǫ2|
4|ǫ2|
(1− 2ǫ2) +O(x2) otherwise.
(ii) A tooth of length ℓ, Nℓ:
Pℓ(x) = P∞(x)
1 +XY 1−ℓ
1 +XY −ℓ
where
bT (1− P∞(x))
2− bT (1 + P∞(x))
, Y =
2− bTP∞(x)
bTP∞(x)
. (31)
(iii) The comb ♯ given by {Tk = N1, ∀k} has all teeth of length 1, and
P♯(x) =
1− B1x
2 +O(x) if ǫ1 = 0;
1− ǫ2 − |ǫ1|
− x B2
+O(x2) otherwise.
Note that ♯ is non-recurrent if ǫ1 > 0. It is also convenient to define ℓ♯ to be
{T1 = Nℓ, C1 = ♯}.
(iv) The comb ∗ given by {Tk = N∞, ∀k} has all teeth of length ∞ and is non-recurrent
for ǫ2 > 0,
P∗(x) =
1 + ǫ2 −
4ǫ2 + ǫ
− x B1√
4ǫ2 + ǫ
+O(x2). (33)
Otherwise
P∗(x) =
1− |ǫ1|
1 + ǫ1
2 +O(x) if ǫ2 = 0, ǫ1 6= 0;
1− ǫ2 − |ǫ1|
+O(x2) if ǫ2 < 0, ǫ1 6= 0;
1− B4x
2 +O(x) if ǫ2 < 0, ǫ1 = 0.
Biased random walks on random combs 9
(v) The comb ♭ℓ given by {Tk = Nℓ, ∀k} has all teeth of length ℓ and
P♭ℓ(x) =
1− |ǫ1|
1 + ǫ1
(ℓ+ 1 + |ǫ1|)x+O(x2ℓ2) if ǫ2 = 0, ǫ1 6= 0;
1− ǫ2 − |ǫ1|
|ǫ1ǫ2|
+O(xY −ℓ) if ǫ2 < 0, ǫ1 6= 0;
2 +O(x
2Y −ℓ) if ǫ2 < 0, ǫ1 = 0;
where, as x → 0,
Y → 1 + 2|ǫ2|
1− 2|ǫ2|
. (36)
When ǫ2 > 0 let ℓ̄ = ⌊| log x|/ log Y ⌋, where ⌊z⌋ denotes the integer below z. For
ℓ > 2ℓ̄ the teeth are long enough that P♭ℓ(x) behaves like (33). For ℓ̄ < ℓ ≤ 2ℓ̄,
P♭ℓ(x) is non-recurrent with the leading power of x being fractional. For ℓ ≤ ℓ̄
P♭ℓ<ℓ̄(x) =
1− ǫ2 − |ǫ1|
|ǫ1ǫ2|
+O(x) if ǫ1 6= 0;
ℓ +O(x
ℓ, xY −ℓ) if ǫ1 = 0,
where the notation O(a, b) means O(max (a, b)).
3.2. General properties of the generating functions
The generating functions for any comb satisfy three simple properties which can be
derived from (6):
(i) Monotonicity The value of PC(x) decreases monotonically if the length of a tooth
is increased.
(ii) Rearrangement If the comb C ′ is created from C by swapping the adjacent teeth
Tk and Tk+1 then PC′(x) > PC(x) if |Tk+1| < |Tk|.
(iii) Inheritance If walks on Ck or Tk are non-recurrent for finite k then walks on C are
non-recurrent.
The proof of the first two follows that given in [14] for the special case ǫ2 = ǫ1 = 0. The
third can be shown by assuming that either PC1(0) < 1 or PT1(0) < 1; it then follows
immediately from (6) that PC(0) < 1 and the result follows by induction.
3.3. Useful elementary bounds
By monotonicity GC(x;n) is always bounded above by G♯(x;n) from which we get
GC(x;n) <
exp(−nΛǫ1,ǫ2(x)), (38)
Biased random walks on random combs 10
where
Λǫ1,ǫ2(x) =
2 + ǫ2
if ǫ1 > 0,
2 + ǫ2
1− ǫ2
if ǫ1 = 0,
if ǫ1 < 0.
Now let P
C (x) denote the contribution to PC(x) from walks that reach beyond n = N
on the spine. It is straightforward to show using the arguments of section 2.5 of [14]
C (x) ≤
ǫ1,ǫ2
C (x;N)G
−ǫ1,ǫ2
C (x;N). (40)
Combining this with (38) we obtain the useful bound
C (x) ≤
exp(−N(Λǫ1,ǫ2(x) + Λ−ǫ1,ǫ2(x))). (41)
Now consider the ensemble µ′ of combs C for which: Tk = N1, k = 1..K − 1,
TK = Nℓ; at k > K teeth are short, Tk = N1, with probability 1 − p or long, Tk = Nℓ,
with probability p; and the nth tooth is short, Tn = N1. Then using the representation
(18) GC(x;n) can be bounded above by noting that if Tk+1 = Nℓ then PCk < Pℓ♯,
otherwise PCk < P♯. This gives
GC(x, n) ≤
(1− x)−n/2
Pℓ♯(x)
n−K−kP♯(x)
k+K , (42)
and hence
〈GC(x, n)〉µ′ =
n−K−1
n−K−1
pn−K−1−k(1− p)kGC(x, n)
P♯(x)
KPℓ♯(x)
(1− x)n/2
((1− p)P♯(x) + pPℓ♯(x))n−K−1 . (43)
4. Results independent of the comb ensemble µ
In this section we show that in some regions of ǫ1,2 the behaviour at large time is
essentially independent of the comb ensemble, or else simply dependent upon 〈PT (x)〉µ.
The leading, and where different, the leading non-analytic, behaviour of 〈PT (x)〉µ as
x → 0 for the measures studied here is given in table 1. The results for µA are trivial,
as are those for any measure when ǫ2 < 0, while the case µ
B and ǫ2 = 0 can be derived
using the techniques in [14]. The calculation for µB and ǫ2 > 0 is somewhat subtle and
is included in Appendix A.
Biased random walks on random combs 11
Table 1. Leading and leading non-analytic behaviour of 1− 〈PT 〉µ in various cases.
ensemble ǫ2 < 0 ǫ2 = 0 ǫ2 > 0
µA Bx Bx
2 B +B′x
µB, a < 2 Bx Bxa/2 B(| log x|a−1)−1
µB, a = 2k Bx Bx+ . . . B′xk| log x| B(| log x|a−1)−1
µB, a > 2, a 6= 2k Bx Bx+ . . . B′xa/2 B(| log x|a−1)−1
4.1. ds when ǫ2 < 0
First we show that for any comb ensemble
0 if ǫ1 < 0 and ǫ2 < 0;
1 if ǫ1 = 0 and ǫ2 < 0.
By monotonicity we have that for any comb C
P∗(x) ≤ PC(x) ≤ P♯(x). (45)
Taking expectation values and using (32) and (34) it follows that for ǫ2 < 0
P (x) = 〈PC(x)〉µ =
1−B1x
2 +O(x) if ǫ1 = 0,
1− ǫ2 − |ǫ1|
+O(x2) otherwise.
Similarly
Q∗(x) ≤ QC(x) ≤ Q♯(x) (47)
and so
Q(x) = 〈QC(x)〉µ =
+O(1) if ǫ1 = 0,
B2|ǫ1|
+O(1) if ǫ1 < 0,
and (44) follows.
4.2. ds when ǫ1 > 0
When ǫ1 > 0 all combs are non-recurrent and so we must examine the derivatives of
Q(x). Differentiating (5) and (6) gives
C (x) = QC(x)
C (x), (49)
C (x) =
−PC(x)
PC(x)
(1− x)b−
(x) + b+P
. (50)
Biased random walks on random combs 12
By monotonicity (50) can be bounded above and below by replacing PC with P∗ and
P♯ respectively. Taking the expectation value and using translation invariance to note
that 〈PC〉µ = 〈PC1〉µ shows that, if
T (x)
diverges as x → 0, then
Q(1)(x) ∼ B
T (x)
T (x)
, 1). (51)
As can be seen from table 1, in some cases 〈PT (x)〉µ is analytic, or only higher derivatives
diverge. For the measures considered here it can be shown that if 〈PT (x)〉µ is analytic
at x = 0 then so is Q(x). If on the other hand 〈PT (x)〉µ is not analytic but the k’th
derivative diverges then
Q(k)(x) = B
T (x)
T (x)
, 1). (52)
The proof is a straightforward but tedious generalization of (49) and (50) and is relegated
to Appendix B. If a derivative of Q(x) diverges then ds can be read off using (14) and
(52). Otherwise if all finite order derivatives are finite then pC(t) decays at large t faster
than any power and ds is not defined.
4.3. dk when ǫ2 < 0 or ǫ1 < 0
We show that for any comb ensemble
d̃k = 0, dk =
0 if ǫ1 < 0,
k/2 if ǫ1 = 0 and ǫ2 < 0,
k if ǫ1 > 0 and ǫ2 < 0.
It is trivial to show that
1 ≤ Dℓ ≤
, ǫ2 < 0, (54)
and then by monotonicity we get
G∗(x;n)
1− P∗(x)
≤ H(x;n) ≤ B
G♯(x;n)
1− P♯(x)
. (55)
Combining this with (32) and (34) yields the results for ǫ2 < 0.
To deal with ǫ1 < 0 and ǫ2 ≥ 0 note that monotonicity gives
D|Tn|(x)
1− PC(x)
G∗(x;n) ≤ H(x;n) ≤
D|Tn|(x)
1− PC(x)
G♯(x;n). (56)
Using the lower bound and (18), (24) and (33) we get after summing over n
D|Tn|(x)
1− PC(x)
4ǫ2 + ǫ
1 − ǫ1 − 2ǫ2
H(x;n) ≤
. (57)
Inserting this into the upper bound of (56) gives
H(x;n) ≤
4ǫ2 + ǫ
1 − ǫ1 − 2ǫ2
G♯(x;n). (58)
Biased random walks on random combs 13
It is a trivial consequence of (24) that
nkH(x;n) >
, k > 0, (59)
and the results then follow by using (38).
5. The spectral dimension when ǫ2 ≥ 0 and ǫ1 ≤ 0
Here and in some of the sections to follow we will need to sum over the location of the
first long tooth to determine the spectral dimension. Most generally we call a tooth
long when it has length ≥ ℓ and short when it has length < ℓ. Consider combs for
which the first L− 1 teeth are short but the Lth tooth is long; the probability for this
is p(1− p)L−1, where p is the probability of a tooth being long. Denoting by ℓL a comb
having the first long tooth at vertex L gives
Q(x) =
〈QℓL(x)〉µ p(1− p)
L−1. (60)
QℓL(x) is bounded above by the comb in which all teeth at n ≥ L + 1 are short, and
below by the comb in which all teeth at n ≥ L+ 1 are infinite,
Q{Tn<L=Nℓ′ ,ℓ′≤ℓ;Tn≥L=N∞}(x) < QℓL(x) < Q{Tn6=L=N1;TL=Nℓ}(x). (61)
5.1. µA – Infinite teeth at random locations
5.1.1. ǫ2 = 0, ǫ1 < 0 We first show that the exponent β =
– so it is unchanged from
the comb ∗. This result follows from the inequalities
1− pBx
+O(x) ≤ P (x) ≤ 1− pB′x
2 +O(x). (62)
The lower bound is obtained by applying Jensen’s inequality to (6). To get the upper
bound we average over the first tooth and then by monotonicity we obtain
P (x) ≤ pPℓ♯(x) + (1− p)P♯(x), (63)
with ℓ = ∞ and using (6) and (32) gives the bound required.
The spectral dimension is given by
1 if p ≥ 2|ǫ1|(1 + |ǫ1|)−1,
log(1− p)
1−|ǫ1|
1+|ǫ1|
) otherwise.
This result follows from estimating the sum in (60) using the bounds in (61) with ℓ = ∞
and short teeth being N1. PC(x) for these bounding combs is computed in Appendix C
and using (C.3) we get upper and lower bounds on Q∞L(x) of the form
Bx+B′x
1−|ǫ1|
1+|ǫ1|
. (65)
Biased random walks on random combs 14
5.1.2. ǫ2 > 0, ǫ1 < 0 The probability that C is non-recurrent is at least p, the
probability that T1 = N∞, and hence
P (0) < 1. (66)
In fact it follows from the lemma of Appendix B that P (k)(x) is finite for all finite k so
the exponent β is undefined.
The spectral dimension is given by
2 log(1− p)
1−|ǫ1|−ǫ2
1+|ǫ1|−ǫ2
) . (67)
To show this we start by estimating Q(x) in exactly the same way as in 5.1.1 except
that the behaviour of the limiting combs is now given by (C.5) so that there are upper
and lower bounds on Q∞L(x) of the form
Bx+B′
1−|ǫ1|−ǫ2
1+|ǫ1|−ǫ2
. (68)
When p ≤ 1 − b+/b− this sum diverges at x = 0 and it is then straightforward
to obtain (67). For larger p the sum is convergent at x = 0 so we next examine
Q(1)(x) =
. Note that −P (1)C ≥ 13b−; then letting Z be a very large integer
and using Hölder’s inequality
QC(x)
≤ −Q(1)(x) ≤
QC(x)
2+1/Z
−P (1)C (x)
. (69)
By the lemma of Appendix B the second factor in the upper bound is finite as x → 0
so we need an estimate of 〈Q2C〉µ. This is provided by (68) modified by squaring the
denominator; when p ≤ 1 − (b+/b−)2 this sum diverges at x = 0 and once again we
obtain (67). For still larger p both Q and Q(1) are finite at x = 0 and we examine
the second and higher derivatives. This uses (B.4), (−1)kP (k)C ≥ bk−b
2k−1, Hölder’s
inequality and the lemma; the term with the highest power of QC dominates and the
result is always (67). ‡
5.1.3. ǫ2 > 0, ǫ1 = 0 By the same argument as in 5.1.2 we find P (0) < 1, so β is again
undefined. An upper bound on Q(x) may be obtained as in 5.1.1 using (C.9) to get
Q∞L(x) ≤ (L+ (1− ǫ2)/4ǫ2) (70)
which means the upper bound of (60) is finite. A proof that all derivatives of Q(x) are
finite is given in Appendix B.2, so pC(t) decays faster than any power at large t.
‡ Strictly speaking when 1− (b+/b−)k < p ≤ 1− (b+/b−)k+1/Z the upper bounds diverge so our proof
does not work for these arbitrarily small intervals.
Biased random walks on random combs 15
5.2. µB – Teeth of random length
In this subsection we are concerned with random combs that have a distribution of tooth
lengths. The general strategy for determining quantities of interest is to identify teeth
that are long enough to affect the critical behaviour of the biased random walk and
consider the probability with which they occur. It will be useful to define the function
λ(δ, η, ζ) = ⌊
δ| log x|η − ζ(a− 1) log | log x|
log Y
⌋, (71)
which will be used to denote a tooth length, and the function
(ℓ) =
(a− 1)ℓa−1
1 +O(ℓ−1)
, (72)
which is the probability that a tooth has length greater than ℓ− 1.
5.2.1. ǫ2 = 0, ǫ1 < 0 We first show that
if a < 2,
1 otherwise.
The proof follows the lines described in section 5.1.1 with a slight modification for
the upper bound on P (x). Note that, from (30), teeth of length ℓ > ⌊x− 12 ⌋ have
PT (x) ≤ 1− Bx
2 . We then proceed as in (63) but with ℓ = ⌊x− 12 ⌋+ 1.
The exponent β is non-trivial if a < 2 but, as we now show, ds = 0 for all a > 1 so
mean field theory does not apply when a < 2. This result follows from the inequalities
x| log x|
≤ Q(x) ≤ B
. (74)
The upper bound is a consequence of Q(x) < Q♯(x). To obtain the lower bound consider
the combs for which at least the first N teeth are all shorter than ℓ0. Then using
monotonicity and (41)
Q(x) ≥ (1− p>(ℓ0))
1− P♭ℓ0(x) +O(exp(−N(Λǫ1,ǫ2(x) + Λ−ǫ1,ǫ2(x))))
. (75)
Setting ℓ0 = λ(1, (a− 1)−1, 0), N = ⌊2(Λǫ1,ǫ2 +Λ−ǫ1,ǫ2)−1| log x|⌋+ 1 and using (35) the
result follows for small enough x.
5.2.2. ǫ2 > 0, ǫ1 < 0 The exponent β = 0 but there are computable logarithmic
corrections and we find that
| log x|a−1
≤ P (x) ≤ 1− B
| log x|a−1
. (76)
The lower bound follows from applying Jensen’s inequality to (6). For the upper bound
note that teeth of length ℓ > λ(1, 1, 0) have PT < B. Again proceed as in (63) with
ℓ = λ(1, 1, 0) + 1.
Biased random walks on random combs 16
Table 2. 〈Dℓ〉µ in various cases.
ensemble ǫ2 < 0 ǫ2 = 0 ǫ2 > 0
µA B +O(x) Bx−
2 +O(1) Bx−1 +O(1)
µB, a ≥ 2 B +O(x) B +O(x) B(x| log x|a−1)−1 +O(1)
µB, a < 2 B +O(x) Bxa/2−1 +O(1) B(x| log x|a−1)−1 +O(1)
The spectral dimension is ds = 0 showing again that mean field theory does not
apply. This follows from the inequalities
B′ exp
−B′′| log x|1/a
≤ Q ≤ B
, (77)
for small enough x. The upper bound is a consequence of Q(x) < Q♯(x) and the lower
bound follows from (75) by setting ℓ0 = λ(1, 1/a, 0), N = ⌊2(Λǫ1,ǫ2+Λ−ǫ1,ǫ2)−1| log x|⌋+1
and using (37).
5.2.3. ǫ2 > 0, ǫ1 = 0 The exponent β = 0, but there are logarithmic corrections which
follow from the inequalities
| log x|(a−1)/2
≤ P (x) ≤ 1− B
| log x|(a−1)/2
. (78)
The lower bound comes from applying Jensen’s inequality to the recurrence relation (6).
The upper bound is obtained by requiring unitarity of the heat kernel and its proof is
relegated to Appendix D.
The spectral dimension and logarithmic exponent are given by
ds = 2,
α̃ = a− 1, (79)
which shows that mean field theory does not apply. This result follows from
B′ | log x|a−1 < Q(x) < B | log x|a−1 (80)
for small enough x which is obtained by a modified version of the argument in 5.1.1.
First let ℓ0 = λ(1, 1, ζ), so that
Pℓ0(x) = 1−
| log x|ζ(a−1)
| log x|2ζ(a−1)
. (81)
To obtain (80) we use (60) and (61) with p = p>(ℓ0), ℓ = ℓ0 and for the lower bound
set Tn<L = Nℓ0. Then using the bounds in (C.8) with ζ = 1 and (C.7) with ζ = 2 and
estimating the sums gives the result.
6. Heat Kernel when ǫ1 ≥ 0, ǫ2 ≥ 0
These calculations require 〈Dℓ〉µ in the various cases which are tabulated in table 2 for
convenience.
Biased random walks on random combs 17
6.1. µA – Infinite teeth at random locations
We show that
0 if ǫ2 > 0 and ǫ1 ≥ 0,
k/2 if ǫ2 = 0 and ǫ1 > 0.
These results follow from (85), (86) and (87) below.
Noting that for ǫ1 > 0 all combs have 1 − B−1− − < PC(x) < 1 − B−1+ and using
monotonicity gives
D|Tn|(x)
G∗(x;n) ≤ H(x;n) ≤
D|Tn|(x)
GC′(x;n)
1− PC′(x)
, (83)
D|Tn|(x)
〈GC′(x;n)〉µ , (84)
where C ′ is constructed from C by forcing Tn = N1. If ǫ2 > 0 then using (43) with
K = 0, ℓ = ∞ gives the upper bound
H(x;n) <
exp(−B′n). (85)
If ǫ2 = 0 then exactly the same calculation gives
H(x;n) <
exp(−B′x
2n) (86)
and evaluating the left hand side of (83) gives a lower bound of the same form. If ǫ2 > 0
and ǫ1 = 0 it is necessary to sum over the location of the first infinite tooth. Using
(C.9), (43) and introducing C ′ as in (83) gives
H(x;n) <
exp(−B′n). (87)
6.2. µB – Teeth of random length
We show that
0 if ǫ2 > 0 and ǫ1 ≥ 0;
ka/2 if ǫ2 = 0, ǫ1 > 0 and a < 2;
k if ǫ2 = 0, ǫ1 > 0 and a ≥ 2.
These results follow from
H(x;n) <
x| log x|a−1
exp(−B′n/| log x|a−1) if ǫ2 > 0 and ǫ1 ≥ 0;
x1−a/2
exp(−B′nxa/2) if ǫ2 = 0, ǫ1 > 0 and a < 2;
B exp(−nB′x) if ǫ2 = 0, ǫ1 > 0 and a ≥ 2,
when x is small enough and lower bounds of the same form.
The upper bounds are obtained by proceeding as in subsection 6.1: for ǫ1 > 0 and
ǫ2 > 0 setting ℓ = λ(1, 1, 0) + 1 and for ǫ1 > 0 and ǫ2 = 0 setting ℓ = ⌊x−
2 ⌋ + 1. For
ǫ1 = 0 and ǫ2 > 0 we start with the upper bound of (83); let ℓ1 = λ(1, 1, 2), p1 = p>(ℓ1)
and ℓ2 = λ(2, 1, 0), p2 = p>(ℓ2). The latter shall be called long teeth and we denote by
Biased random walks on random combs 18
(ℓ2K♯) the comb with a single long tooth at vertex K. We now sum over the location
of the first long tooth using (18), (43) and (C.8) and taking account of the fact that the
first long tooth may be before or after the nth tooth
H(x;n) ≤
D|Tn|(x)
p2(1− p1)K−1
1− P(ℓ2K♯)(0)
P(ℓ2K♯)m
× ((1− p1)P♯ + (p1 − p2)Pℓ1♯ + p2Pℓ2♯)
n−K−1
θ(ℓ2 − |Tn|)D|Tn|(x)
p2(1− p1)K−1
1− P(ℓ2K♯)(0)
P(ℓ2K♯)m . (90)
In the first sum we use the value given in Table 2 for
D|Tn|(x)
. In the second sum
θ(ℓ2 − |Tn|)D|Tn|(x)
= B(x| log x|2(a−1))−1+O(1) for |Tn| < ℓ2 and the result follows.
To obtain the lower bounds when ǫ1 > 0 we note that
H(x;n) ≥ B−
D|Tn|(x)GC(x;n)
D|Tn|(x)
〈GC(x;n)〉µ (91)
where the measure µ is defined by
µℓ = µℓ, for teeth Tk, k 6= n,
〈Dℓ〉µ
, for tooth Tn. (92)
Using the decomposition (18) and Jensen’s inequality
〈GC(x;n)〉µ ≥
3(1− x)n/2
exp(−Sn), (93)
where
− PCk+1(x)
1− PTk+1(x)
. (94)
Now applying Jensen’s inequality with the measure µ to (6) shows that the lower bounds
satisfy a recursion formula of exactly the same form as discussed in Appendix C. So
from (C.1) we find that
Sn ≤ n
− P (x) +
〈1− PT (x)〉µ
(〈PT (x)〉µ − 〈PT (x)〉µ) +
P (x)(1− A(x))
A(x)k−1(P̄ (x)− A(x)P (x))/(P̄ (x)− P (x))− 1
, (95)
where
P (x) =
(1− x)b−
3− bT 〈PT (x)〉µ − b+P (x)
P̄ (x) =
(1− x)b−
3− bT 〈PT (x)〉µ − b+P (x)
A(x) =
(1− x)b−
P (x)2 b+
. (96)
Biased random walks on random combs 19
For ǫ1 > 0 it is straightforward to check that A(x) > c > 1 and that the sum in (95) is
bounded above by an n independent constant. Lower bounds of the form of (89) then
follow by inserting the appropriate 〈PT 〉µ in (96) and (95).
When ǫ1 = 0
H(x;n) ≥
D|Tn|(x)
GC′(x;n)
1− PC′(x)
, (97)
where C ′ is constructed from C by setting Tk≥n = N∞. Choosing ℓ0 = λ(1, 1, 2) and
using (18) and (C.7) gives
H(x;n) ≥
D|Tn|(x)
(1− p>(ℓ0))n−13(1− x)−n/2P∗(x)2
P♭ℓ0(x)− 1k−1
1− P♭ℓ0(x) + 1n−1
D|Tn|(x)
(1− p>(ℓ0))n−13(1− x)−n/2P∗(x)2
1− P♭ℓ0(x) + 1n−1
(n− 3)2(1− P♭ℓ0(x))
2P♭ℓ0(x)− 1
, (98)
for n ≥ 4 which gives the result.
7. Results and discussion
Figure 3 outlines the results that we have computed for µA. These are new and show
that the most interesting regime is actually when the bias along the spine is towards
the origin, a circumstance which has not been studied much in the literature. When
ǫ1 ≥ 0 and ǫ2 > 0 the walker disappears rapidly, never to return, and p(t) decays faster
than any power. When ǫ1 < 0 the bias along the spine is keeping the walker close
to the origin but if there are any infinite teeth present the walker can spend a lot of
time in the teeth; the conflict between these effects leads to a non-trivial ds. The fact
that d1 = 0 whenever ǫ2 > 0 shows that the walker never gets far down the spine; if
she disappears then it is up a tooth that she is lost. The Hausdorff dimension for µA
is dH = 2, regardless of bias and so we have here several examples of violation of the
bound 2dH/(1 + dH) ≤ ds ≤ dH , which applies for unbiased diffusion [19].
Figure 4 shows our results for µB as well as the results for the unbiased case studied
in [14]. This length distribution has been studied quite extensively in the literature but
usually under the assumption that ǫ2 ≥ 0. As can be seen the interesting behaviour
displayed by µA when ǫ1 < 0 does not occur here – essentially because very long teeth
are not common enough. We believe that with more work α̃ when ǫ1 < 0, ǫ2 ≥ 0 can be
found using our methods, but as this will not give further physical insight we leave the
calculations to elsewhere [17]. The case ǫ1 > 0, ǫ2 > 0 (often called topological bias) was
originally studied using mean field theory, which gave the mean square displacement
〈n2(t)〉 ∼ (log t)2(a−1), (99)
and this is in fact correct since the walker spends much of the time in the teeth. However
the claim in [3] that (99) holds for ǫ2 > 0 regardless of ǫ1 is false. The mean field method
gives the correct result when ǫ1 = 0 only because the walk on the spine is ignored, which
Biased random walks on random combs 20
amounts to using PT (x) for PC(x) in (21) and naively applying Jensen’s inequality. The
case ǫ1 > 0, ǫ2 = 0 was studied by Pottier [13] who computed the leading contribution
exactly, but without complete control over the sub-leading terms; she also calculated
the leading behaviour 〈n2〉 − 〈n〉2 which we have not. Of course our results for ds and
d1 agree with hers. The Hausdorff dimension for µ
B is dH = 3 − a when a < 2 and
dH = 1 when a ≥ 2 and so again we see that, as expected, a biasing field intensifies
the difference between the purely geometric definition of dimension and that which is
related to particle propagation.
The results for ǫ2 < 0 are intuitively obvious and, as we have proved, apply for any
model with identically and independently distributed tooth lengths. The walker never
gets far into the tooth and therefore combs have long time behaviour characteristic of
the spine alone.
This paper has given a comprehensive treatment of biased random walks on combs
using rigorous techniques – namely recursion relations for generating functions combined
with unitarity and monotonicity arguments. It serves to put in context many previous
results as well as present new ones. In the unbiased case [14] and in some bias regimes
mean field theory is sufficient to compute the leading order behaviour because the walker
either does not reach the ends of the longest teeth or does not travel far enough down
the spine for variations from average to be important. But, as is illustrated in many
examples here, a full treatment is needed when such fluctuations cannot be ignored.
Finally, while the results are of interest in themselves, an important point of the paper
was to demonstrate that rigorous analytic methods can be used to treat biased diffusion
on random geometric structures and it is to be hoped that these tools can be extended
to higher dimensional problems.
Acknowledgments
We would like to thank Bergfinnur Durhuus and Thordur Jonsson for valuable
discussions. This work is supported in part by Marie Curie grant MRTN-CT-2004-
005616 and by UK PPARC grant PP/D00036X/1. T.E. would like to acknowledge an
ORS award and a Julia Mann Graduate Scholarship from St Hilda’s College, Oxford.
Appendix A. Calculation of 〈PT (x)〉µB for ǫ2 > 0
First we rewrite (30) as
Pℓ(x) = P∞(x)Y − (Y − 1)P∞(x)X−1
X−1 + Y −ℓ
, (A.1)
so that
〈PT (x)〉µB = P∞(x)Y − (Y − 1)P∞(x)X
X−1 + Y −ℓ
(A.2)
Biased random walks on random combs 21
PSfrag replacements
(1 + 2ǫ2)
(1− 2ǫ2)
(1 + 2ǫ2)
(1− ǫ1 − ǫ2)
(1 + ǫ1 − ǫ2)
ds = 0
dk = 0
dk = 0dk = 0
dk = 0
dk = 0
ds = 1
ds = 1 ds = 3
dk = k
ds n.d. ds n.d.
ds n.d.
ds = 2 log(1− p) Ω−1
ds = log(1− p) Ω−1 p < p∗
p ≥ p∗
Figure 3. Results for µA where Ω = log
1−|ǫ1|−ǫ2
1+|ǫ1|−ǫ2
and p∗ = 2|ǫ1|(1 + |ǫ1| − ǫ2)−1.
The logarithmic exponents α̃ and d̃k are always zero for µ
PSfrag replacements
(1 + 2ǫ2)
(1− 2ǫ2)
(1 + 2ǫ2)
(1− ǫ1 − ǫ2)
(1 + ǫ1 − ǫ2)
ds = 0
ds = 0
ds = 0
dk = 0
dk = 0 dk = 0
dk = 0
dk = 0
ds = 1
ds = 3
dk = k
dk = k
ds n.d.
ds = 2 log(1− p) Ω−1
ds = log(1− p) Ω−1
p < p∗
p ≥ p∗
ds = 2 ds = 2
ds = 2 + a
α̃ ≤ 0
α̃k = 1
α̃k = 0
α̃ = a− 1 α̃ = −a
d̃k = 0
d̃k = 0
d̃k = 0 d̃k = k(a− 1) d̃k = k(a− 1)
≤ α̃ ≤ 0
if a < 2
if a < 2
if a < 2
if a ≥ 2
if a ≥ 2
if a ≥ 2
if a = 2k
if a 6= 2k
Figure 4. Results for µB . When ǫ2 < 0 the logarithmic exponents α̃ and d̃k are
always zero.
Biased random walks on random combs 22
X−1 + Y −ℓ
X−1 + Y −ℓ
≡ S. (A.3)
Since for ǫ2 > 0, Y > 1 we let log Y = ρ and write
| log x|
X−1 + e−ρℓ
| log x|
X−1 + e−ρℓ
, (A.4)
where σ is an arbitrary constant < 1. This is bounded above by taking ℓ in the
exponential to be its value at the top of each sum to give
S ≤ Ca
X−1 + xσ
| log x|
ℓ−a +
| log x|
| log x|a−1
. (A.5)
Noting that as x → 0, X−1 → Bx we get a lower bound on 〈Pℓ(x)〉µB of
〈PT (x)〉µB ≥ 1−
| log x|a−1
, (A.6)
for small enough x. An equivalent upper bound is calculated in the same manner by
ignoring the first term in (A.4) and setting σ = 1, which leads to the result quoted in
table 1. A similar procedure leads to bounds of the form B/x| log x|a on
T (x)
which we also need, at small enough x.
Appendix B. Proof of results for non-recurrent regime
First we define a structure of ordered lists of ordered integers. Let S denote an ordered
list of hS integers
[n1, n2, . . . nhS ], n1 ≥ n2 ≥ . . . ≥ nhS ≥ 1, hS ≥ 1,
[ ], hS = 0.
(B.1)
Define
|S| =
ni, hS ≥ 1,
0, hS = 0,
(B.2)
and let SN denote the set of all distinct lists S with |S| = N . Within SN the lists S
and S ′ are ordered by letting j = min(i : ni 6= n′i) and then setting S > S ′ if nj > n′j .
Finally if S ∈ SN and S ′ ∈ SN ′ with N > N ′ then S > S ′. It is convenient to denote by
S + 1 the lowest list above S, and by S ∪ S ′ the list obtained by concatenating S and
S ′ and then ordering as above.
Biased random walks on random combs 23
Now define
H(S; f(x)) = (−1)|S|
f (ni)(x), (B.3)
and for the empty list H([ ]; f(x)) = 1. We need the following lemma, which is proved
in Appendix B.1:
Lemma
(i) If 〈H(S;PT (x))〉µ is finite as x → 0 for all S ≤ S̄ then 〈H(S;PC(x))〉µ is finite as
x → 0 for all S ≤ S̄ and ǫ1 6= 0.
(ii) If the conditions of part (i) apply and, as x → 0,
H(S̄ + 1, PT (x))
diverges as
x−γ , γ > 0, then
H(S̄ + 1, PC(x))
also diverges as x−γ.
Differentiating (5) k times gives
C (x) =
C (x)
(1− PC(x))2
+ (−1)k
S∈Sk/[k]
C(S)H(S;PC(x))
(1− PC(x))hS+1
(B.4)
where C(S) is a combinatorial coefficient. It is straightforward to check for any S
that 〈H(S;PT (x))〉µA is analytic for ǫ2 6= 0, and that 〈H(S;PT (x))〉µB is analytic when
ǫ2 < 0. When ǫ2 = 0
H(S;Pℓ(x))|x=0 = cSℓ2|S|−hS
1 +O(l−2)
(B.5)
from which 〈H(S;Pℓ(x))〉µB is divergent for S = [⌈a/2⌉], and with smaller degree for
[⌈a/2⌉ − 1, 1] if 2k < a ≤ 2k + 1, k ∈ Z, but always convergent for any inferior S. The
results given in section 4.2 then follow from noting that P∗(x) < PC(x) < P♯(0) < 1 and
using the lemma.
Appendix B.1. Proof of lemma
To prove the lemma note that
H(S; f + g) =
S′∪S′′=S
H(S ′; f)H(S ′′; g) (B.6)
and differentiate (6) n times to get
(−1)nP (n)C (x) = (1− x)F
C (x) + nF
(n−1)
C (x) (B.7)
where
C (x) =
PC(x)
PC(x)b+
(1− x)b−
S′∪S′′=S
)hS′′
H(S ′;PC1(x))H(S
′′;PT1(x)).
(B.8)
Biased random walks on random combs 24
It is then straightforward to generalise this formula to
H(S, PC(x)) = R+ (PC(x))hS
S′∈S|S|
C(S, S ′)
PC(x)b+
(1− x)b−
S′′∪S′′′=S′
)hS′′′
H(S ′′;PC1(x))H(S
′′′;PT1(x)),
(B.9)
where the leading terms are written out explicitly and R contains contributions
depending only on lists inferior to S |S|. Every term on the right hand side is positive so
it can be bounded above by using PC(x) < P♯(0) and then the expectation value taken;
moving the S ′′ = S term to the left hand side gives
〈H(S, PC(0))〉µ
P♯(0)
R+ (P♯(0))hS
S′∈S|S|
C(S, S ′)
P♯(0)b+
S′′∪S′′′=S′
S′′ 6=S
)hS′′′
〈H(S ′′;PC1(0))〉µ 〈H(S
′′′;PT (0))〉µ .
(B.10)
Part (i) is true for S̄ = [1] so the lemma then follows immediately by induction on S.
To prove part (ii) use part (i) to isolate the potentially divergent terms in (B.9) leaving
H(S̄, PC(x)) =
PC(x)
(1− x)b−
H(S̄;PC1(x)) +
H(S̄;PT1(x))
+ finite terms. (B.11)
For small enough x,
PC(x)
(1− x)b−
< 1, ∀C (B.12)
and part (ii) follows upon taking expectation values.
Appendix B.2. ǫ1 = 0, ǫ2 > 0
We will show that
H(S, PC(x))
(1− PC(x))hS+1
(B.13)
is finite at x = 0, which together with (B.4) gives the result. Using (61) and (C.9) gives
C H(S, PC(x))
, (B.14)
Biased random walks on random combs 25
where nC is the location of the first infinite tooth of C. Applying (B.9) iteratively we
find that the right hand side is bounded above by terms of the form
〈H(S ′, PT (x))〉µA . (B.15)
The maximum value of K occurring is hS + 1 + ΦS where ΦS is the number of strings
inferior to S. As remarked before 〈H(S ′, PT (x))〉µA is analytic and
is trivially
finite which completes the proof.
Appendix C. Calculation of PC(x) for some useful combs
Let the comb C have Tk = Nℓ, k < L and arbitrary TL and CL. Then following the
method of Appendix A of [14] we find
P ǫ1ǫ2C (x) = P
(1−A)(P ǫ1ǫ2CL−1(x)− P
AL−1(P ǫ1ǫ2CL−1(x)−AP
(x))− (P ǫ1ǫ2CL−1(x)− P
(C.1)
where
(1− x)b−
(P ǫ1ǫ2
(x))2b+
. (C.2)
Setting ǫ2 = 0, ǫ1 < 0, ℓ = 1, TL = N∞ and CL = ♯ we find after some algebra that
P ǫ10C (x) = P
♯ (x)
1 + A−L
+O(x)
(C.3)
and, as x → 0,
1 + |ǫ1|
1− |ǫ1|
. (C.4)
Repeating the exercise but with CL = ∗ yields a similar result.
If instead we set ǫ2 > 0, ǫ1 < 0, ℓ = 1, TL = N∞ and CL = ♯ we find
P ǫ1ǫ2C (x) = P
♯ (x)
2ǫ2(A− 1)A−L
ǫ1 − 2ǫ2(1− A−L)
(1 +O(x))
(C.5)
and, as x → 0,
A → 1 + |ǫ1| − ǫ2
1− |ǫ1| − ǫ2
. (C.6)
Again, repeating the exercise but with CL = ∗ yields a similar result.
With ǫ2 > 0, ǫ1 = 0, and C = {Tk<L = Nℓ, Tk≥L = N∞} we find that
P 0ǫ2C (x) > P
(x)− 1
, L > 2, (C.7)
(it is good enough to use P∗(x) for k = 2); and for C = {Tk 6=L = N1, TL = Nℓ}, x < x0,
P 0ǫ2C (x) < P
♯ (x)
1− 1
AL−1−1
 (C.8)
where A = (1− x)(P 0ǫ2♯ (x))−2 and B is a positive constant depending on x0, A and ǫ2.
Finally for ǫ2 > 0, ǫ1 = 0, and C = {Tk 6=L = N1, TL = N∞} we find that
P 0ǫ2C (0) = 1−
L+ (1− ǫ2)/4ǫ2
. (C.9)
Biased random walks on random combs 26
Appendix D. Upper bound on P (x) when ǫ1 = 0, ǫ2 > 0
We start by writing
H(x;n) =
D|Tn|(x)
GC(x;n)
1− PC(x)
, (D.1)
where the measure µ̄ is defined in (92). Applying Jensen’s inequality with this measure
to (6) results in a recursion formula of the same form as discussed in Appendix C and
it is easy to verify that 〈PCk(x)〉µ ≥ 〈PCk(x)〉µ to give
GC(x;n)
1− PC(x)
GC(x;n)
1− PC(x)
b+(1− x)n/2
−n 〈1− PC(x)〉µ
〈1− PC(x)〉µ
, (D.2)
where in the last line we have again used Jensen’s inequality when averaging over the
ensemble. Applying this result to (D.1), summing over n, and using (24) we obtain the
inequality
D|Tn|(x)
〈1− PC(x)〉2µ
. (D.3)
Using the value for 〈Dℓ(x)〉µ given in table 2 and rearranging gives the upper bound on
P (x) quoted in 5.2.3.
References
[1] G. H. Weiss and S. Havlin, Some properties of a random walk on a comb structure, Physica 134A
(1986) 474-484
[2] S. Revathi, V. Balakrishnan, S. Lakshmibala and K. P. N. Murthy, Validity of the mean-field
approximation for diffusion on a random comb, Phys. Rev. E 54 (1996) 2298-2302
[3] D. ben-Avraham and S. Havlin, Diffusion and reactions in fractals and disordered systems,
Cambridge University Press, Cambridge (2000)
[4] S. Havlin, J. E. Kiefer and G. H. Weiss, Anomalous diffusion on a random comblike structure,
Phys. Rev. A 36 (1987) 1403-1408
[5] J. Ambjørn, B. Durhuus and T. Jonsson, Quantum geometry: a statistical field theory approach,
Cambridge University Press, Cambridge (1997)
[6] J. Ambjørn and Y. Watabiki, Scaling in quantum gravity, Nucl. Phys. B445 (1995) 129-144,
hep-th/9501049
[7] J. Ambjørn, J. Jurkiewicz and R. Loll, Spectral dimension of the universe, Phys. Rev. Lett. 95
(2005) 171301, hep-th/0505113
[8] C. Aslangul, P. Chvosta and N. Pottier, Analytic study of a model of diffusion on a random
comblike structure, Physica A 203 (1994) 533-565
[9] S. Havlin, A. Bunde, H. E. Stanley and D. Movsholvitz, Diffusion on percolation clusters with a
bias in topological space: non-universal behaviour, J. Phys. A 19 (1986) L693-L698
[10] V. Balakrishnan and C. Van den Broeck, Transport properties on a random comb, Physica A 217
(1995) 1-21
[11] S. Havlin, A. Bunde, Y. Glaser and H. E. Stanley, Diffusion with a topological bias on random
structures with a power-law distribution of dangling ends, Phys. Rev. A 34 (1986) 3492-3495
http://arxiv.org/abs/hep-th/9501049
http://arxiv.org/abs/hep-th/0505113
Biased random walks on random combs 27
[12] N. Pottier, Diffusion on random comblike structures: field-induced trapping effects, Physica A 216
(1995) 1-19
[13] N. Pottier, Analytic study of a model of biased diffusion on a random comblike structure, Physica
A 208 (1994) 91-123
[14] B. Durhuus, T. Jonsson and J. F. Wheater, Random walks on combs, J. Phys. A 39 (2006) 1009-
1038, hep-th/0509191
[15] B. Durhuus, T. Jonsson and J. F. Wheater, The spectral dimension of generic trees,
math-ph/0607020
[16] B. Durhuus, T. Jonsson and J. F. Wheater, On the spectral dimension of generic trees, DMTCS
proc. AG (2006), 183-192
[17] T.M. Elliott, Oxford University D.Phil Thesis, in preparation.
[18] W. Feller, An introduction to probability theory and its applications, Vol.2, Wiley, London (1968)
[19] A. Grigoryan and T. Coulhon, Pointwise estimates for transition probabilities of random walks
in infinite graphs, in: Trends in mathematics: Fractals in Graz 2001, Ed. P. Grabner and W.
Woess. Birkhäueser (2002)
http://arxiv.org/abs/hep-th/0509191
http://arxiv.org/abs/math-ph/0607020
	Introduction
	Definitions
	Random combs
	Biased random walks
	Two-point functions
	The Heat kernel
	Basic properties
	Results for simple regular combs
	General properties of the generating functions
	Useful elementary bounds
	Results independent of the comb ensemble 
	ds when 2<0 
	ds when 1>0
	dk when 2<0 or 1<0
	The spectral dimension when 20 and 10
	A – Infinite teeth at random locations
	2= 0, 1< 0
	2> 0, 1< 0
	2> 0, 1= 0
	B – Teeth of random length
	2= 0, 1< 0
	2> 0, 1 <0 
	2> 0, 1 =0 
	Heat Kernel when 1 0, 2 0
	A – Infinite teeth at random locations
	B – Teeth of random length
	Results and discussion
	Calculation of "426830A PT(x) "526930B B for 2>0
	Proof of results for non-recurrent regime
	Proof of lemma
	1=0, 2>0
	Calculation of PC(x) for some useful combs
	Upper bound on P(x) when 1=0, 2>0
ABSTRACT
  We develop rigorous, analytic techniques to study the behaviour of biased
random walks on combs. This enables us to calculate exactly the spectral
dimension of random comb ensembles for any bias scenario in the teeth or spine.
Two specific examples of random comb ensembles are discussed; the random comb
with nonzero probability of an infinitely long tooth at each vertex on the
spine and the random comb with a power law distribution of tooth lengths. We
also analyze transport properties along the spine for these probability
measures.

<|endoftext|><|startoftext|>
Introduction to the Structure Theory, Marcel Dekker, New York (1995).
http://arxiv.org/abs/0704.0189
http://arxiv.org/abs/1511.02056
http://arxiv.org/pdf/math/0503670
[20] G. Higman, “Finitely presented infinite simple groups”, Notes on Pure Mathematics 8, The Australian
National University, Canberra (1974).
[21] J. Hopcroft, J. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley
(1979).
[22] B. Hughes, “Trees, Ultrametrics, and Noncommutative Geometry”,
http://arxiv.org/pdf/math/0605131v2.pdf
[23] M.V. Lawson, “Orthogonal completions of the polycyclic monoids”, Communications in Algebra, 35 (2007)
1651-1660.
[24] H. Lewis, Ch. Papadimitriou, Elements of the Theory of Computation, 2nd ed., Prentice Hall (1998).
[25] J. Lehnert, P. Schweitzer, “The co-word problem for the Higman-Thompson group is context-free”, Bulletin
of the London Mathematical Society, 39 (April 2007) 235-241.
[26] R. McKenzie, R. J. Thompson, “An elementary construction of unsolvable word problems in group theory”,
in Word Problems, (W. W. Boone, F. B. Cannonito, R. C. Lyndon, editors), North-Holland (1973) pp.
457-478.
[27] V.V. Nekrashevych, “Cuntz-Pimsner algebras of group actions”, J. Operator Theory 52(2) (2004) 223-249.
[28] Elizabeth A. Scott, “A construction which can be used to produce finitely presented infinite simple groups”,
J. of Algebra 90 (1984) 294-322.
[29] Richard J. Thompson, Manuscript (1960s).
[30] Richard J. Thompson, “Embeddings into finitely generated simple groups which preserve the word prob-
lem”, in Word Problems II, (S. Adian, W. Boone, G. Higman, editors), North-Holland (1980) pp. 401-441.
J.C. Birget
Dept. of Computer Science
Rutgers University – Camden
Camden, NJ 08102
birget@camden.rutgers.edu
http://arxiv.org/pdf/math/0605131v2.pdf
	1 Thompson-Higman monoids
	1.1 Definition of the Thompson-Higman groups and monoids
	1.2 Other Thompson-Higman monoids
	1.3 Cuntz algebras and Thompson-Higman monoids
	2 Structure and simplicity of the Thompson-Higman monoids
	2.1 Group of units, J-relation, simplicity
	2.2 D-relation
	3 Finite generating sets
	4 The word problem of the Thompson-Higman monoids
	4.1 The image code formula
	4.2 Some algorithmic problems about right-ideal morphisms
	4.3 The word problem of Mk,1 is in P
ABSTRACT
  The groups G_{k,1} of Richard Thompson and Graham Higman can be generalized
in a natural way to monoids, that we call M_{k,1}, and to inverse monoids,
called Inv_{k,1}; this is done by simply generalizing bijections to partial
functions or partial injective functions. The monoids M_{k,1} have connections
with circuit complexity (studied in another paper). Here we prove that M_{k,1}
and Inv_{k,1} are congruence-simple for all k. Their Green relations J and D
are characterized: M_{k,1} and Inv_{k,1} are J-0-simple, and they have k-1
non-zero D-classes. They are submonoids of the multiplicative part of the Cuntz
algebra O_k. They are finitely generated, and their word problem over any
finite generating set is in P. Their word problem is coNP-complete over certain
infinite generating sets.
  Changes in this version: Section 4 has been thoroughly revised, and errors
have been corrected; however, the main results of Section 4 do not change.
Sections 1, 2, and 3 are unchanged, except for the proof of Theorem 2.3, which
was incomplete; a complete proof was published in the Appendix of reference
[6], and is also given here.

<|endoftext|><|startoftext|>
Introduction
Superkamiokande have been analyzing Fully Contained
Events and Partially Contained Events which are gen-
erated inside the detector, and Upward Through Going
Events and Stopping Events which are generated outside
the detector, for the studies on the neutrino oscillation in
atmospheric neutrinos. The report of oscillations between
muon and tau neutrinos for atmospheric neutrinos de-
tected with SuperKamiokande (SK, hereafter) is claimed
to be robustly established for the following reasons:
(1) The discrimination between electrons and muons in the
SK energy range, say, several hundred MeV to several
GeV, has been proved to be almost perfect, as demon-
strated by calibration using accelerator beams [1] 1.
1 The SK discrimination procedure between muon and elec-
tron is constructed on the average value theory. In our opinion,
discrimination procedure should be examined, taking into ac-
count the stochastic characters of the physical processes in the
neutrino events concerned. If we take this effect into account,
then, for example, we give uncertainties of 3◦ to 14◦ in the in-
(2) The analysis for the electron-like events and the muon-
like events which give the single-ring structure in Fully
Contained Events and Partially Contained Events with
their zenith angle distribution, based on the well es-
tablished discrimination procedure mentioned in (1),
reveals a significant deficit of muon-like events but
the expected level of electron-like events. It is, thus,
concluded that muon neutrinos oscillate into tau neu-
trinos which cannot be detected due to the small ge-
ometry of SK. As the most new one, the SK collab-
oration published their comprehensive paper[2]. The
analysis of SK data presently yields sin22θ > 0.92 and
1.5× 10−3eV2 < ∆m2 < 3.4× 10−3eV2 at 90% confi-
dence level.
(3) The analysis of Upward Through Going Events and
Stopping Events, in which the neutrino interactions
occur outside the detector, leads to similar results to
cident direction of the charged lepton and uncertainties of 2m
to 7m in the vertex point of the events. See, our papers [2].
However, SK give 1.8◦ to 3.0◦ and 0.3m for the same physical
quantities. See, accompanied two papers.
http://arxiv.org/abs/0704.0190v1
2 E.Konishi et. al.,: The Reliability on the Direction of Neutrino in SK
(2). The charged leptons which are produced in these
categories are regarded as being exclusively muons, be-
cause electrons have negligible probabilities to produce
such events as they lose energy very rapidly in the sor-
rounding rock. Thus for these events the discrimina-
tion procedure described in (1) is not required, and,
therefore, the analysis here is independent of the anal-
ysis in (2). For these events, however, the SK group
obtains the same parameters for neutrino oscillations
as in (2) 2.
(4) Now, SK assert that they have found the oscillatory
signature in atomospheric neutrinos from L/E anal-
ysis, which should be the ultimate evidence for the
exsistence of the neutrino oscillation [3] Our critical
examination on the L/E analysis by SK will be pub-
lished elsewhere.
As for item(3), we have clarified that SK hardly dis-
criminate electron( neutrino) from muon( neutrino ) in
the SK manner and, instead, propose more rigorous and
suitable procedure with theoretical background for the dis-
crimination between them in the preceeding two papers.
Among the neutrino events both occurring inside and out-
side the detector, the most robust evidence for the neu-
trino oscillation, if exists, should have been obtained from
the analysis of both electron-like events and muon-like
events in Fully Contained Events. Because (i) all necces-
sary informations for the physical interpretation are in-
cluded in Fully Contained Events due to their character
and (ii),
furthermore, both electron-like events and muon-like events
give the single structure image free from arbitrary inter-
pretation with the proper electron/muon discrimination
procedure. SK treat neutrino events whose energies cover
from several hundred MeV to several GeV, if the neutrino
events occur inside the detector. In this energy region,
Quasi Elastic Scattering(QEL) [4] is dominant compared
with other physical processes, such as one-pion produc-
tion [5], coherent pion production [6] and deep inelastic
scattering [7]. Events due to other processes, except QEL,
are not free from ambiguities due to multi-ring structure
of the images.
Therefore, SK should have analyzed the muon-
like events and the electron-like events with the
single ring image in Fully Contained Events exculu-
sively where QEL is dominant, without utilizing
poorer quality events, if SK pursue to obtain the
clear cut conclusion on the neutrino oscillation 3.
2 It seems strange that the experimental data with different
qualities give similarly precise results, because Fully Contained
Events whose information is totally inside the detector are of
higher experimental qualities compared with those of both Up-
ward Through Going Events and Stopping Events.
3 In their analysis, they really add Partially Contained
Events to the experimental data as those with the same quality
under the assumption that they belong to muon-like events in
Fully Contained Events to raise the statistics higher. However,
such the assumption lacks in theoretical background. Further-
more, SK utilize to add multi-ring structure events which are
Therefore, it is essential for us to examine single ring
structure events among Fully Contained Events due to
QEL which have the least ambiguities among the neutrino
events concerned to obtain clear cut conclusion as for the
neutrino oscillation. Here, the main concern of the present
paper is devoted to the detailed analysis of the muon-like
events from QEL, focusing on the direction of the incident
neutrino. Situation around the corresponding electron-like
event is the same as in the muon-like event. The exami-
nation on the separation of Fully Contained Events from
Partially Contained Events will be discussed in subsequent
papers.
Here, it should be emphasized that the direction of
the incident neutrino is assumed to be the same as that
of the emitted charged lepton, i.e., the (anti-)muon or
(anti-)electron, in the SK analysis of both Fully Contained
Events and Partially Contained Events [8,9]. The SK De-
tector Simulation is to be constructed without any con-
tradiction with the SK assumption on the direction.
From the point of orthodoxical Monte Carlo Simula-
tion, it seems to be unnatural for SK to impose such the
assumption that the direction of the incident neutrino is
the same as that of the emitted lepton ( hereafter, we call
this assumption simply ”the SK assumption on the di-
rection”) upon their Detector Simulation. Obviously, one
need not any assumption on the relation between the di-
rection of the incident neutrino and that of the emitted
lepton in any sense, if we develop the Monte Carlo Simu-
lation in a rigorous manner, which will be shown later in
the present paper.
In order to avoid any misunderstanding toward the SK
assumption on the direction we reproduce this assumption
from the original SK paper:
”However, the direction of the neutrino must be estimated
from the reconstructed direction of the products of the neu-
trino interaction. In water Cherenkov detectors, the direc-
tion of an observed lepton is assumed to be the direction of
the neutrino. Fig.11 and Fig.12 show the estimated corre-
lation angle between neutrinos and leptons as a function of
lepton momentum. At energies below 400 MeV/c, the lep-
ton direction has little correlation with the neutrino direc-
tion. The correlation angle becomes smaller with increas-
ing lepton momentum. Therefore, the zenith angle depen-
dence of the flux as a consequence of neutrino oscillation
is largely washed out below 400 MeV/c lepton momentum.
With increasing momentum, the effect can be seen more
clearly. ” [8] 4.
On the other hand, Ishitsuka states in his Ph.D thesis
which is exclusively devoted into the L/E analysis of the
caused by one-pion roduction, coherent pion production and
deep inelastic scattering. However, the discrimination among
the multi-ring structures is not so easy, which may lead the
worse estimation of energies as well as directions of the events
concerned.
4 It could be understood from this statement that SK justify
the validity of this assumption above 400 MeV/c. However, it is
not correct, because SK put ”to be proved ” as the proposition.
See, page 101 in their paper [8].
E.Konishi et. al.,: The Reliability on the Direction of Neutrino in SK 3
atmospheric neutrino from Super Kamiokande as follows:
” 8.4 Reconstruction of Lν
Flight length of neutrino is determined from the neu-
trino incident zenith angle, although the energy and the
flavor are also involved. First, the direction of neutrino is
estimated for each sample by a different way. Then, the
neutrino flight lenght is calclulated from the zenith angle
of the reconstructed direction.
8.4.1 Reconstruction of Neutrino Direction
FC Single-ring Sample
The direction of neutrino for FC single-ring sample is
simply assumed to be the same as the reconstructed direc-
tion of muon. Zenith angle of neutrino is reconstructed as
follows:
cosΘrecν = cosΘµ (8.17)
,where cosΘrecν and cosΘµ are cosine of the reconstructed
zenith angle of muon and neutrino, respectively. ” [9] 5.
In our understanding, SK Monte Carlo Simulation is
named usually as the Detector Simulation. It is, however,
noticed that the effect of the azimuthal angles of the emit-
ted leptons in QEL could not be taken into account in their
Simulation. As will be shown in later (see Section 3), this
effect greatly influences over the final zenith angle distri-
bution of the emitted leptons. Also, the back scattering
due to QEL can not be neglected for the rigorous deter-
mination of the direction of the incident neutrino, but this
effect could not be treated in the SK Detector Simulation,
which is beyond the application limitaition 6.
On the other hand, we could take into account these
effects correctly in our Monte Carlo Simulation which is
named as Time Sequential Simulation.
In the present paper, we carry out the full Monte Carlo
Time Sequential Simulation as exactly as possible, with-
out the SK assumption on the direction to clarify the prob-
lematic issue raised by SK. We carry out simulation which
starts from the opposite side of the Earth to the SK de-
tector. A neutrino sampled from the atmospheric neutrino
energy spectrum at the opposite side of the Earth tra-
verses through the medium with different densities in the
interior of the Earth and penetrates finally into the SK
5 It should be noticed that the SK assumption on the di-
rection may hold on the following possible two cases: [1] The
scattering angle of the emitted lepton is so small that the ef-
fect of the scattering angle could be neglected really. However,
in the present case, it could not be true from Fig. 1 and Fig.
2 and Table 1. [2] One may assert that the assumption could
not hold on individual case, but it could hold statistically af-
ter accumulation of large amount of the data. However, such
assertion should be verified. We verify such assumption could
not hold. See, Fig. 11 and Fig. 12 in the present paper.
6 SK have never clarified not only the details, but also the
principle and its validity on their Monte Carlo Simulation. We
hope disclosure of their Detector Simulation for open and fair
scientific discussion.
detector where the neutrino interactions occur. The emit-
ted energy of the individual lepton thus produced and its
direction are simulated exactly based on the probability
function of the cross sections concerned.
We finally show the zenith angle distribution of the
emitted leptons as well as that of the incident neutrinos
are quite different from corresponding ones of the SK. This
indicates that the SK assumption on the direction coud
not be a reliable estimator as for the determination of the
direction of the incident neutrino (See, section 5).
2 Cross Sections of Quasi Elastic Scattering
in the Neutrino Reaction and the Scattering
Angle of Charged Leptons.
We examine the following reactions due to the charged
current interaction (c.c.) from QEL.
νe + n −→ p+ e−
νµ + n −→ p+ µ−
ν̄e + p −→ n+ e+ (1)
ν̄µ + p −→ n+ µ+
The differential cross section for QEL is given as fol-
lows [6].
dσℓ(ℓ̄)(Eν(ν̄))
G2F cos
ν(ν̄)
A(Q2)±B(Q2)
C(Q2)
where
A(Q2) =
+ f1f2
+ g21
B(Q2) = (f1 + f2)g1Q
C(Q2) =
f21 + f
+ g21
The signs + and − refer to νµ(e) and ν̄µ(e) for charged
current (c.c.) interactions, respectively. The Q2 denotes
the four momentum transfer between the incident neu-
trino and the charged lepton. Details of other symbols are
given in [4].
The relation among Q2, Eν(ν̄), the energy of the in-
cident neutrino, Eℓ, the energy of the emitted charged
lepton (muon or electron or their anti-particles) and θs,
the scattering angle of the emitted lepton, is given as
Q2 = 2Eν(ν̄)Eℓ(1− cosθs). (3)
4 E.Konishi et. al.,: The Reliability on the Direction of Neutrino in SK
Fig. 1. Relation between the energy of the muon and its
scattering angle for different incident muon neutrino energies,
0.5 GeV, 1 GeV, 2 GeV, 5 GeV, 10 GeV and 100 GeV.
Also, the energy of the emitted lepton is given by
Eℓ = Eν(ν̄) −
. (4)
Now, let us examine the magnitude of the scattering
angle of the emitted lepton in a quantitative way, as this
plays a decisive role in determining the accuracy of the
direction of the incident neutrino, which is directly related
to the reliability of the zenith angle distribution of both
Fully Contained Events and Partially Contained Events in
By using Eqs. (2) to (4), we obtain the distribution
function for the scattering angle of the emitted leptons
and the related quantities by a Monte Carlo method. The
procedure for determining the scattering angle for a given
energy of the incident neutrino is described in the Ap-
pendix A. Fig. 1 shows this relation for muon, from which
we can easily understand that the scattering angle θs of
the emitted lepton ( muon here ) cannot be neglected.
For a quantitative examination of the scattering angle,
we construct the distribution function for θs of the emit-
ted lepton from Eqs. (2) to (4) by using a Monte Carlo
method.
Fig. 2 gives the distribution function for θs of the muon
produced in the muon neutrino interaction. It can be seen
that the muons produced from lower energy neutrinos are
scattered over wider angles and that a considerable part
of them are scattered even in backward directions. Simi-
lar results are obtained for anti-muon neutrinos, electron
neutrinos and anti-electron neutrinos.
Also, in a similar manner, we obtain not only the dis-
tribution function for the scattering angle of the charged
leptons, but also their average values < θs > and their
standard deviations σs. Table 1 shows them for muon neu-
trinos, anti-muon neutrinos, electron neutrinos and anti-
electron neutrinos. In the SK analysis, it is assumed that
the scattering angle of the charged particle is zero [8,9].
Fig. 2. Distribution functions for the scattering angle of the
muon for muon-neutrino with incident energies, 0.5 GeV, 1.0
GeV and 2 GeV. Each curve is obtained by the Monte Carlo
method (one million sampling per each curve).
3 Influence of Azimuthal Angle of Quasi
Elastic Scattering over the Zenith Angle of
both the Fully Contained Events and
Partially Contained Events
In the present section, we examine the effect of the az-
imuthal angles of the emitted leptons over their own zenith
angles for given zenith angles of the incident neutrinos 7.
For three typical cases (vertical, horizontal and diag-
onal), Fig. 3 gives a schematic representation of the re-
lationship between, θν(ν̄), the zenith angle of the incident
neutrino, and (θs, φ) a pair of scattering angle of the emit-
ted lepton and its azimutal angle.
From Fig. 3(a), it can been seen that the zenith angle
θµ(µ̄) of the emitted lepton is not influenced by its φ in
the vertical incidence of the neutrinos (θν(ν̄) = 0
o), as it
must be. From Fig. 3(b), however, it is obvious that the
influence of φ of the emitted leptons on their own zenith
angle is the strongest in the case of horizontal incidence
of the neutrino (θν(ν̄) = 90
o). Namely, one half of the
emitted leptons are recognized as upward going, while the
other half is classified as downward going ones. The di-
agonal case ( θν(ν̄) = 43
o) is intermediate between the
vertical and the horizontal. In the following, we examine
the cases for vertical, horizontal and diagonal incidence of
the neutrino with different energies, say, Eν(ν̄) = 0.5 GeV,
Eν(ν̄) = 1 GeV and Eν(ν̄) = 5 GeV.
The detailed procedure for the Monte Carlo simulation
is described in the Appendix A.
7 Throughout this paper, we measure the zenith angles of
the emitted leptons from the upward vertical direction of the
incident neutrino. Consequently, notice that the sign of our
direction is oposite to that of the SK ( our cos θν(ν̄) = - cos θν(ν̄)
in SK)
E.Konishi et. al.,: The Reliability on the Direction of Neutrino in SK 5
Table 1. The average values < θs > for scattering angle of the emitted charged leptons and their standard deviations σs for
various primary neutrino energies Eν(ν̄).
Eν(ν̄) (GeV) angle νµ(µ̄) ν̄µ(µ̄) νe ν̄e
(degree)
0.2 < θs > 89.86 67.29 89.74 67.47
σs 38.63 36.39 38.65 36.45
0.5 < θs > 72.17 50.71 72.12 50.78
σs 37.08 32.79 37.08 32.82
1 < θs > 48.44 36.00 48.42 36.01
σs 32.07 27.05 32.06 27.05
2 < θs > 25.84 20.20 25.84 20.20
σs 21.40 17.04 21.40 17.04
5 < θs > 8.84 7.87 8.84 7.87
σs 8.01 7.33 8.01 7.33
10 < θs > 4.14 3.82 4.14 3.82
σs 3.71 3.22 3.71 3.22
100 < θs > 0.38 0.39 0.38 0.39
σs 0.23 0.24 0.23 0.24
Fig. 3. Schematic view of the zenith angles of the charged
muons for diffrent zenith angles of the incident neutrinos, fo-
cusing on their azimuthal angles.
3.1 Dependence of the spreads of the zenith angle for
the emitted leptons on the energies of the emitted
leptons for different incident directions with different
energies
We give the scatter plots between the fractional energies
of the emitted muons and their zenith angle for a definite
zenith angles of the incident neutrino with different ener-
gies in Figs. 4 to 6. In Fig. 4, we give the scatter plots
for vertically incident neutrino with different energies 0.5
GeV, 1 GeV and 5 GeV . In this case, the relations between
the emitted energies of the muon and and their zenith an-
gles are unique, which comes from the definition of the
zenith angle of the emitted lepton. However, the densities
(frequencies of event number) along each curve is differ-
ent in position to position and depend on the energies of
the incident neutrinos. Generally speaking,densities along
curves become higher toward cos θµ(µ̄) = 1. In this case,
cos θµ(µ̄) is never influenced by the azimuthal angel in the
scattering by the definition 8.
Fig. 5 tells us that the horizontally incident neutrinos
give the most widely spread of the zenith angle distribu-
tion of the emitted lepton influenced by the azimuthal an-
gle. The more lower incident neutrino energies, the more
wider spreads of the emitted leptons. The diagonally in-
cident neutrinos give the intermediate distribution of the
emitted leptons between those of vertically incident neu-
trinos and horizontally incident neutrinos.
8 The zenith angles of the particles concerned are measured
from the vertical direction.
6 E.Konishi et. al.,: The Reliability on the Direction of Neutrino in SK
(a) (b) (c)
0 0.2 0.4 0.6 0.8 1
Eµ / Eν
Eν=0.5GeV
cosθν=1(θν=0°)
0 0.2 0.4 0.6 0.8 1
Eµ / Eν
Eν=1GeV
cosθν=1(θν=0°)
0 0.2 0.4 0.6 0.8 1
Eµ / Eν
Eν=5GeV
cosθν=1(θν=0°)
Fig. 4. The scatter plots between the fractional energies of the produced muons and their zenith angles for vertically incident
muon neutrinos with 0.5 GeV, 1 GeV and 5 GeV, respectively. The sampling number is 1000 for each case.
(a) (b) (c)
0 0.2 0.4 0.6 0.8 1
Eµ / Eν
Eν=0.5GeV
cosθν=0(θν=90°)
0 0.2 0.4 0.6 0.8 1
Eµ / Eν
Eν=1GeV
cosθν=0(θν=90°)
0 0.2 0.4 0.6 0.8 1
Eµ / Eν
Eν=5GeV
cosθν=0(θν=90°)
Fig. 5. The scatter plots between the fractional energies of the produced muons and their zenith angles for horizontally incident
muon neutrinos with 0.5 GeV, 1 GeV and 5 GeV, respectively. The sampling number is 1000 for each case.
(a) (b) (c)
0 0.2 0.4 0.6 0.8 1
Eµ / Eν
Eν=0.5GeV
cosθν=0.731(θν=43°)
0 0.2 0.4 0.6 0.8 1
Eµ / Eν
Eν=1GeV
cosθν=0.731(θν=43°)
0 0.2 0.4 0.6 0.8 1
Eµ / Eν
Eν=5GeV
cosθν=0.731(θν=43°)
Fig. 6. The scatter plots between the fractional energies of the produced muons and their zenith angles for diagonally incident
muon neutrinos with 0.5 GeV, 1 GeV and 5 GeV, respectively. The sampling number is 1000 for each case.
E.Konishi et. al.,: The Reliability on the Direction of Neutrino in SK 7
(a) (b) (c)
−0.8−0.6−0.4−0.2 0 0.2 0.4 0.6 0.8 1
cosθµ
muon neutrino
Eν=0.5GeV
cosθν=1(θν=0°)
average=0.262
s.d.=0.547
−0.8−0.6−0.4−0.2 0 0.2 0.4 0.6 0.8 1
cosθµ
muon neutrino
Eν=1GeV
cosθν=1(θν=0°)
average=0.590
s.d.=0.439
−0.8−0.6−0.4−0.2 0 0.2 0.4 0.6 0.8 1
cosθµ
muon neutrino
Eν=5GeV
cosθν=1(θν=0°)
average=0.978
s.d.=0.067
Fig. 7. Zenith angle distribution of the muon for the vertically incident muon neutrino with 0.5 GeV, 1 GeV and 5 GeV,
respectively. The sampling number is 10000 for each case. SK stand for the corresponding ones under the SK assumption.
(a) (b) (c)
−0.8−0.6−0.4−0.2 0 0.2 0.4 0.6 0.8 1
cosθµ
muon neutrino
Eν=0.5GeV
cosθν=0(θν=90°)
average=0.003
s.d.=0.564
−0.8−0.6−0.4−0.2 0 0.2 0.4 0.6 0.8 1
cosθµ
muon neutrino
Eν=1GeV
cosθν=0(θν=90°)
average=0.001
s.d.=0.480
−0.8−0.6−0.4−0.2 0 0.2 0.4 0.6 0.8 1
cosθµ
muon neutrino
Eν=5GeV
cosθν=0(θν=90°)
average=0.006
s.d.=0.141
Fig. 8. Zenith angle distribution of the muon for the horizontally incident muon neutrino with 0.5 GeV, 1 GeV and 5 GeV,
respectively. The sampling number is 10000 for each case. SK stand for the corresponding ones under the SK assumption.
(a) (b) (c)
−0.8−0.6−0.4−0.2 0 0.2 0.4 0.6 0.8 1
cosθµ
muon neutrino
Eν=0.5GeV
cosθν=0.731(θν=43°)
average=0.189
s.d.=0.556
−0.8−0.6−0.4−0.2 0 0.2 0.4 0.6 0.8 1
cosθµ
muon neutrino
Eν=1GeV
cosθν=0.731(θν=43°)
average=0.432
s.d.=0.463
−0.8−0.6−0.4−0.2 0 0.2 0.4 0.6 0.8 1
cosθµ
muon neutrino
Eν=5GeV
cosθν=0.731(θν=43°)
average=0.715
s.d.=0.103
Fig. 9. Zenith angle distribution of the muon for the diagonally incident muon neutrino with 0.5 GeV, 1 GeV and 5 GeV,
respectively. The sampling number is 10000 for each case. SK stand for the corresponding ones under the SK assumption.
8 E.Konishi et. al.,: The Reliability on the Direction of Neutrino in SK
3.2 Zenith angle distribution of the emitted lepton for
the different incidence of the neutrinos with different
energies
In Figs. 7 to 9, we give the zenith angle distributions of
the emitted muons for the given direction of the incident
neutrinos with different energies of the neutrino. These
figures are obtained through summation on the energies
of the emitted muons for their definite zenith angles in
Figs. 4 to 6.
In Figs. 7(a) to 7(c), we give the zenith angle distri-
bution of the emitted muon for the case of vertically inci-
dent neutrinos with different energies, say, Eν = 0.5 GeV,
Eν = 1 GeV and Eν = 5 GeV.
Comparing the case for 0.5 GeV with that for 5 GeV,
we understand the big contrast between them as for the
zenith angle distribution. The scattering angle of the emit-
ted muon for 5 GeV neutrino is relatively small (See, Table
1) that the emitted muons keep roughly the same direction
as their original neutrino. In this case, the effect of their
azimuthal angle on the zenith angle is also small. However,
in the case for 0.5 GeV which is the dominant energy for
Fully Contained Events in the Superkamiokande, there is
even a possibility for the emitted muon to be emitted in
the backward direction due to the large angle scattering,
the effect of which is enhanced by their azimuthal angle.
The most frequent occurrence in the backward scatter-
ing of the emitted muon appear in the horizontally inci-
dent neutrino as shown in Figs. 8(a) to 8(c). In this case,
the zenith angle distribution of the emitted muon should
be symmetrical to the horizontal direction. Comparing the
case for 5 GeV with those both for ∼0.5 GeV and for ∼1
GeV, even 1 GeV incident neutrinos lose almost the orig-
inal sense of the incidence if we measure it by the zenith
angle of the emitted muon. Figs. 9(a) to 9(c) for the di-
agonally incident neutrino tell us that the situation for
diagonal cases lies between the case for the vertically in-
cident neutrino and that for horizontally incident one.
4 Zenith Angle Distribution of Fully
Contained Events and Partially Contained
Events for a Given Zenith Angle of the
Incident Neutrino, Taking Their Energy
Spectrum into Account
In the previous sections, we discuss the relation between
the zenith angle distribution of the incident neutrino with
a single energy and that of the emited muons produced by
the neutrino for the different incident direction. In order to
apply our motivation around the uncertainty of the SK as-
sumption on the direction for Fully Contained Events and
Partially Contained Events, we must consider the effect of
the energy spectrum of the incident neutrino. The Monte
Carlo simulation procedure for this purpose are given in
the Appendix B.
In Fig. 10, we give the zenith angle distributions of the
sum of µ+(µ̄) and µ− for a given zenith angle of ν̄µ̄ and
νµ, taking into account primary neutrino energy spectrum
at Kamioka site.
In Table 2, the average values for cosθµ+µ̄ and their
standard deviation for different incidences of the incident
neutrinos with different energies are presented 9.
In the SK case, their average values are given by cosθν(ν̄)
themselves by definition and, consequently, the standard
deviations are zero under the assumption, because the SK
assumption is of the delta function for the incidence direc-
tion. They are shown in the bottom line of Table 2. In the
second line from the bottom in this table, we give the av-
erage values and their standard deviations for cos θν+ν̄ ob-
tained under the inclusion of the energy spectrum for pri-
mary neutrinos. Thus, we found these values correspond to
those for incident neutrino with the effective single energy
between 0.5 GeV and 1 GeV. If we compare the average
energies and the standard deviations for the inclusion of
incident neutrino energy spectrum with those under the
SK assumption, it is easily understood that SK assump-
tion does not represent real zenith angle distribution of
the emitted muon.
5 Relation between the Zenth angle
Distribution of the Incident Neutrinos and
that of the emitted leptons
Now, we extend the results for the definite zenith angle
obtained in the previous section to the case in which we
consider the zenih angle distribution of the incident neu-
trinos totally.
Here, we examine the real correlation between cos θν
and cos θµ, by peforming the exact Monte Carlo simula-
tion.
The detail for the simulation procedure is given the
Appendix C.
In Fig. 11 we classsify the correlation between cos θν
and cos θµ according to the different energy range of the
incident muon neutrinos. It should be noticed that the
SK assumption on cos θν = cos θµ is roughly hold only for
Eν ≥ 5 GeV, but the widths in cos θµ for the definite cos θν
near cos θν =0 (for horizontally incident neutrino ) are
much larger than those near cos θν =1 ( for the vertically
incident neutrino). Of course, this is due to the effect of
the azimutal angle in QEL which could not be derived by
the SK simulation (DETECTOR SIMULATION ). Such
tendencies become more remarkable in Eν ≤ 5 GeV and
in these energies the SK assumption on the direction does
not hold any more.
In Fig. 12, we classify the correlation between cos θν and
cos θµ according to the different energy range of Eµ. The
similar argument on Fig. 11 can be done on the case of
Fig.12
9 Notice that the difference in the corresponding quantites
between the case for single energy and the case for the energy
spectrum. The formers are given in the µ−,while th latter is
given in µ− and µ+. However, such the difference does not
change the essential recognition.
E.Konishi et. al.,: The Reliability on the Direction of Neutrino in SK 9
Table 2. Average values and their standerd deviations in cosθµ+µ̄ for the zenith angle distributions of the muons with different
primary energies of the insident neutrinos.
Vertical Diagonal Horizontal
cosθν+ν̄ = 1 (0
◦) cosθν+ν̄ = 0.731 (43
◦) cosθν+ν̄ = 0 (90
Eν+ν̄(GeV) cos θµ+µ̄ σcos θµ+µ̄ cos θµ+µ̄ σcos θµ+µ̄ cos θµ+µ̄ σcos θµ+µ̄
0.5 0.262 0.547 0.189 0.556 -0.003 0.564
1.0 0.590 0.439 0.432 0.463 0.001 0.480
2.0 0.581 0.250 0.623 0.290 0.001 0.325
5.0 0.978 0.067 0.715 0.103 0.006 0.141
Spectrum∗ 0.468 0.531 0.339 0.519 -0.005 0.500
SK∗∗ 1.00 0.00 0.731 0.000 0.000 0.000
(a) (b) (c)
−1 −0.5 0 0.5 1
cosθµ+µ− 
µ+ and µ−
cosθν+ν − =1(θν+ν −=0°)
Avg.=0.81
S.D.=0.30
−1 −0.5 0 0.5 1
µ −  
cosθµ+µ −
µ+ and µ−
cosθν+ν −=0(θν+ν −=90°)
Avg.=0.00
S.D.=0.34
−1 −0.5 0 0.5 1
µ −  
cosθµ+µ −
µ+ and µ−
cosθν+ν −=0.73(θν+ν −=43°)
Avg.=0.60
S.D.=0.33
Fig. 10. Zenith angle distribution of µ− and µ+ for ν and ν̄ for the incident neutrinos with the vertical, horizontal and diagonal
directions, respectively. The overall neutrino spectrum at Kamioka site is taken into account. The sampling number is 10000
for each case. SK stand for the corresponding ones under the SK assumption.
0 0.2 0.4 0.6 0.8 1
cosθν
Eµ < 0.5 GeV
0.5 < Eµ < 1 GeV
Eµ > 1 GeV
0 0.2 0.4 0.6 0.8 1
cosθν
1 < Eµ < 2 GeV
2 < Eµ < 5 GeV
Eµ > 5 GeV
Fig. 12. Correlation diagrams between cos θν and cos θµ for different muon energy ranges.
10 E.Konishi et. al.,: The Reliability on the Direction of Neutrino in SK
0 0.2 0.4 0.6 0.8 1
cosθν
1 < Eν < 2 GeV
2 < Eν < 5 GeV
Eν > 5 GeV
Fig. 11. Correlation Diagram between cos θν and cos θµ for
different neutrino energy regions.
Thus, it could be surely concluded from Fig. 11 and
Fig. 12 that the SK assumption on the direction never
holds as a good estimator for the determination of the
directions of the incident neutrinos.
In order to obtain the zenith angle distribution of the
emitted leptons for that of the incident neutrinos, we di-
vide the cosine of the zenith angle of the incident neutrino
into twenty regular intervals from cos θν = 0 to cos θν = 1.
For the given interval of cos θν , we carry out the exact
Monte Carlo simulation, the detail of which is give in the
Appendix D and obtain the cosine of the zenith angle of
the emitted leptons, taking account of the geometry for
surronding the SK detector.
Thus, for each interval of cos θν , we obtain the corre-
sponding zenith angle distribution of the emitted leptons.
Then, we sum up these corresponding ones over all zenth
angles of the incident neutrinos and we finally obtain the
relation between the zenith angle distribution for the in-
cident neutrinos and that for the emitted leptons.
In a similar manner, we could obtain between cos θν̄
and cos θµ̄ for anti-neutrinos. The situation for anti-neutrinos
is essentially same as that for neutrinos.
Here, we examine the zenith angle distribution of the
muons from both upward neutrinos and downward ones
in the case that neutrino oscillation does not exist.
By performing the procedures described in Appendix
C, a pair of sampling ( cos θν+ν̄ , Eν+ν̄ ) gives a pair of (
cos θµ+µ̄, Eµ+µ̄ ). In Fig. 13, we give the zenith angle dis-
tribution of the upward neutrinos ( the sum of νµ and ν̄µ )
which is constructed from the energy spectra for different
cos θν+ν̄ . (see, Honda et. al. [10] and Appendix B)
Upward neutrinos may produce even downward lep-
tons due to both the backscattering effect and the effect
of azimuthal angle on larger forward scattering for the in-
teraction concerned (see, Figure 3 and Figures 4 to 6 in
−1 −0.5 0 0.5 1
cosθν+ν −
µ+ ,µ−  (pµ>0.4 GeV/c)
upward neutrinos
no oscillation
from upward neutrinos
cosθµ+µ −, 
Fig. 13. The relation between the zenith angle distribution
of the incident neutrino and corresponding ones of the emitted
lepton
the text). As the result of it, the zenith angle distribution
of the emitted muons for the upward neutrino may leak in
the downward direction. From Figure 13, it is very clear
that the shape of the zenith angle distribution for the in-
cident neutrinos is quite different from that of the emitted
muons produced by these neutrinos. If the SK assumption
on the direction statistically holds, the zenith angle dis-
tribution for the emitted muons should coincide totally
with that of the incident neutrinos. In other words, one
may say that the zenith angle distribution for the emitted
muons should be understood as that of the incident neu-
trino under the SK assumption on the direction. However,
the muon spectrum is distinctively different from the real
(computational) incident neutrino spectrum as shown in
the figure. Thus we conclude that SK assumption on the
direction leads to the wrong conclusion on the neutrino
oscillation. The further examination on the experimental
data obtained by SK will be carried out in the subsequent
papers.
It is, further, noticed that upward neutrino energy
spectrum in the figure biggest near cos θν+ν̄ = 0 and the
smallest near cos θν+ν̄ = 1, which reflects from the en-
hancement of the primary incident neutrino energy spec-
trum from the inclined direction and is independent on the
neutrino oscillation, while in SK opinion, such tendency
may be favor of the existence of neutrino oscillation.
6 Discussions and Conclusion
In order to extract the definite conclusion on the neutrino
oscillation from the experiment by cosmic ray neutrinos
whose intensity as well as interaction with the substance
are both very weak, first of all, one should analyze the
E.Konishi et. al.,: The Reliability on the Direction of Neutrino in SK 11
most clear cut and ambiguity free events. Among neutrino
events analyzed by SK, the most clear cut events are Single
Ring Events, such as electron-like events and muon-like
events in Fully Contained Events which are generated by
QEL. These events are provided with simplicity due to
single ring and all possible measureable physical quantities
are confined in the detector.
Furthermore, QEL is the most dominant source for
neutrino events which are generated in the SK detector.
This is the reason why we examine the QEL events excul-
sively in present and subsequent papers.
If the neutrino oscillation really exists, the most clear
cut evidence surely appears in the analysis of single ring
events due to QEL in Fully Contained Events and one does
not need the analysis of any other type of events, such as
single ring events in Partially Contained Events, multi-
ring events in either Fully Contained Events or Partially
Contained Evetns, all of which include inevitably ambigu-
ities for the interpretation and show merely sub-evidences
compared with that from the single ring events due to
QEL in Fully Contained Events.
SK analyze the zenith angle distribution of the inci-
dent neutrinos under the asssumption that the direction
of the incident neutrino is the same as that of the emitted
lepton. We conclude that this assumption is supplemented
by their Monte Carlo Simulation named as Detector Sim-
ulation 10.
In the present paper, we adopt Time Sequential Simu-
lation which starts from the incident neutrino energy spec-
trum on the opposite side of the Earth to the SK detec-
tor and simulate all posible physical processes which are
connected with the zenith angle distribution of the in-
cident neutrinos according to their probability functions
concerned for the examination on the validity of the SK
assumption on the direction.
Concretely speaking, we take the following treatment,
(i) the stochastic treatments of the scattering angle of the
emitted lepton in QEL, including the scattering on the
backward as well as the azimuthal angle, which could not
be treated in Detector Simulation, (ii) the stochastic treat-
ment on the zenith angle distribution of the emitted lep-
ton, considering the incident neutrino energy spectrum,
(iii) the stochastic treatment on the detection of the QEL
events inside the SK detector.
Furthermore, the discrimination between Fully Con-
tained Events and Partially Contained Events is only pos-
sible in the Time Sequential Simulation, because the events
concerned may be classified into different categories by
chance, Fully Contained Events and Partially Contained
Events due to different occurring points and different di-
rections.
10 The SK Detector Simulation for obtaing the zenith angle
distribtuion of the incident neutrino as for the neutrino oscilla-
tiom has never been disclosed in their papers, even in the Ph.D
thesis. Consequently, this is only our onjecture as for utiliza-
tion of the SK Detector Simulation . A clear thing is only that
SK impose the proposittion that the direction of the incident
neutrino is the same as that of the emitted lepton upon the
neutrino oscillation analysis.
The conclusions thus obtained are as follows:
(1) The zenith angle distributions of the emitted lepton
in QEL for the incident neutrino with both the def-
inite zenith angle and the definite energy are widely
spread, particularly, into even the backward region due
to partly pure backscattering and partly the combina-
tion of the azimuthal angle with the slant direction
of the incident neutrinos. However, for every incident
neutrino with a definte zenith angle, SK give the same
definite zenith angle to the emitted lepton. Already in
this stage, the SK assumption on the direction does
not hold.
(2) Taking account of the incident neutrino energy spec-
trum and simulating all physical processes concerned,
we obtain the zenith angle distribution of the emitted
leptons for the incident neutrino with a definite zenith
angle. It is proved that the SK assumption on the di-
rection does not hold again.
(3) The correlation diagrams between cos θν and cos θµ
show that SK assumption does noty hold well even
for higher energies of the incident neutrinos, and it
is shown that the correlation between them become
weaker in more inclined incident neutrinos due to the
effect of the azimuthal angle in QEL.
(4) Taking into account the detection efficiency for the
events concerned in the simulation for upward neu-
trinos and anti-neutrinos, we obtain the zenith angle
distribution of the leptons ( muons plus anti-muons
). According to the SK Assumption on the direction,
the zenith angle distribution is the same as that of the
incident neutrinos. However, the original zenith angle
distribution of incident neutrino is found to be quite
different from that derived from that of leptons. This is
the final conclusion that SK have not measured the di-
rection of the incident neutrinos reliably, which is quite
independent on either the existence or non-existence of
the neutrino oscillation.
(5) The SK assume that the Partially Contained Events
exclusively belong to the muon-like event. However,
such the assumption lacks in theoretical background.
Electron events can also contribute to the Partially
Contained Events under some geometrical condition,
for example, partly coming from the transformation
by Eq.(A.5). The quantitative examination on the Par-
tially Contained Events among the electron-like event
will be published elsewhere.
In subsequent papers, we will give the relation between
the zenith angle distributions of the incident neutrinos
and the corresponding muons in the cases with and with-
out neutrino oscillation, including downward neutrino and
will examine whether it is possible to or not to detect the
neutrino oscillation by using atmospheric neutrino.
12 E.Konishi et. al.,: The Reliability on the Direction of Neutrino in SK
In the following Appendices we give the concrete Monte
Carlo Simulations, namely, the details of our Time Se-
quential Simulation.
A Appendix: Monte Carlo Procedure for the
Decision of Emitted Energies of the Leptons
and Their Direction Cosines
Here, we give the Monte Carlo Simulation procedure for
obtaining the energy and its direction cosines, (lr,mr, nr),
of the emitted lepton in QEL for a given energy and its
direction cosines, (l,m, n), of the incident neutrino.
The relation among Q2, Eν+ν̄ , the energy of the inci-
dent neutrino, Eℓ, the energy of the emitted lepton (muon
or electron or their anti-particles) and θs, the scattering
angle of the emitted lepton, is given as
Q2 = 2Eν(ν̄)Eℓ(ℓ̄)(1− cosθs). (A·1)
Also, the energy of the emitted lepton is given by
Eℓ(ℓ̄) = Eν(ν̄) −
. (A·2)
Procedure 1
We decide Q2 from the probability function for the differ-
ential cross section with a given Eν(ν̄) (Eq. (2) in the text)
by using the uniform random number, ξ, between (0,1) in
the following
Pℓ(ℓ̄)(Eν(ν̄), Q
2)dQ2, (A·3)
where
Pℓ(ℓ̄)(Eν(ν̄), Q
dσℓ(ℓ̄)(Eν(ν̄), Q
dσℓ(ℓ̄)(Eν(ν̄), Q
(A·4)
From Eq. (A·1), we obtain Q2 in histograms together with
the corresponding theoretical curve in Fig. 14. The agree-
ment between the sampling data and the theoretical curve
is excellent, which shows the validity of the utlized proce-
dure in Eq. (A·3) is right.
Procedure 2
We obtain Eℓ(ℓ̄) from Eq. (A·2) for the given Eν(ν̄) and
Q2 thus decided in the Procedure 1.
Procedure 3
We obtain cos θs, cosine of the the scattering angle of the
emitted lepton, for Eℓ(ℓ̄) thus decided in the Procedure 2
from Eq. (A·1) .
Procedure 4
We decide φ, the azimuthal angle of the scattering lepton,
0 0.2 0.4 0.6 0.8 1
1 GeV
2 GeV
Eν=0.5 GeV
Fig. 14. The reappearance of the probability function for QEL
cross section. Histograms are sampling results, while the curves
concerned are theoretical ones for given incident energies.
which is obtained from
φ = 2πξ. (A·5)
Here, ξ is a uniform random number (0, 1).
As explained schematically in the text(see Fig. 3 in the
text), we must take account of the effect due to the az-
imuthal angle φ in the QEL to obtain the zenith angle
distribution of both Fully Contained Events and Partially
Contained Events correctly.
Procedure 5
The relation between direction cosines of the incident neu-
trinos, (ℓν(ν̄),mν(ν̄), nν(ν̄)), and those of the corresponding
emitted lepton, (ℓr,mr, nr), for a certain θs and φ is given
ℓ2 +m2
ℓ2 +m2
ℓν(ν̄)
ℓ2 +m2
ℓ2 +m2
mν(ν̄)
ℓ2 +m2 0 nν(ν̄)
sinθscosφ
sinθssinφ
cosθs,
(A·6)
where nν(ν̄) = cosθν(ν̄), and nr = cosθℓ. Here, θℓ is the
zenith angle of the emitted lepton.
The Monte Carlo procedure for the determination of θℓ
of the emitted lepton for the parent (anti-)neutrino with
given θν(ν̄) and Eν(ν̄) involves the following steps:
We obtain (ℓr,mr, nr) by using Eq. (A·6). The nr is
the cosine of the zenith angle of the emitted lepton which
E.Konishi et. al.,: The Reliability on the Direction of Neutrino in SK 13
Fig. 15. The relation between the direction cosine of the
incident neutrino and that of the emitted charged lepton.
should be contrasted to nν , that of the incident neutrino.
Repeating the procedures 1 to 5 just mentioned above, we
obtain the zenith angle distribution of the emitted leptons
for a given zenth angle of the incident neutrino with a def-
inite energy.
In the SK analysis, instead of Eq. (A·6), they assume
nr = nν(ν̄) uniquely for Eµ(µ̄) ≥ 400 MeV.
B Appendix: Monte Carlo Procedure to
Obtain the Zenith Angle of the Emitted
Lepton for a Given Zentith Angle of the
Incident Neutrino
The present simulation procedure for a given zenith an-
gle of the incident neutrino starts from the atmospheric
neutrino spectrum at the opposite site of the Earth to
the SK detector. We define, Nint(Eν(ν̄), t, cosθν(ν̄)), the in-
teraction neutrino spectrum at the depth t from the SK
detector in the following way
Nint(Eν(ν̄), t, cosθν(ν̄)) = Nsp(Eν(ν̄), cos θν(ν̄))×
λ1(Eν(ν̄), t1, ρ1)
× · · · ×
λn(Eν(ν̄), tn, ρn)
(B·1)
Here, Nsp(Eν(ν̄), cos θν(ν̄)) is the atmospheric (anti-)
neutrino spectrum for the zenith angle at the opposite
surface of the Earth.
Here λi(Eν(ν̄), ti, ρi) denotes the mean free path due
to the neutrino(anti neutrino) with the energy Eν(ν̄) from
QEL at the distance, ti, from the opposite surface of the
Earth inside whose density is ρi.
The procedures of the Monte Carlo Simulation for the
incident neutrino(anti neutrino) with a given energy,Eν(ν̄),
whose incident direction is expressde by (l,m, n) are as fol-
lows.
Procedure A
For the given zenith angle of the incident neutrino, θν(ν̄),
we formulate, Npro(Eν(ν̄), t, cos θν(ν̄))dEν(ν̄), the produc-
tion function for the neutrino flux to produce leptons at
the Kamioka site in the following
Npro(Eν(ν̄), t, cos θν(ν̄))dEν(ν̄)
= σℓ(ℓ̄)(Eν(ν̄))Nint(Eν(ν̄), t, cosθν(ν̄))dEν(ν̄), (B·2)
where
σℓ(ℓ̄)(Eν(ν̄)) =
dσℓ(ℓ̄)(Eν(ν̄), Q
dQ2. (B·3)
Each differential cross section above is given in Eq. (2) in
the text.
Utilizing, ξ, the uniform random number between (0,1),
we determine Eν(ν̄), the energy of the incident neutrino in
the following sampling procedure
∫ Eν(ν̄)
Eν(ν̄),min
Pd(Eν(ν̄), t, cos θν(ν̄)(ν̄))dEν(ν̄), (B·4)
where
Pd(Eν(ν̄), t, cos θν(ν̄))dEν(ν̄)
Npro(Eν(ν̄), t, cos θν(ν̄))dEν(ν̄)
∫ Eν(ν̄),max
Eν(ν̄),min
Npro(Eν(ν̄), t, cos θν(ν̄))dEν(ν̄)
. (B·5)
14 E.Konishi et. al.,: The Reliability on the Direction of Neutrino in SK
In our Monte Carlo procedure,
the reproduction of, Pd(Eν(ν̄), t, cos θν(ν̄))dEν(ν̄), the nor-
malized differential neutrino interaction probability func-
tion, is confirmed in the same way as in Eq. (A·4).
Procedure B
For the (anti-)neutrino concerned with the energy ofEν(ν̄),
we sample Q2 utlizing ξ3, the uniform random number be-
tween (0,1). The Procedure B is exactly the same as in the
Procedure 1 in the Appendix A.
Procedure C
We decide, θs, the scattering angle of the emitted lepton
for given Eν(ν̄) and Q
2. The procedure C is exactly the
same as in the combination of Procedures 2 and 3 in the
Appendix A.
Procedure D
We randomly sample the azimuthal angle of the charged
lepton concerned. The Procedure D is exactly the same as
in the Procedure 4 in the Appendix A.
Procedure E
We decide the direction cosine of the charged lepton con-
cerned. The Procedure E is exactly the same as in the
Procedure 5 in the Appendix A.
We repeat Procedures A to E until we reach the de-
sired trial number.
C Appendix: Correlation between the Zenith
Angles of the Incident Neutrinos and Those
of the Emitted Leptons
Procedure A
By using, Npro(Eν(ν̄), t, cos θν(ν̄))dEν(ν̄), which is defined
in Eq. (B·2),
we define the spectrum for cos θν(ν̄) in the following.
I(cos θν(ν̄))d(cos θν(ν̄)) =
d(cos θν(ν̄))
∫ Eν(ν̄),max
Eν(ν̄),min
Npro(Eν(ν̄), t, cos θν(ν̄))dEν(ν̄).(C·1)
By using Eq. (C·2) and ξ, a sampled uniform random num-
ber between (0,1), then we could determine cos θν(ν̄) from
the following equation
∫ cos θν(ν̄)
Pn(cos θν(ν̄))d(cos θν(ν̄)), (C·2)
where
Pn(cos θν(ν̄)) = I(cos θν(ν̄))
I(cos θν(ν̄))d(cos θν(ν̄)).
(C·3)
Procedure B
For the sampled d(cos θν(ν̄)) in the Procedure A, we sam-
ple Eν(ν̄) from Eq.(C·4) by using ξ, the uniform randum
number between (0,1)
∫ Eν(ν̄)
Eν(ν̄),min
Ppro(Eν(ν̄), cos θν(ν̄))dEν(ν̄), (C·4)
where
Ppro(Eν(ν̄), t, cos θν(ν̄))dEν(ν̄) =
Npro(Eν(ν̄), t, cos θν(ν̄))dEν(ν̄)
∫ Eν(ν̄),max
Eν(ν̄),min
Npro(Eν(ν̄), t, cos θν(ν̄))dEν(ν̄)
. (C·5)
Procedure C
For the sampled Eν(ν̄) in the Procedure B, we sample
Eµ(µ̄) from Eqs. (A·2) and (A·3). For the sampled Eν(ν̄)
and Eµ(µ̄), we determine cos θs, the scattering angle of the
muon uniquely from Eq. (A·1).
Procedure D
We determine, φ, the azimuthal angle of the scattering lep-
ton from Eq. (A·5) by using ξ, an uniform randum number
between (0,1).
Procedure E
We obtain cos θµ(µ̄) from Eq. (A·6). As the result, we ob-
tain a pair of (cos θν(ν̄), cos θµ(µ̄)) through Procedures A
to E. Repeating the Procedures A to E, we finally the cor-
relation between the zenith angle of the incident neutrino
and that of the emitted muon.
E.Konishi et. al.,: The Reliability on the Direction of Neutrino in SK 15
D Appendix: Detection of the Neutrino
Events in the SK Detector and Their
Interaction Points
The plane ABCD is always directed vertically to the di-
rection of the incident neutrino with a given zenith an-
gle, which is shown in Fig. 16. The rectangular ABCDE-
FGH encloses the SK detector whose radius and height
is denoted by R and H, respectively. The width and the
height of the plane ABCD for a given zenith angle, θν(ν̄),
is given as, R and Rcos θν(ν̄) + H sin θν ,respectively, which
are shown in Fig. 16-c.
Now, let us estimate the ratio of the number of the neu-
trino events inside the SK detector to that in the rectan-
gular ABCDEFGH. As the number of the neutrino events
inside some material is proportional to the number of the
nucleons in the material concerned. The number of the
nucleons inside the SK detector (ρ = 1) is given as
Nsk =
NavogaR
2H, (D·1)
whereNavoga denotes the Avogadro number, and the num-
ber of the nucleons in the exterior of the SK detector inside
ABCDEFGH is given as
Fig 16-a Fig 16-b
Injection Point
(X,Y)
Neutrino
Injection Point
Fig 16-c Fig 16-d
Fig. 16. Sampling procedure for neutrino evens injected into
the detector
Nextr(cos θν(ν̄) ) = ρNavoga
R2H +
R(H2+R2) sin θν(ν̄) cos θν(ν̄)
, (D·2)
where ρ is the density of the rock which surrounds the SK
detector.
Then, the total number of the target in the rectangu-
larABCDEFGH is given as
Ntot(cos θν(ν̄)) = Nsk +Nextr(cos θν(ν̄)). (D·3)
Here, we take 2.65, as ρ (standard rock).
Then, Rtheor, the ratio of the number of the neutrino
events in the SK detector to that in the rectangular ABCDE-
FGH is given as
Rtheor(cos θν(ν̄)) = Nsk/Ntot(cos θν(ν̄)). (D·4)
We obtain Rtheor for different values of cos θν(ν̄) given in
the Table 3.
Here, we simulate neutrino events occured in the rectangu-
lar ABCDEFH, by using the atmospheric neutrino beam
which falls down on the plane ABCD. Thus, Nsmaple, the
sampling number of the (anti-)neutrino events inside the
rectangular ABCDEFG for a given cos θν(ν̄) is given as
Nsample(cos θν(ν̄)) = Ntot(cos θν(ν̄))×
∫ Eν(ν̄),max
Eν(ν̄),min
ℓ(ℓ̄)
(Eν(ν̄))Nint(Eν(ν̄), t, cosθν(ν̄))dEν(ν̄)
(D·5)
where σℓ(ℓ̄)(Eν(ν̄)) is the total cross section for (anti-)neutrino
due to QEL, and Nint(Eν(ν̄), t, cosθν(ν̄))dEν(ν̄) is the dif-
ferential nutrino energy spectrum for the definite zenith
angle, θν(ν̄), in the plane ABCD. The injection points of
the neutrinos in the plane ABCD are distributed over the
plane randomly and uniformely and the injection points
are determined from a pair of the uniform random num-
bers between (0,1). They penetrate into the rectangular
ABCDEFGH from the injection point in the plane ABCD
and some of them may penetrate into the SK detector or
may not, which depend on their injection point.
In the neutrino events which penetrate into the SK
detectorr, their geometrical total track length, Ttrack, are
devided into three parts
Ttrack = Tb + Tsk + Ta, (D·6)
where Tb denotes the track length from the plane ABCD
to the entrance point of the SK detector, Tsk denotes the
track length inside the SK detector, and Ta denotes the
track length from the escaping point of the SK detector
to the exit point of the rectangular ABCDEF, and thus
16 E.Konishi et. al.,: The Reliability on the Direction of Neutrino in SK
Table 3. Occurrence probabilities of the neutrino events in-
side the SK detector for different cos θν ’s. Comparison between
Rtheor and Rmonte. The sampling numbers for the Monte Carlo
Simulation are, 1000, 10000, 100000, respectively.
cos θν Rtheor Rmonte
Sampling Number
1000 10000 100000
0.000 0.58002 0.576 0.5750 0.57979
0.100 0.41717 0.425 0.4185 0.41742
0.200 0.32792 0.353 0.3252 0.32657
0.300 0.27324 0.282 0.2731 0.27163
0.400 0.23778 0.223 0.2329 0.23582
0.500 0.21491 0.206 0.2063 0.21203
0.600 0.20117 0.197 0.1946 0.19882
0.700 0.19587 0.193 0.1925 0.19428
0.800 0.20117 0.198 0.2002 0.20001
0.900 0.22843 0.230 0.2248 0.22803
1.00 0.58002 0.557 0.5744 0.57936
Ttrack denotes the geometrical length of the neutrino con-
cerned in the rectangular ABCDEFGH.
By the definition, the neutrinos concerned with Ttrack
interact surely somewhere along the Ttrack. Here, we are
interested only in the interaction point which ocuurs along
Tsk. We could determine the interaction point in the Tsk
in the following.
We define the following quantities for the purpose.
Tweight = Tsk + ρ(Tb + Ta), (D·7)
ρav = Tweight/Ttrack, (D·8)
ξρ = ρav/ρ, (D·9)
ξsk = Tsk/Tweight. (D·10)
The flow chart for the choice of the neutrino events in
the SK detector and the determination of the interaction
points inside the SK detector is given in Fig. 17. Thus, we
obtain neutrino events whose occurrence point is decided
in the SK detector in the following.
xf = x0, (D·11)
yf = y0 + ξTsk sin θν(ν̄) (D·12)
zf = z0 + ξTsk cos θν(ν̄). (D·13)
If we carry out the Monte Carlo Simulation, following
the flow chart in Fig. 17, then, we obtain Nevent, the num-
ber of the neutrino events generasted in the SK detector.
The ratio of the selected events to the total trial is given
Rmonte(cos θν(ν̄)) = Nevent(cos θν(ν̄))/Nsample(cos θν(ν̄)).
(D·14)
Comparison between Rtheor and Rmonte in Table 3 shows
that our Monte Carlo procedure is valid.
References
1. Kasuga, S. et al., Phys. Lett. B374 (1996) 238.
N Nsample
N=N+1
N=N+1
Entry
Determination of
Point in the plane ABCD
by using 1 and 2
Judgement on the
Event’s Entering
Determination of
Ta,Tb and Tsk
Determination of Tweight
av, and sk
Determination of e
interaction point of
e events inside SK
Fig. 17. Flow Chart for the determination of the interaction
points of the neutrino events inside the detector
2. Ashie,Y. et al., Phys. Rev. D 71 (2005) 112005.
3. Ashie,Y. et al., Phys. Rev. Lett.93 (2004) 101801.
4. Renton, P., Electro-weak Interaction, Cambridge University
Press (1990). See p. 405.
5. D.Rein and L.M.Sehgal, Ann. of Phys. 133 (1981) 1780.
6. D.Rein and L.M.Sehgal Nucl. Phys. B84 (1983) 29.
7. R.H.Gandhi et. al. Astropart. Phys. 5 (1996) 81.
8. Kajita, T. and Totsuka, Y. Rev. Mod. Phys., 73 (2001) 85.
See p. 101.
9. Ishitsuka, M., Ph.D thesis, University of Tokyo (2004). See
p. 138.
10. Honda, M., et al., Phys. Rev. D 52 (1996) 4985
	Introduction
	Cross Sections of Quasi Elastic Scattering in the Neutrino Reaction and the Scattering Angle of Charged Leptons.
	Influence of Azimuthal Angle of Quasi Elastic Scattering over the Zenith Angle of both the Fully Contained Events and Partially Contained Events
	Zenith Angle Distribution of Fully Contained Events and Partially Contained Events for a Given Zenith Angle of the Incident Neutrino, Taking Their Energy Spectrum into Account
	Relation between the Zenth angle Distribution of the Incident Neutrinos and that of the emitted leptons
	Discussions and Conclusion
	Appendix: Monte Carlo Procedure for the Decision of Emitted Energies of the Leptons and Their Direction Cosines 
	Appendix: Monte Carlo Procedure to Obtain the Zenith Angle of the Emitted Lepton for a Given Zentith Angle of the Incident Neutrino
	Appendix: Correlation between the Zenith Angles of the Incident Neutrinos and Those of the Emitted Leptons
	Appendix: Detection of the Neutrino Events in the SK Detector and Their Interaction Points
ABSTRACT
  In the SK analysis of the neutrino events for [Fully Contained Events] and
[Partially Contained Events] on their zenith angle distribution, it is assumed
that the zenith angle of the incident neutrino is the same as that of the
detected charged lepton. In the present paper, we examine the validity of [the
SK assumption on the direction] of the incident neutrinos. Concretely speaking,
we analyze muon-like events due to QEL. For the purpose, we develop [Time
Sequential Monte Carlo Simulation] to extract the conclusion on the validity of
the SK assumption. In our [Time Sequential Simulation], we simulate every
physical process concerned as exactly as possible without any approximation.
  From the comparison between the zenith angle distributon of the emitted muons
under [the SK assumption on the direction] and the corresponding one obtained
under our [Time Sequential Simulation], it is concluded that the measurement of
the direction of the incident neutrino for the neutrino events occurring inside
the detector in the SK analysis turns out to be unreliable, which holds
irrespective of the existence and/or non-existence of the neutrino oscillation.

<|endoftext|><|startoftext|>
Introduction
Although knots are abundant and complex in globular
homopolymers [1–3], they are rare and simple in proteins [4–
8]. Sixteen methyltransferases in bacteria and viruses can be
combined into the a/b knot superfamily [9], and several
isozymes of carbonic anhydrase (I, II, IV, V) are known to be
knotted. Apart from these two folds, only a few insular knots
have been reported [5,6,10,11], some of which were derived
from incomplete structures [6,11]. For the most part, knotted
proteins contain simple trefoil knots (31) that can be
represented by three essential crossings in a projection onto
a plane (see Figure 1, left). Only three proteins were identified
with four projected crossings (41, Figure 1, middle).
In this report we provide the first comprehensive review of
knots in proteins, which considers all entries in the Protein
Data Bank (http://www.pdb.org) [12], and not just a subset. This
allows us to examine knots in homologous proteins. Our
analysis reveals several new knots, all in enzymes. In particular,
we discovered the most complicated knot found to date (52) in
humanubiquitin hydrolase (Figure 1, right), and suggest that its
entangled topology protects it against being pulled into the
proteasome. We also noticed that knots are usually preserved
among structural homologues. Sequence similarity appears to
be a strong indicator for thepreservationof topology, although
differences between knotted and unknotted structures are
sometimes subtle. Interestingly, we have also identified a novel
knot in a transcarbamylase that is not present in homologues of
known structure. We show that the presence of this knot alters
the functionality of the protein, and suggest how the knot may
have been created in the first place.
Mathematically, knots are rigorously defined in closed
loops [13]. Fortunately, both the N- and C-termini of open
proteins are typically accessible from the surface and can be
connected unambiguously: we reduce the protein to its Ca-
backbone, and draw two lines outward starting at the termini
in the direction of the connection line between the center of
mass of the backbone and the respective ends [5]. The lines
are joined by a big loop, and the structure is topologically
classified by the determination of its Alexander polynomial
[1,13]. Applying this method to the Protein Data Bank in the
version of January 3, 2006, we found 273 knotted structures in
the 32,853 entries that contain proteins (Table S1). Knots
formed by disulfide [14,15] or hydrogen bonds [7] were not
included in the study.
Results
For further analysis, we considered 36 proteins that
contain knots as defined by rather stringent criteria discussed
in the Materials and Methods section. These proteins can be
classified into six distinct families (Table 1). Four of these
families incorporate a deeply knotted section, which persists
when 25 amino acids are cut off from either terminus.
Interestingly, all knotted proteins thus identified are en-
zymes. Our investigation affirms that all members of the
carbonic anhydrase fold (including the previously undeter-
mined isozymes III, VII, and XIV) are knotted. In addition, we
identify a novel trefoil in two bacterial transcarbamylase-like
proteins (AOTCase in Xanthomonas campestris and SOTCase in
Bacteroides fragilis) [16,17].
UCH-L3—The most complex protein knot. One of our
most intriguing discoveries is a fairly intricate knot with five
projected crossings (52) in ubiquitin hydrolase (UCH-L3 [18];
see Figure 1, right). This knot is the first of its kind and, apart
from carbonic anhydrases, the only identified in a human
protein. Human UCH-L3 also has a yeast homologue [6,19]
with a sequence identity of 32% [20]. Amino acids 63 to 77
are unstructured, and if we connect the unstructured region
by an arc that is present in the human structure, we obtain
the same knot with five crossings. What may be the function
of this knot? In eukaryotes, proteins get labeled for
Editor: Robert B. Russell, European Molecular Biology Laboratory, Germany
Received April 3, 2006; Accepted July 28, 2006; Published September 15, 2006
A previous version of this article appeared as an Early Online Release on July 28,
2006 (DOI: 10.1371/journal.pcbi.0020122.eor).
DOI: 10.1371/journal.pcbi.0020122
Copyright: � 2006 Virnau et al. This is an open-access article distributed under the
terms of the Creative Commons Attribution License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original author
and source are credited.
Abbreviations: AOTCase, N-acetylornithine transcarbamylase; SOTCase, N-succi-
nylornithine transcarbamylase; UCH-L3, ubiquitin hydrolase
* To whom correspondence should be addressed. E-mail: virnau@mit.edu
PLoS Computational Biology | www.ploscompbiol.org September 2006 | Volume 2 | Issue 9 | e1221074
degradation by ubiquitin conjugation. UCH-L3 performs
deconjugation of ubiquitin, thus rescuing proteins from
degradation. The close association of the enzyme with
ubiquitin should make it a prime target for degradation at
the proteasome. We suggest that the knotted structure of
UCH-L3 makes it resistant to degradation. In fact, the first
step of protein degradation was shown to be ATP-dependent
protein unfolding by threading through a narrow pore (;13
Å in diameter) of a proteasome [21,22]. Such threading into
the degradation chamber depends on how easily a protein
unfolds, with more stable proteins being released back into
solution [23] and unstable ones being degraded. If ATP-
dependent unfolding proceeds by pulling the C-terminus into
a narrow pore [21], then a knot can sterically preclude such
translocation, hence preventing protein unfolding and
degradation. While arceabacterial proteasome PAN was
shown to process proteins from its C- to N-terminus [21], it
cannot be ruled out that some eukaryotic proteasomes
process proteins in the N- to C-direction, thus requiring
protection of both termini. Unfolding of a knotted protein by
pulling may require a long time for global unfolding and
untangling of the knot. Unknotted proteins, in contrast, have
been shown to become unstable if a few residues are removed
from their termini [24], suggesting that threading a few (5–10)
residues into a proteasomal pore would be sufficient to
unravel an unknotted structure. At both termini, UCH-L3
contains loops entangled into the knot protecting both ends
against unfolding if pulled. It should also be noted that both
N- and C-termini are stabilized by a number of hydrophobic
interactions with the rest of the protein. The C-terminus is
Figure 1. Examples of the Three Different Types of Knots Found in Proteins
Colors change continuously from red (first residue) to blue (last residue). A reduced representation of the structure, based on the algorithm described in
[1,6,36], is shown in the lower row.
(Left) The trefoil knot (31) in the YBEA methyltransferase from E. coli (pdb code 1ns5; unpublished data) reveals three essential crossings in a projection
onto a plane.
(Middle) The figure-eight knot (41) in the Class II ketol-acid reductoisomerase from Spinacia oleracea (pdb code 1yve [26]) features four crossings. (Only
the knotted section of the protein is shown.)
(Right) The knot 52 in ubiquitin hydrolase UCH-L3 (pdb code 1xd3 [18]) reveals five crossings. Pictures were generated with Visual Molecular Dynamics
(http://www.ks.uiuc.edu/Research/vmd) [43].
DOI: 10.1371/journal.pcbi.0020122.g001
PLoS Computational Biology | www.ploscompbiol.org September 2006 | Volume 2 | Issue 9 | e1221075
Synopsis
Several protein structures incorporate a rather unusual structural
feature: a knot in the polypeptide backbone. These knots are
extremely rare, but their occurrence is likely connected to protein
function in as yet unexplored fashion. The authors’ analysis of the
complete Protein Data Bank reveals several new knots that, along
with previously discovered ones, may shed light on such con-
nections. In particular, they identify the most complex knot
discovered to date in a human protein, and suggest that its
entangled topology protects it against unfolding and degradation.
Knots in proteins are typically preserved across species and
sometimes even across kingdoms. However, there is also one
example of a knot in a protein that is not present in a closely related
structure. The emergence of this particular knot is accompanied by a
shift in the enzymatic function of the protein. It is suggested that
the simple insertion of a short DNA fragment into the gene may
suffice to cause this alteration of structure and function.
Intricate Knots in Proteins
particularly stable—residues 223 to 229 are hydrophobic and
form numerous contacts at 5 Å with the rest of the structure.
We would like to stress that this hypothesis needs to be
tested by experiments. Different proteins may also provide
different levels of protection against degradation, depending
on structural details, the depth of the knot, and its complex-
ity. Recently, a knot in the red/far-red light photoreceptor
phytochrome A in Deinococcus radiodurans was identified [11]
(see Materials and Methods). Although sequence similarity
suggests that the knot may also be present in plant
homologues, we cannot be certain. In plants, the red-
absorbing form is rather stable (half-life of 1 wk), but the
far-red–absorbing form is degraded upon photoconversion
by the proteasome with a half-life of 1–2 h in seedlings (and
somewhat longer in adult plants) [25].
Evolutionary aspects. As expected, homologous structures
tend to retain topological features. The trefoil knot in
carbonic anhydrase can be found in isozymes ranging from
bacteria and algae to humans (Table 1). Class II ketol-acid
reductoisomerase comprises a figure-eight knot present in
Escherichia coli [10] and spinach [26] (see Figure 1, middle), and
S-adenosylmethione synthetase contains a deep trefoil knot in
E. coli [5,27] and rat [28]. It appears that particular knots have
indeed been preserved throughout evolution, which suggests a
crucial role for knots in protein enzymatic activity and
binding.
Table 1. List of Knotted PDB Entries (January 2006)
Protein Knot Family Protein Species PDB Code Length Knot Knotted Core
a/b knot YbeA-like E. coli 1ns5 153 31 69–121 (32)
T. maritime 1o6d 147 31 68–117 (30)
S. aureus 1vh0 157 31 73–126 (31)
B. subtilis 1to0 148 31 64–116 (32)
tRNA(m1G37)-methyltransferase TrmD H. influenza 1uaj 241 31 93–138 (92)
E. coli 1p9p 235 31 90–130 (89)
SpoU-like RNA 29-O ribose mtf. T. thermophilus 1v2x 191 31 96–140 (51)
H. influenza 1j85 156 31 77–114 (42)
T. thermophilus 1ipa 258 31 185–229 (29)
E. coli 1gz0 242 31 172–214 (28)
A. aeolicus 1zjr 197 31 95–139 (58)
S. viridochromog. 1x7p 265 31 192–234 (31)
YggJ C-terminal domain-like H. influenza 1nxz 246 31 165–216 (30)
B. subtilis 1vhk 235 31 158–208 (27)
T. thermophilus 1v6z 227 31 103–202 (25)
Hypothetical protein MTH1 (MT0001) A. M. Thermoautotr. 1k3r 262 31 48–234 (28)
Carbonic anhydrases Carbonic anhydrase N. gonorrhoeae 1kop 223 31 36–223 (0)
Carbonic anhydrase I H. sapiens 1hcb 258 31 29–256 (2)
Carbonic anhydrase II H. sapiens 1lug 259 31 30–256 (3)
Bos Taurus 1v9e 259 31 32–256 (3)
Dunaliella salina 1y7w 274 31 37–270 (4)
Carbonic anhydrase III Rattus norv. 1flj 259 31 30–256 (3)
H. sapiens 1z93 263 31 28–254 (9)
Carbonic anhydrase IV H.sapiens 1znc 262 31 32–261 (1)
Mus musculus 2znc 249 31 32–246 (3)
Carbonic anhydrase V Mus musculus 1keq 238 31 7–234 (4)
Carbonic anhydrase VII H. sapiens 1jd0 260 31 28–257 (3)
Carbonic anhydrase XIV Mus Musculus 1rj6 259 31 29–257 (2)
Miscellaneous Ubiquitin hydrolase UCH-L3 H. sapiens 1xd3A 229 52 12–172 (11)
S. cerevisiae (synth.) 1cmxA 214 31 9–208 (6)
S-adenosylmethionine synthetase E. coli 1fug 383 31 33–260 (32)
Rattus norv. 1qm4 368 31 30–253 (29)
Class II ketol-acid reductoisomerase Spinacia oleracea 1yve 513 41 239–451 (62)
E. coli 1yrl 487 41 220–435 (52)
Transcarbamylase-like B. fragilis 1js1 324 31 169–267 (57)
X. campestris 1yh1 334 31 171–272 (62)
‘‘Protein’’ describes the name or the family of the knotted structure. ‘‘Species’’ refers to the scientific name of the organism from which the protein was taken for structure determination.
‘‘PDB code’’ gives one example Protein Data Bank entry for each knotted protein: additional structures of the same protein can be found using the SCOP classification tool [9]. ‘‘Length’’
describes the number of Ca-backbone atoms in the structure. ‘‘Knot’’ refers to the knot type which was discovered in the protein: 31, trefoil; 41, figure-eight knot; 52, 2nd knot with five
crossings according to standard knot tables [13]. The core of a knot is the minimum configuration which stays knotted after a series of deletions from each terminus; in brackets we
indicate how many amino acids can be removed from either side before the structure becomes unknotted (see Materials and Methods).
Structure is fragmented and becomes knotted when missing sections are joined by straight lines. The size of the knotted core refers to the thus-connected structure. The knot is also
present in at least one fragment.
Structure is fragmented and only knotted when missing sections are joined by straight lines. Fragments are unknotted.
1v6z is currently not classified according to SCOP (version 1.69). Sequence similarity suggests that it is part of the a/b knot fold. 1v6zB contains a shallow composite knot (31#41), which
turns into a regular trefoil when two amino acids are cut from the N-terminus. (The random closure [see Materials and Methods] determines a trefoil right away.)
1uch contains the same structure as 1xd3. If the missing section in the center of this structure is joined by a straight line, it becomes knotted (52). In the yeast homologue (1cmx), amino
acids 63 to 77 are unstructured, and if we replace the missing parts by a straight line, we obtain a trefoil knot that has been identified before [6]. If we connect the unstructured region by
an arc present in the human structure, we obtain the same knot with five crossings.
eVisual inspection reveals that the calculated size of the knotted core is too small.
DOI: 10.1371/journal.pcbi.0020122.t001
PLoS Computational Biology | www.ploscompbiol.org September 2006 | Volume 2 | Issue 9 | e1221076
Intricate Knots in Proteins
UCH-L3 in human and yeast share only 33% [29] of their
sequences, but contain the same 5-fold knot as far as we can
tell from the incomplete structure in yeast. It is not only likely
that all species in between have the same knot—the link
between sequence and structure may also be used to predict
candidates for knots among isozymes or related proteins for
which the structure is unknown. For example, UCH-L4 in
mouse has 96% sequence identity with human UCH-L3. The
similarity with UCH-L6 in chicken is 86%, and with UCH-L1
about 55%. Indeed, a reexamination of the most recent
Protein Data Bank entries revealed that UCH-L1 contains the
same 52 knot as UCH-L3. (See the Update section—the
structure was not yet part of the January Protein Data Bank
release on which this paper is based.) Unfortunately, the
method is not foolproof because differences between knotted
and unknotted structures are sometime subtle. As we will
demonstrate in the next paragraph, a more reliable estimate
has to consider the conservation of major elements of the
knot, like loops and threads.
AOTCase—How a protein knot can alter enzymatic activity.
Somewhat surprisingly, we also identified a pair of homo-
logues for which topology is not preserved. N-acetylornithine
transcarbamylase (AOTCase [17]) is essential for the arginine
biosynthesis in several major pathogens. In other bacteria,
animals, and humans, a homologous enzyme (OTCase)
processes L-ornithine instead [30]. Both proteins have two
active sites. The first one binds carbamyl phosphate to the
enzyme. The second site binds acetylornithine in AOTCases
and L-ornithine in OTCases, enabling a reaction with
carbamyl phosphate to form acetylcitrulline or citrulline,
respectively [17, 31].
AOTCase in X. campestris has 41% sequence identity with
OTCase from Pyrococcus furiosus [32] and 29% with human
OTCase [31]. As demonstrated in Figure 2, AOTCase contains
a deep trefoil knot which is not present in OTCase (Figure 2,
right) and which modifies the second active site. The knot
consists of a rigid proline-rich loop (residues 178–185),
through which residues 252 to 256 are threaded and affixed.
As elaborated in [17], the reaction product N-acetylcitrulline
strongly interacts with the loop and with Lys252. Access to
subsequent residues is, however, restricted by the knot. L-
norvaline in Figure 2 (right) is very similar to L-ornithine but
lacks the N-e atom of the latter to prevent a reaction with
carbamyl phosphate. As the knot is not present in OTCase,
the ligand has complete access to the dangling residues 263–
268 and strongly interacts with them [31]. This leads to a
rotation of the carboxyl-group by roughly 1108 around the
Ca–Cb bond [17].
This example demonstrates how the presence of a knot can
modify active sites and alter the enzymatic activity of a
protein—in this case, from processing L-ornithine to N-
acetyl-L-ornithine. It is also easy to imagine how this
alteration happened: a short insertion extends the loop and
modifies the folding pathway of the protein.
Discussion
Nature appears to disfavour entanglements, and evolution
has developed mechanisms to avoid knots. Human DNA
wraps around histone proteins, and the rigidity of DNA
allows it to form a spool when it is fed into a viral capsid. One
end also stays in the loading channel and prevents subsequent
equilibration [33]. Knotted proteins are rare, although the
reason is far less well understood. Can the absence of
entanglement be explained in terms of particular statistical
ensembles, or is there an evolutionary bias? And how do these
structures actually fold?
Knots are ubiquitous in globular homopolymers [1–3,8], but
rare in coil-like phases [1,34–36]. It is likely that even a flexible
polymer will at least initially remain unknotted after a
Figure 2. Structures of Transcarbamylase from X. campestris with a Trefoil Knot and from Human without a Knot
(Left) Knotted section (residues 171–278) of N-acetylornithine transcarbamylase from X. campestris with reaction product N-acetylcitrulline (pdb code
1yh1 [17]) and interacting side chains.
(Right) Corresponding (unknotted) section (residues 189–286) in human ornithine transcarbamylase (pdb code 1c9y [31]) with inhibitor L-norvaline and
carbamyl phosphate. Colors change continuously from red (first residue in the section) to blue (last residue in the section). The two proteins have an
overall sequence identity of 29% [41]. Pictures were generated with VMD [43].
DOI: 10.1371/journal.pcbi.0020122.g002
PLoS Computational Biology | www.ploscompbiol.org September 2006 | Volume 2 | Issue 9 | e1221077
Intricate Knots in Proteins
collapse from a swollen state. In proteins, the free energy
landscape is considerably more complex, which may allow
most proteins to stay unknotted. The secondary structure and
the stiffness of the protein backbone may shift the length scale
at which knots typically appear, too [8]. If knotted proteins are
in fact more difficult to degrade, it might also be disadvanta-
geous for most proteins to be knotted in the first place.
Unfortunately, few experimental papers address folding
and biophysical aspects of knots in proteins. In recent work
[37], Jackson and Mallam reversibly unfolded and folded a
knotted methyltransferase in vitro, indicating that chaper-
ones are not a necessary prerequisite. In a subsequent study
[38], the authors provide an extensive kinetic analysis of the
folding pathway. In conclusion, we would like to express our
hope that this report will inspire more experiments in this
small but nevertheless fascinating field.
Materials and Methods
To determine whether a structure is knotted, we reduce the
protein to its backbone, and draw two lines outward starting at the
termini in the direction of the connection line between the center of
mass of the backbone and the respective ends. These two lines are
joined by a big loop, and the structure is classified by the
determination of its Alexander polynomial [1,13]. To determine the
size of the knotted core, we delete successively amino acids from the
N-terminus until the protein becomes unknotted [1,6]. The proce-
dure is repeated at the C-terminus starting with the last structure that
contained the original knot. For each deletion, the outward pointing
line through the new termini is parallel to the respective lines
computed for the full structure. The thus determined size should,
however, only be regarded as a guideline. A better estimate can be
achieved by looking at the structure.
In Table 1 we include knotted structures with no missing amino
acids in the center of the protein. (A list of potentially knotted
structures with missing amino acids can be found in Table S3.)
Technically, the numbering of the residues in the mmcif file has to be
subsequent, and no two amino acids are allowed to be more than 6 Å
apart. In addition, the knot has to persist when two amino acids are
cut from either terminus. We have further excluded structures for
which unknotted counterexamples exist (e.g., only one nuclear
magnetic resonance structure among many is knotted or another
structure of the same protein is unknotted). If a structure is
fragmented, the knot has to appear in one fragment and in the
resulting structure obtained from connecting missing sections by
straight lines. Other knotted structures are only considered when at
least one additional member of the same structural family [9]
contains a knot according to the criteria above.
The enforcement of these rules leads to the exclusion of the
bluetongue virus core protein [6] (41) and photoreceptor phyto-
chrome A in D. radiodurans [11] (31), which have been previously
identified as being knotted. Both structures are fragmented and
become knotted only when a few missing fragments are connected by
straight lines. In the viral core protein, the dangling C-terminus
threads through a loose loop and becomes knotted in one out of two
cases. On the other hand, the photoreceptor phytochrome A appears
to contain a true knot. Notably, our analysis suggests that the thus
connected structure of phytochrome A contains a figure-eight knot
instead of a trefoil as reported in [11]. Moreover, we excluded a
structure of the Autographa California nuclear polyhedrosis virus, which
contains a knot according to our criteria. However, the N-terminus is
buried inside the protein and the knot only exists because of our
specific connection to the outside.
To further validate our criteria, we implemented an alternative
method [4,8,39] that relies on the statistical analysis of multiple
random closures. We arbitrarily chose two points on a sphere (which
has to be larger than the protein) and connected each with one
terminus. The two points can be joined unambiguously, and the
resulting loop was analyzed by calculating the Alexander polynomial.
We repeated the procedure 1,000 times, and defined the knot as the
majority type.
Applying this analysis, we discovered 241 knotted structures in the
Protein Data Bank. All 241 structures are also present in the 273
structures (Table S1) that were identified by our method, and the knot
type is the same. Themissing32 structures (Table S2) aremostly shallow
knots and were already rejected according to our extended criteria.
The random closure also correctly discards rare structures with buried
termini. In conclusion, the method used in this paper is considerably
faster but requires a slightly increased inspection effort. Our
observations agree with [8], which provides an extensive comparison
of closures applied to proteins. A complete listing of knotted Protein
Data Bank structures is given in the Supporting Information.
Update. Recently, the structure of human UCH-L1 was solved and
released [40]. The protein shares 55% sequence identity with UCH-L3
[41], and it contains the same 5-fold knot. UCH-L1 is highly abundant
in the brain, comprising up to 2% of the total brain protein [42]. The
structure of UCH-L1 was not yet part of the January Protein Data
Bank edition on which the rest of this study is based. We also noticed
several new structures of knotted transcarbamylase-like proteins.
Supporting Information
Table S1. List of Knotted Protein Data Bank Entries
Found at DOI: 10.1371/journal.pcbi.0020122.st001 (79 KB DOC).
Table S2. List of Knotted Entries from Table S1 That Become
UnknottedWhenEnds AreConnected by theRandomClosureMethod
Found at DOI: 10.1371/journal.pcbi.0020122.st002 (28 KB DOC).
Table S3. List of Structures That Become Knotted When Missing
Sections Are Joined by Straight Lines
Found at DOI: 10.1371/journal.pcbi.0020122.st003 (35 KB DOC).
Accession Numbers
The Protein Data Bank (http://www.pdb.org) accession numbers for
the structures discussed in this paper are human UCH-L3 (1xd3),
UCH-L3 yeast homologue (1cmx), human UCH-L1 (2etl), photo-
receptor phytochrome A in D. radiodurans (1ztu), class II ketol-acid
reductoisomerase in E. coli (1yrl), class II ketol-acid reductoisomerase
in spinach (1yve), S-adenosylmethione synthetase in E. coli (1fug), S-
adenosylmethione synthetase in rat (1qm4), AOTCase from X.
campestris (1yh1), SOTCase from B. fragilis (1js1), OTCase from P.
furiosus (1a1s), OTCase from human (1c9y), bluetongue virus core
protein (2btv), and baculovirus P35 protein in Autographa California
nuclear polyhedrosis virus (1p35).
Acknowledgments
Upon completion of this work we became aware of a related study [8],
which independently identified the knots in UCH-L3 and SOTCase in a
re-examination of protein knots. PV would like to acknowledge
discussions with François Nédélec and with Olav Zimmermann, in
which they proposed the potential link between protein knots and
degradation. LM and PV would also like to thank Rachel Gaudet for a
discussion about the function of ubiquitin hydrolase.
Author contributions. MK conceived the study. PV designed and
wrote the analysis code. PV and LM analyzed the data. PV, LM, and
MK wrote the paper.
Funding. This work was supported by National Science Foundation
grant DMR-04–26677 and by Deutsche Forschungsgemeinschaft grant
VI 237/1. LM is an Alfred P. Sloan Research Fellow.
Competing interests. The authors have declared that no competing
interests exist.
References
1. Virnau P, Kantor Y, Kardar M (2005) Knots in globule and coil phases of a
model polyethylene. J Am Chem Soc 127: 15102–15106.
2. Mansfield ML (1994) Knots in Hamilton cycles. Macromolecules 27: 5924–
5926.
3. Lua RC, Borovinskiy AL, Grosberg AY (2004) Fractal and statistical
properties of large compact polymers: A computational study, Polymer
45: 717–731.
4. Mansfield ML (1994) Are there knots in proteins? Nat Struct Mol Bio 1:
213–214.
5. Mansfield ML (1997) Fit to be tied. Nat Struct Mol Bio 4: 166–167.
6. Taylor WR (2000) A deeply knotted protein structure and how it might
fold. Nature 406: 916–919.
PLoS Computational Biology | www.ploscompbiol.org September 2006 | Volume 2 | Issue 9 | e1221078
Intricate Knots in Proteins
7. Taylor WR, Lin K (2003) Protein knots—A tangled problem. Nature 421: 25.
8. Lua RC, Grosberg AY (2006) Statistics of knots, geometry of conformations,
and evolution of proteins. PLOS Comp Biol 2: e45.
9. Murzin AG, Brenner SE, Hubbard T, Chothia C (1995) SCOP: A structural
classification of proteins database for the investigation of sequences and
structures. J Mol Biol 247: 536–540 http://scop.mrc-lmb.cam.ac.uk/scop.
10. Tyagi R, Duquerroy S, Navaza J, Guddat LW, Duggleby RG (2005) The
crystal structure of a bacterial Class II ketol-acid reductolsomerase:
Domain conservation and evolution. Protein Sci 14: 3089–3100.
11. Wagner JR, Brunzelle JS, Forest KT, Vierstra RD (2005) A light-sensing knot
revealed by the structure of the chromophore-binding domain of
phytochrome. Nature 438: 325–331.
12. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, et al. (2000) The
Protein Data Bank. Nucleic Acids Res 28: 235–242. (The Protein Data Bank
is athttp://www.pdb.org. Accessed 22 August 2006.)
13. Adams CC (1994) The knot book: An elementary introduction to the
mathematical theory of knots. New York: W. H. Freeman. 306 p.
14. Liang C, Mislow K (1995) Topological features of protein structures: Knots
and links. J Am Chem Soc 177: 4201–4213.
15. Takusagawa F, Kamitori S (1996) A real knot in protein. J Am Chem Soc
118: 8945–8946.
16. Shi D, Gallegos R, DePonte J III, Morizono H, Yu X, et al. (2002) Crystal
structure of a transcarbamylase-like protein from the anaerobic bacterium
Bacteroides fragilis at 2.0 A resolution. J Mol Biol 320: 899–908.
17. Shi D, Morizono H, Yu X, Roth L, Caldovic L, et al. (2005) Crystal structure
of N-acetylornithine transcarbamylase from Xanthomonas campestris: A novel
enzyme in a new arginine biosynthetic pathway found in several eubacteria.
J Biol Chem 280: 14366–14369.
18. Misaghi S, Galardy PJ, Meester WJN, Ovaa H, Ploegh HL et al. (2005)
Structure of the ubiquitin hydrolase Uch-L3 complexed with a suicide
substrate. J Biol Chem 280: 1512–1520.
19. Johnston SC, Riddle SM, Cohen RE, Hill CP (1999) Structural basis for the
specificity of ubiquitin C-terminal hydrolases. EMBO J 18: 3877–3887.
20. Holm L, Sander C (1996) Mapping the protein universe. Science 273: 595–
602. (The Dali Database is located at http://ekhidna.biocenter.helsinki.fi/
dali/start. Accessed 22 August 2006.)
21. Navon A, Goldberg AL (2001) Proteins are unfolded on the surface of the
ATPase ring before transport into the proteasome. Mol Cell 8: 1339–1349.
22. Pickart CM, VanDemark AP (2000) Opening doors into the proteasome.
Nat Struct Mol Bio 7: 999–1001.
23. Kenniston JA, Baker TA, Sauer RT (2005) Partitioning between unfolding
and release of native domains during ClpXP degradation determines
substrate selectivity and partial processing. Proc Natl Acad Sci U S A 102:
1390–1395.
24. Neira JL, Fersht AR (1999) Exploring the folding funnel of a polypeptide
chain by biophysical studies on protein fragments. JMol Biol 285: 1309–1333.
25. Clough RC, Vierstra RD (1997) Phytochrome degradation. Plant Cell
Environ 20: 713–721.
26. Biou V, Dumas R, Cohen-Addad C, Douce R, Job D, et al. (1997) The crystal
structure of plant acetohydroxy acid isomeroreductase complexed with
NADPH, two magnesium ions and a herbicidal transition state analog
determined at 1.65 A resolution. EMBO J 16: 3405–3415.
27. Fu Z, Hu Y, Markham GD, Takusagawa F (1996) Flexible loop in the
structure of S-adenosylmethionine synthetase crystallized in the tetragonal
modification. J Biomol Struct Dyn 13: 727–739.
28. Gonzalez B, Pajares MA, Hermoso JA, Alvarez L, Garrido F, et al. (2000) The
crystal structure of tetrameric methionine adenosyltransferase from rat
liver reveals the methionine-binding site. J Mol Biol 300: 363–375.
29. Sander C, Schneider R (1991) Database of homology-derived protein
structures. Proteins: Struct Funct Genet 9: 56–68.
30. Morizono H, Cabrera-Luque J, Shi D, Gallegos R, Yamaguchi S, et al. (2006)
Acetylornithine transcarbamylase: A novel enzyme in arginine biosynthesis.
J Bacteriol 188: 2974–2982.
31. Shi D, Morizono H, Aoyagi M, Tuchman M, Allewell NM (2000) Crystal
structure of human ornithine transcarbamylase complexed with carbamyl
phosphate and L-Norvaline at 1.9 A resolution. Proteins: Struct Funct
Genet 39: 271–277.
32. Villeret V, Clantin B, Tricot C, Legrain C, Roovers M, et al. (1998) The
crystal structure of Pyrococcus furiosus ornithine carbamoyltransferase
reveals a key role for oligomerization in enzyme stability at extremely
high temperatures. Proc Natl Acad Sci U S A 95: 2801–2806.
33. Arsuaga J, Vasquez M, Trigueros S, Sumners DW, Roca J (2002) Knotting
probability of DNA molecules confined in restricted volumes: DNA
knotting in phage capsids. Proc Natl Acad Sci U S A 99: 5373–5377.
34. Janse van Rensburg EJ, Sumners DW, Wassermann E, Whittington SG
(1992) Math Gen 25: 6557–6566.
35. Deguchi T, Tsurusaki K (1997) Universality in random knotting. Phys Rev E
55: 6245–6248.
36. Koniaris K, Muthukumar M (1991) Self-entanglement in ring polymers. J
Chem Phys 95: 2873–2881.
37. Jackson SE, Mallam AL (2005) Folding studies on a knotted protein. J Mol
Biol 346: 1409–1421.
38. Mallam AL, Jackson SE (2006) Probing nature’s knots: The folding pathway
of a knotted homodimeric protein. J Mol Biol 359: 1420–1436.
39. Millett K, Dobay A, Stasiak A (2005) Linear random knots and their scaling
behavior. Macromolecules 38: 601–606.
40. Das C, Hoang QQ, Kreinbring CA, Luchansky SJ, Meray RK, et al. (2006)
Structural basis for conformational plasticity of the Parkinson’s disease-
associated ubiquitin hydrolase UCH-L1. Proc Natl Acad Sci U S A 103:
4675–4680.
41. Krissinel E, Henrick K (2004) Secondary-structure matching (SSM), a new
tool for fast protein structure alignment in three dimensions. Acta Cryst
D60: 2256–2268.
42. Wilkinson KD, Lee KM, Deshpande S, Duerksen-Hughes P, Boss JM, et al.
(1989) The neuron-specific protein PGP 9.5 is a ubiquitin carboxyl-
terminal hydrolase. Science 246: 670–673.
43. Humphrey W, Dalke A, Schulten K (1996) VMD—Visual molecular
dynamics. J Molec Graphics 14: 33–38.
PLoS Computational Biology | www.ploscompbiol.org September 2006 | Volume 2 | Issue 9 | e1221079
Intricate Knots in Proteins
ABSTRACT
  A number of recently discovered protein structures incorporate a rather
unexpected structural feature: a knot in the polypeptide backbone. These knots
are extremely rare, but their occurrence is likely connected to protein
function in as yet unexplored fashion. Our analysis of the complete Protein
Data Bank reveals several new knots which, along with previously discovered
ones, can shed light on such connections. In particular, we identify the most
complex knot discovered to date in human ubiquitin hydrolase, and suggest that
its entangled topology protects it against unfolding and degradation by the
proteasome. Knots in proteins are typically preserved across species and
sometimes even across kingdoms. However, we also identify a knot which only
appears in some transcarbamylases while being absent in homologous proteins of
similar structure. The emergence of the knot is accompanied by a shift in the
enzymatic function of the protein. We suggest that the simple insertion of a
short DNA fragment into the gene may suffice to turn an unknotted into a
knotted structure in this protein.

<|endoftext|><|startoftext|>
Introduction
Large low surface brightness galaxies are galaxies with disk central surface brightnesses
statistically far from the Freeman (1970) value of µB(0) = 21.65 ± 0.3 mag arcsec
−2, and
whose properties are significantly removed from the dwarf galaxy category (e.g. MB < −18,
MHI > 10
9M⊙). Studies of large LSB galaxies have discovered a number of intriguing facts:
large LSB galaxies, in contrast to dwarf LSB galaxies, can exhibit molecular gas (Das,
et al. 2006; O’Neil & Schinnerer 2004; O’Neil, Schinnerer, & Hofner 2003; O’Neil, Hofner,
& Schinnerer 2000); the gas mass-to-luminosity ratios of large LSB galaxies are typically
higher than for similar high surface brightness counterparts by a factor of 2 or more (O’Neil,
et al. 2004); and, like dwarf LSB galaxies, large LSB systems are typically dark-matter
dominated (Pickering, et al. 1997; McGaugh, Rubin, & de Blok 2001). These properties,
added to their typically low metallicities (de Naray, McGaugh, & de Blok 2004; Gerritsen
& de Blok 1999), lead to the inference that even large LSB galaxies are under-evolved
compared to their high surface brightness (HSB) counterparts. Once their typically low
gas surface densities (MHI ≤ 10
21 cm−2) (Pickering, et al. 1997) and low baryonic-to-dark
matter ratios (Gurovich, et al. 2004; McGaugh, et al. 2000) are taken into account, the
question becomes less why LSB galaxies are under-evolved than how they can form stars at
all (O’Neil, Bothun, & Schombert 2000, and references therein). Yet large LSB galaxies have
the same total luminosity within them as ordinary Hubble sequence spirals (O’Neil, et al.
2004; Impey & Bothun 1997; Pickering, et al. 1997; Sprayberry, et al. 1995). On average
then, star formation cannot be too inefficient in these large LSB galaxies in spite of their
unevolved characteristics, else their integrated light would be significantly less then in their
HSB counterparts.
In an effort to better understand this enigmatic group of galaxies and their evolutionary
status, we recently conducted a 21-cm survey to discover a larger nearby sample of such
objects (O’Neil, van Driel, & Schneider 2006; O’Neil, et al. 2004). We succeeded in identi-
fying about 25 candidates within the redshift range 0.04 < z < 0.08, whose combined HI
and optical properties suggest them to be large LSB galaxies. We obtained B, R, and Hα
imaging of 19 of these galaxies at Lowell Observatory to confirm whether these candidates
are indeed LSB galaxies, and to obtain a dataset of their fundamental parameters. These
observations are presented here, and interestingly, none of the galaxies ultimately turned out
to be LSB galaxies by the strict conventional definition; we discuss this result below in § 4.
However, these galaxies still represent a sample whose surface brightnesses are below average,
and whose properties are intermediate between those of the bona fide massive LSB galaxies,
and ordinary HSB galaxies. In this work, we quantify and parameterize the fundamental
properties of this sample of large, “lower surface brightness” galaxies.
– 3 –
2. Galaxy Sample
There are three ways that disk galaxy surface brightness can be measured or quantified –
using a surface brightness profile and fitting an exponential disk to derive the central surface
brightness; measuring an average surface brightness within a given isophotal diameter; and
measuring the surface brightness of the isophote at the 1/2 light radius point (the effective
surface brightness). The latter two definitions suffer from the fact that the bulge light is
included in the surface brightness estimates, resulting in their prediction of the disk surface
brightness to be less accurate. As a result, the typical operational definition of an LSB
galaxy uses the first definition, and defines an LSB galaxy as one whose whose observed disk
central surface brightness is µB(0) ≥23.0 mag arcsec
−2. For reference, the Freeman value of
µB(0)=21.65 +/- 0.30 mag arcsec
−2 defines the distribution of central surface brightness, in
the blue band, for Hubble sequence spirals.
Regardless of the definition, without pre-existing high quality optical imaging of galaxies,
it is difficult to unambiguously identify a sample of disk galaxies that will turn out to be
LSB. With only catalog data available, one is driven to use the average surface brightness
and identify potential LSB galaxies as those whose average surface brightness is below some
threshold level.
All of the galaxies in this sample were identified as LSB by Bothun, et al. (1985) using
the magnitude and diameter values found in the Uppsala General Catalog (Nilson 1973), and
employing the general equation 〈µB〉 = mpg +5log(D)+8.63. Here, mpg is the photographic
magnitude of the galaxies, D is the diameter in arcminutes, and the constant, 8.63, is derived
from the conversion from arcminutes to arcseconds (8.89) and the conversion from mpg to mB
(-0.26, as used by Bothun, et al.) Bothun, et al. (1985) then made a cut-off to the galaxies
in their sample, requiring 〈µB〉 >24.0 mag arcsec
−2 to look for galaxies with lower surface
brightness disks, with the majority of the galaxies chosen having 〈µB〉 >25.0 mag arcsec
(The inclusion of a number of galaxies with 〈µB〉=24-25 mag arcsec
−2 was due to the 0.5mag
errors given in the UGC.)
The Bothun, et al. (1985) sample was further pared down by our desire to image large
LSB galaxies. That is, we wished to avoid the dwarf galaxy category entirely. To do this, we
required the galaxies to have MHI > 10
9 M⊙, W20 > 200 km s
−1, and/or MB < −19. These
criteria are sufficiently removed from the dwarf galaxy category to guarantee no overlap
between our sample and that category exists.
– 4 –
3. Observations & Data Reduction
Galaxies’ integrated broad-band colors represent a convolution of the mean age of the
stellar population, metallicity, and recent star formation rate; while measurements of Hα
luminosity provide a direct measure of the current star formation rate (SFR). With these
combined observations, is is possible to parameterize the current SFR relative to the overall
star formation history. As a result, these observations are widely used in many surveys
that target fundamental galaxy parameters, for example, SINGG (Meurer, et al. 2006) and
11HUGS (Kennicutt, et al. 2004), and others (e.g. Gavazzi, et al. 2006; Koopman & Kenney
2006; Helmboldt, et al. 2005).
19 galaxies were observed on 7-10 June, 2002 and 5-8 October, 2003 using the Lowell
1.8m Perkins telescope. The filter set used included Johnson B and R as well as three
Hα filters from a private set (R. Walterbos) with center frequency/bandwidths of 6650/75,
6720/35, 6760/75 Å. A 1065x1024 pixel Loral SN1259 CCD camera was used, giving a 3.3′
field of view and resolution of 0.196′′/pixel. Seeing in June, 2002, ranged from 1.8′′ - 2.4′′ and
from 1.4′′ - 2.2′′ for the October, 2003, observations. At least 3 frames, each shifted slightly
in position, were obtained for each object through each filter and were median filtered to
reduce the effect from cosmic rays, bad pixels, etc. All initial data reduction (bias and flat
field removal, image alignment, etc) was done within IRAF. The R band images were scaled
and used as the continuum images for data reduction purposes.
Corrections to the measured fluxes were made in the following way. Atmospheric extinc-
tion was obtained using the observational airmass and the atmospheric extinction coefficients
for Kitt Peak which are distributed with IRAF. Galactic extinction was corrected using the
values for E(B−V) obtained from NED, the reddening law of Seaton (1979) as parameter-
ized by Howarth (1983) (A(λ) = X(λ)E(B−V )) and assuming the case B recombination of
Osterbrock (1989) with RV=3.1 (O’Donnell 1994) (X(6563Å)=2.468). Contamination from
[NII] emission in the Hα images was corrected using the relationship derived by Jansen,
et al. (2000) and re-confirmed by Helmboldt, et al. (2004):
[NII]
= [−0.13± 0.035]MR + [−3.2 ± 0.90] ,
where MR is the absolute magnitude in the R band. Hα extinction was determined using
the equation found in Helmboldt, et al. (2004):
log (Hα)int = [−0.12± 0.048]MR + [−2.5± 0.96]
which was found through a linear least squares fitting to the A(Hα)int determined using
all galaxies in his sample with a measured Hβ flux. For this calculation, Helmboldt, et al.
– 5 –
(2004) used the Hα to Hβ ratio measured by Jansen, et al. (2000), an assumed intrinsic ratio
of Hα
=2.85 (Case B recombination and T=104 K (Osterbrock 1989)), the extinction curve
of O’Donnell (1994), and RV=3.1 No correction for internal extinction due to inclination
was made for the B and R bands. It should be noted, though, that in a number of plots
inclination corrections were made to the B and R colors and central surface brightnesses, as
noted in the Figure captions. The corrections used in these cases are:
µ(0)λcorr = µ(0)
λ − 2.5Cλlog(b/a) (1)
mλcorr = m
λ −Aλ (2)
Aλ = −2.5log
1 + e−τ
λsec(i)
+ (1− 2f)
1− e−τ
λsec(i)
τλsec(i)
Here, CR,B=1 (Verheijen 1997); (b/a) is the ratio of the minor to major axis; f = 0.1 and
τR,B=0.40, 0.81 (Tully, et al. 1998; Verheijen 1997). Finally, a correction was applied to
account for the effect of stellar absorption in the Balmer line of
Fcor = Fobs
, (4)
where Fcor is the corrected and Fobs is the observed Hα flux, We is the measured equivalent
width and Wa is the equivalent width of the Balmer absorption lines. As we do not have
measurements for Wa, we estimated Wa to be 3±1 Å, based off the values found in Oey &
Kennicutt (1993); Roennback & Bergvall (1995); McCall, Rybski, & Shields (1985). Note
that this effect is potentially stronger in the diffuse gas than in the H II regions due to
the older stellar population likely lying in the diffuse gas. As a result we may still be
underestimating the total Hα flux in the diffuse gas within the galaxies. However, as the
diffuse gas fractions for these galaxies are extremely high (see Section 6, below), it is unlikely
that this effect is high.
Global parameters and radial profiles for the galaxies were determined primarily using
the routines available in IRAF (notably ellipse) and the results are given in Tables 1 and 2.
Galaxy images, surface brightness profiles, and color profiles are given in Figures 1 – 3. In
all cases the inclination and position angle for the galaxies were determined from the best fit
values from the B & R frames. These best fit values were then used for the ellipse fitting in
all four images (B, R, Hα and continuum with Hα subtracted), a practice which insures the
color profiles are obtained accurately and are not affected by, e.g. misaligned ellipses. The
same apertures were also used for all four images, with the apertures found through allowing
ellipse to range from 1 pixel (0.196′′) until the mean value in the ellipse reaches the sky value,
– 6 –
increasing geometrically by a factor of 1.2. Sky values were found through determining the
mean value in more than 100 5×5 sq. pixel boxes in each frame. The error found for the sky
was incorporated into all magnitude and surface brightness errors, which also include errors
from the determination of the zeropoint and the errors from the N II contribution to the Hα
(in the case of the Hα magnitudes).
The B and R surface brightness profiles of all galaxies were fit using two methods. First,
the inner regions of the galaxies’ surface brightness profiles was fit using the de Vaucouleurs
r1/4 profile
Σ(r) = Σeffexp
−7.669[(r/reff )1/4−1] → µ(r) = µeff + 8.327
, (5)
and the outer regions were fit by the exponential disk profiles
Σ(r) = Σ0exp
( rα) → µ(r) = µ0 + 1.086
. (6)
Additionally, we attempted to fit a disk profile (6) to both the inner and outer regions of the
galaxies’, to determine if a two-disk fit would better match the data (Broeils & Courteau
1997; de Jong 1996). Roughly one-fourth of the galaxies (5/19) were best fit (in the χ2-sense)
by the standard bulge+disk model. Another 47% of the galaxies were best fit by the two-disk
model. Of the remaining galaxies, 21% (4 galaxies) were best fit by a single disk, and one
galaxy (UGC 11840) could not be fit by any profile. The results from the fits are shown in
Table 3 and Figure 2, and an asterisk (*) is placed next to the best fit model. Note that
in a few cases (e.g. UGC 00189) only one model is listed in the Table. This is due to the
fact that in these cases the fitting using the other model proved to be completely unrealistic.
Finally, it should be noted that in all cases the same best-fit model was used for both the B
and R data.
The color profiles were similarly fit (using an an inverse error weighting) with a line to
both the inner and outer galaxy regions (Figure 3). Here, though, the “boundary radius”
was simply taken from the surface brightness profile fits, with the “boundary radius” being
defined as the radius where the inner and outer surface brightness fits crossed. If only one
(or no) fit was made to the surface brightness profile, then only one color profile was fit. In
a number of cases the difference in slope between the inner and outer galaxy regions was
less than the least-squares error for the fit. In these cases again only one line was fit for the
color profiles.
The HIIphot program (Thilker, Braun, & Walterbos 2000) was used both to determine
the shape and number of H II regions for each galaxy and also to determine the Hα flux for
each of these regions. The fluxes from the Hα, Hα-subtracted continuum, B, and R images
– 7 –
were measured in identical corresponding apertures, which are the H II region boundaries
defined by HIIphot. While HIIphot applies an interpolation algorithm across these apertures
to estimate the diffuse background in the Hα frames, we determined the background in other
bands from the median flux in an annulus around each H II region aperture. Results from
the analysis of the H II regions are given in Table 4, and sample H II regions are shown in
Figure 4. Errors for the Hα, SFR, and EW measurements are derived from the error values
reported with HIIphot. Errors for the B and R magnitudes, and colors, are derived from the
total sky and zeropoint errors, as well as the error in positioning of the HII regions. The
diffuse fraction errors are derived both from the total Hα flux errors and also include errors
in determining the total flux within the HII regions and for the entire galaxy. Finally, it
should be mentioned that the equivalent width (EW) was calculated simply as the ratio of
the Hα flux to Hα-subtracted continuum flux for a given region (or the whole galaxy).
The large distances to the observed galaxies (40 - 100 Mpc) results in many of the H II
region being blended together. As a result, any luminosity function derived for these objects
would be necessarily skewed towards larger HII regions (see Oey, et al. 2006). This can be
seen in the analysis done by Thilker, Braun, & Walterbos (2000) wherein the dependence of
the luminosity function found for M51 was examined. There one can clearly see the increase
in the number of high luminosity regions and subsequent reduction in the number of low
luminosity regions as the galaxy is ’moved’ to increasing distances. Examining their results
also shows that while the distribution of H II region luminosities changes with distance, the
total luminosity of the H II regions, as found by HIIphot, does not change significantly as the
galaxy moves from 10 Mpc to 45 Mpc. As a result, while determining luminosity functions
for the galaxies in this paper is not feasible due to the distances involved, derivations such
as the diffuse fraction are unaffected by distance. This fact is also supported by the SINGG
survey results (Oey, et al. 2006).
4. Surface Brightness
The distribution of central surface brightnesses found for the galaxies observed is shown
in Figure 5. As is plain from that Figure, the mean measured central surface brightness for
this sample, falls short of the definitions discussed in Section 2. Indeed only 4 galaxies in our
sample meet the operational definition of LSB galaxies as having µB(0) ≥ 23 mag arcsec
If we return to the Freeman value, however, we see that the operational definition of LSB
galaxies is 4.5σ from the value for Hubble sequence spirals, making it statistically extreme.
For the sample defined here, half have central surface brightnesses at least two sigma above
the Freeman value, a definition only 2.5% of the Freeman sample meets. As a result, while
– 8 –
the sample does not meet the operation criteria for LSB galaxies, we clearly do have a sample
with lower central surface brightnesses that would be found in the average Hubble sequence
galaxies.
It should be pointed out here that the main scientific focus of Bothun, et al. (1985)
was not oriented toward producing a representative sample of LSB galaxies as detected on
photographic surveys (that focus did not occur until Schombert & Bothun 1988), but rather
toward identifying cataloged galaxies for 21-cm based redshift determinations. The galaxies
were chosen to have surface brightnesses that were too low for reliable optical spectroscopy
(assuming emission lines were not present). This was done as a test of the potentially large
problem of bias in on going optical redshift surveys in the time (see Bothun, et al. 1986).
In fact, the operational criteria for selecting the galaxies that were observed at Arecibo 20
years ago, lay in the knowledge that these cataloged galaxies were never going to be even
attempted in the optical redshift surveys of the time and this raised the very real possibility
of biased redshift distributions and an erroneous mapping of large scale structure.
In the original redshift measurements of Bothun, et al. (1985) a significant number of
candidate LSB galaxies were not detected at 21-cm within the observational redshift window
(approximately 0-12,000 km/s). Many of those non-detections would later turn out to be
intrinsically large galaxies located at redshifts beyond 12,000 km/s (see O’Neil, et al. 2004).
As we are interested here in the Hα properties of galaxies with large, relatively LSB disks,
these initial non-detections comprise the bulk of our sample.
Surface photometry of this sample not only provides detailed information regarding
the galaxies’ surface brightness and color distributions, but it also probes the efficacy of
the Bothun, et al. (1985) average surface brightness criteria for selecting LSB disks. Here,
we used the magnitudes and diameters obtained in this study (Table 1) with two different
equations for determining a galaxy’s average surface brightness within the D25 radius. The
first equation used is that of Bothun, et al. (1985)
〈µ25〉 = m25 + 5log(D25) + 3.63 (7)
and the second is a modified version of the above equation from Bottinelli, et al. (1995)
which takes the galaxies’ inclination into account:
〈µ25〉 = m25 + 5log(D25) + 3.63− 2.5log
kR−2C + (1− k)R(0.4C/K)−1
. (8)
In both equations, m25 and D25 are the magnitude and diameter (in units of 0.1
′) at the
µ=25.0 mag arcsec−2 isophote, R is the axis ratio (a/b), and C is defined as (logD/logR)
and is fixed at 0.04 (Bottinelli, et al. 1995). Finally, k (the ratio of the bulge-to-disk lumi-
nosity) and K (a measure of how the apparent diameter changes with surface brightness at
– 9 –
a given axis ratio) are dependent on the revised de Vaucouleurs morphological type (T) as
follows (Simien & de Vaucouleurs 1986; Fouqué & Paturel 1985):
T=1 → k=0.41; T=2 → k=0.32; T=3 → k=0.24; T=4 → k=0.16; T=5 → k=0.09; T=6 →
k=0.05; T=7 → k=0.02; T≥8 → k=0.0;
K = 0.12− 0.007T if T < 0; K = 0.094 if T ≥ 0.
The values for k at T≥8 are extrapolated from fitting the Simien & de Vaucouleurs (1986)
values.
The results of equations 7 and 8, plotted against the galaxies’ central surface brightness
both uncorrected and corrected for inclination, are shown in Figures 6 and 7, respectively.
The difference between the two plots is small, with neither equation doing an excellent job in
predicting when a disk’s central surface brightness will be low. The two equations (Bothun,
et al. (1985) and Bottinelli, et al. (1995)) have roughly the same fit (in the χ2 sense), which
at first appears surprising. It is likely that uncertainties in the inclination measurements and
morphological classification of the galaxies have increased the scatter in the Bottinelli, et al.
(1995) equation, increasing the scatter in an otherwise more accurate equation. As a result,
while the Bottinelli, et al. (1995) may indeed be the most accurate, the simpler equation is
equally as good to use in most circumstances as it involves fewer assumptions.
The second fact that is readily apparent in looking at Figures 6 and 7 is that with the
new measurements of magnitude and diameter, none of the galaxies in our sample meet
the criterion laid out by Bothun, et al. (1985) for an LSB galaxy. That is that none of
the galaxies in this sample have 〈µ25〉 >25 mag arcsec
−2. As Bothun, et al. (1985) listed
all of these objects as having 〈µ25〉 >25 mag arcsec
−2 using the magnitudes and diameters
provided by the original UGC measurements, this shows that the UGC measurements indeed
predicted fainter magnitudes/larger values for D25 than is found with more sophisticated
measurement techniques. Additionally, it is good to note that the trends shown in Figures 6
and 7 indicate that any galaxy which met the 〈µ25〉 >25 mag arcsec
−2 criteria would be
highly likely to also have µ(0) >23 mag arcsec−2.
In these days of digital sky surveys it is difficult to appreciate the immense undertaking
that defines the UGC catalog. Anyone who has looked at the Nilson selected galaxies on
the Palomar Observatory Sky Survey (POSS) plates with a magnifying eyepiece really has
to marvel that Nilson’s eye saw objects at least one arcminute in diameter. It is thus not
surprising that, at the ragged end of that catalog, many of the listed UGC diameters are
systematically high. Cornell, et al. (1987) made a detailed diameter comparison between
diameters as obtained from high quality CCD surface photometry and the estimates made
by Nilson (1973). They compared the diameter at the 25.0 mag arcsec−2 isophote in CCD
– 10 –
B images to the tabulated diameter in the UGC. The study, based on approximately 250
galaxies, identified two sources of systematic error (neither of which are surprising). First,
galaxies with reported diameters less than 2′ typically had D25,B as measured by the CCD
images that were 15-25% smaller. Second, Cornell, et al. (1987) found a systematic bias as a
function of surface brightness in the sense that lower surface brightness galaxies had a higher
number of overestimated diameters in the UGC than higher surface brightness galaxies. It
should also be noted that the majority of the galaxies in this study lie at low Galactic latitude.
This seems to be a perverse consequence that there is a large collection of galaxies between
7,000 – 10,000 km s−1 (where the diameter criterion in the UGC yields a relatively large
physical size) located at relatively low galactic latitude. Nominal corrections for galactic
extinction made by Bothun, et al. (1985) turned out to underestimate the extinction as
shown by later published extinction maps. In some cases, the differences were as large
as one magnitude. The combination of these facts with the very uncertain magnitudes of
many of these galaxies (see Bothun & Cornell 1990), it is not surprising that the measured
average surface brightness could easily be 1-1.5 magnitudes higher than the average surface
brightness that has been estimated from the UGC catalog parameters (roughly 40% of this
comes from systematic magnitude errors and 60% from the diameter errors discovered by
Cornell, et al. (1987)).
5. Morphology & Color
All of the galaxies observed have large sizes (3αB = 10 – 54 kpc), bright central bulges,
and well defined spiral structure (Figure 1). In most cases the galaxies can be described as
late-type systems (Sbc and later). There are, though, a number of exceptions to this rule.
Three of the galaxies, UGC 00023, UGC 07598, and UGC 11355 (Sb, Sc, and Sb galaxies,
respectively) have clear nuclear bars. UGC 08311, classified as an Sbc galaxy, is clearly in
the late stages of merging with another system. In this case the LSB classification of the
galaxy is likely bogus, as the apparently LSB disk is likely just the remnant the merging
process and will disappear as the galaxy compacts after the merging process. UGC 8904 is
given a morphological type of S? with both NED and HYPERLEDA, yet the faint spiral
arms surrounding it indicate its should be properly classified as an Sbc system. UGC 12021
is, like UGC 00023, listed as an Sb galaxy. Finally, UGC 11068 has a faint nuclear ring
which is most readily visible in the B image.
The differences between the galaxies becomes more apparent when the Hα images are
examined. Hodge & Kennicutt (1983) classify the radial distribution of H II regions in
spiral galaxies into three broad categories – galaxies with H II region surface densities which
– 11 –
decrease with increasing radius, galaxies with oscillating H II region surface densities, and
galaxies with ring-like H II density distributions. To these categories we would add a fourth,
to include those galaxies with no detectable H II regions.
The first category of Hodge & Kennicutt (1983) is also the most common, as it includes
all galaxies with generally decreasing radial densities of Hα. In the Hodge & Kennicutt (1983)
sample this category is dominated by Sc – Sm galaxies but contains all Hubble types. In our
sample, this category includes both galaxies with and without significant Hα emission in the
spiral arm regions. This group includes UGC 00023, UGC 00189, UGC 02588, UGC 02796,
UGC 03119, UGC 03308, UGC 07598, and UGC 12021. Interestingly, of the galaxies listed
above, 4/8 are Sb/Sbc galaxies and 3/8 are Sc-Sm galaxies. (The last galaxy, UGC 02588,
is an irregular galaxy.)
The second category of Hodge & Kennicutt (1983), galaxies with oscillating densities,
is dominated in their sample of Sb galaxies. Only a few of the galaxies in this sample fall
into this category, 80% of which are also Sb/Sc galaxies. These are UGC 02299, UGC 08311,
UGC 08904, UGC 11355, and UGC 11396. These galaxies all have a concentration of star
formation seen in the nuclear regions and then clumps of star formation spread through the
spiral arms, typically accompanied by diffuse Hα also spread throughout the arms.
The third category of Hodge & Kennicutt (1983) is dominated by early-type galaxies,
of which we have none in our sample. Nonetheless we have three galaxies which fall into this
category – UGC 08644, UGC 10894, and UGC 11617. All three have H II regions spread
throughout their disks, with no central concentration near the galaxies’ nuclei. In fact, the
three brightest star forming regions within UGC 08644 all lie with the spiral arms, and are
visible in all three filters. In contrast, both UGC 11617 and UGC 10894 have no bright H II
regions, but instead have a large number of diffuse H II regions, with the brightest (as listed
in Table 2) receiving that designation simply due to its size.
The fourth category of galaxies contains UGC 01362, UGC 11068, and UGC 11840,
none of which have detectable Hα. In the case of UGC 01362 and UGC 11840 this is not too
surprising as the galaxies are dominated by a bright nucleus, and their surrounding spiral
arms are extremely faint in both R and B. As a result, any Hα which may exist in the
galaxies’ disks is too diffuse to be detected. UGC 11068, though, has both a well defined
nucleus and a clear spiral structure extending out to a radius of ∼13 kpc (3α). Yet no Hα
can be detected in this galaxy. This may mean that UGC 11068 is in a transition state for
its star formation, with no ongoing star formation yet with enough recent activity that the
spiral arms remain well defined.
Perhaps the most intriguing galaxy of our sample is UGC 11355. This galaxy was
– 12 –
placed in Category 2, above, as it has a bright nucleus and clumpy disk in the Hα image.
The B and R band images of UGC 11355 show a galaxy with a simple Sbc morphology.
The Hα image, though, shows a distinct star forming ring. The ring is at a very different
inclination from the rest of the galaxy (i=49◦ for the ring and 73◦ for the galaxy as a whole),
and lies approximately 2.6 kpc in radius from the center of the galaxy, measured along the
major axis. As the B and R images show no indication of a ring morphology this indicates
unusually strong star formation in the ring. It is also useful to note the presence of a bar in
UGC 11355 – shown more clearly in Figure 8. The fact that the inclination of the ring is
significantly different from that of the rest of the galaxy suggests the ring a tidal effect due
to an interaction, such as a small satellite galaxy being cannibalized by UGC 11355, or the
influence of CGCG 143-026, 14.9′ and 68 km s−1 away.
It is interesting to note that the Hα morphology of the galaxies does not appear to
correlate with the galaxies’ color profile (Figure 3). The galaxy with the steepest slope in
the color profile is UGC 08644 which has only a few H II regions in its outer arms. The other
galaxies with steep color profiles are UGC 00023 and UGC 8904, which have a bright knot
of star formation in the nucleus and faint Hα spread throughout their arms, and UGC 11840
and UGC 11068 both of which have no detectable Hα. The galaxies with the shallowest slopes
similarly show no correlation between their color profiles and morphology. This suggests that
the current star formation in these galaxies is largely independent of the past star-formation
history, although this result should be confirmed with better, extinction-corrected, data.
6. Star Formation
Figures 10 – 17 compare the properties of the H II regions and emission of our galaxy
sample. Where possible, measurements from other samples of late-type galaxies are also
shown (Kennicutt & Kent 1983; Jansen, et al. 2000; Helmboldt, et al. 2005; Oey, et al.
2006). Examining the figures it is clear that the overall properties of our sample are similar
to those of other late-type (Sbc-Sc) galaxies. That is, the values for the individual H II region
luminosities are similar to those reported by Helmboldt, et al. (2005) and Kennicutt & Kent
(1983) (Figure 10) while the global Hα equivalent width (EW) and global star formation
rates match those seen by all three comparison samples (Figures 11, 12).
We should note that as discussed in § 3 our sample suffers from having many of the H II
regions blended together as a result of the distance to our galaxy samples. As a result, it
is highly likely that in the comparisons of the luminosities for the galaxies’ individual H II
regions the luminosities (Figure 10) from our sample are artificially higher then those in the
other sample, potentially by a factor of 3 or more. This fact does not alter the results of this
– 13 –
section, but it is the likely explanation for the slightly higher than average values found for
L3 in Figure 10.
To examine the total amount of gas found within the H II regions compared with that
found in the diffuse Hα gas, we need to determine the galaxies’ Hα diffuse fraction, defined
here as the ratio of Hα flux not found within the defined H II regions to the total Hα flux
found for the entire galaxy. Examining Figures 13 and 14, as well as Tables 2 and 4, reveals
an interesting fact – while the global SFR for these galaxies is fairly typical (0.3 – 5 M⊙/yr),
the combined SFR from the galaxies’ H II regions is a factor of 2 – 10 smaller. That is, on
average the majority of the Hα emission and thus the majority of the star formation in the
observed galaxies comes not from the bright knots of star formation but instead from the
galaxies’ diffuse Hα gas. This is in contrast to the behavior seen from typical HSB galaxies,
as evidenced by the data of Oey, et al. (2006) in Figure 13. We note that blending and
angular resolution effects appear to be relatively unimportant in estimating the fraction of
diffuse Hα emission. Oey, et al. (2006) demonstrate this by showing no systematic changes
in measured diffuse fractions as a function of distance up to almost 80 Mpc, and inclination
angle, for their sample of 100+ SINGG survey galaxies.
While at first glance the higher diffuse Hα fractions found for these galaxies seems
surprising, recent GALEX results of the outer edges of M83, a region whose environment
closely resembles that of the disks of massive LSB galaxies, also show considerable star
formation outside the H II regions in that part of the galaxy (Thilker, et al. 2005). Similarly,
Helmboldt, et al. (2005) found a slight trend with lower surface brightness galaxies having
higher diffuse fractions than their higher surface brightness counterparts.
The fact that these galaxies have higher Hα diffuse gas fractions raises an interesting
question. Typically diffuse gas is believed to be ionized by OB stars lying within density-
bounded H II regions. The problem of transporting the ionizing photons from these regions
to the diffuse gas is extreme in these cases, as there would need to be a very large number
of density-bound H II regions leaking ionizing photons to ionize the quantity of diffuse
gas seen here. (See the more detailed discussion in Hoopes, Walterbos, & Bothun 2001,
which also discusses shock heating from stellar winds and SNe as ionization sources.) An
alternative suggestion is that field OB stars are also ionizing the diffuse gas, as was suggested
by Hoopes, Walterbos, & Bothun (2001). This would imply a different stellar population
within and without the H II regions, as it would likely be the later OB types (B0–O9) which
either escape the H II regions or survive the regions’ destruction. We note Oey, King, &
Parker (2004) predict a modest increase in the fraction of field massive stars in galaxies with
the lowest absolute star-formation rates. Scheduled GALEX observations of a subset of our
observed galaxies may shed light on the underlying stellar population in the galaxies’ diffuse
– 14 –
stellar disks.
Finally, it is elucidating to look for any trends between the global and regional properties
of the galaxies and their SFR and Hα content. Figure 15 plots the galaxies’ central surface
brightness (in both B and R) against the galaxies’ total star formation rate. While the error
bars make defining any trend difficult, there certainly appears to be a decrease in the global
SFR with decreasing central surface brightness, similar to the trends seen in other studies
(e.g. van den Hock, et al. 2000; Gerritsen & de Blok 1999). Figure 16 shows the galaxies’
gas-to-luminosity ratios plotted against both their global equivalent width and diffuse Hα
fraction. In both cases, no trend can be seen with our data, although the small number of
points available make any diagnosis difficult. Combined with the other datasets, though,
we can see a general trend toward higher equivalent widths with increasing MHI/LB, but
surprisingly no trend between gas fraction and the galaxies’ diffuse Hα fraction is visible.
This lack of correlation is also seen by Oey, et al. (2006). The last trend which can be seen is a
rough correlation between the galaxies’ global color and star formation rate (Figure 11), with
redder galaxies having higher SFR, a fact which may be a reddening effect. The individual
Hα regions, however, show no such trend (Figure 17).
7. Conclusion
The sample of 19 galaxies observed for this project were chosen to be large galaxies
with low surface brightness disks. The surface brightness measurements for this sample
were obtained originally through the UGC measurements through determining the galaxies’
average surface brightness within the µ=25 mag arcsec−2 isophote. The relation employed
to determine the galaxies’ average surface brightness (Equation 7) has shown itself it be a
good predictor of a galaxy’s central surface brightness. But for a wide variety of reasons the
UGC measurements were not sufficient to insure the galaxies contained within this catalog
have true LSB disks, underscoring the difficulty in designing targeted searches for large LSB
galaxies.
Nonetheless, the sample of galaxies observed for this project have lower surface bright-
nesses than is found for a typical sample of large high surface brightness galaxies. In most
other aspects the galaxies appear fairly ‘normal’, with colors typically B−R=0.3−0.9, mor-
phological types ranging from Sb – Irr, and color gradients which typically grow bluer toward
the outer radius. However, the galaxies have both higher gas mass-to-luminosity fractions
and diffuse Hα fractions than is found in higher surface brightness samples. This raises
two questions. First, if the SFR for these galaxies has been similar to their higher sur-
face brightness counterparts through the galaxies’ life, why do the lower surface brightness
– 15 –
galaxies have higher gas mass-to-luminosity ratios? Second, why do these galaxies have a
higher fraction of ionizing photons outside the density-bounded H II regions then their higher
surface brightness counterparts?
The answer to the first question posed above likely comes from the difference between
the studied galaxies’ current and historical SFR. As these galaxies have on average and lower
metallicities (de Naray, McGaugh, & de Blok 2004; Gerritsen & de Blok 1999) than their
higher surface brightness counterparts, it is likely that the galaxies’ SFR has not remained
constant throughout the their lifetimes. Indeed the simplest explanation for the current
similar SFRs and higher gas mass-to-luminosity ratios for the studied galaxies than for
their higher surface brightness counterparts is that the galaxies’ past SFR was significantly
different than is currently seen. In fact, the measured properties would be expected if
the galaxies in this study have episodic star formation histories, with significant time (1-3
Gyr) lapsing between SF bursts, as has been conjectured for LSB galaxies in the past (e.g.
Gerritsen & de Blok 1999). Such a star formation history would help promote significant
changes in the galaxies’ mean surface brightness and allow an individual large disk galaxy
to appear as either (a) a relatively normal Hubble sequence spiral, (b) a large, lower surface
brightness disk, or (c) perhaps even a lower surface brightness disk if the time between
episodes is sufficiently large, depending on the elapsed time since the last SF burst. The
final answer to this may be found when an answer to the second question, determining why
the diffuse fractions for the studied galaxies is higher than for similar HSB galaxies, is also
found. Irregardless, what is clear is that the studied sample shows a clear bridge between
the known properties of high surface brightness galaxies and the more poorly understood
properties of their very low surface brightness counterparts, such as Malin 1.
Thanks to Joe Helmboldt for his help in getting the HIIphot program running with LSB
galaxies and to Rene Walterbos for his loan of the Hα filters. MSO acknowledges support
from the National Science Foundation, grant AST-0448893.
REFERENCES
Bothun, G. D. & Cornell, M. 1990 AJ 99, 1004
Bothun, G. D., Beers, T. C., Mould, J. R., & Huchra, J. P. 1986 ApJ 308, 510
Bothun, G. D., Beers, T. C., Mould, J. R., & Huchra, J. P. 1985 AJ 90 2487
Bottinelli, L., Gouguenheim, L., Paturel, G., Teerikorpi, P. 1995 A&A 296, 64
– 16 –
Broeils, A. H. & Courteau, S. 1997 ASPC 117, 74
Cornell, M., Aaronson, M., Bothun, G., & Mould, J. 1987 ApJS 64, 507
Das, M., O’Neil, K., Vogel, S., & McGaugh, S. 2006 ApJ preprint
de Jong, R. S. 1996 A&AS 118, 557
de Naray, Rachel Kuzio, McGaugh, Stacy S., & de Blok, W. J. G. 2004 MNRAS 355, 887
de Vaucouleurs, G, de Vaucouleurs, Antoinette, Corwin, Herold G., Jr., Buta, Ronald J.,
Paturel, Georges, & Fouque, Pascal Third Reference Catalogue of Bright Galaxies
1991 Springer-Verlag Berlin Heidelberg New York
Fouqué, P. & Paturel, G.(1985) A&A 150, 192
Freeman, K. 1970 ApJ 160, 811
Gavazzi, G., Boselli, A., Cortese, L., Arosio, I., Gallazzi, A., Pedotti, P., & Carrasco, L.
2006 A&A 446, 839
Gerritsen, Jeroen P. E. & de Blok, W. J. G. 1999 A&A 342, 655
Gurovich, Sebastin, McGaugh, Stacy S., Freeman, Ken C., Jerjen, Helmut, Staveley-Smith,
Lister, & de Blok, W. J. G. 2004 PASA 21, 412
Helmboldt, J.F., Walterbos, R.A.M., Bothun, G.D., O’Neil, K. 2005, ApJ, 630, 824
Helmboldt, J.F., Walterbos, R.A.M., Bothun, G.D., O’Neil, K., de Blok, W.J.G. 2004 ApJ,
613, 914
Hodge, P. W. & Kennicutt, R. C. Jr. 1983 ApJ 267, 563
Hoopes, Charles G., Walterbos, Ren A. M., & Bothun, Gregory D. 2001 ApJ 559, 878
Howarth, I 1983 MNRAS 203, 301
Impey, C. & Bothun, G.D. 1997 ARA&A 35, 267
Jansen, R., Fabricant, D., Franx, M., Caldwell, N. 2000 ApJS 126, 331
Kennicutt, W.C. Jr, Lee, J. C., Akiyama, S., Funes, J. G., & Sakai, S. 2004 AAS 205, 6005
Kennicutt, W.C. Jr 1998 ARA&A 36, 189
Kennicutt, W.C. Jr, Tamblyn, P., & Congdon, C. 1994 ApJ 435, 22
– 17 –
Kennicutt, R. C. Jr & Kent 1983 AJ 88 1094
Kennicutt, R. C. Jr 1983 ApJ 272, 54
Koopman, E. & Kenney, J. 2006 ApJS 162, 97
McCall, M. L., Rybski, P. M., & Shields, G. A. 1985 ApJS 57, 1
McGaugh, Stacy S., Rubin, Vera C., & de Blok, W. J. G 2001 AJ 122, 2381
McGaugh, S. S., Schombert, J. M., Bothun, G. D., & de Blok, W. J. G. 2000, ApJ 533, L99
Meurer, G., Hanish, D.J., Ferguson, H.C., Knezek, P., et al.2006 ApJS 165, 307
Nilson, P. Uppsala General Catalogue of Galaxies (UGC) Acta Universitatis Upsalienis, Nova
Regiae Societatis Upsaliensis, Series
O’Donnell, J 1994 ApJ 437, 262
Oey, et al.2006 - preprint
Oey, M. S., King, N. L., & Parker, J. W. 2004, AJ 127, 1632
Oey, S. & Kennicutt, R. 1993 ApJ 411, 137O
O’Neil, K. van Driel, W. & Schneider, S. 2006 in preparation
O’Neil, K., Bothun, G., van Driel, W., & Monnier-Ragaigne, D. 2004 A&A 428, 823
O’Neil, K. & Schinnerer, E. 2004 ApJ 615, L109
O’Neil, K., Schinnerer, E., & Hofner, P. 2003 ApJ 588, 230
O’Neil, K., Bothun, G., Schombert, J. 2000 AJ 119. 136
O’Neil, K., Hofner, P., & Schinnerer, E. 2000 ApJ 545, L99
Osterbrock, D. 1989 Astrophysics of Gaseous Nebulae and Active Galactic Nuclei University
Science Books
Pickering, T. E., Impey, C. D., van Gorkom, J. H., & Bothun, G. D. 1997 AJ 114, 1858
Roennback, J., & Bergvall, N. 1995 A&A 302, 353
Schombert, J. & Bothun, G. 1988 AJ 91, 1389
Seaton, M. 1979 MNRAS 187, 73
– 18 –
Simien, F. & de Vaucouleurs, G. 1986 ApJ 302, 564
Sprayberry, D., Impey, C. D., Bothun, G. D. & Irwin, M. J. 1995 AJ 109, 558
Thilker, D., et al.2005 ApJ 619L, 79
Thilker, David A., Braun, Robert, & Walterbos, Ren A. M. 2000 AJ 120 3070
Tully, R. Brent, Pierce, Michael J., Huang, Jia-Sheng, Saunders, Will, Verheijen, Marc A.
W., & Witchalls, Peter L. 1998 AJ 115, 2264
Tully, B. & Fouqué 1985 ApJS 58, 67
Tully, B. & Fisher, R. 1977 A&A 54, 661
van den Hoek, L. B., de Blok, W. J. G., van der Hulst, J. M., & de Jong, T. 2000 A&A 357,
Verheijen, M. 1997 Ph.D. Dissertation Kapteyn Institute, Groningen
Zwaan, M.A., van der Hulst, J.M., de Blok, W.J.G., & McGaugh, S.S. 1995 MNRAS 273,
This preprint was prepared with the AAS LATEX macros v5.2.
Table 1. Global Properties of Galaxies – B & R Measurements
Galaxy RAa Deca Vela Typea mb Mb D25
c 〈µ〉d mb Mb D25
c 〈µ〉d rb B−Rb ie
[J2000] [J2000]
km s−1
[mag] [Mag] [′′]
mag/′′2
[mag] [Mag] [′′]
mag/′′2
[′′] [◦]
UGC 00023 00 04 13.0 10 47 25 7787 3 14.4 (0.1) -20.7 (0.1) 71 23.4 13.0 (0.1) -22.1 (0.1) 86.7 22.4 40 1.4 (0.2) 52
UGC 00189 00 19 57.5 15 05 32 7649 7 15.0 (0.2) -20.1 (0.2) 84 24.4 13.8 (0.1) -21.3 (0.1) 116.1 23.8 57 1.2 (0.2) 67
UGC 01362 01 52 50.7 14 45 52 7918 8.8 16.9 (0.2) -18.2 (0.2) 30 24.3 15.6 (0.1) -19.5 (0.1) 42.1 23.5 23 1.3 (0.3) 0
UGC 02299 02 49 07.8 11 07 09 10253 8 15.4 (0.2) -20.3 (0.2) 59 24.0 14.5 (0.4) -21.2 (0.4) 65.1 23.3 33 0.9 (0.4) 32
UGC 02588 03 12 26.5 14 24 27 10093 9.9 15.8 (0.2) -19.9 (0.2) 39 23.6 14.7 (0.1) -20.9 (0.1) 50.1 23.0 28 1.1 (0.2) 28
UGC 02796 03 36 52.5 13 24 24 9076 4 14.8 (0.2) -20.6 (0.2) 66 23.6 13.3 (0.1) -22.1 (0.1) 94.1 22.7 28 1.5 (0.2) 57
UGC 03119 04 39 07.7 11 31 50 7851 4 14.3 (0.2) -20.8 (0.2) 71 23.7 12.4 (0.1) -22.7 (0.1) ‡ 24.1 40 1.9 (0.2) 72
UGC 03308 05 26 01.8 08 57 25 8517 6 14.3 (0.3) -21.0 (0.3) 89 23.8 14.0 (0.2) -21.3 (0.2) 88.1 23.5 48 0.3 (0.4) 28
UGC 07598 12 28 30.9 32 32 52 9041 5.9 15.3 (0.1) -20.1 (0.1) 46 23.5 14.8 (0.1) -20.6 (0.1) 66.8 22.8 33 0.5 (0.2) 22
UGC 08311 13 13 50.8 23 15 16 3451 4.1 15.5 (0.1) -17.9 (0.1) 50 23.8 14.8 (0.1) -18.5 (0.1) 62.0 23.5 33 0.7 (0.2) 26
UGC 08644 13 40 01.4 07 22 00 6983 8 16.1 (0.2) -18.8 (0.2) 43 24.2 15.3 (0.2) -19.6 (0.2) 49.4 23.5 28 0.8 (0.3) 30
UGC 08904 13 58 51.1 26 06 24 9773 3.6 15.9 (0.1) -19.7 (0.1) 43 23.8 14.9 (0.1) -20.7 (0.1) 55.5 23.4 33 1.0 (0.2) 48
UGC 10894 17 33 03.8 27 34 29 6890 4 16.0 (0.3) -18.8 (0.3) 48 24.1 14.9 (0.3) -19.9 (0.3) ‡ 23.0 28 1.1 (0.4) 57
UGC 11068 17 58 05.0 28 14 38 4127 3.2 15.0 (0.2) -18.7 (0.2) 64 24.0 13.8 (0.1) -19.9 (0.1) 89.4 23.4 57 1.2 (0.2) 0
UGC 11355 18 47 57.0 22 56 33 4360 3.5 13.9 (0.3) -19.9 (0.3) 173 24.8 12.5 (0.1) -21.3 (0.1) ‡ 23.7 82 1.4 (0.3) 73
UGC 11396 19 03 49.5 24 21 28 4441 3.5 14.8 (0.3) -19.1 (0.3) 114 24.4 13.8 (0.2) -20.1 (0.2) 78.7 23.0 33 1.0 (0.4) 59
UGC 11617 20 43 39.3 14 17 52 5119 6.1 14.9 (0.2) -19.2 (0.2) 75 24.1 13.9 (0.2) -20.3 (0.2) 86.1 23.3 40 1.1 (0.3) 58
UGC 11840 21 53 18.0 04 14 50 7986 4 16.3 (0.1) -18.9 (0.1) 24 22.9 15.3 (0.1) -19.9 (0.1) 23.3 21.9 11 1.0 (0.2) 40
UGC 12021 22 24 11.6 06 00 12 4472 3 14.7 (0.2) -19.2 (0.2) 113 24.7 13.6 (0.1) -20.2 (0.1) 130.4 23.9 57 1.1 (0.2) 63
Note. — Errors are given in parenthesis.
aRA, Dec, velocity and type information obtained from NED, the NASA Extragalacitc Database. Galaxy types are defined in de Vaucouleurs, et al. (1991).
bMagnitudes and colors were obtained at the maximum usable radius, r. Corrections applied and error estimates are described in Section 3.
cD25 are the diameters for the 25 mag arcsec
−2 isophotes.
dAverage surface brightness, as defined by Equation 7 in Section 4.
eInclinations are simply the major to minor axis ratio of the galaxies, found through isophote fitting.
‡Isophotes did not reach 25 mag arcsec−2.
– 20 –
Table 2. Global Properties of Galaxies – Hα
Galaxy Hα flux ×10−13a EWb SFRc rd
erg cm−2 s−1
UGC 00023 5 (1) 22 (8) 4 (1) 28
UGC 00189 4 (1) 63 (33) 4 (1) 23
UGC 01362 · · · · · · · · · · · ·
UGC 02299 0.41 (0.09) 38 (13) 0.7 (0.2) 6
UGC 02588 0.7 (0.3) 32 (21) 1.1 (0.5) 13
UGC 02796 2.1 (0.7) 12 (4) 2.7 (0.9) 13
UGC 03119 7 (4) 20 (12) 6 (6) 28
UGC 03308 1.0 (0.3) 22 (8) 1.1 (0.3) 11
UGC 07598 1.3 (0.3) 60. (18) 1.8 (0.4) 19
UGC 08311 3.1 (0.6) 91 (37) 0.6 (0.1) 19
UGC 08644 · · · · · · · · · · · ·
UGC 08904 0.8 (0.2) 40 (13) 1.1 (0.3) 16
UGC 10894 0.4 (1) 34 (78) 0.4 (0.1) 9
UGC 11068 · · · · · · · · · · · ·
UGC 11355 8 (2) 30 (9) 2.7 (0.6) 28
UGC 11396 2.4 (0.8) 400 (2000) 0.8 (0.2) 16
UGC 11617 3.1 (0.7) 30 (2) 0.3 (0.2) 19
UGC 11840 · · · · · · · · · · · ·
UGC 12021 4 (1) 63 (32) 1.3 (0.4) 23
UGC 12289 · · · · · · · · · · · ·
Note. — Derivation of quantities are described in Section 3. Errors
are given in parenthesis.
aTotal Hα flux found within the radius centered on the (optical)
center of the galaxy and extending to the radius given in the last column.
Errors were determined in the same manners as for magnitudes, and are
given in Section 3.
bThe equivalent width was calculated simply as the ratio of the total
Hα flux to total Hα-subtracted continuum flux.
cSFR =
1.26×1041ergs−1
; from Kennicutt, Tamblyn, & Congdon
(1994).
dRadius at which the isophotal signal-to-noise went below 1σ.
– 21 –
Table 3. Fitted Galaxy Properties
inner outer
Galaxy Fit Filterb µeff/µ0
c Reff/α
e αf Boundaryg Fith
Type a
mag arcsec−2
mag arcsec−2
[′′] [′′] Error
∗ UGC 00023 Two Disk B 19.06 (0.33) 1.71 (0.02) 21.17 (0.21) 11.05 (0.06) 6.55 0.82
∗ UGC 00023 Two Disk R 17.30 (0.27) 1.72 (0.02) 19.58 (0.16) 9.58 (0.03) 6.93 0.53
UGC 00023 Bulge/Disk B 21.48 (1.98) 4.06 (0.23) 21.47 (0.51) 11.96 (0.11) 8.94 0.89
UGC 00023 Bulge/Disk R 19.20 (1.56) 3.07 (0.14) 19.82 (0.35) 9.96 (0.05) 8.79 0.58
∗ UGC 00189 Two Disk B 20.99 (0.08) 7.75 (0.06) 24.05 (1.85) 36.90 (2.11) 28.34 0.33
∗ UGC 00189 Two Disk R 19.53 (0.09) 6.13 (0.05) 21.66 (0.67) 18.84 (0.22) 24.92 0.97
∗ UGC 01362 One Disk B · · · · · · 23.06 (0.33) 7.77 (0.10) · · · 1.78
∗ UGC 01362 One Disk R · · · · · · 21.84 (0.15) 7.85 (0.04) · · · 0.46
∗ UGC 02299 Two Disk B 20.85 (0.10) 2.00 (0.16) 22.43 (1.76) 11.24 (0.04) 6.48 1.88
∗ UGC 02299 Two Disk R 19.84 (0.28) 2.21 (0.03) 21.46 (0.38) 10.50 (0.10) 7.23 0.99
UGC 02299 Bulge/Disk B 25.10 (2.85) 17.35 (1.95) 23.23 (2.18) 12.62 (0.27) 17.01 2.08
UGC 02299 Bulge/Disk R 24.00 (1.75) 17.76 (1.24) 22.30 (1.43) 10.37 (0.19) 0.00 1.18
UGC 02588 One Disk B · · · · · · 21.50 (0.08) 5.47 (0.02) · · · 1.40
UGC 02588 One Disk R · · · · · · 20.41 (0.07) 5.79 (0.02) · · · 1.12
∗ UGC 02588 Bulge/Disk B 25.75 (2.43) 16.16 (1.75) 22.61 (0.97) 8.78 (0.08) 6.94 1.18
∗ UGC 02588 Bulge/Disk R 23.77 (2.50) 7.86 (0.75) 21.09 (0.48) 7.41 (0.04) 4.57 0.82
∗ UGC 02796 Two Disk B 19.07 (0.34) 1.89 (0.03) 21.02 (0.60) 8.75 (0.14) 6.90 0.48
∗ UGC 02796 Two Disk R 17.52 (0.27) 2.19 (0.03) 19.66 (0.41) 8.69 (0.08) 8.59 0.68
UGC 02796 Bulge/Disk B 23.07 (2.01) 13.80 (1.10) 22.54 (4.11) 8.92 (0.45) † 0.71
UGC 02796 Bulge/Disk R 21.01 (0.90) 10.26 (0.31) 20.49 (1.06) 6.76 (0.22) † 0.21
UGC 03119 One Disk B · · · · · · 19.82 (0.07) 10.35 (0.03) · · · 0.75
UGC 03119 One Disk R · · · · · · 17.92 (0.05) 9.48 (0.01) · · · 0.49
∗ UGC 03119 Bulge/Disk B 22.05 (9.17) 3.54 (0.91) 20.03 (0.31) 11.51 (0.08) 3.50 0.61
∗ UGC 03119 Bulge/Disk R 21.07 (5.81) 6.53 (1.29) 18.16 (0.24) 10.10 (0.02) 3.71 0.28
∗ UGC 03308 Two Disk B 19.83 (2.25) 0.87 (0.04) 21.70 (0.18) 15.47 (0.16) 3.47 0.35
∗ UGC 03308 Two Disk R 19.06 (1.32) 0.92 (0.03) 20.99 (0.13) 11.60 (0.06) 3.58 0.25
UGC 03308 Bulge/Disk B 18.36 (10.1) 0.43 (0.08) 21.71 (0.23) 15.66 (0.18) 3.62 0.37
UGC 03308 Bulge/Disk R 18.70 (5.77) 0.67 (0.08) 21.03 (0.18) 11.90 (0.08) 3.85 0.20
UGC 07598 One Disk B · · · · · · 21.69 (0.07) 7.96 (0.02) · · · 0.75
UGC 07598 One Disk R · · · · · · 19.82 (0.08) 6.26 (0.02) · · · 1.12
∗ UGC 07598 Bulge/Disk B 16.02 (6.98) 0.21 (0.03) 21.84 (0.13) 8.60 (0.03) 3.25 0.52
∗ UGC 07598 Bulge/Disk R 15.78 (3.90) 0.40 (0.03) 20.18 (0.21) 7.34 (0.04) 4.06 0.49
∗ UGC 08311 Two Disk B 20.82 (0.09) 2.94 (0.02) 23.29 (0.66) 14.91 (0.26) 12.50 0.48
– 22 –
Table 3—Continued
inner outer
Galaxy Fit Filterb µeff /µ0
c Reff/α
e αf Boundaryg Fith
Type a
mag arcsec−2
mag arcsec−2
[′′] [′′] Error
∗ UGC 08311 Two Disk R 20.28 (0.08) 3.22 (0.03) 22.73 (0.71) 15.81 (0.32) 13.64 0.39
∗ UGC 08644 One Disk B · · · · · · 22.56 (0.14) 8.79 (0.06) · · · 0.55
∗ UGC 08644 One Disk R · · · · · · 21.19 (0.15) 6.40 (0.04) · · · 0.78
∗ UGC 08904 Two Disk B 20.43 (0.16) 2.56 (0.02) 22.94 (0.71) 10.07 (0.15) 11.22 0.69
∗ UGC 08904 Two Disk R 18.90 (0.16) 2.20 (0.01) 22.05 (0.49) 9.67 (0.10) 11.19 0.35
UGC 08904 Bulge Only B 23.99 (0.27) 12.55 (0.09) · · · · · · · · · 1.17
UGC 08904 Bulge Only R 21.83 (0.25) 6.95 (0.04) · · · · · · · · · 0.48
∗ UGC 10894 One Disk B · · · · · · 21.65 (0.10) 7.90 (0.04) · · · 0.44
∗ UGC 10894 One Disk R · · · · · · 20.26 (0.12) 6.69 (0.04) · · · 0.42
UGC 11068 Two Disk B 20.15 (0.65) 1.15 (0.02) 22.46 (0.13) 13.65 (0.07) 4.87 0.31
UGC 11068 Two Disk R 18.67 (0.54) 1.25 (0.02) 21.02 (0.14) 11.26 (0.05) 5.23 0.54
∗ UGC 11068 Bulge/Disk B 20.36 (0.18) 1.00 (0.00) 22.53 (0.11) 14.10 (0.06) 5.31 0.38
∗ UGC 11068 Bulge/Disk R 19.89 (2.52) 1.66 (0.10) 21.18 (0.25) 11.97 (0.07) 6.23 0.42
∗ UGC 11355 Two Disk B 19.20 (0.15) 3.34 (0.02) 21.61 (0.18) 32.30 (0.24) 14.23 0.86
∗ UGC 11355 Two Disk R 17.30 (0.16) 3.01 (0.02) 19.83 (0.13) 24.07 (0.10) 13.02 0.64
UGC 11355 Bulge/Disk B 23.29 (0.95) 19.99 (0.72) 22.13 (0.56) 38.81 (0.57) 26.27 1.13
UGC 11355 Bulge/Disk R 20.70 (0.94) 10.96 (0.37) 20.19 (0.35) 26.30 (0.18) 26.30 0.93
∗ UGC 11396 Two Disk B 20.34 (1.82) 1.14 (0.05) 22.08 (0.27) 22.66 (0.39) 4.46 0.95
∗ UGC 11396 Two Disk R 19.77 (2.17) 1.23 (0.08) 20.20 (0.17) 13.04 (0.10) 2.80 1.69
UGC 11396 Bulge/Disk B 20.52 (9.40) 0.97 (0.19) 22.10 (0.35) 23.03 (0.45) 4.75 0.93
UGC 11396 Bulge/Disk R 20.09 (3.23) 1.09 (0.32) 20.21 (0.23) 13.03 (0.11) 3.03 1.69
∗ UGC 11617 One Disk B · · · · · · 21.30 (0.07) 12.83 (0.05) · · · 0.69
∗ UGC 11617 One Disk R · · · · · · 20.08 (0.08) 10.99 (0.04) · · · 0.46
∗ UGC 11840 No Fit B · · · · · · · · · · · · · · · · · ·
∗ UGC 11840 No Fit R · · · · · · · · · · · · · · · · · ·
UGC 12021 One Disk B · · · · · · 20.73 (0.06) 11.05 (0.02) · · · 1.63
UGC 12021 One Disk R · · · · · · 19.41 (0.06) 10.26 (0.02) · · · 1.58
∗ UGC 12021 Bulge/Disk B 26.49 (1.28) 23.19 (0.00) 20.88 (0.16) 11.59 (0.04) 2.37 1.54
∗ UGC 12021 Bulge/Disk R 23.85 (0.32) 23.19 (0.00) 19.96 (0.18) 11.60 (0.03) 6.23 1.01
Note. — Derivation of quantities is described in Section 3. Errors are given in parenthesis.
aThe type of fit made to the surface brightenss profile – bulge+disk, two exponential disks, or one exponential disk.
– 23 –
bOptical filter for the data described within that row.
cEffective surface brightness (R1/4 bulge fit) or central surface brightness (exponential disk fit) for the inner disk fit. See Equations 5
and 6.
dEffective radius (R1/4 bulge fit) or scale length (exponential disk fit) for the inner disk fit. See Equations 5 and 6.
eCentral surface brightness for the outer exponential disk fit.
fScale length for the outer exponential disk fit.
gBoundary between the inner and outer fits, defined by where the fitted lines cross.
hχ2 error for the fits.
∗Best fit – used for all further analysis.
†Fitted lines for the bulge and disk components do not cross.
Table 4. Properties of Hα Regions
Galaxy Regiona Hα Flux ×10−15 Hα Luminosity×1038 SFRb EWc Bd Rd B−Rd Diffusee
erg cm−2 s−1
erg s−1
M⊙ yr
[mag] [mag] [mag] [%]
UGC 00023 1 70 (14) 760 (150) 0.6 (0.1) 23 (1) 16.9 (0.1) 15.2 (0.1) 1.7 (0.2) 0.82 (0.04)
UGC 00189 1 10 (2) 110 (20) 0.09 (0.02) 30 (2) 19.9 (0.6) 18.3 (0.5) 1.7 (0.7) 0.98 (0.01)
UGC 01362 0 · · · · · · · · · · · · · · · · · · · · · · · ·
UGC 02299 1 12 (2) 240 (50) 0.19 (0.04) 20 (1) 21 (2) 21 (2) 0. (3) · · ·
UGC 02299 2 15 (3) 260 (50) 0.21 (0.04) 14 (1) 18.2 (0.2) 17.1 (0.2) 1.1 (0.3) · · ·
UGC 02299 TOTAL 22 (3) 490 (70) 0.39 (0.06) · · · · · · · · · · · · 0.45 (0.06)
UGC 02588 0 · · · · · · · · · · · · · · · · · · · · · · · ·
UGC 02796 1 74.8 (15) 900 (200) 0.71 (0.1) 10 (1) 16.5 (0.2) 14.8 (0.4) 1.7 (0.4) 0.67 (0.01)
UGC 03119 1 151 (30) 1600 (300) 1.3 (0.3) 16 (1) 18.8 (0.1) 16.7 (0.1) 2 (0.1) 0.77 (0.02)
UGC 03308 0 · · · · · · · · · · · · · · · · · · · · · · · ·
UGC 07598 1 6 (1) 80 (17) 0.07 (0.01) 13 (1) 22.2 (0.7) 21.5 (0.7) 0.6 (1) · · ·
UGC 07598 2 4.7 (0.9) 60 (12) 0.05 (0.01) 12 (1) 22 (1) 20.2 (0.9) 2 (1) · · ·
UGC 07598 3 2.4 (0.5) 32 (6) 0.03 (0.01) 13 (1) 23 (1) 21 (1) 2 (2) · · ·
UGC 07598 4 6 (1) 80 (20) 0.06 (0.01) 12 (1) 21.1 (0.8) 19.6 (0.7) 2 (1) · · ·
UGC 07598 5 3.3 (0.7) 43 (9) 0.03 (0.01) 13 (1) 23 (2) 22 (2) 2 (3) · · ·
UGC 07598 6 6 (1) 80 (20) 0.06 (0.01) 11 (1) 21 (3) 22 (3) -1 (4) · · ·
UGC 07598 7 4.1 (0.8) 50 (10) 0.04 (0.01) 11 (1) 22 (1) 21 (2) 1 (4) · · ·
UGC 07598 8 1.3 (0.3) 15 (3) 0.012 (0.003) 9 (1) 25 (3) 24 (3) 1 (3) · · ·
UGC 07598 9 32 (6) 430 (90) 0.34 (0.07) 13 (1) 18 (0.1) 16 (0.1) 1.9 (3) · · ·
UGC 07598 10 1.4 (0.3) 20. (4) 0.015 (0.003) 16 (1) 21.8 (0.5) 20.4 (0.5) 1.3 (0.5) · · ·
UGC 07598 11 10 (2) 130 (30) 0.11 (0.02) 11 (1) 21.4 (0.5) 20 (0.6) 1.4 (0.8) · · ·
UGC 07598 12 4.8 (1) 60 (10) 0.05 (0.01) 11 (1) 22 (1) 20.2 (0.9) 2 (1) · · ·
UGC 07598 13 4.4 (0.9) 60 (10) 0.05 (0.01) 12 (1) 21.8 (0.8) 20.1 (0.7) 2 (1) · · ·
UGC 07598 14 5 (1) 70 (10) 0.05 (0.01) 11 (1) 22.7 (1) 21 (1) 1 (2) · · ·
UGC 07598 TOTAL 69 (6) 1200 (100) 0.96 (0.08) · · · · · · · · · · · · 0.5 (0.1)
UGC 08311 1 7 (1) 14 (3) 0.011 (0.002) 20. (1) 21.3 (0.7) 21 (0.9) 0. (1) · · ·
UGC 08311 2 100 (20) 240 (50) 0.19 (0.04) 53 (1) 18.8 (0.4) 18.4 (0.5) 0.3 (0.6) · · ·
UGC 08311 3 7 (1) 16 (3) 0.012 (0.003) 18 (1) 20.8 (0.1) 20. (1) 1 (1) · · ·
UGC 08311 4 17 (3) 38 (8) 0.03 (0.01) 29 (1) 0.9 (0.1) 20.5 (0.9) -19.6 (0.9) · · ·
UGC 08311 5 8 (2) 18 (4) 0.015 (0.003) 25 (1) 22 (2) 22 (2) 1 (3) · · ·
UGC 08311 6 140 (30) 320 (60) 0.26 (0.05) 38 (1) 16.9 (0.1) 16.3 (0.1) 0.6 (0.1) · · ·
UGC 08311 7 15 (3) 32 (6) 0.03 (0.01) 19 (1) 18.6 (0.1) 18.1 (0.2) 0.6 (0.2) · · ·
UGC 08311 TOTAL 270 (30) 680 (80) 0.54 (0.07) · · · · · · · · · · · · 0.2 (0.1)
UGC 08644 1 5.1 (1) 44 (9) 0.04 (0.01) 17 (1) 21 (0.1) 20. (1) 1 (1) · · ·
Table 4—Continued
Galaxy Regiona Hα Flux ×10−15 Hα Luminosity×1038 SFRb EWc Bd Rd B−Rd Diffusee
erg cm−2 s−1
erg s−1
M⊙ yr
[mag] [mag] [mag] [%]
UGC 08644 2 4.8 (1) 42 (8) 0.03 (0.01) 20. (1) 22 (0.5) 21.1 (0.6) 0.9 (0.8) · · ·
UGC 08644 TOTAL 8.3 (1) 90 (10) 0.07 (0.01) · · · · · · · · · · · · · · ·
UGC 08904 1 3.4 (0.7) 47 (9) 0.04 (0.01) 10. (1) 22.4 (0.8) 22 (1) 1 (1) · · ·
UGC 08904 2 0.7 (0.1) 12 (2) 0.01 (0.009) 23 (1) 23.6 (0.5) 23.0 (0.6) 0.7 (0.8) · · ·
UGC 08904 3 1.2 (0.3) 20. (4) 0.02 (0.02) 16 (1) 24.3 (0.7) 24 (1) 1 (1) · · ·
UGC 08904 4 0.8 (0.2) 12 (2) 0.01 (0.009) 10. (1) 23.8 (0.4) 24 (1) 0. (1) · · ·
UGC 08904 5 0.7 (0.1) 9 (2) 0.01 (0.007) 10. (1) 24.5 (0.9) 24 (1) 1 (2) · · ·
UGC 08904 6 32 (6) 600 (100) 0.45 (0.09) 25 (1) 17.7 (0.1) 16.3 (0.1) 1.4 (0.1) · · ·
UGC 08904 TOTAL 33 (6) 700 (100) 0.53 (0.09) · · · · · · · · · · · · 0.6 (0.1)
UGC 10894 1 1.2 (0.2) 11 (2) 0.009 (0.002) 39 (3) 24 (2) 23 (1) 1 (2) · · ·
UGC 10894 2 1.4 (0.3) 13 (3) 0.010 (0.002) 37 (2) 22.1 (0.5) 21.3 (0.6) 0.8 (0.8) · · ·
UGC 10894 3 13 (3) 110 (20) 0.09 (0.02) 27 (1) 18.9 (0.2) 17.2 (0.1) 1.8 (0.2) · · ·
UGC 10894 4 1.9 (0.4) 18 (4) 0.014 (0.003) 31 (2) 23 (1) 22 (1) 1 (2) · · ·
UGC 10894 5 1.8 (0.4) 16 (3) 0.013 (0.003) 35 (2) 22.8 (0.7) 21.9 (0.7) 1 (1) · · ·
UGC 10894 6 2.8 (0.6) 26 (5) 0.020 (0.004) 36 (2) 22.4 (0.7) 22 (1) 0. (1) · · ·
UGC 10894 7 3.9 (0.8) 35 (7) 0.03 (0.01) 31 (2) 22.6 (0.8) 21.6 (0.8) 1 (1) · · ·
UGC 10894 TOTAL 23 (2) 230 (25) 0.18 (0.02) · · · · · · · · · · · · 0.6 (0.3)
UGC 11068 0 · · · · · · · · · · · · · · · · · · · · · · · ·
UGC 11355 1 180 (40) 600 (100) 0.48 (0.1) 17 (1) 19.7 (0.4) 18.2 (0.4) 1.5 (0.6) 0.87 (0.08)
UGC 11396 1 25 (5) 90 (20) 0.07 (0.01) 21 (1) 22 (1) 18.1 (0.2) 4 (1) 0.91 (0.01)
UGC 11617 1 9 (2) 40 (8) 0.07 (0.01) 12 (1) 21 (1) 21 (0.6) 0. (1) · · ·
UGC 11617 2 21 (4) 90 (20) 0.04 (0.01) 13 (1) 21 (2) 20. (2) 1 (3) · · ·
UGC 11617 3 13 (3) 50 (10) 0.04 (0.01) 13 (1) 22 (2) 20. (3) 1 (3) · · ·
UGC 11617 4 12 (2) 47 (9) 0.09 (0.01) 11 (1) 21 (2) 21 (1) 1 (3) · · ·
UGC 11617 5 26 (5) 110 (20) 0.03 (0.02) 14 (1) 21.5 (0.6) 21 (4) 1 (4) · · ·
UGC 11617 6 10 (2) 42 (8) 0.03 (0.01) 12 (1) 21.5 (0.2) 20.8 (0.7) 0.7 (0.7) · · ·
UGC 11617 TOTAL 69 (6) 390 (30) 0.31 (0.03) · · · · · · · · · · · · 0.8 (0.1)
UGC 11840 0 · · · · · · · · · · · · · · · · · · · · · · · ·
UGC 12021 1 13 (3) 50 (10) 0.04 (0.01) 31 (2) 19.2 (0.4) 17.8 (0.3) 1.4 (0.5) 0.97 (0.01)
Note. — Derivation of quantities is described in Section 3. Errors are given in parenthesis.
aInternal numbering scheme for the HII regions found by HIIphot.
bSFR for the region, defined as SFR =
1.26×1041erg s−1
cEquivalent width was calculated simply as the ratio of the total Hα flux to total Hα-subtracted continuum flux.
dTotal B and R magnitudes and colors within the HII regions
eDiffuse fraction found for the galaxy, defined defined here as the ratio of the fraction of Hα flux not found within the defined Hα regions to the
total Hα flux found for the entire galaxy.
†Due to both a (masked) star near the center of this galaxy and a (masked) CCD flaw (bad column) which also runs through the center of the
galaxy, a number of H II regions which should have been identified by HIIphot were not, artificially rasing the diffuse fraction on this galaxy, possibly
by as much as 20-30%.
– 27 –
Fig. 1.— Grey scale images of the observed galaxies. Figure available through the published
AJ paper or online at http://www.gb.nrao.edu/∼koneil.
Fig. 2.— Surface brightness profiles for all galaxies observed. The dash-dotted lines show
the inner fit, the dashed lines show the outer fit, and the solid lines show the combined fits.
Both the B (blue - bottom) and R (red - top) profiles are shown. Figure available through
the published AJ paper or online at http://www.gb.nrao.edu/∼koneil.
Fig. 3.— Color profiles for all the galaxies observed. Here the inner fits (when made) are
shown by a dashed line and the outer fits are shown by a solid line. Figure available through
the published AJ paper or online at http://www.gb.nrao.edu/∼koneil.
– 28 –
UGC 00189 UGC 10894
UGC 07598 UGC 08904
Fig. 4.— Example images showing the H II regions found by HIIphot for the galaxies
UGC 00189, UGC 10894, UGC 07598, and UGC 08904. The H II regions are outlines in
white. In the case of UGC 10894 two regions which were masked due to the presence of stars
can also be seen, outlined by the square white boxes.
– 29 –
18 20 22 24
µB,R(0) [mag arcsec
19 20 21 22 23 24 25
µB,R(0) [mag arcsec
Fig. 5.— (Left) Histogram showing the distribution of central surface brightnesses for the
observed galaxies. The (red) dashed line shows the R-band data and the (blue) solid line
shows the B-band data. (Right) Plot of the observed central surface brightness against scale
length of the outer disk. Here, the R-band data is demarcated by (red) open circles while
the B-band data uses (blue) filled circles.
22.0 22.5 23.0 23.5 24.0 24.5 25.0 25.5
<µB,R> [mag arcsec
22.0 22.5 23.0 23.5 24.0 24.5 25.0 25.5
<µB,R> [mag arcsec
Fig. 6.— Plots comparing the measured central surface brightnesses with the average surface
brightness for the galaxies, as defined by Equation 7 (left) and Equation 8 (right) and using
the magnitude and diameter values found herein. The R-band data is demarcated by (red)
open circles while the B-band data uses (blue) filled circles.
– 30 –
22.0 22.5 23.0 23.5 24.0 24.5 25.0 25.5
<µB,R> [mag arcsec
22.0 22.5 23.0 23.5 24.0 24.5 25.0 25.5
<µB,R> [mag arcsec
Fig. 7.— Plots comparing the measured central surface brightnesses, corrected for inclina-
tion, with the average surface brightness for the galaxies, as defined by Equation 7 (left) and
Equation 8 (right) and using the magnitude and diameter values found herein. The R-band
data is demarcated by (red) open circles while the B-band data uses (blue) filled circles.
Fig. 8.— Images of UGC 11355 with the stretch altered to show the galaxy’s nuclear bar
(left - R-band image) and star forming ring (right - Hα image). In both images the ellipse
shows the shape and size of the star forming ring. The images are 1.0′ across.
– 31 –
0 1 2 3 4
MHI/LR,B global [MO  · /LO  · ]
Fig. 9.— Gas mass to B and R-band luminosity ratios plotted against the global star
formation rate for the galaxies. The (blue) filled circles are for the B-band data and the
(red) open circles are for the R-band data.
– 32 –
−14 −16 −18 −20 −22
MB, global
S0 − Sa
Sab − Sb
Sbc−Sc
Sm−Im
Fig. 10.— Total B magnitude plotted against the average luminosity of the brightest three
Hα regions. (If less than three regions were found, the average of all H II regions was used.)
The filled (red) symbols are the data from our observations; the filled (blue) symbols are
from Helmboldt, et al. (2005); and the open (black) symbols are from Kennicutt & Kent
(1983).
– 33 –
−0.5 0.0 0.5 1.0 1.5 2.0 2.5
(B − R)global
Our Data
Helmboldt, et.al 2004
Jansen, et.al 2000
−0.5 0.0 0.5 1.0 1.5 2.0 2.5
(B−R)global
Our Data
Jansen, et.al 2000
Fig. 11.— Global color versus equivalent width (left) and star formation rate (right). To
insure any trends (or lack) remain the same, the data from this paper is shown both without
inclination correction (black) and with (gray). Note that inclination corrections are described
in Section 3. As the global SFR was not available for the Helmboldt, et al. (2005) data, it
is not shown on the right.
−17 −18 −19 −20 −21 −22 −23
MB,global
Our Data
Kennicutt & Kent 1983
Helmbolt, et.al 2004
−16 −17 −18 −19 −20 −21 −22
MB, global
Our data
Kennicutt 1983
Jansen et al. 2000
Fig. 12.— Total B magnitude plotted against the global equivalent width (left) and star
formation rate (right). As the global SFR was not available for the Helmboldt, et al. (2005)
data, it is not shown on the right.
– 34 –
36 37 38 39 40 41
log(LHalpha, eff/area) [erg s
−1 kpc−2]
Our Data
Oey, et.al
Fig. 13.— Luminosity surface brightness (Luminosity/area) plotted against the diffuse Hα
fraction for our sample and that of Oey, et al. (2006).
– 35 –
0.0 0.5 1.0 1.5 2.0
SFRtotal,region [MO  · /yr]
0 20 40 60 80 100 120 140
E.Wglobal [Å]
Fig. 14.— A comparison of regional and global star formation rate and equivalent width for
the studied galaxies. On the left is a plot of the global SFR against the total SFR found for
the individual H II regions, with a line demarcating the point where the global and regional
SFR are equal. On the right is a plot of the global EW against the average EW for the
individual H II regions.
– 36 –
19 20 21 22 23 24 25
µB,R(0) [mag arcsec
Fig. 15.— Central surface brightness versus global star formation rate for the observed
galaxies. The (red) open circles are from the R band data, while the (blue) filled circles are
for the B data.
– 37 –
0.001 0.010 0.100 1.000 10.000
MHI/LB global [MO  · /LO  · ]
Our Data
Kennicutt & Kent 1983
Helmbolt, et.al 2004
0.1 1.0 10.0
MHI/LR global [MO  · /LO  · ]
Our Data
Oey, et.al (2006)
Helmboldt, et.al (2005)
Fig. 16.— Gas mass to luminosity ratios plotted against global equivalent widths (left) and
diffuse Hα fractions (right). On the left, the (black) circles are our data, the (blue) triangles
are from Kennicutt & Kent (1983) and the (red) diamonds are from Helmboldt, et al. (2004).
On the right, the (black) circles are again our data, while the (blue) asterisks are from Oey,
et al. (2006).
– 38 –
−0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0
(B − R)region
Fig. 17.— This plot shows the regional colors versus star formation rates for the observed
galaxies.
ABSTRACT
  We present B, R, and Halpha imaging data of 19 large disk galaxies whose
properties are intermediate between classical low surface brightness galaxies
and ordinary high surface brightness galaxies. We use data taken from the
Lowell 1.8m Perkins telescope to determine the galaxies' overall morphology,
color, and star formation properties. Morphologically, the galaxies range from
Sb through Irr and include galaxies with and without nuclear bars. The colors
of the galaxies vary from B-R = 0.3 - 1.9, and most show at least a slight
bluing of the colors with increasing radius. The Halpha images of these
galaxies show an average star formation rate lower than is found for similar
samples with higher surface brightness disks. Additionally, the galaxies
studied have both higher gas mass-to-luminosity and diffuse Halpha emission
than is found in higher surface brightness samples.

<|endoftext|><|startoftext|>
Domain Wall Dynamics near a Quantum Critical Point
Shengjun Yuan and Hans De Raedt
Department of Applied Physics, Zernike Institute for Advanced Materials,
University of Groningen, Nijenborgh 4, NL-9747 AG Groningen, The Netherlands
Seiji Miyashita
Department of Physics, Graduate School of Science,
University of Tokyo, Bunkyo-ku,Tokyo 113-0033, Japan and
CREST, JST, 4-1-8 Honcho Kawaguchi, Saitama, Japan
(Dated: November 4, 2018)
We study the real-time domain-wall dynamics near a quantum critical point of the one-dimensional
anisotropic ferromagnetic spin 1/2 chain. By numerical simulation, we find the domain wall is
dynamically stable in the Heisenberg-Ising model. Near the quantum critical point, the width of
the domain wall diverges as (∆− 1)
PACS numbers: 75.10.Jm, 75.40.Gb, 75.60.Ch, 75.40.Mg. 75.75.+a
I. INTRODUCTION
Recent progress in synthesizing materials that contain
ferromagnetic chains1,2,3,4 provides new opportunities to
study the quantum dynamics of atomic-size domain walls
(DW). On the atomic level, a DW is a structure that is
stable with respect to (quantum) fluctuations, separating
two regions with opposite magnetization. Such a struc-
ture was observed in the one-dimensional CoCl2 · 2H2O
chain5,6.
In an earlier paper7, we studied the propagation of spin
waves in ferromagnetic quantum spin chains that sup-
port DWs. We demonstrated that DWs are very stable
against perturbations, and that the longitudinal compo-
nent of the spin wave speeds up when it passes through a
DW while the transverse component is almost completely
reflected.
In this paper, we focus on the dynamic stability of the
DW in the Heisenberg-Ising ferromagnetic chain. It is
known that the ground state of this model in the subspace
of total magnetization zero supports DW structures8,9.
However, if we let the system evolve in time from an
initial state with a DW structure and this initial state is
not an eigenstate, it must contain some excited states.
Therefore, the question whether the DW structure will
survive in the stationary (long-time) regime is nontrivial.
The question how the DW structure dynamically sur-
vives in the stationary (long-time) region is an interesting
problem. In particular, we focus on the stability of the
DW with respect to the dynamical (quantum) fluctua-
tions as we approach the quantum critical point (from
Heisenberg-Ising like to Heisenberg). We show that the
critical quantum dynamics of DWs can be described well
in terms of conventional power laws. The behavior of
quantum systems at or near a quantum critical point is
of contemporary interest10. We also show that the DW
profiles rapidly become very stable as we move away from
the quantum critical point.
II. MODEL
The Hamiltonian of the system is given by8,9,11,12,13
H = −J
(SxnS
n+1 + S
n+1 +∆S
n+1), (1)
where N indicates the total number of spins in the spin
chain, and the exchange integrals J and J∆ determine
the strength of the interaction between the x, y and z
components of spin 1/2 operators Sn = (S
n , S
Here we only consider the system with the ferromagnetic
(J > 0) nearest exchange interaction. It is well known
that |∆| = 1 is a quantum critical point of the Hamil-
tonian in Eq. (1), that is, the analytical expressions of
the ground state energy for 1 < ∆ and −1 < ∆ < 1 are
different and singular at the points ∆ = ±112.
In Ref.8,9 Gochev constructed a stable state with DW
structure in both the classical and quantum treatments of
the Hamiltonian (1). In the classical treatment, Gochev
replaces the spin operators in Eq. (1) by classical vectors
of length s
Szn = s cos θn, S
n = s sin θn cosϕn, S
n = s sin θn sinϕn,
and then uses the conditions δE/δθ = 0 and ϕn = const.
to find the ground state. In the ground state, the mag-
netization per site is given by9
Szn = s tanh(n− n0)σ,
Sxn = s cosϕ sech(n− n0)σ,
Syn = s sinϕ sech(n− n0)σ,
where
σ = ln[∆ +
∆2 − 1], (3)
ϕ is an arbitrary constant, and n0 is a constant fixing the
position of the DW. The corresponding energy is
EDW = 2s
2J∆tanhσ. (4)
http://arxiv.org/abs/0704.0193v1
In the quantum mechanical treatment, Gochev first con-
structs the eigenfunction of a bound state of k magnons9
|ψk〉 = An
Bm1m2...mkS
S−m2 ...S
|0〉 , (5)
where
Bm1m2...mk =
vmii ,mi < mi+1, (6)
vi = cosh(i − 1)σ/ cosh(iσ), (7)
A−2 =
v2ii /(1− v
i ), (8)
and the corresponding energy is given by9
J∆tanhσ tanh kσ. (9)
Then he demonstrated that for the infinite chain, the
linear superposition
|φn0〉 = A
|ψN0+i〉 ,
where
n0 = N0 + α, |α| ≤ 1/2, N0 → ∞, (11)
A−2 =
, (12)
is the quantum analog of the classical domain wall, in
which 〈Szn〉 , 〈S
n〉 , 〈S
n〉 are given in Eq. (2), and the en-
ergy coincides with Eq. (4).
Gochev’s work confirmed the existence of the DW
structures in the one-dimensional ferromagnetic quantum
spin 1/2 chain. In the infinite chain, the exact quantum
analog of classical DW is represented by |φn0〉. In the fi-
nite chain, the DW structure exists as a bound k-magnon
state |ψk〉. The main difference between these two states
is the distribution of magnetization in the XY plane. In
the infinite chain, the change of the magnetization oc-
curs in three dimensions, according to Eq. (2), but in
the finite chain 〈Sxn〉 = 〈S
n〉 = 0 for all spins.
Now we consider 〈Szn〉 of the bound state |ψk〉 in the
case that the number of flipped spin is half of the total
spins, i.e., k = N/2 and N is an even number. Even
though the formal expression for |ψk〉 is known, the ex-
pression for 〈Szn〉 in this state (for finite and infinite
chains) is not known. For finite N , the ground state in
the subspace of total magnetizationM = 0 can, in princi-
ple, be calculated from Eq. (5). However, this requires a
numerical procedure and we loose the attractive features
FIG. 1: The magnetization 〈Szn〉 in the ground state of the
subspace of total magnetization M = 0, generated by the
power method. The parameters are: (a) ∆ = 1.05, (b) ∆ =
1.1, (c) ∆ = 1.2, (d) ∆ = 2. The total number of spins in the
spin chain is N = 20. It is clear that there is a DW at the
centre of the spin chain. Furthermore there is no structure in
the XY plane, that is, 〈Sxn〉 = 〈S
n〉 = 0.
FIG. 2: Top picture (a): Initial spin configuration at time
t/τ = 0; Bottom pictures (b,c,d,e,f,g,h,i): Spin configuration
at time t/τ = 100; Bottom left pictures (b,c,d,e): DW struc-
tures disappear or are not stable. The parameters are: (b)
∆ = 0 (XY model), (c) ∆ = 0.5 (Heisenberg-XY model), (d)
∆ = 1 (Heisenberg model), (e) ∆ = 1.05 (Heisenberg-Ising
model); Bottom right pictures (f,g,h,i): DW structures are
dynamically stable in the Heisenberg-Ising model. The pa-
rameters are: (f) ∆ = 1.1, (g) ∆ = 1.2, (h) ∆ = 2, (i) ∆ = 20.
The total number of spins in the spin chain is N = 20.
of the analytical approach. Indeed, it is more efficient to
use a numerical method and compute directly the ground
state in the subspace of total magnetization M = 0. In
Fig. 1, we show some representative results as obtained
by the power method14 for a chain of N = 20 spins. In
all cases, the domain wall is well-defined. Obviously, be-
TABLE I: The energy E = J∆/2 of the initial state |Φ〉 (see
Fig. 2(a)) and the ground state E
g in the M = 0 subspace,
both relative to the ground state energy of the ferromagnet.
∆ E E
1.05 0.53 0.16 0.16 0.16 0.16
1.1 0.55 0.23 0.23 0.23 0.23
2 1.00 0.87 0.87 0.87 0.87
5 2.50 2.45 2.45 2.45 2.45
cause we are considering the system in the ground state,
the magnetization profile will not change during the time
evolution.
To inject a DW in the spin chain, we take the state |Φ〉
with the left half of the spins up and the other half down
as the initial state (see Fig. 2(a) for N = 20). The state
|Φ〉 corresponds to the state with the largest weight in the
bound state |ψk〉 with k = N/2, because |Bm1m2...mk |
reaches the maximum if mi = i for all i = 1, 2, .., N/2
(note |vi| < 1). It is clear that |Φ〉 is not an eigenstate of
the Hamiltonian in Eq. (1). The energy of |Φ〉, relative to
the ferromagnetic ground state, is J∆/2, and its spread
(〈Φ|H2|Φ〉−〈Φ|H |Φ〉2)1/2 = J/2. In Table I, we list some
representative values of the energy in the initial state (see
Fig. 2(a)) and in the ground state of subspace M = 0
(see Fig. 1).
A priori, there is no reason why the DW of the ini-
tial state |Φ〉 should relax to a DW profile that is dy-
namically stable. For ∆ ≃ 1, the difference between en-
ergy of the initial state and the ground state energies
for N = 16, 18, 20, 22 is relatively large and the relative
spread in energy (1/∆) is large also, suggesting that near
the quantum critical point, the initial state may contain
a significant amount of excited states. Therefore, it is
not evident that a DW structure will survive in the long-
time regime. In fact, from the numbers in Table I, one
cannot predict whether or not the DW will be stable.
For instance, for ∆ = 1.05 and N = 16, 18, 20, the DW
is not dynamically stable whereas for N = 22 it is stable
but the energies (see first line in Table I) do not give a
clue as to why this should be the case. On the other
hand, by solving the time dependent Schrödinger equa-
tion (TDSE), it is easy to see if the DW is dynamically
stable or not.
III. DYNAMICALLY STABLE DOMAIN WALLS
We solve the TDSE of the whole system with the
Hamiltonian in Eq. (1) and study the time-evolution
of the magnetization at each lattice site. The numeri-
cal solution of the TDSE is performed by the Chebyshev
polynomial algorithm, which is known to yield extremely
accurate independent of the time step used15,16,17,18. We
adopt open boundary conditions, not periodic bound-
ary conditions, because the periodic boundary condition
would introduce two DWs in the initial state. In this pa-
per, we display the results at time intervals of τ = π/5J ,
and use units such that ~ = 1 and J = 1.
The initial state of the system is shown in Fig. 2(a).
The spins in the left part (n = 1 to 10) of the spin chain
are all ”spin-up” and the rest (n = 11 to 20) are all
”spin-down”. Here ”spin-up” or ”spin-down” correspond
to the eigenstates of the single spin 1/2 operator Szn.
Whether the DW at the centre of the spin chain is
stable or unstable depends on the value of ∆. In Fig.
2(b,c,d,e,f,g,h, and i), we show the states of the system
as obtained by letting the system evolve over a fairly long
time (t = 500J/π). It is clear that the DW totally disap-
pears for 0 ≤ ∆ ≤ 1, that is, in the XY, Heisenberg-XY
and Heisenberg spin 1/2 chain, the DW structures are
not stable. For the Heisenberg-Ising model (∆ > 1), the
DW remains stable when t ≥ 500J/π (see Ref.7), and its
structure is more sharp and clear if ∆ is larger, so we
will concentrate on the cases ∆ > 1. One may note that
the values of ∆ in Fig. 2(e,f,g,h) are the same as in Fig.
1(a,b,c,d), but that the distributions of the magnetiza-
tion are similar but not the same. This is because the
energy is conserved during the time evolution and the
system, which starts from the initial state shown in Fig.
2(a), will never relax to the ground state of the subspace
with the total magnetization M = 0.
In order to get a quantitative expression of the width
of DW, we first introduce the quantity Szn (t1, t2; ∆) (n =
1, 2, ..., N) as the time average of the expectation value
〈Szn (t)〉 of nth spin:
Szn (t1, t2; ∆) ≡
〈Szn (t)〉 dt
t2 − t1
. (13)
We take the average in Eq. (13) over a long period dur-
ing which the DW is dynamically stable. In Fig. 3, we
show some results of Szn (t1, t2; ∆) for the Heisenberg-
Ising model, where we take t1 = 101τ , t2 = 200τ and
various ∆. We find that each curve in Fig. 3 is symmet-
ric about the line n = (N + 1) /2, and can be fitted well
by the function
Szn (t1, t2; ∆) = a∆ tanh
n− (N + 1)/2
. (14)
The values of ∆ we used and the corresponding values
of a∆, b∆ are shown in Table II. As we mentioned
earlier, Gochev9 constructed an eigenstate of the one-
dimensional anisotropic ferromagnetic spin 1/2 chain in
which the mean values Szn, S
n and S
n coincide with the
stable DW structure in the classical spin chain, that is
〈Szn〉 =
tanh(n− n0)σ, (15)
where n0 is the position of the DW (in our notation, this
is (N + 1) /2). The fitted form of Szn (t1, t2; ∆) in Eq.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
 1.05
 1.06
FIG. 3: (Color online) Szn (t1, t2; ∆) as a function of n for
different ∆. Here t1 = 101τ , t2 = 200τ . We show the data
for ∆ = 1.05, 1.06, 1.1, 1.2, 1.3, 1.5, 2, 5 and 20 only. The total
number of spins in the spin chain is N = 20.
(14) is similar to Eq. (15). From Table II, it is clear that
as ∆ increases, |a∆| converges to 1/2, in agreement with
Eq. (15). From the comparison of b∆ and 1/σ in Fig.
4, it is clear that the dependence on ∆ is qualitatively
similar but not the same. This is due to the fact that
Gochev’s solution is for a DW in the ground state whereas
we obtain the DW by relaxation of the state shown in Fig.
2(a).
We want to emphasize that the meaning of
Szn (t1, t2; ∆) in Eq. (14) is different from 〈S
n〉 in Eq.
(15). The former describes the mean value of 〈Szn (t)〉
in a state with dynamical fluctuations, while the latter
describes the distribution of 〈Szn〉 in an exact eigenstate
without dynamical fluctuations.
Next we introduce a definition of the DW width. From
Eq. (14), we can find n1 and n2 which satisfy
Szn1 (t1, t2; ∆) = 1/4,
Szn2 (t1, t2; ∆) = −1/4, (16)
that is, when
Szn (t1, t2; ∆)
equals half of its maximum
value (1/2). Here n1 and n2 are not necessarily integer
numbers. Now we can define the DW width W as the
distance between n1 and n2:
W = |n1 − n2| . (17)
Clearly, the width of the DW becomes ill-defined if it
approaches the size of the chain. On the other hand, the
computational resources (mainly memory), required to
solve the TDSE, grow exponentially with the number of
spins in the chain. These two factors severely limit the
minimum difference between ∆ and the quantum critical
point (∆ = 1) that yields meaningful results for the width
of the DW. Indeed, for fixed N , ∆ has to be larger than
TABLE II: The values of ∆ we used in our simulations and
the corresponding a∆, b∆ fitted by Eq. (14) for a spin chain
of N = 20 spins.
∆ a∆ b∆ ∆ a∆ b∆
1.05 −0.263 3.659 1.8 −0.493 0.524
1.06 −0.330 3.171 1.9 −0.494 0.488
1.07 −0.377 2.850 2 −0.495 0.460
1.08 −0.406 2.673 2.1 −0.495 0.436
1.09 −0.424 2.534 2.2 −0.496 0.416
1.1 −0.435 2.396 2.5 −0.497 0.370
1.15 −0.462 1.996 3 −0.498 0.322
1.2 −0.471 1.626 4 −0.499 0.270
1.25 −0.476 1.330 5 −0.499 0.240
1.3 −0.479 1.142 6 −0.500 0.220
1.35 −0.481 0.959 7 −0.500 0.206
1.4 −0.483 0.869 8 −0.500 0.195
1.45 −0.485 0.770 9 −0.500 0.187
1.5 −0.487 0.719 10 −0.500 0.179
1.6 −0.489 0.629 15 −0.500 0.156
1.7 −0.491 0.568 20 −0.500 0.141
1 3 5 7 9 11 13 15 17 19 21
FIG. 4: Comparison of b∆ and 1/σ as a function of ∆. The
total number of spins in the spin chain is N = 20.
the ”effective” critical value for the finite system in order
for the DW width to be smaller than the system size.
Although the system sizes that are amenable to numeri-
cal simulation are rather small for present-day ”classical
statistical mechanics” standards, it is nevertheless possi-
ble to extract from these simulations useful information
about the quantum critical behavior of the dynamically
stable DW.
In Fig. 5, we plot W as a function of ∆ (1.06 ≤ ∆ ≤
20). By trial and error, we find that all the data can be
1 3 5 7 9 11 13 15 17 19 21
FIG. 5: The DW width as a function of ∆ in a spin
chain of N = 20 spins. The black dots are the sim-
ulation data and the solid line is given by W (∆) =
AN/ ln
∆ − ǫN +
(∆− ǫN )
+ BN with ǫN =
0.046 ± 0.001, AN = 2.16 ± 0.06 and BN = −0.485 ± 0.068.
TABLE III: The values of ǫN , AN and BN in Eq. (18) for a
spin chain of N = 16, 18, 20, 22 and 24 spins. For the fits,
we used all the data for ∆ ≤ 5.
N ǫN AN BN
16 0.065 ± 0.001 2.08 ± 0.10 −0.493 ± 0.142
18 0.052 ± 0.002 2.07 ± 0.11 −0.450 ± 0.152
20 0.045 ± 0.002 2.22 ± 0.09 −0.556 ± 0.133
22 0.040 ± 0.001 2.36 ± 0.08 −0.689 ± 0.140
24 0.033 ± 0.001 2.34 ± 0.06 −0.681 ± 0.127
fitted very well by the function
W (∆) =
∆− ǫN +
(∆− ǫN)
} +BN ,
where ǫN , AN and BN are fitting parameters. As shown
in Fig. 6, all the data for N = 16, 18, 22, 24 and ∆ ≤ 5
fit very well to Eq. (18). The results of these fits are
collected in Table III.
To analyze the finite-size dependence in more detail, we
adopt the standard finite-size scaling hypothesis19. We
assume that in the infinite system, the DW width plays
the role of the correlation length, that is, we assume that
W (∆) ∼W0(∆− 1)
−ν , (19)
where ν is a critical exponent. Finite-size scaling predicts
that the effective critical value ∆∗N = 1 + ǫN where ǫN
is proportional to N−1/ν . Taking ν = 1/2, Fig. 7 shows
that ∆∗N converges to one as N increases.
As a check on the fitting procedure, we apply it to
the data obtained by solving for the ground state in the
M = 0 subspace. In view of Eq. (13) and (14), we may
expect that Eq. (18) fits the data very well and, as shown
in Fig. 8, this is indeed the case.
If we fit the data to
W (∆) =W0 (∆−∆
. (20)
without assuming a priori value C, we find that C de-
pends on the range of ∆ that was used in the fit, as
shown in Fig. 9. Remarkably, we find that C ≈ 0.57 if
we fit the data for a large range of ∆’s and that C ap-
proaches 1/2 if we restrict the value of ∆ to the vicinity
of the critical point.
IV. THE STABILITY OF DOMAIN WALLS
To describe the stability of the DW structure, we in-
troduce δn (∆) (n = 1, 2, ..., N):
δn (∆) =
[Szn (t1, t2; ∆)]
− Szn (t1, t2; ∆)
, (21)
where
[Szn (t1, t2; ∆)]
〈Szn (t)〉
t2 − t1
. (22)
In order to show the physical meaning of δn, we write
〈Szn (t)〉 as
〈Szn (t)〉 ≡ Cn +Ωn (t) , (23)
where Cn is a constant and Ωn (t) is a time-dependent
term. Then Eq. (21) becomes
δn (∆) =
Ω2n (t) dt
t2 − t1
Ωn (t) dt
t2 − t1
. (24)
It is clear that if 〈Szn (t)〉 is a constant in the time
interval [t1, t2], then δn (∆) = 0. In general, since the
initial state is not an eigenstate of the Hamiltonian Eq.
(1), the magnetization of each spin will fluctuate and
Ωn (t) 6= 0. If, after long time, the system relaxes to a
stationary state that contains a DW, the magnetization
of each spin will fluctuate around its stationary value
Cn. The fluctuations are given by Ωn (t). If |Ωn (t)| is
large, the difference between the actual magnetization
profile at time t and the stationary profile Cn may be
large. From Eq. (24), it is clear that δn (∆) is a measure
of the deviation of 〈Szn (t)〉 from its stationary value Cn,
averaged over the time interval [t1, t2]. Thus, δn (∆) gives
direct information about the dynamics stability of the
Figure 10 shows the distribution δn (∆) for different
values of ∆. We only show some typical results, as in Fig.
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
16 N=22
1 2 3 4 5
FIG. 6: The DW width as a function of ∆ in a spin chain of N = 16, 18, 22, and 24 spins. The black dots are the simulation
data and the solid line in each panel is given by Eq. (18).
3. As expected, the distribution of δn (∆) is symmetric
about the centre of the spin chain (n = 10.5).
We first consider how δn (∆) changes with ∆ for fixed
n. From Fig.10, we conclude:
1) For the spins which are not located at the DW cen-
tre, i.e., n 6= 10, 11, δn (∆) decreases if ∆ becomes larger.
This means that the quantum fluctuations of these spins
become smaller if we increase the value of ∆. This is
reasonable because with increasing ∆, the initial state
approaches an eigenstate of the Hamiltonian for which
δn (∆) = 0 (Ising limit).
2) For the spins at the DW centre, i.e., n = 10, 11,
when ∆ becomes larger and larger, δn (∆) first increases
and then decreases. Qualitatively, this can be understood
in the following way. When ∆ is close to 1, the magneti-
zation at the DW centre disappears very fast and remains
zero. However, if ∆ >> 1, the magnetization at the DW
centre will retain its initial direction, hence the behavior
of the spin at the DW centre will qualitatively change
as ∆ moves away from the critical point ∆ = 1. In Fig.
11, we plot δ10 (∆) (= δ11 (∆)) as a function of ∆. It is
clear that δ10 (∆) first increases as ∆ increases, reaches
its maximum at ∆ = 1.3, and then decreases as ∆ be-
comes larger.
Now we consider the n-dependence of δn (∆) for fixed
∆. Since δn (∆) is a symmetric function of n, we may
consider only one side of the whole chain, e.g., the spins
with n = 1, 2, ..., N/2. From Fig.10, according to the
value of ∆, there are three different regions:
1) 1.05 ≤ ∆ ≤ 1.3: starting from the boundary
(n = 1), δn (∆) first decreases, then increases, and finally
decreases again as n approaches the DW centre (n = 10).
As we discussed already, the fluctuation of the magneti-
zation at the DW centre is small when ∆ is close to 1.
The spin at the boundary only interacts with one nearest
spin, so it has more freedom to fluctuate. For the others,
because of the influence of the DW structure (or bound-
ary), the fluctuations of the spins which are near the DW
(or near the boundary) are larger compared to those of
a spin located in the middle of a polarized region. Thus
δn (∆) is larger if the spin is located near the DW or near
a boundary.
0 0.001 0.002 0.003 0.004 0.005
FIG. 7: Fit of ∆∗N to ∆
∗ + λ ·N−2 with ∆∗ = 1.009± 0.002,
and λ = 14.253 ± 0.660.
1 2 3 4 5
FIG. 8: The DW width as a function of ∆ (1.06 ≤ ∆ ≤ 20)
in the ground state of subspace M = 0 in a spin chain of
N = 20 spins. The black dots are the simulation data and
the solid line is given by Eq. (18), with ǫN = 0.010 ± 0.001,
AN = 1.87± 0.04 and BN = −0.550± 0.079.
2) 1.3 ≤ ∆ ≤ 5: δn (∆) reaches its maximum at the
DW centre. The reason for this is that in this regime the
magnetizations of all spins retain their initial direction,
therefore the spins that are far from the centre fluctuate
little.
3) 5 < ∆: in this regime (Ising limit), the initial state
is very close to the eigenstate, and the fluctuations are
small, even for the spins at the DW.
V. SUMMARY
In the presence of Ising-like anisotropy, DWs in a fer-
romagnetic spin 1/2 chain are dynamically stable over
1 3 5 7 9 11 13 15 17 19 21
FIG. 9: The exponent C as a function of ∆max in a spin chain
of N = 20 spins. The exponent C obtained by fitting the DW
width to Eq. (20), with ∆∗N=20 as obtained from the fit shown
in Fig. 7, for ∆ in the range [1.06,∆max].
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
 1.05
 1.06
FIG. 10: (Color online) δn (∆) as a function of n for different
∆. Here t1 = 101τ , t2 = 200τ . We only show the data for
∆ = 1.05, 1.06, 1.1, 1.2, 1.3, 1.5, 2, 5 and 20. The total number
of spins in the spin chain is N = 20.
extended periods of time. The profiles of the magnetiza-
tion of the DW are different from the profile in the ground
state in the subspace of total magnetization M = 0.
As the system becomes more isotropic, approaching the
quantum critical point, the width of the DW increases as
a power law, with an exponent equal to 1/2.
1 3 5 7 9 11 13 15 17 19 21
FIG. 11: δ10 (∆) as a function of ∆. Here t1 = 101τ , t2 =
200τ . The total number of spins in the spin chain is N = 20.
1 T. Kajiwara, M. Nakano, Y. Kaneko, S. Takaishi, T. Ito,
M. Yamashita, A. Igashira-Kamiyama, H. Nojiri, Y. Ono
and N. Kojima, J. Am. Chem. Soc. 127 10150 (2005).
2 M. Mito, H. Deguchi, T. Tajiri, S. Takagi, M. Yamashita
and H. Miyasaka, Phys. Rev. B 72, 144421 (2005).
3 H. Kageyama, K. Yoshimura, K. Kosuge, M. Azuma, M.
Takano, H. Mitamura and T. Goto, J. Phys. Soc. Jpn. 66,
3996 (1997).
4 A. Maignana, C. Michel, A.C. Masset, C. Martin and B.
Raveau, Eur. Phys. J. B 15, 657 (2000).
5 J. Torrance and M. Tinkham, Phys. Rev. 187, 587 (1969).
6 D. Nicoli and M. Tinkham, Phys. Rev. B 9, 3126 (1974).
7 S. Yuan, H. De Raedt, and S. Miyashita, J. Phys. Soc.
Jpn., 75, 084703 (2006).
8 I.G. Gochev, JETP Lett. 26, 127 (1977).
9 I.G. Gochev, Sov. Phys. JETP 58, 115 (1983).
10 S. Sachdev, Quantum Phase Transitions, (Cambridge Uni-
versity Press, Cambridge, 1999).
11 H.J. Mikeska, S. Miyashita and G.H. Ristow, J. Phys.:
Condens. Matter 3, 2985 (1991).
12 J. des Cloizeaux and M. Gaudin, J. Math. Phys. 7, 1384
(1966).
13 D.C. Mattis, The Theory of Magnetism I, Solid State Sci-
ence Series 17 (Springer, Berlin 1981).
14 J.H. Wilkinson, The Algebraic Eigenvalue Problem, (Ox-
ford University Press, Oxford, 1999).
15 H. Tal-Ezer and R. Kosloff, J. Chem. Phys. 81, 3967
(1984).
16 C. Leforestier, R.H. Bisseling, C. Cerjan, M.D. Feit, R.
Friesner, A. Guldberg, A. Hammerich, G. Jolicard, W.
Karrlein, H.-D. Meyer, N. Lipkin, O. Roncero and R.
Kosloff, J. Comp. Phys. 94, 59 (1991).
17 T. Iitaka, S. Nomura, H. Hirayama, X. Zhao, Y. Aoyagi
and T. Sugano, Phys. Rev. E56, 1222 (1997).
18 V.V. Dobrovitski and H.A. De Raedt, Phys. Rev. E67 ,
056702 (2003).
19 D.P. Landau and K. Binder, AGuide to Monte Carlo Simu-
lations in Statistical Physics, (Cambridge University Press,
Cambridge, 2000).
ABSTRACT
  We study the real-time domain-wall dynamics near a quantum critical point of
the one-dimensional anisotropic ferromagnetic spin 1/2 chain. By numerical
simulation, we find the domain wall is dynamically stable in the
Heisenberg-Ising model. Near the quantum critical point, the width of the
domain wall diverges as $(\Delta -1) ^{-1/2}$.

<|endoftext|><|startoftext|>
Quantum mechanical approach to decoherence and relaxation generated by
fluctuating environment
S.A. Gurvitz∗
Department of Particle Physics, Weizmann Institute of Science,
Rehovot 76100, Israel and Theoretical Division and CNLS,
Los Alamos National Laboratory, Los Alamos, NM 87545, USA
D. Mozyrsky
Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
(Dated: November 4, 2018)
We consider an electrostatic qubit, interacting with fluctuating charge of a single electron tran-
sistor (SET) in the framework of an exactly solvable model. The SET plays role of an environment
affecting the qubits’ parameters in a controllable way. We derive the rate equations describing the
dynamics of the entire system for an arbitrary qubit-SET coupling. Solving these equations we
obtain decoherence and relaxation rates of the qubit, as well as the spectral density of qubit param-
eters’ fluctuations. We found that in a weak coupling regime decoherence and relaxation rates are
directly related to the spectral density taken at either zero or Rabi frequency, depending on which
qubit parameter is fluctuating. In the latter case our result coincides with that of the spin-boson
model in the weak coupling limit, despite different origin of the fluctuations. We show that this
relation holds also in the presence of weak back-action of the qubit on the environment. In case of
strong back-action such a simple relationship no longer holds, even if qubit-SET coupling is small.
It does not hold also in the strong coupling regime, even in the absence of the back-action. In
addition, we found that our model predicts localization of the qubit in the strong-coupling regime,
resembling that in the spin-boson model.
PACS numbers: 03.65.Yz, 05.60.Gg, 73.23.-b, 73.23.Hk
I. INTRODUCTION
The influence of environment on a single quantum sys-
tem is the issue of crucial importance in quantum in-
formation science. It is mainly associated with decoher-
ence, or dephasing, which transforms any pure state of
a quantum system into a statistical mixture. Despite a
large body of theoretical work devoted to decoherence, its
mechanism has not been clarified enough. For instance,
how decoherence is related to environmental noise, in
particular in the presence of back-action of the system
on the environment (quantum measurements). More-
over, decoherence is often intermixed with relaxation. Al-
though each of them represents an irreversible process,
decoherence and relaxation affect quantum systems in
quite different ways.
In order to establish a relation between the fluctua-
tion spectrum and decoherence and relaxation rates one
needs a model that describes the effects of decoherence
and relaxation in a consistent quantum mechanical way.
An obvious candidate is the spin-boson model1,2 which
represents the environment as a bath of harmonic oscil-
lators at equilibrium, where the fluctuations obey Gaus-
sian statistics3. Despite its apparent simplicity, the spin-
boson model cannot be solved exactly2. Also, it is hard
to manipulate the fluctuation spectrum in the framework
of this model. In addition, mesoscopic structures may
couple only to a few isolated fluctuators, like spins, lo-
cal currents, background charge fluctuations, etc. This
would require models of the environment, different from
Electrodes
FIG. 1: Electrostatic qubit, realized by an electron trapped
in a coupled-dot system (a), and its schematic representation
by a double-well (b). Ω0 denotes the coupling between the
two dots.
the spin-boson model (see for instace4,5,6,7,8,9,10,11,12). In
general, the environment can be out of equilibrium, like
a steady-state fluctuating current, interacting with the
qubit13,14,15,16. This for instance, takes place in the con-
tinuous measurement (monitoring) of quantum systems17
and in the “control dephasing” experiments18,19,20. All
these types on non-Gaussian and non-equilibrium envi-
ronments attracted recently a great deal of attention21.
In this paper we consider an electrostatic qubit, which
can be viewed as a generic example of two-state systems.
It is realized by an electron trapped in coupled quantum
dots22,23,24, Fig. 1. Here E1 and E2 denote energies of
the electron states in each of the dots and Ω0 is a cou-
pling between these states. It is reasonable to assume
that the decoherence of a qubit is associated with fluctu-
http://arxiv.org/abs/0704.0194v3
Electrostatic Qubit
Single Electron Transistor
(a) (b)
FIG. 2: Qubit near Single Electron Transistor. Here El,r and
E0 denote the energy levels in the left (right) reservoirs and in
the quantum dot, respectively, and µL,R are the correspond-
ing chemical potentials. The electric current I generates fluc-
tuations of the electrostatic opening between two dots (a), or
it fluctuates the energy level of the nearest dot (b).
ations of the qubit parameters, E1,2 and Ω0, generated
by the environment. Indeed, a stochastic averaging of
the Schrödinger equation over these fluctuations param-
eters results in the qubit’s decoherence, which transfers
any qubit state into a statistical mixture25,26. In general,
one can expect that the fluctuating environment should
result in the qubit’s relaxation, as well, as for instance in
the phenomenological Redfield’s description of relaxation
in the magnetic resonance27.
As a quantum mechanical model of the environment
we consider a Single Electron Transistor (SET) capac-
itively coupled to the qubit, e.g., Fig. 2. Such setup
has been contemplated in numerous solid state quantum
computing architectures where SET plays role of a read-
out device16,17,28,29 and contains most of the generic fea-
tures of a fluctuating non-equilibrium environment. The
discreteness of the electron charge creates fluctuations
in the electrostatic field near the SET. If the electro-
static qubit is placed near the SET, this fluctuating field
should affect the qubit behavior as shown in Fig. 2. It
can produce fluctuations of the tunneling coupling be-
tween the dots (off-diagonal coupling) by narrowing the
electrostatic opening connecting these dots, as in Fig. 2a,
or make the energy levels of the dots fluctuate, as shown
schematically in Fig. 2b. Note that while in some regimes
the SET operates as a measuring device16,17, in other
regimes it corresponds purely to a source of noise. In-
deed, if the energy level E0, Fig. 2, is deeply inside
the voltage bias – the case we consider in the begin-
ning, the SET current is not modulated by the qubit
electron. In this case the SET represents only the fluc-
tuating environment affecting the qubit behavior (“pure
environment”30).
A similar model of the fluctuating environment has
been studied mostly for small bias (linear response) or
for the environment in an equilibrium. Here, however,
we consider strongly non-equilibrium case where the bias
voltage applied on the SET (V = µL − µR) is much
larger than the levels widths and the coupling between
the SET and the qubit. In this limit our model can be
solved exactly for both weak and strong coupling (but
is still smaller than the bias voltage). This constitutes
an essential advantage with regard to perturbative treat-
ments of similar models. For instance, the results of our
model can be compared in different regimes with phe-
nomenological descriptions used in the literature. Such
a comparison would allow us to determine the regions
where these phenomenological models are valid.
Since our model is very simple in treatment, the deco-
herence and relaxation rates can be extracted from the
exact solution analytically, as well as the time-correlator
of the electric charge inside the SET. This would make
it possible to establish a relation between the frequency-
dependent fluctuation spectrum of the environment and
the decoherence and the relaxation rates of the qubit,
and to determine how far this relation can be extended.
We expect that such a relation should not depend on
a source of fluctuations. This point can be verified by
a comparison with a similar results obtained for equi-
librium environment in the framework of the spin-boson
model1,2.
It is also important to understand how the decoher-
ence and relaxation rates depend on the frequency of
the environmental fluctuations. This problem has been
investigated in many phenomenological approaches for
“classical” environments at equilibrium. Yet, there still
exists an ambiguity in the literature related to this point
for non-equilibrium environment. For instance, it was
found by Levinson that the decoherence rate, generated
by fluctuations of the energy level in a single quantum dot
is proportional to the spectral density of fluctuations at
zero frequency31. The same result, but for a double-dot
system has been obtained by Rabenstein et al.32. On
the other hand, it follows from the Redfied’s approach
that the corresponding decoherence rate is proportional
to the spectral density at the frequency of the qubit’s
oscillations (the Rabi frequency)27. Since our model is
the exactly solvable one, we can resolve this ambiguity
and establish the appropriate physical conditions that
can result in different relations of decoherence rate to
the environmental fluctuations.
The most important results of our study are related to
the situation when back-action of the qubit on the envi-
ronment takes place. This problem did not receive such
a considerable amount of attention in the literature as,
for example, the case of “inert” environment. This is in
spite of a fact that the back-action always takes place in
the presence of measurement. There are many questions
related to the effects of a back-action. For instance, what
would be a relation between decoherence (relaxation) of
the qubit and the noise spectrum of the environment?
Or, how decoherence is affected by a strong response of
environment? We believe that our model appears to be
more suitable for studying these and other problems re-
lated to the back-action than most of the other existing
approaches.
The plan of this paper is as follows: Sect. II presents a
phenomenological description of decoherence and relax-
ation in the framework of Bloch equations, applied to the
electrostatic qubit. Sect. III contains description of the
model and the quantum rate-equation formalism, used
for its solution. Detailed quantum-mechanical derivation
of these equations for a specific example is presented in
Appendix A. Sect. IV deals with a configuration where
the SET can generate only decoherence of the qubit. We
consider separately the situations when SET produces
fluctuations of the tunneling coupling (Rabi frequency)
or of the energy levels. The results are compared with
the SET fluctuation spectrum, evaluated in Appendix B.
Sect. V deals with a configuration where the SET gener-
ates both decoherence and relaxation of the qubit. Sect.
VI is summary.
II. DECOHERENCE AND RELAXATION OF A
QUBIT
In this section we describe in a general phenomeno-
logical framework the effect of decoherence and relax-
ation on the qubit behavior. Although the results are
known, there still exists some confusion in the literature
in this issue. We therefore need to define precisely these
quantities and demonstrate how the corresponding deco-
herenece and relaxation rates can be extracted from the
qubit density matrix.
Let us consider an electrostatic qubit, realized by an
electron trapped in coupled quantum dots, Fig. 1. This
system is described by the following tunneling Hamilto-
Hqb = E1a
1a1 + E2a
2a2 − Ω0(a
2a1 + a
1a2) (1)
where a
1,2, a1,2 are the creation and annihilation opera-
tors of the electron in the first or in the second dot. For
simplicity we consider electrons as spinless fermions. In
addition, we assume that a
1a1 + a
2a2 = 1, so that only
one electron is present in the double-dot. The electron
wave function can be written as
|Ψ(t)〉 =
b(1)(t)a
1 + b
(2)(t)a
〉 (2)
where b(1,2)(t) are the probability amplitudes for find-
ing the electron in the first or second well, obtained
from the Schrödinger equation i∂t|Ψ(t)〉 = Hqb|Ψ(t)〉 (we
adopt the units where ~ = 1 and the electron charge
e = 1). The corresponding density matrix, σjj′ (t) =
b(j)(t)b(j
′)∗(t), with j, j′ = {1, 2}, is obtained from the
equation i∂t σ = [H,σ]. This can be written explicitly as
σ̇11 = iΩ0(σ21 − σ12) (3a)
σ̇12 = −iǫσ12 + iΩ0(1− 2σ11) , (3b)
where σ22(t) = 1 − σ11(t), σ21(t) = σ∗12(t) and ǫ =
E1 − E2. Solving these equations one easily finds that
the electron oscillates between the two dots (Rabi oscil-
lations) with frequency ωR =
4Ω20 + ǫ
2. For instance,
for the initial conditions σ11(0) = 1 and σ12(0) = 1, the
probability of finding the electron in the second dot is
σ22(t) = 2(Ω0/ωR)
2(1− cosωRt). This result shows that
for ǫ≫ Ω0 the amplitude of the Rabi oscillations is small,
so the electron remains localized in its initial state.
The situation is different when the qubit interacts with
the environment. In this case the (reduced) density ma-
trix of the qubit σ(t) is obtained by tracing out the en-
vironment variables from the total density matrix. The
question is how to modify Eqs. (3), written for an isolated
qubit, in order to obtain the reduced density matrix of
the qubit, σ(t). In general one expects that the environ-
ment could affect the qubit in two different ways. First, it
can destroy the off-diagonal elements of the qubit density
matrix. This process is usually referred to as decoherence
(or dephasing). It can be accounted for phenomenolog-
ically by introducing an additional (damping) term in
Eq. (3b),
σ̇12 = −iǫσ12 + iΩ0(1− 2σ11)−
σ12 (4)
where Γd is the decoherence rate. As a result the qubit
density-matrix σ(t) becomes a statistical mixture in the
stationary limit,
t→∞−→
1/2 0
0 1/2
. (5)
This happens for any initial conditions and even for large
level displacement, ǫ ≫ Ω0,Γd (provided that Ω0 6= 0).
Note that the statistical mixture (5) is proportional to
the unity matrix and therefore it remains the same in
any basis.
Secondly, the environment can put the qubit in its
ground state, for instance via photon or phonon emis-
sion. This process is usually referred to as relaxation.
For a symmetric qubit we would have
t→∞−→
1/2 1/2
1/2 1/2
. (6)
In contrast with decoherence, Eq. (5), the relaxation pro-
cess puts the qubit into a pure state. That implies that
the corresponding density matrix can be always written
as δ1iδ1j in a certain basis (the basis of the qubit eigen-
states). This is in fact the essential difference between
decoherence and relaxation. With respect to elimination
of the off-diagonal density matrix elements, note that re-
laxation would eliminate these terms only in the qubit’s
eigenstates basis. In contrast, decoherence eliminates the
off-diagonal density matrix element in any basis (Eq. (5)).
In fact, if the environment has some energy, it can put
the qubit into an exited state. However, if the qubit is
finally in a pure state, such excitation process generated
by the environment affects the qubit in the same way as
relaxation: it eliminates the off-diagonal density matrix
elements only in a certain qubit’s basis. Therefore exci-
tation of the qubit can be described phenomenologically
on the same footing as relaxation.
It is often claimed that decoherence is associated with
an absence of energy transfer between the system and
the environment, in contrast with relaxation (excitation).
This distinction is not generally valid. For instance, if the
initial qubit state corresponds to the electron in the state
|E2〉, Fig. 1, the final state after decoherence corresponds
to an equal distribution between the two dots, 〈E〉 =
(E1 +E2)/2. In the case of E1 ≫ E2, this process would
require a large energy transfer between the qubit and the
environment. Therefore decoherence can be consistently
defines as a process leading to a statistical mixture, where
all states of the system have equal probabilities (as in
Eq. (5)).
The relaxation (excitation) process can be described
most simply by diagonalizing the qubit Hamiltonian,
Eqs. (1), to obtain Hqb = E+a
+a+ + E−a
−a−, where
the operators a± are obtained by the corresponding ro-
tation of the operators a1,2
30. Here E+ and E− are the
ground (symmetric) and excited (antisymmetric) state
energies. Then the relaxation process can be described
phenomenologically in the new qubit basis |±〉 = a†±|0¯〉
σ̇−−(t) = −Γrσ−−(t) (7a)
σ̇+−(t) = i(E− − E+)σ+−(t)−
σ+−(t) , (7b)
where σ++(t) = 1 − σ−−(t), σ−+(t) = σ∗+−(t) and Γr is
the relaxation rate.
In order to add decoherence, we return to the orig-
inal qubit basis |1, 2〉 = a†1,2|0¯〉 and add the damping
term to the equation for the off-diagonal matrix elements,
Eq. (4). We arrive at the quantum rate equation describ-
ing the qubit’s behavior in the presence of both decoher-
ence and relaxation30,33,
σ̇11 = iΩ0(σ21 − σ12)− Γr
(σ12 + σ21)−
(2σ11 − 1) + Γr
σ̇12 = −iǫσ12 +
iΩ0 + Γr
(1− 2σ11) + Γr
σ12 − κ2(σ12 + σ21)
σ12 , (8b)
where ǫ̃ = (ǫ2 + 4Ω20)
1/2 and κ = Ω0/ǫ̃. In fact, these
equations can be derived in the framework of a particu-
lar model, representing an electrostatic qubit interacting
with the point-contact detector and the environment, de-
scribed by the Lee model Hamiltonian33.
Equations (8) can be rewritten in a simpler form by
mapping the qubit density matrix σ = {σ11, σ12, σ21} to
a “polarization” vector S(t) via σ(t) = [1 + τ · S(t)]/2,
where τx,y,z are the Pauli matrices. For instance, one
obtains for the symmetric case, ǫ = 0,
Ṡz = −
Sz − 2Ω0 Sy (9a)
Ṡy = 2Ω0 Sz −
Γd + Γr
Sy (9b)
Ṡx = −
Γd + 2Γr
(Sx − S̄x) (9c)
where S̄x = Sx(t→∞) = 2Γr/(Γd+2Γr). One finds that
Eqs. (9) have a form of the Bloch equations for spin-
precession in the magnetic field27, where the effect of
environment is accounted for by two relaxation times for
the different spin components: the longitudinal T1 and
the transverse T2, related to Γd and 2Γr as
T−11 =
Γd + 2Γr
, and T−12 =
Γd + Γr
, (10)
The corresponding damping rates, the so-called “depolar-
ization” (Γ1 = 1/T1) and the “dephasing” (Γ2 = 1/T2)
are used for phenomenological description of two-level
systems34. However, neither Γ1 nor Γ2 taken alone would
drive the qubit density matrix into a statistical mixture
Eq. (5) or into a pure state Eq. (6).
In contrast, our definition of decoherence and relax-
ation (excitation) is associated with two opposite effects
of the environment on the qubit: the first drives it into
a statistical mixture, whereas the second drives it into
a pure state. We expect therefore that such a natural
distinction between decoherence and relaxation would be
more useful for finding a relation between these quantities
and the environmental behavior than other alternative
definitions of these quantities existing in the literature.
In general, the two rates, Γd,r, introduced in phe-
nomenological equations (8), (9), are consistent with our
definitions of decoherence and relaxation. The only ex-
ception is the case of Γr = 0 and Ω0 = 0, where are no
transitions between the qubit’s states even in the pres-
ence of the environment (“static” qubit). One easily finds
from Eqs. (3a), (4) that σ12(t) → 0 for t → ∞, whereas
the diagonal density-matrix elements of the qubit remain
unchanged (so-called “pure dephasing”5,34):
t→∞−→
σ11(0) 0
0 σ22(0)
. (11)
Thus, if the initial probabilities of finding the qubit in
each of its states are not equal, σ11(0) 6= σ22(0), then the
final qubit state is neither a mixture nor a pure state, but
a combination of the both. It implies that Γd in Eqs. (8)
would also generate relaxation (excitation) of the qubit.
Note that in this case the off-diagonal density-matrix el-
ements, absent in Eq.(11), would reappear in a different
basis. This implies that the “pure dephasing”5,34 occurs
only in a particular basis.
Let us evaluate the probability of finding the electron
in the first dot, σ11(t). Solving Eqs. (9) for the initial
conditions σ11(0) = 1, σ12(0) = 0, we find
σ11(t) =
e−Γrt/2
−e−t + C2e
where e± =
(Γd ± Ω̃), Ω̃ =
Γ2d − 64Ω20 and C1,2 =
1±(Γd/Ω̃). Solving the same equations in the limit of t→
∞, we find that the steady-state qubit density matrix is
t→∞−→
1/2 Γr/(Γd + 2Γr)
Γr/(Γd + 2Γr) 1/2
. (13)
Thus the off-diagonal elements of the density matrix
can provide us with a ratio of relaxation to decoherence
rates33.
III. DESCRIPTION OF THE MODEL
Consider the setup shown in Fig. 2. The entire system
can be described by the following tunneling Hamiltonian,
represented by a sum of the qubit and SET Hamiltonians
and the interaction term, H = Hqb +HSET +Hint. Here
Hqb is given by Eq. (1) and describes the qubit. The sec-
ond term, HSET, describes the single-electron transistor.
It can be written as
HSET =
l cl +
rcr + E0c
l c0 +Ωrc
rc0 +H.c.) , (14)
where c
l,r and cl,r are the creation and annihilation elec-
tron operators in the state El,r of the right or left reser-
voir; c
0 and c0 are those for the level E0 inside the quan-
tum dot; and Ωl,r are the couplings between the level E0
and the level El,r in the left (right) reservoir. In order to
avoid too lengthy formulaes, our summation indices l, r
indicate simultaneously the left and the right leads of the
SET, where the corresponding summation is carried out.
As follows from the Hamiltonian (14), the quantum dot
of the SET contains only one level (E0). This assumption
has been implied only for the sake of simplicity for our
presentation, although our approach is well suited for a
case of n levels inside the SET, E0c
0c0 →
n Enc
and even when the interaction between these levels is in-
cluded (providing that the latter is much less or much
larger than the bias V )35,36. We also assumed a weak
energy dependence of the couplings Ωl,r ≃ ΩL,R.
The interaction between the qubit and the SET, Hint,
depends on a position of the SET with respect to the
qubit. If the SET is placed near the middle of the qubit,
Fig. 2a, then the tunneling coupling between two dots of
the qubit in Eq. (1) decreases, Ω0 → Ω0− δΩ0, whenever
the quantum dot of the SET is occupied by an electron.
This is due to the electron’s repulsive field. In this case
the interaction term can be written as
Hint = δΩ c
0c0(a
1a2 + a
2a1) . (15)
On the other hand, in the configuration shown in Fig. 2b
where the SET is placed near one of the dots of the qubit,
the electron repulsive field displaces the qubit energy lev-
els by ∆E = U . The interaction terms in this case can
be written as
Hint = U a
0c0 . (16)
Consider the initial state where all the levels in the left
and the right reservoirs are filled with electrons up to the
Fermi levels µL,R respectively. This state will be called
the “vacuum” state |0
〉. The wave function for the entire
system can be written as
|Ψ(t)〉 =
b(1)(t)a
0l (t)a
0cl +
rl (t)a
rcl +
l<l′,r
0rll′(t)a
rclcl′ + · · ·
+b(2)(t)a
0l (t)a
0cl +
rl (t)a
rcl +
l<l′,r
0rll′(t)a
rclcl′ + . . .
〉, (17)
where b(j)(t), b
α (t) are the probability amplitudes to
find the entire system in the state described by the cor-
responding creation and annihilation operators. These
amplitudes are obtained from the Schrödinger equation
i|Ψ̇(t)〉 = H |Ψ(t)〉, supplemented with the initial condi-
tion b(1)(0) = p1, b
(2)(0) = p2, and b
α (0) = 0, where
p1,2 are the amplitudes of the initial qubit state.
Note that Eq. (17) implies a fixed electron number (N)
in the reservoirs. At the first sight it would lead to deple-
tion of the left reservoir of electrons over the time. Yet
in the limit of N →∞ (infinite reservoirs) the dynamics
of an entire system reaches its steady state before such a
depletion takes place37,38.
The behavior of the qubit and the SET is given by
the reduced density matrix, σss′ (t). It is obtained from
the entire system’s density matrix |Ψ(t)〉〈Ψ(t)| by trac-
ing out the (continuum) reservoir states. The space of
such a reduced density matrix consists of four discrete
states s, s′ = a, b, c, d, shown schematically in Fig. 3 for
the setup of Fig. 2a. The corresponding density-matrix
elements are directly related to the amplitudes b(t), for
instance,
σaa(t) = |b(1)(t)|2 +
|b(1)lr (t)|
l<l′,r<r′
|b(1)rr′ll′(t)|
2 + · · · (18a)
σdd(t) =
|b(2)0l (t)|
l<l′,r
|b(2)0rll′(t)|
l<l′<l′′,r<r′
|b(2)0rr′ll′(t)|
2 + · · · (18b)
σbd(t) =
0l (t)b
0l (t) +
l<l′,r
0rll′(t)b
0rll′(t) +
l<l′<l′′,r<r′
0rr′ll′ (t)b
0rr′ll′(t) + · · · . (18c)
In was shown in37,38 that the trace over the reservoir
states in the system’s density matrix can be performed
in the large bias limit (strong non-equilibrium limit)
V = µL − µR ≫ Γ,Ω0, U (19)
where the level (levels) of the SET carrying the current
are far away from the chemical potentials, and Γ is the
width of the level E0. In this derivation we assumed
only weak energy dependence of the transition ampli-
tudes Ωl,r ≡ ΩL,R and the density of the reservoir states,
ρ(El,r) = ρL,R. As a result we arrive at Bloch-type rate
equations for the reduced density matrix without any
additional assumptions. The general form of these equa-
tions is36,38
σ̇jj′ = i(Ej′ − Ej)σjj′ + i
σjkΩ̃k→j′ − Ω̃j→kσkj′
P2πρ(σjkΩk→k′Ωk′→j′ + σkj′Ωk→k′Ωk′→j)
P2πρ (Ωk→jΩk′→j′ +Ωk→j′Ωk′→j)σkk′ (20)
Here Ωk→k′ denotes the single-electron hopping ampli-
tude that generates the k → k′ transition. We distinguish
between the amplitudes Ω̃ describing single-electron hop-
ping between isolated states and Ω describing transitions
between isolated and continuum states. The latter can
generate transitions between the isolated states of the
system, but only indirectly, via two consecutive jumps of
an electron, into and out of the continuum reservoir states
(with the density of states ρ). These transitions are rep-
resented by the third and the fourth terms of Eq. (20).
The third term describes the transitions (k → k′ → j)
or (k → k′ → j′), which cannot change the number of
electrons in the collector. The fourth term describes the
transitions (k → j and k′ → j′) or (k → j′ and k′ → j)
which increase the number of electrons in the collector
by one. These two terms of Eq. (20) are analogues of the
“loss” (negative) and the “gain” (positive) terms in the
classical rate equations, respectively. The factor P2 = ±1
in front of these terms is due anti-commutation of the
fermions, so that P2 = −1 whenever the loss or the gain
terms in Eq. (20) proceed through a two-fermion state of
the dot. Otherwise P2 = 1.
Note that the reduction of the time-dependent
Schrödinger equation, i|Ψ̇(t)〉 = H |Ψ(t)〉, to Eqs. (20)
is performed in the limit of large bias without explicit
use of any Markov-type or weak coupling approxima-
tions. The accuracy of these equations is respectively
max(Γ,Ω0, U, T )/|µL,R−Ej |. A detailed example of this
derivation is presented in Appendix A for the case of res-
onant tunneling through a single level. The derivation
there and in Refs.37,38 were performed by assuming zero
temperature in the leads, T = 0. Yet, this assumption
is not important in the case of large bias, providing the
levels carrying the current are far away from the Fermi
energies, |µL,R − Ej | ≫ T .
IV. NO BACK-ACTION ON THE
ENVIRONMENT
A. Fluctuation of the tunneling coupling
Now we apply Eqs. (20) to investigate the qubit’s be-
havior in the configurations shown in Fig. 2. First we
consider the SET placed near the middle of the qubit,
Figs. 2a,3. In this case the electron current through the
SET will influence the coupling between two dots of the
(d)(b)(a) (c)
E E1 2
Ω0 Ω0’ ’
FIG. 3: The available discrete states of the entire system cor-
responding to the setup of Fig. 2a. ΓL,R denote the tunneling
rates to the corresponding reservoirs and Ω′0 = Ω0 − δΩ.
qubit, making it fluctuate between the values Ω0 and
Ω′0 = Ω0 − δΩ. The corresponding rate equations can be
written straightforwardly from Eqs. (20). One finds,
σ̇aa = −ΓLσaa + ΓRσbb − iΩ0(σac − σca), (21a)
σ̇bb = −ΓRσbb + ΓLσaa − iΩ′0(σbd − σdb), (21b)
σ̇cc = −ΓLσcc + ΓRσdd − iΩ0(σca − σac), (21c)
σ̇dd = −ΓRσdd + ΓLσcc − iΩ′0(σdb − σbd), (21d)
σ̇ac = −iǫ0σac − iΩ0(σaa − σcc)− ΓLσac
+ ΓRσbd, (21e)
σ̇bd = −iǫ0σbd − iΩ′0(σbb − σdd)− ΓRσbd
+ ΓLσac, (21f)
where ΓL,R = 2π|ΩL,R|2ρL,R are the tunneling rates from
the reservoirs and ǫ0 = E1 − E2.
These equations display explicitly the time evolution
of the SET and the qubit. The evolution of the for-
mer is driven by the first two terms in Eqs. (21a)-(21d).
They generate charge-fluctuations inside the quantum
dot of the SET (the transitions a←→b and c←→d),
described by the “classical” Boltzmann-type dynamics.
The qubit’s evolution is described by the Bloch-type
terms (c.f. Eqs. (3)), generating the qubit transitions
(a←→c and b←→d). Thus Eqs. (21) are quite general,
since they described fluctuations of the tunneling cou-
pling driven by the Boltzmann-type dynamics.
The resulting time evolution of the qubit is given by
the qubit (reduced) density matrix:
σ11(t) = σaa(t) + σbb(t) , (22a)
σ12(t) = σac(t) + σbd(t) , (22b)
and σ22(t) = 1− σ11(t).
Similarly, the charge fluctuations of SET are deter-
mined by the probability of finding the SET occupied,
P1(t) = σbb(t) + σdd(t) . (23)
It is given by the equation
Ṗ1(t) = ΓL − ΓP1(t) , (24)
obtained straightforwardly from Eqs. (21). Here Γ =
ΓL +ΓR is the total width. The same equation for P1(t)
can be obtained if the qubit is decoupled from the SET
(δΩ = 0). Thus there is no back-action of the qubit on
the charge fluctuations inside the SET in the limit of
large bias voltage.
Consider first the stationary limit, t → ∞, where
Ṗ1(t)→ 0 and σ̇(t)→ 0. It follows from Eq. (24) that the
probability of finding the SET occupied in this limit is
P̄1 = ΓL/Γ. This implies that the fluctuations of the cou-
pling Ω0, induced by the SET, would take place around
the average value Ω = Ω0 − P̄1 δΩ.
With respect to the qubit in the stationary limit, one
easily obtains from Eqs. (21) that the qubit density ma-
trix always becomes the statistical mixture (5), when
t → ∞. This takes place for any initial conditions and
any values of the qubit and the SET parameters. There-
fore the effect of the fluctuating charge inside the SET
does not lead to relaxation of the qubit, but rather to its
decoherence.
It is important to note, however, that for the aligned
qubit, ǫ = 0, the decoherence due to fluctuations of the
tunneling coupling Ω0 is not complete. Indeed, it follows
from Eqs. (21) that d/dt[Re σ12(t)] = 0. The reason is
that the corresponding operator, a
1a2 + a
2a1 commutes
with the total Hamiltonian H = Hqb + HSET + Hint,
Eqs. (1), (14) and (15), for E1 = E2. As a result,
Re σ12(t) = Re σ12(0).
In order to determine the decoherence rate analytically,
we perform a Laplace transform on the density matrix,
σ̃(E) =
σ(t) exp(−iEt)dE. Then solving Eq. (21) we
can determine the decoherence rate from the locations
of the poles of σ̃(E) in the complex E-plane. Consider
for instance the case of ǫ0 = 0 and the symmetric SET,
ΓL = ΓR = Γ/2. One finds from Eqs. (21) and (22a) that
σ̃11(E) =
i(E − 2Ω + iΓ)
4(E − 2Ω + iΓ/2)2 + Γ2 − (2 δΩ)2
i(E + 2Ω+ iΓ)
4(E + 2Ω+ iΓ/2)2 + Γ2 − (2 δΩ)2
. (25)
Upon performing the inverse Laplace transform,
σ11(t) =
∞+i0∫
−∞+i0
σ̃11(E) e
−iEt dE
, (26)
and closing the integration contour around the poles of
the integrand, we obtain for Γ > 2δΩ and t≫ 1/Γ
σ11(t)− (1/2) ∝ e−(Γ−
Γ2−4δΩ2)t/2 sin(2Ω t). (27)
Comparing this result with Eq. (12) we find that the
decoherence rate is
Γd = 2
Γ2 − 4δΩ2
Γ≫δΩ−→ (2δΩ)2/Γ . (28)
For ǫ0 6= 0 and ǫ0,Γ ≪ Ω the decoherece rate Γd is
multiplied by an additional factor [1− (ǫ0/2Ω)2].
In a general case, ΓL 6= ΓR, we obtain in the same
limit (ΓL,R ≫ δΩ) for the decoherence rate:
(4 δΩ)2
(ΓL + ΓR)3
It is interesting to compare this result with the fluctu-
ation spectrum of the charge inside the SET, Eq. (B8),
Appendix B. We find
Γd = 2 (δωR)
2 SQ(0) , (30)
where ωR =
4Ω2 + ǫ20 is the Rabi frequency. The latter
represents the energy splitting in the diagonalized qubit
Hamiltonian. Thus δωR corresponds to the amplitude of
energy level fluctuations in a single dot.
Although Eq. (30) has been obtained for small fluc-
tuations δωR, it might be approximately correct even
if δωR is of the order of Γ. It is demonstrated in
Fig. 4, where we compare σ11(t) and σ12(t), obtained
from Eqs. (21) and (22) (solid line) with those from
Eqs. (3a) and (4) (dashed line) for the decoherence rate
Γd given by Eq. (30). The initial conditions correspond to
σ11(0) = 1 and σ12(0) = 0 (respectively, σaa(0) = ΓR/Γ
and σbb(0) = ΓR/Γ).
In the case of aligned qubit, however, Re σ12(t) =
Re σ12(0), as was explained above. On the other hand,
one always obtains from (3a) and (4) that Re [σ12(t →
∞)] = 0. Therefore the phenomenological Bloch equa-
tions are not applicable for evaluation of Re [σ12(t)],
even in the weak coupling limit (besides the case of
Re [σ12(t = 0)] = 0).
In the large coupling regime (δΩ≫ Γ) the phenomeno-
logical Bloch equations, Eqs. (3a) and (4), cannot be
used, as well. Consider for simplicity the case of ǫ = 0
and ΓL,R = Γ/2. Then one finds from Eq. (27) that
the damping oscillations between the two dots take place
at two different frequencies, 2Ω ±
(δΩ)2 − (Γ/2)2, in-
stead of the one frequency, ωR = 2Ω, given the Bloch
equations. Moreover, Eq. (30) does not reproduce the
decoherence (damping) rate in this limit. Indeed, one
obtains from Eq. (28) that the decoherence rate Γd = 2Γ
for δΩ > Γ/2, so Γd does not depend on the coupling
(δΩ) at all.
B. Fluctuation of the energy level
Consider the SET placed near one of the qubit dots,
as shown in Fig. 2b. In this case the qubit-SET inter-
action term is given by Eq. (16). As a result the energy
level E1 will fluctuate under the influence of the fluctua-
tions of the electron charge inside the SET. The available
discrete states of the entire system are shown in Fig. 5.
Using Eqs. (20) we can write the rate equations, similar
10 20 30 40 50
(a) (b)
10 20 30 40 50
Re 12
Im 12
11 σ  ( )
σ  ( )
σ  ( )
FIG. 4: The occupation probability of the first dot of the qubit
for ǫ = 2Ω, ΓL = Ω, ΓR = 2Ω and δΩ = 0.5Ω. The solid line
is the exact result, whereas the dashed line is obtained from
the Bloch-type rate equations with the decoherence rate given
by Eq. (30).
(d)(b)(a) (c)
’’’ ’ Γ
R ΓL ΓR
E0 E0
0Ω0Ω0
E2 E2
E +U1Ω
FIG. 5: The available discrete states of the entire system for
the configuration shown in Fig. 2b. Here U is the repulsion
energy between the electrons.
to Eqs. (21),
σ̇aa = −Γ′Lσaa + Γ′Rσbb − iΩ0(σac − σca), (31a)
σ̇bb = −Γ′Rσbb + Γ′Lσaa − iΩ0(σbd − σdb), (31b)
σ̇cc = −ΓLσcc + ΓRσdd − iΩ0(σca − σac), (31c)
σ̇dd = −ΓRσdd + ΓLσcc − iΩ0(σdb − σbd), (31d)
σ̇ac = −iǫ0σac − iΩ0(σaa − σcc)−
ΓL + Γ
Rσbd, (31e)
σ̇bd = −i(ǫ0 + U)σbd − iΩ0(σbb − σdd)−
ΓR + Γ
Lσac , (31f)
where Γ′L,R are the tunneling rate at the energy E0+U
Let us assume that Γ′L,R = ΓL,R. Then it follows from
Eqs. (31) that the behavior of the charge inside the SET
is not affected by the qubit, the same as in the previous
case of the Rabi frequency fluctuations. Also the qubit
density matrix becomes the mixture (5) in the stationary
state for any values of the qubit and the SET parameters.
Hence, there is no qubit relaxation in this case either
(except for the static qubit, Ω0 = 0, and σ11(0) 6= σ22(0),
Eq. (11)).
Since according to Eq. (24), the probability of finding
an electron inside the SET in the stationary state is P̄1 =
ΓL/Γ, the energy level E1 of the qubit is shifted by P̄1U .
Therefore it is useful to define the “renormalized” level
displacement, ǫ = ǫ0 + P̄1U .
As in the previous case we use the Laplace transform,
σ(t)→ σ̃(E), in order to determine the decoherence rate
analytically. In the case of ΓL = ΓR = Γ/2 and ǫ = 0 we
obtain from Eqs. (31)
σ̃11(E) =
32(E + iΓ)Ω20
U2 − 4E(E + iΓ)
. (32)
The position of the pole in the second term of this expres-
sion determines the decoherence rate. In contrast with
Eq. (25), however, the exact analytical expression for the
decoherence rate (Γd) is complicated, since it is given by
a cubic equation. We therefore evaluate Γd in a different
way, by substituting E = ±2Ω0 − iγ in the second term
of Eq. (32) and then expanding the latter in powers of
γ by keeping only the first two terms of this expansion.
The decoherence rate Γd is related to γ by Γd = 4γ, as
follows from Eq. (12). Then we obtain:
2(Γ2 + 4Ω20)
for U ≪ (Ω20 + ΓΩ0)1/2
64ΓΩ20
U2 + 16Ω20
for U ≫ (Ω20 + ΓΩ0)1/2
In general, if ΓL 6= ΓR, one finds from Eqs. (31) that
Γd = 2U
2ΓLΓR/[Γ(Γ
2 + 4Ω20)] for U ≪ (Ω20 + ΓΩ0)1/2.
The same as in the previous case, Eq. (30), the deco-
herence rate in a weak coupling limit is related to the
fluctuation spectrum of the SET, SQ(ω), Eq. (B8), but
now taken at a different frequency, ω = 2Ω0. The lat-
ter corresponds to the level splitting of the diagonalized
qubit’s Hamiltonian, ωR. Thus,
Γd = U
2 SQ(ωR) , (34)
which can be applied also for ǫ 6= 0. This is illustrated
by Fig. 6 which shows σ11(t) obtained from Eqs. (31)
and (22) (solid line) with Eqs. (3a) and (4) (dashed line)
for the decoherence rate Γd given by Eq. (34). As in
the previous case, shown in Fig. 4, the initial conditions
correspond to σ11(0) = 1 and σ12(0) = 0 (respectively,
σaa(0) = ΓR/Γ and σbb(0) = ΓR/Γ). One finds from
Fig. 6 that Eq. (34) can be used for an estimation of Γd
even for U ∼ Γ,Ω0.
In contrast with the tunneling-coupling fluctuations,
Eq. (30), where the decoherence rate is given by SQ(0),
the fluctuations of the qubit’s energy level generate the
decoherence rate, determined by the fluctuation spec-
trum at Rabi frequency, SQ(ωR), Eq. (34). A similar
distinction between the decoherence rates generated by
different components of the fluctuating field, exists in a
phenomenological description of magnetic resonance27.
One can understand this distinction by diagonalizing the
qubit’s Hamiltonian. In this case the Rabi frequency,
ωR, becomes the level splitting of the qubit’s states
|±〉 = (|1〉 ± |2〉)/
2 (for ǫ = 0). So in this basis, the
tunneling-coupling fluctuations correspond to simultane-
ous fluctuations of the energy levels in the both dots.
10 20 30 40 50
(a) (b)
10 20 30 40 50
Re 12
Im 12
11 σ  ( )
σ  ( )
σ  ( )
FIG. 6: The probability of finding the electron in the first
dot of the qubit for ǫ = 2Ω0, ΓL = Ω0, ΓR = 2Ω0 and U =
0.5Ω0. The solid line is the exact result, whereas the dashed
line is obtained from the Bloch-type rate equations with the
decoherence rate given by Eq. (34).
Since these fluctuations are “in phase”, we could expect
that the corresponding dephasing rate is determined by
spectral density at zero frequency. In fact, it looks like
as fluctuations of a single dot state, considered by Levin-
son in a weak coupling limit31. On the other hand by
fluctuating the energy level in one of the dots only, one
can anticipate that the corresponding dephasing rate is
determined by the fluctuation spectrum at the Rabi fre-
quency, ωR, Eq. (34), which is a frequency of the inter-dot
transitions.
Since ωR can be controlled by the qubit’s levels dis-
placement, ǫ, the relation (34) can be implied by using
qubit for a measurement of the shot-noise spectrum of the
environment18,19,40. For instance, it can be done by at-
taching a qubit to reservoirs at different chemical poten-
tials. The corresponding resonant current which would
flow through the qubit in this case, can be evaluated via
a simple analytical expression13 that includes explicitly
the decoherence rate, Eq. (34). Thus by measuring this
current for different level displacement of the qubit (ǫ0),
one can extract the spectral density of the fluctuating
environment acting on the qubit18.
Although Eq. (34) for the decoherence rate has been
obtained by using a particular mechanism for fluctuations
of the qubit’s energy levels, we suggest that this mecha-
nism is quite general. Indeed, the rate equations (31) can
describe any fluctuating media near a qubit, driven by the
Boltzmann type of equations. Therefore it is rather nat-
ural to assume that Eq. (34) would be valid for any type
of such (classical) environment in weak coupling limit.
This implies that the decoherence rate is always deter-
mined via the spectral density of a fluctuating qubit’s
level, whereas the nature of a particular medium inducing
these fluctuations would be irrelevant. In order to sub-
stantiate this point it is important to compare Eq. (34)
with the corresponding decoherence rate induced by the
thermal environment in the framework of the spin-boson
model. In a weak damping limit this model predicts1,2
T−11 = T
2 = (q
0/2)S(ωR) , where q0 is a coupling of
the medium with the qubit levels (q0 corresponds to U in
our case) and S(ω) is a spectral density. Using Eq. (10)
one finds that this result coincides with Eq. (34).
0 5 10 15 20
0 5 10 15 20
t t1111
U/      =100ΩU/      =10
σ  ( )
σ  ( )
(a) (b)
FIG. 7: The probability of finding the electron in the first
dot of the qubit for ǫ = 0, ΓL = ΓR = Ω0 and U , as given
by Eqs. (31) (solid line) and from the Bloch-type equations
(dashed line) with the decoherence rate given by Eq. (33).
C. Strong-coupling limit and localization
Let us consider the limit of U ≫ (Ω20 + ΓΩ0)1/2. Our
rate equation (31) are perfectly valid in this region, pro-
viding only that E0 + U is deeply inside of the potential
bias, Eq. (19). We find from Eq. (33) that the deco-
herence rate is not directly related to the spectrum of
fluctuations in strong coupling limit. In addition, the ef-
fective frequency of the qubit’s Rabi oscillations (ω
decreases in this limit. Indeed, by using Eqs. (32), (26),
one finds that the main contribution to σ11(t), is coming
from a pole of σ̃11(E), which lies on the imaginary axis.
This implies that the effective frequency of Rabi oscilla-
tions strongly decreases when U ≫ (Ω20 + ΓΩ0)1/2. In
addition, the decoherence rate Γd → 0 in the same limit,
Eq. (33). As a result, the electron would localize in the
initial qubit state, Fig. 7.
The results displayed in this figure show that the solu-
tion of the Bloch-type rate equations, with the decoher-
ence rate given by Eq. (33), represents damped oscilla-
tions (dashed line). It is very far from the exact result
(solid line), obtained from Eqs. (31) and corresponding
to the electron localization in the first dot. The latter is
a result of an effective decrease of the Rabi frequency for
large U that slows down electron transitions between the
dots. Thus such an environment-induced localization is
different from the Zeno-type effect (unlike an assumption
of Ref.12). Indeed, the Zeno effect takes place whenever
the decoherence rate is much larger then the coupling be-
tween the qubit’s states13,33. However, the decoherence
rate in the strong coupling limit is much smaller then the
coupling Ω0 . In fact, the localization shown in Fig. 7
is rather similar to that in the spin-boson model1,2. It
shows that in spite of their defferences, both models trace
the same physics of the back-action of the environment
(SET) on the qubit.
V. BACK-ACTION OF THE QUBIT ON THE
ENVIRONMENT
A. Weak back-action effect
Now we investigate a weak dependence of the width’s
ΓL,R on the energy U , Fig. 5. We keep only the linear
term, Γ′L,R = ΓL,R+αL,RU , by assuming that U is small.
(A similar model has been considered in28,41). In con-
trast with the previous examples, where the widths have
not been dependent on the energy, the qubit’s oscillation
would affect the SET current and its charge correlator.
A more interesting case corresponds to αL 6= αR. Let us
take for simplicity αL = 0 and αR = α 6= 0.
Similarly to the previous case we introduce the “renor-
malized” level displacement, ǫ = ǫ0 − (ΓL/Γ)U , where
ǫ = 0 corresponds to the aligned qubit. Solving Eqs. (31)
in the steady-state limit, σ̄ = σ(t → ∞), and keeping
only the first term in expansion in powers of U , we find
for the reduced density matrix of the qubit, Eqs. (22):
− α ǫ
αΩ0(1 + c αU)
αΩ0(1 + c αU)
, (35)
where c = (αǫ − 2Γ)/(4ΓRΓ). It follows from Eqs. (35)
that the qubit’s density matrix in the steady-state is no
longer a mixture, Eq. (5) . Indeed, the probability to
occupy the lowest level is always larger than 1/2 and
σ̄12 6= 0. This implies that relaxation takes place to-
gether with decoherence. The ratio of the relaxation and
decoherence rates is given by the off-diagonal terms of
the reduced density matrix of the qubit. For ǫ = 0 one
finds from Eq. (13) that Γd/Γr = σ̄
12 − 2.
In order to find a relation between the decoherence
and relaxation rates, Γd,r, and the fluctuation spectrum
of the qubit energy level, SQ(ω), we first evaluate the
total damping rate of the qubit’s oscillations (γ). Using
Eq. (12) we find that this quantity is related to the deco-
herence and relaxation rates by γ = (Γd + 2Γr)/4. The
same as in the previous case the rate γ is determined by
poles of Laplace transformed density matrix σ(t)→ σ̃(E)
in the complex E-plane. Consider for simplicity the case
of ǫ = 0 and ΓL = ΓR = Γ/2. Performing the Laplace
transform of Eqs. (31) we look for the poles of σ11(E) at
E = ±2Ω0 − iγ by assuming that γ is small. We obtain
Γd + 2Γr =
2(Γ2 + 4Ω20)
Γ− αU Γ
2 − 4Ω20
2 (Γ2 + 4Ω20)
for U ≪ Ω0.
Now we evaluate the correlator of the charge inside the
SET, SQ(ω) which induces the energy-level fluctuations
of the qubit. Using Eqs. (31) and (B6) we find,
SQ(ω) =
2 (Γ2 + ω2)
− αU Γ
2 − ω2
4 (Γ2 + ω2)
for αU ≪ Γ. Therefore in the limit of U ≪ Ω0 and αU ≪
Γ the total damping rate of the qubit’s oscillations is
directly related to the spectral density of the fluctuations
spectrum taken at the Rabi frequency,
Γd + 2Γr = U
2SQ(2Ω0). (38)
This represents a generalization of Eq. (34) for the case
of a weak back-action of qubit oscillations on the spectral
density of the environment. As a result, the qubit dis-
plays relaxation together with decoherence. It is remark-
able that the total qubit’s damping rate is still given by
the fluctuation spectrum of the SET (environment) mod-
ulated by the qubit. Note that Eq. (38) can be applied
only if the modulation of the tunneling rate through the
SET (tunneling current) is small αU ≪ Γ, in addition to
a weak distortion of the qubit (U ≪ Ω0).
In the case of strong back-action of the qubit on the
environment the decorerence and relaxation rates of the
qubit are not directly related to the fluctuation spectrum
of the environment, even if the distortion of the qubit is
small. This point is illustrated by the following example.
B. Strong back-action
Until now we considered the case where E0 + U ≪
µL, so that the interacting electron of the SET remains
deeply inside the voltage bias. If however, the interaction
U between the qubit and the SET is such that E0 +
U ≫ µL, the qubit’s oscillation would strongly affect the
fluctuation of charge inside the SET. Indeed, the current
through the SET is blocked whenever the level E1 of the
qubit is occupied, Fig. 8. In fact, this case can be treated
with small modification of the rate equations (31), if only
µL − E0 ≫ Γ and E0 + U − µL ≫ Γ, where E0 is a level
of the SET carrying the current.
The corresponding quantum rate equations describing
the system are obtained directly from Eqs. (20). As-
suming that the widths ΓL,R are energy independent we
find16
σ̇aa = (ΓL + ΓR)σbb − iΩ0(σac − σca), (39a)
σ̇bb = −(ΓR + ΓL)σbb − iΩ0(σbd − σdb), (39b)
σ̇cc = −ΓLσcc + ΓRσdd − iΩ0(σca − σac), (39c)
σ̇dd = −ΓRσdd + ΓLσcc − iΩ0(σdb − σbd), (39d)
σ̇ac = −iǫ0σac − iΩ0(σaa − σcc)−
+ ΓRσbd, (39e)
σ̇bd = −i(ǫ0 + U)σbd − iΩ0(σbb − σdd)
σbd . (39f)
Solving Eqs. (39) in the stationary limit, σ̄ = σ(t →
∞) and introducing the “renormalized” level displace-
ment, ǫ = ǫ0 −UΓL/(2Γ), we obtain for the qubit’s den-
(d)(b)(a) (c)
E +U1
FIG. 8: The available discrete states of the entire system
when the electron-electron repulsive interaction U breaks off
the current through the SET.
sity matrix, Eqs. (22) in the steady state:
σ̄11 =
− 8ǫU
16ǫ2 + 8Uǫ+ 48Ω20 + 9(U
2 + Γ2)
,(40a)
σ̄12 =
12UΩ0
16ǫ2 + 8Uǫ+ 48Ω20 + 9 (U
2 + Γ2)
, (40b)
where for simplicity we considered the symmetric case,
ΓL = ΓR = Γ/2. It follows from Eqs. (40) that similarly
to the previous example, the qubit’s density matrix is no
longer a mixture (5). The relaxation takes place together
with decoherence in this case too.
Let us consider weak distortion of the qubit by the
SET, U < Ω0. Although the values of U are restricted
from below (U ≫ Γ+µL−E0), this limit can be achieved
if the level E0 is close to the Fermi energy, providing only
that µL − E0 ≫ Γ, and Γ≪ U . Now we evaluate σ11(t)
with the rate equations (39) and then compare it with
the same quantity obtained from the Bloch equations,
Eq. (12), where Γd,r are given by Eqs. (34)and (13). The
corresponding charge-correlator, SQ(ωR), is evaluated by
Eqs. (B6) and (39). As an example, we take symmetric
qubit with aligned levels, ǫ = 0, ΓL = ΓR = 0.05Ω0 and
U = 0.5Ω0. The decoherence and relaxation rates, corre-
sponding to these parameters are respectively: Γd/Ω0 =
0.0038 and Γr/Ω0 = 0.00059.
The results are presented in Fig. 9a. The solid line
shows σ11(t), obtained from the rate equations (39),
where the dashed line is the same quantity obtained from
Eq. (12). We find that Eq. (34) (or (38)) underestimates
the actual damping rate of σ11(t) by an order of mag-
nitude). This lies in a sharp contrast with the previous
case, where the energy level of the SET is not distorted
by the qubit, Γ′L,R = ΓL,R, Fig. 5. Indeed, in this case
σ11(t) obtained Eq. (12) with Γd given by Eq. (34) and
Γr = 0, agrees very well with that obtained from the rate
equations (31), as shown in Fig. 9b.
Such an example clearly illustrates that the decoher-
ence is not related to the fluctuation spectrum of the
environment, whenever the environment is strongly af-
fected by the qubit, even if the coupling with a qubit
is small. This is a typical case of measurement, corre-
sponding to a noticeable response of the environment to
the qubit’s state (a “signal”).
20 40 60 80 100
20 40 60 80 100
t t11 11σ  ( ) σ  ( )
FIG. 9: (a) The probability of finding the electron in the first
dot of the qubit for ǫ = 0, ΓL = ΓR = 0.05Ω0 and U = 0.5Ω0.
The solid line is obtained from Eqs. (39), whereas the dashed
line corresponds to the Eq. (12) with Γd given by Eq. (34);
(b) the same for the case, shown in Fig. 5, where the solid
line corresponds to Eqs. (31).
VI. SUMMARY
In this paper we propose a simple model describing
a qubit interacting with fluctuating environment. The
latter is represented by a single electron transistor (SET)
in close proximity of the qubit. Then the fluctuations
of the charge inside the SET generate fluctuating field
acting on the qubit. In the limit of large bias voltage, the
Schrödinger equation for the entire system is reduced to
the Bloch-type rate equations. The resulting equations
are very simple, so that one can easily analyze the limits
of weak and strong coupling of the qubit with the SET.
We considered separately two different cases: (a) there
is no back-action of the qubit on the SET behavior, so
that the latter represents a “pure environment”; and (b)
the SET behavior depends on the qubit’s state. In the
latter case the SET can “measure” the qubit. The setup
corresponding to the “pure environment” is realized when
the energy level of the SET carrying the current lies
deeply inside the potential bias. The second (measure-
ment) regime of the SET is realized when the tunnel-
ing widths of the SET are energy dependent, or when
the energy level of the SET carrying the current is close
enough to the Fermi level of the corresponding reservoir.
Then the electron-electron interaction between the qubit
and the SET modulates the electron current through the
In the case of the “pure environment” (“no-
measurement” regime) we investigate separately two dif-
ferent configurations of the qubit with respect to the
SET. In the first one the SET produces fluctuations of
the off-diagonal coupling (Rabi frequency) between two
qubit’s states. In the second configuration the SET pro-
duces fluctuations of the qubit’s energy levels. In the
both cases we find no relaxation of the qubit, despite
the energy transfer between the qubit and the SET can
take place. As a result the qubit always turns asymptot-
ically to the statistical mixture. We also found that in
both cases the decoherence rate of the qubit in the weak
coupling limit is given by the spectral density of the cor-
responding fluctuating parameter. The difference is that
in the case of the off-diagonal coupling fluctuations the
spectral density is taken at zero frequency, whereas in
the case of the energy level fluctuations it is taken at the
Rabi-frequency.
In the case of the strong coupling limit, however, the
decoherence rate is not related to the fluctuation spec-
trum. Moreover we found that the electron in the qubit
is localized in this limit due to an effective decrease of
the off-diagonal coupling. This phenomenon may resem-
ble the localization in the spin-boson model in the strong
coupling limit.
If the charge correlator and the total SET current are
affected by the qubit (back-action effect), we found that
the off-diagonal density-matrix elements of the qubit sur-
vive in the steady-state limit and therefore the relax-
ation rate is not zero. We concentrated on the case of
weak coupling, when the Coulomb repulsion between the
qubit and the SET is smaller then the Rabi frequency.
The back-action of the qubit on the SET, however, can
be weak or strong. In the first case we found that the
total damping rate of the qubit due to decoherence and
relaxation is again given by the spectral density of the
SET charge fluctuations, modulated by the qubit. This
relation, however, is not working if the back-action is
strong. Indeed, we found that the damping rate of the
qubit in this case is larger by an order of magnitude than
that given by the spectral density of the corresponding
fluctuating parameter.
This looks like that in the strong back-action of the
qubit on the SET the major component of decoherence is
not coming from the fluctuation spectrum of the qubit’s
parameters only, but also from the measurement “sig-
nal” of the SET. On the first sight it could agree with
an analysis of Ref.30, suggesting that the decoherence
rate contains two components, generated by a measure-
ment and by a “pure environment” (environmental fluc-
tuations). The latter therefore represents an unavoid-
able decoherence, generated by any environment. Yet,
in a weak coupling regime such a separation seems not
working. In this case the damping (decoherence) rate is
totally determined by the environment fluctuations, even
so modulated by the qubit.
Although our model deals with a particular setup, it
bears the main physics of a fluctuating environment, act-
ing on a qubit. Indeed, the Bloch-type rate equations,
which we used in our analysis have a pronounced phys-
ical meaning: they relate the variation of qubit param-
eters with a nearby fluctuating field described by rate
equations. A particular mechanism, generated this field
should not be relevant for an evaluations of the deco-
herence and relaxation rates, but only its fluctuation
spectrum. Indeed, in the weak coupling limit our re-
sult for the decorence rate coincides with that obtained
in a framework of the spin-boson model. Thus our model
can be considered as a generic one. Its main advantage is
that it can be easily extended to multiple coupled qubits.
Such an analysis would allow to determine how decoher-
ence scales with number of qubits42, which is extremely
important for a realization of quantum computations.
In addition, our model can be extended to a more
complicated fluctuating environments, such as containing
characteristic frequencies in its spectrum. It would for-
mally correspond to a replacement of the SET in Fig. 2
by a double-dot (DD) coupled to the reservoirs43. All
these situations, however, must be a subject of a sepa-
rate investigation.
VII. ACKNOWLEDGEMENT
One of us (S.G.) thanks T. Brandes and C. Emary
for helpful discussions and important suggestions. S.G is
also grateful to the Max Planck Institute for the Physics
of Complex Systems, Dresden, Germany, and to NTT Ba-
sic Research Laboratories, Atsugi-shi, Kanagawa, Japan,
for kind hospitality.
APPENDIX A: QUANTUM-MECHANICAL
DERIVATION OF RATE EQUATIONS FOR
QUANTUM TRANSPORT
Consider the resonant tunneling through the SET,
shown schematically in Fig. 10. The entire system is
described by the Hamiltonian HSET, given by Eq. (14).
The wave function can be written in the same way as
Eq. (17), where the variables related to the qubit are
omitted,
|Ψ(t)〉 =
b(t) +
b0l(t)c
0cl +
brl(t)c
l<l′,r
b0rll′(t)c
rclcl′ + · · ·
〉. (A1)
Substituting |Ψ(t)〉 into the time-dependent Schrödinger
equation, i∂t|Ψ(t)〉 = HSET|Ψ(t)〉, and performing the
Laplace transform, b̃(E) =
exp(iEt) b(t)dt, we obtain
the following infinite set of algebraic equations for the
FIG. 10: Resonant tunneling through a single dot. µL,R are
the Fermi energies in the collector and emitter, respectively.
amplitudes b̃(E):
Eb̃(E) −
Ωlb̃0l(E) = i (A2a)
(E + El − E0)b̃0l(E)− Ωlb̃(E)
Ωr b̃lr(E) = 0 (A2b)
(E + El − Er)b̃lr(E)− Ωr b̃0l(E)
Ωl′ b̃0ll′r(E) = 0 (A2c)
(E + El + El′ − E0 − Er)b̃0ll′r(E) − Ωl′ b̃lr(E)
+ Ωlb̃l′r(E)−
Ωr′ b̃ll′rr′(E) = 0 (A2d)
· · · · · · · · ·
(The r.h.s of Eq. (A2a) reflects the initial condition.)
Let us replace the amplitude b̃ in the term
Ωb̃ of each
of the equations (A2) by its expression obtained from the
subsequent equation. For example, substituting b̃0l(E)
from Eq. (A2b) into Eq. (A2a) we obtain
E + El − E0
b̃(E)
E + El − E0
b̃lr(E) = i. (A3)
Since the states in the reservoirs are very dense (contin-
uum), one can replace the sums over l and r by integrals,
for instance
ρL(El) dEl , where ρL(El) is the
density of states in the emitter, and Ωl,r → ΩL,R(El,r).
Consider the first term
Ω2L(El)
E + El − E0
ρL(El)dEl (A4)
where Λ is the cut-off parameter. Assuming weak en-
ergy dependence of the couplings ΩL,R and the density of
states ρL,R, we find in the limit of high bias, µL = Λ→∞
S1 = −iπΩ2L(E0 − E)ρL(E0 − E) = −i
. (A5)
Consider now the second sum in Eq. (A3).
ρR(Er)dEr
ΩL(El)ΩR(Er)b̃lr(E,El, Er)
E + El − E0
ρL(El)dEl , (A6)
where we replaced b̃lr(E) by b̃(E,El, Er) and took µL =
Λ, µR = −Λ. In contrast with the first term of Eq. (A3),
the amplitude b̃ is not factorized out the integral (A6).
We refer to this type of terms as “cross-terms”. Fortu-
nately, all “cross-terms” vanish in the limit of large bias,
Λ → ∞. This greatly simplifies the problem and is very
crucial for a transformation of the Schrödinger to the rate
equations. The reason is that the poles of the integrand
in the El(Er)-variable in the “cross-terms” are on the
same side of the integration contour. One can find it by
using a perturbation series the amplitudes b̃ in powers of
Ω. For instance, from iterations of Eqs. (A2) one finds
b̃(E,El, Er) =
iΩLΩR
E(E + El − Er)(E + El − E0)
+ · · ·
The higher order powers of Ω have the same structure.
Since E → E + iǫ in the Laplace transform, all poles of
the amplitude b̃(E,El, Er) in the El-variable are below
the real axis. In this case, substituting Eq. (A7) into
Eq. (A6) we find
(E + iǫ)(E + E0 − E1 + iǫ)2(E + E0 − Er + iǫ)
+ · · ·
dEl = 0 , (A8)
Thus, S2 → 0 in the limit of µL →∞, µR → −∞.
Applying analogous considerations to the other equa-
tions of the system (A2), we finally arrive at the following
set of equations:
(E + iΓL/2)b̃(E) = i (A9a)
(E + El − E0 + iΓR/2)b̃0l(E)
− Ωlb̃(E) = 0 (A9b)
(E + El − Er + iΓL/2)b̃lr(E)
− Ωrb̃0l(E) = 0 (A9c)
(E + El + El′ − E0 − Er + iΓR/2)b̃0ll′r(E)
− Ωl′ b̃lr(E) + Ωlb̃l′r(E) = 0 (A9d)
· · · · · · · · ·
Eqs. (A9) can be transformed directly to the reduced
density matrix σ
(n,n′)
jj′ (t), where j = 0, 1 denote the state
of the SET with an unoccupied or occupied dot and n de-
notes the number of electrons which have arrived at the
collector by time t. In fact, as follows from our derivation,
the diagonal density-matrix elements, j = j′and n = n′,
form a closed system in the case of resonant tunneling
through one level, Fig. 10. The off-diagonal elements,
j 6= j′, appear in the equation of motion whenever more
than one discrete level of the system carry the transport
(see Eq. (20). Therefore we concentrate below on the di-
agonal density-matrix elements only, σ
00 (t) ≡ σ
(n,n)
00 (t)
and σ
11 (t) ≡ σ
(n,n)
11 (t). Applying the inverse Laplace
transform on finds
00 (t) =
l...,r...
dEdE′
b̃l · · ·
r · · ·
(E)b̃∗l · · ·
r · · ·
(E′)ei(E
′−E)t (A10a)
11 (t) =
l...,r...
dEdE′
0l · · ·
r · · ·
(E)b̃∗
0l · · ·
r · · ·
(E′)ei(E
′−E)t (A10b)
Consider, for instance, the term σ
11 (t) =
l |b0l(t)|2.
Multiplying Eq. (A9b) by b̃∗0l(E
′) and then subtracting
the complex conjugated equation with the interchange
E ↔ E′ we obtain
dEdE′
(E′ − E − iΓR)b̃0l(E)b̃∗0l(E′)
− 2Im
Ωlb̃0l(E)b̃
∗(E′)
′−E)t = 0 (A11)
Using Eq. (A10b) one easily finds that the first inte-
gral in Eq. (A11) equals to −i[σ̇(0)11 (t)+ΓRσ
11 (t)]. Next,
substituting
b̃0l(E) =
Ωlb̃(E)
E + El − E0 + iΓR/2
(A12)
from Eq. (A9b) into the second term of Eq. (A11), and
replacing a sum by an integral, one can perform the El-
integration in the large bias limit, µL → ∞, µR → −∞.
Then using again Eq. (A10b) one reduces the second
term of Eq. (A11) to iΓLσ
00 (t). Finally, Eq. (A11) reads
11 (t) = ΓLσ
00 (t)− ΓRσ
11 (t).
The same algebra can be applied for all other am-
plitudes b̃α(t). For instance, by using Eq. (A10a) one
easily finds that Eq. (A9c) is converted to the following
rate equation σ̇
00 (t) = −ΓLσ
00 (t) + ΓRσ
11 (t). With
respect to the states involving more than one electron
(hole) in the reservoirs (the amplitudes like b̃0ll′r(E) and
so on), the corresponding equations contain the Pauli ex-
change terms. By converting these equations into those
for the density matrix using our procedure, one finds the
“cross terms”, like
Ωlb̃l′r(E)Ωl′ b̃
′), generated by
Eq. (A9d). Yet, these terms vanish after an integration
over El(r) in the large bias limit, as the second term in
Eq. (A3). The rest of the algebra remains the same,
as described above. Finally we arrive at the following
infinite system of the chain equations for the diagonal
elements, σ
00 and σ
11 , of the density matrix,
00 (t) = −ΓLσ
00 (t) , (A13a)
11 (t) = ΓLσ
00 (t)− ΓRσ
11 (t) , (A13b)
00 (t) = −ΓLσ
00 (t) + ΓRσ
11 (t) , (A13c)
11 (t) = ΓLσ
00 (t)− ΓRσ
11 (t) , (A13d)
· · · · · · · · ·
Summing over n in Eqs. (A13) we find for the reduced
density matrix of the SET, σ(t) =
(n)(t), the fol-
lowing “classical” rate equations,
σ̇00(t) = −ΓLσ00(t) + ΓRσ11(t) (A14a)
σ̇11(t) = ΓLσ00(t)− ΓRσ11(t) (A14b)
These equations represent a particular case of our general
quantum rate equations (20), which are derived using the
above described technique37,38.
APPENDIX B: CORRELATOR OF ELECTRIC
CHARGE INSIDE THE SET.
The charge correlator inside the SET is given by
SQ(ω) = S̄Q(ω) + S̄Q(−ω), where
S̄Q(ω) =
〈δQ̂(0)δQ̂(t)〉eiωtdt . (B1)
Here δQ̂(t) = c
0(t)c0(t) − q̄ and q̄ = P̄1 = P1(t → ∞) is
the average charge inside the dot. Since the initial state,
t = 0 in Eq. (B1) corresponds to the steady state, one
can represent the time-correlator as
〈δQ̂(0)δQ̂(t)〉 =
q=0,1
Pq(0)(q − q̄)(〈Qq(t)〉 − q̄) , (B2)
where Pq(0) is the probability of finding the charge q =
0, 1 inside the quantum dot in the steady state, such that
P1(0) = q̄ and P0(0) = 1 − q̄, and 〈Qq(t)〉 = P (q)1 (t) is
the average charge in the dot at time t, starting with the
initial condition P
1 (0) = q. Substituting Eq. (B2) into
Eq. (B1) we finally obtain
S̄Q(ω) = q̄(1 − q̄)[P̃ (1)1 (ω)− P̃
1 (ω)] , (B3)
where P̃
1 (ω) is a Laplace transform of P
1 (t). These
quantities are obtained directly from the rate equations,
such that q̄ = σ̄bb + σ̄dd and P̃
1 (ω) = σ̃
bb (ω) +
dd (ω), where σ̄ = σ(t → ∞) and σ̃(q)(ω) is the
Laplace transform σ(q)(t) with the initial conditions
corresponding to the occupied (q = 1) or unoccupied
(q = 0) SET. In order to find these quantities it is use-
ful to rewrite the rate equations in the matrix form,
σ̇(t) = Mσ(t), representing σ(t) as the eight-vector,
σ = {σaa, σbb, σcc, σdd, σac, σca, σbd, σdb} and M as the
corresponding 8× 8-matrix. Applying the Laplace trans-
form we find the following matrix equation,
(i ω I +M)σ̃(q)(ω) = −σ(q)(0) , (B4)
where I is the unit matrix and σ(q)(0) is the initial con-
dition for the density-matrix obtained by projecting the
total wave function (17) on occupied (q = 1) and unoc-
cupied (q = 0) states of the SET in the limit of t → ∞,
σ(1)(0) = N1{0, σ̄bb, 0, σ̄dd, 0, 0, σ̄bd, σ̄db} , (B5a)
σ(0)(0) = N0{σ̄aa, 0, σ̄cc, 0, σ̄ac, σ̄ca, 0, 0} , (B5b)
and N1 = 1/q̄ and N0 = 1/(1− q̄) are the corresponding
normalization factors. Finally one obtains:
SQ(ω) = 2q̄(1− q̄)Re [σ̃(1)bb (ω) + σ̃
dd (ω)
− σ̃(0)bb (ω)− σ̃
dd (ω)]. (B6)
In the case shown in Fig. 2 one finds from Eqs. (21) or
Eqs. (31) for Γ′L,R = ΓL,R that σ̄ac = σbd = 0, q̄ = ΓL/Γ
and σ̃
bb (ω) + σ̃
dd (ω) = P̃
1 (ω). The latter equation is
given by
(iω − Γ)P̃ (q)1 (ω) = −q +
. (B7)
Substituting Eq. (B7) into Eq. (B3) one obtains:
SQ(ω) =
2ΓLΓR
Γ(ω2 + Γ2)
. (B8)
Obviously, for a more general case when Γ′L,R 6= ΓL,R,
or when the electron-electron interaction excites the elec-
tron inside the SET above the Fermi level, Fig. 8, the ex-
pressions for SQ(ω), obtained from Eq. (B6) have a more
complicated than Eq. (B8).
∗ Electronic address: shmuel.gurvitz@weizmann.ac.il
1 A.J. Leggett, S. Chakravarty, A.T. Dorsey, M.P.A. Fisher,
A. Garg, and W. Zwerger, Rev. Mod. Phys. 59, 1 (1987).
2 U. Weiss, Quantum Dissipative Systems (World Scientific,
Singapure, 2000).
3 A. Shnirman, Y. Makhlin, and G. Schoön, Phys. Scr.
T102, 147 (2002).
4 H. Gassmann, F. Marquardt, and C. Bruder, Phys. Rev.
E66, 041111 (2002).
5 E. Paladino, L. Faoro, G. Falci, and R. Fazio, Phys. Rev.
Lett. 88, 228304 (2002).
6 T. Itakura and Y. Tokura, Phys. Rev. B67, 195320 (2003).
7 J.Q. You, X. Hu, and F. Nori, Phys. Rev. B 72, 144529
(2005).
8 A. Grishin, I.V. Yurkevich and I.V. Lerner, Phys. Rev. B
72, 060509(R) (2005).
9 J. Schriefl1, Y. Makhlin, A. Shnirman, and Gerd Schön,
New J. Phys. 8, 1 (2006).
10 Y.M. Galperin, B.L. Altshuler, J. Bergli, and D.V. Shant-
sev, Phys. Rev. Lett., 96, 097009 (2006).
11 S. Ashhab, J.R. Johansson, and F. Nori, Phys. Rev. A74,
052330 (2006); ibid, Physica C 444, 45 (2006); ibid, New
J. Phys. 8, 103 (2006).
12 U. Hartmann and F.K. Wilhelm, Phys. Rev. B 75, 165308
(2007).
13 S.A. Gurvitz, Phys. Rev. B56, 15215 (1997).
14 S. Pilgram and M. Büttiker, Phys. Rev. Lett., 89, 200401
(2002).
15 A.A. Clerk, S.M. Girvin, and A.D. Stone, Phys. Rev. B67,
165324 (2003).
16 S.A. Gurvitz and G.P. Berman, Phys. Rev. B72, 073303
(2005).
17 A. Käck, G. Wendin, and G. Johansson, Phys. Rev. B 67,
035301 (2003).
18 R. Aguado and L. P. Kouwenhoven, Phys. Rev. Lett., 84,
1986 (2000).
19 E. Onac, F. Balestro, L.H. Willems van Beveren, U. Hart-
mann, Y.V. Nazarov, and L.P. Kouwenhoven, Phys. Rev.
Lett. 96, 176601 (2006).
20 I. Neder, M. Heiblum, D. Mahalu, and V. Umansky, Phys.
Rev. Lett. 98, 036803 (2007).
21 I. Neder and F. Marquardt, New J. Phys. 9, 112 (2007),
and references therein.
22 W.G. van der Wiel, T. Fujisawa, S. Tarucha, L.P. Kouwen-
hoven, Japanese Jour. Appl. Phys. 40, 2100 (2001).
23 J. M. Elzerman, R. Hanson, J. S. Greidanus, L. H. W.
van Beveren, S. De Franceschi, L. M. K. Vandersypen, S.
Tarucha, L. P. Kouwenhoven, Physica E 25, 135 (2004).
24 T. Hayashi, T. Fujisawa, H.D. Cheong, Y.H. Jeong, Y.
Hirayama, Phys. Rev. Lett. 91, 226804 (2003).
25 J. Shao, C. Zerbe, and P. Hanggi, Chem. Phys. 235, 81
(1998).
26 X.R. Wang, Y.S. Zheng, and S. Yin, Phys. Rev. B72,
121303(R) (2005).
27 C.P. Slichter, Principles of Magnetic Resonance, (Springer-
Verlag, 1980).
28 Y. Makhlin, G. Schoön, and A. Shnirman, Rev. Mod. Phys.
73, 357 (2001).
29 H.S. Goan, Quantum Information and Computation, 2,
121 (2003); ibid, Phys. Rev. B 70, 075305 (2004).
30 A.N. Korotkov, Phys. Rev. B63, 085312 (2001); ibid, Phys.
Rev. B63, 115403 (2001).
31 Y. Levinson, Phys. Rev. B61, 4748 (2000).
32 K. Rabenstein, V.A. Sverdlov, and D.V. Averin, JETP
Lett. 79, 646 (2004).
33 S.A. Gurvitz, L. Fedichkin, D. Mozyrsky and G.P. Berman,
Phys. Rev. Lett., 91, 066801 (2003).
34 G. Ithier, E. Collin, P. Joyez, P.J. Meeson, D. Vion, D.
Esteve, F. Chiarello, A. Shnirman, Y. Makhlin, J. Schriefl,
and G. Schön, Phys. Rev. B72, 134519 (2005).
35 S.A. Gurvitz, IEEE Transactions on Nanotechnology 4, 45
(2005).
36 S.A. Gurvitz, D. Mozyrsky, and G.P. Berman, Phys.Rev.
B72, 205341 (2005).
37 S.A. Gurvitz and Ya.S. Prager, Phys. Rev. B53, 15932
(1996).
38 S.A. Gurvitz, Phys. Rev. B57, 6602, (1998).
39 In a strict sense the quantum rate equations (20) were de-
rived by assuming constant widths Γ. Yet these equation
are also valid when the widths are weakly energy depen-
dent, as follows from their derivations (see37,38 and Ap-
pendix A).
40 R.J. Schoelkopf, A.A. Clerk, S.M. Girvin, K.W. Lehnert
and M.H. Devoret, Quantum Noise in Mesoscopic Physics,
edited by Yu.V. Nazarov, (Springer, 2003).
41 Y. Makhlin, G. Schoön, and A. Shnirman, in Exploring the
Quantum-Classical Frontier, edited by J.R. Friedman and
S. Han (Nova Science, Commack, New York, 2002).
42 A.M. Zagoskin, S. Ashhab, J.R. Johansson, and F. Nori,
Phys. Rev. Lett. 97, 077001 (2006).
43 T. Gilad and S.A. Gurvitz, Phys. Rev. Lett. 97, 116806
(2006); H.J. Jiao, X.Q. Li, and J.Y. Luo, Phys. Rev. B75,
155333 (2007).
mailto:shmuel.gurvitz@weizmann.ac.il
ABSTRACT
  We consider an electrostatic qubit, interacting with a fluctuating charge of
single electron transistor (SET) in the framework of exactly solvable model.
The SET plays a role of the fluctuating environment affecting the qubit's
parameters in a controllable way. We derive the rate equations describing
dynamics of the entire system for both weak and strong qubit-SET coupling.
Solving these equation we obtain decoherence and relaxation rates of the qubit,
as well as the spectral density of the fluctuating qubit's parameters. We found
that in the weak coupling regime the decoherence and relaxation rates are
directly related to the spectral density taken at Rabi or at zero frequency,
depending on what a particular qubit's parameters is fluctuating. This relation
holds also in the presence of weak back-action of the qubit on the fluctuating
environment. In the case of strong back-action, such simple relationship no
longer holds, even if the qubit-SET coupling is small. It does not hold either
in the strong-coupling regime, even in the absence of the back-action. In
addition, we found that our model predicts localization of the qubit in the
strong-coupling regime, resembling that of the spin-boson model.

<|endoftext|><|startoftext|>
GROUP-THEORETICAL PROPERTIES OF NILPOTENT
MODULAR CATEGORIES
VLADIMIR DRINFELD, SHLOMO GELAKI, DMITRI NIKSHYCH, AND VICTOR OSTRIK
To Yuri Ivanovich Manin on his 70th birthday
Abstract. We characterize a natural class of modular categories of prime
power Frobenius-Perron dimension as representation categories of twisted dou-
bles of finite p-groups. We also show that a nilpotent braided fusion category C
admits an analogue of the Sylow decomposition. If the simple objects of C have
integral Frobenius-Perron dimensions then C is group-theoretical in the sense
of [ENO]. As a consequence, we obtain that semisimple quasi-Hopf algebras
of prime power dimension are group-theoretical. Our arguments are based on
a reconstruction of twisted group doubles from Lagrangian subcategories of
modular categories (this is reminiscent to the characterization of doubles of
quasi-Lie bialgebras in terms of Manin pairs given in [Dr]).
1. introduction
In this paper we work over an algebraically closed field k of characteristic 0.
By a fusion category we mean a k-linear semisimple rigid tensor category C with
finitely many isomorphism classes of simple objects, finite dimensional spaces of
morphisms, and such that the unit object 1 of C is simple. We refer the reader to
[ENO] for a general theory of such categories. A fusion category is pointed if all its
simple objects are invertible. A pointed fusion category is equivalent to VecωG, i.e.,
the category of G-graded vector spaces with the associativity constraint given by
some cocycle ω ∈ Z3(G, k×) (here G is a finite group).
1.1. Main results.
Theorem 1.1. Any braided nilpotent fusion category has a unique decomposition
into a tensor product of braided fusion categories whose Frobenius-Perron dimen-
sions are powers of distinct primes.
The notion of nilpotent fusion category was introduced in [GN]; we recall it in
Subsection 2.2. Let us mention that the representation category Rep(G) of a finite
group G is nilpotent if and only if G is nilpotent. It is also known that fusion
categories of prime power Frobenius-Perron dimension are nilpotent [ENO]. On
the other hand, VecωG is nilpotent for any G and ω. Therefore it is not true that
any nilpotent fusion category is a tensor product of fusion categories of prime power
dimensions.
Theorem 1.2. A modular category C with integral dimensions of simple objects is
nilpotent if and only if there exists a pointed modular category M such that C⊠M
Date: March 31, 2007.
http://arxiv.org/abs/0704.0195v2
2 VLADIMIR DRINFELD, SHLOMO GELAKI, DMITRI NIKSHYCH, AND VICTOR OSTRIK
is equivalent, as a braided tensor category, to the center of a fusion category of the
form VecωG for a finite nilpotent group G.
We emphasize here that in general the equivalence in Theorem 1.2 does not
respect the spherical structures (equivalently, twists) of the categories involved and
thus is not an equivalence of modular categories. Fortunately, this is not a very
serious complication since the spherical structures on C are easy to classify: it is
well known that they are in bijection with the objects X ∈ C such that X⊗X = 1,
see [RT].
The categoryM in Theorem 1.2 is not uniquely determined by C. However, there
are canonical ways to choose M. In particular, one can always make a canonical
“minimal” choice for M such that dim(M) =
αp with αp ∈ {0, 1, 2} for odd p
and α2 ∈ {0, 1, 2, 3}, see Remark 6.11.
Theorem 1.3. A modular category C is braided equivalent to the center of a fusion
category of the form VecωG with G being a finite p-group if and only if it has the
following properties:
(i) the Frobenius-Perron dimension of C is p2n for some n ∈ Z+,
(ii) the dimension of every simple object of C is an integer,
(iii) the multiplicative central charge of C is 1.
See Subsection 2.6 for the definition of multiplicative central charge. In order
to avoid confusion we note that our definition of multiplicative central charge is
different from the definition of central charge of a modular functor from [BK, 5.7.10];
in fact, the central charge from [BK] equals to the square of our central charge.
Remark 1.4. If p 6= 2 then it is easy to see that (i) implies (ii) (see, e.g., [GN]).
1.2. Interpretation in terms of group-theoretical fusion categories and
semisimple quasi-Hopf algebras. The notion of group-theoretical fusion cate-
gory was introduced in [ENO, O1]. Group-theoretical categories form a large class
of well-understood fusion categories which can be explicitly constructed from finite
group data (which justifies the name). For example, as far as we know, all currently
known semisimple Hopf algebras have group-theoretical representation categories
(however, there are semisimple quasi-Hopf algebras whose representation categories
are not group-theoretical, see [ENO]).
Theorem 1.5. Let C be a fusion category such that all objects of C have integer
dimension and such that its center Z(C) is nilpotent. Then C is group-theoretical.
Remark 1.6. A consequence of this theorem is the following statement: every
semisimple (quasi-)Hopf algebra of prime power dimension is group-theoretical in
the sense of [ENO, Definition 8.40]. This provides a partial answer to a question
asked in [ENO].
1.3. Idea of the proof. We describe here the main steps in the proof of The-
orem 1.3. First we characterize centers of pointed fusion categories in terms of
Lagrangian subcategories and show that a modular category C is equivalent to the
representation category of a twisted group double if and only if it has a Lagrangian
(i.e., maximal isotropic) subcategory of dimension
dim(C). This result is remi-
niscent to the characterization of doubles of quasi-Lie bialgebras in terms of Manin
pairs [Dr, Section 2].
GROUP-THEORETICAL PROPERTIES OF NILPOTENT MODULAR CATEGORIES 3
Thus we need to show that a category satisfying the assumptions of Theorem 1.3
contains a Lagrangian subcategory. The proof is inspired by the following result
for nilpotent metric Lie algebras (i.e, Lie algebras with an invariant non-degenerate
scalar product) which can be derived from [KaO]: if g is a nilpotent metric Lie
algebra of even dimension then g contains an abelian ideal k, which is Lagrangian
(i.e., such that k⊥ = k). The relevance of metric Lie algebras to our considerations
is explained by the fact that they appear in [Dr] as classical limits of quasi-Hopf
algebras. In fact, our proof is a “categorification” of the proof of the above result.
Thus we need some categorical versions of linear algebra constructions involved in
this proof. Remarkably, the categorical counterparts exist for all notions required.
For example the notion of orthogonal complement in a metric Lie algebra is replaced
by the notion of centralizer in a modular tensor category introduced by M. Müger
[Mu2].
1.4. Organization of the paper. Section 2 is devoted to preliminaries on fusion
categories, which include nilpotent fusion categories, (pre)modular categories, cen-
tralizers, Gauss sums and central charge, and Deligne’s classification of symmetric
fusion categories.
In Section 3 we define the notions of isotropic and Lagrangian subcategories
of a premodular category C, generalizing the corresponding notions for a metric
group (which is, by definition, a finite abelian group with a quadratic form). We
then recall a construction, due to A. Bruguières [Br] and M. Müger [Mu1], which
associates to a premodular category C the “quotient” by its centralizer, called a
modularization. We prove in Theorem 3.4 an invariance property of the central
charge with respect to the modularization. This result will be crucial in the proof
of Theorem 6.5. We also study properties of subcategories of modular categories
and explain in Proposition 3.9 how one can use maximal isotropic subcategories of
a modular category C to canonically measure a failure of C to be hyperbolic (i.e.,
to contain a Lagrangian subcategory).
In Section 4 we characterize hyperbolic modular categories. More precisely, we
show in Theorem 4.5 that for a modular category C there is a bijection between La-
grangian subcategories of C and braided tensor equivalences C ∼−→ Z(VecωG) (where
G is a finite group, ω ∈ Z3(G,K×), and Z(VecωG) is the center of VecωG). Note that
the category Z(VecωG) is equivalent to Rep(Dω(G)) - the representation category of
the twisted double of G [DPR].
We then prove in Theorem 4.8 that if C is a modular category such that dim(C) =
n2, n ∈ Z+, the central charge of C equals 1, and C contains a symmetric subcate-
gory of dimension n, then either C is equivalent to the representation category of a
twisted double of a finite group or C contains an object with non-integer dimension.
We also give a criterion for a modular category C to be group-theoretical. Namely,
we show in Corollary 4.13 that C is group-theoretical if and only if there is an
isotropic subcategory E ⊂ C such that (E ′)ad ⊆ E .
In Section 5 we study pointed modular p-categories. We give a complete list of
such categories which do not contain non-trivial isotropic subcategories and analyze
the values of their central charges. We then prove in Proposition 5.3 that a nonde-
generate metric p-group (G, q) with central charge 1 such that |G| = p2n, n ∈ Z+,
contains a Lagrangian subgroup.
4 VLADIMIR DRINFELD, SHLOMO GELAKI, DMITRI NIKSHYCH, AND VICTOR OSTRIK
Section 6 is devoted to nilpotent modular categories. There we give proofs of our
main results stated in 1.1 above. They are contained in Theorem 6.5, Theorem 6.6,
Corollary 6.7, Theorem 6.10, and Theorem 6.12.
1.5. Acknowledgments. The research of V. Drinfeld was supported by NSF grant
DMS-0401164. The research of D. Nikshych was supported by the NSF grant DMS-
0200202 and the NSA grant H98230-07-1-0081. The research of V. Ostrik was sup-
ported by NSF grant DMS-0602263. S. Gelaki is grateful to the departments of
mathematics at the University of New Hampshire and MIT for their warm hospi-
tality during his Sabbatical. The authors are grateful to Pavel Etingof for useful
discussions.
2. Preliminaries
Throughout the paper we work over an algebraically closed field k of character-
istic 0. All categories considered in this paper are finite, abelian, semisimple, and
k-linear.
2.1. Fusion categories. For a fusion category C let O(C) denote the set of iso-
morphism classes of simple objects.
Let C be a fusion category. Its Grothendieck ring K0(C) is the free Z-module
generated by the isomorphism classes of simple objects of C with the multiplication
coming from the tensor product in C. The Frobenius-Perron dimensions of objects
in C (respectively, FPdim(C)) are defined as the Frobenius-Perron dimensions of
their images in the based ring K0(C) (respectively, as FPdim(K0(C))), see [ENO,
8.1]. For a semisimple quasi-Hopf algebra H one has FPdim(X) = dimk(X) for all
X in Rep(H), and so FPdim(Rep(H)) = dimk(H).
A fusion category is pointed if all its simple objects are invertible.
By a fusion subcategory of a fusion category C we understand a full tensor subcat-
egory of C. An example of a fusion subcategory is the maximal pointed subcategory
Cpt generated by the invertible objects of C.
A fusion category C is pseudo-unitary if its categorical dimension dim(C) coin-
cides with its Frobenius-Perron dimension, see [ENO] for details. In this case C
admits a canonical spherical structure (a tensor isomorphism between the iden-
tity functor of C and the second duality functor) with respect to which categori-
cal dimensions of objects coincide with their Frobenius-Perron dimensions [ENO,
Proposition 8.23]. The fact important for us in this paper is that a fusion category
of an integer Frobenius-Perron dimension is automatically pseudo-unitary [ENO,
Proposition 8.24].
Let C and D be fusion categories. Recall that for a tensor functor F : C → D its
image F (C) is the fusion subcategory of D generated by all simple objects Y in D
such that Y ⊆ F (X) for some simple X in C. The functor F is called surjective if
F (C) = D.
2.2. Nilpotent fusion categories. For a fusion category C let Cad be the trivial
component in the universal grading of C (see [GN]). Equivalently, Cad is the smallest
fusion subcategory of C which contains all the objects X ⊗X∗, X ∈ O(C).
For a fusion category C we define C(0) = C, C(1) = Cad, and C(n) = (C(n−1))ad for
every integer n ≥ 1. The non-increasing sequence of fusion subcategories of C
(1) C = C(0) ⊇ C(1) ⊇ · · · ⊇ C(n) ⊇ · · ·
GROUP-THEORETICAL PROPERTIES OF NILPOTENT MODULAR CATEGORIES 5
is called the upper central series of C. We say that a fusion category C is nilpotent if
every non-trivial subcategory of C has a non-trivial group grading, see [GN]. Equiv-
alently, C is nilpotent if its upper central series converges to Vec (the category of
finite dimensional k−vector spaces), i.e., C(n) = Vec for some n. The smallest such
n is called the nilpotency class of C. If C is nilpotent then every fusion subcategory
E ⊂ C is nilpotent, and if F : C → D is a surjective tensor functor, then D is
nilpotent (see [GN]).
Example 2.1. (1) Let G be a finite group and C = Rep(G). Then C is nilpo-
tent if and only if G is nilpotent.
(2) Pointed categories are precisely the nilpotent fusion categories of nilpotency
class 1. A typical example of a pointed category is VecωG, the category
of finite dimensional vector spaces graded by a finite group G with the
associativity constraint determined by ω ∈ Z3(G, k×).
In this paper we are especially interested in the following class of nilpotent fusion
categories.
Example 2.2. Let p be a prime number. Any category of dimension pn, n ∈ Z,
is nilpotent by [ENO, Theorem 8.28]. For representation categories of semisimple
Hopf algebras of dimension pn this follows from a result of A. Masuoka [Ma1].
By [GN], a nilpotent fusion category comes from a sequence of gradings, in
particular it has an integer Frobenius-Perron dimension. It follows from results of
[ENO] that a nilpotent fusion category C is pseudounitary.
2.3. Premodular categories and modular categories. Recall that a braided
tensor category C is a tensor category equipped with a natural isomorphism c :
⊗ ∼= ⊗rev satisfying the hexagon diagrams [JS]. Let cXY : X ⊗ Y ∼= Y ⊗X with
X,Y ∈ C denote the components of c.
A balancing transformation, or a twist, on a braided category C is a natural
automorphism θ : idC → idC satisfying θ1 = id1 and
(2) θX⊗Y = (θX ⊗ θY )cY XcXY .
A braided fusion category C is called premodular, or ribbon, if it has a twist θ
satisfying θ∗X = θX∗ for all objects X ∈ C.
The S-matrix of a premodular category C is S = {sXY }X,Y ∈O(C), where sXY is
the quantum trace of cYXcXY , see [T]. Equivalently, the S-matrix can be defined
as follows. For all X,Y, Z ∈ O(C) let NZXY be the multiplicity of Z in X ⊗ Y . For
every object X let d(X) denote its quantum dimension. Then
(3) sXY = θ
Z∈O(C)
NZXY θZd(Z).
The categorical dimension of C is defined by
(4) dim(C) =
X∈O(C)
d(X)2.
One has dim(C) 6= 0 [ENO, Theorem 2.3].
Note 2.3. Below we consider only fusion categories with integer Frobenius-Perron
dimensions of objects. Any such category C is pseudo-unitary (see 2.1). In par-
ticular, if C is braided then it has a canonical twist, which we will always assume
chosen.
6 VLADIMIR DRINFELD, SHLOMO GELAKI, DMITRI NIKSHYCH, AND VICTOR OSTRIK
A premodular category C is called modular if the S−matrix is invertible.
Example 2.4. For any fusion category C its center Z(C) is defined as the category
whose objects are pairs (X, cX,−), where X is an object of C and cX,− is a natural
family of isomorphisms cX,V : X ⊗ V ∼= V ⊗ X for all objects V in C satisfying
certain compatibility conditions (see e.g., [Kass]). It is known that the center of a
pseudounitary category is modular.
2.4. Pointed modular categories and metric groups. Let G be a finite abelian
group. Pointed premodular categories C with the group of simple objects isomorphic
to G (up to a braided equivalence) are in the natural bijection with quadratic forms
on G with values in the multiplicative group k∗ of the base field. Here a quadratic
form q : G → k∗ is a map such that q(g−1) = q(g) and b(g, h) := q(gh)
q(g)q(h)
a symmetric bilinear form, i.e., b(g1g2, h) = b(g1, h)b(g2, h) for all g1, g2, h ∈ G.
Namely, for g ∈ G the value of q(g) is the braiding automorphism of g ⊗ g (here
by abuse of notation g denotes the object of C corresponding to g ∈ G). See [Q,
Proposition 2.5.1] for a proof that if two categories C1 and C2 produce the same
quadratic form then they are braided equivalent (Quinn proves less canonical but
equivalent statement). We will denote the category corresponding to a group G
with quadratic form q by C(G, q) and call the pair (G, q) a metric group. The
category C(G, q) is pseudounitary and hence has a spherical structure such that
dimensions of all simple objects equal to 1; hence the categories C(G, q) always
have a canonical ribbon structure. The category C(G, q) is modular if and only if
the bilinear form b(g, h) associated with q is non-degenerate (in this case we will
say that the corresponding metric group is non-degenerate).
2.5. Centralizers. Let K be a fusion subcategory of a braided fusion category C.
In [Mu1, Mu2] M. Müger introduced the centralizer K′ of K, which is the fusion
subcategory of C consisting of all the objects Y satisfying
(5) cY XcXY = idX⊗Y for all objects X ∈ K.
If (5) holds we will say that objects X and Y centralize each other. In the case
of a ribbon category C, condition (5) is equivalent to sXY = d(X)d(Y ), see [Mu2,
Proposition 2.5]. Note that in the case of a pointed modular category the centralizer
corresponds to the orthogonal complement. The subcategory C′ of C is called the
transparent subcategory of C in [Br, Mu1].
For any fusion subcategory K ⊆ C of a braided fusion category C let Kco be
the commutator of K [GN], i.e., the fusion subcategory of C spanned by all simple
objects X ∈ C such that X ⊗ X∗ ∈ K. For example, if C = Rep(G), G a finite
group, then any fusion subcategory K of C is of the form K = Rep(G/N) for some
normal subgroup N of G, and Kco = Rep(G/[G,N ]) (see [GN]). It follows from the
definitions that (Kco)ad ⊆ K ⊆ (Kad)co.
Let K be a fusion subcategory of a pseudounitary modular category C. It was
shown in [GN] that
(6) (Kad)′ = (K′)co.
It was shown in [Mu2, Theorem 3.2] that for a fusion subcategory K of a modular
category C one has K′′ = K and
(7) dim(K) dim(K′) = dim(C).
GROUP-THEORETICAL PROPERTIES OF NILPOTENT MODULAR CATEGORIES 7
The subcategory K is symmetric if and only if K ⊆ K′. It is modular if and only if
K ∩ K′ = Vec, in which case K′ is also modular and there is a braided equivalence
C ∼= K ⊠K′.
Let C be a modular category. Then by [GN, Corollary 6.9], Cpt = (Cad)′.
2.6. Gauss sums and central charge in modular categories. Let C be a
modular category. For any subcategory K of C the Gauss sums of K are defined by
(8) τ±(K) =
X∈O(K)
θ±1X d(X)
Below we summarize some basic properties of twists and Gauss sums (see e.g.,
[BK, Section 3.1] for proofs).
Each θX , X ∈ O(C), is a root of unity (this statement is known as Vafa’s theo-
rem). The Gauss sums are multiplicative with respect to tensor product of modular
categories, i.e., if C1, C2 are modular categories then
(9) τ±(C1 ⊠ C2) = τ±(C1)τ±(C2).
We also have that
(10) τ+(C)τ−(C) = dim(C).
When k = C the multiplicative central charge ξ(C) is defined by
(11) ξ(C) = τ
dim(C)
where
dim(C) is the positive root. If dim(C) is a square of an integer, then
Formula (11) makes sense even if k 6= C. By Vafa’s theorem, ξ(C) is a root of unity.
Example 2.5. The center Z(C) of any fusion category C (see Example 2.4) is a
modular category with central charge 1 [Mu4, Theorem 1.2].
2.7. Symmetric fusion categories. The structure of symmetric fusion categories
is known, thanks to Deligne’s work [De]. Namely, let G be a finite group and let
z ∈ G be a central element such that z2 = 1. Consider the category Rep(G) with
its standard symmetric braiding σX,Y . Then the map σ
X,Y =
(1 + z|X + z|Y −
z|Xz|Y )σX,Y is also a symmetric braiding on the category Rep(G) (the meaning of
the factor 1
(1 + z|X + z|Y − z|Xz|Y ) is the following: if z|X or z|Y equals 1, then
this factor is 1; if z|X = z|Y = −1 then this factor is (−1)). We will denote by
Rep(G, z) the category Rep(G) with the commutativity constraint defined above.
Theorem 2.6. ([De]) Any symmetric fusion category is equivalent (as a braided
tensor category) to Rep(G, z) for uniquely defined G and z. The categorical dimen-
sion of X ∈ Rep(G, z) equals Tr(z|X) and dim(C) = FPdim(C) = |G|.
Now assume that the category Rep(G, z) is endowed with a twist θ such that the
dimension of any object is non-negative. It follows immediately from the theorem
that θX = z|X . We have
Corollary 2.7. Let C be a symmetric fusion category with the canonical spherical
structure (see 2.3).
(i) If dim(C) is odd then θX = idX for any X ∈ C.
8 VLADIMIR DRINFELD, SHLOMO GELAKI, DMITRI NIKSHYCH, AND VICTOR OSTRIK
(ii) In general either θX = idX for any X ∈ C, or C contains a fusion subcat-
egory C1 ⊂ C such that FPdim(C1) = 12FPdim(C) and θX = idX for any
X ∈ C1.
Proof. As for (i), it is clear that z = 1. For (ii) one takes C1 = Rep(G/〈z〉) ⊂
Rep(G). �
3. Isotropic subcategories and Bruguières-Müger modularization
3.1. Modularization.
Definition 3.1. Let C be a premodular category with braiding c and twist θ. A
fusion subcategory E of C is called isotropic if θ restricts to the identity on E , i.e.,
if θX = idX for all X ∈ E . An isotropic subcategory E is called Lagrangian if
E = E ′. The category C is called hyperbolic if it has a Lagrangian subcategory and
anisotropic if it has no non-trivial isotropic subcategories.
Remark 3.2. (a) When C = C(G, q) is a pointed modular category defined in
Example 2.4 then isotropic and Lagrangian subcategories of C correspond
to isotropic and Lagrangian subgroups of (G, q), respectively. We discuss
properties of pointed modular categories in Section 5.
(b) Let G be a finite group and let ω ∈ Z3(G, k×). Consider the pointed fusion
category VecωG. Its center C = Z(VecωG) is a modular category. It contains
a Lagrangian subcategory E ∼= Rep(G) formed by all objects in C which
are sent to multiples of the unit object of VecωG by the forgetful functor
Z(VecωG) → Vec
(c) It follows from the balancing axiom (2) that an isotropic subcategory E ⊆ C
is always symmetric. Conversely, if E is symmetric and dim(E) is odd then
E is isotropic, see 2.7. In particular, if dim(C) is odd then any symmetric
subcategory of C is isotropic.
(d) Recall that we assume that C is endowed with a canonical spherical struc-
ture, see 2.3. Any isotropic subcategory E ⊂ C is equivalent, as a symmetric
category, to Rep(G) for a canonically defined group G with its standard
braiding and identical twist, see 2.7. In particular, if E is Lagrangian then
dim(C) = dim(E)2 is a square of an integer.
Let C be a premodular category such that its centralizer C′ is isotropic and
dimensions of all objects X ∈ C′ are non-negative. Let us recall a construction,
due to A. Bruguières [Br] and M. Müger [Mu1], which associates to C a modular
category C̄ and a surjective braided tensor functor C → C̄.
Let G(C) be the unique (up to an isomorphism) group such that the category C′
is equivalent, as a premodular category, to Rep(G(C)) with its standard symmetric
braiding and identity twist.
Let A be the algebra of functions on G(C). The group G(C) acts on A via left
translations and so A is a commutative algebra in C′ and hence in C.
Consider the category C̄ := CA of right A-modules in the category C (see, e.g.,
[KiO, 1.2]). It was shown in [Br, KiO, Mu1] that C̄ is a braided fusion category and
that the “free module” functor
(12) F : C → C̄, X 7→ X ⊗A
is surjective and has a canonical structure of a braided tensor functor. One can
define a twist φ on C̄ in such a way that φY = θX for all Y ∈ O(C̄) and X ∈
GROUP-THEORETICAL PROPERTIES OF NILPOTENT MODULAR CATEGORIES 9
O(C) for which HomC(X,Y ) 6= 0. It follows that the category C̄ is modular, see
[Br, Mu1, KiO] for details. We will call the category C̄ a modularization of C.
Let d and d̄ denote the dimension functions in C and C̄, respectively. For any
object X in C̄ one has
(13) d̄(X) =
cf. [KiO, Theorem 3.5], [Br, Proposition 3.7].
Remark 3.3. Let E be an isotropic subcategory of a modular category C. Then
dim(Ē ′) = dim(C)/ dim(E)2 (see e.g. [KiO]).
3.2. Invariance of the central charge. In this subsection we prove an invariance
property of the central charge with respect to modularization, which will be crucial
in the sequel.
Theorem 3.4. Let C be a modular category and let E be an isotropic subcategory
of C. Let F : E ′ → Ē ′ be the canonical braided tensor functor from E ′ to its
modularization. Then ξ(Ē ′) = ξ(C).
Proof. Let A be the canonical commutative algebra in E . We have dim(E) = d(A).
By definition, Ē ′ is the category of left A-modules in E ′.
Let us compute the Gauss sums of Ē ′:
dim(E)τ±(Ē ′) = dim(E)
Y ∈O(Ē′)
φ±1Y d̄(Y )
Y ∈O(Ē′)
φ±1Y d(Y )d̄(Y )
Y ∈O(Ē′)
X∈O(C)
dimk HomC(X, Y )d(X)
 d̄(Y )
X∈O(C)
θ±1X d(X)
Y ∈O(Ē′)
dimk HomC(X, Y )d̄(Y )
X∈O(C)
θ±1X d(X)
Y ∈O(Ē′)
dimk HomĒ′(X ⊗A, Y )d̄(Y )
X∈O(C)
θ±1X d(X)d̄(F (X)) = τ
±(C),
where we used the relation (13) and the fact that F is an adjoint of the forgetful
functor from Ē ′ to E ′.
Combining this with the equation dim(Ē ′) = dim(C)/ dim(E)2 (see Remark 3.3)
we obtain the result. �
3.3. Maximal isotropic subcategories. Let C be a modular category and let L
be an isotropic subcategory of C which is maximal among isotropic subcategories
of C. Below we will show that the braided equivalence class of the modular category
L̄′ (the modularization of L′ by L) is independent of the choice of L.
Let C be a fusion category and let A and B be its fusion subcategories such that
X ⊗ Y ∼= Y ⊗ X for all X ∈ O(A) and Y ∈ O(B). Let A ∨ B denote the fusion
10 VLADIMIR DRINFELD, SHLOMO GELAKI, DMITRI NIKSHYCH, AND VICTOR OSTRIK
subcategory of C generated by A and B, i.e., consisting of all subobjects of X ⊗ Y ,
where X ∈ O(A) and Y ∈ O(B). Recall that the regular element of K0(C)⊗Z C is
X∈O(C) d(X)X . It is defined up to a scalar multiple by the property that
Y ⊗RC = d(Y )RC for all Y ∈ O(C) [ENO].
Lemma 3.5. Let C, A, B be as above. Then dim(A ∨ B) = dim(A) dim(B)
dim(A∩B) .
Proof. It is easy to see that
(14) RA ⊗RB = aRA∨B,
where the scalar a is equal to the multiplicity of the unit object 1 in RA ⊗ RB,
which is the same as the multiplicity of 1 in
Z∈O(A∩B) d(Z)
2Z ⊗ Z∗. Hence,
a = dim(A ∩ B). Taking dimensions of both sides of (14) we get the result. �
Let L(C) denote the lattice of fusion subcategories of a fusion category C. For
any two subcategories A and B their meet is their intersection and their joint is the
category A ∨ B.
Lemma 3.6. Let C be a fusion category such that X ⊗ Y ∼= Y ⊗X for all objects
X,Y in C. For all A, B, D ∈ L(C) such that D ⊆ A the following modular law
holds true:
(15) A ∩ (B ∨D) = (A ∩ B) ∨ D.
Proof. A classical theorem of Dedekind in lattice theory states that (15) is equiv-
alent to the following statement: for all A, B, D ∈ L(C) such that D ⊆ A, if
A∩ B = D ∩ B and A∨ B = D ∨ B then A = D (see e.g., [MMT]).
Let us prove the latter property. Take a simple object X ∈ A. Then X ∈
A ∨ B = D ∨ B so there are simple objects D ∈ D and B ∈ B such that X is
contained in D ⊗ B. Therefore, B is contained in D∗ ⊗ X and so B ∈ A. So
B ∈ A ∩ B = D ∩ B ⊆ D. Hence X ∈ D, as required. �
Remark 3.7. When C = Rep(G) is the representation category of a finite group
G, Lemma 3.6 gives a well-known property of the lattice of normal subgroups of G.
The next lemma gives an analogue of a diamond isomorphism for the “quotients
by isotropic subcategories.”
Lemma 3.8. Let C be a modular category, let D be an isotropic subcategory of C
and let B be a subcategory of D′. Let A, A0 be the canonical commutative algebras
in D and D ∩ B, respectively.
Then the category BA0 of A0-modules in B and the category (D ∨ B)A of A-
modules in D ∨ B are equivalent as braided tensor categories.
Proof. Note that
dim(BA0) =
dim(B)
dim(D ∩ B)
dim(D ∨ B)
dim(D)
= dim((D ∨ B)A)
by Lemma 3.5.
Define a functorH : BA0 → (D∨B)A byH(X) = X⊗A0A, X ∈ BA0 . ThenH has
a natural structure of a braided tensor functor. Note that for X = Y ⊗A0, Y ∈ B
we have H(X) = Y ⊗A, i.e., the composition of H with the free A0-module functor
is the free A-module functor. The latter functor is surjective and, hence, so is H .
Since a surjective functor between categories of equal dimension is necessarily
an equivalence (see [ENO, 5.7] or [EO, Proposition 2.20]) the result follows. �
GROUP-THEORETICAL PROPERTIES OF NILPOTENT MODULAR CATEGORIES 11
Proposition 3.9. Let C be a modular category and let L1, L2 be maximal among
isotropic subcategories of C. Then the modularization L̄′1 and L̄′2 are equivalent as
braided fusion categories.
Proof. Let D = L1 and B = L′1∩L′2. By maximality of L1,L2 we have L′1∩L2 ⊆ L1
and L1 ∩ L′2 ⊆ L2. Therefore, D ∩ B = L1 ∩ L2 and D ∨ B = L′1 ∩ (L1 ∨ L′2) = L′1
by Lemma 3.6.
Let A0 be the canonical commutative algebra in L1 ∩ L2. Applying Lemma 3.8
we see that L̄′1 is equivalent to the category (L′1 ∩L′2)A0 of A0-modules in L′1 ∩L′2.
The proposition now follows by interchanging L1 and L2. �
Remark 3.10. (i) We can call the modular category L̄′1 constructed in the
proof of Proposition 3.9 “the” canonical modularization corresponding to
C (it measures the failure of C to be hyperbolic). The above proof gives
a concrete equivalence L̄′1 ∼= L̄′2. But given another maximal isotropic
subcategory L3 ⊂ C the composition of equivalences L̄′1 ∼= L̄′2 and L̄′2 ∼= L̄′3
is not in general equal to the equivalence L̄′1 ∼= L̄′3. This is why we put
“the” above in quotation marks.
(ii) For a maximal isotropic subcategory L ⊂ C the corresponding modular-
ization does not have to be anisotropic, in contrast with the situation for
metric groups. Examples illustrating this phenomenon are, e.g., the cen-
ters of non-group theoretical Tambara-Yamagami categories considered in
[ENO, Remark 8.48].
4. Reconstruction of a twisted group double from a Lagrangian
subcategory
4.1. C-algebras. Let us recall the following definition from [KiO].
Definition 4.1. Let C be a ribbon fusion category. A C−algebra is a commutative
algebra A in C such that dimHom(1, A) = 1, the pairing A⊗A → A → 1 given by
the multiplication of A is non-degenerate, θA = idA and dim(A) 6= 0.
Let C be a modular category, let A be a C−algebra, and let CA be the fusion
category of right A−modules with the tensor product ⊗A. The free module functor
F : C → CA, X 7→ X ⊗A has an obvious structure of a central functor. By this we
mean that there is a natural family of isomorphisms F (X)⊗AY ∼= Y ⊗AF (X), X ∈
C, Y ∈ CA, satisfying an obvious multiplication compatibility, see e.g. [Be, 2.1].
Indeed, we have F (X) = X ⊗ A, and hence F (X) ⊗A Y = X ⊗ Y . Similarly,
Y ⊗A F (X) = Y ⊗ X . These two objects are isomorphic via the braiding of C
(one can check that the braiding gives an isomorphism of A-modules using the
commutativity of A).
Thus, the functor F extends to a functor F̃ : C → Z(CA) in such a way that F
is the composition of F̃ and the forgetful functor Z(CA) → CA.
Proposition 4.2. The functor F̃ : C → Z(CA) is injective (that is fully faithful).
Proof. Consider CA as a module category over C via F and over Z(CA) via F̃ .
We will prove the dual statement (see [ENO, Proposition 5.3]), namely that the
functor T : CA ⊠ CopA → C∗CA dual to F̃ is surjective (here and below the superscript
op refers to the tensor category with the opposite tensor product). Recall (see e.g.
[O1]) that the category C∗CA is identified with the category of A−bimodules. An
12 VLADIMIR DRINFELD, SHLOMO GELAKI, DMITRI NIKSHYCH, AND VICTOR OSTRIK
explicit description of the functor T is the following: by definition, any M ∈ CA
is a right A−module. Using the braiding and its inverse one can define on M
two structures of a left A−module: A ⊗ M
A,M−→ M ⊗ A → M . Both structures
make M into an A−bimodule, and we will denote the two results by M+ and
M−, respectively. Then we have T (M ⊠ N) = M+ ⊗A N−. In particular we
see that the functor C ⊠ Cop F⊠F−→ CA ⊠ CopA
T−→ C∗CA coincides with the functor
C ⊠ Cop ≃ Z(C) ≃ Z(C∗CA) → C
CA (see [O2]). Since the functor Z(C
CA) → C
CA is
surjective (see [EO, 3.39]) we see that the functor T is surjective. The proposition
is proved. �
Remark 4.3. Note that since C and Z(CA) are modular we have a factorization
Z(CA) = C ⊠D, where D is the centralizer of C in Z(CA). One observes that D is
identified with the category of “dyslectic” A−modules Rep0(A), see [KiO, P].
Corollary 4.4. Assume that dim(A) =
dim(C). Then the functors F̃ : C →
Z(CA) and T : CA ⊠ CopA → C∗CA are tensor equivalences.
Proof. We have already seen that dim(CA) = dim(C)dim(A) . Hence, dim(Z(CA)) =
dim(C)2
dim(A)2
= dim(C). Since F̃ is an injective functor between categories of equal
dimension, it is necessarily an equivalence by [EO, Proposition 2.19]. Hence the
dual functor T is also an equivalence. �
4.2. Hyperbolic modular categories as twisted group doubles. We are now
ready to state and prove our first main result which relates hyperbolic modular
categories and twisted doubles of finite groups.
Let C be a modular category. Consider the set of all triples (G,ω, F ), where
G is a finite group, ω ∈ Z3(G, k×), and F : C ∼−→ Z(VecωG) is a braided tensor
equivalence. Let us say that two triples (G1, ω1, F1) and (G2, ω2, F2) are equivalent
if there exists a tensor equivalence ι : Vecω1G1
∼−→ Vecω2G2 such that F2◦F2 = ι◦F1◦F1,
where Fi : Z(VecωiGi) → Vec
, i = 1, 2, are the canonical forgetful functors.
Let E(C) be the set of all equivalences classes of triples (G,ω, F ). Let Lagr(C)
be the set of all Lagrangian subcategories of C.
Theorem 4.5. For any modular category C there is a natural bijection
f : E(C) ∼−→ Lagr(C).
Proof. The map f is defined as follows. Note that each braided tensor equivalence
F : C ∼−→ Z(VecωG) gives rise to the Lagrangian subcategory f(G,ω, F ) of C formed
by all objects sent to multiples of the unit object 1 under the forgetful functor
Z(VecωG) → VecωG. This subcategory is clearly the same for all equivalent choices
of (G,ω, F ).
Conversely, given a Lagrangian subcategory E ⊆ C it follows from Deligne’s
theorem [De] that E = Rep(G) for a unique (up to isomorphism) finite group G.
Let A = Fun(G) ∈ Rep(G) = E ⊂ C. It is clear that A is a C−algebra and
dim(A) = dim(E) =
dim(C). Then by Corollary 4.4, the functor F̃ : C → Z(CA)
is an equivalence.
Finally, let us show that CA is pointed and K0(CA) = ZG. Note that there are
|G| non-isomorphic structures Ag, g ∈ G, of an invertible A-bimodule on A, since
the category of A-bimodules in E is equivalent to VecG. For each Ag there is a
GROUP-THEORETICAL PROPERTIES OF NILPOTENT MODULAR CATEGORIES 13
pair X,Y of simple objects in CA such that T (X ⊠ Y ) = Ag. Taking the forgetful
functor to CA we obtain Y = X∗ and X is invertible. Hence, for each g ∈ G there
is a unique invertible Xg ∈ CA such that T (Xg ⊠X∗g ) = Ag, and therefore g 7→ Xg
is an isomorphism of K0 rings. Thus, CA ∼= VecωG for some ω ∈ Z3(G, k×). We set
h(E) to be the class of the equivalence F̃ : C ∼−→ Z(CA).
Let show that the above constructions f and h are inverses of each other. Let E
be a Lagrangian subcategory of C and let A be the algebra defined in the previous
paragraph. The forgetful functor from C ∼= Z(CA) to CA is the free module functor,
and so f(h(E)) consists of all objects X in C such that X ⊗ A is a multiple of A.
Since A is the regular object of E , it follows that f(h(E)) = E and f ◦ h = id.
Proving that h ◦ f = id amounts to a verification of the following fact. Let G
be a finite group, let ω ∈ Z3(G, k×), and let A = Fun(G) be the canonical algebra
in Rep(G) ⊂ Z(VecωG). Then the category of A-modules in Z(VecωG) is equivalent
to VecωG and the functor of taking the free A-module coincides with the forgetful
functor from Z(VecωG) to Vec
G. This is straightforward and is left to the reader. �
Remark 4.6. Our reconstruction of the representation category of a twisted group
double from a Lagrangian subcategory can be viewed as a categorical analogue of
the following reconstruction of the double of a quasi-Lie bialgebra from a Manin
pair (i.e., a pair consisting of a metric Lie algebra and its Lagrangian subalgebra)
in the theory of quantum groups [Dr, Section 2].
Let g be a finite-dimensional metric Lie algebra (i.e., a Lie algebra on which a
nondegenerate invariant symmetric bilinear form is given). Let l be a Lagrangian
subalgebra of g. Then l has a structure of a quasi-Lie bialgebra and there is an
isomorphism between g and the double D(l) of l. The correspondence between
Lagrangian subalgebras of g and doubles isomorphic to g is bijective, see [Dr, Section
2] for details.
Remark 4.7. Given a hyperbolic modular category C there is no canonical way
to assign to it a pair (G, ω) such that C ∼= Z(VecωG) as a braided fusion category.
Indeed, it follows from [EG1] that there exist non-isomorphic finite groups G1, G2
such that Z(VecG1) ∼= Z(VecG2) as braided fusion categories. (See also [N].)
Theorem 4.8. Let C be a modular category such that dim(C) = n2, n ∈ Z+, and
such that ξ(C) = 1. Assume that C contains a symmetric subcategory V such that
dim(V) = n. Then either C is the center of a pointed category or it contains an
object with non-integer dimension.
Proof. Assume that V is not isotropic. Then V contains an isotropic subcategory K
such that dim(K) = 1
dim(V) (this follows from Deligne’s description of symmetric
categories, see 2.7). Hence the category K̄′ (modularization of K′) has dimension
4 and central charge 1. It follows from the explicit classification given in Example
5.1 (b),(d) that the category K̄′ contains an isotropic subcategory of dimension 2;
clearly this subcategory is equivalent to Rep(Z/2Z). Let A1 = Fun(Z/2Z) be the
commutative algebra of dimension 2 in this subcategory. Let I : K̄′ → K′ be the
right adjoint functor to the modularization functor F : K′ → K̄′.
We claim that the object A := I(A1) has a canonical structure of a C−algebra.
Indeed, we have a canonical morphism in Hom(F (A), A1) = Hom(A, I(A1)) =
Hom(A,A) ∋ id. Using this one can construct a multiplication on A via Hom(A1⊗
A1, A1) → Hom(F (A)⊗F (A), A1) = Hom(F (A⊗A), A1) = Hom(A⊗A,A). Since
14 VLADIMIR DRINFELD, SHLOMO GELAKI, DMITRI NIKSHYCH, AND VICTOR OSTRIK
the functor F is braided it follows from the commutativity of A1 that A is commu-
tative. Other conditions from Definition 4.1 are also easy to check. In particular
dim(A) = dim(K) dim(A1) = dim(V) =
dim(C). We also note that the category
RepK′(A) contains precisely two simple objects (actually, the functor M 7→ I(M) is
an equivalence of categories between RepK̄′(A1) and RepK′(A)); we will call these
two objects 1 (for A itself considered as an A−module) and δ. Clearly δ⊗A δ = 1.
By Corollary 4.4, we have an equivalence C∗CA = CA⊠C
A . Moreover, the forgetful
functor C∗CA → CA corresponds to the tensor product functor CA ⊠ C
A → CA. Now
consider the subcategory (K′)∗CA ⊂ C
CA (in other words A−bimodules in K
′); the
forgetful functor above restricts to S : (K′)∗CA → RepK′(A).
Let M ∈ (K′)∗CA be a simple object. We claim that there are three possibilities:
1) S(M) = 1, 2) S(M) = δ or 3) S(M) = 1⊕ δ. Indeed, M = X ⊠ Y ∈ CA ⊠ CopA
and S(M) = X ⊗ Y for some simple X,Y ∈ CA. Since 1 and δ are invertible the
result is clear.
Now, notice that if there exists M as in case 3) then we have X = Y ∗ and
dim(X) = dim(Y ) =
2. Thus the category CA contains an object with non-integer
dimension, which implies that the category C contains an object with non-integer
dimension (see e.g. [ENO, Corollary 8.36]), and the theorem is proved in this case.
Hence we will assume that for any M ∈ (K′)∗CA only 1) or 2) holds. This implies
that all objects of (K′)∗CA are invertible. Note that dim((K
′)∗CA) = dim(K
dim(C) and hence we have precisely 2
dim(C) simple objects. Consider all
objects M ∈ (K′)∗CA such that S(M) = 1; it is easy to see that there are precisely
dim(C) of those (indeed, X⊠Y 7→ X⊠ (Y ⊗A δ) gives a bijection between simple
bimodules M with S(M) = 1 and simple bimodules M with S(M) = δ). Let G
be the group of isomorphism classes of all objects M ∈ (K′)∗CA with S(M) = 1
(thus |G| =
dim(C)). Any object of this type is of the form Xg ⊠ (Xg)∗ for
some invertible Xg ∈ CA. Thus we already constructed
dim(C) invertible simple
objects in CA. Since dim(CA) =
dim(C) the objects Xg exhaust all simple objects
in CA. By Corollary 4.4, we are done. �
4.3. A criterion for a modular category to be group-theoretical. Let C be a
modular category. It is known that the entries of the S-matrix of C are cyclotomic
integers [CG, dBG]. Hence, we may identify them with complex numbers. In
particular, the notions of complex conjugation and absolute value of the elements
of the S-matrix make sense.
Remark 4.9. Let K ⊆ C be a fusion subcategory. Recall from [GN] that (Kad)′
is spanned by simple objects Y such that |sXY | = dXdY for all simple X in K. In
this case the ratio b(X,Y ) := sXY /(dXdY ) is a root of unity. Furthermore, for all
simple X ∈ K, Y1, Y2 ∈ K′ad and any simple subobject Z of Y1 ⊗ Y2 we have
(16) b(X,Y1)b(X,Y2) = b(X,Z),
as explained in [Mu2].
Lemma 4.10. Let C be a modular category and let K ⊆ C be a fusion subcategory
such that K ⊆ (Kad)′.
(1) There is a grading K = ⊕g∈G Kg such that K1 = K′ ∩ K.
(2) There is a non-degenerate symmetric bilinear form b on G such that b(g, h) =
sXY /(dXdY ) for all X ∈ Kg and Y ∈ Kh.
GROUP-THEORETICAL PROPERTIES OF NILPOTENT MODULAR CATEGORIES 15
(3) If K′ ∩ K is isotropic then there is a non-degenerate quadratic form q on
G such that q(g) = θX for all X ∈ Kg. In this case b is the bilinear form
corresponding to q.
Proof. Since Kad ⊆ K′ ∩ K ⊆ K the assertion (1) follows from [GN].
Let b(X,Y ) = sXY /(dXdY ) for all simple X,Y ∈ K. Clearly, b is symmetric
and b(X,Y ) = 1 for all simple X in K if and only if Y ∈ K′ ∩ K = K1. To prove
(2) it suffices to check that b depends only on h ∈ G such that Y ∈ Kh (then the
G-linear property follows from (16)). Let Y1, Y2 be simple objects in Kh. Then
Y1 ⊗ Y ∗2 ∈ K′ ∩ K and so b(X,Y1)b(X,Y ∗2 ) = 1, whence b(X,Y1) = b(X,Y2), as
desired.
Finally, (3) is a direct consequence of our discussion in Section 3.1. �
For a subcategory K ⊆ C satisfying the hypothesis of Lemma 4.10 let (GK, bK)
be the corresponding abelian grading group and bilinear form. Note that if such
K is considered as a subcategory of Crev then the corresponding bilinear form is
(GK, b
Theorem 4.11. Let C be a modular category. Then symmetric subcategories of
Z(C) ∼= C ⊠ Crev of dimension dim(C) are in bijection with triples (L, R, ι), where
L ⊆ C, R ⊆ Crev are symmetric subcategories such that (L′)ad ⊆ L, (R′)ad ⊆ R,
and ι : (GL′ , bL′) ∼= (GR′ , bR′) is an isomorphism of bilinear forms.
Namely, any such subcategory is of the form
(17) DL,R,ι = ⊕g∈G
Lg ⊠Rι(g).
Proof. Let X1 ⊠ Y1 and X2 ⊠ Y2 be two simple objects of C⊠ Crev. They centralize
each other if and only if
|sX1X2 | = dX1dX2 ,(18)
|sY1Y2 | = dY1dY2 , and(19)
sX1X2
dX1dX2
sY1Y2
dY1dY2
= 1.(20)
Let D be a symmetric subcategory of C ⊠ Crev and let L (respectively, R) be
the centralizers of fusion subcategories of C (respectively, Crev) formed by left (re-
spectively, right) tensor factors of simple objects in D. By conditions (18), (19),
and Remark 4.9 we must have L′ad ⊆ L and R′ad ⊆ R. Hence, Lemma 4.10
gives gradings L′ = ⊕g∈GL (L′)g with (L′)1 = L′ ∩ L and R′ = ⊕g∈GR (R′)g
with (R′)1 = R′ ∩ R. The condition (20) gives an isomorphism of bilinear forms
ι : (GL′ , bL′) ∼= (GR′ , bR′) which is well-defined be the property that whenever
X ∈ (L′)g and Y ∈ R′ are simple objects such that X ⊠ Y ∈ D then Y ∈ (R′)ι(g).
Note that
(21) D ⊆ ⊕g∈G
Lg ⊠Rι(g),
and hence
dim(D) ≤ dim(L ∩ L′) dim(R∩R′)|GL′ | = dim(L′) dim(R∩R′).
The same inequality holds with L and R interchanged. Therefore,
dim(C)2 = dim(D)2 ≤ dim(L′) dim(L′ ∩ L) dim(R′) dim(R∩R′) ≤ dim(C)2.
16 VLADIMIR DRINFELD, SHLOMO GELAKI, DMITRI NIKSHYCH, AND VICTOR OSTRIK
Here the first inequality becomes equality if and only if the inclusion in (21) is an
equality and the second inequality becomes equality if and only if L′ ∩ L = L and
R′ ∩R = R, i.e., when L and R are symmetric. �
Remark 4.12. The subcategory DL,R,ι constructed in Theorem 4.11 is Lagrangian
if and only if L and R are isotropic subcategories of C and ι is an isomorphism of
metric groups.
Corollary 4.13. Let C be a modular category. The following conditions are equiv-
alent:
(i) C is group-theoretical.
(ii) There is a finite group G and a 3-cocycle ω ∈ Z3(G, k×) such that Z(C) ∼=
Z(V ecωG) as a braided fusion category.
(iii) C ⊠ Crev contains a Lagrangian subcategory.
(iv) There is an isotropic subcategory E ⊂ C such that (E ′)ad ⊆ E.
Proof. The equivalence (i)⇔(ii) is a consequence of [ENO], (ii)⇔(iii) follows from
Theorem 4.5, and (iii)⇔(iv) follows from taking E = R = L and ι = idG
Theorem 4.11, cf. Remark 4.12. �
Combining the above criterion with Theorem 4.8 we obtain the following useful
characterization of group-theoretical modular categories.
Corollary 4.14. A modular category C is group-theoretical if and only if simple
objects of C have integral dimension and there is a symmetric subcategory L ⊂ C
such that (L′)ad ⊆ L.
5. Pointed modular categories
In this section we analyze the structure of pointed modular categories, their
central charges, and Lagrangian subgroups. Recall that such categories canonically
correspond to metric groups [Q].
Let G = Z/nZ. The corresponding braided categories of the form C(Z/nZ, q) are
completely classified by numbers σ = q(1) such that σn = 1 (n is odd) or σ2n = 1
(n is even). Then the braiding of objects corresponding to 0 ≤ a, b < n is the
multiplication by σab and the twist of the object a is the multiplication by σa
[Q]). We will denote the category corresponding to σ by C(Z/nZ, σ).
Example 5.1. (a) Let G = Z/2Z. There are 4 possible values of σ: ±1,±i.
The categories C(Z/2Z,±i) are modular with central charge 1±i√
and the categories
C(Z/2Z,±1) are symmetric. The category C(Z/2Z, 1) is isotropic and the category
C(Z/2Z,−1) is not.
(b) Let G = Z/4Z. The twist of the object 2 ∈ Z/4Z is σ4 = ±1. If this twist is
-1 then σ is a primitive 8th root of 1 and the corresponding category is modular;
its Gauss sum is 1 + σ + σ4 + σ9 = 2σ and the central charge is σ. Note that if
σ4 = 1 then the category C(Z/4Z, σ) contains a nontrivial isotropic subcategory.
(c) Let G = Z/2kZ with k ≥ 3. Since the twist of the object 2k−1 is σ22k−2 = 1,
the category C(Z/2kZ, σ) always contains a nontrivial isotropic subcategory.
(d) Let G = Z/2Z × Z/2Z. There are five modular categories with this group.
We give for each of them the list of values of q on nontrivial elements of G:
(1) C(Z/2Z× Z/2Z, i): the values of q are i, i,−1, and the central charge is i.
GROUP-THEORETICAL PROPERTIES OF NILPOTENT MODULAR CATEGORIES 17
(2) C(Z/2Z× Z/2Z,−i): the values of q are −i,−i,−1, and the central charge
is −i.
(3) C(Z/2Z×Z/2Z,−1): the values of q are −1,−1,−1, and the central charge
is −1.
(4) C(Z/2Z× Z/2Z, 1): the values of q are i,−i, 1, and the central charge 1.
(5) The double of Z/2Z: the values of q are 1, 1,−1, and the central charge 1.
In this list, each category of central charge 1 contains a nontrivial isotropic subcat-
egory while the others contain a nontrivial symmetric (but not isotropic) subcate-
gory.
(e) Let G = Z/2Z × Z/4Z. Assume that the category C(G, q) does not contain
a nontrivial isotropic subcategory. Then C(G, q) is equivalent to C(Z/4Z, σ) ⊠
C(Z/2Z,±i) where σ is a primitive 8th root of 1. The possible central charges are
±1 and ±i.
(f) Let G = Z/2Z × Z/2Z × Z/2Z. Assume that the category C(G, q) does not
contain a nontrivial isotropic subcategory. Then C(G, q) is equivalent to C(Z/2Z×
Z/2Z, σ)⊠ C(Z/2Z, σ′), where σ′ = ±i and σ 6= 1,−σ′.
Example 5.2. Let p be an odd prime.
(a) Let G = Z/pZ. The category C(Z/pZ, σ) is modular for σ 6= 1 and is isotropic
for σ = 1. The central charge of the modular category C(Z/pZ, σ) is ±1 for p = 1
mod 4 and ±i for p = 3 mod 4.
(b) Let G = Z/pZ × Z/pZ. There are two modular pointed categories with
underlying group G. One has central charge 1 (and is equivalent to the center of
Z/pZ), and the other one has central charge -1.
Recall that for a metric group (G, q) its Gauss sum is τ±(G, q) =
a∈G q(a)
A subgroup H of G is called isotropic if q|H = 1. An isotropic subgroup is called
Lagrangian if H⊥ = H .
The following proposition is well known.
Proposition 5.3. Let (G, q) be a non-degenerate metric group such that |G| = p2n
where p is a prime number and n ∈ Z+. Suppose that τ±(G, q) =
|G| (i.e., the
central charge of G is 1). Then G contains a Lagrangian subgroup.
Proof. It suffices to prove that G contains a non-trivial isotropic subgroup H , then
one can pass to H⊥/H and use induction.
Assume that p is odd. Assume that G contains a direct summand Z/pkZ with
k > 1. Then the subgroup Z/pZ ⊂ Z/pkZ is isotropic, since otherwise it is a non-
degenerate metric subgroup of G and hence can be factored. Thus we are reduced
to the case when G is a direct sum of k copies of Z/pZ. When k > 2, the quadratic
form on G is isotropic (by the Chevalley - Waring theorem). Thus we are reduced
to the case k = 2, which is easy (see Example 5.2 (b)).
Assume now that p = 2. Again assume that G contains a direct summand Z/2kZ
with k > 1. Again the subgroup Z/2Z ⊂ Z/2kZ is inside its orthogonal complement;
moreover it is isotropic if k ≥ 3. If k = 2 and the subgroup Z/2Z ⊂ Z/4Z is not
isotropic then the subgroup Z/4Z is a non-degenerate metric subgroup and hence
factors out; let G = G1 ⊕ Z/4Z be the corresponding decomposition of G. If G1
contains Z/2Z such that Z/2Z ⊆ Z/2Z⊥ then we are done: if this subgroup is
not isotropic then the diagonal subgroup Z/2Z ⊂ Z/2Z ⊕ Z/2Z ⊂ G1 ⊕ Z/4Z is
isotropic. Thus G1 is a sum of Z/2Z’s and each summand is non-degenerate. But
note that the central charge of a non-degenerate metric group Z/4Z is a primitive
18 VLADIMIR DRINFELD, SHLOMO GELAKI, DMITRI NIKSHYCH, AND VICTOR OSTRIK
eighth root of 1 (see Example 5.1 (b)) which is also the central charge of a non-
degenerate metric Z/2Z (see Example 5.1 (a)). This implies that the number of
Z/2Z summands in G1 is odd which is impossible since the order of G is a square.
Thus we are reduced to the case when G is a sum of k copies of Z/2Z. In this case
all possible values of the quadratic form q are ±1,±i and since τ+(G, q) = 2k/2,
there is at least one non-identity a ∈ G with q(a) = 1. So the subgroup generated
by a is isotropic. The proposition is proved. �
6. Nilpotent modular categories
In this section we prove our main results, stated in 1.1, and derive a few corol-
laries.
Recall the definitions of Kad and Kco from 2.5.
Proposition 6.1. Let C be a nilpotent modular category. Then for any maximal
symmetric subcategory K of C one has (K′)ad ⊆ K. Equivalently, there is a grading
of K′ such that K is the trivial component:
(22) K′ = ⊕g∈GK′g, K′1 = K.
Proof. The two conditions are equivalent since by [GN] the adjoint subcategory is
the trivial component of the universal grading.
Let K be a symmetric subcategory of C, i.e., such that K ⊆ K′. Assume that
(K′)ad is not contained in K. It suffices to show that K is not maximal.
Let E = (Kco ∩ (K′)ad) ∨ K. Clearly, K ⊆ E ⊆ K′. We have
E ′ = ((Kco ∩ (K′)ad) ∨ K)′
= K′ ∩ ((Kco)′ ∨ ((K′)ad)′)
= K′ ∩ ((K′)ad ∨ Kco)
= (K′ ∩ Kco) ∨ (K′)ad,
where we used the modular law of the lattice L(C) from Lemma 3.6. Since K ⊆
K′ ∩ Kco and Kco ∩ (K′)ad ⊆ (K′)ad we see that E ⊆ E ′, i.e., E is symmetric.
Let n be the largest positive integer such that (K′)(n) 6⊆ K. Such n exists by our
assumption and the nilpotency of K′. We claim that (K′)(n) ⊆ Kco. Indeed,
(K′)(n) ⊆ ((K′)(n+1))co ⊆ Kco
since D ⊆ (Dad)co for every subcategory D ⊆ C. Therefore, Kco ∩ (K′)(n) = (K′)(n)
is not contained in K and
K ( (Kco ∩ (K′)(n)) ∨K ⊆ (Kco ∩ (K′)ad) ∨ K = E ,
which completes the proof. �
Recall that in a fusion category whose dimension is an odd integer the dimensions
of all objects are automatically integers [GN, Corollary 3.11].
Corollary 6.2. A nilpotent modular category C with integral dimensions of simple
objects is group-theoretical.
Proof. This follows immediately from Corollary 4.14 and Proposition 6.1. �
GROUP-THEORETICAL PROPERTIES OF NILPOTENT MODULAR CATEGORIES 19
Remark 6.3. It follows from Corollary 4.13 that a nilpotent modular category C
with integral dimensions of simple objects contains an isotropic subcategory E such
that (E ′)ad ⊆ E . The corresponding grading
(23) E ′ = ⊕h∈H E ′h, E ′1 = E ,
gives rise to a non-degenerate quadratic form q on H defined by q(h) = θV for any
non-zero V ∈ Ch. We have a braided equivalence Ē ′ ∼= C(H, q).
We may assume that E is maximal among isotropic subcategories of C. In this
case, Proposition 3.9 implies that the isomorphism class of the above metric group
(H, q) does not depend on the choice of the maximal isotropic subcategory E .
Corollary 6.4. The central charge of a modular nilpotent category with integer
dimensions of objects is always an 8th root of 1. Moreover, the central charge of a
modular p−category is ±1 if p = 1mod 4 and ±1, ±i if p = 3mod 4. The central
charge of a modular p−category of dimension p2k, k ∈ Z+ with odd p is ±1.
Proof. By Remark 6.3 and Theorem 3.4 the central charge always equals the central
charge of some pointed category, so the first claim follows from Examples 5.1-5.2.
The second and third claims follow from Example 5.2. �
Theorem 6.5. Let C be a modular category with integral dimensions of simple
objects. Then C is nilpotent if and only if there exists a pointed modular category
M such that C⊠M is equivalent (as a braided fusion category) to Z(VecωG), where
G is a nilpotent group.
Proof. Note that for a nilpotent group G the category Z(VecωG) is a tensor product
of modular p-categories and, hence, is nilpotent. So if C ⊠M ∼= Z(VecωG) then C is
nilpotent (as a subcategory of a nilpotent category).
Let us prove the converse implication. Pick an isotropic subcategory E ⊂ C such
that (E ′)ad ⊆ E (such a subcategory exists by Remark 6.3). There is a metric
group (H, q) such that Ē ′ ∼= C(H, q). Let E ′ = ⊕h∈H E ′h , where E1 = E be the
corresponding grading from (23).
Let M be the reversed category of Ē ′ (i.e., with the opposite braiding and twist).
Then M ∼= C(H, q−1) and ξ(M) = ξ(C(H, q))−1 = ξ(C)−1 by Theorem 3.4.
The modular category Cnew = C ⊠ M is nilpotent and ξ(Cnew) = 1. The cate-
gory Enew := ⊕h∈H Eh ⊠ h is a Lagrangian subcategory of Cnew and the required
statement follows from Theorem 4.5. �
Let p be a prime number.
Theorem 6.6. A modular category C is equivalent to the center of a fusion category
of the form VecωG with G being a p-group if and only if it has the following properties:
(i) the Frobenius-Perron dimension of C is p2n for some n ∈ Z+,
(ii) the dimension of every simple object of C is an integer,
(iii) the multiplicative central charge of C is 1.
Proof. It is clear that for any finite p-group G and ω ∈ Z3(G, k×) the modular
category Z(VecωG) satisfies properties (i) and (ii). The central charge of Z(Vec
equals 1 by [Mu4, Theorem 1.2].
Let us prove the converse. Suppose that C satisfies conditions (i), (ii), and (iii).
Let E be an isotropic subcategory of C such that (E ′)ad ⊆ E (such an E exists by
Remark 6.3). There is a grading E ′ = ⊕h∈H E ′h with E ′1 = E and θ being constant
20 VLADIMIR DRINFELD, SHLOMO GELAKI, DMITRI NIKSHYCH, AND VICTOR OSTRIK
on each E ′h, h ∈ H . Note that H is a metric p-group whose order is a square.
By Proposition 5.3 it contains a Lagrangian subgroup H0, whence ⊕h∈H0 E ′h is a
Lagrangian subcategory of C.
Thus, C ∼= VecωG for some G and ω by Theorem 4.5. Since |G|2 = dim(Vec
dim(C) it follows that G is a p-group. �
Finally, we apply our results to show that certain fusion categories (more pre-
cisely, representation categories of certain semisimple quasi-Hopf algebras) are group-
theoretical and to obtain a categorical analogue of the Sylow decomposition of
nilpotent groups.
Corollary 6.7. Let C be a fusion category with integral dimensions of simple objects
and such that Z(C) is nilpotent. Then C is group-theoretical.
Proof. By Corollary 6.2 the category Z(C) is group-theoretical. Hence, C ⊠ Crev is
group theoretical (as a dual category of Z(C), see [ENO]). Therefore, C is group-
theoretical (as a fusion subcategory of C ⊠ Crev). �
Corollary 6.8. Let C be a fusion category of dimension pn, n ∈ Z+, such that
all objects of C have integer dimension (this is automatic if p > 2). Then C is
group-theoretical.
In other words, semisimple quasi-Hopf algebras of dimension pn are group-
theoretical.
Remark 6.9. Semisimple Hopf algebras of dimension pn were studied by several
authors, see e.g., [EG2], [Kash], [Ma1], [Ma2], [MW], [Z].
From Corollary 6.2 we obtain the following Sylow decomposition.
Theorem 6.10. Let C be a braided nilpotent fusion category such that all objects
of C have integer dimension. Then C is group-theoretical and has a decomposition
into a tensor product of braided fusion categories of prime power dimension. If the
factors are chosen in such a way that their dimensions are relatively prime, then
such a decomposition is unique up to a permutation of factors.
Proof. It was shown in [GN, Theorem 6.11] that the center of a braided nilpotent
fusion category is nilpotent. Hence, Z(C) is group-theoretical by Corollary 6.2.
Since C is equivalent to a subcategory of Z(C), it is group-theoretical by [ENO,
Proposition 8.44]. This means that there is a group G and ω ∈ Z3(G, k∗) such
that C is dual to VecωG with respect to some indecomposable module category. The
group G is necessarily nilpotent since Rep(G) ⊆ Z(VecωG) ∼= Z(C). Hence, G is
isomorphic to a direct product of its Sylow p-subgroups, G = G1 × · · · ×Gn, and
so VecωG is equivalent to a tensor product of p-categories. It follows from [ENO,
Proposition 8.55] that the dual category C is also a product of fusion p-categories,
as desired.
Now suppose that C is decomposed into factors of prime power Frobenius-Perron
dimension, C ≃ ⊠pCp. It is easy to see that the objects from Cp ⊂ C are characterized
by the following property:
(24) X ∈ Cp if and only if there exists k ∈ Z+ such that Hom(1, X⊗
) 6= 0.
This shows that the decomposition in question is unique. �
GROUP-THEORETICAL PROPERTIES OF NILPOTENT MODULAR CATEGORIES 21
Remark 6.11. Let C be a nilpotent modular category with integral dimensions
of simple objects. We already mentioned in the introduction that the choice of a
tensor complement M satisfying C ⊠ M ∼= VecωG is not unique. In the proof of
Theorem 6.5 such M can be chosen canonically as the category opposite to the
canonical modularization corresponding to a maximal isotropic subcategory of C,
see Proposition 3.9.
Another canonical way is to choose an M of minimal possible dimension. This
is done as follows. By Theorem 6.10, we have C = ⊠p Cp and M = ⊠p Mp,
where Cp,Mp are modular p-categories. By Theorem 6.6, Mp has to be chosen in
such a way that dim(Cp) dim(Mp) is a square and ξ(Mp) = ξ(Mp)−1. It follows
from Examples 5.1, 5.2 and Corollary 6.4 that there is a unique such choice of
Mp with minimal dim(Mp), in which case dim(Mp) ∈ {1, p, p2} for odd p and
dim(M2) ∈ {1, 2, 4, 8}.
Theorem 6.12. Let C be a braided nilpotent fusion category. Then C has a unique
decomposition into a tensor product of braided fusion categories of prime power
dimension.
Proof. According to Theorem 6.10 the result is true if the dimensions of simple
objects of C are integers. In general, define subcategories Cp ⊂ C by condition (24)
above. For a simple object X ∈ C it is known (see [GN]) that FPdim(X) =
N ∈ N. Thus X⊠X ∈ C⊠C has an integer dimension. The category C⊠C contains
a fusion subcategory (C⊠ C)int consisting of all objects with integer dimension, see
[GN]. We can apply Theorem 6.10 to the category (C ⊠ C)int and obtain a unique
decomposition X = ⊗pXp with Xp ∈ Cp. The theorem is proved. �
Corollary 6.13. Let C be a braided nilpotent fusion category. Assume that X ∈ C
is simple and its dimension is not integer. Then FPdim(X) ∈
Proof. This follows immediately from Theorem 6.12 since if a category of prime
power Frobenius-Perron dimension pk contains an object of a non-integer dimension
then p = 2, see [ENO]. �
Example 6.14. It is easy to see that the Tambara-Yamagami categories from [TY]
are nilpotent and indecomposable into a tensor product. Thus Theorem 6.12 implies
that if such a category admits a braiding, then its dimension should be a power of
2 (since the dimension of a Tambara-Yamagami category is always divisible by 2).
A stronger result is contained in [S].
References
[Be] R. Bezrukavnikov, On tensor categories attached to cells in affine Weyl groups, Represen-
tation theory of algebraic groups and quantum groups, 69–90, Adv. Stud. Pure Math., 40,
Math. Soc. Japan, Tokyo, 2004.
[Br] A. Bruguières, Catégories prémodulaires, modularization et invariants des variétés de
dimension 3, Mathematische Annalen, 316 (2000), no. 2, 215-236.
[BD] M. Boyarchenko, V. Drinfeld, A motivated introduction to character sheaves and the orbit
method for unipotent groups in positive characteristic, math.RT/0609769.
[BK] B. Bakalov, A. Kirillov Jr., Lectures on Tensor categories and modular functors, AMS,
(2001).
[CG] A. Coste, T. Gannon, Remarks on Galois symmetry in rational conformal field theories,
Phys. Lett. B 323 (1994), no. 3-4, 316-321.
[De] P. Deligne, Catégories tensorielles, Mosc. Math. J. 2 (2002), no. 2, 227–248.
[Dr] V.G. Drinfeld, Quasi-Hopf algebras, Leningrad Math. J. 1 (1990), no. 6, 1419-1457.
http://arxiv.org/abs/math/0609769
22 VLADIMIR DRINFELD, SHLOMO GELAKI, DMITRI NIKSHYCH, AND VICTOR OSTRIK
[dBG] J. de Boere, J. Goeree, Markov traces and II1 factors in conformal field theory, Comm.
Math. Phys. 139 (1991), no. 2, 267-304.
[DPR] R. Dijkgraaf, V. Pasquier, and P. Roche, Quasi-quantum groups related to orbifold models,
Nuclear Phys. B. Proc. Suppl. 18B (1990), 60-72.
[EG1] P. Etingof and S. Gelaki, Isocategorical groups, International Mathematics Research No-
tices 2 (2001), 59–76.
[EG2] P. Etingof and S. Gelaki, On finite-dimensional semisimple and cosemisimple Hopf al-
gebras in positive characteristic, International Mathematics Research Notices 16 (1998),
851–864.
[ENO] P. Etingof, D. Nikshych, V. Ostrik, On fusion categories, Annals of Mathematics 162
(2005), 581-642.
[EO] P. Etingof, V. Ostrik, Finite tensor categories, Moscow Math. Journal 4 (2004), 627-654.
[GN] S. Gelaki, D. Nikshych, Nilpotent fusion categories, math.QA/0610726.
[JS] A. Joyal, R. Street, Braided tensor categories, Adv. Math., 102, 20-78 (1993).
[Kash] Y. Kashina, Classification of semisimple Hopf algebras of dimension 16, J. Algebra 232
(2000), no. 2, 617–663.
[Kass] C. Kassel, Quantum groups, Graduate Texts in Mathematics 155, Springer, New York.
[KaO] I. Kath, M. Olbrich Metric Lie algebras and quadratic extensions, Transform. Groups 11
(2006), no. 1, 87–131.
[KiO] A. Kirillov Jr., V. Ostrik, On q-analog of McKay correspondence and ADE classification
of ŝl2 conformal field theories, Adv. Math. 171 (2002), no. 2, 183–227.
[Ma1] A. Masuoka, The pn theorem for semi-simple Hopf algebras, Proc. AMS 124 (1996), 735-
[Ma2] A. Masuoka, Self-dual Hopf algebras of dimension p3 obtained by extension, J. Algebra
178 (1995), 791–806.
[Mu1] M. Müger, Galois theory for braided tensor categories and the modular closure, Adv. Math.
150 (2000), no. 2, 151–201.
[Mu2] M. Müger, On the structure of modular categories, Proc. Lond. Math. Soc., 87 (2003),
291-308.
[Mu3] M. Müger, Galois extensions of braided tensor categories and braided crossed G-categories,
J. Algebra 277 (2004), no. 1, 256–281.
[Mu4] M. Müger, From subfactors to categories and topology. II. The quantum double of tensor
categories and subfactors, J. Pure Appl. Algebra 180 (2003), no. 1-2, 159–219.
[MMT] R. McKenzie, G. McNulty, W. Taylor, Algebras, lattices, varieties. Vol. I., The
Wadsworth & Brooks/Cole Mathematics Series. Wadsworth & Brooks/Cole Advanced
Books & Software, Monterey, CA, 1987.
[MW] S. Montgomery, S. Witherspoon, Irreducible representations of crossed products, J. Pure
Appl. Algebra, 129 (1998), no. 3, 315–326.
[N] D. Naidu, Categorical Morita equivalence for group-theoretical categories, Comm. Alg., to
appear, math.QA/0605530.
[O1] V. Ostrik, Module categories, weak Hopf algebras and modular invariants, Transform.
Groups, 8 (2003), 177-206.
[O2] V. Ostrik, Module categories over the Drinfeld double of a finite group, Int. Math. Res.
Not. (2003) no. 27, 1507-1520.
[P] B. Pareigis, On braiding and dyslexia, J. Algebra 171 (1995), no. 2, 413–425.
[Q] F. Quinn, Group categories and their field theories, Proceedings of the Kirbyfest (Berkeley,
CA, 1998), 407–453 (electronic), Geom. Topol. Monogr., 2, Geom. Topol. Publ., Coventry,
1999.
[RT] N. Reshtikhin, V. Turaev, Ribbon graphs and their invariants derived from quantum
groups, Comm. Math. Phys., 127 (1990), 1-26.
[S] J. A. Siehler, Braided Near-group Categories, math.QA/0011037.
[T] V. Turaev, Quantum invariants of knots and 3-manifolds, W. de Gruyter (1994).
[TY] D. Tambara, S. Yamagami, Tensor categories with fusion rules of self-duality for finite
abelian groups, J. Algebra 209 (1998), no. 2, 692–707.
[Z] Y. Zhu, Hopf algebras of prime dimension, Internat. Math. Res. Notices 1 (1994), 53–59.
http://arxiv.org/abs/math/0610726
http://arxiv.org/abs/math/0605530
http://arxiv.org/abs/math/0011037
GROUP-THEORETICAL PROPERTIES OF NILPOTENT MODULAR CATEGORIES 23
V.D.: Department of Mathematics, University of Chicago, Chicago, IL 60637, USA
E-mail address: drinfeld@math.uchicago.edu
S.G.: Department of Mathematics, Technion-Israel Institute of Technology, Haifa
32000, Israel
E-mail address: gelaki@math.technion.ac.il
D.N.: Department of Mathematics and Statistics, University of New Hampshire,
Durham, NH 03824, USA
E-mail address: nikshych@math.unh.edu
V.O.: Department of Mathematics, University of Oregon, Eugene, OR 97403, USA
E-mail address: vostrik@math.uoregon.edu
	1. introduction
	1.1. Main results
	1.2. Interpretation in terms of group-theoretical fusion categories and semisimple quasi-Hopf algebras.
	1.3. Idea of the proof
	1.4. Organization of the paper
	1.5. Acknowledgments
	2. Preliminaries
	2.1. Fusion categories
	2.2. Nilpotent fusion categories
	2.3. Premodular categories and modular categories
	2.4. Pointed modular categories and metric groups
	2.5. Centralizers
	2.6. Gauss sums and central charge in modular categories
	2.7. Symmetric fusion categories
	3. Isotropic subcategories and Bruguières-Müger modularization
	3.1. Modularization
	3.2. Invariance of the central charge
	3.3. Maximal isotropic subcategories
	4. Reconstruction of a twisted group double from a Lagrangian subcategory
	4.1. C-algebras
	4.2. Hyperbolic modular categories as twisted group doubles
	4.3. A criterion for a modular category to be group-theoretical
	5. Pointed modular categories
	6. Nilpotent modular categories
	References
ABSTRACT
  We characterize a natural class of modular categories of prime power
Frobenius-Perron dimension as representation categories of twisted doubles of
finite p-groups. We also show that a nilpotent braided fusion category C admits
an analogue of the Sylow decomposition. If the simple objects of C have
integral Frobenius-Perron dimensions then C is group-theoretical. As a
consequence, we obtain that semisimple quasi-Hopf algebras of prime power
dimension are group-theoretical. Our arguments are based on a reconstruction of
twisted group doubles from Lagrangian subcategories of modular categories (this
is reminiscent to the characterization of doubles of quasi-Lie bialgebras in
terms of Manin pairs).

<|endoftext|><|startoftext|>
Introduction
One of the most puzzling results of the chiral quark-soliton model (χQSM) for
exotic baryons consists in a very small hadronic decay width,1) governed by the
decay constant G10. While the small mass of exotic states is rather generic for all
chiral models1)–3) the smallness of the decay width appears as a subtle cancelation
of three different terms that contribute to G10. Decay width in solitonic models
4) is
calculated in terms of a matrix element M of the collective axial current operator
corresponding to the emission of a pseudoscalar meson ϕ1) – see Ref. 5) for criticism
of this approach:
Ô(8)ϕ = 3
ϕi −G1 dibcD
ϕb Ŝc −
ϕ8 Ŝi
× piϕ. (1.1)
For notation see Ref. 1). Constants G0,1,2 are constructed from the so called moments
of inertia that are calculable in χQSM. The decay width is given as
ΓB→B′+ϕ =
M M ′
M2 = 1
M M ′
A2. (1.2)
The “bar” over the amplitude squared denotes averaging over initial and summing
over final spin (and, if explicitly indicated, over isospin).
For B(10) → B′(8) + ϕ for spin ”up” and ~pϕ = (0, 0, pϕ) we have
81/2, B
∣ Ô(8)ϕ
∣101/2, B
3G10√
× pϕ (1.3)
G10 = G0 −G1 −
G2. (1.4)
∗) e-mail address: yessien@gmail.com
∗∗) e-mail address: michal@if.uj.edu.pl
typeset using PTPTEX.cls 〈Ver.0.9〉
http://arxiv.org/abs/0704.0196v2
2 K. Pieściuk and M. Prasza lowicz
In order to have an estimate of the width (1.2) the authors of Ref. 1) calculated G10
in the nonrelativistic limit6) of χQSM and got G10 ≡ 0. It has been shown that this
cancelation between terms that scale differently with Nc (G0 ∼ N
c , G1,2 ∼ N
is in fact consistent with large Nc counting,
7) since
G10 = G0 −
Nc + 1
G2 (1.5)
where the Nc dependence comes from the SU(3) Clebsch-Gordan coefficients calcu-
lated for large Nc. In the nonrelativistic limit (NRL):
G0 = −(Nc + 2)G, G1 = −4G, G2 = −2G, G ∼ N1/2c . (1.6)
In this paper we ask whether the similar cancelation takes place for the decays of
27 of spin 1/2 and 3/2. We also discuss the possible modifications of the Nc depen-
dence of the decay width due to the different choice of the large Nc generalizations
of regular SU(3) multiplets.
§2. Baryons in large N
limit
Soliton is usually quantized as quantum mechanical symmetric top with two
moments of inertia I1,2:
B = Mcl +
S(S + 1) +
C2(R) − S(S + 1) −
B . (2
Here S denotes baryon spin, C2(R) the Casimir operator for the SU(3) representation
R = (p, q):
C2(R) =
p2 + q2 + pq + 3(p + q)
(2.2)
and quantities δ
B denote matrix elements of the SU(3) breaking hamiltonian:
Ĥ ′ =
σ + αD
88 ) + βY +
8A ĴA. (2
Model parameters that can be found in Ref. 8)
α = −Nc
(σ + β), β = −ms
, γ = 2ms
, σ =
mu + md
scale with Nc in the following way:
i1,2 = 3I1,2/Nc where i1,2 ∼ O(N0c ), σ, β, γ ∼ O(msN0c ). (2.4)
Here ΣπN is pion-nucleon sigma term and mq denote current quark masses. Numer-
ically σ > |β| , |γ|.
So far we have specified explicit Nc dependence (2.4) that follows from the fact
that model parameters are given in terms of the quark loop. Another type of the
Remarks on Nc dependence 3
Nc dependence comes from the constraint
9) that selects SU(3)flavor representations
R = (p, q) containing states with hypercharge YR = Nc/3. Therefore for arbitrary
Nc ordinary baryon representations have to be extended and one has to specify which
states correspond to the physical ones. Usual choice10)
”8” = (1, (Nc − 1)/2) , ”10” = (3, (Nc − 3)/2) , ”10” = (0, (Nc + 3)/2) , (2.5)
depicted in Fig. 1 corresponds – in the quark language – to the case when each time
when Nc is increased by 2, a spin-isospin singlet (but charged) 3 diquark is added,
as depicted in Fig. 2.
Fig. 1. Standard generalization of SU(3) flavor baryon representations for arbitrary Nc
Fig. 2. Adding 3 diquarks to regular SU(3) baryon representations 8, 10 and 10 corresponds to the
representation set of Fig.1.
Extension (2.5) leads to (1.5). It implies that mass differences between centers
of multiplets scale differently with Nc:
∆10−8 =
∼ O(1/Nc), ∆10−8 =
Nc + 3
∼ O(1). (2.6)
The fact that ∆10−8 6= 0 in large Nc limit triggered recently discussion on the
validity of the semiclassical quantization for exotic states.11) Since in the chiral
limit the momentum pϕ of the outgoing meson scales according to (2.6), overall Nc
dependence of the decay width is strongly affected by its third power (1.2):
ΓB→B′+ϕ ∼
O(A2)O(p3ϕ). (2.7)
Phenomenologically, however, scaling (2.6) is not sustained. Indeed, meson mo-
menta in ∆ and Θ decays are almost identical (assuming M
Θ ≃ 1540 MeV):
pπ ≃ 225 MeV, pK ≃ 268 MeV. (2.8)
Unfortunately, going off SU(3)flavor limit does not help. Explicitly:
δ(8) =
(Nc − 3)
(Nc − 2)α + 32γ
Nc + 7
3(Nc + 2)α− 12(2Nc + 9)γ
(Nc + 3)(Nc + 7)
4 K. Pieściuk and M. Prasza lowicz
(6α + (Nc + 6)γ)
(Nc + 3)(Nc + 7)
− I(I + 1)
= 3σ + 2β − σY + . . . (2.9)
δ(10) =
(Nc − 3)(Nc + 4)
(Nc + 1)(Nc + 9)
Nc − 3
5(Nc − 3)
2(Nc + 1)(Nc + 9)
3(Nc − 1)α − 52(Nc + 3)γ
(Nc + 1)(Nc + 9)
Y = 3σ + 2β − σY + . . . (2.10)
δ(10) =
Nc(Nc − 3)
(Nc + 3)(Nc + 9)
Nc − 3
β − 3(Nc − 3)
2(Nc + 3)(Nc + 9)
6Ncα− 9γ
2(Nc + 3)(Nc + 9)
Y = 5σ + 4β − σY + . . . (2.11)
where . . . denote terms O(1/Nc), Y and I denote physical hypercharge and isospin.
Interestingly in all cases in the large Nc limt, ms splittings are proportional to
the hypercharge differences only. In this limit Σ−Λ splitting in the octet is zero and
this degeneracy is lifted in the next order at O(1/Nc). This explains the smallness of
Σ − Λ mass difference. Additionally δ(8)N ≃ δ
∆ up to higher order terms O(1/N2c ),
however δ
Θ − δ
N ≃ σ + 2β > 0. This implies that
α + β − 3
γ → 3
+ σ + 2β + O(1/Nc),
γ → O(1/Nc). (2.12)
The first equation shows that the Θ − N 6= 0 in the large Nc limit even if ms
corrections are included. We will come back to this problem in the last section.
§3. Decay constants of twentysevenplet for large N
In this section we shall consider decays of eikosiheptaplet (27-plet)
”27” = (2, (Nc + 1)/2) (3.1)
that can have either spin 1/2 or 3/2, the latter being lighter. Mass differences read
∆273/2−8 =
Nc + 1
∼ O(1), ∆271/2−8 =
Nc + 7
∼ O(1),
∆273/2−10 =
Nc + 1
∼ O(1), ∆271/2−10 = −
Nc + 7
∼ O(1),
∆273/2−10 =
∼ O(1/Nc), ∆271/2−10 =
∼ O(1/Nc). (3.2)
Matrix elements for the decays of eikosiheptaplet (with S3 = 1/2) read:
A(B273/2 → B
8 + ϕ) = 3
8 ”8”
8(Nc + 5)
9(Nc + 3)(Nc + 9)
×G27,
Remarks on Nc dependence 5
A(B273/2 → B
10 + ϕ) = −3
8 ”10”
(Nc − 1)(Nc + 7)
9(Nc + 1)(Nc + 3)(Nc + 9)
× F27,
A(B273/2 → B
+ ϕ) = 3
8 ”10”
2(Nc + 1)(Nc + 7)
3(Nc + 3)(Nc + 9)
× E27,
(3.3)
Decay Large Nc NRL
Scaling
in NRL
273/2 → 81/2 G27 = G0 − Nc−14 G1 = −3G N
273/2 → 103/2 F27 = G0 − Nc−14 G1 −
G2 = 0 0
273/2 → 101/2 E27 = G0 + G1 = −(Nc + 6)G N
For S = 1/2 and S3 = 1/2 we have:
A(B271/2 → B
8 + ϕ) = −3
8 ”8”
(Nc + 1)(Nc + 5)
9(Nc + 3)(Nc + 7)(Nc + 9)
×H27,
A(B271/2 → B
10 + ϕ) = −3
8 ”10”
8(Nc − 1)
9(Nc + 3)(Nc + 9)
×G′27,
A(B271/2 → B
+ ϕ) = 3
8 ”10”
Nc + 4
9(Nc + 3)(Nc + 9)
×H ′27,
(3.4)
Decay Large Nc NRL
Scaling
in NRL
271/2 → 81/2 H27 = G0−Nc+54 G1 +
G2 = 0 0
271/2 → 103/2 G′27 = G0−Nc+54 G1 = 3G N
271/2 → 101/2 H ′27 = G0 + 2Nc+52Nc+8G1 +
2Nc+8
G2 = − (Nc+3)(Nc+7)Nc+4 G N
In order to calculate the Nc behavior of the width we have to know the Nc
dependence of the flavor Clebsch-Gordan coefficients that depend on the states in-
volved. For the decays into 8 and 10 the only possible channels are Θ27 → N(∆)+K,
and the pertinent Clebsches do not depend on Nc. For the decays into 10 we have
Θ27 → Θ10 + π that scales like O(1) and Θ27 → N10 +K that scales like O(1/
The resulting scaling of ΓΘ27→B′+ϕ calculated from Eq.(2
.7) reads as follows:
6 K. Pieściuk and M. Prasza lowicz
decay of Nc scaling decay of Nc scaling
Θ273/2 exact NRL Θ271/2 exact NRL
→ N8 + K O(1) O(1/N2c ) → N8 + K O(1) 0
→ ∆10 + K O(1) 0 → ∆10 + K O(1) O(1/N2c )
→ N10 + K O(1/N3c ) O(1/N3c ) → N10 + K O(1/N3c ) O(1/N3c )
→ Θ10 + π O(1/N2c ) O(1/N2c ) → Θ10 + π O(1/N2c ) O(1/N2c )
Interestingly, we see that whenever the exact scaling is O(1), the nonrelativistic
cancelation (exact or partial) lowers the power of Nc, whereas in the case when the
width has good behavior for large Nc, there is no NRL cancelation.
§4. Alternative choices for large N
multiplets
So far we have only considered the ”standard” generalization (2.5) of baryonic
SU(3)flavor representations for large Nc. This choice is based on the requirement that
generalized baryonic states have physical spin, isospin and strangeness, however their
hypercharge and charge are not physical.10) Moreover, the generalization of the octet
is not selfadjoint and antidecuplet is not complex conjugate of decuplet. Some years
ago it has been proposed to consider alternative schemes.12)
Fig. 3. Generalization of SU(3) flavor representations in which octet is selfadjoint
Fig. 4. Adding triquarks to regular SU(3) baryon representations 8, 10 and 10 corresponds to the
representation set of Fig.3.
If we require the generalized octet to be self-adjoint we are led to the following
set of representations
”8” = (Nc/3, Nc/3) , ”10” = ((Nc + 6)/3, (Nc − 3)/3) , ”10” = ”10”∗ (4.1)
that are depicted in Figs. 3 and 4. This means that we enlarge Nc in steps of 3 adding
each time a uds triquark. Generalized states have physical isospin, hypercharge (and
charge), but unphysical strangeness and spin that is of the order of Nc. With this
Remarks on Nc dependence 7
choice both ∆10−8, ∆10−8 6= 0 in large Nc limit:
∆10−8 = (Nc/6 − 1) /I1, ∆10−8 = (Nc/6 − 1) /I2. (4.2)
With this power counting we can calculate large Nc approximation of the meson
momenta in the decays of ∆ and Θ:
∆ → N pπ =
(M∆ −MN )2 −m2π = 256 MeV,
Θ → N pK =
(MΘ −MN )2 −m2K = 339 MeV (4.3)
that are much closer to the physical values (2.8) than (2.6).
Fig. 5. Generalization of SU(3) flavor representations in which decuplet is fully symmetric (0, q).
Fig. 6. Adding sextet diquarks to regular SU(3) baryon representations 8, 10 and 10 corresponds
to the representation set of Fig.5.
Finally let us mention a third possibility in which we require generalized decuplet
to be a completely symmetric SU(3)flavor representation for arbitrary Nc. This leads
to (see Figs. 5 and 6):
”8” = (Nc − 2, 1) ”10” = (Nc, 0) ”10” = (Nc − 3, 3) . (4.4)
Interestingly this choice has a smooth limit to the one flavor case. In the quark
language it amounts to adding a symmetric diquark to the original SU(3)flavor rep-
resentation when increasing Nc in steps of 2. As seen from Fig. 5 physical states are
situated at the bottom of infinite representations (4.4) and therefore have unphysical
strangeness, charge (hypercharge) and also spin.
The mass splittings for this choice read
∆10−8 = Nc/ 2I1, ∆10−8 = 3/ 2I2. (4
Here the generalized decuplet remains split from the ”8”, while ∆10−8 → 0 for large
Nc. The phase space factor for Θ decay is therefore suppressed with respect to the
one of ∆.
§5. Summary
In this short note we have shown that very small width of exotic baryons – if
they exist – cannot be explained by the standard Nc counting alone. Certain degree
8 K. Pieściuk and M. Prasza lowicz
of nonrelativisticity is needed to ensure cancelations between different terms in the
decay constants. This phenomenon observed firstly for antidecuplet, is also operative
for the decays of eikosiheptaplet. We have shown that in χQSM in the nonrelativistic
limit all decays are suppressed for large Nc. Exact cancelations occur for Θ273/2 →
∆10 + K and Θ271/2 → N8 + K, leading Nc terms cancel for Θ273/2 → N8 + K and
Θ271/2 → ∆10 + K. For 27 → 10 there are no cancelations, but the phase space is
N−3c suppressed.
We have also briefly discussed nonstandard generalizations of regular baryon
representations for arbitrary Nc. For Nc > 3 bayons are no longer composed from 3
quarks and therefore they form large SU(3)flavor representations that reduce to octet,
decuplet and antidecuplet for Nc = 3. The standard way to generalize regular baryon
representations is to add antisymmetric antitriplet diqaurk when Nc is increased in
intervals of 2. This choice fulfils many reasonable requirements; most importantly
for SU(2)flavor these representations form regular isospin multiplets. However, repre-
sentations (2.5) do not obey conjugation relations characteristic for regular represen-
tations. Therefore we have proposed generalization (4.1) that satisfies conjugation
relations. Most important drawback of (4.1) is that spin S ∼ Nc that contradicts
semiclassical quantization. Nevertheless as a result meson momenta emitted in 10
and 10 decays scale in the same way with Nc (4.3), consistently with ”experimental”
values (2.8), whereas for (2.5) the scaling is different (2.6).
Acknowledgements
One of us (MP) is grateful to the organizers of the Yukawa International Sym-
posium (YKIS2006) for hospitality during this very successful workshop.
References
1) D. Diakonov, V. Petrov and M. V. Polyakov, Z. Phys. A 359 (1997) 305
[arXiv:hep-ph/9703373].
2) L.C Biedenharn and Y. Dothan, Monopolar Harmonics in SU(3)F as eigenstates of the
Skyrme-Witten model for baryons, E. Gotsman and G. Tauber (eds.), From SU(3) to
gravity, p. 15-34.
3) M. Prasza lowicz, talk at Workshop on Skyrmions and Anomalies, M. Jeżabek and M.
Prasza lowicz eds., World Scientific 1987, page 112 and Phys. Lett. B 575 (2003) 234
[hep-ph/0308114].
4) G. S. Adkins, C. R. Nappi and E. Witten, Nucl. Phys. B 228 (1983) 552;
5) H. Weigel, arXiv:hep-ph/0703072.
6) M. Prasza lowicz, A. Blotz and K. Goeke, Phys. Lett. B 354 (1995) 415 [hep-ph/9505328];
M. Prasza lowicz, T. Watabe and K. Goeke, Nucl. Phys. A 647 (1999) 49 [hep-ph/9806431].
7) M. Prasza lowicz, Phys. Lett. B 583, 96 (2004) [arXiv:hep-ph/0311230].
8) A. Blotz, D. Diakonov, K. Goeke, N. W. Park, V. Petrov and P. V. Pobylitsa, Nucl. Phys.
A 555, 765 (1993).
9) E. Guadagnini, Nucl. Phys. B 236 (1984) 35;
P.O. Mazur, M. Nowak and M. Prasza lowicz, Phys. Lett. B 147(1984) 137;
S. Jain and S.R. Wadia, Nucl. Phys. B 258 (1985) 713.
10) G. Karl, J. Patera and S. Perantonis, Phys. Lett. B 172 (1986) 49;
J. Bijnens, H. Sonoda and M. Wise, Can. J. Phys. 64 (1986) 1.
Z. Duliński and M. Prasza lowicz, Acta Phys. Pol. B 18 (1988) 1157.
11) P. V. Pobylitsa, Phys. Rev. D 69, 074030 (2004) [arXiv:hep-ph/0310221].
T. D. Cohen, Phys. Rev. D 70, 014011 (2004) [arXiv:hep-ph/0312191].
12) Z. Duliński, Acta. Phys. Pol. B 19 (1988) 891.
http://arxiv.org/abs/hep-ph/9703373
http://arxiv.org/abs/hep-ph/0308114
http://arxiv.org/abs/hep-ph/0703072
http://arxiv.org/abs/hep-ph/9505328
http://arxiv.org/abs/hep-ph/9806431
http://arxiv.org/abs/hep-ph/0311230
http://arxiv.org/abs/hep-ph/0310221
http://arxiv.org/abs/hep-ph/0312191
	Introduction
	Baryons in large Nc limit
	Decay constants of twentysevenplet for large Nc
	Alternative choices for large Nc multiplets
	Summary
ABSTRACT
  We calculate the N_c dependence of the decay widths of exotic eikosiheptaplet
within the framework of Chral Quark Soliton Model. We also discuss
generalizations of regular baryon representations for arbitrary N_c.

<|endoftext|><|startoftext|>
Introduction
In 1969 Stuart Kauffman started to study random Boolean networks as simple
models of genetic regulatory networks [1]. Random Boolean networks that con-
sists of a set of Boolean gates that are capable of storing a single Boolean value.
At discrete time steps these gates store a new value according to an initially
chosen random Boolean function, which receives its inputs from random chosen
gates. We will give a more formal definition later. Kauffman made numerical
∗Corresponding author. E-Mail: Steffen.Schober@uni-ulm.de
http://arxiv.org/abs/0704.0197v1
DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  
studies of random networks, where the functions are chosen from the set of all
Boolean functions with K arguments (the so called NK-Networks). He recog-
nised that if K ≤ 2, the random networks exhibit a remarkable form of ordered
behaviour: The limit cycles are small, the number of ineffective gates, which are
gates that can be perturbed without changing the asymptotic behaviour, and
the number of freezing gates that stop changing their state is large. In contrast
if K ≥ 3, the networks do not exhibit this kind of ordered behaviour (see [1, 2]).
The first analytical proof for this phase transition was given by Derrida and
Pomeau (see [3]) by studying the evolution of the Hamming distance of random
chosen initial states by means of so called annealed approximation. The first
proof for the number of freezing and ineffective gates was given by James Lynch
(see [4], although slightly weaker results appeared earlier [5, 6]). Depending on
a parameter λ, that depends on the probabilities of the Boolean functions, he
showed that if λ ≤ 1 almost all gates are ineffective and freezing, otherwise not.
Although his analysis is very general, until now it was only applied to networks
with connectivity 2 and non-uniform probabilities for the Boolean function: if
the probability of choosing a constant function is larger or equal the probability
of choosing a non-constant non-canalizing function (namely the XOR- or the
inverted XOR-function), λ is less or equal to one. But it turns out that in some
cases λ is equal to the expectation of the average sensitivity. Therefore we will
first study the average sensitivity in Section 3. Afterwards it will be shown
in Section 4 how to use the results from the previous section to apply Lynch’s
analysis to classical NK-Networks and biased random Boolean networks 1. But
first we will give some basic definition used throughout the paper in Section 2.
2 Basic Definitions
In the following F2 = {0, 1} denotes the Galois field of two elements, where
addition, denoted by ⊕, is defined modulo 2. The set of vectors of length K
over F2 will be denoted by F
2 . If x is a vector from F
2 , its ith component
will be denoted by xi. With u
(i) ∈ FK2 we will denote the unit vector which has
all components zero except component i which is one. The Hamming weight of
x ∈ FK2 is defined as
wH(x) = |{i | xi 6= 0, i = 1, . . . ,K}|
and the Hamming distance of x,y ∈ FK2 as
dH(x,y) = wH(x⊕ y).
A Boolean function is a mapping f : FK2 → F2. A function f may be represented
by its truth table tf , that is, a vector in F
2 , where each component of the truth
table gives the value of f for one of the 2K possible arguments. To fix an order
on the components of the truth table, suppose that its ith component equals
the value of the corresponding function, given the binary representation (to K
bits) of i as an argument.
1a definition will be given later
DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  
3 Average Sensitivity
In this section we will focus on the average sensitivity. The average sensitivity
is a known complexity measure for Boolean functions, see for example [7]2. It
was already used to study Boolean and random Boolean networks for example
in [8, 9].
Definition 1. Let f denote a Boolean function FK2 → F2 and u
(i) a unit vector.
1. The sensitivity sf(w) is defined as:
sf (w) =
i | f(w) 6= f(w⊕ u(i)), i = 1, . . . ,K
2. The average sensitivity sf is defined as the average of sf (w) over all w ∈
sf = 2
sf (w)
Now consider the random variable FK : Ω → FK, where FK denotes the set
a all 22
Boolean function with K arguments. The probability measure is given
by P (FK = f) =
K . The expected value of the average sensitivity of this
random variable is denoted by EFK (sf ), and is given by
EFK (sf ) =
P (FK = f)sf
The expected value was already derived in [10], and is given by:
Theorem 1 (Bernasconi [10]).
Let the random variable FK be defined as above, then
EFK (sf ) =
P (FK = f)sf =
We will now concentrate on biased Boolean functions. The bias of a Boolean
function f : FK2 → F2 is defined as the number of 1 in the functions truth table
divided by 2K . To define the bias of a random Boolean function two definitions
are possible. First we can assumes that the truth tables of the Boolean functions
are produced by independent Bernoulli trials with probability p for a one (This
should be called mean bias, used for example in [3, 8] ). Therefore consider the
random variable FK,p. The probability of choosing a function f is given by
P (FK,p = f) = p
wH(tf )(1− p)2
K−wH(tf )
For p = 1/2 this is equivalent to the definition of FK .
2here it is called critical complexity
DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  
As a second possibility, we can only choose functions which have bias p
whereas to all other functions we assign probability 0 (we will call this fixed
bias). Therefore consider the random variables F fixedK,p : Ω → FK. Denote the
truth table of a function f by tf . Further denote the set of all Boolean functions
f with K arguments and wH(tf ) = p2
K with FK,p. The probability for a certain
function chosen according F fixedK,p is given by
P (F fixedK,p = f) =
|FK,p|
if f ∈ FK,p
0 if f /∈ FK,p
Both definitions ensure that the expectation to get a one is equal to p if the
input of a function is chosen at random (with respect to uniform distribution).
But it will turn out that these two different methods of creating biased Boolean
functions, have a major impact on the average sensitivity.
The expectation of the average sensitivity of FK,p was derived in [8]:
Theorem 2 ([8]). Let the random variable FK,p be defined as above:
EFK,p(sf ) = 2Kp(1− p)
For the random variable F fixedK,p we will now proof the following theorem:
Theorem 3. Let the random variable F
fixed
be defined as above:
fixed
(sf ) =
2K+1Kp(1− p)
(2K − 1)
Proof. To find EF fixed
(sf ) we will first consider the random variable FK,t : Ω →
FK where t ∈ {0, 1, · · · , 2
K} and the probability of a function is given by
P (FK,t = f) =
if wH(tf ) = t
0 else
Consider the Boolean functions as functions into R by identifying 0, 1 ∈ F2
with 0, 1 ∈ R. Then we get or the function f :
sf = 2
i | f(w) 6= f(u(i) ⊕w), i = 1, . . . ,K
= 2−K
(f(w)− f(w⊕ u(i)))2
= 2−K
(f(w) + f(w⊕ u(i))− 2f(w)f(w⊕ u(i))).
DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  
where u(i) again denotes the unit vector with ith component set to 1. Hence by
the linearity of the expectation
EFK,t(sf ) = 2
EFK,t(f(w)) + EFK,t(f(w ⊕ u
(i)))
− 2EFK,t(f(w)f(w ⊕ u
(i)))
Now we form a matrix with the truth tables of all functions with Hamming
weight t as column vectors:
c(1), c(2), · · · , c((
where c(i) ∈ F2
M has exactly
columns and 2K rows. Each entry Mi,j in the ith row and
jth column equals the value of function fj given the binary representation of i
as input.
Hence EFK,t(f(w)) is determined by the number of 1 in the row associated
with w divided by the length of the row. Consider an arbitrary row i. This row
has a one at position j if the corresponding column c(j) has a one at position i.
But there are
column vectors with a 1 at position i. It follows:
∀w ∈ FK2 : EFK,t(f(w)) =
. (2)
As this holds for all w, we have
∀w,u(i) ∈ FK2 : EFK,t(f(w⊕ u
(i))) =
. (3)
To find an expression for EF fixed
(f(w)f(w⊕u(i))) we consider two arbitrary
rows l,m (l 6= m). Define the following sum:
γl,m =
(Kt )
Ml,iMm,i.
Obviously Ml,iMm,i = 1 only if we have a 1 in both rows at position i. This
means for the column vectors c(i) of M , we have c
m = 1. But there are
exactly
such column vectors in M . Therefore we have
∀l,m, l 6= m : γl,m =
2K − 2
As w 6= w ⊕ u(i) for all w,u(i) it follows:
EFK,t(f(w)f(w ⊕ u
(i))) =
t(t− 1)
2K(2K − 1)
. (4)
DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  
Hence substituting Equations (2), (3) and (4) into Equation (1) leads to
EFK,t(sf ) =
K(2K − t)t
2K−1(2K − 1)
Finally the claimed expression for EF fixed
(sf ) can be obtained from the above
equation by a substitution of t: t → p2K .
It should be noted, that the Theorems 1 and 2 can be proved using in a
similar way. Also worth noting is the fact, that if the functions are chosen
according FK , F
fixed
K,p or FK,p the expectation of the sensitivity of a fixed vector
w (namely the expectation of sf (w)) is independent of w (see Equation (1),(2),
(3) and (4)). Hence the following lemma holds
Lemma 1. If F = FK , F
fixed
or FK,p, then
∀w,v ∈ FK2 : EF (sf (w)) = EF (sf (v))
Before proceeding to the next section, it should be noted, that using the same
arguments as in the proof of Theorem 3, we can also prove the expectation of
average sensitivity of order l , defined as
s(l)(f) = 2−K
x ∈ FK2 |wH(x) = l and f(w) 6= f(w⊕ x)
In this case, instead of summing up all unit vectors in Equation (1), we sum
up all vectors of Hamming weight l. As the equations (2) and (4) hold for all
w ∈ FK2 we conclude that
E(s(l)(F fixedK,p )) =
2K+1p(1− p)
(2K − 1)
and by similar arguments
E(s(l)(FK,p)) =
2p(1− p)
respectively
E(s(l)(FK)) =
4 Extending Lynch’s analysis
As already mentioned James Lynch gave a very general analysis of randomly
constructed Boolean networks (see [4]). Before stating his results we give a
formal definition for Boolean networks A Boolean network B is a 4-tuple
DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  
(V,E, F̃ ,x) where V = {1, ..., N} is a set of natural numbers, E is a set of
labeled edges on V , F̃ = {f1, ..., fN} is a ordered set of Boolean functions such
that for each v ∈ V the number of arguments of fv is the in-degree of v in
E, these edges are labeled with 1, ..., in-degree(v), and x = (x1, . . . xn) ∈ F
Suppose that a vertex i has Ki in-edges from vertices vi,1, . . . , vi,Ki . For y ∈ F
we define
B(y) =
f1(yv1,1 , . . . , yv1,K1 ), . . . , fN(yvN,1 , . . . , yvN,KN )
The state of B at time 0 is called the initial state x, so we define B0(x) = x. For
time t ≥ 1 the state is inductively defined as Bt(x) = B(Bt−1(x)). Hence we
can in interpret V as set of gates, E and F̃ describes their functional dependence
and x is the networks initial state.
Assume some ordering f1, f2, ... on the set of all Boolean functions F , where
each function fi depends on Ki arguments. Further a random variable F : Ω →
F with probabilities pi = P (F = fi) such that
i=i pi = 1 and
i=1 piK
∞. Now a random Boolean network consisting of N gates is constructed as
follows: For each gate a Boolean function is chosen independently, where the
probability of choosing fi is given by pi. Suppose a function f was chosen that
has K arguments, these arguments are chosen at random from all
equally
likely possibilities. At last an initial state is chosen at random from the set on
all equally likely states. If the Boolean functions are chosen according to our
previously defined random variable FK we will call this networks NK-Networks
with connectivity K. If the functions are chosen according to F fixedK,p or FK,p
we will call this networks biased random Boolean networks with connectivity K
and fixed bias p respectively mean bias p.
Let us now state Lynch’s results. His analysis depends on a parameter
R ∋ λ ≥ 0 depending only on the functions and their probabilities. We will
define λ later in Definition 3. First we have to state Lynch’s definition of
freezing and ineffective gates:
Definition 2 (Lynch [4] Definition 1 Item 2 and 5).
Let x ∈ FN2 and v ∈ V .
1. Gate v freezes to y ∈ FN2 in t steps on input x if B
v (x) = y for all t
′ ≥ t.
2. Let u(i) ∈ Fn2 .
A gate v is t-ineffective at input x ∈ FK2 if B
t(x) = Bt(x⊕ u(v)).
Now we will state the main result.
Theorem 4 (Lynch [4] Theorem 4 and 6).
Let α, β be positive constants satisfying 2α log δ+2β < 1 and α log δ < β where
δ = E(Ki).
1. There is a constant r such that for all x ∈ FN2
P (v is ineffective in α log N steps) = r
When λ ≤ 1, r = 1 and when λ > 1 , r < 1.
DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  
2. There is a constant r such that for all x ∈ FN2
P (v is freezing in α log N steps) = r
When λ ≤ 1, r = 1 and when λ > 1 , r < 1. 3
The above theorem shows that if λ ≤ 1 almost all gates are freezing and
ineffective and otherwise not. The next corollary gives us more information
what happens if λ > 1:
Corollary 1 (Lynch [4] Corollary 3 and Corollary 6). Let λ > 1. For almost
all random Boolean networks
1. if gate v is not α logN -ineffective, there is a positive constant W such that
for t ≤ α logN , the number of gates affected by v at time t is asymptotic
to Wλt,
2. if gate v is not freezing in α logN steps , there is a positive constant W
such that for t ≤ α logN , the number of gates that affect v at time t is
asymptotic to Wλt.
Now we will state the definition of λ for Boolean networks:
Definition 3 (Lynch [4], Definition 4). Let f be a Boolean function of K ar-
guments. For i ∈ {1, . . . ,K}, we say that argument i directly affects f on input
w ∈ FK2 if f(w) 6= f(w ⊕ u
(i)). Now put γ(f,w) as the number of i’s that
directly affect f on input w. Given a constant a ∈ [0, 1], we define
γ(fi,w)a
wH(w)(1− a)Ki−wH(w).
Obviously γ(f,w) is identical to sf (w) which will be used instead in the
further discussion. The constant a is the probability that a random gate is one
(at infinite time) given that all gates at time 0 have probability 0.5 of being one.
(see [4, Definiton 2]). Assume that we choose the functions according a random
variable F which should be either FK , F
fixed
K,p or FK,p. The functions are chosen
out the set FK , we denote a function’s probability with pf . It follows that
awH(w)(1− a)K−wH(w)
pfsf (w) (5)
awH(w)(1− a)K−wH(w)E(sF (w)) (6)
= E(sF (w))
ai(1− a)K−i (7)
= E(sF (w)) = EF (sf ) (8)
3Please note that we here state a slightly weaker result than in the original analysis.
DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  
E(sF (w)) denotes the expectation of the sensitivity for a fixed w, Equation (7)
follows from Lemma 1. Therefore, together with Theorem 1 and Theorem 3 we
proved the following:
Theorem 5 (Biased random Boolean networks). For random Boolean networks,
1. the functions are chosen according random variable FK,p, it follows that
λ = 2Kp(1− p),
2. the functions are chosen according random variable F
fixed
, it follows that
2K+1Kp(1− p)
2K − 1
As a special case of the above theorem we get (or by using Theorem 1)
Theorem 6 (NK-Networks). In random Boolean networks, where the functions
are chosen according to the random variable FK
5 Discussion
The results about NK-Networks are consistent with experimental results. In
fact if K ≤ 2 almost all networks almost all gates are freezing and almost all
gates are ineffective and otherwise not (see [2]).
Obviously, the border between the ordered and disordered phase is given
by λ = 1. The resulting phase diagram for biased random Boolean networks,
where the functions are chosen according to F fixedK,p and FK,p is shown in Figure
1. It it interesting to note that if the functions are chosen with fixed bias, then
also Boolean networks with connectivity K = 2 can become unstable. This
conclusion can be drawn from Lynch’s original result already. As mentioned in
the introduction, he showed for K = 2, that λ > 1 if the probability of choosing
a non-constant non-canalizing function, namely the XOR or the inverted XOR
function, is larger than the probability of choosing a constant function. For
example if the bias is 0.5, the probability of choosing a constant function is
zero, whereas both XOR and inverted XOR function have probability greater
zero, hence λ > 1.
It is interesting to compare our results with previous results obtained first
by Derrida and Pomeau using the so called annealed approximation (see [3]).
In their annealed model the functions and connections are chosen at random
at each time step. Considering two instances of the same annealed network
starting in two randomly chosen initial states s1(0), s2(0) they show that
dH(s1(t), s2(t))
DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  
1 3 4 52
Figure 1: Phase diagram for biased random networks: Functions chosen accord-
ing FK,p (dashed) and F
fixed
K,p (solid)
where c = 1 if
2Kp(1− p) ≤ 1
and c ≤ 1 otherwise. It is remarkable that the two models behave similar, but
it is unclear whether this holds in general.
6 Acknowledgement
We would like to thank our colleges Georg Schmidt and Stephan Stiglmayr for
proofreading and Uwe Schoening for useful hints.
References
[1] S. Kauffman, Metabolic stability and epigenesis in randomly constructed
nets, Journal of Theoretical Biology 22 (1969) 437–467.
[2] S. Kauffman, The large scale structure and dynamics of genetic control
circuits: an ensemble approach, Journal of Theoretical Biology 44 (1974)
167–190.
[3] B. Derrida, Y. Pomeau, Random networks of automata - a simple annealed
approximation, Europhysics Letters 2 (1986) 45–49.
[4] J. F. Lynch, Dynamics of random boolean networks, in: Conference on
Mathematical Biology and Dynamical Systems, University of Texas at
Tyler, 2005.
DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  DRAFT  --  
[5] J. F. Lynch, On the threshold of chaos in random boolean cellular au-
tomata, Random Structures and Algorithms (1995) 236–260.
[6] J. F. Lynch, Critical points for random boolean networks, Physica D: Non-
linear Phenomena 172 (1-4) (2002) 49–64.
[7] I. Wegener, The Complexity of Boolean Functions, Wiley-Teubner Series
in Computer Science, John Wiley, B.G. Teubner, 1987.
[8] I. Shmulevich, S. A. Kauffman, Activities and sensitivities in boolean net-
work models, Physical Review Letters 93 (4).
[9] S. Kauffman, C. Peterson, B. Samuelsson, C. Troeln, Genetic networks with
canalyzing boolean rules are always stable, Proceedings of the Nationial
Academy of Science 101 (49).
[10] A. Bernasconi, Mathematical techniques for the analysis of boolean func-
tions, Ph.D. thesis, Dipartimento di Informatica, Universita di Pisa (March
1998).
ABSTRACT
  In this work we consider random Boolean networks that provide a general model
for genetic regulatory networks. We extend the analysis of James Lynch who was
able to proof Kauffman's conjecture that in the ordered phase of random
networks, the number of ineffective and freezing gates is large, where as in
the disordered phase their number is small. Lynch proved the conjecture only
for networks with connectivity two and non-uniform probabilities for the
Boolean functions. We show how to apply the proof to networks with arbitrary
connectivity $K$ and to random networks with biased Boolean functions. It turns
out that in these cases Lynch's parameter $\lambda$ is equivalent to the
expectation of average sensitivity of the Boolean functions used to construct
the network. Hence we can apply a known theorem for the expectation of the
average sensitivity. In order to prove the results for networks with biased
functions, we deduct the expectation of the average sensitivity when only
functions with specific connectivity and specific bias are chosen at random.

<|endoftext|><|startoftext|>
Theory of polariton mediated Raman scattering in microcavities
L. M. León Hilario, A. Bruchhausen, A. M. Lobos, and A. A. Aligia
Centro Atómico Bariloche and Instituto Balseiro,
Comisión Nacional de Enerǵıa Atómica,
8400 S. C. de Bariloche, Argentina
(Dated: November 4, 2018)
Abstract
We calculate the intensity of the polariton mediated inelastic light scattering in semiconductor
microcavities. We treat the exciton-photon coupling nonperturbatively and incorporate lifetime
effects in both excitons and photons, and a coupling of the photons to the electron-hole continuum.
Taking the matrix elements as fitting parameters, the results are in excellent agreement with
measured Raman intensities due to optical phonons resonant with the upper polariton branches in
II-VI microcavities with embedded CdTe quantum wells.
PACS numbers: 71.36.+c, 78.30.Fs, 78.30.-j
http://arxiv.org/abs/0704.0198v1
Planar semiconductors microcavities (MC’s) have attracted much attention in the last
decade as they provide a novel means to study, enhance and control the interaction between
light and matter [1, 2, 3, 4, 5, 6, 7]. When the MC mode (cavity-photon) is tuned in near
resonance with the embedded quantum-well (QW) exciton transitions, and the damping
processes involved are weak in comparison to the photon-matter interaction, the eigenstates
of the system become mixed exciton-photon states, cavity-polaritons, which are in part light
and in part matter bosonic quasi-particles [1, 2, 3]. Examples of interesting new physics are
the recent evidence of a Bose-Einstein condensation of polaritons in CdTe MC’s [5, 6], and
the construction of devices which increase the interaction of sound and light, opening the
possibility of realizing a coherent monochromatic source of acoustic phonons [7].
Raman scattering due to longitudinal optical (LO) phonons, being a coherent process is
intrinsically connected with the cavity-polariton. The physics of strongly coupled photons
and excitons, the polariton–phonon interaction, and the polariton–external-photon coupling
are clearly displayed [8, 9, 10, 11, 12]. In particular resonant Raman scattering (RRS)
experiments, in which the wave-length of the incoming radiation is tuned in a way such that,
after the emission of a LO phonon, the energy of the outgoing radiation coincides with that of
the cavity polariton branches, have proven to be suited to sense the dynamics of the coupled
modes, and to obtain information about the dephasing of the resonant polaritonic state
[8, 9, 10, 11, 12]. Unfortunately, due to the large remaining luminesence, RRS experiments
in resonance with the lower polariton branch have not yet been achieved in intrinsic II-VI
MC’s [13]. Therefore all reported experiments in these kind of MC’s, with embedded CdTe
QW’s, consider the case of the scattered photons in outgoing resonance with the upper
polariton branch (for two branch-systems: coupling of one exciton mode and the cavity
photon) or with the middle polariton branch (for tree branch-systems: coupling of two
exciton modes and the cavity photon) [9, 10, 14]. The measured intensities in these two
systems were analyzed on the basis of a model in which one (two) exciton states |e〉 are
mixed with a photon state |f〉 with the same in-plane wave vector k, leading to a 2x2 (3x3)
matrix. Essentially, the Raman intensity is proportional to
I ∼ TiTs|〈Pi|H
′|Ps〉|
2, (1)
where Ti describe the probability of conversion of an incident photon |fi〉 into the polariton
state |Pi〉, Ts has an analogous meaning for the scattered polariton |Ps〉 and the outgoing
photon |fs〉, and H
′ is the interaction between electrons and the LO phonons. At the
conditions of resonance with the outgoing polariton, Ti is very weakly dependent on laser
energy or detuning (difference between photon and exciton energies), while Ts is proportional
to the photon strength of the scattered polariton |〈fs|Ps〉|
2. Similarly, in the simplest case
of only one exciton (2x2 matrix) one expects that the matrix element entering Eq. (1) is
proportional to the exciton part of the scattered polariton |〈Pi|H
′|Ps〉|
2 ∼ |〈es|Ps〉|
2. Thus
if the wave function is |Ps〉 = α|fs〉+ β|es〉, one has
I ∼ |〈fs|Ps〉|
2|〈es|Ps〉|
2 = |α|2|β|2. (2)
This model predicts a Raman intensity which is symmetric with detuning and is maximum
at zero detuning. In other words, the intensity is maximum for detunings such that the
scattered polariton is more easily coupled to the external photons, but at the same time
when the polariton is more easily coupled to the optical phonons, which requires a large
matter (exciton) component of the polariton. This result is in qualitative agreement with
experiment [10]. However, for positive detuning the experimental results fall bellow the
values predicted by Eq. (2) (see Fig. 1). This is ascribed to the effects of the electron-hole
continuum above the exciton energy, which are not included in the model [10].
For another sample in which two excitons are involved, the above analysis can be ex-
tended straightforwardly and the intensity depends on the amplitudes of a 3x3 matrix and
matrix element of H ′ involving both excitons [10]. However, comparison with experiments at
resonance with the middle polariton branch, shows a poorer agreement than in the previous
case (Fig. 6 of Ref. 10). In addition, a loss of coherence between the scattering of both
excitons with the LO phonon was assumed, which is hard to justify. Some improvement has
been obtained recently when damping effects are introduced phenomenologically as imagi-
nary parts of the photon and exciton energies, but still a complete loss of coherence resulted
from the fit [14, 15].
In this paper we include the states of the electron-hole continuum, and the damping
effects in a more rigorous way. Using some matrix elements as free parameters, we can
describe accurately the Raman intensities for both samples studied in Ref. 10.
In the experiments with CdTe QW’s inside II-VI MC’s, the light incides perpendicular to
the (x, y) plane of the QW’s and is collected in the same direction z. Therefore the in-plane
wave vector K = 0, the polarization of the electric field should lie in the (x, y) plane, and
the excitons which couple with the light should have the same symmetry as the electric
field (one of the two Γ5 states of heavy hole excitons [16]). Thus, to lighten the notation
we suppress wave vector and polarization indices. The basic ingredients of the theory are
two or three strongly coupled boson modes, one for the light MC eigenmode with boson
creation operator f †, another one for the 1s exciton (e1) and if is necessary, the 2s exciton
(e2) is also included. We assume that each of these boson states mixes with a continuum of
bosonic excitations which broadens its spectral density. In addition, we include the electron-
hole continuum above the exciton states, described by bosonic operators c
, ck, where k is
the difference between electron and hole momentum in the (x, y) plane (the sum is K = 0
because it is conserved).
The Hamiltonian reads:
H = Eff
iei +
if +H.c.) +
prp +
pf +H.c.)
iqdiq +
(Viqd
iqei +H.c.) +
f +H.c.). (3)
The first three terms describe the strong coupling between the MC photon and the exciton(s)
already included in previous approaches [10, 15]. The fourth and fifth terms describe a
continuum of radiative modes and its coupling to the MC light eigenmode. Their main effect
is to broaden the spectral density of the latter even in the absence of light-matter interaction.
The following two terms have a similar effect for the exciton mode(s). The detailed structure
of the states described by the d
iq operators is not important in what follows. They might
describe combined excitations due to scattering with acoustical phonons. The last two terms
correspond to the energy of the electron hole excitations and their coupling to the MC light
mode.
In Eq. (3) we are making the usual approximation of neglecting the internal fermionic
structure of the excitons and electron-hole operators and taking them as free bosons. This
is an excellent approximation for the conditions of the experiment. We also neglect terms
which do not conserve the number of bosons. Their effect is small for the energies of
interest [17]. These approximations allow us formally to diagonalize the Hamiltonian by a
Bogoliubov transformation. The diagonalized Hamiltonian has the form H =
νpν ,
where the boson operators p†ν correspond to generalized polariton operators and are linear
combinations of all creation operators entering Eq. (3). Denoting the latter for brevity as b
then p†ν =
j . In practice, instead of calculating the Eν and Aνj , it is more convenient
to work with retarded Green’s functions Gjl(ω) = 〈〈bj; b
l 〉〉ω and their equations of motion
ω〈〈bj; b
l 〉〉ω = δjl + 〈〈[bj, H ]; b
l 〉〉ω. (4)
As we show below, the RRS intensity can be expressed in terms of spectral densities derived
from these Green’s functions, which in turn can be calculated using Eq. (4).
Using Fermi’s golden rule, the probability per unit time for a transition from a polariton
state |i〉 = p
|0〉 to states |s〉 = p†νa
†|0〉, where a† creates a LO phonon is
|〈i|H ′|s〉|2ρ(ω), where ρ(ω) =
δ(ω − Eν) = −
ImGνν(ω + i0
+) (5)
is the density of final states. As argued above, we neglect the dependence of Ti on frequency
and take Ts = |Aνf |
2, the weight of photons in the scattered eigenstates. In addition, H ′
should be proportional to the matter (exciton) part of the scattered polariton states. Then,
the Raman intensity is proportional to:
WTs ∝ I = |Aνe1 + αAνe2|
2|Aνf |
2ρ(ω). (6)
Here we are neglecting the contribution of the electron-hole continuum to H ′, and α is the
ratio of matrix elements of the exciton-LO phonon interaction between 2s and 1s excitons.
If the 2s excitons are unimportant, α = 0.
Using Eqs. (4) it can be shown that
ρjl(ω) = −
[Gjl(ω + i0
+)−Gjl(ω − i0
−)] = AνjĀνlρ(ω). (7)
From here and
|Aνj|
2 = 1, it follows that ρ(ω) =
ρjj(ω). Replacing in Eq. (6) we
obtain:
I(ω) =
ρff [ρe1,e1 + |α|
2ρe2,e2 + 2Re(αρe2,e1)]∑
ρjj(ω)
. (8)
In practice, when ω is chosen such that the resonance condition for the outgoing polariton
is fulfilled, we can neglect the contribution of the continuum states in the denominator of
Eq. (8). In particular, if the contribution of the 2s exciton can be neglected (as in sample
A of Ref. 10)
I(ω) =
ρff (ω)ρe1,e1(ω)
ρff(ω) + ρe1,e1(ω)
. (9)
The Green’s functions are calculated from the equations of motion (4). In the final expres-
sions, the continuum states enter through the following sums:
Sf (ω) =
ω + i0+ − ǫp
, Si(ω) =
|Viq|
ω + i0+ − ǫiq
, S ′f(ω) =
ω + i0+ − ǫk
For the first two we assume that the results are imaginary constants that we take as param-
eters:
Sf(ω) = −iδf , Sj(ω) = −iδj (11)
This is the result expected for constant density of states and matrix elements. Our results
seem to indicate that this assumption is valid for the upper and middle polariton branches.
For the lower branch at small k it has been shown that the line width due to the interaction
of polaritons with acoustic phonons depends on detuning Ef −E1 and k‖, being smaller for
small wave vector [18, 19].
The electron-hole continuum begins at the energy of the gap and corresponds to vertical
transitions in which the light promotes a valence electron with 2D wave vector k to the
conduction band with the same wave vector. In the effective-mass approximation, the energy
is quadratic with k and this leads to a constant density of states beginning at the gap. The
matrix element Vk is proportional to Mk = 〈kv|pE |kc〉, where pE is the momentum operator
in the direction of the electric field, and |kv〉, |kc〉 are the wave functions for valence and
conduction electrons with wave vector k. Taking these wave functions as plane waves, one
has Mk ∼ kE , the wave vector in the direction of the electric field. Then |Vk|
2 ∼ k2E. Adding
the contributions of all directions of k one has |Vk|
2 ∼ |k|2 ∼ ǫk (linear with energy for small
energy). This leads to
S ′f = R(ω)− iA(ω −B)Θ(ω −B), (12)
where B is the bottom of the electron-hole continuum (the energy of the semiconductor gap)
and A is a dimensionless parameter that controls the magnitude of the interaction. The real
part R(ω) can be absorbed in a renormalization of the photon energy and is unimportant
in what follows. The imaginary part is a correction to the photon width for energies above
the bottom of the continuum.
Using the theory outlined above, we calculated the intensity of RRS corresponding to
the samples A and B measured in Refs. 10, 14 and compared them with the experimental
results.
Sample A corresponds to the simplest case. Two polariton branches are seen and therefore
only the 1s exciton plays a significant role. The binding energy of this exciton B−E1 is not
well known. The Rabi splitting 2V1 = 19 meV. The width of the Raman scan as a function
of frequency for zero detuning is of the order of w = 0.1 meV (see Fig. 3 of Ref. 10). This
implies the relation 2w2 = δ2f + δ
in our theory. In any case the results are weakly sensitive
to w. Therefore, we have three free parameters in our theory in addition to a multiplicative
constant: B −E1, the ratio of widths δf/δ1 and the slope A.
-15 -10 -5 0 5 10 15
Ef - E1  (meV)
FIG. 1: Raman intensity as a function of detuning for sample A. Solid squares: experimental
results [10]. Solid line: theory (Eq. 9) for B − E1 = 14 meV, δf = δ1 = 0.1 meV, and A = 0.031.
Dashed line: result for a 2x2 matrix (Eq. 2).
The comparison between the experimental and the theoretical intensities is shown in
Fig. 1 for a set of parameters that lead to a close agreement with experiment. The condition
of resonance is established choosing the energy ω for which the intensity given by Eq. (9) has
its second relative maximum (corresponding to the upper polariton branch). The dashed line
corresponds to the case in which only the first three terms (with i = 1) in the Hamiltonian,
Eq. (3) are included. In this case, the intensity is given in terms of the solution of a 2x2
matrix [Eq. (2)] and was used in Ref. 10 to interpret the data. This simple expression gives
a Raman intensity which is an even function of detuning. When the full model is considered,
the Raman intensity falls more rapidly for large detuning Ef − E1 as a consequence of the
hybridization of the photon with the electron-hole continuum. When the energy of the
polariton increases beyond B entering the electron-hole continuum (corresponding to the
kink in Fig. 1), the Raman peak broadens and loses intensity. The kink can be smoothed
if the effect of the infinite excitonic levels below the continuum is included in the model
(leading to an S ′f with continuous first derivative), but this is beyond the scope of this
work. If the ratio δf/δ1 is enlarged, the Raman intensity increases for negative detuning
with respect to its value for positive detuning.
In the experiments with sample B three polariton branches are observed [14] and the
2s exciton plays a role. Experimentally, it is known that the binding energy for the two
excitons are B − E1 =17 meV and B − E2 =2 meV. From the observed Rabi splitting one
has 2V1 = 13 meV, 2V2 = 2.5 meV. In comparison with the previous case, we have the
additional parameter α (the ratio of exciton-LO phonon matrix elements). In addition, to
be able to describe well the intensity for low energies of the middle polariton (left part of
the curve shown in Fig. 2), we need to assume a small linear dependence of E1 with the
position of the incident laser spot in the sample. This dependence is also inferred form the
observed luminescence spectrum [14]. In our model, this corresponds to a dependence of E1
with Ef :
E1 = E
+ z(Ef − E
) (13)
In Fig. 2 we show the intensity at the second maximum of I(ω) (corresponding to the
middle polariton branch) as a function of the energy of this maximum. We also show in the
figure experimental results taken at lower laser excitation and a slightly higher temperature
(4.5 K) than those reported in Ref. 10.
The slope which better describes the data is z = 0.14. This value is close to z = 0.155
which was obtained from a fit of the maxima of luminescence spectrum of the lower and
middle polaritons. We have taken the same value for A as in Fig. 1. The agreement between
theory and experiment is remarkable. As for sample A, the values of δi that result from the
fit are reasonable in comparison with calculated values [18].
1,645 1,650 1,655 1,660 1,665
Middle polariton energy (eV)
FIG. 2: Raman intensity as a function of the middle polariton energy. Solid squares: experimental
results [14]. Solid line: theory for δf = 0.2 meV, δ1 = 0.1 meV, δ2 = 0.12 meV, α = −0.45 and
z = 0.14. A is the same as in Fig. 1 .
In summary, we have proposed a theory to calculate Raman intensity for excitation
of longitudinal optical phonons in microcavities, in which different matrix elements are
incorporated as parameters of the model. The most important advance in comparison with
previous simplified theories [10, 15] is the inclusion of the strong coupling of the electron-
hole continuum with the microcavity photon. Inclusion of this coupling is essential when the
energy of the polariton is near the bottom of the conduction band (at the right of Fig. 1).
We also have included the effects of damping of excitons and photons, coupling them with
a continuum of bosonic excitations. Simpler approaches have included the spectral widths
δf and δi of photons and excitons as imaginary parts of the respective energies, leading to
non-hermitian matrices.
Taking some of the parameters of the model as free (δf/δi, B − E1 and A for sample A,
δf , δi, z and α for sample B), we obtain excellent fits of the observed Raman intensities. The
resulting values of the parameters agree with previous estimates, if they are available. We
are not aware of previous estimates for A and α. As an important improvement to previous
approaches [10, 15] for the case of sample B, we do not have to assume a partial loss of
coherence between 1s and 2s excitons in their scattering with the LO phonon.
Further progress in the understanding of the interaction of excitons with light and
phonons requires microscopic calculations of the parameters δf , δi, A and α, and the effects
of the temperature on them. However, taking into account the difficulties in calculating
these parameters accurately, the present results are encouraging and suggest that the main
physical ingredients are included in our model.
We thank A. Fainstein for useful discussions. This work was supported by PIP 5254 of
CONICET and PICT 03-13829 of ANPCyT.
[1] Kavokin A and Malpuech G 2003 Cavity Polaritons (Elsevier, Amsterdam)
[2] Special issue on microcavities 2003 Semicond. Sci. Technol. 18 10 S279-S434
[3] Special issue on Photon-mediated phenomena in semiconductor nanostructures J. Phys. Con-
dens. Matter 18 35 S3549-S3768
[4] Skolnick M S, Fisher T and Whittaker D M 1998 Semicond. Sci. Technol 13 645
[5] Kasprzak J, Richard M, Kundermann M, Baas A, Jeambrun P, Keeling J M J, Marchetti F
M, Szymanska M H, André R, Staehli J L, Savona V, Littlewood P B, Deveaud B and Le Si
Dang 2006 Nature 443 409
[6] Deng H, Press D, Götzinger S, Solomon G S, Hey R, Ploog K H and Yamamoto Y 2006 Phys.
Rev. Lett. 97 146402
[7] Trigo M, Bruchhausen A, Fainstein A, Jusserand B and Thierry-Mieg V 2002 Phys. Rev. Lett.
89 227402
[8] Fainstein A, Jusserand B and Thierry-Mieg V 1997 Phys. Rev. Lett. 78 1576
[9] Fainstein A, Jusserand B and André R 1998 Phys. Rev. B 57 R9439
[10] Bruchhausen A, Fainstein A, Jusserand B and André R 2003 Phys. Rev. B 68 205326
[11] Tribe W R, Baxter D, Skolnick M S, Mowbray D J, Fisher T A and Roberts J S 1997 Phys.
Rev. B 56 12 429
[12] Stevenson R M, Astratov V N, Skolnick M S, Roberts J S and Hill G 2003 Phys. Rev. B 67
081301(R)
[13] RRS experiments in resonance with the lower polariton branch have only been reported in
III-VI samples with doped Bragg reflectors (see refs. [11] and [12] for details).
[14] Fainstein A and Jusserand B 2006 in Light Scattering in Solids vol. 9 Cardona M and Merlin
R editors (Springer, Berlin)
[15] Bruchhausen A, Fainstein A and Jusserand B 2005 Physics of semiconductors CP772 p.1117
Menéndez J and Van de Walle C G editors (American Institute of Physics)
[16] Jorda S, Rössler U and Broido D 1993 Phys. Rev. B 48 1669
[17] Jorda S 1994 Phys. Rev. B 50 2283
[18] Savona V and Piermarocchi C 1997 Phys. Status Solidi A 164 45
[19] Cassabois G, Triques A L C, Bogani F, Delalande C, Roussignol Ph and Piermarocchi C 2000
Phys. Rev. B 61 1696
	References
ABSTRACT
  We calculate the intensity of the polariton mediated inelastic light
scattering in semiconductor microcavities. We treat the exciton-photon coupling
nonperturbatively and incorporate lifetime effects in both excitons and
photons, and a coupling of the photons to the electron-hole continuum. Taking
the matrix elements as fitting parameters, the results are in excellent
agreement with measured Raman intensities due to optical phonons resonant with
the upper polariton branches in II-VI microcavities with embedded CdTe quantum
wells.

<|endoftext|><|startoftext|>
Introduction
The introduction of non-crossing partitions for finite reflection groups (finite Coxeter
groups) by Bessis [8] and Brady and Watt [15] marks the creation of a new, exciting
subject of combinatorial theory, namely the study of these new combinatorial objects
which possess numerous beautiful properties, and seem to relate to several other ob-
jects of combinatorics and algebra, most notably to the cluster complex of Fomin and
2000 Mathematics Subject Classification. Primary 05E15; Secondary 05A05 05A10 05A15 05A18
06A07 20F55 33C05.
Key words and phrases. root systems, reflection groups, Coxeter groups, generalised non-crossing
partitions, annular non-crossing partitions, chain enumeration, Möbius function, M -triangle, gener-
alised cluster complex, face numbers, F -triangle, Chu–Vandermonde summation.
†Research partially supported by the Austrian Science Foundation FWF, grant S9607-N13, in the
framework of the National Research Network “Analytic Combinatorics and Probabilistic Number
Theory”.
http://arxiv.org/abs/0704.0199v3
2 C. KRATTENTHALER AND T. W. MÜLLER
Zelevinsky [21] (cf. [2, 3, 5, 4, 8, 9, 14, 15, 16, 17, 20]). They reduce to the classical
non-crossing partitions of Kreweras [30] for the irreducible reflection groups of type An
(i.e., the symmetric groups), and to Reiner’s [32] type Bn non-crossing partitions for
the irreducible reflections groups of type Bn. (They differ, however, from the type Dn
non-crossing partitions of [32].) The subject has been enriched by Armstrong through
introduction of his generalised non-crossing partitions for reflection groups in [1]. In
the symmetric group case, these reduce to the m-divisible non-crossing partitions of
Edelman [18], while they produce new combinatorial objects already for the reflection
groups of type Bn. Again, these generalised non-crossing partitions possess numerous
beautiful properties, and seem to relate to several other objects of combinatorics and
algebra, most notably to the generalised cluster complex of Fomin and Reading [19] (cf.
[1, 6, 7, 20, 27, 28, 29, 36, 37, 38]).
From a technical point of view, the main subject matter of the present paper is
the computation of the number of certain factorisations of the Coxeter element of a
reflection group. These decomposition numbers, as we shall call them from now on
(see Section 2 for the precise definition), arose in [27, 28], where it was shown that
they play a crucial role in the computation of enumerative invariants of (generalised)
non-crossing partitions. Moreover, in these two papers the decomposition numbers for
the exceptional reflection groups have been computed, and it was pointed out that the
decomposition numbers in type An (i.e., the decomposition numbers for the symmetric
groups) had been earlier computed by Goulden and Jackson in [23]. Here, we explain
how the decomposition numbers in type Bn can be extracted from results of Bóna,
Bousquet, Labelle and Leroux [12] on the enumeration of certain planar maps, and we
find formulae for the decomposition numbers in type Dn, thus completing the project
of computing the decomposition numbers for all the irreducible reflection groups.
The main goal of the present paper, however, is to access the enumerative theory
of the generalised non-crossing partitions of Armstrong via these decomposition num-
bers. Indeed, one finds numerous enumerative results on ordinary and generalised
non-crossing partitions in the literature (cf. [1, 2, 4, 8, 9, 18, 30, 32, 37]): results on
the total number of (generalised) non-crossing partitions of a given size, of those with a
fixed number of blocks, of those with a given block structure, results on the number of
(multi-)chains of a given length in a given poset of (generalised) non-crossing partitions,
results on rank-selected chain enumeration (that is, results on the number of chains in
which the ranks of the elements of the chains have been fixed), etc. We show that not
only can all these results be rederived from our decomposition numbers, we are also
able to find several new enumerative results. In this regard, the most general type of
result that we find is formulae for the number of (multi-)chains π1 ≤ π2 ≤ · · · ≤ πl−1 in
the poset of non-crossing partitions of type An, Bn, respectively Dn, in which the block
structure of π1 is fixed as well as the ranks of π2, . . . , πl−1. Even the corresponding
result in type An, for the non-crossing partitions of Kreweras, is new. Furthermore,
from the result in type Dn, by a suitable summation, we are able to find a formula for
the rank-selected chain enumeration in the poset of generalised non-crossing partitions
of type Dn, thus generalising the earlier formula of Athanasiadis and Reiner [4] for
the rank-selected chain enumeration of “ordinary” non-crossing partitions of type Dn.
In conjunction with the results from [27, 28], this generalisation in turn allows us to
complete a computational case-by-case proof of Armstrong’s “F = M Conjecture” [1,
Conjecture 5.3.2] predicting a surprising relationship between a certain face count in the
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 3
generalised cluster complex of Fomin and Reading and the Möbius function in the poset
of generalised non-crossing partitions of Armstrong. (A case-free proof had been found
earlier by Tzanaki in [38].) Our results allow us also to address another conjecture of
Armstrong [1, Conj. 3.5.13] on maximal intervals containing a random multichain in
the poset of generalised non-crossing partitions. We show that the conjecture is indeed
true for types An and Bn, but that it fails for type Dn (and we suspect that it will also
fail for most of the exceptional types).
We remark that a totally different approach to the enumerative theory of (gener-
alised) non-crossing partitions is proposed in [29]. This approach is, however, com-
pletely combinatorial and avoids, in particular, reflection groups. It is, therefore, not
capable of computing our decomposition numbers nor anything else which is intrinsic to
the combinatorics of reflection groups. A similar remark applies to [31, Theorem 4.1],
where a remarkable uniform recurrence is found for rank-selected chain enumeration
in the generalised non-crossing partitions of any type. It could be used, for example,
for verifying our result in Corollary 19 on the rank-selected chain enumeration in the
generalised non-crossing partitions of type Dn, but it is not capable of computing our
decomposition numbers nor of verifying results with restrictions on block structure.
Our paper is organised as follows. In the next section we define the decomposition
numbers for finite reflection groups from [27, 28], the central objects in our paper,
together with a combinatorial variant, which depends on combinatorial realisations of
non-crossing partitions, which we also explain in the same section. This is followed by
an intermediate section in which we collect together some auxiliary results that will
be needed later on. In Section 4, we recall Goulden and Jackson’s formula [23] for
the full rank decomposition numbers of type An, together with the formula from [28,
Theorem 10] that it implies for the decomposition numbers of type An of arbitrary rank.
The purpose of Section 5 is to explain how formulae for the decomposition numbers of
type Bn can be extracted from results of Bóna, Bousquet, Labelle and Leroux in [12].
The type Dn decomposition numbers are computed in Section 6. The approach that
we follow is, essentially, the approach of Goulden and Jackson in [23]: we translate the
counting problem into the problem of enumerating certain maps. This problem is then
solved by a combinatorial decomposition of these maps, translating the decomposition
into a system of equations for corresponding generating functions, and finally solving
this system with the help of the multidimensional Lagrange inversion formula of Good.
Sections 7–11 form the “applications part” of the paper. In the preparatory Section 7,
we recall the definition of the generalised non-crossing partitions of Armstrong, and we
explain the combinatorial realisations of the generalised non-crossing partitions for the
types An, Bn, and Dn from [1] and [29]. The bulk of the applications is contained in
Section 8, where we present three theorems, Theorems 11, 13, and 15, on the number of
factorisations of a Coxeter element of type An, Bn, respectively Dn, with less stringent
restrictions on the factors than for the decomposition numbers. These theorems result
from our formulae for the (combinatorial) decomposition numbers upon appropriate
summations. Subsequently, it is shown that the corresponding formulae imply all known
enumeration results on non-crossing partitions and generalised non-crossing partitions,
plus several new ones, see Corollaries 12, 14, 16–19 and the accompanying remarks.
Section 9 presents the announced computational proof of the F = M (ex-)Conjecture for
type Dn, based on our formula in Corollary 19 for the rank-selected chain enumeration
in the poset of generalised non-crossing partitions of type Dn, while Section 10 addresses
4 C. KRATTENTHALER AND T. W. MÜLLER
Conjecture 3.5.13 from [1], showing that it does not hold in general since it fails in type
Dn. In the final Section 11 we point out that the decomposition numbers do not only
allow one to derive enumerative results for the generalised non-crossing partitions of the
classical types, they also provide all the means for doing this for the exceptional types.
For the convenience of the reader, we list the values of the decomposition numbers for
the exceptional types that have been computed in [27, 28] in an appendix.
In concluding the introduction, we want to attract the reader’s attention to the
fact that many of the formulae presented here are very combinatorial in nature (see
Sections 4, 5, 8). This raises the natural question as to whether it is possible to find
combinatorial proofs for them. Indeed, a combinatorial (and, in fact, almost bijective)
proof of the formula of Goulden and Jackson, presented here in Theorem 5, has been
given by Bousquet, Chauve and Schaeffer in [13]. Moreover, most of the proofs for the
known enumeration results on (generalised) non-crossing partitions presented in [1, 2,
4, 18, 32] are combinatorial. On the other hand, to our knowledge so far nobody has
given a combinatorial proof for Theorem 7, the formula for the decomposition numbers
of type Bn, essentially due to Bóna, Bousquet, Labelle and Leroux [12], although we
believe that this should be possible by modifying the ideas from [13]. There are also
other formulae in our paper (see e.g. Corollaries 12 and 14, Eqs. (6.1) and (8.33)) which
seem amenable to combinatorial proofs. However, to find combinatorial proofs for our
type Dn results (cf. in particular Theorem 9.(ii) and Corollaries 16–19) seems rather
hopeless to us.
2. Decomposition numbers for finite Coxeter groups
In this section, we introduce the decomposition numbers from [27, 28], which are
(Coxeter) group-theoretical in nature, plus combinatorial variants for Coxeter groups
of types Bn and Dn, which will be important in combinatorial applications. These
variants depend on the combinatorial realisation of these Coxeter groups, which we
also explain here.
Let Φ be a finite root system of rank n. (We refer the reader to [24] for all terminology
on root systems.) For an element α ∈ Φ, let tα denote the corresponding reflection in
the central hyperplane perpendicular to α. Let W = W (Φ) be the group generated by
these reflections. As is well-known (cf. e.g. [24, Sec. 6.4]), any such reflection group is at
the same time a finite Coxeter group, and all finite Coxeter groups can be realised in this
way. By definition, any element w of W can be represented as a product w = t1t2 · · · tℓ,
where the ti’s are reflections. We call the minimal number of reflections which is needed
for such a product representation the absolute length of w, and we denote it by ℓT (w).
We then define the absolute order on W , denoted by ≤T , via
u ≤T w if and only if ℓT (w) = ℓT (u) + ℓT (u
−1w).
As is well-known and easy to see, this is equivalent to the statement that every shortest
representation of u by reflections occurs as an initial segment in some shortest product
representation of w by reflections.
Now, for a finite root system Φ of rank n, types T1, T2, . . . , Td (in the sense of the
classification of finite Coxeter groups), and a Coxeter element c, the decomposition
number NΦ(T1, T2, . . . , Td) is defined as the number of “minimal” products c1c2 · · · cd
less than or equal to c in absolute order, “minimal” meaning that ℓT (c1) + ℓT (c2) +
· · ·+ ℓT (cd) = ℓT (c1c2 · · · cd), such that, for i = 1, 2, . . . , d, the type of ci as a parabolic
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 5
Coxeter element is Ti. (Here, the term “parabolic Coxeter element” means a Coxeter
element in some parabolic subgroup. The reader should recall that it follows from [8,
Lemma 1.4.3] that any element ci is indeed a Coxeter element in a parabolic subgroup
of W = W (Φ). By definition, the type of ci is the type of this parabolic subgroup. The
reader should also note that, because of the rewriting
c1c2 · · · cd = ci(c
i c1ci)(c
i c2ci) · · · (c
i ci−1ci)ci+1 · · · cd, (2.1)
any ci in such a minimal product c1c2 · · · cd ≤T c is itself ≤T c.) It is easy to see that the
decomposition numbers are independent of the choice of the Coxeter element c. (This
follows from the well-known fact that any two Coxeter elements are conjugate to each
other; cf. [24, Sec. 3.16].)
The decomposition numbers satisfy several linear relations between themselves. First
of all, the number NΦ(T1, T2, . . . , Td) is independent of the order of the types T1, T2, . . . ,
Td; that is, we have
NΦ(Tσ(1), Tσ(2), . . . , Tσ(d)) = NΦ(T1, T2, . . . , Td) (2.2)
for every permutation σ of {1, 2, . . . , d}. This is, in fact, a consequence of the rewriting
(2.1). Furthermore, by the definition of these numbers, those of “lower rank” can be
computed from those of “full rank.” To be precise, we have
NΦ(T1, T2, . . . , Td) =
NΦ(T1, T2, . . . , Td, T ), (2.3)
where the sum is taken over all types T of rank n − rkT1 − rkT2 − · · · − rkTd (with
rkT denoting the rank of the root system Ψ of type T , and n still denoting the rank of
the fixed root system Φ; for later use we record that
ℓT (w0) = rkT0 (2.4)
for any parabolic Coxeter element w0 of type T0).
The decomposition numbers for the exceptional types have been computed in [27,
28]. For the benefit of the reader, we reproduce these numbers in the appendix. The
decomposition numbers for type An are given in Section 4, the ones for type Bn are
computed in Section 5, while the ones for type Dn are computed in Section 6.
Next we introduce variants of the above decomposition numbers for the types Bn
and Dn, which depend on the combinatorial realisation of the Coxeter groups of these
types.
As is well-known, the reflection group W (An) can be realised as the symmetric group
Sn+1 on {1, 2, . . . , n+1}. The reflection groups W (Bn) and W (Dn), on the other hand,
can be realised as subgroups of the symmetric group on 2n elements. (See e.g. [11,
Sections 8.1 and 8.2].) Namely, the reflection group W (Bn) can be realised as the
subgroup of the group of all permutations π of
{1, 2, . . . , n, 1̄, 2̄, . . . , n̄}
satisfying the property
π(̄i) = π(i). (2.5)
(Here, and in what follows, ¯̄i is identified with i for all i.) In this realisation, there
is an analogue of the disjoint cycle decomposition of permutations. Namely, every
6 C. KRATTENTHALER AND T. W. MÜLLER
π ∈ W (Bn) can be decomposed as
π = κ1κ2 · · ·κs, (2.6)
where, for i = 1, 2, . . . , s, κi is of one of two possible types of “cycles”: a type A cycle,
by which we mean a permutation of the form
((a1, a2, . . . , ak)) := (a1, a2, . . . , ak) (a1, a2, . . . , ak), (2.7)
or a type B cycle, by which we mean a permutation of the form
[a1, a2, . . . , ak] := (a1, a2, . . . , ak, a1, a2, . . . , ak), (2.8)
a1, a2, . . . , ak ∈ {1, 2, . . . , n, 1̄, 2̄, . . . , n̄}. (Here we adopt notation from [15].) In both
cases, we call k the length of the “cycle.” The decomposition (2.6) is unique up to a
reordering of the κi’s.
We call a type A cycle of length k of combinatorial type Ak−1, while we call a type
B cycle of length k of combinatorial type Bk, k = 1, 2, . . . . The reader should observe
that, when regarded as a parabolic Coxeter element, for k ≥ 2 a type A cycle of length
k has type Ak−1, while a type B cycle of length k has type Bk. However, a type B
cycle of length 1, that is, a permutation of the form (i, ī), has type A1 when regarded
as a parabolic Coxeter element, while we say that it has combinatorial type B1. (The
reader should recall that, in the classification of finite Coxeter groups, the type B1 does
not occur, respectively, that sometimes B1 is identified with A1. Here, when we speak
of “combinatorial type,” then we do distinguish between A1 and B1. For example,
the “cycles” ((1, 2)) = (1, 2) (1̄, 2̄) or ((1̄, 2)) = (1̄, 2) (1, 2̄) have combinatorial type A1,
whereas the cycles [1] = (1, 1̄) or [2] = (2, 2̄) have combinatorial type B1.)
As Coxeter element for W (Bn), we choose
c = (1, 2, . . . , n, 1̄, 2̄, . . . , n̄) = [1, 2, . . . , n].
Now, given combinatorial types T1, T2, . . . , Td, each of which being a product of Ak’s
and Bk’s, k = 1, 2, . . . , the combinatorial decomposition number N
(T1, T2, . . . , Td) is
defined as the number of minimal products c1c2 · · · cd less than or equal to c in absolute
order, where “minimal” has the same meaning as above, such that for i = 1, 2, . . . , d
the combinatorial type of ci is Ti. Because of (2.1), the combinatorial decomposition
numbers N combBn (T1, T2, . . . , Td) satisfy also (2.2) and (2.3).
The reflection group W (Dn) can be realised as the subgroup of the group of all
permutations π of {1, 2, . . . , n, 1̄, 2̄, . . . , n̄} satisfying (2.5) and the property that an even
number of elements from {1, 2, . . . , n} is mapped to an element of negative sign. (Here,
the elements 1, 2, . . . , n are considered to have sign +, while the elements 1̄, 2̄, . . . , n̄ are
considered to have sign −.) Since W (Dn) is a subgroup of W (Bn), and since the above
realisation of W (Dn) is contained as a subset in the realisation of W (Bn) that we just
described, any π ∈ W (Dn) can be decomposed as in (2.6), where, for i = 1, 2, . . . , d, κi
is either a type A or a type B cycle. Requiring that π is in the subgroup W (Dn) of
W (Bn) is equivalent to requiring that there is an even number of type B cycles in the
decomposition (2.6). Again, the decomposition (2.6) for π ∈ W (Dn) is unique up to a
reordering of the κi’s.
As Coxeter element, we choose
c = (1, 2, . . . , n− 1, 1̄, 2̄, . . . , n− 1) (n, n̄) = [1, 2, . . . , n− 1] [n].
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 7
We shall be entirely concerned with elements π of W (Dn) which are less than or equal
to c. It is not difficult to see (and it is shown in [4, Sec. 3]) that the unique factorisation
of any such element π has either 0 or 2 type B cycles, and in the latter case one of the
type B cycles is [n] = (n, n̄). In this latter case, in abuse of terminology, we call the
product of these two type B cycles, [a1, a2, . . . , ak−1] [n] say, a “cycle” of combinatorial
type Dk. More generally, we shall say for any product of two disjoint type B cycles of
the form
[a1, a2, . . . , ak−1] [ak] (2.9)
that it is a “cycle” of combinatorial type Dk. The reader should observe that, when
regarded as parabolic Coxeter element, for k ≥ 4 an element of the form (2.9) has type
Dk. However, if k = 3, it has type A3 when regarded as parabolic Coxeter element,
while we say that it has combinatorial type D3, and, if k = 2, it has type A
1 when
regarded as parabolic Coxeter element, while we say that it has combinatorial type
D2. (The reader should recall that, in the classification of finite Coxeter groups, the
types D3 and D2 do not occur, respectively, that sometimes D3 is identified with A3,
D2 being identified with A
1. Here, when we speak of “combinatorial type,” then we do
distinguish between D3 and A3, and between D2 and A
Now, given combinatorial types T1, T2, . . . , Td, each of which being a product of Ak’s
and Dk’s, k = 1, 2, . . . , the combinatorial decomposition number N
(T1, T2, . . . , Td) is
defined as the number of minimal products c1c2 · · · cd less than or equal to c in absolute
order, where “minimal” has the same meaning as above, such that for i = 1, 2, . . . , d
the combinatorial type of ci is Ti. Because of (2.1), the combinatorial decomposition
numbers N combDn (T1, T2, . . . , Td) satisfy also (2.2) and (2.3).
3. Auxiliary results
In our computations in the proof of Theorem 9, leading to the determination of the
decomposition numbers of type Dn, we need to apply the Lagrange–Good inversion
formula [22] (see also [26, Sec. 5] and the references cited therein). We recall it here for
the convenience of the reader. In doing so, we use standard multi-index notation. Name-
ly, given a positive integer d, and vectors z = (z1, z2, . . . , zd) and n = (n1, n2, . . . , nd),
we write zn for zn11 z
2 · · · z
d . Furthermore, in abuse of notation, given a formal power
series f in d variables, f(z) stands for f(z1, z2, . . . , zd). Moreover, given d formal power
series f1, f2, . . . , fd in d variables, f
n(z) is short for
fn11 (z1, z2, . . . , zd)f
2 (z1, z2, . . . , zd) · · · f
d (z1, z2, . . . , zd).
Finally, if m = (m1, m2, . . . , md) is another vector, then m + n is short for (m1 +
n1, m2 + n2, . . . , md + nd). Notation such as m − n has to be interpreted in a similar
Theorem 1 (Lagrange–Good inversion). Let d be a positive integer, and let
f1(z), f2(z), . . . , fd(z) be formal power series in z = (z1, z2, . . . , zd) with the property
that, for all i, fi(z) is of the form zi/ϕi(z) for some formal power series ϕi(z) with
ϕi(0, 0, . . . , 0) 6= 0. Then, if we expand a formal power series g(z) in terms of powers
of the fi(z),
g(z) =
n(z), (3.1)
8 C. KRATTENTHALER AND T. W. MÜLLER
the coefficients γn are given by
g(z)f−n−e(z) det
1≤i,j≤d
where e = (1, 1, . . . , 1), where the sum in (3.1) runs over all d-tuples n of non-negative
integers, and where 〈zm〉h(z) denotes the coefficient of zm in the formal Laurent series
h(z).
Next, we prove a determinant lemma and a corollary, both of which will also be used
in the proof of Theorem 9.
Lemma 2. Let d be a positive integer, and let X1, X2, . . . , Xd, Y2, Y3, . . . , Yd be indeter-
minates. Then
1≤i,j≤d
1− χ(1 6= j)
, i = 1
1− χ(i 6= j)
, i ≥ 2
Y2Y3 · · ·Yd
X1X2 · · ·Xd
, (3.2)
where χ(S) = 1 if S is true and χ(S) = 0 otherwise.
Proof. By using multilinearity in the rows, we rewrite the determinant on the left-hand
side of (3.2) as
X1X2 · · ·Xd
1≤i,j≤d
X1 − χ(1 6= j)Yj, i = 1
Xi − χ(i 6= j)Yi, i ≥ 2
Next, we subtract the first column from all other columns. As a result, we obtain the
determinant
X1X2 · · ·Xd
1≤i,j≤d
X1, i = j = 1
−Yj, i = 1 and j ≥ 2
Xi − Yi, i ≥ 2 and j = 1
χ(i = j)Yi, i, j ≥ 2
 .
Now we add rows 2, 3, . . . , d to the first row. After that, our determinant becomes lower
triangular, with the entry in the first row and column equal to
i=1Xi −
i=2 Yi, and
the diagonal entry in row i, i ≥ 2, equal to Yi. Hence, we obtain the claimed result. �
Corollary 3. Let d and r be positive integers, 1 ≤ r ≤ d, and let X1, X2, . . . , Xd, Y
and Z be indeterminates. Then, with notation as in Lemma 2,
1≤i,j≤d
1− χ(r 6= j) Z
, i = r
1− χ(i 6= j) Y
, i 6= r
Y d−2
i=1Xi + (Y − Z)Xr − (d− 1)Y Z
X1X2 · · ·Xd
. (3.3)
Proof. We write the diagonal entry in the r-th row of the determinant in (3.3) as
Xr + Y − Z
Y − Z
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 9
and then use linearity of the determinant in the r-th row to decompose the determinant
Xr + Y − Z
Y − Z
where D1 is the determinant in (3.2) with Xr replaced by Xr+Y −Z, and with Yi = Y
for all i, and where D2 is the determinant in (3.2) with d replaced by d−1, with Yi = Y
for all i, and with Xi replaced by Xi−1 for i = r+1, r+2, . . . , d. Hence, using Lemma 2,
we deduce that the determinant in (3.3) is equal to
Y d−1
i=1Xi + Y − Z − (d− 1)Y
X1X2 · · ·Xd
(Y − Z)Y d−2
i=1Xi −Xr − (d− 2)Y
X1X2 · · ·Xd
Little simplification then leads to (3.3). �
We end this section with a summation lemma, which we shall need in Sections 5 and 6
in order to compute the Bn, respectively Dn, decomposition numbers of arbitrary rank
from those of full rank, and in Section 8 to derive enumerative results for (generalised)
non-crossing partitions from our formulae for the decomposition numbers.
Lemma 4. Let M and r be non-negative integers. Then
m1+2m2+···+rmr=r
m1, m2, . . . , mr
M + r − 1
, (3.4)
where the multinomial coefficient is defined by
m1, m2, . . . , mr
m1!m2! · · ·mr! (M −m1 −m2 − · · · −mr)!
Proof. The identity results directly by comparing coefficients of zr on both sides of the
identity
(1 + z + z2 + z3 + · · · )M = (1− z)−M .
4. Decomposition numbers for type A
As was already pointed out in [28, Sec. 10], the decomposition numbers for type An
have already been computed by Goulden and Jackson in [23, Theorem 3.2], albeit using
a somewhat different language. (The condition on the sum l(α1) + l(α2) + · · ·+ l(αm)
is misstated throughout the latter paper. It should be replaced by l(α1) + l(α2) + · · ·+
l(αm) = (m− 1)n+ 1.) In our terminology, their result reads as follows.
Theorem 5. Let T1, T2, . . . , Td be types with rkT1 + rkT2 + · · ·+ rkTd = n, where
Ti = A
1 ∗ A
2 ∗ · · · ∗ A
n , i = 1, 2, . . . , d.
NAn(T1, T2, . . . , Td) = (n + 1)
n− rkTi + 1
n− rkTi + 1
1 , m
2 , . . . , m
, (4.1)
where the multinomial coefficient is defined as in Lemma 4.
10 C. KRATTENTHALER AND T. W. MÜLLER
Here we have used Stembridge’s [35] notation for the decomposition of types into a
product of irreducibles; for example, the equation T = A32 ∗ A5 means that the root
system of type T decomposes into the orthogonal product of 3 copies of root systems
of type A2 and one copy of the root system of type A5.
It was shown in [28, Theorem 10] that, upon applying the summation formula in
Lemma 4 to the result in Theorem 5 in a suitable manner, one obtains a compact
formula for all type An decomposition numbers.
Theorem 6. Let the types T1, T2, . . . , Td be given, where
Ti = A
1 ∗ A
2 ∗ · · · ∗ A
n , i = 1, 2, . . . , d.
NAn(T1, T2, . . . , Td) = (n+ 1)
rkT1 + rkT2 + · · ·+ rkTd + 1
n− rkTi + 1
n− rkTi + 1
1 , m
2 , . . . , m
, (4.2)
where the multinomial coefficient is defined as in Lemma 4. All other decomposition
numbers NAn(T1, T2, . . . , Td) are zero.
5. Decomposition numbers for type B
In this section we compute the decomposition numbers in type Bn. We show that one
can extract the corresponding formulae from results of Bóna, Bousquet, Labelle and
Leroux [12] on the enumeration of certain planar maps, which they call m-ary cacti.
While reading the statement of the theorem, the reader should recall from Section 2
the distinction between group-theoretic and combinatorial decomposition numbers.
Theorem 7. (i) If T1, T2, . . . , Td are types with rkT1 + rkT2 + · · ·+ rkTd = n, where
Ti = A
1 ∗ A
2 ∗ · · · ∗ A
n , i = 1, 2, . . . , j − 1, j + 1, . . . , d,
Tj = Bα ∗ A
1 ∗ A
2 ∗ · · · ∗ A
for some α ≥ 1, then
N combBn (T1, T2, . . . , Td) = n
n− rkTj
1 , m
2 , . . . , m
i 6=j
n− rkTi
n− rkTi
1 , m
2 , . . . , m
(5.1)
where the multinomial coefficient is defined as in Lemma 4. For α ≥ 2, the number
NBn(T1, T2, . . . , Td) is given by the same formula.
(ii) If T1, T2, . . . , Td are types with rkT1 + rkT2 + · · ·+ rkTd = n, where
Ti = A
1 ∗ A
2 ∗ · · · ∗ A
n , i = 1, 2, . . . , d,
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 11
NBn(T1, T2, . . . , Td) = n
n− rkTi
n− rkTi
1 , m
2 , . . . , m
)) d∑
1 (n− rkTj)
0 + 1
(5.2)
where m
0 = n− rkTj −
(iii) All other decomposition numbers NBn(T1, T2, . . . , Td) and N
(T1, T2, . . . , Td)
with rkT1 + rkT2 + · · ·+ rkTd = n are zero.
Proof. Determining the decomposition numbers
NBn(T1, T2, . . . , Td) = NBn(Td, . . . , T2, T1)
(recall (2.2)), respectively
N combBn (T1, T2, . . . , Td) = N
(Td, . . . , T2, T1),
amounts to counting all possible factorisations
[1, 2, . . . , n] = σd · · ·σ2σ1, (5.3)
where σi has type Ti as a parabolic Coxeter element, respectively has combinatorial
type Ti. The reader should observe that the factorisation (5.3) is minimal, in the sense
n = ℓT
[1, 2, . . . , n]
= ℓT (σ1) + ℓT (σ2) + · · ·+ ℓT (σd),
since ℓT (σi) = rkTi, and since, by our assumption, the sum of the ranks of the Ti’s
equals n. A further observation is that, in a factorisation (5.3), there must be at least
one factor σi which contains a type B cycle in its (type B) disjoint cycle decomposition,
because the sign of [1, 2, . . . , n] as an element of the group S2n of all permutations of
{1, 2, . . . , n, 1̄, 2̄, . . . , n̄} is −1, while the sign of any type A cycle is +1.
We first prove Claim (iii). Let us assume, by contradiction, that there is a minimal
decomposition (5.3) in which, altogether, we find at least two type B cycles in the (type
B) disjoint cycle decompositions of the σi’s. In that case, (5.3) has the form
[1, 2, . . . , n] = u1κ1u2κ2u3, (5.4)
where κ1 and κ2 are two type B cycles, and u1, u2, u3 are the factors in between.
Moreover, the factorisation (5.4) is minimal, meaning that
n = ℓT (u1) + ℓT (κ1) + ℓT (u2) + ℓT (κ2) + ℓT (u3). (5.5)
We may rewrite (5.4) as
[1, 2, . . . , n] = κ1κ2(κ
1 u1κ1κ2)(κ
2 u2κ2)u3,
or, setting u′1 = κ
1 u1κ1κ2 and u
2 = κ
2 u2κ2, as
[1, 2, . . . , n] = κ1κ2u
2u3. (5.6)
This factorisation is still minimal since u′1 is conjugate to u1 and u
2 is conjugate to
u2. At this point, we observe that κ1 must be a cycle of the form (2.8) with a1 <
a2 < · · · < ak < a1 < a2 < · · · < ak in the order 1 < 2 < · · · < n < 1̄ < 2̄ < · · · < n̄,
because otherwise κ1 6≤T [1, 2, . . . , n], which would contradict (5.6). A similar argument
12 C. KRATTENTHALER AND T. W. MÜLLER
Figure 1. The 3-cactus corresponding to the factorisation (5.7)
applies to κ2. Now, if κ1 and κ2 are not disjoint, then it is easy to see that ℓT (κ1κ2) <
ℓT (κ1) + ℓT (κ2), whence
n = ℓT ([1, 2, . . . , n])
= ℓT (κ1κ2u
≤ ℓT (κ1κ2) + ℓT (u
1) + ℓT (u
2) + ℓT (u3)
≤ ℓT (κ1κ2) + ℓT (u1) + ℓT (u2) + ℓT (u3)
< ℓT (κ1) + ℓT (κ2) + ℓT (u1) + ℓT (u2) + ℓT (u3),
a contradiction to (5.5). If, on the other hand, κ1 and κ2 are disjoint, then we can find
i, j ∈ {1, 2, . . . , n, 1̄, 2̄, . . . , n̄}, such that i < j < κ1(i) < κ2(j) (in the above order of
{1, 2, . . . , n, 1̄, 2̄, . . . , n̄}). In other words, if we represent κ1 and κ2 in the obvious way
in a cyclic diagram (cf. [32, Sec. 2]), then they cross each other. However, in that case
we have
κ1κ2 6≤T [1, 2, . . . , n],
contradicting the fact that (5.6) is a minimal factorisation. (This is one of the con-
sequences of Biane’s group-theoretic characterisation [10, Theorem 1] of non-crossing
partitions.)
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 13
We turn now to Claims (i) and (ii). In what follows, we shall show that the formulae
(5.1) and (5.2) follow from results of Bóna, Bousquet, Labelle and Leroux [12] on the
enumeration of m-ary cacti with a rotational symmetry. In order to explain this, we
must first define a bijection between minimal factorisations (5.3) and certain planar
maps. By a map, we mean a connected graph embedded in the plane such that edges
do not intersect except in vertices. The maps which are of relevance here are maps
in which faces different from the outer face intersect only in vertices, and are coloured
with colours from {1, 2, . . . , d}. Such maps will be referred to as d-cacti from now on.1
Examples of 3-cacti can be found in Figures 1 and 2. In the figures, the faces different
from the outer face are the shaded ones. Their colours are indicated by the numbers 1,
2, respectively 3, placed in the centre of the faces. Figure 1 shows a 3-cactus in which
the vertices are labelled, while Figure 2 shows one in which the vertices are not labelled.
(That one of the vertices in Figure 2 is marked by a bold dot should be ignored for the
moment.)
In what follows, we need the concept of the rotator around a vertex v in a d-
cactus, which, by definition, is the cyclic list of colours of faces encountered in a
clockwise journey around v. If, while travelling around v, we encounter the colours
b1, b2, . . . , bk, in this order, then we will write (b1, b2, . . . , bk)
O for the rotator, meaning
that (b1, b2, . . . , bk)
O = (b2, . . . , bk, b1)
O, etc. For example, the rotator of all the vertices
in the map in Figure 1 is (1, 2, 3)O.
We illustrate the bijection between minimal factorisations (5.3) and d-cacti with an
example. Take n = 10 and d = 3, and consider the factorisation
[1, 2, . . . , 10] = σ3σ2σ1, (5.7)
where σ3 = ((7, 8)), σ2 = [2, 6, 8] ((1, 9̄, 10)) ((4, 5)), and σ1 = ((1, 8̄)) ((2, 3, 5)). For
each cycle (a1, a2, . . . , ak) (sic!) of σi, we create a k-gon coloured i, and label its vertices
a1, a2, . . . , ak in clockwise order. (The warning “sic!” is there to avoid misunderstand-
ings: for each type A “cycle” ((b1, b2, . . . , bk)) we create two k-gons, the vertices of one
being labelled b1, b2, . . . , bk, and the vertices of the other being labelled b1, b2, . . . , bk,
while for each type B “cycle” [b1, b2, . . . , bk] we create one 2k-gon with vertices labelled
b1, b2, . . . , bk, b1, b2, . . . , bk.) We glue these polygons into a d-cactus, the faces of which
are these polygons plus the outer face, by identifying equally labelled vertices such that
the rotator of each vertex is (1, 2, . . . , d). Figure 1 shows the outcome of this procedure
for the factorisation (5.7).
The fact that the result of the procedure can be realised as a d-cactus follows from
Euler’s formula. Namely, the number of faces corresponding to the polygons is 1 +
k (the 1 coming from the polygon corresponding to the type B cycle),
the number of edges is 2α+2
k (k+1), and the number of vertices is 2n.
Hence, if we include the outer face, the number of vertices minus the number of edges
1We warn the reader that our terminology deviates from the one in [12, 23]. We follow loosely the
conventions in [25]. To be precise, our d-cacti in which the rotator around every vertex is (1, 2, . . . , d)O
are dual to the coloured d-cacti in [23], respectively d-ary cacti in [12], in the following sense: one is
obtained from the other by “interchanging” the roles of vertices and faces, that is, given a d-cactus in
our sense, one obtains a d-cactus in the sense of Goulden and Jackson by shrinking faces to vertices
and blowing up vertices of degree δ to faces with δ vertices, keeping the incidence relations between
faces and vertices. Another minor difference is that colours are arranged in counter-clockwise order in
[12, 23], while we arrange colours in clockwise order.
14 C. KRATTENTHALER AND T. W. MÜLLER
Figure 2. A rotation-symmetric 3-cactus with a marked vertex
plus the number of faces is
2n− 2α− 2
k (k + 1) + 2
k + 2
= 2n+ 2− 2α− 2
= 2n+ 2− 2 rkT1 − 2 rkT2 − · · · − 2 rkTd
= 2, (5.8)
according to our assumption concerning the sum of the ranks of the types Ti.
We may further simplify this geometric representation of a minimal factorisation
(5.3) by deleting all vertex labels and marking the vertex which had label 1. If this
simplification is applied to the 3-cactus in Figure 1, we obtain the 3-cactus in Figure 2.
Indeed, the knowledge of which vertex carries label 1 allows us to reconstruct all other
vertex labels as follows: starting from the vertex labelled 1, we travel clockwise along
the boundary of the face coloured 1 until we reach the next vertex (that is, we traverse
only a single edge); from there, we travel clockwise along the boundary of the face
coloured 2 until we reach the next vertex; etc., until we have travelled along an edge
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 15
bounding a face of colour d. The vertex that we have reached must carry label 2. Etc.
Clearly, if drawn appropriately into the plane, a d-cactus resulting from an application
of the above procedure to a minimal factorisation (5.3) is symmetric with respect to
a rotation by 180◦, the centre of the rotation being the centre of the regular 2α-gon
corresponding to the unique type B cycle of σj ; cf. Figure 2. In what follows, we shall
abbreviate this property as rotation-symmetric.
In summary, under the assumptions of Claim (i), the decomposition number
N combBn (T1, T2, . . . , Td), respectively, if α ≥ 2, the decomposition number NBn(T1, T2, . . . ,
Td) also, equals the number of all rotation-symmetric d-cacti on 2n vertices in which
one vertex is marked and all vertices have rotator (1, 2, . . . , d)O, with exactly m
k pairs
of faces of colour i having k + 1 vertices, arranged symmetrically around a central face
of colour j with 2α vertices.
Aside from the marking of one vertex, equivalent objects are counted in [12, Theo-
rem 25]. In our language, modulo the “dualisation” described in Footnote 1, and upon
replacing m by d, the objects which are counted in the cited theorem are d-cacti in
which all vertices have rotator (1, 2, . . . , d)O, and which are invariant under a rotation
(not necessarily by 180◦). To be precise, from the proof of [12, (81)] (not given in full
detail in [12]) it can be extracted that the number of d-cacti on 2n vertices, in which all
vertices have rotator (1, 2, . . . , d)O, which are invariant under a rotation by (360/s)◦, s
being maximal with this property, and which have exactly 2m
k faces of colour i having
k + 1 vertices arranged around a central face of colour j with 2α vertices, equals
(2n)d−2s
′µ(t/s)
2(n− rkTj)/t
1 /t, 2m
2 /t, . . . , 2m
i 6=j
2(n− rkTi)
2(n− rkTi)/t
1 /t, 2m
2 /t, . . . , 2m
, (5.9)
where the sum extends over all t with s | t, t | 2α, and t | 2m
k for all i = 1, 2, . . . , d and
k = 1, 2, . . . , n. Here, µ(·) is the Möbius function from number theory.2 In presenting
the formula in the above form, we have also used the observation that, for all i (including
i = j !), the number of type A cycles of σi is n− rkTi.
As we said above, the d-cacti that we want to enumerate have one marked vertex,
whereas the d-cacti counted by (5.9) have no marked vertex. However, given a d-cactus
counted by (5.9), we have exactly 2n/s inequivalent ways of marking a vertex. Hence,
recalling that the d-cacti that we want to count are invariant under a rotation by 180◦,
we must multiply the expression (5.9) by 2n/s, and then sum the result over all even
s. Since, by definition of the Möbius function, we have
2|s|t
µ(t/s) =
s′| t
µ(t/2s′) =
1 if t
0 otherwise,
the result of this summation is exactly the right-hand side of (5.1).
2Formula (81) in [12] does not distinguish the colour or the size of the central face (that is, in the
language of [12]: the colour or the degree of the central vertex), therefore it is in fact a sum over all
possible colours and sizes, represented there by the summations over i and h, respectively.
16 C. KRATTENTHALER AND T. W. MÜLLER
Finally, we prove Claim (ii). From what we already know, in a minimal factorisation
(5.3) exactly one of the factors on the right-hand side must contain a type B cycle of
length 1 in its (type B) disjoint cycle decomposition, σj say. As a parabolic Coxeter
element, a type B cycle of length 1 has type A1. Since all considerations in the proof
of Claim (i) are also valid for α = 1, we may use Formula (5.1) with α = 1, and with
1 replaced by m
1 − 1, to count the number of these factorisations, to obtain
n− rkTj
1 − 1, m
2 , . . . , m
i 6=j
n− rkTi
n− rkTi
1 , m
2 , . . . , m
This has to be summed over j = 1, 2, . . . , d. The result is exactly (5.2).
The proof of the theorem is now complete. �
Combining the previous theorem with the summation formula of Lemma 4, we can
now derive compact formulae for all type Bn decomposition numbers.
Theorem 8. (i) Let the types T1, T2, . . . , Td be given, where
Ti = A
1 ∗ A
2 ∗ · · · ∗ A
n , i = 1, 2, . . . , j − 1, j + 1, . . . , d,
Tj = Bα ∗ A
1 ∗ A
2 ∗ · · · ∗ A
for some α ≥ 1. Then
N combBn (T1, T2, . . . , Td) = n
rkT1 + rkT2 + · · ·+ rkTd
n− rkTj
1 , m
2 , . . . , m
i 6=j
n− rkTi
n− rkTi
1 , m
2 , . . . , m
, (5.10)
where the multinomial coefficient is defined as in Lemma 4. For α ≥ 2, the number
NBn(T1, T2, . . . , Td) is given by the same formula.
(ii) Let the types T1, T2, . . . , Td be given, where
Ti = A
1 ∗ A
2 ∗ · · · ∗ A
n , i = 1, 2, . . . , d.
N combBn (T1, T2, . . . , Td)
rkT1 + rkT2 + · · ·+ rkTd
)( d∏
n− rkTi
n− rkTi
1 , m
2 , . . . , m
, (5.11)
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 17
whereas
NBn(T1, T2, . . . , Td)
= nd−1
rkT1 + rkT2 + · · ·+ rkTd
)( d∏
n− rkTi
n− rkTi
1 , m
2 , . . . , m
n− rkT1 − rkT2 − · · · − rkTd +
1 (n− rkTj)
0 + 1
, (5.12)
with m
0 = n− rkTj −
(iii) All other decomposition numbers NBn(T1, T2, . . . , Td) and N
(T1, T2, . . . , Td)
are zero.
Proof. If we write r for n− rkT1− rkT2−· · ·− rkTd, then for Φ = Bn the relation (2.3)
becomes
NBn(T1, T2, . . . , Td) =
T :rkT=r
NBn(T1, T2, . . . , Td, T ), (5.13)
with the same relation holding for N combBn in place of NBn .
In order to prove (5.10), we let T = Am11 ∗A
2 ∗ · · · ∗A
n and use (5.1) in (5.13), to
obtain
N combBn (T1, T2, . . . , Td) =
m1+2m2+···+nmn=r
m1, m2, . . . , mn
n− rkTj
1 , m
2 , . . . , m
i 6=j
n− rkTi
n− rkTi
1 , m
2 , . . . , m
If we use (3.4) with M = n− r, we arrive at our claim after little simplification.
In order to prove (5.11), we let T = Bα ∗ A
1 ∗ A
2 ∗ · · · ∗ A
n in (5.13). The
important point to be observed here is that, in contrast to the previous argument, in
the present case T must have a factor Bα. Subsequently, use of (5.1) in (5.13) yields
N combBn (T1, T2, . . . , Td) =
m1+2m2+···+nmn=r−α
m1, m2, . . . , mn
n− rkTi
n− rkTi
1 , m
2 , . . . , m
. (5.14)
Now we use (3.4) with r replaced by r − α and M = n − r, and subsequently the
elementary summation formula
n− α− 1
r − α
n− α− 1
n− r − 1
r − 1
. (5.15)
Then, after little rewriting, we arrive at our claim.
To establish (5.12), we must recall that the group-theoretic type A1 does not distin-
guish between a type A cycle ((i, j)) = (i, j) (̄i, j̄) and a type B cycle [i] = (i, ī). Hence,
to obtain NBn(T1, T2, . . . , Td) in the case that no Ti contains a Bα for α ≥ 2, we must
18 C. KRATTENTHALER AND T. W. MÜLLER
add the expression (5.11) and the expressions (5.10) with m
1 replaced by m
1 −1 over
j = 1, 2, . . . , d. As is not difficult to see, this sum is indeed equal to (5.12). �
6. Decomposition numbers for type D
In this section we compute the decomposition numbers for the type Dn. Theorem 9
gives the formulae for the full rank decomposition numbers, while Theorem 10 presents
the implied formulae for the decomposition numbers of arbitrary rank. To our knowl-
edge, these are new results, which did not appear earlier in the literature on map
enumeration or on the connection coefficients in the symmetric group or other Coxeter
groups. Nevertheless, the proof of Theorem 9 is entirely in the spirit of the fundamental
paper [23], in that the problem of counting factorisations is translated into a problem of
map enumeration, which is then solved by a generating function approach that requires
the use of the Lagrange–Good formula for coefficient extraction.
We begin with the result concerning the full rank decomposition numbers in type
Dn. While reading the statement of the theorem below, the reader should again recall
from Section 2 the distinction between group-theoretic and combinatorial decomposition
numbers.
Theorem 9. (i) If T1, T2, . . . , Td are types with rkT1 + rkT2 + · · ·+ rkTd = n, where
Ti = A
1 ∗ A
2 ∗ · · · ∗ A
n , i = 1, 2, . . . , j − 1, j + 1, . . . , d,
Tj = Dα ∗ A
2 ∗ · · · ∗ A
for some α ≥ 2, then
N combDn (T1, T2, . . . , Td) = (n− 1)
n− rkTj
1 , m
2 , . . . , m
i 6=j
n− rk Ti − 1
n− rkTi − 1
1 , m
2 , . . . , m
, (6.1)
where the multinomial coefficient is defined as in Lemma 4. For α ≥ 4, the number
NDn(T1, T2, . . . , Td) is given by the same formula.
(ii) If T1, T2, . . . , Td are types with rkT1 + rkT2 + · · ·+ rkTd = n, where
Ti = A
1 ∗ A
2 ∗ · · · ∗ A
n , i = 1, 2, . . . , d,
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 19
N combDn (T1, T2, . . . , Td)
= (n− 1)d−1
n− rkTj
1 , m
2 , . . . , m
i 6=j
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
−2(d− 1)(n− 1)
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
 , (6.2)
while
NDn(T1, T2, . . . , Td)
= (n− 1)d−1
i 6=j
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
n− rkTj
1 , m
2 , . . . , m
n− rkTj
1 , m
2 , m
3 − 1, m
4 , . . . , m
n− rkTj
1 − 2, m
2 , . . . , m
−2(d− 1)(n− 1)
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
 . (6.3)
(iii) All other decomposition numbers NDn(T1, T2, . . . , Td) and N
(T1, T2, . . . , Td)
with rkT1 + rkT2 + · · ·+ rkTd = n are zero.
Remark. These formulae must be correctly interpreted when Ti contains no Dα and
rkTi = n− 1. In that case, because of n− 1 = rkTi = m
1 + 2m
2 + · · ·+ nm
n , there
must be an ℓ, 1 ≤ ℓ ≤ n− 1, with m
ℓ ≥ 1. We then interpret the term
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
n− rkTi − 1
n− rk Ti − 1
1 , m
2 , . . . , m
n− rkTi − 2
1 , . . . , m
ℓ − 1, . . . , m
where the multinomial coefficient is zero whenever
−1 = n− rkTi − 2 < m
1 + · · ·+ (m
ℓ − 1) + · · ·+m
except when all of m
1 , . . . , m
ℓ − 1, . . . , m
n are zero. Explicitly, one must read
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
20 C. KRATTENTHALER AND T. W. MÜLLER
if rkTi = n− 1 but Ti 6= An−1, and
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
if Ti = An−1.
Proof of Theorem 9. Determining the decomposition number
NDn(T1, T2, . . . , Td) = NDn(Td, . . . , T2, T1)
(recall (2.2)), respectively
N combDn (T1, T2, . . . , Td) = N
(Td, . . . , T2, T1),
amounts to counting all possible factorisations
(1, 2, . . . , n− 1, 1̄, 2̄, . . . , n− 1) (n, n̄) = σd · · ·σ2σ1, (6.4)
where σi has type Ti as a parabolic Coxeter element, respectively has combinatorial
type Ti. Here also, the factorisation (6.4) is minimal in the sense that
n = ℓT
(1, 2, . . . , n− 1, 1̄, 2̄, . . . , n− 1) (n, n̄)
= ℓT (σ1) + ℓT (σ2) + · · ·+ ℓT (σd),
since ℓT (σi) = rkTi, and since, by our assumption, the sum of the ranks of the Ti’s
equals n.
We first prove Claim (iii). Let us assume, for contradiction, that there is a minimal
factorisation (6.4), in which, altogether, we find at least two type B cycles of length
≥ 2 in the (type B) disjoint cycle decompositions of the σi’s. It can then be shown by
arguments similar to those in the proof of Claim (iii) in Theorem 7 that this leads to
a contradiction. Hence, “at worst,” we may find a type B cycle of length 1, (a, ā) say,
and another type B cycle, κ say. Both of them must be contained in the disjoint cycle
decomposition of one of the σi’s since all the σi’s are elements of W (Dn). Given that
κ has length α − 1, the product of both, (a, ā) κ, is of combinatorial type Dα, α ≥ 2,
whereas, as a parabolic Coxeter element, it is of type Dα only if α ≥ 4. If α = 3, then
it is a parabolic Coxeter element of type A3, and if α = 2 it is of type A
1. Thus, we are
actually in the cases to which Claims (i) and (ii) apply.
To prove Claim (i), we continue this line of argument. By a variation of the conjuga-
tion argument (5.4)–(5.6), we may assume that these two type B cycles are contained
in σd, σd = (a, ā) κ σ
d say, where, as above, (a, ā) is the type B cycle of length 1 and
κ is the other type B cycle, and where σ′d is free of type B cycles. In that case, (6.4)
takes the form
c = (1, 2, . . . , n− 1, 1̄, 2̄, . . . , n− 1) (n, n̄) = (a, ā) κ σ′d · · ·σ1. (6.5)
If a 6= n, κ 6= (n, n̄), and if κ does not fix n, then (a, ā)κ 6≤T c, a contradiction. Likewise,
if a 6= n, κ = [b1, b2, . . . , bk] with n /∈ {b1, b2, . . . , bk}, then (a, ā) κ 6≤T [1, 2, . . . , n − 1],
again a contradiction. Hence, we may assume that a = n, whence (a, ā) κ = κ (n, n̄)
forms a parabolic Coxeter element of type Dα, given that κ has length α − 1. We are
then in the position to determine all possible factorisations of the form (6.5), which
reduces to
(1, 2, . . . , n− 1, 1̄, 2̄, . . . , n− 1) = [1, 2, . . . , n− 1] = κσ′d · · ·σ1. (6.6)
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 21
This is now a minimal type B factorisation of the form (5.3) with n replaced by n− 1.
We may therefore use Formula (5.1) with n replaced by n− 1, and with rkTj replaced
by rkTj − 1. These substitutions lead exactly to (6.1).
Finally, we turn to Claim (ii). First we discuss two degenerate cases which come
from the identifications D3 ∼ A3, respectively D2 ∼ A
1, and which only occur for
NDn(T1, T2, . . . , Td) (but not for the combinatorial decomposition numbers N
(T1, T2,
. . . , Td)). It may happen that one of the factors in (6.4), let us say, without loss of
generality, σd, contains a type B cycle of length 1 and one of length 2 in its disjoint
cycle decomposition; that is, σd may contain
(n, n̄) [a, b] = (n, n̄) (a, b, ā, b̄) = [a, b] [b, n] [b, n̄].
As a parabolic Coxeter element, this is of type A3. By the reduction (6.5)–(6.6), we
may count the number of these possibilities by Formula (5.1) with n replaced by n− 1,
rkTj replaced by rkTj−1, and m
3 replaced by m
3 −1. This explains the second term
in the factor in big parentheses on the right-hand side of (6.3). On the other hand, it
may happen that one of the factors in (6.4), let us say again, without loss of generality,
σd, contains two type B cycles of length 1 in its disjoint cycle decomposition; that is,
σd may contain (n, n̄) (a, ā). As a parabolic Coxeter element, this is of type A
1. By the
reduction (6.5)–(6.6), we may count the number of these possibilities by Formula (5.1)
with n replaced by n − 1, rkTj replaced by rkTj − 1, and m
1 replaced by m
1 − 2.
This explains the third term in the factor in big parentheses on the right-hand side of
(6.3).
From now on we may assume that none of the σi’s contains a type B cycle in its (type
B) disjoint cycle decomposition. To determine the number of minimal factorisations
(6.4) in this case, we construct again a bijection between these factorisations and certain
maps. In what follows, we will still use the concept of a rotator, introduced in the
proof of Theorem 7. We apply again the procedure described in that proof. That
is, for each (ordinary) cycle (a1, a2, . . . , ak) of σi, we create a k-gon coloured i, label
its vertices a1, a2, . . . , ak in clockwise order, and glue these polygons into a map by
identifying equally labelled vertices such that the rotator of each vertex is (1, 2, . . . , d).
However, this map can be embedded in the plane only if we allow the creation of an
inner face corresponding to the cycle (n, n̄) on the left-hand side of (6.4) (the outer face
corresponding to the large cycle (1, 2, . . . , n− 1, 1̄, 2̄, . . . , n− 1)). Moreover, this inner
face must be bounded by 2d edges. We call such a map, in which all faces except the
outer face and an inner face intersect only in vertices, and are coloured with colours
from {1, 2, . . . , d}, and in which the inner face is bounded by 2d edges, a d-atoll. For
example, if we take n = 10 and d = 3, and consider the factorisation
(1, 2, . . . , 9, 1̄, 2̄, . . . , 9̄) (10, 10) = σ3σ2σ1, (6.7)
where σ3 = ((1, 4, 10, 7̄)), σ2 = ((1, 3)) ((4, 6, 10)) ((7, 8, 9)), and σ1 = ((1, 2)) ((4, 5)),
and apply this procedure, we obtain the 3-atoll in Figure 3. In the figure, the faces
corresponding to cycles are shaded. As in Figures 1 and 2, the outer face is not shaded.
Here, there is in addition an inner face which is not shaded, the face formed by the
vertices 4, 10, 4̄, 10. Again, the colours of the shaded faces are indicated by the numbers
1, 2, respectively 3, placed in the centre of the faces.
22 C. KRATTENTHALER AND T. W. MÜLLER
3 2 1 3
Figure 3. The 3-atoll corresponding to the factorisation (6.7)
Unsurprisingly, the fact that the result of the procedure can be realised as a d-atoll
follows again from Euler’s formula. More precisely, the number of faces corresponding
to the polygons is 2
k , the number of edges is 2
k (k + 1),
and the number of vertices is 2n. Hence, if we include the outer face and the inner face,
the number of vertices minus the number of edges plus the number of faces is
2n− 2
k (k + 1) + 2
k + 2 = 2n+ 2− 2
= 2n+ 2− 2 rkT1 − 2 rkT2 − · · · − 2 rkTd
= 2, (6.8)
according to our assumption concerning the sum of the ranks of the types Ti.
Again, we may further simplify this geometric representation of a minimal factorisa-
tion (6.4) by deleting all vertex labels, marking the vertex which had label 1 with •,
and marking the vertex that had label n with �. If this simplification is applied to the
3-atoll in Figure 3, we obtain the 3-atoll in Figure 4. Clearly, if drawn appropriately
into the plane, a d-atoll resulting from an application of the above procedure to a mini-
mal factorisation (6.4) is symmetric with respect to a rotation by 180◦, the centre of the
rotation being the centre of the inner face; cf. Figure 4. As earlier, we shall abbreviate
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 23
3 2 1 3
Figure 4. A rotation-symmetric 3-atoll with two marked vertices
this property as rotation-symmetric. In fact, there is not much freedom for the choice of
the vertex marked by � once a vertex has been marked by •. Clearly, if we run through
the vertex labelling process described in the proof of Theorem 7, labelling 1 the vertex
which is marked by •, we shall reconstruct the labels 1, 2, . . . , n − 1, 1̄, 2̄, . . . , n− 1.
This leaves only 2 vertices incident to the inner face unlabelled, one of which will have
to carry the mark �.
In summary, under the assumptions of Claim (ii), the number of minimal factori-
sations (6.4), in which none of the σi’s contains a type B cycle in its disjoint cycle
decomposition, equals twice the number of all rotation-symmetric d-atolls on 2n ver-
tices, in which one vertex is marked by •, all vertices have rotator (1, 2, . . . , d)O, and
with exactly m
k pairs of faces of colour i having k + 1 vertices, arranged symmetri-
cally around the inner face (which is not coloured). Let us denote the number of these
d-atolls by N ′Dn(T1, T2, . . . , Td).
We must now enumerate these d-atolls. First of all, introducing a figure of speech,
we shall refer to coloured faces of a d-atoll which share an edge with the inner face but
not with the outer face as faces “inside the d-atoll,” and all others as faces “outside the
d-atoll.” For example, in Figure 4 we find two faces inside the 3-atoll, namely the two
loop faces attached to the vertices labelled 10, respectively 10, in Figure 3. Since, in a
d-atoll, the inner face is bounded by exactly 2d edges, inside the d-atoll, we find only
24 C. KRATTENTHALER AND T. W. MÜLLER
coloured faces containing exactly one vertex. Next, we travel counter-clockwise around
the inner face and record the coloured faces sharing an edge with both the inner and
outer faces. Thus we obtain a list of the form
F1, F2, . . . , Fℓ, Fℓ+1, . . . , F2ℓ,
where, except possibly for the marking, Fh+ℓ is an identical copy of Fh, h = 1, 2, . . . , ℓ.
In Figure 4, this list contains four faces, F̃1, F̃2, F̃3, F̃4, where F̃1 and F̃3 are the two
quadrangles of colour 3, and where F̃2 and F̃4 are the two triangles of colour 2 connecting
the two quadrangles.
Continuing the general argument, let the colour of Fh be ih. Inside the d-atoll, because
of the rotator condition, there must be {ih+1−ih−1}d faces (containing just one vertex)
incident to the common vertex of Fh and Fh+1 coloured {ih+1}d, . . . , {ih+1−1}d, where,
by definition,
{x}d :=
x, if 0 ≤ x ≤ d
x+ d, if x < 0
x− d, if x > d,
and where ih+ℓ = ih, h = 1, 2, . . . , ℓ. Here, if {ih + 1}d > {ih+1 − 1}d, the sequence
of colours {ih + 1}d, . . . , {ih+1 − 1}d must be interpreted “cyclically,” that is, as {ih +
1}d, {ih+1}d+1, . . . , d, 1, 2, . . . , {ih+1−1}d. As we observed above, the number of edges
bounding the inner face is 2d. On the other hand, using the notation just introduced,
this number also equals
{ih+1 − ih}d = 2
(ih+1 − ih) + d · χ(ih+1 < ih)
χ(ih+1 < ih).
Hence, there is precisely one h for which ih+1 < ih. Without loss of generality, we may
assume that h = ℓ, so that i1 < i2 < · · · < iℓ.
The ascending colouring of the faces F1, F2, . . . , Fℓ breaks the (rotation) symmetry
of the d-atoll. Therefore, we may first enumerate d-atolls without any marking, and
multiply the result by the number of all possible markings, which is n − 1. More
precisely, let N ′′Dn(T1, T2, . . . , Td) denote the number of all rotation-symmetric d-atolls
on 2n vertices, in which all vertices have rotator (1, 2, . . . , d)O, and with exactly m
pairs of faces of colour i having k+1 vertices, arranged symmetrically around the inner
face (which is not coloured). Then,
NDn(T1, T2, . . . , Td) = 2N
(T1, T2, . . . , Td)
= 2(n− 1)N ′′Dn(T1, T2, . . . , Td). (6.9)
We use a generating function approach to determine N ′′Dn(T1, T2, . . . , Td), which re-
quires a combinatorial decomposition of our objects. Let G(z) be the generating func-
G(z) =
w(A), (6.10)
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 25
where A is the set of all rotation-symmetric d-atolls, in which all vertices have rotator
(1, 2, . . . , d)O, and where
w(A) =
#(faces of A with colour i)
#(faces of A with colour i and k vertices)
i,k .
Here, z = (z1, z2, . . . , zd), with the zi’s, i = 1, 2, . . . , d, and the pi,k, i = 1, 2, . . . , d,
k = 1, 2, . . . , being indeterminates. Clearly, in view of the bijection between minimal
factorisations (6.4) and d-atolls described earlier, and by (6.9), we have
NDn(T1, T2, . . . , Td) = 2(n− 1)
i,k+1
G(z), (6.11)
where c = (c1, c2, . . . , cd), with ci equal to the number of type A cycles of σi; that is,
k , i = 1, 2, . . . , d. Here, and in the sequel, we use the multi-index notation
introduced at the beginning of Section 3. For later use, we observe that, for all i, ci is
related to rkTi via
ci = n− rkTi. (6.12)
Now, let A be a d-atoll in A such that the faces which share an edge with both the
inner and outer faces are
F1, F2, . . . , Fℓ, Fℓ+1, . . . , F2ℓ,
where Fh+ℓ is an identical copy of Fh, where the colour of Fh is ih, h = 1, 2, . . . , ℓ, and
with i1 < i2 < · · · < iℓ. We decompose A by separating from each other the polygons
which touch in vertices of the inner face. The decomposition in the case of our example
in Figure 4 is shown in Figure 5. Ignoring identical copies which are there due to the
rotation symmetry, we obtain a list
K1, L
, . . . , L
, . . . , C
d , C
1 , . . . , C
K2, L
, . . . , L
, . . . , C
d , C
1 , . . . , C
, . . .
Kℓ, L
, . . . , L
d , L
1 , . . . , L
, . . . , C
, (6.13)
where Kh is the d-cactus containing the face Fh, and, hence, a d-cactus in which all
but two neighbouring vertices have rotator (1, 2, . . . , d)O, the latter two vertices being
incident to just one face, which is of colour ih, where L
j is a face of colour j with
just one vertex, and where C
j is a d-cactus in which all but one vertex have rotator
(1, 2, . . . , d)O, the distinguished vertex being incident to just one face, which is of colour
j, h = 1, 2, . . . , ℓ and j = 1, 2, . . . , d. With this notation, our example in Figure 5 is
one in which ℓ = 2, i1 = 2, i2 = 3.
The d-cacti Kh can be further decomposed. Namely, assuming that the face Fh
is a k-gon (of colour ih), let C1, C2, . . . , Ck−2 be the d-cacti incident to this k-gon,
read in clockwise order, starting with the d-cactus to the left of the two distinguished
vertices. Figure 6 illustrates this further decomposition of the d-cactus K2 from Fig-
ure 5. After removal of Fh, we are left with the ordered collection C1, C2, . . . , Ck−2
of d-cacti, each of which having the property that the rotator of all but one vertex is
(1, 2, . . . , d)O, the exceptional vertex having rotator (1, . . . , ih − 1, ih + 1, . . . , d)
O. By
separating from each other the polygons of colours 1, . . . , ih − 1, ih + 1, . . . , d which
26 C. KRATTENTHALER AND T. W. MÜLLER
Figure 5. The decomposition of the 3-atoll in Figure 4
touch in the exceptional vertex, each d-cactus Ci in turn can be decomposed into d-
cacti Ci,1, . . . , Ci,ih−1, Ci,ih+1, . . . , Ci,d with Ci,j ∈ Cj for all k, where Cj denotes the set
of all d-cacti in which all but one vertex have rotator (1, 2, . . . , d)O, the distinguished
vertex being incident to just one face, which is of colour j.
Let ωj(z) denote the generating function for the d-cacti in Cj , that is,
ωj(z) =
w(C). (6.14)
Furthermore, for i = 1, 2, . . . , d, define the formal power series Pi(u) in one variable u
Pi(u) =
pi,ku
Then, by the decomposition (6.13) and the further decomposition of the Kh’s that we
just described, the contribution of the above d-atolls to the generating function (6.10)
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 27
Figure 6. The decomposition of K2 in Figure 5
zijωij (z)
ω1(z) · · ·ωd(z)
ω1(z) · · ·ωd(z)
ωij(z)
− pij ,1
j=1 zjpj,1∏ℓ
j=1 zijpij ,1
(ω1(z) · · ·ωd(z))
j=1 ωij(z)
zjpj,1
ωj(z)
ω1(z)···ωd(z)
ωij (z)
pij ,1
the term in the first line corresponding to the contribution of the Kj ’s, the first term in
the second line corresponding to the contribution of the L
k ’s, and the second term in
the second line corresponding to the contribution of the C
k ’s. These expressions must
be summed over ℓ = 2, 3, . . . , d and all possible choices of 1 ≤ i1 < i2 < · · · < iℓ ≤ d to
28 C. KRATTENTHALER AND T. W. MÜLLER
obtain the desired generating function G(z), that is,
G(z) =
zjpj,1
ωj(z)
1≤i1<i2<···<iℓ≤d
ω1(z)···ωd(z)
ωij (z)
pij , 1
zjpj,1
ωj(z)
ω1(z)···ωd(z)
ωj(z)
ω1(z)···ωd(z)
ωj(z)
(6.15)
Here we have used the elementary identity
1≤i1<i2<···<iℓ≤d
Xi1Xi2 · · ·Xiℓ = (1 +X1)(1 +X2) · · · (1 +Xd).
Before we are able to proceed, we must find functional equations for the generating
functions ωj(z), j = 1, 2, . . . , d. Given a d-cactus C in Cj such that the distinguished
vertex is incident to a k-gon (of colour j), we decompose it in a manner analogous
to the decomposition of Kh above. To be more precise, let C1, C2, . . . , Ck−1 be the
d-cacti incident to this k-gon, read in clockwise order, starting with the d-cactus to
the left of the distinguished vertex. After removal of the k-gon, we are left with the
ordered collection C1, C2, . . . , Ck−1 of d-cacti, each of which having the property that
the rotator of all but one vertex is (1, 2, . . . , d)O, the exceptional vertex having rotator
(1, . . . , j − 1, j + 1, . . . , d)O. By separating from each other the polygons of colours
1, . . . , j− 1, j+1, . . . , d which touch in the exceptional vertex, each d-cactus Ci in turn
can be decomposed into d-cacti Ci,1, . . . , Ci,j−1, Ci,j+1, . . . , Ci,d with Ci,k ∈ Ck for all k.
The upshot of these combinatorial considerations is that
ωj(z) = zjPj(ω1(z) · · ·ωd(z)/ωj(z)), j = 1, 2, . . . , d,
or, equivalently,
ωj(z)
Pj(ω1(z) · · ·ωd(z)/ωj(z))
, j = 1, 2, . . . , d.
Using this relation, the expression (6.15) for G(z) may now be further simplified, and
we obtain
G(z) = 1−
ω1(z)···ωd(z)
ωj(z)
ω1(z)···ωd(z)
ωj(z)
+ (d− 1)
ω1(z)···ωd(z)
ωj(z)
This is substituted in (6.11), to obtain
NDn(T1, T2, . . . , Td)
= −2(n− 1)
i,k+1
ω1(z)···ωd(z)
ωj(z)
ω1(z)···ωd(z)
ωj(z)
+ 2(n− 1)(d− 1)
i,k+1
ω1(z)···ωd(z)
ωj(z)
) . (6.16)
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 29
Now the problem is set up for application of the Lagrange–Good inversion formula.
Let fi(z) = zi/Pi(z1 · · · zd/zi), i = 1, 2, . . . , d. If we substitute fi(z) in place of zi,
i = 1, 2, . . . , d, in (6.16), and apply Theorem 1 with
g(z) =
z1···zd
z1···zd
respectively
g(z) =
z1···zd
we obtain that
NDn(T1, T2, . . . , Td)
= −2(n− 1)
i,k+1
fj(z)pj,1
−c(z) det
1≤i,k≤d
+ 2(n− 1)(d− 1)
i,k+1
−c(z) det
1≤i,k≤d
, (6.17)
where 0 stands for the vector (0, 0, . . . , 0). We treat the two terms on the right-hand
side of (6.17) separately. We begin with the second term:
i,k+1
−c(z) det
1≤i,k≤d
i,k+1
1≤i,k≤d



z1 · · · zd
, i = k
−P ci−2i
z1 · · · zd
×P ′i
z1 · · · zd
z1 · · · zd
, i 6= k



i,k+1
× det
1≤i,k≤d
z1 · · · zd
, i = k
ci − 1
i (u)
) ∣∣∣∣∣
u=z1···zd/zi
, i 6= k
30 C. KRATTENTHALER AND T. W. MÜLLER
Reading coefficients, we obtain
ci − 1
1 , m
2 , . . . , m
1≤i,k≤d
1, i = k
ci − 1
, i 6= k
ci − 1
1 , m
2 , . . . , m
1≤i,k≤d
1− χ(i 6= k)
ci − 1
the second line being due to (6.12). Now we can apply Lemma 2 with Xi = ci − 1 and
Yi = n− 1, i = 1, 2, . . . , d. The term
(ci − 1)− (d− 1)(n− 1)
(n− rkTi − 1)− (d− 1)(n− 1)
on the right-hand side of (3.2) simplifies to −1 due to our assumption concerning the
sum of the ranks of the types Ti. Hence, if we use the relation (6.12) once more, the
second term on the right-hand side of (6.17) is seen to equal
−2(d− 1)(n− 1)d
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
This explains the second term in the factor in big parentheses in (6.2) and the fourth
term in the factor in big parentheses on the right-hand side of (6.3).
Finally, we come to the first term on the right-hand side of (6.17). We have
i,k+1
fj(z)pj,1
−c(z) det
1≤i,k≤d
i,k+1
i 6=j
 det1≤i,k≤d



ci−1+χ(i=j)
z1 · · · zd
, i = k
ci−2+χ(i=j)
z1 · · · zd
×P ′i
z1 · · · zd
z1 · · · zd
, i 6= k



i,k+1
i 6=j
× det
1≤i,k≤d
ci−1+χ(i=j)
z1 · · · zd
, i = k
ci − 1 + χ(i = j)
ci−1+χ(i=j)
i (u)
) ∣∣∣∣∣
u=z1···zd/zi
, i 6= k
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 31
Reading coefficients, we obtain
ci − 1 + χ(i = j)
1 , m
2 , . . . , m
1≤i,k≤d
1, i = k
ci − 1 + χ(i = j)
, i 6= k
ci − 1 + χ(i = j)
1 , m
2 , . . . , m
1≤i,k≤d
1− χ(j 6= k) n
, i = j
1− χ(i 6= k) n−1
, i 6= j
the second line being due to (6.12). Now we can apply Corollary 3 with r = j, Xi =
ci − 1, i = 1, . . . , j − 1, j + 1, . . . , d, Xj = cj, Y = n− 1, and Z = n. The term
Xi + (Y − Z)Xj − (d− 1)Y Z = n
(ci − 1)
− cj − (d− 1)(n− 1)n
(n− rkTi − 1) + n− cj − (d− 1)(n− 1)n
on the right-hand side of (3.2) simplifies to −cj due to our assumption concerning the
sum of the ranks of the types Ti. Hence, if we use the relation (6.12) once more, the
second term on the right-hand side of (6.17) is seen to equal the sum over j = 1, 2, . . . , d
2(n− 1)d−1
n− rkTj
1 , m
2 , . . . , m
i 6=j
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
This explains the first terms in the factors in big parentheses on the right-hand sides
of (6.2) and (6.3).
The proof of the theorem is complete. �
Combining the previous theorem with the summation formula of Lemma 4, we can
now derive compact formulae for all type Dn decomposition numbers.
Theorem 10. (i) Let the types T1, T2, . . . , Td be given, where
Ti = A
1 ∗ A
2 ∗ · · · ∗ A
n , i = 1, 2, . . . , j − 1, j + 1, . . . , d,
Tj = Dα ∗ A
2 ∗ · · · ∗ A
for some α ≥ 2. Then
N combDn (T1, T2, . . . , Td) = (n− 1)
rkT1 + rkT2 + · · ·+ rkTd − 1
n− rkTj
1 , m
2 , . . . , m
i 6=j
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
, (6.18)
where the multinomial coefficient is defined as in Lemma 4. For α ≥ 4, the number
NDn(T1, T2, . . . , Td) is given by the same formula.
32 C. KRATTENTHALER AND T. W. MÜLLER
(ii) Let the types T1, T2, . . . , Td be given, where
Ti = A
1 ∗ A
2 ∗ · · · ∗ A
n , i = 1, 2, . . . , d.
N combDn (T1, T2, . . . , Td) = (n− 1)
rkT1 + rkT2 + · · ·+ rkTd − 1
n− rkTj
1 , m
2 , . . . , m
)( d∏
i 6=j
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
ℓ=1 rkTℓ
n− 1−
ℓ=1 rkTℓ
ℓ=1 rkTℓ
− 2(d− 2)(n− 1)
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
 , (6.19)
whereas
NDn(T1, T2, . . . , Td) = (n− 1)
rk T1 + rkT2 + · · ·+ rkTd − 1
i 6=j
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
n− rkTj
1 , m
2 , . . . , m
n− rkTj
1 , m
2 , m
3 − 1, m
4 , . . . , m
n− rkTj
1 − 2, m
2 , . . . , m
ℓ=1 rkTℓ
n− 1−
ℓ=1 rkTℓ
ℓ=1 rkTℓ
− 2(d− 2)(n− 1)
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
 . (6.20)
(iii) All other decomposition numbers NDn(T1, T2, . . . , Td) and N
(T1, T2, . . . , Td)
are zero.
Remark. The caveats on interpretations of the formulae in Theorem 9 for critical choices
of the parameters (cf. the Remark after the statement of that theorem) apply also to
the formulae of Theorem 10.
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 33
Proof. We proceed in a manner similar to the proof of Theorem 8. If we write r for
n− rkT1 − rkT2 − · · · − rkTd and set Φ = Dn, then the relation (2.3) becomes
NDn(T1, T2, . . . , Td) =
T :rkT=r
NDn(T1, T2, . . . , Td, T ), (6.21)
with the same relation holding for N combDn in place of NDn .
In order to prove (6.18), we let T = Am11 ∗A
2 ∗ · · · ∗A
n and use (6.1) in (6.21), to
obtain
N combDn (T1, T2, . . . , Td) =
m1+2m2+···+nmn=r
(n− 1)d
n− r − 1
n− r − 1
m1, m2, . . . , mn
n− rkTj
1 , m
2 , . . . , m
i 6=j
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
If we use (3.4) with M = n− r − 1, we arrive at our claim after little simplification.
Next we prove (6.19). In contrast to the previous argument, here the summation
on the right-hand side of (6.21) must be taken over all types T of the form T =
Dα ∗ A
1 ∗ A
2 ∗ · · · ∗ A
n , α ≥ 2, as well as of the form T = A
1 ∗ A
2 ∗ · · · ∗ A
For the sum over the former types, we have to substitute (6.1) in (6.21), to get
m1+2m2+···+nmn=r−α
(n− 1)d
m1, m2, . . . , mn
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
. (6.22)
On the other hand, for the sum over the latter types, we have to substitute (6.2) in
(6.21), to get
m1+2m2+···+nmn=r
(n− 1)d
m1, m2, . . . , mn
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
m1+2m2+···+nmn=r
(n− 1)d
n− r − 1
n− r − 1
m1, m2, . . . , mn
n− rkTj
1 , m
2 , . . . , m
i 6=j
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
−2(d− 1)(n− 1)
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
 . (6.23)
34 C. KRATTENTHALER AND T. W. MÜLLER
We simplify (6.22) by using (3.4) with r replaced by r − α and M = n − r, and by
subsequently applying the elementary summation formula
n− α− 1
r − α
n− α− 1
n− r − 1
r − 2
. (6.24)
The expression which we obtain in this way explains the fraction in the third line of
(6.19) multiplied by the expression in the last line. On the other hand, we simplify the
sums in (6.23) by using (3.4) with M = n− r, respectively M = n− r − 1. Thus, the
expression (6.23) becomes
2(n− 1)d
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
+(n−1)d−1
n− rkTj
1 , m
2 , . . . , m
i 6=j
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
− 2(d− 1)(n− 1)
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
which explains the expression in the second line of (6.19) and the second expression in
the third line of (6.19) multiplied by the expression in the last line.
The proof of (6.20) is analogous, using (6.3) instead of (6.2). We leave the details to
the reader. �
7. Generalised non-crossing partitions
In this section we recall the definition of Armstrong’s [1] generalised non-crossing
partitions poset, and its combinatorial realisation from [1] and [29] for the types An,
Bn, and Dn.
Let again Φ be a finite root system of rank n, and letW = W (Φ) be the corresponding
reflection group. We define first the non-crossing partition lattice NC(Φ) (cf. [8, 15]).
Let c be a Coxeter element in W . Then NC(Φ) is defined to be the restriction of the
partial order ≤T from Section 2 to the set of all elements which are less than or equal
to c in this partial order. This definition makes sense since any two Coxeter elements
in W are conjugate to each other; the induced inner automorphism then restricts to an
isomorphism of the posets corresponding to the two Coxeter elements. It can be shown
thatNC(Φ) is in fact a lattice (see [16] for a uniform proof), and moreover self-dual (this
is obvious from the definition). Clearly, the minimal element in NC(Φ) is the identity
element in W , which we denote by ε, and the maximal element in NC(Φ) is the chosen
Coxeter element c. The term “non-crossing partition lattice” is used because NC(An)
is isomorphic to the lattice of non-crossing partitions of {1, 2, . . . , n + 1}, originally
introduced by Kreweras [30] (see also [20] and below), and since also NC(Bn) and
NC(Dn) can be realised as lattices of non-crossing partitions (see [4, 32] and below).
In addition to a fixed root system, the definition of Armstrong’s generalised non-
crossing partitions requires a fixed positive integer m. The poset of m-divisible non-
crossing partitions associated to the root system Φ has as ground-set the following subset
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 35
of (NC(Φ))m+1:
NCm(Φ) =
(w0;w1, . . . , wm) : w0w1 · · ·wm = c and
ℓT (w0) + ℓT (w1) + · · ·+ ℓT (wm) = ℓT (c)
. (7.1)
The order relation is defined by
(u0; u1, . . . , um) ≤ (w0;w1, . . . , wm) if and only if ui ≥T wi, 1 ≤ i ≤ m.
(According to this definition, u0 and w0 need not be related in any way. However, it
follows from [1, Lemma 3.4.7] that, in fact, u0 ≤T w0.) The poset NC
m(Φ) is graded
by the rank function
(w0;w1, . . . , wm)
= ℓT (w0). (7.2)
Thus, there is a unique maximal element, namely (c; ε, . . . , ε), where ε stands for the
identity element in W , but, for m > 1, there are many different minimal elements. In
particular, NCm(Φ) has no least element if m > 1; hence, NCm(Φ) is not a lattice for
m > 1. (It is, however, a graded join-semilattice, see [1, Theorem 3.4.4].)
In what follows, we shall use the notions “generalised non-crossing partitions” and
“m-divisible non-crossing partitions” interchangeably, where the latter notion will be
employed particularly in contexts in which we want to underline the presence of the
parameter m.
In the remainder of this section, we explain combinatorial realisations of the m-
divisible non-crossing partitions of types An−1, Bn, and Dn. In order to be able to do
so, we need to recall the definition of Kreweras’ non-crossing partitions of {1, 2, . . . , N},
his “partitions non croisées d’un cycle” of [30]. We place N vertices around a cycle,
and label them 1, 2, . . . , N in clockwise order. The circular representation of a partition
of the set {1, 2, . . . , N} is the geometric object which arises by representing each block
{i1, i2, . . . , ik} of the partition, where i1 < i2 < · · · < ik, by the polygon consisting of
the vertices labelled i1, i2, . . . , ik and edges which connect these vertices in clockwise
order. A partition of {1, 2, . . . , N} is called non-crossing if any two edges in its circular
representation are disjoint. Figure 7 shows the non-crossing partition
{{1, 2, 21}, {3, 19, 20}, {4, 5, 6}, {7, 17, 18}, {8, 9, 10, 11, 12, 13, 14, 15, 16}}
of {1, 2, . . . , 21}. There is a natural partial order on Kreweras’ non-crossing partitions
defined by refinement: a partition π1 is less than or equal to the partition π2 if every
block of π1 is contained in some block of π2.
If Φ = An−1, the m-divisible non-crossing partitions are in bijection with Kreweras-
type non-crossing partitions of the set {1, 2, . . . , mn}, in which all the block sizes are
divisible by m. We denote the latter set of non-crossing partitions by ÑCm(An−1). It
has been first considered by Edelman in [18]. In fact, Figure 7 shows an example of a
3-divisible non-crossing partition of type A20.
Given an element (w0;w1, . . . , wm) ∈ NC
m(An−1), the bijection, ▽
say, from [1,
Theorem 4.3.8] works by “blowing up” w1, w2, . . . , wm, thereby “interleaving” them,
and then “gluing” them together by an operation which is called Kreweras complement
in [1]. More precisely, for i = 1, 2, . . . , m, let τm,i be the transformation which maps a
permutation w ∈ Sn to a permutation τm,i(w) ∈ Smn by letting
(τm,i(w))(mk + i−m) = mw(k) + i−m, k = 1, 2, . . . , n,
36 C. KRATTENTHALER AND T. W. MÜLLER
Figure 7. Combinatorial realisation of a 3-divisible non-crossing parti-
tion of type A6
and (τm,i(w))(l) = l for all l 6≡ i (mod m). At this point, the reader should recall from
Section 2 that W (An−1) is the symmetric group Sn, and that the standard choice of
a Coxeter element in W (An−1) = Sn is c = (1, 2, . . . , n). With this choice of Coxeter
element, the announced bijection maps (w0;w1, . . . , wm) ∈ NC
m(An−1) to
▽mAn−1(w0;w1, . . . , wm) = (1, 2, . . . , mn) (τm,1(w1))
−1 (τm,2(w2))
−1 · · · (τm,m(wm))
We refer the reader to [1, Sec. 4.3.2] for the details. For example, let n = 7, m = 3,
w0 = (4, 5, 6), w1 = (3, 6), w2 = (1, 7), and w3 = (1, 2, 6). Then (w0;w1, w2, w3) is
mapped to
▽3A6(w0;w1, w2, w3) = (1, 2, . . . , 21) (7, 16) (2, 20) (18, 6, 3)
= (1, 2, 21) (3, 19, 20) (4, 5, 6) (7, 17, 18) (8, 9, . . . , 16). (7.3)
Figure 7 shows the graphical representation of (7.3) on the circle, in which we represent
a cycle (i1, i2, . . . , ik) as a polygon consisting of the vertices labelled i1, i2, . . . , ik and
edges which connect these vertices in clockwise order.
It is shown in [1, Theorem 4.3.8] that ▽mAn−1 is in fact an isomorphism between the
posets NCm(An−1) and ÑC
m(An−1). Furthermore, it is proved in [1, Theorem 4.3.13]
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 37
Figure 8. Combinatorial realisation of a 3-divisible non-crossing parti-
tion of type B5
ci(w0) = bi(▽
(w0;w1, . . . , wm)), i = 1, 2, . . . , n, (7.4)
where ci(w0) denotes the number of cycles of length i of w0 and bi(π) denotes the
number of blocks of size mi in the non-crossing partition π.
If Φ = Bn, the m-divisible non-crossing partitions are in bijection with Kreweras-type
non-crossing partitions π of the set {1, 2, . . . , mn, 1̄, 2̄, . . . , mn}, in which all the block
sizes are divisible by m, and which have the property that if B is a block of π then also
B := {x̄ : x ∈ B} is a block of π. (Here, as earlier, we adopt the convention that ¯̄x = x
for all x.) We denote the latter set of non-crossing partitions by ÑCm(Bn). A block
B with B = B is called a zero block . A non-crossing partition in ÑCm(Bn) can only
have at most one zero block. Figures 8 and 9 give examples of 3-divisible non-crossing
partitions of type B5. Figure 8 shows one without a zero block, while Figure 9 shows
one with a zero block. Clearly, the condition that B is a block of the partition if and
only if B is a block translates into the condition that the geometric realisation of the
partition is invariant under rotation by 180◦.
38 C. KRATTENTHALER AND T. W. MÜLLER
Figure 9. A 3-divisible non-crossing partition of type B5 with zero block
Given an element (w0;w1, . . . , wm) ∈ NC
m(Bn), the bijection, ▽
say, from [1,
Theorem 4.5.6] works in the same way as for NCm(An−1). That is, recalling from
Section 2 that W (Bn) can be combinatorially realised as a subgroup of the group
of permutations of {1, 2, . . . , n, 1̄, 2̄, . . . , n̄}, and that, in this realisation, the standard
choice of a Coxeter element is c = [1, 2, . . . , n] = (1, 2, . . . , n, 1̄, 2̄, . . . , n̄), the announced
bijection maps (w0;w1, . . . , wm) ∈ NC
m(Bn) to
▽mBn(w0;w1, . . . , wm) = [1, 2, . . . , mn] (τ̄m,1(w1))
−1 (τ̄m,2(w2))
−1 · · · (τ̄m,m(wm))
where τ̄m,i is the obvious extension of the above transformations τm,i: namely we let
(τ̄m,i(w))(mk + i−m) = mw(k) + i−m, k = 1, 2, . . . , n, 1̄, 2̄, . . . , n̄,
and (τ̄m,i(w))(l) = l and (τ̄m,i(w))(l̄) = l̄ for all l 6≡ i (mod m), where mk̄ + i − m is
identified with mk + i−m for all k and i. We refer the reader to [1, Sec. 4.5] for the
details. For example, let n = 5, m = 3, w0 = ((2, 4)), w1 = [1] = (1, 1̄), w2 = ((1, 4)),
and w3 = ((2, 3)) ((4, 5)). Then (w0;w1, w2, w3) is mapped to
▽3B5(w0;w1, w2, w3) = [1, 2, . . . , 15] [1] ((2, 11)) ((6, 9)) ((12, 15))
= ((1, 2̄, 12)) ((3, 4, 5, 6, 10, 11)) ((7, 8, 9)) ((13, 14, 15)). (7.5)
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 39
Figure 8 shows the graphical representation of (7.5).
It is shown in [1, Theorem 4.5.6] that ▽mBn is in fact an isomorphism between
the posets NCm(Bn) and ÑC
m(Bn). Furthermore, it is proved in [1, proof of The-
orem 4.3.13] that
ci(w0) = bi(▽
(w0;w1, . . . , wm)), i = 1, 2, . . . , n, (7.6)
where ci(w0) denotes the number of type A cycles (recall the corresponding terminology
from Section 4) of length i of w0 and bi(π) denotes one half of the number of non-zero
blocks of size mi in the non-crossing partition π. (Recall that non-zero blocks come in
“symmetric” pairs.) Consequently, under the bijection ▽mBn , the element w0 contains a
type B cycle of length ℓ if and only if ▽mBn(w0;w1, . . . , wm) contains a zero block of size
The m-divisible non-crossing partitions of type Dn cannot be realised as certain
“partitions non croisées d’un cycle,” but as non-crossing partitions on an annulus with
2m(n − 1) vertices on the outer cycle and 2m vertices on the inner cycle, the vertices
on the outer cycle being labelled by 1, 2, . . . , mn − m, 1̄, 2̄, . . . , mn−m in clockwise
order, and the vertices of the inner cycle being labelled by mn − m + 1, . . . , mn − 1,
mn,mn−m+ 1, . . . , mn− 1, mn in counter-clockwise order. Given a partition π of
{1, 2, . . . , mn, 1̄, 2̄, . . . , mn}, we represent it on this annulus in a manner analogous to
Kreweras’ graphical representation of his partitions; namely, we represent each block
of π by connecting the vertices labelled by the elements of the block by curves in
clockwise order, the important additional requirement being here that the curves must
be drawn in the interior of the annulus. If it is possible to draw the curves in such a
way that no two curves intersect, then the partition is called a non-crossing partition on
the (2m(n − 1), 2m)-annulus. Figure 10 shows a non-crossing partition on the (15, 6)-
annulus.
With this definition, them-divisible non-crossing partitions of typeDn are in bijection
with non-crossing partitions π on the (2m(n − 1), 2m)-annulus, in which successive
elements of a block (successive in the circular order in the graphical representation of
the block) are in successive congruence classes modulo m, which have the property that
if B is a block of π then also B := {x̄ : x ∈ B} is a block of π, and which satisfy an
additional restriction concerning their zero block. Here again, a zero block is a block
B with B = B. The announced additional restriction says that a zero block can only
occur if it contains all the vertices of the inner cycle, that is, mn − m + 1, . . . , mn −
1, mn,mn−m+ 1, . . . , mn− 1, mn, and at least two further elements from the outer
cycle. We denote this set of non-crossing partitions on the (2m(n − 1), 2m)-annulus
by ÑCm(Dn). A non-crossing partition in ÑC
m(Dn) can only have at most one zero
block. Figures 10 and 11 give examples of 3-divisible non-crossing partitions of type
D6, Figure 10 one without a zero block, while Figure 11 one with a zero block. Again,
it is clear that the condition that B is a block of the partition if and only if B is a block
translates into the condition that the geometric realisation of the partition is invariant
under rotation by 180◦.
In order to clearly sort out the differences to the earlier combinatorial realisations of
m-divisible non-crossing partitions of types An−1 and Bn, we stress that for type Dn
there are three major features which are not present for the former types: (1) here we
consider non-crossing partitions on an annulus; (2) it is not sufficient to impose the
40 C. KRATTENTHALER AND T. W. MÜLLER
17 18
Figure 10. Combinatorial realisation of a 3-divisible non-crossing par-
tition of type D6
condition that the size of every block is divisible by m: the condition on successive
elements of a block is stronger; (3) there is the above additional restriction on the zero
block (which is not present in type Bn).
Given an element (w0;w1, . . . , wm) ∈ NC
m(Dn), the bijection, ▽
say, from [29]
works as follows. Recalling from Section 2 that W (Dn) can be combinatorially realised
as a subgroup of the group of permutations of {1, 2, . . . , n, 1̄, 2̄, . . . , n̄}, and that, in
this realisation, the standard choice of a Coxeter element is c = [1, 2, . . . , n − 1] [n] =
(1, 2, . . . , n− 1, 1̄, 2̄, . . . , n− 1) (n, n̄), the announced bijection maps (w0;w1, . . . , wm) ∈
NCm(Dn) to
▽mDn (w0;w1, . . . , wm) = [1, 2, . . . , m(n− 1)] [mn−m+ 1, . . . , mn− 1, mn]
◦ (τ̄m,1(w1))
−1 (τ̄m,2(w2))
−1 · · · (τ̄m,m(wm))
where τ̄m,i is defined as above. We refer the reader to [29] for the details. For example,
let n = 6, m = 3, w0 = ((2, 4̄)), w1 = ((2, 6̄)) ((4, 5)), w2 = ((1, 5̄)) ((2, 3)), and
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 41
17 18
Figure 11. A 3-divisible non-crossing partition of type D6 with zero block
w3 = ((3, 6)). Then (w0;w1, w2, w3) is mapped to
▽3D6(w0;w1, w2, w3)
= [1, 2, . . . , 15] [16, 17, 18]((4, 16)) ((10, 13)) ((2, 14)) ((5, 8)) ((9, 18))
= ((1, 2, 15)) ((3, 4, 17, 18, 10, 14)) ((5, 9, 16)) ((6, 7, 8)) ((11, 12, 13)). (7.7)
Figure 10 shows the graphical representation of (7.7).
It is shown in [29] that ▽mDn is in fact an isomorphism between the posets NC
m(Dn)
and ÑCm(Dn). Furthermore, it is proved in [29] that
ci(w0) = bi(▽
(w0;w1, . . . , wm)), i = 1, 2, . . . , n, (7.8)
where ci(w0) denotes the number of type A cycles of length i of w0 and bi(π) denotes
one half of the number of non-zero blocks of size mi in the non-crossing partition
π. (Recall that non-zero blocks come in “symmetric” pairs.) Consequently, under
the bijection ▽mDn, the element w0 contains a type D cycle of length ℓ if and only if
▽mDn(w0;w1, . . . , wm) contains a zero block of size mℓ.
42 C. KRATTENTHALER AND T. W. MÜLLER
8. Decomposition numbers with free factors, and enumeration in the
poset of generalised non-crossing partitions
This section is devoted to applying our formulae from Sections 4–6 for the decompo-
sition numbers of the types An, Bn, and Dn to the enumerative theory of generalised
non-crossing partitions for these types. Theorems 11–15 present formulae for the num-
ber of minimal factorisations of Coxeter elements in types An, Bn, and Dn, respectively,
where we do not prescribe the types of all the factors as for the decomposition numbers,
but just for some of them, while we impose rank sum conditions on other factors. Im-
mediate corollaries are formulae for the number of multi-chains π1 ≤ π2 ≤ · · · ≤ πl−1,
l being given, in the posets ÑCm(An−1), ÑC
m(Bn), and ÑC
m(Dn), where the poset
rank of πi equals ri, and where the block structure of π1 is prescribed, see Corollaries 12,
14, and 16. These results in turn imply all known enumerative results on ordinary and
generalised non-crossing partitions via appropriate summations, see the remarks accom-
panying the corollaries. They also imply two further new results on chain enumeration
in ÑCm(Dn), see Corollaries 18 and 19. We want to stress that, since ÑC
m(Φ) and
NCm(Φ) are isomorphic as posets for Φ = An−1, Bn, Dn, Corollaries 12, 14, 16, 17, 18
imply obvious results for NCm(Φ) in place of ÑCm(Φ), Φ = An−1, Bn, Dn, via (7.4),
(7.6), respectively (7.8).
We begin with our results for type An. The next theorem generalises Theorem 6,
which can be obtained from the former as the special case in which l = 1 and m1 = 1.
Theorem 11. For a positive integer d, let the types T1, T2, . . . , Td be given, where
Ti = A
1 ∗ A
2 ∗ · · · ∗ A
n , i = 1, 2, . . . , d,
and let l, m1, m2, . . . , ml, s1, s2, . . . , sl be given non-negative integers with
rkT1 + rk T2 + · · ·+ rkTd + s1 + s2 + · · ·+ sl = n.
Then the number of factorisations
c = σ1σ2 · · ·σdσ
2 · · ·σ
2 · · ·σ
· · ·σ
2 · · ·σ
, (8.1)
where c is a Coxeter element in W (An), such that the type of σi is Ti, i = 1, 2, . . . , d,
and such that
ℓT (σ
1 ) + ℓT (σ
2 ) + · · ·+ ℓT (σ
) = si, i = 1, 2, . . . , l, (8.2)
is given by
(n+ 1)d−1
n− rkTi + 1
n− rkTi + 1
1 , m
2 , . . . , m
m1(n+ 1)
m2(n+ 1)
· · ·
ml(n+ 1)
, (8.3)
where the multinomial coefficient is defined as in Lemma 4.
Proof. In the factorisation (8.1), we first fix also the types of the σ
i ’s. For i =
1, 2, . . . , mj and j = 1, 2, . . . , l, let the type of σ
i = A
(i,j)
1 ∗ A
(i,j)
2 ∗ · · · ∗ A
(i,j)
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 43
We know that the number of these factorisations is given by (4.1) with d replaced by
d+m1+m2+· · ·+ml and the appropriate interpretations of them
i ’s. Next we fix non-
negative integers r
i and sum the expression (4.1) over all possible types T
i of rank
i , i = 1, 2, . . . , mj, j = 1, 2, . . . , l. The corresponding summations are completely
analogous to the summation in the proof of Theorem 6. As a result, we obtain
(n+ 1)d−1
n− rkTi + 1
n− rkTi + 1
1 , m
2 , . . . , m
· · ·
n + 1
· · ·
× · · · ×
n + 1
n + 1
· · ·
for the number of factorisations under consideration. In view of (8.2) and (2.4), to
obtain the final result, we must sum these expressions over all non-negative integers
1 , . . . , r
ml satisfying the equations
1 + r
2 + · · ·+ r
= sj, j = 1, 2, . . . , l. (8.4)
This is easily done by means of the multivariate version of the Chu–Vandermonde
summation. The formula in (8.3) follows. �
In view of the combinatorial realisation of m-divisible non-crossing partitions of type
An−1 which we described in Section 7, the special case d = 1 of the above theorem has
the following enumerative consequence.
Corollary 12. Let l be a positive integer, and let s1, s2, . . . , sl be non-negative integers
with s1 + s2+ · · ·+ sl = n− 1. The number of multi-chains π1 ≤ π2 ≤ · · · ≤ πl−1 in the
poset ÑCm(An−1), with the property that rk(πi) = s1 + s2 + · · ·+ si, i = 1, 2, . . . , l− 1,
and that the number of blocks of size mi of π1 is bi, i = 1, 2, . . . , n, is given by
b1 + b2 + · · ·+ bn
b1 + b2 + · · ·+ bn
b1, b2, . . . , bn
· · ·
, (8.5)
provided that b1 + 2b2 + · · ·+ nbn ≤ n, and is 0 otherwise.
Remark. The conditions in the statement of the corollary imply that
s1 + b1 + b2 + · · ·+ bn = n. (8.6)
Proof. Let
π1 ≤ π2 ≤ · · · ≤ πl−1 (8.7)
be a multi-chain in ÑCm(An−1). Suppose that, under the bijection ▽
, the element
πj corresponds to the tuple (w
1 , . . . , w
m ), j = 1, 2, . . . , l− 1. The inequalities in
(8.7) imply that w
1 , w
2 , . . . , w
m can be factored in the form
i = u
i · · ·u
i , i = 1, 2, . . . , m,
where u
i = w
(l−1)
i and, more generally,
i = u
(j+1)
(j+2)
i · · ·u
i , i = 1, 2, . . . , m, j = 1, 2, . . . , l − 1. (8.8)
44 C. KRATTENTHALER AND T. W. MÜLLER
For later use, we record that
c = w
1 · · ·w
(j+1)
(j+2)
1 · · ·u
(j+1)
(j+2)
2 · · ·u
· · ·
u(j+1)m u
(j+2)
m · · ·u
. (8.9)
Now, by (7.4), the block structure conditions on π1 in the statement of the corollary
translate into the condition that the type of w
Ab21 ∗ A
2 ∗ · · · ∗ A
n−1. (8.10)
On the other hand, using (7.2), we see that the rank conditions in the statement of the
corollary mean that
ℓT (w
0 ) = s1 + s2 + · · ·+ sj, j = 1, 2, . . . , l − 1.
In combination with (8.9), this yields the conditions
ℓT (u
1 ) + ℓT (u
2 ) + · · ·+ ℓT (u
m ) = sj , j = 2, 3, . . . , l. (8.11)
Thus, we want to count the number of factorisations
c = w
1 · · ·u
2 · · ·u
· · ·
u(2)m u
m · · ·u
, (8.12)
where the type of w
0 is given in (8.10), and where the “rank conditions” (8.11) are
satisfied. So, in view of (2.4), we are in the situation of Theorem 11 with n replaced by
n− 1, d = 1, l replaced by l − 1, si replaced by si+1, i = 1, 2, . . . , l − 1, T1 the type in
(8.10), m1 = m2 = · · · = ml−1 = m, except that the factors are not exactly in the order
as in (8.1). However, by (2.2) we know that the order of factors is without relevance.
Therefore we just have to apply Theorem 11 with the above specialisations. If we also
take into account (8.6), then we arrive immediately at (8.5). �
This result is new even for m = 1, that is, for the poset of Kreweras’ non-crossing par-
titions of {1, 2, . . . , n}. It implies all known results on Kreweras’ non-crossing partitions
and the m-divisible non-crossing partitions of Edelman. Namely, for l = 2 it reduces
to Armstrong’s result [1, Theorem 4.4.4 with ℓ = 1] on the number of m-divisible
non-crossing partitions in ÑCm(An−1) with a given block structure, which itself con-
tains Kreweras’ result [30, Theorem 4] on his non-crossing partitions with a given block
structure as a special case. If we sum the expression (8.5) over all s2, s3, . . . , sl with
s2 + s3 + · · · + sl = n − 1 − s1, then we obtain that the number of all multi-chains
π1 ≤ π2 ≤ · · · ≤ πl−1 in Edelman’s poset ÑC
m(An−1) of m-divisible non-crossing
partitions of {1, 2, . . . , mn} in which π1 has bi blocks of size mi equals
b1 + b2 + · · ·+ bn
b1 + b2 + · · ·+ bn
b1, b2, . . . , bn
(l − 1)mn
n− s1 − 1
b1 + b2 + · · ·+ bn
b1 + b2 + · · ·+ bn
b1, b2, . . . , bn
(l − 1)mn
b1 + b2 + · · ·+ bn − 1
, (8.13)
provided that b1 + 2b2 + · · · + nbn ≤ n, a result originally due to Armstrong [1,
Theorem 4.4.4]. On the other hand, if we sum the expression (8.5) over all possible
b1, b2, . . . , bn, that is, b2 + 2b3 + · · ·+ (n− 1)bn = s1, use of Lemma 4 with M = n− s1
and r = s1 yields that the number of all multi-chains π1 ≤ π2 ≤ · · · ≤ πl−1 in Edelman’s
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 45
poset ÑCm(An−1) ∼= NC
m(An−1) where πi is of rank s1+s2+ · · ·+si, i = 1, 2, . . . , l−1,
equals
· · ·
, (8.14)
a result originally due to Edelman [18, Theorem 4.2]. Clearly, this formula contains
at the same time a formula for the number of all m-divisible non-crossing partitions of
{1, 2, . . . , mn} with a given number of blocks upon setting l = 2 (cf. [18, Lemma 4.1]),
as well as that it implies that the total number of multi-chains π1 ≤ π2 ≤ · · · ≤ πl−1 in
the poset of these partitions is
(l − 1)mn+ n
(8.15)
upon summing (8.14) over all non-negative integers s1, s2, . . . , sl with s1+s2+ · · ·+sl =
n− 1 by means of the multivariate Chu–Vandermonde summation, thus recovering the
formula [18, Cor. 4.4] for the zeta polynomial of the poset of m-divisible non-crossing
partitions of type An−1. As special case l = 2, we recover the well-known fact that the
total number of m-divisible non-crossing partitions of {1, 2, . . . , mn} is 1
(m+1)n
We continue with our results for type Bn. We formulate the theorem below on
factorisations inW (Bn) only with restrictions on the combinatorial type of some factors.
An analogous result with group-theoretical type instead could be easily derived as well.
We omit this here because, for the combinatorial applications that we have in mind,
combinatorial type suffices. We remark that the theorem generalises Theorem 8, which
can be obtained from the former as the special case in which l = 1 and m1 = 1.
Theorem 13. (i) For a positive integer d, let the types T1, T2, . . . , Td be given, where
Ti = A
1 ∗ A
2 ∗ · · · ∗ A
n , i = 1, 2, . . . , j − 1, j + 1, . . . , d,
Tj = Bα ∗ A
1 ∗ A
2 ∗ · · · ∗ A
for some α ≥ 1, and let l, m1, m2, . . . , ml, s1, s2, . . . , sl be given non-negative integers
rkT1 + rkT2 + · · ·+ rkTd + s1 + s2 + · · ·+ sl = n. (8.16)
Then the number of factorisations
c = σ1σ2 · · ·σdσ
2 · · ·σ
2 · · ·σ
· · ·σ
2 · · ·σ
, (8.17)
where c is a Coxeter element in W (Bn), such that the combinatorial type of σi is Ti,
i = 1, 2, . . . , d, and such that
ℓT (σ
1 ) + ℓT (σ
2 ) + · · ·+ ℓT (σ
) = si, i = 1, 2, . . . , l, (8.18)
is given by
n− rkTj
1 , m
2 , . . . , m
)( d∏
i 6=j
n− rkTi
n− rkTi
1 , m
2 , . . . , m
· · ·
, (8.19)
46 C. KRATTENTHALER AND T. W. MÜLLER
where the multinomial coefficient is defined as in Lemma 4.
(ii) For a positive integer d, let the types T1, T2, . . . , Td be given, where
Ti = A
1 ∗ A
2 ∗ · · · ∗ A
n , i = 1, 2, . . . , d,
and let l, m1, m2, . . . , ml, s1, s2, . . . , sl be given non-negative integers. Then the number
of factorisations (8.17) which satisfy (8.18) plus the condition that the combinatorial
type of σi is Ti, i = 1, 2, . . . , d, is given by
nd−1(n− rkT1 − rkT2 − · · · − rkTd)
n− rkTi
n− rkTi
1 , m
2 , . . . , m
· · ·
. (8.20)
Proof. We start with the proof of item (i). In the factorisation (8.17), we first fix also
the types of the σ
i ’s. For i = 1, 2, . . . , mj and j = 1, 2, . . . , l let the type of σ
i = A
(i,j)
1 ∗ A
(i,j)
2 ∗ · · · ∗ A
(i,j)
We know that the number of these factorisations is given by (5.1) with d replaced by
d+m1+m2+· · ·+ml and the appropriate interpretations of them
i ’s. Next we fix non-
negative integers r
i and sum the expression (5.1) over all possible types T
i of rank
i , i = 1, 2, . . . , mj, j = 1, 2, . . . , l. The corresponding summations are completely
analogous to the first summation in the proof of Theorem 8. As a result, we obtain
n− rkTj
1 , m
2 , . . . , m
)( d∏
i 6=j
n− rkTi
n− rkTi
1 , m
2 , . . . , m
· · ·
· · ·
× · · · ×
· · ·
for the number of factorisations under consideration. In view of (8.18) and (2.4), to
obtain the final result, we must sum these expressions over all non-negative integers
1 , . . . , r
ml satisfying the equations
1 + r
2 + · · ·+ r
= sj, j = 1, 2, . . . , l.
This is easily done by means of the multivariate version of the Chu–Vandermonde
summation. The formula in (8.19) follows.
The proof of item (ii) is completely analogous, we must, however, cope with the
complication that the type B cycle, which, according to Theorem 7, must occur in
the disjoint cycle decomposition of exactly one of the factors on the right-hand side of
(8.17), can occur in any of the σ
i ’s. So, let us fix the types of the σ
i ’s to
i = A
(i,j)
1 ∗ A
(i,j)
2 ∗ · · · ∗ A
(i,j)
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 47
i = 1, 2, . . . , mj, j = 1, 2, . . . , l, except for (i, j) = (p, q), where we require that the type
T (q)p = Bα ∗ A
1 ∗ A
2 ∗ · · · ∗ A
Again, we know that the number of these factorisations is given by (5.1) with d replaced
by d +m1 +m2 + · · ·+ml and the appropriate interpretations of the m
i ’s. Now we
fix non-negative integers r
i and sum the expression (5.1) over all possible types T
of rank r
i , i = 1, 2, . . . , mj , j = 1, 2, . . . , l. Again, the corresponding summations are
completely analogous to the summations in the proof of Theorem 8. In particular, the
summation over all possible types T
p of rank r
p is essentially the summation on the
right-hand side of (5.14) with d replaced by d + m1 + m2 + · · · + ml and r replaced
p . If we use what we know from the proof of Theorem 8, then the result of the
summations is found to be
n− rkTi
n− rkTi
1 , m
2 , . . . , m
· · ·
× · · · ×
· · · r(q)p
· · ·
× · · · ×
· · ·
. (8.21)
The reader should note that the term r
in this expression results from the sum-
mation over all types T
p of rank r
p (compare (5.15) with r replaced by r
p ; we have(
). Using (8.16), (8.18) and (2.4), we see that the sum of all r
p over
p = 1, 2, . . . , mq and q = 1, 2, . . . , l must be n− rkT1 − rkT2 − · · · − rkTd. Hence, the
sum of the expressions (8.21) over all (p, q) equals
nd−1(n− rkT1 − rkT2 − · · · − rkTd)
n− rkTi
n− rkTi
1 , m
2 , . . . , m
· · ·
× · · · ×
· · ·
× · · · ×
· · ·
Finally, we must sum these expressions over all non-negative integers r
1 , . . . , r
ml sat-
isfying the equations
1 + r
2 + · · ·+ r
= sj, j = 1, 2, . . . , l.
Once again, this is easily done by means of the multivariate version of the Chu–
Vandermonde summation. As a result, we obtain the formula in (8.20). �
In view of the combinatorial realisation of m-divisible non-crossing partitions of type
Bn which we described in Section 7, the special case d = 1 of the above theorem has
the following enumerative consequence.
48 C. KRATTENTHALER AND T. W. MÜLLER
Corollary 14. Let l be a positive integer, and let s1, s2, . . . , sl be non-negative integers
with s1 + s2 + · · · + sl = n. The number of multi-chains π1 ≤ π2 ≤ · · · ≤ πl−1 in the
poset ÑCm(Bn) with the property that rk(πi) = s1+ s2+ · · ·+ si, i = 1, 2, . . . , l−1, and
that the number of non-zero blocks of π1 of size mi is 2bi, i = 1, 2, . . . , n, is given by(
b1 + b2 + · · ·+ bn
b1, b2, . . . , bn
· · ·
, (8.22)
provided that b1 + 2b2 + · · ·+ nbn ≤ n, and is 0 otherwise.
Remark. The conditions in the statement of the corollary imply that
s1 + b1 + b2 + · · ·+ bn = n. (8.23)
The reader should recall from Section 7, that non-zero blocks of elements π of ÑCm(Bn)
occur in pairs since, with a block B of π, also B is a block of π.
Proof. The arguments are completely analogous to those of the proof of Corollary 12.
The conclusion here is that we need Theorem 13 with d = 1, l replaced by l − 1, si
replaced by si+1, i = 1, 2, . . . , l − 1, m1 = m2 = · · · = ml−1 = m, and T1 of the type
Bn−b1−2b2−···−nbn ∗ A
1 ∗ A
2 ∗ · · · ∗ A
in the case that b1 + 2b2 + · · ·+ nbn < n (which enforces the existence of a zero block
of size 2(n− b1 − 2b2 − · · · − nbn) in π1), respectively
Ab21 ∗ A
2 ∗ · · · ∗ A
if not. So, depending on the case in which we are, we have to apply (8.19), respectively
(8.20). However, for d = 1 these two formulae become identical. More precisely, under
the above specialisations, they reduce to
n− rkT1
b2, b3, . . . , bn
· · ·
If we also take into account (8.23), then we arrive immediately at (8.22). �
This result is new even for m = 1, that is, for the poset of Reiner’s type Bn
non-crossing partitions. It implies all known results on these non-crossing partitions
and their extension to m-divisible type Bn non-crossing partitions due to Armstrong.
Namely, for l = 2 it reduces to Armstrong’s result [1, Theorem 4.5.11 with ℓ = 1] on
the number of elements of ÑCm(Bn) with a given block structure, which itself con-
tains Athanasiadis’ result [2, Theorem 2.3] on Reiner’s type Bn non-crossing partitions
with a given block structure as a special case. If we sum the expression (8.22) over all
s2, s3, . . . , sl with s2 + s3 + · · · + sl = n − s1, then we obtain that the number of all
multi-chains π1 ≤ π2 ≤ · · · ≤ πl−1 in ÑC
m(Bn) in which π1 has 2bi non-zero blocks of
size mi equals
b1 + b2 + · · ·+ bn
b1, b2, . . . , bn
(l − 1)mn
n− s1
b1 + b2 + · · ·+ bn
b1, b2, . . . , bn
(l − 1)mn
b1 + b2 + · · ·+ bn
(8.24)
provided that b1 + 2b2 + · · · + nbn ≤ n, a result originally due to Armstrong [1, The-
orem 4.5.11]. On the other hand, if we sum the expression (8.22) over all possible
b1, b2, . . . , bn, that is, over b2 + 2b3 + · · · + (n − 1)bn ≤ s1, use of Lemma 4 with
M = n − s1 and r = s1 − α (where α stands for the difference between s1 and
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 49
b2+2b3+ · · ·+(n−1)bn) yields that the number of all multi-chains π1 ≤ π2 ≤ · · · ≤ πl−1
in ÑCm(Bn) ∼= NC
m(Bn) where πi is of rank s1+s2+ · · ·+si, i = 1, 2, . . . , l−1, equals
n− α− 1
s1 − α
· · ·
· · ·
, (8.25)
another result due to Armstrong [1, Theorem 4.5.7]. Clearly, this formula contains
at the same time a formula for the number of all elements of ÑCm(Bn) ∼= NC
m(Bn)
with a given number of blocks (equivalently, a given rank) upon setting l = 2 (cf.
[1, Theorem 4.5.8]), as well as that it implies that the total number of multi-chains
π1 ≤ π2 ≤ · · · ≤ πl−1 in ÑC
m(Bn) ∼= NC
m(Bn) is
(l − 1)mn+ n
(8.26)
upon summing (8.25) over all non-negative integers s1, s2, . . . , sl with s1+s2+ · · ·+sl =
n by means of the multivariate Chu–Vandermonde summation, thus recovering the
formula [1, Theorem 3.6.9] for the zeta polynomial of the poset of generalised non-
crossing partitions in the case of type Bn. As special case l = 2, we recover the fact
that the cardinality of ÑCm(Bn) ∼= NC
m(Bn) is
(m+1)n
(cf. [1, Theorem 3.5.3]).
The final set of results in this section concerns the type Dn. We start with Theo-
rem 15, the result on factorisations in W (Dn) which is analogous to Theorems 11 and
13. Similar to Theorem 13, we formulate the theorem only with restrictions on the
combinatorial type of some factors. An analogous result with group-theoretical type
instead could be easily derived as well. We refrain from doing this here because, again,
for the combinatorial applications that we have in mind, combinatorial type suffices.
We remark that the theorem generalises Theorem 10, which can be obtained from the
former as the special case in which l = 1 and m1 = 1.
Theorem 15. (i) For a positive integer d, let the types T1, T2, . . . , Td be given, where
Ti = A
1 ∗ A
2 ∗ · · · ∗ A
n , i = 1, 2, . . . , j − 1, j + 1, . . . , d,
Tj = Dα ∗ A
2 ∗ · · · ∗ A
for some α ≥ 2, and let l, m1, m2, . . . , ml, s1, s2, . . . , sl be given non-negative integers
rkT1 + rk T2 + · · ·+ rkTd + s1 + s2 + · · ·+ sl = n.
Then the number of factorisations
c = σ1σ2 · · ·σdσ
2 · · ·σ
2 · · ·σ
· · ·σ
2 · · ·σ
, (8.27)
where c is a Coxeter element in W (Dn), such that the combinatorial type of σi is Ti,
i = 1, 2, . . . , d, and such that
ℓT (σ
1 ) + ℓT (σ
2 ) + · · ·+ ℓT (σ
) = si, i = 1, 2, . . . , l, (8.28)
50 C. KRATTENTHALER AND T. W. MÜLLER
is given by
(n− 1)d−1
n− rkTj
1 , m
2 , . . . , m
)( d∏
i 6=j
n− rkTi − 1
n− rk Ti − 1
1 , m
2 , . . . , m
m1(n− 1)
m2(n− 1)
· · ·
ml(n− 1)
, (8.29)
the multinomial coefficient being defined as in Lemma 4.
(ii) For a positive integer d, let the types T1, T2, . . . , Td be given, where
Ti = A
1 ∗ A
2 ∗ · · · ∗ A
n , i = 1, 2, . . . , d,
and let l, m1, m2, . . . , ml, s1, s2, . . . , sl be given non-negative integers. Then the number
of factorisations (8.27) which satisfy (8.28) as well as the condition that the combina-
torial type of σi is Ti, i = 1, 2, . . . , d, is given by
2(n− 1)d−1
n− rkTj
1 , m
2 , . . . , m
i 6=j
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
m1(n− 1)
m2(n− 1)
· · ·
ml(n− 1)
+ (n− 1)d
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
m1(n− 1)
· · ·
mj(n− 1)− 1
sj − 2
· · ·
ml(n− 1)
− 2(d− 1)(n− 1)d
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
m1(n− 1)
m2(n− 1)
· · ·
ml(n− 1)
. (8.30)
Proof. The proof of item (i) is completely analogous to the proof of item (i) in The-
orem 13. Making reference to that proof, the only difference is that, instead of the
expression (5.1), we must use (6.1) with d replaced by d + m1 + m2 + · · · + ml and
the appropriate interpretations of the m
i ’s. The summations over types T
i with
fixed rank r
i are carried out by using (3.4) with M = n − r − 1. Subsequently, the
summations over the r
i ’s satisfying (8.4) are done by the multivariate version of the
Chu–Vandermonde summation. We leave it to the reader to fill in the details to finally
arrive at (8.29).
Similarly, the proof of item (ii) is analogous to the proof of item (ii) in Theorem 13.
However, we must cope with the complication that there may or may not be a type D
cycle in the disjoint cycle decomposition of one of the σ
i ’s on the right-hand side of
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 51
(8.27). In the case that there is no type B cycle, we fix the types of the σ
i ’s to
i = A
(i,j)
1 ∗ A
(i,j)
2 ∗ · · · ∗ A
(i,j)
i = 1, 2, . . . , mj , j = 1, 2, . . . , l, and sum the expression (6.2) with d replaced by d +
m1 +m2 + · · ·+ml and the appropriate interpretations of the m
i ’s over all possible
types T
i with rank r
i , i = 1, 2, . . . , mj , j = 1, 2, . . . , l. This yields the expression
2(n− 1)d−1
n− rkTj
1 , m
2 , . . . , m
i 6=j
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
· · ·
× · · · ×
· · ·
+ 2(n− 1)d
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
· · ·
× · · · ×
· · ·
− 2(d− 1)(n− 1)d−1
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
· · ·
× · · · ×
· · ·
. (8.31)
In the case that there appears, however, a type B cycle in σ
p , say, we adopt the same
set-up as above, except that we restrict σ
p to types of the form
T (q)p = Dα ∗ A
2 ∗ · · · ∗ A
Subsequently, we sum the expression (6.1) with d replaced by d+m1 +m2 + · · ·+ml
and the appropriate interpretations of the m
i ’s over all possible types T
i of rank r
This time, we obtain
(n− 1)d
n− rkTi − 1
n− rkTi − 1
1 , m
2 , . . . , m
· · ·
× · · · ×
· · ·
n− α− 1
p − α
· · ·
× · · · ×
· · ·
. (8.32)
The sum over α can be evaluated by means of the elementary summation formula
n− α− 1
r − α
n− α− 1
n− r − 1
r − 2
52 C. KRATTENTHALER AND T. W. MÜLLER
Finally, we must sum the expressions (8.31) and (8.32) over all non-negative integers
1 , . . . , r
ml satisfying the equations
1 + r
2 + · · ·+ r
= sj, j = 1, 2, . . . , l.
Once again, this is easily done by means of the multivariate version of the Chu–
Vandermonde summation. After some simplification, we obtain the formula in (8.30).
In view of the combinatorial realisation of m-divisible non-crossing partitions of type
Dn which we described in Section 7, the special case d = 1 of the above theorem has
the following enumerative consequence.
Corollary 16. Let l be a positive integer, and let s1, s2, . . . , sl be non-negative integers
with s1 + s2 + · · · + sl = n. The number of multi-chains π1 ≤ π2 ≤ · · · ≤ πl−1 in the
poset ÑCm(Dn) with the property that rk(πi) = s1+ s2+ · · ·+ si, i = 1, 2, . . . , l−1, and
that the number of non-zero blocks of π1 of size mi is 2bi, i = 1, 2, . . . , n, is given by(
b1 + b2 + · · ·+ bn
b1, b2, . . . , bn
m(n− 1)
· · ·
m(n− 1)
, (8.33)
if b1 + 2b2 + · · ·+ nbn < n− 1, and
b1 + b2 + · · ·+ bn
b1, b2, . . . , bn
m(n− 1)
· · ·
m(n− 1)
m(n− 1)
b1 + b2 + · · ·+ bn − 1
b1 + b2 + · · ·+ bn − 1
b1 − 1, b2, . . . , bn
m(n− 1)
· · ·
m(n− 1)− 1
sj − 2
· · ·
m(n− 1)
, (8.34)
if b1 + 2b2 + · · ·+ nbn = n.
Remark. The conditions in the statement of the corollary imply that
s1 + b1 + b2 + · · ·+ bn = n. (8.35)
The reader should recall from Section 7, that non-zero blocks of elements π of ÑCm(Dn)
occur in pairs since, with a block B of π, also B is a block of π. The condition
b1 + 2b2 + · · ·+ nbn < n− 1, which is required for Formula (8.33) to hold, implies that
π1 must contain a zero block of size 2(n − b1 − 2b2 − · · · − nbn), while the equality
b1 + 2b2 + · · · + nbn = n, which is required for Formula (8.34) to hold, implies that
π1 contains no zero block. The extra condition on zero blocks that are imposed on
elements of ÑCm(Dn) implies that b1 + 2b2 + · · ·+ nbn cannot be equal to n− 1.
Proof. Again, the arguments are completely analogous to those of the proof of Corol-
lary 12. Here we need Theorem 15 with d = 1, l replaced by l − 1, si replaced by si+1,
i = 1, 2, . . . , l − 1, m1 = m2 = · · · = ml−1 = m, and T1 of the type
Dn−b1−2b2−···−nbn ∗A
1 ∗ A
2 ∗ · · · ∗ A
in the case that b1 + 2b2 + · · ·+ nbn < n− 1, respectively
Ab21 ∗ A
2 ∗ · · · ∗ A
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 53
if not. So, depending on the case in which we are, we have to apply (8.29), respectively
(8.30). If we also take into account (8.35), then we arrive at the claimed result after
little manipulation. Since we have done similar calculations already several times, the
details are left to the reader. �
This result is new even for m = 1, that is, for the poset of type Dn non-crossing
partitions of Athanasiadis and Reiner [4], and of Bessis and Corran [9]. Not only does
it imply all known results on these non-crossing partitions and their extension to m-
divisible type Dn non-crossing partitions due to Armstrong, it allows us as well to solve
several open enumeration problems on the m-divisible type Dn non-crossing partitions.
We state these new results separately in the corollaries below.
To begin with, if we set l = 2 in Corollary 16, then we obtain the following extension
to ÑCm(Dn) of Athanasiadis and Reiner’s result [4, Theorem 1.3] on the number of
type Dn non-crossing partitions with a given block structure.
Corollary 17. The number of all elements of ÑCm(Dn) which have 2bi non-zero blocks
of size mi equals (
b1 + b2 + · · ·+ bn
b1, b2, . . . , bn
m(n− 1)
b1 + b2 + · · ·+ bn
(8.36)
if b1 + 2b2 + · · ·+ nbn < n− 1, and
b1 + b2 + · · ·+ bn
b1, b2, . . . , bn
m(n− 1)
b1 + b2 + · · ·+ bn
b1 + b2 + · · ·+ bn − 1
b1 − 1, b2, . . . , bn
m(n− 1)
b1 + b2 + · · ·+ bn − 1
(8.37)
if b1 + 2b2 + · · ·+ nbn = n.
On the other hand, if we sum the expression (8.33), respectively (8.34), over all
s2, s3, . . . , sl with s2+s3+ · · ·+sl = n−s1, then we obtain the following generalisation.
Corollary 18. The number of all multi-chains π1 ≤ π2 ≤ · · · ≤ πl−1 in ÑC
m(Dn) in
which π1 has 2bi non-zero blocks of size mi equals(
b1 + b2 + · · ·+ bn
b1, b2, . . . , bn
(l − 1)m(n− 1)
b1 + b2 + · · ·+ bn
, (8.38)
if b1 + 2b2 + · · ·+ nbn < n− 1, and
b1 + b2 + · · ·+ bn
b1, b2, . . . , bn
(l − 1)m(n− 1)
b1 + b2 + · · ·+ bn
b1 + b2 + · · ·+ bn − 1
b1 − 1, b2, . . . , bn
(l − 1)m(n− 1)
b1 + b2 + · · ·+ bn − 1
(8.39)
if b1 + 2b2 + · · ·+ nbn = n.
Next we sum the expressions (8.33) and (8.34) over all possible b1, b2, . . . , bn, that is,
we sum (8.33) over b2+2b3+ · · ·+(n−1)bn < s1−1, and we sum the expression (8.34)
over b2+2b3+ · · ·+(n−1)bn = s1. With the help of Lemma 4 and the simple binomial
summation (6.24), these sums can indeed be evaluated. In this manner, we obtain the
following result on rank-selected chain enumeration in ÑCm(Dn).
54 C. KRATTENTHALER AND T. W. MÜLLER
Corollary 19. The number of all multi-chains π1 ≤ π2 ≤ · · · ≤ πl−1 in ÑC
m(Dn) ∼=
NCm(Dn) where πi is of rank s1 + s2 + · · ·+ si, i = 1, 2, . . . , l − 1, equals
m(n− 1)
· · ·
m(n− 1)
m(n− 1)
· · ·
m(n− 1)− 1
sj − 2
· · ·
m(n− 1)
s1 − 2
m(n− 1)
· · ·
m(n− 1)
. (8.40)
This formula extends Athanasiadis and Reiner’s formula [4, Theorem 1.2(ii)] from
NC(Dn) to ÑC
m(Dn). Setting l = 2, we obtain a formula for the number of all
elements in ÑCm(Dn) ∼= NC
m(Dn) with a given number of blocks (equivalently, of
given rank); cf. [1, Theorem 4.6.3]. Next, summing (8.40) over all non-negative integers
s1, s2, . . . , sl with s1+s2+ · · ·+sl = n by means of the multivariate Chu–Vandermonde
summation, we find that the total number of multi-chains π1 ≤ π2 ≤ · · · ≤ πl−1 in
ÑCm(Dn) ∼= NC
m(Dn) is given by
((l − 1)m+ 1)(n− 1)
((l − 1)m+ 1)(n− 1)
2(l − 1)m(n− 1) + n
((l − 1)m+ 1)(n− 1)
, (8.41)
thus recovering the formula [1, Theorem 3.6.9] for the zeta polynomial of the poset
of generalised non-crossing partitions for the type Dn. The special case l = 2 of
(8.41) gives the well-known fact that the cardinality of ÑCm(Dn) ∼= NC
m(Dn) is
2m(n−1)+n
(m+1)(n−1)
(cf. [1, Theorem 3.5.3]).
In the following section, Corollary 19 will enable us to provide a new proof of Arm-
strong’s F = M (Ex-)Conjecture in type Dn.
9. Proof of the F = M Conjecture for type D
Armstrong’s F = M (Ex-)Conjecture [1, Conjecture 5.3.2], which extends an earlier
conjecture of Chapoton [17], relates the “F -triangle” of the generalised cluster complex
of Fomin and Reading [19] to the “M-triangle” of Armstrong’s generalised non-crossing
partitions. The F -triangle is a certain refined face count in the generalised cluster
complex. We do not give the definition here and, instead, refer the reader to [1, 27],
because it will not be important in what follows. It suffices to know that, again fixing
a finite root system Φ of rank n and a positive integer m, the F -triangle FmΦ (x, y) for
the generalised cluster complex ∆m(Φ) is a polynomial in x and y, and that it was
computed in [27] for all types. What we need here is that it was shown in [27, Sec. 11,
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 55
Prop. D] that
(1− xy)nFmDn
x(1 + y)
1− xy
1− xy
r,s≥0
m(n− 1)
m(n− 1) + s− r − 1
m(n− 1)
m(n− 1) + s− r − 1
m(n− 1)− 1
r − 2
m(n− 1) + s− r − 1
m(n− 1)
m(n− 1) + s− r − 2
s− r − 2
. (9.1)
The “M-triangle” of NCm(Φ) is the polynomial defined by
MmΦ (x, y) =
u,w∈NCm(Φ)
µ(u, w) xrkuyrkw,
where µ(u, w) is the Möbius function in NCm(Φ). It is called “triangle” because the
Möbius function µ(u, w) vanishes unless u ≤ w, and, thus, the only coefficients in the
polynomial which may be non-zero are the coefficients of xkyl with 0 ≤ k ≤ l ≤ n.
An equivalent object is the dual M-triangle, which is defined by
(MmΦ )
∗(x, y) =
u,w∈(NCm(Φ))∗
µ∗(u, w) xrk
∗ wyrk
where (NCm(Φ))∗ denotes the poset dual to NCm(Φ) (i.e., the poset which arises from
NCm(Φ) by reversing all order relations), where µ∗ denotes the Möbius function in
(NCm(Φ))∗, and where rk∗ denotes the rank function in (NCm(Φ))∗. It is equivalent
since, obviously, we have
(MmΦ )
∗(x, y) = (xy)nMmΦ (1/x, 1/y). (9.2)
Given this notation, Armstrong’s F = M (Ex-)Conjecture [1, Conjecture 5.3.2] reads
as follows.
Conjecture FM. For any finite root system Φ of rank n, we have
FmΦ (x, y) = y
1 + y
y − x
y − x
Equivalently,
(1− xy)nFmΦ
x(1 + y)
1− xy
1− xy
u,w∈(NCm(Φ))∗
µ∗(u, w) (−x)rk
∗ w(−y)rk
∗ u. (9.3)
So, Equation (9.1) provides an expression for the left-hand side of (9.3) for Φ = Dn.
With our result on rank-selected chain enumeration in NCm(Dn) given in Corollary 19,
we are now able to calculate the right-hand side of (9.3) directly. As we mentioned
already in the Introduction, together with the results from [27, 28], this completes a
56 C. KRATTENTHALER AND T. W. MÜLLER
computational case-by-case proof of Conjecture FM. A case-free proof had been found
earlier by Tzanaki in [38].
The only ingredient that we need for the proof is the well-known link between chain
enumeration and the Möbius function. (The reader should consult [33, Sec. 3.11] for
more information on this topic.) Given a poset P and two elements u and w, u ≤ w,
in the poset, the zeta polynomial of the interval [u, w], denoted by Z(u, w; z), is the
number of (multi)chains from u to w of length z. (It can be shown that this is indeed
a polynomial in z.) Then the Möbius function of u and w is equal to µ(u, w) =
Z(u, w;−1).
Proof of Conjecture FM in type Dn. We now compute the right-hand side of (9.3), that
u,w∈(NCm(Dn))∗
µ∗(u, w)(−x)rk
∗ w(−y)rk
In order to compute the coefficient of xsyr in this expression,
(−1)r+s
u,w∈(NCm(Dn))∗
with rk∗ u=r and rk∗ w=s
µ∗(u, w),
we compute the sum of all corresponding zeta polynomials (in the variable z), multiplied
by (−1)r+s,
(−1)r+s
u,w∈(NCm(Dn))∗
with rk∗ u=r and rk∗ w=s
Z(u, w; z),
and then put z = −1.
For computing this sum of zeta polynomials, we must set l = z+2, n−s1 = s, sl = r,
s2 + s3 + · · · + sl−1 = s − r in (8.40), and then sum the resulting expression over all
possible s2, s3, . . . , sl−1. (The reader should keep in mind that the roles of s1, s2, . . . , sl
in Corollary 19 have to be reversed, since we are aiming at computing zeta polynomials
in the poset dual to NCm(Dn).) By using the Chu–Vandermonde summation, one
obtains
m(n− 1)
zm(n− 1)
m(n− 1)− 1
r − 2
zm(n − 1)
m(n− 1)
zm(n − 1)− 1
s− r − 2
m(n− 1)
zm(n − 1)
If we put z = −1 in this expression and multiply it by (−1)r+s, then we obtain exactly
the coefficient of xsyr in (9.1). �
10. A conjecture of Armstrong on maximal intervals containing a
random multichain
Given a finite root system of rank n, Conjecture 3.5.13 in [1] says the following:
If we choose an l-multichain uniformly at random from the set
π1 ≤ π2 ≤ · · · ≤ πl : πi ∈ NC
m(Φ), i = 1, . . . , l, and rk(π1) = i
, (10.1)
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 57
then the expected number of maximal intervals in NCm(Φ) containing this multichain
Narm(Φ, n− i)
Nar1(Φ, n− i)
, (10.2)
where Narm(Φ, i) is the i-th Fuß–Narayana number associated to NCm(Φ), that is,
the number of elements of NCm(Φ) of rank i. In particular, this expected value is
independent of l.
We show in this section that, for types An and Bn, the conjecture follows easily from
Edelman’s (8.14) respectively Armstrong’s (8.25) (presumably, this fact constituted the
evidence for setting up the conjecture), while an analogous computation using our new
result (8.40) demonstrates that it fails for type Dn. At the end of this section, we
comment on what we think happens for the exceptional types.
The computation of the expected value in the above conjecture can be approached in
the following way. One first observes that a maximal interval in NCm(Φ) is an interval
between an element π0 of rank 0 and the global maximum (c; ε, . . . , ε). Therefore, to
compute the proposed expected value, we may count the number of chains
π0 ≤ π1 ≤ π2 ≤ · · · ≤ πl, rk(π0) = 0 and rk(π1) = i, (10.3)
and divide this number by the total number of all chains in (10.1). Clearly, in types
An, Bn, and Dn, this kind of chain enumeration can be easily accessed by (8.14), (8.25),
and (8.40), respectively.
We begin with type An. By (8.14), the number of chains (10.3) equals
s2+···+sl+1=n−i
n + 1
m(n+ 1)
m(n+ 1)
· · ·
m(n + 1)
m(n + 1)
ml(n + 1)
while the number of chains in (10.1) equals
s2+···+sl+1=n−i
n + 1
m(n+ 1)
· · ·
m(n + 1)
n + 1
ml(n + 1)
In both cases, we used the multivariate Chu–Vandermonde summation to evaluate the
sums over s2, . . . , sl+1. The quotient of the two numbers is
n + 1
m(n+ 1)
n + 1
m(n + 1)
n + 1
which by (8.14) with n replaced by n + 1, l = 2, s1 = n − i, and s2 = i agrees indeed
with (10.2) for Φ = An.
58 C. KRATTENTHALER AND T. W. MÜLLER
For type Bn, there is an analogous computation using (8.25), the details of which we
leave to the reader. The result is that the desired expected value equals
which by (8.25) with l = 2, s1 = n− i, and s2 = i agrees indeed with (10.2) for Φ = Bn.
The analogous computation for type Dn uses (8.40). The number of chains (10.3)
equals
s2+···+sl+1=n−i
m(n− 1)
m(n− 1)
· · ·
m(n− 1)
s2+···+sl+1=n−i
m(n− 1)− 1
m(n− 1)
· · ·
m(n− 1)
s2+···+sl+1=n−i
m(n− 1)
m(n− 1)
· · ·
m(n− 1)− 1
sj − 2
· · ·
m(n− 1)
m(n− 1)
ml(n− 1)
m(n− 1)− 1
ml(n− 1)
+m(l − 1)
m(n− 1)
ml(n− 1)− 1
n− i− 2
, (10.4)
while the number of chains in (10.1) equals
s2+···+sl+1=n−i
m(n− 1)
· · ·
m(n− 1)
s2+···+sl+1=n−i
m(n− 1)
· · ·
m(n− 1)− 1
sj − 2
· · ·
m(n− 1)
s2+···+sl+1=n−i
m(n− 1)
· · ·
m(n− 1)
ml(n− 1)
ml(n− 1)− 1
n− i− 2
ml(n− 1)
(10.5)
The quotient of (10.4) and (10.5) gives the desired expected value. It is, however, not
independent of l, and therefore Armstrong’s conjecture does not hold for Φ = Dn.
In the case that Φ is of exceptional type, then, as we outline in the next section, the
knowledge of the corresponding decomposition numbers (see the Appendix) allows one
to access the rank selected chain enumeration. Using this, the approach for computing
the expected value proposed by Armstrong that we used above for the classical types
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 59
can be carried through as well for the exceptional types. We have not done this, but we
expect that, similarly to the case of Dn, for most exceptional types the expected value
will depend on l, so that Armstrong’s conjecture will probably also fail in these cases.
11. Chain enumeration in the poset of generalised non-crossing
partitions for the exceptional types
Although it is not the main topic of our paper, we want to briefly demonstrate
in this section that the knowledge of the decomposition numbers also enables one to
do refined enumeration in the generalised non-crossing partition posets NCm(Φ) for
exceptional root systems Φ (of rank n). We restrict the following considerations to the
rank-selected chain enumeration. This means that we want to count the number of all
multi-chains π1 ≤ π2 ≤ · · · ≤ πl−1 in NC
m(Φ), where πi is of rank s1 + s2 + · · · + si,
i = 1, 2, . . . , l − 1. Let us denote this number by RΦ(s1, s2, . . . , sl), with sl = n− s1 −
s2 − · · · − sl. Now, the considerations at the beginning of the proof of Corollary 12,
leading to the factorisation (8.12) with rank constraints on the factors, are also valid
for NCm(Φ) instead of NCm(An−1), that is, they are independent of the underlying
root system. Hence, to determine the number RΦ(s1, s2, . . . , sl), we have to count all
possible factorisations
c = w
1 · · ·u
2 · · ·u
· · ·
u(2)m u
m · · ·u
under the rank constraints (8.11) and ℓT (w
0 ) = s1, where c is a Coxeter element in
W (Φ). As we remarked in the proof of Corollary 12, equivalently we may count all
factorisations
c = w
2 · · ·u
2 · · ·u
· · ·
2 · · ·u
(11.1)
which satisfy (8.11) and ℓT (w
0 ) = s1. We can now obtain an explicit expression by
fixing first the types of w
0 and all the u
i ’s. Under these constraints, the number of
factorisations (11.1) is just the corresponding decomposition number. Subsequently, we
sum the resulting expressions over all possible types.
Before we are able to state the formula which we obtain in this way, we need to recall
some standard integer partition notation (cf. e.g. [34, Sec. 7.2]). An integer partition λ
(with n parts) is an n-tuple λ = (λ1, λ2, . . . , λn) of integers satisfying λ1 ≥ λ2 ≥ · · · ≥
λn ≥ 0. It is called an integer partition of N , written in symbolic notation as λ ⊢ N ,
if λ1 + λ2 + · · ·+ λn = N . The number of parts (components) of λ of size i is denoted
by mi(λ).
Then, making again use of the notation for the multinomial coefficient introduced in
Lemma 4, the expression for RΦ(s1, s2, . . . , sl) which we obtain in the way described
above is
′ NΦ(T
0 , T
1 , T
2 , . . . , T
m1(λ(j)), m2(λ(j)), . . . , mn(λ(j))
, (11.2)
where
∑ ′ is taken over all integer partitions λ(2), λ(3), . . . , λ(l) satisfying λ(2) ⊢ s2,
λ(3) ⊢ s3, . . . , λ
(l) ⊢ sl, over all types T
0 with rk(T
0 ) = s1, and over all types T
with rk(T
i ) = λ
i , i = 1, 2, . . . , n, j = 2, 3, . . . , l.
By way of example, using this formula and the values of the decomposition num-
bers NE8(. . . ) given in Appendix A.7 (and a computer), we obtain that the number
60 C. KRATTENTHALER AND T. W. MÜLLER
RE8(4, 2, 1, 1) of all chains π1 ≤ π2 ≤ π3 in NC
m(E8), where π1 is of rank 4, π2 is of
rank 6, and π3 is of rank 7, is given by
75m3 (8055m− 1141)
(which, by the independence (2.2) of decomposition numbers from the order of the types,
is also equal to RE8(4, 1, 2, 1) and RE8(4, 1, 1, 2)), while the number RE8(2, 4, 1, 1) of all
chains π1 ≤ π2 ≤ π3 in NC
m(E8), where π1 is of rank 2, π2 is of rank 6, and π3 is of
rank 7, is given by
75m3 (73125m3 − 58950m2 + 15635m− 2154)
(which is also equal to RE8(2, 1, 4, 1) and RE8(2, 1, 1, 4)).
Acknowledgements
The authors thank the anonymous referee for a very careful reading of the original
manuscript.
Appendix A. The decomposition numbers for the exceptional types
A.1. The decomposition numbers for type I2(a) [27, Sec. 13]. We have
NI2(a)(I2(a)) = 1, NI2(a)(A1, A1) = a, NI2(a)(A1) = a, NI2(a)(∅) = 1, all other num-
bers NI2(a)(T1, T2, . . . , Td) being zero.
A.2. The decomposition numbers for type H3 [27, Sec. 14]. We have NH3(H3) = 1,
NH3(A
1, A1) = 5, NH3(A2, A1) = 5, NH3(I2(5), A1) = 5, NH3(A1, A1, A1) = 50, plus the
assignments implied by (2.2) and (2.3), all other numbers NH3(T1, T2, . . . , Td) being
zero.
A.3. The decomposition numbers for type H4 [27, Sec. 15]. We have NH4(H4) = 1,
NH4(A1 ∗ A2, A1) = 15, NH4(A3, A1) = 15, NH4(H3, A1) = 15, NH4(A1 ∗ I2(5), A1) =
15, NH4(A
1) = 30, NH4(A
1, A2) = 30, NH4(A
1, I2(5)) = 15, NH4(A2, A2) = 5,
NH4(A2, I2(5)) = 15, NH4(I2(5), I2(5)) = 3, NH4(A
1, A1, A1) = 225, NH4(A2, A1, A1) =
150, NH4(I2(5), A1, A1) = 90, NH4(A1, A1, A1, A1) = 1350, plus the assignments implied
by (2.2) and (2.3), all other numbers NH4(T1, T2, . . . , Td) being zero.
A.4. The decomposition numbers for type F4 [27, Sec. 16]. We have NF4(F4) =
1, NF4(A1 ∗ A2, A1) = 12, NF4(B3, A1) = 12, NF4(A
1) = 12, NF4(A
1, B2) = 12,
NF4(A2, A2) = 16, NF4(B2, B2) = 3, NF4(A
1, A1, A1) = 72, NF4(A2, A1, A1) = 48,
NF4(B2, A1, A1) = 36, NF4(A1, A1, A1, A1) = 432, plus the assignments implied by (2.2)
and (2.3), all other numbers NF4(T1, T2, . . . , Td) being zero.
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 61
A.5. The decomposition numbers for type E6 [27, Sec. 17]. We have NE6(E6) = 1,
NE6(A1 ∗ A
2, A1) = 6, NE6(A1 ∗ A4, A1) = 12, NE6(A5, A1) = 6, NE6(D5, A1) = 12,
NE6(A
1 ∗ A2, A2) = 36, NE6(A
2, A2) = 8, NE6(A1 ∗ A3, A2) = 24, NE6(A4, A2) = 24,
NE6(D4, A2) = 4, NE6(A
1 ∗ A2, A
1) = 18, NE6(A1 ∗ A3, A
1) = 36, NE6(A4, A
1) = 36,
NE6(D4, A
1) = 18, NE6(A
1) = 12, NE6(A1 ∗ A2, A
1) = 24, NE6(A1 ∗ A2, A1 ∗
A2) = 48, NE6(A3, A
1) = 36, NE6(A3, A1 ∗ A2) = 72, NE6(A3, A3) = 27, NE6(A
A2, A1, A1) = 144, NE6(A
2, A1, A1) = 24, NE6(A1∗A3, A1, A1) = 144, NE6(A4, A1, A1) =
144, NE6(D4, A1, A1) = 48, NE6(A
1, A1) = 180, NE6(A
1, A2, A1) = 168, NE6(A1 ∗
A2, A
1, A1) = 360, NE6(A1 ∗ A2, A2, A1) = 336, NE6(A3, A
1, A1) = 378, NE6(A3, A2,
A1) = 180, NE6(A
1) = 432, NE6(A2, A
1) = 504, NE6(A2, A2, A
1) = 288,
NE6(A2, A2, A2) = 160, NE6(A
1, A1, A1) = 2376, NE6(A2, A
1, A1, A1) = 1872,
NE6(A2, A2, A1, A1) = 1056, NE6(A
1, A1, A1, A1) = 864, NE6(A1 ∗ A2, A1, A1, A1) =
1728, NE6(A3, A1, A1, A1) = 1296, NE6(A
1, A1, A1, A1, A1) = 10368, NE6(A2, A1, A1, A1,
A1) = 6912, NE6(A1, A1, A1, A1, A1, A1) = 41472, plus the assignments implied by (2.2)
and (2.3), all other numbers NE6(T1, T2, . . . , Td) being zero.
A.6. The decomposition numbers for type E7 [28, Sec. 6]. We have NE7(E7) =
1, NE7(E6, A1) = 9, NE7(D6, A1) = 9, NE7(A6, A1) = 9, NE7(A1 ∗ D5, A1) = 9,
NE7(A1∗A5, A1) = 9, NE7(A2∗D4, A1) = 0, NE7(A2∗A4, A1) = 9, NE7(A
1∗D4, A1) = 0,
NE7(A
1∗A4, A1) = 0, NE7(A
3, A1) = 0, NE7(A1∗A2∗A3, A1) = 9, NE7(A
1∗A3, A1) = 0,
NE7(A
2, A1) = 0, NE7(A
1 ∗ A
2, A1) = 0, NE7(A
1 ∗ A2, A1) = 0, NE7(A
1, A1) = 0,
NE7(D5, A2) = 18, NE7(A5, A2) = 30, NE7(A1 ∗ A4, A2) = 54, NE7(A1 ∗ D4, A2) = 9,
NE7(A2 ∗ A3, A2) = 36, NE7(A
1 ∗ A3, A2) = 36, NE7(A1 ∗ A
2, A2) = 36, NE7(A
A2, A2) = 12, NE7(A
1, A2) = 0, NE7(D5, A
1) = 54, NE7(A5, A
1) = 63, NE7(A1 ∗
D4, A
1) = 27, NE7(A1 ∗ A4, A
1) = 81, NE7(A2 ∗ A3, A
1) = 27, NE7(A
1 ∗ A3, A
1) = 27,
NE7(A1 ∗ A
1) = 27, NE7(A
1 ∗ A2, A
1) = 9, NE7(A
1) = 0, NE7(D5, A1, A1) =
162, NE7(A5, A1, A1) = 216, NE7(A1 ∗ D4, A1, A1) = 81, NE7(A1 ∗ A4, A1, A1) = 324,
NE7(A2 ∗ A3, A1, A1) = 162, NE7(A
1 ∗ A3, A1, A1) = 162, NE7(A1 ∗ A
2, A1, A1) = 162,
NE7(A
1 ∗ A2, A1, A1) = 54, NE7(A
1, A1, A1) = 0, NE7(D4, A3) = 9, NE7(A4, A3) = 54,
NE7(A1 ∗ A3, A3) = 135, NE7(A
2, A3) = 54, NE7(A
1 ∗ A2, A3) = 162, NE7(A
1, A3) = 27,
NE7(D4, A1∗A2) = 45, NE7(A4, A1∗A2) = 162, NE7(A1∗A3, A1∗A2) = 243, NE7(A
2, A1∗
A2) = 54, NE7(A
1 ∗ A2, A1 ∗ A2) = 162, NE7(A
1, A1 ∗ A2) = 27, NE7(D4, A
1) = 30,
NE7(A4, A
1) = 99, NE7(A1 ∗ A3, A
1) = 126, NE7(A
1) = 18, NE7(A
1 ∗ A2, A
1) = 54,
NE7(A
1) = 9, NE7(D4, A2, A1) = 81, NE7(A4, A2, A1) = 378, NE7(A1 ∗A3, A2, A1) =
783, NE7(A
2, A2, A1) = 270, NE7(A
1 ∗ A2, A2, A1) = 810, NE7(A
1, A2, A1) = 135,
NE7(D4, A
1, A1) = 243, NE7(A4, A
1, A1) = 891, NE7(A1 ∗ A3, A
1, A1) = 1377, NE7(A
A21, A1) = 324, NE7(A
1 ∗ A2, A
1, A1) = 972, NE7(A
1, A1) = 162, NE7(D4, A1, A1,
A1) = 729, NE7(A4, A1, A1, A1) = 2916, NE7(A1 ∗ A3, A1, A1, A1) = 5103, NE7(A
2, A1,
A1, A1) = 1458, NE7(A
1 ∗ A2, A1, A1, A1) = 4374, NE7(A
1, A1, A1, A1) = 729, NE7(A3,
A3, A1) = 486, NE7(A3, A1 ∗ A2, A1) = 1458, NE7(A3, A
1, A1) = 891, NE7(A1 ∗ A2, A1 ∗
A2, A1) = 2430, NE7(A1∗A2, A
1, A1) = 1215, NE7(A
1, A1) = 540, NE7(A3, A2, A2) =
432, NE7(A1 ∗ A2, A2, A2) = 1188, NE7(A
1, A2, A2) = 711, NE7(A3, A2, A
1) = 1053,
NE7(A1∗A2, A2, A
1) = 2349, NE7(A
1, A2, A
1) = 1323, NE7(A3, A
1) = 2430, NE7(A1∗
A2, A
1) = 3402, NE7(A
1) = 1539, NE7(A3, A2, A1, A1) = 3402, NE7(A1 ∗
A2, A2, A1, A1) = 8262, NE7(A
1, A2, A1, A1) = 4779, NE7(A3, A
1, A1, A1) = 8019,
NE7(A1 ∗ A2, A
1, A1, A1) = 13851, NE7(A
1, A1, A1) = 7047, NE7(A3, A1, A1, A1,
A1) = 26244, NE7(A1 ∗ A2, A1, A1, A1, A1) = 52488, NE7(A
1, A1, A1, A1, A1) = 28431,
62 C. KRATTENTHALER AND T. W. MÜLLER
NE7(A2, A2, A2, A1) = 2916, NE7(A2, A2, A
1, A1) = 6561, NE7(A2, A
1, A1) = 13122,
NE7(A
1, A1) = 19683, NE7(A2, A2, A1, A1, A1) = 21870, NE7(A2, A
1, A1, A1,
A1) = 45927, NE7(A
1, A1, A1, A1) = 78732, NE7(A2, A1, A1, A1, A1, A1) = 157464,
NE7(A
1, A1, A1, A1, A1, A1) = 295245, NE7(A1, A1, A1, A1, A1, A1, A1) = 1062882, plus
the assignments implied by (2.2) and (2.3), all other numbers NE7(T1, T2, . . . , Td) being
zero.
A.7. The decomposition numbers for type E8 [28, Sec. 7]. We have NE8(E8) = 1,
NE8(E7, A1) = 15, NE8(D7, A1) = 15, NE8(A7, A1) = 15, NE8(A1 ∗ E6, A1) = 15,
NE8(A1∗D6, A1) = 0, NE8(A1∗A6, A1) = 15, NE8(A2∗D5, A1) = 15, NE8(A2∗A5, A1) =
0, NE8(A
1∗D5, A1) = 0, NE8(A
1∗A5, A1) = 0, NE8(A3∗D4, A1) = 0, NE8(A3∗A4, A1) =
15, NE8(A1∗A2∗D4, A1) = 0, NE8(A1∗A2∗A4, A1) = 15, NE8(A
1∗D4, A1) = 0, NE8(A
A4, A1) = 0, NE8(A1 ∗ A
3, A1) = 0, NE8(A
2 ∗ A3, A1) = 0, NE8(A
1 ∗ A2 ∗ A3, A1) = 0,
NE8(A
1∗A3, A1) = 0, NE8(A1∗A
2, A1) = 0, NE8(A
2, A1) = 0, NE8(A
1∗A2, A1) = 0,
NE8(A
1, A1) = 0, NE8(E6, A2) = 20, NE8(D6, A2) = 15, NE8(A6, A2) = 60, NE8(A1 ∗
D5, A2) = 60, NE8(A1 ∗ A5, A2) = 60, NE8(A2 ∗D4, A2) = 20, NE8(A2 ∗ A4, A2) = 90,
NE8(A
3, A2) = 45, NE8(A
1∗D4, A2) = 0, NE8(A
1∗A4, A2) = 90, NE8(A1∗A2∗A3, A2) =
90, NE8(A
1∗A3, A2) = 0, NE8(A
2, A2) = 0, NE8(A
2, A2) = 45, NE8(A
1∗A2, A2) = 0,
NE8(A
1, A2) = 0, NE8(E6, A
1) = 45, NE8(D6, A
1) = 90, NE8(A6, A
1) = 135, NE8(A1 ∗
D5, A
1) = 135, NE8(A1 ∗A5, A
1) = 135, NE8(A2 ∗D4, A
1) = 45, NE8(A2 ∗A4, A
1) = 90,
NE8(A
1) = 45, NE8(A
1 ∗ D4, A
1) = 0, NE8(A
1 ∗ A4, A
1) = 90, NE8(A1 ∗ A2 ∗
A3, A
1) = 90, NE8(A
1 ∗ A3, A
1) = 0, NE8(A
1) = 0, NE8(A
1 ∗ A
1) = 45,
NE8(A
1 ∗A2, A
1) = 0, NE8(A
1) = 0, NE8(E6, A1, A1) = 150, NE8(D6, A1, A1) = 225,
NE8(A6, A1, A1) = 450, NE8(A1 ∗ D5, A1, A1) = 450, NE8(A1 ∗ A5, A1, A1) = 450,
NE8(A2 ∗ D4, A1, A1) = 150, NE8(A2 ∗ A4, A1, A1) = 450, NE8(A
3, A1, A1) = 225,
NE8(A
1 ∗ D4, A1, A1) = 0, NE8(A
1 ∗ A4, A1, A1) = 450, NE8(A1 ∗ A2 ∗ A3, A1, A1) =
450, NE8(A
1 ∗ A3, A1, A1) = 0, NE8(A
2, A1, A1) = 0, NE8(A
1 ∗ A
2, A1, A1) = 225,
NE8(A
1 ∗ A2, A1, A1) = 0, NE8(A
1, A1, A1) = 0, NE8(D5, A3) = 45, NE8(A5, A3) = 90,
NE8(A1 ∗ A4, A3) = 315, NE8(A1 ∗ D4, A3) = 45, NE8(A2 ∗ A3, A3) = 270, NE8(A
A3, A3) = 270, NE8(A1 ∗ A
2, A3) = 225, NE8(A
1 ∗ A2, A3) = 225, NE8(A
1, A3) = 0,
NE8(D5, A1 ∗A2) = 195, NE8(A5, A1 ∗A2) = 390, NE8(A1 ∗A4, A1 ∗A2) = 690, NE8(A1 ∗
D4, A1 ∗A2) = 195, NE8(A2 ∗A3, A1 ∗A2) = 495, NE8(A
1 ∗A3, A1 ∗A2) = 495, NE8(A1 ∗
A22, A1 ∗A2) = 300, NE8(A
1 ∗A2, A1 ∗A2) = 300, NE8(A
1, A1 ∗A2) = 0, NE8(D5, A
150, NE8(A5, A
1) = 300, NE8(A1 ∗ A4, A
1) = 375, NE8(A1 ∗ D4, A
1) = 150, NE8(A2 ∗
A3, A
1) = 225, NE8(A
1 ∗ A3, A
1) = 225, NE8(A1 ∗ A
1) = 75, NE8(A
1 ∗ A2, A
1) = 75,
NE8(A
1) = 0,NE8(D5, A2, A1) = 375,NE8(A5, A2, A1) = 750, NE8(A1∗A4, A2, A1) =
1950, NE8(A1 ∗D4, A2, A1) = 375, NE8(A2 ∗A3, A2, A1) = 1575, NE8(A
1 ∗A3, A2, A1) =
1575, NE8(A1 ∗ A
2, A2, A1) = 1200, NE8(A
1 ∗ A2, A2, A1) = 1200, NE8(A
1, A2, A1) =
0, NE8(D5, A
1, A1) = 1125, NE8(A5, A
1, A1) = 2250, NE8(A1 ∗ A4, A
1, A1) = 3825,
NE8(A1∗D4, A
1, A1) = 1125, NE8(A2∗A3, A
1, A1) = 2700, NE8(A
1∗A3, A
1, A1) = 2700,
NE8(A1 ∗ A
1, A1) = 1575, NE8(A
1 ∗ A2, A
1, A1) = 1575, NE8(A
1, A1) = 0,
NE8(D5, A1, A1, A1) = 3375, NE8(A5, A1, A1, A1) = 6750, NE8(A1 ∗ A4, A1, A1, A1) =
13500, NE8(A1 ∗ D4, A1, A1, A1) = 3375, NE8(A2 ∗ A3, A1, A1, A1) = 10125, NE8(A
A3, A1, A1, A1) = 10125, NE8(A1 ∗ A
2, A1, A1, A1) = 6750, NE8(A
1 ∗ A2, A1, A1, A1) =
6750, NE8(A
1, A1, A1, A1) = 0, NE8(D4, D4) = 5, NE8(D4, A4) = 15, NE8(A4, A4) = 138,
NE8(D4, A1 ∗ A3) = 105, NE8(A4, A1 ∗ A3) = 390, NE8(A1 ∗ A3, A1 ∗ A3) = 1155,
NE8(D4, A
2) = 35, NE8(A4, A
2) = 180, NE8(A1 ∗ A3, A
2) = 360, NE8(A
2) = 95,
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 63
NE8(D4, A
1 ∗ A2) = 135, NE8(A4, A
1 ∗ A2) = 630, NE8(A1 ∗ A3, A
1 ∗ A2) = 1035,
NE8(A
1 ∗A2) = 270, NE8(A
1 ∗A2, A
1 ∗A2) = 495, NE8(D4, A
1) = 30, NE8(A4, A
165, NE8(A1 ∗A3, A
1) = 255, NE8(A
1) = 60, NE8(A
1 ∗A2, A
1) = 135, NE8(A
30, NE8(D4, A3, A1) = 225, NE8(A4, A3, A1) = 1215, NE8(A1 ∗ A3, A3, A1) = 4050,
NE8(A
2, A3, A1) = 1575, NE8(A
1∗A2, A3, A1) = 5400, NE8(A
1, A3, A1) = 1350, NE8(D4,
A1 ∗ A2, A1) = 975, NE8(A4, A1 ∗ A2, A1) = 4590, NE8(A1 ∗ A3, A1 ∗ A2, A1) = 10800,
NE8(A
2, A1 ∗A2, A1) = 3450, NE8(A
1 ∗A2, A1 ∗A2, A1) = 9900, NE8(A
1, A1 ∗A2, A1) =
2475, NE8(D4, A
1, A1) = 750, NE8(A4, A
1, A1) = 3375, NE8(A1 ∗ A3, A
1, A1) = 6750,
NE8(A
1, A1) = 1875, NE8(A
1∗A2, A
1, A1) = 4500, NE8(A
1, A1) = 1125, NE8(D4,
A2, A2) = 175, NE8(A4, A2, A2) = 1140, NE8(A1∗A3, A2, A2) = 3300, NE8(A
2, A2, A2) =
1300, NE8(A
1 ∗ A2, A2, A2) = 4500, NE8(A
1, A2, A2) = 1125, NE8(D4, A2, A
1) = 675,
NE8(A4, A2, A
1) = 3015, NE8(A1∗A3, A2, A
1) = 8550, NE8(A
2, A2, A
1) = 2925, NE8(A
A2, A2, A
1) = 9000, NE8(A
1, A2, A
1) = 2250, NE8(D4, A
1) = 1800, NE8(A4, A
A21) = 8640, NE8(A1 ∗ A3, A
1) = 17550, NE8(A
1) = 5175, NE8(A
1 ∗ A2, A
A21) = 13500, NE8(A
1) = 3375, NE8(D4, A2, A1, A1) = 1875, NE8(A4, A2, A1,
A1) = 9450, NE8(A1 ∗ A3, A2, A1, A1) = 27000, NE8(A
2, A2, A1, A1) = 9750, NE8(A
A2, A2, A1, A1) = 31500, NE8(A
1, A2, A1, A1) = 7875, NE8(D4, A
1, A1, A1) = 5625,
NE8(A4, A
1, A1, A1) = 26325, NE8(A1 ∗ A3, A
1, A1, A1) = 60750, NE8(A
1, A1, A1) =
19125, NE8(A
1 ∗A2, A
1, A1, A1) = 54000, NE8(A
1, A1, A1) = 13500, NE8(D4, A1, A1,
A1, A1) = 16875, NE8(A4, A1, A1, A1, A1) = 81000, NE8(A1 ∗ A3, A1, A1, A1, A1) =
202500, NE8(A
2, A1, A1, A1, A1) = 67500, NE8(A
1 ∗ A2, A1, A1, A1, A1) = 202500,
NE8(A
1, A1, A1, A1, A1) = 50625, NE8(A3, A3, A2) = 1350, NE8(A3, A1 ∗A2, A2) = 5175,
NE8(A3, A
1, A2) = 3825, NE8(A1 ∗ A2, A1 ∗ A2, A2) = 15000, NE8(A1 ∗ A2, A
1, A2) =
9825, NE8(A
1, A2) = 6000, NE8(A3, A3, A
1) = 4050, NE8(A3, A1 ∗ A2, A
1) = 13500,
NE8(A3, A
1) = 9450, NE8(A1 ∗ A2, A1 ∗ A2, A
1) = 30825, NE8(A1 ∗ A2, A
17325,NE8(A
1) = 7875,NE8(A3, A3, A1, A1) = 12150,NE8(A3, A1∗A2, A1, A1) =
42525, NE8(A3, A
1, A1, A1) = 30375, NE8(A1 ∗ A2, A1 ∗ A2, A1, A1) = 106650, NE8(A1 ∗
A2, A
1, A1, A1) = 64125, NE8(A
1, A1, A1) = 33750, NE8(A3, A2, A2, A1) = 10575,
NE8(A3, A2, A
1, A1) = 29700, NE8(A3, A
1, A1) = 76950, NE8(A1 ∗ A2, A2, A2, A1) =
35700, NE8(A1∗A2, A2, A
1, A1) = 84825, NE8(A1∗A2, A
1, A1) = 171450, NE8(A
1, A2,
A2, A1) = 25125, NE8(A
1, A2, A
1, A1) = 55125, NE8(A
1, A1) = 94500, NE8(A3,
A2, A1, A1, A1) = 91125, NE8(A3, A
1, A1, A1, A1) = 243000, NE8(A1 ∗ A2, A2, A1, A1,
A1) = 276750,NE8(A1∗A2, A
1, A1, A1, A1) = 597375, NE8(A
1, A2, A1, A1, A1) = 185625,
NE8(A
1, A1, A1, A1) = 354375, NE8(A3, A1, A1, A1, A1, A1) = 759375, NE8(A1 ∗ A2,
A1, A1, A1, A1, A1) = 2025000, NE8(A
1, A1, A1, A1, A1, A1) = 1265625, NE8(A2, A2, A2,
A2) = 9350, NE8(A2, A2, A2, A
1) = 24975, NE8(A2, A2, A
1) = 64350, NE8(A2, A
A21, A
1) = 143100, NE8(A
1) = 261225, NE8(A2, A2, A2, A1, A1) = 78000,
NE8(A2, A2, A
1, A1, A1) = 203625, NE8(A2, A
1, A1, A1) = 479250, NE8(A
A1, A1) = 951750, NE8(A2, A2, A1, A1, A1, A1) = 641250, NE8(A2, A
1, A1, A1, A1, A1) =
1569375, NE8(A
1, A1, A1, A1, A1) = 3341250, NE8(A2, A1, A1, A1, A1, A1, A1) =
5062500, NE8(A
1, A1, A1, A1, A1, A1, A1) = 11390625, NE8(A1, A1, A1, A1, A1, A1, A1,
A1) = 37968750, plus the assignments implied by (2.2) and (2.3), all other numbers
NE8(T1, T2, . . . , Td) being zero.
References
[1] D. Armstrong, Generalized noncrossing partitions and combinatorics of Coxeter groups, Ph.D.
thesis, Cornell University, 2006; to appear in Mem. Amer. Math. Soc.; arχiv:math.CO/0611106.
http://arxiv.org/abs/math/0611106
64 C. KRATTENTHALER AND T. W. MÜLLER
[2] C. A. Athanasiadis, On noncrossing and nonnesting partitions for classical reflection groups, Elec-
tron. J. Combin. 5 (1998), Article #R42, 16 pp.
[3] C. A. Athanasiadis, On some enumerative aspects of generalized associahedra, European J. Com-
bin. 28 (2007), 1208–1215.
[4] C. A. Athanasiadis and V. Reiner, Noncrossing partitions for the group Dn, SIAM J. Discrete
Math. 18 (2004), 397–417.
[5] C. A. Athanasiadis, T. Brady and C. Watt, Shellability of noncrossing partition lattices, Proc.
Amer. Math. Soc. 135 (2007), 939–949.
[6] C. A. Athanasiadis and E. Tzanaki, On the enumeration of positive cells in generalized cluster
complexes and Catalan hyperplane arrangements, J. Algebraic Combin. 23 (2006), 355–375.
[7] C. A. Athanasiadis and E. Tzanaki, Shellability and higher Cohen-Macaulay connectivity of gen-
eralized cluster complexes, Israel J. Math. 167 (2008), 177–191.
[8] D. Bessis, The dual braid monoid, Ann. Sci. École Norm. Sup. (4) 36 (2003), 647–683.
[9] D. Bessis and R. Corran, Non-crossing partitions of type (e, e, r), Adv. Math. 202 (2006), 1–49.
[10] P. Biane, Some properties of crossings and partitions, Discrete Math. 175 (1997), 41–53.
[11] A. Björner and F. Brenti, Combinatorics of Coxeter groups, Springer–Verlag, New York, 2005.
[12] M. Bóna, M. Bousquet, G. Labelle and P. Leroux, Enumeration of m-ary cacti, Adv. Appl. Math.
24 (2000), 22–56.
[13] M. Bousquet, C. Chauve and G. Schaeffer, Énumération et génération aléatoire de cactus m-aires,
Proceedings of the Colloque LaCIM 2000 (Montréal), P. Leroux (ed.), Publications du LaCIM,
vol. 27, 2000, pp. 89–100.
[14] T. Brady, A partial order on the symmetric group and new K(π, 1)’s for the braid groups, Adv.
Math. 161 (2001), 20–40.
[15] T. Brady and C. Watt, K(π, 1)’s for Artin groups of finite type, Geom. Dedicata 94 (2002),
225–250.
[16] T. Brady and C. Watt, Non-crossing partition lattices in finite reflection groups, Trans. Amer.
Math. Soc. 360 (2008), 1983–2005.
[17] F. Chapoton, Enumerative properties of generalized associahedra, Séminaire Lotharingien Combin.
51 (2004), Article B51b, 16 pp.
[18] P. Edelman, Chain enumeration and noncrossing partitions, Discrete Math. 31 (1981), 171–180.
[19] S. Fomin and N. Reading, Generalized cluster complexes and Coxeter combinatorics, Int. Math.
Res. Notices 44 (2005), 2709–2757.
[20] S. Fomin and N. Reading, Root systems and generalized associahedra, in: Geometric combinatorics,
E. Miller, V. Reiner and B. Sturmfels (eds.), IAS/Park City Math. Ser., vol. 13, Amer. Math.
Soc., Providence, R.I., 2007, pp. 63–131.
[21] S. Fomin and A. Zelevinsky, Y -systems and generalized associahedra, Ann. of Math. (2) 158
(2003), 977–1018.
[22] I. J. Good, Generalizations to several variables of Lagrange’s expansion, with applications to
stochastic processes, Proc. Cambridge Philos. Soc. 56 (1960), 367–380.
[23] I. P. Goulden and D. M. Jackson, The combinatorial relationship between trees, cacti and certain
connection coefficients for the symmetric group, Europ. J. Combin. 13 (1992), 357–365.
[24] J. E. Humphreys, Reflection groups and Coxeter groups, Cambridge University Press, Cambridge,
1990.
[25] J. Irving, Combinatorial constructions for transitive factorizations in the symmetric group, Ph.D.
thesis, University of Waterloo, 2004.
[26] C. Krattenthaler, Operator methods and Lagrange inversion: A unified approach to Lagrange
formulas, Trans. Amer. Math. Soc. 305 (1988), 431–465.
[27] C. Krattenthaler, The F -triangle of the generalised cluster complex, in: Topics in Discrete Mathe-
matics, dedicated to Jarik Nešetřil on the occasion of his 60th birthday, M. Klazar, J. Kratochvil,
M. Loebl, J. Matoušek, R. Thomas and P. Valtr (eds.), Springer–Verlag, Berlin, New York, 2006,
pp. 93–126.
[28] C. Krattenthaler, The M -triangle of generalised non-crossing partitions for the types E7 and E8,
Séminaire Lotharingien Combin. 54 (2006), Article B54l, 34 pages.
[29] C. Krattenthaler, Non-crossing partitions on an annulus, in preparation.
DECOMPOSITION NUMBERS FOR FINITE COXETER GROUPS 65
[30] G. Kreweras, Sur les partitions non croisées d’un cycle, Discrete Math. 1 (1972), 333–350.
[31] N. Reading, Chains in the noncrossing partition lattice, SIAM J. Discrete Math. 22 (2008), 875–
[32] V. Reiner, Non-crossing partitions for classical reflection groups, Discrete Math. 177 (1997), 195–
[33] R. P. Stanley, Enumerative Combinatorics, vol. 1, Wadsworth & Brooks/Cole, Pacific Grove,
California, 1986; reprinted by Cambridge University Press, Cambridge, 1998.
[34] R. P. Stanley, Enumerative Combinatorics, vol. 2, Cambridge University Press, Cambridge, 1999.
[35] J. R. Stembridge, coxeter, Maple package for working with root systems and finite Coxeter groups;
available at http://www.math.lsa.umich.edu/~jrs.
[36] E. Tzanaki, Combinatorics of generalized cluster complexes and hyperplane arrangements, Ph.D.
thesis, University of Crete, Iraklio, 2007.
[37] E. Tzanaki, Polygon dissections and some generalizations of cluster complexes, J. Combin. Theory
Ser. A 113 (2006), 1189–1198.
[38] E. Tzanaki, Faces of generalized cluster complexes and noncrossing partitions, SIAM J. Discrete
Math. 22 (2008), 15–30.
Fakultät für Mathematik, Universität Wien, Nordbergstraße 15, A-1090 Vienna,
Austria. WWW: http://www.mat.univie.ac.at/~kratt.
School of Mathematical Sciences, Queen Mary & Westfield College, University of
London, Mile End Road, London E1 4NS, United Kingdom.
WWW: http://www.maths.qmw.ac.uk/~twm/.
http://www.math.lsa.umich.edu/~jrs
http://www.mat.univie.ac.at/~kratt
http://www.maths.qmw.ac.uk/~twm/
	1. Introduction
	2. Decomposition numbers for finite Coxeter groups
	3. Auxiliary results
	4. Decomposition numbers for type A
	5. Decomposition numbers for type B
	6. Decomposition numbers for type D
	7. Generalised non-crossing partitions
	8. Decomposition numbers with free factors, and enumeration in the poset of generalised non-crossing partitions
	9. Proof of the F=M Conjecture for type D
	10. A conjecture of Armstrong on maximal intervals containing a random multichain
	11. Chain enumeration in the poset of generalised non-crossing partitions for the exceptional types
	Acknowledgements
	Appendix A. The decomposition numbers for the exceptional types
	A.1. The decomposition numbers for type I2(a) [Sec. 13]KratCB
	A.2. The decomposition numbers for type H3 [Sec. 14]KratCB
	A.3. The decomposition numbers for type H4 [Sec. 15]KratCB
	A.4. The decomposition numbers for type F4 [Sec. 16]KratCB
	A.5. The decomposition numbers for type E6 [Sec. 17]KratCB
	A.6. The decomposition numbers for type E7 [Sec. 6]KratCF
	A.7. The decomposition numbers for type E8 [Sec. 7]KratCF
	References
ABSTRACT
  Given a finite irreducible Coxeter group $W$, a positive integer $d$, and
types $T_1,T_2,...,T_d$ (in the sense of the classification of finite Coxeter
groups), we compute the number of decompositions $c=\si_1\si_2 cdots\si_d$ of a
Coxeter element $c$ of $W$, such that $\si_i$ is a Coxeter element in a
subgroup of type $T_i$ in $W$, $i=1,2,...,d$, and such that the factorisation
is "minimal" in the sense that the sum of the ranks of the $T_i$'s,
$i=1,2,...,d$, equals the rank of $W$. For the exceptional types, these
decomposition numbers have been computed by the first author. The type $A_n$
decomposition numbers have been computed by Goulden and Jackson, albeit using a
somewhat different language. We explain how to extract the type $B_n$
decomposition numbers from results of B\'ona, Bousquet, Labelle and Leroux on
map enumeration. Our formula for the type $D_n$ decomposition numbers is new.
These results are then used to determine, for a fixed positive integer $l$ and
fixed integers $r_1\le r_2\le ...\le r_l$, the number of multi-chains $\pi_1\le
\pi_2\le ...\le \pi_l$ in Armstrong's generalised non-crossing partitions
poset, where the poset rank of $\pi_i$ equals $r_i$, and where the "block
structure" of $\pi_1$ is prescribed. We demonstrate that this result implies
all known enumerative results on ordinary and generalised non-crossing
partitions via appropriate summations. Surprisingly, this result on multi-chain
enumeration is new even for the original non-crossing partitions of Kreweras.
Moreover, the result allows one to solve the problem of rank-selected chain
enumeration in the type $D_n$ generalised non-crossing partitions poset, which,
in turn, leads to a proof of Armstrong's $F=M$ Conjecture in type $D_n$.

<|endoftext|><|startoftext|>
Introduction
Recently, it has been shown [1] that the t-channel components αt and βt of the electric (α) and
magnetic (β) polarizabilities of the nucleon can be understood as a property of the constituent
quarks. The constituent quarks couple to π and σ fields and, mediated by these fields, they
couple to two photons. The coupling of two photons with perpendicular linear polarization
to the π0 meson provides the main contribution, γtπ, to the backward-angle spin-polarizability
γπ. Similarly, the coupling of two photons with parallel linear polarization provides the main
contribution, (α− β)t, to the difference (α−β) of the electric and the magnetic polarizabilities.
The quantitative prediction (α − β)tp,n = 15.2 (in units of 10−4fm3) makes use of the fact that
the mass of the particle of the σ field is predicted by the quark-level Nambu–Jona-Lasinio (NJL)
model to be mσ = 666 MeV and its two-photon width to be Γγγ = 2.6 keV [1].
The foregoing paragraph describes the résumé of a long and partial controversial history
of research. The scalar-isoscalar t-channel was introduced [2] in analogy to the pseudoscalar
t-channel [3]. But differing from the π0-pole contribution [3] to the scattering amplitude, the
meaning and importance of the scalar-isoscalar t-channel [2] was less well known, mainly because
the σ meson was not considered as a normal particle. One important step forward was the
formulation of the BEFT sum rule [4], relating the s-channel part of the difference of the electric
and magnetic polarizabilities, (α − β)s, to the multipole content of the total photoabsorption
cross section using a fixed-θ dispersion relation at θ = π, and by relating the t-channel part
(α− β)t to a dispersion relation for t with the imaginary part of the amplitude taken from the
reactions γγ → ππ and NN̄ → ππ via a unitarity relation. Furthermore, the scalar-isoscalar
phase δ00(t) was taken from the reaction ππ → ππ. One of the first evaluations of the BEFT
sum-rule showed that for pointlike uncorrelated pions the large value of (α − β)t = +17.51 is
obtained [5] which in other calculations has been reduced by very different factors (see [6] for an
overview) when the ππ correlation and the pion internal structure is taken into account. The
largest reduction amounting to a factor of 2 has been obtained in the latest of this early series of
http://arxiv.org/abs/0704.0200v1
calculations [7]. This unsatisfactory situation has recently been clarified by showing [1, 8] that
the arithmetic average of the most recent calculations of Drechsel et al. [9] and Levchuk et al.
(see [6]), (α − β)tp,n = 15.3 ± 1.3, leads to a very good agreement with the experimental result
and with a parameter-free calculation based on the quark-level NJL model or dynamical linear
σ model (LσM) [1,10,11], leading to (α− β)tp,n = 15.2.
After the size and the dynamics of the t-channel contribution to the electromagnetic polar-
izabilities has been well understood, it appears of interest to get a similar understanding for the
s-channel contribution. Especially, the question has to be answered what the individual contri-
butions of the resonant excited states of the nucleon to the electric and magnetic polarizabilities
are and how the contributions of the “pion cloud” to the electric and magnetic polarizabilities
may be specified. To the author’s knowledge such an investigation has not been carried out
before.
2 Electromagnetic polarizabilities obtained from the forward-
angle sum-rule for (α + β) and the backward-angle sum-rule
for (α− β)
The appropriate tool for the present investigation is to simultaneously apply the forward-angle
sum-rule for (α + β) and the backward-angle sum-rule for (α − β). This leads to the following
relations
α = αs + αt, (1)
[A(ω)σ(ω,E1,M2, · · · ) +B(ω)σ(ω,M1, E2, · · · )] dω
, (2)
5αe gπNN
12π2 m2σ fπ
= 7.6, (3)
β = βs + βt, (4)
[A(ω)σ(ω,M1, E2, · · · ) +B(ω)σ(ω,E1,M2, · · · )] dω
, (5)
βt = − 5αe gπNN
12π2 m2σ fπ
= −7.6, (6)
ω0 = mπ +
, (7)
A(ω) =
, (8)
B(ω) =
. (9)
In (1) – (9) ω is the photon energy in the lab-system, mπ the pion mass and m the nucleon
mass. The quantities αs, βs are the s-channel electric and magnetic polarizabilities, and αt, βt
the t-channel electric and magnetic polarizabilities, respectively. The multipole content of the
photoabsorption cross section enters through
σ(ω,E1,M2, · · · ) = σ(ω,E1) + σ(ω,M2) + · · · , (10)
σ(ω,M1, E2, · · · ) = σ(ω,M1) + σ(ω,E2) + · · · , (11)
i.e. through the sums of cross sections with change and without change of parity during the
electromagnetic transition, respectively1. The multipoles belonging to parity-change are favored
for the electric polarizability αs whereas the multipoles belonging to parity-nonchange are fa-
vored for the magnetic polarizability βs. The coefficients A(ω) and B(ω) in Eqs. (2), (5), (8)
and (9) multiplying the cross sections of the parity-favored and parity-nonfavored multipoles,
respectively, are A ∼ +1.07 and B ∼ −0.07 at the pion photoproduction threshold. They in-
crease with photon energy, as expected for relativistic correction factors. Using A(ω) and B(ω)
it is easy to prove that (α + β) ≡ (α + β)s is given by the Baldin or Baldin-Lapidus (BL) [14]
sum rule, whereas (α− β)s is given by the s-channel part of the BEFT [4] sum rule.
For the t-channel parts, αt and βt, we use the predictions obtained from the σ-meson pole
representation2 with properties as predicted by the quark-level Nambu–Jona-Lasinio model [1,
8]. The quantities entering into this prediction are αe = e
2/4π = 1/137.04, the pion-nucleon
coupling constant, gπNN = 13.169±0.057, the pion decay constant, fπ = (92.42±0.26) MeV, and
the σ-meson mass, mσ = 666.0 MeV [1, 10, 11]. For convenience we summarize the arguments
leading to the relations (3) and (6). The flavor wave-functions of the π0 and the σ meson are
given by
|π0〉 =
(−uū+ dd̄), |σ〉 =
(uū+ dd̄). (12)
This leads to the decay matrix elements
M(σ → γγ) = −5
M(π0 → γγ) = 5
. (13)
Using the NJL model or the dynamical LσM with dimensional regularization we arrive at [1,11]
mclσ =
4πf clπ√
, (14)
where mclσ and f
π = 89.8 MeV are the σ meson mass and the π decay constant in the chiral
limit (cl) and Nc = 3 the number of colors. Then the mass of the σ meson is given by
(mclσ )
2 +m2π = 666 MeV. (15)
Inserting this into
(α− β)t = gσNNM(σ → γγ)
2πm2σ
and using fσNN = fπNN and (α+ β)
t = 0 we arrive at (3) and (6).
1It should be noted that this separation into cross sections for separate multipoles is possible in the presently
used fixed-θ dispersion theory applied at θ = 0 and θ = π, whereas in the corresponding formulas based on fixed-t
dispersion theory [12] terms containing mixed products of CGLN [13] amplitudes occur (for a discussion see [6]).
2This σ-meson pole in the complex t-plane of the Compton scattering amplitude A(s, t) is not the same, but
has relations with the σ-meson pole introduced to parameterize the ππ scattering amplitude. These relations
have been discussed in detail in [1,6].
3 Components of electromagnetic polarizabilities from analyses
of total photoabsorption and meson photoproduction data
In the following we use different photoabsorption data to get information on partial contributions
to αs and βs. Analyses of total photoabsorption cross sections have been carried out in [15].
These analyses give a very good overview over the resonant and nonresonant contributions to
the electromagnetic polarizabilities. Further information is taken from the PDG2006 [16], the
GWSES [17] and the Mainz [18,19] analyses of meson photoproduction data.
3.1 Components of electromagnetic polarizabilities from analysis of the total
photoabsorption cross-section of the proton
In the following we wish to study the contributions of nucleon resonances and nonresonant
excited states to the s-channel electromagnetic polarizabilities. Only the resonances P33(1232),
P11(1440), D13(1520), S11(1535) and F15(1680) have to be taken into account. The contributions
of the resonances S11(1650), D15(1675) and higher lying resonances are negligible. For this
analysis we use the Walker [15,20] parameterization of nucleon resonances
I = Ir
W 2r ΓΓγ
(W 2 −W 2r )2 +W 2r Γ2
, (17)
Γ = Γr
)2l+1(
q2r +X
q2 +X2
, (18)
Γγ = Γr
k2r +X
k2 +X2
. (19)
s = 2ωm+m2, ω = photon energy in the lab system, (20)
W 2 = s (21)
k = |k| =
, |k| = photon momentum in the c.m. system, (22)
q = |q| =
E2π −m2π; Eπ =
s−m2 +m2π
, |q| = π momentum in the c.m. system, (23)
jγ , multipole angular momentum of the photon, (24)
l, single π angular momentum. (25)
The damping constants X are X = 160 MeV for the P33(1232) resonance and X = 350 MeV
else.
For the proton, parameters are given in [15] for the relevant resonant states, leading to the
results given in lines 3 – 5 of Table 1. The sum αp+βp of nonresonant contributions in line 7 of
Table 1 is in agreement with the corresponding number calculated from the nonresonant cross
section given in [15] if the nonresonant cross section data are extrapolated to about 3.5 GeV.
This shows that with the predicted t-channel contributions given in line 6 there is consistency
between the experimental electromagnetic polarizabilities and the predictions.
3.2 Components of electromagnetic polarizabilities from analyses of meson
photoproduction for the proton and the neutron
From isospin considerations it has been derived [21] that the amplitudes for meson photopro-
duction are composed of A(1/2) and A(3/2), referring to final states of definite isospin (1
Table 1: Partial contributions to the electromagnetic polarizabilities based on the analysis of
the total photoabsorption cross section [15]. The t-channel parts in line 6 are the predictions
based on the σ-meson pole representation (see section 2). Line 7 contains the differences between
the numbers in line 2 and the sums of numbers given in lines 3–6. The experimental data are
normalized to (α+ β)p = 13.9 ± 0.3 (see [6]).
1 αp βp
2 experiment 12.0 ± 0.6 1.9 ∓ 0.6
3 P33(1232) M1, E2 −1.1 +8.3
4 P11(1440) M1 −0.1 +0.3
5 D13(1520) E1,M2 +1.2 −0.3
6 S11(1535) E1 +0.1 −0.0
5 F15(1680) E2,M3 −0.1 +0.4
6 t-channel +7.6 −7.6
7 nonresonant +4.4 +0.8
Furthermore, there is an amplitude A(0) which may be related to “recoil” effects [21]. This latter
amplitude makes a contribution to I = 1/2 only. Therefore, the amplitudes
(1/2) = A(0) +
A(1/2), nA
(1/2) = A(0) −
A(1/2) (26)
may be introduced. Furthermore, with
A(+) =
(A(1/2) + 2A(3/2)), A(−) =
(A(1/2) −A(3/2)), (27)
the physical amplitudes may be expressed by the isospin combinations (see e.g. [19, 22])
A(γp → nπ+) =
2(A(−) +A(0)) =
(1/2) − 1
A(3/2)), (28)
A(γp → pπ0) = A(+) +A(0) =p A(1/2) +
A(3/2), (29)
A(γn → pπ−) = −
2(A(−) −A(0)) =
(1/2) +
A(3/2)), (30)
A(γn → nπ0) = A(+) −A(0) = −nA(1/2) +
A(3/2). (31)
The relation for the cross section of 1π photoproduction is given by
σ1π = 2π
(l + 1)2
(l + 2)(|El+|2 + |M(l+1)−|2) + l(|Ml+|2 + |E(l+1)−|2)
, (32)
∆σ1π = 2π
(l + 1)2(−1)l
(l + 2)(|El+|2 − |M(l+1)−|2) + l(|Ml+|2 − |E(l+1)−|2)
,(33)
∆σ1π = σ1π(E1,M2, · · · )− σ1π(M1, E2, · · · ). (34)
The peak cross section Ir introduced in (17) is given by
Ir = 2π
2J + 1
2J0 + 1
, (35)
where J and J0 are the spins of the excited state and the ground state, respectively, Γγ the
photon width and Γ the total width of the resonance. The photon width Γγ may be expressed
through the resonance couplings A1/2 and A3/2 by the relation [16]
(2J + 1)MR
|A1/2|2 + |A3/2|2
, (36)
where MN and MR are the nucleon and resonant masses. Combining (35) and (36) we arrive
|A1/2|2 + |A3/2|2
. (37)
Using (37) the quantity Ir can be calculated from the resonance couplings A1/2 and A3/2 given
by the PDG [16], by GWSES [17] and Mainz [19]. The results obtained for the electromagnetic
polarizabilities obtained from the data given in [19] are given in lines 3 – 7 of Table 2.
Table 2: Partial contributions to the electromagnetic polarizabilities. The resonant contributions
in lines 3–7 are obtained from the analysis of Drechsel et al. [19]. The t-channel parts in line
8 are the predictions based on the σ-meson pole representation (see section 2). The predicted
contribution due to the E0+ amplitude in line 9 is based on the analyses given in [17–19]. Line
10 contains the differences between the numbers in line 2 and the sums of numbers given in lines
3–9. The experimental data are normalized to (α+ β)p = 13.9 ± 0.3 and (α + β)n = 15.2 ± 0.5
(see [6]).
1 αp βp αn βn
2 experiment 12.0 ± 0.6 1.9∓ 0.6 12.5 ± 1.7 2.7 ∓ 1.8
3 P33(1232) M
(3/2)
1+ , E
(3/2)
1+ −1.1 +8.3 −1.1 +8.3
4 P11(1440) p,nM
(1/2)
1− −0.0 +0.2 −0.0 +0.1
5 D13(1520) p,nE
(1/2)
2− , p,nM
(1/2)
2− +0.6 −0.2 +0.5 −0.1
6 S11(1535) p,nE
(1/2)
0+ +0.1 −0.0 +0.1 −0.0
7 F15(1680) p,nE
(1/2)
3− , p,nM
(1/2)
3− −0.1 +0.3 −0.0 +0.0
8 t-channel +7.6 −7.6 +7.6 −7.6
9 E0+ (empirical) +3.2 −0.3 +4.1 −0.4
10 background +1.7 +1.2 +1.3 +2.4
The main contributions to the nonresonant parts of the electromagnetic polarizabilities are
expected from the E0+ amplitude which has to be taken from analyses of meson photoproduction
data. Multipole analyses of pion photoproduction based on fixed-t dispersion relations and
unitarity are given by Hanstein et al. [18] in a convenient form. Cross sections separated into
resonant and nonresonant parts are provided for the reactions γp → π+n and γn → π−p up to
energies of 500 MeV and extrapolations of the nonresonant parts are straightforward using the
data contained in [19] and [17]. In principle there is a problem in disentangling resonant and
nonresonant contributions because of interference effects. The interference of the amplitudes
(1/2)
0+ with the S11(1535) and S11(1650) resonances, however, does not lead to problems in
determining the nonresonant E0+ contributions because of the smallness of the resonant parts.
The results for the electromagnetic polarizabilities obtained from these empirical E0+ data are
contained in line 9 of Table 2.
3It should be noted that the quantity Ir of (37) contains the branching correction Γ/Γπ as required.
Up to this point the electromagnetic polarizabilities find an explanation in the numbers given
in lines 3 – 9 of Table 2, with the exception of the small contributions given in line 10 which
deserve a further investigation. These non-E0+ parts of the nonresonant contributions are partly
due to the M
(3/2)
1− , p,nM
(1/2)
1+ and p,nE
(1/2)
1+ amplitudes which interfere with the corresponding
resonant amplitudes p,nM
(1/2)
1− (P11(1440)), M
(3/2)
1+ (P33(1232)) and E
(3/2)
1+ (P33(1232)), respec-
tively [19]. Only the nonresonant parts of the M1− and M1+ amplitudes are expected to be to
some extent important in comparison with dominant E0+ amplitude. Therefore we restrict the
present discussion to the M1− and M1+ amplitudes. Using the data given in [19] we arrive at
the estimates
αnonres.p (M1−) = −0.0, βnonres.p (M1−) = +0.2, αnonres.n (M1−) = −0.1, βnonres.p (M1−) = +0.4,
αnonres.p (M1+) = −0.0, βnonres.p (M1+) = +0.3, αnonres.n (M1+) = −0.1, βnonres.p (M1+) = +0.6.
The conclusion we have to draw from this is that it is not possible to relate the numbers given in
line 10 of Table 2 to known photoproduction processes, unless the two-pion channels are taken
into account (see e.g. [23]). The ππN final states can be characterized either as quasi two-
body states such as π∆ and ρN , or as a ππN component in which both pions are in S waves.
Furthermore, in the Regge regime above ≈ 2000 MeV also f2(1270), a2(1320) and Pomeron
t-channel exchanges play a role. The π∆ contribution has been analyzed in terms of a ∆ Kroll-
Ruderman term and a ∆ pion-pole term [24]. Using data from this analysis [24] we arrive at
(αp,n + βp,n) ≈ 1.0 for this partial ππ channel. The nonresonant cross section above ≈ 2000
MeV makes a contribution of about (αp,n + βp,n) ≈ 0.7.
4 Discussion
4.1 Discussion of the s-channel contribution
For a long time there have been attempts to understand the electromagnetic polarizabilities
predominantly in terms of properties of the “pion cloud” of the nucleon. Among these attempts
CHPT in its original relativistic form [25] is among the most prominent ones. It has been
shown by L’vov [26] that the results obtained for the electromagnetic polarizabilities through
the evaluation of chiral loops [25] can be reproduced via dispersion theory when the Born
approximation of the electric-dipole CGLN amplitude E0+ is taken into account. The results
obtained in this way are shown in lines 2 and 3 of Table 3.
Table 3: Predictions for the “meson cloud” contribution to the electromagnetic polarizabilities
in different approaches.
1 method αp βp αn βn reference
2 CHPT +7.4 −2.0 +10.1 −1.2 Bernard [25]
3 piona) Born +7.3 −1.8 +9.8 −0.9 L’vov [26]
4 E0+ Born +7.5 −1.4 +9.9 −1.8 present
a) The use of fixed-t dispersion theory requires the consideration of interference terms of the
E0+ amplitude with other amplitudes.
It is of interest to use also the present approach based on forward and backward dispersion
relations for studies of this type. For this purpose use may be made of the Born approximation
(see [22] p. 286, [27] p. 35) given in the form
EBorn0+ (γN → π±N) = ±
(−)Born
0+ ± E
(0)Born
, (38)
(−)Born
1− v2
1 + v
, (39)
with v = |q|/
q2 +m2π being the velocity of the pion in the c.m. system. The expression given
in (39) corresponds to the static approximation discussed in detail in [22, 27]. Because of the
relation
σE0+(γn → π−p)
σE0+(γp → π+n)
≃ 1.3 (40)
(see [22] p. 276) the recoil terms E
(0)Born
0+ may be replaced by multiplying E
(−)Born
0+ with (1 +
)−1/2 and (1+mπ
)+1/2 in order to get the results for the proton and neutron, respectively. The
relation given in (40) is well justified at threshold but its approximate validity extends to higher
energies [18,19,27]. The pseudovector coupling constant f in (39) is given by f = gπNN (mπ/2m)
with gπNN = 13.169 ± 0.057. There is a remarkable agreement between the numbers given in
Table 3 but these numbers are larger by a factor ∼ 2.4 than the corresponding numbers in line
9 of Table 2. Two reasons for the deviation of the empirical E0+ amplitude from the Born
approximation have been discussed in [19]. The first reason is that the pseudovector (PV)
coupling is not valid at high photon energies but has to be replaced by some average of the PV
and pseudoscalar the (PS) coupling. The second reason are ρ and ω meson t-channel exchanges
which are not taken into account in the Born approximation.
In Table 2 (see also Table 1) we see that the different resonant contributions to the electric
polarizabilities cancel each other, so that the electric polarizabilities are mainly due to the
t-channel part αtp,n (∼60%) given in line 8 and a smaller nonresonant part α(E0+) (∼30%)
given in line 9. For the magnetic polarizabilities there is an almost complete cancellation of the
P11(1440), D13(1520) and F15(1680) contributions, so that the main remaining contributions are
due to the P33(1232) resonance, canceled to a large extent by the t-channel contribution β
t. The
nonresonant background given in line 10 of Table 2 amounts to about 10% of the experimental
electric polarizabilities and to about 70% of the experimental magnetic polarizabilities. This
means that precise predictions of these contributions are highly desirable, especially for the
magnetic polarizabilities. Unfortunately, the non-E0+ parts of the nonresonant photoabsorption
cross sections are dominated by two-pion channels where the information on the multipole
content is scarce.
4.2 Discussion of the t-channel contribution
In [1] it has been shown that there are two independent, but apparently equivalent and comple-
mentary options to calculate the scalar-isoscalar t-channel contribution to the electromagnetic
polarizabilities of the nucleon.
Option 1 makes use of the properties of the σ-meson as predicted by the quark-level NJL
model and in this respect is of course model dependent. The quark-level NJL model predicts
a definite σ-mesons mass, viz. mσ = 666 MeV, through a parameter-free relation of mσ to
the pion decay constant fπ. The result (α − β)t = 15.2 is in an excellent agreement with the
experimental result. The agreement between a prediction and an experimental result cannot be
used as an argument for the validity of the prediction without further support. This support is
provided by dispersion theory applied to the measured properties of the σ meson as showing up
in particle reactions with two pions in the intermediate state (Option 2).
Option 2 first takes into consideration that the σ meson has been observed in many data
analyses [16] as a pole on the second sheet of the isoscalar S wave of ππ scattering. This pole
describes part of the resonant structure of the σ meson without being a complete description.
This latter property of the pole follows from the fact that the 90◦ crossing of the scalar-isoscalar
phase δ00(s) is located at much higher energies than predicted by the structure of the pole. The
analyses of Colangelo et al. [28] and Caprini et al. [29] led to
s(pole) = (470± 30) − i(295 ± 20) MeV
s(δS = 90
◦) = (844± 13) MeV [28], (41)
Mσ = 441
−8 MeV, Γσ = 544
−25 MeV [29]. (42)
The numbers contained in (41) and (42) are extremely valuable in characterizing the properties
of the σ meson as a real particle but they can only qualitatively be compared with the mass
mσ = 666 MeV of the virtual σ meson, because in the latter case there is no open decay channel.
This means that there is no contradiction between the existence of the broad mass distribution
for the real σ meson and a precisely determined mass of the virtual σ meson. Furthermore, the
numbers contained in (41) and (42) are of no direct relevance for the prediction of (α − β)t.
First of all it certainly would lead only to a qualitative estimate for (α − β)t if the parameters
of the σ-meson pole in (41) and (42) would be used instead of mσ = 666 MeV. Furthermore,
such an insufficient attempt is not necessary because the BEFT [4] sum rule provides a precise
relation between (α − β)t and the properties of the real σ meson. In the BEFT sum rule the
imaginary part of the t-channel Compton scattering amplitude is given by an unitarity relation
where the two reaction γγ → σ → ππ and NN̄ → σ → ππ are exploited. In these reactions
the resonant structure of the σ meson enters via the experimentally determined scalar-isoscalar
phase δ00(s) which is considerably different from the corresponding quantity predicted by the
poles shown in (41) and (42). The real part of the t-channel Compton scattering amplitude is
obtained via a dispersion relation. The present status of the evaluation of the BEFT sum rule
(α−β)tnp = 15.3±1.3 is in good agreement with the experimental result as well as the prediction
based on the quark-level NJL model.
5 Conclusion
The good agreement of the result based on the BEFT sum rule with the experimental result
as well as the prediction based on the quark level NJL model may be understood as a strong
argument that the two predictions of (α−β)t are equivalent. This implies that in addition to the
poles in (41) and (42) also the mass mσ = 666 MeV of the virtual σ meson is an experimentally
verified property of the σ meson.
Acknowledgment
The author is indebted to Deutsche Forschungsgemeinschaft for the support of this work through
the projects SCHU222 and 436RUS113/510. He thanks M.I. Levchuk, A.I. L’vov and A.I. Mil-
stein for a long term cooperation which contributed to the motivation for the present investiga-
tion.
References
[1] M. Schumacher, Eur. Phys. J. A 30, 413 (2006); DOI 10.1140/epja/i2006-10103-0
[hep-ph/0609040].
[2] A.C. Hearn, E. Leader, Phys. Rev. 126, 789 (1962); R. Köberle, Phys. Rev. 166, 1558
(1968).
[3] E.E. Low, Phys. Rev. 120, 582 (1960) (and reference therein); M. Jacob, J. Mathews, Phys.
Rev. 117, 854 (1960).
[4] J. Bernabeu, T.E.O. Ericson, C. Ferro Fontan, Phys. Lett. 49 B, 381 (1974); J. Bernabeu,
B. Tarrach, Phys. Lett 69 B, 484 (1977).
[5] I. Guiasu, E.E. Radescu, Phys. Rev. D 14, 1335 (1976); Phys. Lett. 62 B, 193 (1976).
[6] M. Schumacher, Prog. Part. Nucl. Phys. 55, 567 (2005) [hep-ph/0501167].
[7] B.R. Holstein, A.M. Nathan, Phys. Rev. D 49, 6101 (1994).
[8] M.I. Levchuk, A.I. L’vov, A.I. Milstein, M. Schumacher, Proceedings of the Workshop
NSTAR2005, 12–15 October 2005, Tallahassee, Florida, edited by S. Capstick, V. Crede,
P. Eugenio (World Scientific 2006) 389 [hep-ph/0511193].
[9] D. Drechsel et al., Phys. Rep. 378, 99 (2003); Phys. Rev. C 61, 015204 (1999).
[10] T. Hatsuda, T. Kunihiro, Phys. Rep. 247, 221 (1994).
[11] R. Delbourgo, M. Scadron, Mod. Phys. Lett. A 10, 251 (1995) [hep-ph/9910242]; Int. J.
Mod. Phys. A 13, 657 (1998) [hep-ph/9807504].
[12] A.I. L’vov, V.A. Petrun’kin, M. Schumacher, Phys. Rev. C 55, 359 (1997).
[13] G.F. Chew, M.L. Goldberger, F.E. Low, Y. Nambu, Phys. Rev. 106, 1345 (1957).
[14] A.M. Baldin, Nucl. Phys. 18, 310 (1960); L.I. Lapidus, Zh. Eksp. Teor. Fiz. 43, 1358 (1962)
[Sov. Phys. JETP 16, 964 (1963)].
[15] T.A. Armstrong et al., Phys. Rev. D 5, 1640 (1972); Nucl. Phys. B 41, 445 (1972).
[16] W.-M. Yao et al., (Particle Data Group) J. Phys. G 33, 1 (2006) [URL: http://pdg.lbl.gov].
[17] R.A. Arndt, et al. Phys. Rev. C 66, 055213 (2002).
[18] O. Hanstein, D. Drechsel, L. Tiator, Nucl. Phys. A 632, 561 (1998).
[19] D. Drechsel, O. Hanstein, S.S. Kamalov, L. Tiator, Nucl. Phys. A 645, 145 (1999).
[20] R.L. Walker, Phys. Rev. 182, 1729 (1969).
[21] K.M. Watson, Phys. Rev. 95, 228 (1954).
[22] T. Ericson, W. Weise, Pions and Nuclei, International Series of Monographs on Physics 74,
Oxford Science Publications (1988).
[23] D. Drechsel, L. Tiator, J. Phys. G: Nucl. Part. Phys. 18, 449 (1992).
[24] J.A. Gómez Tejedor, E. Oset, Nucl. Phys. A 571, 667 (1994); 600, 413 (1996).
http://arxiv.org/abs/hep-ph/0609040
http://arxiv.org/abs/hep-ph/0501167
http://arxiv.org/abs/hep-ph/0511193
http://arxiv.org/abs/hep-ph/9910242
http://arxiv.org/abs/hep-ph/9807504
http://pdg.lbl.gov
[25] V. Bernard, N. Kaiser, U.-G. Meissner, Phys. Rev. Lett. 67, 1515 (1991); Nucl. Phys. B
373, 346 (1992).
[26] A.I. L’vov, Phys. Lett. B 304, 29 (1993).
[27] A. Donnachie, in: High Energy Physics, Edited by E.H.S. Burhop V, 1 Academic Press
(1972)
[28] G. Colangelo, J. Gasser, H. Leutwyler, Nucl. Phys. B 603, 125 (2001).
[29] I. Caprini, G. Colangelo, H. Leutwyler, Phys. Rev. Lett. 96, 132001 (2006).
	Introduction
	Electromagnetic polarizabilities obtained from the forward-angle sum-rule for (+) and the backward-angle sum-rule for (-)
	Components of electromagnetic polarizabilities from analyses of total photoabsorption and meson photoproduction data
	Components of electromagnetic polarizabilities from analysis of the total photoabsorption cross-section of the proton
	Components of electromagnetic polarizabilities from analyses of meson photoproduction for the proton and the neutron
	Discussion
	Discussion of the s-channel contribution
	Discussion of the t-channel contribution
	Conclusion
ABSTRACT
  The electromagnetic polarizabilities of the nucleon are shown to be
essentially composed of the nonresonant $\alpha_p(E_{0+})=+3.2$,
$\alpha_n(E_{0+})=+4.1$,the $t$-channel $\alpha^t_{p,n}=-\beta^t_{p,n}=+7.6$
and the resonant $\beta_{p,n}(P_{33}(1232))=+8.3$ contributions (in units of
$10^{-4}$fm$^3$. The remaining deviations from the experimental data
$\Delta\alpha_p=1.2\pm 0.6$, $\Delta\beta_p=1.2\mp 0.6$, \Delta\alpha_n=0.8\pm
1.7$ and $\Delta\beta_n=2.0\mp 1.8$ are contributed by a larger number of
resonant and nonresonant processes with cancellations between the
contributions. This result confirms that dominant contributions to the electric
and magnetic polarizabilities may be represented in terms of two-photon
couplings to the $\sigma$-meson having the predicted mass $m_\sigma=666$ MeV
and two-photon width $\Gamma_{\gamma\gamma}=2.6$ keV.

<|endoftext|><|startoftext|>
Introduction
1.1. The Hecke algebras associated to finite and affine Weyl groups are
ubiquitous in diverse areas, including representation theories over finite
fields, infinite fields of prime characteristic, p-adic fields, and Kazhdan-
Lusztig theory for category O. Lusztig [Lu1, Lu2] introduced the graded
Hecke algebras, also known as the degenerate affine Hecke algebras, associ-
ated to a finite Weyl groupW , and provided a geometric realization in terms
of equivariant homology. The degenerate affine Hecke algebra of type A has
also been defined earlier by Drinfeld [Dr] in connections with Yangians, and
it has recently played an important role in modular representations of the
symmetric group (cf. Kleshchev [Kle]).
In [W1], the second author introduced the degenerate spin affine Hecke
algebra of type A, and related it to the degenerate affine Hecke-Clifford
algebra introduced by Nazarov in his study of the representations of the spin
symmetric group [Naz]. A quantum version of the spin affine Hecke algebra
of type A has been subsequently constructed in [W2], and was shown to be
related to the q-analogue of the affine Hecke-Clifford algebra (of type A)
defined by Jones and Nazarov [JN].
1.2. The goal of this paper is to provide canonical constructions of the
degenerate affine Hecke-Clifford algebras and degenerate spin affine Hecke
algebras for all classical finite Weyl groups, which goes beyond the type A
case, and then establish some basic properties of these algebras. The notion
of spin Hecke algebras is arguably more fundamental while the notion of
the Hecke-Clifford algebras is crucial for finding the right formulation of the
spin Hecke algebras. We also construct the degenerate covering affine Hecke
algebras which connect to both the degenerate spin affine Hecke algebras
and the degenerate affine Hecke algebras of Lusztig.
http://arxiv.org/abs/0704.0201v3
2 TA KHONGSAP AND WEIQIANG WANG
1.3. Let us describe our constructions in some detail. The Schur multiplier
for each finite Weyl group W has been computed by Ihara and Yokonuma
[IY] (see [Kar]). We start with a distinguished double cover W̃ for any finite
Weyl group W :
1 −→ Z2 −→ W̃ −→W −→ 1. (1.1)
Denote Z2 = {1, z}. Assume that W is generated by s1, . . . , sn subject to
the relations (sisj)
mij = 1. The quotient CW− := CW̃/〈z + 1〉 is then
generated by t1, . . . , tn subject to the relations (titj)
mij = 1 for mij odd,
and (titj)
mij = −1 for mij even. In the symmetric group case, this double
cover goes back to I. Schur [Sch]. Note that W acts as automorphisms on
the Clifford algebra CW associated to the reflection representation h of W .
We establish a (super)algebra isomorphism
Φfin : CW ⋊CW
≃−→ CW ⊗ CW−,
extending an isomorphism in the symmetric group case (due to Sergeev
[Ser] and Yamaguchi [Yam] independently) to all Weyl groups. That is,
the superalgebras CW ⋊ CW and CW
− are Morita super-equivalent in the
terminology of [W2]. The double cover W̃ also appeared in Morris [Mo].
We formulate the notion of degenerate affine Hecke-Clifford algebras HcW
and spin affine Hecke algebras H−
, with unequal parameters in type B
case, associated to Weyl groups W of type D and B. The algebra HcW (and
respectively H−
) contain CW ⋊CW (and respectively CW
−) as subalgebras.
We establish the PBW basis properties for these algebras:
∼= C[h∗]⊗ CW ⊗CW, H−W ∼= C[h
∗]⊗ CW−
where C[h∗] denotes the polynomial algebra and C[h∗] denotes a noncommu-
tative skew-polynomial algebra. We describe explicitly the centers for both
HcW and H
. The two Hecke algebras HcW and H
are related by a Morita
super-equivalence, i,e. a (super)algebra isomorphism
Φ : HcW
≃−→ CW ⊗ H−W
which extends the isomorphism Φfin. Such an isomorphism holds also for
W of type A [W1].
We generalize the construction in [Naz] of the intertwiners in the affine
Hecke-Clifford algebras HcW of type A to all classical Weyl groups W . We
also generalize the construction of the intertwiners in [W1] for H−
of type
A to all classical Weyl groups W . We further establish the basic properties
of these intertwiners in both HcW and H
. These intertwiners are expected
to play a fundamental role in the future development of the representation
theory of these algebras, as it is indicated by the work of Lusztig, Cherednik
and others in the setup of the usual affine Hecke algebras.
We further introduce a notion of degenerate covering affine Hecke algebras
H∼W associated to the double cover W̃ of the Weyl group W of classical
type. The algebra H∼W contains a central element z of order 2 such that the
THE CLASSICAL SPIN AFFINE HECKE ALGEBRAS 3
quotient of H∼W by the ideal 〈z+1〉 is identified with H
and its quotient by
the ideal 〈z−1〉 is identified with Lusztig’s degenerate affine Hecke algebras
associated toW . In this sense, our covering affine Hecke algebra is a natural
affine generalization of the central extension (1.1). A quantum version of
the covering affine Hecke algebra of type A was constructed in [W2].
The results in this paper remain valid over any algebraically closed field
of characteristic p 6= 2 (and in addition p 6= 3 for type G2). In fact, most
of the constructions can be made valid over the ring Z[1
] (occasionally we
need to adjoint
1.4. This paper and [W1] raise many questions, including a geometric re-
alization of the algebras HcW or H
in the sense of Lusztig [Lu1, Lu2], the
classification of the simple modules (cf. [Lu3]), the development of the rep-
resentation theory, an extension to the exceptional Weyl groups, and so on.
We remark that the modular representations of HcW in the type A case in-
cluding the modular representations of the spin symmetric group have been
developed by Brundan and Kleshchev [BK] (also cf. [Kle]).
In a sequel [KW] to this paper, we will extend the constructions in this
paper to the setup of rational double affine Hecke algebras (see Etingof-
Ginzburg [EG]), generalizing and improving a main construction initiated
in [W1] for the spin symmetric group. We also hope to quantize these
degenerate spin Hecke algebras, reversing the history of developments from
quantum to degeneration for the usual Hecke algebras.
1.5. The paper is organized as follows. In Section 2, we describe the distin-
guished covering groups of the Weyl groups, and establish the isomorphism
theorem in the finite-dimensional case. We introduce in Section 3 the degen-
erate affine Hecke-Clifford algebras of type D and B, and in Section 4 the
corresponding degenerate spin affine Hecke algebras. We then extend the
isomorphism Φfin to an isomorphism relating these affine Hecke algebras,
establish the PBW properties, and describe the centers of HcW and H
Section 5, we formulate the notion of degenerate covering affine Hecke al-
gebras, and establish the connections to the degenerate spin affine Hecke
algebras and usual affine Hecke algebras.
Acknowledgements. W.W. is partially supported by an NSF grant.
2. Spin Weyl groups and Clifford algebras
2.1. The Weyl groups. Let W be an (irreducible) finite Weyl group with
the following presentation:
〈s1, . . . , sn|(sisj)mij = 1, mii = 1, mij = mji ∈ Z≥2, for i 6= j〉 (2.1)
For a Weyl group W , the integers mij take values in {1, 2, 3, 4, 6}, and
they are specified by the following Coxeter-Dynkin diagrams whose vertices
correspond to the generators of W . By convention, we only mark the edge
connecting i, j with mij ≥ 4. We have mij = 3 for i 6= j connected by an
unmarked edge, and mij = 2 if i, j are not connected by an edge.
4 TA KHONGSAP AND WEIQIANG WANG
An ◦ ◦ . . . ◦ ◦
1 2 n− 1 n
Bn(n ≥ 2) ◦ ◦ . . . ◦ ◦
1 2 n− 1 n
Dn(n ≥ 4) ◦ ◦ · · · ◦ ◦
1 2 n− 3
En=6,7,8 ◦ ◦ ◦ . . . ◦ ◦
1 3 4 n− 1 n
F4 ◦ ◦ ◦ ◦
1 2 3 4
G2 ◦ ◦
2.2. A distinguished double covering of Weyl groups. The Schur mul-
tipliers for finite Weyl groups W (and actually for all finite Coxeter groups)
have been computed by Ihara and Yokonuma [IY] (also cf. [Kar]). The
explicit generators and relations for the corresponding covering groups of W
can be found in Karpilovsky [Kar, Table 7.1].
We shall be concerned about a distinguished double covering W̃ of W :
1 −→ Z2 −→ W̃ −→W −→ 1.
We denote by Z2 = {1, z}, and by t̃i a fixed preimage of the generators si of
W for each i. The group W̃ is generated by z, t̃1, . . . , t̃n with relations
z2 = 1, (t̃it̃j)
mij =
1, if mij = 1, 3
z, if mij = 2, 4, 6.
THE CLASSICAL SPIN AFFINE HECKE ALGEBRAS 5
The quotient algebra CW− := CW̃/〈z+1〉 of CW̃ by the ideal generated
by z+1 will be called the spin Weyl group algebra associated to W . Denote
by ti ∈ CW− the image of t̃i. The spin Weyl group algebra CW− has the
following uniform presentation: CW− is the algebra generated by ti, 1 ≤ i ≤
n, subject to the relations
(titj)
mij = (−1)mij+1 ≡
1, if mij = 1, 3
−1, if mij = 2, 4, 6.
(2.2)
Note that dimCW− = |W |. The algebra CW− has a natural superalgebra
(i.e. Z2-graded) structure by letting each ti be odd.
By definition, the quotient by the ideal 〈z − 1〉 of the group algebra CW̃
is isomorphic to CW .
Example 2.1. Let W be the Weyl group of type An, Bn, or Dn, which will
be assumed in later sections. Then the spin Weyl group algebra CW− is
generated by t1, . . . , tn with the labeling as in the Coxeter-Dynkin diagrams
and the explicit relations summarized in the following table.
Type of W Defining Relations for CW−
i = 1, titi+1ti = ti+1titi+1,
(titj)
2 = −1 if |i− j| > 1
t1, . . . , tn−1 satisfy the relations for CW
n = 1, (titn)
2 = −1 if i 6= n− 1, n,
(tn−1tn)
4 = −1
t1, . . . , tn−1 satisfy the relations for CW
n = 1, (titn)
2 = −1 if i 6= n− 2, n,
tn−2tntn−2 = tntn−2tn
2.3. The Clifford algebra CW . Denote by h the reflection representation
of the Weyl groupW (i.e. a Cartan subalgebra of the corresponding complex
Lie algebra g). In the case of type An−1, we will always choose to work with
the Cartan subalgebra h of gln instead of sln in this paper.
Note that h carries a W -invariant nondegenerate bilinear form (−,−),
which gives rise to an identification h∗ ∼= h and also a bilinear form on h∗
which will be again denoted by (−,−). We identify h∗ with a suitable sub-
space of CN and then describe the simple roots {αi} for g using a standard
orthonormal basis {ei} of CN . It follows that (αi, αj) = −2 cos(π/mij).
Denote by CW the Clifford algebra associated to (h, (−,−)), which is re-
garded as a subalgebra of the Clifford algebra CN associated to (C
N , (−,−)).
We shall denote by ci the generator in CN corresponding to
2ei and denote
by βi the generator of CW corresponding to the simple root αi normalized
with β2i = 1. In particular, CN is generated by c1, . . . , cN subject to the
relations
c2i = 1, cicj = −cjci if i 6= j. (2.3)
6 TA KHONGSAP AND WEIQIANG WANG
The explicit generators for CW are listed in the following table. Note that
CW is naturally a superalgebra with each βi being odd.
Type of W N Generators for CW
An−1 n βi =
(ci − ci+1), 1 ≤ i ≤ n− 1
Bn n βi =
(ci − ci+1), 1 ≤ i ≤ n− 1, βn = cn
Dn n βi =
(ci − ci+1), 1 ≤ i ≤ n− 1, βn = 1√
(cn−1 + cn)
E8 8 β1 =
(c1 + c8 − c2 − c3 − c4 − c5 − c6 − c7)
(c1 + c2), βi =
(ci−1 + ci−2), 3 ≤ i ≤ 8
E7 8 the subset of βi in E8, 1 ≤ i ≤ 7
E6 8 the subset of βi in E8, 1 ≤ i ≤ 6
F4 4 β1 =
(c1 − c2), β2 = 1√2(c2 − c3)
β3 = c3, β4 =
(c4 − c1 − c2 − c3)
G2 3 β1 =
(c1 − c2), β2 = 1√6(−2c1 + c2 + c3)
The action of W on h and h∗ preserves the bilinear form (−,−) and thus
W acts as automorphisms of the algebra CW . This gives rise to a semi-direct
product CW ⋊CW . Moreover, the algebra CW ⋊CW naturally inherits the
superalgebra structure by letting elements inW be even and each βi be odd.
2.4. The basic spin supermodule. The following theorem is due to Mor-
ris [Mo] in full generality, and it goes back to I. Schur [Sch] (cf. [Joz]) in the
type A, namely the symmetric group case. It can be checked case by case
using the explicit formulas of βi in the Table of Section 2.3.
Theorem 2.2. Let W be a finite Weyl group. Then, there exists a surjective
superalgebra homomorphism CW−
Ω−→ CW which sends ti to βi for each i.
Remark 2.3. In [Mo], W is viewed as a subgroup of the orthogonal Lie group
which preserves (h, (−,−)). The preimage of W in the spin group which
covers the orthogonal group provides the double cover W̃ of W , where the
Atiyah-Bott-Shapiro construction of the spin group in terms of the Clifford
algebra CW was used to describe this double cover of W .
The superalgebra CW has a unique (up to isomorphism) simple super-
module (i.e. Z2-graded module). By pulling it back via the homomorphism
Ω : CW− → CW , we obtain a distinguished CW−-supermodule, called the
basic spin supermodule. This is a natural generalization of the classical
construction for CS−n due to Schur [Sch] (see [Joz]).
2.5. A superalgebra isomorphism. Given two superalgebras A and B,
we view the tensor product of superalgebras A ⊗ B as a superalgebra with
multiplication defined by
(a⊗ b)(a′ ⊗ b′) = (−1)|b||a′|(aa′ ⊗ bb′) (a, a′ ∈ A, b, b′ ∈ B) (2.4)
THE CLASSICAL SPIN AFFINE HECKE ALGEBRAS 7
where |b| denotes the Z2-degree of b, etc. Also, we shall use short-hand
notation ab for (a⊗ b) ∈ A ⊗ B, a = a⊗ 1, and b = 1⊗ b.
We have the following Morita super-equivalence in the sense of [W2] be-
tween the superalgebras CW ⋊CW and CW
Theorem 2.4. We have an isomorphism of superalgebras:
Φ : CW ⋊CW
≃−→ CW ⊗ CW−
which extends the identity map on CW and sends si 7→ −
−1βiti. The
inverse map Ψ is the extension of the identity map on CW which sends
ti 7→
−1βisi.
We first prepare some lemmas.
Lemma 2.5. We have (Φ(si)Φ(sj))
mij = 1.
Proof. Theorem 2.2 says that (titj)
mij = (βiβj)
mij = ±1. Thanks to the
identities βjti = −tiβj and Φ(si) = −
−1βiti, we have
(Φ(si)Φ(sj))
mij = (−βitiβjtj)mij
= (βiβjtitj)
mij = (βiβj)
mij (titj)
mij = 1.
Lemma 2.6. We have βjΦ(si) = Φ(si) si(βj) for all i, j.
Proof. Note that (βi, βi) = 2β
i = 2, and hence
βjβi = −βiβj + (βj , βi) = −βiβj +
2(βj , βi)
(βi, βi)
β2i = −βisi(βj).
Thus, we have
βjΦ(si) = −
−1βjβiti
−1tiβjβi =
−1tiβisi(βj) = Φ(si) si(βj).
Proof of Theorem 2.4. The algebra CW ⋊CW is generated by βi and si for
all i. Lemmas 2.5 and 2.6 imply that Φ is a (super) algebra homomorphism.
Clearly Φ is surjective, and thus an isomorphism by a dimension counting
argument.
Clearly, Ψ and Φ are inverses of each other. �
Remark 2.7. The type A case of Theorem 2.4 was due to Sergeev and Ya-
maguchi independently [Ser, Yam], and it played a fundamental role in
clarifying the earlier observation in the literature (cf. [Joz, St]) that the
representation theories of CS−n and Cn ⋊CSn are essentially the same.
In the remainder of the paper, W is always assumed to be one of the
classical Weyl groups of type A,B, or D.
8 TA KHONGSAP AND WEIQIANG WANG
3. Degenerate affine Hecke-Clifford algebras
In this section, we introduce the degenerate affine Hecke-Clifford algebras
of type D and B, and establish some basic properties. The degenerate affine
Hecke-Clifford algebra associated to the symmetric group Sn was introduced
earlier by Nazarov under the terminology of the affine Sergeev algebra [Naz].
3.1. The algebra HcW of type An−1.
Definition 3.1. [Naz] Let u ∈ C, and W =WAn−1 = Sn be the Weyl group
of type An−1. The degenerate affine Hecke-Clifford algebra of type An−1,
denoted by HcW or H
, is the algebra generated by x1, . . . , xn, c1, . . . , cn,
and Sn subject to the relation (2.3) and the following relations:
xixj = xjxi (∀i, j) (3.1)
xici = −cixi, xicj = cjxi (i 6= j) (3.2)
σci = cσiσ (1 ≤ i ≤ n, σ ∈ Sn) (3.3)
xi+1si − sixi = u(1− ci+1ci) (3.4)
xjsi = sixj (j 6= i, i+ 1) (3.5)
Remark 3.2. Alternatively, we may view u as a formal parameter and the
algebra HcW as a C(u)-algebra. Similar remarks apply to various algebras
introduced in this paper. Our convention c2i = 1 differs from Nazarov’s
which sets c2i = −1.
The symmetric group Sn acts as the automorphisms on the symmetric
algebra C[h∗] ∼= C[x1, . . . , xn] by permutation. We shall denote this action
by f 7→ fσ for σ ∈ Sn, f ∈ C[x1, . . . , xn].
Proposition 3.3. Let W = WAn−1. Given f ∈ C[x1, . . . , xn] and 1 ≤ i ≤
n− 1, the following identity holds in HcW :
sif = f
sisi + u
f − f si
xi+1 − xi
cici+1f − f sicici+1
xi+1 + xi
It is understood here and in similar expressions below that A
In this sense, both numerators on the right-hand side of the above formula
are (left-)divisible by the corresponding denominators.
Proof. By the definition of HcW , we have that six
j = x
j si for any k if
j 6= i, i + 1. So it suffices to check the identity for f = xki xli+1. We will
proceed by induction.
THE CLASSICAL SPIN AFFINE HECKE ALGEBRAS 9
First, consider f = xki , i.e. l = 0. For k = 1, this follows from (3.4). Now
assume that the statement is true for k. Then
xki+1si + u
(xki − xki+1)
xi+1 − xi
(cici+1x
i − xki+1cici+1)
xi+1 + xi
= xki+1 (xi+1si − u(1− ci+1ci))
(xki − xki+1)
xi+1 − xi
xi + u
(cici+1x
i − xki+1cici+1)
xi+1 + xi
= xk+1i+1 si + u
(xk+1
− xk+1
i+1 )
xi+1 − xi
(cici+1x
− xk+1
i+1 cici+1)
xi+1 + xi
where the last equality is obtained by using (3.2) and (3.4) repeatedly.
An induction on l will complete the proof of the proposition for the mono-
mial f = xki x
i+1. The case l = 0 is established above. Assume the formula
is true for f = xki x
i+1. Then using sixi+1 = xisi+u(1+ci+1ci), we compute
i+1 =
i+1si + u
(xki x
i+1 − xlixki+1)
xi+1 − xi
(cici+1x
i+1 − xlixki+1cici+1)
xi+1 + xi
· xi+1
= xlix
i+1(xisi + u(1 + ci+1ci))
(xki x
i+1 − xlix
i+1 )
xi+1 − xi
(cici+1x
i+1 + x
i+1 cici+1)
xi+1 + xi
= xl+1i x
i+1si + u
(xki x
i+1 − x
xi+1 − xi
(cici+1x
i+1 − x
xki+1cici+1)
xi+1 + xi
This completes the proof of the proposition. �
The algebra HcW contains C[h
∗],Cn, and CW as subalgebras. We shall
denote xα = xa11 · · · xann for α = (a1, . . . , an) ∈ Zn+, cǫ = c
1 · · · cǫnn for ǫ =
(ǫ1, . . . , ǫn) ∈ Zn2 .
Below we give a new proof of the PBW basis theorem for HcW (which
has been established by different methods in [Naz, Kle]), using in effect the
induced HcW -module Ind
1 from the trivial W -module 1. This induced
module is of independent interest. This approach will then be used for type
D and B.
Theorem 3.4. LetW =WAn−1 . The multiplication of subalgebras C[h
∗],Cn,
and CW induces a vector space isomorphism
C[h∗]⊗ Cn ⊗ CW
≃−→ HcW .
10 TA KHONGSAP AND WEIQIANG WANG
Equivalently, {xαcǫw|α ∈ Zn+, ǫ ∈ Zn2 , w ∈ W} forms a linear basis for HcW
(called a PBW basis).
Proof. Note that IND := C[x1, . . . , xn] ⊗ Cn admits an algebra structure
by (2.3), (3.1) and (3.2). By the explicit defining relations of HcW , we can
verify that the algebra HcW acts on IND by letting xi and ci act by left
multiplication, and si ∈ Sn act by
si.(fc
ǫ) = f sicsiǫ +
f − f si
xi+1 − xi
cici+1f − f sicici+1
xi+1 + xi
For α = (a1, . . . , an), we denote |α| = a1 + · · · + an. Define a Lexico-
graphic ordering < on the monomials xα, α ∈ Zn+, (or respectively on Zn+),
by declaring xα < xα
, (or respectively α < α′), if |α| < |α′|, or if |α| = |α′|
then there exists an 1 ≤ i ≤ n such that ai < a′i and aj = a′j for each j < i.
Note that the algebra HcW is spanned by the elements of the form x
αcǫw.
It remains to show that these elements are linearly independent.
Suppose that S :=
xαcǫw = 0 for a finite sum over α, ǫ, w and that
some coefficient a
6= 0; we fix one such ǫ. Now consider the action S
on an element of the form x
2 · · · xNnn for N1 ≫ N2 ≫ · · · ≫ Nn ≫ 0.
Let w̃ be such that (x
2 · · · xNnn )w̃ is maximal among all possible w with
aαǫw 6= 0 for some α. Let α̃ be the largest element among all α with
6= 0. Then among all monomials in S(xN11 x
2 · · · xNnn ), the monomial
xα̃(x
2 · · · xNnn )w̃cǫ appears as a maximal term with coefficient ±aα̃ǫw̃.
It follows from S = 0 that aα̃ǫw̃ = 0. This is a contradiction, and hence the
elements xαcǫw are linearly independent. �
Remark 3.5. By the PBW Theorem 3.4, the HcW -module IND introduced
in the above proof can be identified with the HcW -module induced from the
trivial CW -module. The same remark applies below to type D and B.
3.2. The algebra HcW of type Dn. Let W = WDn be the Weyl group of
type Dn. It is generated by s1, . . . , sn, subject to the following relations:
s2i = 1 (i ≤ n− 1) (3.6)
sisi+1si = si+1sisi+1 (i ≤ n− 2) (3.7)
sisj = sjsi (|i− j| > 1, i, j 6= n) (3.8)
sisn = snsi (i 6= n− 2) (3.9)
sn−2snsn−2 = snsn−2sn, s
n = 1. (3.10)
In particular, Sn is generated by s1, . . . , sn−1 subject to the relations (3.6–
3.8) above.
Definition 3.6. Let u ∈ C, and letW =WDn . The degenerate affine Hecke-
Clifford algebra of type Dn, denoted by H
W or H
, is the algebra generated
by xi, ci, si, 1 ≤ i ≤ n, subject to the relations (3.1–3.5), (3.6–3.10), and the
THE CLASSICAL SPIN AFFINE HECKE ALGEBRAS 11
following additional relations:
sncn = −cn−1sn
snci = cisn (i 6= n− 1, n)
snxn + xn−1sn = −u(1 + cn−1cn) (3.11)
snxi = xisn (i 6= n− 1, n).
Proposition 3.7. The algebra HcDn admits anti-involutions τ1, τ2 defined by
τ1 : si 7→ si, cj 7→ cj , xj 7→ xj , (1 ≤ i ≤ n);
τ2 : si 7→ si, cj 7→ −cj, xj 7→ xj , (1 ≤ i ≤ n).
Also, the algebra HcDn admits an involution σ which fixes all generators
si, xi, ci except the following 4 generators:
σ : sn 7→ sn−1, sn−1 7→ sn, xn 7→ −xn, cn 7→ −cn.
Proof. We leave the easy verifications on τ1, τ2 to the reader.
It remains to check that σ preserves the defining relations. Almost all the
relations are obvious except (3.4) and (3.11). We see that σ preserves (3.4)
as follows: for i ≤ n− 2,
σ(xi+1si − sixi) = xi+1si − sixi
= u(1− ci+1ci) = σ(u(1 − ci+1ci));
σ(xnsn−1 − sn−1xn−1) = −xnsn − snxn−1
= u(1 + cncn−1) = σ(u(1− cncn−1)).
Also, σ preserves (3.11) since
σ(snxn + xn−1sn) = −sn−1xn + xn−1sn−1
= −u(1− cn−1cn) = σ(−u(1 + cn−1cn)).
Hence, σ is an automorphism of HcDn . Clearly σ
2 = 1. �
The natural action of Sn on C[h
∗] = C[x1, . . . , xn] is extended to an action
of WDn by letting
xsnn = −xn−1, x
n−1 = −xn, x
= xi (i 6= n− 1, n).
Proposition 3.8. Let W = WDn , 1 ≤ i ≤ n − 1, and f ∈ C[x1, . . . , xn].
Then the following identities hold in HcW :
(1) sif = f
sisi + u
f − f si
xi+1 − xi
cici+1f − f sicici+1
xi+1 + xi
(2) snf = f
snsn − u
f − f sn
xn + xn−1
cn−1cnf − f sncn−1cn
xn − xn−1
Proof. Formula (1) has been established by induction as in type An−1. For-
mula (2) can be verified by a similar induction. �
12 TA KHONGSAP AND WEIQIANG WANG
3.3. The algebra HcW of type Bn. Let W = WBn be the Weyl group of
type Bn, which is generated by s1, . . . , sn, subject to the defining relation
for Sn on s1, . . . , sn−1 and the following additional relations:
sisn = snsi (1 ≤ i ≤ n− 2) (3.12)
(sn−1sn)
4 = 1, s2n = 1. (3.13)
We note that the simple reflections s1, . . . , sn belongs to two different
conjugacy classes in WBn , with s1, . . . , sn−1 in one and sn in the other.
Definition 3.9. Let u, v ∈ C, and let W = WBn . The degenerate affine
Hecke-Clifford algebra of type Bn, denoted by H
W or H
, is the algebra
generated by xi, ci, si, 1 ≤ i ≤ n, subject to the relations (3.1–3.5), (3.6–3.8),
(3.12–3.13), and the following additional relations:
sncn = −cnsn
snci = cisn (i 6= n)
snxn + xnsn = −
snxi = xisn (i 6= n).
The factor
2 above is inserted for the convenience later in relation to
the spin affine Hecke algebras. When it is necessary to indicate u, v, we will
write HcW (u, v) for H
W . For any a ∈ C\{0}, we have an isomorphism of
superalgebras ψ : HcW (au, av) → HcW (u, v) given by dilations xi 7→ axi for
1 ≤ i ≤ n, while fixing each si, ci.
The action of Sn on C[h
∗] = C[x1, . . . , xn] can be extended to an action
of WBn by letting
xsnn = −xn, x
i = xi, (i 6= n).
Proposition 3.10. Let W = WBn. Given f ∈ C[x1, . . . , xn] and 1 ≤ i ≤
n− 1, the following identities hold in HcW :
(1) sif = f
sisi + u
f − f si
xi+1 − xi
cici+1f − f sicici+1
xi+1 + xi
(2) snf = f
snsn −
f − f sn
Proof. The proof is similar to type A and D, and will be omitted. �
3.4. PBW basis for HcW . Note that H
W contains C[h
∗],Cn,CW as subal-
gebras. We have the following PBW basis theorem for HcW .
Theorem 3.11. Let W =WDn or W = WBn. The multiplication of subal-
gebras C[h∗],Cn, and CW induces a vector space isomorphism
C[h∗]⊗ Cn ⊗ CW −→ HcW .
Equivalently, the elements {xαcǫw|α ∈ Zn+, ǫ ∈ Zn2 , w ∈ W} form a linear
basis for HcW (called a PBW basis).
THE CLASSICAL SPIN AFFINE HECKE ALGEBRAS 13
Proof. For W = WDn , we can verify by a direct lengthy computation that
the HcAn−1-action on IND = C[x1, . . . , xn]⊗Cn (see the proof of Theorem 3.4)
naturally extends to an action of HcDn , where (compare Proposition 3.8) sn
acts by
sn.(fc
ǫ) = f sncsnǫ −
f − f sn
xn + xn−1
− ucn−1cnf − f
sncn−1cn
xn − xn−1
Similarly, for W = WBn , the H
-action on IND extends to an action of
HcBn , where (compare Proposition 3.10) sn acts by
sn.(fc
ǫ) = f sncsnǫ −
f − f sn
It is easy to show that, for either W , the elements xαcǫw span HcW . It
remains to show that they are linearly independent. We shall treat theWBn
case in detail and skip the analogous WDn case.
To that end, we shall refer to the argument in the proof of Theorem 3.4
with suitable modification as follows. The w̃ = ((η1, . . . , ηn), σ) ∈ WBn =
{±1}n ⋊ Sn may now not be unique, but the σ and the α̃ are uniquely
determined. Then, by the same argument on the vanishing of a maximal
term, we obtain that
w̃ aα̃ǫw̃x
2 · · · xNnn )w̃ = 0, and hence,
(η1,...,ηn)
aα̃ǫw̃(−1)
i=1 ηiNi = 0.
By choosing N1, . . . , Nn with different parities (2
n choices) and solving the
2n linear equations, we see that all aα̃ǫw̃ = 0. This can also be seen more
explicitly by induction on n. By choosing Nn to be even and odd, we deduce
that for a fixed ηn,
(η1,...,ηn−1)∈{±1}n−1 aα̃ǫw̃(−1)
i=1 ηiNi = 0, which is the
equation for (n− 1) xi’s and the induction applies. �
3.5. The even center for HcW . The even center of a superalgebra A, de-
noted by Z(A), is the subalgebra of even central elements of A.
Proposition 3.12. Let W = WDn or W = WBn . The even center Z(H
of HcW is isomorphic to C[x
1, . . . , x
Proof. We first show that every W -invariant polynomial f in x21, . . . , x
central in HcW . Indeed, f commutes with each ci by (3.2) and clearly f
commutes with each xi. By Proposition 3.8 for type Dn or Proposition 3.10
for type Bn, sif = fsi for each i. Since H
W is generated by ci, xi and si for
all i, f is central in HcW and C[x
1, . . . , x
W ⊆ Z(HcW ).
On the other hand, take an even central element C =
α,ǫ,w
xαcǫw in
HcW . We claim that w = 1 whenever aα,ǫ,w 6= 0. Otherwise, let 1 6= w0 ∈ W
be maximal with respect to the Bruhat ordering in W such that a
α,ǫ,w0
6= 0.
Then x
i 6= xi for some i. By Proposition 3.8 for typeDn or Proposition 3.10
for type Bn, x
iC − Cx2i is equal to
α,ǫ aα,ǫ,w0x
α(x2i − (x
2)cǫw0 plus a
14 TA KHONGSAP AND WEIQIANG WANG
linear combination of monomials not involving w0, hence nonzero. This
contradicts with the fact that C is central. So we can write C =
xαcǫ.
Since xiC = Cxi for each i, then (3.2) forces C to be in C[x1, . . . , xn].
Now by (3.2) and ciC = Cci for each i we have that C ∈ C[x21, . . . , x2n].
Since siC = Csi for each i, we then deduce from Proposition 3.8 for type
Dn or Proposition 3.10 for type Bn that C ∈ C[x21, . . . , x2n]W .
This completes the proof of the proposition. �
3.6. The intertwiners in HcW . In this subsection, we will define the inter-
twiners in the degenerate affine Hecke-Clifford algebras HcW .
The following intertwiners φi ∈ HcW (with u = 1) for W = WAn−1 were
introduced by Nazarov [Naz] (also cf. [Kle]), where 1 ≤ i ≤ n− 1:
φi = (x
i+1 − x2i )si − u(xi+1 + xi)− u(xi+1 − xi)cici+1. (3.14)
A direct computation using (3.4) provides another equivalent formula for φi:
φi = si(x
i − x2i+1) + u(xi+1 + xi) + u(xi+1 − xi)cici+1.
We define the intertwiners φi ∈ HcW for W = WDn (1 ≤ i ≤ n) by the
same formula (3.14) for 1 ≤ i ≤ n− 1 and in addition by letting
φn ≡ φDn = (x2n − x2n−1)sn + u(xn − xn−1)− u(xn + xn−1)cn−1cn. (3.15)
We define the intertwiners φi ∈ HcW for W = WBn (1 ≤ i ≤ n) by the
same formula (3.14) for 1 ≤ i ≤ n− 1 and in addition by letting
φn ≡ φBn = 2x2nsn +
2vxn. (3.16)
The following generalizes the type An−1 results of Nazarov [Naz].
Theorem 3.13. Let W be either WAn−1 , WDn, or WBn. The intertwiners
φi (with 1 ≤ i ≤ n− 1 for type An−1 and 1 ≤ i ≤ n for the other two types)
satisfy the following properties:
(1) φ2i = 2u
2(x2i+1 + x
i )− (x2i+1 − x2i )2 (1 ≤ i ≤ n− 1,∀W );
(2) φ2n = 2u
2(x2n + x
n−1)− (x2n − x2n−1)2, for type Dn;
(3) φ2n = 4x
n − 2v2x2n, for type Bn;
(4) φif = f
siφi (∀f ∈ C[x1, . . . , xn],∀i,∀W );
(5) φicj = c
j φi (1 ≤ j ≤ n,∀i,∀W );
(6) φiφjφi · · ·︸ ︷︷ ︸
= φjφiφj · · ·︸ ︷︷ ︸
Proof. Part (1) follows by a straightforward computation and can also be
found in [Naz] (with u = 1). Part (2) follows from (1) by applying the
involution σ defined in Proposition 3.7. Part (3) and (5) follow by a direct
verification.
Part (4) for WAn−1 follows from clearing the denominators in the formula
in Proposition 3.3 and then rewriting in terms of φi as defined in (3.14).
Similarly, (4) for WDn and WBn follows by rewriting the formulas given in
Proposition 3.8 in type D and Proposition 3.10 in type B, respectively.
THE CLASSICAL SPIN AFFINE HECKE ALGEBRAS 15
It remains to prove (6) which is less trivial. Recall that
mij︷ ︸︸ ︷
sisjsi · · · =
mij︷ ︸︸ ︷
sjsisj · · ·,
(denoting this element by w). Let IND be the subalgebra of HcW generated
by C[x1, . . . , xn] and Cn. Denote by ≤ the Bruhat ordering on W . Then we
can write
φiφjφi · · · = fw +
pu,wu
for some f ∈ C[x1, . . . , xn], and pu,w ∈ IND. We may rewrite
φiφjφi · · · = fw +
r′u,wφu
where φu := φaφb · · · for any subword u = sasb · · · of w = sisjsi · · · ,
and r′u,w is in some suitable localization of IND with the central element∏
1≤k≤n x
1≤i<j≤n(x
i − x2j) ∈ IND being invertible. Note that such
a localization is a free module over the corresponding localized ring of
C[x1, . . . , xn]. We can then write
φjφiφj · · · = fw +
r′′u,wφu
with the same coefficient of w as for φiφjφi · · ·, according to Lemma 3.14.
The difference ∆ := (φiφjφi · · · − φjφiφj · · ·) is of the form
ru,wφu
for some ru,w. Observe by (4) that ∆p = p
w∆ for any p ∈ C[x1, . . . , xn].
Then we have
pwru,wφu = p
w∆ = ∆p =
ru,wφup =
ru,wp
In other words, (pw − pu)ru,w = 0 for all p ∈ C[x1, . . . , xn] for each given
u < w. This implies that ru,w = 0 for each u, and ∆ = 0. This completes
the proof of (6) modulo Lemma 3.14 below. �
Lemma 3.14. The following identity holds:
i · · ·︸ ︷︷ ︸
= φ0jφ
j · · ·︸ ︷︷ ︸
where φ0i denotes the specialization φi|u=0 of φ at u = 0 (or rather φBn |v=0
when i = n in the type Bn case.)
16 TA KHONGSAP AND WEIQIANG WANG
Proof. Let W =WBn . For 1 ≤ i ≤ n− 1, mi,i+1 = 3. So we have
i = (x
i+1 − x2i )si(x2i+2 − x2i+1)si+1(x2i+1 − x2i )si
= (x2i+1 − x2i )(x2i+2 − x2i )(x2i+2 − x2i+1)sisi+1si
= (x2i+2 − x2i+1)(x2i+2 − x2i )(x2i+1 − x2i )si+1sisi+1
= (x2i+2 − x2i+1)si+1(x2i+1 − x2i )si(x2i+2 − x2i+1)si+1
= φ0i+1φ
Note that mij = 2 for j 6= i, i+ 1; clearly, in this case, φ0iφ0j = φ0jφ0i .
Noting that mn−1,n = 4, we have
φ0n−1φ
n = 4(x
n − x2n−1)sn−1x2nsn(x2n − x2n−1)sn−1x2nsn
= 4(x2n − x2n−1)x2n−1(x2n−1 − x2n)x2nsn−1snsn−1sn
= 4x2n(x
n − x2n−1)x2n−1(x2n−1 − x2n)snsn−1snsn−1
= 4x2nsn(x
n − x2n−1)sn−1x2nsn(x2n − x2n−1)sn−1
= φ0nφ
This completes the proof for type Bn.
The similar proofs for types An−1 and Dn are skipped. �
Theorem 3.13 implies that for every w ∈ W we have a well-defined el-
ement φw ∈ HcW given by φw = φi1 · · ·φim where w = si1 · · · sim is any
reduced expression for w. These elements φw should play an important role
for the representation theory of the algebras HcW . It will be very interest-
ing to classify the simple modules of HcW and to find a possible geometric
realization. This was carried out by Lusztig [Lu1, Lu2, Lu3] for the usual
degenerate affine Hecke algebra case.
4. Degenerate spin affine Hecke algebras
In this section we will introduce the degenerate spin affine Hecke algebra
when W is the Weyl group of types Dn or Bn, and then establish the
connections with the corresponding degenerate affine Hecke-Clifford algebras
HcW . See [W1] for the type A case.
4.1. The skew-polynomial algebra. We shall denote by C[b1, . . . , bn] the
C-algebra generated by b1, . . . , bn subject to the relations
bibj + bjbi = 0 (i 6= j).
This is naturally a superalgebra by letting each bi be odd. We will refer to
this as the skew-polynomial algebra in n variables. This algebra has a linear
basis given by bα := bk11 · · · bknn for α = (k1, . . . , kn) ∈ Zn+, and it contains a
polynomial subalgebra C[b21, . . . , b
THE CLASSICAL SPIN AFFINE HECKE ALGEBRAS 17
4.2. The algebra H−
of type Dn. Recall that the spin Weyl group CW
associated to a Weyl group W is generated by t1, . . . , tn subject to the rela-
tions as specified in Example 2.1.
Definition 4.1. Let u ∈ C and let W = WDn . The degenerate spin affine
Hecke algebra of type Dn, denoted by H
or H−
, is the algebra generated
by C[b1, . . . , bn] and CW
− subject to the following relations:
tibi + bi+1ti = u (1 ≤ i ≤ n− 1)
tibj = −bjti (j 6= i, i+ 1, 1 ≤ i ≤ n− 1)
tnbn + bn−1tn = u
tnbi = −bitn (i 6= n− 1, n).
The algebra H−
is naturally a superalgebra by letting each ti and bi be
odd generators. It contains the type An−1 degenerate spin affine Hecke
algebra H−
(generated by b1, . . . , bn, t1, . . . , tn−1) as a subalgebra.
Proposition 4.2. The algebra H−
admits anti-involutions τ1, τ2 defined by
τ1 : ti 7→ −ti, bi 7→ −bi (1 ≤ i ≤ n);
τ2 : ti 7→ ti, bi 7→ bi (1 ≤ i ≤ n).
Also, the algebra H−
admits an involution σ which swaps tn−1 and tn while
fixing all the remaining generators ti, bi.
Proof. Note that we use the same symbols τ1, τ2, σ to denote the (anti-)
involutions for H−
and HcDn in Proposition 3.7, as those on H
are the
restrictions from those on HcDn via the isomorphism in Theorem 4.4 below.
The proposition is thus established via the isomorphism in Theorem 4.4, or
follows by a direct computation as in the proof of Proposition 3.7. �
4.3. The algebra H−
of type Bn.
Definition 4.3. Let u, v ∈ C, and W = WBn . The degenerate spin affine
Hecke algebra of type Bn, denoted by H
or H−
, is the algebra generated
by C[b1, . . . , bn] and CW
− subject to the following relations:
tibi + bi+1ti = u (1 ≤ i ≤ n− 1)
tibj = −bjti (j 6= i, i+ 1, 1 ≤ i ≤ n− 1)
tnbn + bntn = v
tnbi = −bitn (i 6= n).
Sometimes, we will write H−
(u, v) or H−
(u, v) for H−
or H−
to indicate
the dependence on the parameters u, v.
18 TA KHONGSAP AND WEIQIANG WANG
4.4. A superalgebra isomorphism.
Theorem 4.4. Let W =WDn or W =WBn . Then,
(1) there exists an isomorphism of superalgebras
Φ : HcW−→Cn ⊗ H−W
which extends the isomorphism Φ : Cn ⋊ CW −→ Cn ⊗ CW− (in
Theorem 2.4) and sends xi 7−→
−2cibi for each i;
(2) the inverse Ψ : Cn⊗H−W−→HcW extends Ψ : Cn⊗CW− −→ Cn⋊CW
(in Theorem 2.4) and sends bi 7−→
cixi for each i.
Theorem 4.4 also holds for WAn−1 (see [W1]).
Proof. We only need to show that Φ preserves the defining relations in HcW
which involve xi’s.
Let W = WDn . Here, we will verify two such relations below. The
verification of the remaining relations is simpler and will be skipped. For
1 ≤ i ≤ n− 1, we have
Φ(xi+1si − sixi) = ci+1bi+1(ci − ci+1)ti − (ci − ci+1)ticibi
= (1− ci+1ci)bi+1ti + (1− ci+1ci)tibi
= u(1− ci+1ci),
Φ(snxn + xn−1sn) = (cn−1 + cn)tncnbn + cn−1bn−1(cn−1 + cn)tn
= −(1 + cn−1cn)tnbn − (1 + cn−1cn)bn−1tn
= −u(1 + cn−1cn).
Now let W = WBn . For 1 ≤ i ≤ n − 1, as in the proof in type Dn, we
have Φ(xi+1si − sixi) = u(1− ci+1ci). Moreover, we have
Φ(snxn + xnsn) =
cntncnbn +
cnbncntn
2cntncnbn +
2cnbncntn
2(tnbn + bncn) = −
Φ(snxj) =
cntncjbj =
2cntncjbj
2cjcntnbj =
2cjbjcntn = Φ(xjsn), for j 6= n.
Thus Φ is a homomorphism of (super)algebras. Similarly, we check that
Ψ is a superalgebra homomorphism. Observe that Φ and Ψ are inverses on
generators and hence they are indeed (inverse) isomorphisms. �
THE CLASSICAL SPIN AFFINE HECKE ALGEBRAS 19
4.5. PBW basis for H−
. Note that H−
contains the skew-polynomial
algebra C[b1, . . . , bn] and the spin Weyl group algebra CW
− as subalgebras.
We have the following PBW basis theorem for H−
Theorem 4.5. Let W = WDn or W = WBn. The multiplication of the
subalgebras CW− and C[b1, . . . , bn] induces a vector space isomorphism
C[b1, . . . , bn]⊗ CW−
≃−→ H−
Theorem 4.5 also holds for WAn−1 (see [W1]).
Proof. It follows from the definition that H−
is spanned by the elements of
the form bασ where σ runs over a basis for CW− and α ∈ Zn+. By Theo-
rem 4.4, we have an isomorphism ψ : Cn⊗H−W−→HcW . Observe that the im-
age ψ(bασ) are linearly independent in HcW by the PBW basis Theorem 3.11
for HcW . Hence the elements b
ασ are linearly independent in H−
4.6. The even center for H−
Proposition 4.6. Let W =WDn or W = WBn . The even center of H
isomorphic to C[b21, . . . , b
Proof. By the isomorphism Φ : HcW → Cn⊗H
(see Theorems 4.4) and the
description of the center Z(HcW ) (see Proposition 3.12), we have
Z(Cn ⊗ H−W ) = Φ(Z(H
W )) = Φ(C[x
1, . . . , x
W ) = C[b21, . . . , b
Thus, C[b21, . . . , b
W ⊆ Z(H−
Now let C ∈ Z(H−
). Since C is even, C commutes with Cn and thus com-
mutes with the algebra Cn ⊗ H−W . Then Ψ(C) ∈ Z(H
W ) = C[x
1, . . . , x
and thus, C = ΦΨ(C) ∈ Φ(C[x21, . . . , x2n]W ) = C[b21, . . . , b2n]W . �
In light of the isomorphism Theorem 4.4, the problem of classifying the
simple modules of the spin affine Hecke algebra H−
is equivalent to the
classification problem for the affine Hecke-Clifford algebra HcW . It remains
to be seen whether it is more convenient to find the geometric realization of
instead of HcW .
4.7. The intertwiners in H−
. The intertwiners Ii ∈ H−W (1 ≤ i ≤ n− 1)
for W =WAn−1 were introduced in [W1] (with u = 1):
Ii = (b
i+1 − b2i )ti − u(bi+1 − bi). (4.1)
The commutation relations in Definition 4.1 gives us another equivalent
expression for Ii:
Ii = ti(b
i − b2i+1) + u(bi+1 − bi).
We define the intertwiners Ii ∈ H−W for W = WDn (1 ≤ i ≤ n) by the
same formula (4.1) for 1 ≤ i ≤ n− 1 and in addition by letting
In ≡ IDn = (b2n − b2n−1)tn − u(bn − bn−1). (4.2)
20 TA KHONGSAP AND WEIQIANG WANG
Also, we define the intertwiners Ii ∈ H−W for W = WBn (1 ≤ i ≤ n) by
the same formula (4.1) for 1 ≤ i ≤ n− 1 and in addition by letting
In ≡ IBn = 2b2ntn − vbn. (4.3)
Proposition 4.7. The following identities hold in H−
, for W = WAn−1 ,
WBn, or WDn:
(1) Iibi = −bi+1Ii,Iibi+1 = −biIi, and Iibj = −bjIi (j 6= i, i + 1), for
1 ≤ i ≤ n− 1, 1 ≤ j ≤ n, and any W ;
In addition,
(2) Inbn−1 = −bnIn,Inbn = −bn−1In, and Inbi = −biIn (i 6= n− 1, n),
for type Dn;
(3) Inbn = −bnIn, and Inbi = −biIn (i 6= n), for type Bn.
Proof. (1) We first prove the case when j = i:
Iibi = (b
i+1 − b2i )tibi − u(bi+1 − bi)bi
= (b2i+1 − b2i )(−bi+1ti + u)− u(bi+1bi − b2i )
= −bi+1
(b2i+1 − b2i )ti − u(bi+1 − bi)
= −bi+1Ii.
The proof for Iibi+1 = −biIi is similar and thus skipped.
For j 6= i, i+ 1, we have tibj = −bjti, and hence Iibj = −bjIi.
(2) We prove only the first identity. The proofs of the remaining two
identities are similar and will be skipped.
Inbn−1 = (b
n − b2n−1)tnbn−1 − u(bn − bn−1)bn−1
= (b2n − b2n−1)(−bntn + u)− u(bnbn−1 − b2n−1)
= −bn
(b2n − b2n−1)tn − u(bn − bn−1)
= −bnIn.
The proof of (3) is analogous to (2), and is thus skipped. �
Recall the superalgebra isomorphism Φ : HcW−→Cn ⊗H
defined in Sec-
tion 4 and the elements βi ∈ Cn defined in Section 2.
Theorem 4.8. Let W be either WAn−1 , WDn, or WBn . The isomorphism
Φ : HcW −→ Cn ⊗H
sends φi 7→ −2
−1βiIi for each i. More explicitly, Φ
sends
φi 7−→ −
−2(ci − ci+1)⊗ Ii (1 ≤ i ≤ n− 1);
φn 7−→ −
−2(cn−1 + cn)⊗ In for type Dn;
φn 7−→ −2
−1cn ⊗ In for type Bn.
THE CLASSICAL SPIN AFFINE HECKE ALGEBRAS 21
Proof. Recall that the isomorphism Φ sends si 7→ −
−1βiti, xi 7→
−2cibi
for each i. So, for 1 ≤ i ≤ n− 1, we have the following
Φ(φi) = Φ
(x2i+1 − x2i )si − u(xi+1 + xi)− u(xi+1 − xi)cici+1
−2(ci − ci+1)(b2i+1 − b2i )ti − u
−2(ci+1bi+1 − cibi)
−2(ci+1bi+1 − cibi)cici+1
−2(ci − ci+1)
(b2i+1 − b2i )ti − u(bi+1 − bi)
−2(ci − ci+1)⊗ Ii.
Next for φn ∈ HcDn , we have
Φ(φn) = Φ
(x2n − x2n−1)sn + u(xn − xn−1)− u(xn + xn−1)cn−1cn
−2(cn + cn−1)(b2n − b2n−1)tn + u
−2(cnbn − cn−1bn−1)
−2(cnbn − cn−1bn−1)cn−1cn
−2(cn + cn−1)
(b2n − b2n−1)tn − u(bn − bn−1)
−2(cn−1 + cn)⊗ In.
We skip the computation for φn ∈ HcBn which is very similar but less
complicated. �
Proposition 4.9. The following identities hold in H−
, for W = WAn−1 ,
WBn, or WDn:
(1) I2i = u
2(b2i+1 + b
i )− (b2i+1 − b2i )2, for 1 ≤ i ≤ n− 1 and every type of
(2) I2n = u
2(b2n + b
n−1)− (b2n − b2n−1)2, for type Dn.
(3) I2n = 4b
n − v2b2n, for type Bn.
Proof. It follows from the counterparts in Theorem 3.13 via the explicit
correspondences under the isomorphism Φ (see Theorem 4.8). It can of
course also be proved by a direct computation. �
Proposition 4.10. For W =WAn−1, WBn , or WDn , we have
IiIjIi · · ·︸ ︷︷ ︸
= (−1)mij+1 IjIiIj · · ·︸ ︷︷ ︸
Proof. By Theorem 2.2, we have
βiβjβi · · ·︸ ︷︷ ︸
= (−1)mij+1 βjβiβj · · ·︸ ︷︷ ︸
Now the statement follows from the above equation and Theorem 3.13 (6)
via the correspondence of the intertwiners under the isomorphism Φ (see
Theorem 4.8). �
Remark 4.11. Proposition 4.7, Theorem 4.8, and Proposition 4.10 for H−
can be found in [W1].
22 TA KHONGSAP AND WEIQIANG WANG
5. Degenerate covering affine Hecke algebras
In this section, the degenerate covering affine Hecke algebras associated
to the double covers W̃ of classical Weyl groups W are introduced. It has
as its natural quotients the usual degenerate affine Hecke algebras HW [Dr,
Lu1, Lu2] and the spin degenerate affine Hecke algebras H−
introduced by
the authors.
Recall the distinguished double cover W̃ of a Weyl group W from Sec-
tion 2.2.
5.1. The algebra H∼W of type An−1.
Definition 5.1. Let W = WAn−1 , and let u ∈ C. The degenerate covering
affine Hecke algebra of type An−1, denoted by H
W or H
, is the algebra
generated by x̃1, . . . , x̃n and z, t̃1, . . . , t̃n−1, subject to the relations for W̃
and the additional relations:
zx̃i = x̃iz, z is central of order 2 (5.1)
x̃ix̃j = zx̃jx̃i (i 6= j) (5.2)
t̃ix̃j = zx̃j t̃i (j 6= i, i + 1) (5.3)
t̃ix̃i+1 = zx̃it̃i + u. (5.4)
Clearly H∼W contains CW̃ as a subalgebra.
5.2. The algebra H∼W of type Dn.
Definition 5.2. Let W = WDn , and let u ∈ C. The degenerate covering
affine Hecke algebra of type Dn, denoted by H
W or H
, is the algebra
generated by x̃1, . . . , x̃n and z, t̃1, . . . , t̃n, subject to the relations (5.1–5.4)
and the following additional relations:
t̃nx̃i = zx̃it̃n (i 6= n− 1, n)
t̃nx̃n = −x̃n−1t̃n + u.
5.3. The algebra H∼W of type Bn.
Definition 5.3. Let W = WBn , and let u, v ∈ C. The degenerate covering
affine Hecke algebra of type Bn, denoted by H
W or H
, is the algebra
generated by x̃1, . . . , x̃n and z, t̃1, . . . , t̃n, subject to the relations (5.1–5.4)
and the following additional relations:
t̃nx̃i = zx̃it̃n (i 6= n)
t̃nx̃n = −x̃nt̃n + v.
5.4. PBW basis for H∼W .
Proposition 5.4. Let W = WAn−1 ,WDn , or WBn . Then the quotient of
the covering affine Hecke algebra H∼W by the ideal 〈z − 1〉 (respectively, by
the ideal 〈z+1〉) is isomorphic to the usual degenerate affine Hecke algebras
HW (respectively, the spin degenerate affine Hecke algebras H
THE CLASSICAL SPIN AFFINE HECKE ALGEBRAS 23
Proof. Follows by the definitions in terms of generators and relations of all
the algebras involved. �
Theorem 5.5. Let W = WAn−1 ,WDn , or WBn. Then the elements x̃
where α ∈ Zn+ and w̃ ∈ W̃ , form a basis for H∼W (called a PBW basis).
Proof. By the defining relations, it is easy to see that the elements x̃αw̃
form a spanning set for H∼W . So it remains to show that they are linearly
independent.
For each element t ∈ W , denote the two preimages in W̃ of t by {t̃, zt̃}.
Now suppose that
aα,t̃x̃
αt̃+ bα,t̃zx̃
Let I+ and I− be the ideals of H∼W generated by z−1 and z+1 respectively.
Then by Proposition 5.4, H∼W /I
+ ∼= HW and H∼W /I− ∼= H
. Consider the
projections:
Υ+ : H
W −→ H∼W /I+, Υ− : H∼W −→ H∼W /I−.
By abuse of notation, denote the image of x̃α in HW by x
α. Observe that
0 = Υ+
(aα,t̃x̃
αt̃+ bα,t̃x̃
αzt̃)
(aα,t̃ + bα,t̃)x
αt ∈ HW .
Since it is known [Lu1] that {xαt|α ∈ Zn+ and t ∈ W} form a basis for
the usual degenerate affine Hecke algebra HW , aα,t̃ = −bα,t̃ for all α and t.
Similarly, denoting the image in CW− of t̃ by t̄, we have
0 = Υ−
(aα,t̃x̃
αt̃+ bα,t̃x̃
αzt̃)
(aα,t̃ − bα,t̃)x
αt̄ ∈ H−
Since {xαt̄} is a basis for the spin degenerate affine Hecke algebra H−
have aα,t̃ = bα,t̃ for all α and t. Hence, aα,t̃ = bα,t̃ = 0, and the linear
independence is proved. �
References
[BK] J. Brundan, A Kleshchev, Hecke-Clifford superalgebras, crystals of type A
2l and
modular branching rules for Ŝn, Represent. Theory 5 (2001), 317–403.
[Dr] V. Drinfeld, Degenerate affine Hecke algebras and Yangians, Funct. Anal. Appl.
20 (1986), 58–60.
[EG] P. Etingof and V. Ginzburg, Symplectic reflection algebras, Calogero-Moser space,
and deformed Harish-Chandra homomorphism, Invent. Math. 147 (2002), 243–348.
[IY] S. Ihara and T. Yokonuma, On the second cohomology groups (Schur multipliers)
of finite reflection groups, J. Fac. Sci. Univ. Tokyo, Sect. IA Math. 11 (1965),
155–171.
[JN] A. Jones and M. Nazarov, Affine Sergeev algebra and q-analogues of the Young
symmetrizers for projective representations of the symmetric group, Proc. London
Math. Soc. 78 (1999), 481–512.
[Joz] T. Józefiak, A class of projective representations of hyperoctahedral groups and
Schur Q-functions, Topics in Algebra, Banach Center Publ., 26, Part 2, PWN-
Polish Scientific Publishers, Warsaw (1990), 317–326.
[Kar] G. Karpilovsky, The Schur multiplier, London Math. Soc. Monagraphs, New Series
2, Oxford University Press, 1987.
24 TA KHONGSAP AND WEIQIANG WANG
[Kle] A. Kleshchev, Linear and projective representations of symmetric groups, Cam-
bridge Tracts in Mathematics 163, Cambridge University Press, 2005.
[KW] T. Khongsap and W. Wang, Hecke-Clifford algebras and spin Hecke algebras II:
the rational double affine type, Preprint 2007.
[Lu1] G. Lusztig, Affine Hecke algebras and their graded version, J. Amer. Math. Soc. 2
(1989), 599–635.
[Lu2] ———, Cuspidal local systems and graded Hecke algebras I, Publ. IHES 67 (1988),
145–202.
[Lu3] ———, Cuspidal local systems and graded Hecke algebras III, Represent. Theory
6 (2002), 202–242.
[Mo] A. Morris, Projective representations of reflection groups, Proc. London Math. Soc
32 (1976), 403–420.
[Naz] M. Nazarov, Young’s symmetrizers for projective representations of the symmetric
group, Adv. in Math. 127 (1997), 190–257.
[Sch] I. Schur, Über die Darstellung der symmetrischen und der alternierenden Gruppe
durch gebrochene lineare Substitutionen, J. reine angew. Math. 139 (1911), 155–
[Ser] A. Sergeev, The Howe duality and the projective representations of symmetric
groups, Represent. Theory 3 (1999), 416–434.
[St] J. Stembridge, The projective representations of the hyperoctahedral group, J. Al-
gebra 145 (1992), 396–453.
[W1] W. Wang, Double affine Hecke algebras for the spin symmetric group, preprint
2006, math.RT/0608074.
[W2] ———, Spin Hecke algebras of finite and affine types, Adv. in Math. 212 (2007),
723–748.
[Yam] M. Yamaguchi, A duality of the twisted group algebra of the symmetric group and
a Lie superalgebra, J. Algebra 222 (1999), 301–327.
Department of Math., University of Virginia, Charlottesville, VA 22904
E-mail address: tk7p@virginia.edu (Khongsap); ww9c@virginia.edu (Wang)
http://arxiv.org/abs/math/0608074
	1. Introduction
	1.1. 
	1.2. 
	1.3. 
	1.4. 
	1.5. 
	2. Spin Weyl groups and Clifford algebras
	2.1. The Weyl groups
	2.2. A distinguished double covering of Weyl groups
	2.3. The Clifford algebra CW
	2.4. The basic spin supermodule
	2.5. A superalgebra isomorphism
	3. Degenerate affine Hecke-Clifford algebras
	3.1. The algebra HcW of type An-1
	3.2. The algebra HcW of type Dn
	3.3. The algebra HcW of type Bn
	3.4. PBW basis for HcW
	3.5. The even center for HcW
	3.6. The intertwiners in HcW
	4. Degenerate spin affine Hecke algebras
	4.1. The skew-polynomial algebra
	4.2. The algebra H-W of type Dn
	4.3. The algebra H-W of type Bn
	4.4. A superalgebra isomorphism
	4.5. PBW basis for H-W
	4.6. The even center for H-W
	4.7. The intertwiners in H-W
	5. Degenerate covering affine Hecke algebras
	5.1. The algebra HW of type An-1
	5.2. The algebra HW of type Dn
	5.3. The algebra HW of type Bn
	5.4. PBW basis for HW
	References
ABSTRACT
  Associated to the classical Weyl groups, we introduce the notion of
degenerate spin affine Hecke algebras and affine Hecke-Clifford algebras. For
these algebras, we establish the PBW properties, formulate the intertwiners,
and describe the centers. We further develop connections of these algebras with
the usual degenerate (i.e. graded) affine Hecke algebras of Lusztig by
introducing a notion of degenerate covering affine Hecke algebras.

<|endoftext|><|startoftext|>
Introduction
The discovery of new models of quantum computation (QC), such as the one-way quantum
computer [7] and the projective measurement-based model [4], have opened up new
experimental avenues toward the realisation of a quantum computer in laboratories. At the
same time they have challenged the traditional view about the nature of quantum computation.
Since the introduction of the quantum Turing machine by Deutsch [1], unitary
transformations plays a central rôle in QC. However, it is known that the action of unitary
gates can be simulated using the process of quantum teleportation based on projective
measurements-only [4]. Characterizing the minimal resources that are sufficient for this model
to be universal, is a key issue.
Resources of measurement-based quantum computations are composed of two
ingredients: (i) a universal family of observables, which describes the measurements that can
be performed (ii) the number of ancillary qubits used to simulate any unitary transformation.
Successive improvements of the upper bounds on these minimal resources have been
made by Leung and others [2, 3]. These bounds have been significantly reduced when the
state transfer (which is a restricted form of teleportation) has been introduced: one two-qubit
observable (Z ⊗ X) and three one-qubit observables (X , Z and (X + Y )/
2), associated
with only one ancillary qubit, are sufficient for simulating any unitary-based QC [6]. Are
these resources minimal ? In [5], a sub-family of observables (Z ⊗X , Z, and (X − Y )/
is proved to be universal, however two ancillary qubits are used to make this sub-family
universal.
These two results lead to an open question : is there a trade-off between observables and
ancillary qubits in measurement-based QC ? In this paper, we reply in the negative to this
http://arxiv.org/abs/0704.0202v1
Towards Minimal Resources of Measurement-based Quantum Computation 2
open question by proving that the sub-family {Z ⊗ X,Z, (X − Y )/
2} is universal using
only one ancillary qubit, improving the upper bound on the minimal resources required for
measurement-based QC.
2. Measurement-based QC
The simulation of a given unitary transformation U by means of projective measurements can
be decomposed into:
• (Step of simulation) First, U is probabilistically simulated up to a Pauli operator, leading
to σU , where σ is either identity or a Pauli operator σx, σy, or σz .
• (Correction) Then, a corrective strategy consisting in combinig conditionally steps of
simulation is used to obtain a non-probabilistic simulation of U .
Teleportation can be realized by two successive Bell measurements (figure 1), where a
Bell measurement is a projective measurement in the basis of the Bell states {|Bij〉}i,j∈{0,1},
where |Bij〉 = 1√
(σiz ⊗ σjx)(|00〉+ |11〉). A step of simulation of U is obtained by replacing
the second measurement by a measurement in the basis {(U † ⊗ Id)|Bij〉}i,j∈{0,1} (figure 2).
Figure 1. Bell measurement-based teleportation
ΦUσ ΦUσ
Figure 2. Simulation of U up to a Pauli operator
If a step of simulation is represented as a probabilistic black box (figure 3, left), there
exists a corrective strategy (figure 3, right) which leads to a full simulation of U . This strategy
consists in conditionally composing steps of simulation of U , but also of each Pauli operator.
A similar step of simulation and strategy are given for the two-qubit unitary transformation
ΛX (Controlled-X) in [4]. Notice that this simulation uses four ancillary qubits.
As a consequence, since any unitary transformation can be decomposed into ΛX and
one-qubit unitary transformations, any unitary transformation can be simulated by means
of projective measurements only. More precisely, for any circuit C of size n – with basis
ΛX and all one-qubit unitary transformations – and for any ǫ > 0, O(n log(n/ǫ)) projective
measurements are enough to simulate C with probability greater than 1− ǫ.
Towards Minimal Resources of Measurement-based Quantum Computation 3
Φ Φσy σz
Figure 3. Left: step of simulation abstracted into a probabilistic black box representation –
Rigth: conditional composition of steps of simulation.
Approximative universality, based on a finite family of projective measurements, can
also be considered. Leung [3] has shown that a family composed of five observables
F0 = {Z,X ⊗ X,Z ⊗ Z,X ⊗ Z, 1√
(X − Y ) ⊗ X} is approximatively universal, using
four ancillary qubits. It means that for any unitary transformation U , any ǫ > 0 and any
δ > 0, there exists a conditional composition of projective measurements from F0 leading to
the simulation of a unitary transformation Ũ with probability greater than 1− ǫ and such that
||U − Ũ || < δ.
Figure 4. State transfer
ΦUσVb
X Φ U ZU
ΦUσVVXVb
Figure 5. Step of simulation based on state transfer
In order to decrease the number of two-qubit measurements – four inF0 – and the number
of ancillary, an new scheme called state transfer has been introduced [6]. The state transfer
(figure 4) replaces the teleportation scheme for realizing a step of simulation. Composed
of one two-qubit measurements, two one-qubit measurements, and using only one ancillary
qubit, the state transfer can be used to simulate any one-qubit unitary transformation up to a
Pauli operator (figure 5). For instance, simulations ofH andHT – see section 3 for definitions
of H and T – are represented in figure 6. Moreover a scheme composed of two two-qubit
measurements, two one-qubit measurements, and using only one ancillary qubit can be used
to simulated ΛX up to a Pauli operator (figure 7). Since {H, T,ΛX} is a universal family
of unitary transformations, the family F1 = {Z ⊗ X,X,Z, 1√
(X − Y )} of observables is
approximatively universal, using one ancillary qubit [6]. This result improves the result by
Leung, since only one two-qubit measurement and one ancillary qubit are used, instead of four
two-qubit measurements and four ancillary qubits. Moreover, one can prove that at least one
Towards Minimal Resources of Measurement-based Quantum Computation 4
two-qubit measurement and one ancillary qubit are required for approximative universality.
Thus, it turns out that the upper bound on the minimal resources for measurement-based QC
differs form the lower bound, on the number of one-qubit measurements only.
b Z X
Z X−Y/  2Φ
b Z X
Figure 6. Simulation of H and HT up to a Pauli operator.
σ Λ X
Figure 7. Simulation of ΛX up to a Pauli operator.
In [5], it has been shown that the number of these one-qubit measurements can be
decreased, since the family F2 = {Z ⊗ X,Z, 1√
(X − Y )}, composed of one two-qubit
and only two one-qubit measurements, is also approximatively universal, using two ancillary
qubit. The proof is based on the simulation of X-measurements by means of Z and Z ⊗ X
measurements (figure 8). This result leads to a possible trade-off between the number
of one-qubit measurements and the number of ancillary qubits required for approximative
universality.
Figure 8. X-measurement simulation
In this paper, we meanly prove that the family F2 is approximatively universal, using
only one ancillary qubit. Thus, the upper bound on the minimal resources required for
approximative universality is improved, and moreover we answer the open question of the
trade-off between observables and ancillary qubits. Notice that we prove that the trade-off
conjectured in [5] does not exist, but another trade-off between observables and ancillary
qubits may exist since the bounds on the minimal resources for measurement-based quantum
computation are not tight.
3. Universal family of unitary transformations
There exist several universal families of unitary transformations, for instance {H, T,ΛX} is
one of them:
Towards Minimal Resources of Measurement-based Quantum Computation 5
H = 1√
, T =
, ΛX =
1 0 0 0
0 1 0 0
0 0 0 1
0 0 1 0
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 −1
We prove that the family {HT, σy,ΛZ} is also approximatively universal:
Theorem 1 U = {HT, σy,ΛZ} is approximatively universal.
The proof is based on the following properties. Let R
(α) be the rotation of the Bloch
sphere about the axis n through an angle α.
Proposition 1 If n = (a, b, c) is a real unit vector, then for any α, R
(α) = cos(α/2)I −
i sin(α/2)(aσx + bσy + cσz).
Proposition 2 For a given vector n of the Bloch sphere, if θ is an irrational multiple of π,
then for any α and any ǫ > 0, there exists k such that
(α)− R
(θ)k)|| < ǫ/3
Proposition 3 If n and m are non parallel vectors of the Bloch sphere, then for any one-
qubit unitary transformation U , there exists α, β, γ, δ such that:
U = eiαR
Proposition 4 (Włodarski [8]) If α is not an integer multiple of π/4 and cos β = cos2 α,
then either α or β is an irrational multiple of π.
Proof of theorem 1:
First we prove that any 1-qubit unitary transformation can be approximated by HT and
σyHT . Consider the unitary transformations U1 = T , U2 = HTH , U3 = σyHTHσy. Notice
that T is, up to an unimportant global phase, a rotation by π/4 radians around z axis on the
Block sphere:
U1 = T = e
−iπ/8(cos(π/8)I − i sin(π/8)σz)
U2 = HTH = e
−iπ/8(cos(π/8)I − i sin(π/8)σx)
U3 = σyHTHσy = e
−iπ/8(cos(π/8)I + i sin(π/8)σx)
Composing U1 and U2 gives, up to a global phase:
U2U1 = (cos(π/8)I − i sin(π/8)σx)(cos(π/8)I − i sin(π/8)σz)
= cos2(π/8)I − i[cos(π/8)(σx + σz)− sin(π/8)σy] sin(π/8)
Towards Minimal Resources of Measurement-based Quantum Computation 6
According to proposition 1, U2U1 is a rotation of the Bloch sphere about an axis along
n = (cos(π/8), − sin(π/8), cos(π/8)) and through an angle θ defined as a solution of
cos(θ/2) = cos2(π/8). Since π/8 is not an integer multiple of π/4 but a rational multiple
of π, according to proposition 4, a such θ is an irrational multiple of π. This irrationality
implies that for any angle α, the rotation around n about angle α can be approximated to
arbitrary accuracy by repeating rotations around n about angle θ (see proposition 3). For any
α and any ǫ > 0, there exists k such that
(α)− R
(θ)k)|| < ǫ/3
Moreover, composing U1 and U3 gives, up to a global phase:
U3U1 = (cos(π/8)I + i sin(π/8)σx)(cos(π/8)I − i sin(π/8)σz)
= cos2(π/8)I − i[cos(π/8)(−σx + σz) + sin(π/8)σy] sin(π/8)
U3U1 is a rotation of the Bloch sphere about an axis alongm = (− cos(π/8), sin(π/8), cos(π/8))
and through the angle θ. Thus, for any α and any ǫ > 0, there exists k such that
(α)− R
(θ)k)|| < ǫ/3
Since n and m are non-parallel, any one-qubit unitary transformation U , according to
proposition 2, can be decomposed into rotations around n and m : There exist α, β, γ, δ such
U = eiαR
Finally, for any U and ǫ > 0, there exist k1, k2, k3 such that
||U − R
(θ)k1R
(θ)k2R
(θ)k3 || < ǫ
Thus, any one-qubit unitary transformation can be approximated by means of U2U1, and
U3U1. Since U2U1 = (HT )(HT ) and U3U1 = σyHTHσyT = −(σyHT )(σyHT ), the family
{HT, σy} approximates any one-qubit unitary transformation.
With the additional ΛZ gate, the family U is approximatively universal. �
4. Universal family of projective measurements
In [5], the family of observables F2 = {Z ⊗ X,Z, X−Y√
} is proved to be approximatively
universal using two ancillary qubits. We prove that this family requires only one ancillary
qubit to be universal:
Theorem 2 F2 = {Z ⊗X,Z, X−Y√
} is approximatively universal, using one ancillary qubit
only.
The proof consists in simulating the unitary transformations of the universal family U .
First, one can notice that HT can be simulated up to a Pauli operator, using measurements of
F2, as it is depicted in figure 6. So, the universality is reduced to the ability to simulate ΛZ
and the Pauli operators – Pauli operators are needed to simulated σy ∈ F , but also to perform
the corrections required by the corrective strategy (figure 3).
Towards Minimal Resources of Measurement-based Quantum Computation 7
Lemma 5 For a given 2-qubit register a, b and one ancillary qubit c, the sequence of
measurements according to Zc, Za⊗Xc, Zc⊗Xb, and Zb (see figure 9) simulates ΛZ(Id⊗H)
on qubits a, b, up to a Pauli operator. The resulting state is located on qubits a and c.
Z(Id  H)Z
ZΦ Φc
Figure 9. Simulation of ΛZ(Id⊗H)
Proof: One can show that, if the state of the register a, b is |Φ〉 before the sequence of
measurements, the state of the register a, b after the measurements is σΛZ(Id⊗H)|Φ〉, where
σ = σs1z ⊗ σs3x σs2+s4z and si’s are the classical outcomes of the sequence of measurements. �
In order to simulate Pauli operators, a new scheme, different from the state transfer, is
introduced.
Lemma 6 For a given qubit b and one ancillary qubit a, the sequence of measurements Za,
Xa ⊗ Zb, and Za (figure 10) simulates, on qubit b, the application of σz with probability 1/2
and Id with probability 1/2.
Figure 10. Simulation of σ
Proof: Let |Φ〉 = α|0〉+ β|1〉 be the state of qubit b. After the first measurement, the state of
the register a, b is |ψ1〉 = (σs1x ⊗ Id)|0〉 ⊗ |Φ〉 where s1 ∈ {0, 1} is the classical outcome of
the measurement.
Since 〈ψ1|X ⊗ Z|ψ1〉 = 0, the state of the register a, b is:
|ψ2〉 =
(σs1x ⊗ Id)(Id+ (−1)s2X ⊗ Z)|0〉 ⊗ |Φ〉
(σs1x σ
z ⊗ Id)(|0〉 ⊗ |Φ〉+ |1〉 ⊗ (σz|Φ〉)
Let s3 ∈ {0, 1} be the outcome of the last measurement, on qubit a. If s1 = s3 then state
of the qubit b is |Φ〉, and σz|Φ〉 otherwise. One can prove that these two possibilities occur
with equal probabilities. �
Lemma 7 For a given qubit b and one ancillary qubit a, the sequence of measurements
, Za ⊗ Xb, and
(figure 11) simulates, on qubit b, the application of σx
with probability 1/2 and Id with probability 1/2.
Towards Minimal Resources of Measurement-based Quantum Computation 8
(X−Y)/  2 (X−Y)/  2
Figure 11. Simulation of σ
The proof of lemma 7 is similar to the proof of lemma 6.
Proof of theorem 2:
First notice that the family of unitary transformations U ′ = {HT, σy,ΛZ(I ⊗ H)} is
approximatively universal since U = {HT, σy,ΛZ} is universal.
HT and ΛZ(I⊗H) can be simulated up to a Pauli operator (lemmas 5). The universality
of the family of observables F2 = {Z ⊗ X,Z, X−Y√
} is reduced to the ability to simulate
any Pauli operators. Lemma 7 (resp. lemma 6), shows that σx (σz) can be simulated with
probability 1/2, moreover if the simulation fails, the resulting state is same as the original one.
Thus, this simulation can be repeated until a full simulation of σx (σz). Finally, σy = iσzσx
can be simulated, up to a global phase, by combining simulations of σx and σz. Thus,
F2 = {Z ⊗X,Z, X−Y√
} is approximatively universal using only one ancillary qubit. �
5. Conclusion
We have proved a new upper bound on the minimal resources required for measurement-
based QC: one two-qubit, and two one-qubit observables are universal, using one ancillary
qubit only. This new upper bound has experimental applications, but allows also to prove that
the trade-off between observables and ancillary qubits, conjectured in [5], does not exist. This
new upper bound is not tight since the lower bound on the minimal resources for this model
is one two-qubit observable and one ancillary qubit.
References
[1] D. Deutsch. Quantum theory, the Church-Turing principle and the universal quantum computer. Proc. R.
Soc. Lond. A, 400:97–117, 1985.
[2] S. A. Fenner and Y. Zhang. Universal quantum computation with two- and three-qubit projective
measurements, 2001.
[3] D. W. Leung. Quantum computation by measurements. IJQI, 2:33, 2004.
[4] M. A. Nielsen. Universal quantum computation using only projective measurement, quantum memory, and
preparation of the 0 state. Phys. Rev. A, 308:96–100, 2003.
[5] S. Perdrix. Qubit vs observable resouce trade-offs in measurement-based quantum computation. In
proceedings of Quantum communication measurement and computing, 2004.
[6] S. Perdrix. State transfer instead of teleportation in measurement-based quantum computation.
International Journal of Quantum Information, 3(1):219–223, 2005.
[7] R. Raussendorf, D. E. Browne, and H. J. Briegel. The one-way quantum computer - a non-network model
of quantum computation. Journal of Modern Optics, 49:1299, 2002.
[8] L. Wlodarski. On the equation cosα1+cosα2 cosα3+cosα4 = 0. Ann. Univ. Sci. Budapest. Eötvös Sect.
Math., 1969.
	Introduction
	Measurement-based QC
	Universal family of unitary transformations
	Universal family of projective measurements
	Conclusion
ABSTRACT
  We improve the upper bound on the minimal resources required for
measurement-based quantum computation. Minimizing the resources required for
this model is a key issue for experimental realization of a quantum computer
based on projective measurements. This new upper bound allows also to reply in
the negative to the open question about the existence of a trade-off between
observable and ancillary qubits in measurement-based quantum computation.

<|endoftext|><|startoftext|>
Introduction
	2 Spitzer census
	2.1 SED selected young stellar objects
	2.2 Class II census results
	2.2.1 Membership
	2.2.2 Completeness
	2.3 Protostellar census
	2.3.1 Low luminosity protostellar candidates
	2.3.2 MIPS survey of dark cloud cores near IC 348
	3 Analysis
	3.1 Spatial distribution of members
	3.2 Comparison of gas, mid-IR dust emission and young stars
	3.3 The protostars of the southern filament
	3.3.1 Spitzer & SCUBA correlations
	3.3.2 Clustering
	3.3.3 Near-Infrared Images
	3.4 Inferred cluster properties
	4 Discussion
	4.1 Total young star population of the IC 348 nebula
	4.2 Physical structure of the IC 348 star cluster
	4.3 History of star formation in the IC 348 nebula
	4.4 The origin & evolution of the IC 348 star cluster
	5 Conclusions
	A Extinction effects on 3-8m
	B Spectroscopy of new members
	B.1 Infrared Spectra
	B.2 Classification
	B.3 Optical Spectra
	C Class III membership
ABSTRACT
  We present a Spitzer based census of the IC 348 nebula and embedded star
cluster. Our Spitzer census supplemented by ground based spectra has added 42
class II T-Tauri sources to the cluster membership and identified ~20 class 0/I
protostars. The population of IC 348 likely exceeds 400 sources after
accounting statistically for unidentified diskless members. Our Spitzer census
of IC 348 reveals a population of protostars that is anti-correlated spatially
with the T-Tauri members, which comprise the centrally condensed cluster around
a B star. The protostars are instead found mostly at the cluster periphery
about 1 pc from the B star and spread out along a filamentary ridge. We find
that the star formation rate in this protostellar ridge is consistent with that
rate which built the exposed cluster while the presence of fifteen cold,
starless, millimeter cores intermingled with this protostellar population
indicates that the IC 348 nebula has yet to finish forming stars. We show that
the IC 348 cluster is of order 3-5 crossing times old, and, as evidenced by its
smooth radial profile and confirmed mass segregation, is likely relaxed. While
it seems apparent that the current cluster configuration is the result of
dynamical evolution and its primordial structure has been erased, our findings
support a model where embedded clusters are built up from numerous smaller
sub-clusters. Finally, the results of our Spitzer census indicate that the
supposition that star formation must progress rapidly in a dark cloud should
not preclude these observations that show it can be relatively long lived.

<|endoftext|><|startoftext|>
Introduction
Non-equilibrium transport through superconducting systems attracted much interest
since the demonstration of a Superconductor-Normal-Superconductor (SNS) transistor
[1]. In such a device, supercurrent suppression and its sign reversal (π-transition)
are achieved by driving the quasi-particle distribution out of equilibrium by means of
applied voltages [2, 3, 4, 5]. Another interesting issue in mesoscopic physics is transport
through quantum dots attached to superconducting leads. For DC transport through
quantum dots coupled to a normal and a superconducting lead, subgap transport is due
to Andreev reflection [6, 7, 8, 9, 10, 11]. Also transport between two superconductors
through a quantum dot has been studied extensively. The limit of a non-interacting
dot has been investigated in [12]. Several authors considered the regime of weak tunnel
coupling where the electrons forming a Cooper pair tunnel one by one via virtual states
[13, 14, 15]. The Kondo regime was also addressed [13, 16, 17, 18, 19]. Multiple
Andreev reflection through localized levels was investigated in [20, 21]. Numerical
approaches based on the non-crossing approximation [22], the numerical renormalization
group [23] and Monte Carlo [24] have also been used. The authors of [25] compare
different approximation schemes, such as mean field and second-order perturbation in
the Coulomb interaction. In double-dot systems the Josephson current has been shown
to depend on the spin state of the double dot [26]. Experimentally, the supercurrent
through a quantum dot has been measured through dots realized in carbon nanotubes
[27] and in indium arsenide nanowires [28].
In this Letter we study the transport properties of a system composed of an
interacting single-level quantum dot between two equilibrium superconductors where
a third, normal lead is used to drive the dot out of equilibrium. A Josephson coupling
in SNS heterostructures can be mediated by proximity-induced superconducting
correlations in the normal region. In case of a single-level quantum dot, superconducting
correlations are indicated by the correlator 〈d↓(0)d↑(t)〉, where dσ is the annihilation
operator of the dot level with spin σ. To obtain a large pair amplitude, i.e. the equal-
time correlator 〈d↓d↑〉, at least two conditions need to be fulfilled: (i) the states of an
empty and a doubly-occupied dot should be nearly energetically degenerate and (ii) the
overall probability of occupying the dot with an even number of electrons should be
finite. For a non-interacting quantum dot, i.e. vanishing charging energy U for double
occupancy, this can be achieved by tuning the level position ǫ in resonance with the
Fermi energy of the leads, ǫ = 0 [12]. In this case, the Josephson current can be viewed
as transfers of Cooper pairs between dot and leads and the expression of the current
starts in first order in the tunnel coupling strength Γ.
The presence of a large charging energy U ≫ kBT,Γ destroys this mechanism since
the degeneracy condition 2ǫ+U ≈ 0 is incompatible with a finite equilibrium probability
to occupy the dot with an even number of electrons. Nevertheless, a Josephson current
can be established by higher-order tunnelling processes (see, for example, [13, 14, 15]),
associated with a finite superconducting correlator 〈d↓(0)d↑(t)〉 at different times. The
Non-Equilibrium Josephson and Andreev Current through Interacting Quantum Dots 3
amplitude of the Josephson coupling is, however, reduced by a factor Γ/∆, i.e., the
current starts only in second order in Γ, and the virtual generation of quasiparticles
in the leads suppresses the Josephson current for large superconducting gaps ∆. In
particular, it vanishes for ∆ → ∞.
The main purpose of the present paper is to propose a new mechanism that
circumvents the above-stated hindrance to achieve a finite pair amplitude in an
interacting quantum dot, and, thus, restores a Josephson current carried by first-order
tunnel processes that survives in the limit ∆ → ∞. For this aim, we attach a third,
normal, lead to the dot that drives the latter out of equilibrium by applying a bias
voltage, so that condition of occupying the dot with an even number of electrons is
fulfilled even for 2ǫ+ U ≈ 0.
We relate the current flowing into the superconductors to the nonequilibrium
Green’s functions of the dot. In the limit of a large superconducting gap, ∆ → ∞,
the current is only related to the pair amplitude. The latter is calculated by means
of a kinetic equation derived from a systematic perturbation expansion within real-
time diagrammatic technique that is suitable for dealing with both strong Coulomb
interaction and nonequilibrium at the same time.
2. Model
We consider a single-level quantum dot connected to two superconducting and one
normal lead via tunnel junctions, see figure 1. The total Hamiltonian is given by H =
ΓΓS S
Figure 1. Setup: a single-level quantum dot is connected by tunnel junctions to one
normal and two superconducting leads with tunnelling rates ΓN and ΓSL,R , respectively.
η=N,SL,SR
(Hη +Htunn,η). The quantum dot is described by the Anderson model
σdσ +Un↑n↓, where nσ = d
σdσ is the number operator for spin σ =↑, ↓, ǫ is
the energy level, and U is the charging energy for double occupation. The leads, labeled
by η = N, SL, SR, are modeled by Hη =
kσ ǫkc
ηkσcηkσ−
η−k↓ +H.c.
, where
∆η is the superconducting order parameter (∆N = 0). The tunnelling Hamiltonians are
Non-Equilibrium Josephson and Andreev Current through Interacting Quantum Dots 4
Htunn,η = Vη
ηkσdσ +H.c.
. Here, Vη are the spin- and wavevector-independent
tunnel matrix elements, and cηkσ(c
ηkσ) and dσ(d
σ) represent the annihilation (creation)
operators for the leads and dot, respectively. The tunnel-coupling strengths are
characterized by Γη = 2π|Vη|
k δ(ω − ǫk).
3. Current formula
We start with deriving a general formula for the charge current in lead η by using
the approach of Ref. [29] generalized to superconducting leads. Similar formulae
that relate the charge current to the Green’s function of the dot in the presence of
superconducting leads have been derived in previous works, in particular for equilibrium
situations, see e.g. Refs. [16, 22]. The formula derived below is quite general, as
it allows for arbitrary bias and gate voltages, temperatures, and superconducting
order parameters for a quantum dot coupled to an arbitrary number of normal and
superconducting leads. For this, it is convenient to use the operators ψηk = (cηk↑, c
η−k↓)
and φ = (d↑, d
T in Nambu formalism. The current from lead η is expressed as
Jη = e 〈dNη/dt〉 = i(e/h̄)〈[H,Nη]〉 = i(e/h̄)〈[Htunn,η, Nη]〉 ‡, with Nη =
ηkτ3ψηk,
where τ1, τ2, τ3 indicate the Pauli matrices in Nambu space and e > 0 the electron
charge. Evaluating the commutator leads to
Jη = −
Re {Tr [τ3VηG
D,ηk(ω)]} , (1)
with Vη = Diag(Vη,−V
η ) and the lead–dot lesser Green’s functions (G
D,ηk(ω))m,n
that are the Fourier transforms of i〈ψ
ηkn(0)φm(t)〉. In the following, we assume the
tunnelling matrix elements Vη to be real (any phase of Vη can be gauged away by
substituting ∆η → ∆η exp(−2i arg Vη)). The Green’s function G
D,ηk is related to
the full dot Green’s functions and the lead Green’s functions by a Dyson equation in
Keldysh formalism: G<D,ηk(ω) = G
R(ω)V†ηg
ηk(ω)+G
<(ω)V†ηg
ηk(ω), where G
R(<)(ω) is
the retarded (lesser) dot Green’s function, and and g
ηk (ω) the lead advanced (lesser)
Green’s function. Using this relation and assuming energy-independent tunnel rates Γη,
we obtain for the current Jη = J1η + J2η with
J1η =
ΓηDη(ω)Im
ω − µη
2GR(ω)fη(ω) +G
, (2)
J2η =
ΓηSη(ω)Re
, (3)
where ∆η =
∆∗η 0
, and fη(ω) = [1 + exp(ω − µη)/(kBT )]
−1 is the Fermi
function, with T being the temperature and kB the Boltzmann constant. The dot
Green’s functions (G<D(ω))m,n and
GRD(ω)
are defined as the Fourier transforms
‡ Note that [Hη, Nη] 6= 0 but 〈[Hη, Nη]〉 = 0 for η = SL,R.
Non-Equilibrium Josephson and Andreev Current through Interacting Quantum Dots 5
of i〈φ†n(0)φm(t)〉 and −iθ(t)〈{φm(t), φ
n(0)}〉, respectively. The two weighting functions
Dη(ω) and Sη(ω) are given by
Dη(ω) =
|ω − µη|
(ω − µη)2 − |∆η|2
θ(|ω − µη| − |∆η|)
Sη(ω) =
|∆η|2 − (ω − µη)2
θ(|∆η| − |ω − µη|).
The terms J1η and J2η involve excitation energies ω above and below the superconducting
gap, respectively. For η = N, only the part of J1η that involves normal (diagonal)
components of the Green’s functions contributes, and the current reduces to the result
presented in [29]. For superconducting leads, this part describes quasiparticle transport
that is independent of the superconducting phase difference. The other part of J1η
involves anomalous (off-diagonal) components of the Green’s functions and is, in general,
phase dependent. The contribution to the Josephson current stemming from this term is
the dominant one in the regime considered in [13, 14, 15]. The excitation energies above
the gap are only accessible either for transport voltages exceeding the gap or by including
higher-order tunnelling, involving virtual states with quasiparticles in the leads, and,
therefore, J1η vanishes for large |∆η|. In this case J2η, that involves only anomalous
Green’s functions with excitation energies below the gap, dominates transport. It is, in
general, phase dependent, and describes both Josephson as well as Andreev tunnelling.
In the following we consider the limit |∆η| → ∞, where the current simplifies to
Γη|〈d↓d↑〉| sin(Ψ− Φη) , (4)
with Φη being the phase of ∆η and 〈d↓d↑〉 = |〈d↓d↑〉| exp(iΨ) the pair amplitude of the
dot that has to be determined in the presence of Coulomb interaction, coupling to all
(normal and superconducting) leads and in non-equilibrium due to finite bias voltage.
We now consider a symmetric three-terminal setup with ΓSL = ΓSR = ΓS,
∆SL = |∆| exp(iΦ/2) and ∆SR = |∆| exp(−iΦ/2), and µSL = µSR = 0. The quantities
of interest are the the current that flows between the two superconductors (Josephson
current) Jjos = (JSL − JSR)/2 and the current in the normal lead (Andreev current)
Jand = JN = −(JSL + JSR).
Furthermore, we focus on the limit of weak tunnel coupling, ΓS < kBT . In this
regime, an Josephson current through the dot in equilibrium would be suppressed even
in the absence of Coulomb interaction, U = 0, since the influence of the superconductors
on the quantum-dot spectrum could not be resolved for the resonance condition ǫ ≈ 0.
This can, e.g., be seen in the exactly-solvable limit of U = 0 together with ΓN = 0,
where the Josephson current is Jjos = (e/2h̄)Γ
S sin(Φ) [f(−ǫA(Φ))− f(ǫA(Φ))] /ǫA(Φ)
with ǫA(Φ) =
ǫ2 + Γ2S cos
2(Φ/2). This provides an additional motivation to look for a
non-equilibrium mechanism to proximize the quantum dot.
Non-Equilibrium Josephson and Andreev Current through Interacting Quantum Dots 6
4. Kinetic equations for quantum-dot degrees of freedom
The Hilbert space of the dot is four dimensional: the dot can be empty, singly
occupied with spin up or down, or doubly occupied, denoted by |χ〉 ∈ {|0〉, | ↑〉,
| ↓〉, |D〉 ≡ d
↓|0〉}, with energies E0, E↑ = E↓, ED. For convenience we define the
detuning as δ = ED −E0 = 2ǫ+ U . The dot dynamics is fully described by its reduced
density matrix ρD, with matrix elements P
≡ (ρD)χ2χ1 . The dot pair amplitude 〈d↓d↑〉
is given by the off-diagonal matrix element P 0D. The time evolution of the reduced
density matrix is described by the kinetic equations
P χ1χ2 (t) +
(Eχ1 −Eχ2)P
(t) =
(t, t′)P
(t′). (5)
We define the generalized transition rates byW
−∞ dt
(t, t′), which are the
only quantities to be evaluated in the stationary limit. Together with the normalization
condition
χ Pχ = 1, (5) determines the matrix elements of ρD. Furthermore, in (5)
we retain only linear terms in the tunnel strengths Γη and the detuning δ. Hence, we
calculate the rates W
to the lowest (first) order in Γη for δ = 0. This is justified in
the transport regime ΓS,ΓN, δ < kBT .
The rates are evaluated by means of a real-time diagrammatic technique [30],
that we generalize to include superconducting leads. This technique provides a
convenient tool to perform a systematic perturbation expansion of the transport
properties in powers of the tunnel-coupling strength. In the following, we concentrate
on transport processes to first order in tunnelling (a generalization to higher orders is
straightforward). This includes the transfer of charges through the tunnelling barriers
as well as energy-renormalization terms that give rise to nontrivial dynamics of the
quantum-dot degrees of freedom.
We find for the (first-order) diagonal rates Wχ1χ2 ≡ W
the expressions Wσ0 =
ΓNfN(−U/2);W0σ = ΓN[1 − fN(−U/2)];WDσ = ΓNfN(U/2);WσD = ΓN[1 − fN(U/2)].
The N lead also contributes to the rates WDD00 = (W
∗ = −ΓN[1 + fN(−U/2) −
fN(U/2) + iB] where B =
U/2−µN
2πkBT
−U/2−µN
2πkBT
, with µN being
the chemical potential of the normal lead and ψ(z) the Digamma function. Notice that
B vanishes when µN = 0 or U = 0. The superconducting leads do not enter here
due to the gap in the quasi-particle density of states. These leads, though, contribute
to the off-diagonal rates W 000D = W
W 0D00
W 0DDD
= −WD000 = −W
− (W 00D0)
WDDD0
= −iΓS cos(Φ/2).
For an intuitive representation of the system dynamics we define, in analogy to [31],
a dot isospin by
PD0 + P
; Iy = i
PD0 − P
; Iz =
PD − P0
. (6)
Non-Equilibrium Josephson and Andreev Current through Interacting Quantum Dots 7
From (5), we find that in the stationary limit the isospin dynamics can be separated
into three parts, 0 = dI/dt = (dI/dt)acc + (dI/dt)rel + (dI/dt)rot, with
[1− fN(−U/2)− fN(U/2)]êz (7)
= − ΓN[1 + fN(−U/2)− fN(U/2)]I (8)
= I×Beff (9)
where êz is the z-direction and Beff = {2ΓS cos(Φ/2), 0,−ΓNB − 2ǫ− U} is an effective
magnetic field in the isospin space. The accumulation term (7) builds up a finite isospin,
while the relaxation term (8) decreases it. Finally, (9) describes a rotation of the isospin
direction.
5. Non-equilibrium Josephson current
In the isospin language the current in the superconducting leads is
JSL,R =
ΓS [Iy cos(Φ/2)± Ix sin(Φ/2)] , (10)
where the upper(lower) sign refers to the left(right) superconducting lead. The Iy
component contributes to the Andreev current, while Ix is responsible for the Josephson
current. To obtain subgap transport, we first need to build up a finite isospin component
along the z-direction, i.e. we need a population imbalance between the empty and doubly
occupied dot [(this is generated by the accumulation term in (7)]; second, we need a
finite Beff which rotates the isospin so that it acquires an inplane component. In order
to have a finite Josephson current (Ix 6= 0), we need the z-component, −ΓNB − 2ǫ−U ,
of the effective magnetic field producing the rotation to be non zero.
The Josephson current and the Andreev current read
Jjos = −
[2ǫ+ U + ΓNB]ΓS sin(Φ)
|Beff |2 + Γ
N[1 + fN(−U/2)− fN(U/2)]
1− fN(−U/2)− fN(U/2)
1 + fN(−U/2)− fN(U/2)
Jand =
2ΓNΓS[1 + cos(Φ)]
|Beff |2 + Γ
N[1 + fN(−U/2)− fN(U/2)]
× [1− fN(−U/2)− fN(U/2)]. (12)
These results take into account only first-order tunnel processes, i.e. the rates W
are computed to first order in Γη. The factor [1 − fN(−U/2) − fN(U/2)] ensures that
no finite dot-pair amplitude can be established if the chemical potential of the normal
lead, µN, is inside the interval [−U/2, U/2] by at least kBT . In this situation both
the Josephson and the Andreev currents vanish. On the other hand, this factor takes
the value −1 if µN > U/2 and the value +1 if µN < −U/2. Hence, the sign of the
Josephson current can be reversed by the applied voltage (voltage driven π-transition).
Non-Equilibrium Josephson and Andreev Current through Interacting Quantum Dots 8
The considerations above establish the importance of the non-equilibrium voltage to
induce and control proximity effect in the interacting quantum dot. In figure 2 we show
in a density plot (a) Jjos and (b) Jand for Φ = π/2 as a function of the voltage µN and
the level position ǫ. Both the control of proximity effect by the chemical potential µN
and the voltage driven π-transition are clearly visible. If the detuning is too large,
|δ + ΓNB| >
Γ2N + 4Γ
S cos
2(Φ/2), it becomes difficult to build a superposition of
the states |0〉 and |D〉, which is necessary to establish proximity. As a consequence,
the Josephson and the Andreev current are algebraically suppressed by δ−1 and δ−2,
respectively. Figure 3 shows the Josephson current as a function of δ = 2ǫ + U . The
fact that the Josephson current is non zero for δ = 0 is due to the term ΓNB, i.e. of the
interaction induced contribution to the z-component of the effective field Beff acting on
the isospin. The term |B| has a maximum at µN = U/2, which causes this effect to be
more pronounced at the onset of transport. The fact that the value of the Josephson
current varies on a scale smaller than temperature indicates its nonequilibrium nature.
A π-transition of the Josephson current can also be achieved by changing the sign
of δ+ΓNB, as shown in figure 4 where Jjos is plotted as a function of the phase difference
Φ for different values of the level position. Notice that the current for δ = 0 (ǫ = −U/2)
is different from zero only due to the presence of the term ΓNB acting on the isospin.
6. Conclusion
In conclusion, we have studied non-equilibrium proximity effect in an interacting
single-level quantum dot weakly coupled to two superconducting and one normal lead.
We propose a new mechanism for a Josephson coupling between the leads that is
qualitatively different from earlier proposals based on higher-order tunnelling processes
via virtual states. Our proposal relies on generating a finite non-equilibrium pair
amplitude on the dot by applying a bias voltage between normal and superconducting
leads. The charging energy of the quantum dot defines a threshold bias voltage above
which the non-equilibrium proximity effect allows for a Josephson current carried
by first-order tunnelling processes, that is not suppressed in the limit of a large
superconducting gap. Both the magnitude and the sign of the Josephson current
are sensitive to the energy difference between empty and doubly-occupied dot. A π-
transition can be driven by either bias or gate voltage. In addition to defining a threshold
bias voltage, the charging energy induces many-body correlations that affect the dot’s
pair amplitude, visible in a bias-voltage-dependent shift of the π-transition as a function
of the gate voltage.
Acknowledgments
We would like to thank W. Belzig, R. Fazio, A. Shnirman, and A. Volkov for
useful discussions. M.G. and J.K. acknowledge the hospitality of Massey University,
Palmerston North, and of the CAS Oslo, respectively.
Non-Equilibrium Josephson and Andreev Current through Interacting Quantum Dots 9
Figure 2. Density plot of the a) Josephson and b) Andreev current, for fixed
superconducting-phase difference Φ = π/2, as a function of the dot-level position ǫ
and of the chemical potential of the normal lead µN. The symbols ± refer to the sign
of the current. The other parameters are ΓS = ΓN = 0.01U , and kBT = 0.05U .
Non-Equilibrium Josephson and Andreev Current through Interacting Quantum Dots10
-0.05 -0.025 0 0.025 0.05
/U=0.5
Figure 3. Josephson current, for fixed superconducting-phase difference Φ = π/2, as
a function of the detuning δ = ED − E0 = 2ǫ+ U for different values of the chemical
potential. The other parameters are ΓS = ΓN = 0.01U and kBT = 0.05U .
0 π/2 π 3π/2 2π
δ/U=0.01
δ/U=0
δ/U=-0.01
Figure 4. Josephson current as a function of the superconducting-phase difference
Φ for different values of the detuning. The other parameters are ΓS = ΓN = 0.01U ,
µN = U , and kBT = 0.05U .
Non-Equilibrium Josephson and Andreev Current through Interacting Quantum Dots11
References
[1] Baselmans J J A, Morpurgo A F, van Wees B J and Klapwijk T M 1999 Nature 397 43
[2] Volkov A F Phys. Rev. Lett. 1995 74 4730
[3] Wilhelm F K, Schön G and Zaikin A D 1998 Phys. Rev. Lett. 81 1682
[4] Yip S-K 1998 Phys. Rev. B 58 5803
[5] Giazotto F, Heikkilä T T, Taddei F, Fazio R, Pekola J P and Beltram F 2004 Phys. Rev. Lett. 92
137001
[6] Fazio R and Raimondi R 1999 Phys. Rev. Lett. 80 2913; Fazio R and Raimondi R 1999 Phys. Rev.
Lett. 82 4950
[7] Kang K 1998 Phys. Rev. B 58 9641
[8] Schwab P and Raimondi R 1999 Phys. Rev. B 59 1637
[9] Clerk A A, Ambegaokar V and Hershfield S 2000 Phys. Rev. B 61 3555
[10] Shapira S, Linfield E H, Lambert C J, Seviour R, Volkov A F and Zaitsev A V 2000 Phys. Rev.
Lett. 84 159
[11] Cuevas J C, Levy Yeyati A and Mart́ın-Rodero A 2001 Phys. Rev. B 63 094515
[12] Beenakker C W J and van Houten H 1992 Single-Electron Tunneling and Mesoscopic
Devices(Berlin: edited by H. Koch and H. Lübbig, Springer) p 175–179
[13] Glazman L I and Matveev K A 1989 JETP Lett. 49 659
[14] Spivak B I and Kivelson S A 1991 Phys. Rev. B 43 3740
[15] Rozhkov A V, Arovas D P and Guinea F 2001 Phys. Rev. B 64 233301
[16] Clerk A A and Ambegaokar V 2000 Phys. Rev. B 61 9109
[17] Avishai Y, Golub A and Zaikin A D 2003 Phys. Rev. B 67 041301
[18] Sellier G, Kopp T, Kroha J and Barash Y S 2005 Phys. Rev. B 72 174502
[19] López R, Choi M-S and Aguado R 2007 Phys. Rev. B 75 045132
[20] Levy Yeyati A , Cuevas J C, López-Dávalos A and Mart́ın-Rodero A 1997 Phys. Rev. B 55 R6137
[21] Johansson G, Bratus E N, Shumeiko V S and Wendin G 1999 Phys. Rev. B 60 1382
[22] Ishizaka S, Sone J and Ando T 1995 Phys. Rev. B 52 8358
[23] Choi M-S, Lee M and Belzig W 2004 Phys. Rev. B 70 020502(R)
[24] Siano F and Egger R 2004 Phys. Rev. Lett. 93 047002
[25] Vecino E, Mart́ın-Rodero A and Levy Yeyati A 2003 Phys. Rev. B 68 035105
[26] Choi M-S, Bruder C and Loss D 2000 Phys. Rev. B 62 13569
[27] Buitelaar M R, Nussbaumer T and Schönenberger C 2002 Phys. Rev. Lett. 89 256801; Cleuziou
J-P, Wernsdorfer W, Bouchiat V, Ondarçuhu T, and Monthioux M 2006 Nature Nanotechnology
1 53; Jarillo-Herrero P, van Dam J A and Kouwenhoven L P 2006 Nature 439 953; Jørgensen
H I, Grove-Rasmussen K, Novotný T, Flensberg K and Lindelof P E 2006 Phys. Rev. Lett. 96
207003
[28] van Dam J A, Nazarov Y V, Bakkers E P A M, De Franceschi S and Kouwenhoven L P 2006
Nature 442 667; Sand-Jespersen T, Paaske J, Andersen B M, Grove-Rasmussen K, Jørgensen H
I, Aagesen M, Sørensen C, Lindelof P E, Flensberg K and Nyg̊ard J Preprint cond-mat/0703264
[29] Meir Y and Wingreen N S 1992 Phys. Rev. Lett. 68 2512
[30] König J, Schoeller H and Schön G 1996 Phys. Rev. Lett. 76 1715; König J, Schmid J, Schoeller H
and Schön G 1996 Phys. Rev. B 54 16820
[31] Braun M, König J and Martinek J 2004 Phys. Rev. B 70 195345
http://arxiv.org/abs/cond-mat/0703264
	Introduction
	Model
	Current formula
	Kinetic equations for quantum-dot degrees of freedom
	Non-equilibrium Josephson current
	Conclusion
ABSTRACT
  We present a theory of transport through interacting quantum dots coupled to
normal and superconducting leads in the limit of weak tunnel coupling. A
Josephson current between two superconducting leads, carried by first-order
tunnel processes, can be established by non-equilibrium proximity effect. Both
Andreev and Josephson current is suppressed for bias voltages below a threshold
set by the Coulomb charging energy. A $\pi$-transition of the supercurrent can
be driven by tuning gate or bias voltages.

<|endoftext|><|startoftext|>
Introduction
X-ray observations of radio pulsars provide a powerful diag-
nostic of the energetics and emission mechanisms of rotation-
powered neutron stars. Due to the magnetic dipole braking, a
pulsar loses rotational kinetic energy at a rate Ė = 4π2IṖP−3,
where I is the moment of inertia of the neutron star, assumed to
be 1045 g cm2, and P is the rotation period. Though pulsars
have traditionally been mostly studied at radio wavelengths,
only a small fraction (10−7 to 10−5, e.g., Taylor et al. 1993)
of the “spin-down luminosity” Ė emerges as radio pulsations.
Rotation power can manifest itself in the X / γ-ray energy range
as pulsed emission, or as nebular radiation produced by a rela-
tivistic wind of particles emitted by the neutron star. Residual
heat of formation is also observed as soft X-ray emission from
young neutron stars. Such thermal radiation, however, can also
be produced as a result of reheating from internal or exter-
nal sources. The growing list of observable X-ray emitting
rotation-powered pulsars allows the study of the properties of
the population as a whole. Young pulsars constitute a particu-
larly interesting subset to investigate owing to their large spin-
down luminosities (&1036 erg s−1).
The discovery of PSR J1357−6429 during the Parkes multi-
beam survey of the Galactic plane (see Lorimer et al. 2006
and references therein) is reported in Camilo et al. (2004).
The pulsar is located near the supernova remnant candidate
G309.8−2.6 (Duncan et al. 1997) for which no distance or age
information is available. With a spin period of 166 ms, a pe-
riod derivative of 3.6 × 10−13 s s−1, and a characteristic age
τc = P/2Ṗ ≃ 7300 yr, this pulsar stands out as one of the ten
youngest Galactic radio pulsars known (see the ATNF Pulsar
Catalogue1, Manchester et al. 2005). The other main properties
of this source derived from the radio observations are the spin-
down luminosity of 3.1 × 1036 erg s−1 and the surface magnetic
field strength of 7.8 × 1012 G, inferred under the assumption of
pure magnetic dipole braking. Based on a dispersion measure
of ∼127 cm−3 pc (Camilo et al. 2004), a distance of ∼2.4 kpc
is estimated, according to the Cordes-Lazio NE2001 Galactic
Free Electron Density Model2.
Here we report the first detection of PSR J1357−6429 in
the X-ray range using the XMM-Newton observatory and we
present its spectral properties in the 0.5–10 keV energy band.
We also made use of two short Chandra observations to con-
firm the identification and to probe possible spatial extended
emissions, taking advantage of the superb angular resolution of
the Chandra telescope.
2. XMM-Newton observation and data analysis
In this section we present the results obtained with the EPIC in-
strument on board the XMM-Newton X-ray observatory. EPIC
consists of two MOS (Turner et al. 2001) and one pn CCD
1 See http://www.atnf.csiro.au/research/pulsar/psrcat .
2 See http://rsd-www.nrl.navy.mil/7213/lazio/ne model
and references therein.
http://arxiv.org/abs/0704.0205v2
2 P. Esposito et al.: X-ray observations of PSR J1357−6429
detectors (Strüder et al. 2001) sensitive to photons with ener-
gies between 0.1 and 15 keV. All the data reduction was per-
formed using the XMM-Newton Science Analysis Software3
(SAS version 7.0). The raw observation data files were pro-
cessed using standard pipeline tasks (epproc for pn, emproc
for MOS data). Response matrices and effective area files were
generated with the SAS tasks rmfgen and arfgen.
The observation was carried out on 2005 August 17 and
had a duration of 15 ks, yielding net exposure times of 11.7 ks
in the pn camera and 14.5 ks in the two MOSs. The pn
and the MOSs were operated in Full Frame mode (time res-
olution: 73.4 ms and 2.6 s, respectively) and mounted the
medium thickness filter. PSR J1357−6429 is clearly detected
in the pn and MOS images (see Figure 1) at the radio pul-
sar position (Right ascension = 13h 57m 02.4s, Declination
= −64◦ 29′ 30.2′′ (epoch J2000.0); Camilo et al. 2004). The
13:57:12.0 13:57:00.0 13:56:48.0
-64:27:35.8
-64:28:47.8
-64:29:59.8
-64:31:11.8
Right ascension
Figure 1. Field of PSR J1357−6429 as seen by the EPIC cam-
eras in the 0.5–10 keV energy range. The radio pulsar posi-
tion (Camilo et al. 2004) is marked with the white diamond
sign. The angular separation of the centroid of the X-ray source
(computed using the SAS task emldetect) from the radio pul-
sar position is (3.5 ± 0.6)′′ (1σ statistical error). Considering
the XMM-Newton absolute astrometric accuracy of 2′′ (r.m.s.),
the X-ray and radio positions are consistent.
source spectra were extracted from circular regions centered at
the position of PSR J1357−6429. The whole observation was
affected by a high particle background that led to the selec-
tion of a 20′′ radius circle in order to increase the signal-to-
noise ratio in the pn detector, particularly sensitive to particle
background, and a 40′′ radius for both the MOS cameras. The
background spectra were extracted from annular regions with
radii of 140′′ and 220′′ for the MOSs, and from two rectan-
gular regions with total area of ∼104 arcsec2 located on the
sides of the source for the pn. We carefully checked that the
choice of different background extraction regions does not af-
fect the spectral results. We selected events with pattern 0–4
3 See http://xmm.vilspa.esa.es/ .
Table 1. Summary of the XMM-Newton spectral results. Errors
are at the 90% confidence level for a single interesting param-
eter.
Parameter Value
PL PL +BB
NH (10
22 cm−2) 0.14+0.07
−0.06 0.4
Γ 1.8+0.3
−0.2 1.4 ± 0.5
kBT (keV) – 0.16
+0.09
−0.04
a (km) – 1.4+2.9
Fluxb (10−13erg cm−2 s−1) 2.3 3.6
Blackbody fluxb (10−13erg cm−2 s−1) – 1.3
χ2r / d.o.f. 1.00 / 72 0.85 / 70
a Radius at infinity assuming a distance of 2.5 kpc.
b Unabsorbed flux in the 0.5–10 keV energy range.
and pattern 0–12 for the pn and the MOS, respectively. The re-
sulting background subtracted count rates in the 0.5–10 keV
energy range were (4.2 ± 0.3) × 10−2 cts s−1 in the pn and
(1.9 ± 0.2) × 10−2 cts s−1 in the two MOS cameras, while the
background rate expected in the source extraction regions is
about 50% of these values. The spectra were rebinned to have
at least 20 counts in each energy bin. Spectral fits were per-
formed using the XSPEC version 12.3 software4.
The spectra from the three cameras were fitted together
in the 0.5–10 keV energy range with a power law and with
a power-law plus blackbody model (see Table 1). The latter
model provides a slightly better fit, with less structured resid-
uals (see Figure 2). Furthermore, considering the distance of
2.5 kpc, the interstellar absorption along the line of sight de-
rived with the power-law fit is too low if compared to the
typical column density of neutral absorbing gas in that direc-
tion of approximately 1022 cm−2 (Dickey & Lockman 1990).
The resulting best-fit parameters for the power-law plus black-
body model are photon index Γ = 1.4, blackbody temperature
kBT = 0.16 keV, and absorption NH = 4× 10
21 cm−2 with a re-
duced χ2 of 0.85 for 70 degrees of freedom. The corresponding
luminosity in the 0.5–10 keV band is 2.7 × 1032 erg s−1.
Young pulsars are often associated with pulsar wind neb-
ulae: complex structures that arise from the interaction be-
tween the particle wind powered by the pulsar and the
supernova ejecta or surrounding interstellar medium (see
Gaensler & Slane 2006 for a review). Inspecting the EPIC im-
ages in various energy bands, we find only a marginal (≈3σ)
evidence of diffuse emission, in the 2–4 keV energy band con-
sisting of a faint elongated (∼20 arcsec to the north-east, see
Figure 1) structure starting from PSR J1357−6429. We took
that excess as an upper limit for a diffuse emission: assuming
the same spectrum as the point source, it corresponds to a 2–10
keV luminosity of ≈6 × 1031 erg s−1.
For the timing analysis we applied the solar system
barycenter correction to the photon arrival times with the SAS
task barycen. We searched the data for pulsations around the
spin frequency at the epoch of the XMM-Newton observations,
predicted assuming the pulse period and the spin-down rate
measured with the Parkes radio telescope (Camilo et al. 2004).
4 See http://heasarc.gsfc.nasa.gov/docs/xanadu/xspec/ .
P. Esposito et al.: X-ray observations of PSR J1357−6429 3
1 2 5
Energy (keV)
Figure 2. EPIC pn spectrum of PSR J1357−6429. Top: Data
and best-fit power-law (dashed line) plus blackbody (dot-
dashed line) model. Middle: Residuals from the power-law
best-fit model in units of standard deviation. Bottom: Residuals
from the power-law plus blackbody best-fit model in units of
standard deviation.
As glitches and / or deviations from a linear spin-down may al-
ter the period evolution, we searched over a wide period range
centered at the value of ∼166 ms. We searched for significant
periodicities using two methods: a standard folding technique
and the Rayleigh statistic. No pulsation were detected near to
the predicted frequency with either method but, since the pn
timing resolution (73 ms) allows to only poorly sample the
166 ms pulsar period, a reliable upper limit on the pulsed frac-
tion cannot be set.
3. Chandra observations and data analysis
PSR J1357−6429 was observed by means of the Chandra
X-ray Observatory during two exposures of ∼17 ks duration
each on 2005 November 18 and 19. The observations were car-
ried out with the Spectroscopic array of the High Resolution
Camera (HRC-S; Murray et al. 2000) used without transmis-
sion gratings. The HRC is a multichannel plate detector sen-
sitive to X-ray over the 0.08–10 keV energy range, although
essentially no energy information on the detected photons is
available. The HRC-S time resolution is 16 µs.
We started from “level 1” event data calibrated and made
available through the Chandra X-ray Center5. The level 1 event
files contain all HRC triggers with the position information
corrected for instrumental (degap) and aspect (dither) effects.
After standard data processing with the Chandra Interactive
Analysis of Observations (CIAO ver. 3.3), a point-like source
has been clearly detected in both the observations at a position
consistent with that of PSR J1357−6429 (see Figure 3).
For the timing analysis we corrected the data to the solar
system barycenter with the CIAO task axbary and then we fol-
5 See http://cxc.harvard.edu/.
13:57:02.9 13:57:02.4 13:57:02.0
-64:29:27.4
-64:29:31.0
-64:29:34.6
Right ascension
Figure 3. Chandra 0.08–10 keV HRC-S image centered
on the radio pulsar position, marked with a diamond sign
(Camilo et al. 2004). The CIAO celldetect routine yields a
best-fit position for the X-ray source at an angular distance of
(0.9 ± 0.2)′′ (1σ statistical error) from the radio pulsar. This
value consistent with the Chandra pointing accuracy of 0.8′′
(99% confidence level).
lowed the same procedure described in Section 2, but we again
did not detect the source pulsation. By folding the light curve
of PSR J1357−6429 on the radio frequency and fitting it with a
sinusoid, we determine a 90% confidence level upper limit of
∼30% on the amplitude of a sinusoidal modulation. We stress
that this upper limit depends sensitively on data time binning
and on the assumed pulse shape.
We used the CIAO task merge all to generate a combined
image of the source. Our main purpose was to search for diffuse
structures on scales smaller than the XMM-Newton angular res-
olution. We compared the radial profile of the pulsar emission
with the Chandra High-Resolution Mirror Assembly point-
spread function at 1 keV generated using Chandra Ray Tracer
(ChaRT) and Model AXAF Response to X-rays (MARX).
We found that the emission we detect from PSR J1357−6429
(∼100 counts concentrated within a ∼0.5′′ radius circle) is con-
sistent with that from a point source.
We used the Chandra data and the PIMMS software6
to determine an upper limit on the luminosity of a possible
spatial extended emission. The 3σ upper limit on a pulsar
wind nebula brightness (in counts s−1) has been estimated as
3(bA)1/2τ−1, where b is the background surface brightness in
counts arcsec−2, A is the pulsar wind nebula area, and τ is the
exposure duration. Assuming the interstellar absorption value
from the XMM-Newton best-fit model (NH = 0.4 × 10
22 cm−2,
see Section 2) and typical parameters for a pulsar wind nebula
(radius of ∼2× 1017 cm, that corresponds to ∼5′′ for a distance
of 2.5 kpc, and power-law spectrum with photon index Γ = 1.6,
see, e.g., Gotthelf (2003)), this upper limit corresponds to a
2–10 keV luminosity of ≈3 × 1031 erg s−1 for a uniform dif-
fuse nebula. No significant diffuse excess was found even at
6 See http://heasarc.gsfc.nasa.gov/docs/tools.html .
4 P. Esposito et al.: X-ray observations of PSR J1357−6429
larger angular scale, but the corresponding upper limit for dif-
fuse emission is less constraining than that derived using the
XMM-Newton data.
4. Discussion
We have presented the results of the first X-ray observations of
PSR J1357−6429 by means of the XMM-Newton and Chandra
observatories. The source has been positively detected in all
the instruments although, probably due to the low statistics, we
could not detect the source pulsation. The high angular resolu-
tion Chandra observations favor the picture in which most of
the counts belong to a point source. We found that the spectrum
is well represented by either a power-law with photon index
Γ = 1.8+0.3
−0.2 or by a power-law plus blackbody model. In the
latter case the best-fit parameters are for the power-law com-
ponent a photon index Γ = 1.4 ± 0.5 and, for the blackbody
component, radius7 of ∼1.4+2.9
−0.2d2.5 km and temperature corre-
sponding to kBT = 0.16
+0.09
−0.04 keV.
It is generally believed that a combination of emission
mechanisms are responsible for the detected X-ray flux from
rotation-powered pulsars (see, e.g., Kaspi et al. 2006 for a re-
view). The acceleration of particles in the neutron star magne-
tosphere generates non thermal radiation by synchrotron and
curvature radiation and / or inverse Compton processes, while
soft thermal radiation could result by cooling of the surface
of the neutron star. A harder thermal component can arise from
polar-cap reheating, due to return currents from the outer gap or
from close to the polar-cap. The dominant emission mechanism
is likely related to the age of the pulsar. In pulsar younger than
≈104 yr the strong magnetospheric emission generally prevails
over the thermal radiation, making difficult to detect it.
As discussed in Section 2, we tend to prefer the power-law
plus blackbody spectral model for PSR J1357−6429. The re-
sulting blackbody size of ∼1.5d2.5 km may suggest that the soft
emission (.2 keV) is coming from hot spots on the surface due
to backflowing particles, rather than from the entire surface.
However this hint should be considered with caution, as the
surface temperature distribution of a neutron star is most likely
non uniform (since the heath conductivity of the crust is higher
along the magnetic field lines) and the small and hot blackbody
could result from a more complicated distribution of tempera-
ture. Moreover, currently we lack of reliable models of cooling
neutron star thermal emission and thus we cannot exclude that
the soft component is emitted from surface layers of the whole
neutron star.
To date, thermal emission has been detected in only a
few young radio pulsars. Among these, the properties of
PSR J1357−6429 are similar to those of the young pul-
sars Vela (PSR B0833−45; τc = 11 kyr, P = 89 ms,
Ė = 6.9 × 1036 erg s−1, and distance d ≃ 0.2 kpc; Pavlov et al.
2001) and PSR B1706–44 ( τc = 17.5 kyr, P = 102 ms,
Ė = 3.4 × 1036 erg s−1, and d ≃ 2.5 kpc; Gotthelf et al. 2002).
Notably, the efficiency in the conversion of the spin-down
energy loss into X-ray luminosity for PSR J1357−6429 is
L0.5−10 keV/Ė ≃ 8d
2.5×10
−5, significantly lower than the typical
7 We indicate with dN the distance in units of N kpc.
value of ≈10−3 (Becker & Truemper 1997), and similar to that
of PSR B1706–44 (∼10−4) and Vela (∼10−5).
Although a pulsar wind nebula would not came as a
surprise for this young and energetic source, we did not
find clear evidence of diffuse X-ray emission associated with
PSR J1357−6429. However, some known examples of wind
nebulae (see Gaensler & Slane 2006), rescaled to the distance
of PSR J1357−6429, would hide below the upper limits derived
from the XMM-Newton and Chandra data.
New deeper exposures using XMM-Newton or Chandra
would help determine if a thermal component is present in the
emission of PSR J1357−6429 as our spectral analysis suggests,
and possibly detect a pulsed emission. High sensitivity obser-
vations would also serve to address the issue of the presence
of a pulsar wind nebula. Although there is not any EGRET
γ-ray source coincident with PSR J1357−6429 (Hartman et al.
1999), young neutron stars and their nebulae are often bright
γ-ray sources and PSR J1357−6429 in particular, given its
high “spin-down flux” Ė/d2 and similarity with Vela and
PSR B1706–44, is likely to be a good target for the upcoming
AGILE and GLAST satellites and the ground based Cherenkov
air showers telescopes.
Acknowledgements. This work is based on data from observations
with XMM-Newton, an ESA science mission with instruments and
contributions directly funded by ESA member states and NASA. We
also used data from the Chandra X-ray Observatory Center, which is
operated by the Smithsonian Astrophysical Observatory Center on be-
half of NASA. The authors thank the anonymous referee for helpful
comments and acknowledge the support of the Italian Space Agency
and the Italian Ministry for University and Research.
References
Becker, W. & Truemper, J. 1997, A&A, 326, 682
Camilo, F., Manchester, R. N., Lyne, A. G., et al. 2004, ApJ,
611, L25
Dickey, J. M. & Lockman, F. J. 1990, ARA&A, 28, 215
Duncan, A. R., Stewart, R. T., Haynes, R. F., & Jones, K. L.
1997, MNRAS, 287, 722
Gaensler, B. M. & Slane, P. O. 2006, ARA&A, 44, 17
Gotthelf, E. V. 2003, ApJ, 591, 361
Gotthelf, E. V., Halpern, J. P., & Dodson, R. 2002, ApJ, 567,
Hartman, R. C., Bertsch, D. L., Bloom, S. D., et al. 1999, ApJS,
123, 79
Kaspi, V. M., Roberts, M. S. E., & Harding, A. K. 2006, in
Compact stellar X-ray sources, ed. W. H. G. Levin and M.
van der Klis (Cambridge: Cambridge University Press), 279
Lorimer, D. R., Faulkner, A. J., Lyne, A. G., et al. 2006,
MNRAS, 372, 777
Manchester, R. N., Hobbs, G. B., Teoh, A., & Hobbs, M. 2005,
AJ, 129, 1993
Murray, S. S., Austin, G. K., Chappell, J. H., et al. 2000,
in Proc. SPIE Vol. 4012, X-Ray Optics, Instruments, and
Missions III, ed. J. E. Truemper & B. Aschenbach, 68
Pavlov, G. G., Zavlin, V. E., Sanwal, D., Burwitz, V., &
Garmire, G. P. 2001, ApJ, 552, L129
Strüder, L., Briel, U., Dennerl, K., et al. 2001, A&A, 365, L18
P. Esposito et al.: X-ray observations of PSR J1357−6429 5
Taylor, J. H., Manchester, R. N., & Lyne, A. G. 1993, ApJS,
88, 529
Turner, M. J. L., Abbey, A., Arnaud, M., et al. 2001, A&A,
365, L27
	Introduction
	XMM-Newton observation and data analysis
	Chandra observations and data analysis
	Discussion
ABSTRACT
  We present the first X-ray detection of the very young pulsar PSR J1357-6429
(characteristic age of 7.3 kyr) using data from the XMM-Newton and Chandra
satellites. We find that the spectrum is well described by a power-law plus
blackbody model, with photon index Gamma=1.4 and blackbody temperature kT=160
eV. For the estimated distance of 2.5 kpc, this corresponds to a 2-10 keV
luminosity of about 1.2E+32 erg/s, thus the fraction of the spin-down energy
channeled by PSR J1357-6429 into X-ray emission is one of the lowest observed.
The Chandra data confirm the positional coincidence with the radio pulsar and
allow to set an upper limit of 3E+31 erg/s on the 2-10 keV luminosity of a
compact pulsar wind nebula. We do not detect any pulsed emission from the
source and determine an upper limit of 30% for the modulation amplitude of the
X-ray emission at the radio frequency of the pulsar.

<|endoftext|><|startoftext|>
Resonant activation in bistable semiconductor lasers
Stefano Lepri1, ∗ and Giovanni Giacomelli1
Istituto dei Sistemi Complessi, Consiglio Nazionale delle Ricerche,
via Madonna del Piano 10, I-50019 Sesto Fiorentino, Italy
(Dated: November 4, 2018)
We theoretically investigate the possibility of observing resonant activation in the hopping dynam-
ics of two-mode semiconductor lasers. We present a series of simulations of a rate-equations model
under random and periodic modulation of the bias current. In both cases, for an optimal choice of
the modulation time-scale, the hopping times between the stable lasing modes attain a minimum.
The simulation data are understood by means of an effective one-dimensional Langevin equation
with multiplicative fluctuations. Our conclusions apply to both Edge Emitting and Vertical Cavity
Lasers, thus opening the way to several experimental tests in such optical systems.
PACS numbers: 42.55.Px, 05.40.-a, 42.65.Sf
I. INTRODUCTION
It is currently established that stochastic fluctuations
may have a constructive role in enhancing the response
of nonlinear systems to an external coherent stimulus.
Relevant examples are the enhancement of the decay
time from a metastable state (noise–enhanced stability)
[1, 2], the synchronization with a weak periodic input sig-
nal (stochastic resonance) [3] or the regularizaton of the
response at an optimal noise intensity (coherence reso-
nance) [4].
Another instance is the phenomenon of resonant acti-
vation that was discovered by Doering and Gadoua [5].
They showed that the escape of an overdamped Brown-
ian particle over a fluctuating barrier can be enhanced
by suitably choosing the correlation time of barrier fluc-
tuations themselves. In other words, the escape time
from the potential well attains a minimum for an optimal
choice of such correlation time. Since its discovery, the
phenomenon received a considerable attention from the-
orists (see e.g. Refs. [6, 7, 8, 9, 10]). Detailed studies by
means of analog simulations have also been reported for
both Gaussian and dichotomous fluctuations [11]. More
recently, the phenomenon has been shown to occur also
for the case in which the barrier oscillates periodically
[12, 13].
To our knowledge, experimental evidences of resonant
activation were only given for a bistable electronic cir-
cuit [14] and, very recently, for a colloidal particle sub-
ject to a periodically–modulated optical potential [15].
It is therefore important to look for other setups where
the effect could be studied in detail. As a matter of
fact, multimode laser systems are good candidates to
investigate noise–activated dynamics like the switching
among modes induced by quantum fluctuations (sponta-
neous emission) [16]. In particular, semiconductor lasers
proved to be particularly versatile for detailed experi-
∗Electronic address: stefano.lepri@isc.cnr.it
mental investigations of modulation and noise-induced
phenomena like stochastic resonance [17, 18] and noise–
induced phase synchronization [19]. In those previous
studies, the resonance regimes are attained by a suit-
able random modulation of the bias current which can
be tuned in a well-controlled way. It is thus natural to
argue about the possibility of observing resonant activa-
tion with the same type of experimental setup.
In this paper, we theoretically demonstrate the phe-
nomenon of resonant activation in a generic rate–
equations model for a two-mode semiconductor laser un-
der modulation of the bias current. The basic ingredients
that act in the theoretical descriptions are a fluctuating
potential barrier and some activating noise. In the laser
system, the latter is basically provided by spontaneous
emission while current fluctuations, that appear addi-
tively into the rate equations, effectively act multiplica-
tively if a suitable separation of time scales holds [20]. In
a previous paper [21], we have explicitely demonstrated
such multiplicative–noise effects on the mode–hopping
dynamics. This was shown by a reduction to a bistable
one–dimensional potential system with both multiplica-
tive and additive stochastic forces. Several predictions
drawn from such a simplified model are in good agree-
ment with the experimental observations carried out for
a bulk, Edge-Emitting Laser (EEL) [21]. In the present
context, we will show that this reduced description is of
great help in the interpretation of simulation data.
The outline of the paper is the following. In Sec. II
we recall the model for a two-mode semiconductor laser.
In Sec. III we present the numerical simulation for two
physically distinct cases displaying resonant activation.
These results are discussed and interpreted by compar-
ing with the reduced one–dimensional Langevin model
mentioned above (Sec. IV). We draw our conclusions in
Sec. V.
II. RATE EQUATIONS
Our starting point is a stochastic rate-equation model
for a semiconductor laser that may operate in two longi-
http://arxiv.org/abs/0704.0206v1
mailto:stefano.lepri@isc.cnr.it
tudinal modes whose complex amplitudes are denoted by
E±. Both of them interact with a single carrier density
N that provides the necessary amplification. The two
modes have very similar linear gains, provided that their
wavelengths are almost equal and they are close to the
gain peak. Let J(t) denote the bias (injection) current,
the model can be written as [21]
Ė+ =
(1 + iα)g+ − 1
2DspN ξ+ (1a)
Ė− =
(1 + iα)g− − 1
2DspN ξ− (1b)
Ṅ = γ
J(t)−N − g+|E+|2 − g−|E−|2
where γ is carrier density relaxation rate, α is the
linewidth enhancement factor [22]. The modal gains read
N ± ε(N −Nc)
1 + s|E±|2 + c|E∓|2
, (2)
where ε determines the difference in differential gain
among the two modes while Nc defines the carrier
density where the unsaturated modal gains are equal.
The parameters s and c are respectively the self- and
cross-saturation coefficients. The ξ± are two indepen-
dent, complex white noise processes with zero mean
[〈ξ±(t)〉 = 0] and unit variance [〈ξi(t)ξ∗j (t′)〉 = δijδ(t−t′)]
that model spontaneous emission. The noise terms in
Eqs. (1a) and (1b) are gauged by the spontaneous emis-
sion coefficient Dsp.
All quantities are expressed in suitable dimensionless
units. In particular, time is normalized to the photons’
lifetime, which for semiconductor laser is typically of the
order of a few picoseconds or less (see e.g. [22, 23, 24])
A detailed analysis of the stationary solutions of
Eqs. (1) is reported in Ref. [25]. For a constant bias cur-
rent J(t) = J0 and Dsp = 0, Eqs. (1) admit four different
steady state solutions: the trivial one E± = 0, two single-
mode solutions — E+ 6= 0, E− = 0 and viceversa — and
a solution where both modes are lasing, E± 6= 0. For
Nc > 1, and c > s, there exist a finite interval of J0 val-
ues for which the two single–mode solutions coexist and
are stable while the E± 6= 0 is unstable (bistable region).
Here, for Dsp > 0 the laser performs stochastic mode-
hopping, with the total emitted intensity remains almost
constant while each mode switches on and off alternately
at random times. We point out that the emission in each
mode is nonvanishing even in the “off” state, as the av-
erage power spontaneously emitted in each mode at any
time is given by 4DspN [recall that Eqs. (1) are usually
interpreted in Itô sense [23]]. Observation of this be-
haviour has been reported in several experimental works
on EELs [26, 27, 28].
We remark that while Eqs. (1) aim at modeling EELs,
the results presented henceforth would apply also to po-
larization switching in Vertical Cavity Surface Emitting
Lasers (VCSELs). Indeed, experimental data [29] show
strong similarities between this phenomenon and the lon-
gitudinal mode dynamics. On the theoretical side, this
analogy is supported by the fact that the polarization dy-
namics in VCSELs is described by models that are math-
ematically similar to the one discussed here [30, 31, 32].
In the following, we will focus on the effect of the ex-
ternally imposed fluctuation/modulation of the injected
current. This situation is modeled by letting
J(t) = J0 + δJ(t) . (3)
The DC value J0 sets the working point and will be al-
ways chosen to be in the bistability region. We focus
on the case in which δJ is a Ornstein-Uhlenbeck process
with zero average 〈δJ(t)〉 = 0 and correlation time τ :
˙δJ = −
ξJ (4)
that means
〈δJ(t)δJ(0)〉 = DJ exp(−|t|/τ) . (5)
This choice is suitable to model a finite-bandwith noise
generator. Notice that τ and the variance of fluctuations
DJ = 〈δJ2〉 can be fixed independentely.
Another case of experimental interest that we will con-
sider is using the current modulation
δJ = A sinΩt (6)
To assess the nature of the stochastic process at hand,
it is important to introduce the relevant time scales. We
define first of all the switching or relaxation time TR as
the typical time for the emission to change from one mode
to the other. The main quantities we are interested in are
the Kramers or residence times T± defined as the average
times for which the emission occurs in each mode. In
semiconductor lasers T± are generally much larger than
TR. Typically, TR ∼ 1− 10ns while residence times may
range between 0.1 and 100 µs [29, 33]. The third time-
scale is of course given by the characteristic time of the
external driving, namely, τ and 2π/Ω respectively.
In the following, we will study how the hopping dy-
namics changes upon varying these latter parameters as
well as the strength of the perturbation.
III. NUMERICAL SIMULATIONS
In this Section we present the outcomes of a series of
numerical simulation of Eqs. (1). In Ref. [21] it was
observed that the sensitivity of each of the T± on the
imposed current fluctuations may be notably different
depending on the parameters’ choice. This is a typi-
cal signature of the multiplicative nature of the stochas-
tic process. In particular, one can argue [21] that such
“simmetry-breaking” effects mostly depend on the ratio
εσ/δ where
, δ =
. (7)
The parameter σ represents the gain saturation induced
by the total power in the laser, while δ describes the
reduction in gain saturation due to partitioning of the
power between the two modes.
The possibility of obtaining qualitatively different re-
sponses depending on the actual parameters corresponds
to the different experimental observations reported for
both EELs [21, 28, 33] and VCSELs [17, 29]. Those two
classes of lasers were indeed found to display markedly
different simmetry-breaking effects under current modu-
lation. To account for those features, we consider two
different sets of phenomenological parameters. For defi-
niteness, in both cases we fix ε = 0.1, s = 1.0, Nc = 1.1,
γ = 0.01 and change the values of c and Dsp (see Table
I). The first set (δ = 0.05) corresponds to the case in
which added modulation changes the hopping time scale
in an almost symmetric way. On the contrary, in the
second case (δ = 0.15) the asymmetry effect of the noise
is stronger [21]. We can thus consider the two as rep-
resentative of the VCSELs and EELs case respectively.
The value of J0 has been empirically adjusted to yield
T+ ≃ T− ≡ Ts and an almost symmetric distribution of
intensities in absence of modulation. The actual values
are about 10% above the laser threshold. The sponta-
neous emission coefficient Dsp has been chosen to yield a
value of the residence times of the same order of magni-
tude of the experimental ones.
In the following, we decide to set α = 0 which is ap-
propriate for our EEL model where the phase dynamics
is not relevant [21]. This choice may however not be fully
justified for the VCSEL case. In this respect, the sim-
ulations presented below are representative of the VC-
SEL dynamics only in a qualitative sense. Nonetheless,
it should be pointed out that a 1D Langevin model in-
dependent of α describes also the VCSEL case [30, 31].
Since resonant activation is mainly due to the multiplica-
tive noise effect described by such equations [see Eq. (9)
below] we consider this as an indirect proof that phe-
nomenology we will report below should be observable
also in the VCSEL case.
The largest part of the simulations were performed
with Euler method with time steps 0.01-0.05 for times
in the range 107 − 108 time units depending on the val-
ues of τ and Ω. For comparison, some checks with Heun
method [34] have also been carried on. Within the sta-
tistical accuracy, the results are found to be insensitive
the the choice of the algorithm.
A. Stochastic modulation
Let us start illustrating the results in the case of
stochastic current modulation (Eq. (4)). In Fig. 1 and
TABLE I: The parameter values used in the two series of
simulations of Eqs. (1), the other values are given in the text.
c Dsp J0 δ σ
1.1 0.7 × 10−5 1.197 0.05 1.05
1.3 1.5 × 10−5 1.194 0.15 1.15
2 we report the measured dependence of the residence
times T± on the correlation time τ for the two parameter
sets given in Table I and different values of the noise vari-
ance DJ . In all cases, the curves display well-pronounced
minima at an optimal value of τ . This is the typical sig-
nature of resonant activation. The minima are almost
located between the relaxation time TR and the hopping
time Ts (marked by the vertical dashed lines). The val-
ues of TR reported in the figures have been estimated
from the reduced model discussed in the next Section,
see Eq. (14) below.
The effect manifest in a different way for the second
parameter set. In the case of Fig. 1 both times attain a
minimum, albeit with different values. On the contrary
the data of Fig. 2 show that one of the two times is hardly
affected from the external perturbation regardless of the
value of τ . In other terms, we can tune the current corre-
lation in such a way that emission along only one of the
two modes is strongly reduced (about a factor 10 in the
simulation discussed here).
B. Periodic modulation
Let us now turn to the case of sinusoidal current mod-
ulation (Eq. 6). In Fig. 3 and 4 we report the measured
dependence of the residence times T± on the frequency Ω
for the two parameter sets given in Table I and different
values of the amplitude A. For comparison with the pre-
vious case we choose A such that the RMS value of (6)
is roughly equal to the variance of (4), i.e. A ≃
2DJ .
As in the previous case, the curves display resonant
activation at an optimal value of Ω. For the second set
of parameters, one of the two hopping times is more re-
duced than the other (compare Fig. 4 with Fig. 2). It
should be also noticed that the data in Fig. 2 display
some statistical fluctuations while the curves for the pe-
riodic modulation are smoother.
IV. INSIGHTS FROM A REDUCED MODEL
In order to better understand the activation phe-
nomenon it is useful to reduce the five–dimensional dy-
namical system (1) to an effective one-dimensional sys-
tem. This has been accomplished in Ref. [21]. For com-
pleteness, we only recall here some basic steps of the
derivation. In the first place, we introduce the change of
Correlation time τ
=4  10
=4  10
=4  10
FIG. 1: (Color online) Simulations of the rate equations with
Ornstein-Uhlenbeck current fluctuations, parameter set with
c = 1.1 (see text and Table I): residence times T+ (squares)
and T− (circles) for increasing values of the current variance
DJ . The values of the relaxation time TR and the hopping
time Ts (in absence of modulation) are marked by the vertical
dashed lines.
coordinates
E+ = r cosφ exp iψ+, E− = r sinφ exp iψ− . (8)
In these new variables, r2 is the total power emitted by
the laser, and φ determines how this power is partitioned
among the two modes. The values φ = 0, π/2 correspond
to pure emission in mode + and − respectively. The
phases ψ± do not influence the evolution of the modal
amplitudes and carrier density and can be ignored.
In order to simplify the analysis, we assume that (i)
The difference between modal gains is very small, i.e.,
Nc >∼ 1, ε≪ 1, c >∼ s; (ii) the laser operates close enough
to threshold, so that r2 ≪ 1 and the saturation term is
small: in this limit, r and N decouple to leading order
from φ; (iii) r and N can be adiabatically eliminated and
(iv) only their fluctuations around the equilibrium val-
ues due to J are retained. This last assumption holds for
weak spontaneous noise and amounts to say that r and
Correlation time τ
=1  10
=5  10
=1  10
FIG. 2: (Color online) Simulations of the rate equations with
Ornstein-Uhlenbeck current fluctuations, parameter set with
c = 1.3 (see text and Table I): residence times T+ (squares)
and T− (circles) for increasing values of the current variance
N are stochastic processes given by nonlinear transfor-
mations of J (see Eqs. (16) in Ref. [21]). This requires
that J does not change too fast. For example, in the
case of the Orstein–Uhlenbeck process, Eq. (4), τ should
be larger than the relaxation time of the total intensity.
The validity of the above reduction has been carefully
checked against simulations of the complete model [21].
For the scope of the present work, we performed a fur-
ther check by comparing the spectrum of fluctuations of
r2 with the imposed one, Eq. (4). Indeed, the behaviour
is the same for τ > TR while for shorter τ some differ-
ences are detected. This means that the reduced descrip-
tion discussed below becomes less and less accurate. On
the other hand, in this regime spontaneous fluctuation
should dominate and this limitation become less relevant
for our purposes.
Altogether, the hopping dynamics is effectively one-
dimensional and is described by the slow variable φ. Its
Modulation period 2π/Ω
A=0.020
A=0.090
A=0.009
FIG. 3: (Color online) Simulations of the rate equations with
sinusoidal modulation of the current, parameter set with c =
1.1 (see text and Table I): residence times T+ (squares) and
T− (circles) for increasing values of modulation amplitude A.
evolution is ruled by the effective Langevin equation
φ̇ = −1
a cos 2φ+ b
sin 2φ +
tan 2φ
2Dφ ξφ (9)
where, together with (7) we have defined the new set of
parameters
(1 + σ)Nc − 1
1 + σ
(J − 1) (11)
1 + σ
(J − Js) (12)
(1 + σJ)2
(1 + σ)(J − 1)
Dsp . (13)
We remind in passing that the same equation (9) has
been derived by Willemsen et al. [30, 31] to describe po-
larization switches in VCSELs (see also Ref. [35] for a
similar reduction). The starting point of their derivation
Modulation period  2π/Ω
6 A=0.014
A=0.030
A=0.055
FIG. 4: (Color online) Simulations of the rate equations with
sinusoidal modulation of the current, parameter set with c =
1.3 (see text and Table I): residence times T+ (squares) and
T− (circles) for increasing values of modulation amplitude A.
is the San Miguel-Feng-Moloney model [36]. The physi-
cal meaning of the variable φ is different from here as it
represents the polarization angle of emitted light. This
supports the above claim that, upon a suitable reinterpre-
tation of variables and parameters, many of the results
presented henceforth may apply also to the dynamics of
VCSELs.
In absence of modulation (δJ = 0), Eq. (9) is bistable
in an interval of current values where it admits two stable
stationary solutions φ± and an unstable one φ0 (double-
well). This regime correspond to the bistability region of
model (1). Notice that for J0 = Js, b = 0 the hopping
between the two modes occurs at the same rate. The
above definitions allows an estimate of relaxation time TR
defined above. This is is the inverse of the curvature of
the potential in φ0. For J0 = Js this is straightforwardly
evaluated to be
(1 + σ)
δ(Js − 1)
For the two parameter sets given in Table I one finds
TR = 210 TR = 77.0, respectively. These are the values
emploied to draw the leftmost vertical lines in Figs. 1-4.
The effect of a time-dependent current is to make the
coefficients a, b and Dφ fluctuating. It can be shown [21]
that the effect onDφ can be recasted as a renormalization
of the intensity of the spontaneous-emission noise. How-
ever, for the parameters employed in the present work it
turns out that this correction is pretty small and will be
neglected henceforth by simply considering Dφ as con-
stant [38]. For simplicity, we also disregard the depen-
dence of Dφ on δJ in the drift term of Eq. (9). Under
those further simplifications the Langevin equation can
be rewritten as
φ̇ = −U ′(φ)− V ′(φ) δJ +
2Dφ ξφ (15)
where we have express the force term as derivatives of
the “potentials”
U(φ) = −
δ(J0 − 1)
16(1 + σ)
cos 4φ−
εσ(J0 − Js)
4(1 + σ)
cos 2φ
−Dφ ln sin 2φ (16)
V (φ) = − δ
16(1 + σ)
cos 4φ− εσ
4(1 + σ)
cos 2φ. (17)
Langevin equations of the form (15) with (4) have been
thoroughly studied in the literature (see e.g. [7, 8, 9, 10,
11] and references therein) as prototypical examples of
the phenomenon of activated escape over a fluctuating
barrier. In view of their non-Markovian nature, their
full analytical solution for arbitrary τ is not generally
feasible. Several approximate results can be provided in
some limits.
For an arbitrary choice of the parameters, V has a
different symmetry with respect to U meaning that the
effective amplitude of multiplicative noise is different
within the two potential wells. If this difference is large
enough, current fluctuation will remove the degeneracy
between the two stationary solutions. This is best seen
by computing the istantaneous potential barriers ∆U±(t)
close to the symmetry point J0 = Js . For weak noise
and δJ ≪ (Js − 1), they are given to first-order in δJ(t)
∆U±(t) ≃
8(1 + σ)
(Js − 1) +
δ ± 2εσ
8(1 + σ)
δJ(t) . (18)
Obviously, this last expression makes sense only when
the fluctuating term is sub-threshold i.e. whenever the
system is bistable. In the case of periodic modulation,
formula (18) allows estimating the range of amplitude
values for a sub-threshold driving
δ(Js − 1)
δ ± 2εσ
. (19)
Using this condition, along with the parameter values at
hand, we deduce that the cases displayed in lower panels
of Figs. 3 and 4 correspond to superthreshold driving.
However, while the minima are much more pronounced
than in the other panels, there is no qualitative difference
in the system response. In the case of stochastic modu-
lation, the same remark applies in a probabilistic sense
for the last panels of Figs. 1 and 2.
Altogether, the mode switching can be seen as an acti-
vated escape over fluctuating barriers given by Eq. (18).
The statistical properties of the latter process is con-
trolled by the current fluctuations. We now discuss the
properties of various regimes. For simplicity, we refer
to the case of stochastic modulations. Most of the re-
marks and formulas reported in the following Subsection
should apply also to the periodic case by replacing τ and
DJ with 2π/Ω and A
2/2 whenever appropriate.
A. Fast barrier fluctuations: τ < TR ≪ T±
As we already pointed out, in this regime the reduc-
tion to Eq. (15) is not justified. We may thus only expect
some qualitative insight on the behaviour of the rate-
equations. From a mathematical point of view, some
analytical approximations for equations like (15) are fea-
sible in this limit (see e.g. Ref. [8] for the stochastic
case). For our purposes, it is sufficient to note that in
this regime the effect of δJ is hardly detected for both
types of driving (see again Figs. 1-4). Note also that
working at DJ fixed means that for τ → 0 the fluctua-
tion become negligible.
B. Resonant activation: TR < τ ≪ T±
If TR < τ we are in the colored noise case. The prob-
lem is amenable of a kinetic description which amounts
to neglect intrawell motion and reduce to a rate model
describing the statistical transitions in terms of transition
rates. If we consider τ as a time scale of the external driv-
ing we can follow the terminology of Ref. [39] and refer
to this situation as the “semiadiabatic” limit of Eq. (15).
In this regime, the residence time is basically the short-
est escape time, which in turn correspond to the lowest
value of the barrier (the noise is approximatively con-
stant in the current range considered henceforth). For
the case of interest, δ < 2εσ we can use (18) to infer
that the minimal values of ∆U± should be attained for
δJ ∝ ∓
DJ respectively. This yields
T± ≃ Ts exp
−K 2εσ ± δ
1 + σ
where K is a suitable numerical constant. Notice that δ
controls the asymmetry level: if δ ≪ 2εσ the two resi-
dence times decrease at approximatively the same rate.
This prediction is verified in the simulations and also in
the experiment [21].
As a further argument in support of the above rea-
soning, we also evaluated the probability distributions
of the residence times obtained from the simulation of
the rate equations. In Fig. 5, we show two representa-
tive cumulative distributions. The data are well fitted
by a Poissonian P (T ) = 1 − exp(−T/T±) for both the
stochastic and periodic modulation cases. This confirms
that hopping occours preferentially when a given (mini-
mal) barrier occurs.
0 0.5 1 1.5
0 1 2 3 4
FIG. 5: (Color online) Cumulative distributions of the resi-
dence times in the resonant activation region, parameter set
with c = 1.3 (see text and Table I). Left panel: stochastic
modulation with DJ = 5×10
−4, τ = 1.638×103 . Right panel:
periodic modulation with A = 0.03 and period 1.286 × 104.
We report only the histograms for the times whose averages
are denoted by T+ in the text. Solid line is the cumulative
Poissonian distribution with the same average.
C. Slow barrier, frequent hops: TR ≪ T± ≪ τ
This corresponds to the adiabatic limit in which the
time scale of the external driving is slower than the intrin-
sic dynamics of the system [39]. To a first approximation
we can here treat current fluctuations in a parametric
way. Correction terms may be evaluated by means of
a suitable perturbation expansion in the small parame-
ter 1/τ [10]. If δJ is small enough for the expression
(18) to make sense, the escape time can be estimated as
the average of escape times over the distribution of bar-
rier fluctuations, i.e. 〈T±〉δJ . For the case of Eq. (4),
the variable δJ is Gaussian and we can use the identity
〈expβz〉 = exp(β2〈z2〉/2) to obtain [11]
T± ≃ Ts exp
[2(δ ± 2εσ)2
(1 + σ)2D2
. (21)
This reasoning implies that for large τ the residence times
should approach two different constant values. A closer
inspection of the graphs (in linear scale) reveals that this
is not fully compatible with the data of Fig.1 even for
the smallest value of DJ . In several cases, T± continue to
increase with τ and no convincing evidence of saturation
is observed. We note that the same type of behaviour
was already observed in the analog simulations data of
Ref. [11]. There, an increase of hopping times duration at
large τ was found. The Authors of Ref. [11] explained this
as an effect of a too large value of the noise fluctuation
forcing the system to jump roughly every τ . We argue
that the same explanation holds for our case. This is also
consistent with the fact that the exponential factors in
Eq. (21) evaluated with the simulation parameters turn
out to be much larger than unity.
V. CONCLUSIONS
In this paper, we have explored numerically and ana-
lytically the effects of external current fluctuations on the
mode-hopping dynamics in a model of a bistable semicon-
ductor laser. To the best of our knowledge, this setup
provides the first theoretical evidence of resonant activa-
tion in a laser system. As the phenomenon has hardly
received any experimental confirmation in optics, we be-
lieve that our study may open the way to future research
in this subfield.
The model we investigated is based on a rate-equation
description, where the bias current enters parametrically
into the evolution of the modal amplitudes. We consid-
ered, two kinds of current flutuations, namely, a stochas-
tic process ruled by an Orstein-Uhlenbeck statistics, and
a coherent, sinusoidal modulation. These choices are mo-
tivated by the aim of proposing a suitable setup for an
experimental verification of our results. Upon varying the
characteristic time-scale of the imposed fluctuations, we
have shown that the residence times attain a minimum
for a well-defined value, which is the typical signature
of resonant activation. The magnitude of the effect can
be different depending on the parameters of the model.
Moreover, the response of the system appears very much
similar for both periodic and random modulations.
The reduction of the rate equations to a one-
dimensional Langevin equation allowed us to recast the
problem as an activated escape over a fluctuating bar-
rier. To first approximation, the fluctuating barrier (mul-
tiplicative term) is mainly controlled by current modu-
lations while the spontaneous noise act as an additive
source. This simplified description has allowed us to
draw some predictions (e.g. the dependence of residence
times on noise strength) and to better understand the
role of the physical parameters. Given the generality of
the description, our results should apply to a broad class
of multimode lasers, including both Edge Emitting and
Vertical Cavity Lasers.
From an experimental point of view, driving the laser
in a orders-of-magnitude wide range of time-scales is
more feasible in the case of a sinusoidal modulation than
for a colored, high frequency noise. However, given the
evidence of a resonant activation phenomenon for such
modulation, our results indicate that it occurs almost for
the same parameters in the case of colored noise, provided
that the RMS of the modulations equals the amplitude
of the added noise. Thus, the phenomenon could be fully
exploited along those lines. Since the reported experi-
mental evidences of the phenomenon are so far scarce,
we hope that the present work could suggest a detailed
characterization in optical systems that allows for both
very precise measurements and careful control of param-
eters.
[1] R. Graham and A. Schenzle, Phys. Rev. A 26, 1676
(1982).
[2] R. N. Mantegna and B. Spagnolo, Phys. Rev. Lett. 76,
563 (1996).
[3] L. Gammaitoni, P. Hänggi, P. Jung, and F. Marchesoni,
Rev. Mod. Phys. 70, 223 (1998).
[4] A. S. Pikovsky and J. Kurths, Phys. Rev. Lett. 78, 775
(1997).
[5] C.R. Doering and J.C. Gadoua, Phys. Rev. Lett. 69, 2318
(1992).
[6] M. Bier and R. D. Astumian, Phys. Rev. Lett. 71, 1649
(1993).
[7] P. Hänggi, Chem. Phys. 180 157 (1994)
[8] A. J. R. Madureira, P. Hänggi, V. Buonomano, J. Ro-
drigues and A. Waldyr, Phys. Rev. E 51 3849 (1995).
[9] P. Reimann, Phys. Rev. E 52 1579 (1995).
[10] J. Iwaniszewski, Phys. Rev. E 54 3173 (1996).
[11] M. Marchi, F. Marchesoni, L. Gammaitoni, E.
Menichella- Saetta and S. Santucci, Phys. Rev. E 54 3479
(1996).
[12] A. L. Pankratov and M. Salerno, Phys. Lett. A 273, 162
(2000).
[13] M.I. Dykman, B. Golding, L.I. McCann, V.N. Smelyan-
skiy, D.G. Luchinsky, R. Mannella and P.V.E. McClin-
tock, Chaos 11, 587 (2001).
[14] R. N. Mantegna and B. Spagnolo, Phys. Rev. Lett. 84,
3025 (2000).
[15] C. Schmitt, B. Dybiec, P. Hänggi and C. Bechinger, Eu-
rophys. Lett. 74(6), 937 (2006).
[16] R. Roy, R. Short, J. Durnin and L. Mandel, Phys. Rev.
Lett. 45, 1486 (1980).
[17] G. Giacomelli, F. Marin and I. Rabbiosi, Phys. Rev. Lett.
82, 675 (1999).
[18] F.Pedaci, M. Giudici, J.R. Tredicce and G. Giacomelli,
Phys. Rev. E 71, 036125 (2005).
[19] S. Barbay, G. Giacomelli, S. Lepri and A. Zavatta, Phys.
Rev. E 68, 020101(R) (2003).
[20] A. Schenzle and H. Brand, Phys. Rev. A 20, 1628 (1979).
[21] F.Pedaci, S. Lepri, S. Balle, G. Giacomelli, M. Giudici
and J.R. Tredicce, Phys. Rev. E 73, 041101 (2006).
[22] K. Petermann, Laser Diode Modulation and Noise,
ADOP-Kluwer Academic Publisher, Dordrecht (The
Nederlands), 1988.
[23] G. P. Agrawal and N. K. Dutta, Long wavelength semi-
conductor lasers, Van Nostran Reinhold, New York, 1986.
[24] T.E. Sale, Vertical Cavity Surface Emitting Lasers Wiley,
New York, 1995.
[25] J. Albert et al., Opt. Comm. 248, 527 (2005).
[26] M. Ohtsu, Y. Teramachi, Y. Otsuka, and A. Osaki, IEEE
J. of Quantum Electron. vol. 22, 535 (1986).
[27] M. Ohtsu and Y. Teramachi, IEEE J. Quantum Electron.
vol. 25, 31 (1989).
[28] L. Furfaro, F. Pedaci, X. Hachair, M. Giudici, S. Balle,
J.R. Tredicce, IEEE J. of Quantum Electron. 40, 1365
(2004).
[29] G. Giacomelli and F. Marin, Quantum Semiclass. Opt.
10, 469 (1998).
[30] M.B. Willemsen, M. U. F. Khalid, M. P. van Exter and
J. P. Woerdman, Phys. Rev. Lett. 82, 4815 (1999).
[31] M. P. van Exter, M.B. Willemsen, J. P. Woerdman, Phys.
Rev. A 58 4191 (1998).
[32] B. Nagler et al., Phys. Rev. A 68 013813 (2003).
[33] F. Pedaci, M. Giudici, G. Giacomelli, J. R. Tredicce,
Appl. Physics B81, 993 (2005).
[34] R. Toral and M. San Miguel, ”Stochastic effects in phys-
ical systems”, in Instabilities and Nonequilibrium Struc-
tures VI, 35-130, edited by Enrique Tirapegui, Javier
Martinez, and Rolando Tiemann, Kluwer Academic Pub-
lishers (2000).
[35] G. Van der Sande, J. Danckaert, I. Veretennicoff and T.
Erneux, Phys. Rev. A 67 013809 (2003).
[36] M. San Miguel, Q. Feng, J.V. Moloney, Phys. Rev. A 52
1728 (1995)
[37] B. Nagler, M. Peeters, I. Veretennicoff and J. Danckaert,
Phys. Rev. E 67, 056112 (2003).
[38] For specific choices of the parameters, this approximation
may not be justified. For example, when δ ≃ 2εσ the
barrier fluctuations ∆U− is hardly affected by a change
in J and the renormalization of spontaneous noise cannot
be neglected. Since the parameters are independent we
restrict to the generic case in which the above condition
is not fulfilled.
[39] P. Talkner, J. Luczka, Phys. Rev. E 69 046109 (2004).
ABSTRACT
  We theoretically investigate the possibility of observing resonant activation
in the hopping dynamics of two-mode semiconductor lasers. We present a series
of simulations of a rate-equations model under random and periodic modulation
of the bias current. In both cases, for an optimal choice of the modulation
time-scale, the hopping times between the stable lasing modes attain a minimum.
The simulation data are understood by means of an effective one-dimensional
Langevin equation with multiplicative fluctuations. Our conclusions apply to
both Edge Emitting and Vertical Cavity Lasers, thus opening the way to several
experimental tests in such optical systems.

<|endoftext|><|startoftext|>
Introduction
Utilizing the asymptotic freedom of QCD, Collins and Perry [1] first noted that the dense
cores of neutron stars may consist of deconfined quarks instead of hadrons. The crucial
question is whether observations of neutron stars from their birth to death through
neutrino, photon and gravity-wave emissions can unequivocably reveal the presence of
nearly deconfined quarks instead of other possibilities such as only nucleons or other
exotica such as strangeness-bearing hyperons or Bose (pion and kaon) condensates.
2. Neutrino signals during the birth of a neutron star
The birth of a neutron star is heralded by the arrival of neutrinos on earth as confirmed
by IMB and Kamiokande neutrino detectors in the case of supernova SN 1987A. Nearly
all of the gravitational binding energy (of order 300 bethes, where 1 bethe ≡ 1051 erg)
released in the progenitor star’s white dwarf-like core is carried off by neutrinos and
antineutrinos of all flavors in roughly equal proportions. The remarkable fact that the
weakly interacting neutrinos are trapped in matter prior to their release as a burst is due
to their short mean free paths in matter, λ ≈ (σn)−1 ≈ 10 cm, (here σ ≈ 10−40 cm2 is the
neutrino-matter cross section and n ≈ 2 to 3 ns, where ns ≃ 0.16 fm−3 is the reference
nuclear equilibrium density), which is much less than the proto-neutron star radius,
which exceeds 20 km. Should a core-collapse supernova occur in their lifetimes, current
neutrino detectors, such as SK, SNO, LVD’s, AMANDA, etc., offer a great opportunity
for understanding a proto-neutron star’s birth and propagation of neutrinos in dense
matter insofar as they can detect tens of thousands of neutrinos in contrast to the tens
of neutrinos detected by IMB and Kamiokande.
The appearence of quarks inside a neutron star leads to a decrease in the maximum
mass that matter can support, implying metastability of the star. This would occur
if the proto-neutron star’s mass, which must be less than the maximum mass of the
http://arxiv.org/abs/0704.0207v1
Quark matter and the astrophysics of neutron stars 2
Figure 1. Evolutions of the central baryon density nB, ν concentration Yν , quark
volume fraction χ and temperature T for different baryon masses MB. Solid lines
show stable stars whereas dashed lines showing stars with larger masses are metastable.
Diamonds indicate when quarks appear at the star’s center, and asterisks denote when
metastable stars become gravitationally unstable. Figure after Ref. [2]
hot, lepton-rich matter is greater than the maximum mass of hot, lepton-poor matter.
For matter with nucleons only, such a metastability is denied (see, e.g., [3]). Figure 1
shows the evolution of some thermodynamic quantities at the center of stars of various
fixed baryonic masses. With the equation of state used (see [2] for details), stars with
MB ∼< 1.1 M⊙ do not contain quarks and those with MB ∼ 1.7 M⊙ are metastable.
The subsequent collapse to a black hole could be observed as a cessation in the neutrino
signals well above the sensitivity limits of the current detectors (Figure 2).
3. Photon signals during the thermal evolution of a neutron star
Multiwavelength photon observations of neutron stars, the bread and butter affair of
astronomy, has yielded estimates of the surface tempeartures and ages of several neutron
stars (Fig. 3). As neutron stars cool principally through neutrino emission from their
cores, the possibility exists that the interior composition can be determined. The
star continuosly emits photons, dominantly in x-rays, with an effective temperature
Teff that tracks the interior temperature but that is smaller by a factor ∼ 100. The
dominant neutrino cooling reactions are of a general type, known as Urca processes
[4], in which thermally excited particles undergo beta and inverse-beta decays. Each
Quark matter and the astrophysics of neutron stars 3
Figure 2. The evolution of the total neutrino luminosity for stars of indicated baryon
masses. Shaded bands illustrate the limiting luninosities corresponding to a count
rate of 0.2 Hz in all detectors assuming 50 kpc for IMB and and Kamioka, 8.5 kpc
for SNO, SuperK, and UNO. Shaded regions represent uncertanities in the average
neutrino energy from the use of a diffusion scheme for neutrino transport in matter.
Figure after Ref. [2].
reaction produces a neutrino or anti-neutrino, and thermal energy is thus continuously
lost. Depending upon the proton-fraction of matter, which in turn depends on the
nature of strong interactions at high density, direct Urca processes involving nucleons,
hyperons or quarks lead to enhanced cooling compared to modified Urca processes in
which an additional particle is required to conserve momentum. However, effects of
superfluidity abates cooling as sufficient thermal energy is required to break paired
fermions. In addition, the poorly known envelope composition also plays a role in the
inferred surface temperature (Fig. 3). The multitude of high density phases, cooling
mechanisms, effects of superfluidity, and unknown envelope composition have thus far
prevented definitive conclusions to be drawn (see, e.g., [5]).
4. Mesured masses and their implications
Several recent observations of neutron stars have direct bearing on the determination of
the maximum mass. The most accurately measured masses are from timing observations
of the radio binary pulsars. As shown in Fig. 4, which is compilation of the measured
neutron star masses as of November 2006, observations include pulsars orbiting another
Quark matter and the astrophysics of neutron stars 4
Figure 3. Observational estimates of neutron star temperatures and ages together
with theoretical cooling simulations forM = 1.4 M⊙. Models and data are described in
[6]. Orange error boxes (see online) indicate sources from which both X-ray and optical
emissions have been observed. Simulations are for models with Fe or H envelopes,
with and without the effects of superfluidity, and allowing or forbidding direct Urca
processes. Models forbidding direct Urca processes are relatively independent of M
and superfluid properties. Trajectories for models with enhanced cooling (direct Urca
processes) and superfluidity lie within the yellow region, the exact location depending
upon M as well as superfluid and Urca properties. Figure adapted from Ref. [7].
neutron star, a white dwarf or a main-sequence star.
One significant development concerns mass determinations in binaries with white
dwarf companions, which show a broader range of neutron star masses than binary
neutron star pulsars. Perhaps a rather narrow set of evolutionary circumstances conspire
to form double neutron star binaries, leading to a restricted range of neutron star masses
[9]. This restriction is likely relaxed for other neutron star binaries. A few of the
white dwarf binaries may contain neutron stars larger than the canonical 1.4 M⊙ value,
including the intriguing case [10] of PSR J0751+1807 in which the estimated mass with
1σ error bars is 2.1 ± 0.2 M⊙. In addition, to 95% confidence, one of the two pulsars
Ter 5 I and J has a reported mass larger than 1.68 M⊙ [11].
Whereas the observed simple mean mass of neutron stars with white dwarf
companions exceeds those with neutron star companions by 0.25 M⊙, the weighted
means of the two groups are virtually the same. The 2.1 M⊙ neutron star, PSR
J0751+1807, is about 4σ from the canonical value of 1.4 M⊙. It is furthermore the
case that the 2σ errors of all but two systems extend into the range below 1.45 M⊙, so
caution should be exercised before concluding that firm evidence of large neutron star
masses exists. Continued observations, which will reduce the observational errors, are
necessary to clarify this situation.
Masses can also be estimated for another handful of binaries which contain an
Quark matter and the astrophysics of neutron stars 5
Figure 4. Measured and estimated masses of neutron stars in binarie pulsars (gold,
silver and blue regions online) and in x-ray accreting binaries (green). For each region,
simple averages are shown as dotted lines; error weighted averages are shown as dashed
lines. For labels and other details, consult Ref. [8].
accreting neutron star emitting x-rays. Some of these systems are characterized by
relatively large masses, but the estimated errors are also large. The system of Vela X-1
is noteworthy because its lower mass limit (1.6 to 1.7M⊙) is at least mildly constrained
by geometry [12].
Raising the limit for the neutron star maximum mass could eliminate entire families
of EOS’s, especially those in which substantial softening begins around 2 to 3ns. This
could be extremely significant, since exotica (hyperons, Bose condensates, or quarks)
generally reduce the maximum mass appreciably.
Ultimate energy density of observable cold baryonic matter
Measurements of neutron star masses can set an upper limit to the maximum possible
energy density in any compact object. It has been found [13] that no causal EOS has
a central density, for a given mass, greater than that for the Tolman VII [14] analytic
solution. This solution corresponds to a quadratic mass-energy density ρ dependence
Quark matter and the astrophysics of neutron stars 6
Figure 5. Model predictions are compared with results from the Tolman IV and
VII analytic solutions of general relativistic stucture equations. NR refers to non-
relativistic potential models, R are field- theoretical models, and Exotica refers to NR
or R models in which strong softening occurs, due to hyperons, a Bose condensate, or
quark matter as well as self-bound strange quark matter. Constraints from a possible
redshift measurement of z = 0.35 is also shown. The dashed lines for 1.44 and 2.2 M⊙
serve to guide the eye. Figure taken from Ref. [13].
on r, ρ = ρc[1− (r/R)2], where the central density is ρc. For this solution,
ρc,T V II = 2.5ρc,Inc ≃ 1.5× 1016
g cm−3 . (1)
A measured mass of 2.2 M⊙ would imply ρmax < 3.1× 1015 g cm−3, or about 8ns.
Figure 5 displays maximum masses and accompanying central densities for a wide
wariety of neutron star EOS’s, including models containing significant softening due
to “exotica”, such as strange quark matter. The upper limit to the density could be
lowered if the causal constraint is not approached in practice. For example, at high
densities in which quark asymptotic freedom is realized, the sound speed is limited to
3. Using this as a strict limit at all densities, the Rhoades & Ruffini [15] mass
limit is reduced by approximately 1/
3 and the compactness limit GM/Rc2 = 1/2.94
is reduced by a factor 3−1/4 to 1/3.8 [16]. In this extreme case, the maximum density
would be reduced by a factor of 3−1/4 from that of Eq. (1). A 2.2 M⊙ measured mass
would imply a maximum density of about 4.2ns.
5. Gravitational wave signals during mergers of binary stars
Mergers of compact objects in binary systems, such as a pair of neutron stars (NS-
NS), a neutron star and a black hole (NS-BH), or two black holes (BH-BH), are
expected to be prominent sources of gravitational radiation [17]. The gravitational-
wave signature of such systems is primarily determined by the chirp mass Mchirp =
Quark matter and the astrophysics of neutron stars 7
Figure 6. Physical and observational variables in mergers between low-mass black
holes and neutron stars or self-bound quark stars. The total system mass is 6 M⊙ and
the initial mass ratio is q = 1/3 in both cases. The initial radii of the neutron star
and quark star were assumed to be equal. The time scales have arbitrary zero points.
Upper panel displays semi-major axis a (thick lines) and component mass MNS,MQS
(thin lines) evolution. Lower panel displays orbital frequency ν (thick lines) and strain
amplitude |h+r| evolution. Solid curves refer to the neutron star simulation and dashed
curves to the quark star simulations. Figure taken from Ref. [8].
(M1M2)
3/5(M1 +M2)
−1/5, where M1 and M2 are the masses of the coalescing objects.
The radiation of gravitational waves removes energy which causes the mutual orbits to
decay. For example, the binary pulsar PSR B1913+16 has a merger timescale of about
250 million years, and the pulsar binary PSR J0737-3039 has a merger timescale of
about 85 million years [18], so there is ample reason to expect that many such decaying
compact binaries exist in the Galaxy. Besides emitting copious amounts of gravitational
radiation, binary mergers have been proposed as a source of the r-process elements [19]
and the origin of the shorter-duration gamma ray bursters [20].
Observations of gravity waves from merger events can simultaneously measure
masses and radii of neutron stars, and could set firm limits on the neutron star maximum
mass [21, 22]. Binary mergers for the two cases of a black hole and a normal neutron
star and a black hole and a self-bound strange quark matter star (Fig. 6) illustrate
the unique opportunity afforded by gravitational wave detectors due to begin operation
over the next decade, including LIGO, VIRGO, GEO600, and TAMA.
A careful analysis of the gravitational waveform during inspiral yields values for
not only the chirp mass Mchirp, but for also the reduced mass MBHMNS/M , so that
both MBH and MNS can be found [23]. The onset of mass transfer can be determined
by the peak in ω, and the value of ω there gives a. A general relativistic analysis of
mass transfer conditions then allows the determination of the star’s radius [22]. Thus
a point on the mass-radius diagram can be estimated [24]. The combination h+ω
depends only on a function of q, so the ratio of that combination and knowledge of qi
Quark matter and the astrophysics of neutron stars 8
should allow determination of qf . From the Roche condition and knowledge of af from
ωf , another mass-radius combination can be found.
The sharp contrast between the evolutions during stable mass transfer of a normal
neutron star and a strange quark star should make these cases distinguishable. For
strange quark matter stars, the differences in the height of the frequency peak and the
plateau in the frequency values at later times are related to the differences in radii of
the stars at these two epochs. It could be an indirect indicator of the maximum mass
of the star: the closer is the star’s mass before mass transfer to the maximum mass, the
greater is the difference between these frequency values, because the radius change will
be larger. Together with radius information, the value of the maximum mass remains the
most important unknown that could reveal the true equation of state at high densities.
Acknowledgments
This work was supported in part by the U.S. Department of Energy under the grant
DOE/DE-FG02-93ER40756.
References
[1] Collins J C and Perry M J 1975 Phys. Rev. Lett. 30, 1353
[2] Pons J A, Steiner A W, Prakash M and Lattimer J A 2001 Phys. Rev. Lett. 86, 5223
[3] Ellis P J, Lattimer J M and Prakash M 1996 Comments in Nuclear and Particle Physics 22, 63
[4] Lattimer J M, Pethick C J, Prakash M and Haensel P, 1991, Phys. Rev. Lett. 66, 2701
[5] Page D, Prakash M, Lattimer J M, and Steiner A W, 2000 Phys. Rev. Lett. 85, 2048
[6] Page D, Lattimer J M, Prakash M and Steiner A W 2004 Astrophys. Jl 155, 623
[7] Lattimer J M and Prakash M 2004 Science 304,536
[8] Lattimer J M and Prakash M 2006, astro-ph/0612440
[9] Bethe H A and Brown G E 1998 Astrophys. Jl 506, 780
[10] Nice D J et al. 2005 Astrophys. Jl. 634, 1242
[11] Ransom S M 2005 Science 307, 892
[12] Quaintrell et al. 2003 Astron. Astrophys. 401, 303
[13] Lattimer J M and Prakash M Phys. Rev. Lett. 94, 111101
[14] Tolman R C 1939 Phys. Rev. 55, 364
[15] Rhoades C E and Ruffini R 1974 Phys. Rev. Lett. 32, 324
[16] Lattimer J M, Prakash M, Masak D, and Yahil A 1990 Astrophys. Jl. 355, 241
[17] Thorne K S, 1973 Three Hundred Years of Gravitation, ed. S. W. Hawking and W. Israel,
Cambridge Univ. Press, Cambridge, Ch. 9
[18] Lyne A. G. et al., 2004 Science 303, 1153
[19] Lattimer J M and Schramm D, 1976 Astrophys. Jl. 210, 549
[20] Eichler D et al., 1989 Science 340, 126
[21] Prakash M and Lattimer J M 2003, J. Phys. G. Nucl. Part. Phys. 30, S451
[22] Ratkovic S, Prakash M and Lattimer J M 2005, astro-ph/0512136
[23] Cutler C and Flanagen E E 1994, Phys. Rev. D49, 2658
[24] Faber J A et al., 2002 Phys. Rev. Lett. 89, 231102
http://arxiv.org/abs/astro-ph/0612440
http://arxiv.org/abs/astro-ph/0512136
	Introduction
	Neutrino signals during the birth of a neutron star
	Photon signals during the thermal evolution of a neutron star
	Mesured masses and their implications
	Gravitational wave signals during mergers of binary stars
ABSTRACT
  Some of the means through which the possible presence of nearly deconfined
quarks in neutron stars can be detected by astrophysical observations of
neutron stars from their birth to old age are highlighted.

<|endoftext|><|startoftext|>
Introduction
Let k be an algebraically closed field. A fusion category C over k is a k-linear
semi-simple rigid monoidal category with finitely many (isomorphism classes of)
simple objects, finite dimensional morphism spaces, and End(1) ∼= k. See [7] or [4]
for definitions, and [2] for many of the known results about fusion categories.
The rank r of C is the number of isomorphism classes of simple objects in C. Let
{xi}1≤i≤r be a set of simple object representatives. The fusion rules for C are a
set of r × r N-valued matrices N = {Ni}1≤i≤r, with (Ni)j,k denoted Nkij or, when
convenient, Nxkxixj , such that xi⊗xj ∼=
1≤k≤r N
ijxk. In the sequel, assume k = C.
Fusion categories appear in representation theory, operator algebras, conformal
field theory, and in constructions of invariants of links, braids, and higher dimen-
sional manifolds. There is currently no general classification of them. Classifications
of fusion categories for various families of fusion rules have been given in work by
Kerler ([3]), Tambara and Yamagami ([12]), Kazhdan and Wenzl ([5]), and Wenzl
and Tuba ([13]).
For a given set of fusion rules, there are only finitely many monoidal natural
equivalence classes of fusion categories. This property is called Ocneanu rigidity (see
[2]). It is not known whether or not the number of fusion categories of a given rank
is finite. If one assumes a modular structure, the possibilities up to rank four have
been classified by Belinschi, Rowell, Stong and Wang in [11]. Ostrik has classified
fusion categories up to rank two in [10], and constructed a finite list of realizable
fusion rules for braided categories up to rank three in [9], in which the number of
categories for each set of fusion rules is known. The rank two classification relies in
an essential way on the theory of modular tensor categories; Ostrik shows that the
quantum double of a rank two category must be modular, and uses the theory of
modular tensor categories to eliminate most of the possibilities. The classification
of modular tensor categories is of independent interest; in many contexts one must
assume modularity.
We consider the only set of rank three fusion rules which is known to be realizable
as a fusion category but which has no braided realizations. Ostrik conjectured in
http://arxiv.org/abs/0704.0208v2
2 TOBIAS J. HAGGE AND SEUNG-MOON HONG
[9] that a classification for this rule set completes the classification of rank three
fusion categories.
The axioms for fusion categories over C reduce to a system of polynomial equa-
tions over C. In this context, Ocneanu rigidity, roughly translated, says that nor-
malization of some of the variables in the equations gives a finite solution set. In
this case, one can compute a Gröbner basis for the system and obtain the solutions
(see [1]). However, normalization becomes complicated when there are i, j, k such
that Nkij > 1. The fusion rules we consider are the smallest realizable set with this
property.
2. Main theorem and outline
Theorem 1 (Main Theorem). Consider the set of fusion rules with three simple
object types, x, y and 1. Let 1 be the trivial object, and let x⊗ x ∼= x⊕ x⊕ y⊕ 1,
x⊗ y ∼= y ⊗ x ∼= x and y ⊗ y ∼= 1. Then the following hold:
(1) Up to monoidal natural equivalence, there are four semisimple tensor cat-
egories with the above fusion rules. A set of associativity matrices for one
of these categories is given in Appendix A. Applying a nontrivial Galois
automorphism to all of the coefficients gives a set of matrices for any one
of the other three categories.
(2) The categories in part 1 are fusion categories.
(3) The categories in part 1 do not admit braidings.
(4) The categories in part 1 are spherical.
The structure of the remainder of the paper is as follows:
Section 3 describes the notation and categorical preliminaries used in later parts
of the paper. It constructs a canonical representative for each monoidal natural
equivalence class of fusion categories. This construction is really just two well
known constructions, skeletization and strictification, applied in sequence. These
constructions, taken together, form a bridge between the category theoretic lan-
guage in the statements of the theorems and the algebra appearing in the proofs.
For some of the calculations in this paper the translation between the category the-
ory and the algebra is already widely known, but there are some subtleties when
discussing pivotal structure that justify the treatment. Section 3 concludes by de-
scribing the algebraic equations corresponding to the axioms for a fusion category,
using the language of strictified skeletons.
Section 4 proves part 1 of Theorem 1. The proof amounts to solving the variety
of polynomial equations defined in the previous section, performing normalizations
along the way in order to simplify calculations. The section ends by arguing that
the nature of the normalizations guarantees that the solutions obtained really are
monoidally inequivalent. Section 5 proves part 2 of Theorem 1 by explicitly com-
puting rigidity structures.
Part 3 of Theorem 1 follows from Ostrik’s classification of rank three braided
fusion categories in [9]. Section 6 gives a direct proof by showing that there are no
solutions to the hexagon equations.
Section 7 defines pivotal and spherical structures and discusses their properties.
The focus is on the question of whether every fusion category is pivotal and spher-
ical. A novel and elementary proof that the quadruple dual functor is naturally
isomorphic to the identity functor is given. This proof makes use of the strictified
skeleton construction developed in Section 3. The section concludes by describing
SOME NON-BRAIDED FUSION CATEGORIES OF RANK 3 3
what a pivotal category which does not admit a spherical structure would look like.
In particular, it must have at least five simple objects.
Section 8 proves part 4 of Theorem 1 by computing explicit pivotal structures
for the four categories given in Section 4, and invoking a lemma from Section 7 for
sphericity.
3. Preliminaries and notational conventions
This paper uses the “composition of morphisms” convention for functions as
well as morphisms, and left to right matrix multiplication. For calculations of the
fusion rules, our treatment is similar to [12], but the notation differs superficially
for typographic reasons. The notation captures algebraic data sufficient to classify
a fusion category up to monoidal natural equivalence, and is reviewed later in this
section.
We assume that the reader is familiar with the notions of a monoidal category,
a monoidal functor and a monoidal equivalence; for precise definitions, see [7].
Recall that a monoidal category is equipped with an associative bifunctor ⊗ and
a distinguished object 1. Reassociation of tensor factors in a monoidal category is
described by a natural isomorphism of trifunctors α : (−⊗−)⊗− → −⊗ (−⊗−).
Tensor products with 1 have natural isomorphisms ρ : −⊗1 → − and λ : 1⊗− → −.
These isomorphisms are subject to a coherency condition, namely that for any pair
of multifunctors there is at most one natural isomorphism between them which may
be constructed from λ, ρ, α and their inverses, along with Id and ⊗. This coherency
condition is well known to be equivalent to the statement that the category satisfies
the pentagon and triangle axioms (see [7] for a proof).
The triangle equations are the equations ρx⊗y = αx,1,y ◦ (x⊗λy) for all ordered
pairs (x, y) of objects. Here, and in the sequel when the context is unambiguous,
the name of an object is used as a shorthand for the identity morphism on that
object. The pentagon equations, defined for all tuples of objects (w, x, y, z), are as
follows (see also Figure 1):
(αx,y,z ⊗ w) ◦ αx,y⊗z,w ◦ (x⊗ αy,z,w) = αx⊗y,z,w ◦ αx,y,z⊗w,
When studying fusion categories up to monoidal equivalence, one may choose
categories within an equivalence class which have desirable attributes. Since the
categorical properties considered in this paper (fusion rule structure, monoidality,
pivotality, sphericity, presence of braidings) are all well known to be preserved un-
der monoidal equivalence, the desirable attributes may be assumed without loss. In
particular, one may construct, given an arbitrary fusion category, a canonical rep-
resentative for that category’s equivalence class in which one may replace instances
of the words “is isomorphic to” with “equals”.
3.1. Skeletization. The skeleton CSKEL of an arbitrary category C is any full
subcategory of C containing exactly one object from each isomorphism class in
C. If C is semi-simple, every object in C is isomorphic to a direct sum of simple
objects in CSKEL. One may then assume without loss that the objects of CSKEL
consist of simple object representatives and direct sums of such.
It is a well known fact that CSKEL may be given a monoidal structure such
that CSKEL and C are monoidally equivalent. The proof is a straightforward but
tedious extension of Maclane’s proof of the natural equivalence an ordinary category
4 TOBIAS J. HAGGE AND SEUNG-MOON HONG
and its skeleton (see section IV.4 in [7]). In that proof, one defines a family of
isomorphisms ix from objects x to their isomorphic representatives in CSKEL and
uses it to construct a pair of functors F and G which give a natural equivalence.
For the extension, the ix are used to define the tensor product functor on CSKEL,
as well as α, λ, ρ and the monoidal structures for F and G. One then writes out
all of the relevant commutative diagrams and removes any compositions ix ◦ i−1x .
The result in each case is a commutative diagram in C.
3.2. Strictification. Given a monoidal category C, one may construct a strict
monoidal category CST R equivalent to C. In a strict monoidal category, α, λ and ρ
are the identity. It is common practice to assume that a monoidal category is strict
without explicit reference to the construction. However, by using the construction
explicitly we will be able to pick a canonical representative for an equivalence class of
monoidal categories and provide a natural interpretation of the graphical calculus.
Strictification of a monoidal category is analogous to the construction of a tensor
algebra; it gives an equivalent strict category CSTR by replacing the tensor product
with a strictly associative formal tensor product. The objects of CSTR are finite
sequences of objects in C. Morphism spaces of the form
Mor((a1, a2, . . . , am−1, am), (b1, b2, . . . , bn−1, bn))
are given by
Mor(a1 ⊗ (a2 ⊗ . . . (am−1 ⊗ am) . . . ), b1 ⊗ (b2 ⊗ . . . (bn−1 ⊗ bn) . . . ))
in C. The tensor product on objects is just concatenation of sequences, for mor-
phisms it is the tensor product in C pre and post-composed with appropriate asso-
ciativity morphisms. Monoidal equivalence of C with CST R is proven in section XI.3
of [7].
It is not usually possible to make a fusion category strict and skeletal at the
same time. However, the category (CSKEL)ST R, while not a skeleton, is still unique
up to strict natural equivalence. Also, it is a categorical realization of a graphical
calculus, as will shortly become clear. The next subsection describes what strictified
skeleta of fusion categories look like, up to strict equivalence.
3.3. Strictified skeletal fusion categories. A strictified skeletal fusion category
C is as follows: Let N be a set of fusion rules for a set of objects S. Then the objects
in C are multisets of finite sequences of elements of S. C has a tensor product ⊗,
which is defined on objects by pairwise concatenation of sequences, distributed over
elements of multisets. Direct sum of objects is given by multiset disjoint union.
A strand is an object which is a sequence of length one. Strands correspond
to simple object types. If x, y and z are strands, define Mor(x ⊗ y, z) to be a k
vector space isomorphic to kN
xy . For brevity, V yx will denote Mor(x, y), and tensor
products will be omitted when the context is clear. A morphism is (n,m)-stranded
if its source and target are sequences of length n and m, respectively. A morphism
is (n)-stranded if it is (m,n−m)-stranded for some 0 ≤ m ≤ n.
Semi-simplicity of C means that for all objects w, x, y and z there are vector space
isomorphisms
v∈S V
xy ⊗ V zwv ∼= V zwxy ∼=
v∈S V
wx ⊗ V zvy . The first isomorphism
is given by f ⊗ g → (Idw ⊗ f) ◦ g, the inverse of the second by h⊗ l → (h⊗ Idy) ◦ l.
The composition of the two isomorphisms is denoted αzw,x,y. Additionally, each
morphism space V zxy has an algebraically dual space V
z , in the sense that there
are bases {vi}i ⊂ V zxy and {wi}i ⊂ V xyz such that wi ◦ vj = δijIdz.
SOME NON-BRAIDED FUSION CATEGORIES OF RANK 3 5
In a strictified skeleton, the trivial object 1 is the zero length sequence. There
is a strand which is isomorphic to the trivial object, but not equal. This strand
shall be taken to be 1 in the sequel. This choice makes the category non-strict,
since λ and ρ are no longer the identity, but it is convenient for graphical calculus
purposes.
(Right) rigidity in a strictified skeleton means that there is a set involution ∗ on
the strands, and each strand x has morphisms bx : 1 → x⊗ x∗ and dx : x∗ ⊗ x → 1
such that (Idx∗ ⊗ bx) ◦ (dx ⊗ Idx∗) = Idx∗ and (bx ⊗ Idx) ◦ (Idx ⊗ dx) = Idx.
This implies that Nzxy = N
z∗x. Left rigidity is similar, and in the sequel, the right
rigidity morphism for x∗ will be defined to be the left rigidity morphism for x.
Define ∗, b, and d on concatenations of strands such that dx⊗y = (Idy∗ ⊗ dx ⊗
Idy) ◦ dy and extend to direct sums. Then there is a contravariant (right) dual
functor ∗ which sends f ∈ V yx to f∗ = (Idy∗ ⊗ bx)◦ (Idy∗ ⊗f ⊗ Idx∗)◦ (dy ⊗ Idx∗) ∈
y∗ . The definition of a left dual functor is similar, and the two duals are inverse
functors by rigidity.
Monoidality for a strictified skeletal fusion category implies that the α are iden-
tity morphisms. For this to be true, it is necessary and sufficient that the following
equation holds for all objects x, y, z, w, and u. Each instance will be referred to as
Pux,y,z,w in the sequel.
αty,z,wV
V syzα
x,s,w ◦
αtx,y,zV
V szwV
V syzV
V syzV
V sxyV
is equal to
V szwα
x,y,s ◦ τ ◦
V sxyα
s,z,w :
V szwV
V szwV
V sxyV
V sxyV
Here ⊗ for vector spaces and morphisms are omitted, and τ is the isomorphism
interchanging the first and the second factors of vector space tensor products (see
Figure 1).
3.4. Remarks.
(1) Every fusion category is monoidally naturally equivalent to a strictified
skeleton. Also, two naturally equivalent strictified skeleta have an invert-
ible equivalence functor that takes strands to strands. This implies that
equivalences are given by permutations of strands along with changes of
basis on the (2, 0) and (2, 1)-stranded morphism spaces.
(2) The functor ∗∗ fixes objects. The isomorphisms Jx,y : x∗∗⊗y∗∗ → (x⊗y)∗∗
associated with ∗∗ in the definition of a monoidal functor (see [7]) may be
taken to be trivial. There is an invertible scalar worth of freedom in the
choice of each bx, dx pair.
(3) Semi-simplicity allows every morphism to be built up from (3)-stranded
morphisms. Choosing bases for the (3)-stranded morphisms allows mor-
phisms in C to be characterized as undirected trivalent graphs with labeled
6 TOBIAS J. HAGGE AND SEUNG-MOON HONG
((x⊗ y)⊗ z)⊗ w
αx,y,zw
uukkk
αxy,z,w
(x⊗ (y ⊗ z))⊗ w
αx,yz,w
(a) (x ⊗ y)⊗ (z ⊗ w)
αx,y,zw
x⊗ ((y ⊗ z)⊗ w)
xαy,z,w // x⊗ (y ⊗ (z ⊗ w))
s,t V
s,t V
V sxyα
s,z,woo
s,t V
αtx,y,zV
s,t V
s,t V
V syzα
x,s,w
s,t V
y,z,wV
V szwα
x,y,s
Figure 1. (a) Pentagon equality and (b) corresponding equality
edges and vertices, subject to associativity relations given by the penta-
gon equations. The labels for the edges are isomorphism types of simple
objects; the labels for the vertices are basis vectors for the corresponding
morphism spaces. This gives a categorically precise interpretation of an
arrowless graphical calculus for C.
(4) If C is pivotal (see Section 7 for the definition), a well known construction
allows one to add a second copy of each object and get a strict pivotal
category. This construction gives a graphical calculus with arrows on the
strands.
(5) Strictified skeleta give any categorical structure preserved under natural
equivalence (and any functorial property preserved under natural isomor-
phism) a purely algebraic description.
4. Proof of Theorem 1 part 1:possible tensor category structures
In this section we classify, up to monoidal equivalence, all C-linear semisimple
tensor categories with fusion rules given in Theorem 1. This amounts to solving
the matrix equations described in the previous section. The simplest equations
(those involving 1× 1 matrices) are solved first, and normalizations are performed
as necessary in order to simplify the equations.
4.1. Setting up the pentagon equations. The fusion rules are given by x⊗x ∼=
1⊕y⊕x⊕x, x⊗y ∼= y⊗x ∼= x, and y⊗y ∼= 1. The non-trivial vector spaces are V 111,
V x1x ,V
x1, V
1y , V
y1, V
xy, V
yx, V
yy, V
xx, V
xx,and V
xx, and they are all 1-dimensional
except the last space which is 2-dimensional.
Let’s choose basis vectors in each space. If we fix any non-zero vector v111 ∈ V 111,
then there are unique vectors vx1x ∈ V x1x, vxx1 ∈ V xx1,v
1y ∈ V
1y, and v
y1 ∈ V
y1 such
that the triangle equality holds. For the other spaces, choose any non-zero vectors
SOME NON-BRAIDED FUSION CATEGORIES OF RANK 3 7
in each space and denote them by vxxy ∈ V xxy, vxyx ∈ V xyx, v1yy ∈ V 1yy, v1xx ∈ V 1xx,
vyxx ∈ V yxx, v1 and v2 ∈ V xxx where the two vectors v1 and v2 are linearly independent.
There are 30 associativities. It is a well known fact that if at least one of
the bottom objects is 1 then the associativity is trivial. That is, with the above
basis choices the matrix for αzu,v,w is trivial if at least one of the u, v and w is 1.
Now we have ten non-trivial 1-dimensional associativities, αyy,y,y,α
x,y,y,α
y,y,x,α
x,y,x,
αyx,y,x,α
y,x,y,α
x,x,y,α
x,x,y, α
y,x,x, and α
y,x,x, five non-trivial 2-dimensional ones,
αxx,y,x,α
x,x,y,α
y,x,x,α
x,x,x, and α
x,x,x, and one 6-dimensional one, α
x,x,x.
4.2. Normalizations. With the above basis choices we obtain a basis for each
tensor product of vector spaces in a canonical way and can parameterize each
associativity and pentagon equation. However, at this point our basis elements
have not been uniquely specified, and we should expect to obtain solutions with
free parameters. As the calculation progresses it will be convenient to simplify the
pentagon equations by requiring certain coefficients of certain associativity matrices
to be 1 or 0. These normalizations should be thought of as restrictions on the basis
choices made above. Normalizations simplify the equations and have an additional
advantage: once the set of possible bases is sufficiently restricted, Ocneanu rigidity
[2] guarantees a finite set of possibilities for the associativity matrices of fusion
categories, which can be found algorithmically by computing a Gröbner basis.
4.3. Associativity matrices. The following are the 1-dimensional associativities:
αyy,y,y : v
y1 7→ ayy,y,yv1yyv
αxx,y,y : v
x1 7→ axx,y,yvxxyvxxy
αxy,y,x : v
yx 7→ axy,y,xv1yyvx1x
α1x,y,x : v
xx 7→ a1x,y,xvxxyv1xx
αyx,y,x : v
xx 7→ ayx,y,xvxxyvyxx
αxy,x,y : v
yx 7→ axy,x,yvxyxvxxy
α1x,x,y : v
xx 7→ a1x,x,yvyxxv1yy
αyx,x,y : v
xx 7→ ayx,x,yv1xxv
α1y,x,x : v
yy 7→ a1y,x,xvxyxv1xx
αyy,x,x : v
y1 7→ ayy,x,xvxyxvyxx
where associativity coefficients are all non-zero.
For 2-dimensional and 6-dimensional associativities we need to fix the ordering
of basis elements in each Hom vector space. The orderings are as follows:
{vxyxv1, vxyxv2} for V xx(yx), {vxxyv1, vxxyv2} for V x(xy)x,
{vxxyv1, vxxyv2} for V xx(xy), {v1v
xy, v2v
xy} for V x(xx)y,
{v1vxyx, v2vxyx} for V xy(xx), {v
yxv1, v
yxv2} for V x(yx)x,
{v1v1xx, v2v1xx} for V 1x(xx), {v1v
xx, v2v
xx} for V 1(xx)x,
{v1vyxx, v2vyxx} for V
x(xx)
, {v1vyxx, v2vyxx} for V
(xx)x
{v1xxvxx1, vyxxvxxy, v1v1, v1v2, v2v1, v2v2} for V xx(xx),
and {v1xxvx1x, vyxxvxyx, v1v1, v1v2, v2v1, v2v2} for V x(xx)x.
With these ordered bases, each associativity has a matrix form (recall that we are
using the right multiplication convention). That is, αxx,y,x is given by the invertible
2 × 2 matrix axx,y,x, and αxx,x,y is given by the invertible 2 × 2 matrix axx,x,y,etc.,
and finally αxx,x,x is given by the invertible 6× 6 matrix axx,x,x.
8 TOBIAS J. HAGGE AND SEUNG-MOON HONG
4.4. Pentagon equations with 1× 1 matrices. Considering only nontrivial as-
sociativities, there are 17 1-dimensional pentagon equations, 14 2-dimensional pen-
tagon equations, 6 6-dimensional ones, and 1 16-dimensional one. Without redun-
dancy, the following are the 1-dimensional equations:
P xx,y,y,y : a
y,y,y a
x,y,y = a
x,y,y.
P 1x,x,y,y : a
x,y,ya
x,x,y a
x,x,y = 1 ,
P 1x,y,x,y : a
y,x,y a
x,y,x = a
x,y,x ,
P yx,y,x,y : :a
y,x,y a
x,y,x = a
x,y,x ,
P 1x,y,y,x : a
y,y,x a
x,y,y = (a
x,y,x)
P yx,y,y,x : a
y,y,x a
x,y,y = (a
x,y,x)
P xy,x,y,y :(a
y,x,y)
2 = 1 ,
P 1y,y,x,x : a
y,x,x a
y,x,x a
y,y,x = 1 ,
P 1y,x,x,y : a
x,x,y a
y,x,x = a
y,x,x a
x,x,y
If we normalize the basis we may assume axy,y,x, a
x,y,x and a
x,x,y to be 1 (for
normalization see [12] or [6]), and we can solve the above 1-dimensional equations.
Here is the solution:
ayy,y,y = a
x,y,y = a
x,x,y = 1, a
y,x,y = a
x,y,x = ±1, a1y,x,x = ayy,x,x = ±1.
Let’s say g := axy,x,y = a
x,y,x and h := a
y,x,x = a
y,x,x in the sequel. Also let
A := axx,y,x, B := a
x,x,y, D := a
x,x,x, E := a
x,x,x, F := a
y,x,x and Φ := a
x,x,x for
brevity.
4.5. Pentagon equations with 2× 2 or 6× 6 matrices. Now, the following are
the 2-dimensional pentagon equations using the above 1-dimensional solutions:
P xy,y,x,x : F
2 = Id2
P xy,x,y,x : gAF = FA
P xx,y,y,x : A
2 = Id2
P xy,x,x,y : gBF = FB
P xx,y,x,y : gBA = AB
P xx,x,y,y : B
2 = Id2
P 1y,x,x,x : EF = D
P yy,x,x,x : DF = E
P 1x,y,x,x : FDA = D
P yx,y,x,x : FEA = gE
P 1x,x,y,x : ADB = D
P yx,x,y,x : AEB = gE
P 1x,x,x,y : BE = D
P yx,x,x,y : BD = E
It should be noted that for this particular category the large number of one
dimensional morphism spaces gives us q-commutativity relations and matrices with
±1 eigenvalues, which are of great help when simplifying the pentagon equations
by hand.
To analyze 2-dimensional and 6-dimensional pentagon equations, at first let’s
look at the isomorphism τ interchanging the first and the second factors of tensor
products. This change of basis is necessary for 6-dimensional pentagon equations
because the image basis of the matrix for αux,y,zw and the domain basis of the matrix
for αuxy,z,w may not be the same. For P
x,y,x,x, τ is an isomorphism from the space
V 1xxV
x1⊕V yxxV xxyV xxy⊕V xxxV xxyV xxx to V xxyV 1xxV xx1⊕V xxyV yxxV xxy⊕V xxyV xxxV xxx, both of
SOME NON-BRAIDED FUSION CATEGORIES OF RANK 3 9
which correspond to Hom((x⊗ y)⊗ (x⊗x), x). With the canonically ordered basis
{v1xxvxxyvxx1, vyxxvxxyvxxy, vivxxyvj} and {vxxyv1xxvxx1, vxxyvyxxvxxy, vxxyvivj}, respectively,
τ turns out to be I6. For P
y,x,x,x, P
x,x,y,x and P
x,x,x,y, τ is also I6. But for P
x,x,x,x,
it is τ1, and for P
x,x,x,x, it is τ2, defined as follows:
τ1 :=
1 0 0 0 0 0
0 1 0 0 0 0
0 0 1 0 0 0
0 0 0 0 1 0
0 0 0 1 0 0
0 0 0 0 0 1
, τ2 :=
0 1 0 0 0 0
1 0 0 0 0 0
0 0 1 0 0 0
0 0 0 0 1 0
0 0 0 1 0 0
0 0 0 0 0 1
Here are the 6 6-dimensional pentagon equations:
P xy,x,x,x ; Φ(I2 ⊕ I2 ⊗ F )
⊕ F ⊗ I2
⊕ I2 ⊗ F
P xx,y,x,x ;
⊕ F ⊗ I2
⊕A⊗ I2
= (I2 ⊕ I2 ⊗A)Φ
P xx,x,y,x ;
⊕A⊗ I2
⊕B ⊗ I2
= Φ(I2 ⊕ I2 ⊗A)
P xx,x,x,y ;
⊕B ⊗ I2
(I2 ⊕ I2 ⊗ B)Φ = Φ
⊕ I2 ⊗B
P 1x,x,x,x ; (I2 ⊕ I2 ⊗D)Φ = (I2 ⊕ I2 ⊗D)τ1
⊕ I2 ⊗D
P yx,x,x,x ; Φ
⊕ I2 ⊗ E
Φ = (I2 ⊕ I2 ⊗ E)τ2
⊕ I2 ⊗ E
If we normalize the basis {v1, v2} of V xxx, we may assume A is of the form
and then get g = −1 and h = 1 using the above equations. Following is the
computation for this:
At first we may assume that matrix A is of the Jordan canonical form, then
A = ±I2 or
from P xx,y,y,x. We eliminate the possibility A = ±I2 from
P xy,x,y,x, P
x,y,x,x and P
y,x,x,x which imply respectively that g = 1, F = ±I2 and
then det(Φ) = 0, since the first two columns of Φ are scalar multiples of each other.
So we conclude A =
. Now we eliminate the possibility g = 1 using P xy,x,y,x,
P xy,y,x,x and P
y,x,x,x, which imply F is a diagonal matrix, with entries ±1 and then
det(Φ) = 0, respectively. For the case A =
and g = −1, F is of the form
[ 0 f
1/f 0
from P xy,x,y,x and P
y,y,x,x, and B is of the form
1/b 0
from P xx,y,x,y and
P xx,x,y,y. If h = −1, the first column of Φ has to be zero by comparing the first and
the second columns of P xy,x,x,x, P
x,y,x,x, P
x,x,y,x, and P
x,x,x,y.
At this point we have fixed all 1-dimensional associativity matrices.
From the above equations, we get
[ 0 f
1/f 0
for F and
1/b 0
for B with the
relation f2 + b2 = 0 from P xy,x,x,y. We note that the diagonalization of A defines
each basis element v1 and v2 only up to choice of a nonzero scalar. By using up
one of these degrees of freedom, we may assume f = 1. Then from the above
6-dimensional equations, we get the following:
D = d
E = d
φ φ −wb w w −wb
φ −φ −wb w −w wb
x x −yb z y −zb
x x −zb y z −yb
x −x −yb z −y zb
−x x zb −y z −yb
10 TOBIAS J. HAGGE AND SEUNG-MOON HONG
4.6. The pentagon equation with 16 × 16 matrices. Now we analyze the 16-
dimensional pentagon equation P xx,x,x,x. It is convenient to express each Hom vector
space in two different ways and put basis permutation matrices into the pentagon
equation. The following are two expressions with ordered direct sum.
Hom(x(x(xx)), x) :
V xxxV
x1 ⊕ V xxxV yxxV xxy ⊕ V 1xxV xx1V xxx ⊕ V yxxV xxyV xxx ⊕ V xxxV xxxV xxx, and
V 1xxV
xx ⊕ V yxxV xxyV xxx ⊕ V xxxV 1xxV xx1 ⊕ V xxxV yxxV xxy ⊕ V xxxV xxxV xxx
Hom(x((xx)x)), x) :
V xxxV
x1 ⊕ V xxxV yxxV xxy ⊕ V 1xxV x1xV xxx ⊕ V yxxV xyxV xxx ⊕ V xxxV xxxV xxx, and
V 1xxV
xx ⊕ V yxxV xyxV xxx ⊕ V xxxV 1xxV xx1 ⊕ V xxxV yxxV xxy ⊕ V xxxV xxxV xxx
Hom((x(xx))x), x) :
V 1xxV
xx ⊕ V yxxV xxyV xxx ⊕ V xxxV 1xxV x1x ⊕ V xxxV yxxV xyx ⊕ V xxxV xxxV xxx, and
V xxxV
1x ⊕ V xxxV yxxV xyx ⊕ V 1xxV xx1V xxx ⊕ V yxxV xxyV xxx ⊕ V xxxV xxxV xxx
Hom((((xx)x)x), x) :
V xxxV
1x ⊕ V xxxV yxxV xyx ⊕ V 1xxV x1xV xxx ⊕ V yxxV xyxV xxx ⊕ V xxxV xxxV xxx, and
V 1xxV
xx ⊕ V yxxV xyxV xxx ⊕ V xxxV 1xxV x1x ⊕ V xxxV yxxV xyx ⊕ V xxxV xxxV xxx
Hom((xx)(xx), x) :
V 1xxV
x1 ⊕ V yxxV xxxV xxy ⊕ V xxxV 1xxV x1x ⊕ V xxxV yxxV xyx ⊕ V xxxV xxxV xxx, and
V 1xxV
1x ⊕ V yxxV xxxV xyx ⊕ V xxxV 1xxV xx1 ⊕ V xxxV yxxV xxy ⊕ V xxxV xxxV xxx
where each direct summand space has canonical ordered basis. For exam-
ple V xxxV
xx has basis {vivjvk} where (i, j, k) range from 1 to 2 in the order
(1, 1, 1), (1, 1, 2), (1, 2, 1), etc., and V xxxV
1x has {v1v1xxvx1x, v2v1xxvx1x}.
τ3 :=
⊕ I8 and
τ4 :=
1 0 0 0
0 0 1 0
0 1 0 0
0 0 0 1
Then the pentagon equation P xx,x,x,x is of the form:
(D ⊕ E ⊕ Φ⊗ I2)τ3(I2 ⊕A⊕ Φ̃)τ3(D ⊕ E ⊕ Φ⊗ I2)τ3
= τ3(I2 ⊕B ⊕ Φ̃)τ4(I2 ⊕ F ⊕ Φ̃)
where
φ 0 φ 0 −wb w w −wb 0 0 0 0
0 φ 0 φ 0 0 0 0 −wb w w −wb
φ 0 −φ 0 −wb w −w wb 0 0 0 0
0 φ 0 −φ 0 0 0 0 −wb w −w wb
x 0 x 0 −yb z y −zb 0 0 0 0
x 0 x 0 −zb y z −yb 0 0 0 0
x 0 −x 0 −yb z −y zb 0 0 0 0
−x 0 x 0 zb −y z −yb 0 0 0 0
0 x 0 x 0 0 0 0 −yb z y −zb
0 x 0 x 0 0 0 0 −zb y z −yb
0 x 0 −x 0 0 0 0 −yb z −y zb
0 −x 0 x 0 0 0 0 zb −y z −yb
SOME NON-BRAIDED FUSION CATEGORIES OF RANK 3 11
4.7. Solutions. We may assume x = 1 once we normalize basis vector vxxy. Then
from the equations, we get four explicit solution sets for the parameters b, φ, d, w, y,
and z. We list one solution here. All of its values lie in the field Q(
3, i); the
other solutions are obtained by applying Galois automorphisms. The full set of
associativity matrices for this solution is given in Appendix A.
b = i, φ =
, d =
e7πi/12, w =
e2πi/3,
(e−πi/3 + i), z =
e5πi/6.
4.8. Inequivalence of the solutions. To see that these solutions are monoidally
inequivalent, recall from the previous section that for strictified skeletons a natural
equivalence between two solutions to the pentagon equations is limited to change of
basis on the (2, 0) and (2, 1)-stranded morphism spaces, along with permutation of
the strands. In our case permutation of strands does not preserve the fusion rules.
Therefore, we must show that it is not possible to replicate the effect of a nontrivial
Galois automorphism by change of basis choices for the (2, 0) and (2, 1)-stranded
morphism spaces.
The Galois automorphism that fixes
3 and sends i to −i changes the eigenvalues
of the matrix a1x,x,x. However, a
x,x,x is determined by the basis choices v1 and v2,
and its rows and columns are indexed by v1 and v2. Thus changes to v1 and v2
conjugate a1x,x,x by a change of basis matrix, which doesn’t affect its eigenvalues.
Therefore this automorphism does not correspond to a change of basis.
The other two Galois automorphisms send
3 to −
3 and thus change the
value of the (1, 1) entry of axx,x,x. But this entry is invariant under change of basis.
Therefore no Galois automorphism corresponds to a change in basis, and the four
solutions given above are mutually monoidally inequivalent.
5. Proof of Theorem 1 part 2: rigidity structures
This section explicitly computes rigidity structures for the categories given in
the previous section. Rigidity implies that these categories are fusion categories.
Given v1xx ∈ V 1xx, choose a vector vxx1 ∈ V xx1 such that vxx1 ◦v1x,x = id1(see Figure
2). Now we define right death and birth, dx := v
xx : x ⊗ x → 1, bx := 1φv
1 : 1 →
x⊗ x (see Figure 3).
With these definitions, right rigidity is an easy consequence by direct computa-
tion. The following is a graphical version of it:
= = = idx,
= = = idx
where the first and the third equalities are from the definitions above, and the
second equalities are the associativity αxxxx and (α
−1, respectively.
12 TOBIAS J. HAGGE AND SEUNG-MOON HONG
, = 1
Figure 2. Graphical notation of v1xx and v
1 and property
:= =: , := 1
= = 1
Figure 3. Definitions of bx and dx, and elementary properties
(yx)z
αy,x,z // y(xz)
ycx,z
(xy)z
cx,yz
;;wwwwwwwww
αx,y,z
y(zx)
x(yz)
cx,yz // (yz)x
αy,z,x
;;wwwwwwwww
(yx)z
αy,x,z // y(xz)
yc−1z,x
(xy)z
c−1y,xz
;;wwwwwwwww
αx,y,z
y(zx)
x(yz)
c−1yz,x // (yz)x
αy,z,x
;;wwwwwwwww
Figure 4. Hexagon equalities
The same morphisms give a left rigidity structure when treated as left birth and
left death. Treat the objects y and 1 analogously by replacing φ with 1.
6. Proof of Theorem 1 part 3: the absence of braidings
The categories under consideration are known not to be braided (see [9]). How-
ever, once associativity matrices are known it is in principle not difficult to classify
braidings by direct computation. In this section we perform this computation and
show that no braidings are possible.
A braiding consists of a natural family of isomorphisms {cx,y : x ⊗ y → y ⊗ x}
such that two hexagon equalities hold:
(cx,y ⊗ z) ◦ αy,x,z ◦ (y ⊗ cx,z) = αx,y,z ◦ cx,yz ◦ αy,z,x and
((cy,x)
−1 ⊗ z) ◦ αy,x,z ◦ (y ⊗ (cz,x)−1) = αx,y,z ◦ (cyz,x)−1 ◦ αy,z,x.
SOME NON-BRAIDED FUSION CATEGORIES OF RANK 3 13
= rzx,y
= r̄zx,y
Figure 5. Isomorphisms Rzx,y and R̄
We define isomorphisms Rzx,y : V
yx → V zxy by f 7→ cx,y ◦ f and R̄zx,y : V zyx → V zxy
by f 7→ (cx,y)−1 ◦ f for any f ∈ V zyx. Figure 5 shows the 1-dimensional case
where rzx,y is nonzero and r̄
x,y = (r
−1. For higher dimensional spaces it can
be expressed as an invertible matrix, also denoted rzx,y on the canonically ordered
basis as before.
With this linear isomorphism, the hexagon equations are equivalent to the equa-
tions
⊕sRsx,zV tys ◦ αty,x,z ◦ ⊕sRsx,yV tsz = αty,z,x ◦ ⊕sV syzRtx,s ◦ αtx,y,z, and
⊕sR̄sx,zV tys ◦ αty,x,z ◦ ⊕sR̄sx,yV tsz = αty,z,x ◦ ⊕sV syzR̄tx,s ◦ αtx,y,z, which we still call
hexagon equations, referred to as Htx,y,z and H̄
x,y,z, respectively. These are illus-
trated graphically in Figure 6).
We show the absence of a braiding by assuming the existence and deriving a
contradiction.
We need five 2-dimensional hexagon equations as follows:
Hxy,x,x : R
y,x ⊗ I2 ◦ αxx,y,x ◦Rxy,x ⊗ I2 = αxx,x,y ◦ I2 ⊗Rxy,x ◦ αxy,x,x
H̄xy,x,x : R̄
y,x ⊗ I2 ◦ αxx,y,x ◦ R̄xy,x ⊗ I2 = αxx,x,y ◦ I2 ⊗ R̄xy,x ◦ αxy,x,x
Hxx,y,x : R
x,x ⊗ 1 ◦ αxy,x,x ◦Rxx,y ⊗ I2 = αxy,x,x ◦ 1⊗Rxx,x ◦ αxx,y,x
H̄xx,y,x : R̄
x,x ⊗ 1 ◦ αxy,x,x ◦ R̄xx,y ⊗ I2 = αxy,x,x ◦ 1⊗ R̄xx,x ◦ αxx,y,x
H1x,x,x : R
x,x ⊗ 1 ◦ α1x,x,x ◦Rxx,x ⊗ 1 = α1x,x,x ◦ I2 ⊗R1x,x ◦ α1x,x,x
These are of the following forms, respectively:
(rxy,x)
= rxy,x
1/b 0
(rxx,y)
= (rxx,y)
1/b 0
rxx,y
(rxy,x)
]−1 [
]−1 [
= d2r1x,x
where
represents the matrix rxx,x.
From the first four equations, we get rxy,x = b, r
x,y = 1/b, −n = rxx,yk, m = rxx,yl,
−n = rxy,xk, which imply k = n = 0 since rxx,y 6= rxy,x as above. Now from the final
one we get l2 = dr1x,x(b+1) and −blm = dr1x,x(1+ b), and the later equality means
l2 = −dr1x,x(1 + b) by substituting m = rxx,yl. We get easily a contradiction for
either case b = ±i.
14 TOBIAS J. HAGGE AND SEUNG-MOON HONG
x y z
⊕sRsx,yV
x y z
αty,x,zoo
x y z
Htx,y,z
x y z
⊕sRsx,zV
]];;;;;;;;
αty,z,x����
x y z
αtx,y,z
]];;;;;;;;
x y z
⊕sV syzR
x,soo
x y z
⊕sR̄sx,yV
� x y z
αty,x,zoo
x y z
H̄tx,y,z
x y z
⊕sR̄sx,zV
αty,z,x����
x y z
αtx,y,z
]];;;;;;;;α
x,y,z
]];;;;;;;;
x y z
⊕sV syzR̄
x,soo
Figure 6. Equivalent hexagon equalities
7. Pivotal structures and sphericity
Let C be a rigid monoidal category. A pivotal structure for C is a monoidal natural
isomorphism π from ∗∗ to Id. A strict pivotal structure is a pivotal structure
which is the identity. In a pivotal monoidal category, the right trace trr of an
endomorphism f : x → x is given by trr(f) = bx ◦ (f ⊗ Idx∗) ◦ (π−1x ⊗ Idx∗) ◦ dx∗ ∈
End(1) ∼= C. The left trace trl is given by trl(f) = bx∗ ◦ (f∗ ⊗ Idx∗∗) ◦ ((πx)∗ ⊗
Idx∗∗) ◦ dx∗∗ . A pivotal monoidal category is spherical if trr = trl.
Pivotal structures may not be unique. For example, in a fusion category with
object types given by a finite group G, group multiplication as tensor product and
trivial associativity matrices, any group homomorphism G → C induces a pivotal
structure. Furthermore, pivotal structures depend on choices of rigidity. However, if
one chooses a new rigidity structure with b′x = cbx and d
x = c
−1dx, then π
x = c
gives a new pivotal structure π′ inducing the same traces as π.
For a strictified skeletal fusion category, we shall assume the rigidity structures
described in Section 3.3. Then ∗∗ is an object fixing monoidal endofunctor. The
isomorphisms Ja,b : a
∗∗ ⊗ b∗∗ → (a ⊗ b)∗∗ associated with ∗∗ considered as a
SOME NON-BRAIDED FUSION CATEGORIES OF RANK 3 15
monoidal functor may be taken to be the identity on a⊗ b. If such a category has
a pivotal structure π, it must take the following form: for each strand (x), there is
a scalar tx such that πx = txIdx, and π(x1,...,xn) = tx1 . . . txnId(x1,...xn). Then for
all sequences (x1, . . . , xm) and (y1, . . . , yn), and all f : (x1, . . . , xm) → (y1, . . . , yn),
f∗∗ = tx1 . . . txmt
. . . t−1yn f . This implies that, in particular, t1 = 1.
Writing out the diagrams for b∗∗x and d
x and applying rigidity gives that b
x = bx
and d∗∗x = dx. Thus one must have txtx∗ = 1, and for self-dual strands, tx = ±1;
in this case tx is called the Frobenius-Schur indicator for x.
Furthermore, in a strictified skeleton the left and right trace on a strand may be
rewritten as follows:
trr(f) = t
x bx ◦ (f ⊗ Idx∗) ◦ dx∗ ,
trl(f) = txbx∗ ◦ (Idx∗ ⊗ f) ◦ dx.
Lemma 2. Every pivotal fusion category with self-dual simple objects is spherical.
Proof. The result holds since if x is a self dual strand then bx = bx∗ , dx = dx∗ and
tx = t
x . Thus for each f : x → x we have trr(f) = trl(f). �
Kitaev has shown in [6] that every unitary category admits a spherical structure.
A more general property called pseudo-unitarity is shown in [2] to guarantee a
spherical structure. However, it is not known if every fusion category admits a
pivotal or spherical structure.
For arbitrary fusion categories, one has that ∗ ∗ ∗∗ ∼= Id. This was shown in [2],
using an analog of Radford’s formula for S4 for representation categories of weak
Hopf algebras, which was developed in [8]. The following theorem shows that, in a
strictified skeletal fusion category, a convenient choice of rigidity makes ∗ ∗ ∗∗ the
identity on the nose. Extending the result to general fusion categories via natural
equivalences gives an elementary proof that ∗ ∗ ∗∗ ∼= Id.
Theorem 3. In a strictified skeletal fusion category, there is a choice of rigidity
structures such that ∗ ∗ ∗∗ = Id.
Proof. The functor ∗ ∗ ∗∗ is the identity on (2)-stranded morphisms by rigidity;
it suffices to prove the result for (2, 1) stranded morphisms. Let V = V zxy be a
(2, 1) stranded morphism space with a basis {vi}, and let {wi} be an algebraically
dual basis for the space W = V xyz , in the sense that wi ◦ vj = δijIdz . For any
simple object z, define the right pseudo-trace ptrr of an endomorphism f : z → z
by ptrr(f) = bz ◦ (f ⊗ Idz∗) ◦ dz∗ , and the left pseudo-trace ptrl by ptrl(f) =
bz∗ ◦ (Idz∗ ⊗f)◦dz. This definition is possible because ∗∗ is the identity on objects.
Scale rigidity morphisms if necessary so that for any strand z, ptrr(Idz) = ptrl(Idz).
Because dz and bz∗ are nonzero elements of one dimensional algebraically dual
morphism spaces, ptrr(Idz) 6= 0. One may now exchange left pseudo-traces for
right pseudo-traces, just like with traces in a graphical calculus for a spherical
category.
Figure 7 gives the proof. On the left side, bending arms and pseudo-sphericity
implies that the algebraic dual basis of the basis {w∗∗i } is {∗∗vi}. However, on the
right side the functoriality of the double dual implies that the algebraic dual basis
of {w∗∗i } is {v∗∗i }. Since the left and right double dual are inverse functors, ∗ ∗ ∗∗
is the identity. �
16 TOBIAS J. HAGGE AND SEUNG-MOON HONG
Figure 7. In a strictified skeletal fusion category, with the right
choice of rigidity structures the quadruple dual is the identity.
Even if a category admits a pivotal structure it is not known whether it admits
a spherical pivotal structure. Pictorial considerations do not readily provide an
answer. It is possible, however, to partially describe what a pivotal strictified
skeleton which did not admit a spherical structure would look like.
Let C be a pivotal strictified skeletal fusion category which does not admit a
spherical structure. Choose rigidity morphisms which give a pseudo-spherical struc-
ture as above, and a matching pivotal structure. Then for any object x, one has
the following:
trl(Idx)
trr(Idx)
txptrl(x)
t−1x ptrr(x)
= t2x.
Therefore, C is spherical iff there exists a pivotal structure such that all of the tx
are ±1. Thus there must be some strand x such that tx 6= ±1.
For strands u and v, u ⊗ v has a nontrivial morphism to some object w, and
tutv(tw)
−1 = ±1, since ∗ ∗ ∗∗ = Id. Thus the set of scalars t and their additive
inverses forms a finite subgroup G of C. Note that we can apply any group ho-
momorphism that preserves ±1 to the set of scalars t and get a new set of scalars
t′ which also gives a pivotal structure. At least one product tutv(tw)
−1 must be
equal to −1, or else we could apply the trivial homomorphism to the set of scalars
t to get a new pivotal structure with t′u = 1 for all strands u, which would make C
spherical.
Every finite subgroup of C is a cyclic group of roots of unity. We have |G| = 2k
for some k, and since C is not spherical, |G| ≥ 4. Using a homomorphism which
preserves −1 we may switch to a new pivotal structure which gives |G| = 2k for
some k, where k ≥ 2 to contradict sphericity. Pick an object v with t2v = −1. Then
v is not self dual, and for a simple summand w in v⊗ v, one has w 6= 1 and t2w = 1.
Therefore, C has at least four objects, v, v∗, w and 1. The set of objects u such
that t2u = 1 generates a spherical subcategory C′ with at least two simple objects,
and missing at least two.
SOME NON-BRAIDED FUSION CATEGORIES OF RANK 3 17
Lemma 4. Any fusion category which is pivotal but admits no spherical structure
contains at least five simple object types.
Proof. Assume that C has four simple object types, 1, w, v and v′ as above. Then
C′ has two simple object types, and by the classification of fusion categories with
two simple object types in [10], its fusion rules are given by w⊗w ∼= nw⊕1, where
n ∈ {0, 1}. The pivotal structure places limitations on the fusion rules, for example
v⊗w ∼= av⊕bv′ for some a and b in N. An easy calculation shows that C admits only
one associative fusion ring, in which objects and tensor products are given by the
group Z4. Any such category is pseudo-unitary and therefore spherical, as described
in [2], which contradicts the assumption. Therefore, a pivotal fusion category which
can’t be made spherical must have at least five simple object types. �
8. Proof of Theorem 1 part 4: spherical structure calculations
In this section we explicitly compute pivotal structures for the categories found
in Section 4. Since these categories have self dual simple objects, Lemma 2 implies
that they are spherical.
It is not hard to determine whether or not a fusion category is pivotal once
a set of associativity matrices is known. One way is to perform the calculations
directly using the associativity matrices, but there is an easier calculation. In order
to explain this calculation, it is convenient to extend the definition of composition
of morphisms over extra-categorical direct sums of morphism spaces. Suppose f ∈
Mor(a, b) and g ∈ Mor(c, d). Define f◦g as usual if b = c, and f◦g = 0 ∈ Mor(a, d)
otherwise. Extend this definition over direct sums of morphism spaces, distributing
composition over direct sum.
Given a strictified skeletal fusion category C and a set of associativity matrices,
choose bases for the (2, 0) and (2, 1)-stranded morphism spaces compatible with
the associativity matrices and choose rigidity so that for each strand x, the basis
element for V 1x∗x is dx. Define morphisms b = ⊕xbx, d = ⊕xdx, and I = ⊕xIdx,
taking sums over the strands.
Then B acts on
x,y,z V
xy as follows:
B(f) = (I ⊗ I ⊗ b) ◦ (I ⊗ f ⊗ I) ◦ (d⊗ I)
For a single (2, 1)-stranded morphism space, this action amounts to “bending
arms”. The cube of B is the double dual. The action of B on a morphism f ∈ V zxy
is given by the associativity matrix a1z∗,x,y, since (Idz∗ ⊗ f) ◦ dz = (g⊗ idy) ◦ dy for
some g ∈ V y
z∗x implies that B(f) = (Idz∗⊗Idx⊗by)◦(Idz∗⊗f⊗Idy∗)◦(dz⊗Idy∗) =
(Idz∗ ⊗ Idx ⊗ by) ⊗ (g ⊗ Idy ⊗ Idy∗) ◦ (dy ⊗ Idy∗) = g by rigidity. For the fusion
rules at hand, the matrix for B is as follows:
v1 v2 v
v1 (a
xxx)1,1 (a
xxx)1,2 0 0 0
v2 (a
xxx)2,1 (a
xxx)2,2 0 0 0
vyxx 0 0 0 a
yxx 0
vxyx 0 0 0 0 a
vxxy 0 0 a
xxy 0 0
For all of the solutions given in Section 4, B3 is the identity matrix, so the corre-
sponding strictified categories have a strict pivotal structure. Non-strict pivotality
18 TOBIAS J. HAGGE AND SEUNG-MOON HONG
would mean that B3 is a diagonal matrix with eigenvalues determined by a family
of invertible scalars t, coherent as described in Section 7.
Appendix A. Associativity matrices
In this section, we give explicit associativity matrices for the categorical realiza-
tion given in Section 4.
ayy,y,y = a
x,y,y = a
y,y,x = a
x,y,x = a
x,x,y = a
x,x,y = a
y,x,x = a
y,x,x = 1,
ayx,y,x = a
y,x,y = −1,
axx,y,x =
axx,x,y =
axy,x,x =
a1x,x,x =
e7πi/12
ayx,x,x =
e7πi/12
axx,x,x =
eπi/6 1−
e2πi/3 1−
e2πi/3 1−
eπi/6
eπi/6 1−
e2πi/3 − 1−
e2πi/3 − 1−
eπi/6
1 1 − 1
(eπi/6−1) 1
e5πi/6 1
(e−πi/3+i) 1
eπi/3
1 1 1
eπi/3 1
(e−πi/3+i) 1
e5πi/6 − 1
(eπi/6−1)
1 −1 − 1
(eπi/6−1) 1
e5πi/6 − 1
(e−πi/3+i) − 1
eπi/3
−1 1 − 1
eπi/3 − 1
(e−πi/3+i) 1
e5πi/6 − 1
(eπi/6−1)
References
[1] Bruno Buchberger. A theoretical basis for the reduction of polynomials to canonical forms.
SIGSAM Bull., 10(3):19–29, 1976.
[2] Pavel Etingof, Dmitri Nikshych, and Viktor Ostrik. On fusion categories. Ann. of Math.,
162(2):581–642, 2005.
[3] J. Frohlich and T. Kerler. Quantum groups, quantum categories, and quantum field theory,
chapter 4. Number 1542 in Lecture Notes in Mathematics. 1993.
[4] Christian Kassel. Quantum Groups. Springer-Verlag, 1995.
[5] Kazhdan and Hans Wenzl. Reconstructing monoidal categories. Adv. Soviet Math., 16:111–
136, 1993.
[6] Alexei Kitaev. Anyons in an exactly solved model and beyond. Annals of Physics, 321(1):2–
111, 2006.
[7] Saunders Mac Lane. Categories for the Working Mathematician, Second Edition. Springer-
Verlag, 1978.
[8] Dmitri Nikshych. On the structure of weak hopf algebras. And. Math., 170:257–286, 2002.
[9] Victor Ostrik. Pre-modular categories of rank 3. math.CT/0507349.
[10] Victor Ostrik. Fusion categories of rank 2. Math. Res. Lett., 10(2-3):177–183, 2003.
[11] Eric Rowell, Richard Stong, and Zhenghan Wang. in preparation.
[12] Daisuke Tambara and Shigeru Yamagami. Tensor categories with fusion rules of self-duality
for finite abelian groups. J. Algebra, 209:692–707, 1998.
[13] Hans Wenzyl and Imre Tuba. On braided tensor categories of type bcd. J. Reine. Angew.
Math., 581:31–69, 2005.
Department of Mathematics, Indiana University, Bloomington, Indana
E-mail address: thagge@indiana.edu,seuhong@indiana.edu
	1. Introduction
	2. Main theorem and outline
	3. Preliminaries and notational conventions
	3.1. Skeletization
	3.2. Strictification
	3.3. Strictified skeletal fusion categories
	3.4. Remarks
	4. Proof of Theorem ?? part  ??:possible tensor category structures
	4.1. Setting up the pentagon equations
	4.2. Normalizations
	4.3. Associativity matrices
	4.4. Pentagon equations with 1 1 matrices
	4.5. Pentagon equations with 2 2 or 6 6 matrices
	4.6. The pentagon equation with 16 16 matrices
	4.7. Solutions
	4.8. Inequivalence of the solutions
	5. Proof of Theorem ?? part ??: rigidity structures
	6. Proof of Theorem ?? part ??: the absence of braidings
	7. Pivotal structures and sphericity
	8. Proof of Theorem ?? part ??: spherical structure calculations
	Appendix A. Associativity matrices
	References
ABSTRACT
  We classify all fusion categories for a given set of fusion rules with three
simple object types. If a conjecture of Ostrik is true, our classification
completes the classification of fusion categories with three simple object
types. To facilitate the discussion we describe a convenient, concrete and
useful variation of graphical calculus for fusion categories, discuss
pivotality and sphericity in this framework, and give a short and elementary
re-proof of the fact that the quadruple dual functor is naturally isomorphic to
the identity.

<|endoftext|><|startoftext|>
Introduction
	Observations
	X-Ray Images
	X-Ray Spectrum
	X-Ray Light curves
	The ACIS Photon Pile-Up
ABSTRACT
  We have been monitoring Supernova (SN) 1987A with {\it Chandra X-Ray
Observatory} since 1999. We present a review of previous results from our {\it
Chandra} observations, and some preliminary results from new {\it Chandra} data
obtained in 2006 and 2007. High resolution imaging and spectroscopic studies of
SN 1987A with {\it Chandra} reveal that X-ray emission of SN 1987A originates
from the hot gas heated by interaction of the blast wave with the ring-like
dense circumstellar medium (CSM) that was produced by the massive progenitor's
equatorial stellar winds before the SN explosion. The blast wave is now
sweeping through dense CSM all around the inner ring, and thus SN 1987A is
rapidly brightening in soft X-rays. At the age of 20 yr (as of 2007 January),
X-ray luminosity of SN 1987A is $L_{\rm X}$ $\sim$ 2.4 $\times$ 10$^{36}$ ergs
s$^{-1}$ in the 0.5$-$10 keV band. X-ray emission is described by two-component
plane shock model with electron temperatures of $kT$ $\sim$ 0.3 and 2 keV. As
the shock front interacts with dense CSM all around the inner ring, the X-ray
remnant is now expanding at a much slower rate of $v$ $\sim$ 1400 km s$^{-1}$
than it was until 2004 ($v$ $\sim$ 6000 km s$^{-1}$).

<|endoftext|><|startoftext|>
Introduction
In this paper we continue the study we began in [DW4] of superpotentials for the cohomogeneity
one Einstein equations. These equations are the ODE system obtained as a reduction of the
Einstein equations by requiring that the Einstein manifold admits an isometric Lie group action
whose principal orbits G/K have codimension one [BB], [EW]. As discussed in [DW3], these
equations can be viewed as a Hamiltonian system with constraint for a suitable Hamiltonian H, in
which the potential term depends on the Einstein constant and the scalar curvature of the principal
orbit, and the kinetic term is essentially the Wheeler-deWitt metric, which is of Lorentz signature.
For any Hamiltonian system with Hamiltonian H and position variable q, a superpotential is a
globally defined function u on configuration space that satisfies the equation
(0.1) H(q, duq) = 0.
From the classical physics viewpoint, u is a C2 (rather than a viscosity) solution of a time-
independent Hamilton-Jacobi equation. The literature for implicitly defined first order partial
differential equations then suggests that such solutions are fairly rare. It is therefore not unrea-
sonable to expect in our case that one can classify (at least under appropriate conditions) those
principal orbits where the associated cohomogeneity one Einstein equations admit a superpotential.
The existence of such a superpotential u in our case leads naturally to a subsystem of equations
of half the dimension of the full Einstein system. One way to see this is via generalised first integrals
which are linear in momenta, described in [DW4]. Schematically, the subsystem may be written as
q̇ = J∇u where J is an endomorphism related to the kinetic term of the Einstein Hamiltonian.
String theorists have exploited the superpotential idea in their search for explicit metrics of special
holonomy (see for example [CGLP1], [CGLP2],[CGLP3], [BGGG] and references in [DW4]). The
point here is that the subsystem defined by the superpotential often (though not always) represents
the condition that the metric has special holonomy. Also, the subsystem can often be integrated
explicitly.
In [DW4], §6, we obtained classification results for superpotentials of the cohomogeneity one
Ricci-flat equations. Besides assuming that G and K are both compact, connected Lie groups such
that the isotropy representation of G/K is multiplicity-free, we also mainly restricted our attention
to superpotentials which are of the same form as the scalar curvature function of G/K, i.e., a finite
sum with constant coefficients of exponential terms. Almost all the known superpotentials are of
this kind.
Date: revised October 24, 2018.
The second author was partly supported by NSERC grant No. OPG0009421.
http://arxiv.org/abs/0704.0210v1
2 A. DANCER AND M. WANG
However, the above classification results were further subject to the technical assumption that
the extremal weights for the superpotential did not lie in the null cone of the Wheeler-de Witt
metric. In [DW4] we gave some examples of superpotentials which do not satisfy this hypothesis.
These included several new examples which do not seem to be associated to special holonomy.
In this paper, therefore, we attempt to solve the classification problem without the non-null
assumption on the extremal weights.
As in [DW4], we use techniques of convex geometry to analyse the two polytopes naturally
associated to the classification problem. The first is (a rescaled translate of) the convex hull
conv(W) of the weight vectors appearing in the scalar curvature function of the principal orbit.
The second is the convex hull conv(C) of the weight vectors in the superpotential. In [DW4] we
showed that the non-null assumption forces these polytopes to be equal, so we could analyse the
existence of superpotentials by looking at the geometry of conv(W).
In the current paper, conv(C) may be strictly bigger than conv(W) because of the existence of
vertices outside conv(W) but lying on the null cone of the Wheeler-de Witt metric. Our strategy
is to consider such a vertex c and project conv(W) onto an affine hyperplane separating c from
conv(W). We can now analyse the existence of superpotentials in terms of the projected polytope.
The analysis becomes considerably more complicated because, whereas in [DW4] we could analyse
the situation by looking at the vertices and edges of conv(W), now, because we have projected onto
a subspace of one lower dimension, we have to consider the 2-dimensional faces of conv(W) also.
We find that in this situation the only polytopes conv(W) arising from principal orbits with more
than three irreducible summands in their isotropy representations are precisely those coming from
principal orbits which are circle bundles over a (homogeneous) Fano product. In the latter case,
the solutions of the subsystem defined by the superpotential correspond to Calabi-Yau metrics, as
discussed in [DW4].
After a review of basic material in §1, we state the main classification theorem of the paper in
§2 and give an outline of the strategy of the proof there.
1. Review and notation
In this section we fix notation for the problem and review the set-up of [DW4].
Let G be a compact Lie group, K ⊂ G be a closed subgroup, and M be a cohomogeneity one
G-manifold of dimension n + 1 with principal orbit type G/K, which is assumed to be connected
and almost effective. A G-invariant metric g on M can be written in the form g = εdt2 + gt where
t is a coordinate transverse to the principal orbits, ε = ±1, and gt is a 1-parameter family of
G-homogeneous Riemannian metrics on G/K. When ε = 1, the metric g is Riemannian, and when
ε = −1, the metric g is spatially homogeneous Lorentzian, i.e., the principal orbits are space-like
hypersurfaces.
We choose an Ad(K)-invariant decomposition g = k ⊕ p where g and k are respectively the Lie
algebras of G and K, and p is identified with the isotropy representation of G/K. Let
(1.1) p = p1 ⊕ · · · ⊕ pr
be a decomposition of p ≈ T(K)(G/K) into irreducible real K-representations. We let di be the
real dimension of pi, and n =
i=1 di be the dimension of G/K (so dimM = n+ 1). We use d for
the vector of dimensions (d1, · · · , dr). We shall assume that the isotropy representation of G/K is
multiplicity free, i.e., all the summands pi in (1.1) are distinct as K-representations. In particular,
if there is a trivial summand it must be 1-dimensional.
We use q = (q1, · · · , qr) to denote exponential coordinates on the space of G-invariant metrics
on G/K. The Hamiltonian H for the cohomogeneity one Einstein equations with principal orbit
G/K is now given by:
H = v−1J + εv ((n− 1)Λ− S) ,
CLASSIFICATION OF SUPERPOTENTIALS 3
where Λ is the Einstein constant, v = 1
ed·q is the relative volume and
(1.2) J(p, p) =
which has signature (1, r − 1). The scalar curvature S of G/K above can be written as
where Aw are nonzero constants and W is a finite collection of vectors w ∈ Z
r ⊂ Rr. The set W
depends only on G/K and its elements will be referred to as weight vectors. These are of three
types
(i) type I: one entry of w is −1, the others are zero,
(ii) type II: one entry is 1, two are -1, the rest are zero,
(iii) type III: one entry is 1, one is -2, the rest are zero.
Notation 1.1. As in [DW4] we use (−1i,−1j , 1k) to denote the type II vector w ∈ W ⊂ Rr with
−1 in places i and j, and 1 in place k. Similarly, (−2i, 1j) will denote the type III vector with −2
in place i and 1 in place j, and (−1i) the type I vector with −1 in place i.
Remark 1.2. We collect below various useful facts from [DW4] and [WZ1]. Also, we shall use
standard terminology from convex geometry, as given, e.g., in [Zi]. In particular, a “face” is not
necessarily 2-dimensional. However, a vertex and an edge are respectively zero and one-dimensional.
The convex hull of a set X in Rr will be denoted by conv(X).
(a) For a type I vector w, the coefficient Aw > 0 while for type II and type III vectors, Aw < 0.
(b) The type I vector with −1 in the ith position is absent fromW iff the corresponding summand
pi is an abelian subalgebra which satisfies [k, pi] = 0 and [pi, pj] ⊂ pj for all j 6= i. If the isotropy
group K is connected, these last conditions imply that pi is 1-dimensional, and the pj , j 6= i, are
irreducible representations of the (compact) analytic group whose Lie algebra is k⊕ pi.
(c) If (1i,−1j ,−1k) occurs in W then its permutations (−1i, 1j ,−1k) and (−1i,−1j , 1k) do also.
(d) If dim pi = 1 then no type III vector with −2 in place i is present in W. If in addition K is
connected, then no type II vector with nonzero entry in place i is present.
(e) If I is a subset of {1, · · · , r}, then each of the equations
i∈I xi = 1 and
i∈I xi = −2
defines a face (possibly empty) of conv(W). In particular, all type III vectors in W are vertices
and (−1i,−1k, 1j) ∈ W is a vertex unless both (−2i, 1j) and (−2k, 1j) lie in W.
(f) For v,w ∈ W (or indeed for any v,w such that
vi or
wi = −1), we have
(1.3) J(v + d,w + d) = 1−
For the remainder of the paper, we shall work in the Ricci-flat Riemannian case, that is, we take
ε = 1 and Λ = 0. As in [DW4], any argument that does not use the sign of Aw would be valid in
the Lorentzian case. We shall also assume that conv(W) is r− 1 dimensional. This is certainly the
case if G is semisimple, as W spans Rr (see the proof of Theorem 3.11 in [DW3]).
The superpotential equation (0.1) now becomes
(1.4) J(∇u,∇u) = ed·q S,
where ∇ denotes the Euclidean gradient in Rr. As in [DW4] we shall look for solutions to Eq.(1.4)
of the form
(1.5) u =
Fc̄ e
4 A. DANCER AND M. WANG
where C is a finite set in Rr, and the Fc̄ are nonzero constants. Now Eq.(1.4) reduces to, for each
ξ ∈ Rr,
(1.6)
ā+c̄=ξ
J(ā, c̄) FāFc̄ =
Aw if ξ = d+ w for some w ∈ W
0 if ξ /∈ d+W.
We shall assume henceforth that r ≥ 2 since the superpotential equation always has a solution
in the r = 1 case, as was noted in [DW4], and J is of Lorentz signature only when r ≥ 2. The
following facts were deduced in [DW4] from Eq.(1.6).
Proposition 1.3. conv(1
(d+W)) ⊂ conv(C).
Proof. If w ∈ W, then Eq.(1.6) implies that d + w = ā + c̄ for some ā, c̄ ∈ C, and hence that
(d+ w) = 1
(ā+ c̄) ∈ conv(C).
Proposition 1.4. If ā, c̄ ∈ C and ā+ c̄ cannot be written as the sum of two non-orthogonal elements
of C distinct from ā, c̄ then either J(ā, c̄) = 0 or ā+ c̄ ∈ d+W.
In particular, if c̄ is a vertex of C, then either J(c̄, c̄) = 0, or 2c̄ = d + w for some w ∈ W and
J(c̄, c̄) F 2c̄ = Aw. In the latter case, J(d+w, d+w) has the same sign as Aw so is > 0 if w is type
I and < 0 if w is type II or III.
As mentioned in the Introduction, for the classification in [DW4] we made the assumption that
all vertices c̄ of C are non-null. Under this assumption, the second assertion of Prop 1.4 implies
that all vertices of C lie in 1
(d+W). Hence conv(C) is contained in conv(1
(d+W)), and by Prop
1.3 they are equal. This meant that in [DW4], subject to the non-null assumption, we could study
the existence of a superpotential in terms of the convex geometry of W.
The aim of the current paper is to drop this assumption. We still have
conv(
(d+W)) ⊂ conv(C),
but can no longer deduce that these sets are equal. The problem is that a vertex c̄ of conv(C) may
lie outside conv(1
(d+W)) if it is null.
In fact, it is clear from the above discussion that conv(1
(d+W)) is strictly contained in conv(C)
if and only if C has a null vertex. For if c̄ is a null vertex of C and 2c̄ = d + w for some w ∈ W,
then Eq.(1.6) fails for ξ = d+ w.
We conclude this section by proving an analogue of Proposition 2.5 in [DW4]. The arguments
below using Prop 1.4 are ones which will recur throughout this paper. Henceforth when we use the
term “orthogonal” we mean orthogonal with respect to J unless otherwise stated.
Theorem 1.5. C lies in the hyperplane {x̄ :
x̄i =
(n−1)} (possibly after subtracting a constant
from the superpotential).
Proof. We can assume 0 /∈ C by subtracting a constant from the superpotential. We shall also use
repeatedly below the fact that as J has signature (1, r− 1) there are no null planes, only null lines.
Denote by Hλ the hyperplane
x̄i = λ, so
(d + W) lies in H 1
(n−1). Suppose there exist
elements of C with
x̄i >
(n − 1). Let λmax denote the greatest value of
x̄i over C. If ãc̃ is
an edge of conv(C) ∩Hλmax , then Prop 1.4 shows that ã, c̃ are null, and that c̃ is orthogonal to the
element of C closest to it on the edge. Hence c̃ is orthogonal to the whole edge. Now J is totally
null on Span{ã, c̃}, so since there are no null planes, ã, c̃ are proportional, which is impossible as
they are both in Hλmax . So C ∩Hλmax is a single point c̃max, which is null.
Next we claim that all elements of C lying in the half-space
x̄i >
(n − 1) must be multiples
of c̃max. If not, let λ∗ be the greatest value such that there is an element of C, not proportional
to c̃max, in Hλ∗ . Let ã be a vertex of conv(C) ∩Hλ∗, not proportional to c̃max. Now, by Prop 1.4,
J(ã, c̃max) = 0, and so ã is not null. Since λ∗ >
(n− 1), we see ã+ ã must be written in another
CLASSIFICATION OF SUPERPOTENTIALS 5
way as a sum of two non-orthogonal elements of C. This sum must be of the form µc̃max + f̃ . But
c̃max is orthogonal to ã and to itself, hence to f̃ , a contradiction establishing our claim.
Similarly, all elements of C lying in
x̄i <
(n − 1) are multiples of an element c̃min, should
they occur. (Note that J is negative definite on H0 and we have assumed 0 /∈ C so λmin 6= 0.)
We denote the sets of elements lying in these open half-spaces by C+ and C− respectively. Note
that, when non-empty, C+ and C− are orthogonal to all elements of C ∩ H 1
(n−1). (For if ã ∈
C ∩ H 1
(n−1) then ã + c̃max cannot be written in another way as a sum of two non-orthogonal
elements of C.) In particular, if c̃max and c̃min are orthogonal, then c̃max is orthogonal to all
of conv(C), which is r-dimensional by assumption. So c̃max is zero, a contradiction. The same
argument implies that C+ and C− are both non-empty.
Let νc̃min and µc̃max be respectively the elements of C− and C+ closest to H 1
(n−1). Suppose that
c̃max+νc̃min = c̃
(1)+ c̃(2) with c̃(i) ∈ C and J(c̃(1), c̃(2)) 6= 0. Non-orthogonality means the c̃(i) cannot
belong to the same side of H 1
(n−1) and by the choice of ν, they cannot belong to opposite sides of
(n−1). Both therefore lie inH 1
(n−1). But by the previous paragraph, J(c̃max+νc̃min, c̃
(1)+c̃(2)) =
0. This means that c̃max + νc̃min is null, which contradicts J(c̃max, c̃min) 6= 0. Hence c̃max + νc̃min
lies in d+W ⊂ Hn−1. Applying the same argument to c̃min+µc̃max, we find that in fact µ = ν = 1,
i.e., C+ = {c̃max} and C− = {c̃min}.
Now C ∩ H 1
(n−1) (and hence its convex hull) is contained in the hyperplanes c̃
max, c̃
min in
(n−1). These hyperplanes are distinct as c̃max is orthogonal to itself but not to c̃min. Hence
d+W ⊂ (C + C) ∩Hn−1 is contained in the union of the point c̃max + c̃min and the codimension 2
subspace c̃⊥max ∩ c̃
min of Hn−1. So conv(d+W) is contained in a codimension 1 subspace of Hn−1,
contradicting our assumption that dimconv(d+W) = r − 1.
Remark 1.6. A notational difficulty arises from the fact that, as seen above, points of C are on
the same footing as points in 1
(d+W) rather than points of W. Accordingly, we shall use letters
c, u, v, ... to denote elements of the hyperplane
ui = −1 (such as elements of W), and c̄, ū, v̄, ...
to denote the associated elements 1
(d+ c), 1
(d+u), 1
(d+v), · · · of the hyperplane
ūi =
(n−1)
(such as elements of C or of 1
(d+W)).
Note that for any convex or indeed affine sum
(j) of vectors ξ(j) in Rr, we have
λjξ(j) =
λj ξ(j).
Since we now know that the set C, like 1
(d +W), lies in H 1
(n−1) := {x̄ :
x̄i =
(n − 1)}, we
will adopt the convention, as in the last paragraph, that when we refer to hyperplanes such as c̄⊥
in the rest of the paper, we mean “affine hyperplanes in H 1
(n−1)”.
2. The classification theorem and the strategy of its proof
We can now state the main theorem of the paper.
Theorem 2.1. Let G be a compact connected Lie group and K a closed connected subgroup such
that the isotropy representation of G/K is the direct sum of r pairwise inequivalent R-irreducible
summands. Assume that dimconv(W) = r−1, where W is the set of weights of the scalar curvature
function of G/K (cf §1). (This holds, for example, if G is semisimple.)
If the cohomogeneity one Ricci-flat equations with G/K as principal orbit admit a superpotential
of form (1.5) where C contains a J-null vertex, then we are in one of the following situations (up
to permutations of the irreducible summands):
(i) W = {(−1)i, (11,−2i) : 2 ≤ i ≤ r}, d1 = 1, C =
(d + {(−11), (11,−2i) : 2 ≤ i ≤ r}) and
r ≥ 2;
6 A. DANCER AND M. WANG
(ii) r ≤ 3.
Remark 2.2. As mentioned before, the situation where C has no null vertex was analysed in [DW4].
Hence, except for the r ≤ 3 case, Theorem 2.1 completes the classification of superpotentials of
scalar curvature type subject to the above assumptions on G and K.
Remark 2.3. The first case of the above theorem is realized by certain circle bundles over a product
of r−1 Fano (homogeneous) Kähler-Einstein manifolds (cf. Example 8.1 in [DW4], and [BB], [WW],
[CGLP3]), and the subsystem of the Ricci-flat equations singled out by the superpotential in these
examples corresponds to the Calabi-Yau condition. For more on the r = 2 case, see the concluding
remarks in §10.
Remark 2.4. Theorem 2.1 remains true if we replace the connectedness of G and K by the
connectedness of G/K and the extra condition on the isotropy representation given by the second
statement in Remark 1.2(d), i.e., if pi is an irreducible summand of dimension 1 in the isotropy
representation of G/K, then [pi, pj] ⊂ k⊕ pj for all j 6= i.
This weaker property does hold in practice. For example, the exceptional Aloff-Wallach space
N1,1 can be written as (SU(3) × Γ)/(U1,1 · ∆Γ), where U1,1 is the set of diagonal matrices of the
form diag(exp(iθ), exp(iθ), exp(−2iθ)) and Γ is the dihedral group with generators
0 1 0
−1 0 0
0 0 1
e2πi/3 0 0
0 e−2πi/3 0
0 0 1
In order to prove Theorem 2.1 we have to analyse the situation when there is a null vertex c̄ ∈ C.
As discussed in §1, conv(C) now strictly includes conv(1
(d + W)) as c̄ is not in conv(1
(d +W)).
Our strategy is to take an affine hyperplane H separating c̄ from conv(1
(d+W)), and consider the
projection ∆c̄ of conv(1
(d+W)) onto H from c̄.
Roughly speaking, whereas in [DW4] we could analyse the situation by looking at the vertices and
edges of conv(1
(d+W)), now, because we have projected onto a subspace of one lower dimension,
we have to consider the 2-dimensional faces of conv(1
(d+W)) also.
This is a natural method of dealing with the situation of a point outside a convex polytope. It
has some relation to the notion of “lit set” introduced in a quite different context by Ginzburg-
Guillemin-Karshon [GGK].
The analysis in the next section will show that the vertices of the projected polytope ∆ can
be divided into three types (Theorem 3.8). We label these types (1A), (1B) and (2). Roughly,
these correspond to vertices orthogonal to c̄, vertices ξ̄ such that the line through c̄ and ξ̄ meets
conv(1
(d+W)) at a vertex, and vertices ξ̄ such that this line meets conv(1
(d+W)) in an edge.
In the remainder of the paper we shall gradually narrow down the possibilities for each type. In
§3 we begin a classification of type (2) vertices. In §4 we are able to deduce that conv(1
(d +W))
lies in the half space J(c̄, ·) ≥ 0. We are able to deduce an orthogonality result for vectors on edges
in conv(1
(d + W)) ∩ c̄⊥. This is analogous to the key result Theorem 3.5 of [DW4] that held (in
the more restrictive situation of that paper) for general edges in conv(1
(d+W)). In §5 we exploit
this result and some estimates to classify the possible configurations of (1A) vertices (i.e. vertices
in c̄⊥), see Theorem 5.18.
In §6 we attack the (1B) vertices, exploiting the fact that adjacent (1B) vertices give rise to a
2-dimensional face of conv(1
(d +W)). This is the most laborious part of the paper, as it involves
a case-by-case analysis of such faces. We show that adjacent (1B) vertices can arise only in a very
small number of situations (Theorem 6.18). In §7 we exploit the listing of 2-dim faces to show that
there is at most one type (2) vertex, except in two special situations (Theorem 7.1). In §8 and 9,
we eliminate more possibilities for adjacent (1B) and type (2) vertices. We find that if r ≥ 4 then
we are either in case (i) of the Theorem or there are no type (2) vertices and no adjacent type (1B)
CLASSIFICATION OF SUPERPOTENTIALS 7
vertices. Using the results of §4, in the latter case we find that all vertices are (1A) except for a
single (1B). Building on the results of §5 for (1A) vertices, we are able to rule out this situation in
§10, see Theorem 10.15 and Corollary 10.16.
3. Projection onto a hyperplane
We first present some results about null vectors in H 1
(n−1).
Remark 3.1. From Eq.(1.3), the set of null vectors in the hyperplane H 1
(n−1) form an ellipsoid
x2i /di = 1}. If c̄ is null, then the hyperplane c̄
⊥ in H 1
(n−1) is the tangent space to this ellipsoid.
So any element x̄ 6= c̄ of c̄⊥ satisfies J(x̄, x̄) < 0.
Lemma 3.2. Let x, y satisfy
yi = −1.
Suppose that J(x̄, x̄) and J(ȳ, ȳ) ≥ 0. Then J(x̄, ȳ) ≥ 0, with equality iff x̄ is null and x̄ = ȳ. In
particular, if x̄, ȳ are distinct null vectors then J(x̄, ȳ) > 0.
Proof. This follows from Eq.(1.3) and Cauchy-Schwartz.
Proposition 3.3. Let H = {x̄ : h(x̄) = λ} be an affine hyperplane, where h is a linear functional
such that conv(1
(d + W)) lies in the open half-space {x̄ : h(x̄) < λ}. Then there is at most one
element of C in the complementary open half-space {x̄ : h(x̄) > λ}. Such an element is a null vertex
of conv(C). Hence any element of C outside conv(1
(d+W)) is a null vertex of C.
Proof. Suppose the points of C with h(x̄) > λ are c̄(1), · · · , c̄(m) with m > 1. Our result is stable
with respect to sufficiently small perturbations of H, so we can assume that h(c̄(1)) > h(c̄(2)) ≥
h(c̄(3)), · · · , h(c̄(m)).
Now c̄(1) + c̄(1) and c̄(1) + c̄(2) cannot be written in any other way as the sum of two elements
of C. Hence, by Prop. 1.4, c̄(1) is null and J(c̄(1), c̄(2)) = 0. The only other way c̄(2) + c̄(2) can be
written is as c̄(1) + c̄ for some c̄ ∈ C. But then c̄ = 2c̄(2) − c̄(1), so J(c̄(1), c̄) = 0, and such sums will
not contribute. Hence J(c̄(2), c̄(2)) = 0, contradicting Lemma 3.2.
Corollary 3.4. For distinct elements c̄, ā of C, the line segment c̄ā meets conv(1
(d+W)).
This gives us some control over the extent to which conv(C) can be bigger than the set conv(1
Lemma 3.5. Let A ⊂ H 1
(n−1) be an affine subspace such that A ∩ conv(
(d +W)) is a face of
conv(1
(d+W)). Suppose there exists c̄ ∈ C ∩A with c̄ /∈ conv(1
(d+W)).
Let x̄ ∈ A. If x̄ = 1
(ā+ ā′) with ā, ā′ ∈ C, then in fact ā, ā′ ∈ A.
Proof. If ā or ā′ equals c̄ this is clear.
We know by Cor 3.4 that if ā, ā′ 6= c̄ then the segments āc̄, ā′c̄′ meet conv(1
(d+W)). So there
exist 0 < s, t ≤ 1 with tā+ (1− t)c̄ and sā′ + (1− s)c̄ in conv(1
(d+W)). Hence
(tā+ (1− t)c̄) +
sā′ + (1− s)c̄
∈ conv(
(d+W)).
As it is an affine combination of x̄, c̄ this point also lies in A, so it lies in A∩ conv(1
(d+W)). Also,
it is a convex linear combination of the points tā+ (1 − t)c̄ and sā′ + (1− s)c̄ of conv(1
(d+W)).
Hence, by our face assumption, both these points lie in A, so ā, ā′ lie in A.
Remark 3.6. The above lemma will be very useful because it means that in all our later calculations
using Prop 1.4 for a face defined by an affine subspace A, we need only consider elements of C lying
in A.
8 A. DANCER AND M. WANG
Proposition 3.7. Let vw be an edge of conv(W) and suppose v̄, w̄ ∈ C.
(i) If there are no points of W in the interior of vw, then J(v̄, w̄) = 0.
(ii) If u = 1
(v+w) is the unique point of W in the interior of vw, J(v̄, w̄) > 0, and u is type II
or III, then Fv̄, Fw̄ are of opposite signs.
Proof. Part(i) is a generalization of Theorem 3.5 in [DW4] and we will be able to apply the proof
of that result after the following argument. Let the edge v̄w̄ of conv(1
(d + W)) be defined by
equations 〈x̄, u(i)〉 = λi : i ∈ I where 〈x̄, u
(i)〉 ≤ λi for i ∈ I and x̄ ∈ conv(
(d + W)). (In
the above, 〈 , 〉 is the Euclidean inner product in Rr.) Note that Span {u(i) : i ∈ I} is the
〈 , 〉-orthogonal complement of the direction of the edge.
Let H be a hyperplane whose intersection with conv(1
(d+W)) is the edge v̄w̄. We can take H
to be defined by the equation 〈x̄,
i∈I biu
(i)〉 =
i∈I biλi where bi are arbitrary positive numbers
summing to 1.
If ā, ā′ are elements of C whose midpoint lies in v̄w̄, then either they are both in H or one of
them, ā say, is on the opposite side of H from conv(1
(d+W)). In the latter case ā is null and the
only element of C on this side of H so, by Prop 1.4, is J-orthogonal to v̄, w̄. Hence, as 1
(ā + ā′)
is an affine combination of v̄, w̄, we see that J(ā, ā′) = 0, and so such sums do not contribute in
Eq.(1.6). We may therefore assume that ā, ā′ are in H. But as this is true for all H of the above
form, the only sums that will contribute are those where ā, ā′ are collinear with v̄w̄.
Now if ā, say, lies outside the line segment v̄w̄, then it is null and J-orthogonal to v̄ or w̄, and
hence to the whole line. So the only sums which contribute are those where ā, ā′ lie on the line
segment v̄w̄. Now the proof of Theorem 3.5 in [DW4] gives (i).
Turning to (ii), note first that the above arguments and Prop 1.4 give (ii) immediately if no
interior points of the edge v̄w̄ lie in C. If there are m interior points in C, we again proceed as in
the proof of Theorem 3.5 in [DW4] and use the notation there. We may assume that Lemma 3.2
(and hence Cor 3.3 and Lemma 3.4) of [DW4] still holds; for the only issue is the statement for
λm+1, but if c
(0) + c(λm+1) cannot be written as c(λj) + c(λk) (0 < λj , λk < m + 1) then what we
want to prove is already true.
Now Lemma 3.4 in [DW4] and our hypothesis J(v̄, w̄) > 0 imply that J00 < 0 and Jλi,λj > 0
except in the three cases listed there. The proof that the elements of C are equi-distributed in v̄w̄
carries over from [DW4] since the midpoint ū is not involved in the arguments.
Suppose next that the points in C ∩ v̄w̄ are equi-distributed. In the special case where m = 1,
we have J(v̄, ū) = 0 = J(ū, w̄), which imply J(ū, ū) = 0. So the midpoint does not contribute to
the equation from c(0) + c(λm+1). If m > 1, we write down the equations arising from c(0) + c(λm+1)
and c(λm−1) + c(λm+1). The formula for Fλj in [DW4] still holds for 1 ≤ j ≤ m, and using this and
the second equation we obtain the analogous formula for Fλm+1 .
Putting all the above information together in the first equation and using Au < 0, we see that
Fm+1λ1 /F
0 is positive if m is even and negative if m is odd. In either case it follows immediately
that F0Fλm+1 < 0, as required.
We shall now set up the basic machinery of the projection of our convex hull onto an affine
hyperplane.
Let c̄ be a null vector in C and let H be an affine hyperplane separating c̄ from conv(1
(d+W)).
Define a map P : conv(1
(d + W)) −→ H by letting P (z̄) be the intersection point of the ray c̄z̄
with H. We denote by ∆ the image of P in H. (P and ∆ of course depend on c̄ and the choice of
H. When considering projections from several null vertices, we will use the vertices as superscripts
to distinguish the cases, e.g., ∆c̄,∆b̄.)
Let us now consider a vertex ξ̄ of ∆. We know that c̄ and ξ̄ are collinear with a subset P−1(ξ̄) of
conv(1
(d+W)). As ξ̄ does not lie in the interior of a positive-dimensional subset of ∆, we see that
CLASSIFICATION OF SUPERPOTENTIALS 9
no point of P−1(ξ̄) lies in the interior of a subset of conv(1
(d +W)) of dimension > 1. So P−1(ξ̄)
is a vertex or an edge of conv(1
(d+W)).
If P−1(ξ̄) is a vertex x̄, then 2x̄ ∈ d +W and in Lemma 3.5 we can take the affine subspace A
to be the line through c̄, ξ̄, x̄. Using this lemma and also Prop 1.4 and Cor 3.4 we see that either
x̄ ∈ C (in which case J(x̄, c̄) = 0), or x̄ /∈ C and x̄ = (ā+ c̄)/2 for some null element ā ∈ C ∩A. We
have therefore deduced
Theorem 3.8. Let ξ̄ be a vertex of ∆. Then exactly one of the following must hold:
(1A) ξ̄ (and hence P−1(ξ̄)) is orthogonal to c̄;
(1B) The line through c̄, ξ̄ meets conv(1
(d + W)) in a unique point x̄, and there exists a null
ā ∈ C such that (ā+ c̄)/2 = x̄;
(2) ξ̄ is not orthogonal to c̄, and c̄ and ξ̄ are collinear with an edge v̄w̄ of conv(1
(d+W)), (and
hence c and ξ are collinear with the corresponding edge vw of conv(W)).
Remark 3.9. If (1B) occurs, then ā = 2x̄− c̄ being null is equivalent to J(x̄, x̄) = J(x̄, c̄), that is,
(3.1)
In particular xi and ci are nonzero for some common index i. We will from now on refer to this
situation by saying that the vectors x and c overlap.
We make a preliminary remark about (1A) vertices.
Lemma 3.10. Suppose that u ∈ W and ū ∈ c̄⊥.
(a) If u = (−2i, 1j) then ci 6= 0.
(b) Suppose that K is connected. If u = (−1i,−1j , 1k), then ci, cj , ck are all nonzero.
Proof. After a suitable permutation, we may let 1, · · · , s be the indices a with ca 6= 0. We need
In case (a) this is impossible if ci = 0 (that is, i /∈ {1, · · · , s}) as then we need dj = 1 = cj and
ca = 0 for a 6= j, contradicting
k=1 ck = −1.
Next, Cauchy-Schwartz on ( ua√
)sa=1, (
)sa=1 shows
≥ 1. In case (b), if, say, ck = 0,
then since 1
≥ 1 and di, dj ≥ 2 (see Remark 1.2(d)) we must have di = dj = 2. The equations
then imply ci = cj = −1 and ca = 0 for a 6= i, j, also giving a contradiction. Similar arguments
rule out ci = 0 or cj = 0.
In the next two sections we shall get stronger results on (1A) vertices. Let us now consider type
(2) vertices.
Theorem 3.11. Consider a type (2) vertex ξ̄ of ∆. So c and ξ are collinear with an edge vw of
conv(W). Suppose there are no points of W in the interior of vw. Then we have
(i) c = 2v − w or
(ii) c = (4v − w)/3.
In (i) the points of C on the line through c̄, ξ̄ are c̄ and w̄. In (ii) they are c̄, w̄ and c̄(1) =
(2v̄ + w̄)/3 = (c̄+ w̄)/2. We need J(c̄(1), w̄) = 0.
Proof. This is very similar to the arguments of §3 in [DW4]. We apply Lemma 3.5 to the line
through v̄, w̄.
(A) We write the elements of C on the line as c̄ = c̄(0), c̄(1), · · · , c̄(m+1) with m ≥ 0. So c̄(m+1) is
either null or is w̄. No other c̄(j) can lie beyond w̄, by Cor 3.4.
10 A. DANCER AND M. WANG
By assumption c̄ = c̄(0) is not orthogonal to the whole line. As c̄ is null, this means c̄ is not
orthogonal to any other point on the line. So c̄(0) + c̄(j) is either 2v̄, 2w̄ or else is a sum of two
other c̄(i). In particular, c̄(0)+ c̄(1) = 2v̄. In fact c̄(0)+ c̄(j) is never 2w̄; for the only possibility is for
c̄(0) + c̄(m+1) = 2w̄, in which case c̄(m+1) is null, and so c̄(m) + c̄(m+1) = 2w̄, contradicting v̄ 6= w̄.
We deduce that for j > 1, we have c̄(0) + c̄(j) = c̄(k) + c̄(p) for some 1 ≤ k, p ≤ j − 1.
(B) Let c̄(m+1) be null. Since the segment c̄(0)c̄(m+1) lies in the interior of the null ellipsoid,
Lemma 3.2 implies that J(c̄(i), c̄(j)) > 0 unless i = j = 0 or m + 1. Arguments very similar to
those in §3 of [DW4] enable us to determine the signs of the Fc̄(j) in (1.5) and show that the
contributions from the pairs summing to c̄(1) + c̄(m+1) cannot cancel. So we have a contradiction
unless c̄(1) + c̄(m+1) = w̄, which can only happen if m = 1, i.e., c̄(0) + c̄(1) = 2v̄, c̄(1) + c̄(2) = 2w̄ and
c̄(0) + c̄(2) = 2c̄(1) (otherwise c̄(0) + c̄(2) cannot cancel). Hence we have
c = (3v − w)/2 ; c(1) = (v + w)/2 ; c(2) = (3w − v)/2.
Writing Fj for Fc̄(j) , we need 2F0F2J(c̄, c̄
(2)) + F 21 J(c̄
(1), c̄(1)) = 0 so that the contributions from
c̄(0) + c̄(2) and c̄(1) + c̄(1) cancel. As J(c̄, c̄(2)) and J(c̄(1), c̄(1)) > 0, we need F0 and F2 to have
opposite signs. Now, as J(c̄, c̄(1)), J(c̄(1), c̄(2)) > 0, we see that Av and Aw have opposite signs. So
we may let w be type I and v be type II or III, as long as the asymmetry between c̄(0) and c̄(2) is
removed. Note that v,w cannot overlap if v is type II, as then Remark 1.2(c) means w is not a
vertex. The possibilities are (up to permutation)
v w c(0) = 1
(3v − w) c(2) = 1
(3w − v)
(1) (−2, 1, 0, · · · ) (−1, 0, · · · ) (−5
, 0, · · · ) (−1
, 0, · · · )
(2) (−2, 1, 0, · · · ) (0,−1, 0, · · · ) (−3, 2, 0, · · · ) (1,−2, 0, · · · )
(3) (−2, 1, 0, · · · ) (0, 0,−1, 0, · · · ) (−3, 3
, · · · ) (1,−1
, 0, · · · )
(4) (1,−1,−1, 0, · · · ) (0, 0, 0,−1, · · · ) (3
, · · · ) (−1
, · · · )
Now, it is clear in (1) and (2) that c̄(0) and c̄(2) can’t both give null vectors. For (3) and (4), we
find that the nullity equations for c̄(0) and c̄(2) have no integral solutions in di (in fact d3 (resp. d4)
must be 5/2).
Therefore in fact c̄(m+1) cannot be null.
(C) Now suppose that c̄(m+1) = w and m > 0. Since J(c̄, v̄) 6= 0, v̄ must lie between c̄(0) and c̄(1).
So J(c̄(0), ·) and J( · , c̄(m+1)) are affine functions on the line, vanishing at c̄(0) and c̄(m) respectively.
Hence J(c̄(0), c̄(i)) (i ≥ 1) and J(c̄(i), c̄(m+1)) (0 ≤ i ≤ m − 1) are the same sign as J(c̄(0), c̄(1)).
It follows that J(c̄(i), ·) is an affine function on the line, taking the same sign as J(c̄(0), c̄(1)) at
c̄(0), c̄(m+1) (for 1 ≤ i ≤ m − 1). Thus J(c̄(i), c̄(j)) is the same sign as J(c̄(0), c̄(1)) except for the
cases
J(c̄(0), c̄(0)) = 0 = J(c̄(m), c̄(m+1)) : sign J(c̄(m+1), c̄(m+1)) = −sign J(c̄(0), c̄(1)).
It then follows that the sign and non-cancellation arguments of (B) (taken from §3 of [DW4]) still
hold, except in the case m = 1.
These give the two cases of the Theorem. If m = 0, we have c(1) = w and c(0) = 2v − w as
c(0) + c(1) = 2v. If m = 1, then c(2) = w, c(0) + c(1) = 2v and c(0) + c(2) = 2c1 (for cancellation).
Hence c(0) = (4v − w)/3, c(1) = (2v + w)/3, as well as J(c̄(1), c̄(2)) = 0.
Remark 3.12. If there are points of W in the interior of vw, we can still conclude that c(0)+c(1) =
2v. Hence c = λv + (1 − λ)w for 1 < λ ≤ 2, since if λ > 2 then c̄(1) is beyond w̄. It must then be
null, and m = 0, so there is no way of getting 2w̄ as a sum of two elements in C.
Lemma 3.13. For case (i) in Theorem 3.11 (i.e., c = 2v −w), either w is type I, or w is type III
and vi = −1, wi = −2 for some index i.
CLASSIFICATION OF SUPERPOTENTIALS 11
Proof. It follows from above that J(w̄, w̄)F 2w̄ = Aw so J(w̄, w̄) is positive if w is type I and negative
if w is type II or III. In the latter case,
> 1, but by nullity, c = 2v − w satisfies
Hence for some i we have |wi| > |ci| = |2vi − wi|. As vi, wi ∈ {−2,−1, 0, 1}, it follows that
vi = −1, wi = −2.
We are now able to characterise the case where c is a type I vector.
Theorem 3.14. If c is a type I vector, say (−1, 0, · · · ) for definiteness, then W is given by
{(−1)i, (11,−2i) : i = 2, · · · , r}.
Remark 3.15. Equivalently, W is as in Ex 8.1 of [DW4], where the hypersurface in the Ricci-flat
manifold is a circle bundle over a product of Kähler-Einstein Fano manifolds. A superpotential was
found for this example in [CGLP3].
Proof. Nullity of c̄ implies d1 = 1, so (−2
1, 1i) /∈ W. Also (−11,−1j , 1k) /∈ W, as then c would be
in conv(W). Let us consider the vertices ξ̄ in ∆. ξ̄ cannot be of type (1A); otherwise ξ1 = −1,
which implies the existence of a type II vector in W with a nonzero first component, contradicting
the above. There can also be no ξ̄ of type (1B) since by Remark 3.9 the vector x̄ satisfies 0 < −x1,
which we ruled out above.
Hence all vertices of ∆ are of type (2), i.e., correspond to edges vw of conv(W) such that
c = λv + (1− λ)w and λ > 1. From this equation it follows that v,w are of the form
v = (−1i), w = (11,−2i)
for some i > 1. As ∆ (being a (r− 2)-dimensional polytope in an (r− 2)-dimensional affine space)
has at least r − 1 vertices, such vectors occur for all i 6= 1.
Now no type II vector can be in W, otherwise v would not be a vertex. Also (1i,−2j) with
i, j 6= 1 cannot be in W, as then (−1j) would not be a vertex. We have already seen (−21, 1i) is
not in W. So W is as claimed.
We shall henceforth exclude this case, i.e. case (i) of Theorem 2.1, from our discussion.
We conclude this section by giving a preliminary listing of the possibilities for c when we have a
type (2) vertex. These are given by cases (i) and (ii) of Theorem 3.11, as well as the possible cases
when there is a point of W in the interior of vw.
For Theorem 3.11(i) the possible v,w, c are:
v w c = 2v − w
(1) (−1, 1,−1, · · · ) (−2, 1, · · · ) (0, 1,−2, · · · )
(2) (−1,−1, 1, · · · ) (−2, 1, · · · ) (0,−3, 2, · · · )
(3) (−1, 0,−1, 1, · · · ) (−2, 1, · · · ) (0,−1,−2, 2, · · · )
(4) (−2, 1, · · · ) (−1, 0, · · · ) (−3, 2, · · · )
(5) (−2, 1, · · · ) (0, 0,−1, · · · ) (−4, 2, 1, · · · )
(6) (−1, 0, · · · ) (0,−1, · · · ) (−2, 1, · · · )
(7) (1,−1,−1, · · · ) (0, 0, 0,−1, · · · ) (2,−2,−2, 1, · · · )
Table 1: c = 2v −w cases
where · · · denotes zeros as usual. To arrive at this list, recall from Lemma 3.13 that w is either
type I or type III with vi = −1, wi = −2 for some i. Note also that if w is type I and v is type II
then v cannot overlap with w as w cannot then be a vertex. Furthermore, the other possibility with
w type I and v type III is excluded as we are assuming in Theorem 3.11 that there are no points
of W in the interior of vw. Finally, the case w = (−2, 1, · · · ), v = (−1, 0, · · · ) can be excluded as
this just gives the example in Theorem 3.14.
In order to list the possibilities under Theorem 3.11(ii), recall that we need J(c̄(1), w̄) = 0 where
c(1) = (2v + w)/3. Equivalently, we need
(3.2) 2J(v̄, w̄) + J(w̄, w̄) = 0.
12 A. DANCER AND M. WANG
This puts constraints on the possibilities for v,w. For instance, w cannot be type I, as for such
vectors J(v̄, w̄) ≥ 0 and J(w̄, w̄) > 0. Also, if w is type II or III, then from the superpotential
equation we need J(w̄, w̄) < 0, so J(v̄, w̄) > 0. If w is type III, say (−2, 1, 0, · · · ), then since d1 ≥ 2,
we have 4
≤ 3, and the above equation gives J(v̄, w̄) ≤ 1
with equality iff d1 = 2, d2 = 1.
By the above remarks and the nullity of c̄, after a moderate amount of routine computations, we
arrive at the following possibilities, up to permutation of entries. In the table we have listed only
the minimum number of components for each vector and all unlisted components are zero. Note
that the entries (12)-(16) can occur only if K is not connected (cf. Remark 5.9).
v w c(1) = (2v +w)/3 c = (4v − w)/3
(1) (0, 0,−2, 1) (−2, 1, 0, 0) (−2
(2) (−2, 0, 1) (−2, 1, 0) (−2, 1
) (−2,−1
(3) (−1, 0, 0, ) (−2, 1, 0) (−4
, 0) (−2
(4) (0, 0,−1) (−2, 1, 0) (−2
(5) (−1, 0, 1,−1) (−2, 1, 0, 0) (−4
) (−2
(6) (−1, 1,−1) (−2, 1, 0) (−4
, 1,−2
) (−2
, 1,−4
(7) (0, 0, 1,−1,−1) (−2, 1, 0, 0, 0) (−2
(8) (0, 1,−1,−1) (−2, 1, 0, 0) (−2
, 1,−2
, 1,−4
(9) (0,−1,−1, 1) (1,−1,−1, 0) (1
,−1,−1, 2
) (−1
,−1,−1, 4
(10) (1,−1, 0,−1) (1,−1,−1, 0) (1,−1,−1
) (1,−1, 1
(11) (1,−2, 0) (1,−1,−1) (1,−5
) (1,−7
(12) (1, 0, 0,−2) (1,−1,−1, 0) (1,−1
) (1, 1
(13) (0, 0, 0,−2, 1) (1,−1,−1, 0, 0) (1
) (−1
(14) (0,−1, 0, 1,−1) (1,−1,−1, 0, 0) (1
,−1,−1
) (−1
,−1, 1
(15) (1, 0, 0,−1,−1) (1,−1,−1, 0, 0) (1,−1
) (1, 1
(16) (0, 0, 0, 1,−1,−1) (1,−1,−1, 0, 0, 0) (1
) (−1
Table 2: c = 1
(4v − w) cases
We will also need a listing of those cases for which vw has interior points lying in conv(W).
v w c
(1) (1,−2, · · · ) (−2, 1, · · · ) (3λ− 2, 1− 3λ, · · · )
(2) (1,−2, · · · ) (−1, 0, · · · ) (2λ − 1, −2λ, · · · )
(3) (−1, 0, · · · ) (1,−2, · · · ) (1− 2λ, 2λ− 2, · · · )
(4) (−2, 1, 0, · · · ) (0, 1,−2, · · · ) (−2λ, 1, 2λ− 2, · · · )
(5) (1,−1,−1, · · · ) (−1, 1,−1, · · · ) (2λ− 1, 1 − 2λ,−1, · · · )
Table 3: Cases with interior points
Recall from Remark 3.12 that 1 < λ ≤ 2 and · · · denote zeros. Note that except in (4) all interior
points which may lie in W actually do.
4. The sign of J(c̄, w̄)
Theorem 4.1. conv(1
(d + W)) lies in the closed half-space J(c̄, ·) ≥ 0, i.e., the same closed
half-space in which the null ellipsoid lies.
Proof. We know that if ξ̄ is a vertex of ∆c̄ then there are three possibilities, given by (1A), (1B)
and (2) of Theorem 3.8. If (1A) occurs, then by definition J(c̄, ξ̄) = 0. If (1B) occurs, let ā be the
null vector in Theorem 3.8. Then by Lemma 3.2, J(c̄, ā) > 0, which in turn implies that J(c̄, ξ̄) > 0.
It is now enough to show that J(c̄, ξ̄) ≥ 0 if ξ̄ is a type (2) vertex of ∆c̄, since it then follows that
∆c̄, and hence conv(1
(d+W)), lies in the half-space J(c̄, ·) ≥ 0.
CLASSIFICATION OF SUPERPOTENTIALS 13
Suppose then that ξ̄ is a type (2) vertex with J(c̄, ξ̄) < 0. By Remark 3.12, c = λv + (1 − λ)w
for some v,w ∈ W with 1 < λ ≤ 2, and both J(c̄, v̄), J(c̄, w̄) < 0. In particular, from Remark 3.1
and Lemma 3.2, J(v̄, v̄), J(w̄, w̄) < 0 since v̄, w̄ lie on the side of c̄⊥ opposite to the null ellipsoid.
0 = 4J(c̄, c̄) = J(d+ λv + (1− λ)w, d+ λv + (1− λ)w)
= J(λ(d + v) + (1− λ)(d + w), λ(d+ v) + (1− λ)(d+ w))
= λ2J(d+ v, d+ v) + 2λ(1− λ)J(d+ v, d + w) + (1− λ)2J(d+ w, d + w).
It follows from the above remarks that J(d+ v, d + w) < 0, that is
One then checks that this condition is only satisfied in the following cases (up to permutation of
indices and interchange of v and w):
(a) v = (−2, 1, 0, · · · ), w = (−2, 0, 1, 0, · · · ) with 1 < d1 < 4;
(b) v = (−2, 1, 0, · · · ), w = (−1, 1,−1, 0, · · · ) with d1 = 2, or (d1, d2) = (3, 2), or d2 = 1;
(c) v = (1,−1,−1, 0, · · · ), w = (1,−1, 0,−1, 0, · · · ) with d1 = 1 or d2 = 1;
(d) v = (1,−1,−1, 0, · · · ), w = (0,−1,−1, 1, 0, · · · ) with d2 = 1 or d3 = 1.
In case (a), c = (−2, λ, 1 − λ, 0, · · · ). The condition d1 < 4 is incompatible with the nullity of c̄.
Interchanging v and w reverses only the role of λ and 1− λ.
A similar argument rules out case (b) with v,w as shown, as here c = (−λ − 1, 1, λ − 1). If
we interchange v and w, then c = (λ − 2, 1,−λ, · · · ). Theorem 3.11 tells us λ = 4/3 or 2, so
c = (−2/3, 1,−4/3, · · · ) or (0, 1,−2, · · · ).
In the former case c(1) := (2v+w)/3 = (−4/3, 1,−2/3, · · · ), so the condition J(w̄, c̄(1)) = 0 gives
8/3d1 + 1/d2 = 1. Thus (d1, d2) = (3, 9) or (4, 3) but in neither case is c̄ null. In the latter case
nullity means 1/d2 + 4/d3 = 1, so J(c̄, v̄) =
(1− 1/d2 − 2/d3) > 0, a contradiction.
In case (c), c = (1,−1,−λ, λ − 1) and if v and w are interchanged, the last two components of
c are interchanged. But c̄ cannot be null if d1 = 1 or d2 = 1. A similar argument works for case
Corollary 4.2. conv(1
(d+W)) ∩ c̄⊥ is a (possibly empty) face of conv(1
((d+W)).
This enables us to adapt Theorem 3.5 of [DW4] to the elements of c̄⊥.
Corollary 4.3. Let vw be an edge of conv(W) and suppose v̄ and w̄ are in c̄⊥. Suppose further
that there are no elements of W in the interior of vw. Then J(v̄, w̄) = 0.
Proof. This is essentially the same as the proof of Theorem 3.5 of [DW1]. As conv(1
(d+W))∩ c̄⊥
is a face of conv(1
(d + W)), Lemma 3.5 shows that for calculations in c̄⊥ we need only consider
elements of C in this hyperplane. Note that by Cor 3.4, no elements of C lie on the opposite side of
c̄⊥ to conv(1
(d+W).
Any vertex of conv(C) ∩ c̄⊥ outside conv(1
(d +W)) ∩ c̄⊥ is, by Prop 1.4, null, so must be c̄ by
Lemma 3.2. Now Cor 3.4 shows that c̄ is the only element of conv(C)∩c̄⊥ outside conv(1
(d+W))∩c̄⊥.
But any sum c̄+ ā with ā ∈ c̄⊥ does not contribute, so in fact we are in the situation of Theorem
3.5 of [DW4].
We introduce the following sets:
Ŝ1 = {i ∈ {1, · · · , r} : ∃ unique w ∈ W with w̄ ∈ c̄
⊥ and wi = −2}
Ŝ≥2 = {i ∈ {1, · · · , r} : ∃ more than one w ∈ W with w̄ ∈ c̄
⊥ and wi = −2}
These are similar to the sets S1, S≥2 of [DW4], but now we require that the vectors w to lie in c̄
It is immediate from Cor 4.3 that di = 4 if i ∈ Ŝ≥2, (cf Prop 4.2 in [DW4]).
14 A. DANCER AND M. WANG
We next prove a useful result about which elements of 1
(d +W) can be orthogonal to c̄. This
will give us information about when (1A) vertices can occur.
Lemma 4.4. Assume that we are not in the situation of Theorem 3.14 (i.e., c is not of type I ).
Let u ∈ W be such that ū ∈ c̄⊥. Then:
(a) there exists i with ci 6= 0 and −2 < ci < 1;
(b) if c ∈ Zr then there is at most one such u, and hence at most one (1A) vertex (wrt c).
Proof. (a) The condition J(ū, c̄) = 0 means
= 1, and nullity of c̄ means
= 1. As
ui ∈ {−2,−1, 0, 1}, if the condition in (a) does not hold, then uici ≤ c
i for all i so we must have
equality for all i. Now ci = ui for all i with ci nonzero. As
ci = −1 and c 6= u (since c /∈ W by
definition), this means c is a type I vector and we are in the situation of Theorem 3.14.
(b) We see from the previous paragraph that we need uici > c
i for some i. If c ∈ Z
r this means
ci = −1 and ui = −2. The orthogonality condition is now
= 1 where uj = 1. As di 6= 1 we
see cj ≥ 0.
If cj = 0 then di = 2. If cj > 0 then di ≥ 3 so
, where the second inequality is due to
the nullity requirement 1
≤ 1. So cj = 1 or 2. Moreover, the latter implies (di, dj) = (3, 6)
and c = (−1i, 2j), which contradicts
ci = −1.
We see that either cj = 1 and (di, dj) = (4, 2) or (3, 3), or cj = 0 and di = 2. Cor 4.3 implies
that if there is more than one such u (say (−2i, 1j) and (−2i, 1k)) for a given i, then di = 4, so
(di, dj , dk) = (4, 2, 2), and (ci, cj , ck) = (−1, 1, 1), contradicting the nullity of c.
It now readily follows that the nullity condition prevents there being more than one u ∈ W
with ū ∈ c̄⊥ except when c = (−1,−1, 1, 0, · · · ) with d = (4, 4, 2, · · · ) or (3, 3, 3, · · · ) and u =
(−2, 0, 1, 0, · · · ), (0,−2, 1, 0, . . .). But in this case if both u occur then c ∈ conv(W), a contradic-
tion.
We shall study (1A) vertices for non-integral c in the next section. The following results will be
useful.
Proposition 4.5. Let v = (−2i, 1j) and w = (−2k, 1l) be elements of W such that v̄, w̄ ∈ c̄⊥.
Suppose that i ∈ Ŝ1 and {i, j} ∩ {k, l} = ∅. Then k ∈ Ŝ≥2 and (di, dk, dl) = (2, 4, 2).
Proof. By Remark 1.2(e) the affine subspace {x̄ : xi+xk = −2, xj +xl = 1}∩ c̄
⊥ meets conv(1
W)) in a face, whose possible elements are v,w, u = (−2k, 1j), y = (−1i, 1j ,−1k) and z = (−1i,−1k, 1l)
(since i ∈ Ŝ1).
As J(v̄, w̄) = 1
, we see from Thm 4.3 that vw is not an edge so z is present in the face. Now
Cor 4.3 on vz implies di = 2. Also, u must be present, otherwise y is present and Cor 4.3 on zw
and yw gives a contradiction. So k ∈ Ŝ≥2, and Cor 4.3 on uw implies dk = 4. Now considering zw
implies dl = 2.
Remark 4.6. This is similar to the proof of Prop 4.6 in [DW4]. But we cannot now deduce that
dj = 1 as the proof of this in [DW4] relied on the existence of t = (−1
i,−1j , 1k), and although we
know this is in W we do not know if t̄ lies in c̄⊥.
Proposition 4.7. If i ∈ Ŝ1 and v = (−2
i, 1j) gives an element of c̄⊥ then w = (−1i,−1j , 1k)
cannot give an element of c̄⊥.
Proof. This is similar to Prop. 4.3 in [DW4]. Since i ∈ Ŝ1, the vectors v̄, w̄ lie on an edge in the face
{x̄ : 2xi + xj = −3} ∩ c̄
⊥ of conv(1
(d+W)), and J(v̄, w̄) = 1
(1− 2
) 6= 0 since di 6= 1.
Corollary 4.8. With v as in Prop 4.7, there are no elements w = (−2j , 1k) with w̄ in c̄⊥.
CLASSIFICATION OF SUPERPOTENTIALS 15
Proof. This is similar to Prop 4.4 in [DW4]. If k = i, then the type I vector u := (−1i) = 1
(2v+w)
lies in W and ū ∈ c̄⊥. By Lemma 5.1 below, u = c, contradicting c /∈ W.
We can therefore take k 6= i. Now v̄, w̄ lie on an edge in the face {x̄ : 3xi +2xj = −4} ∩ c̄
⊥ (this
is a face by Prop 4.7 and the assumption i ∈ Ŝ1). But J(v̄, w̄) =
(1 + 2
) 6= 0.
5. Vectors orthogonal to a null vertex
In this section we analyse the possibilities for 1
(d+W)∩ c̄⊥. This will give us an understanding
of the vertices of type (1A).
We first dispose of the case of type I vectors.
Lemma 5.1. If u is a type I vector and ū ∈ c̄⊥ then c = u, so we are in the situation of Theorem
3.14.
Proof. Up to a permutation we may let u = (−1, 0, · · · ). The orthogonality condition implies
c1 = −d1. But nullity implies
c2i /di = 1, so d1 = 1 and ci = 0 for i > 1. (Note that in particular
u /∈ W.)
We shall therefore assume from now on there are no type I vectors giving points of c̄⊥.
Lemma 5.2. (i) Two type II vectors whose nonzero entries lie in the same set of three indices
cannot both give elements of c̄⊥.
(ii) Two type III vectors (−2i, 1j) and (1i,−2j) cannot both give elements in c̄⊥.
(iii) Three type III vectors whose nonzero entries all lie in the same set of three indices cannot
all give rise to elements in c̄⊥.
Proof. These all follow from Lemma 5.1 by exhibiting an affine combination of the given vectors
which is of type I.
Let u, v ∈ W be such that ū and v̄ ∈ c̄⊥. It follows that λū + (1 − λ)v̄ ∈ c̄⊥ for all λ. Hence
Remark 3.1 shows that for all λ
0 ≥ J(d+ λu+ (1− λ)v, d + λu+ (1− λv))
= J(λ(d+ u) + (1− λ)(d+ v), λ(d + u) + (1− λ)(d+ v))
= λ2(J(d + u, d+ u) + J(d+ v, d+ v)− 2J(d+ u, d+ v)) +
2λ(J(d + u, d+ v)− J(d+ v, d + v)) + J(d+ v, d+ v).
Equality occurs if and only if λu+ (1− λ)v = c, as c̄ is the only null vector in c̄⊥.
Multiplying by −1, using Eq.(1.3), and recalling that the minimum value of a quadratic αλ2 +
βλ+ γ with α > 0 is γ − (β2/4α), we deduce the following result.
Lemma 5.3. If u, v ∈ W and ū, v̄ ∈ c⊥ then
(5.1)
Moreover, equality occurs if and only if c = λu+ (1− λ)v for some λ.
Remark 5.4. By definition, c does not lie in conv(W). So in the case of equality in Eq.(5.1) we
cannot have 0 ≤ λ ≤ 1. This observation will in many cases show that equality cannot occur.
Remark 5.5. The right-hand side of Eq.(5.1) is maximised when
= 1 (i.e., J(ū, v̄) = 0).
In this case Eq.(5.1) just follows from
≥ 1, which is true for any two vectors in c̄⊥. If
J(ū, v̄) 6= 0, we get sharper information.
16 A. DANCER AND M. WANG
Corollary 5.6. Suppose that K is connected. If u, v are type II vectors in W with ū, v̄ ∈ c̄⊥ then
with equality if and only if c = λu+ (1 − λ)v for some λ, in which case all the di = 2 whenever i
is an index such that ui or vi is nonzero.
Proof. Writing X =
and Y =
we see that 1 ≤ X,Y ≤ 3
. The lower bound arises from
ū, v̄ being in c̄⊥, while the upper bound follows from Remark 1.2(d) and the assumption that u, v
are type II vectors.
Now X +Y −XY = 1− (1−X)(1−Y ) is minimised for X,Y in this range if X = Y = 3
, when
it takes the value 3
. The inequality Eq.(5.1) now gives the result.
When K is connected, it follows that any two such type II vectors must overlap. Moreover, if
they have only one common index then we are in the case of equality in Cor 5.6. The nullity of c̄
implies that λ = 1
in this case, contradicting Remark 5.4.
Combining this remark with Cor 5.6 and Lemma 5.2 (i) , we deduce the following result.
Corollary 5.7. Assume K is connected. If u, v are type II vectors in W with ū, v̄ ∈ c̄⊥, then either
u = (−1a,−1b, 1i), v = (−1a,−1b, 1j) or u = (1a,−1b,−1i), v = (1a,−1b,−1j).
Hence the collection of all such type II vectors is of the form, for some fixed a, b:
(i) (−1a,−1b, 1i) : i ∈ I for some set I; or
(ii) (1a,−1b,−1i) : i ∈ I for some set I; or
(iii) (1,−1,−1, 0, · · · ), (1,−1, 0,−1, · · · ), (1, 0,−1,−1, · · · ).
We now investigate type III vectors.
Lemma 5.8. Suppose K is connected. If u is a type II vector and v a type III vector in W with
ū, v̄ ∈ c̄⊥, then
Proof. With the notation of Cor 5.6 we have 1 ≤ X ≤ 3
and 1 ≤ Y ≤ 3. So X + Y − XY =
1− (1 −X)(1 − Y ) ≥ 0, and Eq.(5.1) gives the desired inequality. Also, the case of equality (i.e.,
X = 3
, Y = 3) leads to λ = 2
, again contradicting Remark 5.4.
Remark 5.9. While Cor 5.6 - Lemma 5.8 are stated under the assumption that K is connected,
the actual property we used is that in Remark 2.4. By contrast, the next two results do not require
this property.
Lemma 5.10. Any two type III vectors u, v giving elements of c̄⊥ must overlap.
Proof. Write u = (−2i, 1j) and v = (−2k, 1l). By Cor 4.3, if i, k ∈ Ŝ≥2 then di = dk = 4. Since
J(c̄, ū) = 0 we have (by Cauchy-Schwartz)
Hence
1 + dj
If u and v do not overlap, then the above and the analogous result from considering J(c̄, v̄) = 0,
together with the nullity of c̄, imply that dj = 1 = dl and the only nonzero components of c are
ci = ck = −1, cj = cl =
. But then c is the midpoint of uv, contradicting c /∈ conv(W).
So if u and v do not overlap, we can take i ∈ Ŝ1. Proposition 4.5 shows that k ∈ Ŝ≥2 and
(di, dk, dl) = (2, 4, 2). Hence 2 < X ≤ 3 and Y =
, so X + Y − XY ≥ 0 and
Non-overlap means that equality holds. But then λ = 1/3, contradicting Remark 5.4.
CLASSIFICATION OF SUPERPOTENTIALS 17
Lemma 5.10, together with Lemmas 5.2 and 4.8, implies the following corollary.
Corollary 5.11. The type III vectors associated to elements of 1
(d+W)∩ c̄⊥ are, up to permutation
of indices, either of the form
(a) (−21, 1i), i ∈ I, (with d1 = 4 if |I| ≥ 2), or
(b) (11,−2i), i ∈ I,
for some subset I ⊂ {2, · · · , r}.
Having found the possible configurations for type III vectors in c̄⊥, we start to analyse the type II
vectors for each such configuration. For the rest of this section we will assume that K is connected
(cf Remark 5.9).
Remark 5.12. Lemma 5.8 now shows that in case (a) of Cor 5.11, if |I| ≥ 2, then every type II
vector associated to an element of c̄⊥ must have “-1” in place 1. Similarly, in case (b), if |I| ≥ 3,
then every such type II vector has “1” in place 1. (So if a type II is present then d1 6= 1). If |I| = 2,
the only possible type II vectors with “0” in place 1 are (01,−12,−13, 1i) where i ≥ 4, and all type
II vectors whose first entry is nonzero actually must have first entry equal to 1.
Lemma 5.13. In case (a) of Cor. 5.11 with |I| ≥ 2 there are no type II vectors associated to
elements of c̄⊥.
Proof. Let v = (−21, 1k) and w = (−11, 1i,−1j) give elements of c̄⊥ with k 6= i, j. Consider the
face {x̄ : xi + xk = 1, x1 + xj = −2} ∩ c̄
⊥. Other than v,w the possible elements in this face come
from u = (−11, 1k,−1j) and s = (−21, 1i). As d1 = 4, J(v̄, w̄) 6= 0, so vw is not an edge and u
must be present. But J(ū, w̄) = 1
(1 − 1
) 6= 0 since d1 = 4, giving a contradiction. So k = i
or j for every such v,w.
Hence if such a w exists there are at most two type III vectors. Now if |I| = 2 and the type IIIs
are (−2, 1, 0, · · · ), (−2, 0, 1, · · · ), we cannot have w = (−1, 1,−1, · · · ) or (−1,−1, 1, · · · ) as then a
suitable affine combination of the above vectors give a type I vector. (cf Lemmas 5.1, 5.2). So in
fact no type II vectors give rise to elements of c̄⊥.
Lemma 5.14. The vectors v = (−2, 1, 0, · · · ) and w = (0, 1,−1,−1, 0, · · · ) are not both associated
to elements of c̄⊥, unless (0, 1,−2, 0, · · · ) or (0, 1, 0,−2, 0, · · · ) is also.
Proof. Suppose (0, 1,−2, 0, · · · ), (0, 1, 0,−2, 0, · · · ) are absent. Consider the face {x̄ : x2 = 1, x1 +
x3 + x4 = −2} ∩ c̄
⊥. The other possible elements of this face come from t = (−1, 1,−1, 0, · · · ) and
y = (−1, 1, 0,−1, 0, · · · ). Both these must be present, as J(v̄, w̄) 6= 0. Applying Cor 4.3 to wt, vt
and wy we obtain (d1, d2, d3, d4) = (4, 2, 2, 2).
Now we have equality (for y, t) in Eq.(5.1), as both sides equal 15/16. We find that λ = 1/2,
giving a contradiction again to Remark 5.4.
Combining this with Lemma 5.8 (and using Lemma 4.7) yields:
Corollary 5.15. If there is a unique type III vector u = (−21, 12) with ū in c̄⊥, then the type II
vectors associated to elements of c̄⊥ all have “-1” in place 1. Moreover (−11,−12, 1i) cannot be
present. Also, if (−11, 12,−1i) is present for some i ≥ 3 then (d1, d2) = (4, 2) or (3, 3) and the
index i is unique.
For the last assertion, observe that (−11, 12,−1i) and the type III vector are joined by an edge,
so Cor 4.3 shows the dimensions are as stated. If we have two such type II for i0 and i1 then
Eq.(5.1) implies di0 + di1 ≤ 4. Hence since K is connected, di0 = di1 = 2 and we have equality in
Eq.(5.1) with λ = 1
, giving a contradiction.
Lemma 5.16. Let the type III vectors be as in Cor 5.11(b), i.e., they are (11,−2a), a ∈ I. Assume
that |I| ≥ 2. If we have a type II vector w = (11,−1i,−1j) with w̄ in c̄⊥ then i, j ∈ I
18 A. DANCER AND M. WANG
Proof. Suppose for a contradiction that w = (11,−1i,−1j) is present (so d1 6= 1) and (1
1,−2j)
absent (i.e. j /∈ I). Since |I| ≥ 2, we can consider v = (11,−2k) where k ∈ I (so k 6= j) and k 6= i.
Consider the face {x̄ : x1 = 1, xi + xj + xk = −2} ∩ c̄
⊥. As well as v,w the possible elements of W
in the face giving elements of c̄⊥ are y = (11,−1i,−1k), t = (11,−1j ,−1k) and u = (11,−2i). As
d1 6= 1, vw is not an edge so t is present. Now Cor 4.3 applied to vt and tw gives d1 = dj = 2 and
dk = 4.
Moreover, if i ∈ I then u is present, so the edge wu gives di = 4. Thus we have shown that
da = 4 for all a ∈ I.
Now considering (11,−2a) and (11,−2b) with a, b ∈ I, we see that we have equality in Eq.(5.1)
(both sides equal 3
). In fact c is the average of these two vectors (i.e., λ = 1
), so as in Remark 5.4
we have a contradiction.
Lemma 5.17. Let the type III vectors be as in Cor 5.11(b), i.e., they are (11,−2a), a ∈ I. Assume
that |I| ≥ 3. Then d1 = 1.
Proof. Each pair v,w of type III vectors gives an edge, and if d1 6= 1, then we have J(v̄, w̄) > 0. By
Theorem 4.3 all the midpoint vectors (11,−1a,−1b) are present for a, b ∈ I. Now Prop 3.7 shows
that Fv̄ and Fw̄ have opposite signs, so we have a contradiction if |I| ≥ 3.
Putting together our results so far, we obtain a description of the possibilities for c̄⊥∩ 1
(d+W).
Theorem 5.18. Assume that r ≥ 3 and K is connected, and that we are not in the situation of Thm
3.14. Up to permutation of the irreducible summands, the following are the possible configurations
of vectors in W associated to elements of 1
(d+W) ∩ c̄⊥.
(1) {(−21, 1i), 2 ≤ i ≤ m} for fixed m ≥ 2. There are no type II vectors, and d1 = 4 if m ≥ 3.
(2) {(11,−2i), 2 ≤ i ≤ m} for fixed m ≥ 3 and d1 = 1. There are no type II vectors.
(3)(i) {(11,−22), (11,−23), (−12,−13, 1i), 4 ≤ i ≤ m} with d1 = 1, d2 = d3 = 2.
(ii) {(11,−12,−13), (11,−22), (11,−23), (−12,−13, 1i), 4 ≤ i ≤ m}, d1 6= 1, d2 = d3 = 2.
(4) {(1,−2, 0, 0, · · · ), (1, 0,−2, 0, · · · ), (1,−1,−1, 0, · · · )} with d1 6= 1.
(5) A unique type III (−2, 1, 0, · · · ). Possible type II vectors are
(i) (−1, 1,−1, 0, · · · ) with either (d1, d2) = (4, 2) or (3, 3); or
(ii) {(−11, 13,−1i), 4 ≤ i ≤ m} for fixed m ≤ r and with d1 = 2; or
(iii) {(−11,−13, 1i), 4 ≤ i ≤ m} for fixed m ≤ r and with d1 = 2.
(6) No type III vectors. Possible type II vectors are
(i) {(−11,−12, 1i), 3 ≤ i ≤ m} for fixed m ≤ r, with d1 = d2 = 2 if m ≥ 4; or
(ii) {(11,−12,−1i), 3 ≤ i ≤ m} for fixed m ≤ r, with d1 = d2 = 2 if m ≥ 4; or
(iii) {(11,−12,−13), (11,−12,−14), (11,−13,−14)} with d1 = d2 = d3 = d4 = 2.
Proof. Cor 5.11 gives the possibilities for the type III vectors in c̄⊥. If there are none then Cor 5.7
gives the possibilities in (6). If there is a unique type III vector, then Cor 5.15 and Cor 5.7 give us
the cases listed in (5) (or (1) with m = 2 if there are no type II). If we have two or more type III
vectors with −2 in the same place then Lemma 5.13 shows we are in case (1).
If we have more than two type III vectors with 1 in the same place a, then da = 1 by Lemma
5.17. Remark 5.12 then implies there are no type II vectors and we are in case (2).
If we have exactly two type III vectors with 1 in the same place, e.g., (1,−2, 0, · · · ) and
(1, 0,−2, 0, · · · ), then the proof of Lemma 5.17 shows that if the type II vector (1,−1,−1, 0, · · · )
is absent we must have d1 = 1. On the other hand, if d1 = 1 we are, by Remark 5.12 and the
connectedness of K, in case (2) or (3)(i). If d1 6= 1, then by the above, Remark 5.12, and Cor 5.7,
we are in case (3)(ii) or (4).
The statements about values of the di follow from straightforward applications of Cor 4.3 to the
obvious edges of conv(1
(d+W)) ∩ c̄⊥.
CLASSIFICATION OF SUPERPOTENTIALS 19
Remark 5.19. The possibilities in Theorem 5.18 can be somewhat sharpened. In cases (1), (2),
and (3), m cannot be r; in other words the maximum number of vectors is not allowed. This follows
easily from looking at the system of equations expressing the nullity of c̄, the orthogonality of the
vectors to c̄ and the fact that the entries of c sum up to −1. Similarly, r 6= 3 in (5)(i) and r 6= 4 in
(6)(iii).
When m ≥ 5 in (5)(ii) or (5)(iii), the segment joining two type II vectors is an edge, so Cor 4.3
gives d3 = 2.
6. Adjacent (1B) vertices
We now turn to (1B) vertices. Let ξ̄, ξ̄′ be adjacent (1B) vertices of ∆. Then there exist vertices
x̄, x̄′ of conv(1
(d+W)) such that c̄, ξ̄, x̄ are collinear and c̄, ξ̄′, x̄′ are collinear. Moreover, there exist
null vectors ā, ā′ such that x̄ = (ā+ c̄)/2 and x̄′ = (ā′+ c̄)/2. By Cor 3.4, there must be an element ȳ
of conv(1
(d+W)) on āā′, so P−1(ξ̄ξ̄′) contains the convex hull of x̄, x̄′, ȳ and hence is 2-dimensional.
As ξ̄ξ̄′ is by assumption an edge of ∆, P−1(ξ̄ξ̄′) is a 2-dimensional face of conv(1
(d+W)).
So we need to analyse the 2-dimensional faces of conv(W) containing vertices x, x′ such that
(6.1) x = (a+ c)/2, x′ = (a′ + c)/2, ā, ā′ null,
and such that c lies in the 2-dimensional plane defining this face. The lines through x, c (resp. x′, c)
only meet conv(W) at x (resp. x′).
Most 2-faces of conv(W) are triangular. We list below (up to permutation of components) all the
possible non-triangular faces. For further details regarding how this listing is arrived at, see [DW5].
We emphasize that only the full faces are being listed, i.e., configurations formed by all the possible
elements of W in a given 2-dimensional plane. As the set of weight vectors for a given principal
orbit may be a subset of the full set of possible weight vectors, these full faces may degenerate to
subfaces or even lower-dimensional faces (see Remark 6.2).
Listing convention: In the interest of economy and clarity, we make the convention that when we
list vectors in W belonging to a 2-face we will use the freedom of permuting the summands to
place nonzero components of the vectors first and we will only put down the minimum number of
components necessary to specify the vectors.
Hexagons: There are 3 possibilities.
(H1) This is the face in the plane {x1 + x2 + x3 = −1; xa = 0, for a > 3}. Points of W are
(−2i, 1j), (−1i, 1j ,−1k), (−1i) where i, j, k ∈ {1, 2, 3}. The type III vectors form the vertices of the
hexagon.
(H2) The plane here is {x1 + x2 = −1, x3 + x4 = 0, xi = 0 (i > 4)}. Points of W are vertices
u = (−2, 1, 0, 0), v = (1,−2, 0, 0), y = (−1, 0, 1,−1), y′ = (0,−1, 1,−1),
z = (−1, 0,−1, 1), z′ = (0,−1,−1, 1),
and the interior points
α = (−1, 0, 0, 0), β = (0,−1, 0, 0).
(H3) The plane is {x2 = −1, x1 + x3 + x4 = 0, xi = 0 (i > 4)}. Points of W are the vertices
u = (−1,−1, 1, 0), v = (0,−1, 1,−1), w = (1,−1, 0,−1),
x = (1,−1,−1, 0), y = (0,−1,−1, 1), z = (−1,−1, 0, 1)
and the centre
t = (0,−1, 0, 0).
Square: (S) with midpoint t = (0,−1, 0, 0, 0) and vertices
v = (−1,−1, 1, 0, 0), u = (0,−1, 0, 1,−1),
s = (0,−1, 0,−1, 1), w = (1,−1,−1, 0, 0).
20 A. DANCER AND M. WANG
Trapezia: We have vertices v, u, s, w, t with 2v − s = 2u − w and t = 1
(s + w), i.e., these are
symmetric trapezia. Below we list the possible v, u, s, w.
v u s w
(T1) (−2, 1, 0, 0) (−2, 0, 1, 0) (0, 0,−2, 1) (0,−2, 0, 1)
(T2) (−2, 0, 1, 0) (−2, 1, 0, 0) (0,−1, 1,−1) (0, 1,−1,−1)
(T3) (−1,−1, 0, 1) (0,−1,−1, 1) (−2, 1, 0, 0) (0, 1,−2, 0)
(T4) (0, 0, 1,−1,−1) (1, 0, 0,−1,−1) (−2, 1, 0, 0, 0) (0, 1,−2, 0, 0)
(T5) (−1, 0, 0, 1,−1) (0, 0,−1, 1,−1) (−2, 1, 0, 0, 0) (0, 1,−2, 0, 0)
(T6) (1,−1,−1, 0, 0) (1,−1, 0,−1, 0) (0, 0,−1, 1,−1) (0, 0, 1,−1,−1)
Table 4: Possible trapezoidal faces
Note that the configuration with vertices (−1,−1, 1, 0, 0), (−1,−1, 0, 1, 0), (0, 0, 1,−1,−1), and
(0, 0,−1, 1,−1) is equivalent to (T6) under the composition of a permutation and a J-isometric
involution.
Parallelograms: We have vertices v, u, s, w with v − u = s− w.
v u s w
(P1) (−2, 1, 0, 0) (−1, 0,−1, 1) (−2, 0, 1, 0) (−1,−1, 0, 1)
(P2) (−2, 1, 0, 0, 0) (−2, 0, 1, 0, 0) (0, 1, 0,−1,−1) (0, 0, 1,−1,−1)
(P3) (−2, 1, 0, 0, 0) (−2, 0, 1, 0, 0) (0, 0,−1,−1, 1) (0,−1, 0,−1, 1)
(P4) (−2, 1, 0, 0) (−1, 0, 1,−1) (−1,−1, 0, 1) (0,−2, 1, 0)
(P5) (−2, 1, 0, 0, 0) (−1, 0, 0, 1,−1) (−1, 0, 1,−1, 0) (0,−1, 1, 0,−1)
(P6) (−2, 1, 0, 0, 0) (−1, 0, 0,−1, 1) (−1, 0,−1, 1, 0) (0,−1,−1, 0, 1)
(P7) (−2, 1, 0, 0, 0) (0, 1,−1,−1, 0) (−1, 0, 0, 1,−1) (1, 0,−1, 0,−1)
(P8) (1,−1,−1, 0, 0, 0) (0, 0, 0, 1,−1,−1) (1,−1, 0,−1, 0, 0) (0, 0, 1, 0,−1,−1)
(P9) (1,−1,−1, 0, 0, 0) (1,−1, 0, 0,−1, 0) (0, 0,−1, 1, 0,−1) (0, 0, 0, 1,−1,−1)
(P10) (1,−1,−1, 0, 0, 0) (1, 0, 0, 0,−1,−1) (0,−1,−1, 1, 0, 0) (0, 0, 0, 1,−1,−1)
(P11) (0, 0, 1,−1,−1, 0) (0,−1, 0,−1, 0, 1) (1, 0, 0, 0,−1,−1) (1,−1,−1, 0, 0, 0)
(P12) (1, 0,−1, 0,−1) (1,−1,−1, 0, 0) (0, 0,−1, 1,−1) (0,−1,−1, 1, 0)
(P13) (−1, 0,−1, 0, 1) (0, 0,−1,−1, 1) (0,−1,−1, 1, 0) (1,−1,−1, 0, 0)
(P14) (−1, 0, 1, 0,−1) (−1,−1, 1, 0, 0) (0, 0,−1, 1,−1) (0,−1,−1, 1, 0)
(P15) (−1, 0, 1, 0,−1) (−1,−1, 1, 0, 0) (0, 0, 1,−1,−1) (0,−1, 1,−1, 0)
(P16) (−2, 1, 0, 0) (0, 1,−2, 0) (−1, 0, 1,−1) (1, 0,−1,−1)
(P17) (−2, 1, 0, 0) (0, 1, 0,−2) (−2, 0, 1, 0) (0, 0, 1,−2)
Table 5: Possible parallelogram faces
Remark 6.1. (P1), (P2), (P3), and (P17) are actually rectangles. (P16) also includes the midpoints
y = (u+ v)/2 = (−1, 1,−1, 0) and z = (s+ w)/2 = (0, 0, 0,−1). The rectangle (P17) also includes
the midpoints y = (u+ v)/2 = (−1, 1, 0,−1) and z = (s+ w)/2 = (−1, 0, 1,−1).
Remark 6.2. We must also consider subshapes of the above. Each symmetric trapezium contains
two parallelograms. The two rectangles with midpoints (P17), (P16) will contain asymmetric
trapezia. (P17) also contains parallelograms and squares. (For (P16), note that s is present iff w is.)
Furthermore, there are numerous subshapes of the hexagons. The regular hexagon (H3) contains
rectangles with midpoint (by omitting opposite pairs of vertices). Besides triangles, the hexagon
(H2) contains pentagons, rectangles and squares (with midpoints), and kite-shaped quadrilaterals
(e.g. y′uz′v). For (H1) see the discussion before Theorem 6.12. Finally, the triangle with midpoints
of all sides (where the vertices are the three type III vectors with 1 in the same place) contains a
trapezium (by omitting one vertex) and hence parallelograms.
Remark 6.3. We also note for future reference that there are examples where we can have four
or more coplanar elements of W but the plane cannot be a face. These examples are not of course
CLASSIFICATION OF SUPERPOTENTIALS 21
relevant to the case of adjacent (1B) vertices, but some will be relevant when we consider multiple
vertices of type (2). The examples which we will need in that context are the following three
trapezia
v u s w
(T ∗1) (0, 1,−1,−1) (1, 0,−1,−1) (−2, 1, 0, 0) (1,−2, 0, 0)
(T ∗2) (0,−1, 1,−1) (1,−1, 0,−1) (−2, 1, 0, 0) (0, 1,−2, 0)
(T ∗3) (1,−1,−1, 0) (1,−1, 0,−1) (−1, 0,−1, 1) (−1, 0, 1,−1)
Table 6: Further trapezia
In (T*2),(T*3), as in (T1)-(T7), we have 2v − s = 2u− w. In these examples t = 1
(s+w) may
also be present. In (T*1) we have s− w = 3(v − u), and the vectors t = (2s + w)/3 = (−1, 0, 0, 0)
and r = (s+ 2w)/3 = (0,−1, 0, 0) will also be present.
As an example, we explain why the trapezium (T*2) can never be a face. As u is present
in W, so are u′ = (−1, 1, 0,−1) and u′′ = (−1,−1, 0, 1). Now (2u′ + u′′)/3 = (2s + u)/3 =
(−1, 1
, 0,−1
) is in the plane, but u′ is not, so this plane cannot give a face. Similar arguments
involving (1, 0,−1,−1), (−1, 0, 0, 0) (resp. (−1, 0,−1, 1), (−1, 0, 1,−1)) show (T*1) (resp. (T*3))
cannot be faces.
These arguments also show several parallelograms cannot be faces, but these will not be relevant
for our purposes.
We now begin to classify the possible 2-faces which arise from adjacent (1B) vertices. We shall
repeatedly use Prop 1.4, Cor 3.4, and Lemma 3.5. Let E denote the affine 2-plane determined by
the 2-face being studied.
Theorem 6.4. Suppose we have adjacent (1B) vertices corresponding to a parallelogram face vusw
of conv(W). So we have ū = (ā+c̄)/2 and w̄ = (ā′+c̄)/2 for null ā, ā′. Suppose the vertices v, u, s, w
are the only elements of W in the face. Then u,w are adjacent vertices of the parallelogram, and
either
(i) C ∩ E = {c̄, ā, ā′, ē} where ē is null with v = (a+ e)/2 and s = (a′ + e)/2; or
(ii) v̄, s̄ ∈ C and J(ā, v̄) = J(ā′, s̄) = J(s̄, v̄) = 0.
Moreover, if none of v, u, s, w is type I, then (i) cannot occur.
Proof. We may introduce coordinates in the 2-plane E using the sides sv and sw to define the
coordinate axes. In this way we can speak of “left” or “right”, “up” or “down”. If we extend
the sides of the parallelogram to infinite lines, these lines divide the part of the plane outside the
parallelogram into 8 regions, and c̄ must be in the interior of one such region.
We first observe that if c̄ is in one of the four regions which only meet the parallelogram at a
vertex, then āā′ does not meet the parallelogram, contradicting Lemma 3.4.
(A) Let c̄ then lie in a region which meets the parallelogram in an edge. Without loss of generality
we may assume the edge is uw. By Cor 3.4, all elements of C ∩ E lie on or between the rays from
c̄ through ā, ā′. Hence, by Lemma 3.2, J(b̄, c̄) > 0 for all b̄ ∈ C \ {c̄}. If b̄ is a rightmost element
of (C ∩E) \ {c̄}, then as b̄+ c̄ cannot be written in another way as a sum of two elements of C, we
deduce from Prop 1.4 that b̄+ c̄ ∈ d+W. So b̄ is either ā or ā′. All other elements of C ∩ E lie to
the left of āā′. Note also that a rightmost element of (C ∩ E) \ {c̄, ā, ā′} satisfies b+ c = a+ a′, 2v
or 2s.
(B) Next let ē = 2v̄ − ā. Observe that as well as v̄ = (ā + ē)/2, we have s̄ = (ā′ + ē)/2, since
2v̄ − ā = 2(v̄ − ū) + c̄ = 2(s̄ − w̄) + c̄ = 2s̄− ā′.
If ē ∈ C, then it must be null, and the same argument as above shows that no elements of
(C ∩ E) \ {ē} lie to the left of ā, ā′, so we are in case (i). Now, Lemma 3.2 shows J(h̄, k̄) > 0 for
all h̄ 6= k̄ ∈ C ∩E. If v, u, s, w are all type II/III, we see that Fc̄, Fē are of one sign and Fā, Fā′ the
other sign. But now the contributions from ā+ ā′ and c̄+ ē in the superpotential equation cannot
cancel.
22 A. DANCER AND M. WANG
If ē /∈ C then, as in the argument before Theorem 3.8, s̄, v̄ ∈ C and we are in case (ii). Prop 3.7
shows v̄, s̄ are orthogonal. Moreover, note that the remark at the end of (A) shows that v + c or
s+ c is left of a+ a′.
Lemma 6.5. In case (ii) of Theorem 6.4, we have J(v̄, v̄) = J(s̄, s̄).
Proof. As c̄ and ā = 2ū− c̄ are both null, and similarly c̄ and ā′ = 2w̄− c̄ are both null, we deduce
(cf Remark 3.9)
(6.2) J(ū, ū) = J(ū, c̄) : J(w̄, w̄) = J(w̄, c̄).
We also have
(6.3) 2J(ū, v̄) = J(c̄, v̄) : 2J(w̄, s̄) = J(c̄, s̄)
from the orthogonality conditions on ā, v̄ and ā′, s̄.
Now J(s̄, s̄)− J(v̄, v̄) = J(s̄, s̄)− J(w̄− ū− s̄, w̄− ū− s̄), which, on expanding out and using the
second relations of Eqs.(6.2),(6.3), becomes J(2ū− c̄, w̄ − s̄)− J(ū, ū). Now
J(2ū− c̄, w̄ − s̄)− J(ū, ū) = J(2ū− c̄, ū− v̄)− J(ū, ū) = J(2ū− c̄, ū)− J(ū, ū) = J(ū− c̄, ū) = 0.
We have used the first relations of Eqs.(6.3), (6.2) in the second and fourth equalities.
Remark 6.6. We must also consider the case when the midpoint of one side or a pair of opposite
sides of the parallelogram face is in W. This can happen for (P16) and (P17). Note that v, u, s, w
are type II/III in these cases.
In fact, the argument of Theorem 6.4 is still valid if one or both of the midpoints of vu, sw is in
W and c lies in the region to the right of uw (or the left of vs).
Keeping c in the region to the right of uw, we now need to consider the case where one or both
of the midpoints of vs, uw is in W. The conclusions (in 6.4(ii)) still hold except that we no longer
have J(v̄, s̄) = 0.
However, we have to make slight modifications to the arguments as 1
(ā + ā′) may be in C ∩ E.
If ē ∈ C, then, as ā + ā′ is not in d + W, the usual sign argument shows that the terms in the
superpotential equation summing up to ā + ā′ do not cancel, which is a contradiction. So ē /∈ C
and our previous arguments hold except for the use of Prop 3.7.
Note that we also have to consider the possibility that a, a′, and e lie on the line through vs.
But now the midpoint of uw must be present and C ∩E = {c̄, ā, ā′, 1
(ā+ ā′)}, with v + s = a+ a′.
The usual sign argument then forces the midpoints of uw and vs to be present and of type I. Hence
this special configuration cannot occur in (P16) or (P17).
Lastly, since the proof of Lemma 6.5 makes no mention of midpoints, it remains valid if midpoints
are present.
The conditions of Theorem 6.4 and Lemma 6.5, together with the nullity of ā, ā′, c̄, put very strong
constraints on vusw and the dimensions. In fact, one can check that these constraints cannot be
satisfied for any of our parallelograms (including those of Remark 6.2) with one exception. This is
the rectangle yy′z′z in (H2) with c = (−2, 1, 0, · · · ) and 1
, which will be dealt with in
Lemma 8.5. We now give an example of how to apply the above conditions in a specific case.
Example 6.7. Consider parallelogram (P8). The equation of the 2-plane E containing the paral-
lelogram is
(6.4) x2 = −x1, x5 = x6, x2 + x5 = −1, x1 + · · · + x6 = −1
and xi = 0 for i > 6. As all vertices are type II/III, we must be in case (ii) of Theorem 6.4.
(A) Take c to face the side uw. Note that vs and uw have equation x1 = 1, x1 = 0 respectively,
so c1 < 0. Also, the remarks at the end of parts (A) and (B) in the proof of Theorem 6.4 shows
that c1 > −
, as v + c or s+ c is left of a+ a′ so 1 + c1 > −2c1
CLASSIFICATION OF SUPERPOTENTIALS 23
The condition J(v̄, s̄) = 0 implies d1 = d2 = 2 and Lemma 6.5 implies d3 = d4. Eqs.(6.2) and
(6.3) give four linear equations in ci. Now d3 = d4 and Eq.(6.3) show c3 = c4, so the equations for
the plane give c = (1
− c4, −
+ c4, c4, c4,−
− c4, −
− c4). Next d1 = d2 = 2 and Eq.(6.3) show
c4 = 3d4/(2d4 + 2) and c1 = (1− 2d4)/(2d4 + 2).
But the condition −1
< c1 < 0 now implies d3 = d4 = 1, and it follows that c cannot be null.
(B) The argument if c faces vs is very similar. We have d3 = d4 and d5 = d6 = 2, and the
orthogonality equations imply c3 = c4. So c has the same form as in the second paragraph of (A)
above. We find c4 = −3d4/(2d4+2)) and c1 = (1+4d4)/(2d4 +2). But we now have the inequality
1 < c1 <
, so again d3 = d4 = 1, violating nullity.
(C) If c faces vu or sw then we need J(s̄, w̄) = 0 (resp. J(v̄, ū) = 0), which is impossible.
Example 6.8. The example of the square (S) with midpoint can be treated in essentially the same
way as the parallelograms. By symmetry, we may assume that c lies in the region that intersects
uw. However, because 1
(a + a′) may now be the midpoint and hence in W, the configuration
of Theorem 6.4(i) can occur, even though all vertices are type II. We have C ∩ E = {c̄, ā, ā′, ē}
with a = (−1,−1, 1, 1,−1, · · · ), a′ = (1,−1,−1,−1, 1, . . .), c = (1,−1,−1, 1,−1, · · · ), and e =
(−1,−1, 1,−1, 1, · · · ) with nullity condition
= 1. We will be able to rule this case out in
§7. On the other hand, the configuration of Theorem 6.4(ii) cannot occur, as one easily checks.
Next assume that adjacent (1B) vertices in ∆c̄ determine a trapezium vusw as shown in the
diagram below:
II III
VVIVII
where t is the midpoint of sw and vu is parallel to sw. We assume that v, u, s, w ∈ W but our
conclusions hold whether or not t lies in W. We will now derive constraints on the 2-face and E∩C
resulting from having c lie in one of the regions shown above. For theoretical considerations, we
need only treat the cases where c lies in regions I to VI. In practice, for an asymmetric trapezium,
we must consider c lying in the remaining regions as well. In the following we will adopt the
convention that ā, ā′ always denote null vectors in C.
(I) c in region I: This is impossible because then s̄ = 1
(c̄ + ā) and w̄ = 1
(c̄ + ā′) for some ā, ā′,
and so āā′ would not intersect conv(1
(d+W), a contradiction to Cor 3.4.
(II) c in region II: Then v̄ = 1
(c̄+ ā), ū = 1
(c̄+ ā′) for some ā, ā′. We get a contradiction to Cor
3.4 if ā, ā′ lie below the line sw. They also cannot lie on the line sw since the argument in (A) in
the proof of Theorem 6.4 and Cor 3.4 imply that C ∩ E = {c̄, ā, ā′}, and the terms corresponding
to s̄, w̄ in the superpotential equation would be unaccounted for.
24 A. DANCER AND M. WANG
Let e = 2s − a, e′ = 2w − a′. These points lie in region VI, and since we have a trapezium,
e 6= e′. We may now apply Theorem 3.8 to ā and ā′ to obtain the possibilities:
(i) s̄, w̄ ∈ C; J(ā, s) = 0 = J(ā′, w̄),
(ii) s̄ ∈ C, J(ā, s̄) = 0; w /∈ C, ē′ ∈ C is null, J(ē′, s̄) = 0,
(iii) w̄ ∈ C, J(w̄, ā′) = 0; s̄ /∈ C, ē ∈ C is null, J(ē, w̄) = 0.
Note that the last condition in (ii) (resp. (iii)) results from applying Theorem 3.8 to ē′ (resp. ē).
(III) c in region III: We have v̄ = 1
(c̄+ ā), w̄ = 1
(c̄+ ā′) for some ā, ā′ lying respectively in regions
VIII and VI (in view of Cor 3.4). Applying Theorem 3.8 we obtain the possibilities:
(i) s̄ ∈ C, J(ā, s̄) = 0 = J(ā′, s̄),
(ii) s̄ /∈ C, 2s̄ = ā+ ā′ (which implies c+ s = v + w).
(IV) c in region IV: We have ū = 1
(c̄+ ā), w̄ = 1
(c̄+ ā′) for some ā, ā′ ∈ C ∩E.
If a lies in region IX, then Cor 3.4 implies that ā′ lies in region VI. Applying Theorem 3.8 to ā
and ā′ we obtain the possibilities:
(i) s̄ ∈ C, J(ā, s̄) = 0 = J(ā′, s̄),
(ii) 2s = a+ a′, i.e., c+ s = u+ w.
If a lies on the line sv, then we may apply Theorem 3.8 to ā′. We cannot have 2s̄ = ā′ + ē′ with
ē′ ∈ C and null, otherwise āē′ would not intersect conv(1
(d+W)). So we have
(iii) s̄ ∈ C and J(s̄, ā′) = 0.
If a lies in region II, then ā′ lies in region VI. Let ē = 2v̄ − ā and ē′ = 2s̄ − ā′. As we have a
trapezium, ē 6= ē′. Now e lies in region VII or VIII while e′ lies in region VIII or IX, so by Cor 3.4
ē and ē′ cannot both lie in C and hence be null. Theorem 3.8 now gives the possibilities:
(iv) v̄, s̄ ∈ C, J(ā, v̄) = 0 = J(ā′, s̄), (and by Prop 3.7 J(v̄, s̄) = 0),
(v) v̄ ∈ C, J(ā, v̄) = 0, ē′ ∈ C is null, and J(ē′, v̄) = 0,
(vi) s̄ ∈ C, J(ā′, s̄) = 0, ē ∈ C is null, and J(ē, s̄) = 0.
(V) c in region V: We have ū = 1
(c̄+ ā′), s̄ = 1
(c̄+ ā) for some ā, ā′ lying respectively in regions
VIII and II (by Cor 3.4). Theorem 3.8 now gives the possibilities:
(i) v̄ ∈ C, J(ā, v̄) = 0 = J(ā′, v̄),
(ii) v̄ /∈ C, 2v̄ = ā+ ā′ (which implies c+ v = u+ s).
(VI) c in region VI: We have s̄ = 1
(c̄ + ā), w̄ = 1
(c̄ + ā′) for some ā, ā′ lying respectively in
regions VIII and IV (by Cor 3.4). (To rule out ā, ā′ lying in the line vu, we proceed as in case (II),
except that when t ∈ W, we conclude instead that C ∩E = {c̄, ā, ā′, 1
(ā+ ā′)}. One can still check
that v̄, ū cannot be both accounted for.) Now let ē = 2v̄ − ā and ē′ = 2ū − ā′. Again, having a
trapezium means ē 6= ē′ and Theorem 3.8 now gives the possibilities:
(i) ū, v̄ ∈ C, J(ā, v̄) = 0 = J(ā′, ū), (and J(ū, v̄) = 0 by Prop 3.7),
(ii) v̄ ∈ C, J(ā, v̄) = 0, ū /∈ C, ē′ ∈ C is null,
(iii) ū ∈ C, J(ā′, ū) = 0, v̄ /∈ C, ē ∈ C is null.
Remark 6.9. We mention a useful inequality which holds in (II) and (VI) above, as well as in
parallelogram faces with the same configuration (cf Example 6.7(A)).
Let us consider (II), where we choose in E coordinates such that the first coordinate axis is
parallel to s̄w̄ (assumed to be horizontal) and the second coordinate axis is arbitrary, with the
second coordinate increasing as we go up. As in (A) in the proof of Theorem 6.4, all points in
(C ∩ E) \ {c̄, ā, ā′} must lie below the line āā′. Let b̄ be a point among these with largest second
coordinate. Since we have seen above that either s̄ or w̄ lies in C∩E, we have s2 ≤ b2. Furthermore,
as b̄ + c̄ cannot lie in d +W it must be balanced by sums of elements in C ∩ E, with the limiting
configuration given by ā + ā′. So we have 1
(b2 + c2) ≤ a2 = a
2 = 2v2 − c2. Combining the two
inequalities we get 3c2 ≤ 4v2 − s2.
Equality in the above holds iff b̄ lies in s̄w̄ and b̄ + c̄ = ā + ā′. In particular, b̄ is unique, so in
II(i), the inequality above is strict.
CLASSIFICATION OF SUPERPOTENTIALS 25
Note that we only need v̄ū and s̄w̄ to be parallel and the presence or absence of t in W is
immaterial. Hence in Theorem 6.4(ii) we also have an analogous strict inequality, which we have
already used, e.g., in (B) of Example 6.7. (For a parallelogram, there may be midpoints on the
pair of non-horizontal sides lying in 1
(d+W), but 1
(b̄+ c̄) can never equal these midpoints, so we
still get the inequality we want.)
For the configuration in (VI), we still have an analogous inequality, but since 1
(ā + ā′) ∈ C, we
lose uniqueness of b̄ and hence the strict inequality.
We will also have occasion to apply the above analysis to appropriate trapezoidal regions in
hexagon (H3).
The method described above together with Remark 6.9 can now be used to rule out the trapezia
(T1)-(T6) as well as those mentioned in Remark 6.2.
Example 6.10. For the trapezium (T3), the vectors v, u, s, w are given in Table 4, and lie in the
2-plane {x1 + x2 + x3 + x4 = −1, x2 +2x4 = 1}. vu is given by x4 = 1 while sw is given by x4 = 0.
sv is given by x3 = 0 and wu is given by x1 = 0. The vector c that we are looking for has the
form (−c3 + c4 − 2, 1 − 2c4, c3, c4). Since the trapezium is symmetric, an explicit symmetry being
induced by interchanging x1 and x3, we need only consider c lying in regions II-VI.
(A) If c lies in region III, then c1 > 0, c4 > 1. Since a = 2v − c, we obtain a = (c3 − c4,−3 +
2c4,−c3, 2 − c4). Similarly, a
′ = (c3 − c4 + 2, 1 + 2c4,−4 − c3,−c4). If we are in case (ii), then
c = v + w − s = (1,−1,−2, 1), which violates c4 > 1. So we must be in case (i).
It follows from J(ā, s̄) = 0 = J(ā′, s̄) that d1+3 = −2c3+4c4 and J(w̄, s̄) = J(v̄, s̄). The second
equality implies that d1 = d2. Using this together with the first equality and the null condition
for ā′ (in the form J(w̄, w̄) = J(w̄, c̄), see Remark 3.9) we get c4 = d1(d1 − 1)/(4d1 + 2d3). Since
c4 > 1, we have d1(d1 − 5) > 2d3, so d1 > 5. But by Remark 3.1, J(s̄, s̄) < 0, which gives d1 < 5
(since d1 = d2), a contradiction.
(B) Let c lie in region IV, so that c1 > 0, 0 < c4 < 1. We obtain a = (2 + c3 − c4, 2c4 −
3,−2 − c3, 2 − c4) and a
′ = (2 + c3 − c4, 1 + 2c4,−4 − c3,−c4). We claim that a3 > 0, so that
a lies in region IX. To see this, we solve for c3, c4 using the null conditions J(ū, ū) = J(c̄, ū) and
J(w̄, w̄) = J(c̄, w̄) for ā, ā′ respectively. We obtain c4 = (d2d3 + 2d3d4 − d2d4)/(d3(d2 + 3d4)) and
a3 = −2− c3 = (d2d3 + 2d3d4 − d2d4)/(d2(d2 + 3d4)). Since c4 > 0 we obtain our claim.
Since a lies in region IX, we first check if (ii) holds. In this case, c = (2,−1,−3, 1) which
contradicts c4 < 1. The equations in (i) together imply the contradiction 0 = −4/d2.
(C) Suppose c lies in region V, so that c1 > 0, c4 < 0. We obtain a = (c3−c4−2, 1+2c4,−c3,−c4)
and a′ = (c3−c4+2, 2c4−3,−2−c3, 2−c4). If (ii) holds then c = (−1, 1,−1, 0) and this contradicts
c1 > 0. Hence (i) must hold.
By Remark 3.9, the null condition for ā is J(s̄, s̄) = J(s̄, c̄), which is c3
− ( 1
)c4 = 0. The
two equations in (i) imply J(ū, v̄) = J(s̄, v̄) and J(v̄, v̄) < 0, which in turn give d1 = 2. Using
this, the null condition for ā, and J(ā′, v̄) = 0 we obtain c4 =
and c3 = −
d2(d2+1)
. But
c1 = c4 − c3 − 2 > 0, which simplifies to 1 > d2(d2 + 1), a contradiction.
(D) Let c lie now in region II. Then c1 < 0, c3 < 0, 1 < c4 ≤
where the last upper
bound comes from the inequality in Remark 6.9. We obtain a = (c3 − c4, 2c4 − 3,−c3, 2 − c4)
and a′ = (2 + c3 − c4, 2c4 − 3,−2− c3, 2− c4). The null conditions for ā, ā
′ then give
c3 = −
2d1d4 + d1d2 − d2d4
(d1 + d3)(d2 + 2d4)− d2d4
, c4 =
(d1 + d3)(d2 + 2d4)
(d1 + d3)(d2 + 2d4)− d2d4
Suppose we are in case (i). The two equations and the above values of c3, c4 combine to give
(d1 − d3)((d1 + d3)(d2 + 2d4) − d2d4) = 0. However, the upper bound c4 ≤
translates into
(d1 + d3)(d2 + 2d4) ≥ 4d2d4. So the second factor is positive and we have d1 = d3. Putting this
26 A. DANCER AND M. WANG
information into the equation J(ā, s̄) = 0, we get
d1d2d4(d2 + 15) = 2d
1d2(d2 + 1)− 6d1d
2 + 2d4(2d
1d2 + 2d
1 + d
By Remark 3.1 we also have J(s̄, s̄) < 0, i.e., 1 < 4
, so either d2 = 1 or d1 < 8. Substituting
these values into the equation above and using c4 ≤
we obtain in each instance a contradiction.
If we are in case (ii), then by adding the equations J(ā, s̄) = 0 and 2J(w̄, s̄) − J(ā′, s̄) = 0
(equivalent to J(ē′, s̄) = 0), we obtain 1 = 2
. Hence (d1, d2) = (4, 2) or (3, 3). One then
checks that these values are incompatible with the null condition for ē′, J(ā, s̄) = 0, and the bound
c3 < 0.
An analogous argument works to eliminate case (iii), where we now need the bound c4 ≤
instead.
(E) Lastly suppose c lies in region VI, so c1, c3 < 0 and −
≤ c4 < 0, where the lower bound for
c4 results from Remark 6.9. We have a = (c3 − c4 − 2, 1 + 2c4,−c3,−c4) and a
′ = (2 + c3 − c4, 1 +
2c4,−4− c3,−c4). Using the null conditions for ā, ā
′, we obtain
c1 = −
2(d2 + d3)
d1 + d2 + d3
, c2 =
d1 + 5d2 + d3
d1 + d2 + d3
, c3 =
−2(d1 + d2)
d1 + d2 + d3
, c4 =
d1 + d2 + d3
If we are in case (i), J(ū, v̄) = 0 gives d2 = d4 = 2. The other two equations and the above
values of c3, c4 then give 3(d1 + d3 + 2)(d1 + d3 − 4) = 4(3d1 + 3d3 − 2). The lower bound −
becomes d1 + d2 + d3 ≥ 6d2. Using this inequality in the above Diophantine relation leads to a
contradiction. (Alternatively, observe the relation is a quadratic in d1 + d3 with no rational roots).
For case (ii), using the two equations and the above values for c3, c4, we arrive at the relation
(d1 + d2 + d3)((d1 − 5)d2d4 + d1d4 + d2d3 + 2d3d4) = 2d2(d1d2 + 2d1d4 − d2d4 + d2d3 + 2d3d4).
Using the lower bound −1
≤ c4 in the above relation we see that d1 ≤ 3. By direct substitution, we
further obtain d1 6= 3. Finally, if d1 = 2, the null condition for c̄ gives 1 >
c21 and so d2 + d3 ≤ 4.
The lower bound on c4 now implies d2 = 1. Since c2 > 1, the null condition for c̄ is violated.
Case (iii) reduces to case (ii) upon interchanging the first and third summands. Therefore, the
trapezium (T3) has been eliminated.
We discuss next the hexagons (H1)-(H3). As the three cases are similar, we will focus on (H3)
and refer to the following (schematic) diagram:
IVIV′
VII′ VII
Example 6.11. The hexagon (H3) lies in the 2-plane given by {x2 = −1, x1 + x3 + x4 = 0}. So c
has the form (−c3 − c4,−1, c3, c4). The lines vw and zy are given respectively by x1 + x3 = 1 and
CLASSIFICATION OF SUPERPOTENTIALS 27
x1 + x3 = −1. Similarly, the lines uv and yx are given by x3 = 1 and x3 = −1 respectively. The
lines uz and wx are given by x1 = −1 and x1 = 1 respectively.
Interchanging x1 and x3 induces the reflection about the perpendicular bisector of vw, while
(x1, x2, x3, x4) 7→ (−x3, x2,−x1,−x4) induces the reflection about ux. These symmetries reduce
our consideration to those c lying in regions I-VI. Moreover, (H3) is actually a regular hexagon.
The symmetry (x1, x2, x3, x4) 7→ (−x4, x2,−x3,−x1) induces the reflection about zw, which swaps
region II with region IV and region I with region VI. Finally, the symmetry (x1, x2, x3, x4) 7→
(−x3, x2,−x4,−x1) induces the rotation in E about t taking x to w, and maps region V to region
III. Therefore, we need only consider c lying in regions I, II, and V.
In the discussion below we again adopt the convention that ā, ā′ always denote null vectors in C.
If c lies in region I, then ū = 1
(c̄+ ā), x̄ = 1
(c̄+ ā′) for some ā, ā′, and we immediately see that
āā′ cannot meet conv(1
(d+W)), a contradiction to Cor 3.4.
c lying in region II:
We have c1, c3, < 1 and c1 + c3 > 1. The assumption of adjacent (1B) vertices means that
v̄ = 1
(c̄+ ā) and w̄ = 1
(c̄+ ā′) for some ā, ā′ ∈ E ∩ C. Hence a = (c3 + c4,−1, 2− c3,−2− c4) and
a′ = (2 + c3 + c4,−1,−c3,−2− c4). One checks easily that ā
′ lies in region IV and ā lies in region
IV′. Moreover, the null conditions for these vectors yield
d3 + d4
d1 + d3 + d4
, c3 =
d1 + d4
d1 + d3 + d4
, c4 = −
d1 + d3 + 2d4
d1 + d3 + d4
Let e := 2u − a and e′ := 2x − a′. These lie respectively in regions VII′ and VII. We can now
apply Theorem 3.8 to ā and ā′ to obtain the following possibilities:
(i) ū, x̄ ∈ C and J(ā, ū) = 0 = J(ā′, x̄);
(ii) ū ∈ C, J(ā, ū) = 0, x̄ /∈ C, ē′ ∈ C is null;
(iii) x̄ ∈ C, J(ā′, x̄) = 0, ū /∈ C, ē ∈ C is null;
(iv) ū, x̄ /∈ C, ē, ē′ are both null.
We can eliminate (i)-(iii) by noting that the two equations in each case together with the values
of c3, c4 above imply that 1 =
. Using this relation (and the values of c3, c4) in the null
condition for c̄ then leads to a contradiction.
For case (iv) we can again apply Theorem 3.8 to the null vertices ē and ē′. The conditions
J(ē, z̄) = 0 and J(ē′, ȳ) = 0 lead, as above, to 1 = 1
and 1 = 1
respectively.
Using this in the null condition for c̄ again leads to a contradiction. Hence z̄, ȳ /∈ C and q̄ := 2z̄− ē
and q̄′ := 2ȳ− ē′ are null vectors in E ∩C. In fact we now find that q = q′, so caeqe′a′ is a hexagon
circumscribing (H3).
Let us consider the pair of null vertices c̄, q̄. We apply the argument in (A) of the proof of Theorem
6.4 to the wedge with vertex c̄ bounded by the rays c̄ā and c̄ā′. All elements of (C ∩ E) \ {ā, ā′, c̄}
lie below the line āā′. Let b̄ be a highest (with respect to x1 + x3) element among these. Since
ē ∈ C, b1+ b3 > −1 and so c̄+ b̄ cannot equal 2ū, 2t̄, 2x̄. Hence c̄+ b̄ = ā+ ā
′, and we compute that
b1 + b3 =
d1+d3−2d4
d1+d3+d4
. The analogous argument applied to the wedge bounded by the rays q̄ē and q̄ē′
gives a lowest element b̄′ of (C ∩E) \ {q̄, ē, ē′} satisfying b̄′ + q̄ = ē+ ē′ and b′1 + b
2d4−d1−d3
d1+d3+d4
avoid a contradiction, we must have d1 + d3 ≥ 2d4.
We can repeat the above argument with the null vertex pairs {ē, ā′} and {ē′, ā}, obtaining the
inequalities d3 + d4 ≥ 2d1 and d1 + d4 ≥ 2d3 respectively. The three inequalities then imply that
in fact d1 = d3 = d4 and c = (
,−1, 2
). Furthermore, C ∩ E = {ā, ā′, c̄, ē, ē′, t̄, q̄} and the null
condition for c̄ gives (d1, d2) = (3, 9) or (4, 3).
By looking at the terms in the superpotential equation corresponding to the vertices (all of
type II), we find that the coefficients Fc̄, Fē, Fē′ have the same sign, which is opposite to that of
Fā, Fā′ , Fq̄. Next we note that the only ways to write d+(
,−1, 1
) (resp. d+(−1
,−1, 2
as a sum of element of C are t̄+ c̄ = ā+ ā′ (resp. t̄+ ā = c̄+ ē). The superpotential equation then
28 A. DANCER AND M. WANG
gives FāFā′J(ā, ā
′)+Ft̄Fc̄J(t̄, c̄) = 0 and Fc̄FēJ(c̄, ē)+Ft̄FāJ(t̄, ā) = 0. Since J(ā, ā
′), J(c̄, ē), J(t̄, c̄)
and J(t̄, ā) are all positive, the above equations and facts imply that Fc̄ and Fā have the same sign,
a contradiction.
So c cannot lie in region II.
c lying in region V:
We have c3 < −1 < −c4 < 1 < c1. The adjacent (1B) vertices assumption implies that
w̄ = 1
(c̄+ā) and ȳ = 1
(c̄+ā′) for some ā, ā′ ∈ C∩E. It follows that a = (2+c3+c4,−1,−c3,−2−c4)
and a′ = (c3 + c4,−1,−2− c3, 2− c4). The null conditions on these vectors give
(2d1 + d4)(d3 + d4)
d4(d1 + d3 + d4)
, c3 = −
2d3 + d4
d1 + d3 + d4
, c4 =
d3 − d1
d1 + d3 + d4
Since a3 = −c3 > 1, a lies above the line uv. Also, a
1 = c3 + c4 = −c1 < −1, so a
′ lies below the
line uz. We can therefore apply Theorem 3.8 to ā and ā′ to get the following possibilities:
(i) ū ∈ C, J(ā, ū) = 0 = J(ā′, ū),
(ii) ū /∈ C, e := 2u− a, e′ := 2u− a′ lie in C ∩ E and are null.
If (i) occurs, then the two orthogonality conditions imply that d1 = d3, so c4 = 0, c1 = −c3 =
1 + d1
. Substituting these values of ci into J(ā
′, ū) = 0 gives 1 = 2
. But the null condition
for c̄ is 1 = 1
(1 + d1
)2 > 1
= 1, which is a contradiction.
Hence (ii) must occur. Note that if the above diagram is rotated so that the lines x1 + x3 = κ
(for arbitrary constants κ) are horizontal, then the lines x1 − x3 = κ would be vertical. u is the
only point in the hexagon lying on x1 − x3 = −2. Observe that a1 − a3 = a
1 − a
3 ≥ −2, otherwise
āā′ would not intersect conv(1
(d+W)), which contradicts Cor 3.4. If, however, a1−a3 > −2, then
ēē′ would not intersect conv(1
(d+W)). So in fact a = e′, e = a′ and u all lie on x1 − x3 = −2. In
other words, the hexagon is circumscribed by the triangle caa′ with intersections at w, u and y.
It follows easily from the above that c = (2,−1,−2, 0), d1 = d3 = d4, and the null condition for
c̄ is 1 = 8
. Also, we have C ∩E = {c̄, ā, ā′, t̄}. Since w, u, y are type II, by Lemma 3.2, we see
that the signs of Fā, Fc̄, and Fā′ in the superpotential equation cannot be chosen compatibly. We
have thus shown that the hexagon (H3) cannot occur.
The hexagon (H2) is not regular, but has reflection symmetry about uv and the perpendicular
bisector of yy′. It can be eliminated by similar arguments, but we now have to consider c lying in
regions III and IV as well. The hexagon (H1) can also be eliminated by the above methods. Here
the hexagon is invariant under the symmetric group permuting the coordinates x1, x2, x3. Together
with Cor 3.4, this fact reduces our consideration to those c lying in three of the regions formed by
extending the sides of the hexagon.
As mentioned in Remark 6.2, we also need to rule out subshapes of the hexagons. For (H2) and
(H3) the methods used above can also be applied to rule out all the sub-parallelograms and trapezia
except the rectangle yy′z′z of (H2) (see Lemma 8.5 and the discussion immediately before Ex 6.7).
All sub-triangles will be dealt with at the end of this section. (There is a triangle with midpoint
in (H2) but that can be dealt with by similar methods.) For (H2) this leaves the pentagon yy′vz′z
and the kite y′uz′v, both of which can still be eliminated using the above methods.
The possible subshapes of (H1) are rather numerous. However, if r ≥ 4 we will be able to
eliminate all of them in Lemma 8.6. Without this assumption, the above methods can be used to
eliminate those subshapes which do not contain all three type I vectors. Of course the following
discussion will handle the sub-triangles.
Lastly, we consider triangular faces.
Theorem 6.12. Suppose we have adjacent (1B) vertices in ∆c̄ corresponding to a triangular face
x̄x̄′x̄′′ of conv(1
(d +W)). Let E be the affine 2-plane determined by the triangular face. So there
are null vectors ā, ā′ in C ∩ E such that x = 1
(a+ c), x′ = 1
(a′ + c).
CLASSIFICATION OF SUPERPOTENTIALS 29
Suppose the vertices of the triangle are the only elements of W in the face. Then we are in one
of the following two situations:
(i) C ∩ E = {c̄, ā, ā′, x̄′′}, with c+ x′′ = a+ a′ and J(x̄′′, ā) = J(x̄′′, ā′) = 0;
(ii) C ∩ E = {c̄, ā, ā′} where 1
(a + a′) = x′′, one of x, x′, x′′ is type I, and the others are either
both type I or both type II/III.
Proof.
PPPPPPPPPPPPP
(i) ❝✟
❵❵❵❵❵❵❵❵
(A) We may introduce coordinates in E so that x̄x̄′ is vertical and to the right of c̄. As āā′ must
meet conv(1
(d + W)), we see x̄′′ is on or to the right of āā′. Let b̄ be any leftmost point of
(C ∩E) \ {c̄}. As in Theorem 6.4, we see that b̄+ c̄ ∈ d+W, so all elements of C ∩E except c̄, ā, ā′
are to the right of āā′.
(B) Considering āx̄′′ and ā′x̄′′ we see (using Theorem 3.8 and Cor 3.4) that either
(1) x̄′′ ∈ C and J(x̄′′, ā) = 0 = J(x̄′′, ā′), or
(2) x̄′′ /∈ C and x′′ = 1
(a+ a′).
In case (1), (x̄′′)⊥ ∩E is the line through āā′. By Prop. 3.3 and Cor. 3.4, observe that all elements
of (C∩E)\{x̄′′} are left of x̄′′. Let b̄ be a rightmost element of (C∩E)\{x̄′′}. So either J(b̄, x̄′′) = 0
or b̄+ x̄′′ ∈ d+W. Since b̄ is not to the left of āā′, the second alternative cannot hold and so b̄ must
lie on āā′. Combining this with our results in (A), we see C ∩ E is as in (i). Also, as J(ā, ā′) > 0
and ā+ ā′ /∈ d+W, we see a+ a′ must equal c+ x′′.
In case (2), by Cor 3.4 there are no elements of C ∩E right of āā′. Hence C ∩E is as in (ii). Now
J(b̄, ē) > 0 for all b̄ 6= ē in C ∩ E, so the last statement of (ii) follows.
Remark 6.13. We must also consider the case when some midpoints of the sides of our trian-
gular face lie in W. (This could happen if two vertices were (1,−1,−1, · · · ), (−1, 1,−1, · · · ) or
(1,−2, · · · ), (1, 0,−2, · · · ) or (1,−2, · · · ), (−1, 0, · · · ).) Let us denote the midpoints of xx′, xx′′ and
x′x′′ respectively by z, y, t.
If z is absent, the arguments of (A) in the proof of Theorem 6.12 still hold, so we have the
alternatives (1),(2) in (B). If (1) holds then, choosing b̄ as above, if b is right of aa′, we have
(b + x′′) ∈ W. This gives a contradiction since 1
(b + x′′) cannot be y or t as b 6= x, x′. Now
C ∩ E = {c̄, ā, ā′, x̄′′}, and as c + x′′ /∈ 2W it must equal a+ a′. It follows that the midpoints y, t
cannot arise. If instead (2) holds, then C ∩ E = {c̄, ā, ā′} and again no midpoints can be present.
Suppose now the midpoint z of xx′ is present. The argument of (A) shows that to account for
(ā+ ā′) ∈ C, and all elements of (C ∩E) \ {c̄, ā, ā′, (ā+ ā′)/2} are right of āā′. We still have the
alternatives (1) and (2), but (2) immediately gives a contradiction.
In (1) we see as before there are no elements of C ∩ E lying to the right of āā′, so C ∩ E =
{c̄, x̄′′, ā, ā′, (ā+ ā′)/2}. Note that J(ā, (ā+ ā′)/2) and J(ā′, (ā+ ā′)/2) > 0.
If c+x′′ = a+ a′, we find after some algebra that ā+(ā+ ā′)/2 6= 2ȳ and also cannot be written
as a different sum of elements of C, giving a contradiction.
If c̄ + x̄′′ 6= ā + ā′ then one sees that c̄ + x̄′′ /∈ d +W, and by relabelling x and x′, a and a′ we
may assume that c̄ + x̄′′ = ā + 1
(ā + ā′) and also ā + ā′ = 2ȳ and ā′ + 1
(ā + ā′) = 2t̄ = x̄′ + x̄′′.
These relations imply a = x′, a contradiction. So no triangle with any midpoints present can arise.
30 A. DANCER AND M. WANG
Remark 6.14. There are also triangular faces with two points of W in the interior of an edge.
This can only happen if two vertices are (−2, 1, 0, · · · ) and (1,−2, 0, · · · ) (up to permutation). The
other sides of the triangle now have no interior points in W unless the triangle is contained in the
hexagon (H1). We can again modify the proof of Theorem 6.12 to treat this situation.
If the interior points z, w lie on xx′, then (2ā + ā′)/3, (ā + 2ā′)/3 must be in C, and all points
of C ∩ E except for these two and c̄, ā, ā′ lie to the right of āā′. By Prop 3.3, alternative (1) must
now hold. The usual argument shows x̄′′ is the only element of C ∩ E on the right of āā′. Now
again J(ā, 1
(2ā′ + ā)) > 0, J(ā′, 1
(ā+ 2ā′)) > 0, and the sums a+ (2a′ + a)/3 and a′ + (a+ 2a′)/3
cannot give points in 2W. Since they also cannot both be cancelled by c+x′′ in the superpotential
equation, we have a contradiction.
The other possibility for two interior points is, after relabelling the vertices if necessary, when
z = (2x+ x′′)/3 and w = (2x′′ + x)/3. As usual all elements of C ∩ E except for c̄, ā, ā′ are on the
right of āā′. Alternative (1) must hold, or else we cannot account for z, w. The usual argument
shows either x̄′′ is the only element of C ∩ E right of āā′, or z ∈ C is the rightmost element of
(C ∩ E) \ {x̄′′} (so (z + x′′)/2 = w). In the former case we cannot get both z and w, as (c+ x′′)/2
can’t equal both z and w. In the latter, considering āz̄ shows J(ā, z̄) = 0. But as J(ā, x̄′′) = 0, this
means ā is orthogonal to x̄ and hence to c̄, a contradiction.
So no triangle with points of W in the interior of an edge can arise (except possibly for a
subtriangle of (H1)).
Nullity of c̄, ā, ā′ and the conditions in Theorem 6.12(i),(ii) again put severe constraints on x, x′, x′
and the dimensions. The possible triangles for case (i) are as follows, where (Tr11)-(Tr22) occur
only if K is not connected, and we have also listed the vectors c, a, a′ for future reference. Further
details of how the following listing is arrived at can be found in [DW5].
x′′ x x′
(Tr1) (−2, 1, 0, 0, 0) (0, 0,−2, 1, 0) (0, 0,−2, 0, 1)
(Tr2) (−2, 1, 0, 0) (0, 1,−2, 0) (0, 1,−1,−1)
(Tr3) (0, 0, 0,−2, 1) (−2, 1, 0, 0, 0) (0, 1,−2, 0, 0)
(Tr4) (−2, 1, 0, 0, 0, 0) (0, 0,−2, 1, 0, 0) (0, 0, 0, 1,−1,−1)
(Tr5) (−2, 1, 0, 0, 0) (0, 1,−1, 0,−1) (0, 1,−1,−1, 0)
(Tr6) (−2, 1, 0, 0, 0, 0) (0, 0, 1,−1,−1, 0) (0, 0, 1,−1, 0,−1)
(Tr7) (−2, 1, 0, 0, 0, 0) (0, 0, 1,−1,−1, 0) (0, 0,−1,−1, 0, 1)
(Tr8) (−2, 1, 0, 0, 0, 0) (0, 0,−1,−1, 1, 0) (0, 0,−1,−1, 0, 1)
(Tr9) (−2, 1, 0, 0, 0, 0, 0) (0, 0, 1,−1,−1, 0, 0) (0, 0, 1, 0, 0,−1,−1)
(Tr10) (−2, 1, 0, 0, 0, 0, 0) (0, 0,−1, 1,−1, 0, 0) (0, 0,−1, 0, 0, 1,−1)
(Tr11) (0, 0, 0, 1,−1,−1) (−2, 1, 0, 0, 0, 0) (−2, 0, 1, 0, 0, 0)
(Tr12) (0, 1, 0,−1,−1) (−2, 1, 0, 0, 0) (−1, 1,−1, 0, 0)
(Tr13) (0, 0, 0,−1,−1, 1) (−2, 1, 0, 0, 0, 0) (0, 1,−2, 0, 0, 0)
(Tr14) (0, 0, 0,−1,−1, 1) (0, 1,−1,−1, 0, 0, 0) (−2, 1, 0, 0, 0, 0, 0)
(Tr15) (0, 0, 0, 1,−1,−1, 0) (1,−1,−1, 0, 0, 0, 0) (1,−1, 0, 0, 0, 0,−1)
(Tr16) (0, 0, 0, 1,−1,−1, 0) (1,−1,−1, 0, 0, 0, 0) (0,−1, 1, 0, 0, 0, 1)
(Tr17) (0, 0, 0, 1,−1,−1, 0) (1,−1,−1, 0, 0, 0, 0) (0,−1,−1, 0, 0, 0, 1)
(Tr18) (0, 0, 0, 1,−1,−1, 0) (1,−1,−1, 0, 0, 0, 0, 0) (1, 0, 0, 0, 0, 0,−1,−1)
(Tr19) (0, 0, 0, 1,−1,−1, 0) (1,−1,−1, 0, 0, 0, 0, 0) (0,−1, 0, 0, 0, 0,−1, 1)
(Tr20) (−1,−1, 1, 0, 0, 0) (0, 0, 1,−1,−1, 0) (0, 0, 1,−1, 0,−1)
(Tr21) (−1, 1,−1, 0, 0, 0) (0, 0,−1, 1,−1, 0) (0, 0,−1, 1, 0,−1)
(Tr22) (−1, 1,−1, 0, 0, 0) (0, 0,−1,−1, 1, 0) (0, 0,−1,−1, 0, 1)
CLASSIFICATION OF SUPERPOTENTIALS 31
3c 3a 3a′
(Tr1) (2,−1,−8, 2, 2) (−2, 1,−4, 4,−2) (−2, 1,−4,−2, 4)
(Tr2) (2, 3,−6,−2) (−2, 3,−6, 2) (−2, 3, 0,−4)
(Tr3) (−4, 4,−4, 2,−1) (−8, 2, 4,−2, 1) (4, 2,−8,−2, 1)
(Tr4) (2,−1,−4, 4,−2,−2) (−2, 1,−8, 2, 2, 2) (−2, 1, 4, 2,−4,−4)
(Tr5) (2, 3,−4,−2,−2) (−2, 3,−2, 2,−4) (−2, 3,−2,−4, 2)
(Tr6) (2,−1, 4,−4,−2,−2) (−2, 1, 2,−2,−4, 2) (−2, 1, 2,−2, 2,−4)
(Tr7) (2,−1, 0,−4,−2, 2) (−2, 1, 6,−2,−4,−2) (−2, 1,−6,−2, 2, 4)
(Tr8) (2,−1,−4,−4, 2, 2) (−2, 1,−2,−2, 4,−2) (−2, 1,−2,−2,−2, 4)
(Tr9) (2,−1, 4,−2,−2,−2,−2) (−2, 1, 2,−4,−4, 2, 2) (−2, 1, 2, 2, 2,−4,−4)
(Tr10) (2,−1,−4, 2,−2, 2,−2) (−2, 1,−2, 4,−4,−2, 2) (−2, 1,−2,−2, 2, 4,−4)
(Tr11) (−8, 2, 2,−1, 1, 1) (−4, 4,−2, 1,−1,−1) (−4,−2, 4, 1,−1,−1)
(Tr12) (−6, 3,−2, 1, 1) (−6, 3, 2,−1,−1) (0, 3,−4,−1,−1)
(Tr13) (−4, 4,−4, 1, 1,−1) (−8, 2, 4,−1,−1, 1) (4, 2,−8,−1,−1, 1)
(Tr14) (−4, 4,−2,−2, 1, 1,−1) (4, 2,−4,−4,−1,−1, 1) (−8, 2, 2, 2,−1,−1, 1)
(Tr15) (4,−4,−2,−1, 1, 1,−2) (2,−2,−4, 1,−1,−1, 2) (2,−2, 2, 1,−1,−1,−4)
(Tr16) (2,−4, 0,−1, 1, 1,−2) (4,−2,−6, 1,−1,−1, 2) (−2,−2, 6, 1,−1,−1,−4)
(Tr17) (2,−4,−4,−1, 1, 1, 2) (4,−2,−2, 1,−1,−1,−2) (−2,−2,−2, 1,−1,−1, 4)
(Tr18) (4,−2,−2,−1, 1, 1,−2,−2) (2,−4,−4, 1,−1,−1, 2, 2) (2, 2, 2, 1,−1,−1,−4,−4)
(Tr19) (2,−4,−2,−1, 1, 1,−2, 2) (4,−2,−4, 1,−1,−1, 2,−2) (−2,−2, 2, 1,−1,−1,−4, 4)
(Tr20) (1, 1, 3,−4,−2,−2) (−1,−1, 3,−2,−4, 2) (−1,−1, 3,−2, 2,−4)
(Tr21) (1,−1,−3, 4,−2,−2) (−1, 1,−3, 2,−4, 2) (−1, 1,−3, 2, 2,−4)
(Tr22) (1,−1,−3,−4, 2, 2) (−1, 1,−3,−2, 4,−2) (−1, 1,−3,−2,−2, 4)
Remark 6.15. In making the above table, it is useful to observe from the nullity and orthogonality
conditions that x′′ cannot be type I, and that if x′′ is type III, say, (−2i, 1j), then xi = x
i iff xj = x
The possibilities for Theorem 6.12(ii) are as follows (up to permutation of x, x′, x′′ and the
corresponding permutation of c, a, a′):
x′′ x x′
(Tr23) (−1, 0, 0, 0, 0) (0,−2, 1, 0, 0) (0, 0, 0,−2, 1)
(Tr24) (−1, 0, 0, 0) (0, 1,−2, 0) (0,−1,−1, 1)
(Tr25) (−1, 0, 0, 0, 0, 0) (0, 1,−2, 0, 0, 0) (0, 0, 0, 1,−1,−1)
(Tr26) (−1, 0, 0, 0, 0) (0, 1,−1,−1, 0) (0,−1,−1, 0, 1)
(Tr27) (−1, 0, 0, 0, 0, 0, 0) (0, 1,−1,−1, 0, 0, 0) (0, 0, 0, 0, 1,−1,−1)
(Tr28) (−1, 0, 0) (0,−1, 0) (0, 0,−1)
c a a′
(Tr23) (1,−2, 1,−2, 1) (−1,−2, 1, 2,−1) (−1, 2,−1,−2, 1)
(Tr24) (1, 0,−3, 1) (−1, 2,−1,−1) (−1,−2, 1, 1)
(Tr25) (1, 1,−2, 1,−1,−1) (−1, 1,−2,−1, 1, 1) (−1,−1, 2, 1,−1,−1)
(Tr26) (1, 0,−2,−1, 1) (−1, 2, 0,−1,−1) (−1,−2, 0, 1, 1)
(Tr27) (1, 1,−1,−1, 1,−1,−1) (−1, 1,−1,−1,−1, 1, 1) (−1,−1, 1, 1, 1,−1,−1)
(Tr28) (1,−1,−1) (−1,−1, 1) (−1, 1,−1)
Remark 6.16. In drawing up the above listing, recall from Theorem 6.12 that one of the vectors,
without loss of generality x′′, is of type I. We write x′′ = (−1, 0, 0, · · · ). It now easily follows from
nullity and the relations between x, x′, x′′ and c, a, a′ that x1 = x
Also, observe that as x′′ is a vertex of W, no type II vector may have a nonzero entry in the
first position.
32 A. DANCER AND M. WANG
In contrast to the earlier listing of non-triangular faces, the above lists result from examining
all triangular faces, including ones which arise from other faces because certain vertices are absent
from W.
The restrictions on the dimensions of the corresponding summands are as follows:
(Tr1) (2, 1, 16, 4, 4, · · · )
(Tr2) (2, 3, 12, 4, · · · )
(Tr3) (16, 4, 16, 2, 1, · · · )
(Tr4) (2, 1, 16, 4, d5 , d6, · · · ),
(Tr5) (2, 3, 6, 6, 6, · · · )
(Tr6) (2, 1, d3, d4, 4, 4, · · · ),
(Tr7) (2, 1, 12, 3, 12, 12, · · · )
(Tr8) (2, 1, d3, d4, 4, 4, · · · ),
(Tr9) (2, 1, 4, d4 , d5, d6, d7, · · · ),
(Tr10) (2, 1, 4, d4 , d5, d6, d7, · · · ),
(Tr11) (16, 4, 4, 1, 1, 1, · · · )
(Tr12) (12, 3, 4, 1, 1, · · · )
(Tr13) (16, 4, 16, 1, 1, 1, · · · )
(Tr14) (16, 4, d3, d4, 1, 1, 1, · · · )
(Tr15) (d1, d2, 4, 1, 1, 1, 4, · · · ),
(Tr16) (12, 3, 12, 1, 1, 1, 12, · · · )
(Tr17) (4, d2, d3, 1, 1, 1, 4, · · · ),
(Tr18) (4, d2, d3, 1, 1, 1, d7 , d8, · · · ),
(Tr19) (d1, 4, d3, 1, 1, 1, d7 , d8, · · · ),
(Tr20-22) (1, 1, 3, 6, 6, 6, · · · ), (1, 2, 2, 8, 8, 8, · · · ), or (2, 1, 2, 8, 8, 8, · · · )
(Tr23) 1
(Tr24) d3 = 2d2 :
(Tr25) 1
(Tr26) d2 = d3 :
(Tr27)
(Tr28) 1
Note that (Tr28) is a subtriangle of (H1), (Tr2) is a subtriangle of a triangle with midpoints of
all sides in W, and (Tr12) is a subtriangle of a triangle with the midpoint of one side.
Let us now illustrate by an example how one arrives at the above tables.
Example 6.17. One possible triangle has vertices V1 = (0, 0, 0,−2, 1), V2 = (−2, 1, 0, 0, 0), V3 =
(0, 1,−2, 0, 0) with the midpoint V4 = (−1, 1,−1, 0, 0) of V2V3 in W. The triangle has a symmetry
given by interchanging the first and third entries. It therefore suffices to consider V1, V2, V4 as pos-
sibilities for x′′. Of course, by Remark 6.13 the full triangle cannot occur. The possible subtriangles
xx′x′′ are V2V3V1, V2V4V1, V4V1V2, V3V1V2, and V3V1V4. Now 3c = 2x+2x
′−x′′, 3a = 4x−2x′+x′′,
and 3a′ = −2x+ 4x′ + x′′ can be used to compute these vectors in each case. For V2V4V1 one gets
3c = (−6, 4,−2, 2,−1), 3a = (−6, 2, 2,−2, 1) and so c̄ and ā cannot be both null. Similarly, for the
last three possibilities, ā′ and c̄ cannot be both null. That leaves the first case, which gives (Tr3).
The condition J(ā, x̄′′) = 0 is 3 = 4
, which implies (d4, d5) = (2, 1). Putting this into the null
conditions for c̄, ā, ā′ gives the equations
CLASSIFICATION OF SUPERPOTENTIALS 33
The last two equations imply that d1 = d3 and the first two equations give d1 = 4d2. These in turn
give (d1, d2, d3) = (16, 4, 16), as in the tables above.
Putting all the results in this section together we obtain
Theorem 6.18. If we have two adjacent (1B) vertices, then the associated 2-face of W is given
by a triangle in the list (Tr1) − (Tr27), the square with midpoint (S), a proper subshape of the
hexagonal face (H1) containing all three type I vectors, or the sub-rectangle yy′zz′ of (H2).
We note for future reference the following properties of the c vector of the non-triangular faces
appearing in the above theorem: for (S), all nonzero entries have the same absolute value, and
there are only 3 (resp. 2) nonzero entries for the subfaces of (H1) (resp. (H2)).
7. More than one type (2) vertex
In this section we shall now show there is at most one type (2) vertex in ∆c̄, except in the
situation of Theorem 3.14 and one other possible case.
Suppose we have two type (2) vertices of V . Then we have elements v,w, v′, w′ of W with c, v, w
collinear and c, v′, w′ collinear. So we have four coplanar elements v,w, v′, w′ of W where vw and
v′w′ are edges. Moreover, the edges vw and v′w′ meet at c outside conv(W). Hence vwv′w′ do not
form a parallelogram or a triangle.
From our listing of polygons in §6 and considering their sub-polygons we see that the possibilities
for further analysis are the following:
• Trapezia (T1)-(T6): We must have c = 2v− s = 2u−w. Also, we note for future reference
that sw is always an edge of conv(W) in (T3) and (T5), regardless of whether or not the
whole trapezium is a face, since sw can be cut out by {x2 = 1, x1 + x3 = −2} (cf 1.2(e)).
• Hexagons (H1)-(H3)
• Rectangle with midpoints (P17): While the rectangle itself cannot occur, we need to con-
sider the trapezia obtained by omitting one vertex, so that the edges are a side of the rec-
tangle and the segment joining the remaining vertex to the opposite midpoint. As above,
note that the longer of the two parallel sides of the trapezium is always an edge of conv(W).
• Parallelogram with midpoints (P16): This case is similar to (P17). The sub-polygons
to consider are the trapezia obtained by omitting one vertex of the parallelogram. By
symmetry, we are reduced to omitting either u or w. But since s occurs, w cannot be
omitted.
• Triangle with midpoints of all sides: We need to consider the trapezia obtained by omitting
a vertex. By symmetry all three trapezia are equivalent. This triangle is always a face of
conv(W) as it is cut out by {x2 = 1, x1 + x3 + x4 = −2} (cf 1.2(e)).
• Trapezia (T*1),(T*2), (T*3): By Rmk 6.3 these cannot be faces of conv(W), so cannot
come from adjacent type (2) vertices of V . For (T*1), besides the full trapezium, we need
to consider the two trapezia obtained by omitting either s or w. By symmetry these are
equivalent.
For (T1),(T2),(T4),(T6),(T*2),(T*3) we must have c = 2v − s = 2u − w, and so Lemma 3.13
applied to vs gives a contradiction. The same argument works for (P16), as up to permutations,
c = 2v − s = 2y − w. For (T1*), since Theorem 3.11 rules out c = (3v − s)/2 = (3u − w)/2
(corresponding to the full trapezium), the only other possible c is 2v − s = 2u − r, and again
Lemma 3.13 rules this out.
For (T3),(T5) and the trapezium coming from the triangle with midpoints, we need more infor-
mation from the superpotential equation. Since J(c̄, s̄), J(c̄, w̄) > 0, while Av, Au < 0, Fs̄, Fw̄ must
have the same sign, which must be opposite to that of Fc̄. Since stw is always an edge by earlier
remarks, the nullity of c̄ implies J(s̄, w̄) > 0, contradicting Prop 3.7(ii).
34 A. DANCER AND M. WANG
Essentially the same argument works for (P17), as up to permutations c = 2s − v = 2z − u =
(−2,−1, 2, 0).
For (H3) most quadruples cannot give pairs of edges. For we observe that u (resp. v,w) is present
iff x (resp. y, z) is. Thus, if u is missing, so is x, and v,w, y, z must all be present (otherwise we do
not have a 2-dimensional polygon). But we now get a rectangle, which is not admissible. Hence all
vertices are present and by symmetry we may assume that one of our edges is uz or uv. From this
we quickly find that the two possible c (up to permutations) are (−1,−1,−1, 2) = 2y − x = 2z − u
and (1,−1, 1,−2) = 2v − u = 2w − x. Both cases are ruled out by Lemma 3.13.
For (H2), observe that y (resp. y′) is present iff z (resp. z′) is. As these four vectors cannot all be
absent (otherwise we do not have a 2-dim polygon), by the symmetries of (H2), we can assume y′ is
present. If v is present, then all possibilities are eliminated by Theorem 3.11. (Note that although
2α − y′ = 2z − z′ it is impossible for αy′ and zz′ to both be edges.) On the other hand, if v is
absent, then y′z′ is an edge. Since the polygon cannot be a parallelogram or a triangle, it follows
that u is present and the polygon is a pentagon. In this case, the only possibility compatible with
Theorem 3.11 is c = (0,−1, 2,−2) = 2y−u = 3
y′− 1
z′. (This is not a priori ruled out by Theorem
3.11 as y′z′ has an interior point β).
To discuss (H1), we write
u = (−2, 1, 0), p = (0, 1,−2), v = (1, 0,−2), w = (0,−2, 1), s = (1,−2, 0), q = (−2, 0, 1)
for the vertices,
x = (−1, 1,−1), y = (−1,−1, 1), z = (1,−1,−1),
for midpoints of the longer sides, and
α = (−1, 0, 0), β = (0, 0,−1), γ = (0,−1, 0)
for the interior points, with the understanding that the rest of the components of the above vectors
are zero.
As before we consider pairs of vectors which can form edges of an admissible polygon. We then
compute the possibilities for c and apply Theorem 3.11. This will eliminate most possibilities. (For
many quadruples of points we can see, as in (H3), that they cannot all be vertices.) So up to
permutations, the remaining possibilities are as follows.
If no type II is present:
(1) c = 2α− u = 2β − p = (0,−1, 0, · · · )
(2) c = 2u− v = 2q − s = (−5, 2, 2, · · · )
(3) c = 2u− p = 2q − γ = (−4, 1, 2, · · · )
(4) c = 2q − u = 2α− p = (−2,−1, 2, · · · )
If all type II are present:
(5) c = 2u− x = 2q − y = (−3, 1, 1, · · · )
(6) c = 2u− y = 2x− z = (−3, 3,−1, · · · )
(7) c = (3y − z)/2 = 2q − u = (−2,−1, 2, · · · )
(8) c = (3p − u)/2 = (3v − s)/2 = (1, 1,−3, · · · )
Again, we cannot immediately rule out (7) and (8) using 3.11 because of the presence of interior
points. However for (8) we easily see using the arguments of 3.11 that the elements of C on the line
through v, s are c̄, c̄1 = (v̄+ s̄)/2 = z̄ and c̄2 = (3s̄− v̄)/2. Now as s, v are type III we need Fc̄ and
Fc̄2 to have the same sign, which is the opposite sign to Fc̄1 . But the superpotential equation now
gives a contradiction to the fact that Az < 0.
In (2)-(7) Lemma 3.13 applied respectively to uv, up, qu, ux, uy, qu gives a contradiction. Note
that case (4) only occurs when K is not connected, as the vectors β, γ are absent (cf 1.2(b)). We
are left with (1), which is precisely the situation of Theorem 3.14. We have therefore proved
Theorem 7.1. Apart from the situation of Theorem 3.14, the only other possible case where we
can have more than one type (2) vertex is, up to permutation of summands, when two type (2)
CLASSIFICATION OF SUPERPOTENTIALS 35
vertices are adjacent and the 2-plane determined by them and c̄ intersects conv(1
(d +W)) in the
pentagon with vertices uyy′zz′ contained in the hexagon (H2).
We will be able to rule this case out in §8.
8. Adjacent (1B) vertices revisited
We now return to our classification of when adjacent (1B) vertices can occur.
The idea is as follows: each of the configurations of §6 involves, as well as the null vector c̄, two
new null vectors ā, ā′. Hence the arguments of earlier sections also apply to ā, ā′. That is, we may
consider the associated polytopes ∆ā and ∆ā
The following lemma is useful when applied to ∆c̄,∆ā and ∆ā
Lemma 8.1. Suppose we have a (1B) vertex with exactly k adjacent (1B) vertices. Then
r ≤ #((1A) vertices) + #((2) vertices) + k + 2.
Suppose we have a (1B) vertex with no adjacent (1B) vertices. Then
r ≤ #((1A) vertices) + #((2) vertices) + 2.
If there are no (1B) vertices then
r ≤ #((1A) vertices) + #((2) vertices) + 1.
Proof. By our assumption that dim conv(W) = r− 1, it follows that ∆c̄ is a polytope of dimension
r − 2. Any vertex in it has at least r − 2 adjacent vertices. So for a (1B) vertex, the first two
statements follow immediately. If there are no (1B) vertices the third inequality follows because
∆c̄ has at least r − 1 vertices.
Lemma 8.2. Configurations (Tr1) - (Tr22) cannot arise from adjacent (1B) vertices.
Proof. The strategy is to count the number of type (1A), (2) and adjacent (1B) vertices in ∆ā or
∆c̄ and apply Lemma 8.1 to get a contradiction.
(i) We first observe that for these configurations c and a have at least four nonzero entries (at
least five except for (Tr2)), so they cannot be collinear with an edge vw with points of W in the
interior of vw (see Table 3 in §3). So if ∆c̄ or ∆ā has a type (2) vertex, by Theorem 3.11, c or a
must equal 2v − w or (4v − w)/3. It is easy to check that this is impossible except for c in (Tr3),
using the forms of c in Tables 1, 2 in §3.
(ii) Next we consider (1A) vertices. For (Tr1)-(Tr10) we have |ai/di| ≤
for all i. For Tr(1),
Tr(6),Tr(8) there are three i where equality holds. In these cases one of the associated di equals
1. Moreover for (Tr1) and (Tr8) two of these ai/di equal 1/3 and the third is −1/3, wheras for
(Tr6) it is the other way round. For (Tr2)-(Tr5), (Tr7) and (Tr9)-(Tr10) there are only two i where
equality holds. Further, for (Tr2) and (Tr5) |ci/di| ≤
for all i, with equality for just two i, and
here ci/di =
. It follows that for (Tr1), (Tr8) there are at most two (1A) vertices in ∆ā, while
for (Tr2)-(Tr5), (Tr7) and (Tr9)-(Tr10) there is at most one. In the case of (Tr6) there are at most
three (1A) vertices in general but at most two if K is connected. For (Tr2) and (Tr5) there are no
(1A) vertices in ∆c̄.
By means of similar considerations, we find that there is one (1A) vertex (corresponding to x̄′′)
in ∆ā for (Tr12), (Tr13),(Tr14) (Tr16), (Tr18), (Tr19) and at most two (1A) vertices for (Tr11),
(Tr17), (Tr20) and the d = (1, 1, 3, 6, 6, 6, · · · ) case of (Tr21), (Tr22). For (Tr15) there are at most
four (1A) vertices in ∆ā, and for the remaining cases of (Tr21) and (Tr22) there are at most r − 4
(1A) vertices (r − 6 of those correspond to (−23, 1j) where j > 6).
(iii) Finally, consider the (1B) vertex ξ̄ in ∆ā corresponding to x̄ in each of the triangles. In order
for there to be an adjacent (1B) vertex, ā must be (up to permutation) of the form of the null vector
c̄ in the 2-faces in Theorem 6.18. Now observe that for examples (Tr1), (Tr3), (Tr4), (Tr6)- (Tr20)
36 A. DANCER AND M. WANG
the null vector a does not appear in the list of possible c. Hence ξ̄ has no adjacent (1B) vertices.
From above, type (2) vertices cannot occur, so combining the bounds for (1A) vertices in ∆ā with
Lemma 8.1 gives an upper bound for r less than the minimum required by each configuration, a
contradiction.
(iv) Let us now consider (Tr21) and (Tr22). The vector a of (Tr21) has the same form as c in
(Tr22) and vice versa. An adjacent 2-face containing aξc of (Tr21) can only be a triangle of type
(Tr22) containing 1
(1,−1,−3,−2,−2, 4, 0, · · · ). Thus ξ has at most one adjacent (1B) vertex, and
we get the bound r− 2 ≤ 2+1+0 in the d = (1, 1, 3, 6, 6, 6, · · · ) subcase and r− 2 ≤ (r− 4)+1+0
in the other two subcases, a contradiction.
(v) For the remaining two triangles (Tr5) and (Tr2) we consider ∆c̄ instead.
For (Tr5), observe that c determines the plane xx′x′′ and does not occur as a possible null vector
for any other configurations. So we have at most one adjacent pair of (1B) vertices in ∆c̄. From
above, there are no type (1A) or (2) vertices. But r ≥ 5, giving a contradiction.
For (Tr2), consider the vertex ξ̄′ of ∆c̄ corresponding to x′. If there is a (1B) vertex adjacent
to it, we have a 2-dim face including c, x′. By Theorem 6.18 the only one is the face including x,
so there is at most one (1B) vertex of ∆c̄ adjacent to ξ̄′. Also, type (1A) and (2) vertices cannot
occur, so r ≤ 3, a contradiction.
Lemma 8.3. Configurations (Tr23)-(Tr27) cannot arise from adjacent (1B) vertices.
Proof. Note first that all entries of c, a, a′ are integers, so Lemma 4.4 shows in each case there is at
most one (1A) vertex, and for (Tr23),(Tr24) one checks that there are no (1A) vertices in ∆c̄. Note
also that for all these configurations, as x′′ is a vertex, there are no type II vectors with nonzero
entry in place 1.
Observe as in Lemma 8.2 that there are no type (2) vertices in ∆c̄,∆ā or ∆ā
. (For (Tr24), we
need to rule out the possibility that c has the form (4) in Table 3 in §3 with λ = 3/2. This follows
since the interior point in that case would be a type II vector with nonzero entry in place 1.)
So in all cases if we have a (1B) vertex with exactly k (1B) vertices adjacent to it, then by
Lemma 8.1 we have r ≤ k + 3. For (Tr23), (Tr24) (using ∆c̄), we have r ≤ k + 2. We will work
with ∆c̄ below.
First consider (Tr23) and look for (1B) vertices adjacent to ξ̄ where ξ̄ corresponds to the vertex
x. We need a 2-face including c, x. By Theorem 6.18 such a face must be of type (Tr23), and
having fixed c and a, the only freedom lies in assigning 1 in the third null vector to the first or fifth
place. So k ≤ 2, which gives r ≤ 4, a contradiction.
Similarly, for (Tr24), since a type (H1) face cannot contain c and x, we need only consider faces
of type (Tr24), for which there are again two possibilities. However, as mentioned above, in one of
these possibilities the vector “x′” has a 1 in place 1 and hence cannot occur. So there is at most
one (1B) vertex adjacent to ξ, and we deduce r ≤ 3, a contradiction.
For (Tr25),(Tr27) we similarly deduce that the only 2-face containing x, c is itself because as
above we cannot have any type II vectors with nonzero entry in place 1. So r ≤ 4, a contradiction.
Finally, for (Tr26) the above argument still works since (Tr24) has been ruled out (the vectors
a, a′ of (Tr24) are of the same form as c, a of (Tr26), so a priori (Tr24) could be an adjacent
2-face).
Lemma 8.4. Configuration (S) (square with midpoint) cannot arise from adjacent (1B) vertices.
Proof. We refer to §6 for the expressions for the vertices vusw of the square. The null vertex c̄
corresponds to (1,−1,−1, 1,−1, 0, · · · ) and the 2-dimensional face is cut out by x2 = −1, x1+x3 =
0 = x4 + x5, and xk = 0, for k > 5. Lemma 4.4(b) shows that there is at most one (1A) vertex
in ∆c̄. As r ≥ 5 and all the nonzero entries of c have the same absolute value, it follows that
there are no type (2) vertices. Let ξ̄ denote the vertex of ∆c̄ so that ξ is collinear with u and
a = 2u − c = (−1,−1, 1, 1,−1, 0, · · · ). A (1B) vertex adjacent to ξ̄ gives a 2-dimensional face
CLASSIFICATION OF SUPERPOTENTIALS 37
including c̄, ū. By what we have analysed so far about 2-faces given by adjacent (1B) vertices, this
face must again be a face of type (S), and the only possibilities are itself or the face obtained from
this by swapping indices 2 and 5. Hence there are at most two (1B) vertices adjacent to ξ̄, and at
vertex ξ̄, we have 3 ≤ r − 2 ≤ 1 + 2. Thus r = 5 is the remaining possibility, in which case ξ̄ has
exactly two adjacent (1B) vertices and one adjacent (1A) vertex.
Let us denote by ξ̄′ the (1B) vertex such that ξ′ is collinear with w and a′ := 2w − c =
(1,−1,−1,−1, 1). Let η̄ denote the other (1B) vertex adjacent to ξ̄. Then the 2-face deter-
mined by c, ξ, η is cut out by x5 = −1, x1 + x3 = 0 = x2 + x4. The ray cη intersects conv(W)
at z = (1, 0,−1, 0,−1) and b := 2z − c = (1, 1,−1,−1,−1) corresponds to a null vertex. Similarly,
there is a (1B) vertex η̄′ (besides ξ̄) adjacent to ξ̄′, and the corresponding 2-face (also of type (S)) is
cut out by x3 = −1, x1+x2 = 0 = x4+x5. The ray cη
′ intersects conv(W) at z′ = (0, 0,−1, 1,−1).
The vector b′ := 2z′ − c = (−1, 1,−1, 1,−1) corresponds to a null vertex.
Let us examine the (1A) vertex in ∆c̄ more closely. Let y ∈ W such that J(ȳ, c̄) = 0. As r = 5,
the null condition for c̄ implies that di ≥ 2 with at most one equal to 2. Also, for some j ∈ {2, 3, 5}
(i.e., j is an index for which the corresponding entry of c is −1) we must have yj = −2, so y is
type III. Let i be the index such that yi = 1. Then i ∈ {1, 4} (i.e., i is an index for which the
corresponding entry of c is 1), and the orthogonality condition implies (di, dj) = (2, 4) or (3, 3).
There are thus six possibilities for y, but only one can actually occur.
With the possible exception of the existence of the (1A) vertex, the above arguments apply
equally to the projected polytopes ∆ā, ∆b̄, ∆ā
, and ∆b̄
as the entries of a, b, a′ and b′ are just
permutations of those of c. We claim that whichever possibility for y occurs in ∆c̄, there is another
projected polytope with no (1A) vertex. Applying the above arguments to this polytope would
result in the contradiction r − 2 ≤ 2 and complete our proof.
We can use ∆ā for the contradiction if d1 = 2 or if any of (d1, d2), (d3, d4), (d1, d5) = (3, 3). If
d4 = 2 or if (d2, d4) = (3, 3) we can use ∆
ā′ instead. Finally, if (d1, d3) = (3, 3) we can use ∆
b̄′ and
if (d4, d5) = (3, 3) we can use ∆
For example, when (d1, d3) = (3, 3) (so y = (1
1,−23)), the null condition for c̄ implies that d2, d4
in particular cannot equal 2 or 3. In order to have a (1A) vertex in ∆b̄
, we must have a type III
vector (1i,−2j) with i ∈ {2, 4}. But this requires one of d2, d4 to be 2 or 3.
When (d4, d5) = (3, 3), then d1, d2 cannot be 2 or 3. But in ∆
b̄ a (1A) vertex corresponds to
(1i,−2j) with i ∈ {1, 2}, which implies that one of d1, d2 is 2 or 3.
The remaining cases are handled similarly.
Lemma 8.5. The subrectangle yy′zz′ of (H2) cannot arise from adjacent (1B) vertices.
Proof. Recall c = (−2, 1, 0, · · · ), so by Lemma 4.4 there are no (1A) vertices of ∆c̄. Moreover,
using Tables 1-3 in §3, one may check that there are no type (2) vertices either. (Note that type
II vectors other than y, z with a nonzero entry in place 1 cannot occur as then the subrectangle
cannot be a face. Similarly the line through c, α, β and (1,−2, 0, · · · ) will not give a type (2) vertex
as this line cannot be an edge.)
Let η̄ denote the vertex of ∆c̄ collinear with c and y. Any (1B) vertex adjacent to η̄ will give rise
to a face containing c and y, which cannot be of type (H1), and must therefore be of type (H2),
since we have eliminated all other possibilities. In fact, it must be the face we started with.
So there is just one (1B) vertex adjacent to η̄, and from above there are no (1A) or (2) vertices.
As r ≥ 4 for (H2), this contradicts Lemma 8.1.
Lemma 8.6. If r ≥ 4, configuration (H1) or subshapes cannot arise from adjacent (1B) vertices.
Proof. We first note some special properties of W. Since (H1) is a face, there can be no type II
vectors in W with nonzero entry in a place ∈ {1, 2, 3} and in a place /∈ {1, 2, 3}. Also, if (−2i, 1k)
with i ∈ {1, 2, 3}, k /∈ {1, 2, 3}, then (−1k) must be absent, which has strong implications, as noted
in Remark 1.2(b).
38 A. DANCER AND M. WANG
Let ξ̄ be a (1B) vertex in the plane. We have at most one (1B) vertex adjacent to ξ̄, as the
associated face must again be of type (H1) and is now determined by c̄, ξ̄. It also readily follows
that c cannot be collinear with an edge of W not in the face (assuming as usual we are not in the
situation of Theorem 3.14).
Now the special properties of W in the first paragraph imply that (1A) vertices in ∆c̄ can
correspond only to type III vectors in W which overlap with c. A straight-forward check using the
null condition for c̄ shows that the possible type IIIs have form (−2i, 1k) with i ∈ {1, 2, 3}, k /∈
{1, 2, 3} and ci/di = −1/2. It follows that di = 2 or 3 and hence, by nullity, the index i is unique.
So there are at most r−3 (1A) vertices. By Lemma 8.1, all r−3 (1A) vertices must occur. Applying
Cor 4.3 we conclude that r ≤ 4 (as di 6= 4 and r > 4 forces i ∈ Ŝ≥2).
We will now improve this estimate to r ≤ 3. Let the vertices of (H1) be as in §7. If r = 4 then
a (1A) vertex does exist and we can take it to come from t = (−2, 0, 0, 1) with (−14) absent. It
follows that besides t the only other possible members of W lying outside the 2-plane containing
(H1) are (0,−2, 0, 1) and (0, 0,−2, 1). As noted just before Theorem 6.12 we may assume the type
I vectors α, β, γ are all present. (If K is connected, d4 = 1 and so this last fact follows without
having first to eliminate those subshapes not containing one of the type I vectors.)
As noted above, d1 = 2 or 3, and c1 = −1 or −
respectively.
First consider c1 = −1, so d1 = 2. Now c = (−1, c2,−c2, 0), and by swapping the 2, 3 coordinates
if necessary, we may take c2 > 0. Observe u, q are absent, as if u is present or if u is absent but q
is present, then u (resp. q) gives a (1B) vertex, which contradicts nullity as the associated a would
have a1 = −3. Now the type II vectors x, y, z are absent, as if one is present they all are, and we
have a type (2) vertex. We deduce α gives a (1B) vertex so a = (−1,−c2, c2, 0). The other (1B)
vertex cannot correspond to w since β, γ are present. It also cannot be given by v, s as this violates
nullity, so must correspond to p or β.
If it is p, we have a′ = (1, 2− c2, c2−4, 0). Now Remark 3.9 implies c2 = (d3+4d2)/(d3 +2d2) so
1 < c2 < 2. But now no entry of a
′ equals −1 or −3
. We can now check that there are no (1A) or
(2) vertices with respect to a′, so there is at most one vertex of ∆ā
adjacent to p, a contradiction.
If it is β then p, v must be absent. Now a′ = (1,−c2, c2 − 2) and Remark 3.9 implies c2 = 1.
Hence c = (−1, 1,−1), a = (−1,−1, 1), a′ = (1,−1,−1), and nullity implies 1
. It is easy
to check by considering the vertices of ∆ā
that w, s must also be absent, so W just contains the
three type I vectors, t and possibly one or both of (0,−2, 0, 1), (0, 0,−2, 1). But we can check that,
if present, these three latter vectors give respectively vertices with respect to a, a′ which cannot
satisfy any of the conditions (1A), (1B) or (2). So in fact we have r = 3.
Similar arguments rule out the case c1 = −
Lemmas 8.2-8.6 give the following improvement of Theorem 6.18.
Theorem 8.7. It is impossible to have adjacent (1B) vertices except possibly when r = 3, in
which case conv(1
(d+W)) is a proper subface of (H1) containing all three type I vectors (e.g., the
tri-warped example (Tr28)).
We are now in a position to strengthen Theorem 7.1 by eliminating the remaining case of the
pentagon.
Theorem 8.8. Let c̄ be a null vertex of conv(C) such that ∆c̄ contains more than one type (2)
vertex. Then we are in the situation of Theorem 3.14.
Proof. We just have to eliminate the case of the pentagon uyy′zz′ in Theorem 7.1. Recall r ≥ 4 for
this configuration, and c is (0,−1, 2,−2, · · · ).
Using the nullity of c̄ we check that the only elements of W which can give an element of c̄⊥ are
(−22, 1i) where i > 4 and we have d2 = 2. Note that (1
1,−22) cannot be present as then y′z′ is
not an edge. By Cor 4.3, at most one such vector can arise. So there is at most one (1A) vertex,
which occurs only if r ≥ 5.
CLASSIFICATION OF SUPERPOTENTIALS 39
If we can show there are no (1B) vertices, then we are done because if we look at the adjacent
vertices of the type (2) vertex associated to y (in the pentagon), besides one (1A) possibility, the
other possibility is the type (2) vertex associated to y′ (by Theorem 7.1). As there must be at least
r− 2 adjacent vertices, we deduce r− 2 ≤ 2, so r = 4. But now, from above there is no (1A) vertex
so in fact we get r − 2 ≤ 1, a contradiction.
We now use Remark 3.9 to make a list of the possible x ∈ W associated to (1B) vertices
of ∆c̄. These are (0, 1,−1,−1, 0, · · · ), (0,−2, 1, 0, · · · ), (13,−1i,−1j), (−14, 1i,−1j), (13,−14,−1i),
(−13,−14, 1i) and (13,−2i) where i, j 6= 2, 3, 4. Note that type II vectors with nonzero entries in
places 2, k,m cannot occur except for y′, z′ as then y′z′ is not an edge. For each x in this list, we
consider the projected polytope ∆ā, where a = 2x − c. By looking at the form of a, we see from
Theorem 7.1 that there is at most one type (2) vertex in ∆ā. Also, the nonzero components of a
are either ≥ 1 or ≤ −2. By Lemma 4.4(a), there are no vertices of type (1A) in ∆ā. Since r ≥ 4,
by Theorem 8.7, the type (1B) vertex in ∆ā corresponding to x̄ has no adjacent (1B) vertices. So
we have a contradiction to Theorem 8.1.
The above result together with Theorem 8.7 and Lemma 8.1 gives us lower bounds on the number
of (1A) vertices.
Theorem 8.9. Let c̄ be a null vertex of conv(C) and ∆c̄ be the corresponding projected polytope.
Suppose further that c is not type I, i.e., we are not in the case of Theorem 3.14.
(i) If there are no (1B) vertices in ∆c̄, then there are at least r − 2 type (1A) vertices.
(ii) If either there is a type (2) vertex or r ≥ 4, then there are at least r − 3 type (1A) vertices
in ∆c̄. Hence there are at least r − 3 elements of 1
(d+W) orthogonal to c̄.
9. Type (2) vertices
In this section we consider again type (2) vertices of ∆c̄. In view of Theorem 8.8, it remains to
deal with the case of a unique type (2) vertex in ∆c̄. By Theorem 8.7 there are no adjacent (1B)
vertices in this situation.
Let c be collinear with an edge vw of conv(W). We first consider the situation where there are
no interior points of vw lying in W. By Theorem 3.11, we have the two possibilities c = 2v − w
and c = (4v − w)/3. Moreover, a preliminary listing of the cases appears in Tables 1 and 2 of §3.
Case (i): c = 2v − w
We have to analyse cases (1)-(7) in Table 1 of §3. The idea is to determine the number of (1A)
and (1B) vertices using respectively Lemma 4.4 and Remark 3.9, and then get a contradiction
(sometimes using Theorem 8.9). Note that J(w̄, w̄) < 0 for (1)-(3).
In (1), (2) and (4)-(7), Lemma 4.4 shows that there are no elements of 1
(d+W) orthogonal to
c̄ (recall c /∈ W), so Theorem 8.9 shows that r ≤ 3. This already gives a contradiction in case (7).
(Note that when r = 3 and w is type I, since w is a vertex there are no type II vectors in W.)
In (1) the only x ∈ W that could satisfy Eq.(3.1) and give a (1B) vertex with respect to c̄ is
(1,−1,−1). But the associated a = 2x− c is (2,−3, 0) and it easily follows that ā, c̄ cannot both be
null. For (2), the possible x ∈ W which correspond to (1B) vertices are (1,−2, 0) and (1,−1,−1)
respectively. In each case we find the nullity of c̄ and Remark 3.9 imply J(w̄, w̄) > 0, a contradiction
to F 2w̄ J(w̄, w̄) = Aw < 0 (as w is type III).
In (5) with r = 3, one checks that the only possible x ∈ W corresponding to a (1B) vertex is
(0, 1,−2). Let us consider the distribution of points of W in the plane x1 + x2 + x3 = −1. The
point (0, 1,−2), if present, would lie on one side of the line vw while the point (−1, 0, 0) lies on
the other side. Now (−1, 0, 0) must lie in W as otherwise v cannot be present by Remark 1.2(b).
So since vw is an edge by assumption, (0, 1,−2) cannot lie in W, which gives a contradiction to
Theorem 8.9(i).
Hence in (1),(2),(5) Theorem 8.9 shows r ≤ 2, which is a contradiction.
40 A. DANCER AND M. WANG
In case (4) the nullity of c̄ translates into 1 = 9/d1 + 4/d2. Hence d2 6= 1, so if K is connected
(0,−1, 0, · · · ) is present and w is not a vertex, which is a contradiction. If r = 3 and K is not
connected, by Remark 1.2(b), (1,−2, 0) and (0,−2, 1) must be absent, and, from Remark 3.9, the
possibilities for x ∈ W associated to the (1B) vertex are x = (−2, 0, 1) and y = (0, 1,−2). In
the first case, conv(W) is the triangle with vertices v,w, x and a = 2x − c = (−1,−2, 2). Now
J(ā, w̄) > 0, contradicting the superpotential equation. In the second case, a = (3, 0,−4) with
J(ā, ȳ) > 0, and aw intersects conv(W) in an edge. By Theorem 3.11, t = (1, 0,−2) ∈ W and
conv(W) is a parallelogram with vertices v, y, w, t. Moreover, Remark 3.12 implies that a and w
are the only elements of C in aw. But then the midpoint (0, 0,−1) of wt is unaccounted for in the
superpotential equation.
For (6) with r = 3, there should be at least two vertices in ∆c̄. But we find there are no (1B)
vertices, a contradiction. So r = 2, and we are in the situation of the double warped product
Example 8.2 of [DW4].
In case (3), Lemma 4.4 shows 1
(d + W) ∩ c̄⊥ has at most one element. Hence ∆c̄, which has
dimension ≥ 2 since r ≥ 4, must contain at least one (1B) vertex. By Theorem 8.7, such a (1B)
vertex has at most 2 adjacent vertices. It follows that r = 4 and (1,−2, 0, 0) corresponds to the
(1A) vertex; also d2 = 2. Also, since (−1, 0, 0, 0) ∈ W, (0,−1, 0, 0) cannot be a vertex of conv(W).
But now routine computations using Eq.(3.1) show there are no (1B) vertices, a contradiction.
So the only possible case if K is connected is that giving Example 8.2 of [DW4]. If K is
disconnected there is the further possibility of (4) with r = 2, i.e., W = {(−2, 1), (−1, 0)}. This
is discussed in the third paragraph of Example 8.3 of [DW4]. An example in the inhomogeneous
setting is treated there and in [DW2]. An example where the hypersurface is a homogeneous space
G/K is discussed in the concluding remarks at the end of section §10.
Case (ii): c = (4v − w)/3
For clarity of exposition let us assume K is connected, using the assumption as indicated in
Remark 5.9. We examine the cases (1)-(11) in Table 2 of §3.
Some of these cases can be immediately eliminated. In (3), Eq.(3.2) implies (d1, d2) = (3, 3) or
(4, 1). In neither case is c̄ null. In (11) Eq.(3.2) and J(v̄, w̄) > 0 imply (d1, d2, d3) = (2, 4, 4), (2, 5, 2)
or (3, 3, 3), and again c̄ is not null. In (4) and (6) Eq.(3.2) implies (d1, d2) = (2, 1) and (d1, d2) =
(3, 9) or (4, 3) respectively. In neither case does the nullity condition have an integral solution in
Further cases can be eliminated by finding the possible (1A) vertices (using Lemmas 3.10 and
4.4) for the given value of c and using Theorem 8.9. In particular, we get a contradiction whenever
r ≥ 4 and there are no (1A) vertices.
In (1), Eq.(3.2) implies (d1, d2) = (2, 1), and nullity of c̄ implies
= 3. But we now find
that 1
(d+W) ∩ c̄⊥ is empty, giving a contradiction as r ≥ 4.
In (5) Eq.(3.2) and nullity imply (d1, d2) = (3, 3) and {d3, d4} = {3, 8}. One can now check that
(d+W) ∩ c̄⊥ is empty, which is a contradiction as r ≥ 4.
In (7), Eq.(3.2) implies (d1, d2) = (2, 1) and nullity implies
. One can now check
that the only possible elements ū orthogonal to c̄ correspond to u = (1, 0, 0,−2, · · · ) if d4 = 4 and
(1, 0, 0, 0,−2, · · · ) if d5 = 4. The nullity condition means that at most one of these can occur. This
is a contradiction as r ≥ 5.
In (8) Eq.(3.2) gives (d1, d2) = (2, 3) and nullity of c̄ gives
. Again one can check
that 1
(d+W) ∩ c̄⊥ is empty, a contradiction as r ≥ 4.
In (9), Eq.(3.2) and the nullity of c̄ give (d1, d4) = (2, 16) and {d2, d3} = {2, 3}. The only u which
can give ū ∈ c̄⊥ are (0,−2, 0, 0, 1i , · · · ) if d2 = 2 or (0, 0,−2, 0, 1
i , · · · ) if d3 = 2, where i ≥ 5. In
each case, i is unique since d2 (resp. d3) 6= 4. Since r ≥ 4, Theorem 8.9 now implies r = 4. But now
these u are not present (as i ≥ 5). Therefore there are actually no (1A) vertices, a contradiction
to r = 4.
CLASSIFICATION OF SUPERPOTENTIALS 41
In (10) we have (d3, d4) = (2, 16) and {d1, d2} = {2, 3}. The only u which can give an element
of c̄⊥ is (0,−2, 0, 0, 1i , · · · ) (for i unique and ≥ 5) if d2 = 2. The final argument in (9) now applies
equally here.
Finally, we can eliminate (2) by an analysis of both the (1A) and (1B) vertices. First, Eq.(3.2)
and nullity of c̄ force (d1, d2, d3) = (6, 1, 8). Next we check that
(d+W) ∩ c̄⊥ is empty, so r = 3.
Using Remark 3.9 we then find there can be no (1B) vertices, giving a contradiction.
So case (ii) cannot occur if K is connected.
Remark 9.1. Case (ii) is the only part of this section that relies on the connectedness of K.
In fact, the analysis of the cases where w is type III does not use this assumption. If K is not
connected, using the same methods and with more computation we obtain the following additional
possibilities (all of which are associated to a w of type II).
v w c = (4v − w)/3 d r
(9∗) (0,−1,−1, 1) (1,−1,−1, 0) (−1
,−1,−1, 4
) (1, 2, 6, 8) 4, 5
(1, 6, 2, 8) 4, 5
(10∗) (1,−1, 0,−1) (1,−1,−1, 0) (1,−1, 1
) (3, 3, 1, 8) 4
(6, 2, 1, 8) 4, 5
(14) (0,−1, 0, 1,−1) (1,−1,−1, 0, 0) (−1
,−1, 1
) (1, 3, 1, d4 , d5) 5
In (9*) and (10*) there is always a (1B) vertex in ∆c̄, and r = 4 or 5 according to whether the
cardinality of c̄⊥ ∩ 1
(d+W) is 1 or 2. The dimensions d4, d5 in (14) must satisfy
(i.e.,
{d4, d5} = {5, 20}, {6, 12}, {8, 8}) and again there is always a (1B) vertex in ∆
Interior points
Finally, we must consider the cases, listed in Table 3, §3, when there may be points of W in the
interior of vw. As in the earlier cases, we analyse the possible (1A) and (1B) vertices for these c.
For (1) and (2), as 1 < λ ≤ 2, the nonzero entries of c are either < −2 or > 1. Hence by Lemma
4.4 there are no (1A) vertices. By Theorem 8.9 we have r = 2 or 3.
In case (3) the nullity of c̄ implies that a vector u ∈ W not collinear with vw and with ū ∈ c̄⊥
must be of the form (−2, 0, 1j). So 2λ− 1 = d1
and c = (−d1
,−1 + d1
, 0, · · · ). From the range for
λ and the nullity of c̄, we have d1 = 3, d2 = 1. But d2 6= 1 since w ∈ W. So again there are no
(1A) vertices and by Theorem 8.9 we have r ≤ 3.
In case (4), a straight-forward preliminary analysis reduces the possibilities of u ∈ W such that
ū ∈ c̄⊥ to the choices u = (−2, 0, 1, · · · ), (0,−2, 1, · · · ) or (−1,−1, 1, · · · ). Note that c1 < −2 and
c2 = 1, so by Lemma 4.4(a) we see c3 < 1, i.e. λ <
. Now the second vector cannot occur because
the orthogonality equation and λ ≤ 2 imply that d3 = 1, contradicting the presence of w. Since
the three vectors are collinear, if two satisfy the orthogonality equation then all do. So there is at
most one (1A) vertex and so r ≤ 4 by Theorem 8.9. This can be improved to r ≤ 3 as follows. If
the third vector (−1,−1, 1, · · · ) occurs then the orthogonality relation, the bound on λ, and nullity
imply that d1 = 5, d3 = 2 and λ =
2 + 1
. Now the nullity equation may be written as a
quadratic in 2+ 1
with no rational root. If the first vector (−2, 0, 1, · · · ) occurs then orthogonality
implies λ =
d1(d3+2)
4d3+2d1
, and the bound on λ gives 6
> 1. Nullity implies d1 ≥ 5 and d1 > d
We can deduce d3 = 2 and λ =
, and one can check that nullity fails.
For case (5), again a straight-forward analysis of the orthogonality condition with the help of
the nullity of c̄ gives the following u ∈ W as possibilities such that J(c̄, ū) = 0:
(a) (−23, 1i), i ≥ 4 and d3 = 2,
(b) (1,−2, 0, · · · ),
(c) (−22, 1i), i ≥ 4 and d2 = 3,
(d) (0,−2, 1, 0, · · · ).
42 A. DANCER AND M. WANG
Note (1, 0,−2, · · · ) cannot be in W as then vw is not an edge.
It follows from Cor 4.3 that among (a) only one vector can occur and among (b), (c), (d) also
only one vector can occur. (The orthogonality conditions of (b) and (d) are incompatible with
1 < λ ≤ 2.) So c̄⊥∩ 1
(d+W) contains at most two elements. If it has two elements, one must then
come from (a) and the other from (b)-(d). Together they give an edge of c̄⊥ ∩ conv(1
(d + W))
with no interior points in 1
(d +W). Using Cor 4.3 and the null condition, we find that all these
two-element cases cannot occur. Hence r ≤ 4.
If r = 4 then the possible u with J(c̄, ū) = 0 are given by (a)-(d) with i = 4. Now, we can show
using techniques similar to those of Theorem 3.11 that c∗ = (1− 2λ, 2λ− 1,−1, 0) also gives a null
element of C. The possible vectors orthogonal to this element come from (a) and the vectors (b∗),
(c∗), (d∗) obtained from (b), (c), (d) by swapping places 1 and 2.
If (a) does not give an element in c̄⊥ ∩ 1
(d + W), it is straightforward to show, using the
orthogonality and nullity conditions for c̄ and c∗ together, that the (1A) vertices for c̄ and c∗ are
given by (b) and (b∗) respectively. Also we must have c = (4
,−1, 0) and d = (4, 4, 9, d4).
We need a (1B) vertex outside x4 = 0. From Remark 3.9, the only possible (1B) vertices for c̄
correspond to (1, 0, 0,−2) and (1,−1, 0,−1). In particular there can be no vertices, and hence no
elements of W, with x4 > 0. Hence (−1,−1, 0, 1) and therefore (1,−1, 0,−1) are not in W. So
the (1B) vertices for c̄ and c∗ are given by (1, 0, 0,−2) and (0, 1, 0,−2) respectively. Now the line
joining the corresponding null vectors a, a∗ misses conv(W), a contradiction.
The remaining case is when (a) gives the element in c̄⊥ ∩ 1
(d+W) and in c∗
(d+W). Now
for vw to be an edge we need (−14) absent, so by Remark 1.2(b) the only possible members of W
lying outside {x4 = 0} are the three type IIIs with x4 = 1. In particular, all vectors in conv(W)
have x4 ≥ 0. As (−1,−1, 1, 0) ∈ W, there must be (1B) vertices lying in {x4 = 0} for both c̄ and c∗.
We then find that the only possibilities for such a (1B) vertex are given by (b), (b∗) respectively.
It follows that d1 = d2, but now nullity is violated.
We conclude that there are no (1A) vertices, so r ≤ 3.
Theorem 9.2. Let c̄ be a null vertex in C such that ∆c̄ contains a type (2) vertex corresponding
to an edge vw of conv(W). Suppose we are not in the situation of Theorem 3.14.
(i) If there are no points of W in the interior of vw, then either we are in the situation of
Example 8.2 of [DW4] or K is not connected and we are in one of cases in the table of Remark 9.1
or in the situation of the third paragraph of Example 8.3 of [DW4].
(ii) If there are interior points of vw in W then r ≤ 3.
For further remarks about the r = 2 case see the concluding remarks at the end of §10.
10. Completing the classification
Throughout this section we will assume that K is connected (and we are not in the situation of
Theorem 3.14). Theorems 8.7 and 9.2 then tell us that if r ≥ 4 there are no type (2) vertices and no
adjacent (1B) vertices in ∆c̄, for any null vector c̄ ∈ C. Since all (1B) vertices lie in the half-space
{J(c̄, ·) > 0} bounded by the hyperplane c̄⊥ containing the (1A) vertices, we must therefore have
in each ∆c̄ exactly one (1B) vertex, with the remaining vertices all of type (1A). So if r ≥ 4 the
only remaining task is to analyse such a situation.
As dim∆c̄ = r − 2, we see dim(∆c̄ ∩ c̄⊥) = r − 3. In particular, there must be at least r − 2
elements of W giving elements of c̄⊥ ∩ 1
(d+W).
Theorem 5.18 lists the possible configurations of such elements when r ≥ 3. The above discussion,
together with Remark 5.19, shows that in cases (1), (2), (3) we can take m = r− 1, in cases (5)(ii),
(5)(iii), (6)(i), (6)(ii) we can take m = r, and in (6)(iii) we have r = 5.
Finally, since the vectors in (4) are collinear, so that dim(∆ ∩ c̄⊥) = 1, we have r = 3 or 4.
If r = 3, it follows that c̄ and the edge in conv(1
(d + W)) determined by the vectors in (4) are
CLASSIFICATION OF SUPERPOTENTIALS 43
collinear. This contradicts the orthogonality condition for the configuration of vectors in (4) and
we conclude that r = 4.
For each configuration we can consider the possible vectors u ∈ W giving the (1B) vertex. Besides
the nullity condition on c̄ and the condition that c̄ should be orthogonal to the (1A) vectors, we
have a further relation coming from the null condition in Remark 3.9. In most cases, routine (but
occasionally tedious) computations show that these relations have no solution.
As a result, we obtain the following possibilities for u (up to obvious permutations):
case (1A) vectors possible (1B) vector
(1) (−21, 1i), 2 ≤ i ≤ r − 1 (−1r), (12,−2r), (−11,−12, 1r)
(−11, 12,−13), (−11, 12,−1r)
(2) (11,−2i), 2 ≤ i ≤ r − 1 (−12), (−12,−13, 1r)
(3) (11,−22); (11,−23); possibly(11,−12,−13);
(−12,−13, 1i), 4 ≤ i ≤ r − 1 (14,−2r), (11,−2r), (−1r)
(4) (11,−22), (11,−23), (11,−12,−13) (−14), (11,−24), (11,−12,−14),
(5)(i) (−21, 12), (−11, 12,−13) (−11), (−14)
(5)(ii) (−21, 12); (−11, 13,−1i), 4 ≤ i ≤ r (13,−24), (−11, 12,−14), (13,−14,−15)
(5)(iii) (−21, 12); (−11,−13, 1i), 4 ≤ i ≤ r (−22, 14), (−11,−12, 14), (−13, 14,−15)
(6)(i) (−11,−12, 1i), 3 ≤ i ≤ r (−11,−13, 14)
(6)(ii) (11,−12,−1i), 3 ≤ i ≤ r (11,−13,−14), (−12, 13,−14), (11,−23)
(6)(iii) (11,−12,−1i), i = 3, 4; (11,−13,−14) (−15)
Table 7: Unique 1B cases
Remark 10.1. The possibilities for the (1B) vertex in cases (5)(ii) and (5)(iii) only apply to the
r ≥ 5 situation. When r = 4, the two cases become the same if we switch the third and fourth
summands and the possibilities are discussed in Lemma 10.14 below.
Remark 10.2. Note that u such as (−12), (−13) in (4), (−14) in (5)(ii) or (−1i) with i > 2 in
(6)(ii) cannot arise because they will not be vertices, due to the presence of the type II vectors in
Remark 10.3. The dimensions must satisfy certain constraints in each case. Some such constraints
were stated in Theorem 5.18 and Remark 5.19. We also have constraints coming from the nullity
conditions for c̄ and ā. These typically involve the requirement that some expression in the di is a
perfect square. The following is a summary of general constraints in each case:
case (1): d1 = 4; case (2): d1 = 1; case (3): d2 = d3 = 2; case (4): d2 + d3 ≤ 4d1/(d1 − 1) and
d2, d3,≥ 2; case (5)(i): (d1, d2) = (4, 2), (3, 3); case (5)(ii,iii): d1 = 2 and if r ≥ 5 also d3 = 2; case
(6)(i, ii): d1 = d2 = 2; case (6)(iii): d1 = d2 = d3 = d4 = 2, d5 = 25.
Our strategy now is reminiscent of that in §8. We have a (1B) vertex corresponding to ū and a
second null vector ā satisfying a = 2u − c. Now we may apply our arguments to ā, and conclude
that the vectors in ā⊥ ∩ 1
(d+W) are also of the form given in the above table, up to permutation.
The resulting constraints will allow us to finish our classification.
In some cases we can actually show that ā⊥∩ 1
(d+W) is empty and we have a contradiction. A
simple example when this happens is case 6(iii), where we now have c = (6
,−1) and
(ai/di) = (−
). Other cases are treated in Lemma 10.4 below.
Next we shall show that cases (6)(i)(ii) cannot arise (cf Lemma 10.6), so in all remaining cases
there must be at least one type III vector w with w̄ in ā⊥. We now use our explicit formulae
for c and a to derive inequalities on the entries of a and find when there can be such a type III
vector orthogonal to ā. For each such instance we then check whether ā⊥ ∩ 1
(d + W) forms a
configuration equivalent to one of those in Table 4. This turns out to be possible in only two
44 A. DANCER AND M. WANG
situations (cf Lemmas 10.7, 10.11 and Lemma 10.9). These have such distinctive features that W
can be completely determined and judicious applications of Prop 3.7 lead to contradictions. This
yields our main classification theorem.
Lemma 10.4. The following cases cannot arise:
case (4) with u = (1,−1, 0,−1),
case (6)(ii) with u = (1, 0,−2, · · · ),
case (4) with u = (0, 0, 0,−1) except for the case c = (4
,−1) with d = (4, 2, 2, 9).
Proof. (α) For case (4) with u = (1,−1, 0,−1) we find that the nullity and orthogonality conditions
and Remark 3.9 leave us with the following possibilities:
d c (ai/di)
(2, 5, 3, 20) (1,−5
, 0) (1
(2, 6, 2, 12) (1,−3
, 0) (1
(3, 4, 2, 12) (1,−4
, 0) (1
(5, 3, 2, 15) (1,−6
, 0) (1
It is easy to see that we can never have
= 1 for w ∈ W, so ā⊥ ∩ 1
(d+W) is empty and we
have a contradiction.
(β) Similarly, nullity, orthogonality and Remark 3.9 give:
d c (ai/di)
(2, 2, 225, d4 , . . . , dr) (
,− d4
, . . . ,− dr
) (109
, . . . , 1
(2, 2, 98, d4 , . . . , dr) (
,− d4
, . . .− dr
) (23
, . . . , 1
(2, 2, 36, d4 , . . . dr)) (
,−1,−d4
, . . . ,−dr
, . . . 1
Moreover n equals 962, 226, 50 respectively. It is easy to see that we can never have
for w ∈ W, so we have a contradiction.
(γ) For case (4) with u = (0, 0, 0,−1) we find that the nullity and orthogonality conditions and
Remark 3.9 leave us with the following possibilities, up to swapping places 2 and 3:
d c (ai/di)
(2, 2, 4, 25) (6
,−1) (−3
(2, 3, 3, 25) (6
,−1) (−3
(3, 2, 3, 121) (15
,−1) (− 5
(2m− 2, 2, 2,m2) (
2(m−1)
, 1−m
, 1−m
,−1) (− 1
, m−1
, m−1
It is now straightforward to see that we cannot have w ∈ W with
= 1, except in two cases
(both associated to the last entry of the table). One is the case stated in the Lemma. The other
occurs if m = 2, so a = (−1, 1
,−1) which is orthogonal to (−1, 0, 1,−1), (−1, 1, 0,−1). But as
(0, 0, 0,−1) is a vertex, neither of these vectors can be in W. So ā⊥∩ 1
(d+W) is still empty, giving
the desired contradiction.
As discussed above, we now turn to showing that case (6) cannot occur. The following remark
is useful in finding when type III vectors can give elements of ā⊥ ∩ 1
(d+W).
Lemma 10.5. If w = (−2i, 1j) and w̄ ∈ ā⊥, then ai
< 0 (assuming we are not in the situation of
Theorem 3.14).
Proof. We need
(10.1)
= 1 +
CLASSIFICATION OF SUPERPOTENTIALS 45
so if ai
≥ 0 then
≥ 1. Hence
≥ dj ≥ 1. As ā is null this means a = (−1
j) and we
are in the situation of Theorem 3.14.
Lemma 10.6. Configurations of type (6) cannot arise.
Proof. Recall that we have dealt with (6)(iii) and we have d1 = d2 = 2.
Case (6)(i): We have u = (−1, 0,−1, 1, · · · ), and from the nullity and orthogonality conditions we
deduce that
(ci/di) =
2(m+ 1)
2(m− 1)
m2 − 1
, · · · ,
m2 − 1
where 1
2(m+1)
and n− 1 = m2 for some positive integer m. We have, therefore,
(ai/di) =
2(m+ 1)
2(m− 1)
m2 − 1
m2 − 1
m2 − 1
, · · · ,
m2 − 1
Let us estimate the size of the entries in (ai/di). First observe that, as d3, d4 > 2(m + 1) from
above, we have
4(m+ 1) < d3 + d4 ≤ n− d1 − d2 = n− 4 = m
2 − 3
so we deduce m ≥ 6. Hence we have 3
≤ |a1
| < 1
≤ |a2
| < 1
, |a3
| ≤ 6
, |ai
| ≤ 1
for i ≥ 5. Also
note that a4
. Finally, a4
> 0, else we would have 2n − 4 = 2m2 − 2 ≤ d4, which
is impossible.
Consider now a type III vector w = (−2i, 1j) with w̄ ∈ ā⊥. By Remark 10.5, we need i = 1, 3 or
≥ 5. If i = 1 then, by Eq.(10.1),we have
∈ (0, 1
]. So we must have j = 4 and 1
contradicting our above remarks. If i = 3 then
, which is impossible. Similarly, if i ≥ 5
, which is impossible.
Hence there are no such type III vectors, so we are in case (6) with respect to ā. We cannot be
in 6(ii) as then the null vector has exactly one positive entry (see below), but a has two positive
entries. For 6(i), the null vector has exactly two negative entries. Now a has r− 2 negative entries,
so r = 4. But for 6(i) the negative entries have modulus < 2, while a3 = −2−
m2−1 , a contradiction.
Case 6(ii): Here there are two possibilities.
Subcase (α): u = (0,−1, 1,−1, · · · ). Then, as above, the null condition for c̄ gives
(ci/di) =
(m− 1)(m+ 2)
2(m+ 1)2
2(m+ 1)
(m+ 1)2
, · · · ,
(m+ 1)2
where 1
2(m+1)
and n− 1 = m2. So the vector (ai/di) is given by
(m− 1)(m+ 2)
2(m+ 1)2
2(m+ 1)
(m+ 1)2
(m+ 1)2
(m+ 1)2
, · · · ,
(m+ 1)2
As before, we have m ≥ 6. We deduce |a1
| ≤ 1
≤ |a2
| < 1
, |a3
| ≤ 8
, |a4
| ≤ 1
, |ai
| ≤ 1
i ≥ 5. Also a4
< 0, else d4 ≥ 2(m+ 1)
2 > 2n, which is impossible.
We look for vectors w = (−2i, 1j) with w̄ ∈ ā⊥. By Lemma 10.5 we have i = 1, 2 or 4. If i = 1
= m+3
(m+1)2
, so j = 3, but now Eq.(10.1) contradicts 2
. A similar argument works if
i = 2, while if i = 4, Eq.(10.1) implies
, a contradiction.
So 6(ii) must hold for ā, as we have already ruled out 6(i). But now we need a to have exactly
one positive entry, which has modulus < 2. So r = 4 and this positive entry is a3, but we have
a3 > 2, a contradiction.
46 A. DANCER AND M. WANG
Subcase (β): u = (1, 0,−1,−1, · · · ). We similarly have
(ci/di) =
(m− 2)(m+ 1)
2(m− 1)2
2(m− 1)
(m− 1)2
, · · · ,
(m− 1)2
(ai/di) =
m2 − 3m+ 4
2(m− 1)2
2(m− 1)
(m− 1)2
(m− 1)2
(m− 1)2
, · · · ,
(m− 1)2
where n− 1 = m2 and 1
= m+1
2(m−1)2 . The last two equations easily imply that m ≥ 6, and so
| < 1
for all i.
A type III (−2i, 1j) giving an element of ā⊥ must have i = 3 or 4, by Lemma 10.5. In both cases
we find from Eq.(10.1) that
, which is impossible. So 6(ii) holds for a, which is impossible
as a has at least two positive entries.
Lemma 10.7. The only possible example in case (4) is when c =
, u = (0, 0, 0,−1),
and d = (4, 2, 2, 9). ∆ā is then in case (1) with a = (−4
,−1) and ā⊥ ∩ 1
(d +W) consists of
(−2, 1, 0, 0), (−2, 0, 1, 0).
Proof. By Lemma 10.4 we just have to eliminate the possibility u = (1, 0, 0,−2). Now
(ai/di) =
8− 2d1 + d4
d1(4 + 2d1 + d4)
(d1 − 1)(2d1 + d4)
2d1(4 + 2d1 + d4)
(d1 − 1)(2d1 + d4)
2d1(4 + 2d1 + d4)
2(d1 − 1)
d1(2d1 + d4 + 4)
The null condition is
−(d1 − 1) d
4 − 4(d
1 − d1 − 1) d
4 + 4d1(3d
1 + d1 + 8)d4 + 16d
1(d1 + 2)
2 = 0.
For w = (−2i, 1j) with w̄ ∈ ā⊥ we need, by Lemma 10.5, i = 1 or 4. If i = 1, then for j = 2
or 3, Eq.(10.1) can be rewritten as 2d21 + d1d4 + 2d1 = −5d4 − 32, which is absurd. For j = 4 it
can be rewritten as −1 − 2
= 14+2d4−2d1
d1(2d1+d4+4)
. So the right hand side is < −1, which on clearing
denominators is easily seen to be false.
If i = 4, then for j = 1 Eq.(10.1) becomes 4
= 1. So (d1, d4) = (2, 8), (3, 6) or (5, 5), all of
which violate the null condition. For j = 2, 3 we obtain from Eq.(10.1) the equation 4d4(d1 − 1) =
(2d1 + d4 + 4)(d1d4 + d4 − 8d1), which can only have solutions if d4 ≤ 9. On the other hand, the
null condition has no integer solutions if d4 ≤ 9.
So no such type III exists, contradicting Lemma 10.6.
Lemma 10.8. Configurations of type (5)(i) cannot occur.
Proof. It is useful to note that the null condition for c̄ implies that d3 ≤ 4 when (d1, d2) = (4, 2)
and d3 ≤ 3 when (d1, d2) = (3, 3). One further finds the following possibilities:
u d c (ai/di)
(−1, 0, 0, 0) (3, 3, 1, 2) (−1, 1,−1
) (−1
(3, 3, 2, 1) (−1, 1,−2
) (−1
(4, 2, 1, 3) (−1, 1,−1
) (−1
(4, 2, 2, 2) (−1, 1,−1
) (−1
(4, 2, 3, 1) (−1, 1,−3
), (−1
(0, 0, 0,−1) (3, 3, 2, 121) (− 9
,−1) ( 3
(4, 2, 2, 25) (−4
,−1) (1
One easily checks that ā⊥ ∩ 1
(d +W) is empty in the last two cases, and consists only of type
II vectors in the third to fifth cases, giving a contradiction to Lemma 10.6. For the first two cases,
note that ā⊥ ∩ 1
(d + W) contains 1
(d + (−1,−1, 1, 0)) since by hypothesis for 5(i) (−1, 1,−1, 0)
is in W. Hence (1),(2) cannot hold with respect to ā. Also the vector d of dimensions rules out
(3),(4) and (5), so we have a contradiction.
CLASSIFICATION OF SUPERPOTENTIALS 47
Lemma 10.9. The only possible example for case (3) is when c =
, u =
(0, 0, 0, 0,−1), d = (2, 2, 2, 2, 9). Then a = (−2
,−1) and ∆ā is again in case (3).
Proof. (A) Let u = (−1r). The null condition for c̄ gives 4dr = (δ+d1+2)
2, where δ = d4+· · ·+dr−1.
In particular, dr is a square. Also, (ai/di) =
1− 1√
1− 1√
, −1√
, · · · , −1√
If dr = 4, then we find there are no type III vectors in ā
⊥, a contradiction. So dr ≥ 9 and
we have |ai
| ≤ 1
for i = 1, 4, · · · , r − 1, ≤ 1
for i = r and < 1
for i = 2, 3. Lemma 10.5 shows
i 6= 2, 3. From Eq.(10.1) and the above estimates, we first get i 6= r, and for the remaining values
of i, we have
> 0, so that j = 2 or 3. Also, dr = 9 (and hence d1 + d4 + · · · + dr−1 = 4), and
(ai/di) = (−
, · · · ,−1
). Upon applying Theorem 5.18 to ∆ā together with Theorem
8.9 and the above Lemmas, we deduce that we are in case (3)(ii) with r = 5 and d1 = d4 = 2,
giving the example in the statement of the Lemma.
(B) Next let u = (1, 0, · · · ,−2). Now (ai/di) = (
− α, 1−α
, 1−α
,−α, · · · ,−α,
(n−2−dr)α−5
where, as a consequence of the null condition for c̄, we have
(10.2) α =
n− 2−m
: m2 = dr(n− 1).
Next we get the identity (n− 2)2 −m2 = (n− 1)(d1 + 1+ δ) + 1 =
m2(d1+1+δ)
+1, where δ is as
given in (A) above. We deduce m <
n−3(n− 2), and hence α <
d1+1+δ
n−dr−3 ≤ min(
So ai
is positive for i ≤ 3 and negative for 3 < i ≤ r−1. Note also that (n−2−dr)α <
2(n−2−dr)
n−3−dr =
1 + 1
n−3−dr
. In particular, ar
< − 5
As usual, we look for (−2i, 1j) giving an element of ā⊥. By Lemma 10.5, i 6= 1, 2, 3. If 4 ≤ i ≤ r−1
then Eq.(10.1) says
= 1 − 2α > 0, so j = 1, 2 or 3. If j = 1 we obtain α = 1 − 2
. Comparing
this with Eq.(10.2) shows d1 = 3 and α =
, but now ā is not null. If j = 2 or 3, we obtain α = 1
we deduce from Eq.(10.2) that d1 ≤ 5, and again one can check that all possibilities violate nullity.
So all type III (−2i, 1j) have i = r. Therefore we must be in case (1) or (5) with respect to ā,
and dr = 4 or 2 respectively.
If j = 1, 2, 3 then
> 0. Now Eq.(10.1) combined with the estimate above for ar
show that
> ar > −
, so dr > 5, a contradiction.
If 4 ≤ j ≤ r − 1, then in the case dr = 4, we find Eq.(10.1) gives α =
n−4 . Combining with
Eq.(10.2) we get n = 10, which is incompatible with dr = 4 and r ≥ 5. If dr = 2 we find similarly
that α = 4
n−3 and m satisfies 3m
2 − 8m− 4 = 0; but this has no integral roots.
So u = (1, 0, · · · − 2) cannot occur.
(C) For u = (14,−2r), we have (ai/di) = (−α,
, 1−α
− α,−α, · · · ,−α,
(n−2−dr)α−5)
) and
Eq.(10.2) still holds. The arguments of case (B) carry over to this case, on swapping indices
1, 4.
Lemma 10.10. Case (2) cannot occur.
Proof. (A) Consider u = (−12). Now (ai/di) = (
− 1, −1
, · · · , 1
(2− n+1−dr
Nullity implies d2 ≥ 3, so −1 <
and |ai
| ≤ 1
for 2 ≤ i ≤ r − 1. Also we have
dr(n− 1) = m
2, and for this choice of u we have m = n+1− 2d2, so
= dr−m
which is positive if
m < 0 and negative if m > 0. By Lemma 10.5 and the fact that d1 = 1, we only have to consider
(−2i, 1j) with i = 2 or r.
If i = 2, then Eq.(10.1) says
= 1 − 2
> 0 so j ≥ 3. If 3 ≤ j ≤ r − 1 then Eq.(10.1) shows
d2 = 3, so m = n− 5 and dr =
(n−5)2
n−1 = n− 9 +
n−1 . As n ≥ 7 we must then have (dr, n) = (9, 17)
or (2, 9). Imposing the nullity condition on ā shows there are only three possibilities, corresponding
48 A. DANCER AND M. WANG
to d = (1, 3, 2, 2, 9), (1, 3, 4, 9), (1, 3, 3, 2). In the first two there is only one type III in ā⊥, as d1 = 1
and d2 6= 4, so we are in case (5) with respect to ā, contradicting the fact that d2 6= 2. In the last
case we must be in case (4) with respect to ā, but now (0,−1, 1,−1) is present, so u is not a vertex.
If j = r then Eq.(10.1) becomes d2 =
3dr−n−1
dr−2 which is less than 3, a contradiction. (We cannot
have dr = 2 and 3dr = n+ 1 as n ≥ 7.)
Hence all such (−2i, 1j) have i = r and we are in case (1) or (5) with respect to ā. For case (1)
we need r− 2 of the
(j < r) equal. This can only happen for our a if d2 = 3, which is ruled out
as in the previous paragraph. For case (5) we have dr = 2 and
= 3 − n−1
. The possibilities on
the left-hand side are 2
− 1,− 1
respectively. On using our relations for m,n, dr we find that
only the third possibility can occur, and d2 = 3. The argument in the previous paragraph again
eliminates this case.
(B) Consider u = (0,−1,−1, · · · , 1). Now (ai/di) = (2β−1, β−
, β− 2
, β, · · · , β,
4−(n+1−dr)β
where again from the null condition of c̄ we have
(10.3) β :=
(1− c1) =
n+ 1−m
3 + dr(
n+ dr + 1
: dr(n− 1) = m
2 (m > 0).
Now 0 < β < 1
(the case β = 1
leads to r = 4, d2 = d3 = 2 and a = (0,−1,−1, 1) which violates
nullity). So for w̄ ∈ ā⊥ we just have to consider w = (−2i, 1j) with i = 2, r (as d1 = 1 we can’t
have i = 1; also by symmetry the case i = 3 is treated the same way as i = 2).
If i = 2 then Eq.(10.1) says
= 2β + 1 − 4
. If 4 ≤ j ≤ r − 1 we get β = 4
− 1; the only
possibility consistent with our bounds on β is d2 = 3, β =
and it is straightforward to check this
is incompatible with the null condition for ā.
If i = 2 and j = 1 then Eq.(10.1) implies d2 = 2 and again one checks that nullity for c̄ fails.
If j = 3 Eq.(10.1) says β = 4
− 1, so as β > 0 either d2 = 2 and β = 1 −
or d2 = 3
and β = 1
. In the former case the bound β < 1
shows d3 = 3 and β =
, and now nullity
for ā fails. In the latter case the bound β > 0 shows d3 > 6. Substituting this into the quadratic
which must vanish for nullity of c̄, we see δ = d4 + · · · + dr−1 is < 4. Checking the resulting short
list of cases yields no examples where nullity holds. If j = r then Eqs.(10.1) and Eq.(10.3) imply
= 1+ 1
and one check that the possible (d2, dr) yield no examples where nullity of c̄ holds.
So all such type III have i = r, and case (1) or (5) holds for ā. For case (1), then, as in (A), r−2
of the
(j ≤ r− 1) must be equal. So either r = 5 and d2 = d3 with β = 1−
or r = 4 and one
of the preceding equalities holds. If β = 1 − 2
holds, then the bounds on β show d2 = 3, β =
and as usual nullity for ā fails. If d2 = d3 holds, then using our formulae for β and substituting
into the null condition for ā gives a quadratic with no integer roots.
If case (5) holds, then dr = 2. Now Eq.(10.1) gives
= 5 − (n − 1)β. If j = 1, j = 2, or
4 ≤ j ≤ r − 1 we get β = 6
5+(2/d2)
respectively. (As usual, the case j = 3 is treated in just
the same way as j = 2.) Now using the equations in Eq.(10.3) relating n,m in each case gives a
quadratic with no real roots.
Lemma 10.11. For case (1) the only possibility is when c = (−4
,−1) with u = (0, 0, 0,−1)
and d = (4, 2, 2, 9). ∆ā is then in case (4).
Proof. (A) Consider u = (0, · · · , 0,−1). From the null condition for c̄ we see that dr = k
2, n− 1 =
(k + 1)2 for some positive integer k and (ai/di) = (
(1 − 1
),− 1
, · · · ,− 1
). Note that since
d1 = 4, n > 5 and so k 6= 1.
We must consider solutions of Eq.(10.1). By Lemma 10.5, i 6= 1. If i = r we have
= 1 − 2
The resulting equation has no solution in integer k > 1 for any choice of j. If 2 ≤ i ≤ r − 1, we
= 1 − 2
. We only obtain a solution k > 1 if j = 1; in this case k = 3, so n = 17, dr = 9
CLASSIFICATION OF SUPERPOTENTIALS 49
and we see r = 4 with {d2, d3} = {2, 2} or {3, 1}. The former case is that in the statement of the
Lemma. In the latter case we can have just one type III and one type II in ā⊥ (since d2 or d3 is
1, one potential type III is missing), so we must be in case (5) with respect to ā; but no di is 2, a
contradiction.
(B) Consider u = (0, 1, 0, · · · , 0,−2). Now (ai/di) = (
(1− β), 2
− β,−β, · · · ,−β,
(n−dr−2)β−5
where β = dr+6d2
d2(2n−dr−4) . The nullity condition for c̄ implies d2 ≥ 3, d2 > δ and dr > 2d2 + 4, where
δ now denotes d3 + · · · + dr−1. We can then deduce that β < 1, 0 <
, 0 < a2
particular β < 2
. By Lemma 10.5, we must consider elements of ā⊥ coming from vectors (−2i, 1j)
with i ≥ 3.
If 3 ≤ i ≤ r−1, Eq.(10.1) says
= 1−2β. As β < 1, this immediately rules out 3 ≤ j ≤ r−1. If
j = 1 we get β = 1
. Combining this with our formula above for β we get 2d2(d2+δ−7)+(d2−3)dr =
0. The only possibilities are d2 = 3, δ = 4 which violates the null condition, or d = (4, 4, 2, 8) which
violates the condition that dr(n − 1) should be a square. If j = 2, we get β = 1 −
. Since we
saw above that β < 2
we get d2 = 3 and β =
, which is ruled out as above. If j = r, Eq.(10.1)
implies β = dr+5
2+d2+δ+2dr
. Comparing this with the formula for β above leads to a contradiction.
The remaining possibility is for i = r. So we are in case (1) or (5) with respect to ā, and dr = 4
or 2 respectively. But dr > 2d2 + 4, so this is impossible.
(C) Let u = (−1,−1, 0, · · · , 0, 1). Now (ai/di) = (−
β,− 2
−β,−β, · · · ,−β,
1+(n−dr−2)β
) where
dr(d2−4)
2d2(2n+dr−4) , so 0 < β <
(noting that the nullity condition for c̄ implies d2 ≥ 5).
We look for vectors (−2i, 1j) giving elements of ā⊥. Now Lemma 10.5 rules out i = r, while if
3 ≤ i ≤ r − 1 we need
= 1− 2β > 2
. So j = r, and Eq.(10.1) yields β = dr−1
n+dr−2 . Equating this
to the expression above for β gives an equation which may be rearranged so that it says a sum of
positive terms is zero, which is absurd.
If i = 1 then Eq.(10.1) says
= 1−β. Clearly this can only possibly hold if j = r. The equation
then gives β = dr−1
n−2 , and equating this with the earlier expression for β leads, as in the previous
paragraph, to a contradiction.
So the only possibility is i = 2, and we are therefore in case (1) or (5) with respect to ā. But
d2 ≥ 5 so this is impossible.
(D) Let u = (−1, 1,−1, 0, · · · ). Now (ai/di) = (−
− β,− 2
− β,−β, . . . ,−β,
(n−dr−2)β−1
and β = 1
−2( 1
). An analysis of the nullity condition for c̄ shows that it can only be satisfied
, so d2, d3 ≥ 5 and 0 < β <
Let us now consider solutions to Eq.(10.1). If i = r, we have
= 1− 2
2(n−dr−2)β
. If j 6= 2,
this equation implies that the positive quantity (1 +
2(n−dr−2)
)β (or (1
2(n−dr−2)
)β if j = 1)
equals a nonpositive quantity (recall dr > 1 as i = r). If j = 2, we get that it equals
But d2 ≥ 5 so dr = 2 or 3, and in each case we find the nullity condition for c̄ is violated.
If i = 1, Eq.(10.1) says
= 1 − β > 9
, so j = 2 or r. But for j = 2 we get d2 = 2, which is
impossible as we know d2 ≥ 5, so in fact j = r.
If i = 2, Eq.(10.1) is
= 1 + 4
− 2β. We cannot then have j = 1, 3 or 4 ≤ j ≤ r − 1 as they
lead to β > 2
, > 1, > 1 respectively. So we must have j = r.
If i = 3, we see
= 1 − 4
− 2β. If j = 1, 2 or 4 ≤ j ≤ r − 1, we see in all cases (using our
bounds on d2, d3) that β >
, a contradiction. Hence again j = r.
If 4 ≤ i ≤ r − 1, then
= 1 − 2β > 4
so j = 2 or r. If j = 2 we obtain β = 1 − 2
contradicting our earlier inequality for β; so again we have j = r.
We have shown that any (−2i, 1j) giving an element of ā⊥ has j = r, so we are in case (3), (4)
or (5) with respect to ā. It cannot be case (3) as we know from Lemma 10.9 that then each di is
50 A. DANCER AND M. WANG
2 or 9, and we have d1 = 4. If we are in case (4), then Lemma 10.7 tells us that d = (4, 2, 2, 9).
Moreover, as (−2, 0, 0, 1), (0,−2, 0, 1) are the elements of ā⊥ ∩ 1
(d+W), we must have β = 4
; but
now β > 1
, a contradiction. If it is case (5), then we have di = 2 for some i, which we can take
to be 4. Now ā must be orthogonal to vectors associated to (−14, 15,−1k) or (−14,−15, 1k), and
either case is incompatible with our expressions for ai/di.
(E) Consider u = (−1, 1, 0, · · · , 0,−1). Now (ai/di) = (−
− β,−β, · · · ,−β,
((n−dr−2)β−3)
and β =
8d2−dr(d2−4)
2d2(2n−dr−4) . It is easy to check that β <
. Also, the nullity condition for c̄ implies
d2 ≥ 3 and (
− 1)dr + 8 > 0; hence β > 0.
The analysis is similar to that in (D). If i = r then Eq.(10.1) implies that a positive quantity
times β equals a positive linear combination of reciprocals of di, minus 1. This sum of reciprocals
is therefore > 1, which gives us upper bounds on dr. The only case where Eq.(10.1) and the null
condition can hold is if j = 2 and d2 = 7, dr = 4, d3 + · · ·+ dr−1 = 11
If i = 1 then Eq.(10.1) says
= 1− β > 0, so j = 2 or r. But j = 2 implies d2 = 2, which from
above cannot hold, so j = r.
Now Lemma 10.5 rules out i = 2. If 3 ≤ i ≤ r − 1, we have
= 1 − 2β. If j = 1 then we get
β = 2
, which cannot hold. If j = 2 then β = 1 − 2
, and as β < 2
we deduce d2 = 3 and β =
which violates the null condition for c̄. If 3 ≤ j ≤ r − 1, then β = 1, which is impossible. So we
have j = r.
So in all cases we have j = r, except in the exceptional case discussed above where we can have
i = r and j = 2. But our list (1)-(6) of possible configurations in ā⊥ shows that if the (i, j) = (r, 2)
case occurs then no other type III can be in ā⊥. So we are in case (5), which is impossible as
dr = 4 6= 2 for this example. Hence the exceptional case cannot arise.
We see therefore that j = r in all cases. So, as in (D), we must be in case (3), (4) or (5) with
respect to ā. As before, the fact that d1 = 4 rules out case (3). For case (4) we need the dk to be
4, 2, 2, 9 and dr to be 4 (as the (−2
i, 1j) in ā⊥ have j = r), but this contradicts d1 = 4.
So we are in case (5). Now the orthogonality condition for the family of type II vectors leads to
β > 2
, which is impossible.
Lemma 10.12. Case (5)(ii) cannot occur if r ≥ 5.
Proof. (A) Consider u = (0, 0, 1,−2, 0, · · · ). We have
(ai/di) = (1− β, 1− 2β,
n−2d2−3
n−d2−2 − (
n−6−3d2
n−d2−2 )β,
2(d2+2)β
n−d2−2 −
n−d2−2 −
2(d2+2)β
n−d2−2 −
n−d2−2 , · · · )
where all terms from the fifth onwards are equal and where
β := 1 + c1
8(n−d2−2)+d4(n+d2)
2d4(n+d2+2)
The nullity condition for c̄ implies d4 > 52, so n > 56 and we deduce 0 < β <
. Hence
> 0. It is also easy to show that ai
> 0 for i ≥ 5 and a3
So if (−2i, 1j) gives an element of ā⊥ we need i = 2 or 4. As d4 > 52, Lemmas 10.6 - 10.11 show
that case (5) must hold with respect to ā. In particular di = 2, so we cannot have i = 4. Hence
i = 2 and d2 = 2. Now Eq.(10.1) implies β =
, 2n−5
3n−4 ,
4(n−2) +
d4(n−2) or
4(n−2) , depending on
whether j = 1, 3, 4 or ≥ 5. In all cases this contradicts the bound β < 3
and n > 56.
(B) Consider u = (−1, 1, 0,−1, 0, · · · ). We have (ai/di) =
(−β, 2
+ 1− 2β, − d2+1
n−d2−2 − (
n−6−3d2
n−d2−2 )β,
2(d2+2)β
n−d2−2 −
n−d2−2 −
2(d2+2)β
n−d2−2 −
n−d2−2 , · · · )
where all terms from the fifth onwards are equal and
β := 1 + c1
= n+d2−2
2(n+d2+2)
+ n−2
d2(n+d2+2)
+ n−d2−2
d4(n+d2+2)
The nullity condition for c̄ implies d2 ≥ 9 and d4 ≥ 4. It is now easy to check that
< β < 31
and that ai
> 0 for i ≥ 5.
As in (A), case (5) must hold with respect to ā. Now if (−2i, 1j) gives an element of ā⊥ we need
di = 2. This, combined with Lemma 10.5, means i = 1 or 3.
CLASSIFICATION OF SUPERPOTENTIALS 51
If i = 1, Eq.(10.1) immediately shows j cannot be 2. Moreover, if j = 3 or ≥ 5, Eq.(10.1) yields a
value for β that violates the nullity condition for c̄. If j = 4 we obtain β = n−1
+ n−d2−2
. As we are
in case (5) with respect to ā, we need to consider the elements in ā⊥∩ 1
(d+W) corresponding to type
II vectors. Their number and pattern, as stipulated by Theorem 5.18, together with orthogonality
to ā, imply further linear relations among the components of (ai/di) and small upper bounds for
r (usually of the form r = 5, 6). In all cases these additional constraints can be shown to be
incompatible with the above values of β.
As an illustration of the above method, note that our type III vector is (−2, 0, 0, 1, 0, · · · ). If ∆ā
is in case (5)(iii), Theorem 5.18 says that the possible type IIs must have a −1 in place 1 and a 0 in
place 4. Since r ≥ 5, the remaining −1 must be in a place whose corresponding dimension is 2. As
d3 = 2 we can have (−1, ∗,−1, 0, ∗, · · · ) where ∗ indicates a possible location of the 1 in the type II.
The other possibility is for −1 to be in place k for some k ≥ 5. After a permutation we can assume
k = 5, and d5 = 2 must hold. The type II is then of the form (−1, ∗, ∗, 0,−1, ∗, · · · ) where ∗ again
indicates possible positions for the 1. In the first case, the orthogonality conditions imply a2
which gives β = n−1
+ n−d2−2
. Comparing with the value of β from Eq.(10.1), we get d2 = d4.
Using this in the first value of β in (B) gives a contradiction after some manipulation. In the second
case, the argument we just gave implies that we can only have r = 5 and the orthogonality condition
implies a2
, which gives β = n−1
n+d2+2
2(n−d2−2)
d2(n+d2+2)
. After a short computation, one sees that
the two values of β are again incompatible. If ∆ā is in case (5)(ii), the argument is essentially the
same, as we only have to switch the places of the second −1 and the 1 in the type IIs.
Let us now take i = 3. If j = 1, Eq.(10.1) implies β = 3d2+4−n
5d2+10−n . If the denominator is negative,
then β > 1, which is a contradiction. If it is positive we find that this is incompatible with the
inequality β > n+d2−2
2(n+d2+2)
which comes from the displayed expression for β above.
If j = 2, we get β = d2+1
2(d2+2)
+ n−d2−2
2d2(d2+2)
. As above, we can rule this out by considering the vectors
in ā⊥ ∩ 1
(d+W) associated to type II vectors. A similar argument works for j ≥ 5, where we find
β = n−2d2−3
2(n−2d2−4) , and for j = 4, where we have β =
n−d2−2
d4(n−2d2−4) +
n−2d2−3
2(n−2d2−4) .
(C) Next let u = (0, 0, 1,−1,−1, 0, · · · ). We have (ai/di) = (1 − β, 1 − 2β,
n−2d2−3
n−d2−2 −
(n−6−3d2
n−d2−2 )β,
2(d2+2)
n−d2−2β −
n−d2−2 −
2(d2+2)
n−d2−2β −
n−d2−2 −
2(d2+2)
n−d2−2β −
n−d2−2 , · · · ) where all
terms from the sixth on are equal and
β := 1 + c1
= n+d2
2(n+d2+2)
n−d2−2
(n+d2+2)
The nullity condition for c̄ implies d4 and d5 ≥ 13. It is now easy to check that 0 < β <
> 0. As in (A), we find also that ai
> 0 for i ≥ 6, and that a3
As in (A) again, case (5) must hold with respect to ā, so if (−2i, 1j) gives an element of ā⊥ we
need di = 2. This, combined with Lemma 10.5, means i = 2. In this situation in all cases Eq.(10.1)
gives a value of β incompatible with our bounds on n and β.
Lemma 10.13. Case (5)(iii) cannot arise if r ≥ 5.
Proof. This is similar to the proof of the previous Lemma so we will be brief.
(A) Consider u = (0,−2, 0, 1, 0, · · · ). Now (ai/di) is given by
(1− β,− 4
+ 1− 2β,
(n−3)+(n+d2−2)(β−1)
n−d2−2 ,
2d2β−(d2+1)
n−d2−2 ,
2d2β−(d2+1)
n−d2−2 , · · · ,
2d2β−(d2+1)
n−d2−2 )
where
β := 1 + c1
(d2+4d4)(n−2−d2−d4)+4d24
2d2d4(2n−d2−4) .
The nullity condition for c̄ implies d4 ≥ 8, d2 ≥ 27 and d2 > 2d4, and it readily follows that
< β < 1
. In particular, a1
As before, we see that case (5) holds with respect to ā, so for ā⊥ we have to consider type III
vectors (−2i, 1j) where di = 2. So we need only consider i = 3 or i ≥ 5.
52 A. DANCER AND M. WANG
In either situation, we proceed as in part (B) of the proof of Lemma 10.12, and obtain inconsis-
tencies in the equations involving β or contradiction to the bounds on β or the dimensions.
(B) Let u = (0, 0,−1, 1,−1, 0, · · · ). The nullity condition on c̄, which has a symmetry in d4 and
d5, now implies d4, d5 ≥ 46 and > 28d2. Now (ai/di) is given by
(1− β, 1− 2β, (n+d2−2
n−d2−2)β −
n−d2−2 ,
2d2β−(d2+1)
n−d2−2 ,
2d2β−(d2+1)
n−d2−2 ,
2d2β−(d2+1)
n−d2−2 , · · · )
where all terms from the sixth on are equal and
β := 1 + c1
n+d2−2 + (
)(n−d2−2
n+d2−2).
We deduce that 1
< β < 3
and so 1
> 0. ∆ā must be in case (5), and for (−2i, 1j)
associated to elements of ā⊥ ∩ 1
(d+W), we have di = 2 and we need only consider i = 2, 3 or ≥ 6.
If i = 2, Eq.(10.1) and the upper bound on β imply
. As d2 = 2, one checks that this
never holds.
If i ≥ 6, Eq.(10.1) implies
. The bound d4, d5 > 28d2 can be used to show that this never
happens.
If i = 3, Eq.(10.1) and the expression for β above are seen to be incompatible if we use the
bounds on d4, d5 and β.
(C) Consider u = (−1,−1, 0, 1, 0, · · · ). The nullity condition on c̄ gives d4 ≥ 4, d2 ≥ 5. The
vector (ai/di) is
(−β, 1− 2β − 2
(n+d2−2)β−(d2+1)
n−d2−2 ,
2d2β−(d2+1)
n−d2−2 ,
2d2β−(d2+1)
n−d2−2 , · · · )
where all terms from the fifth on are equal and
β := 1 + c1
d24+(d2+d4)(n−d2−d4−2)
d2d4(3n−d2−6) .
Now as 1
, we see that 1
< β < 1
Again ∆ā is in case (5) and we consider vectors (−2i, 1j) associated to elements of ā⊥∩ 1
(d+W),
where we must have di = 2, so i 6= 2, 4.
If i = 1, Eq.(10.1) becomes
= 1 − 2β > 0. This immediately means j 6= 2, 5, · · · , r. If j = 3,
the value of β from Eq.(10.1) and the above expression for β lead to d4 ≤ 10/3. For j = 4, we
obtain a contradiction by the method of part (B) in the proof of Lemma 10.12.
If i ≥ 5, Eq.(10.1) says
4d2β+(n−3d2−4)
n−d2−2 . If i = 3, Eq.(10.1) say
2(d2+1)
n−d2−2 +
2(n+d2−2)β
n−d2−2 .
In both situations, we can again apply the method of part (B) in the proof of Lemma 10.12 to
obtain contradictions.
The last case to consider is case (5)(ii) with r = 4, which is the same as case (5)(iii) with r = 4
if we interchange the third and fourth summands.
Lemma 10.14. No configurations for case (5)(ii) with r = 4 can occur.
Proof. When r = 4 we no longer have d3 = 2, but the nullity condition for c̄ implies that
Hence either {d3, d4} is one of {3, 3}, {3, 4}, {3, 5}, {3, 6}, {4, 4} or one of d3 or d4 is 2. Using this
together with the nullity conditions for ā, c̄ and the orthogonality conditions, we see that we only
need to consider u = (0,−2, 1, 0), (0, 0, 1,−2), (−1, 1, 0,−1) and (−1,−1, 1, 0).
(A) Let u = (0,−2, 1, 0). From the nullity condition for c̄ we deduce that d4 = 2, d2 ≥ 13, and
d3 ≥ 3. Now
(ai/di) = (1− β, 2β − 1−
2d2β−(d2+1)
(2d2+d3+2)β−(d2+1)
n−d2−2 )
where β := 1 + c1
d2+4d3+2d
d2d3(d2+2d3+4)
. We find that 11
< β < 1
and so a1
The above facts imply that ∆ā is again in case (5)(ii), and for (−2i, 1j) associated to an element
of ā⊥ we must have i = 4. If j = 1, 2, Eq.(10.1) leads to a contradiction to the above dimension
restrictions. The case j = 3 can be eliminated using the method of part (B) in the proof of Lemma
10.12.
(B) Next let u = (0, 0, 1,−2). The nullity condition for c̄ implies that d3 = 2 and d4 > 32d2+14 ≥
46. Now (ai/di) is given by
CLASSIFICATION OF SUPERPOTENTIALS 53
(1− β, 1− 2β, d4−d2+1
+ (2d2+2−d4
)β, − 4
2(d2+2)β−(d2+1)
where β := 1 + c1
d24+2d2d4−16
2d4(2d2+d4+6)
. One easily sees that 1
< β < 1, so that 0 < a1
. Since
d4 > 2d2 + 2 we obtain 0 <
Therefore, ∆ā is in case (5)(ii) and for (−2i, 1j) associated to an element of ā⊥ we must have
(by Lemma 10.5) i = 2 and so d2 = 2. Putting this value of d2 into the nullity condition for c̄ gives
a cubic equation in d4 with no integral roots, a contradiction.
(C) Consider now u = (−1, 1, 0,−1). From the nullity condition for c̄ we deduce that d2 ≥
4, d3 = 2, d4 ≥ 3 and 4d2 > 3d4. Also,
(ai/di) = (−β,
+ 1− 2β,
(2d2+2−d4)β−(d2+1)
, − 2
2(d2+2)β−(d2+1)
where β := 1 + c1
2d2+2d4+d
d2d4(2d2+d4+6)
. It follows that 1
< β < 3
and a2
Now we see that ∆ā is either in case (1) or (4) or (5)(ii). In the first two instances, by Lemmas
10.11, 10.7 d is a permutation of (4, 2, 2, 9). Since 3d4 < 4d2 we have d2 = 9, d4 = 4. But then the
null condition for c̄ is violated. So we are in case (5)(ii). For (−2i, 1j) associated to an element of
ā⊥, as di = 2, we have i = 1 or 3.
If i = 1, then Eq.(10.1) becomes
= 1 − 2β < 0, so j = 3, 4. When j = 3 the value of β given
above together with Eq.(10.1) imply that d4 = 3 or 4. But then the null condition for c̄ is violated.
For j = 4 we may use the argument of part (B) of the proof of Lemma 10.12.
If i = 3, using β > 1
in Eq.(10.1), we see that j = 2, 4. In either case, applying our bounds for
the dimensions in Eq.(10.1) lead to contradictions.
(D) Let u = (−1,−1, 1, 0). The null condition for c̄ implies that d4 = 2, d3 ≥ 3, d2 ≥ 5. With
β := 1 + c1
, we have
(ai/di) = (−β, 1 − 2β −
2d2β−(d2+1)
(2d2+d3+2)β−(d2+1)
One computes that β = 1
2d2+d
3+2d3
d2(3d
3+2d2d3+6d3)
, and from the dimension bounds one gets 5
≤ β <
. ∆ā cannot be in case (1) or (4), otherwise as d2 ≥ 5, we must have d2 = 9, d3 = 4, and the null
condition for c̄ is violated. So ∆ā is in case (5)(ii). For (−2i, 1j) associated to an element of ā⊥,
we must then have i = 1, 4.
If i = 1, then Eq.(10.1) is
= 1 − 2β > 0, so j = 3 or 4. In either situation, we may apply
the argument of part (B) of the proof of Lemma 10.12 to get a contradiction. If i = 4, Eq.(10.1)
together with the dimension bounds above show first that we can only have j = 3. In that case, a
more detailed look at Eq.(10.1) leads to a contradiction.
We can summarise our discussions thus far by
Theorem 10.15. Let r ≥ 4 and K be connected. Suppose that we are not in the situation of
Theorem 3.14. Assume that c̄ ∈ C is a null vector such that ∆c̄ has the property that there is
a unique vertex of type (1B) and all other vertices are of type (1A). Then the only possibilities
are given by Lemmas 10.7/10.11 and 10.9, up to interchanging ā and c̄ and a permutation of the
irreducible summands.
We will now sharpen the above Theorem using Proposition 3.7.
Corollary 10.16. Let r ≥ 4. Assume that K is connected and we are not in the situation of
Theorem 3.14. Then the possibilities given by Lemmas 10.7 and 10.9 (and hence Lemma 10.11)
cannot occur.
Proof. We will discuss the r = 4 case (i.e. that in Lemmas 10.7, 10.11) in detail and leave the
details of the r = 5 case (from Lemma 10.9) to the reader, as the arguments are very similar.
In the r = 4 case, first observe that C has exactly two null vectors, c̄ and ā in the notation of
Lemma 10.7, as the entries of a, c are determined by the vector d of dimensions. Hence, by Prop.
3.3, these are the only elements of C outside conv(1
(d+W)).
54 A. DANCER AND M. WANG
Since (−14) is a vertex of W, all type II vectors in W must be zero in place 4. As (1,−1,−1, 0)
is associated to an element of c̄⊥, it, together with (−1, 1,−1, 0), (−1,−1, 1, 0) are the only type II
vectors in W.
Next we analyse vectors in W and see if they are associated to elements of C, this last property
being important for applying Prop 3.7. Recall that (1,−2, 0, 0), (1, 0,−2, 0), (−2, 1, 0, 0), (−2, 0, 1, 0)
must be in W. The first two give elements of c̄⊥, the last two give elements of ā⊥.
First consider v = (1,−2, 0, 0). Now v̄ is a vertex of conv(1
(d + W)). By the superpotential
equation, d+ v = 2v̄ can be written as c̄(α)+ c̄(β), with c̄(α), c̄(β) ∈ C. Since v̄ is a vertex, every such
expression must involve ā or c̄, unless it is the trivial expression v̄ + v̄ and v̄ ∈ C. By computing
2v−a, 2v−c we find that these cannot lie in conv(W) (it is enough to exhibit one component < −2
or > 1). Thus v̄ ∈ C. Now an analogous argument shows that if w = (0,−2, 1, 0) is in W then
w̄ also lies in C. But vw is an edge of conv(W) with no interior points in W. So Prop 3.7 gives
= 4J(v̄, w̄) = 0, a contradiction to d2 = 2. Hence w /∈ W. Similarly we see (0,−2, 0, 1) /∈ W.
Next consider z = (−1,−1, 1, 0). By Remark 1.2(e) and the above, z is a vertex of conv(W). As
above we can show that z̄ ∈ C. Now v, z are the only elements of the face {x1+2x2 = −3}∩conv(W)
(cf. proof of Prop. 4.3 in [DW4]). So applying Prop 3.7 to vz we obtain 0 = 4J(v̄, z̄) = 1
contradiction.
To handle the r = 5 case (from Lemma 10.9), first observe that null elements of C must have
entries 2
in two of the places 1, · · · , 4 and −2
in the other two places. For a, c as in Lemma 10.9,
we can take (1,−2, 0, 0, 0), (1, 0,−2, 0, 0), (0, 1, 0,−2, 0), (−2, 1, 0, 0, 0) to lie in W. The argument
above to show that v̄ is in C still works for such type III vectors v. As above, we can use Prop 3.7
to show the other type III vectors (−2i, 1j) (i ≤ 4) do not lie in W; hence the ā, c̄ of Lemma 10.9
are the only null elements of C.
Let z = (−1, 1,−1, 0, 0) (it lies in W since (1,−1,−1, 0, 0) is associated to an element of c̄⊥) and
v = (1, 0,−2, 0, 0). As above we find z̄ is in C, and the arguments of Prop 4.3 in [DW4] show vz is
an edge of conv(W). A contradiction results as above from applying Prop 3.7 to vz.
The discussion at the beginning of this section now tells us that if K is connected and r ≥ 4 the
only case when we have a superpotential of the kind under discussion is that of Theorem 3.14. The
proof of Theorem 2.1 is now complete.
Concluding remarks.
1. When r = 2, then c is collinear with the elements of W. In other words, the projected
polytope ∆c̄ reduces to a single vertex, which must be of type (2). The possible elements of W are
(−2, 1), (−1, 0), (0,−1), (1,−2). If W has just two elements then Theorem 9.2 tells us we are either
in the situation of Theorem 3.14 (the Bérard Bergery examples), or in Example 8.2 or the third case
of Example 8.3 in [DW4]. In fact one can show that this last possibility can be realised in the class
of homogeneous hypersurfaces exactly when (d1, d2) = (8, 18). An example for these dimensions is
provided by G = SU(2)9 ⋉ Sym(9) (where Sym(9) acts on SU(2)9 by permutation) and K is the
product of the diagonal U(1) in SU(2)9 with Sym(9). The arguments of [DW2] show that this in
fact gives an example where the cohomogeneity one Ricci-flat equations are fully integrable.
If W has three elements, we may adapt the proof of Theorem 3.11 to derive a contradiction.
Here the essential point is that whenever we had to check that a sum of two elements of C does not
lie in d+W, such a fact remains true because the interior point of vw is the midpoint.
If W contains all four possible elements, then k ⊂ g is a maximal subalgebra (with respect to
inclusion). We suspect that this case also does not occur. In any event, it is of less interest because
the only way to obtain a complete cohomogeneity one example is by adding a Z/2-quotient of the
principal orbit as special orbit.
2. The only parts of this paper which depend on K being connected (or slightly more generally,
on the condition in Remark 2.4) are parts of §5, Case (ii) of §9, and all of §10. To remove this
CLASSIFICATION OF SUPERPOTENTIALS 55
condition, the main task would be generalizing Theorem 5.18 by getting a better handle on the
type II vectors associated to (1A) vertices (cf Lemmas (5.6)-(5.8)).
References
[BB] L. Bérard Bergery: Sur des nouvelles variétés riemanniennes d’Einstein, Publications de l’Institut Elie
Cartan, Nancy (1982).
[BGGG] A. Brandhuber, J. Gomis, S. Gubser and S. Gukov: Gauge theory at large N and new G2 holonomy
metrics, Nuclear Phys. B, 611, (2001), 179-204.
[CGLP1] M. Cvetic̆, G. W. Gibbons, H. Lü and C. N. Pope: Hyperkähler Calabi metrics, L2 harmonic forms,
Resolved M2-branes, and AdS4/CFT3 correspondence, Nuclear Phys. B, 617, (2001), 151-197.
[CGLP2] M. Cvetic̆, G. W. Gibbons, H. Lü and C. N. Pope: Cohomogeneity one manifolds of Spin(7) and G2
holonomy, Ann. Phys., 300, (2002), 139-184.
[CGLP3] M. Cvetic̆, G. W. Gibbons, H. Lü and C. N. Pope: Ricci-flat metrics, harmonic forms and brane resolu-
tions, Commun. Math. Phys. 232, (2003), 457-500.
[DW1] A. Dancer and M. Wang: Kähler-Einstein metrics of cohomogeneity one, Math. Ann., 312, (1998), 503-
[DW2] A. Dancer and M. Wang: Integrable cases of the Einstein equations, Commun. Math. Phys., 208, (1999),
225-244.
[DW3] A. Dancer and M. Wang: The cohomogeneity one Einstein equations from the Hamiltonian viewpoint, J.
reine angew. Math., 524, (2000), 97-128.
[DW4] A. Dancer and M. Wang: Superpotentials and the cohomogeneity one Einstein equations, Commun. Math.
Phys., 260, (2005), 75-115.
[DW5] A. Dancer and M. Wang: Notes on Face-listings for “Classification of superpotentials”, posted at
http://www.maths.ox.ac.uk/... or http://www.math.mcmaster.ca/mckenzie.
[GGK] V. Ginzburg, V. Guillemin and Y. Karshon: Moment maps, cobordisms, and Hamiltonian group actions,
AMS Mathematical Surveys and Mongraphs, Vol. 98, (2002).
[EW] J. Eschenburg and M. Wang: The initial value problem for cohomogeneity one Einstein metrics, J. Geom.
Anal., 10 (2000), 109-137.
[WW] Jun Wang and M. Wang: Einstein metrics on S2-bundles, Math. Ann., 310, (1998), 497-526.
[WZ1] M. Wang and W. Ziller: Existence and non-existence of homogeneous Einstein metrics, Invent. Math., 84,
(1986), 177-194.
[Zi] G. M. Ziegler: Lectures on Polytopes, Graduate Texts in Mathematics, Vol. 152, Springer-Verlag, (1995).
Jesus College, Oxford University, Oxford, OX1 3DW, United Kingdom
E-mail address: dancer@maths.ox.ac.uk
Department of Mathematics and Statistics, McMaster University, Hamilton, Ontario, L8S 4K1,
Canada
E-mail address: wang@mcmaster.ca
http://www.maths.ox.ac.uk/
http://www.math.mcmaster.ca/mckenzie
	0. Introduction
	1. Review and notation
	2. The classification theorem and the strategy of its proof
	3. Projection onto a hyperplane
	4. The sign of J(, )
	5. Vectors orthogonal to a null vertex
	6. Adjacent (1B) vertices
	7. More than one type (2) vertex
	8. Adjacent (1B) vertices revisited
	9. Type (2) vertices
	10. Completing the classification
	References
ABSTRACT
  We extend our previous classification of superpotentials of ``scalar
curvature type" for the cohomogeneity one Ricci-flat equations. We now consider
the case not covered in our previous paper, i.e., when some weight vector of
the superpotential lies outside (a scaled translate of) the convex hull of the
weight vectors associated with the scalar curvature function of the principal
orbit. In this situation we show that either the isotropy representation has at
most 3 irreducible summands or the first order subsystem associated to the
superpotential is of the same form as the Calabi-Yau condition for submersion
type metrics on complex line bundles over a Fano K\"ahler-Einstein product.

<|endoftext|><|startoftext|>
Introduction
The minimum degree δ(D) of a digraph D is the minimum of its minimum
outdegree δ+(D) and its minimum indegree δ−(D). When referring to paths and
cycles in digraphs we always mean that these are directed without mentioning
this explicitly. A digraph D is ℓ-linked if |D| ≥ 2ℓ and if for every sequence
x1, . . . , xℓ, y1, . . . , yℓ of distinct vertices there are disjoint paths P1, . . . , Pℓ in D
such that Pi joins xi to yi. Since this is a very strong and useful property to have
in a digraph, the question of course arises how it can be forced by other properties.
In the case of (undirected) graphs, much progress has been made in this di-
rection. In particular, linkedness is closely related to connectivity: Bollobás and
Thomason [2] showed that every 22k-connected graph is k-linked (this was recently
improved to 10k by Thomas and Wollan [17]). However, for digraphs the situa-
tion is quite different: Thomassen [18] showed that for all k there are strongly
k-connected digraphs which are not even 2-linked.
Our first result determines the minimum degree forcing a (large) digraph to be
ℓ-linked, which confirms a conjecture of Manoussakis [16] for large digraphs.
Theorem 1. Let ℓ ≥ 2. Every digraph D of order n ≥ 1600ℓ3 which satisfies
δ(D) ≥ n/2 + ℓ− 1 is ℓ-linked.
It is not hard to see that the bound on minimum degree in Theorem 1 is best
possible (see Proposition 3). It is also easy to see that for ℓ = 1 the correct bound
is δ(D) ≥ ⌊n/2⌋. The cases ℓ = 2, 3 of Theorem 1 were proved by Heydemann and
Sotteau [9] and Manoussakis [16] respectively. Manoussakis [16] also determined
the number of edges which force a digraph to be ℓ-linked. A discussion of these
and related results can be found in the monograph by Bang-Jensen and Gutin [1].
Note that it does not make sense to ask for the minimum outdegree of a di-
graph D which ensures that D is ℓ-linked (or similarly, to ask for the minimum
indegree). Indeed, the digraph obtained from a complete digraph A of order n− 1
by adding a new vertex x which sends an edge to every vertex in A has minimum
outdegree n− 2 but is not even 1-linked.
A slightly weaker notion is that of a k-ordered digraph: a digraph D is k-ordered
if |D| ≥ k and if for every sequence s1, . . . , sk of distinct vertices ofD there is a cycle
http://arxiv.org/abs/0704.0211v1
2 DANIELA KÜHN AND DERYK OSTHUS
which encounters s1, . . . , sk in this order. It is not hard to see that every ℓ-linked
digraph is also ℓ-ordered. Conversely, every 2ℓ-ordered digraph D is also ℓ-linked:
if x1, . . . , xℓ, y1, . . . , yℓ is a sequence of vertices as in the definition of ℓ-linkedness
then a cycle which encounters x1, y1, x2, y2, . . . , xℓ, yℓ in this order would yield the
paths required for the linking. The next result says that as far as the minimum
degree is concerned it is no harder to guarantee the 2ℓ paths forming such a cycle
than to guarantee just the ℓ paths required for the linking. In particular, note that
Theorem 2 immediately implies Theorem 1.
Theorem 2. Let k ≥ 2. Every digraph D of order n ≥ 200k3 which satisfies
δ(D) ≥ (n+ k)/2 − 1 is k-ordered.
Again, the bound on the minimum degree is best possible (see Proposition 4).
Moreover, it is easy to see that if k = 1 then the correct bound is δ(D) ≥ n/2− 1.
The proof of Theorem 2 yields paths between the k ‘special’ vertices whose length
is at most 6 and it is also easy to translate the proof into an algorithm which finds
these paths in polynomial time (see the remarks after the end of the proof).
Somewhat surprisingly, the minimum degree in both theorems is not quite the
same as in the undirected case: Kawarabayashi, Kostochka and Yu [12] proved
that the smallest minimum degree which guarantees a graph on n vertices to be
ℓ-linked is ⌊n/2⌋ + ℓ − 1 for large n. (Egawa et al. [4] independently determined
the smallest minimum degree which guarantees the existence of ℓ disjoint cycles
containing ℓ specified independent edges, which is clearly a very similar property.)
Kierstead, Sarközy and Selkow [13] proved that the smallest minimum degree which
guarantees a graph on n vertices to be k-ordered is δ(D) ≥ ⌈n/2⌉ + ⌊k/2⌋ − 1 for
large n. So in the undirected case the ‘2ℓ-ordered’ result does not quite imply the
‘ℓ-linked’ result. The proofs in [4, 12, 13] do not seem to generalize to digraphs.
2. Further work and open problems
In a sequel to this paper, we hope to apply Theorem 2 to obtain the following
stronger results, which would also generalize the theorem of Ghouila-Houri [6] that
any digraph D on n vertices with δ(D) ≥ n/2 contains a Hamilton cycle: we aim
to apply Theorem 2 to show that if k ≥ 2 and D is a sufficiently large digraph
whose minimum degree is as in Theorem 2 then D is even k-ordered Hamiltonian,
i.e. for every sequence s1, . . . , sk of distinct vertices of D there is a Hamilton cycle
which encounters s1, . . . , sk in this order. One can use this to prove that the
minimum degree condition in Theorem 1 already implies that the digraph D is
Hamiltonian ℓ-linked, i.e. the paths linking the pairs of vertices span the entire
vertex set of D. Note that this in turn would immediately imply that D is ℓ-arc
ordered Hamiltonian, i.e. D has a Hamilton cycle which contains any ℓ disjoint
edges in a given order. Note that in each case the examples in Section 3 show
that the minimum degree condition would be best possible. Undirected versions
of these statements were first obtained by Kierstead, Sarközy and Selkow [13] and
Egawa et al. [4] respectively (and a common generalization of these in [3]).
For graphs, the concepts ‘ℓ-linked’ and ‘k-ordered’ were generalized to ‘H-linked’
by Jung [11]: a graph G isH-linked ifG contains a subdivision ofH with prescribed
branch vertices (so G is k-ordered if and only if it is Ck-linked). The minimum
LINKEDNESS AND ORDERED CYCLES IN DIGRAPHS 3
degree which forces a graph to be H-linked for an arbitrary H was determined
in [5, 14, 15, 7]. Clearly, one can ask similar questions also for digraphs.
Finally, we believe that the bound on n which we require in Theorem 2 (and
thus in Theorem 1) can be reduced to one which is linear in k.
3. Notation and extremal examples
Before we discuss the examples showing that the bounds on the minimum degree
in Theorems 1 and 2 are best possible, we will introduce the basic notation used
throughout the paper. A digraph D is complete if every pair of vertices of D is
joined by edges in both directions. The order |D| of a digraph D is the number of
its vertices. We write N+(x) for the outneighbourhood of a vertex x and d+(x) :=
|N+(x)| for its outdegree. Similarly, we write N−(x) for the inneighbourhood of a
vertex x and d−(x) := |N−(x)| for its indegree. We set d(x) := min{d+(x), d−(x)}.
Given a set A of vertices of D, we write N+
(x) for the set of all outneighbours
of x in A. N−
(x), d+
(x) and d−
(x) are defined similarly. Given two vertices x, y
of a digraph D, an x-y path in D is a directed path which joins x to y. Given two
disjoint vertex sets A and B of D, an A-B edge is an edge
ab where a ∈ A and
b ∈ B.
The following proposition shows that the bound on the minimum degree in
Theorem 1 cannot be reduced.
Proposition 3. For every ℓ ≥ 2 and every n ≥ 2ℓ there exists a digraph D on n
vertices with minimum degree ⌈n/2⌉+ ℓ− 2 which is not ℓ-linked.
Proof. We will distinguish the following cases.
Case 1. n is even.
Let D be the digraph which consists of complete digraphs A and B of order n/2+
ℓ − 1 which have precisely 2ℓ − 2 vertices in common. To see that D is not ℓ-
linked let x1, . . . , xℓ−1, y1, . . . , yℓ−1 denote the vertices in A∩B. Pick some vertex
xℓ ∈ A \ B and some vertex yℓ ∈ B \ A. Then D does not contain disjoint paths
between xi and yi for all i = 1, . . . , ℓ. The minimum degree of D is attained by
the vertices in (A \B) ∪ (B \A) and thus is as desired.
Case 2. n is odd.
In this case, we define D as follows. Let A and B be disjoint complete digraphs
of order ⌈n/2⌉ − ℓ − 1. Add a complete digraph X of order 2ℓ − 3 and join all
vertices in X to all vertices in A ∪ B with edges in both directions. Add a set
S := {x1, x2, y1, y2} of 4 new vertices such that each vertex in S is joined to each
vertex in X with edges in both directions. Moreover, we add all the edges between
different vertices in S except for −−→x1y1 and
−−→x2y2. Finally, we connect the vertices
in S to the vertices in A ∪ B as follows. Both x1 and y1 receive edges from every
vertex in B and send edges to every vertex in A. Additionally, x1 will receive an
edge from every vertex in A and y1 will send an edge to every vertex in B. Both x2
and y2 receive edges from every vertex in A and send edges to every vertex in B.
Additionally, x2 will receive an edge from every vertex in B and y2 will send an
edge to every vertex in A (see Figure 1).
To check that D has the required minimum degree, consider first any vertex a ∈
A. As a sends edges to 3 vertices in S and receives edges from 3 such vertices,
4 DANIELA KÜHN AND DERYK OSTHUS
PSfrag replacements
Figure 1. The digraph D in Case 2 of Proposition 3. The dashed
arrows indicate the missing edges between x1 and y1 and between x2
and y2.
we have that d(a) = |A| − 1 + |X| + 3 = ⌈n/2⌉ + ℓ − 2. It follows similarly
that the vertices in B have the correct degree. Thus consider any vertex s ∈ S.
Then s sends edges to all vertices in A or to all vertices in B (or both) and s
receives edges from all vertices in A or from all vertices in B (or both). Thus
d(s) = |A| + |X| + 2 = ⌈n/2⌉ + ℓ − 2. It is easy to check that the vertices in X
have the required degree and thus δ(D) = ⌈n/2⌉+ ℓ− 2.
To see that D is not ℓ-linked, let x, x3, . . . , xℓ, y3, . . . , yℓ denote the vertices
in X. Then we cannot link xi to yi for each i = 1, . . . , ℓ since every x1-y1 path
must meet X ∪ {x2, y2} (and thus would contain x) and the analogue is true for
every x2-y2 path. �
We conclude this section with the examples showing that the bound on the
minimum degree in Theorem 2 is best possible.
Proposition 4. For every k ≥ 2 and every n ≥ 2k there exists a digraph D on n
vertices with minimum degree ⌈(n+ k)/2⌉ − 2 which is not k-ordered.
Proof. We will distinguish the following cases.
Case 1. k ≥ 3 is odd and n is even.
In this case, we define D as follows. Let A and B be disjoint complete digraphs
of order n/2 − k + 1. Add a complete digraph X of order k − 2 and join all its
vertices to all vertices in A ∪ B with edges in both directions. Add new vertices
s1, . . . , sk such that every si is joined to all vertices in X with edges in both
directions. Moreover, we add all the edges −−→sisj for j 6= i, i + 1 where sk+1 := s1.
We also add the edge −−→s1s2. Finally, we connect the si to the vertices in A ∪ B
as follows. Both s1 and s2 receive edges from every vertex in B and send edges
to every vertex in B. Additionally, s1 will send an edge to every vertex in A and
s2 will receive an edge from every vertex in A. Each of s3, s5, . . . , sk receives an
edge from every vertex in A and sends an edge to every vertex in A. Each of
s4, s6, . . . , sk−1 receives an edge from every vertex in B and sends an edge to every
vertex in B (see Figure 2). Let us now check that the minimum degree of the
digraph D thus obtained is as required. Let S := {s1, . . . , sk}. Note that each
LINKEDNESS AND ORDERED CYCLES IN DIGRAPHS 5
PSfrag replacements
Figure 2. The digraph D for k = 5 in Case 1 of Proposition 4.
The dashed arrows indicate missing edges between the vertices si.
vertex v ∈ A ∪ B sends edges to precisely (k + 1)/2 vertices in S and receives
edges from precisely that many vertices. Since |A| = |B|, it follows that d(v) =
|A| − 1 + |X|+ (k + 1)/2 = n/2− 2 + (k + 1)/2 = ⌈(n + k)/2⌉ − 2. Now consider
any si ∈ S. Then si receives edges from either all vertices in A or all vertices in B
(or both) and si sends edges to either all vertices in A or all vertices in B (or both).
Hence d(si) ≥ |A|+ |X| + |S| − 2 = n/2− 1 + k − 2 ≥ ⌈(n + k)/2⌉ − 2. It is easy
to check that the degree of the vertices in X is > ⌈(n+ k)/2⌉ − 2.
To see that D is not k-ordered note that every cycle in D which encounters
s1, . . . , sk in this order would use at least one vertex from X between si and si+1
for every i 6= 1 (see Figure 2). But since |X| = k − 2 this is impossible.
Case 2. k is even.
LetD be the digraph which consists of a complete digraph A of order ⌈n/2⌉+k/2−1
and a complete digraph B of order ⌊n/2⌋+ k/2 which has precisely k − 1 vertices
in common with A. It is easy to check that δ(D) = |A| − 1 = ⌈(n + k)/2⌉ − 2. To
see that D is not k-ordered, pick vertices s1, s3, . . . , sk−1 in A\B and s2, s4, . . . , sk
in B \ A. Then every cycle in D which encounters s1, . . . , sk in this order would
meet A∩B when going from si to si+1, i.e. it would meet A∩B k times, which is
impossible.
Case 3. k ≥ 3 is odd and n is odd.
This time we take D to be the digraph which consists of two complete digraphs A
and B of order (n + k)/2 − 1 having k − 2 vertices in common. Then δ(D) =
|A| − 1 = (n+ k)/2− 2. To see that D is not k-ordered, pick vertices s1, s3, . . . , sk
in A \B and s2, s4, . . . , sk−1 in B \ A. �
Note that in the proof of Proposition 4 we could have omitted the (easy) case
when k is even as Proposition 3 already gives a digraph of the required minimum
degree which is not k/2-linked and thus not k-ordered.
6 DANIELA KÜHN AND DERYK OSTHUS
4. Proof of Theorem 2
We first prove Theorem 2 for the case when k = 2. So suppose that D is a
digraph of minimum degree at least ⌈n/2⌉. Let s1 and s2 be the vertices which
our cycle has to encounter. If −−→s1s2 is not an edge then s1, s2 /∈ N
+(s1) ∪ N
−(s2)
and so |N+(s1) ∩N
−(s2)| ≥ 2δ(D) − (n− 2) ≥ 2. Similarly, if
−−→s2s1 is not an edge
then |N−(s1) ∩N
+(s2)| ≥ 2. Altogether this shows that there is a cycle of length
at most 4 which contains both s1 and s2.
Thus we may assume that k ≥ 3 and that D is a digraph of minimum degree
at least ⌈(n + k)/2⌉ − 1. Let S := (s1, . . . , sk) be the given sequence of vertices
of D which our cycle has to encounter. We will call these vertices special and will
sometimes also use S for the set of these vertices. We set sk+1 := s1. Given a set
I ⊆ [k] and a family T := (ti)i∈I of positive integers, an (S, I, T )-system is a family
(Pi)i∈I where each Pi is a set of ti paths joining si to si+1 and each path in Pi
has length at most 6 and is internally disjoint from S, from all other paths in Pi
and from the paths in all the other Pj . An (S, I)-system is an (S, I, T )-system
where ti = 1 for all i ∈ I. Thus to prove Theorem 2 we have to show that there
exists an (S, [k])-system.
Let I be the set of all those indices i ∈ [k] for which D does not contain at
least 6k internally disjoint si-si+1 paths of length at most 6.
Claim 1. It suffices to show that D contains an (S, I)-system.
Indeed, suppose that (Pi)i∈I is an (S, I)-system in D. So each Pi contains precisely
one path Pi. We will show that for every i ∈ [k]\I we can find an si-si+1 path Pi of
length at most 6 which meets S only in si and si+1 such that all the paths P1, . . . , Pk
are internally disjoint. We will choose such a path Pi for every i ∈ [k] \ I in turn.
Suppose that next we want to find Pj . Recall that since j ∈ [k] \ I the digraph D
contains a set P of at least 6k internally disjoint sj-sj+1 paths of length at most 6.
Since at most 5(k − 1) + k < 6k vertices of D lie in S or in the interior of some of
the other paths Pi, one of the paths in P must be internally disjoint from S and
all the other paths Pi, and so we can take this path to be Pj . This proves Claim 1.
In order to prove the existence of an (S, I)-system, choose an (S, J, T )-system
(Pj)j∈J in D such that J ⊆ I is as large as possible and subject to this
j∈J tj is
maximal. Note that tj < 6k for all j ∈ J since J ⊆ I. Assume that |J | < |I|. By
relabelling the special vertices, we may assume that k ∈ I \ J . So we would like
to extend (Pj)j∈J by a suitable sk-s1 path. Let X
′ be the set of all those vertices
which lie in the interior of some path belonging to (Pj)j∈J . Note that
|S ∪X ′| < 6k · 5(k − 1) + |S| < 30k2 =: k0.
Let A := N+(sk) \ (S ∪X
′) and B := N−(s1) \ (S ∪X
′). Then
(1) |A|, |B| ≥ δ(D) − |S ∪X ′| ≥ n/2− k0.
Moreover, A ∩ B = ∅ as otherwise we could extend our (S, J, T )-system (Pj)j∈J
by adding the path Pk := skxs1 where x ∈ A ∩B, a contradiction to the choice of
(Pj)j∈J . In particular, this shows that the set X
′′ of all vertices outside A ∪ B ∪
S ∪X ′ has size at most 2k0 and thus, setting Y := S ∪X
′ ∪X ′′, we have that
|Y | ≤ 3k0.
LINKEDNESS AND ORDERED CYCLES IN DIGRAPHS 7
Note that D does not contain an edge
ab with a ∈ A and b ∈ B. Indeed, otherwise
we could extend (Pj)j∈J by adding the path Pk := skabs1. We will often use the
following claim.
Claim 2. Let a ∈ A and let A′ ⊆ A be a set of size at least k0. Then N
+(a)∩A′ 6=
∅. Similarly, if b ∈ B and B′ ⊆ B is a set of size at least k0 then N
−(b) ∩B′ 6= ∅.
Suppose that N+(a) ∩ A′ = ∅. Then (1) together with the fact that D does not
contain an A-B edge implies that d+(a) ≤ n − |B| − k0 ≤ n/2, a contradiction.
The proof of the second part of the claim is similar.
We say that a special vertex si has out-type A if si sends at least k0 edges to A.
Similarly we define when si has out-type B, in-type A and in-type B. As |Y |+2k0 ≤
5k0 ≤ δ(G), it follows that each si has out-type A or out-type B (or both) and
in-type A or in-type B (or both). Note that s1 has in-type B but not in-type A
whereas sk has out-type A but not out-type B.
Claim 3. Let j ∈ J . If sj has out-type A then sj+1 has in-type B but not in-type A.
Similarly, if sj has out-type B then sj+1 has in-type A but not in-type B.
Suppose that sj has out-type A and sj+1 has in-type A. Let a ∈ N
(sj). Claim 2
implies that a sends an edge to one of the at least k0 vertices in N
(sj+1). Let
a′ ∈ N−
(sj+1) be such a neighbour of a. Then we could extend our (S, J, T )-
system by adding the path sjaa
′sj+1, a contradiction. The proof of the second
part of Claim 3 is similar.
Claim 4. No vertex in B sends an edge to A.
Suppose that
b∗a∗ is an edge of D, where a∗ ∈ A and b∗ ∈ B. Given vertices
a ∈ A and b ∈ B, put Nab := N
+(a) ∩ N−(b). Note that Nab ⊆ Y and a, b /∈
N+(a) ∪N−(b) as D does not contain an A-B edge. Thus
(2) |Nab| ≥ 2δ(D) − (n− 2) =
− (n− 2) ≥ k.
Let us now show that no special vertex si with i ∈ J has out-type B. So suppose
i ∈ J and si has out-type B. Then Claim 3 implies that si+1 has in-type A. By
Claim 2 some of the at least k0 vertices in N
(si+1) receives an edge from a
∗. Let
a′ be such a vertex. Similarly, some of the vertices in N+
(si) sends an edge to b
Let b′ be such a vertex. Then we could extend our (S, J, T )-system by adding
the path sib
′b∗a∗a′si+1, a contradiction. This shows that whenever si is a special
vertex of out-type B then i /∈ J . Let Q denote the set of such vertices si. Note
that sk 6∈ Q as sk does not have out-type B. Thus each special vertex in Q forbids
one index in J . Altogether this shows that
(3) |J | ≤ k − 1− |Q|.
Let SA be the set of all those special vertices si with 1 ≤ N
(si) < k0. Let SB
be the set of all those special vertices si with 1 ≤ N
(si) < k0. Let A
∗ be the set
of all those vertices in A which do not send an edge to some vertex in SA. Then
|A∗| > |A| − kk0. Similarly, let B
∗ be the set of all those vertices in B which do
not receive an edge from some vertex in SB. Then |B
∗| > |B| − kk0.
8 DANIELA KÜHN AND DERYK OSTHUS
PSfrag replacements
Figure 3. Modifying our (S, J, T )-system in the proof of Claim 4.
Consider any pair a, b with a ∈ A∗ and b ∈ B∗. As each special vertex in Nab
belongs to Q, it follows that
(4) |Nab \ S| ≥ |Nab| − |Q|
≥ k − |Q|
> |J |.
Suppose first that J 6= ∅. Given j ∈ J , let X ′′j be the union of X
′′ with the set
of all vertices lying in the interior of paths in Pj. As Nab ⊆ Y there must be
an index jab ∈ J such that Nab contains at least two vertices in X
. Note that
|A∗|, |B∗| > 2k0k. Thus there are 2k0+1 disjoint pairs a, b for which this index jab
must be the same. Let aq, bq (q = 0, . . . , 2k0) denote these pairs and let j ∈ J
denote the common index.
Note that sj has out-type A since we have seen before that no special vertex si
with i ∈ J has out-type B. Claim 3 now implies that sj+1 has in-type B. Pick
vertices a ∈ N+
(sj) and b ∈ N
(sj+1) such that a 6= a0 and b 6= b0. Claim 2
implies that there are indices q1, . . . , qk0 such that a sends an edge to each aqr .
Apply Claim 2 again to find an index r ≤ k0 such that b receives an edge from bqr .
Let x ∈ Naqr bqr and y ∈ Na0b0 be distinct vertices such that x, y ∈ X
j . We
can now modify our (S, J, T )-system to obtain an (S, J ∪ {k}, T ′)-system in D
by replacing Pj with the single path sjaaqrxbqrbsj+1 and adding the sk-s1 path
ska0yb0s1 (see Figure 3). If J = ∅ then we just add the sk-s1 path (which is still
guaranteed by (4)). In both cases this contradicts the choice of our (S, J, T )-system
and completes the proof of Claim 4.
Claim 5. Let a ∈ A and let A′ ⊆ A be a set of size at least k0. Then N
−(a)∩A′ 6=
∅. Similarly, if b ∈ B and B′ ⊆ B is a set of size at least k0 then N
+(b) ∩B′ 6= ∅.
Using Claim 4, this can be shown similarly as Claim 2.
Let S+
be the set of all those special vertices which send an edge to A and let S−
be the set of all those special vertices which receive an edge from A. Define S+
and S−
similarly. Note that these sets are not disjoint. The proof of the next claim
LINKEDNESS AND ORDERED CYCLES IN DIGRAPHS 9
is similar to that of Claim 3. (To prove the second and third part of Claim 6 we
use Claim 5 instead of Claim 2.)
Claim 6. If j ∈ J and sj ∈ S
then sj+1 cannot have in-type A. If j − 1 ∈ J
and sj ∈ S
then sj−1 cannot have out-type A. If j ∈ J and sj ∈ S
then sj+1
cannot have in-type B. Finally, if j − 1 ∈ J and sj ∈ S
then sj−1 cannot have
out-type B.
Let q+
:= |S+
| and define q−
and q−
similarly. Let Ȳ := V (D) \ Y = A ∪ B
and X := X ′ ∪X ′′ = Y \ S. Consider any pair a, b with a ∈ A and b ∈ B. Then
− 1 ≤ |N+(a)| ≤ q−
+ |N+
(a)|+ |N+
− 1 ≤ |N−(b)| ≤ q+
+ |N−
(b)|+ |N−
(b)|.
Since N+
(a)∩N−
(b) = ∅ (as D does not contain an A-B edge) and a, b /∈ N+
(b) we have
(a)|+ |N−
(b)| ≤ |Ȳ | − 2 = n− |X| − k − 2.
Adding (5) and (6) together now gives
− 2 ≤ q−
+ |N+
(a)|+ |N−
(b)|+ n− |X| − k − 2.
Hence
(a)∩N−
(b)| ≥ |N+
(a)|+|N−
(b)|−|X| ≥ 2
−n+k−q−
≥ 2k−q−
Similarly, using Claim 4, one can show that
(8) |N−
(a) ∩N+
(b)| ≥ 2k − q+
Consider any j ∈ J . Recall that by Claim 3 we have that either sj has out-
type A and sj+1 has in-type B or sj has out-type B and sj+1 has in-type A.
Let JAB denote the set of all those indices j ∈ J for which the former holds and
let JBA be the set of all those j ∈ J for which the latter holds. Our next aim is
to estimate jAB := |JAB | and jBA := |JBA|. Note that Claim 6 implies that if
sj ∈ S
then j /∈ JAB . As sk /∈ S
and k /∈ J , this shows that
jAB ≤ k − 1− |S
\ {sk}| = k − 1− |S
| = k − 1− q+
Also, if sj ∈ S
then j − 1 /∈ JAB by Claim 6. As s1 /∈ S
, this shows that
jAB ≤ k − 1− |S
\ {s1}| = k − 1− |S
| = k − 1− q−
Adding these two inequalites gives
(9) jAB ≤ k − 1−
In order to give an upper bound for jBA, note that if sj ∈ S
then j /∈ JBA by
Claim 6. Thus
jBA ≤ k − 1− |S
\ {sk}| ≤ k − q
10 DANIELA KÜHN AND DERYK OSTHUS
Also, if sj ∈ S
then j − 1 /∈ JBA by Claim 6. Thus
jBA ≤ k − 1− |S
\ {s1}| ≤ k − q
Adding these two inequalites gives
(10) jBA ≤ k −
Our next aim is to show that D contains a (S, J ∪{k})-system. This will complete
the proof of Theorem 2 since it contradicts the choice of our (S, J, T )-system.
Pick distinct vertices a0 ∈ A, aj ∈ N
(sj) for all j ∈ JAB , a
j ∈ N
(sj+1) for
all j ∈ JBA, b0 ∈ B, bj ∈ N
(sj+1) for all j ∈ JAB and b
j ∈ N
(sj) for all
j ∈ JBA. Choose a vertex x0 ∈ N
(a0) ∩ N
(b0) and link sk to s1 by the path
Qk := ska0x0b0s1. (This can be done since the right hand side of (7) is at least 2.)
To find the other paths, we distinguish two cases.
Case 1. jBA ≤ jAB
For all j ∈ JBA we pick a vertex xj ∈ N
(a′j) ∩N
(b′j) such that all these xj are
pairwise distinct and distinct from x0. Inequalities (8) and (10) together imply
that this can be done. Inequality (7) together with the fact that
2k − q−
− 1− jBA
≥ 2jAB + 1− jBA ≥ jAB + 1,
implies that for all j ∈ JAB we can now pick a vertex xj ∈ N
(aj) ∩ N
such that x0 and all the xj (j ∈ J) are pairwise distinct. If j ∈ JAB we link sj
to sj+1 by the path Qj := sjajxjbjsj+1. If j ∈ JBA we link sj to sj+1 by the path
Qj := sjb
jsj+1. The paths Qj (j ∈ J) and Qk are internally disjoint and have
length 4, so they form an (S, J ∪ {k})-system, as required.
Case 2. jBA > jAB
We proceed similiarly as in Case 1, but this time we choose the vertices xj ∈
(aj) ∩N
(bj) for all j ∈ JAB first. As
2k − q+
− 1− jAB
≥ 2jBA − 1− jAB > jBA − 1,
inequality (8) implies that we can then pick the vertices xj ∈ N
(a′j) ∩ N
(b′j)
for all j ∈ JBA. The paths Qj (j ∈ J) and Qk are then defined as before. This
completes the proof of Theorem 2.
Note that throughout the proof, the paths we constructed always had length at
most 6 (the only case where they had length exactly 6 was in the proof of Claim 4).
This means that the proof can easily be translated into polynomial algorithm so
that the exponent of the running time does not depend on k: We simply start with
any (S, J, T )-system with J ⊆ I. Now we go through the steps of the proof and
find a ‘better’ (S, J ′, T ′)-system with J ′ ⊆ I. Claim 1 implies that for fixed k we
only need to do this a bounded number of times. Since the paths we need have
length at most 6 and there are only a bounded number of cases to consider in
the proof, it is clear that one can find the better system in polynomial time with
exponent independent of k. Altogether this means that the problem of finding a
cycle encountering a given sequence of k vertices is fixed parameter tractable for
digraphs whose minimum degree satisfies the conditon in Theorem 2 (where k is
LINKEDNESS AND ORDERED CYCLES IN DIGRAPHS 11
the fixed parameter). The same applies to the problem of linking ℓ given pairs of
vertices. In general, even the problem of deciding whether a digraph is 2-linked
is already NP-complete [10]. For a survey on fixed parameter tractable digraph
problems, see [8].
5. Acknowledgement
We are grateful to Andrew Young for reading through the manuscript.
References
[1] J. Bang-Jensen and G. Gutin, Digraphs: Theory, Algorithms and Applications, Springer,
2000.
[2] B. Bollobás and A. Thomason, Highly linked graphs, Combinatorica 16 (1996), 313–320.
[3] G. Chen, R.J. Faudree, R.J. Gould, L. Lesniak and M.S. Jacobson, Linear forests and ordered
cycles, Discussiones Mathematicae – Graph theory 24 (2004), 47–54.
[4] Y. Egawa, R. Faudree, E. Györi, Y. Ishigami, R. Schelp and H. Wang, Vertex-disjoint cycles
containing specified edges, Graphs and Combinatorics 16 (2000), 81–92.
[5] M. Ferrara, R. Gould, G. Tansey and T. Whalen, On H-linked graphs, Graphs and Combi-
natorics 22 (2006), 217–224.
[6] A. Ghouila-Houri, Une condition suffisante d’existence d’un circuit Hamiltonien, C. R. Acad.
Sci. Paris 251 (1960), 495–497.
[7] R. Gould, A.V. Kostochka and G. Yu, On minimum degree implying that a graph is H-linked,
SIAM J. on Discrete Math., to appear.
[8] G. Gutin and A. Yeo, Some parameterized problems on digraphs, preprint 2007.
[9] M.C. Heydemann and D. Sotteau, About some cyclic properties in digraphs, J. Combinatorial
Theory B 38 (1985), 261–278.
[10] S. Fortune, J.E. Hopcroft, J. Wyllie, The directed subgraph homeomorphism problem, The-
oretical Computer Science 10 (1980), 111–121.
[11] H.A. Jung, Eine Verallgemeinerung des k-fachen Zusammenhangs für Graphen, Math. An-
nalen 187 (1970), 95–103.
[12] K. Kawarabayashi, A. Kostochka and G. Yu, On sufficient degree conditions for a graph to
be k-linked, Combinatorics, Probability and Computing, to appear.
[13] H. Kierstead, G. Sarközy and S. Selkow, On k-ordered Hamiltonian graphs, J. Graph The-
ory 32 (1999), 17–25.
[14] A. Kostochka and G. Yu, An extremal problem for H-linked graphs, J. Graph Theory 50
(2005), 321–339.
[15] A. Kostochka and G. Yu, Minimum degree conditions for H-linked graphs, Discrete Applied
Math., to appear.
[16] Y. Manoussakis, k-linked and k-cyclic digraphs, J. Combinatorial Theory B 48 (1990), 216–
[17] R. Thomas and P. Wollan, An improved extremal function for graph linkages, European
Journal of Combinatorics 26 (2005), 309–324.
[18] C. Thomassen, Note on highly connected non-2-linked digraphs, Combinatorica 11 (1991),
393–395.
Daniela Kühn, Deryk Osthus
School of Mathematics
University of Birmingham
Edgbaston
Birmingham
B15 2TT
E-mail addresses: {kuehn,osthus}@maths.bham.ac.uk
	1. Introduction
	2. Further work and open problems
	3. Notation and extremal examples
	4. Proof of Theorem ??
	5. Acknowledgement
	References
ABSTRACT
  The minimum semi-degree of a digraph D is the minimum of its minimum
outdegree and its minimum indegree. We show that every sufficiently large
digraph D with minimum semi-degree at least n/2 +k-1 is k-linked. The bound on
the minimum semi-degree is best possible and confirms a conjecture of
Manoussakis from 1990. We also determine the smallest minimum semi-degree which
ensures that a sufficiently large digraph D is k-ordered, i.e. that for every
ordered sequence of k distinct vertices of D there is a directed cycle which
encounters these vertices in this order.

<|endoftext|><|startoftext|>
Introduction
	The model
	Homogeneous equations
	Linear perturbations
	Decomposition into adiabatic and entropy components
	Perturbation spectra
	Evolution of perturbations inside the Hubble radius
	Equations in the slow-roll approximation
	Evolution of perturbations on super-Hubble scales
	Numerical analysis
	Numerical procedure
	Examples of inflationary models
	Double inflation with canonical kinetic terms
	Double inflation with non-canonical kinetic terms
	Roulette inflation
	Numerical results for the perturbations
	Double inflation with canonical kinetic terms
	Double inflation with non-canonical kinetic terms
	Roulette inflation
	Impact of the non-canonical terms
	Effectively single-field cases
	Closing discussion
	Conclusion
ABSTRACT
  We study cosmological perturbations in two-field inflation, allowing for
non-standard kinetic terms. We calculate analytically the spectra of curvature
and isocurvature modes at Hubble crossing, up to first order in the slow-roll
parameters. We also compute numerically the evolution of the curvature and
isocurvature modes from well within the Hubble radius until the end of
inflation. We show explicitly for a few examples, including the recently
proposed model of `roulette' inflation, how isocurvature perturbations affect
significantly the curvature perturbation between Hubble crossing and the end of
inflation.

<|endoftext|><|startoftext|>
Introduction
Standard textbooks describe the stationary one-dimensional motion of a quantum
particle in a real potential well V (x) by the ordinary differential Schrödinger equation
+ V (x)
ψ(x) = E ψ(x) , x ∈ (−∞,∞) (1)
which may be considered and solved in the bound-state regime at E < V (∞) ≤ +∞
or in the scattering regime with, say, E = κ2 > V (∞) = 0. In this way one either
employs the boundary conditions ψ(±∞) = 0 and determines the spectrum of bound
states or, alternatively, switches to the different boundary conditions, say,
ψ(x) =
Aeiκx +B e−iκx , x≪ −1 ,
C eiκx , x≫ 1 .
Under the conventional choice of A = 1 the latter problem specifies the reflection
and transmission coefficients B and C, respectively [1].
The conventional approach to the quantum bound state problem has recently
been, fairly unexpectedly, generalized to many unconventional and manifestly non-
Hermitian Hamiltonians H 6= H† which are merely quasi-Hermitian, i.e., which are
Hermitian only in the sense of an identity H† = ΘH Θ−1 which contains a nontrivial
“metric” operator Θ = Θ† > 0 as introduced, e.g., in ref. [2]. The key ideas and
sources of the latter new development in Quantum Mechanics incorporate the so
called PT −symmetry of the Hamiltonians and have been summarized in the very
fresh review by Carl Bender [3]. This text may be complemented by a sample [4] of
the dedicated conference proceedings.
In this context we intend to pay attention to a very simple PT −symmetric scat-
tering model where
V (x) = Z(x) + i Y (x) , Z(−x) = Z(x) = real , Y (−x) = −Y (x) = real
and where the ordinary differential equation (1) is replaced by its Runge-Kutta-
discretized, difference-equation representation
ψ(xk−1)− 2ψ(xk) + ψ(xk+1)
+ V (xk)ψ(xk) = E ψ(xk) (3)
xk = k h , h > 0 , k = 0,±1, . . .
as employed, in the context of the bound-state problem, in refs. [5].
2 Runge-Kutta scattering
Once we assume, for the sake of simplicity, that the potential in eq. (3) vanishes
beyond certain distance from the origin,
V (x±j) = 0 j =M,M + 1, . . . ,
we may abbreviate ψk = ψ(xk), Vk = h
2V (xk) and 2 cosϕ = 2− h
2E in eq. (3),
−ψk−1 + (2 cosϕ+ Vk) ψk − ψk+1 = 0 . (4)
In the region of |k| ≥ M with vanishing potential Vk = 0 the two independent
solutions of our difference Schrödinger eq. (4) are easily found, via a suitable ansatz,
as elementary functions of the new “energy” variable ϕ,
ψk = const · ̺
k =⇒ ̺ = ̺± = exp(±i ϕ) .
This enables us to replace the standard boundary conditions (2) by their discrete
scattering version
ψ(xm) =
Aeimϕ +B e−imϕ , m ≤ −(M − 1) ,
C eimϕ , m ≥ M − 1
with a conventional choice of A = 1.
Two comments may be added here. Firstly, one notices that the condition of the
reality of the new energy variable ϕ imposes the constraint upon the original energy
itself, −2 ≤ 2 − h2E ≤ 2, i.e., E ∈ (0, 4/h2). At any finite choice of the lattice
step h > 0 this inequality is intuitively reminiscent of the spectra in relativistic
quantum systems. Via an explicit display of the higher O(h4) corrections in eq. (3),
this connection has been given a more quantitative interpretation in ref. [6].
The second eligible way of dealing with the uncertainty represented by the O(h4)
discrepancy between the difference- and differential-operator representation of the
Schrödinger’s kinetic energy is more standard and lies in its disappearance in the
limit h → 0. This is a purely numerical recipe known as the Runge-Kutta method
[7]. In the present context of scattering one has to keep in mind that the two “small”
parameters h and 1/M may and, in order to achieve the quickest convergence, should
be chosen and varied independently.
3 The matching method of solution
3.1 The simplest model of the scattering with M = 1
Once we are given the boundary conditions (5) the process of the construction of
the solutions is straightforward. Let us first illustrate its key technical ingredients
on the model with the first nontrivial choice of the cutoff M = 1. In this case our
difference Schrödinger eq. (4) degenerates to the mere three nontrivial relations,
−ψ−2 + 2 cosϕψ−1 − ψ
0 = 0
−ψ−1 + (2 cosϕ+ Z0) ψ0 − ψ1 = 0
0 + 2 cosϕψ1 − ψ2 = 0
where we may insert, from eq. (5),
ψ−1 = e
−i ϕ +B ei ϕ , ψ
0 = 1 +B , ψ
0 = C , ψ1 = C e
i ϕ (7)
and where we have to demand, subsequently,
0 = 1 +B = ψ
0 = C = ψ0 ,
−e−i ϕ −B ei ϕ + (2 cosϕ + Z0) C − C e
i ϕ = 0 .
Thus, at an arbitrary “energy” ϕ one identifies B = C − 1 and gets the solution
2i sinϕ
2i sinϕ− Z0
, B =
2i sinϕ− Z0
Of course, as long as we deal just with the real “interaction term” Z0, our M = 1
toy problem remains Hermitian since no PT −symmetry has entered the scene yet.
3.2 PT −symmetry and the scattering at M = 2
In the next, M = 2 version of our model we have to insert the four known quantities
ψ−2 = e
−2 i ϕ +B e2 i ϕ , ψ−1 = e
−i ϕ +B ei ϕ , ψ1 = C e
i ϕ , ψ2 = C e
2 i ϕ
in the triplet of relations
−ψ−2 + (2 cosϕ+ Z−1 − i Y−1) ψ−1 − ψ
0 = 0
−ψ−1 + (2 cosϕ+ Z0) ψ0 − ψ1 = 0
0 + (2 cosϕ+ Z−1 + i Y−1) ψ1 − ψ2 = 0
where the three symbols ψ0, ψ
0 and ψ
0 defined by these respective equations
should represent the same quantity and must be equal to each other, therefore.
Having this in mind we introduce ξ
0 = 1 +B and ξ
0 = C and decompose
0 = ξ
0 + χ
0 , ψ
0 = ξ
0 + χ
This enables us eliminate
0 = V−1 ψ−1 , χ
0 = V1 ψ1
and eq. (9) becomes reduced to the pair of conditions,
1 +B + V−1 ψ−1 = C + V1 ψ1 = ψ0 ,
−ψ−1 + (2 cosϕ+ Z0) ψ0 − ψ1 = 0
They lead to the two-dimensional linear algebraic problem which defines the reflection
and transmission coefficients B and C at any input energy ϕ. The same conclusion
applies to all the models with the larger M .
4 The matrix-inversion method of solution
Let us now re-write our difference Schrödinger eq. (4) as a doubly infinite system of
linear algebraic equations
. . .
. . .
. . .
. . . S−1 −1 0 . . .
. . . −1 S0 −1
. . .
. . . 0 −1 S1
. . .
. . .
. . .
. . .
= 0 , (11)
where
Sk ( ≡ S
−k) =
2 cosϕ+ Zk + i Yk sign k , |k| < M ,
2 cosϕ , |k| ≥M
and where the majority of the elements of the “eigenvector” are prescribed, in ad-
vance, by the boundary conditions (5). Once we denote all of them by a different
symbol,
ψ(xm) =
Aeimϕ +B e−imϕ ≡ ξ(−)m , m ≤ −(M − 1) ,
C eimϕ ≡ ξ(+)m , m ≥M − 1 ,
we may reduce eq. (11) to a finite-dimensional and tridiagonal non-square-matrix
problem
−1 S∗(M−1) −1
. . .
. . .
. . .
−1 S∗1 −1
−1 S0 −1
−1 S1 −1
. . .
. . .
. . .
. . .
. . .
−1 S(M−1) −1
−(M−1)
ψ−(M−2)
= 0 (14)
or, better, to a non-homogeneous system of 2M − 1 equations
−(M−1)
ψ−(M−2)
where the (2M − 1)−dimensional square-matrix of the system can be partitioned as
follows,
S∗(M−1) −1
−1 S∗(M−2) −1
. . .
. . .
. . . S0
. . .
. . .
. . . −1
−1 S(M−2) −1
−1 S(M−1)
. (16)
Whenever this matrix proves non-singular, it may assigned the inverse matrixR=T−1,
the knowledge of which enables us to re-write eq. (15), in the same partitioning, as
follows,
−(M−1)
= R ·
, ~Ψ =
ψ−(M−2)
. (17)
In the next step we deduce that the matrix R has the following partitioned form
α∗ ~tT β
~u Q ~v
β ~wT α
We may summarize that in the light of the overall partitioned structure of eq. (17),
the knowledge of the (2M − 3)−dimensional submatrix Q as well as of the two
(2M − 3)−dimensional row vectors ~tT and ~wT (where T denotes transposition) is
entirely redundant. Moreover, the knowledge of the other two column vectors ~u and
~v only helps us to eliminate the “wavefunction” components ψ−(M−2), ψ−(M−3), . . . ,
ψM−3, ψM−2. In this sense, equation (15) degenerates to the mere two scalar relations
−(M−1) − α
−(M) − β ξ
M = 0 ,
M−1 − β ξ
−M − α ξ
M = 0 .
Once we insert the explicit definitions from eq. (13) we get the final pair of linear
equations
e−i (M−1)ϕ +B ei (M−1)ϕ − α∗
e−iM ϕ +B eiM ϕ
− C β eiM ϕ = 0 ,
C ei (M−1)ϕ − β
e−iM ϕ +B eiM ϕ
− C α eiM ϕ = 0
which are solved by the elimination of
B = −e−2iMϕ +
e−iϕ − α
and, subsequently, of
2iβe−2iMϕ sinϕ
β2 − (e−iϕ − α∗) (e−iϕ − α)
. (21)
This is our present main result.
5 Coefficients α and β
Our final scattering-determining formulae (20) and (21) indicate that the complex
coefficient α and the real coefficient β carry all the “dynamical input” information.
At any given energy parameter ϕ these matrix elements are, by construction, rational
functions of our 2M − 1 real coupling constants Z0, Z1, . . . , ZM−1 and Y1, . . . , YM−1.
In particular, β is equal to 1/ detT and α has the same denominator of course. An
explicit algebraic determination of the determinant detT and of the numerator (say,
γ) of α is less easy. Let us illustrate this assertion on a few examples.
5.1 M = 2 once more
detT = Z0 Z1
2 − 2Z1 + Y1
Re γ = Z0 Z1 − 1
Im γ = −Z0 Y1
5.2 M = 3
detT = Z0 Z1
2 − 2Z0 Z1 Z2 − 2Z1 Z2
Z0 Z2
2 + 2Z2 + Y2
Z0 Z1
2 − 2Y2
Z1 + Y2
Z0 + 2Z0 Y1 Y2 + Z0
Re γ = Z0 Z1
Z2 − 2Z1 Z2 + Y1
Z0 Z2 − Z1 Z0 + 1
Im γ = −Z0 Z1
Y2 + 2Z1 Y2 − Y1
Z0 Y2 −Y1 Z0
5.3 M = 4
The growth of complexity of the formulae occurs already at the dimension as low
as M = 4. The determinant detT and the real and imaginary parts of γ are them
represented by the sums of 15 and 14 and 32 products of couplings, respectively.
A simplification is only encountered in the weak coupling regime where one finds
just two terms in the determinant which are linear in the couplings,
detT = −2Z3 − 2Z1 + . . .
being followed by the 10 triple-product terms,
. . .+ Y1
Z0 + 4Z1 Z2 Z3 − 4Z1 Y2 Y3 + Z1
Z0 + 2Y3
Z2 + 2Y1 Z0 Y3+
+2Z1 Z0 Z3 + Z3
Z0 + Y3
Z0 + 2Z3
Z2 + . . .
etc. Similarly, we may decompose, in the even-number products,
Re γ = −1 + 2Z2 Z3 + Z0 Z1 + Z0 Z3 + 2Z2 Z1 + . . .
and continue
. . .+−2Z0 Z1 Z2 Z3 + 2Z0 Y1 Y2 Z3−
−Z2 Y1
Z0 − 2Z2
Z1 Z3 − Z2 Z1
Z0 − 2Y2
Z1 Z3 + . . .
etc, plus
Im γ = −2Z2 Y3 + 2Y2 Z1 − Z0 Y1 − Z0 Y3 − . . .
with a continuation
. . .− Y2 Y1
Z0 −Y2 Z1
Z1 Y3 + 2Z0 Z1 Z2 Y3 + 2Z2
Z1 Y3 − 2Z0 Y1 Y2 Y3 + . . .
etc. Symbolic manipulations on a computer should be employed at all the higher
dimensions M ≥ 4 in general.
6 Discussion
The main inspiration of the activities and attention paid to the PT −symmetry
originates from the pioneering 1998 letter by Bender and Boettcher [8] where the
operator P meant parity and where the (antilinear) T represented time reversal. Its
authors argued that the complex model V (x) = x2 (ix)δ seems to possess the purely
real bound-state spectrum at all the exponents δ ≥ 0. After a rigorous mathematical
proof of this conjecture by Dorey, Duncan, Tateo and Shin [9] and after the (crucial)
clarification of the existence of a nontrivial, “physical” Hilbert space H where the
Hamiltonian remains self-adjoint [2, 10, 11, 12], the bound-state version of eq. (1)
may be considered more or less well understood, especially after it has been clarified
that the physics-inspired concept of PT −symmetry of a Hamiltonian H should in
fact be understood, in the language of mathematics, as a P−pseudo-Hermiticity of
H specified by the property H† = P H P−1 [10, 11, 13].
In the spirit of the latter generalization, current literature abounds in the studies
of the potentials which are analytically continued [14], singular and multisheeted
[15], multidimensional [16], manybody [17], relativistic [18], supersymmetric [19] and
channel-coupling [20]. Among all these developments, a comparatively small number
of papers has been devoted to the problem of the scattering. For a sample one might
recollect the key reviews [21] and various Kleefeld’s conceptual conjectures [22] as well
as a very explicit study of the scattering by the separable PT −symmetric potentials
of rank one [23] or by the rectangular or reflectionless barriers [24], or the motion
considered along the so called tobogganic (i.e., complex and topologically nontrivial)
integration contours [25]. In this context our present difference-equation-based study
may be understood just as another attempt to fill the gap.
Technically we felt inspired by our old Runge-Kutta-type discretization of the
PT −symmetric Schrödinger equations [5] as well as by our recent chain-model ap-
proximations of bound states in a finite-dimensional Hilbert space [26]. In a certain
unification of these two approaches we succeeded here in showing that there exists
a close formal parallelism between the description of the (one-dimensional, Runge-
Kutta-approximated) scattering by a real (i.e., Hermitian) potentials and by their
complex, PT −symmetric generalizations. We showed that in both these contexts,
the definition of the transmission and reflection coefficients has the same form [cf.,
e.g., eq. (21)], with all the differences represented by the differences in the form of
the “dynamical input” information. It has been shown to be encoded, in both the
Hermitian and non-Hermitian cases, in the two functions α and β of the lattice po-
tentials, with the vanishing or non-vanishing coefficients Yk, respectively (cf. a few
samples of the concrete form of α and β in section 5).
On the level of physics we would like to emphasize that one of the main dis-
tinguishing features of the scattering problem in PT −symmetric quantum mechan-
ics lies in the manifest asymmetry between the “in” and “out” states [22]. In its
present solvable exemplification we showed that such an asymmetry is merely formal
and that the problem remains tractable by the standard, non-matching and non-
recurrent techniques of linear algebra. A key to the success proved to lie in the
partitioning of the Schrödinger equation which enabled us to separate its essential
and inessential components and to reduce the construction of the amplitudes to the
mere two-dimensional matrix inversion [cf. eq. (18)] where all the dynamical input
is represented by the four corners of the inverse matrix R = T−1 [cf. the definition
(16)].
We believe that the merits of the present discrete model were not exhausted by
its present short analysis and that its further study might throw new light, e.g., on
the non-Hermitian versions of the inverse problem of scattering.
Acknowledgement
Work supported by the MŠMT “Doppler Institute” project Nr. LC06002, by the In-
stitutional Research Plan AV0Z10480505 and by the GAČR grant Nr. 202/07/1307.
References
[1] S. Flügge, Practical Quantum Mechanics (Springer, Berlin, 1971), p. 42.
[2] F. G. Scholtz, H. B. Geyer and F. J. W. Hahne, Ann. Phys. (NY) 213 (1992)
[3] C. M. Bender, Rep. Prog. Phys., submitted (hep-th/0703096).
[4] cf., e.g., the dedicated issues of J. Phys. A: Math. Gen. 39 (2006), Nr. 32 (H.
Geyer, D. Heiss and M. Znojil, editors, pp. 9963 - 10261), and Czechosl. J. Phys.
56 (2006), Nr. 9 (M. Znojil, editor, pp. 885 - 1064).
[5] M. Znojil, Phys. Lett. A 223 (1996) 411;
F. M. Fernández, R. Guardiola R, J. Ros and M. Znojil, J. Phys. A: Math. Gen.
32 (1999) 3105;
U. Guenther, F. Stefani and M. Znojil, J. Math. Phys. 46 (2005) 063504.
[6] M. Znojil, Phys. Lett. A 203 (1995) 1.
[7] S. Acton, Numerical Methods that Work (Harper, New York, 1970).
[8] C. M. Bender and S. Boettcher, Phys. Rev. Lett. 24 (1998) 5243.
[9] P. Dorey, C. Dunning and R. Tateo, J. Phys. A: Math. Gen. 34 (2001) L391;
P. Dorey, C. Dunning and R. Tateo, J. Phys. A: Math. Gen. 34 (2001) 5679;
K. C. Shin, Commun. Math. Phys. 229 (2002) 543.
[10] M. Znojil, J. Nonlin. Math. Phys. 9, suppl. 2 (2002) 122 (quant-ph/0103054);
M. Znojil, Rendiconti del Circ. Mat. di Palermo, Ser. II, Suppl. 72 (2004) 211
(math-ph/0104012);
B. Bagchi, C. Quesne and M. Znojil, Mod. Phys. Lett. A 16 (2001) 2047 (quant-
ph/0108096).
[11] A. Mostafazadeh, J. Math. Phys. 43 (2002) 205 and 2814 and 3944;
H. Langer and Ch. Tretter, Czech. J. Phys. 54 (2004) 1113.
[12] C. M. Bender, D. C. Brody and H. F. Jones, Phys. Rev. Lett. 89 (2002) 0270401.
[13] L. Solombrino, J. Math. Phys. 43 (2002) 5439.
[14] F. Cannata, G. Junker and J. Trost, Phys. Lett. A 246 (1998) 219;
F. M. Fernández, R. Guardiola, J. Ros and M. Znojil, J. Phys. A: Math. Gen.
31 (1998) 10105;
C. M. Bender, S. Boettcher and P. N. Meisinger, J. Math. Phys. 40 (1999) 2201;
M. Znojil, J. Math. Phys. 45 (2004) 4418.
[15] M. Znojil, Phys. Lett. A 342 (2005) 36.
[16] C. M. Bender, G. V. Dunne, P. N. Meisinger and M. Simsek, Phys. Lett. A 281
(2001) 311 (quant-ph/0101095);
A. Nananyakkara, Phys. Lett. A 334 (2005) 144;
H. B́ıla, M. Tater and M. Znojil, Phys. Lett. A 351 (2006) 452.
[17] M. Znojil and M. Tater, (quant-ph/0010087), J. Phys. A: Math. Gen. 34 (2001)
1793;
M. Znojil, J. Phys. A: Math. Gen. 36 (2003) 7825;
B. Basu-Mallick, T. Bhattacharyya, A. Kundu and B. P. Mandal, Czech. J.
Phys. 54 (2004) 5;
S. M. Fei,Czech. J. Phys. 54 (2004) 43;
V. Jakubský, Czech. J. Phys. 54 (2004) 67.
[18] A. Mostafazadeh,Class. Quantum Grav. 20 (2003) 155;
A. Mostafazadeh, J. Math. Phys. 44 (2003) 974;
A. Mostafazadeh, Ann. Phys. (NY) 309 (2004) 1;
M. Znojil, J. Phys. A: Math. Gen. 37 (2004) 9557;
M. Znojil, Czech. J. Phys. 54 (2004) 151 (2004) and 55 (2005) 1187;
V. Jakubský and J. Smejkal, Czechosl. J. Phys. 56 (2006) 985.
[19] F. Cannata, G. Junker and J. Trost, Phys. Lett. A 246 (1998) 219;
A. A. Andrianov, F. Cannata, J-P. Dedonder and M. V. Ioffe, Int. J. Mod. Phys.
A 14 (1999) 2675 (quant-ph/9806019);
M. Znojil, F. Cannata, B. Bagchi and R. Roychoudhury, Phys. Lett. B 483
(2000) 284;
S. M. Klishevich and M. S. Plyushchay, Nucl. Phys. B 628 (2002) 217;
A. Mostafazadeh, Nucl. Phys. B 640 (2002) 419;
M. Znojil, J. Phys. A: Math. Gen. 35 (2002) 2341 and 37 (2004) 9557;
B. Bagchi, S. Mallik and C. Quesne, Mod. Phys. Lett. A 17 (2002) 1651;
M. Znojil, Nucl. Phys. B 662/3 (2003) 554;
G. Lévai, Czech. J. Phys. 54 (2004) 1121.
C. Quesne, B. Bagchi, S. Mallik, H. Bila, V. Jakubsky and M. Znojil, Czech. J.
Phys. 55 (2005) 1161;
B. Bagchi, A. Banerjee, E. Caliceti, F. Cannata, H. B. Geyer, C. Quesne and
M. Znojil, Int. J. Mod. Phys. A 20 (2005) 7107 (hep-th/0412211);
A. Sinha and P. Roy, J. Phys. A: Math. Gen. 39 (2006) L377.
[20] M. Znojil, J. Phys. A: Math. Gen. 39 (2006) 441;
M. Znojil, Phys. Lett. A 353, 463 (2006).
[21] J. G. Muga, J. P. Palao, B. Navarro and I. L. Egusquiza, Phys. Rep. 395 (2004)
F. Cannata, J.-P. Dedonder and A. Ventura, Ann. Phys. 322 (2007) 397.
[22] F. Kleefeld, in “Hadron Physics, Effective Theories of Low Energy QCD”, AIP
Conf. Proc. 660 (2003) 325;
F. Kleefeld, Czech. J. Phys. 56 (2006) 999.
[23] F. Cannata and A. Ventura, Czech. J. Phys. 56 (2006) 943 (quant-ph/0606006).
[24] Z. Ahmed, Phys. Lett. A 324 (2004) 152;
Z. Ahmed, C. M. Bender and M. V. Berry, J. Phys. A: Math. Gen. 38 (2005)
L627.
[25] M. Znojil, J. Phys. A: Math. Gen. 39 (2006) 13325.
[26] M. Znojil and H. B. Geyer, Phys. Lett. B 640 (2006) 52;
M. Znojil, Phys. Lett. B 647 (2007) 225;
M. Znojil, J. Phys. A: Math. Theor., to appear (math-ph/0703070);
M. Znojil, Phys. Lett. A, to appear (quant-ph/0703168).
ABSTRACT
  One-dimensional scattering problem admitting a complex, PT-symmetric
short-range potential V(x) is considered. Using a Runge-Kutta-discretized
version of Schroedinger equation we derive the formulae for the reflection and
transmission coefficients and emphasize that the only innovation emerges in
fact via a complexification of one of the potential-characterizing parameters.

<|endoftext|><|startoftext|>
Introduction and results
Let W = {y : y1 < . . . < yn} be the Weyl chamber. Consider Xt = (X1t , . . . ,Xnt ), wherein
coordinates are independent Brownian motions with unit variance parameter, drift vector
a = (a1, . . . , an) and starting point X0 = x ∈ W . In this paper we study the collision time
τ , which is the exit time of Xt from the Weyl chamber, i.e.
τ = inf{t > 0 : Xt /∈ W} .
For identical drifts a1 = . . . = an , say ai ≡ 0, the celebrated Karlin-McGregor formula states
(see [7])
IP(τ > t;X t ∈ dy) = det [pt(xi, yj)] dy , (1.1)
where pt(x, y) =
(x−y)2
2t , which yields the tail distribution of τ :
IPx(τ > t) =
det [pt(xi, yj)] dy .
For the use of Karlin-McGregor formula it is essential that processes X1t , . . . ,X
t are inde-
pendent copies of the same strong Markov, with skip-free realizations process, starting at
t = 0 from x ∈ W . In this case the asymptotic of IPx(τ > t) was first studied by Grabiner
[5] (for the Brownian case) (see also proofs by Doumerc and O’Connell [4] and Pucha la [9])
Later Pucha la & Rolski [10]) showed that this asymptotic is also true for the Poisson and
continuous time random walk case. The above mentioned asymptotics is:
IPx(τ > t) ∼ D∆(x)t−n(n−1)/4, (1.2)
where ∆(x) = det
(j−1)
i,j=1
is the Vandermonde determinant, and
D = (2π)
2 ∆ (y) dy , (1.3)
for t → ∞. Here and below 1/cn =
j=1 j!.
In this note we study the same problem, however for Brownian motions with different
drifts. For this we derive first, in Section 2, a formula for IPx(τ > t) by the change of
measure. It is apparent that possible results must depend on the form of drift vector a. For
example we can analyze all cases for n = 2, because in this case the collision equals to the first
passage to zero of the Brownian process X2t − X1t , for which the density function is known
(see e.g. [3]). Hence
IPx(τ > t) =
2πs3/2
−(x + as)
where x = x2 − x1 and a = a2 − a1. This yields
IPx(τ > t) =
xeaxt−3/2e−ta
2/4 (1 + o(1)) , a1 > a2
2 (1 + o(1)) , a1 = a2
1 − e−ax + o(1) a1 < a2 .
For general n the situation is much more complex and different scenarios are possibles.
For example the drifts can be diverging and then IPx(τ > t) tends to a positive constant,
which the situation was analyzed by Biane et al [2]. Another case is when all drifts are equal,
in which the case the probability IPx(τ > t) is polynomially decaying, as it was found by
Grabiner [5]. However there are various situations when the probabilities are exponentially
decaying with polynomial prefactors. The full characterization depends on a concept of the
stable partition of the drift vector, which the notion is introduced in Section 3. In Section 4
we state the main theorem, which shows all possible exact asymptotics of IPx(τ > t) in from
of Ch(x)t−αe−γt, where formulas for C,α and γ are given in terms of the stable partition of
the drift vector.
2 Formula for IPx(τ > t).
We note our basic probabilistic space with natural history filtration (Ω,F , (Ft), IPx) and
consider on it process Xt as defined in the Introduction. Unless otherwise stated we tacitly
assume that x ∈ W . We start off a lemma on the change of measure for the Brownian
case, which the proof can be found for example in Asmussen [1], Theorem 3.4. Let Mt =
e<α,Xt>/IEe<α,Xt> be a Wald martingale. For a probability measure IPx its restriction to
Ft we denote by IPx|t. Let ĨPx be a probability measure obtained by the change of measure
IPx with the use of martingale Mt, that is defined by a family of measures ĨPx|t = Mt dIPx|t,
t ≥ 0. For the theory we refer e.g. to Section XIII.3 in [1]
Lemma 2.1 If Xt is a Brownian motion with drift a under IPx, then this process is a
Brownian motion with drift a + α under ĨPx.
The sought for formula for the tail distribution of the collision time is given in the next
proposition.
Proposition 2.2
IPx(τ > t) =
= (2π)−n/2e−<a,x>−||x||
e−||y−a
t||2/2 det[exiyj/
t] dy . (2.4)
Proof. We use α = −a to eliminate the drift under ĨPx. Thus IPx(τ > t) = ĨEx[M−1t ; τ > t] .
Now by Karlin-McGregor formula (1.1) we write
IPx(τ > t) = ĨEx[e
<a,Xt>IExe
<−a,Xt>; τ > t]
= e<−a,x>−||a||
e<a,y> det[pt(xi, yj)] dy ,
and next, algebraic manipulations yield (2.4).
In the paper we use the following vector notations. For a vector a ∈ IRn we denote a[i,j] =
(ai, ai+1, . . . , aj) and ā[i,j] = (ai+ai+1+ . . .+aj)/(j−i+1). We also use a(i,j] = (ai+1, . . . , aj)
and a(i,j) = (ai+1, . . . , aj−1). By z
k, where z = (z1, . . . , zm) and k = (k1, . . . , km) we denote
j=1 z
3 Stable partition of a.
Let a ∈ IRn. Our aim is to make a suitable partition
(a1, . . . , aν1)(aν1+1, . . . , aν1+ν2), . . . , (aν1+...+νq−1+1, . . . , aν1+...+νq ) . (3.5)
of a, where νi > 0. For short we denote m1 = ν1,m2 = ν1 + ν2, . . . ,mq = ν1 + . . . + νq = n.
We also set m0 = 0.
We say that sequence a is irreducible if
ā[1;1] > ā[2;n]
ā[1;2] > ā[3;n]
ā[1;n−1] > ā[n;n]
. (3.6)
Suppose we have a partition defined by m1, . . . ,mq . The mean of the i
th sub-vector is
denoted by f i = ā(mi−1;mi]. Furthermore we define a vector f = (f1, . . . , fn) by
fi = f
k, if mk−1 < i ≤ mk .
It is said that partition (3.5) of vector a is stable if
f1 ≤ f2 ≤ . . . ≤ f q (3.7)
and each vector a(mi−1,mi] is irreducible (i = 1, . . . , q). Remark that a stable partition is
defined if we know m = (m1, . . . ,mq) for which (3.7) hold and each a(mi−1,mi] is irreducible
(i = 1, . . . , q). In the sequel, for a given stable partition of a, characters q,f ,m are reserved
for it.
Consider now fm1 , fm2 , . . . , fmq and define a subsequence m
= (m′1, . . . ,m
q′) of m =
(m1,m2, . . . ,mq) as follows. Let q
′ be the number of strict inequalities in f1 ≤ f2 ≤ . . . ≤ f q
plus 1. Furthermore we define inductively by m′0 = 0 and for i = 1, . . . , q
′ − 1
m′i = inf{mj > m′i−1 : mj ∈ m, fmj < fmj+1} .
and finally we set mq′ = n. We also define a subsequence of indices i0, i1, . . . , iq′ inductively
by i0 = 0 and
ik = inf{j > ik−1 : fmj < fmj+1}.
Hence we have
fm′1 < fm
< · · · < fm′
In this case we say that (m′1, . . . ,m
q′) is a strong representation of the stable partition of a
and q′, (m′1, . . . ,m
q′) are characters reserved for it. Set ν
i = m
i −m′i−1, (i = 1, . . . , q′).
Example 1 Suppose that a = (3, 1, 2, 5, 1). Then q = 3 and m1 = 2,m2 = 3,m3 = 5 define
the stable partition (3, 1)(2)(5, 1) with means f1 = 2, f2 = 2, f3 = 3. furthermore q′ = 2,
m′1 = 3,m
2 = 5 and i1 = 2, i2 = 5.
Proposition 3.1 For each vector a, there exists its unique stable partition.
Before we state a proof of Proposition 3.1 we prove few lemmas.
Lemma 3.2 If a = (a1, . . . , an) is irreducible, then
ā[1;1] > fn > ā[2;n]
ā[1;2] > fn > ā[3;n]
ā[1;n−1] > fn > ā[n;n]
. (3.8)
Proof. fn is a nontrivial weighted mean of every pair ā[1;i] and ā[i+1;n].
Lemma 3.3 In a stable partition, for each element a(mi−1;mi]
ā(mi−1;mi−1+k] ≥ fmi .
Proof. The case k ≤ mi −mi−1 follows from Lemma 3.2. Clearly for k = mi −mi−1 we have
equality. Consider now k > mi −mi−1. Than ā(mi−1;mi−1+k] is a weighted mean of fmi and
ā(mi;mi+k−(mi−mi−1)] and the later term is greater or equal than fmi by (3.7) and (3.8).
In the next lemma we consider two vectors a1 ∈ IRn1 and a2 ∈ IRn2 . The corresponding
f -s are fn1 and fn2 respectively. We consider a situation of creating a new vector (a1,a2) =
(a1 . . . , an1+n2) ∈ IRn1+n2 .
Lemma 3.4 Suppose that a1 and a2 are irreducible and fn1 > fn2. Then vector (a1,a2) is
irreducible.
Proof. Recall that (a1,a2) = (a1 . . . , an1+n2) ∈ IRn1+n2 . Suppose 1 ≤ k ≤ n1. By Lemma
3.2 we have ā[1;k−1] > fn1 > ā[k;n1], also ā[1;k−1] > fn1 > fn2 . Hence ā[1;k−1] > ā[k;n1+n2]
becasue ā[k;n1+n2] is a weighted mean of ā[k;n1] and fn2 . Suppose now n1 < k. Then ā[1;k−1]
is a weighted mean of fn1 and ā[n1+1,k] and both by Lemma 3.2 are greater than ā[k;n1+n2],
which completes the proof.
Proof of Proposition 3.1. The existence part is by induction with respect n. For n = 2 we
have two situations
1. if a1 ≤ a2, than q = 2 with m1 = 1, m2 = 2 is a stable partition,
2. if a1 > a2, than q = 1 with m1 = 2 is a stable partition.
Assume that there exists a stable partition with q partition vectors of a vector a ∈ Rn. We
add a new element an+1 at the end of vector a to create new one (a, an+1) = (a1, . . . , an+1).
We have two situations.
1. If an+1 ≥ f q than in a stable partition an+1 is alone in the q + 1 partition vector.
2. If an+1 < f
q, than we proceed inductively as follow. We use Lemma 3.4 with a1 =
a[mq−1;mq] and a2 = (an+1) and let f
q and f q+1 = an+1 are means of these partition
vectors. In result (a(mq−1;mq], an+1) form an irreducible vector, for which we have to
check whether condition (3.7) holds. If yes, then we end with a stable partition, other-
wise we join the q − 1 partition vector with the new q partition vectors and repeat the
procedure. In the worst case we end up with one partition vector.
For the uniqueness proof , suppose that we have two different stable partitions: m11 <
m12 < · · · < m1q1 and m
1 < m
2 < · · · < m2q2 . The means of fs are (f
1)1, . . . , (f1)q1
for the first partition vector and (f2)1, . . . , (f2)q2 for the second respectively. Since parti-
tions are supposely different, there exists i such that m1i 6= m2i . We take the minimal i with
this property and without loss of generality we can assume m2i > m
i . Set k = m
i −m1i . We
have to analaze the following cases.
1. (m2i = m
i+1). We have
[m2i−1+1;k]
> ā2
[k+1;m2i ]
On the other hand (f1)i = ā[m1i−1+1;m
= ā[m2i−1+1;m
> ā[m1i+1;m
= ā[m1i+1;m
(f2)i and this contradics with (f1)i ≤ (f1)i+1.
2. (m2i > m
i+1). We have ā[m2i−1+1;m
i+1;m
and by Lemma 3.3
(f1)i = ā[m2i−1+1;m
> ā[m1i+1;m
≥ (f1)i+1 ,
which is a contradiction.
3. (m2i < m
i+1). We have by Lemma 3.3
(f1)i = ā[m1i−1+1;m
> ā[m1i+1;m
≥ (f1)i+1 ,
which is a contradiction.
The proof is completed.
Remark The stable partition can be obtained by considering the following simple determin-
istic dynamical system. We have n particles starting from x1 < x2 < · · · < xn. The ith
particle has speed ai. Each particle moves with a constant speed on the real line until it
collides with one of its neighboring particle (if it happens). Then both the particles coalesce
and from this time on they move with the proportional speed which is the mean of speed of
colliding particles, and so on. Ultimately the particles will form never colliding groups, which
are the same as in the stable partition of a. Notice that resulted grouping do not depend on
a starting position x.
4 The theorem and examples.
We begin introducing some notations. Suppose that a has a stable partition with character-
istics q, (mi), q
′, (m′i) respectively. In the sequel we will use the following notations:
ml−1<u<v≤ml
(au − av)2
 , (4.9)
+ (n− q) +
 , (4.10)
h(x) = e−<x,a> det
exifjx
(j−m′
l−1−1)1I{m′
l−1<j≤m
. (4.11)
Moreover we define a function
I(a, t)
|z|2e
ml−1<u<v≤ml
(zu−zv)(av−au)
∆(z(m′j−1;m
) d z .
(4.12)
Remark that from Lemma 5.1 it will follow
ml−1<u<v≤ml
(au − av)2 =
ml−1<u<v≤ml
(au − āl)2,
where
āl =
u=ml−1+1
Using this notation we now state a proposition which is useful for calculations in some
cases.
Proposition 4.1
IPx(τ > t) = (2π)
e−γtt−
j=1 (
×e−<x,a> det
exkfjx
(j−mil−1−1)1I{mil−1<j≤mil}
×I(a, t) (1 + o(1)). (4.13)
Remark that formula (4.13) does not give us straightforward asymptotic because inte-
gral I(a, t) depends on t. However in some cases this dependence vanishes and this is why
Proposition 4.1 can be sometimes useful.
The next theorem gives us asymptotic for all cases.
Theorem 4.2 For some C given below, as t → ∞
IPx(τ > t) = Ch(x)t
−αe−γt(1 + o(1)),
γ, α, and h(x) are defined in (4.9),(4.10),(4.11) respectively.
To show C we need few more definitions. Let
H(s1, . . . , sℓ) =
1≤i≤j≤ℓ+1
(si + . . . + sj−1). (4.14)
Define now
C = A1 ×A2 ×A3 , (4.15)
where
A1 = (2π)
−n/2√2πn
· · ·
ξi>0:i/∈{m1,...,mq}
ml−1<u<v≤ml
(ξu+···+ξv−1)(au−av)
ξ(mi−1;mi−1)
i/∈{m1,...,mq}
dξi ,
· · ·
ξi>0:i∈{m1,...,mq}\{ml1 ,...,mlq′ }
· · ·
ξi>−∞:i∈{ml1 ,...,mlq′ }
k,l∈{m1,...,mq}
Sklξkξl
i:{i,i+1,...,i+k}
∈{1,...,q}\{l1,...,lq′ }
ξmi+j
νiνi+k+1
i∈{m1,...,mq}
dξi ,
where Skl = (n− 2)k for k ≤ l and Skl = Slk. In the remaining part of this section we diplay
some special cases.
Example 2 (a1 = a2 = · · · = an) This is no drift case. Here q = n and m1 = 1,m2 =
2, . . . ,mn = n, also q
′ = 1 and m′1 = n. In result fm1 = a1, . . . , fmn = an. Let a be the
common value of the drift. Using Proposition 4.1 we have
IPx(τ > t) = (2π)
−n/2cne
−<x,a> det
exkfjx
2 ∆n(z[1;n]) dz (1 + o(1)).
First we notice that since all the coordinates in vector f are the same, we have
exkfjx
= e<x,f> det
= e<x,a>∆(x).
Furthermore W −f
t = W because y1 < y2 < · · · < yn if and only if y1 + a
t < y2 + a
· · · < yn + a
t. Finally we write
IPx(τ > t) = C h(x) t
−α (1 + o(1)),
where
h(x) = ∆n(x),
C = (2π)−n/2cn
2 ∆(z) dz.
Before we state the next example we prove the following lemma.
Lemma 4.3 If a ∈ W , then {W − at} → IRn as t → ∞.
Proof. Let a ∈ W . We show that for all y ∈ IRn there exists s > 0, such that for all t > s,
y ∈ {W − at}. Let y ∈ IRn. We note bi = yi+1 − yi and di = ai+1 − ai. Condition a ∈ W
implies di > 0 for all i = 1, 2, . . . , n − 1. We take s = max{−bi, 0}/min{di} and t > s. Set
zi = yi + tai, then we get that z ∈ W , because
zi+1 − zi = yi+1 + tai+1 − yi − tai = bi + tdi > bi + sdi ≥ bi + max{−bi, 0} ≥ 0.
Thus for t > s we have y = z − ta, where z ∈ W , and so y ∈ {W − ta} for all t > s.
Example 3 (a1 < a2 < · · · < an) This is the case of non-colliding drifts. Here q = q′ = n,
m1 = m
1 = 1, . . . ,mn = m
n = n, fm1 = a1, . . . , fmn = an. Using Proposition 4.1 we have
IPx(τ > t) = (2π)
−n/2e−<x,a> det [exkaj ]
|z|2 dz (1 + o(1)).
By Lemma 4.3 we have that
|z|2 dz =
|z|2 dz = (2π)n/2.
Finally we write
IPx(τ > t) = e
−<x,a> det [exkaj ] .
This result was derived earlier by Biane et al [2]
Example 4 Case when q = q′ = 1. This is the case of a one irreducible drift vector. Here
m1 = m
1 = n, f1 = f2 · · · = fn = ā[1;n] =
a1+···+an
. Using Proposition 4.1 we have
IPx(τ > t) = Ch(x)t
−αe−γt(1 + o(1)),
where
0<u<v≤n
(au − av)2
(n− 1)(n + 1)
h(x) = e−<x,a> det
exifjx
C = (2π)−n/2
2πncn
· · ·
ξi>0:i=1,2,...,n−1
ml−1<u<v≤ml
(ξu+···+ξv−1)(au−av)
×H(ξ[1;n−1])
dξi .
We now analyze a remaining situation for n = 3.
Example 5 (a1 > a2 and
a1+a2
< a3). This is the case of two subsequences. Thus q =
2, q′ = 2 and m1 = m
1 = 2, m2 = m
2 = 3. By Theorem 4.2 we have
IPx(τ > t) = Ch(x)e
(a2−a1)2t−
where
(a2 − a1)2
h(x) = e−<x,a>
a1+a2
2 ex1
a1+a2
2 x1 e
a1+a2
2 ex2
a1+a2
2 x2 e
a1+a2
2 ex3
a1+a2
2 x3 e
C = (2π)−3/2
(a1 − a2)2
5 Auxiliary results.
For the proof we need a set of lemmas and propositions, presented in subsections below.
5.1 Useful lemmas.
We need a few technical lemmas, which we state without proofs.
Lemma 5.1 For a ∈ IRm
ā[1;m] − ai
1≤u<v≤m
(au − av)2.
Lemma 5.2 For a,z ∈ IRm
ā[1;m] − ai
(zv − zu)(au − av).
The proof of the following lemma follows easily from Lemmas 5.1 and 5.2.
Lemma 5.3 For a,f ∈ Rn such that f is is a vector obtained from the stable partition of a,
and z ∈ Rn, we have
t + z|2 = |z|2 +
ml−1<u<v≤ml
(au − av)2
ml−1<u<v≤ml
(zv − zu)(au − av)
Lemma 5.4
−1 1 0 . . . 0 0
0 −1 1 . . . 0 0
. . .
0 0 0 . . . −1 1
1 1 1 . . . 1 1
(A−1)TA−1 =
n− 2 2(n − 2)
n− 3 2(n − 3) 3(n − 3)
. . .
1 2 3 . . . n− 1
0 0 0 . . . 0 1
Note that (A−1)TA−1 is symmetric.
By Proposition 2.2 we have
IPx(τ > t) = (2π)
−n/2e−<a,x>−||x||
e−||y−a
t||2/2 det(exiyj/
t) dy .
We now introduce new variable z by
y = f
t + z,
where f = (f1, . . . , fn) is a vector obtained from the stable partition of a.
Finally we rewrite formula (2.4) in new variables by the use of Lemma 5.3:
Lemma 5.5
IPx(τ > t) = (2π)
−n/2e−<a,x>−||x||
2/2te−γt
ml−1<u<v≤ml
(zv−zu)(au−av)
exi(zj/
t+fj)
dz. (5.16)
5.2 Asymptotic behavior of determinant.
The following lemma is an extension of Lemma 2 from Pucha la [9] .
We define functions
gk(z) =
det[z
det[z
for k = (k1, . . . , kn) ∈ Zn and 0 ≤ k1 < · · · < kn Functions g corresponds to Schur functions
gk = sk−(0,1,...,n); see e.g. Macdonald [8], Ch. 1.3.
Lemma 5.6 Let k0 =
exi(zj/
t+fj)
t−k/2Tk , (5.17)
where
z(m′j−1,m
k1+···+kn=k
k1<···<km′
......
<···<k
gk(m′
(z(m′0,m
k1! . . . km′1 !
. . .
gk(m′
(z(m′
q′−1,m
q′−1+1
! . . . km′
exifjx
In particular as t → ∞
exi(zj/
t+fj)
j=1 (
∆(z(m′
j−1,m
× det
exkfjx
(j−mil−1−1)1I{mil−1<j≤mil}
(1 + o(1)).
Proof. By Sn we denote the group of permutations on n-set. We write
exi(zj/
t+fj)
(−1)σe
xifσ(i)e
xizσ(i)/
(−1)σe
xifσ(i)
t−k/2(x1zσ(1) + · · · + xnzσ(n))k/k!
t−k/2
(−1)σe
xifσ(i)(x1zσ(1) + · · · + xnzσ(n))k.
Now the coefficient at t−k/2 is equal to
Tk = Tk(z) =
(−1)σe
xifσ(i)(x1zσ(1) + · · · + xnzσ(n))k
(−1)σe
xifσ(i)
k1+···+kn=k
k1! . . . kn!
(x1zσ(1))
kσ(1) . . . (xnzσ(n))
kσ(n)
k1+···+kn=k
k1! . . . kn!
(−1)σe
xifσ(i)(x1zσ(1))
kσ(1) . . . (xnzσ(n))
kσ(n)
k1+···+kn=k
k1! . . . kn!
det[exifjx
Recall that
f1 = · · · = fm′1 < fm′1+1 = · · · = fm′2 < · · · < fm′q′−1+1 = · · · = fm′q′ .
If ki = kj and fi = fj, then the determinant det
exifjx
is 0. Thus we have non-zero
determinant if ki are different for those i such that fi are equal. Thus index k such that Tk
is non-zero must be at least
kj ≥ k0 =
Moreover we get all nonzero det
exifjx
putting in each subsequence
(k(m′0,m
, . . . ,k(m′
q′−1,m
all possible permutations of strictly ordered numbers from Z+ such that all sum up to k.
Thus we have
k1+···+kn=k
k1<···<km′
......
<···<k
σ1∈Sν′
· · ·
σq′∈Sν′
σ1(k(m′
(m′0,m
k1! . . . km′1 !
. . .
σ1(k(m′
q′−1,m
q′−1+1
! . . . km′
× det
exifjx
σl(kj)1m′
l−1<j≤m
Again we notice that permutations in the determinant influence only by the change of sign.
These signs and sums over the group of permutations form determinants, thus we have
k1+···+kn=k
k1<···<km′
......
<···<k
i,j=1
k1! . . . km′1
. . .
i,j=m′
q′−1+1
q′−1+1
! . . . km′
exifjx
Remark. Using Itzykson–Zuber integral (see e.g. [6]) we can write
exi(zj/
t+fj)
∆(x)∆(z/
t + f)
eTrdiag(x)Udiag(z/
t+f)U∗µ( dU) ,
where µ( dU) is (normalized) Haar measure on the unitary group U(n). Now letting t → ∞,
eTrdiag(x)Udiag(z/
t+f)U∗µ( dU) →
eTr(diag(x)Udiag(f)U
∗)µ( dU)
exifj
∆(x)∆(f)
t + f) = t−
i=1 (
∆(z(m′i−1;m
1≤k<l≤n
(f l − fk)ν′kν′l(1 + o(1)) .
Hence, as t → ∞
exi(zj/
t+fj)
i=1 (
∆(z(m′i−1;m
1≤k<l≤n
(f l − fk)ν′kν′l
exifj
This is a less detailed version of the formula from Lemma 5.6.
6 Proof of the Theorem.
Using (5.17) and formula (5.16) we write
IPx(τ > t) =
= (2π)−n/2e−||x||
2/2te−<x,a>e−γt
|z|2e
ml−1<u<v≤ml
(zu−zv)(av−au)
t−k/2Tk(z) dz ,(6.18)
First we will analyze above expression by taking only the first term in the sum (6.18), and
then we show that it gives the right asymptotic. Thus the first term equals to
(2π)−n/2e−||x||
2/2te−γt
ml−1<u<v≤ml
(zu−zv)(av−au)
×e−<x,a>
∆(z(m′j−1;m
× det
exkfjx
(j−mil−1−1)1I{mil−1<j≤mil }
k0 dz
= (2π)−n/2e−||x||
2/2te−γt
×e−<x,a> det
exkfjx
(j−mil−1−1)1I{mil−1<j≤mil}
I(a, t),
where I(a, t) was introduced in (4.12).
6.1 Asymptotic behavior of integral.
If s = Az, where
−1 1 0 . . . 0 0
0 −1 1 . . . 0 0
. . .
0 0 0 . . . −1 1
1 1 1 . . . 1 1
than zu − zv = sv + sv+1 + · · · + su−1 and
|z|2 = zTz = (A−1s)T (A−1s) = sT (A−1)TA−1s.
Hence by Lemma 5.4 we have
|z|2 =
s2n + s
(n)((A
−1)TA−1)(n)s(n),
where s(n) is obtained from s by deleting the n
th coordinate and A(n) is matrix A without
nth row and nth column.
After substitution s = Az, integral I(a, t) is
I(a, t) =
· · ·
si>(fi−fi+1)
for i=1,...,n−1
((A−1)TA−1)(n)s(n))
ml−1<u<v≤ml
(su+···+sv−1)(au−av)
(6.19)
H(s(m′
k−1;m
) ds(n) dsn
· · ·
si>(fi−fi+1)
for i=1,...,n−1
((A−1)TA−1)(n)s(n))
ml−1<u<v≤ml
(su+···+sv−1)(au−av)
(6.20)
H(s(m′
k−1,m
)) ds(n).
It is important to notice that the second exponent in integral I(a, t) in (6.19) depends only
on those si, where i /∈ {m1, . . . mq}. We also see that if mi−1 < k < mi, then the coefficient
at sk in (6.19) is
mi −mi−1
(mi − k)(k −mi−1)
ami−1+1 + · · · + ai
i−mi−1
ai+1 + · · · + ami
mi − i
and it is strictly negative by the definition of the stable partition. Note also that polynomials
H in integral I(a, t) depends only on sj, where j /∈ {m′1, . . . ,m′q′}.
We now introduce new variables ξ = (ξ1, . . . , ξn−1) by
tsj, for j 6= mi, j = 1, . . . , n− 1, i = 1, . . . , q − 1
sj, for j = mi, j = 1, . . . , n− 1, i = 1, . . . , q − 1.
(6.21)
We define function K by K
k−1,m
k−1,m
Consider now H
s(1;m′1)
. Since m′ is a subsequence of m, we recall that i1 is such that
mi1 = m
1. Similarly are defined i1, . . . , iq′ . We now factorize H
s(1;m′1)
into parts in which
there in none of mi, where is exactly one mi, exactly two and so on. Thus
s(m0;m′1)
s(mk−1;mk)
mk−1<i≤mk
mk<j≤mk+1
(si + · · · + sj−1)
mk−1<i≤mk
mk+1<j≤mk+2
(si + · · · + sj−1)
m0<i≤m1
mi1−1<j≤mi1
(si + · · · + sj−1).
We make analogous factorization for other H(s(mk−1;mk)).
Lemma 6.1 As t → ∞
K(ξ(m′
k−1,m
), t) = t
i=1 (
H(ξ(mi−1;mi))
i:{i,i+1,...,i+k}
∈{1,...,q}\{i1,...,iq′ }
ξmi+j
νiνi+k+1
(1 + o(1))
Proof. After the substitution we get
K(ξ(m′
k−1,m
), t) =
ξ(mk−1;mk)/
mk−1<i≤mk
mk<j≤mk+1
r=i,r 6=mk
t + ξmk
mk−1<i≤mk
mk+1<j≤mk+2
r=i,r /∈{mk ,mk+1}
t + ξmk + ξmk+1
m0<i≤m1
mi1−1<j≤mi1
r=i,r /∈{m1,...,mi1−1}
It is not difficult to see that asymptotic behavior of the above expression is
K(ξ(m′
k−1,m
), t) = t
l=1 (
ξ(mk−1;mk)
(ξmk)
νkνk+1
(ξmk + ξmk+1)
νkνk+2
)ν1νi1
(1 + o(1)).
In result the whole polynomial is asymptotically
K(ξ(m′
k−1;mk)
, t) = t−
i=1 (
ξ(mi−1;mi)
i:{i,i+1,...,i+k}
∈{1,...,q}\{i1,...,iq′ }
ξmi+j
νiνi+k+1
(1 + o(1)).
For substitution (6.21), we have ds(n) = t
−(n−q)/2 dξ. Note that fk+1 = fk for k 6= mi,
and hence the integration on the kth coordinate starts from 0. On the other hand if k = mi
for some i, and k 6= m′j for every j, then we also have fk+1 = fk and therefore the integration
starts from 0. Finally if k = mij for some j, then fk+1 > fk and the integrations starts from
(fk − fk+1)
t. Hence we have after the substitution
I(a, t) = t−(n−q)/2
· · ·
ξj>(fj−fj+1)
for i=1,...,n−1
k,l∈{m1,...,mq}
Sklξkξl)
k,l/∈{m1,...,mq}
Sklξkξl/t+2
k∈{m1,...,mq},l/∈{m1,...,mq}
Sklξkξl/
ml−1<u<v≤ml
(ξu+···+ξv−1)(au−av)
K(ξ(m′
k−1,m
), t) dξ .
So we can clearly see that
k=1K(ξ(m′k−1,m
), t) depends only on ξi’s such that i /∈
{ml1 , . . . ,mlq′} and it can be factorized into a part which depends only on i /∈ {m1, . . . ,mq}
and a part that depends on i ∈ {m1, . . . ,mq} \ {ml1 , . . . ,mlq′}. Thus finally we can write
I(a, t) = t−(n−q)/2
· · ·
ξi>(fi−fi+1)
for i=1,...,n−1
k,l∈{m1,...,mq}
Sklξkξl)
k,l/∈{m1,...,mq}
Sklξkξl/t+2
k∈{m1,...,mq},l/∈{m1,...,mq}
Sklξkξl/
ml−1<u<v≤ml
(ξu+···+ξv−1)(au−av)
i=1 (
H(ξ(mi−1;mi), t)
i:{i,i+1,...,i+k}
∈{1,...,q}\{i1,...,iq′ }
ξmi+j
νiνi+k+1
dξ (1 + o(1))
Hence
I(a, t) = t−(n−q)/2t−
i=1 (
· · ·
ξi>0:i=1,...,n−1
i/∈{m1,...,mq}
ml−1<u<v≤ml
(ξu+···+ξv−1)(au−av)
H(ξ(mi−1;mi))
i/∈{m1,...,mq}
· · ·
ξi>0:i∈{m1,...,mq}\{ml1 ,...,mlq′ }
· · ·
ξi>−∞:i∈{ml1 ,...,mlq′ }
k,l∈{m1,...,mq}
Sklξkξl
i:{i,i+1,...,i+k}
∈{1,...,q}\{l1,...,lq′ }
ξmi+j
νiνi+k+1
i∈{m1,...,mq}
dξi (1 + o(1)) . (6.22)
Concluding we have
I(a, t) = C1t
−(n−q)/2t−
i=1 (
2 )(1 + o(1)),
where C1 depends only on drift vector a.
6.2 Proof of Theorem 4.2.
Following considerations of Section 6.1, notice first that it suffices to take the first term
from the sum (6.18) for asymptotic analysis because next terms consists of positive rank
polynomials of variable z and therefore they will tend to zero faster after substitution (6.21).
For the proof of the main theorem we have to plug the asymptotics (6.22) to integral (6.18).
References
[1] S. Asmussen (2003) Applied Probability and Queues. Second Ed., Springer , New York.
[2] Ph. Biane, Ph. Bougerol and N. O’Connell (2005) Littelmann paths and Brownian paths.
Duke Math. J. 130, 127–167.
[3] A.N. Borodin and P. Salminen (2002) Handbook of Brownian Motion - Facts and For-
mulae. Birkhäuser Verlag, Basel.
[4] Y. Doumerc, N. O’Connell (2005) Exit problems associated with finite reflection groups.
Probability Theory and Related Fields 132, 501 - 538
[5] D.J. Grabiner, Brownian motion in a Weyl chamber, non-colliding particles, and random
matrices. Ann. Inst. H. Poincaré. Probab. Statist.35 (1999), 177-204.
[6] C. Itzykson and J.-B. Zuber (1980) The planar approximation II. J. Math. Phys. 21,
411–421 .
[7] S. Karlin and J. McGregor (1959) Coincidence probabilities. Pacific J. Math. 1141–1164.
[8] I.G. Macdonald, Symetric Functions and Hall Polynomials. Clarendon Press (1979),
Oxford.
[9] Z. Pucha la (2005) A proof of Grabiner theorem on non-colliding particles. Probability
and Mathematical Statistics 25, 129–132.
[10] Z. Pucha la and T. Rolski (2005) The exact asymptotics of the time to collison. Electronic
Journal of Probability 10, 1359–1380.
	Introduction and results
	Formula for IPbold0mu mumu xxxxxx(>t).
	Stable partition of bold0mu mumu aaaaaa.
	The theorem and examples.
	Auxiliary results.
	Useful lemmas.
	Asymptotic behavior of determinant. 
	Proof of the Theorem.
	Asymptotic behavior of integral.
	Proof of Theorem ??.
ABSTRACT
  In this note we consider the time of the collision $\tau$ for $n$ independent
Brownian motions $X^1_t,...,X_t^n$ with drifts $a_1,...,a_n$, each starting
from $x=(x_1,...,x_n)$, where $x_1<...<x_n$. We show the exact asymptotics of
$P_x(\tau>t) = C h(x)t^{-\alpha}e^{-\gamma t}(1 + o(1))$ as $t\to\infty$ and
identify $C,h(x),\alpha,\gamma$ in terms of the drifts.

<|endoftext|><|startoftext|>
Ab initio Study of Graphene on SiC
Alexander Mattausch∗ and Oleg Pankratov
Theoretische Festkörperphysik, Universität Erlangen-Nürnberg, Staudtstr. 7, 91058 Erlangen, Germany
(Dated: October 30, 2018)
Employing density-functional calculations we study single and double graphene layers on Si- and
C-terminated 1 × 1 - 6H-SiC surfaces. We show that, in contrast to earlier assumptions, the first
carbon layer is covalently bonded to the substrate, and cannot be responsible for the graphene-
type electronic spectrum observed experimentally. The characteristic spectrum of free-standing
graphene appears with the second carbon layer, which exhibits a weak van der Waals bonding to
the underlying structure. For Si-terminated substrate, the interface is metallic, whereas on C-face
it is semiconducting or semimetallic for single or double graphene coverage, respectively.
PACS numbers: 68.35.Ct, 68.47.Fg, 73.20.-r
The last years have witnessed an explosion of inter-
est in the prospect of graphene-based nanometer-scale
electronics [1, 2, 3, 4]. Graphene, a single hexagonally
ordered layer of carbon atoms, has a unique electronic
band structure with the conic “Dirac points” at two in-
equivalent corners of the two-dimensional Brillouin zone.
The electron mobility may be very high and lateral pat-
terning with standard lithography methods allows de-
vice fabrication [1]. Two ways of obtaining graphene
samples have been used up to now. In the first “me-
chanical” method, the carbon monolayers are mechani-
cally split off the bulk graphite crystals and deposited
onto a SiO2/Si substrate [4]. This way an almost “free-
standing” graphene is produced, since the carbon mono-
layer is practically not coupled to the substrate. The sec-
ond method uses epitaxial growth of graphite on single-
crystal silicon carbide (SiC). The ultrathin graphite layer
is formed by vacuum graphitization due to Si depletion
of the SiC surface [5]. This method has apparent techno-
logical advantages over the “mechanical” method, how-
ever it does not guarantee that an ultrathin graphite (or
graphene) layer is electronically isolated from the sub-
strate. Moreover, one expects a covalent coupling be-
tween both which may strongly modify the electronic
properties of the graphene overlayer. Yet, experiments
show that the transport properties of the interface are
dominated by a single epitaxial graphene layer [1, 2].
Most surprisingly, the electronic spectrum seems not to
be affected much by the substrate. As in free-standing
graphene one observes the “Dirac points” with the linear
dispersion relation around them. The electron dynamics
is governed by a Dirac-Weyl Hamiltonian with the Fermi
velocity of graphene replacing the speed of light. This
leads to an unusual sequence of Landau levels in a mag-
netic field and hence to peculiar features in the quantum
Hall effect [1, 4].
The growth of high-quality graphene layers on both
Si-terminated or C-terminated SiC{0001} surfaces oc-
curs in vacuum at annealing temperatures above 1400◦C.
The geometric structure of the interface is unclear. For-
beaux et al. [5] proposed that on the Si-face the graphite
FIG. 1: (Color online) Side view (a) and top view (b) of a
graphene layer on the SiC(0001) surface. The
3R30◦
surface unit cell is highlighted.
layer is loosely bound by van der Waals-forces to the√
3R30◦-reconstructed substrate. On the con-
trary, combining STM and LEED data with DFT cal-
culations Chen et al. [6] came to the conclusion that the
graphite sheet is formed on a complex 6 × 6-structure,
from which originates the observed 6
3 × 6
3R30◦ re-
construction that precedes the graphite formation. On
the C-terminated SiC(0001̄) face, graphite growth on top
of a 2 × 2 reconstruction was reported [5, 7]. Berger
et al. [1, 2] observed the formation of large high-quality
graphene islands on top of a 1×1 C-terminated SiC sub-
strate with a
3R30◦ interface reconstruction.
In this work we employ an ab initio density-functional
theory approach to study the bonding and electronic
structure of graphene on SiC. We find that a strong co-
valent bonding of the first carbon layer to the substrate
removes the graphene-type electronic features from the
energy region around the Fermi level. However, these
features reappear with the second carbon layer. We also
compare the electronic properties of graphene on Si- and
C-terminated surfaces.
Our calculations were performed with the density-
functional theory program package VASP [8, 9, 10, 11] in
the local spin density approximation (LSDA). Projector
augmented wave (PAW) pseudopotentials [12] were used.
A special 7× 7× 1 k-point sampling was applied for the
Brillouin-zone integration. The plane wave basis set was
http://arxiv.org/abs/0704.0216v2
restricted by a cut-off energy of 400 eV. We have chosen a
6H-SiC polytype, which is most often used in experimen-
tal studies. The supercell was constructed of 6 bi-layers
of SiC in the S3-structure [13], one or two carbon mono-
layers and a vacuum interval needed to separate the slabs.
The vacuum separation varied, depending on the carbon
coverage, between 10 to 15 Å. The graphene layer was
placed on top of the unreconstructed 6H-SiC substrate
such that the structure had a lateral
3R30◦ el-
ementary cell (Fig. 1a). Due to the lattice mismatch
of 8% between SiC and graphite, this requires stretching
the graphene layer. We verified that for the free-standing
graphene layer the stretch reduces the total bandwidth
from 19.1 eV to 17.3 eV but does not affect the electronic
spectrum close to the Fermi energy. The elastic energy
is 0.8 eV per graphene unit cell.
The interface unit cell (cf. Fig. 1b) contains three sur-
face atoms of the substrate and four elementary unit cells
of graphene. The dangling bonds of the substrate atoms
at the corners of the unit cell are unsaturated, while the
other surface atoms bind to two carbon atoms of the
hexagonal graphene ring. In case of the Si-terminated
SiC(0001) surface, we find that the graphene layer is sep-
arated by 2.58 Å from the SiC substrate. The carbon
atoms covalently bonded to the substrate relax towards
the SiC surface, such that the bond length is 2.0 Å. This
is only slightly longer than the bond length 1.87 Å in SiC.
The graphene bonding releases 0.72 eV per graphene unit
cell. For the C-terminated SiC(0001̄) face, the graphene
layer is somewhat closer (2.44 Å) to the substrate and the
bond length between the bonding carbon atoms reduces
to 1.87 Å. The energy gain is 0.60 eV per graphene unit
cell. On both interfaces, the bonding atom of the sub-
strate relaxes outwards, whereas the partner graphene
atom moves towards the substrate. The bonding ener-
gies are quite close but somewhat smaller than the elastic
deformation energy of the graphene layer. However, the
latter can be drastically lessened by defects which result
from the lattice mismatch.
For a second graphene layer placed in the graphite-
type AB stacking, we find a weak bonding at a distance
of 3.3 Å, very close to the bulk graphite value 3.35 Å.
This conforms to the fact that LSDA, despite the lack
of long-range non-local correlations, produces reason-
able interlayer distances in van der Waals crystals like
graphite [14, 15] or h-BN [16]. As shown by Marini et
al. [16], a delicate error cancellation between exchange
and correlation underlies this apparent performance of
the LSDA. The semilocal GGA, which violates this bal-
ance, fails to generate the interplanar bonding in both
graphite [15] and h-BN [16], while producing a band
structure identical to LSDA [15]. It is thus natural to
assume that in our situation the bonding between the
graphene layers is the same as in bulk graphite with the
same interplanar distance. To reduce the calculational
cost, we fixed the interplanar distance at this value.
The first graphene layer, which is covalently bonded to
the substrate, thus serves as a buffer separating the SiC
crystal and the van der Waals bonded second graphene
sheet. Most probably, the 6
3R30◦ reconstructed
carbon-rich Si-terminated surface observed as a precur-
sor of graphitization is a natural realization of this buffer
layer in the epitaxial process. The 6
3 structure is
practically commensurate with graphene since 13 times
the graphene lattice constant almost precisely fits 6
times the SiC lattice parameter. In any case, there is
no stress in the second carbon layer. Even placed on a
strongly stretched buffer layer, the upper layer relaxes to
its natural lattice constant due to the weak interlayer in-
teraction. For the C-face Berger et al. [1] found graphene
formation on a 1× 1 substrate with a
3 interface
unit cell. This structure is the same as we used in our
calculations.
Figs. 2a and 2b show the electronic energy spectrum
of a single graphene layer on the two SiC surfaces. The
shaded regions are the projected conduction and valence
energy bands of SiC. The Kohn-Sham energy gap of
1.98 eV is smaller than the optical band gap (3.02 eV)
of the bulk 6H-SiC, which is a common consequence
of LSDA. The covalent bonding drastically changes the
graphene electron spectrum at the Fermi energy. The
“Dirac cones” are merged into the valence band, whereas
the upper graphene bands overlap with the SiC conduc-
tion band. Hence a wide energy gap emerges in the
graphene spectrum. A similar gap opening due to hy-
drogen absorption on a single graphene sheet was pre-
dicted in Ref. 17. The weakly dispersive interface states
visible in Figs. 2a and 2b result from the interaction
of the graphene layer with the three dangling orbitals
of the substrate. Two of them make covalent bonds,
while the third one in the center of the graphene ring
remains unsaturated (cf. Fig. 1b). A projection analysis
of the wave functions reveals that the gap states close to
the Fermi energy originate from the remaining dangling
bonds of the substrate. On the Si-face we find a half-filled
metallic state, whereas on the C-face the interface state
is split into a singly occupied (spin polarized) and an
empty state, making the interface insulating. In contrast,
on both clean SiC surfaces LSDA predicts a substantial
splitting of the surface states (0.86 eV for SiC(0001) and
0.45 eV for SiC(0001̄), see Table I). Actually, the gap
separating a singly occupied and an empty state is larger
due to the Hubbard repulsion of the electrons (about 2 eV
for the
3R30◦ reconstructed surface [18, 19]), but
already LSDA correctly reproduces the insulating char-
acter of both surfaces.
The reason for the striking difference between the two
graphene-covered surfaces becomes clear if one compares
the planar localization of the two gap states. As seen in
Fig. 3a for the Si-face the interface state electron density
is strongly delocalized. As the projection analysis shows,
this results from the hybridization with the graphene-
Γ Κ Μ Γ
SiC(0001)/Graphene
Γ Κ Μ Γ
SiC(0001)/Graphene
Γ ΓΚ Μ
SiC(0001)/2 graph. lay.
Γ ΓΚ Μ
SiC(0001)/2 graph. lay.
c) d)
FIG. 2: Energy spectrum of the interface states of a) the SiC(0001)/graphene interface, b) the SiC(0001̄)/graphene interface,
c) SiC(0001) with two layers of graphene and d) SiC(0001̄) with two layers of graphene. The Fermi energy is indicated by the
dashed line. K̄ and M̄ are the high-symmetry points of the surface Brillouin zone of the
3R30◦ surface unit cell.
FIG. 3: (Color online) Charge density of the interface states
at the Fermi energy for a single graphene layer on a) SiC(0001)
and b) SiC(0001̄).
induced electron states overlapping with the conduction
band (see Fig. 2a). Given the delocalized nature of the
interface state we expect the influence of Hubbard cor-
relations to be small. In contrast, at the C-terminated
substrate the electron state retains its localized charac-
ter, although it is smeared over a carbon ring just above
the unsaturated C-dangling bond. The localization fa-
vors the spin polarization and thus the splitting of the
gap state, whereas the interface state at the Si-face re-
mains spin-degenerate. In the former case, Hubbard cor-
relations may lead to a further splitting of the interface
state.
Figures 2c and 2d show that the second carbon layer
indeed possesses an electronic structure similar to free-
standing graphene. The characteristic conic point ap-
pears on the Γ̄ − K̄ line (note that since the Brillouin
zone corresponds to the
3R30◦ unit cell, the conic
point is not located at the K̄-point). The interface states
of the buffer layer remain practically unchanged since
the interaction of the carbon layers is very small. The
metallic interface state on the Si-terminated substrate
pins the Fermi level just above the conic point, making
the second graphene layer n-doped. On C-terminated
substrate the Fermi level runs exactly through the conic
point. Hence the interface is semimetallic just as for free-
standing graphene. Indeed, for a graphene-covered C-
face Berger et al. [1] found that the thin graphite layers
possess electronic properties of free-standing graphene.
The parameters of the electron states for the different
interfaces are summarized in Table I. For clean unre-
constructed surfaces we find work functions of 4.75 eV
(Si-terminated surface) and 5.75 eV (C-terminated sur-
face). The former value is practically the same as the
work function of the reconstructed SiC(0001) [20]. The
first graphene layer reduces this value to 3.75 eV, which
is 1.3 eV lower than the work function of free-standing
graphene. The drastic reduction of the work function
is caused by charge flow from graphene to the interface
region, which induces a dipole layer. On the C-face the
graphene overlayer also reduces the work function, but to
a lesser extent such that it remains above the graphene
value. Adding the second graphene layer makes the work
function closer to that of graphene for both faces.
The Fermi level pinning close to the conduction band
makes the graphitized Si-face especially suitable for
Ohmic contacts on n-type SiC, because it guarantees a
low Schottky barrier. Indeed, Lu et al. [21] find a very
low resistance for thermally treated SiC contacts with
TABLE I: Parameters of the unreconstructed and graphene-covered SiC{0001} surfaces in eV: work function φ, positions of
the occupied and the unoccupied surface and interface states above the valence band edge (Eo, Eu) and their corresponding
bandwidths (Bo, Bu).
Work function φ Eo Bo Eu Bu
SiC(0001) 1×1 4.75 Ev + 0.92 0.45 Ev + 1.78 0.53
SiC(0001)/Graphene 3.75 Ev + 1.64 0.35 − −
SiC(0001)/2 Graphene 4.33 Ev + 1.64 0.40 − −
SiC(0001̄) 1×1 5.75 Ev + 0.05 0.75 Ev + 0.50 0.45
SiC(0001̄)/Graphene 5.33 Ev + 0.43 0.13 Ev + 1.19 0.14
SiC(0001̄)/2 Graphene 5.31 Ev + 0.44 0.10 Ev + 1.19 0.15
Graphene (single layer) 5.11
nickel and cobalt, while other metals, which form car-
bides and thereby remove the graphitic inclusions, were
rectifying. Recently Seyller et al. measured the Schot-
tky barrier between n-type 6H-SiC(0001) and graphite
by photoelectron spectroscopy and found a low value of
0.3 eV [22]. On the contrary, the C-terminated face has
the Fermi level close to the middle of the band gap and
is semiconducting or semimetallic.
In conclusion, we investigated the interface between
1 × 1 - 6H-SiC{0001} surfaces and carbon layers em-
ploying ab initio density-functional theory. We find
that graphene overlayers on SiC(0001) and SiC(0001̄)
faces possess qualitatively different electronic structures.
While the former is metallic, the latter has semiconduct-
ing properties. The conic points at the Fermi energy,
which are specific for graphene, appear only with the sec-
ond layer. The first carbon sheet is covalently bound to
the substrate and plays the role of a transition region be-
tween a covalent SiC crystal and a van der Waals bonded
stack of graphene layers.
This work was supported by Deutsche Forschungsge-
meinschaft within the SiC Research Group. We are grate-
ful to L. Magaud and F. Varchon for communicating to
us similar results on the SiC/graphene system [23] and
fruitful discussions.
∗ Electronic address: Alexander.Mattausch@physik.uni-erlangen.de
[1] C. Berger, Z. Song, X. Li, X. Wu, N. Brown, C. Naud,
D. Mayou, T. Li, J. Hass, A. N. Marchenkov, et al., Sci-
ence 312, 1191 (2006).
[2] J. Hass, C. A. Jeffrey, R. Feng, T. Li, X. Li, Z. Song,
C. Berger, W. A. de Heer, P. N. First, and E. H. Conrad,
Appl. Phys. Lett. 89, 143106 (2006), cond-mat/0604206.
[3] T. Seyller, K. V. Emtsev, K. Gao, F. Speck, L. Ley,
A. Tadich, L. Broekmann, J. D. Riley, R. C. G. Leckey,
O. Rader, et al., Surf. Sci. 600, 3906 (2006).
[4] Y. Zhang, Z. Jiang, J. P. Small, M. S. Purewal, Y.-W.
Tan, M. Fazlollahi, J. D. Chudow, J. A. Jaszczak, H. L.
Stormer, and P. Kim, Phys. Rev. Lett. 96, 136806 (2006).
[5] I. Forbeaux, J.-M. Themlin, and J.-M. Debever, Phys.
Rev. B 58, 16396 (1998).
[6] W. Chen, H. Xu, L. Liu, X. Gao, D. Qi, G. Peng, S. C.
Tan, Y. Feng, K. P. Loh, and A. T. S. Wee, Surf. Sci.
596, 176 (2005).
[7] I. Forbeaux, J.-M. Themlin, A. Charrier, F. Thibaudau,
and J.-M. Debever, Appl. Surf. Sci. 162-163, 406 (2000).
[8] G. Kresse and J. Hafner, Phys. Rev. B 47, 558 (1993).
[9] G. Kresse, Ph.D. thesis, Technische Universität Wien,
Austria (1993).
[10] G. Kresse and J. Furthmüller, Phys. Rev. B 54, 11169
(1996).
[11] G. Kresse and J. Furthmüller, Comput. Mat. Sci 6, 15
(1996).
[12] G. Kresse and D. Joubert, Phys. Rev. B 59, 1758 (1999).
[13] U. Starke, J. Schardt, J. Bernhardt, M. Franke, and
K. Heinz, Phys. Rev. Lett. 82, 2107 (1999).
[14] J.-C. Charlier, X. Gonze, and J. P. Michenaud, Carbon
32, 289 (1994).
[15] N. Ooi, A. Rairkar, and J. B. Adams, Carbon 44, 231
(2006).
[16] A. Marini, P. Garćıa-González, and A. Rubio, Phys. Rev.
Lett. 96, 136404 (2006).
[17] E. J. Duplock, M. Scheffler, and P. J. D. Lindan, Phys.
Rev. Lett. 92, 225502 (2004).
[18] M. Rohlfing and J. Pollmann, Phys. Rev. Lett. 84, 135
(2000).
[19] V. I. Anisimov, A. E. Bedin, M. A. Korotin, G. San-
toro, S. Scandolo, and E. Tosatti, Phys. Rev. B 61, 1752
(2000).
[20] M. Wiets, M. Weinelt, and T. Fauster, Phys. Rev. B 68,
125321 (2003).
[21] W. Lu, W. C. Mitchel, G. R. Landis, T. R. Crenshaw,
and W. E. Collins, J. Appl. Phys. 93, 5397 (2003).
[22] T. Seyller, K. V. Emtsev, F. Speck, K.-Y. Gao, and
L. Ley, Appl. Phys. Lett. 88, 242103 (2006).
[23] F. Varchon, L. Magaud, and V. Olevano, private com-
munication; F. Varchon, R. Feng, J. Hass, X. Li, B. N.
Nguyen, C. Naud, P. Mallet, J. Y. Veuillen, C. Berger,
E. H. Conrad, L. Magaud, arXiv:cond-mat/0702311.
mailto:Alexander.Mattausch@physik.uni-erlangen.de
ABSTRACT
  Employing density-functional calculations we study single and double graphene
layers on Si- and C-terminated 1x1-6H-SiC surfaces. We show that, in contrast
to earlier assumptions, the first carbon layer is covalently bonded to the
substrate, and cannot be responsible for the graphene-type electronic spectrum
observed experimentally. The characteristic spectrum of free-standing graphene
appears with the second carbon layer, which exhibits a weak van der Waals
bonding to the underlying structure. For Si-terminated substrate, the interface
is metallic, whereas on C-face it is semiconducting or semimetallic for single
or double graphene coverage, respectively.

<|endoftext|><|startoftext|>
Introduction
	Channel Model
	Beamforming with Limited Feedback
	MISO Channel
	Multi-Input Multi-Output (MIMO) Channel
	Numerical Results
	Precoding Matrix with Arbitrary Rank
	Quantized Precoding with Linear Receivers
	Matched filter
	MMSE receiver
	Numerical Results
	Conclusions
	Appendix
	Proof of Theorem ??
	Proof of Theorem ??
	Proof of Theorem ??
	Derivation of (??)-(??)
	References
	Biographies
	Wiroonsak Santipach
	Michael L. Honig
ABSTRACT
  Given a multiple-input multiple-output (MIMO) channel, feedback from the
receiver can be used to specify a transmit precoding matrix, which selectively
activates the strongest channel modes. Here we analyze the performance of
Random Vector Quantization (RVQ), in which the precoding matrix is selected
from a random codebook containing independent, isotropically distributed
entries. We assume that channel elements are i.i.d. and known to the receiver,
which relays the optimal (rate-maximizing) precoder codebook index to the
transmitter using B bits. We first derive the large system capacity of
beamforming (rank-one precoding matrix) as a function of B, where large system
refers to the limit as B and the number of transmit and receive antennas all go
to infinity with fixed ratios. With beamforming RVQ is asymptotically optimal,
i.e., no other quantization scheme can achieve a larger asymptotic rate. The
performance of RVQ is also compared with that of a simpler reduced-rank scalar
quantization scheme in which the beamformer is constrained to lie in a random
subspace. We subsequently consider a precoding matrix with arbitrary rank, and
approximate the asymptotic RVQ performance with optimal and linear receivers
(matched filter and Minimum Mean Squared Error (MMSE)). Numerical examples show
that these approximations accurately predict the performance of finite-size
systems of interest. Given a target spectral efficiency, numerical examples
show that the amount of feedback required by the linear MMSE receiver is only
slightly more than that required by the optimal receiver, whereas the matched
filter can require significantly more feedback.

<|endoftext|><|startoftext|>
Introduction
Different problems of decidability in combinatorics on words are always of great interest and dif-
ficulty. Here we deal with two main types of symbolic infinite sequences — morphic and almost
periodic — and try to understand connections between them. Namely, we are trying to find an
algorithmic criterion which given a morphic sequence decides whether it is almost periodic.
Though the main problem still remains open, we propose polynomial-time algorithms solving
the problem in two important particular cases: for pure morphic sequences generated by non-erasing
morphisms (Section 3) and for automatic sequences (Section 4). In Section 5 we say a few words
about connections with monadic logics. In particular, in a curious result of Corollary 4 we give a
reason why the main problem may be decidable.
Some attempts to solve the problem were already done. In [3] A. Cobham gives a criterion for
automatic sequence to be almost periodic. But even if his criterion gives some effective procedure
solving the problem (which is not clear from his result, and he does not care about it at all), this
procedure could not be fast. We construct a polynomial-time algorithm solving the problem. In [5]
A. Maes deals with pure morphic sequences and finds a criterion for them to belong to a slightly
different class of generalized almost periodic sequences (but he calls them almost periodic — see
[9] for different definitions). And again, his algorithm does not seem to be polynomial-time.
All the results of this paper can be found in [10].
2 Preliminaries
Denote the set of natural numbers {0, 1, 2, . . . } by N and the binary alphabet {0, 1} by B. Let A
be a finite alphabet. We deal with sequences over this alphabet, i. e., mappings x : N → A, and
denote the set of these sequences by AN.
∗Moscow State University, Russia, http://lpcs.math.msu.su/~pritykin/, yura@mccme.ru. The work was par-
tially supported by RFBR grants 06-01-00122, 05-01-02803, Kolmogorov grant of Institute of New Technologies, and
August Möbius grant of Independent University of Moscow.
http://arxiv.org/abs/0704.0218v1
Denote by A∗ the set of all finite words over A including the empty word Λ. If i ≤ j are natural,
denote by [i, j] the segment of N with ends in i and j, i. e., the set {i, i + 1, i + 2, . . . , j}. Also
denote by x[i, j] a subword x(i)x(i + 1) . . . x(j) of a sequence x. A segment [i, j] is an occurrence
of a word u ∈ A∗ in a sequence x if x[i, j] = u. We say that u 6= Λ is a factor of x if u occurs in x.
A word of the form x[0, i] for some i is called prefix of x, and respectively a sequence of the form
x(i)x(i+ 1)x(i+ 2) . . . for some i is called suffix of x and is denoted by x[i,∞). Denote by |u| the
length of a word u. The occurrence u = x[i, j] in x is k-aligned if k|i.
A sequence x is periodic if for some T we have x(i) = x(i+ T ) for each i ∈ N. This T is called
a period of x. We denote by P the class of all periodic sequences. Let us consider an extension of
this class.
A sequence x is called almost periodic1 if for every factor u of x there exists a number l such
that every factor of x of length l contains at least one occurrence of u (and therefore u occurs in x
infinitely many times). Obviously, to show almost periodicity of a sequence it is sufficient to check
the mentioned condition only for all prefixes but not for all factors (and even for some increasing
sequence of prefixes only). Denote by AP the class of all almost periodic sequences.
Let A, B be finite alphabets. A mapping φ : A∗ → B∗ is called a morphism if φ(uv) = φ(u)φ(v)
for all u, v ∈ A∗. A morphism is obviously determined by its values on single-letter words. A
morphism is non-erasing if |φ(a)| > 1 for each a ∈ A. A morphism is k-uniform if |φ(a)| = k for
each a ∈ A. A 1-uniform morphism is called a coding. For x ∈ AN denote
φ(x) = φ(x(0))φ(x(1))φ(x(2)) . . .
Further we consider only morphisms of the form A∗ → A∗ (but codings are of the form A → B,
which in fact does not matter, they can be also of the form A→ A without loss of generality). Let
φ(s) = su for some s ∈ A, u ∈ A∗. Then for all natural m < n the word φn(s) begins with the word
φm(s), so φ∞(s) = limn→∞ φ
n(s) = suφ(u)φ2(u)φ3(u) . . . is well-defined. If ∀n φn(u) 6= Λ, then
φ∞(s) is infinite. In this case we say that φ is prolongable on s. Sequences of the form h(φ∞(s))
for a coding h : A→ B are called morphic, of the form φ∞(a) are called pure morphic.
Notice that there exist almost periodic sequences that are not morphic (in fact, the set of almost
periodic sequences has cardinality continuum, while the set of morphic sequences is obviously
countable), as well as there exist morphic sequences that are not almost periodic (you will find
examples later). Our goal is to determine whether a morphic sequence is almost periodic or not
given its constructive definition.
First of all, observe the following
Lemma 1. A sequence φ∞(s) is almost periodic iff s occurs in this sequence infinitely many times
with bounded distances.
Proof. In one direction the statement is obviously true by definition.
Suppose now that s occurs in φ∞(s) infinitely many times with bounded distances. Then for
every m the word φm(s) also occurs in φ∞(s) infinitely many times with bounded distances. But
every word u occurring in φ∞(s) occurs in some prefix φm(s) and thus occurs infinitely many times
with bounded distances.
For a morphism φ : {1, . . . , n} → {1, . . . , n} we can define a corresponding matrix M(φ), such
that M(φ)ij is a number of occurrences of symbol i into φ(j). One can easily check that for each l
we have M(φ)l =M(φl).
Morphism φ is called primitive if for some l all the numbers in M(φl) are positive.
1It was called strongly or strictly almost periodic in [7, 8].
Let us construct an oriented graph G corresponding to a morphism. Let its set of vertices be
A. In G edges go from b ∈ A to all the symbols occurring in φ(b).
For φ∞(s) it can easily be found using the graph corresponding to φ which symbols from A
really occur in this sequence. Indeed, these symbols form the set of all vertices that can be reached
from s. So without loss of generality from now on we assume that all the symbols from A occur in
φ∞(s).
A morphism is primitive if and only if its corresponding graph is strongly connected, i. e., there
exists an oriented path between every two vertices. This reformulation of the primitiveness notion
seems to be more appropriate for computational needs.
By Lemma 1 (and the observation that codings preserve almost periodicity) morphic sequences
obtained by primitive morphisms are always almost periodic. Moreover, in the case of increasing
morphisms (such that |φ(b)| > 2 for each b) this sufficient condition is also necessary (and this
is a polynomial-time algorithmic criterion). However when we generalize this case even on non-
erasing morphisms, it is not enough to consider only the corresponding graph or even the matrix
of morphism (which has more information), as it can be seen from the following example.
Let φ1 be as follows: 0 → 01, 1 → 120, 2 → 2, and φ2 be as follows: 0 → 01, 1 → 210, 2 → 2.
Then these two morphisms have identical matrices of morphism, but φ∞1 (0) is almost periodic, while
φ∞2 (0) is not. Indeed, in φ
2 (0) there are arbitrary long segments like 222. . . 22, so φ
2 (0) /∈ AP .
There is no such problem in φ∞1 (0). Since 0 occurs in both φ1(0) and φ1(1), and 22 does not occur
in φ∞1 (0), it follows that 0 occurs in φ
1 (0) with bounded distances. Thus φ
1 (0) for every m > 0
occurs in φ∞1 (0) with bounded distances, so φ
1 (0) ∈ AP. See Theorem 1 for a general criterion of
almost periodicity in the case of fixed points of non-erasing morphisms.
To introduce a bit the notion of almost periodicity, let us formulate an interesting result on this
topic. It seems to be first proved in [3], but also follows from the results of [9]. For x ∈ AN, y ∈ BN
define x× y ∈ (A×B)N such that (x× y)(i) = 〈x(i), y(i)〉.
Proposition 1. If x is almost periodic and y is periodic, then x× y is almost periodic.
3 Pure Morphic Sequences Generated by Non-erasing Morphisms
Here we consider the case of morphic sequence of the form φ∞(s) for non-erasing φ. We present an
algorithm that determines whether a morphic sequence φ∞(s) is almost periodic given an alphabet
A, a morphism φ and a symbol s ∈ A.
Suppose we have A, φ and s ∈ A, such that |A| = n, maxb∈A |φ(b)| = k, s begins φ(s).
Remember that we suppose that all the symbols from A appear in φ∞(s).
Divide A into two parts. Let I be the set of all symbols b ∈ A such that |φm(b)| → ∞ as
m→ ∞. Denote F = A \ I, it is the set of all symbols b such that |φm(b)| is bounded. Also define
E ⊆ F to be the set of all symbols b such that |φ(b)| = 1.
We can find a decomposition A = I ⊔ F in poly(n, k)-time as follows.
Find E. Then find all the cycles in G with all the vertices lying in E. Join all the vertices of all
these cycles in a set D. This set is stabilizing: F is the set of all vertices in G such that all infinite
paths starting from them stabilize in D. Polynomiality can be checked easily.
Construct “a graph of left tails” L with marked edges. Its set of vertices is I. From each vertex
b exactly one edge goes off. To construct this edge, find a representation φ(b) = uv, where c ∈ I,
u is the maximal prefix of φ(b) containing only symbols from F . It follows from the definitions
of I and F that u does not coincide with φ(b), that is why this representation is correct. Then
construct in L an edge from b to c and write u on it.
Analogously we construct “a graph of right tails” R. (In this case we consider representations
φ(b) = vu where u ∈ F ∗, c ∈ I.)
Now we formulate a general criterion.
Theorem 1. A sequence φ∞(s) is almost periodic iff
1) G restricted to I is strongly connected;
2) in graphs L and R on each edge of each cycle an empty word Λ is written.
It seems that full and detailed proof of this theorem can only confuse a reader, rather than a
proof sketch.
Proof sketch. By Lemma 1 for almost periodicity it is necessary and sufficient to check whether
symbol s occurs infinitely many times with bounded distances.
For every symbol b ∈ I the symbol s should occur in some φl(b), that is what the 1st part of
the criterion says.
Furthermore, in the sequence φ∞(s) all the segments of consecutive symbols from F should be
bounded. Indeed, every such segment consists only of symbols from F , but s /∈ F . That is what
the 2nd part of the criterion means, let us explain why.
Consider some v = buc occurring somewhere in φ∞(a), where b, c ∈ I, u ∈ F ∗. Every element
of sequence of words v, φ(v), φ2(v), φ3(v), . . . occurs in φ∞(s). Somewhere in the middle of φl(v) =
φl(b)φl(u)φl(c) a word φl(u) occurs. As l increases, some words from F ∗ might stick to φl(u) from
left or right for these words can come from φl(b) or φl(c). These words exactly correspond to those
written on edges of L or R. The 2nd part of the criterion exactly says that this situation can
happen only finitely many times, until we get to some cycle in L or R.
Let us consider examples with φ1 and φ2 from the end of Section 2. In both cases I = {0, 1},
F = {2}. On every edge of R in both cases Λ is written. Almost the same is true for L: the only
difference is about the edge going from 1 to 1. In the case of φ1 an empty word is written on this
edge, while in the case of φ2 a word 2 is written. That is why φ
1 (0) is almost periodic, while φ
2 (0)
is not.
Corollary 1. If for all b ∈ A we have |φ(b)| > 2, then φ∞(s) is almost periodic iff φ is primitive.
Proof. Follows from Theorem 1. In that case A = I, and on all the edges of L and R the empty
word is written.
Corollary 2. There exists a poly(n, k)-algorithm that says whether φ∞(s) is almost periodic.
Proof. Conditions from Theorem 1 can be checked in polynomial time.
It also seems useful to formulate an explicit version of the criterion for the binary case. We do
it without any additional assumptions, opposite to the previous.
Corollary 3. For non-erasing φ : B → B that is prolongable on 0 a sequence φ∞(0) is almost
periodic iff one of the following conditions holds:
1) φ(0) contains only 0s;
2) φ(1) contains 0;
3) φ(1) = Λ;
4) φ(1) = 1 and φ(0) = 0u0 for some word u.
4 Uniform Morphisms
Now we deal with morphic sequences obtained by uniformmorphisms. Again we present a polynomial-
time algorithm for solving the problem in this situation.
Suppose we have an alphabet A, a morphism φ : A∗ → A∗, a coding h : A→ B, and s ∈ A, such
that |A| = n, |B| 6 n, ∀b ∈ A |φ(b)| = k, s begins φ(s). We are interested in whether h(φ∞(s)) is
almost periodic. Sequences of the form h(φ∞(s)) with φ being k-uniform are also called k-automatic
(see [1]).
4.1 Equivalence Relations and Uniform Morphisms
For each l ∈ N define an equivalence relation on A: b ∼l c iff h(φ
l(b)) = h(φl(b)). We can easily
continue this relation on A∗: u ∼l v iff h(φ
l(u)) = h(φl(v)). In fact, this means |u| = |v| and
u(i) ∼l v(i) for all i, 1 6 i 6 |u|.
Let Bm be the Bell number, i. e., the number of all possible equivalence relations on a finite
set with exactly m elements, see [13]. As it follows from this article, we can estimate Bm in the
following way.
Lemma 2. 2m 6 Bm 6 2
Cm logm for some constant C.
Thus the number of all possible relations ∼l is not greater than Bn = 2
O(n logn). Moreover, the
following lemma gives a simple description for the behavior of these relations as l tends to infinity.
Lemma 3. If ∼r equals ∼s, then ∼r+p equals ∼s+p for all p.
Proof. Indeed, suppose ∼r equals ∼s. Then b ∼r+1 c iff φ(b) ∼r φ(c) iff φ(b) ∼s φ(c) iff b ∼s+1 c.
So if ∼r equals ∼s, then ∼r+1 equals ∼s+1, which implies the lemma statement.
This lemma means that the sequence (∼l)l∈N turns out to be ultimately periodic with a period
and a preperiod both not greater than Bn. Thus we obtain the following
Lemma 4. For some p, q 6 Bn we have for all i and all t > p that ∼t equals ∼t+iq.
4.2 Criterion
Now we are trying to get a criterion which we could check in polynomial time. Notice that the
situation is much more difficult than in the pure case because of a coding allowed. In particular,
the analogue of Lemma 1 for non-pure case does not hold.
We will move step by step to the appropriate version of the criterion reformulating it several
times.
This proposition is quite obvious and follows directly from the definition of almost periodicity
since all h(φm(a)) are the prefixes of h(φ∞(a)).
Proposition 2. A sequence h(φ∞(s)) is almost periodic iff for all m the word h(φm(s)) occurs in
h(φ∞(s)) infinitely often with bounded distances.
And now a bit more complicated version.
Proposition 3. A sequence h(φ∞(s)) is almost periodic iff for all m the symbols that are ∼m-
equivalent to s occur in φ∞(s) infinitely often with bounded distances.
Proof. ⇐. If the distance between two consecutive occurrences in φ∞(s) of symbols that are ∼m-
equivalent to s is not greater than t, then the distance between two consecutive occurrences of
h(φm(s)) in h(φ∞(s)) is not greater than tkm.
⇒. Suppose h(φ∞(s)) is almost periodic. Let ym = 012 . . . (k
m − 2)(km − 1)01 . . . (km − 1)0 . . .
be a periodic sequence with a period km. Then by Proposition 1 a sequence h(φ∞(s))×ym is almost
periodic, which means that the distances between consecutive km-aligned occurrences of h(φm(s))
in h(φ∞(s)) are bounded. It only remains to notice that if h(φ∞(s))[ikm, (i+1)km−1] = h(φm(s)),
then φ∞(s)(i) ∼m s.
Let Ym be the following statement: symbols that are ∼m-equivalent to s occur in φ
∞(s) infinitely
often with bounded distances.
Suppose for some T that YT is true. This implies that h(φ
T (s)) occurs in h(φ∞(s)) with bounded
distances. Therefore for all m 6 T a word h(φm(s)) occurs in h(φ∞(s)) with bounded distances
since h(φm(s)) is a prefix of h(φT (s)). Thus we do not need to check the statements Ym for all m,
but only for all m > T for some T .
Furthermore, it follows from Lemma 4, that we are sufficient to check the only one such state-
ment as in the following
Proposition 4. For all r > Bn: a sequence h(φ
∞(s)) is almost periodic iff the symbols that are
∼r-equivalent to s occur in φ
∞(s) infinitely often with bounded distances.
And now the final version of our criterion.
Proposition 5. For all r > Bn: a sequence h(φ
∞(s)) is almost periodic iff for some m the symbols
that are ∼r-equivalent to s occur in φ
m(b) for all b ∈ A.
Indeed, if the symbols of some set occur with bounded distances, then they occur on each
km-aligned segment for some sufficiently large m.
4.3 Polynomiality
Now we explain how to check a condition from Proposition 5 in polynomial time. We need to
show two things: first, how to choose some r > Bn and to find in polynomial time the set of all
symbols that are ∼r-equivalent to s (and this is a complicated thing keeping in mind that Bn is
exponential), and second, how to check whether for some m the symbols from this set for all b ∈ A
occur in φm(b).
Let us start from the second. Suppose we have found the set H of all the symbols that are
∼r-equivalent to s. For m ∈ N let us denote by P
m the set of all the symbols that occur in
φm(b). Our aim is to check whether exists m such that for all b we have P
m ∩H 6= ∅. First of
all, notice that if ∀b P
m ∩H 6= ∅, then ∀b P
∩H 6= ∅ for all l > m. Second, notice that the
sequence of tuples of sets ((P
m )b∈Σ)
m=0 is ultimately periodic. Indeed, the sequence (P
m=0 is
obviously ultimately periodic with both period and preperiod not greater than 2n (recall that n is
the size of the alphabet Σ). Thus the period of ((P
m )b∈Σ)
m=0 is not greater than the least common
divisor of that for (P
m=0, b ∈ A, and the preperiod is not greater than the maximal that of
m=0. So the period is not greater than (2
n)n = 2n
and the preperiod is not greater than 2n.
Third, notice that there is a polynomial-time-procedure that given a graph corresponding to some
morphism ψ (see Section 2 to recall what is the graph corresponding to a morphism) outputs a
graph corresponding to morphism ψ2. Thus after repeating this procedure n2 + 1 times we obtain
a graph by which we can easily find (P
)b∈Σ, since 2
n2+1 > 2n
+ 2n.
Similar arguments, even described with more details, are used in deciding our next problem.
Here we present a polynomial-time algorithm that finds the set of all symbols that are ∼r-equivalent
to s for some r > Bn.
We recursively construct a series of graphs Ti. Let its common set of vertices be the set of all
unordered pairs (b, c) such that b, c ∈ A and b 6= c. Thus the number of vertices is
n(n−1)
. The set
of all vertices connected with (b, c) in the graph Ti we denote by Vi(b, c).
Define a graph T0. Let V0(b, c) be the set {(φ(b)(j), φ(c)(j)) | j = 1, . . . , k, φ(b)(j) 6= φ(c)(j)}.
In other words, b ∼l+1 c if and only if x ∼l y for all (x, y) ∈ V0(b, c).
Thus b ∼2 c if and only if for all (x, y) ∈ V0(b, c) for all (z, t) ∈ V0(x, y) we have z ∼0 t. For the
graph T1 let V1(b, c) be the set of all (x, y) such that there is a path of length 2 from (b, c) to (x, y)
in T0. The graph T1 has the following property: b ∼2 c if and only if x ∼0 y for all (x, y) ∈ V1(b, c).
And even more generally: b ∼l+2 c if and only if x ∼l y for all (x, y) ∈ V1(b, c).
Now we can repeat operation made with T0 to obtain T1. Namely, in T2 let V2(b, c) be the set of
all (x, y) such that there is a path of length 2 from (b, c) to (x, y) in T1. Then we obtain: b ∼l+4 c
if and only if x ∼l y for all (x, y) ∈ V2(b, c).
It follows from Lemma 2 that log2Bn 6 Cn logn. Thus after we repeat our procedure r =
[Cn log n] times, we will obtain the graph Tl such that b ∼2r c if and only if x ∼0 y for all
(x, y) ∈ V2(b, c). Recall that x ∼0 y means h(x) = h(y), so now we can easily compute the set of
symbols that are ∼2r -equivalent to s.
5 Monadic Theories
Combinatorics on words is closely connected with the theory of second order monadic logics. Here
we just want to show some examples of these connections. More details can be found, e. g.,
in [11, 12].
We consider monadic logics on N with the relation “<”, that is, first-order logics where also
unary finite-value function variables and quantifiers over them are allowed. We also suppose that
we know some fixed finite-value function x : N → Σ and can use it in our formulas. Such a theory
is denoted by MT〈N, <, x〉 and is called monadic theory of x.
The main question here can be the question of decidability, that is, does there exist an algorithm
that given a sentence in a theory says whether this sentence is true of false.
The criterion of decidability for monadic theories of almost periodic sequences can be formulated
in terms of some their very natural characteristic, namely, almost periodicity regulator. An almost
periodicity regulator of an almost periodic sequence x is a function f : N → N such that every factor
u of x of length n occurs in each factor of x of length f(n). So an almost periodicity regulator
somehow regulates how periodic a sequence is. Notice that an almost periodicity regulator of a
sequence is not unique: every function greater than regulator is also a regulator.
Theorem 2 (Semenov 1983 [12]). If x is almost periodic, then MT〈N, <, x〉 is decidable iff x and
some its almost periodicity regulator are computable.
The following result was obtained recently, but uses the technics already used in [11, 12].
Theorem 3 (Carton, Thomas 2002 [2]). If x is morphic, then MT〈N, <, x〉 is decidable.
A curious result can be implied from two these theorems.
Corollary 4. If x is both morphic and almost periodic, then some its regulator is computable.
Proof. Indeed, if x is morphic, then by Theorem 3 the theory MT〈N, <, x〉 is decidable. Since
x is almost periodic, from Theorem 2 it follows that some almost periodicity regulator of x is
computable.
Notice that Corollary 4 does not imply the existence of an algorithm that given a morphic
sequence computes some almost periodicity regulator of this sequence whenever it is almost periodic
(but probably this algorithm can be constructed after deep analyzing the proofs of Theorems 2 and 3
and showing uniformity in a sense). And it also does not imply the decidability of almost periodicity
for morphic sequences. This decidability also does not imply Corollary 4.
By the way, Corollary 4 allows us to hope that these algorithms exist. Though the formulation
of this statement uses only combinatorics on words, the proof also involves the theory of monadic
logics. Of course, it would be interesting to find a simple combinatorial proof of the result.
And the last remark here is that Corollary 4 (and its probable uniform version) seems to be
the best progress that we can obtain by this monadic approach. One could try to express in the
monadic theory of morphic sequence (which is decidable by Theorem 3) the property of almost
periodicity, but it turns out to be impossible.
6 In General Case
We have described two polynomial-time algorithms, but without any precise bound for their working
time. Of course, it can be done after deep analyzing of all the previous, but is probably not so
interesting.
It is not still known whether the problem of determining almost periodicity of arbitrary morphic
sequence is decidable. Corollary 4 somehow supports the conjecture of decidability (but even does
not follow from this conjecture!).
Theorem 7.5.1 from [1] allows us to represent an arbitrary morphic sequence h(φ∞(s)) as
g(ψ∞(b)) where ψ is non-erasing. So it is sufficient to solve our main problem for h(φ∞(s)) with
non-erasing φ.
It seems that the general problem is tightly connected with a particular case of h(φ∞(a)) where
|φ(b)| > 2 for each b ∈ A. There is no strict reduction to this case but solving problem in this case
can help to deal with general situation.
The problem of finding an effective periodicity criterion in the case of arbitrary morphic se-
quences is also of great interest, as well as criteria for variations with periodicity and almost pe-
riodicity: ultimate periodicity, generalized almost periodicity, ultimate almost periodicity (see [9]
for definitions). If one notion is a particular case of another, it does not mean that corresponding
criterion for the first case is more difficult (or less difficult) than for the second.
Acknowledgements
The author is grateful to An. Muchnik and A. Semenov for their permanent help in the work, to
A. Frid, M. Raskin, K. Saari and to all the participants of Kolmogorov seminar, Moscow [4], for
fruitful discussions, and also to anonymous referees for very useful comments.
References
[1] J.-P. Allouche, J. Shallit. Automatic Sequences. Cambridge University Press, 2003.
[2] O. Carton, W. Thomas. The Monadic Theory of Morphic Infinite Words and Generalizations.
Information and Computation, vol. 176, pp. 51–76, 2002.
[3] A. Cobham. Uniform tag sequences. Math. Systems Theory, 6, pp. 164–192, 1972.
[4] Kolmogorov Seminar: http://lpcs.math.msu.su/kolmogorovseminar/eng/.
[5] A. Maes. More on morphisms and almost-periodicity. Theoretical Computer Science, vol. 231,
N 2, pp. 205–215, 2000.
[6] M. Morse, G. A. Hedlund. Symbolic dynamics. American Journal of Mathematics, 60, pp. 815–
866, 1938.
[7] An. Muchnik, A. Semenov, M. Ushakov. Almost periodic sequences. Theoretical Computer
Science, vol. 304, pp. 1–33, 2003.
[8] Yu. L. Pritykin. Finite-Automaton Transformations of Strictly Almost-Periodic Se-
quences. Mathematical Notes, vol. 80, N 5, pp. 710–714, 2006. Preprint on
http://arXiv.org/abs/cs.DM/0605026.
[9] Yu. Pritykin. Almost Periodicity, Finite Automata Mappings and Related Effectiveness Issues.
Proceedings of WoWA’06, St. Petersburg, Russia (satellite to CSR’06). To appear in ”Izvestia
VUZov. Mathematics”, 2007. Preprint on http://arXiv.org/abs/cs.DM/0607009.
[10] Yu. Pritykin. On Almost Periodicity Criteria for Morphic Sequences in Some Particular Cases.
Accepted to Developments in Language Theory, Turku, Finland, 2007. To appear in Lecture
Notes in Computer Science.
[11] A. L. Semenov. On certain extensions of the arithmetic of addition of natural numbers. Math.
of USSR, Izvestia, vol. 15, pp. 401–418, 1980.
[12] A. L. Semenov. Logical theories of one-place functions on the set of natural numbers. Math. of
USSR, Izvestia, vol. 22, pp. 587–618, 1983.
[13] Eric W. Weisstein. Bell Number. From MathWorld — A Wolfram Web Resource.
http://mathworld.wolfram.com/BellNumber.html
	Introduction
	Preliminaries
	Pure Morphic Sequences Generated by Non-erasing Morphisms
	Uniform Morphisms
	Equivalence Relations and Uniform Morphisms
	Criterion
	Polynomiality
	Monadic Theories
	In General Case
ABSTRACT
  In some particular cases we give criteria for morphic sequences to be almost
periodic (=uniformly recurrent). Namely, we deal with fixed points of
non-erasing morphisms and with automatic sequences. In both cases a
polynomial-time algorithm solving the problem is found. A result more or less
supporting the conjecture of decidability of the general problem is given.

<|endoftext|><|startoftext|>
Draft version November 4, 2018
Preprint typeset using LATEX style emulateapj v. 3/25/03
THE RADIO EMISSION, X-RAY EMISSION, AND HYDRODYNAMICS OF G328.4+0.2: A COMPREHENSIVE
ANALYSIS OF A LUMINOUS PULSAR WIND NEBULA, ITS NEUTRON STAR, AND THE PROGENITOR
SUPERNOVA EXPLOSION
Joseph D. Gelfand, Patrick O. Slane, and Daniel J. Patnaude
Harvard-Smithsonian Center for Astrophysics, Cambridge, MA 02138
B. M. Gaensler∗
Harvard-Smithsonian Center for Astrophysics, Cambridge, MA 02138 and
School of Physics, The University of Sydney, NSW 2006, Australia
John P. Hughes
Department of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854-8019
Fernando Camilo
Columbia Astrophysics Lab, Columbia University, New York, NY 10027
Draft version November 4, 2018
ABSTRACT
We present new observational results obtained for the Galactic non-thermal radio source G328.4+0.2
to determine both if this source is a pulsar wind nebula or supernova remnant, and in either case,
the physical properties of this source. Using X-ray data obtained by XMM, we confirm that the
X-ray emission from this source is heavily absorbed and has a spectrum best fit by a power law
model of photon index Γ = 2 with no evidence for a thermal component, the X-ray emission from
G328.4+0.2 comes from a region significantly smaller than the radio emission, and that the X-ray
and radio emission are significantly offset from each other. We also present the results of a new
high resolution (7′′) 1.4 GHz image of G328.4+0.2 obtained using the Australia Telescope Compact
Array, and a deep search for radio pulsations using the Parkes Radio Telescope. By comparing this
1.4 GHz image with a similar resolution image at 4.8 GHz, we find that the radio emission has a flat
spectrum (α ≈ 0; Sν ∝ ν
α), though some areas of the eastern edge of G328.4+0.2 have a steeper radio
spectral index of α ∼ −0.3. Additionally, we searched without success for a central radio pulsar, and
obtain a luminosity limit of L1400 <. 30mJykpc
2, assuming a distance of 17 kpc. In light of these
observational results, we test if G328.4+0.2 is a pulsar wind nebula (PWN) or a large PWN inside
a supernova remnant (SNR) using a simple hydrodynamic model for the evolution of a PWN inside
a SNR. As a result of this analysis, we conclude that G328.4+0.2 is a young (. 10000 years old)
pulsar wind nebula formed by a low magnetic field (. 1012 G) neutron star born spinning rapidly
(. 10 ms) expanding into an undetected SNR formed by an energetic (& 1051 ergs), low ejecta mass
(Mej . 5M⊙) supernova explosion which occurred in a low density (n ∼ 0.03 cm
−3) environment.
If correct, the low magnetic field and fast initial spin period of this neutron star poses problems for
models of magnetar formation which require fast initial periods.
Subject headings: stars: neutron, stars: pulsars: general, ISM: supernova remnants, radio continuum:
ISM, X-rays: individual
1. introduction
Stars with initial masses between ∼9 and 25 M⊙ are
expected to end their lives in a giant supernova (SN) ex-
plosion during which neutron stars are created. The fast
moving ejecta from the SN create a supernova remnant
(SNR), while the particle wind produced by the neu-
tron star as it loses rotational energy inflates a pulsar
wind nebula (PWN; Gaensler & Slane 2006). Initially,
the PWN is inside the SNR – and when the PWN is
detected inside the SNR the system is called a “compos-
ite” SNR (Helfand & Becker 1987). The evolution of the
central neutron star and the outer SNR affect the PWN,
Electronic address: jgelfand@cfa.harvard.edu
∗Alfred P. Sloan Research Fellow, Australian Research Council
Federation Fellow
Electronic address: jgelfand@cfa.harvard.edu
and as a result the PWN goes through several evolution-
ary phases while it is inside the SNR. This evolution is
determined by the physical properties of the neutron star
(specifically the initial period P0, the braking index p
and the strength of the dipole component to the surface
magnetic field Bns), the SN explosion (explosion energy
Esn and ejecta mass Mej), and the surrounding medium
(ambient number density n). As a result, by measur-
ing the properties of the PWN inside a SNR at a given
time one is able to constrain these physical parameters
which allows one to study the mechanisms behind both
core-collapse SNe and massive star evolution.
With this in mind, we present results of new radio
1 The braking index is defined as Ω̇ ∝ Ωp, where Ω is the angular
velocity of the neutron star’s surface.
http://arxiv.org/abs/0704.0219v1
mailto:jgelfand@cfa.harvard.edu
mailto:jgelfand@cfa.harvard.edu
observations of this source with the Australia Telescope
Compact Array (ATCA), as well as a new X-ray (X-ray
Multi-mirror Mission; XMM) observation of G328.4+0.2
(MSH 15-57; Mills et al. 1961). This source is a distant
(d ≥ 17.4 ± 0.9 kpc; Gaensler et al. 2000), radio bright
(flux density Sν = 14.3 ± 0.1 Jy at ν =1.4 GHz), po-
larized, extended (diameter D ≃ 5.0′) radio source with
a relatively flat spectral index (α ≃ −0.12± 0.03 where
Sν ∝ ν
α; Gaensler et al. 2000). Based on these radio
properties, and the discovery of non-thermal X-ray emis-
sion from this source by ASCA (Hughes et al. 2000), this
source was classified as a PWN – the largest and most
radio-luminous PWN in the Galaxy. In this interpre-
tation, the expectation is that G328.4+0.2 is ∼ 7000
years old and powered by an extremely energetic neu-
tron star (Gaensler et al. 2000). However, follow up ra-
dio polarimetry work (Johnston et al. 2004) implied that
G328.4+0.2 is an older composite SNR in which the
PWN is just a small fraction of the total volume, and
as a result is powered by a significantly less energetic
neutron star then argued by Gaensler et al. (2000). In
this paper, we analyze new observations of this source in
order to determine the age of G328.4+0.2, the energet-
ics of the neutron star and the progenitor SN, and the
density of its environment.
In §2 we present new X-ray and radio observations of
G328.4+0.2. In §3, we first discuss the expected evolu-
tionary sequence for PWNe inside SNRs (§3.1), making
general comments regarding the expected observational
signature of each phase. In §3.2 use the observational
results presented in §2 to draw some initial conclusions
about the nature of G328.4+0.2. In §4, we present a sim-
ple hydrodynamical model for the evolution of a PWN
inside a SNR, which we apply to G328.4+0.2 assuming it
is a composite SNR (§4.1.1) or a PWN (§4.1.2). Finally,
in §5 we summarize our results.
2. observations
In this Section, we present the data gathered in a
XMM observation (§2.1), a 1.4 GHz ATCA observation
of G328.4+0.2 (§2.2), and a search for a radio pulsar in
this source (§2.3).
2.1. X-ray Observations
On 2003 March 9–10, G328.4+0.2 was observed for
∼50 ks by XMM. During this observation, the pn cam-
era was operated in Small Window Mode, and the Mos1
and Mos2 camera were operated in Full Frame Mode.
The “Thick” optical filter was used due to the presence of
numerous bright stars in the field-of-view of G328.4+0.2.
The data were reduced with the software package xmm-
sas v 6.0.0 with calibration files current throughXMM-
CCF-REL-174, using the standard procedure for re-
ducing XMM data outlined in the XMM-Newton ABC
Guide2 and the Birmingham XMM Guide3.
2.1.1. Image Analysis
A vignetting corrected 0.2–12 keV image from the
Mos1 and Mos2 instruments4, is shown in Fig. 1, in
2 Available at http://heasarc.gsfc.nasa.gov/docs/xmm/abc/
3 Available at http://www.sr.bham.ac.uk/xmm2/guide.html
4 We did not use the pn data due to the substantially larger pixel
size of this instrument.
which we observe three spatial components to the X-ray
emission: a bright, compact feature located along the
SW edge of the X-ray emission (“Clump 1”), a fainter,
slightly extended feature located NE of the compact fea-
ture described above (“Clump 2”), and extended diffuse
emission, roughly 1′ in diameter, surrounding the two
features described above (“Diffuse”). From the “Clump
1” region we detected 120 ± 12 counts above the back-
ground between 0.5–10 keV in the Mos1 detector and
136± 12 in the Mos2 detector, from the “Clump 2” re-
gion we detected 66 ± 9 counts in both the Mos1 and
Mos2 detectors, and from the “Diffuse” regions we de-
tected 360± 20 and 380± 20 counts from the Mos1 and
Mos2 detectors, respectively.
We determined the spatial properties of these com-
ponents using the Sherpa modeling software package
(Freeman et al. 2001). Due to the low number of counts
per pixel, we used the simplex fitting method and min-
imized the cash statistic (Cash 1979). We attempted
to model Clump 1 and Clump 2 as circular 2D Gaus-
sians, elliptical 2D Gaussians, or circular 2D Lorentzians,
the latter of which is a good model for XMM’s point
spread function (PSF; Ghizzardi & Molendi 2002). We
attempted to model the Diffuse region as a circular or
elliptical 2D Gaussian, and assumed a constant back-
ground. We fit the observed image to all model com-
binations of Clump 1, Clump 2, and Diffuse (attempts
to eliminate one of these components resulted in signif-
icantly worse fits), and the best fit was obtained for
a model in which Clump 1 is a 2D Lorentzian while
Clump 2 and Diffuse are elliptical 2D Gaussians. The
fit parameters for this model are given in Table 1, and
the model and residuals are shown in Fig. 1, and from
this conclude that the emission from Clump 1 is consis-
tent with the PSF of XMM. Additionally, from this fit
we estimate that Clump 1 and Clump 2 are separated
by ∼ 10′′, while the centers of Clump 1 and Diffuse are
separated by ∼ 15′′.
2.1.2. Spectral Results
In generating the spectra, only events with flag = 0
were used. Additionally, the event files were screened for
background flares by binning the 10–15 keV light curve
of each instrument by 50 s and then recursively flagging
all bins with a count rate > 3σ above the average. This
procedure removed 3.0 ks, 2.3 ks, and 0.6 ks of data from
the Mos1, Mos2, and pn detectors, respectively. Spec-
tra were extracted for the regions shown in Fig. 1, and
the resulting spectra were binned into a minimum of 25
counts per channel and modeled using Xspec v12.2.0.
The background regions used are also shown in Fig. 1.
To determine the composite spectra of G328.4+0.2, we
jointly fit the spectrum obtained by the Mos1, Mos2,
and pn detectors – shown in Fig. 2. The background-
subtracted observed 0.5–10 keV count rate of G328.4+0.2
was 0.012 ± 0.001 counts s−1 in the the Mos1 detector
(555± 29 counts), 0.012± 0.001 counts s−1 in the Mos2
detector (580±30 counts), and 0.045±0.001 counts s−1 in
the pn detector5 (1576±63 counts). We fit the spectra to
seven different models separately – a power-law, a black-
body, bremsstrahlung, a Raymond-Smith plasma, and a
5 The pn detector count rate does not account for 29% dead time
since this instrument was operated in Small Window Mode.
http://heasarc.gsfc.nasa.gov/docs/xmm/abc/
http://www.sr.bham.ac.uk/xmm2/guide.html
power-law plus one of these three thermal models – all
attenuated for interstellar absorption. Only the single-
component models produced reasonable fits (reduced
χ2 ∼ 1), and the fitted parameters are presented in Table
2. Both the blackbody (kTBB ∼ 1.7 keV, TBB ∼ 20 MK)
and the bremsstrahlung models (kT ∼ 9 keV) require
unrealistically high temperatures, especially if the X-rays
are from a PWN as argued by Hughes et al. (2000). The
derived parameters for the power law model are similar
to that observed in other PWN (e.g. Gotthelf 2003), and
agree well with the results obtained for G328.4+0.2 by
Hughes et al. (2000). There is no evidence for thermal
X-ray emission, which one would expect from a SNR, in
this source.
Since the individual regions discussed in §2.1.1 did not
have enough counts for spectral fitting, we measured
their hardness ratio (HR), defined as:
H − S
H + S
where H is the number of counts in the Hard (higher
energy) band and S is the number of counts in the Soft
(lower energy) band, of the regions discussed in §2.1.1 to
determine if there were any spatial variations in the X-
ray spectrum of G328.4+0.2. Using the X-ray spectrum
as a guide, we calculate HR with H as the number of
counts between 4 and 8 keV and S as the number of
counts between 2 and 4 keV. Since the pixels on the pn
detector are sufficiently large that it is not possible to
separate the emission from these regions, we only use
data from the mos1 and mos2 detectors. The calculated,
background subtracted HR of G328.4+0.2 is 0.25± 0.04,
of the Clump 1 region is 0.29±0.07, of the Clump 2 region
is 0.31 ± 0.11, and of the diffuse region is 0.24 ± 0.05
(1σ errors). As a result, we conclude that there is no
significant change in the X-ray spectrum of G328.4+0.2
between these features.
2.1.3. Timing Results
A clear signature for the presence of a neutron star
would be the detection of X-ray pulsations in the emis-
sion from G328.4+0.2. We searched for this using the Z2n
test defined by Buccheri et al. (1983), where n is the har-
monic number of periodic signal, for n = 1, 2, 3, 4. The
maximum frequency searched was νmax = 1/(2n∆t), the
minimum frequency searched was 1/20 Hz, and the fre-
quency step was 1/4tobs (5× 10
−6 Hz, oversampling the
Nyquist rate by a factor of 2), where ∆t is the time res-
olution of the dataset (5.7 ms) and tobs is the length of
the observation. We only used events from the pn in-
strument (5.7 ms since it was operated in Small Win-
dow Mode) due to the poor time resolution (2.6s) of
the mos data. Additionally, we only used events from
the Clump 1 region because only only emission from the
central NS should be pulsed and as the brightest X-ray
region of G328.4+0.2, this region is the most probable
location of any neutron star. Unfortunately, due to the
large pixel size of the pn instrument this region is con-
taminated by emission from Clump 2 and the Diffuse re-
gion. The event times from the resultant event list were
barycentered to the Solar System reference frame, and we
searched for a periodicity over multiple energy ranges in
order to maximize the sensitivity of our search. The most
significant period detected was in the dataset which only
included photons between 5 and 10 keV (159 photons),
in which for n = 4 a signal with period P = 336 ms had
Z23 = 49.5 in 2074786 independent trials for this value
of n and energy range, which has a 12% chance of being
a false positive, a < 2σ result. Statistically, the most
significant sinusoidal (n = 1) pulse was in the 1–20 keV
dataset (340 photons), and had a period P = 73.1 ms
with a Z21 = 34.4 in 8365396 trials, a 34% chance of
being a false positive. Assuming that this signal is not
significant, we derive an upper limit on the pulse fraction
of 45% for a sinusoidal pulse profile and 22.5% for a δ-
function pulse profile (Leahy et al. 1983). Using the pn
count rate and size of the Clump 2 and Diffuse regions,
∼ 1/3 of the counts in the Clump 1 region is contamina-
tion from these regions. Accounting for this, we are only
able to put an upper limit on the pulse fraction of 67%
for a δ-function pulse profile. This upper limit is consis-
tent with the pulse fraction observed from other young
neutron stars.
2.2. Australia Telescope Compact Array Observations
Based on previous radio observations of G328.4+0.2,
there was a dispute in the literature as to whether this
source is a PWN, as argued by Gaensler et al. (2000), or
a composite SNR, as argued by Johnston et al. (2004).
The argument for this source being a PWN centered
on the flat spectrum of the radio emission, as well
as the high degree of polarization observed from the
center (Gaensler et al. 2000), while the radial polariza-
tion angles observed at the edge of this is more consis-
tent with a SNR (Johnston et al. 2004). If there is a
SNR component in G328.4+0.2, we expect that some
of the radio emission from this source should have a
steep (α < −0.3) spectrum. To search for such emis-
sion, we observed G328.4+0.2 for 12 hours at 1.4 GHz
with the Australia Telescope Compact Array (ATCA)
on 2005 June 25. Flux density calibration was carried
out using an observation of PKS B1934-638, and phase
calibration was carried out with regular observations of
PMN J1603-4904. The observation was carried out using
two 128 MHz bands, one centered at 1.344 GHz and the
other at 1.432 GHz, and the data reduction was done
using the miriad software package. The observation
was conducted when the ATCA was in the 6B configu-
ration, which has a longest baseline of ∼6000 m (∼ 6.′′9)
and a shortest baseline of ∼200 m (∼ 3.′4). As a re-
sult, this dataset alone is not sensitive to large-scale
emission from G328.4+0.2. To improve the sensitivity
to diffuse emission, we combined this dataset with the
1.4 GHz data used by Gaensler et al. (2000) as well as
continuum data gathered in the Southern Galactic Plane
Survey (McClure-Griffiths et al. 2005). Total intensity
images from this combined dataset were formed using
natural weighting, multi-frequency synthesis, and maxi-
mum entropy deconvolution. The final image, shown in
Fig. 3, has a resolution of 7.′′0×5.′′8, and an rms noise of
∼ 0.15 mJy beam−1. The measured 1.4 GHz flux density
of G328.4+0.2 is 13.8±0.4 Jy – consistent with the value
measured by Gaensler et al. (2000). For a 4.5 GHz flux
of 12.5± 0.2 Jy (Gaensler et al. 2000), this implies that
G328.4+0.2 has a radio spectral index α = −0.03± 0.03.
As seen in Fig. 3, the radio emission from G328.4+0.2
is very complicated, and contains multiple morphological
features. The major features are:
• Central Bar: The Central Bar is the bright-
est morphological feature in G328.4+0.2, and
has been previously detected at both 4.8 GHz
(Gaensler et al. 2000) and 19 GHz (Johnston et al.
2004). The Central Bar runs roughly E-W, and the
western edge of the bar is bifurcated, first noticed
by Johnston et al. (2004). The length of the bar
is ∼ 1.′75 long, and inside the bar there are three
peaks in the radio emission.
• Filamentary Structure A: These are the curved
filaments near the center of G328.4+0.2 which ap-
pear to be connected to the Central Bar, the bright-
est of which is the “Y” shaped structure NE of the
eastern edge of the Central Bar. In general, these
filaments are more prominent on the eastern side
of G328.4+0.2 and appear confined to the central
region of G328.4+0.2, not extending much beyond
the inner half.
• Filamentary Structure B: These are the faint,
radial filaments predominantly found on the west-
ern side of G328.4+0.2, as shown in Fig. 4. The
inner parts of these filaments are located ∼ 2.′5
from the center of G328.4+0.2, and their length
varies across G328.4+0.2 – in the southern half,
two filaments appear to extend to the edge of the
source while in the northern and western parts of
G328.4+0.2 they are substantially shorter. While
some of these filaments are kinked or curved, most
are fairly straight and radial in orientation. These
features were not detected by Gaensler et al. (2000)
and Johnston et al. (2004) due to the insufficient
u–v coverage of those observations.
• Filamentary Structure C: As shown in Fig. 3,
these are features located near the outer edge of
G328.4+0.2 that are parallel to the outer edge.
These features are more prominent and more plen-
tiful in the eastern half of G328.4+0.2.
• Outer Protrusions: The Outer Protrusions are
faint features – the two most prominent of which
are in the NE quadrant of G328.4+0.2 – that ex-
tend beyond the outer boundary of G328.4+0.2.
Several of these structures have bow-shock mor-
phologies.
The physical interpretation of these features will be pre-
sented in §3.2. It is worthwhile to note here that several
of these morphological features (e.g. the Central Bar and
the internal filamentary structures) have been observed
in other PWNe such as MSH 15-56 (Dickel et al. 2000)
and 3C58 (Slane et al. 2004), while others (e.g. the Fil-
amentary Structure C and Protrusions) are more char-
acteristic of SNRs, such as the Vela SNR (Bock et al.
1998).
This XMM observation also allows, for the first time,
a comparison between the radio and X-ray morphology
of G328.4+0.2. As shown in Fig. 3, there is a signifi-
cant offset between the X-ray and the center of the radio
emission, with Clump 1 located ∼ 80′′ from the center
of the radio emission. Additionally, the extent of the
X-ray emission is significantly smaller than that of the
radio emission. A physical interpretation of the X-ray
morphology and its relation to the radio emission will be
discussed in §3.2.
2.2.1. Spectral Index Map
The previous 1.4 GHz dataset had a resolution (∼ 20′′)
significantly worse than that of the 4.5 GHz data, and
therefore is not suitable for determining if there are
small scale changes in α inside G328.4+0.2. With these
new, high-resolution 1.4 GHz observations, it was pos-
sible to make a spectral index map of G328.4+0.2 us-
ing our new 1.4 GHz data and the 4.5 GHz data pre-
sented by Gaensler et al. (2000) since these datasets have
comparable u–v coverage. To detect any variation in
α, we made a spectral tomography map of G328.4+0.2
(Katz-Stone & Rudnick 1997). To do this, we first
produced a 1.4 GHz image of G328.4+0.2 from data
matched in u–v coverage with the 4.5 GHz data, and then
smoothed both the new 1.4 GHz image and the 4.5 GHz
image to a resolution of 8′′ to account for the poorer reso-
lution of the 1.4 GHz data. Finally, we produced a series
of difference images (Idiff,α) using the following formula:
Idiff,α= I1.4 − I4.5
where I1.4 and I4.5 are the 1.4 and 4.5 GHz images
produced above. In this method, the spectral index of
a region is determined by the spectral index at which
it disappears from the difference image. As shown in
Fig. 5, most of the radio emission from G328.4+0.2 has
a spectral index between α ∼ −0.1 and α ∼ +0.1, while
the outer edges of G328.4+0.2 have a steeper spectrum
(α ∼ −0.4) than the center, particularly the western edge
of G328.4+0.2. This steeper spectrum material is coin-
cident with some of the Filamentary Structure C dis-
cussed in §2.2, but there are no spectral features associ-
ated with any of the other radio morphological features
in G328.4+0.2 or with the X-ray emission. A physical
interpretation of these results will be discussed in §3.2.
2.3. Search for the radio pulsar at Parkes
As part of a project to search for pulsar counterparts to
all Galactic PWNe (e.g. Camilo et al. 2002a), on 2005
October 13 we observed G328.4+0.2 using the ATNF
Parkes telescope in NSW, Australia. As for similar
such work, we employed the central beam of the Parkes
multibeam receiver at a central frequency of 1374MHz,
with 96 frequency channels across a total bandwidth of
288MHz in each of two polarizations. The integration
time was 24ks, during which total-power samples were
recorded every 0.25ms for off-line analysis.
We analyzed the data with standard pulsar searching
techniques using PRESTO (Ransom et al. 2002). We
searched the dispersion measure range 0–2600cm−3 pc
(twice the maximum Galactic DM predicted for this line
of sight by the Cordes & Lazio 2002 electron density
model), while maintaining close to optimal time reso-
lution. In our search we were sensitive to pulsars whose
spin period could have changed moderately during the
observation due to very large intrinsic spin-down. The
search followed very closely that described in more detail
in Camilo et al. (2006). We did not identify any promis-
ing pulsar candidate in this search.
Applying the standard modification to the radiometer
equation, for an assumed pulsation duty cycle of 10%,
and accounting for a sky temperature at this location
of 15K, we were nominally sensitive to long-period pul-
sars having a period-averaged flux density at 1.4GHz
of S1400 > 0.05mJy. In fact, this limit applies only
to such long-period pulsars as to not be of practical
interest for us: for a distance of ∼ 17 kpc along this
line of sight, the expected DM is ≈ 1200 cm−3 pc, and
the scatter-broadening of the radio pulses due to multi-
path propagation is expected to be ∼ 50ms at 1.4GHz
(Cordes & Lazio 2002). This would likely render pulsa-
tions undetectable from any short-period pulsar, such as
we expect to power G328.4+0.2, regardless of average
radio flux. We therefore repeated the search at a higher
radio frequency ν, since the scattering timescale is ap-
proximately ∝ ν−4.
On 2007 January 4 we observed G328.4+0.2 at Parkes
at a central frequency of 3078MHz, with 288 channels
spanning a bandwidth of 864MHz in each of two polar-
izations. The total-power samples were recorded every
1ms for 30 ks, and analyzed in a manner analogous to
that described previously for the 1.4GHz data. This time
a few somewhat-promising candidates were identified in
the analysis, and a second 3GHz observation was made,
on 2007 March 19, for 36 ks. Analysis of this second ob-
servation did not confirm the original candidates, and we
have therefore not detected any radio pulsar counterpart
for the PWN in G328.4+0.2. The sensitivity of our 3GHz
observations was about 0.03mJy for long-period pulsars
(P & 20ms) and decreasing gradually for shorter peri-
ods. For the predicted DM, at this frequency the scat-
tering timescale is expected to be ∼ 2ms, comparable to
the dispersion smearing across each individual channel,
∼ 1ms. Propagation effects should therefore not have
prevented the detection of signals with P & 5ms.
Converting the 3GHz flux density limit to a frequency
of 1.4GHz, using a typical pulsar spectral index of –1.6
(Lorimer et al. 1995), results in S1400 . 0.1mJy. For
a distance of ∼ 17 kpc, this corresponds to a pseudo-
luminosity limit of L1400 ≡ S1400d
2 . 30mJykpc2. This
is comparable to L1400 of the very young pulsars B1509–
58, J1119–6127, and Crab (Camilo et al. 2002b), but a
factor of about 60 greater than for the young pulsar
in 3C58 (Camilo et al. 2002c), which has the smallest
known radio luminosity among young pulsars. Based
on these results, it is therefore entirely possible that
G328.4+0.2 harbors an as-yet undetected young pulsar
beaming toward the Earth with an ordinary radio lumi-
nosity.
3. interpretation of x-ray and radio observations
of g328.4+0.2
The X-ray spectrum of G328.4+0.2 is characteristic of
PWNe (see Gotthelf 2003 for a compilation of the X-
ray properties of PWNe), and therefore conclude that
the X-ray emission from G328.4+0.2 comes from a PWN
and not from a SNR. The same is true for the polarized,
flat-spectrum radio emission detected from the center
of G328.4+0.2 which are also characteristic of PWNe.
Therefore, in the following discussion we assume that
the X-ray and flat spectrum radio emission are both pro-
duced by a PWN. In §3.1, we discuss the evolutionary
sequence of PWNe in SNRs and the observational signa-
tures of each stage. In §3.2, we use the results from the
observations presented in §2.1 and §2.2 to draw general
conclusion about the properties of G328.4+0.2.
3.1. Evolution of PWN in SNRs
Both PWNe and SNRs are dynamic objects, and when
the PWN is inside the SNR its evolution is affected by
the behavior of both the central neutron star and the sur-
rounding SNR. While the PWN is inside the SNR, it typ-
ically goes through three evolutionary phases (Chevalier
1998; van der Swaluw et al. 2004):
• The Free-Expansion Phase – In this phase, the
PWN freely expands into the cold material inside
the SNR, sweeping up and shocking the surround-
ing ejecta into a thin shell (Chevalier & Fransson
1992; van der Swaluw et al. 2001). Since the PWN
is confined only by the shock wave its expansion
drives into the surrounding SNR, and the velocity
of this shock wave is much larger than the neutron
star velocity, it is free to move inside the SNR with
the neutron star.
• Collision with the Reverse Shock – As the
SN sweeps up and shocks the surrounding a am-
bient material, a reverse shock (RS) is driven
into the ejecta. Eventually, the PWN will en-
counter the RS, and as a result, can not con-
tinue to freely expand inside the SNR because it
is no longer in an essentially pressureless environ-
ment. Initially, the pressure behind the RS is
higher than the pressure inside the PWN, and as
result the PWN is compressed. As it contracts,
the pressure inside the PWN increases adiabati-
cally and eventually will be higher than its sur-
rounding, and will as a result re-expand inside the
SNR (Blondin et al. 2001; Bucciantini et al. 2003;
Reynolds & Chevalier 1984; van der Swaluw et al.
2001). Once the PWN encounters the RS, the ex-
pansion velocity of the PWN decreases signficantly
and falls below that of the neutron star, which is
unaffected by this collision. As a result, the neu-
tron star can detach itself from its PWN.
• Relic PWN Phase – When the neutron star
detaches from the relic nebula, it forms a new
PWN from the relativistic e+/e− plasma it contin-
ues to inject into the SNR (van der Swaluw et al.
2004). The PWN around the neutron star and
relic nebula evolve differently. The relic nebula
continues to contract/expand inside the SNR un-
til it achieves pressure equilibrium with its sur-
roundings, a process that can take many tens of
thousands of years. The new PWN initially ex-
pands sub-sonically, but when the neutron star is
∼ 2/3 of the way to the SNR shell, its velocity
will become supersonic relative to the surrounding
material and the PWN will take on a bow-shock
morphology (van der Swaluw et al. 2004). Eventu-
ally, the neutron star will leave the SNR, and as it
passes through the SNR shell it may re-energize the
surrounding SNR material, as possibly observed in
SNRs G5.4-1.2 and CTB80 (Shull et al. 1989).
As the PWN evolves inside the SNR, its appearance
changes radically. During the Free-Expansion phase, the
morphology of the PWN is determined by the properties
of the particle wind expelled by the neutron star. In
these cases, the neutron star is in the center of the PWN,
and there is no significant offset between the radio and
X-ray emission from the PWN. In general, during this
stage the PWN is located near the center of the observed
SNR shell. Examples of PWNe in this evolutionary stage
are the PWNe in SNR 0540-693 in the LMC (Reynolds
1985), as well as those in the Milky Way SNRs G11.2-
0.3 (Roberts et al. 2003; Tam et al. 2002) and G21.5-0.9
(Matheson & Safi-Harb 2005).
Due to the offset in the PWN’s position with respect to
the SNR’s center as a result of the neutron star’s veloc-
ity or inhomogeneities in the ISM, it is expected that one
side of the PWN will encounter the RS before the other
side (Blondin et al. 2001; van der Swaluw et al. 2004).
As a result, the PWN will no longer be symmetrically
oriented around the neutron star (van der Swaluw et al.
2004), leading to an offset between the radio and X-ray
emission from the PWN. Because the cooling time for X-
ray producing electrons is very short compared to that of
radio emitting electrons, the X-ray emission of a PWN
is expected to be brightest at the current location of the
neutron star while the radio emission reflects the effect
of the RS on the PWN. The compression/re-expansion
cycle triggered by the PWN/RS collision will also affect
the appearance of the PWN. According to a spherically
symmetric MHD simulation of a PWN in this phase,
compression of the PWN by the RS leads to an over-
pressurized region forming in the center of the PWN.
Material injected by the neutron star after this point is
then confined to the small region, leading to the forma-
tion of a radio/infrared “hot spot” in the center of the
PWN (Bucciantini et al. 2003).
The PWN/RS interaction also leads to the forma-
tion of hydrodynamic, primarily Rayleigh-Taylor (R-T),
instabilities at the PWN/SNR interface (Blondin et al.
2001). During the PWN’s initial free-expansion phase,
the shell of material swept up by the PWN is subject to
both thin shell and R-T instabilities (Bucciantini et al.
2004; Jun 1998), but the growth rate of these features
is expected to be sufficiently small that the PWN is not
disrupted, especially if even a small percentage of the
total energy of the pulsar’s wind is in magnetic fields
(Bucciantini et al. 2004). However, during the PWN/RS
interaction, rapid mixing of pulsar wind and SNR ma-
terial is expected when the PWN re-expands into the
SNR (Blondin et al. 2001). In fact, numerical simula-
tions suggest that the PWN is disrupted after its first
re-expansion into the SNR as a result of these instabil-
ities (Blondin et al. 2001). It is important to note that
these instabilities are only expected to affect the relic
nebula and not the new PWN formed by the neutron
star further from the SNR’s center. Once the composite
SNR enters the Relic PWN phase, the X-ray emission is
expected to be dominated by the new PWN since it con-
tains the high energy particles recently injected by the
neutron star, while the radio emission is dominated by
the relic nebula which contains most of the older parti-
cles that are expected to contribute at low frequencies.
As a result, during this phase the radio-emitting elec-
trons are expected to be dominated by electrons injected
during the free-expansion phase, while the X-ray is from
electrons injected after the passage of the RS.
3.2. Observational Results
In this Section, we use the results from the observa-
tions presented in §2.1 and §2.2, as well as the basic evo-
lutionary sequence for PWN in SNRs described above
in §3.1 to make some initial statements on the nature,
evolutionary state, and properties of G328.4+0.2. The
discussion given below is a very general interpretation of
observed radio and X-ray features in G328.4+0.2, it pro-
vides a framework in which to test the various scenarios
for G328.4+0.2 discussed in §4.1.1 and §4.1.2.
As mentioned in §1, there is a debate in the
literature as to whether G328.4+0.2 is a PWN
(Gaensler et al. 2000; Hughes et al. 2000) or a composite
SNR (Johnston et al. 2004). In neither of the X-ray or
radio observations presented above is there clear evidence
(e.g. thermal X-ray emission or a bright, steep spec-
trum, radio shell) for a SNR component. If G328.4+0.2
is a composite SNR, then the outer boundary of the ra-
dio emission likely marks the outer radius of the SNR
component, while the observed flat-spectrum radio emis-
sion and power law X-ray emission are emitted by the
PWN. Using the extent of the flat-spectrum radio emis-
sion shown in Fig. 5 to estimate the size of the PWN
component, we obtain that the radius of the PWN in this
source must be & 2/3RG328, the radius of G328.4+0.2.
If G328.4+0.2 is a composite SNR, then Filamentary
Structure 3, which as mentioned in §2.2.1 might have
a steeper spectral index than the rest of the radio emis-
sion in G328.4+0.2, would be emission from the SNR.
This emission is then possibly analogous to the corru-
gated structures seen in the NE part of the Tycho’s SNR
(Velazquez et al. 1998). Additionally, in this case the
Outer Protrusions mentioned in §2.2 maybe are ejecta
“bullets”, similar to those observed in the Vela SNR
(Aschenbach et al. 1995) and SNR N63A (Warren et al.
2003).
If G328.4+0.2 is a PWN, then the outer boundary of
the radio emission is the outer radius of the PWN and
the SNR in which it resides is undetected, similar to the
case for the Crab Nebula. As a result, the steep spectrum
radio emission seen at the edge of G328.4+0.2, as well
as Filamentary Structure C and the Outer Protrusions
observed in the radio, correspond to material swept-up
by the PWN. This is because radio emission from the
pulsar wind is observed to have a flat spectrum, un-
like this material. As a result, Filamentary Structure C,
as well as the radial component seen by Johnston et al.
(2004) in the polarization angle along the outer edge of
G328.4+0.2, are the result of the hydrodynamical (HD)
instabilities at the PWN/SNR interface. Since these
instabilities only occur when the PWN is accelerating
the shell of swept-up material surrounding it, he current
pressure inside the PWN (Ppwn) must be higher than
that of the SNR material just of the PWN [Psnr(Rpwn)].
Additionally, in this case the Outer Protrusions would be
the result of the PWN currently expanding into a clumpy
medium (e.g. the PWN analog of the process described
for young SNR by Jun et al. 1996), which requires that
the expansion speed of the PWN, vpwn, is currently posi-
tive. In this case, it is possible that the SNR surrounding
G328.4+0.2 will be detected at a later date, as was the
case for G21.5-0.9 (Matheson & Safi-Harb 2005).
Regardless of whether G328.4+0.2 is a composite SNR
or a PWN, the offset between the radio and X-ray
emission from the PWN component implies that the
PWN/RS collision has already occurred. The Central
Bar is then the remains of the over-pressurized region
created when the PWN was compressed by the RS, with
the bar-like shape of this region the result of either an
asymmetric RS or the anisotropic wind emitted by the
neutron star. The Central Bar should then consist of
pulsar wind material, which accounts for the α ∼ 0 spec-
tral index of this region as shown in Fig. 5. Additionally,
the flat spectral index observed from Filamentary Struc-
tures A implies that this feature is emitted by pulsar
wind material. As mentioned in §3.1, when the PWN
re-expands after the initial compression by the RS, nu-
merical simulations suggest that the rapid mixing of the
SNR ejecta and pulsar wind material can then occur. In
their 2D simulations of this process, Blondin et al. (2001)
observe features similar to that of Filamentary Structure
A, and as a result we conclude that the existence of these
features requires that the PWN has re-expanded at least
once after its initial compression by the RS. Since the ra-
dius of flat-spectrum radio emission from G328 is larger
than the outer radius of Filamentary Structure A, the
instabilities formed during this expansion must not have
completely disrupted the PWN. This differs from the re-
sults of Blondin et al. (2001), in which the PWN is dis-
rupted during the first re-expansion. This is most likely
due to the damping effect of the PWN’s magnetic field
on Raleigh-Taylor instabilities, which is not accounted
for by Blondin et al. (2001).
Finally, we discuss the X-ray emission seen from
G328.4+0.2 which, based on its X-ray spectrum, is emit-
ted by pulsar wind material. The observed offset between
Clump 1, Clump 2, and the Diffuse regions of the X-ray
emission implies that the PWN is not freely expanding,
consistent with the explanation that the PWN has col-
lided with the RS. We identify Clump 1 as the current
location of the neutron star since it is the brightest X-ray
feature, Clump 2 as the location of the termination shock
in the PWN (Kennel & Coroniti 1984), and the Diffuse
emission is produced by recently injected plasma stream-
ing away from the neutron star. In this case, the extent
of the Diffuse component depends on the synchrotron
lifetime of the X-ray emitting particles in the PWN.
The X-ray emission also provides an estimate of the
physical properties of the central neutron star, namely
a measure of the neutron star’s rotational spin-down en-
ergy, Ė. A comparison of observed X-ray luminosity Lx
and Ė shows a trend that neutron stars with a higher Lx
have a higher Ė, and that the relationship between these
two quantities is (Possenti et al. 2002):
logLX,(2−10)=1.34 log Ė − 15.34 (3)
where LX,(2−10) is the X-ray luminosity of the source
between 2 and 10 keV, albeit with a significant scatter
(Possenti et al. 2002). For the absorbed power-law fit to
the X-ray emission from G328.4+0.2 given in Table 2,
the unabsorbed 2–10 keV flux of G328.4+0.2 is ∼ 1 ×
10−12 ergs cm−2 s−1. For a distance to G328.4+0.2 of
d = 17d17 kpc, we obtain that:
LX,(2−10)∼ 3.5d
17 × 10
34 ergs s−1, (4)
which, using Eq. (3), gives us an estimate of Ė:
Ė ∼ 1.7d1.4917 × 10
37 ergs s−1. (5)
This number is somewhat less than estimate obtained
byGaensler et al. (2000) (Ė = 8.3× 1038 ergs s−1), who
assumed that Ė = 4 × 10−4LR, where LR is the radio
luminosity of G328.4+0.2. Our estimate of Ė is similar
to the estimate by Hughes et al. (2000) (Ė ∼ 1037 − 2×
1038 ergs s−1), who also used the Lx − Ė relation.
4. simple hydrodynamic model for the evolution of
a pwn inside a snr
In order to determine what neutron star, SN, and am-
bient density properties are required to produce a system
with these properties described in §3.2, we have devel-
oped a simple hydrodynamic (HD) model for the evo-
lution of a PWN inside of a SNR, which we then ap-
ply to G328.4+0.2 in §4.1. This model is based largely
on the models developed by Blondin et al. (2001) and
van der Swaluw et al. (2001). The main goal of this
model is to determine the radius of the PWN, Rpwn,
as it progresses through the evolutionary sequence de-
scribed in §3.1. In this model, we assume that the PWN
can be treated as a perfect gas with adiabatic index
γ = 4/3 and that is expanding into a SNR filled with
a perfect gas with adiabatic index γ = 5/3. We also as-
sume that the material swept-up by the PWN initially
lies in a thin shell with inner radius R = 23/24 Rpwn
(van der Swaluw et al. 2001), as shown in Fig. 6. The
dynamics of this mass shell are determined by the dif-
ference in pressure between the PWN and SNR, and by
calculating the radius of this mass shell we determine
Rpwn(t). Once the PWN enters the Relic PWN phase
of its evolution, this model only determines the proper-
ties of the relic nebula. What follows is a brief qualitative
description of the model used, while the full suite of equa-
tions used to implement it quantitatively can be found
in Appendix A.
As mentioned above, we model Rpwn(t) by calculating
the outer radius of the mass shell swept up by the PWN,
ignoring the effect of any instabilities which could dis-
rupt this shell. We solve for Rpwn(t) by assuming that
we know the values for the relevant quantities at a time
t − ∆t, and then calculate them for a time t, since the
relevant equations can not be solved at all times analyt-
ically. we wrote a program in IDL to implement this
numerically using the following procedure:
1. Calculate Rpwn(t+∆t) by assuming that the mass
shell around the PWN between t and t+∆t moves
with a constant velocity vpwn(t).
2. Calculate the internal energy of the PWN, Epwn(t+
∆t), using the first law of thermodynamics:
∆Epwn= Ėt− Ppwn∆Vpwn (6)
which takes into account energy losses from the adi-
abatic expansion/contraction of the PWN as well
as any energy input from neutron star into the
PWN between t and t + ∆t if the neutron star is
still inside the PWN. Since we assume the PWN is
filled with a γ = 4/3 perfect gas, Epwn ∝ PpwnVpwn
from the ideal gas law, where Ppwn is the internal
pressure of the PWN and Vpwn is the volume of
the PWN, as defined in Equations (A5 and (A6).
Since Vpwn ∝ R
3, and γ = 4/3 requires that
Ppwn ∝ V
pwn , we derive that Ppwn ∝ R
−4. As
a result, if there is no input from the neutron star,
then Epwn ∝ R
pwn. The energy input from the
neutron star (∆Epsr) is calculated by integrating
Eq. (A7) between t and t+∆t.
3. Calculate Ppwn(t + ∆t) using Equations (A5) and
(A6).
4. Calculate the pressure inside the SNR (Psnr), the
density inside the SNR (ρej), the velocity of the
material inside the SNR (vej), and the sound speed
of the material inside the SNR (cs), at the outer
radius of the PWN, R = Rpwn(t + ∆t) using a
model for the evolution and structure of a SNR, as
described in Appendix A.
5. If the PWN is expanding faster than the SNR ma-
terial around it, increase the mass of the shell sur-
rounding the PWN, Msw,pwn(t+∆t), accordingly,
as described in Appendix A.
6. Calculate the force on the mass shell surrounding
the PWN, Fpwn(t + ∆t), using Eqs. (A14) and
(A15). During the initial free-expansion of the
PWN inside the SNR, these equations reduce to
Equation A4 of van der Swaluw et al. (2001) and
Equation 14 of Chevalier (2005).
7. Calculate the new velocity of the mass shell around
the PWN, vpwn(t + ∆t), assuming that any mass
swept up by the PWN between t and t+∆t is done
so inelastically:
vpwn(t+∆t)=
Msw(t)vpwn(t) + Fpwn(t+∆t)∆t
Msw(t+∆t)
This is believed to be a reasonable approximation
because the newly swept-up material is shocked by
the mass shell and, as a result, its pre-existing mo-
mentum is transferred to the internal energy of the
mass shell.
This model is assumes that both the SNR and PWN
are spherically symmetric, the PWN remains centered
on the center of the SNR at all times, the PWN has no
effect on the evolution of the SNR, and that the ma-
terial swept-up by the PWN is incompressible and has
a negligible internal pressure. Additionally, this model
ignores the effects of magnetic field (Bucciantini et al.
2003) and RT instabilities at the PWN/SNR interface
(Blondin et al. 2001; van der Swaluw et al. 2004) on the
properties of the PWN. Despite these simplifications, our
model does a reasonably good job of reproducing the re-
sults for Model A in Blondin et al. (2001), as shown in
Fig. 7. In general, relative to results of Blondin et al.
(2001) and other authors, our model tends to result in
larger oscillations in Rpwn/Rsnr and a larger initial com-
pression. The first discrepancy results from neglecting
the effect of instabilities at the PWN/SNR interface that
damp these oscillations, and the second from not includ-
ing the effect of reflected shocks that enter the PWN at
the time of the PWN/RS collision (Blondin et al. 2001).
Additionally, we find that scenarios with the same to-
tal amount of energy deposited by the neutron star into
the PWN, Epsr but with different neutron star properties
(e.g. different values of P0 and Bns) produce the same
behavior of Rpwn(t). This is different than the conclu-
sion of Blondin et al. (2001), and believe that this dis-
crepancy is the result of using a more realistic expression
for Ė, Eq. (A7), than a step function, the form used by
Blondin et al. (2001).
4.1. Application of Model to G328.4+0.2
In the following discussion, we use the model given in
§4 for a PWN’s evolution inside a SNR to examine the
different possibilities for the nature of G328.4+0.2 given
in §3.2. We first analyze the possibility that G328.4+0.2
is a composite SNR (§4.1.1), and then he possibility that
G328.4+0.2 is a PWN (§4.1.2). This model requires six
inputs: the characteristic timescale of the neutron star’s
spin-down, τ0, initial spin-down power of the neutron
star Ė0, the velocity of the neutron star, vns, the kinetic
energy of the SN ejecta Esn, the mass of the SN ejecta
Mej, and the number density of the surrounding material
n. In order to calculate these values, we use the following
information:
• The distance to G328.4+0.2 is 17 kpc (d17 ≡ 1),
which is the lower limit on the distance to this
source as determined by Gaensler et al. (2000) us-
ing Hi absorption. This implies that the current
radius of G328.4+0.2 is RG328 ≡ 12.5 pc.
• The neutron star inside G328.4+0.2 is spinning
down with a braking index p = 3, the braking in-
dex produced by a pure dipole surface magnetic
field. Additionally, we assume that the neutron
star’s moment of inertia is I = 1045 g cm2, the
value derived for standard equations of state for
a neutron star (Shapiro & Teukolsky 1983). Both
of assumptions are standard in the literature (e.g.
Blondin et al. 2001).
• The offset between the Clump 1, which as described
in §3.2 is believed to be the location of the neu-
tron star powering the PWN, and the center of the
radio emission is due to neutron star’s spatial ve-
locity, vns. Using §2.1.1, we determine that this
observed offset corresponds to physical distance of
∼ 6.6d17 pc. Since the observed offset is due only
to the neutron star’s velocity in the plane of the
sky, it is a lower limit on the true distance the neu-
tron star has traveled since the SN explosion, rns.
If we assume that vns = rns/tnow, where tnow is
the age of G328.4+0.2, the observed offset allows
us to estimate the minimum spatial velocity of the
neutron star, vminns , equal to:
vminns =
6.6d17 pc
. (8)
• Using Equation (A19), we determine that for stan-
dard initial periods (P0 ∼ 5− 20 ms) and magnetic
field strengths (Bns = 5× 10
11 − 1013 G), τ0 varies
from ∼ 100 − 2000 years. To cover this range, we
assume that τ0 of the neutron star in G328.4+0.2
can have one of three different values:
τ0 = 430, 770, and 1730 years. (9)
which respectively correspond to a neutron star
with B = 1012 G and P0 = 5 ms, B = 3 × 10
and P0 = 20 ms, or B = 5 × 10
11 G and P0 =
5 ms. This range of τ0 is similar to those used
by Blondin et al. (2001), van der Swaluw et al.
(2001), and Bucciantini et al. (2003).
• G328.4+0.2 is expanding into a uniform medium
(s = 0 in the notation of Chevalier 1982). This
assumes G328.4+0.2 is much larger than either
the main sequence and late-stage wind bubble
formed by its progenitor, both of which are ex-
pected to have an interior ρ ∝ r−2 density struc-
ture. While the typical size of these structures
is smaller than RG328, this is not the the case
for very massive stars (M & 15 M⊙) for which
these bubbles can reach sizes of ∼ 100 pc or
larger in low density (n . 1 cm−3) environments
(Chevalier & Emmering 1989; Chevalier & Liang
1989). In this case, our assumption of a constant
density medium would not be correct. However,
the effect of G328.4+0.2 still being inside a stellar
wind bubble since this does not significantly mod-
ify the evolution of the SNR. Since their no a priori
information on the density around G328.4+0.2, we
assume it is one of the assume it has one of the
following values:
logn = −1.5,−1.0, 0, 0.5 cm−3. (10)
which cover the range of densities in the warm ion-
ized medium and the warm neutral medium.
• The ejecta mass of the SN explosion that formed
G328.4+0.2, Mej has one of the following values:
Mej = 1, 5, 10M⊙ (11)
and that the kinetic energy of the ejecta, Esn is:
log(Esn/10
51ergs) = −0.5, 0, 0.5. (12)
This range of Mej and Esn incorporate the range
inferred from observations of “normal” SNe, but
do not include hypernovae.
• We assume that the Lx − Ė relationship used in
§3.2 is accurate to better than two orders of mag-
nitude. As a result, in §4.1.1, we assume that the
current spin-down luminosity of the neutron star
in G328.4+0.2 is one of the following:
Ė=0.017, 0.17, 1.7, 17, and 170× 1037 ergs s−1,(13)
and in §4.1.2 assume that 1.7 × 1035 < Ė < 1.7 ×
1039 ergs s−1.
These are the initial conditions used in both §4.1.1 and
§4.1.2. The remaining input parameters into the model
are the initial spin-down luminosity Ė0 and space ve-
locity vns of the neutron star, and the method for deter-
mining the possible values of these parameters is given in
§4.1.1 and §4.1.2. Finally, for all trials discussed in §4.1.1
and §4.1.2 the model begins at a time t = 0.5 years, with
∆t = 0.5 years.
4.1.1. G328.4+0.2 as a Composite SNR
In this Section, we evaluate the possibility that
G328.4+0.2 is a composite SNR. To do this, we first
assume that the outer edge of the radio emission from
this source denotes the edge of the SNR, and therefore
RG328 ≡ Rsnr. As a result, for a given value of Esn,
Mej, and n, we use the model for the evolution of a SNR
discussed in Appendix A to calculate the current age of
G328.4+0.2, tnow. With this value of tnow and assumed
values for the current spin-down luminosity of the neu-
tron star, Ė, and τ0, we are able to calculate both the
initial spin-down luminosity Ė0 and initial period P0 of
the neutron star in G328.4+0.2 using Eqs. (A7) and (A9),
respectively.
Using this procedure, we ran our model using all pos-
sible combinations of the input parameters (τ0, Ė, Esn,
Mej, and n) given in §4.1, for a total of 540 different com-
binations. To see which combination of these input pa-
rameters provide a plausible explanation for G328.4+0.2,
we require the following:
• Criterion 1: vminns < 2000 km s
−1, since a neu-
tron star with a higher velocity than this is ex-
tremely implausible based on pulsar observations
(Hobbs et al. 2005).
• Criterion 2: P0 > 2 ms, the minimum rota-
tion period of a young proto-neutron star before
it breaks up (Goussard et al. 1998).
• Criterion 3: The PWN is smaller than the SNR
for all t < tnow. While it is possible that PWN
could expand to fill the entire SNR, it is not con-
sidered likely, and is contrary to the model as-
sumption that the PWN does not affect the evolu-
tion of the SNR. Additionally, we also require that
Rpwn(tnow) ≥ 0.67Rsnr(tnow), due to the large ob-
served size of the flat-spectral index radio emission
which is produced from the PWN, as described in
§3.2.
• Criterion 4: G328.4+0.2 is currently in the Free-
Expansion or Sedov-Taylor phase of its expansion,
i.e. tnow < trad, where trad, the age when a SNR
goes radiative, is defined in Eq. A4. Once a SNR
has entered its Radiative phase, it is expected that
the radio emission from the SNR be confined to
thin, bright, filaments like those observed in SNR
G6.4-0.1 (Mavromatakis et al. 2004) – which are
not observed in G328.4+0.2, or that the SNR is
radio-quiet.
• Criterion 5: The PWN/RS collision has already
occurred, as described in §3.2, and the PWN has
been compressed as a result of its collision with the
RS. This is required to explain the Central Bar, as
described in §3.2. This requires that vpwn < 0 at
some point in the past – which can only occur at
a time t > tcol, the time when the PWN and RS
collide.
• Criterion 6: The Central Bar created by the
compression is still observable. This is satisfied
if either the PWN is currently being compressed,
vpwn(tnow) < 0, or if the compression ended suffi-
ciently recently such that it can be observed. The
Central Bar is believed to be formed by both a
pressure and a magnetic field enhancement at the
center of the PWN (Bucciantini et al. 2003). As a
result its observable lifetime is the synchrotron life-
time of electrons accelerated by the magnetic field
enhancement. Therefore, we assume that the life-
time of the central bar is the synchrotron age of
the accelerated electrons, τsynch, equal to:
τsynch = 3× 10
4ν−1/2B−3/2pwn years, (14)
where ν is the observed frequency, in units of Hz,
and Bpwn is the strength of the magnetic field in-
side of the PWN, in units of G. With this infor-
mation, in the case that vpwn(tnow) > 0 we deter-
mine if the central bar is still observable by eval-
uating τsynch at 22 GHz (since this is the highest
frequency at which the Central Bar is observed;
Johnston et al. 2004) at the time when the com-
pression ends, assuming that Bpwn can be derived
using the minimum energy estimate. This criterion
is satisfied if τsynch > tnow − tre−exp, where tre−exp
is the time when the compression phase ends.
That the above criteria fall into two categories: criteria
required for physical plausibility (Criteria 1–3) and those
which depend on our interpretation of the radio and X-
ray properties of G328.4+0.2 (Criteria 4–6).
Of the 540 possible combinations of the input parame-
ters, only one passes all six criteria. The predicted SNR,
PWN, and neutron star properties of this scenario are
given in Table 3, and the behavior of Rpwn as a function
of time is given in Figure 8. In this scenario, G328.4+0.2
is quite young, ∼ 4900 years old, and the energy in-
jected into the SNR by the neutron star is similar to
the kinetic energy of the SN explosion (∼ 1051 ergs).
However, in this scenario, as shown in Fig. 8, the pre-
dicted compression is very small; when the compression
begins, Rpwn = 3.838 pc, and when the PWN begins to
re-expand into the SNR, Rpwn = 3.834 pc. This small
decrease is not surprising given that, as shown in Table
3, the total energy inputed into the PWN by the pulsar
(Epsr) is very close to the kinetic energy of the SN ejecta
(Esn). This negligible decrease in the volume of the PWN
is unlikely to form a central bar as prominent as the one
observed (Fig. 3), and therefore we feel is unlikely to be
the correct explanation for G328.4+0.2.
4.1.2. G328.4+0.2 as a PWN
In order to evaluate if G328.4+0.2 is a PWN, we use
the model presented in §4 to determine the earliest time6
(tnow) at which a PWN powered by a neutron star with
a given initial period P0 reaches the observed size of
G328.4+0.2 (Rpwn = 12.5 pc) if it is expanding into as
yet unseen SNR formed by ejecta with initial mass Mej
and kinetic energy Esn which exploded in a constant-
density ambient medium with number density n. To
consider all reasonable cases, we ran our model using all
combinations of the values of τ0, Esn, Mej, and n given
in §4.1, as well as P0 = 5, 10, 25, 100 ms for a total of
432 different trials. In this scenario, since it is not pos-
sible to determine an independent estimate of the age
of the system, it is necessary to assume a value of P0.
To determine which of these combinations are possible
explanations for G328.4+0.2, we required that:
6 Due to oscillations in radius the PWN undergoes after its col-
lision with the RS, it can reach the current size at multiple times.
• Criterion 1: vminns < 2000 km s
−1. Since, in
this scenario, we have no prior estimate of the
age of G328.4+0.2, when we run our model we as-
sume that vns = 0. This does not effect our re-
sults because rns, as measured in §4.1, is less than
Rpwn ≡ RG328, and therefore the neutron star is
always injecting energy into the PWN as it does if
vns = 0. Once, for a given set of input parameters
we have determined tnow, we calculate v
ns using
Equation (8).
• Criterion 2: The current spin-down energy of the
neutron star in G328.4+0.2 is between 0.017 ≤
˙E,37 ≤ 170, where ˙E,37 ≡ Ė/10
37 ergs. This is
based on the work done in §3.2, and is consistent
with the initial values of Ė used in §4.1.1. Since
in this scenario we have no estimate of the age of
G328.4+0.2, we are unable to assume a value for
Ė of the central neutron star and then calculate its
initial spin-down luminosity, as we did in §4.1.1.
• Criterion 3: Rpwn < Rsnr for all times t < tnow,
as explained in §4.1.1.
• Criterion 4: The PWN has already collided, and
has been compressed by, the RS, as explained in
§3.2.
• Criterion 5: The Central Bar created by the
compression of the PWN is still observable. The
method of determining if this is satisfied is the same
as the one used in §4.1.1.
• Criterion 6: The PWN must have been able to
form RT instabilities after the PWN/RS collision.
As in §4.1.1, we implement this requirement by re-
quiring that Ppwn > Psnr(Rpwn) for some t > tcol.
Additionally, as explained in §3.2, in order for the
PWN to create Filamentary Structure C it must
currently be unstable to R-T instabilities – requir-
ing that Ppwn > Psnr(Rpwn) now.
• Criterion 7: As explained in §3.2, the ob-
served Outer Protrusions in the radio require that
the PWN currently be expanding into the SNR,
vpwn(tnow) > 0.
• Criterion 8: G328.4+0.2, must have only under-
gone one compression/re-expansion cycle. As ex-
plained in §3.2, numerical simulations of PWN in-
side SNRs finds that the PWN is disrupted after
the first such cycle (Blondin et al. 2001).
It is important to note that, if G328.4+0.2 is a PWN,
then the radio and X-ray observations provide little in-
formation on the evolutionary phase of the (unseen) SNR
and no information on the current ratio of the PWN and
SNR radii.
Out of the 432 possible combinations of the input pa-
rameters, only five satisfy all ten of the above criteria, as
listed in Table 4. While the neutron star appears to be
inside the PWN, it is possible that this is just a projec-
tion effect. To evaluate the possibility that the PWN in
G328.4+0.2 has already entered the Relic PWN phase of
its evolution, we calculated vmin,IIns , defined as:
vmin,IIns =
RG328
. (15)
If vns > v
min,II
ns , then the G328.4+0.2 is a Relic PWN, if
not, then it is still in the Collision with the RS phase of
its evolution.
As shown in this Table, the predicted properties of
G328.4+0.2 vary substantially if G328.4+0.2 is inside a
Sedov or Radiative SNR. In the Sedov case, G328.4+0.2
is quite young, and progenitor SN explosion had a normal
explosion energy but a low ejecta mass (Mej ∼ 1M⊙),
and it occurred in a low density environment. Ad-
ditionally, the neutron star formed in this explosion
was spinning rapidly, has a low surface magnetic field
strength (Bns < 10
12 G), and a high space velocity
(vns & 800 km s
−1). In the Radiative Case, G328.4+0.2
is substantially older, and the progenitor SN explosion
was a low kinetic energy (Esn ∼ 3 × 10
50 ergs) and high
ejecta mass expanding. The neutron star in this case
was born spinning somewhat slower and has a normal
magnetic field strength for a young neutron star.
With the information presented in Table 5, it is possi-
ble to further refine the expected SNR and PWN prop-
erties. As argued in §4.1.1, the prominence of the central
bar argues that, during the compression stage, the vol-
ume of the PWN decreased significantly. Though it is not
possible at this time to quantify the compression needed,
an examination of Table 5 shows that for only two mod-
els, ST 2 and Rad 1, did the volume of the PWN decrease
by more than 10% – and therefore these two models are
the most probable descriptions of G328.4+0.2. In the
case of Rad 1, vmin,IIns ∼ 100 km s
−1 is significantly less
then the average neutron star velocity, v ∼ 400 km s−1
(Faucher-Giguère & Kaspi 2006; Hobbs et al. 2005), im-
plying that the PWN is in the Relic PWN phase of its
evolution. Since the sound speed inside a Radiative SNR
is quite low, ∼ 100 km s−1, we expect that the PWN in
the Rad 1 scenario would have a bow-shock morphology.
Since there is no clear evidence for this in the X-ray or
radio emission from G328.4+0.2, this suggests that ST 2
is a better fit to the data.
The conclusion that ST 2 is an accurate description
of G328.4+0.2 is supported by circumstantial evidence
as well. For this scenario, the expected radius of the
termination shock around the neutron star, rts, defined
as (Slane et al. 2004):
r2ts =
4πcPpwn
, (16)
assuming a spherical wind, is rts ∼ 0.6 pc, which corre-
sponds to an angle of θts ∼ 8d17
′′ – a distance which is
comparable to the offset between Clump 1 and Clump 2
derived in §2.1.1. Another interesting feature for this
model is that, as shown in Fig. 4, the radius of the PWN
at the time of re-expansion is similar to that of the outer
parts of the central filamentary structures discussed in
§2.2. While this correlation might be coincidental, this
could imply that the hydrodynamic instabilities formed
at the PWN/SNR interface during the re-expansion dis-
rupted the shell of material swept up by the PWN –
consistent with the simulation of Blondin et al. (2001).
While not definitive, these two pieces of evidence argue
that ST 2 is a reasonable description of G328.4+0.2.
The properties of ST 2 are given in Table 6, and the
evolution of Rpwn is shown in Fig. 9. It is interesting to
note that in this scenario, the PWN collides with the RS
at a time tcol < τ0 (tcol ≈ 850 years) so energy injection
by the neutron star into the PWN after the PWN/RS
collision is important to the PWN’s evolution during this
stage. It is important to note that this model predicts
that the neutron star powering G328.4+0.2 is the most
energetic neutron star in the Milky Way, as well as one
of the fastest. Given that G328.4+0.2 is largest and has
the highest radio luminosity of any known PWN, it is not
surprising that it was formed by such a powerful neutron
star. Finally, the age and Ė predicted by this method
are similar to those predicted by Gaensler et al. (2000).
In order to better understand the limitations of the
approach in determining the properties of the neutron
star and SNR in G328.4+0.2, we have run the model
presented in §4 over a finer grid of parameters and eval-
uated the resulting PWN evolution using the same cri-
teria as above. In Fig. 10, we show which values of P0
and Bns pass all of this criteria for three different kind
of SN explosions: Esn = 10
51 ergs and Mej = 1 M⊙,
Esn = 3 × 10
51 ergs and Mej = 1 M⊙, and Esn =
4 × 1051 ergs and Mej = 3.25 M⊙, assuming an ambi-
ent density with n = 0.03. The first set of SN parame-
ters corresponds to ST 1, the second to ST 2, and third
to a higher ejecta mass SN explosion is compatible with
a neutron star with the same parameters as ST 2. For
the first case, we find that a wide range of P0 values
are allowed but that Bns . 10
12 G. In fact, for this set
of SN parameters a P0 ∼ 10 ms, Bns ∼ 8 × 10
11 G neu-
tron star results in a PWN which is compressed a similar
amount as in ST 2. In the second case, we find that P0
and Bns are tightly constrained around P0 ≈ 5 ms and
Bns ≈ 5× 10
11 G. In the third set of SN parameters, we
find that P0 . 6 ms, but that Bns spans a wide range of
values, ∼ 1011 − 2× 1012 G.
To determine the allowed values of Esn and Mej, we
followed the same procedure as above using two different
sets of neutron star parameters: P0 = 5 ms and Bns =
5× 1011 G (the neutron star parameters in the ST 1 and
ST 2 scenarios), and P0 = 10 ms and Bns = 8 × 10
11 G.
For the first case, only models with Esn ∼ 1−4×10
51 ergs
andMej ∼ 0.5−3.5M⊙ satisfy the criteria above – though
a substantial compression of the PWN requires Esn &
2 × 1051 ergs. In the second case, we find that only
models with Esn . 10
51 ergs and Mej ∼ 0.5− 3.5M⊙ are
allowed.
While this error analysis shows that the method used
above to determine the properties of the neutron star
and SN explosion which formed G328.4+0.2 is unable to
do so to much better than an order of magnitude, the
different combinations values of P0, Bns, Esn, and Mej
which are allowed predict different physical properties for
G328.4+0.2 which are testable with further observations.
For example, in the case of P0 = 10 ms, Bns = 8×10
11 G,
Esn = 1 × 10
51 ergs, and Mej = 1 M⊙, G328.4+0.2 is
∼ 13, 000 years old, twice the age predicted in the ST 2
model as shown in Table 6, and as a result the required
velocity of the neutron star is significantly lower, vminns ∼
500 km s−1. The predicted period for the neutron star
in this scenario is also significantly slower than required
by the ST 2 scenario, P ∼ 24 ms, with a value of Ė
approximately an order of magnitude lower than that
in the ST 2 scenario. The termination shock radius for
this set of parameters is ∼ 5′′, considerable smaller than
the ∼ 8′′ for the ST 2 scenario and detectable with the
Chandra X-ray Observatory.
5. conclusions
In this paper, we first presented new X-ray (§2.1) and
radio (§ §2.2, 2.3) and observations of Galactic non-
thermal radio and X-ray source, G328.4+0.2, from which
we infer the current properties and evolutionary history
of this source (§3.2). We then presented a simple hydro-
dynamic model for the evolution of a PWN inside a SNR
(§4), which is used to determine which values of Esn,
Mej, n, P0, and Bns are able to reproduce the properties
discussed in §3.2 if G328.4+0.2 was a Composite SNR
(§4.1.1) or a PWN (§4.1.2). As a result of this analysis,
we determine the G328.4+0.2 is a PWN inside an unde-
tected SNR. Though we are not able to precisely deter-
mine the properties of the SN explosion and the neutron
star which have created this system, our analysis implies
that the neutron star in G328.4+0.2 was born with an ini-
tial period P0 . 10 ms, has a lower than average surface
dipole magnetic field strength, and has a higher than av-
erage spatial velocity vns & 400 km s
−1. We assume de-
termining that the SN explosion which created the neu-
tron star had a normal explosion energy, Esn ∼ 10
51 ergs,
but a relative low ejecta mass, Mej . 4M⊙. Future X-
ray and radio observations can significantly decrease this
uncertainty, particularly if they are able to either detect
pulsations from the neutron star or continuum X-ray or
radio emission from the currently undetected SNR in this
system.
While we are not able to definitely determine the ini-
tial period (P0) or surface magnetic field strength (Bns)
of the neutron star, nor the kinetic energy (Esn) or ejecta
mass (Mej) of the progenitor SN explosion, the estimates
quoted above are of interest. Our non-detection of the
pulsar via radio pulsations is not particularly constrain-
ing, due to the very large distance of the PWN. The low
magnetic field but rapid initial period predicted for the
neutron star in G328.4+0.2 has implications for mod-
els concerning the origin of neutron star magnetic fields.
For example, according to the α − Ω dynamo model of
(Thompson & Duncan 1993), neutron star born spinning
rapidly (P0 . 5 ms) should have a strong dipole compo-
nent to their surface magnetic fields (Bns ≫ 10
12 G).
If the ST 2 scenario proves to be correct, then the low
magnetic field of the neutron star in this system would
be a problem for such a model. Additionally, the low
ejecta mass inferred in this scenario requires that the
progenitor of this system was either a single, massive
star (M & 35M⊙) which exploded in a Type Ib/c SN
(Woosley et al. 1995), or was initially in a binary sys-
Finally, the method used in this paper to study
G328.4+0.2 is complementary to other methods used
(e.g. Chevalier 2005) to infer the initial period and mag-
netic field strength of other neutron stars in young PWN
as well as the properties of the SN explosion in which
they were formed, and is easily applicable to other such
systems.
JDG would like to thank Niccolo Bucciantini, Shami
Chatterjee, Roger Chevalier, Tracey DeLaney, David Ka-
plan, Kelly Korreck, Cara Rakowski, and John Ray-
mond for useful discussions, and the anonymous referee
for many useful comments. We are extremely grateful
to John Reynolds for prompt scheduling and observing
assistance at Parkes in 2007. The Australia Telescope
is funded by the Commonwealth of Australia for oper-
ation as a National Facility managed by CSIRO. JDG
and BMG were supported in this work by XMM grant
NAG5-13202 and LTSA grant NAG5-13032.
REFERENCES
Aschenbach, B., Egger, R., & Trumper, J. 1995, Nature, 373, 587
Bandiera, R. 1984, A&A, 139, 368
Blondin, J. M., Chevalier, R. A., & Frierson, D. M. 2001, ApJ, 563,
Blondin, J. M., Wright, E. B., Borkowski, K. J., & Reynolds, S. P.
1998, ApJ, 500, 342
Bock, D. C.-J., Turtle, A. J., & Green, A. J. 1998, AJ, 116, 1886
Buccheri, R., Bennett, K., Bignami, G. F., Bloemen, J. B. G. M.,
Boriakoff, V., Caraveo, P. A., Hermsen, W., Kanbach, G.,
Manchester, R. N., Masnou, J. L., Mayer-Hasselwander, H. A.,
Ozel, M. E., Paul, J. A., Sacco, B., Scarsi, L., & Strong, A. W.
1983, A&A, 128, 245
Bucciantini, N., Amato, E., Bandiera, R., Blondin, J. M., & Del
Zanna, L. 2004, A&A, 423, 253
Bucciantini, N., Blondin, J. M., Del Zanna, L., & Amato, E. 2003,
A&A, 405, 617
Camilo, F., Manchester, R. N., Gaensler, B. M., & Lorimer, D. R.
2002a, ApJ, 579, L25
Camilo, F., Manchester, R. N., Gaensler, B. M., Lorimer, D. R., &
Sarkissian, J. 2002b, ApJ, 567, L71
Camilo, F., Ransom, S. M., Gaensler, B. M., Slane, P. O., Lorimer,
D. R., Reynolds, J., Manchester, R. N., & Murray, S. S. 2006,
ApJ, 637, 456
Camilo, F., Stairs, I. H., Lorimer, D. R., Backer, D. C., Ransom,
S. M., Klein, B., Wielebinski, R., Kramer, M., McLaughlin,
M. A., Arzoumanian, Z., & Müller, P. 2002c, ApJ, 571, L41
Cash, W. 1979, ApJ, 228, 939
Chevalier, R. A. 1982, ApJ, 258, 790
—. 1998, Memorie della Societa Astronomica Italiana, 69, 977
—. 2005, ApJ, 619, 839
Chevalier, R. A. & Emmering, R. T. 1989, ApJ, 342, L75
Chevalier, R. A. & Fransson, C. 1992, ApJ, 395, 540
Chevalier, R. A. & Liang, E. P. 1989, ApJ, 344, 332
Cordes, J. M. & Lazio, T. J. W. 2002, ArXiv Astrophysics e-prints
Dickel, J. R., Milne, D. K., & Strom, R. G. 2000, ApJ, 543, 840
Faucher-Giguère, C.-A. & Kaspi, V. M. 2006, ApJ, 643, 332
Freeman, P., Doe, S., & Siemiginowska, A. 2001, in Proc. SPIE Vol.
4477, p. 76-87, Astronomical Data Analysis, Jean-Luc Starck;
Fionn D. Murtagh; Eds., 76–87
Gaensler, B. M., Dickel, J. R., & Green, A. J. 2000, ApJ, 542, 380
Gaensler, B. M. & Slane, P. O. 2006, ARA&A, 44, 17
Ghizzardi, S. & Molendi, S. 2002, in Proc. ’New Visions of the X-
ray Universe in the XMM-Newton and Chandra Era’ 26-30 Nov.
2001 F. Jansen; Eds.
Gotthelf, E. V. 2003, ApJ, 591, 361
Goussard, J.-O., Haensel, P., & Zdunik, J. L. 1998, A&A, 330, 1005
Helfand, D. J. & Becker, R. H. 1987, ApJ, 314, 203
Hobbs, G., Lorimer, D. R., Lyne, A. G., & Kramer, M. 2005,
MNRAS, 360, 974
Hughes, J. P., Slane, P. O., & Plucinsky, P. P. 2000, ApJ, 542, 386
Johnston, S., McClure-Griffiths, N. M., & Koribalski, B. 2004,
MNRAS, 348, L19
Jun, B.-I. 1998, ApJ, 499, 282
Jun, B.-I., Jones, T. W., & Norman, M. L. 1996, ApJ, 468, L59+
Katz-Stone, D. M. & Rudnick, L. 1997, ApJ, 488, 146
Kennel, C. F. & Coroniti, F. V. 1984, ApJ, 283, 710
Leahy, D. A., Elsner, R. F., & Weisskopf, M. C. 1983, ApJ, 272,
Lorimer, D. R., Yates, J. A., Lyne, A. G., & Gould, D. M. 1995,
MNRAS, 273, 411
Matheson, H. & Safi-Harb, S. 2005, Advances in Space Research,
35, 1099
Mavromatakis, F., Xilouris, E., & Boumis, P. 2004, A&A, 426, 567
McClure-Griffiths, N. M., Dickey, J. M., Gaensler, B. M., Green,
A. J., Haverkorn, M., & Strasser, S. 2005, ApJS, 158, 178
Mills, B. Y., Slee, O. B., & Hill, E. R. 1961, Australian Journal of
Physics, 14, 497
Possenti, A., Cerutti, R., Colpi, M., & Mereghetti, S. 2002, A&A,
387, 993
Ransom, S. M., Eikenberry, S. S., & Middleditch, J. 2002, AJ, 124,
Reynolds, S. P. 1985, ApJ, 291, 152
Reynolds, S. P. & Chevalier, R. A. 1984, ApJ, 278, 630
Roberts, M. S. E., Tam, C. R., Kaspi, V. M., Lyutikov, M., Vasisht,
G., Pivovaroff, M., Gotthelf, E. V., & Kawai, N. 2003, ApJ, 588,
Shapiro, S. L. & Teukolsky, S. A. 1983, Black holes, white dwarfs,
and neutron stars: The physics of compact objects (Research
supported by the National Science Foundation. New York,
Wiley-Interscience, 1983, 663 p.)
Shull, J. M., Fesen, R. A., & Saken, J. M. 1989, ApJ, 346, 860
Slane, P., Helfand, D. J., van der Swaluw, E., & Murray, S. S. 2004,
ApJ, 616, 403
Tam, C., Roberts, M. S. E., & Kaspi, V. M. 2002, ApJ, 572, 202
Thompson, C. & Duncan, R. C. 1993, ApJ, 408, 194
Truelove, J. K. & McKee, C. F. 1999, ApJS, 120, 299
van der Swaluw, E., Achterberg, A., Gallant, Y. A., & Tóth, G.
2001, A&A, 380, 309
van der Swaluw, E., Downes, T. P., & Keegan, R. 2004, A&A, 420,
Velazquez, P. F., Gomez, D. O., Dubner, G. M., de Castro, G. G.,
& Costa, A. 1998, A&A, 334, 1060
Warren, J. S., Hughes, J. P., & Slane, P. O. 2003, ApJ, 583, 260
Woosley, S. E., Langer, N., & Weaver, T. A. 1995, ApJ, 448, 315
Table 1
Spatial components of the X-ray emission from G328.4+0.2
Component Parameter Value
background constant 0.09
+0.06
−0.07
Clump 1 r0 1.
′′1+1.
−0.′′7
Position 15h55m26.s68+0.
−0.s05
, −53◦18′02.′′7+0.
−0.′′6
A 6.6+27.6
α 0.9+1.0
Clump 2 FWHM 19′′
Position 15h55m27.s5
+0.s2
−0.s1
, −53◦17′54.′′6
+3.′′4
−3.′′8
e 0.4
θ 280◦
A 0.8
Diffuse FWHM 67
Position 15h55m26.s6
+0.s2
−0.s2
, −53◦17′48.′′3
e 0.4
θ 130◦+10
A 0.4+0.1
Note. – Results from the spatial fit to the X-ray emission from G328.4+0.2, as described in §2.1.1. Fitting was done using the Sherpa
software package, and the errors reflect the 90% confidence level. Clump 1 was fitted to a 2D Lorentzian, defined as f(r) = A
1 + r
where f(r) is the expected number of counts at radius r away from the center of the source, A is given in counts, and the core-radius r0 is in
arc-seconds. The background component is given in counts pixel−1. Both Clump 2 and Diffuse were modeled with elliptical 2D Gaussians,
where the full width, half maximum (FWHM) is given in arc-seconds, the ellipticity is e, defined as 1− b/a, where b and a are, respectively,
the major and minor axis of the source, the position angle θ is given in degrees counterclockwise from north, and the amplitude A is given
in counts.
Table 2
Spectral Fits to Mos1 + Mos2 + pn Data
Parameter Value
Model phabs * pow phabs * bbodyrad phabs * bremss phabs * ray
NH 11.6
× 1022 6.4
× 1022 10.4
× 1022 7.8
× 1022
Γ or kT 2.0
+>900
Absorbed Flux 4.6
× 10−13 4.4
× 10−13 4.5
× 10−13 4.9
× 10−13
Unabsorbed Flux 1.9
× 10−12 6.5
× 10−13 1.2
× 10−12 9.6
× 10−13
χ2 103.3/131 98.4/131 101.7/131 122.8/131
Reduced χ2 0.81/128 0.77/128 0.79/128 0.96/128
Note – Results from joint fits to the mos1, mos2, and pn spectrum between 0.5 and 10 keV. The model phabs * pow refers to a power-law
attenuated by interstellar absorption, phabs * bbodyrad refers to a blackbody attenuated by interstellar absorption, phabs * bremss
refers to a Bremsstrahlung source spectrum attenuated by interstellar absorption, and phabs * ray refers to a Raymond-Smith thermal
plasma spectrum attenuated by interstellar absorption. In the table, NH is given in cm
−2, kT in keV, and flux in ergs cm−2 s−1, both the
absorbed and unabsorbed flux are calculated between 0.5 and 10 keV, and the errors represent the 90% confidence level.
Table 3
Expected Properties of G328.4+0.2 if it is a Composite SNR
Supernova Remnant Properties Pulsar Wind Nebula Properties Neutron Star Properties
Parameter Value Parameter Value Parameter Value
Esn 1× 10
51 ergs Epwn 4.8× 10
50 ergs Ė0 2.5× 10
40 ergs s−1
Mej 1 M⊙ Rpwn 8.7 parsecs P0 3.8 ms
Phase Sedov-Taylor M
sw 5.3 M⊙ τ0 1730 years
Psnr(Rpwn) 1.4× 10
−9 dynes Ppwn 2.0× 10
−9 erg cm−4 Bns 1.1× 10
vej(Rpwn) 400 km s
−1 vpwn 700 km s
−1 vminns 1300 km s
tnow 4900 years rts 0.5 parsecs Ė 1.7× 10
39 ergs s−1
n 0.32 cm−3 θts 6.
′′7 P 7.5 ms
· · · · · · · · · · · · Ṗ 1.8× 10−14 s/s
· · · · · · · · · · · · τc 6700 years
· · · · · · · · · · · · Epsr 1.0× 10
51 ergs
Note. – The values in bold are model assumptions, while the others are predicted by the model presented in §4. Psnr(Rpwn) is the
pressure inside of the SNR just outside of the PWN, vej(Rpwn) is the velocity of material inside the SNR just outside of the PWN, Epwn is
the internal energy of the PWN, Rpwn is the radius of the PWN, M
sw is the mass of material swept up by the PWN, Ppwn is the internal
pressure of the PWN, vpwn is the expansion velocity of the PWN, rts is the radius of the termination shock around the neutron star inside
the PWN, calculated using Equation 16, θts is the angular size of this feature assuming d17 ≡ 1, Bns is the dipole magnetic field of the
neutron star in G328.4+0.2 according to this model, P is the period of the neutron star in G328.4+0.2, Ṗ is the period-derivative of the
neutron star, τc is the characteristic age of the neutron star, defined as τc = P/(2Ṗ ), and Epsr is the total amount of energy injected by
the neutron star into G328.4+0.2 for t < tnow. All values are given for t = tnow unless otherwise noted.
Table 4
Scenarios for G328 as a PWN inside an Undetected SNR
Scenario # τ0 P0 Esn,51 Mej n tnow SNR Phase Rsnr Bns,12 v
min,II
ST 1 1730 5.0 1.00 1 0.03 5100 Sedov-Taylor 19.8 0.5 1300 2400
ST 2 1730 5.0 3.16 1 0.03 6500 Sedov-Taylor 28.0 0.5 1000 1900
Rad 1 430 10.0 0.32 10 0.32 84200 Radiative 28.4 2.0 100 100
Rad 2 770 10.0 0.32 10 0.32 52400 Radiative 24.8 1.5 100 200
Rad 3 770 10.0 0.32 10 1.00 101400 Radiative 22.1 1.5 100 100
Note. – Values for τ0, P0, Esn, Mej, and n that satisfy the criteria listed in §4.1.2 for G328 being a PWN inside an undetected SNR.
The value of τ0 is given in years, P0 in ms, Esn,51 ≡ Esn/10
51 ergs, Mej in solar masses, n in cm
−3, tnow in years, Rsnr in parsecs,
Bns = Bns,12 × 10
12 G, vminns is in km s
−1, and v
min,II
ns is also in km s
Table 5
Compression/Expansion properties of G328.4+0.2 if it is a PWN
Scenario # t(vpwn = 0) Central Bar Lifetime Rpwn(vpwn = 0)
re−exp
compres
ST 1 1648.5, 2432.0 63930 8.20, 7.96 0.91
ST 2 969.0, 1837.0 35250 8.36, 6.34 0.44
Rad 1 12772.0, 26541.0 243480 8.53, 7.55 0.69
Rad 2 13336.5, 21112.5 256230 8.48, 8.27 0.93
Rad 3 9761.5, 11035.0 102640 5.72, 5.72 1.00
Note. – In this table, values Scenario # correspond to the values of τ0, P0, Esn, Mej, and n given in Table 4. t(vpwn = 0) and the Central
Bar lifetime are given in years, and Rpwn(vpwn = 0) is given in parsecs.
re−exp
compres
is the ratio of the volume of the PWN at re-expansion
and compression.
Table 6
Expected Properties of G328.4+0.2 if it is a PWN (ST 2 scenario in Table 4)
Supernova Remnant Properties Pulsar Wind Nebula Properties Neutron Star Properties
Parameter Value Parameter Value Parameter Value
Esn 3.2× 10
51 ergs Epwn 3.2× 10
50 ergs Ė0 1.5× 10
40 ergs s−1
Mej 1 M⊙ Rpwn 12.5 parsecs P0 5 ms
Phase Sedov-Taylor M
sw 0.4 M⊙ τ0 1730 years
Psnr(Rpwn) 3.6× 10
−10 dynes Ppwn 4.4× 10
−10 dynes Bns 5× 10
vej(Rpwn) 460 km s
−1 vpwn 790 km s
−1 vminns 990 km s
Age (tnow) 6500 years rts 0.6 parsecs Ė 6.4× 10
38 ergs s−1
n 0.03 cm−3 θts 7.
′′7 P 10.9 ms
· · · · · · · · · · · · Ṗ 2.1× 10−14 s/s
· · · · · · · · · · · · τc 8200 years
· · · · · · · · · · · · Epsr 6.3× 10
50 ergs
Note. – The model assumptions are given in bold. Psnr(Rpwn) is the pressure inside just outside of the PWN, vej(Rpwn) is the velocity
of material inside just outside of the PWN, Epwn is the internal energy of the PWN, Rpwn is the radius of the PWN, M
sw is the mass of
material swept up by the PWN, Ppwn is the internal pressure of the PWN, vpwn is the expansion velocity of the PWN, rts is the radius of
the termination shock around the neutron star inside the PWN, θts is the predicated angular radius of this feature assuming d17 ≡ 1, Bns
is the predicted dipole magnetic field of the neutron star in G328.4+0.2, P is the predicted period of the neutron star in G328.4+0.2, Ṗ
is the predicted period-derivative of the neutron star, τc is characteristic age of the neutron star, and Epsr is the total amount of energy
injected by the neutron star into G328.4+0.2 for t < tnow. All values are given for t = tnow unless otherwise noted.
Point Source
Background
Background Region for Spectral Analysis
Source Region for Spectral Analysis
Clump 1
DiffuseClump 2
Clump 2
Clump 1
Diffuse
Fig. 1.— Top: Exposure normalized, vignette corrected mos1 + mos2 image of G328.4+0.2 smoothed by a 5′′ Gaussian. The white
contours indicate 20, 40, 50, 70, and 90% of the peak X-ray flux in the smoothed image, while the boxes indicate the background and
source regions used for the spectral analysis described in §2.1.2. The background point source labeled in the image was excluded from
the background region. Additionally, the labels in this plot point to the morphological features discussed in §2.1.1. Bottom: Unsmoothed
normalized, vignette corrected mos1 and mos2 image of G328.4+0.2 overlaid with the regions used for the Hardness Ratio analysis discussed
in §2.1.2.
Fig. 2.— Mos1, Mos2, and pn spectrum of G328.4+0.2 overlaid with the absorbed power-law model whose parameters are given in
Table 2.
Fig. 3.— 1.4 GHz image of G328.4+0.2, overlaid with X-ray contours in green which represent 20%, 35%, ..., 90% of the peak flux in the
smooth X-ray image shown in Fig. 1. The beam size of this image is 7.′′0×5.′′8, and is shown in the lower left-hand corner of the image.
The labels indicate examples of the different radio morphological features discussed in §2.2.
Fig. 4.— 20cm radio image of G328.4+0.2 (same data as shown in Fig. 3), with a color scale chosen to enhance the visibility of Filamentary
Structure B discussed in §2.2. The yellow circle indicates the size of PWN predicted in the ST 2 model listed in Table 5 when it re-expanded
after the initial compression by the SNR reverse shock, and the white cross indicates the center of G328.4+0.2 (15h55m33s,−53◦17′00′′;
J2000) as determined by Gaensler et al. (2000).
Fig. 5.— Spectral tomography images of G328.4+0.2, as described in §2.2.1. The spectral index α is given in the upper left hand corner
of each image, where Sν ∝ ν
Contact Discontinuity
SNR Forward Shock
SNR Reverse
Shock
Outer Boundary
of PWN
Neutron Star
Shocked ISM Ambient ISM
Unshocked Ejecta
Material Swept
Up by PWN
Pulsar
Shocked Ejecta
Fig. 6.— Diagram of a Composite SNR in the Free Expansion stage of its evolution. In this image, the ratio between the thickness
of the mass shell surrounding the PWN and the radius of the PWN is 1/24, as determined by van der Swaluw et al. (2001), and the
radio of the SNR Forward Shock, Contact Discontinuity, and Reverse Shock radii are equal to the values given in Chevalier (1982) for his
n = 9, s = 0 case. The colors denote the nature of the material within each region.
1000 10000 100000
Time (years)
Fig. 7.— Rpwn/Rsnr for Models A (solid), B (long-dashed line), & C (short-dashed line) in Blondin et al. (2001). The top plot shows
the result of the model presented in §4, while the bottom is a reproduction of Fig. 3 by Blondin et al. (2001), reproduced by permission of
the AAS.
Fig. 8.— The radius of the PWN, SNR, and SNR reverse shock as well as the location of the neutron star as a function of time if
G328.4+0.2 is a composite SNR. The vertical line indicates the current age of the system, and the properties of this system are given in
Table 3.
Fig. 9.— The radius of the PWN, SNR, and SNR reverse shock as well as the location of the neutron star as a function of time for the
favored (ST 2; Table 4) scenario if G328.4+0.2 is a PWN. The vertical line indicates the current age of the system.
Fig. 10.— The results for varying P0 and Bns for a Esn = 10
51 ergs, Mej = 1M⊙ (top), Esn = 3× 10
51 ergs, Mej = 1M⊙ (middle), and
Esn = 4×10
51 ergs, Mej = 3.25M⊙ (bottom) SN explosion, assuming n = 0.03 cm
−3. The small black squares indicate models which failed
the criteria described in §4.1.2, while the colored circles indicate scenarios which passed. The color represents the Compression Fraction
of the PWN, defined as the ratio of the PWN’s volume at the beginning and end of the compression stage. The Compression Fraction of
the ST 2 case given in Table 4 is 0.44 (light blue on this color scale), and lower values correspond to a more substantial compression. The
star indicates the position of a P0 = 5 ms, Bns = 5 × 10
11 G neutron star, while the large square indicates the position of a P0 = 10 ms,
Bns = 8× 10
11 G neutron star – the two neutron stars used in Fig. 11.
Fig. 11.— The results of varying Esn and Mej for a P0 = 5 ms, Bns = 5 × 10
11 G (top) and P0 = 10 ms, Bns = 8 × 10
11 G (bottom)
neutron star, assuming n = 0.03 cm−3. The black squares indicate which scenarios failed the criteria described in §4.1.2, while the colored
circles indicate those that passed, with the color representing the Compression Fraction (defined as the ratio of the PWN’s volume at the
beginning and end of the compression stage) of the PWN. The star indicates a Esn = 1× 10
51 ergs, Mej = 1 M⊙ SN explosion, the square
indicates a Esn = 3× 10
51 ergs, Mej = 1 M⊙ SN explosion, and the circle indicates a Esn = 4× 10
51 ergs, Mej = 3.25 M⊙ SN explosion –
the SN explosion parameters used in Fig. 10.
APPENDIX
equations for the hd model for the evolution of a pwn inside a snr
In this Appendix, we provide many of the details concerning the properties of the neutron star, PWN, and SNR
needed to simulate the HD model for the evolution of a PWN inside a SNR described in §4.
For the SNR, we assume that the initial ejecta density profile consists of a constant density core surrounded by a
ρ ∝ r−9 envelope – the standard assumption for a SNR produced by a Type-II SNR (Blondin et al. 2001; Chevalier
1982) – and that the ejecta is expanding ballistically (vej ≡ rej/t). The boundary between the constant density core
and the outer ejecta envelope has a velocity vcore, defined as (Blondin et al. 2001):
vcore=
where Esn is the explosion energy of the SN and Mej is the ejecta mass. As a result, the density ρcore of the ejecta
core is (Blondin et al. 2001):
ρcore(t)=
Esn v
core t
−3. (A2)
As the SNR expands, it sweeps up and shocks the surrounding interstellar medium (ISM). This swept-up material has
a higher pressure than the cold ejecta driving the expansion of the SNR, and as a result drives a reverse shock (RS)
into the SN ejecta. In between the outer edge of the SNR, which marks the location of the forward shock (FS) and
the RS is a contact discontinuity which separates the shocked ISM from the ejecta shocked by the RS. The pressure
inside the SNR at r < rrs, where rrs is the radius of the RS, is assumed to be zero. A diagram of this is shown in
Fig. 6. Since we assume that both the SN ejecta and the shocked ISM behave as a γ = 5/3 perfect gas, the sound
speed cs of this material is:
When the RS is still in the ejecta envelope, we determine the pressure, velocity, and density profiles on the material
between the RS and FS using the self-similar equations given by Chevalier (1982), evaluating them for the n = 9, s = 0
case. However, when the RS enters the constant density ejecta core, it is no longer possibly to apply the self-similar
solution of Chevalier (1982), and we use the work of Truelove & McKee (1999) to determine the radius of the RS and
the results given by Bandiera (1984) to determine the pressure, velocity, and density profiles between the RS and FS.
It is also necessary to model the radius of the FS (Rsnr), which we do using the work of Truelove & McKee (1999).
This is valid while the SNR is in the Free Expansion and Sedov-Taylor phases of its evolutions. After the SNR goes
radiative, which occurs at a time t = trad defined as (Blondin et al. 1998):
trad≈ 2.9E
17 × 104 yr. (A4)
After this point, Rsnr ∝ t
2/7. An analytic model for the pressure, velocity, and density distribution of a SNR in this
phase does not currently exist, and therefore it is difficult to extend our model to this phase, though if one assumes
that the interior of the SNR evolves adiabatically, Rpwn/Rsnr ∝ t
0.075 for t > trad (Blondin et al. 2001).
For the PWN, as mentioned in §4, we assume that it is a bubble filled with a γ = 4/3 perfect gas. As a result, the
internal pressure of the PWN, Ppwn is equal to:
Ppwn=
3Vpwn
, (A5)
where Epwn is the internal energy of the PWN and Vpwn is the volume of the PWN, defined as:
Vpwn=
πR3pwn (A6)
where Rpwn is the radius of the PWN. The internal energy of the PWN is determined by the rate of energy injected
into the PWN by the neutron star (Ė), and energy loss due to its expansion inside the SNR ( ˙Eadpwn). For Ė, we use
the standard assumption that it is equal to:
Ė= Ė0
where t is the age of the neutron star, p is the pulsar braking index (p = 3 for a magnetic dipole), τ0 is the characteristic
timescale of pulsar spin-down, and Ė0 is the initial spin-down energy of the neutron star. Both τ0 and Ė0 depend on
the physical properties of the neutron star, with τ0 defined as (Blondin et al. 2001; Shapiro & Teukolsky 1983):
3c3 IP 20
4π2B2nsR
ns sin
where I is the neutron star’s moment of inertia, P0 is the initial spin period, Bns is the magnetic field of the neutron
star, Rns is the radius of the neutron star, α is the angle between the neutron star’s rotation axis and magnetic field,
and (Blondin et al. 2001):
Ė0= I
τ0(p− 1)
. (A9)
Since the PWN expands adiabatically, ˙Eadpwn is equal to:
˙Eadpwn=−
. (A10)
As a result, the change in the internal energy of the PWN over time ( ˙Epwn) is equal to:
˙Epwn=−
+ Ė0
(A11)
assuming that p = 3. This equation can be solved analytical, and we result that Epwn(t) can be expressed as:
Epwn(t)= Ė0τ0
ln(1 + t/τ0)
t/τ0 − 1
. (A12)
When we run our model, we use Eq. (A12) to determine the initial value of Epwn, but determine Epwn at later times
using the procedure described in Step 2 in §4.
During its free-expansion, the PWN is moving faster than its surroundings, and the mass of the shell surrounding
the PWN (Msw,pwn) is simply:
Msw,pwn(t)=
∫ Rpwn
4πR2ρej(r, t)dr (A13)
where ρej(r) is the density profile of the SNR. After the collision with the reverse shock, if the PWN is moving
faster than its surroundings we determine the mass of the ejecta shell recently swept up by the PWN and add it
to the value of Msw,pwn calculated at the time of the reverse shock collision. If the PWN is moving slower than its
surroundings, we assume that Msw,pwn remains constant, even if the PWN is being compressed by the surrounding
SNR. Due to the difference in pressure between the PWN interior to the mass shell and the SNR exterior to the mass
shell (Psnr(r = Rpwn)), the mass shell is subject to a force F∆P, defined as:
F∆P=4πR
pwn[Ppwn − Psnr(Rpwn)]. (A14)
In this notation, F∆P > 0 means that the PWN interior has a higher pressure than inside the surrounding SNR. If the
PWN has not yet encountered the RS, we assume that Psnr(r = Rpwn) = 0. If the mass shell is moving faster than the
sound speed of the surrounding material (vpwn > cs(Rpwn)), which is the case before the PWN interacts with the RS
(Chevalier & Fransson 1992), the mass shell is decelerated by ram pressure, and the total force on the mass swept-up
by the PWN, Fpwn, is:
Fpwn=F∆P − 4πR
pwnρej(Rpwn)[vpwn − vej(Rpwn)]
2. (A15)
If vpwn < cs, then Fpwn = F∆P. For t ≪ τ0, analytical solutions to these equations give Rpwn ∝ t
6/5 if the PWN is
still inside the central constant-density core – a result which is reproduced by our numerical implementation of the
model described in §4.
In this framework, the period P of a neutron star evolves as:
P =P0
, (A16)
the period-derivative Ṗ evolves as:
(A17)
and the surface magnetic field Bns of the neutron star, assuming p = 3, is:
Bns=1.5
˙E0,37
2P 20,ms
× 109 G (A18)
where ˙E0,37 = Ė0/10
37 ergs s−1, P0,ms is the initial period in ms, and R14 is the radius of the neutron star R/14 km.
Additionally, for p = 3 τ0 is equal to:
τ0=17.3
B212R
years (A19)
where I = 1045I45 g cm
2 and the angle between the spin and magnetic field axes of the neutron star is α = 45◦.
ABSTRACT
  We present new observational results obtained for the Galactic non-thermal
radio source G328.4+0.2 to determine both if this source is a pulsar wind
nebula or supernova remnant, and in either case, the physical properties of
this source. Using X-ray data obtained by XMM, we confirm that the X-ray
emission from this source is heavily absorbed and has a spectrum best fit by a
power law model of photon index=2 with no evidence for a thermal component, the
X-ray emission from G328.4+0.2 comes from a region significantly smaller than
the radio emission, and that the X-ray and radio emission are significantly
offset from each other. We also present the results of a new high resolution (7
arcseconds) 1.4 GHz image of G328.4+0.2 obtained using the Australia Telescope
Compact Array, and a deep search for radio pulsations using the Parkes Radio
Telescope. We find that the radio emission has a flat spectrum, though some
areas along the eastern edge of G328.4+0.2 have a steeper radio spectral index
of ~-0.3. Additionally, we obtain a luminosity limit of the central pulsar of
L_{1400} < 30 mJy kpc^2, assuming a distance of 17 kpc. In light of these
observational results, we test if G328.4+0.2 is a pulsar wind nebula (PWN) or a
large PWN inside a supernova remnant (SNR) using a simple hydrodynamic model
for the evolution of a PWN inside a SNR. As a result of this analysis, we
conclude that G328.4+0.2 is a young (< 10000 years old) pulsar wind nebula
formed by a low magnetic field (<10^12 G) neutron star born spinning rapidly
(<10 ms) expanding into an undetected SNR formed by an energetic (>10^51 ergs),
low ejecta mass (M < 5 Solar Masses) supernova explosion which occurred in a
low density (n~0.03 cm^{-3}) environment.

<|endoftext|><|startoftext|>
Introduction
Though relativistic heavy ion collisions a medium is created that may be the quark
gluon plasma (QGP). We can study this medium though the use of jets and jet-
like correlations. Jets make good probes because their properties in vacuum can be
calculated with perturbative quantum chromodynamics (pQCD). Previous results
on two-particle azimuthal jet-like correlations have revealed a broadened away-side
shape in central Au+Au collisions relative to pp, d+Au and peripheral Au+Au
collisions, or even double humped 1,2,3,4. The away-side shape is consistent with
many different physics mechanisms including: large angle gluon radiation 5,6, jets
deflected by radial flow or preferential selection of particles due to path-length
dependent energy loss, hydrodynamic conical flow generated by Mach-cone shock
waves 7,8, and Čerenkov gluon radiation 9,10. 3-particle correlations can be used to
differentiate mechanisms with conical emission, Mach-cone and Ĉerenkov radiation,
from the other mechanisms. Additionally, the dependence of the conical emission
angle on associated particle pT can be used to differentiate between Mach-cone and
http://arxiv.org/abs/0704.0220v1
November 4, 2018 14:16 WSPC/INSTRUCTION FILE IWCF˙v3
2 Jason Glyndwr Ulery For the STAR Collaboration
QCD-Čerenkov radiation.
2. Analysis Procedure
The 3-particle correlation analysis method has been rigorously described in refer-
ence 11. Results are reported here for trigger particles of 3 < pT < 4 Gev/c and
associated particles of 1 < pT < 2 GeV/c, except where otherwise noted. Results
are from pp, d+Au, and Au+Au collisions at
sNN = 200 GeV/c. All particles are
charged particles measuremented in the STAR time projection chamber (TPC).
φ-φ=φ∆
-1 0 1 2 3 4 5
-1 0 1 2 3 4 5
-1 0 1 2 3 4 5-1
-1 0 1 2 3 4 5-1
-1 0 1 2 3 4 5-1
(a) (b) (c) (d)
Fig. 1. (color online) (a) Raw 2-particle correlation (points), background from mixed events with
flow modulation added-in (solid) and scaled by ZYA1 (dashed), and background subtracted 2-
particle correlation (insert). (b) Raw 3-particle correlation, (c) soft-soft background, βα2Binc
and (d) hard-soft background + trigger flow, Ĵ2 ⊗ αBinc2 + βα2B
inc,TF
. See text for detail.
Results are from ZDC-triggered 0-12% Au+Au collisions at
sNN=200 GeV/c.
Figure 1a shows the 2-particle azimuthal distribution (J2), its background (B
and the background subtracted 2-particle correlations (Ĵ2). The background is con-
structed from mixed events where the trigger particle and the associated particles
are from different events within the same centrality window. The flow modulation
is added in pairwise using the average v2 values from the measurements based on
the reaction plane and 4-particle cumulant methods 1 and the v4 contribution uses
the parameterization v4 = 1.15v
from the data12. The background is normalized
(with scale factor α) to the signal within 0.8 < |∆φ| < 1.2 (zero yield at 1 radian
or ZYA1).
Figure 1b shows the 3-particle azimuthal distribution (J3) in ∆φaT = φa −
φTrigger and ∆φbT = φb −φT where φT , φa, and φb are the azimuthal angles of the
trigger particle and the two associated particles respectively. Combinatorial back-
grounds must be removed to extract the genuine jet-like 3-particle signal. Events are
treated as composed of two components, particles that are jet-like correlated with
the trigger particle and background particles. One source of background, the hard-
soft background, results when one of the associated particles is jet-like correlated
with the trigger particle and the other uncorrelated, other than the correlation
due to flow. It is constructed from the 2-particle jet-like correlations, Ĵ2 folded
with the normalized 2-particle background, αBinc
. We shall refer to the hard-soft
November 4, 2018 14:16 WSPC/INSTRUCTION FILE IWCF˙v3
Three-Particle Correlations from STAR 3
background as Ĵ2 ⊗ αBinc2 .
Another source of background, the soft-soft background, results from correla-
tions between the two associated particles which are independent of the trigger
particle. This background is obtained from mixing the trigger particle with a dif-
ferent event of the same centrality. We shall refer to this background as Binc
. Since
the two associated particle are from the same event all correlations between them
that are independent of the trigger are preserved. This may include contribution
from minijets, other jets in the event and flow. The soft-soft background is shown
in figure 1c.
Although the flow correlations between the two associated particles is accounted
for by the soft-soft term, those between the associated particles and the trigger par-
ticle are not. Those correlations are added in triplet-wise from mixed events where
the trigger and the associated particles are all from different events in the same
centrality window. The v2 and v4 values are obtained from the same measurements
as used in the 2-particle background. The total number of triplets is determined
from the soft-soft. We shall refer to this background as B
inc,TF
The total background is then, Ĵ2 ⊗ αBinc2 + βα2(Binc3 + B
inc,TF
). Binc
inc,tf
are scaled by βα2. The normalization factor α is determined from 2-particle
correlations and deviates from unity due to the combined effects of trigger bias and
centrality definition. If the events are poisson then α2 is the correct multiplicity
scaling in 3-particle correlations. The normalization factor β corrects for the effect
of non-poisson multiplicity distributions and is obtained such that the number
of triplets in the background subtracted jet-like three-particles correlation equals
the square of the number of pairs in the background subtracted jet-like 2-particle
correlation. Figure 1d shows Ĵ2 ⊗ αBinc2 + βα2B
inc,TF
3. Results
Figure 2 shows background subtracted 3-particle jet-like correlation signals for dif-
ferent collisions and centralities. The pp and d+Au results are similar with peaks
clearly visible for the near-side, (0,0), away-side, (π,π), and the two cases of one
particle on the near-side and the other on the away-side, (0,π) and (π,0). The away-
side peak displays on-diagonal elongation which is consistent with kT broadening.
Additional on-diagonal elongation is present in the Au+Au results, possibly due
to deflected jets or large angle gluon radiation. The more central Au+Au colli-
sions display an off-diagonal structure, at about π± 1.45 radians, that is consistent
with conical emission. This structure increases in magnitude with centrality and is
prominent in the high statistics top 12% data provided by the online zero degree
calorimeter (ZDC) trigger.
Figure 3 shows away-side projections of on-diagonal strips to (∆φaT+∆φbT )/2−
π and off-diagonal strips to (∆φaT − ∆φbT )/2. In d+Au collisions only a strong
central peak is present for both on-diagonal and off-diagonal projections. The on-
diagonal projection is broader than the off-diagonal projection, likely due to kT
November 4, 2018 14:16 WSPC/INSTRUCTION FILE IWCF˙v3
4 Jason Glyndwr Ulery For the STAR Collaboration
-1 0 1 2 3 4 5-1
-0.02
-1 0 1 2 3 4 5-1
-0.02
-1 0 1 2 3 4 5-1
-0.05
-1 0 1 2 3 4 5-1
-1 0 1 2 3 4 5-1
-1 0 1 2 3 4 5-1
STAR Preliminary
Fig. 2. (color online) Background subtracted jet-like 3-particle correlations for pp (top left), d+Au
(top middle), and Au+Au 50-80% (top right), 30-50% (bottom left), 10-30% (bottom center), and
ZDC triggered 0-12% (bottom right) collisions at
sNN=200 GeV/c.
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2-0.2
STAR Preliminary
Fig. 3. (color online) Away-side projections of a strip of width 0.7 radians for (left) d+Au and
(right) 0-12% ZDC Triggered Au+Au. Off-diagonal projection (solid) is (∆φaT − ∆φbT )/2 and
on-diagonal projection (open) is (∆φaT +∆φbT )/2 − π. Shaded bands are systematic errors.
broadening. In central Au+Au collisions strong peaks are seen in the off-diagonal
projection, as expected for conical emission. The on-diagonal projection is similar
to the off-diagonal but with additional contribution between the peaks likely due to
deflected jets and/or large angle gluon radiation. The fitted angle of the side peaks
in the off-diagonal projection is about 1.45 radians.
Figure 4 shows the centrality dependence of the average signal strengths in
different regions. The right panel shows the away-side signal, average singal centered
November 4, 2018 14:16 WSPC/INSTRUCTION FILE IWCF˙v3
Three-Particle Correlations from STAR 5
0 1 2 3 4 5 6
-0.05
0 1 2 3 4 5 6
-0.05
0 1 2 3 4 5 6
-0.05
Deflected + Cone
partN(
Fig. 4. (color online) Average signals in 0.7 × 0.7 boxes at (0,0), left, (π ± 1.45,π ∓ 1.45), center,
and (π ± 1.45,π ± 1.45). Solid error bars are statistical and shaded are systematic. Npart is the
number of participants. The ZDC 0-12% points (open symbols) are shifted to the left for clarity.
at (π,π). It increases with centrality in pp, d+Au and perpherial Au+Au and then
seems to level off for mid-central and central Au+Au collisions. The middle panel
shows the average signal where we only expect conical emission, at (π ± 1.45,π ∓
1.45). It increases with centrality and significantly deviates from zero in central
Au+Au collisions. The right panel shows the average signal were conical emissions
deflected jets, and large angle gluon radiation could all contribute, at π ± 1.45,π±
1.45). This signal is similar to what we see where we only expect conical emission.
Figure 5 shows the difference between on-diagonal signals, where conical emis-
sion, deflected jets, and large angle gluon radiation could contribute, and off-
diagonal signals, where only conical emission contributes. Since conical emission
signals are expected to be of equal magnitude on-diagonal and off-diagonal, the
difference may indicate the contribution from deflected jets and large angle gluon
radiation. The centrality dependence is shown for three different angles. The differ-
ence decreases with distance from (π,π).
If we have Mach-cone emission, the emission angle is expected to be indepen-
dent of the associated particle momentum; however, the Čerenkov radiation model
in Ref. 10 predicts an emission angle that is sharply decreasing with associated
particle momentum. For this reason we shall look at the associated particle pT
dependence of our signal. Figure 6 shows the background subtracted 3-particle cor-
relations for different associated pT bins. The angle is determined from fitting the
off-diagonal projection, (∆φaT − ∆φbT )/2, to a central Gaussian and two symet-
ric side Gaussians. The strength of the off-diagonal signal decrease with increasing
November 4, 2018 14:16 WSPC/INSTRUCTION FILE IWCF˙v3
6 Jason Glyndwr Ulery For the STAR Collaboration
partN
0 1 2 3 4 5 6
1.00)±πDeflected (
1.30)±πDeflected (
1.45)±πDeflected (
STAR Preliminary
Fig. 5. (color online) Differences in average signals, between (π±1.45,π±1.45) and (π±1.45,π∓1.45)
(triangle), between (π± 1.3,π± 1.3) and (π± 1.3,π∓ 1.3) (square), and between (π± 1.0,π± 1.0)
and (π ± 1.0,π ∓ 1.0) (circle). Solid error bars are statistical and shaded are systematic. Npart
is the number of participants. The ZDC 0-12% points (open symbols) are shifted to the left for
clarity.
-1 0 1 2 3 4 5-1
-1 0 1 2 3 4 5-1
-0.15
-0.05
-1 0 1 2 3 4 5-1
-0.05
-1 0 1 2 3 4 5-1
-0.02
(a) (b) (c) (d)
Fig. 6. (color online) Background subtracted jet-like 3-particle correlations for 0.75 < p
< 1.0
GeV/c (left), 1.0 < p
< 1.5 GeV/c (left center), 1.5 < p
< 2.0 GeV/c (right center), and
2.0 < p
< 3.0 GeV/c (right). Trigger particle pT is 3 < p
< 4 GeV/c. Results are from ZDC
triggered top 12% central Au+Au collisons at
sNN=200 GeV.
pT and is almost gone in the highest pT bin. This is not surprising since we need
two away-side particles each with a pT that is a significant fraction of the trigger
particle pT . Figure 7 (left) shows the dependence of the angle of the off-diagonal
peaks on associated particle pT . The angle is consistent with remaining constant as
a function of associated particle pT .
Figure 7 (right) shows the centrality dependence of the angle of the off-diagonal
peaks obtained from fits to the projections as done for the pT dependence. The
angle is consistent with remaining constant from mid-central to central Au+Au
collisions. If we have Mach-cone emission this likely implies the speed of sound in
the medium does not greatly vary from mid-central to central Au+Au collisions.
The solid line at 1.46 on the plot is from a fit to a constant.
November 4, 2018 14:16 WSPC/INSTRUCTION FILE IWCF˙v3
Three-Particle Correlations from STAR 7
 Assoc (GeV/c)
0 0.5 1 1.5 2 2.5
0.02±Au+Au 0-12 1.41
0.02±Au+Au 0-50 1.45
Statistical
Systematic
STAR Preliminary
partN(
3 3.5 4 4.5 5 5.5 6
Au+Au 0-12% (shifted)
Au+Au 30-50%, 10-30% and 0-10%
0.03±1.46
STAR Preliminary
Fig. 7. (color online) Emission angles from double Gaussian fits. (left) Angle as a function of
associated particle pT for Au+Au 0-12% ZDC triggered (filled) and Au+Au 0-50% from minimum
bias (open). Numbers on the plot are results from a fit to a constant for the data points with the
fit errors displayed. (right) Angle as a function of centrality for Au+Au 0-12% ZDC triggered data
(circle) and Au+Au 30-50%, 10-30% and 0-10% from minimum bias data (square). The 0-12%
point has been shifted for clarity. The number is from a fit to a constant for the points, shown with
the solid line. The dashed line is at π/2. Solid error bars are statistical and shaded are systematic.
4. Systematics
The major sources of systematic error are from the elliptic flow measurements
and the background normalization. Our default v2 is the average of measurements
from the reaction plane and 4-particle cumulant methods. The systematic uncer-
tainty due to the v2 has been determined by varying it between the reaction plane
and 4-particle cumulant results. Figure 8a and b show the background subtracted
3-particle correlation for the reaction plane and 4-particle v2, respectively. Even
though the hard-soft background and trigger flow backgrounds individually vary a
great deal with the change in elliptic flow, the variations cancel out to first order.
Therefore the signal, as seen in Fig. 8 is robust with respect to the variation in
elliptic flow.
-1 0 1 2 3 4 5-1
-1 0 1 2 3 4 5-1
-1 0 1 2 3 4 5-1
(a) (b) (c)
Fig. 8. (color online) 0-12% Au+Au ZDC triggerd data for different systematic checks: (a) reaction
plane v2, (b) 4-particle cumulant v2, and (c) normalization region for α of 0.6 < |∆φ| < 1.4
November 4, 2018 14:16 WSPC/INSTRUCTION FILE IWCF˙v3
8 Jason Glyndwr Ulery For the STAR Collaboration
To study the effect of the background normalization the size of the normalization
window used to determine α was doubled to 0.6 < |∆φ| < 1.4. The signal is robust
with this change in normalization. Figure 8 shows the background subtracted 3-
particle correlation using this larger normalization window.
Other sources of systematic error include the effect on the trigger particle flow
from requiring a correlated particle (a 20% change on trigger particle v2 is applied),
uncertainity in the v4 parameterization, and multiplicity bias effects on the soft-
soft background. The systematic errors shown in Figures 3, 4, 5, and 7 reflect the
quadratic sum of all the systematic uncertainties mentioned.
5. Conclusion
Three-particle azimuthal correlations have been studied for trigger particles of 3 <
pT < 4 GeV/c and associated particles of 1 < pT < 2 GeV/c in pp, d+Au, and
Au+Au collisions at
sNN=200 GeV/c by STAR. This analysis treats events as the
sum of two components, particles that are jet-like correlated with the trigger and
background particles. On-diagonal broadening has been observed in pp and d+Au
collisions that is consistent with kT broadening. Additional on-diagonal broadening
has been observed in heavy ion collisions that may be due to contributions from
deflected jets and/or large angle gluon radiation. Off-diagonal peaks have been
detected in central Au+Au collisions that are consistent with conical emission.
A study of the angle as a function of associated particle pT was performed to
discriminate between hydrodynamic conical flow and Čerenkov gluon radiation. No
strong dependence on associated particle pT was beheld. This result is consistent
with Mach-cone emission.
References
1. J. Adams et al. (STAR Collaboration), Phys. Rev. Lett. 95, 152301 (2005).
2. S.S. Adler et al. (PHENIX Collaboration), Phys. Rev. Lett. 97, 052301 (2006).
3. J. Ulery et al. (STAR Collaboration), Nuc. Phys. A774, 581-584 (2006).
4. M. Horner et al. (STAR Collaboration), QM2006 Talk Proceedings to appear in Jour.
Phys. G.
5. I. Vitev, Phys. Lett. B 630, 78 (2005).
6. A.D. Polosa and C.A. Salgado, hep-ph/0607295.
7. H. Stoecker, Nucl. Phys. A750, 121 (2005).
8. J. Casalderrey-Solana, E. Shuryak and D. Teaney, J. Phys. Conf. Ser. 27, 23 (2005).
9. I.M. Dremin, Nucl. Phys. A767 233 (2006).
10. V. Koch, A. Majumder and X.-N. Wang, Phys. Rev. Lett. 96, 172302 (2006).
11. J. Ulery and F. Wang, nucl-ex/0609016.
12. J. Adams et al. (STAR Collaboration), Phys. Rev. C 72, 014904 (2004).
http://arxiv.org/abs/hep-ph/0607295
http://arxiv.org/abs/nucl-ex/0609016
	Introduction
	Analysis Procedure
	Results
	Systematics
	Conclusion
ABSTRACT
  Two-particle correlations have shown modification to the away-side shape in
central Au+Au collisions relative to $pp$, d+Au and peripheral Au+Au
collisions. Different scenarios can explain this modification including: large
angle gluon radiation, jets deflected by transverse flow, path length dependent
energy loss, Cerenkov gluon radiation of fast moving particles, and conical
flow generated by hydrodynamic Mach-cone shock-waves. Three-particle
correlations have the power to distinguish the scenarios with conical emission,
conical flow and Cerenkov radiation, from other scenarios. In addition, the
dependence of the observed shapes on the $p_T$ of the associated particles can
be used to distinguish conical emission from a sonic boom (Mach-cone) and from
QCD-Cerenkov radiation. We present results from STAR on 3-particle azimuthal
correlations for a high $p_T$ trigger particle with two softer particles.
Results are shown for $pp$, d+Au and high statistics Au+Au collisions at
$\sqrt{s_{NN}}$=200 GeV. An important aspect of the analysis is the subtraction
of combinatorial backgrounds. Systematic uncertainties due to this subtraction
and the flow harmonics v2 and v4 are investigated in detail. The implications
of the results for the presence or absence of conical flow from Mach-cones are
discussed.

<|endoftext|><|startoftext|>
The Return of a Static Universe and the End of Cosmology
Lawrence M. Krauss1,2 and Robert J. Scherrer2
1Department of Physics, Case Western Reserve University,
Cleveland, OH 44106; email: krauss@cwru.edu and
2Department of Physics & Astronomy, Vanderbilt University,
Nashville, TN 37235; email: robert.scherrer@vanderbilt.edu
(Dated: February 1, 2008)
Abstract
We demonstrate that as we extrapolate the current ΛCDM universe forward in time, all evi-
dence of the Hubble expansion will disappear, so that observers in our “island universe” will be
fundamentally incapable of determining the true nature of the universe, including the existence
of the highly dominant vacuum energy, the existence of the CMB, and the primordial origin of
light elements. With these pillars of the modern Big Bang gone, this epoch will mark the end of
cosmology and the return of a static universe. In this sense, the coordinate system appropriate for
future observers will perhaps fittingly resemble the static coordinate system in which the de Sitter
universe was first presented.
http://arXiv.org/abs/0704.0221v3
Shortly after Einstein’s development of general relativity, the Dutch astronomer Willem
de Sitter proposed a static model of the universe containing no matter, which he thought
might be a reasonable approximation to our low density universe. One can define a coor-
dinate system in which the de Sitter metric takes a static form by defining de Sitter space-
time with a cosmological constant Λ as a four dimensional hyperboloid SΛ : ηABξ
AξB =
−R2, R2 = 3Λ−1 embedded in a 5d Minkowski spacetime with ds2 = ηABdξ
AdξB,
and (ηAB) = diag(1,−1,−1,−1,−1), A, B = 0, · · · , 4. The static form of the de Sitter
metric is then
ds2s = (1 − r
2)dt2s −
1 − r2s/R
− r2sdΩ
which can be obtained by setting ξ0 = (R2 − r2s)
1/2 sinh(ts/R), ξ
1 = rs sin θ cos ϕ, ξ
rs sin θ sin ϕ, ξ
3 = rs cos θ, ξ
4 = (R2 − r2s)
1/2 cosh(ts/R). In this case the metric only corre-
sponds to the section of de Sitter space within a cosmological horizon at R = r − s.
In fact de Sitter’s model wasn’t globally static, but eternally expanding, as can be seen by
a coordinate transformation which explicitly incorporates the time dependence of the scale
factor R(t) = exp(Ht). While spatially flat, it actually incorporated Einstein’s cosmological
term, which is of course now understood to be equivalent to a vacuum energy density, leading
to a redshift proportional to distance.
The de Sitter model languished for much of the last century, once the Hubble expansion
had been discovered, and the cosmological term abandoned. However, all present observa-
tional evidence is consistent with a ΛCDM flat universe consisting of roughly 30% matter
(both dark matter and baryonic matter) and 70% dark energy [1, 2, 3], with the latter
having a density that appears constant with time. All cosmological models with a non-zero
cosmological constant will approach a de Sitter universe in the far future, and many of the
implications of this fact have been explored in the literature [4, 5, 6, 7, 8, 9, 10, 11, 12, 13].
Here we re-examine the practical significance of the ultimate de Sitter expansion and
point out a new eschatological physical consequence: from the perspective of any observer
within a bound gravitational system in the far future, the static version of de Sitter space
outside of that system will eventually become the appropriate physical coordinate system.
Put more succinctly, in a time comparable to the age of the longest lived stars, observers
will not be able to perform any observation or experiment that infers either the existence of
an expanding universe dominated by a cosmological constant, or that there was a hot Big
Bang. Observers will be able to infer a finite age for their island universe, but beyond that
cosmology will effectively be over. The static universe, with which cosmology at the turn of
the last century began, will have returned with a vengeance.
Modern cosmology is built on integrating general relativity and three observational pillars:
the observed Hubble expansion, detection of the cosmic microwave background radiation,
and the determination of the abundance of elements produced in the early universe. We
describe next in detail how these observables will disappear for an observer in the far future,
and how this will be likely to affect the theoretical conclusions one might derive about the
universe.
A. The disappearance of the Hubble Expansion
The most basic component of modern cosmology is the expansion of the universe, firmly
established by Hubble in 1929. Currently, galaxies and galaxy clusters are gravitationally
bound and have dropped out of the Hubble flow, but structures on larger length scales are
observed to obey the Hubble expansion law. Now consider what happens in the far future
of the universe. Both analytic [7] and numerical [10] calculations indicate that the Local
Group remains gravitationally bound in the face of the accelerated Hubble expansion. All
more distant structures will be driven outside of the de Sitter event horizon in a timescale on
the order of 100 billion years ([4], see also Refs. [8, 9]). While objects will not be observed
to cross the event horizon, light from them will be exponentially redshifted, so that within
a time frame comparable to the longest lived main sequence stars all objects outside of our
local cluster will truly become invisible [4].
Since the only remaining visible objects will in fact be gravitationally bound and decou-
pled from the underlying Hubble expansion, any local observer in the far future will see a
single galaxy (the merger product of the Milky Way and Andromeda and other remnants of
the Local Group) and will have no observational evidence of the Hubble expansion. Lacking
such evidence, one may wonder whether such an observer will postulate the correct cosmo-
logical model. We would argue that in fact, such an observer will conclude the existence of
a static “island universe,” precisely the standard model of the universe c. 1900.
This will be true in spite of the fact that the dominant energy in this universe will not
be due to matter, but due to dark energy, with ρM/ρΛ ∼ 10
−12 inside the horizon volume
[9]. The irony, of course, is that the denizens of this static universe will have no idea of
the existence of the dark energy, much less of its magnitude, since they will have no probes
of the length scales over which Λ dominates gravitational dynamics. It appears that dark
energy is undetectable not only in the limit where ρΛ ≪ ρM , but also when ρΛ ≫ ρM .
Even if there were no direct evidence of the Hubble expansion, we might expect three
other bits of evidence, two observational and one theoretical, to lead physicists in the future
to ascertain the underlying nature of cosmology. However, we next describe how this is
unlikely to be the case.
B. Vanishing CMB
The existence of a Cosmic Microwave Background was the key observation that convinced
most physicists and astronomers that there was in fact a hot big bang, which essentially
implies a Hubble expansion today. But even if skeptical observers in the future were inclined
to undertake a search for this afterglow of the Big Bang, they would come up empty-
handed. At t ≈ 100 Gyr, the peak wavelength of the cosmic microwave background will be
redshifted to roughly λ ≈ 1 m, or a frequency of roughly 300 MHz. While a uniform radio
background at this frequency would in principle be observable, the intensity of the CMB will
also be redshifted by about 12 orders of magnitude. At much later times, the CMB becomes
unobservable even in principle, as the peak wavelength is driven to a length larger than the
horizon [4]. Well before then, however, the microwave background peak will redshift below
the plasma frequency of the interstellar medium, and so will be screened from any observer
within the galaxy. Recall that the plasma frequency is given by
where ne and me are the electron number density and mass, respectively. Observations
of dispersion in pulsar signals give [14] ne ≈ 0.03 cm
−3 in the interstellar medium, which
corresponds to a plasma frequency of νp ≈ 1 kHz, or a wavelength of λp ≈ 3 × 10
7 cm.
This corresponds to an expansion factor ∼ 108 relative to the present-day peak of the CMB.
Assuming an exponential expansion, dominated by dark energy, this expansion factor will
be reached when the universe is less than 50 times its present age, well below the lifetime
of the longest-lived main sequence stars.
After this time, even if future residents of our island universe set out to measure a
universal radiation background, they would be unable to do so. The wealth of information
about early universe cosmology that can be derived from fluctuations in the CMB would be
even further out of reach.
C. General Relativity Gives No Assistance
We may assume that theoretical physicists in the future will infer that gravitation is
described by general relativity, using observations of planetary dynamics, and ground-based
tests of such phenomena as gravitational time dilation. Will they then not be led to a Big
Bang expansion, and a beginning in a Big Bang singularity, independent of data, as Lemaitre
was? Indeed, is not a static universe incompatible with general relativity?
The answer is no. The inference that the universe must be expanding or contracting is
dependent upon the cosmological hypothesis that we live in an isotropic and homogeneous
universe. For future observers, this will manifestly not be the case. Outside of our local
cluster, the universe will appear to be empty and static. Nothing is inconsistent with
the temporary existence of a non-singular isolated self-gravitating object in such a universe,
governed by general relativity. Physicists will infer that this system must ultimately collapse
into a future singularity, but only as we presently conclude our galaxy must ultimately
coalesce into a large black hole. Outside of this region, an empty static universe can prevail.
While physicists in the island universe will therefore conclude that their island has a finite
future, the question will naturally arise as to whether it had a finite beginning. As we next
describe, observers will in fact be able to determine the age of their local cluster, but not
the nature of the beginning.
D. Polluted Elemental Abundances
The theory of Big Bang Nucleosynthesis reached a fully-developed state [15] only after
the discovery of the CMB (despite early abortive attempts by Gamow and his collaborators
[16]). Thus, it is unlikely that the residents of the static universe would have any motivation
to explore the possibility of primordial nucleosynthesis. However, even if they did, the
evidence for BBN rests crucially on the fact that relic abundances of deuterium remain
observable at the present day, while helium-4 has been enhanced by only a few percent
since it was produced in the early universe. Extrapolating forward by 100 Gyr, we expect
significantly more contamination of the helium-4 abundance, and concomitant destruction
of the relic deuterium. It has been argued [17] that the ultimate extrapolation of light
elemental abundances, following many generations of stellar evolution, is a mass fraction of
helium given by Y = 0.6. The primordial helium mass fraction of Y = 0.25 will be a relatively
small fraction of this abundance. It is unlikely that much deuterium could survive this degree
of processing. Of course, the current “smoking gun” deuterium abundance is provided by
Lyman-α absorption systems, back-lit by QSOs (see, e.g., Ref. [18]). Such systems will be
unavailable to our observers of the future, as both the QSOs and the Lyman-α systems will
have redshifted outside of the horizon.
Astute observers will be able to determine a lower limit on the age of their system,
however, using standard stellar evolution analyses of their own local stars. They will be able
to examine the locus of all stars and extrapolate to the oldest such stars to estimate a lower
bound on the age of the galaxy. They will be able to determine an upper limit as well, by
determining how long it would take for all of the observed helium to be generated by stellar
nucleosynthesis. However, without any way to detect primordial elemental abundances, such
as the aforementioned possibility of measuring deuterium in distant intergalactic clouds that
currently absorb radiation from distant quasars and allow a determination of the deuterium
abundance in these pre-stellar systems, and with the primordial helium abundance dwarfed
by that produced in stars, inferring the original BBN abundances will be difficult, and
probably not well motivated.
Thus, while physicists of the future will be able to infer that their island universe has not
been eternal, it is unlikely that they will be able to infer that the beginning involved a Big
Bang.
E. Conclusion
The remarkable cosmic coincidence that we happen to live at the only time in the history
of the universe when the magnitude of dark energy and dark matter densities are comparable
has been a source of great current speculation, leading to a resurgence of interest in possible
anthropic arguments limiting the value of the vacuum energy (see, e.g., Ref. [19]). But
this coincidence endows our current epoch with another special feature, namely that we can
actually infer both the existence of the cosmological expansion, and the existence of dark
energy. Thus, we live in a very special time in the evolution of the universe: the time at
which we can observationally verify that we live in a very special time in the evolution of
the universe!
Observers when the universe was an order of magnitude younger would not have been
able to discern any effects of dark energy on the expansion, and observers when the universe
is more than an order of magnitude older will be hard pressed to know that they live in an
expanding universe at all, or that the expansion is dominated by dark energy. By the time
the longest lived main sequence stars are nearing the end of their lives, for all intents and
purposes, the universe will appear static, and all evidence that now forms the basis of our
current understanding of cosmology will have disappeared.
Note added in proof: After this paper was submitted we learned of a prescient 1987
paper [20], written before the discovery of dark energy and other cosmological observables
that are central to our analysis, which nevertheless raised the general question of whether
there would be epochs in the Universe when observational cosmology, as we now understand
it, would not be possible.
Acknowledgments
L.M.K. and R.J.S. were supported in part by the Department of Energy.
[1] L.M. Krauss and M.S. Turner, Gen. Rel. Grav. 27, 1137 (1995).
[2] S. Perlmutter, et al., Astrophys. J. 517, 565 (1999).
[3] A.G. Reiss, et al., Astron. J. 116, 1009 (1998).
[4] L.M. Krauss, and G.D. Starkman, Astrophys. J. 531, 22 (2000).
[5] A.A. Starobinsky, Grav. Cosmol. 6, 157 (2000).
[6] E.H. Gudmundsson and G. Bjornsson, Astrophys. J. 565, 1 (2002).
[7] A. Loeb, Phys. Rev. D 65, 047301 (2002).
[8] T. Chiueh and X.-G. He, Phys. Rev. D 65, 123518 (2002).
[9] M.T. Busha, F.C. Adams, R.H. Wechsler, and A.E. Evrard, Astrophys. J. 596, 713 (2003).
[10] K. Nagamine and A. Loeb, New Astron. 8, 439 (2003).
[11] K. Nagamine and A. Loeb, New Astron. 9, 573 (2004).
[12] J.S. Heyl, Phys. Rev. D 72, 107302 (2005).
[13] L.M. Krauss and R.J. Scherrer, Phys. Rev. D 75, 083524 (2007).
[14] A.G.G.M. Tielens, The Physics and Chemistry of the Interstellar Medium, Cambridge: Cam-
bridge University Press (2005).
[15] R.V. Wagoner, W.A. Fowler, and F. Hoyle, Astrophys. J. 148, 3 (1967).
[16] R.A. Alpher, H. Bethe, and G. Gamow, Phys. Rev. 73, 803 (1948); R.A. Alpher, J.W. Follin,
and R.C. Herman, Phys. Rev. 92, 1347 (1953).
[17] F.C. Adams and G. Laughlin, Rev. Mod. Phys. 69, 337 (1997).
[18] D. Kirkman, D. Tytler, N. Suzuki, J.M. O’Meara, and D. Lubin, Ap.J. Suppl. 149, 1 (2003).
[19] S. Weinberg, Phys. Rev. Lett. 59, 2607 (1987); J. Garriga, M. Livio, and A. Vilenkin, Phys.
Rev. D 61, 023503 (2000).
[20] T. Rothman and G. F. R. Ellis, The Observatory, 107, 24 (1987)
	The disappearance of the Hubble Expansion
	Vanishing CMB
	General Relativity Gives No Assistance
	Polluted Elemental Abundances
	Conclusion
	Acknowledgments
	References
ABSTRACT
  We demonstrate that as we extrapolate the current $\Lambda$CDM universe
forward in time, all evidence of the Hubble expansion will disappear, so that
observers in our "island universe" will be fundamentally incapable of
determining the true nature of the universe, including the existence of the
highly dominant vacuum energy, the existence of the CMB, and the primordial
origin of light elements. With these pillars of the modern Big Bang gone, this
epoch will mark the end of cosmology and the return of a static universe. In
this sense, the coordinate system appropriate for future observers will perhaps
fittingly resemble the static coordinate system in which the de Sitter universe
was first presented.

<|endoftext|><|startoftext|>
Introduction
	Draco -ray flux profiles and the effect of the PSF
	Detection prospects
ABSTRACT
  A new estimation of the gamma-ray flux that we expect to detect from SUSY
dark matter annihilation from the Draco dSph is presented using the DM density
profiles compatible with the latest observations. This calculation takes also
into account the important effect of the Point Spread Function (PSF) of the
telescope. We show that this effect is crucial in the way we will observe and
interpret a possible signal detection. Finally, we discuss the prospects to
detect a possible gamma signal from Draco for MAGIC and GLAST.

<|endoftext|><|startoftext|>
Mon. Not. R. Astron. Soc. 000, 000–000 (0000) Printed 4 November 2018 (MN LATEX style file v2.2)
Magnetohydrodynamic Rebound Shocks of Supernovae
Yu-Qing Lou1,2,3 ⋆ and Wei-Gang Wang1
1Physics Department and Tsinghua Centre for Astrophysics (THCA), Tsinghua University, Beijing, 100084, China;
2Department of Astronomy and Astrophysics, the University of Chicago, 5640 South Ellis Avenue, Chicago, IL 60637, USA;
3National Astronomical Observatories, Chinese Academy of Sciences, A20, Datun Road, Beijing 100012, China.
4 November 2018
ABSTRACT
We construct magnetohydrodynamic (MHD) similarity rebound shocks joining ‘quasi-
static’ asymptotic solutions around the central degenerate core to explore an MHD
model for the evolution of random magnetic field in supernova explosions. This pro-
vides a theoretical basis for further studying synchrotron diagnostics, MHD shock ac-
celeration of cosmic rays, and the nature of intense magnetic field in compact objects.
The magnetic field strength in space approaches a limiting ratio, that is comparable
to the ratio of the ejecta mass driven out versus the progenitor mass, during this
self-similar rebound MHD shock evolution. The intense magnetic field of the remnant
compact star as compared to that of the progenitor star is mainly attributed to both
the gravitational core collapse and the radial distribution of magnetic field.
Key words: magnetohydrodynamics (MHD) – shock waves – stars: neutron – stars:
winds, outflows – supernova remnants – white dwarfs
1 INTRODUCTION
Self-similar evolution of a spherical gas flow under self-
gravity and thermal pressure has been studied over past
four decades: from simulations and the discovery of Larson-
Penston (L-P) type solutions (Bodenheimer & Sweigart
1968; Larson 1969a, b; Penston 1969a, b), to the construc-
tion of the expansion-wave collapse solution (EWCS) using
the central free-fall asymptotic solution (Shu 1977) as well as
to the application of phase-match techniques for construct-
ing infinite series of discrete global solutions including L-P
type solutions (Hunter 1977) and solutions for envelope ex-
pansion with core collapse (EECC; Lou & Shen 2004). Prop-
erties of eigensolutions crossing the sonic critical line were
examined (Jordan & Smith 1977; Shu 1977; Whitworth &
Summers 1985; Hunter 1986). Self-similar shocks were stud-
ied and applied to various astrophysical settings by Tsai &
Hsu (1995), Shu et al. (2002), Shen & Lou (2004), and Bian
& Lou (2005). While these major results were obtained for
an isothermal gas, the counterpart problem with a poly-
tropic equation of state (EoS) was also studied by Cheng
(1978), Goldreich & Weber (1980), Yahil (1983), Suto &
Silk (1988), McLaughlin & Pudritz (1997), Fatuzzo et al.
(2004) and Lou & Gao (2006). In most cases, the polytropic
results share a feature that by setting the polytropic index
γ = 1 in the isothermal limit, all asymptotic behaviours ap-
⋆ E-mail: louyq@tsinghua.edu.cn and lou@oddjob.uchicago.edu;
wwg03@mails.tsinghua.edu.cn
proach the isothermal counterpart solutions. However, Lou
& Wang (2006) reported new ‘quasi-static’ asymptotic so-
lutions unique to a polytropic gas with γ > 1.2 and con-
structed self-similar rebound shocks for supernovae (SNe).
Chiueh & Chou (1994) studied a self-similar MHD
problem by including the magnetic pressure gradient force
in the momentum equation. Yu & Lou (2005) improved
their formulation and provided a more detailed analysis (see
Zel’dovich & Novikov 1971 for a discussion of random mag-
netic field). Wang & Lou (2006) studied this MHD problem
for a polytropic gas and derived the ‘quasi-static’ asymp-
totic solutions. Self-similar MHD shocks were explored by
Yu et al. (2006). As magnetic field is inevitably involved in
SNe and is crucial for synchrotron radiation and cosmic ray
acceleration, we construct here rebound MHD shocks with
‘quasi-static’ asymptotic solutions to model magnetic field
evolution in SN explosions.
Type II, Ib, Ic SNe are thought to be caused by grav-
itational core collapse due to an insufficient nuclear fuel;
such collapse creates an over-dense core, which rebounds
abruptly initiating a powerful rebound shock. The energetics
of sustaining such a rebound shock has been an outstanding
problem. We approach this issue in the following perspec-
tive. Triggered by such a core collapse, the rebound shock
is essentially supported by the neutrino-driven mechanism,
and several complicated physical processes are involved in
the stellar interior: all four elementary forces and the cou-
pling of various fluids and matters such as baryons, neutri-
nos, photons etc. (e.g., Janka et al. 2006). We approximate
c© 0000 RAS
http://arxiv.org/abs/0704.0223v1
2 Y.-Q. Lou & W.-G. Wang
such a dynamic system in terms of a single fluid with a
polytropic EoS, and treat the shock as an energy-conserved
self-similar shock. Conceptually, the ‘rebound shock’ here
refers to a neutrino-driven shock, as opposed to the ‘prompt
shock’ mentioned in Janka et al. (2006). We constructed
such a rebound shock (Lou & Wang 2006) to model a SN
explosion followed by a self-similar evolution leading to a
quasi-static configuration. In reference to the hydrodynamic
model of Lou & Wang (2006), the main thrust of this Let-
ter is to construct approximately a self-similar model of a
quasi-spherically symmetric rebound MHD shock for a SN
explosion, providing the profile and evolution of magnetic
field to facilitate future studies of synchrotron radiation and
MHD shock acceleration of cosmic rays, and to probe the
nature of intense magnetic field of compact stellar objects
left behind.
2 FORMULATION AND ANALYSIS
2.1 The Self-Similar MHD Formulation
A quasi-spherical similarity MHD flow embedded with a
completely random magnetic field on small scales is formu-
lated the same as in Yu & Lou (2005) and Yu et al. (2006);
the key difference here is the polytropic EoS p = κργ instead
of an isothermal gas, where p is the pressure, ρ is the mass
density, and κ is constant. Using the magnetic flux frozen-in
condition, the ideal MHD equations, viz., the mass conserva-
tion equation, the radial momentum equation, the magnetic
induction equation and the polytropic EoS, can be reduced
to two nonlinear ordinary differential equations (ODEs)
(n− 1)v +
nx− v
3n− 2
2(x− v)(nx− v)
α(nx− v)2 − (γαγ + hα2x2)
, (1)
(n− 1)
αv(nx− v) + 2hα2x2
(nx− v)2
(3n− 2)
− 2γαγ
(x− v)
α(nx− v)2 − (γαγ + hα2x2)
along with a useful relation m = αx2(nx−v) by the following
MHD self-similar transformation in a polytropic gas flow
r = k
x, u = k
v, ρ =
4πGt2
, p =
kt2n−4
k3/2t3n−2
(3n− 2)G
(nx−v) , < B2t >=
kt2n−4
, (3)
where G is the gravitational constant, M is the enclosed
mass at time t within radius r, u is the radial flow speed,
< B2t > is the mean square of random transverse magnetic
field Bt, x is the independent self-similar variable, v(x) is
the reduced flow speed, α(x) is the reduced density, m(x)
is the reduced enclosed mass, the prime ′ stands for the
first derivative d/dx, k and n are two parameters, and h
is a parameter for the strength of < B2t >
1/2. We expedi-
ently take γ = 2 − n for a polytropic EoS with a constant
κ ≡ k(4πG)γ−1 = pρ−γ . The magnetosonic critical curve is
determined by the simultaneous vanishing of the numerator
and denominator on the RHS of eq (1) or (2). The two eigen-
solutions of v′ across the magnetosonic critical curve can be
derived by using the L’Hôspital rule (Lou & Wang 2006; Yu
& Lou 2005; Yu et al. 2006). The solutions are obtained for
v(x) and α(x), and the magnetic field < B2t >
1/2 is then
known from transformation (3).
2.2 Analytic Asymptotic MHD Solutions
For h < hc ≡ n2/[2(1− n)(3n− 2)], eqs (1) and (2) give the
magnetostatic solution of a magnetized singular polytropic
sphere (MSPS) with v = 0 and
2γ(4− 3γ)
(1− γ)
]−1/n
, (4)
t >= h
2γ(4− 3γ)
(1− γ)
]−2/n
2−4/n
. (5)
There exists an asymptotic MHD solution approaching this
limiting form at small x (referred to as the type I ‘quasi-
static’ asymptotic MHD solution), viz., v = LxK and
2γ(4− 3γ)
(1− γ)
]−1/n
(K + 2− 2/n)L
n(K − 1)
n2/(2γ)
4− 3γ
]−1/n
K−1−2/n
, (6)
where K is the root of quadratic equation
/2 + n(3n− 2)h]K2 − (4− 3n)[n/2 + (3n− 2)h]K
+ γ(2/n− 2)(3n− 2)h = 0 . (7)
When 12 − 8
2 < n < 0.8 and h0 < h < hc, where h0 ≡
(3 + 2
2)n − 4
4 − (3 − 2
/[2n(3n − 2)], or when
2/3 < n < 12−8
2 for h < hc, eq (7) gives two roots K > 1,
corresponding to two possible ‘quasi-static’ solutions.
The asymptotic MHD solution at large x is α =
−2/n + · · · and
v = B0x
1−1/n −
(3n− 2)
2h(n− 1)
1−2/n
n[2(n+ γ)− 3]
(2−2γ−n)/n
+ · · · , (8)
where A0 and B0 are two constants. Solution (8) at large x
can be connected to ‘quasi-static’ asymptotic MHD solution
(6) and (7) at small x by a Runge-Kutta integration (Press
et al. 1986), crossing the magnetosonic critical curve either
smoothly or with an MHD shock (Yu et al. 2006).
2.3 MHD Shock Jump Conditions
MHD shock conditions (Yu et al. 2006; Lou & Wang 2006)
include conservations of mass, momentum, energy and mag-
netic flux, and in self-similar forms, they appear as
= 0 , (9)
= 0 , (10)
c© 0000 RAS, MNRAS 000, 000–000
MHD Rebound Shocks in Supernovae 3
(γ − 1)
αγ−1s
+ 2hαs
= 0 , (11)
where quantities in square brackets with superscript ‘1’ (up-
stream) and subscript ‘2’ (downstream) remain conserved
across the MHD shock front indicated by a subscript s. The
parameter k changes according to k2 = k1x
s2 on two
sides of a shock. For the specific entropy to increase from
upstream to downstream sides, xs1 > xs2 is necessary. MHD
shock conditions (9)−(11) lead to a quadratic equation (Lou
& Wang 2006); once we specify physical conditions on one
side of a chosen shock location, the corresponding quantities
α, v, x on the other side are readily computed.
3 REBOUND MHD SHOCKS IN SUPERNOVA
EXPLOSIONS
Various rebound MHD shocks are constructed numerically,
parallel to Lou & Wang (2006). With chosen inner and outer
radii, e.g., ri = 10
6cm and ro = 10
12cm for neutron star
formation, and when the k parameter in transformation (3)
is specified, we apply our solutions to a physical rebound
MHD shock scenario for SNe (Lou & Wang 2006).
3.1 Final and Initial Configurations
Similar to the hydrodynamic rebound shock model of Lou
& Wang (2006), the final configuration (small x) of our re-
bound MHD shock solutions gradually evolves to a MSPS
and is regarded as a remnant compact object after the re-
bound MHD shock ploughing through stellar ejecta; the
initial configuration (large x) marks the onset of gravity-
induced core collapse with outer inflows or outflows such as
stellar winds or stellar oscillations.
We define the outer initial mass Mo,ini and the inner ul-
timate mass Mi,ult the same way as in Lou & Wang (2006)
and regard them as rough estimates for the masses of the
progenitor star and the remnant compact object. The ratio
of the two masses isMo,ini/Mi,ult = λ1(ro/ri)
(3−2/n) where
λ1 ≡ A0(k1/k2)1/n
n2/[2γ(4−3γ)]+(1−γ)h/γ
involves
parameters of the rebound MHD shock and is equal to the
ratio of enclosed masses at the same r. Similar to the result
of Lou & Wang (2006), we find numerically that λ1 > 1 de-
pends on the choice of solutions, clearly indicating that a
rebound MHD shock drives out stellar materials.
By eq (3), the final magnetostatic configuration gives
t,ult >
k2γ(4− 3γ)
]−1/n
1−2/n
The ratio of initial to final magnetic fields at the same r is
t,ini >
1/2 / < B2
t,ult >
1/2= λ1, where λ1 > 1 by numer-
ical exploration. Thus a rebound MHD shock breakout pro-
cess reduces the magnetic field by the same ratio of enclosed
masses at the same r; yet this decrease in magnetic field is
insignificant as compared to the radial variation of magnetic
field, i.e., the r1−2/n dependence. As γ approaches 4/3 or
n → 2/3, this scaling approaches r−2, while the dependence
of enclosed mass on r goes to r0. For a ∼ 10G (0.1G) surface
magnetic field at ro = 10
12cm, we estimate a magnetic field
in the interior of the final configuration (ri = 10
6cm) to be
∼ 1013G (1011G), sensible for magnetized neutron stars; if
we take ri = 10
9cm, then the final interior magnetic field
is estimated to be ∼ 107G (105G), fairly close to relevant
magnetic field strengths of white dwarfs (e.g., Euchner et al.
2005, 2006; Schmidt et al. 2003).
3.2 Evolution of Rebound MHD Shocks
Time evolution of density, velocity and enclosed mass are
similar to those described by Lou & Wang (2006). We focus
here on the magnetic field evolution. Figure 1 shows a typical
time evolution of < B2t >
1/2 / < B2
t,ult
>1/2 to complement
the r1−2/n behaviour. Magnetic field increases at first, and
gradually decreases until reaching the magnetostatic config-
uration much smaller than the initial configuration in size. In
short, magnetic field changes moderately. The crucial point
is that the magnetic field varies significantly in r within a
star. If we take the magnetic field at the outer boundary
to be the surface magnetic field of the progenitor star and
take the magnetic field at the inner boundary as the surface
magnetic field of the remnant compact star, then a large
ratio of ∼ 1012 appears in forming a neutron star (Lou &
Wang 2006). This model feature may explain the intense
magnetic field of neutron stars inferred from spin-down ob-
servations of radio pulsars. In our scenario, after the passage
of such a rebound MHD shock, stellar ejecta detach from
the central degenerate neutron star which is thus exposed
with a surface magnetic field of 1013∼11G. In the same spirit
of Lou & Wang (2006), we also suggest the formation of
magnetic white dwarfs from the end of main-sequence stars
with 6 ∼ 8M⊙; in this scenario, the surface magnetic field
of an exposed central white dwarf is in a plausible range of
∼ 107∼5G (e.g., Euchner et al. 2005, 2006; Schmidt et al.
2003).
The major point to be emphasized is that random mag-
netic field preexists inside progenitor stars through various
dynamo processes. We detect magnetic field strengths of or-
der 10−2∼3G on stellar surface and this corresponds to a
much stronger magnetic field in the stellar interior with a
scaling of ∼ r1−2/n shortly after the initiation of core col-
lapse. In addition, the interior magnetic field can be consid-
erably strengthened by the free-fall core collapse preceding
the emergence of a rebound MHD shock (see Lou & Wang
2006 for descriptions of the rebound shock scenario and the
core collapse process), according to the frozen-in flux and
accretion shock conditions. In reality, these two processes
happen concurrently to produce the resultant self-similar
distribution of magnetic field. In short, the interior magnetic
field would be much stronger than the surface magnetic field
and can be further enhanced to reach a high-field regime.
The origin of stellar magnetic field was argued by
several authors to come from various processes, includ-
ing dynamo effects and thermomagnetic instabilities (e.g.,
Reisenegger et al. 2005). Our MHD scenario of interior core
collapse and rebound shock appears to grossly match with
observational facts. From our MSPS configuration with a
random magnetic field strength scaled as Bt ∝ r1−2/n in
a polytropic gas, we see a real possibility that the interior
magnetic field can be actually much stronger than the sur-
c© 0000 RAS, MNRAS 000, 000–000
4 Y.-Q. Lou & W.-G. Wang
r(cm)
t = ∞
Typical Magnetic Field Evolution During Shock Breakout
Figure 1. The ratio Bt/Bt,ult ≡< B
1/2 / < B2
t,ult
>1/2 is
the rms magnetic field strength divided by the corresponding rms
magnetic field strength of the final magnetostatic configuration
at the same r. This example is constructed by integrating inward
from (x0 , v0 , α0) on the magnetosonic critical curve and using an
eigensolution to match with a quasi-static solution as x → 0+; we
use the solution portion within xs2 < x0 for the downstream. We
then obtain the upstream point (xs1, vs1, αs1) by the MHD shock
jump conditions from the values of (xs2, vs2, αs2) obtained in the
former integration and further integrate outward to determine
the upstream solution. The relevant parameters are γ = 1.32,
n = 0.68, h = 0.01, k1 = 7.7 × 10
16 cgs units, k2 = 4 × 10
cgs units, x0 = 1.778 , v0 = 0.4620 , α0 = 0.067, and xs2 = 1.1.
Here, t1 = 6.61× 10
−5s is the time when the MHD shock crosses
the inner boundary and is the initial time of application; t2 =
4.40 × 104s is the time when the MHD shock crosses the outer
boundary; tm1 = 0.1s and tm2 = 1 × 10
8s are two intermediate
times between t1 and t2 and t2 and t = ∞.
face magnetic field. Once the onset of a gravitational col-
lapse has been initiated within a magnetized progenitor star
and following subsequent free-fall core collapse and accretion
shock, an eventual emergence of a rebound MHD shock can
evolve in a quasi-spherical self-similar manner and can end
up to a MSPS configuration with a high-density compact
degenerate object left behind.
4 CONCLUSIONS AND DISCUSSION
We outline and propose the model scenario of a quasi-
spherical rebound MHD shock to form high-density com-
pact stars after a gravity-induced collapse in the core of a
progenitor star that runs out of nuclear fuels. The stellar in-
terior magnetic field is expected to be enhanced during the
core collapse before the eventual emergence of a rebound
MHD shock; also the interior magnetic field should be much
stronger than the stellar surface magnetic field prior to the
onset of a core collapse and during the outward propagation
of the rebound MHD shock. Once the magnetostatic config-
uration of a remnant degenerate star appears, stellar ejecta
gradually detach from the compact object, exposing intense
surface magnetic fields of ∼ 1013∼11G for neutron stars or
∼ 107∼5G for magnetic white dwarfs.
Formally, MSPS solution (4) for density diverges as x →
0+. Conceptually, this can be readily reconciled by the onset
of degeneracy in core materials at a nuclear mass density.
In our model, there are two parameters for magnetic
field: index n for radial variation and ratio h. While it ap-
pears in this Letter that n depends on the stiffness (i.e., γ)
of EoS, as discussed below it is in fact a parameter free from
the stiffness (i.e., γ). Meanwhile, ratio h represents an ideal
MHD approximation that dictates the magnitude variation
of random transverse magnetic field; other factors, such as
metalicity, differential rotation, convective motions, buoy-
ancy etc. (Janka 2006), are important in generating random
magnetic fields inside a star prior to the onset of the core
collapse.
In contexts of SN explosions, two-shock models, i.e.,
models involving a ‘forward shock’ for the SN remnant shock
after the powerful rebound shock crashing into the inter-
stellar medium and a ‘reverse shock’ produced by the same
impact process (see, e.g., Chevalier et al. 1992 and Truelove
& McKee 1999), have been studied earlier. The major for-
mulation difference between these earlier works and ours is
that they ignored self-gravity of stellar ejecta. By estimates,
the self-gravity cannot be obviously dropped and thus these
models including forward and reverse shocks would be ap-
plicable in the limit of extremely strong shocks in order to
ignore self-gravity. Another major difference is that these
earlier models focus on circumstellar interactions, while we
focus on a rebound MHD shock as it travels within the mag-
netized stellar interior.
Our polytropic model is currently restricted to γ = 2−n
for a constant κ merely for expediency. This constraint can
be actually removed if we consistently allow the reduced
pressure to be ∝ αγmq where index parameter q ≡ 2(n +
γ − 2)/(3n − 2) 6= 0 in general and m = αx2(nx− v) is the
reduced enclosed mass. It is then possible for 1 < γ < 2
while n → 2/3. This more general case will be reported
separately (Wang & Lou 2006).
Numerical MHD simulations and observations are
needed to further test our scenario for rebound MHD shocks
in SNe, such as direct or indirect observation of density and
flow speed profiles (Lou & Wang 2006) as well as diagnostics
of synchrotron emissions caused by relativistic electrons in
random magnetic field generated by MHD shocks.
ACKNOWLEDGMENTS
This research has been supported in part by the ASCI
Center for Astrophysical Thermonuclear Flashes at the
Univ. of Chicago, by THCA, by the NSFC grants 10373009
and 10533020 at the Tsinghua Univ., and by the SRFDP
20050003088 and the Yangtze Endowment from the Min-
istry of Education at the Tsinghua Univ.
REFERENCES
Bian F.-Y., Lou Y.-Q., 2005, MNRAS, 363, 1315
Bodenheimer P., Sweigart A., 1968, ApJ, 152, 515
Boily C. M., Lynden-Bell D., 1995, MNRAS, 276, 133
Chandrasekhar S., 1957, Stellar Structure. Dover Publica-
tions, New York
c© 0000 RAS, MNRAS 000, 000–000
MHD Rebound Shocks in Supernovae 5
Cheng A. F., 1978, ApJ, 221, 320
Chevalier R. A., Blondin J. M., Emmering R. T., 1992,
ApJ, 392, 118
Chiueh T., Chou J.-K., 1994, ApJ, 431, 380
Euchner F., et al., 2005, A&A, 442, 651
Euchner F., et al., 2006, A&A, 451, 671
Fatuzzo M., Adams F. C., Myers P. C., 2004, ApJ, 615,
Goldreich P., Weber S. V., ApJ, 1980, 238, 991
Hu J., Shen Y., Lou Y.-Q., Zhang S.N., 2006, MNRAS,
365, 345
Hunter C., 1977, ApJ, 218, 834
Hunter C., 1986, MNRAS, 223, 391
Jordan D. W., Smith P., 1977, Nonlinear Ordinary Differ-
ential Equations, Oxford University Press. Oxford
Janka H.-Th., Langanke K., Marek A., Mart́ınez-Pinedo
G., Müeller B., 2006, arXiv:astro-ph/0612072
Landau L. D., Lifshitz E. M., 1959, Fluid Mechanics, Perg-
amon Press, NY
Larson R. B., 1969a, MNRAS, 145, 271
Larson R. B., 1969b, MNRAS, 145, 405
Lattimer J. M., Prakash M., 2004, Science, 304, 536
Lou Y.-Q., Shen Y., 2004, MNRAS, 348, 717
Lou Y.-Q., Gao Y., 2006, MNRAS, 373, 1610
Lou Y.-Q., Wang W. G., 2006, MNRAS, 372, 885
McLaughlin D. E., Pudritz R. E., 1997, ApJ, 476, 750
Penston M. V., 1969a, MNRAS, 144, 425
Penston M. V., 1969b, MNRAS, 145, 457
Press W. H., et al., 1986, Numerical Recipes (Cambridge
University Press)
Reisenegger A., et al., 2005, AIP Conf. Proc., 784, 263
Schmidt G. D., et al., ApJ, 595, 1101
Shapiro S. L., Teukolsky S. A., 1983, Black Holes, White
Dwarfs and Neutron Stars, John Wiley & Sons, Inc.
Shen Y., Lou Y. Q., 2004, ApJL, 611, L117
Shu F. H., 1977, ApJ, 214, 488
Shu F. H., Lizano S., Galli D., Cantó J., Laughlin G., 2002,
ApJ, 580, 969
Suto Y., Silk J., 1988, ApJ, 326, 527
Terebey S., Shu F. H., Cassen P., 1984, ApJ, 286, 529
Truelove J. K., McKee C. F., 1999, ApJS, 120, 299
Tsai J. C., Hsu J. J. L., 1995, ApJ, 448, 774
Wang W. G., Lou Y.-Q., 2006, in preparation
Whitworth A., Summers D., 1985, MNRAS, 214, 1
Yahil A., 1983, ApJ, 265, 1047
Yu C., Lou Y.-Q., 2005, MNRAS, 364, 1168
Yu C., et al., 2006, MNRAS, 370, 121 (astro-ph/0604261)
Zel’dovich Ya. B., Novikov I. D., 1971, Stars and Relativ-
ity – Relativistic Astrophysics, Vol. 1, The University of
Chicago Press, Chicago
c© 0000 RAS, MNRAS 000, 000–000
http://arxiv.org/abs/astro-ph/0612072
http://arxiv.org/abs/astro-ph/0604261
	INTRODUCTION
	Formulation and Analysis
ABSTRACT
  We construct magnetohydrodynamic (MHD) similarity rebound shocks joining
`quasi-static' asymptotic solutions around the central degenerate core to
explore an MHD model for the evolution of random magnetic field in supernova
explosions. This provides a theoretical basis for further studying synchrotron
diagnostics, MHD shock acceleration of cosmic rays, and the nature of intense
magnetic field in compact objects. The magnetic field strength in space
approaches a limiting ratio, that is comparable to the ratio of the ejecta mass
driven out versus the progenitor mass, during this self-similar rebound MHD
shock evolution. The intense magnetic field of the remnant compact star as
compared to that of the progenitor star is mainly attributed to both the
gravitational core collapse and the radial distribution of magnetic field.

<|endoftext|><|startoftext|>
Introduction
Heavy ion collisions create a medium that may be the quark gluon plasma (QGP).
This medium can be studied through jets and jet-correlations. Jets make a good
probe because their properties can be calculated in the vacuum with perturba-
tive quantum chromodynamics (pQCD). Two-particle jet-like azimuthal correla-
tions have shown the away-side shape in central Au+Au collisions to be broadened
with respect to pp and peripheral Au+Au collisions or even double humped 1,2
(see Fig. 1a). The away-side structure is consistent with many different physics
mechanisms including: large angle gluon radiation 3,4, jets deflected by radial flow
or preferential selection of particles due to path-length dependent energy loss, hy-
drodynamic conical flow generated by Mach-cone shock waves 5,6, and Čerenkov
radiation 7,8. Three-particle correlations can be used to differentiate conical flow
and Čerenkov radiation, which have the characteristic of conical emission, from
other mechanisms. In addition, the associated particle pT dependence of the coni-
cal emission angle can be used to differentiate between hydrodynamic conical flow
and simple Čerenkov radiation.
http://arxiv.org/abs/0704.0224v1
November 4, 2018 14:16 WSPC/INSTRUCTION FILE
QM2006PosterProceedings˙V4
2 Jason Glyndwr Ulery For the STAR Collaboration
2. Analysis Procedure
The 3-particle correlation analysis method is rigorously described in 9. The results
reported here are for charged trigger particles of 3 < pT < 4 GeV/c and two
charged associated particles of 1 < pT < 2 GeV/c (except where otherwise noted).
The data were all taken in the STAR time projection chamber for pp, d+Au and
Au+Au collisions at
sNN=200 GeV/c.
Figure 1b shows the raw 3-particle azimuthal distribution in ∆φaT = φa − φT
and ∆φbT = φb − φT where φT , φa, and φb are the azimuthal angles of the trigger
particle and the two associated particles respectively. Combinatorial backgrounds
must be removed to obtain the genuine 3-particle correlation signal. The analysis is
performed by treating the events as composed of two components, particles that are
jet-like correlated with the trigger particle and background particles. One source of
background, the hard-soft background, results when one of the associated particles
has a jet-like correlation with the trigger particle and the other is uncorrelated,
except for the correlation due to flow. The background is constructed from the 2-
particle jet-like correlation, Ĵ2, folded with the normalized 2-particle background,
, Fig. 1a. The 2-particle background is constructed by mixing events with the
flow modulation added in pairwise from the average v2 values from the measure-
ments based on the reaction plane and 4-particle cumulant methods 1. For the v4
contribution we use the parameterization v4 = 1.15v
from the data10. The back-
ground is normalized (with scale factor α) to the signal within 0.8 < |∆φ| < 1.2
(zero yield at 1 radian or ZYA1). We shall refer to the hard-soft background as
Ĵ2 ⊗ αBinc2 .
φ-φ=φ∆
-1 0 1 2 3 4 5
-1 0 1 2 3 4 5
-1 0 1 2 3 4 5-1
-1 0 1 2 3 4 5-1
-1 0 1 2 3 4 5-1
(a) (b) (c) (d)
Fig. 1. (color online) (a) Raw 2-particle correlation (points), background from mixed events with
flow modulation added-in (solid) and scaled by ZYA1 (dashed), and background subtracted 2-
particle correlation (insert). (b) Raw 3-particle correlation, (c) soft-soft background, βα2Binc
and (d) hard-soft background + trigger flow, Ĵ2⊗αBinc2 + βα2B
inc,TF
. See text for detail. Plots
are from ZDC-triggered 0-12% Au+Au collisions at
sNN=200 GeV/c.
Another source of background, the soft-soft background, results from correla-
tions between the two associated particles which are independent of the trigger
particle. This background is obtained from mixed events, where the trigger particle
and the associated particles are from different events in the same centrality win-
dow. We shall refer to the soft-soft background as Binc
. Since the two associated
November 4, 2018 14:16 WSPC/INSTRUCTION FILE
QM2006PosterProceedings˙V4
ARE THERE MACH CONES IN HEAVY ION COLLISIONS? THREE-PARTICLE CORRELATIONS FROM STAR 3
particles are from the same event, this background contains all of the correlations
between the two associated particles that are independent of the trigger particle,
including correlations from minijets, other jets in the event, and flow.
The flow between the two associated particles that is independent of the trigger
particle was accounted for in the soft-soft term, but particles are also correlated with
the trigger particle via flow. The trigger flow is added in triplet-wise from mixed
events, where the trigger and associated particles are all from different events in
the same centrality window. The v2 and v4 values are obtained the same way as for
the 2-particle background. The number of triplets is determined from the inclusive
events. We shall refer to the backgrounds from trigger flow as B
inc,TF
. The total
background is then, Ĵ2 ⊗ αBinc2 + βα2(Binc3 + B
inc,TF
). Both Binc
and B
inc,tf
are scaled by βα2. The normalization α2 corrects for the multiplicity bias from
requiring a trigger particle. The factor β accounts for the effect of non-poission
multiplicity distributions and is obtained such that the number of associated pairs
in the background subtracted jet-like three-particle correlation signal equals the
square of the number of associated particles in the background subtracted jet-like
two-particle correlation signal. Figure 1c and d show βα2Binc
and Ĵ2 ⊗ αBinc2 +
inc,TF
, respectively.
3. Results
-1 0 1 2 3 4 5-1
-0.02
-1 0 1 2 3 4 5-1
-0.02
-1 0 1 2 3 4 5-1
-0.05
-1 0 1 2 3 4 5-1
-1 0 1 2 3 4 5-1
-1 0 1 2 3 4 5-1
STAR Preliminary
Fig. 2. (color online) Background subtracted 3-particle correlations for pp (top left), d+Au (top
middle), and Au+Au 50-80% (top right), 30-50% (bottom left), 10-30% (bottom center), and ZDC
triggered 0-12% (bottom right) collisions at
sNN=200 GeV/c.
Figure 2 shows background subtracted 3-particle jet-like correlation signals. The
pp and d+Au results are similar. Peaks are clearly visible for the near-side, (0,0), the
November 4, 2018 14:16 WSPC/INSTRUCTION FILE
QM2006PosterProceedings˙V4
4 Jason Glyndwr Ulery For the STAR Collaboration
away-side, (π,π) and the two cases of one particle on the near-side and the other on
the away-side, (0,π) and (π,0). The away-side peak displays diagonal elongation that
is consistent with kT broadening. The perpherial Au+Au results show additional
on-diagonal elongation of the away-side peak which may be due to contribution
from deflected jets. The additional on-diagonal broadening persists into the more
central Au+Au collisions. In addition, the more central Au+Au collisions display
an off-diagonal structure, at about π± 1.45 radians, that is consistent with conical
emission. The structure increases in magnitude with centrality and is prominent in
the high statistics top 12% central data provided by the on-line ZDC trigger.
partN
0 1 2 3 4 5 6
0.5 Near
1.45)±πDeflected (
1.45)±πCone (
STAR Preliminary
partN
0 1 2 3 4 5 6
1.00)±πDeflected-Cone (
1.30)±πDeflected-Cone (
1.45)±πDeflected-Cone (
STAR Preliminary
Fig. 3. (color online) (left) Average signals in 0.7 × 0.7 boxes at (0,0) (triangle), (π,π) (star),
(π± 1.45,π± 1.45) (square), and (π± 1.45,π∓ 1.45) (circle). (right) Differences in average signals,
between (π ± 1.45,π ± 1.45) and (π ± 1.45,π ∓ 1.45) (triangle), between (π ± 1.3,π ± 1.3) and
(π± 1.3,π∓ 1.3) (square), and between (π± 1.0,π± 1.0) and (π± 1.0,π∓ 1.0) (circle). Solid error
bars are statistical and shaded are systematic. Npart is the number of participants. The ZDC
0-12% points (open symbols) are shifted to the left for clarity.
Figure 3 (left) shows the centrality dependence of the average signal strengths
in different regions. The off-diagonal signals (circle) increase with centrality and
significantly deviate from zero in central Au+Au collisions. The locations of the
off-diagonal signals were determined from a double Gaussian fit to a strip projected
to the off-diagonal, Fig. 4, and were found to be 1.45 radians from π. The differ-
ences between on-diagonal signals, where both conical emission and deflected jets
may contribute, and off-diagonal signals, where only conical emission contributes is
shown in figure 3 (right). Since conical emission signals are expected to be of equal
magnitude on-diagonal as off-diagonal, the difference may indicate the contribution
from deflected jets. The difference decreases with distance from (π,π).
The Mach cone emission angle is expected to be independent of the associated
particle momentum 6, while the Čerenkov radiation model in Ref. 7 predicts an
emission angle that is sharply decreasing with increasing associated particle mo-
mentum. Figure 5 (left) shows the dependence the off-diagonal peak angle on asso-
ciated particle pT . The angle is consistent with constant as a function of associated
particle pT .
Figure 5 (right) shows the centrality dependence the off-diagonal peak angle.
November 4, 2018 14:16 WSPC/INSTRUCTION FILE
QM2006PosterProceedings˙V4
ARE THERE MACH CONES IN HEAVY ION COLLISIONS? THREE-PARTICLE CORRELATIONS FROM STAR 5
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2
-2 -1.5 -1 -0.5 0 0.5 1 1.5 2-0.2
STAR Preliminary
Fig. 4. (color online) Away-side projections of a strip of width 0.7 radians for (left) d+Au and
(right) 0-12% ZDC Triggered Au+Au. Off-diagonal projection (solid) is (∆φ1 − ∆φ)/2 and on-
diagonal projection (open) is (∆φ1 +∆φ)/2− π. Shaded bands are systematic errors.
 Assoc (GeV/c)
0 0.5 1 1.5 2 2.5
0.02±Au+Au 0-12 1.41
0.02±Au+Au 0-50 1.45
Statistical
Systematic
STAR Preliminary
partN(
3 3.5 4 4.5 5 5.5 6
Au+Au 0-12% (shifted)
Au+Au 30-50%, 10-30% and 0-10%
0.03±1.46
STAR Preliminary
Fig. 5. (color online) Emission angles from double Gaussian fits. (left) Angle as a function of
associated particle pT for Au+Au 0-12% ZDC triggered data (filled) and Au+Au 0-50% from
minimum bias data (open). (right) Angle as a function of centrality for Au+Au 0-12% ZDC
triggered data (circle) and Au+Au 30-50%, 10-30% and 0-10% from minimum bias data (square).
The 0-12% point has been shifted for clarity. Corresponding off-diagonal peak values from fits to
a constant are indicated in the legends. The dashed line is at π/2. Solid error bars are statistical
and shaded are systematic.
The angle is consistent with remaining constant as a function of centrality for mid-
central and central Au+Au collisions. The solid line at 1.46 on the plot is from a
fit to a constant.
4. Systematics
The major sources of systematic error are the elliptic flow measurement and the
normalization. The default v2 used is the average those measured by the reaction
plane and 4-particle cumulant methods. We use the reaction plane and 4-particle
cumulant v2 as the upper and lower bounds to estimate the systematic uncertainty
of the v2 subtraction. Figure 6a and b show the background subtracted 3-particle
correlation for reaction plane and 4-particle v2 respectively. The signal is robust with
respect to this variation. The hard-soft background and trigger flow backgrounds
individually vary a great deal with the change in elliptic flow but the variations
cancel to first order in the sum.
November 4, 2018 14:16 WSPC/INSTRUCTION FILE
QM2006PosterProceedings˙V4
6 Jason Glyndwr Ulery For the STAR Collaboration
To study the effect of the normalization the size of the normalization window
was doubled to 0.6 < |∆φ| < 1.4. The signal is robust with respect to this change
in normalization. Other sources of systematic error include the effect on the trigger
particle flow from requiring a correlated particle (20% on trigger particle v2), un-
certainty in the v4 parameterization, and multiplicity bias effects on the soft-soft
background. The systematic errors shown in figures include all sources mentioned.
-1 0 1 2 3 4 5-1
-1 0 1 2 3 4 5-1
-1 0 1 2 3 4 5-1
(a) (b) (c)
Fig. 6. (color online) 0-12% Au+Au ZDC triggerd data for different systematic checks: (a) reaction
plane v2, (b) 4-particle cumulant v2, and (c) normalization region for α of 0.6 < |∆φ| < 1.4
5. Conclusion
Three-particle azimuthal correlations have been studied for trigger particles of 3 <
pT < 4 GeV/c and associated particles of 1 < pT < 2 GeV/c in pp, d+Au, and
Au+Au collisions at
sNN=200 GeV/c by STAR. This analysis treats events as the
sum of two components, particles that are jet-like correlated with the trigger and
background particles. On-diagonal broadening has been observed in pp and d+Au
consistent with kT broadening. Additional on-diagonal broadening has been seen in
Au+Au collisions possibly due to deflected jets. Off-diagonal peaks consistent with
conical emission are present in central Au+Au collisions. To discriminate between
Mach cone emission and simple Čerenkov gluon radiation, a study of the associated
particle pT dependence was performed. No strong pT dependence of the angles on
associated particle pT is observed. This result is consistent with Mach cone emission.
References
1. J. Adams et al. (STAR Collaboration), Phys. Rev. Lett. 95, 152301 (2005).
2. S.S. Adler et al. (PHENIX Collaboration), Phys. Rev. Lett. 97, 052301 (2006).
3. I. Vitev, Phys. Lett. B 630, 78 (2005).
4. A.D. Polosa and C.A. Salgado, hep-ph/0607295.
5. H. Stoecker, Nucl. Phys. A750, 121 (2005).
6. J. Casalderrey-Solana, E. Shuryak and D. Teaney, J. Phys. Conf. Ser. 27, 23 (2005).
7. I.M. Dremin, Nucl. Phys. A767 233 (2006).
8. V. Koch, A. Majumder and X.-N. Wang, Phys. Rev. Lett. 96, 172302 (2006).
9. J. Ulery and F. Wang, nucl-ex/0609016.
10. J. Adams et al. (STAR Collaboration), Phys. Rev. C 72, 014904 (2004).
http://arxiv.org/abs/hep-ph/0607295
http://arxiv.org/abs/nucl-ex/0609016
	Introduction
	Analysis Procedure
	Results
	Systematics
	Conclusion
ABSTRACT
  We present results from STAR on 3-particle azimuthal correlations for a
$3<p_T<4$ GeV/c trigger particle with two softer $1<p_T<2$ GeV/c particles.
Results are shown for pp, d+Au and high statistics Au+Au collisions at
$\sqrt{s_{NN}}=200 GeV$. We observe a 3-particle correlation in central Au+Au
collisions which may indicate the presence of conical emission. In addition,
the dependence of the observed signal angular position on the $p_T$ of the
associated particles can be used to distinguish conical flow from simple
QCD-\v{C}erenkov radiation. An important aspect of the analysis is the
subtraction of combinatorial backgrounds. Systematic uncertainties due to this
subtraction and the flow harmonics $v_2$ and $v_4$ are investigated in detail.

<|endoftext|><|startoftext|>
Microsoft Word - kashlinsky_band.doc
Exploring First Stars Era with GLAST 
A. Kashlinsky1,2 and D. Band1,3,4 
1 Observational Cosmology Lab, Goddard Space Flight Center, Greenbelt MD 20771, 2SSAI, 3UMBC, 4CRESST 
Abstract. Cosmic infrared background (CIB) includes emissions from objects inaccessible to current telescopic studies, 
such as the putative Population III, the first stars. Recently, strong direct evidence for significant CIB levels produced by 
the first stars came from CIB fluctuations discovered in deep Spitzer images. Such CIB levels should have left a unique 
absorption feature in the spectra of high-z GRBs and blazars as suggested in [4]. This is observable with GLAST sources 
at z>2 and measuring this absorption will give important information on energetics and constituents of the first stars era. 
Keywords: Cosmology - background radiation, cosmic – gamma-rays, bursts – gamma-rays, astronomical observations 
PACS: 98.80.-k,98.70.Vc, 98.70.Rz, 95.85.Pw 
Cosmic infrared background (CIB) is a repository of emission throughout the entire history of the Universe, 
including from epochs containing objects inaccessible to current telescopic studies (see [1] for review).  One such 
epoch is when the first stars are thought to have been formed corresponding to z>10.  If the first stars (commonly 
called Population III – Pop III) were massive, they should have produced significant near-IR (NIR) CIB which 
should be cut-off by Lyman absorption at < 1 µm [2]. CIB derived from DIRBE- and IRTS-based analyses suggest 
higher fluxes at ~1-5 µm than that obtained by integrating galaxy counts [1]. The net CIB flux produced by these 
observed populations saturates at mAB~20-21 with fainter galaxies contributing little to the total budget. The excess 
flux at the near-IR (NIR) is commonly referred to as the NIRBE. The NIRBE, if extragalactic, must thus originate in 
still fainter systems, likely located at very early cosmic times. It is important to emphasize that e.g. K-band galaxies 
at mAB~19 are only at z~1 (e.g. [3]) with ordinary galaxies at earlier times contributing very little CIB; any modeling 
must account for the CIB evolution as observed in galaxy counts. The entire NIRBE correspond ~30 nW/m2/sr at 1-
4 µm of and this can be accounted for if only ~2% of the baryons have been processed through Pop III by z~10 [4]. 
FIGURE 1.  Right: Squares show net CIB in MJy/sr  from integrating galaxy counts and the dashed line corresponds to the CIB 
in the absence of NIRBE. Comoving number density of these photons is also shown. Circles show the NIRBE levels as discussed 
in [1]. Left: HESS measured spectra for the two blazars [5] are shown with crosses. Circles show the spectra corrected for 
absorption due to CIB photons produced by Pop III at z>10 and with total amplitude normalized to the shown value of ∆NIRBE. 
Recent HESS observations of absorption in the spectra of two z~0.2 blazars were taken to suggest that most of 
the NIRBE is not extragalactic for otherwise the unabsorbed spectra of the blazars would have to be too steep [5]. 
The situation is not as straightforward because the analysis in [5] used a template CIB which extends to <1 µm and 
is unsuitable for modeling CIB from z>7-10 sources, such as Pop III, whose CIB component, with a Lyman cutoff 
around 1 µm, should be used in any proper modeling. With this the constraints become significantly less severe and 
substantial CIB is still allowed by the data. This is shown in Fig.2, where we reconstructed the unabsorbed HESS 
blazar spectra using a more appropriate modeling of the CIB from Pop III. The model for the Pop III CIB 
component assumed: 1) CIB with Iν∝ν
-2, 2) CIB with a cutoff at the present wavelength of 1 µm, corresponding to 
the Lyman cutoff at z~10, 3) the amplitude normalized to the net NIRBE of amplitude ∆NIRBE shown in the figure, 
and 4) the contribution from ordinary galaxies is given by the observed values derived from deep galaxy counts. The 
right panels show the uncorrected spectra for various levels of NIRBE. Using the limit on the hardness parameter 
(Γ>-1.5) from [5], at least half of the claimed NIRBE levels can still be produced by Pop III (∆NIRBE<15 nW/m
2/sr). 
The most direct evidence for substantial energy release during the first stars’ era comes from the recent 
measurements of CIB fluctuations at 3.6-8 µm in deep Spitzer images, which remain after removing intervening 
ordinary galaxies to very faint levels [6,7]. The fluctuations are produced by populations that have significant 
clustering component (from clustering of the emitters), but only a small shot noise level (from sources occasionally 
entering the beam). The results indicate that 1) the CIB fluxes from cosmological populations producing these 
fluctuations are substantial with > 1-2 nW/m2/sr at 3.6 and 4.5 µm, and 2) these CIB fluctuations are produced by 
sources with individual fluxes of only <10-20 nJy, which likely places them at z>10 [8].  
If early stars produced even a fraction of the NIRBE, they would provide a source of abundant photons at high z. 
The present-day value of Iν=1 MJy/sr corresponds to the comoving number density of photons per logarithmic 
energy interval of 4π/c IνhPlanck=0.6 cm
-3; if these photons come from high z, their number density would increase as 
(1+z)3 at early times. These photons would also have higher (blue-shifted) energies in the past and would thus 
provide abundance of absorbers for sources of sufficiently energetic photons at high z via two-photon absorption.  
FIGURE 2. Left: shows the range of CIB photons which affects gamma-rays at given energy and the marked redshifts. Right: 
Optical depth vs γ-ray energy due to 2-photon absorption by the NIRBE photons for GRBs and other sources at marked redshifts. 
Left panel of Fig.2 shows the range of the CIB wavelengths where the two-photon absorption operates for 
different z; GLAST/LAT observations of GRB’s and blazars at z>1-2 should thus provide a test of emissions from 
the Pop III era.  Fig. 2 shows the optical depth for sources at high z (e.g., GRBs) assuming the entire NIRBE 
originated at z>10 [4]. If so, then γ-rays with energies > 260(1+z)-2 Gev from sources at z>1-2 should be completely 
absorbed even if only a faction of the NIRBE originated from the first stars era. GLAST and the Pop III era 
parameters have thus “conspired” very fortunately to uncover the CIB produced during the latter: 1) there should be 
a sharp cutoff at the Lyman limit at z>10, so CIB coming from these epochs is cut-off below the present-day 
wavelength of ~1µm; and 2) the threshold on the two-photon absorption process is such that with the GLAST energy 
limit of 300 GeV such measurement is not sensitive to the CIB beyond ~ 5 µm.  
For our purposes GRBs are cosmological sources with smooth spectra that extend up to GeV energies and are 
ideal for detecting absorption in the intervening IGM. As discussed, a source at z>1-2 should have a spectral cutoff 
at Ec=260(1+z)
-2 GeV  due to significantly energetic Pop III era emission. While GRBs are known to emit up to 18 
GeV [9] and perhaps to TeV energies [10], the >100 MeV spectrum that the LAT will detect must be extrapolated 
from a few detections by the CGRO/EGRET. At the lower energies (10 keV to 1 MeV), where many bursts have 
been detected and characterized, the GRB spectrum can be described by a smoothly broken power law (the ‘Band’ 
function [11]); in 6 bursts the EGRET observations [12] are consistent with an extrapolation of the simultaneously 
observed lower energy emission with a high energy spectral index of  β ~ -2 (where N(E)∝Eβ).  In a few bursts 
EGRET observed emission that lasted longer than the low energy emission (for GRB 940217 burst photons were 
observed up to 90 minutes after the 3 minute long lower energy emission [9]).  Milagrito may have detected TeV 
emission from GRB 970717a [10].  In GRB 941017 EGRET detected an additional hard power law component 
simultaneous with the lower energy ‘Band’ function component [13]. Thus the EGRET observations demonstrate 
the existence of >1 GeV emission.  In some cases the lower energy spectral components extrapolate to this energy 
band, but additional spectral and temporal components, expected on theoretical grounds [14,15] exist.  Despite these 
uncertainties, we can estimate the number of LAT GRBs where absorption by Pop III era photons might be observed 
by extrapolating the <1 MeV (synchrotron) spectra up to >1 GeV for a GRB population based on the BATSE 
observations. The GRB rate as a function of intensity, the burst spectrum, and the LAT's effective area as a function 
of photon energy have been convolved to predict the LAT's burst detection rate (see [16] on the GRB rate per year).  
FIGURE 3.  Distribution of z of Swift-detected GRBs (solid histogram) and the empirical function (dashes) used  in calculations. 
GRBs originate over a wide redshift range, and therefore we can expect the cutoff resulting from the Pop III era 
photon field to range over a wide energy range.  As a result of a broad intrinsic energy distribution and possibly 
evolution, GRB fluence correlates poorly with z, and consequently GRBs that are bright in the LAT will not 
necessarily originate at lower z than dim LAT bursts.  The z-distribution of the bursts the LAT will detect is 
currently uncertain: this distribution can be calculated assuming GRBs are correlated with other astrophysical 
phenomena (e.g., the star formation rate), or can be estimated empirically.  We use the observed distribution from 
the Swift mission (Fig. 3), which detects and rapidly localizes ~100 GRBs per year [17]. By convolving the LAT 
detection rate of bursts in which Ec can be observed, the relationship between Ec and z, and the z-probability 
distribution, we estimate that the LAT will detect ~7 GRBs/year with observable Pop III era absorption cutoffs.   
The stronger case for absorption by Pop III era photons will result from determining Ec and z for the same burst.  
The fraction of LAT-observed bursts for which redshifts will be determined is uncertain. The Swift mission is 
projected to last well into the GLAST era, and Swift should observe ~1/6 of the bursts the LAT detects; currently 
redshifts have been determined for ~1/3 of the Swift bursts.  GLAST will rapidly localize bursts onboard and on the 
ground, permitting follow-up ground-based observations that might result in burst redshifts; however, the 
localization by GLAST's GBM (degrees) and LAT (tenths of a degree) detectors may be too large to result in many 
redshifts.  In addition, various relations between burst characteristics have been proposed that could turn bursts into 
‘standard candles.’  For example, a tight relationship between the burst’s peak luminosity, a measure of its duration, 
and the average photon energy has been found [18]. The two GLAST instruments will observe the burst lightcurve 
(from which the duration can be measured) and spectrum (from which the average photon energy can be measured). 
The predicted ratio between the observed peak flux and the peak luminosity gives an estimate of the redshift that is 
akin to a photometric redshift. Thus we estimate that enough bursts with LAT-determined cutoffs will have redshifts 
enabling robust determination of the Pop III era spectral absorption over the course of the GLAST mission. 
AK acknowledges support from the National Science Foundations grant AST-0406587. 
REFERENCES 
1. A. Kashlinsky., Phys. Rep., 409, 361-438 (2005) 
2.   M. Santos, V. Bromm & M. Kamionkowski, MNRAS, 336, 1082-1092 (2002)  
3.   M. Cirasuolo et al. astro-ph/0609287 
4.   A. Kashlinsky, Ap.J.(Letters), 633, L5-L8 (2005) 
5.   F. Aharonian et al., Nature, 440, 1018-1021 (2006) 
6.   A.  Kashlinsky, R. Arendt, J. Mather & H.  Moseley, Nature, 438, 45-50 (2005) 
7.   A.  Kashlinsky, R. Arendt, J. Mather & H.  Moseley, Ap.J. (Letters), 654, L5-L8 (2007) 
8.   A.  Kashlinsky, R. Arendt, J. Mather & H.  Moseley, Ap.J. (Letters), 654, L1-L4 (2007) 
9.   K. Hurley,. et al, Nature, 372, 652 (1994) 
10.  R. Atkins, et al. ApJ, 583, 824 (2003) 
11.  D. Band, et al.. Ap.J., 413, 281 (1993) 
12.  B. Dingus, ``Observations of the Highest Energy Gamma Rays from Gamma-Ray Bursts'' in AIPC, 662, 240 (2003) 
13.  M. González, et al., Nature, 424, 749 (2003) 
14.  M. Bottcher & C. Dermer, ApJ, 499, 131 (1998) 
15.  Z. Dai & T. Lu, ApJ, 580, 1013 (2002) 
16.  N. Omodei, et al., these proceedings (2008) 
17.  N. Gehrels, et al., Ap.J., 611, 1005 (2004) 
18.  C. Firmani, G. Ghisellini, V. Avila-Reese, & G. Ghirlanda, MNRAS, 370, 185 (2006)
ABSTRACT
  Cosmic infrared background (CIB) includes emissions from objects inaccessible
to current telescopic studies, such as the putative Population III, the first
stars. Recently, strong direct evidence for significant CIB levels produced by
the first stars came from CIB fluctuations discovered in deep Spitzer images.
Such CIB levels should have left a unique absorption feature in the spectra of
high-z GRBs and blazars as suggested in [4]. This is observable with GLAST
sources at z>2 and measuring this absorption will give important information on
energetics and constituents of the first stars era.

<|endoftext|><|startoftext|>
Introduction
The fluorescent iron K (Fe Kα) emission line is considered to
be a useful probe of the accretion flow around the central black
hole of an active galaxy. In particular, due to the high orbital
velocity and strong gravitational field in the innermost regions
of an accretion disc, its profile shall be deformed by the con-
currence of Doppler and relativistic shifts. The resulting line is
therefore broadened, with a red-wing extending towards lower
energies (e.g. Fabian et al. 2000; Reynolds & Nowak 2003).
Detailed modelling of time-averaged spectra have been used to
obtain important estimates of the disc ionization state, its cover-
ing factor and, at least for the brightest and best cases, its emis-
sivity law, inner radius and BH spin (Brenneman & Reynolds
2006; Miniutti et al. 2006; Guainazzi et al. 2006; Nandra et
al. 2006). It is also well established that time-resolved spec-
tral analysis is a fundamental tool if we want to understand
not only the geometry and kinematics of the inner accretion
flow but also its dynamics. Early attempts (i.e. Iwasawa et al.
Send offprint requests to: F. Tombesi
e-mail: tombesi@iasfbo.inaf.it
1996; Vaughan & Edelson 2001; Ponti et al. 2004; Miniutti et
al. 2004) have clearly shown that the redshifted component of
the Fe Kα line is indeed variable and that complex geometrical
and relativistic effects should be taken into account (Miniutti et
al. 2004).
More recently, results of iron line variability have been reported
(Iwasawa, Miniutti & Fabian 2004; Turner et al. 2006; Miller et
al. 2006). These are consistent with theoretical studies on the
dynamical behaviour of the iron emission arising from local-
ized hot spots on the surface of an accretion disc (e.g. Dovčiak
et al. 2004). Iwasawa, Miniutti & Fabian (2004), for example,
measured a ∼25 ks modulation in the redshifted Fe Kα line flux
in NGC 3516, which suggests that the emitting region is very
close to the central black hole. However, these are likely to be
transient phenomena, since such spots are not expected to sur-
vive more than a few orbital revolutions. For this reason, it is
inherently difficult to establish the observational robustness of
these type of models, if not by accumulating further observa-
tional data.
Here we present results on the iron line variability in the
bright Seyfert galaxy NGC 3783 (z≃0.01) based on XMM-
http://arxiv.org/abs/0704.0226v1
2 F. Tombesi et al.: Correlated modulation between the redshifted Fe Kα line and the continuum emission in NGC 3783
Fig. 1. The X-ray light curves of NGC 3783 in the 0.3–10 keV band. Left panel: light curve of the 2000 observation. Right panel:
light curves of the 2001a and 2001b observations.
Newton data. This object has been taken as an example in
which multiple warm absorbers can mimic the broad iron line
feature (Reeves et al. 2004), contrary to the initial claim of
the presence of a broad iron line emission using ASCA data
(Nandra et al. 1997). However, the recent study by De Marco
et al. (2006) found evidence for a transient excess feature in the
5–6 keV energy band, interpreted as a redshifted component
of the Fe K line. This result is also supported by a variability
study by O’Neill & Nandra (2006), who examined rms vari-
ability spectra of a sample of bright active galaxies observed
with XMM-Newton. Given the above considerations, we re-
examined all the XMM-Newton observations of NGC 3783 to
perform a comprehensive study of the iron line temporal evo-
lution, on the shortest possible time-scale.
2. XMM-Newton observations
XMM-Newton observed NGC 3783 on 2000 December 28–
29 and on 2001 December 17–21. The first observation (ID
0112210101) has a duration of ∼40 ks while the second (ID
0112210201 and ID 0112210501, hereafter observation 2001a
and 2001b respectively) lasts over two complete orbits for a
total duration of ∼270 ks. Only the EPIC pn data are used in
the following analysis because of the high sensitivity in the
Fe K band. The EPIC pn camera was operated in the “Small
Window” mode with the Medium filter both during the 2000
and the 2001 observations. The live time fraction is thus 0.7.
The data were reduced using the XMM-SAS v. 6.5.0 software
while the analysis was carried on using the lheasoft v. 5.0 pack-
age. High background time intervals were excluded from the
analysis. The useful exposure time intervals are listed in Tab. 1,
together with the mean 0.3–10 keV count rate for each obser-
vation. Only single and double events were selected. Source
photons were collected from a circular region of 56 arcsec ra-
dius, while the background data were extracted from rectangu-
Table 1. Date, duration, useful exposure and mean EPIC pn
0.3–10 keV count rate for each XMM-Newton observation of
NGC 3783.
Obs. ID Date Duration Exposure 〈CR〉
(ks) (ks) (c/s)
0112210101 2000 Dec 28–29 40.412 35 8.5
0112210201 2001 Dec 17–19 137.818 115 6.5
0112210501 2001 Dec 19–21 137.815 120 8.5
lar, nearly source-free regions on the detector. The background
is assumed to be constant throughout the useful exposure. The
0.3–10 keV light curves are shown in Fig. 1 for each observa-
tion.
3. Data analysis
3.1. Spectral features of interest and selection of the
energy resolution
The time-averaged spectrum was analyzed using the XSPEC
v. 11.2 software package. For simplicity, we limited the anal-
ysis to the 4–9 keV band. In this energy band, we checked
that the complex and highly ionized warm absorber (with logξ
and NH up to ∼2.9 erg cm s
−1 and 5×1022 cm−2, Reeves et
al. 2004) shall not affect our conclusions below. The resid-
uals against a simple power-law plus cold absorption contin-
uum model for the 2001b observation, the longest continuous
dataset available, are shown in Fig. 2. In this fit we excluded
the Fe K energy band (i.e. 5–7 keV) and the best fit parameters
are (2.5 ± 0.6) × 1022 cm−2 and 1.81 ± 0.04, for the absorber
column density and power-law slope respectively.
Identified are four excess emission features: the main
Fe Kα core at ∼6.4 keV, a wing to the line core at around 6
keV, a peak at ∼7 keV (possibly Fe Kβ) and a narrow peak at
F. Tombesi et al.: Correlated modulation between the redshifted Fe Kα line and the continuum emission in NGC 3783 3
Fig. 2. The 4–9 keV residuals against a simple power-law
plus cold absorption continuum model for the spectrum of
NGC 3783 during the 2001b observation. The data are obtained
from EPIC pn.
∼5.4 keV. Moreover we identified two absorption features at
∼6.7 keV and ∼7.6 keV. When fitted with Gaussian emission
and absorption lines, all these features are significant at more
than ∼99% confidence level. Similar results were also obtained
by a detailed analysis with more complex models (Reeves et al.
2004). We will focus here on the analysis of the features vari-
ability properties. The application of the excess map technique
to the identified absorption features did not give significant re-
sults, thus, in the following, we will focus on the analysis of
emission features variability only. The 2001a observation has
been divided into two parts because of the gap in the data be-
tween t∼5×104 s and t∼6×104 s. Since all the selected spec-
tral features are comparable to the CCD spectral resolution, we
chose 100 eV for the energy resolution of the excess maps.
3.2. Selection of the time resolution
In choosing the time resolution for the excess maps we looked
for the best trade-off between getting a sufficiently short time-
scale, in order to oversample variability, and keeping enough
counts in each energy resolution bin. We first considered the
2001b observation, having the longest and continuous expo-
sure. Spectra were extracted during different time intervals (1
ks, 2.5 ks and 5 ks) around the local minimum flux state at
t∼215 ks. The required condition is that each 100 eV energy
bin in the 4–9 keV band has to contain at least 50 counts. At
the time resolution of 2.5 ks we got ∼90 counts per energy
bin at the energies of the “red” feature (5.3–5.4 keV), and ∼80
counts per energy bin in the “wing” feature energy band (5.8–
6.1 keV). Moreover, for a 107 M⊙ black hole we expect the
Keplerian orbital period to be ∼104 s at a radius of 10 rg. Thus,
selecting 2.5 ks as the excess maps time resolution, enables us
to completely oversample this typical time-scale. This choice
of time resolution, optimized for the 2001b observation, was
extended to the 2000 and 2001a data.
4. Excess emission maps
Energy spectra for a duration of the chosen exposure time (2.5
ks) are extracted in time sequence. 14 spectra are obtained
Table 2. Spectral features of interest in the 4–9 keV band with
the selected band-passes and mean intensity.
Feature Energy band 〈I〉
(keV) (10−5 ph s−1cm−2)
red 5.3–5.4 0.6
wing 5.8–6.1 2
core (Kα) 6.2–6.5 5.3
Kβ 6.8–7.1 1.2
Red+Wing 5.3–6.1 3.2
from the 2000 observation, 46 and 48 spectra are obtained from
2001a and 2001b observations respectively. For each spectrum
the continuum is determined and subtracted. The residuals in
counts unit are corrected for the detector response and put
together in time sequence to construct an image in the time-
energy plane.
4.1. Continuum subtraction
The continuum model is assumed to be always a simple ab-
sorbed power-law, throughout all the observations. For each
spectrum the energy band of the observed spectral features (i.e.
5–7 keV) is excluded during the continuum fit. The 4–5 keV
and 7–9 keV data are rebinned so that each channel contains
more than 50 counts to enable the use of the χ2 minimization
process when performing spectral fitting and to ensure that the
high energy end of the data (7–9 keV) have enough statistical
weight. Because of the chosen low energy bound (4 keV), the
fit is not sensitive to cold absorption. Thus the cold absorption
column density is fixed to the time-averaged spectrum value
(NH ≃ 2.5× 10
22 cm−2). Each 4–9 keV spectrum at 100 eV en-
ergy resolution is then fitted with its best-fit continuum model
and residuals are used to construct the excess emission map in
the time-energy plane.
Once all the continuum spectral fits have been done, we
checked if continuum changes could affect our measurements
of the line fluxes. The mean power-law slopes during the three
observations are 1.85, 1.75 and 1.79 respectively, with stan-
dard deviations of 0.08, 0.1 and 0.09. The power-law slopes
are indeed quite constant, consistent with values obtained from
the mean spectrum (§3.1), which result in a very marginal ef-
fect (< 0.1%) in the flux measurement of the narrow features
we found here. These power-law continuum slopes are also
in agreement with those previously found in observations us-
ing other instruments with overlapping spectral coverage, like
BeppoSAX (De Rosa et al. 2002), ASCA, RXTE and Chandra
(Kaspi et al. 2001).
4.2. Image smoothing
As discussed in Iwasawa, Miniutti & Fabian (2004), if the data
are acquired continuously and the characteristic time-scale of
any variation in a feature of interest is longer than the sam-
pling time (i.e. the time resolution), it is possible to suppress
random noise between neighboring pixels by applying a low-
pass filter. A circular Gaussian filter is used with σ=0.85 pixel
4 F. Tombesi et al.: Correlated modulation between the redshifted Fe Kα line and the continuum emission in NGC 3783
Time (ks)
Energy (keV)
0 0.05
 Excess (ph/2500s/cm^2)
0 50 100 150 200 250
Time (ks)
Energy (keV)
0 0.02 0.04 0.06
 Excess (ph/2500s/cm^2)
Fig. 3. The excess emission maps of the 4–9 keV band in the time-energy plane at 2.5 ks time resolution. The images have been
smoothed. Since the 6.4 keV line core is very strong and stable, the color map is adjusted to saturate the line core and allow
lower surface brightness features to be visible. Left panel: excess emission map from the 2000 observation. Right panel: excess
emission map from the 2001a and 2001b observations.
(200 eV in energy and 5 ks in time, FWHM). The excess map
Gaussian-filtered images for each observation are shown in
Fig. 3. Systematic variations are observed in the 5.3–5.4 keV
and in the 5.8–6.1 keV energy bands of the 2001b observation.
However, the image filtering can slightly smear these narrow
features and reduce their intensity.
5. Results
5.1. Light curves of the individual spectral features
Light curves of the four emission features are extracted from
the excess map filtered images. The selected band-passes are
listed in Tab. 2. During the image filtering process individ-
ual pixels lose their independence to the neighbouring ones.
This means that a simple counting statistics may be inappro-
priate for estimating the features light curves errors. For this
reason the estimation of the errors has been done by exten-
sive Monte Carlo simulations. We implemented 1000 simula-
tions following the same procedure in making the excess map
images. In the simulations all the spectral features parameters
and the power-law slope are assumed to be constant, while let-
ting the power-law normalization vary according to the 0.3–10
keV light curve. Light curves of individual spectral features
have been extracted from each simulation and their mean val-
ues and variances recorded. The square root of the mean of the
variances (i.e. the dispersion) has been regarded as the light
curves error. In Fig. 4 the emission features light curves for the
2001a and 2001b observations are shown. The 2000 observa-
tion light curves are not reported because they do not show any
sign of variability. The most intense variations are registered
in the light curves of the “red” (E=5.3–5.4 keV) and “wing”
(E=5.8–6.1 keV) features during the 2001b observation. The
observed peaks seem to follow the same kind of variability
pattern and, as shown in more details below, appear to be in
phase with the continuum emission. In order to check the sig-
nificance of the observed variability we extracted both real and
simulated data light curves in the entire 5.3–6.1 keV band, i.e.
of the “red+wing” structure. Then we compared the χ2 values
against a constant hypothesis for the real data and the 1000
simulations; equivalent results can be derived comparing the
variances directly. Only 73 of the simulations show variability
at the same level or greater than the real data, therefore we get
a variability confidence level of 93%.
The light curves of the excess emission features in the 5–6 keV
energy band (red, wing and red+wing) seem to show a variabil-
ity pattern with a recurrence of the flux peaks on time-scales
of 27 ks. We further investigated how it is likely to occur by
chance applying a method that makes use of the 1000 Monte
Carlo simulations. We folded the real data light curve with the
interval of 27 ks and we fitted it with a constant. We obtained a
χ2r = 88 for 19 degrees of freedom. We did the same to the simu-
lated red+wing light curves but, this time, folding in n = 9 trial
periods, from 17 to 37 ks at intervals of 2.5 ks, and recorded
the χ2i values. If N is the total number of simulated red+wing
light curves for which χ2i ≥ χ
r , the confidence level can be de-
F. Tombesi et al.: Correlated modulation between the redshifted Fe Kα line and the continuum emission in NGC 3783 5
/s Continuum
8 Core
0 105 2×105
Time (s)
Kbeta
Fig. 4. The light curves of the total 0.3–10 keV continuum
flux and of the four spectral features (Tab. 2) extracted from
the excess maps of the 2001a and 2001b observations (Fig. 3,
Right panel), with errors computed from simulations. The time
resolution is 2.5 ks.
rived as (1 − N1000·n ). Only N = 54 of the simulated red+wing
light curves folded at the trial periods show chi-square values
greater than the real one. Therefore, we could derive a confi-
dence level for the recurrence pattern on the 27 ks time scale of
99.4%.
Finally, we checked that the continuum above 7 keV (which
carries the photons eventually responsible for the Fe line pro-
duction) varies following the same pattern of the 0.3–10 keV
band (Fig. 4, Top panel).
5.2. Correlation with the continuum light curve
In the 0.3–10 keV light curve of observation 2001b (Fig. 5,
Upper panel) flux variations of ∼30% are visible with four
peaks separated by approximately equal time intervals. Given
such peculiar time series shape, we focused on this observa-
tion and searched for some typical time-scale in the variability
pattern. Thus, we applied the efsearch task (in Xronos), which
searches for periodicities in a time series calculating the maxi-
mum chi-square of the folded light curve over a range of peri-
ods. We found a typical time-scale for variability of 26.6±2.2
ks. We then removed the underlying long-term variability trend
by subtracting a 4th degree polynomial to the 0.3–10 keV con-
tinuum light curve (see Fig. 5, Middle panel). The polynomial
has been determined using the lcurve task (in Xronos), which
makes use of the least-square technique. Applying again the
efsearch task we found a typical time-scale for short-term vari-
ability of ∼27.4 ks1. The peaks observed in the continuum light
curve seem to appear at the same times at which those ob-
served in the “wing” and “red” light curves do. In order to look
1 It should be noted that the variability PSD study of this XMM-
Newton dataset by Markowitz (2005) suggests an excess of power
around 4×10−5 Hz (corresponding to about 25 ks) during the 2001b
observation (square symbols in his Fig. 3).
0.3−10 keV
0.3−10 keV
0 5×104 105
Time (s)
Red+Wing
Fig. 5. Upper panel: The 0.3–10 keV light curve of NGC 3783
during 2001b observation at 2.5 ks time resolution. Middle
panel: The 0.3–10 keV light curve of NGC 3783 during the
2001b observation after subtraction of a 4th degree polynomial
(long-term variations) at 2.5 ks time resolution. Lower panel:
The 5.3–6.1 keV (“red+wing” energy band) light curve ex-
tracted from the excess emission map of the 2001b observation
(Fig. 3, Right panel) at 2.5 ks time resolution.
for some correlation between the continuum and the 5.3–6.1
keV (“red+wing”) feature flux we computed the cross corre-
lation function (CCF) between the two time series, where the
input continuum light curve is the “de-trended” one. It is re-
ported in Fig. 6 as a function of time delay, measured with
respect to the continuum flux variations. No delay is evident,
with an estimated error at the peak of 2.5 ks. The continuum
and “red+wing” fluxes seem to show a correlation, with peak
value 0.7. To estimate the significance of the correlation we
computed the CCFs between the continuum and the simulated
“red+wing” light curves. If N is the number of simulated light
curves which have a higher cross correlation than the real one,
the significance of the correlation is (1 − N/1000). Applying
this method we found a confidence level greater than 99.9%.
5.3. High/Low flux state line profiles
Looking at the “red+wing” light curve (Fig. 5, Lower panel)
we constructed two spectra from the integrated high and low
flux intervals to verify the variability in this energy band. The
line profiles are shown in Fig. 7 where the ratio between the
data and a simple power-law plus cold absorption model is
shown. While the 6.4 keV and 6.9 keV features remain the
same, a small increase of counts is visible in the 5.3–5.4 keV
and 5.8–6.1 keV bands in the high flux state. Adding an emis-
sion Gaussian model to the simple power law plus Gaussian
line (at the Fe Kα energy) model in the high flux state inte-
grated spectra improves the χ2 of 18. Thus the significance of
the excess is ∼99.9%.
6 F. Tombesi et al.: Correlated modulation between the redshifted Fe Kα line and the continuum emission in NGC 3783
−104 0 104−
Time Delay (s)
Fig. 6. The cross correlation function calculated between the
de-trended 0.3–10 keV continuum light curve and the 5.3–6.1
keV feature (“red+wing”) light curve.
.4 HIGH
6 7 8
Energy (keV)
Fig. 7. The Fe K line profile during the High flux (open
squares) and the Low flux (solid circles) phases of the 5.3–6.1
keV feature. The ratios are computed against the best-fitting
continuum model. The energies are in the observer frame.
6. Discussion
In the 2001b observation, NGC 3783 clearly exhibits contin-
uum emission with two different time-scales: a long-term mod-
ulation with variations up to a factor 2 on intervals greater than
60 ks is superimposed to a shorter one on a time-scale of 27
ks, with modulations of 30% of the average value. According
to an estimated black hole mass of MBH = (3.0±0.5)×10
(Peterson et al. 2004), the latter modulation occurs with a
characteristic time-scale corresponding to the orbital period at
∼9–10 rg (e.g. Bardeen, Press & Teukolsky (1972)). It should
be noted, however, that a previous mass estimate by Onken &
Peterson (2002) gave the value of MBH = (8.7± 1.1)× 10
6 M⊙,
that would correspond to ∼20 rg. These mass discrepancies are
mainly due to a different scaling of the virial relation in per-
forming reverberation mapping measurements. However, the
former estimate is more accurate because it has been calibrated
to the MBH −σ relation and thus we will adopt this value in the
following discussion. The power spectral density of the source
is consistent with a red-noise shape (Markowitz 2005) where
variability mainly occurs on intervals of the order of days. Here
we identify an additional (additive) shorter time-scale compo-
nent (see the footnote 1), most likely produced within the in-
nermost accretion flow/corona system.
The most remarkable result of our analysis is the detec-
tion of redshifted (5.3–6.1 keV) Fe K emission and of its vari-
ability. The redshifted emission appears to respond only to the
shorter ∼27 ks time-scale modulation and shows a good cor-
relation with the continuum with a time-lag consistent with
zero within the errors (∆τ ∼ 1.25 ks). This indicates that the
continuum modulation on this time interval is likely to induce
Fe K emission from dense material close to the black hole
(which explains the observed redshift of the emission feature);
moreover the lack of time-lags implies that the distance be-
tween the sites of continuum and line production is smaller than
c∆τ ∼ 4×1013 cm ∼ 8 rg (for the black hole mass given above).
We are therefore most likely observing emission from the in-
nermost accretion flow in both the continuum and line emission
(corona and disc).
As discussed above, the variability time-scale suggests we
are looking at emission from around ∼9–10 rg. As a consis-
tency check, we fitted the time-averaged spectrum by including
a diskline component to account for the redshifted features. We
forced the emission region to be an annulus of ∆r= 0.5 rg with
uniform emissivity because the purpose of this test is to as-
sess the approximate location of the line-emitting region. We
obtain good fits with an almost face-on disc (i = 11 ± 4◦)
and an annulus at 9–15 rg depending on the assumed Fe line
rest-frame energy (from neutral at 6.4 keV to highly ionized
at 6.97 keV). For all cases we tested, the statistical improve-
ment is of ∆χ2 ∼ 16 for 3 additional degrees of freedom, cor-
responding to a confidence level of 99.7%. This fit shows that
the redshifted Fe line emission we detected is indeed consistent
in shape with being produced around 10 rg, where the disc or-
bital period is of the order of 27 ks, which agrees well with the
correlated (and zero-lag) variability of the two components.
The interpretation of our results is however not straightfor-
ward. The quasi-sinusoidal modulations of the continuum and
line emission (see Fig. 5) would suggest the presence of a local-
ized co-rotating flare above the accretion disc which irradiates
a small spot on its surface. The intensity modulations we see
(Fig. 5) would then be produced by Doppler beaming effects
(acting on both the flare and spot emission, i.e. on both contin-
uum and line) and the characteristic time-scale of 27 ks would
be identified with the orbital period (because of gravitational
time dilation, the period measured by an observer on the disc
at 10 rg would be shorter by ∼10%). As demonstrated above, a
flare/spot system orbiting the black hole at ∼10 rg would also
produce a time-averaged line profile in agreement with the ob-
servations. However, such a model makes definite prediction
on the Fe line energy modulation within one orbital period. In
this framework, the orbiting spot on the accretion disc would
also give rise to energy modulations of the Fe line due to the
Doppler effect and such energy modulation is barely seen in
the data (see Fig. 3). We stress that the adopted time resolu-
tion (2.5 ks) is good enough to detect energy modulations with
a characteristic 27 ks time-scale. This has been demonstrated
with XMM-Newton in the case of NGC 3516, where the modu-
lation occurs on a very similar time-scale (Iwasawa et al. 2004;
F. Tombesi et al.: Correlated modulation between the redshifted Fe Kα line and the continuum emission in NGC 3783 7
see also Dovčiak et al. 2004 for theoretical models). Therefore,
the lack of Fe line energy modulation disfavours the orbiting
flare/spot interpretation for NGC 3783.
However, a variability time-scale of the order of the orbital
one at a given radius does not necessarily imply the motion of
a point-like X-ray source. In fact, since the orbital time-scale is
the fastest at a given disc radius, and since the observed time-
scale of ∼27 ks corresponds to the orbital period at ∼10 rg, one
could argue that the data only imply that the X-ray variability
likely originates from within ∼10 rg. The apparent recurrence
in the X-ray continuum modulation may not necessarily be re-
lated to a real physical periodicity, especially considering the
limited length of the observation (only four putative cycles are
detected).
A possible explanation for the observed behaviour is that
the X-ray continuum source(s) (located within ∼10 rg from
the center) irradiates the whole accretion disc, but only a ring-
like structure around 10 rg is responsible for the fluorescent Fe
emission. This is possible if the bulk of the accretion disc is
so highly ionized that little Fe line is produced, while an over-
dense (and therefore lower ionization) structure is responsible
for the fluorescent emission. Such an over-dense region could
have an approximate ring-like geometry if it is for instance as-
sociated with a spiral-wave density perturbation. In this case,
the Fe emitting region is extended in the azimuthal direction
and we do not expect strong energy modulation with time,
whatever the origin of the continuum modulation. We point out
that spiral density distributions could result from the ordered
magnetic fields in the inner region of the disc and that the en-
ergy dissipation (via e.g. magnetic reconnection) could be en-
hanced there, thereby providing a common site for the produc-
tion of the X-ray continuum and the Fe line (e.g. Machida &
Matsumoto 2003).
On the other hand, if the apparent recurrence is in fact
real, it is worth noting that the continuous theoretical effort in
understanding the origin of quasi-periodic oscillations (QPO)
in neutron star and black hole systems provides a wealth of
mechanisms inducing quasi-periodic variability, although none
is firmly established (Psaltis 2001; Kato 2001; Rezzolla et al.
2003; Lee et al. 2004; Zycki & Sobolewska 2005 and many
others). A connection between QPO phase and Fe line inten-
sity has been previously claimed in the Galactic black hole
GRS 1915+105 (Miller & Homan 2005). Although in our case,
the presence of a QPO cannot be claimed because of the very
small number of detected cycles, the analogy is suggestive. In
the case of GRS 1915+105, Miller & Homan consider that a
warp in the inner disc, possibly due to Lense-Thirring preces-
sion, may produce the observed QPO-Fe line connection (e.g.
Markovic & Lamb 1998). However, to produce the observed
∼15% rms variability in the X-ray lightcurve, the black hole
spin axis should be inclined with respect to the line of sight by
at least 60◦ (which is at odds with our inclination estimate of
11±4◦ and with the Seyfert 1 nature of NGC 3783), and the tilt
precession angle should be larger than 20◦–30◦ (Schnittman,
Homan, Miller 2006). Both requirements make it highly un-
likely that Lense-Thirring precession can successfully account
for the observed modulations in NGC 3783.
While the origin of the coherent intensity modulation still
remains unclear, the correlated variation of the continuum and
line emission and the Fe line shape are consistent with an emis-
sion site at ∼10 rg. Moreover, the fact that the iron line variabil-
ity responds to the 27 ks time-scale modulation only implies
that this short time-scale variation is somehow detached from
the long-term variability. The latter may be associated with per-
turbations in the accretion disc propagating inwards from outer
radii and modulating the X-ray emitting region (Lyubarskii
1997), while the former seems to genuinely originate in the
inner disc.
Further observational data may help to clarify the complex
phenomena related to the relativistic Fe line temporal evolu-
tion in Seyfert 1 galaxies. Our work makes it clear that higher
quality data in the Fe band will be able to probe the inner-
most regions of accretion flows with high accuracy. Next gen-
eration of large collecting area X-ray missions such as XEUS
and Constellation-X or even very long observations with XMM-
Newton will be crucial to fully exploit such potential.
Acknowledgements. This paper is based on observations obtained
with the XMM-Newton satellite, an ESA funded mission with con-
tributions by ESA Member States and USA. We thank A. Müller, K.
Nandra, L. Nicastro, P. O’Neill and M. Orlandini for useful discus-
sions. MC, MD and GP acknowledge financial support from ASI un-
der contract ASI/INAF I/023/05/0. The authors thank the anonymous
referee for suggestions that led to improvements in the paper.
References
Bardeen, J. M., Press, W. H., & Teukolsky, S. A. 1972, ApJ, 178, 347
Brenneman, L. W., & Reynolds, C. S. 2006, ArXiv Astrophysics e-
prints, arXiv:astro-ph/0608502
De Rosa, A., Piro, L., Fiore, F. et al. 2002, A&A, 387, 838
De Marco, B., Cappi, M., Dadina, M., & Palumbo, G.G.C., 2006,
Astron. Nach., astro-ph/0610882
Dovčiak, M., Bianchi, S., Guainazzi, M., Karas, V., & Matt, G. 2004,
MNRAS, 350, 745
Fabian, A. C., Iwasawa, K., Reynolds, C. S., & Young, A. J. 2000,
PASP, 112, 1145
Guainazzi, M., Bianchi, S., & Dovciak, M. 2006, ArXiv Astrophysics
e-prints, arXiv:astro-ph/0610151
Iwasawa, K., et al. 1996, MNRAS, 282, 1038
Iwasawa, K., Miniutti, G., & Fabian, A. C. 2004, MNRAS, 355, 1073
Kaspi, S., et al. 2001, ApJ, 554, 216
Kato, S. 2001, PASJ, 53, L37
Lee W.H., Abramowicz M.A., & Kluzniak W., 2004, ApJ, 603, L93
Lyubarskii, Y. E. 1997, MNRAS, 292, 679
Machida, M., & Matsumoto, R. 2003, ApJ, 585, 429
Markovic D. & Lamb F.K., 1998, ApJ, 507, 316
Markowitz, A. 2005, ApJ, 635, 180
Miller J.M. & Homan J., 2005, ApJ, 618, L107
Miller, L., Turner, T. J., Reeves, J. N. et al. 2006, A&A, 453, L13
Miniutti, G., & Fabian, A. C. 2004, MNRAS, 349, 1435
Miniutti, G., et al. 2006, ArXiv Astrophysics e-prints,
arXiv:astro-ph/0609521
Nandra, K., George, I. M., Mushotzky, R. F., Turner, T. J., & Yaqoob,
T. 1997, ApJ, 477, 602
Nandra, K., O’Neill, P. M., George, I. M., Reeves, J. N., & Turner,
T. J. 2006, ArXiv Astrophysics e-prints, arXiv:astro-ph/0610585
O’Neill, P. & Nandra, K., 2006, in preparation
http://arxiv.org/abs/astro-ph/0608502
http://arxiv.org/abs/astro-ph/0610882
http://arxiv.org/abs/astro-ph/0610151
http://arxiv.org/abs/astro-ph/0609521
http://arxiv.org/abs/astro-ph/0610585
8 F. Tombesi et al.: Correlated modulation between the redshifted Fe Kα line and the continuum emission in NGC 3783
Onken, C. A., & Peterson, B. M. 2002, ApJ, 572, 746
Peterson, B. M., et al. 2004, ApJ, 613, 682
Ponti, G., Cappi, M., Dadina, M., & Malaguti, G. 2004, A&A, 417,
Psaltis D., 2001, Adv Space Res., 28, 481
Reeves, J. N., Nandra, K., George, I. M. et al. 2004, ApJ, 602, 648
Reynolds, C. S., & Nowak, M. A. 2003, Phys. Rep., 377, 389
Rezzolla L., Yoshida S., Maccarone T.J., Zanotti O., 2003, MNRAS,
344, L37
Schnittman J.D., Homan J., Miller J.M., 2006, ApJ, 642, 420
Turner, T. J., Miller, L., George, I. M., & Reeves, J. N. 2006, A&A,
445, 59
Vaughan, S., & Edelson, R. 2001, ApJ, 548, 694
Zycki P.T. & Sobolewska M.A., 2005, MNRAS, 364, 891
	Introduction
	XMM-Newton observations
	Data analysis
	Spectral features of interest and selection of the energy resolution
	Selection of the time resolution
	Excess emission maps
	Continuum subtraction
	Image smoothing
	Results
	Light curves of the individual spectral features
	Correlation with the continuum light curve
	High/Low flux state line profiles
	Discussion
ABSTRACT
  It has been suggested that X-ray observations of rapidly variable Seyfert
galaxies may hold the key to probe the gas orbital motions in the innermost
regions of accretion discs around black holes and, thus, trace flow patterns
under the effect of the hole strong gravitational field. We explore this
possibility analizing XMM-Newton observations of the seyfert 1 galaxy NGC 3783.
A detiled time-resolved spectral analysis is performed down to the shortest
possible time-scales (few ks) using "excess maps" and cross-correlating light
curves in different energy bands. In addition to a constant core of the Fe K
alpha line, we detected a variable and redshifted Fe K alpha emission feature
between 5.3-6.1 keV. The line exhibits a modulation on a time-scale of 27 ks
that is similar to and in phase with a modulation of the 0.3-10 keV source
continuum. The time-scale of the correlated variability of the redshifted Fe
line and continuum agrees with the local dynamical time-scale of the accretion
disc at 10 r_g around a black hole of 10^7 M_sun. Given the shape of the
redshfted line emission and the overall X-ray variability pattern, the line is
likely to arise from the relativistic region near the black hole.

<|endoftext|><|startoftext|>
Introduction 
Challenging motivations for isotopic studies in nuclear multifragmentation are derived 
from the importance of the density dependence of the symmetry-energy term of the 
nuclear equation of state for astrophysical applications and for effects linked to the 
manifestation of the nuclear liquid-gas phase transition.  
Müller and Serot, in their seminal paper [1], have demonstrated that the two-fluid nature 
of nuclear matter has very specific consequences for the phase behavior in the coexistence 
region. Different isotopic compositions are predicted for the coexisting liquid and gas 
phases, with the gas being more neutron rich than the liquid in asymmetric (N ≠ Z) 
matter. This difference stems from the decrease in the symmetry energy in nuclear matter 
as the density is decreased. The expected magnitude of this density dependence, however, 
is model dependent and very poorly constrained by existing data [2]. 
The calculations of Müller and Serot are restricted to infinite matter with no Coulomb 
force included. In addition, the isotopic composition, in the calculations, is typically 
varied within a range of proton fractions ρp/ρ = 0.3 to 0.5 whose limits are not easily 
accessible in experiments with heavy nuclei.  
Theoretical studies for finite systems also indicate that the sequential decay of excited 
reaction products has a tendency to reduce some of the expected effects [3]. 
If the phase transition for asymmetric nuclei still manifests itself as a plateau in the 
caloric curve [5] (i.e. the correlation between the temperature of the system and its 
excitation energy), this could be influenced by the degree of asymmetry as well as by the 
mass of the system undergoing the fragmentation. 
Figure 1. Location of the four studied projectiles in the plane of atomic number Z versus 
neutron number N. The contour lines represent the limiting temperatures according to [4] , the 
dashed line gives the valley of stability, and the full line corresponds to the N/Z = 1.49 of 197Au.
It has been shown [4] that, due to the Coulomb pressure, there exists a limiting 
temperature which represents the maximum temperature at which nuclei are found to exist 
as self-bound objects in Hartree-Fock calculations. The dependence of the breakup 
temperature on the excitation energy could therefore be governed by the limiting 
temperature. Figure 1 shows the calculated limiting temperature as a function of the 
neutron and proton number [4]. As it is possible to see from the picture, in case of proton-
rich nuclei, the phenomenon of vanishing limiting temperature is predicted, the limiting 
temperature decreasing with increasing proton fraction. 
Clearly, new experiments exploring such phenomena are mandatory for having a better 
knowledge of the thermodynamics of a finite nucleus and its decay.  
Recently a systematic investigation of projectile-spectator fragmentation has been 
undertaken at the ALADiN spectrometer at the GSI [6,7]: four different projectiles, 124Sn, 
197Au, 124La and 107Sn, all with an incident energy of 600 AMeV on 116Sn and 197Au 
targets, have been studied. The two latter beams have been delivered by the FRagment 
Separator (FRS) of the GSI as products of the fragmentation of a primary 142Nd beam at 
890 AMeV on a 9Be production target [7].  
The necessity of low beam intensities for the best operational condition of the ALADiN 
setup (≅2000 particles/sec), and the possibility of using a thick target in order to achieve 
high interaction rates are indeed conditions compatible with radioactive-ion-beam 
experiments. Moreover, the inverse kinematics offers the possibility of a threshold-free 
detection of all heavy fragments and residues and thus gives a unique access to the 
breakup dynamics. 
The measurement of the charge and the momentum vector of all projectile fragments with 
Z≥2 has been performed, with high efficiency and high resolution, with the TP-MUSIC 
IV detector [8]. Using the reconstructed values for the rigidity and pathlength, the charge 
of the particle measured by the TP-MUSIC IV, and the time-of-flight given by the TOF-
Wall, the velocity and the momentum vector can be calculated for each detected charged 
particle. The knowledge of velocity and momentum allows then the calculation of the 
particle's mass.  
Neutrons emitted in directions close to θlab = 0
o , are detected with the Large-Area 
Neutron Detector (LAND) which covers about half of the solid angle required for 
neutrons from the spectator decay. 
2. Gross Properties of multifragment decay 
The gross properties of projectile fragmentation are very similar for all the studied 
systems. 
The fragments emerging from the decay of the projectile spectators are well localised in 
rapidity. The distributions are peaked around a rapidity value very close to be beam 
rapidity and become narrower with increasing mass of the fragment. 
The observed independence of the Rise and Fall, i.e. of the correlation between the 
multplicity of intermediate mass fragments with Zbound, also for the unstable systems [6], 
confirms the hypothesis of equilibrium at freeze-out. 
The shape of the charge distributions are as well similar for all the systems. For larger 
values of Zbound, the charge distribution exhibits a U-shape. The heavy fragments are the 
residues of the lowly-excited projectile-spectators after the evaporation of light fragments 
and nucleons. In semi-central reactions, i.e. smaller values of Zbound, the distribution 
broadens and flattens over nearly the full charge distribution. For still lower values of 
Zbound, the charge distribution becomes steep. This is consistent with the system 
disassembling into predominantly lighter fragments. 
The charge distributions have been fitted with a power-law parameterization, σ(Z) ∝ Z-τ, 
in the charge range 3≤Z≤15. The power-law parameters τ (Figure 2 – left panel) allow to 
follow the transition from the U-shape to a pure exponential spectrum and reach the 
minimum value in the Zbound range which corresponds to the maximum production of 
IMF, i.e. in the multifragmentation region. They follow a nearly universal curve almost 
independent of the isotopic composition of the original spectator system. 
Figure 2. (Left Panel) The extracted τ  parameters as a function of normalized Zbound for 
124La, 124Sn and 107Sn at 600 AMeV, compared with earlier data for 197Au obtained at the 
same energy. (Right Panel) Neutron multiplicities as obtained from the LAND detector for 
all the studied systems (colored symbols are SMM predictions). 
Specific isotopic effects, even though small, can nevertheless be observed: in particular, 
the hierarchy of τ for the neutron-poor 124La and 107Sn and neutron-rich 124Sn and 197Au 
systems for Zbound/Zproj>0.5 is opposite to the standard predictions of the Statistical 
Multifragmentation Model SMM [9]. It can, however, be explained with a weak isotopic 
dependence of the surface-term coefficient in the liquid-drop description of the fragment 
masses at low excitation energy which gradually disappears with increasing excitation of 
the fragmenting system [10]. 
Specific isotopic effects are as well found in the reconstructed neutron multiplicities 
measured with the LAND detector [11] (Figure 2 - right panel). For peripheral collisions 
the values of the multiplicity depend on the number of available neutrons in the entrance 
channel. Going towards central collisions the number of emitted neutrons is progressively 
determined by the N/Z in the entrance channel. In a preliminary comparison with SMM 
calculation (colored points in Figure 2) a promising agreement has been observed.  
Neutrons will be important for establishing the mass and energy balance and in particular 
for calorimetry. In this respect it is crucial to identify the spectator neutrons and to 
distinguish them from the fireball ones. From a preliminary analysis of their rapidity 
distributions, the corresponding spectator sources have been identified. They are 
characterized by temperatures up to 4 MeV, possibly caused by large contributions from 
evaporation. 
2. Structure and Memory Effects in particle production 
The mass resolution obtained for projectile fragments entering into the acceptance of the 
ALADiN spectrometer is about 3% for fragments with Z=3 and decreases to 1.5% for 
Z≥6 [7]. Masses are thus individually resolved for fragments with atomic number Z≤10. 
The elements are resolved over the full range of atomic numbers up to the projectile Z 
with a resolution of ΔZ≤0.2 obtained with the TP-MUSIC IV detector. The mean N/Z of 
the mass distributions of light fragments in the range 3 ≤Z ≤ 13 for two different Zbound 
cuts is presented in Figure 3. 
Figure 3. Mean values <N>/Z of light fragments with 3 ≤ Z ≤ 13 produced in the 
fragmentation of 124Sn and 124La at 600 A MeV for two different bins in Zbound.
The values obtained for 124Sn are larger than those for 124La or 107Sn (not shown) as 
expected from the different N/Z of the original projectiles. Their odd-even variation is, 
however, much more strongly pronounced for the neutron-poor cases. The strongly bound 
α-type nuclei (even-even N=Z) attract a large fraction of the product yields during the 
secondary evaporation stage. This effect is, apparently, larger if already the hot fragments 
are close to N=Z symmetry, as it is expected for the fragmentation of 124La and 107Sn [12]. 
Inclusive data obtained with the FRS fragment separator at GSI for 238U [13] and 56Fe 
[14] fragmentations on titanium targets at 1 A GeV bombarding energy confirm that the 
observed patterns are very systematic, exhibiting at the same time nuclear structure 
effects characteristic for the isotopes produced and significant memory effects of the 
isotopic composition of the excited system by which they are emitted. This has the 
consequence that, because of its strong variation with Z, the neutron-to-proton ratio 
<N>/Z is not a useful observable for studying nuclear matter properties. For this purpose, 
techniques, such as the isoscaling [7], will have to be used which cause the nuclear 
structure effects to cancel out. A precise modeling of these secondary processes is, 
therefore, necessary for quantitative analyses. 
Another interesting feature of the mass distributions predicted by SMM is their 
dependence, in the case of the proton-rich systems, on the excitation energy of the system 
[10]. This difference arises from the dependence of the number of neutrons in light 
fragments on the Z spectrum. For the neutron-rich 124Sn system, on the other hand, the 
distributions should be independent of the excitation energy. From a first qualitative 
comparison with the model prediction of the mass distributions for different Zbound cuts, 
however, we have not observed any noticeable variation in the mean N/Zs for the 124La 
system (Figure 3). Also in this case it will be crucial to precisely model the sequential 
decay in order to clarify whether it has washed out the expected effect.  
3. Limiting temperature 
As previously mentioned, for proton-rich systems a rapid drop of the limiting temperature 
has been theoretically predicted (Figure 1) because of the increasing Coulomb pressure 
[4]. On the other hand, SMM calculations predict nearly isospin-invariant temperatures 
for the coexistence region [10]. Therefore, in order to distinguish whether the breakup 
temperature is determined by the binding properties of the excited hot nuclear system or 
by the phase space accessible to it by fragmentation we have analyzed the temperature for 
the studied neutron-poor and neutron-rich systems.  
The temperatures used in the caloric curve studies [5,15] were deduced from a double 
ratio of isotopic yields. Under the assumptions of low density and chemical equilibrium a 
grand canonical treatment [16] yields for the double ratio: 
ZAZA exp
),(),(
),(),(
with ΔB being the double difference of the binding energies of the chosen isotopes and a 
a statistical factor containing spin and mass terms. Furthermore, for best sensitivity the 
binding energy difference ΔB should be comparable or larger than the temperatures to be 
measured [17]. 
The thermometry with the 3,4He isotope pairs used in the nuclear caloric curve [5] benefits 
from the large difference ∆B = 20.6 MeV of the binding energies of these two nuclei but 
it is not the only choice. There are also other combinations of isotopes which can be 
expected to provide the necessary sensitivity for the measurement of temperatures in the 
MeV range. In a recent systematic study of different isotopic thermometers for spectator 
fragmentation [18] TBeLi, TCLi, and TCC have been analyzed in detail. In particular, it has 
been found that the rise at small Zbound, i.e. high excitation energy, previously observed 
with THeLi is well reproduced by most thermometers, including TBeLi which is derived 
from the 7,9Be and 6,8Li isotope ratios. This is an indication that the rise is not necessarily 
related to a particular behavior of either 3He or 4He. An overall good agreement between 
the different temperature observables has been obtained with the exception of those 
containing carbon isotopes. In the latter cases, the apparent temperature values remain 
approximately constant with values between 4 and 5 MeV.  
Measured isotope ratios as a function of Zbound for the 
124Sn, 124La and 107Sn systems are 
shown in Figure 4. A very interesting dependence on the isotopic composition of the 
Figure 4. Measured isotopic yield ratios as a function of Zbound/Zproj for the neutron-rich 
124Sn and the neutron-poor 124La and 107Sn systems. 
spectator is observed for all analyzed ratios with the exception of the 3He/4He ratio, 
exhibiting a reduced difference between the neutron-rich and neutron-poor systems. The 
strong sequential decay into α particles could explain the difference between these two 
behaviors. Therefore, also in this case, the structure effects responsible for the observed 
odd-even variation in the N/Z/s of intermediate-mass fragments (Figure 3) play the major 
role. 
For the 124Sn projectile spectator, moreover, the ratios, with the exception of 3He/4He, are 
almost independent of Zbound. For the mentioned ratio the decrease between the most 
central and the most peripheral collisions is about a factor 4 whereas for example, for the 
9Be/8Li ratio the total variation amounts hardly to a factor of 2. There seems to be a clear 
correlation between the difference of the binding energies of the involved isotopes and 
Zbound, i.e. excitation energy of the system [18]: the higher the first the stronger the 
variation of the corresponding ratio with Zbound.  
On the other hand, in the case of the proton-rich systems all ratios exhibit a strong 
dependence on the excitation energy deposited in the system.  
From the measured ratios the isotopic temperatures have been extracted and an overall 
agreement with the previous systematics [18] have been obtained for the THeLi, TBeLi, TCLi, 
and TCC thermometers. 
In Figure 5 in particular, the apparent THeLi and TBeLi temperatures for the 
124Sn and 124La, 
Figure 5. Isotopic temperatures THeLi and TBeLi as a function of Zbound/Zproj for 124Sn and 
124La, 107Sn projectile spectators. 
107Sn spectator systems are reported. By comparing the two systems with the same mass 
but different isospin-content, the average difference of the obtained THeLi temperatures is 
0.7±0.1 MeV whereas almost no difference (hardly 0.1 MeV in average) between the two 
systems is observed in the case of the TBeLi thermometer. The small difference observed in 
the case of THeLi is caused by the fact that the dependence of the 
6Li/7Li ratio (Figure 4) on 
the isotopic composition of the system is not compensated by the weak dependence of the 
3He/4He ratio. In the case of TBeLi both isotopic ratios 
7,9Be and 6,8Li exhibit a dependence 
on the N/Z of the original projectiles, which cancels out in the double ratio. The 
invariance of the isotopic temperature with the isotopic composition of the system is 
inconsistent with the limiting temperature predictions [4] of 2 MeV for the 124Sn and 124La 
systems (Figure 1) and seems to favor a statistical interpretation [10]. 
4. Conclusions 
Isotopic effects in the break-up of projectile spectators at relativistic energies have been 
reported. The gross properties of projectile fragmentation are very similar for all the 
studied systems. Specific isotopic effects, even though small, can nevertheless be 
observed: in particular, the inversion in the hierarchy of the τ exponential parameter of 
the charge distribution can be explained with a weak isotopic dependence of the surface-
term coefficient of the nuclear equation of state [10]. 
The mean N/Zs of the isotope distributions of light fragments exhibit as well specific 
isotopic effects. In particular, the observed odd-even variation is much more strongly 
pronounced for the neutron-poor cases and could be explained in terms of a simultaneous 
concurrence of both structure effects characteristic for the isotopes produced and memory 
effects of the isotopic composition of the excited system from which they are emitted.  
From the double ratios of Z≤4 isotopes, the isotopic temperatures have been determined. 
The small dependence (of about 0.7 MeV) observed in the THeLi thermometer could be 
due to the influence of structure effects in the sequential decay. The invariance with the 
isotopic composition in the entrance channel is inconsistent with the limiting-temperature 
predictions and seems to favors the statistical interpretation. On the other hand, the 
limiting-temperature concept reproduces nicely the mass dependence of the caloric curves 
[15] and only seems to fail when applied to the isospin degree of freedom.  
It should be noted that most of the limiting-temperature calculations are made for beta 
stable nuclei [19] while it was not explicitly tested whether the studied systems are still 
near the stability at breakup.  
The weak N/Z dependence of the breakup temperatures measured in this experiment 
shows that this is not a major point of concern. The same observation, on the other hand, 
is not compatible with the strong Coulomb effect predicted by Besprosvany and Levit [4]. 
This open point in the connection between limiting temperatures for heavy nuclei and 
breakup temperatures of fragmenting nuclear systems will require an improved 
understanding. 
C.Sf. acknowledges the receipt of an Alexander-von-Humboldt fellowship. This work was 
supported by the European Community under contract No. RII3-CT-2004-506078 and 
HPRI-CT-1999-00001 and by the Polish Scientific Research Committee under contract 
No. 2P03B11023. 
Bibliography 
[1] H. Müller and B.D. Serot, Phys. Rev. C 52 (1995) 2072. 
[2] for reviews see, e.g., C. Fuchs and H.H. Wolter Eur. Phys. J. A 30 (2006) 5. 
[3] A.B. Larionov et al., Nucl. Phys. A 658 (1999) 375. 
[4] J. Besprosvany and S. Levit, Phys. Lett. B 217 (1989) 1. 
[5] J. Pochodzalla et al., Phys. Rev. Lett. 75 (1995) 1040. 
[6] C. Sfienti et al., Nucl. Phys. A 749 (2005) 83c. 
[7] S. Bianchin et al., Contribution to this proceedings. 
[8] C.Sfienti et al., Proceeding of the XLI International Winter Meeting on Nuclear 
Physics, Bormio, 26 Jan. – 1 Feb. 2003, Ricerca Scientifica ed Educazione Permanente, 
Supplemento N.120, p.323. 
[9] R. Ogul and A. S. Botvina, Phys. Rev. C 66 (2005) 051601R.  
[10] A. S. Botvina et al., Phys. Rev. C 74 (2006) 044609. 
[11] P. Pawlowski et al, to be published.  
[12] N. Buykcizmeci et al., Eur. Phys. J A 25 (2005) 57. 
[13] M. V. Ricciardi et al.,Nucl. Phys. A 733 (2004) 299. 
[14] P. Napolitani et al., Phys. Rev. C 70 (2004) 054607. 
[15] J. Natowitz et al., Phys. Rev. C 65 (2002) 034618. 
[16] S. Albergo et al., Il Nuovo Cimento 89A (1985) 1. 
[17] M.B. Tsang et al., Phys. Rev. Lett 78 (1997) 3836. 
[18] W. Trautmann et al., to be published. 
[19] P. Wang et al., Nucl. Phys. A 748 (2005) 226.
ABSTRACT
  A systematic study of isotopic effects in the break-up of projectile
spectators at relativistic energies has been performed at the GSI laboratory
with the ALADiN spectrometer coupled to the LAND neutron detector. Besides a
primary beam of 124Sn, also secondary beams of 124La and 107Sn produced at the
FRS fragment separator have been used in order to extend the range of isotopic
compositions. The gross properties of projectile fragmentation are very similar
for all the studied systems but specific isotopic effects have been observed in
both neutron and charged particle production. The breakup temperatures obtained
from the double ratios of isotopic yields have been extracted and compared with
the limiting-temperature expectation.

<|endoftext|><|startoftext|>
arXiv:0704.0228v1  [gr-qc]  2 Apr 2007
epl draft
Einstein vs Maxwell: Is gravitation a curvature of space,
a field in flat space, or both?
Theo M. Nieuwenhuizen
Institute for Theoretical Physics, University of Amsterdam, Valckenierstraat 65, 1018 XE Amsterdam, The Netherlands
PACS 04.20.Cv – Fundamental problems and general formalism
PACS 04.20.Fy – Canonical formalism, Lagrangians, and variational principles
PACS 98.80.Bp – Origin and formation of the Universe
Abstract. - Starting with a field theoretic approach in Minkowski space, the gravitational energy
momentum tensor is derived from the Einstein equations in a straightforward manner. This allows
to present them as acceleration tensor = const. × total energy momentum tensor. For flat space
cosmology the gravitational energy is negative and cancels the material energy. In the relativistic
theory of gravitation a bimetric coupling between the Riemann and Minkowski metrics breaks
general coordinate invariance. The case of a positive cosmological constant is considered. A
singularity free version of the Schwarzschild black hole is solved analytically. In the interior the
components of the metric tensor quickly die out, but do not change sign, leaving the role of time
as usual. For cosmology the ΛCDM model is covered, while there appears a form of inflation at
early times. Here both the total energy and the zero point energy vanish.
It is said that in introducing the general theory of rela-
tivity (GTR), Einstein made the step that Lorentz and
Poincaré had failed to make: to go from flat space to
curved space. Technically, this arises from the group of
general coordinate transformations [1, 2]. One fundamen-
tal difficulty is then how to deal with the physics of gravi-
tation itself, since there is only a quasi energy-momentum
tensor [3]. For gravitational wave detection, e.g., this
leaves open the question as to how energy can be faithfully
transferred from the wave to the detector. The proper en-
ergy momentum tensor of gravitation was derived only
recently by Babak and Grishchuk [4], who start with a
field theoretic approach to gravitation, in terms of a ten-
sor field hµν in a Minkowski background space-time. The
metric of the latter, ηµν = diag(1,−1,−1,−1), is denoted
in arbitrary coordinates by γµν = (γ
µν)−1. The Riemann
metric tensor gµν = (g
µν)−1, is then defined by
gµν = γµν + hµν ≡ kµν , g
det(gµν)
det(γµν)
. (1)
It is just a way to code the gravitational field, allowing
to expresses distances by ds2 = gµνdx
µdxν . Such a non-
linear way to code distances in a flat space is not uncom-
mon. For diffuse light transport through clouds, one may
express distances in the optical thickness, the number of
extinction lengths. If the cloud is not homogeneous, points
at the same physical distance are described by a different
optical distance and, vice versa.
The Maxwell view that gravitation is a field in flat
space, was actually the starting point for Einstein, and
reappeared regularly. Nathan Rosen [5], coauthor of
the Einstein-Podolsky-Rosen paper that led the basis for
quantum information, considers a bimetric theory, in-
volving the Minkowski metric and the Riemann metric.
Bimetrism is quite natural, with ηµν entering e.g. particle
physics, and gµν e.g. cosmology. Rosen considers covari-
ant derivatives Dµ of Minkowski space, with Christoffel
symbols γλ
·µν vanishing in Cartesian coordinates. When re-
placing in the Riemann Christoffel symbols partial deriva-
tives by Minkowski covariant ones,
·µν =
gλσ(∂µgνσ + ∂νgµσ − ∂σgµν) 7→
·µν =
gλσ(Dµgνσ +Dνgµσ −Dσgµν), (2)
the obtained Christoffel-type symbols Gλ
·µν are tensors in
Minkowski space. Inspired by the Landau-Lifshitz and
Babak-Grishchuk results, we may define the acceleration
tensor
Aµν =
DαDβ(k
µνkαβ − kµαkνβ), (3)
where kµν = γµν + hµν and in which the γγ terms do not
http://arxiv.org/abs/0704.0228v1
Th.M. Nieuwenhuizen
contribute. Then we can calculate the combination
τµν =
Aµν − (Rµν − 1
gµνR)
. (4)
In doing so, we make use of Rosen’s observation that Rµν
remain unchanged if one replaces all partial derivatives by
covariant ones in Minkowski space [5]. It appears that
all second order derivatives drop out from (4), leaving a
bilinear form in first order covariant derivatives,
τµν =
· · :λh
· · :ρ −
· · :λh
· · :ρ
hµλ:ρhν
·λ:ρ +
kµνhλρ:σhλσ:ρ −
hλρ:µhν
hµλ:ρh · · :νλρ +
hλρ:µh · · :νλρ −
kµνhλρ:σhλρ:σ +
kµνh ·ρ:λρ h
. (5)
in which X:µ ≡ DµX and raising (lowering) of indices
· · :ρ is performed with k
µν (kµν). τ
µν is a tensor
in Minkowski space. For Cartesian coordinates, it coin-
cides with the Landau-Lifshitz quasi-tensor. In general, it
coincides with the Babak-Grishchuk tensor γtµν/g. Inclu-
sion of matter is now much easier than in [4]. Inserting
the Einstein equations in the right hand side of (4), we
may write the Einstein equations in the Newton shape:
acceleration=mass−1×force,
Aµν =
Θµν ,
Θµν =
θµν , θµν ≡ τµν + T µν . (6)
Θµν is the total energy momentum tensor of gravitation
and matter. It is conserved, DνΘ
µν = 0, since Eq. (3) im-
plies DνA
µν = 0, because covariant Minkowski derivatives
commute.
As an application, let us consider cosmology, described
by the Friedman-Lemaitre-Robertson-Walker (FLRW)
metric,
ds2 = U(t)c2dt2 − V (t)
1− kr2
+ r2dΩ2
, (7)
dΩ2 = dθ2 + sin2 θdφ2.
Let us consider flat space, k = 0, and U = 1, V (t) = a2(t)
with a the scale factor. Then ds2 = c2dt2 − a2(t)dr2
is space-independent, implying that A00 = 0, due to the
shape (3). According to (6) it then follows that the total
energy density is zero, because the gravitational energy
density, τ00 = −3c4ȧ2/(8πGa2), is negative and cancels
the one of matter, T 00 = ρ, due to the Friedman equation.
In other words, such a universe contains no overall energy.
So far we have discussed an alternative, field theoretic
formulation of GTR. If we consider a local energy mo-
mentum density as a sine qua non property, then we are
led to consider Minkowski space as a fixed “pre-space”,
that exist already without matter, just as a region of
space ahead of the earth’s orbit is right now almost empty
(Minkowskian), and when the earth arrives, there will be
more gravitational and matter fields, but, in our view, no
change of space. Also for cosmology there is a different
interpretation. In GTR coordinates are fixed to clusters
of galaxies, this is called “coordinate space”, but due to
the increasing scale factor galaxies are said to move away
from each other: physical space (i.e. Riemann space) is
said to expand. Here we are led to another view: Coordi-
nate space is physical space, so clusters of galaxies do not
move away from each other in time. [6] However, the cos-
mic speed of light dr/dt = c/a(t), which was very large at
early times, keeps on decreasing, thus causing a redshift,
till a is infinite, when galaxies are invisible.
Relativistic Theory of Gravitation, RTG. Let us move
on to an extension of GTR, giving up general coordinate
invariance. Discarding a total derivative of the Hilbert-
Einstein action, Rosen expresses the gravitational action
d3xdt
−g LR in terms of [5]
c4gµν
·λσ −Gλ·µσGσ·νλ) =
128πG
× (2hµν:ρhµν:ρ − 4hµν:ρhµρ:ν − hν·ν:µh ·ρ:µρ ).
Involving only Minkowski covariant first order derivatives,
it is close to general approaches in field theory. Logunov
and coworkers continue on this [6]. The subgroup of gauge
transformations that transform hµν but leave coordinates
invariant, allows three extra terms [6],
Lg = LR − ρΛ +
ρbiγµνg
µν − ρ0
γ/g. (9)
Here ρΛ is the familiar energy related to a cosmological
constant. The ρ0 term describes a harmless shift of the
zero level of energy, δS = −
d3xdt
−γρ0. The bimetric
term ρbi couples the Minkowski and the Riemann metrics.
It acts like a mass term, because it breaks general coor-
dinate invariance, and has some analogy to a mass term
in massive electrodynamics. Logunov then imposes the
relation
ρΛ = ρbi = ρ0, (10)
which, in the absence of matter, keeps space flat, hµν = 0,
gµν = γµν and also Lg = 0. Thus one free parameter re-
mains. Logunov’s choice ρbi ≡ −m2c4/(16πG) < 0 leads
to an inverse length m and, in quantum language, a gravi-
ton mass h̄m/c. The negative cosmological constant can
be counteracted by an inflaton field [7]. The obtained
theory has some drawbacks, such as self-repulsive prop-
erties for matter falling onto a black hole, and a minimal
and a maximal size of the scale factor in cosmology [6] [7].
For a related approach to finite range gravity, based on a
generalized Fierz-Pauli coupling, see [8].
Einstein vs Maxwell
We shall focus on the opposite choice, a positive cosmo-
logical constant Λ, [9] [10]
Ωv,0H
9.78Gyr
ρbi ≡
= ρΛ. (11)
Now the graviton has an “imaginary mass”, m =
−2Λbi/c, it is a “tachyon”: Gravitational waves are
unstable at today’s Hubble scale. But this is of no con-
cern, since on that scale, not single gravitational waves
but the whole Universe matters, being unstable (expand-
ing) anyhow.
Though we take ρbi = ρΛ, Λbi = Λ, our further notation
is valid for the general case ρbi 6= ρΛ, Λbi 6= Λ.
The Einstein equations that couple the Riemann metric
to matter read
Rµν −
gµνR =
tot , (12)
tot = T
µν + ρΛg
µν + ρbiγρσ(g
µρgσν − 1
gµνgρσ).
Conservation of energy momentum, T
tot;ν = 0, imposes a
constraint due to the ρbi terms, [6]
= 0, or Dνh
µν = 0, (13)
which for Cartesian coordinates coincides with the GTR
harmonic condition ∂ν(
−ggµν) = 0 [2]. Thus the the-
ory automatically demands the harmonic constraint for
gµν , or, equivalently, the Lorentz gauge for hµν , thereby
severely reducing the gauge invariance of GTR.
Changes of Einstein’s GTR have mostly met deep trou-
bles with one or another established property, though not
all proposals are ruled out [1,11]. The present one is rather
subtle and promising. For most applications, the Hubble-
size ρΛ = ρbi terms in Eq. (11,12) are too small to be
relevant, so known results from general relativity can be
reproduced. Indeed, viewed from a GTR standpoint, Eq.
(13) is only a particular gauge, and actually often con-
sidered, while the cosmological constant only plays a role
in cosmology. Logunov checked a number of effects in
the solar system: deflection of light rays by the sun, the
delay of a radio signal, the shift of Mercury’s perihelion,
the precession of a gyroscope, and the gravitational shift
of spectral lines. [6] Likewise, we expect agreement for
binary pulsars. [11] Differences between GTR and RTG
may arise, though, for large gravitational fields, that we
consider now.
Black holes. It is known that true black holes, ob-
jects that have a horizon, do not occur in the RTG with
ρΛ, ρbi → 0. [5] But there are solutions very similar to it,
that might be named “grey holes”, but we just call them
“black holes”. The Minkowski line element in spherical
coordinates is simply γµνdx
µdxν = c2dt2 − dr2 − r2dΩ2.
The one of Riemann space is
ds2 = gµνdx
µdxν = U(r)c2dt2 − V (r)dr2 −W 2(r)dΩ2. (14)
In harmonic coordinates, the Schwarzschild black hole is
described by [2]
r − rh
r + rh
, Ws = r + rh, rh =
. (15)
The horizon radius rh equals half the Schwarzschild radius.
Let us scale r → rrh, and define
U = eu, V = ev, W = 2rhe
w, (16)
so that w is small near the horizon. The dimensionless
small parameter arising from ρbi = ρΛ, is very small,
λ̄ ≡ rh
2Λ = 2.38 10−23
. µ̄ ≡ rh
2Λbi = λ̄. (17)
The sum and difference of the (t, t) and (r, r) Einstein
equations give
ev−2w − w′(u′ − v′ + 4w′)− 2w′′
= ev(λ̄2 − 1
µ̄2r2e−2w) +
8πGr2h
ev(ρ− p), (18)
w′(u′ + v′ − 2w′)− 2w′′
µ̄2(ev−u − 1) + 8πGr
ev(ρ+ p), (19)
respectively. The harmonic condition imposes
u′ − v′ + 4w′ = r exp(v − 2w).
In the Schwarzschild black hole of GTR, there is no
matter outside the origin. We shall focus on that situation.
A parametric solution of these equations then reads
1 + η(eξ + ξ + log η + r0)
1− η(eξ + ξ + log η + r0)
, (20)
u = ξ + log η,
v = ξ − ln η − 2 log(eξ + 1), (21)
w = ηeξ + µ̄2(ξ + log η + w0).
where ξ is the running variable and η is a small scale.
Corrections of next order in η can be expressed in diloga-
rithms, but they are not needed since µ̄ is very small.
To fix the scale η, we note that energy momentum con-
servation implies, as in GTR, (ρ + p)u′ + 2p′ = 0. In
the stationary state all matter is located at the origin,
which is only possible if p(r) ≡ 0, implying ρ(r)u′(r) = 0.
This is obeyed for r 6= 0 since ρ = 0 there, but since
ρ(0) > 0 (it is infinite), we have to demand u′(0) = 0.
Let us define a factor α by α = µ̄2/η. The above solu-
tion brings w′(r) = ∂ξw/∂ξr = (e
ξ + α)/[2(eξ + 1)], so in
the interior w′ = 1
α. Since ev ≪ 1 there, Eqs. (18,19)
confirm that w′′ = 0, and with w(1) = O(η) this solves
Th.M. Nieuwenhuizen
1.510.5
Fig. 1: Black hole functions U(r), V (r) and W (r), scaled
by factors 10, (bold lines) for µ̄ = 0.1, compared to the
Schwarzschild solution (thin lines; the part V < 0 for r < 1 is
not shown). Inside the horizon, U and V decay very rapidly.
Since they remain positive, time keeps its role in the interior.
w(r) = 1
α (r − 1). Moreover, from the harmonic con-
straint (20) we have in the interior u(r)− v(r) + 4w(r) =
const = 2 ln η, implying that Eq. (19) yields in the interior
u′(r) = {exp[2α(r − 1)]− η2 − µ̄2}/(2η). From u′(0) = 0
we can now solve α,
α = log
, η =
ln 1/µ̄
. (22)
As seen in fig. 1, our solution (16,20,-22) coincides with
Schwarzschild’s for ξ ≫ 1. In the regime ξ = O(1), there
is a transition towards the interior ξ ≪ −1, where ex-
ponential corrections can be neglected. Both U = ηeξ
and V = eξ/η are very small there, but, contrary to the
Schwarzschild case, they remain positive: The behavior in
the interior of the RTG black hole is not qualitatively dif-
ferent from usual, be it that the gravitational field is large.
Width of the brick wall. The transition layer ξ = O(1)
acts like ’t Hooft’s brick wall, [12] of characteristic width
ℓ⋆ = ηrh. Comparing to the Planck length ℓP =
h̄G/c3,
we get
0.977 10−9
1 + 0.019 log(M/M⊙)
. (23)
If quantum physics sets in at the Planck scale, our ap-
proach makes sense only for M > 103 M⊙.
Motion of test particles. For RTG with a negative cos-
mological constant, [6] it was claimed that an incoming
spherical shell of matter is scattered off from a black hole,
a counter-intuitive finding. Let us reconsider this issue.
The motion of a test body occurs along a geodesic
+ Γµνρv
νvρ = 0, vµ =
. (24)
For spherical shells of in-falling matter one needs Γ001 =
U ′/(2U). This brings dt/ds = v0 = 1/(CiU), for some
Ci. Solving v
1 = dr/ds from gµνv
µvν = 1, we then get
dr/dt = (ds/dt)(dr/ds) = −c
U(1− C2i U)/V . We can
now fix Ci at the initial position r = ri, where the spherical
shell is assumed to have a speed dri/dt = vi = βic
Ui/Vi,
viz. Ci =
(1 − β2i )/Ui, with |βi| ≤ 1. The differential
proper time dτ =
Udt and length dℓ =
V dr bring in
the particle’s rest frame dℓ/dτ =
V/Udr/dt, yielding
1− U(r(τ))
U(ri)
(1 − β2i ). (25)
The extreme case is when βi = 0 at ri = ∞, dℓ/dτ =
1− U. To have |dℓ/dτ | < c, it thus suffices that 0 <
U ≤ 1, which is the case. Near the horizon, |dℓ/dτ | is
almost equal to c and the more the shell penetrates the
interior, the closer its speed gets to c. For an outside
observer, the time to see it hit the center of the hole,
dr/|ṙ| equals (rh/c)
dr exp[ 1
(v − u)]. It is finite
and predominantly comes from the horizon, T = rh/cµ̄
2.74× 1032M/M⊙ yr.
The approaches [6–8] have a similar a black hole. While
[8] properly has U ′(0) = 0, in Logunov’s case one has
µ̄2 < 0, so w′ = 1
α < 0 in the interior. This seems to
solve the paradox of “matter reflected by the black hole”:
In-falling matter just enters, but the Logunov coordinate
x = exp(w) − 1 is non-monotonic (x′ < 0 in the interior).
However, the situation is more severe: For α < 0, the
theory does not allow a solution with u′(0) = 0, depriving
that theory of a proper black hole. This condition can
neither be obeyed in GTR: If the central mass is slightly
smeared, the Schwarzschild black hole cannot obey energy-
momentum conservation in GTR.
Cosmology. Starting from the FLRW metric, the har-
monic condition brings two relations: U ∼ V 3 and k = 0:
Minkowski space filled homogeneously with matter re-
mains flat [6]. We may thus put U = a6(t)/a4
, V = a2(t).
Going from cosmic time t to conformal time τ =
a3a−2
yields the familiar Einstein equations, extended by Λbi
terms,
− Λbi
, (26)
= −4πG
(ρ+ 3p) +
− Λbi
The first is the modified Friedman equation, the second
corresponds to the first law d(ρtota
3) = −ptotda3 provided
we define ρtot = ρ+ρΛ+ρ2+ρ6 and ptot = p−ρΛ− 13ρ2+ρ6,
with ρ2 = −3ρbi/2a2 and ρ6 = ρbia4∗/2a6. Note that ρ2
acts as a positive curvature term.
The scale factor has an absolute meaning. If we as-
sume that a ≫ 1 and a ≫ a2/3∗ , Eq. (26) just coin-
cides with the ΛCDM model (cosmological constant plus
cold dark matter), that gives the best fit of the observa-
tions [9] [10]. The ρ2 term allows a positive curvature-
type contribution. At large times, there is the exponential
growth a(τ) = C exp(H∞τ) with H∞ = c
Λ/3. In cos-
mic time this reads a(t) = a
∗ [3H∞(t0 − t)]−1/3, where
Einstein vs Maxwell
t0 is “the end of time”, the moment where the scale factor
has become infinite. The minimal scale factor is zero: in
this theory a big bang can occur since ρbi > 0. With-
out including an inflaton field, Eq. (26) yields an initial
growth of the expansion a = (a2
3Λbi/2)
1/3. In cos-
mic time this reads a = a1 exp(ct
Λbi/6), i. e., a certain
inflation scenario starting at t = −∞.
Also in RTG the gravitational energy precisely compen-
sates the other energy contributions at all times. The
vacuum energy also vanishes: In empty space, the cosmo-
logical constant energy ρΛ cancels the ρbi terms, due to
Eq. (10). See Eq. (12) for gµν = γµν .
In conclusion, we have first written the Einstein equa-
tion in a form that involves the gravitational energy
momentum tensor. An underlying Minkowski space is
needed, in which gravitation is a field. The metric ten-
sor is a way to deal with it, but the equations for the field
itself exist too, see Eq. (6). For flat cosmology it follows
that the total energy vanishes.
Next we have broken general coordinate invariance by
going to the bimetric theory of Logunov, called Relativis-
tic Theory of Gravitation. We have shown that the choice
of a positive bimetric constant allows to regularize the
interior of the Schwarzschild black hole: time keeps its
standard role and escape is, in principle, possible. While
neither the Schwarzschild nor the Logunov black hole sur-
vives smearing of the central mass by a tiny pressure in the
equation of state, ours does. Our modification of the Ein-
stein equations involves the cosmological constant, so it is
of Hubble size, immaterial for solar problems. In cosmol-
ogy, the theory directly leads to the ΛCDM model, while
it could accommodate a positive curvature-like term. At
short times, there is a form of inflation. The gravitational
energy exactly compensates the material energy. The zero
point energy vanishes (“again”), though the cosmological
constant is finite and positive: It is canceled by the bimet-
ric terms.
Euclidean space, a special case of Riemann geometry,
seems to be invoked by Nature, at least far away from bod-
ies and in cosmology. Our approach supports the follow-
ing space-time interpretation: curvature is a geometric de-
scription of the gravitational field in flat space. Clusters of
galaxies do not move away from each other, but the speed
of light changes with cosmic time, dr/dt = [a(t)/a∗]
while the conformal speed is dr/dτ = c/a(τ) as usual.
An empirical way to establish the Minkowski metric is
to present the Einstein equations as (c4/8πG)Rµν −Tµν +
gµνT + ρΛgµν = ρbiγµν , and to measure the left hand
side, which in the geometric view is considered to consist
of curved space properties alone. [6]
As in the standard model of elementary particles, the
separation of curved space into flat space and the gravi-
tational field has the following implication: the quantum
version of RTG – if it exists – will involve quantization of
fields, but not of space.
Finally we answer the question posed in the title. The
field theoretic approach to gravitation is by itself equiva-
lent to a curved space description, so both views apply, de-
scribing the same physics from a different angle. But when
the theory is extended to the relativistic theory of grav-
itation, the bimetrism forces to describe the Minkowski
metric separately, and then we see it as most natural to
view gravitation as a field in flat space, which is Maxwell’s
view.
Topics such as a realistic equation of state for black holes
and classical tunneling of its radiation, regularization of
other singularities, as well as aspects of the inflation and
of inhomogeneous cosmology are under study.
∗ ∗ ∗
Discussion with Martin Nieuwenhuizen and Armen Al-
lahverdyan is gratefully remembered.
REFERENCES
[1] C.W. Misner, K.S. Thorne and J.A. Wheeler, Gravitation,
(Freeman, San Francisco, 1973).
[2] S. Weinberg, Gravitation and Cosmology: Principles and
Application of the General Theory of Relativity, (Wiley,
New York, 1972).
[3] L. D. Landau and E. M. Lifshitz, The Classical Theory of
Fields, (Pergamon, Oxford, U.K., 1951; revised 1979).
[4] S. V. Babak and L. P. Grishchuk, Phys. Rev. D61, 024038
(1999).
[5] N. Rosen, Phys. Rev. 57, 147 (1940); ibid 150; Ann. Phys.
22, 11 (1963).
[6] A.A. Logunov, The Theory of Gravity, (Nauka, Moscow,
2001).
[7] S.S. Gershtein, A.A. Logunov and M.A.Mestvirishvili, gr-
qc/0602029.
[8] S. V. Babak and L. P. Grishchuk, Int. J. Mod. Phys. D12,
1905 (2003).
[9] M. Tegmark, A. Aguirre, M.J. Rees, and F. Wilczek,
Phys. Rev D73, 023505 (2006).
[10] D. N. Spergel, et al., Astrophys. J. Suppl. 148, 175 (2003).
[11] C.M. Will, Theory and Experiment in Gravitational
Physics, (Cambridge Univ. Press, New York, 1993), chap-
ter 12.3, discusses that gravitational radiation of binary
pulsars rules out a more recent bimetric theory of Rosen,
in which black holes do not have the Schwarzschild shape.
[12] G. ’t Hooft, Nucl. Phys. B256, 727 (1985).
ABSTRACT
  Starting with a field theoretic approach in Minkowski space, the
gravitational energy momentum tensor is derived from the Einstein equations in
a straightforward manner. This allows to present them as {\it acceleration
tensor} = const. $\times$ {\it total energy momentum tensor}. For flat space
cosmology the gravitational energy is negative and cancels the material energy.
In the relativistic theory of gravitation a bimetric coupling between the
Riemann and Minkowski metrics breaks general coordinate invariance. The case of
a positive cosmological constant is considered. A singularity free version of
the Schwarzschild black hole is solved analytically. In the interior the
components of the metric tensor quickly die out, but do not change sign,
leaving the role of time as usual. For cosmology the $\Lambda$CDM model is
covered, while there appears a form of inflation at early times. Here both the
total energy and the zero point energy vanish.

<|endoftext|><|startoftext|>
Introduction 4
1.1 The decision problems . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Deciding nonvanishing of Littlewood-Richardson coefficients . 12
1.3 Back to the general decision problems . . . . . . . . . . . . . 16
1.4 Saturated and positive integer programming . . . . . . . . . . 16
1.5 Quasi-polynomiality, positivity hypotheses, and the canonical
models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.6 The plethysm problem . . . . . . . . . . . . . . . . . . . . . . 20
1.7 Towards PH1, SH, PH2,PH3 via canonial bases and canonical
models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.8 Basic plan for implementing the flip . . . . . . . . . . . . . . 29
1.9 Organization of the paper . . . . . . . . . . . . . . . . . . . . 30
1.10 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2 Preliminaries in complexity theory 34
2.1 Standard complexity classes . . . . . . . . . . . . . . . . . . . 34
2.1.1 Example: Littlewood-Richardson coefficients . . . . . 35
2.2 Convex #P . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.2.1 Littlewood-Richardson coefficients . . . . . . . . . . . 38
2.2.2 Littlewood-Richardson cone . . . . . . . . . . . . . . . 38
2.2.3 Eigenvalues of Hermitian matrices . . . . . . . . . . . 39
2.3 Separation oracle . . . . . . . . . . . . . . . . . . . . . . . . . 39
3 Saturation and positivity 41
3.1 Saturated and positive integer programming . . . . . . . . . . 41
3.1.1 A general estimate for the saturation index . . . . . . 45
3.1.2 Extensions . . . . . . . . . . . . . . . . . . . . . . . . 47
3.1.3 Is there a simpler algorithm? . . . . . . . . . . . . . . 47
3.2 Littlewood-Richardson coefficients again . . . . . . . . . . . . 47
3.3 The saturation and positivity hypotheses . . . . . . . . . . . 49
3.4 The subgroup restriction problem . . . . . . . . . . . . . . . . 52
3.4.1 Explicit polynomial homomorphism . . . . . . . . . . 53
3.4.2 Input specification and bitlengths . . . . . . . . . . . . 55
3.4.3 Stretching function and quasipolynomiality . . . . . . 57
3.5 The decision problem in geometric invariant theory . . . . . . 58
3.5.1 Reduction from Problem 1.1.3 to Problem 1.1.4 . . . . 59
3.5.2 Input specification . . . . . . . . . . . . . . . . . . . . 59
3.5.3 Stretching function and quasi-polynomiality . . . . . . 60
3.5.4 Positivity hypotheses . . . . . . . . . . . . . . . . . . . 61
3.5.5 G/P and Schubert varieties . . . . . . . . . . . . . . . 62
3.6 PH3 and existence of a simpler algorithm . . . . . . . . . . . 63
3.7 Other structural constants . . . . . . . . . . . . . . . . . . . . 63
4 Quasi-polynomiality and canonical models 65
4.1 Quasi-polynomiality . . . . . . . . . . . . . . . . . . . . . . . 65
4.1.1 The minimal positive form and modular index . . . . 68
4.1.2 The rings associated with a structural constant . . . . 69
4.2 Canonical models . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.2.1 From PH0 to PH1,3 . . . . . . . . . . . . . . . . . . . 70
4.2.2 On PH0 in general . . . . . . . . . . . . . . . . . . . . 72
4.3 Nonstandard quantum group for the Kronecker and the plethysm
problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.4 The cone associated with the subgroup restriction problem . 75
4.5 Elementary proof of rationality . . . . . . . . . . . . . . . . . 78
5 Parallel and PSPACE algorithms 84
5.1 Complex semisimple Lie group . . . . . . . . . . . . . . . . . 85
5.2 Symmetric group . . . . . . . . . . . . . . . . . . . . . . . . . 89
5.3 General linear group over a finite field . . . . . . . . . . . . . 92
5.3.1 Tensor product problem . . . . . . . . . . . . . . . . . 93
5.4 Finite simple groups of Lie type . . . . . . . . . . . . . . . . . 94
6 Experimental evidence for positivity 95
6.1 Littlewood-Richardson problem . . . . . . . . . . . . . . . . . 95
6.2 Kronecker problem, n = 2 . . . . . . . . . . . . . . . . . . . . 95
6.3 G/P and Schubert varieties . . . . . . . . . . . . . . . . . . . 96
6.4 The ring of symmetric functions . . . . . . . . . . . . . . . . 97
7 On verification and discovery of obstructions 111
7.1 Obstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.2 Decision problems . . . . . . . . . . . . . . . . . . . . . . . . 113
7.3 Verification of obstructions . . . . . . . . . . . . . . . . . . . 113
7.4 Robust obstruction . . . . . . . . . . . . . . . . . . . . . . . . 115
7.5 Verification of robust obstructions . . . . . . . . . . . . . . . 116
7.6 Arithemetic version of the P#P vs. NC problem in charac-
teristric zero . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.6.1 Class varieties . . . . . . . . . . . . . . . . . . . . . . . 117
7.6.2 Obstructions . . . . . . . . . . . . . . . . . . . . . . . 118
7.6.3 Robust obstructions . . . . . . . . . . . . . . . . . . . 119
7.6.4 Verification of robust obstructions . . . . . . . . . . . 121
7.6.5 On explicit construction of obstructions . . . . . . . . 122
7.6.6 Why should robust obstructions exist? . . . . . . . . . 123
7.6.7 On discovery of robust obstructions . . . . . . . . . . 124
7.7 Arithmetic form of the P vs NP problem in characteristic zero126
Chapter 1
Introduction
This article belongs to a series of papers, [GCT1] to [GCT11], on geomet-
ric complexity theory (GCT), which is an approach to the P vs. NP and
related problems in complexity theory through algebraic geometry and rep-
resentation theory. We assume here that the underlying field of computation
is of characteristic zero. The usual P vs. NP problem is over a finite field.
The characteristic zero version is its weaker, formal implication, and philo-
sophically, the crux.
The basic principle underlying GCT is called the flip [GCTflip]. The
flip, in essence, reduces the negative hypotheses (lower bound problems) in
complexity theory, such as the P 6=?NP problem in characteristic zero, to
positive hypotheses in complexity theory (upper bound problems): specifi-
cally, to the problem of showing that a series of decision problems in rep-
resentation theory and algebraic geometry belong to the complexity class
P . Each of these decision problem is of the form: Given a nonnegative
structural constant in representation theory or geometric invariant theory,
such as the well known plethysm constant, decide if it is nonzero (nonvan-
ishing), or rather, if is nonzero after a small relaxation. This flip from the
negative to the positive may be considered to be a nonrelativizable form of
the flip–from the undecidable to the decidable–that underlies the proof of
Gödel’s incompleteness theorem. But the classical diagonalization technique
in Gödel’s result is relativizable [BGS], and hence, not applicable to the P
vs. NP problem. The flip, in contrast, is nonrelativizable. It is furthermore
nonnaturalizable [GCT10]); i.e., it crosses the natural proof barrier [RR]
that any approach to the P vs. NP problem must cross.
We suggest here a plan for implementating the flip; i.e., for showing that
the decision problems above belong to P . This is based on the reduction
in this paper of the complexity-theoretic positivity hypotheses mentioned
above to mathematical positivity hypotheses: specifically, to showing that
there exist positive formulae for the structural constants under consideration
and certain functions associated with them. We also give theoretical and
experimental evidence in support of the latter hypotheses.
Here we say that a formula is positive if its coefficients are nonegative.
The problem finding the positive formulae as above turns out be intimately
related to the analogous problem for the Kazhdan-Lusztig polynomials [KL1]
and the multiplicative structural constants of the canonical (global crystal)
bases [Kas2, Lu2] in the theory of Drinfeld-Jimbo quantum groups. The
known solution to the latter problem [KL2, Lu2] depends on the Riemann
hypothesis over finite fields, proved in [Dl], and the related results in [BBD].
Thus the flip and the reduction here together roughly say that the valid-
ity of the P 6= NP conjecture in characteristic zero is intimately linked
to the Riemann hypothesis over finite fields and related problems. This is
illustrated in Figure 1.1; the question marks there indicate unsolved prob-
lems. It seems that substantial extension of the techniques related to the
Riemann hypothesis over finite fields may be needed to prove the required
mathematical positivity hypotheses here. We do not have the necessary
mathematical expertize for this task. But it is our hope that the experts in
algebraic geometry and representation theory will have something to say on
this matter.
It may be conjectured that the flip paradigm would also work in the
context of the usual P vs. NP problem over F2 (the boolean field) or the
finite field Fp. But implementation of the flip over a finite field is expected
to be much harder than in characteristic zero. That is why we focus on
characteristic zero here, deferring discussion of the problems that arise over
finite field to [GCT11].
Now we turn to a more detailed exposition of the main results in this
paper and of Figure 1.1.
Acknoledgements
We are grateful to the authors of [BOR] for pointing out an error in the
saturation hypothesis (SH) in the earlier version of this paper. It has been
corrected in this version with appropriate relaxation without affecting the
overall approach of GCT (cf. Section 1.6 and also [GCT6erratum]). We
are also grateful to Peter Littelmann for bringing the reference [Dh] to our
Complexity theoretic negative hypotheses (lower bound problems)
The flip
Complexity theoretic positive hypotheses (upper bound problems)
The reduction in this paper|
Mathematical positivity hypotheses |
(?) The Riemann hypothesis over finite fields, related problems and their extensions
Figure 1.1: Pictorial depiction of the basic plan for implementing the flip
attention, to H. Narayanan for suggesting the use of [KB] in the proof of
Theorem 3.1.1 and bringing the positivity conjecture in [DM2] to our atten-
tion, and to Madhav Nori for a helpful discussion. The experimental results
in Chapter 6 were obtained using Latte [DHHH].
1.1 The decision problems
We now describe the relevant decision problems in representation theory
and algebraic geometry. The actual decision problems that arise in the flip
(cf. the second box in Figure 1.1) are relaxed versions of these problems
described later (cf. Hypothesis 1.1.6).
Problem 1.1.1 (Decision version of the Kronecker problem)
Given partitions λ, µ, π, decide nonvanishing of the Kronecker coefficient
. This is the multiplicity of the irreducible representation (Specht mod-
ule) Sπ of the symmetric group Sn in the tensor product Sλ ⊗ Sµ.
Equivalently [FH], let H = GLn(C) × GLn(C) and ρ : H → G =
GL(Cn ⊗ Cn) = GLn2(C) the natural embedding. Then k
is the multi-
plicity of the H-module Vλ(GLn(C))⊗Vµ(GLn(C)) in the G-module Vπ(G),
considered as an H-module via the embedding ρ.
Here Vλ(GLn(C)) denotes the irreducible representation (Weyl module)
of GLn(C) corresponding to the partition λ; Vπ(G) is the Weyl module of
G = GLn2(C).
Problem 1.1.1 is a special case of the following generalized plethysm
problem.
Problem 1.1.2 (Decision version of the plethysm problem)
Given partitions λ, µ, π, decide nonvanishing of the plethysm constant
. This is the multiplicity of the irreducible representation Vπ(H) of H =
GLn(C) in the irreducible representation Vλ(G) of G = GL(Vµ), where Vµ =
Vµ(H) is an irreducible representation H. Here Vλ(G) is considered an H-
module via the representation map ρ : H → G = GL(Vµ).
(Decision version of the generalized plethysm problem)
The same as above, allowing H to be any connected reductive group.
This is a special case of the following fundamental problem of represen-
tation theory (characteristic zero):
Problem 1.1.3 (Decision version of the subgroup restriction problem)
Let G be connected reductive group, H a reductive group, possibly discon-
nected, and ρ : H → G an explicit, polynomial homomorphism (as defined
in Section 3.4). Here H will generally be a subgroup of G, and ρ its em-
bedding. Let Vπ(H) be an irreducible representation of H, and Vλ(G) an
irreducible representation of G. Here π and λ denote the classifying labels
of the irreducible representations Vπ(H) and Vλ(G), respectively. Let m
the multiplicity of Vπ(H) in Vλ(G), considered as an H-module via ρ.
Given specifications of the embedding ρ and the labels λ, π, as described
in Section 3.4, decide nonvanishing of the multiplicity mπ
All reductive groups in this paper are over C. The reductive groups
that arise in GCT in characteristic zero are: the general and special linear
groups GLn(C) and SLn(C), algebraic tori, the symmetric group Sn, and
the groups formed from these by (semidirect) products. The reader may
wish to focus on just these concrete cases, since all main ideas in this paper
are illustrated therein.
Problem 1.1.3 is, in turn, a special case of the following most general
problem.
Problem 1.1.4 (Decision problem in geometric invariant theory)
Let H be a reductive group, possibly disconnected, X a projective H-
variety (H-scheme), i.e., a variety with H-action. Let ρ denote this H-
action. Let R = ⊕dRd be the homogeneous coordinate ring of X. Assume
that the singularities of spec(R) are rational.
We assume that X and ρ have special properties (as described in Sec-
tion 3.5), so that, in particular, they have short specifications. Let Vπ(H)
be an irreducible representation of H. Let sπd be the multiplicity of Vπ(H) in
Rd, considered as an H-module via the action ρ.
Given d, π, the specifications of X and ρ, decide nonvanishing of the
multiplicity sπ
This last problem is hopeless for general X. Indeed the usual specifi-
cation of X, say in terms of the generators of the ideal of its appropriate
embedding, is so large as to make this problem meaningless for a general
X. But the instances of this decision problem that arise in GCT are for the
following very special kinds of projective H-varieties X, which, in particular,
have small specifications (Section 3.5):
1. G/P , where G is a connected, reductive group, P ⊆ G its parabolic
subgroup, and H ⊆ G a reductive subgroup with an explicit polyno-
mial embedding. Problem 1.1.3 reduces to this special case of Prob-
lem 1.1.4; cf. Section 3.5.
2. Class varieties [GCT1, GCT2], which are associated with the funda-
mental complexity classes such as P and NP . They are very special
like G/P , with conjecturally rational singularities [GCT10]. Each class
variety is specified by the complexity class and the parameters of the
lower bound problem under consideration. Briefly, the P vs. NP prob-
lem in characteristic zero is reduced in [GCT1, GCT2] to showing that
the class variety corresponding to the complexity class NP and the pa-
rameters of the lower bound problem (such as the input size) cannot
be embedded in the class variety corresponding to the complexity class
P and the same parameters. Efficient criteria for the decision prob-
lems stated above are needed to construct explicit obstructions [GCT2]
to such embeddings, thereby proving their nonexistence. Specifically,
Problems 1.1.3 and 1.1.4 are the decision problems associated with
Problems 2.5 and 2.6 in [GCT2], respectively. See Sections 7.6-7.7 for
a brief review of this story.
For these varieties Problem 1.1.4 turns out to be qualitatively similar to
Problem 1.1.3 (cf. Section 3.5 and [GCT2, GCT10]). For this reason, the
Kronecker and the plethysm problems, which lie at the heart of the subgroup
restriction problem, can be taken as the main prototypes of the decision
problems that arise here.
One can now ask:
Question 1.1.5 Do the decision problems above (Problems 1.1.1-1.1.3 and
Problem1.1.4, when X therein is G/P or a class variety) belong to P? That
is, can the nonvanishing of any of structural constants in these problems
be decided in poly(〈x〉) time, where x denotes the input-specification of the
structural constant and 〈x〉 its bitlength?
For Problem 1.1.2, the input specification for the plethysm constant aπλ,µ
is given in the form of a triple x = (λ, µ, π). Here the partition λ is specified
as a sequence of positive integers λ1 ≥ λ2 ≥ · · ·λk > 0 (the zero parts of
the partition are suppressed); k is called the height or length of λ, and is
denoted by ht(λ). The bitlength 〈λ〉 is defined to be the total bitlength of
the integers λr’s. The bitlength 〈x〉 is defined to be 〈λ〉 + 〈µ〉 + 〈π〉. A
detailed specification of the input specification x and its bitlength 〈x〉 for
the other problems is given in Section 3.3.
For the reasons described in Section 1.6, Question 1.1.5 may not have
an affirmative answer in general; i.e., these problems may not be in P in
their strict form stated above. The following main conjectural complexity-
theoretic positivity hypothesis governing the flip says that the relaxed forms
of these decision problems described in Section 3.3 belong to P . As we shall
see in Chapter 7, these relaxed forms suffice for the purposes of the flip.
Hypothesis 1.1.6 (PHflip) The relaxed forms (cf. Section 3.3) of Prob-
lems 1.1.1, 1.1.2, 1.1.3, and the special cases of Problem 1.1.4, when X
therein is G/P or a class variety–which together include all decision prob-
lems that arise in the flip–belong to the complexity class P .
This means nonvanishing of any of these structural constants, modulo a
small relaxation (as described in Section 3.3), can be decided in poly(〈x〉)
time, where x denotes the input-specification of the structural constant and
〈x〉 its bitlength.
The phrase “modulo a small relaxation” in the relaxed form of the
plethysm problem means the following:
(a) Let h = dimG + htλ + htπ, where dim(G) is the dimension of the
group G in Problem 1.1.2. Then there exist absolute nonnegative constants
c and c′, independent of λ, µ and π, such that nonvanishing of the relaxed
(stretched) plethysm constant abπ
bλ,bµ
, for any positive integral relaxation
parameter b > chc
, can be decided in O(poly(〈λ〉, 〈µ〉, 〈π〉, 〈b〉)) time, where
〈b〉 denotes the bitlength b. The notation poly(〈λ〉, 〈µ〉, 〈π〉, 〈b〉) here means
bounded by a polynomial of constant degree in 〈λ〉, 〈µ〉, 〈π〉 and 〈b〉. In
particular, the time is O(poly(〈λ〉, 〈µ〉, 〈π〉) if the relaxation parameter b is
small; i.e. if its bitlength 〈b〉 is O(poly(〈λ〉, 〈µ〉, 〈π〉)). (Observe that the bit
length of h is O(poly(〈λ〉, 〈µ〉, 〈π〉)).)
(b) There exists a polynomial time algorithm for deciding nonvanishing of
, which works correctly on almost all λ, µ and π. Here polynomial time
means O(poly(〈λ〉, 〈µ〉, 〈π〉) time. The meaning of “correctly on almost all”
is specified in Hypothesis 1.6.5 below.
A detailed specification of the relaxation, i.e., the meaning of the phrase
“modulo a small relaxation” for the other problems is given in Section 3.3.
The structural constants in Problems 1.1.1-1.1.3 are of fundamental im-
portance in representation theory. The kronecker and the plethysm con-
stants in Problems 1.1.1 and 1.1.2, in particular, have been studied inten-
sively; see [FH, Mc, St4] for their significance. There are many known
formulae for these structural constants based on on the character formulae
in representation theory. Several formulae for the characters of connected,
reductive groups are known by now [FH], starting with the Weyl character
formula. For the symmetric group, there is the Frobenius character formula
[FH], for the general linear group over a finite field, Green’s formula [Mc],
and for finite simple groups of Lie type, the character formula of Deligne-
Lusztig [DL], and Lusztig [Lu1]. (Finite simple groups of Lie type, other
than GLn(Fq), are not needed in GCT.)
One obvious method for deciding nonvanishing of the structural con-
stants in Problems 1.1.1-1.1.4 is to compute them exactly. But all known al-
gorithms for exact computation of the structural constants in Problems 1.1.1-
1.1.3 take exponential time. This is expected, since this problem is #P -
complete. In fact, even the problem of exact computation of a Kostka
number, which is a very special case of these structural constants, is #P -
complete [N]. This means there is no polynomial time algorithm for com-
puting any of them, assuming P 6= NP .
Of course, there are #P -complete quantities–e.g. the permanent of a
nonnegative matrix [V]–whose nonvanishing can still be decided in polyno-
mial time [Sc]. But the decision problems above are of a totally different
kind and, at the surface, appear to have inherently exponential complexity.
This is because the dimensions of the irreducible representations that occur
in their statements can be exponential in the ranks of the groups involved
and the bit lengths of the classifying labels of these representations. For
example, the dimension of the Weyl module Vλ(GLn(C)) can be exponen-
tial in n and the bit length of the partition λ. Furthermore, the number
of terms in any of the preceding character formulae is also exponential. All
these decisions problems ask if one exponential dimensional representation
can occur within another exponential dimensional representation. To solve
them, it may seem necessary to take a detailed look into these representa-
tions and/or the character formulae of exponential complexity. Hence, it
seemed hard to believe that nonvanishing of these structural constants can,
nevertheless, be decided in polynomial time (modulo a small relaxation).
This constituted the main philosophical obstacle in the course of GCT.
1.2 Deciding nonvanishing of Littlewood-Richardson
coefficients
The first result, which indicated that this obstacle may be removable, came
in the wake of the saturation theorem of Knutson and Tao [KT1]. This
concerns the following special case of Problem 1.1.3, with G = H ×H, the
embedding ρ : H → G being diagonal.
Problem 1.2.1 (Littlewood-Richardson problem)
Given a complex semisimple, simply connected Lie group H, and its
dominant weights α, β, λ, decide nonvanishing of a generalized Littelwood-
Richardson coefficient cλ
. This is the multiplicity of the irreducible repre-
sentation Vλ(H) of H in the tensor product Vα(H)⊗ Vβ(H).
It was shown in [GCT3, KT2, DM2] independently that nonvanishing of
the Littlewood-Richardson coefficient of type A can be decided in polyno-
mial time; i.e., polynomial in the bit lengths of α, β, λ. Furthermore, the
algorithm in [GCT3] works in strongly polynomial time in the terminology
of [GLS]; cf. Section 2.1. The three main ingradients in this result are:
1. PH1: The Littlewood-Richarson rule, which goes back to 1940’s, and
whose most important feature is that it is positive–i.e., it involves no
alternating signs as in character-based formulae–and its strengthening
in [BZ], which gives a positive, polyhedral formula for the Littlewood-
Richardson coefficient as the number of integer points in a polytope;
this can be the BZ-polytope [BZ] or the hive polytope [KT1]. We
shall refer to this positivity property as the first positivity hypothesis
(PH1).
2. The polynomial and strongly polynomial time algorithms for linear
programming [Kh, Ta], and
3. SH: The saturation theorem of Knutson and Tao [KT1]. This says
that cλ
is nonzero if cnλ
nα,nβ
is nonzero for any n ≥ 1. We shall refer
to this saturation property as the saturation hypothesis (SH).
Brion [Z] observed that the verbatim translation of the saturation prop-
erty in [KT1] fails to hold for the the generalized Littlewood-Richardson
coefficients of types B, C, D (it also fails for the Kronecker coefficients, as
well as the plethysm constants [Ki]). Hence, the algorithms in [GCT3, KT2,
DM2] do not work in types B, C and D. Fortunately, this situation can
be remedied. It is shown in [GCT5] that nonvanishing of the generalized
Littewood-Richardson coefficient cλα,β of arbitrary type can be decided in
(strongly) polynomial time, assuming the positivity conjecture of De Loera
and McAllister [DM2]. This conjectural hypothesis, based on considerable
experimental evidence, is as follows. Let
c̃λα,β(n) = c
nα,nβ (1.1)
be the stretching function associated with the Littlewood-Richardson co-
efficient cλα,β . It is known to be a polynomial in type A [Der, Ki], and a
quasi-polynomial, in general [BZ, Dh, DM2]. Recall that a fuction f(n) is
called a quasi-polynomial if there exist l polynomials fj(n), 1 ≤ j ≤ l, such
that f(n) = fj(n) if n = j mod l. Here l is supposed to be the smallest such
integer, and is called the period of f(n). The period of c̃λ
(n) for types
B,C,D is either 1 or 2 [DM2]. In general, it is bounded by a fixed constant
depending on the types of the simple factors the Lie algebra.
Definition 1.2.2 We say that the quasi-polynomial f(n) is strictly posi-
tive, if all coefficients of fj(n), for all j, are nonnegative; i.e., the nonzero
coefficients are positive. In general, we define the positivity index p(f) of
f to be the smallest nonnegative integer such that f(n + p(f)) is strictly
positive. We also say that f(n) is positive with index p(f).
Thus f(n) is strictly positive, iff its positivity index is zero.
With this terminology, the hypothesis mentioned above is the following.
We say a connected reductive group H is classical, if each simple factor of
its Lie algebra H is of type A,B,C or D. We also say that the type of H
or H is classical.
Hypothesis 1.2.3 (PH2): [KTT, DM2] Assume that H in Problem 1.2.1
is classical. Then the Littlewood-Richardson stretching quasi-polynomial
(n) is strictly positive.
We shall refer to this as the second positivity hypothesis (PH2). This
was conjectured by King, Tollu and Toumazet [KTT] for type A, and De
Loera and McAllister for types B,C,D. Since the stretching function above
is a polynomial in type A, the positivity conjecture of King et al clearly
implies the saturation theorem of Knutson and Tao. That is, PH2 implies
SH for type A.
We can formulate an analogue of SH for a Lie algerbra of arbitrary clas-
sical type so that PH2 implies SH for an arbitrary type. For this, we need
to formulate the notion of a saturated quasi-polynomial, which is not con-
tradicted by the counterexamples, mentioned above, to verbatim translation
of the saturation property in [KT1, Ki] to the setting of quasi-polynomials.
Specifically, the notion of saturation in [KT1, Ki] works well if the stretching
function is a polynomial, but not so if it is a quasipolynomial. Let f(n) be
a quasi-polynomial with period l. Let fj(n), 1 ≤ j ≤ l, be the polynomials
such that f(n) = fj(n) if n = j mod l. The index of f , index(f), is defined
to be the smallest j such that the polynomial fj(n) is not identically zero.
If f(n) is identically zero, we let index(f) = 0. If f(1) 6= 0, then clearly
index(f) = 1.
Definition 1.2.4 We say that f(n) is strictly saturated if for any i: fi(n) >
0 for every n ≥ 1 whenever fi(n) is not identically zero. The saturation in-
dex s(f) of f is defined to be the smallest nonnegative integer such that
f(n + s(f)) is strictly saturated. We also say that f(n) is saturated with
index s(f).
Thus f(n) is strictly saturated iff its saturation index is zero. Clearly
the saturation idex is bounded above by the positivity index. Thus if f(n)
is strictly positive, it is strictly saturated. Hence, PH2 (Hypothesis 1.2.3)
implies:
Hypothesis 1.2.5 (SH): The Littlewood-Richardson stretching quasi-polynomial
(n) of arbitary classical type is strictly saturated.
The polynomial time algorithm in [GCT5] works assuming SH as well.
For the Littlewood-Richardson coefficient of type A, the notion of strict
saturation here coincides with the notion of saturation in [KT1] since cλα,β(n)
is a polynomial in that case. Knutson and Tao [KT1] also conjectured
a generalized saturation property for arbitrary types. But that property,
unlike the one defined above, is only conjectured to be sufficient, but not
claimed to be, or expected to be necessary. For this reason, it cannot be
used in the complexity-theoretic applications in this paper.
There is another positivity conjecture for Littlewood-Richardson coeffi-
cients that also implies the saturation theorem of Knutson and Tao. For
this consider the generating function
Cλα,β(t) =
c̃λα,β(n)t
n. (1.2)
It is a rational function since c̃λ
(n) is a quasi-polynomial [St1]. For type
A, if c̃λ
(n) is not identically zero, then Cλ
(t) is a rational function of
d + · · ·+ h0
(1− t)d+1
, (1.3)
since c̃λα,β(n) is a polynomial [St1]. It is conjectured in [KTT] that:
Hypothesis 1.2.6 (PH3:) The coefficients hi’s in eq.(1.3) are nonnegative
(and h0 = 1).
We shall call this the third positivity hypothesis (PH3). It clearly implies SH
for Littlewood-Richardson coefficients of type A. To describe its analogue
for arbitrary classical type we need a definition.
Let F (t) =
n f(n)t
n be the generating function associated with the
quasi-polynomial f(n). It is a rational function [St1].
Definition 1.2.7 We say that F (t) has a positive form, if, when f(n) is
not identically zero, it can be expressed in the form
F (t) =
d + · · ·+ h0
i=0(1− t
ai)di
, (1.4)
where (1) h0 = 1, and hi’s are nonnegative integers, (2) ai’s and di’s are
positive integers, (3)
i di = d+ 1, where d = max deg(fj(n)) is the degree
of f(n).
We define the modular index of this positive form to be max{ai}.
If F (t) has a positive form with a0 = 1, then f(n) is strictly saturated
(Definition 1.2.4); this easily follows from the power series expansion of the
right hand side of eq.(1.4).
The analogue of Hypothesis 1.2.6 for arbitrary classical type is:
Hypothesis 1.2.8 (PH3:) The rational function Cλ
(t) has a positive
form, with a0 = 1, of modular index bounded by a constant depending only
on the types of the simple factors of the Lie algebra of H.
This too implies SH for arbitrary classical type. For types B,C,D, the
constant above is 2. Experimental evidence for this hypothesis is given in
Section 6.1.
The analogue of the PH3, even in the more general q-setting, is known to
hold for the generating function of the Kostant partition function of type A,
and more generally, for a parabolic Kostant partition function; cf. Kirillov
[Ki]. This also gives a support for the PH3 above, given a close relationship
between Littlewood-Richardson coefficients and Kostant partition functions
[FH].
1.3 Back to the general decision problems
It may be remarked that the Littlewood-Richardson problem actually never
arises in the flip. It is only used as a simplest proptotype of the actual (much
harder) problems that arise–namely relaxed forms of Problems 1.1.1-1.1.4.
Now we turn to these problems. The goal is to generalize the preced-
ing results and hypotheses for the Littlewood-Richardson coefficients to the
structural constants that arise in these problems. The problem of finding a
positive, combinatorial formula for the plethysm constant (Problem 1.1.2),
akin to the positive Littlewood-Richardson rule, has already been recog-
nized as an outstanding, classical problem in representation theory [St4]–
the known formulae based on character theory mentioned in Section 1.1
are not positive, because they involve alternating signs. Indeed, existence
of such a formula is a part of the first positivity hypothesis (PH1) below
for the plethysm constant, and this problem is the main focus of the work
in [GCT4, GCT7, GCT8, GCT9]. In view of the intensive work on the
plethym constant in the literature, it has now become clear that the com-
plexity of the plethysm problem (Problem 1.1.2) is far higher than that of
the Littlewood-Richardson problem (Problem 1.2.1). This gap in the com-
plexity is the main source of difficulties that has to be addressed. We now
state the main ingradients in the plan in this paper to show that the relaxed
forms of Problems 1.1.1, 1.1.2, 1.1.3, and 1.1.4, with X = G/P or a class
variety, belong to P .
1.4 Saturated and positive integer programming
First, we formulate a general algorithmic paradigm of saturated and positive
integer programming that can be applied in the context of these problems.
Let A be an m×n integer matrix, and b an integral m-vector. An integer
programming problem asks if the polytope P : Ax ≤ b contains an integer
point. In general, it is NP-complete. We want to define its relaxed version,
which will turn out to have a polynomial time algorithm.
We allow m, the number of constraints, to be exponential in n. Hence,
we cannot assume that A and b are explicitly specified. Rather, it is assumed
that the polytope P is specified in the form of a (polynomial-time) separation
oracle in the spirit of Grötschel, Lovász and Schrijver [GLS]; cf. Section 2.3.
Given a point x ∈ Rn, the separation oracle tells if x ∈ P , and if not, gives
a hyperplane that separates x from P .
Let fP (n) be the Ehrhart quasi-polynomial of P [St1]. By definition,
fP (n) is the number of integer points in the dilated polytope nP .
An integer programming problem is called saturated, if
1. The specification of P also contains a number sie(P ), called the sat-
uration index estimate, with the guarantee that the saturation in-
dex s(fP ) ≤ sie(P ); cf. Definition 1.2.4. In particular, this means
fP (n+ sie(P )) is strictly saturated.
2. the goal of the problem is to give an efficient algorithm to decide if,
given an integral relaxation parameter c > sie(P ), if cP contains an
integer point.
The algorithm has to work only for relaxation parameters c > sie(P ). In
particular, if sie(P ) ≥ 1, the algorithm problem does not have to determine
if P contains an integer point.
An integer programming problem is called positive, if
1. the specification of P also contains a number pie(P ), called the pos-
itivity index estimate, with the guarantee that the positivity index
p(fP ) ≤ pie(P ); cf. Definition 1.2.2. In particular, this means fP (n+
pie(P )) is strictly positive.
2. the goal of the problem is to give an efficient algorithm to decide if,
given an integral relaxation parameter c > pie(P ), if cP contains an
integer point.
Again, the algorithm has to work only for relaxation parameters c > pie(P ).
Since s(fP ) ≤ p(fP ), a positive integer programming problem is also satu-
rated.
The following is the main complexity-theoretic result in this paper.
Theorem 1.4.1 (cf. Section 3.1)
1. Index of the Ehrhart quasi-polynomial fP (n) of a polytope P presented
by a separation oracle can be computed in oracle-polynomial time, and
hence, in polynomial time, assuming that the oracle works in polyno-
mial time.
2. A saturated, and hence positive, integer programming problem has a
polynomial time algorithm.
3. Suppose the polytopes P ’s that arise in a specific decision problem have
the following property: whenever P is nonempty, the Ehrhart quasi-
polynomial fP (n) is “almost always” strictly saturated. Then there
exists a polynomial time algorithm for deciding if P contains an integer
point that works correctly “almost always”.
The meaning of the phrase “almost always” in the context of the decision
problems in this paper will be specified later (cf. Theorem 3.1.1).
It may be remarked that the index as well as the period of the Ehrhart
quasi-polynomial can be exponential in the bit length of the specification
of P . In contrast to the polynomial time algorithm above to compute the
index, the known algorithms to compute the period (e.g. [W]) take time
that is exponential in the dimension of P . It may be conjectured that one
cannot do much better: i.e., the period, unlike the index here, cannot be
computed in polynomial time, in fact, even in 2o(dim(P )) time.
The algorithm in Theorem 1.4.1 is based on the separation-oracle-based
linear programming algorithm of Grötschel, Lovász and Schrijver [GLS], and
a polynomial time algorithm for computing the Smith normal form [KB].
The paradigm of saturated integer programming is useful when one
knows, a priori, a good estimate for the saturation index of the polytope
under consideration, or when the saturation index is almost always zero.
For example, if P is the hive polypolype for the Littlewood-Richardson co-
efficient (type A), then sie(P ) = 0, by the saturation theorem [KT1], and
pie(P ) = 0, by PH2 (Hypothesis 1.2.3). For the polytopes P that would
arise in this paper, sie(P ) and pie(P ) would in general be nonzero, but con-
jecturally always small, and sie(P ) would conjecturally be almost always
zero.
1.5 Quasi-polynomiality, positivity hypotheses, and
the canonical models
The basic goal now is to use Theorem 1.4.1 to get polynomial time algorithms
to decide nonvanishing, modulo small relaxation, of the structural constants
in Problems 1.1.1, 1.1.2, 1.1.3 and 1.1.4, with X = G/P or a class variety.
The main results in this paper which go towards this goal are as follows.
Quasi-polynomiality
We associate stretching functions with the structural constants in Prob-
lems 1.1.1-1.1.4, akin to the stretching function c̃λα,β(n) in eq.(1.1) associ-
ated with the Littlewwod-Richardson coefficient, and show that they are
quasipolynomials; cf. Chapter 4. (But their periods need not be constants,
as in the case of Littlewood-Richardson coefficients; in fact, they may be
exponential in general.) In particular, this proves Kirillov’s conjecture [Ki]
for the plethysm constants. The proof is an extension of Brion’s remarkable
proof (cf. [Dh]) of quasi-polynomiality of the stretching function associ-
ated with the Littlewood-Richardson coefficient. The main ingradient in
the proof is Boutot’s result [Bou] that singularities of the quotient of an
affine variety with rational singularities with respect to the action of a re-
ductive group are also rational. This is a generalization of an earlier result
of Hochster and Roberts [Ho] in the theory of Cohen-Macauley rings.
Saturation and positivity hypotheses
Using the stretching quasipolynomials above, we formulate (cf. Section 3.3)
analogues of the saturation and positivity hypotheses SH, PH1,PH2,PH3 in
Section 1.2 for the structural constants in Problems 1.1.1-1.1.3 and Prob-
lem 1.1.4, with X = G/P or a class variety. As for Littlewood-Richardson
coefficients, it turns out that PH2 implies SH. The hypotheses PH1 and SH
(more strongly, PH2) together imply that the problem of deciding nonvan-
ishing of the structural constant in any of these problems, modulo a small
relaxation, can be transformed in polynomial time into a saturated (more
strongly, positive) integer programming problem, and hence, can be solved
in polynomial time by Theorem 1.4.1. In particular, this shows that all
the relaxed decision problems that arise in flip (cf. Hypothesis 1.1.6) have
polynomial time algorithms, assuming these positivity hypotheses. Though
these algorithms are elementary, the positivity hypotheses on which their
correctness depends turn out to be nonelementary. They are intimately
linked to the fundamental phenomena in algebraic geometry and the theory
of quantum groups, as we shall see.
We also give theoretical and experimental results in support of these
hypotheses; cf. Chapter 4-6.
Canonical models
The proofs of quasi-polynomiality mentioned above also associate with each
structural constant under consideration a projective scheme, called the canon-
ical model, whose Hilbert function coicides with the stretching quasi-polynomial
associated with that structural constant, akin to the model associated by
Brion [Dh] with the Littlewood-Richardson coefficient. These canonical
models play a crucial role in the approach to the posivity hypotheses sug-
gested in Section 1.7.
1.6 The plethysm problem
We now give precise statements of these results and hypotheses for the
plethysm problem (Problem 1.1.2). It is the main prototype in this paper,
which illustrates the basic ideas. Precise statements for the more general
Problems 1.1.3 and 1.1.4 appear in Section 3.3.
As for the Littlewood-Richardson coefficients (cf.(1.1)), Kirillov [Ki] as-
sociates with a plethysm constant aπ
a stretching function
ãπλ,µ(n) = a
nλ,µ, (1.5)
and a generating function
Aπλ,µ(t) =
anπnλ,µt
(Note that µ is not stretched in these definitions.)
He conjectured that Aπ
(t) is a rational function. This is verified here
in a stronger form:
Theorem 1.6.1 (a) (Rationality) The generating function Aπλ,µ(t) is ratio-
(b) (Quasi-polynomiality) The stretching function ãπ
(n) is a quasi-polynomial
function of n. This is equivalent to saying that all poles of Aπλ,µ(t) are roots
of unity, and the degree of the numerator of Aπλ,µ(t) is strictly smaller than
that of the denominator.
(c) There exist graded, normal C-algebras S = S(aπ
) = ⊕nSn, and T =
T (aπ
) = ⊕nTn such that:
1. The schemes spec(S) and spec(T ) are normal and have rational singu-
larities.
2. T = SH , the subring of H-invariants in S, where H = GLn(C) as in
Problem 1.1.2,
3. The quasi-polynomial ãπλ,µ(n) is the Hilbert function of T . In other
words, it is the Hilbert function of the homogeneous coordinate ring of
the projective scheme Proj(T ).
(d) (Positivity) The rational function Aπλ,µ(t) can be expressed in a positive
form:
Aπλ,µ(t) =
h0 + h1t+ · · ·+ hdt
j(1− t
a(j))d(j)
, (1.6)
where a(j)’s and d(j)’s are positive integers,
j d(j) = d + 1, where d is
the degree of the quasi-polynomial ãπ
(n), h0 = 1, and hi’s are nonnegative
integers.
The specific rings S(aπλ,µ) and T (a
λ,µ) constructed in the proof of The-
orem 1.6.1 are very special. We call them canonical rings associated with
the plethysm constant aπ
. We call Y (aπ
) = Proj(S(aπ
)), and Z(aπ
Proj(T (aπ
)) the canonical models associated with aπ
. The canonical rings
are their homogenous coordinate rings.
It may be remarked that the analogue of Theorem 1.6.1 (b) for Littlewood-
Richardson coefficients has an elementary polyhedral proof. Specifically, the
Littlewood-Richardson stretching function c̃λα,β(n) of any type is a quasi-
polynomial since it coincides with the Ehrhart quasi-polynomial of the BZ-
polytope [BZ]. Similarly, the analogue of Theorem 1.6.1 (d) for Littlewood-
Richardson coefficients follows from Stanley’s positivity theorem for the
Ehrhart series of a rational polytope (which is implicit in [St3]). These
polyhderal proofs cannot be extended to the plethysm constant at this point,
since no polyhedral expression for them is known so far–in fact, this is a part
of the conjectural positivity hypothesis PH1 below. In contrast, Brion’s
proof in [Dh] of quasi-polynomiality of c̃λα,β(n) can be extended to prove
Theorem 1.6.1 since it does not need a polyhedral interpretation for aπλ,µ.
But Boutot’s result [Bou] that it relies on is nonelementary (because it needs
resolution of singularities in characteristic zero [Hi], among other things).
We also give an elementary (nonpolyhedral proof) for Theorem 1.6.1 (a) (ra-
tionality). But this does not extend to a proof of quasipolynomiality for all
n, which turns out to be a far delicate problem. It is crucial in the context
of saturated integer programming.
Theorem 1.6.2 (Finitely generated cone)
For a fixed partition µ, let Tµ be the set of pairs (π, λ) such that the
irreducible representation Vπ(H) of H = GLn(C) occurs in the irreducible
representation Vλ(G) of G = GL(Vµ(H)) with nonzero multiplicity. Then
Tµ is a finitely generated semigroup with respect to addition.
This is proved by an extension of Brion and Knop’s proof of the analogous
result for Littlewood-Richardson coefficients based on invariant theory. In
the case of Littlewood-Richardson coefficients, this again has an elementary
polyhedral proof [Z].
Theorem 1.6.3 (PSPACE)
Given partitions λ, µ, π, the plethysm constant aπ
can be computed in
poly(〈λ〉, 〈µ〉, 〈π〉) space.
The main observation in the proof of Theorem 1.6.3 is that the oldest
algorithm for computing the plethysm constant, based on the Weyl character
formula, can be efficiently parallelized so as to work in polynomial parallel
time using exponentially many processors. After this, the result follows from
the relationship between parallel and space complexity classes. It may be
remarked that the known algorithms for computing aπ
in the literature–
e.g., the one based on Klimyk’s formula [FH]–take exponential time as well
as space.
Theorems 1.6.1, 1.6.2 and 1.6.3 lead to the following conjectural sat-
uration and positivity hypotheses for the plethysm constant. These are
analogues of PH1,PH2,PH3, SH in Section 1.2 for Littlewood-Richardson
coefficients.
Hypothesis 1.6.4 (PH1)
For every (λ, µ, π) there exists a polytope P = P π
⊆ Rm such that:
(1) The Ehrhart quasi-polynomial of P coincides with the stretching quasi-
polynomial ãπ
(n) in Theorem 1.6.1. (This means P is given by a linear
system of the form
Ax ≤ b, (1.7)
where A does not depend on λ and π and b depends only on λ and π in a
homogeneous, linear fashion.) In particular,
aπλ,µ = φ(P ), (1.8)
where φ(P ) is equal to the number of integer points in P .
(2) The dimension m of the ambient space, and hence the dimension of P
as well, and the bitlength of every entry in A are polynomial in the bitlength
of µ and the heights of λ and π.
(3) Whether a point x ∈ Rm lies in P can be decided in poly(〈λ〉, 〈µ〉, 〈π〉, 〈x〉)
time. That is, the membership problem belongs to the complexity class P .
If x does not lie in P , then this membership algorithm also outputs, in the
spirit of [GLS], the specification of a hyperplane separating x from P .
The first statement here, in particular, would imply a positive, polyhedral
formula for a
, in the spirit of the known positive polyhedral formulae for
the Littlewood-Richardson coefficients in terms of the BZ- [BZ], hive [KT1]
or other types of polytopes [Dh]. It would also imply polyhedral proofs for
Theorem 1.6.1 (a), (b), (d), and Theorem 1.6.2. Conversely, Theorem 1.6.1
(a), (b), (d), and Theorem 1.6.2 constitute a theoretical evidence for exis-
tence of such a positive polyhedral formula.
The second statement in PH1 is justified by Theorem 1.6.3. Specifi-
cally, it should be possible to compute the number of integer points in P
in PSPACE in view of Theorem 1.6.3. If dim(P ) and m were exponential,
then the usual algorithms for this problem, e.g. Barvinok [Bar], cannot be
made to work in PSPACE. Indeed, it may be conjectured that the number
of integer points in a general polytope P ⊆ Rm can not be computed in
o(m) space.
The number of constraints in the hive [KT1] or the BZ-polytope [BZ]
for the Littlewood-Richardson coefficient cλ
is polynomial in the number
of parts of α, β, λ. In contrast, the number of constraints defining P π
be exponential in the 〈µ〉 and the number of parts of λ and π. But this is
not a serious problem. As long as the faces of the polytope P have a nice
description, the third statement in PH1 is a reasonable assumption. This
has been demonstrated in [GLS] for the well-behaved polytopes in combina-
torial optimization with exponentially many constraints. The situation in
representation theory should be similar, or even better. For example, the
facets of the hive polytope [KT1] are far nicer than the facets of a typical
polytope in combinatorial optimization.
It is known that membership in a polytope is a “very easy” problem.
Formally, if a polytope has polynomially many constraints, this problem
belongs to the complexity class NC ⊆ P [KR], the subclass of problems
with efficient parallel algorithms, which is very low in the usual complexity
hierarchy. Even if the number of constraints of P πλ,µ in PH1 is exponen-
tial, the membership problem may still be conjectured to be in NC (cf.
Remarkrnc)–which would be “very easy” compared to the decision problem
we began with (Problem 1.1.2). For this reason, PH1 is primarily a mathe-
matical positivity hypothesis as against PHflip (Hypothesis 1.1.6), and the
positive, polyhedral formula for aπ
in (1.8) is its main content.
The remaining positivity hypotheses are purely mathematical. They
generalize SH,PH2 and PH3 for the Littlewood-Richardson coefficients to
the plethysm constants. We turn their specification next. We can begin
by asking if the stretching quasipolynomial ãπ
(n) is strictly saturated or
positive. This need not be so. The recent article [Ro] shows that strict
saturation need not hold for the Kronecker coefficients, as was conjectured
in the earlier version of this paper. A similar phenomenon was also reported
in [GCT7, GCT8], where it was observed that the structural constants of
the nonstandard quantum groups associated with the plethysm problem (of
which the Kronecker problem is a special case) need not satisfy an analogue
of PH2. But it was observed there that the positivity (and hence saturation)
indices of these structural constants are small, though not always zero; eg.
see Figures 30,33,35 in [GCT8]. The same can be expected here. This is
also supported by the experimental evidence in [BOR] where too it may be
observed that the positivity index is small. Furthermore, in the special case
(n = 2) of the Kronecker problem analysed in [BOR], the saturation index
is zero for almost all Kronecker coefficients.
These considerations suggest:
Hypothesis 1.6.5 (SH)
(a): The saturation index (Definition 1.2.4) of ãπ
(n) is bounded by a poly-
nomial in the dimension of G in Problem 1.1.2 and the heights of λ and π.
This means there exist absolute nonnegative constants c and c′, independent
of n, λ, µ and π, such that the saturation index is bounded above by chc
where h = dimG+ htλ+ htπ.
(b): The quasi-polynomial ãπ
(n) is strictly saturated, i.e. the saturation
index is zero, for almost all λ, µ, π. Specifically, the density of the triples
(λ, µ, π) of total bit length N with nonzero aπ
for which the saturation index
is not zero is less than 1/N c
, for any positive constant c′′, as N → ∞.
A stronger form of (a) is:
Hypothesis 1.6.6 (PH2) The positivity index (Definition 1.2.2) of the
stretching quasi-polynomial ãπλ,µ(n) is bounded by a polynomial in the di-
mension of G and the heights of λ and π.
The following is another stronger form of SH (a). For this, we observe
that the positive rational form in Theorem 1.6.1 (d) is not unique. Indeed,
there is one such form for every h.s.o.p. (homogeneous sequence of param-
eters) of the homogenenous coordinate ring S; the a(j)’s in eq.(1.6) are the
degrees of these parameters.
Kirillov asked if the only possible pole of Aπ
is at t = 1–i.e. if a
(n) is
a polynomial. This is not so (cf. Section 6.2). But it may be conjectured that
the structural constants a(j)’s are small. Specifcally, consider an h.s.o.p. of
S with a (lexicographically) minimum degree sequence, and call the (unique)
positive rational form in Theorem 1.6.1 (d) associated with such an h.s.o.p.
minimal. The modular index χ(aπ
) of the plethysm constant is defined to
be the modular index (Definition 1.2.7) of this minimal positive form. Then:
Hypothesis 1.6.7 (PH3)
The function Aπ
(t) associated with aπ
has a positive rational form
with modular index bounded by a polynomial in the dimension of G and the
heights of λ and π.
More specifically, this is so for the minimial positive rational form of
(t) as above; i.e., the modular index χ(aπ
) is itself bounded by a poly-
nomial in the dimension of G and the heights of λ and π.
This is a conjectural analogue of a stronger form of PH3 for Littlewood-
Richardson coefficients (Hypothesis 1.2.6), which says that the modular in-
dex of a Littlewood-Richardson coefficient, defined similarly, is one. PH3
here would imply that the period of Aπ
(t) is smooth–i.e. has small prime
factors–though it may be exponential in the heights of λ, µ, π. It can be
shown that PH3 implies SH (a) (Section 3.3).
The following result addresses the second arrow in Figure 1.1 in the
context of the relaxed decision problem for the plethysm constant:
Theorem 1.6.8 The complexity theoretic positivity hypothesis PHflip (Hy-
pothesis 1.1.6) for the plethysm constant is implied by the mathematical
positivity hypotheses PH1 and SH above. Specifically, assuming PH1 and
(a) Nonvanishing of abπbλ,bµ for any b > ch
c′ , with c, c′, h as in SH, can be
decided in O(poly(〈λ〉, 〈µ〉, 〈π〉, 〈b〉)) time.
(b) There is an O(poly(〈λ〉, 〈µ〉, 〈π〉)) time algorithm for deciding if aπ
nonvanishing, which works correctly on almost all λ, µ and π; almost all
means the same as in SH.
Here (a) follows by applying Theorem 1.4.1 (2) to the polytope P π
PH1, and letting the positivity index estimate for this polytope be chc
; (b)
follows from Theorem 1.4.1 (3).
Evidence for the positivity hypotheses in special cases
Littlewood-Richardson coefficients are special cases of (generalized) plethym
constants. We have already seen that PH1 holds in this case, and that there
is considerable experimental evidence for PH2 and PH3 (Section 1.2). An-
other crucial special case of the plethym problem is the Kronecker prob-
lem (Problem 1.1.1)–in fact, this may be considered to be the crux of the
plethysm problem. It follows from the results in [GCT9] that PH1 holds for
the Kronecker problem when n = 2; the earlier known formulae [RW, Ro]
for the Kronecker coefficient in this case are not positive. It can also be seen
from the experimental evidence in [BOR] that the saturation and positivity
indices of the Kronecker coefficient, for n = 2, are very small, and almost
always zero. We also give in Chapter 6 additional experimental evidence for
PH2 for another basic special case of Problem 1.1.3, with H therein being
the symmetric group.
1.7 Towards PH1, SH, PH2,PH3 via canonial bases
and canonical models
In this section, we suggest an approach to prove PH1, SH, PH2 and PH3 for
the plethysm constant and the analogous hypotheses for the other structural
constants in Problems 1.1.3, and 1.1.4, with X = G/P or a class variety.
In the case of Littlewood-Richardson coefficients of type A, PH1 and SH
have purely combinatorial proofs. But it seems unrealistic to expect such
proofs of the saturation and positivity hypotheses for the plethysm and
other structural constants under consideration here given their substantially
higher complexity.
The approach that we suggest is motivated by the proof of PH1 for
Littlewood-Richardson coefficients of arbitrary types based on the canonical
(local/global crystal) bases of Kashiwara and Lusztig for representations of
Drinfeld-Jimbo quantum groups [Dh, Kas2, Li, Lu2, Lu4]. By a Drinfeld-
Jimbo quantum group we shall mean in this paper quantization Gq of a
complex, semisimple group G as in [RTF] that is dual to the Drinfeld-Jimbo
quantized enveloping algebra [Dri]. Canonical bases for representions of a
Drinfeld-Jimbo quantum group in type A are intimately linked [GrL] to the
Kazhdan-Lusztig basis for Hecke algebras [KL1, KL2]. A starting point for
the approach suggested here is:
Observation 1.7.1 (PH0) The homogeneous coordinate rings of the canon-
ical models associated by Brion with the Littlewood-Richardson coefficients
have quantizations endowed with canonical bases as per Kashiwara and Lusztig.
This is a consequence of the work of Kashiwara [Kas3] and Lusztig [Lu3,
Lu4]; see Proposition 4.2.1 for its precise statment. This is why we call the
models here canonical models.
We shall refer to the property above as the zeroeth positivity hypothesis
PH0. Positivity here refers to the deep characteristic positivity property of
the canonical basis proved by Lusztig: namely its multiplicative and comul-
tiplicative structure constants are nonnegative. For this reason, we say that
a canonical basis is positive. Similar positivity property is also known for
the Kazhdan-Lusztig basis [KL2]. The proofs of these positivity properties
are based on the Riemann hypothesis over finite fields (Weil conjectures)
[Dl] and the related work of Beilinson, Bernstein, Deligne [BBD].
The property above is called PH0 because it implies PH1 for Littlewood-
Richardson coefficients of arbitrary types. Specifically, the latter is a formal
consequence of the abstract properties of these canonical bases and is inti-
mately related to their positivity; cf. Section 4.2.1, and [Dh, Kas2, Li, Lu4].
The saturation hypothesis SH in type A [KT1] is a refined property of the
polyhedral formulae in PH1. In Section 4.2 we suggest an approach to prove
SH, PH2 and PH3 for arbitrary types based on the properties of these canon-
ical bases. All this indicates that for the Littlewood-Richardson problem
PH1, SH, PH2 and PH3 are intimately linked to PH0.
This suggests the following approach for proving PH1, SH, PH2 and PH3
for the plethysm and other structural constants under consideration in this
paper (cf. Section 4.2.2):
1. Construct quantizations of the homogeneous coordinate rings of the
canonical models associated with these structural constants,
2. Show that they have canonical bases in some appropriate sense thereby
extending PH0 to this general setting.
3. Prove PH1, SH, PH2, and PH3 by a detailed analysis and study of
these canonical bases as per this extended PH0, just as in the case of
Littlewood-Richardson coefficients.
Pictorially, this is depicted in Figure 1.2.
Quantizations of the homogeneous coordinate rings of the canonical
models associated with Littlewood-Richardson coefficients and their posi-
tive canonical bases are constructed using the theory Drinfeld-Jimbo quan-
tum group. In type A, it is intimately related to the theory of Hecke al-
gebras. But, as expected, the theories of Drinfeld-Jimbo quantum groups
and Hecke algebras do not work for the plethysm problem. What is needed
is a quantum group and a quantized algebra that can play the same role
in the plethysm problem that the Drinfeld-Jimbo quantum group and the
Hecke algebra play in the Littlewood-Richardson problem. These have been
constructed in [GCT4] for the Kronecker problem (Problem 1.1.1) and in
[GCT7] for the generalized plethysm problem (Problem 1.1.2). We shall call
them nonstandard quantum groups and nonstandard quantized algebras; cf.
Section 4.3 for their brief overview. In the special case of the Littlewood-
Richardson problem, these specialize to the Drinfeld-Jimbo quantum group
and the Hecke algebra, respectively. The article [GCT8] gives conjecturally
correct algorithms to construct canonical bases of the matrix coordinate
rings of the nonstandard quantum groups and of nonstandard algebras that
have conjectural positivity properties analogous to those of the canonical
Construction of quantizations of the coordinate rings of canonical models
Construction of canonical bases for these quantizations (PH0)
Positivity and saturation hypotheses PH1, SH |
Polynomial-time algorithms for the relaxed decision problems
Figure 1.2: Pictorial depiction of the approach
(global crystal) bases, as per Kashiwara and Lusztig, of the coordinate ring
of the Drinfeld-Jimbo quantum group, and the Kazhdan-Lusztig basis of the
Hecke algebra. These conjectures lie at the heart of the approach suggested
here, since they are crucial for the extension of PH0 (cf. Figure 1.2) to the
general setting here. Their verification seems to need substantial extension
of the work surrounding the Riemann hypothesis over finite fields mentioned
above.
1.8 Basic plan for implementing the flip
The main application of the results and hypotheses in this paper in the
context of the flip is the following result. As mentioned in Section 1.1, and
described in more detail in Sections 7.6-7.7, each lower bound problem, such
as the P vs. NP problem over C, is reduced in [GCT1, GCT2] to the prob-
lem of proving obstructions to embeddings among the class varieties that
arise in the problem. In Chapter 7 we define a robust obstruction, which is
an obstruction that is well behaved with respect to relaxation, and whose
validity (correctness) depends only on an appropriate PH1 but not SH. It is
conjectured that in each of the lower bound problems under consideration,
robust obstructions exist (Section 7.6.6). In the lower bound problems un-
der consideration, ultimately one is only interested in proving existence of
obstructions. So one may as well search for only robust obstructions.
Theorem 1.8.1 (cf. Chapter 7) Consider the P vs. NP or the NC vs.
P#P problem over C [GCT1]. Assume that the homogeneous coordinate
rings of the relevant class varieties [GCT1, GCT2] in this context have ra-
tional singularities. Also assume that the structural constants associated
with these class varieties satisfy analogous PH1 as specified in Chapter 7.
Then:
(a) The problem of verifying a robust obstruction in each of these problems
belongs to P , so also the relaxed form of the problem of verifying any ob-
struction (not necessarily robust).
(b) There exists an explicit family of robust obstructions in each of these
problems assuming an additional hypothesis OH specified in Chapter 7; the
meaning of the term explicit is also given there.
(b) The problem of deciding existence of a geometric obstruction also belongs
to P , assuming a stronger form of PH1 specified in Chapter 7. Here geomet-
ric obstruction is a simpler type of robust obstruction, defined in Chapter 7,
which is conjectured to exist in the lower bound problems under considera-
tion.
For a precise statement of this theorem, see Chapter 7.
This theorem needs only PH1, but not SH, which is only needed to
argue why robust obstructions should exist (Section 7.6.6), and furthermore,
it is only needed for Problems 1.1.1-1.1.3 and not for the GIT Problem
1.1.4. Thus PH1 is the main positivity hypothesis of GCT in the context
proving existence of (robust) obstructions for the lower bound problems
under consideration.
A basic plan for implementing the flip suggested by the considerations
above is summarized in Figure 1.3. It is an elaboration of Figure 1.1. Ques-
tion marks in the figure indicate open problems.
1.9 Organization of the paper
The rest of this paper is organized as follows.
Negative hypotheses in complexity theory (Lower bound problems)
The flip
Positive hypotheses in complexity theory (Upper bound problems)
Saturated and positive integer programming, and
the quasi-polynomiality results in this paper
Mathematical saturation and positivity hypotheses: PH1,SH (PH2,3)
Construction of the canonical models in this paper, and
construction of the quantum groups in GCT4,7
(PH0): Construction of quantizations of the coordinate
rings of the canonical models and their canonical bases
(?): Problems related to the Riemann Hypothesis over finite
fields, and their generalizations
Figure 1.3: A basic plan for implementing the flip
In Chapter 2 we describe the basic complexity theoretic notions that we
need in this paper and describe their significance in the context of represen-
tation theory.
In Chapter 3, we give a polynomial time algorithm for saturated integer
programming (Theorem 1.4.1), and give precise statements of the results
and positivity hypotheses for Problems 1.1.3 and 1.1.4 (with X = G/P or
a class variety) mentioned in Section 1.5. These generalize the ones given
in Section 1.6 for the plethysm constant. The framework of saturated in-
teger programming in this paper may be applicable to many other struc-
tural constants in representation theory and algebraic geometry, such as the
Kazhdan-Lusztig polynomials (cf. Sections 3.7).
In Chapter 4, we prove the basic quasi-polynomiality results–Theorem 1.6.1
and its generalizations for Problems 1.1.3 and 1.1.4. We also define canonical
models for the structural constants under consideration, and briefly describe
the relevance of the nonstandard quantum groups and the related results in
[GCT4, GCT7, GCT8] in the context of quantizing the coordinate rings of
these canonical models and extending PH0 to them (Figure 1.2).
In Chapter 5, we prove the basic PSPACE results–Theorem 1.6.3 and
its extensions for the various cases of Problem 1.1.3.
In Chapter 6, we give experimental evidence for the positivity hypotheses
PH2 and PH3 in some special cases of the Problems 1.1.1-1.1.4.
In Chapter 7, we describe an application (Theorem 1.8.1) of the re-
sults and positivity hypotheses in this paper to the problem of verifying or
discovering a robust obstruction, i.e., a “proof of hardness” [GCT2] in the
context of the P vs. NP and the permanent vs. determinant problems in
characteristic zero.
1.10 Notation
We let 〈X〉 denote the total bitlength of the specification of X. Here X can
be an integer, a partition, a classifying label of an irreducible representation
of a reductive group, a polytope, and so on. The exact meaning of 〈X〉 will
be clear from the context. The notation poly(n) means O(na), for some
constant a. The notation poly(n1, n2, . . .) similarly means bounded by a
polynomial of a constant degree in n1, n2, . . .. Given a reductive group H,
Vλ(H) denotes the irreducible representation of H with the classifying label
λ. The meaning depends on H. Thus if H = GLn(C), λ is a partition and
Vλ(H) the Weyl module indexed by λ, if H = Sm, then λ is a partition of
size |λ| = m, and Vλ(H) the Specht module indexed by λ, and so on.
Chapter 2
Preliminaries in complexity
theory
In this chapter, we recall basic definitions in complexity theory, introduce
additional ones, and illustrate their significance in the context of represen-
tation theory.
2.1 Standard complexity classes
As usual, P , NP and PSPACE are the classes of problems that can be
solved in polnomial time, nondeterministic polynomial time, and polyno-
mial space, respectively. The class of functions that can be computed in
polynomial time (space) is sometimes denoted by FP (resp. FPSPACE).
But, to keep the notation simple, we shall denote these classes by P and
PSPACE again.
Let SPACE(s(N)) denote the class of problems that can be solved in
O(s(N)) space on inputs of bit length N ; by convention s(N) counts only
the size of the work space. In other words, the size of the input, which is on
the read-only input tape, and the output, which is on the write-only output
tape is not counted. Hence s(N) can be less than the size of the input or the
output, even logarithmic compared to these sizes. The class space(log(N))
is denoted by LOGSPACE.
An algorithm is called strongly polynomial [GLS], if given an input x =
(x1, . . . , xk),
1. the total number of arithmetic steps (+, ∗,− and comparisones) in the
algorithm is polynomial in k, the total number of input parameters,
but does not depend 〈x〉, where 〈x〉 =
i〈xi〉 denotes the bitlength of
2. the bit length of every intermediate operand in the computation is
polynomial in 〈x〉.
Clearly, a strongly polynomial algorithm is also polynomial. let strong P ⊆
P denote the subclass of problems with strongly polynomial time algorithms.
The counting class associated with NP is denoted by #P . Specifically,
a function f : Nk → N, where N is the set of nonnegative integers, is in #P
if it has a formula of the form:
f(x) = f(x1, · · · , xk) =
χ(x, y), (2.1)
where χ is a polynomial-time computable function that takes values 0 or 1,
and y runs over all tuples such that 〈y〉 = poly(〈x〉). The formula (2.1) is
called a #P -formula. An important feature of a #P -formula in the context
of representation theory is that it is positive; i.e., it does not contain any
alternating signs.
The formula (2.1) is called a strong #P -formula, if, in addition, l is
polynomial in k and χ is a strongly polynomial-time computable function.
Let strong #P be the class of functions with strong #P -fomulae.
It is known and easy to see that
#P ⊆ PSPACE. (2.2)
2.1.1 Example: Littlewood-Richardson coefficients
By the Littlewood-Richardson rule [FH], the coefficient cλ
(cf. Prob-
lem 1.2.1) in type A is given by:
cλα,β =
χ(T ), (2.3)
where T runs over all numbering of the skew shape λ/α, and χ(T ) is 1 if
T is a Littlewood-Richardson skew tableau of content β, and zero, other-
wise. The total number of entries in T is quadratic in the total number of
nonzero parts in α, β, λ, and the number of arithmetic steps needed to com-
pute χ(T ) is linear in this total number. Hence (2.3) is a strong #P -formula,
and Littlewood-Richardson function c(α, β, λ) = cλα,β belongs to strong #P .
It may be remarked that the character-based formulae for the Littlewood-
Richardson coefficients are not #P -formulae, since they involve alternat-
ing signs. But the algorithms based on the these formulae for computing
Littlewood-Richardson coefficients run in polynomial space. Thus, from the
perspective of complexity theory, the main significance of the Littlewood-
Richardson rule is that it puts the problem, which at the surface is only in
PSPACE, in its smaller subclass (strong) #P .
Though the Littlewood-Richardson rule is often called efficient in the
representation theory literature, it is not really so from the perspective of
complexity theory. Because computation of cλ
using this formula takes
time that is exponential in both the total number of parts of α, β and λ, and
their bit lengths. This is inevitable, since this problem is #P -complete [N].
Specifically, this means there is no polynomial time algorithm to compute
, assuming P 6= NP .
As remarked in earlier, nonzeroness (nonvanishing) of cλ
can be decided
in poly(〈α〉, 〈β〉, 〈λ〉) time; [DM2, GCT3, KT1]. Furthermore, the algorithm
in [GCT3] is strongly polynomial; i.e., the number of arithemtic steps in
this algorithm is a polynomial in the total number of parts of α, β, λ, and
does not depend on the bit lengths of α, β, λ. Hence the problem of deciding
nonvanishing of cλ
(type A) belongs to strong P .
The discussion above shows that the Littlewood-Richardson problem is
akin to the problem of computing the permanent of an integer matrix with
nonnegative coefficients. The latter is known to be #P -complete [V], but
its nonvanishing can be decided in polynomial time, using the polynomial-
time algorithm for finding a perfect matching in bipartite graphs [Sc]. If
the positivity hypotheses in this paper hold, the situation would be similar
for many fundamental structural constants in representation theory and
algebraic geometry in a relaxed sense.
2.2 Convex #P
Next we want to introduce a subclass of #P called convex #P .
Given a polytope P ⊆ Rl, let χP denote the characteristic (membership)
function of P : i.e., χP (y) = 1, if y ∈ P , and zero otherwise. We say that
f = f(x) = f(x1, . . . , xk) has a convex #P -formula if, for every x ∈ Z
there exists a convex polytope (or, more generally, a convex body) Px ⊆ R
such that
1. The membership function χPx(y) can be computed in poly(〈x〉, 〈y〉)
time, each integer point in Px has O(poly(〈x〉)) bitlength, and
f(x) = φ(Px), (2.4)
where φ(Px) denotes the number of integer points in Px. Equivalently,
f(x) =
χPx(y), (2.5)
where y runs over tuples in Zl of poly(〈x〉) bitlength, and χPx denotes
the membership function of the polytope Px.
Equation (2.5) is similar to eq.(2.1). The main difference is that χ is now
the membership function of a convex polytope. Clearly, eq.(2.5), and hence,
eq.(2.4) is a #P -formula, when χPx can be computed in polynomial time.
Let convex #P be the subclass of #P consisting of functions with convex
#P -formulae.
We say that eq.(2.4) is a strongly convex #P -formula, if the character-
istic function of Px is computable in strongly polynomial time. Let strongly
convex #P be the subclass of #P consisting of functions with strongly con-
vex #P -formulae.
We do not assume in eq.(2.4) that the polytope Px is explicitly specified
by its defining constraints. Rather, we only assume, following [GLS], that
we are given a computer program, called a membership oracle, which, given
input parameters x and y, tells whether y ∈ Px in poly(〈x〉, 〈y〉) time.
If the number of constraints defining Px is polynomial in 〈x〉, then it
is possible to specify Px by simply writing down these constraints. In this
case the membership question can be trivially decided in polynomial time–in
fact, even in LOGSPACE–by verifying each constraint one at a time. This
would not work if Px has exponentially many constraints. In good cases,
it is possible to answer the membership question in polynomial time even
if Px has exponentially many facets. Many such examples in combinatorial
optimization are given in [GLS]. One such illustrative example in repre-
sentation theory is given in Section 2.2.2. The polytopes that would arise
in the plethysm and other problems of main interest in this paper are also
expected to be of this kind.
We now illustrate the notion of convex #P with a few examples in rep-
resentation theory.
2.2.1 Littlewood-Richardson coefficients
A geeneralized Littlewood-Richardson coefficient cλα,β for arbitrary semisim-
ple Lie algebra (Problem 1.2.1) has a strong, convex #P -formula, because
cλα,β = φ(P
α,β),
where P λα,β is the BZ-polytope [BZ] associated with the triple (α, β, λ).
It is easy to see from the description in [BZ] that the number of defin-
ing constraints of P λα,β is polynomial in the total number of parts (coor-
dinates) of α, β, λ. Given α, β, λ, these constraints can be computed in
strongly polynomial time. Hence, the membership problem for P λ
belongs
to LOGSPACE ⊆ P . It follows that the Littlewood-Richardson function
c(α, β, λ) = cλ
belongs to strongly convex #P .
2.2.2 Littlewood-Richardson cone
We now give a natural example of a polytope in representation theory, the
number of whose defining constraints is exponential, but whose membership
function can still be computed in polynomial time.
Given a complex, semisimple, simply connected groupG, let the Littlewood-
Richardson semigroup LR(G) be the set of all triples (α, β, λ) of dominant
weights of G such that the irreducible module Vλ(G) appears in the tensor
product Vα(G) ⊗ Vβ(G) with nonzero multiplicity [Z]. Brion and Knop [El]
have shown that LR(G) is a finitely generated semigroup with respect to
addition. This also follows from the polyhedral expression for Littlewood-
Richardson coefficients in terms of BZ-polytopes [Z]. Let LRR(G) be the
polyhedral cone generated by LR(G).
When G = GLn(C), the facets of LRR(G) have an explicit description by
the affirmative solution to Horn’s conjecture in [Kl, KT1]. But their number
can be quite large (possibly exponential). Nevertheless, membership of any
rational (α, β, λ) (not necessarily integral) in LRR(G) can be decided in
strongly polynomial time.
This is because LRR(G) is the projection of a polytope P (G), the num-
ber of whose constraints is polynomial in the heights of α, β, λ [Z]. If
φ : P (G) → LR(G) is this projection, we can choose P (G) so that for
any integral (α, β, λ), φ−1(α, β, λ) is the BZ-polytope associated with the
triple (α, β, λ). To decide if (α, β, λ) ∈ LR(G), we only have to decide if the
polytope φ−1(α, β, λ) is nonempty. This can be done in strongly polynomial
time using Tardos’ linear programming algorithm [Ta].
2.2.3 Eigenvalues of Hermitian matrices
Here is another example of a polytope in representation theory with expo-
nentially many facets, whose membership problem can still belong to P .
For a Hermitian matrix A, let λ(A) denote the sequence of eigenvalues
of A arranged in a weakly decreasing order. Let HEr be the set of triple
(α, β, λ) ∈ Rr such that α = λ(A + B), β = λ(A), λ = λ(B) for some
Hermitian matrices A and B of dimension r. It is closely related to the
Littlewood-Richardson semisgroup LRr = LR(GLr(C)): HEr ∩ P
r = LRr,
where Pr is the semigroup of partitions of length ≤ r. I. M. Gelfand asked
for an explicit description of HEr. Klyachko [Kl] showed that HEr is a
convex polyhedral cone. An explicit description of its facets is now known
by the affirmative answer to Horn’s conjecture. But their number may be
exponential. Hence, membership in HEr is still not easy to check using this
explicit description. This leads to the following complexity theoretic variant
of Gelfand’s question:
Question 2.2.1 Does the memembership problem for HEr belong to P?
Given that the answer is yes for the closely related LRr = LR(GLr(C))
(Section 2.2.2), this may be so. If HEr were a projection of some polytope
with polynomially many facets, this would follow as in Section 2.2.2. But
this is not necessary. For example, Edmond’s perfect matching polytope for
non-bipartite graphs is not known to be a projection of any polytope with
polynomially many constraints. Still the associated membership problem
belongs to P [Sc].
2.3 Separation oracle
Suppose P ⊆ Rl is a convex polytope whose membership function χP is
polynomial time computable. If χP (y) = 0 for some y ∈ R
r, it is natural to
ask, in the spirit of [GLS], for a “proof” of nonmembership in the form of a
hyperplane that separates y from P .
In this paper, we assume that all polytopes are specified by the separation
oracle. This is a computer program, which given y, tells if y ∈ P , and if
y 6∈ P , returns such a separting hyperplane as a proof of nonmembership. We
assume that the hyperplane is given in the form l = 0, where a linear function
l such that P is contained in the half space l ≥ 0, but l(y) < 0. Furthermore.
we assume that P is a well-described polyhedron in the sense of [GLS]. This
means P is specified in the form of a triple (χP , n, φ), where P ⊆ R
n, χP
is a program for computing the membership function given y ∈ Rn, and
there exists a system of inequalities with rational coefficients having P as
its solution set such that the encoding bit length of each inequality is at
most φ. We define the encoding length 〈P 〉 of P as n+ φ. We also assume
that the separation oracle works in O(poly(〈P 〉, 〈y〉) time.
For example, the polynomial time algorithm for the membership function
of the Littlewood-Richardson cone (cf. Section 2.2.2) can be easily modified
to return a separating hyperplane as a proof of nonmembership.
In what follows, we shall assume, as a part of the definition of a convex
#P -formula, that Px in (2.4) is a well-described polyhedron specified by
a separation oracle that works in polynomial time with 〈Px〉 = poly(〈x〉).
These additional requirements are needed for the saturated integer program-
ming algorithm in Chapter 3.
Chapter 3
Saturation and positivity
In this chapter we describe (Section 3.1) a polynomial time algorithm for
saturated and positive integer programming (Theorem 1.4.1). In Section 3.3
we state the main results and positivity hypotheses for the relaxed forms of
Problem 1.1.3 and Problem 1.1.4, with X = G/P or a class variety therein.
Together they say that these relaxed decision problems can be efficiently
transformed into saturated (more strongly, positive) integer programming
problems, and hence can be solved in polynomial time.
3.1 Saturated and positive integer programming
We begin by proving Theorem 1.4.1.
Let P ⊆ Rn be a polytope given by a separation oracle (Section 2.3).
Let 〈P 〉 be the encoding length of P as defined in Section 2.3. An oracle-
polynomial time algorithm [GLS] is an algorithm whose running time is
O(poly(〈P 〉)), where each call to the separation oracle is computed as one
step. Thus if the separation oracle works in polynomial time, then such
an algorithm works in polynomial time in the usual sense. Let φ(P ) be
the number of integer points in P . Let fP (n) = φ(nP ) be the Ehrhart
quasi-polynomial [St1] of P . Let l(P ) be the least period of fP (n), if P
is nonempty. Let fi,P (n), 1 ≤ i ≤ l(P ), be the polynomials such that
fP (n) = fi,P (n) if n = i modulo l(P ). Let FP (t) =
n≥0 fP (n)t
n denote
the Ehrhart series of P . It is a rational function.
Theorem 3.1.1 (a) The index of fP (n), index(fP ), can be computed in
oracle-polynomial time, and hence, in polynomial time, assuming that the
oracle works in polynomial time. Furthermore, if index(fP ) 6= 0 (i.e. if P
is nonempty), then fi,P (n) is not an identically zero polynomial for every i
divisible by index(fP ).
(b) The saturated, and hence, positive integer programming problem, as de-
fined in Section 1.4, can be solved in oracle-polynomial time. Here it is
assumed that the specification of P also contains the saturation index esti-
mate sie(P ), or the positivity index estimate pie(P ), and that the bitlength
of this estimate is O(poly(〈P 〉)). Given a relaxation parameter c > sie(P )
(or pie(P )), the problem is to determine if cP contains an integer point in
O(poly(〈P 〉, 〈c〉)) time.
(c) Suppose {Px} is a family of polytopes, indexed by some parameter x,
with the following property: wherenver Px is nonempty, the Ehrhart quasi-
polynomial fPx(n) is “almost always” strictly saturated. Almost always
means, the density of x’s of bitlength ≤ N , with nonempty Px for which
fPx(n) is not strictly saturated is less than 1/N
c′′ , for any positive c′′, as
N → 0. We also assume that Px is given by a separation oracle that works in
O(poly(〈x〉)) time, where 〈x〉 is the bitlength of x, and 〈Px〉 = O(poly(〈x〉)).
Then there exists a O(poly(〈x〉)) time algorithm for deciding if Px con-
tains an integer point that works correctly “almost always”; i.e., on almost
all x.
Proof:
Nonemptyness of P can be decided in oracle-polynomial time using the
algorithm of Grötschel, Lovász and Schrijver [GLS] (cf. Theorem 6.4.1
therein). An extension of this algorithm, furthermore, yields a specifica-
tion of the affine space span(P ) containing P if P is nonempty (cf. Theo-
rems 6.4.9, and 6.5.5 in [GLS]). Specifically, it outputs an integral matrix
C and an integral vector d such that span(P ) is defined by Cx = d. This
final specification is exact, even though the first part of the algorithm in
[GLS] uses the ellipsoid method. Indeed, the use of simultneous diophan-
tine approximation based on basis reduction in lattices is precisely to ensure
this exactness in the final answer. This is crucial for the next step of our
algorithm.
If P is empty, index(fP ) = 0. So assume that it is nonempty. Let C̄ be
the Smith normal form of C; i.e., C̄ = ACB for some unimodular matrices
A and B, where the leftmost principal submatrix of C̄ is a diagonal, integral
matrix, and all other columns are zero.
The matrices C̄, A and B can be computed in polynomial time using
the algorithm in [KB]. After a unimodular change of coordinates, by letting
z = B−1x, span(P ) is specified by the linear system C̄z = d̄ = Ad. The
equations in this system are of the form:
c̄izi = d̄i, (3.1)
i ≤ codim(P ), for some integers c̄i and d̄i. By removing common factors if
necessary, we can assume that c̄i and d̄i are relatively prime for each i. Let
c̃ be the l.c.m. of c̄i’s.
The statement (a) follows from:
Claim 3.1.2 index(fP ) = c̃ and fi,P (n) is not an identically zero polyno-
mial for every i divisible by c̃.
Proof of the claim: Indeed, nP = {nz | z ∈ P} contains no integer point
unless c̃ divides n. Hence, it is easy to see that FP (t) = FP̄ (t
c̃), where
FP̄ (x) is the Ehrhart series of the dilated polytope P̄ = c̃P . By eq.(3.1),
the equations defining P̄ are:
zi = d̄i(c̃/c̄i), (3.2)
Clearly, c̃ divides the least period l(P ) of fP , and l(P̄ ) = l(P )/c̃ is the period
of the Ehrhart quasipolynomial fP̄ (n). It suffices to show that the index of
fP̄ (n) is one and that fj,P̄ (n) is not an identically zero polynomial for every
1 ≤ j ≤ l(P̄ ). This is equivalent to showing that P̄ contains a point z with
with zi = ai/b, for some integers ai’s and b such that b = j modulo l(P̄ ).
Let us call such a point j-admissible. Because of the form of the equations
(3.2) defining span(P̄ ), we can assume, without loss of generality, that P̄ is
full dimensional. This means the system (3.2) is empty. Then this follows
from denseness of the set of j-admissible points. This proves the claim, and
hence (a).
(b): Let s = sie(P ) be the given saturation index estimate. This means
fP (n + s) is strictly saturated. This in conjunction with (a) implies that,
given a relaxation parameter c > s, cP contains an integer point, iff c
is divisible by index(fP ) (by letting n = c − s). This can be checked in
O(poly(〈P 〉, 〈c〉)) time since index(fP ) can be computed in polynomial time
by (a).
(c) The algorithm computes index(fPx) and says “Probably Yes”if the index
is one, and “No” otherwise. Since the saturation index of fPx(n) is zero
almost always, by the argument in (b) with s = 0 and c = 1, “Probably
Yes” really means “Yes” almost always. Q.E.D.
The algorithm in (c) has one drawback. If the answer is “Probably
Yes”, we have no easy way of checking if Px really contains an integer point.
Ideally, we would like an algorithm that says “Yes”, with an integer point
in Px as a proof certificate, or “No”, or “Unsure”, and the density of x’s on
which it says “Unsure” should be very small. This problem can be overcome
if the family {Px} has the following stronger property, akin to the family of
hive polytopes [KT1]: there is a linear function lx such that, for almost all x,
if {Px} is nonempty, then the lx-optimum of Px is integral (this is stronger
than saying that fPx(n) is strictly saturated). In this case, the algorithm in
(c) can be extended to yield the integral lx-optimum as a proof certificate. If
the lx-optimum is not integeral, the algorithm says “Unsure”. PH1 and SH
(Section 1.6) for the plethysm (and more generally, the subgroup restriction)
problem may be strengthened by stipulating that the polytopes therein have
this property. But this is not needed in this paper.
We note down one corollary of the proof of Theorem 3.1.1 (this should
be well known):
Proposition 3.1.3 The rational function FP (t) = FP̄ (t
c̃), where FP̄ (x) is
the Ehrhart series of the dilated polytope P̄ = c̃P , and c̃ is the index of
fP (n).
If P is explicitly specified in the form a linear system
Ax ≤ b, (3.3)
where A is an m × n matrix, b an m vector and m = poly(n), then the
following stronger version of Theorem 3.1.1 holds. Let 〈A〉 and 〈A, b〉 denote
the bitlength of the specification of A and of the linear system (3.3).
Theorem 3.1.4 Suppose P is specified in terms of an explicit linear system
(3.3). Then the index of the Erhart quasi-polynomial fP (n) can be computed
in poly(〈A, b〉) time, using poly(〈A〉) arithmetic operations.
Thus, saturated, and hence, positive integer programming problem spec-
ified in the form (3.3) can be solved in in poly(〈A, b, c〉) time, where c is the
relaxation parameter, using poly(〈A〉) arithmetic operations.
Proof: This is proved exactly as Theorem 3.1.1, but with Tardos’ strongly
polynomial time algorithm for combinatorial linear programming [Ta] used
in place of the algorithm in [GLS]. Q.E.D.
3.1.1 A general estimate for the saturation index
Now we give a general estimate for the saturation index of any polytope P
with a specification of the form
Ax ≤ b, (3.4)
where A is an m × n matrix, m possibly exponential. Let ‖P‖ = n + ψ,
where ψ is the maximum bitlength of any entry of A. Trivially, ‖P‖ ≤ 〈P 〉.
We do not assume that we know the specification (3.4) of P explicitly. We
only assume that it exists, and that we are told ‖P‖. Then:
Theorem 3.1.5 The saturation index of P is O(2poly(‖P‖)). Thus the
bitlength of the saturation index is O(poly(‖P‖)).
Conjecturally, this also holds for the positivity index. This estimate is
very conservative, but useful when no better estimate is available.
Proof: There exists a triangulation of P into simplices such that every vertex
of any simplex is also a vertex of P . Then
fP (n) =
f∆(n),
where ∆ ranges over all open simplices in this triangulation; a zero-dimensional
open simplex is a vertex. The saturation index of fP (n) is clearly bounded
by the maximum of the saturation indices of f∆(n).
Hence, we can assume, without loss of generality, that P is an open sim-
plex. Let v0, . . . , vn be its vertices. Then, by Ehrhart’s result (cf. Theorem
1.3 in [st5]),
FP (t) =
i hit
j=0(1− t
, (3.5)
where h0 = 1, hi’s are nonnegative, and aj is the least positive integer
such that ajvj is integral. By Cramer’s rule, the bit length of each aj is
poly(‖P‖). Without loss of generality, we can also assume that aj’s are
relatively prime. Otherwise, the estimate on the saturation index below has
to be multiplied by the g.c.d. of aj ’s. Then the result follows by applying
the following lemma to FP (t), since 〈aj〉 = O(poly(‖P‖)). Q.E.D.
Lemma 3.1.6 Let f(n) be a quasipolynomial whose generating function
F (t) has a positive form
F (t) =
i hit
j=0(1− t
, (3.6)
where h0 = 1, hi’s are nonnegative, and aj’s are positive and relatively
prime. Let a = max{aj}. Then the saturation index s(f) of f(n) is
O(poly(a, n)).
Proof: Let g(n) be the quasi-polynomial whose generating function G(t) =
g(n)tn is 1/
j=0(1− t
aj ). It is known that this is the Ehrhart quasipoly-
nomial of the polytope N(a0, . . . , an) defined by the linear system
ajxj = 1, xj > 0.
The saturation index s(g) of g(n) is bounded by the Frobenius number
associated with the set of integers {aj}–this is the largest positive integer m
such that the diophantine equation
ajxj = m
has no positive integeral solution (x0, . . . , xn). It is known (e.g. [BDR]) that
the Frobenius number is bounded by
a0a1a2(a0 + a1 + a2) = O(poly(a)),
assumming that a0 ≤ a1 . . .. Hence, s(g) = O(poly(a)).
Since f(n) is a quasi-polynomial, the degree of the numerator of F (t) is
less than the degree of the denominator. Thus the maximum value of i that
occurs in (3.6) is an.
Let gi(n), i ≤ an, be the quasi-polynomial whose generating function is
j=0(1− t
aj ). Then
s(gi) ≤ i+ s(g) = O(poly(a, n)).
Since, hi’s in (3.6) are nonnegative, s(f) = max s(gi). The result follows.
Q.E.D.
3.1.2 Extensions
We now mention a few straightforard extensions of Theorem 3.1.1.
First, it is not necessary that P be a closed polytope. We can allow
it to be half-closed. Specifically, it can be a solution set of a system of
inequalitites of the form:
A1x ≤ b1 and A2x < b2, (3.7)
where we have allowed strict inequalities. The function FP (n) = φ(nP ), the
number of integer points in nP , is again a quasi-polynomial. Hence, the
notions of saturation and positivity can be generalized to this setting in a
natural way.
Second, the algorithm in Theorem 3.1.1 (b) only needs a nonnegative
number s(P ) such that, for any positive integer c > s(P ):
Saturation guarantee: If the affine span of cP , contains an integer point,
then cP is guaranteed to contain an integer point.
If s(P ) = sie(P ), then this guarantee holds, as can be seen from the
proof of Theorem 3.1.1.
3.1.3 Is there a simpler algorithm?
Though the algorithm for saturated integer programming in Theorem 3.1.1
is conceptually very simple, in reality it is quite intricate, because the work
of Grötschel, Lovász and Schrijver [GLS] needs a delicate extension of the el-
lipsoid algorithm [Kh] and the polynomial-time algorithm for basis reduction
in lattices due to Lenstra, Lenstra and Lovász [LLL]. As has been empha-
sized in [GLS], such a polynomial-time algorithm should only be taken as a
proof of existence of an efficient algorithm for the problem under consider-
ation. It may be conjectured that for the problems under consideration in
this paper such simple, combinatorial algorithms exist. But for the design
of such algorithms, saturation alone does not suffice. The stronger property
(PH3), and more, is necessary. We shall address this issue in Section 3.6.
3.2 Littlewood-Richardson coefficients again
Theorem 3.1.4 applied to the BZ-polytope [BZ], with saturation index esti-
mate equal to zero, specializes to the following in the setting of the Littlewood-
Richardson problem (Problem 1.2.1):
Theorem 3.2.1 [GCT5] Assuming SH (Hypothesis 1.2.5), nonvanishing of
, given α, β, λ, can be decided in strongly polynomial time (Section 2.1)
for any semisimple classical Lie algebra G.
It is assumed here that α, β, λ are specified by their coordinates in the
basis of fundamental weights. For type A, this reduces to the result in
[GCT3], which holds unconditionly.
The saturation conjecture for type A arose [Z] in the context of Horn’s
conjecture and the related result of Klyachko [Kl]. We now turn to implica-
tions of Theorem 3.2.1 in this context.
Given a complex, semisimple, simply connected, classical group G, let
LR(G) be the Littlewood-Richardson semigroup as in Section 2.2.2. The
following is a natural generalization of the problem raised by Zelevinsky [Z]
to this general setting:
Problem 3.2.2 Give an efficient description of LR(G).
Zelevinsky asks for a mathematically explicit description. This is a com-
puter scientist’s variant of his problem.
Let LRR(G) be the polyhedral convex cone generated by LR(G). For
G = GLn(C), by the saturation theorem, a triple (α, β, λ) of dominant
weights belongs to LR(G) iff it belongs to LRR(G). Assuming SH (Hy-
pothesis 1.2.5), Theorem 3.2.1 provides the following efficient description
for LR(G) in general. Recall that the period of the Littlewood-Richardson
stretching polynomial c̃λ
(n) divides a fixed constant d(G), which only de-
pends on the types of simple factors of G [DM2, GCT5]. Let αi’s denote
the coordinates of α in the basis of fundamental weights.
Corollary 3.2.3 (a) Assuming SH, whether a given (α, β, λ) belongs to
LR(G) can be determined in strongly polynomial time.
(b) There exists a decomposition of LRR(G) into a set of polyhedral cones,
which form a cell complex C(G), and, for each chamber C in this complex,
a set M(C) of O(rank(G)2) modular equations, each of the form
aiαi +
biβi +
ciλi = 0 (mod d),
for some d dividing d(G), such that
1. SH (Hypothesis 1.2.5) is equivalent to saying that: (α, β, λ) ∈ LR(G)
iff (α, β, λ) ∈ LRR(G) and (α, β, λ) satisfies the modular equations in
the set M(Cα,β,λ) associated with the cone Cα,β,λ containing α, β, λ.
2. Given (α, β, λ), whether (α, β, λ) ∈ LRR(G) can be determined in
strongly polynomial time (cf. Section 1.2.5).
3. If so, the cone Cα,β,λ and the associated set M(C
) of modular equa-
tions can also be determined in strongly polynomial time. After this,
whether (α, β, λ) satisfies the equations in M(Cλα,β) can be trivially
determined in strongly polynomial time.
Proof: (a) is a consequence of Theorem 3.2.1. (b) follows from a careful
analysis of the algorithm therein; see the proof of a more general result
(Theorem 4.4.2) later. Q.E.D.
We call the labelled cell complex C(G), in which each cell C ∈ C(G)
is labelled with the set of modular equations M(C), the modular complex,
associated with LRR(G). When G = SLn(C), the modular complex is
trivial: it just consists of the whole cone LRR(G) with only one obvious
modular equation attached to it. But, for general G, the modular complex
and the map C → M(C) are nontrivial. We do not know their explicit
description. Corollary 3.2.3 says that, given x = (α, β, λ), whether x ∈
LRR(G), and whether the relevant modular equations are satisfied can be
quickely verified on a computer, though the modular equations cannot be
easily determined and verified by hand, as in type A. This is the main
difference between type A and general types.
This naturally leads to:
Question 3.2.4 Is there a mathematically explicit description of the mod-
ular complex C(G) for a general G?
3.3 The saturation and positivity hypotheses
Now let f(x), x ∈ Nk, be a counting function associated with a structural
constant in representation theory or algebraic geometry. Here x denotes
the sequence of parameters associated with the constant. Let 〈x〉 denote
the bitlength of x. Let ‖x‖ and rank(x) denote its combinatorial size and
combinatorial rank–these measure complexity of the nonstretchable part in
the specification of x and will be specified later for the f ’s of interest in this
paper.
For example, in the Littlewood-Richardson problem, x is the triple (α, β, λ),
f(x) = f(α, β, λ) = cλ
, 〈x〉 is the total bitlength of the coordinates of
α, β, λ, ‖x‖ is the total number of coordinates of α, β and λ, and rank(x) =
‖x‖. The number of coordinates does not change during stretching, and
hence, constitute the nonstretchable part of the input specification here.
Assume that f(x) is nonnegative for all x ∈ Nk, Then we can successively
ask the following questions:
1. Does f ∈ PSPACE? That is, can f(x) be computed in poly(〈x〉)
space?
2. Does f ∈ #P? (cf. Section 2.1)
3. Does f ∈ convex#P? (cf. Section 2.2)
4. Can a stretching function f̃(x, n) be associated with f(x) intrinsically
so that f̃(x, n) is quasi-polynomial?
5. (PH1?): Is there a polytope Px, for every x, with 〈Px〉 = O(poly(〈x〉))
and ‖Px‖ = O(poly(‖x‖)), such that f̃(x, n) = fPx(n)?
6. Are there good analogues of SH and/or PH2, PH3 for f̃(x, n)? If
so, nonvanishing of f(x), modulo small relaxation, can be decided in
O(poly(〈Px〉)) time by Theorem 3.1.1.
In the rest of this paper, we study these questions when f = f(x) is a
nonnegative function associated with a structural constant in any of the deci-
sion problems in Section 1.1. Exact specifications of x, 〈x〉, ‖x‖, rank(x), f(x),
and f̃(x, n) for these decision problems are given in Sections 3.4-3.5. It is
shown in Chapter 5 that f(x) ∈ PSPACE for Problem 1.1.2 and the special
cases of Problem 1.1.3 that arise in the flip. This may be conjectured to be
so for the f ’s in Problem 1.1.4, with X therein a class variety; cf [GCT10]
for its justification. Quasipolynomiality of f̃(x, n) is addressed in Chapter 4.
The hypotheses PH1, SH, PH2, and PH3 in these cases have the following
unified form.
Hypothesis 3.3.1 (PH1) Let f = f(x) be the function associated with a
structural constant in
1. Problem 1.1.1, or
2. 1.1.2, or
3. Problem 1.1.3, or
4. Problem 1.1.4, with X being a class variety therein.
Then the function f(x) has a convex #P -formula (cf. (2.4))
f(x) = φ(Px),
such that:
1. for every fixed x, the Ehrhart quasi-polynomial fPx(n) of Px coincides
with f̃(x, n).
2. 〈P 〉 = O(poly(P )) and ‖P‖ = O(poly(‖x‖)).
Hypothesis 3.3.2 (SH)
(a) Suppose f(x) is a structural constant as in PH1 above. Then for every x,
the saturation index s(f̃) of f̃(x, n) is O(poly(rank(x))). This means there
exist absolute nonnegative constants c, c′ such that s(f̃) ≤ c(rank(x))c
(b) For f(x) in Problems 1.1.1-1.1.3, the saturation index of f̃(x, n) is zero–
i.e., f̃(x, n) is strictly saturated–for almost all x. This means the density of
x, with 〈x〉 ≤ N and f(x) nonzero, for which the saturation index s(f̃) is
nonzero is ≤ 1/N c
, for any positive costant c′′, as N → ∞.
More strongly than (a),
Hypothesis 3.3.3 (PH2) For f(x) as in PH1, the positivity index of f̃(x, n)
is O(poly(rank(x))).
Hypothesis 3.3.4 (PH3) For f(x) as in PH1, the generating function
F (x, t) =
n f̃(x, n)t
n has a positive rational form of modular index O(poly(rank(x))).
More specifically, the modular index of f̃(x, n), as defined in Section 4.1.1
for f ’s that arise in this paper, is O(poly(rank(x))).
PH3 implies SH (a); this follows from Lemma 3.1.6.
The following conservative bound follows from Theorem 3.1.5.
Theorem 3.3.5 (Weak SH)
Assuming PH1 (Hypothesis 3.3.1), the saturation index of f̃(x, n) is
bounded by 2O(poly(‖x‖)); hence its bitlength is bounded by O(poly(‖x‖)).
The following result addresses the relaxed forms of the decision problems
for the structural constants under consideration (cf. Section 1.1).
Theorem 3.3.6 Suppose f(x) is a structural constant as in PH1 above.
Then PH1 (Hypothesis 3.3.1) and SH (Hypothesis 3.3.2) imply Hypothe-
sis 1.1.6 (PHflip) in this case. Specifically:
(a) For f(x) in Problems 1.1.1-1.1.4, nonvanishing of f̃(x, a), for a given x
and a relaxation parameter a > c(rank(x))c
, with c, c′ as in Hypothesis 3.3.2,
can be decided in poly(〈x〉, 〈a〉) time.
(b) For f(x) as in Problems 1.1.1-1.1.3, there is a poly(〈x〉) time algorithm
for deciding nonvanishing of f(x) that works correctly on almost all x.
This follows from Theorem 3.1.1.
The following sections give precise descriptions of x, 〈x〉, ‖x‖, rank(x)
and f̃(x, n) for the structural constants under consideration.
3.4 The subgroup restriction problem
In this section we consider the subgroup restriction problem (Problem 1.1.3).
The Kronecker and the plethysm problems (Problems 1.1.1, 1.1.2) are its
special cases.
Let G,H, ρ, λ, π,mπλ be as in Problem 1.1.3. We shall define below an ex-
plicit polynomial homomorphism ρ : H → G, as needed in the statement of
Problem 1.1.3, and also the precise specifications [H], [ρ], [λ], [π] of H, ρ, λ, π,
respectively. We shall also define the bitlengths 〈H〉, 〈ρ〉, 〈λ〉, 〈π〉 and the
combinatorial bit lengths ‖λ‖, ‖π‖. We let ‖H‖ = 〈H〉 and ‖ρ‖ = 〈ρ〉, since
H and ρ belong to the nonstretchable part of the input. On the other hand,
λ and π will be stretched in the definition of f̃(x, n), and hence their com-
binatorial bit lengths will differ from the usual bit lengths. The input x in
the subgroup restriction problem is the tuple ([H], [ρ], [λ], [π]). Its bitlength
〈x〉 is defined to be the sum of the bitlengths 〈H〉, 〈ρ〉, 〈λ〉, 〈π〉, and ‖x‖ is
defined to be the sum of ‖H‖, ‖ρ‖, ‖λ‖ and ‖π‖. Finally rank(x) is defined
to the sum of the ranks of H and G and ‖λ‖ and ‖π‖. Here that rank of
a (reductive) group is defined in a standard way. For example, the rank of
the symmentric group Sn is n, that of GLn(C) is n. The rank of a general
finite or connected simple group can be defined similarly, and the rank of a
more complex reductive group is defined to be the sum of the ranks of its
simple components. With this terminology, we let f(x) = mπ
, with x as
defined here in Hypotheses 3.3.1-3.3.4 and Theorem 3.3.6 for the subgroup
restriction problem. Here H and ρ are implicit in the definition of mπ
For example, in the plethym problem (Problem 1.1.2), these specifi-
cations are as follows. The specification [H] is just the root system for
H = GLn(C). Its bitlength 〈H〉 is n. The specification [ρ] of the repre-
sentation map ρ : H → G = GL(Vµ(H)) consists of just the partition µ
specified in terms of its nonzero parts. Its bitlength 〈ρ〉 = 〈µ〉. The ranks
of H and G are as usual. The partitions λ and µ are specified in terms of
their nonzero parts. Their bitlength is the total bitlength of the parts, and
the combinatorial bit length is the total number of parts (the height). It
is crucial here that only nonzero parts of λ are specified, because the rank
of G can be exponential in the rank of H and the bitlength of µ. Hence,
the bitlength of this compact representation of λ can be polynomial in the
rank of H and the bitlength of µ, even if the dimension of G is exponential.
The main difference between 〈x〉 and ‖x‖ is that the stretchable data λ and
π contribute their bitlengths to the former, and their heights to the latter.
The plethysm problem is the main prototype of the subgroup restriction
problem. If the reader wishes, (s)he can skip the rest of this subsection and
jump to Section 3.4.3 in the first reading.
In general, we assume that H in Problem 1.1.3 is a finite simple group, or
a complex simple, simply connected Lie group, or an algebraic torus (C∗)k,
or a direct product of such groups. The results and hypotheses in this paper
are also applicable if we allow simple types of semidirect products, such as
wreath products, which is all that we need for the sake of the flip. But these
extensions are routine, and hence, for the sake of simplicity, we shall confine
ourselves to direct products.
3.4.1 Explicit polynomial homomorphism
Now let us define an explicit polynomial homomorphism. This will be done
by defining basic explicit homomorphisms, and composing them functorially.
Basic explicit homomorphisms:
Let V be an irreducible polynomial representation of H (character-
istic zero), or more generally, an explicit polynomial representation that
is constructed functorially from the irreducible polynomial representations
using the operations ⊕ and ⊗. Then the corresponding homomorphism
ρ : H → G = GL(V ) is an explicit polynomial homomorphism. The iden-
tity map H → H is also an explicit polynomial homomorphism.
The polynomiality restriction here only applies to the torus component
of H. If H is a finite simple group, or a complex semisimple group, then
any irreducible representation of H is, by definition, polynomial. In general,
a representation is polynomial if its restriction to the torus component is
polynomial; i.e., a sum of polynomial (one dimensional) characters.
To see why the polynomiality restriction is essential, let H be a torus,
V its rational representation, and G = GL(V ). Let Vλ(G) = Sym
d(V ),
the symmetric representation of G, and let π be the label of the trivial
character of H. Then the multiplicity mπ
is the number of H-invariants in
Symd(V ). This is easily seen to be the number of nonnegative solutions of a
system of linear diophontine equations. But the problem of deciding whether
a given system of linear diophontine equations has a nonnegative solution
is, in general, NP -complete. Though the system that arises above is of a
special form, it is not expected to be in P if V is allowed to be any rational
representation; the associated decision problem may be NP -complete even
in this special case. If V is a polynomial representation of a torus H, then
all coefficients of the system are nonnegative, and the decision problem is
trivially in P .
Composition:
We can now compose the basic explicit (polynomial) homomorphisms
above functorially:
1. If ρi : H → Gi are explicit, the product map ρ : H →
iGi is also
explicit.
2. If ρi : Hi → Gi are explicit, the product map ρ :
Gi is also
explicit.
Instead of products, we can also allow simple semi-direct products such
as wreath products here. We may also allow other functorial constructions
such as induced representations and restrictions. For example, if ρ : H → G
is an explicit polynomial homomorphism, and G′ ⊆ G is an explicit subgroup
of G such that ρ(H) ⊆ G′, then the restricted homomorphism ρ′ : H → G′
can also be considered to be an explicit polynomial homomorphism. But
for the sake of simplicity, we shall confine ourselves to the simple functorial
constructions above.
3.4.2 Input specification and bitlengths
Now we describe the specifications [H], [ρ], [λ], [µ], their bitlengths. These
are very similar to the ones in the plethysm problem.
The specification [H]:
We assume that H is specified as follows.
(1) IfH is a complex, simple, simply connected Lie group, then the specifica-
tion [H] consists of the root system of H or the Dynkin diagram. Let 〈H〉 be
the bitlength of this specification. Thus, if H = SLn(C), then 〈H〉 = O(n).
(2) If H is a simple group of Lie type (Chevalley group) then it has a similar
specification [Ca]. The only finite groups of Lie type that arise in GCT are
SLn(Fpk) and GLn(Fpk). In this case the specification [H] is easy: we only
have to specify n, p, k. We define 〈H〉 in this case to be n + k + log2 p; not
log2 n+ log2 k + log2 n. As a rule, 〈H〉 is defined to be the sum of the rank
parameters (such as n and k here) and bit lengths of the weight parameters
(such as p here) in the specification. This is equivalent to assuming that the
rank parameters are specified in unary.
(3) If H is the alternating group An, we only specify n. Let 〈H〉 = n.
(4) The torus is specified by its dimension. We define 〈H〉 to be the dimen-
sion.
(5) If H is a product of such groups, its specification is composed from the
specifications of its factors, and the bitlength 〈H〉 is defined to be the sum
of the bitlengths of the constituent specifcations.
The specification [ρ]:
Let us first assume that ρ is a basic explicit polynomial homomorphism.
In this case the specification of ρ : H → G = GL(V ) is a pair [ρ] = ([H], [V ])
consisting of the specification [H] of H as above, and the combinatorial
specification [V ] of the representation V as defined below:
(1) If H is a semisimple, simply connected Lie group, and V = Vµ(H) its
irrreducible representation for a dominant weight µ of H, then V is specified
by simply giving the coordinates of µ in terms of the fundamental weights
of H. Thus [V ] = µ, and its bitlength 〈V 〉 is the total bitlength of all
coordinates of µ, and the combinatorial bit length ‖V ‖ is the total number
of coordinates of µ.
(2) If H = Sn, and V = Sγ its irreducible representation (Specht module),
then [V ] is the partition γ labelling this Specht module. We define 〈V 〉 to
be the bitlength of this partition, and ‖V ‖ = 〈V 〉.
(3) If H is a finite general linear group GLn(Fpk), and V its irreducible rep-
resentation, as classified by Green [Mc], then [V ] is the combinatorial clas-
sifying label of V as given in [Mc]. It is a certain partition-valued function,
which can be specified by listing the places where the function is nonzero
and the nonzero partition values at these places. Let 〈V 〉 be the bitlength
of this specification; it is O(poly(n, k, 〈p〉)). We let ‖V ‖ = 〈V 〉. More gener-
ally, if H is a finite group of Lie type, and V its irreducible representation,
then [V ] is the combinatorial classifying label of V as given by Lusztig [Lu1].
(4) If H is a torus and V is a polynomial character, then [V ] is the speci-
fication of the character. Its bitlength is the bitlength of the specification,
and combinatorial bit length is the dimension of H.
(5) If V is composed from irreducible representations, then [V ] is composed
from the specifications of the irreducible representations in an obvious way.
Bitlengths and combinatorial bitlengths are defined additively.
The bitlength 〈ρ〉 is defined to be 〈H〉+ 〈V 〉, where 〈V 〉 is the bitlength
of [V ].
If ρ is a composite homomorphism, its specification [ρ] is composed from
the specifications of its basic constituents in an obvious way. The bitlength
〈ρ〉 is defined to be the sum of the bitlengths of these basic specifications.
The specifications [λ] and [π]:
Vπ(H) is the tensor product of the irreducible representations of the
factors of H. We let [π] be the tuple of the combinatorial classifying labels
of each of these irreducible representations, as specified above. Let 〈π〉 be
their total bit length, and ‖π‖ the total combinatorial bit length. Similarly,
Vλ(G) is the tensor product of the irreducible representations of the factors
of G. When G = GLm(C), λ is a partition, which we specify by only giving
its nonzero parts, whose number is equal to the height of λ. This is crucial
since the height of λ can be much less than than the rank m of G, as in
the plethysm problem (Problem 1.1.2). We shall leave a similar compact
specification [λ] for a general connected, reductive G to the reader. Let 〈λ〉
be its bitlength and ‖λ‖ its combinatorial bit length.
3.4.3 Stretching function and quasipolynomiality
Let f(x) = mπ
as above, with x = ([H], [ρ], [λ], [π]). Here λ is the dominant
weight of G. First, assume that H is connected, reductive. Then π is the
dominant weight of H. For a given x, let us define the stretching function
f̃(x, n) = m̃πλ(n) = m
nλ, (3.8)
which is the multiplicity of Vnπ(H) in Vnλ(G), considered as an H-module
via ρ : H → G. Let Mπ
(t) =
n≥0 m̃
(n)tn be the generating function of
this stretching quasi-polynomial.
The following is the generalization of Theorem 1.6.1 in this setting.
Theorem 3.4.1 (a) (Rationality) The generating function Mπ
(t) is ratio-
(b) (Quasi-polynomiality) The stretching function m̃π
(n) is a quasi-polynomial
function of n.
(c) There exist graded, normal C-algebras S = S(mπ
) = ⊕nSn and T =
T (mπ
) = ⊕nTn such that:
1. The schemes spec(S) and spec(T ) are normal and have rational singu-
larities.
2. T = SH , the subring of H-invariants in S.
3. The quasi-polynomial m̃π
(n) is the Hilbert function of T .
(d) (Positivity) The rational function Mπ
(t) can be expressed in a positive
form:
Mπλ (t) =
h0 + h1t+ · · ·+ hdt
j(1− t
a(j))d(j)
, (3.9)
where a(j)’s and d(j)’s are positive integers,
j d(j) = d+1, where d is the
degree of the quasi-polynomial, h0 = 1, and hi’s are nonnegative integers.
The specific rings S(mπ
) and T (mπ
) constructed in the proof of this
result are called the canonical rings associated with the structrural con-
stant mπ
. The projective schemes Y (mπ
) = Proj(S(mπ
)), and Z(mπ
Proj(T (mπλ)) are called the canonical models associated with m
Theorem 3.4.1 and its generalization, when H can be disconnected, is
proved in Chapter 4; cf. Theorem 4.1.1.
Finitely generated semigroup
The following is an analogue of Theorem 1.6.2.
Theorem 3.4.2 Assume that H is connected. For a fixed ρ : H → G, let
T (H,G) be the set of pairs (µ, λ) of dominant weights of H and G such that
the irreducible representation Vπ(H) of H occurs in the irreducible repre-
sentation Vλ(G) of G with nonzero multiplicity. Then T (H,G) is a finitely
generated semigroup with respect to addition.
This is proved in Section 4.4.
PSPACE
The following is a generalization of Theorem 1.6.3.
Theorem 3.4.3 Assume that H in Problem 1.1.3 is a direct product, whose
each factor is a complex simple, simply connected Lie group, or an alternat-
ing (or symmetric) group, or SLn(Fpk) (or GLn(Fpk)), or a torus. Then
f(x) = mπ
can be computed in poly(〈x〉) space, with x as specified above.
This is proved in Chapter 5. It may be conjectured that Theorem 3.4.3
holds even when the composition factors of H are allowed to be general
finite simple groups of Lie type. This will be so if Lusztig’s algorithm [Lu5]
for computing the characters of finite simple groups of Lie type can be
parallelized; cf. Section 5.4.
Positivity hypotheses
Theorem 3.4.1-3.4.3, along with the experimental results in special cases
(cf. Chapter 6), constitute the main evidence in support of the positivity
Hypotheses 3.3.1-3.3.4 for the subgroup restrition problem.
3.5 The decision problem in geometric invariant
theory
Finally, let us turn to the most general Problem 1.1.4.
3.5.1 Reduction from Problem 1.1.3 to Problem 1.1.4
First, let us note that the subgroup restriction problem (Problem 1.1.3)
is a special case of Problem 1.1.4. To see this, let H, ρ and G be as in
Problem 1.1.3, and let X be the closed G-orbit of the point vλ corresponding
to the highest weight vector of Vλ(G) in the projective space P (Vλ(G)). Then
X = Gvλ ∼= G/Pλ, (3.10)
where the P = Pλ = Gvλ is the parabolic stabilizer of vλ. We have a natural
action of H on X via ρ. Let R be the homogeneous coordinate ring of X. By
[Ha, MR, Rm, Sm], the singularities of spec(R) are rational. By Borel-Weil
[FH], the degree one component R1 of the homogeneous coordinate ring R
of X is Vλ(G). Hence, s
1 in this special case of Problem 1.1.4 is precisely m
in Problem 1.1.3. The results in Section 3.4 for sπ1 generalize in a natural
way for sπd .
3.5.2 Input specification
The variety X in the above example is completely specified by H, ρ and λ.
Hence its specification [X] can be given in the form a tuple ([H], [ρ], [λ]),
where [H], [ρ] and [λ] are the specifications of H, ρ and λ as in Section 3.4,
The input specification x for Problem 1.1.4 in the special case above is the
tuple ([X], d, [π]) = ([H], [ρ], [λ], d, [π]), where [π] is the specification of π as
in Section 3.4.
We now describe a class of varieties X which have similar compact spec-
ifications.
Let G be a connected, reductive group, H a reductive, possibly discon-
nected, reductive group, and ρ : H → G an explicit polynomial homomor-
phism as in Section 3.4. Let V = Vλ(G) be an irreducible representation of G
for a dominant weight λ. Let P (V ) be the projective space associated with
V . It has a natural action of H via ρ. Let v ∈ P (V ) be a point that is char-
acterized by its stabilizer Gv ⊆ G. This means it is the only point in P (V )
that is stabilized by Gv. For example, the point vλ above is characterized by
its parabolic stabilier. We assume that we know the Levi decompositioon of
Gv explicity, and its compact specification [Gv ], like that of H, and also an
explicit compact specification of the embedding ρ′ : Gv → G, aking to that
of the explicit homomorphism ρ : H → G. Let X ⊆ P (V ) be the projective
closure of the G-orbit of v in P (V ). Then X as well as the action of H on
X are completely specified by λ,H, ρ,Gv and ρ
′. Hence, we can let [X] be
the tuple (λ, [H], [ρ], [Gv ], [ρ
′]). The input specification x for Problem 1.1.4
with the X of this form is the tuple ([X], d, [π]). The bitlengths 〈x〉 and ‖x‖
are defined additively. The rank(x) is defined to be the sum of the ranks of
H and G, dim(V ) and ‖π‖. Since the point vλ above is characterized by its
stabilizer, G/P is a variety of this form.
The class varieties [GCT1, GCT2] are either of this form, or a slight ex-
tension of this form, and admit such compact specifications. The algebraic
geometry of an X of the above form is completely determined by the repre-
sentation theories of the two homomorphisms ρ : H → G and ρ′ : Gv → G.
Furthermore, the results in [GCT2] say that Problem 1.1.4 for a class variety
is intimately linked with the subgroup restriction problem and its variants
for the homomomorphisms ρ and ρ′. Hence it is qualitatively similar to the
subgroup restriction problem in this case; cf. [GCT10] for further elabora-
tion of the connection between these two problems.
3.5.3 Stretching function and quasi-polynomiality
Now let H,X,R and sπ
be as in Problem 1.1.4, with H therein assumed to
be connected. We associate with f(x) = sπd the following stretching fucntion:
f̃(x, n) = s̃πd (n) = s
nd , (3.11)
where snπ
is the multiplicity of the irrreducible representation Vnπ(H) of H
in Rnd, the componenent of the homogeneous coordinate ring R of X with
degree nd. Let S(t) =
n≥0 s̃
(n)tn.
Theorem 3.5.1 Assume that the singularities of spec(R) are rational.
(a) (Rationality) The generating function Sπ
(t) is rational.
(b) (Quasi-polynomiality) The stretching function s̃πd(n) is a quasi-polynomial
function of n.
(c) There exist graded, normal C-algebras S = S(sπ
) = ⊕nSn and T =
T (sπ
) = ⊕nTn such that:
1. The schemes spec(S) and spec(T ) are normal and have rational singu-
larities.
2. T = SH , the subring of H-invariants in S.
3. The quasi-polynomial s̃πd(n) is the Hilbert function of T .
(d) (Positivity) The rational function Sπ
(t) can be expressed in a positive
form:
Sπd (t) =
h0 + h1t+ · · ·+ hkt
j(1− t
a(j))k(j)
, (3.12)
where a(j)’s and k(j)’s are positive integers,
j k(j) = k + 1, where k is
the degree of the quasi-polynomial s̃πd(n), h0 = 1, and hi’s are nonnegative
integers.
This is proved in Chapter 4. Theorem 3.4.1 is a special case of this theorem,
in view of the reduction in Section 3.5.1. Theorem 3.5.1 is applicable when
X is a class variety, assuming that its singularities are rational.
3.5.4 Positivity hypotheses
Even though Theorem 3.5.1 holds for any X, with spec(R) having ratio-
nal singularities, the positivity hypotheses PH1, SH, PH2, and PH3 can
be expected to hold for only very special X’s. In general, characterizing
the X’s with compact specification for which these hypotheses hold is a
delicate problem. Hypotheses 3.3.1-3.3.4 say that these hold when X in
Problem 1.1.4 is G/P (as in Section 3.5.1) or a class variety, with the input
specification x as described above. For future reference, we shall reformulate
these hypotheses purely in geometric terms.
For this we need a definition.
Let T =
n Tn be a graded complex C-algebra so that the singularities
of spec(T ) rational. Let Z = Proj(T ). Assume that Z has a compact
specification [Z]; we shall specify it below for the Z’s of interest to us.
We let [T ], the specification of T , to be [Z]. This will play the role of
the input in the definition below. Let 〈T 〉 denote its bitlength, and ‖T‖
combinatorial bit length. Let hT (n) = dim(Tn) be its Hilbert function,
which is a quasipolynomial, since the singularities of spec(T ) are rational;
cf. Lemma 4.1.3.
Definition 3.5.2 We say that PH1 holds for T (or Z) if the Hilbert quasi-
polynomial hT (n) is convex. This means there exists a polytope P = PT
depending on the input [T ], whose Ehrhart quasipolynomial fP (n) coincides
with the Hilbert function hT (n), and whose membership function χP (y) can
be computed in poly(〈T 〉, y) time. We assume that a separating hyperplane
can also be computed in polynomial time if y 6∈ P (Section 2.3).
If PH1 holds we can also ask if analogues of SH, PH2, and PH3–whose
formulation is similar and hence omitted–hold.
3.5.5 G/P and Schubert varieties
Let us illustrate this definition with an example. Let X ∼= G/Pλ be as in
Section 3.5.1 and R its homogeneous coordinate ring. We have already seen
that it has a compact specification: namely [X] = λ. Since singularities
of spec(R) are rational, PH1 makes sense. For G/P it follows from the
Borel-Weil theorem. The Hilbert series of R is of the form
h0 + · · ·+ hdt
(1− t)d+1
with h0 = 1 and hi’s nonnegative. This is so because R is Cohen-Macauley
[Rm] and is generated by its degree one component. Hence, the modular
index of the Hilbert function is one (PH3). PH2 turns out to be nontriv-
ial. Experimental evidence in its support for the classical G/P is given in
Section 6.3. Considerations for the Schubert subvarieties are similar. Ex-
perimental evidence for PH2 for the classical Schubert varieties is also given
in Section 6.3.
Now let s = sπ
be the multiplicity as Problem 1.1.4, with X having a
compact specification [X] as above. Let T = T (s) be the ring associated
with s as in Theorem 3.5.1 (c). Let Z = Z(s) = Proj(T ). We let the
specification [Z] = ([X], d, π). Let 〈Z〉 be its bitlength.
So Theorem 3.1.1 in this context implies:
Theorem 3.5.3 If PH1 and SH holds for Z(s) then nonvanishing of s,
modulo small relaxation, can be decided in poly(〈Z〉) time.
We also have the following reformulation:
Proposition 3.5.4 Hypotheses 3.3.1-3.3.4 are equivalent to PH1,SH,PH2,PH3
for Z(s), where s is a stucture constant that corresponds the structure con-
stant f(x) in Hypotheses 3.3.1. Thus, in the case of the subgroup restriction
problem, s = sπ1 = m
λ as in Section 3.5.1.
This is just a consequence of definitions.
3.6 PH3 and existence of a simpler algorithm
As we remarked in Section 3.1.3, the use of the ellipsoid method and basis
reduction in lattices makes the the algorithm for saturated integer program-
ming (cf. Theorem 3.1.1) fairly intricate. For the flip (cf. [GCTflip] and
Chapter 7), it is desirable to have simpler algorithms for the relaxed forms
of the decision problems under consideration, akin to the the polynomial
time combinatorial algorithms in combinatorial optimization [Sc] that do
not rely on the elliposoid method or basis reduction. We briefly examine in
this section the role of PH3 in this context.
The simple combinatorial algorithms in combinatorial optimization work
only when the problem under consideration is unimodular–in which case the
vertices of the underlying polytope P are integral–or almost unimodular–
e.g. when the vertices of P are half integral. Edmond’s algorithm for finding
minimum weight perfect matching in nonbipartite graphs [Sc] is a classic
example of the second case.
In the unimodular case, Stanley’s positivity result [St1] implies that the
rational function FP (t) has a positive form
FP (t) =
h(d)td + · · ·+ h(0)
(1− t)d+1
If PH3 (Hypothesis 3.3.4) holds for a structural function f(x) under con-
sideration then the Ehrhart series FPx(t) of the polytope Px associated with
x in PH1 (Hypothesis 3.3.1) has a minimal positive form in which each root
of the denominator has O(poly(‖x‖)) order. Roughly, this says that the
situation is “close” to the unimodular case. Hence, in such a case we can
expect a purely combinatorial polynomial-time algorithm for deciding non-
vanishing of f(x), modulo small relaxation, that does not need the ellipsoid
method or basis reduction.
3.7 Other structural constants
The paradigm of saturated and positive integer programming in this paper,
along with appropriate analogues of PH1,SH,PH2,PH3, may be applicable
several other fundamental structural constants in representation theory and
algebraic geometry, in addition to the ones in Problems 1.1.1-1.1.4 treated
above, such as
1. the value of a Kazhdan-Lusztig polynomial at q = 1, [KL1];
2. the values at q = 1 of the well behaved special cases of the parabolic
Kostka polynomials and their q-analogues [Ki];
3. the structural coefficients of the multiplication of Schubert polynomi-
als, and so on.
Chapter 4
Quasi-polynomiality and
canonical models
In this chapter we prove quasipolynomiality of the stretching functions as-
sociated with the various structural constants under consideration (Sec-
tion 4.1), describe the associated canonical models (Section 4.2), describe
the role of nonstandard quantum groups in [GCT4, GCT7, GCT8] in the
deeper study of these models (Section 4.3), prove finite generation of the
semigroup of weights (Theorem 3.4.2) (Section 4.4), and give an elementary
proof of rationality in Theorem 3.4.1 (a) (Section 4.5).
4.1 Quasi-polynomiality
Here we prove Theorem 3.5.1; Theorems 1.6.1 and 3.4.1 are its special cases
in view of the reduction in Section 3.5.1. This, in turn, follows from the
following more general result.
Let R = ⊕kRd be a normal graded C-algebra with an action of a reduc-
tive group H. Assume that spec(R) has rational singularities. Let H0 be
the connected component of H containing the identity. Let HD = H/H0 be
its discrete component. Given a dominant weight π of H0, we consider the
module Vπ = Vπ(H0), an H-module with trivial action of HD. Let s
denote
the multiplicity of the H-module Vπ in Rd. Let s̃
(n) be the multiplicity of
the H-module Vnπ in Rnd. This is a stretching function associated with the
mulitplicity sπ
. Let Sπ
(t) =
n≥0 s̃
(n)tn.
Theorem 4.1.1 (a) (Rationality) The generating function Sπ
(t) is ratio-
(b) (Quasi-polynomiality) The stretching function s̃π
(n) is a quasi-polynomial
function of n.
(c) There exist graded, normal C-algebras S = S(sπ
) = ⊕nSn and T =
T (sπd ) = ⊕nTn such that:
1. The schemes spec(S) and spec(T ) are normal and have rational singu-
larities.
2. T = SH , the subring of H-invariants in S.
3. The quasi-polynomial s̃π
(n) is the Hilbert function of T .
(d) (Positivity) The rational function Sπd (t) can be expressed in a positive
form:
Sπd (t) =
h0 + h1t+ · · ·+ hkt
j(1− t
a(j))k(j)
, (4.1)
where a(j)’s and k(j)’s are positive integers,
j k(j) = k + 1, where k is
the degree of the quasi-polynomial s̃π
(n), h0 = 1, and hi’s are nonnegative
integers.
Theorem 3.5.1 follows from this by letting R be the homogeneous coordinate
ring of X.
More generally, if W is an irreducible representation of HD, we can
consider the H-module Vπ ⊗ W . Let s
be its multiplicity in Rd. Let
(n) be the multiplicity of the trivial H-representation in the H-module
Rnd ⊗ V
nπ ⊗ Sym
n(W ∗). Then
Theorem 4.1.2 Analogue of Theorem 4.1.1 holds for s̃
For the purposes of the flip, Theorem 4.1.1 suffices.
Proof: We shall only prove Theorem 4.1.1, the proof of Theorem 4.1.2 be-
ing similar. The proof is an extension of M. Brion’s proof (cf. [Dh]) of
quasi-polynomiality of the stretching function associated with a Littlewood-
Richardson coefficient of any semisimple Lie algebra.
Clearly (a) follows from (b); cf. [St1].
(b) and (c):
Let Cd be the cyclic group generated by the primitive root ζ of unity of
order d. It has a natural action on R: x ∈ Cd maps z ∈ Rk to x
kz. Let
B = RCd =
n≥0Rnd ⊆ R be the subring of Cd-invariants. By Boutot
[Bou], B is a normal C-algebra and spec(B) has rational singularities.
Assume thatH0 is semisimple; extension to the reductive case being easy.
Let π∗ be the dominant weight of H0 such that V
π = Vπ∗ . By Borel-Weil
[FH],
Cπ∗ = ⊕n≥0V
nπ = ⊕n≥0Vnπ∗ ,
is the homogeneous coordinate ring of theH0-orbit of the point vπ∗ ∈ P (Vπ∗)
corresponding to the highest weight vector. This H0-orbit is isomorphic to
H0/Pπ∗ , where Pπ∗ ⊆ H0 is the parabolic stabilizer of vπ∗ . Hence Cπ∗ is
normal and spec(Cπ∗) has rational signularities; cf. [Ha, MR, Rm, Sm].
It follows that B ⊗ Cπ∗ is also normal, and spec(B ⊗ Cπ∗) has rational
singularities. Consider the action of C∗ on B ⊗ Cπ∗ given by:
x(b⊗ c) = (x · b)⊗ (x−1 · c),
where x ∈ C∗ maps b ∈ Bn to x
nb, the action on Cπ∗ being similar. Consider
the invariant ring
S = (B ⊗ Cπ∗)
= ⊕nSn = ⊗n≥0Rnd ⊗ V
nπ. (4.2)
By Boutot [Bou], it is a normal, and spec(D) has rational singularities.
Since Vnπ is an H-module, the algebra S has an action of H. Let
T = T (sπd ) = S
H = ⊕n≥0Tn (4.3)
be its subring of H-invariants. By Boutot [Bou], it is normal, and spec(T )
has rational singularities–this is the crux of the proof. By Schur’s lemma, the
multiplicity of the trivial H-representation in Sn = Rnd⊗V
nπ is precisely the
multiplicity s̃πd (n) of the H-module Vnπ in Rnd. Hence, the Hilbert function
of T , i.e., dim(Tn), is precisely s̃
d (n), and the Hilbert series
n≥0 dim(Tn)t
is Sπ
(t). Quasipolynomiality of s̃π
(n) follows by applying the following
lemma:
Lemma 4.1.3 (cf. [Dh]) If T = ⊕∞n=0Tn is a graded C-algebra, such that
spec(T ) is normal and has rational simgularites, then dim(Tn), the Hilbert
function of T , is a quasi-polynomial function of n.
(d) Since spec(T ) has rational singularities, T is Cohen-Macaualey. Let
t1, . . . , tu be its homogeneous sequence of parameters (h.s.o.p.), where u =
k + 1 is the Krull dimension of T . By the theory of Cohen-Macauley rings
[St2], it follows that its Hilbert series Sπd (t) is of the form
h0 + h1t+ · · ·+ hkt
i=1 (1− t
, (4.4)
where (1) h0 = 1, (2) di is the degree of ti, and (3) hi’s are nonnegative
integers. This proves (d). Q.E.D.
Remark 4.1.4 A careful examination of the proof above shows that ratio-
nality of Sπd (t), and more strongly, asymptotic quasi-polynomiality of s̃
d (n)
as n → ∞, can be proved using just Hilbert’s result on finite generation of
the algebra of invariants of a reductive-group action. Boutot’s result is nec-
essary to prove quasi-polynomiality for all n. This is crucial for saturated
and positive integer programming (Chapter 3).
4.1.1 The minimal positive form and modular index
The form (4.4) of Sπd (t) is not unique because it depends on the degrees di’s
of the paramters ti’s. For future use, let us record the following consequences
of the proof. Let T be the ring constructed in the proof above.
Corollary 4.1.5 Suppose T has an h.s.o.p. t = (t1, . . . , tu) with di =
deg(ti). Then S
(T ) has a positive rational form (4.4) with di = deg(ti)
therein.
The proof above is lets us define a minimal positive form of the rational
function Sπ
(t) associated with a structural constant s. For this, let us or-
der h.s.o.p.’s of T lexicographically as per their degree sequences. Here the
degree seqeunce of an h.s.o.p. t = (t1, . . . , tu) is defined to be (d1, . . . , du),
where di = deg(ti). The form (4.4) is the same for any h.s.o.p. of lexi-
cographically minimum degree sequence. We call it the minimal positive
form of Sπ
(t). The modular index of sπ
is defined to be max{di}, where
(d1, . . . , du) is the degree sequence of a lexicographically minimal h.s.o.p.
Since Problems 1.1.1, 1.1.2,1.1.3, 1.2.1 are special cases of Problem 1.1.4,
this defines minimal positive forms of the rational generating functions of the
stretching quasi-polynomials (cf. Theorem 3.4.1) associated with the struc-
tural constants in these problems, and also the modular indices of these
structural constants.
4.1.2 The rings associated with a structural constant
The preceding proof also associates with the structural constant s a few
rings which will be important later. Specifically, let S = S(s) and T = T (s)
be the rings as in Theorem 4.1.1 (c) associated with the structural constant
s = sπ
. Let R = R(s) be the homogeneous coordinate ring of X as in
Theorem 4.1.1. We call R(s), S(s) and T (s) the rings associated with the
structure constant s.
When s = mπ
, as in the subgroup restriction problem (Problem 1.1.3),
X ∼= G/P as given in eq.(3.10. Then these rings are explicitly as follows:
) = ⊕n≥0Vnλ(G),
) = ⊕n≥0Vnλ(G)⊗ Vnπ(H)
T (mπ
) = ⊕n≥0(Vnλ(G) ⊗ Vnπ(H)
∗)H .
(4.5)
By specializing the subgroup restriction problem further to the Littlewood-
Richardson problem (Problem 1.2.1), we get the following rings associated
by Brion (cf. [Dh]) with the Littlewood-Richardson coefficient cλ
R(cλα,β) = ⊕n≥0Vnα(H)⊗ Vnβ(H),
) = ⊕n≥0Vnα(H)⊗ Vnβ(H)⊗ Vnλ(H)
T (cλ
) = ⊕n≥0(Vnα(H)⊗ Vnβ(H)⊗ Vnλ(H)
∗)H .
(4.6)
4.2 Canonical models
There are several rings other than T (cλ
) whose Hilbert function coincides
with the Littlewood-Richardson stretching quasi-polynomial c̃λ
(n). For
example, let P = P λ
be the BZ-polytope [BZ] whose Ehrhart quasi-
polynomial coincides with c̃λ
(n). We can associate with P a ring TP
as in Stanley [St3] whose Hilbert function coincides with c̃λ
(n). There
are many other choices for P . For example, in type A, we can consider a
hive polytope or a honeycomb polytope [KT1] instead of the BZ-polytope.
The rings TP ’s associated with different P ’s will, in general, be different,
and there is nothing canonical about them. In contrast, the ring T (cλα,β) is
special because:
Proposition 4.2.1 (PH0) The rings R(cλ
), S(cλ
), T (cλ
) have quan-
tizations Rq(c
), Sq(c
), Tq(c
) endowed with canonical bases in the ter-
minology of Lusztig [Lu4]. Furthermore, the canonical bases of Rq(c
), Sq(c
are compatible with the action of the Drinfeld-Jimbo quantum group associ-
ated with H = GLn(C), and the canonical basis of Sq(c
α,β) is an extension
of the canonical basis of Tq(c
α,β) in a natural way.
This follows from the work of Lusztig (cf. [Lu3], Chapter 27 in [Lu4]) and
Kashiwara (cf. Theorem2 in [Kas3]). Specializations of these canonical
bases at q = 1 will be called canonical bases of R(cλα,β), S(c
α,β), T (c
α,β).
Lusztig [Lu4] has conjectured that the structural constants associated with
the canonical bases in Proposition 4.2.1 are polynomials in q with nonnega-
tive integral coefficients as in the case of the canonical basis of the (negative
part of the) Drinfeld-Jimbo enveloping algebra. We refer to Proposition 4.2.1
as PH0 in view of this (conjectural) positivity property.
In view of this proposition, we call the rings R(cλα,β), S(c
α,β) and T (c
the canonical rings associated with the Littlewood-Richardson coefficient
, and X = Proj(R(cλ
)), Y = Proj(S(cλ
)) and Z = Proj(T (cλ
)) the
canonical models associated with cλα,β.
4.2.1 From PH0 to PH1,3
Now we study the relevance of PH0 above in the context of PH1,SH,PH2,
and PH3 for Littlewood-Richardson coefficients (Section 1.2).
As already remarked in Section 1.7, PH1 for Littlewood-Richardson coeffi-
cients is a formal consequence of the properties of Kashiwara’s crystal oper-
ators on the canonical bases in PH0 (Proposition 4.2.1); [Dh, Kas2, Li, Lu4].
Specifically, the canonical basis of the ring Rq(c
) also yields a canon-
ical basis for the tensor product Vq,α ⊗ Vq,β of the irreducible Hq modules
with highest weights α and β. The Littlwood-Richardson rule for arbitrary
types follows from the study of Kashiwara’s crystal operators on this canon-
ical basis for the tensor product; [Lu4]. This rule is equivalent to the one
in [Li] based on combinatorial interpretation of the crystal operators in the
path model therein. The article [Dh] derives a convex polyhedral formula
for Littlewood-Richardson coefficients (of arbitrary type) using this com-
binatorial interpretation. Though the complexity-theoretic issues are not
addressed in [Dh], it can be verified that the polyhedral formula therein is a
convex #P -formula. This yields PH1 for Littlewood-Richardson coefficients
of arbitrary types using PH0.
Now let us see the relevance of PH0 in the context of SH for Littlewood-
Richardson coefficients of arbitrary type.
The polytope in [Dh], mentioned above, for type A is equivalent to the
hive polytope in [KT1] in the sense that the number integer points in both
the polytopes is the same. Knutson and Tao prove SH for type A by show-
ing that the hive polytope always has in integral vertex. To extend this
proof to an arbitrary type, one has to convert the polytope in [Dh] into a
polytope that is guaranteed to contain an integral vertex if the index of the
stretching quasipolynomial c̃λ
(n) is one. The main difficulty here is that
we do not have a nice mathematical interpretation for the index. Algorithm
in Theorem 3.1.1 applied to the polytope in [Dh] computes this index in
polynomial time. But it does not give a nice interpretation that can be used
in a proof as above.
This index is simply the largest integer dividing the degrees of all ele-
ments in any basis of the canonical ring T (cλ
)–in particular, the canon-
ical basis. This follows by applying Proposition 3.1.3 to the polytope in
[Dh]. This leads us to ask: is there an interpretation for the index based on
Lusztig’s topological construction of the canonical basis in Proposition 4.2.1?
If so, this may be used to extend the known polyhedral proof for SH in type
A to arbitrary types. Alternatively, it may be possible to prove SH using
topological properties of the canonical basis in the spirit of the topological
(intersection-theoretic) proof [Bl] of SH in type A.
Now let us see the relevance of PH0 in the context of PH3 for Littlewood-
Richardson coefficients.
First, let us consider the minimal positive form (Section 4.1.1) associated
with a Littlewood-Richardson coefficient cλ
of type A. Let T = T (cλ
denote the ring that arises in this case; cf. eq.(4.6). Now we can ask:
Question 4.2.2 Are all di’s occuring in the minimal positive form (cf.
(4.4)) one in this special case? This is equivalent to asking if the ring
T = T (cλα,β) in this case is integral over T1, the degree one component of T .
If so, this would provide an explanation for the conjecture of King at al
[KTT] (cf. eq.(1.3)) in the theory of Cohen-Macauley rings:
Proposition 4.2.3 Assuming yes, the conjecture of King et al [KTT] (Hy-
pothesis 1.2.6) holds.
Remark 4.2.4 In contrast, the ring TP associated with the hive polytope
(cf. beginning of Section 4.2) need not be integral over its degree one compo-
nent, in view of the fact that the hive polytope can have nonintegral vertices
[DM1].
Remark 4.2.5 T = T (cλ
) need not be generated by its degree one compo-
nent T1. If this were always so, the h-vector (hd, · · · , h0) in eq.(1.3) would
be an M-vector (Macauley-vector) [St2]. But one can construct α, β and λ
for which this does not hold.
Proof: (of the proposition) Since T is integral over T1, it has an h.s.o.p., all of
whose elements have degree 1. By Theorem 3.4.1, the singularities of spec(T )
are rational. Hence T is Cohen-Macaulay. Now the result immediately
follows from the theory of Cohen-Macauley rings [St2]. Q.E.D.
In view of this Proposition, the conjecture of King et al will follow if all
canonical basis elements of T (cλ
) can be shown to be integral over the basis
elements of degree one. This requires a further study of the multiplicative
structure of this canonical basis. Considerations for PH3 (Hypothesis 1.2.8)
for Littlewood-Richardson coefficients of arbitrary type are similar.
Similarly, the positivity property (PH2) of the stretching quasipolynomial
associated with Littlewood-Richardson coefficients may possibly follow from
a deep study of the multiplicative structure of the canonical basis as per
PH0 (Proposition 4.2.1), just as positivity of the multiplicative structural
coefficients of the canonical basis for the (negative part of the) Drinfeld-
Jimbo enveloping algebra follows from a deep study of the multiplicative
structure of this basis [Lu4].
4.2.2 On PH0 in general
The discussion above indicates that for Littlewood-Richardson coefficients
PH1,SH,PH3, and plausibly PH2 as well are intimately related to PH0
(Proposition 4.2.1). This leads us to ask if the rings associated in Sec-
tion 4.1.2 with other structural constants under consideration in this paper
have quantizations which satisfy appropriate forms of PH0. If so, this PH0
may be used to derive PH1, SH, PH3, and PH2 (Hypotheses 3.3.1-3.3.4)
for these structural constants. Note that SH (a) follows from PH3 (see the
remark after Hypothesis 3.3.4); PH2 may also follow from PH3. Thus PH1
and PH3 are the ones to focus on.
To formalize this, let s be a structural constant which is either the Kro-
necker coefficient as in Problem 1.1.1, or the plethysm constant as in Prob-
lem 1.1.2, or the multiplicity mπ
in Problem 1.1.3, or the multiplicity sπ
in Problem 1.1.4, when X therein is a class variety. Let R(s), S(s), T (s) be
the rings associated with s (Section 4.1.2). Let X(s) = Proj(R(s)), Y (s) =
Proj(S(s)) and Z(s) = Proj(R(s)). We call R = R(s), S = S(s), T = T (s)
the canonical rings associated with s, and X(s), Y (s), Z(s) the canonical
models associated with s, because we expect these rings and models to be
special as in the case of the Littlewood-Richardson coefficients.
Let H be as in Problem 1.1.3 or Problem 1.1.4. Assume that H is
connected. Let Hq denote the Drifeld-Jimbo quantization of H. Now we
Question 4.2.6 (PH0??) Are there quatizations Rq, Sq of R,S, with Hq-
action, and a quantization Tq of T with “canonical” bases (in some appro-
priate sense) B(Rq), B(Sq), B(Tq), where B(Rq) and B(Sq) are compatible
with the Hq-action and B(Sq) is an extension of B(Tq)? Furthermore, do
these canonical bases have appropriate positivity properties?
In other words, are there quantizations of R,S and T for which PH0
(Proposition 4.2.1) can be extended in a natural way?
If so, this extended PH0 may be used to prove PH1 and SH for s just as
in the case of Littlewood-Richardson coefficients (of type A).
4.3 Nonstandard quantum group for the Kronecker
and the plethysm problems
We now consider this question when s is the kronecker or the plethysm
constant (cf. Problems 1.1.1 and 1.1.2). PH0 for Littlewood-Richardson
coefficients (Proposition 4.2.1) depends critically on the theory of Drinfeld-
Jimbo quantum groups. This is intimately related (in type A) [GrL] to the
representation theory of Hecke algebras. To extend PH0 in the context of the
kronecker and the plethysm constants, one needs extensions of these theories
in the context of Problems 1.1.1-1.1.2. In this section, we briefly review the
results in [GCT4, GCT8, GCT7] in this direction and the theoretical and
experimental evidence it provides in support of PH0–that is, affirmative
answer to Question 4.2.6–in this context.
So let us consider the generalized plethysm problem (Problem 1.1.2).
As expected, the representation theory of Drinfeld-Jimbo quantum groups
and Hecke algebras does not work in the context of this general problem.
Briefly, the problem is that if H is a connected, reductive group and V its
representation, then the homomorphism H → G = GL(V ) does not quan-
tize in the setting of Drinfeld-Jimbo quantum groups. That is, there is no
quantum group homomorphism from Hq, the Drinfeld-Jimbo quantization
of H, to Gq, the Drinfeld-Jimbo quantization of G. In [GCT4, GCT7], a new
nonstandard quantization GHq of G– called a nonstandard quantum group–is
constructed so that there is a quantum group homomorphism Hq → G
When H = G, GHq coincides with the Drinfeld-Jimbo quantum group. The
article [GCT8] gives a conjectural scheme for constructing a nonstandard
canonical basis for the matrix coordinate ring of GHq that is akin to the
canonical basis for the matrix coordinate ring of the Drinfeld-Jimbo quan-
tum group [Lu4, Kas3].
It is known that the Drinfeld-Jimbo quantum group Gq = GLq(V ) and
the Hecke algebra Hn(q) are dually paired: i.e., they have commuting ac-
tions on V ⊗nq from the left and the right that determine each other, where Vq
denotes the standard quantization of V . Furthermore, the Kazhdan-Lusztig
basis forHn(q) is intimately related to the canonical basis for Gq [GrL]. Sim-
ilarly, [GCT7] constructs a nonstandard generalization BHn (q) of the Hecke
algebra which is (conjecturally) dually paired to GHq . The article [GCT8]
gives a conjectural scheme for constructing a nonstandard canonical basis of
BHn (q) akin to the Kazhdan-Lusztig basis of the Hecke algebra Hn(q).
The nonstandard quantum groupGHq and the nonstandard algebraB
n (q)
turn out to be fundamentally different from the standard Drinfeld-Jimbo
quantum group Gq and the Hecke algebra Hn(q). For example, the non-
standard quantum group GHq is a nonflat deformation of G in general. This
means the Poincare series of the matrix coordinate ring of GHq is different
from the Poincare series of the matrix coordinate ring of G. Specifically,
the terms of the first series can be smaller than the respective terms of the
second series. Similarly, BHn (q) is a nonflat deformation of the group algebra
C[Sn] of the symmetric group Sn; i.e., its dimension can be bigger than that
of C[Sn].
Nonflatness of GHq intuitively means that it is “smaller” than G in gen-
eral. Hence, it may seem that there is a loss of information when one goes
from G to GHq . Fortunately, there is none, as per the reciprocity conjecture
in [GCT7]. This roughly says that the information which is lost in the tran-
sition from G to GHq simply gets transfered to B
n (q), which is bigger than
Hn(q). In other words, there is no information loss overall. Hence analogues
of the properties in the standard setting should also hold in the nonstandard
setting, though in a far more complex way.
That is what seems to happen to positivity. Specifically, experimental ev-
idence suggests that the conjectural nonstandard canonical bases in [GCT8]
have nonstandard positivity properties which are complex versions of the
positivity properties in the standard setting. See [GCT7, GCT8, GCT10]
for a detailed story.
4.4 The cone associated with the subgroup restric-
tion problem
In this section, we prove Theorem 3.4.2, by extending the proof of Brion
and Knop (cf. [El]) for the Littlewood-Richardson problem. The proof is in
the spirit of the proof of quasipolynomiality in Section 4.1.
Let G be a connected, reductive group, H a connected, reductive sub-
group, and ρ : H → G a homomorphism. Theorem 3.4.2 has the following
equivalent formulation. Let S(H,G) be the set of pairs (µ, λ) such that
Vµ(H)⊗ Vλ(G) has a nonzero H-invariant. Then,
Theorem 4.4.1 The set S(H,G) is a finitely generated semigroup with re-
spect to addition.
When G = H ×H and the embedding H ⊆ G is diagonal, this special-
izes to the Brion-Knop result mentioned above. The proof follows by an
extension the technique therein.
Proof: Let B be a Borel subgroup of G, U the unipotent radical of B and
T the maximal torus in B. Similarly, let B′ be a Borel subgroup of H, U ′
the unipotent radical of B′ and T ′ the maximal torus in B′. Without loss
of generality, we can assume that B′ ⊆ B, U ′ ⊆ U , T ′ ⊆ T . Let A = C[G]U
be the algebra of regular functions on G that are invariant with respect to
the right multiplication by U . It is known to be finitely generated [El]. The
groups G and T act on A via left and right multiplication, respectively. As
a G× T -module,
A = ⊕λVλ(G), (4.7)
where the torus T acts on Vλ(G) via multiplication by the highest weight
λ∗ of the dual module. Similarly,
A′ = C[H]U
= ⊕λVµ(H), (4.8)
where the torus T ′ acts on Vµ(H) via multiplication by the highest weight
µ∗ of the dual module.
Now A⊗A′ is finitely generated since A and A′ are. Let X = (A⊗A′)H
be the ring of invariants of H acting diagonally on A⊗A′. The torus T ×T ′
acts on X from the right. Since H is reductive, X is finitely generated [PV].
Hence, the semigroup of the weights of the right action of T × T ′ on X is
finitely generated. We have
X = (A⊗A′)H = ((⊕Vλ(G)) ⊗ (⊕Vµ(H)))
H = ⊕(Vλ(G)⊗ Vµ(H))
and the weights of the algebra X are of the form (λ∗, µ∗) such that Vλ(G)⊗
Vµ(H) contains a nontrivial H-invariant. Therefore these pairs form a
finitely generated semigroup. Q.E.D.
For the sake of simplicity, assume that G and H are semisimple in what
follows. Let TR(H,G) denote the polyhedral convex cone in the weight
space of H ×G generated by T (H,G), as defined in Theorem 3.4.2. This is
a generalization of the Littlewood-Richardson cone (Section 2.2.2).
The following generalization of Corollary 3.2.3 is a consequence of The-
orem 3.1.1 and its proof.
Theorem 4.4.2 Assume that the positivity hypothesis PH1 (Section 3.3)
holds for the subgroup restriction problem for the pair (H,G), where both H
and G are classical. Given dominant weights µ, λ of H and G, the polytope
Pµ,λ as in PH1 has a specification of the form
Ax ≤ b (4.9)
where A depends only on H and G, but not on µ or λ, and b depends
homogeneously and linearly on µ, λ. Let n be the total number of columns
in A.
Then, there exists a decomposition of TR(H,G) into a set of polyhedral
cones, which form a cell complex C(H,G), and, for each chamber C in this
complex, a set M(C) of O(n) modular equations, each of the form
aiµi +
biλi = 0 (mod d),
such that
1. Saturation hypothesis SH is equivalent to saying that: (µ, λ) ∈ T (H,G)
iff (µ, λ) ∈ TR(H,G) and (µ, λ) satisfies the modular equations in the
set M(Cµ,λ) associated with the smallest cone Cµ,λ ∈ C(H,G) contain-
ing (µ, λ).
2. Given (µ, λ), whether (µ, λ) ∈ TR(H,G) can be determined in polyno-
mial time.
3. If so, whether (µ, λ) satisfies the modular equations associated with
the smallest cone in C(H,G) containing it can also be determined in
polynomial time.
Proof: Given a point p = (µ′, λ′) in the weight space of H×G, where µ′ and
λ′ are arbitrary rational points, let S(p) denote the constraints (half-spaces)
in the sytem (4.9) whose bounding hyperplanes contain the polytope Pµ′,λ′ .
We can decompose TR(H,G) into a conical, polyhedral cell complex, so that
given a cone C in this complex, and a point p in its interior, the set S(p)
does not depend on p. We shall denote this set by S(C). Thus the affine
span of Pµ,λ, for any (µ, λ) ∈ C, is determined by the linear system
A′x = b′,
where [A′, b′] consists of the rows of [A, b] in (4.9) corresponding to the set
S(C). By finding the Smith normal form of A′, we can associate with C a set
of modular equations that the entries of b′ must satisfy for this affine span to
contain an integer point; see the proof of Theorem 3.1.1. Since the entries of
A′ depend only on H and G, these equations depend only on C. If (µ, λ) ∈
T (H,G), then (µ, λ) is integral, and hence these equations are satisfied.
Conversely, if (µ, λ) ∈ TR(H,G) and these equations are satisfied, then the
saturation property implies that (µ, λ) ∈ T (H,G), as seen by examining
the proof of Theorem 3.1.1. Furthermore, given (µ, λ), the algorithm in the
proof of Theorem 3.1.1 implicitly determines if (µ, λ) ∈ TR(H,G) and if
these modular equations are satisfied in polynomial time. Q.E.D.
4.5 Elementary proof of rationality
In this section we give an elementary proof of rationality in Theorem 3.4.1
(a), when H therein is connected–actually of a slightly stronger statement:
namely, the stretching function m̃π
(n) is asymptotically a quasipolynomial,
as n → ∞; cf. Remark 4.1.4. But this proof cannot be extended to prove
quasipolynomiality for all n. The proof here is motivated by the work of
Rassart [Rs], De Loera and McAllister on the stretching function associated
with a Littlewood-Richardson coefficient.
First, we recall some standard results that we will need.
Vector partition functions
Given an integral s×n matrix B and integral n-vector c, consider the vector
paritition function φB(c), which is the number of integer solutions to the
integer programming problem
By = c, y ≥ 0. (4.10)
For a fixed c, b, let
φB,c(n) = φB(nc)
φB,c,b(n) = φB(nc+ b).
(4.11)
By Sturmfels [Stm] and Szenes-Vergne residue formula [SV], φB(c) is a
piecewise quasipolynomial function of c. That is, Rn can be decomposed into
polyhedral cones, called chambers, so that the restriction of φB(c) to each
chamber R is a multivariate quasipolynomial function of the coordinates of c.
This implies that φB,c(n) is a quasipolynomial function of n. It also implies
that the function φB,c,b(n) is asymptotically a quasipolynomial function of
n, as n→ ∞, because the points nc+ b, as n→ ∞, lie in just one chamber.
The Szenes-Verne residue formula [SV] for vector partition functions also
implies that there is a constant d(B), depending only on B, such that the
period of φB,c(n), for any c, divides d(B).
Klimyk’s formula
Let H ⊆ G and mπλ be as in Theorem 3.4.1 (a), with H connected. Let us
assume that H is semisimple, the general case being similar. Let H and G
be the Lie algebras of H and G respectively. We recall Klimyk’s formula for
. Without loss of generality, we can assume that the Cartan subalgebra
C ⊆ H is a subalgebra of the Cartan subalgebra D ⊆ G. So we have a
restriction from D∗ to C∗, and we assume that the half-spaces determining
positive roots are compatible. We denote weights of H by symbols such as µ
and of G by symbols such as µ̄. To be consistent, we shall use the notation
instead of mπ
in this proof. We write µ̄ ↓ µ if the weight µ̄ of G restricts
to the weight µ of H. We denote a typical element of the Weyl group of
H by W , and a typical element of the Weyl group of G by W̄ . Given a
dominant weight π of G and a weight µ̄ of G, let nµ̄(λ̄) denote the dimension
of the weight space for µ̄ in Bλ̄ = Vλ̄(G).
We assume that:
(A): For any weight µ of H, the number of µ̄’s such that µ̄ ↓ µ is finite.
For example, this is so in the plethysm problem (Problem 1.1.2). We
shall see later how this assumption can be removed.
By Klimyk’s formula (cf. page 428, [FH]),
(−1)W
µ̄↓π−ρ−W (ρ)
nµ̄(Vλ̄), (4.12)
where ρ is half the sum of positive roots of H. We allow µ̄ in the inner sum
to range over all weights µ̄ of G such that µ̄ ↓ π − ρ −W (ρ) by defining
nµ̄(Vλ̄) to be zero if µ̄ does not occur in Vλ̄.
Proof of Theorem 3.4.1 (a)
The goal is to express m̃π
(n) as a linear combination of vector partition
functions φB,c,b(n)’s, for suitable B, c, b’s, using Klimyk’s formula for m
After this, we can deduce asymptotic quasipolynomiality of m̃π
(n) from
asymptotic quasipolynomiality of φB,c,b(n)’s.
By Kostant’s multiplicity formula (cf. page 421 [FH]),
nµ̄(Vλ̄) =
(−1)W̄P (W̄ (λ̄+ ρ̄)− (µ̄+ ρ̄)), (4.13)
where P (λ̄), for a weight λ̄ of G, denotes the Kostant partition function;
i.e., the number of ways to write λ̄ as a sum of positive roots of G. It is
important for the proof that Kostant’s formula (4.13) holds even if µ̄ is not
a weight that occurs in the representation Vλ̄–in this case, nµ̄(Vλ̄) = 0, and
the right hand side of (4.13) vanishes.
By eq.(4.12) and (4.13),
(−1)W (−1)W̄
µ̄↓π−ρ−W (ρ)
P (W̄ (λ̄+ ρ̄)− (µ̄+ ρ̄)). (4.14)
Let D denote the dominant Weyl chamber in the weight space of G. Let
C denote the Weyl chamber complex associated with the weight space of G.
The cells in this complex are closed polyhedral cones. Each cone is either
the chamber W̄ (D), for some Weyl group element W̄ , or a closed face of
W̄ (D) of any dimension.
Using Möbius inversion, the inner sum
µ̄↓π−ρ−W (ρ)
P (W̄ (λ̄+ ρ̄)− (µ̄+ ρ̄))
in eq.(4.14) can be written as a linear combination
µ̄∈C:µ̄↓π−ρ−W (ρ)
P (W̄ (λ̄+ ρ̄)− (µ̄+ ρ̄)),
where C ranges over chambers in the Weyl chamber complex C, a(C) is an
appropriate constant for each C.
Hence,
(−1)W (−1)W̄
µ̄∈C:µ̄↓π−ρ−W (ρ)
P (W̄ (λ̄+ ρ̄)− (µ̄+ ρ̄)).
(4.15)
Now think of π and λ̄ as variables. But H and G are fixed, and hence
also the quantities such as ρ and ρ̄.
Claim 4.5.1 For fixed Weyl group elements W, W̄ and a fixed C, the sum
µ̄∈C:µ̄↓π−ρ−W (ρ)
P (W̄ (λ̄+ ρ̄)− (µ̄ + ρ̄)) (4.16)
can be expressed as a vector partition function associated with an appropriate
linear system
By = c, y ≥ 0, (4.17)
where the matrix
B = BH,G,C ,
depends only on C and the root systems of H and G, but not on π and λ̄,
and the coordinates of the vector
c = mW,W̄,C(λ̄, π, ρ, ρ̄),
depend on W, W̄ ,C, ρ, ρ̄, π, π, and furthermore, their dependence on π, λ̄, ρ, ρ̄
is linear.
Here assumption (A) is crucial. Without it, the sum (4.16) can diverge. Of
course, without assumption (A), we can still make the sum finite, by requir-
ing that µ̄ lie within the convex hull Hλ̄ generated by the points {W̄ (λ̄)},
where W̄ ranges over all Weyl group elements. This means we have to add
constraints to the system (4.17) corresponding to the facets of Hλ̄. But
the entries of the resulting B would depend on λ̄, and the theory of vector
partition functions will no longer apply.
Proof of the claim: Let µ̄i’s denote the integer coordinates of µ̄ in the basis
of fundamental weights. We denote the integer vector (µ̄1, µ̄2, · · · ) by µ̄
again. The Kostant partition function P (ν) is a vector partition function
associated with an integer programming problem:
BP v = ν, v ≥ 0,
where the columns of BP correspond to positive roots of G. The sum in
(4.16) is equal to the number of integral pairs (µ̄, v) such that
1. µ̄ ∈ C,
2. µ̄ ↓ π − ρ−W (ρ),
3. BP v = W̄ (λ̄+ ρ̄)− (µ̄ + ρ̄), v ≥ 0.
The first two condititions here can be expressed in terms of linear con-
straints (equalities and inequalities) on the coordinates µ̄i’s. Thus the three
conditions together can be expressed in terms of linear constraints on (µ̄, v).
By the finiteness assumption (A), the polytope determined by these con-
straints is a bounded polytope. The number of integer points in such a
polytope can be expressed as a vector partition function (cf. [BBCV]). This
proves the claim.
Let us denote the vector partition associated with the integer program-
ming problem (4.17) in the claim by φW,W̄ ,C(c(λ̄, π, ρ, ρ̄)). Then
(−1)W (−1)W̄
a(C)φW,W̄ ,C(c(λ̄, π, ρ, ρ̄)). (4.18)
Hence,
(n) = mnλ̄nπ =
(−1)W (−1)W̄
a(C)φW,W̄ ,C(c(nλ̄, nπ, ρ, ρ̄)).
(4.19)
It follows from Claim 4.5.1 and the standard results on vector partition
functions mentioned in the begining of this section that
gW,W̄ ,C(n) = φW,W̄ ,C(c(nλ̄, nπ, ρ, ρ̄)),
is asymptitically a quasipolynomial function of n. Hence, m̃π
(n) is also
asymptotically a quasipolynomial function of n. This implies (cf. [St1])
(t) =
(n)tn (4.20)
is rational function of t.
This proves Theorem 3.4.1 (a) under the finiteness assumption (A).
It remains to remove the assumption (A). Let G′ ⊇ H be the smallest
Levi subalgebra of G containing H. Then
mππ′ , (4.21)
where π′ ranges over dominant weights of G′, mπ
denotes the multiplicity of
Vπ′(G
′) in Vλ̄(G), and m
the multiplicity of Vπ(H) in Vπ′(G
′). Furthermore,
1. the finiteness asssumption (A) is now satisfied for the pair (G′,H): i.e.,
for any weight µ of H, the number of weights µ′’s of G′ such that µ′ ↓ µ
is finite.
2. There is a polyhedral expression for mπ
; this follows from [Li, Dh].
By the first condition and the argument above, we get an expression for
akin to (4.18). Substituting this expression and the polyhedral expres-
sion for mπ
in (4.21), leads to a formula for m̃π
(n) as a linear combination
of φB,c,b(n)’s for appropriate B, c, b’s. After this, we proceed as before.
This proves Theorem 3.4.1 (a). Q.E.D.
We also note down the following consequence of the proof.
Proposition 4.5.2 There is a constant D depending only G and H, such
that for any λ̄, π, orders of the poles of Mπ
(t) (cf. (4.20), as roots of unity,
divide D.
A bound onD provided by the proof below is very weak: D = O(2O(rank(G))).
Proof: It suffices to to bound the period of the quasipolynomial m̃π
(n). For
this, it suffices to let n→ ∞. For a fixed W, W̄ ,C, the chamber containing
c(nλ̄, nπ, ρ, ρ̄)) is completely determined by λ̄ and π as n→ ∞. Under these
conditions, the degree of φW,W̄ ,C(c(nλ̄, nπ, ρ, ρ̄)) is equal to the dimension of
the polytope associated with this vector partition function. This dimension
is clearly O(rank(G)2).
By Szenes-Vergne residue formula [SV], there is a constant D depending
on only G,H,W, W̄ , C, such that the period of the quasipolynomial h(n) =
φW,W̄ ,C(c(nλ̄, nπ, 0, 0)) divides D for every λ̄, π; here we are putting ρ and
ρ̄ equal to zero, since we are interested in what happens as n→ ∞. Q.E.D.
Chapter 5
Parallel and PSPACE
algorithms
In this chapter we give PSPACE algorithms (cf. Theorem 3.4.3) for com-
puting the various structural constants under consideration . We shall only
prove Theorem 3.4.3, when H is therein is either a complex, semisimple
group, or a symmetric group, or a general linear group over a finite field,
the extension to the general case being routine.
We recall two standard results in parallel complexity theory [KR], which
will be used repeatedly.
Let NC(t(N), p(N) denote the class of problems that can be solved
in O(t(N)) parallel time using O(p(N)) processors, where N denotes the
bitlength of the input. Let
NC = ∪iNC(log
i(N),poly(N)).
This is the class of problems having efficient parallel algorithms.
Proposition 5.0.3 [Cs, KR] Let A be an n × n-matrix with entries in a
ring R of characteristic zero. Then the determinant of A, and A−1, if A
is nonsingular, can be computed in O(log2 n) parallel steps using poly(n)
processors; here each operation in the ring is considered one step. Hence, if
R = Q, the problems of computing the determinant, the inverse and solving
linear systems belong to NC.
Proposition 5.0.4 The class NC(t(N), 2t(N)) ⊆ SPACE(O(t(N))). In
particular, NC(poly(N), 2O(poly(N))) ⊆ PSPACE.
5.1 Complex semisimple Lie group
In this section we prove a special case of Theorem 3.4.3 for the general-
ized plethym problem (Problem 1.1.2). Accordingly, let H be a complex,
semisimple, simply connected Lie group, G = GL(V ), where V = Vµ(H) is
an irreducible representation of H with dominant weight µ, ρ : H → G the
homomorphism corresponding to the representation, and mπ
the multiplic-
ity of Vπ(H) in Vλ(G), considered as an H-module via ρ; cf. Problem 1.1.3.
Then:
Theorem 5.1.1 The multiplicitymπ
can be computed in poly(〈λ〉, 〈µ〉, 〈π〉,dim(H))
space.
Here it is assumed that the partition λ = λ1 ≥ λ2 ≥ · · ·λr > 0 is rep-
resented in a compact form by specifying only its nonzero parts λ1, . . . , λr.
This is important since dim(G) can be exponential in dim(H) and 〈µ〉. A
compact representation allows 〈λ〉 to be small, say poly(dim(H), 〈µ〉), in this
case.
We begin with a simpler special case.
Proposition 5.1.2 If dim(V ) = poly(dim(H)), then mπ
can be computed
in PSPACE; i.e., in poly(〈λ〉, 〈µ〉, 〈π〉,dim(H)) space.
This implies that the Kronecker coefficient (Problem 1.1.1) can be computed
in PSPACE.
Proof: Let us use the notation λ̄ instead of λ to be consistent with the
notation used in Klimyk’s formula (4.12). By the latter,mπ
can be computed
in PSPACE if nµ̄(Vλ̄) in that formula can be computed in PSPACE for every
µ̄ and λ̄. In type A, this is just the number of Gelfand-Tsetlin tableau with
the shape λ̄ and weight µ̄. If dim(V ) = poly(dim(H)), the size of such
a tableau is O(dim(V )2) = poly(dim(H)). So we can count the number
of such tableu in PSPACE as follows: Begin with a zero count, and cycle
through all tableaux of shape λ̄ in polynomial space one by one, increasing
the count by one everytime the tableau satisfies all constraints for Gelfand-
Tsetlin tableau and has weight µ̄. In general, the role of Gelfand-Tsetlin
tableaux is played by Lakshmibai-Seshadri (LS) paths [Li, Dh]. Q.E.D.
The argument above does not work if dim(V ) is not poly(dim(H)), as
in the plethym problem (Problem 1.1.2), where dim(V ) = dim(Vµ) can
be exponential in n = dim(H) and the bitlength of µ. In this case, the
algorithm cannot even afford to write down a tableau since its size need not
be polynomial.
Next we turn to Theorem 5.1.1. For the sake of simplicity, we shall
prove it only for H = SLn(C), or rather GLn(C)–i.e., the usual plethysm
problem. This illustrates all the basic ideas. The general case is similar. We
shall prove a slightly stronger result in this case:
Theorem 5.1.3 The plethysm constant aπ
can be can be computed in
poly(〈λ〉, 〈µ〉, 〈π〉) space.
Here the dependence on n = dim(H) is not there. This makes a difference
if the heights of µ and π are less than n = dim(H)–remember that we are
using a compact representation of a partition in which only nonzero parts
are specified. This is really not a big issue. Because aπ
depends only on
the partitions λ, µ, π and not n. Hence, without loss of generality, we can
assume that n is the maximum of the heights of µ and π. It is possible to
strengthen Theorem 5.1.1 similarly.
To prove Theorem 5.1.3, we shall give an efficient parallel algorithm to
compute ãπλ,µ that works in poly(〈λ〉, 〈µ〉, 〈π〉) parallel time usingO(2
poly(〈λ〉,〈µ〉,〈π〉))
processors. This will show that the problem of computing ãπλ,µ is in the com-
plexity class NC(poly(〈λ〉, 〈µ〉, 〈π〉), 2poly(〈λ〉,〈µ〉,〈π〉)), which is contained in
PSPACE by Proposition 5.0.4. The basic idea is to parallelize the classical
character-based algorithm for computing aπ
by using efficient parallel algo-
rithm for inverting a matrix and solving a linear system (Proposition 5.0.3).
We begin by recalling the standard facts concerning the characters of
the general linear group. Given a representation W of GLm(C), let ρ :
GLm(C) → GL(W ) be the representation map. Let χρ(x1, . . . , xm) de-
note the formal character of this representation W . This is the trace of
ρ(diag(x1, . . . , xm)), where diag(x1, . . . , xn) denotes the generic diagonal ma-
trix with variable entries x1, . . . , xm on its diagonal. If W is an irreducible
representation Vλ(GLm(C)), then χρ(x1, . . . , xm) is the Schur polynomial
Sλ(x1, . . . , xm). By the Weyl character formula,
λi+m−i
|xm−ij |
, (5.1)
where |aij | denotes the determinant of anm×m-matrix a. The Schur polyno-
mials form a basis of the ring of symmetric polynomials in x1, . . . , xm. The
simplest basis of this ring consists of the complete symmetric polynomials
Mβ(x1, . . . , xm) defined by
Mβ(x1, . . . , xm) =
where γ ranges over all permutations of β and tγ =
i . Schur polyno-
mials are related to Mβ by:
Mβ , (5.2)
where k
is the Kostka number. This is the number of semistandard tableau
of shape λ and weight β.
If the representation W is reducible, its decomposition into irreducibles
is given by:
m(π)Vπ(GLn(C)), (5.3)
where m(π)’s are the coefficients of the formal character χρ(x1, . . . , xm) in
the Schur basis:
m(π)Sπ.
Proof of Theorem 5.1.3
Let λ, µ, π be as in Theorem 5.1.3. Let H = GLn(C), V = Vµ(H), G =
GL(V ). Let sλ(x1, . . . , xm) be the formal character of the representation
Vλ(G) of G. Here m = dim(Vµ) can be exponential in n and 〈µ〉. The basis
of Vµ(H) is indexed by semistandard tableau of shape µ with entries in [1, n].
Let us order these tableau, say lexicographically, and let Ti, 1 ≤ i ≤ m,
denote the i-th tableau in this order. With each tableau T , we associate a
monomial
t(T ) =
wi(T )
where wi(T ) denotes the number of i’s in T . Given a polynomial f(x1, . . . , xm),
let us define fµ = fµ(t1, . . . , tn) to be the polynomial obtained by substi-
tuting xi = t(Ti) in f(x1, . . . , xm). Then the formal character of Vλ(G),
considered as an H-representation of via the homomorphism H → G =
GL(Vµ(H)), is the symmetric polynomial Sλ,µ(t1, . . . , tn) = (Sλ)µ. The
plethysm constant aπλ,µ is defined by:
Sλ,µ(t1, . . . , tn) =
aπλ,µSπ(t1, . . . , tn). (5.4)
An efficient parallel algorithm to compute aπλ,µ is as follows. Here by an
efficient parallel algorithm, we mean an algorithm that works in poly(〈λ〉, 〈µ〉, 〈π〉)
time using 2poly(〈λ〉,〈µ〉,〈π〉) processors. We will repeatedly use Proposi-
tion 5.0.3.
Algorithm
(1) Compute Sλ,µ(t1, . . . , tn). By the Weyl character formula (5.1),
Sλ,µ(t1, . . . , tn) =
Aλ,µ(t1, . . . , tn)
Bλ,µ(t1, . . . , tn)
where Aλ(x1, . . . , xm) and Bλ(x1, . . . , xm) denote the numerator and denom-
inator in (5.1), and Aλ,µ = (Aλ)µ, and Bλ,µ = (Bλ)µ. Let R = C[t1, . . . , tn].
Aλ,µ(t1, . . . , tn) = |t(Tj)
λi+m−i|.
This is the determinant of an m×m matrix with entries in R, where m =
dim(V ) can be exponential in n and 〈µ〉. It can be evaluated in O(log2m)
parallel ring operations using poly(m) processors. Each ring element that
arises in the course of this algorithm is a polynomial in t1, . . . , tn of total
degree O(|λ|m), where |λ| denotes the size of λ. The total number of its
coefficients is r = O((|λ|m)n). Hence each ring operation can be carried
out efficiently in O(log2(r)) parallel time using poly(r) processors. Since
logm = poly(n, 〈µ〉) and log r = poly(n, 〈λ〉, 〈µ〉), it follows that Aλ,µ can
be evaluated in poly(n, 〈µ〉, 〈λ〉) parallel time using 2poly(n,〈µ,λ〉) processors.
The determinant Bλ,µ can also be computed efficiently in parallel in a similar
fashion. To compute Sλ,µ, we have to divide Aλ,µ by Bλ,µ. This can be done
by solving an r × r linear system, which, again, can be done efficiently in
parallel. This computation yields representation of Sλ,µ in the monomial
basis {Mβ} of the ring of symmetric polynomials in t1, . . . , tn.
(2) To get the coefficients aπλ,µ, we have to get the representation of Sλ,µ(t)
in the Schur basis. This change of basis requires inversion of the matrix
in the linear system (5.2). The entries of the matrix K occuring in this
linear system are Kostka numbers. Each Kostka number can be computed
efficiently in parallel. Hence, all entries of this matrix can be computed
efficiently in parallel. After this, the matrix can be inverted efficiently in
parallel, and the coefficients aπλ,µ’s of Sλ,µ in the Schur basis can be computed
efficiently in parallel. Finally, we use Proposition 5.0.4 to conclude that aπ
can be computed in PSPACE. Q.E.D.
5.2 Symmetric group
Next we prove Theorem 3.4.3 when H = Sm. Let X = Vµ(Sm) be an
irreducible representation (the Specht module) of Sm corresponding to a
partition µ of size m. Let ρ : H → G = GL(X) be the corresponding
homomorphism.
Theorem 5.2.1 Given partitions λ, µ, π, where µ and π have size m, the
multiplicity mπ
of the Specht module Vπ(Sm) in Vλ(G) can be computed in
poly(m, 〈λ〉) space.
Remark 5.2.2 The bitlengths 〈µ〉 and 〈π〉 are not mentioned in the com-
plexity bound because they are bounded by m.
For this, we need three lemmas.
Lemma 5.2.3 The character of a symmetric group can be computed in
PSPACE. Specifically, given a partition π of size m, and a sequence
i = (i1, i2, . . .) of nonnegative integers such that
jij = m, the value of
the character χπ of Sm on the conjugacy class Ci of permutations indexed
by i can be computed in poly(m) parallel time using 2poly(m) processors.
Hence it can be computed in poly(m) space (cf. Proposition 5.0.4).
Here the conjugacy class Ci consists of those permutations that have i1
1-cycles, i2 2-cycles, and so on.
Proof: Let k be the height of the partition π. Let x = (x1, . . . , xk) be the
tuple of variables xi’s. Given a formal series f(x) and a tuple (l1, . . . , lk) of
nonnegative integers, let [f(x)](l1,...,lk) denote the coefficient of x
1 · · · x
By the Frobenius character formula [FH],
χλ(Ci) = [f(x)](l1,...,lk), (5.5)
where
l1 = π1 + k − 1, l2 = π2 + k − 2, . . . , lk = πk,
f(x) = ∆(x)
Pj(x)
∆(x) =
i<j(xi − xj),
Pj(x) = x
1 + · · ·+ x
(5.6)
Since deg(f) = poly(m) and k ≤ m, the total number of coefficients of
f(x) is 2poly(m). Hence, we can evaluate f(x) in PSPACE by setting up
appropriate recurrence relations.
Alternatively, we can easily evaluate f(x) in poly(m) parallel time using
2poly(m) processors, and then extract its required coefficient. After this, the
result follows from Proposition 5.0.4. Q.E.D.
Lemma 5.2.4 Suppose φ is a character of Sm whose value on any conju-
gacy class Ci can be computed in O(s) space, for some parameter s. Then,
the multiplicity of the representation Vπ(Sm) in the representation Vφ(Sm)
corresponding φ can be computed in O(poly(m) + s) space.
Proof: The multiplicity is given by the inner product
〈φ, χπ〉 =
¯φ(σ)χπ(σ). (5.7)
By assumption, φ(σ) can be computed in O(s) space, and by Lemma 5.2.3,
χπ(σ) can be computed in poly(m) space. Hence, the result follows from
the preceding formula. Q.E.D.
Given an irreducible representation X = Vµ(Sm) and an irreducible rep-
resentation W = Vλ(G) of G = GL(X)), let ρµ denote the representation
map Sm → G, ρλ the representation map G→ GL(W ), and
ρ : Sm → G→ GL(W )
their composition. This is a representation of Sm. Let χρ be the character
of ρ.
Lemma 5.2.5 For any σ ∈ Sm, χρ(σ) can be computed in poly(m, 〈λ〉) in
poly(m, 〈λ〉) space.
The bitlength 〈µ〉 is not mentioned in the complexity bound because it
is bounded by m.
Proof: Let r = dim(X). The formal character of the representation Vλ(G)
of G = GL(X) is the Schur polynomial Sλ(x1, . . . , xr), r = dim(X). Hence,
χρ(σ) = Sλ(α)
where α = (α1, . . . , αr) is the tuple of eigenvalues of ρµ(σ). We shall compute
the right hand side fast in parallel–i.e., in poly(m, 〈λ〉) parallel time using
2poly(m,〈λ〉) processors–and then use Proposition 5.0.4 to conclude the proof.
This is done as follows.
(1) Let χµ denote the character of the representation ρµ. Let pi(α) = α
· · · + αir denote the i-th power sum of the eigenvalues. For any i,
pi(α) = χµ(σ
We can compute σi, for i ≤ |λ|, where |λ| denotes the size of λ, in poly(log i,m) =
poly(m, 〈λ〉) time using repeated squaring. After this χµ(σ
i) can be com-
puted fast in parallel in poly(m) time using Lemma 5.2.3. Thus each pi(α)
can be computed in poly(m, 〈λ〉) time in parallel using 2poly(m,〈λ〉) proces-
sors. We calculate pi(α) in parallel for all i ≤ |λ|, and all pγ(α) =
j pγj (α)
in parallel for all partitions γ of size at most m.
(2) After this, we calculate the complete symmetric function hi(α), for
each i ≤ |λ|, fast in parallel, by using the relation [Mc]:
|γ|=i
z−1γ pγ ,
where zγ =
i≥1 i
mimi!, and mi = mi(γ) denotes the number of parts of γ
equal to i. Thus we can calculate hγ(α) =
j hγj (α), for all partitions γ of
size m, fast in parallel.
(3) To compute Sλ(α), we recall that the transition matrix between
the Schur basis {Sλ} and the complete symmetric basis {hγ} of the ring
of symmetric functions is K∗, the transpose inverse of the Kostka matrix
K = [Kλ,γ ], where Kλ,γ denote the Kostka number; cf. [Mc]. As we noted in
the proof of Theorem 5.1.3, each Kostka number can be computed in fast in
parallel. Hence, K can be computed fast in parallel. After this, its inverse
K−1 can be computed fast in parallel by Proposition 5.0.3–this is the crux
of the proof–and finally K∗ as well. Thus Sλ(α) can be computed fast in
parallel, since each hγ(α) can be computed fast in parallel. Q.E.D.
Theorem 5.2.1 follows from Lemma 5.2.3,5.2.4 and 5.2.5. Q.E.D.
5.3 General linear group over a finite field
In this section we prove Theorem 3.4.3, when H therein is the general lin-
ear group GLn(Fpk) over a finite field Fpk . Irreducible representations of
H = GLn(Fpk) have been classified by Green [Mc]. They are labelled by
certain partition-valued functions. See [Mc] for a precise description of these
labelling functions. It is clear from the description therein that each labelling
function has a compact representation of bitlength O(n + k + 〈p〉), where
〈p〉 = log2 p; we specify a function by giving its partition values at the places
where it is nonzero. Let µ denote any such label. Let X = Vµ(H) be the
corresponding irreducible representation of H, and ρ : H → G = GL(X)
the corresponding homomorphism.
Theorem 5.3.1 Given a partition λ and labelling functions µ and π as
above, the multiplicity mπλ,µ of the irreducible representation Vπ(H) in Vλ(G)
can be computed in poly(n, k, 〈p〉, 〈λ〉) space.
The proof is similar to that of Theorem 5.2.1 for the symmetric group
with the following result playing the role of Lemma 5.2.3:
Lemma 5.3.2 Given a label γ of an irreducible character χγ of H = GLn(Fpk)
and a label δ of a conjugacy class in H, the value χγ(δ) can be computed
in poly(n, k, 〈p〉) parallel time using 2poly(n,k,〈p〉) processors, and hence by
Proposition 5.0.4, in poly(n, k, 〈p〉) space.
The label δ of a conjugacy class in H is also a partition valued function
[Mc], which admits a compact representation of bitlength poly(n, k, 〈p〉).
Proof: We shall parallelize Green’s algorithm [Mc] for computing the charac-
ter values, and then conclude by Proposition 5.0.3. Green shows that χγ(δ)’s
are entries of a transition matrix between a two polynomial bases: the first
constructed using Hall-Littlewood polynomials, and the second using Schur
polynomials. We have construct this transition matrix fast in parallel. We
shall only indicate here how the transition matrix between the basis of Hall-
Littlewood polynomials and the Schur basis for the ring symmetric functions
over Z[t] can be constructed fast in parallel. This idea can then be easily
extended to complete the proof.
First, we recall the definition of the Hall-Littlewood polynomial Pπ(x; t) =
Pπ(x1, . . . , xk; t) [Mc]. This is a symmetric polynomial in xi’s with co-
efficients in Z[t]. It interpolates between the Schur function sπ(x) and
the monomial symmetric function mπ(x) because Pπ(x; 0) = sπ(x) and
Pπ(x; 1) = mπ(x). The formal definition is as follows:
For a given partition π, let vπ(t) =
i≥0 vmi(t), where mi is the number
of parts of π equal to i, and
vm(t) =
1− ti
Pπ(x; t) =
Aπ(x, t)
Bπ(x, t)
, (5.8)
where
Aπ(x, t) =
sgn(σ)σ(xπ11 · · · x
i<j xi − txj ,
Bπ(x, t) = vπ(t)
i<j(xi − xj).
(5.9)
Here sgn(σ) denotes the sign of σ.
Let wπ,α(t)’s be the coeffcients of Pπ(x, t) in the Schur basis:
Pπ(x; t) =
wπ,α(t)sα(x). (5.10)
We want to calculate the matrix W = [wπ,α] fast in parallel. Using
formula (5.9), we calculate Aπ(x; t) fast in parallel; i.e., we calculate the
coefficients of Aπ(x; t) in the basis of monomials in x and t. We calculate
Bπ(x; t) similarly. After this the division in (5.8) can be carried out by
solving a an appropriate linear system. This can be done fast in parallel
by Proposition 5.0.3. Since, Pπ(x; t) is symmetric in xi’s, this yields its
coefficients in the monomial symmetric basis {mα(x)} with the coefficients
being in Z[t]. The transition matrix [Mc] from the monomial symmetric
basis to the Schur basis is given by the inverse of the Kostka matrix. This
inverse can be computed fast in parallel by Proposition 5.0.3. After this,
the coefficients wπ,α’s of Pπ(x; t) in the Schur basis can be computed fast in
parallel.
Furthermore, the inverse of W = [Wπ,α] can also be computed fast in
parallel by Proposition 5.0.3. Q.E.D.
5.3.1 Tensor product problem
Analogue of the Kronecker problem (Problem 1.1.1) for H = GLn(Fpk) is:
Problem 5.3.3 Given partition valued functions λ, µ, π, decide if the mul-
tiplicity bπλ,µ of Vπ(H) in the tensor product Vλ(H)⊗ Vµ(H) is nonzero.
In this context:
Theorem 5.3.4 The multiplicity mπ
can be computed in PSPACE; i.e.,
in poly(n, k, 〈p〉) space.
Proof: This follows from Lemma 5.3.2 and analogues of Lemmas 5.2.4 and
5.2.5 in this setting. Q.E.D.
A possible canditate for a stretching function assoociated with bπ
b̃πλ,µ(n) = b
nλ,nµ,
where nλ denotes the stretched partition-valed function obtained by stretch-
ing each partition value of λ by a factor of n. In other words b̃π
(n) is
the multiplicity of Vnπ(H(n)) in Vnλ(H(n)) ⊗ Vnµ(H(n)), where H(n) =
GLnm(Fpk) is the stretched group. Is it a quasi-polynomial? If so, we can
also ask for a good bound on its saturation and positivity indices.
5.4 Finite simple groups of Lie type
The work of Deligne-Lusztig [DL] and Lusztig[Lu5] yield an algorithm for
computing the character values for finite simple groups of Lie type.
Question 5.4.1 Can this algorithm be parallelized?
If so, Lemma 5.3.2, and hence Theorem 5.3.1, can be extended to finite
simple groups of Lie type.
Chapter 6
Experimental evidence for
positivity
In this chapter we give experimental evidence for positivity (PH2,3).
6.1 Littlewood-Richardson problem
Experimental evidence for PH2 in the context of the Littlewood-Richardson
problem (Problem 1.2.1) has been given in [DM2], and for PH3 in type A
in [KTT]. We give experimental evidence for PH3 in types B,C,D here.
Let Cλ
(t) be as in eq.(1.2). Its reduced positive form for various values
of α, β, λ is shown in Figure 6.1 for type B, in Figure 6.2 for type C, and
Figure 6.3 for type D. The rank of the Lie algebra is three in all cases.
In these types, the period of the stretching quasipolynomial c̃λ
(n) is at
most two. Accordingly, the period of every pole of Cλ
(t) is at most two.
The tables were computed from the tables in [DM2] for the stretching quasi-
polynomial c̃λ
(n) in these cases.
6.2 Kronecker problem, n = 2
Let kπλ,µ be the Kronecker coefficient in Problem 1.1.1. Let k̃
λ,µ(n) = k̃
nλ,nµ
be the associated stretching quasi-polynomial, and
Kπλ,µ(t) =
k̃πλ,µ(n)t
the associated rational function. An explicit formula (with alternating signs)
for the Kronecker coefficient, when n = 2, has given by Remmel and White-
head [RW] and Rosas [Ro], and a positive formula in [GCT9]. This case
turns out to be nontrivial. For example, the number of chambers (domains)
of quasi-polynomiality in this case turns out to be more than a million. Their
explicit description can be found out using the formula for the Kronecker
coefficient in [RW].
We implemented Rosas’ formula to check PH2 for the quasipolynomial
(n) for a few thousand values of µ, ν and λ with the help of a computer.
A large number of samples was chosen to ensure that a significant fraction
of the chambers were sampled. The quasi-polynomial k̃πλ,µ(n) and a positive
form of the rational function Cπλ,µ(t) are shown Figures 6.4 and 6.5 for few
sample values of λ = (λ1, λ2), µ = (µ1, µ2), and π = (π1, π2, π3, π4). It
may be noted that k̃π
(n) need not be a polynomial; this answers Kirillov’s
question [Ki] in the negative. But its period is at most two for n = 2. This
follows from the formula in [RW]. For the λ, µ and π that we sampled,
positivity index of k̃π
(n) is always zero. But it turns out [BOR] that
there are some λ, µ and π for which the saturation and positivity indices of
(n) are nonzero (one), but very small and thus consistent with SH and
PH2 (Hypothesis1.6.6) in this paper; in the earlier version of this paper, SH
and PH2 stipulated that the saturation and positivity indices are always
zero. These (λ, µ, π) escaped our random sampling, because their density is
extremely small [BOR].
6.3 G/P and Schubert varieties
Let V = Vλ(G) be an irreducible representation of G = SLk(C) correspond-
ing to a partition λ. Let vλ be the point in P (V ) corresponding to the
highest weight vector, and X = Gvλ ∼= G/Pλ its closed orbit. Let hk,λ(n)
be the Hilbert function of the homogeneous coordinate ring R of X. It is
a quasipolynomial since spec(R) has rational singularities. In fact, it is a
polynomial, since t = 1 is the only pole of the Hilbert series
Hk,λ(t) =
hk,λ(n)t
Figure 6.6 gives experimental evidence for strict positivity (PH2) of hk,λ(n)
(as discussed in Section 3.5.5) for a few sample values of k and λ. Figure 6.7
gives experimental evidence for strict positivity of the Hilbert polynomial
of the Schubert subvarieties of the Grassmanian; there Gn,k denotes the
Grassmannian of k-planes in V = Cn, and Ωa, a = (a(1), . . . , a(d)) its
Schubert subvariety consisting of the k-subspaces W such that dim(W ∩
Vn−k+i−a(i)) ≥ i for all i, where V = Vn ⊃ · · ·V1 ⊃ 0 is a complete flag of
subspaces in V . The Hilbert polynomials were computed using the explicit
polyhedral interpretation for them deduced from the theory of algebras with
straightening laws (Hodge algebras) [DEP2].
6.4 The ring of symmetric functions
Let V = Ck, G = GL(V ), H = Sk, with the natural embedding H →
G. Let us consider the spacial case of the subgroup restriction problem
(Problem 1.1.3), with Vλ(G) = V , and Vπ(H) the trivial representation of
H. Then s = mπ
, the multiplicity of the trivial representation in V , is
one. Though the decision problem (Problem 1.1.3) is trivial in this case, the
canonical model associated with s is nontrivial.
The canonical rings R = R(s) and S = S(s) associated with s in this
case coincide with C[V ] = C[x1, . . . , xk]. The ring T = T (s) = S
C[x1, . . . , xk]
Sk is the subring of symmetric functions. Its Hilbert function
h(n) is a quasipolynomial. PH1 and PH3 for Z = Proj(T ), as per Defini-
tion 3.5.2, follow easily, the latter from the well known rational generating
function for the partition function [St1]. But PH2 turns out to be nontriv-
ial. Figures 6.8-6.13 give experimental evidence for strict positivity of h(n)
(PH2). In these figures, the i-th row of the table for a given k shows hi(n),
where hi(n), 1 ≤ i ≤ l, are such that h(n) = hi(n), when n = i modulo the
period l of h(n).
α β λ Cλ
(0, 15, 5) (12, 15, 3) (6, 15, 6) 350 t
8+19121 t7+123576 t6+297561 t5+342064 t4+192779 t3+46992 t2+2641 t+1
(1−t)
(1−t2)
(4, 8, 11) (3, 15, 10) (10, 1, 3) 1+5 t+6 t
(1−t2)
(8, 1, 3) (11, 13, 3) (8, 6, 14) 2 t
8+45 t7+259 t6+591 t5+773 t4+522 t3+198 t2+29 t+1
(1−t)3(1−t2)
(8, 9, 14) (8, 4, 5) (1, 5, 15) 136 t
9+3422 t8+20204 t7+53608 t6+76076 t5+60986 t4+26674 t3+5568 t2+345 t+1
(1−t)3(1−t2)
(10, 5, 6) (5, 4, 10) (0, 7, 12) 219 t
8+12135 t7+79231 t6+193003 t5+223919 t4+127907 t3+31704 t2+1870 t+1
(1−t)6(1+t)3
Figure 6.1: Cλ
(t) for B3
α β λ Cλα,β(t)
(1, 13, 6) (14, 15, 5) (5, 11, 7) 18145 t
8+267151 t7+1070716 t6+1917716 t5+1735692 t4+778184 t3+144596 t2+5538 t+1
(1−t)4(1−t2)
(4, 15, 14) (12, 12, 10) (4, 9, 8) 2280 t
9+267658 t8+2746131 t7+9276935 t6+14682332 t5+11903923 t4+4746803 t3+751126 t2+21249 t+1
(1−t)4(1−t2)
(9, 0, 8) (8, 12, 9) (7, 7, 3) 3 t
2+4 t+1
(1−t)6
(10, 2, 7) (8, 10, 1) (7, 5, 5) 8984 t
8+132826 t7+534183 t6+960491 t5+873227 t4+394045 t3+74067 t2+2941 t+1
(1−t)4(1−t2)
(10, 10, 15) (11, 3, 15) (10, 7, 15) 7162 t
9+736327 t8+7178960 t7+23540366 t6+36359642 t5+28788904 t4+11166361 t3+1693696 t2+43515 t+1
(1−t)7(1+t)3
Figure 6.2: Cλα,β(t) for C3
α β λ Cλ
(0, 15, 5) (12, 15, 3) (6, 15, 6) 633 t
7+24259 t6+142236 t5+252113 t4+168220 t3+36131 t2+1414 t+1
(1−t)
(1−t2)
(4, 8, 11) (3, 15, 10) (10, 1, 3) 7962 t
8+503679 t7+4525372 t6+11944350 t5+12218255 t4+4879052 t3+586370 t2+10862 t+1
(1−t)
(1−t2)
(8, 1, 3) (11, 13, 3) (8, 6, 14) 81 t
7+19407 t6+211964 t5+513585 t4+426652 t3+110317 t2+4609 t+1
(1−t)
(1−t2)
(8, 9, 14) (8, 4, 5) (1, 5, 15) 9 t
2+8 t+1+2 t3
(1−t)
(10, 5, 6) (5, 4, 10) (0, 7, 12) 3647 t
7+111208 t6+570739 t5+920201 t4+560336 t3+106748 t2+3435 t+1
(1−t)
(1+t)
Figure 6.3: Cλ
(t) for D3
λ1 λ2 µ1 µ2 π1 π2 π3 π4 k̃
λ,µ(n); n odd k̃
λ,µ(n); n even K
λ,µ(t)
87 62 97 52 64 39 24 22 1/2 + 4n+ 11/2n2 1 + 4n+ 11/2n2 1+8 t+11 t
2+2 t3
(1−t)2(1−t2)
104 95 149 50 95 78 15 11 1/2 + 13/2n + 18n2 1 + 13/2n + 18n2 1+23 t+36 t
2+12 t3
(1−t)2(1−t2)
101 85 102 84 78 72 24 12 17/2n + 71
n2 1 + 17/2n + 71
n2 1+42 t+72 t
2+27 t3
(1−t)
(1−t2)
79 63 93 49 88 37 14 3 3/4 + 27
n+ 303
n2 1 + 27
n+ 303
n2 1+88 t+151 t
2+63 t3
(1−t)
(1−t2)
97 93 114 76 77 66 47 0 1/2 + 15/2n + 21n2 1 + 15/2n + 21n2 1+27 t+42 t
2+14 t3
(1−t)2(1−t2)
88 56 113 31 99 35 7 3 1/2 + 11/2n + 10n2 1 + 11/2n + 10n2 1+14 t+20 t
2+5 t3
(1−t)2(1−t2)
134 82 140 76 91 72 49 4 3/4 + 21n + 669
n2 1 + 21n + 669
n2 1+187 t+334 t
2+147 t3
(1−t)2(1−t2)
133 69 149 53 98 55 43 6 1 + 6n + 8n2 1 + 6n + 8n2 15 t
2+13 t+1+3 t3
(1−t)3
80 63 111 32 88 38 10 7 1 1 1+t
118 69 151 36 95 63 20 9 1 + 4n + 4n2 1 + 4n + 4n2 7 t
2+7 t+1+t3
(1−t)3
96 51 103 44 90 53 3 1 1/2 + 39
n+ 36n2 1 + 39
n+ 36n2 1+54 t+72 t
2+17 t3
(1−t)2(1−t2)
117 72 133 56 82 57 41 9 1 + 9n+ 18n2 1 + 9n + 18n2 35 t
2+26 t+1+10 t3
(1−t)
72 63 77 58 49 38 28 20 1/2 + 7n + 55
n2 1 + 7n+ 55
n2 1+33 t+55 t
2+21 t3
(1−t)
(1−t2)
48 37 49 36 34 24 16 11 1/2 + 6n + 37
n2 1 + 6n+ 37
n2 1+23 t+37 t
2+13 t3
(1−t)
(1−t2)
108 56 113 51 73 50 29 12 1 + 4n + 4n2 1 + 4n + 4n2 7 t
2+7 t+1+t3
(1−t)3
Figure 6.4: The quasipolynomial k̃π
and the rational function Kπ
(t) for the Kronecker problem, n = 2.
λ1 λ2 µ1 µ2 π1 π2 π3 π4 k̃
(n); n odd k̃π
(n); n even Kπ
77 40 78 39 58 29 24 6 1 + 19/2n + 57
n2 1 + 19/2n + 57
n2 56 t
2+37 t+1+20 t3
(1−t)3
153 81 157 77 96 63 61 14 1 + 3n+ 2n2 1 + 3n + 2n2 3 t
2+4 t+1
(1−t)3
90 89 102 77 90 42 30 17 1/2 + 13/2n + 6n2 1 + 13/2n + 6n2 1+11 t+12 t
(1−t)2(1−t2)
145 102 160 87 96 84 39 28 1 + 10n + 25n2 1 + 10n + 25n2 49 t
2+34 t+1+16 t3
(1−t)3
109 95 136 68 78 60 46 20 1 + 3n+ 2n2 1 + 3n + 2n2 3 t
2+4 t+1
(1−t)3
100 42 104 38 85 27 27 3 1 + 8n 1 + 8n 8 t+1+7 t
(1−t)2
74 51 86 39 52 34 26 13 1 1 1+t
98 90 124 64 92 67 22 7 1/2 + 23/2n + 60n2 1 + 23/2n + 60n2 1+70 t+120 t
2+49 t3
(1−t)2(1−t2)
57 38 75 20 52 25 17 1 1 + 3n+ 2n2 1 + 3n + 2n2 3 t
2+4 t+1
(1−t)
159 140 170 129 89 82 73 55 1 + 3/2n + 1/2n2 1 + 3/2n + 1/2n2 1+t
(1−t)3
144 122 157 109 88 86 74 18 3/4 + n+ 1/4n2 1 + n+ 1/4n2 1
(1−t)2(1−t2)
90 68 92 66 88 37 23 10 1/4 + 12n + 351
n2 1 + 12n+ 351
n2 1+98 t+176 t
2+76 t3
(1−t)
(1−t2)
89 42 100 31 76 28 19 8 1 + 6n+ 8n2 1 + 6n + 8n2 15 t
2+13 t+1+3 t3
(1−t)
88 56 107 37 71 39 20 14 1 + 9/2n + 9/2n2 1 + 9/2n + 9/2n2 8 t
2+8 t+1+t3
(1−t)3
124 111 133 102 98 89 27 21 1/2 + 7n+ 53
n2 1 + 7n+ 53
n2 1+32 t+53 t
2+20 t3
(1−t)2(1−t2)
Figure 6.5: Continuation of Figure 6.4
k λ hk,λ(n)
3 (21, 19) 399n3 + 35527969472513
137438953472
n2 + 4329327034365
137438953472
5 (21, 19) 3700378042361
4194304
n7 + 575575719967
524288
n6 + 2157156441
n5 + 266554253
n4 + 4643843
n3 + 1468423
n2 + 7619
3 (21, 9, 6) 270n3 + 40819369181185
274877906944
n2 + 3092376453119
137438953472
3 (12, 9, 5) 42n3 + 40132174413825
1099511627776
n2 + 11544872091645
1099511627776
3 (21, 9, 6) 27396522639355
536870912
n6 + 463063744509
8388608
n5 + 6265700353
262144
n4 + 5577375771
1048576
n3 + 84246529
131072
n2 + 20971505
524288
n+ 1048573
1048576
3 (21, 19, 16) 15n3 + 81363860455425
4398046511104
n2 + 8246337208319
1099511627776
4 (9, 7, 5) 7215545057279
17179869184
n6 + 4183298146289
4294967296
n5 + 247765925897
268435456
n4 + 1914699777
4194304
n3 + 4160749567
33554432
n2 + 587202553
33554432
n+ 67108863
67108864
4 (21, 12, 9) 16437913583613
268435456
n6 + 132498063359
2097152
n5 + 109509083155
4194304
n4 + 1462763527
262144
n3 + 171442179
262144
n2 + 10485755
262144
n+ 524287
524288
4 (21, 9, 5) 32469952757755
536870912
n6 + 129805320191
2097152
n5 + 108129157137
4194304
n4 + 2926313487
524288
n3 + 86638593
131072
n2 + 10616825
262144
n+ 262143
262144
4 (21, 9, 6) 27396522639355
536870912
n6 + 463063744509
8388608
n5 + 6265700353
262144
n4 + 5577375771
1048576
n3 + 84246529
131072
n2 + 20971505
524288
n+ 1048573
1048576
4 (31, 19, 5) 35969680015355
33554432
n6 + 1424674346311
2097152
n5 + 22705493343
131072
n4 + 46973953
n3 + 3423915
n2 + 65365
n+ 16383
16384
Figure 6.6: Hilbert polynomial for G/Pλ, G = SLk(C). There is a slight rounding error caused by interpolation–
e.g., the constant term of each polynomial should be one.
n k a
7 3 (1, 3, 5) 1/3n3 + 59373627899905
39582418599936
n2 + 28587302322173
13194139533312
7 3 (1, 2, 4) n+ 1
7 3 (1, 4, 6) 22265110462465
534362651099136
n5 + 4638564679679
11132555231232
n4 + 13
n3 + 105942526633
34359738368
n2 + 146028888073
51539607552
n+ 34359738361
34359738368
6 2 (1, 4, 5) 15637498706143
2251799813685248
n6 + 3665038759245
35184372088832
n5 + 1389660529559
2199023255552
n4 + 272014595421
137438953472
+230973796809
68719476736
n2 + 100215903571
34359738368
6 2 (1, 4, 6) 69578470195
25048249270272
n7 + 1217623228439
25048249270272
n6 + 372534725887
1043677052928
n5 + 30953963537
21743271936
n4 + 12044363351
3623878656
+683671553
150994944
n2 + 1335466297
402653184
n+ 268435457
268435456
7 3 (1, 4, 6) 23456248059223
562949953421312
n5 + 7330077518505
17592186044416
n4 + 1786706395137
1099511627776
n3 + 423770106525
137438953472
n2 + 24338148015
8589934592
n+ 17179869169
17179869184
6 3 (1, 3, 6) 1/8n4 + 16126170540715
17592186044416
n3 + 19
n2 + 710101259605
274877906944
8 3 (1, 3, 6) 171798691840001
1374389534720000
n4 + 31496426837333
34359738368000
n3 + 4080218931199
1717986918400
n2 + 443813287253
171798691840
Figure 6.7: Hilbert polynomial of the Schubert subvariety Ωa, a = (a(1), . . . , a(k)), of the Grassmannian Gn,k.
k = 2
1/2n + 1/2
1/2n + 1
k = 3
1/12n2 + 1/2n + 5
1/12n2 + 1/2n + 2/3
1/12n2 + 1/2n + 3/4
1/12n2 + 1/2n + 46912496118443
70368744177664
1/12n2 + 1/2n + 58640620148053
140737488355328
1/12n2 + 1/2n + 1
k = 4
n3 + 5
n2 + 61572651155457
140737488355328
n+ 15881834623431
35184372088832
n3 + 117281240296107
1125899906842624
n2 + 140737488355325
281474976710656
n+ 19
n3 + 234562480592215
2251799813685248
n2 + 123145302310909
281474976710656
n+ 19791209299969
35184372088832
n3 + 234562480592215
2251799813685248
n2 + 70368744177667
140737488355328
n+ 62549994824587
70368744177664
n3 + 5
n2 + 61572651155453
140737488355328
n+ 748278746681
2199023255552
n3 + 117281240296107
1125899906842624
n2 + 70368744177665
140737488355328
n+ 26388279066621
35184372088832
n3 + 117281240296107
1125899906842624
n2 + 7
n+ 7940917311717
17592186044416
n3 + 117281240296107
1125899906842624
n2 + 35184372088831
70368744177664
n+ 6841405683939
8796093022208
n3 + 29320310074027
281474976710656
n2 + 30786325577729
70368744177664
n3 + 58640620148053
562949953421312
n2 + 35184372088831
70368744177664
n+ 5619726097523
8796093022208
n3 + 117281240296105
1125899906842624
n2 + 7
n+ 2993114986727
8796093022208
n3 + 58640620148055
562949953421312
n2 + 1/2n + 1
Figure 6.8: The Hilbert quasipolynomial of Tk = C[x1, . . . , xk]
Sk ; k = 2, 3, 4.
n4 + 46912496118441
4503599627370496
n3 + 3787206717893
35184372088832
n2 + 469583091025
1099511627776
n+ 499743305817
1099511627776
n4 + 46912496118445
4503599627370496
n3 + 3787206717891
35184372088832
n2 + 503942829399
1099511627776
n+ 310001195055
549755813888
n4 + 46912496118441
4503599627370496
n3 + 3787206717897
35184372088832
n2 + 469583091031
1099511627776
n+ 30279519437
68719476736
400319966877379
1152921504606846976
n4 + 23456248059221
2251799813685248
n3 + 3787206717893
35184372088832
n2 + 503942829403
1099511627776
n+ 47340083975
68719476736
400319966877379
1152921504606846976
n4 + 5864062014805
562949953421312
n3 + 3787206717895
35184372088832
n2 + 117395772755
274877906944
n+ 89955703917
137438953472
400319966877379
1152921504606846976
n4 + 5864062014805
562949953421312
n3 + 3787206717893
35184372088832
n2 + 31496426837
68719476736
n+ 92771293595
137438953472
400319966877379
1152921504606846976
n4 + 46912496118447
4503599627370496
n3 + 1893603358947
17592186044416
n2 + 234791545515
549755813888
n+ 5661005505
17179869184
n4 + 23456248059223
2251799813685248
n3 + 1893603358949
17592186044416
n2 + 62992853675
137438953472
n+ 94680167945
137438953472
400319966877379
1152921504606846976
n4 + 46912496118441
4503599627370496
n3 + 3787206717891
35184372088832
n2 + 117395772757
274877906944
n+ 38869454029
68719476736
n4 + 46912496118441
4503599627370496
n3 + 3787206717897
35184372088832
n2 + 62992853675
137438953472
n+ 52494044729
68719476736
n4 + 5864062014805
562949953421312
n3 + 946801679473
8796093022208
n2 + 234791545515
549755813888
n+ 2830502753
8589934592
400319966877379
1152921504606846976
n4 + 23456248059219
2251799813685248
n3 + 1893603358949
17592186044416
n2 + 251971414695
549755813888
n+ 27487790695
34359738368
400319966877379
1152921504606846976
n4 + 23456248059223
2251799813685248
n3 + 1893603358945
17592186044416
n2 + 234791545509
549755813888
n+ 31233956605
68719476736
n4 + 11728124029611
1125899906842624
n3 + 946801679473
8796093022208
n2 + 251971414699
549755813888
n+ 19375074691
34359738368
n4 + 11728124029611
1125899906842624
n3 + 473400839737
4398046511104
n2 + 29348943189
68719476736
n+ 11005853695
17179869184
400319966877379
1152921504606846976
n4 + 23456248059219
2251799813685248
n3 + 473400839737
4398046511104
n2 + 251971414699
549755813888
n+ 5917510497
8589934592
n4 + 11728124029611
1125899906842624
n3 + 473400839737
4398046511104
n2 + 234791545511
549755813888
n+ 15616978311
34359738368
n4 + 23456248059223
2251799813685248
n3 + 1893603358947
17592186044416
n2 + 62992853673
137438953472
n+ 23192823403
34359738368
n4 + 11728124029611
1125899906842624
n3 + 946801679475
8796093022208
n2 + 117395772757
274877906944
n+ 2830502755
8589934592
400319966877379
1152921504606846976
n4 + 11728124029609
1125899906842624
n3 + 1893603358949
17592186044416
n2 + 31496426837
68719476736
n+ 30541989663
34359738368
400319966877379
1152921504606846976
n4 + 11728124029611
1125899906842624
n3 + 946801679473
8796093022208
n2 + 117395772759
274877906944
n+ 9717363505
17179869184
n4 + 2932031007403
281474976710656
n3 + 1893603358945
17592186044416
n2 + 125985707353
274877906944
n+ 4843768673
8589934592
400319966877379
1152921504606846976
n4 + 23456248059223
2251799813685248
n3 + 59175104967
549755813888
n2 + 117395772755
274877906944
n+ 2830502755
8589934592
400319966877379
1152921504606846976
n4 + 23456248059221
2251799813685248
n3 + 946801679475
8796093022208
n2 + 62992853675
137438953472
n+ 3435973837
4294967296
400319966877379
1152921504606846976
n4 + 23456248059223
2251799813685248
n3 + 1893603358947
17592186044416
n2 + 58697886379
137438953472
n+ 11244462985
17179869184
400319966877379
1152921504606846976
n4 + 11728124029611
1125899906842624
n3 + 946801679475
8796093022208
n2 + 15748213419
34359738368
n+ 9687537337
17179869184
n4 + 11728124029609
1125899906842624
n3 + 473400839737
4398046511104
n2 + 117395772753
274877906944
n+ 1892469965
4294967296
n4 + 23456248059221
2251799813685248
n3 + 946801679473
8796093022208
n2 + 7874106709
17179869184
n+ 11835020991
17179869184
400319966877379
1152921504606846976
n4 + 11728124029611
1125899906842624
n3 + 473400839737
4398046511104
n2 + 117395772753
274877906944
n+ 3904244579
8589934592
400319966877379
1152921504606846976
n4 + 11728124029611
1125899906842624
n3 + 946801679473
8796093022208
n2 + 125985707349
274877906944
n+ 1879048193
2147483648
Figure 6.9: The Hilbert quasipolynomial of Tk = C[x1, . . . , xk]
Sk , k = 5; the
first 30 rows.
n4 + 11728124029609
1125899906842624
n3 + 946801679473
8796093022208
n2 + 58697886375
137438953472
n+ 88453211
268435456
n4 + 11728124029611
1125899906842624
n3 + 946801679473
8796093022208
n2 + 15748213419
34359738368
n+ 2958755247
4294967296
400319966877379
1152921504606846976
n4 + 23456248059223
2251799813685248
n3 + 946801679475
8796093022208
n2 + 58697886375
137438953472
n+ 4858681755
8589934592
400319966877379
1152921504606846976
n4 + 11728124029609
1125899906842624
n3 + 473400839737
4398046511104
n2 + 62992853673
137438953472
n+ 151367771
268435456
n4 + 23456248059221
2251799813685248
n3 + 946801679473
8796093022208
n2 + 58697886375
137438953472
n+ 2274244835
4294967296
n4 + 11728124029611
1125899906842624
n3 + 236700419869
2199023255552
n2 + 62992853671
137438953472
n+ 858993459
1073741824
400319966877379
1152921504606846976
n4 + 11728124029611
1125899906842624
n3 + 946801679473
8796093022208
n2 + 14674471595
34359738368
n+ 3904244571
8589934592
400319966877379
1152921504606846976
n4 + 23456248059219
2251799813685248
n3 + 59175104967
549755813888
n2 + 7874106709
17179869184
n+ 2421884337
4294967296
400319966877379
1152921504606846976
n4 + 11728124029611
1125899906842624
n3 + 59175104967
549755813888
n2 + 29348943189
68719476736
n+ 3784939927
8589934592
400319966877379
1152921504606846976
n4 + 23456248059223
2251799813685248
n3 + 473400839737
4398046511104
n2 + 62992853673
137438953472
n+ 1908874355
2147483648
400319966877379
1152921504606846976
n4 + 5864062014805
562949953421312
n3 + 946801679473
8796093022208
n2 + 29348943189
68719476736
n+ 1952122291
4294967296
400319966877379
1152921504606846976
n4 + 11728124029609
1125899906842624
n3 + 946801679471
8796093022208
n2 + 62992853675
137438953472
n+ 2899102929
4294967296
400319966877379
1152921504606846976
n4 + 11728124029611
1125899906842624
n3 + 473400839735
4398046511104
n2 + 58697886381
137438953472
n+ 707625689
2147483648
400319966877379
1152921504606846976
n4 + 11728124029613
1125899906842624
n3 + 59175104967
549755813888
n2 + 15748213419
34359738368
n+ 2958755253
4294967296
n4 + 11728124029609
1125899906842624
n3 + 473400839737
4398046511104
n2 + 58697886377
137438953472
n+ 3288334339
4294967296
100079991719345
288230376151711744
n4 + 5864062014805
562949953421312
n3 + 59175104967
549755813888
n2 + 62992853675
137438953472
n+ 1210942169
2147483648
n4 + 11728124029611
1125899906842624
n3 + 946801679477
8796093022208
n2 + 29348943193
68719476736
n+ 1415251375
4294967296
n4 + 5864062014805
562949953421312
n3 + 59175104967
549755813888
n2 + 31496426839
68719476736
n+ 3435973835
4294967296
100079991719345
288230376151711744
n4 + 1466015503701
140737488355328
n3 + 473400839737
4398046511104
n2 + 29348943195
68719476736
n+ 976061147
2147483648
100079991719345
288230376151711744
n4 + 11728124029611
1125899906842624
n3 + 473400839737
4398046511104
n2 + 31496426839
68719476736
n+ 3280877793
4294967296
n4 + 5864062014805
562949953421312
n3 + 946801679473
8796093022208
n2 + 29348943191
68719476736
n+ 946234983
2147483648
100079991719345
288230376151711744
n4 + 2932031007403
281474976710656
n3 + 946801679475
8796093022208
n2 + 15748213419
34359738368
n+ 1479377623
2147483648
n4 + 5864062014805
562949953421312
n3 + 59175104967
549755813888
n2 + 29348943195
68719476736
n+ 122007643
268435456
100079991719345
288230376151711744
n4 + 2932031007403
281474976710656
n3 + 473400839737
4398046511104
n2 + 3937053355
8589934592
n+ 1449551461
2147483648
100079991719345
288230376151711744
n4 + 2932031007403
281474976710656
n3 + 236700419869
2199023255552
n2 + 29348943193
68719476736
n+ 71070151
134217728
n4 + 1466015503701
140737488355328
n3 + 59175104967
549755813888
n2 + 15748213419
34359738368
n+ 739688813
1073741824
100079991719345
288230376151711744
n4 + 11728124029611
1125899906842624
n3 + 473400839739
4398046511104
n2 + 14674471597
34359738368
n+ 607335219
1073741824
n4 + 2932031007403
281474976710656
n3 + 236700419869
2199023255552
n2 + 3937053355
8589934592
n+ 605471085
1073741824
100079991719345
288230376151711744
n4 + 11728124029611
1125899906842624
n3 + 473400839737
4398046511104
n2 + 29348943193
68719476736
n+ 88453211
268435456
100079991719345
288230376151711744
n4 + 1466015503701
140737488355328
n3 + 473400839737
4398046511104
n2 + 3937053355
8589934592
n+ 2147483647
2147483648
Figure 6.10: The Hilbert quasipolynomial of Tk = C[x1, . . . , xk]
Sk , k = 5;
the last 30 rows.
53375995583651
4611686018427387904
n5 + 21892498188609
36028797018963968
n4 + 418085902907
35184372088832
n3 + 115486898397
1099511627776
n2 + 26847522421
68719476736
n+ 8448724291
17179869184
53375995583651
4611686018427387904
n5 + 10946249094305
18014398509481984
n4 + 836171805815
70368744177664
n3 + 7396888121
68719476736
n2 + 15302809397
34359738368
n+ 9853503363
17179869184
106751991167299
9223372036854775808
n5 + 21892498188599
36028797018963968
n4 + 1672343611625
140737488355328
n3 + 57743449207
549755813888
n2 + 14060052663
34359738368
n+ 975427339
2147483648
13343998895913
1152921504606846976
n5 + 10946249094301
18014398509481984
n4 + 1672343611641
140737488355328
n3 + 59175104961
549755813888
n2 + 15302809423
34359738368
n+ 610309549
1073741824
106751991167305
9223372036854775808
n5 + 21892498188611
36028797018963968
n4 + 1672343611623
140737488355328
n3 + 115486898411
1099511627776
n2 + 13423761203
34359738368
n+ 4461910959
8589934592
13343998895913
1152921504606846976
n5 + 21892498188605
36028797018963968
n4 + 1672343611631
140737488355328
n3 + 29587552483
274877906944
n2 + 15939100865
34359738368
n+ 1927366573
2147483648
213503982334599
18446744073709551616
n5 + 10946249094305
18014398509481984
n4 + 418085902909
35184372088832
n3 + 57743449199
549755813888
n2 + 13423761217
34359738368
n+ 835973461
2147483648
106751991167299
9223372036854775808
n5 + 10946249094305
18014398509481984
n4 + 836171805821
70368744177664
n3 + 59175104963
549755813888
n2 + 7651404713
17179869184
n+ 320001575
536870912
106751991167299
9223372036854775808
n5 + 10946249094299
18014398509481984
n4 + 1672343611637
140737488355328
n3 + 115486898395
1099511627776
n2 + 14060052667
34359738368
n+ 255936429
536870912
213503982334597
18446744073709551616
n5 + 2736562273577
4503599627370496
n4 + 1672343611629
140737488355328
n3 + 7396888121
68719476736
n2 + 7651404703
17179869184
n+ 2859997491
4294967296
106751991167299
9223372036854775808
n5 + 342070284197
562949953421312
n4 + 104521475727
8796093022208
n3 + 57743449199
549755813888
n2 + 1677970153
4294967296
n+ 1790721319
4294967296
86400
n5 + 10946249094299
18014398509481984
n4 + 836171805815
70368744177664
n3 + 59175104959
549755813888
n2 + 3984775219
8589934592
n+ 987842475
1073741824
86400
n5 + 5473124547153
9007199254740992
n4 + 209042951453
17592186044416
n3 + 57743449193
549755813888
n2 + 3355940307
8589934592
n+ 884291849
2147483648
106751991167303
9223372036854775808
n5 + 10946249094301
18014398509481984
n4 + 836171805817
70368744177664
n3 + 118350209919
1099511627776
n2 + 1912851173
4294967296
n+ 264972307
536870912
213503982334605
18446744073709551616
n5 + 2736562273577
4503599627370496
n4 + 836171805827
70368744177664
n3 + 57743449201
549755813888
n2 + 7030026327
17179869184
n+ 1233125369
2147483648
53375995583651
4611686018427387904
n5 + 10946249094299
18014398509481984
n4 + 836171805819
70368744177664
n3 + 118350209933
1099511627776
n2 + 1912851177
4294967296
n+ 1478317113
2147483648
106751991167297
9223372036854775808
n5 + 1368281136787
2251799813685248
n4 + 836171805817
70368744177664
n3 + 57743449195
549755813888
n2 + 1677970151
4294967296
n+ 117959881
268435456
1667999861989
144115188075855872
n5 + 10946249094303
18014398509481984
n4 + 104521475727
8796093022208
n3 + 59175104961
549755813888
n2 + 3984775215
8589934592
n+ 109722991
134217728
106751991167297
9223372036854775808
n5 + 342070284197
562949953421312
n4 + 836171805815
70368744177664
n3 + 28871724597
274877906944
n2 + 1677970153
4294967296
n+ 332087389
1073741824
213503982334607
18446744073709551616
n5 + 10946249094307
18014398509481984
n4 + 836171805821
70368744177664
n3 + 59175104963
549755813888
n2 + 7651404709
17179869184
n+ 384426083
536870912
Figure 6.11: The Hilbert quasipolynomial of Tk = C[x1, . . . , xk]
Sk , k = 6; the first 20 rows.
213503982334615
18446744073709551616
n5 + 10946249094307
18014398509481984
n4 + 836171805813
70368744177664
n3 + 28871724597
274877906944
n2 + 7030026337
17179869184
n+ 640721865
1073741824
106751991167309
9223372036854775808
n5 + 10946249094297
18014398509481984
n4 + 209042951455
17592186044416
n3 + 29587552479
274877906944
n2 + 1912851179
4294967296
n+ 157275009
268435456
26687997791827
2305843009213693952
n5 + 342070284197
562949953421312
n4 + 836171805825
70368744177664
n3 + 57743449199
549755813888
n2 + 3355940301
8589934592
n+ 90445245
268435456
13343998895913
1152921504606846976
n5 + 5473124547153
9007199254740992
n4 + 104521475729
8796093022208
n3 + 29587552481
274877906944
n2 + 1992387605
4294967296
n+ 450971557
536870912
106751991167303
9223372036854775808
n5 + 10946249094307
18014398509481984
n4 + 836171805827
70368744177664
n3 + 28871724597
274877906944
n2 + 209746269
536870912
n+ 17843591
33554432
106751991167303
9223372036854775808
n5 + 10946249094297
18014398509481984
n4 + 836171805819
70368744177664
n3 + 924611015
8589934592
n2 + 239106397
536870912
n+ 5146825
8388608
1667999861989
144115188075855872
n5 + 5473124547153
9007199254740992
n4 + 418085902911
35184372088832
n3 + 7217931151
68719476736
n2 + 54922081
134217728
n+ 265331665
536870912
1667999861989
144115188075855872
n5 + 10946249094291
18014398509481984
n4 + 836171805825
70368744177664
n3 + 59175104963
549755813888
n2 + 478212795
1073741824
n+ 326629601
536870912
1667999861989
144115188075855872
n5 + 10946249094297
18014398509481984
n4 + 418085902909
35184372088832
n3 + 14435862303
137438953472
n2 + 3355940305
8589934592
n+ 24121261
67108864
53375995583655
4611686018427387904
n5 + 10946249094291
18014398509481984
n4 + 209042951453
17592186044416
n3 + 59175104973
549755813888
n2 + 3984775217
8589934592
n+ 503316475
536870912
26687997791827
2305843009213693952
n5 + 10946249094285
18014398509481984
n4 + 836171805825
70368744177664
n3 + 57743449207
549755813888
n2 + 3355940305
8589934592
n+ 115234099
268435456
106751991167303
9223372036854775808
n5 + 2736562273573
4503599627370496
n4 + 209042951459
17592186044416
n3 + 7396888121
68719476736
n2 + 3825702363
8589934592
n+ 42684549
67108864
106751991167301
9223372036854775808
n5 + 10946249094305
18014398509481984
n4 + 209042951453
17592186044416
n3 + 28871724605
274877906944
n2 + 1757506587
4294967296
n+ 277411267
536870912
106751991167303
9223372036854775808
n5 + 10946249094299
18014398509481984
n4 + 418085902905
35184372088832
n3 + 924611015
8589934592
n2 + 3825702353
8589934592
n+ 33950041
67108864
106751991167307
9223372036854775808
n5 + 1368281136787
2251799813685248
n4 + 52260737865
4398046511104
n3 + 28871724601
274877906944
n2 + 209746269
536870912
n+ 61328751
134217728
53375995583651
4611686018427387904
n5 + 2736562273575
4503599627370496
n4 + 104521475727
8796093022208
n3 + 29587552487
274877906944
n2 + 249048451
536870912
n+ 128849017
134217728
106751991167301
9223372036854775808
n5 + 5473124547145
9007199254740992
n4 + 209042951459
17592186044416
n3 + 7217931151
68719476736
n2 + 1677970157
4294967296
n+ 30318473
67108864
106751991167303
9223372036854775808
n5 + 342070284197
562949953421312
n4 + 418085902919
35184372088832
n3 + 14793776243
137438953472
n2 + 1912851181
4294967296
n+ 143223581
268435456
53375995583651
4611686018427387904
n5 + 5473124547153
9007199254740992
n4 + 209042951459
17592186044416
n3 + 1804482787
17179869184
n2 + 878753295
2147483648
n+ 13898875
33554432
106751991167303
9223372036854775808
n5 + 5473124547147
9007199254740992
n4 + 418085902907
35184372088832
n3 + 29587552479
274877906944
n2 + 956425585
2147483648
n+ 195527051
268435456
Figure 6.12: The Hilbert quasipolynomial of Tk = C[x1, . . . , xk]
Sk , k = 6; the middle 20 rows.
13343998895913
1152921504606846976
n5 + 5473124547145
9007199254740992
n4 + 418085902919
35184372088832
n3 + 7217931149
68719476736
n2 + 209746269
536870912
n+ 64348647
134217728
53375995583651
4611686018427387904
n5 + 5473124547143
9007199254740992
n4 + 104521475729
8796093022208
n3 + 14793776237
137438953472
n2 + 498096901
1073741824
n+ 28772927
33554432
106751991167301
9223372036854775808
n5 + 5473124547141
9007199254740992
n4 + 209042951457
17592186044416
n3 + 3608965575
34359738368
n2 + 1677970153
4294967296
n+ 11719905
33554432
13343998895913
1152921504606846976
n5 + 5473124547151
9007199254740992
n4 + 418085902905
35184372088832
n3 + 3698444059
34359738368
n2 + 478212797
1073741824
n+ 74631683
134217728
106751991167311
9223372036854775808
n5 + 5473124547147
9007199254740992
n4 + 418085902907
35184372088832
n3 + 28871724591
274877906944
n2 + 878753299
2147483648
n+ 10682367
16777216
106751991167313
9223372036854775808
n5 + 5473124547151
9007199254740992
n4 + 104521475729
8796093022208
n3 + 14793776237
137438953472
n2 + 478212797
1073741824
n+ 84006215
134217728
106751991167307
9223372036854775808
n5 + 2736562273573
4503599627370496
n4 + 209042951459
17592186044416
n3 + 28871724589
274877906944
n2 + 104873135
268435456
n+ 3161957
8388608
53375995583655
4611686018427387904
n5 + 5473124547149
9007199254740992
n4 + 209042951451
17592186044416
n3 + 29587552485
274877906944
n2 + 996193803
2147483648
n+ 118111589
134217728
106751991167309
9223372036854775808
n5 + 1368281136787
2251799813685248
n4 + 418085902907
35184372088832
n3 + 28871724601
274877906944
n2 + 419492539
1073741824
n+ 49899533
134217728
106751991167305
9223372036854775808
n5 + 2736562273573
4503599627370496
n4 + 52260737863
4398046511104
n3 + 29587552479
274877906944
n2 + 1912851173
4294967296
n+ 87717907
134217728
106751991167309
9223372036854775808
n5 + 1368281136787
2251799813685248
n4 + 104521475729
8796093022208
n3 + 28871724595
274877906944
n2 + 878753303
2147483648
n+ 8962701
16777216
106751991167305
9223372036854775808
n5 + 2736562273571
4503599627370496
n4 + 209042951453
17592186044416
n3 + 29587552499
274877906944
n2 + 956425585
2147483648
n+ 10878263
16777216
53375995583651
4611686018427387904
n5 + 5473124547157
9007199254740992
n4 + 104521475727
8796093022208
n3 + 3608965575
34359738368
n2 + 838985083
2147483648
n+ 53611225
134217728
13343998895913
1152921504606846976
n5 + 5473124547145
9007199254740992
n4 + 418085902907
35184372088832
n3 + 14793776249
137438953472
n2 + 498096903
1073741824
n+ 104354293
134217728
106751991167299
9223372036854775808
n5 + 2736562273575
4503599627370496
n4 + 209042951447
17592186044416
n3 + 3608965575
34359738368
n2 + 838985087
2147483648
n+ 7873221
16777216
106751991167303
9223372036854775808
n5 + 5473124547151
9007199254740992
n4 + 418085902899
35184372088832
n3 + 14793776243
137438953472
n2 + 956425599
2147483648
n+ 90737805
134217728
53375995583655
4611686018427387904
n5 + 5473124547155
9007199254740992
n4 + 104521475727
8796093022208
n3 + 14435862303
137438953472
n2 + 878753295
2147483648
n+ 18680385
33554432
106751991167305
9223372036854775808
n5 + 2736562273573
4503599627370496
n4 + 104521475725
8796093022208
n3 + 3698444063
34359738368
n2 + 478212799
1073741824
n+ 36634403
67108864
106751991167311
9223372036854775808
n5 + 684140568395
1125899906842624
n4 + 209042951463
17592186044416
n3 + 7217931153
68719476736
n2 + 52436567
134217728
n+ 19926951
67108864
26687997791827
2305843009213693952
n5 + 1368281136789
2251799813685248
n4 + 6532592233
549755813888
n3 + 14793776243
137438953472
n2 + 498096903
1073741824
n+ 67108871
67108864
Figure 6.13: The Hilbert quasipolynomial of Tk = C[x1, . . . , xk]
Sk , k = 6; the last 20 rows.
Chapter 7
On verification and discovery
of obstructions
In this chapter we give applications of the results and positivity hypotheses
in this paper to the problem of verifying or discovering an obstruction, i.e.,
a “proof of hardness” [GCT2] in the context of the P vs. NP and the
permanent vs. determinant problems in characteristic zero.
7.1 Obstruction
An obstruction in an abstract setting of Problem 1.1.4 is defined as follows.
Let X and Y be H-varieties with compact specifications (Section 3.5),
H a connected reductive group. Let 〈X〉 and 〈Y 〉 denote the bit lengths of
their specifications (Section 3.5). Suppose we wish to show that X cannot
be embedded as an H-subvariety of Y . Pictorially:
X 6 →֒ Y. (7.1)
For example, in the context of the P vs. NP problem in characteristic
zero [GCT1, GCT2], X is a class variety XNP (n, l) associated with the
complexity class NP for the given input size parameter n and the circuit
size parameter l. The variety Y is the class variety XP (l) associated with
the class P for given l. And H is SLl(C). If NP ⊆ P (over C) to the
contrary, then it would turn out that
XNP (n, l) →֒ XP (l)
as an H-subvariety, for every l = poly(n). The goal is to show that this
embedding cannot exist when l = poly(n) and n→ ∞.
Let R(X) and R(Y ) be the homogeneous coordinate rings of X and
Y , respectively. Let R(X)d and R(Y )d denote their degree d-compoenents.
Suppose to the contrary that an H-embedding as in (7.1) exists. Then there
exists a degre preserving H-equivariant surjection from R(Y )d to R(X)d for
every d, and hence, a degree-preserving H-equivariant injection from R(X)∗
to R(Y )∗
. Hence, every irreducible H-module Vλ(G) that occurs in R(X)
also occurs in R(Y )∗
. This leads to:
Definition 7.1.1 An irreducible representation Vλ(H) is called an obstruc-
tion for the pair (X,Y) if it occurs (as an H-submodule) in R(X)∗
but not
in R(Y )∗
, for some d. We say that Vλ(H) is an obstruction of degree d.
Remark 7.1.2 The obstruction as defined here is dual to the obstrution as
defined in [GCT2].
Existence of such an obstruction implies that X cannot be embedded in
Y as an H-subvariety.
Let us assume that X and Y are H-subvarieties of P (V ), where V is an
H-module, and that we are given a point y ∈ Y ⊆ P (V ) that is distiguished
in the following sense. Let Hy ⊆ H be the stabilizer of y. Then Cy, the line
in V corresponding to y, is invariant under Hy. Let [y] be the set of points
in P (V ) stabilized by Hy. We say that y is characterized by its stabilizer Hy
if y = [y]; i.e., y is the only point in P (V ) stabilized by Hy. Let
H[y] = ∪z∈[y]Hz
be the union of the H-orbits of all points in [y]. We say that y is a distin-
guished point of Y if Y equals the projective closure of H[y] in P (V ). If y
is characterized by its stabilizer, this means Y is the projective closure of
the orbit Hy of y.
If Vλ(H) occurs in R(Y )
, then it can be shown (cf. Proposition 4.2 in
[GCT2]) that Vλ(H) contains an Hy-submodule isomorphic to (Cy)
d, the
d-th tensor power of Cy. This leads to the following stronger notion of
obstruction:
Definition 7.1.3 [GCT2] We say that Vλ(H) is a strong obstruction for
the pair (X,Y ) if, for some d, it occurs in R(X)∗
, but it does not contain
an Hy-module isomorphic to (Cy)
Existence of a strong obstruction also implies that X cannot be embedded
in Y as an H-subvariety. The results in [GCT2] suggest that strong obstruc-
tions exist in the context of the lower bound problems under cosideration.
The goal then is to show their existence.
7.2 Decision problems
The decisions problems that arise in this context are the following. Let sλ
be the multiplicity of Vλ(H) in R(X)
, and md
the multiplicity of the Hy-
module (Cy)d in Vλ(H), considered an Hy-module via the the embedding
ρ : Hy →֒ H. Thus λ is a strong obstruction of degree d iff s
is nonzero
and md
is zero.
Problem 7.2.1 (Decision Problems)
(a) Given d, λ and the specification of X, decide if sλ
is nonzero.
(b) Given d, λ and the specifications of H,Hy and ρ, decide if m
λ is nonzero.
(c) Given d, λ and the specifications of X,H,Hy and ρ, decide if λ is a
strong obstruction of degree d.
The first is an instance of the decision Problem 1.1.4 in geometric in-
variant theory, and the second of the subgroup restriction Problem 1.1.3.
By the results in Chapter 3-4, relaxed forms of the decision problems in (a)
and (b) belong to P assuming appropriate PH1 and SH; this implies that
a relaxed form of the decision problem in (c) also belongs to P assumming
PH1 and SH. We will only need a weak relaxed form of (c), for which the
weak form of SH that is implied by PH1 (cf. Theorem 3.3.5) will suffice.
7.3 Verification of obstructions
The relevant PH1 are as follows.
Assume that that singularities of spec(R(X)) are rational. By Theo-
rem 3.5.1, the stretching function s̃λ
(k) = skλ
is a quasipolynomial. Hence
PH1 for sλ
(Hypothesis 3.3.1, or rather its slight variant obtained by replac-
ing R(X)d with R(X)
) is well defined. It is:
Hypothesis 7.3.1 (PH1):
There exists a polytope P λ
such that:
1. The number of integer points in P λ
is equal to sλ
2. The Ehrhart quasi-polynomial of P λd coincides with the stretching quasi-
polynomial s̃λd(n) (cf. Theorem 3.5.1).
3. The polytope P λ
is given by a separating oracle, as in Section 2.3. Its
encoding bitlength 〈P λ
〉 is poly(〈d〉, 〈λ〉, 〈X〉), and the combinatorial
size ‖P λ
‖ is poly(ht(λ), ‖X‖), where ‖X‖ is the combinatorial size of
X (Section 3.5), and ht(λ) is the height of λ.
Similarly by Theorem 3.4.1 (or rather its slight variant which can be
proved similarly), the stretching function mkdkλ is a quasipolynomial. Hence
PH1 for mdλ (cf. Hypothesis 3.3.1 and Section 3.4) is also well defined. It is:
Hypothesis 7.3.2 (PH1:)
There exists a polytope Qd
such that:
1. The number of integer points in Qd
is equal to md
2. The Ehrhart quasi-polynomial of Qd
coincides with the stretching quasi-
polynomial m̃d
(n) (Theorem 3.5.1).
3. The polytope Qdλ is given by a separating oracle. Its encoding bitlength
〈Qdλ〉 is poly(〈d〉, 〈λ〉, 〈ρ〉, 〈Hy 〉, 〈H〉), and the combinatorial size ‖Q
is O(poly(ht(λ), 〈H〉, 〈Hy〉, 〈ρ〉)). Here 〈H〉, 〈Hy〉 and 〈ρ〉 denote the
bitlengths of H,Hy and ρ (Section 3.4).
Theorem 7.3.3 (Weak SH:)
(a) Assuming PH1 (Hypothesis 7.3.1), the saturation index of s̃λ
(n) is at
most apoly(‖P
‖), for some explicit constant a > 0.
(b) Assuming PH1 (Hypothesis 7.3.2), the saturation index of m̃dλ(n) is at
most bpoly(‖Q
‖), for some explicit constant b > 0.
This follows from Theorem 3.3.5.
Theorem 7.3.4 Assume PH1 (Hypotheses 7.3.1-7.3.2). Then, given d, λ,
the specifications of X,H,Hy and ρ, and a relaxation parameter c greater
than the explicit bounds on the saturation indices in Theorem 7.3.3, whether
cλ is an obstruction of degree d can be decided in
poly(〈d〉, 〈λ〉, 〈X〉, 〈H〉, 〈Hy 〉, 〈ρ〉, 〈c〉)
time.
This follows by applying Theorem 3.1.1 to the polytopes P λ
and Qd
the saturation index estimates in Theorem 7.3.3.
7.4 Robust obstruction
We now define a notion of obstruction that is well behaved with respect to
relaxation.
Definition 7.4.1 Assume PH1 for both sλd andm
λ (Hypotheses 7.3.1-7.3.2).
We say that Vλ(H) is a robust obstruction for the pair (X,Y ) if one of the
following hold:
1. Qd
is empty, and P λ
is nonempty.
2. Both Qdλ and P λ
are nonempty, the affine span of Qd
does not contain
an integer point and the affine span of P λ
contains an integer point.
If Vλ(H) is a robust obstruction, so is Vlλ(H), for all or most positive
integral l, hence the name robust.
Proposition 7.4.2 Assume PH1 for both sλd and m
λ as above. If Vλ(H)
is a robust obstruction for the pair (X,Y ), then for some positive integer
k–called a relaxation parameter–Vkλ(H) is a strong obstruction for (X,Y ).
In fact, this is so for most large enough k.
Proof:
(1) Suppose Qd
is empty, and P λ
is nonempty. Let k be a large enough
positive integer k such that kP λ
= P kλ
contains an integer point. Then skλ
is nonzero. But mkd
is zero since Qkd
= kQd
is empty. Thus kλ is a strong
obstruction.
(2) Suppose both Qdλ and P λd are nonempty, the affine span of Q
λ does not
contain an integer point and the affine span of P λd contains an integer point.
We can choose a positive integer k such that the affine span of kQdλ = Q
does not contain an integer point, but kP λ
= P kλ
contains an integer point;
most large enough k have this property. This means skλ
is nonzero, butmkd
is zero. Thus kλ is a strong obstruction. Q.E.D.
7.5 Verification of robust obstructions
Theorem 7.5.1 Assume that the singularities of spec(R(X)) are rational.
Assume PH1 for both sλ
and md
as above. Then, given λ, d and the speci-
fications of ρ : Hy →֒ H and X, whether Vλ(H) is a robust obstruction can
be verified in poly(〈ρ〉, 〈Hy〉, 〈H〉, 〈X〉, 〈d〉, 〈λ〉) time. Furthermore, a positive
integral relaxation parameter k such that Vkλ(G) is a strong obstruction can
also be found in the same time.
The crucial result used implicitly here is the quasipolynomiality theorem
(Theorem 4.1.1) because of which PH1 for both sλ
and md
are well defined.
Proof: By linear programming [GLS], whether Qd
is nonempty or not can
be determined in poly(〈Qdλ〉) = poly(〈ρ〉, 〈Hy〉, 〈H〉, 〈d〉, 〈λ〉) time. If it
is nonempty, the linear programming algorithm also gives its affine span.
Whether this contains an integer point can be determined in polynomial
time, using the polynomial time algorithm for computing the Smith normal
form, as in the proof of Theorem 3.1.1.
Similarly, whether P λ
is nonempty or not can be determined in poly(〈P λ
poly(〈X〉, 〈d〉, 〈λ〉) time. If it is nonempty, whether its affine span contains
an integer point can be determined in polynomial time similarly. Further-
more, the algorithm can also be made to return a vertex v of the polytope
P λd if it is nonempty.
Using these observations, whether Vλ(G) is a robust obstruction can be
determined in polynomial time.
As far as the computation of the relaxation parameter k is concerned,
let us consider the second case in Definition 7.4.1–when both Qdλ and P
d are
nonempty, the affine span of Qdλ does not contain an integer point and the
affine span of P λd contains an integer point–the first case being simpler. In
this case, by examining the Smith normal forms of the defining equations
of the affine spans of P λ
and Qd
and the rational coordinates of a vertex
v ∈ P λ
, we can find a large enough k so that the affine span of Qkd
does not
contain an integer point, the affine span of P kλ
contains an integer point,
and P kλ
contains an integer point that is some multiple of v. Q.E.D.
The value of the relaxation parameter k computed above is rather con-
servative. One may wish to compute as small value of k as possible for
which Vkλ(G) is a strong obstruction (though in our application this is not
necessary). If SH for holds for the structural constant sλ
(cf. Hypothe-
sis 3.3.2 and Section 3.5), then we can let k be the smallest integer larger
than the saturation index (estimate) for P λ
such that affine span of Qkd
nonempty) does not contain an integer point (as can be ensured by looking
at the Smith normal of the defining equations of the affine span).
7.6 Arithemetic version of the P#P vs. NC prob-
lem in characteristric zero
We now specialize the discussion in the preceding sections in the context
of the arithmetic form of the P#P vs. NC problem in characteristric zero
[V]. In concrete terms, the problem is to show that the permanent of an
n×n complex matrix X cannot be expressed as a determinant of an m×m
complex matrix, whose entries are (possibly nonhomogeneous) linear com-
binations of the entries of X.
7.6.1 Class varieties
The class varieties in this context are as follows [GCT1]. Let Y be an m×m
variable matrix, which can also be thought of as a variable l-vector, l = m2.
Let X be its, say, principal bottom-right n × n submatrix, n < m, which
can be thought of as a variable k-vector, k = n2. Let V = Symm(Y ) be the
space of homogeneous forms of degree m in the variable entries of Y . The
space V , and hence P (V ), has a natural action of G = GL(Y ) = GLl(C)
given by
(σf)(Y ) = f(σ−1Y ),
for any f ∈ V , σ ∈ G, and thinking of Y as an l-vector. Let W = Symn(X)
be the space of homogeneous forms of degree n in the variable entries of
X. The space W , and also P (W ), has a similar action of K = GL(X) =
GLk(C). We use any entry y of Y not in X as the homogenizing variable
for embedding W in V via the map φ : W → V defined by:
φ(h)(Y ) = ym−nh(X), (7.2)
for any h(X) ∈W . We also think of φ as a map from P (W ) to P (V ).
Let g = det(Y ) ∈ P (V ) be the determinant form, and f = φ(h), where
h = perm(X) ∈ P (W ). Let ∆V [g],∆V [f ] ⊆ P (V ) be the projective closures
of the orbits Gg and Gf , respectively, in P (V ). Let ∆W [h] ⊆ P (W ) be the
projective closure of the K-orbit Kh of h in P (W ). Then ∆V [g] is called the
class variety associated with NC and ∆V [f ] the class variety associated with
P#P ; ∆W [h] is called the base class variety associated with P
#P . (The base
class variety is not used in what follows. Rather its variant, called a reduced
class variety defined below, will be used.) These class varieties depend on
the lower bound parameters n and m. If we wish to make these explicit, we
would write ∆V [f, n,m] and ∆V [g,m] instead of ∆V [f ] and ∆V [g].
The class varieties ∆V [g] = ∆V [g,m] and ∆V [f ] = ∆V [f, n,m] are
G-subvarieties of P (V ), and their homogeneous coordinate rings RV [g] =
RV [g,m] and RV [f ] = RV [f, n,m] have natural degree-preserving G-action.
It is conjectured in [GCT1] that, if m = poly(n) and n → ∞, then
f 6∈ ∆V [g]; this is equivalent to saying that the class variety ∆V [f, n,m]
cannot be embedded in the class variety ∆V [g,m] (as a subvariety). This
implies the arithmetic form of the P#P 6= NC conjecture in characteristic
zero.
7.6.2 Obstructions
The obstruction in this context is defined as follows. A G-module Vλ(G) is
called an obstruction for the pair (f, g) if it occurs in RV [f, n,m]
d but not
RV [g,m]
for some d. It is called a strong obstruction if, for some d, it occurs
in RV [f, n,m]
but it does not contain (Cg)d as a Gg-submodule, where
(Cg) ⊆ V denotes the one dimensional line corresponding to g, and Gg ⊆ G
is the stabilizer of g = det(Y ) ∈ P (V ). If Vλ(G) is a (strong) obstruction
of degree d, then the size |λ| = dm; hence d is completely determined by λ
and m.
Existence of an obstruction or a strong obstruction implies that the
class variety ∆V [f, n,m] cannot be embedded in the class variety ∆V [g,m],
as sought. The main algebro-geometric results of [GCT1, GCT2] suggest
that strong obstructions should indeed exist for all n → ∞, assuming m =
poly(n); cf. Section 4, Conjecture 2.10 and Theorem 2.11 in [GCT2]. The
goal then is to prove existence of strong obstructions for all n.
The definition of a strong obstruction can be simplified further as follows.
Let X ′ denote the set of variables, which consists of the variable entries in
X and the homogenizing variable y above. Let W ′ = Symm(X ′) ⊆ V =
Symm(Y ) be the space of homogeneous forms of degree m in the variables
of X ′. We have a natural action of H = GL(X ′) = GLn2+1(C) on W
and hence on P (W ′). We have a natural map φ′ : W → W ′ given by
φ′(h)(X ′) = ym−nh(X). The map φ in (7.2) is φ′ followed by the inclusion
from W ′ to V . We also think of φ′ as a map from P (W ) to P (W ′).
Let f ′ = φ′(h), for h = perm(X) ∈ P (W ). Let ∆W ′ [f
′] ⊆ P (W ′) be
the orbit closure of Hf ′. It is an H-subvariety of P (W ′), and hence its
homogeneous coordinate ring RW ′ [f
′] has the natural degree preserving H-
action. We call ∆W ′ [f
′] the reduced class variety for P#P . It is known (cf.
Theorem 8.2 in [GCT2]) that Vλ(G) occurs in RV [f ]
iff Vλ(H) occurs in
RW ′ [f
. Here the dominant weight λ of G is considered a dominant weight
of H by restriction from G to H.
Hence Vλ(G) is a strong obstruction for the pair (f, g), iff for some d,
Vλ(H) occurs in RW ′ [f
as an H-submodule and Vλ(G) does not contain
(Cg)d as a Gg-submodule. In particular, we can assume without loss of
generality that the height of the Young diagram for λ is at most n2 + 1;
otherwise Vλ(H) would be zero.
7.6.3 Robust obstructions
It is known that the stabilizer Gg of g = det(Y ) ∈ P (V ) consists of lin-
ear transformations in G of the form Y → AY ∗B−1, thinking of Y as an
m×m matrix, where Y ∗ is either Y or Y T , A,B ∈ GLm(C). Thus the con-
nected component of Gg is essentially GLm(C)×GLm(C) ⊆ G = GLl(C) =
GLm2(C). This means the subgroup restriction problem for the embedding
ρ : Gg →֒ G is essentially the Kronecker problem (Problem 1.1.1).
Assume PH1 (Hypothesis 7.3.2) for the subgroup restriction ρ : Gg →֒ G;
which is essentially PH1 for the Kronecker problem. It now assumes the
following concrete form. Let md
denote the multiplicity of the Gg-module
(Cg)d in Vλ(G). Assume that the height of λ is at most n
2+1 for the reasons
give above.
Hypothesis 7.6.1 (PH1:)
There exists a polytope Qd
such that:
1. The number of integer points in Qd
is equal to md
2. The Ehrhart quasi-polynomial of Qdλ coincides with the stretching quasi-
polynomial m̃d
(n) (cf. Theorem 3.5.1).
3. The polytope Qd
is given by a separating oracle, and its encoding
bitlength 〈Qdλ〉 is poly(n, 〈m〉, 〈d〉, 〈λ〉) time.
We have to explain why 〈Qd
〉 is stipulated to depend polynomially on
n and 〈m〉, rather than m. After all, the bitlengths 〈G〉, 〈Gg〉 and 〈ρ〉 are
O(poly(m2)) as per the definitions in Section 3.4. So, as per PH1 for sub-
group restriction in Section 3.4.3, 〈Qdλ〉 should depend polynomially on m.
We are stipulating a stronger condition for the following reason. First, as we
already mentioned, the above hypothesis is essentially PH1 for the Kronecker
problem, which is obtained by specializing PH1 for the plethysm problem
(Hypothesis 1.6.4). In Hypothesis 1.6.4, the encoding bitlength of the poly-
tope depends polynomially on the bitlengths of the various partition param-
eters λ, π, µ of the plethysm constant aπ
, but is independent of the rank of
the group G therein. (As explained in the remarks after Hypothesis 1.6.4,
this is justified because the bound in Theorem 1.6.3 is also independent of
the rank of G). For the same reason, the encoding bitlength of the polytope
here should be independent of the rank of G (which is m2), but should de-
pend polynomiallly on the total bit length of the partitions parametrizing
the representations Vλ(G) and (Cy)
d. This is O(n+ 〈m〉+ 〈d〉+ 〈λ〉). (Note
that the one dimensional representation (Cy)d of Gg is essentially the d-th
power of the determinant representation of Gg, since the connected compo-
nent of Gg is isomorphic to GLm(C)×GLm(C). The Young diagram for the
partition corresponding to the d-th power of the determinant representation
of GLm(C) is a rectangle of height m and width d. It can be specified by
simply giving m and d–this specification has bit length 〈m〉+ 〈d〉.)
Next let us specialize PH1 as per Hypothesis 7.3.1. The class variety
∆V [f ] = ∆[f, n,m] will now play the role of X in Hypothesis 7.3.1. But,
for the reasons explained in the proof of Proposition 7.6.4 below, we shall
instead specialize Hypothesis 7.3.1 to the (simpler) reduced class variety
Z = ∆W ′[f
′]. It now assumes that following concrete form. Let sλ
denote
the multiplicity of Vλ(H) in RW ′ [f
. Putting Z in place of X in Hypothe-
sis 7.3.1, we get:
Hypothesis 7.6.2 (PH1):
There exists a polytope P λ
such that:
1. The number of integer points in P λ
is equal to sλ
2. The Ehrhart quasi-polynomial of P λ
coincides with the stretching quasi-
polynomial s̃λ
(n) (cf. Theorem 3.5.1).
3. The polytope P λd is given by a separating oracle, and its encoding
bitlength 〈P λd 〉 is
poly(〈d〉, 〈λ〉, 〈Z〉) = poly(〈d〉, 〈λ〉, n, 〈m〉). (7.3)
Here (7.3) follows because 〈Z〉 = n+〈m〉. To see why, let us observe that
Z = ∆W ′ [f
′] is completely specified once the point f ′ = ym−nh ∈ P (W ′) is
specified. To specify f ′, it sufficies to specify m,n and the point h ∈ P (W ).
It is known [GCT2] that the point h = perm(X) ∈ P (W ) is completely
characterized by its stabilizer Kh ⊆ K = GL(X) = GLk(C). Furthermore,
Kh is explicitly known [Mc]. It is generated by the linear transformation in
K of the form X → λXµ−1, thinking of X as an n×n matrix, where λ and
µ are either diagonal or permutation matrices. So to specifiy h, it suffices
to specify Kh,K and the embedding ρ
′ : Kh →֒ K. The bit length of this
specification is O(n) (cf. Section 3.4). To specify f ′, and hence Z, it suffices
to specify m,n,K,Kh and ρ
′. The total bit length of this specification is
O(n+ 〈m〉).
Assume PH1 for both md
and sλ
, i.e., Hypotheses 7.6.1 and 7.6.2.
Definition 7.6.3 We say that Vλ(G) is a robust obstruction for the pair
(f, , g) if one of the following hold:
1. Qdλ is empty, and P
d is nonempty.
2. Both Qdλ and P
d are nonempty, the affine span of Q
λ does not contain
an integer point and the affine span of P λd contains an integer point.
If the first condition holds, we say that Vλ(G) is a geometric obstruction.
If the second condition holds, it is called a modular obstruction.
Proposition 7.6.4 Assume PH1 for both mdλ and s
d (Hypotheses 7.6.1 and
7.6.2). If Vλ(G) is a robust obstruction for the pair (f, g), then for some
positive integral relaxation parameter k, Vkλ(G) is a strong obstruction for
(f, g). In fact, this is so for most large enough k.
Proof: This essentially follows from Proposition 7.4.2. It only remains to
clarify why we can use PH1 for the reduced class variety ∆W ′[f
′]–as we are
doing here– in place of PH1 for the class variety ∆V [f ]. This is because,
as already mentioned, Vλ(G) occurs in RV [f ]
d iff Vλ(H) occurs in RW ′ [f
′]∗d.
Q.E.D.
7.6.4 Verification of robust obstructions
Theorem 7.6.5 Assume that the singularities of spec(RW ′ [f
′]) are ratio-
nal. Assume PH1 for both md
and sλ
as above (Hypotheses 7.6.1 and 7.6.2).
Then, given n,m, λ and d, whether Vλ(H) is a robust obstruction can be
verified in poly(n, 〈m〉, 〈d〉, 〈λ〉) time. Furthermore, a positive integral relax-
ation parameter k such that kλ is a strong obstruction can also be computed
in this much time.
Once n andm are specified, the various class varieties andK,Kh, ρ
′, G,Gg, ρ
above are automatically specified implicitly.
Proof: This follows from Theorem 7.5.1; cf. also the remark following its
proof. Q.E.D.
Theorem 7.3.4 can be similarly specialized in this context; we leave that
to the reader.
7.6.5 On explicit construction of obstructions
Theorem 7.6.6 Assume that m = poly(n) or even 2polylog(n), and:
1. (RH) [Rationality Hypothesis]: The singularities of spec(RW ′ [f
′]) are
rational.
2. PH1 for both md
and sλ
(Hypotheses 7.6.1 and 7.6.2).
3. OH [Obstruction Hypothesis]: For every (large enough) n, there exists
λ of poly(n) bit length such that |λ| is divisible by m and one of the
following holds (with d = |λ|/m):
(a) Qd
is empty, and P λ
is nonempty.
(b) Both Qdλ and P λ
are nonempty, the affine span of Qd
does not
contain an integer point and the affine span of P λ
contains an
integer point.
Then there exists an explicit family {λn} of robust obstructions.
Here we say that {λn} is an explicit family of robust obstructions if each
λn is short and easy to verify. Short means 〈λn〉 is O(poly(n)). Easy to verify
means whether λn is a robust obstruction can be verified in O(poly(n)) time.
The poly(n) bound here and in OH is meant to be independent of m,
as long as m << 2n; i.e., it should hold even when m = 2polylog(n). In
other words, the family {λn} should continue to remain an explicit robust
obstruction family, as we vary m over all values ≤ 2polylog(n), and perhaps
even values ≤ 2o(n), but will cease to be an obstruction family for some large
enough m = 2Ω(n). This is an important uniformity condition.
Proof: OH basically says that there exists a short robust obstruction λn for
every n. By Theorem 7.6.5, it is easy to verify. Q.E.D.
7.6.6 Why should robust obstructions exist?
The main question now is: why should OH hold? That is, why should
(short) robust obstructions exist?
As we already mentioned, the results in [GCT1, GCT2] indicate that
strong obstructions should exist for every n, assuming m = poly(n). We
shall give a heuristic argument for existence of robust obstructions assuming
that strong obstructions exist. This will crucially depend on the following SH
formd
, which is essentially SH for the Kronecker problem (i.e. specialization
of Hypothesis 1.6.5 to the Kronecker problem), good experimental evidence
for which is provided in [BOR].
Hypothesis 7.6.7 (SH:) (a): The saturation index of m̃d
(k) is bounded by
a polynomial in m. (Observe that the rank of G is poly(m) and the height of
λ is at most n2 + 1). (b): The quasi-polynomial m̃d
(n) is strictly saturated,
i.e. the saturation index is zero, for almost all λ (and d).
If Vλ(G) is a strong obstruction, s
is nonzero but md
is zero. Thus,
assumming PH1, there are three possibilities:
1. Qd
is empty, and P λ
is nonempty and contains an integer point.
2. Both Qdλ and P λd are nonempty, the affine span of Q
λ does not contain
an integer point and P λd contains an integer point.
3. Both Qd
and P λ
are nonempty. The affine span of Qd
contains an
integer point, but Qd
does not. And P λ
contains an integer point.
In the first two cases, λ is a robust obstruction. As per SH (Hypothe-
sis 7.6.7), for almost all λ, the Ehrhart quasipolynomial of Qd
is saturated:
this means (cf. the proof of Theorem 3.1.1), if the affine span of Qd
contains
an integer point then Qd
also contains an integer point. And hence, with
a high probability, the third case should not occur. In other words, strong
obstructions can be expected to be robust with a high probability.
Let us call a strong obstruction λ fragile if it is not robust; this means
the affine span of Qdλ contains an integer point, but Q
λ does not. By SH
(Hypothesis 7.6.7), if λ is fragile, then for some k = poly(m), Qkdkλ contains
an intger point, and hence, kλ is not obstruction. Thus fragile obstructions
are close to not being obstructions, and furthermore, are expected to be
rare, as argued above. This is why we are focussing on robust obstructions.
It may be remarked that the only SH needed in the argument above
is the one (Hypothesis 7.6.7) for the structural constant md
. This is a
special case of the SH for the subgroup restriction problem (cf. Section 3.4)
specialized to the embedding Gg →֒ G. In particular, we do not need SH
for the structural constant sλd ; i.e., for the more difficult decision problem in
geometric invariant theory (cf. Problem 1.1.4 and Section 3.5).
7.6.7 On discovery of robust obstructions
It may be conjectured that not just the verification (cf. Theorem 7.6.5)
but also the discovery of robust obstructions is easy for the problem under
consideration. In this section we shall give an argument in support of this
conjecture for geometric (robust) obstructions (which may be conjectured to
exist in the problem under consideration). For this we need to reformulate
the notions of strong and robust obstructions (Definition 7.6.3) as follows.
Let TZ be the set of pairs (d, λ) such that s
d is nonzero and SZ the set
of pairs (d, λ) such that mdλ is nonzero.
Proposition 7.6.8 Assuming PH1 above (Hypotheses 7.6.1 and 7.6.2), TZ
and SZ are finitely generated semigroups with respect to addition.
These semi-groups are analogues of the Littlewood-Richardson semigroup
(Section 2.2.2) in this setting.
Proof: The proof is similar to that for the Littlewood-Richardson semigroup
For given d and λ, the polytope P λ
in PH1 for sλ
(Hypothesis 7.6.2) has
a specification of the form
Ax ≤ b (7.4)
where A depends only the variety Z = ∆W ′ [f
′], but not on d or λ, and
b depends homogeneously and linearly on d and λ. Let P be the polytope
defined by the inequalities (7.4) where both d and λ are treated as variables.
Then P is a polyhedral cone (through the origin) in the ambient space
containing P with the coordinates x, d and λ. Let PZ be the set of integer
points in P . It is a finitely generated semigroup since P is a polyhedral cone.
Let TR be the orthogonal projection of P on the hyperplane corresponding
to the coordinates d and λ. Now TZ is simply the projection of PZ. Hence
it is a finitely generated semigroup as well.
The proof for SZ is similar, with SR defined similarly. Q.E.D.
The polyhedral cones TR and SR here are analogues of the Littlewood-
Richardson cone (Section 2.2.2) in this setting. Note that (d, λ) ∈ TR iff P
is nonempty; similarly for SR.
A Weyl module Vλ(G) is a strong obstruction for the pair (f, g) of degree
d iff (d, λ) occurs in TZ but not in SZ. It is a robust obstruction iff it occurs
in TR but not in SZ. It is a geometric obstruction iff it occurs in TR but not
in SR. It is a modular obstruction iff it occurs in TR and also in SR but not
in SZ.
Assuming PH1 (Hypothesis 7.6.2), whether (d, λ) belongs to TR can be
determined in polynomial time by linear programming, since (d, λ) ∈ TR
iff P λ
is nonempty. Similarly, assuming PH1 (Hypothesis 7.6.1), whether
(d, λ) ∈ SR can be determined in polynomial time.
The following is a stronger complement to PH1.
Hypothesis 7.6.9 (PH1*)
Whether TR\SR is nonempty can be determined in polynomial time; i.e.,
poly(n, 〈m〉) time. If so, the algorithm can also output (d, λ) ∈ TR \ SR of
polynomial bit length.
Proposition 7.6.10 Assuming PH1*, given n and m, the problem of decid-
ing if a geometric obstruction exists for the pair (f, g), and finding one if one
exists, belongs to the complexity class P ; i.e., it can be done in poly(n, 〈m〉)
time.
This immediately follows from Hypothesis 7.6.9 since (d, λ) is a geometric
obstruction iff (d, λ) ∈ TR \ SR.
Hypothesis 7.6.9 is supported by the following:
Proposition 7.6.11 Assuming PH1 (Hypotheses 7.6.1 and 7.6.2), Hypoth-
esis 7.6.9 holds if TR and SR have polynomially many explicitly given con-
straints with the specification of polynomial bit length; here polynomial means
poly(n, 〈m〉).
The proposition holds even if the polytope SR has exponentially many
constraints, as long as it is given by a separation oracle that works in poly-
nomial time.
Proof: It suffices to check if SR satisfies each constraint of TR. This can be
done in polynomial time using the linear programming algorithm in [GLS].
Specifically, let l(y) ≥ 0 be a constraint of TR. Then we just need to minimize
l(y) on SR and check if the minimum exceeds zero. Q.E.D.
But this method does not work when the number of constraints of TR is
exponential, as expected in the context of the lower bound problems under
consideration. In fact, no generic black-box-type algorithm, like the one in
[GLS] based on just a membership or separation oracle for TR, can be used
to prove (4) when the number of constraints of TR is exponential.
Fortunately, this is not a serious problem. A basic principle in combi-
natorial optimization, as illustrated in [GLS], is that a complexity theoretic
property that holds for polytopes with polynomially many constraints will
also hold for polytopes with exponentially many constraints, provided these
constraints are sufficiently well-behaved. For example, Edmond’s perfect
matching polytope for nonbipartite graphs has complexity-theoretic proper-
ties similar to the perfect matching polytope for bipartite graphs, though it
can have exponentially many constraints. We have already remarked that
TR and SR are analogues of the Littlewood-Richardson cones. The facets of
the Littlewood-Richardson cone have a very nice explicit description [Kl, Z].
The cones TR, SR here are expected to have similar nice explicit descrip-
tion. This is why Hypothesis 7.6.9 can be expected to hold even if the
number of constraints of TR is exponential, just as it holds even when SR
has exponentially many constraints. But a polynomial-time algorithm as in
Hypothesis 7.6.9 would have to depend crucially on the specific nature of
the facets (constraints) of TR in the spirit of the linear-programming-based
algorithm for the construction of a maximum-weight perfect matching in
nonbipartite graphs [Ed], where too the number of constraints is exponen-
tial but the algorithm still works because of the structure theorems based
on the specific nature of the constraints.
7.7 Arithmetic form of the P vs NP problem in
characteristic zero
We turn now to the arithemetic form of the P vs. NP problem in character-
istic zero. The arguments are essentially verbatim translations of those for
the arithmetic form of the P#P vs. NC problem in the preceding section.
Hence we shall be brief.
In the preceding section h(X) was perm(X) and g(Y ) was det(Y ). Now
h(X) and g(Y ) would be explicit (co)-NP-complete and P -complete func-
tions E(X) and H(Y ) constructed in [GCT1]. They can be thought of as
points in suitable W = Symk(X) and V = Syml[Y ], k = O(n2), l = O(m2),
with the natural action of GL(X) and G = GL(Y ), where n denotes the
number of input parameters and m denotes the circuit size parameter in
the lower bound problem. These functions are extremely special like the
determinant and the permanent in the sense that they are “almost” char-
acterized by their stabilizers as explained in [GCT1]–and this is enough for
our purposes.
We again have a natural embedding φ : P (W ) → P (V ), which lets us
define f = φ(h). The class variety for NP is defined to be ∆V [f ] ⊆ P (V ),
the projective closure of the orbit Gf . The class variety for P is ∆̃V [g] ⊆
P (V ), which is defined to be the projective closure of G[g], where [g] denotes
the set of points in P (V ) that are stabilized by Gg ⊆ G, the stabilizer of g.
An explicit description of Gg is given in [GCT1]; cf. Section 7 therein. To
show P 6= NP in characteristic zero, it suffices to show that ∆V [f ] is not a
subvariety of ∆̃V [g] for all large enough n, if m = poly(n) (cf. Conjecture
7.4. in [GCT1]). For this, in turn, it suffices to show existence of strong
obstructions, defined very much as in Section 7.6, for all n, assumming
m = poly(n).
We can then formulate PH1 for the new h(X) and g(Y ) just as in Hy-
potheses 7.6.1 and 7.6.2, and the notion of a robust obstruction as in Defi-
nition 7.6.3. We then have:
Theorem 7.7.1 (Verification of obstructions)
Analogues of Theorems 7.6.5 and 7.6.6 holds for h(X) = E(X) and
g(Y ) = H(Y ).
Furthermore, even discovery of robust obstructions can be conjectured
to be easy (poly-time)–this would follow from the obvious analogue of Hy-
pothesis 7.6.9 here.
Heuristic argument for existence of robust obstructions is very similar
to the one in Section 7.6.6. It needs SH for the special case of the subgroup
restriction problem for the embedding Gg →֒ G. The group Gg, as described
in [GCT1], is a product of some copies of the algebraic torus and the sym-
metric group. The subgroup restriction problem in this case is akin to but
harder than the plethysm problem.
Bibliography
[BGS] T. Baker, J. Gill, R. Soloway, Relativization of the P =?NP ques-
tion, SIAM J. Comput. 4, 431-442, 1975.
[BBCV] M. Baldoni, M. Beck, C. Cochet, M. Vergne, Volume computation
for polytopes and partition functions for classical root systems,
math.CO/0504231, Apr, 2005.
[Bar] A. Barvinok, A polynomial time algorithm for counting integral
points in polyhedra when the dimension is fixed. Math. Oper. Res.,
19 (4): 769-779, 1994.
[BDR] The Frobenius problem, rational polytopes, and Fourier-Dedekind
sums, Journal of number theory, vol. 96, issue 1, 2002.
[BBD] A. Beilinson, J. Bernstein, P. Deligne, Faisceaux pervers,
Astérisque 100, (1982), Soc. Math. France.
[Bl] P. Belkale, Geometric proofs of Horn and saturation conjectures,
math.AG/0208107.
[BZ] A. Berenstein, A. Zelevinsky, Tensor product multiplicities and
convex polytopes in partition space, J. Geom. Phys. 5(3): 453-472,
1988.
[BL] S. Billey, V. Lakshmibai, Singular Loci of Schubert varieties,
Birkhäuser, 2000.
[Bou] J. Boutot, Singularit’es rationelles et quotients par les groupes
r’eductifs, Invent. Math.88, (1987), 65-68.
[BOR] E. Briand, R. Orellana, M. Rosas, Reduced Kronecker coeffi-
cients and counter-examples to Mulmuley’s satuation conjecture
SH, arXiv:0810.3163v1 [math.CO] 17 Oct, 2008.
http://arxiv.org/abs/math/0504231
http://arxiv.org/abs/math/0208107
http://arxiv.org/abs/0810.3163
[Ca] R. Carter, Simple groups of Lie type, John Wiley and Sons, 1989.
[Cs] L. Csanky, Fast parallel matrix inversion algorithms, SIAM J. com-
put. 5 (1976), 618-623.
[DEP1] C. De Concini, D. Eisenbud, C. Procesi, Young diagrams and de-
terminantal varieties, Inv. Math. 56 (1980) 129-165.
[DEP2] C. De Concini, D. Eisenbud, C. Procesi, Hodge algebras, astérisque
91, Société mathématique de france, 1982.
[Dh] R. Dehy, Combinatorial results on Demazure modules, J. of Alge-
bra 205, 505-524 (1998).
[Dl] P. Deligne, La conjecture de Weil II, Publ. Math. Inst. Haut. Étud.
Sci. 52, (1980) 137-252.
[DL] P. Deligne, G. Lusztig, Representations of reductive groups over
finite fields, Annals Math. 103, 103-161.
[DHHH] De Loera, J.A., Haws, D., Hemmecke, R., Huggins, P., Tauzer, J.,
Yoshida, R. A User’s Guide for LattE v1.1, 2003, software package
LattE is available at http://www.math.ucdavis.edu/∼ latte/
[DM1] J. De Loera, T. McAllister, Vertices of Gelfand-Tsetlin polytopes,
math.CO/0309329, Sept. 2003.
[DM2] J. De Loera, T. McAllister, On the computation of Clebsch-Gordon
coefficients and the dilation effect, Experiment Math. 15, (2006),
no. 1, 7-20.
[Deo] V. Deodhar, A combinatorial setting for questions in Kazhdan-
Lusztig theory, Geom. Dedicata, 36, (1990).
[Der] H. Derkesen, J. Weyman, On the Littlewood-Richardson polyno-
mials, J. Algebra 255 (2002), no. 2, 247-257.
[Dri] V. Drinfeld, Quantum groups, Proc. Int. Congr. Math. Berkeley,
1986, vol. 1, Amer. Math. Soc. 1988, 798-820.
[DFK] M. Dyer, A. Frieze and R. Kannan, A randomized polynomial time
algorithm for approximating the volume of convex sets, in Journal
of the Association for Computing Machinary, 38:1-17, (1991)
http://www.math.ucdavis.edu/
http://arxiv.org/abs/math/0309329
[Ed] J. Edmonds, Maximum matching and a polyhedron with 0 − 1
vertices, Journal of Research of the National Bureau of Standards
B 69, (1965), 125-130.
[El] A. Elashvili, Invariant algebras, in Lie groups, their discrete sub-
groups, and invariant theory, Advances in Soviet Mathematics, vol.
8, ed. E. Vinberg, American Mathematical Society, 1992.
[Fu] W. Fulton, Eigenvalues of sums of Hermitian matrices (after A.
Klyachko), Séminaire Bourbaki, vol. 1997/98. Astérisque No. 2523
(1998), Exp. No. 845, 5, 255-269.
[FH] W. Fulton, J. Harris, Representation theory, A first course,
Springer, 1991.
[KM] M. Kapovich, J. Millson, Structure of the tensor product semi-
group, math.RT/0508186.
[GCTabs] K. Mulmuley, M. Sohoni, Geometric complexity theory, P vs. NP
and explicit obstructions, in “Advances in Algebra and Geometry”,
Edited by C. Musili, the proceedings of the International Confer-
ence on Algebra and Geometry, Hyderabad, 2001.
[GCTflip] K. Mulmuley, On P vs. NP, geometric complexity theory, and the
flip I: a high level view, Technical report TR-2007-16, computer
science department, The university of Chicago, September 2007.
Available at http://ramakrishnadas.cs.uchicago.edu.
[GCT1] K. Mulmuley, M. Sohoni, Geometric complexity theory I: an ap-
proach to the P vs. NP and related problems, SIAM J. Comput.,
vol 31, no 2, pp 496-526, 2001.
[GCT2] K. Mulmuley, M. Sohoni, Geometric complexity theory II: towards
explicit obstructions for embeddings among class varieties, SIAM
J. Comput., Vol. 38, Issue 3, June 2008.
[GCT3] K. Mulmuley, M. Sohoni, Geometric complexity theory III, on de-
ciding positivity of Littlewood-Richardson coefficients, cs. ArXiv
preprint cs. CC/0501076 v1 26 Jan 2005.
[GCT4] K. Mulmuley, M. Sohoni, Geometric complexity theory IV: quan-
tum group for the Kronecker problem, cs. ArXiv preprint cs.
CC/0703110, March, 2007.
http://arxiv.org/abs/math/0508186
http://ramakrishnadas.cs.uchicago.edu
[GCT5] K. Mulmuley, H. Narayanan, Geometric complexity theory V:
on deciding nonvanishing of a generalized Littlewood-Richardson
coefficient, Technical Report TR-2007-05, computer science de-
partment, The University of Chicago, May, 2007. Available at:
http://ramakrishnadas.cs.uchicago.edu
[GCT6erratum] K. Mulmuley, Erratum to the saturation hypothesis in
“Geometric Complexity Theory VI”, technical report TR-2008-10,
computer science department, the university of Chicago, October
2008. Available at: http://ramakrishnadas.cs.uchicago.edu.
[GCT7] K. Mulmuley, Geometric complexity theory VII: Nonstan-
dard quantum group for the plethysm problem, Tech-
nical Report TR-2007-14, computer science department,
The University of Chicago, September, 2007. Available at:
http://ramakrishnadas.cs.uchicago.edu.
[GCT8] K. Mulmuley, Geometric complexity theory VIII: On canon-
ical bases for the nonstandard quantum groups, Tech-
nical Report TR 2007-15, computer science department,
The university of Chicago, September 2007. Available at:
http://ramakrishnadas.cs.uchicago.edu.
[GCT9] B. Adsul, M. Sohoni, K. Subrahmanyam, Geometric complexity
theory IX: algbraic and combinatorial aspects of the Kronecker
problem, under preparation.
[GCT10] K. Mulmuley, Geometric complexity theory X: On class varieties
and nonstandard quantum groups, under preparation.
[GCT11] K. Mulmuley, Geometric complexity theory XI: on the flip over
finite or algebraically closed fields of positive characteristic, under
preparation.
[Hi] H. Hironaka, Resolution of singularities of an algebraic variety over
a field of characteristic zero, Ann. of Math (2), 79: 109-273.
[GrL] I. Grojnowski, G. Lusztig, On bases of irreducible representations
of quantum GLn, in Kazhdan-Lusztig theory and related topics,
Chicago, IL, 1989, Contemp. Math. 139, 167-174.
[GLS] M. Grötschel, L. Lovász, A. Schrijver, Geometric algorithms and
combinatorial optimzation, Springer-Verlag, 1993.
http://ramakrishnadas.cs.uchicago.edu
http://ramakrishnadas.cs.uchicago.edu
http://ramakrishnadas.cs.uchicago.edu
http://ramakrishnadas.cs.uchicago.edu
[Ha] M. Hashimoto, Another proof of global F -regularity of Schubert
varieties, arXiv:math.AC/0409007 v1 1 Sep 2004.
[Ho] M. Hochster, J. Roberts, Rings of invariants of reductive groups
acting on regular rings are Cohen-Macaulaey, Adv. in Math. 13
(1974), 115-175.
[JSV] M. Jerrum, A. Sinclair, E. Vigoda, A polynomial-time approxima-
tion algorithm for the permanent of a matrix with non-negative
entries, J. ACM, vol. 51, issue 4, 2004.
[Ji] M. Jimbo, A q-difference analogue of U(}) and the Yang-Baxter
equation, Lett. Math. Phys. 10 (1985), 63-69.
[KB] R. Kannan, A. Bachem, Polynomial algorithms for computing the
Smith and Hermite normal forms of an integer matrix, SIAM J.
comput., 8 (1979) 499-507.
[KR] R. Karp, V. Ramachandran, Parallel algorithms for shared memory
machines, Handbook of theoretical computer science, Ed. J. van
Leeuwen, Elsevier science publishers B.V., 1990.
[Kas1] Crystalizing the q-analogue of universal enveloping algebras,
Comm. Math. Phys. 133 (1990), 249-260.
[Kas2] M. Kashiwara, On crystal bases of the q-analogue of universal en-
veloping algebras, Duke Math. J. 63 (1991), 465-516.
[Kas3] M. Kashiwara, Global crystal bases of quantum groups, Duke
Mathematical Journal, vol. 69, no.2, 455-485.
[KL1] D. Kazhdan, G. Lusztig, Representations of Coxeter groups and
Hecke algebras, Invent. Math. 53 (1979), 165-184.
[KL2] D. Kazhdan, G. Lusztig, Schubert varieties and Poincare duality,
Proc. Symp. Pure Math., AMS, 36 (1980), 185-203.
[KTT] R. King, C. Tollu, F. Toumazet Stretched Littlewood-Richardson
coefficients and Kostka coefficients. In, Winternitz, P., Harnard, J.,
Lam, C.S. and Patera, J. (eds.) Symmetry in Physics: In Memory
of Robert T. Sharp. Providence, USA, AMS OUP, 99-112., CRM
Proceedings and Lecture Notes 34, 2004.
http://arxiv.org/abs/math/0409007
[Kh] L. Khachian, A polynomial algorithm in linear programming (in
Russian), Doklady Akad. Nauk SSSR 1979, t. 244, No. 5, 1093–
1096.
[Ki] A. Kirillov, An invitation to the generalized saturation conjecture,
math. CO/0404353., 20 Apr. 2004.
[Kli] A. Klimyk, K. Schmüdgen, Quantum groups and their representa-
tions, Springer, 1997.
[Kl] A. Klyachko, Stable vector bundles and Hermitian operators, IGM,
University of Marne-la-Vallee preprint (1994).
[KT1] A. Knutson, T. Tao, The Honeycomb model of GLn(C) tensor
products I: proof of the saturation conjecture, J. Amer. Math. Soc,
12, 1999, pp. 1055-1090.
[KT2] A. Knutson, T. Tao, Honeycombs and sums of Hermitian matrices,
Notices Amer. Math. Soc. 48, (2001) No. 2, 175186.
[LT] B. Leclerc, J. Thibon, Littlewood-Richardson coefficients and
Kazhdan-Lusztig polynomials, Combinatorial methods in represen-
tation theory, Adv. Stud. Pure. Math. 28 (2000), 155-220.
[LLL] A. Lenstra, H. Lenstra, Jr., L. Lov’asz, Factoring polynomials with
rational coefficients, Mathematische Annalen 261 (1982), 515-534.
[Li] P. Littelmann, Paths and root operators in representation theory,
Ann. of Math. 142 (1995), 499-525.
[Lu1] G. Lusztig, Characters of reductive groups over a finite field, An-
nals Math Studies 107, Princeton University Press.
[Lu2] G. Lusztig, Canonical bases arising from quantized enveloping al-
gebras, J. Amer. Math. Soc. 3, (1990), 447-498.
[Lu3] G. Lusztig, Canonical bases in tensor products, Proc. Nat. Acad.
Sci. USA, vo. 89, pp 8177-8179, 1992.
[Lu4] G. Lusztig, Introduction to quantum groups, Birkhäuser, 1993.
[Lu5] G. Lusztig, Character sheaves, (1985/1986), Advances in Math.
56, 193-237; II, 57, 226-265; III, 57, 266-315; IV 59, 1-63; V, 61,
103-155.
[Mc] I. Macdonald, Symmetric functions and Hall polynomials, Oxford
science publications, Clarendon press, 1995.
[MR] V. Mehta, A. Ramanathan, Frobeniuns splitting and cohomology
vanishing for Schubert varieties, Ann. Math. 122, 1985, 27-40.
[Mc] H. Minc, Permanents, Addison-Wesley, 1978.
[N] H. Narayanan, On the complexity of computing Kostka numbers
and Littlewood-Richardson coefficients, J. of Algebraic combina-
torics, vol. 24, issue 3, Nov. 2006.
[PV] V. Popov, E. Vinberg, Invariant theory, in Encyclopaedia of Math-
ematical Sciences, Algebraic Geometry IV, Eds. A. Parshin, I. Sha-
farevich, Springer-Verlag, 1989.
[Rm] A. Ramanathan, Schubert varieties are arithmetically Cohen-
Macaley, Invent. Math 80, No. 2, 283-294 (1985).
[Rs] E. Rassart, A polynomiality property for Littlewood-Richardson
coefficients, arXiv:math.CO/0308101, 16 Aug. 2003.
[RW] J. Remmel, T. Whitehead, On the Kronecker product of Schur
functions of two row shapes, Bull. Belg. Math. Soc. 1 (1994), 649-
[Ro] M. Rosas, The Kronecker Product of Schur Functions Indexed by
Two-Row Shapes or Hook Shapes, Journal of Algebraic Combi-
natorics, An international journal, Volume 14, issue 2, September
2001.
[RR] A. Razborov, S. Rudich, Natural proofs, J. Comput. System Sci.,
55 (1997), pp. 24-35.
[RTF] N. Reshetikhin, L. Takhtajan, L. Faddeev, Quantization of Lie
groups and Lie algebras, Leningrad Math. J., 1 (1990), 193-225.
[Sc] A. Schrijver, Combinatorial optimization, Vol. A-C, Springer, 2004.
[Sm] K. Smith, F-rational rings have rational singularities, Amer. J.
Math. 119 (1997).
[Sp] T. Springer, Linear algebraic groups, in Algebraic Geometry IV,
Encyclopaedia of Mathematical Sciences, Springer-Verlag, 1989.
http://arxiv.org/abs/math/0308101
[St1] R. Stanley, Enumerative combinatorics, vol. 1, Wadsworth and
Brooks/Cole, Advanced Books and Software, 1986.
[St2] R. Stanley, Combinatorics and commutative algebra, Birkhäuser,
1996.
[St3] R. Stanley, Generalized h-vectors, intersection cohomology of toric
varieties, and related results, Advanced studies in pure mathemat-
ics 11, 1987, commutative algebra and combinatorics, pp. 187-213.
[St4] R. Stanley, Positivity problems and conjectures in algebraic com-
binatorics, manuscript, to appear in Mathematics: Frontiers and
Perpsectives, 1999.
[st5] R. Stanley, Decompositions of rational polytopes, Annals of dis-
crete mathematics 6 (1980) 333-342.
[Stm] B. Sturmfels, On vector partition functions, J. Combinatorial The-
ory, Seris A 72 (1995), 302-309.
[SV] A. Szenes, M. Vergne, Residue formulae for vector partitions and
Euler-Maclarin sums, Advances in Apllied Mathematics, vol. 30,
issue 1/2, January 2003.
[Ta] E. Tardos, A strongly polynomial algorithm to solve combinatorial
linear programs, Operations Research 34 (1986), 250-256.
[W] K. Woods, Computing the period of an Ehrhart quasipolynomial.
The Electron. J. Combin. 12 (2005), Research paper 34.
[V] L. Valiant, The complexity of computing the permanent, Theoret-
ical Computer Science 8, pp 189-201, 1979.
[Z] A. Zelevinsky, Littlewood-Richardson semigroups,
arXiv:math.CO/9704228 v1 30 Apr 1997.
http://arxiv.org/abs/math/9704228
	Introduction
	The decision problems
	Deciding nonvanishing of Littlewood-Richardson coefficients
	Back to the general decision problems
	Saturated and positive integer programming
	Quasi-polynomiality, positivity hypotheses, and the canonical models
	The plethysm problem
	Towards PH1, SH, PH2,PH3 via canonial bases and canonical models
	Basic plan for implementing the flip
	Organization of the paper
	Notation
	Preliminaries in complexity theory
	Standard complexity classes
	Example: Littlewood-Richardson coefficients
	Convex #P
	Littlewood-Richardson coefficients
	Littlewood-Richardson cone
	Eigenvalues of Hermitian matrices
	Separation oracle
	Saturation and positivity
	Saturated and positive integer programming
	A general estimate for the saturation index
	Extensions
	Is there a simpler algorithm?
	Littlewood-Richardson coefficients again
	The saturation and positivity hypotheses
	The subgroup restriction problem
	Explicit polynomial homomorphism
	Input specification and bitlengths
	Stretching function and quasipolynomiality
	The decision problem in geometric invariant theory
	Reduction from Problem 1.1.3 to Problem 1.1.4
	Input specification
	Stretching function and quasi-polynomiality
	Positivity hypotheses
	G/P and Schubert varieties
	PH3 and existence of a simpler algorithm
	Other structural constants
	Quasi-polynomiality and canonical models
	Quasi-polynomiality
	The minimal positive form and modular index
	The rings associated with a structural constant
	Canonical models
	From PH0 to PH1,3
	On PH0 in general
	Nonstandard quantum group for the Kronecker and the plethysm problems
	The cone associated with the subgroup restriction problem
	Elementary proof of rationality
	Parallel and PSPACE algorithms
	Complex semisimple Lie group
	Symmetric group
	General linear group over a finite field
	Tensor product problem
	Finite simple groups of Lie type
	Experimental evidence for positivity
	Littlewood-Richardson problem
	Kronecker problem, n=2
	G/P and Schubert varieties
	The ring of symmetric functions
	On verification and discovery of obstructions
	Obstruction
	Decision problems
	Verification of obstructions
	Robust obstruction
	Verification of robust obstructions
	Arithemetic version of the P#P vs. NC problem in characteristric zero
	Class varieties
	Obstructions
	Robust obstructions
	Verification of robust obstructions
	On explicit construction of obstructions
	Why should robust obstructions exist?
	On discovery of robust obstructions
	Arithmetic form of the P vs NP problem in characteristic zero
ABSTRACT
  This article belongs to a series on geometric complexity theory (GCT), an
approach to the P vs. NP and related problems through algebraic geometry and
representation theory. The basic principle behind this approach is called the
flip. In essence, it reduces the negative hypothesis in complexity theory (the
lower bound problems), such as the P vs. NP problem in characteristic zero, to
the positive hypothesis in complexity theory (the upper bound problems):
specifically, to showing that the problems of deciding nonvanishing of the
fundamental structural constants in representation theory and algebraic
geometry, such as the well known plethysm constants--or rather certain relaxed
forms of these decision probelms--belong to the complexity class P. In this
article, we suggest a plan for implementing the flip, i.e., for showing that
these relaxed decision problems belong to P. This is based on the reduction of
the preceding complexity-theoretic positive hypotheses to mathematical
positivity hypotheses: specifically, to showing that there exist positive
formulae--i.e. formulae with nonnegative coefficients--for the structural
constants under consideration and certain functions associated with them. These
turn out be intimately related to the similar positivity properties of the
Kazhdan-Lusztig polynomials and the multiplicative structural constants of the
canonical (global crystal) bases in the theory of Drinfeld-Jimbo quantum
groups. The known proofs of these positivity properties depend on the Riemann
hypothesis over finite fields and the related results. Thus the reduction here,
in conjunction with the flip, in essence, says that the validity of the P vs.
NP conjecture in characteristic zero is intimately linked to the Riemann
hypothesis over finite fields and related problems.

<|endoftext|><|startoftext|>
Introduction
Basaltic asteroids are small bodies connected to the processes of heating and melting
that may have led to the mineralogical di�erentiation in the interiors of the largest asteroids.
Therefore, a precise knowledge of the inventory of basaltic asteroids may help to estimate
how many di�erentiated bodies actually formed in the asteroid Main Belt, and this in turn
may provide important constraints to the primordial conditions of the solar nebula.
In the visible wavelengths range, the re�ectance spectrum of basaltic asteroids is char-
acterized by a steep slope shortwards of 0.70 µm and a deep absorption band longwards of
0.75 µm. Asteroids showing this spectrum are classi�ed as V-type in the usual taxonomies
(e.g. Bus & Binzel, 2002).
A few years ago, most of the known V-type asteroids were members of the Vesta dy-
namical family, located in the inner asteroid belt �semi-major axis a < 2.5 AU�. This family
formed by the excavation of a large crater (Thomas et al., 1997; Asphaug, 1997) on the
surface of asteroid (4) Vesta, which is the only known large asteroid �diameter D ∼ 500 km�
to show a basaltic crust (McCord et al., 1970).
Nowadays, however, several V-type asteroids have been identi�ed in the inner belt but
outside the Vesta dynamical family (Burbine et al., 2001; Florczak et al., 2002; Alvarez-
Candal et al., 2006). Basaltic asteroids have also been found in the middle Main Belt
�2.5 < a < 2.8 AU� (Binzel et al., 2006; Roig et al., 2007), as well as among the Near
Earth Asteroids (NEA) population (McFadden et al., 1985; Cruikshank et al., 1991; Binzel
et al., 2004; Du�ard et al., 2006). Recent works (Carruba et al., 2005, 2007; Nesvorný
et al., 2007; Roig et al., 2007) provide evidence that many of these V-type asteroids may
be former members of the Vesta family, that reached their present orbits due to long term
dynamical evolution. The exception is asteroid (1459) Magnya, the only basaltic object so
far discovered in the outer belt �a > 2.8 AU� (Lazzaro et al., 2000). This asteroid is too far
away from the Vesta family and it is also too big �D = 20-30 km� to have a real probability
� 3 �
of being a fragment from the Vesta's crust (Michtchenko et al., 2002).
Beyond the existence of (4) Vesta and the V-type asteroids related to the Vesta dy-
namical family, the paucity of intact di�erentiated asteroids and of their fragments observed
today in the main belt is an strong constraint to the formation scenario of basaltic mate-
rial. The sample of iron meteorites collected in the Earth indicates that they would come
from the iron core of dozens of di�erentiated parent bodies. However, there are very few
olivine-rich asteroids (classi�ed as A-type) which are assumed to come from the mantle of
di�erentiated bodies, and only one asteroid, (1459) Magnya, is known to sample the basaltic
crust of a di�erentiated parent body other than (4) Vesta. Finally, the other Main Belt
asteroid families, which formed from the disruption of over �fty 10 < D < 400 km asteroids,
show little spectroscopic evidence that their parent bodies were heated enough to produce a
distinct core, mantle and crust (Cellino, 2003).
Aiming to establish if other V-type asteroids might be found together with (1459) Mag-
nya in the outer belt, thus giving support to the existence of a di�erentiated parent body
in that part of the belt, Roig & Gil-Hutton (2006) used the �ve band photometry from the
3rd release of the Sloan Digital Sky Survey Moving Objects Catalog (SDSS-MOC; Ivezi¢ et
al., 2001; Juri¢ et al., 2002) to identify candidate V-type asteroids. Among 263 candidates
that are not members of the Vesta dynamical family, Roig & Gil-Hutton found �ve possible
V-type asteroids in the outer belt: (7472) Kumakiri, (10537) 1991 RY16, (44496) 1998 XM5,
(55613) 2002 TY49, and (105041) 2000 KO41. However, these �ndings need to be con�rmed
by accurate spectroscopic observations.
The aim of this work is to describe the visible spectroscopic observations of two of these
candidates: (7472) Kumakiri and (10537) 1991 RY16. Our goal is to provide a more reliable
taxonomic classi�cation of these asteroids indicating that they would the second and third
basaltic asteroids discovered up to now in the outer belt. Our observations also reveal certain
peculiarities of their spectra that deserve special attention in future studies. Last but not
least, our results help to validate the approach of Roig & Gil-Hutton (2006) to predict V-type
asteroids. It is worth recalling that a similar study has been performed by Roig et al. (2007),
who used visible spectroscopic observations taken at the Gemini Observatory to con�rm the
classi�cation of two candidate V-type asteroids in the middle belt: (21238) 1995 WV7 and
(40521) 1999 RL95.
In Sect. 2, we describe the observations and the reduction procedures. In Sect. 3, we
present and discuss the results obtained. Finally, Sect. 4 is devoted to conclusions.
� 4 �
2. Observations
Low resolution spectroscopy of (7472) Kumakiri and (10537) 1991 RY16 were obtained
on November 14, 2006, as part of a 4 nights observational run, using the Calar Alto Faint
Object Spectrograph (CAFOS) at the 2.2m telescope in Calar Alto Observatory, Spain. The
prime aim of the run was to characterize V-type asteroids inside and outside the Vesta
family. Asteroid (7472) Kumakiri was observed again on December 29, 2006, using the same
instrument and telescope, under Director's Discretionary Time (DDT). Table 1 summarizes
the observational circumstances.
CAFOS1 is equipped with a 2048×2048 CCD detector SITe-1d (pixel size 24 µm/pixel,
plate scale 0.53"/pixel). We used the R400 grism allowing to obtain an observable spectral
range between 0.50 and 0.92 µm. To remove the solar component of the spectra and obtain
the re�ectance spectra, the solar analog stars HD 191854, HD 20630 and HD 28099 (Hardorp,
1978) were also observed at similar airmasses as the asteroids. In order to estimate the
quality of each night, at least two solar analogs were observed per night and we veri�ed that
the ratios between the corresponding spectra show no signi�cant variations. Bias frames,
spectral dome �at �elds and calibration lamps spectra were also taken in each night to allow
reduction of the science images. Spectrum exposures for each asteroid were splitted in two
exposures at two di�erent slit positions, A and B, separated by 20" (the width of the slit was
2.0"). The observations were performed with the telescope tracking at the proper motion
of the asteroid. Hence by subtracting A from B and B from A, a very accurate background
removal is achieved. Finally, standard methods for spectra extraction were applied.
3. Results and discussion
The re�ectance spectra of (7472) Kumakiri and (10537) 1991 RY16 are shown in Figs.
1 and 2. Both spectra show a steep slope shortwards of 0.70 µm and a deep absorption
band longwards of 0.75 µm. Using the algorithm of Bus (1999), we determine that the
spectra can be classi�ed as V-type. Figure 1 show that our observations are compatible
with the spectra of previously known V-type asteroids (gray lines) taken from the SMASS
survey (Bus & Binzel, 2002) and the S3OS2 survey (Lazzaro et al., 2004). Figure 2 show the
good agreement between the �ve band photometry of the SDSS-MOC (black lines) and the
observed spectra. It is worth noting that the values of maximum and minimum re�ectance
prevents to attribute to these spectra other taxonomic classi�cation, like R-, O- or Q-type.
1See http://www.caha.es/alises/cafos/cafos22.pdf for more details.
� 5 �
In view of this, (7472) Kumakiri and (10537) 1991 RY16 may be considered, together with
(1459) Magnya, the only V-type asteroids discovered up to now in the outer belt.
Notwithstanding, the spectra of (7472) Kumakiri and (10537) 1991 RY16 show a shal-
low absorption feature around 0.60-0.70 µm that has never been reported before in V-type
asteroids. This feature is more evident in the spectrum of (10537) 1991 RY16. After the its
identi�cation in the November 14 observations, and excluding possible reduction artifacts or
solar analogs problems, we requested Director Discretionary Time (DDT) for another obser-
vational run on December 29. Only the spectra of (7472) Kumakiri was able to be observed
during this run, con�rming the presence of the absorption band. Nevertheless, the band in
the spectrum of (10537) 1991 RY16 has also been observed independently by Moskovitz et
al. (2007).
To analyze this band, we recti�ed the spectra by subtracting a linear continuum in the
interval 0.55 and 0.75 µm and then �tted several polynomials of di�erent degrees. This
allowed to determine the center of the band at 0.63± 0.01 µm and the FWHM of ∼ 0.1 µm
(e.g. Fig. 3).
The origin of this absorption band is unclear. Such kind of bands are usually believed to
arise from the Fe2+ → Fe3+ charge transfer absorptions in phyllosilicate (hydrated) minerals
(Vilas & Ga�ey, 1989; Vilas et al., 1993). However, it is di�cult to explain the presence of
a hydrated mineral in the surface of a basaltic object, because the heating and melting that
produce the basalt also eliminate any traces of water.
It is known that pyroxene crystals Fe2+ cations do not show any absorption bands in the
spectral region from 0.56 to 0.72 µm. Therefore, the origin of the observed band might be
related to other impurity cations like Mn2+, Cr3+, and Fe3+, usually located in the M1 site
of terrestrial and meteorite orthopyroxenes (Shestopalov et al., 2007). In particular, broad
spin-allowed bands of trivalent chromium around 0.430-0.455 µm and 0.620-0.650 µm have
been observed in both re�ectance and transmitted spectra of Cr-containing terrestrial ortho
and clinopyroxenes (see Cloutis, 2002), as well as in diogenite re�ectance spectra (McFadden
et al., 1982). Cr3+ cations also give spin-forbidden bands near 0.480, 0.635, 0.655, and 0.670
µm but they do not give absorptions near 0.57 µm.
Cloutis (2002) speci�cally found that Cr3+ gives rise to an absorption band near 0.455
µm and a more complex absorption feature in the 0.65 µm region. However, changes in
the grain size of the pyroxenes may have an e�ect on the depth of these absorption bands
(Cloutis and Ga�ey 1991; Sunshine and Pieters 1993). Therefore, the presence of speci�c
absorption bands can be taken as an evidence for the presence of a particular cation, but the
characteristics of these bands (depth and width) are probably not reliable enough to constrain
� 6 �
the cation abundance (Cloutis 2002). For example, the grain size may be responsible of the
di�erent band depth observed between the spectra of (7472) Kumakiri and (10537) 1991
RY16. The slight di�erences in the band pro�le between the November and December
spectra of (7472) Kumakiri might be attributed to di�erent rotational phases2.
Another interesting feature observed in our spectra is that the band center of the major
absorption feature at 0.90 µm is displaced to larger wavelenghts. In our spectra, this region
is the noisiest but using di�erent polynomial �ts it was possible to estimate the center of
the band nearer to 0.92-0.93 µm. This behavior may also be attributed to the presence of
chromium on the surface. Actually, Cloutis and Ga�ey (1991) suggested that the Cr-rich
pyroxene samples in their study have the two major absorption features (i.e. the one centered
at 0.9 and the one centered at 1.9 µm, respectively) displaced to larger wavelengths than
expected, relative to their Fe contents. These authors also presented the predicted versus
actual wavelength position of the major Fe2+ absorption band center in the 1 µm region,
and this center is closer to 0.92 µm than to 0.90 µm. Therefore, the observational evidence
points to a possible Cr-rich basaltic composition on the surfaces of (7472) Kumakiri and
(10537) 1991 RY16.
Concerning the dynamical behavior of these two asteroids, Table 2 lists their proper
elements and diameters, as well as those of (1459) Magnya. The three asteroids are too small
to be di�erentiated bodies by themselves, they are quite spread in proper elements space and
do not belong to any of the asteroid dynamical families identi�ed in the outer belt. Therefore,
they are likely to be fragments from more than one di�erentiated parent bodies. Nevertheless,
at variance with (1459) Magnya, (7472) Kumakiri and (10537) 1991 RY16 evolve very close to
the non linear secular resonance de�ned by the combination g0+s0−g5−s7 ' 0, where gi and
si represent the frequencies of the perihelion $ and node Ω, respectively (i = 0 for asteroid,
i = 5 for Jupiter, i = 7 for Uranus; see Milani & Kneºevi¢, 1992). A 50 My simulation of
the orbits of these two asteroids, including gravitational perturbations from the four major
planets, indicate that they have quite stable orbits showing a slow circulation of the angle
$0+Ω0−$5−Ω7. Although this may be just a coincidence, a dynamical connection between
(7472) Kumakiri and (10537) 1991 RY16 cannot be ruled out and should be addressed by
more detailed studies.
2We have veri�ed that these di�erences cannot be related to observation/reduction problems, since we
do not �nd any di�erences between the spectra of the solar analog stars used in the di�erent nights.
� 7 �
4. Conclusions
We presented visible spectroscopic observations of two asteroids, (7472) Kumakiri and
(10537) 1991 RY16, located in the outer belt. The main goal of our work was to show that
these observations are compatible with the V-type taxonomic class. Therefore, these bodies
would constitute the second and third basaltic asteroids discovered up to now in that part
of the Main Belt.
However, the presence of a shallow absorption band in the spectra around 0.65 µm opens
some questions about the actual mineralogy of these two asteroids. This band is likely to
be related to the presence of Cr3+ cations, and provides evidence for a possible a Cr-rich
basaltic surface.
The spectroscopic similarities among the two asteroids, together with some shared dy-
namical properties, point to the idea of a common origin from the breakup of a di�erentiated
parent body in the outer belt. Further studies, including near infrared (NIR) spectroscopic
observations, are mandatory to better address these issues.
We thank Calar Alto Observatory for allocation of Director's Discretionary Time to this
programme. Fruitful discussions with D. Nesvroný are also highly appreciated. Based on
observations collected at the Centro Astronómico Hispano Alemán (CAHA) at Calar Alto,
operated jointly by the Max- Planck Institut für Astronomie and the Instituto de Astrofísica
de Andalucía (CSIC). RD acknowledges �nancial support from the MEC (contract Juan de
la Cierva).
REFERENCES
Alvarez-Candal, A., Du�ard, R., Lazzaro, D., & Michtchenko, T.A. 2006, A&A, 459, 969
Asphaug, E. 1997, Meteor. & Planet. Sci., 32, 965
Binzel, R., Rivkin, A., Stuart, S., et al. 2004, Icarus, 170, 259
Binzel, R.P., Masi, G., & Foglia, S. 2006, Bull. Amer. Astron. Soc., 38, 627
Burbine, T.H., Buchanan, P.C, Binzel, R.P., et al. 2001, Meteor. & Planet. Sci., 36, 761
Bus, S.J. 1999, PhD Thesis, MIT
Bus, S.J., & Binzel, R.P. 2002, Icarus, 158, 146
� 8 �
Carruba, V., Michtchenko, T.A., Roig, F., et al. 2005, A&A, 441, 819
Carruba, V., Roig, F., Michtchenko, T.A., et al. 2007, A&A, 465, 315
Cellino, A., Bus, S. J., Doressoundiram, A. & Lazzaro, D. in Asteroids III (eds Bottke, W.
F. et al.) 632�643 (Univ. Arizona Press, Tucson, 2003)
Cloutis, E. A.; Ga�ey, M. J. Journal of Geophysical Research, vol. 96, 1991, p. 22809-22826.
Cloutis, E.A. Journal of Geophysical Research (Planets), Volume 107, Issue E6, pp. 6-1.
Cruikshank, D.P., Tholen, D.J., Bell, J.F., et al. 1991, Icarus, 89, 1
Delbo, M., Gai, M., Lattanzi, M.G., et al. 2006, Icarus, 181, 618
Du�ard, R., de León, J., Licandro, J., et al. 2006, A&A, 456, 775
Florczak, M., Lazzaro, D., & Du�ard, R. 2002, Icarus, 159, 178
Hardorp, J. 1978, A&A, 63, 383
Ivezi¢, �., Tabachnik, S., Ra�kov, R., et al. 2001, AJ, 122, 2749
Juri¢, M., Ivezi¢, �., Lupton, R.H., et al. 2002, AJ, 124, 1776
Lazzaro, D., Michtchenko, T.A., Carvano, J.M., et al. 2000, Science, 288, 2033
Lazzaro, D., Angeli, C.A., Carvano, J.M., et al. 2004, Icarus, 172, 179
McCord, T.B., Adams, J.B., & Johnson, T.V. 1970, Science, 168, 1445
McFadden, L.A., Ga�ey, M.J., Takeda, H., Jackowski, T.L., Reed, K.L., 1982. Mem. Nat.
Inst. Polar. Res. 25, 188-206.
McFadden, L., Ga�ey, M.J., & McCord, T. 1985, Science, 229, 160
Michtchenko, T.A., Lazzaro, D., Ferraz-Mello, S., et al. 2002, Icarus, 158, 343
Milani, A., & Kneºevi¢, Z. 1992, Icarus, 98, 211
Moskovitz, N.A., Willman, M., Lawrence, S.J., et al. 2007, LPI Conf., 38, 1663
Nesvorný, D., Roig, F., Gladman, B., et al., Icarus (2007), doi:10.1016/j.icarus.2007.08.034
Roig, F., & Gil-Hutton, R. 2006, Icarus, 183, 411
� 9 �
Roig, F., Nesvorný, D., Gil-Hutton, R., & Lazzaro, D., Icarus (2007),
doi:10.1016/j.icarus.2007.10.004
Shestopalov, D. I.; McFadden, L. A.; Golubeva, L. F., Icarus, Volume 187, Issue 2, p. 469-481.
Sunshine, J. M.; Pieters, C. M. Journal of Geophysical Research, vol. 98, no. E5, p. 9075-
9087.
Thomas, P.C., Binzel, R.P., Ga�ey, M.J., et al. 1997, Science, 277, 1492
Vilas, F., & Ga�ey, M.J. 1989, Science, 246, 790
Vilas, F., Larson, S.M., Hatch, E.C., et al. 1993, Icarus, 105, 67
This preprint was prepared with the AAS LATEX macros v5.2.
� 10 �
Table 1: Observational circumstances for the targets: Universal Time (UT), heliocentric
distance (r), geocentric distance (∆), phase angle (φ), visual magnitude (V ), airmass and
total exposure time (Texp).
Asteroid UT r [AU] ∆ [AU] φ [deg] V [mag] airmass Texp
November 14
(7472) Kumakiri 03:25:00 2.920 2.123 13.5 16.6 1.037 3400 sec
(10537) 1991 RY16 00:05:08 3.040 2.053 1.8 17.1 1.091 4000 sec
December 29
(7472) Kumakiri 21:48:52 2.873 1.901 3.8 15.9 1.094 3600 sec
Table 2: Proper elements and sizes of V-type asteroids in the outer belt. For (1459) Magnya,
the last column gives the diameter from Delbo et al. (2006). For (7472) Kumakiri and
(10537) 1991 RY16, the diameter was computed assuming an albedo of 0.40.
Asteroid ap [AU] ep sin Ip D [km]
(1459) Magnya 3.14986 0.2183 0.2651 17.0± 1.0
(7472) Kumakiri 3.01033 0.1372 0.1562 8.5
(10537) 1991 RY16 2.84958 0.1023 0.1101 7.3
� 11 �
Fig. 1.� Re�ectance spectra of (7472) Kumakiri and (10537) 1991 RY16 (black lines)
compared to the spectra of several known V-type asteroids taken from the SMASS and
S3OS2 surveys (gray lines). The spectra are normalized to 1 at 0.55 µm and shifted by 0.5
units in re�ectance for clarity. To remove the solar contribution, we have used the solar
analog HD 191854 in the November 14 observations and the solar analog HD 28099 in the
December 29 observation.
� 12 �
Fig. 2.� Re�ectance spectra of (7472) Kumakiri and (10537) 1991 RY16 (gray lines) com-
pared to the photometric observations of the SDSS-MOC (black lines). The spectra are
normalized to 1 at 0.477 µm (i.e. the center of the g band in the SDSS photometric system),
and shifted by 0.5 units in re�ectance for clarity. The errors in the SDSS-MOC �uxes are
less than 3%.
� 13 �
(10537) 1991 RY16
-0.15
-0.05
0.54 0.59 0.64 0.69 0.74
Fig. 3.� Re�ectance spectrum of (10537) 1991 RY16 in the 0.55-0.75 µm interval. The
spectrum has been recti�ed by subtracting a linear continuum in this interval. From a
polynomial �t (thick line), the center of the absorption band is detected at 0.63 µm with a
FWHM of 0.1 µm.
	Introduction
	Observations
	Results and discussion
	Conclusions
ABSTRACT
  The identification of basaltic asteroids in the asteroid Main Belt and the
description of their surface mineralogy is necessary to understand the
diversity in the collection of basaltic meteorites. Basaltic asteroids can be
identified from their visible reflectance spectra and are classified as V-type
in the usual taxonomies. In this work, we report visible spectroscopic
observations of two candidate V-type asteroids, (7472) Kumakiri and (10537)
1991 RY16, located in the outer Main Belt (a > 2.85 UA). These candidate have
been previously identified by Roig and Gil-Hutton (2006, Icarus 183, 411) using
the Sloan Digital Sky Survey colors. The spectroscopic observations have been
obtained at the Calar Alto Observatory, Spain, during observational runs in
November and December 2006. The spectra of these two asteroids show the steep
slope shortwards of 0.70 microns and the deep absorption feature longwards of
0.75 microns that are characteristic of V-type asteroids. However, the presence
of a shallow but conspicuous absorption band around 0.65 microns opens some
questions about the actual mineralogy of these two asteroids. Such band has
never been observed before in basaltic asteroids with the intensity we detected
it. We discuss the possibility for this shallow absorption feature to be caused
by the presence of chromium on the asteroid surface. Our results indicate that,
together with (1459) Magnya, asteroids (7472) Kumakiri and (10537) 1991 RY16
may be the only traces of basaltic material found up to now in the outer Main
Belt.

<|endoftext|><|startoftext|>
Introduction and statement of the results
Let S be an open finite Riemann surface endowed with the Poincaré (hyperbolic) met-
ric. We will study some properties of holomorphic functions in the Riemann surface with
uniform growth control. Namely we will deal with the Banach space Aφ(S) of holomorphic
functions in S such that ‖f‖ := supS |f |e
−φ <∞ where φ is a given subharmonic function
that controls the growth of the functions in the space.
The fact that φ is subharmonic is a natural assumption on the weight that limits the
growth since any other growth control given by a weight ψ, ‖f‖∗ = supS |f |e
−ψ can be
replaced by an equivalent subharmonic function because φ = sup‖f‖∗≤1 log |f | is a subhar-
monic function and Aψ(S) = Aφ(S) with equality of norms, supS |f |e
−ψ = supS |f |e
We have fixed a metric. It is then natural to restrict the possible weights φ, in a way
that the functions in Aφ oscillate in a controlled way when the points are nearby in the
Poincaré metric. This is achieved for instance by assuming that φ has bounded Laplacian
(the Laplace-Beltrami operator with respect to the hyperbolic measure). That is, if in a
local coordinate chart the Poincaré metric is of the form ds2 = e2ν(z)|dz|2, then we assume
that ∆φ = 4e−2ν(z) ∂
∂z∂z̄
satisfies C−1 ≤ ∆φ ≤ C. If we want to deal with other weights
then it is possible to introduce a natural metric associated to the weight as it is done in the
plane in [MMOC03]. In this work we will only consider the Poincaré metric and bounded
Laplacian since it already covers many interesting cases and it is technically simpler.
The problems that we will consider are the following:
(A) The description of the interpolating sequences for Aφ(S): i.e. the sequences Λ ⊂ S
such that it is always possible to find an f ∈ Aφ(S) such that f(λ) = vλ for all λ ∈ Λ
whenever the data {vλ}Λ, satisfies the compatibility condition supΛ |vλ|e
−φ(λ) < +∞
(B) The description of sampling sets for Aφ(S): i.e. the sets E ⊂ S such that there is
a constant C > 0 that satisfies
|f |e−φ ≤ C sup
|f |e−φ, ∀f ∈ Aφ(S).
Date: Working draft: July 20, 2021.
Supported by DGICYT grant MTM2005-08984-C02-02 and the CIRIT grant 2005SGR00611.
http://arxiv.org/abs/0704.0231v1
2 JOAQUIM ORTEGA-CERDÀ
In the solution of these problems the Poincaré distance and the potential theory in the
surface play a key role. This has already been observed by A. Schuster and D. Varolin in
[SV04], where they provide sufficient conditions for a sequence to be interpolating/sampling
for functions in a slightly different context where the weighted uniform control of the growth
of the functions is replaced by a weighted L2 control. Their condition basically coincides
with the description that we reach so our work can be considered as the counterpart of
their theorems, although we will give a different proof of their results as well. We will
rely on the well-known case of the disk and some simplifying properties of finite Riemann
surfaces. Their method of proof looks more promising if one wants to extend the result to
Riemann surfaces with more complicated topology.
When the surface is a disk, which will be our model situation, the corresponding problems
have been solved in [BOC95], [OCS98] and in a different way in [Sei98]. Of course, the
more basic problem of describing the interpolating sequences for bounded holomorphic
functions in finite Riemann surfaces (in our notation φ ≡ 0), has been known for a long
time, see [Sto65]).
We introduce now some definitions that will be needed to state our results. For any
point z ∈ S and any r > 0 we denote by D(z, r) the domain in the surface S that consits
of points at hyperbolic distance from z less than r. They are topological disks if the center
z is outside a big compact of S, or if r is small enough, as we will see in Section 2.
A sequence Λ of points in S is hyperbolically separated if there is an ε > 0 such that
the domains {D(λ, ε)}λ∈Λ are pairwise disjoint.
Let gr(z, w) be the Green function associated to the surface D(z, r) with pole at the
“center” z and g(z, w) = g∞(z, w) be the Green function associated to the surface S. We
define the densities
D+φ (Λ) := lim sup
1/2<d(z,λ)<r
gr(z, λ)
D(z,r)
gr(z, w)i∂∂̄φ(w)
D−φ (Λ) := lim inf
1/2<d(z,λ)<r
gr(z, λ)
D(z,r)
gr(z, w)i∂∂̄φ(w)
The main result is
Theorem 1. Let S be a finite Riemann surface and let φ be a subharmonic function with
bounded Laplacian.
(A) A sequence Λ ⊂ S is an interpolating sequence for Aφ(S) if and only if it is hyper-
bolically separated and D+φ (Λ) < 1.
(B) A set E ⊂ S is a sampling set for Aφ(S) if and only if it contains an hyperbolically
separated sequence Λ ⊂ E such that D−φ (Λ) > 1.
INTERPOLATING AND SAMPLING SEQUENCES IN FINITE RIEMANN SURFACES 3
In Section 2 we will prove some key properties of finite Riemann surfaces. In particular
we need to study the behavior of the hyperbolic metric as we approach the boundary of
the surface. We will also prove some weighted uniform estimates for the inhomogeneous
Cauchy-Riemann equation in the surface, Theorem 6, that has an interest by itself.
In the next section, we use the tools and Lemmas proved in Section 2 to reduce the
interpolating and sampling problem in S to a problem near the boundary that can be
reduced to the known case of the disk.
Finally in Section 4 we show how our results can be extended to other Banach spaces of
holomorphic functions where the uniform growth is replaced by weighted Lp spaces.
A final word on notation. By f . g we mean that there is a constant C independent of
the relevant variables such that f ≤ Cg and by f ≃ g we mean that f . g and g . f .
2. Basic properties of finite Riemann surfaces
We start by the definition and then we collect some properties of S that follow from the
restrictions that we are assuming on the topology of S.
Definition 2. A finite Riemann Surface is the interior of a smooth bordered compact
Riemann surface.
Our surface is an open Riemann surface and it is in fact an open subset of a compact
surface (the double, see [SS54]). See Figure 1 for a typical representation. Observe that
the genus is finite and the border of the surface consists of a finite number of smooth closed
Jordan curves. In most of what follows the particular case of a smooth finitely connected
open set in C has all the difficulties of the general case.
The following claim follows from instance from [Sch78, Prop 7.1-7.4]
Lemma 3. For any (0, 1)-form ω there is a solution u to the inhomogeneous Cauchy-
Riemann equation ∂̄u = ω. Moreover since S has an essential extension to a compact
Riemann surface if the data is a smooth form with compact support K in S then there is
a bounded linear solution u = T [w] with the bound |u| ≤ CK〈ω〉.
In this statement and in the following 〈ω〉 is the Poincaré length of the (0, 1)-form ω.
In the disk we have Blaschke factors that are very convenient to divide out zeros of
holomorphic functions without changing essentially the norm. The analogous functions
that provide us with the same property in the case of finite Riemann surfaces are given by
the next proposition:
Proposition 4. There is a constant C = C(S) > 0 such that for any point z ∈ S there is
a function hz ∈ H(S) with
| log |hz(w)| − g(z, w)| < C.
In particular hz(w) is a bounded holomorphic function that vanishes only on the point z
and for any ε > 0 K > |hz(w)| > C(ε) if d(z, w) > ε.
4 JOAQUIM ORTEGA-CERDÀ
Figure 1. A finite Riemann surface with three funnels
Proof. The obstruction for an harmonic function u to have an harmonic conjugate is that
for a set of generators {γi}
i=1 of the homology we have
∗du = 0, i = 1, . . . , m. If we
want u = log |f | for an f ∈ H(S), we just need that
∗du ∈ Z.
Being a finite Riemann surface there are {hj}
j=1 functions in the algebra of S without
zeros such that
∗d log |hj| = δij , see [Wer64, Lemma 1]. Thus the function
v(z) = u(z)−
log |hi(z)|
is the logarithm of an holomorphic function log |f | = v. Therefore there is a constant C such
that any harmonic function u in S admits an holomorphic function f with |u−log |f || < C.
Take a point z ∈ S and any holomorphic function kz ∈ H(S) that vanishes only on z. Then
g(z, w)− log |kz(w)| is harmonic in S and therefore there is a holomorphic function fz such
that |g(z, w)− kz(w)− log |fz|| < C. Thus we may define hz(w) = fz(w)kz(w) and it has
the estimate |g(z, w)− log |hz|| < C. The estimate |g(z, w)| > C(ε) when d(z, w) > ε holds
in finite Riemann surfaces, see for instance [Dil95, Theorem 5.5]. �
2.1. The hyperbolic metric in a finite Riemann surface. The open ends of the
Riemann surface can be parametrized as follows: The border of the Riemann surface S
is a finite union of smooth closed curves γ̃i, i = 1, . . . , n. Near each γ̃i there is a closed
geodesic γi that is homotopic to γ̃i. The subdomain of S bounded by γi and γ̃i is denoted
a “funnel” following the terminology of [DPRS87] and [Dil01].
We need to be more precise about the hyperbolic metric in the funnel. There are nice
coordinates in the funnel that provide good estimates. These are given by the collar
theorem. Let D be the universal holomorphic cover of S and let Tγ ∈ Aut(D) be the deck
transformation corresponding to the closed loop γ. Consider the surface Y = D/{T nγ }n∈Z.
This an annulus since π1(Y ) = Z. If we quotient it by the rest of the deck transformations
of the universal cover we get an holomorphic covering map πγ from Y → S which is
a local isometry (in Y and S we consider the Poincaré metric inherited from D). In fact
Y = {e−R < |z| < eR}, where R = π2/Length(γ), and πγ maps the unit circle isometrically
to γ. Moreover πγ is an isometric injection of the outer part of the annulus {1 < |z| < e
INTERPOLATING AND SAMPLING SEQUENCES IN FINITE RIEMANN SURFACES 5
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
�������������
|z|=eR
|z|=e −R
|z|=1
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
��������
Figure 2. Standard coordinates on the funnel
onto the funnel. These will be called the standard coordinates of the funnel. See [Dil01]
and [Bus92] for details.
The Poincare metric in the the funnel is explicit in the standard coordinates and it is
comparable to the hyperbolic metric on the disk in the coordinate disk |z| < eR when
restricted to |z| > 1.
We denote by Ai, i = 1, . . . , n the funnels of S bounded by γi and γ̃i.
2.2. The inhomogeneous Cauchy-Riemann equation on the surface. We want to
solve the inhomogeneous Cauchy-Riemann equation on S with weighted uniform estimates.
In order to get good estimates it is useful to find functions f ∈ H(S) with precise size
control, i.e., |f | ≃ eφ outside a neighborhood of the zero set of f . With this function we
can later modify an integral formula to get a bounded solution to the ∂̄-equation when the
data has compact support. The following Lemma provides such a function that in other
context has been termed a “multiplier”:
Lemma 5. Let S be a finite Riemann surface and let φ be a subharmonic function with
bounded Laplacian. Then there is a function f with hyperbolically separated zero set Σ such
that |f | ≃ eφ whenever d(z,Σ) > ε. Moreover if we fix any compact K in S it is possible
to find f with the above properties and without zeros in K.
Proof. In any of the funnels Ai we transfer the subharmonic weight φ to the standard
coordinate chart 1 < |z| < eRi . We define a weight φi on the disk |z| < e
Ri in such a
way that φi has bounded invariant Laplacian and moreover |φ − φi| < C on the region
1 < |z| < eRi . One way to do so is the following: we assume from the very beginning that
φ is smooth (this is no restriction since otherwise it can be approximated by a smooth
function). Define
(2) φi(z) = φ(z)χ(z) +Mi‖z‖
where χ is a cutoff function such that χ ≡ 1 in eRi/2 < |z| < eRi , χ ≡ 0 in |z| < 1 and
Mi is taken big enough such that φi is subharmonic and the invariant Laplacian of φi is
bounded above and below.
6 JOAQUIM ORTEGA-CERDÀ
We are under the hypothesis of the result from [Sei95] that states that there is an
holomorphic function in the disk fi with separated zero set Z(fi) (in the hyperbolic metric
of the disk) such that |fi| ≃ e
φi whenever d(z, Z(fi)) > ε. Since the hyperbolic metric of
the disk is comparable to the hyperbolic metric in the funnel, we have found a function
fi ∈ H(Ai) with separated zero set such that |fi(z)| ≃ e
φ(z) if d(z, Z(fi)) > ε. Moreover
dividing out fi by a finite Blaschke product we can assume that fi is zero free in any
prefixed compact of the disk.
We consider the “core” of S to be S \ Ãi, where Ãi are the outer part of the funnels
mapped by eSi < |z| < eRi . The values of the Si are taken so big as to make sure that the
compact K in the hypothesis of the Lemma is contained in the core of S. We adjust the
fi i = 1, . . . , n as mentioned before to make sure that they are zero free in the inner part
of the funnels 1 < |z| < eSi . We finally define f0 ≡ 1 in the core of S.
To patch the different fi together we will need to solve a Cousin II problem with bounds.
Our data is fi defined on the inner parts of the funnels mapped by 1 < |z| < e
Si. The data
are bounded above and below in the inner parts of the funnels (because φ is bounded above
and below in any compact of S and fi have no zeros there). We want to find functions
gi ∈ H(Ai) and g0 holomorphic on the core of S such that fi = g0/gi in the inner part of the
funnel. If moreover gi and g0 are bounded (above and below) then the function f defined
as figi in each of the funnels Ai and g0 on the core of S is holomorphic on S and has the
desired growth properties. To find the functions gi observe that since the intersection of the
funnel Ai with the core of S strictly separates the outer part of the funnel from the inner
part of the core we can reduce the Cousin II problem to solving a ∂̄-equation with bounded
estimates of the solution on S when the data is bounded and with compact support (the
support is in the inner part of the funnels). This can be achieved by Lemma 3. �
With this function we can then obtain the following result which is interesting by itself:
Theorem 6. Let S be a finite Riemann surface and let φ be a subharmonic function with
a bounded Laplacian. There is a constant C > 0 such that for any (0, 1)-form ω on S
there is a solution u to the inhomogeneous Cauchy-Riemann equation ∂̄u = ω in S with
the estimate
|u(z)|e−φ(z) ≤ C sup
〈ω(z)〉e−φ(z),
whenever the right hand is finite.
Recall that the notation 〈ω(z)〉 means the hyperbolic norm of ω at the point z.
Proof. Let wi be the form w restricted to the funnel Ai. We take a standard coordinate
chart and we may think of wi as a (0, 1)-form defined on the disk |z| < e
Ri and with
support in 1 < |z| < eRi. Consider as in the proof of Lemma 5 a subharmonic function φi
in the disk with bounded laplacian and such that |φ− φi| < C if 1 < |z| < e
By the results in [OC02, Thm 2] there is a solution ui to the problem ∂̄ui = wi in the
disk |z| < eRi with the estimate
|z|<eRi
|ui|e
−φi ≤ Ci sup
1<|z|<eRi
〈wi〉e
INTERPOLATING AND SAMPLING SEQUENCES IN FINITE RIEMANN SURFACES 7
Observe that the hyperbolic metric of the disk and of the surface S in the funnel are
equivalent. We consider ũi = uiχi, where χi is a cutoff function with support in 1 < |z| <
eRi and such that χi ≡ 1 if |z| > e
Ri/2. The function ũi is extended by 0 to the remaining
of S and it has the estimate supS |ũi|e
−φ ≤ Ci supS〈w〉e
−φ. Now ∂̄ũi coincides with w
on the outer part of the funnel Ai. Thus the (0, 1)-form wk = w −
i ∂̄ũi has compact
support in S and it satisfies supS〈wk〉e
−φ ≤ supS〈w〉e
−φ. The desired solution is then
ũi + v, where v is such that ∂̄v = wk. We must then solve ∂̄v = wk with weighted
uniform estimates but with the advantage that wk has compact support K.
Let T (ωk) be a solution operator for ∂ū = ωk. We take the operator T given by Lemma 3
the estimate supS |T [wk](z)| ≤ CK supK〈wk〉 holds. Take f with |f | ≃ e
φ and without zeros
in K as given in Lemma 5. Then we define R as
(3) R[ωk](z) = f(z)T [ωk/f ](z),
It solves ∂̄R[ωk] = ωk with the estimate
|R[ωk]|e
−φ ≤ CK sup
〈ωk〉e
The solution is thus v = R[wk]. �
3. The main results
Proposition 7. A separated sequence Λ ⊂ S is interpolating for Aφ(S) if and only if the
sequences Λi = Λ ∩Ai are interpolating in Aφ(Ai).
Proof. We only need to prove that we can pass from the local to the global interpolation
property. We split the proof in two steps
(1) From a funnel Ai to global S: We need to prove that there are finite sets Fi ⊂ Λi
such that ∪ni=1(Λi \ Fi) is interpolating globally.
(2) Filling up the remainder. We shall prove that by adding a finite number of points
to an interpolating sequence we still get an interpolating sequence. Thus Λ is
interpolating if (Λ1 \ F1) ∪ · · · ∪ (Λn \ Fn) is interpolating.
Let γ̃ be one of the closed curves on the boundary. Take a funnel A with outer end curve
in γ̃ and inner end curve in γ. The constant of interpolation in the funnel A is K > 0.
Take a cutoff function χε with support in the funnel such that 〈∂̄χε〉 < ε/(KC) (where C
is the constant in Theorem 6), the support is in a thick annulus of hyperbolic thickness
M = M(ε,K, C). We consider a smaller funnel where χε ≡ 1. The sequence Λ in this
smaller funnel has still at most interpolation constant K. We can interpolate arbitrary
values on Λ being small near the inner curve γ of A in the following way. Take some values
vλ with norm one. Take a function in the funnel f with norm at most K that solves the
interpolation problem. We are going to approximate it by a function in A that is small
near γ. Cut it off by χε and correct via the following inhomogeneous Cauchy-Riemann
equation:
8 JOAQUIM ORTEGA-CERDÀ
∂̄u = f∂̄χ
The function h = u − fχ is holomorphic. By using Theorem 6 on it is possible to solve
the equation with a solution u such that sup |u|e−φ ≤ ε. The function h does not solve
the problem directly but it almost does. We reiterate the procedure (interpolating the
error vλ − h(λ) and with a convergent series we get finally a function g such that h(λ) =
vλ, supA |h|e
−φ ≤ 2 and moreover in the inner half of the funnel that we denote by Ã,
supÃ |h|e
−φ ≤ ε.
Now it is easier to make it global. Take a new cutoff function χ with support in the
funnel A and that is one on the outer part of (i.e. A \ Ã. Then we need to solve
∂̄u = h∂̄χ,
with good global estimates in S. These are given by Theorem 6. We have solved the
interpolation problem when the sequence lies in the funnels. For the general situation
we only need to add a finite number of points. The existence of “Blaschke”-type factors
hλ(z) provided by Theorem 4 shows that Λ ∪ λ is interpolating if Λ is interpolating (it is
immediate to build functions in the space such that f |Λ ≡ 0 and f(λ) 6= 0). �
For the sampling part we need the following definition
Definition 8. Given the pair (S, φ) of a finite Riemann surface and a subharmonic function
defined on it, we associate to it the pairs: (Di, φi)i=1,...n of disks Di and subharmonic
functions φi defined on the disks as follow: If Ai = {1 < |z| < e
Ri}, i = 1, . . . , n are the
standard charts of the funnels of S we define Di = {|z| < e
Ri} and φi is any subharmonic
function in Di such that |φi − φ| < C in the region 1 < |z| < e
Ri, ∆φi = ∆φ in e
Ri/2 <
|z| < eRi and ∆φi ≃ 1 in |z| < e
Ri/2. They can be defined similarly as in (2), but to make
sure ∆φi = ∆φ we may take instead
φi(z) = φ(z)χ(z) +Miψ(z),
where ψ is any bounded subharmonic function in Di such that ∆ψ(z) = 1 if |z| < e
and 0 elsewhere.
The funnels Ai can be considered funnels of S and they are subdomains of Di too. We
will exploit this double nature in the following theorem
Theorem 9. Let S be a finite Riemann surface and let φ be a subharmonic function with
bounded Laplacian. A separated sequence Λ is sampling for Aφ(S) if and only if all the
sequences in the funnels Λi = Λi ∩ Ai ⊂ Di are sampling sequences for Aφi(Di), where
(Di, φi) are the associated pairs to S given by Definition 8.
Thus this Theorem and Proposition 7 show that the properties of sampling and inter-
polation only depend on the behavior of the sequence and the weight near the boundary
pieces.
To prove Theorem 9 we need some previous results
INTERPOLATING AND SAMPLING SEQUENCES IN FINITE RIEMANN SURFACES 9
Lemma 10. Let S be a finite Riemann surface and let φ be a subharmonic function with
bounded Laplacian. A sequence Λ ⊂ S is a uniqueness sequence for Aφ(S) if and only if
all the sequences in the funnels Λi = Λi ∩ Ai ⊂ Di are uniqueness sequences for Aφi(Di),
where (Di, φi) are the associated pairs to S given by Definition 8.
Proof. It is easier to deal by negation. Let Λ be contained in the zero set of a function
f ∈ Aφ(S). Therefore Λi is in the zero set of f ∈ Aφ(Ai). We divide by a finite number
of zeros Ei and we obtain a new function g ∈ Aφ(Ai) without zeros in 1 < |z| ≤ e
Ri/2 and
such that Λi \Ei ⊂ Z(g). Take the disk Di and consider the cover by two open sets |z| > 1
and |z| < eRi/2. On the first set we have the function g and on the second the function 1.
The quotient is bounded above and below in the intersection of the sets. This defines a
bounded Cousin II in the disk Di problem that can be solved with bounded data. We get
a new function h ∈ Aφi(Di) that vanishes in Z(g). We can now add the finite number of
zeros Ei without harm. The reciprocal implication follows with the same argument. �
The next result is inspired by a result of Beurling ([Beu89, pp. 351–365]) that relates
the property of sampling sequence to that of uniqueness for all weak limits of the sequence.
In the context of the Bernstein space (in the original work by Beurling) the space was
fixed (it was C, the space of functions was fixed, the Bernstein class, and he considered
translates and limits of it of the sampling sequence). Here we need to move and take limits
of the sequence (by zooming on appropriate portions of it) but we also need to change the
support space (portions of S near the funnel that look like the unit disk) and we will also
move the space of functions by changing the weights. We need some definitions:
Definition 11. We consider triplets (Dn, φn,Λn) where Dn are disks Dn = D(0, rn) ⊂ D,
φn are subharmonic functions defined in a neighborhood of Dn and Λn is a finite collection
of points in Dn. We say that (Dn, φn,Λn) converges weakly to (D, φ,Λ) (where D is the
unit disk, φ a subharmonic function in D and Λ a discrete sequence in D) if the following
conditions are fullfilled:
• The domains Dn tend to D, i.e.: rn → 1,
• The weights φn tend to the weight φ in the sense that ∆φn as measures converges
weakly to ∆φ.
• The sequences Λn converge weakly to Λ, i.e, the measure
δλ converges weakly
to the measure
λ∈Λ δλ.
Let us fix a point p ∈ S. If a sequence of points zn ∈ S goes to ∞, i.e. d(zn, p) → ∞,
from a point n0 on it will eventually belong to the union of the funnels A1 ∪ · · · ∪ An. If
we take the set of points Dn = {z ∈ S; d(z, zn) < d(zn, p)/2} then Dn is an hyperbolic
disk contained in the funnels if n is big enough. In each of the Dn we consider the function
φn = φ|Dn and Λn = Λ ∩ Dn. Thus for any sequence of points zn with d(p, zn) → ∞ we
build a triplet (Dn, φn,Λn) for n big enough.
Definition 12. Let W (S, φ,Λ) be the set of all triplets (D, φ∗,Λ∗) which are weak limits
of triplets (Dn, φn,Λn) associated to any sequence zn such that d(p, zn) → ∞.
The theorem of Beurling on our context is
10 JOAQUIM ORTEGA-CERDÀ
Theorem 13. Let S be Riemann surface of finite type and let φ be a subharmonic function
with bounded Laplacian. A separated sequence Λ is sampling for Aφ(S) if and only if
• The sequence Λ is a uniqueness set for Aφ(S)
• For any triplet (D, φ∗,Λ∗) ∈ W (S, φ,Λ), the sequence Λ∗ is a uniqueness set for
Aφ∗(D).
Proof. Let us prove that the uniqueness conditions imply that Λ is a sampling sequence. If it
were not, there would be a sequence of functions fn ∈ Aφ(S) such that supΛ |fn|e
−φ ≤ 1/n
and supS |fn|e
−φ = 1. Take a sequence of points zn with |fn(zn)|e
−φ ≥ 1/2. If zn are
bounded we can take a subsequence of points that we still denote zn convergent to z
∗ ∈ S
and by a normal family argument there is a partial of fn convergent to f ∈ Aφ, such that
f |Λ ≡ 0, f(z
∗) 6= 0 and this is not possible. Thus zn must be unbounded. Then we take
the triplets (Dn, φn,Λn) associated to zn and Dn → D because zn → ∞ and the hyperbolic
radius of Dn is d(zn, p)/2. Since φn has bounded Laplacian, the mass of ∆φn restricted to
any compact K in D is bounded, thus we can take a subsequence that converges weakly to
a positive measure µ in D which satisfies (1−|z|)2µ ≃ 1 because all the mesures ∆φn satisfy
this inequalities with uniform constants. Let φ be such that ∆φ∗ = µ. Since Λn = Dn ∩ Λ
are all separated with uniform bound, there is a weak limit Λ∗. The functions fn in the
disks can be modified by a factor egn in such a way that hn = fne
gn satisfies hn(0) = 1
and |hn| ≤ e
φn+Re(gn), if n big enough and supΛn |hn|e
−φn+Re(gn) ≤ 1/n. We can add an
harmonic function v to φ∗ in such a way that φn+Re(gn) → v+ φ
∗ uniformly on compact
sets. Thus hn has a partial convergent to h ∈ Aφ∗ , h(0) = 1 and h|Λ∗ ≡ 0 which was not
possible by assumption.
In the other direction, we assume that Λ is a sampling sequence forAφ(S), and (D, φ
∗,Λ∗) ∈
W (S, φ,Λ). We want to prove that any f ∈ A∗φ(D) that vanishes in Λ
∗ is identically 0.
Take a sequence of points zn that escapes to infinity and (Dn, φn,Λn) the associated triple
that converges weakly to (D, φ∗,Λ∗). As φn → φ
∗ and Λn → Λ
∗ uniformly on compact
sets we can take a sequence of radii sn such that d(Λ ∩ D(zn, sn),Λ
∗ ∩ D(0, sn)) < 1/n,
D(zn, sn) ⊂ D(zn, rn) and |φn − φ
∗| ≤ 1/n. If f vanishes in Λ∗ that means that f is very
small in D(zn, sn)∩Λ. Assume that f(0) = 1. Take a cutoff function χn such that χn ≡ 0
outside D(zn, sn), χ(zn) = 1, and 〈dχ〉 < εn. Define gn = fχn − un, where ∂̄u = f∂̄χn is
the solution estimates by Theorem 6. Clearly gn is small in all points of Σ and it has at
least norm 1. Thus we are contradicting the fact that Λ is sampling.
Observe that one particular instance of finite Riemann surface, where we can apply the
result are the disks Di associated to the funnels with the metric φi. The final piece for the
proof of Theorem 9 is then
Lemma 14. If S is a finite Riemann surface, φ a subharmonic function with bounded
Laplacian and Λ is a uniformly separated sequence, then all possible weak limits coincide
with the weak limits of the disks associated to the surface, i.e,
W (S, φ,Λ) =W (D1, φ1,Λ1) ∪ · · · ∪W (Dn, φn,Λn).
INTERPOLATING AND SAMPLING SEQUENCES IN FINITE RIEMANN SURFACES 11
Proof. The proof amounts to the observation that the metric in Di converges uniformly to
the metric in S as z → ∂Di, and in the definition of weak limits we only consider uniform
convergence over compacts. �
Theorem 9 follows now immediately from Theorem 13 and Lemmas 10 and 14. �
Now Theorem 9 and Theorem 7 show that the property of being a sampling/interpolating
sequence are determined by the behavior near the boundary, more precisely in the associ-
ated disks. In these disks there is a precise description of the interpolating and sampling
sequences (see [BOC95] and [OCS98]) that can be transported to the surface. If we rewrite
it we get the density conditions of Theorem 1, but the disks are not hyperbolic disks on
the surface, they correspond to hyperbolic disks in disks Di, but since the condition is
only relevant near the boundary, then the disks in both metrics look more an more sim-
ilar. Moreover the difference between the corresponding Green functions converge to 0
uniformly as we go to the boundary. Finally, as the sequence is uniformly discrete and the
Laplacian of the weight is bounded above and bellow, the small difference is absorbed by
the fact that the inequalities are strict and this proves Theorem 1. In fact it is possible to
replace in the definition of the density, (1) the Green function gr of D(z, r) by the Green
function g of S, because as before supw∈D(z,r) |gr(z, w)− g(z, w)| → 0 as z approaches the
boundary.
4. Some Lp-variants
We have considered up to now pointwise growth restrictions. It is possible to obtain from
our Theorem other results in different Banach spaces of holomorphic functions. Consider
for instance the weighted Bergman spaces
φ(S) = {f ∈ H(S);
|f |pe−φ dA < +∞},
where dA is s the hyperbolic area measure in S and p ∈ [1,∞). The natural problem in
this context is the following:
Definition 15. Let S be a finite Riemann surface, and let φ be a subharmonic function
with bounded Laplacian bigger than one, i.e., 1 + ε < ∆φ < M .
• A sequence Λ ⊂ S is interpolating for A
φ(S) if for any values vλ such that
pe−φ(λ) <∞
there is a function f ∈ A
φ(S) such that f(λ) = vλ.
The spaces A
φ can be empty if we only ask φ to be with positive bounded Laplacian. It is
then natural to require that the Laplacian is strictly bigger than one so that the Laplacian
plus the curvature of the metric in the manifold is strictly positive and there are functions
in the space (consider the case of the disk S = D for instance).
Let φ0 be a subharmonic function in S such that ∆φ0 = 1. The corresponding theorem
will be
12 JOAQUIM ORTEGA-CERDÀ
Theorem 16. Let S be a finite Riemann surface, and let φ be a subharmonic function
with bounded Laplacian strictly bigger than one. Let p ∈ [1,+∞) and Λ be a separated
sequence.
• The sequence Λ is interpolating for A
φ(S) if and only if D
(φ−φ0)
(Λ) < 1/p.
In the case of the unit disk dA(z) = (1 − |z|)−2 this description is well-known, see for
instance [Sei98, Thm 2,3].
Proof. The proof of the theorem is the same mutatis-mutandi as in the L∞ setting. The
basic tool that allows us to glue the pieces together is the next theorem which is the
generalization of Theorem 6 and it is proved in the same way:
Theorem 17. Let S be a finite Riemann surface, let φ be a subharmonic function with a
bounded Laplacian strictly bigger than one and let p ∈ [1,∞). There is a constant C =
C(p, S) > 0 such that for any (0, 1)-form ω on S there is a solution u to the inhomogeneous
Cauchy-Riemann equation ∂̄u = ω in S with the estimate
|u(z)|pe−φ(z)dA(z) ≤ C
〈ω(z)〉pe−φ(z)dA(z),
whenever the right hand is finite.
The proof of this result is again the same as in Theorem 6. We can separately solve the
C-R equation in each funnel using Theorem 2 from [OC02]. We glue them together with
a C-R equation with data that has compact support that can be solved with the operator
(3). �
References
[BOC95] Bo Berndtsson and Joaquim Ortega Cerdà, On interpolation and sampling in Hilbert spaces
of analytic functions, J. Reine Angew. Math. 464 (1995), 109–128. MR 96g:30070
[Beu89] Arne Beurling, The collected works of Arne Beurling. Vol. 1, Contemporary Mathematicians,
Birkhäuser Boston Inc., Boston, MA, 1989, Complex analysis, Edited by L. Carleson, P.
Malliavin, J. Neuberger and J. Wermer. MR 92k:01046a
[Bus92] Peter Buser, Geometry and spectra of compact Riemann surfaces, Progress in Mathematics,
vol. 106, Birkhäuser Boston Inc., Boston, MA, 1992. MR 93g:58149
[Dil95] Jeffrey Diller, A canonical ∂ problem for bordered Riemann surfaces, Indiana Univ. Math. J.
44 (1995), no. 3, 747–763. MR 96m:30063
[Dil01] , Green’s functions, electric networks, and the geometry of hyperbolic Riemann surfaces,
Illinois J. Math. 45 (2001), no. 2, 453–485. MR 2003j:30064
[DPRS87] Jozef Dodziuk, Thea Pignataro, Burton Randol, and Dennis Sullivan, Estimating small eigen-
values of Riemann surfaces, The legacy of Sonya Kovalevskaya (Cambridge, Mass., and
Amherst, Mass., 1985), Contemp. Math., vol. 64, Amer. Math. Soc., Providence, RI, 1987,
pp. 93–121. MR 88h:58119
[MMOC03] N. Marco, X. Massaneda, and J. Ortega-Cerdà, Interpolating and sampling sequences for entire
functions, Geom. Funct. Anal. 13 (2003), no. 4, 862–914. MR 2004j:30073
[OC02] Joaquim Ortega-Cerdà, Multipliers and weighted ∂-estimates, Rev. Mat. Iberoamericana 18
(2002), no. 2, 355–377. MR 2003j:32046
[OCS98] Joaquim Ortega-Cerdà and Kristian Seip, Beurling-type density theorems for weighted Lp
spaces of entire functions, J. Anal. Math. 75 (1998), 247–266. MR 2000k:46030
INTERPOLATING AND SAMPLING SEQUENCES IN FINITE RIEMANN SURFACES 13
[Sch78] Stephen Scheinberg, Uniform approximation by functions analytic on a Riemann surface, Ann.
of Math. (2) 108 (1978), no. 2, 257–298. MR 58 #17111
[Sei95] Kristian Seip, On Korenblum’s density condition for the zero sequences of A−α, J. Anal. Math.
67 (1995), 307–322. MR 97c:30044
[Sei98] , Developments from nonharmonic Fourier series, Proceedings of the International
Congress of Mathematicians, Vol. II (Berlin, 1998), no. Extra Vol. II, 1998, pp. 713–722
(electronic). MR 99h:42023
[SS54] Menahem Schiffer and Donald C. Spencer, Functionals of finite Riemann surfaces, Princeton
University Press, Princeton, N. J., 1954. MR 16,461g
[Sto65] E. L. Stout, Bounded holomorphic functions on finite Riemann surfaces, Trans. Amer. Math.
Soc. 120 (1965), 255–285. MR 32#1358
[SV04] A. Schuster and D. Varolin, Interpolation and Sampling for generalized Bergman spaces on
finite Riemann Surfaces., Preprint, 2004.
[Wer64] John Wermer, Analytic disks in maximal ideal spaces, Amer. J. Math. 86 (1964), 161–170.
MR 28 #5355
Dept. Matemàtica Aplicada i Anàlisi, Universitat de Barcelona, Gran Via 585, 08071
Barcelona, Spain
E-mail address : jortega@ub.edu
ABSTRACT
  We provide a description of the interpolating and sampling sequences on a
space of holomorphic functions with a uniform growth restriction defined on
finite Riemann surfaces.

<|endoftext|><|startoftext|>
Introduction
As elements of perturbative expansions of Quantum field theories, Feynman
graphs have been playing and still play a key role both for our conceptual
understanding and for state-of-the-art computations in particle physics. This
http://arxiv.org/abs/0704.0232v2
article is concerned with several aspects of Feynman graphs: First, the com-
binatorics of perturbative renormalization give rise to Hopf algebras of rooted
trees and Feynman graphs. These Hopf algebras come with a cohomology the-
ory and structure maps that help understand important physical notions, such
as locality of counterterms, the beta function, certain symmetries, or Dyson-
Schwinger equations from a unified mathematical point of view. This point of
view is about self-similarity and recursion. The atomic (primitive) elements in
this combinatorial approach are divergent graphs without subdivergences. They
must be studied by additional means, be it analytic methods or algebraic geom-
etry and number theory, and this is a significantly more difficult task. However,
the Hopf algebra structure of graphs for renormalization is in this sense a sub-
structure of the Hopf algebra structure underlying the relative cohomology of
graph hypersurfaces needed to understand the number-theoretic properties of
field theory amplitudes [6, 5].
2 Lie and Hopf algebras of Feynman graphs
Given a Feynman graph Γ with several divergent subgraphs, the Bogoliubov
recursion and Zimmermann’s forest formula tell how Γ must be renormalized in
order to obtain a finite conceptual result, using only local counterterms. This
has an analytic (regularization/extension of distributions) and a combinatorial
aspect. The basic combinatorial question of perturbative renormalization is to
find a good model which describes disentanglement of graphs into subdiver-
gent pieces, or dually insertion of divergent pieces one into each other, from the
point of view of renormalized Feynman rules. It has been known now for several
years that commutative Hopf algebras and (dual) Lie algebras provide such a
framework [26, 14, 15] with many ramifications in pure mathematics. From the
physical side, it is important to know that, for example, recovering aspects of
gauge/BRST symmetry [39, 37, 30, 38] and the transition to nonperturbative
equations of motion [12, 28, 29, 36, 3, 35, 32, 34, 4] are conveniently possible in
this framework, as will be discussed in subsequent sections.
In order to introduce these Lie and Hopf algebras, let us now fix a renormaliz-
able quantum field theory (in the sense of perturbation theory), given by a local
Lagrangian. A convenient first example is massless φ3 theory in 6 dimensions.
We look at its perturbative expansion in terms of 1PI Feynman graphs. Each
1PI graph Γ comes with two integers, |Γ| = rankH1(Γ), its number of loops,
and sdd(Γ), its superficial degree of divergence. As usual, vacuum and tadpole
graphs need not be considered, and the only remaining superficial divergent
graphs have exactly two or three external edges, a feature of renormalizability.
Graphs without subdivergences are called primitive. Here are two examples.
Both are superficially divergent as they have three external edges. The first
one has two subdivergences, the second one is primitive. Note that there are
infinitely many primitive graphs with three external edges. In particular, for
every n ∈ N one finds a primitive Γ such that |Γ| = n.
Let now L be the Q-vector space generated by all the superficially divergent
(sdd ≥ 0) 1PI graphs of our theory, graded by the number of loops | · |. There
is an operation on L given by insertion of graphs into each other: Let γ1, γ2 be
two generators of L. Then
γ1 ⋆ γ2 :=
n(γ1, γ2,Γ)
where n(γ1, γ2,Γ) is the number of times that γ1 shows up as a subgraph of Γ
and Γ/γ1 ∼= γ2. Here are two examples:
⋆ = + +
⋆ = 2
This definition is extended bilinearly onto all of L. Note that ⋆ respects the
grading as |γ1 ⋆ γ2| = |γ1| + |γ2|. The operation ⋆ is not in general associative.
Indeed, it is pre-Lie [14, 17]:
(γ1 ⋆ γ2) ⋆ γ3 − γ1 ⋆ (γ2 ⋆ γ3) = (γ1 ⋆ γ3) ⋆ γ2 − γ1 ⋆ (γ3 ⋆ γ3). (1)
To see that (1) holds observe that on both sides nested insertions cancel. What
remains are disjoint insertions of γ2 and γ3 into γ1 which do obviously not
depend on the order of γ2 and γ3. One defines a Lie bracket on L :
[γ1, γ2] := γ1 ⋆ γ2 − γ2 ⋆ γ1.
The Jacobi identity for [·, ·] is satisfied as a consequence of the pre-Lie property
(1) of ⋆. This makes L a graded Lie algebra. The bracket is defined by mutual
insertions of graphs. As usual, U(L), the universal envelopping algebra of L is a
cocommutative Hopf algebra. Its graded dual, in the sense of Milnor-Moore, is
therefore a commutative Hopf algebra H. As an algebra, H is free commutative,
generated by the vector space L and an adjoined unit I. By duality, one expects
the coproduct ofH to disentangle its argument into subdivergent pieces. Indeed,
one finds
∆(Γ) = I⊗ Γ + Γ⊗ I+
γ ⊗ Γ/γ. (2)
The relation γ ( Γ refers to disjoint unions γ of 1PI superficially divergent
subgraphs of Γ.Disjoint unions of graphs are in turn identified with their product
in H. For example,
= I⊗ + ⊗ I+ 2 ⊗ .
The coproduct respects the grading by the loop number, as does the product (by
definition). Therefore H =
n=0 Hn is a graded Hopf algebra. Since H0
it is connected. The counit ǫ vanishes on the subspace
n=1 Hn, called aug-
mentation ideal, and ǫ(I) = 1. As usual, if ∆(x) = I⊗ x+ x⊗ I, the element x
is called primitive. The linear subspace of primitive elements is denoted PrimH.
The interest in H and L arises from the fact that the Bogoliubov recursion
is essentially solved by the antipode of H. In any connected graded bialgebra,
the antipode S is given by
S(x) = −x−
S(x′)x′′, x /∈ H0 (3)
in Sweedler’s notation. Let now V be a C-algebra. The space of linear maps
LQ(H, V ) is equipped with a convolution product (f, g) 7→ f ∗ g = mV (f ⊗
g)∆ where mV is the product in V. Relevant examples for V are suggested by
regularization schemes such as the algebra V = C[[ǫ, ǫ−1] of Laurent series with
finite pole part for dimensional regularization (space-time dimensionD = 6+2ǫ.)
The (unrenormalized) Feynman rules provide then an algebra homomorphism
φ : H → V mapping Feynman graphs to Feynman integrals in 6+2ǫ dimensions.
On V there is a linear endomorphism R (renormalization scheme) defined, for
example minimal subtraction R(ǫn) = 0 if n ≥ 0, R(ǫn) = ǫn if n < 0. If Γ
is primitive, as defined above, then φ(Γ) has only a simple pole in ǫ, hence
(1−R)φ(Γ) is a good renormalized value for Γ. If Γ does have subdivergences,
the situation is more complicated. However, the map S
R : H → V
R(Γ) = −R
φ(Γ)−
′)φ(Γ′′)
provides the counterterm prescribed by the Bogoliubov recursion, and (S
φ)(Γ) yields the renormalized value of Γ. The map S
R is a recursive deforma-
tion of φ ◦ S by R, compare its definition with (3). These are results obtained
by one of the authors in collaboration with Connes [26, 14, 15].
For S
R to be an algebra homomorphism again, one requires R to be a Rota-
Baxter operator, studied in a more general setting by Ebrahimi-Fard, Guo and
one of the authors in [20, 22, 21]. The Rota-Baxter property is at the algebraic
origin of the Birkhoff decomposition introduced in [15, 16]. In the presence of
mass terms, or gauge symmetries etc. in the Lagrangian, φ, S
R and S
R ⋆φ may
contribute to several form factors in the usual way. This can be resolved by
considering a slight extension of the Hopf algebra containing projections onto
single structure functions, as discussed for example in [15, 32]. For the case of
gauge theories, a precise definition of the coefficients n(γ1, γ2,Γ) is given in [30].
The Hopf algebra H arises from the simple insertion of graphs into each other
in a completely canonical way. Indeed, the pre-Lie product determines the co-
product, and the coproduct determines the antipode. Like this, each quantum
field theory gives rise to such a Hopf algebra H based on its 1PI graphs. It is
no surprise then that there is an even more universal Hopf algebra behind all
of them: The Hopf algebra Hrt of rooted trees [26, 14]. In order to see this,
imagine a purely nested situation of subdivergences like
which can be represented by the rooted tree
To account for each single graph of this kind, the tree’s vertices should actually
be labeled according to which primitive graph they correspond to (plus some
gluing data) which we will suppress for the sake of simplicity. The coproduct
on Hrt – corresponding to the one (2) of H – is
∆(τ) = I⊗ τ + τ ⊗ I+
adm.c
Pc(τ) ⊗Rc(τ)
where the sum runs over all admissible cuts of the tree τ. A cut of τ is a
nonempty subset of its edges which are to be removed. A cut c(τ) is defined
to be admissible, if for each leaf l of τ at most one edge on the path from l to
the root is cut. The product of subtrees which fall down when those edges are
removed is denoted Pc(τ). The part which remains connected with the root is
denoted Rc(τ). Here is an example:
⊗ I+ I⊗
+ 2 • ⊗
+ • • ⊗
✁ ⊗ •.
Compared to Hrt, the advantage of H is however that overlapping divergences
are resolved automatically. To achieve this in Hrt requires some care [27].
3 From Hochschild cohomology to physics
There is a natural cohomology theory on H and Hrt whose non-exact 1-cocycles
play an important ”operadic” role in the sense that they drive the recursion
that define the full 1PI Green’s functions in terms of primitve graphs. In order
to introduce this cohomology theory, let A be any bialgebra. We view A as
a bicomodule over itself with right coaction (id ⊗ ǫ)∆. Then the Hochschild
cohomology of A (with respect to the coalgebra part) is defined as follows [14]:
Linear maps L : A → A⊗n are considered as n-cochains. The operator b, defined
bL := (id⊗ L)∆ +
(−1)i∆iL+ (−1)
n+1L⊗ I (4)
furnishes a codifferential: b2 = 0. Here ∆ denotes the coproduct of A and ∆i
the coproduct applied to the i-th factor in A⊗n. The map L ⊗ I is given by
x 7→ L(x) ⊗ I. Clearly this codifferential encodes only information about the
coalgebra (as opposed to the algebra) part of A. The resulting cohomology is
denoted HH•ǫ (A). For n = 1, the cocycle condition bL = 0 is simply
∆L = (id⊗ L)∆ + L⊗ I (5)
for L a linear endomorphism of A. In the Hopf algebraHrt of rooted trees (where
things are often simpler), a 1-cocycle is quickly found: the grafting operator B+,
defined by
B+(I) = •
B+(τ1 . . . τn) =
τ1 . . . τn
for trees τi
joining all the roots of its argument to a newly created root. Clearly, B+ reminds
of an operad multiplication. It is easily seen that B+ is not exact and therefore
a generator (among others) of HH1ǫ (Hrt). Foissy [23] showed that L 7→ L(I) is an
onto map HH1ǫ(Hrt) → PrimHrt. The higher Hochschild cohomology (n ≥ 2)
of Hrt is known to vanish [23]. The pair (Hrt, B+) is the universal model for all
Hopf algebras of Feynman graphs and their 1-cocycles [14]. Let us now turn to
those 1-cocycles of H. Clearly, every primitve graph γ gives rise to a 1-cocycle
+ defined as the operator which inserts its argument, a product of graphs, into
γ in all possible ways. Here is a simple example:
See [30] for the general definition involving some combinatorics of insertion
places and symmetries.
It is an important consequence of the B
+ satisfying the cocycle condition (5)
R ∗ φ)B+ = (1−R)B̃+(S
R ∗ φ) (6)
where B̃+ is the push-forward of B+ along the Feynman rules φ. In other words,
+ is the integral operator corresponding to the skeleton graph γ. This is the
combinatorial key to the proof of locality of counterterms and finiteness of renor-
malization [13, 28, 2, 3]. Indeed, equation (6) says that after treating all subdi-
vergences, an overall subtraction (1−R) suffices. The only analytic ingredient is
Weinberg’s theorem applied to the primitive graphs. In [2] it is emphasized that
H is actually generated (and determined) by the action of prescribed 1-cocycles
and the multiplication. A version of (6) with decorated trees is available which
describes renormalization in coordinate space [2].
The 1-cocycles B
+ give rise to a number of useful Hopf subalgebras of H. Many
of them are isomorphic. They are studied in [3] on the model of decorated
rooted trees, and we will come back to them in the next section. In [30] one of
the authors showed that in nonabelian gauge theories, the existence of a certain
Hopf subalgebra, generated by 1-cocycles, is closely related to the Slavnov-
Taylor identities for the couplings to hold. In a similar spirit, van Suijlekom
showed that, in QED, Ward-Takahashi identities, and in nonabelian Yang-Mills
theories, the Slavnov-Taylor identities for the couplings generate Hopf ideals I
of H such that the quotients H/I are defined and the Feynman rules factor
through them [37, 38]. The Hopf algebra H for QED had been studied before
in [11, 33, 39].
4 Dyson-Schwinger equations
The ultimate application of the Hochschild 1-cocycles introduced in the previous
section aims at non-perturbative results. Dyson-Schwinger equations, reorga-
nized using the correspondence PrimH → HH1ǫ (H), become recursive equations
inH[[α]], α the coupling constant, with contributions from (degree 1) 1-cocycles.
The Feynman rules connect them to the usual integral kernel representation. We
remain in the massless φ3 theory in 6 dimensions for the moment. Let Γ� be
the full 1PI vertex function,
Γ� = I+
res Γ=�
(normalized such that the tree level contribution equals 1). This is a formal
power series in α with values in H. Here res Γ is the result of collapsing all
internal lines of Γ. The graph res Γ is called the residue of Γ. In a renormalizable
theory, res can be seen as a map from the set of generators of H to the terms
in the Lagrangian. For instance, in the φ3 theory, vertex graphs have residue
�, and self energy graphs have residue −. The number SymΓ denotes the order
of the group of automorphisms of Γ, defined in detail for example in [30, 38].
Similarly, the full inverse propagator Γ− is represented by
Γ− = I−
res Γ=−
. (8)
These series can be reorganized by summing only over primitive graphs, with all
possible insertions into these primitive graphs. In H, the insertions are afforded
by the corresponding Hochschild 1-cocycles. Indeed,
Γ� = I+
γ∈PrimH,res γ=�
α|γ|B
�Q|γ|)
Sym γ
Γ− = I−
γ∈PrimH,res γ=−
α|γ|B
−Q|γ|)
Sym γ
. (9)
The universal invariant charge Q is a monomial in the Γr and their inverses,
where r are residues (terms in the Lagrangian) provided by the theory. In φ3
theory we have Q = (Γ�)2(Γ−)−3. In φ3 theory, the universality of Q (i. e. the
fact that the same Q is good for all Dyson-Schwinger equations of the theory)
comes from a simple topological argument. In nonabelian gauge theories how-
ever, the universality of Q takes care that the solution of the corresponding
system of coupled Dyson-Schwinger equations gives rise to a Hopf subalgebra
and therefore amounts to the Slavnov-Taylor identities for the couplings [30].
The system (9) of coupled Dyson-Schwinger equations has (7,8) as its solution.
Note that in the first equation of (9) an infinite number of cocycles contributes
as there are infinitely many primitive vertex graphs in φ36 theory – the second
equation has only finitely many contributions – here one. Before we describe
how to actually attempt to solve equations of this kind analytically (application
of the Feynman rules φ), we discuss the combinatorial ramifications of this con-
struction in the Hopf algebra. It makes sense to call all (systems of) recursive
equations of the form
X1 = I±
. . .
Xs = I±
combinatorial Dyson-Schwinger equations, and to study their combinatorics.
Here, the Bdn+ are non-exact Hochschild 1-cocycles and the Mn are monomials in
the X1 . . .Xs. In [3] we studied a large class of single (uncoupled) combinatorial
Dyson-Schwinger equations in a decorated version of Hrt as a model for vertex
insertions:
X = I+
αnwnB
where the wn ∈ Q. For example, X = I+αB+(X
2)+α2B+(X
3) is in this class.
It turns out [28, 3] that the coefficients cn of X, defined by X =
n=0 α
generate a Hopf subalgebra themselves:
∆(cn) =
Pnk ⊗ ck.
The Pnk are homogeneous polynomials of degree n−k in the cl, l ≤ n. These poly-
nomials have been worked out explicitly in [3]. One notices in particular that
the Pnk are independent of the wn and B
+ , and hence that under mild assump-
tions (on the algebraic independence of the cn) the Hopf subalgebras generated
this way are actually isomorphic. For example, X = I+αB+(X
2) +α2B+(X
and X = I + αB+(X
2) yield isomorphic Hopf subalgebras. This is an aspect
of the fact that truncation of Dyson-Schwinger equations – considering only a
finite instead of an infinite number of contributing cocycles – does make (at
least combinatorial) sense. Indeed, the combinatorics remain invariant. Similar
results hold for Dyson-Schwinger equations in the true Hopf algebra of graphs
H where things are a bit more difficult though as the cocycles there involve
some bookkeeping of insertion places.
The simplest nontrivial Dyson-Schwinger equation one can think of is the linear
X = I+ αB+(X).
Its solution is given by X =
n=0 α
n(B+)
n(I). In this case X is grouplike
and the corresponding Hopf subalgebra of cns is cocommutative [25]. A typical
and important non-linear Dyson-Schwinger equation arises from propagator
insertions:
X = I− αB+(1/X),
for example the massless fermion propagator in Yukawa theory where only the
fermion line obtains radiative corrections (other corrections are ignored). This
problem has been studied and solved by Broadhurst and one of the authors
in [12] and revisited recently by one of the authors and Yeats [35]. As we now
turn to the analytic aspects of Dyson-Schwinger equations, we briefly sketch the
general approach presented in [35] on how to successfully treat the nonlinearity
of Dyson-Schwinger equations. Indeed, the linear Dyson-Schwinger equations
can be solved by a simple scaling ansatz [25]. In any case, let γ be a primitive
graph. The following works for amplitudes which depend on a single scale, so
let us assume a massless situation with only one non-zero external momentum –
how more than one external momentum (vertex insertions) are incorporated by
enlarging the set of primitive elements is sketched in [32]. The grafting operator
+ associated to γ translates to an integral operator under the (renormalized)
Feynman rules
+)(I)(p
2/µ2) =
(Iγ(k, p)− Iγ(k, µ))dk
where Iγ is the integral kernel corresponding to γ, the internal momenta are
denoted by k, the external momentum by p, and µ is the fixed momentum at
which we subtract: R(x) = x|p2=µ2 .
In the following we stick to the special case discussed in [35] where only one
internal edge is allowed to receive corrections. The integral kernel φ(B
+) defines
a Mellin transform
F (ρ) =
Iγ(k, µ)(k
where ki is the momentum of the internal edge of γ at which insertions may
take place (here the fermion line). If there are several insertion sites, obvious
multiple Mellin transforms become necessary. The case of two (propagator) in-
sertion places has been studied, at the same example, in [35].
The function F (ρ) has a simple pole in ρ at 0. We write
F (ρ) =
We denote L = log p2/µ2. Clearly φR(X) = 1 +
n γnL
n. An important result
of [35] is that, even in the difficult nonlinear situation, the anomalous dimension
γ1 is implicitly defined by the residue r and Taylor coefficients fn of the Mellin
transform F. On the other hand, all the γn for n ≥ 2, are recursively defined
in terms of the γi, i < n. This last statement amounts to a renormalization
group argument that is afforded in the Hopf algebra by the scattering formula
of [16]. Curiously, for this argument only a linearized part of the coproduct is
needed. We refer to [35] for the actual algorithm. For a linear Dyson-Schwinger
equation, the situation is considerably simpler as the γn = 0 for n ≥ 2 since X
is grouplike [25].
Let us restate the results for the high energy sector of non-linear Dyson-Schwinger
equations [12, 35]: Primitive graphs γ define Mellin transforms via their integral
kernels B̃
+. The anomalous dimension γ1 is implicitly determined order by order
from the coefficients of those Mellin transforms. All non-leading log coefficients
γn are recursively determined by γ1, thanks to the renormalization group. This
reduces, in principle, the problem to a study of all the primitive graphs and the
intricacies of insertion places.
Finding useful representations of those Mellin transforms – even one-dimensional
ones – of higher loop order skeleton graphs is difficult. However, the two-loop
primitive vertex in massless Yukawa theory has been worked out by Bierenbaum,
Weinzierl and one of the authors in [4], a result that can be applied to other
theories as well. Combined with the algebraic treatment [12, 3, 35] sketched
in the previous paragraphs and new geometric insight on primitive graphs (see
section 5), there is reasonable hope that actual solutions of Dyson-Schwinger
equations will be more accessible in the future.
Using the Dyson-Schwinger analysis, one of the authors and Yeats [34] were
able to deduce a bound for the convergence of superficially divergent ampli-
tudes/structure functions from the (desirable) existence of a bound for the su-
perficially convergent amplitudes.
5 Feynman integrals and periods of mixed (Tate)
Hodge structures
A primitive graph Γ ∈ PrimH defines a real number rΓ, called the residue of
Γ, which is independent of the renormalization scheme. In the case that Γ is
massless and has one external momentum p, the residue rΓ is the coefficient
of log p2/µ2 in φR(Γ) = (1 − R)φ(Γ). It coincides with the coefficient r of the
Mellin transform introduced in the previous section. One may ask what kind of
a number r is, for example if it is rational or algebraic. The origin of this ques-
tion is that the irrational or transcendental numbers that show up for various
Γ strongly suggest a motivic interpretation of the rΓ. Indeed, explicit calcula-
tions [9, 10, 8] display patterns of Riemann zeta and multiple zeta values that
are known to be periods of mixed Tate Hodge structures – here the periods
are provided by the Feynman rules which produce Γ 7→ rΓ. By disproving a
related conjecture of Kontsevich, Belkale and Brosnan [1] have shown that not
all these Feynman motives must be mixed Tate, so one may expect a larger class
of Feynman periods than multiple zeta values. Our detailed understanding of
these phenomena is still far from complete, and only some very first steps have
been made in the last few years. However, techniques developed in recent work
by Bloch, Esnault and one of the authors [7] do permit reasonable insight for
some special cases which we briefly sketch in the following.
Let Γ be a logarithmically divergent massless primitive graph with one ex-
ternal momentum p. It is convenient to work in the ”Schwinger” parametric
representation [24] obtained by the usual trick of replacing propagators
dae−ak
and performing the loop integrations (Gaussian integrals) first which leaves us
with a (divergent) integral over various Schwinger parameters a. It is a classical
exercise [24, 7, 6] to show that in four dimensions, up to some powers of i and
φ(Γ) =
da1 . . . dan
e−QΓ(a,p
2)/ΨΓ(a)
Ψ2Γ(a)
where n is the number of edges of Γ. QΓ and ΨΓ are graph polynomials of Γ,
where ΨΓ, sometimes called Symanzik or Kirchhoff polynomial, is defined as
follows: Let T (Γ) be the set of spanning trees of Γ, i. e. the set of connected
simply connected subgraphs which meet all vertices of Γ. We think of the edges
e of Γ as being numbered from 1 to n. Then
t∈T (Γ)
This is a homogeneous polynomial in the ai of degree |H1(Γ)|. It is easily seen
(scaling behaviour of QΓ and ΨΓ) that rΓ =
∂φR(Γ)
∂ log p2/µ2
is extracted from φ(Γ)
by considering the ai as homogeneous coordinates of P
n−1(R) and evaluating at
p2 = 0 :
σ⊂Pn−1(R)
where σ = {[a1, . . . , an] : all ai can be choosen ≥ 0} and Ω is a volume form
on Pn−1. Let XΓ := {ΨΓ = 0} ⊂ P
n−1. If |H1(Γ)| = 1, the integrand in (10) has
no poles. If |H1(Γ)| > 1, poles will show up on the union ∆ =
γ(Γ,H1(γ) 6=0
of coordinate linear spaces Lγ = {ae = 0 for e edge of γ} – these need to
be separated from the chain of integration by blowing up. The blowups being
understood, the Feynman motive is, by abuse of notation,
Hn−1(Pn−1 −XΓ,∆−∆ ∩XΓ)
with Feynman period given by (10). See [7, 6] for details. Some particularly
accessible examples are the wheel with n spokes graphs
Γn :=
studied extensively in [7]. The corresponding Feynman periods (10) yield ratio-
nal multiples of zeta values [9]
rΓn ∈ ζ(2n− 3)Q.
Due to the simple topology of the Γn, the geometry of the pairs (XΓn ,∆Γn) are
well understood and the corresponding motives have been worked out explicitly
[7]. The methods used are however nontrivial and not immediately applicable
to more general situations.
When confronted with non-primitive graphs, i. e. graphs with subdivergences,
there are more than one period to consider. In the Schwinger parameter picture,
subdivergences arise when poles appear along exceptional divisors as pieces of
∆ are blown up. This situation can be understood using limiting mixed Hodge
structures [6], see also [31, 36] for a toy model approach to the combinatorics
involved. In [6] it is also shown how the Hopf algebra H of graphs lifts to the
category of motives. For the motivic role of solutions of Dyson-Schwinger equa-
tions we refer to work in progress. Finally we mention that there is related work
by Connes and Marcolli [18, 19] who attack the problem via Riemann-Hilbert
correspondences and motivic Galois theory.
Acknowledgements. We thank Spencer Bloch and Karen Yeats for discus-
sion on the subject of this review. The first named author (C. B.) thanks the
organizers of the ICMP 2006 and the IHES for general support. His research is
supported by the Deutsche Forschungsgemeinschaft. The IHES, Boston Univer-
sity and the Erwin-Schrödinger-Institute are gratefully acknowledged for their
kind hospitality. At the time of writing this article, C. B. is visiting the ESI as
a Junior Research Fellow.
References
[1] P. Belkale and P. Brosnan. Matroids, motives, and a conjecture of Kontse-
vich. Duke Math. J. 116, (1):147–188, 2003. math.AG/0012198.
[2] C. Bergbauer and D. Kreimer. The Hopf algebra of rooted trees in Epstein–
Glaser renormalization. Ann. Henri Poincare, 6:343–367, 2004. hep-
th/0403207.
[3] C. Bergbauer and D. Kreimer. Hopf algebras in renormalization theory: Lo-
cality and Dyson-Schwinger equations from Hochschild cohomology. IRMA
Lect. Math. Theor. Phys., 10:133–164, 2006. hep-th/0506190.
[4] I. Bierenbaum, D. Kreimer, and S. Weinzierl. The next-to-ladder approxi-
mation for Dyson-Schwinger equations. Phys. Lett. , B646:129–133, 2007.
hep-th/0612180.
[5] S. Bloch. Mixed Hodge structures and motives in physics. Talk at the
conference on Motives and Algebraic Cycles, Fields Institute, March 2007.
[6] S. Bloch. Motives Associated to Graphs. Takagi Lectures, Kyoto, November
2006. Available online at http://www.math.uchicago.edu/ bloch/.
[7] S. Bloch, H. Esnault, and D. Kreimer. On Motives Associated
to Graph Polynomials. Commun. Math. Phys., 267:181–225, 2006.
math.AG/0510011.
[8] D. Broadhurst, J. Gracey, and D. Kreimer. Beyond the triangle and
uniqueness relations: Non-zeta counterterms at large N from positive knots.
Z. Phys., C75:559–574, 1997. hep-th/9607174.
[9] D. Broadhurst and D. Kreimer. Knots and numbers in Φ4 theory to 7 loops
and beyond. Int. J. Mod. Phys., C6:519–524, 1995. hep-ph/9504352.
[10] D. Broadhurst and D. Kreimer. Association of multiple zeta values with
positive knots via feynman diagrams up to 9 loops. Phys. Lett. , B393:403–
412, 1997. hep-th/9609128.
[11] D. Broadhurst and D. Kreimer. Renormalization automated by Hopf alge-
bra. J. Symb. Comput., 27:581, 1999. hep-th/9810087.
[12] D. Broadhurst and D. Kreimer. Exact solutions of Dyson-Schwinger
equations for iterated one-loop integrals and propagator-coupling duality.
Nucl. Phys. , B600:403–422, 2001. hep-th/0012146.
[13] J. Collins. Renormalization. Cambridge Monographs on Mathematical
Physics. Cambridge University Press, 1984.
[14] A. Connes and D. Kreimer. Hopf algebras, renormalization and non-
commutative geometry. Commun. Math. Phys., 199:203–242, 1998. hep-
th/9808042.
[15] A. Connes and D. Kreimer. Renormalization in quantum field theory and
the Riemann– Hilbert problem I: The Hopf algebra structure of graphs
and the main theorem. Comm. Math. Phys., 210:249–273, 2000. hep-
th/9912092.
[16] A. Connes and D. Kreimer. Renormalization in quantum field theory and
the Riemann-Hilbert problem II: The beta-function, diffeomorphisms and
the renormalization group. Commun. Math. Phys., 216:215–241, 2001. hep-
th/0003188.
[17] A. Connes and D. Kreimer. Insertion and elimination: The doubly infinite
Lie algebra of Feynman graphs. Ann. Henri Poincaré, 3:411–433, 2002.
hep-th/0201157.
[18] A. Connes and M. Marcolli. From Physics to Number Theory via Noncom-
mutative Geometry, Part II: Renormalization, the Riemann-Hilbert cor-
respondence, and motivic Galois theory. In Frontiers in Number Theory,
Physics, and Geometry II, pages 617–713. Springer, 2006. hep-th/0411114.
[19] A. Connes and M. Marcolli. Quantum fields and motives. J. Geom. Phys. ,
56(1):55–85, 2006. hep-th/0504085.
[20] K. Ebrahimi-Fard, L. Guo, and D. Kreimer. Integrable renormalization I:
The ladder case. J. Math. Phys., 45:3758–3769, 2004. hep-th/0402095.
[21] K. Ebrahimi-Fard, L. Guo, and D. Kreimer. Spitzer’s identity and the
algebraic Birkhoff decomposition in pQFT. J. Phys., A37:11037–11052,
2004. hep-th/0407082.
[22] K. Ebrahimi-Fard, L. Guo, and D. Kreimer. Integrable renormalization II:
The general case. Ann. Henri Poincaré, 6:369–395, 2005. hep-th/0403118.
[23] L. Foissy. Les algèbres de Hopf des arbres enracinés I-II. Bull. Sci. Math.,
126:193–239 and 249–288, 2002.
[24] C. Itzykson and J. -B. Zuber. Quantum field theory. McGraw–Hill, 1980.
[25] D. Kreimer. Étude for linear Dyson-Schwinger equations. IHES Preprint
P/06/23.
[26] D. Kreimer. On the Hopf algebra structure of perturbative quantum field
theories. Adv. Theor. Math. Phys., 2:303–334, 1998. q-alg/9707029.
[27] D. Kreimer. On overlapping divergences. Commun. Math. Phys., 204:669,
1999. hep-th/9810022.
[28] D. Kreimer. Factorization in quantum field theory: An exercise in Hopf
algebras and local singularities. 2003. Contributed to Les Houches School of
Physics: Frontiers in Number Theory, Physics and Geometry, Les Houches,
France, 9-21 Mar 2003. hep-th/0306020.
[29] D. Kreimer. What is the trouble with Dyson-Schwinger equations?
Nucl. Phys. Proc. Suppl., 135:238–242, 2004. hep-th/0407016.
[30] D. Kreimer. Anatomy of a gauge theory. Annals Phys. , 321:2757–2781,
2006.
[31] D. Kreimer. The residues of quantum field theory: Numbers we should
know. In C. Consani and M. Marcolli, editors, Noncommutative Geometry
and Number Theory (Bonn, 2003), pages 187–204. Vieweg, 2006. hep-
th/0404090.
[32] D. Kreimer. Dyson Schwinger equations: From Hopf algebras to number
theory. In I. Binder and D. Kreimer, editors, Universality and Renor-
malization, volume 50 of Fields Inst. Comm., pages 225–248. AMS, 2007.
hep-th/0609004.
[33] D. Kreimer and R. Delbourgo. Using the Hopf algebra structure of QFT
in calculations. Phys. Rev. , D60:105025, 1999. hep-th/9903249.
[34] D. Kreimer and K. Yeats. Recursion and growth estimates in renormalizable
quantum field theory. hep-th/0612179.
[35] D. Kreimer and K. Yeats. An etude in non-linear Dyson-Schwinger equa-
tions. Nucl. Phys. Proc. Suppl. , 160:116–121, 2006. hep-th/0605096.
[36] I. Mencattini and D. Kreimer. The structure of the Ladder Insertion-
Elimination Lie algebra. Commun. Math. Phys., 259:413–432, 2005. math-
ph/0408053.
[37] W. van Suijlekom. The Hopf algebra of Feynman graphs in QED.
Lett. Math. Phys. , 77:265–281, 2006. hep-th/0602126.
[38] W. van Suijlekom. Renormalization of gauge fields: A Hopf algebra ap-
proach. 2006. hep-th/0610137.
[39] D. Volovich, I. and Prokhorenko. Renormalizations in quantum electro-
dynamics, and Hopf algebras. Tr. Mat. Inst. Steklova, 245(Izbr. Vopr. p-
adich. Mat. Fiz. i Anal. ):288–295, 2004.
	Introduction
	Lie and Hopf algebras of Feynman graphs
	From Hochschild cohomology to physics
	Dyson-Schwinger equations
	Feynman integrals and periods of mixed (Tate) Hodge structures
ABSTRACT
  In this expository article we review recent advances in our understanding of
the combinatorial and algebraic structure of perturbation theory in terms of
Feynman graphs, and Dyson-Schwinger equations. Starting from Lie and Hopf
algebras of Feynman graphs, perturbative renormalization is rephrased
algebraically. The Hochschild cohomology of these Hopf algebras leads the way
to Slavnov-Taylor identities and Dyson-Schwinger equations. We discuss recent
progress in solving simple Dyson-Schwinger equations in the high energy sector
using the algebraic machinery. Finally there is a short account on a relation
to algebraic geometry and number theory: understanding Feynman integrals as
periods of mixed (Tate) motives.

<|endoftext|><|startoftext|>
Many-body interband tunneling as a witness for complex dynamics in the
Bose-Hubbard model
Andrea Tomadin,1 Riccardo Mannella,1 and Sandro Wimberger1,2
Dipartimento di Fisica, Unversità degli Studi di Pisa, Largo Pontecorvo 3, 56127 Pisa, Italy
CNISM, Dipartimento di Fisica del Politecnico, C. Duca degli Abruzzi 24, 10129 Torino, Italy
(Dated: November 4, 2018)
A perturbative model is studied for the tunneling of many-particle states from the ground band
to the first excited energy band, mimicking Landau-Zener decay for ultracold, spinless atoms in
quasi-one dimensional optical lattices subjected to a tunable tilting force. The distributions of the
computed tunneling rates provide an independent and experimentally accessible signature of the
regular-chaotic transition in the strongly correlated many-body dynamics of the ground band.
PACS numbers: 03.65.Xp,32.80.Pj,05.45.Mt,71.35.Lk
The experimental advances in atom and quantum op-
tics allow the experimentalist to directly study a plethora
of minimal models which have been developed to de-
scribe usually much more complex phenomena occurring
in solid states [1, 2, 3]. Bose-Einstein condensates loaded
into optical lattices, which perfectly realize spatially peri-
odic potentials, are used, e.g., to implement the Wannier-
Stark problem [4, 5, 6] as a paradigm of quantum trans-
port where atoms move in a tilted lattice. Up to now all
experiments on the Wannier-Stark system with ultracold
atoms have been performed in a regime where atom-atom
interactions are either negligible [4] or reduce to an effec-
tive mean-field description [5, 7]. State-of-the-art setups
are, however, capable to achieve small filling factors of
the order of one atom per lattice site [2]. Moreover, the
atom-atom interactions can be tuned by the transversal
confinement and by Feshbach resonances [3, 8], resulting
in strong interaction-induced correlations.
The regime of strong correlations in the Wannier-Stark
system was addressed in [9, 10], revealing the sensitive
dependence of the system’s dynamics on the Stark force
F . The single-band Bose-Hubbard model of [9, 10] is
defined by the following Hamiltonian with the creation
l,1, annihilation âl,1, and number operators n̂ l,1 for the
first band of a lattice l = 1 . . . L:
Fl n̂ l,1−
â l+1,1
† â l,1 + h.c.
n̂ l,1 ( n̂ l,1 − 1) .
A transition from a regular dynamical (dominated by F )
to a quantum chaotic regime (with comparable values of
J1, U1 , F ) was found [9, 10]. The transition was quan-
titatively studied using the distribution of the spacings
between next nearest eigenenergies of the Hamiltonian
(1). This analysis [9, 10] verifies that the normalized level
spacings s ≡ ∆E/∆E obey a Poisson (P(s) = exp(−s) )
and a Wigner-Dyson (WD: P(s) = sπ/2 exp(−πs2/4) )
distribution in the regular and chaotic case, respectively
[11]. P(s) and the cumulative distribution functions
(CDF: C(s) ≡
ds′ P(s′)) are shown for typical cases in
Fig. 1, where we scanned F to emphasize the crossover
0 1 2 3 4 5
1.8 2 2.2 2.4 2.6 2.8 3
(2π/F)
0 1 2 3 4 5
0 1 2 3 4 5
0 1 2 3 4 5
0 5 10 15 20 25
FIG. 1: (a,b) CDF (stairs) and P(s) (stairs in insets) for N =
5 atoms, L = 8, lattice depth V = 10 recoil energies (fixing
J1 = 0.038), U1 = 0.032, F ≃ 0.063 (a) and 0.021 (b), with
WD (solid) and Poisson distributions (dashed). (c) χ2 test
with values close to zero for good WD statistics. The dashed
line marks the transition to quantum chaos as F is tuned. (d)
variance of the number of levels in intervals of length dE (with
normalized mean spacing), for the cases of (a) (squares) and
(b) (circles), with the random matrix predictions for Poisson
(dashed) and WD (solid) [11].
between the regular and the chaotic regime. Statistical
tests are also shown which confirm the analysis of [9, 10]
in a more systematical manner [12].
As shown in [9], the strong correlations in the quantum
chaotic regime induce a fast and irreversible decay of the
Bloch oscillations, which otherwise would persist in the
ideal, non-interacting case. Therefore, the crossover be-
tween the two regimes discussed above could be measured
in experiments by observing just the mean momentum as
a function of time. Here we introduce a new, robust and
hence also experimentally accessible prediction for this
crossover. In the presence of strong interactions parame-
terized by U1 , the single-band model should be extended
to allow for interband transitions [13], as recently realized
http://arxiv.org/abs/0704.0233v1
at F = 0 in experiments with fermionic interacting atoms
[3]. Instead of using a numerically hardly tractable com-
plete many-bands model, we introduce a perturbative de-
cay of the many-particles modes in the ground band to
a second energy band. Our novel approach to study the
Landau-Zener-like tunneling between the first and the
second band [1, 5, 7, 14, 15] leads to predictions for the
expected decay rates and their statistical distributions.
As we will show, the latter are drastically affected by the
dynamics in the ground band, and they therefore provide
a measurable witness for the regular-chaotic transition.
We first derive the individual decay rates of the dom-
inating interband coupling channels. These decay rates
will serve to effectively open the single-band model (1)
for mimicking losses arising from the interband coupling.
Our analysis starts from the following “unperturbed”
Hamiltonian for the first two bands:
ε1 n̂ l,1 + ε2 n̂ l,2 − J22 ( â l+1,2
† â l,2 + h.c.)
+Fl ( n̂ l,1 + n̂ l,2) +
n̂ l,1 ( n̂ l,1 − 1)
. (2)
For a moment, we neglect the hopping in the lower band,
where the single-particle Wannier functions [14] are more
localized than in the upper band. In the latter we neglect
the interactions, since initially only a few particles pop-
ulate the excited levels. A closer analysis of the full two-
bands system [12] shows that there are two dominating
mechanisms that promote particles to the second band.
The first one is a single-particle dipole coupling arising
from the force term:
H1 = F · D
â l,2
â l,1 + â l,1
â l,2
, (3)
where D depends only on the lattice depth V (measured
in recoil energies according to the definition in [7]). The
second one is a many-body effect, describing two particles
of the first band entering the second band together:
â l,2
â l,2
â l,1 â l,1 + (1 ↔ 2)
. (4)
The cross-band interaction is characterized by the pa-
rameter U× ≡ ãs
dxχ21χ
2 ≃ 0.5U1 (for V = 3 . . . 10)
[12], for U1 ≡ ãs
dxχ41, with renormalized scattering
length ãs [8, 12] and the Wannier functions χ1,2 local-
ized in each well for the first or second band. To justify
the following perturbative approach, it is crucial to real-
ize that the terms (3) and (4) must be small compared
with the band gap ∆ ≡ ε2 − ε1 (not necessarily small
with respect to the single band terms in (1)), and indeed
FD,U×, U1 ≪ ∆ for the parameters considered here.
For the first perturbation, the decay channel of a given
unperturbed Fock state labelled |b〉 (with a total number
of atoms N and nh atoms in an arbitrary well h) is
|b;N〉 ⊗ |vac〉 → |b′;N − 1〉 ⊗ |w〉 , n′h = nh − 1. (5)
Here, |w〉 =
m=−∞ Jm−w(|J2|/F ) âm,2
†|vac〉 is the
single-particle eigenstate for the Wannier-Stark problem,
localized around the site w in the second band, with the
Bessel function of the first kind Jm(x) [14].
The expectation value of (3) for |b;N〉 of the first band,
equal to the first-order δE(b), is zero because the opera-
tor does not conserve the number of particles within the
bands. The decay width at first-order is given by the ma-
trix element of the perturbation between the initial and
final state according to Fermi’s Golden Rule, and only
the first term in (3) gives a nonzero contribution [12]:
〈k|〈b′|
l=1 â l,2
† â l,1 |b〉|vac〉 =
l=1 Jl−w(|J2|/F )
·δ(n′
, nl − 1)
m 6=l δ(n
m, nm). (6)
The δ(·, ·) functions act as a selection rule for the Fock
states that are coupled by the perturbation. The tun-
neling mechanism does not include any income of en-
ergy from an external source, so the initial and final
energies E0(b) = 〈vac|〈b|H0|b〉|vac〉 and E0(b′, w) =
〈w|〈b′|H0|b′〉|w〉, respectively, must be equal as required
by the Golden Rule. The condition on the energy conser-
vation is, however, relaxed to account for the uncertainty
∆E(b) of the unperturbed energy levels of the initial and
final states in the lower band arising from the hopping in
this band initially neglected in (2). A detailed derivation
is given in [12], and here we only state the result:
∆E(b) = 2π (J1 /2)
∆E(b → b′) =
2π (J1 /2)
∆l=±1
n2l δ(nl+∆l + 1, nl). (7)
The level density ρ(E, b) around the unperturbed en-
ergy E0(b) of a Fock state |b〉 is then approximated by
a rectangular profile, of width ∆E(b) and unit area:
ρ(E, b) = χ {|E − E0(b)| ≤ ∆E(b)/2} / ∆E(b). The re-
laxed energy conservation rule selects from (5) the set K
of permitted decay channels (h,w) parameterized by the
two indices h,w such that:
′, w)− E0(b) = ∆− F (h− w)− U1 (nh − 1)
−∆E(b)+∆E(b
∆E(b)+∆E(b′)
. (8)
Hence the energy ∆ required to promote a particle to the
second band is supplied by the decrease of the interaction
(∝ U1 ) and by the work of the force (∝ F ) exerted on
the promoted particle.
The total width Γ1(b) for the decay via the allowed
channels K, is proportional to the square of the matrix
element and to the level density ρ(E, b):
Γ1(b) = 2π(FD)
(h,w)∈K
Jh−w( |J2|F ) ·
∆E(b)∆E(b′)
. (9)
Jm(x) significantly contributes only for |m| <∼ |x|. If
U1 ,∆E(b) ≪ ∆, the energy conservation is roughly
given by |∆| ≃ F (h − w). Requiring that the Bessel
function in (9) is substantially larger than zero, we ob-
tain the inequality |∆| ≤ |J2|. The last condition does
not depend on F , since a twofold effect is at work: a
stronger force produces a larger energy gain when a par-
ticle moves along the lattice, but the extension |J2/F | of
the single-particle state shrinks. Therefore, increasing F
results in an increased energy matching and a strongly
reduced “geometrical” matching. For 3 < V < 26, we
have |∆|− |J2| > 1.0 [12], such that the energy matching
cannot be realized by just tuning the lattice depth. The
decay can, however, be activated by an increase of the in-
teractions, which can be experimentally achieved by act-
ing on the transversal confining potential of a quasi-one
dimensional lattice, or by a Feshbach resonance [8]. In
the calculations presented below, we augmented U1 used
in [9, 10] by a factor of order 10, and as noted in the in-
troduction, a similar increase of the interaction strength
was used in the experiment to promote fermions to higher
bands [3], in close analogy to the here described field- and
interaction-induced interband coupling of bosons.
The second term (4) is treated in a similar way, with
the difference that two particles are promoted to the sec-
ond band, and the position of the second single-particle
state |w′〉 is an additional degree of freedom for the tran-
sition. The decay channels are:
|b,N〉⊗ |vac〉 → |b′, N − 2〉 ⊗ |w,w′〉 ; n′h = nh − 2. (10)
The energy matching selects a set K of decay channels,
parameterized by the three site indices h,w,w′:
(h,w,w′) ∈ K s.t. E0(b′, w, w′)− E0(b) =
= 2∆− F (2h− w − w′)− U1 (2nh − 3)
∆E(b) + ∆E(b′)
∆E(b) + ∆E(b′)
. (11)
The computation of the matrix element yields [12]:
Γ2(b) = 2π
(h,w,w′)∈K
Jh−w( |J2|F ) ·
Jh−w′( |J2|F )
· 4nh (nh − 1) 1∆E(b)∆E(b′)
. (12)
With respect to (9), the additional degree of freedom
w′ results in a summation over all possible values of
w−w′. This follows from the possibility to conserve the
energy even if a particle is pushed far, if the other parti-
cle is pushed almost equally far in the opposite direction.
Since the decay widths in (12) depend on the product of
two (rapidly decaying) Bessel functions – again a “geo-
metrical” matching condition – we apply the truncation
|w−w′| ≤ |J2/F |, to reduce the formula to a finite form.
We can now compute the total width ΓF(b) = Γ1(b) +
Γ2(b) defined by the two analyzed coupling processes for
0 1 2 3 4 5
-6.5 -6 -5.5
0 1 2 3 4 5
-5 -4.5 -4 -3.5
Log10Γ
0 1 2 3 4 5
-4 -3.5 -3 -2.5
-6.5 -6
-4 -3.5
FIG. 2: (a,c,e) CDF from Re {Ej} (stairs), together with
WD (solid) and Poisson predictions (dashed). (b,d,f) distri-
butions of the logarithm of the rates. In (a,b), (c,d), (e,f)
F ≃ 0.17, 0.31, 0.47, respectively, with (N,L) = (7, 6), V =
3, U1 = 0.2 (fixing U× ≃ 0.1). In the regular regime (f),
a log-normal distribution (dotted) well fits the data, with a
scaling P(Γ) ∝ Γ−x for the largest Γ (dashed line in the inset
of (f) with x = 1). In the chaotic case, a global power-law
behavior with x ≈ 2 is found (dashed line in the inset of (b)).
each basis state |b〉 of the single-band problem given in
(1). The ΓF(b) are inserted as complex potentials in the
diagonal of the single-band Hamiltonian matrix. After
a gauge transform that recovers the translational invari-
ance of the problem (see [10, 12] for details), the latter
matrix is used to compute the evolution operator over one
Bloch period TB, which is finally diagonalized to obtain
its eigenphases exp (−iEj TB). Along with the statistics
of the level spacings defined by Re {Ej}, the Figs. 2 and 3
analyze the statistical distributions of the tunneling rates
Γj = −2Im{Ej} for some paradigmatic cases. All rates
are much smaller than unity, which a posteriori is fully
consistent with our perturbative approach.
To observe what happens at the regular-chaotic tran-
sition (c.f. Fig. 1), we scan F in Fig. 2, and as F in-
creases, the average decay increases by orders of mag-
nitude, while the distributions broaden. The large in-
crease of the rates is due to an improved energy match-
ing, when F supplies the necessary energy to promote
particles to the second band. For the parameters of
Fig. 2, the single-particle Landau-Zener formula [14] gives
ΓLZ = F/(2π) exp
−π2∆2/(8F )
∼ 10−23, 10−12, 10−8
for (b,d,f). This huge variation, typical of semiclassical
formulae, implies that there are possibly parameters for
which our results are comparable to the single-particle
prediction, but, in general, the many-particle effects can-
not be neglected. Moreover, mean-field treatments of
the Landau-Zener tunneling at best predict a shift of Γ
[7, 15], but cannot account for their distributions.
In the chaotic regime, the Fock states are strongly
mixed by the dynamics [9, 10, 12] and a fast decaying
-12 -11 -10 -9 -8
-5.5 -5 -4.5 -4
10a) b)
c) d)
FIG. 3: (a,c) rate distributions in the chaotic regime with F ≃
0.17, U1 = 0.2 (U× ≃ 0.1), together with the corresponding
unscaled P(Γ) in (b,d). In (a,b) (N,L) = (7, 6), V = 4, and
in (c,d) (N,L) = (9, 8), V = 3. Power-laws P(Γ) ∝ Γ−x are
found with x ≈ 2 (dashed lines in (b,d)).
Fock state can act as a privileged decay channel for many
eigenstates. Many states then share similar rates, lead-
ing to thinner distributions. Therefore, the thinner dis-
tribution of Fig. 2 (b) is a direct signature of the chaotic
dynamics evidenced in (a), as compared with the regular
case in (e,f). In Fig. 2 (f), we found a good agreement
with the expected log-normal distribution of decay rates
[16] (or of the similarly behaving conductance [17]) in
the regular regime. There the system shows nearly per-
fect Bloch oscillations [9], and the motion of the atoms
is localized along the lattice [14]. We can even detect a
qualitative crossover to a power-law P(Γ) ∝ Γ−1 in the
right tail of the distribution, as predicted from localiza-
tion theory [16, 18, 19]. The distributions in Figs. 2 (b)
and 3 follow the expected power-law for open quantum
chaotic systems in the diffusive regime [18]. The expo-
nents x are, however, nonuniversal and depend on the
opening of the system. In our case, the decay channels
are defined by the interband coupling, which in a sense
attaches “leads” to all lattice sites within the sample.
Going along with the regular-to-chaotic transition in the
lower band of our model (from Fig. 2 (f) to (b), or to
Fig. 3) the Γ distributions transform from a log-normal
to a power-law with x ≈ 2, in close analogy to the tran-
sition from Anderson-localized to diffusive dynamics in
open disordered systems [18, 20].
In summary, our perturbative opening of the single-
band Wannier-Stark system allows one to study Landau-
Zener-like interband tunneling within a many-body de-
scription of the dynamics of ultracold atoms. The statis-
tical characterization of the tunneling rates (mean values
and form of the distributions) provides clear and robust
signatures of the regular-to-chaotic transition for future
experiments. A more detailed analysis of the interband
coupling in a full-blown model, in which at least two
bands are completely included, calls for huge computa-
tional resources to access the complete quantum spec-
tra. Nonetheless, our results are a first step in the di-
rection of studies for which “horizontal” and “vertical”
quantum transport along the lattice are simultaneously
present and influence each other in a complex manner.
We thank the Centro di Calcolo, Dipartimento di
Fisica, Università di Pisa, for providing CPU, and the
Humboldt Foundation, MIUR-PRIN, and EU-OLAQUI
for support.
[1] O. Morsch et al., Rev. Mod. Phys. 78, 179 (2006).
[2] M. Greiner et al., Nature 415, 39 (2002); T. Stöferle et
al., Phys. Rev. Lett. 92, 130403 (2004); S. Fölling et al.,
ibid. 97, 060403 (2006).
[3] M. Köhl et al., Phys. Rev. Lett. 94, 080403 (2005).
[4] M. BenDahan et al., Phys. Rev. Lett. 76, 4508 (1996);
S.R. Wilkinson et al., ibid. 4512; B.P. Anderson et al.,
Science 282, 1686 (1998).
[5] O. Morsch et al., Phys. Rev. Lett. 87, 140402 (2001).
[6] G. Roati et al., Phys. Rev. Lett. 92, 230402 (2004).
[7] S. Wimberger et al., Phys. Rev. A 72, 063610 (2005).
[8] T. Bergeman et al., Phys. Rev. Lett. 91, 163201 (2003).
[9] A. Buchleitner et al., Phys. Rev. Lett. 91, 253002 (2003).
[10] A.R. Kolovsky et al., Phys. Rev. E 68, 056213 (2003).
[11] M.L. Mehta, Random Matrices and the Statistical Theory
of Energy Levels (Academic Press, New York, 1991).
[12] A. Tomadin, Master’s thesis, Università di Pisa (2006).
[13] V.W. Scarola et al., Phys. Rev. Lett. 95, 033003 (2005).
[14] M. Glück et al., Phys. Rep. 366, 103 (2002).
[15] B. Wu et al., Phys. Rev. A 61, 023402 (2000); O. Zobay
et al., ibid. 61, 033603 (2000); D. Witthaut et al., ibid.
75, 013617 (2007); S. Wimberger et al., J. Phys. B 39,
729 (2006).
[16] M. Terraneo et al., Eur. Phys. J. B 18, 303 (2000).
[17] C.W.J. Beenakker, Rev. Mod. Phys. 69, 731 (1997).
[18] T. Kottos, J. Phys. A 38, 10761 (2005).
[19] G. Casati et al., Phys. Rev. Lett. 82, 524 (1999); S. Wim-
berger et al., ibid. 89, 263601 (2002).
[20] S. Wimberger et al., J. Phys. A 34, 7181 (2001).
ABSTRACT
  A perturbative model is studied for the tunneling of many-particle states
from the ground band to the first excited energy band, mimicking Landau-Zener
decay for ultracold, spinless atoms in quasi-one dimensional optical lattices
subjected to a tunable tilting force. The distributions of the computed
tunneling rates provide an independent and experimentally accessible signature
of the regular-chaotic transition in the strongly correlated many-body dynamics
of the ground band.

<|endoftext|><|startoftext|>
Introduction
Campana et al. (2007, C07 hereafter) investi-
gate the Epeak–Eγ (so called “Ghirlanda”) cor-
relation, including all GRBs detected by Swift
for which we know the redshift, the peak energy
Epeak and we have information on the presence of
the jet break, necessary to estimate the jet open-
ing angle, and therefore to calculate the collima-
tion corrected bolometric energy, Eγ . In a similar
study performed by us (Ghirlanda et al. 2007,
G07 hereafter), we concluded that there was no
new outlier with respect to the Epeak–Eγ correla-
tion (besides GRB 980425 and GRB 031203, but
see Ghisellini et al. 2006), while C07 claim that
there are 5 Swift bursts which do not obey the
correlation. The sample of GRBs studied by C07
and G07 is the same. In the following we give
arguments contrasting the claim of C07.
2. GRB 060526
This burst is the second most important out-
lier (in term of contribution to the χ2) presented
by C07. Both C07 and G07 used the same
source of data: Schaefer (2007) for the fluence
and Epeak, and Dai et al. (2006) for tjet. Using
the listed bolometric fluence one obtains Eγ,iso =
2.53 × 1052 erg. We recomputed the bolometric
fluence from the spectral parameters reported by
Schaefer (2007), obtaining Eγ,iso = 2.58 × 10
erg, which is the value we used. Instead C07
list an isotropic energy Eγ,iso = 1.07
+0.16
−0.14 × 10
erg. We remind that the isotropic energy is found
through
Eγ,iso = Sbol
4πd2L
(1 + z)
where Sbol is the bolometric fluence and the
(1+z) term accounts for the cosmological time di-
lation. Neglecting the (1+ z) term, and using the
bolometric fluence Sbol = 1.17×10
−7, as listed by
Schaefer (2007), one obtains Eγ,iso = 1.07× 10
erg, which is the value reported in C07. We there-
fore suggest that C07, for this burst, missed the
(1 + z) = 4.21 term when calculating Eγ,iso. The
Eγ value used by C07 is therefore larger than the
value found by G07 because of the larger Eγ,iso
(tjet is the same).
A separate problem concerns the values of
Epeak and bolometric fluence for this burst re-
ported by Schaefer (2007). In fact, this burst
showed two main peaks in BAT, separated by
∼200 seconds, with the second peak having
http://arxiv.org/abs/0704.0234v1
http://arxiv.org/abs/astro--ph/0703676
a slightly larger fluence than the first, with
a softer spectrum. The spectral behaviour
of this burst is thus complex, and the value
of Epeak = 25 ± 5 keV reported by Schaefer
(with a fluence corresponding to the first peak
only) may be controversial. For this reason
we have analyzed the available Swift data for
this burst. Our results and the consequences
for the Ghirlanda correlation can be found at:
www.brera.inaf/utenti/gabriele/060526/060526.html
3. GRB 050922C and GRB 060206
These two bursts lack optical data at times late
enough to encompass the jet break time predicted
by the Ghirlanda relation. The fact that there is
indeed an early break in the optical does not guar-
antee that this is a jet break, since we now know
that there is the possibility of multiple breaks in
the optical. In these cases only a lower limit on
the break time can be taken, corresponding to the
latest optical observations, as discussed in G07.
4. GRB 050401 and GRB 050416A
Several authors published a partial coverage of
the optical afterglow of these two bursts, but none
of them discussed the results which can be ob-
tained by collecting all the available data (at least
in one band). Therefore, the claim that in these
GRBs there is no apparent break refers to the par-
tial coverage presented in each paper. Because of
that, in G07 we constructed the light curves with
data from different sources.
In GRB 050401 the result of the fitting is some-
what dependent from the (yet unknown) assumed
magnitude of the host galaxy, which can con-
tribute to the late photometric points. Further-
more, there is a large uncertainty in the normal-
isation of the De Pasquale et al. (2006) points,
because they used a different reference star for
their differential photometry. What we plotted
in Fig. 1 of G07 assumes the maximum possible
displacement (–0.5 mag): assuming a lower one
would inevitably worsen a single power law decay
For GRB 050416A, it is true that Soderberg et
al. (2006) stated that a single power law decay
plus a 1998bw–like supernova light curve can fit
the data, but also in this paper there is no com-
plete collection of points coming from the avail-
able literature. Anyway, SN 2006aj associated
with GRB 060218 is by far the best studied at
early times, so using this as a template should
give a more reliable result. In this case the pres-
ence of a break in the optical light curve is clearly
required.
Given all the above, we think that in these two
GRBs there exists a margin of subjectivity for
judging the presence or not of a possible jet break
(this margin is however small for GRB 050416A).
But just because of this, it is not appropriate to
declare that they are outliers, and treat them as
such in the fits. At the very least, one should
consider them having a lower limit in Eγ corre-
sponding to the jet break time we have derived.
5. Additional comments
The pre–Swift data plotted in the figures of C07
are the values of Eγ calculated taking Eγ,iso from
Firmani et al. (2006) and the jet angles from
Nava et al. (2006), who reported slightly differ-
ent values of Eγ,iso. Since the derived jet angle
depends upon Eγ,iso, this procedure is not cor-
rect.
When calculating the χ2
value for the bursts in
the sample of Nava et al. (2006), C07 find agree-
ment in the case of an homogeneous circumburst
medium, and a larger χ2
in the case of a wind
profile. We instead confirm the original value re-
ported in Nava et al. (2006).
We note that the χ2
values given in Table 2
of C07 for the “Swift data achromatic breaks”
and “Swift data pure breaks” cases, do not cor-
respond to the values given in the text.
GRB 061121 is plotted as a lower limit in Eγ ,
and lies to the left of the Ghirlanda correlation. It
should not be included in the fit as instead done
in C07.
A symmetric error on a linear quantity trans-
forms into an asymmetric error in the logarithm.
We believe that C07 underestimated the error on
Eγ,iso due to the systematic choice of the smallest
error in the logarithmic quantity. In G07, instead,
we propagated the errors in the logarithmic space.
Finally, in Fig. 2 of C07 (wind case) there is an
additional pre–Swift burst which is not present
in Fig. 1.
6. Conclusions
We would like to stress that we are not willing
to defend the Epeak–Eγ correlation to death. As
any other scientific result, it must be the object
of severe scrutiny from the scientific community.
This is even more true since its potential cosmo-
logical use makes this correlation very important
[as well as the related, model independent and
assumption free, Liang & Zhang (2005) correla-
tion]. Furthermore, its existence can flag some
crucial property of GRB physics which are not yet
fully understood (but some attempts have already
been done, see Thompson 2006 and Thompson,
Meszaros & Rees 2007). Therefore to demon-
strate that this correlation is the result of some
selection effects (or not), or that its dispersion is
much larger than what it is now (or not), or that
there are outliers (or not) is a mandatory task,
that must be pursued carefully.
REFERENCES
1. Campana, S., Guidorzi, C., Tagliaferri,
G., Chincarini, G., Moretti, A., Rizzuto,
D. & Romano, P., 2007, A&A in press
(astro–ph/0703676) (C07)
2. Dai et al., 2007, ApJ submitt.
(astro-ph/0609269)
3. De Pasquale, M., Beardmore, A.P.,
Barthelmy, S.D., et al., 2006, MNRAS,
365, 1031
4. Ghirlanda, G., Nava, L., Ghisellini G.
& Firmani C., 2007, A&A, in press
(astro–ph/0702352) (G07)
5. Ghisellini, G., Ghirlanda, G., Mereghetti, S.,
Bosnjak, Z., Tavecchio, F., & Firmani C.,
2006, MNRAS, 372, 1699
6. Liang, E. & Zhang, B., 2005, ApJ, 633, L611
7. Schaefer, B.E. 2007, astro-ph/0612285
8. Soderberg, A.M., Nakar, E., Cenko, S.B., et
al., 2006, ApJ subm. (astro–ph/0607511)
9. Thompson, C., 2006, ApJ, 651, 333
10. Thompson, C., Meszaros, P. & Rees, M.J.,
2007, subm to ApJ, (astro–ph/0608282)
http://arxiv.org/abs/astro--ph/0703676
http://arxiv.org/abs/astro-ph/0609269
http://arxiv.org/abs/astro--ph/0702352
http://arxiv.org/abs/astro-ph/0612285
http://arxiv.org/abs/astro--ph/0607511
http://arxiv.org/abs/astro--ph/0608282
	Introduction
	GRB 060526
	GRB 050922C and GRB 060206
	GRB 050401 and GRB 050416A
	Additional comments
	Conclusions
ABSTRACT
  In their recent paper, Campana et al. (2007) found that 5 bursts, among those
detected by Swift, are outliers with respect to the E_peak-E_gamma
("Ghirlanda") correlation. We instead argue that they are not.

<|endoftext|><|startoftext|>
Introduction
The ATLAS and CMS experiments at the LHC will begin taking data in a few months and it is
widely believed that new physics beyond the Standard Model(SM) will be discovered in the coming
years. There are many expectations as to what this new physics may be and in what form it will
manifest itself, but it is likely that we will be in for a surprise. Once this new physics is discovered
our primary goal will be to understand its essential nature and how the specific discoveries, such as
the production and observed properties of new particles, fit into a broader theoretical framework.
The existence of a new charged gauge boson, W ′, or a W ′-like object, is now a relatively
common prediction which results from many new physics scenarios. These possibilities include the
Little Higgs(LH) model[1], the Randall-Sundrum(RS)[2] model with bulk gauge fields[3], Universal
Extra Dimensions(UED)[4], TeV scale extra dimensions[5, 6, 7], as well as many different extended
electroweak gauge models, such as the prototypical Left-Right Symmetric Model(LRM)[8, 9]. Al-
though the physics of a new Z ′ has gotten much attention in the literature[10], the detailed study
of a possible W ′ has fared somewhat less well[11]. Perhaps the most important property of a W ′,
apart from its mass and width, is the helicity of its couplings to the fermions in the SM. For all of
the models discussed in the literature above, these couplings are either purely left- or right-handed,
apart from some possible small mixing effects. Determining the helicity of the couplings of a newly
discovered W ′ is thus the first major step in opening up the underlying physics as it is an order
one discriminator between different classes of models.‡
As will be discussed below, there have been many suggestions over the last 20-plus years
as to how to measure the helicity of W ′ couplings, all of which have their own strengths and
weaknesses. These analyses have generally relied upon the use of the narrow width approximation.
However, in employing this approximation much valuable information about the properties of the
W ′ can be lost, in particular, that obtained from W −W ′ interference. The goal of this paper will
‡This is similar in nature to determining whether the known light neutrinos are Dirac or Majorana particles.
be to explore the effects of this interference on the transverse mass dependent distributions of the
W ′. As we will see the rather straightforward measurement of the transverse mass distribution
itself will allow us obtain the necessary W ′ helicity information. Furthermore, we will demonstrate
that such measurements will require only relatively low integrated luminosities for W ′ masses which
are not too large, and will employ the traditional ℓ+EmissT W
′ discovery channel.
Section II of the paper contains some background material and a historically-oriented
overview of previous ideas that have been suggested to address the W ′ helicity issue including
a discussion of their various strengths and weaknesses. Section III will present an analysis of
the W ′ transverse mass distribution and its helicity dependence for a range of W ′ masses, cou-
pling strengths and LHC integrated luminosities. The use of various asymmetries evaluated in
the W −W ′ interference region in order to assist with the W ′ helicity determination will also be
discussed. Section IV contains a final summary and discussion of our results.
2 Background and History
Let us begin by establishing some notation; since much of this should be fairly familiar we will be
rather sketchy and refer the interested reader to Ref.[10] for details.
We denote the couplings of the SM fermions to the Wi = (W = WSM ,W
′) as
Vff ′C
i f̄γµ(1− hiγ5)f
i + h.c. , (1)
where for the case of Wi = WSM , the coupling strength(for leptons and quarks, respectively)
and helicity factors are given by C
i , hi = 1 and Vff ′ is the CKM(unit) matrix when f, f
′ are
quarks(leptons); note that the helicity structure for both leptons and quarks is assumed to be the
same as in all the model cases above.§ Following the notation given in Ref.[10], with some obvious
§For simplicity in what follows we will further assume that the corresponding RH and LH CKM matrices are
identical up to phases and we will generally neglect any possible small effects arising from W − W ′ mixing. In the
case of RH couplings, we will further assume that the SM neutrinos are Dirac fields.
modifications, the inclusive pp → W+i → ℓ+ν +X differential cross section can be written as
dτ dy dz
|Vqq′ |2
(1 + z2) + 2AG−
, (2)
where K is a kinematic/numerical factor that accounts for NLO and NNLO QCD corrections[12]
as well as leading electroweak corrections[13] and is roughly of order ≃ 1.3 for suitably defined
couplings, τ = M2/s (
s = 14 TeV at the LHC) with M2 being the lepton pair invariant mass.
Furthermore,
Pij(CiCj)
ℓ(CiCj)
q(1 + hihj)
2 (3)
Pij(CiCj)
ℓ(CiCj)
q(hi + hj)
where the sums extend over all of the exchanged particles in the s-channel. Here
Pij = ŝ
(ŝ−M2i )(ŝ−M2j ) + ΓiΓjMiMj
[(ŝ −M2i )2 + Γ2iM2i ][i → j]
, (4)
with ŝ = M2 being the square of the total collision energy and Γi the total widths of the exchanged
Wi particles. Note that we have employed z = cos θ, the scattering angle in the CM frame defined
as that between the incoming u-type quark and the outgoing neutrino (both being fermions as
opposed to being one fermion and one anti-fermion). Furthermore, the following combinations of
parton distribution functions appear:
q(xa,M
2)q̄′(xb,M
2)± q(xb,M2)q̄′(xa,M2)] , (5)
where q(q′) is a u(d)−type quark and xa,b =
τe±y are the corresponding parton momentum
fractions. Analogous expressions can also be written in the case of W−i exchange by taking z → −z
and interchanging initial state quarks and anti-quarks.
In most cases of interest one usually converts the distribution over z above into one over
the transverse mass, MT , formed from the final state lepton and the missing transverse energy
associated with the neutrino; at fixed M , one has z = (1 −M2T /M2)1/2. The resulting transverse
mass distribution can then be written as
dy J(z → MT )
dτ dy dz
, (6)
where Y = min(ycut,−1/2 log τ) allows for a rapidity cut on the outgoing leptons and J(z → MT )
is the appropriate Jacobian factor[15]. In practice, ycut ≃ 2.5 for the two LHC detectors. Note
that dσ
will only pick out the z-even part of dσ
dτ dy dz
as well as the even combination of terms in
the product of the parton densities, G+
. In the usual analogous fashion to the Z ′ case[10], as we
will see in our discussion below, one can also define the forward-backward asymmetry as a function
of the transverse mass, in principle prior to integration over the rapidity y, AFB(MT , y), whose
numerator now picks out the z-odd terms in dσ
dτ dy dz
as well as the odd combination of terms in
the parton densities G−
To be complete, we note that historically when discussing new gauge boson production,
particularly when dealing with states which are weakly coupled as will be the case in what fol-
lows, use is often made of the narrow width approximation(NWA). In the W ′ case of relevance
here, the NWA essentially replaces the integration over dτ ∼ dM by a δ function, i.e., the W ′ is
assumed to be produced on-shell. Thus, for any smooth function f(M), essentially,
dM f(M)
dM f(M) π
ΓW ′δ(M −MW ′) → π2ΓW ′f(MW ′), apart from some overall factors. Note that
use of the NWA implies that we evaluate quantities on the ‘peak’ of the W ′ mass distribution, i.e.,
at M = MW ′ . This approximation is usually claimed to be valid up to O(ΓW ′/MW ′) corrections(at
worst), but there are occasions, e.g., when W −W ′ interference is important, when its use can lead
to a loss of valuable information and may even lead to wrong conclusions[16]. Unfortunately, in the
W ′ case, the quantity M itself is not a true observable due to the missing longitudinal momentum
of the neutrino.
Given this background, let us now turn to an historical discussion of the determination of
the W ′ coupling helicity. To be concrete, we will consider two different W ′ models; we will assume
for simplicity that C
= 1 in both cases and that only the value of hW ′ = ±1 distinguishes them.
In this situation, employing the NWA, the cross section for on-shell W ′ production (followed by its
leptonic decay) is proportional to ∼ (1+h2W ′) and is trivially seen to be independent of the helicity
of the couplings. We would thus conclude that cross section measurements are not useful helicity
discriminants. More interestingly, as was noted long ago[17], we find that the rapidity integrated
value of AFB, given in the NWA by
AFB ∼
h2W ′
(1 + h2W ′)
, (7)
also has the same value for either purely LH or RH couplings¶. Thus, in the NWA, AFB provides
no help in determining the W ′ coupling helicity structure for the cases we consider here. However,
we note that if the quark and leptonic coupling helicities of the W ′ are opposite, then the value of
AFB will flip sign in comparison to the above expectation.
It is apparent from this result that some other observable(s) must be used to distinguish
these two cases. Keeping the NWA assumption, the first suggestion[18] along these lines was to
examine the polarization of τ ’s originating in the decay W ′ → τν. In that paper it was explicitly
shown that the the energy spectrum of the final state particle in the decay τ → ℓ, π or ρ (in the τ
rest frame) was reasonably sensitive to the original W ′ helicity since the τ itself effectively decays
only through the SM LH couplings of the W (provided the W ′ is sufficiently massive as we will
assume here). The difficulty with this method is that the observation of this decay mode at the
LHC is not all that straightforward and even the corresponding Z ′ → ττ mode, which is somewhat
easier to observe, is just beginning to be studied by the LHC experimental collaborations[19].
¶This follows immediately from the fact that we have assumed that both the hadronic and leptonic couplings of
the W ′ have to have the same helicity.
Clearly, measuring the polarization of the τ ’s in W ′ → τν will be reasonably difficult in the LHC
detector environment and may, at the very least, require large integrated luminosities even for a
relatively light W ′. The results of detailed studies by the LHC collaborations to address this issue
are anxiously awaited.
In the early 90’s, two important NWA-based methods for probing the helicity of the W ′
were suggested[20]. The first of these is an examination of the rare decay mode W ′ → ℓ+ℓ−W
(with the W decaying into jets); in particular, one makes a measurement of the ratio of branching
fractions
B(W ′ → ℓ+ℓ−W )
B(W ′ → ℓν)
, (8)
obtained by employing the NWA. RW is expected to be roughly ∼O(0.01) or so after suitable cuts.
One of the main SM backgrounds, i.e., WZ production, can essentially be removed by demanding
that the dileptons do not form a Z, demanding that the mass of the jjℓℓ system be not far from
the (already known) value of MW ′ and that of the dijets reconstructs to the W mass. Even after
there requirements, however, some background from the continuum would remain. Furthermore,
as the energy of the final state W increases it is more likely that the resulting dijets will coalesce
into a single jet depending on the jet cone definition which is employed. In this case, at the very
least, a very large additional background from single jets may appear; it is also possible that the
events with a final state W would be completely lost without the dijet mass reconstruction. The
3ℓ+EmissT final state, with suitable cuts, would be obviously cleaner and would avoid some of these
issues but at the price of an overall suppression due to ratio of branching fractions of ≃ 1/3 thus
reducing the mass range over which this process would be useful.
In a general gauge model, the amplitude for this process is the sum of two graphs. In
the first graph, W
′− → ℓ−ν̄∗, i.e., the production of a virtual neutrino followed by the ‘decay’
ν̄∗ → ℓ+W−. Clearly, if the W ′ couples in a purely RH manner to the SM leptons then this graph
will vanish in the limit of massless neutrinos due to the presence of two opposite helicity projection
operators. This graph will, of course, be non-zero only if the W ′ couples in an at least partially LH
manner. The second graph involves the presence of the trilinear couplings W ′ZW and W ′Z ′W ;
recall that in any model with a W ′, a Z ′ will also appear just based on gauge invariance. In this
case, the decay proceeds as W ′ → WZ/Z ′∗ → Wℓ+ℓ−, noting that the on-shell SM Z contribution
can be removed by a suitable cut on the dilepton invariant mass. The main issue is the size of
the W ′Z ′W (and W ′ZW ) couplings and this can involve such things such as, e.g., the detailed
electroweak symmetry breaking patterns of the given model under study. Generically in extra
dimensional models[3, 4, 5, 6, 7], these couplings are absent in the limit of small mixing due the
orthogonality of the Kaluza-Klein wavefunctions of the states. In models where the SM SU(2)L
arises from a diagonal breaking of the form G1 ⊗ G2 → SU(2)Diag , such as in LH models[1], the
W ′Z ′W coupling is of order the SM weak coupling, g, while the W ′ZW coupling is either of order g
or can be mixing angle suppressed. In other cases, such as in the LRM[8], where SU(2)L⊗SU(2)R
just breaks to SU(2)L, the W
′ZW,WZ ′W couplings are only generated by mixings and for the
diagrams of interest are not longitudinally enhanced. Since the amplitude associated with the pure
leptonic graphs are absent in this case, the entire amplitude is mixing angle suppressed so that
this process has an unobservably small rate. In fact, there are no known models where the W ′
helicity is RH and the W ′ZW,WZ ′W couplings are not mixing angle suppressed‖. Thus, based
on known models, it appears that the observation of the rare decay W ′ → ℓ+ℓ−W would be a
compelling indication that the W ′ is at least partially coupled in a LH manner with apparently
most of the serious SM backgrounds being removable by conventional cuts. However, in making
a truly model-independent analysis one must exercise care in the use of this result. A detailed
analysis of the signal and backgrounds, including that for the jjℓ+ℓ− final state, for such decays
including realistic detector effects would be very useful in addressing all these issues and should be
performed. However, it also seems clear that is unlikely that a reliable measurement of RW can be
made with relatively low integrated luminosities.
‖In a fundamental UV complete theory, this may follow directly from arguments based solely on gauge invariance
and the requirement of high energy unitarity.
A second, imaginative possibility is to observeWW ′ associated production[20] withW → jj
for the same reasons as above. Many of the arguments made in the previous paragraph will
also apply in this case as well since the diagrams responsible for this process are quite similar to
previously discussed. Essentially these graphs are obtained by crossing, with the final state leptons
now replaced by an initial state qq̄. In this case one looks for the jjℓEmissT final state with the
ℓEmissT transverse mass peaking near MW ′ . One would anticipate this cross section to be of order
∼ 0.01 of that of the W ′ discovery channel. The main issues here are, as above, the SM backgrounds
and the nature of the triple gauge vertices. It is not likely that a reliable measurement of this cross
section will be performed with low luminosities that could be interpreted in a model-independent
way until all of the background and detector issues are dealt with. Again, a detailed analysis
including detector effects should be performed.
3 W −W ′ Interference as a Function of MT
What we have learned from the previous discussion is that tools which employ the NWA are not
particularly useful when we are trying to determine the W ′ coupling helicity with relatively low
luminosities in an easily examined final state. One of the key reasons for this is that the use of NWA
does not allow us to examine the influence of W − W ′ interference to which we now turn[21]∗∗.
To be specific, in the analysis that follows, we will employ the CTEQ6M parton densities[25]
and will restrict our attention only to the ℓ = e final state since it is better measured at these
energies[23] yielding a better MT resolution. Furthermore, we will assume that only SM particles
are accessible in the decay of theW ′ so that the total width can be straightforwardly calculated from
the assumptions described above and its assumed mass value; for example, we obtain Γ(W ′) = 51.9
GeV assuming a W ′ mass of 1.5 TeV including QCD corrections. NLO QCD modifications to the
distributions we discuss below have been ignored but those distributions we consider are rather
∗∗We note in passing that the usual experimental analyses at LHC[23] performed by both the ATLAS and CMS
collaborations (as well as those at the Tevatron by CDF and D0[22]) ignore the effects of W −W ′ interference since
these contributions are absent from default versions of stand-alone PYTHIA[24].
Figure 1: Transverse mass distribution for the production of a 1.5 TeV W ′ including interference
effects at the LHC displayed on both log and linear scales assuming an integrated luminosity of
300 fb−1. The lowest histogram is the SM continuum background. The upper blue(middle red)
histogram at MT = 600 GeV corresponds to the case of hW ′ = −1(1).
robust against large corrections.
Figure 2: Same as in the previous figure but now on a linear scale with lower luminosities and
smeared by the detector resolution. In the top(bottom) panel an integrated luminosity of 30(10)
fb−1 has been assumed. Detector smearing has now been included assuming δMT /MT = 2%.
The most obvious distribution to examine first is dσ
itself; for the moment let us restrict
ourselves to the two cases where C
= 1 and hW ′ = ±1. Fig. 1 shows this distribution for a large
integrated luminosity, assuming MW ′ = 1.5 TeV[22], as well as the SM continuum background
††. In
††Note that we would expect to see many excess events for such W ′ masses as only ≃ 25 pb−1 of luminosity would
obtaining these and other MT -deprndent distributions below, a cut on the lepton rapidity, |ηℓ ≤ 2.5,
has been applied. Several things are immediately clear: (i) In the region near the Jacobian peak
both distributions are quite similar; this is not surprising as this is the region where the NWA is
most applicable since now MT ≃ M and W −W ′ interference is minimal. In this limit we would
indeed recover our earlier result that the cross section is helicity independent. (ii) In the lower MT
region where interference effects are important the two models lead to quite different distributions.
In particular, for the LH case with hW ′ = 1, we observe a destructive interference with the SM
amplitude producing a distribution that lies below that of the pure SM continuum background.
(This is not surprising as the overall signs of the W and W ′ contributions are the same but we are
at values of
ŝ that are above MW yet below MW ′ so that the relevant propagators have opposite
signs.) However, for the RH case with hW ′ = −1, there is no such interference and therefore the
resulting distribution always lies above the SM background. It is fairly obvious that these two
distributions are trivially distinguishable at these large integrated luminosities. Note that other
contributions to the SM background, e.g., those from the decay of top quarks as well as guage
boson pairs, have been shown to be rather small at these masses at the detector level [23], at the
level of a few percent, and will be ignored in the analysis that follows.
Fig. 2 shows the same dσ
distribution on a linear scale but now for far smaller integrated
luminosities that may be obtained during early LHC running; here we include the effects of detector
smearing, with δMT /MT ≃ 2%, which is somewhat less important in the very large statistics sample
cases shown above. It is immediately apparent that even with only ∼ 10 fb−1 of luminosity the
two cases remain quite distinct; however, it also appears unlikely that much smaller luminosities
would be very useful in this regard. This result is a significant improvement over previous attempts
to determine the W ′ coupling helicity with low luminosities in clean channels.
At this point there are several important questions one might ask: (i) What happens for
a more massive W ′, i.e., how much luminosity will be needed in such cases to distinguish W ′
be needed to discover(5σ) such as state at the LHC.
Figure 3: High luminosity plot of the transverse mass distribution assuming MW ′ = 2.5(3.5) for
the upper(lower) pair of histograms along with the SM continuum background. In the interference
region near ≃ 0.5MW ′ the upper(lower) member of the pair corresponds to the case of hW ′ = −1(1).
Detector smearing has now been included assuming δMT /MT = 2%.
couplings of opposite helicities? (ii) What if the the W ′ couplings are weaker than our canonical
choice above? (iii) Do other observables, e.g., AFB, measured in the interference region below the
Jacobian peak assist us in model separation? (iv) In the case where the W ′ is a KK excitation,
does the presents of the additional W KK tower members alter these results? (v) In the discussion
above we have assumed that CℓW ′ = C
W ′ ; what would happen, e.g., if their signs were opposite
thus modifying the interference bewteen the W and W ′? (vi) What if the W ′ couplings are not
purely chiral and are an admixture of LH and RH helicities? It is to these issues that we now turn.
Fig. 3 provides us with a high luminosity overview for the more massive cases where MW ′ =
2.5 or 3.5 TeV. In the MW ′ = 2.5 TeV case, Fig. 4 demonstrates that the full 300 fb
−1 luminosity
is not required to distinguish the two possibly helicities; ∼ 60fb−1 seems to be the approximate
minimum luminosity that appears to be necessary. For higher masses, distinguishing the two cases
becomes far more difficult due to the smaller production cross section as we see from Fig. 5 for
the case of MW ′ = 3.5 TeV assuming a luminosity of 300 fb
−1; essentially the full luminosity is
Figure 4: Transverse mass distribution assuming a mass of 2.5 TeV for the W ′ along with the
SM continuum background; the upper(lower) panel corresponds to a luminosity of 300(75) fb−1.
In the interference region near ≃ 0.5MW ′ the upper(lower) histogram corresponds to the case of
hW ′ = −1(1).
required for model distinction in this case.
Figure 5: High luminosity plot of the transverse mass distribution assuming MW ′ = 3.5 TeV along
with the SM continuum background. In the interference region near ≃ 0.5MW ′ the upper(lower)
histogram corresponds to the case of hW ′ = −1(1).
What if the W ′ couplings are weaker? Clearly if they are too weak there will be insufficient
statistics to discriminate the two possible coupling helicity assignments for any fixed value of MW ′ .
In order to examine a realistic example of this situation, we consider the case of the second W
KK excitation in the UED model[4, 26] with a conserved KK-parity. In such a scenario the LH
couplings of this field to SM fermions vanish at tree level but are induced by one loop effects. In
this case one finds that the effective values of Cℓ,q are distinct but are qualitatively of order ∼ 0.05
though we employ the specific values obtained in Ref.[4, 26] below in the actual calculations. Fig. 6
shows the transverse mass distributions in this case assuming that MW ′ =1 TeV for the second level
KK state. The signal for this W KK state is clearly visible above the SM background. However,
we also see that for even for these high luminosities and low masses the two helicity choices are
not distinguishable. Clearly, one cannot determine the W ′ coupling helicity for such very weak
interaction strengths. Semi-quantitatively, we find that that this breakdown in the discriminating
power occurs when (CℓCq)1/2 ∼ 0.1 at these luminosities and masses.
Figure 6: High luminosity plot of the transverse mass distribution assuming MW ′ = 1 TeV for
the second W KK level in the UED model smeared by detector resolution as above. As usual the
lower histogram is the SM background while the other two correspond to the signal cases with
hW ′ = −1(1) and are essentially indistinguishable.
We now turn to the next question we need to address: can asymmetries be useful in strength-
ening our ability to determine the W ′ coupling helicity? We know from the discussion above that
the answer is apparently ‘no’ in the NWA limit, i.e., when MT ≃ M . Thus we must focus our at-
tention on the MT region below the peak where W−W ′ interference is strongest or, more generally,
examine the asymmetries’ MT -dependence directly. The most obvious quantity to begin with is the
y-integrated value of AFB for both W
′± channels. To make such a measurement, we need to know
several things in addition to the sign of the lepton (which we assume can be done with ≃ 100%
efficiency). At the parton level, in the case of W ′− for example, the relevant angle used to define
AFB lies between the incoming d-type quark and the outgoing ℓ
−. Reconstructing this direction
presents us with two problems: first, since the longitudinal momentum of the ν is unknown there
is an, in principle, two-fold ambiguity in the motion of the center of mass in the lab frame; this
can cause a serious dilution of the observed asymmetry but can be corrected for statistically using
Monte Carlo once the W ′ mass is known. Second, even when it is determined, the direction of
motion of the center of mass is not necessarily that of the d-type quark though it is likely to be
so when the boost of the center of mass frame is large. The later problem also arises for the case
of a Z ′ and has also been shown to be mostly correctable in detailed Monte Carlo studies[27]. For
the moment, let us forget these issues and ask what the y-integrated AFB(MT ) looks like in both
ℓ± channels; the results are shown in Fig. 7 assuming high luminosities and MW ′ = 1.5 TeV. Here
we see that these integrated quantities, even for luminosities of 300 fb−1, are essentially useless
in distinguishing the two coupling helicity cases. Furthermore, we also see that the two coupling
helicities lead to essentially identical results when MT ≃ MW ′ as would be expected based on the
NWA. A short analysis indicates that approximately ten times more integrated luminosity would
be required before some separation in the two cases becomes possible[28]. Clearly this situation
would only become worse if we were to raise the mass of the W ′ or reduce its coupling strength.
It is perhaps possible that some information is lost by only using the integrated quantity
AFB and we need to consider instead AFB(yW ), where yW is the rapidity of the center of mass frame.
This distribution is odd under the interchange yW → −yW at the LHC so we can simply fold this
distribution over the yW = 0 boundary to double the statistics. Furthermore, by integrating over
a wide MT range in the interference region below the W
′ peak, e.g., 0.4 ≤ MT ≤ 1 TeV in the case
of a 1.5 TeV W ′, further statistics can be gained. Fig 8 shows the resulting AFB(yW ) distributions
for a W ′± with mass of 1.5 TeV assuming a luminosity of 300 fb−1 for hW ′ = ±1. At these large
luminosities, the AFB(yW ) distributions for the two helicity choices are clearly distinguishable but
this will certainly become more difficult for lower luminosities or for larger masses. We find that we
essentially loose all coupling helicity information when the luminosity falls much below ≃ 100fb−1
for this W ′ mass.
The next observable we consider is the charge asymmetry, AWQ(yW ):
AWQ(yW ) =
N+(yW )−N−(yW )
N+(yW ) +N−(yW )
, (9)
where N±(yW ) are the number of events with charged leptons of sign ± in a given bin of rapidity.
Note that at the LHC, AWQ(yW ) is symmetric under yW → −yW so that we can again fold the
Figure 7: The y-integrated value of AFB, as a function of the transverse mass, assuming a mass of
1.5 TeV for the W ′+(W ′−) in the top(bottom) panel. Here an integrated luminosity of 300 fb−1
has been assumed. The two essentially indistinguishable histograms correspond to the two possible
choices of the helicity, hW ′ = ±1.
Figure 8: The value of AFB as a function the center of mass rapidity, yW , integrated over the
transverse mass bin 400-1000 GeV assuming a mass of 1.5 TeV for theW
′−) in the top(bottom)
panel. An integrated luminosity of 300 fb−1 has been assumed and the distribution has been folded
around yW = 0. The upper(lower) set of data points in the top(lower) panel for small values of yW
corresponds to the choice of hW ′ = −1. Note that we have chosen signs to make the ranges of AFB
comparable in both cases.
distribution around yW = 0. Fig. 9 shows this distribution, integrated over the interference region
0.4 ≤ MT ≤ 1 TeV, assuming MW ′ = 1.5 TeV and a luminosity of 300 fb−1. It is clear that at this
level of integrated luminosity the two distributions are reasonably distinguishable. However, as we
lower the luminosity or raise the mass of the W ′ the quality of the separation degrades significantly.
Certainly for luminosities less that ≃ 100 fb−1, this asymmetry measurement would not be very
helpful. Thus AWQ(yW ) is not a very useful tool for coupling helicity determination until high
luminosities become available.
Figure 9: The W − W ′ induced charge asymmetry, assuming MW ′ = 1.5 TeV, as a function the
center of mass rapidity, yW , integrated over the transverse mass bin 400-1000 GeV. An integrated
luminosity of 300 fb−1 has been assumed and the distribution has been folded around yW = 0.
The upper set of data points at low values of yW corresponds to the choice of hW ′ = 1.
A last asymmetry possibility to consider is the rapidity asymmetry for the final state charged
leptons themselves, Aℓ(yℓ):
Aℓ(yℓ) =
N+(yℓ)−N−(yℓ)
N+(yℓ) +N−(yℓ)
, (10)
which is also an even function of yℓ so the distribution can again be folded around yℓ = 0. The
resulting distribution can be seen in Fig. 10 for large integrated luminosities. Here we again see
reasonable model differentiation at low values of yℓ <∼ 1 but this fades in utility as integrated
luminosities drop much below ≃ 100 fb−1 as the two curves are generally rather close.
From this general discussion of possibly asymmetries that one can form employing this final
state we can thus conclude that their usefulness in coupling helicity determination will require
≃ 100fb−1.
Figure 10: The W − W ′ induced lepton asymmetry, assuming MW ′ = 1.5 TeV, as a function
the lepton’s rapidity, yℓ, integrated over the transverse mass bin 400-1100 GeV. An integrated
luminosity of 300 fb−1 has been assumed and the distribution has been folded around yℓ = 0. The
upper set of data points at low values of yℓ corresponds to the choice of hW ′ = 1.
In the case of extra dimensions we know that an entire tower of W ′-like KK states is
expected to exist. Do the presence of these additional states modify the results we have obtained
above for an ordinary W ′? To address this, consider the simplified case of a second W ′-like KK
state which have the same coupling strength as the SM W and is twice as heavy as the W ′ discussed
above, i.e.,3 TeV. Now imagine that the coupling helicity of this second state is uncorrelated with
that of the W ′; in the MT distribution in the W −W ′ interference region influenced by this state?
The upper panel in Fig. 11 addresses this issue for modest luminosities including the effects of
smearing. The upper(lower) set of three histograms corresponds to the case where hW ′ = −1(1)
and either there is no W ′′, as above, or hW ′′ = ±1. This demonstrates that the existence of the
extra KK states has little influence on the results we obtained above independent of their coupling
helicities.
Up to now we have assumed that CℓW ′ = C
W ′ ; what if this was no longer true? How would
the MT distribution and our ability to determine coupling helicity be modified? The simplest case
to examine is CℓW ′ = −C
= 1 with hW ′ = ±1. (Note that interchanging the signs of these two
couplings, i.e., which one of these two couplings we choose to be negative, has no physical effect on
the MT distribution or on any of the asymmetries discussed earlier.) The result of this investigation
is shown in the lower panel of Fig. 11. Here the red(green) histograms correspond to the cases
analyzed above where CℓW ′ = C
W ′ = 1 and hW ′ = 1(−1) whereas the blue(magenta) histograms
corresponds to the cases where CℓW ′ = −C
= 1 with hW ′ = 1(−1). It is clear from this Figure that
the MT distribution distinguishes only three of these cases with the C
W ′ = ±C
= 1, hW ′ = −1
possibilities being degenerate. The reason for this is that in both these cases there is no interference
with the SMW ′ exchange and in the pureW ′ term in the cross section this sign change is irrelevant;
these two degenerate cases are, of course, separable using the information obtained from AFB as
they produce values with opposite sign.
Lastly, and to be more general, we must at least consider possible scenarios where the
couplings of the W ′ to SM fermions are a substantial admixture of both LH and RH helicities,
though obvious examples of such kinds of models are apparently absent from the existing literature.
To get a feel for such a possibility, we perform two analyses: first, we set Cℓ,q = 1 as before and
vary the values of hW ′ between pairs of positive and negative values. As we do this, the helicity
of the couplings of the W ′ will vary as will its total decay width which behaves as ∼ 1 + h2W . In
a second analysis, we can rescale the values of the Cℓ,q so that the W ′ width is held fixed. In this
case, as we will see, the resulting histograms for the transverse mass distribution lie especially close
to one another. The results of these two sets of calculations are shown in Fig. 12 in the case of
large integrated luminosities assuming the default value of MW ′ = 1.5 TeV. In the first analysis
shown in the top panel, we see that at these assumed luminosities all of the different histograms
Figure 11: SmearedMT distributions for several scenarios; the top panel, the lower(upper) compares
the single W ′ case discussed above to that where a second KK state, W ′′, exists with coupling
helicities uncorrelated to that of the W ′. Details are given in the text. In the lower panel, we
compare the cases for hW ′ = ±1 allowing for the possibility that CℓW ′ = ±C
with the signs
uncorrelated with the coupling helicity; the details are discussed in the text.
are distinguishable and not just the two pairs of cases with opposite helicities. This result generally
remains true down to luminosities ∼ 75fb−1 or so. If we are only interested in separating opposite
helicity pairs then we find that the cases hW ′ = ±0.8(0.6, 0.4, 0.2) can be distinguished down to
luminosities of order ∼ 10(25, 50, 75)fb−1 , respectively.
In the second analysis, as seen in the lower panel of the figure, the histograms for hW ′ =
0.8, 0.6 and 0.4 (as well as for their corresponding opposite helicity partners) are very close to one
another and are essentially inseparable even at these high luminosities. However, the two sets of
opposite helicity histograms remain distinguishable and this will remains true down to luminosities
of order 30 − 75 fb−1. It would seem from these analyses that the transverse mass distribution
will play the dominant role in W ′ coupling helicity determination in all possible cases although
somewhat higher integrated luminosities may be required in some scenarios.
4 Summary and Conclusions
Apart from its mass and width, the most important property of a new charged gauge boson, W ′, is
the helicity of its couplings to the SM fermions. Such particles are predicted to exist in the TeV mass
range in many new physics models and this coupling helicity is an order one discriminator between
the various classes of models. The main difficulties with the existing techniques for determining
this helicity are potentially threefold: (i) they require rather high integrated luminosities even for
a relatively light W ′, and/or (ii) they are sufficiently intricate as to require a detailed background
and detector study to determine their feasibility, and/or (iii) they make use of more complex final
states other than the standard ℓ + EmissT discovery channel. Some of these techniques also suffer
from employing the narrow width approximation which can result in loss of valuable information
regarding the effects of W − W ′ interference. In this paper we propose a simple technique for
making this helicity determination at the LHC. In order to attempt to circumvent all of these
difficulties, we have examined the W −W ′ interference region of the transverse mass distribution
Figure 12: Same as the linear plot shown in Fig.1, but now for other values of the coupling
helicities. From top to bottom the pairs of histograms in the upper panel correspond to h(W ′) =
±0.8,±0.6,±0.4and±0.2, respectively. The next lowest single histogram corresponds to the case of
pure vector couplings, i.e., h(W ′) = 0. In producing these results we have assumed that the values
of the Cℓ,q=1. In the lower panel, we show the same result now but with the overall couplings
rescaled so as to keep the W ′ width a constant.
for the ℓ + EmissT discovery mode. We have found that this distribution is particularly sensitive
to the helicity of the W ′ couplings. In particular, using this technique we have shown that such
helicity differentiation requires only ∼ 10(60, 300) fb−1 assuming MW ′ = 1.5(2.5, 3.5) TeV and
provided that the W ′ has Standard Model strength couplings. This helicity determination can be
further strengthened by the use of various discovery channel leptonic asymmetries also measured
in the same interference regime once higher integrated luminosities are available as well as by the
more traditional approaches. Hopefully the LHC will observe a W ′ so that this approach can be
employed.
Acknowledgments
The author would like to thank A. De Roeck, S.Godfrey and J. Hewett for input and
discussions related to this paper.
References
[1] N. Arkani-Hamed, A. G. Cohen and H. Georgi, Phys. Lett. B 513, 232 (2001)
[arXiv:hep-ph/0105239]; N. Arkani-Hamed, A. G. Cohen, E. Katz and A. E. Nelson, JHEP
0207, 034 (2002) [arXiv:hep-ph/0206021].
[2] L. Randall and R. Sundrum, Phys. Rev. Lett. 83, 3370 (1999) [arXiv:hep-ph/9905221].
[3] H. Davoudiasl, J. L. Hewett and T. G. Rizzo, Phys. Lett. B 473, 43 (2000)
[arXiv:hep-ph/9911262]; A. Pomarol, Phys. Lett. B 486, 153 (2000) [arXiv:hep-ph/9911294];
Y. Grossman and M. Neubert, Phys. Lett. B 474, 361 (2000) [arXiv:hep-ph/9912408];
H. Davoudiasl, J. L. Hewett and T. G. Rizzo, Phys. Rev. D 63, 075004 (2001)
[arXiv:hep-ph/0006041]; T. Gherghetta and A. Pomarol, Nucl. Phys. B 586, 141 (2000)
[arXiv:hep-ph/0003129]; S. Chang, J. Hisano, H. Nakano, N. Okada and M. Yamaguchi,
Phys. Rev. D 62, 084025 (2000) [arXiv:hep-ph/9912498]; S. J. Huber and Q. Shafi, Phys.
http://arxiv.org/abs/hep-ph/0105239
http://arxiv.org/abs/hep-ph/0206021
http://arxiv.org/abs/hep-ph/9905221
http://arxiv.org/abs/hep-ph/9911262
http://arxiv.org/abs/hep-ph/9911294
http://arxiv.org/abs/hep-ph/9912408
http://arxiv.org/abs/hep-ph/0006041
http://arxiv.org/abs/hep-ph/0003129
http://arxiv.org/abs/hep-ph/9912498
Lett. B 498, 256 (2001) [arXiv:hep-ph/0010195] and Phys. Rev. D 63, 045010 (2001)
[arXiv:hep-ph/0005286]; R. Kitano, Phys. Lett. B 481, 39 (2000) [arXiv:hep-ph/0002279];
J. L. Hewett, F. J. Petriello and T. G. Rizzo, JHEP 0209, 030 (2002) [arXiv:hep-ph/0203091].
[4] T. Appelquist, H. C. Cheng and B. A. Dobrescu, Phys. Rev. D 64, 035002 (2001)
[arXiv:hep-ph/0012100]; H. C. Cheng, K. T. Matchev and M. Schmaltz, Phys. Rev.
D 66, 056006 (2002) [arXiv:hep-ph/0205314] and Phys. Rev. D 66, 036005 (2002)
[arXiv:hep-ph/0204342].
[5] I. Antoniadis, Phys. Lett. B 246, 377 (1990);
[6] T. G. Rizzo and J. D. Wells, Phys. Rev. D 61, 016007 (2000) [arXiv:hep-ph/9906234];
K. m. Cheung and G. Landsberg, Phys. Rev. D 65, 076003 (2002) [arXiv:hep-ph/0110346];
A. Strumia, Phys. Lett. B 466, 107 (1999) [arXiv:hep-ph/9906266]; A. Delgado, A. Pomarol
and M. Quiros, JHEP 0001, 030 (2000) [arXiv:hep-ph/9911252]; F. Cornet, M. Relano and
J. Rico, Phys. Rev. D 61, 037701 (2000) [arXiv:hep-ph/9908299]; C. D. Carone, Phys. Rev.
D 61, 015008 (2000) [arXiv:hep-ph/9907362]; P. Nath and M. Yamaguchi, Phys. Rev. D
60, 116004 (1999) [arXiv:hep-ph/9902323]; A. Muck, A. Pilaftsis and R. Ruckl, Phys. Rev.
D 65, 085037 (2002) [arXiv:hep-ph/0110391]; G. Polesello and M. Prata, Eur. Phys. J. C
32S2, 55 (2003); P. Nath, Y. Yamada and M. Yamaguchi, Phys. Lett. B 466, 100 (1999)
[arXiv:hep-ph/9905415];
[7] H. Davoudiasl and T. G. Rizzo, arXiv:hep-ph/0702078.
[8] For a classic review and original references, see R.N. Mohapatra, Unification and Supersym-
metry, (Springer, New York,1986).
[9] See, for example, K. R. Lynch, E. H. Simmons, M. Narain and S. Mrenna, Phys. Rev. D 63,
035006 (2001) [arXiv:hep-ph/0007286]; H. Georgi, E. Jenkins and E. H. Simmons, Nucl. Phys.
http://arxiv.org/abs/hep-ph/0010195
http://arxiv.org/abs/hep-ph/0005286
http://arxiv.org/abs/hep-ph/0002279
http://arxiv.org/abs/hep-ph/0203091
http://arxiv.org/abs/hep-ph/0012100
http://arxiv.org/abs/hep-ph/0205314
http://arxiv.org/abs/hep-ph/0204342
http://arxiv.org/abs/hep-ph/9906234
http://arxiv.org/abs/hep-ph/0110346
http://arxiv.org/abs/hep-ph/9906266
http://arxiv.org/abs/hep-ph/9911252
http://arxiv.org/abs/hep-ph/9908299
http://arxiv.org/abs/hep-ph/9907362
http://arxiv.org/abs/hep-ph/9902323
http://arxiv.org/abs/hep-ph/0110391
http://arxiv.org/abs/hep-ph/9905415
http://arxiv.org/abs/hep-ph/0702078
http://arxiv.org/abs/hep-ph/0007286
B 331, 541 (1990); A. Bagneid, T. K. Kuo and N. Nakagawa, Int. J. Mod. Phys. A 2, 1351
(1987).
[10] For classic reviews of Z’ physics, see A. Leike, Phys. Rept. 317, 143 (1999)
[arXiv:hep-ph/9805494]; J. L. Hewett and T. G. Rizzo, Phys. Rept. 183, 193 (1989); M. Cvetic
and S. Godfrey, arXiv:hep-ph/9504216; T. G. Rizzo, “Extended gauge sectors at future
colliders: Report of the new gauge boson subgroup,” eConf C960625, NEW136 (1996)
[arXiv:hep-ph/9612440] and arXiv:hep-ph/0610104.
[11] See, however, S. Godfrey, P. Kalyniak, B. Kamal, M. A. Doncheski and A. Leike, Int. J. Mod.
Phys. A 16S1B, 879 (2001) [arXiv:hep-ph/0009325]; S. Godfrey, P. Kalyniak, B. Kamal and
A. Leike, Phys. Rev. D 61, 113009 (2000) [arXiv:hep-ph/0001074].
[12] For a recent analysis and original references, see K. Melnikov and F. Petriello,
arXiv:hep-ph/0609070.
[13] U. Baur and D. Wackeroth, Nucl. Phys. Proc. Suppl. 116, 159 (2003) [arXiv:hep-ph/0211089];
[14] V. A. Zykunov, arXiv:hep-ph/0509315.
[15] See, for example, V. Barger and R. J. N. Phillips in Collider Physics, Frontiers in Physics
Series Vol.71, 1996.
[16] D. Berdine, N. Kauer and D. Rainwater, arXiv:hep-ph/0703058.
[17] H. E. Haber, “Signals Of New W’s And Z’s,” SLAC-PUB-3456 Presented at 1984 Summer
Study on the Design and Utilization of the Superconducting Super Collider, Snowmass, CO,
Jun 23 - Jul 23, 1984
[18] H. E. Haber, “Taus: A Probe Of New W And Z Couplings,” Presented at 1984 Summer Study
on the Design and Utilization of the Superconducting Super Collider, Snowmass, CO, Jun 23
- Jul 23, 1984.
http://arxiv.org/abs/hep-ph/9805494
http://arxiv.org/abs/hep-ph/9504216
http://arxiv.org/abs/hep-ph/9612440
http://arxiv.org/abs/hep-ph/0610104
http://arxiv.org/abs/hep-ph/0009325
http://arxiv.org/abs/hep-ph/0001074
http://arxiv.org/abs/hep-ph/0609070
http://arxiv.org/abs/hep-ph/0211089
http://arxiv.org/abs/hep-ph/0509315
http://arxiv.org/abs/hep-ph/0703058
[19] See the talk given by T. Vickey, at the ATLAS Exotics Working Group Meeting, 2/21/07.
[20] M. Cvetic, P. Langacker and B. Kayser, Phys. Rev. Lett. 68, 2871 (1992); M. Cvetic,
P. Langacker and J. Liu, Phys. Rev. D 49, 2405 (1994) [arXiv:hep-ph/9308251]; M. Cvetic
and P. Langacker, Phys. Rev. D 46, 4943 (1992) [Erratum-ibid. D 48, 4484 (1993)]
[arXiv:hep-ph/9207216].
[21] The possibility of probing W −W ′ interference in the tb̄ channel has recently been discussed
in E. Boos, V. Bunichev, L. Dudko and M. Perfilov, arXiv:hep-ph/0610080.
[22] The direct search lower limit on the mass of a W ′ with such couplings is approaching 1 TeV
from Run II data at the Tevatron; see, for example, P. Savard, “Searches for Extra Dimensions
and New Gauge Bosons at the Tevatron,” talk given at the XXXIII International Conference
on High Energy Physics, 26 July-2 August 2006, Moscow, Russia; T. Adams, “Searches for
New Phenomena with Lepton Final States at the Tevatron,” talk given at Rencontres de
Moriond Electroweak Interactions and Unified Theories 2007, La Thuile, Italy 10-17 March
2007; A. Abulencia et al. [CDF Collaboration], arXiv:hep-ex/0611022. In the case of the LRM,
the lower bound from indirect measurements may be somewhat larger: P. Langacker and
S. Uma Sankar, Phys. Rev. D 40, 1569 (1989).
[23] See, for example, G. Azuelos et al., Eur. Phys. J. C 3
9S2, 13 (2005) [arXiv:hep-ph/0402037];
C. Hof, T. Hebbeker and K. Hoepfner,
[24] T. Sjostrand, S. Mrenna and P. Skands, JHEP 0605, 026 (2006) [arXiv:hep-ph/0603175].
[25] S. Kretzer, H. L. Lai, F. I. Olness and W. K. Tung, Phys. Rev. D 69, 114005 (2004)
[arXiv:hep-ph/0307022]. We employ the CTEQ6M PDFs throughout this analysis. For full
details, see http://www.phys.psu.edu/∼cteq/.
[26] A. Datta, K. Kong and K. T. Matchev, Phys. Rev. D 72, 096006 (2005) [Erratum-ibid. D 72,
119901 (2005)] [arXiv:hep-ph/0509246].
http://arxiv.org/abs/hep-ph/9308251
http://arxiv.org/abs/hep-ph/9207216
http://arxiv.org/abs/hep-ph/0610080
http://arxiv.org/abs/hep-ex/0611022
http://arxiv.org/abs/hep-ph/0402037
http://arxiv.org/abs/hep-ph/0603175
http://arxiv.org/abs/hep-ph/0307022
http://www.phys.psu.edu/~cteq/
http://arxiv.org/abs/hep-ph/0509246
[27] See, for example, R. Cousins, J. Mumford and V. Valuev, CMS Note 2005/022; I. Golutin et
al., CMS AN-2007/003.
[28] F. Gianotti et al., Eur. Phys. J. C 39, 293 (2005) [arXiv:hep-ph/0204087].
http://arxiv.org/abs/hep-ph/0204087
	Introduction
	Background and History
	W-W' Interference as a Function of MT
	Summary and Conclusions
ABSTRACT
  Apart from its mass and width, the most important property of a new charged
gauge boson, $W'$, is the helicity of its couplings to the SM fermions. Such
particles are expected to exist in many extensions of the Standard Model. In
this paper we explore the capability of the LHC to determine the $W'$ coupling
helicity at low integrated luminosities in the $\ell +E_T^{miss}$ discovery
channel. We find that measurements of the transverse mass distribution,
reconstructed from this final state in the $W-W'$ interference region, provides
the best determination of this quantity. To make such measurements requires
integrated luminosities of $\sim 10(60) fb^{-1}$ assuming $M_{W'}=1.5(2.5)$ TeV
and provided that the $W'$ couplings have Standard Model magnitude. This
helicity determination can be further strengthened by the use of various
discovery channel leptonic asymmetries, also measured in the same interference
regime, but with higher integrated luminosities.

<|endoftext|><|startoftext|>
Introduction 1
2. Notations and preliminary results 2
3. Evolution equations for some geometric quantities 4
4. Essential parabolic flow equations 9
5. Stability of the limit hypersurfaces 15
6. Existence results 25
7. The inverse mean curvature flow 39
8. The IMCF in ARW spaces 41
Transition from big crunch to big bang 45
References 47
1. Introduction
In this paper we want to give a survey of the existence and regularity results
for extrinsic curvature flows in semi-Riemannian manifolds, i.e., Riemannian or
Lorentzian ambient spaces, with an emphasis on flows in Lorentzian spaces. In
order to treat both cases simultaneously terminology like spacelike, timelike,
etc., that only makes sense in a Lorentzian setting should be ignored in the
Riemannian case.
The general stability result for the limit hypersurfaces of converging curva-
ture flows in Section 5 is new. The regularity result in Theorem 6.5—especially
the time independent Cm+2,α-estimates—for converging curvature flows that
are graphs is interesting too.
Date: October 23, 2018.
2000 Mathematics Subject Classification. 35J60, 53C21, 53C44, 53C50, 58J05.
Key words and phrases. semi-Riemannian manifold, mass, stable solutions, cosmological
spacetime, general relativity, curvature flows, ARW spacetime.
This research was supported by the Deutsche Forschungsgemeinschaft.
http://arxiv.org/abs/0704.0236v4
2 CLAUS GERHARDT
2. Notations and preliminary results
The main objective of this section is to state the equations of Gauß, Co-
dazzi, and Weingarten for hypersurfaces. In view of the subtle but important
difference that is to be seen in the Gauß equation depending on the nature
of the ambient space—Riemannian or Lorentzian—, which we already men-
tioned in the introduction, we shall formulate the governing equations of a
hypersurface M in a semi-Riemannian (n+1)-dimensional space N , which is
either Riemannian or Lorentzian. Geometric quantities in N will be denoted
by (ḡαβ), (R̄αβγδ), etc., and those in M by (gij), (Rijkl), etc. Greek indices
range from 0 to n and Latin from 1 to n; the summation convention is always
used. Generic coordinate systems in N resp. M will be denoted by (xα) resp.
(ξi). Covariant differentiation will simply be indicated by indices, only in case
of possible ambiguity they will be preceded by a semicolon, i.e. for a function
u in N , (uα) will be the gradient and (uαβ) the Hessian, but e.g., the covariant
derivative of the curvature tensor will be abbreviated by R̄αβγδ;ǫ. We also point
out that
(2.1) R̄αβγδ;i = R̄αβγδ;ǫx
with obvious generalizations to other quantities.
Let M be a spacelike hypersurface, i.e. the induced metric is Riemannian,
with a differentiable normal ν. We define the signature of ν, σ = σ(ν), by
(2.2) σ = ḡαβν
ανβ = 〈ν, ν〉.
In case N is Lorentzian, σ = −1, and ν is timelike.
In local coordinates, (xα) and (ξi), the geometric quantities of the spacelike
hypersurface M are connected through the following equations
(2.3) xαij = −σhijνα
the so-called Gauß formula. Here, and also in the sequel, a covariant derivative
is always a full tensor, i.e.
(2.4) xαij = x
,ij − Γ kijxαk + Γ̄αβγx
The comma indicates ordinary partial derivatives.
In this implicit definition the second fundamental form (hij) is taken with
respect to −σν.
The second equation is the Weingarten equation
(2.5) ναi = h
where we remember that ναi is a full tensor.
Finally, we have the Codazzi equation
(2.6) hij;k − hik;j = R̄αβγδναxβi x
and the Gauß equation
(2.7) Rijkl = σ{hikhjl − hilhjk}+ R̄αβγδxαi x
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 3
Here, the signature of ν comes into play.
2.1. Definition. (i) Let F ∈ C0(Γ̄ ) ∩ C2,α(Γ ) be a strictly monotone cur-
vature function, where Γ ⊂ Rn is a convex, open, symmetric cone containing
the positive cone, such that
(2.8) F |∂Γ = 0 ∧ F |Γ > 0.
Let N be semi-Riemannian. A spacelike, orientable1 hypersurface M ⊂ N
is called admissible, if its principal curvatures with respect to a chosen normal
lie in Γ . This definition also applies to subsets of M .
(ii) Let M be an admissible hypersurface and f a function defined in a
neighbourhood of M . M is said to be an upper barrier for the pair (F, f), if
(2.9) F |M ≥ f
(iii) Similarly, a spacelike, orientable hypersurfaceM is called a lower barrier
for the pair (F, f), if at the points Σ ⊂M , where M is admissible, there holds
(2.10) F |Σ ≤ f.
Σ may be empty.
(iv) If we consider the mean curvature function, F = H , then we suppose F
to be defined in Rn and any spacelike, orientable hypersurface is admissible.
One of the assumptions that are used when proving a priori estimates is that
there exists a strictly convex function χ ∈ C2(Ω̄) in a given domain Ω. We
shall state sufficient geometric conditions guaranteeing the existence of such a
function. The lemma below will be valid in Lorentzian as well as Riemannian
manifolds, but we formulate and prove it only for the Lorentzian case.
2.2. Lemma. Let N be globally hyperbolic, S0 a Cauchy hypersurface, (xα)
a special coordinate system associated with S0, and Ω̄ ⊂ N be compact. Then,
there exists a strictly convex function χ ∈ C2(Ω̄) provided the level hypersur-
faces {x0 = const} that intersect Ω̄ are strictly convex.
Proof. For greater clarity set t = x0, i.e., t is a globally defined time function.
Let x = x(ξ) be a local representation for {t = const}, and ti, tij be the
covariant derivatives of t with respect to the induced metric, and tα, tαβ be the
covariant derivatives in N , then
(2.11) 0 = tij = tαβx
j + tαx
and therefore,
(2.12) tαβx
j = −tαx
ij = −h̄ijtανα.
Here, (να) is past directed, i.e., the right-hand side in (2.12) is positive definite
in Ω̄, since (tα) is also past directed.
1A hypersurface is said to be orientable, if it has a continuous normal field.
4 CLAUS GERHARDT
Choose λ > 0 and define χ = eλt, so that
(2.13) χαβ = λ
2eλttαtβ + λe
λttαβ .
Let p ∈ Ω be arbitrary, S = {t = t(p)} be the level hypersurface through p,
and (ηα) ∈ Tp(N). Then, we conclude
(2.14) e−λtχαβη
αηβ = λ2|η0|2 + λtijηiηj + 2λt0jη0ηi,
where tij now represents the left-hand side in (2.12), and we infer further
(2.15)
e−λtχαβη
αηβ ≥ 1
λ2|η|02 + [λǫ− cǫ]σijηiηj
λ{−|η0|2 + σijηiηj}
for some ǫ > 0, and where λ is supposed to be large. Therefore, we have in Ω̄
(2.16) χαβ ≥ cḡαβ , c > 0,
i.e., χ is strictly convex. �
3. Evolution equations for some geometric quantities
Curvature flows are used for different purposes, they can be merely vehicles
to approximate a stationary solution, in which case the flow is driven not
only by a curvature function but also by the corresponding right-hand side,
an external force, if you like, or the flow is a pure curvature flow driven only
by a curvature function, and it is used to analyze the topology of the initial
hypersurface, if the ambient space is Riemannian, or the singularities of the
ambient space, in the Lorentzian case.
In this section we are treating very general curvature flows2 in a semi-
Riemannian manifold N = Nn+1, though we only have the Riemannian or
Lorentzian case in mind, such that the flow can be either a pure curvature flow
or may also be driven by an external force. The nature of the ambient space,
i.e., the signature of its metric, is expressed by a parameter σ = ±1, such that
σ = 1 corresponds to the Riemannian and σ = −1 the Lorentzian case. The
parameter σ can also be viewed as the signature of the normal of the spacelike
hypersurfaces, namely,
(3.1) σ = 〈ν, ν〉.
Properties like spacelike, achronal, etc., however, only make sense, when N
is Lorentzian and should be ignored otherwise.
We consider a strictly monotone, symmetric, and concave curvature F ∈
C4,α(Γ ), homogeneous of degree 1, a function 0 < f ∈ C4,α(Ω), where Ω ⊂ N
is an open set, and a real function Φ ∈ C4,α(R+) satisfying
(3.2) Φ̇ > 0 and Φ̈ ≤ 0.
For notational reasons, let us abbreviate
(3.3) f̃ = Φ(f).
2We emphasize that we are only considering flows driven by the extrinsic curvature not
by the intrinsic curvature.
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 5
Important examples of functions Φ are
(3.4) Φ(r) = r, Φ(r) = log r, Φ(r) = −r−1
(3.5) Φ(r) = r
k , Φ(r) = −r− 1k , k ≥ 1.
3.1. Remark. The latter choices are necessary, if the curvature function F
is not homogeneous of degree 1 but of degree k, like the symmetric polynomials
Hk. In this case we would sometimes like to define F = Hk and not H
k , since
(3.6) F ij =
is then divergence free, if the ambient space is a spaceform, cf. Lemma 5.8 on
page 24, though on the other hand we need a concave operator for technical
reasons, hence we have to take the k-th root.
The curvature flow is given by the evolution problem
(3.7)
ẋ = −σ(Φ− f̃)ν,
x(0) = x0,
where x0 is an embedding of an initial compact, spacelike hypersurfaceM0 ⊂ Ω
of class C6,α, Φ = Φ(F ), and F is evaluated at the principal curvatures of the
flow hypersurfaces M(t), or, equivalently, we may assume that F depends on
the second fundamental form (hij) and the metric (gij) of M(t); x(t) is the
embedding of M(t) and σ the signature of the normal ν = ν(t), which is
identical to the normal used in the Gaussian formula (2.3) on page 2.
The initial hypersurface should be admissible, i.e., its principal curvatures
should belong to the convex, symmetric cone Γ ⊂ Rn.
This is a parabolic problem, so short-time existence is guaranteed, cf. [18,
Chapter 2.5]
There will be a slight ambiguity in the terminology, since we shall call the
evolution parameter time, but this lapse shouldn’t cause any misunderstand-
ings, if the ambient space is Lorentzian.
At the moment we consider a sufficiently smooth solution of the initial value
problem (3.7) and want to show how the metric, the second fundamental form,
and the normal vector of the hypersurfaces M(t) evolve. All time derivatives
are total derivatives, i.e., covariant derivatives of tensor fields defined over the
curve x(t), cf. [17, Chapter 11.5]; t is the flow parameter, also referred to
as time, and (ξi) are local coordinates of the initial embedding x0 = x0(ξ)
which will also serve as coordinates for the the flow hypersurfaces M(t). The
coordinates in N will be labelled (xα), 0 ≤ α ≤ n.
3.2. Lemma (Evolution of the metric). The metric gij of M(t) satisfies the
evolution equation
(3.8) ġij = −2σ(Φ− f̃)hij .
6 CLAUS GERHARDT
Proof. Differentiating
(3.9) gij = 〈xi, xj〉
covariantly with respect to t yields
(3.10)
ġij = 〈ẋi, xj〉+ 〈xi, ẋj〉
= −2σ(Φ− f̃)〈xi, νj〉 = −2σ(Φ− f̃)hij ,
in view of the Codazzi equations. �
3.3.Lemma (Evolution of the normal). The normal vector evolves according
(3.11) ν̇ = ∇M (Φ− f̃) = gij(Φ− f̃)ixj .
Proof. Since ν is unit normal vector we have ν̇ ∈ T (M). Furthermore, differ-
entiating
(3.12) 0 = 〈ν, xi〉
with respect to t, we deduce
�(3.13) 〈ν̇, xi〉 = −〈ν, ẋi〉 = (Φ− f̃)i.
3.4. Lemma (Evolution of the second fundamental form). The second fun-
damental form evolves according to
(3.14) ḣ
i = (Φ− f̃)
i + σ(Φ− f̃)h
k + σ(Φ− f̃)R̄αβγδν
γxδkg
(3.15) ḣij = (Φ− f̃)ij − σ(Φ − f̃)hki hkj + σ(Φ − f̃)R̄αβγδναx
γxδj .
Proof. We use the Ricci identities to interchange the covariant derivatives of ν
with respect to t and ξi
(3.16)
(ναi ) = (ν̇
α)i − R̄αβγδνβx
= gkl(Φ− f̃)kixαl + gkl(Φ− f̃)kxαli − R̄αβγδνβx
For the second equality we used (3.11). On the other hand, in view of the
Weingarten equation we obtain
(3.17) D
(ναi ) =
(hki x
k ) = ḣ
k + h
Multiplying the resulting equation with ḡαβx
j we conclude
(3.18) ḣki gkj − σ(Φ − f̃)hki hkj = (Φ− f̃)ij + σ(Φ − f̃)R̄αβγδναx
or equivalently (3.14).
To derive (3.15), we differentiate
(3.19) hij = h
i gkj
with respect to t and use (3.8). �
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 7
We emphasize that equation (3.14) describes the evolution of the second
fundamental form more meaningfully than (3.15), since the mixed tensor is
independent of the metric.
3.5. Lemma (Evolution of (Φ− f̃)). The term (Φ− f̃) evolves according to
the equation
(3.20)
(Φ− f̃)
− Φ̇F ij(Φ− f̃)ij =σΦ̇F ijhikhkj (Φ− f̃) + σf̃ανα(Φ− f̃)
+ σΦ̇F ijR̄αβγδν
γxδj(Φ − f̃),
where
(3.21) (Φ− f̃)′ = d
(Φ− f̃)
(3.22) Φ̇ =
Φ(r).
Proof. When we differentiate F with respect to t we consider F to depend on
the mixed tensor h
i and conclude
(3.23) (Φ− f̃)′ = Φ̇F ij ḣ
i − f̃αẋ
The equation (3.20) then follows in view of (3.7) and (3.14). �
3.6. Remark. The preceding conclusions, except Lemma 3.5, remain valid
for flows which do not depend on the curvature, i.e., for flows
(3.24)
ẋ = −σ(−f)ν = σfν,
x(0) = x0,
where f = f(x) is defined in an open set Ω containing the initial spacelike
hypersurface M0. In the preceding equations we only have to set Φ = 0 and
f̃ = f .
The evolution equation for the mean curvature then looks like
(3.25) Ḣ = −∆f − σ{|A|2 + R̄αβνανβ}f,
where the Laplacian is the Laplace operator on the hypersurfaceM(t). This is
exactly the derivative of the mean curvature operator with respect to normal
variations as we shall see in a moment.
But first let us consider the following example.
3.7. Example. Let (xα) be a future directed Gaussian coordinate system
in N , such that the metric can be expressed in the form
(3.26) ds̄2 = e2ψ{σ(dx0)2 + σijdxidxj}.
Denote by M(t) the coordinate slices {x0 = t}, then M(t) can be looked at as
the flow hypersurfaces of the flow
(3.27) ẋ = −σ(−eψ)ν̄,
8 CLAUS GERHARDT
where we denote the geometric quantities of the slices by ḡij , ν̄, h̄ij , etc.
Here x is the embedding
(3.28) x = x(t, ξi) = (t, xi).
Notice that, if N is Riemannian, the coordinate system and the normal are
always chosen such that ν0 > 0, while, if N is Lorentzian, we always pick the
past directed normal.
Hence the mean curvature of the slices evolves according to
(3.29) ˙̄H = −∆eψ − σ{|Ā|2 + R̄αβ ν̄αν̄β}eψ.
We can now derive the linearization of the mean curvature operator of a
spacelike hypersurface, compact or non-compact.
3.8. Let M0 ⊂ N be a spacelike hypersurface of class C4. We first assume
that M0 is compact; then there exists a tubular neighbourhood U and a cor-
responding normal Gaussian coordinate system (xα) of class C3 such that ∂
is normal to M0.
Let us consider in U of M0 spacelike hypersurfaces M that can be writ-
ten as graphs over M0, M = graphu, in the corresponding normal Gaussian
coordinate system. Then the mean curvature of M can be expressed as
(3.30) H = {−∆u+ H̄ − σv−2uiujh̄ij}v,
where σ = 〈ν, ν〉, and hence, choosing u = ǫϕ, ϕ ∈ C2(M0), we deduce
(3.31)
H |ǫ=0 = −∆ϕ+ ˙̄Hϕ
= −∆ϕ− σ(|Ā|2 + R̄αβνανβ)ϕ,
in view of (3.29).
The right-hand side is the derivative of the mean curvature operator applied
to ϕ.
If M0 is non-compact, tubular neighbourhoods exist locally and the relation
(3.31) will be valid for any ϕ ∈ C2c (M0) by using a partition of unity.
The preceding linearization can be immediately generalized to a hypersurface
M0 solving the equation
(3.32) F |M0 = f,
where f = f(x) is defined in a neighbourhood of M0 and F = F (hij) is curva-
ture operator.
3.9. Lemma. Let M0 be of class C
m,α, m ≥ 2, 0 ≤ α ≤ 1, satisfy (3.32).
Let U be a (local) tubular neighbourhood of M0, then the linearization of the
operator F − f expressed in the normal Gaussian coordinate system (xα) cor-
responding to U and evaluated at M0 has the form
(3.33) − F ijuij − σ{F ijhki hkj + F ijR̄αβγδναx
γxδj + fαν
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 9
where u is a function defined in M0, and all geometric quantities are those of
M0; the derivatives are covariant derivatives with respect to the induced metric
of M0. The operator will be self-adjoint, if F
ij is divergence free.
Proof. For simplicity assume thatM0 is compact, and let u ∈ C2(M0) be fixed.
Then the hypersurfaces
(3.34) Mǫ = graph(ǫu)
stay in the tubular neighbourhood U for small ǫ, |ǫ| < ǫ0, and their second
fundamental forms (hij) can be expressed as
(3.35) v−1hij = −(ǫu)ij + h̄ij ,
where h̄ij is the second fundamental form of the coordinate slices {x0 = const}.
We are interested in
(3.36)
(F − f)|ǫ=0 .
To differentiate F with respect to ǫ it is best to consider the mixed form
i ) of the second fundamental form to derive
(3.37)
(F − f) = F ij ḣ
u = −F ijuij + F ij ˙̄h
where the equation is evaluated at ǫ = 0 and ˙̄h
i is the derivative of h̄
i with
respect to x0.
The result then follows from the evolution equation (3.14) for the flow (3.27),
i.e., we have to replace (Φ− f̃) in (3.14) by −1. �
4. Essential parabolic flow equations
From (3.14) on page 6 we deduce with the help of the Ricci identities a
parabolic equation for the second fundamental form
4.1. Lemma. The mixed tensor h
i satisfies the parabolic equation
(4.1)
i − Φ̇F
i;kl =
σΦ̇F klhrkh
i − σΦ̇Fhrih
rj + σ(Φ− f̃)hki h
− f̃αβxαi x
kj + σf̃αν
i + Φ̇F
kl,rshkl;ih
+ Φ̈FiF
j + 2Φ̇F klR̄αβγδx
− Φ̇F klR̄αβγδxαmx
rj − Φ̇F klR̄αβγδxαmx
+ σΦ̇F klR̄αβγδν
γxδl h
i − σΦ̇F R̄αβγδν
γxδmg
+ σ(Φ− f̃)R̄αβγδναxβi ν
γxδmg
+ Φ̇F klR̄αβγδ;ǫ{ναxβkx
mj + ναx
10 CLAUS GERHARDT
Proof. We start with equation (3.14) on page 6 and shall evaluate the term
(4.2) (Φ− f̃)ji ;
since we are only working with covariant spatial derivatives in the subsequent
proof, we may—and shall—consider the covariant form of the tensor
(4.3) (Φ− f̃)ij .
First we have
(4.4) Φi = Φ̇Fi = Φ̇F
klhkl;i
(4.5) Φij = Φ̇F
klhkl;ij + Φ̈F
klhkl;iF
rshrs;j + Φ̇F
kl,rshkl,;ihrs;j .
Next, we want to replace hkl;ij by hij;kl. Differentiating the Codazzi equation
(4.6) hkl;i = hik;l + R̄αβγδν
where we also used the symmetry of hik, yields
(4.7)
hkl;ij = hik;lj + R̄αβγδ;ǫν
+ R̄αβγδ{ναj x
i + ν
i + ν
i + ν
To replace hkl;ij by hij;kl we use the Ricci identities
(4.8) hik;lj = hik;jl + hakR
ilj + haiR
and differentiate once again the Codazzi equation
(4.9) hik;j = hij;k + R̄αβγδν
To replace f̃ij we use the chain rule
(4.10)
f̃i = f̃αx
f̃ij = f̃αβx
j + f̃αx
Then, because of the Gauß equation, Gaussian formula, and Weingarten
equation, the symmetry properties of the Riemann curvature tensor and the
assumed homogeneity of F , i.e.,
(4.11) F = F klhkl,
we deduce (4.1) from (3.14) on page 6 after reverting to the mixed representa-
tion. �
4.2. Remark. If we had assumed F to be homogeneous of degree d0 instead
of 1, then we would have to replace the explicit term F—occurring twice in the
preceding lemma—by d0F .
If the ambient semi-Riemannian manifold is a space of constant curvature,
then the evolution equation of the second fundamental form simplifies consid-
erably, as can be easily verified.
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 11
4.3. Lemma. Let N be a space of constant curvature KN , then the second
fundamental form of the curvature flow (3.7) on page 5 satisfies the parabolic
equation
(4.12)
i − Φ̇F
i;kl = σΦ̇F
klhrkh
i − σΦ̇Fhrih
rj + σ(Φ − f̃)hki h
− f̃αβxαi x
kj + σf̃αν
i + Φ̇F
kl,rshkl;ih
+ Φ̈FiF
+KN{(Φ− f̃)δji + Φ̇F δ
i − Φ̇F
klgklh
Let us now assume that the open set Ω ⊂ N containing the flow hyper-
surfaces can be covered by a Gaussian coordinate system (xα), i.e., Ω can be
topologically viewed as a subset of I × S0, where S0 is a compact Riemannian
manifold and I an interval. We assume furthermore, that the flow hypersur-
faces can be written as graphs over S0
(4.13) M(t) = { x0 = u(xi) : x = (xi) ∈ S0 };
we use the symbol x ambiguously by denoting points p = (xα) ∈ N as well as
points p = (xi) ∈ S0 simply by x, however, we are careful to avoid confusions.
Suppose that the flow hypersurfaces are given by an embedding x = x(t, ξ),
where ξ = (ξi) are local coordinates of a compact manifoldM0, which then has
to be homeomorphic to S0, then
(4.14)
x0 = u(t, ξ) = u(t, x(t, ξ)),
xi = xi(t, ξ).
The induced metric can be expressed as
(4.15) gij = 〈xi, xj〉 = σuiuj + σklxki xlj ,
where
(4.16) ui = ukx
i.e.,
(4.17) gij = {σukul + σkl}xki xlj ,
hence the (time dependent) Jacobian (xki ) is invertible, and the (ξ
i) can also
be viewed as coordinates for S0.
Looking at the component α = 0 of the flow equation (3.7) on page 5 we
obtain a scalar flow equation
(4.18) u̇ = −e−ψv−1(Φ− f̃),
which is the same in the Lorentzian as well as in the Riemannian case, where
(4.19) v2 = 1− σσijuiuj,
and where
(4.20) |Du|2 = σijuiuj
12 CLAUS GERHARDT
is of course a scalar, i.e., we obtain the same expression regardless, if we use
the coordinates xi or ξi.
The time derivative in (4.18) is a total time derivative, if we consider u to
depend on u = u(t, x(t, ξ)). For the partial time derivative we obtain
(4.21)
= u̇− ukẋki
= −e−ψv(Φ− f̃),
in view of (3.7) on page 5 and our choice of normal ν = (να)
(4.22) (να) = σe−ψv−1(1,−σui),
where ui = σijuj.
Controlling the C1-norm of the graphs M(t) is tantamount to controlling v,
if N is Riemannian, and ṽ = v−1, if N is Lorentzian. The evolution equations
satisfied by these quantities are also very important, since they are used for
the a priori estimates of the second fundamental form.
Let us start with the Lorentzian case.
4.4. Lemma (Evolution of ṽ). Consider the flow (3.7) in a Lorentzian space
N such that the spacelike flow hypersurfaces can be written as graphs over S0.
Then, ṽ satisfies the evolution equation
(4.23)
˙̃v − Φ̇F ij ṽij =− Φ̇F ijhikhkj ṽ + [(Φ− f̃)− Φ̇F ]ηαβνανβ
− 2Φ̇F ijhkjxαi x
kηαβ − Φ̇F
ijηαβγx
− Φ̇F ijR̄αβγδναxβi x
− f̃βxβi x
k ηαg
where η is the covariant vector field (ηα) = e
ψ(−1, 0, . . . , 0).
Proof. We have ṽ = 〈η, ν〉. Let (ξi) be local coordinates for M(t). Differenti-
ating ṽ covariantly we deduce
(4.24) ṽi = ηαβx
α + ηαν
(4.25)
ṽij = ηαβγx
α + ηαβx
+ ηαβx
j + ηαβx
i + ηαν
The time derivative of ṽ can be expressed as
(4.26)
˙̃v = ηαβ ẋ
βνα + ηαν̇
= ηαβν
ανβ(Φ − f̃) + (Φ− f̃)kxαk ηα
= ηαβν
ανβ(Φ − f̃) + Φ̇F kxαk ηα − f̃βx
ikηα,
where we have used (3.11) on page 6.
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 13
Substituting (4.25) and (4.26) in (4.23), and simplifying the resulting equa-
tion with the help of the Weingarten and Codazzi equations, we arrive at the
desired conclusion. �
In the Riemannian case we consider a normal Gaussian coordinate system
(xα), for otherwise we won’t obtain a priori estimates for v, at least not without
additional strong assumptions. We also refer to x0 = r as the radial distance
function.
4.5. Lemma (Evolution of v). Consider the flow (3.7) in a normal Gaussian
coordinate system where the M(t) can be written as graphs of a function u(t)
over some compact Riemannian manifold S0. Then the quantity
(4.27) v =
1 + |Du|2 = (rανα)−1
satisfies the evolution equation
(4.28)
v̇ − Φ̇F ijvij = −Φ̇F ijhikhkj v − 2v−1Φ̇F ijvivj
+ rαβν
ανβ [(Φ− f̃)− Φ̇F ]v2 + 2Φ̇F ijhki rαβxαkx
+ Φ̇F ijR̄αβγδν
+ Φ̇F ijrαβγν
2 + f̃αx
mkrβx
Proof. Similar to the proof of the previous lemma. �
The previous problems can be generalized to the case when the right-hand
side f is not only defined in N or in Ω̄ but in the tangent bundle T (N) resp.
T (Ω̄). Notice that the tangent bundle is a manifold of dimension 2(n+1), i.e.,
in a local trivialization of T (N) f can be expressed in the form
(4.29) f = f(x, ν)
with x ∈ N and ν ∈ Tx(N), cf. [17, Note 12.2.14]. Thus, the case f = f(x) is
included in this general set up. The symbol ν indicates that in an equation
(4.30) F |M = f(x, ν)
we want f to be evaluated at (x, ν), where x ∈M and ν is the normal of M in
The Minkowski problem or Minkowski type problems are also covered by the
present setting, though the Minkowski problem has the additional property that
the problem is transformed via the Gauß map to a different semi-Riemannian
manifold as a dual problem and solved there. Minkowski type problems have
been treated in [5], [23], [16] and [21].
4.6. Remark. The equation (4.30) will be solved by the same methods as
in the special case when f = f(x), i.e., we consider the same curvature flow,
the evolution equation (3.7) on page 5, as before.
14 CLAUS GERHARDT
The resulting evolution equations are identical with the natural exception,
that, when f or f̃ has to be differentiated, the additional argument has to be
considered, e.g.,
(4.31) f̃i = f̃αx
i + f̃νβν
i = f̃αx
i + f̃νβx
(4.32)
f = f̃αẋ
α + f̃νβ ν̇
β = −σ(Φ− f̃)f̃ανα + f̃νβgij(Φ− f̃)ix
The most important evolution equations are explicitly stated below.
Let us first state the evolution equation for (Φ − f̃).
4.7. Lemma (Evolution of (Φ− f̃)). The term (Φ− f̃) evolves according to
the equation
(4.33)
(Φ− f̃)
− Φ̇F ij(Φ− f̃)ij = σΦ̇F ijhikhkj (Φ− f̃)
+ σf̃αν
α(Φ− f̃)− f̃ναxαi (Φ− f̃)jgij
+ σΦ̇F ijR̄αβγδν
γxδj(Φ− f̃),
where
(4.34) (Φ− f̃)′ = d
(Φ− f̃)
(4.35) Φ̇ =
Φ(r).
Here is the evolution equation for the second fundamental form.
4.8. Lemma. The mixed tensor h
i satisfies the parabolic equation
(4.36)
i − Φ̇F
= σΦ̇F klhrkh
i − σΦ̇Fhrih
rj + σ(Φ − f̃)hki h
− f̃αβxαi x
kj + σf̃αν
i − f̃ανβ (x
kj + xαl x
− f̃νανβxαl x
lj − f̃νβx
i;l g
lj + σf̃ναν
αhki h
+ Φ̇F kl,rshkl;ih
rs; + 2Φ̇F
klR̄αβγδx
− Φ̇F klR̄αβγδxαmx
rj − Φ̇F klR̄αβγδxαmx
+ σΦ̇F klR̄αβγδν
γxδl h
i − σΦ̇F R̄αβγδν
γxδmg
+ σ(Φ− f̃)R̄αβγδναxβi ν
γxδmg
mj + Φ̈FiF
+ Φ̇F klR̄αβγδ;ǫ{ναxβkx
mj + ναx
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 15
The proof is identical to that of Lemma 4.1; we only have to keep in mind
that f now also depends on the normal.
If we had assumed F to be homogeneous of degree d0 instead of 1, then, we
would have to replace the explicit term F—occurring twice in the preceding
lemma—by d0F .
4.9. Lemma (Evolution of ṽ). Consider the flow (3.7) in a Lorentzian space
N such that the spacelike flow hypersurfaces can be written as graphs over S0.
Then, ṽ satisfies the evolution equation
(4.37)
˙̃v − Φ̇F ij ṽij =− Φ̇F ijhikhkj ṽ + [(Φ− f̃)− Φ̇F ]ηαβνανβ
− 2Φ̇F ijhkjxαi x
kηαβ − Φ̇F
ijηαβγx
− Φ̇F ijR̄αβγδναxβi x
− f̃βxβi x
k ηαg
ik − f̃νβx
ikxαi ηα,
where η is the covariant vector field (ηα) = e
ψ(−1, 0, . . . , 0).
The proof is identical to the proof of Lemma 4.4.
In the Riemannian case we have:
4.10. Lemma (Evolution of v). Consider the flow (3.7) in a normal Gauss-
ian coordinate system (xα), where the M(t) can be written as graphs of a func-
tion u(t) over some compact Riemannian manifold S0. Then the quantity
(4.38) v =
1 + |Du|2 = (rανα)−1
satisfies the evolution equation
(4.39)
v̇ − Φ̇F ijvij =− Φ̇F ijhikhkj v − 2v−1Φ̇F ijvivj
+ [(Φ− f)− Φ̇F ]rαβνανβv2
+ 2Φ̇F ijhkjx
krαβv
2 + Φ̇F ijrαβγx
+ Φ̇F ijR̄αβγδν
+ f̃βx
k rαg
ikv2 + f̃νβx
ikxαi rαv
where r = x0 and (rα) = (1, 0, . . . , 0).
5. Stability of the limit hypersurfaces
5.1. Definition. Let N be semi-Riemannian, F a curvature operator, and
M ⊂ N a compact, spacelike hypersurface, such that M is admissible and
F ij , evaluated at (hij , gij), the second fundamental form and metric of M , is
divergence free, then M is said to be a stable solution of the equation
(5.1) F |M = f,
16 CLAUS GERHARDT
where f = f(x) is defined in a neighbourhood of M , if the first eigenvalue λ1
of the linearization, which is the operator in (3.33) on page 8, is non-negative,
or equivalently, if the quadratic form
(5.2)
F ijuiuj − σ
{F ijhki hkj + F ijR̄αβγδναx
γxδj + fαν
is non-negative for all u ∈ C2(M).
It is well-known that the corresponding eigenspace is then onedimensional
and spanned by a strictly positive eigenfunction η
(5.3) − F ijηij − σ{F ijhki hkj + F ijR̄αβγδναx
γxδj + fαν
α}η = λ1η.
Notice that F ij is supposed to be divergence free, which will be the case, if
F = Hk, 1 ≤ k ≤ n, and the ambient space has constant curvature, as we
shall prove at the end of this section. If k = 1, then F ij = gij and N can be
arbitrary, while in case k = 2, we have
(5.4) F ij = Hgij − hij ,
hence N Einstein will suffice.
To simplify the formulation of the assumptions let us define:
5.2. Definition. A curvature function F is said to be of class (D), if for
every admissible hypersurfaceM the tensor F ij , evaluated at M , is divergence
free.
We shall prove in this section that the limit hypersurface of a converging
curvature flow will be a stable stationary solution, if the initial flow velocity
has a weak sign.
5.3. Theorem. Suppose that the curvature flow (3.7) on page 5 exists for
all time, and that the leaves M(t) converge in C4 to a hypersurface M , where
the curvature function F is supposed to be of class (D). Then M is a stable
solution of the equation
(5.5) F |M = f
provided the velocity of the flow has a weak sign
(5.6) Φ− f̃ ≥ 0 ∨ Φ− f̃ ≤ 0
at t = 0 and M(0) is not already a solution of (5.5).
Proof. Convergence of a subsequence of the M(t) would actually suffice for
the proof, however, the assumption (5.6) immediately implies that the flow
converges, if a subsequence converges and a priori estimates in C4,α are valid.
The starting point is the evolution equation (3.20) on page 7 from which we
deduce in view of the parabolic maximum principle that Φ− f̃ has a weak sign
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 17
during the evolution, cf. [18, Proposition 2.7.1], i.e., if we assume without loss
of generality that at t = 0
(5.7) Φ− f̃ ≥ 0,
then this inequality will be valid for all t. Moreover, there holds
(5.8)
(Φ− f̃) > 0 ∀ 0 ≤ t <∞
if this relation is valid for t = 0, as we shall prove in the lemma below.
On the other hand, the assumption
(5.9)
(Φ− f̃) > 0
is a natural assumption, for otherwise the initial hypersurface would already
be a stationary solution which of course may not be stable.
Notice also that apart from the factor Φ̇ the equation (3.20) looks like the
parabolic version of the linearization of (F − f). If the technical function
Φ = Φ(r) is not the trivial one Φ(r) = r, then we always assume that f > 0
and that this is also valid for the limit hypersurface M . Only in case Φ(r) = r
and F = H , we allow f to be arbitrary.
Thus, our assumptions imply that in any case
(5.10) Φ̇ > ǫ0 > 0 ∀ t ∈ R+.
Furthermore, we derive from (3.20) that not only the elliptic part converges
to 0 but also
(5.11) (Φ− f̃)′ = Φ̇Ḟ + σΦ̇(f)fανα(Φ− f̃),
i.e.,
(5.12) lim Ḟ = 0.
Suppose now that M is not stable, then the first eigenvalue λ1 is negative
and there exists a strictly positive eigenfunction η solving the equation (5.3)
evaluated atM . Let U be a tubular neighbourhood ofM with a corresponding
future directed normal Gaussian coordinate system (xα) and extend η to U by
setting
(5.13) η(x0, x) = η(x),
where, by a slight abuse of notation, we also denote (xi) by x. Thus there holds
(5.14) ηαν
α = 0
in M , and choosing U sufficiently small, we may assume
(5.15) |ηανα| < η
for all hypersurfaces M(t) ⊂ U .
Now consider the term
(5.16)
Φ̇−1(Φ− f̃)η
18 CLAUS GERHARDT
for large t, which converges to 0. Since it is positive, in view of (5.8), there
must exist a sequence of t, not explicitly labelled, tending to infinity such that
(5.17)
0 ≥ d
Φ̇−1(Φ− f̃)η
−Φ̇−2Φ̈Ḟ (Φ− f̃)η +
Φ̇−1(Φ− f̃)′η
Φ̇−1ηαν
α(Φ− f̃)2 − σ
Φ̇−1(Φ− f̃)2Hη,
where we used the relation (3.8) on page 5 to derive the last integral.
The rest of the proof is straight-forward. Multiply the equation (3.20) by
Φ̇−1η and integrate over M(t) for those values of t satisfying the preceding
inequality to deduce
(5.18)
Φ̇−1(Φ− f̃)′ =
−F ijηij(Φ− f̃)
{F ijhki hkj + F ijR̄αβγδναx
γxδj + Φ̇
−1Φ̇(f)fαν
α}η(Φ− f̃),
and conclude further that the right-hand side can be estimated from above by
(5.19)
η(Φ− f̃)
for large t, while the left-hand side can be estimated from below by
(5.20) − ǫ(t)
(Φ− f̃)η
such that
(5.21) ǫ(t) > 0 ∧ lim ǫ(t) = 0
in view of (5.17), where we used (5.12), (5.15) as well as
(5.22) lim(Φ− f̃) = 0;
a contradiction because of (5.8). �
5.4. Lemma. Let M(t) be a solution of the curvature flow (3.7) on page 5
defined on a maximal time interval [0, T ∗), 0 < T ∗ ≤ ∞, and suppose that
Φ− f̃ has a weak sign at t = 0, e.g.,
(5.23) (Φ− f̃) ≥ 0
and suppose furthermore that
(5.24)
(Φ− f̃) > 0,
(5.25)
(Φ− f̃) > 0 ∀ 0 ≤ t < T ∗.
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 19
Proof. Let M0 be an abstract compact Riemannian manifold that is being
isometrically embedded inN with imageM(0). Let (ξi) be a generic coordinate
system for M0 and abbreviate (Φ− f̃ ) by u. The evolution equation (3.20) can
then be looked at as a linear parabolic equation for u = u(t, ξ) on M0 with
time dependent coefficients and time dependent Riemannian metric gij(t, ξ).
By assumption u doesn’t vanish identically at t = 0, i.e., there exists a ball
Bρ = Bρ(ξ0) such that
(5.26) u(0, ξ) > 0 ∀ ξ ∈ B̄ρ(ξ0).
Let C be the cylinder
(5.27) C = [0, T ∗)× B̄ρ
and assume that there exists a first t0 > 0 such that
(5.28) inf
u(t0, ·) = 0 = u(t0, ξ1).
We shall show that this is not possible: If ξ1 ∈ Bρ, then this contradicts the
strong parabolic maximum principle, cf. [18, Lemma 2.7.1], and if ξ1 ∈ ∂Bρ,
then we deduce from [18, Lemma 2.7.4] (a parabolic version of the Hopf Lemma)
(5.29)
(t0, ξ1) < 0,
where ν is the exterior normal of the ball Bρ in ξ1, contradicting the fact that
the gradient of u(t0, ·) vanishes in ξ1 because it is a minimum point; notice that
we already know u ≥ 0 in [0, T ∗)×M0. �
For some curvature operators one can prove a priori estimates for the second
fundamental form only for stationary solutions and not for the leaves of a
corresponding curvature flow. In order to use a curvature flow to obtain a
stationary solution one uses
ǫ-regularization“, i.e., instead of the curvature
function F one considers
(5.30) F̃ (hij) = F (hij + ǫHgij)
for ǫ > 0, and starts a curvature flow with F̃ and fixed ǫ > 0.
A priori estimates for the regularized flow are usually fairly easily derived,
since
(5.31) F̃ ij = F ij + ǫF klgklg
but of course the estimates depend on ǫ. Having uniform estimates one can
deduce that the flow—or at least a subsequence—converges to a limit hyper-
surfaces Mǫ satisfying
(5.32) F̃ |Mǫ = f.
Then, if uniform C4,α-estimates for the Mǫ can be derived, a subsequence will
converge to a solution M of
(5.33) F |M = f,
20 CLAUS GERHARDT
cf. [13], where this method has been used to find hypersurfaces of prescribed
scalar curvature in Lorentzian manifolds, see also Theorem 6.9 on page 36.
We shall now show that the solutions M obtained by this approach are all
stable, if F is of class (D) and the initial velocities of the regularized flows have
a weak sign. Notice that the curvature functions F̃ are in general not of class
5.5. Theorem. Let F be of class (D), then any solution M of
(5.34) F |M = f
obtained by a regularized curvature flow as described above is stable, provided
the initial velocity of the regularized flow has a weak sign, i.e., it satisfies
(5.35) Φ− f̃ ≥ 0 ∨ Φ− f̃ ≤ 0
at t = 0 and the flow hypersurfaces converge to the stationary solution in C4.
Proof. Let Mǫ be the limit hypersurfaces of the regularized flow for ǫ > 0, and
assume that the Mǫ satisfy uniform C
4,α-estimates such that a subsequence,
not relabelled, converges in C4 to a compact spacelike hypersurface M solving
the equation
(5.36) F |M = f.
Assume that M is not stable so that the first eigenvalue of the linearization
is negative and there exists a strictly positive eigenfunction η satisfying (5.3).
Extend η in a small tubular neighbourhood U of M such that (5.15) is valid
for all Mǫ, if ǫ is small, ǫ < ǫ0.
For those ǫ we then deduce
(5.37)
−F̃ ijηij − F̃ ij;ijη − 2F̃
;j ηi
− σ{F̃ ijhki hkj + F̃ ijR̄αβγδναx
γxδj + fαν
α}η < λ1
where the inequality is evaluated at Mǫ and where we used the convergence in
Now, fix ǫ, ǫ < ǫ0, then the preceding inequality is also valid for the flow
hypersurfaces M(t) converging to Mǫ, if t is large, and the same arguments as
at the end of the proof of Theorem 5.3 lead to a contradiction. Hence, M has
to be a stable solution. �
Knowing that a solution is stable often allows to deduce further geometric
properties of the underlying hypersurface like that it is either strictly stable
or totally geodesic especially if the curvature function is the mean curvature,
cf. e.g., [29], where the stability property has been extensively used to deduce
geometric properties.
We want to prove that a neighbourhood of stable solutions can be foliated
by a family of hypersurfaces satisfying the equation modulo a constant.
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 21
5.6. Theorem. Let M ⊂ N be compact, spacelike, orientable and a stable
solution of
(5.38) G|M ≡ (F − f)|M = 0,
where F is of class (D) and M as well as F , f are of class Cm,α, 2 ≤ m ≤ ∞,
0 < α < 1, then a neighbourhood of M can be foliated by a family
(5.39) Λ = {Mǫ : |ǫ| < ǫ0 }
of spacelike Cm,α-hypersurfaces satisfying
(5.40) G|Mǫ = τ(ǫ),
where τ is a real function of class Cm,α. The Mǫ can be written as graphs over
M in a tubular neighbourhood of M
(5.41) Mǫ = { (u(ǫ, x), x) : x ∈M }
such that u is of class Cm,α in both variables and there holds
(5.42) u̇ > 0.
Proof. (i) Let us assume that M is strictly stable. Consider a tubular neigh-
bourhood of M with corresponding normal Gaussian coordinates (xα) such
that M = {x0 = 0}. The nonlinear operator G can then be viewed as an
elliptic operator
(5.43) G : Bρ(0) ⊂ Cm,α(M) → Cm−2,α(M)
where ρ is so small that all corresponding graphs are admissible.
In a smaller ball DG is a topological isomorphism, sinceM is strictly stable,
and hence G is a diffeomorphism in a neighbourhood of the origin, and there
exist smooth unique solutions
(5.44) Mǫ = { u(ǫ, x) : x ∈M } |ǫ| < ǫ0
of the equations
(5.45) G|Mǫ = ǫ
such that u ∈ Cm,α((−ǫ0, ǫ0)×M).
Differentiating with respect to ǫ yields
(5.46) DGu̇ = 1.
Let us consider this equation for ǫ = 0, i.e., on M , and define
(5.47) η = min(u̇, 0).
Then we deduce
(5.48) 0 ≤
〈DGη, η〉 =
η ≤ 0,
and hence there holds
(5.49) u̇ ≥ 0,
because of the strict stability of M .
22 CLAUS GERHARDT
Applying then the maximum principle to (5.46), we deduce further
(5.50) inf
u̇ > 0,
hence the hypersurfaces form a foliation if ǫ0 is chosen small enough such that
(5.51) inf
u̇(ǫ, ·) > 0 ∀ |ǫ| < ǫ0.
(ii) Assume now thatM is not strictly stable. After introducing coordinates
corresponding to a tubular neighbourhood U of M as in part (i) any function
u ∈ Cm,α(M) with |u|m,α small enough defines an admissible hypersurface
(5.52) M(u) = graphu ⊂ U
such that G|M(u) can be expressed as
(5.53) G|M(u) = G(u).
(5.54) A = DG(0),
then A is self-adjoint, monotone
(5.55) 〈Au, u〉 ≥ 0 ∀u ∈ H1,2(M)
and the smallest eigenvalue of A is equal to zero, the corresponding eigenspace
spanned by a strictly positive eigenfunction η.
Similarly as in [2, p. 621] we consider the operator
(5.56) Ψ(u, τ) = (G(u) − τ, ϕ(u))
defined in Bρ(0) × R, Bρ(0) ⊂ Cm,α(M) for small ρ > 0, where ϕ is a linear
functional
(5.57) ϕ(u) =
Ψ is of class Cm,α and maps
(5.58) Ψ : Bρ(0)× R → Cm−2,α(M)× R,
such that
(5.59) DΨ =
DG −1
evaluated at (0, 0) is bijective as one easily checks. Indeed let (u, ǫ) satisfy
(5.60) DΨ(u, ǫ) = (0, 0),
(5.61) Au = DGu = ǫ ∧
ηu = 0,
hence
(5.62) ǫ
η = 〈Au, η〉 = 〈u,Aη〉 = 0
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 23
and we conclude ǫ = 0 as well as u = 0.
To prove the surjectivity, let (w, δ) ∈ Cm−2,α(M)×R be arbitrary. Choosing
(5.63) ǫ = −
we deduce
(5.64)
(ǫ+ w)η = 0,
hence there exists ū ∈ Cm,α(M) solving
(5.65) Aū = ǫ+ w
(5.66) u = ū+ λη
(5.67) λ = δ −
then satisfies
(5.68)
ηu = ǫ,
i.e.,
(5.69) DΨ(u, ǫ) = (w, δ).
Applying the inverse function theorem we conclude that there exists ǫ0 > 0
and functions (u(ǫ, x), τ(ǫ)) of class Cm,α in both variables such that
(5.70) G(u(ǫ)) = τ(ǫ) ∧
ηu(ǫ) = ǫ ∀ |ǫ| < ǫ0;
τ(ǫ) is constant for fixed ǫ.
The hypersurfaces
(5.71) Λ = {Mǫ =M(u(ǫ)) : |ǫ| < ǫ0 }
will form a foliation, if we can show that
(5.72) u̇ 6= 0.
Differentiating the equations in (5.70) with respect to ǫ and evaluating the
result at ǫ = 0 yields
(5.73) Au̇(0) = τ̇(0) ∧
ηu̇(0) = 1
and we deduce further
(5.74) τ̇ (0)
η = 〈Au̇(0), η〉 = 〈u̇(0), Aη〉 = 0
and thus
(5.75) τ̇ (0) = 0 ∧ u̇(0) = η > 0,
24 CLAUS GERHARDT
if η is normalized such that 〈η, η〉 = 1, i.e., we have u̇(ǫ) > 0, if ǫ0 is chosen
small enough. �
5.7. Remark. Let M be a stable solution of
(5.76) G|M = 0
as in the preceding theorem, but not strictly stable and let Mǫ be a foliation
of a neighbourhood of M such that
(5.77) G|Mǫ = τ(ǫ) ∀ |ǫ| < ǫ0.
If M is the limit hypersurface of a curvature flow as in Theorem 5.3, then
(5.78) τ(ǫ) > 0 ∀ 0 < ǫ < ǫ0,
if the flow hypersurfacesM(t) converge to M from above, which is tantamount
(5.79) Φ(F )− f̃ ≥ 0,
or we have
(5.80) τ(ǫ) < 0 ∀ − ǫ0 < ǫ < 0,
(5.81) Φ(F )− f̃ ≤ 0,
in which case the flow hypersurfaces converge to M from below.
The direction
above“ is defined by the region the normal σν of M points
Proof. Let us assume that the flow hypersurfaces satisfy (5.79) and fix 0 < ǫ <
ǫ0. We may also suppose that the initial hypersurface M(0) doesn’t intersect
the tubular neighbourhood of M which is being foliated by Mǫ. Now, fix
0 < ǫ < ǫ0, then there must be a first t > 0 such that M(t) touches Mǫ from
above which yields, in view of the maximum principle,
(5.82) G|Mǫ = τ(ǫ) > 0,
since τ(ǫ) ≤ 0 would imply τ(ǫ) = 0 and Mǫ = M(t), cf. [18, Theorem 2.7.9],
i.e.,M(t) would be a stationary solution, which is impossible as we have proved
in Lemma 5.4. �
Finally, let us show that the symmetric polynomials Hk, 1 ≤ k ≤ n, are of
class (D), if the ambient space has constant curvature.
5.8. Lemma. Let N be a semi-Riemannian space of constant curvature,
then the symmetric polynomials F = Hk, 1 ≤ k ≤ n, are of class (D). In case
k = 2 it suffices to assume N Einstein.
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 25
Proof. We shall prove the result by induction on k. First we note that the
cones of definition Γk ⊂ Rn of the Hk form an ordered chain
(5.83) Γk ⊂ Γk−1 ∀ 1 < k ≤ n,
cf. [7], so that a hypersurface admissible for Hk is also admissible for Hk−1.
For k = 1 we have
(5.84) F ij = gij
and the result is obviously valid for arbitrary N .
Thus let us assume that the result is already proved for 1 ≤ k < n. Set
F = Hk+1, F̂ = Hk and let M be an admissible hypersurface for F with
principal curvatures κi.
From the definition of the Hk’s we immediately deduce
(5.85) F̂ =
for fixed i, no summation over i, or equivalently,
(5.86) F̂ gij = F ij + F̂ jmhim,
notice that the last term is a symmetric tensor, since for any symmetric cur-
vature function F F ij and hij commute, cf. [18, Lemma 2.1.9]. Thus there
holds
(5.87) F ij = F̂ gij − F̂ jmhim
and we deduce, using the induction hypothesis,
(5.88)
;j = F̂
i − F̂ jmhim;j = F̂ i − F̂ jmh imj;
= F̂ i − F̂ i = 0,
where we applied the Codazzi equations at one point.
If F = H2, then
(5.89) F ij = Hgij − hij
and the assumption N Einstein suffices to conclude that F ij is divergence
free. �
6. Existence results
From now on we shall assume that ambient manifold N is Lorentzian, or
more precisely, that it is smooth, globally hyperbolic with a compact, connected
Cauchy hypersurface. Then there exists a smooth future oriented time function
x0 such that the metric in N can be expressed in Gaussian coordinates (xα) as
(6.1) ds̄2 = e2ψ{−(dx0)2 + σijdxidxj},
where x0 is the time function and the (xi) are local coordinates for
(6.2) S0 = {x0 = 0}.
26 CLAUS GERHARDT
S0 is then also a compact, connected Cauchy hypersurface. For a proof of the
splitting result see [4, Theorem 1.1], and for the fact that all Cauchy hyper-
surfaces are diffeomorphic and hence S0 is also compact and connected, see [3,
Lemma 2.2].
One advantage of working in globally hyperbolic spacetimes with a compact
Cauchy hypersurface is that all compact, connected spacelike Cm-hypersurfaces
M can be written as graphs over S0.
6.1. Lemma. Let N be as above and M ⊂ N a connected, spacelike hyper-
surface of class Cm, 1 ≤ m, then M can be written as a graph over S0
(6.3) M = graphu|S0
with u ∈ Cm(S0).
We proved this lemma under the additional hypothesis that M is achronal,
[10, Proposition 2.5], however, this assumption is unnecessary as has been
shown in [25, Theorem 1.1].
We are looking at the curvature flow (3.7) on page 5 and want to prove that
it converges to a stationary solution hypersurface, if certain assumptions are
satisfied.
The existence proof consists of four steps:
(i) Existence on a maximal time interval [0, T ∗).
(ii) Proof that the flow stays in a compact subset.
(iii) Uniform a priori estimates in an appropriate function space, e.g., C4,α(S0)
or C∞(S0), which, together with (ii), would imply T ∗ = ∞.
(iv) Conclusion that the flow—or at least a subsequence of the flow hypersur-
faces—converges if t tends to infinity.
The existence on a maximal time interval is always guaranteed, if the data
are sufficiently regular, since the problem is parabolic. If the flow hypersurfaces
can be written as graphs in a Gaussian coordinate system, as will always be the
case in a globally hyperbolic spacetime with a compact Cauchy hypersurface
in view of Lemma 6.1, the conditions are better than in the general case:
6.2. Theorem. Let 4 ≤ m ∈ N and 0 < α < 1, and assume the semi-
Riemannian space N to be of class Cm+2,α. Let the strictly monotone curvature
function F , the functions f and Φ be of class Cm,α and let M0 ∈ Cm+2,α be
an admissible compact, spacelike, connected, orientable3 hypersurface. Then the
curvature flow (3.7) on page 5 with initial hypersurface M0 exists in a maximal
time interval [0, T ∗), 0 < T ∗ ≤ ∞, where in case that the flow hypersurfaces
cannot be expressed as graphs they are supposed to be smooth, i.e, the conditions
should be valid for arbitrary 4 ≤ m ∈ N in this case.
3Recall that oriented simply means there exists a continuous normal, which will always
be the case in a globally hyperbolic spacetime.
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 27
A proof can be found in [18, Theorem 2.5.19, Lemma 2.6.1].
The second step, that the flow stays in a compact set, can only be achieved
by barrier assumptions, cf. Definition 2.1. Thus, let Ω ⊂ N be open and
precompact such that ∂Ω has exactly two components
(6.4) ∂Ω =M1
where M1 is a lower barrier for the pair (F, f) and M2 an upper barrier. More-
over, M1 has to lie in the past of M2
(6.5) M1 ⊂ I−(M2),
cf. [18, Remark 2.7.8].
Then the flow hypersurfaces will always stay inside Ω̄, if the initial hyper-
surface M0 satisfies M0 ⊂ Ω, [18, Theorem 2.7.9]. This result is also valid if
M0 coincides with one the barriers, since then the velocity (Φ− f̃) has a weak
sign and the flow moves into Ω for small t, if it moves at all, and the arguments
of the proof are applicable.
In Lorentzian manifolds the existence of barriers is associated with the pres-
ence of past and future singularities. In globally hyperbolic spacetimes, when
N is topologically a product
(6.6) N = I × S0,
where I = (a, b), singularities can only occur, when the endpoints of the interval
are approached. A singularity, if one exists, is called a crushing singularity, if
the sectional curvatures become unbounded, i.e.,
(6.7) R̄αβγδR̄
αβγδ → ∞
and such a singularity should provide a future resp. past barrier for the mean
curvature function H .
6.3. Definition. Let N be a globally hyperbolic spacetime with compact
Cauchy hypersurface S0 so that N can be written as a topological product
N = I × S0 and its metric expressed as
(6.8) ds̄2 = e2ψ(−(dx0)2 + σij(x0, x)dxidxj).
Here, x0 is a globally defined future directed time function and (xi) are lo-
cal coordinates for S0. N is said to have a future resp. past mean curvature
barrier, if there are sequences M+k resp. M
k of closed, spacelike, admissible
hypersurfaces such that
(6.9) lim
= ∞ resp. lim
(6.10) lim sup inf
x0 > x0(p) ∀ p ∈ N
28 CLAUS GERHARDT
resp.
(6.11) lim inf sup
x0 < x0(p) ∀ p ∈ N,
If one stipulates that the principal curvatures of the M+k resp. M
k tend to
plus resp. minus infinity, then these hypersurfaces could also serve as barriers
for other curvature functions. The past barriers would most certainly be non-
admissible for any curvature function except H .
6.4.Remark. Notice that the assumptions (6.9) alone already implies (6.10)
resp. (6.11), if either
(6.12) lim sup inf
x0 > a
resp.
(6.13) lim inf sup
x0 < b
where (a, b) = x0(N), or, if
(6.14) R̄αβν
ανβ ≥ −Λ ∀ 〈ν, ν〉 = −1.
where Λ ≥ 0.
Proof. It suffices to prove that the relation (6.10) is automatically satisfied
under the assumptions (6.12) or (6.14) by switching the light cone and replacing
x0 by −x0 in case of the past barrier.
Fix k, and let
(6.15) τk = inf
then the coordinate slice
(6.16) Mτk = {x0 = τk}
touches Mk from below in a point pk ∈ Mk where τk = x0(pk) and the maxi-
mum principle yields that in that point
(6.17) H |Mτk
≥ H |Mk ,
hence, if k tends to infinity the points (pk) cannot stay in a compact subset,
i.e.,
(6.18) lim supx0(pk) → b
(6.19) lim supx0(pk) → a.
We shall show that only (6.18) can be valid. The relation (6.19) evidently
contradicts (6.12).
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 29
In case the assumption (6.14) is valid, we consider a fixed coordinate slice
M0 = {x0 = const}, then all hypersurfaces Mk satisfying
(6.20) H |M0 < infMk
nΛ < inf
have to lie in the future of M0, cf. [18, Lemma 4.7.1], hence the result. �
A future mean curvature barrier certainly represents a singularity, at least
if N satisfies the condition
(6.21) R̄αβν
ανβ ≥ −Λ ∀ 〈ν, ν〉 = −1
where Λ ≥ 0, because of the future timelike incompleteness, which is proved in
[1], and is a generalization of Hawking’s earlier result for Λ = 0, [24]. But these
singularities need not be crushing, cf. [15, Section 2] for a counterexample.
The uniform a priori estimates for the flow hypersurfaces are the hardest
part in any existence proof. When the flow hypersurfaces can be written as
graphs it suffices to prove C1 and C2 estimates, namely, the induced metric
(6.22) gij(t, ξ) = 〈xi, xj〉
where x = x(t, ξ) is a local embedding of the flow, should stay uniformly
positive definite, i.e., there should exist positive constants ci, 1 ≤ i ≤ 2, such
(6.23) c1gij(0, ξ) ≤ gij(t, ξ) ≤ c2gij(0, ξ),
or equivalently, that the quantity
(6.24) ṽ = 〈η, ν〉,
where ν is the past directed normal of M(t) and η the vector field
(6.25) η = (ηα) = e
ψ(−1, 0, . . . , 0),
is uniformly bounded, which is achieved with the help of the parabolic equation
(4.37) on page 15, if it is possible at all.
However, in some special situations C1-estimates are automatically satisfied,
cf. Theorem 6.11 at the end of this section.
For the C2-estimates the principle curvatures κi of the flow hypersurfaces
have to stay in a compact set in the cone of definition Γ of F , e.g., if F is the
Gaussian curvature, then Γ = Γ+ and one has to prove that there are positive
constants ki, i = 1, 2 such that
(6.26) k1 ≤ κi ≤ k2 ∀ 1 ≤ i ≤ n
uniformly in the cylinder [0, T ∗)×M0, whereM0 is any manifold that can serve
as a base manifold for the embedding x = x(t, ξ).
The parabolic equations that are used for these curvature estimates are
(4.36) on page 14, usually for an upper estimate, and (4.33) on page 14 for the
lower estimate. Indeed, suppose that the flow starts at the upper barrier, then
(6.27) F ≥ f
30 CLAUS GERHARDT
at t = 0 and this estimate remains valid throughout the evolution because of
the parabolic maximum principle, use (4.36). Then, if upper estimates for the
κi have been derived and if f > 0 uniformly, then we conclude from (6.27) that
the κi stay in a compact set inside the open cone Γ , since
(6.28) F |∂Γ = 0.
To obtain higher order estimates we are going to exploit the fact that the
flow hypersurfaces are graphs over S0 in an essential way, namely, we look
at the associated scalar flow equation (4.21) on page 12 satisfied by u. This
equation is a nonlinear uniformly parabolic equation, where the operator Φ(F )
is also concave in hij , or equivalently, convex in uij , i.e., the C
2,α-estimates of
Krylov and Safonov, [26, Chapter 5.5] or see [28, Chapter 10.6] for a very clear
and readable presentation, are applicable, yielding uniform estimates for the
standard parabolic Hölder semi-norm
(6.29) [D2u]β,Q̄T
for some 0 < β ≤ α in the cylinder
(6.30) Q = [0, T )× S0,
independent of 0 < T < T ∗, which in turn will lead to Hm+2+α,
m+2+α
2 (Q̄T )
estimates, cf. [18, Theorem 2.5.9, Remark 2.6.2].
Hm+2+α,
m+2+α
2 (Q̄T ) is a parabolic Hölder space, cf. [27, p. 7] for the original
definition and [18, Note 2.5.4] in the present context.
The estimate (6.29) combined with the uniform C2-norm leads to uniform
C2,β(S0)-estimates independent of T .
These estimates imply that T ∗ = ∞.
Thus, it remains to prove that u(t, ·) converges in Cm+2(S0) to a stationary
solution ũ, which is then also of class Cm+2,α(S0) in view of the Schauder
theory.
Because of the preceding a priori estimates u(t, ·) is precompact in C2(S0).
Moreover, we deduce from the scalar flow equation (4.21) on page 12 that u̇
has a sign, i.e., the u(t, ·) converge monotonely in C0(S0) to ũ and therefore
also in C2(S0).
To prove that graph ũ is a solution, we again look at (4.21) and integrate it
with respect to t to obtain for fixed x ∈ S0
(6.31) |ũ(x) − u(t, x)| =
e−ψv|Φ− f̃ |,
where we used that (Φ− f̃) has a sign, hence (Φ− f̃)(t, x) has to vanish when
t tends to infinity, at least for a subsequence, but this suffices to conclude that
graph ũ is a stationary solution and
(6.32) lim
(Φ− f̃) = 0.
Using the convergence of u to ũ in C2, we can then prove:
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 31
6.5. Theorem. The functions u(t, ·) converge in Cm+2(S0) to ũ, if the data
satisfy the assumptions in Theorem 6.2, since we have
(6.33) u ∈ Hm+2+β,
m+2+β
2 (Q̄),
where Q = Q∞.
Proof. Out of convention let us write α instead of β knowing that α is the
Hölder exponent in (6.29).
We shall reduce the Schauder estimates to the standard Schauder estimates
in Rn for the heat equation with a right-hand side by using the already estab-
lished results (6.29) and
(6.34) u(t, ·) →
C2(S0)
ũ ∈ Cm+2,α(S0).
Let (Uk) be a finite open covering of S0 such that each Uk is contained in a
coordinate chart and
(6.35) diamUk < ρ,
ρ small, ρ will be specified in the proof, and let (ηk) be a subordinate finite
partition of unity of class Cm+2,α.
Since
(6.36) u ∈ Hm+2+α,
m+2+α
2 (Q̄T )
for any finite T , cf. [18, Lemma 2.6.1], and hence
(6.37) u(t, ·) ∈ Cm+2,α(S0) ∀ 0 ≤ t <∞
we shall choose u0 = u(t0, ·) as initial value for some large t0 such that
(6.38) |aij(t, ·)− ãij |0,S0 < ǫ/2 ∀ t ≥ t0,
where
(6.39) aij = v2Φ̇F ij
and ãij is defined correspondingly for M̃ = graph ũ.
However, making a variable transformation we shall always assume that
t0 = 0 and u0 = u(0, ·).
We shall prove (6.33) successively.
(i) Let us first show that
(6.40) Dxu ∈ H2+α,
2 (Q̄).
This will be achieved, if we show that for an arbitrary ξ ∈ Cm+1,α(T 1,0(S0))
(6.41) ϕ = Dξu ∈ H2+α,
2 (Q̄),
cf. [18, Remark 2.5.11].
Differentiating the scalar flow equation (4.21) on page 12 with respect to ξ
we obtain
(6.42) ϕ̇− aijϕij + biϕi + cϕ = f,
32 CLAUS GERHARDT
where of course the symbol f has a different meaning then in (4.21).
Later we want to apply the Schauder estimates for solutions of the heat flow
equation with right-hand side. In order to use elementary potential estimates
we have to cut off ϕ near the origin t = 0 by considering
(6.43) ϕ̃ = ϕθ,
where θ = θ(t) is smooth satisfying
(6.44) θ(t) =
1, t > 1,
0, t ≤ 1
This modification doesn’t cause any problems, since we already have a priori
estimates for finite t, and we are only concerned about the range 1 ≤ t < ∞.
ϕ̃ satisfies the same equation as ϕ only the right-hand side has the additional
summand wθ̇.
Let η = ηk be one of the members of the partition of unity and set
(6.45) w = ϕ̃η,
then w satisfies a similar equation with slightly different right-hand side
(6.46) ẇ − aijwij + biwi + cw = f̃
but we shall have this in mind when applying the estimates.
The w(t, ·) have compact support in one of the Uk’s, hence we can replace
the covariant derivatives of w by ordinary partial derivatives without changing
the structure of the equation and the properties of the right-hand side, which
still only depends linearly on ϕ and Dϕ.
We want to apply the well-known estimates for the ordinary heat flow equa-
(6.47) ẇ −∆w = f̂
where w is defined in R × Rn.
To reduce the problem to this special form, we pick an arbitrary x0 ∈ Uk,
set z0 = (0, x0), z = (t, x) and consider instead of (6.46)
(6.48)
ẇ − aij(z0)wij = f̂
= [aij(z)− aij(z0)]wij − biwi − cw + f̃ ,
where we emphasize that the difference
(6.49) |aij(z)− aij(z0)|
can be made smaller than any given ǫ > 0 by choosing ρ = ρ(ǫ) in (6.35) and
t0 = t0(ǫ) in (6.38) accordingly. Notice also that this equation can be extended
into R × Rn, since all functions have support in {t ≥ 1
Let 0 < T <∞ be arbitrary, then all terms belong to the required function
spaces in Q̄T and there holds
(6.50) [w]2+α,QT ≤ c[f̂ ]α,QT ,
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 33
where c = c(n, α). The brackets indicate the standard unweighted parabolic
semi-norms, cf. [18, Definition 2.5.2], which are identical to those defined in
[27, p. 7], but there the brackets are replaced by kets.
Thus, we conclude
(6.51)
[w]2+α,QT ≤ c sup
Uk×(0,T )
|aij(z)− aij(z0)|[D2w]α,QT + c[f ]α,QT
+ c1{[D2u]α,Q+ + [Du]α,QT + [u]α,QT + |w|0,QT + |D2w|0,QT },
where c1 is independent of T , but dependent on ηk. Here we also used the fact
that the lower order coefficients and ϕ,Dϕ are uniformly bounded.
Choosing now ǫ > 0 so small that
(6.52) cǫ < 1
and ρ, t0 accordingly such the difference in (6.49) is smaller than ǫ, we deduce
(6.53)
[w]2+α,QT ≤ 2c[f ]α,QT
+ 2c1{[D2u]α,Q+ + [Du]α,QT + [u]α,QT + |w|0,QT + |D2w|0,QT }.
Summing over the partition of unity and noting that ξ is arbitrary we see
that in the preceding inequality we can replace w by Du everywhere resulting
in the estimate
(6.54)
[Du]2+α,QT ≤ c1[f ]α,QT
+ c1{[D2u]α,QT + [Du]α,QT + [u]α,QT + |Du|0,QT + |D3u|0,QT },
where c1 is a new constant still independent of T .
Now the only critical terms on the right-hand side are |D3u|0,QT , which can
be estimated by (6.57), and the Hölder semi-norms with respect to t
(6.55) [Du]α
,t,QT + [u]α2 ,t,QT .
The second one is taken care of by the boundedness of u̇, see (4.21) on page 12,
while the first one is estimated with the help of equation (6.42) revealing
(6.56) |Du̇| ≤ c{sup
[0,T ]
|u|3,S0 + |f |0,QT },
since for fixed but arbitrary t we have
(6.57) |u|3,S0 ≤ ǫ[D3u]α,S0 + cǫ|u|0,S0 ,
where cǫ is independent of t.
Hence we conclude
(6.58) |Du|2+α,QT ≤ const
uniformly in T .
(ii) Repeating these estimates successively for 2 ≤ l ≤ m we obtain uniform
estimates for
(6.59)
[Dlxu]2+α,QT ,
34 CLAUS GERHARDT
which, when combined with the uniform C2-estimates, yields
(6.60) |u(t, ·)|m+2,α,S0 ≤ const
uniformly in 0 ≤ t <∞.
Looking at the equation (4.21) we then deduce
(6.61) |u̇(t, ·)|m,α,S0 ≤ const
uniformly in t.
(iii) To obtain the estimates for Drtu up to the order
(6.62) [m+2+α
we differentiate the scalar curvature equation with respect to t as often as
necessary and also with respect to the mixed derivatives DrtD
x to estimate
(6.63)
1≤2r+s<m+2+α
using (6.60), (6.61) and the results from the prior differentiations.
Combined with the estimates for the heat equation in R×Rn these estimates
will also yield the necessary a priori estimates for the Hölder semi-norms in Q̄,
where again the smallness of (6.49) has to be used repeatedly. �
6.6. Remark. The preceding regularity result is also valid in Riemannian
manifolds, if the flow hypersurfaces can be written as graphs in a Gaussian
coordinate system. In fact the proof is unaware of the nature of the ambient
space.
With the method described above the following existence results have been
proved in globally hyperbolic spacetimes with a compact Cauchy hypersurface.
Ω ⊂ N is always a precompact domain the boundary of which is decomposed
as in (6.4) and (6.5) into an upper and lower barrier for the pair (F, f). We
also apply the stability results from Section 5 and the just proved regularity of
the convergence and formulate the theorems accordingly.
By convergence of the flow in Cm+2 we mean convergence of the leaves
M(t) = graphu(t, ·) in this norm.
6.7.Theorem. LetM1, M2 be lower resp. upper barriers for the pair (H, f),
where f ∈ Cm,α(Ω̄) and the Mi are of class Cm+2,α, 4 ≤ m, 0 < α < 1, then
the curvature flow
(6.64)
ẋ = (H − f)ν
x(0) = x0,
where x0 is an embedding of the initial hypersurface M0 = M2 exists for all
time and converges in Cm+2 to a stable solution M of class Cm+2,α of the
equation
(6.65) H |M = f,
provided the initial hypersurface is not already a solution.
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 35
The existence result was proved in [11, Theorem 2.2], see also [18, Theorem
4.2.1] and the remarks following the theorem. Notice that f isn’t supposed to
satisfy any sign condition.
For spacetimes that satisfy the timelike convergence condition and for func-
tions f with special structural conditions existence results via a mean curvature
flow were first proved in [6].
The Gaussian curvature or the curvature functions F belonging to the larger
class (K∗), see [10] for a definition, require that the admissible hypersurfaces
are strictly convex.
Moreover, proving a priori estimates for the second fundamental form of
a hypersurface M in general semi-Riemannian manifolds, when the curvature
function is not the mean curvature, or does not behave similar to it, requires
that a strictly convex function χ is defined in a neighbourhood of the hypersur-
face, see Lemma 2.2 on page 3 where sufficient assumptions are stated which
imply the existence of strictly convex functions.
Furthermore, when we consider curvature functions of class (K∗), notice
that the Gaussian curvature belongs to that class, then the right-hand side f
can be defined in T (Ω̄) instead of Ω̄, i.e., in a local trivialization of the tangent
bundle f can be expressed as
(6.66) f = f(x, ν) ∧ ν ∈ Tx(N).
We shall formulate the existence results with this more general assumption,
though of course any stability claim only makes sense for f = f(x).
6.8. Theorem. Let F ∈ Cm,α(Γ+), 4 ≤ m, 0 < α < 1, be a curvature
function of class (K∗), let 0 < f ∈ Cm,α(T (Ω̄)), and let M1, M2 be lower resp.
upper barriers for (F, f) of class Cm+2,α. Then the curvature flow
(6.67)
ẋ = (Φ− f̃)ν
x(0) = x0
where Φ(r) = log r and x0 is an embedding of M0 =M2, exists for all time and
converges in Cm+2 to a stationary solution M ∈ Cm+2,α of the equation
(6.68) F |M = f
provided the initial hypersurface M2 is not already a stationary solution and
there exists a strictly convex function χ ∈ C2(Ω̄).
When f = f(x) and F is of class (D), then M is stable.
The theorem was proved in [10] when f is only defined in Ω̄ and in the
general case in [18, Theorem 4.1.1].
When F = H2 is the scalar curvature operator, then the requirement that
f is defined in the tangent bundle and not merely in N is a necessity, if the
scalar curvature is to be prescribed. To prove existence results in this case, f
36 CLAUS GERHARDT
has to satisfy some natural structural conditions, namely,
0 < c1 ≤ f(x, ν) if 〈ν, ν〉 = −1,(6.69)
|||fβ(x, ν)||| ≤ c2(1 + |||ν|||2),(6.70)
|||fνβ (x, ν)||| ≤ c3(1 + |||ν|||),(6.71)
for all x ∈ Ω̄ and all past directed timelike vectors ν ∈ Tx(Ω), where ||| · ||| is a
Riemannian reference metric.
Applying a curvature flow to obtain stationary solutions requires to approx-
imate F and f by functions Fǫ and fk and to use these functions for the flow.
The Fǫ are the ǫ-regularizations of F , which we already discussed before, cf.
(5.30) on page 19. Let us also write F̃ instead of Fǫ as before.
The functions fk have the property that |||fkβ ||| only grows linearly in |||ν|||
and |||fkνβ (x, ν)||| is bounded. To simplify the presentation we shall therefore
assume that f satisfies
(6.72) |||fβ(x, ν)||| ≤ c2(1 + |||ν|||),
(6.73) |||fνβ (x, ν)||| ≤ c3,
and also
(6.74) 0 < c1 ≤ f(x, ν) ∀ ν ∈ Tx(N), 〈ν, ν〉 < 0,
although the last assumption is only a minor point that can easily be dealt with,
see [13, Remark 2.6], and [13, Section 7 and 8] for the other approximations of
The barriers Mi, i = 1, 2, for (F, f) satisfy the barrier condition of course
only weakly, i.e., no strict inequalities; however, because of the ǫ-regularization
we need strict inequalities, so that the Mi’s are also barriers for (F̃ , f), if ǫ is
small. In [13, Remark 2.4 and Lemma 2.5] it is shown that strict inequalities
for the barriers may be assumed without loss of generality.
Now, we can formulate the existence result for the scalar curvature operator
F = H2 under these provisions.
6.9. Theorem. Let f ∈ Cm,α(T (Ω̄)), 4 ≤ m, 0 < α < 1, satisfy the con-
ditions (6.72), (6.73) and (6.74), and let M1, M2 be strict lower resp. upper
barriers of class Cm+2,α for (F, f). Let F̃ be the ǫ-regularization of the scalar
curvature operator F , then the curvature flow for F̃
(6.75)
ẋ = (Φ− f̃)
x(0) = x0
where Φ(r) = r
2 and x0 is an embedding of M0 = M2, exists for all time and
converges in Cm+2 to a stationary solution Mǫ ∈ Cm+2,α of
(6.76) F̃ |Mǫ = f
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 37
provided there exists a strictly convex function χ ∈ C2(Ω̄) and 0 < ǫ is small.
The Mǫ then converge in C
m+2 to a solution M ∈ Cm+2,α of
(6.77) F |M = f.
If f = f(x) and N Einstein, then M is stable.
These statements, except for the stability and the convergence in Cm+2, are
proved in [13].
6.10. Remark. Let us now discuss the pure mean curvature flow
(6.78) ẋ = Hν
with initial spacelike hypersurface M0 of class C
m+2,α, m ≥ 4 and 0 < α < 1.
From the corresponding scalar curvature flow (4.21) on page 12 we immediately
infer that the flow moves into the past of M0, if
(6.79) H |M0 ≥ 0
and into its future, if
(6.80) H |M0 ≤ 0.
Let us only consider the case (6.79) and also assume that M0 is not maximal.
From the a priori estimates in [11, Section 3 and Section 4] we then deduce
that the flow remains smooth as long as it stays in a compact set of N , and if
a compact, spacelike hypersurface M1 of class C
2 satisfying
(6.81) H |M1 ≤ 0
lies in the past of M0, then the flow will exist for all time and converge in
Cm+2 to a stable maximal hypersurface M , hence a neighbourhood of M can
be foliated by CMC hypersurfaces, where those in the future ofM have positive
mean curvature, in view of Remark 5.7 on page 24.
Thus, the flow will converge if and only if such a hypersurfaceM1 lies in the
past of M0.
Conversely, if there exists a compact, spacelike hypersurface M1 in N sat-
isfying (6.81), and there is no stable maximal hypersurface in its future, then
this is a strong indication that N has no future singularity, assuming that
such a singularity would produce spacelike hypersurfaces with positive mean
curvature.
An example of such a spacetime is the (n + 1)-dimensional de Sitter space
which is geodesically complete and has exactly one maximal hypersurface M
which is also totally geodesic but not stable, and the future resp. past of M
are foliated by coordinate slices with negative resp. positive mean curvature.
To conclude this section let us show which spacelike hypersurfaces satisfy
C1-estimates automatically.
38 CLAUS GERHARDT
6.11. Theorem. Let M = graphu|S0 be a compact, spacelike hypersurface
represented in a Gaussian coordinate system with unilateral bounded principal
curvatures, e.g.,
(6.82) κi ≥ κ0 ∀ i.
Then, the quantity ṽ = 1√
1−|Du|2
can be estimated by
(6.83) ṽ ≤ c(|u|,S0, σij , ψ, κ0),
where we assumed that in the Gaussian coordinate system the ambient metric
has the form as in (6.1).
Proof. We suppose as usual that the Gaussian coordinate system is future
oriented, and that the second fundamental form is evaluated with respect to
the past directed normal. We observe that
(6.84) ‖Du‖2 = gijuiuj = e−2ψ
|Du|2
hence, it is equivalent to find an a priori estimate for ‖Du‖.
Let λ be a real parameter to be specified later, and set
(6.85) w = 1
log‖Du‖2 + λu.
We may regard w as being defined on S0; thus, there is x0 ∈ S0 such that
(6.86) w(x0) = sup
and we conclude
(6.87) 0 = wi =
‖Du‖2
j + λui
in x0, where the covariant derivatives are taken with respect to the induced
metric gij , and the indices are also raised with respect to that metric.
Expressing the second fundamental form of a graph with the help of the
Hessian of the function
(6.88) e−ψv−1hij = −uij − Γ̄ 000uiuj − Γ̄ 00iuj − Γ̄ 00jui − Γ̄ 0ij .
we deduce further
(6.89)
λ‖Du‖4 = −uijuiuj
= e−ψ ṽhiju
iuj + Γ̄ 000‖Du‖4
+ 2Γ̄ 00ju
j‖Du‖2 + Γ̄ 0ijuiuj .
Now, there holds
(6.90) ui = gijuj = e
−2ψσijujv
and by assumption,
(6.91) hiju
iuj ≥ κ0‖Du‖2,
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 39
i.e., the critical terms on the right-hand side of (6.89) are of fourth order in
‖Du‖ with bounded coefficients, and we conclude that ‖Du‖ can’t be too large
in x0 if we choose λ such that
(6.92) λ ≤ −c|||Γ̄ 0αβ||| − 1
with a suitable constant c; w, or equivalently, ‖Du‖ is therefore uniformly
bounded from above. �
Especially for convex graphs over S0 the term ṽ is uniformly bounded as
long as they stay in a compact set.
7. The inverse mean curvature flow
Let us now consider the inverse mean curvature flow (IMCF)
(7.1) ẋ = −H−1ν
with initial hypersurfaceM0 in a globally hyperbolic spacetimeN with compact
Cauchy hypersurface S0.
N is supposed to satisfy the timelike convergence condition
(7.2) R̄αβν
ανβ ≥ 0 ∀ 〈ν, ν〉 = −1.
Spacetimes with compact Cauchy hypersurface that satisfy the timelike con-
vergence condition are also called cosmological spacetimes, a terminology due
to Bartnik.
In such spacetimes the inverse mean curvature flow will be smooth as long
as it stays in a compact set, and, if H |M0 > 0 and if the flow exists for all time,
it will necessarily run into the future singularity, since the mean curvature of
the flow hypersurfaces will become unbounded and the flow will run into the
future of M0. Hence the claim follows from Remark 6.4 on page 28.
However, it might be that the flow will run into the singularity in finite
time. To exclude this behaviour we introduced in [15] the so-called strong
volume decay condition, cf. Definition 7.2. A strong volume decay condition is
both necessary and sufficient in order that the IMCF exists for all time.
7.1. Theorem. Let N be a cosmological spacetime with compact Cauchy hy-
persurface S0 and with a future mean curvature barrier. Let M0 be a closed,
connected, spacelike hypersurface with positive mean curvature and assume fur-
thermore that N satisfies a future volume decay condition. Then the IMCF
(7.1) with initial hypersurface M0 exists for all time and provides a foliation of
the future D+(M0) of M0.
The evolution parameter t can be chosen as a new time function. The flow
hypersurfaces M(t) are the slices {t = const} and their volume satisfies
(7.3) |M(t)| = |M0|e−t.
Defining a new time function τ by choosing
(7.4) τ = 1− e−
40 CLAUS GERHARDT
we obtain 0 ≤ τ < 1,
(7.5) |M(τ)| = |M0|(1− τ)n,
and the future singularity corresponds to τ = 1.
Moreover, the length L(γ) of any future directed curve γ starting from M(τ)
is bounded from above by
(7.6) L(γ) ≤ c(1− τ),
where c = c(n,M0). Thus, the expression 1 − τ can be looked at as the radius
of the slices {τ = const} as well as a measure of the remaining life span of the
spacetime.
Next we shall define the strong volume decay condition.
7.2.Definition. Suppose there exists a time function x0 such that the future
end of N is determined by {τ0 ≤ x0 < b} and the coordinate slicesMτ = {x0 =
τ} have positive mean curvature with respect to the past directed normal for
τ0 ≤ τ < b. In addition the volume |Mτ | should satisfy
(7.7) lim
|Mτ | = 0.
A decay like that is normally associated with a future singularity and we
simply call it volume decay. If (gij) is the induced metric of Mτ and g =
det(gij), then we have
(7.8) log g(τ0, x)− log g(τ, x) =
2eψH̄(s, x) ∀x ∈ S0,
where H̄(τ, x) is the mean curvature ofMτ in (τ, x). This relation can be easily
derived from the relation (3.8) on page 5 and Remark 3.6 on page 7. A detailed
proof is given in [12].
In view of (7.7) the left-hand side of this equation tends to infinity if τ
approaches b for a.e. x ∈ S0, i.e.,
(7.9) lim
eψH̄(s, x) = ∞ for a.e. x ∈ S0.
Assume now, there exists a continuous, positive function ϕ = ϕ(τ) such that
(7.10) eψH̄(τ, x) ≥ ϕ(τ) ∀ (τ, x) ∈ (τ0, b)× S0,
where
(7.11)
ϕ(τ) = ∞,
then we say that the future of N satisfies a strong volume decay condition.
7.3. Remark. (i) By approximation we may assume that the function ϕ
above is smooth.
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 41
(ii) A similar definition holds for the past of N by simply reversing the time
direction. Notice that in this case the mean curvature of the coordinate slices
has to be negative.
7.4. Lemma. Suppose that the future of N satisfies a strong volume decay
condition, then there exist a time function x̃0 = x̃0(x0), where x0 is the time
function in the strong volume decay condition, such that the mean curvature H̄
of the slices x̃0 = const satisfies the estimate
(7.12) eψ̃H̄ ≥ 1.
The factor eψ̃ is now the conformal factor in the representation
(7.13) ds̄2 = e2ψ̃(−(dx̃0)2 + σijdxidxj).
The range of x̃0 is equal to the interval [0,∞), i.e., the singularity corre-
sponds to x̃0 = ∞.
A proof is given in [15, Lemma 1.4].
7.5. Remark. Theorem 7.1 can be generalized to spacetimes satisfying
(7.14) R̄αβν
ανβ ≥ −Λ ∀ 〈ν, ν〉 = −1
with a constant Λ ≥ 0, if the mean curvature of the initial hypersurface M0 is
sufficiently large
(7.15) H |M0 >
cf. [25]. In that thesis it is also shown that the future mean curvature barrier
assumption can be dropped, i.e., the strong volume decay condition is sufficient
to prove that the IMCF exists for all time and provides a foliation of the future
of M0. Hence, the strong volume decay condition already implies the existence
of a future mean curvature barrier, since the leaves of the IMCF define such a
barrier.
8. The IMCF in ARW spaces
In the present section we consider spacetimes N satisfying some structural
conditions, which are still fairly general, and prove convergence results for the
leaves of the IMCF.
Moreover, we define a new spacetime N̂ by switching the light cone and
using reflection to define a new time function, such that the two spacetimes
N and N̂ can be pasted together to yield a smooth manifold having a metric
singularity, which, when viewed from the region N is a big crunch, and when
viewed from N̂ is a big bang.
The inverse mean curvature flows in N resp. N̂ correspond to each other via
reflection. Furthermore, the properly rescaled flow in N has a natural smooth
extension of class C3 across the singularity into N̂ . With respect to this natural
diffeomorphism we speak of a transition from big crunch to big bang.
42 CLAUS GERHARDT
8.1. Definition. A globally hyperbolic spacetime N , dimN = n+1, is said
to be asymptotically Robertson-Walker (ARW) with respect to the future, if a
future end of N , N+, can be written as a product N+ = [a, b) × S0, where S0
is a Riemannian space, and there exists a future directed time function τ = x0
such that the metric in N+ can be written as
(8.1) ds̆2 = e2ψ̃{−(dx0)2 + σij(x0, x)dxidxj},
where S0 corresponds to x0 = a, ψ̃ is of the form
(8.2) ψ̃(x0, x) = f(x0) + ψ(x0, x),
and we assume that there exists a positive constant c0 and a smooth Riemann-
ian metric σ̄ij on S0 such that
(8.3) lim
eψ = c0 ∧ lim
σij(τ, x) = σ̄ij(x),
(8.4) lim
f(τ) = −∞.
Without loss of generality we shall assume c0 = 1. Then N is ARW with
respect to the future, if the metric is close to the Robertson-Walker metric
(8.5) ds̄2 = e2f{−dx02 + σ̄ij(x)dxidxj}
near the singularity τ = b. By close we mean that the derivatives of arbitrary
order with respect to space and time of the conformal metric e−2f ğαβ in (8.1)
should converge to the corresponding derivatives of the conformal limit metric
in (8.5) when x0 tends to b. We emphasize that in our terminology Robertson-
Walker metric does not imply that (σ̄ij) is a metric of constant curvature, it is
only the spatial metric of a warped product.
We assume, furthermore, that f satisfies the following five conditions
(8.6) − f ′ > 0,
there exists ω ∈ R such that
(8.7) n+ ω − 2 > 0 ∧ lim
|f ′|2e(n+ω−2)f = m > 0.
Set γ̃ = 1
(n+ ω − 2), then there exists the limit
(8.8) lim
(f ′′ + γ̃|f ′|2)
(8.9) |Dmτ (f ′′ + γ̃|f ′|2)| ≤ cm|f ′|m ∀m ≥ 1,
as well as
(8.10) |Dmτ f | ≤ cm|f ′|m ∀m ≥ 1.
If S0 is compact, then we call N a normalized ARW spacetime, if
(8.11)
det σ̄ij = |Sn|.
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 43
8.2. Remark. (i) If these assumptions are satisfied, then the range of τ is
finite, hence, we may—and shall—assume w.l.o.g. that b = 0, i.e.,
(8.12) a < τ < 0.
(ii) Any ARW spacetime with compact S0 can be normalized as one easily
checks. For normalized ARW spaces the constantm in (8.7) is defined uniquely
and can be identified with the mass of N , cf. [20].
(iii) In view of the assumptions on f the mean curvature of the coordinate
slices Mτ = {x0 = τ} tends to ∞, if τ goes to zero.
(iv) ARW spaces with compact S0 satisfy a strong volume decay condition,
cf. Definition 7.2 on page 40.
(v) Similarly one can define N to be ARW with respect to the past. In this
case the singularity would lie in the past, correspond to τ = 0, and the mean
curvature of the coordinate slices would tend to −∞.
We assume that N satisfies the timelike convergence condition and that S0
is compact. Consider the future end N+ of N and let M0 ⊂ N+ be a spacelike
hypersurface with positive mean curvature H̆ |M0 > 0 with respect to the past
directed normal vector ν̆—it will become apparent in a moment why we use the
symbols H̆ and ν̆ and not the usual ones H and ν. Then, as we have proved
in the preceding section, the inverse mean curvature flow
(8.13) ẋ = −H̆−1ν̆
with initial hypersurface M0 exists for all time, is smooth, and runs straight
into the future singularity.
If we express the flow hypersurfaces M(t) as graphs over S0
(8.14) M(t) = graphu(t, ·),
then we have proved in [14]
8.3. Theorem. (i) Let N satisfy the above assumptions, then the range of
the time function x0 is finite, i.e., we may assume that b = 0. Set
(8.15) ũ = ueγt,
where γ = 1
γ̃, then there are positive constants c1, c2 such that
(8.16) − c2 ≤ ũ ≤ −c1 < 0,
and ũ converges in C∞(S0) to a smooth function, if t goes to infinity. We shall
also denote the limit function by ũ.
(ii) Let ğij be the induced metric of the leaves M(t), then the rescaled metric
(8.17) e
tğij
converges in C∞(S0) to
(8.18) (γ̃m)
γ̃ (−ũ)
γ̃ σ̄ij .
44 CLAUS GERHARDT
(iii) The leaves M(t) get more umbilical, if t tends to infinity, namely, there
holds
(8.19) H̆−1|h̆ji − 1nH̆δ
i | ≤ ce
−2γt.
In case n+ ω − 4 > 0, we even get a better estimate
(8.20) |h̆ji − 1nH̆δ
i | ≤ ce
(n+ω−4)t.
To prove the convergence results for the inverse mean curvature flow, we con-
sider the flow hypersurfaces to be embedded in N equipped with the conformal
metric
(8.21) ds̄2 = −(dx0)2 + σij(x0, x)dxidxj .
Though, formally, we have a different ambient space we still denote it by the
same symbol N and distinguish only the metrics ğαβ and ḡαβ
(8.22) ğαβ = e
2ψ̃ ḡαβ
and the corresponding geometric quantities of the hypersurfaces h̆ij , ğij , ν̆ resp.
hij , gij , ν, etc., i.e., the standard notations now apply to the case when N is
equipped with the metric in (8.21).
The second fundamental forms h̆
i and h
i are related by
(8.23) eψ̃h̆
i = h
i + ψ̃αν
and, if we define F by
(8.24) F = eψ̃H̆,
(8.25) F = H − nṽf ′ + nψανα,
where
(8.26) ṽ = v−1,
and the evolution equation can be written as
(8.27) ẋ = −F−1ν,
since
(8.28) ν̆ = e−ψ̃ν.
The flow exists for all time and is smooth, due to the results in the preceding
section.
Next, we want to show how the metric, the second fundamental form, and
the normal vector of the hypersurfaces M(t) evolve by adapting the general
evolution equations in Section 3 on page 4 to the present situation.
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 45
8.4. Lemma. The metric, the normal vector, and the second fundamental
form of M(t) satisfy the evolution equations
(8.29) ġij = −2F−1hij ,
(8.30) ν̇ = ∇M (−F−1) = gij(−F−1)ixj ,
(8.31) ḣ
i = (−F
i + F
−1hki h
k + F
−1R̄αβγδν
γxδkg
(8.32) ḣij = (−F−1)ij − F−1hki hkj + F−1R̄αβγδναx
γxδj .
Since the initial hypersurface is a graph over S0, we can write
(8.33) M(t) = graphu(t)|S0 ∀ t ∈ I,
where u is defined in the cylinder R+×S0. We then deduce from (8.27), looking
at the component α = 0, that u satisfies a parabolic equation of the form
(8.34) u̇ =
where we emphasize that the time derivative is a total derivative, i.e.
(8.35) u̇ =
+ uiẋ
Since the past directed normal can be expressed as
(8.36) (να) = −e−ψv−1(1, ui),
we conclude from (8.34)
(8.37)
For this new curvature flow the necessary decay estimates and convergence
results can be proved, which in turn can be immediately translated to corre-
sponding convergence results for the original IMCF.
Transition from big crunch to big bang
With the help of the convergence results in Theorem 8.3, we can rescale the
IMCF such that it can be extended past the singularity in a natural way.
We define a new spacetime N̂ by reflection and time reversal such that the
IMCF in the old spacetime transforms to an IMCF in the new one.
By switching the light cone we obtain a new spacetime N̂ . The flow equation
in N is independent of the time orientation, and we can write it as
(8.38) ẋ = −H̆−1ν̆ = −(−H̆)−1(−ν̆) ≡ −Ĥ−1ν̂,
where the normal vector ν̂ = −ν̆ is past directed in N̂ and the mean curvature
Ĥ = −H̆ negative.
46 CLAUS GERHARDT
Introducing a new time function x̂0 = −x0 and formally new coordinates
(x̂α) by setting
(8.39) x̂0 = −x0, x̂i = xi,
we define a spacetime N̂ having the same metric as N—only expressed in the
new coordinate system—such that the flow equation has the form
(8.40) ˙̂x = −Ĥ−1ν̂,
where M(t) = graph û(t), û = −u, and
(8.41) (ν̂α) = −ṽe−ψ̃(1, ûi)
in the new coordinates, since
(8.42) ν̂0 = −ν̆0 ∂x̂
(8.43) ν̂i = −ν̆i.
The singularity in x̂0 = 0 is now a past singularity, and can be referred to
as a big bang singularity.
The unionN∪N̂ is a smooth manifold, topologically a product (−a, a)×S0—
we are well aware that formally the singularity {0}×S0 is not part of the union;
equipped with the respective metrics and time orientation it is a spacetime
which has a (metric) singularity in x0 = 0. The time function
(8.44) x̂0 =
x0, in N,
−x0, in N̂ ,
is smooth across the singularity and future directed.
N ∪ N̂ can be regarded as a cyclic universe with a contracting part N =
{x̂0 < 0} and an expanding part N̂ = {x̂0 > 0} which are joined at the
singularity {x̂0 = 0}.
It turns out that the inverse mean curvature flow, properly rescaled, defines
a natural C3- diffeomorphism across the singularity and with respect to this
diffeomorphism we speak of a transition from big crunch to big bang.
Using the time function in (8.44) the inverse mean curvature flows in N and
N̂ can be uniformly expressed in the form
(8.45) ˙̂x = −Ĥ−1ν̂,
where (8.45) represents the original flow in N , if x̂0 < 0, and the flow in (8.40),
if x̂0 > 0.
Let us now introduce a new flow parameter
(8.46) s =
−γ−1e−γt, for the flow in N,
γ−1e−γt, for the flow in N̂ ,
CURVATURE FLOWS IN SEMI-RIEMANNIAN MANIFOLDS 47
and define the flow y = y(s) by y(s) = x̂(t). y = y(s, ξ) is then defined in
[−γ−1, γ−1]× S0, smooth in {s 6= 0}, and satisfies the evolution equation
(8.47) y′ ≡ d
−Ĥ−1ν̂ eγt, s < 0,
Ĥ−1ν̂ eγt, s > 0.
In [14] we proved:
8.5. Theorem. The flow y = y(s, ξ) is of class C3 in (−γ−1, γ−1)×S0 and
defines a natural diffeomorphism across the singularity. The flow parameter s
can be used as a new time function.
8.6. Remark. The regularity result for the transition flow is optimal, i.e.,
given any 0 < α < 1, then there is an ARW space such that the transition flow
is not of class C3,α, cf. [19].
8.7. Remark. Since ARW spaces have a future mean curvature barrier, a
future end can be foliated by CMC hypersurfaces the mean curvature of which
can be used as a new time function., see [9] and [22]. In [8] we study this folia-
tion a bit more closely and prove that, when writing the CMC hypersurfaces as
graphs Mτ = graphϕ(τ, ·) in the special coordinate system of the ARW space,
where τ is the mean curvature, of Mτ then
(8.48) τ(−ϕ)1+γ̃
→ const > 0,
notice that ϕ < 0, and hence
(8.49) lim
ϕ(τ, x)
ϕ(τ, y)
= 1 ∀x, y ∈ S0.
Moreover, the new time function
(8.50) s = −τ−q, q = γ̃
1 + γ̃
can be extended to the mirror universe N̂ by odd reflection as a function of
class C3 across the singularity with non-vanishing gradient.
References
[1] Lars Andersson and Gregory Galloway, Ds/cft and spacetime topology, Adv. Theor.
Math. Phys. 68 (2003), 307–327, hep-th/0202161, 17 pages.
[2] Robert Bartnik, Remarks on cosmological spacetimes and constant mean curvature sur-
faces, Comm. Math. Phys. 117 (1988), no. 4, 615–624.
[3] Antonio N. Bernal and Miguel Sanchez, On smooth Cauchy hypersurfaces and Geroch’s
splitting theorem, Commun. Math. Phys. 243 (2003), 461–470, arXiv:gr-qc/0306108.
[4] , Smoothness of time functions and the metric splitting of globally hyperbolic
spacetimes, Comm. Math. Physics 257 (2005), 43–50, arXiv:gr-qc/0401112.
[5] Shiu Yuen Cheng and Shing Tung Yau, On the regularity of the solution of the n-
dimensional Minkowski problem, Comm. Pure Appl. Math. 29 (1976), no. 5, 495–516.
[6] Klaus Ecker and Gerhard Huisken, Parabolic methods for the construction of spacelike
slices of prescribed mean curvature in cosmological spacetimes., Commun. Math. Phys.
135 (1991), no. 3, 595–613.
http://arXiv.org/pdf/hep-th/0202161
http://arXiv.org/abs/gr-qc/0306108
http://arXiv.org/abs/gr-qc/0401112
48 CLAUS GERHARDT
[7] Lars Gȧrding, An inequality for hyperbolic polynomials, J. Math. Mech. 8 (1959), 957–
[8] Claus Gerhardt, Properties of the CMC foliation of ARW spaces, in preparation.
[9] , H-surfaces in Lorentzian manifolds, Commun. Math. Phys. 89 (1983), 523–553.
[10] , Hypersurfaces of prescribed curvature in Lorentzian manifolds, Indiana Univ.
Math. J. 49 (2000), 1125–1153, arXiv:math.DG/0409457.
[11] , Hypersurfaces of prescribed mean curvature in Lorentzian manifolds, Math. Z.
235 (2000), 83–97, arXiv:math.DG/0409465.
[12] , Estimates for the volume of a Lorentzian manifold, Gen. Relativity Gravitation
35 (2003), 201–207, math.DG/0207049.
[13] , Hypersurfaces of prescribed scalar curvature in Lorentzian manifolds, J. reine
angew. Math. 554 (2003), 157–199, math.DG/0207054.
[14] , The inverse mean curvature flow in ARW spaces - transition from big crunch
to big bang, 2004, arXiv:math.DG/0403485, 39 pages.
[15] , The inverse mean curvature flow in cosmological spacetimes, 2004,
arXiv:math.DG/0403097, 24 pages.
[16] , Minkowski type problems for convex hypersurfaces in the sphere, 2005,
arXiv:math.DG/0509217, 30 pages.
[17] , Analysis II, International Series in Analysis, International Press, Somerville,
MA, 2006, 395 pp.
[18] , Curvature Problems, Series in Geometry and Topology, vol. 39, International
Press, Somerville, MA, 2006, 323 pp.
[19] , The inverse mean curvature flow in Robertson-Walker spaces and its applica-
tion to cosmology, Methods Appl. Analysis 13 (2006), no. 1, 19–28, gr-qc/0404112.
[20] , The mass of a Lorentzian manifold, Adv. Theor. Math. Phys. 10 (2006), 33–48,
math.DG/0403002.
[21] , Minkowski type problems for convex hypersurfaces in hyperbolic space, 2006,
arXiv:math.DG/0602597, 32 pages.
[22] , On the CMC foliation of future ends of a spacetime, Pacific J. Math. 226
(2006), no. 2, 297–308, math.DG/0408197.
[23] Bo Guan and Pengfei Guan, Convex hypersurfaces of prescribed curvatures., Ann. Math.
156 (2002), 655–673.
[24] S. W. Hawking and G. F. R. Ellis, The large scale structure of space-time, Cambridge
University Press, London, 1973.
[25] Heiko Kröner, Der inverse mittlere Krümmungsfluß in Lorentz Mannigfaltigkeiten, 2006,
Diplomthesis, Heidelberg University.
[26] N. V. Krylov, Nonlinear elliptic and parabolic equations of the second order, Mathe-
matics and its Applications (Soviet Series), vol. 7, D. Reidel Publishing Co., Dordrecht,
1987.
[27] O.A. Ladyzhenskaya, V.A. Solonnikov, and N.N. Ural’tseva, Linear and quasi-linear
equations of parabolic type. Translated from the Russian by S. Smith, Translations of
Mathematical Monographs. 23. Providence, RI: American Mathematical Society (AMS).
XI, 648 p. , 1968 (English).
[28] Oliver C. Schnürer, Partielle Differentialgleichungen 2, 2006, pdf file, Lecture notes.
[29] Shing-Tung Yau, Geometry of three manifolds and existence of Black Hole due to bound-
ary effect, arXiv:math.DG/0109053.
Ruprecht-Karls-Universität, Institut für Angewandte Mathematik, Im Neuen-
heimer Feld 294, 69120 Heidelberg, Germany
E-mail address: gerhardt@math.uni-heidelberg.de
URL: http://www.math.uni-heidelberg.de/studinfo/gerhardt/
http://arXiv.org/abs/math.DG/0409457
http://arXiv.org/abs/math.DG/0409465
http://arXiv.org/pdf/math.DG/0207049
http://arXiv.org/pdf/math.DG/0207054
http://arXiv.org/pdf/math.DG/0403485
http://arXiv.org/abs/math.DG/0403097
http://arXiv.org/abs/math.DG/0509217
http://arXiv.org/pdf/gr-qc/0404112
http://arXiv.org/pdf/math.DG/0403002
http://arXiv.org/abs/math.DG/0602597
http://arxiv.org/abs/math.DG/0408197
http://page.mi.fu-berlin.de/~schnuere/skripte/pde.pdf
http://arXiv.org/abs/math.DG/0109053
	1. Introduction
	2. Notations and preliminary results
	3. Evolution equations for some geometric quantities
	4. Essential parabolic flow equations
	5. Stability of the limit hypersurfaces
	6. Existence results
	7. The inverse mean curvature flow
	8. The IMCF in ARW spaces
	Transition from big crunch to big bang
	References
ABSTRACT
  We prove that the limit hypersurfaces of converging curvature flows are
stable, if the initial velocity has a weak sign, and give a survey of the
existence and regularity results.

<|endoftext|><|startoftext|>
Introduction
As described by Castor, Abbott, and Klein (hereafter CAK),1) mass loss in the
form of a high velocity wind is driven from the surface of an OB star by radiation
pressure on a multitude of resonance transitions of intermediate charge states of
cosmically abundant elements. The wind is characterized by a mass-loss rate Ṁ ∼
10−6–10−5 M⊙ yr
−1 and a velocity profile V (R) ∼ V∞(1− ROB/R)
β , where β ≈ 1
the terminal velocity V∞ ∼ 3Vesc = 3 (2GMOB/ROB)
1/2 ∼ 1500 km s−1, R is the
distance from the OB star, and MOB and ROB are respectively the mass and radius
of the OB star. In a detached high-mass X-ray binary (HMXB), a compact object,
typically a neutron star, captures a fraction f ∼ πR2BH/4πa
2 of the OB star wind,
where a is the binary separation, RBH = 2GMNS/[V (a)
2 + c2s ] is the Bondi-Hoyle
radius, cs ∼ 10 (T/10
4)1/2 km s−1 is the sound speed, and T is the wind temperature.
Accretion of this material onto the neutron star powers an X-ray luminosity LX ∼
fGṀMNS/RNS ∼ 10
36–1037 erg s−1, where MNS and RNS are respectively the mass
and radius of the neutron star. The resulting X-ray flux photoionizes the wind and
reduces its ability to be radiatively driven, both because the higher ionization state
of the plasma results is a reduction in the number of resonance transitions, and
because the energy of the transitions shifts to shorter wavelengths where the overlap
with the stellar continuum is lower. To first order, the lower radiative driving results
in a reduced wind velocity near the neutron star V (a), which increases the Bondi-
Hoyle radius RBH, which increases the accretion efficiency f , which increases the
X-ray luminosity LX. In this way, the X-ray emission of HMXBs is the result of a
complex interplay between the radiative driving of the wind of the OB star and the
photoionization of the wind by the neutron star.
Known since the early days of X-ray astronomy, HMXBs have been extensively
∗) E-mail: mauche@cygnus.llnl.gov
∗∗) E-mail: liedahl1@llnl.gov
∗∗∗) E-mail: shizuka@llnl.gov
†) E-Mail: tomek@uchicago.edu
typeset using PTPTEX.cls 〈Ver.0.9〉
http://arxiv.org/abs/0704.0237v1
2 C. W. Mauche, D. A. Liedahl, S. Akiyama, and T. Plewa
studied observationally, theoretically,2)–4) and computationally.5)–8) They are excel-
lent targets for X-ray spectroscopic observations because the large covering fraction of
the wind and the moderate X-ray luminosities result in large volumes of photoionized
plasma that produce strong recombination lines and narrow radiative recombination
continua of H- and He-like ions, as well as fluorescent lines from lower charge states.
§2. Vela X-1
Vela X-1 is the prototypical detached HMXB, having been studied extensively in
nearly every waveband, particularly in X-rays, since its discovery as an X-ray source
during a rocket flight four decades ago. It consists of a B0.5 Ib supergiant and a
magnetic neutron star in an 8.964-day orbit. From an X-ray spectroscopic point of
view, Vela X-1 distinguished itself in 1994 when Nagase et al.,9) using ASCA SIS
data, showed that, in addition to the well-known 6.4 keV emission line, the eclipse
X-ray spectrum is dominated by recombination lines and continua of H- and He-like
Ne, Mg, Si, S, Ar, and Fe. These data were subsequently modeled in detail by Sako
et al.,10) using a kinematic model in which the photoionized wind was characterized
by the ionization parameter ξ ≡ LX/nr
2, where r is the distance from the neutron
star and n is the number density, given by the mass-loss rate and velocity law of
an undisturbed CAK wind. Vela X-1 was subsequently observed with the Chandra
HETG in 2000 for 30 ks in eclipse11) and in 2001 for 85, 30, and 30 ks in eclipse and at
binary phases 0.25 and 0.5, respectively.12), 13) Watanabe et al.,13) using very similar
assumptions as Sako et al. and a Monte Carlo radiation transfer code, produced a
global model of Vela X-1 that simultaneously fit the HETG spectra from the three
binary phases with a wind mass-loss rate Ṁ ≈ 2 × 10−6 M⊙ yr
−1 and terminal
velocity V∞ = 1100 km s
−1. One of the failures of this model was the velocity
shifts of the emission lines between eclipse and phase 0.5, which were observed to be
∆V ≈ 400–500 km s−1, while the model simulations predicted ∆V ∼ 1000 km s−1.
In order to resolve this discrepancy, Watanabe et al. performed a 1D calculation to
estimate the wind velocity profile along the line of centers between the two stars,
accounting, in an approximate way, for the reduction of the radiative driving due to
photoionization. They found that the velocity of the wind near the neutron star is
lower by a factor of 2–3 relative to an undisturbed CAK wind, which was sufficient to
explain the observations. However, these results were not fed back into their global
model to determine the effect on the X-ray spectra.
§3. Hydrodynamic Simulations
To make additional progress in our understanding of the wind and accretion
flow of Vela X-1 in particular and HMXBs in general — to bridge the gap between
the detailed hydrodynamic models of Blondin et al. and the simple kinetic-spectral
models of Sako et al. and Watanabe et al. — we have undertaken a project to
develop improved models of radiatively-driven photoionized accretion flows, with
the goal of producing synthetic X-ray spectral models that possess a level of detail
commensurate with the grating spectra returned by Chandra and XMM-Newton.
This project combines (1) XSTAR14) photoionization calculations, (2) HUL-
Hydrodynamic and Spectral Simulations of HMXB Winds 3
(a) (b) (c) (d)
x x x
Fig. 1. Color-coded maps of (a) log T (K) = [4.4, 8.3], (b) log n (cm−3) = [7.4, 10.8], (c)
log V (km s−1) = [1.3, 3.5], and (d) log ξ (erg cm s−1) = [1.1, 7.7] in the orbital plane of Vela X-1.
The positions of the OB star and neutron star are shown by the circle and the “×,” respectively.
The horizontal axis x = [−5, 7]× 1012 cm, and the vertical axis y = [−4, 8]× 1012 cm.
LAC15) emission models appropriate to X-ray photoionized plasmas, (3) improved
models of the radiative driving of the photoionized wind, (4) FLASH16) three-
dimensional time-dependent adaptive-mesh hydrodynamics calculations, and (5) a
Monte Carlo radiation transport code.17) Radiative driving of the wind is accounted
for via the force multiplier formalism,1) accounting for X-ray photoionization and
non-LTE population kinetics using HULLAC atomic data for 2× 106 lines of 35,000
energy levels of 166 ions of the 13 most cosmically abundant elements. In addi-
tion to the usual hydrodynamic quantities, the FLASH calculations account for (a)
the gravity of the OB star and neutron star, (b) Coriolis and centrifugal forces, (c)
radiative driving of the wind as a function of the local ionization parameter, temper-
ature, and optical depth, (d) photoionization and Compton heating of the irradiated
wind, and (e) radiative cooling of the irradiated wind and the “shadow wind” be-
hind the OB star. To demonstrate typical results of our simulations, we show in
Fig. 1 color-coded maps of the log of the (a) temperature, (b) density, (c) velocity,
and (d) ionization parameter of a FLASH simulation with parameters appropriate
to Vela X-1. This is a 2D simulation in the binary orbital plane, has a resolution
of ∆l = 9.4 × 1010 cm, and, at the time step shown (t = 100 ks), the relatively
slow (V ≈ 400 km s−1)∗) irradiated wind has reached just ∼ 2 stellar radii from the
stellar surface. The various panels show (1) the effect of the Coriolis and centrifugal
forces, which cause the flow to curve clockwise, (2) the cool, fast wind behind the
OB star, (3) the hot, slow irradiated wind, (4) the hot, low density, high velocity
flow downstream of the neutron star, and (5) the bow shock and two flanking shocks
formed where the irradiated wind collides with the hot disturbed flow in front and
downstream of the neutron star.
Given these maps, it is straightforward to determine where in the binary the
X-ray emission originates. To demonstrate this, we show in Fig. 2 color-coded maps
of the log of the emissivity of (a) SiXIV Lyα, (b) SiXIII Heα, (c) FeXXVI Lyα,
and (d) FeXXV Heα. The gross properties of these maps agree with Fig. 24 of
Watanabe et al., but they are now (1) quantitative rather than qualitative and (2)
specific to individual transitions of individual ions. The maps also capture features
that otherwise would not have been supposed, such as the excess emission in the H-
∗) Note that this velocity reproduces the value that Watanabe et al. found was needed to match
the velocity of the emission lines in the Chandra HETG spectra of Vela X-1.
4 C. W. Mauche, D. A. Liedahl, S. Akiyama, and T. Plewa
(a) (b) (c) (d)
x x x x
Fig. 2. Color-coded maps of the log of the X-ray emissivity of (a) SiXIV Lyα, (b) SiXIII Heα,
(c) FeXXVI Lyα, and (d) FeXXV Heα. In each case, two orders of magnitude are plotted.
and He-like Si lines downstream of the flanking shocks. Combining these maps with
the velocity map (Fig. 1c), these models make very specific predictions about (1)
the intensity of the emission features, (2) where the emission features originate, and
(3) their velocity widths and amplitudes as a function of binary phase.
The next step in our modeling effort is to feed the output of the FLASH simula-
tions into the Monte Carlo radiation transfer code, to determine how the spatial and
spectral properties of the X-ray emission features are modified by Compton scatter-
ing, photoabsorption followed by radiative cascades, and line scattering. This work
is underway.
Acknowledgements
This work was performed under the auspices of the U.S. Department of Energy
by University of California, Lawrence Livermore National Laboratory under Contract
W-7405-Eng-48. T. Plewa’s contribution to this work was supported in part by the
U.S. Department of Energy under Grant No. B523820 to the Center for Astrophysical
Thermonuclear Flashes at the University of Chicago.
References
1) J. I. Castor, D. C. Abbott, and R. I. Klein, Ap.J. 195 (1975), 157, CAK.
2) S. Hatchett and R. McCray, Ap.J. 211 (1977), 552.
3) R. McCray, T. R. Kallman, J. I. Castor, and G. L. Olson, Ap.J. 282 (1984), 245.
4) I. R. Stevens and T. R. Kallman, Ap.J. 365 (1990), 321.
5) J. M. Blondin, T. R. Kallman, B. A. Fryxell, and R. E. Taam, Ap.J. 356 (1990), 591.
6) J. M. Blondin, I. R. Stevens, and T. R. Kallman, Ap.J. 371 (1991), 684.
7) J. M. Blondin, Ap.J. 435 (1994), 756.
8) J. M. Blondin and J. W. Woo, Ap.J. 445 (1995), 889.
9) F. Nagase, G. Zylstra, T. Sonobe, T. Kotani, H. Inoue, and J. Woo, Ap.J. 436 (1994), L1.
10) M. Sako, D. A. Liedahl, S. M. Kahn, and F. Paerels, Ap.J. 525 (1999), 921.
11) N. S. Schulz, C. R. Canizares, J. C. Lee, and M. Sako, Ap.J. 564 (2002), L21.
12) G. Goldstein, D. P. Huenemoerder, and D. Blank, A.J. 127 (2004), 2310.
13) S. Watanabe, et al., Ap.J. 651 (2006), 421.
14) T. Kallman and M. Bautista, Ap.J.S. 133 (2001), 221
15) A. Bar-Shalom, M. Klapisch, and J. Oreg, Phys. Rev. A38 (1988), 1773
16) B. Fryxell, et al., Ap.J.S. 131 (2000), 273.
17) C. W. Mauche, D. A. Liedahl, B. F. Mathiesen, M. A. Jimenez-Garate, and J. C. Raymond,
Ap.J. 606 (2004), 168.
	Introduction
	Vela-0.25 X-1
	Hydrodynamic-0.25 Simulations
ABSTRACT
  We describe preliminary results of a global model of the radiatively-driven
photoionized wind and accretion flow of the high-mass X-ray binary Vela X-1.
The full model combines FLASH hydrodynamic calculations, XSTAR photoionization
calculations, HULLAC atomic data, and Monte Carlo radiation transport. We
present maps of the density, temperature, velocity, and ionization parameter
from a FLASH two-dimensional time-dependent simulation of Vela X-1, as well as
maps of the emissivity distributions of the X-ray emission lines.

<|endoftext|><|startoftext|>
Radio Astrometric Detection and Characterization of Extra-Solar
Planets:
A White Paper Submitted to the NSF ExoPlanet Task Force
Geoffrey C. Bower1, Alberto Bolatto1, Eric Ford2, Paul Kalas1, Jim Ulvestad3
ABSTRACT
The extraordinary astrometric accuracy of radio interferometry creates an
important and unique opportunity for the discovery and characterization of exo-
planets. Currently, the Very Long Baseline Array can routinely achieve better
than 100 µas accuracy, and can approach 10 µas with careful calibration. We
describe here RIPL, the Radio Interferometric PLanet search, a new program
with the VLBA and the Green Bank 100 m telescope that will survey 29 low-mass,
active stars over 3 years with sub-Jovian planet mass sensitivity at 1 AU. An
upgrade of the VLBA bandwidth will increase astrometric accuracy by an order
of magnitude. Ultimately, the colossal collecting area of the Square Kilometer
Array could push astrometric accuracy to 1 microarcsecond, making detection
and characterizaiton of Earth mass planets possible.
RIPL and other future radio astrometric planet searches occupy a unique
volume in planet discovery and characterization parameter space. The parameter
space of astrometric searches gives greater sensitivity to planets at large radii than
radial velocity searches. For the VLBA and the expanded VLBA, the targets of
radio astrometric surveys are by necessity nearby, low-mass, active stars, which
cannot be studied efficiently through the radial velocity method, coronagraphy,
or optical interferometry. For the SKA, detection sensitivity will extend to solar-
type stars. Planets discovered through radio astrometric methods will be suitable
for characterization through extreme adaptive optics.
The complementarity of radio astrometric techniques with other methods
demonstrates that radio astrometry can play an important role in the roadmap
for exoplanet discovery and characterization.
1Astronomy Department & Radio Astronomy Laboratory, University of California, Berkeley, CA 94720;
gbower,bolatto,pkalas@astro.berkeley.edu
2Harvard-Smithsonian Center for Astrophysics, 60 Garden St., Cambridge, MA 02138; ericb-
ford@gmail.com
3National Radio Astronomy Observatory, P.O. Box 0, Socorro NM 87801, U.S.A. ; julvesta@nrao.edu
http://arxiv.org/abs/0704.0238v1
– 2 –
1. Radio Astrometry and Extra-Solar Planets
Radio astrometry has long been the gold standard for definition of celestial reference
frames (Fey et al. 2004, AJ 127, 3587) and has been used to obtain the most accurate
geometric measurements of any astronomical technique. Astrometric results include mea-
surement of the deflection of background sources due to the gravitational fields of the Sun
and Jupiter (Fomalont & Kopeikin 2003, ApJ, 598, 704), the parallax and proper motion of
pulsars at distances greater than 1 kpc (Chatterjee et al. 2005, ApJ, 630, L61), an upper
limit to the proper motion of Sagittarius A* of a few km s−1 (Reid & Brunthaler 2004, ApJ,
616, 872), the rotation of the disk of M33 (Brunthaler et al. 2005, Science, 307, 1440), and
a < 1% distance to the Taurus star-forming cluster (Loinard 2006, BAAS, 209, 1080).
The Very Long Baseline Array (VLBA) images nonthermal radio emission and can
routinely achieve 100 µas astrometric accuracy, but has achieved an accuracy as high as 8
µas under favorable circumstances (Fomalont & Kopeikin 2003). Nonthermal stellar radio
emission has been detected from many stellar types (Güdel 2002, ARA&A, 40, 217), including
brown dwarfs (Berger et al. 2001, Nat, 410, 338), proto-stars (Bower et al. 2003, ApJ, 598,
1140) , massive stars with winds (Dougherty et al. 2005, ApJ, 623, 447), and late-type stars
(Berger et al. 2006, ApJ, 648, 629). Only late-type stars are sufficiently bright, numerous,
nearby, and low mass to provide a large sample of stars suitable for large-scale astrometric
exoplanet searches. Radio astrometric searches can determine whether or not M dwarfs, the
largest stellar constituent of the Galaxy, are surrounded by planetary systems as frequently
as FGK stars and if the planet mass-period relation varies with stellar type. The population
of gas giants at a few AU around low mass stars is an important discriminant between planet
formation models.
Radio astrometric searches have a number of unique qualities:
• Opportunity to discover planets around nearby active M dwarfs at large radii;
• Ability to fully characterize orbits of detected planets, without degeneracies in mass,
inclination, and longitude of ascending node;
• Sensitivity to long-period planets with sub-Jovian masses currently and Earth masses
ultimately;
• Complementary with existing planet searching techniques: most targets cannot be
explored through other methods;
• Ability to follow-up detected planets with imaging and spectroscopy; and,
• Absolute astrometric positions within the radio reference frame for stars and planets.
The quality and uniqueness of radio astrometry for planet searches are the result of two
factors:
– 3 –
Fig. 1.— Sensitivity of different methods in planet mass and semi-major axis space for
radio astrometric surveys and other methods. “Exp. VLBA” refers to the upgraded VLBA
described in § 4. The semi-major axis at the minimum in the astrometric search curves is
determined by the search duration, which is 3 years for RIPL and the Exp. VLBA campaign.
• High precision of radio astrometry: The VLBA can routinely achieve 100 µas
accuracy through relative astrometry. This precision is an order of magnitude better than
obtained from laser-guide star adaptive optics (e.g., Pravdo et al. 2005). Future instruments
will have one to two orders of magnitude more accurate astrometry, comparable to the best
accuracy achievable with the proposed SIM spacecraft.
• Active stars are difficult to study in optical programs: Our target stars are
active M dwarfs, which have radio fluxes on the order of 1 mJy. These radio stars are
difficult to study through optical radial velocity techniques because they are faint and because
the activity in these stars distorts line profiles, reducing the accuracy of radial velocity
measurements.
We give a sketch of the parameter space for RIPL, future radio astrometric searches,
the Space Interferometric Mission, radial velocity searches, and coronagraphic searches in
Figure 1. A comparison of the radial velocity and astrometric amplitudes indicates that
astrometric techniques are favored over radial velocity techniques for long period ( ∼> 1 year)
planets for these faint objects, for an astrometric accuracy of ∼ 100 µas (Ford 2006, PASP,
118, 364).
– 4 –
In Section 2, we describe the sensitivity and methods of radio astrometry. In Section 3,
we describe a new program with the VLBA and the Green Bank 100m telescope to search for
planets around nearby M dwarfs. In Section 4, we demonstrate that a bandwidth upgrade for
the VLBA will increase astrometric accuracy or stellar sample sizes by an order of magnitude.
In Section 5, we discuss the role that the Square Kilometer Array can play with its three
order of magnitude increase in sensitivity over the VLBA.
2. Radio Astrometry Sensitivity and Methods
Astrometric exoplanet searches must be able to detect an astrometric signal that has
an amplitude of
θ = 2
= 1400 µas ∗
∗Mp/MJ ∗
0.2M⊙
, (1)
for a planet of mass Mp orbiting a star of mass M∗ with a semimajor axis a at a distance D
from the Sun (a mass of 0.2 M⊙ corresponds to a M5 dwarf). To robustly detect a planet,
observations must span at least a significant fraction of a period
T = 2.2 yr ∗
0.2M⊙
. (2)
The ultimate accuracy that can be obtained through a radio astrometric technique is
σast = σbeam/SNR, (3)
where σbeam = b/λ is the synthetic beam size for an array with maximum baseline b, λ is the
observing wavelength, and SNR is the signal to noise ratio of the target source detection.
For the VLBA σast ≈ 500 µas/SNR.
The astrometric position is defined relative to nearby (∼ 1◦) compact radio sources.
Typical observations include switching on minute timescales between the calibrator and the
target sources, with less frequent observations of secondary calibrators. The use of mul-
tiple calibrators is intended to determine the differential delay in position on the sky due
to varying path length from tropospheric water vapor. The extent to which this cannot
be calibrated sets the final astrometric accuracy in observations that are not SNR-limited.
The nearer the calibrators and the greater sensitivity at which they can be detected typ-
ically determines this error. The error decreases linearly with decreasing calibrator-target
separation. The increased sensitivity of future arrays will increase the calibrator density
and therefore decrease the typical separation from calibrator to target and the uncalibrated
astrometric error. For sufficiently small target to calibrator separation, the calibrator will be
in the primary beam of the antenna, enabling simultaneous observations of the target and
calibrator that also remove temporal dependence of tropospheric variations.
– 5 –
RIGHT ASCENSION (J2000)
22 01 13.3054 13.3052 13.3050 13.3048 13.3046 13.3044 13.3042 13.3040
28 18 25.142
25.140
25.138
25.136
25.134
25.132
25.130
25.128
25.126
25.124
RIGHT ASCENSION (J2000)
22 01 13.3054 13.3052 13.3050 13.3048 13.3046 13.3044 13.3042 13.3040
28 18 25.142
25.140
25.138
25.136
25.134
25.132
25.130
25.128
25.126
25.124
RIGHT ASCENSION (J2000)
22 01 13.3054 13.3052 13.3050 13.3048 13.3046 13.3044 13.3042 13.3040
28 18 25.142
25.140
25.138
25.136
25.134
25.132
25.130
25.128
25.126
25.124
Fig. 2.— Images of GJ4247 in three separate epochs on 23, 25, and 26 March 2006 (right
to left) from the VLBA Precursor Astrometric Survey. Contour levels are -3, 3, 4, 5, 6, 7, 8
times the rms noise of 95 µJy. The synthesized beam is shown in the lower left hand corner
of each image.
3. RIPL: Radio Interferometric Planet Search
RIPL is a 1400-hour, 3-year VLBA and GBT program to search for planets around 29
nearby, low-mass, active stars. The program will achieve sub-Jovian planet mass sensitivity.
The observing program will be completed in 2009.
The most serious limitation to astrometric accuracy may be from stellar activity that
jitters the apparent stellar position. Most evidence, however, indicates that this radio astro-
metric jitter is small. For instance, White, Lim and Kundu (1994, ApJ, 422, 293) model the
radio emission from dMe stars as originating within ∼ 1 stellar radius of the photosphere.
At a distance of 10 pc for a M5 dwarf a stellar radius is ∼ 100 µas, roughly an order of
magnitude smaller than the astrometric signature of a Jupiter analog. We conducted the
VLBA Precursor Astrometric Survey (VPAS) in Spring 2006 to assess the effect of stellar
jitter on astrometric accuracy (Bower et al. 2007, in prep.).
For each star, three VLBA epochs were spread over fewer than 10 days. Seven stars were
detected in at least one epoch and four were detected in all three epochs (Figure 2). All stars
have proper motions and parallaxes determined by Hipparcos or other optical methods with
a precision of a few mas per year, yielding predicted relative positions accurate to ∼ 100 µas
during the length of the study. For all stars detected with multiple epochs, the motions
match the results of Hipparcos astrometry well with rms in each coordinate ranging from
0.08 to 0.26 µas. Deviations in the positions appear to be limited by our sensitivity; i.e., the
effect of stellar activity on their positions is unimportant.
In fact, the small differences in the fitted proper motion and the Hipparcos proper motion
already eliminate brown dwarfs as companions to these objects (Figure 3). The measured
differences are consistent with noise in the VLBA astrometry (200 µas /3day ∼ 20 mas/yr).
The typical reflex motion due to a long period brown dwarf is ∼ 100 mas/yr, which would be
apparent. The much longer time baseline and better sensitivity of RIPL will reduce proper
motion errors by ∼ 2 orders of magnitude.
– 6 –
r (AU)
GJ4247:  0.144 AU y−2
2 4 6 8 10
r (AU)
GJ896A:  0.070 AU y−2
2 4 6 8 10
Fig. 3.— Region of planetary mass and semi-major axis phase-space rejected by acceleration
upper limits based on combination of 3 epochs of radio astrometric measurements and optical
astrometry, primarily from Hipparcos. Different contours indicate confidence intervals for
excluded regions.
3.1. Synergy with other Planet Searches
RIPL is synergistic with the existing and future planet-search programs, as well as cur-
rent ground-based planet searches (including radial velocities, transits, adaptive optics, and
interferometry). RIPL provides an opportunity to search for planetary systems in a unique
area of parameter space that will not be targeted by other planet searches until the launch
of NASA SIM - Planetquest.
Ground based transit searches are most sensitive for very short periods (P ∼ days), and
the Kepler mission aims to detect planets with orbital periods of slightly more than a year.
Thus, RIPL will make a valuable contribution to our understanding of the frequency of
long-period planets around M stars. Further, unlike transits and radial velocity observations
astrometric measurements directly measure the planet mass, which is important for testing
models of planet formation. While the unknown inclination is less of an issue for studying
large samples of planets, measuring individual inclinations will be particularly valuable for
planets around M dwarfs, since a relatively modest number of M dwarfs are being surveyed
by RIPL (∼ 30 vs ∼ 3000 stars by radial velocities).
Ground-based optical and near-infared interferometers (e.g., PTI, NPOI) require bright
stars and are not appropriate for faint low-mass stars. The RIPL astrometric accuracy is an
order of magnitude better than the astrometric error from Keck Laser Guide Star Adaptive
Optics astrometry (Pravdo et al. 2005, ApJ, 630, 528). Thus, RIPL is the best means for
an astrometric search of M dwarfs until SIM launches (now estimated for no earlier than
2016).
– 7 –
A long-period planet detected by RIPL would enable exciting scientific investigations
such as photometric and spectroscopic observations to determine the planets physical prop-
erties. While space based missions such as TPF-C and TPF-I are expected to be extremely
powerful and aim to directly detect terrestrial mass planets, these missions are not expected
to launch for at least a decade in the future. Knowing which stars have giant planets suitable
for direct imaging would enable direct probes of an extrasolar planet.
4. VLBA Upgrade and Planet Detection
The VLBA is presently being upgraded from a typical data rate of 256 Mbit/s to 4
Gbit/s, with project completion estimated by 2010. This will result in a sensitivity increase
by a factor of 4, or about a factor of 8 increase in areal density of reference sources on the
sky. Thus, the typical distance between a target star and its nearest reference source will
decrease by a factor of ∼ 3. A few years later we expect a data rate of 16 Gbit/s, yielding
a target-calibrator separation more than 10 times smaller than current values. Since in
the limit of infinite SNR the astrometric error depends linearly on the separation from the
reference source, relative astrometric errors of . 10 µas should be fairly routine; in principle,
this would permit detection of a planet with a mass of less than 10% of the mass of Jupiter.
The sensitivity increase afforded by these upgrades will also permit a sizable increase of the
late-type dwarf sample.
5. Square Kilometer Array
The Square Kilometer Array (SKA; Carilli & Rawlings 2004, New AR, 48, 979) is a
proposed future radio telescope that would have a collecting area of a square kilometer,
approximately 200 times the collecting area of the VLBA. The SKA would be built toward
the end of the next decade; it is planned to cover the frequency range from 0.1 to 25 GHz, with
the 5–10 GHz range being most useful for astrometric planet detection. If 25% of the SKA
area at ∼ 8 GHz is constructed on baselines of 1000-5000 km, it will supply revolutionary
astrometric accuracy (Fomalont & Reid 2004, New AR, 48, 1473). With dish antennas of
12m diameter, the combination of sensitivity and wide field of view often will enable many
astrometric reference sources to be found in the same antenna field of view as the target
star, allowing all temporal variations in Earth’s atmosphere to be removed. In such a case,
the relative astrometric accuracy may reach ∼ 1 µas, competitive with SIM and enabling
astrometric detection of Earth-mass planets.
The sensitivity of the SKA will enable astrometric detection of thermal emission from
stars. The Sun, for instance, would be detectable to a distance of 10 pc with the SKA. Thus,
the SKA will be capable of detecting and characterizing planets around Sun-like stars.
	Radio Astrometry and Extra-Solar Planets
	Radio Astrometry Sensitivity and Methods
	RIPL: Radio Interferometric Planet Search
	Synergy with other Planet Searches
	VLBA Upgrade and Planet Detection
	Square Kilometer Array
ABSTRACT
  The extraordinary astrometric accuracy of radio interferometry creates an
important and unique opportunity for the discovery and characterization of
exo-planets. Currently, the Very Long Baseline Array can routinely achieve
better than 100 microarcsecond accuracy, and can approach 10 microarcsecond
with careful calibration. We describe here RIPL, the Radio Interferometric
PLanet search, a new program with the VLBA and the Green Bank 100 m telescope
that will survey 29 low-mass, active stars over 3 years with sub-Jovian planet
mass sensitivity at 1 AU. An upgrade of the VLBA bandwidth will increase
astrometric accuracy by an order of magnitude. Ultimately, the colossal
collecting area of the Square Kilometer Array could push astrometric accuracy
to 1 microarcsecond, making detection and characterizaiton of Earth mass
planets possible.
  RIPL and other future radio astrometric planet searches occupy a unique
volume in planet discovery and characterization parameter space. The parameter
space of astrometric searches gives greater sensitivity to planets at large
radii than radial velocity searches. For the VLBA and the expanded VLBA, the
targets of radio astrometric surveys are by necessity nearby, low-mass, active
stars, which cannot be studied efficiently through the radial velocity method,
coronagraphy, or optical interferometry. For the SKA, detection sensitivity
will extend to solar-type stars. Planets discovered through radio astrometric
methods will be suitable for characterization through extreme adaptive optics.
  The complementarity of radio astrometric techniques with other methods
demonstrates that radio astrometry can play an important role in the roadmap
for exoplanet discovery and characterization.

<|endoftext|><|startoftext|>
Interface dynamics of microscopic cavities in water
Joachim Dzubiella1, ∗
Physics Department, Technical University Munich, 85748 Garching, Germany
(Dated: November 4, 2018)
An analytical description of the interface motion of a collapsing nanometer-sized spherical cavity
in water is presented by a modification of the Rayleigh-Plesset equation in conjunction with ex-
plicit solvent molecular dynamics simulations. Quantitative agreement is found between the two
approaches for the time-dependent cavity radius R(t) at different solvent conditions while in the con-
tinuum picture the solvent viscosity has to be corrected for curvature effects. The typical magnitude
of the interface or collapse velocity is found to be given by the ratio of surface tension and fluid vis-
cosity, v ≃ γ/η, while the curvature correction accelerates collapse dynamics on length scales below
the equilibrium crossover scales (∼1nm). The study offers a starting point for an efficient implicit
modeling of water dynamics in aqueous nanoassembly and protein systems in nonequilibrium.
I. INTRODUCTION
Hydrophobic hydration in equilibrium is a phe-
nomenon which exhibits qualitatively different behavior
at small and large length scales.1,2 While small solutes
(radii R .1nm) are accommodated by water with only
minor perturbations, larger solutes (R &1nm) induce ma-
jor rearrangements of water interfacial structure. As a
consequence the solvation free energy G(R) of small hy-
drophobic cavities scales with solute volume while for
larger cavities it grows with surface area (as a good
approximation near liquid-vapor coexistence) accompa-
nied by weak solvent dewetting at extended restrain-
ing hydrophobic surfaces.3 Furthermore, water, which
is close to the liquid-vapor transition at normal condi-
tions, can minimize interface area by locally evaporat-
ing and forming a ’nanobubble’ within hydrophobic con-
finement. Evidence of bubble formation in confined ge-
ometry has been given early by computer simulations
of smooth plate-like solutes,4 but more recently it has
been demonstrated in varying degrees in atomistically
resolved plate-like solutes,5,6 hydrophobic tubes and ion
channels,7,8 and in the collapse of proteins,9,10 suggesting
that it plays a key role in the stabilization and folding
dynamics of certain classes of biomolecules.11,12 Experi-
mental evidence of nanobubbles in strong confinement (in
contrast to bubbles at a single planar surface3) has been
given for instance in studies of water between hydropho-
bic surfaces,13 in zeolites and silica nanotubes,14,15 and
on a subnanometer scale in nonpolar protein cavities.16
The dewetting induced change in solvation energy is
typically estimated using simple macroscopic arguments
as known from capillarity theory, e.g. by describing in-
terfaces with Laplace-Young (LY) type of equations.14,17
Recently an extension of the LY equation has become
available which extrapolates to microscopic scales by in-
cluding a curvature correction to the interface tension
and considering atomistic dispersion and electrostatic
potentials of the solvated solute explicitly.18 Although
those macroscopic considerations (e.g,. the concept of
surface tension) are supposed to break down on atom-
istic scales they show surprisingly good results for the
solvation energy of microscopic solutes, e.g. alkanes and
noble gases, and quantitatively account for dewetting
effects in nanometer-sized hydrophobic confinement.19
While we conclude that the equilibrium location of the
solute-solvent interface seems to be well described by
those techniques, nothing is known about the interface
dynamics of evolution and relaxation. In this study we
address two fundamental questions: First, what are the
equations which govern the interface motion on atomistic
(∼1nm) scales? Secondly, does the dynamics exhibit any
signatures of the length scale crossover found in equilib-
rium?
On macroscopic scales the collapse dynamics of a (va-
por or gas) bubble is related to the well-known phe-
nomenon of sonoluminescence.20 The governing equa-
tions can be derived from Navier-Stokes and capillarity
theory and are expressed by the Rayleigh-Plesset (RP)
equation.21 We will show that the RP equation simpli-
fies in the limit of microscopic cavities and can be ex-
tended to give a quantitative description of cavity inter-
face dynamics on nanometer length scales. We find a
qualitatively different dynamics than the typical “mean-
curvature flow” description of moving interfaces,22 in par-
ticular a typical magnitude of interface or collapse veloc-
ity given by the ratio of surface tension and fluid viscosity,
v ≃ γ/η. Our study is restricted to the generic case of
the collapse of a spherical cavity and is complemented
by explicit solvent molecular dynamics (MD) computer
simulations. We note here that recently, Lugli and Zer-
betto studied nanobubble collapse in ionic solutions by
MD simulations on similar length scales.23 While their
MD data compares favorably with our results their in-
terpretation and conclusions in terms of the RP equa-
tion are different. We will resume this discussion in the
conclusion section.
In this study we show that a simple analytical approach
quantitatively describes microscopic cavity collapse for
a variety of different solvent situations while the sim-
ulations suggest that the solvent viscosity needs to be
corrected for curvature effects. Our study might offer a
simple starting point for an efficient implicit modeling
of water dynamics in aqueous nanoassembly and protein
http://arxiv.org/abs/0704.0239v1
systems in nonequilibrium.
II. THEORY
The Rayleigh-Plesset equation for the time evolution
of a macroscopic vapor bubble with radius R(t) can be
written as21
= ∆P + 4η
, (1)
where ρm is the solvent mass density, ∆P = P − Pv the
difference in liquid and vapor pressures, η the dynamic
viscosity, and γ the liquid-vapor interface tension. While
for macroscopic bubble radii the inertial terms (left hand
side) control the dynamics, for decreasing radii the fric-
tional and pressure terms (right hand side) grow in rel-
ative magnitude and eventually dominate, so that com-
pletely overdamped dynamics can be assumed on atom-
istic scales:
Ṙ ≃ −
. (2)
A rough estimate for the threshold radius Rt below which
friction dominates is given when the Reynolds number
R = vRρm/η becomes unity and viscous and inertial
forces are balanced. With a typical initial interface ve-
locity of the order of v ∼ γ/η [from R̈(0) = 0 in eq. (1)]
we obtain
Rt = η
2/(ρmγ) (3)
which is ≃ 10nm for water at normal conditions. Note
that this threshold value can deviate considerably for a
fluid different than water and that the viscosity typically
has a strong temperature (T ) dependence which implies
that Rt can change significantly with T .
In equilibrium (Ṙ = 0) the remaining expression in
eq. (2) is the (spherical) LY equation ∆P + 2γ/R = 0.
Thus eq. (2) describes a linear relationship between cap-
illary pressure and interface velocity where R/(4η) plays
the role of an interface mobility (inverse friction).22 In-
terestingly, the mobility is linear in bubble radius which
leads to a constant velocity driven by surface tension in-
dependent of radius (assuming P ≃ 0); this has to be con-
trasted to the typically used capillary dynamics which is
proportional to the local mean curvature ∝ 1/R.22
Generalizations of the LY equation to small scales are
available by adding a Gaussian curvature term (∼ 1/R2)
as shown by Boruvka and Neumann24; that has been
demonstrated to be equivalent to a first order curvature
correction in surface tension, i.e. γ(R) = γ∞(1−δT/R),
where δT is the Tolman length
25 and γ∞ the liquid-vapor
surface tension for a planar interface (R = ∞). The
Tolman length has a magnitude which is usually of the
order of the size of a solvent molecule. Furthermore, it
has been observed experimentally that the viscosity of
strongly confined water can depend on the particular na-
ture of the surface/interface.26 We conclude that in gen-
eral one has to anticipate that - analogous to the surface
tension - the effective interface viscosity obeys a curva-
ture correction in the limit of small cavities due to water
restructuring in the first solvent layers at the hydropho-
bic interface. In the following we make the simple first
order assumption that the correction enters eq. (2) also
linear in curvature (∼ 1/R) yielding
Ṙ = −
∆PR+∆Pδvis + 2γ∞ +
, (4)
where the constant δvis is the coefficient for the first order
curvature correction in viscosity and η∞ the macroscopic
bulk viscosity. Additionally, we define δ = δvis − δT and
second order terms in curvature are neglected. We note
that the choice of the 1/R-scaling of the viscosity curva-
ture correction has no direct physical justification and is
arbitrary. We think however, that a curvature correction
based on an expansion in orders of mean curvature is the
simplest and most natural way for such a choice.
In water at normal conditions the pressure terms in
(4) are negligible so that for large radii (R ≫ δ) the in-
terface velocity is constant and R(t) = R0 − γ∞/(2η∞)t.
This leads to a collapse velocity of about v ≃0.4Å/ps
(40m/s) which is 6% of the thermal velocity of water
vth =
3kBT/m showing that dissipative heating of the
system is relatively weak on these scales. A rough esti-
mate for the dissipation rate can be made by the released
interfacial energy dG(R, t)/dt ≃ d(4πR(t)2γ∞)/dt =
−4πγ2
R(t)/η∞ yielding for instance dG(R, t = 0)/dt ≃
−35kBT/ps for a bubble with R0 =2nm. At small
radii (R ≃ δ) the solution of (4) goes as R(t) ∼
const− (δγ∞/η∞)t decreasing or increasing the ve-
locity depending on the sign of δ = δvis − δT, i.e. the
acceleration depends on the particular sign and mag-
nitude of the curvature corrections to surface tension
and viscosity. For large pressures and radii the first
term dominates which gives rise to an exponential de-
cay R(t) ∼ exp[−∆P/(4η∞)t]. While extending to small
scales we have assumed that the time scale of internal in-
terface dynamics, i.e. hydrogen bond rearrangements,27
is much faster than the one of bubble collapse.
III. MD SIMULATION
In order to quantify our analytical predictions we
complement the theory by MD simulations using ex-
plicit SPC/E water.28 The liquid-vapor surface tension
of SPC/E water has been measured and agrees with the
experimental value for a wide range of temperatures.29
For T = 300K and P = 1bar we have γ∞ = 72mN/m.
The Tolman length has been estimated to be δT ≃
0.9Å from equilibrium measurements of the solvation
energy of spherical cavities.30 At the same conditions the
dynamic viscosity of SPC/E water has been found to
be η∞ = 6.42 · 10
−4Pa·s,31 ∼24% smaller than for real
water. In experiments in nanometer hydrophobic con-
finement and at interfaces however, the viscosity shows
deviations from the bulk value but remains comparable.26
We proceed by treating the viscosity η∞ as an adjustable
parameter together with its curvature correction coeffi-
cient δvis.
The MD simulations are carried out with the
DLPOLY2 package32 using an integration time step of
2fs. The simulation box is cubic and periodic in all three
dimensions with a length of L = (61.1 ± 0.2)Å in equi-
librium involving N = 6426 solvent molecules. Electro-
static interactions are calculated by the smooth-particle
mesh Ewald summation method. Lennard-Jones inter-
actions are cut-off and shifted at 9Å. Our investigated
systems are at first equilibrated in the NPT ensemble
with application of an external spherical potential of the
form βV (r) = [Å/(r −R′0)]
12 and all molecules removed
with r < R′0 since vapor can safely be neglected on these
scales. This stabilizes a well-defined spherical bubble of
radius R0 ≃ R
0 + 1Å. We define the cavity radius by
the radial location where the water density ρ(r) drops to
half of the bulk density ρ0/2. Thirty independent config-
urations in 20ps intervals are stored and serve as initial
configurations for the nonequilibrium runs. We employ a
Nosé-Hoover barostat and thermostat with a 0.2ps relax-
ation time to maintain the solvent at a pressure P and
a temperature T . Other choices of relaxation times in
the reasonable range between 0.1 and 0.5ps do not alter
our results. In the nonequilibrium simulations the con-
straining potential is switched off and the relaxation to
equilibrium is averaged over the thirty runs.
IV. RESULTS
system P/bar T/K cNaCl/M Q/e η∞/(10
−4Pa·s)
I 1 300 0 0 5.14
II 1 300 1.5 0 5.94
III 1 277 0 0 8.48
IV 2000 300 0 0 4.56
V 1000 300 0 0 4.72
VI 1 300 0 +2 5.14
TABLE I: Investigated system parameters: pressure P , tem-
perature T , and salt (NaCl) concentration c. In system VI
a fixed ion with charge Q = +2e is placed at the center of
the collapsing bubble. The viscosity η∞ is a fit-parameter in
systems I-V (see text).
We perform simulations of six different systems I-VI
whose features are summarized in Tab. I and differ in
thermodynamic parameters T and P (I, III, IV, and V)
but also inclusion of dispersed salt (II), and the influ-
ence of a charged particle in the bubble center (VI) are
considered. Note that the exact value of the crossover
length scale (however defined) can depend on the detailed
thermodynamic or solvent condition but remains close to
1nm.2
0 5 10 15 20 25 30
t=1ps
t=5ps
t=10ps
t=14ps
t=17ps
t=19ps
t=23ps
0 5 10 15 20
 τ(R)/
"10-90"- thickness
FIG. 1: Interface density profiles ρ(r)/ρ0 for system I are
plotted vs. the radial distance r from the bubble center for
different times t/ps=1,5,10,14,17,19,23. Symbols denote MD
simulation data and lines are fits using 2ρ(r)/ρ0 = erf{[r −
R(t)]/d}+1. The bubble radius R(t) is defined by the distance
at which the density is ρ0/2 (dotted line). The inset shows
the “10-90” thickness τ = 1.8124 d of the interface vs. R for
initial radii R0 = 19.83Å (pluses) and R0 = 25.6Å (crosses).
System I is at normal conditions (T=300K, P=1bar)
and consists of pure SPC/E water. Fig. 1 shows the
observed interface profiles in the nonequilibrium situa-
tion at different times t/ps=1, 5, 10, 14, 17, 19, and 23
starting from an initial radius R0 = 19.83Å. The liquid-
vapor interface stays relatively sharp in the process of
relaxation but broadens noticeably for smaller radii. At
t ≃ 23ps the system is completely relaxed to a homoge-
neous density distribution. The same time scale of bubble
collapse has been found in explicit water computer simu-
lations of dewetting in nanometer-sized paraffin plates,17
polymers,11 and atomistically resolved proteins.9,10
We find that the interface profiles can be fitted very
well with a functional form 2ρ(r)/ρ0 = erf{[r−R(t)]/d}+
1, where d is a measure of the interface thickness. The
interface fits are also shown in Fig. 1 together with the
MD data. The experimentally accessible “10-90” thick-
ness τ of an interface is the thickness over which the
density changes from 0.1ρ0 to 0.9ρ0 and is related to the
parameter d via τ = 1.8124 d. While experimental values
of τ for the planar water liquid-vapor interface vary be-
tween ∼ 4 and 8Å the measured values for SPC/E water
in the finite simulation systems are τ∞ =3 to 4Å.
29 We
find a strongly radius-dependent function τ(R) plotted
in the inset to Fig. 1 for initial radii R0 = 19.83Å and
R0 = 25.6Å. For R ≃ R0 the thickness increases during
the following 5ps from the equilibrium value τ ≃ 3Å to
FIG. 2: Time evolution of the cavity radius R(t) for parame-
ters as defined in systems I-VI. The solution of the modified
RP equation (4) (lines) is plotted vs. MD data (symbols).
The inset shows the solution of the modified RP equation in-
cluding inertia terms, cf. lhs of (1), (dashed lines) compared
to eq. (4) for system I with initial radii R0 = 19.83Å and
R0 = 10.0Å.
about τ ≃ 4.5− 5Å independent of R0. While the exact
equilibrium thickness at t = 0 depends on the particu-
lar choice of the confining potential V (r) (e.g., a softer
potential might lead to a broader initial interface) this
suggest that 4.5-5Å is the typical interface thickness
for a bubble of 1nm size. Regarding the slope of the
curve one might speculate that τ(R → ∞) saturates to
the thickness τ∞ of the measured planar interface for
R0 → ∞. For R . 10Å the thickness increases twofold
during the relaxation to equilibrium. This broadening
might be attributed to increased density fluctuations and
the structural change of interfacial water in the system
when crossing from large to small length scales which has
been shown to happen in equilibrium at ∼ 1nm.1,2
In Fig. 2 we plot the time evolution of the bubble ra-
dius R(t) for all investigated systems. Let us first focus
on the simulation data of system I (circles). As antic-
ipated the bubble radius decreases initially in a linear
fashion while for smaller radii (R(t) . 10Å) the velocity
steadily increases. From the best fit of eq. (4) we find
a viscosity η∞ = 5.14 · 10
−4Pa·s and its curvature cor-
rection coefficient δvis = 4.4Å. Although investigating a
confined system with large interfaces the viscosity value
differs only 20% from the SPC/E bulk value. Further-
more, from our macroscopic point of view the MD data
show that high curvature decreases the viscosity and the
latter has to be curvature-corrected with a (positive) co-
efficient larger than the Tolman length δT. If the surface
tension decreased in a stronger fashion with curvature
than viscosity the collapse velocity would drop in qual-
itative disagreement with the simulation. The overall
behavior of R(t) and the collapse velocity of about ∼
1Å/ps agrees very well with the recent MD data of Lugli
and Zerbetto, who simulated the collapse of a 1nm sized
bubble in SPC water.23
The inset to Fig. 2 shows the solution of eq. (4) in-
cluding inertial terms [left hand side of (1)] to check the
assumption of overdamped dynamics. While inertial ef-
fects are indeed small but not completely negligible for
an initial radius R0 = 19.83Å they basically vanish for
R0 = 10Å. Interestingly, the inertial effects are not visible
in the MD simulation data at all. We attribute this obser-
vation to the finite and periodic simulation box which is
known to suppress long-ranged inertial (hydrodynamic)
effects.33
In the following we assume δvis to be independent of
the other parameters and treat only η∞ as adjustable
variable. In system II we add 175 salt pairs of sodium
chloride (NaCl) into the aqueous solution resulting in a
concentration of c ≃1.5M. The ion-SPC/E interaction
parameters are those used by Bhatt et al.34 who mea-
sured a linear increase of surface tension with NaCl con-
centration in agreement with experimental data. While
this increment for c = 1.5M is about small 2-3%, the vis-
cosity has been measured experimentally to increase by
approximately 18% at 298.15K.35 Indeed by comparing
the simulation data to the theory we find a 16% larger
viscosity η∞ = 5.94 ·10
−4Pa·s. A slower collapse velocity
has been found also in the MD simulations of Lugli and
Zerbetto in concentrated LiCl and CsCL solutions when
compared to pure water.23
In system III we investigate the effect of lowering the
temperature by simulating at T = 277K.While only a 5%
increase of the water surface tension (SPC/E and real
water) is estimated from available data29 the viscosity
depends strongly on temperature: the relative increase
has been reported to be between 55 − 75% for SPC/E
water (85% for real water).36 Inspecting the MD data
and considering the surface tension increase we find in-
deed a large decrease in viscosity of 65% with a best-fit
η∞ = 8.48 · 10
−4Pa·s. Both systems, II and III, show
that solvent viscosity has a substantial influence on bub-
ble dynamics as quantitatively described by our simple
analytical approach. In systems IV and V we return to
T = 300K but increase the pressure P by a factor of
2000 and 1000, respectively. Best fits provide viscosities
which are around 10% smaller than at normal conditions
in agreement with the very weak pressure dependence of
the viscosity found in experiments37,38 at T=300K. The
major contribution to the faster dynamics comes explic-
itly from the pressure terms in eq. (4). Although mov-
ing away from liquid-vapor coexistence by increasing the
pressure up to 2000bar we assume (and verify hereby)
that the bubble interface tension can still be described
by γ∞.
In system VI we investigate the influence of a hy-
drophilic solute on the bubble interface motion in order
to make connection to cavitation close to molecular (pro-
tein) surfaces. As a simple measure we fix a divalent ion
at the center of the bubble so that we retain spherical
symmetry. The ion is modeled by a Lennard-Jones (LJ)
potential ULJ(r) = 4ǫ[(σ/r)
12 − (σ/r)6] with Q = +2e
point charges and uses the LJ parameters of the SPC/E
oxygen-oxygen interaction. As demonstrated recently
the LY equation can be modified to include dispersion
and electrostatic solute-solvent interactions explicitly,18
which extends (4) to
Ṙ = −
− ρ0ULJ(R) +
32πǫ0R4
The last term in (5) is the Born electrostatic energy den-
sity of a central charge Q in a spherical cavity with ra-
dius R with low dielectric vapor ǫv = 1 surrounded by
a high dielectric liquid (1/ǫl ≃ 0). The electric field
around the ionic charge and the dispersion attracts the
surrounding dipolar water what accelerates and eventu-
ally completely governs the bubble collapse below a ra-
dius R(t) . 13Å (t & 7ps) as also shown in Fig. 2.
The theoretical prediction (5) agrees very well without
any fitting using the viscosity from system I. We find
that the acceleration is mainly due to the electrostatic
attraction; the dispersion term plays just a minor role
while the excluded volume repulsion eventually deter-
mines the final (equilibrium) radius of the interface with
R(t = ∞) ≃ 2Å.
V. CONCLUSIONS
In conclusion, we have presented a simple analytical
and quantitative description of the interface motion of
a microscopic cavity by modifying the macroscopic RP
equation. Based on our MD data we find for the macro-
scopic description that analogous to the surface tension
the viscosity has to be corrected for curvature effects, a
prediction compelling to investigate further in detail and
probably related to the restructuring of interfacial water
for high curvatures (small R). The viscosity correction
accelerates collapse dynamics markedly below the equi-
librium crossover scale (∼1nm) in contrast to the pure
equilibrium picture where surface tension decreases what
slows down the collapse. Further, we find that the dy-
namics is curvature-driven due to the corrections to sur-
face tension and viscosity, not due surface tension as often
postulated.22 As a simple estimate, the interface velocity
is typically given by the ratio of surface tension and fluid
viscosity, v ≃ γ/η.
A comment has to be made regarding the recent work
of Lugli and Zerbetto on MD simulations of nanobubble
collapse in ionic solutions. While their MD data of the
collapse velocity for a 1nm bubble agree very well with
our results their interpretation in terms of the RP equa-
tion is different. They fit the ’violent regime’ solution
of the RP equation to the data [which is the solution of
only the inertial part, left hand side of (1)] and argue
that the violent regime still holds on the nm scale. As
demonstrated in this work, we arrive to a different con-
clusion: the collapse is friction dominated, the collapse
driving force is mainly capillary pressure, and we suggest
that the microscopic viscosity has to be curvature cor-
rected to explain the high curvature collapse behavior in
the MD simulations. The good agreement between our
modified RP equation and the MD data for different sol-
vent conditions, leading for instance to an altered solvent
surface tension or viscosity, support our view.
We finally note that extensions of the LY equation are
based on minimizing an appropriate free energy G(R) or
free energy functional18,24 so that we can write in a more
general form Ṙ ∼ [∂G(R)/∂R]/[η(R)R]. It is highly de-
sirable to generalize this simple dynamics further to ar-
bitrary geometries with which a wide field of potential
applications might open up, i.e. an efficient implicit mod-
eling of the water interface dynamics in the nonequilib-
rium process of hydrophobic nanoassembly, protein dock-
ing and folding, and nanofluidics.
Acknowledgements
J. D. thanks Lyderic Bocquet for pointing to the RP
equation, Bo Li (Applied Math, UCSD), Roland R.
Netz, Rudi Podgornik, and Dominik Horinek for stimu-
lating discussions, and the Deutsche Forschungsgemein-
schaft (DFG) for support within the Emmy-Noether-
Programme.
∗ e-mail address:jdzubiel@ph.tum.de
1 D. Chandler, Nature 437, 640 (2005).
2 S. Rajamani, T. M. Truskett, and S. Garde, PNAS 102,
9475 (2005).
3 A. Poynor, L. Hong, I. K. Robinson, S. Granick, Z. Zhang,
and P. A. Fenter, Phys. Rev. Lett. 97, 266101 (2006).
4 A. Wallquist and B. J. Berne, J. Phys. Chem. 99, 2893
(1995).
5 T. Koishi, Phys. Rev. Lett. 93, 185701 (2004).
6 N. Giovambattista, P. J. Rossky, and P. D. Debenedetti,
Phys. Rev. E 73, 041604 (2006).
7 O. Beckstein and M. S. P. Sansom, Proc. Nat. Acad. Sci.
100, 7063 (2003).
8 A. Anishkin and S. Sukharev, Biophys. J. 86, 2883 (2004).
9 R. Zhou, X. Huang, C. Margulis, and B. J. Berne, Science
305, 1605 (2004).
10 P. Liu, X. Huang, R. Zhou, and B. J. Berne, Nature 437,
159 (2005).
11 P. R. ten Wolde and D. Chandler, Proc. Natl. Acad. Sci.
99, 6539 (2002).
12 D. M. Huang and D. Chandler, PNAS 97, 8324 (2000).
13 A. Carambassis, L. C. Jonker, P. Attard, and M. W. Rut-
mailto:jdzubiel@ph.tum.de
land, Phys. Rev. Lett. 80, 5357 (1998).
14 R. Helmy, Y. Kazakevich, C. Ni, and A. Y. Fadeev, J. Am.
Chem. Soc. 127, 12446 (2005).
15 K. Jayaraman, K. Okamoto, S. J. Son, C. Luckett, A. H.
Gopalani, S. B. Lee, and D. S. English, J. Am. Chem. Soc.
127, 17385 (2005).
16 M. D. Collins, G. Hummer, M. L. Quillin, B. W. Matthews,
and S. M. Gruner, PNAS 102, 16668 (2005).
17 X. Huang, C. J. Margulis, and B. J. Berne, PNAS 100,
11953 (2003).
18 J. Dzubiella, J. M. J. Swanson, and J. A. McCammon,
Phys. Rev. Lett. 96, 087802 (2006).
19 J. Dzubiella, J. M. J. Swanson, and J. A. McCammon, J.
Chem. Phys. 124, 084905 (2006).
20 C. E. Brennen, Cavitation and bubble dynamics (Oxford
U. Press, New York, 1995).
21 M. S. Plesset and A. Prosperetti, Annu. Rev. Fluid. Mech.
25, 577 (1977).
22 H. Spohn, J. Stat. Phys. 71, 1081 (1993).
23 F. Lugli and F. Zerbetto, ChemPhysChem 8, 47 (2007).
24 L. Boruvka and A. W. Neumann, J. Chem. Phys. 66, 5464
(1977).
25 R. C. Tolman, J. Chem. Phys. 17, 333 (1949).
26 U. Raviv, S. Giasson, J. Frey, and J. Klein, J. Phys.: Con-
dens. Matt. 14, 9275 (2002).
27 I. W. Kuo and C. J. Mundy, Science 303, 658 (2004).
28 H. J. C. Berendsen, J. R. Grigera, and T. P. Straatsma, J.
Phys. Chem. 91, 6269 (1987).
29 J. Alejandre, D. J. Tildesley, and G. A. Chapela, J. Chem.
Phys. 102, 4574 (1995).
30 D. M. Huang and D. Chandler, J. Phys. Chem. B 106,
2047 (2002).
31 B. Hess, J. Chem. Phys. 116, 209 (2002).
32 W. Smith and T. R. Forester (1999), the DLPOLY 2 User
Manual.
33 B. Dünweg and K. Kremer, J. Chem. Phys. 99, 6983
(1993).
34 D. Bhatt, J. Newman, and C. J. Radke, J. Phys. Chem. B
108, 9077 (2004).
35 Z. Hai-Lang and H. Shi-Jun, J. Chem. Eng. Data 41, 516
(1996).
36 P. E. Smith and F. van Gunsteren, Chem. Phys. Lett. 215,
315 (1993).
37 K. E. Bett and J. B. Cappi, Nature 207, 620 (1965).
38 J. V. Sengers and J. T. R. Watson, J. Phys. Chem. Ref.
Data 15, 1291 (1986).
ABSTRACT
  An analytical description of the interface motion of a collapsing
nanometer-sized spherical cavity in water is presented by a modification of the
Rayleigh-Plesset equation in conjunction with explicit solvent molecular
dynamics simulations. Quantitative agreement is found between the two
approaches for the time-dependent cavity radius $R(t)$ at different solvent
conditions while in the continuum picture the solvent viscosity has to be
corrected for curvature effects. The typical magnitude of the interface or
collapse velocity is found to be given by the ratio of surface tension and
fluid viscosity, $v\simeq\gamma/\eta$, while the curvature correction
accelerates collapse dynamics on length scales below the equilibrium crossover
scales ($\sim$1nm). The study offers a starting point for an efficient implicit
modeling of water dynamics in aqueous nanoassembly and protein systems in
nonequilibrium.

<|endoftext|><|startoftext|>
Introduction to the AdS/CFT correspondence.
arXiv:hep-th/0009139
19. Polchinski J. String theory, Cambridge: Cambridge University Press (1998).
20. Appelquist T, Chodos A, Freund PG. Modern Kaluza-Klein theories, Menlo Park: Addison-
Wesley (1987)
21. Kim HJ, Romans LJ, van Nieuwenhuizen P. The mass spectrum of chiral N = 2 D = 10
supergravity on S5. Phys. Rev. D 32:389 (1985)
22. Bianchi M, Freedman DZ, Skenderis K. Holographic renormalization. Nucl. Phys. B 631:159
(2002)
23. Maldacena JM. Eternal black holes in anti-de-Sitter. J. High Energy Phys. 0304:021 (2003)
24. Saremi O. Shear waves, sound waves on a shimmering horizon. arXiv:hep-th/0703170
25. Kovtun P, Son DT, Starinets AO. Holography and hydrodynamics: Diffusion on stretched
horizons. J. High Energy Phys. 0310:064 (2003)
26. Damour T. Black Hole Eddy Currents. Phys. Rev. D 18: 3598 (1978)
27. Thorne, KS, Price RH, Macdonald DA. Black Hole: The Membrane Paradigm, New Haven:
Yale University Press (1986)
28. Kovtun PK, Starinets AO. Quasinormal modes and holography. Phys. Rev. D 72:086009 (2005)
29. Starinets AO. Quasinormal spectrum and black hole membrane paradigm. Unpublished
30. Buchel A, Liu JT. Universality of the shear viscosity in supergravity. Phys. Rev. Lett., 93:090602
(2004)
31. Kovtun P, Son DT, Starinets AO. Viscosity in strongly interacting quantum field theories from
black hole physics. Phys. Rev. Lett. 94:111601 (2004)
32. Buchel A. On universality of stress-energy tensor correlation functions in supergravity, Phys.
Lett. B, 609:392 (2005).
33. Buchel A. N = 2* hydrodynamics. Nucl. Phys. B 708:451 (2005)
http://arXiv.org/abs/hep-th/0009139
http://arXiv.org/abs/hep-th/0703170
Viscosity, Black Holes, and QFT 23
34. Damour T. Surface effects in black hole physics. Proceedings of the Second Marcel Grossmann
Meeting on General Relativity. Ed. R. Ruffini, Amsterdam: North Holland (1982)
35. Buchel A, Liu JT, Starinets AO. Coupling constant dependence of the shear viscosity in N = 4
supersymmetric Yang-Mills theory. Nucl. Phys. B 707:56 (2005)
36. Teaney D. Effect of shear viscosity on spectra, elliptic flow, and Hanbury Brown-Twiss radii.
Phys. Rev. C 68:034913 (2003)
37. Shuryak E. Why does the quark gluon plasma at RHIC behave as a nearly ideal fluid? Prog.
Part. Nucl. Phys. 53:273 (2004)
38. Andronikashvili E. Zh. Eksp. Teor. Fiz. 18:429 (1948)
39. Parnachev A, Starinets A. The silence of the little strings. J. High Energy Phys. 0510:027 (2005)
40. Buchel A, Liu JT, Thermodynamics of the N = 2∗ flow. J. High Energy Phys. 0311:031 (2003)
41. Benincasa P, Buchel A, Starinets AO. Sound waves in strongly coupled non-conformal gauge
theory plasma. Nucl. Phys. B 733:160 (2006)
42. Herzog CP et al. Energy loss of a heavy quark moving through N = 4 supersymmetric Yang-
Mills plasma. J. High Energy Phys. 0607:013 (2006)
43. Liu H, Rajagopal K, Wiedemann UA. Calculating the jet quenching parameter from AdS/CFT,
Phys. Rev. Lett. 97:182301 (2006)
44. Casalderrey-Solana J, Teaney D. Heavy quark diffusion in strongly coupled N = 4 Yang Mills.
Phys. Rev. D 74:085012 (2006)
45. Gubser, SS. Drag force in AdS/CFT. Phys. Rev. D 74:126005 (2006)
46. Son DT, Starinets AO. Hydrodynamics of R-charged black holes. J. High Energy Phys. 0603:052
(2006)
47. Mas J, Shear viscosity from R-charged AdS black holes. J. High Energy Phys. 0603:016 (2006)
48. Maeda K, Natsuume M, Okamura T. Viscosity of gauge theory plasma with a chemical potential
from AdS/CFT. Phys. Rev. D 73:066013 (2006)
49. Saremi O. The viscosity bound conjecture and hydrodynamics of M2-brane theory at finite
chemical potential. J. High Energy Phys. 0610:083 (2006)
50. Benincasa P, Buchel A, Naryshkin R. The shear viscosity of gauge theory plasma with chemical
potentials. Phys. Lett. B 645:309 (2007)
	INTRODUCTION
	HYDRODYNAMICS
	Kubo's Formula For Viscosity
	Hydrodynamic Modes
	Viscosity In Weakly Coupled Field Theories
	AdS/CFT CORRESPONDENCE
	Review Of AdS/CFT Correspondence At Zero Temperature
	Black Three-Brane Metric
	REAL-TIME AdS/CFT
	Prescription For Retarded Two-Point Functions
	Calculating Hydrodynamic Quantities
	THE MEMBRANE PARADIGM
	THE VISCOSITY/ENTROPY RATIO
	Universality
	The Viscosity Bound Conjecture
	CONCLUSION
ABSTRACT
  We review recent progress in applying the AdS/CFT correspondence to
finite-temperature field theory. In particular, we show how the hydrodynamic
behavior of field theory is reflected in the low-momentum limit of correlation
functions computed through a real-time AdS/CFT prescription, which we
formulate. We also show how the hydrodynamic modes in field theory correspond
to the low-lying quasinormal modes of the AdS black p-brane metric. We provide
a proof of the universality of the viscosity/entropy ratio within a class of
theories with gravity duals and formulate a viscosity bound conjecture.
Possible implications for real systems are mentioned.

<|endoftext|><|startoftext|>
Introduction
The discovery of large couplings between electrons and the lattice in the cuprate
superconductors has led to a call for more detailed theoretical studies of electron-
phonon systems in low dimensions [1, 2, 3]. One of the best-known traditional
approaches to the electron-phonon problem is attributed to Migdal and Eliashberg
[4, 5]. In a bulk 3D system, the perturbation theory may be sharply truncated at
1st order and momentum dependence neglected if the phonon frequency is much less
than the Fermi energy [4]. In physical terms, Migdal’s approach requires that there
is a very high probability that emitted phonons are reabsorbed in a last-in-first-out
order. The typical materials of interest at the time were bulk metallic superconductors
where electron-phonon coupling is relatively weak, and the phonon frequency small
compared to the Fermi energy. For this reason, the application of Migdal–Eliashberg
(ME) theory has been very successful and remains highly regarded.
Strong electron-phonon coupling and large phonon frequencies in low dimensional
systems are outside the limits of validity of the Migdal–Eliashberg approach.
Therefore, the aim of this paper is to evaluate and discuss the effects of both vertex
corrections (VC) and spatial fluctuations on the theory of coupled electron-phonon
systems in the superconducting state. This follows on from the work by Hague
treating the normal (non-superconducting) state of the Holstein model using DCA
[6]. Initial attempts to include vertex corrections were carried out by Engelsberg and
Schrieffer [7]. Other previous attempts to extend ME theory include the introduction
of vertex corrections into the Eliashberg equations by Grabowski and Sham [8], and
http://arxiv.org/abs/0704.0241v1
Superconducting states of the quasi-2D Holstein model 2
an expansion to higher order in the Migdal parameter by Kostur and Mitrović to
investigate the 2D electron-phonon problem [9]. Grimaldi et al. generalised the
Eliashberg equations to include momentum dependence and vertex corrections [10].
An anomalous hardening of the phonon mode was seen by Alexandrov and Schrieffer
[11]. A discussion of the applicability of these and other approximations to the vertex
function can be found in reference [12].
The current paper uses DCA to introduce a fully self-consistent momentum-
dependent self-energy. DCA extends DMFT by introducing short-range fluctuations
in a controlled manner [13]. It is particularly good at describing the electron-phonon
problem, due to the limited momentum dependence of the self-energy, and in this
case, the self-consistent DCA can be viewed as an expansion about the Eliashberg
equations (in which momentum dependence is effectively coarse grained in a manner
similar to DMFT) [5]. In contrast to the Eliashberg equations, the full form of
the Green’s function is considered here, rather than the renormalised weak coupling
Green’s function (which has the form G−1(ǫk, iωn) = Ziωnσ0 − (ǫk + χ)σ3 −∆σ1).
Two approximations for the electron and phonon self energies are applied in this
paper. The first neglects vertex corrections, but incorporates non-local fluctuations.
The second incorporates lowest order vertex and non-local corrections. The vertex
corrections allow the sequence of phonon absorption and emission to be reordered
once, and therefore introduce exchange effects. The DCA result is compared to the
corresponding DMFT result and in this way low-dimensional effects are isolated. It
should be noted that in the extreme strong coupling limit, the Holstein model forms
a bipolaronic ground state, and perturbative methods in the electron and phonon
Green’s functions break down [14]. In the dilute limit, the Holstein model forms
a polaronic liquid. There are significant differences between the weak and strong
coupling limits of the polaron problem. In the strong-coupling limit, the Lang–Firsov
approximation may be applied, and physical properties have very different behaviour.
For example the effective mass is reduced by an exponential factor of coupling. Exact
numerical results show that the crossover between weak and strong coupling regimes
occurs rapidly at λ ∼ 1 [15]. For this reason, the present approximation should be
considered valid for |U | < W .
The paper is organised as follows. In section 2, the DCA is introduced. In
section 3, the Holstein model of electron-phonon interactions is described, and the
perturbation theory and the full algorithm used in this work are detailed. In section
4, the results are presented. The momentum dependence of the superconducting
order parameter is examined through the density of superconducting pairs. The phase
diagram is then computed and comparison is made with analytical results. A summary
of the major findings of this research is provided in section 5.
2. The dynamical cluster approximation
The dynamical cluster approximation [13, 16] is an extension to the dynamical mean-
field theory. DMFT has been applied as an approximation to models of 3D materials
[17, 18, 19]. However, application of DMFT to one- and two-dimensional models
gives an incomplete description of the physics. An example of significant differences
between two- and three-dimensional physics comes from quantum spin-systems. In 3D,
the Heisenberg model orders at a transition temperature, TN . Significant non-local
fluctuations in two dimensions reduce the Néel temperature to zero (Mermin–Wagner
theorem), and the mean-field approach fails completely.
Superconducting states of the quasi-2D Holstein model 3
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
Figure 1. A schematic representation of the reciprocal-space coarse graining
scheme for a 4 site DCA. Within the shaded areas, the self-energy is assumed to
be constant. There is a many to one mapping from the crosshatched areas to the
points at the centre of those areas. The coarse-graining procedure corresponds
to the mapping to a periodic cluster in real space, with spatial extent N
Also shown are the high symmetry points Γ, W and X, and lines connecting
the high symmetry points. An infinite number of k states are involved in the
coarse-graining step, so the approximation is in the thermodynamic limit. DMFT
corresponds to NC = 1.
Conceptually, DCA is similar to DMFT. The Brillouin zone is divided up into
NC subzones consistent with the lattice symmetry (see figure 1). Within each of these
zones, the self-energy is assumed to be momentum independent. For a system in the
normal state, the Green’s function is determined as,
G(Ki, z) =
Di(ǫ) dǫ
z + µ− ǫ− Σ(Ki, z)
where Di(ǫ) is the non-interacting Fermion density of states for subzone i, and the
vectors Ki represent the average k for each subzone (plotted as the large dots in
figure 1). The theory deals with the thermodynamic limit, and introduces non-local
fluctuations with a characteristic length scale of N
C . For NC = 1, DCA is equivalent
to DMFT.
Since superconducting states are to be considered, DCA is extended within the
Nambu formalism [20] in a similar manner to DMFT [17]. Green’s functions and self-
energies are described by 2 × 2 matrices, with off-diagonal anomalous terms relating
to the superconducting states. Note that in the following equations 4-vectors are
used, i.e. K ≡ (iωn,K). The Green’s function and self-energy matrices have the
components,
G(K) =
G(K) F (K)
F ∗(K) −G(−K)
Σ(K) =
Σ(K) φ(K)
φ∗(K) −Σ(−K)
The coarse graining step is generalised to the superconducting state as,
G(K, iωn) =
Di(ǫ)(ζ(Ki, iωn)− ǫ)
|ζ(Ki, iωn)− ǫ|2 + φ(Ki, iωn)2
Superconducting states of the quasi-2D Holstein model 4
Figure 2. Diagrammatic representation of the approximation used in this paper.
Series (a) represents the vertex-neglected theory which corresponds to the Migdal–
Eliashberg approach. This is valid when there is a high probability that the last
emitted phonon is the first to be reabsorbed, which is true if the phonon energy
ω0 and electron-phonon coupling U are small compared to the Fermi energy.
Series (b) represents additional diagrams for the vertex corrected theory. The
inclusion of the lowest order vertex correction allows the order of absorption and
emission of phonons to be swapped once. For moderate phonon frequency and
electron-phonon coupling, these additions to the theory, in combination with non-
local corrections are expected to improve the theory to sufficient accuracy. The
phonon self energies are labeled with Π, and Σ denotes the electron self-energies.
Lines represent the full electron Green’s function and wavy lines the full phonon
Green’s function.
F (K, iωn) = −
Di(ǫ)φ(Ki, iωn)
|ζ(Ki, iωn)− ǫ|2 + φ(Ki, iωn)2
where ζ(Ki, iωn) = iωn + µ− Σ(Ki, iωn).
The symmetry of the problem was constrained using the pm3m planar point
group suitable for a 2D square lattice [21]. The partial DOS used in the self-consistent
condition were calculated using the analytic tetrahedron method to ensure very high
accuracy [22].
3. The Holstein model
A simple, yet non-trivial, model of electron-phonon interactions treats phonons as
nuclei vibrating in a time-averaged harmonic potential (representing the interactions
between all nuclei) i.e. only one frequency ω0 is considered. The phonons couple
to the local electron density via a momentum-independent coupling constant g. The
resulting Holstein Hamiltonian [23] is written as,
H = −
t<ij>σc
iσcjσ +
niσ(gri−µ)+
Mω20r
The first term in this Hamiltonian represents a tight binding model with hopping
parameter t. Its Fourier transform takes the form ǫk = −2t
i=1 cos(ki). The second
term connects the local ion displacement, ri to the local electron density. Finally
the last term can be identified as the bare phonon Hamiltonian, which is a simple
harmonic oscillator. The creation and annihilation of electrons is represented by c
and ci respectively, pi is the ion momentum and M the ion mass. t = 0.25 in this
paper, corresponding to a bandwidth of W = 2. A small interplanar hopping of
Superconducting states of the quasi-2D Holstein model 5
t⊥ = 0.01 is included to reduce the strength of the logarithmic singularity at ǫ = 0
in Dπ,0(ǫ) and D0,π(ǫ) and stabilise the solution. This is only expected to modify the
results at very low temperature for large clusters, and gives the problem a quasi-2D
character.
It is possible to find an expression for the effective interaction between electrons by
integrating out phonon degrees of freedom [24]. In Matsubara space, this interaction
has the form,
U(iωs) =
ω2s + ω
Here, ωs = 2πsT represent the Matsubara frequencies for Bosons and s is an integer. A
variable U = −g2/Mω20 is defined to represent the effective electron-electron coupling
in the remainder of this paper.
When phonon frequency and coupling are small, Migdal’s theorem applies.
Migdal’s approach allows vertex corrections to be neglected and becomes exact when
U → 0−, ω0 → 0
+ and is 1st order in U . In the limit of huge phonon frequency,
the model maps onto an attractive Hubbard model, so the weak coupling limit of
the Holstein model is only obtained by considering all second-order diagrams in U ,
and ME theory fails. The vertex-corrected theory described in this paper has the
appropriate weak coupling behaviour for both large and small ω0.
In this paper, perturbation theory to 2nd order in U is used [19] (figure 2).
The derivation of the perturbation theory in Ref. [19] made use of the conserving
approximations of Bahm and Kadanoff [25, 24], which Miller et al. then simplified by
applying the dynamical mean-field theory (or local approximation). Here the theory
has been extended to include partial momentum dependence through the application
of the DCA. The electron self-energy has two terms, ΣME(K, iωn) neglects vertex
corrections (figure 2(a)), and ΣVC(K, iωn) corresponds to the vertex corrected case
(figure 2(b)). ΠME(K, iωs) and ΠVC(K, iωs) correspond to the equivalent phonon self
energies. The diagrams translate as follows:
ΣME(K) = UT
G(Q)D(K−Q) (8)
φME(K) = −UT
F (Q)D(K−Q) (9)
ΠME(K) = −2UT
[G(Q)G(K+Q)− F (Q)F ∗(K+Q)] (10)
ΣVC(K) = (UT )
Q1,Q2
[G(Q1)G(Q2)G(K−Q2 −Q1)
− F (Q1)G(Q2)F
∗(K−Q2 −Q1)
− F ∗(Q1)G(Q2)F (K−Q2 −Q1)
−G∗(Q1)F (Q2)F
∗(K−Q2 −Q1)]
×D(K−Q2)D(Q1 −Q2) (11)
φVC(K) = (UT )
Q1,Q2
[F ∗(Q1)F (Q2)F (K−Q2 +Q1)
−G(Q1)F (Q2)G(K−Q2 +Q1)
Superconducting states of the quasi-2D Holstein model 6
−G∗(Q1)F (Q2)G
∗(K−Q2 +Q1)
− F (Q1)G(Q2)G
∗(K−Q2 +Q1)]
×D(K−Q2)D(Q1 −Q2) (12)
ΠVC(K) = −(UT )
Q1,Q2
Tr {σ3G(Q2 +K)σ3G(Q2)σ3G(Q1)σ3G(K+Q1)}
×D(Q2 −Q1) (13)
where σ3 is the third Pauli matrix. Σ = ΣME +ΣVC and Π = ΠME + ΠVC.
The coarse-grained phonon propagator D(K, iωs) is calculated from,
D(K, iωs) =
ω2s + ω
0 −Π(K, iωs)
since the bare dispersion of the Holstein model is flat.
The time taken to perform the double integration over momentum and Matsubara
frequencies is the main barrier to performing vertex-corrected calculations, and this
limits the cluster size. Since the Holstein model with ω0, |U | ≪ W (W is the
bandwidth) has fluctuations which are almost momentum independent, the DCA has
especially fast convergence in NC for the parameter regime where ω0, |U | < W , and
calculations with relatively small cluster size accurately reflect the physics [6]. In this
respect, finite size calculations take too long to compute, and the application of DCA
to this problem is essential.
4. Results
In this section, I discuss results from the self-consistent scheme. Calculations are
carried out along the Matsubara axis, with sufficient Matsubara points for an accurate
calculation. The vertex corrected self-energies drop off more quickly with Matsubara
frequency, so it is possible to increase efficiency by calculating for less frequencies.
Typically, 256 Matsubara frequencies are used for the vertex neglected diagrams, and
64 for the vertex corrected diagrams, which reach asymptotic behaviour at smaller
Matsubara frequencies. The scheme was iterated until the normal and anomalous
self-energies had converged to an accuracy of approximately 1 part in 104. This
corresponds to a very high accuracy for the Green’s function.
Obtaining superconducting solutions involves an additional step, which is not
obvious at the outset. Since the anomalous Green’s function is proportional to
the anomalous self energy, initialising the problem with the non-interacting Green’s
function leads to a non-superconducting (normal) state. Also, the non-interacting
Green’s function is consistent with an ungapped state and opening a gap in the
electron spectrum can lead to limit cycles during self-consistency, which are damped
in the normal way [17].
To induce superconductivity, a constant superconducting field is applied to the
whole system, leading to a non-zero anomalous Green’s function, and automatically
opening a gap in the normal-state Green’s function. The procedure of applying a
fictitious superconducting field is analogous to the application of a magnetic field
to a spin system to induce a moment (the order parameter in that case). The
superconducting field is applied by adding a constant term to the anomalous self-
energy in equation 9. With the field applied, equations 4,5 and 8-14 are solved self-
consistently until convergence is reached. Once satisfactory convergence is reached,
Superconducting states of the quasi-2D Holstein model 7
 0.05
 0.15
 0.25
-2 -1.5 -1 -0.5  0  0.5  1  1.5  2
n=1.12
(0,0)
(π,0)
(π,π)
 0.005
 0.01
 0.015
 0.02
 0.025
 0.03
-2 -1.5 -1 -0.5  0  0.5  1  1.5  2
n=1.54
(0,0)
(π,0)
(π,π)
 0.01
 0.02
 0.03
 0.04
 0.05
 0.06
 0.07
 0.08
-2 -1.5 -1 -0.5  0  0.5  1  1.5  2
n=1.12, VC
(0,0)
(π,0)
(π,π)
 0.005
 0.01
 0.015
 0.02
 0.025
 0.03
 0.035
-2 -1.5 -1 -0.5  0  0.5  1  1.5  2
n=1.54, VC
(0,0)
(π,0)
(π,π)
Figure 3. Real part of the anomalous self-energy at various fillings: (a) n = 1.12,
no vertex corrections (b) n = 1.54, no vertex corrections (c) n = 1.12, vertex
corrections, (d) n = 1.54, vertex corrections. Calculations were carried out at
T = 0.005 with U = 0.6 and ω0 = 0.4. Momentum dependence corresponding to
non-local corrections is clearly visible at half-filling, but drops off as the edge of
the superconducting phase is reached. Vertex corrections are also most important
at half-filling. There is a dip in the anomalous self-energy because the vertex
corrections drop off more quickly in ωn, with opposite sign to φME, indicating
that the approximation is close to breakdown at half-filling. The slight increase
in the anomalous self-energy at n = 1.54 due to vertex corrections arises from a
change in the form of the electronic Green’s function. For half filling, the Green’s
function at the van-Hove points is pure imaginary, whereas for the dilute system,
it is mostly real, so sums over products of Green’s functions in the vertex can
change sign.
the fictitious field is completely removed. Iteration then continues until the true
superconducting state is reached. This procedure corresponds to initialising the self-
consistent cycle with a superconducting solution; note that similar techniques are used
for obtaining Mott insulating solutions in the Hubbard model using DMFT [17].
By following this procedure, a superconducting state may be found below the
transition temperature, TC . Green’s functions and self-energies computed in the
superconducting state can then be used to initialise the self-consistent equations
for similar couplings, fillings, temperatures (with an appropriate rescaling of the
Matsubara frequencies) and phonon frequencies. Above the transition temperature,
the magnitude of the anomalous Green’s function tends to zero during self-consistency
as expected.
It is possible to see the generic effects of vertex and non-local corrections by
examining the anomalous self energy. In figure 3, the anomalous self energy is shown
for n = 1.12 and n = 1.54 for a cluster size of NC = 4 with parameters of U = 0.6,
T = 0.005 and ω0 = 0.4 with and without vertex corrections. The panels are (a)
n = 1.12, no vertex corrections (b) n = 1.54, no vertex corrections (c) n = 1.12,
vertex corrections, (d) n = 1.54, vertex corrections. In panel (a), the momentum
dependence of the vertex neglected theory is clearly visible, and φ(K, iωn) has a much
Superconducting states of the quasi-2D Holstein model 8
larger value at the (π, 0) point. Momentum dependence is significantly reduced as
the system moves away from half-filling (panel b), indicating that Midgal-Eliashberg
theory is more accurate in dilute systems. This is expected, since in very dilute
systems, the electron density is sufficiently low that electrons meet very infrequently,
and therefore the crossed diagrams of figure 2b have extremely small contributions. By
scanning vertically, the effect of including vertex corrections can be seen. Corrections
are strongest close to half-filling, and drop off as the edge of the superconducting
phase is reached. Migdal–Eliashberg theory is clearly quite accurate for dilute 2D
systems, but it consistently fails close to half-filling. Initially, it seems as though
vertex corrections are larger than the vertex neglected results at n = 1.12. In fact,
this is not the case. As discussed in ref. [6], at half-filling vertex corrections act to
reduce the magnitude of the phonon self-energy, so there is much less renormalisation
of the phonon propagator. A smaller phonon propagator means that the effective
coupling is smaller, stabilising the expansion in λeff . There is a dip in the anomalous
self-energy because the vertex corrections drop off more quickly in ωn, with opposite
sign to φME, which is an indication that the approximation is close to breakdown at
half-filling. The slight increase in the anomalous self-energy at n = 1.54 due to vertex
corrections comes about from a change in form of the electronic Green’s function. For
half filling, the Green’s function at the van-Hove points is pure imaginary, whereas for
the dilute system, it is mostly real, so sums over products of Green’s functions can
change sign with respect to the Migdal–Eliashberg result. This sign change is also
seen in DMFT simulations of the 3D Holstein model [19].
At this stage, it is appropriate to examine the size of the parameter λeff that
defines the vertex correction expansion. Since the expansion in this case is in the full
phonon Green’s function, the expansion parameter is renormalised by the phonons,
and reads λeff = UD(µ)D(iωs = 0), where D(iωs = 0) is the phonon propagator at
zero Matsubara frequency. For dilute systems, D(µ) is typically small, and so λeff is
small (N.B. Unlike in 3D, D(µ) is never zero in 2D, because of the discontinuity in
the band edge of the non-interacting DOS). Close to half-filling, the DOS in 2D is
divergent, and this parameter is expected to be large. In the current approximation,
a small interplanar hopping was applied to stabilise the solution, so λeff is smaller
than expected in a pure 2D system. As noted in ref. [6], D(0) is reduced by vertex
corrections as compared to the Migdal–Eliashberg result. For most energies, the bare
density of states in 2D is smaller than the bare density of states in 3D, since the
divergence drops off logarithmically close to half filling. Therefore, λeff is only really
large for n = 1 within the current parameter range. For the mid to dilute limits,
the relative magnitude of the second order vertex correction goes like λ2 ∼ 0.04. At
half filling, with the current parameters, λ2 ∼ 0.5, so the approximation can only
be considered to be qualitatively correct. Nonetheless, the current approximation has
features appropriate to Hohenberg’s theorem (discussed later) and the bare DOS drops
off so quickly moving away from half-filling, that results are expected to be accurate
for most n.
How do the differences in the self-energy relate to observable quantities? One
of the big questions in unconventional superconductivity concerns the possible forms
that the order parameter can take, and a large discussion has grown up around issues
such as the existence of unconventional order parameters such as extended s-wave
and higher harmonics. To examine this idea, I demonstrate the evolution of the shape
of the anomalous pairing density (ns(k) = |T
n F (k, iωn)|), which is related to the
order parameter. In this paper, the superconducting order parameter is treated in
Superconducting states of the quasi-2D Holstein model 9
 0.05
 0.15
U=0.6, ω0=0.4, T=0.005, Nc=1
 0.05
 0.15
 0.05
 0.15
U=0.6, ω0=0.4, T=0.005, Nc=4
 0.05
 0.15
 0.05
 0.15
U=0.6, ω0=0.4, T=0.005, Nc=64
 0.05
 0.15
 0.14
 0.15
 0.16
 0.17
 0.18
 0.19
 0.21
 0.22
(0,π)(π,0)
NC= 1
NC= 4
NC=64
Figure 4. Variation of superconducting (anomalous) pairing density across the
Brillouin zone. U = 0.6, ω0 = 0.4, n = 1 and T = 0.005. Cluster sizes are
increased from NC = 1 to NC = 64. Pairing occurs between electrons close to
the “Fermi-surface” at k and the opposite face of the surface at −k. Also shown
is the pairing density at the Fermi surface for the 3 different cluster sizes (bottom
right). For a cluster size of Nc = 1 corresponding to DMFT, the pairing is
uniform around the “Fermi-surface”, demonstrating that momentum dependence
has been neglected. Momentum dependence favours additional pairing along the
kx = ky line, and a peak can clearly be seen. The expansion in spherical harmonics
contains even momentum states with m = 0. For Nc = 64, additional peaks
can be seen, suggesting that the order parameter also contains extra higher-order
harmonics. The Fermi-surface is not very clearly defined, with the mobile electrons
spread out over a significant range of momentum states. T = 0.005 << W , so
the spread should be very small in all locations in the Brillouin zone, except at
the van Hove points, (π, 0) and (0, π), indicating that the spreading is due to the
low dimensionality.
a fully self-consistent manner within the lattice symmetry and no assumptions have
been made in advance about its form.
Figure 4 shows the variation of superconducting pairing across the Brillouin zone.
In all of the panels, U = 0.6, ω0 = 0.4, n = 1 and T = 0.005. A range of cluster sizes
is shown. In the dynamical mean-field theory which corresponds to the Eliashberg
solution (cluster size of Nc = 1) the pairing is uniform around the Fermi-surface,
as is expected when momentum-dependence is neglected. The inclusion of non-local
momentum dependent fluctuations has a small, but significant effect on the ordering.
Pairing is reduced most at the (π, 0) and (0, π) points, leading to a visible peak at
(π/2, π/2). This demonstrates that the order parameter must necessarily include
higher harmonics. For Nc = 64 additional peaks are also seen. The additional
features may be examined by determining the parameters of an expansion in spherical
harmonics, clm =
(2π)3
Ylm(θ, φ)ns(k) (figure 5). This shows that the anomalous
density can be thought of as m = 0 harmonics with s, d, g... character, and additional
harmonics with m = ±4 in the g channel. The harmonics can be quite large, especially
away from half-filling, and undoubtedly need to be included if the superconductivity
is to be described correctly.
Superconducting states of the quasi-2D Holstein model 10
-0.025
-0.02
-0.015
-0.01
-0.005
 0.005
 0.01
 1  1.1  1.2  1.3  1.4  1.5  1.6  1.7
s, m=0
d, m=0
g, m=0
g, m=4,-4
Figure 5. Harmonic decomposition of the anomalous density computed for
Nc = 64 as chemical potential is varied. It can be seen that pure s-wave states
are the largest contributors to the anomalous density, followed by g and then d
states with m = 0. Owing to the hump at the (π/2, π/2) point, there are also
g states with m = ±4. The m = ±4 states have equal magnitude, so mtot = 0.
Note that the relative contribution of higher harmonics is greatest away from half
filling.
 0.05
 0.15
U=0.9, ω0=0.05, T=0.005, Nc=4
 0.05
 0.15
 0.05
 0.15
U=0.9, ω0=0.4, T=0.005, Nc=4
 0.05
 0.15
 0.05
 0.15
U=0.6, ω0=0.05, T=0.005, Nc=4
 0.05
 0.15
 0.05
 0.15
U=0.6, ω0=0.4, T=0.005, Nc=4
 0.05
 0.15
Figure 6. Variation of superconducting (anomalous) pairing density across the
Brillouin zone. T = 0.005 and NC = 4. Changes in the order parameter are
shown as coupling and phonon frequency are changed. As the phonon frequency
is increased, the momentum dependence also increases, and the Fermi-surface
is less well defined. For U = 0.9, ω0 = 0.05, the order parameter is almost
flat along the Fermi-surface, indicating that DMFT is a good approximation for
those parameters. For the largest coupling and phonon frequencies (at the edge
of applicability for the current approximation), the Fermi-surface is practically
destroyed.
Superconducting states of the quasi-2D Holstein model 11
Figure 6 shows the variation of superconducting (anomalous) pairing density
across the Brillouin zone as coupling and phonon frequency are changed. T = 0.005,
n = 1 and NC = 4 with vertex corrections excluded. As the phonon frequency is
increased, the momentum dependence also increases. For U = 0.9, ω0 = 0.05, the
order parameter is almost flat along the Fermi-surface, indicating that DMFT is a
good approximation for those parameters. Typically, additional coupling makes the
order more uniform in the Brillouin zone. Note that for very strong coupling, the
DMFT solution is expected to become exact, even in 2D, since the bare dispersion
is then essentially flat, and the problem is completely local. For weak coupling and
phonon frequency, there is a well defined Fermi-surface. For the largest coupling and
phonon frequencies (at the edge of applicability for the current approximation), the
Fermi-surface is practically destroyed.
It is of clear interest to map the phase diagram associated with superconducting
order. First, the superconductivity arising from DMFT is investigated in the absence
of vertex corrections. Figure 7 shows the resulting phase diagram. Note that in
the DMFT solution, the superconductivity is strongest at half-filling and the order
drops off monotonically as the filling increases. Assuming a form for the density of
states in 2D (with small interplane hoppnig) of D(ǫ) = (1− t log[(ǫ2 + t2⊥)/16t
2])/tπ2
(for |ǫ| < 4t) [26], the BCS result may be calculated using the expression TC(n) =
2ω0 exp(−1/|U |D(µ(n)))/π, with the chemical potential taken from the self-consistent
solution for a given n. This result also drops off monotonically. Results in the dilute
limit are in good agreement with the BCS result. Closer to half-filling, the DMFT
result is significantly smaller than the BCS result (which predicts TC(n = 1) > 0.07).
The difference in results between the two mean-field theories at half-filling is due to
the self-consistency in the DMFT. For small U , the self-consistent equations converge
on the first iteraction, but for larger U , the phonon and electron Green’s functions are
significantly renormalised, thus reducing the transition temperature.
To show the differences induced by spatial fluctuations, the phase diagram is
computed for a cluster size of NC = 4. Figure 8 shows the total density of
superconducting pairs for U = 0.6, ω0 = 0.4 and various temperatures and fillings,
without vertex corrections. Of most interest is an anomalous bump centred about
n = 1.25, indicating that the strongest superconductivity occurs away from half filling,
and that this is due to non-local fluctuations in 2D. Assuming a material with a non-
interacting band width of 1eV, the highest transition temperature would correspond
to 145K. This value is higher that that eventually expected in real materials. For
instance, the effect of a Coulomb pseudopotential UC will be a reduction of the
transition temperature. The standard BCS result is modified by Coulomb repulsion
in the following way, TC = 2ω0 exp(−1/(λ− UC)))/π.
In addition to the TC reduction due to Coulomb repulsion, a fundamental limit
on the transition temperature in pure 2D is the Hohenberg theorem [27]. This is
closely related to the effects of spatial fluctuations. The basis of Hohenberg’s proof is
the divergence of certain quantities (which are known to be finite) in d ≤ 2 at k = 0
when anomalous expectation values (e.g. the superconducting order parameter) are
non zero. In the DMFT solution, there are no specific k = 0 states due to the
coarse graining, and so the divergence in the correlation functions that led Hohenberg
to determine that the order parameter must be zero for zero momentum pairing
in d ≤ 2 is washed out, leading to a finite transition temperature for the 2D local
approximation. In DCA, partial momentum dependence is restored. Therefore, the
effects of the washed out divergences are stronger. There is still a finite transition
Superconducting states of the quasi-2D Holstein model 12
 0.02
 0.04
 0.06
 0.08
 0.01
 0.02
 0.03
 0.04
 1  1.2  1.4  1.6  1.8
 0.02
 0.04
 0.06
 0.08
U=0.6, ω0=0.4, Nc=1
Figure 7. Superconducting phase diagram showing the total number of
superconducting states. U = 0.6, ω0 = 0.4 and various T . A cluster size of Nc = 1
has been used, and no vertex corrections are included. The superconductivity
is strongest at half-filling and the order drops off monotonically as the filling
increases. Results in the dilute limit are in good agreement with the BCS result
(the transition temperature from BCS is shown as the line with points in the
ns = 0 plane). Closer to half-filling, the DMFT result is significantly smaller
than the BCS result (which predicts TC(n = 1) > 0.07). The difference in results
between the two mean-field theories at half-filling is due to the self-consistency in
the DMFT.
temperature, but it is reduced wherever there is strong momentum dependence. This
is demonstrated by the drop in superconducting order at and close to half-filling in
figure 8, where the momentum dependence is strongest. As the number of cluster
points increases, the momentum resolution becomes superior, and the divergences of
Hohenberg’s theorem are expected to emerge in a systematic manner. In real materials
with quasi-2D character, some interplane hopping remains. In that case, the results
from small cluster DCA are expected to be more reliable.
Finally, I demonstrate the effects of vertex corrections on the superconducting
phase diagram. Figures 9 and 10 show the total number of superconducting states
as a function of filling and temperature for cluster sizes of NC = 1 and NC = 4
respectively. An electron-phonon coupling of U = 0.6 and phonon frequency of
ω0 = 0.4 have been used. Vertex corrections do not appear to make a large difference
to the DMFT result in figure 9. For NC = 4, the bulge that was seen in the non
vertex-corrected phase diagram is very clearly enhanced. Superconductivity at half-
filling is completely suppressed in the vertex corrected solution. This is the precursor
to Hohenberg’s theorem applying across the entire phase diagram, and both the effects
of spatial fluctuations and the lowest order vertex correction were essential to obtain
that agreement.
Superconducting states of the quasi-2D Holstein model 13
 0.02
 0.04
 0.06
 0.08
U=0.6, ω0=0.4, Nc=4
 0.01
 0.02
 0.03
 0.04
 1  1.2  1.4  1.6  1.8
 0.02
 0.04
 0.06
 0.08
Figure 8. Superconducting phase diagram showing the total number of
superconducting states. U = 0.6, ω0 = 0.4 and various T . A cluster size of Nc = 4
has been used, and no vertex corrections are included. There is an anomalous
bump centred about n = 1.25, indicating that the strongest superconductivity
occurs away from half filling. The highest transition temperature occurs for
T = 0.025. The reduction in the transition temperature close to half-filling shows
the onset of Hohenberg’s theorem. The largest superconductivity coincides with
the increase in components without pure s-wave character (see figure 5).
 0.02
 0.04
 0.06
 0.08
 0.12
U=0.6, ω0=0.4, Nc=1, VC
 0.01
 0.02
 0.03
 0.04
 1  1.2  1.4  1.6  1.8
 0.02
 0.04
 0.06
 0.08
 0.12
Figure 9. Superconducting phase diagram showing the total number of
superconducting states. U = 0.6, ω0 = 0.4 and various T . A cluster size of
Nc = 1 has been used, and vertex corrections are included. As in figure 7, the
DMFT result falls off monotonically with increased filling.
Superconducting states of the quasi-2D Holstein model 14
 0.02
 0.04
 0.06
 0.08
U=0.6, ω0=0.4, Nc=4, VC
 0.01
 0.02
 0.03
 0.04
 1  1.2  1.4  1.6  1.8
 0.02
 0.04
 0.06
 0.08
Figure 10. Superconducting phase diagram showing the total number of
superconducting states. U = 0.6, ω0 = 0.4 and various T . A cluster size of
Nc = 4 has been used, and vertex corrections are included. Superconducting
states are suppressed at half-filling, and there is a significant bulge away from
half-filling, with a maximum transition temperature of 0.015W. It is significant
that the transition temperature is reduced to zero at half-filling and supressed
close to half filling, since reduction of transition temperatures is expected in 2D
due to Hohenberg’s theorem.
5. Summary
In this paper I have carried out DCA calculations of a quasi-2D Holstein model in
the superconducting state with large in plane hopping and small out of plane hopping
(t = 0.25, t⊥ = 0.01). Several approximations to the self-energy were considered,
including the neglect of vertex corrections (which corresponds to a momentum-
dependent extension to the Eliashberg theory), the inclusion of vertex corrections
as a corrected approximation for stronger couplings, and the introduction of spatial
fluctuations. The anomalous self energy, superconducting order parameter and phase
diagram were calculated.
The superconducting order parameter was found to modulate around the fermi
surface, and is not pure s-wave. Analysis of the harmonics showed that the state is
a conbination of s, d, f etc. states with m = 0, and other states with integer value of
m = ±4n. The total angular momentum is always zero. The contribution of the m 6= 0
states is considerably larger away from half filling. Increases in bare phonon frequency
tended to increase the strength of the superconducting order, and contributed to a
degeneration of the Fermi-surface.
The phase diagram was analysed for small cluster sizes of NC = 1, 4. For NC = 1,
the phase diagram was shown to agree qualitatively with the BCS theory. When
spatial fluctuations are included, the superconducting order is suppressed at half-
filling, leading to a characteristic hump at a doping of approximately δn = 0.25. Vertex
corrections completely suppress superconductivity at half-filling, which is believed to
be a manifestation of Hohenberg’s theorem. In particular, the states with the largest
momentum dependence showed the strongest reduction in the transition temperature,
Superconducting states of the quasi-2D Holstein model 15
indicating that spatial fluctuations as well as vertex corrections contribute to the
supression of superconducting order in pure 2D materials.
Acknowledgments
JPH would like to thank the University of Leicester for hospitality and use of facilities
while carrying out this research.
[1] A.Lanzara, P.V.Bogdanov, X.J.Zhou, S.A.Kellar, D.L.Feng, E.D.Lu, T.Yoshida, H.Eisaki,
A.Fujimori, K.Kishio, J.-I.Shimoyama, T.Noda, S.Uchida, Z.Hussa, and Z.-X.Shen. Nature,
412:6846, 2001.
[2] R.J.McQueeny, Y.Petrov, T.Egami, M.Yethiraj, G.Shirane, and Y.Endoh. Phys. Rev. Lett.,
82:628, 1999.
[3] G.M.Zhao, M.B.Hunt, H.Keller, and K.A.Müller. Nature, 385:236, 1997.
[4] A.B.Migdal. JETP letters, 7:996, 1958.
[5] G.M.Eliashberg. Interactions between electrons and lattice vibrations in a superconductor.
JETP letters, 11:696, 1960.
[6] J.P.Hague. Electron and phonon dispersions of the two dimensional Holstein model: Effects of
vertex and non-local corrections. J. Phys. Condens. Matt, 15:2535, 2003.
[7] S.Engelsberg and J.R.Schrieffer. Phys. Rev., 131:993, 1963.
[8] M.Grabowski and L.J.Sham. Superconductivity from nonphonon interactions. Phys. Rev. B,
29:6132, 1984.
[9] V.N.Kostur and B.Mitrović. Electron-phonon interaction in two dimensions: Variation of
ImΣ(ǫp, ω) with increasing ωD/EF . Phys. Rev. B, 48:16388, 1993.
[10] C.Grimaldi, L.Pietrono, and S.Strässler. Nonadiabatic superconductivity: Electron-phonon
interaction beyond Migdal’s theorem. Phys. Rev. Lett., 75:1158, 1995.
[11] A.S.Alexandrov and J.R.Schrieffer. Phys. Rev. B, 56:13731, 1997.
[12] O.V.Danylenko and O.V.Dolgov. Nonadiabatic contribution to the quasiparticle self-energy in
systems with strong electron-phonon interaction. Phys. Rev. B, 63:094506, 2001.
[13] M.Hettler, A.N.Tahvildar-Zadeh, M.Jarrell, T.Pruschke, and H.R.Krishnamurthy. Nonlocal
dynamical correlations of strongly interacting electron systems. Phys. Rev. B, 58:7475, 1998.
[14] A.S.Alexandrov and N.F.Mott. Rep. Prog. Phys., 57:1197, 1994.
[15] P.E.Kornilovitch. Phys. Rev. Lett., 81:5382, 1998.
[16] M.Hettler, M.Mukherjee, M.Jarrell, and H.R.Krishnamurthy. Phys. Rev. B, 61:12739, 2000.
[17] A.Georges, G.Kotliar, W.Krauth, and M.Rozenburg. Rev. Mod. Phys, 68:13, 1996.
[18] A.N.Tahvildar-Zadeh, J.K.Freericks, and M.Jarrell. Magnetic phase diagram of the hubbard
model in three dimensions: the second-order local approximation. Phys. Rev. B, 55:942,
1997.
[19] P.Miller, J.K.Freericks, and E.J.Nicol. Possible experimentally observable effects of vertex
corrections in superconductors. Phys. Rev. B, 58(21):14498, 1998.
[20] Thomas Maier, Mark Jarrell, Thomas Pruschke, and Matthias H. Hettler. Quantum cluster
theories. cond-mat/0404055.
[21] T.Hahn, editor. International tables for crystallography. Volume A: Space group symmetry.
Kluwer Academic Publishers, Dordrecht, 1996.
[22] Ph. Lambin and J.P.Vigneron. Computation of crystal green’s functions in the complex-energy
plane with the use of the analytical tetrahedr on method. Phys. Rev. B, 29:3430, 1984.
[23] T.Holstein. Ann. Phys., 8:325–342, 1959.
[24] N.E.Bickers and D.J.Scalapino. Ann. Phys., 193:206, 1989.
[25] G.Bahm and L.P.Kadanoff. Phys. Rev., 124:287, 1961.
[26] L.S.Macarie and N.d’Ambrumenil. J. Phys. Condens. Matt, 7:3237, 1995.
[27] P.C.Hohenberg. Phys. Rev., 158:383, 1967.
http://arxiv.org/abs/cond-mat/0404055
	Introduction
	The dynamical cluster approximation
	The Holstein model
	Results
	Summary
ABSTRACT
  I investigate superconducting states in a quasi-2D Holstein model using the
dynamical cluster approximation (DCA). The effects of spatial fluctuations
(non-local corrections) are examined and approximations neglecting and
incorporating lowest-order vertex corrections are computed. The approximation
is expected to be valid for electron-phonon couplings of less than the
bandwidth. The phase diagram and superconducting order parameter are
calculated. Effects which can only be attributed to theories beyond
Migdal--Eliashberg theory are present. In particular, the order parameter shows
momentum dependence on the Fermi-surface with a modulated form and s-wave order
is suppressed at half-filling. The results are discussed in relation to
Hohenberg's theorem and the BCS approximation.

<|endoftext|><|startoftext|>
Introduction
This review summarizes maser results pertinent to star formation appearing in the
literature since the last maser meeting (IAU Symposium 206). References are drawn
from recent literature when possible.
2. Masing species
2.1. Water (H2O)
The 22.235 GHz water line is the predominant water maser line. Masers in this transition
are very bright, easily observable, and inverted under a wide range of conditions (e.g.,
Babkovskaia & Poutanen 2004). Several millimeter and submillimeter transitions of wa-
ter are also seen as masers. Discussion of these transitions can be found in the section of
these proceedings devoted to millimeter and submillimeter masers.
Water masers are frequently seen in outflows from both high-mass and low-mass YSOs
(Honma et al. 2005; Moscadelli, Cesaroni, & Rioja 2005; Goddi & Moscadelli 2006; Moscadelli et al. 2006).
These jets are seen in deceleration (Imai et al. 2002) and often have substructure on AU
scales (Torrelles et al. 2003; Furuya et al. 2005; Uscanga et al. 2005).
Water masers are sometimes believed to trace disks as well as outflows (Seth, Greenhill, & Holder 2002;
Gallimore et al. 2003), possibly excited by an expanding shock wave. Indeed, shocks
likely are responsible for arc-like maser distributions (Honma et al. 2004) and may ex-
cite masers in accreting material as well (Menten & van der Tak 2004). Water masers
appear in Bok globules (Gómez et al. 2006), likely tracing bipolar molecular outflows
(de Gregorio-Monsalvo et al. 2006), as well as bright rimmed clouds (Valdettaro et al. 2005),
again likely associated with outflows (Urquhart et al. 2006). The location of water masers
near the ionization front of large H ii regions provides evidence in support of triggered
star formation (Healy, Hester, & Claussen 2004). The common thread of all these en-
vironments is the existence of energetic shocks, which fits with conventional wisdom
regarding water maser pumping.
Despite the small Zeeman splitting coefficient of water, line-of-sight magnetic fields
of tens to hundreds of milligauss have been measured in star forming regions via water
maser Zeeman splitting (Sarma et al. 2002; Vlemmings et al. 2006), although direct in-
terpretation of the Stokes V profile as a magnetic field strength may be in error by up to a
http://arxiv.org/abs/0704.0242v1
2 Fish
factor of two depending on local velocity and magnetic field gradients (Vlemmings 2006).
Linear polarization observations of water masers can provide information on the orien-
tation of the magnetic field in the plane of the sky. The hourglass morphology of the
magnetic field in W3 IRS 5 appears to be due to the processes of collapse rather than a
result of the outflow traced by water masers (Imai et al. 2003). Further interpretation of
water maser polarization can be found in the review by Wouter Vlemmings.
Water masers have a fractal distribution over 4 orders of magnitude in spatial scale,
possibly indicating that they appear at the turbulent dissipation scale (Strelnitski et al. 2002;
Ripman & Strelnitski 2006; see also Vladimir Strelnitski’s contribution to these pro-
ceedings). Turbulence may also be responsible for variability, including changes in the
line-of-sight velocity of individual components, of the water masers in some sources
(Lekht et al. 2006a). Larger-scale variations may contribute as well, such as changes in
outflow parameters or cyclic variability of the central star (Pashchenko & Lekht 2005;
Lekht et al. 2006b). An ordered structure is inferred in W31(2) from successive flaring
of features at different velocities (Lekhn, Munitsyn, & Tolmachev 2005).
2.2. Methanol (CH3OH)
Methanol masers divide into two categories, known as class I and class II, based on
their propensity for certain transitions to produce masers while others are seen in ab-
sorption. Traditionally, class I and class II masers do not mix (e.g., Ellingsen 2005);
however, fine-tuned conditions may rarely excite lines from both classes simultaneously,
as appears to be the case in OMC-1 (Voronkov et al. 2005). For purposes of this review,
class I and class II methanol masers will be treated separately. Improved laboratory data
for rest frequencies have been obtained for many methanol transitions in both classes
(Müller, Menten, & Mäder 2004).
2.2.1. Class I
Class I masers are primarily collisionally pumped. They are typically found in younger
sources that are Class II masers and may trace distant parts of outflows interacting with
dense molecular gas (Beuther et al. 2005; Ellingsen 2006). Comparison of interferometric
maps of the 9.9 and 104.3 GHz transitions with H2 data confirms the outflow association
in IRAS 16547−4247 (Voronkov et al. 2006). Linear polarization suggests that Class I
masers may appear in oblique shocks parallel to the outflow axis and perpendicular to
the magnetic field in OMC-2 (Wiesemeyer, Thum, & Walmsley 2004), although interpre-
tation of methanol polarization may be complicated (e.g., Elitzur 2002).
Many different Class I transitions have been observed. Weak masers at 84.5 and
95.2 GHz to the southwest of W3(OH) provide strong evidence in support of colli-
sional pumping and allow for physical conditions to be inferred (Sutton et al. 2004).
The latter transition is commonly seen as a maser in both Class I and Class II sources
(Minier & Booth 2002). Strong 36 GHz maser emission is believed to be an indicator of an
early evolutionary stage, as may line ratios of other transitions (Gillis, Pratap, & Strelnitski 2005;
Hoffmann, Pratap, & Strelnitski 2006). The intensity ratio of highly-excited 146.6 and
156.8 GHz methanol masers may be a sensitive probe of density and temperature in mas-
ing regions (Lemonias, Strelnitski, & Pratap 2006). Short-timescale variability is seen in
the 44 and 146.6 GHz transitions (Pratap, Hoffmann, & Strelnitski 2006).
2.2.2. Class II
The key Class II maser transitions are at 6668 and 12178 MHz. Class II masers are
often found in an earlier evolutionary stage than ultracompact (UC) H ii regions (e.g.,
Minier et al. 2005), but observations of methanol masers cospatial with both millimeter
Masers and star formation 3
and centimeter continuum emission indicate that Class II masers appear over a wide
range of evolutionary stages (Pestalozzi et al. 2006). While the lower stellar mass limit
for methanol masers is still a subject of research, 6.7 GHz masers do not appear below
approximately 3 solar masses (Minier et al. 2003).
Linear structures of masers with organized velocity structures have led some to con-
clude that methanol masers often trace edge-on disks (Norris et al. 1993, 1998). Obser-
vations of maser proper motions (Minier et al. 2000), shocked H2 (De Buizer 2003), and
SiO (De Buizer et al. 2006) indicate that, in the majority of cases at least, methanol
masers are aligned with an outflow, not a disk. These structures may be explained by
propagation of a shock front into a region with large-scale velocity structure, such as rota-
tion (Dodson, Ojha, & Ellingsen 2004). Disk candidate sources remain (Slysh et al. 2002b;
Pestalozzi et al. 2004; Pillai et al. 2006), although these, too, may turn out to be associ-
ated with outflows when studied at high resolution in the mid infrared (e.g., De Buizer & Minier 2005).
The conclusion to be drawn is that a linear distribution of masers with a velocity gra-
dient does not by itself present convincing evidence that the masers trace an edge-on
disk. An intriguing variant is the possibility of methanol masers tracing a face-on disk in
G23.657−0.127 (Bartkiewicz, Szymczak, & van Langevelde 2005).
There is evidence to support the hypothesis that most methanol masers are tracing
shocked regions, often in the presence of outflows. Methanol masers appear preferentially
near radio sources with a spectral index indicative of an outflow (Zapata et al. 2006).
Mid-infrared images of some sources indicate that masers are found along the shocked
material on the surface of an outflow cavity (De Buizer 2006, 2007). In some sources,
methanol masers appear near but offset from UCH ii regions, suggesting that they appear
in the shocked molecular gas outside the ionization front, similar to hydroxyl masers
(Phillips & van Langevelde 2005).
The 6.7 GHz transition is a popular line for Galactic maser surveys. Several surveys
were reported on during IAU Symposium 206. Details of the Arecibo and Parkes multi-
beam 6.7 GHz surveys can be found in the section of these proceedings devoted to Galac-
tic maser surveys. Unsurprisingly, their distribution correlates well with Galactic struc-
ture (Pestalozzi, Minier, & Booth 2005; Pestalozzi et al. 2007). The 12.2 GHz line, when
it occurs, is almost always weaker than 6.7 GHz emission (B laszkiewicz & Kus 2004).
Both lines have also been the subject of monitoring studies (e.g., Goedhart, Gaylard, & van der Walt 2005),
which find variability in a large fraction of sources including periodic variability and a
time delay between features possibly due to light travel time (Goedhart, Gaylard, & van der Walt 2005;
Goedhart et al. 2005). Based on comparison of spectra over a period of a decade, the life-
time of an individual 6.7 GHz maser feature is about 150 years (Ellingsen 2007), while the
lifetime of the 6.7 GHz maser phase in a source is a few× 104 years (Codella et al. 2004;
van der Walt 2005), similar to the lifetime of the OH maser phase (e.g., Fish & Reid 2006).
Numerous other Class II transitions have been observed. New maser sources have
been found in rare transitions at 85.5, 86.6, and 107.0 GHz (Minier & Booth 2002;
Ellingsen et al. 2003) and a torsionally-excited line at 44.9 GHz (Voronkov, Austin, & Sobolev 2002).
Several weak maser lines near 165 GHz have also been detected (Salii & Sobolev 2006).
Emission in the 19.9 GHz transition is usually weak and correlates well with 6035 MHz
OH masers (Ellingsen et al. 2004). A search for 23.1 GHz emission resulted in no new
detections beyond the previously known maser in NGC 6334F (Cragg et al. 2004). Ob-
servations of these less common methanol maser transitions can help constrain physical
parameters in maser models. Improved collisional rate data has also allowed refinement
of methanol models, which slightly affects predicted excitation conditions and bright-
ness temperatures but not which transitions are expected to produce detectable Class II
masers (Cragg, Sobolev, & Godfrey 2005).
4 Fish
2.3. Hydroxyl (OH)
Hydroxyl masers are usually studied in sources with associated UCH ii regions (e.g.,
Fish et al. 2005) but are also found toward less evolved massive protostellar objects
(Edris, Fuller, & Cohen 2007). A different class of OH masers is seen at the ends of
the jet in the W3 TW object (Argon, Reid, & Menten 2003).
Hydroxyl masers around more evolved sources are usually seen in expansion (some-
times very rapid; see Stark et al. 2007) ahead of the ionization front of a UCH ii region
(Fish & Reid 2006). Sometimes the masers appear to trace a molecular disk or torus
(Slysh et al. 2002a; Hutawarakorn & Cohen 2005; Edris et al. 2005; Nammahachak et al. 2006).
Masers are often seen along arcs or filaments (Cohen et al. 2006), with extended filamen-
tary emission especially common at 4.7 GHz (Palmer, Goss, & Devine 2003).
Multitransition overlaps are of special interest because of their ability to constrain
physical conditions in models. The 4765 MHz line is observed to be the strongest line of
the 6 cm triplet but is usually only weakly inverted and often spatially extended (e.g.,
Palmer, Goss, & Whiteoak 2004; Harvey-Smith & Cohen 2005). A histogram of 18 cm
emission resembles 4.7 GHz lineshapes in W49A, suggesting that the high-gain 18 cm
emission and low-gain 6 cm emission have similar velocity distributions, even if the
4660 MHz emission is spatially separate (Palmer & Goss 2005). Much is made of over-
laps between 4765 and 1720 MHz masers (Palmer, Goss, & Devine 2003; Niezurawska
et al. 2004, 2005). It should be noted that 6035 MHz maser emission correlates more
strongly with 4765 MHz than does 1720 MHz, even if the velocities do not always agree
(Dodson & Ellingsen 2002; Smits 2003). Masers in the 1720 MHz transition also appear
to correlate with 1665 and 6035 MHz OH masers and 6.7 GHz methanol masers, at least
to arcsecond accuracy (Caswell 2004a). Masers in the 6030 MHz transition are almost
always accompanied by stronger emission at 6035 MHz (Caswell 2003), with excellent
spatial coincidence and agreement of magnetic field strengths with each other and with
1665 MHz masers (Etoka, Cohen, & Gray 2005). Masers in the highly-excited 13441 MHz
transition are rare but are always accompanied by 6035 MHz masers at the same velocity,
although the intensities in the two transitions do not show a high degree of correlation
(Baudry & Desmurs 2002; Caswell 2004c). The ground-state, satellite line transitions at
1612 and 1720 MHz are usually conjugate with respect to absorption and maser emis-
sion (Szymczak & Gérard 2004). Sources do exist in which both transitions are inverted,
though not in direct spatial overlap (e.g., Wright, Gray, & Diamond 2004).
Magnetic fields as strong as 40 mG are seen in OH masers (Slysh & Migenes 2006;
Fish & Reid 2007). Magnetic fields are highly ordered in star forming regions (Fish & Reid 2006)
and support pictures in which the processes of star formation do not tangle field lines
significantly. While magnetic field strengths are usually stable from epoch to epoch,
monotonic decay of the field in a Zeeman group in Cep A continues to be observed
(Bartkiewicz et al. 2005). High spectral resolution observations support the conventional
assumption that the Zeeman splitting coefficient appropriate for σ±1 components should
be assumed when measuring magnetic fields at 1612 and 1720 MHz (Fish, Brisken, & Sjouwerman 2006).
Linear polarization is of limited usefulness in determining the full, three-dimensional
orientation of the magnetic field, likely due to a combination of Faraday rotation and
anisotropic magnetohydronamic turbulence (Watson et al. 2004; Fish & Reid 2006).
Extreme variability is occasionally seen in OH. The 1665 MHz maser in W75 N briefly
flared to nearly 1 kJy to become the brightest OH maser in the sky (Alakoz et al. 2005).
The 4765 MHz transition is highly time-variable (Palmer, Goss, & Whiteoak 2004): the
maser in Mon R2 flared to nearly 80 Jy before disappearing (Smits 2003) and reap-
Masers and star formation 5
pearing (Fish et al. 2006b). Short-timescale variability is seen in the ground-state lines
(Ramachandran, Deshpande, & Goss 2006; also Miller Goss in these proceedings).
2.4. Formaldehyde (H2CO)
Formaldehyde masers are seen near a handful of several massive YSOs, with several new
detections in recent years (Araya et al. 2005, 2006). They have both a compact and an ex-
tended component with velocity gradients (Hoffman et al. 2003; Hoffman, Goss, & Palmer 2007).
A short-duration flare has been detected toward one source (Araya et al. 2007). Further
details can be found in the review by Esteban Araya.
2.5. Silicon monoxide (SiO)
While SiO masers are commonly seen in evolved stars, they are rare in star form-
ing regions. They are seen in bipolar outflows in W51 IRS 2 and Orion KL source I
and appear much closer to the central source than do water masers (Eisner et al. 2002;
Greenhill et al. 2004). As is the case in evolved stars, the maser species SiO, water, and
OH occur at progressively larger distances from source I (Cohen et al. 2006). While OH
masers are ubiquitous throughout Orion, there is a “zone of avoidance” associated with
source I in which they do not appear but SiO and water masers do. Interestingly, the
v = 1, J = 2 → 1 masers are found closer to the protostar in source I than are the
J = 1 → 0 masers, a finding that is difficult to understand in the context of SiO maser
pumping models (Doeleman et al. 2004).
2.6. Other species
Few other new maser species or transitions have been reported in the literature since the
last meeting. The first (J,K) = (6, 6) ammonia (NH3) maser has been detected centered
on a millimeter peak in NGC 6334 I (Beuther et al. 2007). Weakly inverted acetaldehyde
(CH3CHO) has been detected in the 111 → 110 transition at 1065.075 MHz toward Sgr
B2 (Chengalur & Kanekar 2003).
3. Multi-species associations
Methanol, OH, and water masers are frequently in the same source, although water and
methanol masers usually originate in different regions (Beuther et al. 2002; Caswell 2004b;
Edris et al. 2005; Szymczak, Pillai, & Menten 2005). Most 6.7 GHz methanol maser sources
have associated OH masers, almost always at 1665 MHz and frequently at 1667 MHz
as well (Szymczak & Gérard 2004), while the correlation between 6.7 GHz methanol
masers and 22 GHz water masers is less strong (e.g., Breen et al. 2007). The distribu-
tions of 6.7 GHz methanol masers and 6.0 GHz OH masers in W3(OH) are very similar,
although direct overlap of the two species is rare (Etoka, Cohen, & Gray 2005). Simi-
lar phenomena are also seen between 6.7 GHz methanol and 1.6/4.7 GHz OH masers
(Harvey-Smith & Cohen 2006).
All three species correlate more strongly with mid infrared emission than centimeter or
near infrared emission, and all are frequently found in linear groupings (De Buizer et al. 2005).
However, the luminosity of water masers correlates less strongly with far infrared lumi-
nosity than is the case for methanol and OH masers, likely because water masers are
not predominantly pumped by infrared photons (Szymczak, Pillai, & Menten 2005), al-
though they may not be pumped entirely by collisions either (Liu, Forster, & Sun 2005;
Liu et al. 2007). In any case, the existence of either masers (water and methanol) or out-
flows towards a UCH ii region is an excellent predictor of the other (Codella et al. 2004),
indicating that both masers and outflows are usually detectable in the UCH ii phase.
6 Fish
4. Observational advances
4.1. Proper motions and geometric distances
Evidence in support of the kinematic interpretation of maser motions continues to pile
up, with reports of the persistence of spot shapes in methanol (Moscadelli et al. 2002)
and water masers (Goddi et al. 2006) and the inferred average overdensity of masers as
compared to non-masing material (Fish, Reid, & Menten 2005; Fish et al. 2006a). In ad-
dition to tracing internal source motions, maser proper motions can be used to obtain
geometric parallax distances and measurements of Galactic rotation. Recent years have
seen this technique used for both methanol and water masers in W3(OH) to obtain dis-
tances accurate to a few percent (Xu et al. 2006; Hachisuka et al. 2006). Further details
can be found in proceedings in the Galactic structure session.
4.2. Spectral resolution
Very high spectral line observations at VLBI (very long baseline interferometry) spa-
tial resolution have been obtained toward masers in several species in star forming re-
gions. Detailed line profile analyses of water masers conclude that the near-Gaussian
lineshapes indicate that they occur in hot (∼ 1200 K) gas with small beaming angles
(Watson, Sarma, & Singleton 2002). Similar observations of 12.2 GHz methanol masers
(Moscadelli et al. 2003) and the ground-state quartet of OH in W3(OH) (Fish, Brisken, & Sjouwerman 2006)
find similarly Gaussian spectral profiles as well as maser spot positional gradients as a
function of velocity (equivalently, velocity gradients). The positional gradients show no
clear large-scale spatial organization but have similar magnitudes in methanol and three
of the four OH transitions. It is possible that these positional gradients represent turbu-
lent motions on very small spatial scales. If so, it is important to understand the char-
acteristics of the turbulence, since observed maser properties, including variability, can
be highly sensitive to turbulence in the masing region (Böger, Kegel, & Hegmann 2003;
Sobolev, Watson, & Okorokov 2003; Silant’ev et al. 2006). Similar polarization charac-
teristics in Class II methanol features at different velocities may indicate that velocity gra-
dients induce velocity redistribution (Wiesemeyer, Thum, & Walmsley 2004), which may
play a critical role in preventing saturated rebroadening (e.g., Nedoluha & Watson 1988).
4.3. Infrared pumping lines
Molecular infrared transitions observations, particularly of OH, are essential for measur-
ing maser pump efficiencies and may help place observational constraints on the radiative
pump cycles of some models (e.g., Gray 2007). Archival Infrared Space Observatory data
have been searched for the 34.6 and 53.3 µm pumping lines of OH with limited success,
due to the low spectral resolution of the instruments (He & Chen 2004; He 2005). Her-
schel and SOFIA will have the spectral resolution and frequency coverage required to
observe pumping lines of OH as well as the critical 560 µm line of methylidyne (CH).
5. Further remarks
Two quotes from De Buizer et al. (2005) serve to summarize the common themes of
the observations over the past five years. The first is that “maser emission in general can
trace a variety of phenomena associated with massive stars including shocks, outflows,
infall, and circumstellar disks. No one maser species is linked exclusively to one particular
process or phenomenon.” Indeed, while certain maser species may preferentially turn up
in a particular context (possibly a result of observational biases), the set of all observa-
tions of any one maser species resist being pigeonholed into a particular phenomenon.
Masers and star formation 7
The second quote is that water, OH, and methanol masers “do not seem to be associated
with different early evolutionary stages of massive stars. Instead it appears that they
all trace a variety of stellar phenomena throughout many early stages of massive stellar
evolution.” As is clear from §3, different maser species are commonly found together,
independent of the evolutionary stage of the source. While proposed sequences in which
certain masers turn on before others may be useful for statistical evaluation of evolu-
tionary phases, important exceptions to such sequences exist. Those oddball masers that
do not seem to fit present standard paradigms should be studied in especial detail, since
we cannot predict beforehand what will be thereby learned about their environment or
about maser processes in general.
It was only a few years ago that Ellingsen (2004) referred to masers as “the Bart Simp-
son of star formation research,” noting that they are “under-achievers” in comparison
with masers in other environments due to the lack of sensitive, high resolution observa-
tions at complementary wavelengths. While this may once have been true, recent maser
observations have made great advancements in probing a wide range of dynamic struc-
tures relevant to star formation. Maser VLBI allows observations of small- and large-scale
morphologies, magnetic fields, and motions on AU scales and is showing great promise
as a tool to trace Galactic structure. Maser models for some species are becoming suf-
ficiently refined to provide good constraints on physical conditions. The community is
beginning to appreciate the role of turbulence and the ability to probe its properties
using maser observations. Synergies with mid infrared instruments have clarified many
of the mysteries of linear structures with velocity gradients. We have entered the era of
greatly improved far infrared instrumentation, and the ALMA (Atacama Large Millime-
ter Array) era, with unprecedented sensitivity and angular resolution at submillimeter
wavelengths, will begin in a few years. Further advancements in radio instrumentation,
including new space VLBI missions and the SKA (Square Kilometre Array), will provide
even greater insights. It is perhaps more correct to state that maser observations are at
the vanguard of star formation research: yesterday’s observations can be explained by
complementary data and theory today, and today’s observations lay the groundwork for
the breakthroughs that will be achieved in the context of tomorrow.
Acknowledgements
The National Radio Astronomy Observatory is a facility of the National Science Foun-
dation operated under cooperative agreement by Associated Universities, Inc.
References
Alakoz, A.V., Slysh, V.I., Popov, M.V., & Val’tts, I.E. 2005, Astron. Lett., 31, 375
Araya, E., Hofner, P., Goss, W.M., Kurtz, S., Linz, H., & Olmi, L. 2006, ApJ, 643, L33
Araya, E., Hofner, P., Kurtz, S., Linz, H., Olmi, L., Sewilo, M., Watson, C., & Churchwell, E.
2005, ApJ, 618, 339
Araya, E., Hofner, P., Sewilo, M., Linz, H., Kurtz, S., Olmi, L., Watson, C., & Churchwell, E.
2007, ApJ, 654, L95
Argon, A.L., Reid, M.J., & Menten, K.M. 2003, ApJ, 593, 925
Babkovskaia, N., & Poutanen, J. 2004, A&A, 418, 117
Bartkiewicz, A., Szymczak, M., Cohen, R.J., & Richards, A.M.S. 2005, MNRAS, 361, 623
Bartkiewicz, A., Szymczak, M., & van Langevelde, H.J. 2005, A&A, 442, L61
Baudry, A., & Desmurs, J.F. 2002, A&A, 394, 107
Beuther, H., Thorwith, S., Zhang, Q., Hunter, T.R., Megeath, S.T., Walsh, A.J., & Menten,
K.M. 2005, ApJ, 627, 834
8 Fish
Beuther, H., Walsh, A., Schilke, P., Sridharan, T.K., Menten, K.M., & Wyrowski, F. 2002, A&A,
390, 289
Beuther, H., Walsh, A.J., Thorwith, S., Zhang, Q., Hunter, T.R., Megeath, S.T., & Menten,
K.M. 2007, A&A, in press, astro-ph/0702190
B laszkiewicz, L., & Kus, A.J. 2004, A&A, 413, 233
Böger, R., Kegel, W.H., & Hegmann, M. 2003, A&A, 406, 23
Breen, S.L., et al. 2007, MNRAS, in print, astro-ph/0702673
Caswell, J.L. 2003, MNRAS, 341, 551
Caswell, J.L. 2004a, MNRAS, 349, 99
Caswell, J.L. 2004b, MNRAS, 351, 279
Caswell, J.L. 2004c, MNRAS, 352, 101
Chengalur, J.N., & Kanekar, N. 2003, A&A, 403, L43
Codella, C., Lorenzani, A., Gallego, A.T., Cesaroni, R., & Moscadelli, L. 2004, A&A, 417, 615
Cohen, R.J., Gasiprong, N., Meaburn, J., & Graham, M.F. 2006, MNRAS, 367, 541
Cragg, D.M., Sobolev, A.M., Caswell, J.L., Ellingsen, S.P., & Godfrey, P.D. 2004, MNRAS, 351,
Cragg, D.M., Sobolev, A.M., & Godfrey, P.D. 2005, MNRAS, 360, 533
De Buizer, J.M. 2003, MNRAS, 341, 277
De Buizer, J.M. 2006, ApJ, 642, L57
De Buizer, J.M. 2007, ApJ, 654, L147
De Buizer, J.M., & Minier, V. 2005, ApJ, 628, L151
De Buizer, J.M., Radomski, J.T., Telesco, C.M., & Piña, R.K. 2005, ApJS, 156, 179
De Buizer, J.M., Redman, R., Feldman, P., Longmore, S., & Caswell, J. 2006, BAAS, 38, 1059
de Gregorio-Monsalvo, I., Gómez, J.F., Suárez, O., Kuiper, T.B.H., Anglada, G., Patel, N.A.,
& Torrelles, J.M. 2006, AJ, 132, 2584
Dodson, R.G., & Ellingsen, S.P. 2002, MNRAS, 333, 307
Dodson, R., Ojha, R., & Ellingse, S.P. 2004, MNRAS, 351, 779
Doeleman, S.S., Lonsdale, C.J., Kondratko, P.T., & Predmore, C.R. 2004, ApJ, 607, 361
Edris, K.A., Fuller, G.A., & Cohen, R.J. 2007, A&A, 465, 865
Edris, K.A., Fuller, G.A., Cohen, R.J., & Etoka, S. 2005, A&A, 434, 213
Eisner, J.A., Greenhill, L.J., Herrnstein, J.R., Moran, J.M., & Menten, K.M. 2002, ApJ, 569,
Elitzur, M. 2002, Astrophysical Spectropolarimetry, 225
Ellingsen, S.P. 2004, IAU Symposium, 221, 133
Ellingsen, S.P. 2005, MNRAS, 359, 1498
Ellingsen, S.P. 2006, ApJ, 638, 241
Ellingsen, S.P. 2007, MNRAS, in press, astro-ph/0702506
Ellingsen, S.P., Cragg, D.M., Lovell, J.E.J., Sobolev, A.M., Ramsdale, P.D., & Godfrey, P.D.
2004, MNRAS, 354, 401
Ellingsen, S.P., Cragg, D.M., Minier, V., Muller, E., & Godfrey, P.D. 2003, MNRAS, 344, 73
Etoka, S., Cohen, R.J., & Gray, M.D. 2005, MNRAS, 360, 1162
Fish, V.L., Brisken, W.F., & Sjouwerman, L.O. 2006, ApJ, 647, 418
Fish, V.L., & Reid, M.J. 2006, ApJS, 164, 99
Fish, V.L., & Reid, M.J. 2007, ApJ, 656, 943
Fish, V.L., Reid, M.J., Argon, A.L., & Zheng, X.-W. 2005, ApJS, 160, 220
Fish, V.L., Reid, M.J., & Menten, K.M. 2005, ApJ, 623, 269
Fish, V.L., Reid, M.J., Menten, K.M., & Pillai, T. 2006, A&A, 458, 485
Fish, V.L., Zschaechner, L.K., Sjouwerman, L.O., Pihlström, Y.M., & Claussen, M.J. 2006, ApJ,
653, L45
Furuya, R.S., Kitamura, Y., Wootten, A., Claussen, M.J., & Kawabe, R. 2005, A&A, 438, 571
Gallimore, J.F., Cool, R.J., Thornley, M.D., & McMullin, J. 2003, ApJ, 586, 306
Gillis, R.G., Pratap, P., & Strelnitski, V. 2005, BAAS, 37, 1472
Goddi, C., & Moscadelli, L. 2006, A&A, 447, 577
Goddi, C., Moscadelli, L., Torrelles, J.M., Uscanga, L., & Cesaroni, R. 2006, A&A, 447, L9
Goedhart, S., Gaylard, M.J., & van der Walt, D.J. 2003, MNRAS, 339, L33
http://arxiv.org/abs/astro-ph/0702190
http://arxiv.org/abs/astro-ph/0702673
http://arxiv.org/abs/astro-ph/0702506
Masers and star formation 9
Goedhart, S., Gaylard, M.J., & van der Walt, D.J. 2005, Ap&SS, 295, 197
Goedhart, S., Minier, V., Gaylard, M.J., & van der Walt, D.J. 2005, MNRAS, 356, 839
Gómez, de Gregorio-Monsalco, I., Suárez, O., & Kuiper, T.B.H. 2006, AJ, 132, 1322
Gray, M.D. 2007, MNRAS, 375, 477
Greenhill, L.J., Reid, M.J., Chandler, C.J., Diamon, P.J., & Elitzur, M. 2004, IAU Symp. 221,
Hachisuka, K., Brunthaler, A., Menten, K.M., Reid, M.J., Imai, H., Hagiwara, Y., Miyoshi, M.,
Horiuchi, S., & Sasao, T. 2006, ApJ, 645, 337
Harvey-Smith, L., & Cohen, R. J. 2005, MNRAS, 356, 637
Harvey-Smith, L., & Cohen, R. J. 2006, MNRAS, 371, 1550
He, J.H. 2005, New Astron., 10, 283
He, J.H., & Chen, P.S. 2004, New Astron., 9, 545
Healy, K.R., Hester, J.J., & Claussen, M.J. 2004, ApJ, 610, 835
Hoffman, I.M., Goss, W.M., & Palmer, P. 2007, ApJ, 654, 971
Hoffman, I.M., Goss, W.M., Palmer, P., & Richards, A.M.S. 2003, ApJ, 598, 1061
Hoffmann, S., Pratap, P., & Strelnitski, V. 2006, BAAS, 38, 947
Honma, M., Bushimata, T., & Choi, Y.K., et al. 2005, PASJ, 57, 595
Honma, M., Choi, Y.K., & Bushimata, T., et al. 2004, PASJ , 56, L15
Hutawarakorn, B., & Cohen, R. J. 2005, MNRAS, 357, 338
Imai, H., Horiuchi, S., Deguchi, S., & Kameya, O. 2003, ApJ, 595, 285
Imai, H., Watanabe, T., Omodaka, T., Nishio, M., Kameya, O., Miyaji, T., & Nakajima, J.
2002, PASJ, 54, 741
Lekht, E.E., Munitsyn, V.A., & Tolmachev, A.M. 2005, Astron. Lett., 31, 315
Lekht, E.E., Silant’ev, N.A., Krasnov, V.V., & Munitsyn, V.A. 2006, Astron. Rep., 50, 638
Lekht, E.E., Trinidad, M.A., Mendoza-Torres, J.E., Rudnitskij, G.M., & Tolmachev, A.M. 2006,
A&A, 456, 145
Lemonias, J.J., Strelnitski, V., & Pratap, P. 2006, BAAS, 38, 947
Liu, H., Forster, J.R., Liu, Y., & Sun, J. 2007, ApSS, in press
Liu, H.-P., Forster, J.R., & Sun, J. 2005, Chin. J. Astron. Astrophys., 5, 175
Menten, K.M., & van der Tak, F.F.S. 2004, A&A, 414, 289
Minier, V., & Booth, R.S. 2002, A&A, 387, 179
Minier, V., Booth, R.S., Ellingsen, S.P., Conway, J.E., & Pestalozzi, M.R. 2000, EVN Sympso-
sium, Proceedings of the 5th European VLBI Network Symposium, 179
Minier, V., Burton, M.G., Hill, T., Pestalozzi, M.R., Purcell, C.R., Garay, G., Walsh, A.J., &
Longmore, S. 2005, A&A, 429, 945
Minier, V., Ellingsen, S.P., Norris, R.P., & Booth, R.S. 2003, A&A, 403, 1095
Moscadelli, L., Cesaroni, R., & Rioja, M.J. 2005, A&A, 438, 889
Moscadelli, L., Menten, K.M., Walmsley, C.M., & Reid, M.J. 2002, ApJ, 564, 813
Moscadelli, L., Menten, K.M., Walmsley, C.M., & Reid, M.J. 2003, ApJ, 583, 776
Moscadelli, L., Testi, L., Furuya, R.S., Goddi, C., Claussen, M., Kitamura, Y., & Wootten, A.
2006, A&A, 446, 985
Müller, H.S.P., Menten, K.M., & Mäder, H. 2004, A&A, 428, 1019
Norris, R.P., et al. 1998, ApJ, 508, 275
Nammahachak, S., Asanok, K., Hutawarakorn Kramer, B., Cohen, R.J., Muanwong, O., &
Gasiprong, N. 2006, MNRAS, 371, 619
Nedoluha, G.E., & Watson, W.D. 1988, ApJ, 335, L19
Niezurawska, A., Szymczak, M., Cohen, R.J., & Richards, A.M.S. 2004, MNRAS, 350, 1409
Niezurawska, A., Szymczak, M., Richards, A.M.S., & Cohen, R.J. 2005, Ap&SS, 295, 37
Norris, R.P., Whiteoak, J.B., Caswell, J.L., Wieringa, M.H., & Gough, R.G. 1993, ApJ, 412,
Palmer, P., & Goss, W.M. 2005, MNRAS, 360, 993
Palmer, P., Goss, W.M., & Devine, K.E. 2003, ApJ, 599, 324
Palmer, P., Goss, W.M., & Whiteoak, J.B. 2004, MNRAS, 347, 1164
Pashchenko, M.I., & Lekht, E.E. 2005, Astron. Rep., 49, 624
10 Fish
Pestalozzi, M.R., Chrysostomou, A., Collett, J.L., Minier, V., Conway, J., & Booth, R.S. 2007,
A&A, 463, 1009
Pestalozzi, M.R., Elitzur, M., Conway, J.E., & Booth, R.S. 2004, ApJ, 603, L113
Pestalozzi, M.R., Minier, V., & Booth, R.S. 2005, A&A, 432, 737
Pestalozzi, M.R., Minier, V., Motte, F., & Conway, J.E. 2006, A&A, 448, L57
Phillips, C.J., & van Langevelde, H.J. 2005, Ap&SS, 295, 225
Pillai, T., Wyrowski, F., Menten, K.M., & Krügel, E. 2006, A&A, 447, 929
Pratap, P., Hoffmann, S., & Strelnitski, V. 2006, BAAS, 38, 948
Ramachandran, R., Deshpande, A.A., & Goss, W.M. 2006, ApJ, 653, 1314
Ripman, B.H., & Strelnitski, V. 2006, BAAS, 38, 1053
Salii, S.V., & Sobolev, A.M. 2006, Astron. Rep., 50, 965
Sarma, A.P., Troland, T.H., Crutcher, R.M., & Roberts, D.A. 2002, ApJ, 580, 928
Seth, A.C., Greenhill, L.J., & Holder, B.P. 2002, ApJ, 581, 325
Silant’ev, N.A., Lekht, E.E., Mendoza-Torres, J.E., & Rudnitskij, G.M. 2006, A&A, 453, 989
Slysh, V.I., & Migenes, V. 2006, MNRAS, 369, 1497
Slysh, V.I., Migenes, V., Val’tts, I.E., Lyubchenko, S.Yu., Horiuchi, S., Altunin, V.I., Fomalont,
E.B., & Inoue, M. 2002, ApJ, 564, 317
Slysh, V.I., Voronkov, M.A., Val’tts, I.E., & Migenes, V. 2002, Astron. Rep., 46, 969
Smits, D.P. 2003, MNRAS, 339, 1
Sobolev, A.M., Watson, W.D., & Okorokov, V.A. 2003, ApJ, 590, 333
Stark, D.P., Goss, W.M., Churchwell, E., Fish, V.L., & Hoffman, I.M. 2007, ApJ, 656, 943
Strelnitski, V., Alexander, J., Gezari, S., Holder, B.P., Moran, J.M., & Reid, M.J. 2002, ApJ,
581, 1180
Sutton, E.C., Sobolev, A.M., Salii, S.V., Malyshev, A.V., Ostrovskii, A.B., & Zinchenko, I.I.
2004, ApJ, 609, 231
Szymczak, M., & Gérard, E. 2004, A&A, 414, 235
Szymczak, M., Pillai, T., & Menten, K.M. 2005, A&A, 434, 613
Torrelles, J.M., Patel, N.A., Anglada, G., et al. 2003, ApJ, 598, L115
Urquhart, J.S., Thompson, M.A., Morgan, L.K., & White, G.J. 2006, A&A, 450, 625
Uscanga, L., Cantó, J., Curiel, S., Anglada, G., Torrelles, J.M., Patel, N.A., Gómez, J.F., &
Raga, A.C. 2005, ApJ, 634, 468
Valdettaro, R., Palla, F., Brand, J., & Cesaroni, R. 2005, A&A, 535, 540
van der Walt, J. 2005, MNRAS, 360, 153
Vlemmings, W.H.T. 2006, A&A, 445, 1031
Vlemmings, W.H.T., Diamond, P.J., van Langevelde, H.J., & Torrelles, J.M. 2006, A&A, 448,
Voronkov, M.A., Austin, M.C., & Sobolev, A.M. 2002, A&A, 387, 310
Voronkov, M.A., Brooks, K.J., Sobolev, A.M., Ellingsen, S.P., Ostrovskii, A.B., & Caswell, J.L.
2006, MNRAS, 373, 411
Voronkov, M.A., Sobolev, A.M., Ellingsen, S.P., & Ostrovskii, A.B. 2005, MNRAS, 362, 995
Watson, W.D., Sarma, A.P., & Singleton, M.S. 2002, ApJ, 570, L37
Watson, W.D., Wiebe, D.S., McKinney, J.C., & Gammie, C.F. 2004, ApJ, 604, 707
Wiesemeyer, A., Thum, C., & Walmsley, C.M. 2004, A&A, 428, 479
Wright, M.M., Gray, M.D., & Diamond, P.J. 2004, MNRAS, 350, 1272
Xu, Y., Reid, M.J., Zheng, X.W., & Menten, K.M. 2006, Science, 311, 54
Zapata, L.A., Rodŕıguez, L.F., Ho, P.T.P., Beuther, H., & Zhang, Q. 2006, AJ, 131, 939
	Introduction
	Masing species
	Water (H2O)
	Methanol (CH3OH)
	Class I
	Class II
	Hydroxyl (OH)
	Formaldehyde (H2CO)
	Silicon monoxide (SiO)
	Other species
	Multi-species associations
	Observational advances
	Proper motions and geometric distances
	Spectral resolution
	Infrared pumping lines
	Further remarks
ABSTRACT
  Recent observational and theoretical advances concerning astronomical masers
in star forming regions are reviewed. Major masing species are considered
individually and in combination. Key results are summarized with emphasis on
present science and future prospects.

<|endoftext|><|startoftext|>
Introduction
The nature of the metallic antiferromagnetically ordered
state in strongly correlated systems has been subject of
study for over two decades, but still remains to be fully un-
derstood. Interest in this topic has been stimulated by the
fact that the high temperature superconductivity of the
cuprates emerges from the doping of an antiferromagnetic
insulating compound, such as La2CuO4 [1,2]. The sim-
plest models to describe the electrons in the CuO2 planes
of the cuprates are the two dimensional t-J-model and
Hubbard model. Much of the initial effort went into the
study of a single hole state of these models in an antifer-
romagnetic background. For the motion of this hole, there
is a competition between the gain in kinetic energy from
the hopping and its disruptive effect on the antiferromag-
netic order, and consequent loss of potential energy. As a
result a hole excitation becomes a quasiparticle or mag-
netic polaron, heavily dressed by antiferromagnetic spin
fluctuations (see review article by Dagotto [3] and refer-
ences therein). Much of this work, however, relied on exact
diagonalization or quantum Monte Carlo methods, which
are limited to small clusters and very few hole excitations,
and cannot be readily extended to study the many-hole,
finite doping situation.
More recent studies capable of describing finite dop-
ing have concentrated on the relation between the antifer-
romagnetic fluctuations and superconducting order (for
a review see [4] and the references therein). One of the
main motivations is to understand whether the exchange
of these types of fluctuations can provide a purely elec-
tronic mechanism for inducing superconductivity. Here, in
this paper, we focus on the metallic antiferromagnetism,
the doped state with long range antiferromagnetic order.
Our interest is to examine how well the low energy ex-
citations in this ordered state can be described in terms
renormalized quasiparticles. To tackle this problem we use
the infinite dimensional Hubbard model.
The simplification in the infinite dimensional limit is
that the electron self-energy becomes local in character,
with no wavevector dependence [5,6]. The self-energy then
depends only on the frequency, as is the case for impurity
models, allowing the lattice problem to be cast in the form
of a self-consistent impurity model. There are several rea-
sonably accurate techniques for solving this effective im-
purity problem, a very accurate one for the zero and low
temperature regime being the numerical renormalization
group approach (NRG) [7].
Recently we studied the effect of a magnetic field on
the quasiparticle excitations in the strong correlation regime
of the infinite dimensional Hubbard model using the NRG
method [8]. We also extended a form of renormalized per-
turbation theory (RPT), originally developed for impu-
rity models [9], to this model, and used it to calculate
the local dynamic spin susceptibilities, obtaining results
in good overall agreement with those from the NRG. In
this paper we extend this combination of renormalization
techniques, NRG and RPT within dynamical mean field
theory, to look at the low energy excitations of the infi-
nite dimensional Hubbard model in a staggered field, and
in antiferromagnetic broken symmetry states. Extensive
calculations of the antiferromagnetic states in the Hub-
http://arxiv.org/abs/0704.0243v2
2 J. Bauer and A.C. Hewson: Renormalized quasiparticles in antiferromagnetic states of the Hubbard model
bard model using the DMFT-NRG approach have already
been reported in the paper of Zitzler, Pruschke and Bulla
[10]. We confirm their results for the phase diagram and
extend the calculations and analysis to the description of
the low energy renormalized excitations, and how these
can be described within the framework of a renormalized
perturbation theory.
2 Antiferromagnetic Broken Symmetry in
In considering the response of the Hubbard model [11] to
a staggered magnetic field and antiferromagnetic order,
we take the case of a bipartite lattice, which consists of
two sublattices A and B such that the nearest neighbors
of a site in the A sublattice are on the B sublattice and
vice versa. The Hamiltonian for the Hubbard model can
be written in the form,
i,j,σ
(tijc
A,i,σcB,j,σ + h.c.) + U
nα,i,↑nα,i,↓
A,i,σcA,i,σ + µ−σc
B,i,σcB,i,σ), (1)
where the hopping matrix element is taken as tij = −t
between nearest sites i and j only, and zero otherwise,
and α = A,B. A staggered field His
His =
H for i ∈ A sublattice
−H for i ∈ B sublattice (2)
has been included so that µσ = µ+σh, where h = gµBH/2
with the Bohr magneton µB. The non-interacting part
of the Hamiltonian H0,µ can be diagonalized in terms of
Bloch states and then expressed in the form,
H0,µ =
k,σMk,σCk,σ. (3)
where C
k,σ = (c
A,k,σ, c
B,k,σ), and the matrix Mk,σ is
given by
Mk,σ =
−µσ εk
εk −µ−σ
. (4)
The k sums run over a reduced Brillouin zone, and the
energy of the Bloch state is εk =
j tije
i(Ri−Rj)·k. The
free Green’s function matrix G0k,σ(ω) is given by (ω −
Mk,σ)
−1. The poles of the free Green’s function give the
elementary single particle excitations, which are given by
E0k,±(U = 0) = −µ0(h)±
h2 + ε2
, (5)
where µ0(h) is the chemical potential of the noninteracting
system in a staggered field. This illustrates that the elec-
tronic excitations are split into two subbands for a finite
staggered field.
Notice that we have adopted a special choice of ba-
sis {cA,k,σ, cB,k,σ} here [12,10]. Another common basis to
study antiferromagnetic and spin density wave symmetry
(SDW) breaking is {ck,σ, ck+q0,σ}, where q0 is the recip-
rocal lattice vector for commensurate SDW ordering. The
bases can be related by a linear transformation,
k+q0,σ
cA,k,σ
cB,k,σ
. (6)
For the latter basis the matrix Mk,σ would be diagonal
in the kinetic energy term and the symmetry breaking
is offdiagonal. For our study in the DMFT framework the
A−B-sublattice basis is, however, more convenient and we
will use it throughout the rest of this paper. It is possible,
of course, to relate the obtained quantities with the help
of (6) to the {ck,σ, ck+q0,σ} basis.
We can generalize the equations to the interacting prob-
lem by introducing a self-energy Σα,k,σ(ω), so that the
matrix Green’s function can be written in the form
Gk,σ(ω)=
ζA,k,σ(ω)ζB,k,σ(ω)− ε2k
ζB,k,σ(ω) −εk
−εk ζA,k,σ(ω)
where ζα,k,σ(ω) = ω + µσ − Σα,k,σ(ω). As we are dealing
with the infinite dimensional limit of the model, we take
the self-energy to be local so we can drop the k index. This
is the reason why the self-energy has a single site index
α = A,B and no offdiagonal terms appear in equation (7).
The symmetry of the bipartite lattice gives ΣB,σ(ω) =
ΣA,−σ(ω) ≡ Σ−σ(ω) and hence
ζB,−σ(ω) = ζA,σ(ω) ≡ ζσ(ω),
where we have simplified the notation. To determine these
quantitiesΣσ(ω) it is sufficient to focus on the A sublattice
only.
Summing the first component in the Green’s function
in equation (7) over k we obtain the Green’s function for
a site on the A sublattice, Glocσ (ω),
Glocσ (ω) =
ζ−σ(ω)
ζσ(ω)ζ−σ(ω)
ρ0(ε)
ζσ(ω)ζ−σ(ω)− ε
, (8)
where ρ0(ε) is the density of states of the non-interacting
system in the absence of the staggered field.
In the DMFT this local Green’s function, and the self-
energy Σσ(ω), are identified with the corresponding quan-
tities for an effective impurity model [12]. This implies
that the Green’s function G0,σ(ω) for the effective impu-
rity in the absence of an interaction at the impurity site
is given by
G−10,σ(ω) = Glocσ (ω)−1 +Σσ(ω). (9)
We can take the form of this impurity model to correspond
to an Anderson model [13] in a magnetic field,
HAM =
εd,σd
σdσ + Und,↑nd,↓ (10)
(Vk,σd
σck,σ + V
k,σdσ) +
εk,σc
k,σck,σ,
J. Bauer and A.C. Hewson: Renormalized quasiparticles in antiferromagnetic states of the Hubbard model 3
where εd,σ = εd−σh is the energy of the localized level at
an impurity site in a magnetic fieldH , U the interaction at
this local site, and Vk,σ the hybridization matrix element
to a band of conduction electrons of spin σ with energy
εk,σ. As we are focusing on an A site as the impurity we
take H = Hs.
The one-electron Green’s function for the impurity site
of this model is given by
Gimpσ (ω) =
ω − εdσ −Kσ(ω)−Σσ(ω)
, (11)
where
Kσ(ω) =
|Vk,σ|2
ω − εk,σ
. (12)
If this impurity Green’s function is equated to the local
lattice Green’s function Glocσ (ω), we identify εdσ = −µσ
and from equation (9), Kσ(ω) is given by
Kσ(ω) = ω + µσ − G−10,σ(ω). (13)
The function Kσ(ω) plays the role of the effective medium
and has to be calculated self-consistently.
The self-consistent calculations for Kσ(ω) can usually
be performed iteratively. Starting from a conjectured form
for Kσ(ω), the NRG method is used to calculate the self-
energy of the effective Anderson model, from which the
impurity Green’s function Gimpσ (ω) in (11) and the local
Green’s function for the lattice Glocσ (ω) in (8) can be de-
duced. If these two Green’s functions do not agree, then
equation (9) is used to derive a new starting value for
Kσ(ω) and the process continued until self-consistency is
achieved.
To find antiferromagnetic solutions, we calculated self-
consistent solutions for a decreasing sequence of staggered
magnetic fields to see if broken symmetry solutions of this
type exist as the staggered field is reduced to zero. For
the non-interacting density of states ρ0(ε) we take the
Gaussian form ρ0(ε) = e
−(ε/t∗)2/
πt∗, corresponding to
an infinite dimensional hypercubic lattice. It is useful to
define an effective bandwidth W = 2D for this density of
states via D, the point at which ρ0(D) = ρ0(0)/e
2, giving
2t∗ corresponding to the choice in reference [14].
In all the results we present here we take the value W =
4. In the NRG calculations we have used the improved
method [15,16] of evaluating the response functions with
the complete Anders-Schiller basis [17], and also determine
the self-energy from a higher order Green’s function [18].
In figure 1 we show the self-consistently calculated lo-
cal spectral density for the spin-up (upper panel) and spin-
down electrons (lower panel) at an A site with U = 3 and
5% hole doping (from the state at half-filling) for various
values of an applied staggered field. The staggered mag-
netic field induces a sublattice magnetization,
(nA,↑ − nA,↓), (14)
so that these spectra are quite different. For this set of
parameters, this difference persists as the staggered field
−4 −2 0 2 4
h =0.05
h =0.1
−4 −2 0 2 4
h =0.05
h =0.1
Fig. 1. (Color online) The spectral densities for the spin-up
electrons (upper panel) and spin-down electrons (lower panel)
at the A site for various values of the applied staggered field
for U = 3 and x = 0.95
is reduced to zero so that we have a spontaneous sub-
lattice magnetization corresponding to spontaneous anti-
ferromagnetic order. For the case away from half filling,
δ 6= 0, we have to keep adjusting the chemical potential
when iterating for a self-consistent solution. It shows a
slightly oscillatory behavior when iterating for a specific
filling x, and we follow the procedure described in refer-
ence [10]. This feature is related to the fact that the calcu-
lations are for a metastable ground state and instabilities
to more complicated ground states for antiferromagnetic
ordering than the homogeneous, commensurate Néel state,
which forms the basis for these DMFT calculations, can
occur [19,20,21,22,23,24,25,26,10]. As far as phase sep-
aration in the ground state is concerned, the results of
our calculations are generally in line with the conclusions
in [10] as they are carried out within the same frame-
work. The focus of this work is, however, the analysis of
generic quasiparticle properties in a doped antiferromag-
netic state. We consider the approach as a valid, approxi-
mate starting point for this endeavor, but modifications to
the results presented here can occur for calculations based
4 J. Bauer and A.C. Hewson: Renormalized quasiparticles in antiferromagnetic states of the Hubbard model
on a more complicated ground states not accessible within
the DMFT framework. For a more extensive discussion of
the applicability of the DMFT in this situation we refer
to the earlier work [10].
From results of this type of calculation, we have built
up a global antiferromagnetic/paramagnetic phase dia-
gram as a function of the doping δ and the on-site inter-
action U . This phase diagram is shown in figure 2, where
the value of the corresponding sublattice magnetization is
shown in a false color plot. We have added a line separat-
ing the spontaneously ordered and paramagnetic regimes.
0 0.05 0.1 0.15 0.2
0.1 0.2 0.3 0.4
Fig. 2. (Color online) Phase diagram showing the doping and
the U dependence of the sublattice magnetization mA as de-
duced from the DMFT-NRG calculations.
At half filling (δ = 0 axis) the spontaneous magnetiza-
tion increases with U . We can see that the antiferromag-
netic order from the half filled case persists when holes
are added. The value of the critical doping δc at which
the antiferromagnetism disappears depends on the on-site
interaction U . We expect that for small U the critical dop-
ing δc will increase with U since a tendency to order only
appears when an on-site interaction is present. From the
mapping to the t−J model we also expect that for large U
the antiferromagnetic coupling J decreases and therefore
the order is destroyed more easily. The values of U are,
however, not large enough to display this trend.
If we compare these results with the phase diagram
given by Zitzler et al. [10] we see that they are in very good
agreement. In their case the antiferromagnetic region was
mapped out to values of U ≃ 4.5. As the iterations tend
to oscillate, as discussed before, there is a problem of ob-
taining a self-consistent antiferromagnetic solution in the
large U regime. We have managed to extend the diagram
to somewhat larger values of U by stabilising the calcula-
tions by averaging the effective medium over a number of
iterations.
3 Local Quasiparticle Parameters
To examine the nature of the low energy excitations, we
will assume that the self-energy Σσ(ω) is non-singular at
ω = 0 so that, at least asymptotically, it can be expanded
in powers of ω. This assumption is not expected to be valid
close to the quantum critical point when the magnetic
order sets in, but to be a reasonable assumption otherwise.
We also assume that the imaginary part of the self-energy
vanishes which is confirmed by the numerical results of the
DMFT-NRG calculations. We will retain terms to order
ω only for the moment. The higher order corrections will
be considered later. We then find for ζσ(ω),
ζσ(ω) = ω(1−Σ′σ(0)) + µσ −Σσ(0) (15)
= z−1σ (ω + µ̃0,σ), (16)
where
µ̃0,σ = zσ(µ−Σσ(0)), and z−1σ = 1−Σ′σ(0). (17)
The interacting Green’s function (7) has poles at the roots
of the quadratic equation,
ζσ(ω)ζ−σ(ω)− ε2k = 0. (18)
The solutions of this equation are
E0k,± = −µ̃±
+∆µ̃2, (19)
where ε̃k =
z↑z↓εk, ∆µ̃ = (µ̃0,↑ − µ̃0,↓)/2, and µ̃ =
(µ̃0,↑ + µ̃0,↓)/2. This has the same form as for the non-
interacting system in a staggered field (5), so we can in-
terpret these excitations as quasiparticles coupled to an
effective staggered magnetic field h̃s = ∆µ̃/gµB, with µ̃
playing the role of a quasiparticle chemical potential. This
equation gives the dispersion relation for these single par-
ticle excitations, which can be regarded as constituting a
renormalized band, or bands as there are two branches.
The term magnetic polaron is sometimes used to describe
these single particle excitations in states of magnetic or-
der, because of the analogy with the motion of a particle
in a lattice to which it is strongly coupled, where the ex-
citation is termed a polaron.
The corresponding density of states of these free local
quasiparticles on the sublattice is
ρ̃0,σ(ω)=
ω + µ̃− σ∆µ̃
ω + µ̃+ σ∆µ̃
(ω + µ̃)2 −∆µ̃2
for |ω + µ̃| > |∆µ̃|, and is zero otherwise. In the case of
a half-filled band µ̃ = 0 and there is a gap at the Fermi
level εF = 0.
To determine this local quasiparticle density of states
in the presence of the symmetry breaking staggered mag-
netic field we need to calculate zσ and µ̃0,σ for each spin
type. Using the NRG we can do this in two ways. As the
DMFT-NRG calculations give us the self-energyΣσ(ω) di-
rectly, we only need its value, and that of its first derivative
J. Bauer and A.C. Hewson: Renormalized quasiparticles in antiferromagnetic states of the Hubbard model 5
at ω = 0, to deduce both zσ and µ̃0,σ using equation (17).
However, because the model is solved using an effective
impurity model, we can also deduce these quantities indi-
rectly from the many-body energy levels of the impurity
on approaching the low energy fixed point [27]. This sec-
ond method gives us not only a check on the results of the
direct method, but also allows to deduce some informa-
tion about the quasiparticle interactions, as we shall show
in the next section.
3.1 Calculation of Renormalized Parameters
To describe how the renormalized parameters are deduced
from the energy levels of the NRG calculation, we need to
outline how the NRG calculations are carried out. Fol-
lowing the procedure introduced by Wilson [28], the con-
duction band is logarithmically discretized and the model
then converted into the form of a one dimensional tight
binding chain, coupled via an effective hybridization Vσ to
the impurity at one end. In this representation Kσ(ω) =
|Vσ|2g(N)0,σ (ω), where g
0,σ (ω) is the one-electron Green’s
function for the first site of the isolated conduction elec-
tron chain of length N . The impurity Green’s function for
this discretized model then takes the form,
Gimpσ (ω) =
ω − εdσ − |Vσ|2g(N)0,σ (ω)−Σσ(ω)
. (21)
We can find the quasiparticle excitations of this model
by expanding the self-energy Σσ(ω) in the denominator of
this equation to first order in ω, and write the result in
the form,
Gimpσ (ω) =
ω − ε̃dσ − |Ṽσ |2g(N)0,σ (ω) +O(ω2)
, (22)
where
ε̃dσ = zσ[εdσ +Σσ(0)], |Ṽσ|2 = zσ|Vσ|2. (23)
We can then define a free quasiparticle propagator, G̃0,σ(ω),
0,σ (ω) =
ω − ε̃dσ − |Ṽσ |2g(N)0,σ (ω)
, (24)
and interpret zσ as the local quasiparticle weight.
In the NRG calculation the many-body excitations are
calculated iteratively, starting at the impurity site, and
increasing the chain length N by one site with each itera-
tion. When the matrices become too large to handle, only
the lowest 500-1500 states are kept at each iteration. The
many-body energy levels for the Nth iteration and the
set of quantum numbers M , EM (N), depend on the chain
length N and the discretization parameter Λ > 1. When
N becomes large these energy levels go to zero as Λ−N/2.
We now conjecture that the lowest single particle Eσp (N)
and single hole excitations Eσh (N) determined from the
NRG many-body excitations correspond to quasiparticle
excitations. If this is the case then they should correspond
to the poles of the quasiparticle Green’s function given in
equation (24), with values of Ṽσ and ε̃dσ, which are inde-
pendent of N as N → ∞. We can test this hypothesis by
substituting the values, ω = Eσp (N) and ω = E
h (N), into
the equation,
ω − ε̃dσ − |Ṽσ|2g(N)0,σ (ω) = 0, (25)
and deduce values of Ṽσ and ε̃dσ, which will in general
depend upon N . From these we can deduce zσ = |Ṽσ/Vσ|2
and µ̃0,σ = −ε̃dσ, which will also depend upon N , but if
the lowest single particle excitations of the system do cor-
respond to free quasiparticles, the values of zσ and µ̃0,σ
will become independent of N for large N . It should be
noted that we need both the particle and hole excitations
for each spin to determine the four renormalized parame-
ters. The parameters corresponding to spin up involve the
particle excitations with spin up and the hole excitations
with spin down.
That parameters can be found, which are independent
ofN for largeN , can be seen in figure 3, where we take the
results of a Kσ(ω) and µσ from the antiferromagnetic self-
consistent solution for the Hubbard model with U = 3 and
10% doping, using a value for the discretization parameter
Λ = 1.8.
0 10 20 30 40 50
Fig. 3. (Color online) The N-dependence of the renormalized
parameters zσ and µ̃0,σ for U = 3 and x = 0.9.
It can be seen that after about 25 iterations all the values
deduced for zσ and µ̃0,σ become independent of N . In the
next section, where we compare these results with the cor-
responding values deduced directly from the self-energy,
we get further confirmation that the values deduced re-
ally do describe the quasiparticle excitations of the lattice
model.
When two or more quasiparticles are excited from the
interacting ground state, there will be an interaction be-
tween them. For the Anderson impurity model this inter-
action will be local and can be expressed as Ũ , a renor-
malized value of the original interaction of the ‘bare’ par-
6 J. Bauer and A.C. Hewson: Renormalized quasiparticles in antiferromagnetic states of the Hubbard model
ticles. The value of Ũ can be deduced by looking at low-
est lying two-particle excitations derived from NRG cal-
culation. These could either be two-particle excitations,
E↑,↓pp (N), two-hole excitations, E
hh (N) or a particle-hole
excitation E
ph (N). By looking at the difference between
a two-particle excitation and two single particle excita-
tions, E↑,↓pp (N) − E↑p(N) − E↓p(N), as a function of N
we can deduce an effective interaction Ũ↑,↓pp (N) between
these two quasiparticles, as has been described fully ear-
lier for the standard Anderson model [27]. In a similar way
we can deduce an effective interaction between two holes,
hh (N), or a particle and hole, −Ũ
ph (N). To be able to
define a single quasiparticle interaction Ũ , not only must
Ũ↑,↓pp (N), Ũ
hh (N) and Ũ
ph (N), give values which are in-
dependent of N for large N , these values must be equal
so Ũ↑,↓pp = Ũ
hh = Ũ
ph = Ũ .
0 10 20 30 40 50
Fig. 4. (Color online) The N-dependence of the renormal-
ized particle-particle, particle-hole and hole-hole interactions
for U = 6 and x = 0.9, showing that they converge to a unique
value Ũ .
In figure 4 we give the values of Ũ↑,↓pp (N), Ũ
hh (N) and
ph (N) as deduced from DMFT-NRG calculation for the
Hubbard model in an antiferromagnetic state with U = 6,
10% doping and Λ = 1.8. It can be seen that the three
sets of results settle down to a common value Ũ .
We can go further and identify Ũ with the local quasi-
particle 4-vertex interaction for the effective impurity model,
Ũ = z↑z↓Γ↑,↓,↓,↑(0, 0, 0, 0), (26)
where Γ↑,↓,↓,↑(ω1, ω2, ω3, ω4) is the total 4-vertex at the
impurity site, which is equal to the same quantity for a
site in the lattice model. With this interpretation it is
possible to identify these parameters with those used in a
renormalized perturbation expansion. The parameters, V ,
εd,σ and U , together with g
0,σ(ω), specify the effective im-
purity model. The renormalized parameters, Ṽ , ε̃d,σ and
Ũ , together with gN0,σ(ω), can be used as an alternative
way of specifying this model. The renormalized perturba-
tion theory (RPT) is set up by expanding the self-energy
to order ω, as earlier, but retaining all the higher order
correction terms in a remainder term,
Σσ(ω) = Σσ(0) + ωΣ
σ(0) +Σ
σ (ω), (27)
where Σremσ (ω) is the remainder term. On substituting
this into the equation for the impurity Green’s function in
equation (11), we can deduce a general expression for the
quasiparticle Green’s function in the form,
G̃impσ (ω) =
ω − ε̃dσ − K̃σ(ω)− Σ̃σ(ω)
, (28)
where K̃σ(ω) = zσKσ(ω) and Σ̃σ(ω) = zσΣ
σ (ω) plays
the role of a renormalized self-energy. A diagrammatic
perturbation theory can then be carried out for Σ̃σ(ω)
in terms of the free quasiparticle propagators, with ad-
ditional diagrams arising from counter terms, which are
required to prevent over-counting (renormalization condi-
tions) [29,9,30]. This form of perturbation theory is valid
for all energy scales but is particularly effective for calcu-
lating the low energy terms arising from the quasiparticle
interactions. For the symmetric Anderson impurity model
it has been shown that this perturbation theory taken to
second order in Ũ , gives the exact spin and charge suscep-
tibilities at T = 0, and the exact T 2 contribution to the
conductivity [9].
Because, within DMFT, the self-energy for the lat-
tice is the same as that for the effective impurity, we can
equally well use the effective impurity model to calculate
it. This means that we can use the renormalized pertur-
bation theory for the effective impurity model to estimate
the correction terms to the free quasiparticle picture aris-
ing from the quasiparticle interactions.
3.2 Local Quasiparticle Weight
We now consider the values of the local quasiparticle weight
factor zσ, commonly known also as the wavefunction renor-
malization factor. This is an important factor in deter-
mining the parameters needed to describe the low energy
behavior of the system. When there is no k-dependence
of the self-energy, as is the case for infinite dimensional
models and DMFT, the effective mass of the quasiparti-
cles in the paramagnetic state is proportional to 1/zσ. We
show later that in the antiferromagnetic state the expres-
sion is more complicated and depends both on zσ and the
renormalized chemical potential µ̃0,σ. We have determined
zσ from the NRG results by the two methods described
and give the values of zσ deduced for both spin types as a
function of doping in figure 5. The results are for the case
U = 3, where there is antiferromagnetic order and the ex-
ternal staggered field has been set to zero. It can be seen
that there is a reasonable agreement between the values
obtained by the two different methods of calculation. Visi-
ble differences can be attributed to the inaccuracies when
J. Bauer and A.C. Hewson: Renormalized quasiparticles in antiferromagnetic states of the Hubbard model 7
0 0.05 0.1 0.15 0.2
 from Σ
 from Σ
 from FP
 from FP
Fig. 5. (Color online) The local quasiparticle weight zσ as de-
duced directly from the self-energy and also from the impurity
fixed point (FP) for U = 3 and various dopings.
numerically computing the derivative of the self-energy,
whose calculation involves a broadening procedure. When
the system is doped but still ordered we have z↑ 6= z↓, and
the renormalization effects are stronger for the minority
(down) spin particles on the sublattice. This is similar to
the results we found for a doped Hubbard model in a para-
magnetic state in the presence of a strong uniform mag-
netic field [8]. For a certain range of dopings the values of
z↑ and z↓ do not vary much. The tendency is that z↓ first
decreases and later increases, whereas z↑ decreases over
the whole range until both of them merge at the doping
point where the antiferromagnetic order disappears.
The results for the corresponding case with U = 6, a
value which is larger than the bandwidth, are shown in
figure 6.
0 0.05 0.1 0.15 0.2
 from Σ
 from Σ
 from FP
 from FP
Fig. 6. (Color online) The local quasiparticle weight zσ as
deduced directly from the self-energy and from the impurity
fixed point (FP) for U = 6 and various dopings.
On the whole the behavior is quite similar to that for
the case U = 3, only that the renormalization effects
are more pronounced. For a range of dopings the local
quasiparticle weights do not change much and have the
same tendency as described above. The implications for
the spectral quasiparticle weight and the effective mass
enhancement will be discussed in detail later.
3.3 Renormalized chemical potential
In figure 7 we give the results for the renormalized chemi-
cal potential, µ̃0,σ [defined in equation (17) and (23)], for
the two spin types in the spontaneously ordered antifer-
romagnetic states for U = 3 and U = 6 for a range of
dopings.
0 0.05 0.1 0.15
 from Σ
 from Σ
 from FP
 from FP
0 0.05 0.1 0.15
 from Σ
 from Σ
 from FP
 from FP
Fig. 7. (Color online) The renormalized chemical potential
µ̃0,σ as deduced directly from the self-energy and from the
impurity fixed point (FP) for various dopings for U = 3 (upper
panel) and U = 6 (lower panel).
The values calculated by the two different methods can be
seen to be in good agreement here, as well. We have added
the values for the half filled case. These were calculated
from the self-energy in the gap at ω = 0. The general
8 J. Bauer and A.C. Hewson: Renormalized quasiparticles in antiferromagnetic states of the Hubbard model
behavior of the values for µ̃0,σ for the case with U = 6 is
very similar to the case with smaller U
The renormalized chemical potential µ̃0,σ is an impor-
tant parameter in specifying the form of the local sublat-
tice quasiparticle spectral density ρ̃0σ(ω). From equation
(20) it can be seen that, as ω → −µ̃0,σ, ρ̃0,σ(ω) behaves
asymptotically as
ρ̃0,σ(ω) ∼
ω + µ̃0,σ
, (29)
so the quasiparticle density of states has a square root
singularity at ω = −µ̃0,σ. On the other hand, however, as
ω → −µ̃0,−σ, ρ̃0,σ(ω) behaves as
ρ̃0,σ(ω) ∼
ω + µ̃0,−σ, (30)
so the quasiparticle density of states goes to zero at ω =
−µ̃0,−σ. Between the two points, ω = −µ̃0,σ and ω =
−µ̃0,−σ, the quasiparticle density of states has a gap of
magnitude 2∆µ̃. As can be seen in figure 7 this free quasi-
particle gap decreases with increasing doping and closes
in the paramagnetic state. If we take into account the
values at half filling we see a strong reduction of 2∆µ̃,
when doping the system. We also see that µ̃0,↑ drops to
small negative values for finite hole doping, which corre-
sponds to the fact that the Fermi level then lies within the
lower band. These features will be seen clearly in the fig-
ures presented in the next section, where we compare the
quasiparticle densities of states with the full local spectral
densities calculated from the DMFT-NRG.
3.4 The Quasiparticle Interaction
The quasiparticles can be further characterized by an ef-
fective interaction Ũ as described before. In figure 8 we
plot the doping dependence of the renormalized interac-
tion over a range of dopings and U = 3 and U = 6.
We can see that in both cases the values decrease with
increasing doping. Hence, the effective quasiparticle inter-
action is stronger for a smaller hole density. For a certain
range of dopings, however, Ũ does not vary much. We can
also see that the ratio Ũ/U for the effective interaction as-
sume smaller values the larger the bare U becomes. Also
the absolute value of Ũ , i.e. without the scaling with U as
in figure 8, is smaller for larger bare U for the full range
of dopings. This effect of smaller quasiparticle interactions
for the stronger coupling case can be seen as sharper quasi-
particle peaks for larger U as will be discussed in the next
section.
4 Spectra and Quasiparticle Bands
4.1 Local Spectra
In this section we examine how well the local sublattice
quasiparticle density of states ρ̃0,σ(ω), evaluated from equa-
tion (20) with the renormalised parameters, describes the
0.05 0.1 0.15
 U / U for U =3
 U / U for U =6
Fig. 8. (Color online) The renormalised quasiparticle interac-
tion Ũ/U as deduced from the impurity fixed point for various
dopings and U = 3, 6.
low energy features seen in the local spectral density ρσ(ω)
calculated from the DMFT-NRG. At half filling there is
a gap at the Fermi level, so there are no single particle
excitations in the immediate neighbourhood of the Fermi
level, and this is not a very interesting case to consider.
We look in detail at the case of 10% doping where the
Fermi level lies at the top of the lower band, and consider
the two cases U = 3 and U = 6. In the upper panel of
figure 9 we compare the spectral density ρ↑(ω) with the
corresponding quantity z↑ρ̃0,↑(ω), from the quasiparticle
density of states.
We see that the behavior near the Fermi level (ω = 0),
and the singular feature seen in the lower branch of ρ↑(ω),
are well reproduced by the quasiparticle density of states.
Above the Fermi level there is a peak in the quasiparti-
cle density of states similar to that in the full spectrum
but somewhat more pronounced. Above the Fermi level
and below the upper peak there is a pseudo-gap region. In
the free quasiparticle spectrum it is a definite gap. In the
spectrum calculated from the direct NRG evaluation it ap-
pears as a pseudo-gap, with rather small spectral weight
just above the Fermi level. From the direct DMFT-NRG
calculations, due to the broadening features introduced to
obtain a continuous spectrum, it is not always possible
to say definitively whether there is a true gap above the
Fermi level or not. To resolve this question we can ap-
peal to the renormalised perturbation theory to look at
the corrections to the quasiparticle density of states aris-
ing from the quasiparticle interactions. A calculation of
the imaginary part of the renormalised self-energy Σ̃σ(ω)
to order Ũ2 should be sufficient to settle this issue. The
imaginary part of the second order diagram for the renor-
malised self-energy in the limit T → 0 for ω > 0 is given
J. Bauer and A.C. Hewson: Renormalized quasiparticles in antiferromagnetic states of the Hubbard model 9
−2 −1 0 1 2 3 4
−2 −1 0 1 2 3 4
Fig. 9. (Color online) The free local quasiparticle spectrum
(dashed line) in comparison with DMFT-NRG spectrum for
x = 0.9 and U = 3 for the spin-up electrons (upper panel) and
spin-down electrons (lower panel).
ImΣ̃(2)σ (ω) = πŨ
dε2 ρ̃0,σ(ε1)ρ̃0,−σ(ω − ε1 + ε2)
× ρ̃0,−σ(ε2)θ(ω − ε1 + ε2), (31)
where ρ̃0,σ(ε) is the free quasiparticle density of states.
The integration area is a triangle in the (ε1, ε2)-plane as
shown in figure 10.
To analyze the behavior of ImΣ̃
σ (ω) in the regime |µ̃0,↑| <
ω < |µ̃0,↓| we have to study where the integrand is non-
zero taking into account that ρ̃0,σ(ε) = 0 for |µ̃0,↑| <
ε < |µ̃0,↓|. The only non-zero contribution comes from
the small shaded region in figure 10, which leads to the
estimate,
ImΣ̃(2)σ (ω) ≃ πŨ2ρ̃0,σ(0)ρ̃0,−σ(−ω)ρ̃0,−σ(0)µ̃20,↑. (32)
When µ̃0,↑ is small, which occurs when the lower edge of
the gap in the quasiparticle density of states is very near
the Fermi level, this contribution to the imaginary part
PSfrag replacements
−ω + |µ̃0,↑|
|µ̃0,↑| |µ̃0,↓|
Fig. 10. (Color online) Integration region in the (ε1, ε2)-plane
for the imaginary part of the self-energy. The original triangle
region (0, ω,−ω) for integration in equation (31) is reduced in
the gap region, |µ̃0,↑| < ω < |µ̃0,↓|, to the small shaded region
shown in the figure.
of the renormalized self-energy will be finite but small. It
decreases with ω due the behavior of ρ̃0,−σ(−ω). Based on
this argument we conclude that there is a small, but finite
imaginary part of the self-energy in the free quasiparticle
gap 2∆µ̃, when it lies above the Fermi level, giving rise
to a finite spectral weight there. However, this spectral
weight is very small close to the lower edge of the free
quasiparticle density of states, when this edge lies only
just above the Fermi level.
In the lower panel of figure 9 we compare the ρ↓(ω)
with z↓ρ̃0,↓(ω). We see that in this case also the quasi-
particle density of states reproduces well the spectrum in
the region of the Fermi level and the peak structure in
the lower band, which is non-singular in this case. The
position of the peak above the Fermi level is also well re-
produced, but the peak in the free quasiparticle density
of states is singular, whereas that in the DMFT-NRG re-
sults is not. We would expect to lose this singularity in the
free quasiparticle density of states once the quasiparticle
scattering is taken into account and the renormalized self-
energy is included. It is possible also that the peak above
the Fermi level in the DMFT-NRG spectrum should be
sharper, as there is some tendency for the broadening in-
troduced in this approach to flatten peaked features in
regions away from the Fermi level. The spectral weight
in the pseudo-gap is even smaller than in the case for the
spin-up electrons, particularly in the region of the gap that
lies closest to the Fermi level. This is qualitatively in line
with the conclusions based on the renormalized pertur-
bation theory estimate of the effects of the quasiparticle
scattering.
We see very similar features in the spectra for the case
U = 6 and also 10% doping shown in figure 11.
Here, the peaks near the Fermi level are a bit sharper. The
observations made on the comparison of the quasiparticle
and DMFT-NRG spectra apply equally well to this case.
In addition to the low energy features charge peaks cor-
responding to the Hubbard bands appear. The lower one
can be identified in the full spectra, whereas the upper
10 J. Bauer and A.C. Hewson: Renormalized quasiparticles in antiferromagnetic states of the Hubbard model
−2 −1 0 1 2 3
−2 −1 0 1 2 3
Fig. 11. (Color online) The free quasiparticle spectrum
(dashed line) in comparison with DMFT-NRG spectrum for
x = 0.9 and U = 6 for the spin-up electrons (upper panel) and
spin-down electrons (lower panel).
Hubbard peak is not seen on the energy scale shown. The
quasiparticle density of states does not contain informa-
tion about these features at higher energy.
4.2 k-resolved Spectra
We can learn more about the low energy single parti-
cle excitations by looking at the spectral density of the
Green’s function Gk,σ(ω) in equation (7) for a given wave-
vector k. With the self-energies Σσ(ω) calculated within
the DMFT-NRG approach all elements of this matrix can
be evaluated. The local spectra and self-energies are spin-
dependent in the doped broken symmetry state, however,
the free quasiparticle bands E0
k,± [equation (19)] do not
depend on the spin. Here, we focus on the diagonal part
of Gk,σ(ω) corresponding to the A sublattice,
Gk,σ(ω) =
ζ−σ(ω)
ζσ(ω)ζ−σ(ω)− ε2k
. (33)
The weights of the quasiparticle excitations in this case
depend on the spin corresponding to the sublattice prop-
erties. We note that one can also analyze the quasiparticle
bands differently, for instance, from the k-resolved spectra
and the diagonal form of Gk,σ(ω). The form of the quasi-
particle bands remains unchanged then, but the weights
differ and do not depend on the spin σ in that case.
We first of all look at the Fermi surface which is the
locus of the k-points at the Fermi level (ω = 0) where
the Green’s function has poles. The conduction electron
energy εkF at these point is given by
ε2kF = (µ↑ −Σ↑(0))(µ↓ −Σ↓(0)). (34)
By Luttinger’s theorem, the volume of the Fermi sur-
face for the interacting system must equal that for the
non-interacting system with the same density. As the self-
energy depends only on ω, the two Fermi surfaces must
also have the same shape, and therefore must be identical.
The Fermi surface of the non-interacting system is given
by εkF = µ0, where µ0 is the chemical potential of the
non-interacting system in the absence of any applied field
for the given density. For this to be identical with that
given in equation (34),
(µ↑ −Σ↑(0))(µ↓ −Σ↓(0)) = µ20. (35)
We can check that this relation indeed holds from our
results for Σσ(ω) and µσ, independent of the value of U ,
or in the case of an applied staggered field, independent of
the field value. This relation implies that the total number
of electrons per site n can be calculated from an integral
over the non-interacting density of states,
n = 2
ρ0(ω)dω, (36)
where in the hole doped case µ0 = −
µ̄↑µ̄↓ and µ̄σ =
µσ −Σσ(0).
To relate this result to the quasiparticle picture, we
expand the self-energy in equation (33) to first order in
ω, but retain the remainder term, ΣRσ (ω) as in equation
(27). The Green’s function can be rewritten in the form,
G̃k,σ(ω) =
ζ̃−σ(ω)
ζ̃σ(ω)ζ̃−σ(ω)− ε̃2k
, (37)
where ζ̃σ(ω) = ω + µ̃0,σ − Σ̃σ(ω). We define a quasipar-
ticle Green’s function G̃k,σ(ω) via zσG̃k,σ(ω) = Gk,σ(ω).
The renormalized self-energy vanishes, Σ̃σ(ω) = 0, for the
free quasiparticle Green’s function G̃
k,σ(ω), which can be
separated into two independent branches of free quasipar-
ticles,
k,σ(ω) =
uσ+(εk)
ω − E0
uσ−(εk)
ω − E0
, (38)
where E0
k,± was defined in equation (19) and the weights
are given by
uσ±(εk) =
1∓ σ ∆µ̃√
∆µ̃2 + ε̃2
. (39)
J. Bauer and A.C. Hewson: Renormalized quasiparticles in antiferromagnetic states of the Hubbard model 11
This is similar in form to mean field theory, which would
correspond to putting zσ = 1, and ∆µ̃ = Ummf , where
mmf is the mean field sublattice magnetization. The spin
dependent contribution in (39) which arises from the sec-
ond term is most marked in the region near the Fermi
level. It should be noted that the quasiparticle excitations
k,± and weights u
±(εk) here are defined by expanding
the self-energy at ω = 0. This is so that they correspond
to the free quasiparticles in the renormalized perturbation
theory which have an infinite lifetime.
The spectral density ρ̃
(ω) for this free quasiparticle
Green’s function is a set of delta-functions,
k,σ(ω) = u
+(εk)δ(ω−E0k,+)+uσ−(εk)δ(ω−E0k,−). (40)
On the Fermi surface E0k,− = 0, which is consistent with
the result for the Fermi surface given in equation (34).
Summing over k gives the local quasiparticle density of
states in equation (20). We define the quasiparticle num-
ber ñ as the integral of the sum of the spin up and spin
down quasiparticle density of states up to the Fermi level,
dω(ω + µ̃)
(ω + µ̃)2 −∆µ̃2
(ω + µ̃)2 −∆µ̃2
If we change the variable of integration to ω′, where
z↑z↓ =
(ω + µ̃)2 −∆µ̃2,
the integration can be shown to be identical with that
in equation (36), using the fact that µ0 = −
µ̄↑µ̄↓. We
then have an alternative statement of Luttinger’s theorem
in the form ñ = n. This can also be found by summing
both spin components in (40), integrating over ω and then
converting the k-summation to an integral over the free
electron density of states ρ0(ω). We can check in our nu-
merical results that the relation in this form holds. The
occupation number n can be calculated both from a direct
evaluation of the number operator in the ground state, and
also by integrating the sum of the spectral densities ρσ(ω)
of the full local Green’s function to the Fermi level. The
value of ñ is similarly determined from the integral over
the total quasiparticle density of states, ρ̃σ(ω). All three
results were found to be in good agreement, to within one
or two percent deviation at the most.
Before discussing the k-resolved spectra in detail we
would like to ask what the spectral weight wqp of a quasi-
particle excitation at the Fermi level in the lower band is,
such that the Green’s function reads there
Gqp(ω) =
ω − E0
. (42)
To calculate wqp, we can not focus on the spin depen-
dent local sublattice quantities, but have to sum over both
sublattices or equivalently the two spin components. The
reason for this is that the antiferromagnetically ordered
state does not possess any net magnetization and has on
average as many spin up polarized as spin down electrons.
The division in the A and B sublattices is convenient for
the DMFT calculations but somewhat artificial. In our
case with hole doping the Fermi level lies within the lower
band, which for the free quasiparticles is denoted by E0
The corresponding weight on the Fermi surface defined by
(34) is then given by
wqp =
−(εkF) =
z↑ + z↓
(z↑ − z↓)∆µ̃
2|µ̃|
, (43)
where the average of the renormalized chemical potential
µ̃ and the difference ∆µ̃ were defined below equation (19).
From the definition of ∆µ̃ we can see that the second term
in (43) is spin rotation invariant. The spectral quasipar-
ticle weight wqp on the Fermi surface depends not only
on the renormalization factors zσ, but also on the renor-
malized chemical potentials µ̃0,σ. The same result for the
weight (43) can be obtained from the diagonal form of
Gk,σ(ω) and the spectral weight of the lower band. The
weight wqp corresponds to the spectral weight Z at the
Fermi level as for example given in references [3,31,32].
The first term of the result for wqp is like the arithmetic
average of zσ. From figures 5 and 6 we can see that z↑ > z↓
and from figure 7 that µ̃0,↓ < µ̃0,↑ < 0. Therefore the sec-
ond term in (43) gives a positive contribution to the spec-
tral weight. At the end of the section in figure 18 we show
values of wqp in comparison with the arithmetic average
of zσ.
In order to understand better the properties of the
quasiparticle bands, we now compare the quasiparticle
spectrum with the k-resolved spectral density ρk,σ(ω) de-
rived from the DMFT-NRG results. In figure 12 we make
a comparison for the case of 12.5% doping with U = 3
for the Green’s function Gk,σ(ω) given in equation (33),
ρk,σ(ω) = −ImGk,σ(ω+)/π, where ω+ = ω+ iη, with η →
0, with that derived for the free quasiparticles, zσ ρ̃
k,σ(ω)
from equation (40). The delta-functions of the free quasi-
particle results are indicated by arrows with the height of
the arrow indicating the value of the corresponding spec-
tral weight. The plots as a function of ω are shown for a
sequence values of εk and, where the peaks in ρk,σ(ω)
get very narrow and high in the vicinity of the Fermi
level, they have been truncated. It can be seen that the
free quasiparticle results give a reasonable picture of the
form of ρk,σ(ω), particularly in the immediate region of
the Fermi level. There is considerable variation along the
curves in the way the overall spectral weight is distributed
between the excitations below and above the pseudo-gap
as a function of εk. This is most marked in the region near
the Fermi level for the spin-up electrons (upper panel)
where most of the spectral weight is in the lower band and
it is much reduced in the upper band, whereas the opposite
is the case for the spin-down electrons. This is reflected
in the expression of the quasiparticle weights uσ±(εk) in
equation (39). For instance, u
−(εk) corresponding to the
lower band E0
k,− becomes maximal near the Fermi en-
ergy, whereas u
+(εk) goes to zero there. The finite width
of the quasiparticle peaks in ρk,σ(ω) can be described by
a RPT, when we take into account the renormalized self-
12 J. Bauer and A.C. Hewson: Renormalized quasiparticles in antiferromagnetic states of the Hubbard model
−1 −0.5 0 0.5 1
−1.125
−0.75
−0.375
0.375
1.125
−1 −0.5 0 0.5 1
−1.125
−0.75
−0.375
0.375
1.125
Fig. 12. (Color online) The spectral density ρk,σ(ω) for the
spin-up electrons (upper panel) and spin-down (lower panel)
plotted as a function of ω and a sequence of values of εk for
U = 3 and 12.5% doping. Also shown with arrows are the
positions of the free quasiparticle excitations, with the height
of the arrow indicating the corresponding weight.
energy Σ̃σ(ω) in equation (37). If we, for instance, use
the the second order approximation in Ũ , which was illus-
trated in the last section (31), we get a similar behavior
for small ω as seen for ρk,σ(ω) in figure 12.
From the positions of the peaks in the ρk,σ(ω) spectra
we can deduce two branches of an effective dispersion Ek,±
for single particle excitations and compare it with the ones
for the free quasiparticles E0
k,±. We give the results for
U = 3 in figure 13.
It can be seen that E0
k,− tracks the peak in the lower
band closely over a wide range of εk, −1.5 < εk < 1.5
(note the bandwidth W = 4). This is not the case in the
upper band, where E0
k,+ tracks the peak closely only in
the lowest section that lies closest to the Fermi level. As
one can see from the dotted line the Fermi level lies in
the lower band and intersects the lower band twice. This
corresponds to the two values with opposite sign ε±
can be see from equation (34).
The corresponding results for the k-resolved spectra
for U = 6 and also 12.5% doping are shown in figure 14.
−1.5 −1 −0.5 0 0.5 1 1.5
Fig. 13. (Color online) A plot of the peak positions Ek,± in
the spectral density ρk,σ(ω) (full line) as a function of εk for
U = 3 and 12.5% doping compared with the free quasiparticle
dispersion E0k (dashed line).
−1 −0.5 0 0.5 1
−1.125
−0.75
−0.375
0.375
1.125
−1 −0.5 0 0.5 1
−1.125
−0.75
−0.375
0.375
1.125
Fig. 14. (Color online) The spectral density ρk,σ(ω) for the
spin-up electrons (upper panel) and spin-down (lower panel)
plotted as a function of ω and a sequence of values of εk for
U = 6 and 12.5% doping. Also shown with arrows are the
positions of the free quasiparticle excitations, with the height
of the arrow indicating the corresponding weight.
J. Bauer and A.C. Hewson: Renormalized quasiparticles in antiferromagnetic states of the Hubbard model 13
In order to compare well with the case U = 3 we have
chosen an identical range for ω and εk, although the large
spectral peaks near the energy are very close together in
this presentation. It can be seen that the overall features
are very similar to those seen for U = 3. For the spin
up spectrum (upper panel) the peaks for the lower band
have most of the weight near the Fermi energy, whereas
the upper band is suppressed there, and vice versa for
the opposite spin direction. The lower bands are tracked
well by the free quasiparticles, and we can see that the
bands for the larger value of U are significantly flatter.
This is also clearly visible in the following figure 15, where
we again compare the quasiparticle band with the peak
position of the full spectra. On the range shown the lower
band Ek,− completely coincides with the free quasiparticle
band E0
−1.5 −1 −0.5 0 0.5 1 1.5
Fig. 15. (Color online) A plot of the peak positions Ek,± in the
spectral density ρk,σ(ω) (full line) as a function of εk for U = 6
and 12.5% doping compared with the free quasiparticle disper-
sion E0k (dashed line). On the range shown the lower band Ek,−
completely coincides with the free quasiparticle band E0k,−.
From the k-resolved spectra in figures 12 and 14 we can
also extract the width of the quasiparticle peak ∆qp in
the spectral density ρk,σ(ω) (majority spin σ =↑). Its in-
verse 1/∆qp gives a measure of the quasiparticle lifetime.
The results for ∆qp for the lower band Ek,− for the two
cases U = 3, 6 and 12.5% doping are shown in figure 16 as
function of εk.
This plot brings out more clearly the feature that can
be seen already in figures 12 and 14 (upper panel) that
the width increases sharply when we move away from the
Fermi level and the values for the width ∆qp for U = 6
are significantly smaller than those for U = 3. This is in
line with the fact that the local quasiparticle interaction
Ũ is smaller for the larger value of the bare interaction U
as commented on earlier. The free quasiparticle picture is
therefore even more appropriate in the case with stronger
interaction. To numerical accuracy the width vanishes at
and is finite for the interval ε−
< εk < ε
which lies
within the lower band but above the Fermi level.
−1 −0.5 0 0.5 1
Fig. 16. (Color online) A plot of the width of the peaks ∆qp in
the spectral density of the majority spin ρk,↑(ω) as a function
of εk for U = 3 (dashed line) and U = 6 (full line) and 12.5%
doping.
Another quasiparticle property, the effective mass en-
hancement m∗/m, can be extracted by calculating the
derivative of E0k,− in (19) with respect to εk, which yields
when evaluated at the Fermi energy (34),
µ̃0,↑µ̃0,↓
. (44)
The effective mass enhancement therefore does not only
depend on zσ, but also on the renormalized chemical po-
tentials µ̃0,σ. The general trend for m
∗/m as function of
U can be seen in figure 17 for the case of 7.5% doping.
0.2 0.4 0.6 0.8 1
Fig. 17. The ratio m∗/m according to (44) plotted over a
range of t2/U for 7.5% doping.
The effective mass increases sharply for large U as the
hole motion is energetically more costly in the ordered
background. The fact that the lower band for U = 6 seen
in figure 15 is flatter than in the case U = 3 in figure 13
14 J. Bauer and A.C. Hewson: Renormalized quasiparticles in antiferromagnetic states of the Hubbard model
can be attributed to the larger effective mass. We find a
similar behavior of m∗/m as function of U for different
filling factors from the ones shown in figure 16. The trend
is that the effective mass enhancement is less pronounced
for larger doping, which is intuitively understandable by
the quasiparticle motion in an ordered background.
In the DMFT framework for the paramagnetic state
as well as the case with homogeneous magnetic field, the
quasiparticle spectral weight wqp and the inverse of the ef-
fective mass enhancement m/m∗ can be described simply
by the renormalization factor zσ. In figure 18 we show a
comparison of the spectral quasiparticle weight wqp (43),
the arithmetic, (z↑ + z↓)/2, and geometric,
z↑z↓, aver-
age of the renormalization factors, and the inverse of the
effective mass, m/m∗, (44) for U = 3 for various dopings.
0 0.05 0.1 0.15
m/ m*
Fig. 18. (Color online) Comparison of the spectral quasiparti-
cle weight wqp from equation (43), the arithmetic, (z↑+ z↓)/2,
and geometric,
z↑z↓, average of the renormalization factors,
and the inverse of the effective mass, m/m∗ from equation (44),
for U = 3 and a range of dopings.
As seen in this case with antiferromagnetic symmetry break-
ing these quantities take a different form (43) and (44)
and have distinct values. For different values of U the
behaviour is qualitatively similar. As a first approxima-
tion the quasiparticle spectral weight wqp corresponds to
the arithmetic average of the renormalization factors zσ,
whilst m/m∗ relates to the geometric average. In general,
one can, however, not omit the dependence on the renor-
malized chemical potential as it gives a significant contri-
bution as can be seen in figure 18. This can be understood
for example for the limit of zero doping. The system then
becomes an antiferromagnetically ordered insulator with
spectral gap. The weights zσ tend to finite values, but the
effective mass must diverge. This is found in equation (44)
since µ̃0,↑ → 0 for δ → 0, and the trend can be seen in
figure 18.
5 Conclusions
We have studied the field induced and spontaneous anti-
ferromagnetic ordering in the hole doped Hubbard model
with DMFT-NRG calculations at T = 0. A phase diagram
separating antiferromagnetic and paramagnetic solutions
for different values of doping and interactions U ranging
from zero to about 1.5 times the bandwidth W has been
established and is in agreement with earlier results by
Zitzler et al. [10]. Our main objective has been to ana-
lyze the properties of the quasiparticle excitations in the
metallic antiferromagnetic state. We presented two differ-
ent ways of calculating the parameters zσ and µ̃0,σ, which
define the renormalized quasiparticles, and the two sets
of results have been shown to be in agreement. We have
also been able to deduce the effective on-site quasiparticle
interaction Ũ from the NRG low lying excitations. The
low energy properties of the local spectral function can
be understood in terms of the free quasiparticle picture.
We have used the second order perturbation expansion in
powers of Ũ to estimate the spectral weight in the pseudo-
gap region above the Fermi level.
We have been able to compare the position of the
peaks found in the k-dependent spectral functions with
the dispersion relation for the free quasiparticles. The free
quasiparticle dispersion gives a very good fit to the posi-
tion of these peaks in the lower band which intersects the
Fermi level. The quasiparticle lifetime, as deduced from
the widths of the peaks in the spectrum, increases for
stronger interactions. This is consistent with the fact that
the on-site quasiparticle interaction Ũ , which gives the
quasiparticles a finite lifetime, decreases with increase of
U in the same range. We have also shown how the spec-
tral quasiparticle weight at the Fermi level wqp and the
effective mass can be deduced from the parameters zσ and
µ̃0,σ. The effective mass is found to increase with the in-
teraction, and it diverges in the limit of zero doping whilst
wqp remains finite.
We have found that Luttinger’s theorem for the total
electron density in the antiferromagnetically ordered state
holds within the numerical accuracy for the range of dop-
ings and interactions studied. This is a further indication
that many aspects of Fermi liquid description may hold in
situations with symmetry breaking.
It is not easy to make a direct comparison of our re-
sults with earlier work [3] analyzing the quasiparticle exci-
tations in an metallic antiferromagnet as these have been
mainly based on the t − J-model for one or two holes
in a finite cluster. However, at a semiquantitative level,
the overall trend in our results seems to be similar to the
results surveyed by Dagotto, where the effective quasipar-
ticle bandwidth Weff is found to decrease with decreasing
J . This is line with our results if we identify Weff ∼ m/m∗
and J ∼ t2/U (see figure 17). Our values for the spectral
quasiparticle weight wqp are qualitatively similar to those
presented as the wavefunction renormalization Z in the
review article by Dagotto (see fig 27 [3]), and also the
ones reported more recently [32].
J. Bauer and A.C. Hewson: Renormalized quasiparticles in antiferromagnetic states of the Hubbard model 15
Acknowledgment
We wish to thank N. Dupuis, D.M. Edwards, W. Koller, D.
Meyer and A. Oguri for helpful discussions and W. Koller
and D. Meyer for their contributions to the development
of the NRG programs. We also acknowledge stimulating
discussions with G. Sangiovanni. One of us (J.B.) thanks
the Gottlieb Daimler and Karl Benz Foundation, the Ger-
man Academic exchange service (DAAD) and the EPSRC
for financial support.
References
1. P. W. Anderson, Science 235, 1196 (1987).
2. P. A. Lee, N. Nagaosa, and X.-G. Wen, Rev. Mod. Phys.
78, 17 (2006).
3. E. Dagotto, Rev. Mod. Phys. 66, 763 (1994).
4. A.-M. S. Tremblay, B. Kyung, and D. Senechal, Low Tem-
perature Physics 32, 424 (2006).
5. W. Metzner and D. Vollhardt, Phys. Rev. Lett. 62, 324
(1989).
6. E. Müller-Hartmann, Z. Phys. B 74, 507 (1989).
7. R. Bulla, T. Costi, and T. Pruschke, cond-mat/0701105
(unpublished).
8. J. Bauer and A. C. Hewson, cond-mat/0705.3824, to be
published in Phys. Rev. B (unpublished).
9. A. C. Hewson, Phys. Rev. Lett. 70, 4007 (1993).
10. R. Zitzler, T. Pruschke, and R. Bulla, Eur. Phys. J. B 27,
473 (2002).
11. J. Hubbard, Proc. R. Soc. London, Ser. A 276, 238 (1963).
12. A. Georges, G. Kotliar, W. Krauth, and M. Rozenberg,
Rev. Mod. Phys. 68, 13 (1996).
13. P. W. Anderson, Phys. Rev. 124, 41 (1961).
14. R. Bulla, Phys. Rev. Lett. 83, 136 (1999).
15. R. Peters, T. Pruschke, and F. B. Anders, Phys. Rev. B
74, 245114 (2006).
16. A. Weichselbaum and J. von Delft, cond-mat/0607497 (un-
published).
17. F. B. Anders and A. Schiller, Phys. Rev. Lett. 95, 196801
(2005).
18. R. Bulla, A. C. Hewson, and T. Pruschke, J. Phys.: Cond.
Mat. 10, 8365 (1998).
19. B. I. Shraiman and E. D. Siggia, Phys. Rev. Lett. 62, 1564
(1989).
20. M. Kato, K. Machida, H. Nakanishi, and M. Fujita, J.
Phys. Soc. Japan 59, 1047 (1990).
21. V. J. Emery, S. A. Kivelson, and H. Q. Lin, Phys. Rev.
Lett. 64, 475 (1990).
22. P. G. J. van Dongen, Phys. Rev. Lett. 74, 182 (1995).
23. P. G. J. van Dongen, Phys. Rev. B 54, 1584 (1996).
24. H. J. Schulz, Phys. Rev. Lett. 64, 1445 (1990).
25. J. K. Freericks and M. Jarrell, Phys. Rev. Lett. 74, 186
(1995).
26. V. J. Emery, S. A. Kivelson, and J. M. Tranquada, Proc.
Natl. Acad. Sci. (USA) 96, 8814 (1999).
27. A. C. Hewson, A. Oguri, and D. Meyer, Eur. Phys. J. B
40, 177 (2004).
28. H. R. Krishna-murthy, J. W. Wilkins, and K. G. Wilson,
Phys. Rev. B 21, 1003 (1980).
29. L. H. Ryder, Quantum Field Theory (Cambridge Univer-
sity Press, Cambridge, 1996).
30. A. C. Hewson, J. Phys.: Cond. Mat. 13, 10011 (2001).
31. G. Sangiovanni, A. Toschi, E. Koch, K. Held, M. Capone,
C. Castellani, O. Gunnarsson, S.-K. Mo, J. W. Allen, H.-D.
Kim, A. Sekiyama, A. Yamasaki, S. Suga, and P. Metcalf,
Phys. Rev. B 73, 205121 (2006).
32. G. Sangiovanni, O. Gunnarsson, E. Koch, C. Castellani,
and M. Capone, Phys. Rev. Lett. 97, 046404 (2006).
	Introduction
	Antiferromagnetic Broken Symmetry in DMFT
	Local Quasiparticle Parameters
	Spectra and Quasiparticle Bands
	Conclusions
ABSTRACT
  We analyze the properties of the quasiparticle excitations of metallic
antiferromagnetic states in a strongly correlated electron system. The study is
based on dynamical mean field theory (DMFT) for the infinite dimensional
Hubbard model with antiferromagnetic symmetry breaking. Self-consistent
solutions of the DMFT equations are calculated using the numerical
renormalization group (NRG). The low energy behavior in these results is then
analyzed in terms of renormalized quasiparticles. The parameters for these
quasiparticles are calculated directly from the NRG derived self-energy, and
also from the low energy fixed point of the effective impurity. They are found
to be in good agreement. We show that the main low energy features of the $\bf
k$-resolved spectral density can be understood in terms of the quasiparticle
picture. We also find that Luttinger's theorem is satisfied for the total
electron number in the doped antiferromagnetic state.

<|endoftext|><|startoftext|>
arXiv:0704.0244v2  [cond-mat.mtrl-sci]  4 Jun 2007
Comparison of exact-exchange calculations for solids in current-spin-density- and
spin-density-functional theory
S. Sharma1,2,∗ S. Pittalis2, S. Kurth2, S. Shallcross3, J. K. Dewhurst4, and E. K. U. Gross2
1 Fritz Haber Institute of the Max Planck Society, Faradayweg 4-6, D-14195 Berlin, Germany.
2 Institut für Theoretische Physik, Freie Universität Berlin, Arnimallee 14, D-14195 Berlin, Germany
3 Department of Physics, Technical University of Denmark,
Building 307, DK-2800 Kgs. Lyngby and
4 School of Chemistry, The University of Edinburgh, Edinburgh EH9 3JJ.
The relative merits of current-spin-density- and spin-density-functional theory are investigated
for solids treated within the exact-exchange-only approximation. Spin-orbit splittings and orbital
magnetic moments are determined at zero external magnetic field. We find that for magnetic (Fe,
Co and Ni) and non-magnetic (Si and Ge) solids, the exact-exchange current-spin-density functional
approach does not significantly improve the accuracy of the corresponding spin-density functional
results.
PACS numbers: 71.15 Mb, 71.15 Rf, 75.10 Lp
In the past 30 years, several generalizations of den-
sity functional theory (DFT) have been proposed. In the
early 70’s, DFT was extended to spin-DFT (SDFT) [1] by
including the spin magnetization as basic quantity in ad-
dition to the density. This allows for coupling of the spin
degrees of freedom to external magnetic fields and pro-
duces better results for spontaneously spin-polarized sys-
tems using approximate functionals. Adding yet another
density, the paramagnetic current, leads to the frame-
work of current-SDFT (CSDFT) [2, 3]. CSDFT includes
the coupling of the external magnetic field, through its
corresponding vector potential, to the orbital-degrees of
freedom [3].
SDFT has been enormously successful in predicting
the magnetic properties of materials. This success can
be attributed to the availability of exchange correlation
(xc) functionals which, even though originally designed
for non-magnetic systems, could be systematically ex-
tended to the spin polarized case. The most popular of
these functionals are the local spin density approxima-
tion (LSDA) and the generalized gradient approximation
(GGA). CSDFT, on the other hand, has not enjoyed the
same attention mainly because of problems which arise in
the extension of LSDA and GGA to include the paramag-
netic current density [4–6]. Exposing the homogeneous
electron gas to an external magnetic field leads to the
appearance of Landau levels which, in turn, give rise to
derivative discontinuities in the resulting xc energy den-
sity. Using this quantity to construct (semi-)local func-
tionals then automatically leads to local discontinuities in
the corresponding xc potentials, which are then awkward
to use in practical calculations.
Such problems can be avoided with the use of orbital
functionals and this fact, coupled with the success of
these functionals for SDFT calculations, has led to re-
cent interest in orbital functionals for CSDFT [7–11].
The results from these works have shown mixed suc-
cess. A modified version of the original CSDFT [12] lead
to promising results for spin-orbit induced splittings of
bands in solids, such as Si and Ge [9]. In contrast it was
found that for open-shell atoms and quantum dots the
difference between SDFT and CSDFT results was mini-
mal [7, 8]. Similarly, calculations for solids using a local
vorticity functional [13] and for quantum dots using a
LSDA-type xc functional [14] could not establish the su-
periority of CSDFT over SDFT.
In this work we present a systematic comparison of the
relative merits of CSDFT and SDFT for solids. Since
the Kohn-Sham (KS) system in CSDFT reproduces the
current of the interacting system, one would expect dif-
ferences between SDFT and CSDFT results for orbital
magnetic moments (which can be directly derived from
the current). With this in mind we calculate the orbital
magnetic moment of the spontaneous magnets Fe, Co and
Ni. Since CSDFT is believed to improve the spin-orbit
induced band splitting in the non-magnetic semiconduc-
tors Si and Ge [9] it makes these materials interesting
candidates for a study of the differences between the two
approaches.
Following Vignale and Rasolt [2, 3], the ground state
energy of a (non-relativistic) system of interacting elec-
trons in the presence of an external magnetic field
B0(r) = ∇×A0(r) can be written as functional of three
independent densities the particle density ρ(r), the mag-
netization density m(r) and the paramagnetic current
density jp(r). This functional is given by
http://arxiv.org/abs/0704.0244v2
E[ρ,m, jp] = Ts[ρ,m, jp] + U [ρ] + Exc[ρ,m, jp] +
ρ(r)v0(r) d
3r (1)
m(r) ·B0(r) d
jp(r) ·A0(r) d
ρ(r)A20(r) d
where Ts[ρ,m, jp] is the kinetic energy functional of non-interacting electrons, U [ρ] is the Hartree energy, and
Exc[ρ,m, jp] is the exchange-correlation energy. Minimization of Eq. (1) with respect to the three basic densities
leads to the Kohn-Sham (KS) equation which reads
As(r)
+ vs(r)− µBσ ·Bs(r)
Φj(r) = εjΦj(r) . (2)
Here σ is the vector of Pauli matrices and the Φi are spinor valued wave functions. The effective potentials vs, Bs
and As are such that the ground-state densities ρ, m and jp of the interacting system are reproduced. These effective
potentials are given by
vs(r) = v0(r) + vH(r) + vxc(r) +
0(r)−A
s (r)
, Bs(r) = B0(r) +Bxc(r), As(r) = A0(r) +Axc(r). (3)
Here, v0 is the external electrostatic potential and vH(r) =
ρ(r′)/|r − r′| d3r′ is the Hartree potential. The xc
potentials are given as functional derivatives of the xc energy with respect to the corresponding conjugate densities
which can be obtained from KS wave functions using the following relations
ρ(r) =
i (r)Φi(r), m(r) = −µB
i (r)σΦi(r), jp(r) =
i (r)∇Φi(r)−
i (r)
Φi(r)
where the sum runs over the occupied orbitals. For practical calculations, an approximation for the xc energy functional
Exc[ρ,m, jp] has to be adopted. Here we concentrate on approximations of the xc functional which explicitly depend
on the KS orbitals and therefore only implicitly on the densities. Such orbital functionals are usually treated within
the framework of the so-called Optimized Effective Potential (OEP) method [15–18] where the xc potential is obtained
as solution of the OEP integral equation. Recently, the OEP method has been generalized to non-collinear SDFT [19]
and CSDFT [7, 8]. Another generalization of the OEP method in the context of a spin-current DFT (SCDFT) based
on a different choice of densities has also been put forward [11]. In the present work the formalism of Refs. (7) and
(8) is used and the corresponding OEP equations can be put in a compact form as
(r)Ψk(r) + h.c. = 0, −µB
(r)σΨk(r) + h.c. = 0,
(r)∇Ψk(r) −
Ψk(r)
+ h.c. = 0 ,
where the so-called orbital shifts [17, 20] are defined as Ψk(r) =
∑unocc
Φj(r)Λkj
εk−εj
, here the summation runs over the
unoccupied states and
Λkj =
vxc(r
′)ρkj(r
Axc(r
′) · jpkj(r
′)−Bxc(r
′) ·mkj(r
′)− Φ
where ρkj(r) = Φ
(r)Φk(r), mkj(r) = −µBΦ
(r)σΦk(r) and jpkj(r) =
(r)∇Φk(r−
Φk(r)
Eq. (5) has a structure very similar to the OEP equa-
tions for non-collinear SDFT differing only by the redefi-
nition of the matrix Λ, which now also contains an extra
term depending upon the current density and its con-
jugate field. Due to their similar structure the CSDFT
OEP equations are solved by generalizing the ‘residue al-
gorithm’, successfully applied to solve the non-collinear
SDFT equations [19–21]. The only difference in the case
of CSDFT is introduction of an additional residue coming
from the third OEP equation in Eq. (5). In the present
work we have used the exchange-only exact-exchange
(EXX) functional to solve the OEP equations. The EXX
(gauge invariant) energy functional is the Fock exchange
energy but evaluated with KS spinors
EEXXx [{Φi}] ≡ −
∫ ∫ occ
i (r)Φj(r)Φ
′)Φi(r
|r− r′|
d3r d3r′ .
In order to keep the numerical analysis as accurate
as possible, in the present work all calculations are per-
formed using the state-of-the-art full-potential linearized
augmented plane wave (FPLAPW) method [22], imple-
mented within the EXCITING code [23]. The single-
electron problem is solved using an augmented plane
wave basis without using any shape approximation for
the effective potential. Likewise, the magnetization and
current densities and their conjugate fields are all treated
as unconstrained vector fields throughout space. The
deep lying core states (3 Ha below the Fermi level)
are treated as Dirac spinors and valence states as Pauli
spinors. To obtain the Pauli spinor states, the Hamilto-
nian containing only the scalar fields is diagonalized in
the LAPW basis: this is the first-variational step. The
scalar states thus obtained are then used as a basis to
set up a second-variational Hamiltonian with spinor de-
grees of freedom, which consists of the first-variational
eigenvalues along the diagonal, and the matrix elements
obtained from the external and effective vector fields in
Eq. (2). This is more efficient than simply using spinor
LAPW functions, but care must be taken to ensure there
are a sufficient number of first-variational eigenstates for
convergence of the second-variational problem. Spin-
orbit coupling is also included at this stage.
As was shown above for CSDFT, the magnetic field
couples not only to spin but also to the orbital degrees
of freedom through the vector potential. This makes CS-
DFT specifically important for magnetic materials and
particularly interesting for their orbital properties. By
analogy with SDFT, one might expect that the introduc-
tion of the paramagnetic current density gives an im-
provement in properties such as orbital moments and
spin-orbit induced band splitting, which are related to
this new basic variable. However, within the framework
of existing functionals it is yet to be established conclu-
sively that CSDFT performs better than SDFT for these
properties. The recent development of the OEP method
both for SDFT and CSDFT allows for a direct compar-
ison of these two approaches for the same xc functional,
namely EXX.
The orbital moments of spontaneous magnets Fe, Co
and Ni, in the absence of external magnetic fields and
with spin-orbit coupling included, are presented in Ta-
ble I. For SDFT, the LSDA, GGA and EXX functionals
are used, while for CSDFT the values are obtained us-
ing the EXX functional. It is clear from Table I that
there is no difference between the results obtained us-
ing EXX-CSDFT and EXX-SDFT. Formally, the jp de-
termined from SDFT does not correspond to the true
paramagnetic current density of the fully interacting sys-
SDFT CSDFT
Solid Exp. LSDA GGA EXX EXX
Fe 0.08 0.053 0.051 0.034 0.034
Co 0.14 0.069 0.073 0.013 0.013
Ni 0.05 0.038 0.037 0.029 0.029
36.2 36.7 63.4 63.4
TABLE I: Orbital magnetic moments for bulk Fe, Co and
Ni in µB . The experimental data are taken from Ref. (24).
The final row lists the average percentage deviation of the
numerical results from the experimental value.
tem. Nevertheless, it is standard practice to compute the
orbital magnetic moment L, like those listed in Table I,
which is related to jp from the KS orbitals by the relation
L = 1
r×jp(r)d
3r. The fact the EXX-SDFT and EXX-
CSDFT orbital moments are so close may be viewed as a
post-hoc justification of this practice for magnetic metals.
It should also be noted that in comparison to experiments
the EXX results are significantly worse than their LSDA
and GGA counterparts. One reason, of course, is the fact
that LSDA and GGA also include correlation in an ap-
proximate way which is neglected completely within the
EXX framework.
In a recent work [9] it is shown that the use of the
EXX functional in the framework of SCDFT, improves
the spin-orbit induced splitting of the bands in semicon-
ductors. Unfortunately, it is not clear if this improve-
ment is due to the use of different functionals (going
from LSDA to EXX), or due to the use of an extra den-
sity when going from SDFT to CSDFT. This has moti-
vated us to compare CSDFT and SDFT results for this
quantity using the same functional in both cases. We
have determined the value of this splitting for solid Si
and Ge and the results are presented in Table II. While
the EXX functional significantly improves the agreement
with experimental values, there is almost no change on
going from SDFT to CSDFT. Thus the improvement is
solely due to the orbital based functional. We also note
that the EXX-CSDFT results of Ref. (9) are significantly
different from ours, and in much worse agreement with
experiments. This might be due to the use of pseudopo-
tentials in the previous work. In this respect it is worth
noting that EXX derived KS energy gaps also show sig-
nificant differences depending on whether an all-electron
full-potential or pseudopotential method is used [25].
The paramagnetic current density of Ge for LSDA,
GGA, EXX-SDFT and EXX-CSDFT is plotted in Fig.
1. Ge is chosen as an example since the spin-orbit in-
duced splitting is largest for this system and, unlike in
the case of metallic orbital moments, this quantity does
show some difference on going from SDFT to CSDFT.
We immediately notice that there is no significant qual-
itative difference between the LSDA and GGA currents.
There are, however, pronounced differences in the cur-
Symmetry SDFT CSDFT
point Exp. LSDA GGA EXX EXXp EXXo
Ge Γ7v−8v 297 311 296 291.3 289 258.1
Ge Γ6c−8c 200 229.7 220 201.3 199 173.3
Si Γ25v 44 50 58 42.5 45.5 42.5
9.5 14.0 2.0 2.2 10.5
TABLE II: Spin-orbit induced splittings for bulk Ge and Si in
meV. The experimental data is taken from Ref. (26). EXXp
are results of the present work and EXXo are results from
Ref. (9). The final row lists the average percentage deviation
of numerical results from the experimental value.
rent density between LSDA/GGA and EXX-(C)SDFT:
the current in the latter case being smaller and more
homogeneous than that of the former. This is an inter-
esting finding since it indicates the tendency of (semi-)
local functionals towards higher values of the paramag-
netic current density. It is worthwhile noting previous
EXX-(C)SDFT results for open-shell atoms in which it
was found [7] that this effect was even more pronounced
and lead to vanishing currents.
Even though the EXX-SDFT current is considerably
lower in magnitude than that of EXX-CSDFT and also
has a less symmetric structure, the spin-orbit splittings
for the two cases are almost the same. Similar conclusions
regarding the total energies were also drawn for quantum
dots in external magnetic fields studied using EXX [8]
and other functionals of the current density [14]. From
Fig. 1 it is also clear that one of the major effects of using
the OEP method and of using jp as an extra density is to
change the local structure of the paramagnetic current,
which in turn suggests that quantities depending on local
properties of the currents, such as chemical shifts, might
exhibit larger differences in the two approaches. Such
calculations [27] of chemical shifts, performed using local
functionals, found that for molecules this is not the case.
The effect of the EXX functional on these shifts may be
an interesting subject for future investigations.
To summarize, in this work we have presented EXX-
SDFT and CSDFT calculations for solids. The orbital
magnetic moments of Fe, Co and Ni and the spin-orbit
induced band splitting of Si and Ge are computed. Our
analysis shows only minor differences between EXX- CS-
DFT and SDFT results. The spin-orbit induced band
splittings in EXX calculations are in rather good agree-
ment with experiments, while the results for the orbital
moments are worse than the LSDA or GGA values. This
highlights the importance of proper treatment of corre-
lations for the accurate determination of the orbital mo-
ments.
We acknowledge Deutsche Forschungsgemeinschaft
(SPP-1145) and NoE NANOQUANTA Network (NMP4-
CT-2004-50019) for financial support.
FIG. 1: (Color online)Paramagnetic current density for Ge,
in the [110] plane, calculated using the SDFT and CSDFT.
Arrows indicate the direction and information about the mag-
nitude (in atomic units) is given in the colour bar.
∗ Electronic address: sangeeta.sharma@physik.
fu-berlin.de
[1] U. von Barth and L. Hedin, J. Phys. C 5, 1629 (1972).
[2] G. Vignale and M. Rasolt, Phys. Rev. Lett. 59, 2360
(1987).
[3] G. Vignale and M. Rasolt, Phys. Rev. B 37, 10685 (1988).
[4] P. Skudlarski and G. Vignale, Phys. Rev. B 48, 8547
(1993).
[5] Y. Takada and H. Goto, J. Phys. Condens. Matter 10,
11315 (1998).
[6] K. Higuchi and M. Higuchi, Phys. Rev. B 74, 195122
(2006).
[7] S. Pittalis, S. Kurth, N. Helbig, and E.K.U. Gross,
Phys. Rev. A 74, 062511 (2006).
[8] N. Helbig, S. Kurth, S. Pittalis, E. Räsänen, and
E.K.U. Gross, cond-mat/0605599 (2006).
[9] S. Rohra, E. Engel, and A. Görling, cond-mat/0608505
(2006).
[10] T. Heaton-Burgess, P. Ayers, and W. Yang, Phys. Rev.
Lett. 98, 036403 (2007).
[11] S. Rohra and A.Görling, Phys. Rev. Lett. 97, 013005
(2006).
[12] K. Bencheikh, J. Phys. A: Math. Gen. 36, 11929 (2003).
[13] H. Ebert, M.Battocletti, and E.K.U. Gross, Euro-
phys. Lett. 40, 545 (1997).
[14] H. Saarikoski, E. Räsänen, S. Siljamäki, A. Harju, and
R. M. Nieminen, Phys. Rev. B 67, 205327 (2003).
[15] R. Sharp and G. Horton, Phys. Rev. 90, 317 (1953).
[16] J.D. Talman and W.F. Shadwick, Phys. Rev. A 14, 36
(1976).
[17] T. Grabo, T. Kreibich, S. Kurth, and E.K.U. Gross, in
Strong Coulomb Correlations in Electronic Structure Cal-
culations: Beyond Local Density Approximations, edited
by V. Anisimov (Gordon and Breach, Amsterdam, 2000),
p. 203.
[18] E. Engel and S. H. Vosko, Phys. Rev. A 47, 2800 (1993).
[19] S. Sharma, J.K. Dewhurst, C. Ambrosch-Draxl, S. Kurth,
N. Helbig, S. Pittalis, E.K.U. Gross, S. Shallcross, and
L. Nordström, cond-mat/0510800 (2006).
[20] S. Kümmel and J.P. Perdew, Phys. Rev. Lett. 90, 043004
(2003).
[21] S. Kümmel and J.P. Perdew, Phys. Rev. B 68, 035103
(2003).
[22] D.J. Singh, in Planewaves, pseudopotentials and the
LAPW method (Kluwer, Dordrecht, 1994).
[23] J. K. Dewhurst, S. Sharma, and C. Ambrosch-Draxl,
2004, http://exciting.sourceforge.net/.
[24] M. B. Stearns, in Magnetic properties of 3d, 4d, and
5d elements, alloys and compounds, edited by Landolt-
Boernstein (Springer, Berlin, 1987), Vol. Vol. III/19a.
[25] S. Sharma, J.K. Dewhurst, and C. Ambrosch-Draxl,
Phys. Rev. Lett. 95, 136402 (2005).
[26] O.Madelung, Semiconductors: Data Handbook (Springer-
Verlag, Berlin, 2004).
[27] A. M. Lee, N. C. Handy, and S. M. Colwell, J. Chem.
Phys. 103, 10095 (1995).
ABSTRACT
  The relative merits of current-spin-density- and spin-density-functional
theory are investigated for solids treated within the exact-exchange-only
approximation. Spin-orbit splittings and orbital magnetic moments are
determined at zero external magnetic field. We find that for magnetic (Fe, Co
and Ni) and non-magnetic (Si and Ge) solids, the exact-exchange
current-spin-density functional approach does not significantly improve the
accuracy of the corresponding spin-density functional results.

<|endoftext|><|startoftext|>
Introduction 1
2 Background 3
2.1 The classical MHV Lagrangian . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 A four–dimensional regulator for lightcone Yang–Mills . . . . . . . . . . . . . 6
2.3 The one–loop (++++) amplitude . . . . . . . . . . . . . . . . . . . . . . . . 11
3 The all-plus amplitudes from a counterterm 12
3.1 Mansfield transformation of LCT . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 The four–point case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3 The general all–plus amplitude . . . . . . . . . . . . . . . . . . . . . . . . . 22
4 Discussion 27
A Notation 29
B Details on the four–point calculation 31
1 Introduction
One of the success stories arising from twistor string theory [1] (see [2] for a review) has
been the development of new techniques in perturbative quantum field theory. These include
recursion relations [3, 4], generalised unitarity [5] and MHV methods (see [6] for a review).
One of the key motivations of this work is to provide new approaches to study and derive
phenomenologically relevant scattering amplitudes. In particular, this requires that one be
able to deal with non-supersymmetric theories, and to include fermions, scalars, and particles
with masses. A vital first step is to apply these new methods to pure Yang-Mills (YM) theory,
and indeed, some of the first new results inspired by twistor string theory involved pure YM
amplitudes at tree- [7, 8, 9, 10, 11, 12, 13, 14] and one-loop [15] level.
A recalcitrant issue in this work is the derivation of rational terms in quantum amplitudes.
Unitarity-based techniques [16] and loop MHV methods [17] are successful in obtaining the
cut-constructible parts of amplitudes; essentially this is because at some level they are dealing
with four-dimensional cuts. In principle performing D-dimensional cuts generates all parts
of amplitudes [18, 19, 20, 21] as long as only massless particles are involved, however these
techniques still appear to be relatively cumbersome. Combinations of recursive techniques
and unitarity have led to important progress recently [22, 23, 24, 25, 26, 27, 28, 29, 30, 31],
but it would be preferable to have a more powerful prescriptive formulation, particularly
keeping in mind that applications to more general situations are sought.
A promising development from this point of view is the Lagrangian approach [32, 33, 34].
Here it has been argued that lightcone Yang-Mills theory, combined with a certain change
of field variables, yields a classical action which comprises precisely the MHV vertices. A
full Lagrangian description of MHV techniques would in principle give a prescription for
applying such methods to diverse theories. The next step in developing this is to understand
the quantum corrections in this Lagrangian approach. If one directly uses in a path integral
the classical MHV action, containing only purely four-dimensional MHV vertices, then it
is immediately clear that this cannot yield all known quantum amplitudes. For example,
there is no way to construct one-loop amplitudes where the external gluons all have positive
helicities, or where only one gluon has negative helicity, as all MHV vertices contain two
negative helicity particles (this issue has been recently discussed in [35]). These amplitudes
are particular cases where the entire amplitude consists of rational terms. More generally,
it seems clear that the vertices of the classical MHV Lagrangian will not yield the rational
parts of amplitudes, but only the cut-constructible terms [15]. Important insights into this
question can be obtained from the study of self-dual Yang-Mills theory, which has the same
all-plus one-loop amplitude of full YM [36, 37, 38] as its sole quantum correction.1 An
example, relevant to the discussion in this paper, is given in [35] where it was shown how
these amplitudes might be obtained from the Jacobian arising from a Bäcklund-type change
of variables which takes the self-dual Yang-Mills theory to a free theory.
A discussion of the full Yang-Mills theory in the lightcone gauge has recently been given
by Chakrabarti, Qiu and Thorn (CQT) in [39, 40, 41]. These papers employ an interesting
regularisation which, importantly, does not change the dimension of spacetime. For this
reason, we find it particularly suitable for setting the scene for the MHV diagram method,
which is inherently four-dimensional in current approaches. The regularisation of CQT
requires the introduction of certain counterterms, which prove to be rather simple in form.
What we will show in this paper is that these simple counterterms provide a very compact
and powerful way to represent the rational terms in gauge theory amplitudes; specifically,
we will demonstrate that the single two-point counterterm contains all the n-point all-plus
amplitudes. The way this happens is through the use of the new field variables of [32, 33, 34].
Other counterterms will combine with vertices from the Lagrangian and should generate the
rational parts of more general amplitudes. Based on the discussion in this paper, we propose
1In real Minkowski space, this is in fact its single non-vanishing amplitude.
that the counterterms, expressed in the field variables which give rise to standard MHV
vertices, in combination with Lagrangian vertices, generate the rational terms previously
missing from MHV diagram formulations.
The rest of the paper is organised as follows. After giving some background material in
section 2, we explicitly derive in section 3 the four point all-plus amplitude from the two-
point counterterm of CQT. We follow this by showing that the n-point expression, obtained
by writing the counterterm in new variables, has precisely the right collinear and soft limits
required for it to be the correct all-plus n-point amplitude. We present our conclusions in
section 4, and our notation and derivations of certain identities have been collected in two
appendices.
2 Background
In this section, we first review the classical field redefinition from the lightcone Yang–
Mills Lagrangian to the MHV–rules Lagrangian. We then move on to motivate the four–
dimensional regularisation scheme we will employ, and argue that it leads directly to the
introduction of a certain Lorentz–violating counterterm in the Yang–Mills Lagrangian. We
close the section with the remarkable observation that this counterterm provides a simple way
to calculate the four–point all-plus one–loop amplitude using only tree–level combinatorics.
2.1 The classical MHV Lagrangian
It seemed clear from the beginning that the MHV diagram approach to Yang-Mills theory
must be closely related to lightcone gauge theory. This idea was substantiated by Mansfield
[33] (see also [32]). The starting point of [33] is the lightcone gauge-fixed YM Lagrangian
for the fields corresponding to the two physical polarisations of the gluon. It was argued
convincingly in [33] that a certain canonical change of the field variables re-expresses this
lightcone Lagrangian as a theory containing the infinite series of MHV vertices. Some of
the arguments in [33] were rather general; these were reviewed in [34], where the change of
variables was discussed in more detail, and in particular it was shown how the four- and
five-point MHV vertices arise from the change of variables. In this paper we will mainly
follow the notation of [34].
The general structure of the lightcone YM Lagrangian, after integrating out unphysical
degrees of freedom, is (see appendix A for more details)
LYM = L+− + L++− + L−−+ + L++−− , (2.1)
where the gauge condition is ηµAµ = 0 with the null vector η = (1/
2, 0, 0, 1/
2). Since this
Lagrangian contains a + +− vertex, it is not of MHV type. In [33], Mansfield proposed to
eliminate this vertex through a suitably chosen field redefinition. Specifically, he performed
a canonical change of variables from (A, Ā) to new fields (B, B̄), in such a way that
L+−(A, Ā) + L++−(A, Ā) = L+−(B, B̄) . (2.2)
The remarkable result is that upon inserting this change of variables into the remaining two
vertices, the Lagrangian, written in terms of (B, B̄), becomes a sum of MHV vertices,
LYM = L+− + L+−− + L++−− + L+++−− + . . . . (2.3)
The crucial property of Mansfield’s transformation that makes this possible is that, while
both A and Ā are series expansions in the new B fields, A has no dependence on the B̄
fields while Ā turns out to be linear in B̄. Thus, since the remaining vertices are quadratic
in the B̄, the new interaction vertices have the helicity configuration of an MHV amplitude.
Mansfield was also able to show that the explicit form of the vertices coincides with the
CSW off-shell continuation of the Parke-Taylor formula for the MHV scattering amplitudes,
as proposed by [7].
One of the main results of [34] was the derivation of an explicit, closed formula for the
expansion of the original fields (A, Ā) in terms of the new fields (B, B̄). This was then used
to verify that the new vertices are indeed precisely the MHV vertices of [7], at least up to
the five-point level. We will now briefly review these results. First, recall that the positive
helicity field A is a function of the positive helicity B field only. It is expanded as follows:
A(~p ) =
(2π)3
∆(~p , ~p
, . . . ~p
) Y(~p ; 1 · · ·n) B(~p 1)B(~p 2) · · ·B(~p n) , (2.4)
where ∆(~p , ~p 1, . . . ~p n) := (2π)3δ(3)(~p − ~p 1 − · · · − ~p n). Note that the x− coordinate is
common to all the fields, which is why we have restricted the transformation to the lightcone
quantisation surface Σ.
By inserting this expansion into (2.2) and using the requirement that the transforma-
tion be canonical, Ettle and Morris succeeded in deriving a very simple expression for the
coefficients Y. After translating to our conventions (see appendix A), they are given by:
Y(~p ; 12 · · ·n) = (
2ig)n−1
〈12〉〈23〉 · · · 〈n− 1, n〉 . (2.5)
The first few terms in (2.4) are then:
A(~p ) =B(~p ) +
2igp+
d3p1d3p2
(2π)3
δ(3)(~p − ~p 1 − ~p 2)
〈12〉 B(~p
)B(~p
− 2g2p+
d3p1d3p2d3p3
(2π)6
δ(3)(~p − ~p 1 − ~p 2 − ~p 3)
〈12〉〈23〉 B(~p
1)B(~p 2)B(~p 3)
+ · · · .
(2.6)
Similarly, one can write down the expansion of the negative helicity field Ā, which, as
discussed above, is linear in B̄, but is an infinite series in the new field B. In [34] it was
shown that the coefficients in the expansion of Ā are very closely related to those for A.2
The expansion of B̄ turns out to be simply
Ā(~p ) =−
(2π)3
∆(~p , ~p
, . . . , ~p
(ps+)
(p+)2
Y(~p ; 1 · · ·n) B(~p 1)· · ·B̄(~p s)· · ·B(~p n)
(2π)3
∆(~p , ~p 1, . . . , ~p n)
(p+)2
Y(~p ; 1 · · ·n)
(ps+)
2 B(~p
) · · · B̄(~p s) · · ·B(~p n) .
(2.7)
Thus we see that at each order in the expansion, we need to sum over all possible positions
of B̄. Explicitly, the first few terms are:
Ā(~p ) = B̄(~p )−
d3p1d3p2
(2π)3
δ(3)(~p − ~p 1 − ~p 2) 1
〈12〉×
(p1+)
2B̄(~p 1)B(~p 2) + (p2+)
2B(~p 1)B̄(~p 2)
+ 2g2
d3p1d3p2d3p3
(2π)6
δ(3)(~p − ~p 1 − ~p 2 − ~p 3) 1
〈12〉〈23〉×
(p1+)
2B̄(~p 1)B(~p 2)B(~p 3)+(p2+)
2B(~p 1)B̄(~p 2)B(~p 3)+(p3+)
2B(~p 1)B(~p 2)B̄(~p 3)
+ · · ·
(2.8)
Using the above results, it is in principle straightforward to derive the terms that arise on
inserting the Mansfield transformation into the two remaining vertices of the theory. For
the simplest cases, one can see explicitly that these combine to produce MHV vertices, and
some arguments were also given in [33, 34] that this must be true in general.
In supersymmetric theories, the MHV vertices are enough to reproduce complete scat-
tering amplitudes at one loop [43]. However, as we mentioned earlier, for pure YM it is
clear that the terms in the MHV Lagrangian (2.3) will not be enough to generate complete
quantum amplitudes. For instance, the scattering amplitude with all gluons with positive
helicity, which at one loop is finite and given by a rational term, cannot be obtained by only
using MHV diagrams, for the simple reason that one cannot draw any diagram contributing
to it by only resorting to MHV vertices.3 Another amplitude which cannot be derived within
conventional MHV diagrams is the amplitude with only one gluon of negative helicity. Simi-
larly to the all-plus amplitude, this single-minus amplitude vanishes at tree level, and at one
loop is given by a finite, rational function of the spinor variables.
2This is perhaps easiest to see [42] by considering that, in the context of N = 4 SYM, A and B are part
of the same lightcone superfield.
3On the other hand, it was shown in [35] that the parity conjugate all-minus amplitude is correctly
generated by using MHV diagrams.
The lesson we learn from this is that, in order to apply the MHV method to derive
complete amplitudes in pure YM, one should look more closely at the change of variables
in the full quantum theory. There are several possible subtleties one should pay careful
attention to at the quantum level. First of all, it is possible that the canonical nature of the
transformation is not preserved, leading to a non–trivial Jacobian which could provide the
missing amplitudes. Another possible source of contributions could come from violations of
the equivalence theorem. This theorem states that, although correlation functions of the
new fields are in general different from those of the old fields, the scattering amplitudes
are actually the same4, as long as the new fields are good interpolating fields. These issues
were explored in some detail in [35] (see also [34, 42]) where it was shown, for a different
(non-canonical) field redefinition, how a careful treatment of these effects can combine to
reproduce some of the amplitudes that would seem to be missing at first sight.
Another aim of [35] was to demonstrate how to reproduce one of the above–mentioned
rational amplitudes, the one with all–minus helicities, in the MHV formalism. This amplitude
is slightly less mysterious than the all–plus amplitude in the sense that one can write down
the contributing diagrams using only MHV vertices; however a calculation without a suitable
regulator in place would give a vanishing answer, despite the fact that this amplitude is finite.
In [35], it was shown, using dimensional regularisation, that the full nonzero result arises
from a slight mismatch between four– and D (= 4− 2ǫ)–dimensional momenta.
It is natural therefore to expect that dimensional regularisation will be helpful also for
the problem at hand, which is to recover the rational amplitudes of pure Yang–Mills after
the Mansfield transformation. Decomposing the regularised lightcone Lagrangian into a pure
four-dimensional part and the remaining ǫ–dependent terms, and performing the transfor-
mation on the four-dimensional part only, will give rise to several new ǫ–dependent terms
that can potentially give finite answers when forming loops.
Although this approach shows promise, it is not the one we will make use of in the fol-
lowing. Instead, motivated by the fact that the Mansfield transformation seems to be deeply
rooted in four dimensions, we would like to look for a purely four–dimensional regularisation
scheme. We now turn to a review of the particular scheme we will use.
2.2 A four–dimensional regulator for lightcone Yang–Mills
In the above we explained why a näıve application of the Mansfield transform leads to puzzles
at the quantum level, and discussed possible ways to improve the situation. The conclusion
was that, since the missing amplitudes arise from subtle mismatches in regularisation, one
should be careful to perform the Mansfield transform on a suitably regularised version of
the lightcone Yang–Mills action. Here we will review one approach to the regularisation of
lightcone Yang–Mills, which, despite several slightly unusual features, appears to be ideally
4Modulo a trivial wave-function renormalisation.
suited for the problem at hand.
The regularisation we propose to use is inspired by recent work of CQT [39, 40, 41] on
Yang–Mills amplitudes in the lightcone worldsheet approach [44, 45]. This is an attempt
to understand gauge–string duality which is similar in spirit to ’t Hooft’s original work on
the planar limit of gauge theory [46], and aims at improving on early dual model techniques
[47, 48]. We recall that one of the main goals in those works is to exhibit the string worldsheet
as made up of very large planar diagrams (“fishnets”).
In their recent work, Thorn and collaborators make this statement more precise, using
techniques that were unavailable when the original ideas were put forward. It is hoped that,
by understanding how to translate a generic Yang–Mills planar diagram to a configuration
of fields (with suitable boundary conditions) on the lightcone worldsheet, it will eventually
become possible to perform the sum of all these diagrams. This approach to gauge–string
duality is thus complementary to that using the AdS/CFT correspondence.
The field content and structure of the worldsheet theory dual to Yang–Mills theory is
rather intricate (see e.g. [45]), but for our purposes the details are not important. What is
most relevant for us is that one of the principles of this approach is that all quantities on
the Yang–Mills side should have a local worldsheet description. This includes the choice of
regulator that needs to be introduced when calculating loop diagrams. This requirement led
Thorn [49] (see also [50, 51]) to introduce an exponential UV cutoff, which we will discuss
in a short while.
Since one of the goals of this programme is to translate an arbitrary planar diagram
into worldsheet form (and eventually calculate it), it is an important intermediate goal to
understand how to do standard Yang–Mills perturbation theory in “worldsheet–friendly”
language. In [39, 40, 41] CQT do exactly that for the simplest case, that of one–loop
diagrams in Yang–Mills theory, by analysing how familiar features like renormalisation are
affected by the unusual regularisation procedure and other special features of the lightcone
worldsheet formalism.
To conclude this brief overview of the lightcone worldsheet formalism, the main point for
our current purposes is that it provides motivation and justification for a slightly unusual
regularisation of lightcone Yang–Mills, which we will now describe.
Let us momentarily focus on the self–dual part of the lightcone Yang–Mills Lagrangian:
L = L−+ + L++− = −Az̄�Az + 2ig[Az, ∂+Az̄](∂+)−1(∂z̄Az) . (2.9)
This action provides one of the representations of self-dual Yang-Mills theory. After trans-
forming to momentum space, we find that the only vertex in the theory is the following
(suppressing the gauge index structure):
A2 A1
= −2g
[p1+p
z̄ − p2+p1z̄] = −
[12] . (2.10)
As for propagators, following [40], we will use the Schwinger representation:
dTe+Tp
. (2.11)
In (2.11) p2 is understood to be the appropriate (p2 < 0) Wick rotated version of the
Minkowski space inner product. For our choice of signature, the latter is
p · q = p+q− + p−q+ − p · q = p+q− + p−q+ − (pzqz̄ + pz̄qz) , (2.12)
so that p2 = 2(p+p− − pzpz̄).
We will also make use of the dual or “region momentum” representation, where one
assigns a momentum to each region that is bounded by a line in the planar diagram. By
convention, the actual momentum of the line is given by the region momentum to its right
minus that on its left, as given by the direction of momentum flow5. Clearly such a pre-
scription can only be straightforwardly implemented for planar diagrams, which is the case
considered in [40]. This is also sufficient for our purposes, since we are calculating the lead-
ing single–trace contribution to one–loop scattering amplitudes. Non–planar (multi–trace)
contributions can be recovered from suitable permutations of the leading–trace ones (see
e.g. [52]).
To demonstrate the use of region momenta, a sample one–loop diagram is pictured in
Figure 1.
Figure 1: A sample one–loop diagram indicating the labelling of region momenta. The
outgoing leg momenta are p1 = k1 − k4 , p2 = k2 − k1 , p3 = k3 − k2 , p4 = k4 − k3, while
the loop momentum (directed as indicated) is l = q − k1.
5In [40] the flow of momentum is chosen to always match the flow of helicity, but we will not use this
convention.
The “worldsheet–friendly” regulator that CQT employ is simply defined as follows [49]:
For a general n–loop diagram, with qi being the loop region momenta, one simply inserts an
exponential cutoff factor
exp(−δ
q2i ) (2.13)
in the loop integrand, where δ is positive and will be taken to zero at the end of the calcula-
tion. This clearly regulates UV divergences (from large transverse momenta), but, as we will
see, has some surprising consequences since it will lead to finite values for certain Lorentz–
violating processes, which therefore have to be cancelled by the introduction of appropriate
counterterms.
Note that q2 = 2qzqz̄ has components only along the two transverse directions, hence it
breaks explicitly even more Lorentz invariance than the lightcone usually does. This might
seem rather unnatural from a field-theoretical point of view, however it is crucial in the
lightcone worldsheet approach. Indeed, the lightcone time x− and x+ (or in practice its
dual momentum p+) parametrise the worldsheet itself, and are regulated by discretisation;
thus, they are necessarily treated very differently from the two transverse momenta qz, qz̄
which appear as dynamical worldsheet scalars. Fundamentally, this is because of the need
to preserve longitudinal (x+) boost invariance (which eventually leads to conservation of
discrete p+). The fact that the regulator depends on the region momenta rather than the
actual ones is a consequence of asking for it to have a local description on the worldsheet.
The main ingredient for what will follow later in this paper is the computation of the
(++) one–loop gluon self–energy in the regularisation scheme discussed earlier. This is
performed on page 10 of [40], and we will briefly outline it here. This helicity–flipping
gluon self–energy, which we denote by Π++, is the only potential self–energy contribution in
self–dual Yang–Mills; in full YM we would also have Π+− and, by parity invariance, Π−−.
There are two contributions to this process, corresponding to the two ways to route
helicity in the loop, but they can be easily shown to be equal so we will concentrate on one
of them, which is pictured in Figure 2.
Figure 2: Labelling of one of the selfenergy diagrams contributing to Π++.
In Figure 2, p,−p are the outgoing line momenta, l is the loop line momentum, and
k, k′, q are the region momenta, in terms of which the line momenta are given by
p = k′ − k, l = q − k′ . (2.14)
Remembering to double the result of this diagram, we find the following expression for
the self–energy:
Π++ =8g2N
(2π)4
−(p + l)+
(p+lz̄ − l+pz̄)
l2(p+ l)2
(−p+)(p+ l)+
((−p+)(pz̄ + lz̄)− (p+ + l+)(pz̄))
(p+)2
(p+lz̄ − l+pz̄)(p+(pz̄ + lz̄)− (p+ + l+)pz̄)
l2(p + l)2
(2.15)
Although we are suppressing the colour structure, the factor of N is easy to see by thinking
of the double–line representation of this diagram6. One of the crucial properties of (2.15)
is that the factors of the loop momentum l+ coming from the vertices have cancelled out,
hence there are no potential subtleties in the loop integration as l+ → 0. This means that,
although for general loop calculations one would have to follow the DLCQ procedure and
discretise l+ (as is done for other processes considered in [39, 40, 41]), this issue does not
arise at all for this particular integral, and we are free to keep l+ continuous.
To proceed, we convert momenta to region momenta, rewrite propagators in Schwinger
representation, and regulate divergences using the regulator (2.13). Employing the unbroken
shift symmetry in the + region momenta to further set k+ = 0, (2.15) can be recast as:
Π++ =
dT1dT2
(k′+)
eT1(q−k)
2+T2(q−k
′)2−δq2×
k′+(qz̄ − k′z̄)− (q+ − k′+)(k′z̄ − kz̄)
k′+(qz̄ − kz̄)− q+(k′z̄ − kz̄)
(2.16)
Since q− only appears in the exponential, the q− integration will lead to a delta function
containing q+, which can be easily integrated and leads to a Gaussian–type integral for qz, qz̄.
Performing this integral, we obtain (setting T = T1 + T2, x = T1/(T1 + T2))
Π++ =
dT δ2
[xkz̄ + (1− x)k′z̄]2
(T + δ)3
eTx(1−x)p
2− δT
(xk+(1−x)k′)2 . (2.17)
Notice that, had we not regularised using the δ regulator, we would have obtained zero at
this stage. Instead, now we can see that there is a region of the T integration (where T ∼ δ)
that can lead to a nonzero result. On performing the T and x integrations, and sending δ
to zero at the end, we obtain the following finite answer:
Π++ = 2
(kz̄)
2 + (k′z̄)
2 + kz̄k
. (2.18)
6 For simplicity, we take the gauge group to be U(N).
Notice that this nonvanishing, finite result violates Lorentz invariance, since it would
imply that a single gluon can flip its helicity. Also, it explicitly depends on only the z̄
components of the region momenta. Such a term is clearly absent in the tree-level Lagrangian
(unlike e.g. the Π+− contribution in full Yang–Mills theory), thus it cannot be absorbed
through renormalisation – it will have to be explicitly cancelled by a counterterm. This
counterterm, which will play a major rôle in the following, is defined in such a way that:
+ = 0 , (2.19)
in other words it will cancel all insertions of Π++, diagram by diagram. Let us note here
that, had we been doing dimensional regularisation, all bubble contributions would vanish
on their own, so there would be no need to add any counterterms. So this effect is purely
due to the “worldsheet–friendly” regulator (2.13).
It is also interesting to observe that in a supersymmetric theory this bubble contribution
would vanish7 so this effect is only of relevance to pure Yang–Mills theory.
2.3 The one–loop (++++) amplitude
Now let us look at the all–plus four-point one–loop amplitude in this theory. It is easy to
see that it will receive contributions from three types of geometries: boxes, triangles and
bubbles. It is a remarkable property8 that the sum of all these geometries adds up to zero.
In particular, with a suitable routing of momenta, the integrand itself is zero. Pictorially,
we can state this as:
+ 4× + 2× + 8× = 0 . (2.20)
The coefficients mean that we need to add that number of inequivalent orderings. So
we see (and refer to [40] for the explicit calculation) that the sum of all the diagrams that
one can construct from the single vertex in our theory, gives a vanishing answer. However,
as discussed in the previous section, this is not everything: we need to also include the
contribution of the counterterm that we are forced to add in order to preserve Lorentz
invariance. Since this counterterm, by design, cancels all the bubble graph contributions, we
are left with just the sum of the box and the four triangle diagrams. In pictures,
A++++ = +4× +
2× + 8× + 2× + 8×
(2.21)
7This can in fact be derived from the results of [53], where similar calculations were considered with
fermions and scalars in the loop.
8This observation is attributed to Zvi Bern [40].
where A++++ is the known result [54] for the leading–trace part of the four–point all-plus
amplitude:
A++++(A1A2A3A4) = i
[12][34]
〈12〉〈34〉 , (2.22)
and the terms in the parentheses clearly cancel among themselves. This leaves the box
and triangle diagrams, which are exactly those appearing in the calculation of the parity
conjugate amplitude using dimensional regularisation [35], where the bubbles were zero to
begin with.
Following [40], we make the obvious, but important for the following, observation that
one can change the position of the parentheses:
A++++ =
+ 4× + 2× + 8×
+2× +8× (2.23)
where again the terms in the parentheses are zero (by (2.20)). This demonstrates that
one can compute the all-plus amplitude just from a tree-level calculation with counterterm
insertions (of course, these diagrams are at the same order of the coupling constant as one–
loop diagrams because of the counterterm insertion). This remarkable claim is verified in
[40], where CQT explicitly calculate the 10 counterterm diagrams and recover the correct
result for the four-point amplitude (see pp. 22-23 of [40])9.
This result, apart from being very appealing in that one does not have to perform any
integrals (apart from the original integral that defined the counterterm) so that the calcula-
tion reduces to tree–level combinatorics, will also turn out to be a convenient starting point
for performing the Mansfield transformation. Specifically, our claim is that the whole series
of all-plus amplitudes will arise just from the counterterm action. In the following we will
show how this works explicitly for the four-point all-plus case, and then we will argue for
the n-point case that the corresponding expression derived from the counterterm has all the
correct singularities (soft and collinear), giving strong evidence that the result is true in
general.
3 The all-plus amplitudes from a counterterm
Having reviewed the relevant new features that arise when doing perturbation theory with the
worldsheet–motivated regulator of [49], we now have all the necessary ingredients to perform
the Mansfield change of variables on the regulated lightcone Lagrangian. In this section, we
will carry out this procedure. We will first regulate lightcone self–dual Yang–Mills, which, as
9In practice, these authors choose to insert the self-energy result (2.18) in the tree diagrams, so what
they compute is minus the all–plus amplitude.
discussed, will require us to introduce an explicit counterterm in the Lagrangian. Then we
will perform the Mansfield transformation on the original Lagrangian (converting it to a free
theory). We will then show that, upon inserting the change of variables into the counterterm
Lagrangian, we recover the all–plus amplitudes as vertices in the theory.
3.1 Mansfield transformation of LCT
As we saw, the “worldsheet-friendly” regularisation requires us to add a certain counterterm
to the lightcone Yang–Mills action, required in order to cancel the Lorentz-violating helicity–
flipping gluon selfenergy. As mentioned previously, the calculation of the all–plus amplitude
can be tackled purely within the context of self-dual Yang–Mills, which we will focus on from
now on. We see that, as a result of this regularisation, the complete action at the quantum
level becomes:
L(r)SDYM = L+− + L++− + LCT , (3.1)
where L+− + L++− is the classical Lagrangian for self-dual Yang-Mills introduced in (2.9).
Although CQT do not write down a spacetime Lagrangian for LCT, it is easy to see that the
following expression would have the right structure:
LCT = −
d3kid3kj Ai j(k
i, kj)[(kiz̄)
2 + (k
2 + kiz̄k
j , ki) . (3.2)
This expression depends explicitly on the dual, or region, momenta. In (3.2) we have made
use of the simplest way to associate region momenta to fields, which is to assign a region
momentum to each index line in double–line notation [46], and thus a momentum ki, kj to
each of the indices of the gauge field Ai j (now slightly extended into a dipole, as would be
natural from the worldsheet perspective, where an index is associated to each boundary).
Since each line has a natural orientation, the actual momentum of each line can be taken to
be the difference of the index momentum of the incoming index line and the outgoing index
line. So the momentum of Ai j(k
i, kj) is taken to be p = kj−ki. As discussed above, this
assignment can only be performed consistently for planar diagrams, which is sufficient for
our purposes.
Clearly, the structure of (3.2) is rather unusual. First of all, it depends only on the
antiholomorphic (z̄) components of the region momenta, and so is clearly not (lightcone)
covariant. Even more troubling is the fact that it does not depend only on differences of
region momenta, but also on their sums. Since each field thus carries more information
than just its momentum, LCT is a non–local object from a four–dimensional point of view
(although, as shown in [40], it can be given a perfectly local worldsheet description).
Leaving the above discussion as food for thought, we will now rewrite (3.2) in a more
conventional way that is most convenient for inserting into Feynman diagrams,
LCT = −
d3p d3p′ δ(p+ p′) Ai j(p
′)((kiz̄)
2 + (k
2 + kiz̄k
i(p) . (3.3)
In this expression, which can be thought of as the zero–mode or field theory limit of (3.2), all
the region momentum dependence is confined to the polynomial factor (kiz̄)
2+kiz̄k
z̄. This
vertex, inserted into tree diagrams, would exactly reproduce the effects of the counterterm
pictured in (2.19). Although (3.3) still exhibits some of the apparently undesirable features
we discussed above, the calculations in [40] demonstrate that, after summing over all possible
insertions of this term, the final result is covariant and correctly reproduces the all–plus
amplitudes10. Therefore, we believe that its problematic properties are really a virtue in
disguise, and (as we will see explicitly) they seem to be crucial in obtaining the full series of
n–point all–plus amplitudes from the Mansfield transformation of a single term.
We are now ready to perform the Mansfield change of variables. In the spirit of the
discussion earlier, we will perform the transformation on the classical part of the action
only:
L+−(A, Ā) + L++−(A, Ā) = L+−(B, B̄) (3.4)
Hence the classical part of the action has been converted to a free theory. Without a
regulator, this would be the whole story. However we now see that, within the particular
regularisation we are working with, the full Lagrangian L(r)SDYM contains one extra, one–loop
piece, given by LCT in (3.3), which is quadratic in the positive helicity fields A. To complete
the Mansfield transformation, we will clearly need to expand this term in the new fields B,
using the Ettle–Morris coefficients (2.4).
Since LCT depends only on the holomorphic A fields, we will only need the expansion of
A in terms of B given in (2.4). As a first check that LCT leads to the right kind of structure,
note that since A depends only on the holomorphic B fields, all the new vertices are all–plus.
Thus, the full action, when expressed in terms of the B fields, takes the schematic form:
L(r)SDYM(A, Ā) = L+−(B, B̄) + L++(B) + L+++(B) + L++++(B) + · · · (3.5)
In the next section we will calculate the four–point term L++++ and demonstrate that, when
restricted on–shell, it reproduces the known form (2.22) for the all–plus amplitude.
3.2 The four–point case
To begin with, we focus on the derivation of the four-point all-plus vertex, whose on-shell
version will give us the four-point scattering amplitude. We will thus expand the old fields
A in the counterterm (3.3) (or (3.2)) up to terms containing four B-fields.
When inserting the Ettle–Morris coefficients into (3.3), one has to sum over all possible
cyclic orderings with which this can be done. A complication is that now the counterterm
10Note that similar–looking treatments using index momenta instead of line momenta for vertices, but
which in the end sum up to covariant results have appeared in the context of noncommutative geometry
(see e.g. [55]). Although it is possible to write e.g. (3.2) in star–product form, at this stage it is not clear
whether that is a useful reformulation.
itself depends on the ordering. In other words, we need to sum over all the ways of assigning
dual momenta to the indices. Schematically, the inequivalent terms that we obtain are:
AA → (
B1B2)(
B3B4) + (
B2B3)(
B4B1)
B1B2B3)B4 + (
B2B3B4)B1 + (
B3B4B1)B2 + (
B4B1B2)B3 ,
(3.6)
where the terms on the first line arise from doing two quadratic substitutions and those
on the second from doing one cubic substitution. All the other possibilities are related by
cyclicity of the trace. For definiteness, let us now write down what one of these terms means
explicitly:11
B1B2B3)B4
= −2g2 tr
dpdp4δ(p+p4)
dp1dp2dp3δ(p−p1−p2−p3) p+√
〈12〉〈23〉×
× B(p1)B(p2)B(p3)
(k3z̄)
2 + (k4z̄)
2 + k4z̄k
B(p4)
= 2g2
dp1dp2dp3dp4δ(p1 + p2 + p3 + p4)×
(k3z̄)
2 + (k4z̄)
2 + k4z̄k
〈12〉〈23〉 tr
B(p1)B(p2)B(p3)B(p4)
(3.7)
The reason this particular combination of kz̄’s appears here is that, given the ordering we
chose, after the Mansfield transformation the counterterm ends up being on leg 4, and its
line bounds the regions with momenta k3 and k3. This is represented pictorially in Figure 3.
Figure 3: One of the contributions to the four–point all-plus vertex.
Although Figure 3 might suggest that there is a propagator between the counterterm
insertion and the location of the original A, which has now split into three B’s, this is of
course not the case since the whole expression is a vertex at the same point. We have
drawn the diagram in this fashion to emphasise which leg the counterterm is located on
after the transformation. On the other hand, this vertex is nonlocal (as discussed above, it
was nonlocal even in the original variables, but this is now compounded by the Mansfield
coefficients, which contain momenta in the denominator), so this notation serves as a useful
reminder of that fact.
11We suppress the overall factor of −g2N/(12π2) until the end of this section. Also, the integrals are
implicitly taken to be on the quantisation surface Σ.
It is interesting to note that (3.7) is essentially the same expression as the sum of the
two channels with the same region momentum dependence that appear in CQT’s calculation
of this amplitude using tree–level diagrammatics (compare with Eq. 83 in [40]), which we
illustrate in Fig. 4. Thus we have a picture where one post–Mansfield transform vertex (with
Figure 4: The two diagrams with counterterm insertions on leg 4 that arise in the calculation
of CQT, and, combined, add up to the contribution in Fig. 3.
B’s) effectively sums two tree–level pre–transformation (with A’s) Feynman diagrams. This
is a first indication that our calculation of the all–plus vertex can be mapped, practically
one–to–one, to that of the all–plus amplitude on pp. 22-23 of [40].
Another type of contribution to the vertex arises when we transform both of the A’s in
LCT. One of the two terms that we find is:
B2B3)(
B4B1)
= −2g2 tr
dp dp′δ(p+ p′)
dp2dp3δ(p− p2 − p3) p+√
〈23〉B(p
2)B(p3)
(k1z̄)
2 + (k3z̄)
2 + k1z̄k
dp4dp1δ(p′ − p4 − p1)
〈41〉B(p
4)B(p1)
= −2g2
dp1 · · ·dp4δ(p1+p2+p3+p4)
(p2+ + p
+ + p
((k1z̄)
2 + (k3z̄)
2 + k1z̄k
〈23〉〈41〉
B(p1)B(p2)B(p3)B(p4)
(3.8)
This contribution can also be mapped to one of the two terms with bubbles on internal lines
in CQT.
We can now tabulate all the terms that we obtain in this way by making the schematic
form (3.6) precise. Since the delta–function and trace over B parts are the same for all these
terms, in Table 1 we just list the rest of the integrand.
To obtain the final form of the vertex, we are now instructed to sum over all these
contributions. Thus we can write
L++++(B) = 2g2
dp1dp2dp3dp4δ(p1+p2+p3+p4) V(4) tr[B(p1)B(p2)B(p3)B(p4)] (3.9)
Schematic form Pictorial form Integrand
B1B2B3)B4
+k3k4
〈12〉〈23〉
B2B3B4)B1
+k1k4
〈23〉〈34〉
B3B4B1)B2
+k2k1
〈34〉〈41〉
B4B1B2)B3
+k2k3
〈41〉〈12〉
B2B3)(
B4B1) −
+k1k3
〈23〉〈41〉
B1B2)(
B3B4) −
+k4k2
〈34〉〈12〉
Table 1: The various contributions to the all–plus four–point vertex. Note that we use the
simplifying notation ki := k
where V(4) is given by the following expression:12
V(4) = 1√
〈12〉〈23〉〈34〉〈41〉×
3 + k
4 + k3k4)〈34〉〈41〉+ p1+
1 + k
4 + k1k4)〈12〉〈41〉
+ p2+
2 + k
1 + k2k1)〈12〉〈23〉+ p3+
3 + k
2 + k2k3)〈23〉〈34〉
− (p2+ + p3+)(p1+ + p4+)(k21 + k23 + k1k3)〈12〉〈34〉
− (p3+ + p4+)(p2+ + p1+)(k24 + k22 + k4k2)〈23〉〈41〉
(3.10)
Comparing this to the expected answer (2.22), we see that the (quadratic) antiholomorphic
momentum dependence should arise from the various kz̄ factors in (3.10). In [40], CQT start
from essentially the same expression and demonstrate that it gives the correct result for the
all-plus amplitude. Therefore, following practically the same steps as those authors, we can
easily see that we obtain the expected answer. However, since we would like to find the full
vertex V, we will need to keep off–shell information, and so we will choose a slightly different
route.
12For the sake of brevity we omit a subscript z̄ in the region momenta appearing in (3.10).
The main complication in bringing (3.10) into a manageable form is clearly the presence
of the region momenta. We would like to disentangle their effects as cleanly as possible.
Therefore, our derivation will proceed by the following steps:
1. First, we will show that (3.10) can be manipulated so that the quadratic dependence
on region momenta drops out, leaving only terms linear in the region momenta.
2. Second, we will decompose the resulting expression into a part that depends on the
region momenta and one that does not. The k–dependent part turns out to have a
very simple form, and vanishes on–shell.
3. Finally, we will show that the k–independent part reduces to the known amplitude.
For the first step, we will need the following identity, which is proved in appendix B:
+〈34〉〈41〉+ p1+
+〈12〉〈41〉+ p2+
+〈12〉〈23〉+ p3+
+〈23〉〈34〉
− (p2+ + p3+)(p1+ + p4+)〈12〉〈34〉 − (p3+ + p4+)(p2+ + p1+)〈23〉〈41〉 = 0
(3.11)
Also, using the shorthand notation Kij := (k
2 + (k
2 + kiz̄k
z̄: we note the following very
useful identity:
Kij = Kik + (k
z̄ − kkz̄ )(kiz̄ + k
z̄ + k
z̄ ) = Kik + (k
z̄ − kkz̄ )lijk (3.12)
where 1 ≤ k ≤ n and lijk = kiz̄+k
z̄ . Noting that, for j > k, k
z̄−kkz̄ = pk+1z̄ +pk+2z̄ + · · · p
we can use this to rewrite all the region momentum combinations appearing in (3.10) in the
following way:
K34 =
(K12 +K23 +K34 +K41 + (p̄3 + p̄4)(l124 + l234) + 2(p̄2 + p̄3)l134)
K14 =
(K12 +K23 +K34 +K41 − (p̄2 + p̄3)(l134 + l123) + 2(p̄3 + p̄4)l124)
K12 =
(K12 +K23 +K34 +K41 − (p̄3 + p̄4)(l124 + l234)− 2(p̄2 + p̄3)l123)
K23 =
(K12 +K23 +K34 +K41 + (p̄2 + p̄3)(l134 + l123)− 2(p̄3 + p̄4)l234)
K13 =
(K12 +K23 +K34 +K41 + (p̄3 − p̄2)l123 + (p̄1 − p̄4)l134)
K24 =
(K12 +K23 +K34 +K41 + (p̄4 − p̄3)l234 + (p̄2 − p̄1)l124)
(3.13)
where we have introduced the notation p̄i = p
z̄. We have thus expressed all the quadratic
region momentum dependence in terms of the common factor K12 +K23 +K34 +K41, and,
given (3.11), it is clear that this contribution will vanish.13
13One could have chosen a different combination of the Kij ’s, but we find the symmetric choice in (3.13)
convenient.
After this step, we are left with an expression which is linear in the region momenta. We
will now proceed in a similar way, and rewrite all the expressions that contain lijk in terms
of a suitably chosen common factor:
l124 + l234 =
(k1z̄ + k
z̄ + k
z̄ + k
(p1z̄ + p
l134 + l123 =
(k1z̄ + k
z̄ + k
z̄ + k
(p2z̄ + p
2l234 =
(k1z̄ + k
z̄ + k
z̄ + k
z̄) +
(2p2z̄ + p
z̄ − p1z̄)
2l123 =
(k1z̄ + k
z̄ + k
z̄ + k
z̄) +
(2p1z̄ + p
z̄ − p4z̄)
2l134 =
(k1z̄ + k
z̄ + k
z̄ + k
z̄) +
(2p3z̄ + p
z̄ − p2z̄)
2l124 =
(k1z̄ + k
z̄ + k
z̄ + k
z̄) +
(2p4z̄ + p
z̄ − p3z̄)
(3.14)
In appendix B we show that the total coefficient of the common (k1z̄ + k
z̄ + k
z̄ + k
z̄) factor is
+(+(p̄3 + p̄4) + (p̄2 + p̄3))〈34〉〈41〉+ p1+
+(−(p̄2 + p̄3) + (p̄3 + p̄4))〈12〉〈41〉
+ p2+
+(−(p̄3 + p̄4)− (p̄2 + p̄3))〈12〉〈23〉+ p3+
+(+(p̄2 + p̄3)− (p̄3 + p̄4))〈23〉〈34〉
− (p2+ + p3+)(p1+ + p4+)(
(p̄3 − p̄2) +
(p̄1 − p̄4))〈12〉〈34〉
− (p3+ + p4+)(p2+ + p1+)(
(p̄4 − p̄3) +
(p̄2 − p̄1))〈23〉〈41〉
= − 3
[(12) + (23) + (34) + (41)]
(3.15)
where (pi)
2 is the full covariant momentum squared, and (ij) = pi+p
z − p
z. Thus we see
that the complete dependence on the region momenta can be rewritten as follows:
= − 3
(12) + (23) + (34) + (41)
〈12〉〈23〉〈34〉〈41〉
. (3.16)
It is rather satisfying that the region momentum dependence of the vertex takes this simple
form, which clearly vanishes when the external legs are on–shell, and thus will not contribute
to the all–plus amplitudes.
Having completely disentangled the region momenta kz̄ from the actual momenta pz̄,
we will now focus on the terms containing only the latter, which were produced during the
decompositions in (3.14). After a few simple manipulations, they can be rewritten as14
V (4)p =
+[(p̄1 + p̄2)(p̄1 − p̄2) + (p̄3 + p̄2)(p̄3 − p̄2)]〈34〉〈41〉
+ p1+
+[(p̄2 + p̄3)(p̄2 − p̄3) + (p̄4 + p̄3)(p̄4 − p̄3)]〈41〉〈12〉
+ p2+
+[(p̄3 + p̄4)(p̄3 − p̄4) + (p̄1 + p̄4)(p̄1 − p̄4)]〈12〉〈23〉
+ p3+
+[(p̄4 + p̄1)(p̄4 − p̄1) + (p̄2 + p̄1)(p̄2 − p̄1)]〈23〉〈34〉
− (p2+ + p3+)(p1+ + p4+)[(p̄3 − p̄2)(p̄1 − p̄4)− (p̄1 + p̄2)2]〈12〉〈34〉
− (p3+ + p4+)(p2+ + p1+)[(p̄4 − p̄3)(p̄2 − p̄1)− (p̄2 + p̄3)2]〈23〉〈41〉
(3.17)
This expression, together with (3.16) is our proposal for the off–shell four–point all–plus
vertex that should be part of the MHV-rules formalism at the quantum level. It would be
very interesting to elucidate its structure and bring it into a more compact form. For the
moment, however, we will be content to demonstrate that (3.17) is equal on shell to the
sought–for amplitude.
To that end, we will follow a similar approach to CQT, and rewrite all the holomorphic
spinor brackets in terms of the following three: 〈12〉〈34〉, 〈23〉〈41〉, 〈12〉〈41〉. To achieve this,
we use momentum conservation and a certain cyclic identity (see appendix A) to write
+〈34〉〈41〉 = p4+
p3+〈42〉 −
p4+〈23〉
+〈42〉 − (p4+)2
p1+〈12〉 −
p3+〈32〉
− (p4+)2〈23〉
= p4+
+〈12〉〈41〉 − p4+(p4+ + p3+)〈23〉〈41〉 .
(3.18)
In a similar way, we can show that
+〈12〉〈23〉 = p2+
+〈12〉〈41〉 − p2+(p2+ + p3+)〈34〉〈12〉 ,
+〈23〉〈34〉 =−
p3+(p
+)〈12〉〈34〉 − p3+(p1++p2+)〈23〉〈41〉+p3+
+〈12〉〈14〉
(3.19)
Collecting all the terms together, and manipulating the resulting expressions, it is straight-
14We write V (4) =
+〈12〉〈23〉〈34〉〈41〉V(4).
forward to show that (3.17) simplifies to just
V (4)p =
〈23〉〈41〉{34}(p1+ + p2+)[(p̄1 − p̄2)− (p̄2 + p̄3)]
+〈12〉〈34〉{23}(p2+ + p3+)[(p̄1 + p̄2) + (p̄1 − p̄4)]
+〈12〉〈41〉
(p̄1 + p̄2)({41}+ {32})+(p̄2 + p̄3)({12}+ {43})
(3.20)
where we use the notation [34] {ij} = pi+p
z̄ − pj+piz̄ = (1/
+[ij]. Converting to the
usual antiholomorphic bracket notation, we rewrite (3.20) as
V (4)p =
〈23〉〈41〉
+[34](p
+ + p
+)[(p̄1−p̄2)− (p̄2+p̄3)]
+ 〈12〉〈34〉
+[23](p
+ + p
+)[(p̄1+p̄2) + (p̄1−p̄4)]
+ 〈12〉〈41〉
(p̄1 + p̄2)(p
+[41] + p
+[32])
+ (p̄2 + p̄3)(p
+[12] + p
+[43])
(3.21)
Note that so far this expression is completely off shell. We will now show that on shell it
reduces to the known result (2.22). In doing this we will keep track of the p2 terms that
appear when applying momentum conservation in the form
〈ik〉[kj] =
. (3.22)
These terms are collected in appendix B.
We start by rewriting each of the terms in the last two lines of (3.21) as follows
〈12〉〈41〉[41] p1+
+(p̄1 + p̄2) = −〈23〉〈41〉[34] p1+
+(p̄1 + p̄2)
〈12〉〈41〉[32] p3+
+(p̄1 + p̄2) = −〈12〉[32]〈42〉p2+p3+(p̄1 + p̄2)
− 〈12〉〈34〉[23] p3+
+(p̄1 + p̄2)
〈12〉〈41〉[12] p1+
+(p̄2 + p̄3) = −〈12〉〈34〉[23] p1+
+(p̄2 + p̄3)
〈12〉〈41〉[43] p3+
+(p̄2 + p̄3) = −〈41〉〈23〉[34] p3+
+(p̄2 + p̄3)
− 〈41〉[43]〈42〉p4+p3+(p̄2 + p̄3) .
(3.23)
We also transform the 〈12〉〈34〉 term using the Schouten identity and also momentum con-
servation,
〈12〉〈34〉[23]
+=〈23〉〈41〉[34]
++〈14〉〈23〉[13]
+−〈13〉〈42〉[23]
+ , (3.24)
and add up all contributions to the 〈23〉〈41〉 term, which are
〈23〉〈41〉[34]
4(p2+p̄1 − p1+p̄2) + 2(p3+p̄1 − p1+p̄3)
〈23〉〈41〉[34]
+[4{21}+ 2{31}] .
(3.25)
Converting to the spinor bracket, the first of these terms is
+[12]〈23〉[34]〈41〉 , (3.26)
while the remaining terms from (3.23) and (3.24) combine to give
〈14〉〈23〉[13]
+ − 〈13〉〈42〉[23]
(p2+ + p
+)[(p̄1+p̄2) + (p̄1−p̄4)]
+ 〈12〉[32]〈42〉p2+[p2+(p̄1 + p̄2)− p4+(p̄2 + p̄3)]
=− 〈14〉[13]〈12〉p3+(p2+ + p3+)[(p̄1+p̄2) + (p̄1−p̄4)]
+ 〈12〉[32]〈42〉p2+[p2+(p̄1 + p̄2)− p4+(p̄2 + p̄3)]
=− 〈14〉[13]〈12〉p3+(2(p2+ + p3+)p̄1 − 2p1+(p̄2 + p̄3)) = 2〈14〉[13]〈12〉p3+{41}
(3.27)
(where we suppress an overall 1/(4
2)) and we see that (3.27) cancels the second term
in (3.25), thus showing that (3.26) is the complete on-shell answer. Reintroducing all the
prefactors, we thus find that the amplitude is
A(4) = − g
〈12〉〈23〉〈34〉〈41〉 ×
+[12]〈23〉[34]〈41〉
[12][34]
〈12〉〈34〉 .
(3.28)
Now note that, as discussed in appendix A, in order to convert to the usual Yang–Mills
theory normalisation we need to send g → g/
2. We conclude that A(4) gives precisely the
result (2.22) for the all–plus scattering amplitude.
3.3 The general all–plus amplitude
We have just given an explicit derivation of the four point all-plus amplitude, from the
two-point counterterm (3.3). We will argue in the following that this two-point counterterm
contains all the all-plus amplitudes.
First, we can see immediately that the counterterm (3.3) has the right kind of structure.
Consider the n–point all–plus amplitude [56]:
A(n) =
1≤i<j<k<l≤n
〈ij〉[jk]〈kl〉[li]
〈12〉 · · · 〈n1〉 . (3.29)
In terms of spinor brackets this amplitude has terms of the form 〈 〉2−n[ ]2. A quick look
at the Ettle-Morris coefficients shows that, for an n–point vertex coming from LCT, they
contribute exactly 2 − n powers of the spinor brackets 〈 〉. Furthermore, there are exactly
two powers of [ ] coming from the counterterm Lagrangian LCT ∼ (k2z̄)A2 – one for each
power of k. Thus the general structure of LCT is appropriate to reproduce (3.29).
Pictorially, we can represent the general n–point amplitude, arising from the counterterm
in the new variables, as in Figure 5.
ki kj
Figure 5: The structure of a generic term contributing to the n–point vertex. All momenta
are taken to be outgoing, and all indices are modulo n.
Thus we can write this n–point all–plus vertex as follows:
A(n)+···+ =
1···n
δ(p+ p′)
1≤i<j≤n
Y(p; j + 1, . . . , i)
(kiz̄)
2 + (k
2 + kiz̄k
Y(p′; i+ 1, . . . , j)×
× tr[BiBi+1 · · ·BjBj+1 · · ·Bi−1]
2i)n−2
1···n
δ(p1+· · ·+pn)
1≤i<j≤n
+ + · · ·+ pi+)
〈j + 1, j + 2〉 · · · 〈i− 1, i〉×
(kiz̄)
2 + (k
2 + kiz̄k
) (pi+1+ + · · ·+ pj+)
pi+1+ p
〈i+ 1, i+ 2〉 · · · 〈j − 1, j〉tr[B1 · · ·Bn] .
(3.30)
Focusing only on the relevant part of the above expression, and ignoring all coefficients, the
general structure we obtain is the following:
V(n)+···+ =
〈12〉 · · · 〈n1〉×
1≤i<j≤n
〈j, j + 1〉〈i, i+ 1〉
+ − ki+)2((kiz̄)2 + (k
2 + kiz̄k
 (3.31)
where we have extracted the denominator at the expense of introducing the two missing
holomorphic factors 〈j, j + 1〉 and 〈i, i+ 1〉 in the numerator. We also made use of the fact
kj − ki = pi+1 + pi+2 + · · ·+ pj = −(pj+1 + pj+2 + · · ·+ pi) , (3.32)
applied to the + components, to rewrite the two p+ sums in the numerator in terms of the
k’s (this gives rise to a minus which we suppress).
It is easy to verify that, for n = 4, this sum reproduces the 6 contributions that appeared
in the four–point case, and (as we explicitly showed above) combined to give the expected
answer. Therefore, we would like to propose that the vertex (3.31) will reduce on–shell to
an expression proportional to (3.29). We will not attempt to prove this statement here15,
but will instead move on to study the general properties of the n-point expression (3.30).
Whilst the explicit calculation for the four point case was rather involved as we saw
earlier, the study of the general properties of the n–point amplitudes proves much simpler.
In particular, we will show that the collinear and soft limits of the expressions proposed for
the n–point case can be very easily shown to be correct. Let us start by introducing some
simplifying notation. One can write the change of variables for the A field as
A1 = Y12B2 +Y123B2B3 +Y1234B2B3B4 + · · · , (3.33)
where
Y12 = δ12, Y123 =
, Y1234 =
(23)(34)
, (3.34)
and generally
Y12...n =
1+3+4+ . . . (n− 1)+
(23)(34) . . . (n− 1 n) (3.35)
(for simplicity, we are dropping inconsequential constant factors in this discussion). This
notation is similar to that of [34]. Integrations and the insertion of suitable delta functions
are understood, and can be illustrated by comparing the short-hand expressions above with
the full equations given earlier. It will prove convenient to define
Kij = k
i + k
j + kikj, ki := k
z̄. (3.36)
We will use the expression Y•12...n in the following, where the dot in the first placemark
in the Y means that one substitutes in that place the negative of the sum of the other
momenta. Then the result which we have proved above for the four point amplitude V1234
can be expressed as
V1234 =K43Y•4Y•123 +K14Y•1Y•234 +K21Y•2Y•341 +K32Y•3Y•412
+K31Y•23Y•41 +K24Y•12Y•34 ,
(3.37)
15It is perhaps interesting to remark that the proof would involve converting the double sum in (3.31) to
the quadruple sum in (3.29)—a state of affairs which has appeared before in a rather different context [20].
or very simply
V1234 =
1≤i<j≤4
KijY• j+1...iY• i+1...j . (3.38)
It is clear that the general conjecture that all the n–point all plus amplitudes are generated
from the two-point counterterm (3.3) translates into the proposal that the n-point all-plus
amplitude V12...n is given by
V12...n =
1≤i<j≤n
KijY• j+1...iY• i+1...j , (3.39)
Let us now show that the expression on the right-hand side of (3.39) has precisely the same
soft and collinear limits as the known amplitude on the left-hand side.
Collinear limits
Under the collinear limit
pi → zP , pi+1 → (1− z)P , P 2 → 0 , (3.40)
the n-point amplitude V12...n behaves as
V12...n →
z(1− z)
(i i+ 1)
V12...i i+2...n , (3.41)
where we relabel P → pi after the limit is taken (the i+ and (i i + 1) factors involve
momenta rather than spinors, which is why the z-dependent factor is 1/z(1−z), rather than
the conventional 1/
z(1− z)).
Consider the behaviour of the right-hand side of (3.39) under the limit (3.40). The first
point is that if the indices i, i + 1 lie on different Y’s, then there are no poles generated in
this collinear limit. This is clear from the explicit expressions for the Y’s in (3.35). Thus we
may ignore any terms of this type. It is then immediate from the explicit forms of the Y’s
Y12...s →
z(1− z)
(i i+ 1)
Y12...i i+2...s , (3.42)
for any i = 2, . . . s − 1, with s ≤ n (the first index in Y never contributes in a collinear
limit, as one can see from the conjecture (3.39)). Thus we see that the Y expressions have
the right sort of collinear behaviour. It is straightforward to see that the K coefficients in
(3.39) also get relabelled correctly in the collinear limit; they are not explicitly involved as
they refer to pairs of momenta attached to different Y fields, and as we saw, these do not
contribute.
It is then immediate to see that the summation over the products of Y’s in (3.39) reduces
correctly in the collinear limit to the required summation over products of Y’s with one fewer
leg in total. Hence the proposal (3.39) for the amplitude has precisely the same collinear
limits as the physical amplitude.
Soft limits
We also find that there is a simple derivation of the soft limits of the expression in (3.39).
In the soft limit
pj → 0 , (3.43)
the n-point amplitude V12...n behaves as
V12...n → S(j) V12...j−1 j+1...n , (3.44)
where we assume cyclic ordering as usual, so that, for example, pn+1 = p1. The soft function
S(j) is given in terms of the momentum brackets by
S(j) =
j+(j − 1 j + 1)
(j − 1 j) (j j + 1) . (3.45)
The Y functions have a simple behaviour under soft limits. One has immediately that in
the soft limit pj → 0,
Y12...s → S(j) Y12...j−1 j+1...s , (3.46)
for j = 3, . . . s− 1 (with s ≤ n). For the soft limits corresponding to the case missing in the
above, we need the results
Y•s+1...j = Y•s+1...j−1
(j − 1)+
(j − 1 j) , Y•j...s = Y•j+1...s
(j + 1)+
(j j + 1)
, (3.47)
which follow from the definitions of the Y’s, and
(j + 1)+
(j j + 1)
(j − 1)+
(j − 1 j) =
j+(j − 1 j + 1)
(j − 1 j) (j j + 1) = S(j) , (3.48)
which follows from the cyclic identity i+(jk) + j+(ki) + k+(ij) = 0. Finally, from relabelling
the K’s we have in the soft limit that Ksj → Ksj−1. Then it follows that in the soft limit
Ksj Y•s+1...j Y•j+1...s +Ksj−1 Y•s+1...j−1 Y•j...s → S(j)Ksj−1 Y•s+1...j−1 Y•j+1...s , (3.49)
as required.
Again, it is then easy to see that the summation over the products of Y’s in (3.39) reduces
correctly in the soft limit to the required summation over products of Y’s with one fewer leg
in total. Hence the proposal (3.39) for the amplitude has precisely the same soft limits as
the physical amplitude.
4 Discussion
Whilst new, twistor-inspired methods for calculating amplitudes in gauge theory have led
to much progress, the lack of a systematic action-based formulation which incorporates
these new ideas has been an impediment to further developments. MHV diagrams have
the two advantages of being closely allied to the twistor picture, as well as providing an
explicit realisation of the dispersion and phase space integrals fundamental to unitarity-
based methods. However, without an action formalism, standard MHV methods have so far
been mainly restricted to massless theories at one-loop level, and to the cut-constructible
parts of amplitudes.
The advent of a classical MHV Lagrangian for gauge theory, derived from lightcone YM
theory [32, 33, 34], provides the basis for transcending these limitations. In order for this to
be realised, it is necessary to describe the quantum MHV theory. What we have done in this
paper is to investigate this quantum theory. Using the regularisation methods of [39, 40, 41],
we have provided arguments that the simplest one-loop counterterm in the quantum MHV
theory – a two point vertex – provides an extraordinarily concise generating function for the
infinite sequence of one-loop, all-plus helicity amplitudes in YM theory. We showed this by
explicit calculation for the four-point case, and then proved that the soft and collinear limits
of the conjectured n-point amplitude precisely matched those of the correct answer.
We would like to emphasise that the simplicity of our approach — which reduced the
calculations of the loop amplitudes we considered to tree–level algebraic manipulations—
is largely due to the four–dimensional nature of the regularisation scheme we employed.
By staying in four dimensions, we preserve the appealing features of the inherently four–
dimensional field redefinition of [32, 33].
Based upon this result, it is very natural to conjecture that the full quantum YM theory
is correctly described by this quantum MHV Lagrangian. The correct ingredients appear to
be present. For example, in the approach of [39, 40, 41] there arise one-loop counterterms
with helicities (++), (+ + −), (−−), (− − +). We studied the (++) counterterm in this
paper, arguing that when expressed in the (B, B̄) variables this generates the full set of all-
plus amplitudes. Transforming the (++−) counterterm to (B, B̄) variables will generate an
infinite sequence of single–minus vertices. There will be other contributions to single-minus
vertices from combinations of all-plus vertices and MHV vertices. It would be surprising if
the combined contributions of these did not lead to the correct YM single-minus expressions.
Certainly all of these have the correct powers of spinor brackets for this to be the case.
Transforming the (−−) and (− − +) counterterms to (B, B̄) variables will lead to new
contributions to MHV vertices16. The MHV vertices from the classical MHV Lagrangian
only generate the cut-constructible parts of YM loop amplitudes, such as the one-loop MHV
16In the MHV case there are additional counterterms noted in [41] which may also need to be taken into
account in future discussions.
amplitude. These new contributions might be expected to lead to the missing, rational
parts. This would also potentially explain why in [57] the combination of all-plus vertices
with MHV tree vertices did not yield the correct single-minus amplitudes – these additional
MHV contributions are missing.
Further evidence for the conjecture that the quantum MHV Lagrangian is equivalent
to quantum YM theory would be welcome. One could start with seeking explicit proofs of
the above proposals. One can also investigate beyond massless one-loop gauge theory – an
advantage of the Lagrangian approach is that the inclusion of masses, and of fermions and
scalars, is in principle clear. There are other issues raised by this work. It is plausible that
the potential quantum versions of the twistor space formulations of gauge theory [58, 59, 60]
are most likely to be allied to the quantum theory discussed here – one simple reason for
believing this is that the regularisation employed here keeps one in four dimensions. Perhaps
there are simple twistor space analogues of the counterterms discussed above.
Finally, although for our purposes the lightcone worldsheet approach to perturbative
gauge theory provided simply the motivation for a particular choice of regularisation scheme,
we believe that it would be fruitful to further explore possible connections between that
framework and the twistor string programme.
Addendum: We would like to thank Paul Mansfield and Tim Morris for having informed
us that they have recently been pursuing research related to that presented in this paper.
Their work, which is complementary to ours in that it employs dimensional regularisation,
has now appeared in [61].
Acknowledgements
It is a pleasure to thank Paul Heslop, Gregory Korchemsky, Paul Mansfield, Tim Morris
and Adele Nasti for discussions. We would like to thank PPARC for support under the
Rolling Grant PP/D507323/1 and the Special Programme Grant PP/C50426X/1. The work
of GT is supported by an EPSRC Advanced Fellowship EP/C544242/1 and by an EPSRC
Standard Research Grant EP/C544250/1.
A Notation
Lightcone conventions
Here we summarise our lightcone conventions. We start off by introducing lightcone
coordinates
x± :=
x0 ± x3√
, xz :=
x1 + ix2√
, xz̄ :=
x1 − ix2√
. (A.1)
We also have x+ = x−, x
z = −xz̄ , and so on. The scalar product between two vectors A and
B is written as
A · B := A+B− + A−B+ − AzBz̄ −Az̄Bz . (A.2)
We choose x− as our lightcone time coordinate, therefore the lightcone gauge used in this
paper is defined by
A− = 0 . (A.3)
This condition can be written as η ·A = 0, where η is a constant null vector, chosen to have
components η := (1/
2, 0, 0, 1/
2) (hence η− = 1, η+ = ηz = ηz̄ = 0).
To any four-vector p we associate the bispinor paȧ defined by
paȧ :=
p− −pz
−pz̄ p+
. (A.4)
We also define holomorphic and anti-holomorphic spinors as
λa :=
, λ̃ȧ :=
, (A.5)
from which it follows that
λaλ̃ȧ :=
pzpz̄
−pz̄ p+
. (A.6)
This is of course consistent with the on-shell condition p− = pzpz̄/p+. Furthermore, compar-
ing (A.4) and (A.6) and choosing η as specified earlier, we see that a generic off-shell vector
p can be decomposed as
p = λλ̃ + zη , (A.7)
where
p−p+ − pzpz̄
2(p · η) . (A.8)
(A.7) and (A.8) are the familiar decompositions of off-shell vectors in the MHV literature
[62, 17, 63, 15].
The off-shell holomorphic spinor product is defined as:
〈ij〉 =
z − p
, (A.9)
whereas for the antiholomorphic spinors we define
[ij] =
z̄ − pj+piz̄
. (A.10)
In these conventions, one finds
2(pi · pj) = 〈i j〉 [i j] +
(pi)2 +
(pj)2 , (A.11)
or, in the case where pi and pj are on shell, 2(pi · pj) = 〈i j〉 [i j]. In the standard QCD
literature conventions it is customary to define 2(pi · pj) = 〈i j〉 [j i]; this can be obtained by
simply re-defining the inner product of two anti-holomorphic spinors, [i j], to be the negative
of the right hand side of (A.10).
Useful identities
The form (A.9) is very convenient for deriving identities for 〈ij〉 that also involve the p+
components. For instance, one has:
pi+〈jk〉+
+〈ki〉+
pk+〈ij〉
pi+(p
z − pk+pjz)
z − pi+pkz)
pk+(p
z − p
= 0 .
(A.12)
It is also easy to see how to apply momentum conservation, take say 〈ij〉, and substitute
pj = −
k 6=j
pk (for each component). (A.13)
Then we have
+〈ij〉 =
pi+(−
k 6=j p
z) + (
k 6=j p
k 6=j
z − pk+piz
k 6=j
pk+〈ki〉 .
(A.14)
We have also used the momentum bracket notation from [34]
(ij) = pi+p
z − p
z , {ij} = pi+p
z̄ − pj+piz̄ . (A.15)
Lightcone Yang–Mills action
Here we give the form of the lightcone Yang–Mills action that we use in this paper. As
discussed in more detail in [35], starting from the YM Lagrangian −(1/4) trF 2, imposing the
lightcone gauge (A.3), and integrating out the A+ component which appears quadratically,
the final lightcone theory contains only the two physical components Az and Az̄ [64, 65, 66],
which we associate with positive and negative helicity respectively. The Lagrangian takes
the simple form (2.1)
LYM = L+− + L++− + L−−+ + L++−− , (A.16)
L+− = −2 tr{Az̄(∂+∂− − ∂z∂z̄)Az} ,
L++− = 2ig tr{[Az, ∂+Az̄](∂+)−1(∂z̄Az)} ,
L−−+ = 2ig tr{[Az̄, ∂+Az](∂+)−1(∂zAz̄)} ,
L++−− = −2g2 tr{[Az̄, ∂+Az](∂+)−2[Az, ∂+Az̄]} .
(A.17)
Note that, in agreement with CQT, we have used the normalisation tr{T aT b} = δab. In
order to convert to the usual conventions for Yang–Mills theory, we therefore need to rescale
g → g/
Relation to the notation of CQT
To compare our notation to that of [39, 40, 41], note that we employ outgoing momenta
instead of incoming, therefore the all–plus amplitudes in these works would be all–minus from
our perspective, and should thus be conjugated when comparing. Also, our time evolution
coordinate is taken to be x− rather that x+, which (among other changes) implies that p+
of CQT becomes p+. Our metric is also taken to have opposite signature to that in CQT.
Finally, CQT define momentum brackets K∧ij and K
ij , which are just our (ij) and {ij}
brackets respectively.
B Details on the four–point calculation
In this appendix we prove two results that were used in section 3, namely equations (3.11)
and (3.15). To make the expressions more compact, instead of momentum brackets we use
the following notation:
fij = −
. (B.1)
The fij variables satisfy the simple relation:
fij = fik + fkj , (B.2)
while momentum conservation is applied as
pi+fij = −
pk+fkj . (B.3)
Also, to minimise clutter, in this appendix we use the notation qi := p
Proof of the quadratic identity
In order to show (3.11), it is convenient to divide out by the
+ factor (which
is there anyway in (3.10)) in order to bring it to the form
q24f34f41 + q
1f12f41 + q
2f12f23 + q
3f23f34
− (q2 + q3)(q1 + q4)f12f34 − (q3 + q4)(q2 + q1)f23f41 = 0 ,
(B.4)
Expanding out the two last terms in (B.4) as
− (q1q3 + q2q4)(f12f34 + f23f41)− (q1q2 + q3q4)f12f34 − (q2q3 + q4q1)f23f41 , (B.5)
we apply momentum conservation on each of the four components of the first term of (B.5),
in the following way:
− q1q3f12f34 = q1f12(q1f14 + q2f24) = −q21f12f41 + q1q2f12f24 ,
− q1q3f23f41 = q3f23(q2f42 + q3f43) = −q23f23f34 + q2q3f23f42 ,
− q2q4f12f34 = q4(q3f13 + q4f14)f34 = −q24f34f41 + q3q4f13f34 ,
− q2q4f23f41 = q2f23(q2f21 + q3f31) = −q22f12f23 + q2q3f31f23 .
(B.6)
Clearly these transformations have been chosen to cancel the first four terms in (B.4). Col-
lecting the remaining terms, we obtain
q1q2f12(f24 − f34) + q2q3f23(f42 + f31 − f41) + q3q4f34(f13 − f12)− q1q4f23f41
= q1q2f12f23 + q2q3f23f32 + q3q4f34f23 + q1q4f23f14
= f23[q2(q1f12 + q3f32) + q4(q3f34 + q1f14)] = f23[−q2(q4f42)− q4(q2f24)]
(B.7)
thus showing (3.11).
Proof of the linear identity
We will now outline the proof ot the linear (in region momenta) identity (3.15). Con-
verting it to the notation used in the appendix, and performing simple manipulations, we
find (suppressing the overall 3/8 factor):
X = q24((p̄3 + p̄4) + (p̄2 + p̄3))f34f41 + q
1(−(p̄2 + p̄3) + (p̄3 + p̄4))f12f41
+ q22(−(p̄3 + p̄4)− (p̄2 + p̄3))f12f23 + q23(+(p̄2 + p̄3)− (p̄3 + p̄4))f23f34
(q2 + q3)(q1 + q4)[(p̄3 − p̄2) + (p̄1 − p̄4)]f12f34
(q3 + q4)(q1 + q2)[(p̄4 − p̄3) + (p̄2 − p̄1)]f23f41
= (p̄3 − p̄1)(q24f34f41 − q22f12f23) + (p̄4 − p̄2)(q21f12f41 − q23f23f34)
− (q2 + q3)(q1 + q4)(p̄3 + p̄1)f12f34 − (q3 + q4)(q1 + q2)(p̄2 + p̄4)f23f41
= (p̄3 − p̄1)(q24f34f41 − q22f12f23) + (p̄4 − p̄2)(q21f12f41 − q23f23f34)
− (p̄1 + p̄3)q2q4(f12f34 − f23f41) + (p̄2 + p̄4)q1q3(f12f34 − f23f41)
− (p̄1 + p̄3)(q1q2 + q3q4)f12f34 + (p̄1 + p̄3)(q2q3 + q4q1)f23f41 .
(B.8)
Similarly to the previous case, we will rewrite the second line in the final expression in such
a way that we completely cancel all the terms in the first line. To do that we use
−(p̄1 + p̄3)q2q4(f12f34 − f23f41) =(p̄3 − p̄1)(q22f12f23 − q24f34f41)+
+ q1q2p̄
1f12f31 − q4q1p̄1f41f13+
+ q3q4p̄
3f34f13 − q2q3p̄3f23f31
(B.9)
(p̄2 + p̄4)q1q3(f12f34 − f23f41) =(p̄4 − p̄2)(q23f23f34 − q21f12f41)+
+ q2q3p̄2f23f42 − q1q2p̄2f12f24+
+ q4q1p̄4f41f24 − q3q4p̄4f34f42 .
(B.10)
What remains after substituting these is
X = p̄1q1f31(q2f12 + q4f41) + q3p̄3f13(q4f34 + q2f23)
+ p̄2q2f42(q3f23 + q1f12) + q4p̄4f24(q1f41 + q3f34)
− (p̄1 + p̄3)(q1q2 + q3q4)f12f34 + (p̄1 + p̄3)(q2q3 + q4q1)f23f41
= p̄1q1q2f12f41 + p̄3q3q4f34f23 + p̄1q4q1f41f21 + p̄3q2q3f23f43
+ p̄2q2f42(q3f23 + q1f12) + q4p̄4f24(q1f41 + q3f34)
− (p̄1q3q4 + p̄3q1q2)f12f34 + (p̄1q2q3 + p̄3q4q1)f23f41 .
(B.11)
Now we collect various terms together to rewrite X as
X = p̄1q2f41(q1f12 + q3f23) + p̄3q4f23(q3f34 + q1f41)
+ p̄1q4f21(q1f41 + q3f34) + p̄3q2f43(q3f23 + q1f12)
+ p̄2q2f42(q3f23 + q1f12) + p̄4q4f24(q1f41 + q3f34)
= p̄1q2f41(2q3f23 − q4f42) + p̄3q4f23(2q1f41 − q2f24)
+ p̄1q4f21(2q1f41 − q4f42) + p̄3q2f43(2q3f23 − q4f42)
+ p̄2q2f42(2q3f23 − q4f42) + p̄4q4f24(2q1f41 − q2f24)
= 2[q2q3f23(p̄1f41 + p̄3f43 + p̄2f42) + q4q1f41(p̄3f23 + p̄1f21 + p̄4f24)]
+ (p̄1 + p̄2 + p̄3 + p̄4)q2q4f24f42 .
(B.12)
Clearly the term on the last line vanishes by momentum conservation. We now restore all
labels to write the final result as
X =2 (32)[f4(p
z̄ + p
z̄ + p
z̄)− p1z̄f1 − p2z̄f2 − p3z̄f3]+
+ 2 (14)[f2(p
z̄ + p
z̄ + p
z̄)− p3z̄f3 − p1z̄f1 − p4z̄f4] ,
(B.13)
where we used that q2q3f23 = p
+ − p3z/p3+) = p3+p2z − p2+p3z = (32) (and similarly
for (14)), and where fi = p
+. Using momentum conservation on both terms, we rewrite
them as
X = −2[(32) + (14)]
p1z̄p
p2z̄p
p3z̄p
p4z̄p
. (B.14)
For each momentum we have that p2 = 2(p+p− − pzpz̄), therefore we can rewrite the above
X = +[(32) + (14)]
+ 2(p1− + p
− + p
− + p
. (B.15)
The p− term vanishes, hence, noticing also that (32)+ (14) = −12((12)+ (23)+ (34)+ (41)),
we conclude that
X = −1
[(12) + (23) + (34) + (41)]
. (B.16)
Off-shell terms in the four-point case
For completeness, we also give the form of the off-shell terms that arose in the manipu-
lations leading to (3.26).
Using the notation Pij = (
) they are :
f(p2) =
4〈12〉 · · · 〈41〉
− P13(p̄1 + p̄2)(41)− P13(p̄2 + p̄3)(12) + P24(p̄2 + p̄3)(42)
P12[(p
+ + p
+)(2p̄1 + p̄2 − p̄3)− p3+(p̄1 + p̄2)− p1+(p̄2 + p̄3)] (13)
+ P12
[p2+(p̄1 + p̄2)− p4+(p̄2 + p̄3)](12)− 2P13
{31}(41)
(B.17)
This expression, together with V(4)
in (3.16), should be added to (3.26) in order to recover
a fully off-shell four–point vertex.
References
[1] E. Witten, Perturbative gauge theory as a string theory in twistor space, Comm. Math.
Phys. 252 (2004) 189, hep-th/0312171.
[2] F. Cachazo and P. Svrček, Lectures on twistor strings and perturbative Yang-Mills
theory, PoS RTN2005 (2005) 004, hep-th/0504194.
[3] R. Britto, F. Cachazo, and B. Feng, New recursion relations for tree amplitudes of
gluons, Nucl. Phys. B 715 (2005) 499, hep-th/0412308.
[4] R. Britto, F. Cachazo, B. Feng, and E. Witten, Direct proof of tree-level recursion
relation in Yang-Mills theory, Phys. Rev. Lett. 94 (2005) 181602, hep-th/0501052.
[5] R. Britto, F. Cachazo, and B. Feng, Generalized unitarity and one-loop amplitudes in
N = 4 super-Yang-Mills, Nucl. Phys. B 725 (2005) 275, hep-th/0412103.
[6] A. Brandhuber and G. Travaglini, Quantum MHV diagrams, hep-th/0609011.
[7] F. Cachazo, P. Svrček, and E. Witten, MHV vertices and tree amplitudes in gauge
theory, J. High Energy Phys. 0409 (2004) 006, hep-th/0403047.
[8] G. Georgiou and V. V. Khoze, Tree amplitudes in gauge theory as scalar MHV
diagrams, J. High Energy Phys. 0405 (2004) 070, hep-th/0404072.
[9] C.-J. Zhu, The googly amplitudes in gauge theory, J. High Energy Phys. 0404 (2004)
032, hep-th/0403115.
[10] J.-B. Wu and C.-J. Zhu, MHV vertices and scattering amplitudes in gauge theory, J.
High Energy Phys. 0407 (2004) 032, hep-th/0406085.
[11] J.-B. Wu and C.-J. Zhu, MHV vertices and fermionic scattering amplitudes in gauge
theory with quarks and gluinos, J. High Energy Phys. 0409 (2004) 063,
hep-th/0406146.
[12] G. Georgiou, E. W. N. Glover, and V. V. Khoze, Non–MHV tree amplitudes in gauge
theory, J. High Energy Phys. 0407 (2004) 048, hep-th/0407027.
[13] L. J. Dixon, E. W. N. Glover, and V. V. Khoze, MHV rules for Higgs plus multi-gluon
amplitudes, J. High Energy Phys. 0412 (2004) 015, hep-th/0411092.
[14] Z. Bern, D. Forde, D. A. Kosower, and P. Mastrolia, Twistor-inspired construction of
electroweak vector boson currents, Phys. Rev. D 72 (2005) 025006, hep-ph/0412167.
[15] J. Bedford, A. Brandhuber, B. Spence, and G. Travaglini, Non–supersymmetric loop
amplitudes and MHV vertices, Nucl. Phys. B 712 (2005) 59, hep-th/0412108.
[16] Z. Bern, L. J. Dixon, D. C. Dunbar, and D. A. Kosower, Fusing gauge theory tree
amplitudes into loop amplitudes, Nucl. Phys. B435 (1995) 59–101, hep-ph/9409265.
http://xxx.lanl.gov/abs/hep-th/0312171
http://xxx.lanl.gov/abs/hep-th/0504194
http://xxx.lanl.gov/abs/hep-th/0412308
http://xxx.lanl.gov/abs/hep-th/0501052
http://xxx.lanl.gov/abs/hep-th/0412103
http://xxx.lanl.gov/abs/hep-th/0609011
http://xxx.lanl.gov/abs/hep-th/0403047
http://xxx.lanl.gov/abs/hep-th/0404072
http://xxx.lanl.gov/abs/hep-th/0403115
http://xxx.lanl.gov/abs/hep-th/0406085
http://xxx.lanl.gov/abs/hep-th/0406146
http://xxx.lanl.gov/abs/hep-th/0407027
http://xxx.lanl.gov/abs/hep-th/0411092
http://xxx.lanl.gov/abs/hep-ph/0412167
http://xxx.lanl.gov/abs/hep-th/0412108
http://xxx.lanl.gov/abs/hep-ph/9409265
[17] A. Brandhuber, B. J. Spence, and G. Travaglini, One-loop gauge theory amplitudes in
N = 4 super Yang-Mills from MHV vertices, Nucl. Phys. B706 (2005) 150–180,
hep-th/0407214.
[18] W. L. van Neerven, Dimensional regularization of mass and infrared singularities in
two loop on-shell vertex functions, Nucl. Phys. B 268 (1986) 453.
[19] Z. Bern and A. G. Morgan, Massive loop amplitudes from unitarity, Nucl. Phys. B 467
(1996) 479, hep-ph/9511336.
[20] Z. Bern, L. J. Dixon, D. C. Dunbar, and D. A. Kosower, One-loop self-dual and N = 4
super Yang-Mills, Phys. Lett. B 394 (1997) 105, hep-th/9611127.
[21] A. Brandhuber, S. McNamara, B. Spence, and G. Travaglini, Loop amplitudes in pure
Yang-Mills from generalised unitarity, J. High Energy Phys. 0510 (2005) 011,
hep-th/0506068.
[22] Z. Bern, L. J. Dixon, and D. A. Kosower, Bootstrapping multi-parton loop amplitudes
in QCD, Phys. Rev. D 73 (2006) 065013, hep-ph/0507005.
[23] C. F. Berger, Z. Bern, L. J. Dixon, D. Forde, and D. A. Kosower, Bootstrapping
one-loop QCD amplitudes with general helicities, Phys. Rev. D 74 (2006) 036009,
hep-ph/0604195.
[24] C. F. Berger, Z. Bern, L. J. Dixon, D. Forde, and D. A. Kosower, All one-loop
maximally helicity violating gluonic amplitudes in QCD, Phys. Rev. D 75 (2007)
016006, hep-ph/0607014.
[25] Z. G. Xiao, G. Yang, and C. J. Zhu, The rational part of QCD amplitude. I: The
general formalism, Nucl. Phys. B 758 (2006) 1, hep-ph/0607015.
[26] X. Su, Z. G. Xiao, G. Yang, and C. J. Zhu, The rational part of QCD amplitude. II:
The five-gluon, Nucl. Phys. B 758 (2006) 35, hep-ph/0607016.
[27] Z. G. Xiao, G. Yang, and C. J. Zhu, The rational part of QCD amplitude. III: The
six-gluon, Nucl. Phys. B 758 (2006) 53, hep-ph/0607017.
[28] C. Anastasiou, R. Britto, B. Feng, Z. Kunszt, and P. Mastrolia, D-dimensional
unitarity cut method, Phys. Lett. B 645 (2007) 213, hep-ph/0609191.
[29] P. Mastrolia, On triple-cut of scattering amplitudes, Phys. Lett. B 644 (2007) 272,
hep-th/0611091.
[30] R. Britto and B. Feng, Unitarity cuts with massive propagators and algebraic
expressions for coefficients, Phys. Rev. D 75 (2007) 105006, hep-ph/0612089.
[31] C. Anastasiou, R. Britto, B. Feng, Z. Kunszt, and P. Mastrolia, Unitarity cuts and
reduction to master integrals in d dimensions for one-loop amplitudes,
hep-ph/0612277.
http://xxx.lanl.gov/abs/hep-th/0407214
http://xxx.lanl.gov/abs/hep-ph/9511336
http://xxx.lanl.gov/abs/hep-th/9611127
http://xxx.lanl.gov/abs/hep-th/0506068
http://xxx.lanl.gov/abs/hep-ph/0507005
http://xxx.lanl.gov/abs/hep-ph/0604195
http://xxx.lanl.gov/abs/hep-ph/0607014
http://xxx.lanl.gov/abs/hep-ph/0607015
http://xxx.lanl.gov/abs/hep-ph/0607016
http://xxx.lanl.gov/abs/hep-ph/0607017
http://xxx.lanl.gov/abs/hep-ph/0609191
http://xxx.lanl.gov/abs/hep-th/0611091
http://xxx.lanl.gov/abs/hep-ph/0612089
http://xxx.lanl.gov/abs/hep-ph/0612277
[32] A. Gorsky and A. Rosly, From Yang-Mills lagrangian to MHV diagrams, J. High
Energy Phys. 0601 (2006) 101, hep-th/0510111.
[33] P. Mansfield, The lagrangian origin of MHV rules, J. High Energy Phys. 0603 (2006)
037, hep-th/0511264.
[34] J. H. Ettle and T. R. Morris, Structure of the MHV–rules lagrangian, J. High Energy
Phys. 0608 (2006) 003, hep-th/0605121.
[35] A. Brandhuber, B. Spence, and G. Travaglini, Amplitudes in pure Yang–Mills and
MHV diagrams, J. High Energy Phys. 0702 (2007) 088, hep-th/0612007.
[36] D. Cangemi, Selfdual Yang-Mills theory and one loop like–helicity QCD multi–gluon
amplitudes., Nucl. Phys. B 484 (1997) 521, hep-th/9605208.
[37] D. Cangemi, Selfduality and maximally helicity violating QCD amplitudes, Int. J.
Mod. Phys. A 12 (1997) 1215, hep-th/9610021.
[38] G. Chalmers and W. Siegel, The self–dual sector of QCD amplitudes, Phys. Rev. D 54
(1996) 7628, hep-th/9606061.
[39] C. B. Thorn, Notes on one-loop calculations in light-cone gauge, hep-th/0507213.
[40] D. Chakrabarti, J. Qiu, and C. B. Thorn, Scattering of glue by glue on the light-cone
worldsheet I: Helicity non-conserving amplitudes, Phys. Rev. D 72 (2005) 065022,
hep-th/0507280.
[41] D. Chakrabarti, J. Qiu, and C. B. Thorn, Scattering of glue by glue on the light-cone
worldsheet II: Helicity conserving amplitudes, Phys. Rev. D 74 (2006) 045018,
hep-th/0602026.
[42] H. Feng and Y.-T. Huang, MHV lagrangian for N=4 Super Yang–Mills,
hep-th/0611164.
[43] A. Brandhuber, B. Spence, and G. Travaglini, From trees to loops and back, J. High
Energy Phys. 0601 (2006) 142, hep-th/0510253.
[44] K. Bardakci and C. B. Thorn, A worldsheet description of large Nc quantum field
theory, Nucl.Phys. B 626 (2002) 287, hep-th/0110301.
[45] C. B. Thorn, A worldsheet description of planar Yang-Mills theory, Nucl. Phys. B 637
(2002) 272, hep-th/0203167.
[46] G. ’t Hooft, A planar diagram theory for strong interactions, Nucl. Phys. B 72 (1974)
[47] H. B. Nielsen and P. Olesen, A parton view on dual amplitudes, Phys. Lett. B 32
(1970) 203.
http://xxx.lanl.gov/abs/hep-th/0510111
http://xxx.lanl.gov/abs/hep-th/0511264
http://xxx.lanl.gov/abs/hep-th/0605121
http://xxx.lanl.gov/abs/hep-th/0612007
http://xxx.lanl.gov/abs/hep-th/9605208
http://xxx.lanl.gov/abs/hep-th/9610021
http://xxx.lanl.gov/abs/hep-th/9606061
http://xxx.lanl.gov/abs/hep-th/0507213
http://xxx.lanl.gov/abs/hep-th/0507280
http://xxx.lanl.gov/abs/hep-th/0602026
http://xxx.lanl.gov/abs/hep-th/0611164
http://xxx.lanl.gov/abs/hep-th/0510253
http://xxx.lanl.gov/abs/hep-th/0110301
http://xxx.lanl.gov/abs/hep-th/0203167
[48] B. Sakita and M. Virasoro, Dynamical model of dual amplitudes, Phys. Rev. Lett. 24
(1970) 1146.
[49] C. B. Thorn, Renormalization of quantum fields on the lightcone worldsheet. 1. Scalar
fields, Nucl. Phys. B 699 (2004) 427, hep-th/0405018.
[50] K. Bardakci and C. B. Thorn, A mean field approximation to the worldsheet model of
planar φ3 field theory, Nucl.Phys. B 652 (2003) 196, hep-th/0206205.
[51] K. Bardakci, Selfconsistent field method for planar φ3 theory, Nucl. Phys. B. 677
(2004) 354, hep-th/0308197.
[52] Z. Bern, L. Dixon, D. Dunbar, and D. Kosower, One-loop n-point gauge theory
amplitudes, unitarity and collinear limits, Nucl.Phys. B 425 (1994) 217,
hep-th/9403226.
[53] J. Qiu, One loop gluon gluon scattering in light cone gauge, Phys. Rev. D 74 (2006)
085022, hep-th/0607097.
[54] Z. Bern and D. A. Kosower, The computation of loop amplitudes in gauge theories,
Nucl. Phys. B 379 (1992) 451.
[55] S. Minwalla, M. Van Raamsdonk, and N. Seiberg, Noncommutative perturbative
dynamics, JHEP 0002 (2000) hep-th/9912072.
[56] Z. Bern, G. Chalmers, L. Dixon, and D. A. Kosower, One-loop n gluon amplitudes with
maximal helicity violation via collinear limits, Phys. Rev. Lett. 72 (1994) 2134,
hep-ph/9312333.
[57] F. Cachazo, P. Svrček, and E. Witten, Twistor space structure of one–loop amplitudes
in gauge theory, J. High Energy Phys. 10 (2004) 074, hep-th/0406177.
[58] R. Boels, L. Mason, and D. Skinner, Supersymmetric gauge theories in twistor space,
J. High Energy Phys. 0702 (2007) 014, hep-th/0604040.
[59] R. Boels, L. Mason, and D. Skinner, From twistor actions to MHV diagrams, Phys.
Lett. B 648 (2007) 90, hep-th/0702035.
[60] R. Boels, A quantization of twistor Yang–Mills theory through the background field
method, hep-th/0703080.
[61] J. H. Ettle, C.-H. Fu, J. P. Fudger, P. R. W. Mansfield, and T. R. Morris, S-matrix
equivalence theorem evasion and dimensional regularisation with the canonical MHV
lagrangian, J. High Energy Phys. 0705 (2007) 011, hep-th/0703286.
[62] D. A. Kosower, Next-to-maximal helicity violating amplitudes in gauge theory, Phys.
Rev. D71 (2005) 045007, hep-th/0406175.
http://xxx.lanl.gov/abs/hep-th/0405018
http://xxx.lanl.gov/abs/hep-th/0206205
http://xxx.lanl.gov/abs/hep-th/0308197
http://xxx.lanl.gov/abs/hep-th/9403226
http://xxx.lanl.gov/abs/hep-th/0607097
http://xxx.lanl.gov/abs/hep-th/9912072
http://xxx.lanl.gov/abs/hep-ph/9312333
http://xxx.lanl.gov/abs/hep-th/0406177
http://xxx.lanl.gov/abs/hep-th/0604040
http://xxx.lanl.gov/abs/hep-th/0702035
http://xxx.lanl.gov/abs/hep-th/0703080
http://xxx.lanl.gov/abs/hep-th/0703286
http://xxx.lanl.gov/abs/hep-th/0406175
[63] J. Bedford, A. Brandhuber, B. Spence, and G. Travaglini, A twistor approach to
one–loop amplitudes in N = 1 supersymmetric Yang–Mills theory, Nucl. Phys. B 706
(2005) 100, hep-th/0410280.
[64] E. Tomboulis, Quantization of the Yang-Mills field in the null-plane frame, Phys. Rev.
D 8 (1973) 2736.
[65] J. Scherk and J. H. Schwarz, Gravitation in the light cone gauge, Gen. Rel. Grav. 6
(1975) 537.
[66] D. M. Capper, J. J. Dulwich, and M. J. Litvak, On the evaluation of integrals in the
light cone gauge, Nucl. Phys. B241 (1984) 463.
http://xxx.lanl.gov/abs/hep-th/0410280
	Introduction
	Background
	The classical MHV Lagrangian
	A four–dimensional regulator for lightcone Yang–Mills
	The one–loop (++++) amplitude
	The all-plus amplitudes from a counterterm
	Mansfield transformation of LCT
	The four–point case
	The general all–plus amplitude
	Discussion
	Notation
	Details on the four–point calculation
ABSTRACT
  It has been known for some time that the standard MHV diagram formulation of
perturbative Yang-Mills theory is incomplete, as it misses rational terms in
one-loop scattering amplitudes of pure Yang-Mills. We propose that certain
Lorentz violating counterterms, when expressed in the field variables which
give rise to standard MHV vertices, produce precisely these missing terms.
These counterterms appear when Yang-Mills is treated with a regulator,
introduced by Thorn and collaborators, which arises in worldsheet formulations
of Yang-Mills theory in the lightcone gauge. As an illustration of our
proposal, we show that a simple one-loop, two-point counterterm is the
generating function for the infinite sequence of one-loop, all-plus helicity
amplitudes in pure Yang-Mills, in complete agreement with known expressions.

<|endoftext|><|startoftext|>
Fermi-liquid effects in the transresistivity in quantum Hall double layers near ν = 1/2
Natalya A. Zimbovskaya
Department of Physics and Astronomy, St. Cloud State University,
720 Fourth Avenue South, St. Cloud, MN 56301, USA;
Urals State Mining University, Kuibysheva Str. 30, Yekaterinburg, Russia, 620000
(Dated: November 4, 2018)
Here, we present theoretical studies of the temperature and magnetic field dependences of the
Coulomb drag transresistivity between two parallel layers of two dimensional electron gases in
quantum Hall regime near half filling of the lowest Landau level. It is shown that Fermi-liquid
interactions between the relevant quasiparticles could give a significant effect on the transresistivity,
providing its independence of the interlayer spacing for spacings taking on values reported in the
experiments. Obtained results agree with the experimental evidence.
PACS numbers: 71.27.+a 73.43.-f
During the last decade double-layer two-dimensional
electron gas (2DEG) systems were of significant interest
due to many remarkable phenomena they exhibit, includ-
ing so called Coulomb drag. In Coulomb drag experi-
ments two 2DEGs are arranged close to each other, so
that they can interact via Coulomb forces. A current I
is applied to one layer of the system, and the voltage VD
in the other nearby layer is measured, with no current
allowed to flow in that layer. The ratio −VD/I gives a
transresistivity ρD which characterizes the strength of
the effect. The physical interpretation of the Coulomb
drag is that momentum is tranferred from the current
carrying layer to the nearby one due to interlayer inter-
actions [1, 2, 3].
It was shown theoretically [4, 5] and confirmed with
experiments [5] that the transresistivity between two
2DEGs in quantum Hall regime at one half filling of
the lowest Landau level for both layers is proportional
to T 4/3 (T is the temperature of the system) which is
quite different from the temperature dependence of ρD
in the absence of the external magnetic field applied to
2DEGs. This temperature dependence of the drag at
ν = 1/2 originates from the ballistic contribution to
the transresistivity. The latter reflects the response of
the two-layer system to the driving disturbance of finite
wave vector q and finite frequency ω when considering
scales are smaller than the mean free path l of electrons
(ql ≫ 1) , and times are shorter than their scattering
time τ (ωτ ≫ 1) [6].
In further experiments [7] the Coulomb drag was mea-
sured between 2DEGs where the layer filling factor was
varied around ν = 1/2. The transresistivity was re-
ported to be enhanced quadratically with ∆ν = ν− 1/2.
It was also reported that the curvature of the enhance-
ment depended on temperature but it was insensitive to
both sign of ∆ν and distance d between the layers. The
present work is motivated with these experiments of [7].
We calculate the transresistivity between two layers of
2DEGs subject to a strong magnetic field which provides
ν close to 1/2 for both layers.
We start from the well-known expression [1, 3] which
relates the Coulomb drag transresistivity to density-
density components of the polarization in the layers
Π(1)(q, ω) and Π(2)(q, ω) :
2(2π)2
(2π)2
sinh2(h̄ω/2T )
∣U(q, ω)
ImΠ(1)(q,ω)ImΠ(2)(q,ω). (1)
Here, U(q, ω) is the screened interlayer Coulomb inter-
action, and electron densities in the layers are supposed
to be equal (n1 = n2 = n).
Within the usual Composite Fermion (CF) approach
[8] a single layer polarizability describes that part of the
density-current electromagnetic response which is irre-
ducible with respect to the Coulomb interaction. Adopt-
ing for simplicity the RPA, we obtain the following ex-
pression for the 2× 2 polarizability matrix:
Π−1 = (K0)−1 + C−1. (2)
Here, the matrix K0 gives the response of noninteract-
ing CFs and C is the Chern-Simons interaction matrix.
Assuming for certainty the wave vector q to lie in the
”x” direction we have:
. (3)
Starting from the expression (2) we arrive at the fol-
lowing results for the density-density response function
Π00(i)(q, ω) :
Π00(i)(q, ω) = Π(i)(q, ω)
00(i)
(q,ω)
8iπh̄
K001(i)(q,ω)−
∆(i)(q,ω)
. (4)
http://arxiv.org/abs/0704.0246v1
∆(i)(q,ω) = K
00(i)(q,ω)K
11(i)(q,ω) +
K001(i)(q,ω)
Within the RPA response functions included in Eqs. (4),
(5) are simply related to components of the CF conduc-
tivity tensor σ̃ [8]:
xx(q,ω)
00(i)
(q,ω)
00(i)
(q,0)
σ̃(i)yy (q,ω) = −
K011(i)(q,ω)−K
11(i)(q,0)
σ̃(i)xy = −σ̃
K001(i)(q,ω). (6)
To proceed we calculate the components of the CF
conductivity at ν slightly away from 1/2. In this case
CFs experience a nonzero effective magnetic field Beff =
B−B1/2. We concentrate on the ballistic contribution to
the transresistivity, so we need asymptotics for the rel-
evant conductivity components applicable in a nonlocal
(ql ≫ 1) and high frequency (ωτ ≫ 1) regime. Cor-
responding expressions for σ̃ij were obtained in earlier
works [8]. However, these results are not appropriate for
our analysis for they do not provide a smooth passage to
the Beff → 0 limit at finite q. Due to this reason we
do not use them in further calculations. To get a suitable
approximation for the CF conductivity we start from the
standard solution of the Boltzmann transport equation
for the CF distribution function. This gives us the fol-
lowing results for the CF conductivity components for a
single layer [9]:
σ̃αβ =
(2πh̄)2
dψvα(ψ) exp
′′)dψ′′
′) exp
′′)dψ′′
(ψ′ − ψ)(1− iωτ)
dψ′. (7)
Here, m∗, Ω are the CF effective mass and the cyclotron
frequency at the effective magnetic field Beff ; ψ is the
angular coordinate of the CF cyclotron orbit. Now we
carry out some formal transformations of this expression
(7) following the way proposed before [9, 10]. First, we
expand the CF velocity components vβ(ψ
′) in Fourier
series:
vkβ exp(ikψ
′). (8)
Substituting this expansion (8) into (7) we obtain:
(2πh̄)2
dψvα(ψ) exp(ikψ)
ikΩ− iω +
+ iqvx(ψ)
vx(ψ +Ωθ
′)− vx(ψ)
dθ (9)
where θ = (ψ′ − ψ)/Ω.
Then we introduce a new variable η which is related
to the variable θ as follows:
ikΩ− iω +
+ iqvx(ψ)
vx(ψ +Ωθ
′)− vx(ψ)
dθ′, (10)
and we arrive at the result:
σ̃αβ =
im∗e2
(2πh̄)2
vα(ψ) exp(ikψ)
ω + i/τ − kΩ− qvx(ψ +Ωθ)
dψ. (11)
Under the conditions of interest ωτ ≫ 1, ql ≫ 1, and
also assuming that the filling factor is close to ν = 1/2,
so that qvF ≫ Ω (vF is the CFs Fermi velocity), the
variable θ is approximately equal to ητ(1 + iql cosψ +
ikΩτ − iωτ)−1. Taking this into account and expanding
the last term in the denominator of (11) in powers of Ωθ
we obtain:
qvx(ψ +Ωθ)
≈ qvx(ψ) + ηΩqτ(1 + iql cosψ + ikΩτ − iωτ)−1
(Ωτ)2(1 + iql cosψ + ikΩτ − iωτ)−2
. (12)
Substituting this asymptotic expression into (9) we can
calculate first terms of the expansions of relevant compo-
nents of the CF conductivity in powers of the small pa-
rameter (qR)−1 where R = vF /Ω is the CF cyclotron
radius. Within the ”collisionless” limit 1/τ → 0 we
have:
σ̃xx = −N
1− δ2
(1− δ2)5
2(qR)2
1− δ2
; (13)
σ̃yy = N
1− δ2 + iδ +
2(qR)2
(1− δ2)5
(1− δ2)3
; (14)
σ̃xy = iN
1− δ2
(1− δ2)3
. (15)
Here, N = m∗/2πh̄2 is the density of states at the
CF Fermi surface, and δ = ω/qvF . Using these re-
sults we can easily get approximations for the functions
αβ(i)
(q, ω) (α, β = 0.1) and, subsequently, the de-
sired density-density response function given by (4). It
was shown [3] that the integral over ω in the expres-
sion for ρD (1) is dominated by ω ∼ T, and the ma-
jor contribution to the integral over q in this expres-
sion comes from q ∼ kF (T/T0)1/3, where kF is the
Fermi wave vector and the scaling temperature T0 is de-
fined below. Therefore we get an estimate for δ, namely
δ ∼ (T/µ)(T0/T )1/3, where µ is the chemical potential
of a single 2DEG included in the bilayer. For the param-
eter T0 taking on values of the order of room tempera-
ture, δ is small compared to unity at low temperatures
(T ∼ 1K).
Here, we limit ourselves to the case of two identical
layers (Π(1) = Π(2) ≡ Π) . For δ ≪ 1 we obtain the
approximation:
Π00(q, ω)
− 8πih̄ωkF
1 + 2(kFR)
(qR)−2
Here, dn/dµ is the compressibility of the ν = 1/2 state
which is defined as [3]:
≡ Π00(q → 0; ω → 0) =
8πh̄2
. (17)
This differs from the compressibility of the noninteracting
2DEG in the absence of an external magnetic field (the
latter equals N). The difference in the compressibility
values is a manifestation of the Chern-Simons interaction
in strong magnetic fields.
In following calculations we adopt the expression used
in the work [3] for the screened interlayer potential
U(q,ω), namely:
U(q,ω) =
Vb + Ub
1 + Π(q,ω)(Vb + Ub)
Vb − Ub
1 + Π(q,ω)(Vb − Ub)
where Vb(q) = 2πe
2/ǫq and Ub(q) = (2πe
2/ǫq)e−qd
are Fourier components of the bare Coulomb potentials
for intralayer and interlayer interactions, respectively,
and ǫ is the dielectric constant. Substitung (18) into
(1) and using our result (16) for Π(q, ω) we can present
the transresistvity in the ”ballistic” regime as:
ρD = ρD0 + δρD. (19)
Here, the first term ρD0 is the transresistivity at ν = 1/2
when the effective magnetic field is zero, and the sec-
ond term gives a correction arising in a nonzero effective
magnetic field (away from ν = 1/2 ). As it was to be
expected, our expression for ρD0 coincides with the al-
ready known result [3]:
ρD0 =
Γ(7/3)ζ(4/3)
with T0 =
πe2nd/ǫ
(1 + α), and
2πe2d
. (21)
The leading term of the correction δρD at low tempera-
tures (T/T0) ≪ 1 can be writen as follows:
δρD =
(kFR)2
ρD0∆ν
+ 4a2
(∆ν)2.
Here, the dimensionless positive constant a2 can be ap-
proximated as:
sinh2 y
y4/3 cosh2 y
dy. (23)
We have to remark that our result (23) cannot be
used in the limit T → 0. Actually, this expression pro-
vides a good asymptotic form for the coefficient a2 when
(TkF l/µ)
1/3 ≥ 1.5. Assuming that the mean free path
is of the order of 1.0µm as in the experiments [11] on
dc magnetotransport in a single modulated 2DEG at ν
close to 1/2, and using the estimate of [7] for the electron
density n = 1.4 × 1015m−2, we obtain that the expres-
sion (23) gives good approximation for a2 when T/µ is
no less than 10−2 .
It follows from our results (19), (22) that transresis-
tivity ρD enhances nearly quadratically with ∆ν when
the filling factor deviates from ν = 1/2. The linear in
∆ν term is also present in the expression for δρD. This
causes an asymmetric shape of the plot of Eq. (22) rel-
ative to ∆ν = 0. However, this asymmetry is not very
significant for the linear term is smaller than the last
term on the right hand side of (22). This difference in
magnitudes is due to different temperature dependences
of the considered terms. The first term including the lin-
ear in (kFR)
−1 correction is proportional to (T/T0)
whereas the second one is proportional to (T/T0)
2/3 and
predominates at low temperatures. So, the magnetic field
dependence of the transresistivity near ν = 1/2 matchs
that observed in the experiments (See Fig. 1).
Keeping only the greatest term in (22), the ratio
ρD/ρD0 can be presented in the form:
= 4β(∆ν)2 + 1. (24)
FIG. 1: Scaled drag resistivity versus ∆ν at T = 0.6; lowest
dashed curve is the plot of Eq. (22) at m∗ = 4mb; A0 = 15,
and remaining curves present experimental data of [7];
Here, the coefficient β equals:
Γ(7/3)ζ(4/3)
. (25)
This coefficient is proportional to the curvature of the
plot of Eq. (22) assuming that the first term is neglected.
The curvature reveales a strong dependence on tempera-
ture whose character also agrees with experiments of [7]
as it is shown in Fig. 2.
A striking feature in the experimental results is that
they appear to be nonsensitive to the distance between
the 2DEGs. Sets of data corresponding to samples with
different interlayer spacings dA = 10nm and dB =
22.5nm fall on the same curve. This concerns both mag-
netic field dependence of the transresistivity and tem-
perature dependence of the parameter β . Results of the
present analysis provide a possible explanation for this
feature. It follows from (20)–(25) that the dependence
of ρD of the interlayer spacing is completely included in
the characteristic temperature T0 which is defined with
Eq. (21). The above quantity is nearly independent of
the interlayer separation d when the parameter α takes
on values larger that unity. Estimating the parameter
α as it is given by Eq. (21), we obtain that the con-
dition α > 1 could be satisfied for small values of the
compressibility of the ν = 1/2 state. However, within
the RPA the effective mass of CFs coincides with the
single electron band mass mb which takes on the value
mb ≈ 0, 07me for GaAs wells (me is the mass of a free
FIG. 2: Temperature dependence of the coefficient β−1 for
interlayer distances d = 10nm (upper curve) and d =
22.5nm (lower curve) compared to the summary of experi-
mental curvature at both spacings [7]
electron). Using this value to estimate the compressibil-
ity as it is introduced by Eq. (17) we get α ≈ 0.44. This
is too small to provide insensitivity of the coefficient β
determined by Eq. (25) to the interlayer distance for in-
terlayer spacings reported in the experiments [3]. The
above discrepancy could be removed taking into account
Fermi liquid interactions among quasiparticles (CFs). To
include Fermi liquid effects into consideration we write
the renormalized polarizability Π∗ in the form [8]:
Π∗−1 = Π−1 + F(0) + F(1). (26)
Here, Π is the polarizability of noninteracting CFs de-
fined with Eq. (2), and the remaining terms present con-
tributions arising due to Fermi liquid interaction in the
CF system. Only contributions from the first and great-
est two terms in the expansion of the Fermi liquid in-
teraction function in Legendre polynomials ( f0 and f1 ,
respectively) are kept in Eq. (26) to avoid too lengthy
calculations. Matrix elements of the 2× 2 matrices F(0)
and F(1) equal:
F(0) =
F(1) =
m∗ −mb
m∗ −mb
Within the Fermi liquid theory the effective mass m∗
is related to the ”bare” mass mb as follows:
2πh̄2
1 +A1
. (28)
Using these expressions (26)–(28) and carrying out cal-
culations within the relevant limit δ ≪ 1, we obtain that
the expression for the density-density response function
for a single layer keeps the form given by Eq. (16) where
the compressibility dn/dµ is replaced with the quantity
dn∗/dµ renormalized due to the Fermi liquid interaction:
8πh̄2
8πh̄2
. (29)
For strongly correlated quasiparticles this renormaliza-
tion may significantly reduce the compressibility of the
CF liquid, and, consequently, increase the value of the
parameter α. It is usually assumed [3, 8] that the Fermi
liquid renormalization of the effective mass significantly
changes its value: m∗ ∼ 5 − 10 mb. This gives for the
Fermi liguid coefficient A1 values of the order of 10. Us-
ing this estimate, and substituting our renormalized com-
pressibility (29) into the expression (21) we arrive at the
conclusion that dn∗/dµ is low enough for the condition
α > 1, to be satisfied when the Fermi liquid parameter
A0 ≡ f0/2πh̄
2 takes on values of the order of 10 − 100.
This conclusion does not seem an unrealistic one for it
is reasonable to expect A0 to be of the order or greater
than the next Fermi liquid parameter A1. We obtain a
reasonably good agreement between the plot of our Eq.
(22) and the experimental results, using A0 = 15 and
A1 = 3 (m
∗ = 4mb). (Fig. 1).
Our results for temperature dependence of β−1 also
agree with the results of experiments [7]. The up-
per curve in Fig.2 corresponds to the double-layer sys-
tem with with smaller interlayer spacing dA = 10nm
which gives T0 = 487K, and the lower curve exhibits
the temperature dependence of β−1 for greater spacing
dB = 22.5nm (T0 = 587K). The curves do not coincide
but they are arranged rather close to each other.
Finally, the results of the present analysis enable us
to qualitatively describe all important features observed
in experiments of [7] on the Coulomb drag slightly away
from one half filling of lowest Landau levels of both in-
teracting 2DEG. They also give us grounds to treat these
experimental results as one more evidence of strong Fermi
liquid interaction in the CF system near one half filling
of the lowest Landau level. The above interaction pro-
vides a significant reduction of the compressibility of the
CF liquid and a consequent enhancement in the screen-
ing length in single layers. Essentially, the parameter α
characterizes the ratio of the Thomas–Fermi screening
length in a single 2DEG at ν = 1/2 and the separation
between the layers [3]. When α > 1, intralayer interac-
tions predominate those between the layers which could
be the reason for low sensitivity of the bilayer to changes
in the interlayer spacing. It is likely that here is an expla-
nation for the reported nearly independence of the drag
on the interlayer separation [7]. We believe that at larger
distances between the layers the dependence of the tran-
sresistivity of d could be revealed in the experiments.
At the same time the results of [7] give us a valuable
opportunity to estimate a strength of Fermi liquid inter-
actions between quasiparticles at ν = 1/2 state which is
important for further studies of such systems.
Acknowledgments: The author thank K.L. Haglin and
G.M. Zimbovsky for help with the manuscript.
[1] L. Zheng and A.H. MacDonald, Phys. Rev. B 48, 8203
(1993).
[2] A. Kamenev and Y. Oreg, Phys. Rev. B 52, 7516 (1995).
[3] I. Ussishkin and A. Stern, Phys. Rev. B 56, 4013 (1997).
[4] S. Sakhi, Phys. Rev. B 56, 4098 (1997);
[5] M.P. Lilly, J.P. Eisenstein, L.N. Pfeiffer and K.W. West,
Phys. Rev. Lett. 80, 1714 (1998).
[6] When the external driving disturbance applied to one of
the layers is of small q,ω (ql ≪ 1, ωτ ≪ 1) the tran-
sresisitivity is dominated with the diffusion contribution,
and new effects could emerge (See e.g. F. von Oppen,
S.H. Simon and A. Stern, Phys. Rev. Lett. 87, 106803
(2001) and references therein).
[7] M.P. Lilly, J.P. Eisenstein, L.N. Pfeiffer and K.W. West,
cond-mat/9909231.
[8] B.I. Halperin, P.A. Lee and N. Read, Phys. Rev. B 47,
7312 (1993); S.H. Simon and B.I. Halperin, Phys. Rev.
B 48, 17368 (1993).
[9] N.A. Zimbovskaya and J.L. Birman, Phys. Rev. B, 60
16762 (1999).
[10] N.A. Zimbovskaya, ”Local Geometry of the Fermi
Surface and High-Frequency Phenomena in Metals”,
Springer-Verlag, New York, 2001.
[11] R.L. Willett, K.W. West, and L.N. Pfeiffer Phys. Rev.
Lett. 83, 2624 (1999).
http://arxiv.org/abs/cond-mat/9909231
ABSTRACT
  Here, we present theoretical studies of the temperature and magnetic field
dependences of the Coulomb drag transresistivity between two parallel layers of
two dimensional electron gases in quantum Hall regime near half filling of the
lowest Landau level. It is shown that Fermi-liquid interactions between the
relevant quasiparticles could give a significant effect on the
transresistivity, providing its independence of the interlayer spacing for
spacings taking on values reported in the experiments. Obtained results agree
with the experimental evidence.

<|endoftext|><|startoftext|>
Introduction 2
2. G-invariant Killing spinors in 4D 4
2.1 Orbits of Dirac spinors under the gauge group 4
2.2 The ungauged theory 7
2.3 The gauged theory 8
2.4 Generalized holonomy 10
3. Null representative 1 + ae1 11
3.1 Constant Killing spinor, da = 0 12
3.2 Killing spinor with da 6= 0 15
3.3 Half-supersymmetric backgrounds 17
4. Timelike representative 1 + be2 19
4.1 Conditions from the Killing spinor equations 19
4.2 Geometry of spacetime 20
4.3 Half-supersymmetric backgrounds 25
4.4 Time-dependence of second Killing spinor 28
5. Timelike half-supersymmetric examples 33
5.1 Static Killing spinors and b = b(z) 34
5.1.1 AdS2 ×H2 space-time (α = 0) 36
5.1.2 AdS4 space-time (β
2 = 4αγ) 39
5.1.3 The Reissner-Nordström-Taub-NUT-AdS4 family 41
5.2 Harmonic b solutions 43
5.2.1 Deformations of AdS2 ×H2 44
5.2.2 Deformations of AdS4 44
5.2.3 Deformations of Reissner-Nordström-Taub-NUT-AdS4 44
5.3 Imaginary b solutions 45
5.4 Action of the PSL(2,R) group on the imaginary b solutions 49
5.5 Gravitational Chern-Simons system and G0 = ψ− = 0 solutions 50
6. Final remarks 52
A. Spinors and forms 54
– 1 –
B. Spinor bilinears 56
C. The case P ′ = 0 57
D. Half-supersymmetric solutions with G0 = 0 58
1. Introduction
Throughout the history of string and M-theory an important part in many develop-
ments in the subject has been played by supersymmetric solutions of supergravity,
i.e. by backgrounds which admit a number of Killing spinors ǫ which are parallel with
respect to the supercovariant derivative1: Dµǫ = 0. Due to their ubiquitous role it has
long been realised that it would be advantageous to have classifications of all super-
symmetric solutions of a given theory.
For purely gravitational backgrounds the supersymmetric possibilities follow from
the Berger classification of the possible Riemannian holonomies [1] (see [2, 3] for an
extension to the Lorentzian case). However, in the presence of additional force fields
(carried by e. g. scalars, gauge potentials or a cosmological constant) it has proven very
difficult to obtain knowledge of all supersymmetric possibilities.
The reason for the complication in the presence of additional fields lies in the
holonomy of the supercurvature Rµν = D[µDν]. For purely gravitational backgrounds
the holonomy of the supercurvature is generically given by H = Spin(d − 1, 1) in d
dimensions, and hence coincides with the Lorentz group. In such cases the Lorentz
gauge freedom allows one to choose constant Killing spinors. Another simplification is
that if there is one Killing spinor with a specific stability subgroup, i.e. it is invariant
under some Lorentz subgroup, all other spinors with the same stability subgroup are
Killing as well.
For more general solutions including fields other than gravity, the holonomy is
generically extended to a larger group H ⊃ Spin(d− 1, 1). For example, in the present
paper we consider gauged minimal four-dimensional N = 2 supergravity, which has
H = GL(4,C) [4]. In such cases one cannot choose constant Killing spinors nor are all
spinors with the same stability subgroup automatically Killing. For these reasons the
1For the purpose of this discussion we will ignore possible additional Killing spinor equations coming
from the variation of dilatinos and gauginos.
– 2 –
classification of the backgrounds that allow for Killing spinors is more convoluted, or
richer, in such cases. For a long time the only classification available was in ungauged
minimal four-dimensional N = 2 supergravity [5, 6], which has H = SL(2,H).
A new impulse was given to the subject with the introduction of G-structures and
the method of spinor bilinears to solve the Killing spinor equations [7]. In this approach,
space-time forms are constructed as bilinears from a Killing spinor and one analyses
the constraints that these forms imply for the background. Using this framework, a
number of complete classifications [8–10] and many partial results (see e.g. [11–21] for
an incomplete list) have been obtained. By complete we mean that the most general so-
lutions for all possible fractions of supersymmetry have been obtained, while for partial
classifications this is only available for some fractions. Note that the complete classi-
fications mentioned above involve theories with eight supercharges and H = SL(2,H),
and allow for either half- or maximally supersymmetric solutions.
An approach which exploits the linearity of the Killing spinors has been proposed
[22] under the name of spinorial geometry. Its basic ingredients are an explicit oscillator
basis for the spinors in terms of forms and the use of the gauge symmetry to transform
them to a preferred representative of their orbit. In this way one can construct a
linear system for the background fields from any (set of) Killing spinor(s) [23]. This
method has proven fruitful in e.g. the challenging case of IIB supergravity [24–26].
In addition, it has been adjusted to impose ’near-maximal’ supersymmetry and thus
has been used to rule out certain large fractions of supersymmetry [27–30]. Finally, a
complete classification for type I supergravity in ten dimensions has been obtained [32].
In the present paper we would like to address the classification of supersymmetric
solutions in four-dimensional minimal N = 2 supergravity. As will also be reviewed in
section 2, the ungauged case has been classified completely [5,6]. For the gauged case,
the discussion of 1/4 supersymmetry splits up in a time-like and a light-like class (de-
pending on the causal nature of the Killing vector associated to the Killing spinor). The
time-like class is completely specified by a single complex function depending on three
spatial coordinates b = b(z, w, w̄), subject to a second-order differential equation which
can not be solved in general [13]. The light-like class can be given in all generality, and
in addition its restriction to 1/2-BPS solutions has been derived [16]. Furthermore,
there are no backgrounds with 3/4 supersymmetry [29] and AdS4 is the unique possi-
bility with maximal supersymmetry. Therefore the remaining open question concerns
half-supersymmetric backgrounds in the gauged theory2.
In the following, we will first re-analyse the 1/4-supersymmetric backgrounds us-
ing the method of spinorial geometry, and in fact find an additional possibility in the
2The addition of external matter was considered in [31].
– 3 –
light-like case: a half supersymmetric bubble of nothing in AdS4 and its Petrov type
II generalization, a new 1/4 BPS configuration that has the interpretation of grav-
itational waves propagating on the bubble of nothing. This completes the analysis
of the null class in all its generality. Then we will derive the constraints for half-
supersymmetric backgrounds for the timelike class. Subject to a single assumption on
the time-dependence of the second Killing spinor these will be solved in general, up to a
second order ordinary differential equation. The assumption will be justified by solving
the full set of conditions in a number of examples which illustrate the possible spatial
dependence of b. All these cases turn out to have time-dependence of the assumed
form. The different examples are:
• the b = b(z) family of solutions, comprising part of the Reissner-Nordström-Taub-
NUT-AdS4 backgrounds,
• waves on the previous backgrounds with b = b(z, w),
• solutions with b imaginary and their PSL(2,R) transformed counterparts,
• solutions of the dimensionally reduced gravitational Chern-Simons model that
can be embedded in the equations for a timelike Killing spinor [16].
We determine when these backgrounds preserve 1/2 supersymmetry and provide the
explicit Killing spinors. Moreover, in the subcases consisting of AdS4 and AdS2×H2, the
action of the isometries of these backgrounds on the Killing spinors is given explicitly.
The outline of this paper is as follows. In section 2, we discuss the orbits of Killing
spinors and review the known classification results in the theory at hand. In section
3, we go through the complete classification of the null class. In section 4, we discuss
the constraints for 1/4 and 1/2 supersymmetry in the timelike class. We derive the
time-dependence of the second Killing spinor and solve the equations for the case of
linear time-dependence (G0 = 0). A number of examples of the 1/2 BPS timelike class
are provided in section 5. Finally, in section 6 we present our conclusions and outlook.
In appendix A we review our notation and conventions for spinors, while in appendix B
the associated bilinear forms are given. Appendix C deals with the special case P ′ = 0,
to be defined in section 4.4. Finally, in appendix D, we will give the details of the
G0 = 0 case.
2. G-invariant Killing spinors in 4D
2.1 Orbits of Dirac spinors under the gauge group
In order to obtain the possible orbits of Spin(3,1) in the space of Dirac spinors ∆c,
– 4 –
we first consider the most general positive chirality spinor3 a1 + be12 (a, b ∈ C) and
determine its stability subgroup. This is done by solving the infinitesimal equation
αcdΓcd(a1 + be12) = 0 . (2.1)
First of all, notice that a1 + be12 is in the same orbit as 1, which can be seen from
eγΓ13eψΓ12eδΓ13ehΓ02 1 = ei(δ+γ)eh cosψ 1 + ei(δ−γ)eh sinψ e12 .
This means that we can set a = 1, b = 0 in (2.1), which implies then α02 = α13 = 0,
α01 = −α12, α03 = α23. The stability subgroup of 1 is thus generated by
X = Γ01 − Γ12 , Y = Γ03 + Γ23 . (2.2)
One easily verifies that X2 = Y 2 = XY = 0, and thus exp(µX + νY ) = 1 + µX + νY ,
so that X, Y generate R2.
Spinors of negative chirality are composed of odd forms, i.e. ae1 + be2. One can
show in a similar way that they are in the same orbit as e1, and the stability subgroup
is again R2, with the above generators X, Y .
For definiteness and without loss of generality we will always assume that the first
Killing spinor has a non-vanishing positive chirality component, and use (part of) the
Lorentz symmetry to bring this to the form 1. Hence we can write a general spinor as
1 + ae1 + be2. Now act with the stability subgroup of 1 to bring ae1 + be2 to a special
form:
(1 + µX + νY )(1 + ae1 + be2) = 1 + be2 + [a + 2b(ν − iµ)]e1 .
In the case b = 0 this spinor is invariant, so the representative is 1+ ae1, with isotropy
group R2. If b 6= 0, one can bring the spinor to the form 1+ be2, with isotropy group I.
The representatives4 together with the stability subgroups are summarized in table 1.
In the ungauged theory, we therefore can have the following G-invariant Killing
spinors. The R2-invariant Killing spinors are spanned by 1 and e1 and there can be up
to four of these. The I-invariant Killing spinors are spanned by all four basis elements
and there can be up to eight of these. In the first two case, the vector Va bilinear in the
spinor ǫ is lightlike, whereas in the last case it is timelike, see table 1. The existence of
a globally defined Killing spinor ǫ, with isotropy group G ∈ Spin(3,1), gives rise to a
G-structure. This means that we have an R2-structure in the null case and an identity
structure in the timelike case.
3Our conventions for spinors and their description in terms of forms can be found in appendix A.
4Note the difference in form compared to the Killing spinors of the corresponding theories in five
and six dimensions: in six dimensions these can be chosen constant [9] while in five dimensions they
are constant up to an overall function [28]. In four dimensions such a choice is generically not possible.
– 5 –
In U(1) gauged supergravity, the local Spin(3,1) invariance is actually enhanced
to Spin(3,1) × U(1). Thus, in order to obtain the stability subgroup, one determines
the Lorentz transformations that leave a spinor invariant up to an arbitrary phase
factor, which can then be gauged away using the additional U(1) symmetry. For the
representative 1, one gets in this way an isotropy group generated by X, Y and Γ13
obeying
[Γ13, X ] = −2Y , [Γ13, Y ] = 2X , [X, Y ] = 0 ,
i. e. G ∼= U(1)⋉R2. For ǫ = 1 + ae1 with a 6= 0, the stability subgroup R2 is not
enhanced, whereas the I of the representative 1+ be2 is promoted to U(1) generated by
Γ13 = iΓ•̄•. The Lorentz transformation matrix aAB corresponding to Λ = exp(iψΓ•̄•) ∈
U(1), with ΛΓBΛ
−1 = aABΓA, has nonvanishing components
a+− = a−+ = 1 , a••̄ = e
2iψ , a•̄• = e
−2iψ . (2.3)
Finally, notice that in U(1) gauged supergravity one can choose the function a in 1+ae1
real and positive: Write a = R exp(2iδ), use
eδΓ13(1 + ae1) = e
iδ1 + e−iδae1 = e
iδ(1 +Re1) ,
and gauge away the phase factor exp(iδ) using the electromagnetic U(1).
ǫ G ⊂ Spin(3,1) G ⊂ Spin(3,1) × U(1) Va = D(ǫ,Γaǫ)
1 R2 U(1)⋉R2 (1, 0,−1, 0)
1 + ae1 R
2 (a ∈ R) (1 + |a|2, 0,−1− |a|2, 0)
1 + be2 I U(1) (1 + |b|2, 0,−1 + |b|2, 0)
Table 1: The representatives ǫ of the orbits of Dirac spinors and their stability subgroups G
under the gauge groups Spin(3,1) and Spin(3,1) × U(1) in the ungauged and gauged theories,
respectively. The number of orbits is the same in both theories, the only difference lies in the
stability subgroups and the fact that a is real in the gauged theory. In the last column we
give the vectors constructed from the spinors.
In the gauged theory the classification of G-invariant spinors is therefore slightly
more complicated. There can be at most two U(1)⋉R2-invariant Killing spinors,
spanned by 1. The four R2-invariant spinors are spanned by 1 and e1. Then there
are the U(1)-invariant spinors, spanned by 1 and e2. Finally, for generic enough Killing
spinors, one does not fall in any of the above classes and the common stability subgroup
is I. Note that in the gauged theory the presence of G-invariant Killing spinors will
in general not lead to a G-structure on the manifold but to stronger conditions. The
– 6 –
structure group is in fact reduced to the intersection of G with Spin(3,1), and hence is
equal to the stability subgroup in the ungauged theory.
We will now consider the possible supersymmetric solutions to the equation Dµǫ =
0 in various sectors of N = 2, D = 4 in terms of the stability subgroup G of the Killing
spinors.
2.2 The ungauged theory
The supercovariant derivative of ungauged minimal N = 2 supergravity in four dimen-
sions reads
Dµ = ∂µ +
ωabµ Γab +
FabΓabΓµ . (2.4)
As mentioned in the introduction, a first point to notice is that there is no complex
conjugation on the Killing spinor. Therefore, the number of supersymmetries that are
preserved is always even: if ǫ is Killing, then so is iǫ.
First consider purely gravitational solutions with F = 0. In this case the superco-
variant connection truncates to the Levi-Civita connection and has Spin(3,1) holonomy.
This implies the following. If ǫ is Killing, then so are5 Γ3∗ǫ and Γ012∗ǫ (where ∗ denotes
complex conjugation). Together, the operations i, Γ3∗ and Γ012∗ generate four linearly
independent Killing spinors from any null spinor ǫ = 1 or ǫ = 1+ae1 and eight from any
time-like spinor ǫ = 1+ be2. This illustrates the general statement in the introduction:
if the gauge group equals the holonomy, as in this case, then there is only one possible
number of Killing spinors for every stability subgroup. Therefore there are only two
classes of supersymmetric solutions, which are listed in table 2, and which consist of
the gravitational wave and Minkowski space-time, respectively.
G = \ N = 4 8
Table 2: Gravitational solutions with G-invariant Killing spinors in the ungauged theory.
Now let us also allow for fluxes F . The supercovariant connection no longer equals
the Levi-Civita connection due to the flux term. In particular, this implies that Γ012∗ no
longer commutes with Dµ. However, this does still hold for the other operation: Γ3 ∗ ǫ
is Killing provided ǫ is. The combined operations of i and Γ3∗ generate four linearly
5These operations anti-commute and commute with the Γ-matrices, respectively.
– 7 –
independent spinors from any null or time-like spinor. Thus the number of supersym-
metries is always N = 4p, as illustrated in table 3. Indeed the generalised holonomy of
the supercovariant connection in the ungauged case is SL(2,H) [4], consistent with the
supersymmetries coming in quadruplets.
G = \ N = 4 8
Table 3: General solutions with G-invariant Killing spinors in the ungauged theory.
The half-supersymmetric solution have been classified by Tod [5] and consist of the
plane wave and the Israel-Wilson-Perjes metric, respectively. The maximally supersym-
metric solutions are AdS2 × S2 and its Penrose limits, the Hpp wave and Minkowski
space-time [6].
2.3 The gauged theory
The supercovariant derivative of gauged minimalN = 2 supergravity in four dimensions
reads
Dµ = ∂µ +
ωabµ Γab − iℓ−1Aµ + 12ℓ
−1Γµ +
FabΓabΓµ . (2.5)
Due to the gauging the structure of Γ-matrices is richer, but there still is no complex
conjugation on the Killing spinor. Therefore, the number of supersymmetries that are
preserved is always even: if ǫ is Killing, then so is iǫ.
Again, we first consider the purely gravitational solutions. In this case the super-
covariant derivative has SO(3,2) holonomy. The operation Γ012∗ commutes with Dµ
and therefore generates additional Killing spinors. Together, the operations i and Γ012∗
generate four linearly independent Killing spinors from generic null or time-like spinors.
The exception is the null spinor ǫ = 1+e1, in which case ǫ and Γ012∗ are linearly depen-
dent, and hence allows for two instead of four Killing spinors. The possibilities allowed
for by this analysis of the supercovariant derivative can be found in table 4.
However, although all these entries are allowed for by the spinor orbit structure
and the crude analysis of the supercurvature above, not all of them have an actual
field theoretic realisation in supergravity. In other words, there are no solutions to the
Killing spinor equations for all of the above sets of Killing spinors. The lightlike cases
were considered in [16]: The 1/4-BPS case is the Lobatchevski wave while imposing
more supersymmetries leads to the maximally supersymmetric AdS4 solution (with
– 8 –
G = \ N = 2 4 6 8
U(1)⋉ R2 × × × ×
2 √ ◦ × ×
U(1) × ◦ × ×
I × ◦ ◦
Table 4: Gravitational solutions with G-invariant Killing spinors in the gauged theory. Check
marks indicate entries with actual solutions, while circles stand for allowed entries which are
not realized.
G=1). The N = 4 and G = R2 entry is thus effectively empty. In particular, this
implies that imposing a single Killing spinor 1 + ae1 with a 6= 1 leads to AdS4. Also
note that the N = 6 and G = 1 entry must be empty since any time-like spinor plus
1+e1 leads to maximal supersymmetry, while all other Killing spinors come in groups of
four. The only remaining entries are N = 4 and G = U(1) or G = I. Using the results
of [13,16], it is straightforward to show that in these purely gravitational timelike cases
the geometry is given by
ds2 = −z
2 + n2
(dt− 2n cosh θdφ)2 + ℓ
z2 + n2
+ (z2 + n2)(dθ2 + sinh2 θdφ2) ,
where n = ±ℓ/2. But this is simply AdS4 written as a line bundle over a three-
dimensional base manifold, so both N = 4 entries are empty as well. We conclude that
there are no 1/2-supersymmetric gravitational solutions in the gauged theory, only the
1/4-supersymmetric Lobatchevski waves and maximally supersymmetric AdS4.
We now come to the general supersymmetric solutions in the gauged case. Due
to the gauging and flux terms, neither Γ012∗ nor Γ3∗ commute with Dµ. Therefore
we have the cases as listed in table 5. The supercovariant connection in the gauged
case has generalized holonomy GL(4,C) [4], again consistent with the supersymmetries
coming in doublets.
The 1/4-BPS solutions with G = R2 and G = U(1) were derived in [13], and we
will show there is no solution with G = U(1)⋉R2. In addition, it was shown in [16]
that any additional supersymmetries in the null case are always timelike, i.e. end up
in the N = 4 and G = 1 entry. Again, the N = 4 and G = R2 entry is empty. It
would be interesting to see if there is a nice explanation for this. In addition, the
maximally supersymmetric case is always AdS4. Recently, it has been shown in [29]
that the N = 6 and G = 1 entry is empty as well, because imposing three complex
Killing spinors implies that the spacetime is AdS4 and thus maximally supersymmetric.
– 9 –
G = \ N = 2 4 6 8
U(1)⋉ R2 ◦ × × ×
2 √ ◦ × ×
Table 5: General solutions with G-invariant Killing spinors in the gauged theory. Check
marks indicate entries with actual solutions, while circles stand for allowed entries which are
not realized.
The most general 1/2-BPS solution in the timelike case remains an open issue and will
be studied in this paper.
2.4 Generalized holonomy
In minimal gauged supergravity theories with eight supercharges, the generalized holon-
omy group for vacua preserving N supersymmetries, whereN = 0, 2, 4, 6, 8, is GL(8−N
2 [4]. To see this, assume that there exists a Killing spinor ǫ1. By a local
GL(4,C) transformation, ǫ1 can be brought to the form ǫ1 = (1, 0, 0, 0)
T . This is
annihilated by matrices of the form
that generate the affine group A(3,C) ∼= GL(3,C)⋉C3. Now impose a second Killing
spinor ǫ2 = (ǫ
2, ǫ2)
T . Acting with the stability subgroup of ǫ1 yields
eAǫ2 =
ǫ02 + b
, where bT = aTA−1(eA − 1) .
We can choose A ∈ gl(3,C) such that eAǫ2 = (1, 0, 0)T , and b such that ǫ02 + bT ǫ2 = 0.
This means that the stability subgroup of ǫ1 can be used to bring ǫ2 to the form
ǫ2 = (0, 1, 0, 0). The subgroup of A(3,C) that stabilizes also ǫ2 consists of the matrices
1 0 b2 b3
0 1 B12 B13
0 0 B22 B23
0 0 B32 B33
∈ GL(2,C)⋉ 2C2 .
Finally, imposing a third Killing spinor yields GL(1,C) ⋉ 3C as maximal generalized
holonomy group, which is however not realized in N = 2, D = 4 minimal gauged
– 10 –
supergravity [16, 29]. It would be interesting to better understand why such preons
actually do not exist. In section 4.3, we explicitely compute the generalized holonomy
group for N = 2, D = 4 minimal gauged supergravity in the case N = 2 and show that
it is indeed contained in A(3,C), supporting thus the classification scheme of [4].
3. Null representative 1 + ae1
In this section we will analyse the conditions coming from a single null Killing spinor.
As we saw in section 2.1, there are two orbits of such spinors, one with representative
ǫ = 1 and stability subgroup G = U(1)⋉R2 and one with ǫ = 1 + ae1 and G = R
Owing to local U(1) gauge invariance, it is always possible to choose the function a real
and positive, so in the following we set a = eχ, χ ∈ R. The Killing spinor equations
become
E •̄ − 2iF+•̄E−
= 0 ,
E •̄ + 2iF+•̄E−
= 0 ,
ω−• +
2iF−•E •̄ +
= 0 ,
ω−• +
−2iF−•E •̄ +
= 0 , (3.1)
where φ ≡ F+− + F •̄• and Ω ≡ ω+− + ω•̄•.
The conditions for the special U(1)⋉R2-orbit with ǫ = 1 can be obtained as the
singular limit χ → −∞ of the above equations. Note however that, in this limit, the
second line implies the constraint ℓ−1−iφ = 0, while the fourth line leads to ℓ−1+iφ = 0.
Clearly, for ℓ−1 6= 0 this does not allow for a solution. Hence, in the gauged theory,
there are no backgrounds with U(1)⋉R2-invariant Killing spinors.
The only null possibility is therefore given by the R2-invariant Killing spinor ǫ =
1 + eχe1. We will now analyse the above conditions for the generic case with χ finite.
In fact, we will furthermore assume it is positive. This does not constitute any loss of
generality since one can flip the sign of χ by changing chirality (a spinor 1 + eχe1 with
χ negative is gauge equivalent to a spinor e1 + e
χ̃1 with χ̃ = −χ positive), and hence
the resulting background will not depend on this sign.
From the last two equations one obtains the constraints
F−• = F−•̄ = 0 , φ = − i
tanhχ (3.2)
– 11 –
on the field strength, as well as
ω−• = ω−•̄ = − 1√
2ℓ coshχ
E− (3.3)
for the spin connection. (3.2) implies F+− = 0 and F •̄• = − i
tanhχ. The first two
equations of (3.1) yield then
ω+− = 2eχH3E
− − 1
coshχ
ω•̄• = 2i sinhχH1E
cosh 2χ
coshχ
A = −ℓ coshχH1E− − sinhχE3 ,
dχ = −2 coshχH3E− +
sinhχE1 , (3.4)
where E1 = (E• + E •̄)/
2, iE3 = (E• − E •̄)/
2, and we defined
F+• + F+•̄√
= H1 ,
F+• − F+•̄√
= iH3 .
In order to proceed, we distinguish two subcases, namely dχ = 0 and dχ 6= 0.
3.1 Constant Killing spinor, da = 0
If a and hence χ are constant, eqn. (3.4) implies χ = H3 = 0. Next we impose vanishing
torsion. The torsion two-form reads
T− = dE− +
E1 ∧ E− ,
T+ = dE+ − E1 ∧
ω+1 +
+ ω+3 ∧ E3 ,
T 1 = dE1 + E− ∧
ω+1 +
T 3 = dE3 +
E1 ∧ E3 − ω+3 ∧ E− . (3.5)
From T− = 0 one gets E−∧dE− = 0, so by Fröbenius’ theorem there exist two functions
η and u such that locally
E− = ηdu .
Plugging this into T− = 0 yields
d log η +
∧ du = 0 ,
– 12 –
so that there exists a function ξ such that
E1 = − ℓ
dη + ξdu .
The gauge field and its field strength can now be written as
A = −ℓηH1du , F =
H1dη ∧ du ,
and the Bianchi identity F = dA implies
dH1 +
H1d log η
∧ du = 0 .
This means that H1η
3/2 can depend only on u,
3/2 = −ϕ
where the prefactor and the derivative were chosen in order to conform with the notation
of [13]. Let us define a new coordinate x = −η−1/2, so that E1 = ℓ
dx+ξdu, E− = x−2du
A = −xϕ′(u)du . (3.6)
One can now use part of the residual gauge freedom, given by the stability subgroup
2 of the null spinor 1 + ae1, in order to simplify E
1. To this end, consider an R2
transformation with group element
Λ = 1 + µX + νY ,
where X and Y are given in (2.2). Defining α = µ+ iν, this can also be written as
Λ = 1 + αΓ+• + ᾱΓ+•̄ . (3.7)
Given the ordering A,B = +,−, •, •̄, the Lorentz transformation matrix aAB corre-
sponding to Λ ∈ R2 ⊆ Spin(3,1) reads
aAB =
0 1 0 0
1 −4|α|2 2ᾱ 2α
0 −2ᾱ 0 1
0 −2α 1 0
. (3.8)
The transformed vielbein αEA = aABE
B is thus given by
αE• = E• − 2αE− , αE1 = E1 −
2 (α + ᾱ)E− ,
αE •̄ = E •̄ − 2ᾱE− , αE3 = E3 +
2i (α− ᾱ)E− ,
αE− = E− , αE+ = E+ + 2ᾱE• + 2αE •̄ − 4|α|2E− . (3.9)
– 13 –
Choosing α + ᾱ = ξx2/
2, we can eliminate E1u, so one can set ξ = 0 without loss
of generality. Note that this still leaves a residual gauge freedom associated to the
imaginary part of α, which will be used below.
From dT 3 = 0 we get d(ω+3/x) ∧ du = 0, and thus there exist two functions β, β̃
such that
ω+3 = −xdβ + β̃du .
Plugging this into T 3 = 0 yields d(xE3 + βdu) = 0, which is solved by
E3 = − ℓ
dy + βdu , (3.10)
where y denotes some function that we shall use as a coordinate. Using the remaining
gauge freedom (3.8) with Imα = −βx2/2
2 allows to set also β = 0. The equation
T 1 = 0 tells us that ω+1 + E+/ℓ = γdu for some function γ. Using this together with
T+ = 0, one shows that
E− ∧ E+
E− ∧ E+
which means that the surface described by E− and E+ is integrable, so that
E+ = ℓ2
du+ hdV , (3.11)
for some functions G, h, V . The metric becomes then
ds2 = 2E−E+ +
Gdu2 + 2h
dudV + dx2 + dy2
. (3.12)
Finally, the equation T+ = 0 implies
∂xh = ∂yh = 0 , ∂V G =
∂uh , (3.13)
∂xG , β̃ = −
∂yG .
h can be eliminated by introducing a new coordinate v(u, V ) with ∂V v = h/ℓ
2 and
shifting G → G + 2∂uv, which leads to
ds2 =
Gdu2 + 2dudv + dx2 + dy2
. (3.14)
Note that, due to (3.13), G is independent of v, therefore ∂v is a Killing vector. One
easily verifies that it coincides with the Killing vector constructed from the Killing
spinor as − ℓ2
D(ǫ,Γµǫ).
– 14 –
All that remains is to impose the Maxwell and Einstein equations. One finds that
the former are automatically satisfied by the gauge potential (3.6). The same holds for
the Einstein equations, except for the uu-component, which gives the Siklos equation
with sources
∆G − 2
∂xG = −
ϕ′(u)2 . (3.15)
This family of solutions enjoys a large group of diffeomorphisms which leave the solution
invariant in form but change the function G. This is the Siklos-Virasoro invariance,
discussed in [16, 33]. In conclusion, the geometry of solutions admitting the constant
null spinor 1 + e1 is given by the Lobachevski waves with metric (3.14) and gauge field
(3.6), where G satisfies (3.15) and ϕ(u) is arbitrary. This coincides exactly with the
results of [13], where it was shown moreover that there is a second covariantly constant
spinor iff the wave profiles G and ϕ have the form
Gα(x, y, u) = −
+ 2αx3 − α2ℓ2(x2 + y2) , ϕ(u) = u , (3.16)
up to Siklos-Virasoro transformation, with α ∈ R constant. In this case, the solution
does also belong to the timelike class [13]. While the α 6= 0 solution only has the
obvious Killing vectors ∂v and ∂y, the special α = 0 case is maximally symmetric with
a five-dimensional isometry group.
3.2 Killing spinor with da 6= 0
If da and hence also dχ do not vanish, one can use the R2 stability subgroup of the
spinor 1 + eχe1 to eliminate the fluxes F+• and F+•̄. To see this, observe that under
an R2 transformation (3.8),
αF+• = F+• − 2iα
tanhχ , αF •̄• = F •̄• ,
so by choosing α = − iℓ
F+• cothχ one can achieve αF+• = 0. Note that this would not
be possible if χ = 0. With this gauge fixing, one has
sinhχE1 , A = − sinhχE3 , F = −1
tanhχE1 ∧ E3 . (3.17)
Next we impose vanishing torsion. Using (3.17), one easily shows that T− = 0 leads to
e2χ − 1
= 0 ,
and therefore one can introduce a function u with
e2χ − 1
E− = du . (3.18)
– 15 –
Before we come to the other torsion components, let us consider the Bianchi identity
and the Maxwell equations. The gauge field strength reads
F = dχ
sinh 2χ
Requiring it to be equal to dA implies that A/
tanhχ is closed, so that locally
tanhχdΨ . (3.19)
Note that the functions χ, u and Ψ must be independent, because otherwise E1, E−
and E3 would not be linearly independent. We can thus use these three functions as
coordinates.
Using
∗F = −1
tanhχE− ∧ E+ ,
the Maxwell equations d∗F = 0 imply
E− ∧ E+
sinh 2χ
E− ∧ E+
= 0 .
By Fröbenius’ theorem and (3.18), E+ can thus be written as
du+ hdV ,
where K̃, h and V are some functions, and we can use V as the remaining coordinate.
Substituing E+ into the Maxwell equations one obtains a constraint on the function h,
e2χ + 1
∧ du ∧ dV = 0 ,
and hence
h = h0(u, V )
e2χ + 1
In what follows, we define K = K̃/(e2χ + 1) and use ω+1 = (ω+• + ω+•̄)/
2, ω+3 =
(ω+• − ω+•̄)/
2i. We now come to the remaining torsion components. From T 3 = 0
and T 1 = 0 one obtains respectively
ω+3 = AE− , ω+1 = − E
ℓ coshχ
+BE− ,
where A and B are some functions to be determined. Finally, T+ = 0 yields
∂VK = 2∂uh0 , A = −
e4χ − 1
) sinhχ√
tanhχ
∂ΨK , B =
e4χ − 1
sinhχ∂χK .
– 16 –
The line element is given by
ds2 = 2E−E+ +
= cothχ
Kdu2 + 2h0dudV
ℓ2dχ2
4 sinh2 χ
sinhχ coshχ
. (3.20)
As before, one can eliminate h0 by introducing a new coordinate v(u, V ) with ∂V v = h0
and shifting K → K + 2∂uv, whereupon the metric becomes
ds2 = cothχ
Kdu2 + 2dudv
ℓ2dχ2
4 sinh2 χ
sinhχ coshχ
. (3.21)
Notice that, owing to (3.20), K is independent of v, therefore ∂v is a Killing vector.
It coincides with the Killing vector −
2D(ǫ,Γµǫ) constructed from the Killing spinor.
All that remains now is to impose Einstein’s equations. One finds that they are all
satisfied except for the uu component, which yields again a Siklos-type equation for K,
∂2ΨK + 4 tanhχ∂2χK −
cosh2 χ
∂χK = 0 . (3.22)
In conclusion, the bosonic fields for a configuration admitting a null Killing spinor
with dχ 6= 0 are given by (3.19) and (3.21), with K satisfying (3.22)6. As we will
discuss in section 5.3, the K = 0 solution is of Petrov type D and represents a bubble
of nothing in anti-De Sitter space-time. When K 6= 0, the metric becomes of Petrov
type II and the Weyl scalar signalling the presence of gravitational radiation acquires
a non-vanishing value. Hence the general solution represents a gravitational wave on a
bubble of nothing. To our knowledge these solutions have not featured in the literature
before.
3.3 Half-supersymmetric backgrounds
In the previous subsections we have addressed the conditions for preserving one null
Killing spinor of the form ǫ1 = 1 or ǫ1 = 1 + e
χe1. It is natural to enquire about the
possibility of these backgrounds admitting an additional Killing spinor with the same
2 stability subgroup, i.e. of the form ǫ2 = c01+ c1e1. Using the fact that ǫ1 is Killing,
the second Killing spinor equation Dµǫ2 = 0 can then be rewritten as
(c0 − c1)Dµ1 + ∂µc01 + ∂µc1e1 = 0 , (3.23)
6This solution escaped a majority of the present authors in [13]. The reason for this is that
equ. (4.32) of [13] is not correct; it must be R+−ij = 0, which yields no information on the constant
κ. Thus, in addition to the solutions with κ = 0 found in [13] (the Lobachevski waves), there are also
the κ = 1 solutions, which are exactly the ones found here with dχ 6= 0.
– 17 –
in the U(1)⋉R2 case and
(c0 − c1e−χ)Dµ1 + ∂µc01 + (∂µc1 − c1∂µχ)e1 = 0 , (3.24)
in the R2 case. Furthermore, we can assume that (c0 − c1) 6= 0 and (c0 − c1e−χ) 6= 0 in
the two cases, respectively, since otherwise the second Killing spinor would be linearly
dependent on the first and there would not be any additional constraints. Hence the e2
and e12 components of Dµ1 have to vanish separately. In particular, this implies that
ω−• = 0 (as can be seen from the third line of (3.1) in the singular limit χ → −∞).
However, this is clearly incompatible with (3.3). We conclude that, in the gauged
theory, there are no backgrounds with four R2-invariant Killing spinors. In other words,
there are no half-supersymmetric backgrounds with an R2-structure. This is unlike
the ungauged case, where the half-supersymmetric gravitational waves provide such
solutions.
Therefore, the only possibility to augment the supersymmetry of the null solutions
above is to add a Killing spinor which breaks the R2 invariance, i.e. with a non-vanishing
e2 and/or e12 component. From a linear combination of the first and second Killing
spinor one can then always construct a time-like Killing spinor, and hence this brings
us to the next section. For the convenience of the reader, we will already summarise
how to restrict the 1/4-supersymmetric null solutions to allow for a time-like Killing
spinor as well.
For the case with constant null Killing spinors, dχ = 0, the restriction was al-
ready discussed in [13] and is given in (3.16). For the other case, with dχ 6= 0, it is
straightforward to show that the solution (3.19), (3.21) admits a second Killing spinor
iff ∂χG = ∂ΨG = 0, so that G depends only on u. By a simple diffeomorphism one can
then set G = 0. The general solution to the Killing spinor equations reads in this case
ǫ = λ1(1 + e
χe1) +
e4χ − 1
(e2 + e
χe1 ∧ e2) , (3.25)
where λ1,2 ∈ C are constants. The invariants constructed from ǫ, as defined in appendix
B, are
2 cothχ(|λ2|2dv − |λ1|2du)−
sinh 2χ
(λ2λ̄1 − λ̄2λ1) dΨ ,
B = −
2(|λ1|2du+ |λ2|2dv) +
e4χ − 1 sinhχ
(λ̄1λ2 + λ1λ̄2) dχ ,
f = i(λ1λ̄2 − λ̄1λ2)
tanhχ , g = (λ̄1λ2 + λ1λ̄2)
cothχ .
The norm of the Killing vector V is given by
V 2 = − 2
sinh 2χ
(λ̄1λ2 + λ1λ̄2)
2 − 4|λ1λ2|2 tanhχ .
– 18 –
Since χ > 0, this is negative unless λ1 = 0 or λ2 = 0, so indeed the solution (3.19),
(3.21) with G = 0 must belong also to the timelike class. It turns out that it is identical
to the bubble of nothing of section 5.3 with imaginary b and L < 0. The coordinate
transformation
2A2(t− Ly)− z
, v = −
2A2(t− Ly)− z
Ψ = −2A2t , χ = artanhX
(3.26)
with A8 = −1/4L brings the metric (3.21) (with G = 0) to (5.60), and the field strength
of (3.19) to (5.61). Note that, in the new coordinates, the above invariants become
V = ∂t as a vector, and B = dz, in agreement with section 4.2.
4. Timelike representative 1 + be2
We will now turn to the timelike case and first recover the general 1/4-BPS solutions
[13]. Afterwards we will study the conditions for 1/2 supersymmetry. This will complete
the classification since we already know that no 3/4-supersymmetric solutions can arise
and AdS4 is the unique maximally supersymmetric possibility.
4.1 Conditions from the Killing spinor equations
Acting with the supercovariant derivative (2.5) on the representative 1 + be2 yields the
linear system
ω•̄•+ −
ω+−+ −
bA+ = 0 ,
ω•̄•+ +
ω+−+ −
F •̄• + ib√
F+− = 0 ,
ω•−+ + i
2bF•− = ω•++ = 0 , (4.1)
ω•̄•− −
ω+−− −
bA− +
F •̄• − i√
F+− = 0 ,
ω•̄•− +
ω+−− −
A− = 0 ,
b ω•+− + i
2F•+ = ω•−− = 0 , (4.2)
– 19 –
ω•̄•• −
ω+−• −
bA• − i
2F •̄− = 0 ,
ω•̄•• +
ω+−• −
A• − i
2bF •̄+ = 0 ,
ω•−• +
− ib√
F •̄• − ib√
F+− = 0 ,
b ω•+• +
F •̄• + i√
F+− = 0 , (4.3)
∂•̄b+
ω•̄••̄ −
ω+−•̄ −
bA•̄ = 0 ,
ω•̄••̄ +
ω+−•̄ −
A•̄ = 0 ,
ω•−•̄ = b ω
•̄ = 0 . (4.4)
From eqns. (4.1) - (4.4) one obtains the gauge potential and the fluxes in terms of the
spin connection and the function b,
− ∂+b
− ω•̄•+
, A− =
ω••̄− , A• =
(ω••̄• + ω
• ) ,
F+− = i√
(b ω•+• − b−1ω•−• ) , F•+ =
ω+−•̄ ,
F••̄ = i√
(b ω•+• + b
−1ω•−• ) +
, F•− = i
ω•−+ . (4.5)
Furthermore, the system (4.1) - (4.4) determines almost all components of the spin
connection (with the exception of ω••̄) in terms of the function b and its spacetime
derivatives,
ω+−+ =
, ω+−− = 0 , ω
ω+•+ = ω
•̄ = 0 , ω
− = −
, ω+•• =
ω−•+ = −b ∂•̄b̄ , ω−•− = ω−••̄ = 0 , ω−•• =
. (4.6)
In what follows, we assume b 6= 0. One easily shows that b = 0 leads to ℓ−1 = 0, so
this case appears only in ungauged supergravity.
4.2 Geometry of spacetime
In order to obtain the spacetime geometry, we consider the spinor bilinears
Vµ = D(ǫ,Γµǫ) , Bµ = D(ǫ,Γ5Γµǫ) ,
– 20 –
whose nonvanishing components are
2 b̄b , V− = −
2 , B+ =
2 b̄b , B− =
As V 2 = −4b̄b = −B2, V is timelike and B is spacelike. Using eqns. (4.1) - (4.4), it is
straightforward to show that V is Killing and B is closed, i. e. ,
∂AVB + ∂BVA − ωCB|AVC − ωCA|BVC = 0 ,
∂ABB − ∂BBA − ωCB|ABC + ωCA|BBC = 0 .
There exists thus a function z such that B = dz locally. Let us choose coordinates
(t, z, xi) such that V = ∂t and i = 1, 2. The metric will then be independent of t. Note
also that the system (4.1) - (4.4) yields
∂tb =
2 (|b|2∂− − ∂+)b = 0 ,
so b is time-independent as well. In terms of the vierbein EAµ the metric is given
ds2 = 2E+E− + 2E•E •̄ , (4.7)
where
E+µ =
Bµ + Vµ
2|b|2
, E−µ =
Bµ − Vµ
From V 2 = −4|b|2 and V = ∂t as a vector we get Vt = −4|b|2, so that V = −4|b|2(dt+σ)
as a one-form, with σt = 0. Furthermore, V
• = 0 yields E•t = 0, and thus
E• = E•zdz + E
The component E•z can be eliminated by a diffeomorphism
xi = xi(x′j , z) ,
= −EIz , I = •, •̄ .
As the matrix EIi is invertible
7, one can always solve for ∂xi/∂z. Note that the metric
is invariant under
t→ t+ χ(xi, z) , σ → σ − dχ ,
7One has det(EIi ) = − det(EAµ ), and the latter is always nonzero.
– 21 –
where χ(xi, z) denotes an arbitrary function. This second gauge freedom can be used
to eliminate σz. Hence, without loss of generality , we can take σ = σidx
i, and the
metric (4.7) becomes
ds2 = −4|b|2(dt+ σidxi)2 +
4|b|2 + 2E
iE •̄j dx
j . (4.8)
Next one has to impose vanishing torsion,
ν − ∂νEAµ + ωAµBEBν − ωAνBEBµ = 0 .
One finds that some of these equations are already identically satisfied, while the re-
maining ones yield (using the expressions (4.6) for the spin connection) the constraints
∂zσi = −
4|b|2 (E
•̄ − E•iEj•)∂j ln(b/b̄) , (4.9)
∂iσj − ∂jσi = (E•iE •̄j −E•jE •̄i )
∂z ln(b/b̄) +
, (4.10)
ω••̄t = −2|b|2∂z ln(b/b̄) +
− 2b̄
, (4.11)
j − ∂jE•i = (E•iE •̄j −E•jE •̄i )ω••̄•̄ , (4.12)
as well as
∂z + ω
∂z ln(b̄b) +
E•i = 0 . (4.13)
In (4.9), EiI denotes the inverse of E
j . In order to obtain the above equations, one has
to make use of the inverse tetrad
E+ = −
2|b|2∂z , E− =
2|b|2
2 ∂z , E• = E
•(∂i − σi∂t) .
(4.13) can be solved to give
E•i =
|b|Ê
i exp
dz ω••̄z −
, (4.14)
where Ê•i is an integration constant that depends only on the coordinates x
j . At this
point it is convenient to use the residual U(1) gauge freedom of a combined local Lorentz
and gauge transformation to eliminate ω••̄z . This is accomplished by the transformation
(2.3), with
dz ω••̄z .
– 22 –
Note that ψ is real, as it must be. Defining
Φ := − 1
, (4.15)
we have thus
E•i =
|b|Ê
i expΦ . (4.16)
Using (4.16) in (4.12), one gets for the only remaining unknown component ω••̄• of the
spin connection
ω••̄• =
ω̂••̄• − Êi•∂i
|b| exp(−Φ) ,
where ω̂••̄• denotes the spin connection following from the zweibein Ê
In what follows, we shall choose the conformal gauge for the two-metric hij =
ÊIiÊ
j , i. e. ,
hij = e
2ξ[(dx1)2 + (dx2)2] . (4.17)
with ξ depending only on the coordinates xi. Furthermore, we choose an orientation
such that
Ê•i Ê
j − Ê•j Ê •̄i = −ie2ξǫij ,
where ǫ12 = 1. To be concrete, we shall take
(ÊIi ) =
The eqns. (4.9) and (4.10) then simplify to
∂zσi = −
4|b|2 ǫij∂j ln(b/b̄) , (4.18)
∂iσj − ∂jσi = −
|b|2 e
2(Φ+ξ)ǫij
∂z ln(b/b̄) +
. (4.19)
Moreover, one has
ω••̄• = −∂• ln
|b|e−Φ−ξ
. (4.20)
In [13] it has been shown that in the case where the Killing vector constructed from
the Killing spinor is timelike, the Einstein equations follow from the Killing spinor
equations, so all that remains to do at this point is to impose the Bianchi identity and
the Maxwell equations. Using the spin connection (4.6) and (4.11) in (4.5), the gauge
– 23 –
potential and the field strength become
A = i(dt + σ)(b− b̄) + ℓ
ǫij∂j(Φ + ξ) dx
i − iℓ
d ln(b/b̄) ,
F = i(dt + σ) ∧ d (b̄− b) + 1
4|b|2dz ∧ dx
iǫij∂j(b+ b̄)
2|b|2
∂z(b+ b̄) +
e2(Φ+ξ)ǫijdx
i ∧ dxj . (4.21)
The Bianchi identity F = dA yields
∆(Φ + ξ) =
e2(Φ+ξ)
, (4.22)
with ∆ = ∂i∂i denoting the flat space Laplacian in two dimensions. As for the Maxwell
equations,
−gFµν) = 0 ,
the only nontrivial information comes from the t-component, which gives
4e2(Φ+ξ)
b2∂2z
− b̄2∂2z
+ b2∆
− b̄2∆1
= 0 , (4.23)
where we used eqns. (4.18) and (4.19).
Let us now show that the equations (4.22) and (4.23) are actually the same as the
ones in [16]. If we set
F = − 1
, eφ = 2eΦ+ξ , (4.24)
(4.22) yields exactly equation (2.3) of [16]. On the other hand, deriving (4.22) with
respect to z and using (4.15), one obtains
∆A + e2φ
3A∂zA− 3B∂zB + A3 − 3AB2 + ∂2zA
= 0 , (4.25)
where A and B denote the real and imaginary part of F respectively. This can be used
in (4.23) to get
∆B + e2φ
∂2zB + 3B∂zA+ 3A∂zB − B3 + 3A2B
= 0 ,
which, together with (4.25), yields
∆F + e2φ
F 3 + 3F∂zF + ∂
= 0 , (4.26)
i. e. , equation (2.2) of [16]. For a complete identification of the present results with
the ones in [16], one also has to set σ = ω.
– 24 –
In conclusion, the metric of the general 1/4-supersymmetric solution is given by
ds2 = −4|b|2(dt+ σ)2 + 1
4|b|2
dz2 + 4e2(Φ+ξ)dw dw̄
, (4.27)
where b and φ are determined by the system (4.22), (4.23) and w = x1 + ix2 ≡ x+ iy.
The one-form σ follows then from (4.18) and (4.19), and the gauge field strength is given
by (4.21). Note that (4.23) represents also the integrability condition for (4.18), (4.19).
As noted in [16], this system of equations is invariant under PSL(2,R) transformations8.
If we define a new coordinate z′ through the Möbius transformation
αz + β
γz + δ
, (4.28)
with α, β, γ and δ arbitrary real constants satisfying αδ − βγ = 1, then the functions
b̃(z′, xi) and Φ̃(z′, xi) defined by
(γz′ − α)2b −
γz′ − α , e
Φ̃ = (γz′ − α)2 eΦ , (4.29)
solve the system in the new coordinate system (z′, xi), with the function ξ(xi) left
invariant and z seen as a function of z′. This symmetry allows to generate new BPS
solutions from the known ones. Note however that it is only a symmetry of the equations
for 1/4 supersymmetry, and if we apply it to solutions with additional Killing spinors,
it will in general not preserve them, as we shall show explicitely in some examples.
4.3 Half-supersymmetric backgrounds
We now would like to investigate the possibility of adding a second Killing spinor. Since
the first Killing spinor ǫ1 has stability subgroup 1, one cannot use Lorentz transforma-
tions to bring the second spinor to a preferred form. Therefore we use the most general
ǫ2 = c01 + c1e1 + c2e2 + c12e1 ∧ e2 . (4.30)
The corresponding linear system simplifies significantly after inserting the results from
ǫ1. These determine all the fluxes and the spin connection in terms of the functions b,
ξ and their derivatives. First it is convenient to introduce the new basis9
b−1c2 − c0
8It might be of interest to investigate the possible relation between this ’hidden symmetry’ and the
Ehlers group for solutions of four-dimensional vacuum gravity with a Killing vector.
9Note that ǫ1 = (1, 0, 0, 0) in this basis.
– 25 –
in which the Killing spinor equations for ǫ2 read
(∂A +MA)α = 0 , (4.31)
with the connection MA given by
0 −∂+ ln b̄ 0 0
0 ∂+ ln b̄ −∂• ln b ∂• ln b
0 0 b̄−b√
∂+ ln
b̄− ∂+ ln b
0 −|b|2∂•̄ ln b̄ 0 b̄−b√2ℓ −
∂+ ln(b̄b)
0 0 |b|−2∂• ln b̄ −|b|−2∂• ln b̄
0 ∂− ln b −|b|−2∂• ln b̄ |b|−2∂• ln b̄
0 ∂•̄ ln b
b−b̄√
2ℓ|b|2 −
∂− ln(b̄b) 0
0 0 −
− ∂− ln b̄ b−b̄√2ℓ|b|2 +
∂− ln
0 −∂• ln b̄ 0 0
0 ∂• ln(b̄b) 0 0
b̄− ∂+ ln b −∂• ln
|b|e−Φ−ξ
b+ ∂+ ln b̄ 0 −∂• ln
|b|e−Φ−ξ
M•̄ =
0 0 −∂− ln b̄ ∂− ln b̄+
0 0 ∂− ln(b̄b) +
−∂− ln(b̄b)−
0 0 ∂•̄ ln
e−Φ−ξ
−∂•̄ ln b
0 0 −∂•̄ ln b̄ ∂•̄ ln
e−Φ−ξ
Let us first of all consider the simpler possibility of a second Killing spinor of the
form ǫ2 = c01 + c2e2. As discussed in section 2.1, both ǫ1 and ǫ2 are invariant under
the same U(1) symmetry, and hence this case constitutes the G = U(1) case with four
supersymmetries. As can easily be seen from the above Killing spinor equations with
α1 6= 0 and α2 = α12 = 0, this restricts the derivatives of the coefficient b to be
∂−b = −
, ∂+b = −
, ∂•b = ∂•̄b = 0 . (4.32)
Hence this corresponds to ∂zb = −1/ℓ. As will be discussed in section 5.1, this restric-
tion uniquely leads to the half-supersymmetric anti-Nariai space-time. Hence AdS2×H2
is the only possibility for backgrounds with four U(1)-invariant Killing spinors.
In the more general case with α2 and α12 non-vanishing, i.e. with trivial stability
subgroup, the Killing spinor equations do not so readily provide information about b
– 26 –
and one has to resort to their integrability conditions. The first integrability conditions
for the linear system (4.31) are
Nµνα ≡ (∂µMν − ∂νMµ + [Mµ,Mν ])α = 0 , (4.33)
where the matrices Mµ = E
µMA are given by
2(|b|2M− −M+) , Mz =
2|b|2
(M+ + |b|2M−) ,
Mw = σwMt +
eΦ+ξM• , Mw̄ = σw̄Mt +
eΦ+ξM•̄ ,
and we introduced the complex coordinates w = x + iy, w̄ = x − iy. For half-
supersymmetric solutions, the six matrices Nµν must have rank two. (As at least
one Killing spinor exists, namely ǫ1 = (1, 0, 0, 0), we already know that the Nµν can
have at most rank three. Rank one is not possible, because 3/4 BPS solutions cannot
exist [29]. Rank zero corresponds to the maximally supersymmetric case, which implies
that the spacetime geometry is AdS4 [13].) Let us define
Ñµν ≡ SNµνT ,
1 0 0 0
1 1 0 0
0 0 1 0
0 0 0 1
, T =
1 0 0 0
0 1 0 0
0 0 1 0
0 0 1 1
The similarity transformation S corresponds to adding the first line to the second one
and T adds the last column to the third one. This does not alter the rank of Nµν . One
finds
Ñwt =
2b∂∂z b̄+
∂b̄ −2|b|
e−Φ−ξ
∂2b̄+ 1
∂b̄∂b̄
∂z b̄+
∂ ln b̄
0 −2∂(Φ + ξ)∂b̄
2b̄∂∂zb+
∂b −2|b|
e−Φ−ξ
∂2b+ 1
∂ ln b
0 −2∂(Φ + ξ)∂b)
2|b|3e−Φ−ξ∂̄∂ ln b 2b̄∂∂zb
0 −2|b|3eΦ+ξb−2
2∂zb+
∂ ln b
2|b|3e−Φ−ξ∂̄∂ ln b̄ 2b∂∂z b̄− 2ℓ∂b̄0 −2|b|3eΦ+ξ b̄−2
2∂z b̄+
∂z b̄+
∂z b̄+
∂ ln b̄
– 27 –
Ñw̄t =
2b∂̄∂z b̄ −2|b|e−Φ−ξ∂̄∂ ln b̄
∂z b̄+
∂̄ ln b̄
ℓ|b|e
2∂z b̄+
+ 2b|b|b̄e
2∂z b̄+
∂z b̄+
2b̄∂̄∂zb −2|b|e−Φ−ξ∂̄∂ ln b
∂̄ ln b
ℓ|b|e
2∂zb+
+ 2b̄|b|be
2∂zb+
2|b|b̄e−Φ−ξ
∂̄2b+ 1
∂̄b∂̄b 2b̄∂̄∂zb+
0 −2∂̄(Φ + ξ)∂̄b
∂̄ ln b
2|b|be−Φ−ξ
∂̄2b̄+ 1
∂̄b̄∂̄b̄ 2b∂̄∂z b̄− 4ℓ ∂̄b̄0 −2∂̄(Φ + ξ)∂̄b̄
∂z b̄+
∂̄ ln b̄
where ∂ = ∂w, ∂̄ = ∂w̄. The other four integrability conditions give no additional
information, because the lines of the corresponding matrices are proportional to the
lines of Ñwt and Ñw̄t
As the upper right 3× 3 determinant of Ñwt must vanish, we obtain ∂b = 0 or
e−2(Φ+ξ)b̄∂b̄
e−2(Φ+ξ)b∂b
e−2(Φ+ξ)b∂b
e−2(Φ+ξ)b̄∂b̄
= 0 . (4.34)
Let us assume that the expression in (4.34) does not vanish. One has then ∂b = 0
as well as ∂b̄ = 011. But then also (4.34) holds, which leads to a contradiction. Thus
(4.34) must be satisfied in any case.
Note that the vanishing of the first column of Ñµν implies that also the first column
of T−1NµνT is zero, and thus T
−1NµνT ∈ a(3,C), hence the generalized holonomy in
the case of one preserved complex supercharge is contained in the affine group A(3,C).
This supports the classification scheme of [4]. Of course, depending on the particular
solution, the generalized holonomy may also be a subgroup of A(3,C).
4.4 Time-dependence of second Killing spinor
In this section we will utilize the above Killing spinor equations to derive the time-
dependence of the second Killing spinor. In addition, we will show that the Killing
spinor equations can be completely solved when the second Killing spinor is time-
independent.
Let us first simplify the Killing spinor equations (4.31). In the following we set
b = reiϕ and define ψ = Φ + ξ, ψ1 = r
2α1, ψ2 = re
−ψα2, ψ12 = re
−ψα12 and ψ± =
10In order to show this, one has to make use of eqns. (4.22) and (4.26).
11This follows from the vanishing of the 3 × 3 determinant that is obtained from Ñwt by deleting
the first column and the third line.
– 28 –
ψ2 ± ψ12. First of all, use the integrability conditions (4.33), that can be rewritten as
ÑµνT
−1α = 0. Defining P = e−2ψb∂b, the second component for µ = w, ν = t gives
′ + ψ−∂P = 0 , (4.35)
with ′ = ∂z . Let us assume P
′ 6= 0 (the case P ′ = 0 is considered in appendix C and
will lead to the same conclusions). If we define g(t, z, w, w̄) = −ψ−/P ′, we get
ψ− = −gP ′ , ψ1 = g∂P .
The third component of the (w, t) integrability condition is of the form
ψ1f1 + ψ2∂b+ ψ−f− = 0 ,
for some functions f1, f− that depend on z, w, w̄ but not on t. Using the above form
of ψ1 and ψ−, this becomes
f1g∂P + ψ2∂b− f−gP ′ = 0 . (4.36)
Now, if g = 0, the latter equation implies ψ2∂b = 0, and hence (since ∂b 6= 0 due to
P ′ 6= 0) ψ2 = 0. Furthermore, ψ1 = ψ− = 0 in this case, so there exists no other Killing
spinor. Thus, g 6= 0 and we can write g = expG. Dividing (4.36) by g and deriving
with respect to t yields ∂t(ψ2/g) = 0 and hence
ψ2 = e
Gψ02(z, w, w̄) .
It is then plain that ∂tψi = ψi∂tG, i = 1, 2, 12. The Killing spinor equations are of the
form ∂µψi = Mµijψj , for some time-independent matrices Mµ. Taking the derivative
of this with respect to t, one gets ∂µ∂tG = 0, whence
G = G0t + G̃(z, w, w̄) ,
with G0 ∈ C constant. We have thus ∂tψi = G0ψi and hence also ∂tαi = G0αi. Fur-
thermore, the time-dependence of α0 can be easily deduced from the Killing spinor
equations: if G0 does not vanish it is of the same exponential form as the other com-
ponents of the second Killing spinor, i.e. ∂tα0 = G0α0, while if G0 vanishes there can
be a linear part in t, i.e. ∂tα0 = c for some constant c. Hence, in terms of the basis
elements, the time-dependence of the second Killing spinor takes the form12
G0 = 0 : ǫ2 = c01 + c1e1 + c2e2 + c12e1 ∧ e2 + ct(1 + be2) ,
G0 6= 0 : ǫ2 = eG0t(c01 + c1e1 + c2e2 + c12e1 ∧ e2) , (4.37)
12We will loosely refer to Killing spinors with G0 = 0 as time-independent, despite the possible
linear time-dependence, to distinguish from the G0 6= 0 exponential time-dependence.
– 29 –
where c0, c1, c2, c12 are time-independent functions of the spatial coordinates, and c is a
constant. This was derived assuming P ′ does not vanish, but as we show in appendix
C is in fact a completely general result. Hence, adding a second Killing spinor to
ǫ1 = 1 + be2, the Killing spinor equations imply that ǫ2 always has the above time-
dependence.
Plugging this time-dependence into the subsystem of the Killing spinor equations
not containing α0 one obtains in terms of ψi
ψ′1 −
ψ− = 0 , (4.38)
ψ′2 −
ψ12 = 0 , (4.39)
ψ′12 − e−2ψ
ψ12 = 0 , (4.40)
ψ′1 −
ψ− = 0 , (4.41)
ψ′2 + e
−2ψ ∂̄b
ψ2 = 0 , (4.42)
ψ′12 −
ψ12 = 0 , (4.43)
∂ψ1 − σwG0ψ1 = 0 , (4.44)
∂ψ2 −
σwG0 +
− 2∂ψ
ψ2 = 0 , (4.45)
∂ψ12 +
σwG0 +
− 2∂ψ
ψ12 = 0 , (4.46)
∂̄ψ1 −
σw̄G0 +
ψ1 + e
− ψ12
= 0 ,(4.47)
∂̄ψ2 −
σw̄G0 +
ψ12 = 0 ,(4.48)
∂̄ψ12 −
σw̄G0 +
ψ12 = 0 .(4.49)
ForG0 = 0, these equations simplify significantly, and allow for a complete solution.
As is shown in appendix D, under the additional assumption ψ− 6= 0, ψ1 6= 0, the metric
– 30 –
and the field strength for half-supersymmetric solutions with G0 = 0 are given in terms
of a single real function H depending only on the combination Z−w− w̄ and satisfying
the second order differential equation
1 + e−2H
Ḧ + Ḣ2
1− 3α
e2H + 1− α2
, (4.50)
where α ∈ R denotes an arbitrary constant and γ = 0, 1. The new coordinate Z is
defined by Z = z for γ = 0 and Z = ℓ ln
1 + z
for γ = 1. Furthermore, in the
remainder of this section and in appendix D, a dot denotes a derivative with respect
to Z − w − w̄. Given a solution of (4.50), one defines the functions χ, ρ by
e2H + 1− α2
− Ḣ2χ2 . (4.51)
Note that χ is imaginary and ρ is real. b and ψ are then given by
b = eγZ/ℓρ eiϕ , e2ψ = e2(H+γZ/ℓ) ,
where
tanϕ =
so that the metric reads
ds2 = −4ρ2e2γZ/ℓ(dt+ σ)2 + 1
+ e2Hdwdw̄
, (4.52)
where the shift vector satisfies
∂Zσw =
e−γZ/ℓ
, ∂σw̄ − ∂̄σw = −
e−γZ/ℓ
Finally, the gauge field strength is given by (4.21).
Equation (4.50) is actually the Euler-Lagrange equation for the following standard
action for the scalar H
d (Z − w − w̄)
M(H)Ḣ2 − V (H)
, (4.53)
where
M(H) =
e2H + 1
(e2H + 1− α2)3/2
, V (H) = − γ
e2H + 1− 2α2
(e2H + 1− α2)1/2
. (4.54)
– 31 –
Thus it is possible to use the energy conservation law of that model in order to evaluate
the “velocity” Ḣ in terms of H . Since dH = Ḣd (Z − w − w̄) one has
M(H)Ḣ2 + V (H)
= 0 , (4.55)
so that there must exist a constant E such that
[E − V (H)] =
e2H + 1− α2
e2H + 1
e2H + 1− 2α2√
e2H + 1− α2
(4.56)
The key-point is to consider now, as a new coordinate, the function H in place of
w+ w̄13 and to write down the full solution, say metric plus gauge field, in terms of H .
Using w = x+ iy, the general solution is given by
ds2 = −4ρ2e2γZ/ℓ
dt + e−γZ/ℓσ̂ydy
dy2 +
dZ2 + e2H
dZ − dH
A = ℓḢ
−2iρ2χeγZ/ℓdt+
1− e2Hχ2
− i ℓ
d log
, (4.57)
where Ḣ is given in equation (4.56), the functions χ and ρ are defined in (4.51) and
the shift vector reads
σ̂y = −
If γ = 1, a simple example of this set of solutions can be obtained by setting α = 0,
Ḣ = 1/ℓ , b =
. (4.58)
As will be shown in section 5.1.2, this corresponds to the maximally supersymmetric
AdS4 solution. More general γ = 1 solutions will be two-parameter deformations
thereof, the parameters being α and the energy E of the associated scalar system.
Setting γ = 0 the potential V (H) vanishes and the parameter E can be fixed by
a simple rescaling of the coordinates. Thus we are left with a one-parameter family
of solutions. Since the metric does no more depend explicitly on Z, it is useful to
replace the coordinate Z instead of x by H . Defining a new coordinate r such that
13This is possible by simply requiring that Ḣ 6= 0.
– 32 –
r4 ≡ 16
e2H + 1− α2
and a new parameter Q = 4
α, the complete solution reads
ds2 = −
dt− 2ℓ
r4 + ℓ2Q2
h(r)2
r4 + ℓ2Q2 − 16
dx2 + dy2
A = −Q
dy − i ℓ
d log
, (4.59)
where
h(r) =
r4 + ℓ2Q2
r4 + ℓ2Q2 − 16 . (4.60)
The parameter Q can thus be interpreted as an electric charge. The Petrov type of the
solution is D or simpler. If one sets Q = 4/ℓ the Petrov type is reduced to N , so that
there is a gravitational wave.
In order to complete the classification of G0 = 0 solutions, we need to study
separately the cases where either ψ1 or ψ− vanishes (it can easily be seen from (4.39)
and (4.40) that there is no solution if both vanish). As one can see by looking at
equations (4.38) and (4.41), the condition ψ1 = 0 leads to b = b(z), which is studied in
detail in section 5.1. The other possibility, ψ− = 0, is more involved, but as we show
in appendix D it boils down to three different cases, that can be completely solved:
the AdS2 × H2 anti-Nariai spacetime studied in section 5.1.1, the imaginary b case
solved in section 5.3, and finally the half BPS solution coming from the gravitational
Chern-Simons model, that we analyse in section 5.5.
We would like to remark that the assumptionG0 = 0 on the overall time-dependence
of the second Killing spinor seems a reasonable choice since all known 1/2-supersymmetric
solutions to be studied in the next section are contained in this class, or can be brought
to this class by a general coordinate transformation. Hence we expect the G0 = 0 class
to form an important subclass of all 1/2-supersymmetric solutions.
5. Timelike half-supersymmetric examples
The problem of finding all half BPS configurations in the timelike class involves the
solution of the integrability conditions we obtained above. To obtain explicit examples
of half BPS solutions, we shall restrict to some simple subclasses with particular b.
This will determine the fraction of preserved supersymmetry for the solutions which
are already known to be 1/4 supersymmetric, and will also lead to new solutions.
– 33 –
5.1 Static Killing spinors and b = b(z)
The timelike vector field V , constructed as a bilinear of the Killing spinor, is static
if the associated one-form V = dt + σ satisfies the Fröbenius condition V ∧ dV = 0.
Obviously, there can be static BPS solutions with V not being static itself, due to the
choice of coordinates; we shall loosely refer the Killing spinors whose vector bilinear is
static as static Killing spinors. The staticity condition, in turn, implies dσ = 0 and
puts strong constraints on the function b. Indeed, equation (4.18) implies that the
phase ϕ of b depends only on z. Then, (4.19) gives the modulus r of b in terms of its
phase,
sinϕ(z)
lϕ′(z)
. (5.1)
As a consequence, r and therefore the complete complex function b, depend on the
single variable z. The full solution is therefore determined by the single real function
ϕ, which has to satisfy the equations for supersymmetry (together with the conformal
factor ψ).
However, since the equations can be exactly solved for arbitrary b(z), we will stick
to this more general case and eventually comment on the static subcase.
If b depends only on z, the equations of motion simplify to
b2∂2z
= 0 , (5.2)
e−2ξ∆ξ =
. (5.3)
Here we have used the fact that Φ, defined in (4.15), depends only on the coordinate
z. In principle there is also an integration constant K(w, w̄) with arbitrary dependence
on the transverse coordinates, but since Φ appears only in the combination Φ+ ξ in all
the equations, we can always absorb the (w, w̄) dependence into the conformal factor
ξ. Now the left hand side of equation (5.3) depends only on the coordinates w and w̄,
while the right hand side depends only on z. This equation can be therefore satisfied
only if both sides are equal to some constant κ. The system of equations is then
e2ξ = 0 , (5.4)
e2Φ(z)
= − l
κ . (5.5)
Note that the first one is the Liouville equation, whose solution describes the transverse
two-dimensional manifold, which has therefore constant curvature κ.
– 34 –
Equations (5.2) and (5.5) can easily be solved [16]. Their solution is given by14
b̄ = −αz
2 + βz + γ
ℓ(2αz + β)
, (5.6)
with α, β, γ ∈ C. Then ξ solves the Liouville equation for a constant curvature two-
manifold with scalar curvature15
κ = 8(αγ̄ + ᾱγ)− 4ββ̄ . (5.7)
This solution generically belongs to the supersymmetric Reissner-Nordström-Taub-
NUT-AdS4 family of spacetimes. The values α = 0 and β
2 = 4αγ are special cases
and will be treated separately in the following. Note that the coefficients α, β and γ
are not three independent parameters, as they can be rescaled without changing the
function b: the solutions depend only on their ratios. For example, if α 6= 0, one can
use β/α and γ/α as independent complex parameters of the family of solutions.
The solutions with static Killing spinor form a subset of this family. For (5.6) the
staticity condition (5.1) yields the condition αβ̄− ᾱβ = 0. Recalling the expression for
the NUT charge of these solutions,
, (5.8)
this charge must vanish for non-vanishing α, as one could have guessed. On the other
hand, for α = 0 the solution is anti-Nariai, as we shall see below. We conclude that
the most general supersymmetric configuration with static Killing vector constructed
as a Killing spinor bilinear is either of the form (5.6) – i. e. in the fourth row of table
1 of [34] – with vanishing NUT charge, or it is anti-Nariai spacetime.
The supersymmetric static solutions discussed so far are generically 1/4-BPS. We
want to see what further condition ensures the presence of an additional Killing spinor.
Inserting the staticity ansatz b = b(z) into the integrability equations and requiring
these matrices to be of rank smaller or equal to two, one finds the following condition
(in particular this is obtained from the vanishing of the minor of the last row of Ñwt
and the first two rows of Ñw̄t)
2b′ +
= 0 , (5.9)
14With this definition, the constants α, β and γ coincide with a, b and c of [16] respectively.
15This scalar curvature differs from the one given in [16] for the case in which all coefficients are
real, k = 4αγ − β2 = κ/4. The factor of 4 comes from the different definition of the conformal factor
of the transverse metric, our ξ is related to the old γ by ξ = γ − ln 2.
– 35 –
As an aside, note that we have only used the ansatz b = b(z) so far and not the
staticity condition (5.1), i.e. the precise relation between r and ϕ. The static solutions
are therefore in general still a subset of the solutions under consideration.
Condition (5.9) calls for the following three different cases, corresponding to the
vanishing of its three factors.
5.1.1 AdS2 ×H2 space-time (α = 0)
Requiring the first factor of (5.9) to vanish leads to b = −z
+ ic with constant c,
corresponding to α = 0 in (5.6). We can absorb the imaginary part of c by a shift of
the coordinate z and henceforth will assume c ∈ R.
In this case κ = −4 and we have a hyperbolic transverse space. As a solution of
(5.4) we can take
e2ξ =
. (5.10)
Moreover, eΦ = l|b| and σ = 0, therefore giving the metric
ds2 = −4
dt2 +
dx2 + dy2
. (5.11)
This is the anti-Nariai AdS2 × H2 solution, with the AdS2 factor written in Poincaré
coordinates for c = 0 and in global coordinates for c 6= 0. The coordinate transforma-
tions between Poincaré coordinates (tP , zP ) (with c = 0) to global ones (tgl, zgl) (with
c 6= 0) is given by
(zgl −
z2gl + ℓ
2c2 cos(4ctgl/ℓ)) ,
tP = −
z2gl + ℓ
2c2 sin(4ctgl/ℓ)
zgl −
z2gl + ℓ
2c2 cos(4ctgl/ℓ)
. (5.12)
The electromagnetic field strength (4.21) in this case is given by
F = − 1
dx ∧ dy , (5.13)
i.e. only lives on the hyperbolic part and is independent of the coordinates of the AdS
part of space-time.
This solution preserves precisely 1/2 of the supersymmetries, as was already shown
in [35]. To obtain the form of the Killing spinors admitted by this metric we first
observe that the integrability conditions impose α2 = α12 = 0. Then the Killing spinor
equations are easily solved, but one should treat separately the cases c = 0 and c 6= 0:
– 36 –
• If c = 0, then
α0 = λ1 + λ2
, α1 =
, (5.14)
where λ1,2 ∈ C are integration constants. This yields the following Killing spinors,
spanning a two-dimensional complex space,
λ1 + λ2
1 + b
λ1 + λ2
e2 . (5.15)
Note that λ1 = 1, λ2 = 0 corresponds to the original Killing spinor. Also note
that the constant G0, corresponding to the time-dependence of the second Killing
spinor with λ2 6= 0, is zero. The form of the scalar invariant corresponding to the
general spinor ǫ is
b̃ = b
|λ1|2 + |λ2|2
λ̄1λ2 + λ1λ̄2
λ̄1λ2 − λ1λ̄2
. (5.16)
Here the first term is real, while the second is imaginary. Note that the latter
is in fact constant. Then the Killing vector Ṽ built from ǫ will have a norm
Ṽ 2 = −4|b̃|2, and will be timelike unless b̃ vanishes. This is however not possible,
because both the real and imaginary parts of b̃ should vanish, but since λ1,2 do
not depend on the coordinates, the real part cannot vanish. Therefore, every
Killing spinor of this solution belongs to the timelike class.
• If c 6= 0 we have
[λ1−iλ2+(λ1+iλ2)
−4ict/ℓ , α1 = −
|b| (λ1+iλ2)e
−4ict/ℓ , (5.17)
and the most general Killing spinor is parametrized by λ̃1,2 ∈ C as follows
(λ1 − iλ2)(1 + be2) +
2|b|(λ1 + iλ2)e
−4ict/ℓ(1 + b∗e2) . (5.18)
Note that the combination λ1− iλ2 corresponds to the first Killing spinor 1+ be2,
while the orthogonal combination λ1 + iλ2 gives rise to the second Killing spinor
proportional to 1 + b∗e2. Any combination with λ2 6= 0 has G0 = −4ic/ℓ.
In this case, the real part of the invariant b̃ is given by
Re(b̃) =
|λ1|2
(−z +
z2 + ℓ2c2 cos(4ct/ℓ)) +
|λ2|2
(−z −
z2 + ℓ2c2 cos(4ct/ℓ))+
2 + λ2λ
z2 + ℓ2c2 sin(4ct/ℓ)) , (5.19)
while the imaginary part is identical to that of (5.16).
– 37 –
It can easily be checked that the coordinate transformation (5.12) indeed relates the
complex scalar b̃, which is composed of spinor bilinears, in (5.16) and (5.19) to each
other.
Let’s now check how the isometries of AdS2 act on the Killing spinors. It is useful
to do this by embedding AdS2 with metric
ds2 = −4
dt2 +
) (5.20)
into the three-dimensional flat space Xa = (U, T,X) with metric
ds2 = −dU2 − dT 2 + dX2 . (5.21)
Then, AdS2 is obtained as the hyperboloid defined by
−U2 − T 2 +X2 = ℓ
, (5.22)
and its isometry group SO(2,1) will act as the three-dimensional Lorentz group on the
embedding coordinates Xa (here a is a three-dimensional Lorentz index).
If c = 0, the AdS2 metric (5.20) is in the Poincaré form, and can be seen to be
the induced metric on the hyperboloid by parameterizing it with the coordinates (t, z)
given by
z = U +X , t =
2(U +X)
. (5.23)
Then, if one defines the 3d Lorentz vector
|λ1|2 − |λ2|2
(λ∗1λ2 + λ1λ
2) ,−
|λ1|2 + |λ2|2
, (5.24)
one explicitly checks that the invariant b̃ can be put in the form
b̃ = XaΛ
ΛaΛa. (5.25)
Now, the real and imaginary part of b̃ are independently manifestly invariant under the
AdS2 isometries, as they should be (since they transform respectively as pseudoscalar
and scalar under diffeomorphism16).
If c 6= 0 we have AdS2 in global coordinates, and the embedding is modified to
U = − ℓ
+ c2 cos
, T = − ℓ
+ c2 sin
, X =
. (5.26)
16Note that Λ doesn’t depend on the sum of the phases of λ1,2; this is diffeomorphism invariant but
transforms under U(1) gauge transformations.
– 38 –
The invariant (5.19) takes again the manifestly invariant form (5.25), as expected, and
the isometries of AdS2 are realized linearly on the Killing spinors through their action
on Λa.
This result may be useful to study in detail quotients of AdS2 and to see whether
this operation breaks some supersymmetry.
5.1.2 AdS4 space-time (β
2 = 4αγ)
The following subcase corresponds to the vanishing of the second factor of the inte-
grability condition (5.9). The function b is then given by b = − z
+ ic, which can be
obtained as the special case β2 = 4αγ from (5.6). This corresponds to AdS4, the only
maximally supersymmetric solution of the theory. Indeed the integrability condition
matrices vanish in this case.
Let’s see in detail the form of the metric arising from different values of c. As in
the previous case we can take the constant c to be real. If c = 0, the metric is static,
σ = 0, ξ = 0 and e2Φ = |b|4, and we obtain anti-de Sitter in Poincaré coordinates,
ds2 = −z
dt2 − dx2 − dy2
dz2 . (5.27)
On the other hand, for c 6= 0, the metric appears in non-static coordinates,
σ = − ℓdy
, e2ξ =
4c2x2
, e2Φ = |b|4 , (5.28)
which give
ds2 = −
+ 4c2
dt− ℓdy
16c2x2
dx2 + dy2
+ 4c2
dz2 .
(5.29)
The field strength (4.21) vanishes in this case.
We shall now obtain the form of the Killing spinors for AdS4, and will do this in
the simpler c = 0 case. The solution of the Killing spinor equations yields
α0 = λ1 −
λ3 , α2 = −
λ2 , α12 =
1− zt
λ4 , (5.30)
where the coefficients λ1,...,4 span a four dimensional complex space, as expected in the
case of maximal supersymmetry. In the form basis of the spinors ǫ = c01+c1e1+c2e2+
– 39 –
c12e1 ∧ e2, we obtain
c0 = λ1 −
λ3 , c2 = −
λ3 + λ4 , c12 =
λ4 . (5.31)
The new Killing spinors corresponding to λ2 and λ4 both have
17 G0 = 0. To study the
action of the AdS4 isometries it is useful to embed the hyperboloid in a five-dimensional
flat space (U, V, T,X, Y ) with metric
ds2 = −dU2 + dV 2 − dT 2 + dX2 + dY 2. (5.32)
Then, AdS4 is the hypersurface −U2 + V 2 − T 2 +X2 + Y 2 = −ℓ2/4 and its isometries
are realized as the SO(3,2) isometries of the embedding space. The relation with the
Poincaré coordinates is
U − V ,
U − V ,
U − V , z = 2(U − V ) . (5.33)
If we define the vectors
ℓΛa =
|λ1|2 − |λ2|2 + |λ3|2 − |λ4|2
|λ1|2 + |λ2|2 − |λ3|2 − |λ4|2
λ3λ̄4 + λ̄3λ4 − λ̄1λ2 − λ1λ̄2
λ2λ̄4 + λ̄2λ4 − λ̄1λ3 − λ1λ̄3
λ2λ̄4 − λ̄2λ4 + λ̄1λ3 − λ1λ̄3
, Xa =
, (5.34)
where the index a = 1, . . . , 5 is an SO(3,2) index raised and lowered using the metric
(5.32), then
a = − 1
λ3λ̄4 − λ̄3λ4 + λ̄1λ2 − λ1λ̄2
]2 ≥ 0 , (5.35)
and the invariant b̃ for the Killing spinors reads
b̃ = c∗0c2 + c1c
12 = XaΛ
ΛaΛa . (5.36)
17Note that this does not hold for λ3, whose time-dependence is not of the form derived in section
4.4. There is no contradiction however, since all solutions in this class have P = 0 and hence are
treated separately in appendix C. It is interesting to find that nevertheless the time-dependence of
many Killing spinors in this class have the canonical G0 time-dependence.
– 40 –
This form of b̃ is manifestly invariant under the AdS4 isometries, and shows that under
Λa transforms in the fundamental representation of SO(3,2) under these transforma-
tions. Note that it has precisely the same form (5.25) as in the anti-Nariai case. Again,
the explicit knowledge of the AdS4 isometry group action on the Killing spinors is
important to study the supersymmetry of its quotients.
5.1.3 The Reissner-Nordström-Taub-NUT-AdS4 family
The last subcase corresponds to the vanishing of the third factor of the integrability
condition (5.9). Note that this is precisely the expression in square brackets of equation
(5.5) and the condition reads simply κ = 0. Then ξ is an harmonic function and the
transverse space is flat. In particular, the solution (5.6) admits a second Killing spinor
|β|2 = 2(αγ̄ + ᾱγ) . (5.37)
Since α 6= 0 we can define ζ = Im(β/α) and δ = Im(γ/α). Moreover, all equations
are invariant under rigid translations in the z directions, since the coordinate z never
appears explicitly in them. One can use this freedom to eliminate the real part of β/α
by performing the redefinition z 7→ z − 1
Re(β/α). Hence this complete family of 1/2
BPS solutions is determined by two real parameters ζ and δ,
b = −1
z2 − iζz + 1
ζ2 − iδ
2z − iζ . (5.38)
Then σ = −2ζ(r/ℓ)2dϑ and the resulting metric is
ds2 = −
z2 + ζ
+ (ζz + δ)
z2 + ζ
dt− 2ζ
z2 + ζ
z2 + ζ
+ (ζz + δ)
dr2 + r2 dϑ2
, (5.39)
where we used polar coordinates (r, ϑ) in the (w, w̄) plane. The charges of the solution
M = −δζ
, n =
, P = −ζ
, Q = −δ
. (5.40)
Essentially, the imaginary part of γ gives the electric charge and the imaginary part of β
determines the NUT charge. Note that the quantization condition P = −(kℓ2+4n2)/2ℓ
is also satisfied. In terms of the charges, the solution is given by
b = −1
(z − in)2 + 2n2 + iℓQ
2 (z − in) . (5.41)
– 41 –
The subfamily of static half BPS configurations is obtained by imposing the static-
ity condition ζ = 0 or equivalently vanishing NUT charge. It is parameterized by
the single parameter left, δ ∈ R and the solutions are restricted to have the following
charges
M = 0 , n = 0 , P = 0 , Q = −δ
In terms of the charges, the solution is given by
b = −1
z2 + iℓQ
. (5.42)
The metric and electromagnetic field strength for this solution read
ds2 = −
dt2 +
+ 4ℓ2z2 dwdw̄ , (5.43)
F = −Q
dt ∧ dz . (5.44)
This is simply the backreacted AdS4 filled with the electric field generated by an electric
charge Q placed in its center ζ = 0. The solution has a singularity there. Note that this
solution was already shown to be 1/2 supersymmetric in [36]. It was also shown there
that the Killing spinors are preserved if one compactifies the transverse two-dimensional
plane to a two-torus.
We will now discuss the Killing spinors for these metrics. The integrability condi-
tions impose α2 = 0 and
α4 . (5.45)
With these constraints, the Killing spinor equations simplify, and can be solved to give
α0 = λ1 + 2iζw̄λ2 , α1 = 0 , (5.46)
z2 + iζz + ζ
4z2 + ζ2
, α12 = α2 −
4z2 + ζ2 , (5.47)
where λ1,2 ∈ C parameterize the two dimensional space of Killing spinors. Then the
most general Killing spinor for these metrics is
ǫ = (λ1 + 2iζw̄λ2) 1− ℓλ2
2z + iζ
2z − iζ e1
+b (λ1 + 2iζw̄λ2) e2 −
z2 − iζz + ζ2
4z2 + ζ2
λ2 e1 ∧ e2 . (5.48)
– 42 –
Again the second Killing spinor has G0 = 0 time-dependence. Finally, the correspond-
ing orbit of the Killing spinor is determined by the invariant
b̃ = b|λ1|2 +
z2 + iζz + ζ
2z − iζ + 4ζ
2bww̄
|λ2|2 + 2iζb
w̄λ̄1λ2 − wλ1λ̄2
. (5.49)
It is easy to show now that b̃ is non vanishing for any choice of λ1,2: indeed if b̃ = 0, we
have ∂∂̄b̃ = 4ζ2b|λ2|2 = 0 and either λ2 = 0, which implies in turn λ1 = 0, or ζ = 0. In
the latter case, it is very easy to see that b̃ = 0 iff ǫ = 0. Therefore, all Killing spinors
of this family of metrics belong to the timelike class, and the solution is purely timelike.
Summary of the b = b(z) case:
1. The only supersymmetric solutions with static Killing spinor (i.e. whose timelike
Killing vector constructed as a Killing spinor bilinear is static) are AdS4, the
anti-Nariai spacetime and the Reissner-Nordström-AdS4 solutions of the fourth
row of table 1 of [34], i. e. solutions of the form (5.6) with vanishing NUT charge.
2. The only 1/2 BPS solutions with static Killing spinor are the anti-Nariai space-
time and the solution (5.43) with field strength (5.44).
3. The most general half BPS solution with b = b(z) are the anti-Nariai spacetime
and the solution (5.39) with charges (5.40) describing an electric charge in the
center of AdS4.
The natural way to continue this approach is to study half BPS solutions with b
harmonic, and this will be the subject of the next paragraph.
5.2 Harmonic b solutions
The previous class of solutions can be generalized by requiring ∆b = 0 instead of
b = b(z) [16]. This implies that ∆1/b = 0 and hence (4.23) still simplifies in exactly
the same way as in the b = b(z) case. Indeed, the solution is
b̄ = −αz
2 + βz + γ
ℓ(2αz + β)
, (5.50)
where now α, β and γ are no more constants but arbitrary functions of (w, w̄). It is
then easy to show that the ∆b = 0 condition requires these functions to be harmonic
and all (anti-)holomorphic, that is α, β and γ all depending either only on w or only
on w̄, and this is the most general solution with ∆b = 0. The b = b(z) configurations
– 43 –
are particular cases of this larger class, and are obtained for α, β and γ constant. Note
that also the ∂b = 0 and ∂b̄ = 0 subclasses fall into this family.
Let’s take for definiteness α, β, γ all anti-holomorphic, then b = b(z, w). The
requirement that the integrability conditions allow for an extra Killing spinor, i.e. that
they are of rank ≤ 2, in this case leads to several conditions. One of these is obtained
from the minor of the last three lines of Ñwt and reads
2∂z b̄+
∂z b̄+
∂b∂b − 2∂(Φ + ξ)∂b
∂b = 0. (5.51)
This gives three different cases to be analysed, corresponding to the vanishing of the
first three factors of this equation (vanishing of the fourth factor implies b = b(z) and
hence brings one back to the previous section).
5.2.1 Deformations of AdS2 ×H2
The vanishing of the first factor in (5.51) implies b = −z
+ ic(w), where c(w) is an arbi-
trary holomorphic function. These are the α(w) = 0 supersymmetric Kundt solutions
of Petrov type II, describing gravitational and electro-magnetic waves propagating on
anti-Nariai space-time [16].
The remaining integrability conditions however imply α1 = α2 = α12 = 0, in which
case there is no second Killing spinor, or ∂c = 0. Therefore there are no new half
BPS solutions with non constant c. In this class c constant is the half supersymmetric
anti-Nariai spacetime and the other preserve only 1/4 of the supersymmetries.
5.2.2 Deformations of AdS4
The vanishing of the second factor in (5.51) implies b = − z
+ ic(w). In this case we
are considering the β2 = 4αγ supersymmetric Kundt solutions, describing gravitational
and electro-magnetic waves propagating on AdS4 spacetime [16].
Again the remaining integrability equations have to solutions: α1 = α2 = α12 = 0 or
∂c = 0. Hence, as in the previous case, we find that there are no harmonic deformations
of AdS4 preserving half supersymmetry.
5.2.3 Deformations of Reissner-Nordström-Taub-NUT-AdS4
Not considering the previous two special cases, the general solution represents expand-
ing gravitational and electro-magnetic waves propagating on a Reissner-Nordström-
Taub-NUT-AdS4 spacetime [16]. When Im(β) = 0, the solution can be put in Robinson-
Trautman form and is of Petrov type II.
The vanishing of the third factor in (5.51) is given by
∂b∂b − 2∂(Φ + ξ)∂b = 0 . (5.52)
– 44 –
With b given in (5.50) this case can be solved for the derivative of Φ + ξ and implies
∂̄(Φ + ξ) =
, (5.53)
and therefore ∆(Φ + ξ) = 0. Then (5.3) fixes the transverse manifold to be flat and
κ(w) = 8(αγ̄ + ᾱγ)− 4ββ̄ = 0. (5.54)
But α,β and γ being holomorphic, this last equation can be satisfied if and only if they
are constant, and we are back to the previous case, i. e. there are no new 1/2 BPS
solutions.
Summary of the harmonic case:
There are no new half BPS solutions in the harmonic b case. The only half BPS
solutions are those with b = b(z), and as soon as one deforms these solutions by adding
some harmonic (w, w̄)-dependence, one breaks supersymmetry further to 1/4.
5.3 Imaginary b solutions
Another subcase we want to study is b̄ = −b, i. e. b purely imaginary. For notational
convenience we introduce18
b = iX ,
where X is real. From (4.15) one gets Φ = 0. All quantities in the Bianchi iden-
tity (4.22), apart from b and hence X , are then z-independent. The only consistent
possibility is to take ∂zX = 0. The remaining equations (4.22) and (4.23) read
e2ξ , ∆
e2ξ = 0 . (5.55)
Examples of 1/4 supersymmetric solutions of this class, i.e. with imaginary b, that
were discussed in [16] are X = (x/ℓ)α with α = −2 and α = 1
. These correspond to a
particular Petrov type I solution and an electrovac AdS travelling wave of Petrov type
N, respectively. It was shown that the latter actually preserves a second, null Killing
spinor. In this section we will derive the general condition for 1/2 supersymmetry in the
case of imaginary b and will find that there is a one-parameter family of such solutions.
The condition for 1/2 supersymmetry is very simple in this case. Assuming that
∂X is not equal to zero, which would clearly be incompatible with (5.55), there is
18In the following we will assume that X is positive without loss of generality.
– 45 –
only one differential constraint which needs to be satisfied for the existence of a second
Killing spinor, i. e. for the matrices of integrability conditions to have rank 2, namely
∂2X−1 − 2∂ξ∂X−1 = 0 . (5.56)
The above three differential equations can be integrated to
e2ξ = −iK̄(w̄)∂X−1 , ∂X−1 = i
, (5.57)
where K(w) is an arbitrary holomorphic function and L is a real constant. The func-
tion K(w) corresponds to the freedom to choose holomorphic coordinates on the two-
dimensional space, and hence it can be gauged away. A convenient gauge choice will
be K(w) = iℓ. Note that, for this choice, the imaginary part of the right hand side of
the last equation vanishes, and therefore that ∂yX = 0.
For L = 0, (5.57) can be integrated to give
, (5.58)
which is (up to a rescaling of the coordinate x) the example given above with α = 1
This was already found to be 1/2 supersymmetric in [16]. Here we find that this solution
is a special case of the most general possibility.
For other values of the constant L it is convenient to use X as a new coordinate
instead of solving for X(x). From (4.18) and (4.19) it follows that σ can be chosen to
. (5.59)
Then the metric reads
ds2 = −4X2
dz2 +
ℓ2dX2
X2(1 + 4LX4)
1 + 4LX4
dy2 . (5.60)
Finally, from (4.21) we obtain the gauge field strength
F = 2dt ∧ dX . (5.61)
Note that the geometry (5.60) is generically of Petrov type D, and becomes of Petrov
type N for L = 0.
Now let us turn our attention to the form of the second Killing spinor. First of all,
the integrability conditions imply that it takes the form
αT = (β1, β2, iX
3eξβ2, iX
3eξβ2) ,
– 46 –
where β1 and β2 are arbitrary space-time dependent functions. The Killing spinor
equations (4.31) yield
β1 = λ1 − 12λ2b
−2 , β2 = λ2b
where λ1 and λ2 are integration constants. This implies that the new Killing spinor
takes the form ǫ = λ1ǫ1 + λ2ǫ2, where
ǫ1 = 1 + iXe2 , ǫ2 =
X−2(1− iXe2) +
X−4 + L (e1 − iXe1 ∧ e2) . (5.62)
Note that G0 = 0 as well in this class.
One interesting aspect of the second Killing spinor ǫ2 is the norm of its associated
Killing vector Vµ = D(ǫ2,Γµǫ2). We find VµV
µ = −4X2L2, hence the second Killing
spinor is indeed null for the case L = 0, as was noticed before, while it is timelike
for L 6= 0. In the latter case, to understand whether the solution belongs also to the
null class of supersymmetric solutions, we have therefore to study the most general
linear combination of the two Killing spinors. The Killing vector Ṽ constructed from
ǫ = λ1ǫ1 + λ2ǫ2 has norm
Ṽ 2 =
λ̄1λ2 − λ1λ̄2
)2 − 4X2
L|λ1|2 + |λ2|2
which can vanish only if L ≤ 0. We have therefore three cases:
1. L > 0, pure timelike class, Petrov type D.
2. L = 0, belongs to both null and timelike classes, Petrov type N. This is the
homogeneous half BPS pp-wave in AdS. (In the terminology of [16] it has a wave
profile Gα with α = 0).
3. L < 0, belongs to both null and timelike classes, Petrov type D.
Actually the solutions (5.60) with L > 0 can be cast into a simpler form. This is
done by trading the coordinate y for a new variable ψ = Ly − t. For convenience, let
us also introduce the Schwarzschild coordinate r and rescale z,
r = − ℓ√
, ζ =
Lz . (5.63)
In the new coordinates, the metric and the gauge field strength read
ds2 = −
dt2 +
dψ2 + dζ2
, F = qe
dt ∧ dr , (5.64)
– 47 –
where we have defined qe = 2ℓ/
L. This is precisely the half BPS solution obtained
in [36], the massless limit of an electrically charged toroidal black hole, which forms a
naked singularity. It is also interesting to note that the charge qe diverges in the L→ 0
limit. This limit is naively singular in these coordinates, but it can be taken if we
perform a Penrose limit [37, 38]. The existence of this limit explains why we obtained
a one-parameter family of geometries (5.60) connecting the massless limit of toroidal
black holes and a pp-wave. Indeed, define the new coordinates (X+, X−, R, Z) and the
rescaled charge Qe by
ψ + t = 2ǫ2X+ , ψ − t = 2X− , r = 1
, ζ = ǫZ , qe =
. (5.65)
Then, the singular limit ǫ→ 0 yields is a regular solution of the theory and corresponds
to the half supersymmetric solution (5.60) with L = 0,
ds2 =
4 dX+dX− − Q
dX−2 + dR2 + dZ2
, F = Qe
dX− ∧ dR . (5.66)
In the procedure, we have blown up the metric in the neighborhood of a geodesic with
ψ + t constant near the boundary r → ∞ of AdS.
We now turn to the L < 0 case, which is both timelike and lightlike. Let us define
L = −µ2. We can perform a coordinate transformation inspired from the previous one,
ψ = Ly − t , r = − ℓ
, ζ =
z , (5.67)
under which the metric and the field strength become
ds2 =
dt2 +
− q2e
−dψ2 + dζ2
, F = qe
dt ∧ dr , (5.68)
where we have defined qe = 2ℓ/µ. We see that this is the precisely the metric for L > 0
after the double analytic continuation
t 7→ it , ψ 7→ iψ , qe 7→ −iqe . (5.69)
This solution represents therefore a bubble of nothing in AdS [39–42]. Note that the
metric is singular for r =
ℓqe. One should compactify t, in such a way to eliminate
the conical singularity on the (t, r) hypersurface. Then, if we compactify also ζ , this
S1 will have a minimal radius for r =
ℓqe (the boundary of the bubble of nothing)
and then grow with r. Note that for r → ∞ one locally recovers AdS spacetime, and
that the L = 0 solutions can again be understood as a Penrose limit of this metric.
– 48 –
5.4 Action of the PSL(2,R) group on the imaginary b solutions
We can now generate new supersymmetric solutions by acting with the PSL(2,R)
symmetry group (4.28)-(4.29) on the known ones. It is easy to check that the AdS4
and AdS2×H2 solutions are invariant under this group (although it acts non trivially on
the Killing spinors). Its action on the b = b(z) subfamily of the RNTN-AdS4 solutions
was studied in [16], where it was shown that it acts non trivially on the charges, by
mixing them. Here we want to apply it to the imaginary b solutions of the previous
paragraph.
The new solution solution of the supersymmetry equations (4.22)-(4.23) generated
by the transformation (4.28)-(4.29) is
b̃ = − γ
2γ2ℓXz + i
, e2(Φ̃+ξ) =
1 + 4LX4
, (5.70)
where, without loss of generality, we eliminated α by means of a translation of z19, and
dropped the prime of the new coordinate z′. The shift function is then determined by
solving equations (4.18) and (4.19),
σx = 0 , σy =
1 + 4LX4
4γ2X4z2
. (5.71)
Then, defining the new coordinates (T, σ, p, q) through
2ℓ2γ2
, σ =
, p = − ℓ
, q = 2ℓ2γ2z , (5.72)
the metric reads
ds2 = − Q(q)
q2 + p2
P (p)
q2 + p2
dq2 +
q2 + p2
P (p)
(q2 + p2)P (p) dσ2, (5.73)
Q(q) =
, P (p) =
p4 + 4Lℓ2
, (5.74)
and the gauge field (4.21) is
F = d
ℓ(q2 + p2)
∧ dT + d
q2 + p2
∧ dσ . (5.75)
19After this translation the limit γ → 0 is not anymore well-defined. To perform it, one has to
substitute preliminarily z with z − α/γ everywhere.
– 49 –
The form of the metric suggests some connection with the Plebanski-Demianski family
of solutions, and indeed these geometries are of Petrov type D for L 6= 0, and of Petrov
type N for L = 0, but we were not able to find the precise relation. Note also that
the parameter γ has been reabsorbed in the new variables, and we are left with a
one-parameter (L) family of solutions.
The left hand side of the necessary condition (4.34) for the existence of a second
Killing spinor reads, for this solution,
− 9iX
4 (1 + 4LX4)
ℓ2 (1 + 4γ4ℓ2X2z2)
γ2 (5.76)
which clearly vanishes only for γ = 0, i.e. if the PSL(2,R) transformation is trivial.
Therefore, the new solutions (5.73)-(5.75) preserves only 1/4 of the supersymmetries,
and we explicitly see that the PSL(2,R) transformations can break any additional
supersymmetry. Also note that if we perform the PSL(2,R) transformation adapting
the original metric to a different Killing spinor, we could in principle end up with other
supersymmetric solutions.
Surprisingly, we find that the L = 0 solution can be cast in the Lobatchevski wave
form, even though it only has a time-like Killing spinor. This can be seen by trading
the coordinates (q, p) for (x, z) defined by
, z =
, (5.77)
in the metric (5.73) with L = 0, which becomes
ds2 =
−2 dTdσ + z
x2 + z2
x2 + z2
x2 + z2
dT 2 + dz2 + dx2
. (5.78)
The field strength can be easily obtained from equation (5.75) but the result is not
particularly enlightening and therefore we do not report it. This metric represents a
1/4 BPS Lobatchevski wave, whose Killing spinor falls in the timelike class. This does
not contradict the results obtained in the null case, since the null Lobatchevski had a
field strength (3.6) of the form F = φ′(T )dT ∧dz, while this solution has a much more
complicated gauge field. It is however interesting to note that the solutions of the null
case do not exhaust all possible supersymmetric Lobatchevski waves.
5.5 Gravitational Chern-Simons system and G0 = ψ− = 0 solutions
A number of the previously studied subcases can be combined into the interesting
Ansatz
b = −1
αz2 + βz + γ
2αz + β − iη(w, w̄) , (5.79)
– 50 –
where α, β and γ are three real constants. For α = β = 0 this reduces to b imaginary,
while η = 0 leads to the real subcase of b = b(z). With this assumption, the equations
for a timelike Killing spinor reduce to
e2ξ (k − 3η) = 0 , ∆η + e2ξ
kη − η3
= 0, (5.80)
where we have defined k = 4αγ − β2 and ∆ = 4∂∂̄. Interestingly, as shown in [16],
this system of equations follows from the dimensionally reduced Chern-Simons action
[43, 44],
(2)Rη + η3
, (5.81)
if we use the conformal gauge (2)gijdx
idxj = e2ξ (dx2 + dy2) and η is the curl of a vector
potential,
(2)g ǫijη = ∂iAj−∂jAi. To obtain equations (5.80) we vary the action with
respect to Ai and ξ. When varying the dimensionally reduced Chern-Simons action
with respect to gij there is however an additional equation to (5.80).
Using the results of Grumiller and Kummer [48], one obtains the most general
solution to the dimensionally reduced Chern-Simons system [16]
e2ξ =
η4, (5.82)
where L is an integration constant and dη = e2ξdx. Trading the coordinate x for η, we
get the following configuration of the fields
ds2 = − 4
P ′22 + η
[dt + σ]
P ′22 + η
dz2 + P 22
e−2ξdη2 + e2ξdy2
A = 2
P ′22 + η
[dt + σ] +
Vdy − i ℓ
d log
, (5.83)
where P2(z) = αz
2 + βz + γ, k is defined as above and the shift function reads
αη2 +
dy . (5.84)
These solutions preserve 1/4 of the original supersymmetry. In fact, the k = 0 solutions
coincide with the imaginary b ones and their PSL(2,R) transforms of sections 5.3 and
5.4. For k non-vanishing these are different solutions.
As can be seen from the Poisson bracket (4.34), the only possibility to have 1/2
supersymmetry is α = 0 and hence k ≤ 0. In fact, starting from any solution with
k ≤ 0, one can always obtain α = 0 by an appropriate PSL(2,R) transformation. The
– 51 –
non-trivial part of the PSL(2,R) symmetry is z 7→ −1/(z + δ), whose action on the
parameters α, β and γ of the Ansatz (5.79) is given by
α 7→ αδ2 − βδ + γ , β 7→ 2αδ − β , γ 7→ α , (5.85)
which keeps k fixed. Indeed, for k ≤ 0, there is always a PSL(2,R) transformation that
sets α = 0, while this is impossible for k > 0.
The requirement α = 0 leads to the half-supersymmetric imaginary b solution of
section 5.3 for k = 0. In the case of k negative, when α = 0 one can scale β to 1 in
(5.79) without loss of generality, and γ can be put to zero by a translation in z. Hence
the function b is given by
b = −1
1− iη . (5.86)
The metric is given in (D.32) and is generically of Petrov type D. The second Killing
spinor can be found in (D.33). As shown in appendix D, the G0 = ψ− = 0 solutions
are either the imaginary b ones, anti-Nariai spacetime or the above 1/2 supersymmetric
solution with k = −1.
We would like to mention that (5.82) is the most general solution to the dimen-
sionally reduced Chern-Simons system, but not to the equations (5.80). The reason for
this is the additional constraint one obtains when varying (5.81) with respect to gij.
An example of this is provided by the Petrov type I solution with b = i(x/ℓ)2 in section
5.3 and its PSL(2,R) transform given in eq. (2.44) of [16].
6. Final remarks
In this paper, we applied spinorial geometry techniques to classify all supersymmetric
solutions of minimal N = 2 gauged supergravity in four dimensions.
In the presence of null Killing spinors, the problem can be completely solved,
and all 1/4- and 1/2-supersymmetric solutions have been written down explicitly. We
showed that there are no 1/4-BPS backgrounds with U(1)⋉R2-invariant Killing spinors
and those with R2-invariant Killing spinors have been derived in sections 3.1 and 3.2.
The backgrounds in the latter section were previously unknown and are Petrov type
II configurations describing gravitational waves propagating on a bubble of nothing
in AdS4. In addition, it turned out that there are no 1/2-BPS backgrounds with R
invariant Killing spinors and hence any additional Killing spinor is timelike. In section
3.3 we gave the backgrounds with one null and one timelike Killing spinor.
For a timelike Killing spinor we derived the conditions for the corresponding back-
grounds in section 4.1 and 4.2. We worked out the first integrability conditions nec-
essary for the existence of a second Killing spinor in section 4.3. We explicitly solved
– 52 –
these equations in a number of subcases in section 5, and thereby found several new
solutions, like the bubbles of nothing in AdS4, already obtained in the null formalism,
and their PSL(2,R)-transformed configurations. Furthermore, our results showed that
the generalized holonomy in the case of one preserved complex supercharge is contained
in A(3,C), supporting thus the classification scheme of [4].
In addition, the time-dependence of a second time-like Killing spinor was shown to
be an overall exponential factor with coefficient G0 in section 4.4. In the case G0 = 0
these equations have been solved in full generality, up to a second order ordinary
differential equation. We expect this class to comprise a large number of interesting
1/2-BPS solutions. Indeed, all the examples of section 5 either have vanishing G0 or
can be transformed to that case by a coordinate transformation.
There are several interesting points that remain to be understood. First of all, it
would be desirable to get a deeper insight into the underlying geometric structure in
the case of U(1) invariant spinors. In five dimensions, spacetime is a fibration over a
four-dimensional Hyperkähler or Kähler base for ungauged and gauged supergravity
respectively [8, 12], whereas in four-dimensional ungauged supergravity one has a fi-
bration over a three-dimensional flat space [5]. This suggests that the base for D = 4
gauged supergravity might be an odd-dimensional analogue of a Kähler manifold, i. e. ,
a Sasaki manifold. From the equations (4.22) and (4.23) this is not obvious.
Secondly, in [16], a surprising relationship between the equations (4.22), (4.23) gov-
erning 1/4 BPS solutions and the gravitational Chern-Simons theory [43] was found.
Why such a relationship should exist is not clear at all, and deserves further investiga-
tions.
The third point concerns preons, which were conjectured in [45] to be elementary
constituents of other BPS states. In type II and eleven-dimensional supergravity, it was
shown that imposing 31 supersymmetries implies that the solution is locally maximally
supersymmetric [27,30,46]. Similar results in four- and five-dimensional gauged super-
gravity were obtained in [28,29]. This implies that preonic backgrounds are necessarily
quotients of maximally supersymmetric solutions. While M-theory preons cannot arise
by quotients [47], it remains to be seen if 3/4 supersymmetric solutions to N = 2,
D = 4 or D = 5 gauged supergravities really do not exist. The only maximally super-
symmetric backgrounds in these theories are AdS4 [13] and AdS5 [12] respectively, so
the putative preonic configurations must be quotients of AdS.
Finally, it would be interesting to apply spinorial geometry techniques to classify
all supersymmetric solutions of four-dimensional N = 2 matter-coupled gauged super-
gravity. Work in this direction is in progress [49].
– 53 –
Acknowledgments
We are grateful to Alessio Celi, Marcello Ortaggio and Christoph Sieg for useful dis-
cussions. This work was partially supported by INFN, MURST and by the European
Commission program MRTN-CT-2004-005104. D.R. wishes to thank the Università
di Milano for hospitality. Part of this work was completed while he was a post-doc
at King’s College London, for which he would like to acknowledge the PPARC grant
PPA/G/O/2002/00475. In addition, he is presently supported by the European EC-
RTN project MRTN-CT-2004-005104, MCYT FPA 2004-04582-C02-01 and CIRIT GC
2005SGR-00564.
A. Spinors and forms
In this appendix, we summarize the essential information needed to realize the spinors
of Spin(3,1) in terms of forms. For more details, we refer to [50]. Let V = R3,1 be
a real vector space equipped with the Lorentzian inner product 〈·, ·〉. Introduce an
orthonormal basis e1, e2, e3, e0, where e0 is along the time direction, and consider the
subspace U spanned by the first two basis vectors e1, e2. The space of Dirac spinors is
∆c = Λ
∗(U⊗C), with basis 1, e1, e2, e12 = e1∧e2. The gamma matrices are represented
on ∆c as
Γ0η = −e2 ∧ η + e2⌋η , Γ1η = e1 ∧ η + e1⌋η ,
Γ2η = e2 ∧ η + e2⌋η , Γ3η = ie1 ∧ η − ie1⌋η , (A.1)
where
ηj1...jkej1 ∧ . . . ∧ ejk
is a k-form and
ei ∧ η =
(k − 1)!ηij1...jk−1ej1 ∧ . . . ∧ ejk−1 .
One easily checks that this representation of the gamma matrices satisfies the Clifford
algebra relations {Γa,Γb} = 2ηab. The parity matrix is defined by Γ5 = iΓ0Γ1Γ2Γ3,
and one finds that the even forms 1, e12 have positive chirality, Γ5η = η, while the odd
forms e1, e2 have negative chirality, Γ5η = −η, so that ∆c decomposes into two complex
chiral Weyl representations ∆+c = Λ
even(U ⊗ C) and ∆−c = Λodd(U ⊗ C).
Let us define the auxiliary inner product
αiei,
βjej〉 =
α∗iβi (A.2)
– 54 –
on U ⊗ C, and then extend it to ∆c. The Spin(3,1) invariant Dirac inner product is
then given by
D(η, θ) = 〈Γ0η, θ〉 . (A.3)
In many applications it is convenient to use a basis in which the gamma matrices act
like creation and annihilation operators, given by
Γ+η ≡
(Γ2 + Γ0) η =
2 e2⌋η , Γ−η ≡
(Γ2 − Γ0) η =
2 e2 ∧ η ,
Γ•η ≡
(Γ1 − iΓ3) η =
2 e1 ∧ η , Γ•̄η ≡
(Γ1 + iΓ3) η =
2 e1⌋η . (A.4)
The Clifford algebra relations in this basis are {ΓA,ΓB} = 2ηAB, where A,B, . . . =
+,−, •, •̄ and the nonvanishing components of the tangent space metric read η+− =
η−+ = η••̄ = η•̄• = 1. The spinor 1 is a Clifford vacuum, Γ+1 = Γ•̄1 = 0, and
the representation ∆c can be constructed by acting on 1 with the creation operators
Γ+ = Γ−,Γ
•̄ = Γ•, so that any spinor can be written as
φā1...ākΓ
ā1...āk1 , ā = +, •̄ .
The action of the Gamma matrices and the Lorentz generators ΓAB is summarized in
the table 6.
1 e1 e2 e1 ∧ e2
Γ+ 0 0
2e2 −
2e1 ∧ e2 0 0
2e1 0
2e1 ∧ e2 0
Γ•̄ 0
Γ+− 1 e1 −e2 −e1 ∧ e2
Γ•̄• 1 −e1 e2 −e1 ∧ e2
Γ+• 0 0 −2e1 0
Γ+•̄ 0 0 0 2
Γ−• −2e1 ∧ e2 0 0 0
Γ−•̄ 0 2e2 0 0
Table 6: The action of the Gamma matrices and the Lorentz generators ΓAB on the different
basis elements.
– 55 –
Note that ΓA = UA
aΓa, with
1 0 1 0
−1 0 1 0
0 1 0 −i
0 1 0 i
∈ U(4) ,
so that the new tetrad is given by EA = (U∗)AaE
B. Spinor bilinears
Given a Killing spinor
ǫ = c01 + c1e1 + c2e2 + c12e1 ∧ e2 , (B.1)
one can construct the bilinears
f̃ = −iD(ǫ, ǫ) = −i (c0c∗2 − c1c∗3 − c2c∗0 + c12c∗1) , (B.2)
g̃ = −iD(ǫ,Γ5ǫ) = c0c∗2 + c1c∗3 + c2c∗0 + c12c∗1 , (B.3)
Ṽ = D(ǫ,Γµǫ) dx
|c2|2 + |c12|2
− |c0|2 − |c1|2
|c2|2 + |c12|2 + |b|2
|c0|2 + |c1|2
(dt + σ)
ψ [(c2c
1 − c0c∗12) dw + (c1c∗2 − c12c∗0) dw̄] , (B.4)
B̃ = D(ǫ,Γ5Γµǫ) dx
|c2|2 − |c12|2
+ |c0|2 − |c1|2
|c2|2 − |c12|2 − |b|2
|c0|2 − |c1|2
(dt + σ)
ψ [(c2c
1 + c0c
12) dw + (c1c
2 + c12c
0) dw̄] , (B.5)
D(ǫ,Γµνǫ) dx
µ ∧ dxν = − (c0c∗2 − c1c∗12 + c2c∗0 − c12c∗1) dt ∧ dz
12 + |b|2c0c∗1
dt ∧ dw − 2e
0 + 4|b|2c1c∗0
dt ∧ dw̄
2 − c1c∗12 + c2c∗0 − c12c∗1)σw +
2|b|3 c2c
2|b|c0c
dz ∧ dw
2 − c1c∗12 + c2c∗0 − c12c∗1)σw̄ +
2|b|3 c12c
2|b|c1c
dz ∧ dw̄
– 56 –
12σw̄ − c12c∗0σw + |b|2 (c0c∗1σw̄ − c1c∗0σw)
4|b| (c0c
2 + c1c
12 − c2c∗0 − c12c∗1)
dw ∧ dw̄ . (B.6)
Given the first Killing spinor of the form ǫ1 = 1 + be2 and the second Killing spinor
ǫ2 = c01 + c1e1 + c2e2 + c12e1 ∧ e2, one can also construct mixed bilinears of the type
D(ǫ1,Γ···ǫ2), which verify the same differential equations as the bilinears built from the
original two Killing spinors:
f̂ = −i(b̄c0 − c2) , ĝ = b̄c0 + c2 , (B.7)
(c2 + bc0) (dt + σ) +
(c2 − bc0) dz +
b̄c1 − c12
dw̄ , (B.8)
(c2 − bc0) (dt+ σ) +
(c2 + bc0) dz +
b̄c1 + c12
dw̄ . (B.9)
C. The case P ′ = 0
In section 4.3, we simplified the equations for the second Killing spinor under the
assumption P ′ 6= 0, where P = e−2ψb∂b. Here we consider the case P ′ = 0. To this
end, we need the following subset of the Killing spinor equations (4.31):
∂+ψ2 −
ψ12 = 0 , (C.1)
∂+ψ12 − re−ψ∂•̄ ln b̄ ψ1 −
ψ12 = 0 , (C.2)
∂−ψ2 +
e−ψ∂•̄ ln b ψ1 −
ψ2 = 0 , (C.3)
∂−ψ12 −
ψ12 = 0 , (C.4)
re−ψ∂•
e2ψψ2
ψ1 = 0 , (C.5)
re−ψ∂•
e2ψψ12
ψ1 = 0 . (C.6)
If P ′ = 0, (4.35) implies ψ− = 0 or ∂P = 0. Let us first assume the former, i. e. ,
ψ2 = ψ12. From (C.6) – (C.5) one obtains then ψ1 = 0 or
= 0 . (C.7)
– 57 –
• If ψ1 = 0, (C.4) – (C.3) yields ψ2 = 0, and thus there exists no further Killing
spinor.
• If (C.7) holds, one can use (C.1) and (C.4) to show that ∂+ψ2 = ∂−ψ2 = 0, or
equivalently ∂tψ2 = ψ
2 = 0. Using this in (C.2) and (C.3) and deriving with
respect to t, one gets ∂̄b̄ ∂tψ1 = ∂̄b ∂tψ1 = 0. When ∂tψ1 6= 0, this means that
∂̄b = ∂b = 0, so b = b(z), which is a case analyzed in section 5.1. If instead
∂tψ1 = 0, all the ψi are independent of t, and the Killing spinor equations reduce
to the system (4.38) to (4.49) with G0 = 0.
In the case ∂P = 0, consider the integrability condition
′ + ψ−∂Q = 0 , (C.8)
where Q = e−2ψ b̄∂b̄, following from the first line of Ñwt. As long as Q
′ 6= 0, with the
same reasoning as in section 4.3, one obtains the system (4.38) to (4.49). If Q′ = 0,
(C.8) implies ψ− = 0 or ∂Q = 0. The case ψ− = 0 was already considered above, so
the only remaining case is P ′ = ∂P = Q′ = ∂Q = 0. For P = Q = 0 we get again
b = b(z), so without loss of generality we can assume P 6= 0 or Q 6= 0. Suppose that
Q = 0, P 6= 0, so b = b(w, z). Take the logarithm of e−2ψb∂b = P (w̄), derive with
respect to z, use (4.15), and apply ∂̄. This leads to ∂b = 0, which is a contradiction to
the assumption P 6= 0. In the same way one shows that P = 0, Q 6= 0 is not possible,
so that both P and Q must be nonvanishing. Now use the third row of Ñw̄t, which
leads to Q̄ψ2 = 0 and hence ψ2 = 0. Finally, the last row of Ñw̄t yields ψ− = 0, i. e. ,
the case already considered above.
Hence, the conclusion is that in the case P ′ = 0, the second Killing spinor either
has G0 time-dependence of the form (4.37), or leads to solutions with b = b(z). The
latter are treated separately in section 5.1. As can be found there, all 1/2-BPS solutions
with b = b(z) also have second Killing spinors with G0 time-dependence of the form
(4.37). Hence this time-dependence is a completely general result20 for second Killing
spinors in the time-like case.
D. Half-supersymmetric solutions with G0 = 0
From the difference of equations (4.39)−(4.43) and (4.48)−(4.49) one gets ψ− = ψ−(w).
Furthermore, [(4.42)−(4.40)+e−2ψ(4.47)] and (4.44) yield ψ1 = ψ1(z). Assuming ψ− 6=
20The only counterexample is the third Killing spinor of AdS4, see (5.30), but since this is maximally
supersymmetric it does not contradict the result.
– 58 –
0, eqns. (4.38) and (4.41) can be written in the form
= 0 ,
= 0 , (D.1)
where β = ψ1/ψ−. Deriving (D.1) with respect to w̄ gives
+ ∂∂̄
= 0 ,
+ ∂∂̄
= 0 .
Now use (D.1) in the difference between the first equation and the complex conjugate
of the second one to get
β̄β ′ − β̄ ′β
= 0 .
Observe that β̄β ′ − β̄ ′β = |ψ−|−2
1 − ψ̄′1ψ1
(z), so that for ψ̄1ψ
1 − ψ̄′1ψ1 6= 0 there
must exist a real function B(z) and a generic function h(w, w̄) such that
b = B(z)h(w, w̄) .
Plugging this into (D.1), we conclude that
= 0 ,
so that the phase of the function h is fixed, h = hR(w, w̄)e
iϕ0 , with hR real. Using
(4.18), the constancy of the phase of b implies that the shift vector σ does not depend
on z. (4.19) gives then
= 0 ,
or, using (4.15),
cosϕ0
+B′ = 0 ,
and thus
B′ = c ,
cosϕ0
= −c ,
where c denotes a real constant. Now we have to distinguish to cases:
1. c 6= 0: In this case b(z) =
B0 − 2 cosϕ03ℓ z
eiϕ0 . Plugging this into the first of
eqns. (D.1) one gets
= 0 ,
which is solved by ψ1 = ηb where η is a constant. But this yields ψ̄1ψ
1−ψ̄′1ψ1 = 0,
which contradicts our assumption.
– 59 –
2. c = 0: In this case b(w, w̄) = ihR(w, w̄). The combination (4.40)+(4.42)−(4.39)−
(4.43) leads to ψ− = 0, which again contradicts one of our assumptions.
We thus conclude that ψ̄1ψ
1− ψ̄′1ψ1 = 0, and hence ψ1 = ζ(z)eiθ0 where θ0 is a constant
and ζ(z) is a real function. Sending ψi → e−iθ0ψi we can take ψ1 real and non-negative
without loss of generality. Let us now consider the case where both ψ1 and ψ− are
non-vanishing. This allows to introduce new coordinates Z,W, W̄ such that
ψ1(z)
dz , dW =
ψ−(w)
, dW̄ =
ψ̄−(w̄)
Note that one can set ψ− = 1 using the residual gauge invariance w 7→ W (w), ψ 7→
ψ̃ = ψ − 1
ln(dW/dw)− 1
ln(dW̄/dw̄) leaving invariant the metric e2ψdwdw̄. We can
thus take W = w in the following. Equations (4.38) and (4.41) are then equivalent to
(∂Z + ∂)ϕ = 0 , ∂Z lnψ1 − (∂Z + ∂) ln r = 0 .
From the real part of the first equation we have
ϕ = ϕ(Z − w − w̄) .
Using ψ1 = ψ1(Z), the second equation implies
(∂Z + ∂)
= 0 ,
and hence
= ρ(Z − w − w̄) .
The function b must thus have the form
b(Z,w + w̄) = ψ1(Z)B(Z − w − w̄) ,
where B(Z − w − w̄) = ρ(Z − w − w̄)eiϕ(Z−w−w̄). The difference between (4.45) and
(4.46) yields
(∂Z + ∂) (lnψ1 − ψ) = 0 ,
so that lnψ1 − ψ = −H(Z − w − w̄) with H real. This gives
e2ψ = ψ1(Z)
for the conformal factor. In terms of the new coordinate Z, (4.15) reads
∂Zψ +
= 0 .
– 60 –
Using the definition of H we get
Ḣ + ∂Z lnψ1 +
= 0 , (D.2)
where a dot denotes a derivative with respect to Z−w− w̄. We can thus conclude that
∂Z lnψ1 = γ/ℓ for some constant γ, i. e. ,ψ1(Z) = ψ
γZ/ℓ. By shifting Z one can set
1 = 1. Calling χ = ψ+/ψ−, the only remaining nontrivial Killing spinor equations
∂Zχ− 2
χ+ 2iϕ̇+
= 0 ,
− Ḣ + γ
χ− 2ie−2H ϕ̇− 1
= 0 ,
∂χ + 2
χ− 2iϕ̇− 1
= 0 ,
∂̄χ+ 2
χ− 2iϕ̇ = 0 ,
χ + 2
1 + e−2H
− Ḣ + γ
= 0 .
Summing the first and the third equation yields χ = χ(Z −w− w̄), so that we are left
χ̇− 2
χ+ 2iϕ̇+
= 0 , (D.3)
− Ḣ + γ
χ− 2ie−2H ϕ̇− 1
= 0 , (D.4)
− χ̇+ 2 ρ̇
χ− 2iϕ̇ = 0 , (D.5)
χ + 2
1 + e−2H
− Ḣ + γ
= 0 . (D.6)
Adding (D.4) and (D.6) one gets
Ḣχ +
= 0 , (D.7)
which means that χ is purely imaginary. From (D.2) and (D.7) we obtain then the
function B,
+ Ḣ(1 + χ) = 0 . (D.8)
– 61 –
Using this, the remaining Killing spinor equations reduce further to
1 + e2H
= 0 , (D.9)
= 0 , (D.10)
1 + χ2
− 2 ρ̇
1 + e−2H
. (D.11)
Note that (D.9) automatically implies the integrability condition for the system (4.18),
(4.19), which reduces to
∂Zσw =
, ∂σw̄ − ∂̄σw = −
. (D.12)
Thus, also equation (4.23) is satisfied, whereas (4.22) reads
1 + e−2H
2Ḧ + Ḣ2
1 + 3χ2
. (D.13)
From (D.8) we obtain the phase ϕ and the modulus ρ of B,
tanϕ = i
− Ḣ2χ2 .
Plugging this into equation (D.10) yields
2ḦḢχ
1− χ2
− Ḣ2χ̇
1 + 3χ2
χ̇ = 0 .
Using (D.13), this can be rewritten as
1− χ2
1 + e−2H
= 0 ,
so that either Ḧ = 0 or Ḣχ (1− χ2) +
1 + e−2H
χ̇ = 0. It is straightforward to show
that the first case leads to AdS4, whereas the second one implies
e2H + 1
1− χ2 = −α
2 , (D.14)
where α is a real integration constant. Equations (D.9) and (D.11) are then identically
satisfied. Solving (D.14) for χ and plugging into (D.13) yields finally the ordinary
differential equation (4.50), which determines half-supersymmetric solutions with G0 =
0. Putting together all our results, we obtain (4.52) for the metric. Note that in the
– 62 –
case γ 6= 0 one can always set γ = 1 by rescaling the coordinates.
The second Killing spinor for these backgrounds is given by
αT = (α0, ρ
−2e−γZ/ℓ,
eH) ,
where
α0 = −
+ α̂0(Z,w, w̄) ,
and α̂0 is a solution of the system
∂Zα̂0 =
− iϕ̇ + γ
∂α̂0 = −
+ iϕ̇
, (D.15)
∂̄α̂0 = −
σw̄ +
+ iϕ̇
γχ e2H
ℓψ1ρ2
It is straightforward to verify that the integrability conditions for this system are al-
ready implied by (D.9), (D.10) and (D.12).
Consider now the case ψ− = 0. From the difference of equations (4.38) and (4.41)
it follows that b′/b is real. Then (4.38) and (4.44) imply that ψ is a real function,
depending only on z, ψ1 = ψ1(z). Moreover, since ψ12 = ψ2, the difference of equations
(4.45) and (4.46) imply that b′/b+ 1/ℓb is imaginary.
The conditions b′/b real and b′/b+ 1/ℓb imaginary can be satisfied simultaneously
in three different ways:
• b′/b = 0 hence b = b(w, w̄) is an imaginary function independent of z. This case
is solved completely in section 5.3.
• b′/b+ 1/ℓb = 0 implies b = −z/ℓ + c and corresponds to AdS2 ×H2, analyzed in
section 5.1.1. It is also a subcase of the following, general case,
• if we are not in one of the previous special cases, the function b must take the
b = −1
1− iY (w, w̄) , (D.16)
where Y (w, w̄) is some real function to be determined.
We thus have to solve just for the ansatz (D.16). Equation (4.38) implies ψ′1/ψ1 = b
than is solved by ψ1 = z, where we have reabsorbed the integrability constant in the
– 63 –
scale of z. Equation (4.39) (or equivalently (4.43)) tells us that ψ2 = ψ2(w, w̄), so that
the remaining independent equations read
iz2e−2ψ
1 + Y 2
− ψ2 = 0 ,
∂ψ2 + ∂
1 + Y 2
ψ2 − iY = 0 ,
∂̄ψ2 + ∂̄ log
1 + Y 2
ψ2 = 0 .
The first equation allows us to define a function H(w, w̄) such that
eψ = zeH(w,w̄) , (D.17)
while the last one implies that there must exist a holomorphic function C(w) such that
1 + Y 2
. (D.18)
Thus we are left with
e2HC(w) = i∂̄Y , (D.19)
e2HC(w)
= ie2HY
1 + Y 2
. (D.20)
This set of equations automatically implies the integrability condition for the system
(4.18), (4.19), which reduces to
∂zσ = i
, (D.21)
∂σ̄ − ∂̄σ = iℓ2e2HY
1 + Y 2
. (D.22)
Thus also (4.23), which reads
∂∂̄Y − e2HY
1 + Y 2
= 0 , (D.23)
is satisfied and it turns out that also the Bianchi identity (4.22), namely
∂∂̄2H − e2H
1 + 3Y 2
= 0 , (D.24)
holds. We conclude that a solution to the system (D.19), (D.20) describes a 1/2-BPS
configuration of the “gravitational Chern-Simons” system discussed in [16]. If C(w) = 0
then necessarily also Y = 0 so that we are left with AdS. If C(w) 6= 0 then we can
define new variables W and W̄ such that
∂W = C(w)∂ , ∂W̄ = C̄(w̄)∂̄ , (D.25)
– 64 –
so that we have
e2HCC̄ = i∂W̄Y ,
e2HCC̄
= ie2HCC̄Y
1 + Y 2
As what we did in the previous case, we can set C(w) = 1 using the residual gauge
invariance w 7→ W (w), ψ 7→ ψ̃ = ψ − 1
ln(dW/dw) − 1
ln(dW̄/dw̄) leaving invariant
the metric e2ψdwdw̄. We can thus take W = w without loss of generality, and get
e2H = i∂̄Y , (D.26)
2∂H = iY
1 + Y 2
(D.26) implies Y = Y [i(w − w̄)] and hence H = H [i(w− w̄)]. Denoting with a dot the
derivative with respect to the combination i(w − w̄) we have
e2H = Ẏ , (D.27)
2Ḣ = Y
1 + Y 2
. (D.28)
The equations for the shift form can now be integrated, giving
Ẏ d (w + w̄) (D.29)
Plugging (D.27) into (D.28) leads to
Ÿ = Ẏ Y (1 + Y 2) , (D.30)
which, integrated once, gives
Y 2 +
Y 4 ≡ P (Y ) , (D.31)
where L is a real constant and k = −121. We can thus use Y as a new coordinate,
instead of i(w − w̄). Call X = w + w̄, so that the solution reads
ds2 = − 4
1 + Y 2
PC(Y )dX
1 + Y 2
dz2 + z2
PC(Y )dX
PC(Y )
A = 2
1 + Y 2
dt+ ℓY
PC(Y )
1 + Y 2
1 + Y 2
1 + Y 2
. (D.32)
21The link with the notation of [48], where C and k are the Casimirs of the Poisson sigma model
equivalent to the dimensionally reduced gravitational Chern-Simons model in 2D, is given by 2C =
L/ℓ△.
– 65 –
We can thus finally compute the second Killing spinor, with the result
ǫ2 = −
1 + Y 2
PC(Y )
1 + iY
1− iY e
1 + Y 2
(1 + iY ) +
1− iY
e2 + ℓ
PC(Y )
1 + Y 2
e1 ∧ e2 . (D.33)
References
[1] M. Berger, “Sur les groupes d’holonomie homogènes de variétés à connexion affine et
des variétés riemanniennes,” Bul. Soc. Math. France 83 (1955) 279.
[2] R. L. Bryant, “Pseudo-Riemannian metrics with parallel spinor fields and vanishing
Ricci tensor,” Sémin. Congr., 4 Soc. Math. France, Paris (2000) 53.
[3] J. M. Figueroa-O’Farrill, “Breaking the M-waves,” Class. Quant. Grav. 17 (2000) 2925
[arXiv:hep-th/9904124].
[4] A. Batrachenko and W. Y. Wen, “Generalized holonomy of supergravities with 8 real
supercharges,” Nucl. Phys. B 690 (2004) 331 [arXiv:hep-th/0402141].
[5] K. P. Tod, “All metrics admitting supercovariantly constant spinors,” Phys. Lett. B
121 (1983) 241.
[6] J. Kowalski-Glikman, “Positive Energy Theorem And Vacuum States For The
Einstein-Maxwell System,” Phys. Lett. B 150 (1985) 125.
[7] J. P. Gauntlett, D. Martelli, S. Pakis and D. Waldram, “G-structures and wrapped
NS5-branes,” Commun. Math. Phys. 247 (2004) 421 [arXiv:hep-th/0205050].
[8] J. P. Gauntlett, J. B. Gutowski, C. M. Hull, S. Pakis and H. S. Reall, “All
supersymmetric solutions of minimal supergravity in five dimensions,” Class. Quant.
Grav. 20, 4587 (2003) [arXiv:hep-th/0209114].
[9] J. B. Gutowski, D. Martelli and H. S. Reall, “All supersymmetric solutions of minimal
supergravity in six dimensions,” Class. Quant. Grav. 20, 5049 (2003)
[arXiv:hep-th/0306235].
[10] P. Meessen and T. Ort́ın, “The supersymmetric configurations of N = 2, d = 4
supergravity coupled to vector supermultiplets,” Nucl. Phys. B 749, 291 (2006)
[arXiv:hep-th/0603099].
[11] J. P. Gauntlett and S. Pakis, “The geometry of D = 11 Killing spinors,” JHEP 0304,
039 (2003) [arXiv:hep-th/0212008].
– 66 –
[12] J. P. Gauntlett and J. B. Gutowski, “All supersymmetric solutions of minimal gauged
supergravity in five dimensions,” Phys. Rev. D 68, 105009 (2003) [Erratum-ibid. D 70,
089901 (2004)] [arXiv:hep-th/0304064].
[13] M. M. Caldarelli and D. Klemm, “All supersymmetric solutions of N = 2, D = 4
gauged supergravity,” JHEP 0309 (2003) 019 [arXiv:hep-th/0307022].
[14] J. P. Gauntlett, J. B. Gutowski and S. Pakis, “The geometry of D = 11 null Killing
spinors,” JHEP 0312, 049 (2003) [arXiv:hep-th/0311112].
[15] M. Cariglia and O. A. P. Mac Conamhna, “The general form of supersymmetric
solutions of N = (1,0) U(1) and SU(2) gauged supergravities in six dimensions,” Class.
Quant. Grav. 21 (2004) 3171 [arXiv:hep-th/0402055].
[16] S. L. Cacciatori, M. M. Caldarelli, D. Klemm and D. S. Mansi, “More on BPS solutions
of N = 2, D = 4 gauged supergravity,” JHEP 0407 (2004) 061 [arXiv:hep-th/0406238].
[17] M. Cariglia and O. A. P. Mac Conamhna, “Timelike Killing spinors in seven
dimensions,” Phys. Rev. D 70 (2004) 125009 [arXiv:hep-th/0407127].
[18] J. B. Gutowski and W. Sabra, “General supersymmetric solutions of five-dimensional
supergravity,” JHEP 0510, 039 (2005) [arXiv:hep-th/0505185].
[19] J. Bellorin and T. Ort́ın, “All the supersymmetric configurations of N = 4, d = 4
supergravity,” Nucl. Phys. B 726, 171 (2005) [arXiv:hep-th/0506056].
[20] M. Huebscher, P. Meessen and T. Ort́ın, “Supersymmetric solutions of N = 2 d = 4
SUGRA: The whole ungauged shebang,” Nucl. Phys. B 759, 228 (2006)
[arXiv:hep-th/0606281].
[21] J. Bellorin, P. Meessen and T. Ort́ın, “All the supersymmetric solutions of N = 1, d =
5 ungauged supergravity,” JHEP 0701, 020 (2007) [arXiv:hep-th/0610196].
[22] J. Gillard, U. Gran and G. Papadopoulos, “The spinorial geometry of supersymmetric
backgrounds,” Class. Quant. Grav. 22, 1033 (2005) [arXiv:hep-th/0410155].
[23] U. Gran, G. Papadopoulos and D. Roest, “Systematics of M-theory spinorial
geometry,” Class. Quant. Grav. 22, 2701 (2005) [arXiv:hep-th/0503046].
[24] U. Gran, J. Gutowski and G. Papadopoulos, “The spinorial geometry of
supersymmetric IIB backgrounds,” Class. Quant. Grav. 22, 2453 (2005)
[arXiv:hep-th/0501177].
[25] U. Gran, J. Gutowski and G. Papadopoulos, “The G(2) spinorial geometry of
supersymmetric IIB backgrounds,” Class. Quant. Grav. 23, 143 (2006)
[arXiv:hep-th/0505074].
– 67 –
[26] U. Gran, J. Gutowski, G. Papadopoulos and D. Roest, “Systematics of IIB spinorial
geometry,” Class. Quant. Grav. 23 (2006) 1617 [arXiv:hep-th/0507087].
[27] U. Gran, J. Gutowski, G. Papadopoulos and D. Roest, “N = 31 is not IIB,” JHEP
0702, 044 (2007) [arXiv:hep-th/0606049].
[28] J. Grover, J. B. Gutowski and W. Sabra, “Vanishing preons in the fifth dimension,”
Class. Quant. Grav. 24, 417 (2007) [arXiv:hep-th/0608187].
[29] J. Grover, J. B. Gutowski and W. A. Sabra, “Maximally minimal preons in four
dimensions,” arXiv:hep-th/0610128.
[30] U. Gran, J. Gutowski, G. Papadopoulos and D. Roest, “N = 31, D = 11,” JHEP 0702,
043 (2007) [arXiv:hep-th/0610331].
[31] M. M. Caldarelli and D. Klemm, “Supersymmetric Goedel-type universe in four
dimensions,” Class. Quant. Grav. 21 (2004) L17 [arXiv:hep-th/0310081].
[32] U. Gran, G. Papadopoulos, D. Roest and P. Sloane, “Geometry of all supersymmetric
type I backgrounds,” arXiv:hep-th/0703143.
[33] S. T. C. Siklos, “Lobatchevski plane gravitational waves,” in: Galaxies, axisymmetric
systems and relativity, ed. M. A. H. MacCallum, Cambridge University Press,
Cambridge (1985).
[34] N. Alonso-Alberca, P. Meessen and T. Ort́ın, “Supersymmetry of topological
Kerr-Newman-Taub-NUT-adS spacetimes,” Class. Quant. Grav. 17 (2000) 2783
[arXiv:hep-th/0003071].
[35] S. Cacciatori, D. Klemm and D. Zanon, “w∞ algebras, conformal mechanics, and black
holes,” Class. Quant. Grav. 17 (2000) 1731 [arXiv:hep-th/9910065].
[36] M. M. Caldarelli and D. Klemm, “Supersymmetry of anti-de Sitter black holes,” Nucl.
Phys. B 545 (1999) 434 [arXiv:hep-th/9808097].
[37] R. Penrose, “Any space-time has a plane wave as a limit”, in Differential geometry and
relativity, Reidel, Dordrecht (1976) pp. 27175,
[38] R. Gueven, “Plane wave limits and T-duality,” Phys. Lett. B 482 (2000) 255
[arXiv:hep-th/0005061].
[39] E. Witten, “Instability Of The Kaluza-Klein Vacuum,” Nucl. Phys. B 195 (1982) 481.
[40] D. Birmingham and M. Rinaldi, “Bubbles in anti-de Sitter space,” Phys. Lett. B 544,
316 (2002) [arXiv:hep-th/0205246].
– 68 –
[41] V. Balasubramanian and S. F. Ross, “The dual of nothing,” Phys. Rev. D 66 (2002)
086002 [arXiv:hep-th/0205290].
[42] D. Astefanesei and G. C. Jones, “S-branes and (anti-)bubbles in (A)dS space,” JHEP
0506 (2005) 037 [arXiv:hep-th/0502162].
[43] S. Deser, R. Jackiw and S. Templeton, “Topologically massive gauge theories,” Annals
Phys. 140 (1982) 372 [Erratum-ibid. 185 (1988 APNYA,281,409-449.2000) 406.1988
APNYA,281,409].
[44] G. Guralnik, A. Iorio, R. Jackiw and S. Y. Pi, “Dimensionally reduced gravitational
Chern-Simons term and its kink,” Annals Phys. 308 (2003) 222 [arXiv:hep-th/0305117].
[45] I. A. Bandos, J. A. de Azcárraga, J. M. Izquierdo and J. Lukierski, “BPS states in
M-theory and twistorial constituents,” Phys. Rev. Lett. 86 (2001) 4451
[arXiv:hep-th/0101113].
[46] I. A. Bandos, J. A. de Azcarraga and O. Varela, “On the absence of BPS preonic
solutions in IIA and IIB supergravities,” JHEP 0609 (2006) 009
[arXiv:hep-th/0607060].
[47] J. Figueroa-O’Farrill and S. Gadhia, “M-theory preons cannot arise by quotients,”
arXiv:hep-th/0702055.
[48] D. Grumiller and W. Kummer, “The classical solutions of the dimensionally reduced
gravitational Chern-Simons theory,” Annals Phys. 308, 211 (2003)
[arXiv:hep-th/0306036].
[49] S. L. Cacciatori, M. M. Caldarelli, D. Klemm, D. S. Mansi, P. Meessen, T. Ort́ın and
D. Roest, “All supersymmetric solutions of matter-coupled gauged N = 2, D = 4
supergravity,” in preparation.
[50] H. B. Lawson and M. L. Michelsohn, “Spin Geometry,” Princeton, UK: Univ. Pr. (1998)
– 69 –
ABSTRACT
  The supersymmetric solutions of N=2, D=4 minimal ungauged and gauged
supergravity are classified according to the fraction of preserved
supersymmetry using spinorial geometry techniques. Subject to a reasonable
assumption in the 1/2-supersymmetric time-like case of the gauged theory, we
derive the complete form of all supersymmetric solutions. This includes a
number of new 1/4- and 1/2-supersymmetric possibilities, like gravitational
waves on bubbles of nothing in AdS_4.

<|endoftext|><|startoftext|>
Introduction
There is currently a great deal of interest in the theoretical and practical possibility of
cloaking objects from the observation by electromagnetic fields. The basic idea of these in-
visibility devices [8, 9, 13], [18] is to use anisotropic transformation media whose permittivity
and permeability, ελν , µλν , are obtained from the ones, ελν0 , µ
0 , of isotropic media, by sin-
gular transformations of coordinates. The singularities lie on the boundary of the objects to
be cloaked. Here the material interpretation is taken. Namely, the ελν , µλν and the ελν0 , µ
represent the components in flat Cartesian space of the permittivity and the permeability of
physical media with different material properties. It appears that with existing technology it
is possible to construct media as described above using artificially structured metamaterials.
In [8, 9] a proof of cloaking was given for the conductivity equation -i.e., in the case of zero
frequency- from detection by measurement of the Dirichlet to Neumann map that relates
the value of the electric potential on the boundary to its normal derivative. The papers [13]
and [18] consider electromagnetic waves in the geometrical optics approximation, i.e. for
large frequencies. In [24] a experimental verification of cloaking is presented and [4] and
[5] give a numerical simulation. A rigorous prof of cloaking has already been given by [7]
where fixed frequency waves were studied, i.e., in the frequency domain. They consider a
class of finite energy solutions to Maxwell’s equations in a bounded set, O, that contains the
cloaked object on its interior, and they prove cloaking, at any frequency, with respect to the
measurement of the Cauchy data of these solutions on the boundary of O. We give further
comments on this paper below. For other results on this problem see [25] and [15]. In [16]
cloaking of elastic waves is considered, and the history of invisibility is discussed.
In this paper we study electromagnetic cloaking in the time-domain using the formalism of
time-dependent scattering theory [23]. This formalism provides us with a rigorous method to
analyze the propagation of electromagnetic wave packets with finite energy in transformation
media. In particular, it allows us to settle in an unambiguous way the mathematical problems
posed by the singularities of the inverse of the permittivity and the permeability of the
transformation media on the boundary of the cloaked objects. Von Neumann’s theory of self-
adjoint extensions of symmetric operators plays an important role on this issue. We write
Maxwell’s equations in Schrödinger form with the electromagnetic propagator playing the
role of the Hamiltonian. We prove that the electromagnetic propagator outside of the cloaked
objects is essentially self-adjoint. This means that it has only one self-adjoint extension,
AΩ, and that this self-adjoint extension generates the only possible unitary time evolution,
with constant energy, for finite energy electromagnetic waves propagating outside of the
cloaked objects. Moreover, AΩ is unitarily equivalent to the electromagnetic propagator in
the medium ελν0 , µ
0 . Using this fact, and since the coordinate transformation is the identity
outside of a ball, we prove that the scattering operator is the identity. This implies that for
any incoming finite-energy electromagnetic wave packet the outgoing wave packet is precisely
the same. In other words, it is not possible to detect the cloaked objects in any scattering
experiment where a finite-energy wave packet is sent towards the cloaked objects, since the
outgoing wave packet that is measured after interaction is the same as the incoming one.
Our results give a rigorous proof that the construction of [8, 9, 13], [18] cloaks passive and
active devices from observation by electromagnetic waves. Actually, the cloaking outside is
independent of what is inside the cloaked objects.
As is well known, self-adjoint extensions can be understood in terms of boundary condi-
tions. Actually, for the electromagnetic fields in the domain of AΩ the component tangential
to the exterior of the boundary of the cloaked objects of both, the electric and the mag-
netic field have to be zero. This boundary condition is self-adjoint in our case because the
permittivity and the permeability are degenerate on the boundary of the cloaked objects.
Furthermore, we prove cloaking for general anisotropic materials. In particular, our
results prove that it is possible to cloak objects inside general crystals.
Even though, as mentioned above, the cloaking is independent of the cloaked objects,
and in particular, the cloaking outside is not affected by the presence of passive and/or
active devices inside the cloaked objects, we discuss the dynamics of electromagnetic waves
inside the cloaked objects for completeness, since it helps to understand the above mentioned
independence of cloaking from the properties of the cloaked objects.
We prove that every self-adjoint extension of the electromagnetic propagator in a trans-
formation medium is the direct sum of the unique self-adjoint extension in the exterior of
the cloaked objects, AΩ, with some self-adjoint extension of the electromagnetic propagator
in the interior of the cloaked objects. Each of these self-adjoint extensions corresponds to a
possible unitary time evolution for finite energy electromagnetic waves. As is well known,
the fact that time evolution is unitary assures us that energy is conserved. This results
implies that the electromagnetic waves inside and outside of the cloaked objects completely
decouple from each other. Actually, the electromagnetic waves inside the cloaked objects
are not allowed to leave them, and viceversa, the electromagnetic waves outside can not go
inside.
In terms of boundary conditions, this means that transmission conditions that link the
electromagnetic fields inside and outside the cloaked objects are not allowed, since they do
not correspond to self-adjoint extensions of the electromagnetic propagator, and then, they
do not lead to a unitary dynamics that conserves energy. Furthermore, choosing a particular
self-adjoint extension of the electromagnetic propagator of the cloaked objects amounts to
choosing some boundary condition on the inside of the boundary of the cloaked objects. In
other words, any possible unitary dynamics implies the existence of some boundary condition
on the inside of the boundary of the cloaked objects.
The fact that there is a large class of self-adjoint extensions -or boundary conditions-
that can be taken inside the cloaked objects could be useful in order to enhance cloaking
in practice, where one has to consider approximate transformation media as well as in the
analysis of the stability of cloaking.
Actually, we consider a slightly more general construction than the one of [8, 9, 13], [18]
since we allow for a finite number of cloaked objects.
In [7] a very general construction for cloaking is introduced. In the case of Maxwell’s
equations all their constructions are made within the context of the permittivity and the
permeability tensor densities being conformal to each other, i.e., multiples of each other by a
positive scalar function. In particular, all isotropic media are included in this category. They
mention that both for mathematical and practical reasons it would be very interesting to
understand cloaking for general anisotropic materials in the absence of this assumption. In
this paper we actually solve this problem, since we prove cloaking for all general anisotropic
materials. In particular, our results prove that it is possible to cloak objects inside general
crystals.
Note, moreover, that [7] also considers the cases of the Helmholtz equation. We do not
discuss this problems here.
Furthermore, remark that the existing theorems in the uniqueness of inverse scattering
do not apply under the present conditions.
In [7] cloaking is proven with respect to the Cauchy data at any fixed frequency given
on a surface that encloses the cloaked object. In the case where the permittivity and the
permeability are bounded above and below it is well known that the Cauchy data at a fixed
frequency is equivalent to the scattering matrix at the same frequency. See for example
[17] and [26]. This equivalence is, however, not proven in the case where the permittivity
and the permeability are degenerate on the boundary of the objects. In fact, it is perhaps
even not true for general degenerate media that are not transformation media since in this
case it is possible that there are finite energy electromagnetic waves that are absorbed by
the boundary of the objects as t → ±∞. If this is true, the equivalence will not hold
since the Cauchy data in a surface that encloses the objects will not contain information
on the waves that are asymptotically absorbed by the boundary of the objects. It is a
problem of independent interest to see if this actually happens or not for general degenerate
permittivities and permeabilities. For an example of scattering by a bounded obstacle with
a singular boundary and Neumann boundary condition, where this happens see [10]. For a
similar situation in the scattering of electromagnetic waves by a Schwarzschild black-hole see
[2]. Note that in our approach we directly consider the scattering operator that is measured
in scattering experiments.
In the analysis of Maxwell’s equations with permittivity and permeability that are inde-
pendent of frequency the dispersion of the medium is not taken into account. This means
that cloaking will hold for electromagnetic wave packets with a narrow enough range of
frequencies, such that this assumption is valid.
The paper is organized as follows. In Section 2 we prove our results in electromagnetic
cloaking. In Section 3 we consider the propagation of electromagnetic waves in the interior
of the cloaked objects. In Section 4 we formulate cloaking as a boundary value problem
outside of the cloaked objects for the Maxwell equations at a fixed frequency, following
our analysis of the self-adjoint extensions of the electromagnetic propagator. In particular,
we give the appropriate boundary condition on the outside of the boundary of the cloaked
objects. Finally, in Section 5 we prove cloaking of infinite cylinders. This is of interest
since this is the case considered in the experimental verification in [24] and in the numerical
simulations of [4] and [5]. Of course, [24] only consider a slice of the cylinder. In Sections 3
and 4 we give further comments on the results of [7].
Addendum
After the previous version of this paper was posted in the arXiv we published the paper
[29] where we generalized the results of this paper on spherical cloaks to the case of high-
order cloaks, and where we also discussed cloaking in the frequency domain. Moreover, in
our paper [30] we identified the cloaking boundary condition that has to be satisfied in the
inside of the boundary of the cloaked objects, in the case where the permittivity and the
permeability are bounded above and below inside the cloaked objects.
2 Electromagnetic Cloaking
Let us consider Maxwell’s equations,
∇× E = −
B, ∇×H =
D, (2.1)
∇ ·B = 0,∇ ·D = 0, (2.2)
in a domain, Ω ⊂ R3, as follows,
Ω := R3 \ ∪Nj=1Kj, Kj ∩Kl = ∅, j 6= l (2.3)
where Kj, j = 1, 2, · · · , N, are the objects to be cloaked. We assume that each Kj is a ball
with center cj and radius aj, i.e.,
x ∈ R3 : |x− cj | ≤ aj
, j = 1, 2, · · · , N. (2.4)
The cloaked objects are denoted by
K := ∪Nj=1Kj .
We designate the Cartesian coordinates of x by xλ, λ = 1, 2, 3 and by Eλ, Hλ, B
λ, Dλ, λ =
1, 2, 3, respectively, the components of E,H,B, and D. As usual, we denote by ελν and µλν ,
respectively, the permittivity and the permeability. We have that,
Dλ = ελνEν , B
λ = µλνHν , (2.5)
where we use the standard convention of summing over repeated lower and upper indices.
We consider now a transformation from Ω0 := R
3 \ {c1, c2, · · · , cN} onto Ω that was first
used to obtain cloaking for the conductivity equation, i.e. at zero frequency, by [8, 9] and
then by [18] for cloaking electromagnetic waves (for a related result in two dimensions using
conformal mappings see [13]).
For any y ∈ R3 we denote, ŷ := y/|y|. Let yλ, λ = 1, 2, 3, designate the cartesian
coordinates of y ∈ Ω0. Take bj > aj, j = 1, 2, · · · , N . Then, for 0 < |y− cj| ≤ bj , we define,
x = x(y) = f(y) := cj +
bj − aj
|y − cj |+ aj
ŷ − cj. (2.6)
Note that this transformation blows up the point cj onto ∂Kj and that it sends the punctu-
ated ball B̃cj (bj) := {y ∈ R
3 : 0 < |y − cj | ≤ bj} onto the spherical shell, aj < |x− cj | ≤ bj .
We assume that,
B̃cj (bj) ∩ B̃cl(bl) = ∅, j 6= l, 1 ≤ j, l ≤ N. (2.7)
For y ∈ R3\∪Nj=1B̃cj (bj) we define the transformation to be the identity, x = x(y) = f(y) :=
y. Our transformation is a bijection from Ω0 onto Ω. By y = y(x) := f
−1(x) we designate
the inverse transformation. We denote the elements of the Jacobian matrix by Aλλ′ ,
Aλλ′ :=
. (2.8)
Note that the Aλλ′ ∈ C
Ω0 \ ∪
j=1∂B̃cj (bj)
. We designate by Aλ
λ the elements of the
Jacobian of the inverse bijection, y = y(x) := f−1(x),
Ω \ ∪Nj=1∂B̃cj (bj)
. (2.9)
The papers [8, 9] and [18] considered the case where N = 1, c1 = 0.
We take here the so called material interpretation and we consider our transformation
as a bijection between two different spaces, Ω0 and Ω. However, our transformation can be
considered, as well, as a change of coordinates in Ω0. Of course, these two point of view
are mathematically equivalent. This means, in particular, that under our transformation the
Maxwell equations in Ω0 and in Ω will have the same invariance that they have under change
of coordinates in three-space. See, for example, [21]. Let us denote by ∆ the determinant of
the Jacobian matrix (2.8). Then,
bj − aj
( bj−aj
|y− cj|+ aj
|y − cj|
, for 0 < |y − cj | ≤ bj . (2.10)
This result is easily obtained rotating into a coordinate system such that, y − cj = (|y −
cj|, 0, 0) [25]. For y ∈ Ω0 \ ∪
j=1B̃cj (bj),∆ ≡ 1.
Let us denote by E0,H0,B0,D0, ε
0 , µ
0 , respectively, the electric and magnetic fields,
the magnetic induction, the electric displacement, and the permittivity and permeability of
Ω0. ε
0 , µ
0 , are positive, Hermitian matrices that are constant in Ω0.
The electric field is a covariant vector that transforms as,
Eλ(x) = A
λ (y)E0,λ′(y). (2.11)
The magnetic fieldH is a covariant pseudo-vector, but as we only consider space transfor-
mations with positive determinant, it also transforms as in (2.11). The magnetic induction
B and the electric displacement D are contravariant vector densities of weight one that
transform as
Bλ(x) = (∆(y))
Aλλ′(y)B
0 (y), (2.12)
with the same transformation for D. The permittivity and permeability are contravariant
tensor densities of weight one that transform as,
ελν(x) = (∆(y))
Aλλ′(y)A
ν′(y) ε
0 (y), (2.13)
with the same transformation for µλν . The Maxwell equations (2.1, 2.2) are the same in
both spaces Ω and Ω0. Let us denote by ελν , µλν, ε0λν , µ0λν , respectively, the inverses of the
corresponding permittivity and permeability. They are covariant tensor densities of weight
minus one that transform as,
ελν(x) = ∆(y)A
λ (y)A
ν (y) ε0λ′ν′(y), µλν(x) = ∆(y)A
λ (y)A
ν (y)µ0λ′ν′(y). (2.14)
Note that
det ελν = ∆−1 det ελν0 , detµ
λν = ∆−1 detµλν0 , (2.15)
det ελν = ∆det ε0λν , detµλν = ∆detµ0λν . (2.16)
We now introduce the Hilbert spaces of electric and magnetic fields with finite energy.
The E0,H0,B0,D0, were defined in Ω0, but since R
3 \ Ω0 = {cj}
j=1 is of measure zero, we
can consider them as defined in R3, what we do below.
We denote by H0E the Hilbert space of all measurable, square integrable, C
3− valued
functions defined on R3 with the scalar product,
0ν dy
3. (2.17)
We similarly define the Hilbert space,H0H , of all measurable, square integrable, C
3− valued
functions defined on R3 with the scalar product,
0ν dy
3. (2.18)
The Hilbert space of finite energy fields in R3 is the direct sum
H0 := H0E ⊕H0H . (2.19)
Moreover, we designate byHΩE the Hilbert space of all measurable, C
3− valued functions
defined on Ω that are square integrable with the weight ελν , with the scalar product,
(1),E(2)
3. (2.20)
Finally, we denote by HΩH the Hilbert space of all measurable, C
3− valued functions
defined on Ω that are square integrable with the weight µλν , with the scalar product,
(1),H(2)
3. (2.21)
The Hilbert space of finite energy fields in Ω is the direct sum
HΩ := HΩE ⊕HΩH . (2.22)
We now write the Maxwell’s equations (2.1) in Schrödinger form. We first consider the
case of R3. We denote by ε0 and µ0, respectively, the matrices with entries ε0λν and µ0λν .
Recall that (∇×E)
= sλνρ ∂
Eρ where s
λνρ is the permutation contravariant pseudo-
density of weight −1 (see section 6 of chapter II of [21], where a different notation is used).
By a0 we denote the following formal differential operator,
ε0∇×H0
−µ0∇× E0
. (2.23)
Here, as usual, we denote, ε0∇ × H0 := ε0λν(∇ × H0)
ν , and µ0∇ × E0 = µ0λν(∇ × E0)
Then, equations (2.1) are equivalent to,
. (2.24)
Let us denote by C10(R
3) the set of all C6−valued continuously differentiable functions
on R3 that have compact support. Then, a0 with domain C
3) is a symmetric operator in
H0, i.e., a0 ⊂ a
0. Moreover, it is essentially self-adjoint in H0, i.e., it has only one self-adjoint
extension, that we denote by A0. Its domain is given by,
D(A0) =
, (2.25)
∈ D(A0), (2.26)
where the derivatives are taken in distribution sense. These results follow easily from the
fact that -via the Fourier transform- a0 is unitarily equivalent to multiplication by a matrix
valued function that is symmetric with respect to the scalar product of H0. Moreover, it
follows from explicit computation that the only eigenvalue of A0 is zero, that it has infinite
multiplicity, and that,
H0⊥ := (kernelA0)
∈ H0 :
ελν0 E0ν = 0,
µλν0 H0ν = 0
. (2.27)
Furthermore, A0 has no singular-continuous spectrum and its absolutely-continuous spec-
trum is R. See, for example, [27, 28].
Taking any
∈ H0⊥ ∩D(A0) (2.28)
we obtain a finite energy solution to the Maxwell equations (2.1, 2.2) as follows
(t) = e−itA0
. (2.29)
This is the unique finite energy solution with initial value at t = 0 given by (2.28). Note
that as e−itA0H0⊥ ⊂ H0⊥ equations (2.2) are satisfied for all times if they are satisfied at
t = 0.
Let us now consider the case of Ω. We denote by ε and µ, respectively, the matrices with
entries ελν and µλν .
We now define the following formal differential operator,
−µ∇× E
. (2.30)
Equations (2.1) are equivalent to,
Let us denote by C10(Ω) the set of all C
6−valued continuously differentiable functions on
Ω that have compact support. Then, aΩ with domain C
0(Ω) is a symmetric operator in HΩ.
To construct a unitary dynamics that preserves energy we have to analyse the self-adjoint
extensions of aΩ.
We denote by UE the following unitary operator from H0E onto HΩE ,
(UEE0)λ (x) := A
λ E0λ′(y), (2.31)
and by UH the unitary operator from H0H onto HΩH ,
(UHH0)λ (x) := A
λ H0λ′(y). (2.32)
Then,
U := UE ⊕ UH (2.33)
is a unitary operator from H0 onto HΩ.
We denote by a00 the restriction of a0 to C
0(Ω0). The operator a00 is essentially
self-adjoint and its only self-adjoint extension is A0. This follows from the essential self-
adjointness of a0 and from the fact that any function in C
3) can be approximated in the
graph norm of a0 by functions in C
0(Ω0). To prove this take any continuously differentiable
real-valued function, φ, defined on R such that, φ(y) = 0, |y| ≤ 1 and φ(y) = 1, |y| ≥ 2.
Then, for any
∈ C10(R
we have that,
φ(n|y− cj|)
∈ C10(Ω0)
and moreover,
s- lim
φ(n|y− cj|)
s- lim
φ(n|y− cj|)
where by s- lim we designate the strong limit in H0.
As a00 is essentially self-adjoint, it follows from the invariance of Maxwell equations that
aΩ is essentially self-adjoint, and that its unique self-adjoint extension, that we denote by
AΩ, satisfies
AΩ = U A0 U
∗. (2.34)
For the proof of these facts see [29]. Hence, we have the following theorem.
THEOREM 2.1. The operator aΩ is essentially self-adjoint, and its unique self-adjoint
extension, AΩ, satisfies (2.34).
The unitary equivalence given by (2.34) implies that AΩ has the same spectral prop-
erties that A0. Namely, it has no singular-continuous spectrum, the absolutely-continuous
spectrum is R and the only eigenvalue is zero and it has infinite multiplicity. Moreover,
HΩ⊥ := (kernelAΩ)
∈ HΩ :
ελνEν = 0,
µλνHν = 0
. (2.35)
Furthermore, taking any
∈ HΩ⊥ ∩D(AΩ) (2.36)
we obtain a finite energy solution to the Maxwell equations (2.1, 2.2) as follows
(t) = e−itAΩ
. (2.37)
This is the unique finite energy solution with initial value at t = 0 given by (2.36). Note
that as e−itAΩHΩ⊥ ⊂ HΩ⊥ equations (2.2) are satisfied for all times if they are satisfied at
t = 0. We can consider more general solutions by considering the scale of spaces associated
with AΩ, but we do not go into this direction here.
The facts that aΩ is essentially self-adjoint and that its unique self-adjoint extension AΩ is
unitarily equivalent to the propagator A0 of the homogeneous medium are strong statements.
They mean that the only possible unitary dynamics in Ω that preserves energy is given by
(2.37) and that this dynamics is unitarily equivalent to the free dynamics in R3 given by
(2.29). In fact, ∂Ω acts like a horizon for electromagnetic waves propagating in Ω in the
sense that the dynamics is uniquely defined without any need to consider the cloaked objects
K = ∪Nj=1Kj . As we will prove below this implies electromagnetic cloaking for all frequencies
in the strong sense that the scattering operator is the identity.
Since D(AΩ) = UD(A0), for any (E,H)
T ∈ D(AΩ) there is a (E0,H0)
T ∈ D(A0) such
. (2.38)
Then, it follows from (2.31, 2.32, 2.33) that
E× n = 0,H× n = 0, in ∂K+, (2.39)
where ∂K+ denotes the outside of the boundary of the cloaked objects, K, and n is the
normal vector to ∂K+, if (E0,H0) are, for example, bounded near ∂K+. That is to say, for
electromagnetic fields in the domain of AΩ the tangential components of both, the electric
and the magnetic field vanish in the exterior of the boundary of the cloaked objects. This is a
self-adjoint boundary condition because the permittivity and the permeability are degenerate
on ∂K+.
Let χΩ be the characteristic function of Ω, i.e., χΩ(x) = 1,x ∈ Ω, χΩ(x) = 0,x ∈ R
3 \ Ω.
We define,
(x) := χΩ(x)
(x). (2.40)
By (2.6, 2.10, 2.13),
∣ελν(x)
∣ ≤ C,
∣µλν(x)
∣ ≤ C, x ∈ Ω.
Then, J is a bounded operator from H0 into HΩ.
The wave operators are defined as follows,
W± = s- lim
eitAΩ Je−itA0P0⊥, (2.41)
where P0⊥ denotes the projector onto H0⊥.
Let us designate by W1,2(R3) the Sobolev space of C6 valued functions. We denote by I
the identity operator on H0. Then,
LEMMA 2.2.
W± = UP0⊥. (2.42)
Proof: Denote,
W (t) := eitAΩ J e−itA0P0⊥.
By (2.34), for any ϕ ∈ H0
W (t)ϕ = ψ(t) + UP0⊥ϕ, (2.43)
ψ(t) := U eitA0 (U∗J − I) e−itA0P0⊥ϕ. (2.44)
Let BR denote the ball of center zero and radius R in R
3. Since for |y| ≥ R, with R
large enough, our transformation, x = f(y), is the identity, x = y, and in consequence,
Aλλ′(y) = δ
λ′ for |y| ≥ R, we have that,
(U∗J − I) = (U∗J − I)χBR
. (2.45)
It follows that,
s- lim
ψ(t) = U s- lim
eitA0ϑ(t) (2.46)
with,
ϑ(t) := (U∗J − I)χBR
e−itA0P0⊥ϕ. (2.47)
We have that,
‖ϑ(t)‖
e−itA0P0⊥ϕ
e−itA0P0⊥ϕ
e−itA0P0⊥ϕ
. (2.48)
Then, as (A0 + i)
−1P0⊥ is bounded from H0 into W
1,2(R3) [27] [28], it follows from the
Rellich local compactness theorem that
(A0 + i)
−1P0⊥
is a compact operator in H0. Suppose that ϕ ∈ D(A0) ∩ H0⊥. Then,
s- lim
e−itA0P0⊥ϕ = s- lim
(A0 + i)
−1P0⊥e
−itA0(A0 + i)ϕ = 0, (2.49)
and whence, by (2.48),
s- lim
ϑ(t) = 0, (2.50)
and it follows that in this case,
s- lim
ψ(t) = 0. (2.51)
By continuity this is also true for ϕ ∈ H0⊥.
Then, (2.42) follows from (2.43) and (2.51).
The scattering operator is defined as
S := W ∗+W−. (2.52)
COROLLARY 2.3.
S = P0⊥. (2.53)
Proof: This is immediate from (2.42) because U∗ U = I.
Let us denote by S⊥ the restriction of S to H0⊥. S⊥ is the physically relevant scattering
operator that acts in the Hilbert space H0⊥ of finite energy fields that satisfy equations (2.2).
We designate by I⊥ the identity operator on H0⊥. We have that,
COROLLARY 2.4.
S⊥ = I⊥. (2.54)
Proof: This follows from Corollary 2.3.
The fact that S⊥ is the identity operator on H0⊥ means that there is perfect cloaking for
all frequencies. Suppose that for very negative times we are given an incoming wave packet
e−itA0ϕ−, with ϕ− ∈ H0⊥. Then, for large positive times the outgoing wave packet is given
by e−itA0ϕ+ with ϕ+ = S⊥ϕ−. But, as S⊥ = I, we have that ϕ+ = ϕ− and then,
e−itA0ϕ− = e
−itA0ϕ+.
Since the incoming and the outgoing wave packets are the same there is no way to detect
the cloaked objects K from scattering experiments performed in Ω.
In this paper we considered transformation media obtained from a singular transformation
that blows up a finite number of points, by simplicity, and since this is the situation in the
applications. Suppose that we have a transformation that is singular in a set of points
that we call M and denote now Ω0 := R
3 \M . What we really used in the proofs is that
W1,2(R3) = W
0 (Ω0) where W
0 (Ω0) denotes the completion of C
0 (Ω0) in the norm of
W1,2(R3). We also assumed that ελν0 , µ
0 are constant. What was actually needed is that a0
is essentially self-adjoint. All our results hold under this more general conditions provided
that in (2.41, 2.42) and (2.53) we replace P0⊥ by the projector onto the absolutely-continuous
subspace of A0 and that we assume that D(A0) ∩H0ac ⊂ W
1,2(R3), where we have denoted
the absolutely-continuous subspace of A0 by H0ac. Moreover, S⊥ has to be defined as the
restriction of S to H0ac and in (2.54) I⊥ has to be the identity operator on H0ac. Note that
under these general assumptions A0 could have non-zero eigenvalues and singular-continuous
spectrum.
For example, W1,2(R3) = W
0 (Ω0) if M has zero Sobolev one capacity [1, 11, 12].
Moreover, assume that the permittivity and the permeability tensor densities ελν0 , µ
0 are
bounded below and above. Under this condition a0 is essentially self-adjoint. Furthermore,
let us denote by Ĥ0 the Hilbert space of finite energy solutions defined as in (2.19) but
with ελν0 = µ
0 = δ
λµ. Let Â0, Ĥ0⊥ be, respectively, the electromagnetic propagator in Ĥ0
and the orthogonal complement of its kernel. We have that H0 and Ĥ0 are the same set
of functions with equivalent norms. Furthermore, D(A0) = D(Â0), kernel Â0 = kernelA0.
Moreover, (E0,H0)
T ∈ H0⊥ if and only if E0 = ε0Ê0,H0 = µ0Ĥ0 for some (Ê0, Ĥ0) ∈ Ĥ0⊥.
As [27, 28] D(Â0) ∩ Ĥ0⊥ ⊂ W
1,2(R3) we have that D(A0) ∩ H0⊥ ⊂ W
1,2(R3) if ε0, µ0 are
bounded operators on W1,2(R3) and this is true if the derivatives ∂
µ0 are bounded
operators on Ĥ0 for ρ = 1, 2, 3. Note, furthermore, that H0ac ⊂ H0⊥.
3 Electromagnetic Waves Inside the Cloaked Objects
Let us now consider the propagation of electromagnetic waves in the cloaked objects. For
this purpose we assume that in each Kj the permittivity and the permeability are given
by ελνj , µ
j , with inverses εjλν, µjλν and where εj, µj are the matrices with entries εjλν, µjλν.
Furthermore, we assume that 0 < ελνj , µ
j ≤ C,x ∈ Kj and that for any compact set Q con-
tained in the interior of Kj there is a positive constant CQ such that det ε
j > CQ, detµ
CQ,x ∈ Q. In other words, we only allow for possible singularities of εj, µj on the boundary
of Kj.
We designate by HjE the Hilbert space of all measurable, C
3− valued functions defined
on Kj that are square integrable with the weight ε
j , with the scalar product,
jν dx
3. (3.1)
Similarly, we denote by HjH the Hilbert space of all measurable, C
3− valued functions
defined on Kj that are square integrable with the weight µ
j , with the scalar product,
jν dx
3. (3.2)
The Hilbert space of finite energy fields in Kj is the direct sum
Hj := HjE ⊕HjH, (3.3)
and the Hilbert space in the cloaked objects K is the direct sum,
HK := ⊕
j=1Hj .
The complete Hilbert space of finite energy fields including the cloaked objects is,
H := HΩ ⊕HK . (3.4)
We now write (2.1) as a Schrödinger equation in each Kj as before. We define the
following formal differential operator,
εj∇×Hj
−µj∇× Ej
. (3.5)
Equation (2.1) in Kj is equivalent to
. (3.6)
Let us denote by C10(K̂j) the set of all C
6−valued continuously differentiable functions on
Kj that have compact support in the interior of Kj, that we denote by K̂j := Kj \ ∂Kj .
Then, aj with domain C
0(K̂j) is a symmetric operator in Hj . We denote,
a := aΩ ⊕
j=1 aj , (3.7)
with domain,
D(a) :=
⊕Nj=1
∈ C10(Ω)⊕
j=1 C
0(K̂j)
. (3.8)
The operator a is symmetric in H. The possible unitary dynamics that preserve energy for
the whole system including the cloaked objects K are given by the self-adjoint extensions of
a. Let us denote a the closure of a, with similar notation for aΩ, aj , j = 1, · · · , N . Then,
a = AΩ ⊕
j=1 aj,
where we used the fact that as aΩ is essentially self-adjoint, aΩ = AΩ. The adjoint of a is
given by,
D(a∗) =
⊕Nj=1
∈ H :
∈ D(AΩ), aj
, (3.9)
⊕Nj=1
⊕Nj=1 aj
, (3.10)
⊕Nj=1
∈ D(a∗). (3.11)
Let us denote by KΩ± := kernel(i∓ a
Ω),Kj± := kernel(i∓ a
j) the deficiency subspaces of
aΩ and aj, j = 1, · · · , N . Since aΩ is essentially self-adjoint KΩ± = {0}. Let K± := ⊕
j=1Kj±
be the deficiency subspaces of aK := ⊕
j=1aj . Suppose that K± have the same dimension.
Then, it follows from Corollary 1 in page 141 of [22] that there is a one-to-one correspondence
between self-adjoint extensions of aK and unitary maps from K+ into K−. If V is such a
unitary, then the corresponding self-adjoint extension AKV is given by,
D(AKV ) = {ϕ+ ϕ+ + V ϕ+ : ϕ ∈ D(aK), ϕ+ ∈ K+} ,
AKϕ = aKϕ+ iϕ+ − iV ϕ+.
Hence, since KΩ± = {0} and a = AΩ⊕aK there is a one-to-one correspondence between self-
adjoint extensions of a and unitary maps, V , from K+ into K−. The self-adjoint extension
AV corresponding to V is given by,
AV = AΩ ⊕ AKV .
Thus, we have proven the following theorem.
THEOREM 3.1. Every self-adjoint extension, A, of a is the direct sum of AΩ and of some
self-adjoint extension, AK of aK , i.e.,
A = AΩ ⊕AK . (3.12)
This theorem tells us that the cloaked objects K and the exterior Ω are completely
decoupled and that we are free to choose any boundary condition inside the cloaked objectsK
that makes aK self-adjoint without disturbing the cloaking effect in Ω. Boundary conditions
that make AK self-adjoint are well known. See for example, [19], [20], [14] and [6].
It follows from explicit computation that zero is an eigenvalue of every AK with infinite
multiplicity and that,
HK⊥ := (kernelAK)
∈ HK :
ελνK Eν = 0,
µλνK Hν = 0
, (3.13)
where by ελνK (x) := ε
j (x) for x ∈ Kj, and µ
K (x) := µ
j (x) for x ∈ Kj , j = 1, 2, · · · , N . It
follows that zero is an eigenvalue of A with infinite multiplicity and that,
H⊥ := (kernelA)
= HΩ⊥ ⊕HK⊥. (3.14)
For any ϕ = ϕΩ ⊕ ϕK ∈ H⊥ ∩D(A),
e−itAϕ = e−itAΩ ϕΩ ⊕ e
−itAK ϕK (3.15)
is the unique solution of Maxwell’s equations (2.1, 2.2) with finite energy that is equal to ϕ
at t = 0. This shows once again that the dynamics in Ω and in K are completely decoupled.
If at t = 0 the electromagnetic fields are zero in Ω, they remain equal to zero for all times,
and viceversa. Actually, electromagnetic waves inside the cloaked objects are not allowed to
leave them, and viceversa, electromagnetic waves outside can not go inside. This implies, in
particular, that the presence of active devices inside the cloaked objects has no effect on the
cloaking outside. In terms of boundary conditions, this means that transmission conditions
that link the electromagnetic fields inside and outside the cloaked objects are not allowed.
Furthermore, choosing a particular self-adjoint extension of the electromagnetic propagator
of the cloaked objects amounts to choosing some boundary condition on the inside of the
boundary of the cloaked objects. In other words, any possible unitary dynamics implies
the existence of some boundary condition on the inside of the boundary of the cloaked
objects. The particular boundary condition that nature will take depends on the specific
properties of the metamaterial used to build the transformation media as well us on the
properties of the media inside the cloaked objects. Note that this does not mean that we
have to put any physical surface, a lining, on the surface of the cloaked object to enforce any
particular boundary condition on the inside, since as we already mentioned this plays no role
in the cloaking outside. It would be, however, of theoretical interest to see what the interior
boundary condition turns out to be for specific cloaked objects and metamaterials. These
results apply to the exact transformation media that we consider on this paper. However,
the fact that there is a large class of self-adjoint extensions -or boundary conditions- that can
be taken inside the cloaked objects could be useful in order to enhance cloaking in practice,
where one has to consider approximate transformation media as well as in the analysis of
the stability of cloaking.
The fact that for the single coating there has to be boundary conditions on the inside
of ∂K has already been observed by [7]. In Definition 4.1 of [7] a definition of finite energy
solutions is given. Furthermore, is proven in Theorem 6.1 that in the case of the single
coating -where the permittivity and the permeability are bounded above and below inside
the cloaked object- the tangential components of the electric and the magnetic field of these
solutions have to vanish in the inside of the boundary of the cloaked object. Note that in
this case in order to have a self-adjoint extension of the electromagnetic propagator inside
the cloaked object we are only allowed to require that either the tangential component of E
or the tangential component of H vanishes, but not both.
These boundary conditions are called hidden boundary conditions in [7] where also the
case of the Helmholtz equation is considered. In the case of Maxwell’s equations they propose
two solutions to this issue. One of them is a lining, i.e., a physical material on the boundary of
the cloaked object that enforces a particular boundary condition, for example, they propose
a lining by a perfect electric conductor. Note that this raises now the question of what is the
boundary condition between the lining and the cloaking metamaterial. In fact, we face the
same problem as before, since we can always consider that the lining is part of the cloaked
objects, and then, the question of what is the appropriate boundary condition remains. The
second proposal of [7] is a double coating that corresponds to surrounding both the inner
and the outersurface of the cloaked objects with appropriately matched metamaterials. As
our permittivities and permeabilities inside K are allowed to vanish as they approach ∂K
the double coating fits in our formalism.
In Theorem 5.1 of [7] cloaking is proven for all frequencies and active devices, with the
double coating, with respect to the Cauchy data of the finite energy solutions that they define
in Definition 4.1.
Remark that there is no real contradiction between our results and the ones of [7]. Our
results imply that there is always a hidden boundary condition on the inside of the boundary
of the cloaked objects, that is imposed upon us by the fundamental principle of the con-
servation of the energy of the electromagnetic waves, that implies that time evolution has
to be given by a unitary group generated by a self-adjoint extension of the electromagnetic
propagator, and this amounts to a boundary condition at the inside of the boundary of
the cloaked objects. Note that we do not exclude here the possibility that in some cases
the electromagnetic propagator of the cloaked objects could be essentially self-adjoint, and
in this situation the dynamics inside the cloaked objects will be uniquely defined. In this
case the hidden boundary condition will be uniquely determined by the boundary conditions
satisfied by the functions in the domain of the unique self-adjoint realization of the elec-
tromagnetic propagator in the cloaked objects. Note, however, that we have proven that
for exact transformation media the cloaking outside is actually independent of the cloaked
objects.
4 Cloaking as a Boundary Value Problem
It is a question of independent interest to consider cloaking as a boundary value problem for
the Maxwell’ system at a fixed frequency
∇× E = iλB, ∇×H = −iλD, λ 6= 0, (4.1)
∇ ·B = 0,∇ ·D = 0. (4.2)
As we have already shown, cloaking is independent of the cloaked object, and this means
that we only have to consider these equation in Ω. The main question now is to decide what
is an appropriate class of solutions with locally finite energy. Our analysis of the self-adjoint
extensions of the electromagnetic propagator shows that we have to take solutions that are
locally in the domain of AΩ, that is to say that they are given by (2.38)
with (E0,H0)
T locally in the domain of A0, i.e., (E0,H0)
T are in the domain of A0 when
multiplied by any function in C∞0 (R
3). It follows from (2.39) that the solutions with locally
finite energy have to satisfy the boundary condition,
E× n = 0,H× n = 0, in ∂K+,
where ∂K+ is the outside of the boundary of the cloaked object. This is the only self-
adjoint boundary condition on ∂K+. Note that we define in the same way solutions with
(locally) finite energy in a bounded subset of Ω. In [7] a different definition of solutions with
(locally) finite energy is given in Definition 4.1.
5 Cloaking an Infinite Cylinder
We discuss now the case of an infinite cylinder. For simplicity we consider one cylinder
centered at zero and with its axis the vertical line L := (0, 0, x3), x3 ∈ R. Then,
x = (x1, x2, x3) ∈ R3 :
|x1|2 + |x2|2 ≤ a, x3 ∈ R
, Ω := R3 \K. (5.3)
The set Ω0 is now given by,
Ω0 = R
3 \ L. (5.4)
Let us denote by x := (x,1 , x2) the vectors in the x1 − x2 plane and x̂ := x/|x|. The
transformation (2.6) is replaced by
x = x(y) = f(y) :=
x = ( b−a
|y|+ a)ŷ,
x3 = y3,
(5.5)
for 0 < |y| ≤ b and with b > a. This transformation blows up the line L onto ∂K and it
sends Kb \ L onto Kb \K where
Kb :=
y = (y1, y2, y3) ∈ R3 :
|y1|2 + |y2|2 ≤ b, y3 ∈ R
For y ∈ R3 \Kb we define the transformation to be the identity, x = y.
The Hilbert spaces of finite energy electromagnetic fields, the unitary operator U and a0,
A0, aΩ, aK , a, are defined as in Section 2.
THEOREM 5.1. The operator aΩ is essentially self-adjoint, and its unique self-adjoint
extension, AΩ, satisfies
AΩ = U A0 U
∗. (5.6)
Proof: The theorem is proven as Theorem 2.1 observing that W 1,2(R2) = W
2 \ 0) since
{0} has zero Sobolev one capacity in R2 [1, 11, 12].
We now consider the wave and the scattering operators. For simplicity we assume below
that ελν0 = ε̃ δ
λν , µλν0 = µ̃ δ
λν . The wave operators are defined as in (2.41) but now the
operator J is defined as follows,
(x) := χΩ(x)φ(x)
where φ is continuous and it satisfies φ(x) = (|x| − a), a ≤ |x| ≤ a + δ and φ(x) = 1 for
|x| ≥ a+ 2δ, for some δ > 0.
LEMMA 5.2.
W± = UP0⊥. (5.7)
Proof: The lemma is proven as in the proof of Lemma 2.2, but in (2.45, 2.46, 2.47, 2.48) we
have to replace χBR
, by χCR
where, CR := {y ∈ R
3 : |y| ≤ R} for R large enough. Now we
can not prove (2.49, 2.50, 2.51) by compactness arguments because K is unbounded. Instead
we use propagation estimates for A0. The following results are well known. See for example
[3, 27, 28, 31] where the general anisotropic case is considered. For any ϕ ∈ H0⊥,
e−itA0ϕ =
(2π)3/2
eik·y
e−iω+(k)tP+(k)ϕ̂(k) + e
−iω−(k)tP−(k)ϕ̂(k)
d3k (5.8)
where ϕ̂ is the Fourier transform of ϕ, ω±(k) = ±|k|c with c := (ε̃µ̃)
−1/2, and P±(k) are
projectors on R3 that are infinitely differentiable for k ∈ R3\0. Suppose that ϕ̂ ∈ C∞0 (R
and let O be a bounded open set such that O ⊂ R3 \ L and support ϕ̂ ⊂ O. Denote
Ô :=
: k ∈ O
Then by the (non) stationary phase Theorem (see the Corollary to Theorem XI.14 of [23] ),
for any n = 1, 2, · · · there is a constant Cn such that
e−itA0ϕ
∣ ≤ Cn (1 + |y|+ |t|)
/∈ Ô. (5.9)
We write,
−itA0ϕ = φ1 + φ2 (5.10)
φ1 := χ(±y/(ct)/∈Ô)χCRe
−itA0ϕ (5.11)
φ2 := χ(±y/(ct)∈Ô)χCRe
−itA0ϕ. (5.12)
by (5.9)
s- lim
φ1 = 0. (5.13)
Note, furthermore, that there is an ǫ > 0 such that |k| ≥ ǫ for any k ∈ Ô. Then, for any
∈ Ô, |y| ≥ c|t|ǫ. It follows that there is a T such that
φ2 = 0, for|t| ≥ T. (5.14)
By (5.10, 5.13, 5.14)
s- lim
−itA0ϕ = 0, (5.15)
and (2.50, 2.51) follow. Note that P0⊥ is not needed because ϕ ∈ H0⊥. By continuity
this is true for all ϕ ∈ H0⊥. Then, (5.7) follows from (2.43, 2.51).
The scattering operator is defined as in (2.52).
COROLLARY 5.3.
S = P0⊥. (5.16)
Proof: This is immediate from (5.7) because U∗ U = I.
Let us denote by S⊥ the restriction of S to H0⊥. S⊥ is the physically relevant scattering
operator that acts in the Hilbert space H0⊥ of finite energy fields that satisfy equations (2.2).
We designate by I⊥ the identity operator on H0⊥. We have that,
COROLLARY 5.4.
S⊥ = I⊥. (5.17)
Proof: This follows from Corollary 5.3.
Again, the fact that S⊥ is the identity operator on H0⊥ means that there is cloaking for
all frequencies.
In Theorem 7.1 of [7] cloaking is proven for all frequencies with respect to the Cauchy
data of the finite energy solutions that they define in Definition 4.1 and furthermore, in
Theorem 8.2, they prove cloaking for all frequencies with the SHS boundary condition with
respect to the Cauchy data of the finite energy solutions that they define in Definition 8.1.
Theorem 3.1 remains true in the case of the cylinder. The proof is the same. Furthermore,
all the remarks about finite energy solutions, and cloaking and that we made in Sections
2, 3, are true in the case of a cylinder. We do not repeat them here. Moreover, equations
(2.38) hold. However, since now the transformation (5.5) only acts on the plane orthogonal
to the axis of the cylinder equations (2.39) has to be replaced by
E× x̂ = 0, H× x̂ = 0, in ∂K+, (5.18)
where E := (E1, E2),H := (H1, H2).
As in Section 4 we define solutions to (4.1, 4.2) with locally finite energy as solutions
that are locally in the domain of AΩ, that is to say that they are given by (2.38)
with (E0,H0)
T locally in the domain of A0, i.e., (E0,H0)
T are in the domain of A0 when
multiplied by any function in C∞0 (R
3). It follows from (5.18) that the solutions with locally
finite energy have to satisfy the boundary condition,
E× x̂ = 0, H× x̂ = 0, in ∂K+. (5.19)
Note that (5.19) is the SHS boundary condition considered in [7]. We have proven here
that (5.19) is the only self-adjoint boundary condition on ∂K+ allowed by energy conserva-
tion.
Acknowledgement
This work was partially done while I was visiting the Institut für Theoretische Physik,
Eidgenössische Techniche Höchschule Zurich. I thank professors Gian Michele Graf and
Jürg Fröhlich for their kind hospitality.
References
[1] R.A. Adams and J.J.F. Fournier, Sobolev Spaces, Second Edition, Academic Press,
Amsterdam, 2003
[2] A. Bachelot, Gravitational scattering of electromagnetic fields by Schwarzschild black-
holes, Ann. Inst. H. Poincaré 54 261-320 (1991).
[3] R. Courant and D. Hilbert, Methoden der Mathematischen Physik II, Zweite Auflage,
Springer-Verlag, Berlin, 1968.
[4] W. Cai, U.K. Chettiar, A.V. Kildishev and V.M. Shalaev, Optical Cloaking with meta-
materials, Nature Photonics 1 224-226 (2007).
[5] S.A. Cummer, B.-I. Popa, D. Schurig, D. R. Smith and J. Pendry, Full-wave simulation
of electromagnetic cloaking structures, Phys. Rev. E 74 036621 (2006).
[6] M. Sh. Birman and M.Z. Solomyak, The self-adjoint Maxwell operator in arbitrary
domains, Algebra i Analiz. 1 96-110 (1989). English transl. Leningrad Math. J. 1 99-
115 (1989).
[7] A. Greenleaf, Y. Kurylev, M. Lassas and G. Ulhmann, Full-wave invisibility of active
devices at all frequencies, Comm. Math. Phys. 275 749-789 (2007).
[8] A. Greenleaf, M. Lassas, and G. Uhlmann, Anisotropic conductivities that cannot be
detected by EIT, Physiol. Meas. 24 413-419 (2003).
[9] A. Greenleaf, M. Lassas, and G. Uhlmann, On nonuniqueness for Calderón’s inverse
problem, Math. Res. Let. 10 685-693 (2003).
[10] R. Hempel and R. Weder, On the completeness of wave operators under loss of local
compactness, Journal of Functional Analysis, 113 391-412 (1993).
[11] T. Kilpeläinen, J. Kinnunen and O. Martio, Sobolev spaces with zero boundary values
on metric spaces, Potential Anal. 12 233-247 (2000).
[12] J. Kinnunen and O. Martio, The Sobolev capacity on metric spaces, Ann. Acad. Sci.
Fenn. Math. 21 367-382 (1997).
[13] U. Leonhardt, Optical conformal mapping, Science 312 1777-1780 (2006).
[14] R. Leis, Initial Boundary Value Problems in Mathematical Physics, John Wiley & Sons,
New York, 1986.
[15] U. Leonhardt and T. G. Philbin, General relativity in electrical engineering, New J.
Phys. 8 247 (2006).
[16] G. W. Milton, M. Briane, and J. R. Willis, On cloaking for elasticity and physical
equations with transformation invariant form, New J. Phys. 8 248 (2006).
[17] A. Nachman, Reconstruction from boundary measurements, Ann. of Math. 128 71-96
(1988).
[18] J. B. Pendry, D. Schurig, and D. R. Smith, Controlling electromagnetic fields, Science
312 1780-1782 (2006).
[19] R. Picard, Ein Randwertproblem für die zeitunabhängigen Maxwellschen Gleichungen
mit der Randbedingung n · εE = n · µH = 0 in beschränken Gebieten beliebigen
Zusammenhangs Appl. Anal. 6 207-221 (1977).
[20] R. Picard, On the low frequency asymptotics in electromagnetic theory, J. Reine Angew.
Math. 354 50-73 (1984).
[21] E.J. Post, Formal Structure of Electromagnetics General Covariance and Electromag-
netics, Dover Publications, Mineola, New York, 1997.
[22] M. Reed and B. Simon, Methods of Modern Mathematical Physics II Fourier Analysis
and Self-Adjointness, Academic Press, New York, 1975.
[23] M. Reed and B. Simon, Methods of Modern Mathematical Physics III Scattering Theory,
Academic Press, New York, 1979.
[24] D. Schurig, , J.J. Mock, B.J. Justice, S.A. Cummer, J.B.. Pendry, A.F. Starr and D.
R. Smith, Metamaterial electromagnetic cloak at microwave frequencies Science 314
977-980 (2006).
[25] D. Schurig, J.B. Pendry, and D. R. Smith, Calculation of material properties and ray
tracing in transformation media, Opt. Exp. 14 9794-9804 (2006).
[26] G. Uhlmann, Scattering by a metric, Chap. 6.1.5 in Encyclopedia on Scvattering, Aca-
demic Press, R. Pike and P. Sabatier, eds (2002), 1668-1677.
[27] R. Weder, Analyticity of the scattering matrix for wave propagation in crystals, J. Math.
Pures et Appl. 64 121-148 (1985).
[28] R. Weder, Spectral and Scattering Theory for Wave Propagation in Perturbed Stratified
Media. Applied Mathematical Sciences 87 Springer-Verlag, New York, 1991.
[29] R. Weder, A rigorous analysis of high-order electromagnetic invisibility cloaks, arXiv:
0711.0507, 2007, J. Phys. A: Mathematical and Theoretical 41 065207 (2008). IOP-
Select.
[30] R. Weder, The boundary conditions for electromagnetic invisibility cloaks, arXiv:
0801.3611, 2008.
[31] C.H. Wilcox, Asymptotic wave functions and energy distributions in strongly propaga-
tive media, J. Math. Pures et Appl. 57 275-231 (1978).
	Introduction
	Electromagnetic Cloaking
	Electromagnetic Waves Inside the Cloaked Objects 
	Cloaking as a Boundary Value Problem
	Cloaking an Infinite Cylinder
ABSTRACT
  There is currently a great deal of interest in the theoretical and practical
possibility of cloaking objects from the observation by electromagnetic waves.
The basic idea of these invisibility devices \cite{glu1, glu2, le},\cite{pss1}
is to use anisotropic {\it transformation media} whose permittivity and
permeability $\var^{\lambda\nu}, \mu^{\lambda\nu}$, are obtained from the ones,
$\var_0^{\lambda\nu}, \mu^{\lambda\nu}_0$, of isotropic media, by singular
transformations of coordinates. In this paper we study electromagnetic cloaking
in the time-domain using the formalism of time-dependent scattering theory.
This formalism allows us to settle in an unambiguous way the mathematical
problems posed by the singularities of the inverse of the permittivity and the
permeability of the {\it transformation media} on the boundary of the cloaked
objects. We write Maxwell's equations in Schr\"odinger form with the
electromagnetic propagator playing the role of the Hamiltonian. We prove that
the electromagnetic propagator outside of the cloaked objects is essentially
self-adjoint. Moreover, the unique self-adjoint extension is unitarily
equivalent to the electromagnetic propagator in the medium
$\var_0^{\lambda\nu}, \mu^{\lambda\nu}_0$. Using this fact, and since the
coordinate transformation is the identity outside of a ball, we prove that the
scattering operator is the identity. Our results give a rigorous proof that the
construction of \cite{glu1, glu2, le}, \cite{pss1} perfectly cloaks passive and
active devices from observation by electromagnetic waves. Furthermore, we prove
cloaking for general anisotropic materials. In particular, our results prove
that it is possible to cloak objects inside general crystals.

<|endoftext|><|startoftext|>
Non-perturbative conserving approximations and Luttinger’s sum rule
Jutta Ortloff, Matthias Balzer, Michael Potthoff
Institut für Theoretische Physik und Astrophysik,
Universität Würzburg, Am Hubland, D-97074 Würzburg, Germany
Weak-coupling conserving approximations can be constructed by truncations of the Luttinger-
Ward functional and are well known as thermodynamically consistent approaches which respect
macroscopic conservation laws as well as certain sum rules at zero temperature. These properties
can also be shown for variational approximations that are generated within the framework of the self-
energy-functional theory without a truncation of the diagram series. Luttinger’s sum rule represents
an exception. We analyze the conditions under which the sum rule holds within a non-perturbative
conserving approximation. Numerical examples are given for a simple but non-trivial dynamical two-
site approximation. The validity of the sum rule for finite Hubbard clusters and the consequences
for cluster extensions of the dynamical mean-field theory are discussed.
PACS numbers: 71.10.-w, 71.10.Fd
I. INTRODUCTION
Continuous symmetries of a Hamiltonian imply the
existence of conserved quantities: The conservation of
total energy, momentum, angular momentum, spin and
particle number is enforced by a not explicitly time-
dependent Hamiltonian which is spatially homogeneous
and isotropic and invariant under global SU(2) and U(1)
gauge transformations. For the treatment of a macro-
scopically large quantum system of interacting fermions,
approximations are inevitable in general. Approxima-
tions, however, may artificially break symmetries and
thus lead to unphysical violations of conservations laws.
Baym and Kadanoff1,2 have analyzed under which cir-
cumstances an approximation for time-dependent corre-
lation functions, and for one- and two-particle Green’s
functions in particular, respect the mentioned macro-
scopic conservation laws. They were able to give cor-
responding rules for a proper construction of approxima-
tions, namely criteria for selecting suitable classes of dia-
grams, within diagrammatic weak-coupling perturbation
theory. Weak-coupling approximations following these
rules and thus respecting conservation laws are called
“conserving”. Frequently cited examples for conserving
approximations are the Hartree-Fock or the fluctuation-
exchange approximation.1,3,4
Baym2 has condensed the method of constructing con-
serving approximations into a compact form: A con-
serving approximation for the one-particle Green’s func-
tion G is obtained by using Dyson’s equation G =
1/(G−10 − Σ) with (the free, U = 0, Green’s function
G0 and) a self-energy Σ = ΣU [G] given by a univer-
sal functional. Apart from G, the universal functional
ΣU must depend on the interaction parameters U only.
Furthermore, the functional must satisfy a vanishing-curl
condition or, alternatively, must be derivable from some
(universal) functional ΦU [G] as TΣU [G] = δΦU [G]/δG
(the temperature T is introduced for convenience). In
short, “Φ-derivable” approximations are conserving.
Φ-derivable approximations have been shown2 to ex-
hibit several further advantageous properties in addition.
One of these concerns the question of thermodynamical
consistency. There are different ways to determine the
grand potential of the system from the Green’s function
which do not necessarily yield the same result when us-
ing approximate quantities. On the one hand, Ω may be
calculated by integration of expectation values, accessi-
ble by G, with respect to certain model parameters. For
example, Ω may be calculated by integration of the av-
erage particle number, as obtained from the trace of G,
with respect to the chemical potential µ. On the other
hand, Ω can be obtained as Ω = Φ + Tr lnG − Tr(ΣG)
without integration. A Φ-derivable approximation con-
sistently gives the same result for Ω in both ways.
At zero temperature T = 0 there is another non-trivial
theorem which is satisfied by any Φ-derivable approxima-
tion, namely Luttinger’s sum rule.5,6 This states that the
volume in reciprocal space that is enclosed by the Fermi
surface is equal to the average particle number. The orig-
inal proof of the sum rule by Luttinger and Ward5 is
based on the existence of Φ in the exact theory and is
straightforwardly transferred to the case of a Φ-derivable
approximation. This also implies that other Fermi-liquid
properties, such as the linear trend of the specific heat at
low T and Fermi-liquid expressions for the T = 0 charge
and the spin susceptibility are respected by a Φ-derivable
approximation.
There is a perturbation expansion5,7 which gives the
Luttinger-Ward functional ΦU [G] in terms of closed
skeleton diagrams (see Fig. 1). As a manageable Φ-
= + + +Φ
FIG. 1: Diagrammatic representation of the Luttinger-Ward
functional ΦU [G]. Double lines stand for the interacting one-
particle Green’s function G, dashed lines represent the ver-
tices U .
http://arxiv.org/abs/0704.0249v2
derivable approximation must specify a (universal) func-
tional ΦU [G] that can be evaluated in practice, one usu-
ally considers truncations of the expansion and sums up
a certain subclass of skeleton diagrams only. This, how-
ever, means that the construction of conserving approx-
imations is restricted to the weak-coupling limit.
One purpose of the present paper is to show that it is
possible to construct Φ-derivable approximations for lat-
tice models of correlated fermions with local interactions
which are non-perturbative, i.e. do not employ trunca-
tions of the skeleton-diagram expansion. The idea is to
employ the self-energy-functional theory (SFT).8,9,10 The
SFT constructs the Luttinger-Ward functional ΦU [G], or
its Legendre transform FU [Σ], in an indirect way, namely
by making contact with an exactly solvable reference sys-
tem. Thereby, the exact functional dependence of FU [Σ]
becomes available on a certain subspace of self-energies
which is spanned by the self-energies generated by the
reference system.
The obvious question is whether those non-
perturbative Φ-derivable approximations have the
same properties as the weak-coupling Φ-derivable ap-
proximations suggested by Baym and Kadanoff. This
requires the discussion of the following points:
(i) Macroscopic conservations laws. For fermionic lat-
tice models, conservation of energy, particle number and
spin have to be considered. Besides the static thermody-
namics, the SFT concept concentrates on the one-particle
excitations. For the approximate one-particle Green’s
function, however, it is actually simple to prove that the
above conservation laws are respected. A short discus-
sion is given in Appendix A.
(ii) Thermodynamical consistency. This issue has al-
ready been addressed in Ref. 11. It has been shown that
the µ derivative of the (approximate) SFT grand poten-
tial (including a minus sign) equals the average particle
number 〈N〉 as obtained by the trace of the (approxi-
mate) Green’s function. The same holds for any one-
particle quantity coupling linearly via a parameter to the
Hamiltonian, e.g. for the average total spin 〈S〉 coupling
via a field of strength B.
(iii) Luttinger sum rule. This is the main point to
be discussed in the present paper. There are different
open questions: First, it is straightforward to prove that
weak-coupling Φ-derivable approximations respect the
sum rule as one can directly take over the proof for the
exact theory. For approximations constructed within the
SFT, a different proof has to be given. Second, it turns
out that a non-perturbative Φ-derivable approximation
respects the sum rule if and only if the sum rule holds
for the reference system that is used within the SFT.
As the original and thereby the related reference system
may be studied in the strong-coupling regime, this raises
the question which reference system does respect the sum
rule, i.e. which approximation is consistent with the sum
rule. Third, it will be particularly interesting to study
reference systems which generate dynamical impurity ap-
proximations (DIA)8,9 and variational cluster approxi-
mations (VCA),10,12 as these consist of a finite number of
degrees of freedom. Does the Luttinger sum rule hold for
finite systems? Do the DIA and the VCA respect the sum
rule? What is the simplest approximation consistent with
the sum rule? Note that finite reference systems consist-
ing of a few sites only have been shown9,13,14,15,16,17,18,19
to generate approximations which qualitatively capture
the main physics correctly. Finally, it is important to un-
derstand these issues in order to understand whether and
how a violation of the sum rule is possible within cluster
extensions20,21,22,23 of the dynamical mean-field theory
(DMFT).24,25,26,27,28 Note that the SFT comprises the
DMFT and certain29 cluster extensions and that possi-
ble violations of the sum rule in the two-dimensional lat-
tice models have been reported,30,31,32 including a study
using the dynamical cluster approximation (DCA).33
The paper is organized as follows: A brief general dis-
cussion of the Luttinger sum rule is given in the next
section, and a form of the sum rule specific to systems
with a finite number of spatial degrees of freedom is
derived. Sec. III clarifies the status of the sum rule
with respect to non-perturbative approximations gener-
ated within the SFT framework. The results are elu-
cidated by several numerical examples obtained for the
most simple but non-trivial non-perturbative conserving
approximation in Sec. IV. Violations of the sum rule in fi-
nite systems and their consequences are discussed in Sec.
V. Finally, Sec. VI summarizes our main conclusions.
II. LUTTINGER SUM RULE
A system of interacting electrons on a lattice is gen-
erally described by a Hamiltonian H(t,U) = H0(t) +
H1(U) consisting of a one-particle part H0(t) and an
interaction H1(U) with one-particle and interaction pa-
rameters t and U , respectively. As a prototype, let
us consider the single-band Hubbard model34,35,36 on a
translationally invariant D dimensional lattice consist-
ing of L sites with periodic boundary conditions. The
Hamiltonian is given by:
iσcjσ +
niσni−σ . (1)
Here, i = 1, ..., L refers to the sites, σ =↑, ↓ is the spin
projection, ciσ (c
iσ) annihilates (creates) an electron in
the one-electron state |iσ〉, and niσ = c
ciσ. Fourier
transformation diagonalizes the hopping matrix t and
yields the dispersion ε(k). There are L allowed k points
in the first Brillouin zone.
Let G = Gt,U denote the one-electron Green’s func-
tion of the model H(t,U). In case of the Hubbard model,
its elements are given by Gij(ω) = 〈〈ciσ ; c
jσ〉〉ω . In the
absence of spontaneous symmetry breaking, the Green’s
function is spin-independent and diagonal in reciprocal
space. It can be written as Gk(ω) = 1/(ω + µ − ε(k) −
Σk(ω)) where µ is the chemical potential and Σk(ω) the
self-energy. We also introduce the notation Σt,U for
the self-energy, and Gt,0 = 1/(ω + µ − t) for the free
(non-interacting) Green’s function which exhibits the de-
pendence on the model parameters but suppresses the
frequency dependence. Dyson’s equation then reads as
Gt,U = 1/(G
t,0 −Σt,U ).
The Luttinger sum rule5,6 states that
〈N〉 = 2
Θ(Gk(0)) (2)
where N =
niσ is the particle-number operator,
〈N〉 its (T = 0) expectation value, and Θ the Heavy-
side step function. The factor 2 accounts for the two
spin directions. Since Gk(0)
−1 = µ − ε(k) − Σk(0), the
sum gives the number of k points enclosed by the in-
teracting Fermi surface which, for L → ∞, is defined via
µ−ε(k)−Σk(0) = 0. In the thermodynamic limit the sum
rule therefore equates the average particle number with
the Fermi-surface volume (apart from a factor (2π)D/L).
Note that, as Θ(Gk(0)) = Θ(1/Gk(0)), the sum rule Eq.
(2) also includes the so-called Luttinger volume37 which
(for L → ∞) is enclosed by the zeros of Gk(0).
The standard proof of the sum rule can be found in
Ref. 5. It is based on diagrammatic perturbation theory
to all orders which is used to construct the Luttinger-
Ward functional ΦU [G] as the sum of renormalized closed
skeleton diagrams (see Fig. 1). We emphasize that the
original proof straightforwardly extends also to finite sys-
tems. For L < ∞ the sum in Eq. (2) is discrete. Actually,
the proof is performed for finite L first, and the thermo-
dynamic limit (if desired) can be taken in the end. The
limit T → 0, on the other hand, is essential and is re-
sponsible for possible violations of the sum rule (see Sec.
Below we need an alternative but equivalent formu-
lation of the sum rule. We start from the following
(Lehmann) representation for the Green’s function:
Gk(ω) =
αm(k)
ω + µ− ωm(k)
. (3)
Here, ωm(k)−µ are the (real) poles and αm(k) the (real
and positive) weights. For real frequencies ω, it is then
easy to verify the identity:
Θ(Gk(ω)) =
Θ(ω+µ−ωm(k))−
Θ(ω+µ−ζn(k))
where ζn(k) − µ is the n-th (real) zero of the Green’s
function, i.e. Gk(ζn(k)− µ) = 0.
For temperature T = 0 we have 〈N〉 =
dω(−1/π)ImGk(ω + i0
+) and thus 〈N〉 =
αm(k)Θ(µ−ωm(k)). Hence, the Luttinger sum
rule reads:
αm(k)Θ(µ− ωm(k))
Θ(µ− ωm(k))−
Θ(µ− ζn(k))
This form of the sum rule is convenient for the discussion
of finite systems with L < ∞.
III. SELF-ENERGY-FUNCTIONAL THEORY
AND LUTTINGER SUM RULE
Within the self-energy-functional theory (SFT),8,9,10
the grand potential Ω is considered as a functional of the
self-energy:
Ωt,U [Σ] = Tr ln
t,0 −Σ
+ FU [Σ] . (6)
Here, the trace Tr of a quantity A is defined as TrA ≡
eiωn0
Ak(iωn) where iωn = i(2n+ 1)πT are
the fermionic Matsubara frequencies, and the functional
FU [Σ] is the Legendre transform of the Luttinger-Ward
functional ΦU [G]. The self-energy functional (6) is sta-
tionary at the physical self-energy, δΩt,U [Σt,U ]/δΣ = 0,
and, if evaluated at the physical self-energy, yields the
physical value for the grand potential: Ωt,U [Σt,U ] =
Ωt,U ≡ −T ln tr exp(−β(H(t,U)−µN)) where β = 1/T .
Comparing with the self-energy functional
Ωt′,U [Σ] = Tr ln
t′,0 −Σ
+ FU [Σ] (7)
of a reference system with the same interaction but a
modified one-particle part, i.e. with the Hamiltonian
H(t′,U), the not explicitly known but only U -dependent
functional FU [Σ] can be eliminated:
Ωt,U [Σ] = Ωt′,U [Σ] + Tr ln
t,0 −Σ
− Tr ln
t′,0 −Σ
An approximation is constructed by searching for a sta-
tionary point of the self-energy functional on the sub-
space of trial self-energies spanned by varying the one-
particle parameters t′:
∂Ωt,U [Σt′,U ]
= 0 . (9)
Inserting a trial self-energy into Eq. (8) yields
Ωt,U [Σt′,U ] = Ωt′,U +Tr ln
t,0 −Σt′,U
− Tr lnGt′,U .
The decisive point is that the r.h.s. can be evaluated ex-
actly for a reference system which is exactly solvable.
Apart from the free Green’s function Gt,0, it involves
quantities of the reference system only.
This strategy to generate approximations has several
advantages: (i) Contrary to the usual conserving approxi-
mations, the exact functional form of Ωt,U [Σ] is retained.
Any approximation is therefore non-perturbative by con-
struction. On the level of one-particle excitations, macro-
scopic conservation laws are respected as shown in Ap-
pendix A. (ii) With Ωt,U [Σt′,U ] evaluated at the station-
ary point t′ = t′s, an approximate but explicit expression
for a thermodynamical potential is provided. As all phys-
ical quantities derive from this potential, the approxima-
tion is thermodynamically consistent in itself (see Ref.
11 for details). (iii) As different reference systems gen-
erate different approximations, the SFT provides a uni-
fying framework that systematizes a class of “dynamic”
approximations (see Refs. 29,38 for a discussion).
In the following we discuss the question whether or
not a dynamic approximation respects the Luttinger sum
rule. For this purpose consider first the Tr ln(· · · ) terms
in Eq. (10). These can be evaluated using the ana-
lytical and causal properties of the Green’s functions
as described in Ref. 9 (see Eq. (4) therein). Using
−T ln(1 + exp(−ω/T )) → ωΘ(−ω) for T → 0 yields:
Tr ln
t,0 −Σt′,U
(ωm(k)− µ)Θ(µ− ωm(k))
(ζn(k)− µ)Θ(µ− ζn(k)) . (11)
Analogously, we have
Tr lnGt′,U = 2
(ω′m(k)− µ)Θ(µ− ω
m(k))
(ζn(k)− µ)Θ(µ− ζn(k)) .
Note that the reference system is always assumed to be
in the same macroscopic state as the original system,
i.e. it is considered at the same temperature and, more
importantly here, at the same chemical potential µ. Fur-
thermore, it has been used that, by construction of the
approximation, the self-energy and hence its poles at
ζn(k) − µ are the same for both, the original and the
reference system. This implies that the second terms on
the r.h.s. of Eq. (11) and (12), respectively, cancel each
other in Eq. (10). Finally, a (large but) finite system
(L < ∞) and a finite reference system are considered.
Hence, the set of poles of the Green’s function and of the
self-energy as well as sums over k are discrete and finite.
Taking the µ derivative on both sides of Eq. (10) then
yields:
∂Ωt,U [Σt′,U ]
∂Ωt′,U
Θ(µ− ωm(k))
Θ(µ− ω′m(k)) . (13)
Here we have assumed the ground state of the refer-
ence system to be non-degenerate with respect to the
particle number. From the (zero-temperature) Lehmann
representation39 it is then obvious that, within a sub-
space of fixed particle number, the µ-dependence of the
Green’s function is the same as its ω-dependence, i.e.
G(ω) = G̃(ω+µ) with a µ-independent function G̃. Via
the Dyson equation of the reference system, this prop-
erty can also be inferred for the self-energy and, via the
Dyson equation of the original system, for the (approx-
imate) Green’s function of the original system. Conse-
quently, the poles of (G−1
t,0 −Σt′,U )
−1 and of Gt′,U are
linearly dependent on µ, i.e. ωm(k) and ω
m(k) in Eqs.
(11) and (12) are independent of µ.
We once more exploit the fact that the self-energy of
the original system is identified with the self-energy of the
reference system. Using Eq. (4) one immediately arrives
〈N〉 = 〈N〉′ + 2
Θ(Gk(0))− 2
(0)) . (14)
This is the final result: The Luttinger sum rule for
the original system, Eq. (2), is satisfied if and only if
it is satisfied for the reference system, i.e. if 〈N〉′ =
(0)).
A few remarks are in order. For the reference sys-
tem, the status of the Luttinger sum rule is that of a
general theorem (as long as the general proof is valid);
〈N〉′ and G′
(0) represent exact quantities. The above
derivation shows that the theorem is “propagated” to the
original system irrespective of the approximation that is
constructed within the SFT. This propagation also works
in the opposite direction. Namely, a possible violation
of the exact sum rule for the reference system would im-
ply a violation of the sum rule, expressed in terms of
approximate quantities, for the original system.
Eq. (14) holds for any choice of t′. Note, however, that
stationarity with respect to the variational parameters t′
is essential for the thermodynamical consistency of the
approximation. In particular, consistency means that the
average particle number 〈N〉 = −∂Ωt,U [Σt′,U ]/∂µ on the
l.h.s. can be obtained as the trace of the Green’s function.
Stationarity is thus necessary to get the sum rule in the
form (5).
There are no problems to take the thermodynamic
limit (if desired) on both sides of Eq. (14) (after divi-
sion of both sides by the number of sites L). The k sums
turn into integrals over the unit cell of the reciprocal lat-
tice. For a D-dimensional lattice the D − 1-dimensional
manifolds of k points with Gk(0) = ∞ or Gk(0) = 0 form
Fermi or Luttinger surfaces, respectively.
For the above derivation, translational symmetry has
been assumed for both, the original as well as the refer-
ence system. Nothing, however, prevents us from repeat-
ing the derivation in case of systems with reduced (or
completely absent) translational symmetries. One sim-
ply has to re-interprete the wave vector k as an index
which, combined with m, refers to the elements of the
diagonalized Green’s function matrix G. The exact sum
rule, Eq. (5), generalizes accordingly. The result (14) re-
mains valid (with the correct interpretation of k) for an
original system with reduced translational symmetries.
It is also valid for the case of a translationally symmet-
ric original Hamiltonian where, due to the choice of a
reference system with reduced translational symmetries,
the symmetries of the (approximate) Green’s function of
the original system are (artificially) reduced. A typical
example is the variational cluster approximation (VCA)
where the reference system consists of isolated clusters of
finite size.
IV. TWO-SITE DYNAMICAL-IMPURITY
APPROXIMATION
While the Hartree-Fock approximation may be con-
sidered as the most simple weak-coupling Φ-derivable
approximation, the most simple non-perturbative Φ-
derivable approximation is given by the dynamical-
impurity approximation (DIA). This shall be demon-
strated in the following for the single-band Hubbard
model (1) as the original system to be investigated. The
DIA is generated by a reference system consisting of a
decoupled set of single-impurity Anderson models with
a finite number of sites ns and is known
8 to recover
the dynamical mean-field theory in the limit ns → ∞.
As long as the Luttinger sum rule holds for the single-
impurity reference system, the DIA must yield a one-
particle Green’s function and a self-energy respecting the
sum rule.
The Hamiltonian of the reference system is H(t′,U) =∑L
i=1 H
i with
H ′i =
iσciσ +
niσni−σ
aikσ +
ciσ + h.c.) .
For a homogeneous phase, the variational parameters
′ = ({ε
0 , ε
}) can be assumed to be indepen-
dent of the site index i: ε0 ≡ ε
0 , εk ≡ ε
, Vk ≡ V
For the sake of simplicity, we consider the two-site DIA
(ns = 2), i.e. a single bath site per correlated site only.
In this case there are three independent variational pa-
rameters only: the on-site energies of the correlated and
of the bath site, ε0 and εc ≡ εk=2, respectively, as well as
the hybridization strength V ≡ Vk=2. As the reference
0 0.2 0.4 0.6 0.8 1
filling  n
U=W=4
FIG. 2: Filling dependence of the variational parameters
at their respective optimized values and of the chemical po-
tential. Calculations for the Hubbard model with a semi-
elliptical free density of states of band width W = 4 and
interaction strength U = W = 4 using the two-site DIA.
system consists of replicated identical impurity models
which are spatially decoupled, the trial self-energy is lo-
cal and site-independent, Σij(ω) = δijΣ(ω).
Calculations have been performed for the Hub-
bard model with a one-particle dispersion ε(k) =
e−ik(Ri−Rj)tij such that the density of one-
particle energies D(ε) is semi-elliptic. For |ε| ≤ W/2,
D(ε) =
δ(ε− ε(k)) =
(W/2)2 − ε2 . (16)
The free band width is set to W = 4. This serves as the
energy scale.
The computation of the SFT grand potential is per-
formed as described in Ref. 9. Stationary points of the
resulting function Ω(ε0, εc, V ) ≡ Ωt,U [Σε0,εc,V ] are ob-
tained via iterated linearizations of its gradient. There
is a unique non-trivial stationary point (with V 6= 0).
Fig. 2 shows the variational parameters at this point as
functions of the filling n. For the entire range of fillings,
the ground state of the reference system lies in the in-
variant subspace with Ntot =
iσciσ + a
iσaiσ) = 2.
The parameters as well as the chemical potential are
smooth functions of n. We have checked that the ther-
modynamical consistency condition n = −L−1∂Ω/∂µ =∫ 0
ρ(ω)dω is satisfied within numerical accuracy. Here
ρ(ω) = D(ω + µ− Σ(ω)) (17)
is the interacting local density of states (DOS).
At half-filling the values of the optimized on-site en-
ergies are consistent with particle-hole symmetry. With
ε0 − µ = −U/2 and εc − µ = 0 the reference system is
in the Kondo regime with a well-formed local moment at
the correlated site. The finite hybridization strength V
leads, for U = W , to a finite DOS ρ(ω = 0) > 0 and thus
0 0.2 0.4 0.6 0.8 1
filling  n
2S-DMFT
two-site DIA
FIG. 3: Quasi-particle weight z as a function of the filling
within the two-site DIA (full lines) and the two-site DMFT40
(dashed lines). Calculations for U = W and U = 2W .
to a metallic Fermi liquid as it is expected for the Hub-
bard model within a (dynamical) mean-field description.
Due to the simple structure of the self-energy generated
by the two-site reference system, however, quasi-particle
damping effects are missing.
Decreasing the filling from n = 1 to n = 0 drives the
reference system more and more out of the Kondo regime.
While εc stays close to the chemical potential, the on-site
energy of the correlated site ε0 crosses µ close to quar-
ter filling and lies above µ eventually. Note that ε0 = 0
within the DMFT, i.e. for ns → ∞, while for finite ns
there is a clear deviation from ε0 = 0 which is necessary
to ensure thermodynamical consistency. For fillings very
close to n = 0, the grand potential Ωt,U [Σε0,εc,V ] be-
comes almost independent of Σ. This implies that it be-
comes increasingly difficult to locate the stationary point
with the numerical algorithm used. The slight upturn of
ε0 below n = 0.01 (see Fig. 2) might be a numerical ar-
tifact.
It is instructive to compare the parameters with those
of the two-site DMFT (2S-DMFT).40 The 2S-DMFT is
a simplified version of the DMFT where a mapping onto
the two-site single impurity Anderson model is achieved
by means of a simplified self-consistency equation. As-
suming ε0 = 0 as in the full DMFT, there are two pa-
rameters left (εc and V ) which are fixed by considering
the first non-trivial order in the low- and in the high-
frequency expansion of the self-energy and the Green’s
function in the DMFT self-consistency equation. Al-
though being well motivated, this approximation is es-
sentially ad hoc. One therefore has to expect that the 2S-
DMFT is thermodynamically inconsistent and exhibits a
violation of Luttinger’s sum rule. A comparison of the
DIA for ns = 2 with the 2S-DMFT is thus ideally suited
to demonstrate the advantages gained by constructing
approximations within the variational framework of the
First of all, there are differences in fact. At half-filling
the 2S-DMFT predicts the hybridization to be somewhat
larger than the two-site DIA while the value for εc is
again fixed by particle-hole symmetry. Deviations grow
with decreasing filling. Contrary to the two-site DIA, V
monotonously increases and is larger in the entire filling
range, ε0 = 0 by construction, and εc even diverges for
n → 0 within the 2S-DMFT (see Ref. 40). On the other
hand, the system is essentially uncorrelated in the limit
n → 0. Strong differences in the parameters, which en-
ter the self-energy only, therefore do not necessarily im-
ply strongly different physical quantities. This is demon-
strated by Fig. 3 which shows the quasi-particle weight
calculated via
dΣ(ω = 0)
as a function of the filling. While there are obvious differ-
ences when comparing the results from the two-site DIA
with those of the 2S-DMFT, the qualitative trend of z is
very similar in both approximations. Both approxima-
tions also compare well with the full DMFT: There is a
quadratic behavior of z(n) for n → 1 in the Fermi-liquid
phase (U = W ) and a linear trend when approaching the
Mott phase (U = 2W ). The critical interaction strength
for the Mott transition is found to be Uc ≈ 1.46W for
the two-site DIA and Uc = 1.5W within the 2S-DMFT.
For details on the Mott transition see Refs. 9,40.
In case of a local and site-independent self-energy, the
Luttinger sum rule can be written in the form41
µ = µ0 +Σ(ω = 0) , (19)
where µ0 is the chemical potential of the free (U = 0)
system at the same particle density. Eq. (19) implies that
not only the enclosed volume but also the shape of the
Fermi surface remains unchanged when switching on the
interaction. Using Eq. (17) this immediately implies41
ρ(0) = D(µ0) = ρ0(0) , (20)
i.e., in case of a correlated metal, the value of the inter-
acting local density of states at ω = 0 is independent of
U and thus fixed to the value of the density of states of
the non-interacting system at the same filling.
The interacting and the non-interacting DOS are plot-
ted in Fig. 4 for different fillings and for U = W and
U = 2W . The impurity self-energy of the two-site ref-
erence system is an analytical function of ω except for
two first-order poles on the real axis. Via Eq. (17) this
two-pole structure implies that the DOS consists of three
peaks the form of which is essentially given by the non-
interacting DOS. At half-filling the three peaks are eas-
ily identified as the lower and the upper Hubbard band
and the quasi-particle resonance as it is characteristic
for a (dynamical) mean-field description.27 For U = W
the resonance still has a significant weight. The weight
decreases upon approaching the critical interaction, and
the resonance has disappeared in the Mott insulator for
-4 -2 0 2 4 6
-6 -4 -2 0 2 4 6 8 10
n=1.0
n=0.75
n=0.5
n=0.25
n=0.0
FIG. 4: Interacting local density of states ρ(ω) (solid lines)
for different fillings as indicated. Calculations using the two-
site DIA for U = W (left) and U = 2W (right). For n = 0.25,
n = 0.5 and n = 0.75 the non-interacting DOS ρ0(ω) is shown
for comparison (dashed lines). Note that ρ(0) = ρ0(0). The
dotted line for U = 2W in the top panel is the DOS for
n = 0.99.
U = 2W . Hole doping of the Mott insulator is accom-
plished by the reappearance of the resonance at ω = 0
which preempts the creation of holes in the lower Hub-
bard band.42 As can be seen in the spectrum for n = 0.99
in the top panel (dotted line), the quasi-particle res-
onance appears within the Mott-Hubbard gap. With
decreasing filling, the upper Hubbard band gradually
shifts to higher excitation energies and loses weight. This
weight is transfered to the low-energy part of the spec-
trum. For lower fillings where the Kondo regime has been
left, one would actually expect that the quasi-particle res-
onance disappears by merging with the lower Hubbard
band. This, however, cannot be described with the sim-
ple two-pole structure of the self-energy. One therefore
should interprete the gap around ω = −1 at n = 0.25
as an artifact of the approximation. Furthermore, the
widths of the Hubbard bands is considerably underes-
timated as damping effects are missing completely. The
filling-dependent spectral-weight transfer across the Hub-
bard gap as well as the energy positions of the main
peaks, however, are in overall agreement with general
expectations.34,43
It is worth emphasizing that this simple two-site
0 0.2 0.4 0.6 0.8 1
-0.02
filling  n
Hubbard-I
2S-DMFT
two-site DIA
FIG. 5: Numerical results for the difference between the
volume enclosed by the Fermi surface VFS and the filling n as
a function of n for U = W = 4. The Luttinger sum rule (VFS−
n = 0) is exactly respected by the two-site DIA. Results for
the 2S-DMFT and the Hubbard-I approximation are shown
for comparison. Dashed line: difference between the filling n
and the average occupation of the correlated (impurity) site
in the reference system at stationarity for the two-site DIA.
dynamical-impurity approximation exactly fulfills the
Luttinger sum rule. In Fig. 4 this can be seen by compar-
ing with the DOS of the non-interacting system (dashed
lines). The non-interacting DOS cuts the interacting one
at ω = 0 which shows that Eq. (20) is satisfied. Note
that this is trivial for n = 1 as this is already enforced
by particle-hole symmetry. Off half-filling, however, the
pinning of the DOS to its non-interacting value at ω = 0
is a consequence of Φ-derivability and thereby a highly
non-trivial feature.
In contrast, the 2S-DMFT does show a violation of
Luttinger’s sum rule which, however, must be attributed
to the ad hoc nature of the approximation. Fig. 5 shows
0 0.2 0.4 0.6 0.8 1
filling  n
2S-DMFT
two-site DIA
FIG. 6: Filling dependence of the compressibility κ for U =
W as obtained within the 2S-DMFT via κ = ∂n/∂µ (solid
line) and via a general Fermi-liquid relation (Eq. (22), dashed
line). Using the two-site DIA identical results are obtained
for both cases.
the difference between the volume enclosed by the Fermi
surface
VFS =
Θ(µ−ε(k)−Σ(0)) = 2
dεD(ε+µ−Σ(0))
and the filling n as a function of the filling. As can be
seen, there is an artificial violation of the sum rule for the
2S-DMFT which is of the order of a few per cent while
for the Φ-derivable two-site DIA the sum rule is fully re-
spected. Note that, unlike the DMFT and also unlike
the simplified 2S-DMFT, the two-site DIA predicts a fill-
ing which slightly differs from the average occupation of
the correlated impurity site in the reference system (see
dashed line in Fig. 5). For a finite number of bath sites ns
this appears to be necessary to fulfill the Luttinger sum
rule. The figure also shows the result obtained within
the Hubbard-I approximation.34 Here a very strong (ar-
tificial) violation of up to 100 % (for n close to half-filling)
is obtained. This should be considered as a strong draw-
back which is typical for uncontrolled mean-field approx-
imations.
There are more relations which, analogously to the
Luttinger sum rule, can be derived by means of per-
turbation theory to all orders6 in the exact theory and
which are respected by weak-coupling conserving approx-
imations. For example, the compressibility, defined as
κ = ∂n/∂µ, can be shown to be related to the interact-
ing DOS and the self-energy at the Fermi edge via
κ = 2ρ(0)
∂Σ(0)
. (22)
Fig. 6 shows that for the 2S-DMFT it makes a difference
whether κ is calculated as the µ-derivative of the filling
or via Eq. (22). Again, this must be attributed to the
fact that the 2S-DMFT is not a Φ-derivable approxima-
tion. Contrary, the two-site DIA does respect the general
Fermi-liquid property (22) and thus yields the same re-
sult in both cases (see Fig. 6).
V. VIOLATION OF LUTTINGER’S SUM RULE
IN FINITE SYSTEMS
The preceding section has demonstrated that the two-
site DIA satisfies the Luttinger sum rule. According to
Eq. (14), we can conclude that the Luttinger sum rule
must hold for the corresponding reference system, i.e. for
the two-site single-impurity Anderson model. Of course,
this can be verified more directly by evaluating Eq. (5).
In case of a finite system or a system with reduced trans-
lational symmetries, the Green’s function is a matrix with
elements Gαβ(ω) where α refers to the one-particle basis
states, and the Luttinger sum rule reads:
α(k)m Θ(µ−ω
m ) =
Θ(µ−ω(k)m )−
Θ(µ− ζ(k)n ) .
Here the index k labels the elements of the diagonalized
Green’s function, i.e. Eq. (5) is generalized by replacing
(k, σ) → k. In case of an impurity model, Eq. (23) ac-
tually represents the Friedel sum rule.44,45 For the two-
site single-impurity Anderson model, the different one-
particle excitation energies ω
m − µ, the zeros of the
Green’s function ζ
n − µ and the weights α
m are eas-
ily determined by full diagonalization. We find that Eq.
(23) is satisfied in the entire parameter space (except for
V = 0, see below).
Note that a violation of the sum rule occurs when, as
a function of a model parameter x, a zero of the Green’s
function crosses ω = 0 for x = xc. At xc the number of
negative zeros counted by the second term on the r.h.s.
changes by one while the first term as well as the l.h.s.
remain constant since (unlike a pole) a zero of the Green’s
function is generically not connected with a change of the
ground state (level crossing). This implies that the sum
rule would be violated for x < xc or for x > xc.
The case V = 0 is exceptional. Within the two-site
DIA this corresponds to the Mott insulator (see Fig. 4,
topmost panel for U = 2W ). For V = 0 the reference
system consists of two decoupled sites, and the Green’s
function becomes diagonal in the site index. There is
no zero of the local Green’s function corresponding to
the uncorrelated site. We can thus concentrate on the
correlated site where the local Green’s function exhibits
a zero at η−µ = ε0+U/2. In the sector with one electron
at the correlated site (ε0 < µ < ε0 + U), the second
term on the r.h.s. changes by two at µ = µc = ε0 + U/2
because of the two-fold degenerate ground state. In this
case Luttinger’s sum rule in the form (23) is violated for
µ < µc and for µ > µc. This “violation”, however, is a
trivial one which immediately disappears if the ground-
state degeneracy is lifted by applying a weak field term,
for example.
Fig. 7 shows a phase diagram of the single-impurity
Anderson model with ns = 4 sites as obtained by
full diagonalization. The diagram covers the entire
range of the total particle number N =
〈c†σcσ〉 +∑
k=2〈a
akσ〉 from N = 0 to N = 2ns = 8. A non-
degenerate ground state is enforced by applying a small
but finite magnetic field. No violation of the Luttinger
sum rule is found. We have repeated the same calculation
also for ns = 10 using the Lanczos technique.
46 Again,
the sum rule is found to be always satisfied (We have
performed calculations for different U and bath parame-
ters). This might have been expected as the (ns → ∞)
Anderson model can generally be classified as a (local)
Fermi liquid.47
The situation is less clear in the case of correlated lat-
tice models such as the Hubbard or the t-J model. For
two dimensions there are several numerical studies using
high-temperature expansion,30 quantum Monte-Carlo,31
extended DMFT,32,48 and dynamical cluster approxima-
tion (DCA)33 which indicate a violation in the strongly
correlated metallic phase close to half-filling. For studies
of large clusters or studies directly working in the thermo-
-1 0 1 2 3
SIAM n  =4s
FIG. 7: Phase diagram µ vs. ε of the single-impurity An-
derson model with ns = 4 sites. Total particle numbers are
indicated by Roman figures. Results have been obtained by
full diagonalization for the following model parameters. One-
particle energies: ε0 = 0 (correlated site), εk = ε + (k − 3)
with k = 2, 3, 4 (uncorrelated bath sites). Hubbard interac-
tion: U = 2ε. Hybridization strength: Vk = 0.1 for k = 2, 3, 4.
To lift Kramers degeneracy in case of an odd particle number,
a weak (ferromagnetic) field of strength b = 0.001 is coupled
to the local spins. The dashed line marks the particle-hole
symmetric case. Luttinger’s sum rule is found to be satisfied
in the entire parameter space.
dynamic limit, a definite conclusion on the validity of the
sum rule is difficult to obtain as finite-temperature or ar-
tificial broadening effects etc. must be controlled numer-
ically. Contrary, full diagonalization of Hubbard clusters
consisting of a few sites only can provide exact results.
While their direct relevance for the thermodynamic limit
is less clear, it is important to note that reference sys-
tems with a finite number of sites or a finite number of
correlated sites provide the basis for a number of cluster
approaches within the SFT framework. Via Eq. (14) their
properties are transferred to the approximate treatment
of lattice models in the thermodynamic limit.
The validity of Eq. (23) has been checked for Hubbard
clusters of different size and in different geometries. The
µ vs. U phase diagram for an L = 4-site open Hubbard
chain with nearest-neighbor hopping in Fig. 8 shows a
representative example. Again, a small but finite field
term is added to avoid a ground-state degeneracy. As the
chemical potential, for fixed U , is moved off the particle-
hole symmetric point µ = U/2 and exceeds certain criti-
cal values (red lines), the particle number N [as obtained
from the l.h.s. of Eq. (23)] changes from N = L down to
(up to) N = 0 (N = 2L). A critical µ value indicates a
change of the ground state (level crossing) that is accom-
panied by a change of the ground-state particle number.
In the one-particle Green’s function this is characterized
by a pole ω
m −µ crossing ω = 0. The blue lines indicate
-5 0 5 10 15
Hubbard L=4
FIG. 8: Phase diagram µ vs. U of the Hubbard model with
L = 4 sites (open chain) as obtained by full diagonalization.
Nearest-neighbor hopping t = −1. A weak (ferromagnetic)
field of strength b = 0.01 is applied to lift Kramers degener-
acy. The dashed line marks the particle-hole symmetric case.
Particle numbers (l.h.s. of Eq. (23)) are indicated by Roman
figures. R.h.s. of Eq. (23): Arabic figures. Luttinger’s sum
rule is found to be violated for sufficiently strong U . Uc1, Uc,2:
critical interactions.
those chemical potentials at which a zero of the Green’s
function ζ
n − µ crosses ω = 0. Whenever this happens
the r.h.s. of Eq. (23) changes while the l.h.s. is constant.
Fig. 8 shows that this occurs several times in the N = L
sector. At the particle-hole symmetric point µ = U/2
the Luttinger sum rule is obeyed while it is violated in a
wide region of the parameter space corresponding to half-
filling N = L. However, a critical interaction strength
Uc turns out to be necessary. The value for Uc strongly
varies for different cluster sizes and geometries but has
always been found to be positive and finite. Note that
for L = 4 the sum rule is fulfilled for any particle num-
ber N 6= L. Qualitatively similar results can be found for
the L = 2-site Hubbard cluster where calculations can be
done even analytically. Again, a violation of the sum rule
is found in the half-filled sector beyond a certain critical
This has already been noticed by Rosch49 and was used
in combination with a strong-coupling expansion to ar-
gue that a violation of the sum rule generically occurs
for a Mott insulator. Stanescu et al.50 have shown quite
generally that the sum rule is fulfilled when particle-hole
symmetry is present (the Luttinger surface is the same
as the Fermi surface of the non-interacting system) but
violated in the Mott insulator away from particle-hole
symmetry. It is interesting to note that these arguments
cannot be used to construct a violation of the sum rule
within DMFT or for a single-impurity Anderson model:
For an (almost) particle-hole symmetric case and model
parameters describing a Mott insulator (within DMFT),
-4 -2 0 2 4 6 8
Luttinger
sum rule
particle
number
U=16t
FIG. 9: Ground-state particle number (red Roman figures,
l.h.s. of Eq. (23)) and prediction by the Luttinger sum rule
(blue Arabic figures, r.h.s. of Eq. (23)) as functions of the
chemical potential for a L = 9-site Hubbard cluster with pe-
riodic boundary conditions. Arabic numbers are only given
when different from Roman ones. Calculations using the
Lanczos method and a finite but small magnetic field and
finite but small on-site potentials to lift ground-state degen-
eracies.
an odd number of sites ns (with ns → ∞) must be consid-
ered and thus a magnetic field is needed to lift Kramers
degeneracy. Even an infinitesimal field, however, leads
(at zero temperature) to a finite and even large polar-
ization corresponding a well-formed but unscreened local
moment. This polarization is incomplete for any finite
U as the DMFT predicts a small but finite double occu-
pancy for a Mott insulator. Still there is a proximity to
the fully polarized band insulator which finally results in
a weakly correlated state and thus in a situation which
is unlikely to show a violation of the sum rule.
We have also considered Hubbard clusters with L = 9
and L = 10 sites by using the Lanczos technique.46
Calculations have been performed for different Lanczos
depths lmax to ensure that the results are independent
of lmax. Fig. 9 displays an example for L = 9 and a
highly symmetric cluster geometry with periodic bound-
ary conditions and a well-defined reciprocal space. To lift
ground-state degeneracies resulting from spatial symme-
tries as well as the Kramers degeneracy, small but fi-
nite on-site potentials and a small magnetic-field term
are included in the cluster Hamiltonian. Fig. 10 shows
an example for L = 10 sites without any spatial sym-
metries. Kramers degeneracy for odd N is removed by
applying a small magnetic field. With the figures we com-
pare the expressions on the left-hand and the right-hand
side of Eq. (23). Obviously, the sum rule is respected in
-4 -2 0 2 4 6 8
Luttinger
sum rule
particle
number
U=16t
FIG. 10: The same as Fig. 9 but for 10 sites.
most cases. Violations are seen for half-filling N = L,
i.e. in the “Mott-insulating phase”, which is consistent
with Ref. 49. However, the sum rule is also violated
in the “metallic phase” close to half-filling, namely for
N = L − 1 (Fig. 9, L = 9) and N = L − 1, L − 2 (Fig.
10, L = 10). This nicely corresponds to the generally
observed trend30,31,32,33,48 for violations in the slightly
doped metallic regime. We have also verified that the
sum rule is restored by lowering U .
Fig. 9 and 10 demonstrate that the sum rule is violated
in the whole µ range corresponding to N = L − 1. This
is an important point as it shows that it is irrelevant
whether the T = 0 limit is approached by holding 〈N〉
fixed and adjusting µ = µ(T ) or by fixing µ and let 〈N〉 =
〈N〉(T ) be T -dependent. A violation of the sum rule is
found in both cases.
Kokalj and Prelovs̆ek51 have demonstrated that viola-
tions of the sum rule can also be found for the t-J model
on a finite number of sites. Our result provides an ex-
plicit example showing that not only for t-J51 but also
for Hubbard clusters a violation can be found when the
chemical potential is set to µ = limT→0 µ(T ) with µ(T )
obtained for given 〈N〉 = const. Anyway, the original
proof5 does not depend on this choice for µ but appears
to work for any µ.
The results raise the question which assumptions used
in the original proof of the theorem are violated or where
the proof breaks down. Note that the recently proposed
alternative topological proof52 assumes a Fermi-liquid
state from the very beginning and thus cannot be applied
to a finite system. Using weak symmetry-breaking fields,
a more or less trivial breakdown due to ground-state de-
generacy has been excluded. An analysis of the ground
state of the L = 2 and L = 4 Hubbard clusters which
are accessible with exact (analytical or numerical) meth-
ods has shown that, for model parameters where the sum
rule is violated, the interacting ground state can never-
theless be adiabatically connected to the non-interacting
one. This excludes level crossing as a potential cause for
the breakdown. While we cannot make a definite state-
ment, it appears at least plausible that the violation of
the sum rule results from a non-commutativity of two
limiting processes, the infinite skeleton-diagram expan-
sion and the limit T → 0.
Using a functional-integral formalism, the Luttinger-
Ward functional at finite T can also be constructed in a
non-perturbative way, i.e. avoiding an infinite summation
of diagrams, as has been shown recently.53 Formally, the
Luttinger sum rule can be obtained by exploiting a gauge
invariance of the Luttinger-Ward functional [see Ref. 53]:
∂(iωn)
ΦU [G(iωn)] = 0 . (24)
If at all, this invariance can only be shown for T = 0
where iωn becomes a continuous variable. Unfortunately,
the non-perturbative construction of ΦU requires a T > 0
formalism. Hence, the validity of the sum rule depends
on question whether the limit T → 0 commutes with the
frequency differentiation. Necessary and sufficient condi-
tions for this assumption are not easily worked out. An
understanding of the main reason for the possible break-
down of the sum rule in finite systems, very similar to
the case of Mott insulators, is therefore not yet available
(see also the discussion in Ref. 49).
VI. CONCLUSIONS
Φ-derivable approximations are conserving, thermody-
namically consistent and, for T = 0, formally respect cer-
tain non-trivial theorems such as the Luttinger sum rule.
As the construction of the Luttinger-Ward functional Φ
is by no means trivial and may conflict with the limit
T → 0 or different other limiting processes, however, the
validity of the sum rule may be questioned. Violations of
the sum rule can be found in fact for the case of strongly
correlated electron systems. For Mott insulators and fi-
nite systems in particular, a breakdown is documented
easily.
This implies that a general approximation for the
spectrum of one-particle excitations (of the one-particle
Green’s function) may violate the sum rule for two pos-
sible reasons, namely because (i) the sum rule is violated
in the exact theory, or (ii) the approximation generates
an artificial violation.
Within the usual weak-coupling conserving approxima-
tions, such as the fluctuation-exchange approximation,
the sum rule always holds as the formal steps in the gen-
eral proof of the sum rule can be carried over to the
approximation – but with the important simplification
of a limited class of diagrams. This also implies that
weak-coupling conserving approximations, when applied
beyond the weak-coupling regime, might erroneously pre-
dict the sum rule to hold.
The present paper has focussed on non-perturbative
conserving approximations. Non-perturbative approxi-
mations, constructed within the framework of the self-
energy-functional theory and referring to a certain ref-
erence system, are Φ-derivable and consequently respect
certain macroscopic conservation laws and are thermo-
dynamically consistent. Whether or not the sum rule
holds within the approximate approach, however, cannot
be answered generally. We found that Luttinger’s sum
rule holds within an (SFT) approximation if and only if
it holds exactly in the corresponding reference system.
The reference system that leads to the most simple
but non-trivial example for a non-perturbative conserv-
ing approximation consists of a single correlated and
a single bath site. For this two-site system, we have
found the sum rule to be valid in the entire parameter
space. Consequently, the resulting two-site dynamical-
impurity approximation (DIA) – opposed to more ad hoc
approaches like the two-site DMFT – fully respects the
sum rule as could be demonstrated in different ways. In
view of the simplicity of the approximation this is a re-
markable result. Since the sum rule dictates the low-
frequency behavior of the one-particle Green’s function,
important mean-field concepts, such as the emergence of
a quasi-particle resonance at the Fermi edge, are quali-
tatively captured correctly, even away from the particle-
hole symmetric case. This qualifies the two-site DIA for a
quick but rough estimate of mean-field physics, including
phases with spontaneously broken symmetries.
Full diagonalization and the Lanczos method have
been employed to show that also the single-impurity An-
derson model with a finite number of ns > 2 sites respects
the sum rule. Consequently, this property is transferred
to an ns-site DIA. For ns → ∞ the full dynamical mean-
field theory is recovered which is thereby recognized as
the prototypical non-perturbative conserving approxima-
tion. Clearly, in the case of the DMFT, Φ-derivability is
well known27 and obvious, for example, when construct-
ing the DMFT with the help of the skeleton-diagram ex-
pansion.
Using as a trial self-energy the self-energy of a cluster
with L > 1 correlated sites, generates an approximation
where short-range spatial correlations are included up to
the cluster extension. These variational cluster approxi-
mations provide a first step beyond the mean-field con-
cept. Again, whether or not the sum rule is respected
within the VCA depends on the reference system itself.
For the L = 2 Hubbard cluster, analytical calculations
straightforwardly show that violations of the sum rule
occur at half-filling, beyond a certain critical interaction
strength. In the thermodynamic limit, this would corre-
spond to the Mott-insulating regime. Applying the Lanc-
zos method to larger clusters, has shown, however, that
a breakdown of the Luttinger sum rule is also possible
for fillings off half-filling. For sufficiently strong U , the
sum rule is violated in the whole N = L − 1-particle
sector. This would correspond to a (strongly correlated)
metallic state in the thermodynamic limit. Whether or
not a VCA calculation is consistent with the sum rule,
then depends on the set of cluster hopping parameters t′
which make the self-energy functional stationary. First
VCA calculations54 for the D = 2 Hubbard model at low
doping and using clusters with up to L = 10 sites do
predict a violation in fact.
It is by no means clear a priori what happens in a clus-
ter approach using additional bath degrees of freedom as
variational parameters, as e.g. in the cellular DMFT.21,22
The usual periodization of the self-consistent C-DMFT
self-energy, however, should be avoided when testing the
sum rule as this introduces an additional (though physi-
cally motivated) approximation. Instead, Eq. (5) must be
used with k re-interpreted as an index referring to the el-
ements of the self-consistent diagonalized lattice Green’s
function.
Employing the dynamical cluster approximation
(DCA)20 represents an alternative which directly oper-
ates in reciprocal space. From a real-space perspective,
the DCA is equivalent with the cellular DMFT but ap-
plied to a modified model H = H(t,U) → H(t,U) with
modified hopping parameters which are invariant under
superlattice translations as well as under translations on
the cluster.29,55 In the limit L → ∞ the replacement
t → t becomes irrelevant. Analogous to the C-DMFT,
the sum rule then holds within the DCA if and only if
it holds for the individual cluster at self-consistently de-
termined cluster parameters. Note, however, that this
requires that (besides the DCA self-energy) the modified
hopping t instead of the physical hopping has to be con-
sidered in the computation of the volume enclosed by
the Fermi (Luttinger) surface of the lattice model. This
is exactly what is usually done in DCA calculations.
Within this context and in view of the violations found
for finite Hubbard clusters, it is possible to understand
why a non-perturbative cluster approximation, like the
VCA,54 or a cluster extension of the DMFT, like the
DCA,33 can produce results that are inconsistent with
Luttinger’s theorem.
Acknowledgments
We thank Robert Eder and Achim Rosch for valu-
able discussions. The work is supported by the Deutsche
Forschungsgemeinschaft within the Forschergruppe FOR
APPENDIX A: MACROSCOPIC CONSERVATION
OF ENERGY, PARTICLE NUMBER AND SPIN
The one-particle Green’s function as obtained within
an approximation generated by the choice of a reference
system respects the macroscopic conservation laws which
result from symmetries of the system with respect to con-
tinuous transformation groups:
Energy conservation is apparently respected as by con-
struction the approximate SFT Green’s function depends
on a single frequency only, i.e. is invariant under time
translations.
Conservation of the total particle number and spin is
respected if the approximate G transforms in the same
way as the exact Green’s function under global U(1) and
SU(2) gauge transformations. Consider a general trans-
formation of the form
c†α → c
with unitary S such that the interaction part H1(U)
of the Hamiltonian is invariant (α refers to the states
of the one-particle basis). In a diagrammatic approach,
the invariance of H1(U) implies that the corresponding
conservation law is respected “locally” at each vertex.
Hence, for a conserving approximation in the sense of
Baym and Kadanoff, the transformation behavior of the
free Green’s function is then propagated by the diagram
rules to the full Green’s function. Consequently, the lat-
ter must transform under S in the same way as the exact
G, i.e.
Gαβ → Gαβ =
. (A2)
Consider now the case of the SFT. One has to show
that the approximate Green’s function G for the trans-
formed system with Hamiltonian H is given by G =
† if G is the approximate Green’s function of the
model H . Applying the transformation (A1) to H , one
finds H = H0(t) +H1(U) → H = H0(t) +H1(U) with
t = StS†. Again, S is assumed to leave the interaction
part invariant.
The Green’s function G of the transformed model is
(approximately) constructed via
from the free Green’s function of the transformed model
and the SFT self-energy which is the self-energy of the
reference system H ′ = H0(t
s) +H1(U) at the stationary
point t
For the transformed problem H , the stationary point
s is determined from the SFT Euler equation:
)ω;αβ
= 0 .
As an ansatz to solve the Euler equation we take
= St′1S
† (A5)
with t′1 to be determined. The transformation law (A2)
for the exact Green’s function of the reference system
= GSt′
S†,U = SGt′1,US
†. This also holds for
the free Green’s function. Using the Dyson equation of
the reference system we can deduce Σ
= SΣt′
Furthermore, for the free Green’s function of the trans-
formed original model we have G
t,0 = SGt,0S
†. Using
these results, we see that Eq. (A4) is equivalent to
t,0 −Σt′1,U
∂(Σt′
,U )ω;αβ
= 0 .
But this is just the Euler equation for the original
model which is solved by t′1 = t
s. Remembering the
ansatz made, we now have for the stationary point
s = St
†. Inserting this into Eq. (A3) gives G =
S(G−1
t,0 − Σt′s,U )
† = SGS† which is the desired re-
sult.
Note that one has to ensure that the stationary point
for the transformed problem t
s = St
† lies within the
space of one-particle parameters characteristic for the ref-
erence system. For models with local interaction part
and for local (and also global) gauge transformations,
however, this is always easily satisfied.
1 G. Baym and L. P. Kadanoff, Phys. Rev. 124, 287 (1961).
2 G. Baym, Phys. Rev. 127, 1391 (1962).
3 N. E. Bickers, D. J. Scalapino, and S. R. White, Phys. Rev.
Lett. 62, 961 (1989).
4 N. E. Bickers and S. R. White, Phys. Rev. B 43, 8044
(1991).
5 J. M. Luttinger and J. C. Ward, Phys. Rev. 118, 1417
(1960).
6 J. M. Luttinger, Phys. Rev. 119, 1153 (1960).
7 A. A. Abrikosow, L. P. Gorkov, and I. E. Dzyaloshinski,
Methods of Quantum Field Theory in Statistical Physics
(Prentice-Hall, New Jersey, 1964).
8 M. Potthoff, Euro. Phys. J. B 32, 429 (2003).
9 M. Potthoff, Euro. Phys. J. B 36, 335 (2003).
10 M. Potthoff, M. Aichhorn, and C. Dahnken, Phys. Rev.
Lett. 91, 206402 (2003).
11 M. Aichhorn, E. Arrigoni, M. Potthoff, and W. Hanke,
Phys. Rev. B 74, 024508 (2006).
12 C. Dahnken, M. Aichhorn, W. Hanke, E. Arrigoni, and
M. Potthoff, Phys. Rev. B 70, 245110 (2004).
13 K. Pozgajcic, preprint cond-mat/0407172.
14 W. Koller, D. Meyer, Y. Ono, and A. C. Hewson, Euro-
phys. Lett. 66, 559 (2004).
15 D. Sénéchal, P.-L. Lavertu, M.-A. Marois, and A.-M. S.
Tremblay, Phys. Rev. Lett. 94, 156404 (2005).
16 K. Inaba, A. Koga, S.-I. Suga, and N. Kawakami, Phys.
Rev. B 72, 085112 (2005).
17 K. Inaba, A. Koga, S.-I. Suga, and N. Kawakami, J. Phys.
Soc. Jpn. 74, 2393 (2005).
18 M. Aichhorn and E. Arrigoni, Europhys. Lett. 72, 117
(2005).
19 M. Eckstein, M. Kollar, M. Potthoff, and D. Vollhardt,
Phys. Rev. B 75, 125103 (2007).
20 M. H. Hettler, A. N. Tahvildar-Zadeh, M. Jarrell, T. Pr-
uschke, and H. R. Krishnamurthy, Phys. Rev. B 58, R7475
(1998).
21 G. Kotliar, S. Y. Savrasov, G. Pálsson, and G. Biroli, Phys.
Rev. Lett. 87, 186401 (2001).
22 A. I. Lichtenstein and M. I. Katsnelson, Phys. Rev. B 62,
R9283 (2000).
23 S. Okamoto, A. J. Millis, H. Monien, and A. Fuhrmann,
Phys. Rev. B 68, 195121 (2003).
24 W. Metzner and D. Vollhardt, Phys. Rev. Lett. 62, 324
(1989).
25 A. Georges and G. Kotliar, Phys. Rev. B 45, 6479 (1992).
26 M. Jarrell, Phys. Rev. Lett. 69, 168 (1992).
27 A. Georges, G. Kotliar, W. Krauth, and M. J. Rozenberg,
Rev. Mod. Phys. 68, 13 (1996).
28 G. Kotliar and D. Vollhardt, Physics Today 57, 53 (2004).
29 M. Potthoff and M. Balzer, Phys. Rev. B 75, 125112
(2007).
30 W. O. Putikka, M. U. Luchini, and R. R. P. Singh, Phys.
Rev. Lett. 81, 2966 (1998).
31 C. Gröber, R. Eder, and W. Hanke, Phys. Rev. B 62, 4336
(2000).
32 K. Haule, A. Rosch, J. Kroha, and P. Wölfle, Phys. Rev.
Lett. 89, 236402 (2002).
33 T. A. Maier, T. Pruschke, and M. Jarrell, Phys. Rev. B
66, 075102 (2002).
34 J. Hubbard, Proc. R. Soc. London A 276, 238 (1963).
35 M. C. Gutzwiller, Phys. Rev. Lett. 10, 159 (1963).
36 J. Kanamori, Prog. Theor. Phys. (Kyoto) 30, 275 (1963).
37 I. Dzyaloshinskii, Phys. Rev. B 68, 085113 (2003).
38 M. Potthoff, Adv. Solid State Phys. 45, 135 (2005).
39 A. L. Fetter and J. D. Walecka, Quantum Theory of Many-
Particle Systems (McGraw-Hill, New York, 1971).
40 M. Potthoff, Phys. Rev. B 64, 165114 (2001).
41 E. Müller-Hartmann, Z. Phys. B 76, 211 (1989).
42 D. S. Fisher, G. Kotliar, and G. Moeller, Phys. Rev. B 52,
17112 (1995).
43 A. B. Harris and R. V. Lange, Phys. Rev. 157, 295 (1967).
44 J. S. Langer and V. Ambegaokar, Phys. Rev. 121, 1090
(1961).
45 D. C. Langreth, Phys. Rev. 150, 516 (1966).
46 H. Q. Lin and J. E. Gubernatis, Comput. Phys. 7, 400
(1993).
47 A. C. Hewson, The Kondo Problem to Heavy Fermions
(Cambridge University Press, Cambridge, 1993).
48 K. Haule, A. Rosch, J. Kroha, and P. Wölfle, Phys. Rev.
B 68, 155119 (2003).
49 A. Rosch, preprint cond-mat/0602656.
50 T.D. Stanescu, P. Phillips and Ting-Pong Choy, Phys. Rev.
B 75, 104503 (2007).
51 J. Kokalj and P. Prelovs̆ek, Phys. Rev. B 75, 045111
(2007).
52 M. Oshikawa, Phys. Rev. Lett. 84, 3370 (2000).
53 M. Potthoff, Condens. Mat. Phys. 9, 557 (2006).
54 M. Balzer, W. Hanke, and M. Potthoff, unpublished
(2007).
55 G. Biroli, O. Parcollet, and G. Kotliar, Phys. Rev. B 69,
205108 (2004).
http://arxiv.org/abs/cond-mat/0407172
http://arxiv.org/abs/cond-mat/0602656
ABSTRACT
  Weak-coupling conserving approximations can be constructed by truncations of
the Luttinger-Ward functional and are well known as thermodynamically
consistent approaches which respect macroscopic conservation laws as well as
certain sum rules at zero temperature. These properties can also be shown for
variational approximations that are generated within the framework of the
self-energy-functional theory without a truncation of the diagram series.
Luttinger's sum rule represents an exception. We analyze the conditions under
which the sum rule holds within a non-perturbative conserving approximation.
Numerical examples are given for a simple but non-trivial dynamical two-site
approximation. The validity of the sum rule for finite Hubbard clusters and the
consequences for cluster extensions of the dynamical mean-field theory are
discussed.

<|endoftext|><|startoftext|>
2D-MIT as self-doping of aWigner-Mott insulator
S. Pankov ∗ V. Dobrosavljevic
National High Magnetic Field Laboratory, Florida State University, Tallahassee, FL 32306
Abstract
We consider an interaction-driven scenario for the two-dimensional metal-insulator transition in zero magnetic field (2D-MIT),
based on melting the Wigner crystal through vacancy-interstitial pair formation. We show that the transition from the Wigner-
Mott insulator to a heavy Fermi liquid emerges as an instability to self-doping, resembling conceptually the solid to normal liquid
transition in He3. The resulting physical picture naturally explains many puzzling features of the 2D-MIT.
Key words: Strong correlation; disorder; metal-insulator transition; Hubbard model
PACS: 71.27.+a, 72.15.Rn, 71.30.+h
A series of fascinating experiments, as first performed
some ten years ago by Kravchenko and co-workers [1], have
deeply changed our thinking about the two dimensional
electron gas (2DEG). These and many later experiments
demonstrated convincingly that the 2DEG can exhibit typ-
ical metallic behavior [2] above a well defined critical den-
sity nc. One of the most prominent features of this metallic
phase is an unprecedented resistivity drop, which is found
only in the low density regime n & nc close to the tran-
sition. These findings suggested that a well-defined metal-
insulator transition (MIT) may exist even in two dimen-
sions, in contrast to long held-beliefs based on the theories
for noninteracting disordered electrons.
These experiments are typically performed at such low
electron density where the relative strength of the Coulomb
interaction is so large (rs & 10), that the localization pro-
cesses could conceivably be suppressed by interaction ef-
fects. A diffusion-mode theory describing such interaction
renormalizations at weak disorder has been developed by
Finkelshtein and Punnoose [3], suggesting that sufficiently
strong interactions may indeed stabilize the metallic phase.
However, this theory can provide guidance only within a
narrowdiffusive regime restricted to very low temperatures.
In contrast, the most striking experimental results have
been established in a broad parameter range well outside
this regime, and are most pronounced in the cleanest sam-
ples [2]. In particular, the best established experimental sig-
nature of the transition relies on careful effective mass mea-
surements, which is found to diverges as m∗ ∝ (n− nc)
∗ Corresponding author; email: pankov@magnet.fsu.edu
1 1.05 1.1
-1.002
-0.998
-0.996
Fig. 1. Evolution of the free energy profile W [δ], as the electron
density is increased across the MIT (from top to bottom). The cusp
corresponds to the Wigner solid. The self-doped transition (blue
line) takes place before the instability emerges at half-filling (δ = 0,
orange line) and the Wigner insulating state is replaced by a heavy
Fermi liquid.
while the Lande g∗-factor remains largely unrenormalized.
Such phenomena cannot be understood within a low-energy
diffusion mode theory, which relies on Anderson localiza-
tion to produce an insulating phase.
A fundamental question is thus posed by these experi-
ments: what is the basic mechanism that drives the metal-
insulator transition in these systems? Does one have to rely
on disorder effects at all in the zero-th order approximation,
or can one understand the most important experimental
features by interaction effects alone? In this work, we con-
centrate solely on the effects of strong Coulomb interactions
in the clean limit, and try to establish which experimental
Preprint submitted to Elsevier 31 October 2018
http://arxiv.org/abs/0704.0250v1
features can be explained by entirely ignoring the disorder.
We approach the transition from the insulating side,
starting with the Wigner-Mott insulator, and examine
its melting by quantum fluctuations as density increases.
These are believed to be dominated [4] by the vacancy-
interstitial pair excitations on top of the classical triangu-
lar lattice configuration. Within the Wigner crystal, the
electrons are tightly bound, and the vacancy-interstitial
excitations can be well represented by a mapping to a two-
band Hubbard (i.e. charge-transfer) model, respectively
corresponding to the lattice and the interstitial electrons.
As the density increases, the charge-transfer gap eventu-
ally closes, and a metal-insulator transition assumes the
character of a Mott-metal-insulator transition, leading to
a strongly correlated metallic state on the metallic side.
Our model Hamiltonian reads:
fiσ + ecc
ciσ −
iσciσ + c
iσfiσ) +
fi↓ (1)
where f †, f and c†, c are creation and annihilation opera-
tors for site and interstitial electrons respectively. It is as-
sumed that the inter-cell hopping tij exists only between
interstitial orbitals, while only the site electrons are subject
to onsite Coulomb repulsion U , and the interstitial orbitals
are coupled to the site orbitals via hybridization V . The
local electrostatic potentials for the two bands are denoted
by ef and ec.
Wemodel the strong onsite repulsion by exclusion of dou-
ble ocupancy, using the standard slave-boson mean-field
formalism. The free energy per electron then reads:
W [λ, Z, µ, δ] =
ln (1 + exp (−(ε̃lk − µ)/T ))
(Z − 1) + µ (2)
where εlk are renormalized band energies, λ is the Lagrange
multiplier enforcing the slave-boson constraint, Z is the
quasiparticle weight, µ is the chemical potential. The chem-
ical potential µ is an internal parameter here, arising simi-
larly to λ, from constraining the electron density per unit
area. The self-doping δ measures deviation in the number
of electrons (per elementary cell) from the half filling. In
the classical limit of low electron density δ = 0, but it be-
comes δ 6= 0 at higher density, where quantum effects are
important. This is reminiscent of the liquid solid transition
in He3 [5]. The equations of the state are given by the sad-
dle point of the free energy W [λ, Z, µ, δ].
A peculiarity of this model is that the band energies
are not fixed, but are self-consistently determined through
their dependence on the occupation of the site and inter-
stitial orbitals. It immediately follows that the system is
unstable to the self-doping at the MIT. Indeed, by care-
fully accounting for the electrostatic energy balance due to
such charge transfer, we find that the free energy takes a
lower value (relative to the classical value at δ = 0) on one
of the branches at δ 6= 0 before the transition at half filling
is found (see Fig. 1).
With appropriately chosen model parameters we can ex-
plain the basic experimental results, which otherwise can-
not be captured by the diffusion mode theory. Similarly
to what is seen experimentally, we observe strong renor-
malization of the effective mass near the transition m∗ ∼
(n − nc)
−1. In this strongly correlated regime any extrin-
sic disorder is very effectively screened by interaction ef-
fects [7], providing a plausible scenario for the large resis-
tivity drop [8]. The magneto-resistance data are also natu-
rally explained by our two band model. The experiment [9]
has shown that in high parallel magnetic fields the trans-
port takes place by activated processes, with an activation
gap that vanishes linearly at some density n1 > nc. In the
strong field our model reduces to a trivial model of non-
interacting spinless electrons. For the density n < n1 the
system is a band insulator with the lower band completely
filled. The bands broaden as the density increases, and the
insulating gap closes linearly at some density n1 > nc.
Our static lattice model does not capture the collective
charge density fluctuations – the phonons of the Wigner
crystal. Because the Coulomb interaction is long ranged,
these collective modes are very soft, and play an important
role in renormalizing the model parameters. It is this strong
renormalization of the charge transfer gap ec − ef that
pushes the transition to such low density [6] (rs ≫ 1)), in a
fashion that is conceptually very similar to the formation of
the Coulomb gap [10]. At present, these effects are incorpo-
rated in through the choice of the electrostatic parameters
of our model. In future work, we would like to systemati-
cally incorporate these soft collective modes, the effects of
which are currently included in a semi-phenomenological
fashion. This program can be achieved by a variety of meth-
ods, including extended dynamical mean field approaches
(EDMFT), by exploiting unique properties of the Coulomb
potential, and by employing the techniques recently devel-
oped in the Coulomb glass context [10].
References
[1] S. V. Kravchenko et al., Phys. Rev. B 51, 7038 (1995).
[2] For a review, see S. V. Kravchenko and M. P. Sarachik, Rep.
Prog. Phys. 67, 1 (2004).
[3] A. Punnoose and A. M. Finkelstein, Phys. Rev. Lett. 88, 016802
(2002).
[4] B. Tanatar and D. M. Ceperley Phys. Rev. B 39, 5005 (1989).
[5] D. Vollhardt, Rev. Mod. Phys. 56, 99 (1984).
[6] Z. Lenac and M. Sunjic, Phys. Rev. B 52, 11238 (1995).
[7] D. Tanasković, et al., Phys. Rev. Lett. 91, 066603 (2003).
[8] M. C. O. Aguiar et al., Europhys. Lett. 67, 226 (2004).
[9] J. Jaroszynski et al, Phys. Rev. Lett. 92, 226403 (2004).
[10] S. Pankov and V. Dobrosavljević, Phys. Rev. Lett. 94, 046402
(2005).
	References
ABSTRACT
  We consider an interaction-driven scenario for the two-dimensional
metal-insulator transition in zero magnetic field (2D-MIT), based on melting
the Wigner crystal through vacancy-interstitial pair formation. We show that
the transition from the Wigner-Mott insulator to a heavy Fermi liquid emerges
as an instability to self-doping, resembling conceptually the solid to normal
liquid transition in He3. The resulting physical picture naturally explains
many puzzling features of the 2D-MIT.

<|endoftext|><|startoftext|>
Entanglement of Subspaces and Error Correcting Codes
Gilad Gour1, ∗ and Nolan R. Wallach2, †
1Institute for Quantum Information Science and Department of Mathematics and Statistics,
University of Calgary, 2500 University Drive NW, Calgary, Alberta, Canada T2N 1N4
2Department of Mathematics, University of California/San Diego, La Jolla, California 92093-0112
(Dated: November 4, 2018)
We introduce the notion of entanglement of subspaces as a measure that quantify the entanglement
of bipartite states in a randomly selected subspace. We discuss its properties and in particular we
show that for maximally entangled subspaces it is additive. Furthermore, we show that maximally
entangled subspaces can play an important role in the study of quantum error correction codes.
We discuss both degenerate and non-degenerate codes and show that the subspace spanned by
the logical codewords of a non-degenerate code is a k-totally (maximally) entangled subspace. As
for non-degenerate codes, we provide a mathematical definition in terms of subspaces and, as an
example, we analyze Shor’s nine qubits code in terms of 22 mutually orthogonal subspaces.
PACS numbers: 03.67.Mn, 03.67.Hk, 03.65.Ud
I. INTRODUCTION AND DEFINITIONS
Bipartite entanglement has been recognized as a cru-
cial resource for quantum information processing tasks
such as teleportation [1] and super dense coding [2]. As a
result, in the last years there has been an enormous effort
to understand and study the characterization, manipu-
lation and quantification of bipartite entanglement [3].
Yet, despite a great deal of progress that was achieved,
the theory on mixed bipartite entanglement is incom-
plete and a few central important questions such as the
additivity of the entanglement of formation [4] remained
open. Perhaps the richness and complexity of mixed bi-
partite entanglement can be found in the fact that a finite
set of measures of entanglement is insufficient to com-
pletely quantify it [5]. In this paper we shed some light
on mixed bipartite entanglement with the introduction
of a new kind of measure of entanglement which we call
entanglement of subspaces (EoS). We will see that EoS
can play an important role in the study of quantum error
correcting codes (QECC).
It has been shown recently [6, 7] that geometry of
high-dimensional vector spaces can be counterintuitive
especially when subspaces with very unique properties
are more common than one intuitively expects. That is,
roughly speaking, if a high dimensional subspace is se-
lected randomly it is quite likely to have strange proper-
ties. For example, in [7] it has been demonstrated that a
randomly chosen subspace of a bipartite quantum system
will likely contain nothing but nearly maximally entan-
gled states even if the dimension of the subspace is almost
of the same order as the dimension of the original sys-
tem. This kind of result has implications, in particular, to
super-dense coding [8] and for quantum communication
in general (see also [9] for other implications of randomly
∗Electronic address: gour@math.ucalgary.ca
†Electronic address: nwallach@ucsd.edu
selected subspaces). The quantification of the entangle-
ment of such subspaces is therefore very important and
we start with its definition.
Definition 1. Let HA and HB be finite dimensional
Hilbert spaces and let WAB be a subspace of HA ⊗HB.
The entanglement of WAB is defined as:
≡ min
ψAB∈WAB
: ‖ψAB‖ = 1
, (1)
where E
is the entropy of entanglement of ψAB.
Note that if the subspace WAB contains a product
state then E(WAB) = 0. On the other hand, if, for
example, WAB is orthogonal to a subspace spaned by
an unextendible product basis (UPB) [11, 12] then
E(WAB) > 0.
Claim: Let dA = dimHA and dB = dimHB. If
E(WAB) > 0 then
dimWAB ≤ (dA − 1)(dB − 1). (2)
This claim follows from [10] and also related to the fact
that the number of (bipartite) states in a UPB is at least
dA + dB − 1 [11]. Note that for two qubits (i.e. dA =
dB = 2) E(WAB) can be greater than zero only for one
dimensional subspaces.
We can use Eq. (1) to define another measure of en-
tanglement on bipartite mixed states.
Definition 2. Let ρ ∈ B
HA ⊗HB
be a bipartite
mixed state and let SABρ be the support subspace of ρ.
Then, the entanglement of the support of ρ is defined as
ESupport(ρ) ≡ E(SABρ ) .
It can be easily seen that this measure is not continuous
and therefore can not be considered as a proper measure
of entanglement. Nevertheless, this measure can serve as
a mathematical tool to find lower bounds for other mea-
sures of entanglement that are more difficult to calculate
http://arxiv.org/abs/0704.0251v2
mailto:gour@math.ucalgary.ca
mailto:nwallach@ucsd.edu
especially in higher dimensions. For example, the entan-
glement of the support of ρ provides a lower bound for
the entanglement of formation. It can be shown that in
lower dimensions the bound is generally not tight. For
example, for two qubits in a mixed state ρ, the entangle-
ment of the support ESupport(ρ) = 0 (see Eq. (2)). On
the other hand, in higher dimensions the bound can be
very tight [6, 7].
II. ENTANGLEMENT OF SUBSPACES
In this section we study some of the properties of EoS
with a focus on additivity properties. The EoS provides a
lower bound on the entanglement of formation and our in-
terest in its additivity properties is due to one of the most
important unresolved questions in quantum information,
namely the additivity conjecture for the entanglement
of formation. In particular, the additivity question of
EoS is identical to the additivity conjecture of quantum
channel output entropy [13] that has been shown to be
equivalent to the additivity conjecture of entanglement
of formation [4]. Thus, additivity properties of EoS can
shed some light on this topic.
A. Additivity properties of the entanglement of
subspaces
Here we consider the additivity properties of EoS. We
start by showing that if UAB and V A
′B′ are two sub-
spaces such that E(UAB) > 0 and/or E(V A′B′) > 0 then
E(UAB ⊗ V A′B′) > 0.
Consider W = Cn ⊗ Cm. Let ej, j = 1, ..., n be the
standard basis of Cn. We will also use the notation fj for
the standard basis of Cm. An element of a tensor product
of two vector spaces, A and B will be called a product if
it is of the form a⊗ b with a ∈ A and b ∈ B.
Proposition 1. Let u1, ..., ud, v1, ..., vd ∈ W be such that
if x =
i bivi is a product then x = 0. If z =
i ui ⊗ vi
is a product in (Cn ⊗ Cn)⊗ (Cm ⊗ Cm) then z = 0.
Proof. We write ui =
j=1 ej ⊗ uij and vj =
j=1 ej ⊗ vij . Assume that
iui⊗vi is a product
in (Cn⊗Cn)⊗(Cm⊗Cm). This means that there exists
z ∈ Cn⊗Cn and w ∈ Cm⊗Cm such that
i,k,l
(ek⊗el)⊗(uik⊗vil) = z⊗w.
If we write out z =
k,lzklek⊗el with zkl ∈ C then we
must have
uik⊗vil = zklw
for all k, l. We now write uik =
ikfm and w =
mfm⊗wm. The displayed formula now implies that
(k.l,m fixed)
umikvil = zklwm.
This implies that (with k and m fixed) we have
umikel⊗vil = (
zklel)⊗wm.
Hence
ikvi is a product. Our assumption implies
that it must be 0. Hence
i,k,m
umikek⊗fm⊗vi =
ui⊗vi.
As was to be proved.
Note that the proposition above states that if none of
the decompositions of a bipartite mixed state, ρ, contain
a product state, then also none of the decompositions
of ρ ⊗ σ (σ is a bipartite mixed state) contain a prod-
uct state. This property is related to the additivity con-
jecture [4] for the entanglement of formation (and other
measures) and one of the main questions that we will
consider here is wether the EoS is additive. That is, does
E(UAB ⊗ V A
′B′) = E(UAB) + E(V A
′B′) ?
Clearly, if the EoS were additive then the proposition
above would have been a trivial consequence of that.
However, we were not able to prove the additivity of
EoS (in general) although for some special cases it has
been tested numerically in [14] and no counter example
has been found. The proposition below provides a lower
bound.
Proposition 2. Let N = min{dimUAB, dimV A′B′}.
E(UAB) + E(V A
)− logN ≤ E
UAB ⊗ V A
. (3)
The equation above provides a lower bound whereas
the upper bound E
UAB ⊗ V A′B′
≤ E(UAB)+E(V A′B′)
follows directly from the definition of EoS. Thus, for N =
1 the EoS is additive. Note also that even if N is small
(e.g. N = 2), E(UAB) and E(V A′B′) can be arbitrarily
large (i.e. depending on dA and dB but not on N).
Proof. Let χ be a normalized vector in UAB⊗V A′B′ . We
can write χ in its Schmidt decomposition as follows:
i ⊗ vA
where
i pi = 1 (pi ≥ 0) and the uABi ’s (vA
i ’s) are
orthonormal. Now, from the strong subadditivity of the
von-Neumann entropy we have
S(ρA′) + S(ρB) ≤ S(ρAB) + S(ρAA′) ,
where ρA ≡ TrA′BB′χ ⊗ χ∗, ρB ≡ TrAA′B′χ ⊗ χ∗,
etc. Now, note that S(ρAA′) = E(χ) and S(ρAB) =
H({pi}) ≤ logN , where H({pi}) is the Shanon entropy.
Furthermore, note that
ρA′ =
piωi and ρB =
where
ωi ≡ TrB′vA
i ⊗ vA
and σi ≡ TrAuABi ⊗ uABi
Hence, since the von-Neumann entropy is concave we
S(ρA′) ≥
piS(ωi) =
piE(v
i ) ≥ E
and similarly S(ρB) ≥ E
. Combining all this we
≤ logN + E(χ) ,
for all χ ∈ UAB ⊗ V A′B′ . This complete the proof.
B. Maximally entangled subspaces
As we have seen above, if N = 1 then the EoS is clearly
additive. As we will see in the next subsection, it is also
additive for maximally entangled subspaces:
Definition 3. Let W be a subspace of HA ⊗ HB and
let dA = dimHA and dB = dimHB. W is said to be a
maximally entangled subspace in HA ⊗HB if
E(W ) = logm , (4)
where m ≡ min{dA, dB}.
The term maximally entangled subspace have been
used in [6, 7] for a subspace W with E(W ) slightly less
than logm. In this paper, we will call such subspaces
nearly maximally entangled to distinguish from (exactly)
maximally entangled subspaces as defined above.
In [15] it has been shown that the average entangle-
ment of a pure state φ ∈ HA ⊗ HB which is chosen
randomly according to the unitarily invariant measure
satisfies
〈E(φ)〉 ≥ log2 dA −
2 ln 2dB
where without loss of generality dA ≥ dB. Later on,
in [6, 7] this result has been extended to subspaces and
in particular it has been shown, somewhat surprisingly,
that a randomly chosen subspace of bipartite quantum
system will likely be a nearly maximally entangled sub-
space. Thus, as nearly maximally entangled subspaces
are quite common it is important to understand their
structure. As a first step in this direction, in the following
we study the structure of (exactly) maximally entangled
subspaces.
Let φ be a state in HA⊗HB. If e1, ..., em is an or-
thonormal basis of HB we may write
φi⊗ei .
We define a dB× dB Hermitian matrix B = [〈φi|φj〉] (i.e.
B is the reduced density matrix). Let λ1, ..., λdB be the
set of eigenvalues of B counting multiplicity. Then the
entanglement of φ is
E(φ) = −
λi log(λi) .
It is easy to show that E(φ) ≤ logm and equality is
attained if and only if B = 1
P with P a projection
matrix onto a d dimensional subspace of CdB . Clearly
this definition of entropy is independent of the choice of
basis and could also be given using an orthonormal basis
of HA and analyzing the corresponding dA coefficients
in HB. Under the condition of equality φ is maximally
entangled, and this in particular implies that if dA ≥ dB
〈φi|φj〉 =
δij .
Proposition 3. Assume that dA ≥ dB and set m = dB .
Let UAB be a maximally entangled subspace in HA⊗HB
of dimension d. If e1, ..., em is an orthonormal basis of
HB then there exist U1, ..., Um subspaces of HA such that
〈Ui|Uj〉 = 0 if i 6= j, dimUj = d > 0 for all j = 1, ...,m
and unitary operators Ti : C
d → Ui i = 1, ...,m such that
UAB = {
Tiw⊗ei|w ∈ Cd}.
Conversely, if U1, ..., Um are mutually orthogonal sub-
spaces of HA such that dimUj = d > 0 for all j = 1, ...,m
and we have unitary operators Ti : C
d → Ui i = 1, ...,m
such that
UAB = {
Tiw⊗ei|w ∈ Cd} ,
then UAB is maximally entangled.
Proof. Let ψ1, ..., ψd be an orthonomal basis of U
Then we can write
ψij⊗ei
with 〈ψij |ψkj〉 = 1mδik. The condition on U
AB is that
if a ∈ Cd is a unit vector then
ajψj is maximally
entangled in HA⊗HB. This implies that
ajψlj
ajψkj
δl,k .
Fix l 6= k and let p 6= q ≤ d be two integers. Let a =
(a1, ..., ad) with aj = 0 for j 6= p or j 6= q. Set ap =
b, aq = c and |b|2 + |c|2 = 1. Then we have
〈bψlp + cψlq|bψkp + cψkq〉 = 0.
On the other hand we have
〈bψlp + cψlq|bψkp + cψkq〉 = bc 〈ψlp|ψkq〉+ cb 〈ψlq|ψkp〉
Set z = bc. We look at two cases: first z = 1
(b = c =
) and second z = i√
(b = 1√
, c = i√
). Thus we have
〈ψlp|ψkq〉+ 〈ψlq|ψkp〉 = 0
for the first case and
〈ψlp|ψkq〉 − 〈ψlq|ψkp〉 = 0
for the second. Hence, 〈ψlp|ψkq〉 = 〈ψlq|ψkp〉 = 0. We set
Ul = Span{ψlp|p = 1, ..., d}. Then 〈Ul|Uk〉 = 0 if l 6= k.
We now consider what happens when l = k. We first
note that taking ap = 1 and all other entries equal to 0
we have 〈ψlp|ψlp〉 = 1m . Now using b, c as above for p 6= q
we have
〈ψlp|ψlp〉+ 〈ψlp|ψlq〉+ 〈ψlq|ψlp〉+ 〈ψkp|ψkp〉 =
〈ψlp|ψlp〉+ i 〈ψlp|ψlq〉 − i 〈ψlq|ψlp〉+ 〈ψkp|ψkp〉 =
Hence as above we find that 〈ψlp|ψlq〉 = 0 if p 6= q. Thus√
mψl1, ...,
mψld is an orthonormal basis of Ul. This
implies that the spaces U1, ..., Um have the desired prop-
erties. Let u1, ..., ud, be the standard orthonormal basis
of Cd and define Tiuj =
mψij . With this notation in
place UAB has the desired form. The converse is proved
by the obvious calculation.
Corollary 4. If UAB is a maximally entangled subspace
in HA⊗HB (dA ≥ dB), then
dimUAB ≤
Furthermore, there always exists a maximally entangled
subspace of dimension ⌊dA/dB⌋.
Proof. Assume that dimHA = dA ≥ dimHB = dB.
According to the first part of Proposition 4, if UAB is
a maximally entangled subspace of dimension d then
d × dB ≤ dA. On the other hand, if d ≤ ⌊dA/dB⌋ then
the second half of the statement implies that there is a
maximally entangled subspace of dimension d.
In the following we find necessary and sufficient con-
ditions for a subspace to be maximally entangled. In
section III we use this to show that maximally entan-
gled subspaces can play an important role in the study
of error correcting codes. As above we consider the space
HA⊗HB with dimHA = dA ≥ dimHB = dB and a max-
imally entangled subspace UAB ⊂ HA⊗HB. We will also
consider End(HB) to be a Hilbert space with inner prod-
uct 〈X |Y 〉 = Tr(X†Y ) for any two operators X and Y in
End(HB).
Proposition 5. Let UAB ⊂ HA⊗HB be a subspace and
dA ≥ dB. Then, UAB is maximally entangled if and only
if the map End(HB)⊗UAB → HA⊗HB given by X⊗u 7→√
dB(I ⊗X)u is an isometry onto its image.
Proof. Let d = dimUAB and let the notation be as in
Proposition 3. Thus, if UAB is maximally entangled and
if e1, ..., edB is an orthonormal basis of HB then an ele-
ment of UAB is of the form
T (w) =
Ti(w) ⊗ ei ,
with Ti a unitary operator from C
d onto a subspace Ui
of HA and Ui and Uj are orthogonal for all i 6= j. We
now calculate
〈(I⊗X)T (w)|(I⊗Y )T (z)〉 =
〈Ti(w)|Tj(z)〉 〈Xei|Y ej〉 .
Now, since 〈Ti(w)|Tj(z)〉 = δij〈w|z〉 (see Proposition 3)
we have:
〈(I ⊗X)T (w)|(I ⊗ Y )T (z)〉 =
δij〈w|z〉 〈Xei|Y ej〉
= 〈w|z〉Tr(X†Y ) .
That is, we proved that if UAB is maximally entangled
then the map is an isometry. For the converse we note
that we have an isometry of Cd onto UAB given by
T (w) =
Ti(w) ⊗ ei .
Now, if the map defined in the proposition is an isometry
〈(I ⊗X)T (w)|(I ⊗ Y )T (z)〉 = 〈w|z〉Tr(X†Y ) .
That is,
〈Ti(w)|Tj(z)〉 〈Xei|Y ej〉 = 〈w|z〉Tr(X†Y ) ,
for all X,Y ∈ End(HB). Hence, we must have
〈Ti(w)|Tj(z)〉 = δij〈w|z〉/ and from Proposition 3 the
subspace UAB is maximally entangled.
C. Additivity of maximally entangled subspaces
We now discuss the additivity properties of maximally
entangled subspaces.
Proposition 6. Let UAB ⊂ HA⊗HB and V A′B′ ⊂
HA′⊗HB′ be maximally entangled subspaces. Then,
UAB ⊗ V A
= logm+ logm′ , (5)
where m ≡ min{dA, dB} and m′ ≡ min{dA′ , dB′}.
Remark. From the above proposition it follows that
if dA ≥ dB and dA′ ≥ dB′ or dB ≥ dA and
dB′ ≥ dA′ then UAB⊗V A
′B′ is maximally entangled
in (HA⊗HA′)⊗(HB⊗HB′). However, if for example
dA > dB and dA′ < dB′ then U
AB⊗V A′B′ is NOT
maximally entangled in (HA⊗HA′)⊗(HB⊗HB′) because
mm′ < min{dAdA′ , dBdB′}.
Proof. There are basically two possibilities (up to inter-
changing factors): the first is dA ≥ dB and dA′ ≥ dB′ ,
and the second is dA ≥ dB and dA′ < dB′ .
In the first case we have as in the statement of
proposition 3 the subspaces Uj and the unitaries Tj :
d → Uj such that UAB = {
i Tiw⊗ei|w ∈ Cd}. We
also have the orthonormal basis fi of HB
, the sub-
spaces Vj and the unitaries Sj : C
d′ → Vj such that
′B′ = {
i Siw
′⊗fi|w′ ∈ Cd
′}. Thus, as a subspace of
(HA⊗HA′)⊗(HB⊗HB′), UAB⊗V A′B′ is spanned by the
elements
(Tiw⊗Sjw′)⊗(ei⊗fj).
Thus if we identify Cd⊗Cd′ with Cdd′ then the converse
assertion in proposition 3 implies that UAB⊗V A′B′ is a
maximally entangled space. This implies that
UAB⊗V A
= log dB + log dB′ = logm+ logm
We now consider the second case. For UAB we have ex-
actly as above UAB = {
i Tiw⊗ei|w ∈ Cd}. For V A
we denote by fj an orthonormal basis of HA
(not of
HB′ as above). Thus, according to proposition 3 we have
′B′ = {
i fi ⊗ Siw′|w′ ∈ Cd
′}. As a subspace of
(HA⊗HA′)⊗(HB⊗HB′), UAB⊗V A′B′ is spanned by the
elements
(Tiw⊗fj)⊗(ei⊗Sjw′).
We will assume first that d′ ≤ d. Let w′1, ..., w′d′ be
an orthonormal basis of Cd
. Thus, if φ is a state in
UAB⊗V A′B′ we can write it as
i,j,k
(Tiwk⊗fj)⊗(ei⊗Sjw′k) ,
where wk are some non-normalized vectors in C
d. Fur-
thermore,
〈φ|φ〉 = dBdA′
‖uk‖2 .
Hence, if φ is normalized then
‖uk‖2 =
dBdA′
Now, since Sjw
k is an orthonormal set of vectors for all
j and k (see proposition 3), the entanglement of φ as an
element of (HA⊗HA′)⊗(HB⊗HB′) is given by
dBdA′S(B)
where B = [〈wi|wj〉]1≤i,j≤d′ , and if λ1, ..., λd′ are the
eigenvalues of B then the von-Neumann entropy of B is
S(B) = −
λi logλi .
Now B is the most general d′ × d′ self adjoint, positive
semidefinate matrix with trace 1/dBdA′ . The minimum
of the entropy for such matrices is
log(dadB)
This proves the proposition for the case d′ ≤ d. If d < d′
then we can prove the proposition by using the same
argument, this time with wk an orthonormal basis of C
and with B′ =
w′i|w′j
1≤i,j≤d. This completes the
proof.
The above proposition also shows that the entangle-
ment of formation is additive for bipartite states with
maximally entangled support. If ρ is a mixed state in
HA⊗HB then the entanglement of formation is defined
in terms of the convex roof extension:
EF(ρ) = min
piE(φi)
where the minimum taken over all decompositions
piφi⊗φ∗i
with φi a pure bipartite state and pi > 0 and
pi = 1.
Corollary 7. Let ρ and σ be mixed states in B(HA⊗HB)
and B(HA′⊗HB′), respectively. If the support subspaces
Sρ and Sσ are maximally entangled then
EF(ρ⊗ σ) = EF(ρ) + EF(σ) .
The proof of this corollary follows directly from the
fact that for states with maximally entangled support
EF(ρ) = E(Sρ). Note that the class of mixed states with
maximally entangled support is extremely small (i.e. of
measure zero). In particular, it is a much smaller class
than the one found by Vidal, Dur and Cirac [16] .
III. ERROR CORRECTING CODES
A. Definitions
We consider error correcting codes that are used to en-
code l qubits in n ≥ l qubits in such a way that they can
correct errors on any subset of k or fewer qubits. These
codes, which we call (n, l, k) error correcting codes, can
be classified into two classes (for example see [17]): de-
generate and non-degenerate (or orthogonal) codes. We
start with a general definition of error correcting codes
that is equivalent to the definition given (for example)
in [17] , but here we define the codes in terms of sub-
spaces.
Definition 4. Let X ∈ End(⊗kC2) and 0 ≤ i0 < i1 <
... < ik−1 ≤ n − 1. The operator Xi0i1···ik−1 on ⊗nC2,
that represents the errors on the k qubits i1, ..., ik−1,
is defined by Xi0...ik−1v = σ(X ⊗ I)σ−1v, where (a)
σ ∈ Sn (acting on {0, 1, ..., n − 1} by permutations) is
defined such that σ(j) = ij , (b) σ can act on ⊗nC2 by
σ(v0 ⊗ v1 ⊗ · · · ⊗ vn−1) = vσ(0) ⊗ vσ(1) ⊗ · · · ⊗ vσ(n−1)
and (c) ⊗nC2 is viewed as (⊗kC2)⊗ (⊗n−kC2) (putting
together the k tensor factors that correspond to the k
qubits i1, ..., ik−1 and the rest n− k tensor factors). An
(n, l, k) error correcting code is defined from its following
ingredients:
I. An isometry T : ⊗lC2 → ⊗nC2.
II. Let V0 = T (⊗lC2). There are V1, ..., Vd mutually
orthogonal subspaces of ⊗nC2 that are also orthogonal
to V0.
III. For each Vj there is a unitary isomorphism, Uj, of
Vj onto V0 with U0 = I.
IV. Xi0i1···ik−1V0 ⊂ ⊕dj=0Vj .
V. Let Pj be the ortogonal projection of ⊗nC2 onto Vj
then if v ∈ V0 is a unit vector and Pj(Xi0i1···ik−1v) 6= 0
Pj(Xi0i1···ik−1v)
∥Pj(Xi0i1···ik−1v)
equals v up to a phase.
In the next subsection we study Shor’s (9, 1, 1) error
correcting code and show that it satisfies this definition.
However, before that, let us introduce the notion of k-
totally entangled subspaces which will play an important
role in our discussion of QECC.
Definition 5. Let H be the space of n qubits, ⊗nC2.
Corresponding to any choice of k qubits (tensor factors)
we can consider H = HA ⊗HB with HA = ⊗n−kC2 and
HB = ⊗kC2. For k ≤ n/2 we will say that a subspace,
V , of H is k-totally entangled if it is maximally entangled
relative to every decomposition of H as above.
It is interesting to note that all the subspaces spanned
by the logical codewords of the different non-degenerate
error correcting codes given in [18, 19, 20] are 2-totally
entangled subspaces. On the other hand, the subspaces
spanned by the logical codewords of degenerate codes,
like Shor’s 9 qubits code, are in general only partially
maximally entangled subspaces (i.e. maximally entan-
gled for some choices of k qubits but not for all choices).
In the following subsections we will see the reason for
that.
B. Analysis of Shor’s 9 qubits code
We start with the following notations. We set u± =
(|000〉 ± |111〉) so that the two logical codewords in
Shor’s 9 qubit code are v+ = u+ ⊗ u+ ⊗ u+ and v− =
u−⊗u−⊗u−. The subspace spanned by these codewords
is denoted by V0 = Cv+ ⊕ Cv−. We also denote u0± =
(|100〉 ± |011〉)/
2, u1± = (|010〉 ± |101〉)/
2 and u2± =
(|001〉 ± |110〉)/
Using these notations, we define 21 mutually orthogo-
nal 2 dimensional subspaces orthogonal to V0:
V1 = Cu− ⊗ u+ ⊗ u+ ⊕ Cu+ ⊗ u− ⊗ u−,
V2 = Cu+ ⊗ u− ⊗ u+ ⊕ Cu− ⊗ u+ ⊗ u−,
V3 = Cu+ ⊗ u+ ⊗ u− ⊕ Cu− ⊗ u− ⊗ u+,
V4+i = Cu
+⊗u+⊗u+⊕Cui−⊗u−⊗u−, for i = 0, 1, 2,
V7+i = Cu+⊗ui+⊗u+⊕Cu−⊗ui−⊗u−, for i = 0, 1, 2,
V10+i = Cu+⊗u+⊗ui+⊕Cu−⊗u−⊗ui−, for i = 0, 1, 2,
V13+i = Cu
−⊗u+⊗u+⊕Cui+⊗u−⊗u−, for i = 0, 1, 2,
V16+i = Cu+⊗ui−⊗u+⊕Cu−⊗ui+⊗u−, for i = 0, 1, 2,
V19+i = Cu+⊗u+⊗ui−⊕Cu−⊗u−⊗ui+, for i = 0, 1, 2.
If X ∈ End(C2) (linear maps of C2 to C2) then we
denote by Xi the linear map of ⊗9C2 to itself that is the
tensor product of the identity of C2 in every tensor factor
but the i-th and is X in the i-th factor thus
X0 = X ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I,
X1 = I ⊗X ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I
Then we have (here ⌊x⌋ = max{m|m ≤ x,m ∈ Z})
XiV0 ⊂ V0 ⊕ V⌊i/3⌋+1 ⊕ Vi+4 ⊕ Vi+13, 0 ≤ i ≤ 8.
We choose an observable R with
R|Vi = λiI, 0 ≤ i ≤ 21
R|W = µI,
where W is the orthogonal complement of ⊕21i=0Vi and
λi 6= λj for i 6= j and λi 6= µ for any i. We define a
unitary operator Uj : Vj → V0 as follows: we denote the
Pauli matrices by
, A2 =
, A3 =
and then define U0 = I, Ui = (A1)3i−1, for i = 1, 2, 3,
Ui = (A2)i−4, for 4 ≤ i ≤ 12 and Ui = (A3)i−13, for
13 ≤ i ≤ 21. This gives an one qubit error correcting
code since if v ∈ V0 is a state and if we have an error in
the i-th position then we will have
Xiv ∈ V0 ⊕ V⌊i/3⌋+1 ⊕ Vi+4 ⊕ Vi+13 .
Thus, if we measure the observable R on Xiv then the
measurement will yield one of λj with j = 0, ⌊i/3⌋+1, i+4
or i + 13 and Xiv will have collapsed up to a phase to
Ujv; hence applying Uj will fix the error.
Remark. Note that the subspace V0 is not 2-totally en-
tangled subspace. Nevertheless, V0 has very special
properties. In particular, if we group the 9 qubits as
(1, 2, 3) : (4, 5, 6) : (7, 8, 9), then for any choice of 2 qubits
that are not from the same group, the subspace V0 is
maximally entangled with respect to the decomposition
between the 2 qubits and the rest 7 qubits. If the 2 qubits
are chosen from the same group then the entanglement
of V0 with respect to this decomposition is 1ebit. Thus,
out of the 36 different decompositions, with respect to 27
of them E(V0) = 2ebits and with respect to the other 9
decompositions E(V0) = 1ebit.
C. Orthogonal codes
We now consider a somewhat more intuitive class of
codes known as non-degenerate codes which we also name
as orthogonal codes.
Definition 6. Let A0 = I, A1, A2, A3 the Pauli basis and
define A
j0j1···jk−1
i0i1···ik−1 to be
(Aj0 ⊗Aj1 ⊗ · · · ⊗Ajk−1)i0···ik−1 ,
where 0 ≤ jr ≤ 3 and 0 ≤ i0 < i1 < ... < ik−1 ≤
n − 1. Let Σ be the set of distinct operators of the
form A
j0j1···jk−1
i0i1···ik−1 . Then an orthogonal (n, l, k) code is
an (n, l, k) error correcting code such that if we label
Σ as the set of d + 1 operators S0 = I, S1, ..., Sd then
Vj = SjV0.
Note that Σ has
d+ 1 =
elements. Thus, a necessary condition that there exist
an (n, l, k) code is the quantum Hamming bound [17]:
≤ 2n−l.
Proposition 8. A 2l dimensional subspace V of ⊗nC2
is the V0 of an (n, l, k)-orthoganal error correcting code
if and only if V is 2k-totally entangled.
Proof. Let V be a 2k-totally entangled subspace in H =
⊗nC2, and let X : ⊗kC2 → ⊗kC2 be a linear map on
k qubits. As above, for any i0 < i1 < ... < ik−1 (1 ≤
il ≤ n) we denote by Xi0i1...ik−1 the operation X on H,
when acting on the k qubits i0, i1, ...., ik−1 (the rest of the
n − k qubits are left ”untouched”). Let also Z ≡ {X ∈
|TrX = 0} and for any i0 < ... < ik−1 let
Ui0...ik−1 ≡ {Xi0...ik−1V |X ∈ Z}. We define the subspace
W = V +
i0<...<ik−1
Ui0...ik−1 .
That is, W consists of all the possible states after an
error on k or less qubits has been occurred. Now, let
A0 = I, A1, A2, A3 be an orthonormal basis of End(C
with Ai invertible (e.g. the Pauli basis of 2×2 matrices).
As in Definition 6, we denote by A
j0...jk−1
i0...ik−1
the operator
Xi0...ik−1 that corresponds to X = Aj0 ⊗· · ·⊗Ajk−1 , and
the set of all such operators we denote by
Σ ≡ {Aj0...jk−1i0...ik−1 |1 ≤ il ≤ n, 0 ≤ jl ≤ 3} .
Now, let v1, ..., vd be an orthonormal basis of V and define
B ≡ {Svi
S ∈ Σ, 1 ≤ i ≤ d}.
We now argue that B is an orthonormal basis of W .
Clearly, the vectors in B span W . It is therefore enough
to show that the vectors in B are orthogonal. Let Svi
and S′vj be two vectors in B with S = A
j0...jk−1
i0...ik−1
S′ = A
...j′
...i′
. We denote byHB the Hilbert space of the
qubits i0, ..., ik−1 and i
0, ..., i
k−1, and by HA the Hilbert
space of the rest of the qubits. Note that HB consists
of at most 2k qubits. Now, since V is 2k-totally entan-
gled subspace, it is maximally entangled relative to the
decomposition H = HA ⊗HB. Thus, from Proposition 5
we clearly have
〈Svi|S′vj〉 = 〈vi|vj〉Tr
S̃†S̃′
= δijδSS′ ,
where S ≡ IA ⊗ S̃ and S′ ≡ IA ⊗ S̃′; that is, S̃ and S̃′
are the projections of S and S′ onto HB, respectively.
Hence, B is an orthonormal basis of W .
Since B is an orthonormal basis we can construct an
observable (i.e. Hermition operator) R such that for all
v ∈ V R(Sv) = λSSv with all of the λS distinct. We
also define R to be zero on the orthogonal complement
to W in H. Now, suppose that an element v has been
changed by a k-qubit transformation yielding Xi0...ik−1v.
We do a measurment of R and since the image is in W
the outcome is λS for some S. After the measurment, the
quantum state is Sv and so we recover v by applying S−1
(actually S if we used the Pauli basis). The converse fol-
lows from the same lines in the opposite direction. This
completes the proof.
Note that Corollary 4 together with the proposi-
tion above is consistent with the quantum Singleton
bound [22], n ≥ 4k + l, which also follows trivially from
the quantum Hamming bound for the case of orthogonal
codes that we considered in this subsection.
IV. SUMMERY AND CONCLUSIONS
We introduced the notion of entanglement of subspaces
as a measure that quantify the entanglement of bipartite
states in a randomly selected subspace. We discussed its
properties and suggested that it is additive. We were not
able to prove this conjecture (which is equivalent to the
additivity conjecture of the entanglement of formation)
although some numerical tests [14] supports that and for
maximally entangled subspaces we proved that it is ad-
ditive. We then extended the definition of maximally
entangled subspaces into k-totally entangled subspaces
and showed that the later can play an important role in
the study of quantum error correction codes.
We considered both degenerate and non-degenerate
codes and showed that the subspace spanned by the log-
ical codewords of a non-degenerate code is a k-totally
(maximally) entangled subspace. This observation, fol-
lowed by an analysis of the degenerate Shor’s nine qubits
code in terms of 22 mutually orthogonal subspaces, mo-
tivated us to define a general (possible degenerate) error
correcting code in terms of subspaces. We believe that
further investigation in this direction would lead to a
better understanding of degenerate quantum error cor-
recting codes.
Acknowledgments:— The authors would like to thank
Aram Harrow and David Meyer for fruitful discussions.
[1] C. H. Bennett, G. Brassard, C. Crepeau, R. Jozsa, A.
Peres and W. K. Wootters, Phys. Rev. Lett. 70, 1895
(1993).
[2] C. H. Bennett and S.J. Wiesner, Phys. Rev. Lett. 69,
2881 (1992).
[3] M. B. Plenio, S. Virmani, Quant. Inf. Comp. 7, 1 (2007).
[4] P. W. Shor, Commun. Math. Phys. 246, 453 (2004).
[5] G. Gour, Phys. Rev. A 72, 022323 (2005).
[6] P. Hayden, quant-ph/0409157
[7] P. Hayden, D. W. Leung and A. Winter, Comm. Math.
Phys. 265, 95 (2006).
[8] A. Abeyesinghe, P. Hayden, G. Smith and Andreas Win-
ter, IEEE Trans. Inform. Theory 52, 3635 (2006).
[9] P. Shor, Lecture Notes 2002, http://www.msri.org/ pub-
lications/ln/msri/2002/quantumcrypto/shor/1/
[10] N. R. Wallach, ”An Unentangled Gleason’s theorem”,
Quantum computation and information (Washington,
DC, 2000), 291–298,Contemp. Math. 305(2002).
[11] C. H. Bennett, D. P. DiVincenzo, T. Mor, P. W. Shor, J.
A. Smolin and B. M. Terhal, Phys. Rev. Lett. 82, 5385
(1999).
[12] D. P. DiVincenzo, T. Mor, P. W. Shor, J. A. Smolin and
B. M. Terhal, Comm. Math. Phys. 238, 379 (2003).
[13] A. Roy and G. Gour, in preparation.
[14] A. W. Harrow, private communication.
[15] D. N. Page, Phys. Rev. Lett. 71, 1291 (1993).
[16] G. Vidal, W. Dur and J. I. Cirac, Phys. Rev. Lett. 89,
027901 (2002).
[17] M. A. Nielsen and I. L. Chuang, ”Quantum Computa-
tion and Quantum Information” (Cambridge University
Press, 2000).
[18] A. M. Steane, Phys. Rev. Lett. 77, 793 (1996); Proc. R.
Soc. London A, 452, 2551 (1996).
[19] C. H. Bennett, D. P. DiVincenzo, J. A. Smolin and W. K.
Wootters, Phys. Rev. A 54, 3824 (1996).
[20] R. Laflamme, C. Miquel, J. P. Paz and W. H. Zurek,
Phys. Rev. Lett. 77, 198 (1996).
[21] P. Shor, Phys. Rev. A 52, 2493 (1995).
[22] E. Knill and R. Laflamme, Phys. Rev. A 55, 900 (1997).
http://arxiv.org/abs/quant-ph/0409157
http://www.msri.org/
ABSTRACT
  We introduce the notion of entanglement of subspaces as a measure that
quantify the entanglement of bipartite states in a randomly selected subspace.
We discuss its properties and in particular we show that for maximally
entangled subspaces it is additive. Furthermore, we show that maximally
entangled subspaces can play an important role in the study of quantum error
correction codes. We discuss both degenerate and non-degenerate codes and show
that the subspace spanned by the logical codewords of a non-degenerate code is
a 2k-totally (maximally) entangled subspace. As for non-degenerate codes, we
provide a mathematical definition in terms of subspaces and, as an example, we
analyze Shor's nine qubits code in terms of 22 mutually orthogonal subspaces.

<|endoftext|><|startoftext|>
Does the present data on Bs − B̄s mixing rule out a large
enhan
ement in the bran
hing ratio of Bs → µ
Ashutosh Kumar Alok and S. Uma Sankar
Indian Institute of Te
hnology, Bombay, Mumbai-400076, India
In this letter, we 
onsider the 
onstraints imposed by the re
ent measurement of
Bs− B̄s mixing on the new physi
s 
ontribution to the rare de
ay Bs → µ
+µ−. New
physi
s in the form ve
tor and axial-ve
tor 
ouplings is already severely 
onstrained
by the data on B → (K,K∗)µ+µ−. Here, we show that Bs−B̄s mixing data, together
with the data on K0−K̄0 mixing and KL → µ
+µ− de
ay rate, strongly 
onstrain the
s
alar-pseudos
alar 
ontribution to Bs → µ
+µ−. We 
on
lude that new physi
s 
an
at best lead to a fa
tor of 2 in
rease in the bran
hing ratio of Bs → µ
+µ− 
ompared
to its Standard Model expe
tation.
The �avour 
hanging neutral intera
tion (FCNI) b → sµ+µ− serves as an important probe
to test the Standard Model (SM) and its possible extensions. This four fermion intera
tion
gives rise to semi-leptoni
 de
ays B → (K,K∗)µ+µ− and also the purely leptoni
 de
ay Bs →
µ+µ−. The semi-leptoni
 de
ays B → (K,K∗)µ+µ− have been observed experimentally
[1, 2, 3℄ with bran
hing ratios 
lose to their SM predi
tions [4, 5, 6℄. At present there is only
an upper limit, 1.0×10−7 at 95% C.L., on the bran
hing ratio of the de
ay Bs → µ
+µ− [7, 8℄.
The SM predi
tion for this bran
hing ratio is (3.2±1.5)×10−9 [9℄ or ≤ 7.7×10−9 at 3σ level.
Bs → µ
+µ− will be one of the important rare B de
ays to be studied by the experiments
at the up
oming Large Hadron Collider (LHC). We expe
t that the present upper limit will
be redu
ed signi�
antly in these experiments. A non-zero value of this bran
hing ratio is
measurable, if it is ≥ 10−8 [10℄.
In a previous publi
ation [11℄, we studied the 
onstraints on new physi
s 
ontribution to
the bran
hing ratio of Bs → µ
+µ− 
oming from the experimentally measured values of the
bran
hing ratios of B → (K,K∗)µ+µ−. We found that if the new physi
s intera
tions are in
the form of ve
tor/axial-ve
tor operators, then the present data on B(B → (K,K∗)µ+µ−)
does not allow a large boost in B(Bs → µ
+µ−). By large boost we mean an enhan
ement
of at least an order of magnitude in 
omparison to the SM predi
tion. However, if the new
http://arxiv.org/abs/0704.0252v1
physi
s intera
tions are in the form of the s
alar/pseudos
alar operators, then the presently
measured rates of B → (K,K∗)µ+µ− do not put any useful 
onstraints on Bs → µ
+µ− and
BNP (Bs → µ
+µ−) 
an be as high as the present experimental upper limit. Therefore we are
led to the 
on
lusion that if future experiments measure Bs → µ
+µ− with a bran
hing ratio
greater than 10−8, then the new physi
s giving rise to this de
ay has to be in the form of
s
alar/pseudos
alar intera
tion.
Re
ently Bs− B̄s mixing has been observed experimentally [13℄, with a very small exper-
imental error. In this paper, we want to see what 
onstraint this measurement imposes on
the new physi
s 
ontribution to the bran
hing ratio of Bs → µ
+µ−. In parti
ular, we 
on-
sider the question: Does it allow new physi
s in the form of s
alar/pseudos
alar intera
tion
to give a large boost in BNP (Bs → µ
+µ−) ?
We start by 
onsidering the Bs → µ
+µ− de
ay. The e�e
tive new physi
s lagrangian for
the quark level transition b̄ → s̄µ+µ− due to s
alar/pseudos
alar intera
tions 
an arise from
tree and/or ele
troweak penguin and/or box diagrams. We parametrize it as
b̄→s̄µ+µ− = G1 b̄(g
S + g
P γ5)s µ̄(g
S + g
P γ5)µ, (1)
where G1 is a dimensional fa
tor 
hara
terizing the overall s
ale of new physi
s, with di-
mension (mass)−2. This fa
tor essentially arises due to the s
alar propagator in tree or
ele
troweak penguin diagrams (or s
alar propagators in box diagrams) whi
h 
ouples the
quark bilinear to the lepton bilinear. gsbS,P and g
S,P are dimensionless numbers, 
hara
-
terizing, respe
tively, b − s and µ − µ 
ouplings due to new physi
s s
alar/pseudos
alar
intera
tions. Ele
tromagneti
 penguins ne
essarily have ve
tor 
ouplings in the lepton bi-
linear so they do not 
ontribute to the e�e
tive lagrangian in eq. (1). The amplitude for the
de
ay Bs → l
+l− is given by
M(Bs → µ
+µ−) = G1 g
∣b̄γ5s
∣Bs〉 [g
S ū(pµ)v(pµ̄) + g
P ū(pµ)γ5v(pµ)] . (2)
The pseudos
alar matrix element is,
∣b̄γ5s
∣Bs〉 = −i
mb +ms
, (3)
where mb and ms are the masses of bottom and strange quark respe
tively.
The 
al
ulation of the de
ay rate gives
ΓNP (Bs → µ
+µ−) = (gsbP )
2 + (g
f 2BsM
(mb +ms)2
. (4)
We see that the de
ay rate depends upon the new physi
s 
ouplings (gsbP )
and G2
2]. To obtain information on these parameters, we look at Bs − B̄s mixing together
with KL → µ
+µ− de
ay and K0 − K̄0 mixing.
Let us 
onsider Bs − B̄s mixing to obtain a 
onstraint on (g
. Repla
ing leptoni
bilinear by quark bilinear in eq. 1, we get ∆B = 2 Lagrangian,
LSPBs−B̄s = G2 b̄(g
S + g
P γ5)s b̄(g
S + g
P γ5)s, (5)
where G2 is another dimensional fa
tor. As in the 
ase of G1, introdu
ed in eq. (1), G2 also
arises due to the s
alar propagator (or progators in the 
ase of box diagrams). Therefore it
also has dimension (mass)−2 and is of the same order of magnitude as G1. From eq. (5),
we 
al
ulate the mass di�eren
e of the Bs mesons to be
∆mBs =
G2 (g
2B̂Bs
f 2BsM
(mb +ms)2
. (6)
Thus the e�e
tive b− s pseudos
alar 
oupling is obtained to be
(gsbP )
∆mBs(mb +ms)
2B̂Bsf
M3BsG2
. (7)
We now 
onsider the de
ay KL → µ
+µ−. The same new physi
s leading to the e�e
tive
b̄ → s̄µ+µ− lagrangian in eq. (1), also leads a similar e�e
tive lagrangian for s̄ → d̄µ+µ−
transition. The only di�eren
e will be the e�e
tive s
alar/pseudos
alar 
ouplings in the
quark bilinear. Thus we have,
s̄→d̄µ+µ− = G1 s̄(g
S + g
P γ5)d µ̄(g
S + g
P γ5)µ. (8)
The 
al
ulation of de
ay rate gives
ΓNP (KL → µ
+µ−) = 2(gsdP )
2 + (g
f 2KM
(md +ms)2
. (9)
Here extra fa
tor of 2 o

urs be
ause the amplitudes A(K0 → µ+µ−) = A(K̄0 → µ+µ−)
and KL =
K0+K̄0√
. We see that G2
2 + (g
2] 
an be 
al
ulated from Γ(KL → µ
+µ−),
on
e we know the value of (gsdP )
. In order to determine the value of (gsdP )
, we 
onsider
K0 − K̄0 mixing. The e�e
tive s
alar/pseudos
alar new physi
s lagrangian for this pro
ess

an be obtained from that of s̄ → d̄µ+µ− by repla
ing lepton 
urrent by 
orresponding quark

urrent or equaivalently from e�e
tive lagrangian of eq. (5) where b − s quark bilinear is
repla
ed by s− d quark bilinear,
K0−K̄0
= G2 s̄(g
S + g
P γ5)d s̄(g
S + g
P γ5)d. (10)
From this lagrangian, we obtain the KL −KS mass di�eren
e to be
∆mK =
G2 (g
f 2KM
(ms +md)2
. (11)
Thus the e�e
tive s− d pseudos
alar 
oupling is
(gsdP )
2∆mK(md +ms)
B̂Kf 2
M3KG2
. (12)
Substituting the above value of (gsdP )
in eq. (9), we get
2 + (g
2πG2B̂K
M2K∆mK
ΓNP (KL → µ
+µ−). (13)
Substituting the value of G2
2 + (g
2] from eq. (13) and (gsbP )
from eq. (7) in eq. (4),
we get
ΓNP (Bs → µ
+µ−) =
)2 (∆mBs
ΓNP (KL → µ
+µ−). (14)
The bran
hing ratio is given by,
BNP (Bs → µ
+µ−) =
)2 (∆mBs
τ(Bs)
τ(KL)
BNP (KL → µ
+µ−). (15)
We wish to obtain the largest possible value for B(Bs → µ
+µ−). To this end, we make the
liberal assumption that the experimental values of ∆mBs , ∆mK and BNP (KL → µ
+µ−) are
saturated by new physi
s 
ouplings. The de
ay rate for KL → µ
+µ− 
onsists of both long
distan
e and short distan
e 
ontributions. The new physi
s we 
onsider here, 
ontributes
only to the short distan
e part of the de
ay rate. In ref [14℄, an upper limit on the short
distan
e 
ontribution to B(KL → µ
+µ−) is 
al
ulated to be 2.5× 10−9. The mass di�eren
e
of the Bs mesons is re
enly measured by the CDF 
ollaboration to be ∆mBs = (1.17±0.01)×
10−11GeV [13℄. The bag parameters for theK and the Bs mesons are B̂K = (0.58±0.04) and
B̂Bs = (1.30±0.10) [15℄. The values of the other parameters of eq. (15) are taken from Review
of Parti
le Properties [16℄: ∆mK = (3.48±0.01)×10
−15GeV τ(Bs) = (1.47±0.06)×10
−12 Sec
and τ(KL) = (5.11± 0.02)× 10
−8 Sec. Substituting these values in eq. (15), we get
BNP (Bs → µ
+µ−) = (6.30± 0.75)× 10−9, (16)
where all the errors are added in quadrature. At 3σ, BSM(Bs → µ
+µ−) < 7.7× 10−9 where
as BNP (Bs → µ
+µ−) < 8.55× 10−9. Thus we see that this upper bound is almost the same
as the SM predi
tion even if we maximize the new physi
s 
ouplings by assuming that they
saturate the experimental values. Therefore the present data on Bs − B̄s mixing together
with data on K0 − K̄0 mixing and KL → µ
+µ− de
ay puts a strong 
onstraint on new
physi
s s
alar/pseudos
alar 
ouplings and doesn't allow a large boost in the bran
hing ratio
of Bs → µ
We now assume that the new physi
s involving s
alar/pseudos
alar 
ouplings a

ounts
for the di�eren
e between the experimental values and the SM predi
tions of ∆mK , ∆mBs
and the short distan
e 
ontribution to Γ(KL → µ
+µ−). The SM value for Bs − B̄s is given
by [17, 18℄,
(∆mBs)SM =
ηBMBs
B̂Bsf
M2WS(xt) |Vts|
= (1.16± 0.32)× 10−11GeV, (17)
with fBs
B̂Bs = (262±35)MeV [15℄ , ηB = 0.55±0.01[18℄ and |Vts| = 0.0409±0.0009 [16℄.
S(xt) with xt = m
W is one of the Inami-Lim fun
tions [19℄. The SM value for K
0 − K̄0
mixing is given by [20℄,
(∆mK)SM =
λ∗2c η1S(xc) + λ
t η2S(xt) + 2λ
η3S(xc, xt)
, (18)
where λj = V
jsVjd, xj = m
W . The fun
tions S are given by [21, 22℄,
S(xt) = 2.46
170GeV
, S(xc) = xc. (19)
S(xc, xt) = xc
4(1− xt)
3x2t ln xt
4(1− xt)2
. (20)
Using η1 = (1.32 ± 0.32) [23℄, η2 = (0.57 ± 0.01) [18℄, η3 = (0.47 ± 0.05) [24, 25℄, B̂K =
(0.58±0.04) [15℄ ; fK = (159.8±1.5)MeV , |Vcs| = 0.957±0.017±0.093, |Vcd| = 0.230±0.011,
|Vts| = 0.0409± 0.0009 and |Vtd| = 0.0074± 0.0008 [16℄, we get
(∆mK)SM = (1.87± 0.49)× 10
−15GeV. (21)
All the masses were taken from [16℄. Considering only the short-distan
e e�e
ts, the SM
bran
hing ratio for KL → µ
+µ− in next-to-next-to-leading order of QCD is (0.79± 0.12)×
10−9 [26℄. Substra
ting out the SM 
ontribution from the experimental values of ∆mBs ,
∆mK and BNP (KL → µ
+µ−) , we get
BNP (Bs → µ
+µ−) =
(∆mBs)exp − (∆mBs)SM
(∆mK)exp − (∆mK)SM
τ(Bs)
τ(KL)
Bexp(KL → µ
+µ−)short − BSM(KL → µ
. (22)
Substituting the experimental values and the SM predi
tions in the above equation, and
adding all the errors in quadrature, we get
BNP (Bs → µ
+µ−) = (0.08± 2.54)× 10−9. (23)
whi
h is 
onsistent with zero. At 3σ, the upper limit on the new physi
s 
ontribution is 
lose
to SM predi
tion. Thus the present data on ∆mBs along with ∆mK and BNP (KL → µ
puts strong 
onstraints on new physi
s s
alar/pseudos
alar 
ouplings and doesn't allow
a large enhan
ement in the bran
hing ratio of BNP (Bs → µ
+µ−) mu
h beyond the SM
predi
tions. New physi
s at most 
an 
ause a fa
tor of two enhan
ement but
not an order of magnitude. Hen
e the total bran
hing ratio whi
h is the sum of SM

ontribution and new physi
s 
ontribution will be of the order of 10−8 and hen
e rea
hable
at LHC.
Con
lusions:
In this letter, we 
onsidered the 
onstraints on the New Physi
s 
ouplings of
s
alar/pseudos
alar type in the b → s transition. It was shown previously that only su
h
New Physi
s 
an give rise to an order of magnitude enhan
ement of the de
ay rate for
Bs → µ
+µ−. Using the re
ent data on Bs − B̄s mixing, together with the data on K
0 − K̄0
mixing and the short distan
e 
ontribution to KL → µ
+µ−), we obtained very strong bounds
on B(Bs → µ
+µ−). New Physi
s in the form of s
alar/pseudos
alar 
ouplings 
an at most
in
rease the B(Bs → µ
+µ−) by a fa
tor of 2 
ompared to its Standard Model predi
tion.
An order magnitude enhan
ement, previously allowed, is ruled out.
A
knowledgments
We thank Prof. Rohini Godbole for posing a question whi
h led to this investigation. We
also thank Prof. B. Ananthanarayan for a 
riti
al reading of the manus
ript.
[1℄ Babar Collaboration: B. Aubert et al., Phys. Rev. Lett. 91, 221802 (2003).
[2℄ Belle Collaboration: A. Ishikawa et al., Phys Rev. Lett. 91, 261601 (2003).
[3℄ Babar Collaboration: B. Aubert et al., hep-ex/0604007.
[4℄ A. Ali, E. Lunghi, C. Greub and G. Hiller, Phys. Rev. D 66, 034002 (2002).
http://arxiv.org/abs/hep-ex/0604007
[5℄ E. Lunghi, hep-ph/0210379.
[6℄ F. Kruger and E. Lunghi, Phys. Rev. D 63, 014013 (2001).
[7℄ V. M. Abazov et al., D0 Collaboration Phys. Rev. Lett. 94, 071802 (2005).
[8℄ D. Tonelli for the CDF Collaboration, hep-ex/0605038.
[9℄ A. J. Buras, hep-ph/0101336 and Phys. Lett. B 566 115 (2003).
[10℄ R. Forty, WHEPP-8 pro
eedings.
[11℄ A. K. Alok and S. Uma Sankar, Phys. Lett. B 620 61 (2005) .
[12℄ Belle Collaboration: A. Ishikawa et al., hep-ex/0603108.
[13℄ CDF Collaboration: A. Abulen
ia, et al., Phys. Rev. Lett. 97, 242003 (2006).
[14℄ G. Isidori and R. Unterdorfer, JHEP 0401: 009 (2004).
[15℄ S. Hashimoto Int. J. Mod. Phys. A 20 5133 (2005).
[16℄ Review of Parti
le Physi
s: W. M. Yao et al., J. Phys. G 33 1 (2006) .
[17℄ Monika Blanke et. al., JHEP 0610 003 (2006).
[18℄ A. J. Buras et. al., Nu
l. Phys. B347 491 (1990).
[19℄ T. Inami and C. S. Lim, Prog. Theor. Phys. 65 297 (1981) [Erratum-ibid 65 1772 (1981)℄.
[20℄ A. J. Buras , hep-ph/0505175.
[21℄ A. J. Buras et. al., Nu
l. Phys. B238 529 (1984).
[22℄ A. J. Buras et. al., Nu
l. Phys. B245 369 (1984).
[23℄ S. Herrli
h and U. Nierste, Nu
l. Phys. B419 292 (1994).
[24℄ S. Herrli
h and U. Nierste, Phys. Rev. D 52, 6505 (1995).
[25℄ S. Herrli
h and U. Nierste, Nu
l. Phys. B476 27 (1996).
[26℄ M. Gorbahn and U. Haisah, Phys. Rev. Lett. 97, 122002 (2006).
http://arxiv.org/abs/hep-ph/0210379
http://arxiv.org/abs/hep-ex/0605038
http://arxiv.org/abs/hep-ph/0101336
http://arxiv.org/abs/hep-ex/0603108
http://arxiv.org/abs/hep-ph/0505175
	Acknowledgments
	References
ABSTRACT
  In this letter, we consider the constraints imposed by the recent measurement
of B_s - bar B_s mixing on the new physics contribution to the rare decay B_s
--> mu+ mu-. New physics in the form vector and axial-vector couplings is
already severely constrained by the data on B --> (K,K*) mu+ mu-. Here, we show
that B_s - bar B_s mixing data, together with the data on K0 - bar K0 mixing
and K_L --> mu+ mu- decay rate, strongly constrain the scalar-pseudoscalar
contribution to B_s --> mu+ mu-. We conclude that new physics can at best lead
to a factor of 2 increase in the branching ratio of B_s --> mu+ mu- compared to
its Standard Model expectation.

<|endoftext|><|startoftext|>
Introduction
The Spitzer Space Telescope Legacy project “From Molecular Cores to Planet-forming
Disks” includes IRAC and MIPS mapping of five large star-forming clouds (Evans et al.
2003). The Serpens cloud covers more than 10 square degrees as mapped by optical extinction
(Cambrésy 1999), but for reasons of practicality the c2d project was only able to observe 1.5
deg2 with the MIPS instrument on Spitzer (further Spitzer observations of a larger area of
Serpens are planned as part of an extended survey of the Gould Belt, Allen 2007, in prep.).
At an assumed distance of 260 pc (Straizys, Cernis, & Bartasiute 1996), the area mapped
by c2d corresponds to ∼ 4.5 × 7 pc. This paper is one in a series describing the IRAC
and MIPS observations of each of the c2d clouds. Previous papers include those on IRAC
observations of Serpens (Harvey et. al. 2006), Chamaeleon (Porras et al. 2007), and Perseus
(Jorgensen et al. 2006), as well as MIPS observations of Chamaeleon (Young et al. 2005),
Perseus (Rebull et al. 2007), Lupus (Chapman et al. 2007), and Ophiuchus (Padgett et al.
2007).
Our observations of Serpens cover an area that includes the well studied “core” clus-
ter region, Cluster A, together with the newly discovered Cluster B (Harvey et. al. 2006;
Djupvik et al. 2006) to the south, as well as the Herbig Ae/Be star, VV Ser. Significant
portions of this cloud have been studied by previous space infrared missions, including IRAS
(Zhang et al. 1988; Zhang, Laureijs, & Clark 1988) and ISO (Kaas et al. 2004; Djupvik et al.
2006). The much higher sensitivity and longer wavelength capability of the Spitzer MIPS
instrument, however, allows us to detect both very low luminosity infrared-excess objects
and to map very cool diffuse dust emission in the region. Our results are also complementary
to the 1.1mm mapping of the same region by Enoch et al. (2007). The combined results on
Serpens using both the MIPS and IRAC observations are discussed in a companion paper
– 3 –
where we also give detailed object lists (Harvey et al. 2007).
In §2 we describe details of the observations obtained from the MIPS instrument for
Serpens and the data processing pipeline used to reduce the observations. In §3 we describe a
number of results from our MIPS observations and correlations between them and the 2MASS
catalog (Skrutskie et al. 2006). We show in §3.1 that there is an excellent correlation between
the coolest dust that we can observe which emits at 160µm and the optical extinction in
Serpens. We investigate the possibility of time variability at 24µm in our two-epoch data
set in §3.2. In §3.3 we discuss our results statistically in terms of source counts and compare
these to predictions of models of the Galaxy as well as to the counts in the reference fields.
We present color-color and color-magnitude plots of the population of infrared sources in §3.4
and discuss the separation of likely cloud members from the extensive background population
of stars and extragalactic objects. In the final part of §3 we briefly describe some details of
individual sources of particular interest.
2. Observations and Data Reduction
The MIPS observations cover an area where A
> 6 in the contour maps of Cambrésy
(1999). In addition, a nearby off-cloud region of 0.5 square degrees was mapped for com-
parison with the cloud region. A summary of the regions observed is listed in Table 1 with
the AOR (Astronomical Observation Request) number to facilitate access from the Spitzer
archive. The regions covered at 24µm are outlined in Figure 1 against the 25 µm IRAS sky.
The observing strategy and basic MIPS data analysis for the c2d star-forming clouds have
been described in detail by Rebull et al. (2007), but we summarize here the most important
details. Fast scan maps were obtained at two separate epochs with a spacing between adja-
cent scan legs of 240” in each epoch. The second epoch observations were offset by 125” from
the first in the cross-scan direction to fill in the 70µm sky coverage that would otherwise
have been missed due to detector problems. The second epoch scan was also offset 80” from
the first in the scan direction to minimize missing 160µm data. For some of the c2d clouds,
these offsets together with sky rotation were sufficient to give essentially complete one-epoch
coverage at 160µm, but for Serpens there were still small gaps between every two scan lines.
Table 2 lists the sky coverage at each wavelength. The two observation epochs were sepa-
rated in time by ∼ 6 hours to allow identification of asteroids in the images; over this time
period asteroids will typically move 0.3 – 2 arcminutes. Because of Serpens’ relatively large
ecliptic latitude, ∼ 24 degrees, only a very small number of asteroids were seen, all of which
were removed by requiring 2-epoch detection in our final source lists. Typical integration
times are 30 seconds at 24µm, 15 seconds at 70µm and 3 seconds at 160µm. Additional GTO
– 4 –
observations east of the region of highest emission are not included in this analysis because a
different observing strategy was used. Those observations could, however, be added to ours
in order to construct a somewhat larger mosaic of the region.
Figure 2 shows the three individual images produced for the MIPS bands as well as
a false color image of the three together. Harvey et al. (2007) show an additional image
combining the 24µm data with IRAC observations as well as enlargements of the two main
clusters observed. Note that unlike the IRAC instrument, the three wavelengths of MIPS
all have diffraction limited spatial resolution which means the resolution varies dramatically
between 24µm (∼ 6”) and 160µm (∼ 40”).
Our data reduction is described in detail by Evans et al. (2007) but we summarize
the important details here. In addition, previous versions of the c2d pipeline, some of
which still apply to these data, have been described in more detail by Rebull et al. (2007)
and Young et al. (2005). We began our data reduction with the BCD images, processed
in this case by the standard SSC S13.2 pipeline. Following this the three MIPS channels
underwent slightly different processing paths in our c2d reduction. The 24µm data were
mosaicked with the SSC’s Mopex software (Makovoz & Marleau 2005) after processing in
the c2d pipeline to reduce artifacts, e.g. “jailbars” near bright sources. Point sources were
extracted with “c2dphot” (Harvey et al. in prep.), a source extractor based on “Dophot”
(Schechter, Mateo, & Saha 1993), which utilizes the mosaics for source identification but the
stack of individual BCD’s for each identified object to provide the photometry and position
information. We have estimated our completeness limit at 24µm in a manner similar to that
described for our IRAC photometry (Harvey et. al. 2006). We inserted a number of artificial
sources into the 24µm mosaic at random positions over a range of brightness covering the
range 2 < [24] < 12 mag. and then tested whether they were properly extracted. We also
produced a mosaic with only artificial sources (no real ones) but a noise level comparable
to that in the observed image, and tested the completeness of extraction from that artificial
image to estimate the effects of confusion in this relatively high source density region. Figure
3 shows the results from these tests. Clearly at the fainter flux levels, the effects of high
source density are important to the true completeness level in Serpens, e.g. [24] > 9.5 mag.
The processing of the 70µm data followed a path similar to that at 24µm with two
exceptions. At 70µm the SSC produces two sets of BCD’s, one of which is simply calibrated
and another that is filtered spatially and temporally in a manner that makes point source
identification easier but which does not conserve flux for brighter sources nor for diffuse emis-
sion. We produced mosaics of both the unfiltered and the filtered products using Mopex on
the native BCD pixel scale. Point sources were extracted using APEX (Makovoz & Marleau
2005). Source reality was checked by hand inspection and comparison with the 24µm source
– 5 –
list. Generally the filtered mosaics were used for point source extraction, but above F(70) ∼
2 Jy, we used the unfiltered data. Above F(70) ∼ 23 Jy, sources begin to be saturated. At
these very high flux levels we used a procedure to fit the wings of the source profile; these
data have been assigned a higher uncertainty of because of the inherent uncertainties in this
procedure.
Complete tables of source positions and flux densities for likely cloud members in Ser-
pens are given by Harvey et al. (2007) for our 3.6 – 70µm observations. At 160µm our
processing was limited to producing a native pixel scale mosaic using interpolation to fill in
missing pixels and point source extraction from the unfiltered mosaic. We extracted four
nominal point sources in the entire mapped area. Two of these are associated with obvious
multiple clumps of 24/70µm sources. The other two, SSTc2dJ1829167+0018225 (associ-
ated with IRAS 18267+0016) and SSTc2dJ18293197+0118429 (associated with source 159
of Kaas et al. (2004)) are likely powered mostly by single, shorter wavelength sources. Table
3 lists the positions and flux densities of these four nominal point sources with short com-
ments, since their 160µm photometry is not described in any of our other publications on
Serpens. None of these is in the core area of either of the main clusters. This is because large
areas in those clusters are saturated, and the close spacing of many bright sources leads to
the complicated, extended structure seen in Figure 2 at 160µm, without obvious point-like
sources.
After extraction, the source lists were bandmerged with our IRAC source lists for Ser-
pens (Harvey et. al. 2006) and the 2MASS catalog of J, H, and K
photometry (Skrutskie et al.
2006) as described by Evans et al. (2007). The radius for source matching with shorter wave-
length detections was 4” at 24µm and 8” at 70µm. Table 4 lists the number of sources ex-
tracted at 24 and 70µm, and some examples of statistics of numbers identified with shorter
wavelength sources. In addition to bandmerging, sources undergo a classification process
based on the available photometry, 2MASS, IRAC, and MIPS. For the purposes of this
paper the most important classification is that of “star” which implies a spectral energy
distribution that is well-fit as a reddened stellar photosphere without requiring any excess
infrared emission from possible circumstellar dust. The data reported here consist of a sub-
set of all the sources extracted in Serpens. The entire catalog is available from the SSC
website (http://ssc.spitzer.caltech.edu/legacy/all.html). For this paper we have limited our
discussion to sources with a signal-to-noise ratio greater than 5 and to sources found in both
epochs of observation to eliminate asteroids. These limits lead to a very high reliability for
the objects reported here, probably greater than 98%.
In addition to our reduction of the Serpens Cloud and off-cloud data, we have also
processed a 5.3 deg2 portion of the SWIRE Spitzer Legacy data (Surace et al. 2004) from
http://ssc.spitzer.caltech.edu/legacy/all.html
– 6 –
the ELAIS N1 field through our c2d pipeline. Since this field is almost entirely populated
by Galactic stars and extragalactic objects, it provides an additional control field against
which to compare our Serpens Cloud population as discussed below. Note that the SWIRE
observations go approximately a factor of 4 deeper than c2d due to increased integration
time.
3. Results
3.1. Extended Emission
The 160µm emission traces the coolest and most extended dust seen with MIPS. Figure 4
shows an image of the 160µm emission together with contours of the optical extinction. Also
shown are the locations of the two main clusters of young stellar objects in Serpens, the core
Cluster A, and Cluster B (also called the G3-G6 cluster by Djupvik et al. (2006)). The optical
extinction has been estimated by our fitting of the objects that were well characterized as
extincted stellar photospheres. This figure shows a very close correlation between the coolest
dust and the dust that is associated with optical extinction. The figure also clearly shows
that the two high-stellar-density clusters, Cluster A and B, are located in areas of maximum
extinction, as we discuss further in §3.5.
3.2. 24µm Time Variability
Since many pre-main-sequence stars exhibit variable optical emission, we conducted a
simple examination of the 24µm fluxes from the two observed epochs, similar to that in
Perseus by Rebull et al. (2007) and for the IRAC data in Serpens (Harvey et al. 2007). As
shown in Table 1, the time difference between the two epochs of observation was of order 4
hours. Figure 5 shows the ratio of the 24µm flux density between the two epochs for all the
extracted sources whose signal-to-noise ratio was above 5 that were detected in both epochs
of observation. Although there are a few outliers beyond the limits expected on the basis
of the signal-to-noise ratios, these are all readily explained as due to poor photometry near
the edges of the mosaic or problems due to source confusion or adjacency to bright sources.
This is consistent with the findings of Rebull et al. (2007) for the Perseus 24µm sources
and by Harvey et al. (2007) for the Serpens IRAC sources. Although there are undoubtedly
some variable sources in these clouds, the observing techniques of the c2d program were not
designed to enable reliable detection of modestly variable objects.
– 7 –
3.3. Source Counts
Because the Serpens star-forming cloud is so close to the Galactic plane, b ∼ 5 degrees,
the vast majority of the sources detected at the shorter wavelengths are background stars in
the Galaxy. At tfainter flux levels, background extragalactic objects constitute a significant
population. In order to estimate the background Galactic star numbers we have used the
Wainscoat et al. (1992) model provided by J. Carpenter (private communication). Figure 6
shows the predicted star counts from the model together with the observed counts at 24µm
for both the Serpens Cloud and the off-cloud region. Also shown in the figure are the source
counts from the c2d-processed SWIRE ELAIS N1 field which are largely extragalactic for
fluxes below a few mJy. This figure shows that contamination by Galactic stars at the
brighter fluxes and by extragalactic sources at the faint end is a significant problem for
identifying Serpens Cloud members. To address this problem we discuss our use of several
color and flux criteria in the following section. It is also apparent that there is an excess of
bright (F > 300 mJy) sources relative to the expected background counts. This excess is,
in fact, real and represents the bright end of the YSO candidate population discussed in the
following section.
3.4. Color-Magnitude Diagrams
The c2d team has discussed in a number of studies how the use of color-magnitude and
color-color diagrams can separate likely young cloud members with infrared excesses from
reddened stars and many background extragalactic sources (Young et al. 2005; Harvey et. al.
2006; Rebull et al. 2007; Harvey et al. 2007). Since nearly half of the area covered by our
MIPS 24µm observations was not observed with IRAC (Harvey et. al. 2006), we utilize the
color and magnitude criteria developed by Young et al. (2005) and refined by Rebull et al.
(2007) and Chapman et al. (2007) to isolate a candidate YSO population without requiring
the existence of IRAC data. The most populated diagram is naturally the color-magnitude
diagram of K
versus K
- [24] because of the much larger number of 24µm sources than
70µm ones. Figure 7 shows the distribution of sources in this diagram for the 1453 sources
with S/N above 5 at 24µm and with 2MASS K
matches within 4”. This distribution is
very similar to that seen in other well-populated c2d clouds such as Perseus (Rebull et al.
2007). A comparison of the SWIRE results, the Serpens off-cloud results, and the Serpens
Cloud data shows: 1) objects in our “star” class fall in a relatively narrow band with blue
-[24] colors (K
-[24] < 1) as would be expected, and 2) the part of the diagram toward
redder colors is populated by a number of sources in Serpens that are not seen in either
the off-cloud region or in the SWIRE data set, except at K magnitudes fainter than K
– 8 –
14. This allows us to assign a high probability that sources in the region K
< 14 and K
-[24] > 2 are Serpens Cloud YSO candidates with excess emission at 24µm probably due
to circumstellar dust. Note that the off-cloud area does have a population of moderately
reddened objects (K
-[24] < 2), well-fit as stellar photospheres that are not seen in the
SWIRE sample, simply because even the off-cloud area has more reddening than the high
Galactic latitude ELAIS N1 region. In order to categorize our YSO candidates crudely in
terms of evolutionary state, we have drawn lines in Figure 7 indicating where objects would
fall based on the YSO source classification criteria of Greene et al. (1994) using the K
-[24]
color to measure the spectral slope. Table 5 lists the number of candidates and the number
in each of the four classes. Although AGB stars with substantial mass loss also exhibit mid-
infrared excesses, Harvey et. al. (2006) have argued that the number expected in this area is
less than or of order a half dozen (four of which have already been confirmed spectroscopically
as AGB interlopers by Merin et al. (in prep.). The positions of and photometry for the YSO
candidates that are not in the area covered by IRAC are given by Harvey et al. (2007) along
with those in the IRAC area.
Harvey et al. (2007) discuss the comparison between YSO’s selected by the criteria used
here (K
and 24µm data only) and the more restrictive criteria possible with the combination
of IRAC data. They basically find that we actually may have missed 8 or 9 YSO’s in the area
not covered by IRAC and included a very few, 3 or 4, that may be background extragalactic
sources. But the overall conclusion is that there is a good correspondence between the YSO
candidates found using only MIPS and 2MASS versus those selected with a more complete
data set. It is also clear that the area mapped by both IRAC and MIPS, 0.85 deg2 contains
a much higher density of YSO’s, 235 or 276 deg−2 than does the area only covered by
MIPS/24µm with 51 YSO’s or 54 deg−2. Even if we exclude the area of the two high density
clusters, the area covered by the combined IRAC/MIPS observations has a YSO density a
factor of 4 higher than the area not included in the IRAC observations.
We have also plotted our photometry in two other color/magnitude spaces for compar-
ison with other c2d clouds. Figure 8 shows the distribution of sources in K
vs. K
-[70]
space. As observed by Rebull et al. (2007) in Perseus, there are a large number of likely
cloud members at much brighter K
magnitudes than seen for SWIRE extragalactic objects.
In addition, there is a small population of faint (in K
) objects that are redder than any of
the SWIRE objects in both Serpens and Perseus. The four objects redder than K
-[70] = 15
are all likely to be slightly less extreme versions of the sources discussed in the next section.
Two of these are located in cluster A, but tend to be around the outside of the tight cluster
of very red objects. The other two are in a small grouping associated with the second of
the four 160µm point sources listed in Table 3. Since all of these objects were also observed
in our program with IRAC, they are also listed in the appropriate tables of Harvey et al.
– 9 –
(2007), and all are considered high probability YSO’s.
The final color-magnitude diagram, [24] vs [24]-[70] is shown in Figure 9. Again this
distribution is qualitatively similar to that in Perseus, although we find many fewer sources
in the area overlapping the red edge of the extragalactic distribution than did Rebull et al.
(2007) for their “rest of the cloud”. The Serpens distribution is qualitatively more similar
to that for the NGC1333 portion of Perseus. Since many of the sources represented in this
diagram for Serpens are located in one of the two principal clusters, A and B, in Serpens,
it is perhaps not surprising that they would mimic some of the properties of similar young
clusters like NGC 1333.
3.5. The Most Embedded Objects
We have selected the coldest, most obscured sources from our sample by looking for
objects not detected in the 2MASS survey but detected with reasonable signal-to-noise at
both 24 and 70µm. There are 11 such objects in our surveyed area, and these are listed in
Table 6. Interestingly all 11 are located in the heart of either Cluster A or B. Additionally,
as shown in Table 6 all were detected in some or all IRAC bands. Their energy distributions
are all consistent with a designation of Class I even though they are not included in Figure
7 since they were not detected in the 2MASS survey. In fact, several of these objects are
strongly enough peaked in the far-infrared that they have energy distributions consistent
with some nominal Class 0 sources despite the fact that all were detected with IRAC. The
class status of these will be discussed further using mm data by Enoch et al. (2007, in prep.).
Figure 10 shows the SED’s for the two most embedded objects from Table 6. Each of these
appears to be associated with an outflow in its respective cluster, and both have very similar
SED’s that differ only in their absolute flux level by a factor of ∼ 10.
Table 6 shows also that the most embedded object in Cluster B (whose SED is shown in
Figure 10) was not selected as a YSO by Harvey et al. (2007). The reason is that the flux at
3.6µm was too faint to meet the selection criteria of that study. The area within 15” of that
source contains two other extracted compact sources in the c2d data set. The positions and
photometry for all three are shown in Table 7 and an image of the area is shown in Figure
11. Although the source density is quite high, the 70µm contours shown in the figure are
clearly centered on the northernmost source, “C”. Source “B” is a slightly extended source
that may represent a separate exciting object or may just be the location of the most visible
jet emission that has been discussed briefly by Harvey et al. (2007) in this region. Source
“A” is a faint, but very red object about 6” to the west of source “C” and appears to be a
point-like object in the images.
– 10 –
Figure 4 shows clearly that Cluster A and B are located in the highest extinction parts
of the cloud. Therefore the lack of detection of the objects in Table 6 at 1 – 2.3µm may
be due at least partly to the extinction of the cloud material in which they are embedded
in addition to individual circumstellar material. Although the nominal extinction values in
these areas range up to A
∼ 35 – 40, the fact that these values result from smoothing
over 90 arcseconds of the stellar distribution means that they probably underestimate the
extinction in the most extreme regions. This association of the coldest objects with the
highest extinction regions is similar to the correlation seen by Enoch et al. (2007) between
extinction and location of dense mm cores.
4. Summary
We have described the basic observational characteristics of the c2d MIPS observations
of the Serpens Cloud. In a 1.5 deg2 area we have found 250 YSO candidates on the basis
of the K
-[24] color. An additional 11 objects can be identified on the basis of their 24 and
70µm fluxes and lack of detection by 2MASS. All of these YSO candidates will be discussed
in more detail in a companion paper (Harvey et al. 2007). All the most embedded objects are
found in the central area of the two main clusters of YSO’s previously identified in Serpens.
The images and source catalogs derived from these data are all available on the SSC website,
http://ssc.spitzer.caltech.edu/legacy/all.html.
Support for this work, part of the Spitzer Legacy Science Program, was provided by
NASA through contracts 1224608, 1230782, and 1230779 issued by the Jet Propulsion Lab-
oratory, California Institute of Technology, under NASA contract 1407. Astrochemistry in
Leiden is supported by a NWO Spinoza grant and a NOVA grant. JKJ was supported by
NASA Origins grant NAG5-13050. This publication makes use of data products from the
Two Micron All Sky Survey, which is a joint project of the University of Massachusetts
and the Infrared Processing and Analysis Center/California Institute of Technology, funded
by NASA and the National Science Foundation. We also acknowledge extensive use of the
SIMBAD data base.
http://ssc.spitzer.caltech.edu/legacy/all.html
– 11 –
REFERENCES
Cambrésy, L. 1999 A & A, 345, 965
Chapman, N. et al. 2007, ApJ, in press.
Davis, C.J., Matthews, H.E, Ray, T.P., Dent, W.R.F., & Richer, J.S. 1999, MNRAS, 309,
Djupvik, A.A., André, Ph., Bontemps, S., Motte, F., Olofsson, G., Galfalk, M. & Floren,
H.-G. 2006,A&A, in press.
Evans, N. J., II, et al. 2003, PASP, 115, 965
Enoch, M. L. et al. 2006, ApJ, submitted
Evans, N. J., II et al. 2007, Final Delivery of Data ...: IRAC and MIPS,...
Greene, T. P., Wilking, B. A., André, P., Young, E. T. & Lada, C. J. 1994, ApJ, 434, 614
Harvey, P.M. et al. 2006, ApJ, 644, 307
Harvey, P.M. et al. 2007, in prep.
Jorgensen, J.K. et al. 2006, ApJ, 645, 1246
Kaas, A. A. et al. 2004, A & A, 421, 623
Makovoz, D. & Marleau, F. R. 2005, PASP, 117, 1113
Padgett, D.L. et al. 2007, ApJ, in press
Porras, A. et al. 2007, ApJ, in press
Rebull, L. et al. 2007, ApJ, in press
Schechter, P. L., Mateo, M., & Saha, A. 1993, PASP, 105, 1342
Skrutskie, M. et al. 2006, AJ, 131, 1163
Straizys, V., Cernis, K., & Bartasiute, S. 1996, Balt. Astr, 5, 125
Surace, J. A. et al. 2004, The SWIRE ELAIS N1 Image Atlases and Source Catalogs,
(Pasadena: Spitzer Science Center), http://ssc.spitzer.caltech.edu/legacy/
Wainscoat, R. J. et al. 1992, ApJS, 83, 111
http://ssc.spitzer.caltech.edu/legacy/
– 12 –
Young, K. E. 2005, ApJ, 628, 283
Zhang, C. Y. et al. 1988, A&A, 199, 170
Zhang, C. Y., Laureijs, R. J., & Clark, F. O. 1988, A&A, 196, 236
This preprint was prepared with the AAS LATEX macros v5.2.
– 13 –
Table 1. Summary of Observations
Region AOR Time-Date l a b a
(UT) (deg) (deg)
Serpens 5713408 2004-04-05 23:40 31.5 5.4
5713920 2004-04-06 04:05 31.5 5.3
5713664 2004-04-06 00:22 31.6 5.2
5714176 2004-04-06 04:48 30.6 5.1
Off Cloud 5716736 2004-04-06 01:26 35.2 4.4
5716992 2004-04-06 05:52 35.2 4.3
a l and b are listed for the center of the 24 µm AOR.
– 14 –
Table 2. Serpens Cloud Sky Coverage
Region 24 µm 70 µm 160 µm
(deg2) (deg2) (deg2)
Serpens 1.81 1.57 1.49
Off-Cloud 0.47 0.36 0.41
– 15 –
Table 3. 160µm Point Sources
RA (J200) Dec (J200) Flux (mJy) Comment YSO# a
18 29 32.3 +01 18 56 24000 Single 24/70µm Source 104
18 29 52.9 +00 36 09 18200 Cluster of four 24µm Sources
18 29 16.7 +00 18 20 10000 Single 24/70µm Source 88
18 28 15.7 −00 03 11 6070 Cluster of four 24µm Sources
aYSO number from Harvey et al. (2007).
– 16 –
Table 4. Serpens Cloud Detection Statistics
Wavelength(s) Source Number
24µm > 3σ 2635
24µm > 5σ 1494
70µm > 3σ 97
70µm > 5σ 88
24 & 70µm > 5σ 75
24µm & 2MASS K
> 5σ 1085
24µm & any IRAC 1040a
70µm & any IRAC 77
aThe greater number of matches between
24µm and K
versus IRAC is due to the
smaller area coverage of the IRAC data.
– 17 –
Table 5. Classification based on K
−[24]
Classification Serpens Source Counta
number with K
−[24]>2, K
<14 250
number with K
−[24]>2, K
<14, and Class I K
−[24] color 15 (6%)
number with K
−[24]>2, K
<14, and “flat” K
−[24] color 21 (8%)
number with K
−[24]>2, K
<14, and Class II K
−[24] color 158 (63%)
number with K
−[24]>2, K
<14, and Class III K
−[24] color 56 (22%)
aSince a 2MASS detection is required to be included in these statistics, very cold or
deeply embedded sources are not present in these counts, e.g. those sources in Table 6.
Table 6. The Most Embedded Objects
Name/Position YSO # a 3.6 µm 4.5 µm 5.8 µm 8.0 µm 24.0 µm 70.0 µm Associated Source b
SSTc2dJ... (mJy) (mJy) (mJy) (mJy) (mJy) (mJy)
18285404+0029299 40 5.81±0.50 27.6± 2.3 44.8± 2.6 56.4± 3.2 918± 85 11100± 1040 D62/66
18285486+0029525 42 1.94±0.12 10.6± 0.6 20.4± 1.1 30.2± 1.6 765± 70 7250± 675 D65
18290619+0030432 67 8.05±0.41 45.0± 2.8 93.9± 4.8 129± 7 1320± 139 7240± 713 D90
18290675+0030343 68 3.27±0.21 11.7± 0.7 14.9± 0.8 20.7± 1.2 1000± 105 11400± 1180 D94
18290906+0031323 < 0.12 0.29±0.03 0.40±0.09 0.31±0.08 64.6± 6.0 6380± 611 D101
18294810+0116449 135 1.96±0.10 6.98±0.42 12.1± 0.6 16.7± 0.8 219± 21 14900± 1420 K241, SMM9
18294963+0115219 141 0.85±0.08 2.64±0.27 2.32±0.28 3.54±0.31 1180± 117 82800± 7810 K258a, SMM1
18295219+0115478 150 7.38±0.41 33.0± 2.1 41.3± 2.2 40.0± 2.6 1640± 154 15200± 1420 K270, SMM10
18295285+0114560 155 8.65±0.44 34.6± 1.8 72.0± 3.4 110± 5 1040± 96 5570± 523 K276
18295927+0114016 195 2.72±0.28 5.76±0.44 7.78±1.16 36.0± 5.4 109± 19 12200± 1160 SMM3
18295992+0113116 198 2.77±0.16 29.5± 1.5 103± 4 199± 10 2620± 249 6830± 675 K331
aIdentifying number from YSO table in Harvey et al. (2007).
bReferences are: D: (Djupvik et al. 2006), K: (Kaas et al. 2004), SMM: Davis et al. (1999).
Table 7. Sources Marked In Figure 11
Marker Name/Position 3.6 µm 4.5 µm 5.8 µm 8.0 µm 24.0 µm 70.0 µm
SSTc2dJ... (mJy) (mJy) (mJy) (mJy) (mJy) (mJy)
Aa 18290904+0031280 0.95±0.11 2.78±0.23 2.92±0.24 5.03±0.40 14.0± 1.9 · · ·
B 18290864+0031305 0.06±0.03 0.32±0.02 0.47±0.05 0.62±0.07 36.2± 3.4 · · ·
C 18290906+0031323 < 0.12 0.29±0.03 0.40±0.09 0.31±0.08 64.6± 6.0 6380± 611
aThis is YSO # 75 in Harvey et al. (2007).
– 20 –
Fig. 1.— IRAS 25 µm map showing the observed c2d regions in the Serpens cloud, both the
star-forming region marked “SERPENS” and the low-extinction “OFF-CLOUD” area.
– 21 –
Fig. 2.— Registered Serpens 24µm, 70µm and 160µm images of the c2d MIPS region. The
color image is a composite of all three bands, and includes only the 1.27 square degree area
where data are available for each of the three bands. Colors represent red:160µm green:70
µm and blue:24µm. The black outline shows the region where 4 bands of IRAC data were
observed (Harvey et. al. 2006).
– 22 –
Fig. 3.— Completeness test at 24µm. The upper solid line shows the measured completeness
fraction for artificial sources inserted into the observed 24µm mosaic image of Serpens as a
function of magnitude. The slightly higher dash-dot line shows the completeness fraction for
sources inserted into an artificial image with no real sources but with a noise level equal to
that in the observed data. The lower solid line (mostly equal to zero) shows the fraction of
“unreliable” sources, i.e. sources extracted which were not real.
– 23 –
Fig. 4.— Contours of A
at levels of 5,10,20,30 mag determined from 2MASS and Spitzer
c2d IRAC data are overlaid on the Serpens 160 µm image. The visual extinction and 160µm
emission are quite well correlated. The locations of Cluster A and B are indicated.
– 24 –
0 1 2 3 4
log flux@24
Fig. 5.— A search for time variability in the Serpens 24µm data; plot of log flux ratio of
epoch1 to epoch2 versus log flux density (mJy) for the combined epoch data. There is no
verifiable time variable source in the cloud based on these data.
– 25 –
Fig. 6.— 24µm source counts in the Serpens MIPS field (dark line), and off-cloud region
(dashed line). SWIRE galaxy counts (thin line) fall below the Serpens data at our flux limit
of 1 mJy. The predicted source counts from the Wainscoat model at 25 µm (Wainscoat et al.
1992) are shown by the dot-dash line.
– 26 –
0 2 4 6 8 10 12 14
Ks-[24] (mag)
SWIRE
0 2 4 6 8 10 12 14
Ks-[24] (mag)
Class III Class II
flat Class I
Serpens
0 2 4 6 8 10 12 14
Ks-[24] (mag)
Class III Class II
flat Class I
Serpens OC
Fig. 7.— Color-magnitude diagram for K
vs. K
− [24] for objects in SWIRE (left) and
Serpens (center) and off-cloud region (right). The SWIRE counts are shown as a surface
density with darker implying higher density. Objects in SWIRE are expected to be mostly
galaxies (objects with K
&14) or stellar photospheres (objects with K
− [24] .1). For the
Serpens and off-cloud plots, filled gray circles are objects with SEDs resembling photospheres,
and plus signs are the remaining objects. An additional box around a point denotes that it
was also detected at 70µm. Objects that are candidate young objects have colors unlike those
objects found in SWIRE, e.g., K
.14 and K
− [24] &1. Dashed lines denote the divisions
between Class I, flat, Class II, and Class III objects; to omit foreground and background
stars, we have further imposed a K
− [24] >2 requirement on our Class III objects (see
text).
– 27 –
0 5 10 15 20
Ks-[70]
Fig. 8.— Color-magnitude diagram of K
vs. K
− [70] for Serpens (crosses) with data from
the full SWIRE survey (grey dots) included for comparison.
– 28 –
Fig. 9.— Color-magnitude diagram of [24] vs. [24] − [70] for Serpens (crosses) with data
from the full SWIRE survey (grey dots) included for comparison.
– 29 –
Fig. 10.— Spectral energy distribution for the two most embedded sources in Table 6, one in
Cluster A (open squares, SSTc2dJ1829463+0115219) and one in Cluster B (open diamonds,
SSTc2dJ18290906+0031323, source “C” in Table 7), both of which appear to be associated
with outflows.
– 30 –
Fig. 11.— Three color image of the eastern end of Cluster B where the most embedded
source, C, is located. This is the likely exciting source for an HH-like outflow visible in the
IRAC data. The color scheme is: blue/4.5µm, green/8.0µm, and red/24µm. The contours
of 70µm emission are also superimposed with levels at 40, 80, 160, 240, and 320 MJy/sr.
Also shown are the positions of two other compact sources extracted from the images in this
region. The letters correspond to positions/fluxes in Table 7.
	Introduction
	Observations and Data Reduction
	Results
	Extended Emission
	24m Time Variability
	Source Counts
	Color-Magnitude Diagrams
	The Most Embedded Objects
	Summary
ABSTRACT
  We present maps of 1.5 square degrees of the Serpens dark cloud at 24, 70,
and 160\micron observed with the Spitzer Space Telescope MIPS Camera. More than
2400 compact sources have been extracted at 24um, nearly 100 at 70um, and 4 at
160um. We estimate completeness limits for our 24um survey from Monte Carlo
tests with artificial sources inserted into the Spitzer maps. We compare source
counts, colors, and magnitudes in the Serpens cloud to two reference data sets,
a 0.50 deg^2 set on a low-extinction region near the dark cloud, and a 5.3
deg^2 subset of the SWIRE ELAIS N1 data that was processed through our
pipeline. These results show that there is an easily identifiable population of
young stellar object candidates in the Serpens Cloud that is not present in
either of the reference data sets. We also show a comparison of visual
extinction and cool dust emission illustrating a close correlation between the
two, and find that the most embedded YSO candidates are located in the areas of
highest visual extinction.

<|endoftext|><|startoftext|>
Introduction
	UED interactions and parameters
	Event simulation and test points
	 Analysis and results
	Conclusions
	Acknowledgments
	Bibliography
ABSTRACT
  Establishing that a signal of new physics is undoubtly supersymmetric
requires not only the discovery of the supersymmetric partners but also probing
their spins and couplings. We show that the sbottom spin can be probed at the
CERN Large Hadron Collider using only angular correlations in sbottom pair
production with subsequent decay of sbottoms into bottom quark plus the
lightest neutralino, which allow us to distinguish a universal extra
dimensional interpretation with a fermionic heavy bottom quark from
supersymmetry with a bosonic bottom squark. We demonstrate that this channel
provides a clear indication of the sbottom spin provided the sbottom production
rate and branching ratio into bottom quark plus the lightest neutralino are
sufficiently large to have a clear signal above Standard Model backgrounds.

<|endoftext|><|startoftext|>
Draft version October 26, 2018
Preprint typeset using LATEX style emulateapj v. 08/22/09
MODELING THE GALAXY THREE-POINT CORRELATION FUNCTION
Felipe A. Maŕın
, Risa H. Wechsler
, Joshua A. Frieman
and Robert C. Nichol
Draft version October 26, 2018
ABSTRACT
We present new predictions for the galaxy three-point correlation function (3PCF) using high-
resolution dissipationless cosmological simulations of a flat ΛCDM Universe which resolve galaxy-size
halos and subhalos. We create realistic mock galaxy catalogs by assigning luminosities and colors
to dark matter halos and subhalos, and we measure the reduced 3PCF as a function of luminosity
and color in both real and redshift space. As galaxy luminosity and color are varied, we find small
differences in the amplitude and shape dependence of the reduced 3PCF, at a level qualitatively con-
sistent with recent measurements from the SDSS and 2dFGRS. We confirm that discrepancies between
previous 3PCF measurements can be explained in part by differences in binning choices. We explore
the degree to which a simple local bias model can fit the simulated 3PCF. The agreement between
the model predictions and galaxy 3PCF measurements lends further credence to the straightforward
association of galaxies with CDM halos and subhalos.
Subject headings: cosmology: large-scale structure of universe — galaxies: formation — galaxies:
statistics — galaxies: halos
1. INTRODUCTION
Observations of the higher-order statistics of the galaxy
distribution can provide fundamental tests of the stan-
dard cosmological model. For example, higher-order cor-
relation functions of the mass are predicted to be zero
in linear perturbation theory for Gaussian initial condi-
tions, which are expected in the simplest inflation models
of the early Universe (Peebles 1980, Bernardeau et al.
2002, Szapudi 2005 and references therein). In the
late Universe, however, non-linear gravitational cluster-
ing and biased galaxy formation lead to non-Gaussianity
in the galaxy density field, resulting in non-zero con-
nected N-point correlation functions (NPCFs) with N >
2. By studying higher-order galaxy statistics on large
scales, we can test the nature of the initial conditions;
on smaller scales, the NPCFs can constrain models of
biased galaxy formation (e.g., Fry & Gaztañaga 1993;
Frieman & Gaztañaga 1994) and the relationship be-
tween galaxies and their host dark matter halos.
The three-point correlation function (3PCF, or ζ),
and its Fourier-space equivalent, the bispectrum, are
the first in the hierarchy of higher-order statistics and
measure the shape dependence of the number of galaxy
triplets as a function of scale. The 3PCF is sensitive
to, for example, the shapes of dark matter halos and
the presence of filamentary structures in the large-scale
structure of the Universe (Sefusatti & Scoccimarro
2005). Since the pioneering work of Peebles and col-
laborators (see Peebles 1980 and references therein,
1 Department of Astronomy & Astrophysics, Kavli Institute
for Cosmological Physics, The University of Chicago, Chicago, IL
60637 USA
2 Kavli Institute for Particle Astrophysics & Cosmology, Physics
Department, and Stanford Linear Accelerator Center, Stanford
University, Stanford, CA 94305
3 Center for Particle Astrophysics, Fermi National Accelerator
Laboratory, P.O. Box 500, Batavia, IL 60510 USA
4 Institute of Cosmology & Gravitation, University of
Portsmouth, Portsmouth, PO1 2EG, UK
5 e-mail: fmarinp@uchicago.edu
in particular Groth & Peebles 1977), there have been
many measurements of the 3PCF and bispectrum
using a variety of angular (Peebles & Groth 1975;
Groth & Peebles 1977; Gaztañaga & Frieman 1994;
Frieman & Gaztañaga 1999; Huterer et al. 2001;
Szapudi et al. 2001, 2002; Ross et al. 2006) and red-
shift (Gaztañaga & Frieman 1994; Jing & Börner
1998; Verde et al. 1998; Scoccimarro et al. 2001a;
Feldman et al. 2001; Jing & Börner 2004) catalogs.
There has also been considerable theoretical work
to understand the 3PCF and bispectrum using
non-linear perturbation theory (see Bernardeau et al.
2002 and references therein), the halo model (see
Ma & Fry 2000; Scoccimarro et al. 2001b; Wang et al.
2004; Fosalba et al. 2005), and cosmological simulations
(Barriga & Gaztañaga 2002; Scoccimarro et al. 1999;
Gaztañaga & Scoccimarro 2005; Hou et al. 2005).
In recent years, there has been renewed inter-
est in the 3PCF due to the availability of large
redshift surveys. These surveys now provide both
the volume and the number of galaxies required
to make robust measurements of the 3PCF over a
range of scales. For example, recent papers by
Kayo et al. (2004), Hikage et al. (2005), Nichol et al.
(2006), Nishimichi et al. (2006), Kulkarni et al. (2007)
have presented measurements of the 3PCF from the
Sloan Digital Sky Survey (SDSS; York et al. 2000)
as a function of scale, galaxy luminosity, and color.
Nichol et al. (2006) also quantified the effect of large-
scale structures on the shape dependence of the 3PCF.
Likewise, several recent papers (Jing & Börner 2004;
Baugh et al. 2004; Croton et al. 2004; Gaztañaga et al.
2005; Pan & Szapudi 2005; Croton et al. 2006) provide
new measurements of the 3PCF and high-order correla-
tions from the Two-degree Field Galaxy Redshift Survey
(2dFGRS; Colless et al. 2001).
The main results from these recent SDSS and 2dF-
GRS analyses of the 3PCF are: i) on large scales, the
observed 3PCF is in qualitative agreement with expec-
http://arxiv.org/abs/0704.0255v2
2 Maŕın, Wechsler, Frieman and Nichol
tations for the growth of structure from Gaussian initial
conditions; ii) on smaller scales (i.e., in the non-linear
and weakly non-linear regimes), the 3PCF measured in
redshift space scales with the redshift-space two-point
correlation function (2PCF, or ξ) as ζ ∼ ξ2, consis-
tent with the “hierarchical clustering” ansatz (Peebles
1980); iii) the shape dependence of the reduced 3PCF
depends at most weakly on galaxy luminosity; iv) the
amplitude of the 3PCF is larger for elongated triangle
configurations than for more symmetric triangle shapes
(Gaztañaga & Scoccimarro 2005; Gaztañaga et al. 2005;
Nichol et al. 2006; Kulkarni et al. 2007) — again consis-
tent with expectations from non-linear clustering theory.
It is important to make detailed comparisons of these
3PCF observations with theoretical predictions that in-
corporate a realistic prescription for modeling galax-
ies and that account for observational effects such as
redshift-space distortions. Such comparisons can be car-
ried out in two ways: either the observations can be
corrected for the redshift-space effects and compared to
theory in real space, e.g., using the projected 3PCF
(Zheng 2004), which is analogous to the projected 2PCF
(Zehavi et al. 2005), or one can build mock galaxy cata-
logs from cosmological simulations and measure the the-
oretical 3PCF directly in redshift space. Since we also
have access to the real-space 3PCF from such mock cata-
logs, we can investigate in detail the relationship between
the real- and redshift-space correlation functions.
In this paper, we pursue the second of these method-
ologies, using state-of-the-art high-resolution dissipation-
less dark matter cosmological simulations. These simu-
lations have the spatial resolution required to identify
the dark matter (DM) halos and subhalos that host indi-
vidual galaxies and at the same time encompass a large
enough volume to probe large-scale structure in a sta-
tistically reliable way. The model we use assigns galaxy
properties (luminosity, color, etc.) directly to these DM
galactic halos and subhalos using simple, empirically-
based assumptions about these properties. This ap-
proach differs in both assumptions and the resolution
required from methods which build galaxy catalogs by
statistically assigning several galaxies to each (more mas-
sive) halo using a Halo Occupation Distribution (HOD;
e.g. Berlind & Weinberg 2002; Wang et al. 2004; see
Kulkarni et al. (2007) for constraints on HOD param-
eters from the SDSS Luminous Red Galaxy sample’s
3PCF) or from semi-analytic methods. The method used
here was first implemented by Kravtsov et al. (2004),
and has been applied successfully to different statisti-
cal studies, including the 2PCF (Conroy et al. 2006),
galaxy-galaxy lensing (Tasitsiomi et al. 2004), and close
pair statistics (Berrier et al. 2006) among others (see also
Vale & Ostriker 2004, 2006; Conroy et al. 2007). Here
we extend the study of this model to the 3PCF as a
function of luminosity, color, and redshift. Where pos-
sible, we make direct comparisons in redshift space with
recent observations.
In §2, we describe the simulations used in this paper
and our methods for constructing mock galaxy catalogs
based on resolved DM halos. We also review the tech-
niques used to estimate the NPCFs. In §3, we present
measurements of the 3PCF in both real and redshift
space for both the dark matter and galaxy catalogs, while
in §4, we study the dependence of the model 3PCF on
galaxy luminosity and color. In §5, we compare the
model 3PCF with SDSS observations and discuss the ef-
fects of binning. We relate the 3PCF to a simple non-
linear bias model in §6. In §7, we summarize and discuss
our findings.
2. METHODS
2.1. Dark matter simulations
We investigate clustering statistics using cosmological
N -body simulations of structure formation in the concor-
dance, flat ΛCDM cosmology with ΩΛ = 0.7 = 1 − Ωm,
h = 0.7, and σ8 = 0.9, where Ωm, ΩΛ are the present
matter and vacuum densities in units of the critical den-
sity, h is the Hubble parameter in units of 100 km s−1
Mpc−1, and σ8 specifies the present linear rms mass fluc-
tuation in spheres of radius 8 h−1Mpc. The simulations
used here were run using the Adaptive Refinement Tree
N−body code (ART, see Kravtsov et al. 1997 for de-
tails), which implements successive refinements in space
and time in high-density environments. The primary
simulation box we use is 120 h−1Mpc on a side (hereafter,
L120); the number and mass of each dark matter particle
are Np = 512
3 ≈ 1.34× 108 and mp = 1.07× 10
9h−1M⊙
respectively. This simulation has been previously used
to measure several properties of dark matter halos and
subhalos (e.g., Allgood et al. 2006; Wechsler et al. 2006;
Conroy et al. 2006; Berrier et al. 2006). In order to in-
clude more massive halos and study the effects of the
size of the sample on the statistical analysis, we also use
a second simulation with the same cosmological param-
eters in a bigger box, with 200 h−1Mpc on a side (which
was also used to measure halo shapes in Allgood et al.
2006). This box contains Np = 256
3 particles with mass
mp = 3.98× 10
10 h−1M⊙, therefore it will lack low mass
(and luminosity) objects that are included in the L120
From these dark matter samples, virialized concentra-
tions of particles are identified as halos. In order to find
these halos and their constituent subhalos (concentra-
tions of virialized matter inside bigger halos), a variant
of the Bound Density Maxima halo finding algorithm of
Klypin et al. (1999) is used. This algorithm assigns den-
sities to each particle using a smoothing kernel on the 32
nearest neighbors; centering on the highest-overdensity
particle, each center is surrounded by a sphere of radius
rfind = 50h
−1kpc. The algorithm removes unbound par-
ticles when calculating the properties of the halos. The
halo catalog is complete for halos with more than 50
particles, which corresponds to a minimum halo mass of
1.6×1010 h−1M⊙ for the L120 box and 2.0×10
12 h−1M⊙
for halos in the L200 box.
Henceforth, we will use the terms “distinct halo” to
mean any halo that is not within the virial radius of a
larger halo, “subhalo” to indicate a halo that is within
the virial radius of a larger halo, and “galactic halo” to
refer to the halo directly hosting a galaxy. Using this
terminology, the galactic halo of a satellite galaxy is a
subhalo while the galactic halo of a central or isolated
galaxy will be a distinct halo.
2.2. From halos to ‘galaxies’
It is expected that galaxy properties depend in detail
not only on the dark matter clustering, but also on the
Modeling galaxy three-point statistics 3
gas dynamics, radiative processes, and feedback mecha-
nisms that affect the baryonic components. A program
to include all those physical processes in building mock
galaxy catalogs from simulations would require introduc-
ing a number of free parameters and assumptions which
could partially obscure the relevant mechanisms that de-
termine the shape and amplitude of the 3PCF. Our ap-
proach is instead more empirical—to associate galaxies
of given properties with simulated dark matter halos and
subhalos, using halo properties that we can measure in
the simulation, and to see whether this one-to-one corre-
spondence predicts a galaxy 3PCF that is consistent with
the observations. As described below, the assignment of
galaxies to halos was designed to reproduce certain fea-
tures of the observed galaxy distribution, but the 3PCF
was not one of these. As a result, the 3PCF constitutes
a non-trivial test of this approach.
The primary galaxy samples used here are created by
assigning galaxy luminosities and colors drawn from the
SDSS redshift survey to dark matter halos and subhalos,
using the maximum circular velocity at z = 0, Vmax, as
an indicator of the halo virial mass. Vmax has been found
to be a good proxy of the galaxy potential well, which
is a good indicator of stellar mass (Kravtsov et al. 2004;
Conroy et al. 2006).
A galaxy luminosity in the r−band is assigned to
each halo by matching the cumulative velocity function
n(> Vmax) of all galactic halos (distinct halos and sub-
halos) to the observed SDSS r-band luminosity function
(Blanton et al. 2003b) at z = 0.1 (the approximate mean
redshift of the main spectroscopic SDSS galaxy sample
used to estimate the luminosity function). To correct
to z = 0 magnitudes, we use the code kcorrect v3 2
(Blanton et al. 2003a). Since the limited size of the box
gives us an upper limit on the luminosities which can
be reliably studied (in the statistical sense), and at the
same time we cannot sample the lowest-luminosity ob-
jects due to limited spatial resolution, for the L120 box
we present results for galaxies in the absolute magnitude
range −19 ≥ Mhr ≥ −22, where M
r ≡ Mr − 5 logh.
The L200 box has a lower spatial resolution, therefore
it contains only brighter objects, with Mhr
∼ − 20. In
order to assign colors to the galaxies, we use the proce-
dure described by Wechsler (2004) and Tasitsiomi et al.
(2004). This method uses the relation between local
galaxy density (defined as the distance to the tenth near-
est neighbor brighter than Mhr = −19.7 and within
cz = 1000 km/s) and color observed in the SDSS, using
the CMU-Pitt Value Added Catalog constructed from
DR1 (Abazajian et al. 2003), to assign a color to each
mock galaxy. We use the distant observer approxima-
tion (Bernardeau et al. 2002) to obtain positions in red-
shift space. Table 1 describes the subsamples used in this
study.
Although the general association of galaxies with dark
matter subhalos seems quite robust, the detailed asso-
ciation of galaxy properties with subhalo properties is
less clear. While galaxy luminosity is expected to be
quite tightly connected to velocity, the circular velocity
of a given subhalo decreases with time due to tidal strip-
ping as it interacts with its host halo. Observable galaxy
properties such as luminosity and color will likely be less
affected by this process. This implies that galaxy ob-
TABLE 1
Subsamples at z = 0
Box Subsample Number Density
of objects [(h−1Mpc)−3]
L120 All objects Mhr < −19 25371 1.5× 10
L120 −19 < Mhr < −20 15432 8.9× 10
L120 −20 < Mhr < −21 8153 4.7× 10
L120 red (g − r > 0.7) 10059 5.8× 10−3
L120 blue (g − r < 0.7) 15312 8.9× 10−3
L200 All objects Mhr < −20 43564 5.4× 10
L200 −20 < Mhr < −21 30575 3.8× 10
L200 −21 < Mhr < −22 9921 1.2× 10
L200 red (g − r > 0.7) 17278 2.2× 10−3
L200 blue (g − r < 0.7) 23748 2.9× 10−3
servables may be more strongly correlated with Vmax,acc,
the maximum circular velocity of the halos at the mo-
ment of accretion onto their host, than with the current
maximum circular velocity, Vmax,now. This conclusion
is supported by measurements of two-point statistics on
both large and small scales in simulations (Conroy et al.
2006; Berrier et al. 2006). Motivated by these consider-
ations, we construct galaxy catalogs using both Vmax,acc
and Vmax,now. We also use galaxy catalogs at different
redshifts in order to study the evolution of the 3PCF
with time. All these additional catalogs use the L120 box
and have the same spatial density as the sample of halos
selected by Vmax,now. We note that the model for as-
signing color is also not uniquely determined. It is, how-
ever, sufficient to match the two-point clustering length
for red and blue galaxies and several observed properties
of galaxy clusters (Zehavi et al. 2005). Future measure-
ments of both clustering statistics and of the properties
of galaxies in groups and clusters should help to further
refine these galaxy assignment models.
2.3. Measuring the 3PCF
Just as the 2PCF measures the excess probability of
finding two objects separated by a distance r, the 3PCF
describes the probability of finding three objects in a
particular triangle configuration compared to a random
sample. The probability of finding three objects in three
arbitrary volumes dV1, dV2, and dV3, at positions r1, r2
and r3 respectively, is given by (Peebles 1980)
P = [1 + ξ(r12) + ξ(r23) + ξ(r31) + ζ(r12, r23, r31)]×
n̄3dV1dV2dV3, (1)
where n̄ is the mean density of the objects, rij ≡ ri − rj
the distance between two objects, ξ is the 2PCF, and ζ
is the 3PCF:
ξ(r12)= 〈δ(r1)δ(r2)〉 (2)
ζ(r12, r23, r31)= 〈δ(r1)δ(r2)δ(r3)〉; (3)
here δ is the fractional overdensity in the dark matter
field or in the distribution of galaxies. Since the 3PCF
depends on the configuration of the three distances, it
is sensitive to the 2-D shapes of the spatial structures,
at large and small scales (Sefusatti & Scoccimarro 2005;
Gaztañaga & Scoccimarro 2005). Motivated by the “hi-
erarchical” form of the N-point functions, ζ ∝ ξ2, found
4 Maŕın, Wechsler, Frieman and Nichol
by Peebles & Groth (1975), we use the reduced 3PCF
Q(r, u, α) to present our results:
Q(r, u,α ) =
ζ(r, u, α)
ξ(r12)ξ(r23) + ξ(r23)ξ(r31) + ξ(r31)ξ(r12)
. (4)
This quantity is useful since Q is found to be close to
unity over a large range of scales even though ξ and ζ vary
by orders of magnitude (Peebles 1980). To parametrize
the triangles for the 3PCF measurements, r ≡ r12 sets
the scale size of the triangle, while the shape param-
eters are given by the ratio of two sides of the trian-
gle, u = r23/r12, and the angle between the two sides
of the triangle α = cos−1(r̂12 · r̂23), where r̂12, r̂23
are the unit vectors of the first two sides. Following
Gaztañaga & Scoccimarro (2005), triangles where α is
close to 0◦ or 180◦ are referred to as “elongated config-
urations”, while those with α ∼ 50◦ − 120◦ are referred
to as “rectangular configurations”,
We calculate the 2PCF using the estimator of
Landy & Szalay (1993),
DD − 2DR+RR
. (5)
Here, DD is the number of data pairs normalized by
ND ×ND/2, DR is the number of pairs using data and
random catalogs normalized by NDNR, and RR is the
number of random data pairs normalized by NR×NR/2.
The 3PCF is calculated using the Szapudi & Szalay
(1998) estimator:
DDD − 3DDR+ 3DRR−RRR
, (6)
where DDD, the number of data triplets, is normalized
by N3D/6, and RRR, the random data triplets, is nor-
malized by N3R/6. DDR is normalized by N
DNR/2, and
DRR by NDN
We estimate the errors using jack-knife re-sampling.
From each galaxy catalog, we construct sixteen subsam-
ples of L120 or L200; within each of them we remove a
different region (30 × 602 (h−1Mpc)3 for the L120 box,
and 50×1002 (h−1Mpc)3 for the L200 box). The variance
σJK of Q is calculated as:
σ2JK =
N − 1
(Qi − Q̄i)
2, (7)
where N = 16 is the number of subsamples, Qi is the
value for the i−th subsample, and Q̄i is the mean of the
Qi. We note in passing that the validity of jack-knife
resampling as a method to estimate the errors has not
been explicitly tested with mock catalogs for three-point
statistics. Although it is beyond the scope of this paper,
this may be an interesting topic of future investigation
especially once the statistical power of the measurements
improves.
To compute the 2PCF and 3PCF, we use the NPT
software developed in collaboration with the Auton Lab
at Carnegie Mellon University. NPT is a fast implemen-
tation of the NPCFs using multi-resolution kd-trees to
compute the number of pairs and triplets in a dataset.
For more details and information on the algorithm, see
Moore et al. (2001), Gray et al. (2004), and Nichol et al.
(2006).
3. THE 3PCF OF GALAXIES AND DARK MATTER
We estimate the reduced 3PCF for the distribution of
dark matter and for galaxies for different triangle con-
figurations, focusing on the scale and shape dependence
of the 3PCF. We also investigate its time evolution and
how it depends on the selection criterion for subhalos.
We study Q(r, u, α) in both real and redshift space, in
order to compare our results with current observations
and disentangle galaxy biasing effects from those which
are consequences of redshift distortions.
In order to distinguish scale and, most importantly,
shape effects, and to keep the errors as small as possi-
ble, we have chosen an intermediate-resolution binning
scheme. For studies of equilateral triangles (u = 1 and
α = π/3 rad), we use bins of size ∆ log(r) = 0.1. For
measurements of the shape dependence of the 3PCF, we
use triangles with four different scales r = 0.75, 1.5, 3, 6,
and 9 h−1Mpc, using the shape parameters u = 2, and 15
angular bins separated by ∆α = π/15 rad; the resolution
of the bins is given by ∆rij = ±0.03rij. This resolution is
sufficient to see the most important features of the 3PCF
even on small scales, although it is not sufficient to dis-
tinguish the “finger-of-god” effect at the smallest scales
in redshift space, where Q(α) varies very little except
at very small or elongated angles, where it increases to
many times the mean value (Gaztañaga & Scoccimarro
2005).
3.1. The 3PCF in real space
Figures 1 and 2 show the 3PCF for dark matter par-
ticles and for galaxies from the L120 and L200 simula-
tions. Here we plot results for dark matter (thick solid
line), galaxies in halos selected by Vmax,now (thin solid
line), and galaxies in halos selected by Vmax,acc (long-
dashed line) for z = 0 and for halos selected by Vmax,now
at z = 1 (short-dashed line) and z = 2 (dotted line).
Figure 1 shows the reduced 3PCF for equilateral trian-
gles, Qeq(r), in real (left panel) and redshift space (right
panel). In real space, the reduced 3PCF for the dark
matter is only weakly scale dependent on small scales,
decreases rapidly with increasing scale around r ∼ 3h−1
Mpc, and falls off more slowly on larger scales. This
behavior is broadly consistent with previous N-body re-
sults (e.g. Scoccimarro et al. 1998) and with expectations
from leading order non-linear perturbation theory on the
largest scales (shown as the thick red long-dashed curve
in Figure 1 left panel), with loop-corrected perturbation
theory on intermediate scales where the rms perturba-
tion amplitude δ(r) is of order unity (the transition to
the strongly non-linear regime; adding more orders to the
calculation would increase the agreement to the N -body
dark matter 3PCF amplitude), and with quasi-stable hi-
erarchical clustering on the smallest scales. Tests with
the L200 box indicate that the downturn in Qeq for dark
matter at scales larger than r ∼ 8h−1 Mpc is likely due
to finite volume effects.
At scales below r ∼ 10h−1 Mpc, the dark matter Qeq is
larger than that for the galaxies; this behavior is broadly
expected if galaxies are more strongly clustered than
(positively biased with respect to) the mass, cf. eqn.(10).
Modeling galaxy three-point statistics 5
Fig. 1.— The reduced 3PCF, Qeq(r), for equilateral triangles in the L120 box. Left: Results in real space. Thick solid line (black): dark
matter; thin solid line (black): galactic halos selected by Vmax,now at z = 0; short-dashed (green): galactic halos selected by Vmax,now at
z = 1; dotted (red): galactic halos selected by Vmax,now at z = 2; long-dashed (blue): galactic halos selected by Vmax,acc at z = 0; thick
long-dashed (red): leading-order perturbation theory, dark matter. Right: Results in redshift space. Line types correspond to the same
dark matter and halo samples as in the left-hand plot. Error bars are calculated using jack-knife resampling and are only shown for one of
the samples for clarity.
At higher redshift, evolution is seen in Qeq(r) that is con-
sistent with expectations from non-linear gravitational
evolution: on the largest scales, the amplitude of Qeq is
unchanging, as predicted from leading order perturbation
theory, while the sharp break associated with the tran-
sition to the strongly non-linear regime moves to larger
scales as the density perturbation amplitude increases
with time.
Comparing results for subhalos selected by Vmax,now
and by Vmax,acc, differences in the amplitude of Qeq ap-
pear on small scales, r . 3 h−1 Mpc. In halo-model
language, on these scales the 3PCF is sensitive to the
internal structure of halos, i.e., to the one- and two-halo
terms, while the three-halo term dominates the 3PCF on
larger scales (Wang et al. 2004; Takada & Jain 2003).
In Figure 2, the top panels show how the reduced 3PCF
depends on triangle shape in real space. In general,
the 3PCF for elongated configurations is greater than
for rectangular configurations. This is a consequence of
the fact that, in non-linear gravitational instability, ve-
locity flows tend to occur along gradients of the density
field (Bernardeau et al. 2002). The 3PCF is larger for
dark matter than for galaxies for all shapes and scales,
although the difference is larger for elongated configura-
tions. The difference in 3PCF amplitude between rect-
angular and elongated configurations is larger on large
scales, in broad agreement with leading-order theoreti-
cal predictions (Bernardeau et al. 2002): on large scales,
the strong shape dependence is determined by pertur-
bative non-linear dynamics; on smaller scales, the shape
dependence is washed out since the coherence between
the velocity and density fields gives way to virialized mo-
tions. This scale dependence of the 3PCF shape is also
reflected in the redshift evolution: in the L120 box (left
panel), the galaxy 3PCFs at z = 1 and 2 (green-dashed
and red-dotted curves) essentially retain the primordial
shape dependence of leading-order non-linear perturba-
tion theory, i.e., at those redshifts, these scales are still
close to the quasi-linear regime. At r = 3 h−1Mpc, the
largest evolution in Q(α) is found for elongated configu-
rations.
As was seen in Figure 1, the effect of changing halo se-
lection from Vmax,acc to Vmax,now on the 3PCF shape ap-
pears only on small scales, r . 1.5h−1 Mpc, i.e., roughly
within the scale of a typical cluster-mass host halo.
On the larger scales probed in the L200 box (right
panel of Figure 2), the galaxy reduced 3PCF (open cir-
cles) tracks the shape of the dark matter 3PCF fairly
well. The difference between the galaxy and dark matter
3PCF amplitudes on these scales is reasonably well fit
by a simple bias prescription: the thin blue curve is the
biased 3PCF that results from fitting the galaxy 3PCF
with eqn. (10); see §6. We also see that the jack-knife
errors increase on the largest scales, where the effects of
the finite box size start to become evident. For compar-
ison, the red long-dashed curve is the 3PCF of the dark
matter from leading-order non-linear perturbation the-
ory (Bernardeau et al. 2002; Jing & Borner 1997). On
the largest scales, it is in reasonable agreement with the
measured 3PCF for the dark matter.
3.2. The 3PCF in redshift space
Redshift distortions have been studied in depth (and
are a useful tool to constrain cosmological parameters)
for the power spectrum (e.g., Bernardeau et al. 2002;
da Ângela et al. 2005; Tinker 2007) and for the bis-
pectrum (Scoccimarro et al. 1999; Verde et al. 2002;
Sefusatti et al. 2006). Some comparisons have been
made for the 3PCF as well (Matsubara & Suto
1994; Takada & Jain 2003; Wang et al. 2004).
Gaztañaga & Scoccimarro (2005) found that the
redshift distortions do not have a strong dependence on
the cosmological parameters.
The right panel of Figure 1 and the bottom panels
6 Maŕın, Wechsler, Frieman and Nichol
0 50 100 150
0 50 100 150 0 50 100 150
0 50 100 150
0 50 100 150
Fig. 2.— Measurement of the reduced 3PCF as a function of triangle shape, Q(α), for different scales r, with side ratio u = r23/r12 = 2
fixed. Left: Results in the L120 box for r = 0.75, 1.5, and 3 h−1Mpc (from left to right) in real space (top panels) and redshift space
(bottom panels) for dark matter and galaxies; line types correspond to the same dark matter and halo samples as in Figure 1. Right:
Results in the L200 box for r = 6 (left) and 9 h−1Mpc (right) in real (top) and redshift space (bottom); Thick solid line (black): dark
matter; open circles: galaxies in halos selected by Vmax,now at z = 0; thin solid line (blue): predicted galaxy 3PCF using the dark matter
3PCF Qdm and eqn. (10) with best-fit bias parameters c1, c2 obtained from fitting the galaxy 3PCF at r = 9h
−1 Mpc (see §6); long
dashed (red): leading order non-linear perturbation theory prediction for dark matter reduced 3PCF. Error bars calculated using jack-knife
resampling method.
of Figure 2 show the 3PCF in redshift space. The first
feature that can be seen is a dramatic decrease in the
amplitude and in the scale and shape dependence of Q
compared to the real-space measurements. For exam-
ple, for equilateral triangles, the redshift space Qz(r)
is reduced compared to the real space Q(r) at small
scales and increased with respect to the real-space re-
sults on larger scales. The overall effect is that Qz(r)
is nearly scale-independent, i.e., the clustering appears
more hierarchical in redshift space (Suto & Matsubara
1994; Matsubara & Suto 1994; Scoccimarro et al. 1999).
Moreover, in redshift space, the suppression of the galaxy
3PCF relative to that of the dark matter is apparent
on all scales; it appears to be relatively independent of
scale and configuration and is larger than the relative
suppression in real space, consistent with earlier results
(Wang et al. 2004; Gaztañaga et al. 2005). It also ap-
pears that there is very little redshift evolution of the
galaxy 3PCF in redshift space; the measurements at
z = 1 and z = 2 are nearly indistinguishable from each
other. With regard to halo selection, as in real space we
find that QVmax,now < QVmax,acc, but the differences
between them are smaller than in real space.
Together, these results suggest that the shape and scale
dependence of the reduced 3PCF in redshift space on the
scales shown here are largely determined by redshift dis-
tortions, with non-linear gravitational evolution playing
a subdominant role.
4. OBSERVING THE 3PCF: LUMINOSITY AND COLOR
DEPENDENCE
To investigate the dependence of the three-point clus-
tering on galaxy luminosity and color and to make direct
comparisons with measurements from recent redshift sur-
veys, we calculate the 3PCF for galaxies with luminos-
ity and color cuts similar to those that have been ap-
plied to redshift survey data samples, with luminosity
and color information obtained as described in §2. The
luminosities are assigned according to Vmax,now in the
L120 and L200 boxes, since we hace those measurements
for both boxes. As seen in the previous section, the use
of Vmax,now is justified since the differences with respect
to Vmax,acc are only significant on small scales (r<∼ 1.5
h−1Mpc) in real space and are almost completely dis-
sapear on redshift space. We adopt the same binning
scheme used in the previous section.
4.1. Luminosity dependence
The left panels of Figure 3 show results for the re-
duced 3PCF for equilateral configuratons in two lumi-
nosity bins, in real and redshift space. For equilateral
triangles, there is a small difference in 3PCF between
the luminosity samples in real space: the fainter galaxies
have larger Q(r) than the brighter ones, as expected in
a simple linear bias model (eqn. 10). The redshift-space
3PCFs for these galaxies are roughly constant with r,
Qz,gal(r) ∼ 0.7, in agreement with SDSS measurements
for these configurations (see Figures 7-9 in Kayo et al.
2004). There is a very slight difference between the re-
duced 3PCF amplitudes for different luminosity bins in
redshift space, but it is not statistically significant for a
dataset of this size. This result qualitatively agrees with
the SDSS results of Kayo et al. (2004) who also found
almost no luminosity dependence of the reduced 3PCF
on these scales. They found a slightly higher amplitude
for the reduced 3PCF for the −19 > Mhr > −20 sample
compared to that of brighter galaxies, but their results
for the two luminosity samples were consistent within
the error bars. The brightest galaxies (dotted line) show
significant fluctuations in both real and redshift space;
we think this behavior is due to the small size of the box
Modeling galaxy three-point statistics 7
by luminosity
real space
by luminosity
redshift space
by color
real space
by color
redshift space
Fig. 3.— The reduced 3PCF, Q(r), for equilateral triangles as a
function of galaxy luminosity and color in real and redshift space.
Top left : Qeq(r) in real space for galaxies divided into luminosity
bins; long dash-dotted (cyan): −19 > Mhr > −20; short-dashed
(green): −20 > Mhr > −21; dotted (magenta): −21 > M
r > −22.
The brightest sample comes from the L200 box, the other two are
from the L120 box. Bottom left: Results in redshift space; line
types correspond to the same galaxy samples as in the top left
panel. Top right: Qeq(r) in real space for galaxies divided accord-
ing to color, using the L120 box; long-dashed (red): red galaxies
(g− r > 0.7); short dash-dotted (blue): blue galaxies (g− r < 0.7).
Bottom right Results in redshift space; line types correspond to
the same galaxy samples as in the top right panel. Error bars are
calculated using jack-knife respampling.
and the low density of objects.
Figure 4 shows the dependence of Q(α) on galaxy lu-
minosity in real (curves) and redshift space (points). We
use the same ordinate scale for all the plots to empha-
size where configuration effects are more important. The
top panels show results for small scales, calculated with
the L120 box. On these scales, the reduced redshift-
space 3PCF of the brighter sample (−20 > Mhr > −21,
short dashed curve for real space, filled triangles for red-
shift space) is slightly lower than for the fainter sample
(−19 > Mhr > −20, long dashed-dotted curve for real
space, filled squares for redshift space) for all angles,
consistent with the results for Qz(r) using equilateral
triangles. The lower plots in Figure 4 show the lumi-
nosity dependence on larger scales, measured using the
L200 box; note that the luminosity bins in the lower
plots are −20 > Mhr > −21 and −21 > M
r > −22.
The characteristic U-shape of Q(α) appears clearly in
the redshift-space measurements on scales larger than
r = 6 h−1Mpc. The luminosity dependence of Q on
these scales appears non-existent in redshift space and
only slight in real space. There are hints that the reduced
3PCF for fainter galaxies may have slightly higher am-
plitude and stronger shape dependence than for brighter
galaxies, but these trends are not statistically significant
in the samples studied here. Rather, the strong lumi-
nosity dependence observed for the 2PCF (Zehavi et al.
2005) appears to be closely matched by a correponding
dependence of the 3PCF, such that the reduced 3PCF Q
is roughly independent of luminosity.
L120 box L120 box
0 50 100 150
L200 box
0 50 100 150
L200 box
Fig. 4.— Q(α) shape dependence for galaxies divided by lumi-
nosity, for r = 1.5, 3, 6, and 9 h−1Mpc, with fixed u = 2 in
real (lines) and redshift space (symbols). Top two plots are re-
sults for the L120 box: filled squares and long dash-dotted (cyan):
−19 > Mhr > −20; filled triangles and short-dashed (green):
−20 > Mhr > −21; triangles are slightly shifted to the right for
clarity. Bottom plots show results for the L200 box: filled trian-
gles and short-dashed (green): −20 > Mhr > −21; filled circles and
dotted (magenta): −21 > Mhr > −22; circles are slightly shifted
to the right for clarity. Error bars are calculated using jack-knife
resampling and are shown only for one of the samples.
L120 box L120 box
0 50 100 150
L200 box
0 50 100 150
L200 box
Fig. 5.— Q(α) shape dependence for galaxies divided by color,
in the L120 box (top) and L200 box (bottom) in real (lines) and
redshift space (symbols). Open squares and long-dashed line (red):
red (g − r > 0.7) galaxies. open triangles and short dash-dotted
(blue): blue (g − r < 0.7) galaxies; squares are slightly shifted
to the right for clarity. Error bars are calculated using jack-knife
resampling and are shown only for one of the samples.
4.2. Color dependence
The right panels of Figure 3 show the reduced 3PCF
Q(r) for equilateral triangles separately for red and blue
galaxies in real (top) and redshift space (bottom). Due
to limited statistics, we use the full luminosity range in
each color bin, as opposed to Kayo et al. (2004), who di-
8 Maŕın, Wechsler, Frieman and Nichol
vided the color bins into luminosity subsamples as well
(see their Figure 9). Since the luminosity functions for
red and blue galaxies differ, our red and blue samples
have different characteristic luminosities — the red sam-
ple is on average brighter. In both real and redshift space,
the red galaxies appear to have a slightly higher reduced
3PCF amplitude. Comparing with the left panels, this
difference is in the opposite sense from that expected
from the fact that the red galaxies are brighter; put an-
other way, if we were to compare red and blue samples
of the same luminosity, the color difference in Q would
likely be larger than that seen here. On the other hand,
we should not overinterpret these trends, since the differ-
ences are within the statistical errors. Kayo et al. (2004)
found a similarly weak dependence of Q in redshift space
on color.
Figure 5 shows the color dependence of Q(α) in real
(curves) and redshift space (points), for the same sam-
ples and configurations as in Figure 4. In real space, on
scales larger than about 3 h−1 Mpc, Q for red galaxies
is larger for rectangular configurations and smaller for
elongated configurations than for blue galaxies. This be-
havior is qualitatively consistent with a picture in which
red galaxies preferentially occupy the inner regions of
clusters, while blue galaxies tend to trace out more el-
liptical or filamentary structures. In redshift space, the
trend with color is largely washed out. These redshift
space results appear more consistent with the observa-
tions of the 2dFGRS (Gaztañaga et al. 2005), where the
color differences for Q in redshift space are smaller than
those seen in the SDSS by Kayo et al. (2004).
The fact that red galaxies have a larger two-point clus-
tering amplitude than blue galaxies and that the reduced
3PCF for red galaxies is also larger than for blue galax-
ies (at least for rectangular configurations) suggests that
red galaxies have a larger quadratic (non-linear) bias,
Cf. eqn. 10. This is qualitatively consistent with the ob-
served morphology-density or color-density relation ac-
cording to which red galaxies are preferentially found in
dense regions, since the latter contribute more strongly
to the higher-order correlations. The qualitative agree-
ment between the two- and three-point observations and
our method for assigning galaxy colors confirms that the
color of a galaxy, a consequence of many physical pro-
cesses occurring inside the galaxies, depends largely on
the surrounding environment. However, we note that
the color assignment for these subhalos is the most un-
certain part of the model; future work will be required
to determine how clustering statistics will change if more
sophisticated schemes are adopted.
4.3. Redshift distortions: Galaxy Type and Evolution
So far, we have explored some differences between mea-
surements in real and redshift space of the 3PCF for dif-
ferent simulated galaxy samples. Here we investigate in
more detail whether the redshift distortions of the 3PCF
are universal or instead depend on the type of galaxy
studied and to what extent they evolve with time. We
study the behavior of the quantity
∆Q(r, u, α) ≡ Qz(r, u, α)−Qr(r, u, α), (8)
where Qr, Qz represent the reduced 3PCF measured in
real and redshift space, respectively. We explore ∆Q
L120 box
dark matter
L120 box
blue galaxies
red galaxies
L120 box
L200 box
Fig. 6.— ∆Q(r, u, α) ≡ Qz(r, u, α)−Qr(r, u, α) as a function of
galaxy type and redshift. Top: ∆Q for dark matter at z = 0 and for
galaxy samples at different redshifts in the L120 box, for equilateral
triangles (top left) and as a function of angle for triangles with
r = 3 h−1Mpc and u = 2 (top right). Bottom: ∆Q as a function
of galaxy type for equilateral triangles in the L120 box (bottom
left) and as a function of angle for r = 6 h−1Mpc and u = 2 in the
L200 box (bottom right).
as a function of galaxy type (luminosity and color) and
epoch. Figure 6 shows ∆Q for equilateral triangles as a
function of scale and also the configuration dependence
for triangles with fixed r, u = 2, and different opening
angles α.
In general, the trends are similar to those seen above:
Qz < Qr for small scales and for elongated triangle con-
figurations, while the opposite behavior is seen for larger
scales and for rectangular configurations. At z = 0, for
equilateral triangles ∆Q(r) appears to display a roughly
universal scale dependence, independent of galaxy type,
with ∆Q(r) ≃ 0.67r−2 − 2.6r−1 − 0.02r + 0.88 over the
range 0.5 ≤ r ≤ 5 h−1Mpc. On the other hand, for
r = 6h−1 Mpc and u = 2, the shape dependence of
∆Q shows more dependence on galaxy type, with blue
galaxies having larger values than red galaxies and bright
galaxies smaller values than faint galaxies. Note that the
redshift distortions of Q appear insensitive to whether
the subhalos are identified at the present or at the time
they are first accreted onto host halos.
We see clear evolution of ∆Q with redshift, as expected
since redshift distortion effects should become more pro-
nounced as perturbations become more non-linear. For
equilateral triangles, the upper left panel of Figure 6
shows that the scale r where ∆Q ∼ 0 increases with
time. The shape dependence of ∆Q at fixed scale also
appears to increase with time.
5. OBSERVING THE 3PCF: COMPARISON TO SDSS DATA
AND BINNING EFFECTS
Below, we compare our results with measurements of
the 3PCF from the SDSS by Nichol et al. (2006). The
SDSS sample is magnitude limited, with mr < 17.5, and
has additional cuts in absolute magnitude, −19 < Mr <
−22, and in redshift, 0.05 < z < 0.15. In order to com-
pare with this data, we randomly resample the galaxies
from the L120 simulation (in particular, the Vmax,noew
Modeling galaxy three-point statistics 9
Fig. 7.— Top: Qz(r, u = 2, α) at r = 1.0, 2.0 and 4.0 h
−1Mpc
from SDSS observations (green points) from Nichol et al. (2006)
and from our pseudo-flux-limited-sample in the L120 box (long-
dashed lines) with the same binning scheme used in the mentioned
paper. Bottom: Effect of binning in the 3PCF measurements.
Qz(r, u = 2, α) at r = 2.0 and 4.0 h
−1Mpc in the L120 box and
r = 10 h−1Mpc in the L200 box for the simulations in a volume-
limited sample. solid (red): results using a wide binning scheme
(Nichol et al. 2006); long-dashed (black): results with narrow bin-
ning. In the middle panel, we plot the results using SDSS data
with wide (green points) and narrow (triangles) binning.
sample at z = 0), so that they have the same absolute
magnitude distribution as the SDSS flux-limited sample
used in Nichol et al. (2006); we call these pseudo-flux-
limited samples. This resampling technique does not
properly model the distance-dependent selection func-
tion of the SDSS, but it should reproduce its clustering
properties on average. Moreover, since, as we have just
shown, the luminosity dependence of the reduced 3PCF
is weak, we expect that this procedure should be suffi-
ciently accurate for our purposes.
Nichol et al. (2006) measure the shape dependence of
Q(α) for four different length scales, using wide bins in
r, u, and α (see their paper for more details): ∆r = 1
h−1Mpc, ∆u = 1 and ∆α = 0.1 rad. In Figure 7,
we compare the SDSS results (green points) to the cal-
culation of Qz(r, u = 2, α) using our model redshift-
space, volume-limited sample 3PCF (solid lines) and
our pseudo-magnitude-limited sample (dotted line), each
with the same binning as the data. We use the L120 box
for small scales and the L200 box for the r = 10 h−1Mpc
measurements. As the top panels of Figure 7 show, the
model and data show good agreement within the model
jack-knife error bars in amplitude as well as in the shape
of Q(α).
Over the range of scales shown here, both the SDSS
data and the simulations show relatively little shape de-
pendence for the reduced 3PCF amplitude. This is in
contrast to the results of §4 above and to the 2dFGRS re-
sults of Gaztañaga et al. (2005) on similar scales, where
a significantly stronger shape dependence of Q(α) is evi-
dent. As noted by Gaztañaga & Scoccimarro (2005) and
Kulkarni et al. (2007), the differences can be traced to
the binning scheme: the relatively wide binning scheme
used here results in smearing and therefore suppression
of the U-shape of Q over most scales of interest. This
effect is not due primarily to the binning in α: small
bins in both r and u are necessary to see the effects of
shape-dependent clustering.
To illustrate these effects in more detail, in the lower
panels of Figure 7 we also show results for the same sam-
ple but with a narrower binning scheme, using ∆r =
0.1 h−1Mpc (ten times smaller), ∆u = 0.2 (five times
smaller), and ∆α = 0.05 rad (two times smaller). With
the narrower bins (shown by the dashed curves in Figure
7), the shape-dependence ofQ is more pronounced for the
simulation, especially on larger scales. For comparison,
for r = 4 h−1Mpc (lower center panel), the solid trian-
gles shows results for the SDSS flux-limited sample us-
ing a similar narrow binning scheme Nichol et al. (2006),
again showing good agreement between the model and
the data.
6. GALAXY BIAS AND THE 3PCF
As we have seen, the 3PCF predicted for galax-
ies differs systematically from that expected for dark
matter. These differences reflect differences in the
spatial distributions of these two populations; higher-
order statistics can therefore provide important con-
straints upon the bias between galaxies and dark mat-
ter (Fry & Gaztañaga 1993; Frieman & Gaztañaga 1994)
and its dependence upon galaxy properties.
On large scales, where the rms dark matter and galaxy
overdensities are small compared to unity, it is com-
mon to adopt a deterministic, local bias model (e.g.
Fry & Gaztañaga 1993),
δgal = f(δdm) = b1δdm +
δ2dm + ..., (9)
where δgal and δdm are the local galaxy and dark matter
overdensities smoothed over some scale R. We can use
the simulations above to test how well this simple bias
prescription characterizes the galaxy distribution and its
clustering statistics.
Figure 8 shows the relation between δgal and δdm for
all subhalos in the L200 box. The points show the over-
densities for the galaxy and dark matter fields in ran-
domly placed spheres of radius 10h−1 Mpc in both real
(left panel) and redshift space (right panel). The solid
black curves show the best quadratic fits of the form
in eqn. (9). The quadratic local bias model appears
to do a reasonable job in characterizing the mean rela-
tion. Nonetheless, there is significant scatter, either due
to stochastic bias or dependence of bias on other prop-
erties than δdm, that is not captured by this simple bias
model. The errors on these fits, also extended to samples
divided by galaxy luminosity, are shown in Fig. 9 by the
light solid contours.
To leading order, this bias prescription leads to a
relation between the galaxy and dark matter reduced
3PCF amplitudes of the form (Fry & Gaztañaga 1993;
Gaztañaga & Scoccimarro 2005)
Qgal =
(Qdm + c2) , (10)
where c1 = b1 and c2 = b2/b1. Also to leading order,
at low overdensities the relation between the galaxy and
dark matter 2PCF amplitudes in this model is given just
10 Maŕın, Wechsler, Frieman and Nichol
Fig. 8.— Galaxy vs. dark matter overdensities for all galaxies
in the L200 box measured in randomly placed spheres of radius
10h−1 Mpc in real (left panel) and redshift space (right panel).
Solid black curve denotes best fit using the quadratic bias relation
of eqn. 9. Long-dashed red line indicates the best linear bias fit
(b2 = 0) to the 2PCF, i.e., estimating b
= ξgal/ξdm. Short-dashed
blue curve indicates best fit of eqn. 10 to the 3PCF of the galaxies
and dark matter.
by the linear bias, ξgal = b
1ξdm. We can test how well
this bias prescription captures the clustering statistics
by fitting these relations to the dark matter and galaxy
2- and 3-point correlation functions in the simulation in
both real and redshift space and extracting the parame-
ters c1 and c2. Since the relation (9) holds for the den-
sity fields on some smoothing scale, we should fit the
correlation functions for separations comparable to or
slightly larger than this scale. For the 3PCF, we use
triangles with u ≡ r23/r12 = 2, r = 9 h
−1Mpc, and
weight equally all configurations with 00 < α < 1800
using the L200 box. To calculate the likelihood func-
tion, we use a method similar to that described in
Gaztañaga & Scoccimarro (2005), which is based on an
eigenmode analysis of the covariance matrix. We use
the jack-knife subsamples in real and redshift space to
construct covariance matrices, and we use only the dom-
inant eigemodes with values >
2/N where N = 16
is the number of jack-knife subsamples; we found that
adding further eigenmodes just increases artificially the
signal.
The best-fit parameter values from the 3PCF, substi-
tuted into Eqn. 9, are shown as the short-dashed blue
curves in Figure 8. Also, in the right panels of Fig-
ure 2 we show the predicted galaxy 3PCF (solid blue
curves) using the measured dark matter Qdm and the
best fitting c1, c2 parameters in Eqn. (10) from the fit at
r = 9h−1 Mpc. We see that the agreement with the mea-
sured galaxy 3PCF is very good in both real and redshift
space for triangles with r = 9h−1 Mpc; for configurations
with r = 6h−1 Mpc, the agreement in real space is still
quite good while in redshift space some deviations in the
shape-dependence appear.
In Figure 9 we show the fits for c1 and c2 for the dif-
ferent galaxy samples in the L200 box in both real and
redshift space: all galaxies (top panels), galaxies in the
absolute range −20 > Mhr > −21 (middle), and those in
the range −21 > Mhr > −22 (bottom). The thick oval
contours indicate the 1 and 2σ confidence intervals for a
∆χ2 distribution with two free parameters, constrained
using the 3PCF, via Eq. (10), and the large points in-
dicate the maxiumum likelihood values from the 3PCF.
The thin solid contours show the constraints on c1 and c2
using the fit of Eq. (9) to the measurements in Figure 8,
and the vertical line in each panel shows the estimate of
c1 from comparing the 2PCF amplitudes for galaxies and
dark matter using pairs selected from the triplets used to
measure the 3PCF. The best-fit values are shown in Ta-
ble 2: the first two columns show the best-fit parameters
1 , c
2 using the reduced 3PCF (eq. 10), the next two
columns, cδ1 and c
2, are calculated using the quadratic
bias model (eq. 9) fit to the counts in cells, and in the
last column c
1 is obtained from comparing the 2PCF of
galaxies and dark matter assuming a linear bias model,
c21 = ξgal/ξdm.
In agreement with previous measurements in surveys
and simulations, the bias parameters obtained from the
3PCF are degenerate, resulting in elongated contour el-
lipses; this could be mitigated to some extent by using
a larger variety of triangle configurations. In real space
(left panels of Figure 9), we see that the three methods
of extracting the bias parameters are in rough agree-
ment: there is a preference for c1 ∼ 1 and negative
c2 as was found for the 2dFGRS 3PCF measurements
(Gaztañaga et al. 2005). Note that the 3PCF fit tends
to overestimate c1 and to slightly overestimate c2 com-
pared to the other two methods. In redshift space (right
panels), the opposite is true: the 3PCF constraint tends
to underestimate c1. The right panels of Figs. 8 and
9 show that there is a larger discrepancy between the
3PCF fits and the counts-in-cells fit to Eqn. (9) in red-
shift space, suggesting that the relation (10) may not be
a good representation in redshift space.
While this comparison with the deterministic local
bias model is suggestive, to test it more quantitatively
one should use more triangle configurations and larger-
volume catalogs to enable more precise calculation of the
correlation matrices.
7. SUMMARY
We have studied the 3PCF of dark matter and galax-
ies in high-resolution dissipationless cosmological simu-
lations. The galaxy model associates dark matter ha-
los with galaxies by matching the halo velocity function,
including subhalos, with the observed galaxy luminosity
function by abundance, and has been shown previously to
provide an excellent match to observed two-point statis-
tics for galaxies (Conroy et al. 2006). Our primary re-
sults are as follows:
1. The reduced real-space 3PCF for both galaxies
and dark matter has strong dependence on scale
and shape. The shape dependence of the 3PCF
strengthens with increasing scale, in agreement
with previous simulation results for dark matter
and with expectations from non-linear perturba-
tion theory.
2. On small scales, or alternatively with increasing
time, the shape dependence of Q washes out as
Modeling galaxy three-point statistics 11
TABLE 2
Best-fit bias parameters in the L200 box
Subample c
All objects r-space 1.16
+0.21
−0.15
-0.19
+0.19
−0.16
+0.05
−0.12
-0.20
+0.19
−0.22
1.06±0.09
All objects z-space 0.86
+0.11
−0.14
-0.31
+0.15
−0.11
+0.05
−0.07
-0.32
+0.12
−0.18
1.03±0.08
−20 < Mhr < −21 r-space 1.08
+0.28
−0.18
-0.20
+0.28
−0.23
+0.07
−0.16
-0.08
+0.08
−0.19
1.01±0.09
−20 < Mhr < −21 z-space 0.86
+0.21
−0.16
-0.30
+0.20
−0.15
+0.06
−0.12
-0.34
+0.22
−0.16
1.00±0.08
−21 < Mhr < −22 r-space 1.42
+0.48
−0.29
-0.15
+0.33
−0.35
+0.06
−0.14
-0.36
+0.31
−0.05
1.19±0.10
−21 < Mhr < −22 z-space 0.99
+0.15
−0.35
-0.20
+0.21
−0.30
+0.12
−0.05
-0.43
+0.33
−0.07
1.13±0.13
0.5 1 1.5
0.5 1 1.5
Fig. 9.— Constraints on the model bias parameters c1 and c2 in
real (left panels) and redshift space (right panels), measured in the
L200 box for all galaxies (top), galaxies with −20 > Mhr > −21
(middle), and galaxies with −21 > Mhr > −22 (bottom). Thick
ellipses correspond to 1σ and 2σ constraints on parameters using
the reduced 3PCF fit to equation 10; symbols denote the minimum
χ2 value. Thin ellipses come from fitting the smoothed halo and
dark matter overdensities (equation 9) shown in Figure 8. The
vertical lines show the constraint on c1 from comparing the dark
matter and galaxy 2PCF amplitudes and fitting the ratio with a
linear bias model, c2
= ξgal/ξdm.
virial motions within halos replace coherent infall
on larger scales.
3. Redshift-space distortions attenuate the shape and
scale dependence of the reduced 3PCF and weaken
the evolution with redshift.
4. The reduced 3PCF shows only weak dependence on
galaxy luminosity and color; put another way, the
scaling between the 3PCF amplitude and the 2PCF
is predicted to be nearly independent of galaxy
type in this model. The trend of Q with color is
somewhat stronger than with luminosity: the re-
duced 3PCF is slightly enhanced for red galaxies
over blue, especially for elongated configurations.
5. Our model predictions are in excellent agreement
with the shape and scale dependence of the galaxy
3PCF measured in the SDSS when the same bin-
ning scheme is used. Since the results are highly
sensitive to the binning scheme, caution must be
exercised in comparing theory and observations of
the 3PCF. In combination with earlier results, this
comparison indicates that a simple scheme in which
galaxies and dark matter halos and subhalos are
associated in a one-to-one fashion based on maxi-
mum circular velocity can provide a good match to
a wide range of galaxy clustering statistics.
6. The effect on the 3PCF of changing the selection of
subhalos (i.e., of connecting galaxy luminosity to
Vmax,acc instead of Vmax,now) is evident on small
scales (r < 2 h−1Mpc) and for elongated configu-
rations, but is negligible on larger scales. Future
measurements of the 3PCF will help constrain dif-
ferent models for the association of galaxy luminos-
ity and color with subhalo properties.
7. On scales of order 10h−1 Mpc, a local, determinis-
tic bias scheme is in reasonable agreement with the
galaxy and dark matter distributions of the model.
The bias parameters extracted fromQgal are in rea-
sonable agreement with the δgal-δdm relation in real
space, less so in redshift space. Nevertheless, the
redshift-space constraints on the bias parameters
are in agreement with the 2dFGRS measurements
of the 3PCF.
We are indebted to Anatoly Klypin and Brandon All-
good for running and making available the simulations
used in this paper, which were run on the Columbia ma-
chine at NASA Ames and on the Seaborg machine at
NERSC (Project PI: Joel Primack), to Andrey Kravtsov
for running some of the halo catalogs used in this study,
and to Charlie Conroy for providing us with measure-
ments of vmax,acc. We thank Enrique Gaztañaga for
enlightening comments on measuring and interpreting
the 3PCF and comparing results of Q(α) with his es-
timator; and Cameron McBride for discussions on the
measurements of the bias parameters. We additionally
thank Andrey Kravtsov, Issha Kayo and Roman Scoc-
cimarro for several useful discussions. RHW was par-
tially supported by NASA through Hubble Fellowship
grant HST-HF-01168.01-A awarded by the Space Tele-
scope Science Institute, and also recieved support from
the U.S. Department of Energy under contract number
DE-AC02-76SF00515. This work was supported in part
by the Kavli Institute for Cosmological Physics through
the grant NSF PHY-0114422, and by the U.S. Depart-
ment of Energy at Fermilab and at U. Chicago. FAM
thanks the Fulbright Program and CONICYT-Chile for
additional support.
This study also made use of the SDSS DR3 Archive, for
which funding has been provided by the Alfred P. Sloan
12 Maŕın, Wechsler, Frieman and Nichol
Foundation, the Participating Institutions, the National
Aeronautics and Space Administration, the National Sci-
ence Foundation, the U.S. Department of Energy, the
Japanese Monbukagakusho, and the Max Planck Soci-
ety. The SDSS Web site is http://www.sdss.org/. The
SDSS is managed by the Astrophysical Research Consor-
tium (ARC) for the Participating Institutions: the Uni-
versity of Chicago, Fermilab, the Institute for Advanced
Study, the Japan Participation Group, the Johns Hop-
kins University, Los Alamos National Laboratory, the
Max-Planck-Institute for Astronomy (MPIA), the Max-
Planck-Institute for Astrophysics (MPA), New Mexico
State University, University of Pittsburgh, Princeton
University, the United States Naval Observatory, and
the University of Washington. We also made extensive
use of the NASA Astrophysics Data System and of the
astro-ph preprint archive at arXiv.org.
REFERENCES
Abazajian, K. et al. 2003, AJ, 126, 2081
Allgood, B., Flores, R. A., Primack, J. R., Kravtsov, A. V.,
Wechsler, R. H., Faltenbacher, A., & Bullock, J. S. 2006,
MNRAS, 367, 1781
Barriga, J. & Gaztañaga, E. 2002, MNRAS, 333, 443
Baugh, C. M. et al. 2004, MNRAS, 351, L44
Berlind, A. A. & Weinberg, D. H. 2002, ApJ, 575, 587
Bernardeau, F., Colombi, S., Gaztañaga, E., & Scoccimarro, R.
2002, Phys. Rep., 367, 1
Berrier, J. C., Bullock, J. S., Barton, E. J., Guenther, H. D.,
Zentner, A. R., & Wechsler, R. H. 2006, ApJ, 652, 56
Blanton, M. R., Brinkmann, J., Csabai, I., Doi, M., Eisenstein,
D., Fukugita, M., Gunn, J. E., Hogg, D. W., & Schlegel, D. J.
2003a, AJ, 125, 2348
Blanton, M. R. et al. 2003b, ApJ, 592, 819
Colless, M. et al. 2001, MNRAS, 328, 1039
Conroy, C., Wechsler, R. H., & Kravtsov, A. V. 2006, ApJ, 647,
—. 2007, ApJ, submitted, arXiv:astro-ph/0703374
Croton, D. J., Norberg, P., Gaztañaga, E., & Baugh, C. M. 2006,
ArXiv Astrophysics e-prints
Croton, D. J. et al. 2004, MNRAS, 352, 1232
da Ângela, J., Outram, P. J., Shanks, T., Boyle, B. J., Croom,
S. M., Loaring, N. S., Miller, L., & Smith, R. J. 2005, MNRAS,
360, 1040
Feldman, H. A., Frieman, J. A., Fry, J. N., & Scoccimarro, R.
2001, Physical Review Letters, 86, 1434
Fosalba, P., Pan, J., & Szapudi, I. 2005, ApJ, 632, 29
Frieman, J. A. & Gaztañaga, E. 1994, ApJ, 425, 392
—. 1999, ApJ, 521, L83
Fry, J. N. & Gaztañaga, E. 1993, ApJ, 413, 447
Gaztañaga, E. & Frieman, J. A. 1994, ApJ, 437, L13
Gaztañaga, E., Norberg, P., Baugh, C. M., & Croton, D. J. 2005,
MNRAS, 364, 620
Gaztañaga, E. & Scoccimarro, R. 2005, MNRAS, 361, 824
Gray, A. G., Moore, A. W., Nichol, R. C., Connolly, A. J.,
Genovese, C., & Wasserman, L. 2004, in ASP Conf. Ser. 314:
Astronomical Data Analysis Software and Systems (ADASS)
XIII, 249
Groth, E. J. & Peebles, P. J. E. 1977, ApJ, 217, 385
Hikage, C., Matsubara, T., Suto, Y., Park, C., Szalay, A. S., &
Brinkmann, J. 2005, PASJ, 57, 709
Hou, Y. H., Jing, Y. P., Zhao, D. H., & Börner, G. 2005, ApJ,
619, 667
Huterer, D., Knox, L., & Nichol, R. C. 2001, ApJ, 555, 547
Jing, Y. P. & Borner, G. 1997, A&A, 318, 667
Jing, Y. P. & Börner, G. 1998, ApJ, 503, 37
—. 2004, ApJ, 607, 140
Kayo, I., Suto, Y., Nichol, R. C., Pan, J., Szapudi, I., Connolly,
A. J., Gardner, J., Jain, B., Kulkarni, G., Matsubara, T.,
Sheth, R., Szalay, A. S., & Brinkmann, J. 2004, PASJ, 56, 415
Klypin, A., Gottlöber, S., Kravtsov, A. V., & Khokhlov, A. M.
1999, ApJ, 516, 530
Kravtsov, A. V., Berlind, A. A., Wechsler, R. H., Klypin, A. A.,
Gottlöber, S., Allgood, B., & Primack, J. R. 2004, ApJ, 609, 35
Kravtsov, A. V., Klypin, A. A., & Khokhlov, A. M. 1997, ApJS,
111, 73
Kulkarni, G. V., Nichol, R. C., Sheth, R. K., Seo, H.-J.,
Eisenstein, D. J., & Gray, A. 2007, ApJ, submitted;
arXiv:astro-ph/0703340
Landy, S. D. & Szalay, A. S. 1993, ApJ, 412, 64
Ma, C.-P. & Fry, J. N. 2000, ApJ, 543, 503
Matsubara, T. & Suto, Y. 1994, ApJ, 420, 497
Moore, A. W., Connolly, A. J., Genovese, C., Gray, A., Grone,
L., Kanidoris, N., Nichol, R. C., Schneider, J., Szalay, A. S.,
Szapudi, I., & Wasserman, L. 2001, in Mining the Sky, 71
Nichol, R. C., Sheth, R. K., Suto, Y., Gray, A. J., Kayo, I.,
Wechsler, R. H., Marin, F., Kulkarni, G., Blanton, M.,
Connolly, A. J., Gardner, J. P., Jain, B., Miller, C. J., Moore,
A. W., Pope, A., Pun, J., Schneider, D., Schneider, J., Szalay,
A., Szapudi, I., Zehavi, I., Bahcall, N. A., Csabai, I., &
Brinkmann, J. 2006, MNRAS, 368, 1507
Nishimichi, T., Kayo, I., Hikage, C., Yahata, K., Taruya, A., Jing,
Y. P., Sheth, R. K., & Suto, Y. 2006, ArXiv Astrophysics
e-prints
Pan, J. & Szapudi, I. 2005, MNRAS, 362, 1363
Peebles, P. J. E. 1980, The Large-Scale Structure of the Universe
(Princeton University Press)
Peebles, P. J. E. & Groth, E. J. 1975, ApJ, 196, 1
Ross, A. J., Brunner, R. J., & Myers, A. D. 2006, ApJ, 649, 48
Scoccimarro, R., Colombi, S., Fry, J. N., Frieman, J. A., Hivon,
E., & Melott, A. 1998, ApJ, 496, 586
Scoccimarro, R., Couchman, H. M. P., & Frieman, J. A. 1999,
ApJ, 517, 531
Scoccimarro, R., Feldman, H. A., Fry, J. N., & Frieman, J. A.
2001a, ApJ, 546, 652
Scoccimarro, R., Sheth, R. K., Hui, L., & Jain, B. 2001b, ApJ,
546, 20
Sefusatti, E., Crocce, M., Pueblas, S., & Scoccimarro, R. 2006,
Phys. Rev. D, 74, 023522
Sefusatti, E. & Scoccimarro, R. 2005, Phys. Rev. D, 71, 063001
Suto, Y. & Matsubara, T. 1994, ApJ, 420, 504
Szapudi, I. 2005, arXiv:astro-ph/0505391
Szapudi, I., Postman, M., Lauer, T. R., & Oegerle, W. 2001, ApJ,
548, 114
Szapudi, I. et al. 2002, ApJ, 570, 75
Szapudi, S. & Szalay, A. S. 1998, ApJ, 494, L41
Takada, M. & Jain, B. 2003, MNRAS, 340, 580
Tasitsiomi, A., Kravtsov, A. V., Wechsler, R. H., & Primack,
J. R. 2004, ApJ, 614, 533
Tinker, J. L. 2007, MNRAS, 374, 477
Vale, A. & Ostriker, J. P. 2004, MNRAS, 353, 189
—. 2006, MNRAS, 371, 1173
Verde, L., Heavens, A. F., Matarrese, S., & Moscardini, L. 1998,
MNRAS, 300, 747
Verde, L., Heavens, A. F., Percival, W. J., Matarrese, S., Baugh,
C. M., Bland-Hawthorn, J., Bridges, T., Cannon, R., Cole, S.,
Colless, M., Collins, C., Couch, W., Dalton, G., De Propris, R.,
Driver, S. P., Efstathiou, G., Ellis, R. S., Frenk, C. S.,
Glazebrook, K., Jackson, C., Lahav, O., Lewis, I., Lumsden, S.,
Maddox, S., Madgwick, D., Norberg, P., Peacock, J. A.,
Peterson, B. A., Sutherland, W., & Taylor, K. 2002, MNRAS,
335, 432
Wang, Y., Yang, X., Mo, H. J., van den Bosch, F. C., & Chu, Y.
2004, MNRAS, 353, 287
Wechsler, R. H. 2004, in Clusters of Galaxies: Probes of
Cosmological Structure and Galaxy Evolution, ed. J. S.
Mulchaey, A. Dressler, & A. Oemler
Wechsler, R. H., Zentner, A. R., Bullock, J. S., Kravtsov, A. V.,
& Allgood, B. 2006, ApJ, 652, 71
York, D. G. et al. 2000, AJ, 120, 1579
Zehavi, I. et al. 2005, ApJ, 630, 1
Zheng, Z. 2004, ApJ, 610, 61
http://www.sdss.org/
ABSTRACT
  We present new predictions for the galaxy three-point correlation function
(3PCF) using high-resolution dissipationless cosmological simulations of a flat
LCDM Universe which resolve galaxy-size halos and subhalos. We create realistic
mock galaxy catalogs by assigning luminosities and colors to dark matter halos
and subhalos, and we measure the reduced 3PCF as a function of luminosity and
color in both real and redshift space. As galaxy luminosity and color are
varied, we find small differences in the amplitude and shape dependence of the
reduced 3PCF, at a level qualitatively consistent with recent measurements from
the SDSS and 2dFGRS. We confirm that discrepancies between previous 3PCF
measurements can be explained in part by differences in binning choices. We
explore the degree to which a simple local bias model can fit the simulated
3PCF. The agreement between the model predictions and galaxy 3PCF measurements
lends further credence to the straightforward association of galaxies with CDM
halos and subhalos.

<|endoftext|><|startoftext|>
Introduction
Mass loss from post-main-sequence stars provides a large fraction of the heavy element
abundance and solid particle content of the interstellar medium (e.g. Wallerstein & Knapp
1998; Ferrarotti & Gail 2006). The mechanism(s) of the mass loss during the AGB phase of
stellar evolution are thought to involve both radiation pressure and stellar pulsation (Suh
1997; Wallerstein & Knapp 1998; Schröder et al. 1998), but most details of this process are
not well understood. Many uncertainties about these processes can be clarified by studies of
the spatial structure of circumstellar mass-loss shells. For example, a number of molecular
1Astronomy Department, University of Texas at Austin, 1 University Station C1400, Austin, TX 78712-
0259; pmh@astro.as.utexas.edu, feardrew@astro.as.utexas.edu
http://arxiv.org/abs/0704.0256v1
– 2 –
line studies of the extended envelopes around AGB stars have found strong evidence for
periodic variations in mass-loss rates leading to the appearance of “rings” in the radial
distribution of molecular emission lines, (e.g. Fong, Meixner & Shah 2003, Olofsson et al.
1996). Very deep, sensitive imaging studies have also found similar phenomena in the dust
around the most nearby extreme example of these objects IRC+10216 (Mauron & Huggins
2000). In order to study the inner regions of these circumstellar shells, however, angular
resolutions well under 1 arcsec are required. For example, at a distance of 1 kpc, the
dust evaporation radius around a 104 L⊙ star corresponds to an angular radius of 20 milli-
arcsec (mas). Speckle interferometry and more recently adaptive optics observations have
enabled resolutions of order 0.1 arcsec (Hofmann et al. 2001; Biller et al. 2005), while lunar
occultation observations and multi-aperture interferometry have pushed angular resolutions
to the milli-arcsecond level, e.g. reviews by Quirrenbach (2004) and Monnier (2003).
Until recently most interferometric observations have been made in typically one or two
relatively broad bands. We present here observations of a lunar occultation of the star AFGL
5440 (aka OH 06.86-1.5, IRAS 18036-2344), made with a high-speed infrared spectropho-
tometer, pMIRAS (Harvey & Wilson 2003), developed as a prototype for a more ambitious
instrument now nearing completion. This star has been classified as a carbon-rich AGB
star on the basis of its IRAS LRS spectrum (Zuckerman & Dyck 1986; Volk & Cohen 1989;
Kwok, Volk & Bidelman 1997). Groenewegen et al. (2002) have estimated the distance to
be 2.25 kpc. Near-infrared through far-infrared photometry of the source has been summa-
rized by Guglielmo et al. (1993) and more recently by Guandalini et al. (2006), including a
combination of ground based photometry, the IRAS values, and more recent MSX results.
The reported distance and photometry imply a luminosity of 1.4×104 L⊙.
In addition to the observations of AFGL 5440, we also discuss observations that we
have made of two “calibration” stars in order to understand the limitations of our obser-
vation/analysis process. These objects are cool stars with no detectable circumstellar dust
shell based on their near-infrared and IRAS colors, IRC+00233 (M7) and HD 155292 (K2).
Our observations cover the entire 1 - 4µm spectral region with a resolving power that
varies from ∼ 20 at the shortest wavelengths to ∼ 100 at the long end. Our time resolution
of 8 msec permits an effective angular resolution of a couple milli-arcsec, with the exact
resolution being a strong function of the signal-to-noise ratio as discussed later. The broad
wavelength coverage allows us to observe simultaneously the Fresnel fringe pattern of the
obscured central star at the shorter wavelengths together with the circumstellar dust emis-
sion from the warmest part of the dust distribution. In §2 we describe the details of the
observations and instrumental parameters and the basic data reduction process. Then in §3
we describe various ways we have modelled the star+shell in order to determine the limits
– 3 –
on the circumstellar shell structure placed by our observations. Finally in §4 we summarize
the implications of these results for the mass loss of this object.
2. Observations and Data Reduction
The observations of AFGL 5440 were made during an immersion occultation event
on 28 Aug 2001 at approximately 02:44:00 UT. The elevation of the Moon at the time of
the event was 36◦, and the Sun was 17◦ below the horizon. The sky conditions were not
completely photometric but cloud cover was minimal and intermittent. The position angle
of the occultation event on the lunar limb was 54◦, and the lunar phase was 0.72. We used
the pMIRAS instrument on the McDonald Observatory 2.7-m telescope. The details of the
instrument have been described by Harvey & Wilson (2003), but we summarize the most
important characteristics here. The instrument is essentially a long/wide slit, high-speed
spectrophotometer using a NaCl prism to disperse the light from the slit, which is then
imaged onto a 32×100 pixel portion of a 2562 InSb array detector. The slit width is chosen
to be the minimum acceptable in order to minimize the background on the detector within
the limitations imposed by the seeing conditions. For these observations the slit width was
set at about 5 arcsec for the typical seeing of 1.5 arcsec. The detector is read-out every 8
msec, with photon integration occuring over essentially the full 8 msec time. Therefore, the
Fresnel occultation fringe pattern is averaged over this 8 msec time (as well as over the 2.7-m
telescope aperture). Because we use a refractive dispersive element, the dispersion/spectral
resolution is not equal at all wavelengths; the highest dispersion is at the longest wavelengths,
a feature that minimizes the background photon count at those wavelengths. On average
over the 1 – 4µm waveband covered by the instrument, one pixel corresponds roughly to
λ/∆λ = 100. Because of uncompensated seeing effects and the lower spectral dispersion
at shorter wavelengths, the true resolving power at the shortest wavelengths is R ∼ 20.
In the spatial direction the plate scale is 0.4 arcsec/pixel. The instrument is read-noise
limited shortward of 2.5µm and background-limited longward of 3µm. A typical observation
consists of taking 5000 frames at a time roughly centered on the occultation event. This
is accomplished by using a buffer that holds the most recent 5000 frames and terminating
the data acquisition a few seconds after the event is observed on a real-time display. This
is an important feature since the predicted times of occultation events are often in error by
as much as 10 seconds due to irregularities in the lunar limb as well as imperfect stellar
astrometry. Our observations of the occultation event of the comparison star, IRC+00233,
were obtained during an immersion event on 29 Jun 2001, and those of HD 155292 were
obtained during an immersion event on 26 Aug 2001.
– 4 –
The data reduction process consists of several typical steps. Because the spectrum is
being observed with much higher time resolution than the seeing timescale, the spectrum
moves around by several pixels over the course of an occultation event. Therefore, to con-
struct a light curve with minimal spectral blurring, the images must be shifted to correspond
to the same wavelength/pixel scale. This is done with a simple cross-correlation algorithm
that works well for the high S/N data discussed here. The spectrum is also not perfectly
aligned with the X/Y axes of the detector, so we take out this tilt as well during the pro-
cessing to simplify later steps. Because we use a “long slit”, ∼ 10 arcsec, we can use the sky
measurements on either side of the stellar spectrum to provide an accurate and high-time-
resolution sky subtraction. For the data discussed here we produce a weighted average in
the spectral direction that is two pixels wide and sum the pixels in the spatial direction that
have detectable signal. We have experimented with more elaborate photometric extraction
schemes, but this technique appears to produce S/N ratios as high as any more complex
algorithms. We have also experimented with various flat-fielding methods but have found
only a small improvement in S/N with these methods. The end result of the data reduc-
tion process is a sequence of ∼ 100 light curves for which we extract a few hundred frames
centered on the occultation event. The frames that are taken well before the Fresnel fringe
pattern of the occultation becomes evident are used to estimate the noise level in the data,
due both to read-noise, background photon statistics, and the often non-negligible amount
of seeing/scintillation noise caused by the atmosphere.
Because the exact location of the spectrum on the detector varies both due to seeing
and between observing runs after adjustments to the instrument, we perform the wavelength
calibration by fitting the NaCl dispersion function to the observed positions of the J, H, K,
and L atmospheric transmission maxima in the actual data for each occultation event. The
accuracy of this calibration is probably good to ± .01 λ throughout the 1 – 4µm region that
is observed. Because of variable instrumental efficiency depending on the placement of the
star image on the input slit, and the common occurrence of non-photometric sky conditions
during occultation events, we specifically do not attempt to derive a flux calibration for our
data. We have, however, compared the relative signal from AFGL 5440 to that of relatively
well characterized stars observed on the same night and conclude that the stellar magnitudes
in the published literature for AFGL 5440 are consistent with values that we would have
derived from our signal strengths to within ± 30%.
– 5 –
3. Source Modeling
The fringe pattern of a lunar occultation event is a convolution of the pattern for a point
source over several parameters that all act to blur the fringes. These parameters include:
the telescope aperture, the integration time of the detector, the wavelength bandwidth of
the observation, and, most importantly, the source size/structure. In order to extract the
source size or more detailed properties of the spatial structure, the typical procedure is to
model the combination of all the above “blurring” parameters with various possible source
models to find the best fit to the data (e.g. Nather & McCants 1970; Richichi et al. 1995).
An additional uncertainty in the observations is the basic frequency of the fringe pattern,
i.e. the speed of the lunar shadow. Although this parameter is calculated by the software
that we use to predict occultation events, small uncertainties in the shape of the lunar limb
(roughness due to craters, etc) can produce differences in the predicted shadow speed up
to several tens of percent. Therefore, this parameter must also be fit in addition to the
parameters that blur the fringes.
We began our modeling process by assuming a uniform disk as the simplest possible
model with which to try fitting the data for the observed objects, AFGL 5440, IRC+00233,
and HD 155292. We use a simple χ2 test for the best model, allowing one or more parameters
to vary during the process. Typically we first allow both the size and lunar shadow velocity
to vary until we find an approximate fit to both. This fit can be done either individually at
each wavelength or globally using the entire waveband.
3.1. Comparison Stars
For IRC+00233 and HD 155292 we assumed that a single source size was likely to be
appropriate for the entire wavelength range within our observational uncertainties, and we fit
the observations globally for source size and lunar velocity. Rough estimates of the angular
sizes of these two stars can be derived simply by assuming that they are blackbodies of the
effective temperatures given by their spectral types. Using the empirical relation between
angular size and B-K color from van Belle (1999) for IRC+00233 (B = 11.06, K = 1.95), we
would expect an angular size of 3.5 mas; the star is, however, likely to be mildly variable, so
the size at the time of our observations might have been different by ± 20%. For HD155292
(B = 11.0, K = 4.9) a similar calculation gives an angular size of 0.6 mas. Since typical
departures from a uniform disk model are at the level of 10 – 30% (e.g. Thompson, Creech-
Eakland & van Belle 2003; and Scholz 2003), the resolution required to detect them reliably
for even IRC+00233 is below 1 mas. Based on tests we have done with model data, this is
beyond the capabilities of our current data set which is limited by both spectral resolution
– 6 –
and signal-to-noise ratio to accuracies on the order of ±2 mas.
The best fit size for IRC+00233 is between 4.5 and 5.0 mas. Figure 1 shows observed
and model light curves for a model assuming a 4.5 mas uniform disk for a subset of the
wavelengths observed. Figure 2 shows the χ2 and signal-to-noise ratio as a function of
wavelength over the entire observed band for this model. Both figures show that this model
provides a very good fit to the observations except for a small range of wavelengths around
∼ 1.7µm where a substantially larger size would provide a better fit. We do not have a
good explanation for this discrepancy; it may be due to some systematic noise effect or to
a real difference in the stellar photosphere in that region. For comparison Schmidtke et al.
(1986) observed this same star in an occultation event using narrow-band filters near 1.6 and
2.2µm. They found a size at those wavelengths of ∼ 3 mas, similar to but smaller than our
calculated blackbody size. For HD 155292 all uniform-disk models with a size less than about
3 mas were able to fit the data reasonably well (Figures 3 and 4). Since the blackbody size
of the star is less than a milli-arcsecond, this is consistent with the expected uncertainties in
our data and modeling. The signal-to-noise ratio for this star was low enough that we had
no effective narrow-band information beyond 2.5µm, and at the shorter wavelengths some
periodic electronic pickup had a non-negligible effect on the observed fringe patterns as well.
Note that for both these comparison stars there are wavelengths with reasonable S/N as
shown in Figures 1 and 3 where there is a less than adequate fit, so these are difficult to
explain solely as due to telluric atmospheric absorption effects.
3.2. AFGL 5440
For AFGL 5440 a quick glance at the observed light curves (Figure 5) indicated that a
single source size was unlikely to fit over the entire 1 – 4µm bandwidth (Harvey & Wilson
2003). This suggests that we are seeing the combination of the emission from the central
star and a circumstellar shell of material due to mass loss from the star. To demonstrate
the poor fit with a single size uniform disk, we show in Figures 5 and 6 the results of trying
to fit the data with one example uniform disk, 11 mas. As can be seen in the plots of χ2
as well as the observed versus model light curves, the 11 mas disk provides a passable fit in
the mid-range of wavelengths, 2.5 – 3.2µm, but produces fringes that are too sharp at the
longer wavelengths and too broadened at the shortest wavelengths.
This result motivated us to pursue a full radiative transfer model for the object that
could be used to compare both the size constraints provided by our occultation data and the
spectral energy distribution (Guandalini et al. 2006) which contains important and different
information about the relative amount of dust at different temperatures. The DUSTY code
– 7 –
(Ivezić, Nenkova & Elitzur 1999) was originally created, in fact, for modeling the emission
from AGB stars surrounded by mass-loss shells. Its output includes model source images
as well as the total energy distribution, so it is ideal for our purposes. Our approach to
using the code was to choose a particular combination of input parameters and then vary
the dust optical depth to find the best fit to the energy distribution for those parameters.
We then used the output source images that were computed as a function of wavelength
to calculate expected occultation fringes for the model and compared those to the observed
fringes. Since AFGL 5440 has been classified as a carbon-rich star, we assumed a carbon-
rich dust composition, typically some combination of amorphous carbon, silicon carbide, and
graphite as the major constituents. The other critical parameters for the models are the dust
temperature at the inner radius and the radial density gradient. The outer radius of the dust
shell makes essentially no difference to the observed characteristics at λ < 60µm as long as it
is at least 100 times the inner radius. The goal of our modeling was to find some reasonable
fit to our occultation data and the rough spectral energy distribution for the circumstellar
dust shell; we have not attempted to extract details of the stellar photosphere or attempt
a thorough examination of all possible dust size/composition models since our data do not
bear directly on those issues for reasons of wavelength coverage and spectral resolution. In
particular, we did not try to find any better than superficial agreement with the IRAS LRS
data.
We explored more than 200 models to understand the effect of varying the input pa-
rameters on the quality of the fit. Basically all the models that provided an approximate fit
to the occultation observations and the spectral energy distribution had several features in
common. First, the dust temperature at the inner radius of the dust shell was of order 950K
± 50K. Models with a maximum dust temperature below 900 K did not have enough hot
dust to fit the occultation fringes at the longer wavelengths, while models with hotter inner
edges had even more difficulty than the best-fit model in reproducing the energy distribu-
tion longward of 10µm. Secondly, the radial density gradient of models with a reasonable
fit was close to r−2 (or to DUSTY’s calculation of the gradient appropriate for a radiatively
driven wind which approximates an r−2 distribution for large radii). Models with a density
gradient of r−1.8 came closer to producing the IRAS 60µm flux, but did not fit the shape of
the 5 – 20µm energy distribution well. Models with a density gradient of r−2.2 can fit the
energy distribution out to 20µm and also provide a good fit to the occultation results, but
have substantially worse fits to the IRAS 60µm flux than our best fit models. The optical
depth for best fit at the fiducial wavelength of 5µm was typically in the range 0.3 – 0.6.
Finally, the dust composition that provided the best fit included a small amount of SiC
together with comparable amounts of amorphous carbon and graphite. Other carbon-rich
compositions produced reasonable fits except at the longest wavelengths or in the 11µm SiC
– 8 –
feature. Note that there is a range of optical properties for different forms of amorphous car-
bon (Andersen, Liodl,& Hofner 1999) that we did not explore. We experimented with two
grain size distributions, the MRN slope (Mathis, Rumpl & Nordsieck 1977), and another,
the KMH shape (Kim, Martin & Hendry 1994), used by Ivezić & Elitzur (1996) for models
of IRC+10216, that has a smoother fall-off on each end. We found that we could obtain
reasonably good fits with either distribution. Figures 7 and 8 show the fits to the spectral
energy distribution and to the occultation light curves for the best-fit model. Figure 9 shows
the quality of the fit versus wavelength, and figure 10 illustrates the spatial profiles of this
best-fit model. Figures 11 and 12 likewise show for comparison the fits for a model discussed
below that does not use any graphite. Finally, figures 13 and 14 show results for a third
model that fits the energy distribution but gives a poor fit to the occultation results because
of a lack of enough warm dust close to the star.
4. Discussion and Summary
Model dust shells have been computed for a number of carbon-rich AGB stars by various
authors. Le Bertre, Gougeon, & Le Sidaner (1995) and Le Bertre (1997) found that the
energy distributions for nearly two dozen carbon-rich stars could be modelled with very
similar dust shell parameters. They found a common value of maximum dust temperature
of order 950 K with shell density gradient of ρ ∝ r−2 for spherically symmetric shells. The
best fit dust property implied a dust emissivity, ǫ ∝ λ−1.3, consistent with that expected for
an amorphous carbon-rich dust composition as also found by Jura (2004). These conclusions
were enhanced by their ability to fit the energy distributions over a range of phases of the
observed variability of many of the stars. On the other hand, Suh (1997) suggested that a
“superwind” phase of mass loss could improve model fits to the energy distributions for a
number of carbon-rich AGB dust shells by enhancing the emission shortward of 30µm because
of a high rate of mass loss in recent times, e.g. at small radii where most of the emission
would be due to hotter dust. Ivezić & Elitzur (1996) used a self-consistent radiatively driven
hydrodynamic model for the density distribution around IRC+10216 and were able to fit both
the spectral energy distribution and the near-infrared angular visibility data from speckle
observations. Interestingly they derived an inner dust shell temperature substantially lower
than ours and most other studies, ∼ 750K. Winters et al. (1997) computed a full time-
dependent model of periodic outflow from AFGL 3068 and were able to produce a good fit
to the energy distribution and observed light curves. Virtually all of these modeling efforts
used dust emissivities appropriate for some form of amorphous carbon (usually including
SiC), but with no graphite (see also Lorenz-Martins 2001), contrary to our best model fit
above. The model of Ivezić & Elitzur (1996) utilized a grain size distribution that included
– 9 –
substantially larger grains than most earlier models.
Our lunar occultation observations add constraints to these results most directly in
defining the amount of dust in the innermost region around AFGL 5440, since our longest
observed wavelength is 4µm. Basically, our data imply that the density of grains emitting
in the 3 – 4µm spectral region must be sufficiently large to produce the substantial fringe
“blurring” observed in our light curves. Graphically, this amount of emission is illustrated
in figure 10 showing the source profiles for the best-fit model. For any given assumed dust
emissivity, this constraint then implies a fairly narrow range of optical depth and ratio of
near-infrared to mid-infrared dust emission. Thus, we constrain the dust density gradient in
the innermost part of the circumstellar shell. Finally, as mentioned above, the fact that we
were unable to find a satisfactory fit to the data with any model having a maximum dust
temperature lower than 900 K is a strong constraint on the location of the inner edge of
the dust shell. This result follows from the fact that dust cooler than 900 K cannot emit
sufficiently to produce the required amount of 3 – 4µm emission. Note that this temperature
is nearly a factor of two below the expected condensation temperatures for dust around these
stars (Egan & Leung 1995). The physical value of this inner radius for AFGL 5440 for the
dust properties of the best-fit model is 3.7×1014 cm, or an angular radius at the assumed
distance of 2.25 kpc of 11 mas. For reference the angular diameter of the central star is of
order 3 mas, based on a blackbody approximation for its likely photospheric temperature of
2500K ± 300K and 2µm magnitude.
Since our modeling is the first of which we are aware for this particular star over the
entire infrared wavelength region, we also discuss here the constraints on the circumstellar
cloud properties implied by the overall energy distribution. As mentioned above, our best-fit
model utilized roughly equal amounts of amorphous carbon and graphite in addition to the
small amount of SiC required to fit the 11µm feature seen in the IRAS LRS spectrum. This
result is contrary to the large amount of evidence against substantial amounts of graphite
in stars like AFGL 5440. The factor that drove us to include graphite was its relatively flat
emissivity vs. wavelength dependence between 10 and 50µm that enabled us to fit the IRAS
60µm flux. Models with only amorphous carbon and SiC failed to fit that flux by factors of
3 or more. Figures 11 and 12 discussed above show the energy distribution and model light
curves for the best-fitting model that uses only amorphous carbon and SiC, and which uses
the radiatively driven hydrodynamic density gradient of Ivezic & Elitzur (1996) as computed
by DUSTY. The light curves fit the observed occultation data nearly as well as those for the
best-fit model.
Clearly, however, the IRAS 60µm flux cannot be fit by any similar model without
graphite, and substantial modifications would have to be made to the assumed density law
– 10 –
in the outer regions or to the dust emissivity in order to come close to fitting the 60µm
flux. Explaining this problem is beyond the scope of our study, but the fact that there is
abundant evidence for non-constant outflow from carbon stars (Fong, Meixner & Shah 2003;
Mauron & Huggins 2000) provides a convenient (if ad hoc) explanation. If the mass-loss rate
were greater in the past, then the amount of dust at radii appropriate for emission at 60µm
might well be larger than a simple extrapolation from the dust responsible for the 3 – 20µm
emission. This would describe the opposite situation from that proposed by (Suh 1997) who
suggested higher mass-loss rates in the recent past to explain observations of a number of
other similar carbon stars. The radial location of dust emitting strongly at 60µm is of order
1 to a few arcseconds from the star. For an assumed distance of 2.25 kpc and its measured
outflow velocity of 22 km s−1 (Groenewegen et al. 2002), this would correspond to a time
of order 500 years to a couple thousand years in the past for the proposed higher mass-loss
rate. This time scale is comparable to the period of fluctuation seen by Mauron & Huggins
(2000) for IRC+10216.
The fact that we have constrained the absolute value and radial dependence of the dust
density with our observations means that we have constrained the recent dust mass-loss
rate for AFGL 5440 as well. The dust mass-loss rate implied by our derived optical depth
and radial density dependence is Ṁdust = 6.5 × 10
−8 M⊙/yr for the optical constants of
amorphous carbon used by DUSTY (Ivezić, Nenkova & Elitzur 1999). Groenewegen et al.
(2002) computed dust and gas mass-loss rates individually for AFGL 5440 on the basis of the
IRAS 60µm flux for the dust, and millimeter CO observations for the gas. They derived a
gas mass-loss rate of 3.1×10−5 M⊙/yr and dust mass-loss rate of 5.1×10
−8 M⊙/yr implying
a gas-to-dust mass ratio of 600. Interestingly, their dust mass-loss rate is slightly lower than
the value we have derived in spite of their using the 60µm IRAS flux for normalization. This
result reinforces the overall uncertainties in model assumptions and absolute dust opacties,
particularly at longer infrared wavelengths. In any case, a dust mass-loss rate of order 5 –
10×10−8M⊙/yr and gas mass-loss rate a few hundred times larger are consistent with the
data.
In summary, our observations have separately resolved the stellar photosphere and the
inner edge of the circumstellar dust shell around AFGL 5440. We have strongly constrained
the inner radius of the dust shell surrounding the star as well as the near-infrared optical
depth. Our constraints together with the spectral energy distribution suggest a dust density
gradient consistent with that expected for radiatively driven mass loss with the exception
that the far-infrared flux may imply a recent decrease in mass-loss rate from the time when
the far-ir-emitting dust was ejected.
– 11 –
5. Acknowledgments
We thank a number of people and institutions that have supported this work since its
earliest stages. M. Simon greatly encouraged our efforts and provided many useful comments
over the course of this work. He also provided the basic prediction software (developed by L.
Cassar) that we use for determining the times and elements of occultation events. We also
acknowledge illuminating conversations with A. Richichi, S. Guilloteau, D. Evans, and R.
E. Nather and very helpful suggestions from two anonymous referees. D. Wilson developed
the initial versions of a number of the reduction algorithms used in this work and provided
a great deal of assistance during the observations. C. Young provided help in the intricacies
of DUSTY. This project has been supported by: internal McDonald Observatory funding,
NASA Grant NAG5-10458, and NSF grant AST-0096626. We also acknowledge extensive use
of the NASA Astrophysics Data System and SIMBAD, and P. Harvey thanks the University
of Colorado’s Center for Astrophysics and Space Astronomy for graciously hosting him during
a sabbatical while much of this paper was written.
REFERENCES
Andersen, A. C., Liodl, R. & Hofner, S. 1999, A&A, 349, 243
Biller, B. A. et al. 2005, ApJ, 620, 450
Egan, M. P. & Leung, C. M. 1995, ApJ, 444, 251
Ferrarotti, A. S. & Gail, H.-P. 2006, A&A, 447, 553
Fong, D., Meixner, M. & Shah, R. Y. 2003, ApJ, 582, L39
Groenewegen, M. A. T., Sevenster, M., Spoon, H. W. W. & Pérez, I. 2002, A & A, 390, 511
Guandalini, R., Busso, M., Ciprini, S., Silvestro, G. & Persi, P. 2006, A&A, 445, 1069
Guglielmo, F. et al. 1993, A&A Suppl., 99, 31
Harvey, P. M. & Wilson, D. 2003, Proc. SPIE, 4841, 355
Hofmann, K.-H., Blocker, T., Weigelt, G. & Balega, Y. 2001, A&A, 379, 529
Ivezić, Z. & Elitzur, M. 1996 MNRAS, 279, 1019
Ivezić, Z., Nenkova, M. & Elitzur, M. 1999, User Manual for DUSTY, University of Kentucky
Internal Report, accessible at http://www.pa.uky.edu/∼moshe/dusty
http://www.pa.uky.edu/~moshe/dusty
– 12 –
Jura, M. 2004, ASP Conf. Series, 309, 321
Kim, S. H., Martin, P. G. & Hendry P. D. 1994, ApJ, 422, 164
Kwok, S., Volk, K. & Bidelman, W. P. 1997, ApJ, 112, 557
Le Bertre, T. 1997, A&A, 324, 1059
Le Bertre, T., Gougeon, S. & Le Sidaner, P. 1995, A&A, 299, 791
Lorenz-Martins, S., de Araújo, F. X., Codina Landaberry, S. J., de Almeida, W. G. & de
Nader, R. V. 2001, A&A, 367, 189
Mathis, J. S., Rumpl, W. & Nordsieck, K. H. 1977, ApJ, 217, 425
Mauron, N. & Huggins, P. J. 2000, A&A, 359, 707
Monnier, J.D. 2003, Rep. Prog. Phys., 66, 789
Nather, R. E. & McCants, M. M. 1970, AJ, 75, 963
Olofsson, H., Bergman, P., Eriksson, K. & Gustafsson, B. 1996, A&A, 311, 587
Quirrenbach, A. 2004, Adv. Sp. Res., 34, 524
Richichi, A. et al. 1995, A&A, 301, 439
Schröder, K.-P., Winters, J. M., Arndt, T. U. & Sedlmayr, E. 1998, A&A, 335, L9
Schmidtke, P. C. et al. 1986, AJ, 91, 961
Scholz, M. 2003, Proc. SPIE, 4838, 163
Suh, K. Y. 1997, MNRAS, 289, 559
Thompson, R.R., Creech-Eakman, M.J. & van Belle, G.T. 2003, Proc. SPIE, 4838, 221
van Belle, G.T. 1999, PASP, 111, 1515
Volk, K. & Cohen, M. 1989, AJ, 98, 931
Wallerstein, G. & Knapp, G. R. 1998, ARAA, 36, 369
Winters, J. M., Fleischer, A. J., Le Bertre, T. & Sedlmayr, E. 1997, A&A, 326, 305
Zuckerman, B. & Dyck, M. 1986, ApJ, 311, 345
This preprint was prepared with the AAS LATEX macros v5.2.
– 13 –
Table 1. DUSTY Models of AFGL 5440
Model Tmax ρ ∝ r
−N τ Amorph C Graphite SiC Size Dist.
K @5µm % %) %)
194 950 -2.0 0.41 40 50 10 MRN1
200 950 -2.0 0.55 95 0 5 KMH2
206 850 rad-flow3 0.45 95 0 5 KMH2
Note. — 1Mathis, Rumpl & Nordsieck (1977) with amin = .005µm; amax =
0.25µm.
2Kim, Martin & Hendry (1994) with amin = .005µm; amax = 0.2µm.
3 Radiatively driven outflow computed by DUSTY as described by
Ivezić & Elitzur (1996)
– 14 –
Fig. 1.— Plots of observed light curves (crosses) versus computed model light curves for
the 4.5 mas uniform disk model of IRC+00233 at a sampling of the range of wavelengths
observed between 1 and 4µm. In each panel the lower curve of triangles shows the difference
of observed minus model.
– 15 –
Fig. 2.— Plots of the χ2 (solid) for the model fit of the 4.5 mas uniform disk to the observed
light curve for IRC+00233 over the range of wavelengths observed between 1 and 4µm.
The observed signal-to-noise ratio is also shown (dashed) as a function of wavelength for
comparison.
– 16 –
Fig. 3.— Plots of observed light curves (crosses) versus computed model light curves for
the 2.0 mas uniform disk model of HD155292 at a sampling of the range of wavelengths
observed between 1 and 4µm. In each panel the lower curve of triangles shows the difference
of observed minus model.
– 17 –
Fig. 4.— Plots of the χ2 (solid) for the model fit of the 2.0 mas uniform disk to the
observed light curve for HD155292 over the range of wavelengths observed between 1 and
4µm. The observed signal-to-noise ratio is also shown (dashed) as a function of wavelength
for comparison.
– 18 –
Fig. 5.— Plots of observed light curves (crosses) versus computed model light curves for
the 11 mas uniform disk model for AFGL 5440 at a sampling of the range of wavelengths
observed between 1 and 4µm. In each panel the lower curve of triangles shows the difference
of observed minus model.
– 19 –
Fig. 6.— Plots of the χ2 (solid) for the model fit of the 11 mas uniform disk to the observed
light curve for AFGL 5440 over the range of wavelengths observed between 1 and 4µm.
The observed signal-to-noise ratio is also shown (dashed) as a function of wavelength for
comparison.
– 20 –
Fig. 7.— Plot of the observed spectral energy distribution of AFGL 5440 versus that
predicted by model 194, the best fit model. The open diamonds show the values from
Guandalini et al. (2006) and Guglielmo et al. (1993). The light grey line indicates the IRAS
LRS spectrum.
– 21 –
Fig. 8.— Plots of observed light curves (crosses) versus computed model light curves for
model 194 for AFGL 5440 at a sampling of the range of wavelengths observed between 1
and 4µm. In each panel the lower curve of triangles shows the difference of observed minus
model.
– 22 –
Fig. 9.— Plots of the χ2 (solid) for the fit of model 194 (the best fit) to the observed
light curve for AFGL 5440 over the range of wavelengths observed between 1 and 4µm.
The observed signal-to-noise ratio is also shown (dashed) as a function of wavelength for
comparison.
– 23 –
Fig. 10.— Plots of the model spatial distributions for AFGL 5440 predicted by model 194,
the best fit model. The curves are drawn for 1.0 and 1.2µm (solid, 1.0µm is the more
extended), and 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8, and 4.0µm, with the
longest wavelengths being the most extended of these.
– 24 –
Fig. 11.— Plot of the observed spectral energy distribution of AFGL 5440 versus that
predicted by model 200, the best fit model that does not use graphite as for fig. 7.
– 25 –
Fig. 12.— Plots of observed light curves (crosses) versus computed model light curves for
model 200 for AFGL 5440 at a sampling of the range of wavelengths observed between 1
and 4µm. In each panel the lower curve of triangles shows the difference of observed minus
model.
– 26 –
Fig. 13.— Plot of the observed spectral energy distribution of AFGL 5440 versus that
predicted by model 206, illustrating the fit for a shell with a cooler inner dust temperature
as for fig. 7.
– 27 –
Fig. 14.— Plots of observed light curves (crosses) versus computed model light curves for
model 206 for AFGL 5440 at a sampling of the range of wavelengths observed between 1
and 4µm. In each panel the lower curve of triangles shows the difference of observed minus
model. Note the larger residuals at the longer wavelengths than for the best fit model 194.
	Introduction
	Observations and Data Reduction
	Source Modeling
	Comparison Stars
	AFGL 5440
	Discussion and Summary
	Acknowledgments
ABSTRACT
  We present observations and modeling of a lunar occultation of the
dust-enshrouded carbon star AFGL 5440. The observations were made over a
continuous range of wavelengths from 1 - 4um with a high-speed
spectrophotometer designed expressly for this purpose. We find that the
occultation fringes cannot be fit by any single-size model. We use the DUSTY
radiative transfer code to model a circumstellar shell and fit both the
observed occultation light curves and the spectral energy distribution
described in the literature. We find a strong constraint on the inner radius of
the dust shell, Tmax = 950 K +/- 50K, and optical depth at 5um of 0.5 +/- 0.1.
The observations are best fit by models with a density gradient of r^-2 or the
gradient derived by Ivezic & Elitzur for a radiatively driven hydrodynamic
outflow. Our models cannot fit the observed IRAS 60um flux without assuming a
substantial abundance of graphite or by assuming a substantially higher
mass-loss rate in the past.

<|endoftext|><|startoftext|>
Orbifold cohomology of abelian symplectic reductions and
the case of weighted projective spaces
Tara S. Holm
Abstract. These notes accompany a lecture about the topology of symplectic
(and other) quotients. The aim is two-fold: first to advertise the ease of
computation in the symplectic category; and second to give an account of
some new computations for weighted projective spaces. We start with a brief
exposition of how orbifolds arise in the symplectic category, and discuss the
techniques used to understand their topology. We then show how these results
can be used to compute the Chen-Ruan orbifold cohomology ring of abelian
symplectic reductions. We conclude by comparing the several rings associated
to a weighted projective space. We make these computations directly, avoiding
any mention of a stacky fan or of a labeled moment polytope.
Contents
1. Symplectic manifolds and quotients 2
2. Orbifolds and their cohomology 5
3. Why the symplectic category is convenient 9
4. The case of weighted projective spaces 12
References 18
The notion of an orbifold has been present in topology since the 1950’s [S1,
S2]. More recently, orbifolds have played an important role in differential and alge-
braic geometry, and in mathematical physics. A fundamental theme is to compute
topological invariants associated to an orbifold, with one ostensible goal to under-
stand Gromov-Witten invariants for these spaces. The aim of the present article is
modest: to expound how techniques from symplectic geometry may be used to un-
derstand the degree-zero genus-zero Gromov-Witten invariants with three marked
points, the so-called Chen-Ruan orbifold cohomology ring; and to make ex-
plicit the details of these techniques in the case of weighted projective spaces.
In the symplectic category, orbifolds arise as symplectic quotients. We recount
the techniques from symplectic geometry that may be used to compute topolog-
ical invariants of a symplectic quotient. This is based on Kirwan’s seminal work
1991 Mathematics Subject Classification. Primary 53D20; Secondary 14N35, 53D45, 57R91.
Key words and phrases. Symplectic quotient, orbifold, cohomology.
TSH is grateful for the support of the NSF through the grant DMS-0604807.
http://arxiv.org/abs/0704.0257v1
2 TARA S. HOLM
[Ki]; and for orbifold invariants, the author’s joint work with Goldin and Knutson
[GHK]. The quotients we consider are by a compact connected abelian group. We
employ techniques coming from algebraic topology, most notably using equivariant
cohomology. For those used to working with finite groups, it is important to note
that, whereas for finite groups the invariant part of a cohomology ring is identical
to the equivariant cohomology, this is not the case for connected groups.
The main example in this article is a weighted projective space CPn
. Its
definition depends on a sequence (b) = (b0, . . . , bn) of positive integers. Kawasaki
showed that the ordinary cohomology groups, with integer coefficients, of the un-
derlying topological space of a weighted projective space are identical to the co-
homology groups of a smooth projective space [Ka], but there is a twisted ring
structure. We review the details of his work. Then in Theorem 4.2, we compute
the cohomology of the orbifold [CPn
], proving that
(0.1) H∗([CPn(b)];Z) = H
(S2n+1;Z) ∼=
〈b0 · · · bnun+1〉
Whereas Kawasaki finds a twist in the the ring structure, we find torsion in high
degrees of the ring (0.1). There is a natural map from Kawasaki’s ring to this one,
and we describe the map explicitly. Finally in Theorem 4.3, we compute the Chen-
Ruan cohomology ring of this orbifold. We make this computation using integer
coefficients, generalizing results in [J, Ma1, Ma2]. Moreover, we give explicit
generators and relations, and avoid mentioning a stacky fan [BCS] or a labeled
polytope [LT, GHK].
The definitions in this article make sense for arbitrary coefficient rings. Indeed,
all computations in the final section use integer coefficients. Moore and Witten
have suggested that the torsion in K-theory has more physical significance than
torsion in cohomology [MoWi]. The author together with Goldin, Harada and
Kimura, is investigating a K-theoretic version of [GHK] and of the computations
herein, building on the work of Harada and Landweber [HL].
The remainder of the paper is organized as follows. In Section 1 we give a quick
exposition of how orbifolds arise in the symplectic category. We then introduce
several cohomology rings associated to an orbifold in Section 2. We advertise the
ease of computation for these rings in Section 3. The novel results in this article
are the computations in Section 4. We include detailed proofs that avoid much of
the symplectic machinery used in [GHK].
Acknowledgments. Many thanks are due to Tony Bahri, Matthias Franz, Re-
becca Goldin, Megumi Harada, Ralph Kaufmann, Takashi Kimura, Allen Knutson,
Eugene Lerman, Reyer Sjamaar, and Alan Weinstein for many helpful conversa-
tions; and to Yoshiaki Maeda and the organizers and sponsors of Poisson 2006 in
Tokyo, Japan, where this work was presented.
1. Symplectic manifolds and quotients
We begin with a very brief introduction to the symplectic category; a more
detailed account of the subject can be found in [CdS]. A symplectic form on a
manifold M is a closed non-degenerate two-form ω ∈ Ω2(M). Thus, for any tangent
vectors X ,Y ∈ TpM , ωp(X ,Y) ∈ R. The key examples include the following.
Example 1.1: M = S2 = CP 1 with ωp(X ,Y) equal to the signed area of the
parallelogram spanned by X and Y. This is the Fubini-Study form on CP 1.
COHOMOLOGY OF ABELIAN SYMPLECTIC REDUCTIONS 3
Example 1.2: M any orientable Riemann surface with ω as in Example 1.1. Note
that orientability is a necessary condition on a symplectic manifold M , because the
top exterior power of the symplectic form is a volume form.
Example 1.3: M = R2d with ω =
dxi ∧ dyi.
Example 1.4: M = Oλ a coadjoint orbit of a compact connected semisimple Lie
group, equipped with ω the Kostant-Kirillov-Soriau form.
Example 1.3 gains particular importance because of
Darboux’s Theorem 1.5. Let M be a symplectic 2d-manifold with symplectic
form ω. Then for every point p ∈ M , there exists a coordinate chart U about p with
coordinates x1, . . . , xd, y1, . . . , yd so that on this chart,
(1.1) ω =
dxi ∧ dyi.
Thus, whereas Riemannian geometry uses local invariants such as curvature to
distinguish metrics, symplectic forms are locally indistinguishable.
The symmetries of a symplectic manifold may be encoded as a group action.
Here we restrict ourselves to a compact connected abelian group T = (S1)n. An
action of T on M is symplectic if it preserves ω; that is, ρ∗gω = ω, for each g ∈ T ,
where ρg is the diffeomorphism corresponding to the group element g. The action
is Hamiltonian if in addition, for every ξ ∈ t, the vector field
(1.2) Xξ =
[exp(tξ)]|t=0
is a Hamiltonian vector field. That is, we require that ω(Xξ, ·) = dφ
ξ is an exact
one-form. Each φξ is a smooth function on M , determined up to a constant. Taking
them together, we may define a moment map
(1.3)
Φ : M −→ t∗
p 7−→
Φ(p) : t → R
ξ 7→ φξ(p)
Returning to our examples, we have Hamiltonian actions in all but the second
example.
Example 1.6: The circle S1 acts on M = S2 = CP 1 by rotations. If we use
angle and height coordinates on S2, then the vector field this action generates is
tangent to the latitude lines, so in coordinates, X ξ = ∂
, and since ω = dθ ∧ dh,
ω(Xξ, ·) = dh, so a moment map is the height function on S
2, as shown in Figure 1.1
below.
Example 1.7: If M is a two-torus M = T 2 = S1 × S1, then S1 × S1 acts on itself
by multiplication. This action is symplectic, but is not Hamiltonian. In fact, no
Riemann surface with non-zero genus has a nontrivial Hamiltonian torus action.
Example 1.8: The torus T d = (S1)d ⊂ Cd acts by coordinate-wise multiplication
on M = R2d = Cd. This action rotates each copy of C = R2 (at unit speed), and is
Hamiltonian. Identifying t∗ ∼= Rd, a moment map is
(1.4) Φ(z1, . . . , zn) = (|z1|
2, . . . , |zd|
up to a constant multiple.
4 TARA S. HOLM
PSfrag replacements
Figure 1.1. The vector field and moment map for S1 acting by
rotations on S2.
Example 1.9: Each coadjoint orbit M = Oλ ⊆ g
∗ may be identified as a homoge-
nous space G/L, where L is a Levi subgroup of the Lie group G. Thus G and its
maximal torus T act on M by left multiplication. A G-moment map is inclusion
(1.5) ΦG : Oλ →֒ g
and a T -moment map is the G-moment map composed with the natural projection
g∗ → t∗ that is dual to the inclusion t →֒ g.
In each of these examples, the image of the (torus) moment map is a convex
subset of Rn. This is true more generally.
Convexity Theorem 1.10 ([A],[GuSt]). If M is a compact Hamiltonian T -space,
then Φ(M) is a convex polytope. It is the convex hull of Φ(MT ), the images of the
T -fixed points.
The convexity theorem is an example of a localization phenomenon: a global
feature (the image of the moment map) that is determined by local features of the
fixed points (their images under the moment map). The convexity property is a
recurring theme in symplectic geometry; its many guises are illustrated in [GuSj].
The moment map is a T -invariant map: it maps entire T -orbits to the same
point in t∗. Thus when α is a regular value, the level set Φ−1(α) is a T -invariant
submanifold of M . Moreover, the action of T on a regular level set is locally free:
it has only finite stabilizers. This follows directly from the moment map condition:
at a regular value, dφξ is never zero, implying that Xξ is not zero, so there is no
1-parameter subgroup fixing points in the level set. Thus, at a regular value the
symplectic reduction M//T (α) = Φ−1(α) is an orbifold. In fact, Marsden and
Weinstein proved
Theorem 1.11 ([MaWe]). If M is a Hamiltonian T -space and α is a regular value
of the moment map Φ, then the symplectic reduction M//T (α) is a symplectic
orbifold.
COHOMOLOGY OF ABELIAN SYMPLECTIC REDUCTIONS 5
More generally, the symplectic reduction M//T (α) at a critical value is a symplectic
stratified space [SL].
Symplectic reduction is an important technique for constructing new symplec-
tic manifolds from old. From our examples, we may construct several classes of
symplectic manifolds.
Example 1.12: For the action of S1 on M = S2 = CP 1 by rotation, the level
set of a regular value is a latitude line, which the circle rotates. The quotient is
a point. Note that if S1 acts by rotation S2 at twice the usual speed, then the
quotient, as an orbifold, is [pt/Z2]. Thus, every orbifold [pt/Zk] is a quotient of S
by a rotation action.
Example 1.13: T d acts on M = R2d = Cn by rotation of each copy of C = R2.
The level set of a regular value is a copy of T d, and so again the quotient is a point
(or potentially an orbipoint, for different actions of T d). However, we may also
restrict our attention to subtori K ⊆ T n. The action of K is still Hamiltonian, and
for certain choices of K, Cn//K(α) is a symplectic toric orbifold. Lerman and
Tolman show that every effective symplectic toric orbifold may be constructed in
this way [LT].
Example 1.14: For the T -action on a coadjoint orbit M = Oλ, the symplectic
reduction M//T (α) is known as a weight variety. One may determine the possi-
ble orbifold singularities by analyzing the combinatorics of G and its Weyl group.
See, for instance, [Kn] or [GHK]. This reduced space plays an important role in
representation theory.
2. Orbifolds and their cohomology
We now turn to orbifolds in the topological category. In terms of local models,
an orbifold is a topological space where each point has a neighborhood homeo-
morphic (or diffeomorphic) to the quotient of a (fixed dimensional) vector space
by a finite group. Satake introduced this notion in the 1950’s [S1, S2], originally
calling the spaces V -manifolds. Thurston coined the term orbifold when he redis-
covered them in the 1970’s (see [Th]) in his study of 3-manifolds. This local model,
however, makes it difficult to define very basic pieces in the theory of orbifolds:
overlap conditions on orbifold charts, suborbifolds, and maps between orbifolds.
As is evident already in the work of Haefliger [Hæ], the proper way to think of an
orbifold is as a Morita equivalence class of groupoids, one of which is a proper
étale groupoid (see [Moe]); or equivalently as a smooth Deligne-Mumford stack
(see [DM]). For example, using this structure, a map of orbifolds should simply be
a morphism of the appropriate objects.
While groupoids or stacks provide the correct mathematical framework, the
technology is a bit beyond the scope of this article. Indeed, for us it is sufficient to
work with the local models, largely because we restrict our attention to orbifolds
that arise as global quotients. Nevertheless, we will need to distinguish between an
orbifold X or [X ] and its underlying topological space (or coarse moduli space)
X . In particular, when X is presented as a global quotient of a manifold M by a
group G, we will use square brackets [M/G] to denote the orbifold, and M/G to
denote the underlying coarse moduli.
For an orbifold X, at each point x ∈ X, we have a local isotropy group Γx at x.
We will be interested in almost complex orbifolds, that is orbifolds that have
6 TARA S. HOLM
local models isomorphic to Cd/Γx, with Γx ⊆ U(d) a finite group acting unitarily
on Cd. Our main example in this section is the orbisphere shown in the figure
below. This is a symplectic toric orbifold in the sense of Tolman and Weitsman
Figure 2.1. An orbisphere with two orbifold points, a Zp singu-
larity at the north pole and a Zq singularity at the south pole. All
other points in the space have local isotropy group the one-element
group. When p and q are relatively prime, this is a weighted pro-
jective space CP 1p,q.
[LT]. In that context, it corresponds to the labeled polytope that is an edge, with
the vertices labeled p and q. This orbifold cannot (always) be presented as a global
quotient by a finite group, although it can be presented as a symplectic reduction.
When p and q are relatively prime, it is the reduction of C2 by the Hamiltonian
S1-action
(2.1) e2πiθ · (z1, z2) = (e
p·2πiθ · z1, e
q·2πiθ · z2).
Thus, it is a global quotient of the level set Φ−1(α) ≈ S3 by a locally free S1 action.
In this case, it is also the global quotient of CP 2 by a Zp × Zq action
1. When p
and q are not relatively prime, the orbisphere is not a global quotient by a finite
group, but it is still a symplectic quotient of C2 by a Hamiltonian S1 × Zg action,
where g = gcd(p, q) (following [LT]). Note also that an orbisphere is isomorphic
to a weighted projective space CP 2p,q exactly when p and q are relatively prime. If
p and q are not relatively prime, the weighted projective space is not reduced: it
has a global stabilizer. On the other hand, the orbisphere described above is always
reduced.
Next we turn to algebraic invariants that we may attach to an orbifold X. The
hope is that these invariants are computable and at the same time retain some
information of the orbifold structure of X. All the invariants are isomorphic to
singular cohomology if X = X is in fact a manifold.
Definition 2.1. The ordinary cohomology ring of an orbifold X is the singular
cohomology of the underlying topological space X,
(2.2) H∗(X ;R),
with coefficients in a commutative ring R.
1With an apology to number theorists, we take the topologist’s notation: Zp denotes the
integers modulo p.
COHOMOLOGY OF ABELIAN SYMPLECTIC REDUCTIONS 7
This ring is computable using standard techniques from algebraic topology, but it
does not distinguish between the orbisphere in Figure 2.1 and a smooth sphere.
For the second invariant, we restrict our attention to orbifolds X presented as
the quotient of a manifold M by the locally free action of a Lie group G. It is
conjectured that every orbifold can be expressed as such a global quotient, and it
is known to be true for effective or reduced orbifolds, those that do not have a
global finite stabilizer (see, for example, [EHKV, Theorem 2.18]). A presentation
as a global quotient is desirable because then the topology or geometry of the
quotient X is simply the G-equivariant topology or geometry of the manifold M .
This principle motivates the following definition.
Definition 2.2. Given a presentation of an orbifold X = [M/G] as a global quo-
tient, the cohomology ring of the orbifold X is the equivariant cohomology
(2.3) H∗(X;R) := H∗G(M ;R),
with coefficients in a commutative ring R.
Recall that equivariant cohomology is a generalized cohomology theory in the
equivariant category. Using the Borel model, we define
(2.4) H∗G(M ;R) := H
∗((M × EG)/G;R),
where EG is a contractible (though infinite dimensional) space with a free G action,
andG acts diagonally onM×EG. There are well-developed methods for computing
equivariant cohomology, hence this invariant is still computable.
Whenever X = [M/G] is a global quotient, the associated quotient map
(2.5) q : M → X
is a G-invariant map. This induces a continuous map
(2.6) q : (M × EG)/G → X.
When X = X is a manifold (i.e. G acts on M locally freely), this map is a fibration
with fiber BG. The map q induces a map in cohomology,
(2.7) q∗ : H∗(X ;R) −→ H∗((M × EG)/G;R) = H∗G(M ;R).
This induced map is an isomorphism when
1. G acts freely on M ;
2. G is a finite group and R is a ring in which |G| is invertible; and
3. G acts locally freely on M and R is a field of characteristic 0.
This last item implies that the cohomology of the orbifold differs from the coho-
mology of its coarse moduli only in its torsion. Notably, when R = Z, this ring
does in fact distinguish between the orbisphere in Figure 2.1 and a smooth sphere.
The third invariant was introduced by Chen and Ruan [CR] to explain mathe-
matically the stringy Betti numbers and stringy Hodge numbers that physi-
cists have attached to orbifolds. To define this third invariant, we need to introduce
the first inertia orbifold
(2.8) I1(X) :=
(x, (g)Γx)
x ∈ X and (g)Γx is a conjugacy class in Γx
This is again an orbifold, and X is the suborbifold called the identity sector
whose pairs consist of a point in X together with the identity element coset. The
other connected components of I1(X) are called the twisted sectors. For a global
8 TARA S. HOLM
quotient X = [M/G] with G abelian, we may identify I1(X) =
g∈G[M
g/G]. On
the other hand, when X = [M/G] is global quotient with G finite, we may identify
I1(X) =
g∈T [M
g/C(g)], where the union is over T a set of representatives of
conjugacy classes in G. For the orbisphere example, the inertia orbifold is shown
in Figure 2.2.
PSfrag replacements
· · ·
· · ·
Figure 2.2. The inertia orbifold for the the orbisphere with two
orbifold points as in Figure 2.1. Each of the p − 1 points to the
right of the north pole represents a [pt/Zp] and each of the q − 1
to the right of the south pole a [pt/Zq].
Definition 2.3 ([CR]). Given a presentation of an orbifold X = [M/G] as a global
quotient, the Chen-Ruan orbifold cohomology of X, as a vector space, is defined
to be the cohomology of the first inertia orbifold,
(2.9) HCR(X;R) := H(I
1(X);R),
with coefficients in a commutative ring R.
The Chen-Ruan ring is endowed with a Q grading, different from the grading
coming from singular cohomology. For a connected component Z of I1(X) that lies
in the (g) piece of the first inertia, the grading of H(Z;R) is shifted by a rational
number which is twice the age of Z. The age is determined by the weights of
the action of the group element g on the normal bundle to Z inside of X. This
is precisely where we need X to be (stably) almost complex. These rational shifts
ensure that the ranks of the Chen-Ruan cohomology groups agree with the stringy
invariants of an orbifold. The dependence of the rational shifts on the normal
bundles ν(Z ⊂ X) means that the Chen-Ruan ring is not in general functorial for
arbitrary morphisms of orbifolds.
To define a product on the Chen-Ruan cohomology, we must define higher
inertia. The nth inertia orbifold consists of tuples, a point in the orbifold and an
n-tuple of conjugacy classes. Restricting to the 2nd inertia, there are natural maps
(2.10) e1, e2, e3 : I
2(X) −→ I1(X)
defined by
e1(p, (g)Γp , (h)Γp) = (p, (g)Γp),(2.11)
e2(p, (g)Γp , (h)Γp) = (p, (h)Γp), and(2.12)
e3(p, (g)Γp , (h)Γp) = (p, ((gh)
−1)Γp)(2.13)
COHOMOLOGY OF ABELIAN SYMPLECTIC REDUCTIONS 9
Chen and Ruan define the product of two classes α, β ∈ HCR(X;R) to be be
(2.14) α ⌣ β := (e3)∗(e
1α ∪ e
2β ∪ ε),
where e∗1 and e
2 are pull-back maps, (e3)∗ is the push-forward, ∪ is the usual
cup product, and ε is the Euler class of the obstruction bundle. This Euler class
should be viewed as a quantum correction term. It ensures that the product respects
the Q-grading and that the product is associative. Neither of these properties is
immediately obvious, and proving the latter requires a rather substantial argument.
Since we can avoid mention of the obstruction bundle in our computations, we
suppress further details here, and refer the curious reader to [ℵGV, CR, GHK]
for additional information.
The Chen-Ruan cohomology ring is the degree 0 part of the (small) quantum
cohomology ring. Hence, it is generally the most difficult of these three invariants to
compute. It has been computed for orbifolds that are global quotients of a manifold
by a finite group in [FG]. The definition was extended to the algebraic category in
[ℵGV], and in [BCS], the ring is computed for toric Deligne-Mumford stacks with
Q, R and C coefficients. In the next section, we review how to compute this ring
for abelian symplectic quotients, as demonstrated in [GHK]. In the last section,
we will compute each of these invariants explicitly for weighted projective spaces.
We conclude this section with a table of these rings for an orbisphere that has
a Z2 singularity at the north pole and is otherwise smooth. This is an example of
a weighted projective space, and is denoted CP 11,2.
(2.15)
X = [CP 11,2] Ring Grading
H∗(X ;Z) Z[x]/〈x2〉 deg(x) = 2
H∗(X;Z) Z[x]/〈2x2〉 deg(x) = 2
H ∗CR(X;Z) Z[x, u]/〈2x
2, 2xu, u2 − x〉 deg(x) = 2, deg(u) = 1
3. Why the symplectic category is convenient
In the past thirty years, tremendous progress has been made in understanding
the equivariant topology of Hamiltonian T -spaces and its relationship to the ordi-
nary topology of their quotients. For a compact torus T = (S1)n, the classifying
bundle ET is an n-fold product of infinite dimensional spheres, and the classifying
space BT is an n-fold product of copies of CP∞. Thus,
(3.1) H∗T (pt;Z) = H
∗(BT ;Z) = Z[x1, . . . , xn],
where deg(xi) = 2. The key ingredient to understanding topology of Hamiltonian
T -spaces is the moment map. Frankel [F] proved that for a Hamiltonian T -action
on a Kähler manifold, each component φξ of the moment map is a Morse-Bott
function on M , and generically the critical set is the fixed point set MT . In his
paper [A] on the Convexity Theorem 1.10, Atiyah generalized this work to the
purely symplectic setting.
Building on the work of Frankel and Atiyah, Kirwan developed techniques to
prove two fundamental theorems that allow us to understand the cohomology of
Hamiltonian T -spaces and their quotients. The first is a version of localization:
it allows us to make global computations by understanding fixed point data. While
10 TARA S. HOLM
this theorem is not explicitly stated in her book [Ki], it does follow immediately
from her work in Chapter 5.
Injectivity Theorem 3.1 ([Ki]). Let M be a compact Hamiltonian T -space. The
inclusion map MT →֒ M induces
(3.2) i∗ : H∗T (M ;Q) −→ H
T ;Q)
an injection in equivariant cohomology.
The compactness hypothesis is stronger than strictly necessary. We may re-
place it with a properness condition on the moment map. The proof relies on the
fact that a generic component of the moment map is an equivariantly perfect
Morse-Bott function on M . The image of this injection has been computed in
many examples, including toric varieties, coadjoint orbits of compact connected
semisimple Lie groups, and coadjoint orbits of Kac-Moody groups. These compu-
tations initially appeared in [CS, GKM] and further generalizations are described
in [GoH, GuH, GuZ, HHH].
The second theorem relates the equivariant topology of a Hamiltonian T -space
to the ordinary topology of its reduction.
Surjectivity Theorem 3.2 ([Ki]). Let M be a compact Hamiltonian T -space,
and α a regular value of the moment map. The inclusion Φ−1(α) →֒ M induces
(3.3) κ : H∗T (M ;Q) −→ H
−1(α);Q)
a surjection in equivariant cohomology.
Again for surjectivity, compactness is more than is necessary. Most importantly,
this result does apply to linear actions of a torus on Cd with a proper moment
map. The key idea in the proof is to use the function ||Φ − α||2 as a Morse-like
function, now known as a Morse-Kirwan function. The critical sets are not
non-degenerate, but one may still explicitly understand them via a local normal
form. It is then possible to prove that ||Φ−α||2 is an equivariantly perfect function
on M .
The kernel of the map κ can be computed using methods in [Go, JK, TW].
Using the fact that at a regular value, H∗T (Φ
−1(α);Q) ∼= H∗(M//T (α);Q), we have
a diagram
(3.4)
0 // ker(κ)
// H∗T (M ;Q))
κ // //
H∗(M//T (α);Q) // 0
H∗T (M
T ;Q)
Thus, by computing im(i∗) and ker(κ), we may derive an explicit presentation of
the cohomology H∗([M//T (α)];Q) of an orbifold arising as a symplectic quotient.
We now turn to a generalization of Theorem 3.2 in the context of orbifolds and
the Chen-Ruan ring.
CR Surjectivity Theorem 3.3 ([GHK]). Let M be a compact Hamiltonian T -
space, and α a regular value of the moment map. The inclusion Φ−1(α) →֒ M
induces
(3.5) K :
H∗T (M
g;Q) −→
H∗T (Φ
−1(α)g ;Q)
COHOMOLOGY OF ABELIAN SYMPLECTIC REDUCTIONS 11
a surjection. Moreover, these are R×T -graded rings, K is a map of graded rings,
and there is an isomorphism of graded rings
(3.6)
H∗T (Φ
−1(α)g ;Q) ∼= H
CR(M//T (α);Q).
The surjectivity (3.5) is a direct consequence of the Surjectivity Theorem 3.2
applied to each space Mg (and again the compactness is not strictly necessary).
The hard work is defining the grading and ring structure, and proving that K is
a map of graded rings. Goldin, Knutson, and the author define a ring structure
⌣ that generalizes a definition (for G finite) of Fantechi and Göttsche [FG]. Us-
ing this definition, we may deduce (3.6); on the other hand, associativity of this
product is not at all obvious. Making use of the injection i∗ on each piece, there
is an alternative product ⋆ on the ring
g∈T H
g;Q) that is much simpler to
compute, and clearly associative. This alternative product has the advantage that
it avoids all mention of the obstruction bundle; instead it relies only on fixed point
data (i.e. the topology of the fixed point set and isotropy data for the action of the
torus on the normal bundles to the fixed point components).
Another key point is that although the ring on the left of (3.5) is quite large,
there is a finite subgroup Γ of T , generated by all finite-order elements that stabilize
some regular point in M , so that the Γ-subring
(3.7)
H∗T (M
still surjects onto the Chen-Ruan cohomology of the reduction. Thus, while it
appears that we have made the computation much more complicated, it turns that
there is still an effective algorithm to complete it. For full details, please refer to
[GHK]. We now return to our examples.
Example 3.4: For a symplectic toric orbifold Cn//K(α), we may use the com-
binatorics of its labeled polytope to establish an explicit presentation of the Chen-
Ruan cohomology of these orbifolds [GHK, § 9]. In the cases where the symplectic
picture is identical to the algebraic, the [GHK] results replicate those of [BCS].
Example 3.5: For the T -action on a coadjoint orbit M = Oλ, the symplectic
reduction M//T (α) is a weight variety. The equivariant cohomology of M may
be read directly from its moment polytope, as may the orbifold singularities of the
reduction. In this case, the Theorem 3.3 yields an explicit combinatorial description
of the Chen-Ruan cohomology of the weight variety.
We conclude this section with a brief remark on coefficients. In both Theo-
rems 3.1 and 3.2, the rational coefficients are necessary. For the Injectivity The-
orem 3.1, we may prove the result over Z with the additional hypothesis that
H∗(MT ;Z) contains no torsion. The Surjectivity Theorem 3.2 over Z requires much
stronger hypotheses. We will see in the next section that we may compute inte-
grally for weighted projective spaces, but that a simple product of two weighted
projective spaces yields a counter-example to the general theorem. Tolman and
Weitsman verify surjectivity over Z for a rather restrictive class of torus actions
[TW]. This topic is being more closely examined for a larger collection of actions
by Susan Tolman and the author [HT].
12 TARA S. HOLM
4. The case of weighted projective spaces
Let b = (b0, . . . , bn) be an (n+1)-tuple of positive integers. Consider the circle
action on Cn+1 given by
(4.1) t · (z0, . . . , zn) = (t
b0z0, . . . , t
bnzn)
for each t ∈ S1. This action preserves the unit sphere S2n+1, and the weighted
projective space CPn
is the quotient of S2n+1 by this locally free circle action.
This is a symplectic reduction because S2n+1 is (up to equivariant homeomorphism)
a regular level set for a moment map Φ for the weighted S1 action on Cn+1. Thus,
(4.2) CPn(b)
n+1//S1.
Nonetheless, we may continue our analysis without invoking the full symplectic
machinery: the arguments simplify greatly in this special case.
When the bi are relatively prime (that is, gcd(b0, . . . , bn) = 1), this fits into
the framework of symplectic toric orbifolds discussed in Example 3.4. When the bi
are not relatively prime, we let g = gcd(b0, . . . , bn), and note that there is a global
Zg stabilizer. In this case, the orbifold [CP
] is not reduced. Its coarse moduli
space is the same as the coarse moduli space of CPn
(b/g)
, where (b/g) denotes the
sequence of integers (b0/g, . . . , bn/g); and as a non-reduced orbifold, it corresponds
to a gerbe
[CPn(b)] → [CP
(b/g)].
It is important to include the case when the bi are not relatively prime, because
such non-reduced weighted projective spaces may well show up as suborbifolds of
a reduced weighted projective space.
The cohomology of the topological space CPn
. Kawasaki studied the sin-
gular cohomology ring of (the coarse moduli space of) weighted projective spaces
[Ka]. To present the product structure, we will need the integers
(4.3) ℓk = ℓ
k := lcm
bi0 · · · bik
gcd(bi0 , . . . , bik)
0 ≤ i0 < · · · < ik ≤ n
for each 1 ≤ k ≤ n.
Theorem 4.1 ([Ka]). The integral cohomology of CPn
(4.4) Hi(CPn(b);Z)
Z if i = 2k, 0 ≤ k ≤ n,
0 otherwise.
Moreover, letting γi denote the generator of H
2i(CPn
;Z), we have
(4.5) γk ∪ γm =
ℓk · ℓm
γm+k.
Outline of the Proof. Let Gk denote the group of k
th roots of unity. Then
as a topological space, CPn
is homeomorphic to a quotient of ordinary projective
space CPn by the finite group
(4.6) G(b) = Gb0 × · · · ×Gbn .
COHOMOLOGY OF ABELIAN SYMPLECTIC REDUCTIONS 13
Explicitly, using standard homogeneous coordinates, the map
pb : CP
n −→ CPn(b)(4.7)
[z0 : · · · : zn] 7−→ [z
0 : · · · : z
n ](4.8)
induces the homeomorphism CPn/G(b) ∼= CP
. This then induces an isomor-
phism in singular cohomology with rational coefficients, since over Q we have the
isomorphisms
(4.9) H∗(CPn;Q) ∼= H
∗(CPn;Q)G(b) ∼= H
∗(CPn/G(b);Q).
Over the integers, the computation is a bit more subtle. Using twisted lens
spaces, Kawasaki verifies that just as for CPn, the cohomology ring H∗(CPn
is torsion-free with a copy of Z in each even degree between 0 and 2n; however the
product structure is twisted by the weights bi. Moreover, there are cases when we
need all n generators γ1, . . . , γn to present this ring.
Kawasaki showed that the map
(4.10) p∗b : H
2k(CPn(b);Z) −→ H
2k(CPn;Z)
is multiplication by ℓk for all 1 ≤ k ≤ n. From this, we may deduce that
(4.11) γ1 ∪ γk =
ℓ1 · ℓk+1
γk+1, and hence γk ∪ γm =
ℓk · ℓm
γm+k.
The result now follows. �
It is important to note that pb is not an isomorphism of orbifolds. Indeed, the
isotropy group at any point in CPn
is the stabilizer group of any lift of the point
in S2n+1. Thus, all isotropy groups for CPn
are cyclic. On the other hand, the
orbifold CPn/G(b) has points with isotropy group G(b), which may not be cyclic.
Nevertheless we will make use of the map pb to understand the structure of the
cohomology of the orbifold [CPn
] = [S2n+1/S1
]. Here, the subscript (b) on S1
indicates that the circle action is weighted by the integers (b) = (b0, . . . , bn).
The cohomology of the orbifold [CPn(b)]. Since the weighted projective space
] is a symplectic reduction, it is possible invoke the results from Section 3 to
determine the cohomology of the orbifold H∗([CPn
];Q), and to apply results from
[GHK, §9] to obtain a presentation over Z. We give a direct argument here that
is similar in spirit, but that avoids much of this big machinery; we then compare
this to Kawasaki’s Theorem 4.1.
Theorem 4.2. The cohomology of the orbifold [CPn
(4.12) H∗([CPn(b)];Z) = H
(S2n+1;Z) ∼=
〈b0 · · · bnun+1〉
Moreover, the natural map
(4.13) q∗(b) : H
∗(S2n+1/S1(b);Z) −→ H
(S2n+1;Z)
is completely determined by q∗
(γ1) = ℓ1 · u = lcm(b0, . . . , bn) · u.
Proof. Consider the (weighted) circle action of S1 on Cn+1 given by
(4.14) t · (z0, · · · , zn) = (t
b0 · z0, · · · , t
bn · zn).
14 TARA S. HOLM
The unit sphere S2n+1 is invariant under this action, so we get a long exact sequence
in S1-equivariant cohomology for the pair (Cn+1, S2n+1),
(4.15) · · · → HiS1
(Cn+1, S2n+1;Z)
→ HiS1
(Cn+1;Z)
→ HiS1
(S2n+1;Z) → · · · .
Thinking of (Cn+1, S2n+1) as a disk and sphere bundle over a point, we may use
the Thom isomorphism to identify Hi
(Cn+1, S2n+1;Z) ∼= H
i−2(n+1)
(Cn+1;Z).
Under this identification, the map α is the cup product with the equivariant Euler
class
(4.16) eS1
(Cn+1) = bo · · · bnu
Thus, the map α is injective, so the long exact sequence splits into short exact
sequences
(4.17) 0 → HiS1
(Cn+1, S2n+1;Z)
→ HiS1
(Cn+1;Z)
→ HiS1
(S2n+1;Z) → 0.
Thus, we have a surjection
(4.18) β : Z[u] ∼= H
(Cn+1;Z) −→ H∗S1
(S2n+1;Z).
Moreover, the exactness of (4.17) means that the kernel of β is equal to the image
of α, namely all multiples of the equivariant Euler class. This establishes (4.12).
Turning to (4.13), the map
(4.19) q∗(b) : H
∗(S2n+1/S1(b);Z) −→ H
(S2n+1;Z),
is exactly the one defined in (2.7). We know that this is an isomorphism over Q,
so q∗
must map γ1 to a multiple of u. Moreover, because b0 · · · bnu
n+1 is zero in
(S2n+1;Z), we must have that
(4.20) (q∗(b)(γ1))
n+1 ∈
b0 · · · bnu
To determine the image of the class γ1, we return to the map pb : CP
n → CPn(b).
This map lifts to maps on S2n+1 and Cn+1 given by
(4.21)
Πb //
S2n+1
πb //?
S2n+1
0 6= (z0, · · · , zn) 7−→
|zbii |
zb00 , · · · , z
(4.22)
(0, . . . , 0) 7−→ (0, . . . , 0)(4.23)
The maps in this diagram are all equivariant with respect to the standard circle
action on the left-hand spaces and the (b)-weighted circle action on the right-hand
COHOMOLOGY OF ABELIAN SYMPLECTIC REDUCTIONS 15
spaces. Thus, we have a diagram of maps
(4.24)
(Cn+1 × S1)/S1
Πb // (Cn+1 × S1
(S2n+1 × S1)/S1
πb //
(S2n+1 × S1(b))/S
CPn pb
// CPn
Applying singular cohomology H∗( ;Z) and identifying equivariant cohomology,
we have a commutative diagram
(4.25)
Z[x] H∗
(Cn+1;Z) oo
(Cn+1;Z) Z[u]
〈xn+1〉 H
(S2n+1;Z) oo
(S2n+1;Z)
〈b0···bnun+1〉
〈xn+1〉 H
∗(CPn;Z) oo
∗(CPn(b);Z) Z⊕ Zγ1 ⊕ · · · ⊕ Zγn
Because Cn+1 equivariantly deformation retracts to a point, the map Π∗b maps the
generator u to x. The commutativity of the top square then implies that π∗b (u) = x.
Thus, we know that
π∗b (q
(b)(γ1) = q
∗(p∗b(γ1))(4.26)
= q∗(ℓ1 · x), by Kawasaki’s result,(4.27)
= ℓ1 · x, since q
∗ is an equality,(4.28)
= π∗b (ℓ1 · u).(4.29)
In low degree, π∗b is injective, so we may conclude that q
(b)(γ1) = ℓ1 · u. Noting
that ℓ1 = lcm(b0, . . . , bn) completes the proof. �
Over the integers, this invariant does distinguish a weighted projective space
from the standard one; however, it may not differentiate between two weighted
projective spaces. For example, the cohomology rings of the orbifolds [CP 12,2] and
[CP 14,1] are identical. They are both
(4.30)
〈4u2〉
We note that these surjectivity techniques do not generally work over the inte-
gers. To see this, we note that for any abelian reduction of affine space, the domain
of the Kirwan map H∗T (C
N ;Z) has terms only in even degrees. If we consider the
simple product [CP 11,2 × CP
1,2], we may compute the cohomology of this orbifold
using the above result and the Künneth formula. Since [CP 11,2] has 2-torsion in high
degrees, the Tor term from the Künneth formula plays a role, yielding 2-torsion in
16 TARA S. HOLM
high odd degrees in the cohomology of the orbifold [CP 11,2 × CP
1,2]. Thus, surjec-
tivity must fail over the integers in this example. We note that any failure over Z
must be due to problems with torsion, because surjectivity does hold over Q.
The Chen-Ruan orbifold cohomology of [CPn
]. When computing the Chen-
Ruan ring, it is important to recall that a weighted projective space is a circle reduc-
tion. Thus, the finite group Γ for which the Γ-piece surjects onto H ∗CR([CP
(b)];Z)
is a cyclic group. For any vector v that is non-zero is a single coordinate, say the
ith coordinate, the stabilizer of v is Zbi . Thus, the group Γ generated by all finite
stabilizers is the ℓth roots of unity Zℓ ⊂ S
1, where ℓ = lcm(b0, . . . , bn). Hence, we
have a surjection
(4.31) Z[u, α0, α1, . . . , αℓ−1] // // H
CR([CP
(b)];Z).
In this case, thinking of e
ℓ = ζk ∈ Zℓ ⊂ S
1, αk denotes a generator for
(4.32) H∗S1
((Cn+1)ζk ;Z).
To complete the computation, we must determine the orbifold product
(4.33) αi ⌣ αj = αi ⋆ αj
and the kernel of the orbifold Kirwan map (3.5). For any integer m ∈ Z, we let
[m] denote the smallest non-negative integer congruent to m modulo ℓ. For any
rational number q ∈ Q, 〈q〉f denotes its fractional part. Finally, we let
(4.34) ak(m) :=
[bk ·m]
bk ·m
This is the rational number such that ζm acts on the k
th coordinate by e2πiak(m).
Theorem 4.3. The Chen-Ruan orbifold cohomology of CPn
(4.35) H ∗CR([CP
(b)];Z)
Z[u, α0, α1, . . . , αℓ−1]
I + J
where u is a class in degree 2,
(4.36) deg(αj) = 2
ak(j).
Here, I is the ideal
(4.37) I =
αiαj −
(bku)
ak(i)+ak(j)−ak(i+j)
α[i+j]
generated by the ⋆ product structure, and J is
(4.38) J =
ak(j)=0
the kernel of the surjection K of the orbifold Kirwan map.
Remark 4.4: The generator u is the generator of S1-equivariant cohomology and
hence has degree 2. The generator αk is a placeholder for the cohomology of the
ζk-sector.
COHOMOLOGY OF ABELIAN SYMPLECTIC REDUCTIONS 17
Remark 4.5: Note that the generator α0 is the placeholder for the identity sector.
Indeed, we always have
(4.39) α0 ⋆ α0 = α0
as a consequence of the relation in (4.37) where i = j = 0, hence we may think of
α0 as 1.
Remark 4.6: The reader may use this theorem to check that H ∗CR( ;Z) does
distinguish [CP 12,2] from [CP
4,1].
Proof. We use the ⋆ product given by Equation (2.1) in [GHK]. In the case of
a weighted circle action on Cn+1, there is exactly one fixed point (the origin), any
generator αi restricted to that fixed point is 1, and the equivariant Euler class for
the kth coordinate is precisely bku, whence
(4.40) αi ⋆ αj =
(bku)
ak(i)+ak(j)−ak(i+j)
α[i+j].
Turning to the kernel computation, each (Cn+1)ζj has a weighted S1 action,
and so we apply Theorem 4.2 to this subspace. Thus, for the ζj-sector, the kernel
contribution is the equivariant Euler class of (Cn+1)ζj times the placeholder αj . We
note that (Cn+1)ζj contains the kth coordinate subspace precisely when ak(j) = 0.
Hence,
(4.41) eS1
((Cn+1)ζj ) =
ak(j)=0
and the theorem follows. �
This theorem is an immediate consequence of Theorem 4.2 and [GHK]. The
importance of this description is its ease in computation, since it avoids any com-
putation of a labeled moment polytope (á la [LT]) or of a stacky fan (á la [BCS]).
We demonstrate this computational facility in the following concluding example.
Example 4.7: Consider the weighted projective space [CP 51,2,2,3,3,3]. This is a
symplectic reduction of C6, the group Γ is Z/6Z, and so the Chen-Ruan orbifold
cohomology of [CP 51,2,2,3,3,3] is a quotient of
(4.42) Z[u, α0, α1, α2, α3, α4, α5].
18 TARA S. HOLM
The following chart contains the data needed to compute the ideals I and J .
(4.43)
g ζ0 ζ1 ζ2 ζ3 ζ4 ζ5
(C6)g C6 {0} 3C(3) 2C(2) 3C(3) {0}
aC(1)(g) 0
aC(2)(g) 0
aC(3)(g) 0
2 · age(g) 0 14
generator of
((C6)g;Z)
α0 α1 α2 α3 α4 α5
((C6)g) 108u6 1 27u3 4u2 27u3 1
Note that because of the multiplicities,
(4.44) 2 · age(g) = 2 ·
a1(g) + 2a2(g) + 3a3(g)
Since α0 = 1, and α1 and α5 are in the kernel ideal J , we only need to compute
the products among α2, α3 and α4. For example, we may compute
(4.45) α2 ⋆ α2 = (u)
(3u)0+0−0
α4 = 4u
All of the products contributing to I, then, are summarized in the following table.
(4.46)
⋆ α2 α3 α4
α2 4u
2α4 α5 = 0 4u
α3 27u
4 uα1 = 0
α4 uα2
Thus, as a ring,
(4.47) H ∗CR([CP
1,2,2,3,3,3];Z)
Z[u, α0, α1, α2, α3, α4, α5]
I + 〈108u6, α1, 27u3α2, 4u2α3, 27u3α4, α5〉
This generalizes Jiang’s computation [J] to a computation over Z.
References
[ℵGV] D. Abramovich, T. Graber, and A. Vistoli, “Algebraic orbifold quantum products.”
Contemp. Math., 310 (2002) 1–24. Preprint math.AG/0112004.
[A] M. Atiyah, “Convexity and commuting Hamiltonians.” Bull. London Math. Soc. 14
(1982) no. 1, 1–15.
[BCS] L. Borisov, L. Chen, and G. Smith, “The orbifold Chow ring of toric Deligne-Mumford
stacks.” J. Amer. Math. Soc. 18 (2005) no. 1, 193–215. Preprint math.AG/0309229.
[CdS] A. Cannas da Silva, Lectures on symplectic geometry. Lecture Notes in Mathemat-
ics, 1764. Springer-Verlag, Berlin, 2001.
[CR] W. Chen and Y. Ruan, “A New Cohomology Theory for Orbifold.”Commun.Math.Phys.
248 (2004) 1-31. Preprint math.AG/0004129.
[CS] T. Chang and T. Skjelbred, “Topological Schur lemma and related results.” Bull. Amer.
Math. Soc. 79 (1973) 1036–1038.
http://arxiv.org/abs/math/0112004
http://arxiv.org/abs/math/0309229
http://arxiv.org/abs/math/0004129
COHOMOLOGY OF ABELIAN SYMPLECTIC REDUCTIONS 19
[CCLT] T. Coates, A. Corti, Y.-P. Lee, and H.-H. Tseng, “The quantum orbifold cohomology
of weighted projective space.” Preprint math.AG/0608481.
[D] V. Danilov, “The geometry of toric varieties.” Russian Math. Surveys 33 (1978) no. 2,
97–154.
[DM] P. Deligne and D. Mumford, “The irreducibility of the space of curves of given genus.”
Inst. Hautes tudes Sci. Publ. Math. 36 (1969) 75–109.
[EHKV] D. Edidin, B. Hassett, A. Kresch, and A. Vistoli, “Brauer groups and quotient stacks.”
Amer. J. Math. 123 (2001) no. 4, 761–777.
[FG] B. Fantechi and L. Göttsche, “Cohomology for global quotients.” Duke Math. J. 117
(2003) no. 2, 197–227. Preprint math.AG/0104207.
[F] T. Frankel, “Fixed points and torsion on Kaḧler manifolds.” Ann. of Math. (2) 70 1959
[Go] R. Goldin, ”An effective algorithm for the cohomology ring of symplectic reduction.”
Geom. Funct. Anal. 12 (2002) 567-583. Preprint math.SG/0110022.
[GoH] R. Goldin and T. Holm, ”The equivariant cohomology of Hamiltonian G-spaces
from residual S1 actions.” Math. Res. Lett. 8 (2001) no. 1-2, 67–77. Preprint
math.SG/0107131.
[GHK] R. Goldin, T. Holm and A. Knutson, “Orbifold cohomology of torus quotients.” Duke
Math. Journal, to appear. Preprint math.SG/0502429.
[GKM] M. Goresky, R. Kottwitz, and R. MacPherson, “Equivariant cohomology, Koszul duality,
and the localization theorem.” Invent. Math. 131 (1998) no. 1, 25–83.
[GuH] V. Guillemin and T. Holm, “GKM theory for torus actions with non-isolated fixed
points.” Int. Math. Res. Not. 40 (2004) 2105–2124. Preprint math.SG/0308008.
[GuSj] V. Guillemin and R. Sjamaar, Convexity properties of Hamiltonian group actions.
CRM Monograph Series 26. American Mathematical Society, Providence, RI, 2005.
[GuSt] V. Guillemin and S. Sternberg, “Convexity properties of the moment mapping.” Invent.
Math. 67 (1982) no. 3, 491–513.
[GuZ] V. Guillemin and C. Zara, “1-skeleta, Betti numbers, and equivariant cohomology.”
Duke Math. J. 107 (2001) no. 2, 283–349. Preprint math.DG/9903051.
[Hæ] A. Haefliger, “Structures feuilletes et cohomologie valeur dans un faisceau de
groupodes.” Comment. Math. Helv. 32 (1958) 248–329.
[HHH] M. Harada, A. Henriques, and T. Holm, “Computation of generalized equivariant coho-
mologies of Kac-Moody flag varieties.” Adv. Math. 197 (2005) no. 1, 198–221. Preprint
math.AT/0409305.
[HL] M. Harada and G. Landweber, “Surjectivity for Hamiltonian G-spaces in K-theory.”
Trans. AMS to appear. Preprint math.SG/0503609.
[HeMe] A. Henriques and D. Metzler, “Presentations of Noneffective Orbifolds.” Trans. Amer.
Math. Soc. 356 (2004) no. 6 2481–2499. Preprint math.AT/0302182.
[HT] T. Holm and S. Tolman, “Integral Kirwan Surjectivity for Hamiltonian T -manifolds,”
in preparation.
[JK] L. Jeffrey and F. Kirwan, “Localization for nonabelian group actions.” Topology 34
(1995) no. 2, 291–327.
[J] Y. Jiang, “The Chen-Ruan Cohomology of Weighted Projective Spaces.” Preprint
math.AG/0304140.
[Ka] T. Kawasaki, “Cohomology of twisted projective spaces and lens complexes.” Math.
Ann. 206 (1973) 243–248.
[Ki] F. Kirwan, Cohomology of quotients in symplectic and algebraic geometry. Mathemat-
ical Notes, 31. Princeton University Press, Princeton, NJ, 1984.
[Kn] A. Knutson, Weight varieties, MIT Ph.D. thesis 1996.
[LT] E. Lerman and S. Tolman, Hamiltonian torus actions on symplectic orbifolds and toric
varieties. Trans. Amer. Math. Soc. 349 (1997) no. 10, 4201–4230. dg-ga/9511008.
[Ma1] E. Mann, Cohomologie quantique orbifolde des espaces projectifs á poids, IRMA (Stras-
bourg) Ph.D. thesis 2005. Available at math.AG/0510331.
[Ma2] E. Mann “Orbifold quantum cohomology of weighted projective spaces.” J. of Alg.
Geom., to appear. Preprint math.AG/0610617.
[MaWe] J. Marsden and A. Weinstein, “Reduction of symplectic manifolds with symmetry.” Rep.
Mathematical Phys. 5 (1974) no. 1, 121–130.
http://arxiv.org/abs/math/0608481
http://arxiv.org/abs/math/0104207
http://arxiv.org/abs/math/0110022
http://arxiv.org/abs/math/0107131
http://arxiv.org/abs/math/0502429
http://arxiv.org/abs/math/0308008
http://arxiv.org/abs/math/9903051
http://arxiv.org/abs/math/0409305
http://arxiv.org/abs/math/0503609
http://arxiv.org/abs/math/0302182
http://arxiv.org/abs/math/0304140
http://arxiv.org/abs/dg-ga/9511008
http://arxiv.org/abs/math/0510331
http://arxiv.org/abs/math/0610617
20 TARA S. HOLM
[Moe] I. Moerdijk, “Orbifolds as groupoids: an introduction.” In Orbifolds in mathematics
and physics (Madison, WI, 2001) 205–222, Contemp. Math., 310, Amer. Math. Soc.,
Providence, RI, 2002.
[MoWi] G. Moore and E. Witten, “Self-duality, Ramond-Ramond fields and K-theory.” J. High
Energy Phys. (2000) no. 5, Paper 32, 32 pp.
[S1] I. Satake, “On a generalization of the notion of manifold.” Procedings of the National
Academy of Sciences USA 42 (1956) 359–363.
[S2] I. Satake, “The Gauss-Bonnet Theorem for V-manifolds.” J. Math. Soc. Japan 9 (1957)
464–492.
[SL] R. Sjamaar and E. Lerman, “Stratified symplectic spaces and reduction.” Ann. of Math.
(2) 134 (1991) no. 2, 375–422.
[Th] W. Thurston, Three-dimensional geometry and topology. Vol. 1. Princeton Mathemat-
ical Series, #35. Princeton University Press, 1997. ISBN 0-691-08304-5.
[TW] S. Tolman and J. Weitsman, “The cohomology ring of symplectic quotients.” Communi-
cations in Analysis and Geometry 11 (2003) no. 4, 751–773. Preprint math.DG/9807173.
Department of Mathematics, Cornell University, Ithaca, NY 14853-4201 USA
E-mail address: tsh@math.cornell.edu
http://arxiv.org/abs/math/9807173
	1. Symplectic manifolds and quotients
	2. Orbifolds and their cohomology
	3. Why the symplectic category is convenient
	4. The case of weighted projective spaces
	References
ABSTRACT
  These notes accompany a lecture about the topology of symplectic (and other)
quotients. The aim is two-fold: first to advertise the ease of computation in
the symplectic category; and second to give an account of some new computations
for weighted projective spaces. We start with a brief exposition of how
orbifolds arise in the symplectic category, and discuss the techniques used to
understand their topology. We then show how these results can be used to
compute the Chen-Ruan orbifold cohomology ring of abelian symplectic
reductions. We conclude by comparing the several rings associated to a weighted
projective space. We make these computations directly, avoiding any mention of
a stacky fan or of a labeled moment polytope.

<|endoftext|><|startoftext|>
Introduction
	General considerations
	Expansion in powers of the field
	Summary and Conclusions
	References
ABSTRACT
  The usual procedure of including a finite number of vertices in Non
Perturbative Renormalization Group equations in order to obtain $n$-point
correlation functions at finite momenta is analyzed. This is done by exploiting
a general method recently introduced which includes simultaneously all vertices
although approximating their momentum dependence. The study is performed using
the self-energy of the tridimensional scalar model at criticality. At least in
this example, low order truncations miss quantities as the critical exponent
$\eta$ by as much as 60%. However, if one goes to high order truncations the
procedure seems to converge rapidly.

<|endoftext|><|startoftext|>
FORMATION AND COLLISIONAL EVOLUTION OF KUIPER
BELT OBJECTS
Scott J. Kenyon
Smithsonian Astrophysical Observatory
Benjamin C. Bromley
Department of Physics, University of Utah
David P. O’Brien
Planetary Science Institute
Donald R. Davis
Planetary Science Institute
This chapter summarizes analytic theory and numerical calculations for the formation and
collisional evolution of KBOs at 20–150 AU. We describe the main predictions of a baseline
self-stirring model and show how dynamical perturbations from a stellar flyby or stirring
by a giant planet modify the evolution. Although robust comparisons between observations
and theory require better KBO statistics and more comprehensive calculations, the data are
broadly consistent with KBO formation in a massive disk followed by substantial collisional
grinding and dynamical ejection. However, there are important problems reconciling the
results of coagulation and dynamical calculations. Contrasting our current understanding of
the evolution of KBOs and asteroids suggests that additional observational constraints, such
as the identification of more dynamical families of KBOs (like the 2003 EL61 family), would
provide additional information on the relative roles of collisional grinding and dynamical
ejection in the Kuiper Belt. The uncertainties also motivate calculations that combine collisional
and dynamical evolution, a ‘unified’ calculation that should give us a better picture of KBO
formation and evolution.
1. INTRODUCTION
Every year in the Galaxy, a star is born. Most stars
form in dense clusters of thousands of stars, as in the Orion
Nebula Cluster (Lada and Lada 2003; Slesnick et al. 2004).
Other stars form in small groups of 5–10 stars in loose asso-
ciations of hundreds of stars, as in the Taurus-Auriga clouds
(Gomez et al. 1993; Luhman 2006). Within these associa-
tions and clusters, most newly-formed massive stars are bi-
naries; lower mass stars are usually single (Lada 2006).
Large, optically thick circumstellar disks surround
nearly all newly-formed stars (Beckwith and Sargent 1996).
The disks have sizes of ∼ 100–200 AU, masses of ∼ 0.01-
0.1 M⊙, and luminosities of ∼ 0.2–1 L⋆, where L⋆ is the
luminosity of the central star. The masses and geometries
of these disks are remarkably similar to the properties of
the minimum mass solar nebula (MMSN), the disk required
for the planets in the solar system (Weidenschilling 1977b;
Hayashi 1981; Scholz et al. 2006).
As stars age, they lose their disks. For solar-type stars,
radiation from the opaque disk disappears in 1–10 Myr
(Haisch et al. 2001). Many older stars have optically thin
debris disks comparable in size to the opaque disks of
younger stars but with much smaller masses, . 1M⊕, and
luminosities, . 10−3 L⋆ (Chapter by Moro-Martin et al.).
The lifetime of this phase is uncertain. Some 100 Myr-old
stars have no obvious debris disk; a few 1–10 Gyr-old stars
have massive debris disks (Greaves 2005).
In the current picture, planets form during the transi-
tion from an optically thick protostellar disk to an opti-
cally thin debris disk. From the statistics of young stars in
molecular clouds, the timescale for this transition, ∼ 105
yr, is comparable to the timescales derived for the forma-
tion of planetesimals from dust grains (Weidenschilling
1977a; Youdin and Shu 2002; Dullemond and Dominik
2005) and for the formation of lunar-mass or larger
planets from planetesimals (Wetherill and Stewart 1993;
Weidenschilling et al. 1997a; Kokubo and Ida 2000; Nagasawa et al.
2005; Kenyon and Bromley 2006). Because the grains
in debris disks have short collision lifetimes, . 1 Myr,
compared to the ages of their parent stars, & 10 Myr,
high velocity collisions between larger objects must main-
tain the small grain population (Aumann et al. 1984;
Backman and Paresce 1993). The inferred dust produc-
tion rates for debris disks around 0.1–10 Gyr old stars,
∼ 1020 g yr−1, require an initial mass in 1 km ob-
http://arxiv.org/abs/0704.0259v1
jects, Mi ∼ 10–100 M⊕, comparable to the amount of
solids in the MMSN. Because significant long-term de-
bris production also demands gravitational stirring by
an ensemble of planets with radii of 500–1000 km or
larger (Kenyon and Bromley 2004a; Wyatt et al. 2005),
debris disks probably are newly-formed planetary sys-
tems (Aumann et al. 1984; Backman and Paresce 1993;
Artymowicz 1997; Kenyon and Bromley 2002b, 2004a,b).
KBOs provide a crucial test of this picture. With objects
ranging in size from 10–20 km to ∼ 1000 km, the size dis-
tribution of KBOs yields a key comparison with theoretical
calculations of planet formation (Davis and Farinella 1997;
Kenyon and Luu 1998, 1999a,b). Once KBOs have sizes
of 100–1000 km, collisional grinding, dynamical perturba-
tions by large planets and passing stars, and self-stirring by
small embedded planets produce features in the distribu-
tions of sizes and dynamical elements that observations can
probe in detail. Although complete calculations of KBO
formation and dynamical evolution are not available, these
calculations will eventually yield a better understanding of
planet formation at 20–100 AU.
The Kuiper belt also enables a vital link between the
solar system and other planetary systems. With an outer
radius of & 1000 AU (Sedna’s aphelion) and a current
mass of ∼ 0.1 M⊕ (Luu and Jewitt 2002; Bernstein et al.
2004, Cahpter by Petit et al.), the Kuiper belt has prop-
erties similar to those derived for the oldest debris disks
(Greaves et al. 2004; Wyatt et al. 2005). Understanding
planet formation in the Kuiper belt thus informs our inter-
pretation of evolutionary processes in other planetary sys-
tems.
This paper reviews applications of coagulation theory for
planet formation in the Kuiper belt. After a brief introduc-
tion to the theoretical background in §2, we describe results
from numerical simulations in §3, compare relevant KBO
observations with the results of numerical simulations in §4,
and contrast the properties of KBOs and asteroids in §5. We
conclude with a short summary in §6.
2. COAGULATION THEORY
Planet formation begins with dust grains suspended in
a gaseous circumstellar disk. Grains evolve into planets
in three steps. Collisions between grains produce larger
aggregates which decouple from the gas and settle into a
dense layer in the disk midplane. Continued growth of the
loosely bound aggregates leads to planetesimals, gravita-
tionally bound objects whose motions are relatively inde-
pendent of the gas. Collisions and mergers among the en-
semble of planetesimals form planets. Here, we briefly de-
scribe the physics of these stages and summarize analytic
results as a prelude to summaries of numerical simulations.
We begin with a prescription for the mass surface density
Σ of gas and dust in the disk. We use subscripts ‘d’ for the
dust and ‘g’ for the gas and adopt
Σd,g = Σ0d,0g
40 AU
, (1)
where a is the semimajor axis. In the MMSN, n =
3/2, Σ0d ≈ 0.1 g cm
−2, and Σ0g ≈ 5–10 g cm
(Weidenschilling 1977b; Hayashi 1981). For a disk with
an outer radius of 100 AU, the MMSN has a mass of
∼ 0.03 M⊙, which is comparable to the disk masses of
young stars in nearby star-forming regions (Natta et al.
2000; Scholz et al. 2006).
The dusty midplane forms quickly (Weidenschilling
1977a, 1980; Dullemond and Dominik 2005). For inter-
stellar grains with radii, r ∼ 0.01–0.1 µm, turbulent mixing
approximately balances settling due to the vertical compo-
nent of the star’s gravity. As grains collide and grow to r ∼
0.1–1 mm, they decouple from the turbulence and settle
into a thin layer in the disk midplane. The timescale for this
process is ∼ 103 yr at 1 AU and ∼ 105 yr at 40 AU.
The evolution of grains in the midplane is uncertain. Be-
cause the gas has some pressure support, it orbits the star
slightly more slowly than the Keplerian velocity. Thus,
orbiting dust grains feel a headwind that drags them to-
ward the central star (Adachi et al. 1976; Weidenschilling
1984; Tanaka and Ida 1999). For m-sized objects, the drag
timescale at 40 AU, ∼ 105 yr, is comparable to the growth
time. Thus, it is not clear whether grains can grow by di-
rect accretion to km sizes before the gas drags them into the
inner part of the disk.
Dynamical processes provide alternatives to random ag-
glomeration of grains. In ensembles of porous grains, gas
flow during disruptive collisions leads to planetesimal for-
mation by direct accretion (Wurm et al. 2004). Analytic es-
timates and numerical simulations indicate that grains with
r ∼ 1 cm are also easily trapped within vortices in the disk
(e.g. de la Fuente Marcos and Barge 2001; Inaba and Barge
2006). Large enhancements in the solid-to-gas ratio within
vortices allows accretion to overcome gas drag, enabling
formation of km-sized planetesimals in 104–105 yr.
If the dusty midplane is calm, it becomes thinner and
thinner until groups of particles overcome the local Jeans
criterion – where their self-gravity overcomes local orbital
shear – and ‘collapse’ into larger objects on the local dy-
namical timescale, ∼ 103 yr at 40 AU (Goldreich and Ward
1973; Youdin and Shu 2002; Tanga et al. 2004). This pro-
cess is a promising way to form planetesimals; however,
turbulence may prevent the instability (Weidenschilling
1995, 2003, 2006). Although the expected size of a col-
lapsed object is the Jeans wavelength, the range of plan-
etesimal sizes the instability produces is also uncertain.
Once planetesimals with r ∼ 1 km form, gravity dom-
inates gas dynamics. Long range gravitational interactions
exchange kinetic energy (dynamical friction) and angular
momentum (viscous stirring), redistributing orbital energy
and angular momentum among planetesimals. For 1 km
objects at 40 AU, the initial random velocities are compara-
ble to their escape velocities, ∼ 1 m s−1 (Weidenschilling
1980; Goldreich et al. 2004). The gravitational binding en-
ergy (for brevity, we use energy as a shorthand for specific
energy), Eg ∼ 10
4 erg g−1, is then comparable to the typ-
ical collision energy, Ec ∼ 10
4 erg g−1. Both energies are
smaller than the disruption energy – the collision energy
needed to remove half of the mass from the colliding pair
of objects – which is Q∗D ∼ 10
5–107 erg g−1 for icy mate-
rial (Davis et al. 1985; Benz and Asphaug 1999; Ryan et al.
1999; Michel et al. 2001; Leinhardt and Richardson 2002;
Giblin et al. 2004). Thus, collisions produce mergers in-
stead of debris.
Initially, small planetesimals grow slowly. For a large
ensemble of planetesimals, the collision rate is nσv, where
n is the number of planetesimals, σ is the cross-section,
and v is the relative velocity. The collision cross-section is
the geometric cross-section, πr2, scaled by the gravitational
focusing factor, fg,
σc ∼ πr
2fg ∼ πr
2(1 + β(vesc/evK)
2) , (2)
where e is the orbital eccentricity, vK is the orbital ve-
locity, vesc is the escape velocity of the merged pair
of planetesimals, and β ≈ 2.7 is a coefficient that ac-
counts for three-dimensional orbits in a rotating disk
(Greenzweig and Lissauer 1990; Spaute et al. 1991; Wetherill and Stewart
1993). Because evK ≈ vesc, gravitational focusing factors
are small and growth is slow and orderly (Safronov 1969).
The timescale for slow, orderly growth is
ts ≈ 30
1000 km
250 yr
0.1 g cm−2
Gyr ,
where P is the orbital period (Safronov 1969; Lissauer
1987; Goldreich et al. 2004).
As larger objects form, several processes damp parti-
cle random velocities and accelerate growth. For objects
with r ∼ 1–100 m, physical collisions reduce particle ran-
dom velocities (Ohtsuki 1992; Kenyon and Luu 1998). For
larger objects with r & 0.1 km, the smaller objects damp
the orbital eccentricity of larger particles through dynam-
ical friction (Wetherill and Stewart 1989; Kokubo and Ida
1995; Kenyon and Luu 1998). Viscous stirring by the large
objects excites the orbits of the small objects. For planetes-
imals with r ∼ 1 m to r ∼ 1 km, these processes occur
on short timescales, . 106 yr at 40 AU, and roughly bal-
ance when these objects have orbital eccentricity e ∼ 10−5.
In the case where gas drag is negligible, Goldreich et al.
(2004) derive a simple relation for the ratio of the eccentric-
ities of the large (‘l’) and the small (‘s’) objects in terms of
their surface densities Σl,s (see also Kokubo and Ida 2002;
Rafikov 2003c,b,d,a),
, (4)
with γ = 1/4 to 1/2. Initially, most of the mass is in small
objects. Thus Σl/Σs ≪ 1. For Σl/Σs ∼ 10
−3 to 10−2,
el/es ≈ 0.1–0.25. Because esvK ≪ vl,esc gravitational
focusing factors for large objects accreting small objects are
large. Runaway growth begins.
Runaway growth relies on positive feedback between
accretion and dynamical friction. Dynamical friction pro-
duces the largest fg for the largest objects, which grow
faster and faster relative to the smaller objects and contain
an ever growing fraction of the total mass. As they grow,
these protoplanets stir the planetesimals. The orbital ve-
locity dispersions of small objects gradually approach the
escape velocities of the protoplanets. With esvK ∼ vl,esc,
collision rates decline as runaway growth continues (eqs.
(2) and (4)). The protoplanets and leftover planetesimals
then enter the oligarchic phase, where the largest objects
– oligarchs – grow more slowly than they did as runaway
objects but still faster than the leftover planetesimals. The
timescale to reach oligarchic growth is (Lissauer 1987;
Goldreich et al. 2004)
to ≈ 30
250 yr
0.1 g cm−2
Myr , (5)
For the MMSN, to ∝ a
−3. Thus, collisional damping,
dynamical friction and gravitational focusing enhance the
growth rate by 3 orders of magnitude compared to orderly
growth.
Among the oligarchs, smaller oligarchs grow the fastest.
Each oligarch tries to accrete material in an annular ‘feed-
ing zone’ set by balancing the gravity of neighboring oli-
garchs. If an oligarch accretes all the mass in its feed-
ing zone, it reaches the ‘isolation mass’ (Lissauer 1987;
Kokubo and Ida 1998, 2002; Rafikov 2003a; Goldreich et al.
2004),
miso ≈ 28
40 AU
0.1 g cm−2
M⊕ . (6)
Each oligarch stirs up leftover planetesimals along its or-
bit. Smaller oligarchs orbit in regions with smaller Σl/Σs.
Thus, smaller oligarchs have larger gravitational focusing
factors (eqs. (2) and (4)) and grow faster than larger oli-
garchs (Kokubo and Ida 1998; Goldreich et al. 2004).
As oligarchs approach miso, they stir up the velocities
of the planetesimals to the disruption velocity. Instead of
mergers, collisions then yield smaller planetesimals and de-
bris. Continued disruptive collisions lead to a collisional
cascade, where leftover planetesimals are slowly ground to
dust (Dohnanyi 1969; Williams and Wetherill 1994). Ra-
diation pressure from the central star ejects dust grains
with r . 1–10 µm; Poynting-Robertson drag pulls larger
grains into the central star (Burns et al. 1979; Artymowicz
1988; Takeuchi and Artymowicz 2001). Eventually, plan-
etesimals are accreted by the oligarchs or ground to dust.
To evaluate the oligarch mass required for a disruptive
collision, we consider two planetesimals with equal mass
mp. The center-of-mass collision energy is
Qi = v
i /8 , (7)
where the impact velocity v2i = v
2+v2esc (Wetherill and Stewart
1993). The energy needed to remove half of the combined
mass of two colliding planetesimals is
Q∗D = Qb
+ ρQg
, (8)
1 3 5 7
log Radius (cm)
e = 0.001
e = 0.01
e = 0.1
Fig. 1.— Disruption energy, Q∗D, for icy objects. The solid
curve plots a typical result derived from numerical simulations of
collisions that include a detailed equation of state for crystalline
ice (Qb = 1.6 × 10
7 erg g−1, βb = −0.42, ρ = 1.5 g cm
−3, Qg =
1.5 erg cm−3, and βg = 1.25; Benz and Asphaug 1999). The other
curves plot results using Qb consistent with model fits to comet
breakups (βb ≈ 0; Qb ∼ 10
3 erg g−1, dashed curve; Qb ∼ 10
erg g−1, dot-dashed curve; Asphaug and Benz 1996). The dashed
horizontal lines indicate the center of mass collision energy (eq.
(7)) for equal mass objects with e = 0.001, 0.01, and 0.1. Col-
lisions between objects with Qi ≪ Q
D yield merged remnants;
collisions between objects with Qi ≫ Q
D produce debris.
where Qbr
βb is the bulk (tensile) component of the binding
energy and ρQgr
βg is the gravity component of the bind-
ing energy (Davis et al. 1985; Housen and Holsapple 1990,
1999; Holsapple 1994; Benz and Asphaug 1999). We adopt
v ≈ vesc,o, where vesc,o = (Gmo/ro)
1/2 is the escape ve-
locity of an oligarch with mass mo and radius ro. We define
the disruption mass md by deriving the oligarch mass where
Qi ≈ Q
D. For icy objects at 30 AU
md ∼ 3× 10
107 erg g−1
M⊕. (9)
Figure 1 illustrates the variation of Q∗D with radius for
several variants of eq. (8). For icy objects, detailed nu-
merical collision simulations yield Qb . 10
7 erg g−1,
−0.5 . βb . 0, ρ ≈ 1–2 g cm
−3, Qg ≈ 1–2 erg cm
and βg ≈ 1–2 (solid line in Fig. 1, Benz and Asphaug
1999, see also Chapter by Leinhardt et al.)). Models for
the breakup of comet Shoemaker-Levy 9 suggest a smaller
component of the bulk strength, Qb ∼ 10
3 erg g−1 (e.g.,
Asphaug and Benz 1996), which yields smaller disruption
energies for smaller objects (Fig. 1, dashed and dot-dashed
curves). Because nearly all models for collisional disrup-
tion yield similar results for objects with r & 1 km (e.g.,
Kenyon and Bromley 2004d), the disruption mass is fairly
independent of theoretical uncertainties once planetesimals
become large. For typical Q∗D ∼ 10
7–108 erg g−1 for 1–10
km objects (Fig. 1), leftover planetesimals start to disrupt
when oligarchs have radii, ro ∼ 200–500 km.
Once disruption commences, the final mass of an oli-
garch depends on the timescale for the collisional cascade
(Kenyon and Bromley 2004a,b,d; Leinhardt and Richardson
2005). If disruptive collisions produce dust grains much
faster than oligarchs accrete leftover planetesimals, oli-
garchs with mass mo cannot grow much larger than the
disruption radius (maximum oligarch mass mo,max ≈ md).
However, if oligarchs accrete grains and leftover planetes-
imals effectively, oligarchs reach the isolation mass before
collisions and radiation pressure remove material from the
disk (eq. (6); Goldreich et al. 2004). The relative rates of
accretion and disruption depend on the balance between
collisional damping and gas drag – which slow the colli-
sional cascade – and viscous stirring and dynamical friction
– which speed up the collisional cascade. Because deriv-
ing accurate rates for these processes requires numerical
simulations of planetesimal accretion, we now consider
simulations of planet formation in the Kuiper belt.
3. COAGULATION SIMULATIONS
3.1. Background
Safronov (1969) invented the current approach to plan-
etesimal accretion calculations. In his particle-in-a-box
method, planetesimals are a statistical ensemble of masses
with distributions of orbital eccentricities and inclinations
(Greenberg et al. 1978; Wetherill and Stewart 1989, 1993;
Spaute et al. 1991). This statistical approximation is es-
sential: N -body codes cannot follow the n ∼ 109–1012
1 km planetesimals required to build Pluto-mass or Earth-
mass planets. For large numbers of objects on fairly cir-
cular orbits (e.g., n & 104, r . 1000 km, and e . 0.1),
the method is also accurate. With a suitable prescription
for collision outcomes, solutions to the coagulation equa-
tion in the kinetic theory yield the evolution of n(m) with
arbitrarily small errors (e.g., Wetherill 1990; Lee 2000;
Malyshkin and Goodman 2001).
In addition to modeling planet growth, the statistical
approach provides a method for deriving the evolution of
orbital elements for large ensembles of planetesimals. If
we (i) assume the distributions of e and i for planetesimals
follow a Rayleigh distribution and (ii) treat their motions
as perturbations of a circular orbit, we can use the Fokker-
Planck equation to solve for small changes in the orbits
due to gas drag, gravitational interactions, and physical col-
lisions (Hornung et al. 1985; Wetherill and Stewart 1993;
Ohtsuki et al. 2002). Although the Fokker-Planck equation
cannot derive accurate orbital parameters for planetesimals
and oligarchs near massive planets, it yields accurate solu-
tions for the ensemble average e and i when orbital reso-
nances and other dynamical interactions are not important
(e.g., Wetherill and Stewart 1993; Weidenschilling et al.
1997a; Ohtsuki et al. 2002).
Several groups have implemented Safronov’s method for
calculations relevant to the outer solar system (Greenberg et al.
1984; Stern 1995, 2005; Stern and Colwell 1997a,b; Davis and Farinella
1997; Kenyon and Luu 1998, 1999a,b; Davis et al. 1999;
Kenyon and Bromley 2004a,d, 2005). These calculations
-3 -2 -1 0 1 2 3 4
log Radius (km)
0 Myr
40-47 AU
-3 -2 -1 0 1 2 3 4
log Radius (km)
Fig. 2.— Evolution of a multiannulus coagulation model with Σ = 0.12(ai/40 AU)−3/2 g cm−2. Left: cumulative mass distribution
at times indicated in the legend. Right: eccentricity distributions at t = 0 (light solid line), t = 10 Myr (filled circles), t = 100 Myr
(open boxes), t = 1 Gyr (filled triangles), and t = 5 Gyr (open diamonds). As large objects grow in the disk, they stir up the leftover
planetesimals to e ∼ 0.1. Disruptive collisions then deplete the population of 0.1–10 km planetesimals, which limits the growth of the
largest objects.
adopt a disk geometry and divide the disk into N concen-
tric annuli with radial width ∆ai at distances ai from the
central star. Each annulus is seeded with a set of planetes-
imals with masses mij , eccentricities eij , and inclinations
iij , where the index i refers to one of N annuli and the
index j refers to one of M mass batches within an annulus.
The mass batches have mass spacing δ ≡ mj+1/mj . In
most calculations, δ ≈ 2; δ ≤ 1.4 is optimal (Ohtsuki et al.
1990; Wetherill and Stewart 1993; Kenyon and Luu 1998).
Once the geometry is set, the calculations solve a set of
coupled difference equations to derive the number of ob-
jects nij , and the orbital parameters, eij and iij , as func-
tions of time. Most studies allow fragmentation and ve-
locity evolution through gas drag, collisional damping, dy-
namical friction and viscous stirring. Because Q∗D sets the
maximum size mc,max of objects that participate in the
collisional cascade, the size distribution for objects with
m < mc,max depends on the fragmentation parameters (eq.
(8); Davis and Farinella 1997; Kenyon and Bromley 2004d;
Pan and Sari 2005). The size and velocity distributions of
the merger population with m > mc,max are established
during runaway growth and the early stages of oligarchic
growth. Accurate treatment of velocity evolution is impor-
tant for following runaway growth and thus deriving good
estimates for the growth times and the size and velocity dis-
tributions of oligarchs.
When a few oligarchs contain most of the mass, col-
lision rates depend on the orbital dynamics of individual
objects instead of ensemble averages. Safronov’s statisti-
cal approach then fails (e.g., Wetherill and Stewart 1993;
Weidenschilling et al. 1997b). Although N-body methods
can treat the evolution of the oligarchs, they cannot follow
the evolution of leftover planetesimals, where the statis-
tical approach remains valid (e.g., Weidenschilling et al.
1997b). Spaute et al. (1991) solve this problem by adding
a Monte Carlo treatment of binary interactions between
large objects to their multiannulus coagulation code.
Bromley and Kenyon (2006) describe a hybrid code, which
merges a direct N-body code with a multiannulus coagula-
tion code. Both codes have been applied to terrestrial planet
formation, but not to the Kuiper belt.
Current calculations cannot follow collisional growth ac-
curately in an entire planetary system. Although the six
order of magnitude change in formation timescales from
∼ 0.4 AU to 40 AU is a factor in this statement, most
modern supercomputers cannot finish calculations involv-
ing the entire disk with the required spatial resolution on
a reasonable timescale. For the Kuiper belt, it is possi-
ble to perform a suite of calculations in a disk extending
from 30–150 AU following 1 m and larger planetesimals.
These calculations yield robust results for the mass distri-
bution as a function of space and time and provide interest-
ing comparisons with observations. Although current cal-
culations do not include complete dynamical interactions
with the giant planets or passing stars ((see, for example,
Charnoz and Morbidelli 2007), sample calculations clearly
show the importance of external perturbations in treating
the collisional cascade. We begin with a discussion of
self-stirring calculations without interactions with the gi-
ant planets or passing stars and then describe results with
external perturbers.
3.2. Self-Stirring
To illustrate in situ KBO formation at 40–150 AU, we
consider a multiannulus calculation with an initial ensem-
ble of 1 m to 1 km planetesimals in a disk with Σ0d =
0.12 g cm−2. The planetesimals have initial radii of 1 m
to 1 km (with equal mass per logarithmic bin), e0 = 10
i0 = e0/2, mass density
1 ρ = 1.5 g cm−3 and fragmentation
parameters Qb = 10
3 erg g−1, Qg = 1.5 erg cm
−3, βb = 0,
1 Our choice of mass density is a compromise between pure ice (ρ = 1
g cm−3) and the measured density of Pluto (ρ ≈ 2 g cm−3 Null et al.
and βg = 1.25 (dashed curve in Fig. 1; Kenyon and Bromley
2004d, 2005). The gas density also follows a MMSN, with
Σg/Σd = 100 and a vertical scale height h = 0.1 r
(Kenyon and Hartmann 1987). The gas density is Σg ∝
e−t/tg , with tg = 10 Myr.
This calculation uses an updated version of the Bromley and Kenyon
(2006) code that includes a Richardson extrapolation pro-
cedure in the coagulation algorithm. As in the Eulerian
(Kenyon and Luu 1998) and fourth order Runge-Kutta
(Kenyon and Bromley 2002a,b) methods employed pre-
viously, this code provides robust numerical solutions to
kernels with analytic solutions (e.g., Ohtsuki et al. 1990;
Wetherill 1990) without producing the wavy size distribu-
tions described in other simulations with a low mass cutoff
(e.g. Campo Bagatin et al. 1994). Once the evolution of
large (r > 1 m) objects is complete, a separate code tracks
the evolution of lower mass objects and derives the dust
emission as a function of time.
Figure 2 shows the evolution of the mass and eccentric-
ity distributions at 40–47 AU for this calculation. During
the first few Myr, the largest objects grow slowly. Dynam-
ical friction damps the orbits of the largest objects; colli-
sional damping and gas drag circularize the orbits of the
smallest objects. This evolution erases many of the initial
conditions and enhances gravitational focusing by factors
of 10–1000. Runaway growth begins. A few (and some-
times only one) oligarchs then grow from r ∼ 10 km to
r ∼ 1000 km in ∼ 30 Myr at 40 AU and in ∼ 1 Gyr at
150 AU (see eq. (5)). Throughout runaway growth, dynam-
ical friction and viscous stirring raise the random velocities
of the leftover planetesimals to e ≈ 0.01–0.1 and i ≈ 2o–
4o (v ∼ 50–500 m s−1 at 40–47 AU; Figure 2; right panel).
Stirring reduces gravitational focusing factors and ends run-
away growth. The large oligarchs then grow slowly through
accretion of leftover planetesimals.
As oligarchs grow, collisions among planetesimals initi-
ate the collisional cascade. Disruptive collisions dramat-
ically reduce the population of 1–10 km objects, which
slows the growth of oligarchs and produces a significant de-
bris tail in the size distribution. In these calculations, dis-
ruptive collisions remove material from the disk faster than
oligarchs can accrete the debris. Thus, growth stalls and
produces ∼ 10–100 objects with maximum sizes rmax ∼
1000–3000 km at 40–50 AU (Stern and Colwell 1997a,b;
Kenyon and Bromley 2004d, 2005; Stern 2005).
Stochastic events lead to large dispersions in the growth
time for oligarchs, to (eq. (5)). In ensembles of 25–50 simu-
lations with identical starting conditions, an occasional oli-
garch will grow up to a factor of two faster than its neigh-
bors. This result occurs in simulations with δ = 1.4, 1.7, and
2.0, and thus seems independent of mass resolution. These
events occur in ∼ 25% of the simulations and lead to factor
of ∼ 2 variations in to (eq. (5)).
In addition to modest-sized icy planets, oligarchic
1993). The calculations are insensitive to factor of two variations in the
mass density of planetesimals.
6 7 8 9 10
log Time (yr)
R > 10 km (40-47 AU)
R > 100 km (40-47 AU)
Dust (40-150 AU)
Fig. 3.— Time evolution of the mass in KBOs and dust grains.
Solid line: dust mass (r . 1 mm) at 40–150 AU. Dashed (dot-
dashed) lines: total mass in small (large) KBOs at 40–47 AU.
growth generates copious amounts of dust (Figure 3). When
runaway growth begins, collisions produce small amounts
of dust from ‘cratering’ (see, for example Greenberg et al.
1978; Wetherill and Stewart 1993; Stern and Colwell 1997a,b;
Kenyon and Luu 1999a). Stirring by growing oligarchs
leads to ‘catastrophic’ collisions, where colliding planetes-
imals lose more than 50% of their initial mass. These dis-
ruptive collisions produce a spike in the dust production
rate that coincides with the formation of oligarchs with r &
200–300 km (eq. (9)). As the wave of runaway growth
propagates outward, stirring produces disruptive collisions
at ever larger heliocentric distances. The dust mass grows
in time and peaks at ∼ 1 Gyr, when oligarchs reach their
maximum mass at 150 AU. As the mass in leftover plan-
etesimals declines, Poynting-Robertson drag removes dust
faster than disruptive collisions produce it. The dust mass
then declines with time.
3.3. External Perturbation
Despite the efficiency of self-stirring models in remov-
ing leftover planetesimals from the disk, other mechanisms
must reduce the derived mass in KBOs to current ob-
servational limits. In self-stirring calculations at 40–50
AU, the typical mass in KBOs with r ∼ 30—1000 km
at 4–5 Gyr is a factor of 5–10 larger than currently ob-
served (Luu and Jewitt 2002, Chapter by Petit et al.). Un-
less Earth-mass or larger objects form in the Kuiper belt
(Chiang et al. 2007; Levison and Morbidelli 2007), exter-
nal perturbations must excite KBO orbits and enhance the
collisional cascade.
Two plausible sources of external perturbation can re-
duce the predicted KBO mass to the desired limits. Once
Neptune achieves its current mass and orbit, it stirs up
the orbits of KBOs at 35–50 AU (Levison and Duncan
1990; Holman and Wisdom 1993; Duncan et al. 1995;
Kuchner et al. 2002; Morbidelli et al. 2004). In ∼ 100 Myr
or less, Neptune removes nearly all KBOs with a . 37–38
AU. Beyond a ∼ 38 AU, some KBOs are trapped in or-
bital resonance with Neptune (Malhotra 1995, 1996); oth-
ers are ejected into the scattered disk (Duncan and Levison
1997). In addition to these processes, Neptune stir-
ring increases the effectiveness of the collisional cascade
(Kenyon and Bromley 2004d), which removes additional
mass from the population of 0.1–10 km KBOs and prevents
growth of larger KBOs.
Passing stars can also excite KBO orbits and en-
hance the collisional cascade. Although Neptune dy-
namically ejects scattered disk objects with perihelia
q . 36–37 AU (Morbidelli et al. 2004), objects with
q & 45–50 AU, such as Sedna and Eris, require an-
other scattering source. Without evidence for massive
planets at a & 50 AU (Morbidelli et al. 2002), a pass-
ing star is the most likely source of the large q for these
KBOs (Morbidelli and Levison 2004; Kenyon and Bromley
2004c).
Adams and Laughlin (2001, see also Chapter by Dun-
can et al.) examined the probability of encounters between
the young Sun and other stars. Most stars form in dense
clusters with estimated lifetimes of ∼ 100 Myr. To ac-
count for the abundance anomalies of radionuclides in so-
lar system meteorites (produced by supernovae in the clus-
ter) and for the stability of Neptune’s orbit at 30 AU, the
most likely solar birth cluster has ∼ 2000–4000 members,
a crossing time of ∼ 1 Myr, and a relaxation time of ∼ 10
Myr. The probability of a close encounter with a distance
of closest approach aclose is then ∼ 60% (aclose/160 AU)
(Kenyon and Bromley 2004c).
Because the dynamical interactions between KBOs in a
coagulation calculation and large objects like Neptune or
a passing star are complex, here we consider simple cal-
culations of each process. To illustrate the evolution of
KBOs after a stellar flyby, we consider a very close pass
with aclose = 160 AU (Kenyon and Bromley 2004c). This
co-rotating flyby produces objects with orbital parameters
similar to those of Sedna and Eris. For objects in the coag-
ulation calculation, the flyby produces an e distribution
eKBO =
0.025(a/30 AU)4 a < a0
0.5 a > a0
with a0 ≈ 65 AU (see Ida et al. 2000; Kenyon and Bromley
2004c; Kobayashi et al. 2005). This e distribution produces
a dramatic increase in the debris production rate throughout
the disk, which freezes the mass distribution of the largest
objects2. Thus, to produce an ensemble of KBOs with r &
300 km at 40–50 AU, the flyby must occur when the Sun
has an age t⊙ & 10–20 Myr (Figure 2). For t⊙ & 100 Myr,
2 The i distribution following a flyby depends on the relative orientations of
two planes, the orbital plane of KBOs and the plane of the trajectory of the
passing star. Here, we assume the flyby produces no change in i, which
simplifies the discussion without changing any of the results significantly.
the flyby is very unlikely. As a compromise between these
two estimates, we consider a flyby at t⊙ ∼ 50 Myr.
6 7 8 9 10
log Time (yr)
Flyby Model
40-47 AU
47-55 AU
66-83 AU
Fig. 4.— Evolution of Σ after a stellar flyby. After 50 Myr
of growth, the close pass excites KBOs to large e (eq. (10)) and
enhances the collisional cascade.
Figure 4 shows the evolution of the KBO surface den-
sity in three annuli as a function of time. At early times
(t . 50 Myr), KBOs grow in the standard way. After the
flyby, the disk suffers a dramatic loss of material. At 40–47
AU, the disk loses ∼ 90% (93%) of its initial mass in ∼ 1
Gyr (4.5 Gyr). At ∼ 50–80 AU, the collisional cascade re-
moves ∼ 90% (97%) of the initial mass in ∼ 500 Myr (4.5
Gyr). Beyond ∼ 80 AU, KBOs contain less than 1% of the
initial mass. Compared to self-stirring models, flybys that
produce Sedna-like orbits are a factor of 2–3 more efficient
at removing KBOs from the solar system.
To investigate the impact of Neptune on the collisional
cascade, we parameterize the growth of Neptune at 30 AU
as a simple function of time (Kenyon and Bromley 2004d)
MNep ≈
6× 1027 e(t−tN )/t1 g t < tN
6× 1027 g + C(t− t1) tN < t < t2
1.0335× 1029 g t > t2
where CNep = 1.947 × 10
21 g yr−1, tN = 50 Myr, t1 = 3
Myr, and t2 = 100 Myr. These choices enable a model Nep-
tune to reach a mass of 1 M⊕ in 50 Myr, when the largest
KBOs form at 40–50 AU, and reach its current mass in 100
Myr3. As Neptune approaches its final mass, its gravity stirs
up KBOs at 40–60 AU and increases their orbital eccentric-
ities to e ∼ 0.1–0.2 on short timescales. In the coagula-
tion model, distant planets produce negligible changes in i,
so self-stirring sets i in these calculations (Weidenschilling
1989). This evolution enhances debris production by a fac-
tor of 3–4, which effectively freezes the mass distribution
of 100–1000 km objects at 40–50 AU. By spreading the
3 This prescription is not intended as an accurate portrayal of Neptune for-
mation, but it provides a simple way to investigate how Neptune might stir
the Kuiper belt once massive KBOs form.
6 7 8 9 10
log Time (yr)
Neptune Stirring
SS: 40-47 AU
NS: 40-47 AU
NS: 47-55 AU
Fig. 5.— Evolution of Σ(KBO) in models with Neptune stirring.
Compared to self-stirring models (SS; dashed curve), stirring by
Neptune rapidly removes KBOs at 40–47 AU (NS; solid cruve)
and at 47–55 AU (NS; dot-dashed curve).
leftover planetesimals and the debris over a larger volume,
Neptune stirring limits the growth of the oligarchs and thus
reduces the total mass in KBOs.
Figure 5 shows the evolution of the surface density in
small and large KBOs in two annuli as a function of time.
At 40–55 AU, Neptune rapidly stirs up KBOs to e ∼ 0.1
when it reaches its current mass at ∼ 100 Myr. Large
collision velocities produce more debris, which is rapidly
ground to dust and removed from the system by radiation
pressure at early times and by Poynting-Robertson drag at
later times. Compared to self-stirring models, the change
in Σ is dramatic, with only ∼ 3% of the initial disk mass
remaining at ∼ 4.5 Gyr.
From these initial calculations, it is clear that exter-
nal perturbations dramatically reduce the mass of KBOs
in the disk (see also Charnoz and Morbidelli 2007). Fig-
ure 6 compares the mass distributions at 40–47 AU and at
4.5 Gyr for the self-stirring model in Figure 2 (solid line)
with results for the flyby (dot-dashed line) and Neptune stir-
ring (dashed line). Compared to the self-stirring model, the
close flyby reduces the mass in KBOs by ∼ 50%. Neptune
stirring reduces the KBO mass by almost a factor of 3 rel-
ative to the self-stirring model. For KBOs with r & 30–50
km, the predicted mass in KBOs with Neptune stirring is
within a factor of 2–3 of the current mass in KBOs.
These simple calculations for the stellar flyby and Nep-
tune stirring do not include dynamical depletion. In the stel-
lar flyby picture, the encounter removes nearly all KBOs
beyond a truncation radius, aT ∼ 48 (aclose / 160 AU)
AU (Kenyon and Bromley 2004c). Thus, a close pass with
aclose ∼ 160 AU can produce the observed outer edge of the
Kuiper belt at 48 AU. Although many objects with initial
a > aT are ejected from the Solar System, some are placed
on very elliptical, Sedna-like orbits4. In the Neptune stir-
4Levison et al. (2004) consider the impact of the flyby on the scattered disk
and Oort cloud. After analyzing a suite of numerical simulations, they
conclude that the flyby must occur before Neptune reaches its current orbit
-3 -2 -1 0 1 2 3 4
log Radius (km)
Initial state
Self-stirring
Flyby 
Neptune
40-47 AU
Fig. 6.— Mass distributions for evolution with self-stirring
(heavy solid line), stirring from a passing star (dot-dashed line),
and stirring from Neptune at 30 AU (dashed line). After 4.5 Gyr,
the mass in KBOs with r & 50 km is ∼ 5% (self-stirring), ∼
3.5% (flyby), and ∼ 2% (Neptune stirring) of the initial mass. The
number of objects with r & 1000 km is ∼ 100 (self-stirring), ∼
1 (flyby), and ∼ 10 (Neptune stirring). The largest object has
rmax ∼ 3000 km (self-stirring), rmax ∼ 500–1000 km (flyby),
and rmax ∼ 1000–2000 km (Neptune stirring).
ring model, dynamical interactions will eject some KBOs
at 40–47 AU. If the dynamical interactions that produce the
scattered disk reduce the mass in KBOs by a factor of 2 at
40–47 AU (e.g., Duncan et al. 1995; Kuchner et al. 2002),
the Neptune stirring model yields a KBO mass in reason-
ably good agreement with observed limits (for a different
opinion, see Charnoz and Morbidelli 2007).
3.4. Nice Model
Although in situ KBO models can explain the current
amount of mass in large KBOs, these calculations do not
address the orbits of the dynamical classes of KBOs. To ex-
plain the orbital architecture of the giant planets, the ‘Nice
group’ centered at Nice Observatory developed an inspired,
sophisticated picture of the dynamical evolution of the gi-
ant planets and a remnant planetesimal disk (Tsiganis et al.
2005; Morbidelli et al. 2005; Gomes et al. 2005, and refer-
ences therein). The system begins in an approximate equi-
librium, with the giant planets in a compact configuration
(Jupiter at 5.45 AU, Saturn at ∼ 8.2 AU, Neptune at ∼ 11.5
AU, and Uranus at ∼ 14.2 AU) and a massive planetesimal
disk at 15–30 AU. Dynamical interactions between the gi-
ant planets and the planetesimals lead to an instability when
Saturn crosses the 2:1 orbital resonance with Jupiter, which
results in a dramatic orbital migration of the gas giants and
the dynamical ejection of planetesimals into the Kuiper belt,
scattered disk, and the Oort cloud. Comparisons between
the end state of this evolution and the orbits of KBOs in
and begins the dynamical processes that populate the Oort cloud and the
scattered disk. If Neptune forms in situ in 1–10 Myr, then the flyby cannot
occur after massive KBOs form. If Neptune migrates to 30 AU after mas-
sive KBOs form, then a flyby can truncate the Kuiper belt without much
impact on the Oort cloud or the scattered disk.
6 7 8 9 10
log Time (yr)
20-25 AU
36-44 AU
66-83 AU
NICE Model
Fig. 7.— Evolution of Σ in a self-stirring model at 20–100 AU.
At 20–25 AU, it takes ∼ 5–10 Myr to form 1000 km objects. After
∼ 0.5–1 Gyr, there are ∼ 100 objects with r ∼ 1000–2000 km
and ∼ 105 objects with r ∼ 100–200 km at 20–30 AU. As these
objects grow, the collisional cascade removes ∼ 90% of the mass
in remnant planetesimals. The twin vertical dashed lines bracket
the time of the Late Heavy Bombardment at ∼ 300–600 Myr.
the ‘hot population’ and the scattered disk are encouraging
(Chapter by Morbidelli et al.).
Current theory cannot completely address the likelihood
of the initial state in the Nice model. Thommes et al. (1999,
2002) demonstrate that n-body simulations can produce a
compact configuration of gas giants, but did not consider
how fragmentation or interactions with low mass planetes-
imals affect the end state. O’Brien et al. (2005) show that
a disk of planetesimals has negligible collisional grinding
over 600 Myr if most of the mass is in large planetesimals
with r & 100 km. However, they did not address whether
this state is realizable starting from an ensemble of 1 km
and smaller planetesimals. In terrestrial planet simulations
starting with 1–10 km planetesimals, the collisional cascade
removes ∼ 25% of the initial rocky material in the disk
(Wetherill and Stewart 1993; Kenyon and Bromley 2004b).
Interactions between oligarchs and remnant planetesimals
are also important for setting the final mass and dynamical
state of the terrestrial planets (Bromley and Kenyon 2006;
Kenyon and Bromley 2006). Because complete hybrid cal-
culations of the giant planet region are currently compu-
tationally prohibitive, it is not possible to make a reliable
assessment of these issues for the formation of gas giant
planets.
Here, we consider the evolution of the planetesimal disk
outside the compact configuration of giant planets, where
standard coagulation calculations can follow the evolution
of many initial states for 1–5 Gyr in a reasonable amount of
computer time. Figure 7 shows the time evolution for the
surface density of planetesimals in three annuli from one
typical calculation at 20–25 AU (dot-dashed curve; Mi = 6
M⊕), 36–44 AU (dashed curve; Mi = 9 M⊕), and 66–83
AU (solid curve; Mi = 12 M⊕). Starting from the stan-
dard surface density profile (eq. 1), planetesimals at 20–25
AU grow to 1000 km sizes in a few Myr. Once the colli-
sional cascade begins, the surface density slowly declines
to ∼ 10% to 20% of its initial value at the time of the Late
Heavy Bombardment, when the Nice model predicts that
Saturn crosses the 2:1 orbital resonance with Jupiter.
These results provide a strong motivation to couple co-
agulation calculations with the dynamical simulations of the
Nice group (see also Charnoz and Morbidelli 2007). In the
Nice model, dynamical interactions with a massive plan-
etesimal disk are the ‘fuel’ for the dramatic migration of the
giant planets and the dynamical ejection of material into the
Kuiper belt and the scattered disk. If the mass in the plan-
etesimal disk declines by ∼ 80% as the orbits of the giant
planets evolve, the giant planets cannot migrate as dramati-
cally as in the Gomes et al. (2005) calculations. Increasing
the initial mass in the disk by a factor of 3–10 may allow
coagulation and the collisional cascade to produce a debris
disk capable of triggering the scattering events of the Nice
model.
3.5. A Caveat on the Collisional Cascade
Although many of the basic outcomes of oligarchic
growth and the collisional cascade are insensitive to the ini-
tial conditions and fragmentation parameters for the plan-
etesimals, several uncertainties in the collisional cascade
can modify the final mass in oligarchs and the distributions
of r and e. Because current computers do not allow coag-
ulation calculations that include the full range of sizes (1
µm to 104 km), published calculations have two pieces,
a solution for large objects (e.g., Kenyon and Bromley
2004a,b) and a separate solution for smaller objects (e.g.,
Krivov et al. 2006). Joining these solutions assumes that (i)
collision fragments continue to collide and fragment until
particles are removed by radiative processes and (ii) mu-
tual (destructive) collisions among the fragments are more
likely than mergers with much larger oligarchs. These as-
sumptions are reasonable but untested by numerical cal-
culations (Kenyon and Bromley 2002a). Thus, it may be
possible to halt or to slow the collisional cascade before
radiation pressure rapidly remove small grains with r ≈
1–100 µm.
In current coagulation calculations, forming massive oli-
garchs at 5–15 AU in a massive disk requires an ineffi-
cient collisional cascade. When the cascade is efficient,
the most massive oligarchs have m . 1 M⊕. Slowing
the cascade allows oligarchs to accrete planetesimals more
efficiently, which results in larger oligarchs that contain a
larger fraction of the initial mass. If collisional damping
is efficient, halting the cascade completely at sizes of ∼ 1
mm leads to rapid in situ formation of Uranus and Neptune
(Goldreich et al. 2004) and early stirring of KBOs at 40 AU.
There are two simple ways to slow the collisional cas-
cade. In simulations where the cascade continues to small
sizes, r ∼ 1–10 µm, the radial optical depth in small grains
is τs ∼ 0.1–1 at 30–50 AU (Kenyon and Bromley 2004a).
Lines-of-sight to the central star are not purely radial, so
this optical depth reduces radiation pressure and Poynting-
Robertson drag by small factors, ∼ e−0.2τs ∼ 10%–30%,
and has little impact on the evolution of the cascade. With
τs ∝ a
−s and s ∼ 1–2, however, the optical depth may re-
duce radiation forces significantly at smaller a. Slowing the
collisional cascade by factors of 2–3 could allow oligarchs
to accrete leftover planetesimals and smaller objects before
the cascade removes them.
Collisional damping and gas drag on small particles may
also slow the collisional cascade. For particles with large
ratios of surface area to volume, r . 0.1–10 cm, colli-
sions and the gas effectively damp e and i (Adachi et al.
1976; Goldreich et al. 2004) and roughly balance dynam-
ical friction and viscous stirring. Other interactions be-
tween small particles and the gas – such as photophoresis
(Wurm and Krauss 2006) – also damp particles randome
velocities and thus might help to slow the cascade. Both
collisions and interactions between the gas and the solids
are more effective at large volume density, so these pro-
cesses should be more important inside 30 AU than outside
30 AU. The relatively short lifetime of the gas, ∼ 3–10 Myr,
also favors more rapid growth inside 30 AU. If damping
maintains an equilibrium e ∼ 10−3 at a ∼ 20–30 AU, oli-
garchs can grow to the sizes, r & 2000 km, required in the
Nice model. Rapid growth at a ∼ 5–15 AU might produce
oligarchs with the isolation mass (r ∼ 10–30 R⊕; eq. 6)
and lead to the rapid formation of gas giants.
Testing these mechanisms for slowing the collisional
cascade requires coagulation calculations with accurate
treatments of collisional damping, gas drag, and optical
depth for particle radii r ∼ 1–10 µm to r ∼ 10000 km.
Although these calculations require factors of 4–6 more
computer time than published calculations, they are possi-
ble with multiannulus coagulation codes on modern parallel
computers.
3.6. Model Predictions
The main predictions derived from coagulation mod-
els are n(r), n(e), and n(i) as functions of a. The cu-
mulative number distribution consists of three power laws
(Kenyon and Bromley 2004d; Pan and Sari 2005)
n(r) =
−αd r ≤ r1
ni r1 ≤ r < r0
−αm r ≥ r0
The debris population at small sizes, r ≤ r1, always has
αd ≈ 3.5. The merger population at large sizes, r ≥ r0,
has αm ≈ 2.7–4. Because the collisional cascade robs
oligarchs of material, calculations with more stirring have
steeper size distributions. Thus, self-stirring calculations
with Qb & 10
5 erg g−1 (Qb . 10
3 erg g−1) typically yield
αm ≈ 2.7–3.3 (3.5–4). Models with a stellar flyby or stir-
ring by a nearby gas giant also favor large αm.
The transition radii for the power laws depend on the
fragmentation parameters (see Fig. 1; see also Pan and Sari
(2005)). For a typical e ∼ 0.01–0.1 in self-stirring models,
r0 ≈ r1 ≈ 1 km when Qb & 10
5 erg g−1. When Qb . 10
erg g−1, r1 ≈ 0.1 km and r0 ≈ 10–20 km. Thus the cal-
culations predict a robust correlation between the transition
radii and the power law exponents: large r0 andαm or small
r0 and αm.
Because gravitational stirring rates are larger than ac-
cretion rates, the predicted distributions of e and i at 4–5
Gyr depend solely on the total mass in oligarchs (see also
Goldreich et al. 2004). Small objects with r . r0 con-
tain a very small fraction of the mass and cannot stir them-
selves. Thus e and i are independent of r (Fig. 2). The
e and i for larger objects depends on the total mass in the
largest objects. In self-stirring models, dynamical friction
and viscous stirring between oligarchs and planetesimals
(during runaway growth) and among the ensemble of oli-
garchs (during oligarchic growth) set the distribution of e
for large objects with r & r0. In self-stirring models, vis-
cous stirring among oligarchs dominates dynamical friction
between oligarchs and leftover planetesimals, which leads
to a shallow relation between e and r, e ∝ r−γ with γ ≈
3/4. In the flyby and Neptune stirring models, stirring by
the external perturber dominates stirring among oligarchs.
This stirring yields a very shallow relation between e and r
with γ ≈ 0–0.25.
Other results depend little on the initial conditions and
the fragmentation parameters. In calculations with different
initial mass distributions, an order of magnitude range in e0,
and Qb = 10
0–107 erg g−1, βb = −0.5–0, Qg = 0.5–5 erg
cm−3, and βg ≥ 1.25, rmax and the amount of mass re-
moved by the collisional cascade vary by . 10% relative to
the evolution of the models shown in Figures 2–7. Because
collisional damping among 1 m to 1 km objects erases the
initial orbital distribution, the results do not depend on e0
and i0. Damping and dynamical friction also quickly erase
the initial mass distribution, which yields growth rates that
are insensitive to the initial mass distribution.
The insensitivity of rmax and mass removal to the frag-
mentation parameters depends on the rate of collisional dis-
ruption relative to the growth rate of oligarchs. Because
the collisional cascade starts when mo ∼ md (eq. (9)),
calculations with small Qb (Qb . 10
3 erg g−1) produce
large amounts of debris before calculations with large Qb
(Qb & 10
3 erg g−1). Thus, an effective collisional cascade
should yield lower mass oligarchs and more mass removal
when Qb is small. However, oligarchs with mo ∼ md still
have fairly large gravitational focusing factors and accrete
leftover planetesimals more rapidly than the cascade re-
moves them. As oligarchic growth continues, gravitational
focusing factors fall and collision disruptions increase. All
calculations then reach a point where the collisional cascade
removes leftover planetesimals more rapidly than oligarchs
can accrete them. As long as most planetesimals have r ∼
1–10 km, the timing of this epoch is more sensitive to grav-
itational focusing and the growth of oligarchs than the col-
lisional cascade and the fragmentation parameters. Thus,
rmax and the amount of mass processed through the colli-
sional cascade are relatively insensitive to the fragmentation
parameters.
4. Confronting KBO collision models with KBO data
Current data for KBOs provide two broad tests of coagu-
lation calculations. In each dynamical class, four measured
parameters test the general results of coagulation models
and provide ways to discriminate among the outcomes of
self-stirring and perturbed models. These parameters are
• rmax, the size of the largest object,
• αm, the slope of the size distribution for large KBOs
with r & 10 km,
• r0, the break radius, which measures the radius where
the size distribution makes the transition from a
merger population (r & r0) to a collisional popu-
lation (r . r0) as summarized in eq. (12), and
• Ml, the total mass in large KBOs.
For all KBOs, measurements of the dust mass allow tests of
the collisional cascade and link the Kuiper belt to observa-
tions of nearby debris disks. We begin with the discussion
of large KBOs and then compare the Kuiper belt with other
debris disks.
Table 1 summarizes the mass and size distribution pa-
rameters derived from recent surveys. To construct this
table, we used online data from the Minor Planet Center
(http://cfa−www.harvard.edu/iau/lists/MPLists.html)
for rmax (see also Levison and Stern 2001) and the results
of several detailed analyses for αm, rmax, and r0 (e.g.,
Bernstein et al. 2004; Elliot et al. 2005; Petit et al. 2006,
Chapter by Petit et al.). Because comprehensive KBO sur-
veys are challenging, the entries in the Table are incomplete
and sometimes uncertain. Nevertheless, these results pro-
vide some constraints on the calculations.
Current data provide clear evidence for physical differ-
ences among the dynamical classes. For classical KBOs
with a = 42–48 AU and q > 37 AU, the cold population
(i . 4o) has a steep size distribution with αm ≈ 3.5–4
and rmax ∼ 300–500 km. In contrast, the hot population
(i & 10o) has a shallow size distribution with αm ≈ 3 and
rmax ∼ 1000 km (Levison and Stern 2001). Both popu-
lations have relatively few objects with optical brightness
mR ≈ 27–27.5, which implies r0 ∼ 20–40 km for reason-
able albedo ∼ 0.04–0.07. The detached, resonant, and scat-
tered disk populations contain large objects with rmax ∼
1000 km. Although there are too few detached or scattered
disk objects to constrain αm or r0, data for the resonant
population are consistent with constraints derived for the
hot classical population, αm ≈ 3 and r0 ≈ 20–40 km.
The total mass in KBOs is a small fraction of the ∼
10–30 M⊕ of solid material in a MMSN from ∼ 35–50
AU (Gladman et al. 2001; Bernstein et al. 2004; Petit et al.
2006, see also Chapter by Petit et al.). The classical and res-
onant populations have Ml ≈ 0.01–0.1 M⊕ in KBOs with
TABLE 1. DATA FOR KBO SIZE DISTRIBUTION
KBO Class Ml (M⊕) rmax (km) r0 (km) qm
cold cl 0.01–0.05 400 20–40 km & 4
hot cl 0.01–0.05 1000 20–40 km 3–3.5
detached n/a 1500 n/a n/a
resonant 0.01–0.05 1000 20–40 km 3
scattered 0.1–0.3 700 n/a n/a
r & 10–20 km. The scattered disk may contain more mate-
rial, Ml ∼ 0.3 M⊕, but the constraints are not as robust as
for the classical and resonant KBOs.
These data are broadly inconsistent with the predictions
of self-stirring calculations with no external perturbers. Al-
though self-stirring models yield inclinations, i ≈ 2o–4o,
close to those observed in the cold, classical population, the
small rmax and large αm of this group suggest that an ex-
ternal dynamical perturbation – such as a stellar flyby or
stirring by Neptune – modified the evolution once rmax
reached ∼ 300–500 km. The observed break radius, r0 ∼
20–40 km, also agrees better with the r0 ∼ 10 km ex-
pected from Neptune stirring calculations than the r0 ∼ 1
km achieved in self-stirring models (Kenyon and Bromley
2004d; Pan and Sari 2005). Although a large rmax and
small αm for the resonant and hot, classical populations
agree reasonably well with self-stirring models, the ob-
served rmax ∼ 1000 km is much smaller than the rmax ∼
2000–3000 km typically achieved in self-stirring calcula-
tions (Figure 1). Both of these populations appear to have
large r0, which is also more consistent with Neptune stir-
ring models than with self-stirring models.
The small Ml for all populations provide additional ev-
idence against self-stirring models. In the most optimistic
scenario, where KBOs are easily broken, self-stirring mod-
els leave a factor of 5–10 more mass in large KBOs than
currently observed at 40–48 AU. Although models with
Neptune stirring leave a factor of 2–3 more mass in KBOs
at 40–48 AU than is currently observed, Neptune ejects
∼ half of the KBOs at 40–48 AU into the scattered disk
(e.g., Duncan et al. 1995; Kuchner et al. 2002). With an es-
timated mass of 2–3 times the mass in classical and reso-
nant KBOs, the scattered disk contains enough material to
bridge the difference between the KBO mass derived from
Neptune stirring models and the observed KBO mass.
The mass in KBO dust grains provides a final piece of
evidence against self-stirring models. From an analysis of
data from Pioneer 10 and 11, Landgraf et al. (2002) esti-
mate a dust production rate of ∼ 1015 g yr−1 in 0.01–2
mm particles at 40–50 AU. The timescale for Poynting-
Robertson drag to remove these grains from the Kuiper belt
is ∼ 10–100 Myr (Burns et al. 1979), which yields a mass
of ∼ 1022–1024 g. Figure 8 compares this dust mass with
masses derived from mid-IR and submm observations of
several nearby solar-type stars (Greaves et al. 1998, 2004;
Williams et al. 2004; Wyatt et al. 2005) and with predic-
tions from the self-stirring, flyby, and Neptune stirring mod-
els. The dust masses for nearby solar-type stars roughly fol-
5 6 7 8 9 10
log Time (yr)
Self-stirring
Flyby
Neptune
Fig. 8.— Evolution of mass in small dust grains (0.001–1
mm) for models with self-stirring (dot-dashed line), stirring from
a passing star (dashed line), and stirring from Neptune at 30 AU
(solid line) for Qb = 10
3 erg g−1. Calculations with smaller
(large) Qb produce more (less) dust at t . 50 Myr and some-
what more (less) dust at t & 100 Myr. At 1–5 Gyr, models with
Neptune stirring have less dust than self-stirring or flyby models.
The boxes show dust mass estimated for four nearby solar-type
stars (from left to right in age: HD 107146, ǫ Eri, η Crv, and τ
Cet; Greaves et al. 1998, 2004; Williams et al. 2004; Wyatt et al.
2005) and two estimates for the Kuiper belt (boxes connected by
solid line Landgraf et al. 2002).
low the predictions of self-stirring models and flyby models
with Qb ∼ 10
3 erg g−1. The mass of dust in the Kuiper
belt is 1–3 orders of magnitude smaller than predicted in
self-stirring models and is closer to the predictions of the
Neptune stirring models.
To combine the dynamical properties of KBOs with
these constraints, we rely on results from N -body simu-
lations that do not include collisional processing of small
objects (see Chapter by Morbidelli et al.). For simplic-
ity, we consider coagulation in the context of the Nice
model, which provides a solid framework for interpreting
the dynamics of the gas giants and the dynamical classes of
KBOs. In the Nice model, Saturn’s crossing of the 2:1 res-
onance with Jupiter initiates the dynamical instability that
populates the Kuiper belt. As Neptune approaches a ≈ 30
AU, it captures resonant KBOs, ejects KBOs into the scat-
tered disk and the Oort cloud, and excites the hot classical
KBOs. Although Neptune might reduce the number of cold,
classical KBOs formed roughly in situ beyond 30 AU, the
properties of these KBOs probably reflect conditions in the
Kuiper Belt when the instability began.
The Nice model requires several results from coagula-
tion calculations. Once giant planets form at 5–15 AU, col-
lisional growth must produce thousands of Pluto-mass ob-
jects at 20–30 AU. Unless the planetesimal disk was mas-
sive, growth of oligarchs must dominate collisional grind-
ing in this region of the disk. To produce the cold classical
population at ∼ 45 AU, collisions must produce 1–10 Pluto-
mass objects and then efficiently remove leftover planetesi-
mals. To match the data in Table 1, KBOs formed at 20–30
AU should have a shallower size distribution and a larger
rmax than those at 40–50 AU.
Some coagulation results are consistent with the trends
required in the Nice model. In current calculations, colli-
sional growth naturally yields smaller rmax and a steeper
size distribution at larger a. At 40–50 AU, Neptune-stirring
models produce a few Pluto-mass objects and many smaller
KBOs with e ∼ 0.1 and i ≈ 2o–4o. Although collisional
growth produces more Plutos at 15–30 AU than at 40–50
AU, collisional erosion removes material faster from the in-
ner disk than from the outer disk (Fig. 7). Thus, collisions
do not produce the thousands of Pluto-mass objects at 15–
30 AU required in the Nice model.
Reconciling this aspect of the Nice model with the coag-
ulation calculations requires a better understanding of the
physical processes that can slow or halt the collisional cas-
cade. Producing gas giants at 5–15 AU, thousands of Plutos
at 20–30 AU, and a few or no Plutos at 40–50 AU implies
that the outcome of coagulation changes markedly from 5
AU to 50 AU. If the collisional cascade can be halted as out-
lined in section §3.5, forming 5–10 M⊕ cores at 5–15 AU
is straightforward. Slowing the collisional cascade at 20–30
AU might yield a large population of Pluto mass objects at
20–30 AU. Because αm and rmax are well-correlated, bet-
ter constraints on the KBO size distributions coupled with
more robust coagulation calculations can test these aspects
of the Nice model in more detail.
To conclude this section, we consider constraints on the
Kuiper belt in the more traditional migration scenario of
Malhotra (1995), where Neptune forms at ∼ 20–25 AU
and slowly migrates to 30 AU. To investigate the relative
importance of collisional and dynamical depletion at 40–
50 AU, Charnoz and Morbidelli (2007) couple a collision
code with a dynamical code and derive the expected distri-
butions for size and orbital elements in the Kuiper belt, the
scattered disk, and the Oort cloud. Although collisional de-
pletion models can match the observations of KBOs, these
models are challenged to provide enough small objects into
the scattered disk and Oort cloud. Thus, the results suggest
that dynamical mechanisms dominate collisions in remov-
ing material from the Kuiper belt.
Although Charnoz and Morbidelli (2007) argue against
a dramatic change in collisional evolution from 15 AU to
40 AU, the current architecture of the solar system provides
good evidence for this possibility. In the MMSN, the ratio
of timescales to produce gas giant cores at 10 AU and at
25 AU is ξ = (25/10)3 ∼ 15. In the context of the Nice
model, formation of Saturn and Neptune at 8–11 AU in 5–
10 Myr thus implies formation of other gas giant cores at
20–25 AU in 50–150 Myr. If these cores had formed, they
would have consumed most of the icy planetesimals at 20–
30 AU, leaving little material behind to populate the outer
solar system when the giant planets migrate. The appar-
ent lack of gas giant core formation at 20–30 AU indicates
that the collisional cascade changed dramatically from 5–15
AU (where gas giant planets formed) to 20–30 AU (where
gas giant planets did not form). As outlined in §3.5, under-
standing the interaction of small particles with the gas and
the radiation field may provide important insights into the
evolution of oligarchic growth and thus into the formation
and structure of the solar system.
5. KBOs and Asteroids
In many ways, the Kuiper Belt is similar to the asteroid
belt. Both are populations of small bodies containing rela-
tively little mass compared to the rest of the Solar System;
the structure and dynamics of both populations have been
influenced significantly by the giant planets; and both have
been and continue to be significantly influenced by colli-
sions. Due to its relative proximity to Earth, however, there
are substantially more observational data available for the
asteroid belt than the Kuiper Belt. While the collisional and
dynamical evolution of the asteroid belt is certainly not a
solved problem, the abundance of constraints has allowed
for the development of reasonably consistent models. Here
we briefly describe what is currently understood about the
evolution of the asteroid belt, what insights that may give us
with regards to the evolution of the Kuiper Belt, and what
differences might exist in the evolution of the two popula-
tions.
It has long been recognized that the primordial as-
teroid belt must have contained hundreds or thousands
of times more mass than the current asteroid belt (e.g.
Lecar and Franklin 1973; Safronov 1979; Weidenschilling
1977c; Wetherill 1989). Reconstructing the initial mass
distribution of the Solar System from the current masses of
the planets and asteroids, for example, yields a pronounced
mass deficiency in the asteroid belt region relative to an oth-
erwise smooth distribution for the rest of the Solar System
(Weidenschilling 1977c). To accrete the asteroids on the
timescales inferred from meteoritic evidence would require
hundreds of times more mass than currently exists in the
main belt (Wetherill 1989).
In addition to its pronounced mass depletion, the aster-
oid belt is also strongly dynamically excited. The mean
proper eccentricity and inclination of asteroids larger than
∼50 km in diameter are 0.135 and 10.9o (from the cata-
log of Knežević and Milani (2003)), which are significantly
larger than can be explained by gravitational perturbations
amongst the asteroids or by simple gravitational pertur-
bations from the planets (Duncan 1994). The fact that
the different taxonomic types of asteroids (S-type, C-type,
etc.) are radially mixed somewhat throughout the main belt,
rather than confined to delineated zones, indicates that there
has been significant scattering in semimajor axis as well
(Gradie and Tedesco 1982).
Originally, a collisional origin was suggested for the
mass depletion in the asteroid belt (Chapman and Davis
1975). The difficulty of collisionally disrupting the largest
asteroids, coupled with the survival of the basaltic crust of
the ∼500-km diameter asteroid Vesta, however, suggest that
collisional grinding was not the cause of the mass deple-
tion (Davis et al. 1979, 1985, 1989, 1994; Wetherill 1989;
Durda and Dermott 1997; Durda et al. 1998; Bottke et al.
2005a,b; O’Brien and Greenberg 2005). In addition, col-
lisional processes alone could not fully explain both the dy-
namical excitation and the radial mixing observed in the as-
teroid belt, although Charnoz et al. (2001) suggest that col-
lisional diffusion may have contributed to its radial mixing.
Several dynamical mechanisms have been proposed
to explain the mass depletion, dynamical excitation and
radial mixing of the asteroid belt. As the solar nebula
dissipated, the changing gravitational potential acting on
Jupiter, Saturn, and the asteroids would lead to changes
in their precession rates and hence changes in the posi-
tions of secular resonances, which could ‘sweep’ through
the asteroid belt, exciting e and i, and coupled with gas
drag, could lead to semi-major axis mobility and the re-
moval of material from the belt (e.g., Heppenheimer 1980;
Ward 1981; Lemaitre and Dubru 1991; Lecar and Franklin
1997; Nagasawa et al. 2000, 2001, 2002). It has also been
suggested that sweeping secular resonances could lead to
orbital excitation in the Kuiper Belt (Nagasawa and Ida
2000). However, as reviewed by Petit et al. (2002) and
O’Brien et al. (2006), secular resonance sweeping is gen-
erally unable to simultaneously match the observed e and
i excitation in the asteroid belt, as well as its radial mix-
ing and mass depletion, for reasonable parameter choices
(especially in the context of the Nice Model).
Another possibility is that planetary embryos were able
to accrete in the asteroid belt (e.g., Wetherill 1992). The
fact that Jupiter’s ∼10 Earth-mass core was able to accrete
in our Solar System beyond the asteroid belt suggests that
embryos were almost certainly able to accrete in the aster-
oid belt, even accounting for the roughly 3-4× decrease in
the mass density of solid material inside the snow line. The
scattering of asteroids by those embryos, coupled with the
Jovian and Saturnian resonances in the asteroid belt, has
been shown to be able to reasonably reproduce the observed
e and i excitation in the belt as well as its radial mixing
and mass depletion (Petit et al. 2001, 2002; O’Brien et al.
2006). In the majority of simulations of this scenario by
both groups, the embryos are completely cleared from the
asteroid belt.
Thus, the observational evidence and theoretical mod-
els for the evolution of the asteroid belt strongly suggest
that dynamics, rather than collisions, dominated its mass
depletion. Collisions, however, have still played a key
role in sculpting the asteroid belt. Many dynamical fam-
ilies, clusterings in orbital element space, have been dis-
covered, giving evidence for ∼20 breakups of 100-km or
larger parent bodies over the history of the Solar System
(Bottke et al. 2005a,b). The large 500-km diameter asteroid
Vesta has a preserved basaltic crust with a single large im-
pact basin (McCord et al. 1970; Thomas et al. 1997). This
basin was formed by the impact of a roughly 40-km projec-
tile (Marzari et al. 1996; Asphaug 1997).
The size distribution of main-belt asteroids is known
or reasonably constrained through observational surveys
down to ∼1 km in diameter (e.g. Durda and Dermott 1997;
 0.1  1  10  100  1000
Diameter (km)
Asteroid and TNO Size Distributions
Bernstein (2004)
Fit to Data
 0.1  1  10  100  1000
Diameter (km)
Asteroid and TNO Size Distributions
Subaru
SKADS
Spacewatch
Cataloged Asteroids
Fig. 9.— Observational estimates of the main belt and TNO size
distributions. The pentagons (with dashed best-fit curve) show
the total TNO population as determined from the Bernstein et al.
(2004) HST survey, converted to approximate diameters assum-
ing an albedo of 0.04. Points with arrows are upper limits given
by non-detections. The solid line is the population of observed
asteroids, and open circles are from debiased Spacewatch main-
belt observations (Jedicke and Metcalfe 1998). These data, con-
verted to diameters, were provided by D. Durda. The two dashed
lines are extrapolations based on the Sloan Digital Sky Survey
(Ivezić et al. 2001) and the Subaru Sub-km Main Belt Aster-
oid Survey (Yoshida et al. 2003), and diamonds show the debi-
ased population estimate from the SKADS survey (Gladman et al.
2007). Error bars are left out of this plot for clarity. Note that the
TNO population is substantially more populous and massive, by
roughly a factor of 1000, than the asteroid population.
Jedicke and Metcalfe 1998; Ivezić et al. 2001; Yoshida et al.
2003; Gladman et al. 2007). Not surprisingly, the largest
uncertainties are at the smallest sizes, where good orbits
are often not available for the observed asteroids, which
makes the conversion to absolute magnitude and diame-
ter difficult (e.g., Ivezić et al. 2001; Yoshida et al. 2003).
Recent results from the Sub-Kilometer Asteroid Diame-
ter Survey (SKADS, Gladman et al. 2007), the first survey
since the Palomar-Leiden Survey designed to determine or-
bits as well as magnitudes of main-belt asteroids, suggest
that the asteroid magnitude-frequency distribution may be
well represented by a single power law in the range from
H=14.0 to 18.8, which corresponds to diameters of 0.7 to 7
km for an albedo of 0.11. These observational constraints
are shown in Fig. 9 alongside the determination of the TNO
size distribution from Bernstein et al. (2004).
While over some size ranges, the asteroid size distribu-
tion can be fit by a single power law, over the full range of
observed asteroid diameters from ∼1-1000 km, there are
multiple bumps or kinks in the size distribution (namely
around 10 and 100 km in diameter). The change in slope
of the size distribution around 100 km is due primarily to
the fact that asteroids larger than this are very difficult to
disrupt, and hence the size distribution of bodies larger than
100 km is likely primordial. The change in slope around
10 km has a different origin—such a structure is produced
as a result of a change in the strength properties of aster-
oids, namely the transition from when a body’s resistance
to disruption is dominated by material strength to when it is
dominated by self-gravity. This transition in strength prop-
erties occurs at a size much smaller than 10 km, but re-
sults in a structure that propagates to larger sizes (see, e.g.,
Durda et al. 1998; O’Brien and Greenberg 2003). The pres-
ence of this structure in the asteroid size distribution is con-
sistent with the asteroids being a collisionally-relaxed pop-
ulation, i.e. a population in which the size distribution has
reached an approximate steady state where collisional pro-
duction and collisional destruction of bodies in each size
range are in balance.
The collisional evolution of the asteroid belt has been
studied by many authors (e.g. Davis et al. 1985; Durda
1993; Davis et al. 1994; Durda and Dermott 1997; Durda et al.
1998; Campo Bagatin et al. 1993, 1994, 2001; Marzari et al.
1999). The most recent models of collisional evolution
of the asteroid belt incorporate aspects of dynamical evo-
lution as well, such as the removal of bodies by reso-
nances and the Yarkovsky effect, and the enhancement
in collisional activity during its massive primordial phase
(O’Brien and Greenberg 2005; Bottke et al. 2005a,b). In
particular, Bottke et al. (2005a) explicitly incorporate the
results of dynamical simulations of the excitation and clear-
ing of the main belt by embedded planetary embryos per-
formed by Petit et al. (2001). Such collisional/dynamical
models can be constrained by a wide range of observational
evidence such as the main belt size distribution, the num-
ber of observed asteroid families, the existence of Vesta’s
basaltic crust, and the cosmic ray exposure ages of ordi-
nary chondrite meteorites, which suggest that the lifetimes
of meter-scale stony bodies in the asteroid belt are on the
order of 10-20 Myr (Marti and Graf 1992).
One of the most significant implications of having an
early massive main belt, which was noted in early colli-
sional models (e.g. Chapman and Davis 1975) and recently
emphasized in the case of collisional evolution plus dynam-
ical depletion (e.g., Bottke et al. 2005b), is that the majority
of the collisional evolution of the asteroid belt occurred dur-
ing its early, massive phase, and there has been relatively lit-
tle change in the main-belt size distribution since then. The
current, wavy main-belt size distribution, then, is a ‘fossil’
from its first few hundred Myr of collisional and dynamical
evolution.
So how does the Kuiper Belt compare to the asteroid belt
in terms of its collisional and dynamical evolution? Evi-
dence and modeling for the asteroid belt suggest that dy-
namical depletion, rather than collisional erosion, was pri-
marily responsible for reducing the mass of the primordial
asteroid belt to its current level. In the case of the Kuiper
Belt, this is less clear. As shown in Sec. 3, collisional
erosion, especially when aided by stellar perturbations or
the formation of Neptune, can be very effective in remov-
ing mass. At the same time, dynamical models such as
the Nice Model result in the depletion of a large amount
of mass through purely dynamical means and are able to
match many observational constraints. Recent modeling
that couples both collisional fragmentation and dynami-
cal effects suggests that collisional erosion cannot play too
large of a role in removing mass from the Kuiper Belt, oth-
erwise the Scattered Disk and Oort Cloud would be too de-
pleted to explain the observed numbers of short- and long-
period comets (Charnoz and Morbidelli 2007). That model
currently does not include coagulation. Further modeling
work, which self-consistently integrates coagulation, colli-
sional fragmentation, and dynamical effects, is necessary to
fully constrain the relative contributions of collisional and
dynamical depletion in the Kuiper Belt.
We have noted that the asteroid belt has a collisionally-
relaxed size distribution that is not well-represented by a
single power law over all size ranges. Should we expect the
same for the Kuiper Belt size distribution, and is there evi-
dence to support this? The collision rate in the Kuiper Belt
should be roughly comparable to that in the asteroid belt,
with the larger number of KBOs offsetting their lower in-
trinsic collision probability (Davis and Farinella 1997), and
as noted earlier in this chapter, the primordial Kuiper Belt,
like the asteroid belt, would have been substantially more
massive than the current population. This suggests that
the Kuiper Belt should have experienced a degree of col-
lisional evolution roughly comparable to the asteroid belt,
and thus is likely to be collisionally relaxed like the asteroid
belt. Observational evidence thus far is not detailed enough
to say for sure if this is the case, although recent work
(Kenyon and Bromley 2004e; Pan and Sari 2005) suggests
that the observational estimate of the TNO size distribution
by Bernstein et al. (2004), shown in Fig. 9, is consistent
with a collisionally-relaxed population.
While the Kuiper Belt is likely to be collisionally re-
laxed, it is unlikely to mirror the exact shape of the aster-
oid belt size distribution. The shape of the size distribu-
tion is determined, in part, by the strength law Q∗D, which
is likely to differ somewhat between asteroids and KBOs.
This is due to the difference in composition between as-
teroids, which are primarily rock, and KBOs, which con-
tain a substantial amount of ice, as well as the difference
in collision velocity between the two populations. With a
mean velocity of ∼5 km/s (Bottke et al. 1994), collisions
between asteroids are well into the supersonic regime (rel-
ative to the sound speed in rock). For the Kuiper belt, col-
lision velocities are about a factor of 5 or more smaller
(Davis and Farinella 1997), such that collisions between
KBOs are close to the subsonic/supersonic transition. For
impacts occurring in these different velocity regimes, and
into different materials, Q∗D may differ significantly (e.g.,
Benz and Asphaug 1999).
The difference in collision velocity can influence the size
distribution in another way as well. With a mean collision
velocity of ∼5 km/s, a body of a given size in the aster-
oid belt can collisionally disrupt a significantly larger body.
Thus, transitions in the strength properties of asteroids can
lead to the formation of waves that propagate to larger sizes
and manifest themselves as changes in the slope of the size
distribution, as seen in Fig. 9. For the Kuiper belt, with col-
lision velocities that are about a factor of 5 or more smaller
than in the asteroid belt, the difference in size between a
given body and the largest body it is capable of disrupting
is much smaller than in the asteroid belt, and waves should
therefore be much less pronounced or non-existent in the
KBO size distribution (e.g., O’Brien and Greenberg 2003).
There is still likely to be a change in slope at the largest sizes
where the population transitions from being primordial to
being collisionally relaxed, and such a change appears in
the debiased observational data of Bernstein et al. (2004)
(shown here in Fig. 9), although recent observations sug-
gest that the change in slope may actually occur at smaller
magnitudes than found in that survey (Petit et al. 2006).
Is the size distribution of the Kuiper Belt likely to be a
‘fossil’ like the asteroid belt? The primordial Kuiper Belt
would have been substantially more massive than the cur-
rent population. Thus, regardless of whether the depletion
of its mass was primarily collisional or dynamical, colli-
sional evolution would have been more intense early on
and the majority of the collisional evolution would have oc-
curred early in its history. In either case, its current size
distribution could then be considered a fossil from that early
phase, although defining exactly when that early phase ends
and the size distribution becomes ‘fossilized’ is not equally
clear in both cases. In the case where the mass depletion
of the Kuiper Belt occurs entirely through collisions, there
would not necessarily be a well-defined point at which one
could say that the size distribution became fossilized, as the
collision rate would decay continuously with time. In the
case of dynamical depletion, where the mass would be re-
moved fairly rapidly as in the case of the Nice Model de-
scribed in Sec. 3.4, the collision rate would experience a
correspondingly rapid drop, and the size distribution could
be considered essentially fossilized after the dynamical de-
pletion event.
As noted earlier in this section, an important observable
manifestation of collisions in the asteroid belt is the for-
mation of families, i.e. groupings of asteroids with similar
orbits. Asteroid families are thought to be the fragments of
collisionally disrupted parent bodies. These were first rec-
ognized by Hirayama (1918) who found 3 families among
the 790 asteroids known at that time. The number increased
to 7 families by 1926 when there were 1025 known as-
teroids (Hirayama 1927). Today, there are over 350,000
known asteroids while the number of asteroid families has
grown to about thirty.
Given that the Kuiper Belt is likely a collisionally
evolved population, are there collisional families to be
found among these bodies? Families are expected to be
more difficult to recognize in the Kuiper Belt than in the
asteroid belt. Families are identified by finding statistically
significant clusters of asteroid orbit elements—mainly the
semi-major axis, eccentricity and inclination. The colli-
sional disruption of a parent bodies launches fragments
with speeds of perhaps a few hundred meters/sec relative
to the original target body. This ejection speed is small
compared with the orbital speeds of asteroids, hence the or-
bits of fragments differ by only small amounts from that of
the original target body and, more importantly, from each
other. Thus, the resulting clusters of fragments are easy to
identify.
However, in the Kuiper Belt, where ejection velocities
are likely to be about the same but orbital speeds are much
lower, collisional disruption produces a much greater dis-
persion in the orbital elements of fragments. This reduces
the density of the clustering of orbital elements and makes
the task of distinguishing family members from the back-
ground population much more difficult (Davis and Farinella
1997). To date, there are over 1000 KBOs known, many of
which have poorly-determined orbits or are in resonances
that would make the identification of a family difficult or
impossible. Chiang et al. (2003) applied lowest-order secu-
lar theory to 227 non-resonant KBOs with well-determined
orbits and found no convincing evidence for a dynamical
family. Recently, however, Brown et al. (2007) found evi-
dence for a single family with at least 5 members associated
with KBO 2003 EL61. This family was identified based on
the unique spectroscopic signature of its members, and con-
firmed by their clustered orbit elements.
Given the small numbers involved, it cannot be said
whether or not finding a single KBO family at this stage is
statistically that different from the original identification of
3 asteroid families when there were only 790 known aster-
oids (Hirayama 1918). However, the fact that the KBO fam-
ily associated with 2003 EL61 was first discovered spectro-
scopically, and its clustering in orbital elements was later
confirmed, while nearly all asteroid families were discov-
ered based on clusterings in orbital elements alone, suggests
that even if comparable numbers of KBO families and aster-
oid families do exist, the greater dispersion of KBO families
in orbital element space may make them more difficult to
identify unless there are spectroscopic signatures connect-
ing them as well.
Perhaps when the number of non-resonant KBOs with
good orbits approaches 1000, more populous Kuiper Belt
families will be identified, and as can be done now with
the asteroid belt, these KBO families can be used as con-
straints on the interior structures of their original parent
bodies as well as on the collisional and dynamical history
of the Kuiper Belt as a whole.
6. Concluding Remarks
Starting with a swarm of 1 m to 1 km planetesimals
at 20–150 AU, the growth of icy planets follows a stan-
dard pattern (Stern and Colwell 1997a,b; Kenyon and Luu
1998, 1999a,b; Kenyon and Bromley 2004a,d, 2005). Col-
lisional damping and dynamical friction lead to a short pe-
riod of runaway growth that produces 10–100 objects with
r ∼ 300–1000 km. As these objects grow, they stir the
orbits of leftover planetesimals up to the disruption veloc-
ity. Once disruptions begin, the collisional cascade grinds
leftover planetesimals into smaller objects faster than the
oligarchs can accrete them. Thus, the oligarchs always con-
tain a small fraction of the initial mass in solid material.
For self-stirring models, oligarchs contain ∼ 10% of the
initial mass. Stellar flybys and stirring by a nearby gas gi-
ant augment the collisional cascade and leave less mass in
oligarchs. The two examples in §3.3 suggest that a very
close flyby and stirring by Neptune leave ∼ 2% to 5% of
the initial mass in oligarchs with r ∼ 100–1000 km.
This evolution differs markedly from planetary growth
in the inner solar system. In ∼ 0.1–1 Myr at a few AU, run-
away growth produces massive oligarchs, m & 0.01M⊕,
that contain most of the initial solid mass in the disk. Aside
from a few giant impacts like those that might produce
the Moon (Hartmann and Davis 1975; Cameron and Ward
1976), collisions remove little mass from these objects. Al-
though the collisional cascade removes many leftover plan-
etesimals before oligarchs can accrete them, the lost ma-
terial is much less than half of the original solid mass
(Wetherill and Stewart 1993; Kenyon and Bromley 2004b).
For a & 40 AU, runaway growth leaves most of the mass in
0.1–10 km objects that are easily disrupted at modest colli-
sion velocities. In 4.5 Gyr, the collisional cascade removes
most of the initial disk mass inside 70–80 AU.
Together with numerical calculations of orbital dynam-
ics (Chapter by Morbidelli et al.), theory now gives us a
foundation for understanding the origin and evolution of
the Kuiper belt. Within a disk of planetesimals at 20–
100 AU, collisional growth naturally produces objects with
r ∼ 10–2000 km and a size distribution reasonably close
to that observed among KBOs. As KBOs form, migration
of the giant planets scatters KBOs into several dynamical
classes (Chapter by Morbidelli et al.). Once the giant plan-
ets achieve their current orbits, the collisional cascade re-
duces the total mass in KBOs to current levels and produces
the break in the size distribution at r ∼ 20–40 km. Contin-
ued dynamical scattering by the giant planets sculpts the
inner Kuiper belt and maintains the scattered disk.
New observations will allow us to test and to refine this
theoretical picture. Aside from better measures of αm,
rmax, and r0 among the dynamical classes, better limits on
the total mass and the size distribution of large KBOs with
a ∼ 50–100 AU should yield a clear discriminant among
theoretical models. In the Nice model, the Kuiper belt was
initially nearly empty outside of ∼ 50 AU. Thus, any KBOs
found with a ∼ 50–100 AU should have the collisional
and dynamical signatures of the scattered disk or detached
population. If some KBOs formed in situ at a & 50 AU,
their size distribution depends on collisional growth modi-
fied by self-stirring and stirring by ∼ 30 M⊕ of large KBOs
formed at 20–30 AU and scattered through the Kuiper belt
by the giant planets. From the calculations of Neptune stir-
ring (§3.3), stirring by scattered disk objects should yield a
size distribution markedly different from the size distribu-
tion of detached or scattered disk objects formed at 20–30
AU. Wide-angle surveys on 2–3 m class telescopes (e.g.,
Pan-Starrs) and deep probes with 8–10 m telescopes can
provide this test.
Information on smaller size scales – αd and r1 – place
additional constraints on the bulk properties (fragmentation
parameters) of KBOs and on the collisional cascade. In any
of the stirring models, there is a strong correlation between
r0, r1, and the fragmentation parameters. Thus, direct mea-
sures of r0 and r1 provide a clear test of KBO formation
calculations. At smaller sizes (r . 0.1 km), the slope of
the size distribution αd clearly tests the fragmentation al-
gorithm and the ability of the collisional cascade to remove
KBOs with r ∼ 1–10 km. Although the recent detection
of KBOs with r ≪ 1 km (Chang et al. 2006) may be an
instrumental artifact (Jones et al. 2006; Chang et al. 2007),
optical and X-ray occultations (e.g., TAOS) will eventually
yield these tests.
Finally, there is a clear need to combine coagulation
and dynamical calculations to produce a ‘unified’ picture
of planet formation at a & 20 AU. Charnoz and Morbidelli
(2007) provide a good start in this direction. Because the
collisional outcome is sensitive to internal and external dy-
namics, understanding the formation of the observed n(r),
n(e), and n(i) distributions in each KBO population re-
quires treating collisional evolution and dynamics together.
A combined approach should yield the sensitivity of αm,
rmax, and r0 to the local evolution and the timing of the
formation of giant planets, Neptune migration, and stellar
flybys. These calculations will also test how the dynami-
cal events depend on the evolution during oligarchic growth
and the collisional cascade. Coupled with new observations
of KBOs and of planets and debris disks in other planetary
systems, these calculations should give us a better under-
standing of the origin and evolution of KBOs and other ob-
jects in the outer solar system.
We thank S. Charnoz, S. Kortenkamp, A. Morbidelli,
and an anonymous reviewer for comments that consider-
ably improved the text. We acknowledge support from
the NASA Astrophysics Theory Program (grant NAG5-
13278; BCB & SJK), the NASA Planetary Geology and
Geophysics Program (grant NNX06AC50G; DPO), and the
JPL Institutional Computing and Information Services and
the NASA Directorates of Aeronautics Research, Science,
Exploration Systems, and Space Operations (BCB & SJK).
REFERENCES
Adachi, I., Hayashi, C., and Nakazawa, K. (1976). The gas
drag effect on the elliptical motion of a solid body in the
primordial solar nebula. Progress of Theoretical Physics,
56, 1756–1771.
Adams, F. C. and Laughlin, G. (2001). Constraints on the
Birth Aggregate of the Solar System. Icarus, 150, 151–
Artymowicz, P. (1988). Radiation pressure forces on parti-
cles in the Beta Pictoris system. Astrophys. J. Lett., 335,
L79–L82.
Artymowicz, P. (1997). Beta Pictoris: an Early Solar Sys-
tem? Annual Review of Earth and Planetary Sciences,
25, 175–219.
Asphaug, E. (1997). Impact origin of the Vesta family. Me-
teoritics and Planetary Science, 32, 965–980.
Asphaug, E. and Benz, W. (1996). Size, Density, and
Structure of Comet Shoemaker-Levy 9 Inferred from the
Physics of Tidal Breakup. Icarus, 121, 225–248.
Aumann, H. H., Beichman, C. A., Gillett, F. C., de Jong, T.,
Houck, J. R., Low, F. J., Neugebauer, G., Walker, R. G.,
and Wesselius, P. R. (1984). Discovery of a shell around
Alpha Lyrae. Astrophys. J. Lett., 278, L23–L27.
Backman, D. E. and Paresce, F. (1993). Main-sequence
stars with circumstellar solid material - The VEGA phe-
nomenon. In E. H. Levy and J. I. Lunine, editors, Proto-
stars and Planets III, pages 1253–1304.
Beckwith, S. V. W. and Sargent, A. I. (1996). Circumstellar
disks and the search for neighbouring planetary systems.
Nature, 383, 139–144.
Benz, W. and Asphaug, E. (1999). Catastrophic Disruptions
Revisited. Icarus, 142, 5–20.
Bernstein, G. M., Trilling, D. E., Allen, R. L., Brown,
M. E., Holman, M., and Malhotra, R. (2004). The Size
Distribution of Trans-Neptunian Bodies. Astron. J., 128,
1364–1390.
Bottke, W. F., Nolan, M. C., Greenberg, R., and Kolvoord,
R. A. (1994). Velocity distributions among colliding as-
teroids. Icarus, 107, 255–268.
Bottke, W. F., Durda, D. D., Nesvorný, D., Jedicke, R.,
Morbidelli, A., Vokrouhlický, D., and Levison, H. F.
(2005a). Linking the collisional history of the main
asteroid belt to its dynamical excitation and depletion.
Icarus, 179, 63–94.
Bottke, W. F., Durda, D. D., Nesvorný, D., Jedicke,
R., Morbidelli, A., Vokrouhlický, D., and Levison, H.
(2005b). The fossilized size distribution of the main as-
teroid belt. Icarus, 175, 111–140.
Bromley, B. C. and Kenyon, S. J. (2006). A Hybrid N-
Body-Coagulation Code for Planet Formation. Astron.
J., 131, 2737–2748.
Brown, M. E., Barkume, K. M., Ragozzine, D., and Sc-
ahller, E. L. (2007). A collisional family of icy objects
in the Kuiper belt. Nature, 446, 294–297.
Burns, J. A., Lamy, P. L., and Soter, S. (1979). Radiation
forces on small particles in the solar system. Icarus, 40,
1–48.
Cameron, A. G. W. and Ward, W. R. (1976). The Origin of
the Moon. In Lunar and Planetary Institute Conference
Abstracts, pages 120–121.
Campo Bagatin, A., Farinella, P., and Paolicchi, P. (1993).
Collisional evolution of the asteroid size distribution: A
numerical simulation. Celestial Mechanics and Dynam-
ical Astronomy, 57, 403–404.
Campo Bagatin, A., Cellino, A., Davis, D. R., Farinella,
P., and Paolicchi, P. (1994). Wavy size distribu-
tions for collisional systems with a small-size cutoff.
Planet. Space Sci., 42, 1079–1092.
Campo Bagatin, A., Petit, J.-M., and Farinella, P. (2001).
How Many Rubble Piles Are in the Asteroid Belt?
Icarus, 149, 198–209.
Chang, H.-K., King, S.-K., Liang, J.-S., Wu, P.-S., Lin,
L. C.-C., and Chiu, J.-L. (2006). Occultation of X-rays
from Scorpius X-1 by small trans-neptunian objects. Na-
ture, 442, 660–663.
Chang, H.-K., Liang, J.-S., Liu, C.-Y., and King, S.-K.
(2007). Millisecond dips in the RXTE/PCA light curve
of Sco X-1 and TNO occultation. ArXiv Astrophysics
e-prints.
Chapman, C. R. and Davis, D. R. (1975). Asteroid colli-
sional evolution - Evidence for a much larger early pop-
ulation. Science, 190, 553–556.
Charnoz, S. and Morbidelli, A. (2007). Coupling dynami-
cal and collisional evolution of small bodies II: Forming
the Kuiper Belt, the Scattered Disk and the Oort Cloud.
Icarus, in press.
Charnoz, S., Thébault, P., and Brahic, A. (2001). Short-term
collisional evolution of a disc perturbed by a giant-planet
embryo. Astron. Astrophys., 373, 683–701.
Chiang, E., Lithwick, Y., Murray-Clay, R., Buie, M.,
Grundy, W., and Holman, M. (2007). A Brief History
of Transneptunian Space. In B. Reipurth, D. Jewitt, and
K. Keil, editors, Protostars and Planets V , pages 895–
Chiang, E. I., Lovering, J. R., Millis, R. L., Buie, M. W.,
Wasserman, L. H., and Meech, K. J. (2003). Resonant
and Secular Families of the Kuiper Belt. Earth Moon
and Planets, 92, 49–62.
Davis, D. R. and Farinella, P. (1997). Collisional Evolution
of Edgeworth-Kuiper Belt Objects. Icarus, 125, 50–60.
Davis, D. R., Chapman, C. R., Greenberg, R., Weiden-
schilling, S. J., and Harris, A. W. (1979). Collisional
evolution of asteroids - Populations, rotations, and ve-
locities. In T. Gehrels, editor, Asteroids, pages 528–557.
University of Arizona Press, Tucson, AZ.
Davis, D. R., Chapman, C. R., Weidenschilling, S. J., and
Greenberg, R. (1985). Collisional history of asteroids:
Evidence from Vesta and the Hirayama families. Icarus,
63, 30–53.
Davis, D. R., Weidenschilling, S. J., Farinella, P., Paolicchi,
P., and Binzel, R. P. (1989). Asteroid collisional history
- Effects on sizes and spins. In R. P. Binzel, T. Gehrels,
and M. S. Matthews, editors, Asteroids II, pages 805–
826. University of Arizona Press, Tucson, AZ.
Davis, D. R., Ryan, E. V., and Farinella, P. (1994). Asteroid
collisional evolution: Results from current scaling algo-
rithms. Planet. Space Sci., 42, 599–610.
Davis, D. R., Farinella, P., and Weidenschilling, S. J.
(1999). Accretion of a Massive Edgeworth-Kuiper Belt.
In Lunar and Planetary Institute Conference Abstracts,
pages 1883–1884.
de la Fuente Marcos, C. and Barge, P. (2001). The effect of
long-lived vortical circulation on the dynamics of dust
particles in the mid-plane of a protoplanetary disc. Mon.
Not. R. Astron. Soc., 323, 601–614.
Dohnanyi, J. W. (1969). Collisional models of asteroids and
their debris. J. Geophys. Res., 74, 2531–2554.
Dullemond, C. P. and Dominik, C. (2005). Dust coagula-
tion in protoplanetary disks: A rapid depletion of small
grains. Astron. Astrophys., 434, 971–986.
Duncan, M. (1994). Orbital Stability and the Structure of
the Solar System. In R. Ferlet and A. Vidal-Madjar, ed-
itors, Circumstellar Dust Disks and Planet Formation,
pages 245–255. Gif-sur-Yvette: Editions Frontieres.
Duncan, M. J. and Levison, H. F. (1997). A scattered comet
disk and the origin of Jupiter family comets. Science,
276, 1670–1672.
Duncan, M. J., Levison, H. F., and Budd, S. M. (1995). The
Dynamical Structure of the Kuiper Belt. Astron. J., 110,
3073–3081.
Durda, D. D. (1993). The Collisional Evolution of the As-
teroid Belt and Its Contribution to the Zodiacal Cloud.
Ph.D. Thesis, University of Florida.
Durda, D. D. and Dermott, S. F. (1997). The Collisional
Evolution of the Asteroid Belt and Its Contribution to
the Zodiacal Cloud. Icarus, 130, 140–164.
Durda, D. D., Greenberg, R., and Jedicke, R. (1998). Colli-
sional Models and Scaling Laws: A New Interpretation
of the Shape of the Main-Belt Asteroid Size Distribution.
Icarus, 135, 431–440.
Elliot, J. L., Kern, S. D., Clancy, K. B., Gulbis, A. A. S.,
Millis, R. L., Buie, M. W., Wasserman, L. H., Chiang,
E. I., Jordan, A. B., Trilling, D. E., and Meech, K. J.
(2005). The Deep Ecliptic Survey: A Search for Kuiper
Belt Objects and Centaurs. II. Dynamical Classification,
the Kuiper Belt Plane, and the Core Population. Astron.
J., 129, 1117–1162.
Giblin, I., Davis, D. R., and Ryan, E. V. (2004). On the
collisional disruption of porous icy targets simulating
Kuiper belt objects. Icarus, 171, 487–505.
Gladman, B., Kavelaars, J. J., Petit, J., Morbidelli, A., Hol-
man, M. J., and Loredo, T. (2001). The Structure of the
Kuiper Belt: Size Distribution and Radial Extent. As-
tron. J., 122, 1051–1066.
Gladman, B. J., Davis, D. R., Neese, N., Williams, G.,
Jedicke, R., Kavelaars, J. J., Petit, J.-M., Scholl, H., Hol-
man, M., Warrington, B., Esquerdo, G., and Tricarico, P.
(2007). SKADS: A Sub-Kilometer Asteroid Diameter
Survey. Icarus, submitted.
Goldreich, P. and Ward, W. R. (1973). The Formation of
Planetesimals. Astrophys. J., 183, 1051–1062.
Goldreich, P., Lithwick, Y., and Sari, R. (2004). Planet For-
mation by Coagulation: A Focus on Uranus and Nep-
tune. Annu. Rev. Astron. Astrophys., 42, 549–601.
Gomes, R., Levison, H. F., Tsiganis, K., and Morbidelli,
A. (2005). Origin of the cataclysmic Late Heavy Bom-
bardment period of the terrestrial planets. Nature, 435,
466–469.
Gomez, M., Hartmann, L., Kenyon, S. J., and Hewett, R.
(1993). On the spatial distribution of pre-main-sequence
stars in Taurus. Astron. J., 105, 1927–1937.
Gradie, J. and Tedesco, E. (1982). Compositional structure
of the asteroid belt. Science, 216, 1405–1407.
Greaves, J. S. (2005). Disks Around Stars and the Growth
of Planetary Systems. Science, 307, 68–71.
Greaves, J. S., Holland, W. S., Moriarty-Schieven, G., Jen-
ness, T., Dent, W. R. F., Zuckerman, B., McCarthy, C.,
Webb, R. A., Butner, H. M., Gear, W. K., and Walker,
H. J. (1998). A Dust Ring around epsilon Eridani: Ana-
log to the Young Solar System. Astrophys. J. Lett., 506,
L133–L137.
Greaves, J. S., Wyatt, M. C., Holland, W. S., and Dent,
W. R. F. (2004). The debris disc around τ Ceti: a mas-
sive analogue to the Kuiper Belt. Mon. Not. R. Astron.
Soc., 351, L54–L58.
Greenberg, R., Hartmann, W. K., Chapman, C. R., and
Wacker, J. F. (1978). Planetesimals to planets - Numeri-
cal simulation of collisional evolution. Icarus, 35, 1–26.
Greenberg, R., Weidenschilling, S. J., Chapman, C. R., and
Davis, D. R. (1984). From icy planetesimals to outer
planets and comets. Icarus, 59, 87–113.
Greenzweig, Y. and Lissauer, J. J. (1990). Accretion rates
of protoplanets. Icarus, 87, 40–77.
Haisch, Jr., K. E., Lada, E. A., and Lada, C. J. (2001). Disk
Frequencies and Lifetimes in Young Clusters. Astrophys.
J. Lett., 553, L153–L156.
Hartmann, W. K. and Davis, D. R. (1975). Satellite-sized
planetesimals and lunar origin. Icarus, 24, 504–514.
Hayashi, C. (1981). Structure of the solar nebula, growth
and decay of magnetic fields and effects of magnetic and
turbulent viscosities on the nebula. Progress of Theoret-
ical Physics Supplement, 70, 35–53.
Heppenheimer, T. A. (1980). Secular resonances and the
origin of eccentricities of Mars and the asteroids. Icarus,
41, 76–88.
Hirayama, K. (1918). Groups of asteroids probably of com-
mon origin. Astron. J., 31, 185–188.
Hirayama, K. (1927). Families of asteroids: second paper.
Annales de l’Observatoire astronomique de Tokyo, 19,
1–26.
Holman, M. J. and Wisdom, J. (1993). Dynamical stability
in the outer solar system and the delivery of short period
comets. Astron. J., 105, 1987–1999.
Holsapple, K. A. (1994). Catastrophic disruptions and cra-
tering of solar system bodies: A review and new results.
Planet. Space Sci., 42, 1067–1078.
Hornung, P., Pellat, R., and Barge, P. (1985). Thermal ve-
locity equilibrium in the protoplanetary cloud. Icarus,
64, 295–307.
Housen, K. R. and Holsapple, K. A. (1990). On the frag-
mentation of asteroids and planetary satellites. Icarus,
84, 226–253.
Housen, K. R. and Holsapple, K. A. (1999). Scale Effects
in Strength-Dominated Collisions of Rocky Asteroids.
Icarus, 142, 21–33.
Ida, S., Larwood, J., and Burkert, A. (2000). Evidence
for Early Stellar Encounters in the Orbital Distribution
of Edgeworth-Kuiper Belt Objects. Astrophys. J., 528,
351–356.
Inaba, S. and Barge, P. (2006). Dusty Vortices in Protoplan-
etary Disks. Astrophys. J., 649, 415–427.
Ivezić, Ž., Tabachnik, S., Rafikov, R., Lupton, R. H., Quinn,
T., Hammergren, M., Eyer, L., Chu, J., Armstrong, J. C.,
Fan, X., Finlator, K., Geballe, T. R., Gunn, J. E., Hen-
nessy, G. S., Knapp, G. R., Leggett, S. K., Munn, J. A.,
Pier, J. R., Rockosi, C. M., Schneider, D. P., Strauss,
M. A., Yanny, B., Brinkmann, J., Csabai, I. ., Hindsley,
R. B., Kent, S., Lamb, D. Q., Margon, B., McKay, T. A.,
Smith, J. A., Waddel, P., York, D. G., and the SDSS Col-
laboration (2001). Solar System Objects Observed in the
Sloan Digital Sky Survey Commissioning Data. Astron.
J., 122, 2749–2784.
Jedicke, R. and Metcalfe, T. S. (1998). The Orbital and Ab-
solute Magnitude Distributions of Main Belt Asteroids.
Icarus, 131, 245–260.
Jones, T. A., Levine, A. M., Morgan, E. H., and Rappaport,
S. (2006). Millisecond Dips in Sco X-1 are Likely the
Result of High-Energy Particle Events. ATEL, (949).
Kenyon, S. J. and Bromley, B. C. (2002a). Collisional Cas-
cades in Planetesimal Disks. I. Stellar Flybys. Astron. J.,
123, 1757–1775.
Kenyon, S. J. and Bromley, B. C. (2002b). Dusty Rings:
Signposts of Recent Planet Formation. Astrophys. J.
Lett., 577, L35–L38.
Kenyon, S. J. and Bromley, B. C. (2004a). Collisional Cas-
cades in Planetesimal Disks. II. Embedded Planets. As-
tron. J., 127, 513–530.
Kenyon, S. J. and Bromley, B. C. (2004b). Detecting the
Dusty Debris of Terrestrial Planet Formation. Astrophys.
J. Lett., 602, L133–L136.
Kenyon, S. J. and Bromley, B. C. (2004c). Stellar en-
counters as the origin of distant Solar System objects in
highly eccentric orbits. Nature, 432, 598–602.
Kenyon, S. J. and Bromley, B. C. (2004d). The Size Dis-
tribution of Kuiper Belt Objects. Astron. J., 128, 1916–
1926.
Kenyon, S. J. and Bromley, B. C. (2004e). The Size Dis-
tribution of Kuiper Belt Objects. Astron. J., 128, 1916–
1926.
Kenyon, S. J. and Bromley, B. C. (2005). Prospects for De-
tection of Catastrophic Collisions in Debris Disks. As-
tron. J., 130, 269–279.
Kenyon, S. J. and Bromley, B. C. (2006). Terrestrial Planet
Formation. I. The Transition from Oligarchic Growth to
Chaotic Growth. Astron. J., 131, 1837–1850.
Kenyon, S. J. and Hartmann, L. (1987). Spectral energy
distributions of T Tauri stars - Disk flaring and limits on
accretion. Astrophys. J., 323, 714–733.
Kenyon, S. J. and Luu, J. X. (1998). Accretion in the Early
Kuiper Belt. I. Coagulation and Velocity Evolution. As-
tron. J., 115, 2136–2160.
Kenyon, S. J. and Luu, J. X. (1999a). Accretion in the Early
Kuiper Belt. II. Fragmentation. Astron. J., 118, 1101–
1119.
Kenyon, S. J. and Luu, J. X. (1999b). Accretion in the Early
Outer Solar System. Astrophys. J., 526, 465–470.
Knežević, Z. and Milani, A. (2003). Proper element cat-
alogs and asteroid families. Astron. Astrophys., 403,
1165–1173.
Kobayashi, H., Ida, S., and Tanaka, H. (2005). The evidence
of an early stellar encounter in Edgeworth Kuiper belt.
Icarus, 177, 246–255.
Kokubo, E. and Ida, S. (1995). Orbital evolution of proto-
planets embedded in a swarm of planetesimals. Icarus,
114, 247–257.
Kokubo, E. and Ida, S. (1998). Oligarchic Growth of Pro-
toplanets. Icarus, 131, 171–178.
Kokubo, E. and Ida, S. (2000). Formation of Protoplanets
from Planetesimals in the Solar Nebula. Icarus, 143, 15–
Kokubo, E. and Ida, S. (2002). Formation of Protoplanet
Systems and Diversity of Planetary Systems. Astrophys.
J., 581, 666–680.
Krivov, A. V., Löhne, T., and Sremčević, M. (2006). Dust
distributions in debris disks: effects of gravity, radiation
pressure and collisions. Astron. Astrophys., 455, 509–
Kuchner, M. J., Brown, M. E., and Holman, M. (2002).
Long-Term Dynamics and the Orbital Inclinations of the
Classical Kuiper Belt Objects. Astron. J., 124, 1221–
1230.
Lada, C. J. (2006). Stellar Multiplicity and the Initial Mass
Function: Most Stars Are Single. Astrophys. J. Lett.,
640, L63–L66.
Lada, C. J. and Lada, E. A. (2003). Embedded Clusters in
Molecular Clouds. Annu. Rev. Astron. Astrophys., 41,
57–115.
Landgraf, M., Liou, J.-C., Zook, H. A., and Grün, E. (2002).
Origins of Solar System Dust beyond Jupiter. Astron. J.,
123, 2857–2861.
Lecar, M. and Franklin, F. A. (1973). On the Original Dis-
tribution of the Asteroids I. Icarus, 20, 422–436.
Lecar, M. and Franklin, F. A. (1997). The Solar Nebula,
Secular Resonances, Gas Drag, and the Asteroid Belt.
Icarus, 129, 134–146.
Lee, M. H. (2000). On the Validity of the Coagulation
Equation and the Nature of Runaway Growth. Icarus,
143, 74–86.
Leinhardt, Z. M. and Richardson, D. C. (2002). N-Body
Simulations of Planetesimal Evolution: Effect of Vary-
ing Impactor Mass Ratio. Icarus, 159, 306–313.
Leinhardt, Z. M. and Richardson, D. C. (2005). Planetesi-
mals to Protoplanets. I. Effect of Fragmentation on Ter-
restrial Planet Formation. Astrophys. J., 625, 427–440.
Lemaitre, A. and Dubru, P. (1991). Secular resonances in
the primitive solar nebula. Celestial Mechanics and Dy-
namical Astronomy, 52, 57–78.
Levison, H. F. and Duncan, M. J. (1990). A search for
proto-comets in the outer regions of the solar system.
Astron. J., 100, 1669–1675.
Levison, H. F. and Morbidelli, A. (2007). Models of the
Collisional Damping Scenario for Ice Giant Planets and
Kuiper Belt Formatio. ArXiv Astrophysics e-prints.
Levison, H. F. and Stern, S. A. (2001). On the Size Depen-
dence of the Inclination Distribution of the Main Kuiper
Belt. Astron. J., 121, 1730–1735.
Levison, H. F., Morbidelli, A., and Dones, L. (2004).
Sculpting the Kuiper Belt by a Stellar Encounter: Con-
straints from the Oort Cloud and Scattered Disk. Astron.
J., 128, 2553–2563.
Lissauer, J. J. (1987). Timescales for planetary accretion
and the structure of the protoplanetary disk. Icarus, 69,
249–265.
Luhman, K. L. (2006). The Spatial Distribution of Brown
Dwarfs in Taurus. Astrophys. J., 645, 676–687.
Luu, J. X. and Jewitt, D. C. (2002). Kuiper Belt Objects:
Relics from the Accretion Disk of the Sun. Annu. Rev.
Astron. Astrophys., 40, 63–101.
Malhotra, R. (1995). The Origin of Pluto’s Orbit: Implica-
tions for the Solar System Beyond Neptune. Astron. J.,
110, 420–429.
Malhotra, R. (1996). The Phase Space Structure Near Nep-
tune Resonances in the Kuiper Belt. Astron. J., 111, 504–
Malyshkin, L. and Goodman, J. (2001). The Timescale of
Runaway Stochastic Coagulation. Icarus, 150, 314–322.
Marti, K. and Graf, T. (1992). Cosmic-ray exposure his-
tory of ordinary chondrites. Annual Review of Earth and
Planetary Sciences, 20, 221–243.
Marzari, F., Cellino, A., Davis, D. R., Farinella, P., Zappala,
V., and Vanzani, V. (1996). Origin and evolution of the
Vesta asteroid family. Astron. Astrophys., 316, 248–262.
Marzari, F., Farinella, P., and Davis, D. R. (1999). Origin,
Aging, and Death of Asteroid Families. Icarus, 142, 63–
McCord, T. B., Adams, J. B., and Johnson, T. V. (1970).
Asteroid Vesta: Spectral reflectivity and compositional
implications. Science, 178, 745–747.
Michel, P., Benz, W., Tanga, P., and Richardson, D. C.
(2001). Collisions and Gravitational Reaccumulation:
Forming Asteroid Families and Satellites. Science, 294,
1696–1700.
Morbidelli, A. and Levison, H. F. (2004). Scenarios for
the Origin of the Orbits of the Trans-Neptunian Objects
2000 CR105 and 2003 VB12 (Sedna). Astron. J., 128,
2564–2576.
Morbidelli, A., Jacob, C., and Petit, J.-M. (2002). Planetary
Embryos Never Formed in the Kuiper Belt. Icarus, 157,
241–248.
Morbidelli, A., Emel’yanenko, V. V., and Levison, H. F.
(2004). Origin and orbital distribution of the trans-
Neptunian scattered disc. Mon. Not. R. Astron. Soc., 355,
935–940.
Morbidelli, A., Levison, H. F., Tsiganis, K., and Gomes, R.
(2005). Chaotic capture of Jupiter’s Trojan asteroids in
the early Solar System. Nature, 435, 462–465.
Nagasawa, M. and Ida, S. (2000). Sweeping Secular Res-
onances in the Kuiper Belt Caused by Depletion of the
Solar Nebula. Astron. J., 120, 3311–3322.
Nagasawa, M., Tanaka, H., and Ida, S. (2000). Orbital Evo-
lution of Asteroids during Depletion of the Solar Nebula.
Astron. J., 119, 1480–1497.
Nagasawa, M., Ida, S., and Tanaka, H. (2001). Origin
of high orbital eccentricity and inclination of asteroids.
Earth, Planets, and Space, 53, 1085–1091.
Nagasawa, M., Ida, S., and Tanaka, H. (2002). Excitation
of Orbital Inclinations of Asteroids during Depletion of
a Protoplanetary Disk: Dependence on the Disk Config-
uration. Icarus, 159, 322–327.
Nagasawa, M., Lin, D. N. C., and Thommes, E. (2005).
Dynamical Shake-up of Planetary Systems. I. Embryo
Trapping and Induced Collisions by the Sweeping Sec-
ular Resonance and Embryo-Disk Tidal Interaction. As-
trophys. J., 635, 578–598.
Natta, A., Grinin, V., and Mannings, V. (2000). Properties
and Evolution of Disks around Pre-Main-Sequence Stars
of Intermediate Mass. Protostars and Planets IV , pages
559–588.
Null, G. W., Owen, W. M., and Synnott, S. P. (1993).
Masses and densities of Pluto and Charon. Astron. J.,
105, 2319–2335.
O’Brien, D. P. and Greenberg, R. (2003). Steady-state size
distributions for collisional populations:analytical solu-
tion with size-dependent strength. Icarus, 164, 334–345.
O’Brien, D. P. and Greenberg, R. (2005). The collisional
and dynamical evolution of the main-belt and NEA size
distributions. Icarus, 178, 179–212.
O’Brien, D. P., Morbidelli, A., and Bottke, W. F. (2005).
Collisional Evolution of the Primordial Trans-Neptunian
Disk: Implications for Planetary Migration and the Cur-
rent Size Distribution of TNOs. In Bulletin of the Amer-
ican Astronomical Society, pages 676–+.
O’Brien, D. P., Morbidelli, A., and Bottke, W. F. (2006).
Re-Evaluating the Primordial Excitation and Clearing of
the Asteroid Belt. Icarus, submitted.
Ohtsuki, K. (1992). Evolution of random velocities of plan-
etesimals in the course of accretion. Icarus, 98, 20–27.
Ohtsuki, K., Nakagawa, Y., and Nakazawa, K. (1990). Ar-
tificial acceleration in accumulation due to coarse mass-
coordinate divisions in numerical simulation. Icarus, 83,
205–215.
Ohtsuki, K., Stewart, G. R., and Ida, S. (2002). Evolution
of Planetesimal Velocities Based on Three-Body Orbital
Integrations and Growth of Protoplanets. Icarus, 155,
436–453.
Pan, M. and Sari, R. (2005). Shaping the Kuiper belt size
distribution by shattering large but strengthless bodies.
Icarus, 173, 342–348.
Petit, J., Morbidelli, A., and Chambers, J. (2001). The
Primordial Excitation and Clearing of the Asteroid Belt.
Icarus, 153, 338–347.
Petit, J., Chambers, J., Franklin, F., and Nagasawa, M.
(2002). Primordial Excitation and Depletion of the Main
Belt. In W. F. Bottke, A. Cellino, P. Paolicchi, and R. P.
Binzel, editors, Asteroids III, pages 711–738. University
of Arizona Press, Tucson, AZ.
Petit, J.-M., Holman, M. J., Gladman, B. J., Kavelaars, J. J.,
Scholl, H., and Loredo, T. J. (2006). The Kuiper Belt
luminosity function from mR = 22 to 25. Mon. Not. R.
Astron. Soc., 365, 429–438.
Rafikov, R. R. (2003a). Dynamical Evolution of Planetes-
imals in Protoplanetary Disks. Astron. J., 126, 2529–
2548.
Rafikov, R. R. (2003b). Planetesimal Disk Evolution
Driven by Embryo-Planetesimal Gravitational Scatter-
ing. Astron. J., 125, 922–941.
Rafikov, R. R. (2003c). Planetesimal Disk Evolution Driven
by Planetesimal-Planetesimal Gravitational Scattering.
Astron. J., 125, 906–921.
Rafikov, R. R. (2003d). The Growth of Planetary Embryos:
Orderly, Runaway, or Oligarchic? Astron. J., 125, 942–
Ryan, E. V., Davis, D. R., and Giblin, I. (1999). A Labora-
tory Impact Study of Simulated Edgeworth-Kuiper Belt
Objects. Icarus, 142, 56–62.
Safronov, V. S. (1969). Evoliutsiia doplanetnogo oblaka.
(Evolution of the Protoplanetary Cloud and Formation
of the Earth and Planets, Nauka, Moscow [Translation
1972, NASA TT F-677]. 1969.
Safronov, V. S. (1979). On the origin of asteroids. In
T. Gehrels, editor, Asteroids, pages 975–991. University
of Arizona Press, Tucson, AZ.
Scholz, A., Jayawardhana, R., and Wood, K. (2006). Ex-
ploring Brown Dwarf Disks: A 1.3 mm Survey in Tau-
rus. Astrophys. J., 645, 1498–1508.
Slesnick, C. L., Hillenbrand, L. A., and Carpenter, J. M.
(2004). The Spectroscopically Determined Substellar
Mass Function of the Orion Nebula Cluster. Astrophys.
J., 610, 1045–1063.
Spaute, D., Weidenschilling, S. J., Davis, D. R., and
Marzari, F. (1991). Accretional evolution of a planetesi-
mal swarm. I - A new simulation. Icarus, 92, 147–164.
Stern, S. A. (1995). Collisional Time Scales in the Kuiper
Disk and Their Implications. Astron. J., 110, 856–868.
Stern, S. A. (2005). Regarding the Accretion of 2003 VB12
(Sedna) and Like Bodies in Distant Heliocentric Orbits.
Astron. J., 129, 526–529.
Stern, S. A. and Colwell, J. E. (1997a). Accretion in the
Edgeworth-Kuiper Belt: Forming 100-1000 KM Radius
Bodies at 30 AU and Beyond. Astron. J., 114, 841–849.
Stern, S. A. and Colwell, J. E. (1997b). Collisional Erosion
in the Primordial Edgeworth-Kuiper Belt and the Gener-
ation of the 30-50 AU Kuiper Gap. Astrophys. J., 490,
879–882.
Takeuchi, T. and Artymowicz, P. (2001). Dust Migration
and Morphology in Optically Thin Circumstellar Gas
Disks. Astrophys. J., 557, 990–1006.
Tanaka, H. and Ida, S. (1999). Growth of a Migrating Pro-
toplanet. Icarus, 139, 350–366.
Tanga, P., Weidenschilling, S. J., Michel, P., and Richard-
son, D. C. (2004). Gravitational instability and cluster-
ing in a disk of planetesimals. Astron. Astrophys., 427,
1105–1115.
Thomas, P. C., Binzel, R. P., Gaffey, M. J., Storrs, A. D.,
Wells, E. N., and Zellner, B. H. (1997). Impact excava-
tion on asteroid 4 Vesta: Hubble Space Telescope results.
Science, 277, 1492–1495.
Thommes, E. W., Duncan, M. J., and Levison, H. F. (1999).
The formation of Uranus and Neptune in the Jupiter-
Saturn region of the Solar System. Nature, 402, 635–
Thommes, E. W., Duncan, M. J., and Levison, H. F. (2002).
The Formation of Uranus and Neptune among Jupiter
and Saturn. Astron. J., 123, 2862–2883.
Tsiganis, K., Gomes, R., Morbidelli, A., and Levison, H. F.
(2005). Origin of the orbital architecture of the giant
planets of the Solar System. Nature, 435, 459–461.
Ward, W. R. (1981). Solar nebula dispersal and the stability
of the planetary system. I - Scanning secular resonance
theory. Icarus, 47, 234–264.
Weidenschilling, S. J. (1977a). Aerodynamics of solid bod-
ies in the solar nebula. Mon. Not. R. Astron. Soc., 180,
57–70.
Weidenschilling, S. J. (1977b). The distribution of mass in
the planetary system and solar nebula. Astrophys. Space
Sci., 51, 153–158.
Weidenschilling, S. J. (1977c). The distribution of mass in
the planetary system and solar nebula. Astrophys. Space
Sci., 51, 153–158.
Weidenschilling, S. J. (1980). Dust to planetesimals - Set-
tling and coagulation in the solar nebula. Icarus, 44,
172–189.
Weidenschilling, S. J. (1984). Evolution of grains in a tur-
bulent solar nebula. Icarus, 60, 553–567.
Weidenschilling, S. J. (1989). Stirring of a planetesimal
swarm - The role of distant encounters. Icarus, 80, 179–
Weidenschilling, S. J. (1995). Can gravitation instability
form planetesimals? Icarus, 116, 433–435.
Weidenschilling, S. J. (2003). Radial drift of particles in
the solar nebula: implications for planetesimal forma-
tion. Icarus, 165, 438–442.
Weidenschilling, S. J. (2006). Models of particle layers in
the midplane of the solar nebula. Icarus, 181, 572–586.
Weidenschilling, S. J., Spaute, D., Davis, D. R., Marzari,
F., and Ohtsuki, K. (1997a). Accretional Evolution of a
Planetesimal Swarm. Icarus, 128, 429–455.
Weidenschilling, S. J., Spaute, D., Davis, D. R., Marzari,
F., and Ohtsuki, K. (1997b). Accretional Evolution of a
Planetesimal Swarm. Icarus, 128, 429–455.
Wetherill, G. W. (1989). Origin of the asteroid belt. In
Asteroids II, pages 661–680.
Wetherill, G. W. (1990). Comparison of analytical and
physical modeling of planetesimal accumulation. Icarus,
88, 336–354.
Wetherill, G. W. (1992). An alternative model for the for-
mation of the asteroids. Icarus, 100, 307–325.
Wetherill, G. W. and Stewart, G. R. (1989). Accumulation
of a swarm of small planetesimals. Icarus, 77, 330–357.
Wetherill, G. W. and Stewart, G. R. (1993). Formation of
planetary embryos - Effects of fragmentation, low rel-
ative velocity, and independent variation of eccentricity
and inclination. Icarus, 106, 190–209.
Williams, D. R. and Wetherill, G. W. (1994). Size dis-
tribution of collisionally evolved asteroidal populations
- Analytical solution for self-similar collision cascades.
Icarus, 107, 117–128.
Williams, J. P., Najita, J., Liu, M. C., Bottinelli, S., Car-
penter, J. M., Hillenbrand, L. A., Meyer, M. R., and
Soderblom, D. R. (2004). Detection of Cool Dust around
the G2 V Star HD 107146. Astrophys. J., 604, 414–419.
Wurm, G. and Krauss, O. (2006). Concentration and sorting
of chondrules and CAIs in the late Solar Nebula. Icarus,
180, 487–495.
Wurm, G., Paraskov, G., and Krauss, O. (2004). On the
Importance of Gas Flow through Porous Bodies for the
Formation of Planetesimals. Astrophys. J., 606, 983–
Wyatt, M. C., Greaves, J. S., Dent, W. R. F., and Coulson,
I. M. (2005). Submillimeter Images of a Dusty Kuiper
Belt around η Corvi. Astrophys. J., 620, 492–500.
Yoshida, F., Nakamura, T., Watanabe, J., Kinoshita, D., Ya-
mamoto, N., and Fuse, T. (2003). Size and Spatial Dis-
tributions of Sub-km Main-Belt Asteroids. PASJ, 55,
701–715.
Youdin, A. N. and Shu, F. H. (2002). Planetesimal For-
mation by Gravitational Instability. Astrophys. J., 580,
494–505.
This 2-column preprint was prepared with the AAS LATEX macros v5.2.
	INTRODUCTION
	COAGULATION THEORY
	COAGULATION SIMULATIONS
	Background
	Self-Stirring
	External Perturbation
	Nice Model
	A Caveat on the Collisional Cascade
	Model Predictions
	Confronting KBO collision models with KBO data
	KBOs and Asteroids
	Concluding Remarks
ABSTRACT
  This chapter summarizes analytic theory and numerical calculations for the
formation and collisional evolution of KBOs at 20--150 AU. We describe the main
predictions of a baseline self-stirring model and show how dynamical
perturbations from a stellar flyby or stirring by a giant planet modify the
evolution. Although robust comparisons between observations and theory require
better KBO statistics and more comprehensive calculations, the data are broadly
consistent with KBO formation in a massive disk followed by substantial
collisional grinding and dynamical ejection. However, there are important
problems reconciling the results of coagulation and dynamical calculations.
Contrasting our current understanding of the evolution of KBOs and asteroids
suggests that additional observational constraints, such as the identification
of more dynamical families of KBOs (like the 2003 EL61 family), would provide
additional information on the relative roles of collisional grinding and
dynamical ejection in the Kuiper Belt. The uncertainties also motivate
calculations that combine collisional and dynamical evolution, a `unified'
calculation that should give us a better picture of KBO formation and
evolution.

<|endoftext|><|startoftext|>
Introduction to Hp Spaces, Cambridge University Press, London-New York-New
Rochelle-Melburnr-Sydney, 1980.
[3] Gorbachuk V.I., Knyazyuk A.V. Boundary values of solutions of operator differential equa-
tions, Uspekhi Mat. Nauk 44 (1989), no. 3, 55-91.
[4] Komatsu H. Ultradistributions and Hyperfunctions, Lecture Notes Math. 287 (1973), 180-
[5] Gorbachuk V.I. On solutions of an operator differential equation with singularilies, Bound-
ary Value Problems for Differential Equations, Sborn. Nauchn. Trudov, Inst. Matem.
Ukrain. AN, 1992, 8-36.
[6] Gorbachuk M.L., Denche M. Representation and boundary values of biharmonic functions,
Uspekhi Mat. Nauk 46 (1991), no. 6, p. 202.
[7] Mikhailov V.P. On the existence of boundary values of solutions of a polyharmonic equation
on the boundary of a domain, Mat. Sborn. 187 (1996), no. 11, 89-115.
[8] Berezansky Yu.M., Sheftel Z.G., Us G.F. Functional Analysis. Vol. 1,2, Birkhauser, Basel-
Boston-Berlin, 1966.
[9] Gorbachuk V.I. On Fourier series of periodic ultradistributions, Ukrain. Mat. Zh. 34 (1982),
no. 2, 144-150.
Institute of Mathematics
National Academy of Sciences of Ukraine
3 Tereshchenkivs’ka
Kyiv 01601, Ukraine
E-mail: imath@horbach.kiev.ua, sergiy.torba@gmail.com
ABSTRACT
  In trigonometric series terms all polyharmonic functions inside the unit disk
are described. For such functions it is proved the existence of their boundary
values on the unit circle in the space of hyperfunctions. The necessary and
sufficient conditions are presented for the boundary value to belong to certain
subspaces of the space of hyperfunctions.

<|endoftext|><|startoftext|>
Introduction
The nature of dark matter, which accounts for the majority of the mass in the Universe,
is one of the major outstanding problems of modern astrophysics. Although it is often
assumed that dark matter is collisionless, there is no a priori reason to believe that this is the
case, and it has been noted by other authors that a non-zero self-interaction cross-section can
have important astrophysical implications (e.g., Spergel & Steinhardt 2000). In particular,
self-interacting dark matter (SIDM) has been invoked to alleviate some apparent problems
with the standard cold dark matter (CDM) model, such as the non-observation of cuspy
mass profiles in galaxies (e.g., Moore 1994; Flores & Primack 1994; cf. Navarro et al. 1997;
Moore et al. 1999b) and the overprediction of the number of small sub-halos within larger
systems (e.g., Klypin et al. 1999; Moore et al. 1999a). Previous simulations and theoretical
studies suggest that a self-interaction cross-section per unit mass of σ/m ∼ 0.5− 5 cm2 g−1
is needed to explain the observed mass profiles of galaxies (e.g., Davé et al. 2001; Ahn &
Shapiro 2003, though see also Ahn & Shapiro 2005). Earlier studies have found stringent
upper limits on σ/m, inconsistent with the above range (e.g., Yoshida et al. 2000a; Hennawi
& Ostriker 2002; Miralda-Escudé 2002, though see also Sand et al. 2002). However, in
general these studies require non-trivial assumptions or statistical samples of clusters and
full cosmological simulations.
Furlanetto & Loeb (2002) pointed out that if one observes an offset between the gas and
dark matter in a merging cluster, arising because of the ram pressure acting on the gas but not
the dark matter, it can be used to constrain the collisional nature of dark matter. Markevitch
et al. (2002, hereafter M02) found just such a cluster, 1E 0657-56, which in the Chandra
image shows a bullet-like subcluster exiting the core of the main cluster, with prominent
bow shock and cold front features, and a uniquely simple merger geometry (Markevitch et
al. 2002, hereafter M02). This gas bullet lags behind the subcluster galaxies, which led
M02 to suggest that this cluster could be used to determine whether or not dark matter is
collisional. If dark matter were collisionless, one would expect the subcluster dark matter
– 3 –
halo to be coincident with the collisionless subcluster galaxies. A map of the dark matter
distribution was subsequently derived by Clowe et al. (2004) using weak lensing observations,
which showed that the subcluster dark matter clump lay ahead of the gas bullet, close to the
centroid of the subcluster galaxies (see also Clowe et al. 2006a, hereafter C06). The X-ray
image of 1E 0657-56 is shown in Fig. 1 with the most recently derived weak lensing mass
contours of C06 overlain. The weak lensing contours are shown instead of the strong lensing
contours since they are better for showing the overall structure, as they are derived from
a wider field of view. The positions of the total mass peaks from strong and weak lensing
analyses are consistent with one another, and the general structures are similar. The more
massive main cluster is on the left and the high-velocity merging bullet subcluster is on the
right. The main and subcluster mass peaks are clearly visible in the mass map, as is the
offset between the gas bullet and the corresponding dark matter (DM) peak. C06 argued
that this offset is direct evidence for the existence of dark matter.
The weak lensing mass map of Clowe et al. (2004) was used by Markevitch et al. (2004,
hereafter M04), in conjunction with the X-ray and optical observations available at the time,
to analytically estimate upper limits on the self-interaction cross-section per unit mass of DM,
σ/m, using three independent methods. These methods were based on the observed offset
between the gas bullet and the DM subclump, the high merger velocity of the subcluster,
and the survival of the DM subclump (more precisely, the subcluster’s M/L ratio being
equal to that observed in other clusters and in the main cluster). M04 assume a King mass
profile, based on the original weak lensing mass map, and that the subcluster has passed
only once through the main cluster, close to the main cluster core, as indicated by the X-ray
image. Their most stringent limit comes from the observed survival of the DM subclump,
from which they infer that σ/m < 1 cm2 g−1.
Although the analytic estimates performed by M04 provide useful upper limits on σ/m,
several conservative simplifying assumptions were necessary. For instance, the effects of
dynamical friction as the subcluster disturbs the main cluster mass distribution were ignored,
as was the possibility of multiple scatterings per particle. Although these effects are relatively
small, their inclusion may lead to tighter constraints. Furthermore, the analytic estimates
cannot address any structure that may be found in a high-resolution mass map (e.g., tails in
the DM distribution, similar to the gas tails seen in the bullet, due to collisional stripping
of DM, as described by M04). This argues for full N-body simulations that would include
the effects of SIDM with varying cross-sections.
Additionally, new data (both X-ray and lensing) have become available for 1E 0657-56.
Analysis of data from 450 ks of total exposure with Chandra gives a more accurate shock
Mach number of M = 3.0 ± 0.4 (all uncertainties 68%), which corresponds to a shock (and
– 4 –
bullet) velocity of 4700±630 km s−1 (Markevitch 2005). Recent weak and strong lensing
analyses of a much larger optical dataset, which includes HST observations, give a higher
quality mass map and a more accurate determination of the subcluster dark matter and
galaxy centroids (Bradač et al. 2006, hereafter B06; C06). In particular, the accuracy of the
total mass and galaxy centroids is now sufficient for an additional method of constraining
σ/m. In this paper, we will concentrate on the most sensitive method from M04, which is
based on the observed mass-to-light (M/L) ratios, and on this new test. The best way to
interpret the new high-quality data is through comparisons with detailed numerical simula-
tions of the merger which allow for SIDM with varying cross-sections. We present results
from such simulations and give constraints on the self-interaction cross-section of dark matter
particles. We assume Ω0 = 0.3, ΩΛ = 0.7, and H0 = 70 km s
−1 Mpc−1, for which 1′′ = 4.42
kpc at the cluster redshift of z = 0.296.
2. The Simulations
2.1. Simulation Code and Parameters
All simulations were performed using a modified version of the publicly available TreeSPH
code GADGET2 (Springel 2005). To model the self-interaction of the DM particles, we
adopted a Monte Carlo method used previously by other authors (e.g., Burkert 2000; Yoshida
et al. 2000b). At each simulation time step, the scattering probability for the ith particle is
given by
Pi = ρiσvrel∆t, (1)
where ρi is the local density, vrel is the relative velocity between the ith particle and its nearest
neighbor, and ∆t is the time step size. The local density is determined using GADGET2’s
smoothed particle hydrodynamic (SPH) capabilities. Collisions are assumed to be elastic
and scattering isotropic in the center-of-mass frame. In order for this relation to be valid,
∆t must be chosen such that Pi ≪ 1.
We ran a series of merger simulations with σ/m varying between 0 and 1.25 cm2 g−1.
Each simulation run included 106 DM particles (gas was not included in the simulations,
see discussion in § 4.5). Additionally, we performed a convergence test run with 107 DM
particles and σ/m ≈ 1 cm2 g−1, which agreed well with the lower resolution run for all
tests we performed. We interpret this agreement as indicating that the effects of individual
self-interacting DM particles are well modeled by the large computation particles used in
the simulations, and that the results we present here are not seriously affected by numerical
resolution effects. The ratio of DM particles in the main cluster and subcluster was set equal
– 5 –
to the initial total mass ratio of the clusters, which is known analytically from the King
models used to build the clusters.
In this work, we apply a new method for constraining σ/m based on the absence of an
offset between the subcluster total mass and galaxy centroids. For this, we added another
family of particles to the simulations to represent the collisionless galaxies. We choose 105
“normal” galaxy particles for the main cluster and 2.5×104 for the subcluster throughout.
The ratio of the number of normal galaxy particles was estimated based on galaxy counts
given by Barrena et al. (2002). The galaxies were initially distributed like the DM in each run.
The mass in galaxies was assumed to be roughly 5% of the total mass for each cluster, which,
combined with the number of galaxy particles, gives a low mass per galaxy (2.15×108 M⊙).
Using a large number of light-weight galaxies was chosen over using a more realistic mass
per galaxy so that accurate galaxy centroids could be determined. Test simulations run with
the more realistic average mass per galaxy of 1011 M⊙, also determined from results given by
Barrena et al. (2002), showed similar results to those given below in §3, though, as expected,
with a larger scatter. Two cD galaxies, one at the center of each cluster, were also included in
the simulations (though we note that three cD galaxies are observed, two associated with the
main cluster and one with the subcluster). Their inclusion leads to conservative estimates of
the effects of DM self-interaction, since a lower central DM density is required to reproduce
the observed total mass profile (and the scattering probability depends on the local DM
density via Eqn. 1). The cD galaxies were each given a mass of 1013M⊙.
The gravitational softening length was chosen to be 2 kpc throughout, which is on the
order of the mean inter-particle separation in the densest region in each simulation (i.e., at
core passage). The softening length for the cD galaxy particles was set to 60 kpc throughout.
This large softening length was chosen since on the scale of the simulations cD galaxies are
significantly extended objects and treating them as concentrated point-like masses would
be unrealistic. Since the lensing observations do not give an accurate mass profile for the
subcluster, King models with density profiles of ρ(r) = ρ0(1 + r
2/r2c )
−3/2, where ρ0 is the
central density and rc is the core radius, were conservatively chosen for the mass profiles of
each cluster. Such a choice gives conservative limits for the effects of self-interacting DM,
since King models do not have strongly concentrated “cuspy” cores, as compared to NFW
(Navarro et al. 1995) and Hernquist (1990) profiles. Thus the central density is lower and the
total number of DM particle collisions is conservatively reduced. Further discussion of the
impact of the bullet mass profile on our results can be found in §4.2. As suggested by the
X-ray morphology (M04), all simulated mergers were head-on collisions with zero impact
parameter and an initial separation of 4 Mpc. The effects of a possible non-zero impact
parameter are discussed in §4.1.
– 6 –
2.2. Initial Conditions
For each run, the initial conditions were chosen such that the projected mass profiles of
the main cluster and the bullet subcluster, after core passage and at the observed separation
(720 kpc), roughly matched those from the most recent combined strong and weak lensing
results derived by B06, which are given in the last row of Table 2. A relatively small
contribution from the observed distribution of gas mass was subtracted from the B06 total
mass, so that the resulting values could be directly compared with the simulations. The gas
masses were computed from the X-ray observations. The details of the derivation of the gas
mass map will be given in a future paper. A summary of the parameters for each simulation
run is given in Table 1, which gives the initial central density and core radius for the main
cluster (ρc,1, rc,1) and the bullet subcluster (ρc,2, rc,2), and σ/m for each run. The mass
profiles were truncated at 20 rc.
As pointed out by C06, weak lensing is expected to underestimate the mass of the lens
by 10-20% in the dense central regions. Furthermore, weak lensing can underestimate masses
due to mass-sheet degeneracy, where the mass map is affected by the non-detection of mass
at the edges of the field of view. The effect can be seen by comparing the total (i.e., without
the gas mass subtracted) weak lensing mass estimates of C06 to the mass profiles derived by
B06, which combine strong and weak lensing observations. We chose to match the projected
mass profiles in the simulations to those given by B06, since strong lensing is expected to
give better results in the central core regions. We are most interested in these regions since
most of the particle scattering depth accumulates near the center. We explore the effects of
decreasing the total mass of the system on our results in § 4.3.
2.3. Matching Simulations to Observations
Columns 4 & 5 of Table 2 give the total projected mass within 150 kpc of the mass peak
for the main cluster (M1(r < 150kpc)) and the subcluster (M2(r < 150kpc)) at the observed
separation for each simulation run and from the lensing observations. Some trial-and-error
was necessary to determine what initial conditions gave the desired projected mass profiles at
the desired separation. In general, larger σ/m values required a more concentrated subcluster
initially. This effect occurs because the self-interaction of the DM particles causes them to
be scattered away, particularly in high density regions. Consequently, the subcluster mass
profile is spread out during core passage. For the purposes of comparing the simulations
with observations, we take the simulation snapshot where the offset between the clusters is
closest to the observed separation. The time resolution of the snapshots was small enough
to match this value to within a few kpc (< 1′′), which is within the observational error. We
– 7 –
require the simulated mass profiles at this moment to be consistent with the strong lensing
mass map to within 10% in the inner regions (r < 500 kpc).
2.4. Stability of Simulated Halos
It is of interest to evaluate the stability of our simulated clusters, particularly in the pres-
ence of SIDM. In the case of non-self-interacting DM, the phase-space distribution function
can be computed and used to generate a gravitationally stable King model density profile.
However, in the case of SIDM, particle collisions will tend to transfer kinetic energy from one
region of the cluster to another, consequently altering the density profile (see, e.g., Burkert
2000). In section § 3.2, we will draw conclusions based on the fraction of particles scattered
away from the core of the subcluster due to the merger event. It is therefore necessary to
determine what fraction of particles might flow from the central region of the bullet due to
the instability resulting from SIDM collisions. To this end, we ran simulations of the subclus-
ter DM halo, allowing it to evolve in isolation over the timescale of the merger simulations
(about 1 Gyr). We used the same cluster parameters as the subcluster that has the highest
central density of all the clusters in Table 1. The results are shown in Figure 3, which gives
the initial density profile (solid line) and the density profile after 1 Gyr for σ/m = 0 cm2 g−1
(dotted line) and σ/m = 0.7 cm2 g−1 (dashed line). The density in the inner regions is
marginally enhanced in the case of SIDM. This result is similar to the core-collapse phase
seen by Burkert (2000), where weak interactions between the kinematically hot core and the
cooler outer regions result in an outward transport of kinetic energy (though this effect is
expected to be somewhat curtailed here due to the near isothermality of the King profile
at small radii). For the purposes of the test described in § 3.2, we are only concerned with
the total mass within projected radii. This quantity is plotted for each run in Figure 4. The
above effect of SIDM on the projected mass profile is negligible, particularly for projected
radii x ≥ 150 kpc, which is the minimum radius considered for the test described in § 3.2.
Thus, if we find that a large fraction of SIDM particles scattered outside this radius, it can
be assumed to be caused by the merger event as opposed to any halo instability. Further-
more, the collisionless galaxies are expected to adjust to any change in the overall potential
(which is dominated by the DM), thereby acting to further stabilize the mass-to-light ratio.
Indeed, in these isolated subcluster runs, the mass-to-light ratios within a projected radius
of 150 kpc from the cluster centers stay within 2% of their initial values, regardless of the
DM self-interaction.
– 8 –
3. Results
3.1. Galaxy – Dark Matter Centroid Offset
For non-self-interacting DM, the centroids of the subclump DM and galaxy distributions
are expected to be coincident throughout the simulation, since gravity is the only operating
force. However, when σ/m > 0, the subcluster DM halo experiences a drag force as it passes
through the main cluster, and subsequently lags the collisionless galaxies, just as the fluid-
like subcluster gas core is observed to lag the DM halo (see Fig. 1). We ran simulations with
a range of values for σ/m and calculated the centroids for each particle type by taking the
average projected position of the particles in some large region, centering on this position
with a smaller region, and repeating with smaller and smaller regions (down to a region with
a radius of 200 kpc). Column 6 of Table 2 gives ∆x, the offset between the subcluster galaxy
and DM centroids, for each run, for the moment when the subcluster is close to the observed
separation of 720 kpc from the main cluster. The dependence of ∆x on σ/m is also plotted
in Figure 5 (solid line). Results from the run with σ/m = 0 indicate that the offsets from the
simulations are accurate to about ±2 kpc (0.5′′). It is clear from Table 2 that the centroid
offset is a strong function of σ/m.
An X-ray image close-up of the bullet region with error contours for the subcluster total
mass and galaxy centroids overlain is shown in Figure 2. Details of the derivation of the
total mass centroids are given in C06. The centroid of the galaxy distribution was calculated
from the ACS photometry, using all galaxies for which the F814W-F606W color is within
0.15 mag of the red sequence. We used an Epanechnikov kernel with h = 30′′ (Merritt &
Tremblay, 1994; Gonzalez et al. 2002) to determine the centroid, and a bootstrap technique
to quantify the uncertainty. The centroid of the subcluster galaxies is found to be 5.7′′±6.6′′
(25 ± 29 kpc) west of the corresponding weak lensing mass peak. Given the observational
errors on the centroid positions (roughly 5′′, or 22 kpc, on the subcluster mass peak and
galaxy centroid), the absence of a larger offset means that σ/m < 1.25 cm2 g−1. We note
that, although this upper limit is greater than the best constraint of σ/m < 1 cm2 g−1 found
by M04, it is more robust, since it does not rely on the assumption that the subcluster and
the main cluster had equal M/L ratios prior to the merger, as is the case with the limit from
M04 (see § 3.2). This distinction is relevant since, although there is evidence for a universal
M/L ratio for clusters, the level scatter for individual clusters is not negligible (see Dahle
2000).
– 9 –
3.2. Subcluster M/L Ratio
In a merger scenario, SIDM is expected to give a lower M/L ratio for the subcluster that
has just passed through a dense core as compared to collisionless DM. This is because during
the merger, DM particles are scattered away due to collisions, while the collisionless galaxies
are relatively unaffected. To estimate the change in the M/L ratio in the simulations due
to the merger, we simply take the ratio of the total mass to galaxy mass within 150 kpc
(projected) of the bullet DM centroid and compare the values at the start of the simulations
and at the observed separation. The results are tabulated in Column 7 of Table 2, which
gives f , the fractional decrease in the bullet M/L ratio within 150 kpc, and also plotted in
Figure 5 (dashed line). We note that for σ/m ≈ 1 cm2 g−1, the subcluster loses about 38%
of its mass within 150 kpc, which is in agreement with a conservative estimate of 20 - 30%
given by M04. As expected, the numerical results yield somewhat tighter constraints on
σ/m as compared to the analytic estimates when using the same method and observational
constraints.
Using the latest lensing mass map from B06, we rederived M/L ratios for each of the
two subcluster within a projected 150 kpc of the total mass peaks (for previous results see
Clowe et al., 2004). For the subcluster, the mass contribution from the outskirts of the
main cluster has been approximately subtracted, whereas for the main cluster, the total
mass is used, since the contribution from the subcluster is negligible. The projected mass
contribution from the main cluster to the subcluster is estimated by taking the average
mass in an annulus at the distance of the subcluster (excluding the region of the subcluster
itself). This gives a conservative estimate for the upper limit on σ/m, since scattering
due to putative DM collisions is expected to result in an anomalously low M/L value for
the subcluster as compared to the main cluster, and by reducing the observed mass of
the subcluster we minimize the effect of the collisions that we want to constrain. We find
M/LB = 471± 28, 422± 25 and M/LI = 179± 11, 214± 13 for the subcluster and the main
cluster, respectively (for a discussion of the errors on the mass measurements, see B06). The
ratios agree with one another to within about the 68% confidence intervals. From the I band
data, we find that the ratio of M/L ratios of the subcluster and main cluster is 0.84± 0.07.
We conservatively choose to use the I band data only, since we want to put a firm lower limit
on this ratio, andM/LB is larger for the subcluster than for the main cluster. Assuming each
cluster started out with similar M/L values, which appears to be a reasonable assumption
for clusters in general (e.g., Mellier 1999; Dahle 2000), we conclude that the subcluster could
not have lost more than ∼ 23% of its initial mass. A comparison with the results from
simulations plotted in Figure 5 (dashed line) shows that this implies σ/m . 0.6 cm2 g−1,
which is a slight improvement over the previous best limit of σ/m . 1 cm2 g−1 from the
conservative estimates of M04.
– 10 –
3.3. Structure in Subcluster Dark Matter Distribution
M04 suggested that scattered DM particles, which would account for about 1/5 of the
total subcluster mass, might form tail features in the DM distribution, similar to the tails seen
in the X-ray image of the gas bullet (see Figure 1). The simulations allow us to determine
whether the non-observation of such tails in the mass map could be used to constrain σ/m.
We find that, rather than forming a tail, the scattered particles are mostly deposited in
the core of the main cluster, and do not form any features at a level that is interesting for
constraining σ/m.
4. Discussion
4.1. Non-zero Impact Parameter
As M04 argue, the morphology of the X-ray image, in particular, the symmetry of the
North-South X-ray bar (most likely an oblate spheroid viewed edge-on) between the main
cluster and subcluster mass peaks around the axis of symmetry set by the shape of the X-ray
bullet (which gives its present velocity direction), combined with the line-of-sight velocity
and X-ray derived Mach number, indicate a merger axis that is ∼ 10◦ from the plane of the
sky, and that the cluster cores must have passed close to one another, certainly within the
∼ 200 kpc core radius of the main cluster. In all simulations previously discussed, it was
assumed that the bullet subcluster passed directly through the center of the main cluster
core, i.e., that the impact parameter of the merger, b, is zero. For b > 0, we expect that the
effects of self-interacting DM will be reduced, since the density is at a maximum when the
core centers pass directly through one another, and the scattering probability is proportional
to the density (Eqn. 1). To test the strength of this effect, we re-ran the simulation R4 (see
Table 1) with an impact parameter of b = 200 kpc. Aside from the impact parameter,
the initial mass and velocity distributions were identical to those for run R4, so that the
relative effects of b > 0 could be investigated (specifically, no adjustments were made to the
initial conditions to more closely match the current observed mass profiles). The resulting
projected total mass profiles for the subcluster within 150 kpc and 250 kpc at the observed
separation agreed with those from the b = 0 run to within 4%. For the main cluster, the
match was better than 1%.
The resulting offset between the galaxy and DM centroids during the post core passage
phase was systematically smaller than the offset seen in the b = 0 run. At the observed
separation, the difference in the centroid offsets, as compared to the b = 0 run, was about
4 kpc, which is on the order of both the observational error and the accuracy of our numerical
– 11 –
technique. The fractional change in the M/L ratio of the subcluster was similarly affected;
for the b = 0 case, the M/L ratio within 150 kpc drops by about 27%, whereas for the run
with b = 200 kpc, it drops by 22%. Assuming, as we did in § 3.2, that the subcluster could not
have lost more than ∼ 23% of its initial mass, we find the constraint that σ/m < 0.7 cm2 g−1.
We therefore conclude that, although a non-zero impact parameter reduces the effects of self-
interacting DM as expected, the level of the effect is relatively small. This is likely due to
the assumed King mass profile of the main cluster. The radial density gradient is relatively
small within the core radius of the main cluster (which in this case is 151 kpc), so it is not
surprising that the effects of self-interacting DM are not significantly reduced by increasing
the impact parameter, so long as it is comparable to the core radius of the main cluster.
Naturally, the effects of a non-zero b would be increased if the main cluster had a strongly
peaked mass profile, though the current lensing data suggest otherwise. We conclude that
any value of impact parameter that is consistent with the observations will only slightly alter
our results.
4.2. Alternative Bullet Mass Profiles
As noted in §2, the choice of a King mass profile for the subcluster is expected to give
conservative estimates on the effects of collisional DM, based on the M/L ratio, since the
central density is low as compared to models with cuspy cores such as NFW and Hernquist
models. However, in the case of the galaxy/DM centroid offset test, one might argue that
since the subcluster galaxies are more tightly bound in the center for more highly concen-
trated mass profiles, it will be more difficult to displace them, which could lead to a smaller
offset between the centroids despite the increased action of DM collisions. We therefore ran
a test simulation, using a King model for the main cluster and a Hernquist model for the
subcluster, with σ/m = 0.72 cm2 g−1. The Hernquist profile is given by
ρ(r) =
(r + a)3
, (2)
whereM is the total cluster mass and a is the scale length (Hernquist 1990). As before, initial
parameters were chosen such that the bullet mass profile roughly matches the observed profile
at the current separation (we used M = 3.13× 1014 M⊙, a = 100 kpc). This is expected to
be the most conservative model combination for this test, since the main cluster King model
minimizes the effects of DM self-interaction while the subcluster Hernquist model maximizes
the binding energy of the subcluster galaxies. The results show that, when comparing to run
R4 in Table 2, the galaxy/DM centroid offset was only slightly less than that found with the
King model subcluster, on the order of the accuracy of the simulation offset values (less than
– 12 –
1 kpc). The change in the subcluster’s M/L was similarly only weakly affected (f = 0.27
for the King model bullet subcluster, whereas for the Hernquist model we find f = 0.31,
consistent with the King profile being the conservative case). The agreement is likely due
to the fact that the centroids become offset from one another after core passage, and it is
during core passage that the central density peak of the bullet is mostly “smoothed away”
due to DM collisions (recall that DM scattering is more frequent in high density regions,
so high density structures are more efficiently destroyed by DM self-interactions). Although
strongly peaked density profiles have been found to be unstable to SIDM (e.g., Burkert 2000;
Yoshida et al. 2000b), in our simulation a significant change in density only occurred at small
radii, such that the total projected mass of the subcluster within 50 kpc remained stable
up until the merger event. Therefore the subcluster mass distribution remained significantly
more peaked than a King profile cluster with the same projected mass within 150 kpc. We
conclude that our results are only weakly dependent on the mass profile chosen for the bullet,
so long as we require that the observed mass profile is reproduced. For the initially more
centrally concentrated profiles, the effects of the increased binding energy in the core are
balanced by the increased scattering frequency in this region.
4.3. Mass Profile Dependence
As mentioned in § 2, weak lensing is expected to underestimate the mass of the lens by
10-20%. There are two separate effects that contribute to this underestimation. First, near
the core of a cluster there is a large region without weak lensing galaxies, and this region
is effectively smoothed over when computing the mass map. Additionally, galaxies near the
regions where strong lensing dominates are measured in the weak lensing approximation,
which also leads to an underestimate of the mass in the core. Second, the total cluster mass
can be underestimated due to mass-sheet degeneracy, where the mass-map is affected by the
non-detection of mass at the edge of the field of view. Although projection of foreground
and/or background structures unassociated with the clusters will artificially increase the
mass, it is highly unlikely that such projected structures significantly contributed to the de-
tected lensing signal (C06). Results from strong lensing, which is not susceptible to the same
systematic underestimation as weak lensing, do indeed give systematically higher projected
masses for this system, by about a factor of 2 within the inner few hundred kpc, which is the
region we are most interested in for this analysis (compare B06 and C06; see C06 for further
discussion of this discrepancy). Though we chose to use the mass estimates from strong
lensing, since it should give a more reliable estimate of the projected mass near the cluster
cores, it is interesting to explore the dependence of our results on the lensing mass estimates.
To this end, we conducted a simulation run similar to run R4, but with the initial cluster
– 13 –
central densities chosen such that the projected mass profiles at the observed separation were
about 2 times lower than the masses derived from strong lensing observations, roughly in
agreement with the weak lensing results given by C06 (the initial core radii were the same
as for run R4). Since the scattering probability depends on the density, we expect these less
massive halos to be more weakly affected by SIDM.
Results obtained from the simulations with the lower mass normalization show that
the effects of SIDM are diminished, as expected for a linear dependence of the scattering
probabilities on the projected mass. In run R4, the M/L ratio dropped by 27%, whereas for
the run with 1/2 the total mass it dropped by 14% (roughly a factor of 2 less). Similarly, the
galaxy/DM centroid offset was 11.1 kpc, again, about a factor of 2 down from the 24.1 kpc
offset seen in run R4. If we assume that this factor of two effect can be applied to all of
values given in columns 6 & 7 of Table 2 and consider the more sensitive M/L test, we find
that requiring f . 0.20 would correspond to σ . 1.25 cm2 g−1. This is done as a test of
the method only, since these low halo mass values are not realistic, as they are insufficient
to produce the system of strong arcs observed in the HST images (C06).
4.4. Low Merger Velocity
All of the simulations discussed so far have assumed a merger velocity that is consistent
with that derived from X-ray observations (Markevitch, 2005), which give a Mach number for
the shock front of M = 3.0±0.4, and it is assumed that the subcluster has the same velocity
as the shock (though see Springel & Farrar, 2007). Since the subcluster could have slowed
down, or the shock front accelerated, it is interesting to ask what effect a lower velocity
would have on the inferred upper limit on σ/m, particularly since the observed velocity is
larger than would be expected from free fall of the subcluster onto the main cluster (Farrar
& Rosen 2007). In order to test the dependence of our upper limit on merger velocity, we
ran a simulation with σ/m = 0.72 cm2 g−1 such that the relative velocity of the cluster DM
halos at observed separation was 1.5 times lower, about 3100 km s−1 (M ≈ 2). This is close
to the expected free-fall velocity of the subcluster, and to the relative velocity of 2860 km s−1
found by Springel & Farrar (2007) from hydrodynamical simulations of this system. The
results showed little difference from the higher velocity run (compare to run R4 in Table 2):
∆x was 30.2 kpc (vs. 24.1 kpc) and f was 0.25 (vs. 0.27). We therefore conclude that our
results are relatively insensitive to merger velocities that are not in large disagreement with
the observations. A weak dependence of the M/L ratio on the subcluster velocity v is easy
to understand: the particle scatters out of the subcluster as long as v/2 is much greater than
the escape velocity from the subcluster, which it is by a large margin (M04).
– 14 –
4.5. Effects of Diffuse Gas
As mentioned in § 2, the intracluster gas observed in the X-ray band was not included
in the simulations (doing so would greatly increase the computing time and the complexity
involved with matching the observations in detail). The only way for the gas to affect the
results is via gravitational interaction (we ignore the possibility of non-gravitational baryon-
DM interactions, the cross-section of which has been shown to be extremely small, e.g.,
Chen et al. 2002). In general, the gas is expected to contribute about 10% of the total
mass of the system, a figure which appears to be consistent with the lensing and X-ray
observations (B06). One might worry that, when matching the observed mass profiles, some
“extra” DM is needed to account for the missing gas. As mentioned in § 2, gas masses
have been subtracted from the lensing masses using a detailed model of the gas distribution
derived from fitting the X-ray observations. In terms of the test involving the decrease in the
subcluster M/L ratio (see § 3.2), we needn’t worry about the subcluster gas for the simple
reason that the gas bullet is far from the subcluster mass peak (roughly 23′′, or 102 kpc).
Therefore, the gas in the region of the bullet mass peak is not centrally concentrated and
will not significantly add to the binding energy of potentially scattered DM particles. For
the galaxy and total mass centroid offset test (see § 3.1), the exclusion of the gas is expected
to give a conservative result: the gas bullet and bar feature seen in Figure 1 will act to
decelerate the subcluster DM halo and galaxies. However, if, as is the case with SIDM, the
DM halo starts to lag behind the galaxies and gets closer to the gas cores than the main
concentration of the galaxies, it will experience a larger deceleration, thereby increasing the
offset between the two. Due to the relatively low mass of the gas components, and the large
distance between the gas peaks and the subcluster DM halo and galaxies (as compared to
the offset of the latter), the strength of this effect will be quite small. We therefore conclude
that including gas in the simulations would not significantly affect our results.
5. Summary
We have combined results from new X-ray, optical and lensing observations and our
N-body simulations of the merging galaxy cluster 1E 0657-56 in order to derive an upper
limit on the self-interaction cross-section of dark matter particles, σ/m. We give constraints
on σ/m based on two independent methods: from the lack of offset between the total mass
peak and galaxy centroid of the subcluster that would arise during the merger due to drag
on the subcluster halo from DM particle collisions, and from the lack of a decreased mass-
to-light ratio of the subcluster due to scattering of DM particles. From the former, we
find σ/m < 1.25 cm2 g−1, and from the latter, σ/m < 0.7 cm2 g−1, which includes the
– 15 –
uncertainty in the impact parameter of the merger (upper limits are from 68% confidence
intervals). Our best constraint is a modest improvement of the previous best constraint
from conservative analytic estimates of σ/m < 1 cm2 g−1 (M04). Furthermore, our limit of
σ/m < 1.25 cm2 g−1 is more robust than the best analytic limit, since this method does not
depend on the assumption that the subcluster and main cluster M/L ratios were equal prior
to the merger. Previous studies have found that σ/m ∼ 0.5−5 cm2 g−1 is needed produce the
observational effects that self-interacting dark matter has been invoked to explain (e.g., non-
peaked galaxy mass profiles and the underabundance of small halos within larger systems).
Our results rule out almost this full range of values, at least under the assumption that σ is
velocity-independent.
We would like to thank Volker Springel, Naoki Yoshida, Yago Ascasibar, and Alexey
Vikhlinin for useful discussions and for providing access to various private codes. Simulations
were performed on a Beowulf cluster at the ITC in the Harvard-Smithsonian Center for
Astrophysics. Support for this work was partially provided for by the NASA Chandra grants
G04-5152X and TM6-7010X, and NASA contract NAS8-39073.
REFERENCES
Ahn, K., & Shapiro, P.R. 2003, J. Korean Astronomical Soc., 36, 89
Ahn, K., & Shapiro, P.R. 2005, MNRAS, 363, 1092
Barrena, R., Biviano, A., Ramella, M., Falco, E. E., & Seitz, S. 2002, A&A, 386, 816
Bradač, M., Clowe, D., Gonzalez, A. H., Marshall, P., Forman, W., Jones, C., Markevitch,
M., Randall, S., Schrabback, T., & Zaritsky, D. 2006, ApJ, 652, 937 (B06)
Burkert, A. 2000, ApJ, 534, L143
Chen, X., Hannestad, S., & Scherrer, R. J. 2002, Phys. Rev. D65, 123515
Clowe, D., Gonzalez, A. H., & Markevitch M. 2004, ApJ, 604, 596
Clowe, D., Bradač, M., Gonzalez, A. H., Markevitch, M., Randall, S. W., Jones, C., &
Zaritsky, D. 2006a, ApJ, 648, L109 (C06)
Clowe, D., Randall, S. W., Markevitch, M. 2006, Proceedings of the 2006b UCLA Dark
Matter Symposium (astro-ph/0611496)
Dahle, H. 2000, Proceedings of The NOT in the 2000’s, Eds. N. Bergvall, L. O. Takalo, &
V. Piirola (Univ. or Turku), 45
Davé, R., Spergel, D. N., Steinhardt, P. J., & Wandelt, B. D. 2001, ApJ, 547, 574
http://arxiv.org/abs/astro-ph/0611496
– 16 –
Farrar, G. R., Rosen, R. A. 2007, in 2007 AAS/AAPT Joint Meeting, American Astronomical
Society Meeting 209, #37.04.AAS, 370
Flores, R. A., & Primack, J. R. 1994, ApJ, 427, L1
Furlanetto, S. R., & Loeb, A. 2002, ApJ, 565, 854
Gonzalez, A. H., Zaritsky, D., Simard, L., Clowe, D., White, S. D. M. 2002, ApJ, 579, 577
Hennawi, J. F., & Ostriker, J. P. 2002, ApJ, 572, 41
Hernquist, L. 1990, ApJ, 356, 359
Klypin, A., Kravtsov, A. V., Valenzuela, O., & Prada, F. 1999, ApJ, 522, 82
Merritt, D., Tremblay, B. 1994, AJ, 108, 514
Miralda-Escudé, J. 2002, ApJ, 564, 60
Navarro, J. S., Frenk, C. S., & White, S. D. M. 1995, MNRAS, 275, 720
Navarro, J. S., Frenk, C. S., & White, S. D. M. 1997, ApJ, 490, 493
Markevitch, M., Gonzales, A. H., David, L., Vikhlinin, A., Murray, S., Forman, W., Jones,
C., & Tucker, W. 2002, ApJ, 567, L27 (M02)
Markevitch, M., Gonzales, A. H., Clowe, D., Vikhlinin, A., Forman, W., Jones, C., Murray,
S., Tucker, W. ApJ, 606, 819 (M04)
Markevitch, M. 2005, in Proceedings of The X-ray Universe 2005, San Lorenzo de El Escorial,
Spain (in press, astro-ph/0511345)
Mellier, Y. 1999, ARA&A, 37, 127
Moore, B. 1994, Nature, 370, 629
Moore, B., Ghigna, S., Governato, F., Lake, G., Quinn, T., Stadel, J., & Tozzi, P. 1999a,
ApJ, 524, L19
Moore, B., Quinn, T., Governato, F., Stadel, J., & Lake, G. 1999b, MNRAS, 310, 1147
Sand, D. J., Treu, T., & Ellis, R. S. 2002, ApJ, 574, L129
Spergel, D. N., & Steinhardt, P. J. 2000, Phys. Rev. Lett., 84, 3760
Springel, V., & Farrar, G. 2007 (in press, astro-ph/0703232)
Springel, V. 2005, MNRAS, 364, 1105
Yoshida, N., Springel, V., White, S. D. M., Tormen, G. 2000a, 535, L103
Yoshida, N., Springel, V., White, S. D. M., Tormen, G. 2000b, 544, L87
This preprint was prepared with the AAS LATEX macros v5.2.
http://arxiv.org/abs/astro-ph/0511345
http://arxiv.org/abs/astro-ph/0703232
– 17 –
Table 1. Initial Simulation Parameters
Run Name NDM σ/m ρc,1 rc,1 ρc,2 rc,2
(cm2 g−1) (106 M⊙ kpc
−3) (kpc) (106 M⊙ kpc
−3) (kpc)
R1 106 0 3.27 213 4.59 149
R2 106 0.24 3.27 213 4.59 149
R3 106 0.48 4.42 183 6.57 129
R4 106 0.72 7.03 151 11.75 108
R5 106 0.96 6.26 167 9.76 124
R6 106 1.25 6.26 167 9.76 124
– 18 –
Table 2. Conditions at Observed Separation
Run Name NDM σ/m M1(r < 150kpc) M2(r < 150kpc) ∆ x
(cm2 g−1) (1013 M⊙) (10
13 M⊙) (kpc)
R1 106 0 12.0 11.1 1.8 0.0
R2 106 0.24 11.5 10.4 5.4 0.08
R3 106 0.48 11.8 10.4 15.0 0.16
R4 106 0.72 12.6 11.0 24.1 0.27
R5 106 0.96 12.4 10.9 37.9 0.32
R6 106 1.25 11.4 9.8 53.9 0.38
Obs. 11.9±1.6 10.6±0.4 25± 29 0.16± 0.07
a∆ x is the offset between the subcluster total mass and galaxy centroids.
bf is the fractional decrease in the mass-to-light ratio of the subcluster within 150 kpc.
– 19 –
Fig. 1.— X-ray image with weak lensing mass contours overlain. The gas bullet lags the
subcluster DM halo. The current separation of the subcluster and main cluster mass peaks
is 720 kpc.
– 20 –
Fig. 2.— Close up of the subcluster bullet region, with the DM (blue) and galaxy (red)
centroid error contours overlain. The contours show the 68.3% and 99.7% error regions. The
left panel shows the X-ray Chandra image, while the right shows the optical HST image.
– 21 –
Fig. 3.— Density profile of an isolated King model cluster at t = 0 (solid line), and after
evolving for 1 Gyr with σ/m = 0 (dotted line) and σ/m = 0.7 cm2 g−1 (dashed line).
– 22 –
Fig. 4.— Total mass within projected radius x for the cluster plotted in Figure 3. Line-type
indications are the same as in Figure 3.
– 23 –
Fig. 5.— The dependence of the subcluster galaxy and total mass centroid offset (∆ x, solid
line) and the fractional change in the subcluster M/L ratio (f , dashed line) on σ/m. Based
on the values given in Table 2.
	Introduction
	The Simulations
	Simulation Code and Parameters
	Initial Conditions
	Matching Simulations to Observations
	Stability of Simulated Halos
	Results
	Galaxy – Dark Matter Centroid Offset
	Subcluster M/L Ratio
	Structure in Subcluster Dark Matter Distribution
	Discussion
	Non-zero Impact Parameter
	Alternative Bullet Mass Profiles
	Mass Profile Dependence
	Low Merger Velocity
	Effects of Diffuse Gas
	Summary
ABSTRACT
  (Abridged) We compare recent results from X-ray, strong lensing, weak
lensing, and optical observations with numerical simulations of the merging
galaxy cluster 1E0657-56. X-ray observations reveal a bullet-like subcluster
with a prominent bow shock, while lensing results show that the positions of
the total mass peaks are consistent with the centroids of the collisionless
galaxies (and inconsistent with the X-ray brightness peaks). Previous studies,
based on older observational datasets, have placed upper limits on the
self-interaction cross-section of dark matter per unit mass, sigma/m, using
simplified analytic techniques. In this work, we take advantage of new,
higher-quality observational datasets by running N-body simulations of
1E0657-56 that include the effects of self-interacting dark matter, and
comparing the results with observations. Furthermore, the recent data allow for
a new independent method of constraining sigma/m, based on the non-observation
of an offset between the bullet subcluster mass peak and galaxy centroid. This
new method places an upper limit (68% confidence) of sigma/m < 1.25 cm^2/g. If
we make the assumption that the subcluster and the main cluster had equal
mass-to-light ratios prior to the merger, we derive our most stringent
constraint of sigma/m < 0.7 cm^2/g, which comes from the consistency of the
subcluster's observed mass-to-light ratio with the main cluster's, and with the
universal cluster value, ruling out the possibility of a large fraction of dark
matter particles being scattered away due to collisions. Our limit is a slight
improvement over the previous result from analytic estimates, and rules out
most of the 0.5 - 5cm^2/g range invoked to explain inconsistencies between the
standard collisionless cold dark matter model and observations.

<|endoftext|><|startoftext|>
Introduction 1
2. Preliminaries 3
3. The N = 1 Z2 × Z2 orbifold 7
3.1 Instanton sector 10
3.2 Recovery of the ADS superpotential 11
3.3 Absence of exotic contributions 14
3.4 Study of the back-reaction 16
4. The N = 1 Z2 × Z2 orientifold 17
4.1 Instanton sector 19
5. An N = 2 example: the Z3 orientifold 21
5.1 Instanton sector 24
6. Conclusions 25
1. Introduction
It has long been realized that instantons in string theory are often in close correspon-
dence with instantons in gauge theories [1, 2, 3, 4, 5, 6]. Recently it was found that in
some situations stringy instantons can dynamically generate some terms which from a
low-energy effective point of view enter as ordinary external couplings in the superpo-
tential of gauge theories living on space-filling branes [7, 8, 9, 10, 11, 12, 13, 14]. By
instantons in string theory we generally mean instantons which are geometrically real-
ized as Euclidean extended objects wrapped on some non-trivial cycles of the geometry.
Thus, in a sense, a stringy instanton has a “life of its own”, not requiring an underlying
gauge theory. This opens up the possibility of having contributions originating from
instantons that do not admit a standard gauge theory realization. We shall refer to
these instantons as exotic.
There has been some debate in the recent literature about the instances where
such exotic instantons can actually contribute to the gauge theory superpotential in a
non-trivial manner. In this work we will contribute to such a debate by considering
backgrounds where a simple CFT description is possible, such as orbifolds or orientifolds
thereof.
We present various simple examples of what we believe to be a rather generic
situation. Namely, the presence of extra zero-modes for these instantons, in addition to
those required by the counting of broken symmetries, makes some of their contributions
vanish. Such extra zero-modes should not come as a surprise, since a D-brane instanton
in a CY manifold breaks a total of four out of eight supercharges, i.e. it has two extra
fermionic zero-modes from the point of view of holomorphic N = 1 gauge theory
quantities. We give some arguments as to why the backreaction of the space-filling
branes on the geometry might not help in lifting these extra zero-modes. We further
argue that only more radical changes of the background, such as the introduction of
fluxes, deformations of the CY geometry or the introduction of orientifold planes, can
remove these zero-modes. When this happens, exotic instantons do contribute to the
gauge theory superpotential and may provide qualitative changes in the low energy
effective dynamics, as for instance the stabilization of otherwise runaway directions.
We will be interested in Euclidean D-branes in type II theories. We will work
with IIB fractional branes at orbifold and orientifold singularities rather than type IIA
wrapped branes. The motivation for this choice of setting is two-fold. First, recent
advances in the gauge/gravity correspondence require the study of exotic instantons,
whose effects tend to stabilize the gauge theory rather than unstabilize it [15, 16, 9, 17],
and the gauge/gravity correspondence is more naturally defined in the context of IIB
theory. Second, similar effects are used in string phenomenology to try to understand
possible mechanisms for neutrino masses [7, 8, 13]. This latest activity is mainly done
in the type IIA scenario, but we find it easier to address some subtle issues in the IIB
orbifold case.
While working in an exact string background, our considerations will nonetheless be
only local, i.e. we will not be concerned with global issues such as tadpole cancellation
that arise in proper compactifications. This is perfectly acceptable in the context of the
gauge/gravity correspondence where the internal manifold is non-compact but, even for
string phenomenology, the results we obtain stand (locally) when properly embedded
in a consistent compactification.
The paper is organized as follows: In section 2 we set up the notation and discuss
some preliminary material. In section 3 we discuss our first case, namely the N = 1
Z2 × Z2 orbifold. After briefly recovering the usual instanton generated corrections to
the superpotential we discuss the possible presence of additional exotic contributions
and find that they are not present because of the additional zero-modes. We conclude
by giving a CFT argument on why such zero-modes are not expected to be lifted
even by taking into account the backreaction of the D-branes, unless one is willing
to move out the orbifold point in the CY moduli space. Sections 4 and 5 present
two separate instances where exotic contributions are present after having removed the
extra zero-modes by orientifolding. The first is an N = 1 orientifold, the second is an
N = 2 orientifold, displaying corrections to the superpotential and the prepotential,
respectively. We end with some conclusions and a discussion of further developments.
2. Preliminaries
In this section we briefly review the generic setup in the well understoodN = 4 situation
in order to introduce the notation for the various fields and moduli and their couplings.
The more interesting theories we will consider next will be suitable projections of the
N = 4 theory. In fact, the exotic cases can all be reduced to orbifolds/orientifolds
of this master case once the appropriate projections on the Chan-Paton factors are
performed.
Since we are interested in instanton physics (for comprehensive reviews see [18] and
the recent [19]) we will take the ten dimensional metric to be Euclidean. We consider
a system where both D3-branes and D(−1)-branes (D-instantons) are present. To be
definite, we take N D3’s and k D-instantons 1.
Quite generically we can distinguish three separate open string sectors:
• The gauge sector, made of those open strings with both ends on a D3-brane. We
assume the brane world-volumes are lying along the first four coordinates xµ and
are orthogonal to the last six xa. The massless fields in this sector form an N = 4
SYM multiplet [22]. We denote the bosonic components by Aµ and X
a. Written
in N = 1 language this multiplet is formed by a gauge superfield whose field
strength is denoted by Wα and three chiral superfields Φ
1,2,3. With a slight abuse
of notation, the bosonic components of the chiral superfields will also be denoted
by Φ, i.e. Φ1 = X4+ iX5 and so on. In N = 2 language we have instead a gauge
superfield A and a hypermultiplet H , all in the adjoint representation. The low
energy action of these fields is a four dimensional N = 4 gauge theory. All these
fields are N ×N matrices for a gauge group SU(N).
1These D3/D(−1) brane systems (and their orbifold projections) are very useful and efficient in
studying instanton effects from a stringy perspective even in the presence of non-trivial closed string
backgrounds, both of NS-NS type [20] and of R-R type [21].
• The neutral sector, which comprises the zero-modes of strings with both ends
on the D-instantons. It is usually referred to as the neutral sector because these
modes do not transform under the gauge group. The zero-modes are easily ob-
tained by dimensionally reducing the maximally supersymmetric gauge theory to
zero dimensions. We will use an ADHM [23] inspired notation [5, 6]. We denote
the bosonic fields as aµ and χ
a, where the distinction between the two is made by
the presence of the D3-branes. The fermionic zero-modes are denoted by MαA
and λα̇A, where α and α̇ denote the (positive and negative) four dimensional chi-
ralities and A is an SU(4) (fundamental or anti-fundamental) index denoting the
chirality in the transverse six dimensions. The ten dimensional chirality of both
fields is taken to be negative. In Euclidean space M and λ must be treated as
independent. When needed, we will also introduce the triplet of auxiliary fields
Dc, directly analogous to the four dimensional D, that can be used to express the
various interactions in an easier form as we will see momentarily. All these fields
are k × k matrices where k is the instanton number.
• The charged sector, comprising the zero-modes of strings stretching between a
D3-brane and a D-instanton. For each pair of such branes we have two conjugate
sectors distinguished by the orientation of the string. In the NS sector, where the
world-sheet fermions have opposite modding as the bosons, we obtain a bosonic
spinor ωα̇ in the first four directions where the GSO projection picks out the neg-
ative chirality. In the conjugate sector, we will get an independent bosonic spinor
ω̄α̇ of the same chirality. Similarly, in the R sector, after the GSO projection
we obtain a pair of independent fermions (one for each conjugate sector) both
in the fundamental of SU(4) which we denote by µA and µ̄A. These fields are
rectangular matrices N × k and k ×N .
The couplings of the fields in the gauge sector give rise to a four dimensional gauge
theory. The instanton corrections to such a theory are obtained by constructing the
Lagrangian describing the interaction of the gauge sector with the charged sector zero-
modes while performing the integral over all zero-modes, both charged and neutral. A
crucial point to notice and which will be important later is that while the neutral modes
do not transform under the gauge group, their presence affects the integral because of
their coupling to the charged sector.
The part of the interaction involving only the instanton moduli is well known from
the ADHM construction and it is essentially the reduction of the interacting gauge
Lagrangian for these modes in a specific limit where the Yukawa terms for λ and the
quadratic term for D are scaled out (see [18, 6] for details). The final form of this part
of the interaction is:
S1 = tr
− [aµ, χ
+ χaω̄α̇ω
α̇χa +
(Σ̄a)ABµ̄
AµBχa −
(Σ̄a)ABM
αA[χa,M
µ̄Aωα̇ + ω̄α̇µ
A + σ
βα̇[M
βA, aµ]
λα̇A − iD
ω̄α̇(τ c)
α̇ωβ̇ + iη̄
µν [a
µ, aν ]
(2.1)
where the sum over colors and instanton indices is understood. τ denotes the usual
Pauli matrices, η̄ (and η) the ’t Hooft symbols and Σ̄ (and Σ) are used to construct
the six-dimensional gamma-matrices
Σ̄a 0
. (2.2)
The above interactions can all be understood in terms of string diagrams on a disk with
open string vertex operators inserted at the boundary in the α′ → 0 limit.
The interaction of the charged sector with the scalars of the gauge sector can be
worked out in a similar way and yields
S2 = tr
ω̄α̇X
(Σ̄a)ABµ̄
. (2.3)
Let us rewrite the above action in a way which will be more illuminating in the following
sections. Since we will be mainly focusing on situations where we have N = 1 super-
symmetry, it is useful to write explicitly all indices in SU(4) notation, and then break
them into SU(3) representations. We thus write the six scalars Xa as the antisymmetric
representation of SU(4) as follows
XAB = −XBA ≡ (Σ̄
a)ABXa . (2.4)
The action S2 then reads
S2 = tr
ǫABCDω̄α̇XABXCDω
µ̄AXABµ
. (2.5)
Splitting now the indices A into i = 1 . . . 3 and 4, we can identify Φ
i ≡ Xi4 in the 3̄ of
SU(3) and Φi ≡ 1
ǫijkXjk in the 3 of SU(3). Thus we can rewrite the action (2.5) as
S2 = tr
ωα̇ +
ǫijkµ̄
iΦjµk
. (2.6)
In the above form, it is clear which zero-modes couple to the holomorphic superfields
and which others couple to the anti-holomorphic ones. This distinction will play an
important role later.
The main object of our investigation is the integral of e−S1−S2 over all moduli
Z = C
d{a, χ,M, λ,D, ω, ω̄, µ, µ̄} e−S1−S2 , (2.7)
where we have lumped all field independent normalization constants (including the
instanton classical action and the appropriate powers of α′ required by dimensional
analysis) into an overall coefficient C. There are, of course, other interactions involving
the fermions and the gauge bosons but, as far as the determination of the holomorphic
quantities are concerned, they can be obtained from the previous ones and supersym-
metry arguments. For example, a term in the superpotential is written as the integral
over chiral superspace
dx4dθ2 of a holomorphic function of the chiral superfields, but
such a function is completely specified by its value for bosonic arguments at θ = 0.
Thus, if we can “factor out” a term
dx4dθ2 from the moduli integral (2.7), whatever
is left will define the complex function to be used in the superpotential and similarly
for the prepotential in the N = 2 case if we succeed in factoring out an integral over
N = 2 chiral superspace
dx4dθ4.
The coordinates x and θ must of course come from the (super)translations bro-
ken by the instanton and they will be associated to the center of mass motion of the
D-instanton, namely, xµ = tr aµ and θαA = trMαA for some values of A.2 One must
pay attention however to the presence of possible additional neutral zero-modes coming
either from the traceless parts of the above moduli or from the fields λ and χ. These
modes must also be integrated over in (2.7) and their effects, as we shall see, can be
quite dramatic. In particular, the presence of λ in some instances is crucial for the
implementation of the usual ADHM fermionic constraints whereas in other circum-
stances it makes the whole contribution to the superpotential vanish. These extra λ
zero-modes are ubiquitous in orbifold theories and generically make it difficult to obtain
exotic instanton corrections for these models. As we shall see, they can however be
easily projected out by an orientifold construction making the derivation of such terms
possible.
In the full expression for the instanton corrections there will also be a field-inde-
pendent normalization factor coming from the one-loop string diagrams and giving for
instance the proper gYM dependence in the case of the usual instanton corrections.
In this paper we will only focus on the integral over the zero-modes, which gives the
proper field-dependence, referring the reader to [10, 11] for a discussion of these other
issues.
2Obviously, for the case of an anti-instanton, the roles of M and λ are reversed.
3. The N = 1 Z2 × Z2 orbifold
In order to present a concrete example of the above discussion, let us study a simple
C3/Z2 × Z2 orbifold singularity. The resulting N = 1 theory is a non-chiral four-node
quiver gauge theory with matter in the bi-fundamental. Non-chirality implies that the
four gauge group ranks can be chosen independently [24]. This corresponds to being
able to find a basis of three independent fractional branes in the geometry (for a review
on fractional branes on orbifolds see e.g. [25]).
The field content can be conveniently summarized in a quiver diagram, see Fig. 1,
which, together with the cubic superpotential
W = Φ12Φ23Φ31 − Φ13Φ32Φ21 + Φ13Φ34Φ41 − Φ14Φ43Φ31
+Φ14Φ42Φ21 − Φ12Φ24Φ41 + Φ24Φ43Φ32 − Φ23Φ34Φ42 , (3.1)
uniquely specifies the theory.
SU(N  ) SU(N  )
SU(N  )SU(N  )
Figure 1: Quiver diagram for the Z2 × Z2 orbifold theory. Round circles correspond to
SU(Nℓ) gauge factors while the lines connecting quiver nodes represent the bi-fundamental
chiral superfields Φℓm.
A stack of N regular D3-branes amounts to having one and the same rank assign-
ment on the quiver. The gauge group is then SU(N)4 and the theory is anN = 1 SCFT.
Fractional branes correspond instead to different (but anomlay free) rank assignments.
Quite generically, fractional branes can be divided into three different classes, depend-
ing on the IR dynamics they trigger [26]. The non-chiral nature and the particularly
symmetric structure of the orbifold under consideration allows one to easily construct
any such instance of fractional brane class.
If we turn on a single node, we are left with a pure SU(N) SYM gauge theory,
with no matter fields and no superpotential. This theory is believed to confine. The
geometric dual effect is that the corresponding fractional brane leads to a geometric
transition where the branes disappear leaving behind a deformed geometry. Indeed,
there is one such deformation in the above singularity.
Turning on two nodes leads already to more varied phenomena. There are now
two bi-fundamental superfields, but still no tree level superpotential. Thus, the system
is just like two coupled massless SQCD theories or, by a slightly asymmetric point of
view, massless SQCD with a gauged diagonal flavor group. The low-energy behavior
depends on the relative ranks of the two nodes.
If the ranks are different, the node with the highest rank is in a situation where it
has less flavors than colors. Then an Affleck-Dine-Seiberg (ADS) superpotential [27, 28]
should be dynamically generated, leading eventually to a runaway behavior. This set
up of fractional branes is sometimes referred to as supersymmetry breaking fractional
branes [29, 26, 30].
If the ranks are the same we are in a situation similar to Nf = Nc SQCD for both
nodes. Hence we expect to have a moduli space of SUSY vacua, which gets deformed,
but not lifted, at the quantum level. This moduli space is roughly identified in the
geometry with the fact that the relevant fractional branes are interpreted as D5-branes
wrapped on the 2-cycle of a singularity which is locally C× (C2/Z2). Such a fractional
brane can move in the C direction. This is what is called an N = 2 fractional brane
since, at least geometrically, it resembles very much the situation of fractional branes
at N = 2 singularities.
In what follows we use the two-node example as a simple setting in which we can
analyze the subtleties involved in the integration over the neutral modes. For the gauge
theory instanton case it is known that there are extra neutral fermionic zero-modes in
addition to those required to generate the superpotential. Their integration allows to
recover the fermionic ADHM constraints on the moduli space of the usual field theory
instantons. For such instantons, we will be able to obtain the ADS superpotential
and corresponding runaway behavior in the familiar context with Nc and Nf fractional
branes at the respective nodes, for Nf = Nc−1. On the other hand, we will argue that
the presence of such extra zero-modes rules out the possibility of having exotic instanton
effects, such as terms involving baryonic operators in the Nf = Nc case. It was the
desire to study such possible contributions that constituted the original motivation for
this investigation. We will first show that such effects are absent for this theory as it
stands, and we will later discuss when and how this problem can be cured.3
3In a situation where the CFT description is less under control than in the setting discussed in
the present paper, it has been argued in [17] that such baryonic couplings do arise in the context of
fractional branes on orbifolds of the conifold, possibly at the expense of introducing O-planes. Also
in a IIA set up similar to the ones of [7, 8, 10, 11, 13] it seems reasonable that one can wrap an
ED2-brane along an O6-plane and produce such couplings on other intersecting D6-branes.
Our orbifold theory can be easily obtained as an orbifold projection of N = 4 SYM.
The orbifolding procedure and the derivation of the superpotential (3.1) are by now
standard. We briefly recall the main points in order to fix the notation and because
some of the details will be useful later in describing the instantons in such a set up.
The group Z2 × Z2 has four elements: the identity e, the generators of the two Z2
that we denote with g1 and g2 and their product, denoted by g3 = g1g2. If we introduce
complex coordinates (z1, z2, z3) ∈ C
z1 = x4 + ix5 , z2 = x6 + ix7 , z3 = x8 + ix9 (3.2)
the action of the orbifold group can be defined as in Table 1.
z1 z2 z3
e z1 z2 z3
1 −z2 −z3
g2 −z
1 z2 −z3
g3 −z
1 −z2 z3
Table 1: The action of the orbifold generators.
Let γ(g) be the regular representation of the orbifold group on the Chan-Paton
factors. If the orbifold is abelian, as always in the cases we shall be interested in, we
can always diagonalize all matrices γ(g). We will assume that the two generators have
the following matrix representation
γ(g1) = σ3 ⊗ 1 =
1 0 0 0
0 1 0 0
0 0 −1 0
0 0 0 −1
, γ(g2) = 1⊗ σ3 =
1 0 0 0
0 −1 0 0
0 0 1 0
0 0 0 −1
(3.3)
where the 1’s denote Nℓ ×Nℓ unit matrices (ℓ = 1, ..., 4). Then, the orbifold projection
amounts to enforcing the conditions
Aµ = γ(g)Aµγ(g)
−1 , Φi = ±γ(g)Φiγ(g)−1 (3.4)
where the sign ± must be chosen according to the action of the orbifold generators
g that can be read off from Table 1. With the choice (3.3), the vector superfields
are block diagonal matrices of different size (N1, N2, N3, N4), one for each node of the
quiver, while the three chiral superfields Φi have the following form [24]
0 × 0 0
× 0 0 0
0 0 0 ×
0 0 × 0
, Φ2 =
0 0 × 0
0 0 0 ×
× 0 0 0
0 × 0 0
, Φ3 =
0 0 0 ×
0 0 × 0
0 × 0 0
× 0 0 0
, (3.5)
where the crosses represent the non-zero entries Φℓm appearing in the superpotential
(3.1).
3.1 Instanton sector
Now consider D-instantons in the above set up. Such instantons preserve half of the
4 supercharges preserved by the system of D3-branes plus orbifold. In this respect
recall that the fractional branes preserve exactly the same supercharges as the regular
branes.4 Using the N = 4 construction of the previous section and the structure of the
orbifold presented in eq. (3.5), we now proceed in describing the zero-modes for such
instantons.
The neutral sector is very similar to the gauge sector. Indeed, in the (−1) su-
perghost picture, the vertex operators for such strings will be exactly the same, except
for the eip·X factor which is absent for the instanton. The Chan-Paton structure will
also be the same, so that the same pattern of fractional D-instantons will arise as for
the fractional D3-branes. In particular, the only regular D-instanton (which could be
thought of as deriving from the one of N = 4 SYM) is the one with rank (instanton
number) one at every node. All other situations can be thought of as fractional D-
instantons, which can be interpreted as Euclidean D1-branes wrapped on the two-cycles
at the singularity, ED1 for short. Generically, we can then characterize an instanton
configuration in our orbifold by (k1, k2, k3, k4).
Following the notation introduced in section 2, the bosonic modes will comprise a
4×4 block diagonal matrix aµ, and six more matrix fields χ1, . . . χ6, that can be paired
into three complex matrix fields χ1+iχ2, χ3+iχ4, χ5+iχ6, having the same structure as
(3.5) but now where each block entry is a kℓ×km matrix. On the fermionic zero-modes
MαA and λα̇A (also matrices) the orbifold projection enforces the conditions
MαA = R(g)AB γ(g)M
αBγ(g)−1 , λα̇A = γ(g)λα̇Bγ(g)
−1R(g)BA (3.6)
4There is another Euclidean brane which preserves two supercharges, namely the Euclidean (anti)
D3-branes orthogonal to the 4 dimensions of space-time. We will be considering here only the D-
instantons, leaving the complete analysis of the other effects to future work. In this context, note that
the extended brane instantons would have an infinite action (and thus a vanishing contribution) in
the strict non-compact set up we are using here.
where R(g) is the orbifold action of Table 1 in the spinor representation which can be
chosen as
R(g1) = −Γ
6789 , R(g2) = −Γ
4589 . (3.7)
It is easy to find an explicit representation of the Dirac matrices such thatMαA and λα̇A
for A = 1, 2, 3 also have the structure of (3.5) while for A = 4 they are block diagonal.
Equivalently, one could write the spinor indices in the internal space in terms of the
three SO(2) charges associated to the embedding SO(2) × SO(2) × SO(2) ⊂ SO(6) ≃
SU(4)
Mα−++ =Mα1 , Mα+−+ =Mα2 , Mα++− =Mα3 , Mα−−− =Mα4 ,
λα̇+−− = λα̇1 , λα̇−+− = λα̇2 , λα̇−−+ = λα̇3 , λα̇+++ = λα̇4 . (3.8)
The most notable difference between the neutral sector and the gauge theory on the
D3-branes is that, whereas in the four-dimensional theory the U(1) gauge factors are
rendered massive by a generalization of the Green-Schwarz mechanism and do not
appear in the low energy action, for the instanton they are in fact present and enter
crucially into the dynamics.
Let us finally turn to the charged sector, describing strings going from the instan-
tons to the D3-branes. The analysis of the spectrum and the action of the orbifold
group on the Chan-Paton factors show, in particular, that the bosonic zero-modes are
diagonal in the gauge factors. There are four block diagonal matrices of bosonic zero-
modes ωα̇, ω̄α̇ with entries Nℓ×kℓ and kℓ×Nℓ respectively and eight fermionic matrices
µA, µ̄A with entries Nℓ × km and km × Nℓ, that again display the same structure as
above – same as (3.5) for A = 1, 2, 3 and diagonal for A = 4.
3.2 Recovery of the ADS superpotential
The measure on the moduli space of the instantons and the ADHM constraints are
simply obtained by inserting the above expressions into the moduli integral (2.7). If
one chooses some of the Nℓ or kℓ to vanish one can deduce immediately from the
structure of the projection which modes will survive and which will not.
As a consistency check, one can try to reproduce the ADS correction to the super-
potential [27, 28] for the theory with two nodes. Take fractional branes corresponding
to a rank assignment (Nc, Nf , 0, 0), and consider the effect of a ED1 corresponding to
instanton numbers (1, 0, 0, 0).
The only chiral fields present are the two components of Φ1 connecting the first
and second node
0 Q 0 0
Q̃ 0 0 0
0 0 0 0
0 0 0 0
. (3.9)
Since the instanton is sitting only at one node, all off diagonal neutral modes are absent,
as they connect instantons at two distinct nodes. Thus, the only massless modes present
in the neutral sector are four bosons xµ, denoting the upper-left component of aµ, two
fermions θα denoting the upper-left component of Mα4 and two more fermions λα̇
denoting the upper-left component of λα̇4. We have identified the non zero entries of
aµ and Mα4 with the super-coordinates xµ and θα since they precisely correspond to
the Goldstone modes of the super-translation symmetries broken by the instanton and
do not appear in S1+ S2 (cfr. (2.1) and (2.3)). Their integration produces the integral
over space-time and half of Grassmann space which precedes the superpotential term
to which the instanton contributes. On the contrary, λα̇ appears in S1 and when it is
integrated it yields the fermionic ADHM constraint.
In the charged sector, we have bosonic zero-modes ωuα̇ and ω̄α̇u, with u an index
in the fundamental or anti-fundamental of SU(Nc). In addition, there are fermionic
zero-modes µu and µ̄u with indices in SU(Nc), together with additional fermionic zero-
modes µ′f and µ̄′f where the index f is now in the fundamental or anti-fundamental of
SU(Nf ).
5 Note that the µ zero-modes carry an SU(4) index 4 (being on the diagonal)
while the µ′ zero-modes carry an SU(4) index 1, since they are of the same form as Φ1.
All this can be conveniently summarized in a generalized quiver diagram as rep-
resented in Fig. 2, which accounts for both the brane configuration and the instanton
zero-modes.
For a single instanton, the action (2.1) greatly simplifies since many fields are
vanishing as well as all commutators and one gets
S1 = i (µ̄uω
α̇ + ω̄α̇uµ
u) λα̇ − iDcω̄α̇u(τ
. (3.10)
Similarly, the coupling of the charged modes to the chiral superfield can be expressed
by writing eq. (2.3) as
ω̄α̇u
v + Q̃
ωα̇v −
µ̄uQ̃
µ̄′fQ
u . (3.11)
Note that it is the anti-holomorphic superfields that enter in the couplings with the
fermionic zero-modes, as is clear by comparing with (2.6). The above action is exactly
the same which appears in the ADHM construction as reviewed in [18].
5Recall that the bosonic zero-modes are diagonal in the gauge factors; therefore there are no ω
and ω̄α̇f zero-modes.
SU(N  ) SU(N  )f
Figure 2: Quiver diagram describing an ordinary instanton in a SU(Nc) × SU(Nf ) theory.
Gauge theory nodes are represented by round circles, instanton nodes by squares. The ED1
is wrapped on the same cycle as the color branes. All zero-modes are included except the θ’s
and the xµ’s, which only contribute to the measure for the integral over chiral superspace.
We are now ready to perform the integral (2.7) over all the existing zero-modes.
Writing
dx4dθ2W , (3.12)
we see that the instanton induced superpotential is
W = C
d{λ,D, ω, ω̄, µ, µ̄} e−S1−S2 . (3.13)
The integrals over D and λ enforce the bosonic and fermionic ADHM constraints,
respectively. Thus
W = C
d{ω, ω̄, µ, µ̄} δ(µ̄uω
α̇ + ω̄α̇uµ
u) δ(ω̄α̇u(τ
) e−S2 . (3.14)
We essentially arrive at the point of having to evaluate an integral over a set of zero-
modes which is exactly the same as the one discussed in detail in the literature, e.g. [18].
We thus quickly go to the result referring the reader to the above review for further
details. First of all, it is easy to see that, due to the presence of extra µ modes in
the integrand from the fermionic delta function, only when Nf = Nc − 1 we obtain
a non-vanishing result. After having integrated over the µ and µ′, we are left with
a (constrained) gaussian integration that can be performed e.g. by going to a region
of the moduli space where the chiral fields are diagonal, up to a row/column of ze-
roes. Furthermore, the D-terms in the gauge sector constrain the quark superfields
to obey QQ† = Q̃†Q̃, so that the bosonic integration brings the square of a simple
determinant in the denominator. The last fermionic integration conspires to cancel the
anti-holomorphic contributions and gives
WADS =
Λ2Nc+1
det(Q̃Q)
, (3.15)
which is just the expected ADS superpotential for Nf = Nc − 1, the only case where
such non-perturbative contribution is generated by a genuine one-instanton effect and
not by gaugino condensation. In (3.15) Λ is the SQCD strong coupling scale that is
reconstructed by the combination of e−8π
2/g2 coming from the instanton action with
various dimensional factors coming from the normalization of the instanton measure
[18].
3.3 Absence of exotic contributions
Until now, we have reproduced from stringy considerations the effect that is supposed
to be generated also by instantons in the gauge theory. Considering a slightly different
set up, we would like to study the possibility of generating other terms.
Let us consider a system with rank assignment (Nc, Nf , 0, 0), as before, but frac-
tional instanton numbers (0, 0, 1, 0). In other words, we study the effect of a single
fractional instanton sitting on an unoccupied node of the gauge theory. The quiver
diagram, with the relevant zero-modes structure, is given in Fig. 3.
The neutral zero-modes of the instanton sector are the same as before. This is
because the quantization of this sector does not know the whereabouts of the D3-
branes and thus all nodes are equivalent, in this respect. In the mixed sector, we have
no bosonic zero-modes now, since the ω and ω̄ are diagonal. Note that, although we
always have four mixed (ND) boundary conditions, due to the quiver structure induced
by the orbifold, here we effectively realize the same situation one has when there are
eight ND directions, namely that the bosonic sector of the charged moduli is empty.
On the other hand, there are fermionic zero-modes µu, µ̄u, µ
′f and µ̄′f , as in the
previous case. Note that despite having the same name, these zero-modes correspond
actually to different Chan-Paton matrix elements with respect to the previous ones,
the difference being in the instanton index that is not written explicitly. In particular
we can think of µ and µ′ as carrying an SU(4) index 2 and 3 respectively.
Because of the absence of bosonic charged modes, the action (2.1) is identically
SU(N  )fSU(N  )
Figure 3: Quiver diagram describing an exotic instanton in a SU(Nc) × SU(Nf ) theory.
Gauge theory nodes are represented by round circles, instanton nodes by squares. The ED1
is wrapped on a different cycle with respect to both sets of quiver branes.
zero and the action (2.3) contains only the last term:
S1 = 0
µ̄′fQ̃
u. (3.16)
Note that in this case it is the holomorphic superfields which appear above, as is clear
from (2.6) and from noticing that the diagonal fermionic zero-mode µ4 is not present.
We are thus led to consider
W = C
d{λ,D, µ, µ̄} e−S2 . (3.17)
One notices right away that the integral over the charged modes is non vanishing
(only) for the case Nf = Nc and gives a tantalizing contribution proportional to BB̃,
where B = detQ and B̃ = det Q̃ are the baryon fields of the theory. However, we
must carefully analyze the integration over the remaining zero-modes of the neutral
sector. Now neither D nor λ appear in the integrand. The integral over D does not
raise any concern: it is, after all, an auxiliary field and its disappearance from the
integrand is due to the peculiarities of the ADHM limit. Before taking this limit, D
appeared quadratically in the action and could be integrated out, leaving an overall
normalization constant. The integral over λ is another issue. In this case, λ is absent
from the integrand even before taking the ADHM limit and its integration multiplies
the above result by zero, making the overall contribution of such instantons to the
superpotential vanishing. Of course, the presence of such extra zero-modes should not
come as a surprise since they correspond to the two extra broken supersymmetries of
an instanton on a CY.
Therefore we see that the neutral zero-modes contribution, in the exotic instanton
case, plays a dramatic role and conspires to make everything vanishing (as opposite
to the ADS case analyzed before). A natural question is to see whether these zero-
modes get lifted by some effect we have not taken into account, yet. For one thing,
supersymmetry arguments would make one think that taking into account the back-
reaction of the D3-branes might change things. However, in the following subsection
we show that this seems not to be the case.
3.4 Study of the back-reaction
Let us stick to the case Nf = Nc, which is the only one where the integral (3.17)
might give a non-vanishing contribution. In this case the fractional brane system
is nothing but a stack of (Nc) N = 2 fractional branes. These branes couple to
only one of the 3 closed string twisted sectors [24]. More specifically, they source
the metric hµν , the R-R four-form potential Cµνρσ and two twisted scalars b and c from
the NS-NS and R-R sector respectively. This means that the disk one-point function
of their vertex operators [31, 32] is non vanishing when the disk boundary is attached
to such D3-branes. (Indeed in this way or, equivalently, by using the boundary-state
formalism [33, 34], one can derive the profile for these fields.)
If the back-reaction of these fields on the instanton lifted the extra zero-modes
λ’s, this should be visible when computing the one point function of the corresponding
closed string vertex operators on a disk with insertions on this boundary of the vertex
operators for such moduli. To see whether such coupling is there, we first need to write
down the vertex operators for the λ’s in the (±1/2) superghost pictures. The vertex in
the (−1/2) picture is found e.g. in [6] and reads
λ (z) = λα̇AS
α̇(z)SA(z)e−φ(z)/2 , (3.18)
where Sα̇(z) and SA(z) are the spin-fields in the first four and last six directions re-
spectively. For our argument we need to focus on the SA(z) dependence. Since the
modulus that survives the orbifold projection is, with our conventions, λα̇4 = λα̇+++,
we write the corresponding spin-field as
S+++(z) = eiH1(z)/2eiH2(z)/2eiH3(z)/2, (3.19)
where Hi(z) is the free boson used to bosonize the fermionic sector in the i-th complex
direction: ψi(z) = eiHi(z). The vertex operator in the +1/2 picture can be obtained by
applying the picture-changing operator to (3.18)
λ (z) = [QBRST, ξV
λ (z)] . (3.20)
The crucial part in QBRST is [31]
QBRST =
ψµ∂Xµ + ψ̄i∂Z i + ψi∂Z̄ i
+ . . . (3.21)
Because of the nature of the supercurrent, we see that (3.21) flips at most one sign in
(3.19), hence the product V
λ will always carry an unbalanced charge in some
of the three internal SO(2) groups. On the other hand, the vertex operators for the
fields sourced by the fractional D3’s cannot compensate such an unbalance. Hence, their
correlation function on the D-instanton with the insertion of V
λ carries a charge
unbalance and therefore vanishes. Therefore, at least within the above perturbative
approach, the neutral zero-modes seem not to get lifted by the back-reaction of the
D3-branes.
One might consider some additional ingredients which could provide the lifting. A
natural guess would be moving in the CY moduli space or adding suitable background
fluxes [35, 36]. There are indeed non-vanishing background fields at the orbifold point,
i.e. the b fields of the twisted sectors which the N = 2 fractional branes do not
couple to. These fields, however, being not associated to geometric deformations of
the internal space should be described by a CFT vertex operator uncharged under the
SO(2)’s, simply because of Lorentz invariance in the internal space. Therefore, the
only way to get an effective mass term for the zero-modes λ would be to move out of
the orbifold point in the CY moduli space. Indeed, the other moduli of the NS-NS
twisted sector, being associated to geometric blow-ups of the singularity, are charged
under (some of) the internal SO(2)’s and can have a non vanishing coupling with the
λ’s. More generically, complicated closed string background fluxes might be suitable.
This is an interesting option which however we do not pursue here, since we want to
stick to situations where a CFT description is available.
A more radical thing to do is to remove the zero-modes from the very start, for
instance by means of an orientifold projection [37, 38]. This is the option we are going
to consider in the remainder of this work.
4. The N = 1 Z2 × Z2 orientifold
In this section we supplement our orbifold background by an O3 orientifold and show
that in this case exotic instanton contributions do arise and provide new terms in the
superpotential. We refer to e.g. [39, 40, 41] for a comprehensive discussion of N = 1
and N = 2 orientifolds.
The first ingredient we need is the action of the O3-plane on the various fields.
Denote by Ω the generator of the orientifold. The action of Ω on the vertex operators
for the various fields (ignoring for the time being the Chan-Paton factors) is well known.
The vertex operators for the bosonic fields on the D3-brane contain, in the 0 picture,
the following terms: Aµ ∼ ∂τx
µ and Φi ∼ ∂σz̄
i. They both change sign under Ω,
the first because of the derivative ∂τ and the second because the orientifold action for
the O3-plane is always accompanied by a simultaneous reflection of all the transverse
coordinates zi.
The action of the orientifold on the Chan-Paton factors is realized by means of
a matrix γ(Ω) which in presence of an orbifold must satisfy the following consistency
condition [39]
γ(g)γ(Ω)γ(g)T = + γ(Ω) (4.1)
for all orbifold generators g. This amounts to require that the orientifold projection
commutes with the orbifold projection. The matrix γ(Ω) can be either symmetric or
anti-symmetric. We choose to perform an anti-symmetric orientifold projection on the
D3 branes and denote the corresponding matrix by γ−(Ω). This requires having an
even number Nℓ of D3 branes on each node of the quiver so that we can write
γ−(Ω) =
ǫ1 0 0 0
0 ǫ2 0 0
0 0 ǫ3 0
0 0 0 ǫ4
(4.2)
where the ǫℓ’s are Nℓ × Nℓ antisymmetric matrices obeying ǫ
ℓ = −1. Using (3.3) and
(4.2) it is straightforward to verify that the consistency condition (4.1) is verified.
The field content of the stacks of fractional D3-branes in this orientifold model is
obtained by supplementing the orbifold conditions (3.4) with the orientifold ones
Aµ = −γ−(Ω)A
µγ−(Ω)
−1 , Φl = −γ−(Ω)Φ
lTγ−(Ω)
−1. (4.3)
This implies that Aµ = diag (A
µ) with A
µ = ǫℓA
µ ǫℓ. Thus, the result-
ing gauge theory is a USp(N1) × USp(N2) × USp(N3) × USp(N4) theory. The chiral
superfields, which after the orbifold have the structure (3.5), are such that the Φℓm
component joining the nodes ℓ and m of the quiver, must obey the orientifold condi-
tion Φℓm = ǫℓΦ
mℓǫm. In the following, we will take N3 = N4 = 0 so that we are left
with only two gauge groups and no tree level superpotential.
4.1 Instanton sector
Let us now consider the instanton sector, starting by analyzing the zero-mode content
in the neutral sector. There are two basic changes to the previous story. The first is
that the vertex operator for aµ is now proportional to ∂σx
µ, not to ∂τx
µ and it remains
invariant under Ω (the vertex operator for χa still changes sign). The second is that
the crucial consistency condition discussed in [38] requires that we now represent the
action of Ω on the Chan-Paton factors of the neutral modes by a symmetric matrix
which can be taken to be
γ+(Ω) =
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
, (4.4)
where the 1’s are kℓ × kℓ unit matrices. The matrix aµ will be 4 × 4 block diagonal,
e.g. aµ = diag (a
µ), but now a
µ = a
µ . The most generic situation is to have
a configuration with instanton numbers (k1, k2, k3, k4). By considering a configuration
with k3 = 1 and k1 = k2 = k4 = 0, we can project out all bosonic zero-modes except
for the four components a3µ that we denote by xµ. The scalars χ
4 . . . χ9 are off-diagonal
and we shall not consider them further.
The nice surprise comes when considering the orientifold action on the fermionic
neutral zero-modes MαA and λα̇A. The orbifold part of the group acts on the spinor
indices as in (3.7), while the orientifold projection acts as the reflection in the transverse
space, namely
R(Ω) = −iΓ456789 (4.5)
Putting together the orbifold projections (3.6) with the orientifold ones
MαA = RAB(Ω)γ+(Ω)(M
αB)Tγ+(Ω)
−1 , λα̇A = γ+(Ω)(λα̇B)
Tγ+(Ω)
−1RBA(Ω) (4.6)
we can find the spectrum of surviving fermionic zero-modes. Using (4.4) and (4.5), it
is easy to see that (4.6) implies
MαA = (MαA)T , λα̇A = −(λα̇A)
T . (4.7)
Thus, for the simple case where k3 = 1 and k1 = k2 = k4 = 0, all λ’s are projected out
and only two chiral M zero-modes remain: Mα−−−, to be identified with the N = 1
chiral superspace coordinates θα.
Also the charged zero-modes are easy to discuss in this simple scenario. There are
no bosonic modes since the D-instanton and the D3-branes sit at different nodes while
the bosonic modes are necessarily diagonal. Most of the fermionic zero-modes µA and
µ̄A are also projected out by the orbifold condition
µA = R(g)ABγ(g)µ
Bγ(g)−1 , µ̄A = R(g)ABγ(g)µ̄
Bγ(g)−1 . (4.8)
Finally, the orientifold condition relates this time the fields in the conjugate sectors,
allowing one to express µ̄ as a linear combination of the µ
µ̄A = R(Ω)ABγ+(Ω)(µ
B)Tγ−(Ω)
−1 . (4.9)
The only charged modes surviving these projections can be expressed, in block 4 × 4
notation, as
0 0 µ13 0
0 0 0 0
0 0 0 0
0 0 0 0
, µ̄2 =
0 0 0 0
0 0 0 0
µ̄31 0 0 0
0 0 0 0
0 0 0 0
0 0 µ23 0
0 0 0 0
0 0 0 0
, µ̄3 =
0 0 0 0
0 0 0 0
0 µ̄32 0 0
0 0 0 0
, (4.10)
where the entries, to be thought of as column/row vectors in the fundamental/anti-
fundamental of SU(Nℓ) depending on their position, are such that µ̄31 = −µ
13ǫ1 and
µ̄32 = −µ
23ǫ2.
Thus, in the case where we have fractional D3 branes (N1, N2, 0, 0) and an exotic
instanton (0, 0, 1, 0), the only surviving chiral field is Φ12 ≡ ǫ1Φ
21ǫ2, the orientifold
projection eliminates the offending λ’s and we are left with just the neutral zero-modes
xµ and θ
α and the charged ones µ13 and µ23. This is summarized in the generalized
quiver of Fig. 4.
In this case the instanton partition function is
dx4dθ2 W (4.11)
where the superpotential W is
W = C
dµ e−S1−S2 = C
dµ13dµ23 e
iµT13ǫ1Φ12µ23 . (4.12)
This integral clearly vanishes unless N1 = N2, in which case we have
W ∝ det(Φ12) (4.13)
USP(N  )USP(N  )1 2
Φ Φ1221
2313µ
Figure 4: The generalized Z2 ×Z2 orientifold quiver and the exotic instanton contribution.
We thus see that exotic instanton corrections are possible in this simple model.6
It is interesting to note that the above correction is present in the same case
(N1 = N2 ≡ N) where the usual ADS superpotential for USp(N) is generated [42]
WADS =
Λ2N+3
det(Φ12)
(4.14)
and its presence stabilizes the runaway behavior and gives a theory with a non-trivial
moduli space of supersymmetric vacua given by det(Φ12) = const. Of course, the ADS
superpotential for this case can also be constructed along the same lines as section 3.2,
see e.g. [18]. In fact, this derivation is somewhat simpler than the one for the SU(N)
gauge group since there are no ADHM constraints at all in the one instanton case.
We think the above situation is not specific to the background we have been consid-
ering, but is in fact quite generic. As soon as the λ zero-modes are consistently lifted,
we expect the exotic instantons to contribute new superpotential terms. As a further
example, in the next section we will consider a N = 2 model, where exotic instantons
will turn out to contribute to the prepotential.
5. An N = 2 example: the Z3 orientifold
Let us now consider the quiver gauge theory obtained by placing an orientifold O3-plane
at a C × C2/Z3 orbifold singularity. In what follows we will use N = 1 superspace
notation. We first briefly repeat the steps that led to the constructions of such a quiver
6The gauge invariant quantity above can be rewritten as the Pfaffian of a suitably defined mesonic
matrix.
theory in the seminal paper [39]. Define ξ = e2πi/3 and let the generator of the orbifold
group act on the first two complex coordinates as
0 ξ−1
, (5.1)
while leaving the third one invariant. This preserves N = 2 SUSY. The action of the
generator g on the Chan-Paton factors is given by the matrix
γ(g) =
1 0 0
0 ξ 0
0 0 ξ2
 . (5.2)
The N = 2 theory obtained this way, summarized in Fig. 5, is a three node quiver
gauge theory with gauge groups SU(N1)× SU(N2)× SU(N3), supplemented by a cubic
superpotential which is nothing but the orbifold projection of the N = 4 superpotential
(its precise form is not relevant for the present purposes).
SU(N  ) SU(N  )
SU(N  )1
Figure 5: The Z3 (un-orientifolded) theory. The lines with both ends on a single node
represent adjoint chiral multiplets which, together with the vector multiplets at each node
constitute the N = 2 vector multiplets. Similarly, lines between nodes represent chiral mul-
tiplets which pair up into hyper-multiplets, in N = 2 language.
As for the action of Ω on the Chan-Paton factors, we choose again to perform the
symplectic projection on the D3-branes. To do so, we must take N1 to be even and
N2 = N3, so that we can write
γ−(Ω) =
ǫ 0 0
0 0 1
0 −1 0
 , (5.3)
where ǫ is a N1×N1 antisymmetric matrix obeying ǫ
2 = −1 and the 1’s denote N2×N2
identity matrices. The matrices γ(g) and γ−(Ω) satisfy the usual consistency condi-
tion [38, 39] as in (4.1).
The field content on the fractional D3-branes at the singularity will be given by
implementing the conditions
Aµ = γ(g)Aµγ(g)
−1 , Φi = ξ−iγ(g)Φiγ(g)−1 ,
Aµ = −γ−(Ω)A
µγ−(Ω)
−1 , Φi = −γ−(Ω)Φ
iTγ−(Ω)
−1 . (5.4)
The orbifold part of these conditions forces Aµ and Φ
3 to be 3 × 3 block diagonal
matrices, e.g. Aµ = diag (A
µ), while the orientifold imposes that A
µ = ǫA
and A2µ = −A
µ . The resulting gauge theory is thus a USp(N1)× SU(N2) theory. It is
convenient, however, to still denote A2µ and A
µ diagramatically as belonging to different
nodes with the understanding that these should be identified in the above sense.
The projection on the chiral fields can be done similarly and we obtain, denoting
by Φℓm the non-zero entries of the fields Φ
1 and Φ2 (only one can be non-zero for each
pair ℓm)
Φ12 = −ǫΦ
31, Φ13 = +ǫΦ
21, Φ23 = Φ
23, Φ32 = Φ
32 . (5.5)
The field content is summarized in Table 2.
USp(N1) SU(N2)
Φ12 � �
Φ21 � �
Φ13 � �
Φ31 � �
Φ23 · ��
Φ32 · ��
Table 2: Chiral fields making up the quiver gauge theory.
The theory we want to focus on in the following has rank assignment (N1, N2) =
(0, N). This yields an N = 2 SU(N) gauge theory with an hyper-multiplet in the sym-
metric/(conjugate)symmetric representation. We denote the N = 2 vector multiplet
by A whose field content in the block 3× 3 notation is thus
0 0 0
0 A 0
0 0 −AT
 . (5.6)
In what follows we will be interested in studying corrections to the prepotential F
coming from exotic instantons associated to the first node (the one that is not populated
by D3-branes). Let us then analyze the structure of the stringy instanton sector of the
present model, first.
5.1 Instanton sector
The most generic situation is to have a configuration with instanton numbers (k1, k2)
(later we will be mainly concerned with a configuration with instanton numbers (1, 0)).
Let us start analyzing the zero-modes content in neutral sector. The story is
pretty similar to the one discussed in the previous section. The vertex operator for aµ
is proportional to ∂σx
µ and so it remains invariant under Ω. The action on the Chan-
Paton factors of these D-instantons must now be represented by a symmetric matrix
which we take to be
γ+(Ω) =
1′ 0 0
0 0 1
0 1 0
 (5.7)
where 1′ is a k1 × k1 unit matrix and the 1’s are k2 × k2 unit matrices.
Because of the different orientifold projection, the matrices of bosonic zero-modes
behave slightly differently. The matrices aµ, χ
8 and χ9 will still be 3×3 block diagonal,
e.g. aµ = diag (a
µ), but now a
µ = a
µ and a
µ = a
µ whereas the same relations
for χ8 and χ9 will have an additional minus sign. The remaining fields χ4...7 are off
diagonal and we shall not consider them further since we will consider only the case of
one type of instanton. By considering a configuration with k1 = 1 and k2 = 0, we can
project out all bosonic zero-modes except for the four components a1µ that we denote
by xµ.
Let us now consider the orientifold action on the fermionic neutral zero-modesMαA
and λα̇A. The orbifold part of the group acts on the internal spinor indices as a rotation
R(g) = e
Γ45e−
Γ67 , (5.8)
while the orientifold acts through the matrix R(Ω) given in (4.5). The orbifold and
orientifold projections thus require
MαA = R(g)ABγ(g)M
αBγ(g)−1 , λα̇A = γ(g)λα̇Bγ(g)
−1R(g)BA , (5.9)
MαA = R(Ω)ABγ+(Ω)(M
αB)Tγ+(Ω)
−1 , λα̇A = γ+(Ω)(λα̇B)
Tγ+(Ω)
−1R(Ω)BA .
Using the explicit expressions for the various matrices, we see that, for the simple
case where k1 = 1 and k2 = 0, all λ’s are projected out and only four chiral M zero-
modes remain: Mα−−− and Mα++− to be identified with the N = 2 chiral superspace
coordinates θ1α and θ
α. Hence, also in this case the orientifold projection has cured the
problem encountered in section 3 (albeit in a N = 2 context now) and we can rest
assured that the integration over the charged modes will yield a contribution to the
prepotential.
Let us now move to the charged zero-modes sector. Just as in the previous model,
there are no bosonic modes since the D-instanton and the D3-branes sit at different
nodes while the bosonic modes are necessarily diagonal. Most of the fermionic zero-
modes µA and µ̄A are projected out by the orbifold condition which is formally the same
as in (4.8), while the orientifold condition relates the fields in the conjugate sectors,
giving µ̄ as a linear combination of the µ’s according to
µ̄A = R(Ω)ABγ+(Ω)(µ
B)Tγ−(Ω)
−1 . (5.10)
To summarize, the only charged modes surviving the projection can be expressed, in
block 3× 3 notation as
0 0 0
0 0 0
µ 0 0
 , µ̄1 =
0 µT 0
0 0 0
0 0 0
0 0 0
µ′ 0 0
0 0 0
 , µ̄2 =
0 0 −µ′T
0 0 0
0 0 0
 (5.11)
where the entries are to be thought of as column/row vectors in the fundamental/anti-
fundamental of SU(N) depending on their position.
As anticipated, the configuration we want to consider is a (0, N) fractional D3-
branes system together with an exotic (1, 0) instanton. The quiver structure, including
the relevant moduli, is depicted in Fig. 6. It is now easy to see that inserting the
expressions (5.6) and (5.11) into Eqs. (2.1), (2.3) and (2.7) we finally obtain
dx4dθ4F with F = C
dµdµ′ eiµ
TAµ′ ∝ detA . (5.12)
It would be interesting to study the potential implications of this result in the gauge
theory. There are many other simple models that could be analyzed along these lines.
6. Conclusions
In this paper we have presented some simple examples of what seem to be rather generic
phenomena in the context of string instanton physics. We paid particular attention to
Φ Φ3223
U(N) U(N)
Figure 6: The extended Z3 orientifold theory with (0, N) fractional D3-branes and (1, 0)
instanton number. The upper node (which would represent the USp(N1) gauge group and
disappears when we set N1 = 0 as in the case under consideration) is where the instanton
sits. The lower nodes denote only one gauge group. The charged fermionic zero-modes follow
Eq. (5.11). For simplicity we have not drawn the lines denoting the adjoint.
the study of the fermionic zero-modes and their effects on the holomorphic quantities
of the theory. We have seen both examples where the instanton contributions vanish
due to the presence of extra zero-modes and where they do not. In the second case, as
explicitly shown in a N = 1 example, exotic instantons can have a stabilizing effect on
the theory.
Although we have only considered some simple examples, we would like to stress
that these results are quite generic and can be carried over to all orbifold gauge theories.
A future direction would be to try to be more systematic and analyze the various
possibilities encountered in more complex N = 2 and N = 1 models. In a similar
spirit, one should analyze the multi-instanton contributions as well, since the total
correction to the holomorphic quantities will be the sum of all such terms. The study
of the zero-modes is expected to be even more relevant in this case as it will probably
make many contributions vanish. With an eye to string phenomenology, one should
also incorporate these models into globally consistent compactifications and study the
effects of these terms there.
Lastly, it would be interesting to study the dynamical implications of some of the
terms generated. We briefly touched upon this at the end of section 4 when we men-
tioned the stabilizing effect of the exotic instanton on the USp(N) theory. Although
from the strict field theory point of view these terms are thought of as ordinary polyno-
mial terms in the holomorphic quantities,7 they are “special” when seen from the point
of view of string theory and they might therefore induce a particular type of dynamics.
Acknowledgements
We would like to thank many people for discussions and email exchanges at various
stages of this work that helped us sharpen the focus of the presentation: M. Bianchi,
M. Billò, P. Di Vecchia, S. Franco, M. Frau, F. Fucito, S. Kachru, R. Marotta, L. Mar-
tucci, F. Morales, B. E. W. Nilsson, D. Persson, I. Pesando, D. Robles-Llana, R. Russo,
A. Tanzini, A. Tomasiello, A. Uranga, T. Weigand and N. Wyllard.
R.A., M.B. and A.L. are partially supported by the European Commission FP6
Programme MRTN-CT-2004-005104, in which R.A is associated to V.U. Brussel, M.B.
to University of Padova and A.L. to University of Torino. R.A. is a Research Associate
of the Fonds National de la Recherche Scientifique (Belgium). The research of R.A.
is also supported by IISN - Belgium (convention 4.4505.86) and by the “Interuniver-
sity Attraction Poles Programme –Belgian Science Policy”. M.B. is also supported by
Italian MIUR under contract PRIN-2005023102 and by a MIUR fellowship within the
program “Rientro dei Cervelli”. The research of G.F. is supported by the Swedish
Research Council (Vetenskapsr̊adet) contracts 622-2003-1124 and 621-2002-3884. A.L.
thanks the Galileo Galilei Institute for the hospitality and support during the comple-
tion of this work.
References
[1] E. Witten, Nucl. Phys. B 460 (1996) 541 [arXiv:hep-th/9511030].
[2] M. R. Douglas, arXiv:hep-th/9512077.
[3] E. Witten, Nucl. Phys. B 474, 343 (1996) [arXiv:hep-th/9604030].
[4] O. J. Ganor, Nucl. Phys. B 499, 55 (1997) [arXiv:hep-th/9612077].
[5] M. B. Green and M. Gutperle, JHEP 0002 (2000) 014 [arXiv:hep-th/0002011].
[6] M. Billo, M. Frau, I. Pesando, F. Fucito, A. Lerda and A. Liccardo, JHEP 0302 (2003)
045 [arXiv:hep-th/0211250].
[7] R. Blumenhagen, M. Cvetic and T. Weigand, arXiv:hep-th/0609191.
7Save few (interesting) examples, these terms are typically irrelevant and as a consequence should be
naturally suppressed by a high energy scale. Indeed, the terms generated by stringy exotic instantons
are suppressed by powers of the string scale.
[8] L. E. Ibanez and A. M. Uranga, arXiv:hep-th/0609213.
[9] B. Florea, S. Kachru, J. McGreevy and N. Saulina, arXiv:hep-th/0610003.
[10] S. A. Abel and M. D. Goodsell, arXiv:hep-th/0612110.
[11] N. Akerblom, R. Blumenhagen, D. Lust, E. Plauschinn and M. Schmidt-Sommerfeld,
arXiv:hep-th/0612132.
[12] M. Bianchi and E. Kiritsis, arXiv:hep-th/0702015.
[13] M. Cvetic, R. Richter and T. Weigand, arXiv:hep-th/0703028.
[14] M. Bianchi, F. Fucito, J.F. Morales, arXiv:0704.0784 [hep-th].
[15] K. Intriligator and N. Seiberg, JHEP 0602 (2006) 031 [arXiv:hep-th/0512347].
[16] R. Argurio, M. Bertolini, C. Closset and S. Cremonesi, JHEP 0609, 030 (2006)
[arXiv:hep-th/0606175].
[17] R. Argurio, M. Bertolini, S. Franco and S. Kachru, arXiv:hep-th/0703236.
[18] N. Dorey, T. J. Hollowood, V. V. Khoze and M. P. Mattis, Phys. Rept. 371 (2002) 231
[arXiv:hep-th/0206063].
[19] M. Bianchi, S. Kovacs and G. Rossi, arXiv:hep-th/0703142.
[20] M. Billo, M. Frau, S. Sciuto, G. Vallone, and A. Lerda, JHEP 0603 (2006) 023
[arXiv:hep-th/0511036].
[21] M. Billo, M. Frau, I. Pesando and A. Lerda, JHEP 0405 (2004) 023 [arXiv:hep-
th/0402160]; M. Billo, M. Frau, F. Lonegro and A. Lerda, JHEP 0505 (2005) 047
[arXiv:hep-th/0502084]; M. Billo, M. Frau, F. Fucito and A. Lerda, JHEP 0611 (2006)
012 [arXiv:hep-th/0606013].
[22] E. Witten, Nucl. Phys. B 460 (1996) 335 [arXiv:hep-th/9510135].
[23] M. F. Atiyah, N. J. Hitchin, V. G. Drinfeld and Yu. I. Manin, Phys. Lett. A 65, 185
(1978).
[24] M. Bertolini, P. Di Vecchia, G. Ferretti and R. Marotta, Nucl. Phys. B 630 (2002) 222
[arXiv:hep-th/0112187].
[25] M. Bertolini, Int. J. Mod. Phys. A 18 (2003) 5647 [arXiv:hep-th/0303160].
[26] S. Franco, A. Hanany, F. Saad and A. M. Uranga, JHEP 0601 (2006) 011 [arXiv:hep-
th/0505040].
[27] T. R. Taylor, G. Veneziano and S. Yankielowicz, Nucl. Phys. B 218 (1983) 493.
[28] I. Affleck, M. Dine and N. Seiberg, Phys. B 241 (1984) 493.
[29] D. Berenstein, C. P. Herzog, P. Ouyang and S. Pinansky, JHEP 0509 (2005) 084
[arXiv:hep-th/0505029].
[30] M. Bertolini, F. Bigazzi and A. L. Cotrone, Phys. Rev. D 72 (2005) 061902 [arXiv:hep-
th/0505055].
[31] D. Friedan, E. J. Martinec and S. H. Shenker, Nucl. Phys. B 271 (1986) 93.
[32] L. J. Dixon, D. Friedan, E. J. Martinec and S. H. Shenker, Nucl. Phys. B 282 (1987) 13.
[33] P. Di Vecchia, M. Frau, I. Pesando, S. Sciuto, A. Lerda and R. Russo, Nucl. Phys. B
507 (1997) 259 [arXiv:hep-th/9707068]; M. Bertolini, P. Di Vecchia, M. Frau, A. Lerda,
R. Marotta and I. Pesando, JHEP 0102 (2001) 014 [arXiv:hep-th/0011077]; M. Bertolini,
P. Di Vecchia, M. Frau, A. Lerda and R. Marotta, Nucl. Phys. B 621 (2002) 157
[arXiv:hep-th/0107057].
[34] P. Di Vecchia and A. Liccardo, NATO Adv. Study Inst. Ser. C. Math. Phys. Sci. 556
(2000) 1 [arXiv:hep-th/9912161].
[35] L. Martucci, J. Rosseel, D. Van den Bleeken and A. Van Proeyen, Class. Quant. Grav.
22, 2745 (2005) [arXiv:hep-th/0504041].
[36] E. Bergshoeff, R. Kallosh, A. K. Kashani-Poor, D. Sorokin and A. Tomasiello, JHEP
0510, 102 (2005) [arXiv:hep-th/0507069].
[37] G. Pradisi and A. Sagnotti, Phys. Lett. B 216 (1989) 59.
[38] E. G. Gimon and J. Polchinski, Phys. Rev. D 54 (1996) 1667 [arXiv:hep-th/9601038].
[39] M. R. Douglas and G. W. Moore, arXiv:hep-th/9603167.
[40] M. Berkooz and R. G. Leigh, Nucl. Phys. B 483 (1997) 187 [arXiv:hep-th/9605049].
[41] G. Zwart, Nucl. Phys. B 526 (1998) 378 [arXiv:hep-th/9708040].
[42] K. A. Intriligator and P. Pouliot, Phys. Lett. B 353 (1995) 471 [arXiv:hep-th/9505006].
ABSTRACT
  We study the effects produced by D-brane instantons on the holomorphic
quantities of a D-brane gauge theory at an orbifold singularity. These effects
are not limited to reproducing the well known contributions of the gauge theory
instantons but also generate extra terms in the superpotential or the
prepotential. On these brane instantons there are some neutral fermionic
zero-modes in addition to the ones expected from broken supertranslations. They
are crucial in correctly reproducing effects which are dual to gauge theory
instantons, but they may make some other interesting contributions vanish. We
analyze how orientifold projections can remove these zero-modes and thus allow
for new superpotential terms. These terms contribute to the dynamics of the
effective gauge theory, for instance in the stabilization of runaway
directions.

<|endoftext|><|startoftext|>
7 Turbulent Diffusion of Lines and Circulations
Gregory L. Eyink
Department of Applied Mathematics & Statistics
The Johns Hopkins University
3400 N. Charles Street
Baltimore, MD 21218
Tel: 410-516-7201, Fax: 410-516-7459
e-mail: eyink@ams.jhu.edu
Abstract
We study material lines and passive vectors in a model of turbulent flow at infinite-
Reynolds number, the Kraichnan-Kazantsev ensemble of velocities that are white-noise in
time and rough (Hölder continuous) in space. It is argued that the phenomenon of “spon-
taneous stochasticity” generalizes to material lines and that conservation of circulations
generalizes to a “martingale property” of the stochastic process of lines.
PACS: 47.27.Jv, 52.65.Kj, 02.50.Fz, 05.45.Df
keywords: turbulence, material lines, circulations, Kraichnan model, dynamo, fractals
http://arxiv.org/abs/0704.0263v1
The evolution of material lines and surfaces passively carried by turbulent flow has long
been a subject of interest [1]. This is motivated in part by questions surrounding dynamically
relevant objects, such as vortex lines [2, 3, 4] and magnetic field-lines [5, 6], which have been
argued to behave similar to material lines at high Reynolds numbers. However, in that limit, the
turbulent velocity field is no longer differentiable in space but only Hölder continuous [7, 8, 9].
Observations from experiments and simulations suggest that material objects advected by such a
rough velocity become fractal, with a Hausdorff dimension strictly greater than their topological
dimension [10, 11, 12, 13, 14, 15]. This poses a difficulty to the view that vortex lines behave
as material lines—a consequence of the Kelvin-Helmholtz theorem [16, 17]—since circulations
are a priori not defined for non-rectifiable loops. It has recently been argued that the Kelvin
theorem in fact breaks down in turbulent flows, in the sense that the circulation is not strictly
conserved for every loop [18, 19]. Similar breakdown of Alfvén’s theorem on magnetic-flux
conservation [20] is expected in plasma turbulence at high magnetic Reynolds numbers [21].
These questions have been sharpened by recent work on the Kraichnan model of advection
by a Gaussian random velocity field that is delta-correlated in time [22]. A novel phenomenon
has been discovered there called spontaneous stochasticity: Lagrangian particle trajectories for
a non-Lipschitz advecting velocity are non-unique and split to form a random process in path-
space for a fixed velocity realization [23, 24, 25, 26, 27, 28, 29]. This phenomenon raises many
fundamental questions, including whether material objects such as lines and surfaces can even
exist in the limit of infinite Reynolds number. It is the purpose of this Letter to outline a
new approach to the evolution of such geometric objects in the Kraichnan model. We focus on
material lines and passive vectors, which are dual objects in the same sense as material particles
and passive scalar fields [30]. In particular, we shall sketch the proof of a “martingale property”
that has previously been proposed [18] as a generalization of the conservation of circulations
for a rough velocity field.
We consider stochastic flows [31] on a d-dimensional manifold M driven by Brownian vector
fields that are not Lipschitz regular in space. To simplify the presentation, we use Euclidean
space M = Rd or the torus M = Td to illustrate the main ideas. More precisely, u(x, t) is a
Gaussian random vector field, with mean u(x, t) and fluctuating part ũ(x, t) with covariance
〈ũi(x, t)ũj(x
′, t′)〉 = Dij(x,x
′; t)δ(t − t′). (1)
for x,x′ ∈ M. We are mainly interested in the case that u(x, t) ≡ 0 and u(x, t) ≡ ũ(x, t) is a
homogeneous random field, with D(x,x′; t) = D(x− x′, t). The quantity
∆(x,x′; t) = tr[D(x,x; t) +D(x′,x′; t)− 2D(x,x′; t)] (2)
is 〈‖u(x) − u(x′)‖2〉, the mean of the Euclidean norm squared, for a random field u(x) with
covariance D(x,x′; t). The case of greatest interest to us has ∆(x,x′; t) ∝ ‖x − x′‖2α with
0 < α < 1. In that case, u(x) is Hölder continuous with exponent α at every point in space.
We consider oriented lines (1-cells) given parametrically as continuous, one-to-one maps
C : [0, 1] → M. A material line satisfies
(d/dt)C(σ, t) = u(C(σ, t), ◦ t). (3)
for σ ∈ [0, 1] and t ∈ R. The circle “◦” means that we interpret equation (3) in the Stratonovich
sense. The (forward) Ito equation (d/dt)C(σ, t) = u(C(σ, t), t) equivalent to (3) has the mean
changed to u∗i (x, t) = ui(x, t) + (1/2)(∂/∂x
k)Dik(x,x
′; t)|
′=x ([31], section 3.4.) If M = R
d and if u(x, t) is a homogeneous random field, then the Ito and Stratonovich interpretations of
equation (3) are equivalent. Now let P
[C, t] denote the conditional probability distribution of
lines for a fixed velocity realization u. This distribution satisfies a stochastic Liouville equation:
(d/dt)P
[C, t] = −
δCi(σ)
(ui(C(σ), ◦ t)Pu[C, t]) . (4)
Equation (4) is a direct consequence of equation (3) and must also be interpreted in the
Stratonovich sense. It is formally equivalent to the Ito equation:
(d/dt)P
[C, t] = −
δCi(σ)
([u∗i (C(σ), t) + ũi(C(σ), t)]Pu[C, t])
δCi(σ)δCj(σ′)
Dij(C(σ), C(σ
′); t)P
[C, t]
. (5)
Averaging equation (5) over the Gaussian ensemble of velocities ũ yields a functional Fokker-
Planck equation for distributions in the space of free lines C on the manifold M :
(d/dt)P [C, t] = −
δCi(σ)
([u∗i (C(σ), t)P [C, t])
δCi(σ)δCj(σ
Dij(C(σ), C(σ
′); t)P [C, t]
. (6)
The first term on the righthand side represents a drift with the mean velocity u∗ and the
second term represents a diffusion arising from the velocity covariance D. Similar diffusions
on the path- and loop-spaces of a manifold M have been much studied, motivated in part by
questions from quantum field theory [32, 33].
The above considerations are rigorously justifiable for the case of a Lipschitz velocity with
α = 1 but are only formal when α < 1. A more careful (and also more physically realistic)
approach in the latter case is to replace the advecting velocity u with a “coarse-grained” or
smoothed velocity uλ = ϕλ ∗ u, by convolution with a smooth filter kernel ϕλ(r) = λ
−dϕ(r/λ).
The length-scale λ can be interpreted as a mathematical representation of the viscous cutoff in
a true turbulent velocity field [25, 26]. The exact solution of the Liouville equation (4) for such
a smoothed velocity is
(dC, t|C0, t0) = δ(C − ξ
(C0))dC, (7)
with initial condition P
(dC, t0|C0, t0) = δ(C − C0)dC. Here ξ
: M → M is the stochastic
flow of diffeomorphisms generated by the smoothed velocity-field uλ ([31], section 4.6). Despite
(7), a nontrivial diffusion process in line-space can be obtained if the limit λ → 0 is taken
appropriately. Consider a “nice” distribution Gρ(dC) which is supported on lines entirely
contained in the ball B(0, ρ) of radius ρ at the origin 0 and take the weak limit
Gρ(dC
(dC, t|C0 + C
0, t0)Ψ(C) =
(dC, t|C0, t0)Ψ(C) (8)
for bounded, continuous functionals Ψ(C) and t > t0. In the “weakly compressible regime”
[24, 25, 28]— and, in particular, for a divergence-free velocity field satisfying ∇·u = 0— this
limit should yield a non-degenerate diffusion. 1 This is a generalization of the phenomenon
of spontaneous stochasticity to the turbulent advection of lines, with initial line C0 at time t0
splitting into a random ensemble of lines C at time t.
As for the case of smooth advection, an unconditional diffusion satisfying equation (6)
may be obtained by averaging over the velocity u. The instantaneous realizations C of this
diffusion process should be fractal objects when the advecting velocity is Hölder continuous
with exponent 0 < α < 1 and rigorous estimates of their Hausdorff dimensions would be of
much interest. These questions may also be addressed numerically using Lagrangian Monte
Carlo techniques [34, 35, 36]. In such a study, the material line C(t) would be represented by a
discrete approximation CN (t) constructed from N +1 Lagrangian particles xa(t), a = 0, ..., N :
CN (σ, t) = (1− θN (σ))xaN (σ)(t) + θN (σ)xaN (σ)+1(t). (11)
Here aN (σ) = [Nσ] with [x] the greatest integer less than or equal to x (modulo N for loops)
and θN (σ) = (Nσ) where (x) = x− [x] is the fractional part of x. Thus, (11) corresponds to a
piecewise-linear curve with linear segments connecting the successive Lagrangian particles. So
long as δN (t) = maxa |xa(t)−xa+1(t)| . λ, the discrete approximation CN (t) represents well the
material line C(t) and the approximation becomes better as N → ∞ and δN (t) ≪ λ. However,
the same is not true in the opposite limit, where λ ≪ δN (t). The phenomenon of “spontaneous
stochasticity” for a rough velocity field makes it very doubtful that material lines even exist if
the limit λ → 0 is taken before evolving in time and an initial line then presumably “explodes”
1We note that the same diffusion process should also be obtained by using Duhamel’s formula to solve equation
(5) as the Ito integral
Pu[C, t] = S
0 (t, t0)P [C, t0]−
(t, t
δCi(σ)
[eui(C(σ), t
)]Pu[C, t
, (9)
where S∗0 (t, t
′) = Texp
∗(t′)
(t) = −
δCi(σ)
i (C(σ), t)·) +
δCi(σ)δCj(σ′)
Dij(C(σ), C(σ
); t)·
is the Fokker-Planck operator of the (mean) diffusion in line-space. Equation (9) can be solved iteratively to
generate a representation Pu[C, t] = S
(t, t0)P [C, t0] as a Wiener chaos expansion in white-noise eu; cf. [28, 29].
into a disconnected cloud of particles at any time t > 0. Thus, the velocity smoothing in (8)
appears to be necessary to define appropriately a line-diffusion for a rough (Hölder) velocity.
Alternatively, a stochastic regularization might be employed that adds a white-noise κdW (t)
to the evolution equation of Lagrangian particles [30].
We now turn to the dual problem of a passive vector (1-form) A advected by the velocity
u = u+ ũ:
∂tA(x, t) + (u(x, ◦t)·∇)A(x, t) + (∇u(x, ◦t))A(x, t) = 0, (12)
This stochastic equation is interpreted again in the Stratonovich sense. Equation (12) for d = 3
is equivalent by vector calculus identities to ∂tA+∇(u·A)− u×(∇×A) = 0. The latter has
the form of Ohm’s law,
E+ u×B = ηJ, (13)
in the ideal limit of zero resistivity (η = 0) for an electric field E = −∂tA − ∇(u·A) and
magnetic field B = ∇×A given by a vector potential A, with the electric current J = ∇×B.
Taking the curl of (13) yields an induction equation ∂tB = ∇×(u×B)+η△B for the magnetic
field. With this interpretation, the passive vector equation was introduced by Kazantsev [37]
as a soluble model of the kinematic dynamo. (See also [38, 39, 40].) The “circulation” (or
“holonomy”) of A along C is defined in Lagrangian form as
ΦL(C, t) =
A(t)·dx, (14)
where C(t) is the material line advected by u(t) which started as line C at the initial time
t = t0. Conservation of “circulation”, (d/dt)ΦL(C, t) = 0, follows formally from (12) for any
space dimension d ≥ 1. It is rigorously true for the case of a smooth advecting velocity with
α = 1 ([31], section 4.9). If C is a closed loop (1-cycle), then the line-integral (14) represents
gauge-invariant magnetic flux and the conservation law corresponds to Alfvén’s theorem [20].
We now consider the generalization of this result for α < 1. For this purpose, it is useful to
reformulate the passive vector equation (12) as a passive scalar in line-space:
∂tΦE(C, t) +
dσ [ui(C(σ), t) + ũi(C(σ), ◦t)]
δCi(σ)
ΦE(C, t) = 0. (15)
In this equation, ΦE(C, t) =
A(t)·dx is the Eulerian circulation of A along a fixed (non-
advected) line C. An exactly analogous reformulation of the incompressible Euler equation
(as an active scalar in loop-space) was advanced some time ago by Migdal [41]. Note that
conservation of circulations is just the formal solution of (15) by the method of characteristics.
We shall take the equation (15) as our primitive formulation of the passive vector; one of
the immediate advantages is that we can avoid (for the moment) the question how to define
line-integrals over fractal lines. We then convert (15) to Ito formulation:
∂tΦE(C, t) = −
dσ [u∗i (C(σ), t) + ũi(C(σ), t)]
δCi(σ)
ΦE(C, t)
dσ′ Dij(C(σ), C(σ
′), t)
δCi(σ)δCj(σ′)
ΦE(C, t) (16)
This stochastic equation is solved by the method of LeJan and Raimond [28, 29] (cf. footnote
#1), writing it as a (backward) Ito integral and iterating to obtain ΦE(C, t) = Su(t, t
′)ΦE(C, t
where the Markov operator semi-group S
(t, t′) is defined by a Wiener chaos expansion. More
intuitively, this solution is expressed as
ΦE(C, t) =
(dC ′, t′|C, t)ΦE(C
′, t′), t′ < t, (17)
in terms of the turbulent diffusion of lines (backward in time). We see that the circulations
are not conserved, except on average. This is precisely the “martingale property” that was
conjectured (for solutions of incompressible Euler equations) in [18]. Note that this property
imposes an irreversible arrow of time, since Eulerian circulations are given as averages over
their past values, not future ones. This “generalized Alfvén theorem” should be related to
dynamo action in the Kazantsev model [37, 38, 39, 40]. In the physical context of the dynamo,
there is an additional resistive term η
dxj∂i
δσij (x)
ΦE(C, t) on the righthand side of (15),
where δ/δσij(x) is the “area derivative” in the loop calculus of Migdal [41]. This term breaks
time-reversal symmetry and should select the backward-martingale solution (17) in the ideal
limit η → 0. Of course, it cannot be ruled out a priori that the η → 0 limit of the resistive
regularization and the λ → 0 limit for regularized velocity, as in (8), shall yield distinct weak
solutions of the loop-equation (15), as occurs in the intermediate compressibility regime of the
passive scalar problem [24, 25, 26, 28].
These results can be generalized to turbulent diffusion processes of higher-dimensional ma-
terial objects, k-dimensional oriented submanifolds of M or k-cells Ck(t). The dual object is
the passive k-form ωk, which satisfies (in Stratonovich sense)
k + L
ωk = 0 (18)
with L
the Lie-derivative along the vector field u ([31], section 4.9). This equation is formally
equivalent to conservation of the integral invariants
I(Ck, t) =
Ck(t)
ωk(t) (19)
for any k-cell Ck(t) comoving with u [42]. Then k = 0 is the passive scalar, k = 1 the passive
vector and k = d the passive density [24]. A theory similar to that developed here for k = 1
applies for any integer k. A unified approach to all these results is to consider directly the
turbulent diffusion of the Lagrangian flow maps ξt,t
, which satisfy the stochastic equation
(d/dt)ξt,t
(a) = u(ξt,t
(a), ◦t), ξt
′,t′(a) = a. (20)
In this framework one can derive for the distribution P
[ξ, t] on maps exact analogues of the
Liouville equation, in Stratonovich form (4) or Ito form (5). It is natural to formulate the
problem as an infinite-dimensional diffusion in the Hilbert space H = L2(M,Rd) . It is known
for the cases M = Rd or Td that the semigroup S(M) of Borel volume-preserving maps is a
closed subset of this Hilbert space, and that the group G(M) of volume-preserving diffeomor-
phisms is dense in S(M) for the L2-topology [43]. This construction is a close analogue of the
“generalized Euler flows” of Brenier, but for the Cauchy initial-value problem.
To summarize: We have outlined an approach to the study of material lines in a model
of turbulent flow at infinite Reynolds number and to the dual problem of a passive vector in
the same flow. The main conclusions are (1) that a non-degenerate diffusion should exist for
material lines, generalizing the phenomenon of “spontaneous stochasticity” of material points,
and (2) that the Kelvin/Alfvén theorem on conservation of circulations should generalize to a
“martingale property”. Although the approach sketched here depends heavily on the white-
noise character of the velocity field in time, we expect that similar results hold for more realistic
velocity ensembles with the crucial property that realizations are rough (Hölder continuous) in
space. See [18, 21] for related rigorous results on the solutions of incompressible fluid equations.
The two properties discussed in the context of this model problem should be an essential feature
of real fluid turbulence in the high Reynolds number limit.
Acknowledgements. We thank S. Chen, M. Chertkov, L. Chevillard, R. Ecke, C. Meneveau,
K. R. Sreenivasan and E. T. Vishniac for useful conversations. This work was supported by the
NSF grant # ASE-0428325 at the Johns Hopkins University and by the Center for Nonlinear
Studies at Los Alamos National Laboratory, where the research was begun.
References
[1] G. K. Batchelor, “The effect of homogeneous turbulence on material lines and surfaces,”
Proc. Roy. Soc. Lond. A 213 349–366 (1952).
[2] G. I. Taylor. “Observations and speculations on the nature of turbulence motion (1917)”,
in: Scientific Papers of Sir Geoffrey Ingram Taylor, vol.2 , ed. G.K. Batchelor (Cambridge
Univ. Press, 1971), p.69.
[3] G. I. Taylor and A. E. Green, “Mechanism of the production of small eddies from large
ones”, Proc. R. Soc. Lond. A, 158, 499–521 (1937).
[4] G. I. Taylor, “Production and dissipation of vorticity in a turbulent fluid,” Proc. R. Soc.
Lond. A 164 15–23 (1938).
[5] G. K. Batchelor, “On the spontaneous magnetic field in a conducting liquid in turbulent
motion,” Proc. Roy. Soc. Lond. A 201 405–416 (1950).
[6] P. G. Saffman, “On the fine-scale structure of vector fields advected by a turbulent fluid,”
J. Fluid Mech. 16 545 (1963).
[7] L. Onsager, “Statistical hydrodynamics,” Nuovo Cimento, 6 279–287 (1949).
[8] G. Parisi and U. Frisch, “On the singularity structure of fully developed turbulence,”
Turbulence and Predictability in Geophysical Fluid Dynamics, eds. M. Gil, R. Benzi and
G. Parisi (North-Holland, 1985), pp.84–88.
[9] G. L. Eyink, “Besov spaces and the multifractal hypothesis,” J. Stat. Phys. 78 353–375
(1995).
[10] B. B. Mandelbrot, “Géométrie fractale de la turbulence. Dimension de Hausdorff, disper-
sion et nature des singularités du mouvement des fluides,” Comptes Rendus (Paris) 282A
119–120 (1976).
[11] K. R. Sreenivasan and C. Meneveau, “The fractal facets of turbulence,” J. Fluid Mech.
173 357–386 (1986).
[12] J. C. H. Fung and J. C. Vassilicos, “Fractal dimensions of lines in chaotic advection,” Phys.
Fluids A 3 1433 (1991).
[13] E. Villermaux and Y. Gagne, “Line dispersion in homogeneous turbulence: stretching,
fractal dimensions, and micromixing,” Phys. Rev. Lett. 73 252–255 (1994).
[14] D. Kivotides, “Geometry of turbulent tangles of material lines,” Phys. Lett. A 318 574–578
(2003).
[15] F. C. G. A. Nicolleau and A. Elmaihy, “Study of the development of three-dimensional sets
of fluid particles and iso-concentration fields using kinematic simulation,” J. Fluid Mech.
517 229–249 (2004).
[16] H. Helmholtz, “Über Integrale der hydrodynamischen Gleichungen welche den Wirbelbe-
wegungen entsprechen”, Crelles Journal, 55 25–55 (1858).
[17] W. Thomson (Lord Kelvin), “On vortex motion”, Trans. Roy. Soc. Edin., 25, 217–260
(1869).
[18] G. L. Eyink, “Turbulent cascade of circulations,” Comptes Rendus Physique, 7 449–455
(2006).
[19] S. Chen, G. L. Eyink, Z. Xiao, and M. Wan, “Is the Kelvin theorem valid for high-Reynolds-
number turbulence?” Phys. Rev. Lett. 97 144505 (2006).
[20] H. Alfvén, “On the existence of electromagnetic-hydrodynamic waves,” Arkiv f. Mat.,
Astron. o. Fys. 29B 1–7 (1942)
[21] G. L. Eyink and H. Aluie, “The breakdown of Alfvén’s theorem in ideal plasma flows:
necessary conditions and physical conjectures,” Physica D 223 82–92 (2006).
[22] R. H. Kraichnan, “Small-scale structure of a scalar field convected by turbulence,” Phys.
Fluids 11 945–953 (1968).
[23] D. Bernard, K. Gawȩdzki, and A. Kupiainen, “Slow modes in passive advection,” J. Stat.
Phys. 90 519–569 (1998).
[24] K. Gawȩdzki and M. Vergassola, “Phase transition in the passive scalar advection,” Physica
D 138 63–90 (2000).
[25] W. E and E. Vanden-Eijnden, “Generalized flows, intrinsic stochasticity, and turbulent
transport,” Proc. Nat. Acad. Sci. (USA) 97 8200–8205 (2000).
[26] W. E and E. Vanden-Eijnden, “Turbulent Prandtl number effect on passive scalar advec-
tion,” Physica D 152 636–645 (2001).
[27] Y. Le Jan and O. Raimond, “Solutions statistiques fortes des équations différentielles
stochastiques,” C. R. Acad. Sci. Paris 327, Série I, 893–896 (1998)
[28] Y. Le Jan and O. Raimond, “Integration of Brownian vector fields,” Ann. Prob. 30 826–873
(2002).
[29] Y. Le Jan and O. Raimond, “Flows, coalescence and noise,” Ann. Prob. 32 1247–1315
(2004).
[30] G. Falkovich, K. Gawȩdzki & M. Vergassola, “Particles and fields in fluid turbulence,”
Rev. Mod. Phys. 73 913–975 (2001).
[31] H. Kunita, Stochastic Flows and Stochastic Differential Equations. (Cambridge University
Press, Cambridge, 1990).
[32] B. Driver and M. Röckner, “Construction of diffusions on path and loop spaces of compact
Riemannian manifolds,” Compt. Rend. Acad.Sci., Serie I 316 603–608 (1992).
[33] R. Léandre, “Analysis on loop spaces and topology,” Mathematical Notes 72 212–229
(2002).
[34] U. Frisch, A. Mazzino, and M. Vergassola, “Intermittency in passive scalar advection,”
Phys. Rev. Lett. 80 5532–5534 (1998).
[35] U. Frisch, A. Mazzino, A. Noullez and M. Vergassola, “Lagrangian method for multiple
correlations in passive scalar advection,” Phys. Fluids 11 2178–2186 (1999).
[36] O. Gat, I. Procaccia, and R. Zeitak, “Anomalous scaling in passive scalar advection: Monte
Carlo Lagrangian trajectories,” Phys. Rev. Lett. 80 5536–5539 (1998).
[37] A. P. Kazantsev, “Enhancement of a magnetic field by a conducting fluid,” Sov. Phys.
JETP 26 1031–1034 (1968).
[38] M. Vergassola, “Anomalous scaling for passively advected magnetic fields,” Phys. Rev. E
53 R3021–R3024 (1996).
[39] D. Vincenzi, “The Kraichnan-Kazantsev dynamo,” J. Stat. Phys. 106 1073–1091 (2002)
[40] A. Celani, A. Mazzino, and D. Vincenzi, “Magnetic field transport and kinematic dynamo
effect: a Lagrangian interpretation,” Proc. R. Soc. A 462 137–147 (2006).
[41] A.A. Migdal, “Loop equation in turbulence,” hep-th/9303130; “Turbulence as statis-
tics of vortex cells,” hep-th/9306152; “Loop equation and area law in turbulence,”
hep-th/9310088
[42] R. Abraham, J. E. Marsden, and T. Ratiu, Manifolds, Tensor Analysis, and Applications.
Applied Mathematical Sciences, vol. 75. (Springer-Verlag, Berlin, 1983).
[43] Y. Brenier, “Topics on hydrodynamics and volume preserving maps,” in: Handbook of
Mathematical Fluid Dynamics II. (North-Holland, Amsterdam, 2003), pp.55–86.
http://arxiv.org/abs/hep-th/9303130
http://arxiv.org/abs/hep-th/9306152
http://arxiv.org/abs/hep-th/9310088
ABSTRACT
  We study material lines and passive vectors in a model of turbulent flow at
infinite-Reynolds number, the Kraichnan-Kazantsev ensemble of velocities that
are white-noise in time and rough (Hoelder continuous) in space. It is argued
that the phenomenon of ``spontaneous stochasticity'' generalizes to material
lines and that conservation of circulations generalizes to a ``martingale
property'' of the stochastic process of lines.

<|endoftext|><|startoftext|>
Introduction
In this letter, we address the issue of gluon radiation during the hydrodynamic stage in the
evolution of the deconfined hot QCD matter or quark gluon plasma (QGP) [1] (for review
see for example [2]).
The medium induced gluon radiation has been thoroughly explored in the context of final
state partonic energy loss or “jet quenching” [3]. The spatially extended nuclear matter
affects the processes of fragmentation and hadronization of the hard partons produced in the
relativistic heavy ion collisions. Essentially all high p⊥ hadronic observables are affected at
collider energies and the degree of the medium modification can give a characterization of the
hot QCD matter in the deconfined phase. In principle, the medium induced radiation effect
emerges from thermal QCD per se. However, in practice, different approximation schemes
are applied giving consistent results [4, 5]. On the other hand, gluon radiation has also been
considered in the context of gluon density saturation in the initial stage, where a strongly
interacting gluonic atmosphere is crucial for the rapid local thermalization for the deconfined
QCD matter [6].
The time evolution of the RHIC “fireball” can influence the observable particle production
spectra. Given a strong initial interaction, the resulting state of matter is usually modeled as
a relativistic fluid undergoing a hydrodynamic flow. Generalized fluid mechanics that charac-
terizes the long-distance physics of the transport of color charges has been developed for this
purpose [7] (for review see [8]). Recently, we discovered a type of single skyrmion solutions in
color fluid [9]. Moreover, we found an interesting case in which the time-dependent skyrmion
expands in time, which is in accordance with the expanding nature of the fireball generated
in RHIC experiments [10]. The pattern of gluon radiation pertaining to the color current of
these non-static configurations is an important character of this color skyrmion. So in this
letter we calculate this radiation spectrum in a semiclassical approach. The main results from
our calculation are the following. There is a fast fall-off in the UV side of the spectrum but a
smooth peak dominates the intermediate energy. And in IR, a long tail is the characteristic
feature.
The organization of this paper is the following. In Sect. 2, after a brief review of the
nonabelian fluid mechanics, we calculate the nonabelian current corresponding to the soliton
solution. In Sect. 3, semiclassical gluon radiation is calculated. In Sect. 4, comparison of the
radiation spectrum in our hydrodynamic approach and in other approaches is carried out.
2 Color current of an expanding soliton
Given the thermalization of hot QCDmatter above the deconfinement transition temperature,
the transport of the color charges in the volume of the nuclear size can be modeled by a
nonlinear sigma model in a first-order formalism
L = jµωµ − F (n)− geffJaµAaµ. (1)
This nonlinear sigma model describes an ideal fluid system. The configuration of this fluid
is described by a group element field U , which shows up in the velocity field ωµ
ωµ = −
Tr(σ3U
†∂µU). (2)
Conjugate to the velocity is the abelian charge current jµ. It is easy to see that the first term
in the lagrangian density (1) gives rise to the canonical structure of the fluid system. The
fact that we will consider only one abelian charge current means that U takes value in an
SU(2) group. The information about the equation of state (EOS) of the fluid is contained in
the second term, which is essentially the free energy density of the fluid. In fact, energy and
pressure densities are given by the ideal fluid formula
ǫ = F, p = nF ′ − F. (3)
Here n is the invariant length of jµ, n2 = jµjµ. The third term is the gauge coupling of the
fluid with an external gluon field Aaµ with an effective coupling geff . J
aµ is the nonabelian
charge current which is related to the abelian current by the Eckart factorization Jaµ = Qajµ
where Qa is the nonabelian charge density of the fluid configuration
Tr(σ3U
†σaU). (4)
For SU(2) group, a = 1, 2, 3.
When the temperature is relatively high, we approximate the EOS by
ǫ = 3p (5)
which is known in relativistic fluid mechanics to describe radiation. As a result, the free
energy density can be obtained by integrating Eq. (3),
n4/3 (6)
where β is a dimensionless constant of integration. In this case, and without an external
gluon field, the fluid system in (1) possesses a class of expanding soliton solutions which can
be studied via variational and collective coordinate methods [10].
U = U
, R(t) ≈ R0(
+ 1)4/3θ(t) (7)
where R0 and τ are the spacial and temporal characterizations of the variational soliton and
θ(t) the usual step function in time direction. Physically, it is certainly very interesting to
understand the origin of these two scales from a fundamental level. The approximation in
(7) is valid provided τ ≪ R0. This condition enables us to define a small parameter
. (8)
For our purpose, we calculate the nonabelian current in (1) corresponding to the soliton
solution in Eq. (7). To do so, the hedgehog ansatz is specified for the solution (7)
U = cosφ+ iσ · x̂ sinφ (9)
where x̂ is the unit vector and φ is given by the stereographic map
sinφ =
1 + s2
, cosφ = ±1− s
1 + s2
. (10)
We write s as the dimensionless coordinate x/R(t). The sign in the expression of cosφ
signifies a topological charge which is the skyrmion number. The negative sign gives the
skyrmion number +1 or a skyrmion and the positive sign the skyrmion number is −1 or
an anti-skyrmion. We will take the positive sign in the following. By expressing the abelian
current jµ in terms of the velocity ωµ through the equation of motion, we derive the following
expression for the nonabelian current
d3xJaµ =
(1 + s2)6
· (ŝ23s2Ṙ2 − 1) ·
δa3 (1− 6s2 + s4) + 4ǫa3bŝbs(1− s2) + 8ŝ3ŝas2

−ŝ3s(1 + s2)Ṙ
2ŝ1ŝ3s
2 − 2ŝ2s
2ŝ2ŝ3s
2 + 2ŝ1s
2ŝ23s
2 − s2 + 1

.(11)
The current in (11) has a natural form of a multipole expansion due to the skyrmion orienta-
tion in the color space. In this letter we only consider the effect of the lowest mode and the
effects of higher polarization will be considered elsewhere. The spherically symmetric part in
the current is contained only in the third component
d3xJa3
= −δa3
)3 d3s
(1 + s2)6
P6(s) (12)
where P6(s) = 1− 7s2 + 7s4 − s6.
3 Semiclassical gluon radiation
Now we consider the interaction between the expanding color skyrmion and the hard partons.
Since the transfer momentum between hard partons is in high order to that between hard
parton and soliton, we expect a hierarchy between the partonic coupling gYM and the effective
coupling geff . Accordingly, gluon self-interaction in terms like F
aµν can be omitted so we
can work with a free parton picture. Then the gauge coupling in (1) becomes the coupling
between a classical current and a free quantum field for gluon. In this approximation, the
lowest order semiclassical amplitude is given by
iM = geff 〈1|
d4xJaµÂaµ|0〉. (13)
|0〉 and |1〉 are gluonic Fock vacuum and one-gluon state. The gluon factor in (13) is given
by the wave function
〈1|Âaµ(x)|0〉 = ϕaεµ
eik·x√
where the color and helicity parts ϕ, ε will be summed over eventually. Putting the current
in, we have
iM = A(k)
dteiωt
(1 + s2)6
e−iR(t)k·sP6(s) (15)
where A(k) = −(2/β)3geffϕ3ε3/
2ω. The spatial Fourier transformation can be completed
analytically
iM = B(k)
dteiωt−R(t)kQ4(R(t)k) (16)
where B(k) = π2A(k)/120 and Q4(x) = 5x
2 − 5x3 + x4. To go further, we need to specify
R(t) in this equation to the form given in (7). This gives
iM = B(k)e−iωτ η
dteiηt−t
4/3) (17)
where η = ωτ/(kR0)
3/4. With onshell condition ω = k, η = λκ1/4 where κ is defined to be
R0k. Accordingly,
)(geff
ϕ3ε3e
)( iM̃λ(κ)
where
iM̃λ(κ) =
dteiλκ
1/4t−t4/3Q4(t
4/3) (19)
The radiation spectrum is given by dE = kdN . E(k) is the total energy radiated over
the entire time of expansion as a function of k. The number distribution is
|M|2d3k (20)
where the summation is over colors and helicities of the gluon. In a spherically symmetric
setting, dN = ndk where n is the density of states
n = 4πk2
|M|2. (21)
By straightforward calculation,
n = αR0λ
2κ−1/2|M̃λ(κ)|2, (22)
= αλ2κ1/2|M̃λ(κ)|2. (23)
where α ≡ (2π5/225)(g2eff /β6). The numerical results for λ = 1/15, 2/15, 1/5 are given in
Fig. 1.
0.2 0.5 1 2 5 10
���������
0.2 0.5 1 2 5 10
����������
Figure 1: Density of states and energy spectrum for λ = 1/5 (Black), 2/15 (Deep Gray) and
1/15 (Light Gray).
4 Comparison and discussion
Understanding the pattern of gluon radiation in relativistic heavy ion collision processes
is important for making an accurate determination of the physical mechanisms from the
measurement of its decay products.
In [6], the authors extracted the asymptotic behavior of the number density in small k is
of the 1/k form. In our case, the asymptotic of the number density in small k is ∼ 1/
k. (See
Fig. 2.) The difference comes from the fact that the medium size is taken to be infinitely
large in [6] while in our case the medium size is characterized by the soliton size R0. So the
IR behavior in our case is softer.
For the case of jet quenching, the radiation energy lost is due to scattering off the hard
quarks. A popular approach is to model the medium as a collection of colored static scattering
0.02 0.05 0.1 0.2 0.5 1
0.115
0.125
0.135
�!!!!
�������������
Figure 2: n/(1/
κ) in small k for λ = .2
centers [11]. This approach can be extended to the expanding medium [4] though the gluon
radiation by the expanding medium itself is not included. In fact, the medium induced gluon
radiation is characterized by the frequency
q̂L2 (24)
where q̂ is the quenching parameter, estimated to be .04 ∼ .16GeV 2/fm, and L is the in-
medium path length of a hard parton [12]. In general ωC is significantly larger than the
characteristic momentum in our case 1/R0. So there is a hierarchy between the medium
induced gluon radiation spectrum and the gluon radiation spectrum by the medium.
Our hydrodynamical approach opens up another interesting possibility to address the
eccentricity of the elliptic flow either intrinsically by considering the nonabelian color current
or exogenously by considering the gluon radiation patterns. This will be the topic of the
follow-up to this work.
Acknowledgment. This work was supported by a CUNY Collaborative Research In-
centive grant. The author has greatly benefited from the mentoring by V. P. Nair.
References
[1] PHENIX Collaboration, K. Adcox, et al, Nucl. Phys. A757 (2005) 184-283, nucl-
ex/0410003; I. Arsene et al. BRAHMS collaboration, Nucl. Phys. A757 (2005) 1-27,
nucl-ex/0410020; B. B. Back et al (PHOBOS), Nucl. Phys. A757 (2005) 28-101, nucl-
ex/0410022; STAR Collaboration: J. Adams, et al, Nucl. Phys. A757 (2005) 102-183,
nucl-ex/0501009.
[2] Berndt Muller, James L. Nagle, nucl-th/0602029.
[3] Alexander Kovner, Urs A. Wiedemann, “Gluon Radiation and Parton Energy Loss”, in
Quark Gluon Plasma 3 Editors: R. C. Hwa and X. Wang World Scientific Singapore,
hep-ph/0304151.
[4] Carlos A. Salgado, Urs Achim Wiedemann, Phys. Rev. D68 (2003) 014008, hep-
ph/0302184.
[5] Urs A. Wiedemann, Nucl. Phys. B588 (2000) 303, hep-ph/0005129.
[6] Yuri V. Kovchegov, Dirk H. Rischke, Phys. Rev. C56 (1997) 1084, hep-ph/9704201.
[7] R. Jackiw, V.P. Nair, So-Young Pi, Phys. Rev. D62 (2000) 085018, hep-th/0004084;
B. Bistrovic, R. Jackiw, H. Li, V.P. Nair, S.-Y. Pi, Phys. Rev. D67 (2003) 025013,
hep-th/0210143.
[8] R. Jackiw, V.P. Nair, S.-Y. Pi, A.P. Polychronakos, J. Phys.A. Math. Gen. 37 (2004)
R327.
[9] Jian Dai, V.P. Nair, Phys. Rev. D74 (2006) 085014, hep-ph/0605090.
[10] Jian Dai, “Stability and Evolution of Color Skyrmions in the Quark-Gluon Plasma”,
hep-ph/0612260.
[11] M. Gyulassy, X. Wang, Nucl. Phys. B420 (1994) 583.
[12] Miklos Gyulassy, Ivan Vitev, Xin-Nian Wang, Ben-Wei Zhang, “Jet Quenching and
Radiative Energy Loss in Dense Nuclear Matter”, in Quark Gluon Plasma 3 Editors: R.
C. Hwa and X. Wang World Scientific Singapore, nucl-th/0302077.
ABSTRACT
  The density of states and energy spectrum of the gluon radiation are
calculated for the color current of an expanding hydrodynamic skyrmion in the
quark gluon plasma with a semiclassical method. Results are compared with those
in literatures.

<|endoftext|><|startoftext|>
Introduction
PKS 2155-304 (z=0.116, Falomo et al. 1991) is a prototype of high frequency peaked BL Lac objects. It has been observed in
the entire electromagnetic spectrum, from radio to TeV gamma-rays. It was the target of several multifrequency campaigns, the
main scope of which was to study the variability of the spectral energy distribution (SED), in order to constrain emission models.
In particular we refer to the 1991 and 1994 campaigns involving IUE, ROSAT, ASCA, EUVE and ground based telescopes
(see Edelson et al. 1995, Urry et al. 1997, and references therein). There were noticeable differences in source behaviour between
these two epochs. While in 1991 the multiwavelength variability was almost achromatic, and the X-ray variation led that in the
UV by a couple of hours, in 1994 the variability was more pronounced in X-rays than in UV-optical, with a lag of the latter
⋆ This paper is the corrected version astroph 0704.0265 published in A&A 469 503. It contains the material in the ”Errata Corrrige”, in press
in A&A.
http://arxiv.org/abs/0704.0265v2
2 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
Period of observation Nights of observation Number of photometric points Total exposure time
May 6 129 14520 s
September 8 159 18080 s
October 3 102 11590 s
November 21 1581 173540 s
December 6 64 7030 s
Table 1. Outline of observations accomplished in 2005.
by two days. The general pattern was that of a hardening of the spectrum with increasing intensity. More recently Zhang et al.
(2006b) studied a large set of data covering the period 2000-2005 obtained with the XMM-Newton satellite, which allowed a
direct comparison of the X-ray and UV-optical band, the latter deriving from the Optical Monitor on board the satellite. The
complexity of the variability pattern is confirmed. Some episodes of achromatic variation were detected, but a general tendency
of increasing variability amplitude with increasing frequency, and spectral hardening with increasing intensity was found.
Optical photometry has been performed by several groups in several occasions (see e.g. Miller et al. (1983), Smith et al.
(1992), Xie et al. (1996), Paltani et al. (1997), Pesce et al. (1997), Fan & Lin (2000), Tommasi et al. (2001) and references
therein). All this material is rather fragmented, consisting of few hours of observations during few nights. The difficulty of a
systematic observing campaign covering many nights is partly overcome by the possibility of observing using remotely guided
or robotic telescopes.
The REM telescope, originally designed for a prompt detection of gamma ray bursts (see Molinari et al. (2006)), is particularly
apt for photometric studies of BL Lacs (see also the previous results for PKS 0537-441 by Dolcini et al. 2005, and for 3C 454.3
by Fuhrmann et al. 2006) and, being located at La Silla (Chile), it is ideally fit to study PKS 2155-304.
We report on extensive and intensive photometric campaign performed in 2005 in the V, R, I, J, H, K bands. For the total
number of photometric points, for the time resolution (minutes) and spectral range this campaign seems to supersede all the
IR-optical photometric material presented thus far.
2. REM, Photometric procedure, data analysis
2.1. REM
The Rapid Eye Mount (REM) Telescope is a 60 cm fully robotic instrument. It has two cameras fed at the same time by a dichroic
filter that allows the telescope to observe in the NIR (z’, J, H, K) as well as optical (I, R, V). Further information on the REM
project may be found in Zerbi et al. (2001), Chincarini et al. (2003) and Covino et al. (2004).
2.2. Observations and data analysis
REM observed the PKS 2155-304 field during May, September, October, November and December 2005 in VRIH bands. Only
during three nights in September the telescope observed also in J and K filters. To allow intranight and short time-scale variability
monitoring, very intensive observations (2-3 h, quasi-continuously) were made during five of the nights in November. An outline
of the observations is reported in Table 1, while the complete log is only available in Table A.1 (see Appendix A): we report for
each photometric point the band, the epoch, the integration time, the intensity and its uncertainty. Typical integration times are
≤100 s and statistical uncertainties are always ≤ 10% and ≤ 3% in the highest state (November 2005, see following).
Reduction of the REM NIR and optical frames followed standard procedures. Photometric analysis of the frames was done
using the GAIA1 and DAOPHOT packages (Stetson 1986). Relative calibration was obtained by calculating magnitude shifts
relative to three bright isolated stars in the field, indicated by A, B, C in Fig. 1 (image taken from ESO Digitized Sky Survey2).
The NIR frames were calibrated using the magnitudes of the A, B and C stars as reported in the 2MASS catalogue3. For the
optical, we exposed on 2006 June 29 the standard field G156-31 (Landolt, 1992), and immediately after this the PKS 2155-304
field. We calculated the zero points which were then used to calibrate all of our data. The observed magnitudes in the REM filters
for the reference objects A, B, and C are reported in Table 2. We have monitored the relative intensities of the A, B, C reference
stars during the entire observation period, and we have detected no indication of variability within 0.1 mag (error on the average
≤0.01 mag).
1 http://star-www.dur.ac.uk/ pdraper/gaia/gaia.html
2 http://archive.eso.org/dss/dss
3 http://irsa.ipac.caltech.edu
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 3
Figure 1. PKS2155-304 field (DSS-1 survey). Letters indicate stars used for calibration.
A B C
RA 21:58:46.505 21:58:43.807 21:58:42.337
DEC -30:17:51.29 -30:17:15.71 -30:10:27.41
K 11.171±0.024 12.475±0.030 12.648±0.024
H 11.182±0.027 12.556±0.026 12.769±0.027
J 11.510±0.027 12.838±0.026 13.091±0.029
I 12.184±0.005 13.421±0.009 13.216±0.006
R 12.981±0.004 13.434±0.006 13.671±0.010
V 13.179±0.005 13.822±0.009 13.899±0.013
Table 2. Coordinates, IR and optical magnitudes for the reference stars.
Note that we found significant deviations from the optical calibrations provided by the finding charts for AGN of the
Heidelberg University4 (Hamuy & Maza, 1989). In particular the star C is also used as a calibrator by these authors and our
optical zeropoint differs by about 0.3 mag from theirs.
Relative and absolute calibration errors have been added in quadrature to the photometric error derived from the procedure.
3. Results
3.1. Long term variability
In this section we report the results of the long term photometric analysis. The light curves in the H, R, I, V filters are given in
Fig. 2.
The intensity is normalized with respect to the average over the entire observation period. These averages are given in Table
3. It is immediately apparent that the total variability range is very different in the various filters, being a factor ≈ 4 in H and a
factor ≈ 2 in V (see Table 3) . The shapes of the light curves are similar in the various filters. A flare-like structure is apparent
in all filters at t ≈ 680 (first days of November). The ratio between the V- and H-band fluxes, designated as V/H, is reported in
Fig. 3. In order not to introduce spurious effects due to small time scale variability, the V/H ratio has been computed for pairs of
V and H measurements spaced apart in time by no more than 10 minutes.
It seems that there are two main colour states: the source softens rather abruptly, in response to the November flare. On
the basis of the light curve and the colour curve we divide the observations in three epochs: 1 500-525, 2 640-660, 3 670-725,
expressed in MJD5.
4 http://www.lsw.uni-heidelberg.de/projects/extragalactic/charts/2155-304.html
5 For the Modified Julian Date we use the convention MJD=JD-2,453,000.5
4 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
500 550 600 650 700
500 550 600 650 700
500 550 600 650 700
H filter
I filter
R filter
V filter
500 550 600 650 700
Figure 2. Normalized light curves of PKS 2155-304. Flux is reported in arbitrary unit (a. u.). In each boxes a typical error bar is
plotted.
Filter H I R V
Average 114.9±3.3 34.45±6.5 30.89±5.13 30.70±5.05
Max value 156.5 46.4 38.3 37.4
Min value 36.5 19.1 16.2 16.2
Average ep.1 39.3±1.4 21.4±1.5 18.7±1.3 18.1±0.7
Average ep.2 65.9±5.2 28.4±3.3 27.2±2.5 20.3±3.4
Average ep.3 122.9±6.1 38.8±1.9 34.1±1.5 33.5±1.7
Table 3. Average intensities for all epochs and all filters. All data are in mJy units. Epoch 1 corresponds to May 2005 observa-
tions, epoch 2 to September-October 2005 observations and epoch 3 to November-December 2005 observations.
3.2. Short time-scale variability
We report in Fig. 4 the light curves for five nights in November 2005, when the observations were more intensive. All the nights
belong to epoch 3, corresponding to the high state of the source.
The mean intensity and the 1-sigma values for each night are given in Table 4.
A χ2 analysis indicates that in each night the significance of variability is very high, but for the nights of Nov 4 and Nov 18
for the H band and Nov 19 for the V band. In the box of Nov 4 - V band we also report the photometry of a comparison star
which illustrates directly the significance of the source variability. Though the shapes of intensity curves are different (see Fig.
4), there is a rather regular colour-intensity dependence (see Fig. 5) indicating harder states for higher intensities.
We adopt the usual definition of time scale variability τ = 11+z
d f/dt . Following Montagni et al. (2006), a variability time scale
is taken as reliable if the light curve can be approximated with a linear dependence, and it contains at least 10 points. In particular
this gives a time scale of ≈ 24 h for the November 4 night (Fig. 4, V band - Nov 4 box). The simultaneous H light curve does
not show any regular variability. We note that on November 8 in the H curve there is a flare-like event. If one connects 4 points
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 5
Time (MJD)
500 525 550 575 600 625 650 675 700 725
Figure 3. V/H flux ratio evolution during 2005. Error bars are comparable with symbol size.
Figure 4. Light curves in the H and V filters for five nigths in November 2005, when the observations were more intensive. Dates
of observations are reported in each box. The solid line in V band - 4 Nov box results from a linear regression analysis. The solid
line in H band - 8 Nov box connects the four points of the flare-like structure. In each box it is given a typical error bar. In V
band - 4 Nov box the light curve of one comparison star is also plotted, with a fixed enhancement of 9 mJy.
6 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
Figure 5. V/H flux ratio versus intensity for the five more intensively observed nights of epoch 3. In each box a typical error bar
is plotted.
Night 4/11 8/11 18/11 19/11 20/11
Average H 119.3±1.7 119.3±3.0 120.4±2.1 124.5±2.8 130.8±3.0
Average I 39.1±0.6 36.4±0.5 38.7±0.6 38.7±0.6 38.3±0.7
Average R 38.5±0.8 36.40±.8 37.3±0.5 38.0±1.6 37.1±0.5
Average V 33.2±1.1 33.0±1.1 33.4±1.1 33.5±1.1 34.7±0.1
Table 4. Average intensities and 1-sigma values for all filters for all five nights with more intensive observations in November
2005. All values are in mJy units.
as suggested in Fig. 4 H band - Nov 8 box, the time scale variability is as short as 1.5 h. Unfortunately the V light curve is too
sparse to confirm the presence of the flare also in this band.
3.3. The NIR-Optical spectral energy distribution
We had six filter coverage (K,H,J,I,R,V) during three nights of Sept. 2005 (epoch 2) and representative SEDs for these nights are
reported in Fig. 6.
The delays between exposures in the different filters are less than 20 minutes. Reddening corrections are less than 6% in V
and have been neglected. A fit with a single power law yields α ≈0.9 and it is clearly not good. The main deviation derives from
the J filter, exceeding substantially our photometric precision of about 10%. An improvement in the fit is obtained by using a
broken power law with spectral indices α ≈0.4 for the IR data and α ≈0.9 for the optical data.
For comparison, we report in Fig. 7 the SED of June 29, 2006, exposure used for calibration purpose: its profile is rather
similar to that of Sept. 2005.
At the other epochs the SED consists of 4 points (H, I, R, V), and in Figs. 8 and 10 we give representative examples of SEDs
acquired on epoch 1 and 3. The time differences between observations at various filters are less than 20 minutes.
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 7
Frequency (Hz)
19/9 data
21/9 data
26/9 data
Average data
Figure 6. September 2005 spectra for observations including the K and J filters. The spectral fit on average data with a single
power law yields a spectral index α=0.91±0.07.
Frequency (Hz)
Figure 7. 29 June 2006 spectrum. The spectral fit with a single power law yields a spectral index α=0.90±0.16.
In Fig. 8, which refers to a low state, we report also the estimated contribution of the host galaxy, which was calculated
adopting the H magnitude of the galaxy measured by Kotilainen et al. (1998) and the Mannucci at al. (2001) template spectrum
for giant ellipticals. It is apparent that the contribution of the galaxy never exceeds 20% of the BL Lac signal. At the other epochs
the contribution from galaxy is negligible and it is not relevant for explaining the excess in J with respect a single power law
noted above. The epoch 2 photometry (Fig. 9) is compared with spectrophotometry obtained with the ESO 3.6m telescope by R.
Falomo6 on July 25, 2001 (Sbarufatti et al. (2006)). The source was found in a similar, but somewhat lower brightness state and
some deviations from a power law are apparent. The HRIV points at epoch 3 (Fig. 10) are roughly fitted by a single power law
of α ≈1.3. In any case the comparison of the SEDs at the three epochs clearly indicate a softening with increasing intensity.
6 spectrum available at the ZBLLAC online library, http://www.oapd.inaf.it/zbllac
http://www.oapd.inaf.it/zbllac
8 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
Figure 8. 13 May 2005 spectrum - epoch 1. We report also the spectrum of the host galaxy (see text). The spectral fit with a
single power law yields a spectral index α=0.77±0.16.
Figure 9. 19 September 2005 spectrum - epoch 2. For comparison we report the ESO 3.6m telescope spectrophotometry which
correspond to a slightly lower state of the source. The spectral fit with a single power law yields a spectral index α=0.88±0.05.
4. Discussion
A collection of near-IR/optical SEDs of PKS2155-304 obtained by various authors at different epochs is presented in Fig. 11 and
in Table 5. Our data encompass all those reported in the literature.
In the historical observations of PKS2155-304 the delays between exposures at different filters are typically of the order of
hours, instead of about 10 minutes as in our data set. Comparing literature data with our data it is apparent that the maximum
we observed on 20 November 2005 in the H filter light curve is the highest state ever reported in this band. Note that the V state
was comparable with states reported in the literature, likely because the coverage of the source in the optical band is less sparse
than that in the NIR. A most noticeable result of our photometry is the discovery of long term H-band variability, the amplitude
of which is much larger than that in the optical.
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 9
Frequency (Hz)
Figure 10. 4 November 2005 spectrum - epoch 3. The spectral fit with a single power law yields a spectral index α=1.32±0.25.
Error bars are comparable with symbols size.
Data set α V (mJy)
This work (13/5/2005) 0.77±0.16 16.485±0.263
This work (19/9/2005) 0.88±0.05 24.370±0.238
This work (4/11/2005) 1.32±0.24 35.278±0.498
Bertone at al. (2000) 0.42±0.26 26.20±0.58
Pesce et al. (1997) 0.62±0.30 24.50±0.67
Zhang & Xie (1996) 0.62±0.16 22.90±0.63
Bersanelli et al. (1992) 0.61±0.38 51.88±1.56 (J band)
Treves et al. (1989) (1/12/1983) 0.51±0.31 19.80±0.36
Treves et al. (1989) (11/11/1984) 0.51±0.41 26.20±0.48
Miller & McAlister (1983) 0.62±0.56 17.8
Table 5. Spectral index values and V values for all spectra plotted in Fig. 11. α vs V plot is reported in Fig. 12.
In Fig. 12 we plot the spectral index vs the V magnitude, as reported in table 5. There is no apparent correlation. It is
noticeable however that the highest state in all bands (our observation of Nov 2005) corresponds to a rather soft spectral shape.
This contrasts with the usual source behaviour of hardening with increasing intensity, as found in the UV-X-ray band (see
Introduction). It contrasts also with the short time scale variability, as reported in section 3.2.
There is a general consensus that the blazar SED can be explained by the superposition of a synchrotron component, and
an inverse Compton one due either to scattering off the synchrotron photons (synchrotron-self Compton, SSC), or to external
photons like those of the broad line region or of a thermal disk (e.g. Tavecchio et al. 1998, Katarzynski et al. 2005). This results
in a typical two-maxima shape of the blazar SED. In Fig. 13 we report examples of the SED modeling proposed for PKS 2155-
304, on the basis of data taken in 1997. The models are detailed in Chiappetti et al. (1999). The object is a typical HBL, with the
synchrotron peak in the soft X-rays.
A well known critical point of this model, is that the source size is essentially constrained by variability, and variability itself
requires that the SED is constructed using simultaneous observations in all bands. A further step of the modelling consists in
identifying the physical origin of the relativistic jet and of its variability, see e.g. Katarzynski & Ghisellini (2006). With this
premise it is obvious that the optical-IR photometric study, non simultaneous with that in other regions of the SED, has only a
limited relevance in clarifying the overall picture. However we would like to make some remarks. If the SSC models reported in
Fig. 13 truly represent the behaviour of the SED in 1997, as suggested by the good match with the X-ray and TeV energy data,
and if our 2005 optical-IR spectra are also due to the SSC mechanism, then the latter represent a different condition in the jet
and point to different critical parameters within the SSC scenario. While the IR-optical spectrum in May 2005 (triangles) has
the same shape as predicted in 1997, but different normalization, the November 2005 IR-optical spectrum is different in both
10 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
This work (13/5/2005)
This work (19/9/2005)
This work (4/11/2005)
Bertone et al. 2000 (24/5/1996)
Pesce et al. 1997 (19/5/1994)
Zhang and Xie 1996 (5/7/1994)
Bersanelli et al. 1992 (17/1/1987)
Treves et al. 1989 (1/12/1983)
Treves et al. 1989 (11/11/1984)
Miller and McAlister 1983 (19/11/1981)
Frequency (Hz)
Figure 11. Different spectra of PKS2155-304 from observations at other epochs reported in the literature. Symbols correspond
to following works: filled circles: this work (13/5/2005 data), filled squares: this work (19/9/2005 data), filled up triangles: this
work (4/11/2005 data), open diamonds: Bertone et al. (2000; 24/5/1996 data), open circles: Pesce et al. (1997; 19/5/1994 data,
the Hamuy & Maza (1989) calibration is used), open up triangles: Zhang and Xie (1996; 5/7/1994 data), open squares: Bersanelli
et al. (1992; 17/1/1987 data), open crosses: Treves et al. (1989; 1/12/1983 data), open stars: Treves et al. (1989; 11/11/1984 data),
asterisks: Miller and McAlister (1983; 19/11/1981 data). Spectral index values and V magnitudes for all data sets are reported in
Table 5.
shape and normalization. The May 2005 observation suggests that the synchrotron peak may be located at a frequency similar to
the one observed in 1997 (approximately between extreme UV and soft X-rays), the total energy being somewhat higher (about
a factor 2, see Figure 13) than observed in 1997. The slope of the November 2005 spectrum suggests instead a much lower
synchrotron peak energy, around the IR-optical domain or even redward, i.e. about 2-3 orders of magnitude lower than observed
in 1997 and inferred in May 2005. While a variation of the synchrotron peak energy of this amplitude and on this time scale
(the September 2005 slope is intermediate between those of May and November 2005, suggesting a monotonic change) it is not
unprecedented in blazars (Mkn501 exhibited a similar variation in a much more rapid time scale, Pian et al. 1998), this would be
the first observation of this kind in PKS 2155-304. Therefore, our interpretation is only tentative, although supported by the large
observed IR variability.
Alternatively, in order to explain the optical-IR flux excess we observe in 2005 with respect to the SSC prediction based on
the earlier multiwavelength data (Fig. 13), one could invoke a thermal component, possibly from hot dust associated with the
“dusty torus” surrounding the central region of the active nucleus, as suggested in the cases of other blazars with excess in the
optical-infrared band (De Diego et al. 1997, for blazar 3C 66A; Pian et al. 1999 for 3C 279; Pian et al. 2002, 2006, for blazar
PKS 0537-441). However, this seems somewhat less likely, because high emission states, as observed by us, are expected to be
dominated by non-thermal beamed relativistic radiation.
The continuation of this and other similar optical-IR studies, which have been proven to be promising but do not provide
enough information for a physical interpretation of the data, requires that the observations are extended to other wavelengths.
Simultaneous observations over a large wavelength range is the only tool to provide the necessary information for a physical
interpretationof the observed variability of blazars. REM monitorings of the kind reported here could be an effective trigger to
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 11
V (mJy)
15 17.5 20 22.5 25 27.5 30 32.5 35
Figure 12. α vs V plot for data reported in Fig. 11. Symbols are the same as used in Fig. 11.
X-ray satellites, and programs along these lines are foreseen with SWIFT. Cross correlation procedures, which up to now have
been limited mainly to the X-ray band (Zhang et al. 2005, 2006a, 2006b, Sembay et al. 2002, Edelson et al. 1995), would be
extended to a much larger portion of the SED.
Appendix A: Table of observations
Filter Epoch (MJD) Integr time s Intensity mJy Sigma
K 633.4872 120 73.993 3.640
K 633.4966 120 73.345 3.640
K 633.5012 120 73.382 2.938
K 633.5094 120 76.417 1.043
K 633.5254 120 72.309 2.698
K 635.4835 120 70.598 2.579
K 635.6612 120 66.373 2.810
K 635.6670 120 67.794 2.548
K 635.6736 120 67.857 2.096
K 635.6794 120 69.182 2.758
K 635.6832 120 66.987 2.263
K 639.8339 120 67.982 0.689
K 639.8355 120 67.345 0.702
H 503.7708 120 38.943 3.048
H 503.7723 120 39.412 2.977
H 503.7741 120 38.907 2.903
H 503.8028 120 40.628 3.180
H 503.8043 120 43.135 7.497
H 507.7738 120 38.373 0.314
H 507.7753 120 38.302 0.314
H 507.7839 120 38.692 0.317
H 507.7854 120 38.586 0.316
12 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
H 507.7873 120 38.835 0.601
H 507.7881 120 38.871 0.318
H 508.7487 120 37.881 1.999
H 508.7501 120 37.224 2.168
H 508.7516 120 37.771 2.127
H 508.7531 120 38.444 1.924
H 508.7549 120 37.847 1.146
H 508.7565 120 37.396 1.786
H 510.6465 120 39.339 2.005
H 510.7174 120 39.850 1.559
H 510.7192 120 39.557 1.584
H 510.7208 120 39.594 1.513
H 510.7229 120 40.274 1.466
H 510.7244 120 40.071 1.240
H 510.7379 120 39.484 1.258
H 510.7394 120 39.740 1.555
H 510.7413 120 39.960 1.382
H 510.7427 120 39.412 1.399
H 510.7448 120 40.219 2.635
H 510.7463 120 40.256 1.246
H 514.7073 120 38.799 1.271
H 514.7089 120 38.056 1.247
H 514.7107 120 38.515 1.297
H 514.7122 120 38.267 1.741
H 514.7143 120 37.881 1.379
H 514.7156 120 38.887 1.415
H 514.7068 120 37.881 2.448
H 514.7710 120 37.812 1.342
H 514.7734 120 37.259 1.288
H 514.7749 120 36.478 1.162
H 515.6989 120 40.929 2.272
H 515.7003 120 40.108 2.117
H 515.7022 120 40.966 2.833
H 515.7037 120 41.155 2.210
H 515.7085 120 40.703 3.926
H 515.7086 120 41.422 2.149
H 515.7091 120 41.042 2.316
H 515.7106 120 41.117 2.320
H 515.7124 120 41.498 2.266
H 515.7139 120 40.816 2.117
H 635.3400 120 61.436 2.148
H 635.3489 120 63.801 0.405
H 635.3552 120 63.566 0.794
H 635.3648 120 62.349 0.855
H 637.5426 120 57.494 0.952
H 637.5481 120 57.334 1.721
H 637.5502 120 57.230 1.591
H 637.5619 120 55.928 1.162
H 637.5779 120 55.671 1.427
H 642.6039 120 59.597 1.427
H 642.6054 120 59.597 1.394
H 643.4546 120 61.267 0.902
H 643.4561 120 61.948 0.509
H 643.4581 120 62.119 1.031
H 643.4596 120 62.925 1.627
H 643.4607 120 61.663 0.741
H 643.4631 120 62.607 0.992
H 645.5581 120 64.095 1.014
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 13
H 645.5622 120 65.528 1.049
H 645.5641 120 64.036 1.104
H 645.5655 120 63.860 2.105
H 646.4435 120 60.873 2.048
H 646.4464 120 60.817 2.036
H 646.4490 120 60.482 2.044
H 646.4510 120 60.705 3.685
H 646.4521 120 62.291 2.194
H 646.4536 120 61.834 3.389
H 650.5509 120 64.213 2.708
H 650.5524 120 66.134 2.500
H 650.5537 120 64.036 3.012
H 650.5552 120 62.810 5.245
H 650.5571 120 65.047 3.372
H 650.5586 120 64.629 2.247
H 650.5606 120 63.449 2.176
H 650.5621 120 65.770 2.425
H 650.5641 120 66.256 2.454
H 650.5709 120 64.808 4.160
H 650.5743 120 64.692 2.418
H 655.3672 120 73.253 2.759
H 655.3688 120 72.048 1.800
H 655.3708 120 71.322 1.508
H 655.3716 120 71.916 1.493
H 655.3741 120 71.718 1.505
H 655.3756 120 73.312 1.762
H 655.3772 120 71.454 1.996
H 655.3787 120 71.257 1.886
H 655.5449 120 71.652 1.500
H 655.5464 120 71.652 1.695
H 655.5481 120 70.474 1.475
H 655.5496 120 70.799 1.482
H 655.5515 120 71.191 1.490
H 655.5530 120 71.718 1.501
H 655.5548 120 72.115 1.641
H 655.5563 120 71.718 1.566
H 655.5581 120 72.381 2.305
H 655.5596 120 72.448 1.516
H 655.5615 120 71.536 1.562
H 655.5630 120 71.060 1.552
H 655.5648 120 71.652 1.565
H 655.5663 120 71.454 1.561
H 655.5681 120 71.119 1.620
H 655.5696 120 70.799 2.577
H 665.3660 120 118.912 3.138
H 678.3451 120 121.458 1.503
H 678.3466 120 116.957 1.524
H 678.3480 120 117.173 1.628
H 678.3501 120 121.458 1.687
H 678.3516 120 120.455 1.491
H 678.3531 120 121.570 1.689
H 678.3551 120 117.497 1.821
H 678.3566 120 120.123 3.080
H 678.3581 120 119.902 1.562
H 678.3604 120 121.123 2.447
H 678.3619 120 117.822 1.971
H 678.3634 120 119.131 1.601
H 678.3656 120 120.012 1.932
14 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
H 678.3671 120 119.571 1.661
H 678.3685 120 119.022 2.584
H 678.3707 120 119.681 1.927
H 678.3723 120 117.065 1.885
H 678.3738 120 117.931 1.460
H 678.3759 120 118.366 1.500
H 678.3774 120 118.584 1.593
H 678.3788 120 118.366 1.767
H 678.3811 120 118.803 1.506
H 678.3826 120 118.803 1.470
H 678.3840 120 118.803 1.470
H 678.3862 120 117.822 1.493
H 678.3877 120 118.693 2.064
H 678.3892 120 117.931 1.760
H 678.3911 120 117.931 1.638
H 678.3926 120 119.022 1.653
H 678.3942 120 119.902 1.790
H 678.3963 120 118.475 1.466
H 678.3978 120 118.039 1.461
H 678.3993 120 118.257 1.765
H 678.4014 120 118.039 1.461
H 678.4028 120 117.605 1.968
H 678.4043 120 118.912 1.472
H 678.4064 120 120.123 1.728
H 678.4078 120 121.123 1.499
H 678.4093 120 119.792 2.164
H 678.4114 120 118.584 2.225
H 678.4128 120 117.281 1.528
H 678.4143 120 117.281 2.457
H 678.4164 120 117.497 1.754
H 678.4178 120 115.140 1.500
H 678.4193 120 119.022 1.776
H 678.4214 120 117.822 1.535
H 678.4218 120 118.257 1.643
H 678.4244 120 115.352 1.503
H 678.4265 120 118.912 1.711
H 678.4280 120 116.098 1.437
H 678.4294 120 120.900 1.496
H 678.4346 120 119.131 1.601
H 678.4362 120 120.677 2.019
H 678.4376 120 120.677 1.494
H 678.4398 120 120.789 1.495
H 678.4412 120 120.566 3.387
H 678.4427 120 121.123 2.359
H 678.4447 120 122.019 1.891
H 678.4462 120 120.789 1.495
H 678.4476 120 120.344 2.092
H 678.4497 120 120.900 2.811
H 678.4512 120 121.794 2.372
H 678.4526 120 121.794 2.372
H 678.4546 120 120.012 2.605
H 678.4562 120 120.566 3.189
H 678.4576 120 121.906 1.693
H 678.4597 120 121.458 1.882
H 678.4612 120 122.582 2.214
H 678.4626 120 121.794 1.887
H 682.3480 120 118.693 1.260
H 682.3495 120 117.497 1.152
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 15
H 682.3510 120 116.634 3.660
H 682.3532 120 119.681 1.270
H 682.3547 120 119.681 1.270
H 682.3562 120 119.131 3.121
H 682.3583 120 119.351 1.170
H 682.3597 120 117.822 1.155
H 682.3618 120 118.693 1.163
H 682.3633 120 118.257 1.851
H 682.3647 120 118.257 1.159
H 682.3669 120 118.366 2.313
H 682.3684 120 117.173 1.192
H 682.3700 120 116.420 1.575
H 682.3721 120 116.420 1.184
H 682.3735 120 118.803 1.689
H 682.3771 120 116.527 3.454
H 682.3771 120 117.389 1.246
H 682.3785 120 117.389 1.246
H 682.3801 120 117.497 1.928
H 682.3821 120 119.681 1.329
H 682.3836 120 120.900 1.556
H 682.3735 120 120.677 1.183
H 682.3872 120 118.148 1.254
H 682.3887 120 119.902 1.175
H 682.3903 120 117.713 1.757
H 682.3917 120 118.693 1.260
H 682.3938 120 117.065 1.747
H 682.3953 120 119.131 2.328
H 682.3968 120 118.148 2.121
H 682.3989 120 116.849 2.006
H 682.4005 120 116.634 2.662
H 682.4019 120 119.131 1.693
H 682.4041 120 121.011 1.284
H 682.4055 120 118.257 2.216
H 682.4070 120 119.351 3.641
H 682.4156 120 120.234 1.547
H 682.4171 120 122.695 2.299
H 682.4186 120 125.900 3.406
H 682.4222 120 132.198 2.373
H 682.4257 120 119.571 3.235
H 682.4271 120 118.693 5.296
H 682.4305 120 120.677 4.421
H 682.4324 120 119.571 3.235
H 682.4338 120 121.011 1.284
H 682.4359 120 117.931 1.310
H 682.4374 120 118.803 1.860
H 682.4388 120 121.682 1.418
H 682.4398 120 123.488 5.620
H 682.4412 120 123.148 1.666
H 682.4427 120 120.012 4.503
H 682.4449 120 119.571 1.464
H 682.4464 120 113.143 1.319
H 682.4479 120 114.189 2.702
H 682.4500 120 120.344 1.549
H 682.4506 120 122.582 1.830
H 682.4543 120 118.039 2.596
H 682.4558 120 126.714 1.551
H 682.4573 120 123.375 7.051
H 683.3333 120 114.611 0.668
16 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
H 683.3348 120 112.312 1.052
H 683.3363 120 114.717 0.985
H 683.3385 120 112.002 0.653
H 683.3401 120 112.105 0.653
H 683.3415 120 113.560 0.662
H 683.3435 120 113.770 1.442
H 683.3451 120 113.665 0.662
H 683.3466 120 115.885 1.568
H 683.3486 120 113.770 0.809
H 683.3501 120 114.505 0.737
H 683.3516 120 113.665 0.808
H 683.3535 120 113.143 0.804
H 683.3551 120 117.605 0.685
H 683.3566 120 114.400 1.258
H 683.3585 120 115.140 0.671
H 683.3600 120 116.849 0.681
H 683.3616 120 117.065 1.005
H 683.3636 120 112.519 0.656
H 683.3651 120 115.246 0.819
H 683.3666 120 112.830 0.802
H 683.3687 120 115.034 0.818
H 683.3702 120 114.928 0.740
H 683.3717 120 113.979 1.739
H 683.3739 120 114.190 0.812
H 683.3754 120 112.934 0.727
H 683.3769 120 115.565 2.991
H 683.3795 120 112.519 1.054
H 683.3810 120 118.584 0.691
H 683.3826 120 114.400 0.736
H 683.3845 120 112.416 2.109
H 683.3860 120 115.459 0.991
H 683.3875 120 114.084 0.811
H 683.3889 120 115.034 0.740
H 683.3904 120 113.665 0.890
H 683.3920 120 116.527 0.750
H 684.3339 120 120.234 1.488
H 684.3353 120 119.681 1.662
H 684.3368 120 121.123 1.743
H 684.3389 120 119.571 1.558
H 684.3403 120 119.792 1.561
H 684.3441 120 119.681 1.662
H 684.3456 120 122.695 2.570
H 684.3470 120 123.148 1.983
H 684.3484 120 123.148 1.983
H 684.3505 120 119.792 1.929
H 684.3519 120 121.682 1.585
H 684.3541 120 120.900 1.624
H 684.3556 120 122.356 1.700
H 684.3571 120 122.356 1.514
H 684.3591 120 120.123 1.487
H 684.3606 120 120.900 1.532
H 684.3621 120 121.682 2.641
H 684.3644 120 120.789 1.945
H 684.3660 120 121.011 1.577
H 684.3674 120 121.682 3.418
H 684.3696 120 116.205 1.514
H 684.3711 120 122.244 1.698
H 684.3726 120 122.131 2.291
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 17
H 684.3748 120 122.356 1.514
H 684.3762 120 120.234 3.279
H 684.3777 120 123.034 1.653
H 684.3798 120 122.808 2.055
H 684.3813 120 123.034 1.523
H 684.3828 120 124.059 1.536
H 684.3848 120 122.808 1.767
H 684.3863 120 121.906 2.646
H 684.3878 120 124.516 2.896
H 684.3898 120 122.582 1.517
H 684.3912 120 122.019 1.639
H 684.3927 120 122.469 1.596
H 684.3948 120 124.861 1.627
H 684.3962 120 123.602 1.778
H 684.3977 120 121.123 1.808
H 685.3274 120 125.091 1.945
H 685.3289 120 121.346 1.851
H 685.3303 120 127.769 2.978
H 685.3324 120 120.123 1.803
H 685.3409 120 119.792 3.144
H 685.3424 120 116.957 2.644
H 685.3439 120 117.389 2.202
H 685.3459 120 113.770 2.201
H 685.3474 120 117.389 2.202
H 685.3489 120 114.084 2.500
H 685.3510 120 114.506 2.358
H 685.3525 120 118.803 1.888
H 685.3540 120 115.352 2.038
H 685.3561 120 115.992 1.741
H 685.3575 120 116.313 1.746
H 685.3590 120 114.295 1.715
H 685.3610 120 115.459 1.795
H 685.3625 120 117.605 2.206
H 685.3639 120 118.257 3.015
H 685.3660 120 116.527 1.778
H 685.3676 120 119.022 2.451
H 685.3690 120 115.885 2.386
H 685.3711 120 116.420 2.397
H 685.3726 120 115.671 1.838
H 685.3740 120 115.992 1.741
H 685.3761 120 116.527 1.947
H 685.3776 120 119.461 1.793
H 685.3790 120 115.246 1.979
H 685.3810 120 117.822 1.768
H 685.3825 120 114.928 2.223
H 685.3840 120 117.497 1.763
H 689.3300 120 121.682 1.856
H 689.3315 120 120.900 1.844
H 689.3330 120 119.792 1.827
H 689.3351 120 120.344 1.913
H 689.3365 120 116.205 1.995
H 689.3380 120 119.792 1.827
H 689.3401 120 118.039 1.972
H 689.3423 120 120.455 2.723
H 689.3438 120 119.461 1.858
H 689.3458 120 121.794 2.035
H 689.3473 120 119.902 2.003
H 689.3487 120 121.011 2.138
18 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
H 689.3508 120 119.681 1.861
H 689.3523 120 117.822 1.797
H 689.3537 120 117.931 1.799
H 689.3557 120 123.034 2.055
H 689.3572 120 118.912 4.965
H 689.3586 120 118.584 2.095
H 690.3843 120 135.402 1.676
H 690.3857 120 130.866 8.745
H 690.3892 120 156.468 6.088
H 690.3907 120 133.791 6.165
H 690.3922 120 132.442 1.543
H 690.3943 120 132.809 1.209
H 690.3958 120 131.712 1.072
H 690.3972 120 131.470 1.196
H 690.3994 120 134.408 1.094
H 690.4009 120 136.278 1.109
H 690.4023 120 133.668 3.077
H 690.4044 120 133.545 1.146
H 690.4059 120 136.530 1.111
H 690.4073 120 136.153 1.891
H 690.4094 120 136.781 1.499
H 690.4109 120 132.686 1.207
H 690.4124 120 133.422 1.145
H 690.4144 120 134.532 1.224
H 690.4158 120 135.278 1.393
H 690.4172 120 134.161 1.221
H 690.4193 120 135.902 2.436
H 690.4208 120 134.904 1.158
H 690.4222 120 137.666 2.020
H 690.4244 120 138.047 2.360
H 690.4259 120 133.791 1.217
H 690.4274 120 137.160 1.905
H 690.4297 120 136.404 1.171
H 690.4309 120 135.153 0.992
H 690.4324 120 138.812 1.521
H 691.3710 120 131.955 1.019
H 691.3724 120 123.148 2.555
H 691.3739 120 133.668 2.891
H 691.3760 120 135.777 2.111
H 691.3774 120 150.531 4.325
H 691.3789 120 134.285 1.639
H 691.3812 120 133.176 2.647
H 691.3827 120 135.527 1.334
H 691.3842 120 132.809 2.524
H 691.3864 120 142.176 1.193
H 691.3878 120 137.033 1.349
H 691.3893 120 134.408 1.751
H 691.3913 120 140.355 5.166
H 691.3928 120 135.527 1.766
H 691.3943 120 141.914 1.849
H 691.3964 120 135.153 1.761
H 691.3977 120 136.781 0.972
H 691.3999 120 126.948 0.980
H 692.3336 120 119.022 2.366
H 692.3351 120 118.366 2.780
H 692.3366 120 118.803 2.361
H 692.3387 120 122.469 3.639
H 692.3402 120 128.952 2.681
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 19
H 692.3417 120 118.257 2.777
H 692.3439 120 120.455 2.343
H 692.3453 120 118.584 3.864
H 692.3468 120 118.803 2.311
H 692.3489 120 120.012 2.641
H 692.3503 120 121.011 3.349
H 692.3518 120 118.693 2.726
H 692.3539 120 116.634 3.385
H 692.3553 120 119.792 2.751
H 692.3569 120 119.681 2.690
H 692.3590 120 117.714 3.835
H 692.3605 120 119.241 2.370
H 692.3619 120 118.584 2.916
H 692.3717 120 124.631 2.448
H 692.3729 120 121.346 3.439
H 692.3743 120 118.912 2.517
H 692.3759 120 119.131 2.400
H 692.3778 120 119.681 2.943
H 692.3793 120 119.681 3.392
H 692.3807 120 117.065 2.327
H 692.3828 120 120.566 2.507
H 692.3843 120 117.714 2.371
H 692.3857 120 119.022 2.366
H 692.3878 120 117.822 2.315
H 692.3907 120 120.234 2.459
H 692.3922 120 120.677 2.348
H 692.3937 120 120.900 2.839
H 692.3959 120 118.257 3.352
H 692.3973 120 121.235 2.847
H 692.3988 120 120.789 2.433
H 692.4003 120 122.582 2.595
H 692.4017 120 121.458 2.525
H 692.4032 120 119.131 2.340
H 692.4053 120 122.582 2.385
H 692.4067 120 120.900 2.403
H 692.4082 120 119.681 2.811
H 692.4103 120 122.356 2.432
H 692.4118 120 119.461 3.467
H 692.4133 120 121.235 2.668
H 692.4155 120 121.346 2.384
H 692.4169 120 123.716 2.407
H 692.4184 120 118.148 2.298
H 692.4500 120 118.803 2.790
H 692.4514 120 123.148 2.419
H 692.4551 120 120.789 2.715
H 692.4565 120 119.461 2.374
H 692.4580 120 116.527 2.423
H 692.4602 120 122.469 2.876
H 692.4616 120 121.011 2.377
H 693.3339 120 124.631 1.014
H 693.3353 120 124.516 1.364
H 693.3368 120 123.261 1.618
H 693.3389 120 123.602 1.440
H 693.3403 120 124.516 1.451
H 693.3417 120 121.794 1.692
H 693.3438 120 121.011 1.410
H 693.3453 120 120.123 1.093
H 693.3467 120 121.570 0.989
20 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
H 693.3489 120 124.173 1.130
H 693.3503 120 122.695 1.117
H 693.3519 120 121.123 1.102
H 693.3541 120 119.351 1.025
H 693.3556 120 122.131 1.111
H 693.3571 120 123.261 1.122
H 693.3593 120 121.458 1.503
H 693.3607 120 129.189 2.209
H 693.3622 120 122.695 1.263
H 693.3643 120 123.488 1.195
H 693.3658 120 122.244 1.112
H 693.3673 120 120.455 1.096
H 693.3694 120 122.131 1.792
H 693.3708 120 120.234 1.032
H 693.3723 120 122.131 0.994
H 693.3745 120 123.261 1.712
H 693.3760 120 121.570 1.176
H 693.3774 120 121.682 1.882
H 693.3795 120 121.906 1.984
H 693.3798 120 123.261 4.246
H 693.3813 120 121.458 2.901
H 693.3834 120 125.091 1.288
H 693.3849 120 120.566 1.035
H 693.3878 120 124.516 1.133
H 693.3892 120 125.784 2.151
H 693.3907 120 122.808 1.520
H 693.3927 120 125.322 1.373
H 693.3942 120 124.861 1.368
H 693.3959 120 125.206 1.139
H 693.3977 120 126.831 2.486
H 693.3992 120 126.132 1.220
H 693.4006 120 130.625 3.120
H 693.4022 120 127.182 3.482
H 693.4037 120 125.669 2.463
H 693.4052 120 124.746 1.071
H 693.4073 120 124.287 2.861
H 693.4088 120 125.553 1.143
H 693.4091 120 125.669 4.777
H 693.4126 120 125.437 3.655
H 693.4141 120 126.598 3.466
H 693.4155 120 127.652 1.873
H 693.4176 120 124.631 1.636
H 693.4191 120 126.016 1.654
H 693.4205 120 124.631 1.928
H 693.4226 120 125.322 2.040
H 693.4241 120 124.746 2.340
H 693.4256 120 125.322 4.429
H 693.4276 120 126.248 5.251
H 693.4291 120 128.714 4.663
H 693.4305 120 128.359 6.955
H 693.4327 120 127.887 7.968
H 693.4342 120 129.070 2.860
H 693.4357 120 132.077 6.561
H 693.4377 120 126.016 6.373
H 693.4392 120 129.786 6.214
H 693.4407 120 126.948 7.566
H 693.4421 120 126.714 6.067
H 693.4436 120 130.986 8.754
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 21
H 693.4451 120 129.427 5.964
H 693.4471 120 124.746 9.127
H 693.4487 120 122.582 1.115
H 693.4501 120 124.287 1.067
H 693.4522 120 121.458 1.782
H 693.4537 120 120.455 1.165
H 693.4551 120 123.148 1.616
H 694.3585 120 126.598 1.567
H 694.3599 120 130.866 1.659
H 694.3615 120 125.784 1.557
H 694.3604 120 123.944 1.921
H 694.3649 120 126.248 1.563
H 694.3665 120 127.299 1.900
H 694.3685 120 123.944 1.921
H 694.3700 120 128.714 1.677
H 694.3715 120 128.241 1.587
H 694.3735 120 127.417 1.902
H 694.3750 120 129.666 5.188
H 694.3765 120 128.241 1.845
H 694.3785 120 129.905 2.530
H 694.3800 120 130.505 2.183
H 694.3815 120 129.382 1.639
H 694.3834 120 130.025 3.869
H 694.3849 120 129.308 4.726
H 694.3864 120 129.905 3.331
H 694.3884 120 129.308 3.316
H 694.3899 120 128.477 2.888
H 694.3913 120 127.769 1.581
H 694.3934 120 127.534 1.579
H 694.3948 120 130.625 3.140
H 694.3964 120 130.145 2.925
H 694.3983 120 130.265 1.697
H 694.3998 120 130.025 2.439
H 694.4013 120 130.986 1.707
H 694.4034 120 129.308 4.726
H 694.4049 120 131.349 2.654
H 694.4064 120 131.228 1.663
H 694.4078 120 131.955 2.764
H 694.4093 120 130.866 1.659
H 694.4108 120 126.714 1.606
H 694.4129 120 131.470 2.286
H 694.4144 120 128.833 1.633
H 694.4159 120 128.714 1.677
H 694.4175 120 131.228 2.113
H 694.4189 120 130.625 1.656
H 694.4204 120 132.809 1.683
H 694.4219 120 131.834 1.671
H 694.4233 120 132.931 1.984
H 694.4248 120 136.781 1.734
H 694.4270 120 136.153 2.554
H 694.4291 120 138.174 4.931
H 694.4307 120 133.914 3.762
H 694.4321 120 131.349 1.626
H 694.4341 120 134.532 1.705
H 694.4355 120 130.745 2.187
H 694.4370 120 132.564 1.727
H 694.4385 120 132.320 2.130
H 694.4399 120 132.564 2.878
22 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
H 694.4414 120 130.625 1.879
H 694.4435 120 130.265 1.809
H 694.4450 120 132.931 1.685
H 694.4465 120 132.564 3.083
H 694.4479 120 132.320 3.393
H 694.4495 120 133.176 2.498
H 694.4509 120 137.286 1.789
H 694.4529 120 133.299 1.650
H 694.4544 120 137.920 2.686
H 694.4559 120 135.527 4.146
H 695.3516 120 126.481 0.737
H 695.3531 120 128.241 1.476
H 695.3546 120 128.005 0.939
H 695.3566 120 126.365 1.029
H 695.3580 120 126.481 0.737
H 695.3596 120 124.631 0.642
H 695.3616 120 124.631 2.313
H 695.3630 120 126.132 0.735
H 695.3645 120 127.887 1.694
H 695.3667 120 129.786 0.952
H 695.3682 120 125.553 0.824
H 695.3696 120 126.714 0.738
H 695.3711 120 124.746 0.819
H 695.3726 120 128.952 1.156
H 695.3740 120 130.265 1.277
H 695.3765 120 129.427 0.754
H 695.3779 120 126.714 1.242
H 695.3794 120 129.666 1.493
H 695.3815 120 124.631 0.642
H 695.3829 120 124.402 1.115
H 695.3844 120 125.669 1.775
H 695.3857 120 125.437 1.229
H 695.3879 120 132.564 1.990
H 695.3894 120 128.359 0.842
H 695.3915 120 128.596 1.817
H 695.3929 120 126.481 0.737
H 695.3944 120 123.602 0.811
H 695.3965 120 125.322 0.645
H 695.3979 120 129.189 1.939
H 695.3994 120 130.745 0.673
H 695.4015 120 124.746 0.915
H 695.4030 120 128.596 1.260
H 695.4044 120 128.005 1.696
H 695.4065 120 123.716 0.637
H 695.4080 120 129.905 1.835
H 695.4095 120 126.365 0.927
H 696.3481 120 129.070 1.375
H 696.3496 120 126.016 1.437
H 695.1936 120 129.547 1.477
H 696.3533 120 129.189 1.748
H 696.3547 120 131.955 1.505
H 696.3562 120 132.442 1.510
H 696.3583 120 128.833 1.417
H 696.3597 120 130.625 1.436
H 696.3612 120 132.564 1.877
H 696.3633 120 131.470 1.861
H 696.3647 120 133.791 1.587
H 696.3662 120 129.308 1.831
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 23
H 696.3677 120 128.596 1.820
H 696.3698 120 126.598 1.444
H 696.3713 120 127.652 1.404
H 696.3727 120 128.714 1.665
H 696.3749 120 131.470 1.499
H 696.3763 120 128.477 1.738
H 696.3778 120 130.745 1.438
H 696.3815 120 130.986 2.217
H 696.3829 120 130.745 1.691
H 696.3844 120 131.228 2.319
H 696.3865 120 129.905 2.013
H 696.3879 120 133.054 2.351
H 696.3894 120 130.986 1.554
H 696.3915 120 130.505 1.765
H 696.3930 120 130.265 1.612
H 696.3945 120 134.038 1.897
H 696.3966 120 132.199 2.142
H 696.3981 120 132.564 1.641
H 696.3995 120 131.107 1.556
H 696.4016 120 130.986 2.030
H 696.4031 120 130.986 3.365
H 696.4053 120 131.712 7.665
H 696.4073 120 131.712 1.502
H 696.4088 120 131.107 2.935
H 699.3295 120 124.976 1.120
H 699.3309 120 127.887 1.259
H 699.3325 120 124.402 1.225
H 699.3345 120 124.058 1.785
H 699.3360 120 121.011 1.256
H 699.3375 120 122.356 1.761
H 699.3396 120 123.716 1.159
H 699.3411 120 125.437 1.175
H 699.3426 120 123.944 2.071
H 699.3454 120 123.944 3.210
H 699.3469 120 123.944 1.358
H 699.3484 120 123.944 1.161
H 699.3505 120 120.123 2.904
H 699.3521 120 123.716 1.433
H 699.3535 120 123.148 1.426
H 699.3555 120 122.131 1.850
H 699.3569 120 122.244 1.145
H 699.3584 120 123.944 2.474
H 699.3605 120 124.631 3.879
H 699.3619 120 125.091 1.707
H 699.3634 120 125.437 1.536
H 699.3654 120 122.921 1.862
H 699.3669 120 123.148 2.872
H 699.3683 120 121.458 1.089
H 699.4052 120 126.831 1.137
H 699.3718 120 121.123 1.192
H 699.3734 120 123.602 1.217
H 699.3754 120 127.299 1.474
H 699.3768 120 127.652 2.654
H 699.3784 120 130.265 1.282
H 699.3821 120 126.481 1.465
H 699.3835 120 128.833 1.207
H 699.3850 120 126.948 3.288
H 699.3871 120 126.831 1.248
24 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
H 699.3885 120 129.666 3.247
H 699.3900 120 130.866 3.389
H 699.3922 120 126.831 3.176
H 699.3936 120 125.437 1.997
H 699.3950 120 125.091 1.371
H 699.3971 120 123.716 1.356
H 699.3986 120 122.356 1.146
H 699.4000 120 123.830 1.110
H 699.4023 120 126.948 1.250
H 699.4038 120 122.469 1.147
H 699.4052 120 124.402 1.225
H 699.4073 120 126.481 5.734
H 699.4088 120 124.287 1.978
H 699.4103 120 125.322 1.534
H 699.4123 120 124.402 1.441
H 699.4138 120 122.921 1.769
H 699.4153 120 125.784 1.378
H 699.4173 120 125.437 2.196
H 699.4188 120 124.516 1.524
H 699.4202 120 123.034 1.425
H 699.4224 120 123.944 1.220
H 699.4238 120 122.808 1.274
H 699.4253 120 121.906 1.492
H 700.3828 120 124.059 1.010
H 700.3843 120 123.944 1.128
H 700.3857 120 125.553 1.215
H 700.3879 120 124.402 1.068
H 700.3893 120 122.131 1.111
H 700.3908 120 129.427 1.053
H 700.3929 120 126.715 1.088
H 700.3943 120 126.831 1.306
H 700.3958 120 122.356 1.113
H 700.3978 120 127.534 1.579
H 700.3994 120 127.065 1.156
H 700.4009 120 125.206 1.643
H 700.4030 120 125.553 1.376
H 700.4044 120 128.952 1.413
H 700.4059 120 124.516 1.926
H 700.4079 120 125.669 1.144
H 700.4091 120 126.365 1.385
H 700.4108 120 125.553 1.078
H 700.4129 120 126.249 1.221
H 700.4143 120 125.206 1.019
H 700.4158 120 124.976 1.287
H 700.4178 120 124.631 1.206
H 700.4193 120 126.365 2.371
H 700.4208 120 124.402 1.633
H 700.4244 120 127.652 1.096
H 700.4259 120 127.065 1.392
H 700.4274 120 126.132 1.027
H 700.4288 120 126.249 1.149
H 700.4291 120 125.206 2.141
H 700.4305 120 130.745 2.344
H 700.4326 120 127.417 3.488
H 700.4341 120 126.831 2.273
H 700.4355 120 123.944 1.627
H 700.4376 120 124.861 3.200
H 700.4391 120 124.516 1.827
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 25
H 700.4406 120 127.887 1.401
H 700.4420 120 126.715 1.859
H 700.4435 120 123.602 1.530
H 700.4449 120 127.769 1.489
H 700.4464 120 124.173 1.130
H 700.4479 120 123.602 1.354
H 700.4499 120 122.695 1.430
H 700.4514 120 123.375 1.059
H 700.4529 120 125.553 1.744
H 700.4549 120 124.516 1.069
H 700.4448 120 123.830 1.357
H 700.4463 120 123.375 1.352
H 700.4598 120 123.716 1.062
H 700.4614 120 126.948 1.228
H 700.4628 120 120.012 1.236
H 700.4642 120 122.469 1.261
H 701.3506 120 117.281 1.487
H 701.3521 120 117.389 1.752
H 701.3535 120 120.012 1.521
H 701.3558 120 122.244 2.293
H 701.3572 120 117.497 1.632
H 701.3588 120 116.312 1.674
H 701.3608 120 117.605 1.692
H 701.3622 120 118.912 1.652
H 701.3637 120 116.098 1.733
H 701.3658 120 113.143 1.434
H 701.3672 120 113.770 1.978
H 701.3688 120 114.611 1.917
H 701.3708 120 113.247 1.823
H 701.3724 120 112.209 1.422
H 701.3738 120 110.364 1.777
H 701.3759 120 112.002 1.874
H 701.3774 120 116.420 1.476
H 701.3789 120 115.671 1.664
H 701.3810 120 116.205 1.734
H 701.3824 120 117.065 1.747
H 701.3839 120 119.131 1.601
H 701.3862 120 118.475 1.768
H 701.3876 120 115.671 1.664
H 701.3890 120 111.078 1.931
H 701.3911 120 116.527 1.677
H 701.3926 120 117.281 1.687
H 701.3940 120 115.778 1.728
H 701.3962 120 115.671 1.507
H 701.3977 120 118.803 2.065
H 701.3996 120 117.931 1.697
H 701.4011 120 117.281 1.687
H 701.4026 120 118.693 1.546
H 701.4045 120 119.131 1.714
H 701.4060 120 117.497 1.579
H 701.4075 120 119.351 1.555
H 701.4095 120 118.693 1.772
H 701.4110 120 119.131 1.714
H 701.4125 120 116.742 1.445
H 701.4147 120 115.991 1.511
H 701.4162 120 116.957 2.033
H 701.4177 120 116.742 1.809
H 701.4199 120 116.420 1.517
26 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
H 701.4214 120 117.713 1.694
H 701.4229 120 114.189 1.447
H 702.3953 120 124.173 1.680
H 702.3967 120 123.602 1.814
H 702.3982 120 123.716 2.229
H 702.4003 120 125.784 1.743
H 702.4017 120 125.437 1.739
H 702.4032 120 121.235 1.640
H 702.4053 120 125.091 1.782
H 702.4067 120 124.746 2.679
H 702.4082 120 123.602 1.814
H 702.4103 120 122.695 1.701
H 702.4117 120 125.900 1.703
H 702.4132 120 125.437 2.181
H 702.4152 120 124.631 1.651
H 702.4174 120 122.356 1.621
H 702.4194 120 124.059 2.157
H 702.4214 120 122.582 1.799
H 702.4224 120 124.631 1.686
H 702.4244 120 124.976 1.732
H 702.4259 120 122.244 2.051
H 702.4274 120 125.437 1.662
H 702.4294 120 125.437 1.900
H 702.4309 120 123.148 1.928
H 702.4324 120 124.402 2.087
H 702.4344 120 123.716 1.674
H 702.4359 120 125.206 1.735
H 702.4374 120 124.631 1.888
H 702.4393 120 124.631 1.727
H 702.4408 120 123.944 1.819
H 702.4423 120 123.830 1.875
H 702.4443 120 123.375 1.669
H 702.4458 120 124.861 2.095
H 702.4473 120 123.148 2.066
H 702.4493 120 126.249 1.750
H 702.4508 120 123.602 1.672
H 702.4523 120 124.173 1.680
H 702.4537 120 122.808 1.802
H 703.4019 120 116.957 1.096
H 703.4033 120 114.295 1.478
H 703.4048 120 115.885 1.086
H 703.4062 120 113.979 1.068
H 703.4077 120 113.770 1.066
H 703.4092 120 118.039 1.106
H 703.4113 120 115.671 1.200
H 703.4128 120 112.727 1.056
H 703.4142 120 113.979 1.068
H 703.4163 120 112.416 1.166
H 703.4177 120 116.849 1.681
H 703.4192 120 118.257 1.448
H 703.4213 120 115.034 1.078
H 703.4228 120 112.209 1.104
H 703.4242 120 116.098 1.205
H 703.4263 120 117.497 1.288
H 703.4277 120 115.034 1.078
H 703.4298 120 116.312 1.145
H 703.4313 120 116.312 1.090
H 703.4340 120 113.874 1.067
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 27
H 703.4354 120 118.475 1.450
H 703.4370 120 115.992 2.706
H 703.4390 120 115.459 1.136
H 703.4405 120 118.257 1.370
H 703.4420 120 112.727 1.056
H 668.7222 120 114.190 1.070
H 703.4456 120 114.928 1.077
H 703.4470 120 116.527 1.427
H 703.4485 120 114.084 1.184
H 703.5240 120 114.822 1.852
H 703.5255 120 118.475 2.026
H 703.5270 120 116.205 2.329
H 703.5284 120 116.098 1.872
H 703.5300 120 117.173 2.051
H 703.5314 120 115.459 2.072
H 703.5334 120 118.475 1.910
H 703.5349 120 116.420 2.205
H 703.5364 120 115.459 1.862
H 703.5385 120 116.205 1.874
H 703.5399 120 115.991 1.870
H 703.5414 120 117.713 2.501
H 703.5428 120 114.611 1.960
H 703.5443 120 115.885 1.901
H 703.5458 120 118.366 2.072
H 703.5479 120 116.098 1.942
H 703.5493 120 114.822 2.010
H 703.5508 120 116.742 2.789
H 703.5529 120 116.849 1.884
H 703.5543 120 117.713 1.898
H 703.5558 120 118.584 1.984
H 703.5583 120 114.084 1.951
H 703.5598 120 116.849 2.097
H 703.5613 120 115.778 2.192
H 703.5634 120 113.979 2.158
H 703.5648 120 117.389 2.223
H 703.5663 120 117.605 2.167
H 703.5684 120 116.634 2.628
H 703.5698 120 116.312 1.989
H 703.5713 120 117.281 1.891
H 709.5557 120 114.505 2.709
H 709.5571 120 113.665 2.793
H 710.5570 120 118.148 0.925
H 710.5585 120 118.912 1.503
H 710.5613 120 118.366 1.315
H 710.5627 120 119.241 1.000
H 710.5643 120 118.039 1.876
H 710.5663 120 116.742 2.758
H 710.5673 120 117.930 1.973
H 710.5693 120 118.693 0.929
H 720.5547 120 118.584 2.774
H 720.5562 120 116.312 3.596
H 720.5577 120 120.344 3.037
H 720.5592 120 119.791 2.861
H 720.5607 120 123.716 3.172
H 720.5621 120 121.458 2.842
H 720.5636 120 121.794 4.078
H 720.5651 120 115.459 3.119
H 720.5666 120 119.791 3.423
28 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
H 721.5507 120 131.955 3.061
H 721.5521 120 125.206 3.027
H 721.5535 120 125.322 3.448
H 721.5557 120 131.591 3.109
H 721.5571 120 132.442 3.342
H 721.5585 120 130.385 3.655
H 721.5607 120 127.299 3.041
H 721.5621 120 128.005 2.970
H 721.5636 120 129.547 3.132
H 722.5481 120 123.261 8.973
H 722.5496 120 117.605 11.772
H 722.5510 120 127.065 9.366
J 633.4851 120 60.078 5.054
J 633.4935 120 59.637 5.078
J 633.5005 120 57.374 4.908
J 633.5074 120 61.026 5.165
J 633.5208 120 59.637 5.047
J 635.5070 120 57.163 1.229
J 635.6336 120 56.483 1.373
J 635.6411 120 56.691 0.912
J 635.6470 120 58.495 1.927
J 635.6531 120 56.223 0.904
J 635.6575 120 58.065 1.328
J 635.6643 120 56.743 1.339
J 635.6752 120 57.163 1.519
J 639.6642 120 55.657 1.692
J 639.6657 120 55.172 1.728
I 504.27 60 23.344 0.637
I 504.27 60 22.294 0.406
I 504.27 60 23.130 0.210
I 508.27 60 21.095 0.384
I 508.27 60 21.487 0.391
I 508.28 60 21.487 0.391
I 509.29 60 19.239 0.175
I 509.29 60 19.063 0.173
I 509.29 60 19.063 0.173
I 511.25 60 20.332 0.185
I 511.26 60 20.332 0.185
I 511.26 60 20.710 0.188
I 511.28 60 20.520 0.187
I 511.28 60 20.332 0.185
I 511.28 60 20.520 0.187
I 516.25 60 23.560 0.214
I 516.25 60 23.344 0.212
I 516.25 60 23.344 0.425
I 516.25 60 23.778 0.216
I 516.25 60 23.998 0.437
I 516.31 60 21.095 0.384
I 516.31 60 20.902 0.380
I 516.31 60 20.710 0.377
I 517.24 60 21.290 0.194
I 517.24 60 21.095 0.192
I 517.24 60 21.095 0.192
I 619.21 30 23.344 0.425
I 619.21 30 23.130 0.421
I 619.21 30 23.344 0.425
I 619.21 30 23.344 0.425
I 619.22 30 23.778 0.433
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 29
I 619.22 30 23.344 0.425
I 619.22 30 23.560 0.429
I 619.22 30 23.344 0.425
I 619.23 30 25.362 0.231
I 619.23 30 25.362 0.231
I 619.24 30 25.362 0.231
I 619.24 30 25.362 0.231
I 619.24 30 25.129 0.229
I 619.24 30 25.362 0.231
I 619.25 30 25.129 0.229
I 619.25 30 25.362 0.231
I 619.25 30 25.362 0.231
I 619.26 30 25.362 0.231
I 631.13 30 24.220 0.441
I 631.13 30 23.778 0.433
I 631.13 30 23.998 0.437
I 631.13 30 25.362 0.462
I 632.98 30 30.774 0.280
I 632.98 30 30.774 0.280
I 632.99 30 31.346 0.285
I 632.99 30 31.929 0.291
I 632.99 30 33.127 1.206
I 632.99 30 33.434 0.304
I 633 30 30.492 0.555
I 633 30 30.492 0.555
I 633 30 29.661 0.540
I 633 30 29.661 0.540
I 633.01 30 28.852 0.263
I 633.01 30 28.852 0.525
I 633.01 30 28.066 0.255
I 633.01 30 28.066 0.511
I 633.02 30 27.554 0.501
I 633.02 30 27.051 0.492
I 633.03 30 27.554 0.501
I 633.03 30 27.301 0.248
I 634.98 30 26.314 0.479
I 634.99 30 25.362 0.462
I 634.99 30 25.129 0.457
I 634.99 30 25.129 0.457
I 635.1 30 31.929 0.581
I 635.1 30 33.743 0.614
I 635.12 30 25.597 0.932
I 635.12 30 25.833 0.940
I 635.13 30 25.597 0.699
I 635.13 30 26.314 0.718
I 635.14 30 28.326 0.516
I 635.15 30 28.066 0.511
I 635.15 30 28.326 0.516
I 635.16 30 28.588 0.520
I 635.16 30 29.119 0.530
I 635.17 30 29.119 0.795
I 635.17 30 28.852 0.525
I 635.17 30 28.852 0.525
I 635.17 30 28.066 0.511
I 635.18 30 29.119 0.795
I 635.18 30 29.389 0.535
I 635.18 30 29.119 0.530
I 635.19 30 28.852 2.100
30 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
I 640.22 120 29.119 0.265
I 640.22 120 29.119 0.265
I 641.11 120 29.389 0.535
I 641.11 120 28.852 0.525
I 643.21 120 30.774 0.280
I 643.22 120 30.774 0.280
I 644.1 120 31.346 0.285
I 644.1 120 31.346 0.285
I 648.2 120 32.224 0.586
I 648.21 120 32.522 0.592
I 648.22 120 32.224 0.586
I 655.03 120 32.224 1.759
I 655.03 120 32.224 0.586
I 655.2 120 31.929 0.581
I 655.03 120 28.066 0.255
I 655.03 120 27.809 0.253
I 655.2 120 34.370 0.313
I 655.2 120 34.370 0.313
I 655.21 120 35.009 0.319
I 655.21 120 35.009 0.319
I 655.22 120 34.688 0.316
I 655.22 120 35.009 0.319
I 678.03 120 39.827 0.626
I 678.03 120 39.608 0.639
I 678.04 120 39.974 0.645
I 678.04 120 39.718 0.713
I 678.06 120 38.885 0.618
I 678.06 120 39.608 0.794
I 678.07 120 39.173 0.615
I 678.07 120 39.901 0.655
I 678.09 120 39.245 0.657
I 678.09 120 39.245 0.624
I 678.1 120 39.644 0.639
I 678.12 120 38.387 0.610
I 678.12 120 38.529 0.612
I 678.13 120 38.422 0.630
I 678.13 120 38.493 0.612
I 679.08 120 37.756 1.039
I 679.09 120 38.778 0.848
I 680.01 120 39.974 0.635
I 680.02 120 38.000 0.665
I 680.03 120 37.410 0.671
I 680.03 120 37.375 0.639
I 680.04 120 36.896 0.595
I 680.05 120 37.617 0.629
I 680.06 120 37.169 0.599
I 680.06 120 38.035 0.613
I 680.08 120 37.410 0.655
I 680.08 120 37.272 0.815
I 680.14 120 35.726 0.568
I 680.14 120 35.924 0.564
I 681.1 120 38.458 0.643
I 681.1 120 38.210 0.607
I 681.11 120 38.529 0.612
I 681.12 120 36.457 0.579
I 681.13 120 39.245 0.937
I 681.13 120 39.901 0.716
I 682.03 120 38.000 0.604
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 31
I 682.03 120 37.860 0.602
I 682.04 120 37.169 0.591
I 682.05 120 36.896 0.586
I 682.06 120 36.491 0.580
I 682.06 120 36.357 0.596
I 682.07 120 36.457 0.672
I 682.08 120 36.794 0.616
I 682.1 120 36.457 0.731
I 682.1 120 36.189 0.605
I 682.11 120 36.223 0.747
I 682.11 120 36.693 0.851
I 682.13 120 35.660 0.610
I 682.13 120 36.090 0.767
I 683.02 120 38.210 0.788
I 683.02 120 37.169 0.599
I 683.02 120 36.223 0.584
I 683.03 120 36.457 0.588
I 683.03 120 35.594 0.574
I 683.05 120 35.431 0.593
I 683.05 120 35.792 0.577
I 683.06 120 35.924 0.579
I 683.06 120 38.849 0.680
I 684.02 120 38.885 0.802
I 684.02 120 39.390 0.626
I 684.04 120 39.608 0.771
I 684.04 120 38.529 0.605
I 684.05 120 37.479 0.604
I 684.05 120 39.101 0.631
I 684.07 120 39.029 0.719
I 684.07 120 36.189 0.633
I 689.02 120 36.761 0.603
I 689.02 120 36.896 0.662
I 689.03 120 35.957 0.645
I 689.04 120 41.132 0.663
I 690.08 120 40.718 0.647
I 690.08 120 42.285 0.759
I 690.09 120 42.052 0.704
I 690.11 120 42.480 0.851
I 690.11 120 41.589 0.696
I 690.11 120 42.794 1.022
I 691.06 120 41.704 0.655
I 691.06 120 42.013 0.718
I 691.08 120 42.207 0.951
I 691.08 120 38.387 0.619
I 692.02 120 38.105 0.684
I 692.02 120 39.426 0.660
I 692.04 120 39.101 0.669
I 692.04 120 37.825 0.601
I 692.06 120 38.210 0.639
I 692.08 120 38.778 0.616
I 692.08 120 38.885 0.611
I 692.14 120 39.827 0.734
I 693.02 120 39.608 0.629
I 693.03 120 39.499 0.709
I 693.04 120 38.849 0.680
I 693.04 120 38.458 0.643
I 693.05 120 38.316 0.609
I 693.06 120 39.426 0.747
32 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
I 693.07 120 39.029 0.700
I 693.08 120 38.635 0.634
I 693.09 120 37.548 0.775
I 693.09 120 37.965 0.603
I 693.1 120 37.686 0.592
I 693.11 120 37.756 0.600
I 693.12 120 37.930 0.596
I 693.12 120 38.387 0.610
I 693.14 120 37.513 0.596
I 694.05 120 38.210 0.600
I 694.05 120 38.493 0.612
I 694.08 120 38.493 0.612
I 694.08 120 39.462 0.627
I 694.09 120 39.974 0.635
I 694.1 120 40.011 0.636
I 694.11 120 39.499 0.769
I 694.11 120 40.868 0.649
I 694.11 120 40.568 0.679
I 694.13 120 39.499 0.675
I 694.13 120 39.390 0.689
I 694.14 120 39.353 0.635
I 694.14 120 39.101 0.614
I 695.05 120 40.680 0.639
I 695.05 120 39.938 0.635
I 695.06 120 38.921 0.628
I 695.07 120 38.671 1.275
I 695.08 120 38.635 0.614
I 695.08 120 38.458 0.611
I 695.09 120 39.137 0.685
I 695.1 120 38.422 0.604
I 696.05 120 39.101 0.621
I 696.05 120 39.245 0.704
I 696.06 120 40.531 0.693
I 696.06 120 40.196 0.639
I 696.08 120 40.643 0.646
I 696.08 120 40.755 0.794
I 696.09 120 40.943 0.660
I 699.03 120 41.475 0.785
I 699.03 120 41.436 0.651
I 699.04 120 40.906 0.650
I 699.04 120 41.436 0.693
I 699.04 120 41.551 0.787
I 699.06 120 42.052 0.796
I 699.06 120 41.132 0.675
I 699.08 120 41.360 0.667
I 699.08 120 41.666 0.729
I 699.09 120 40.755 0.648
I 699.1 120 41.589 0.653
I 699.11 120 40.943 0.651
I 699.11 120 41.936 0.666
I 700.04 120 41.284 0.723
I 700.04 120 41.170 0.849
I 700.05 120 46.388 0.107
I 700.05 120 40.793 0.714
I 700.05 120 40.793 0.641
I 700.08 120 41.246 0.648
I 700.08 120 40.943 0.651
I 700.09 120 40.755 0.713
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 33
I 700.1 120 40.270 0.742
I 700.11 120 40.755 0.713
I 700.11 120 40.680 0.647
I 701.03 120 40.943 0.735
I 701.03 120 37.895 0.648
I 701.04 120 37.721 0.695
I 701.05 120 38.387 0.610
I 701.06 120 38.387 0.619
I 701.06 120 38.778 0.874
I 701.07 120 39.137 0.702
I 701.07 120 37.479 0.615
I 701.07 120 37.895 0.648
I 701.09 120 37.101 0.590
I 701.09 120 37.238 0.592
I 702.04 120 39.499 0.691
I 702.04 120 40.307 0.763
I 702.05 120 40.382 0.634
I 702.05 120 40.270 0.633
I 702.07 120 41.094 0.653
I 702.07 120 40.568 0.645
I 702.08 120 41.132 0.654
I 702.08 120 41.513 0.669
I 703.05 120 36.592 0.657
I 703.06 120 37.341 0.593
I 703.06 120 37.375 0.613
I 703.08 120 36.964 0.647
I 703.08 120 37.341 0.638
I 704.04 120 36.390 0.622
I 704.04 120 36.056 0.581
I 704.06 120 36.998 0.597
I 704.06 120 36.625 0.575
I 704.08 120 36.896 0.586
I 704.08 120 37.721 0.599
I 711.07 120 36.524 0.574
I 711.08 120 36.090 0.592
I 715.04 120 37.032 0.633
I 715.04 120 36.794 0.616
I 715.05 120 36.964 0.596
I 715.05 120 36.727 0.643
I 721.06 120 34.212 0.585
I 721.06 120 35.236 0.560
I 722.06 120 37.965 0.783
I 722.06 120 38.281 0.705
R 504.27 60 20.601 0.375
R 504.28 60 19.856 0.361
R 504.28 60 20.791 0.757
R 508.28 60 17.942 0.163
R 508.28 60 17.778 0.162
R 508.29 60 17.778 0.162
R 509.29 60 17.778 0.162
R 509.29 60 17.615 0.160
R 509.29 60 17.778 0.162
R 510.3 60 16.214 0.295
R 510.3 60 16.364 0.149
R 510.3 60 16.364 0.149
R 511.26 60 18.962 0.173
R 511.26 60 18.788 0.171
R 511.26 60 19.137 0.174
34 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
R 511.28 60 18.962 0.173
R 511.28 60 18.788 0.171
R 511.28 60 18.788 0.171
R 516.25 60 21.178 0.193
R 516.25 60 18.445 0.168
R 516.25 60 18.616 0.169
R 517.24 60 19.674 0.179
R 517.24 60 19.314 0.176
R 517.24 60 19.493 0.177
R 517.25 60 19.493 0.177
R 517.25 60 19.493 0.177
R 517.25 60 19.493 0.177
R 619.2 30 26.175 0.476
R 619.21 30 26.175 0.476
R 619.21 30 26.175 0.476
R 619.21 30 26.175 0.476
R 619.22 30 23.653 0.215
R 619.22 30 23.653 0.215
R 619.23 30 23.872 0.217
R 619.23 30 23.872 0.217
R 619.24 30 23.653 0.215
R 619.24 30 23.872 0.217
R 619.24 30 23.653 0.215
R 619.24 30 23.872 0.217
R 619.25 30 24.093 0.219
R 619.25 30 24.093 0.219
R 619.25 30 24.093 0.219
R 619.25 30 24.093 0.219
R 631.13 30 26.661 0.485
R 631.13 30 26.908 0.490
R 632.98 30 28.700 0.261
R 632.98 30 28.700 0.261
R 632.99 30 28.966 0.264
R 632.99 30 28.700 0.261
R 632.99 30 28.966 0.264
R 632.99 30 28.966 0.264
R 632.99 30 29.234 0.266
R 632.99 30 29.234 0.266
R 633 30 28.700 0.261
R 633 30 28.700 0.261
R 633.01 30 27.918 0.254
R 633.01 30 28.176 0.256
R 633.01 30 27.662 0.252
R 633.01 30 27.408 0.249
R 633.02 30 26.661 0.243
R 633.02 30 26.661 0.243
R 633.02 30 26.417 0.240
R 633.02 30 26.417 0.240
R 634.98 30 23.653 0.215
R 634.98 30 23.008 0.209
R 634.98 30 23.872 0.217
R 634.98 30 23.872 0.217
R 635.1 30 27.408 0.249
R 635.1 30 27.408 0.249
R 635.13 30 28.176 0.769
R 635.13 30 28.966 0.527
R 635.13 30 28.437 1.035
R 635.13 30 28.437 0.776
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 35
R 635.14 30 29.777 0.813
R 635.14 30 28.700 0.784
R 635.15 30 28.176 0.769
R 635.15 30 28.700 0.784
R 635.15 30 26.175 0.238
R 635.15 30 25.697 0.468
R 635.16 30 25.935 0.236
R 635.16 30 25.935 0.236
R 635.17 30 24.093 0.658
R 635.17 30 25.935 0.236
R 635.17 30 25.935 0.236
R 635.18 30 26.417 0.240
R 635.18 30 26.661 0.485
R 635.18 30 25.697 0.468
R 635.18 30 24.997 0.455
R 635.19 30 25.935 0.472
R 635.19 30 26.417 0.481
R 640.22 120 28.966 0.264
R 640.22 120 27.408 0.249
R 641.11 120 31.760 0.578
R 641.11 120 31.760 0.578
R 644.1 120 27.408 0.499
R 644.1 120 27.662 0.503
R 648.21 120 28.437 0.259
R 648.21 120 28.966 0.264
R 648.22 120 28.437 0.518
R 648.22 120 30.053 1.641
R 652.05 120 31.760 0.289
R 655.02 120 26.417 0.240
R 655.02 120 26.417 0.240
R 655.03 120 25.461 0.232
R 655.03 120 25.461 0.232
R 655.2 120 32.351 0.294
R 655.2 120 32.351 0.294
R 655.21 120 32.351 0.294
R 655.21 120 32.351 0.294
R 665.04 30 31.469 0.573
R 665.04 30 32.054 0.583
R 678.02 30 34.824 0.634
R 678.02 120 35.480 0.915
R 678.02 120 35.980 0.927
R 678.04 120 35.980 0.957
R 678.04 120 35.409 0.914
R 678.05 120 35.303 0.925
R 678.05 120 35.409 0.922
R 678.07 120 35.303 0.919
R 678.07 120 35.551 0.920
R 678.08 120 34.846 0.989
R 678.08 120 34.846 0.908
R 678.1 120 34.846 0.904
R 678.1 120 34.637 0.909
R 678.11 120 34.222 0.893
R 678.11 120 34.119 0.891
R 678.13 120 33.406 0.869
R 678.13 120 33.676 0.876
R 679.08 120 34.187 0.958
R 679.08 120 33.880 0.891
R 680.01 120 33.507 1.134
36 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
R 680.01 120 33.507 1.248
R 680.02 120 33.138 1.062
R 680.03 120 32.707 0.857
R 680.04 120 32.972 0.869
R 680.04 120 33.238 0.933
R 680.05 120 33.005 1.185
R 680.06 120 33.005 0.970
R 680.07 120 32.707 1.318
R 680.07 120 32.905 1.159
R 680.13 120 32.707 0.896
R 680.13 120 31.731 0.862
R 681.09 120 31.699 0.845
R 681.1 120 34.360 0.892
R 681.11 120 34.498 0.900
R 681.11 120 34.291 0.960
R 681.12 120 34.429 0.929
R 681.13 120 34.671 0.917
R 682.02 120 35.021 0.934
R 682.03 120 34.846 0.939
R 682.04 120 34.846 1.176
R 682.04 120 33.744 1.023
R 682.05 120 33.272 1.046
R 682.06 120 32.806 1.156
R 682.07 120 32.905 0.874
R 682.07 120 33.812 0.889
R 682.09 120 33.744 1.120
R 682.09 120 31.539 0.997
R 682.11 120 31.476 0.847
R 682.11 120 32.021 0.890
R 682.12 120 31.317 0.862
R 682.12 120 31.190 0.832
R 683.01 120 31.444 0.832
R 683.01 120 31.190 0.858
R 683.03 120 31.635 0.831
R 683.03 120 31.476 0.828
R 683.04 120 31.731 0.853
R 683.04 120 32.086 0.842
R 683.06 120 31.731 0.834
R 683.06 120 30.782 0.816
R 684.01 120 31.001 0.821
R 684.01 120 33.339 0.936
R 684.02 120 33.305 0.884
R 684.02 120 34.085 0.896
R 684.03 120 33.541 0.877
R 684.03 120 34.050 0.919
R 684.05 120 34.602 0.997
R 684.05 120 34.256 0.915
R 684.06 120 34.050 0.919
R 684.06 120 33.812 0.879
R 689.01 120 33.744 0.882
R 689.02 120 31.892 0.837
R 689.03 120 32.183 0.844
R 689.03 120 32.054 0.870
R 690.07 120 32.444 0.851
R 690.08 120 36.161 0.953
R 690.09 120 35.056 0.935
R 690.09 120 35.909 0.934
R 690.1 120 36.125 0.934
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 37
R 690.11 120 36.269 0.998
R 691.06 120 36.415 0.946
R 691.06 120 36.891 1.216
R 691.07 120 38.277 1.190
R 691.07 120 37.410 1.027
R 692.02 120 37.112 1.137
R 692.02 120 33.914 0.886
R 692.03 120 34.256 0.915
R 692.03 120 33.272 0.883
R 692.05 120 33.406 1.032
R 692.06 120 33.138 0.867
R 692.07 120 33.272 0.871
R 692.08 120 33.071 0.866
R 692.09 120 33.948 0.887
R 692.09 120 33.575 0.878
R 692.13 120 33.507 0.916
R 692.13 120 33.914 0.927
R 693.02 120 34.567 0.902
R 693.02 120 34.222 0.893
R 693.03 120 33.710 0.881
R 693.04 120 33.238 0.870
R 693.05 120 33.812 0.883
R 693.05 120 33.642 0.879
R 693.06 120 33.105 0.867
R 693.07 120 33.205 0.869
R 693.07 120 33.205 0.869
R 693.07 120 32.905 0.862
R 693.08 120 33.005 0.864
R 693.09 120 32.608 0.855
R 693.1 120 32.938 0.863
R 693.1 120 32.872 0.861
R 693.11 120 32.839 0.860
R 693.12 120 32.740 0.858
R 694.05 120 35.729 0.929
R 694.05 120 35.409 0.922
R 694.06 120 35.587 0.926
R 694.06 120 35.622 0.927
R 694.07 120 34.916 0.910
R 694.07 120 34.811 0.907
R 694.09 120 35.232 0.917
R 694.09 120 35.551 0.931
R 694.1 120 35.515 0.924
R 695.04 120 34.637 0.903
R 695.05 120 34.741 0.906
R 695.06 120 33.880 0.881
R 695.06 120 34.085 0.890
R 695.07 120 33.744 0.888
R 695.08 120 33.778 0.883
R 695.09 120 34.187 0.892
R 695.09 120 33.914 0.886
R 696.04 120 35.338 0.974
R 696.04 120 34.986 0.907
R 696.06 120 34.532 0.901
R 696.06 120 34.567 0.902
R 696.07 120 35.056 0.913
R 696.08 120 35.126 0.915
R 696.09 120 36.016 0.931
R 696.09 120 36.089 1.021
38 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
R 699.02 120 37.186 1.180
R 699.02 120 36.927 1.217
R 699.04 120 35.444 1.217
R 699.04 120 36.744 1.146
R 699.05 120 36.089 0.933
R 699.05 120 36.488 0.991
R 699.07 120 35.837 0.932
R 699.08 120 35.658 0.923
R 699.09 120 36.233 0.941
R 699.09 120 35.622 1.040
R 699.1 120 36.089 0.933
R 699.11 120 36.415 0.989
R 700.03 120 36.161 0.940
R 700.03 120 35.837 0.932
R 700.05 120 35.409 0.922
R 700.05 120 35.480 0.923
R 700.06 120 35.338 0.920
R 700.06 120 35.303 0.919
R 700.07 120 34.846 0.908
R 700.08 120 35.021 0.912
R 700.09 120 34.222 0.893
R 700.09 120 35.091 0.914
R 700.1 120 35.056 0.909
R 700.11 120 35.303 0.919
R 701.02 120 33.440 0.875
R 701.03 120 33.473 0.871
R 701.04 120 34.085 0.890
R 701.04 120 34.016 0.997
R 701.05 120 33.676 0.932
R 701.06 120 33.812 1.008
R 701.07 120 33.778 1.024
R 701.07 120 32.972 0.914
R 701.08 120 33.473 0.871
R 701.08 120 33.105 1.143
R 702.03 120 36.306 1.076
R 702.04 120 36.488 0.947
R 702.05 120 35.622 0.927
R 702.05 120 35.694 0.928
R 702.06 120 36.016 0.936
R 702.06 120 36.125 0.934
R 702.08 120 36.634 0.965
R 702.08 120 36.670 0.958
R 703.04 120 32.313 0.910
R 703.04 120 32.411 0.855
R 703.05 120 32.608 0.917
R 703.06 120 32.641 0.895
R 703.07 120 32.510 0.852
R 703.08 120 32.575 0.854
R 704.04 120 32.021 0.879
R 704.04 120 32.248 0.846
R 704.05 120 31.795 0.831
R 704.05 120 32.313 0.886
R 704.07 120 32.740 0.878
R 704.07 120 32.248 0.866
R 711.07 120 32.641 0.976
R 711.07 120 32.248 1.138
R 715.04 120 32.806 1.178
R 715.03 120 32.608 0.959
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 39
R 715.05 120 32.674 1.048
R 715.05 120 32.707 1.030
R 721.06 120 31.412 0.943
R 721.06 120 31.763 1.513
R 722.05 120 33.071 1.682
R 722.07 120 34.986 0.912
R 723.05 120 35.021 1.626
V 504.29 60 18.243 0.332
V 504.29 60 18.928 0.172
V 504.29 60 18.076 0.164
V 504.3 60 18.243 0.166
V 504.3 60 18.243 0.166
V 504.3 60 18.076 0.164
V 508.29 60 17.910 0.163
V 508.29 60 17.910 0.163
V 508.29 60 17.910 0.163
V 509.29 60 17.583 0.160
V 509.29 60 17.422 0.159
V 509.29 60 17.583 0.160
V 510.3 60 16.185 0.295
V 510.3 60 16.185 0.147
V 510.3 60 16.185 0.147
V 511.26 60 18.412 0.168
V 511.26 60 18.412 0.168
V 511.26 60 18.412 0.168
V 511.28 60 18.412 0.168
V 511.28 60 18.582 0.169
V 511.28 60 18.412 0.168
V 516.31 60 18.412 0.168
V 516.31 60 18.754 0.171
V 516.31 60 18.076 0.164
V 517.24 60 18.582 0.169
V 517.24 60 18.582 0.169
V 517.24 60 18.582 0.169
V 517.25 60 18.412 0.168
V 517.25 60 18.412 0.168
V 517.25 60 18.412 0.168
V 619.2 30 25.183 0.458
V 619.2 30 24.952 0.454
V 619.21 30 25.651 0.467
V 619.21 30 24.952 0.454
V 619.22 30 23.179 0.211
V 619.22 30 23.179 0.211
V 619.22 30 23.394 0.213
V 619.22 30 23.179 0.211
V 619.23 30 23.179 0.211
V 619.23 30 23.394 0.213
V 619.23 30 23.394 0.213
V 619.23 30 23.394 0.213
V 619.24 30 23.610 0.215
V 619.24 30 23.394 0.213
V 619.25 30 23.610 0.215
V 619.25 30 23.610 0.215
V 619.26 30 23.610 0.215
V 619.26 30 23.610 0.215
V 632.98 30 28.386 0.258
V 632.98 30 28.648 0.261
V 632.99 30 29.724 0.270
40 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
V 632.99 30 28.648 0.261
V 633 30 28.386 0.258
V 633 30 28.386 0.258
V 633.01 30 27.359 0.249
V 633.01 30 27.108 0.247
V 633.01 30 26.860 0.244
V 633.01 30 26.613 0.242
V 633.01 30 26.128 0.238
V 633.02 30 26.128 0.238
V 634.98 30 22.341 0.203
V 634.98 30 22.136 0.201
V 635.1 30 26.128 0.476
V 635.1 30 25.651 0.467
V 635.12 30 28.386 0.775
V 635.13 30 29.181 0.797
V 635.13 30 28.648 1.043
V 635.13 30 28.386 1.808
V 635.14 30 29.181 0.797
V 635.14 30 26.613 0.242
V 635.14 30 26.860 0.244
V 635.14 30 27.108 0.247
V 635.15 30 26.860 0.489
V 635.15 30 26.860 0.489
V 635.16 30 26.860 0.489
V 635.16 30 26.369 0.480
V 635.16 30 26.369 0.480
V 635.16 30 26.128 0.476
V 635.17 30 25.651 0.467
V 635.17 30 26.369 0.480
V 635.18 30 26.128 0.238
V 635.18 30 26.369 0.480
V 635.19 30 26.860 0.733
V 635.19 30 26.369 0.720
V 640.21 120 27.868 0.254
V 640.21 120 27.612 0.251
V 641.1 120 32.292 0.294
V 641.11 120 31.996 0.291
V 643.21 120 28.386 0.258
V 643.21 120 28.386 0.258
V 644.1 120 29.181 0.266
V 644.1 120 28.648 0.261
V 648.2 120 28.386 0.258
V 648.2 120 28.914 0.263
V 648.21 120 28.126 0.256
V 648.21 120 28.126 0.256
V 648.22 120 30.839 1.123
V 648.22 120 29.724 0.541
V 651.08 120 33.504 0.915
V 651.08 120 32.591 0.890
V 651.2 120 32.591 0.297
V 651.2 120 33.814 0.308
V 651.21 120 32.591 0.297
V 651.22 120 32.591 0.297
V 651.23 120 32.591 0.297
V 651.23 120 32.591 0.297
V 651.24 120 32.591 0.297
V 651.24 120 32.591 0.297
V 652.07 120 34.762 0.316
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 41
V 652.08 120 34.443 0.313
V 652.2 120 31.124 0.283
V 652.2 120 30.839 0.281
V 652.21 120 31.412 0.286
V 652.21 120 31.124 0.283
V 652.23 120 31.412 0.286
V 652.23 120 31.412 0.286
V 652.24 120 31.124 0.283
V 652.24 120 31.124 0.283
V 655.02 120 25.183 0.229
V 655.02 120 25.183 0.229
V 655.2 120 32.591 0.297
V 655.2 120 32.591 0.297
V 655.21 120 32.292 0.294
V 655.21 120 32.591 0.297
V 655.22 120 32.591 0.297
V 655.22 120 32.591 0.297
V 664.2 30 34.127 0.621
V 664.2 30 34.443 0.627
V 664.21 30 35.083 0.639
V 664.21 30 27.359 0.249
V 678.02 120 35.278 0.498
V 678.02 120 35.083 0.505
V 678.03 120 34.096 0.482
V 678.03 120 34.443 0.506
V 678.05 120 33.783 0.477
V 678.05 120 33.320 0.643
V 678.06 120 33.258 0.757
V 678.06 120 33.258 0.733
V 678.08 120 32.712 0.697
V 678.08 120 32.832 0.846
V 678.09 120 33.106 0.487
V 678.09 120 31.996 1.190
V 678.11 120 32.442 1.289
V 678.11 120 32.174 1.444
V 678.12 120 31.820 0.724
V 678.12 120 31.762 1.101
V 679.07 120 33.075 0.486
V 679.08 120 33.106 0.777
V 679.09 120 30.109 0.944
V 679.1 120 29.316 0.755
V 679.13 120 32.893 0.847
V 679.13 120 31.441 0.568
V 679.14 120 31.239 1.869
V 679.15 120 31.412 0.627
V 680 120 33.075 0.486
V 680.01 120 32.501 0.490
V 680.02 120 32.382 0.625
V 680.02 120 32.772 0.482
V 680.03 120 32.144 0.803
V 680.04 120 32.382 0.646
V 680.05 120 32.263 0.856
V 680.05 120 32.742 0.537
V 680.07 120 32.115 0.472
V 680.07 120 30.726 1.273
V 680.13 120 33.014 0.485
V 680.13 120 32.832 0.483
V 681.09 120 36.805 0.785
42 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
V 681.09 120 37.282 0.821
V 681.1 120 36.300 0.852
V 681.11 120 35.967 0.650
V 681.12 120 37.111 0.985
V 681.12 120 36.635 0.807
V 682.02 120 33.474 0.518
V 682.02 120 34.096 0.491
V 682.03 120 32.292 0.456
V 682.04 120 32.292 0.475
V 682.05 120 31.908 0.451
V 682.05 120 31.732 0.491
V 682.06 120 32.471 0.861
V 682.07 120 31.908 0.494
V 682.09 120 33.474 1.049
V 682.09 120 35.408 1.139
V 682.1 120 33.846 0.770
V 682.1 120 34.127 0.728
V 682.12 120 32.412 0.810
V 682.12 120 32.681 1.051
V 683.01 120 32.352 0.784
V 683.01 120 32.115 0.497
V 683.02 120 31.762 0.723
V 683.04 120 31.499 0.860
V 683.04 120 31.211 0.564
V 683.05 120 30.641 0.450
V 683.05 120 31.355 0.461
V 684.01 120 33.350 0.735
V 684.01 120 33.474 0.473
V 684.03 120 34.666 0.920
V 684.03 120 32.681 0.919
V 684.04 120 34.507 0.997
V 684.04 120 32.561 0.915
V 684.06 120 32.651 0.816
V 684.06 120 31.996 0.874
V 689.01 120 31.849 0.507
V 689.01 120 32.233 0.486
V 689.02 120 31.996 0.470
V 689.03 120 32.352 0.476
V 690.06 120 36.434 0.536
V 690.07 120 36.872 0.555
V 690.07 120 35.769 0.526
V 690.07 120 36.266 0.533
V 690.08 120 35.473 0.549
V 690.09 120 36.000 1.070
V 690.1 120 35.670 0.891
V 690.1 120 35.375 0.911
V 691.05 120 37.351 0.578
V 691.05 120 36.974 0.557
V 691.07 120 37.145 0.560
V 691.07 120 36.838 0.542
V 692.01 120 32.233 0.486
V 692.01 120 33.014 0.526
V 692.03 120 33.412 0.481
V 692.03 120 33.412 0.481
V 692.04 120 33.908 0.556
V 692.05 120 32.412 0.466
V 692.05 120 32.471 0.467
V 692.06 120 34.570 0.508
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 43
V 692.07 120 34.634 0.498
V 692.07 120 33.566 0.483
V 692.08 120 34.285 0.493
V 692.09 120 34.570 0.508
V 692.13 120 31.879 0.846
V 692.13 120 31.557 0.837
V 693.01 120 34.411 0.602
V 693.03 120 34.826 0.695
V 693.03 120 33.628 0.484
V 693.04 120 34.002 0.541
V 693.05 120 33.939 0.488
V 693.06 120 33.443 0.481
V 693.06 120 33.628 0.484
V 693.06 120 33.136 0.477
V 693.08 120 33.535 0.505
V 693.08 120 32.953 0.484
V 693.09 120 32.832 0.593
V 693.1 120 33.075 0.498
V 693.11 120 33.228 0.600
V 693.11 120 32.712 0.894
V 693.12 120 33.197 0.488
V 693.13 120 33.535 0.493
V 694.04 120 34.890 0.513
V 694.04 120 34.762 0.511
V 694.06 120 35.278 0.546
V 694.06 120 34.666 0.510
V 694.07 120 35.343 0.520
V 694.07 120 35.703 0.525
V 694.08 120 35.083 0.516
V 694.08 120 34.987 0.514
V 694.1 120 34.954 0.982
V 694.1 120 35.703 0.525
V 694.11 120 34.222 0.515
V 694.12 120 34.285 0.504
V 694.12 120 34.159 0.515
V 694.13 120 34.443 0.506
V 694.13 120 31.732 0.448
V 695.04 120 34.285 0.504
V 695.04 120 34.507 0.507
V 695.05 120 34.033 0.500
V 695.06 120 34.064 0.501
V 695.07 120 33.939 0.511
V 695.07 120 34.096 0.527
V 695.08 120 33.721 0.496
V 695.09 120 33.721 0.496
V 696.04 120 34.602 0.509
V 696.04 120 34.762 0.511
V 696.05 120 34.826 0.512
V 696.05 120 34.316 0.505
V 696.07 120 34.507 0.520
V 696.07 120 34.666 0.715
V 696.08 120 35.019 0.515
V 696.09 120 35.703 0.689
V 699.02 120 35.245 0.518
V 699.02 120 35.506 0.914
V 699.03 120 35.310 0.519
V 699.03 120 34.411 0.506
V 699.05 120 34.348 0.505
44 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
V 699.05 120 34.538 0.508
V 699.06 120 35.083 0.516
V 699.06 120 34.666 0.510
V 699.07 120 33.814 0.497
V 699.07 120 35.148 0.517
V 699.08 120 34.602 0.509
V 699.09 120 35.148 0.517
V 699.1 120 35.604 0.523
V 699.1 120 35.441 0.521
V 700.03 120 35.868 0.527
V 700.03 120 35.441 0.521
V 700.04 120 35.525 0.530
V 700.04 120 34.922 0.513
V 700.06 120 34.634 0.522
V 700.06 120 35.019 0.527
V 700.07 120 34.380 0.505
V 700.07 120 34.634 0.669
V 700.08 120 34.253 0.504
V 700.09 120 34.890 0.610
V 700.1 120 34.858 0.845
V 700.1 120 33.908 0.511
V 701.02 120 32.203 0.513
V 701.02 120 32.501 0.490
V 701.03 120 32.292 0.486
V 701.04 120 31.791 0.479
V 701.05 120 32.501 0.490
V 701.05 120 32.712 0.972
V 701.06 120 31.326 0.585
V 701.08 120 32.115 0.852
V 701.08 120 32.893 0.484
V 702.03 120 35.051 0.515
V 702.03 120 35.834 0.527
V 702.04 120 35.769 0.526
V 702.04 120 35.148 0.517
V 702.06 120 35.736 0.525
V 702.06 120 35.901 0.716
V 702.07 120 35.571 0.809
V 702.07 120 35.245 0.531
V 703.03 120 31.211 0.459
V 703.04 120 31.703 0.916
V 703.05 120 30.613 0.740
V 703.05 120 31.470 0.739
V 703.06 120 30.924 0.541
V 703.07 120 31.879 0.595
V 703.07 120 31.791 0.747
V 703.07 120 31.067 0.468
V 704.03 120 31.239 0.459
V 704.03 120 31.067 0.457
V 704.05 120 31.239 0.459
V 704.05 120 31.239 0.666
V 704.06 120 31.441 0.474
V 704.06 120 31.470 0.445
V 704.07 120 31.616 0.590
V 704.07 120 31.268 0.498
V 710.06 120 29.669 0.447
V 710.07 120 29.154 0.601
V 711.07 120 31.586 0.765
V 711.07 120 31.674 0.941
A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005 45
V 715.03 120 31.703 0.592
V 715.03 120 32.442 0.990
V 715.04 120 31.820 0.771
V 715.04 120 31.967 0.638
V 721.05 120 29.806 0.678
V 721.05 120 30.248 0.710
V 722.05 120 32.591 0.695
V 722.05 120 32.681 0.590
V 723.05 120 32.115 0.754
V 723.05 120 31.820 0.724
Table A.1: Log of observations. Epoch of observations is reported in JD-
2453000.5 unit.
References
Bersanelli, M., Bouchet, P., Falomo, R, & Tanzi, E. G., 1992, AJ, 104, 28
Bertone, E., Tagliaferri, G., Ghisellini, G., et al. 2000, A&A, 356, 1
Cardelli, J. A., Clayton, G. C. & Mathis, J. S. 1989, ApJ, 345, 245
Chincarini, G., Zerbi, F., Antonelli, A., et al. 2003, The Messenger, 113, 40
Covino, S., Stefanon, M., Fernandez-Soto, A., et al. 2004, SPIE, 5492, 1613
Chiappetti, L., Maraschi, L., Tavecchio, F., et al. 1999, ApJ, 521, 552
De Diego, J. A., Kidger, M. R., Gonzáles-Pérez, J. N., and Letho, H. J., 1997, A&A, 318, 331
Dolcini, A., Covino, S., Treves, A.,et al. 2005, A&A, 443, L33
Edelson, R., Krolik, J., Madejski, G., et al. 1995, ApJ, 438, 120
Falomo, R., Giraud, E., Maraschi, L., et al., ApJ, 1991, 380, L67
Falomo, R., Bersanelli, M., Bouchet, P., Tanzi, E. M., AJ, 1993, 106, 11
Fan, J. H. & Lin, R. G., A&A, 355, 880
Fuhrmann, L., Cucchiara, A., Marchili, N., et al. 2006, A&A, 445L, 1
Hamuy, M. & Maza, J. 1989, AJ, 97, 720
Katarzynski, K., Ghisellini, G., Tavecchio, F., et al. 2005, A&A, 433, 479
Katarzynsky, K., Ghisellini, G., Mastichiadis, A., et al. 2006, A&A, 453, 47
Katarzynski, K. & Ghisellini, G., 2006, A&A, in press [arXiv:astro-ph/0610801]
Kotilainen, J. K., Falomo, R., & Scarpa, R. 1998, A&A, 336, 479
Landolt, A. U., 1992, AJ, 104, 340
Mannucci, F., Basile, F., Poggianti, B. M., et al., 2001, MNRAS, 326, 745
Miller, H. R., McAlister, 1983, ApJ, 272, 26
Molinari et al., 2006, [arXiv:astro-ph/0612607]
Montagni, F., Maselli, A., Massaro, E., et al. 2006, A&A, 451, 435
Paltani, S., Courvoisier, T. J.-L., Blecha, A., & Bratschi, P. 1997, A&A, 327, 539
Pesce, J. E., Urry, C. M., Maraschi, L., et al. 1997, ApJ, 486, 770
Pian, E., Vacanti, G., Tagliaferri, G., et al. 1998, ApJ, 492, L17
Pian, E., Urry, C. M., Maraschi, L., et al. 1999, ApJ, 521, 112-120
Pian, E., Falomo, R., Hartman, R.C., et al. 2002, ApJ, 392, 407-415
Pian, E., Romano, P., Treves, A., et al. 2006, in preparation
Sbarufatti, B., Falomo, R., Treves, A. & Kotilainen, J. 2006, A&A, 457, 35
Sembay, S., Edelson, R., Markowitz, A., et al. 2002, ApJ, 574, 634
Smith, P. S., Hall, P. B., Allen, R. G., & Sitko, M. L. 1992, ApJ, 400, 115
Stetson, P. B., 1986, PASP, 99, 191
Tavecchio, F., Maraschi, L. and Ghisellini, G., 1998, ApJ, 509, 608
Tanihata, C., Kataoka, J., Takahashi, T., & Madejski, G. M., 2004, ApJ, 601, 759
Tommasi, L., Diaz, R., Palazzi, E., et al. 2001, ApJ Supplement Series, 132, 73
Treves, A., Morini, M., Chiappetti, L., et al. 1989, ApJ, 341, 733
Urry, C. M., Maraschi, L., Edelson, R., et al. 1997, ApJ, 486, 799
Xie, G. Z., Zhang, Y. H., Li, K. H., et al. 1996, AJ, 111, 3
Zerbi F. M., Chincarini, G., Ghisellini, G., et al. 2001, AN, 322, 275
Zhang, Y. H., Xie, G. Z. 1996, A&A Supplement Series, 116, 289
Zhang, Y. H., Treves, A., Celotti, A., Qin, Y. P., & Bai, J. M., 2005, ApJ, 629, 686
http://arxiv.org/abs/astro-ph/0610801
http://arxiv.org/abs/astro-ph/0612607
46 A. Dolcini et al.: REM monitoring of PKS2155-304 during 2005
Figure 13. SED of PKS 2155-304 in two states, adapted from Chiappetti et al. (1999) (see the paper for details). Data from
this work are also plotted. Filled triangles correspond to epoch 1 (13/5/2005 data), while filled hexagons belong to epoch 3 data
(20/11/2005). Optical, UV and REM data are dereddened using E(B-V)=0.026 and parameters given by Cardelli et al. (1989).
Zhang, Y. H., Treves, A., Maraschi, L., Bai, J. M., & Liu, F. K. 2006a, ApJ, 637, 699
Zhang, Y. H., Bai, J. M., Zhang, S. N., et al., 2006b, ApJ, 651, 682
	Introduction
	REM, Photometric procedure, data analysis
	Observations and data analysis
	Results
	Long term variability
	Short time-scale variability
	The NIR-Optical spectral energy distribution
	Discussion
	Table of observations
ABSTRACT
  Spectral variability is the main tool for constraining emission models of BL
  Lac objects. By means of systematic observations of the BL Lac prototype PKS
2155-304 in the infrared-optical band, we explore variability on the scales of
months, days and hours. We made our observations with the robotic 60 cm
telescope REM located at La Silla, Chile. VRIJHK filters were used. PKS
2155-304 was observed from May to December 2005. The wavelength interval
explored, the total number of photometric points and the short integration time
render our photometry substantially superior to previous ones for this source.
On the basis of the intensity and colour we distinguish three different states
of the source, each of duration of months, which include all those described in
the literature. In particular, we report the highest state ever detected in the
H band. The source varied by a factor of 4 in this band, much more than in the
V band (a factor ~2). The source softened with increasing intensity, contrary
to the general pattern observed in the UV-X-ray bands. On five nights of
November we had nearly continuous monitoring for 2-3 hours. A variability
episode with a time scale of ~24 h is well documented, a much more rapid flare
with t=1-2 h, is also apparent, but is supported by relatively few points.

<|endoftext|><|startoftext|>
Supernova Polarization
and the Type IIn Classification
Jennifer L. Hoffman
Department of Astronomy, UC Berkeley, 601 Campbell Hall, Berkeley, CA 94720-3411
Abstract. While the members of the Type IIn category of supernovae are united by the presence
of strong multicomponent Balmer emission lines in their spectra, they are quite heterogeneous with
respect to other properties such as Balmer line profiles, light curves, strength of radio emission,
and intrinsic brightness. We are now beginning to see variety among SNe IIn in their polarimetric
characteristics as well, some but not all of which may be due to inclination angle effects. The
increasing number of known “hybrid” SNe with IIn-like emission lines suggests that circumstellar
material may be more common around all types of SNe than previously thought. Investigation of the
correlations between spectropolarimetric signatures and other IIn attributes will help us address the
question of classification of “interacting SNe” and the possibility of distinguishing different groups
within the diverse IIn subclass.
Keywords: supernovae, type IIn, supernova classification
PACS: 97.60.Bw
Type IIn (“narrow-line”) supernovae (SNe IIn) are Type II events whose primary
distinction is the existence of strong narrow hydrogen Balmer emission lines in the
spectra [1, 2]. These lines indicate the presence of circumstellar material ejected by the
progenitor star and excited by the UV and X-ray photons from the supernova explosion.
The IIn subclass includes 2–5% of all Type II supernovae [3].
Turatto [4] categorized the SNe IIn as a special class of core-collapse supernovae
related to the SNe Ib/c and hypernovae. However, in recent years several new discoveries
have blurred the boundaries between the IIn category and other types of supernovae. It
should be noted that since the label of IIn is assigned based on spectral characteristics,
while that of II-L or II-P is based on features of the light curve, these three categories are
not mutually exclusive, and some objects may be given more than one classification in
the literature. But observations of “hybrid” supernovae such as SN 2002ic [5, 6] and
SN 2005gj [7], which showed IIn-like Hα lines superposed on type Ia-like spectra,
and of “chameleon” supernovae such as SN 2001em [8, 9], which evolved from a
Type Ic to a Type IIn over the span of a few years, suggest that not only SNe II
but potentially all supernovae may show signatures of interaction with circumstellar
material. Consequently, Turatto’s revised classification scheme [10] includes a broad
category of “interacting SNe” that spans all SN types and includes the SNe IIn.
The IIn category has always been heterogeneous, as noted by Filippenko [2], whose
Figure 14 presented a non-coeval collection of SN IIn spectra. Figure 1 shows that even
when compared at similar ages, SNe IIn have quite diverse spectral characteristics. The
primary feature of SNe IIn spectra, the strong Hα emission line, also varies substantially
in strength and profile between objects of comparable age (Figure 2) and even for a given
object over time. In particular, the very narrow (FWHM < 200 km/s) component of Hα
http://arxiv.org/abs/0704.0266v2
FIGURE 1: Deredshifted optical
spectra of four Type IIn super-
novae at similar epochs, showing
the diversity of spectral character-
istics within the subtype. All dates
are post-discovery. Detailed analy-
ses of SN 1997eg and SN 2000P
have been conducted by [11] and
[12], respectively; the other data
are presented courtesy of A. Fil-
ippenko (priv. comm.). For the
unpublished spectra, redshift cor-
rections are based on information
in the Asiago Supernova Catalog
[13]. Hydrogen Balmer lines are
labeled.
does not necessarily exist at all times in the evolution of a SN IIn. In SN 1997eg (Figure
3; [11]), this narrow Hα component disappeared after 100 days post-discovery, was
replaced by a small symmetric absorption feature, and then reappeared around day 400.
The Hα line in SN 1998S ([12]; their Figure 7) also lost its narrow emission component
early on, but was very asymmetric and changed much more dramatically over time.
Because of interaction with circumstellar material, the light curves of SNe IIn often
decline quite slowly in comparison with other core-collapse objects, but not all members
of the subclass show this slow decline [2]. The canonical IIn SN 1988Z decreased in
brightness by only 5 magnitudes in 1000 days [14], but others decline more quickly
(e.g., SN 1998S; [15]). SN 1994W faded by ∼6 magnitudes in B in only 100 days [16].
Circumstellar interaction can also cause some SNe IIn to become strong radio and
X-ray emitters; SN 1988Z and SN 1986J are among the brightest radio supernovae
ever observed [17]. Nearly all the SNe IIn detected in X-rays have also been strong
radio sources [18, 19]. However, not all SNe IIn become radio-loud [18], and there
is considerable heterogeneity even among the “radio-quiet” subset [2]. The overlap
between the IIn, II-P, and II-L categories makes it difficult to identify trends in radio
brightness with core-collapse subtype.
FIGURE 2. As in Fig. 1, but for the Hα region of the spectrum.
FIGURE 3. Time evolution of the multi-component Hα profile of SN 1997eg [11]. Spectra are normal-
ized to day 16 at v =−10,000 km s−1.
FIGURE 4. Polarized flux spectra (percent polarization multiplied by flux) for SN 1998S [12] and SN
1997eg [11], binned to 10 Angstroms in each case. Dashed lines show total flux spectra for comparison,
scaled by 0.037 and 0.036, respectively. Polarization data are corrected for estimated interstellar polariza-
tion (ISP) effects; these estimates are somewhat uncertain in each case, but a simple correction for ISP
could not account for all the differences between these two polarized flux spectra.
Finally, SNe IIn show considerable variety in their spectropolarimetric characteristics
(Figure 4). Supernovae of all types are known to be polarized due to intrinsic asphericity
of the ejecta. The circumstellar material surrounding SNe IIn can produce its own po-
larization signature in addition to that arising from the ejecta. Some of the polarization
variations between SNe IIn are likely due to inclination; that is, they may have similar
geometrical distributions of circumstellar material, but have different polarization sig-
natures due to different viewing angles. To study this effect, I am constructing a grid of
Monte Carlo radiative transfer models of the Hα line polarization produced by circum-
stellar matter distributions with various geometries and seen at various viewing angles
[20]. Preliminary results suggest that differences in viewing angle may account for some
of the polarimetric variety in SNe IIn. However, such geometric effects cannot explain
all of the IIn diversity; for example, the transformation of SN 2001em from a Ic into a
IIn [8, 9] is very unlikely to be due to a sudden change in inclination!
These examples show that the IIn classification is currently something of a “catchall”
for a number of related but distinct objects, all with close connections to Turatto’s
[10] new category of “interacting SNe”. Now that circumstellar interaction has been
recognized to occur for a broad range of supernova types, and now that high-quality
time-dependent data are more readily available, we are in a good position to reconsider
the categorization of SNe IIn and other interacting supernovae. An initiative underway
within the Berkeley supernova group will quantify the spectral and spectropolarimetric
characteristics of SNe IIn and search for correlations between these features and other
properties such as light curve shape and radio/X-ray behavior. If found, such correlations
will illuminate the relationships between diverse members of the IIn category and
perhaps ultimately argue for the category’s subdivision.
In the future, full understanding and correct classification of these important objects
will depend critically on obtaining a broad range of observational information, including
multiwavelength and polarimetric data, with as much time coverage as possible. In
addition, supernova researchers should expand their collaborations with the evolved star
community. If the circumstellar media around SNe IIn and related objects are indeed
created by winds and outflows from massive stars, we can use nearby evolved star
nebulae as analogous cases to the circumstellar envelopes of supernovae. Such an effort
could greatly improve our understanding of supernova progenitors and help shed light
on the physical causes of the diversity we observe among interacting supernovae.
ACKNOWLEDGMENTS
This research was funded by an NSF Astronomy & Astrophysics Postdoctoral Fellow-
ship, AST-0302123; by NSF grant AST-0607485 to A. V. Filippenko; and by the Na-
tional Energy Research Scientific Computing Center, US DOE Contract #DE-AC03-
76SF00098. I thank Alex Filippenko at UC Berkeley and Peter Nugent at the Lawrence
Berkeley Laboratory for their support and invaluable contributions.
REFERENCES
1. E. M. Schlegel, MNRAS 244, 269 (1990).
2. A. V. Filippenko, ARAA 35, 309 (1997).
3. E. Cappellaro, M. Turatto, D. Y. Tsvetkov, et al., A&A 322, 431 (1997).
4. M. Turatto, S. Benetti, and E. Cappellaro, “Variety in Supernovae,” in From Twilight to Highlight:
The Physics of Supernovae, Springer, Berlin, 2003, p. 200.
5. J. Deng, K. S. Kawabata, Y. Ohyama, et al., ApJ 605, L37 (2004).
6. M. Hamuy, M. M. Phillips, N. B. Suntzeff, et al., Nature 424, 651 (2003).
7. G. Aldering, P. Antilogus, S. Bailey, et al., ApJ 650, 510 (2006).
8. A. M. Soderberg, A. Gal-Yam, and S. R. Kulkarni, GCN 2586, 1 (2004).
9. N. N. Chugai, and R. A. Chevalier, ApJ 641, 1051–1059 (2006).
10. M. Turatto, “Classification of Supernovae,” in This Volume, AIP, New York, 2007.
11. J. L. Hoffman, D. C. Leonard, R. Chornock, et al. (2007), in prep.
12. D. C. Leonard, A. V. Filippenko, A. J. Barth, et al., ApJ 536, 239 (2000).
13. R. Barbon, V. Buondí, E. Cappellaro, et al., A&AS 139, 531 (1999).
14. M. Turatto, E. Cappellaro, I. J. Danziger, et al., MNRAS 262, 128 (1993).
15. A. Fassia, W. P. S. Meikle, W. D. Vacca, et al., MNRAS 318, 1093 (2000).
16. R. J. Cumming, and P. Lundqvist, “Supernova Progenitor Constraints from Circumstellar Interaction:
Type II,” in Advances in Stellar Evolution, Cambridge University Press, Cambridge, 1997, p. 297.
17. R. A. Chevalier, C. Fransson, and T. K. Nymark, ApJ 641, 1029 (2006).
18. S. D. van Dyk, M. Hamuy, and A. V. Filippenko, AJ 111, 2017 (1996).
19. D. Pooley, W. H. G. Lewin, D. W. Fox, et al., ApJ 572, 932 (2002).
20. J. L. Hoffman, “Polarized Line Profiles as Diagnostics of Circumstellar Geometry in Type IIn
Supernovae,” in Circumstellar Media and Late Stages of Massive Stellar Evolution, Rev. Mex. AA
Ser. Conf., 2007, in press.
ABSTRACT
  While the members of the Type IIn category of supernovae are united by the
presence of strong multicomponent Balmer emission lines in their spectra, they
are quite heterogeneous with respect to other properties such as Balmer line
profiles, light curves, strength of radio emission, and intrinsic brightness.
We are now beginning to see variety among SNe IIn in their polarimetric
characteristics as well, some but not all of which may be due to inclination
angle effects. The increasing number of known "hybrid" SNe with IIn-like
emission lines suggests that circumstellar material may be more common around
all types of SNe than previously thought. Investigation of the correlations
between spectropolarimetric signatures and other IIn attributes will help us
address the question of classification of "interacting SNe" and the possibility
of distinguishing different groups within the diverse IIn subclass.

<|endoftext|><|startoftext|>
Introduction
Among X-ray binary systems (XRBs), 18 or more have been identified as containing
probable black hole (BH) accretors (McClintock & Remillard 2005). The BH masses mea-
sured to date appear to fall into a limited range. From a Bayesian analysis of the observa-
tional parameters of several low-mass XRBs, Bailyn et al. (1998) concluded that 6 of the 7
systems measured had BH masses clustered around 7 M⊙and that the overall population was
heavily biased away from low BH masses (3 – 5 M⊙). Since that study, at least one system,
J0422+32, has been found to have a measured mass of about 4 M⊙(Gelino & Harrison 2003),
but the overall trend toward higher BH masses persists.
This result is in conflict with theoretical evolutionary models of the formation of BH
XRBs, which predict a continuous distribution of BH masses from neutron star masses up to
10–15 M⊙ (Fryer & Kalogera 2001). Confirmation or elimination of the low-mass BH “gap”
would provide important constraints on the role of massive star evolution, supernova ener-
getics, and subsequent binary evolution in BH formation (Fryer & Kalogera 2001). Further
analysis of the mass distribution of BHs in compact binary systems is currently hampered
by the generally poor precision of existing mass estimates (e.g., see Table 4.2 in McClintock
& Remillard 2005). In this manuscript, we address this limitation by analyzing near-infrared
(NIR) spectra to obtain a precise BH mass for one XRB, A0620-00.
A0620-00 was discovered in 1975 when it erupted in an X-ray nova (Elvis et al. 1975).
After its return to quiescence, A0620-00 was revealed as an interacting binary system with a
K star donating mass via an accretion disk to a compact object (Oke 1977; McClintock et al.
1983). Later, the orbital period of the binary and the radial velocity amplitude of the donor
star were measured and yielded a mass function, f(M) = 3.17 M⊙, which established A0620-
00 as a likely BH XRB (McClintock & Remillard 1986). Further observations established the
binary mass ratio and determined the masses of the stars to within one unknown, the binary
orbital inclination: M1 = (3.09±0.09) sin
3 i M⊙ and M2 = (0.21±0.09) sin
3 i M⊙, where M1
and M2 are the BH and donor star masses, respectively (Marsh, Robinson & Wood 1994;
1Visiting Astronomer at the Infrared Telescope Facility, which is operated by the University of Hawaii
under Cooperative Agreement no. NCC 5-538 with the National Aeronautics and Space Administration,
Science Mission Directorate, Planetary Astronomy Program.
– 3 –
Orosz et al. 1994). Several groups have determined values for the inclination, with the num-
bers ranging from 38◦ ≤ i ≤ 75◦. As a result, estimates of the BH mass in A0620-00 vary from
3.3 to 13.6 M⊙(Haswell et al. 1993; Shahbaz, Naylor & Charles 1994; Froning & Robinson
2001; Gelino et al. 2001).
The broad range of derived inclinations and BH masses for A0620-00 result from long-
term variability in the system and from uncertain determinations of the amount of veiling,
or dilution, by sources other than the donor star. The inclination is determined by modeling
the amplitude of the ellipsoidal variations in the donor star light curve, so both of these
effects will alter the derived BH mass. In particular, an additional source will dilute the
amplitude of the ellipsoidal variation, leading to an underestimate of the inclination and a
corresponding overestimate of the BH mass if not taken into account.
The best way to determine the true donor star contribution in A0620-00 is to model
its spectrum, particularly in the NIR, where the late type donor star is expected to domi-
nate. Shahbaz, Bandyopadhyay & Charles (1999) modeled a low S/N, K-band spectrum of
A0620-00, from which they concluded that the accretion disk contributes at most 27% of the
continuum in the NIR. They fit only the 12CO bandhead at 2.29 µm, however, which is sen-
sitive to both temperature and luminosity of the donor star and may be prone to metallicity
effects in compact binary systems (Froning & Robinson 2001). Harrison et al. (2007) also
recently published a K-band spectrum of A0620-00 in which they confirmed that the 12CO
absorption lines are anomalously weak. What is needed to settle the debate over the con-
tribution of the accretion disk to the NIR spectrum of A0620-00 are higher S/N, broadband
spectra. To this end, we have obtained and present here 0.8 – 2.4 µm spectroscopy of A0620-
00 obtained with SpeX at the NASA InfraRed Telescope Facility (IRTF). This manuscript is
organized as follows: § 2 summarizes the data reduction and calibration steps; § 3 presents
the data analysis and modeling of the donor star spectrum; and § 4 gives discussion and
conclusions.
2. Observations and Data Reduction
We observed A0620-00 on 2004 January 8 – 10 using SpeX on the IRTF (Rayner et al.
2003). The weather was clear with good seeing conditions (. 0.7′′) throughout the run. We
observed A0620-00 using the ShortXD mode, which has a cross-dispersed echelle configu-
ration and covers 0.8 – 2.5 µm simultaneously in 6 orders. All observations were obtained
through the 0.′′5 slit, resulting in a spectral resolution in the center of each order of R = 1200
(250 km s−1). A nearby A0V star was observed hourly to sample the atmospheric absorption
spectrum. We also observed several spectral type calibration stars in the same configuration
– 4 –
used for A0620-00 (supplemented with similar data from a previous SpeX observing run).
The observations are summarized in Table 1.
All data reduction, calibration, and spectral extraction steps were performed using
Spextool, an IDL-based package developed by the IRTF (Cushing, Vacca, & Rayner 2004;
Vacca, Cushing, & Rayner 2003). The calibration processing steps included flat-fielding,
sky subtraction, optimal spectral extraction, and wavelength calibration. Each A0620-00
exposure was extracted individually to preserve the full 300 sec time resolution. The spectra
were corrected for telluric absorption and flux-calibrated using the A0V stellar spectra as
described by Vacca, Cushing, & Rayner (2003). Finally, the orders were merged for each
exposure to create single spectra covering the full wavelength range. The S/N per resolution
element was ∼4 in the individual spectra. The slit was kept aligned to the parallactic angle
during data acquisition, so the relative fluxes along the full 0.8 – 2.4 µm range are accurate to
≤ 2% (where the upper limit is the uncertainty at 0.9 µm when guiding at 2.4 µm; Cushing
et al. 2004). In addition, stable observing conditions resulted in absolute flux calibrations
of comparable quality. We have not quantified this number as our analysis does not depend
on the absolute flux of the spectrum, but we note for completeness that a rough comparison
of our mean in-band colors to those of Froning & Robinson (2001) and Gelino et al. (2001)
yielded JHK colors within 0.1 mag of their results, well within the level of variability observed
in A0620-00 over long time periods.
For the time-averaged spectrum of A0620-00, we shifted each 300 sec spectrum to
the rest frame of the donor star before median combining, using the orbital ephemeris of
McClintock & Remillard (1986) and the donor star radial velocity amplitude fromMarsh, Robinson & Wood
(1994). The error bars were determined by calculating the median absolute deviation of each
pixel and then propagated through the smoothing and de-reddening steps. We did not cor-
rect for the effects of orbital smearing within an exposure time, but we note that this is
a negligible effect (≤30 km s−1) at our 250 km s−1 spectral resolution. The time-averaged
spectrum is shown in Figure 1. The spectrum has been boxcar-smoothed by 3 pixels, equiva-
lent to one resolution element. Based on the scatter around linear fits to (relatively) line-free
spectral regions, we find that the S/N in the time-averaged spectrum is ∼55 in H and K and
∼45 in J (> 1µm, ∼30 at shorter wavelengths).
Also shown in Figure 1 is the dereddened spectrum of A0620-00, calculated assuming a
reddening along the line of sight of E(B–V) = 0.39 (Wu et al. 1976). Figures 2 – 4 show ex-
panded views of the J,H, and K bands of the spectrum, with prominent spectral absorption
and emission features labeled. Line identifications were made using multiple sources, in-
cluding Wallace et al. (2000), Meyer et al. (1998), Kleinmann & Hall (1986), Harrison et al.
– 5 –
(2004), and the Atomic Line List2.
3. Analysis
The broadband NIR spectrum of A0620-00 shown in Figures 1 – 4 is characterized by a
blue continuum on which are superposed broad (full width at zero intensities ≥4000 km s−1)
emission lines of H I and He II and narrow (full width at half minima ≃ 250 – 400 km s−1)
absorption lines of neutral metals, including transitions of Na I, Mg I, Al I, Si I, K I, Ca I,
Ti I, and Fe I. The emission lines are believed to originate in the accretion disk, while the
absorption features originate in the photosphere of the donor star. The absorption spectrum
is similar to that of a K star, with previous estimates of the spectral type ranging from K3V
to K7V (Oke 1977; González Hernández et al. 2004).
In Figure 5, we show the dereddened spectrum of A0620-00 compared to that of a field
K5V star. The K star has been normalized to A0620-00 near 2.29 µm, just blueward of
the 12CO (2,0) bandhead. The comparison immediately shows that a K5V (or hotter) star
cannot be the only flux source in the NIR. If the K5 star is scaled to the flux of A0620-00
in the K band, it exceeds the dereddened flux by >20% at bluer wavelengths. Adopting the
K4V and K3V spectral types of Gelino et al. (2001) and González Hernández et al. (2004)
results in an even larger disparity between the expected and observed J- and H-band fluxes.
Even a spectral type as late at K7V exceeds the observed flux in A0620-00 by up to 10%
over most of the J and H bands when normalized to 100% contribution in K.
Modest increases in the assumed reddening along the line of sight do not reconcile the
spectrum of A0620-00 with that of a K5V star. Adopting a reddening of E(B–V) = 0.45
brings the 0.8 µm fluxes into agreement, but the template spectrum is still brighter than the
spectrum of A0620-00 at longer wavelengths, including most of the J and H bands. Because
the relative reddening values between H and K are small, the reddening must be increased to
E(B–V) > 1.0 to bring the spectrum of the normalized K5V template below the spectrum of
A0620-00 at all NIR wavelengths. It is extremely difficult to reconcile a reddening this high
with the observed depth of the interstellar absorption feature at 2200 Å in the spectrum of
A0620-00 (Wu et al. 1976). As a result, the fundamental conclusion remains: if the donor
star is the sole source of NIR emission in A0620-00, its spectral type must be later than that
of a K7V. Otherwise, some level of dilution must be present.
The absorption spectrum of A0620-00 resembles that of the K5V template, but there
2http://www.pa.uky.edu/p̃eter/atomic/
– 6 –
is at least one important difference between them: the CO molecular absorption features in
A0620-00 are significantly weaker relative to the metal lines than in the template spectrum.
This difference can affect determinations of the donor star contribution to the NIR spectrum.
For example, the dilution analysis performed by Shahbaz, Bandyopadhyay & Charles (1999)
on the K-band spectrum of A0620-00 is unlikely to be a valid determination of the contribu-
tion of the donor star to the NIR spectrum, since they relied entirely on the relative strength
of the 12CO 2.29 µm feature. The weakness of the CO lines also suggests that other anomalies
may exist in the spectrum, necessitating that its decomposition be undertaken over a wide
wavelength range and using multiple line species and features. Accordingly, we compare the
line equivalent widths and line ratios in A0620-00 with those of field star populations and
model the spectrum using both spectral type standards and synthetic spectra.
3.1. Classification Based on Spectral Indices
Studies using equivalent widths (EWs) and EW ratios to determine the spectral type of
a star or stellar population have been pursued by several groups (e.g., Origlia, Moorwood,
& Oliva 1993; Ali et al. 1995; Förster Schreiber 2000; and sources therein). Of particular
interest to us is the work of Förster Schreiber (2000; hereafter FS), who examined H and
K-band absorption lines to find temperature and luminosity indices and indices sensitive to
dilution of the stellar spectrum by other sources. FS was primarily interested in spectral
trends in giant and supergiant stars as the dominant stellar source in extragalactic NIR
spectra, but his analysis includes some dwarf stars as well.
For comparison, we calculated the EWs of several prominent stellar absorption lines in
the time-averaged spectrum of A0620-00 to compare to the spectral indices in FS. Table 2
gives the measured EWs. The lines were chosen to correspond to those in Table 3 of FS
and were calculated using the same continuum normalization and integration limits. Where
applicable, we also applied the EW correction for spectral resolution from Equations 2 –
4 of FS. The error bars on the EWs are the standard deviation of the mean for several
measurements of each line with variable estimates of the continuum placement.
We first compared our EWs to the spectral indices given in Figure 5 of FS, which presents
temperature and luminosity indicators for stars with solar or near-solar abundances. With
the exception of Si I λ1.59 µm and Mg I λ2.28 µm, all of the lines under analysis show a
strong trend of increasing EWs with decreasing stellar temperature. The EWs in A0620-00
are on the low side of the distributions for these lines, indicating stellar temperatures of
≥5000 K.
– 7 –
We also compared our EWs with those presented in Ali et al. (1995), who concentrated
on temperature indices for dwarf stars. Note that the EWs in Table 2 used to compare
to the Ali et al. (1995) indices are larger than those used for the FS indices because Ali
et al. used a wider wavelength interval for their measurements, which we mirrored. Using
their EW-temperature relationships, we obtain a temperature of 4600±300 K from Ca I and
5000±450 K from Na I. Therefore, if uncorrected for dilution, EWs in A0620-00 point to
a donor star of roughly type K3V star or earlier. However, stars of spectral type K7V or
earlier exceed the observed spectrum of A0620-00 in J and H when zero dilution is assumed
in K. Therefore, we conclude that a diluting continuum source must be present in the NIR
spectrum of A0620-00.
The amount of dilution of the stellar spectrum by another source is determined in FS
by comparing the line ratios of adjacent atomic and molecular features (their Table 8).
Unfortunately, their line ratios use the H and K band CO molecular absorption features,
which we have already seen are not normal in A0620-00. Indeed, a comparison of the 12CO
1.62 µm and 2.29 µm features in A0620-00 gives a result so disproportionately strong in the
1.62 µm line that the ratio doesn’t even appear on the FS spectral index plot. Similarly, a
comparison of the 2.29 µm feature to the nearby Na I and Ca I EWs indicates that the CO
feature is weaker in A0620-00 than in any of the dwarf stars analyzed by FS. These results
indicate that the CO features cannot be used to estimate the non-stellar dilution component
A0620-00.
3.2. Fitting Spectral Type Standard Stars to the Spectrum of A0620-00
In an effort to quantify the contribution of the donor star to the NIR spectrum of A0620-
00 and its dilution by other sources, we compared its spectrum to those of K3V, K5V, and
K7V spectral type standard stars. The standard stars (listed in Table 1) were observed
with the same instrument configuration and calibrated using the same procedures as for the
A0620-00 observations. Before comparing the A0620-00 and template spectra, both were
boxcar-smoothed over 3 pixels (one resolution element). The spectrum of A0620-00 was also
dereddened with E(B–V) = 0.39 (using the reddening curve of Cardelli, Clayton, & Mathis
(1989) and the standard star spectra were convolved with a Gaussian of 83 km s−1 FWHM to
mimic the rotational broadening of the donor star in A0620-00 (Marsh, Robinson & Wood
1994). Note, however, that the rotational broadening is smaller than the 250 km s−1 resolu-
tion of the spectra and has a minimal effect on the results.
We wrote an IDL program to fit scaled template spectra to the spectrum of A0620-00
using the following steps. First, we selected a small wavelength range (typically, 0.1 µm or
– 8 –
smaller) and fit a spline function to the continuum. The continuum points were selected by
eye. After normalizing both the spectrum of A0620-00 and that of the template star and
placing both spectra on a common, linear dispersion, we multiplied the template spectrum
by a fraction, f , which represents the donor star contribution to the spectrum of A0620-00,
and subtracted the scaled donor star spectrum from that of A0620-00. We varied f from 0
to 1 in increments of 0.01 to find the fraction that minimized the rms of the residual in each
waveband. Finally, we repeated this analysis over several spectral lines and groups of lines
over the full NIR spectral range. The fit regions we examined are given in Table 3. The
resulting best values of f and the rms for each fit region and template spectrum is given in
Table 4.
There are few absorption lines in the J band that are both relatively strong and un-
contaminated by emission lines in the A0620-00 spectrum, so our fits were restricted to the
portion of the long J band between Pγ and Pβ (1.10 – 1.26 µm). This region contains a
blend of singly-ionized atomic species, including transitions of Mg I, Fe I, and Si I. The best
fit fractions range from 0.78 to 1.0. There is a disparity between the strongest line in this
range, Mg I λ1.183 µm and the other lines in this band: the Mg I line is best fit at f ∼0.9,
but the other lines are weaker in the template than in A0620-00 even at f = 1.
The situation is less ambiguous in the H band. The best fit to the full H band spectrum
using the K5V standard star is shown in Figure 6. The fit has f = 0.76 and an rms of
0.016. Although the χ2 statistics are relatively poor (χ2ν = 13.5), there is a good qualitative
correspondence between the morphologies of the observed and template spectra. Similar
results are obtained with the K3V and K7V templates. Many of the stronger transitions
(predominately Mg I and Si I lines) are too weak in the f = 0.76 template, however. If
we restrict the fits to narrower regions around these lines, we generally obtain higher f and
better fits (e.g., χ2ν = 5.2 for the 1.48 – 1.52 µm region fit by the K5V template).
The large χ2ν values for our fits indicate that our error bars are undersized relative to
the true uncertainty in the fits. This is unsurprising, given the systematic uncertainties
that affect modeling of NIR spectra in faint compact binaries, including the influence of
sky background and telluric absorption correction, uncertain placement of the continuum
level where no true continuum exists, and complex line blending wherein small temperature
and/or abundance variations between the template and target stars can affect fit results. As
a result, we have chosen to determine the mean value and uncertainty in the H-band donor
star contribution by using an average of fits to multiple lines and multiple templates over
the full waveband, rather than thorough the use of χ2 statistics, as we believe the scatter
between line fits provides a more rigorous sampling of uncertainties, particularly systematics.
We determined the best representative value for the H band donor star contribution by
– 9 –
averaging the best-fit f values for the three narrowband H fit regions (1.48 – 1.52 µm, 1.56–
1.61 µm, and 1.70–1.72 µm) and the K5V and K7V templates. Averaging these values for
the K5V and K7V template stars gives a donor star fraction f = 0.82±0.02.
In K, we fit three regions: the spectrum shortward of He I λ2.058 µm, which is dominated
by Ca I absorption lines; the region between He I and Bγ, which includes transitions of
Mg I, Al I, and Si I; and the region longward of Bγ, which contains a rich blend of features,
including transitions of Na I, Ca I, Fe I, Ti I, Mg I, and CO. Note that we do not have any
fit results for the K3V template to the K-band spectrum, because the absorption lines in the
K3V template are too weak relative to those of A0620-00, even at f=1. We have already
shown from the comparison of the SEDs of A0620-00 and a K3V star that a star this hot
cannot be the sole emission source (i.e., have f = 1) in A0620-00 in K without exceeding
the observed spectrum at shorter wavelengths. Now we see that a K3V star also cannot be
reconciled to the spectrum of A0620-00 by decreasing its fractional contribution, because its
absorption lines are already too weak at f = 1 to match those observed in A0620-00. As a
result, we restrict our fits in K to K5V and K7V templates.
The Ca I lines from 1.90 – 2.02 µm could not be simultaneously fit by a single value for
f . The fit values given in Table 4 appear to the eye to be overdiluted as a result of spurious
features (residuals of the telluric absorption correction) driving the fits. The strongest lines
— 1.978 and 1.987 µm — are fit by f ∼ 0.5, but at this value the other lines in this region
are too weak. This spectral region may be contaminated by Bδ λ1.945 µm emission from
the accretion disk. Problems also beset the spectral fits in the 2.07 – 2.15 µm region. The
K5V and K7V stellar spectra have anomalous emission bumps at 2.14 µm that cause visibly
overdiluted fits over the full wavelength range. If the fits are restricted to 2.07 – 2.13 µm,
best fits are obtained for f ∼ 0.65, while fits to the strongest feature alone, Al I 2.117 µm,
gives f = 0.75 for both the template fits.
The final fit region was the long-K portion of the spectrum, 2.18 – 2.42 µm, which
includes numerous atomic species transitions and the CO bandheads. Figure 7 shows the
best fits for the K5V template over the full fit region and when the fit is restricted to
λ < 2.28 µm, excluding the region dominated by the CO lines. When long K is fit in
its entirety, the fits are driven to large dilutions of the donor star to match the weak CO
absorption in the A0620-00 spectrum. At these low donor fractions (f = 0.45 for the K5V
template and f = 0.37 for the K7V template), the atomic lines are too weak in the template
spectrum relative to those in A0620-00. If the fit is restricted to 2.18 – 2.28 µm, the donor
fractions rise to f = 0.81 (K5V) and f = 0.76 (K7V). At these values, the atomic absorption
spectrum of A0620-00 is well fit although there remain discrepencies between template and
target spectra, most notably in the red components of Na I λ2.209 and λ2.339 µm and in
– 10 –
the Fe I lines from 2.226 – 2.247 µm, all of which are too weak in the template relative
to A0620-00. A single discrepancy in Fe I may explain these deviations, as there are Fe I
transitions coincident with both of the ”Na I” lines (see Figure 6 of FS for an illustration of
the complex line blending in this spectral region).
3.3. Fitting Model Spectra to the Spectrum of A0620-00
In addition to modeling A0620-00 with standard star spectra, we used the LinBrod
program to generate synthetic spectra for a Roche lobe-filling star in a compact binary
with the geometry of A0620-00 (Bitner & Robinson 2006). We adopted the following pa-
rameter values for the models: Porb = 0.323 d, q = 0.067, i = 41
◦, and K2 = 433 km s
(McClintock & Remillard 1986; Marsh, Robinson & Wood 1994; Gelino et al. 2001). We cre-
ated phase-resolved spectra for donor star temperatures of T = 4000, 4250, 4500, 4750, and
5000 K. Finally, we also created models at each temperature in which the carbon abundance
in the star was reduced to [C/H] = -0.5, -1.0, -1.5, and -2.0. To compare to the observed
spectrum of A0620-00, we averaged the model spectra over the binary orbit after removing
the donor star orbital motion and smoothed the spectra to the observed spectral resolution.
Because the SED and spectral type standard star fits to the A0620-00 spectrum point
to a donor star spectral type later than K3V, or T < 5000 K, we concentrated on the T =
4000, 4250, and 4500 K models. In an effort to characterize the carbon depletion required to
match the observed CO line depths, we also focused on the long-K portion of the spectrum
(2.18 – 2.42 µm) . Table 5 gives fit results for the LinBrod model fits. We first fit the solar
abundance models and then repeated the fits for the carbon-depleted spectra. Figure 8 shows
the normalized long K spectrum of A0620-00 with the solar abundance, T = 4000 K model
and with the T = 4000 K, [C/H] = -1.5 spectrum. The model spectra has been scaled by
f = 0.77, the best fit donor fraction for the 2.18 – 2.28 µm region. The [C/H] = -1.5 models
provided the best fit to the spectrum of A0620-00 for all three of the donor temperatures
examined (χ2ν = 4.0 for the T=4000 K, [C/H] = -1.5 model fit to 2.28 – 2.39 µm, versus
χ2ν = 5.9 and χ
ν = 5.2 for the [C/H] = -1.0 and -2.0 models, respectively). The -0.5 and
-1.0 models had CO lines that were still too strong relative to those of A0620-00, while the
CO features were virtually absent in the -2.0 models and too weak to match the observed
spectrum.
– 11 –
4. Discussion and Conclusions
4.1. The Donor Star in A0620–00
4.1.1. The Donor Star Spectral Type and Fractional Contribution to the NIR Spectrum
Our analysis of the NIR spectrum of A0620-00 has demonstrated three principal results:
1) the donor star is not the only NIR flux source, with 18±2% of the H-band flux originating
in another component of the binary; 2) the donor star must be later than a K3V spectral
type; and 3) the CO absorption lines are anomalously weak, requiring a carbon abundance
of [C/H] = -1.5 in the donor star to match the observed line depths.
A comparison of the broadband NIR SED of A0620-00 with those of spectral type
standard star spectra shows that the donor star cannot be the sole source of NIR flux. If
the standard star spectra are normalized to the dereddened spectrum of A0620-00 in the
K-band, they exceed the observed flux in the J and H wavebands. This result is true for all
standard star spectra earlier than M0V. The discrepancy cannot be reconciled by changes
in the differential reddening correction because the relative reddening correction between K
and J and H is too small. Additionally, the K-band absorption lines in the K3V standard
star are too weak to match the line depths seen in the A0620-00 spectrum even at a 100%
donor star contribution. Decreasing the donor star contribution to f < 1 will only make the
template lines weaker, so a K3V spectral type for the donor star is ruled out.
The most precise measurement of the donor star temperature is by González Hernández et al.
(2004), who fit synthetic spectra created from model atmospheres to the visible spectrum of
A0620-00. They found that T2 = 4900 ± 100 K, which corresponds to a K2/K3V spectral
type. This result is not in agreement with our requirement that the donor star be later than
K3V. There are reasons to believe that the González Hernández et al. (2004) results may be
unreliable, however. To determine the stellar parameters they used 24 Fe I lines, to which
they fit models with five free parameters: stellar temperature, gravity, and metallicity, as
well as a normalization and slope to represent dilution from the accretion disk. The use of
Fe I lines alone to constrain all stellar parameters (plus the disk contribution) is uncommon
practice for stellar modeling, which typically relies on independent determinations of the
stellar temperature, as well as on both Fe I and Fe II transitions. González Hernández et al.
(2004) also determined that the abundances of the elements they fit were slightly above solar
values. The adoption of a cooler donor star temperature, as required by our results, will
reduce their derived abundances.
We conclude that the spectral type of the donor star is most likely between K5V to
K7V, but do not attempt to constrain the spectral type more precisely. The rms values
– 12 –
for the dilution fits to various regions of the A0620-00 spectrum are comparable for both
spectral types and we cannot rely on the CO features to create more precise temperature
indices. We therefore averaged the dilution values from both spectral type fits for the three
narrow regions in the H-band to derive our H-band donor star fraction: f = 0.82 ± 0.02,
or an 82% donor star contribution in H. The donor fraction in long K (> 2.2µm) is 81%
for a K5V template or 76% for the K7V. Figure 9 shows the spectrum of A0620-00 with
a K5V template star scaled to 82% of the H-band flux and the A0620-00 spectrum after
the contribution of the donor star has been subtracted. While the spectrum of A0620-00 is
dominated by the K type donor, there is a significant second component consisting of a blue
continuum and strong H I and He II line emission.
Our results of a K5V to K7V donor star spectral type and 18% – 24% veiling in H
and K do not agree with those of Gelino et al. (2001), who found that the NIR SED of
A0620-00 matched that of a K4V (T = 4600 K) star with no dilution. As pointed out by
Hynes et al. (2005), however, there is a degeneracy between the spectral type of the star and
the amount and distribution of a veiling spectrum in an XRB: a cooler donor star plus a
diluting component can result in the same SED as a hotter donor star with no contamination.
Hynes et al. showed that even a 100 K overestimate of the temperature of the donor star
could result in a factor of two underestimate of the veiling. Because our data show, both
in the line EWs and the NIR SED, that a K type donor star cannot be the only NIR flux
source, we conclude that Gelino et al. (2001) overestimated the temperature of the donor
star and consequently underestimated the spectral dilution in the NIR.
Our results also disagree with the recent work of Harrison et al. (2007), who argue by
analogy with the IR spectrum of the cataclysmic variable SS Cygni that A0620-00 has <4%
dilution in the K-band. Their argument can be summarized as follows: a) A0620-00 and
SS Cyg have similar K-band spectral slopes, binary properties, and quiescent mass accretion
rates; b) SS Cyg also has a mid-IR excess; c) the NIR and MIR SEDs of SS Cyg can be
well fit by a K4V stellar spectrum plus free-free emission; d) because A0620-00 and SS Cyg
are similar, application of the star plus wind model can be extended from the latter to the
former to estimate a ∼4% limit on the contamination level of the NIR spectrum of A0620-00.
We have several concerns about this line of reasoning. First, as already discussed above,
the similarity of a SED to that of a field star does not preclude contaminating emission
from the disk. The data shown in Harrison et al. reinforce this: the JHK colors of SS Cyg
are consistent with that of an undiluted K4V star even as the ellipsoidal light curves show
clear evidence of contamination. In fact, Harrison el al. point out that the K-band spectrum
of A0620-00 has a different slope from those of the field stars, which provides fairly clear
evidence of contamination. Second, it is not known whether free-free emission is the correct
– 13 –
explanation for the excess MIR flux observed in SS Cyg. Another possibility discussed by
Dubus et al. (2004), who originally published the IR SED shown by Harrison et al., is that
circumbinary disk emission is responsible. Indeed, Muno & Mauerhan (2006) have found a
MIR component in A0620-00, which they fit with a ≃600 K blackbody component, consistent
with emission from circumbinary dust rather than free-free emission in a wind. The cool
blackbody component found by Muno & Mauerhan (2006) does not affect the NIR spectrum
of A0620-00, which suggests that the attempt to use MIR data from SS Cyg to estimate the
NIR contamination in A0620-00 is a red herring. Finally, we reiterate what our simultaneous,
JHK spectra of A0620-00 indicate directly about the donor star contamination in the system:
based on the shape and fluxes of the JHK continuum spectrum, the absolute EWs of the
atomic absorption lines, and dilution analyses using both field stars and a Roche-lobe filling
model spectrum, the donor star must be significantly diluted (∼20%), even in the K-band.
4.1.2. Weak CO Features in the Donor Star Spectrum
Although the atomic absorption line spectrum in A0620-00 is consistent with a K5V –
K7V spectral type, the molecular 12CO lines are significantly weaker in A0620-00 than in
a field star, corresponding to CO line strengths normally seen in early G stars. Using the
LinBrod model spectra of a Roche lobe-filling star, we found that reducing the C abundance
to [C/H] = -1.5 results in a better match to the CO lines in the spectrum than the [C/H] =
0, -1.0, and -2.0 models when the donor star fraction is fit to f = 0.77.
The weakness of the 12CO lines in A0620-00 has already been noted by Harrison et al.
(2007), who determine that the 12C abundance must be decreased 50% to match the depth
of the 12CO bandheads. The 50% reduction in 12C abundance ([12C/H] = -0.3) found by
Harrison et al. is significantly smaller than the 97% drop we claim. Again, however, we
have several concerns about the method by which Harrison et al. obtained their results.
First, they started with the line list included in the spectral synthesis program SPECTRUM
(Gray & Corbally 1994), but when they were unable to match the spectra of standard spec-
tral type field stars using this line list (and Kurucz atmospheres) due to the presence of
strong absorption features in the model but not the observed spectra, they abandoned it
and constructed one consisting only of Na I, Mg I, and CO transitions. When they found
that the lines in their new models were too weak to match those of field stars at the correct
temperature, they globally increased the log(gf) values of every line in their new line list
to match up with observations. They then applied this revised model to the spectrum of
A0620-00, adjusting the C isotopic abundances until the best fit by eye was achieved.
We are not confident in the quantitative reliability of spectral model fits in which spectral
– 14 –
lines have been dropped and oscillator strengths altered in order to achieve even a rough fit
to a K5V field star. We note in contrast that our LinBrod models give fits to the spectrum
of A0620-00 of comparable quality of those using template field stars with no deletions or
alterations to the spectral synthesis line lists required. Our other concern with the Harrison
et al. fits is the placement of the continuum and evaluation of the best model fit. They
determined the best-fit model by eye. To our eyes, however, the fits they show in their
Figure 3 do not properly take into account the actual data quality: their pseudo-continuum
levels appear too high and their CO line minima are too low in their preferred model. A
rough comparison of their observed spectrum to ours shows that the 12CO normalized line
depths are similar in both data sets, suggesting that a statistical model fit to their spectrum
would result in a 12C abundance similar to the one we obtain.
Finally, Harrison et al. contend that in addition to a low 12C abundance in A0620-00, the
13C abundance is enhanced, such that 13C/12C = 1. They base their identification of 13CO
on the presence of a feature coinciding with the 13CO(3,1) 2.374 µm bandhead. However,
they also note that the 13CO(2,0) 2.345 µm is not seen in their spectrum, despite being the
stronger feature of the two. Given the poor atmospheric conditions at the time their data
was taken and the increasing amount of telluric H20 absorption at these wavelengths, we
do not believe that their data provide a clear detection of 13CO and certainly not of 13C at
equal abundance with 12C. In our spectrum of A0620-00, there is a feature at the location of
the 13CO(2,0) 2.345 µm bandhead, but no feature coinciding with 13CO(3,1) 2.374 µm. We
have marked the locations of the first two 13CO bandheads in our Figure 4. Dhillon et al.
(2002) point out, however, that a Ti I feature is coincident with the 13CO(2,0) 2.345 µm
bandhead and is a much more likely explanation of the feature we see. As a result, we do
not believe that an unambiguous detection of 13CO can be made in our spectrum and we
find no evidence of any enhancement of this species in A0620-00.
Anomalously weak CO absorption features have been seen in other compact binary sys-
tems. The NIR spectra of several cataclysmic variables show weak or absent 12CO absorption
(Harrison et al. 2000, 2004). In the dwarf nova U Gem, the NIR CO absorption lines are sig-
nificantly weaker than in the M3V standard star spectrum that provides a good match to the
atomic lines. Model fits to the FUV spectrum of the metal-enriched white dwarf in U Gem
show that the C abundance on the WD surface is [C/H] = -1.0, while the N abundance
is highly super-solar, [N/H] = 0.7 (Sion et al. 1998; Long & Gilliland 1999; Froning et al.
2001). Similarly, in the UV spectrum of the BH XRB XTE J1118+480 in outburst, typically
strong emission lines of C IV and O V are undetectable, while the N V λ1240 Å appears
enhanced (Haswell et al. 2002). The UV line ratios are inconsistent with photoionization
models, leading Haswell et al. to conclude that the emission spectrum is indicative of the
accretion of C-depleted material from the donor star. As a result, A0620-00 can be added
– 15 –
to the ever-increasing list of cataclysmic variables and XRBs that show depleted C (and, in
some cases, enhanced N), pointing to a common history of nuclear processing of C to N in
compact binary systems.
4.2. The Mass of the Black Hole in A0620–00
Based on previous work, the mass of the BH in A0620-00 is known to the value of
one unknown, the binary inclination: M1 = (3.09± 0.09) sin
−3 i (Marsh, Robinson & Wood
1994). The orbital inclination can be obtained by modeling the light curve of the donor star.
The donor star fills its Roche lobe and is distorted in shape, which leads to a double-humped
ellipsoidal variation in the light curve, the amplitude of which is dependent on inclination. If
a source besides the donor star contributes to the light curve, the amplitude of the ellipsoidal
variation will be diluted, leading to an underestimate of the inclination and a corresponding
overestimate of the BH mass if the contaminating source is not taken into account.
The most precise inclination results to date were reported by Gelino et al. (2001), who
modeled JHK light curves of A0620-00. Based on good agreement between the target SED
and that of a dereddened K4 star, they concluded that the donor star is the only NIR
continuum source in A0620-00. Using this assumption, they modeled the ellipsoidal light
curve and found i = 40.◦75± 3◦ and M1 = 11.0± 1.9 M⊙. As discussed above, however, the
NIR SED alone is insufficient to resolve the degeneracy between donor star temperature and
veiling by another flux source in the system. Our results indicate that the donor star cannot
be the only NIR flux source in A0620-00 and that consequently, the Gelino et al. results
overestimate the BH mass in A0620-00.
We previously modeled the H-band light curve in A0620-00 with a donor star plus accre-
tion disk model and determined 38◦ ≤ i ≤ 75◦, or 3.3 ≤ M1 ≤ 13.6M⊙ (Froning & Robinson
2001). The broad range of values was caused by a degeneracy between the inclination
and the fractional contribution of the accretion disk to the H-band light. Table 5 of
Froning & Robinson (2001) gives the inclination in A0620-00 as a function of the fractional
contribution of diluting sources in the H-band. In § 3.2 of this paper, we determined that
the donor star contributes 82±2% of the H-band flux in A0620-00. This result, a diluting
fraction of 18±2%, combined with Table 5 in Froning & Robinson (2001) gives a binary
inclination for A0620-00 of i = 43± 1◦.
Based on this inclination, we obtain the mass of the BH accretor in A0620-00: M1 =
9.7±0.6M⊙. This result is comparable to previous estimates of the BH mass in the literature.
Shahbaz, Naylor & Charles (1994) found a BH mass of 10 M⊙, while Gelino et al. (2001)
– 16 –
found M1 = 11.0±1.9 M⊙. The inclination we derive is within the error interval of Gelino
et al. The difference in BH masses between the two results comes from the slightly higher
inclination we adopt as a result of our determination that the donor star cannot be the sole
NIR emission source.
The error bar on our derived BH mass represents the propagated statistical errors in
the result. A potential source of systematic uncertainty is our assumption that the atomic
absorption spectrum of A0620-00 can be modeled by template spectra with solar abundances.
The fact that the C abundance in A0620-00 has to be decreased significantly to match the
CO lines suggests the need for caution in this regard. However, the relative line ratios of
the atomic transitions largely agree with each other in the derived fractional donor star
contributions. These lines are also strong transitions that lie on the flat portion of the curve
of growth and will not be sensitive to small abundance variations. Finally, we note that while
González Hernández et al. (2004) derived slightly super-solar abundances for the metal lines
in A0620-00, they used a stellar temperature that is too hot to be consistent with the NIR
SED. The adoption of a cooler temperature for the donor star will cause the metal line
abundances required to fit the optical spectrum to decrease.
Another potential source of systematic error is the large time interval (8 years) between
acquisition of the NIR light curve and the spectra. Our analysis assumed that the donor star
diluting fraction found by analyzing the spectra applies equally well to the light curve data.
This assumption is valid, we believe, because while A0620-00 is variable, its NIR colors typi-
cally don’t vary by more than 0.2 mag, and six observations of the H-band light curve spaced
over days to years had mean colors that agreed to within 0.04 mag (Froning & Robinson 2001;
Gelino et al. 2001). Our rough estimate of the absolute calibration of our time-averaged
spectrum was also consistent with the previous measurements. We can estimate the uncer-
tainty in the non-donor star contribution by assuming that it could vary by ±4%, consistent
with the previous measurements. This would cause the donor star fraction to range from
79 ≤ f ≤ 86, which results in i = 43± 1◦, consistent with our statistical uncertainties.
We thank Nathaniel Cunningham for assistance in observing A0620-00, and the staff at
the IRTF for their support. We also thank Chris Sneden and Niall Gaffney for their help in
calculating the LinBrod donor star spectra and for useful discussions.
Facilities: IRTF(SpeX)
– 17 –
REFERENCES
Ali, B., Carr, J. S., Depoy, D. L., Frogel, J. A., & Sellgren, K. 1995, AJ, 110, 2415
Bailyn, C. D., Jain, R. K., Coppi, P., & Orosz, J. A. 1998, ApJ, 499, 367
Bitner, M. A. & Robinson, E. L. 2006, AJ, 131, 1712
Cardelli, J. A., Clayton, G. C., & Mathis, J. S. 1989, ApJ, 422, 158
Cushing, M. C., Vacca, W. D., & Rayner, J. T. 2004, PASP, 116, 362
Dhillon, V. S., Littlefair, S. P., Marsh, T. R., Sarna, M. J., & Boakes, E. H. 2002, A&A,
393, 611
Dubus, G., Campbell, R., Kern, B., Taam, R. E., & spruit, H. C. 2004, MNRAS, 349, 869
Elvis, M., Page, C. G., Pounds, K. A., Ricketts, M. J., & Turner, M. J. L. 1975, Nature,
257, 656
Förster Schreiber 2000, AJ, 120, 2089
Froning, C. S., Long, K. S., Drew, J. E., Knigge, C., & Proga, D. 2001, ApJ, 562, 963
Froning, C. S., & Robinson, E. L. 2001, AJ, 121, 2212
Fryer, C. L. & Kalogera, V. 2001, ApJ, 554, 548
Gelino, D. M. & Harrison, T. E. 2003, ApJ, 599, 1254
Gelino, D. M., Harrison, T. E., & Orosz, J. A. 2001, AJ, 122, 2668
González Hernández, J. I., Rebolo, R., Israelian, G., Casares, J., Maeder, A., & Meynet, G.
2004, ApJ, 609, 988
Gray, R. O. & Corbally, C. J. 1994, AJ, 107, 742
Harrison, T. E., Howell, S. B., Szkody, P., & Cordova, F. A. 2007, AJ, 133, 162
Harrison, T. E., Osborne, H. L., & Howell, S. B. 2005, AJ, 129, 2400
Harrison, T. E., Osborne, H. L., & Howell, S. B. 2004, AJ, 127, 3493
Harrison, T. E., McNamara, B. J., Szkody, P., & Gilliland, R. L. 2000, AJ, 120, 2649
Haswell, C. A., Hynes, R. I., King, A. R., & Schenker, K. 2002, MNRAS, 332, 928
– 18 –
Haswell, C. A., Robinson, E. L., Horne, K., Stiening, R. F. & Abbott, T. M. C. 1993, ApJ,
411, 802
Hynes, R. I., Robinson, E. L., & Bitner, M. 2005, ApJ, 630, 405
Kleinmann, S. G. & Hall, D. N. B. 1986, ApJ, 62, 501
Leibowitz, E. M., Hemar, S., & Orio, M. 1998, MNRAS, 300, 463
Long, K. S. & Gilliland, R. L. 1999, ApJ, 511, 916
Marsh, T. R., Robinson, E. L. & Wood, J. H. 1994, MNRAS, 266, 137
McClintock, J. E. & Remillard, R. A. 2005, in Compact Stellar X-ray Sources, eds. W. Lewin
& M. van der Klis, (Cambridge: Cambridge University Press), in press
McClintock, J. E. & Remillard, R. A. 1986, ApJ, 308, 110
McClintock, J. E., Petro, L. D., Remillard, R. A. & Ricker, G. R. 1983, ApJL, 266, L27
Meyer, M. R., Edwards, S., Hinkle, K. H., & Strom, S. E. 1998, ApJ, 508, 397
Muno, M. P. & Mauerhan, J. 2006, ApJ, 648, L135
Oke, J. B. 1977, ApJ, 217, 181
Origlia, L., Moorwood, A. F. M., & Oliva, E. 1993, A&A, 280, 536
Orosz, J. A., Bailyn, C. B., Remillard, R. A., McClintock, J. E. & Foltz, C. B. 1994, ApJ,
436, 848
Rayner, J. T., Toomey, D. W., Onaka, P. M., Denault, A. J., Stahlberger, W. E., Vacca,
W. D., Cushing, M. C., & Wang, S. 2003, PASP, 115, 362
Shahbaz, T., Hynes, R. I., Charles, P. A., Zurita, C., Casares, J., Haswell, C. A., Araujo-
Betancor, S., & Powell, C. 2004, MNRAS, 354, 31
Shahbaz, T., Bandyopadhyay, R. M. & Charles, P.A. 1999, A&A, 346, 82
Shahbaz, T., Naylor, T. & Charles, P. A. 1994, MNRAS, 268, 756
Sion, E. M., et al. 1998, ApJ, 496, 449
Vacca, W. D., Cushing, M. C., & Rayner, J. T. 2003, PASP, 115, 389
Wallace, L., Meyer, M. R., Hinkle, K., & Edwards, S. 2000, ApJ, 535, 325
– 19 –
Wu, C.-C., Aalders, J. W. G., van Duinen, R. J., Kester, D., & Wesselius, P. R. 1976, A&A,
50, 445
This preprint was prepared with the AAS LATEX macros v5.2.
– 20 –
– 21 –
Table 1. SpeX Observations
Object Date (UT) Instrument Texp (min) Φ
A0620-00 2004 Jan 8 SpeX 290 0.59 – 1.47
A0620-00 2004 Jan 9 SpeX 280 0.69 – 1.49
A0620-00 2004 Jan 10 SpeX 250 0.92 – 1.61
HD42606 (K2.5 III) 2004 Jan 18 SpeX 1.2 · · ·
HD 3765 (K2 V) 2004 Jan 18 SpeX 2 · · ·
HD16160 (K3 V) 2004 Jan 19 SpeX 1.8 · · ·
61 Cyg A (K5 V) 2000 Sept 15 SpeX 0.5 · · ·
61 Cyg B (K7 V) 2000 Sept 15 SpeX 0.5 · · ·
aOrbital phase coverage of A0620–00 for each night’s observations, based
on the ephemeris of McClintock & Remillard (1986).
– 22 –
Table 2. Equivalent Widths of Selected Absorption Lines
Feature EWa (Å) EWb (Å)
Si I 1.5892 1.52±0.04 · · ·
12CO (6,3) 1.6187 0.86±0.07 · · ·
Na I 2.2076 0.94±0.06 2.18±0.04
Fe I 2.2263 0.36±0.05 · · ·
Fe I 2.2387 0.40± · · ·
Ca I 2.2636 1.06±0.06 3.21±0.03
Mg I 2.2814 0.43±0.04 0.68±0.06
12CO (2,0) 2.2935 0.38±0.06 2.26±0.08
12CO (3,1) 2.3227 0.22±0.02 · · ·
13CO (2,0) 2.3448 0.64±0.04 · · ·
aEWs calculated using the integration lim-
its of Förster Schreiber 2000.
bEWs calculated using the integration lim-
its of Ali et al. (1995).
– 23 –
Table 3. Wavelength Ranges for Spectral Fits
Waveband λ Range (mum) Description
J 1.10 – 1.26 Long J
1.175 – 1.215 Blend incl. Mg I, Fe I, Si I
H 1.4 – 1.8 Full H band
1.48 – 1.52 Mg1 lines.
1.56 – 1.61 Blend incl. Mg I, Si I
1.70 – 1.72 Mg1
K 1.90 – 2.02 Several Ca I lines.
2.07 – 2.15 Short K incl. Mg I, Si I, Al I.
2.18 – 2.42 Long K incl. CO bandhead.
2.18 – 2.28 No CO, lines incl. Na I, Fe I, Ca I, Mg I.
– 24 –
Table 4. Template Star Fits to A0620–00 Spectrum
Template Wavelength Range Template Fraction rms
Spectral Type (µm) (f)
K3V 1.10–1.26 0.78 0.012
1.175 – 1.215 0.82 0.010
1.4–1.8 0.76 0.016
1.48 – 1.52 0.71 0.011
1.56 – 1.61 0.81 0.011
1.70 – 1.72 0.91 0.009
K5V 1.10 – 1.26 0.90 0.013
1.175 – 1.215 1.0 0.010
1.4 – 1.8 0.78 0.016
1.48 – 1.52 0.82 0.010
1.56 – 1.61 0.88 0.015
1.70 – 1.72 0.78 0.010
1.90 – 2.02 0.37 0.02
2.07 – 2.15 0.52 0.01
2.18 – 2.42 0.45 0.012
2.18 – 2.28 0.81 0.009
K7V 1.10 – 1.26 0.87 0.013
1.175 – 1.215 0.99 0.011
1.4 – 1.8 0.76 0.017
1.48 – 1.52 0.82 0.010
1.56 – 1.61 0.85 0.015
1.70 – 1.72 0.77 0.009
1.90 – 2.02 0.39a 0.021
2.07 – 2.15 · · · b · · ·
2.18 – 2.42 0.37 0.014
2.18 – 2.28 0.76 0.009
– 25 –
aFits too diluted due to noise in spectra.
bFits compromised by spurious feature in template.
– 26 –
Table 5. LinBrod Fits to A0620–00 Spectrum
Model Temperature Model C Abundance Wavelength Range Template Fraction rms
(K) (log[C/H]/[C/H]⊙) (µm) (f)
4000 0.0 2.18 – 2.28 0.72 0.011
4250 0.0 2.18 – 2.28 0.76 0.011
4500 0.0 2.18 – 2.28 0.99 0.011
Full Long K Region Including CO Lines
4000 0.0 2.18 – 2.38 0.14 0.015
4000 -0.5 2.18 – 2.38 0.28 0.014
4000 -1.0 2.18 – 2.38 0.54 0.012
4000 -1.5 2.18 – 2.38 0.78 0.011
4000 -2.0 2.18 – 2.38 0.74 0.012
4250 0.0 2.18 – 2.38 0.14 0.015
4250 -0.5 2.18 – 2.38 0.28 0.014
4250 -1.0 2.18 – 2.38 0.56 0.013
4250 -1.5 2.18 – 2.38 0.82 0.012
4250 -2.0 2.18 – 2.38 0.81 0.012
4500 0.0 2.18 – 2.38 0.17 0.015
4500 -0.5 2.18 – 2.38 0.35 0.014
4500 -1.0 2.18 – 2.38 0.71 0.013
4500 -1.5 2.18 – 2.38 0.96 0.012
4500 -2.0 2.18 – 2.38 0.93 0.013
Longward of the 2.29 µm Bandhead Onlya
4000 -0.5 2.28 – 2.38 0.77 0.025
4000 -1.0 2.28 – 2.38 0.77 0.014
4000 -1.5 2.28 – 2.38 0.77 0.012
4000 -2.0 2.28 – 2.38 0.77 0.013
4250 -0.5 2.28 – 2.38 0.77 0.024
– 27 –
Table 5—Continued
Model Temperature Model C Abundance Wavelength Range Template Fraction rms
(K) (log[C/H]/[C/H]⊙) (µm) (f)
4250 -1.0 2.28 – 2.38 0.77 0.014
4250 -1.5 2.28 – 2.38 0.77 0.012
4250 -2.0 2.28 – 2.38 0.77 0.013
4500 -0.5 2.28 – 2.38 0.77 0.020
4500 -1.0 2.28 – 2.38 0.77 0.014
4500 -1.5 2.28 – 2.38 0.77 0.013
4500 -2.0 2.28 – 2.38 0.77 0.014
aDonor star fraction fixed in these models to the best-fit value to the nearby atomic lines
from the template star fits.
– 28 –
Fig. 1.— The NIR spectrum of A0620–00, obtained in 2004 January. The solid line shows
the time-averaged spectrum of A0620. Individual exposures were shifted to remove the
orbital motion of the donor star before averaging. The dotted line shows the spectrum after
dereddening, assuming E(B–V) = 0.39 (Wu et al. 1976).
Fig. 2.— The spectrum of A0620-00 in J. Prominent spectral features are labeled. An error
bar representative of the statistical uncertainty per resolution element is plotted on the far
right of the plot. Also shown at the bottom of the figure is the spectrum of HD45137, the
A0V star used for telluric correction of the A0620–00 spectra. The spectrum retains the
throughput profile of each spectral order but is not shown with absolute flux calibration.
The H I lines intrinsic to the A0V spectrum have been fitted and removed using the xtellcor
program developed by the IRTF.
Fig. 3.— The spectrum of A0620-00 in H. Prominent spectral features are labeled. A
representative error bar for a resolution element is plotted on the far right. The telluric
spectrum is also shown at the bottom of the figure.
Fig. 4.— The spectrum of A0620-00 in K. Prominent spectral features are labeled. A
representative error bar for a resolution element is plotted on the far right. The telluric
spectrum is also shown.
Fig. 5.— Shown in black is the dereddened spectrum of A0620-00. Shown in gray is the
spectrum of 61 Cyg A, a K5V spectral type star. The spectrum of 61 Cyg A has been
normalized to the flux of A0620 just blueward of the 12CO 2.29 µm bandhead.
Fig. 6.— The normalized H-band spectrum of A0620–00 with a scaled spectral type standard
star fit. The standard star, shown in red, is 61 Cyg A, a K5V star. It has been scaled by
f = 0.76.
Fig. 7.— The normalized K-band spectrum of A0620–00 with a scaled spectral type standard
star fit. The template star is 61 Cyg A, a K5V star. The solid red spectrum shows the
template scaled by f = 0.45, the best fit over the full 2.18 – 2.42 µm range. The dashed red
spectrum shows the template scaled by f = 0.81, the best fit to the 2.18 – 2.28 µm region.
Fig. 8.— The normalized K-band spectrum of A0620–00 with scaled LinBrod T = 4000 K
model spectra fits. The solid red line shows the LinBrod model with [C/H] = -1.5, while the
dashed red line shows the solar abundance model. To avoid confusion, the latter is shown
only for λ > 2.288 µm. The models are scaled by f = 0.77.
Fig. 9.— The top panel shows the dereddened spectrum of A0620-00 in black and the
spectrum of 61 Cyg A, a K5V star, in gray. The spectrum of the K5V template has been
– 29 –
scaled to 82% of the A0620-00 flux at the center of the H band, 1.6 µm. The lower panel shows
the NIR spectrum of the accretion disk in A0620-00, created by subtracting the template
spectrum from that of A0620-00.
– 30 –
f1.eps
– 31 –
f2.eps
– 32 –
f3.eps
– 33 –
f4.eps
– 34 –
f5.eps
– 35 –
f6.eps
– 36 –
f7.eps
– 37 –
f8.eps
– 38 –
f9.eps
	Introduction
	Observations and Data Reduction
	Analysis
	Classification Based on Spectral Indices
	Fitting Spectral Type Standard Stars to the Spectrum of A0620-00
	Fitting Model Spectra to the Spectrum of A0620-00
	Discussion and Conclusions
	The Donor Star in A0620–00
	The Donor Star Spectral Type and Fractional Contribution to the NIR Spectrum
	Weak CO Features in the Donor Star Spectrum
	The Mass of the Black Hole in A0620–00
ABSTRACT
  We present broadband NIR spectra of A0620-00 obtained with SpeX on the IRTF.
The spectrum is characterized by a blue continuum on which are superimposed
broad emission lines of HI and HeII and a host of narrower absorption lines of
neutral metals and molecules. Spectral type standard star spectra scaled to the
dereddened spectrum of A0620-00 in K exceed the A0620-00 spectrum in J and H
for all stars of spectral type K7V or earlier, demonstrating that the donor
star, unless later than K7V, cannot be the sole NIR flux source in A0620-00. In
addition, the atomic absorption lines in the K3V spectrum are too weak with
respect to those of A0620-00 even at 100% donor star contribution, restricting
the spectral type of the donor star in A0620-00 to later than K3V. Comparison
of the A0620-00 spectrum to scaled K star spectra indicates that the CO
absorption features are significantly weaker in A0620-00 than in field dwarf
stars. Fits of scaled model spectra of a Roche lobe-filling donor star to the
spectrum of A0620-00 show that the best match to the CO absorption lines is
obtained when the C abundance is reduced to [C/H] = -1.5. The donor star
contribution in the H waveband is determined to be 82+-2%. Combined with
previous published results from Froning & Robinson (2001) and Marsh et al.
(1994), this gives a precise mass for the black hole in A0620-00 of M_BH =
9.7+-0.6 M_solar.

<|endoftext|><|startoftext|>
Introduction
Quantum computing offers us the opportunity to
solve certain problems thought to be intractable
on a classical machine. For example, the follow-
ing classically hard problems benefit from quan-
tum algorithms: factorization [19], unsorted database
search [6], and simulation of quantummechanical sys-
tems [26].
In addition to significant work on quantum al-
gorithms and underlying physics, there have been
several studies exploring architectural trade-offs for
quantum computers. Most such research [3, 16] has
focused on simulating quantum algorithms on a fixed
layout rather than on techniques for quantum circuit
synthesis and layout generation. These studies tend
Classical Control:
HDL Format
(plus annotations
for scheduling)
Quantum Layout
(including initial
qubit placement)
Basic 
Blocks
Custom 
Modules
Quantum 
Circuit 
Specification
CAD Flow for 
Quantum Circuits
New Custom Module
Figure 1: The goal of our CAD flow is to automate
the laying out of a quantum circuit to generate a
physical layout, an intelligent initial placement of
qubits, the associated classical control logic and an-
notations to help the online scheduler better use the
layout optimizations as they were intended. This flow
may then be used recursively to design larger blocks
using previously created modules.
to use hand-generated and hand-optimized layouts on
which efficient scheduling is then performed. While
this approach is quite informative in a new field, it
quickly becomes intractable as the size of the circuit
grows.
Our goal is to automate most of the tasks involved
in generating a physical layout and its associated con-
trol logic from a high-level quantum circuit specifica-
tion (Figure 1). Our computer-aided design (CAD)
flow should process a quantum circuit specification
and produce the following:
• a physical layout in the desired technology
• an intelligent initial qubit placement in the lay-
• classical control circuitry specified in some hard-
ware description language (HDL), which may
then be run through a classical CAD flow
http://arxiv.org/abs/0704.0268v1
• a set of annotations or “hints” for the online
scheduler, allowing a tighter coupling of layout
optimizations to actual runtime operation
Much like a classical CAD flow, this quantum CAD
flow is intended to be used hierarchically. We begin
with a set of technology-specific basic blocks (some
ion trap technology examples are given in Section 2).
We then lay out some simple quantum circuits with
the CAD flow, thus creating custom modules. The
CAD flow may then be used recursively to create ever
larger designs. This approach allows us to develop,
evaluate and reuse design heuristics and avoids both
the uncertainty and time-intensive nature of hand-
generated layouts.
1.1 Motivation for a Quantum CAD
Quantum circuits that are large enough to be “inter-
esting” require the orchestration of hundreds of thou-
sands of physical components. In approaching such
problems, it is important to build upon prior work in
classical CAD flows. Although the specifics of quan-
tum technologies (such as are discussed in Section 2)
are different from classical CMOS technologies, prior
work in CAD research can give us insight into how
to approach the automated layout of quantum gates
and channels.
Further, quantum circuits exhibit some interesting
properties that lend themselves to automatic synthe-
sis and computer-aided design techniques:
Quantum ECC Quantum data is extremely frag-
ile and consequently must remain encoded at all
times – while being stored, moved, and com-
puted upon. The encoded version of a circuit
is often two or three orders of magnitude larger
than the unencoded version. Further, the ap-
propriate level of encoding may need to be se-
lected as part of the layout process in order to
achieve an appropriate “threshold” of error-free
execution. Rather than burdening the designer
with the complexities of adding fault-tolerance
to a circuit, computer-aided synthesis, design
and verification can perform such tasks automat-
ically.
Ancillae Quantum computations use many helper
qubits known as ancillae. Ancillae consist of
bits that are constructed, utilized and recycled
as part of a computation. Sometimes, ancillae
are explicit in a designer’s view of the circuit.
Often, however, they should be added automat-
ically in the process of circuit synthesis, such as
during the construction of fault-tolerant circuits
from high-level circuit descriptions. An auto-
matic design flow can insert appropriate circuits
to generate and recycle ancillae without involv-
ing the designer.
Teleportation Quantum circuits present two pos-
sibilities for data transport: ballistic movement
and teleportation. Ballistic movement is rela-
tively simple over short distances in technologies
such as ion traps (Section 2). Teleportation is an
alternative that utilizes a higher-overhead distri-
bution network of entangled quantum bits to dis-
tribute information with lower error over longer
distances [9]. The choice to employ teleportation
is ideally done after an initial layout has deter-
mined long communication paths. Consequently,
it is a natural target for a computer-aided design
flow.
1.2 Contributions
In this paper, we make the following contributions:
• We propose a CAD flow for automated design of
quantum circuits and detail the necessary com-
ponents of the flow.
• We describe a technique for automatic synthe-
sis of the classical control circuitry for a given
layout.
• We show that different grid-based architectures,
which have been the focus of most prior work in
this field, exhibit vastly varying performance for
the same circuit.
• We present heuristics for the placement and
routing of quantum circuits in ion trap technol-
• We lay out some quantum error correction cir-
cuits and evaluate the effectiveness of the heuris-
tics in terms of circuit area and latency.
Dead 
End Gate
Straight 
Channel
Three-Way 
Intersection
Four-Way 
Intersection
P0P0P0P0
P1P1P1
Straight 
Channel Gate
Figure 2: Example library of basic macroblocks.
Each macroblock has a specific number of ports
(shown as P0-P3) along with a set of electrodes used
for ion movement and trapping. Some macroblocks
contain a trap region where gates may be performed
(black square).
1.3 Paper Organization
The rest of this paper is organized as follows. We
introduce our chosen technology in Section 2, fol-
lowed by an overview of prior work in the field in
Section 3. In Section 4, we detail our proposed CAD
flow and our evaluation metrics. In Section 5, we de-
scribe the control circuitry interface and scheduling
protocol that we use in the following sections. Sec-
tion 6 contains a study of grid-based layouts, which
have been the basis of most prior work on this sub-
ject. In Section 7, we present a greedy approach to
laying out quantum circuits, followed in Section 8
by a much more scalable dataflow analysis-based ap-
proach to layout. Section 9 contains our experimental
results for all three approaches to layout generation,
and we conclude in Section 10.
2 Ion Traps
For our initial study, we choose trapped ions [4, 17] as
our substrate technology. Trapped ions have shown
good potential for scalability [10]. In this technol-
ogy, a physical qubit is an ion, and a gate is a loca-
tion where a trapped ion may be operated upon by a
modulated laser.
The ion is both trapped and ballistically moved
by applying pulse sequences to discrete electrodes
which line the edges of ion traps. Figure 3a shows an
experimentally-demonstrated layout for a three-way
intersection [7]. A qubit may be held in place at any
trap region, or it may be ballistically moved between
them using the gray electrodes lining the paths.
Rather than using ion traps as basic blocks, we de-
fine a library of macroblocks consisting of multiple
traps for two reasons. First, macroblocks abstract
out some of the low-level details, insulating our anal-
yses from variations in the technology implementa-
tions of ion traps. Details such as which ion species
is used, specific electrode sizing and geometry (clearly
variable in the layout in Figure 3a) and exact voltage
levels necessary for trapping and movement are all
encapsulated within the macroblock. Second, ballis-
tic movement along a channel requires carefully timed
application of pulse sequences to electrodes in non-
adjacent traps. By defining basic blocks consisting of
a few ion traps, we gain the benefit that crossing an
interface between basic blocks requires communica-
tion only between the two blocks involved.
We use the library of macroblocks shown in Fig-
ure 2, each of which consists of a 3x3 grid of trap re-
gions and electrodes, with ports to allow qubit move-
ment between macroblocks. The black squares are
gate locations, which may not be performed at inter-
sections or turns in ion trap technology. Each of these
macroblocks may be rotated in a layout. This library
is by no means exhaustive, however it does provide
the major pieces necessary to construct many physi-
cal circuits. The macroblocks we present are abstrac-
tions of experimentally-demonstrated ion trap tech-
nology [7, 18]. In Figure 3, we show how one can map
a demonstrated layout (Figure 3a) to our macroblock
abstractions (Figure 3b). We model this layout as
a set of StraightChannel and ThreeWayIntersection
macroblocks. Above the ion trap plane is an array of
MEMS mirrors which routes laser pulses to the gate
locations in order to apply quantum gates [11], as
shown in Figure 3c.
Some key differences between this quantum circuit
technology and classical CMOS are as follows:
• “Wires” in ion traps consist of rectangular chan-
nels, lined with electrodes, with atomic ions sus-
pended above the channel regions and moved
ballistically [13]. Ballistic movement of qubits
requires synchronized application of voltages on
channel electrodes to move data around. Thus
each wire requires movement control circuitry to
handle any qubit communication.
• A by-product of the synchronous nature of the
qubit wire channels is that these circuits can
be used in a synchronous manner with no ad-
ditional overhead. This enables some convenient
pipelining options which will be discussed in Sec-
tion 8.1.
Figure 3: a) Experimentally demonstrated physical layout of a T-junction (three-way intersection). b)
Abstraction of the circuit in (a), built using the StraightChannel and ThreeWayIntersection macroblocks
shown in Figure 2. c) The ion traps are laid out on a plane, above which is an array of MEMS mirrors used
to route and split the laser beams that apply quantum gates.
• Each gate location will likely have the ability
to perform any operation available in ion trap
technology. This enables the reuse gate locations
within a quantum circuit.
• Scalable ion trap systems will almost certainly
be two-dimensional due to the difficulty of fab-
ricating and controlling ion traps in a third di-
mension [8]. This means that all ion crossings
must be intersections.
• Any routing channel may be shared by multi-
ple ions as long as control circuits prevent multi-
ion occupancy. Consequently, our circuit model
resembles a general network, although schedul-
ing the movement in a general networking model
adds substantial complexity to our circuit.
• Movement latency of ions is not only dependent
on Manhattan distance but also on the geometry
of the wire channel. Experimentally, it has been
shown that a right angle turn takes substantially
longer than a straight channel over the same dis-
tance [18, 7].
3 Related Work
Prior research has laid the groundwork for our quan-
tum circuit CAD flow. Svore et al [22, 23] proposed
a design flow capable of pushing a quantum program
down to physical operations. Their work outlined
various file formats and provided initial implementa-
tions of some of the necessary tools. Similarly, Balen-
siefer et al [2, 3] proposed a design flow and compi-
lation techniques to address fault-tolerance and pro-
vided some tools to evaluate simple layouts. While
our CAD flow builds upon some of these ideas, we
concentrate on automatic layout generation and con-
trol circuitry extraction.
Additionally, initial hand-optimized layouts have
been proposed in the literature. Metodi et al [15] pro-
posed a uniform Quantum Logic Array architecture,
which was later extended and improved in [24]. Their
work concentrated on architectural research and did
not delve into details of physical layout or scheduling.
Finally, Metodi et al [16] created a tool to automati-
cally generate a physical operations schedule given a
quantum circuit and a fixed grid-based layout struc-
ture. We extend and improve upon their work by
adding new scheduling heuristics capable of running
on grid-based and non-grid-based layouts.
Maslov et al [14] have recently proposed heuristics
for the mapping of quantum circuits onto molecules
used in liquid state NMR quantum computing tech-
nology. Their algorithm starts with a molecule to be
used for computation, modeled as a weighted graph
with edges representing atomic couplings within the
molecule. The dataflow graph of the circuit is
mapped onto the molecule graph with an effort to
minimize overall circuit runtime. Our techniques fo-
cus on circuit placement and routing in an ion trap
technology and do not use a predefined physical sub-
strate topology as in the NMR case. A new ion trap
geometry is instead generated by our toolset for each
circuit.
4 Quantum CAD Flow
The ultimate goal of a quantum CAD flow is identical
to that of a standard classical CAD flow: to automate
High-Level Description
Synthesizer
Tech-Independent 
Netlist
Tech Mapping
(Tech-Specific Gates, Encoding, Fault Tolerance)
Tech Parameter File 
(Basic Blocks)
Custom Modules
Tech-Dependent Netlist
Interconnect
(if necessary)
Placement
and Routing
Classical Control Synthesis
for Qubit Movement
Geometry-Aware 
Netlist
Figure 4: An overview of our CAD flow for quantum
circuits. Ovals represent files; rectangles represent
tools. The gray area highlights the portions on which
we focus in this paper.
the synthesis and laying out of a circuit. For a quan-
tum CAD flow, the output circuit consists of both the
quantum portion and the associated classical control
logic.
The quantum CAD flow we present elaborates on
the design flows described in prior works [3, 22, 23].
Unlike prior work, our CAD flow addresses the need
to integrate automatic generation of classical control
into the flow. Figure 4 shows an overview of our CAD
toolset. Rectangles are tools, while ovals represent
intermediate file formats. Our toolset is built to be
as similar to classical CAD flows as possible, while
still accounting for the differences between classical
and quantum computing described in Section 1.1.
At the top, we begin with a high-level description
of the desired quantum circuit. At present this spec-
ification consists of a sequence of quantum assembly
language (QASM [3]) instructions implementing the
desired circuit, since this is a convenient format al-
ready being used by various third-party tools. We are
currently investigating extension of this high-level de-
scription to other formats, such as schematic entry,
mathematical formulae or a more general high-level
language.
The synthesizer parses the QASM file and gener-
ates a technology-independent netlist stored in XML
format. From this point onward (downward in the
figure), all file formats are XML. Additionally, infor-
mation may be modified or added but generally not
removed. As we move down the flow, we add more
and more low level details, but we also keep high-
level information such as encoded qubit groupings,
nested layout modules, distinction between ancillae
and data, etc. This allows low-level tools to make
more intelligent decisions concerning qubit placement
and channel needs based on high-level circuit struc-
ture. It likewise allows logical level modification at
the lowest levels without having to attempt to deduce
qubit groupings.
A technology parameter file specifies the complete
set of basic blocks available for the layout (see exam-
ples in Figure 2), as well as design rules for connect-
ing them. A basic block specification contains the
following:
• the geometry of the block in enough detail to
allow fabrication
• control logic for each operation possible within
the block (including both movement and gates)
• control logic for handling each operation possible
at each interface
The most basic function of the technology map-
ping tool is to take a technology-independent netlist
and map it onto allowed basic blocks to create the
technology-dependent netlist. This may be more or
less complicated depending upon the complexity of
the basic blocks. In addition, it may need to trans-
late to technology-specific gates (in case the QASM
file uses gates not available in this technology), en-
code the qubits used in the circuit (perhaps also auto-
matically adding the ancilla and operation sequences
necessary for error correction) and add fault tolerance
to the final physical circuit.
In the initial technology-dependent netlist, all
qubits are physical qubits, meaning that encoding
levels have been set (though they may still be modi-
fied later). At this point, any technology-specific op-
timizations may optionally be applied to the physical
circuit encapsulated in this netlist. Additionally, if
the circuit is complex enough to warrant the inclusion
of a teleportation-based interconnection network [9],
it is added to the netlist here using the higher level
qubit grouping information in the netlist.
Once the designer is happy with the netlist, a place-
ment and routing tool lays out the netlist and adds
any further channels needed for communication. This
geometry-aware netlist may be iterated upon as nec-
essary to refine the layout. Once the layout is final-
ized, the classical control synthesis tool combines the
control logic of the various components of the design,
integrates interface control mechanisms to function
properly and generates the unified control structure
for the entire layout. Our control synthesis tool gen-
erates a Verilog file, which may then be run through
a classical CAD flow for implementation.
The layout specification along with the con-
trol logic file together comprise the geometry-aware
netlist, which is the end result for the quantum cir-
cuit initially specified in the high-level description.
In order to allow hierarchical design of larger quan-
tum circuits, we may now add this geometry-aware
netlist to our set of custom modules. Future technol-
ogy mappings may use both the basic blocks speci-
fied in the technology parameter file and any custom
modules we create (or acquire).
The gray area in Figure 4 identifies the portions
we shall be focusing on for the rest of this paper. We
currently process the high-level description (a QASM
file) directly into a technology-dependent netlist for
ion traps using the macroblocks shown in Figure 2.
Thus we perform a tech mapping, but no automatic
encoding, interconnect or addition of gates for fault
tolerance. In this paper, we focus on laying out low-
level circuits, such as those for encoded ancilla gen-
eration and error correction. The classical control
synthesis box of the CAD flow is discussed in Sec-
tion 5, while placement and routing are analyzed and
compared in Sections 6, 7, 8 and 9.
We use two main metrics to evaluate the perfor-
mance of our CAD flow: area and latency. For area,
we consider the bounding box around the layout, so
irregularly-shaped layouts are penalized (since they
have wasted space). To determine latency of circuit
execution, we use the scheduling heuristic described
in Section 5.2 and extended in Section 8.3. A third
metric of interest is fault-tolerance. For small layouts
and circuits, we can use third-party tools to deter-
mine whether a given layout and schedule is fault-
tolerant [5], but we do not currently use the fault-
tolerance metric in our iterative design flow. We use
area and latency because, to a first approximation,
lower area and lower latency are likely to decrease de-
coherence. Previous algorithms to accurately deter-
mine the error tolerance of a quantum circuit have in-
volved very computationally-intensive analyses that
would be inappropriate for circuits with more than
a few dozen gates [1]. However, we are looking into
ways to incorporate fault tolerance as a metric.
5 Control
The classical control system is responsible for exe-
cuting the quantum circuit, including deciding where
and when gate operations occur and tracking and
managing every qubit in the system. It is composed
of the following major components: instruction issue
logic, gate control logic and macroblock control logic.
Instruction issue logic handles all instruction schedul-
ing and determines qubit movement paths. Gate con-
trol logic oversees laser resource arbitration, deciding
which requested gate operations may occur at any
given time. The macroblock control logic, which con-
sists of an individual logic block for each macroblock
in the system, handles all the internals of the mac-
roblock, including details of gate operation for each
gate possible within the macroblock, qubit movement
within the macroblock and qubit movement into and
out of the ports.
5.1 Control Interfaces
The first step in the control flow involves process-
ing the quantum circuit’s high-level description (the
QASM file). The instruction issue logic accepts this
stream of instructions as input and creates a series
of qubit control messages. Using these qubit control
messages, macroblock control logic blocks can deter-
mine where to move qubits and when to execute a
gate operation. Qubit control messages are simple
bit streams composed of a qubit ID, along with a se-
quence of commands, as shown in Figure 5. When
a qubit needs to perform an action, the instruction
issue logic sends to it an appropriate control message
which travels with the qubit as it traverses the lay-
out. Once a macroblock receives a qubit and its corre-
sponding control message, it uses the first command
in the sequence to determine the operation it must
perform. The macroblock then removes the com-
Figure 5: Example of how a qubit control message is
constructed to move a qubit through a series of mac-
roblocks. The qubit enters M0 and travels through
M1 and M2, arriving at M3 where it is instructed to
perform a CNOT.
mand bits used and passes on the remaining control
message to the next macroblock into which the qubit
travels. In this manner, the instruction issue logic
can create a multi-command qubit control message
that specifies the path a qubit will traverse through
consecutive macroblocks, along with where gate op-
erations take place. The instruction issue logic only
has to transmit this control message to the source
macroblock, relying on the inter-macroblock commu-
nication interface to handle the rest.
Communication between the instruction issue logic
and the macroblocks takes place using a shared con-
trol message bus in order to minimize the number
of wire connections required by the instruction issue
logic. Each macroblock listens to the control message
bus for messages addressed to it and only processes
messages with a destination ID that match the mac-
roblock’s ID. A macroblock is only responsible for
monitoring the control message bus if it contains a
qubit that has no remaining command bits. This con-
dition generally occurs after a gate operation, when
the instruction issue logic is deciding what action the
qubit should take next. Once the instruction issue
logic sends a new control message for the qubit, the
macroblock resumes operation.
Macroblocks communicate with each other via con-
trol signals associated with each quantum port in the
macroblock. Each port has signals to control qubit
movement into the macroblock and signals to control
movement out of the macroblock via that port. These
signals are connected to the corresponding signals of
the neighboring macroblocks. The macroblocks as-
sert a request signal to a destination macroblock
when a qubit command indicates the qubit should
cross into the next macroblock. If an available
signal response is received, the qubit, along with its
control message, can move across into the neighbor-
ing macroblock; if not, the qubit must wait until the
available signal is present.
The macroblock interface enables the instruction
issue logic to schedule qubit movement as a path
through a sequence of macroblocks, without concern-
ing itself with the low level details of qubit move-
ment. This modular system allows macroblocks to
be replaced with any other macroblock that imple-
ments the defined interface, without modifying the
instruction issue logic.
Additionally, macroblocks have an interface to the
laser control logic. Whenever a macroblock is in-
structed to perform a gate operation, it must request
a laser resource through the laser control logic. The
laser controller is responsible for aggregating requests
from all the macroblocks in the system, and decid-
ing when and where to send laser pulses. The laser
controller also attempts to parallelize as many oper-
ations as possible. Once the laser pulses have com-
pleted, the laser controller notifies the macroblocks,
indicating that the gate operation is complete.
5.2 Instruction Scheduling
The instruction issue logic is responsible for deter-
mining the runtime execution order of the instruc-
tions in the quantum circuit, which involves both
preprocessing and online scheduling. The instruc-
tion sequence is first preprocessed to assign priori-
ties that will help during scheduling. The sequence is
traversed from end to beginning, scheduling instruc-
tions as late as dependencies allow, using realistic
gate latencies but ignoring movement. Essentially,
each instruction is labeled with the length of its crit-
ical path to the end of the program. This is similar
to the method used in [16], but we use critical path
with gate times rather than the size of the dependent
subtree.
The instruction preprocessing generates an opti-
mal schedule assuming infinite gates and zero move-
ment cost. However, we wish to evaluate a layout
with more realistic characteristics. Our scheduler is
designed to schedule on an arbitrary graph, but the
layouts provided to it by the place and route tool are
in fact planar layouts using only right angles. In ad-
dition, the scheduler requires that the qubit initial
positions be provided as well.
Our scheduler implements a greedy scheduling
technique. It keeps the set of instructions which
have had all their dependencies fulfilled (and thus
are ready to be executed). It attempts to schedule
them in priority order. So the highest priority ready
instruction (according to critical path) is attempted
first and is thus more likely to get access to the re-
sources it needs. These contested resources include
both gates and channels/intersections. Once all pos-
sible instructions have been scheduled, time advances
until one or more resources is freed and more instruc-
tions may be scheduled. This scheduling and stalling
cycle continues until the full sequence has been exe-
cuted or until deadlock occurs, in which case it is de-
tected and the highest priority unscheduled instruc-
tion at the time of deadlock is reported.
Since we are interested in evaluating layouts rather
than in designing an efficient online scheduler, we use
very thorough searches over the graph in both gate
assignment and pathfinding. This causes the sched-
uler to take longer but takes much of the uncertainty
concerning schedule quality out of our tests. In addi-
tion, the scheduler reports stalling information which
may be used for iterating upon the layout.
5.3 Control Extraction
Armed with well defined component interfaces and a
method to execute the quantum instructions, all that
remains to create the control system for a given quan-
tum circuit is putting the pieces together. The quan-
tum datapath is composed of an arbitrary number
of macroblocks pulled from the component library.
Each macroblock in our component library has asso-
ciated with it classical control logic. The control logic
handles all the internals of the macroblock including
details of ion movement, ion trapping and gate oper-
ation. In our library, the macroblock control logic is
specified using behavioral Verilog modules.
When the layout stage of the CAD flow creates a
physical layout of macroblocks, we extract the cor-
responding control logic blocks and assemble them
together in a top-level Verilog module for the full
control system, stitching together all necessary mac-
roblock interfaces. This module instantiates all the
appropriate macroblock control modules, along with
the instruction issue logic and laser controller unit.
Combined, these modules are assembled into a sin-
Figure 6: QPOS grid structure constructed by tiling
the highlighted 2× 2 macroblock cell.
gle Verilog module which implements the full classi-
cal control system for the quantum circuit and which
may be input to a classical CAD flow for synthesis.
6 Grid-based Layouts
We begin our exploration of placement and routing
heuristics by considering grid-based layouts. A ma-
jority of the work done in the field has concentrated
on these types of layouts. In all of these works, a lay-
out is constructed by first designing a primitive cell
and then tiling this cell into a larger physical layout.
For example, the authors of [15, 16] manually design
a single cell, and for any given quantum circuit, they
use that cell to construct an appropriately sized lay-
out. In [23], the authors automate the generation of
an H-Tree based layout constructed from a single cell
pattern. Similarly, [3] uses a cell such as in [23] but
also provides some tools to evaluate the performance
of a circuit when the number of communication chan-
nels and gate locations within the cell is varied. We
use a combination of these methods to implement a
tool that automatically creates a grid-based physical
layout for a given quantum circuit.
The grid-based physical layouts generated by our
tools are constructed by first creating a primitive cell
out of the macroblocks mentioned in Section 2 and
then tiling the cell to fill up the desired area. For
example, Figure 6 shows how a 2 × 2 sized cell can
be tiled to create the layout used in [16] (referred to
henceforth as the QPOS grid). These types of simple
structures are easy to automatically generate given
only the number of qubits and gate operations in the
quantum circuit. Furthermore, grid-based structures
are very appealing to consider because, apart from
 1000
 2000
 3000
 4000
 5000
 6000
 7000
 8000
 9000
 0  100  200  300  400  500  600  700  800  900
Grid Structure
[[23,1,7]] Golay Encode Grid Search (3x2 cell)
Min - Mean - Max
(b)(a)
Figure 7: Variations in runtime of various grid-based
physical layouts for [[23, 1, 7]] Golay encode circuit.
For each grid structure the minimum, mean, and
maximum time are plotted.
Figure 8: Comparison of the best 3×2 cell for two dif-
ferent circuits. (a) The best cell for the [[23, 1, 7]] Go-
lay encode circuit. (b) The best cell for the [[7, 1, 3]]
L1 correct circuit.
selecting the number of cells in the layout and the
initial qubit placement, no other customization is re-
quired in order to map a quantum circuit onto the
layout. The regular pattern also makes it easy to de-
termine how qubits move through the system, as sim-
ple schemes such as dimension-ordered routing can be
used.
The approach we use to generate the grid-based
layout for a given quantum circuit is as follows:
1. Given the cell size, create a valid cell structure
out of macroblocks.
2. Create a layout by tiling the cell to fill up the
desired area.
3. Assign initial qubit locations.
4. Simulate the quantum circuit on the layout to
determine the execution time.
The first step finds a valid cell structure. A cell is
valid if all the macroblocks that open to the perime-
ter of the cell have an open macroblock to connect to
when the cell is tiled. Also, a cell cannot have an iso-
lated macroblock within it that is unreachable. Once
we tile this valid cell to create a larger layout, we must
decide on how to assign initial qubit locations. The
two methods we utilize are: a systematic left to right,
one qubit per cell approach, and a randomized place-
ment. The systematic placement allows us to fairly
compare different layouts. However, since the initial
placement of the qubits can affect the performance
of the circuit, the tool also tries a number of random
placements in an effort to determine if the systematic
placement unfairly handicapped the circuit.
This layout generation and evaluation procedure is
iterated upon until all valid cell configurations of the
given size are searched. We then repeat this process
for different cell sizes. The cell structure that results
in the minimum simulated time for the circuit is used
to create the final layout.
As an example, Figure 7 shows the results of
searching for the best layout composed of 3× 2 sized
cells targeting the [[23, 1, 7]] Golay encode circuit [21],
one of our benchmarks shown in Table 1. More than
900 valid cell configurations were tested. For each
cell configuration, we try multiple initial qubit place-
ments (as mentioned earlier) resulting in a range of
runtimes for each cell configuration. Differences in
the runtime of the circuit are not limited to just vari-
ations on the cell configuration but are in fact also
highly dependent on the initial qubit placement.
Figure 8 shows the best cell structure found by
conducting a search of all 2× 2, 2× 3, and 3× 2 sized
cells for two different circuits. The main result of this
search is that the best cell structure used to create
the grid-based layout is dependent on what circuit
will be run upon it. By varying the location of gates
and communication channels, we tailor the structure
of the layout to match the circuit requirements.
While this type of exhaustive search of physical
layouts is capable of finding an optimal layout for a
quantum circuit, it suffers from a number of draw-
backs. Namely, as the size of the cell increases, the
number of possible cell configurations grows exponen-
tially. Searching for a good layout for anything but
the smallest cell sizes is not a realistic option. Fur-
thermore, while small circuits may be able to take
advantage of primitive cell based grids, larger cir-
cuits will require a less homogeneous layout. One
approach to doing this is to construct a large layout
out of smaller grid-based pieces, all with different cell
configurations. While this approach is interesting, we
feel a more promising approach is one that resembles
a classical CAD flow, where information extracted
from the circuit is used to construct the layout.
7 Greedy Place and Route
One problem we observed in the regular grid layout
design was that the high amount of channel conges-
tion due to limited bandwidth causes densely-packed
(occupied) gates. Additionally, we found that a num-
ber of gate locations and channels in many of the
grids were not even used by the scheduler to perform
the circuit.
We present a new heuristic that attempts to solve
some of these problems. The heuristic is a simple
greedy algorithm that starts with only as many gate
locations as qubits (because we assume that qubits
only rest in storage/gate locations) and no channels
connecting the gates. It iterates with the circuit
scheduler, moving and connecting gate locations un-
til the qubits can communicate sufficiently to perform
the specified circuit. The current layout is fed into
the circuit scheduler which tries to schedule until it
finds qubits in gate locations that cannot communi-
cate to perform a gate. The place and router then
connects the problematic gate locations and tries
scheduling on the layout again. The iteration fin-
ishes once the circuit can be successfully completed.
Our algorithm bears some similarity to the iterative
procedure in adaptive cluster growth placement [12]
in classical CAD. Gate locations are placed from the
center outward as the circuit grows to fit a rectilinear
boundary.
The placer can move gate locations that have to
be connected if they are not already connected to
something else. The router connects gate locations
by making a direct path in the x and y directions
between them and placing a new channel, shifting
existing channels out of the way. Since channels are
allowed to overlap, intersections are inserted where
the new channels cut across existing ones.
This technique has the advantage that, since the
circuit scheduler prioritizes gates based on gate delay
critical path, potentially critical gates are mapped
to gate locations and connected early in the process.
Thus critical gates tend to be initially placed close
together to shorten the circuit critical path. Ad-
ditionally, gate locations that need to communicate
can be connected directly instead of using a general
shared grid channel network, where congestion can
occur and cause qubits to be routed along unneces-
sarily long paths.
A disadvantage of this heuristic is that gate place-
ment is done to optimize critical path, not to min-
imize channel intersections. This means that the
layout could end up having many 4-way channel in-
tersections and turns, both of which have more de-
lay than 2-way straight channels. Additionally, even
though critical gates are mapped and placed near
each other, the channel routing algorithm tends to
spread these gate locations apart as more channels
cut through the center of the circuit. We discuss our
experimental evaluation of this heuristic in Section 9.
8 Dataflow-Based Layouts
As described in Section 6, a systematic row by row
initial placement for qubits allows us to make some-
what accurate comparisons between different grid-
based layouts, while a random initial qubit placement
allows us to test a single grid’s dependence on qubit
starting positions. However, in laying out a quantum
circuit, we would like to have a more intelligent and
natural means of determining initial qubit placement.
For this, we turn to the dataflow graph representation
of the circuit.
8.1 Dataflow Graph Analysis
Figure 9a shows a QASM instruction sequence con-
sisting of Hadamard gates (H) and controlled bit-
flips (CX) operating on qubits Q0, Q1, Q2 and Q3,
with each instruction labeled by a letter. Figure 9b
shows the equivalent sequence of operations in stan-
dard quantum circuit format. Either of these may
A) H Q0
B) H Q1
C) H Q2
D) H Q3
E) CX Q0,Q1
F) CX Q2,Q3
G) CX Q1,Q2
H) CX Q2,Q3
I) CX Q0,Q2
(b) (c)(a)
Figure 9: a) A QASM instruction sequence. b) A quantum circuit equivalent to the instruction sequence in
(a). c) A dataflow graph equivalent to the instruction sequence in (a). Each node represents an instruction,
as labeled in (a). Each arc represents a qubit dependency.
A B C D
A B C D
A B C D
NG4NG3NG2
(b) (c)(a)
NG4NG3NG2
NG4NG3NG2
Figure 10: a) Each node (instruction) is initialized in its own node group (NG, outlined by the dotted lines),
which corresponds to a physical gate location in a layout. Once placed, we extract physical distances between
the nodes (the edge labels). b) We find the longest edge weight on the longest critical path (the length 5
edge on the path C-F-G-H-I; solid bold arrows) and merge its two node groups to eliminate that latency.
c) We recompute the critical path (A-E-I; dashed bold arrows) and merge its node groups, and so on.
be translated into the dataflow graph shown in Fig-
ure 9c, where each node represents a QASM instruc-
tion (as labeled in Figure 9a) and each arc represents
a qubit dependency. With this dataflow graph, we
may perform some analyses to help us place and route
a layout for our quantum circuit.
The general idea is that we shall create node groups
in the dataflow graph which correspond to distinct
gate locations that may then be placed and routed
on a layout. All instructions within a single node
group are guaranteed to be executed at a single gate
location, as elaborated upon in Section 8.3. To be-
gin with, we create a node group for each instruction,
giving us a dataflow group graph, as shown in Fig-
ure 10a. If we lay out this group graph with a distinct
designated gate for each instruction (using heuristics
discussed in Section 8.2), we get a layout in which
the starting location of each qubit is specified implic-
itly by its first gate location, so no additional initial
placement heuristic is needed.
From this layout we can extract movement latency
between nodes and label the edges with weights (as in
Figure 10a). We now find the longest critical path by
qubit. The critical path A-E-I of qubit Q0 has length
14 (the dashed bold arrows), while the critical path
C-F-G-H-I of qubit Q2 has length 15 (the solid bold
arrows). We select the longest edge on the longest
critical path, which is the edge G-H with weight 5.
We merge these two node groups to eliminate this la-
tency, in effect specifying that these two instructions
should occur at the same gate location (Figure 10b).
We then update the layout and recompute distances.
Assuming we merged these two node groups to the
location of H (NG8), then the weight of edge F-G
changes to 1 (to match the weight of edge F-H) and
the weight of edge E-G probably changes to 6 (former
E-G plus former G-H), but the exact change really
depends on layout decisions. The new critical path
is now A-E-I, so if we do this again, we merge node
groups NG5 and NG9 to eliminate the edge of weight
8, and we get the group graph in Figure 10c.
In merging nodes, there is the possibility that two
qubit starting locations get merged, complicating the
assignment of initial placement. For this reason, we
add a dummy input node for each qubit before its
first instruction. The merging heuristic doesn’t allow
more than one input node in any single node group,
so we maintain the benefit of having an intelligent
initial qubit placement without extra work.
There is an important trade-off to consider when
taking this merging approach. A tiled grid layout
provides plenty of gate location reuse but is un-
likely to provide any pipelinability without great ef-
fort. A layout of the group graph in Figure 10a
(with each instruction assigned to a distinct gate
location) provides no gate location reuse at all but
high potential pipelinability. This raises the ques-
tion of whether we wish to minimize area and time
(for critical data qubits), maximize throughput of a
pipeline (for ancilla generation), or compromise at
some middle ground where small sets of nearby nodes
are merged in order to exploit locality while still re-
taining some pipelinability. We intend to further ex-
plore this topic in the future.
8.2 Placement and Routing
Taking the group graph from the dataflow analysis
heuristic, the placement algorithm takes advantage
of the fanout-limited gate output imposed by the No-
Cloning Theorem [25] to lay out the dataflow-ordered
gate locations in a roughly rectangular block. We
adopt a gate array-style design, where gate locations
are laid out in columns according to the graph, with
space left between each pair of columns for necessary
channels. This can lead to wasted space due to a
linear layout of uneven column sizes, so we may also
perform a folding operation, wherein a short column
may be folded in (joined) with the previous column,
thus filling out the rectangular bounding box of the
layout as much as possible and decreasing area. The
columns are then sorted to position gate locations
that need to be connected roughly horizontal to one
another. This further minimizes channel distance be-
tween connected gate locations and reduces the num-
ber of high-latency turns.
Once gate locations are placed, we use a grid-based
model in which we first route local wire channels be-
tween gate locations that are in adjacent or the same
columns. These channels tend to be only a few mac-
roblocks long each. A separate global channel is then
inserted between each pair of rows and between each
Technology-
Dependent
Netlist
Dataflow
Analysis &
Gate Combination
Sorted Dataflow 
Placement
Local/Global 
Channel Router
Geometry-Aware
Netlist
Gate Scheduler/
Simulator
Placement and Routing
Figure 11: The placement and routing portion of our
CAD flow (shown in Figure 4) takes a technology-
dependent netlist and translates it into a geometry-
aware netlist through an iterative process involving
dataflow analysis and placement and routing tech-
niques.
pair of columns of gate locations. These global chan-
nels stretch the full length of the layout. There are
no real routing constraints in our simple model since
channels are allowed to overlap and turn into 3- or
4-way intersections. We depend on the dataflow col-
umn sorting in the placement phase to reduce the
number of intersections and shared local channels.
While local channels could technically be used for
global routing and vice versa, we’ve found that this
division in routing tends to divide the traffic and sep-
arate local from long-distance congestion.
With these basic placement and routing schemes,
we may now iterate upon the layout, as shown in Fig-
ure 11. The technology-dependent netlist is trans-
lated into a dataflow group graph with a separate
gate location for each instruction (Figure 10a). This
group graph is then placed, routed and scheduled to
get latency and identify the runtime critical path (as
opposed to the critical path in the group graph, which
fails to take congestion into account). The longest
latency move on the runtime critical path (between
two node groups) is merged into one node group, thus
eliminating the move since a node group represents
a single gate location. This new group graph is then
placed, routed and scheduled again to find the next
pair of node groups to merge.
Once this process has iterated enough times, we
reach a point where congestion at some heavily
merged node group is actually hurting the latency
with each further merge. We alleviate this conges-
tion by adding storage nodes (essentially gate loca-
tions that don’t actually perform gates) near the con-
gested node group. This increases the area slightly
but maintains the locality exploited by the merging
heuristic. If congestion persists, we halt the algo-
rithm, back up a few merging steps and output the
geometry-aware netlist.
8.3 Annotated Scheduling
The scheduling heuristic described in Section 5.2
schedules an arbitrary QASM instruction sequence on
an arbitrary layout. However, once we have assigned
instructions in a dataflow graph to node groups (as
described in Section 8.1), we wish those instructions
to be executed at their proper location on any lay-
out placed and routed from the group graph. To this
end, we annotate each instruction in the instruction
sequence with the name of the gate location where
it must be executed. Additionally, since we have the
gate locations in advance, we can incorporate move-
ment in the back-prioritization of the instruction se-
quence. Thus, the priority assigned to each qubit
now incorporates both gate latencies and movement
through an uncongested layout, which gives us a bet-
ter approximation of each qubit’s critical path. We
use this extended scheduler in our dataflow-based ex-
periments presented in Section 9.
9 Results
We now present our simulation results for the heuris-
tics described in earlier sections.
9.1 Benchmarks
Relatively high error rates of operations in a quantum
computer necessitate heavy encodings of qubits. As
such, we focus on encoding circuits (useful for both
data and ancillae) and error correction circuits to ex-
periment with circuit layout techniques. We lay out
a number of error correction and encoding circuits to
Qubit Gate
Circuit name count count
[[7, 1, 3]] L1 encode [20] 7 21
[[23, 1, 7]] L1 encode [21] 23 116
[[7, 1, 3]] L1 correction [1] 21 136
[[7, 1, 3]] L2 encode [20] 49 245
Table 1: List of our QECC benchmarks, with quan-
tum gate count and number of qubits processed in
the circuit.
evaluate the effectiveness of the heuristics used in our
CAD flow in terms of circuit area and latency, as de-
termined by our scheduler. Our circuit benchmarks
are shown in Table 1. We use two level 1 (L1) encod-
ing circuits, a level 2 (L2) recursive encoding circuit
and a fault-tolerant level 1 correction circuit.
The idea of the encoding circuits is that they will
provide a constant stream of encoded ancillae to in-
teract with encoded data qubit blocks. Thus, for
these circuits, throughput is a more important mea-
sure than latency, implying that they would benefit
greatly from pipelining. Nonetheless, a high latency
circuit could introduce non-trivial error due to in-
creased qubit idle time. On the other hand, correc-
tion circuits are much more latency dependent, since
they are on the critical path for the processing of data
qubit blocks.
9.2 Evaluation
We have evaluated a variety of layout design heuris-
tics on the four benchmarks shown in Table 1. The
results are in Table 2. “QPOS Grid” refers to
the best scheduled layout from the literature [16]
(see Section 6). “Optimal Grid” refers to the best
grid with an area matching the QPOS Grid used
that was found by the exhaustive search described
in Section 6. “Greedy” refers to the heuristic de-
scribed in Section 7. “DF” refers to the dataflow-
based approach from Section 8. “Non-folded” means
the dataflow graph is laid out with varying column
widths; “folded” means the layout has been made
more rectangular by stacking columns. The num-
ber of global channels is between each pair of rows
and columns of gate locations. “Critical combining”
refers to our dataflow group graph merging heuristic.
The exhaustive search over grids yields the best
latency for all benchmarks, which is not surprising.
Circuit Heuristic Latency (µs) Area
[[7, 1, 3]] L1 encode QPOS Grid 548.0 49
Optimal Grid 509.0 49
Greedy channel and gate location placement 648.0 36
Non-folded DF, 2 global channels, critical combining 768.2 231
Folded DF, 1 global channels, critical combining 795.4 126
Folded DF, 2 global channels, critical combining 712.4 182
[[23, 1, 7]] Golay encode QPOS Grid 2268.0 575
Optimal Grid 1801.0 575
Greedy channel and gate location placement 2457.0 168
Non-folded DF, 2 global channels, critical combining 2169.2 3880
Folded DF, 1 global channels, critical combining 2264.0 713
Folded DF, 2 global channels, critical combining 2248.2 1394
[[7, 1, 3]] L1 correction QPOS Grid 1300.0 1271
Optimal Grid 771.0 1271
Greedy channel and gate location placement 1932.0 756
Non-folded DF, 2 global channels, critical combining 999.8 2378
Folded DF, 1 global channels, critical combining 1501.2 690
Folded DF, 2 global channels, critical combining 1121.2 1496
[[7, 1, 3]] L2 encode QPOS Grid 2411.0 1365
Optimal Grid 1367.0 1365
Greedy channel and gate location placement 4791.0 936
Non-folded DF, 2 global channels, critical combining 1582.4 4087
Folded DF, 1 global channels, critical combining 1828.6 1617
Folded DF, 2 global channels, critical combining 1944.8 3381
Table 2: Latency results for a variety of ECC circuits with different placement and routing heuristics.
This kind of search becomes intractable quickly as
circuit size grows, and additionally, it is based on the
unproven assumption that a regular layout pattern
is the best approach. We include this data point as
something to keep in mind as a target latency.
Among the polynomial-time heuristics, we first
note that no single heuristic is optimal for all four
benchmarks and that, in fact, no single heuristic op-
timizes both latency and area for any single circuit.
Dataflow-based place and route techniques in general
produce the lowest latency circuits. We find that the
optimal global channel count per column (1 or 2) de-
pends on the circuit being laid out. This is an artifact
of the lack of maturity in our routing methodology.
We intend to explore more adaptive routing optimiza-
tion in our ongoing work.
The dataflow approach and the QPOS Grid tend
to trade off between latency and area. However, we
expect that the dataflow approach will show greater
potential for pipelining, thus allowing us to target cir-
cuits such as an encoded ancilla generation factory,
for which throughput is of greater importance than
latency. We also observe that non-folded dataflow
layouts are likely to have even greater pipelinability
than folded ones, but at the likely cost of greater area.
Although, we should note that the area estimates for
the non-folded DF-based layouts are in fact overes-
timates due to our use of a liberal bounding box for
these calculations.
We find that the greedy heuristic tends to find
the best design area-wise, but the latency penalty
increases with circuit complexity. This is expected,
as greedy is unable to handle congestion problems,
so it works best for small circuits where congestion
is not an issue. It is for the opposite reason that the
DF heuristics fail on the [[7, 1, 3]] L1 encode. They
insert too much complexity into an otherwise simple
problem.
10 Conclusion
We presented a computer-aided design flow for the
layout, scheduling and control of ion trap-based
quantum circuits. We focused on physical quantum
circuits, that is, ones for which all ancillae, encod-
ings and interconnect are explicitly specified. We
explored several mechanisms for generating optimal
layouts and schedules for our benchmark circuits.
Prior work has tended to assume a specific regular
grid structure and to schedule operations within this
structure. We investigated a variety of grid structures
and showed a performance variance of a factor of four
as we varied grid structure and initial qubit place-
ment. Since exhaustive search is clearly impractical
for large circuits, we also explored two polynomial-
time heuristics for automated layout design. Our
greedy algorithm produces good results for very sim-
ple circuits, but quickly begins to be suboptimal as
circuit size grows. For larger circuits, we investigated
a dataflow-based analysis of the quantum circuit to
assist a place and route mechanism which leverages
from classical algorithms. We found that our our
dataflow approach generally offers the best latency,
often at the cost of area. However, we expect that a
layout based on the dataflow graph analysis also of-
fers better potential for pipelining than a grid-based
approach, and we intend to investigate this further in
the future.
References
[1] P. Aliferis, D. Gottesman, and J. Preskill.
Quantum accuracy threshold for concatenated
distance-3 codes. Arxiv preprint quant-
ph/0504218, 2005.
[2] S. Balensiefer, L. Kreger-Stickles, and M. Oskin.
QUALE: quantum architecture layout evaluator.
Proceedings of SPIE, 5815:103, 2005.
[3] S. Balensiefer, L. Kregor-Stickles, and M. Oskin.
An evaluation framework and instruction set ar-
chitecture for ion-trap based quantum micro-
architectures. Proc. 32nd Annual International
Symposium on Computer Architecture, 2005.
[4] J. I. Cirac and P. Zoller. Quantum computations
with cold trapped ions. Phys. Rev. Lett, 74:4091–
4094, 1995.
[5] A. Cross. qasm-tools.
http://www.media.mit.edu/quanta/quanta-
web/projects/qasm-tools/, 2006.
[6] L. Grover. Symposium on Theory of Computing
(STOC 1996), pages 212–219.
[7] W. Hensinger, S. Olmschenk, D. Stick, D. Hucul,
M. Yeo, M. Acton, L. Deslauriers, C. Monroe,
and J. Rabchuk. T-junction ion trap array for
two-dimensional ion shuttling, storage, and ma-
nipulation. Applied Physics Letters, 88(3):34101,
2006.
[8] D. Hucul, M. Yeo, W. K. Hensinger, J. Rabchuk,
S. Olmschenk, and C. Monroe. On the transport
of atomic ions in linear and multidimensional ion
trap arrays. quant-ph/0702175, 2007.
[9] N. Isailovic, Y. Patel, M. Whitney, and J. Ku-
biatowicz. Interconnection Networks for Scal-
able Quantum Computers. Proceedings of the
33rd International Symposium on Computer Ar-
chitecture (ISCA), 2006.
[10] D. Kielpinski, C. Monroe, and D.J. Wineland.
Architecture for a large-scale ion-trap quantum
computer. Nature, 417:709–711, 2002.
[11] J. Kim, S. Pau, Z. Ma, H. McLellan, J. Gages,
A. Kornblit, and R. Slusher. System design for
large-scale ion trap quantum information pro-
cessor. Quantum Information and Computation,
5(7):515–537, 2005.
[12] CM Kyung, JM Widder, and DA Mlynski.
Adaptive cluster growth (ACG); a new algo-
rithm for circuit packingin rectilinear region. De-
sign Automation Conference, 1990. EDAC. Pro-
ceedings of the European, pages 191–195, 1990.
[13] M.J. Madsen, W.K. Hensinger, D. Stick, J.A.
Rabchuk, and C. Monroe. Planar ion trap ge-
ometry for microfabrication. Applied Physics B:
Lasers and Optics, 78:639 – 651, 2004.
[14] D. Maslov, S. M. Falconer, and M. Mosca.
Quantum circuit placement: Optimizing qubit-
to-qubit interactions through mapping quan-
tum circutis into a physical experiment. quant-
ph/0703256, 2007.
[15] T. Metodi, D. Thaker, A. Cross, F. Chong, and
I. Chuang. A Quantum Logic Array Microar-
chitecture: Scalable Quantum Data Movement
and Computation. Proceedings of the 38th Inter-
national Symposium on Microarchitecture (MI-
CRO), 2005.
[16] T.S. Metodi, D.D. Thaker, A.W. Cross, F.T.
Chong, and I.L. Chuang. Scheduling physical
operations in a quantum information processor.
Proceedings of SPIE, 6244:62440T, 2006.
[17] C. Monroe, D. M. Meekhof, B. E. King, W. M.
Itano, and D. J. Wineland. Demonstration of a
universal quantum logic gate. Phys. Rev. Lett.,
75:4714–4717, 1995.
[18] C. Pearson, D. Leibrandt, W. Bakr, W. Mallard,
K. Brown, and I. Chuang. Experimental investi-
gation of planar ion traps. Phys. Rev. A, 73(3),
2006.
[19] P.W. Shor. Polynomial-time algorithms for
prime factorization and discrete logarithms on a
quantum computer. 35’th Ann. Symp. on Foun-
dations of Comp. Science (FOCS), pages 124–
134, 1994.
[20] A. M. Steane. Simple quantum error correcting
codes. Phys. Rev. A, 54:4741–4751, 1996.
[21] A.M. Steane. Overhead and noise threshold of
fault-tolerant quantum error correction. Phys.
Rev. A, 68(4):42322, 2003.
[22] K. Svore, A. Aho, A. Cross, I. Chuang, and
I. Markov. A Layered Software Architecture for
Quantum Computing Design Tools. Computer,
39(1):74–83, 2006.
[23] K. Svore, A. Cross, A. Aho, I. Chuang, and
I. Markov. Toward a software architecture for
quantum computing design tools. Proceedings
of the 2nd International Workshop on Quantum
Programming Languages (QPL), pages 145–162,
2004.
[24] D.D. Thaker, T.S. Metodi, A.W. Cross, I.L.
Chuang, and F.T. Chong. Quantum Memory
Hierarchies: Efficient Designs to Match Avail-
able Parallelism in Quantum Computing. Pro-
ceedings of the 33rd International Symposium on
Computer Architecture (ISCA), 2006.
[25] W. Wootters and W. Zurek. A single quantum
cannot be cloned. Nature, 299:802–803, 1982.
[26] C. Zalka. Simulating quantum systems on
a quantum computer. Proceedings: Math-
ematical, Physical and Engineering Sciences,
454(1969):313–322, 1998.
	Introduction
	Motivation for a Quantum CAD Flow
	Contributions
	Paper Organization
	Ion Traps
	Related Work
	Quantum CAD Flow
	Control
	Control Interfaces
	Instruction Scheduling
	Control Extraction
	Grid-based Layouts
	Greedy Place and Route
	Dataflow-Based Layouts
	Dataflow Graph Analysis
	Placement and Routing
	Annotated Scheduling
	Results
	Benchmarks
	Evaluation
	Conclusion
ABSTRACT
  We present a computer-aided design flow for quantum circuits, complete with
automatic layout and control logic extraction. To motivate automated layout for
quantum circuits, we investigate grid-based layouts and show a performance
variance of four times as we vary grid structure and initial qubit placement.
We then propose two polynomial-time design heuristics: a greedy algorithm
suitable for small, congestion-free quantum circuits and a dataflow-based
analysis approach to placement and routing with implicit initial placement of
qubits. Finally, we show that our dataflow-based heuristic generates better
layouts than the state-of-the-art automated grid-based layout and scheduling
mechanism in terms of latency and potential pipelinability, but at the cost of
some area.

<|endoftext|><|startoftext|>
Introduction
Blazars are the most extreme class of Active Galactic Nuclei (AGN) exhibiting rapid
variability at all wavelengths and a high degree of linear polarization in the optical. They
have been observed at all wavelengths, from radio through VHE γ-rays and are characterized
by non-thermal continuum spectra and radio jets with individual components often exhibit-
ing apparent superluminal motion. This class of AGNs is comprised of BL Lac objects and
flat-spectrum radio quasars (FSRQs), which are distinguished primarily on the basis of the
absence or presence of broad emission lines in their optical spectra.
The broadband spectra of blazars are associated with non-thermal emission and exhibit
two broad spectral components. The low energy component is due to synchrotron emis-
sion from non-thermal electrons in a relativistic jet whereas the high energy component is
attributed either to the Compton upscattering of low energy radiation by the synchrotron
emitting electrons (for a recent review see, e.g., Böttcher (2006)) or the hadronic processes
initiated by relativistic protons co-accelerated with the electrons (Mücke & Protheroe 2001;
Mücke et al. 2003) . Blazars are often known to exhibit variability at all wavelengths, vary-
ing on time scales from months, to a few days, to even less than an hour in some cases.
The radio emission of blazars shows variability on a time scale of weeks to months whereas
the optical emission for some blazars might vary on a time scale of around one and a half
hours. At X-ray energies, some HBLs exhibit characteristic loop features when the pho-
ton energy spectral index, α, is plotted against the X-ray flux. These plots are known as
hardness-intensity diagrams (HIDs) and the loop structures are called spectral hysteresis.
This spectral hysteresis can be interpreted as the signature of synchrotron radiation, due to
the gradual injection and/or acceleration of ultrarelativistic electrons in the emitting region
and their subsequent radiative cooling (Kirk et al. 1998; Georganopoulos & Marscher 1998;
Kataoka et al. 2000; Kusunose et al. 2000; Li & Kusunose 2000; Böttcher & Chiang 2002).
3C 66A is classified as a low-frequency peaked (or radio selected) BL Lac object (LBL).
The peak of the low-frequency component of LBLs generally lie in the IR or optical regime,
whereas the high-energy component peak is located at several GeV, and the γ-ray output
is typically comparable to or slightly higher than the spectral output of the synchrotron
component. The redshift of 3C 66A has a relatively uncertain determination of z = 0.444
(Bramel et al. 2005). It has exhibited rapid microvariability at optical and near infrared
in the past and has been suggested as a promising candidate for detection by the new
generation of atmospheric Čerenkov telescope facilities like H.E.S.S., MAGIC, or VERITAS
(Costamante & Ghisellini 2002). This object has been studied in radio, IR, optical, X-rays
and γ-rays in the past. Its low-frequency component is known to peak in the IR - UV regime
whereas the high-frequency component generally peaks at multi MeV - GeV energies. The
– 3 –
multiwavelength SED and correlated broadband spectral variability behaviour of 3C 66A
have been very poorly understood. For this reason, Böttcher et al. (2005) organized an
intensive multiwavelength campaign to observe this object from July 2003 through April
2004, with the core campaign period being Sept. - Dec. 2003.
As described in Böttcher et al. (2005), the object exhibited several outbursts in the
optical. The variation was on the order of ∆m ∼ 0.3-0.5 over a timescale of several days.
The minimum variability timsecale of 2 hr provided an estimate for the size of the emitting
region to be on the order of 1015 cm. The optical flares suggested the presence of an optical
spectral hysteresis pattern with the B - R hardness peaking several days before the R- and B-
band flux peaked. The RXTE PCA data indicated a transition between the synchrotron and
the high-energy component at photon energies of & 10 keV. The broadband SED of 3C 66A
suggested that the synchrotron component peaked in the optical. In the VHE γ-ray regime,
STACEE provided an upper limit at Eγ & 150 GeV whereas an upper limit at Eγ > 390 GeV
resulted from simultaneous Whipple observations.
In this paper, we use a leptonic jet model to reproduce the broadband SED and the
observed optical spectral variability patterns of 3C 66A and make predictions regarding
observable X-ray spectral variability patterns and γ-ray emission. In §2, we describe the
time-dependent leptonic jet model used to reproduce the observed SED and optical spectral
variability patterns of 3C 66A. The parameters used to simulate the observed results are
described in §3. The modeling results and VHE γ-ray predictions are discussed in §4. We
summarize in §5.
Throughout this paper, we refer to α as the energy spectral index, Fν [Jy] ∝ ν
−α. A
cosmology with Ωm = 0.3, ΩΛ = 0.7, and H0 = 70 km s
−1 Mpc−1 is used. In this cosmology,
and using the redshift of z = 0.444, the luminosity distance of 3C 66A is dL = 2.46 Gpc.
2. Model Description
The SEDs and optical variability patterns of 3C 66A were modeled using a one-zone
homogeneous leptonic jet model. The model assumes injection of a population of ultrarel-
ativistic non-thermal electrons and positrons into a spherical emitting volume (the “blob”)
of comoving radius Rb at a time-dependent rate. Since the positrons lose equal amount of
energy as the electrons via the same radiative loss mechanisms so we do not distinguish
between them throughout the paper. The injected electron population is described by a
single power law distribution with a particle spectral index q, comoving injection density
Qinje (γ; t) (cm
−3s−1) and low- and high-energy cutoffs γ1 and γ2, respectively, such that
– 4 –
Qinje (γ) = Q
0 (t)γ
−q for γ1 ≤ γ ≤ γ2, where Q
0 (t) is the injection function and is given by,
0 (t) =
Linj(t)
if q 6= 2
Linj(t)
mec2ln(γ2/γ1)
if q = 2
where Linj specifies the power of the injected pair population and V
b is the blob volume
in the comoving frame.
The randomly oriented magnetic field B has uniform strength throughout the blob and
is determined by an equipartition parameter eB ≡ uB/ue (in the comoving frame), where
uB is the magnetic field energy density and ue is the electron energy density. We keep eB
constant so that the magnetic field value changes according to the evolving electron energy
density value as determined by equation 2. The initial injection of the electron population
into the blob takes place at a height z0 above the plane of the central accretion disk. The
emitting region travels relativistically with a speed v/c = βΓ = (1 − 1/Γ
2)1/2 along the jet.
The jet is directed at an angle θobs with respect to the line of sight. The Doppler boosting of
the emission region with respect to the observer’s frame is determined by the Doppler factor
δ = [Γ(1− βΓ cos θobs)]
−1, where Γ is the bulk Lorentz factor.
As the emission region propagates in the jet, the electron population inside the blob
continuously loses its energy due to synchrotron emission, Compton upscattering of syn-
chrotron photons (SSC) and/or Compton upscattering of external photons (EC). The seed
photons for the EC process include the UV soft X-ray emission from the disk entering the jet
either directly (Dermer et al. 1992; Dermer & Schlickeiser 1993) or after getting reprocessed
in the BLR or other circumnuclear material (Sikora et al., 1994; Dermer et al. 1997). The
time-dependent evolution of the electron and photon population inside the emission region
is governed, respectively, by,
∂ne(γ, t)
ne(γ, t)
+Qe(γ, t)−
ne(γ, t)
te,esc
∂nph(ǫ, t)
= ṅph,em(ǫ, t)− ṅph,abs(ǫ, t)−
nph(ǫ, t)
tph,esc
Here, (dγ/dt)loss is the radiative energy loss rate, due to synchrotron, SSC and/or EC
emission, for the electrons. Qe(γ, t) is the sum of external injection and intrinsic γ − γ pair
– 5 –
production rate and te,esc is the electron escape time scale. ṅph,em(ǫ, t) and ṅph,abs(ǫ, t) are the
photon emission and absorption rates corresponding to the electrons’ radiative losses and,
tph,esc = (3/4)Rb/c is the photon escape timescale. The time-dependent evolution of the
electron and photon population inside the blob is followed and radiative energy loss rates as
well as photon emissivities are calculated using the time-dependent radiation transfer code
of Böttcher & Chiang (2002).
The model only follows the evolution of the emission region out to sub-pc scales and
as a result only the early phase of γ-ray production can be simulated. Since the radiative
cooling is strongly dominant over adiabatic cooling during this phase and the emission region
is highly optically thick out to GHz radio frequencies, the simulated radio flux is well below
the actual radio data. We do not simulate the phase of the jet components in which they
are expected to gradually become transparent to radio frequencies as that would require the
introduction of several additional, poorly constrained parameters.
3. Model Parameters
The model independent parameters that were estimated using the SED and optical
intraday variability measurements (see Böttcher et al. 2005) were used to develop an initial
set of input parameters:
δ ≈ 15
R ≈ 3.3× 1015 cm
B ≈ 2.9 ǫ
γ1 ≈ 3.1× 10
γ2 ≈ 1.5× 10
p ≈ 4 (4)
Here p is the equilibrium spectral index that determines the optical synchrotron spec-
trum and p = q+1 for strongly cooled electrons. The initial set of parameters was modified
to reproduce the quiescent as well as the flaring state of 3C 66A. Approximately 350 sim-
ulations were carried out to study the effects of variations of various parameters, such as
γ1, γ2, q, B and Γ, on the resulting broadband spectra and light curves. The set of model
parameters that provided a satisfactory fit to the quiescent state of 3C 66A involved a value
of the Doppler factor, δ = Γ = 24 and a viewing angle of θobs = 2.4
o. These parameters
were chosen on the basis of VLBA observations that provided the limits on the superluminal
motion and indicated bending of the jet towards the line of sight thus resulting in a smaller
– 6 –
viewing angle and a higher Doppler boosting of the emission region as compared to the
values inferred from the superluminal measurements on larger scales (Jorstad et al. 2005;
Böttcher et al. 2005). The fitting of the SED both in the quiescent as well as flaring state
of 3C 66A was carried out such that the simulated quiescent state does not overpredict the
X-ray photon flux as X-ray photons are expected to be dominated by the flaring episodes.
On the other hand, the flaring state was simulated such that the resulting time-averaged
spectrum passes through the observed time-averaged optical as well as X-ray data points.
This was achieved by varying individual parameters, such as, γ1, γ2 and q between the values
for quiescent and flaring states with time profiles as discussed in the next section. A value
of γ1 = 2.1× 10
3, γ2 = 4.5× 10
4 and q = 2.4 provided a satisfactory fit to the flaring state.
Also, during our multiwavelength campaign of 2003 - 2004, flux upper limits at multi-GeV -
TeV energies could be obtained and as a result we could get upper limits on the respective
parameters governing the EC component. The various model parameters used to simulate
the two states of 3C 66A are listed in Table 1.
Figures 1 and 2 respectively show the reproduction of the SED of 3C 66A, for both the
quiescent and flaring state observed during the campaign period. The quiescent state is a
reproduction of the state observed around 1st October 2003 whereas the flaring state is the
reproduction of a generic 10 day flaring period corresponding to the timescale of several of
the major outbursts that were observed during the campaign. The simulated time-averaged
spectrum of 3C 66A in the flaring state is shown in Figure 3. The simulations, corresponding
to fits 1 and 2 of Table 1, were carried out for a pure SSC emission process by artificially
setting LD = 0, where LD is the bolometric disk luminosity. Fit 3 of Table 1 refers to an
EC+SSC case with LD = 1.0× 10
45 ergs s−1 and is shown in Figure 9. The value of LD was
chosen such that it is more than the value of the jet luminosity used in the simulations and
at the same time does not produce a blue bump in the simulated SED. In order to assess
the possible effect of EC emission in 3C66A, an upper limit to the optical depth of the BLR
was first determined using XSTAR, which returns the ionization balance and temperature,
opacity, and emitted line (Hα, Hβ) and continuum fluxes. The BLR was modeled as a
spherical shell with rBLR,in = 0.045 pc and rBLR,out = 0.050 pc, where rBLR,in and rBLR,out
stand for the inner and outer radii of the broad line region. A Thomson optical depth of
0.3 for the BLR was chosen as a reasonable upper limit such that the line emission is weak
enough or absent to be consistent with the observed featureless continuum.
– 7 –
ν [Hz]
Oct. 1
Nov. 1
Nov. 11
Dec. 28
STACEE
Whipple
(99 % UL)RXTE 2003
XMM−Newton
ROSAT
EXOSAT
EGRET
BeppoSAX
1999/2001
EINSTEIN
1979/1980
STACEE (99 % UL, Γ = 2.5)
(99 % UL, Γ = 3.0)
STACEE
(99 % UL)
Fig. 1.— Reproduction of the quiescent state of 3C 66A observed around October 1st 2003.
The simulation of this state was carried out using parameters that do not overpredict the X-
ray photon flux. The black colored solid line indicates the instantaneous spectrum generated
by the simulation after the system (blob + injected electron population) attains equilibrium.
The low-energy component peaks in the optical at νsyn ≈ 4.8 × 10
14 Hz whereas the high-
energy SSC component peaks in the MeV regime at νSSC ≈ 1.6× 10
21 Hz. The synchrotron
cooling timescale in the observer’s frame is ≈ 1.2 hours, which is on the order of observed
minimum optical variability timescale of 2 hours. The diamond shaped STACEE upper
limit is a new addition and is provided by Lindner (2006). All data that are indicated by
dotted curves are archival data and are shown for comparison. The historical average of the
5 EGRET pointings is also included to provide a guideline for our simulated VHE emission.
– 8 –
ν [Hz]
Oct. 1
Nov. 1
Nov. 11
Dec. 28
STACEE
Whipple
(99 % UL)
RXTE 2003
XMM−Newton
ROSAT
EXOSAT
EGRET
BeppoSAX
1999/2001
EINSTEIN
1979/1980
STACEE (99 % UL, Γ = 2.5)
(99 % UL, Γ = 3.0)
STACEE
(99 % UL)
Fig. 2.— Simulation of the flaring state for a generic 10 day flare corresponding to the
timescale of several major outbursts that were observed in the optical regime during our
campaign. The various curves show the instantaneous spectral energy distribution of 3C 66A
at several different times in the observer’s frame: black (red in the online version) dotted
line (∼ 5th hour), gray (green) dashed line (∼ 8th hour), black (blue) dot-dashed line (∼
14th hour), gray (yellow) long-dashed line (∼ 20th hour), long-dashed black line (∼ 8th day,
highest state attained by the system during flaring), gray solid line (∼ 9th day), dotted black
(violet) line (∼ 16th day), gray (cyan) colored solid line (∼ 18th day), dashed black (magenta)
colored line (∼ 20th day) and black (red) solid line (∼ 22nd day, equilibrium state reached
by the system after the flaring episode is over). The synchrotron component of the flaring
state peaks at νsyn ≈ 1.1× 10
15 Hz and the SSC component peaks at νSSC ≈ 2.7× 10
22 Hz.
The SSC component of this state cuts off at νSSC,cutoff ≈ 2.3 × 10
24 Hz. The synchrotron
cooling timescale in the optical regime is ≈ 37 minutes for the flaring state.
– 9 –
4. Results and Discussion
As can be seen in Figure 3, the time-averaged simulated spectrum passes through the
time-averaged optical data points whereas the high energy end of the synchrotron compo-
nent passes through the time averaged X-ray data indicating the dominance of synchrotron
emission in the production of such photons in case of flaring. For X-ray photons with energy
beyond 10-12 keV, the data is less reliable due to low count rates and possible source con-
fusion with 3C 66B. The spectral upturn at ≥ 7 keV occurs due to the presence of the SSC
component in the simulation. The presence of this component cannot be suppressed because
in order to suppress it the population of seed photons would have to be diluted, which can
be done by increasing the size of the emission region. But the size of the emission region
cannot be increased any further due to the strict constraint on the maximum size of the blob
that comes from the observed minimum variability timescale in the optical region, which
is 2 hrs. Hence, the emission region size cannot exceed 3.6 × 1015 (D/24) cm. Thus, our
model suggests that the harder X-ray photons come from the SSC and not the synchrotron
mechanism with the expected spectral hardening taking place at ∼ 7 keV. The high energy
component, due to the SSC emission, for the time-averaged spectrum (see Figure 3) cuts off
at ∼ 1.0× 1024 Hz or 4 GeV. From the simulated level of VHE emission we predict that the
object is well within the observational range of MAGIC, VERITAS and, especially, GLAST
(see Figure 3) whose sensitivity limit is 50 times lower than that of EGRET at 100 MeV
and even more at higher energies and its two year limit for source detection in an all-sky
survey is 1.6 × 10−9 photons cm−2 s−1 (at energies > 100 MeV). Thus it will be possible
to extract the spectral and variability information for this object at such high energies in
future observations.
Flaring above the quiescent state of 3C 66A was reproduced using a flaring profile for
the electron injection power (Linj(t)) that was Gaussian in time (see Figure 4):
Linj(t) = L
inj(t) +
(Lflinj − L
(z−rc)2
] (5)
Here, qu and fl stand for the quiescent and flaring state respectively, z determines the
position of the emission region in the jet at time t, rc indicates the position of the center of
the simulated flare and σ stands for the Gaussian width of the flare.
The rest of the parameters such as γ1 and γ2 and q were also changed accordingly.
In order to simulate the observed optical flare, the system was first allowed to come to an
equilibrium and after the equilibrium was set up the flare was introduced with a Gaussian
width, σ corresponding to 14 days in the observer’s frame. Although the flare was introduced
– 10 –
ν [Hz]
VERITAS
STACEE
Whipple
(99 % UL)
RXTE 2003
XMM−Newton
ROSAT
EXOSAT
EGRET
BeppoSAX
1999/2001
EINSTEIN
1979/1980
STACEE (99 % UL, Γ = 2.5)
(99 % UL, Γ = 3.0)
GLAST
MAGIC
(Large ZA)
STACEE
(99 % UL)
Fig. 3.— Time-averaged spectral energy distribution of 3C 66A for a period of 23 days
around a flare as shown in Figure 2. The filled black (colored in the online version) circles
are the time-averaged optical and IR data points for the entire campaign period and the
“RXTE 2003” denotes the time-averaged X-ray data points. The dot-dashed black line
is the contribution from the synchrotron component only whereas the long-dashed black
line indicates the contribution of the SSC component only. The time-averaged synchrotron
component peaks at νsyn ≈ 7.2× 10
14 Hz whereas the time-averaged SSC component peaks
at νSSC ≈ 5.3 × 10
21 Hz. The synchrotron component cuts off near 7 keV whereas the SSC
component cuts off at ∼ 4 GeV. The black colored dashed line indicates the attenuation due
to the optical depth at VHE energies. The γγ absorption effect becomes significant at ∼
200 GeV. The black (green, maroon and magenta) lines indicate the sensitivity limits for an
observation time of 50 hours for MAGIC, VERITAS and MAGIC (Large Zenith Angle) and
for GLAST for an observation time of 1 month.
– 11 –
in order to simulate the observed major optical outbursts lasting for 10 days, the choice of
14 days for the Gaussian width was made such that the width of the simulated flare matches
that of the observed flare, rc was adjusted such that the centre of the simulated flare aligns
with that of the observed one and the value of Linj was varied such that the peak of the
simulated flare matches that of the observed one.
The observed lightcurves did not agree well with a flaring profile that was top-hat or
triangular in time as can be seen in the figure. The presence of a flare that is Gaussian in
time might represent an initial injection of particles into the emission region at the base of
the jet. The particles slowly get accelerated as a shock wave ploughs through the region
and finally dies out in time. Crucial information on the dominant acceleration mechanism
comes from the change in the shape of the particle injection spectral index with time, which
might also indicate a possible change in the B-field orientation. According to the current
understanding of acceleration mechanisms, parallel shocks generally produce electron spectra
of Qe(γ) ∝ γ
−q with 2.2 . q . 2.3 (Achterberg et al. 2001; Galant et al. 1999), whereas
oblique shocks produce much softer injection spectral indices. On the other hand, 2nd order
Fermi acceleration behind the shock front might give rise to a harder injection index of the
order of q ∼ 1 or beyond (Virtanen & Vainio 2005). In order to reproduce the flaring state,
the simulation first starts out in the quiescent state with quiescent state parameters and
then the value of these parameters is changed to the flaring state parameters as the flaring
is introduced in the simulation. Since, the value of q, in our simulations, changes from 3.1
(quiescent state) to 2.4 (flaring state) it might indicate a possible change in the orientation
of the B-field from oblique to parallel during the flaring episode or an interplay between
the 1st and 2nd order Fermi acceleration thereby making the particle spectra harder. The
contribution from such acceleration mechanisms and the shear acceleration (Rieger & Duffy
2004) might play an important role in accelerating the particles to higher energies.
The simulated optical variability in the R band (0.55 mag) matches the observed value
(0.3-0.5 mag) for a 10 day period outburst. The predicted variability in B is more than
that of R by ∼ 0.15 mag as also observed, which indicates that the spectrum is becoming
harder (see Figure 5) with the spectral upturn occuring at B-R ≈ 0.72 mag as shown in
Figure 6. Figure 6 is a hardness intensity graph that shows that the object follows a positive
correlation of becoming harder in B-R while getting brighter in both the bands during the
10-day flare simulated in Figure 2. This agrees well with the observed optical variability
pattern. In this study, we are not addressing the variability that was observed on intraday
timescales as that analysis would open up an even larger parameter space, which cannot be
reasonably well constrained without any variability information in the X-ray regime.
The flare declines faster as compared to the time taken by the flare to rise. This might
– 12 –
2940 2945 2950
JD - 2450000
Fig. 4.— The simulated lightcurves for various flaring profiles that have been superimposed
on the observed R-band lightcurve (see Figure 7 of Böttcher et al. (2005)) for an outburst on
∼ November 1st 2003. The solid black line denotes a flaring profile that is Gaussian in time
as used for the flare in Figure 2, the dash-dotted black line is a trianglular flaring profile
whereas the dashed black line is a flaring profile that is top-hat in time. As can be seen, the
Gaussian flaring profile closely matches the width as well as the profile of the observed flare.
– 13 –
0.0e+00 2.0e+05 4.0e+05 6.0e+05 8.0e+05 1.0e+06 1.2e+06
time (sec)
2e+12
3e+12
4e+12
5e+12
0.0e+00
5.0e+11
1.0e+12
1.5e+12
1 keV
10 keV
15 keV
3 keV
0e+00
2e+12
4e+12
6e+12
1 MeV
100 MeV
10 GeV
∆R ~ 0.55 mag
Fig. 5.— Simulated lightcurves for the optical, X-rays and γ-ray energy regimes shown in
the three panels respectively. The simulated variability in the R band is ≈ 0.55 mag as
indicated by the arrows. The B band, denoted by the black dotted line exhibits a higher
variability of ≈ 0.7 mag, in the simulation, than that in the R band, which is consistent
with our observations. The simulated lightcurve at 1 keV is indicated by a black dashed
curve and exhibits an amplitude variation of ≈ 1.4 × 1012 Jy Hz. The 3, 10 and 15 keV
lightcurves, denoted by the black solid line, black long-dashed line and the black dot-dashed
curve, respectively, on the other hand do not exhibit much variability. In the VHE regime,
the 1 MeV lightcurve is denoted by a black solid line. The 100 MeV lightcurve is indicated
by a black long-dashed curve and the simulated variability amplitude in this energy regime
is on the order of 1012 Jy Hz. The black dot-dashed line indicates the lightcurve at 10 GeV.
– 14 –
13.6013.7013.8013.9014.0014.10
R magnitude
0.640
0.660
0.680
0.700
0.720
0.740
0.760
0.780
Fig. 6.— The simulated hardness-intensity diagram indicates a positive correlation between
R- and B-band for an outburst lasting for ∼ 10 days. The object becomes brighter in R and
harder in B-R as shown by the arrows. The spectral upturn takes place at B-R ≈ 0.72 mag
where the flux in B equals that in R (corresponding to αBR = 0).
– 15 –
indicate that the particles’ synchrotron cooling timescale is less than or equal to the light
crossing time.
τ obscool,sy ≈ 2.8× 10
−1/2 (
15 s (6)
We can calculate the observed synchrotron cooling timescale, τ obscool,syn in the optical
regime from equation 6 (Böttcher et al. 2005) using δ = 24, B = 2.4 G and ν15 = 0.48 for
the quiescent state and B = 2.8 G and ν15 = 1.1 for the flaring state (see Figures 1 and
2), where ν15 is the characteristic synchrotron frequency in units of 10
15 Hz. This yields a
value of τ obscool,sy ∼ 1.2 hours for the quiescent state whereas for the flaring state it reduces
to 37 minutes. The observed minimum variability timescale of ∼ 2 hours might therefore
correspond to the observed dynamical timescale, where
τ obsdyn ≈
1 + z
. (7)
This implies that it takes time to build up the electron population in the emission
region through flaring but once built up the electrons lose their energy efficiently to produce
synchrotron photons. This can be used to constrain the value of the magnetic field in the
jet, which has been allowed to evolve in time keeping eB = 1 and has an average value of 2.4
Gauss in the simulated quiescent state and 2.8 Gauss in the simulated flaring period.
The crossover of X-ray lightcurves, in our simulations, is a result of the dominance of
the SSC component in hard X-rays (see Figure 5). The lightcurve of soft X-ray photons of
energy 1 keV exhibits a greater variability of ∼ 1.4 × 1012 Jy Hz in its flux as compared
to their optical counterpart. This is expected because the soft X-ray photons, during the
flaring episode, are produced from synchrotron emission of electrons that are accelerated
to very high energies and as a result have a very short cooling timescale and thus greater
variability. In case of hard X-rays no significant variability is predicted. This is because such
photons are produced from Compton upscattering of synchrotron photons off the low-energy
electrons and as a result the cooling timescale is much longer as compared to the cooling
timescale of their soft X-ray and optical counterparts. Hence, the variability information
gets washed out. The predicted X-ray spectral variability pattern of large variability in the
low X-ray energy band and negligible variability in the high X-ray energy band is similar to
what has also been observed in BL Lacertae on several occasions (see for e.g., Ravasio et al.
2003, 2002).
As can be seen in Figure 7, spectral hysteresis patterns are not predicted for optical as
– 16 –
well as soft X-ray photons. This is expected because the cooling timescale of their parent
electron population is so short that what is observed is the average effect of this cooling
over the dynamical timescale and hence any hysteresis pattern gets smeared out. On the
other hand, one expects to see these patterns at higher energies because as explained earlier,
this photon population comes from Compton upscattering off low-energy electrons, which
have a longer cooling timescale and as a result the photon population gradually builds up
over time and then dies away giving rise to a hysteresis pattern (see Figure 8). The slight
spectral softening at 10 keV seen in its hysteresis pattern (see Figure 7) for higher values of
νFν indicates a small synchrotron contribution near the peak of the flare.
The simulated instantaneous SED, for a pure SSC model, shows a definite presence of
γ-ray emission in 3C66A, in the quiescent as well as the flaring state (see Figure 1 and 2).
The intrinsic cutoff of VHE emission in the flaring state, according to the simulations, for
the time averaged spectrum is ∼ 1.0 × 1024 Hz or 4 GeV. In our simulations, the emission
of VHE γ-ray photons is produced by the SSC mechanism in the quiescent as well as the
flaring state. Figure 5 shows the simulated lightcurves for VHE photons and as can be seen
the νFν value changes by ∼ 4.17×10
12 Jy Hz at 100 MeV. The variability in VHE photons is
expected as they are the result of Compton upscattering off the higher energy electrons and
due to this the hysteresis pattern is not seen at such high energies as the cooling timescale
of such high energy electrons is very short (see Figure 8).
From Figure 9, it can be seen that the high-energy component of 3C 66A, in the flaring
state, could start out with a dominant contribution of the EC emission, shown by the red
solid line. But as the blob travels further away and passes the outer edge of the broad line
region, the EC contribution becomes less significant and the SSC emission takes over. This
is indicated by the black long-dashed line in the figure. We might actually find that this
maximum contribution would be just enough to explain the historical EGRET flux and that
there could be GeV flaring due to early external Comptonization.
The effect of an optical depth due to the IIRB on the spectra of 3C 66A was also
evaluated and was found to be insignificant in the energy range we are interested in as
shown in Figure 3. The optical depth due to the IIRB was determined using the analytic
expression given in Stecker et al. (2006). The γ − γ absorption till ∼ 100 GeV is negligible
and becomes slightly observable at ∼ 200 GeV as the optical depth takes a value of, τγγ ≈
2.9. Hence, the SSC emission cutoff value at ∼ 4 GeV is intrinsic.
– 17 –
0.61 0.71 0.81 0.91
νFν [Jy Hz]
0.09 0.29 0.49 0.69 0.89
0.55 0.65 0.75 0.85 0.95
1 keV
10 keV
Fig. 7.— Simulated spectral hysteresis pattern in the R-band, 1 keV and 10 keV energy
regimes, shown in the three panels respectively. As can be seen, the hysteresis pattern starts
to show up in the 10 keV energy regime.
– 18 –
0.45 0.55 0.65 0.75 0.85 0.95
νFν [Jy Hz]
0.15 0.35 0.55 0.75 0.95
0.02 0.22 0.42 0.62 0.82
α 10 GeV
1 MeV
100 MeV
Fig. 8.— Simulated hysteresis pattern for 1 MeV, 100 MeV and 10 GeV energy regimes,
shown in the three panels repectively. The hysteresis pattern is prominent for the 1 MeV
energy regime but starts to become absent at higher energies.
– 19 –
ν [Hz]
Oct. 1
Nov. 1
Nov. 11
Dec. 28
STACEE
Whipple
(99 % UL)RXTE 2003
XMM−Newton
ROSAT
EXOSAT
EGRET
BeppoSAX
1999/2001
EINSTEIN
1979/1980
STACEE (99 % UL, Γ = 2.5)
(99 % UL, Γ = 3.0)
STACEE
(99 % UL)
Fig. 9.— Simulation of the effect of the BLR on the instantaneous spectral energy distri-
bution of 3C 66A for the first 3 days of a simulation similar to Figure 2. The curves in the
figure denote the instantaneous spectra obtained from the simulation. The gray (red in the
online version) solid line denotes one of the initial instantaneous spectrum at the beginning
of the simulation whereas the black long-dashed line indicates the last spectrum obtained
from the simulation.
– 20 –
5. Summary
An extensive analysis of the data of 3C 66A, obtained from the multiwavelength mon-
itoring campaign on 3C 66A from July 2003 to April 2004, was carried out using a time-
dependent leptonic jet model. The analysis was targeted towards understanding the dom-
inant radiation mechanism in the production of the high-energy component of the SED of
3C 66A in the quiescent as well as the flaring state. Our simulations yielded predictions
regarding the observable variability patterns in the X-ray as well as the VHE energy regimes
where such patterns could not be detected during the campaign. The object was well sam-
pled in the optical, especially in the R-band, during the campaign. It had exhibited several
major outbursts (∼10 days) in this regime with a varibility of ∆m ∼ 0.3-0.5. The X-ray
data covered the 3-10 keV range with the onset of the high-energy component expected at
≥ 10 keV photon energies. Only upper limits in the VHE regime had been obtained.
The simulations from our model could successfully reproduce the observed SED as well
as the optical spectral variability patterns. The model suggests the dominance of the SSC
mechanism in the production of hard X-ray as well as VHE photons. On the other hand, soft
X-ray photons exhibit spectral softening during flaring indicating the onset of the synchrotron
component in this energy range. According to the simulated time-averaged spectrum, the
synchrotron component is expected to cut off near 7 keV whereas the SSC component cuts
off at ∼ 4 GeV.
A flaring profile that was Gaussian in time could successfully reproduce the observed
flaring profile for a timescale of ∼ 10 days. The simulated varibility in R (∆m ∼ 0.55) agreed
well with the observed variability. According to the simulations, the object flares up in R
and B simultaneoulsy with τ obscool,syn (37 minutes) being less than or equal to the light crossing
time (2 hours) during flaring. No significant variability is predicted in the hard X-ray regime.
This is due to the production of such photons from Compton upscattering off low-energy
electrons with cooling timescales much longer than the light crossing time, 3Rb/4c. On the
other hand, the simulated lightcurves of VHE γ-ray photons exhibit significant variability as
such photons are produced from the Compton upscattering off higher energy electrons with
shorter cooling timescales than the light crossing time.
The effect of the optical depth due to γ − γ absorption by the IIBR on the SED of
3C 66A was also evaluated. The simulations do not predict a significant effect on the SED
due to the optical depth. The SSC emission cutoff predicted to be at ∼ 4 GeV can be
taken as the intrinsic SSC emission cutoff value for this object. We predict the object to
be well within the observational range of MAGIC, VERITAS and GLAST. Finally, the EC
emission for this object was also calculated and it appears that the EC emission could be
dominant in the high-energy component initially, but as the emission region travels further
– 21 –
away from the BLR, the EC contribution becomes less significant and the SSC emission
takes over. It is highly probable that this maximum contribution of the EC component
might explain the historical EGRET flux and that there could be GeV flaring due to early
external Comptonization.
This work was partially supported through NRL BAA 76-03-01, contract no. N00173-
05-P-2004.
REFERENCES
Achterberg, A., et al., 2001, MNRAS, 328, 393
Böttcher, M., 2006, in proc. “The Multi-Messenger Approach to High-Energy Gamma-Ray
Sources”, Barcelona, Spain, 2006, Astroph. & Space Sci., in press
Böttcher, M., et al., 2005, ApJ, 631, 169
Böttcher, M., & Chiang, J., 2002, ApJ, 581, 127
Böttcher, M., Mause, H., & Schlickeiser, R., 1997, A&A, 324, 395
Bramel, D. A., et al., 2005, ApJ, 629, 108
Costamante, L., & Ghisellini, G., 2002, A&A, 384, 56
Dermer, C. D., Sturner, S. J., & Schlickeiser, R., 1997, ApJS, 109, 103
Dermer, C. D., & Schlickeiser, R., 1993, ApJ, 416, 458
Dermer, C. D., Schlickeiser, R., & Mastichiadis, A., 1992, A&A, 256, L27
Gallant, Y. A., et al., 1999, A&AS, 138, 549
Georganopoulos, M., & Marscher,A. P., 1998, ApJ, 506, 621
Jorstad, S. G., et al., 2005, AJ, 130, 1418
Kataoka, J., et al., 2000, ApJ, 528, 243
Kirk, J. G., Rieger, F. M., & Mastichiadis, A., 1998, A&A, 333, 452
Kusunose, M., Takahara, F., & Li, H., 2000, ApJ, 536, 299
– 22 –
Li, H., & Kusunose, M., 2000, ApJ, 536, 729
Lindner, T., PhD thesis, McGill University, 2006
Mücke, A., & Protheroe, R. J., 2001, Astropart. Phys., 15, 121
Mücke, A., et al., 2003, Astropart. Phys., 18, 593
Ravasio, M., et al., 2003, A&A, 408, 479
Ravasio, M., et al., 2002, A&A, 383, 763
Rieger, F. M., & Duffy, P., 2004, ApJ, 617, 155
Sikora, M., Begelman, M. C., & Rees, M. J., 1994, ApJ, 421, 153
Stecker, F. W., Malkan, M. A., Scully, S. T., 2006, in press
Virtanen, J. J. P., & Vainio, R., 2005, ApJ, 621, 313
This preprint was prepared with the AAS LATEX macros v5.2.
– 23 –
Table 1. Model Parameters used to reproduce the quiescent and flaring state of 3C 66A as
shown in Figures 1 and 2, respectively. Note: Linj is the luminosity with which electron
population is injected into the blob. γ1,2 are the low- and high-energy cutoffs of electron
injection spectrum and q is the particle spectral index. Profile stands for the flare profile
used to reproduce the optical variability pattern, eB is the equipartition parameter and
magnetic field B is the equipartition value. Γ is the bulk Lorentz factor, Rb is the comoving
radius of the blob, θobs is the viewing angle and τT,BLR is the radial Thomson depth of the
Fit Linj [10
41 ergs/s] γ1 [10
3] γ2 [10
4] q Profile eB B [G] Γ Rb [10
15 cm] θobs [deg] τT,BLR
1 2.7 1.8 3.0 3.1 ——– 1 2.4 24 3.59 2.4 0
2 8.0 2.1 4.5 2.4 Gaussian 1 2.8 24 3.59 2.4 0
3 8.0 2.1 4.5 2.4 Gaussian 1 2.8 24 3.59 2.4 0.3
	Introduction
	Model Description
	Model Parameters
	Results and Discussion
	Summary
ABSTRACT
  The BL Lac object 3C 66A was observed in an extensive multiwavelength
monitoring campaign from July 2003 till April 2004. The spectral energy
distribution (SED) was measured over the entire electromagnetic spectrum, with
flux measurements from radio to X-ray frequencies and upper limits in the very
high energy (VHE) gamma-ray regime. Here, we use a time-dependent leptonic jet
model to reproduce the SED and optical spectral variability observed during our
multiwavelength campaign. Our model simulations could successfully reproduce
the observed SED and optical light curves and predict an intrinsic cutoff value
for the VHE gamma-ray emission at ~ 4 GeV. The effect of the optical depth due
to the intergalactic infrared background radiation (IIBR) on the peak of the
high-energy component of 3C 66A was found to be negligible. Also, the presence
of a broad line region (BLR) in the case of 3C 66A may play an important role
in the emission of gamma-ray photons when the emission region is very close to
the central engine, but further out, the production mechanism of hard X-ray and
gamma-ray photons becomes rapidly dominated by synchrotron self-Compton
emission. We further discuss the possibility of an observable X-ray spectral
variability pattern. The simulated results do not predict observable hysteresis
patterns in the optical or soft X-ray regimes for major flares on multi-day
time scales.

<|endoftext|><|startoftext|>
Introduction
M dwarfs, the most common stars in our Galaxy, were added to
the target lists of planet-search programs soon after the first exo-
planet discoveries. Compared to Sun-like stars, they suffer from
some drawbacks: they are faint and photon noise therefore of-
ten limits measurements of their radial velocity, and many are
at least moderately active and thus prone to so-called “radial-
velocity jitter” (Saar & Donahue 1997). On the other hand, the
smaller masses of M dwarfs result in a higher wobble ampli-
tude for a given planetary mass, and their p-mode oscillations
have both smaller amplitudes and shorter periods than those of
solar type stars. These oscillations therefore average out much
faster. As a result, the detection of an Earth-like planet in the –
closer – habitable zone of an M dwarf is actually within reach
of today’s best spectrographes. Perhaps most importantly, how-
ever, M dwarfs represent unique targets to probe the dependance
on stellar mass of planetary formation, thanks to the wide mass
range (0.1 to 0.6 M�) spanned by that spectral class alone.
Send offprint requests to: X. Bonfils
? Based on observations made with the HARPS instrument on the
ESO 3.6 m telescope under the GTO program ID 072.C-0488 at Cerro
La Silla (Chile).
The first planet found to orbit an M dwarf, GJ 876b (Delfosse
et al. 1998; Marcy et al. 1998), was only the 9th exoplanet
discovered around a main sequence star. Besides showing that
Jupiter-mass planets can form at all around very-low-mass stars,
its discovery suggested that they might be common, since it was
found amongst the few dozen M dwarfs that were observed at
that time. Against these early expectations, no other M dwarf
was reported to host a planet until 2004, though a second planet
(GJ 876c, mp sin i = 0.56 MJup – Marcy et al. 2001) was soon
found around GJ 876 itself.
In 2004, the continuous improvement of the radial-velocity
techniques resulted in the quasi-simultaneous discovery of three
Neptune-mass planets, around µ Ara (mp sin i = 14 M⊕ – Santos
et al. 2004), ρ Cnc (mp sin i = 14 M⊕ – McArthur et al. 2004)
and GJ 436 (mp sin i = 23 M⊕ – Butler et al. 2004; Maness et al.
2006). Of those three, GJ 436b, orbits an M dwarf, and put that
spectral class back on the discovery forefront. It was soon fol-
lowed by another two, a single planet around GJ 581 (mp sin i =
17 M⊕ – Bonfils et al. 2005b) and a very light (mp = 7.5 M⊕)
third planet in the GJ 876 system (Rivera et al. 2005). As a re-
sult, planets around M dwarfs today represent a substantial frac-
tion (30%) of all known planets with m sin i <∼ 30 M⊕.
2 X. Bonfils et al.: An 11 M⊕ planet around the nearby M dwarf GJ 674
Even with GJ 849b (mp sin i = 0.82 MJup – Butler et al. 2006)
now completing the inventory of M-dwarf planets found with
radial-velocity techniques, the upper-range of planet masses re-
mains scarcely populated. This contrasts both with the (still very
incompletely known) Neptune-mass planets orbiting M dwarfs
and with the jovian planets around Sun-like stars. At larger sep-
arations, microlensing surveys similarly probe the frequency of
planets as a function of their mass. That technique has detected
four putative planets that likely orbit M dwarfs: OGLE235-
MOA53b (mp ∼ 1.5 − 2.5 MJup – Bond et al. 2004), OGLE-05-
071Lb (mp = 0.9 MJup – Udalski et al. 2005), OGLE-05-390Lb
(mp = 0.017 MJup – Beaulieu et al. 2006) and OGLE-05-169Lb
(mp = 0.04 MJup – Gould et al. 2006). Two of these four plan-
ets have likely masses below 0.1 MJup . Given the detection bias
of that technique towards massive companions, this again sug-
gests that Neptune-mass planets are much more common than
Jupiter-mass ones around very-low-mass stars.
Here we report the discovery of a 11 M⊕ planet orbiting
GJ 674 every 4.69 days. GJ 674b has the 5th lowest mass of
the known planets, and coincidentally is also the 5th planetary
system centered on a M dwarf. Its detection adds to the small in-
ventory of both very-low mass planets and planets around very-
low mass stars. After reviewing the properties of the GJ 674 star
(§2), we briefly present our radial velocity measurements (§3)
and their Keplerian analysis (§4). A careful analysis of the mag-
netic activity of GJ 674 (§5) assigns one of the two periodicities
to rotational modulation of a stellar spot signal, and the other
one to a bona fide planet. We conclude with a brief discussion of
the properties of the detected planet.
2. The properties of GJ 674
GJ 674 (HIP 85523, LHS 449) is a M2.5 dwarf (Hawley et al.
1997) in the Altar constellation. At 4.5 pc (π = 220.43±1.63 mas
– ESA 1997), it is the 37th closest stellar system, the 54th clos-
est star (taking stellar multiplicity into account)1, and only the
2nd closest known planetary system (after � Eridani, and slightly
closer than GJ 876).
Its photometry (V = 9.382 ± 0.012; K = 4.855 ± 0.018 –
Turon et al. 1993; Cutri et al. 2003) and parallax imply absolute
magnitudes of MV = 11.09 ± 0.04 and MK = 6.57 ± 0.04. GJ
674’s J − K color (= 0.86 – Cutri et al. 2003) and the Leggett
et al. (2001) colour-bolometric relation result in a K-band bolo-
metric correction of BCK = 2.67, and in a 0.016 L� luminosity.
The K-band mass-luminosity relation of Delfosse et al.
(2000) gives a 0.35 M� mass and the Bonfils et al. (2005a)
photometric calibration of the metallicity results in [Fe/H] =
−0.28 ± 0.2.
The moderate X-ray luminosity (Lx/Lbol ' 5.10−5 – Hünsch
et al. 1999) and Ca ii H & K emission depict a modestly active
M dwarf (Fig. 1). Its UVW galactic velocities place GJ 674 be-
tween the young and old disk populations (Leggett 1992), sug-
gesting an age of ∼ 108−9yr.
Last but not least, since we are concerned with radial veloc-
ities, the high proper motion of GJ 674 (1.05 arcsec yr−1 – ESA
1997) changes the orientation of its velocity vector along the
line-of-sight (e.g. Kürster et al. 2003) to result in an apparent
secular acceleration of 0.115 m s−1 yr−1. At our current precision
this acceleration will not be detectable before another decade.
1 on Mar. 1st 2007 (http://www.chara.gsu.edu/RECONS/TOP100.htm)
3962.5 3965 3967.5 3970 3972.5 3975
Wavelength [Å]Wavelength [Å]
Fig. 1. Emission reversal in the Ca ii H line of GJ 674 (top) and GJ
581 (bottom). Within our sample GJ 581 has one of the weakest Ca ii
emission and illustrates a very quiet M dwarf. GJ 674 has much stronger
emission and is moderately active.
Table 1. Observed and inferred stellar parameters for GJ 674
Parameter GJ 674
Spectral Type M2.5
V 9.382 ± 0.012
π [mas] 220.43 ± 1.63
Distance [pc] 4.54 ± 0.03
MV 11.09 ± 0.04
K 4.855 ± 0.018
MK 6.57 ± 0.04
L? [L�] 0.016
Lx/Lbol 5.10−5
v sin i [km s−1] . 1
dvr/dt [m s−1yr−1] 0.115
[Fe/H] −0.28
M? [ M�] 0.35
age [Gyr] 0.1-1
Teff [K] 3500-3700
3. Radial-velocity data
We observed GJ 674 with the HARPS echelle spectrograph
(Mayor et al. 2003) mounted on the ESO 3.6-m telescope at La
Silla Observatory (Chile). After demonstrating impressive planet
finding capabilities right after its commissioning (Pepe et al.
2004), this spectrograph now defines the state of the art in radial-
velocity measurements, delivering a significantly better preci-
sion than its ambitious 1 m s−1 specification. As one recent pub-
lished example, Lovis et al. (2006a) obtained a 0.64 m s−1 disper-
sion for the residuals of their orbital solution of the 3 Neptune-
mass planets of HD 69830.
We observed GJ 674 without interlaced Thorium-Argon light
to obtain cleaner spectra for spectroscopic analysis, at some
small cost in the ultimate Doppler precision. Since June 2004
we have gathered 32 exposures of 900 s each with a median
S/N ratio of ∼ 90. Their Doppler information content, evalu-
ated according to the prescriptions of Bouchy et al. (2001), is
mostly below 1 m s−1. Our internal errors additionally include,
in quadrature sum, an “instrumental” uncertainty of 0.5 m s−1 for
the nightly drift of the spectrograph (since we do not use the
ThAr lamp to monitor it) and the measurements uncertainty of
the daily wavelength zero point calibration. We did benefit of
the recent improvements of the HARPS wavelength calibration,
which is now stable to 0.1 m s−1 (Lovis et al. 2006b).
A constant radial velocity gives a very large reduced chi-
square (χ̄2 = 132) for the time series, which reflects a disper-
X. Bonfils et al.: An 11 M⊕ planet around the nearby M dwarf GJ 674 3
σ = 7.35 m s−1
χ̄2ν = 132
53000 53250 53500 53750 54000
Julian date −2,400,000 [day]Julian date −2,400,000 [day]
1 10 100 1000
Period [day]Period [day]
Fig. 2. Upper panel: Radial-velocity measurements of GJ 674 as a func-
tion of time. The high dispersion (σ = 7.35 m s−1) and chi-square value
(χ̄2 = 132) betray a (coherent or incoherent) signal in the data. Bottom
panel: the Lomb-Scargle periodogram of the velocities has prominent
power excess around P = 4.69 days (downward arrow), which indi-
cates that much of the excess dispersion reflects a coherent signal with
a period close to that value. The second highest peak, at 1.27 day, is a
one-day alias of the 4.69 days period (1.27 ' 1 + 1/4.69).
sion (∼ 7.4 m s−1) well above our internal errors (Fig. 2). This
prompted a search for an orbital (§4) and/or magnetic activity
(§5) signal.
4. Orbital analysis
A Lomb-Scargle periodogram (Press et al. 1992) of the velocity
measurements shows a narrow peak around 4.69-day (Fig. 2).
Adjustment of a single Keplerian orbit demonstrates that it is
best described by a m2 sin i =12.7 M⊕ planet (0.040 MJup) re-
volving around GJ 674 every P2 = 4.6940 ± 0.0005 days in a
slightly eccentric orbit (e2 = 0.10 ± 0.02). The residuals around
this low-amplitude orbit (K1 = 9.8±0.2 m s−1) have a dispersion
of 3.27 m s−1 (Fig. 3), still well above our measurement errors,
and the reduced chi-square per degree of freedom is χ̄2 = 30.6.
A periodogram of the residuals indicates that much of this ex-
cess dispersion stems from a broad power peak centered around
35 days, prompting us to perform a 2-planet fit.
We searched for 2-planet Keplerian solutions with Stakanof
(Tamuz, in prep.), a program which uses genetic algorithms
to efficiently explore the large parameter space of multi-planet
models. Stakanof quickly converged to a 2-planet solution that
describes our measurements much better than the single planet
fit (σ = 0.82 m s−1, χ̄2 = 2.57 per degree of freedom – Fig. 4).
The orbital parameters of the 4.69-day planet change little from
the 1-planet fit, except for the eccentricity which increases to
e2 = 0.20±0.02. Its mass is revised down to M2 sin i = 11.09 M⊕,
and the period hardly changes, P2 = 4.6938 ± 0.0007 day. The
second planet would have a P3 = 34.8467 ± 0.0324 day pe-
riod, an e3 = 0.20 ± 0.05 eccentricity and a minimum mass
of m3 sin i = 12.58 M⊕. Such periods would correspond to semi-
major axes of 0.04 and 0.15 AU. Those are sufficiently disjoint
σ = 3.27 m s−1
χ2 = 30.6
0 0.25 0.5 0.75 1
53000 53250 53500 53750 54000
Julian date −2,400,000 [day]Julian date −2,400,000 [day]
1 10 100 1000
Period [day]Period [day]
Fig. 3. Upper panel: Radial velocities of GJ 674 (red filled circles)
phase-folded to the 4.6940 days period of the best 1-planet fit (curve).
The dispersion around the fit (σ = 3.27 m s−1) and its reduced chi-
square (χ̄2 = 30.6 per degree of freedom) indicate that a single planet
does not describe the data very well. Middle panel: Radial-velocity
residuals of the 1-planet fit Bottom panel: The Lomb-Scargle peri-
odogram of the residuals shows a broad peak centered around 35 days.
that mutual interactions can be neglected over observable time
scales, and that the system would be stable over longer time
scales.
The low dispersion around the solution and the lack of any
significant peak in the Lomb-Scargle periodogram of its residu-
als shows that our current radial-velocity measurements contain
no evidence for an additional component.
5. Activity analysis
Apparent Doppler shifts unfortunately do not always originate
in the gravitational pull of a companion: in a rotating star, stellar
surface inhomogeneities such as plages and spots can break the
exact balance between light emitted in the red-shifted and blue-
shifted halves of the star. Observationally, these inhomogeneities
translate into flux variations as well as into changes of both the
shape and the centroid of spectral lines (Saar & Donahue 1997;
Queloz et al. 2001). Spots typically also impact spectral indices,
whether designed to probe the chromosphere (to which photo-
spheric spots have strong magnetic connections), or the photo-
sphere (because spots have cooler spectra). Of the two candidate
periods, the 4.69-day one is unlikely to reflect stellar rotation.
We measure from our GJ 674 spectra a rotational velocity of
v sin i . 1 km s−1, which would need a rather unprobable stellar
4 X. Bonfils et al.: An 11 M⊕ planet around the nearby M dwarf GJ 674
σ = 0.82 m s−1
χ2 = 2.57
0 0.25 0.5 0.75 1
53000 53250 53500 53750 54000
Julian date −2,400,000 [day]Julian date −2,400,000 [day]
1 10 100 1000
Period [day]Period [day]
Fig. 4. Top two panels: Radial velocity measurements phased to each of
the two periods, after subtraction of the other component of our best 2-
planet Keplerian model. Third panel: Residuals of the best 2-planet fit
as a function of time (O−C, Observed minus Computed). Bottom panel:
Lomb-Scargle periodogram of these residuals.
inclination (i . 15◦) to match such a short period. The moderate
activity level of GJ 674 on the other hand leaves the nature of
the second signal a priori uncertain, and the very small rotation
velocity removes much of the power of the usual bisector test
(Appendix A). We therefore investigated its magnetic activity
through photometric observations (§5.1) and detailed examina-
tion of the chromospheric features in the clean HARPS spectra
(§5.2).
5.1. Photometric variability
We obtained photometric measurements with the CCD cam-
era of the Euler Telescope (La Silla) during 21 nights between
September 2nd and October 19th 2006. GJ 674 was observed
through a VG filter which, amongst the available filters, opti-
∆F = −0.013 sin(2π/35.68 + 54259.93)
53980 53990 54000 54010 54020 54030
Julian date −2,400,000 [day]Julian date −2,400,000 [day]
1 10 100 1000
Period [day]Period [day]
Fig. 5. Upper panel: Differential photometry of GJ 674 as a function
of time. The star clearly varies with a 1.3% amplitude. Bottom panel:
The periodogram of the GJ 674 photometry exhibits significant power
excess peaked at 35 days (small black arrow).
mizes the flux ratio between GJ 674 and its two brightest refer-
ence stars. This relatively blue filters also happens to have good
sensitivity to spots on cool stars such as GJ 674. To minimize at-
mospheric scintillation noise we took advantage of the low stel-
lar density to defocus the images to FWHM ∼ 11′′, so that we
could use longer exposure times. The increased read-out and sky
background noises from the larger synthetic aperture which we
then had to use remain negligible compared to both stellar pho-
ton noise and scintillation.
We gathered 14 to 75 images per night with a median expo-
sure time of 20 seconds. We used the Sept. 24th data, which have
the longest nightly time base, to tune the parameters of the Iraf
Daophot package and optimize the set of reference stars (HD
157931, CD 4611534 and 7 anonymous fainter stars) to min-
imize the dispersion in the GJ 674 photometry for that night.
These parameters were then fixed for the analysis of the full
data set. The nightly light curves for GJ 674 were normalized
by that of the sum of the references, clipped at 3-σ to remove
a small number of outliers, and averaged to one measurement
per night to examine the long term photometric variability of
GJ 674. GJ 674 clearly varies with a ∼1.3% amplitude, and a
(quasi-)period close to 35 days (Fig. 5). To verify that this vari-
ability does not actually originate in one of the reference stars,
we repeated the analysis alternately using as reference star HD
157931 alone and the average of the 8 other references. Both
light curves are very similar to Fig. 5.
The photometric observations are consistent with the signal
of a single spot, within the limitations of their incomplete phase
coverage: the variations are approximately sinusoidal, and their
∼0.2-0.3 radian phase shift from the corresponding radial veloc-
ity signal closely matches the difference expected for a spot. The
spot would cover 2.6% of the stellar surface if completely dark,
corresponding to a ∼ 0.16 R? radius for a circular spot.
X. Bonfils et al.: An 11 M⊕ planet around the nearby M dwarf GJ 674 5
Hα Ca II H+K−5
0 1 2 3 4 5
IndexIndex
Ca II H+K
0 0.2 0.4 0.6 0.8 1
Ca II H+K
1 10 100 1000
Period [day]Period [day]
Fig. 6. Upper panel: Differential radial velocity of GJ 674, corrected for the signature of the 4.69 days planet in our 2-planet Keplerian fit, as
a function of the Hα (red filled circles) and Ca ii H&K (green filled squares) spectral indices defined in the text. Bottom right panels: the Ca ii
H+K and Hα indexes phased to the longer period of the 2-planet Keplerian model. Bottom left panels: Power Density spectra of the spectroscopic
indexes. A clear power excess peaks at 34.8 days (vertical dashed lines).
5.2. Variability of the spectroscopic indices
The emission reversal in the core of the Ca iiH&K resonant lines
results from non-radiative heating of the chromosphere, which
is closely coupled to spots and plages through magnetic con-
nections between the photosphere and chromosphere. The Hα
line is similarly sensitive to chromospheric activity. We mea-
sured these chromospheric spectral features results in the clean
HARPS spectra used to measure the radial velocities, and exam-
ine their variability.
Like the well known Mt. Wilson S index (Baliunas et al.
1995), our Ca ii H+K index is defined as:
Index =
H + K
B + V
. (1)
with H and K sampling the two lines of the Ca ii doublet, and
B and V the continuum on both sides of the doublet. Our H
and K intervals are 31 km s−1 wide and centered on 3933.664
and 3968.47 Å, while B and V are respectively integrated over
[3952.6, 3956 Å] and [3974.8, 3976 Å].
This H+K index varies with a clear period of ∼34.8 days
(Fig. 6). Within the combined errors this is consistent with both
the photometric period and the longer radial velocity period. The
phasing of the chromospheric index and the photometry is such
that lower photometric flux matches higher Ca ii emission, as ex-
pected if active chromospheric regions hover over photospheric
spots.
A plot of the (apparent) radial-velocity as a function of the
H+K spectral index similarly shows the characteristic loop pat-
tern expected for a spot. The radial velocity effect of a spot can-
cels out when it crosses the sub-observer meridian, which oc-
curs twice during a rotation period: once on the hemisphere fac-
ing the observer, and once on the opposite hemisphere. During
the front-facing crossing the spot has maximal projected area,
hence maximal chromospheric emission, while it has a minimal
projected area (and is possibly hidden, depending on its latitude
and the stellar inclination) during the back-facing crossing. As
a result, both extrema of the chromospheric index correspond to
radial-velocity zero-crossings. At intermediate phases the spot
produces intermediate chromospheric emission levels, and it in-
duces positive (respectively negative) radial-velocity shifts when
the masked area is on the rotationally blue- (respectively red-
shifted) half of the star. The net result in a plot of chromospheric
emission as a function of radial velocity is a closed loop.
Chromospheric filling-in of photospheric Hα absorption
has similarly been found a powerful activity diagnostic for M
dwarfs. Kürster et al. (2003) found that in Barnard’s star it corre-
lates linearly with the radial-velocity variations, and interpreted
that finding as evidence that active plage regions inhibit the con-
vective velocity field. The variation pattern in GJ 674 definitely
differs from a linear correlation between Hα and the radial-
velocity residuals, and needs a different explanation.
6 X. Bonfils et al.: An 11 M⊕ planet around the nearby M dwarf GJ 674
Similarly to Kürster et al. (2003) we define our Hα index as:
Index =
F1 + F2
. (2)
with FHα sampling the Hα line, and F1 and F2 the contin-
uum on both sides of the line. Our FHα interval is 31 km s−1
wide and centered on 6562.808 Å, while F1 and F2 are respec-
tively integrated over [6545.495, 6556.245 Å] and [6575.934,
6584.684 Å]. The Hα index behaves similarly to the Ca ii H+K
index.
The chromospheric indices vary by factors of ∼2 and ∼1.3
(for our specific choices of continuum windows), and are thus
much more contrasted than the photometry. They do not how-
ever vary as smoothly with phase as the photometry, perhaps
due to (micro-)flares. This somewhat reduces their value as diag-
nostics of spot-induced radial velocity variations, but these mea-
surements on the other hand require no new observation. They
undoubtedly reinforce the spot interpretation here, and they will
be extremely useful in cases where photometry cannot be imme-
diately obtained.
5.3. Planets vs. activity
In §4 we showed that our 32 radial-velocity measurements of
GJ 674 are well described by two Keplerian signals, as illustrated
by the low reduced chi-square of that model. The above analy-
sis (§5) however demonstrates that the rotation period of GJ 674
coincides with the longer of the two Keplerian periods. Both the
stellar flux and the Ca iiH+K emission vary with that period, im-
plying that the surface of GJ 674 has a magnetic spot. This spot
must induce radial-velocity changes, with the observed phase
relative to the photometric signal. As a consequence, some, and
probably all, of the 35-day radial-velocity signal must originate
in the spot. Planet-induced activity through magnetic coupling
(e.g. Shkolnik et al. 2005) would in principle be an alternative
explanation of the correlation, but here it is not a very attrac-
tive one: the inner planet is at least as massive as the hypothet-
ical 35-day planet, and would, at least naively, be expected to
have stronger interactions with the magnetosphere of GJ 674.
The 4.69-day period however is only seen in the radial velocity
signal, and it has no photometric or chromospheric counterpart.
6. Discussion
6.1. Characteristics of GJ 674b
Perhaps the most important result of the above analysis is that
the ∼4.69-day planet of GJ 674 is robust: variability identifies
the stellar rotation period as ∼35 days, and the 4.69-day period
therefore cannot reflect rotation modulation. The short period
signal, in spite of its larger amplitude, also has no counterpart in
either photometry or chromospheric emission, further excluding
a signal caused by magnetic activity.
The 1-planet fit, which effectively treats the activity sig-
nal as white noise, results in a minimum mass for GJ 674b
of m2 sin i = 12.7 M⊕. The 2-planet fit by contrast filters
out this signal. That filtering obviously uses a physical model
which is not completely appropriate, but that remains preferable
to handling a (partly) coherent signal as white noise. We there-
fore adopt the corresponding estimate of the minimum mass,
m2 sin i = 11.09 M⊕.
At 0.039 AU from its parent star, the temperature of GJ 674 b
is ∼450 K. Planets above a few Earth masses planets can, but
need not, accrete a large gas fraction, leaving its composition –
mostly gaseous or mostly rocky – unclear. The orbital eccentric-
ity might shed light on the structure of GJ 674b, if confirmed
by additional measurements: rocky and gaseous planets have
rather different dissipation properties, and significant eccentric-
ity at the short period of GJ 674 b needs a high Q factor, unless it
is pumped by an additional planet at a longer period (e.g. Adams
& Laughlin 2006). For now, the stellar activity leaves the statis-
tical significance of the eccentricity slightly uncertain, and we
therefore prefer to stay clear from overinterpreting it.
6.2. Properties of M-dwarf planets
One important motivation in searching for planets around M
dwarfs is to investigate whether the planet-metallicity correla-
tion found for Jupiter-mass planets around solar-type stars ex-
tends to very-low-mass stars. Our photometric calibration of M
dwarfs metallicity (Bonfils et al. 2005b) gives respective metal-
licities of [Fe/H]=−0.03, −0.25, +0.14, +0.03, and −0.28 for
GJ 436, GJ 581, GJ 849, GJ 876 and GJ 674. M dwarfs with
known planets therefore have an average metallicity of −0.078
and a median of −0.03. By comparison, the 44 M dwarfs of the
Bonfils et al. (2005b) volume limited sample which are not cur-
rently known to host a planet have average and median metal-
licities of −0.181 and −0.160. M dwarfs with planets therefore
appear slightly more metal-rich than M dwarfs without planets.
A Kolmogorov-Smirnov test (Press et al. 1992) of the two sam-
ples gives an 11% probability that they are drawn from the same
distribution. The significance of the discrepancy is therefore still
modest, limited by small-number statistics.
One can additionally note that the two stars which host giant
planets, GJ 876 and GJ 849, occupy the metal-rich tail of the M
dwarf metallicity distribution, with GJ 849 almost as metal-rich
as the most metal-rich star of the comparison sample. The next
most metal-rich of the M dwarfs with planets, GJ 436, has an ad-
ditional long-period companion (P>6 yr) which might well be a
giant planet (Maness et al. 2006) and would then strengthen that
trend. If confirmed by additional data, this would validate the
theoretical predictions (Ida & Lin 2004; Benz et al. 2006) that
only Jovian-mass planets are more likely to form around metal-
rich stars. Current observations are consistent with this predic-
tion, but not yet very conclusively so (Udry et al. 2006).
Much recent theoretical work has gone into examining how
planet formation depends on stellar mass. Within the “core ac-
cretion” paradigm, Laughlin et al. (2004) and Ida & Lin (2005)
predict that giant planet formation is inhibited around very-
low-mass stars, while Neptune-mass planets should inversely
be common. Within the same paradigm, but assuming that M
dwarfs have denser protoplanetary disks, Kornet et al. (2006)
predict instead that Jupiter-mass planets become more frequent
in inverse proportion to the stellar mass. Finally, Boss (2006) ex-
amines how planet formation depends on stellar mass for plan-
ets formed by disk instability, and concludes that frequency of
Jupiter-mass planet is independent of stellar mass, as long as
disks are massive enough to become unstable.
To date, none of the ∼300 M dwarfs scrutinized for plan-
ets by the various radial-velocity searches (Bonfils et al. 2006;
Endl et al. 2006; Butler et al. 2006) has been found to host a
hot Jupiter. Conversely, GJ 674b is already the 4th hot Neptune.
Though that cannot be established quantitatively yet, these sur-
veys are likely to be almost complete for hot Jupiters, which
are easily detected. Hot Neptune detection, on the other hand, is
definitely highly incomplete. Setting aside this incompleteness
for now, simple binomial statistics shows that the probability of
X. Bonfils et al.: An 11 M⊕ planet around the nearby M dwarf GJ 674 7
−0.75 −0.5 −0.25 0 0.25
[Fe/H][Fe/H]
Fig. 7. Upper panel: Metallicity distributions of 44 M dwarfs without
known planets (gray shading) and of the 5 M dwarfs known to host
planets (black shading). Bottom panel: Corresponding cumulative dis-
tributions (solid and dashed lines, respectively).
Table 2. Keplerian parameterization for GJ 674b and GJ 674’s spot.
Parameter GJ 674b Spot
P [days] 4.6938 ± 0.007 34.8467 ± 0.0324
T [JD] 2453780.085 ± 0.078 2453767.13 ± 0.92
e 0.20 ± 0.02 0.20 ± 0.05
ω [deg] 143 ± 6 113 ± 9
K [m s−1] 8.70 ± 0.19 5.06 ± 0.19
a1 sin i [AU] 3.68 10−6 1.59 10−5
f (m) [M�] 3.0 10−13 4.4 10−13
m2 sin i [MEarth] 11.09 12.58
a [AU] 0.039 0.147
finding no and 4 detections in 300 draws of the same function is
only 3%. There is a thus 97% probability that hot Neptunes are
more frequent than hot Jupiter around M dwarfs. Accounting for
this detection bias in more realistic simulations (Bonfils et al.
in prep.) obviously increases the significance of the difference.
Planet statistics around M dwarfs therefore favor the theoreti-
cal models which, at short periods, predict more Neptune-mass
planets than Jupiter-mass planets.
Acknowledgements. We are grateful to the anonymous referee for construc-
tive comments. XB and NCS acknowledge support from the Fundação para
a Ciência e a Tecnologia (Portugal) in the form of fellowships (references
SFRH/BPD/21710/2005 and SFRH/BPD/8116/2002) and a grant (reference
POCI/CTE-AST/56453/2004). The photometric monitoring has been performed
on the EULER 1.2 meter telescope at La Silla Observatory. We are grateful to
the SNF (Switzerland) for its continuous support. This research has made use of
the SIMBAD database, operated at CDS, Strasbourg, France.
References
Adams, F. C. & Laughlin, G. 2006, ApJ, 649, 1004
Baliunas, S. L., Donahue, R. A., Soon, W. H., et al. 1995, ApJ, 438, 269
Beaulieu, J.-P., Bennett, D. P., Fouqué, P., et al. 2006, Nature, 439, 437
Benz, W., Mordasini, C., Alibert, Y., & Naef, D. 2006, in Tenth Anniversary
of 51 Peg-b: Status of and prospects for hot Jupiter studies, ed. L. Arnold,
F. Bouchy, & C. Moutou, 24–34
Bond, I. A., Udalski, A., Jaroszyński, M., et al. 2004, ApJ, 606, L155
Bonfils, X., Delfosse, X., Udry, S., Forveille, T., & Naef, D. 2006, in Tenth
Anniversary of 51 Peg-b: Status of and prospects for hot Jupiter studies, ed.
L. Arnold, F. Bouchy, & C. Moutou, 111–118
Bonfils, X., Delfosse, X., Udry, S., et al. 2005a, A&A, 442, 635
Bonfils, X., Forveille, T., Delfosse, X., et al. 2005b, A&A, 443, L15
Boss, A. P. 2006, ApJ, 643, 501
Bouchy, F., Pepe, F., & Queloz, D. 2001, A&A, 374, 733
Butler, R. P., Johnson, J. A., Marcy, G. W., et al. 2006, ArXiv Astrophysics e-
prints
Butler, R. P., Vogt, S. S., Marcy, G. W., et al. 2004, ApJ, 617, 580
Cutri, R. M., Skrutskie, M. F., van Dyk, S., et al. 2003, 2MASS
All Sky Catalog of point sources. (The IRSA 2MASS All-
Sky Point Source Catalog, NASA/IPAC Infrared Science
Archive. http://irsa.ipac.caltech.edu/applications/Gator/)
Delfosse, X., Forveille, T., Mayor, M., et al. 1998, A&A, 338, L67
Delfosse, X., Forveille, T., Ségransan, D., et al. 2000, A&A, 364, 217
Endl, M., Cochran, W. D., Kürster, M., et al. 2006, ApJ, 649, 436
ESA. 1997, VizieR Online Data Catalog, 1239, 0
Gould, A., Udalski, A., An, D., et al. 2006, ApJ, 644, L37
Hawley, S. L., Gizis, J. E., & Reid, N. I. 1997, AJ, 113, 1458
Hünsch, M., Schmitt, J. H. M. M., Sterzik, M. F., & Voges, W. 1999, Late-type
stars in the ROSAT all-sky survey, ed. B. Aschenbach & M. J. Freyberg, 387–
Ida, S. & Lin, D. N. C. 2004, ApJ, 604, 388
Ida, S. & Lin, D. N. C. 2005, ApJ, 626, 1045
Kornet, K., Wolf, S., & Różyczka, M. 2006, A&A, 458, 661
Kürster, M., Endl, M., Rouesnel, F., et al. 2003, A&A, 403, 1077
Laughlin, G., Bodenheimer, P., & Adams, F. C. 2004, ApJ, 612, L73
Leggett, S. K. 1992, ApJS, 82, 351
Leggett, S. K., Allard, F., Geballe, T. R., Hauschildt, P. H., & Schweitzer, A.
2001, ApJ, 548, 908
Lovis, C., Mayor, M., Pepe, F., et al. 2006a, Nature, 441, 305
Lovis, C., Mayor, M., Pepe, F., Queloz, D., & Udry, S. 2006b, in Precision
Spectroscopy in Astrophysics
Maness, H. L., Marcy, G. W., Ford, E. B., et al. 2006, ArXiv Astrophysics e-
prints
Marcy, G. W., Butler, R. P., Fischer, D., et al. 2001, ApJ, 556, 296
Marcy, G. W., Butler, R. P., Vogt, S. S., Fischer, D., & Lissauer, J. J. 1998, ApJ,
505, L147
Mayor, M., Pepe, F., Queloz, D., et al. 2003, The Messenger, 114, 20
McArthur, B. E., Endl, M., Cochran, W. D., et al. 2004, ApJ, 614, L81
Pepe, F., Mayor, M., Queloz, D., et al. 2004, A&A, 423, 385
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 1992,
Numerical recipes in C. The art of scientific computing (Cambridge:
University Press, —c1992, 2nd ed.)
Queloz, D., Henry, G. W., Sivan, J. P., et al. 2001, A&A, 379, 279
Rivera, E. J., Lissauer, J. J., Butler, R. P., et al. 2005, ApJ, 634, 625
Saar, S. H. & Donahue, R. A. 1997, ApJ, 485, 319
Santos, N. C., Bouchy, F., Mayor, M., et al. 2004, A&A, 426, L19
Shkolnik, E., Walker, G. A. H., Bohlender, D. A., Gu, P.-G., & Kürster, M. 2005,
ApJ, 622, 1075
Turon, C., Creze, M., Egret, D., et al. 1993, Bulletin d’Information du Centre de
Donnees Stellaires, 43, 5
Udalski, A., Jaroszyński, M., Paczyński, B., et al. 2005, ApJ, 628, L109
Udry, S., Mayor, M., Benz, W., et al. 2006, A&A, 447, 361
Appendix A: Bisector analysis
As demonstrated by Saar & Donahue (1997) the bisector anal-
ysis loses much of its diagnostic power when applied to slow
rotators. In simulations of the impact of star spots on radial-
velocity and bisector measurements, they found that, for a given
spot configuration, the radial velocity varies linearly with v sin i
while the bisector span varies as (v sin i)3.3. The bisector signal
therefore decreases faster with decreasing rotational velocities
than the radial-velocity signal, and disappears faster in measure-
ment noise. For GJ 674 we measure a very low rotation velocity
(v sin i < 1km s−1). It is therefore unsurprising that the cor-
8 X. Bonfils et al.: An 11 M⊕ planet around the nearby M dwarf GJ 674
−7.5 −5 −2.5 0 2.5 5 7.5
RV [m/s]RV [m/s]
1 10 100 1000
Period [day]Period [day]
Fig. A.1. Bisector analysis for GJ 674 measurements
relation between the bisector span and radial velocity is weak
(Fig. A.1) and not statistically significant.
ABSTRACT
  Context: How planet properties depend on stellar mass is a key diagnostic of
planetary formation mechanisms. Aims: This motivates planet searches around
stars which are significantly more massive or less massive than the Sun, and in
particular our radial velocity search for planets around very-low mass stars.
Methods: As part of that program, we obtained measurements of GJ 674, an M2.5
dwarf at d=4.5 pc, which have a dispersion much in excess of their internal
errors. An intensive observing campaign demonstrates that the excess dispersion
is due to two superimposed coherent signals, with periods of 4.69 and 35 days.
Results: These data are well described by a 2-planet Keplerian model where each
planet has a ~11 Mearth minimum mass. A careful analysis of the (low level)
magnetic activity of GJ 674 however demonstrates that the 35-day period
coincides with the stellar rotation period. This signal therefore originates in
a spot inhomogeneity modulated by stellar rotation. The 4.69-day signal on the
other hand is caused by a bona-fide planet, GJ 674b. Conclusion: Its detection
adds to the growing number of Neptune-mass planets around M-dwarfs, and
reinforces the emerging conclusion that this mass domain is much more populated
than the jovian mass range. We discuss the metallicity distributions of M dwarf
with and without planets and find a low 11% probability that they are drawn
from the same parent distribution. Moreover, we find tentative evidence that
the host star metallicity correlates with the total mass of their planetary
system.

<|endoftext|><|startoftext|>
arXiv:0704.0271v2  [cond-mat.stat-mech]  27 Feb 2008
An individual based model with global competition interaction: fluctuation effects in
pattern formation
E. Brigatti ⋆±, V. Schwämmle† and Minos A. Neto⋆
†Centro Brasileiro de Pesquisas F́ısicas, Rua Dr. Xavier Sigaud 150, 22290-180, Rio de Janeiro, RJ, Brazil
⋆Instituto de F́ısica, Universidade Federal Fluminense, Campus da Praia Vermelha, 24210-340, Niterói, RJ, Brazil
±e-mail address: edgardo@if.uff.br
(October 26, 2018)
We present some numerical results obtained from a simple individual based model that describes
clustering of organisms caused by competition. Our aim is to show that, even when a deterministic
description developed for continuum models predicts no pattern formation, an individual based
model displays well defined patterns, as a consequence of fluctuation effects caused by the discrete
nature of the interacting agents.
87.17.Aa,87.23.Kg, 87.23.-n, 05.10.Ln
I. INTRODUCTION
Birth and death processes are two of the most relevant
characteristics of the dynamics of biological populations
and can be responsible for the emergence of stable spa-
tial patterns [1]. In fact, the intrinsic asymmetry in the
nature of birth and death processes can enhance small
initial differences in the spatial population density and
lead to the formation of structures [2–4]. These clusters
are resistant to some levels of diffusion and emerge as
soon as the birth of new individuals outcompetes their
movements. For this reason, simplified models combin-
ing birth and death processes with Brownian movement
are able to describe aggregation of individuals [5].
Another central ingredient, present in ecological sys-
tems, that can cause the generation of spatial structures,
is the competition for resources [6–8]. Different individ-
uals struggle for nutrients with a competition strength
directly dependent on the individuals’ spatial density
within the competing range. Reproduction and/or death
rates depending on the number of individuals in the sur-
roundings can stand for this kind of interaction. This fea-
ture has attracted the interest of experts from a variety of
fields, ranging from pure mathematics [9] and non-linear
physics [10–14,27] to population biology [16,17] and the-
oretical studies in evolutionary theory [20–25]. In ad-
dition, similar behaviors can be found in physical sys-
tems, such as, for example, in mode interaction in crys-
tallization fronts [18] and in spin-wave patterns [19]. It
is remarkable that the structured state generated by this
kind of frequency-dependent interaction exists only for
some specific form of the interaction [24] and is reached
through a transition in the parameter space. This transi-
tion, (segregation transition [10]), drives the steady state
of the system from a spatially homogeneous distribution
to one marked by some clearly distinguishable inhomo-
geneities.
All these models, characterized by diffusion effects and
an implementation of frequency dependent birth and
death processes, permit multiple interpretations.
In a common interpretation the system space directly
represents the physical space where the organisms live
and the diffusion represents their spatial movement.
Competition between individuals corresponds to a mech-
anism of growth control caused by limited common re-
sources. In this case, pattern formation can reproduce
the evolution of bacterial colonies [6], plankton concen-
tration [5], development of vegetation [8] or spatial dis-
tribution of predators [7].
On the other hand, a different interpretation enables
us to describe the speciation process: the generation of
two different species starting from one single continuous
population of interbreeding organisms. To be specific,
we can describe the speciation process by representing
all the phenotypic characteristics that determine the bi-
ological success of an individual by a number, the strat-
egy value, that labels each individual. By reproduction,
that includes a mutation process, an offspring inherits a
strategy that slightly differs from that of its parent. In
order to model natural selection, a frequency-dependent
mechanism that mimics competition, completes the in-
gredients necessary for the emergence of population clus-
tering. In this scenario, the generation of a new cluster
is interpreted, in a broad sense, as a speciation event.
Now, if the model space represents the mentioned strat-
egy space and the diffusion models the mutation process
during reproduction, we can identify the mechanism of
growth control with natural selection and the branching
events with the speciation process. This different inter-
pretation justifies the analogy between a model that de-
scribes the speciation process and the ones that describe
spatial pattern formation in the evolution of bacterial
colonies, vegetation or predation.
We must remember that, since this model does not in-
clude sexual reproduction, we are describing trait diver-
gence in an asexual population, rather than speciation.
Anyway, apart from effects strictly related to sexual re-
production, the dynamics characterized by the individu-
http://arxiv.org/abs/0704.0271v2
als’ diffusion from regions of low viability in favor of more
viable ones, is the essential core of these two phenomena.
A detailed and motivated discussion of these processes
can be found in the following references [20–25].
We can describe such processes starting from an indi-
vidual based model, that yields information on the be-
havior of a finite system (finite population) and accounts
for fluctuation effects caused by the discrete nature of the
interacting agents. Another approach, that neglects fluc-
tuations, describes individuals just with the use of a field
that represents the population density at each position in
space over time. This method, usually denominated con-
tinuous mean-field description [15,27], becomes exact in
the infinite-size limit if fluctuations are small compared
to averages [4]. Note that we are not making a mean-
field approximation in the nature of the interaction (see,
for instance, ref. [26]), that, in our model, is local and
different for each individual.
Choosing this second strategy, the generalization
of a well-investigated equation (Fisher-Kolmogoroff-
Petrovsky-Piscounoff [28]) is quite common at present.
In addition to a diffusion process with coefficient D and
a population growth mechanism (rate a), this equation
incorporates a growth-limiting process controlled by the
parameter b [11,12,14]:
∂ρ(x, t)
= D∇2ρ(x, t) + aρ(x, t) − (1.1)
b ρ(x, t)
ρ(y, t)F (x, y)dy ,
where ρ is the density of individuals at position x and
time t. Competition is obtained by varying the death
probability for each individual, and is controlled through
the influence function F (x, y). Let us focus on the shape
of the influence function: it can range from a simple box-
like function to a globally uniform interaction. However,
the Gaussian function should be considered particularly
relevant. If, for instance, we need to represent the ac-
tivity of a sedentary animal the interaction represented
in the influence function should take into account the in-
dividual’s daily excursion around the fixed breeding site
that can be represented by Brownian motion and, for this
reason, by a Gaussian distribution. In the same way, if
we want to represent the habitat degeneration induced by
the growth of a colony of plants [29] we can think that the
colony is originated by a single individual that disperses
its seeds in a way also well described by Brownian motion.
More generally, for a biological interaction that does not
stop at some defined length (presence of a cutoff), and
that is nonlocal and controlled by a purely stochastic
process, the Gaussian function should be the most natu-
ral choice. On the other hand, this choice is a source of
complications. Deterministic descriptions, in the case of
a Gaussian influence function, predict no pattern forma-
tion [12,30]. However, such descriptions do not take into
account fluctuations arising from the discrete character
of individuals. The importance of these fluctuations has
been recently pointed out in a quite paradigmatic exam-
ple, where random walking organisms that reproduce and
die at a constant rate spontaneously aggregate [2–5,31].
The deterministic approximation is not able to show
this behavior, incapable of capturing the essential asym-
metry between birth, a multiplicative process that incre-
ments the density in the regions adjacent to the parent,
and death events, that occur anywhere. Even when
the patterns can be obtained within the deterministic
description, a recent work outlines the importance of
fluctuations by showing their impact on affecting transi-
tion points and amplitudes [14,32,33].
In this work we present some numerical results ob-
tained by means of a simple individual based model. Our
aim is to show the appearance of a segregation transition
in a model where the deterministic instability, produced
by the non-local interaction, is not sufficient for gener-
ating inhomogeneities [12], but the superimposed micro-
scopic stochastic fluctuations permit the emergence of
patterns. Moreover, we compare the model implemen-
tation in the strategy space with the implementation in
the physical space. In the first, used to characterize the
speciation process, diffusion corresponds to a mutation
phenomenon, operating just one time in each individ-
ual’s life. In the second, directly related to the reaction-
diffusion equation (eq. (1.1)), diffusion describes a typical
Brownian motion.
The paper is organized as follows. The next section
describes the model used in our simulations. Section III
shows, for specific values of the parameters, the emer-
gence of spatially inhomogeneous steady states. In sec-
tion IV we prove that these patterns are not caused by
some finite size effect. Section V is devoted to illustrate
the segregation transition and general conditions that al-
low spatial segregation of arbitrary wavelengths. In sec-
tion VI we describe, in the light of the existing literature,
the cluster size dependence on diffusion rate and popu-
lation size and give some hints related to the behavior of
fluctuations. Conclusion are reported in section VII.
II. THE MODEL
The simulations start with an initial population of N0
individuals located along a ring of length L, i.e. we take
periodic boundary conditions. At each time step, our
model is controlled by the following microscopic rules:
1) Each individual, characterized by its position x, dies
with probability P ,
P = K ·
exp(−
(x− yj)2
) , (2.1)
whereN(τ) is the total number of individuals at the ac-
tual step τ and yj is the respective individuals’ positions.
The distance between two individuals is obtained by tak-
ing the shorter distance on the ring. The strength of
competition declines with increasing distance according
to a Gaussian function with deviation C. The parameter
K depicts the carrying capacity.
2) If the individual survives this death selection, it
reproduces. The newborn, starting from the parent’s lo-
cation, moves in a random direction a distance obtained
from a Gaussian distribution of standard deviation σ.
This change represents the effect of mutations in the
offspring phenotype.
As soon as all the individuals have passed the death
selection and eventually reproduced, the next time step
begins. This model implementation is analogous to the
diffusion process described by eq. (1.1).
To establish a more direct comparison between that
mean field description and the individual based simula-
tion, we have also implemented the model with an exact
microscopic representation of the diffusion term. In this
case, at any given time step, we perform first a loop over
all particles where individuals move some distance, in a
random direction, chosen from a Gaussian distribution of
standard deviation σ. At the end of this loop, a second
one starts, where:
1) Each individual with strategy x dies with a proba-
bility obtained from eq. (2.1).
2) If the individual survives the death selection pro-
cess, it reproduces and the newborn maintains the same
location of the parent.
If, in eq. (1.1), we measure time in units of the simula-
tion time step, the coefficient D is related to our simula-
tion parameter through D ∝ σ2. The influence function
is given by a Gaussian of standard deviation C and the
effective growth rate is 1− P , with P given by eq. (2.1).
This non constant growth rate can be represented by
eq. (1.1) for a = 1 and the frequency dependent part
included in the integral term (see ref. [14]).
Essentially, the only difference between these two ver-
sions of our model is that, in the first one, individuals
move only at birth, while in the second version they can
move at every time step throughout their life. Since the
death probability, at equilibrium, is approximately 1/2,
usually, in the second version, an individual will move
between one and two times during its entire life. For
this reason, there should be no relevant differences in the
qualitative behavior of the two model implementations.
As shown by the measures reported in section IV, the
only significant effect is the appearance of slightly wider
distributions.
III. MODULATION
In the following we present some typical examples of
steady states generated by the dynamics of the model
that clearly show the emergence of patterns for some spe-
cific values of the parameters.
For a global competition that results to be extremely
long-ranged (large C values, in relation to the values of
parameters σ and K), the steady state is characterized
by a spatially homogeneous occupancy. If the σ value is
sufficiently large or/and the K value sufficiently small,
totally homogeneous distributions are obtained (Figure
1), otherwise the solution is smooth but with the pop-
ulation concentrated in one region of the ring, with its
width controlled by σ.
As stated above, a simple heuristic analysis of eq. (1.1)
in the Fourier space shows that there is a necessary con-
dition for the emergence of inhomogeneity: the Fourier
transform of the influence function must have negative
values and large enough magnitude [12,13]. A Gaus-
sian in an infinite domain has a positive counterpart in
Fourier space and so does not match such requirements.
In contrast with these results, the fluctuations present in
our individual based model arrive to excite one specific
mode and modulations of this wavelength appear. The
tuning of the parameters allows modulations of arbitrary
wavelengths (Figure 1).
When C is decreased, the competition between modes
becomes stronger and no single mode dominates. In this
situation, some small regions of the ring are occupied
forcing all the remaining areas, up to some range, to
be nearly empty. The landscape results populated by
several living colonies divided by dead regions. There is
almost no competition between individuals of different
colonies and the space separating them can be identified
with an effective interaction length. This steady state
(spiky state [10]) corresponds to a sequence of isolated
colonies (spikes) and seen in the Fourier space, many
active wavelengths contribute to it (Figure 2).
Finally, for extremely short-ranged competition, in re-
lation to the σ value, no collective cooperation between
different excited modes emerges and a noisy spatially
homogeneous distribution appears (Figure 2).
We describe these paradigmatic steady states of the
system by characterizing the related spatial structure
with the help of a structure function S(q) [14] defined
as follows:
S(q) =
exp[i2πq · xj(τ)]
(3.1)
where the sum is performed over all individuals j with
their positions determined by xj(τ). Note, that, for con-
venience, the structure function is calculated only over
the closed interval [0, L]. The function is averaged over
some time interval T in order to avoid noisy data. S(0)
corresponds to the square of the mean number of individ-
uals in the system. The maxima of S(q) identify the rel-
evant periodicity present in the steady state. We will see
that the position of the global maximum (qM ) provides
an appropriate order parameter for the identification of
the segregation transition.
In our study we explored two different initial condi-
tions. In the first (local i.c.), the colony is located in
a finite and compact region of the space. In the second
(global i.c.), the individuals are spread all over the space.
The final distribution is independent of this choice and,
generally, local initial conditions make the system reach
the steady state more slowly. For this reason, if not dif-
ferently specified, our results are obtained from global
initial conditions.
IV. FINITE SIZE EFFECTS
Our analysis starts by exploring the model dependence
on the space size L. The reason for such interest is given
by the necessity of testing if the pattern formation is not
merely a product of some finite size effect. This is im-
portant, in the light of what was reported by Fuentes
et al. in ref. [12]. In their work, a numerical solution
of eq. (1.1) with a Gaussian influence function, showed a
segregation transition. But such a transition was just the
effect of the finite domain size that acted like a cut-off
for the Gaussian. Evidence of this interpretation came
from the observation that the amplitude of the pattern
depended on the ratio of the standard deviation of the in-
fluence function to the domain size - the critical values of
the standard deviation corresponding to the segregation
transition depended linearly on the domain size - and the
same patterns appeared for a modified Gaussian, which
vanishes abruptly beyond a cutoff.
The study of our individual based model gave differ-
ent results. By running some simulations with exactly
the same parameters but changing the ring extension, we
were able to show that the system is not influenced by the
domain size. If we choose data from spiky steady states,
that permit clear quantitative measures, it is possible to
remark that the general morphology of the patterns does
not change increasing the L value. In fact, both the pop-
ulation density and the mean number of peaks per space
interval remains constant. Moreover, in order to provide
a more precise test of possible small variations in the
distribution, we measured the cluster size. This quan-
tity was calculated by evaluating the standard deviation
(< x2 >i −< x >i2)1/2 of the position of the i individu-
als confined in each peak, then averaged over the different
peaks present at step τ and, finally, averaged over many
time steps after the system has reached the steady state.
Varying the system size caused no changes in the clus-
ters size (see Figure 3). From this result, we concluded
that the general aspect of the steady state does not
change with L. In particular, in contrast with what hap-
pens when the mean field equation is solved numerically,
the patterns do not depend on the ratio C
. For example,
for C = 0.2 and L = 50 we obtained a spiky steady state,
for C = 0.004 and L = 1 (same ratio) we obtained an
homogeneous steady state. Taking into account these re-
sults, from now on, all our simulations are implemented
on a ring of size 1.
We have just shown how the average of the population
size < N > scales with L in the steady state. Now, we
will give, through a simple heuristic argument, an esti-
mation of < N > as a function of the parameters K and
C, that will be useful also in the following of our analysis.
We can assume that, locally and in the steady state,
the number of deaths must be, on average, compensated
by the number of newborns, in order to comprise a sta-
ble population. For this reason, the death probability P
must equal 1/2. Assuming that the number of neighbors
that compete with a single individual are the ones living
up to a distance C and that, in these surroundings, the
average density N/L can be considered to be uniform, P
reduces to K ·2C ·N/L. Thus, N ∝ L/(CK). Looking at
Figure 4 we can see that this crude evaluation, that ne-
glects diffusion, inhomogeneity and reduces the influence
function to a box, describes well the general behavior of
the data obtained from our simulations.
V. SEGREGATION TRANSITION
In the previous paragraphs we supported the fact that
steady states, depending on the parameter values, can
assume inhomogeneous spatial distributions. Now, we
will try to describe the transition towards these states
(segregation transition). The structure function intro-
duced in eq. (3.1) provides a proper order parameter
to describe this transition. Different regions in the pa-
rameter space, coinciding with different steady states,
correspond to different positions of the global maximum
(obviously we are not taking into account S(q) at q = 0)
of the structure function. The transition from a homo-
geneous to an inhomogeneous distribution (see Figure 2)
matches the jump of the position of the global maxi-
mum (qM ) to a clear integer value, corresponding to the
number of clusters present in the space. For this rea-
son we can characterize the transition by looking at the
shape assumed by S(q), or looking at the value of qM .
If the space is homogeneously occupied, the structure
function does not present an integer maximum. On the
contrary, the maximum is located at qM ≃ 1.4. This
value corresponds to an uniform distribution of individ-
uals in the interval [0,1], approximated by the expression
exp(i2πq ·x)dx
. The segregation transition is char-
acterized by the passage of qM from 1.4 to an integer
value as soon as a modulation becomes dominant. In
Figure 5 we show qM as a function of C, varying K and
σ. First of all, from the analysis of these data, we can
observe that the number of clusters scales as C−1 (or,
equivalently, the periodicity of the inhomogeneous phase
has wave lengths proportional to C). Moreover, a criti-
cal value of C exists for which the transition takes place.
This Ccritic grows with 1/K and with σ. An analysis
of the available data suggests the possible dependence:
Ccritic ∝ σ2/3K−1/3 . Finally, for larger values of the
parameter C, in this range of the parameters σ and K,
the distributions are characterized by just one peak.
For any value of the competition strength, as can be
seen in Figure 6, there exists a critical value σcritic, de-
pendent on C and K, above which no spatial structures
emerge. Another measurement, that permits us to state
this relation in a different and clearer way, is presented
in the next paragraph.
VI. CLUSTER SIZE AND FLUCTUATIONS
In the following we describe, in the light of the existing
literature, the cluster size dependence on diffusion rate
and population size for the two different implementations
of the model.
We start by analyzing the typical size S of the clus-
ters that appear in the spiky phase. The data exposed in
Figure 7 show a dependence of the cluster size on the dif-
fusion coefficient: S ∝ σ, equivalent to S ∝
D. These
results are in accordance with the data presented in ref.
[33] obtained from an individual based model. In addi-
tion, this work pointed out how this behavior deviates
from the conclusions obtained from the deterministic ap-
proximation, where the cluster size was only weakly de-
pendent on the diffusion coefficient; another fact support-
ing the relevance of fluctuation effects in these systems.
We can easily interpret the dependence of the clus-
ter size on the diffusion coefficient by assuming that the
individuals confined in a cluster diffuse a distance pro-
portional to
DJ where J is the number of jumps the
individual performs in its life. In the case of the first
implementation of the model, where the diffusion is due
to the mutation process, J is obviously 1. Similar results
are obtained with the second implementation (see Fig-
ure 7), apart from a slightly wider cluster size (in this
case, individuals move, on average, more than just once
during their lifetime). Even so, the data show the same
dependence on the diffusion coefficient.
Finally, we present S as a function of K: S ∝
The reasons for this behavior are already explained in
ref. [34]: the cluster size is not controlled just by the
single individual’s number of jumps; in fact the diffusive
process continues with its descendants. For this reason,
it is proportional to the mean lifetime of a family, esti-
mated to be proportional toK−1 (see ref. [34] for details).
We introduce a new quantity that is useful for describ-
ing the existence of a critical diffusion value, giving an
estimation of its dependence on others parameters and
a confirmation of previous results. This quantity, which
we call mobility M(τ), estimates the mean mobility of
individuals. At a given time step τ , we choose an indi-
vidual i. Then we look for the closest agent, among all
the population, at time step τ − 1. We identify as di the
distance between these two individuals. Averaging over
the entire population N(τ) we obtain:
M(τ) =
di. (6.1)
The values assumed by M on varying the parameters σ
and C are shown in Figure 8. It is easy to distinguish two
clearly different behaviors. If the system is organized in a
spiky state (when σ ≤ σcritic), M(τ) ∝ σ. M is another
way of measuring the mean distance that an individual
moves during its lifetime inside the region defined by the
cluster. For this reason, this measure is coherent with the
data obtained from the direct evaluation of the standard
deviation of the clusters. In contrast, when the system is
organized in the homogeneous phase (when σ > σcritic)
M becomes independent of σ and is proportional to the
inverse of the occupation density M(τ) ∝ KC. The val-
ues of the mobility obtained from simulations with dif-
ferent values of C and K can be easily collapsed into
one function (see the inset of Figure 8). The collapse is
performed using the scaling σ → σ/C
K. This indicates
that the characteristic value of the crossover, σcritic, that
separates the two different behaviors of M(τ), scales as:
σcritic ∝ C
K. (6.2)
We conclude our study with some measurements try-
ing to catch some properties of the system fluctuations.
First, we estimated the fluctuations of the total popula-
tion, averaging over different simulations. The variance
turned out to be constant throughout the time evolution
and of the order of the square root of the total population.
The mechanism of auto-regulation of the population di-
mension does not allow the growth of big differences in
the total number of individuals.
For this reason, we focused our attention on the spa-
tial distribution of the population and tried to measure
some properties of these fluctuations. We studied the
variation of the local number of individuals in the same
simulation, for different times. We analyzed the evo-
lution of the system starting from local initial condi-
tions, with the population concentrated in the interval
[0.49, 0.51]. In this situation, the system evolves in time
with a small cluster fluctuating around the initial space
interval. This situation changes when a branching event
occurs that generates two well-defined clusters. Our in-
terest is in showing the behavior of local space fluctua-
tions and capturing possible variations in correspondence
with the branching event. First of all, we looked at the
mean value of the spatial local fluctuations Fs(τ), defined
as Fs(τ) = (
f2j )
1/2, where fj is the occupancy varia-
tion of the bin j from time step τ − 1 to time step τ . We
performed the average over all the b bins the ring was
divided in and obtained Fs(τ) =
N(τ), with no rel-
evant variations throughout the time evolution, even in
the time interval corresponding to the branching event.
More interesting is the shape of the frequency distribu-
tion of the size of fj . In fact, a simple Gaussian does
not fit this distribution, that presents extended tails (see
Figure 9). Throughout the system time evolution, the
shape of the normalized distribution is conserved. For
global initial conditions the same frequency distribution,
with extended tails, is recovered at the steady state. It
is identical to the one obtained with local initial condi-
tions and measured at the steady state. We think that
the deviation of the distribution from a Gaussian can be
considered as a hint that fluctuations play a relevant role
in the dynamics of the systems.
VII. CONCLUSIONS
We presented some results regarding clustering of or-
ganisms caused by a frequency-dependent interaction
that represents competition. We showed how this way
of modeling competition can be used not only to de-
scribe spatial phenomena in population biology, but also,
through a more abstract interpretation, to test ideas of
evolutionary theory (for example, studying the speciation
process).
From this unifying perspective, our study, obtained
from an extensive collection of data coming from sim-
ulations of an individual based model with global com-
petition, pointed out the relevance of fluctuation effects
in pattern formation. For the influence function adopted,
the mean field description predicts the absence of spatial
structures. On the contrary, fluctuations are able to ex-
cite the emergence of well defined patterns, which can
not be generated from a deterministic instability.
Furthermore, we discussed other fundamental proper-
ties of our model in the light of the existing literature,
unfolding a comparison with other models that describe
spatial segregation originated by some deterministic in-
stability. We showed that the observed patterns are not
due to a finite size effect, we characterized the behavior
of the segregation transition in various regions of the pa-
rameter space and we studied the existence of a critical
diffusion value. We analyzed the dependence of the clus-
ter size on the diffusion coefficient and pointed out some
characteristics of the fluctuations of the system.
ACKNOWLEDGMENTS
We are grateful to J.S. Sá Martins for a critical reading
of the manuscript and thank the Brazilian agency CNPq
for financial support.
[1] S. A. Levin and L. A. Segel, SIAM Rev. 27, 45 (1985); R.
Durrett and S. A. Levin, Philos. Trans. R. Soc. London,
Ser. B 343, 329 (1994).
[2] Y.-C. Zhang, M. Serva and M. Polikarpov, J. Stat. Phys.
58, 849 (1990).
[3] N. M. Shnerb, Y. Louzoun, E. Bettelheim and S.
Solomon, Proc. Natl. Acad. Sci. U.S.A. 97, 10322 (2000).
[4] B. Houchmandzadeh, Phys. Rev. E 66, 052902 (2002).
[5] W. R. Young, A. J. Roberts and G. Stuhne, Nature 412,
328 (2001).
[6] E. Ben-Jacob, I. Cohen, and H. Levine, Adv. Phys. 49,
395 (2000); J.Wakita, K. Komatsu, A. Nakahara, T. Mat-
suyama, and M. Matsushita, J. Phys. Soc. Jpn. 63, 1205
(1994).
[7] K. S. McCann, J. B. Rasmussen, and J. Umbanhowar,
Ecol. Lett. 8, 513 (2005).
[8] J. B. Wilson and A. D. Q. Agnew, Adv. Ecol. Res. 23,
263 (1992); R. Lefever and O. Lejeune, Bull. Math. Biol.
59, 263 (1997); N. M. Shnerb, P. Sarah, H. Lavee, and
S. Solomon, Phys. Rev. Lett. 90, 038101 (2003).
[9] N. F. Britton, SIAM J. Appl. Math. 50, 1663 (1990), S.
A. Gourley and N. F. Britton, J. Math. Biol. 34, 297
(1996).
[10] Y. E. Maruvka and N. M. Shnerb, Phys. Rev. E 73,
011903 (2006).
[11] A. Sasaki, J. Theor. Biol. 186, 415 (1997).
[12] M. A. Fuentes, M. N. Kuperman and V. M. Kenkre,
Phys. Rev. Lett. 91, 158104 (2003).
[13] M.A. Fuentes, M.N. Kuperman and V.M. Kenkre, J.
Phys. Chem. B 108, 10505 (2004).
[14] E. Hernandez-Garcia and C. Lopez, Phys. Rev. E 70,
016216 (2004).
[15] J. Billingham, Nonlinearity 17, 313 (2004).
[16] G. Flierl, D. Grunbaum, S. Levin and D. Olson, J. Theor.
Biol. 196, 397 (1999).
[17] B. M. Bolker, Theor. Popul. Biol. 64, 255 (2003).
[18] D. A. Kurtze, Phys. Rev. B 40, 11104 (1989).
[19] F.-J. Elmer, Phys. Rev. Lett. 70, 2028 (1993).
[20] J. Roughgarden, Am. Nat. 106, 683 (1972).
[21] F. Bagnoli and M. Bezzi, Phys. Rev. Lett. 79 3302
(1997).
[22] U. Dieckmann and M. Doebeli, Nature 400, 354 (1999);
M. Doebeli and U. Dieckmann, Nature 421, 259 (2003).
[23] E. Brigatti, J. S. Sá Martins and I. Roditi, Physica A
376, 378 (2007); V. Schwämmle and E. Brigatti, Euro-
phys. Lett. 75, 342 (2006).
[24] S. Pigolotti, C. Lopez and E. Hernandez-Garcia, Phys.
Rev. Lett. 98 258101 (2007).
[25] H. Sayama, M.A.M. de Aguiar, Y. Bar-Yam and M.
Baranger, Phys. Rev. E 65, 051919 (2002)
[26] H. Sayama, L. Kaufman and Y. Bar-Yam, Phys. Rev. E
62, 7065 (2000).
[27] D. R. Nelson and N.M. Shnerb, Phys. Rev. E 58, 1383
(1998).
[28] R. A. Fisher, Ann. Eugenics 7, 353 (1937); A. Kolmogo-
roff, I. Petrovsky and N. Piscounoff, Moscow Univ. Bull.
Math. 1, 1 (1937).
[29] F. J. Weissing and J. Huisman, J. Theor. Biol. 168, 323
(1994).
[30] J. Polechova and N. H. Barton, Evolution, 59, 1194
(2005).
[31] M. Meyer, S. Havlin and A. Bunde, Phys. Rev. E 54,
5567 (1996).
[32] C. Lopez and E. Hernandez-Garcia, Physica D 199, 223
(2004).
[33] E. Hernandez-Garcia and C. Lopez, Physica A 356, 95
(2005).
[34] E. Hernandez-Garcia and C. Lopez, J. Phys.: Condens.
Matter 17, S4263 (2005).
FIG. 1. Homogeneous steady state distribution ( top,
C = 4.0, 1/K = 80000, σ = 0.01) and modulated steady state
distribution (bottom, C = 0.9, 1/K = 18000, σ = 0.01). The
insets show the structure functions, S(q), of the correspond-
ing simulations. We show the distributions at time step 1000,
whereas the structure functions are averaged over 500 time
steps.
FIG. 2. Spiky (top, C = 0.059, 1/K = 500, σ = 0.001) and
homogeneous (bottom, C = 0.005, 1/K = 500, σ = 0.001)
steady states. The insets show the structure functions S(q).
We show the distributions at time step 2000, whereas the
structure functions are averaged over 1000 time steps. The
transition between these two states, in this typical range of
parameters, has been extensively studied.
0 10 20 30 40 50
0.005
FIG. 3. Dependence of the cluster size on the ring size
L. We present data from the model with mutation (trian-
gles) and from the one that implements diffusion (circles),
C = 0.2, K = 0.0029, σ = 0.001. The average is carried out
over all the clusters present at a given time step and over
different time steps.
FIG. 4. The number of individuals N present in the steady
state is proportional to (CK)−1. This result is in accordance
with the one obtained for a box-type influence function of
length C (see ref. [32]). We present data for different simu-
lations with 1/K ∈ [50, 500] and σ ∈ [0.0001, 0.01]. This last
parameter does not influence the final number of individuals.
The dashed line has slope -1.
FIG. 5. Segregation transition at Ccritical. Upper figure:
variation in dependence of K, where 1/K = 50, 150, 400, 500
and σ = 0.001. Lower figure: variation in dependence of σ,
where σ = 0.0005, 0.001, 0.002, 0.005, 0.01 and 1/K = 200.
FIG. 6. As shown in these figures, a critical σ value exists,
above which no spatial structures emerge. Upper figure: vari-
ation in dependence on C; we set 1/K = 100. Lower figure:
variation in dependence on K; we set C = 0.01.
FIG. 7. Top: Cluster size as a function of σ; 1/K = 300.
The solid line has slope 1. Bottom: cluster size as a function
of 1/K; σ = 0.0001. The solid line has slope 1/2. Trian-
gles represent data from the simulations where diffusion is
implemented through mutations, circles for the direct imple-
mentation of the diffusive process; we set C = 0.09.
FIG. 8. Mobility dependence on the diffusion parameter
σ for different C values; K = 0.01. In the inset, the data
collapse for arbitrary values of the parameters C and K.
FIG. 9. Spatial local fluctuation distribution at the steady
state: deviation from a Gaussian (C = 0.1, K = 0.00005,
σ = 0.0017). Data are averaged over 5 time steps. The con-
tinuous line is the best fit Gaussian.
ABSTRACT
  We present some numerical results obtained from a simple individual based
model that describes clustering of organisms caused by competition. Our aim is
to show how, even when a deterministic description developed for continuum
models predicts no pattern formation, an individual based model displays well
defined patterns, as a consequence of fluctuations effects caused by the
discrete nature of the interacting agents.

<|endoftext|><|startoftext|>
Introduction
Since H2 –the most abundant molecule in space– lacks a permanent dipole moment, its
rotational transitions are prohibited. Although the quadrupolar transitions exist, they are
of little use for the syudy of the bulk of molecular gas in the ISM because they require high
temperature to be excited. Instead, the structure and properties of cold molecular clouds
in the interstellar medium are usually studied using low-energy rotational transitions of
simple non-symmetric polar molecules. For practical reasons, the first rotational transition
(J = 1→0) of carbon monoxide (CO), at 115.27 GHz has been the most popular choice.
This transition, however, has long been known to be nearly always optically thick, so –
for a given filling factor– its intensity is expected to increase monotonically with the kinetic
temperature of the emitting gas. Clearly, this could have adverse effects on efforts to establish
the distribution of molecular gas in the ISM from CO observations alone, because very
low temperature gas might go unnoticed in sensitivity-limited CO observations while warm
regions (& 20 K) will stand out even if they are not those with the highest molecular content.
The sources where these effects might be most noticeable are those with large temperature
gradients; for instance in molecular clouds located in the immediate vicinity of hot stars.
While all molecular emission tracers share this temperature dependance to some degree,
absorption lines can be detected even in very cold gas, provided sufficiently bright background
continuum sources are available. The scarcity of such sources at the wavelength of the
common molecular tracers, however, has limited the usefulness of absorption measurements
in the study of specific Galactic molecular clouds (e.g Evans et al. 1980). The 6-cm (110-111)
transition of ortho formaldehyde (H2CO) offers an interesting alternative. Owing to collisions
with neutral particles that selectively overpopulate the lower energy level, the excitation
temperature of the 110 → 111 transition lies below 2.7 K (Townes & Cheung 1969). This
allows the transition to be observed in absorption against the cosmic microwave background
(CMB) (Snyder et al. 1969), and makes it a potentially powerful tracer of molecular gas in
any direction of the sky. The excitation requirements are such the the 6 cm H2CO line is a
good indicator of the presence of cool to warm molecular gas (T & 10 K) at intermediate
densities (103 cm−3 ≤ n ≤ 105cm−3). Unfortunately, the absorption line is weak, so very
large amounts of telescope time are required to map large areas of the sky.
Recently, Rodŕıguez et al. (2006) conducted a blind search for H2CO absorption and
compared CO emission and H2CO absorption profiles towards the Galactic anticenter. They
found a rough, large-scale correlation between these two tracers, and concluded that both
lines preferentially trace warm and dense molecular gas. Here, we will examine this relation
between CO and H2CO at a somewhat smaller scale using observations of the well-known,
nearby star-forming region Sharpless 140 (S140 –Sharpless 1959) associated with the dark
– 3 –
dust cloud Lynds 1204 (L1204 –Lynds 1962). L1204 is centered at l = 107◦. 47, b = +4◦. 82
and covers an area of 2.5 square degrees (Lynds 1962). At its southwest edge lies S140, a
prominent compact arc-shaped H ii region with an angular size of ∼ 2′ × 6′. The ionization
of S140 is maintained by the nearby B0V star HD211880 (Blair et al. 1978). The distance of
S140/L1204 deduced from the brightness of the exciting star is 910 pc (Crampton & Fisher
1974). S140 has been the subject of many observational studies, that have usually focused on
the Photon Dominated Region (PDR) on the edge of L1204, and on the embedded infrared
sources located right behind it (e.g., Preibisch et al. 2001; Hayashi & Murata 1992; Preibisch
& Smith 2002; Bally et al. 2002). Remarkably, while the dust cloud is seen as an extended
dark feature covering more than two square degrees, the CO emission peaks immediately
behind the Hα arc (Heyer et al. 1996; Evans et al. 1987; Blair et al. 1978), while only
relatively faint emission extends deep within the dust cloud (Helfer & Blitz 1997). The 6-cm
line of H2CO was detected in absorption against the CMB in L1204 near S140 by Blair et al.
(1978) with the NRAO 43 m telescope, and unexpectedly by Evans et al. (1987) during VLA
observations of the bright condensation just northwest of S140. However neither of those
studies provided a extensive mapping of the H2CO CMB absorption in L1204, and the exact
extension of the gas traced by H2CO remains unclear. In this article, we will present such a
extensive mapping of the 6 cm CMB absorption of H2CO over most of the large dust complex
L1204, and compare our results with existing CO observations taken from the literature.
2. Data
The H2CO observations were obtained during two sessions (January and September-
October 2004, respectively) with the 25.6-m telescope of the Onsala Space Observatory
(OSO) in Sweden. At 6 cm, the angular resolution is 10′, and our pointing precision was
always better than 20′′. Frequency-switching, with a frequency throw of 0.4 MHz was used,
and both polarizations of the incoming signal were recorded simultaneously in two indepen-
dent units of the autocorrelation spectrometer. Each of these units provided 800 2 kHz-wide
channels. At the observed frequency of 4829.660 MHz, this setup provided a total bandwidth
of 99 km s−1 and a (Hanning-smoothed) velocity resolution of 8 kHz ≡ 0.49 km s−1. The
spectrometer was centered at the systemic velocity of S140, VLSR = −8.0 km s−1. Daily
observations of the supernova remnant Cas A were used to check the overall performance of
the system. The system temperature during our observations varied from 33 to 36 K.
In order to map the entire region behind S140, we observed 72 positions on a regular
square grid with a 10′ spacing, centered at l = 107◦. 0, b = +5◦. 3; the resulting map uniformly
covers a 1◦. 0 × 1◦. 8 rectangular region (Fig. 1) . The off-line data reduction was done
– 4 –
with the CLASS program of the GILDAS software package (Guilloteau & Forveille 1989),
and involved only the subtraction of (flat) baselines from individual integrations and the
averaging of all spectra taken at the same pointing position. The total integration time for
each of these positions was about 10 hours, yielding a typical final noise level of 3 mK (T∗A).
The distribution of radio continuum sources in the region of L1204 has been studied in
detail by Allen Machalek & Jia (in preparation), using data from the Canadian Galactic Plane
Survey. Fairly bright continuum emission is associated with the Hα arc and the embedded
massive protostars located behind it –but, as we will see momentarily no formaldehyde
was detected from either of these regions. In addition, a number of extragalactic background
sources as well as diffuse emission associated with the dust cloud L1204 itself contribute to the
overall radio continuum. The typical brightness temperature average over the Onsala beam
at 6 cm, however, is only about 0.2 K, except towards the Hα arc and the embedded massive
protostars (where again, no absorption was detected). Since the brightness temperature is
so small, any H2CO absorption profiles features detected must be absorption of the cosmic
microwave background radiation at 2.7 K.
In the analysis of our new observations, we will also make use of 12CO(1-0) observations
of L1204/S140 kindly provided by Dr. Tamara Helfer, and published in Heyer et al. (1996),
and Helfer & Blitz (1997). These data were obtained with the 14-m telescope of the Five
College Radio Astronomy Observatory (FCRAO) in Amherst (MA), and have an intrinsic
angular resolution of 45′′. For comparison with our formaldehyde data, we have smoothed
the CO(1-0) observations to 10′, and resampled them on our observing grid.
3. Results
Formaldehyde absorption was detected in at least 16 of our 72 observed positions (see
Fig. 2). The maximum absorption is located 10′ arcmin behind the S140 H ii region at a
LSR velocity of −8.0 km s−1, similar to that of the CO emission detected in that area.
Table 1. Source positions.
Source Position (l, b) Size (∆α,∆δ) α(J2000.0), δ(J2000.0) Reference
L1204 107◦. 37, +4◦. 87 1◦. 0 × 2◦. 5 22h26m41s.+63◦15′36′′. Lynds (1962)
S140(Hα) 106
◦. 8, +5◦. 3 2′ × 6′ 22h19m23s.+63◦18′16′′. Sharpless (1959)
Our survey 107◦. 0, +5◦. 3 1◦. 0 × 1◦. 8 22h20m52s.+63◦24′49′′. This paper
– 5 –
Fig. 1.— This figure shows a sketch of the L1204/S140 region, with our observed positions
shown as ”+” symbols. The correspondence between Galactic coordinates (used throughout
the paper) and equatorial coordinates (that have usually been preferred for observations of
S140) is indicated. The circle represents the 2◦. 5 size of the dust cloud L1204 as reported
by Lynds (1962). The central positions of L1204 and S140 are shown as ”” symbols. The
contour represents the lowest value of the H2CO absorption.
A second spatio-kinematical structure is detected towards the north-east (here, and in the
rest of the paper, north and all other directions refer to Galactic coordinates), at VLSR ∼
−11 km s−1. Both components are presumably associated with L1204, and have clear CO
counterparts (Fig. 2 – Blair et al. 1978; Evans et al. 1987; Sugitani & Fukui 1987; Park &
Minh 1995). There is also an isolated absorption feature towards the southeast, at VLSR ∼
−2.5 km s−1. Given its low LSR velocity, this feature is likely unrelated to L1204, and is
probably a local cloud along the line of sight. Thus, while Sugitani & Fukui (1987) identified
three molecular components associated with L1204 in their 13CO observations, we only find
two in our formaldehyde data. We do find evidence, however, for a systematic velocity
gradient across the cloud. Park & Minh (1995) argued that this complex overall spatio-
kinematical morphology was created when S140 and L1204 were swept up by an expanding
shell associated with the Cepheus bubble. Our data do not illuminate this assertion any
further, and a more thorough study is necessary to understand the detailed structure of this
region.
– 6 –
Fig. 2.— (a) Mosaic of H2CO CMB absorption spectra observed in the L1204/S140 region.
The (0,0) position corresponds to l = 107◦. 0, b = 5◦. 3, about 12′ east of S140, and about
40′ north-west of the nominal center of L1204 (Lynds 1962). (b) Corresponding CO(1-0)
observations smoothed to 10′ (see text). Note that the CO emission is located a full 10′ west
of the maximum H2CO absorption.
4. Comparison with other molecular tracers
The S140/L1204 region has been observed in many different molecular tracers (e.g.
Tafalla et al. 1993, Zhou et al. 1993, Park & Minh 1995), but most of these observations
have focused either on the S140 PDR or on the embedded infrared sources located just
behind S140, while only a few observations covered the entire dust cloud. Indeed, the first
CO observations of S140 (Blair et al. 1978) only covered a limited part of the region. To
our knowledge, the only existing large-scale CO map of L1204 is that obtained in the 90s
with the FCRAO telescope (see §2) and published by Heyer et al. (1996) and Helfer & Blitz
(1997)1. As mentioned earlier, we will use a smoothed version of that dataset here in order
to compare with our formaldehyde observations.
In general, the CO emission and H2CO absorption morphologies in this region are quite
similar (Fig. 3). This was already noticed by Blair et al. (1978) in their 6′ observations.
It is also in good agreement with the results obtained towards the Galactic anticenter by
1The region lies on the edge of, and is only partly covered by, the CfA Galactic plane survey of Dame et
al. (2001).
– 7 –
Fig. 3.— All panels show a grey-scale version of the DSS-red image of the region around
l = 107◦. 0, b = 5◦. 4. The Hα arc of S140 is clearly visible on these (red) images near l =
106◦. 8, b = 5◦. 3. In the three left panels (a1–a3), CO contours taken from the smoothed CO
data of Heyer et al. (1996) are overlaid on top of the DSS image, whereas in the three right
panels (b1–b3), our H2CO contours are overlaid. The contours in the top two panels (a1–b1)
include the entire velocity range associated with L1204 (from -12 to -5 K km s−1), whereas
in the middle two (a2–b2) and bottom two (a3–b3) panels, the contours correspond only to
the -11 K km s−1 and the -8 K km s−1 components, respectively. The asterisks correspond
to the position of the peak H2CO absorption for each component.
– 8 –
Rodŕıguez et al. (2006), and towards the Orion molecular complex by Cohen et al. (1983).
There are, however, several noteworthy differences between the CO emission and H2CO
absorption in S140. The first difference is the fact that the CO peak and the H2CO absorption
maximum are not located at the same position. The CO integrated intensity map (Fig. 3,
see also Fig. 2) shows that the maximum CO emission occurs just behind the S140 Hα arc
at the western edge of L1204, while only comparatively fainter emission extends to greater
longitudes. The maximum H2CO absorption, however, is located about 10
′ eastward of the
CO peak. A second notable difference between the CO and H2CO profile is the existence of
a CO emission ”tail” in the south/southeast part of the main cloud with little or no H2CO
counterpart (Figs. 2 and 3).
-0.35
-0.25
-0.15
-0.05
 0  10  20  30  40  50
I (12CO) K km/sec
Intensity ratio
S140 + L1204
L1204
Galactic Anticenter
Fig. 4.— Correlation between the H2CO absorption line intensity and the
12CO(1-0) emission
line intensity at corresponding points in L1204. The squares correspond to data from Table
2, the open circles correspond to H2CO upper limits and the triangles correspond to the
Galactic anticenter data previously published in Rodŕıguez et al. (2006). The horizontal
and vertical lines show the “best case” detection limit. The dashed line is the least-squares
fit for the L1204/S140 region data, and the dash-dotted line is the least-squares fit for the
Galactic anticenter data from Rodŕıguez et al. (2006). Note that the fits do not pass through
the (0,0) point, suggesting that the relation is not linear at low intensity values.
In order to study the relation between H2CO absorption and CO(1-0) emission in a
more quantitative way, we have computed the moments of the profiles shown in Fig. 2. The
results are listed in Table 2 (Appendix B). When two velocity components are visible at
a given pointing, the moments for each were computed separately. Intensities above 3σ are
shown as squares in Fig. 4, and were used to make least-square fits (see below). The two
– 9 –
spatio-kinematical components that we identified in our formaldehyde dataset behave quite
similarly with respect to the CO-H2CO relation, and are plotted together in Fig. 4. The
best least-squares fit to a straight line for the entire L1204 dataset yields:
I(H2CO) = (4.4± 1.1)× 10−3 I(CO) + (34± 16)× 10−3 K km s−1. (1)
We shall see momentarily that the CO emission near S140 may be particularly bright because
of local heating. Ignoring the pointings very near S140, however, yields a fairly similar
relation between CO and H2CO:
I(H2CO) = (3.8± 0.9)× 10−3 I(CO) + (31± 9)× 10−3 K km s−1. (2)
The small difference between these relations presumably reflects the differing excitation
requirements for the two lines. The relation between CO and H2CO given by Eqs. 1 and
2 for the L1204 region is almost identical to that found towards the Galactic anticenter by
Rodŕıguez et al. (2006):
I(H2CO) = (4.1± 0.5)× 10−3 I(CO) + (34± 13)× 10−3 K km s−1. (3)
It is important to note, however, that, in spite of the agreement between the fits to
the Galactic anticenter and S140 data, there is very significant scatter in the CO-H2CO
relation, some points lying nearly 10σ away from the linear relation. This situation was
already noticed by Rodŕıguez et al. (2006) in their study of the Galactic anticenter. This
lack of a detailed correspondence between CO emission and H2CO absorption presumably
reflects differences in the excitation conditions of the two tracers, as we will now discuss in
the next section.
5. Discussion
The comparison between H2CO absorption and CO(1-0) emission profiles in the Galac-
tic Anticenter and in the L1204/S140 region has led us to three important observational
conclusions:
1. Qualitatively, the morphology of CO and H2CO are quite similar, and quantitatively,
the line integrated intensities correlate quite well with one another.
– 10 –
2. The scatter in the CO-H2CO relation is, however, significantly larger than the obser-
vational errors.
3. In the specific case of S140, the CO emission peak is offset by about 3 pc from the
locus of the deepest formaldehyde absorption, and there is a region south of the main
cloud where significant CO emission is detected with little or no H2CO counterpart.
From the general large-scale correspondance between the CO(1-0) and H2CO 6 cm
integrated maps (Fig. 3), and from the fair correlation between their line intensities (Fig.
4), we conclude that the physical conditions needed for the excitation of both lines are
quite similar. The calculations presented in the Appendix A, indeed show that both lines
preferentially trace warm gas at intermediate densities (103.6< n < 105 for H2CO; n > 10
for CO). In this scheme, the offset between the CO(1-0) peak and the H2CO maximum
absorption may seem puzzling. Note that a similar trend is seen at higher resolution: while
the CO peak in the full-resolution CO map published by Heyer et al. (1996) and Helfer &
Blitz (1997) is at l = 106o.8, b = +5o.3, the H2CO absorption feature seen in the high-
resolution VLA images published by Evans et al. (1987) is centered around l = 106o.9, b =
+5o.3, again a few arcminutes to the east.
We suggest that a combination of two effects may explain this puzzling result. First, it
can be seen from the excitation analysis presented in Appendix A that the H2CO absorption
strength ”saturates” for T & 30 K, whereas the temperature of the CO emission continues
to rise at higher kinetic temperature (Fig. 5). For example, while the H2CO line strength
increases by only about 30% when the kinetic temperature goes from 20 to 40 K, the CO(1-0)
line intensity increases by more than a factor of two. According to Park & Minh (1995), the
CO brightness temperature is about 40 K at the peak and 20 K for the rest of the cloud. Thus,
the strong CO peak behind S140 may well be largely due to enhanced kinetic temperatures
related to local heating (by the external star providing the ionization of S140, and/or by
the infrared sources embedded in the cloud). As one progresses into the cloud, the local
heating diminishes, and the CO line intensity fades. Also, it should be pointed out that the
formaldehyde calculations presented in the Appendix show that the 6-cm line should be seen
in emission rather than absorption when the density exceeds 105 cm−3. The fact that this is
not the case near the CO peak (neither in our low-resolution data, nor in the high-resolution
VLA data presented by Evans 1978) suggests that the gas density there is lower than 105
cm−3. Other effects that could explain the offset between the CO and the H2CO peaks are
the lower dissociation energy, and the lower abundance (and, therefore, lower self-shielding)
of formaldehyde compared to CO (see Appendix A.3). In a photo-dissociated region, these
effects should combine to create a stratified distribution where CO survives nearer the source
of the UV photons than H2CO. This stratification, combined with the heating of the CO,
– 11 –
would naturally lead to the offset between H2CO and CO seen in the present data.
Finally, the origin of the other main difference between CO and H2CO in S140, namely
the existence of CO emission at the south of L1204 with no or little formaldehyde counterpart,
is likely related to another aspect of the excitation differences between the 6 cm line of
formaldehyde and the 1-0 transition of carbon monoxide. Fig. 6 of the Appendix A.2 shows
that the density detection limit for H2CO line is ∼10 times larger than the density limit for
the CO(1-0) line. We therefore suggest that the gas traced by the CO emission to the south
of L1204 is of relatively very low density. It is interesting to note, indeed, that classical
high-density molecular tracers (e.g. CS or NH3) have only been detected around the CO
peak behind S140, and not in the southern region of the cloud.
Thus, we conclude that the CO(1-0) and H2CO 6 cm lines both tend to preferentially
trace warm gas at intermediate densities. There are, however, significant differences related
either to differing excitation requirements or to differing abundances. These differences can
easily explain the large scatter in the CO–H2CO relation.
6. Conclusions
The main conclusions of this work are the following:
1. We have mapped a large region (70′ × 110′) around L1204/S140 in the 6 cm line of
formaldehyde, observing a total of 72 regularly-spaced positions every 10′ on a regular
grid. The center of our map was at l = 107◦. 0, b = +5◦. 3, and formaldehyde was
detected against the cosmic microwave background in at least 16 of our 72 positions
(Fig. 2).
2. The formaldehyde emission can be separated in three spatio-kinematical components
(Fig. 3): two (at VLSR ∼ –11 km s−1 in the northeast part of the cloud, and at VLSR
∼ –8 km s−1 just behind S140) are clearly associated with L1204, whereas the other
(an isolated component at VLSR ∼ –2.5 km s−1 towards the southeast) is most likely a
local foreground cloud unrelated to S140/L1204.
3. Both qualitatively and quantitatively, the CO(1-0) emission and the formaldehyde
6 cm absorption lines correlate fairly well. An excitation analysis shows that both
preferentially trace warm gas at intermediate densities.
4. There are, however, notable differences between the CO and H2CO lines, that can be
traced to differing excitation requirements and abundances. Those differences are most
likely the origin of the large scatter in the CO-H2CO intensity correlation.
– 12 –
We thank Professor Roy Booth, director (retired) of the radio observatory at Onsala, for
generous allocations of telescope time and for his warm hospitality during our several visits
to the observatory. We are also grateful to the observatory technical and administrative
staff for their capable assistance with our observing program. We thank Tamara Helfer
for supplying us with the CO data cube of S140. We acknowledge the financial support of
the Dirección General de Asuntos del Personal Académico (DGAPA), Universidad Nacional
Autónoma de México (UNAM) and Consejo Nacional de Ciencia y Tecnoloǵıa (CONACyT),
in México, and the Director’s Discretionary Research Fund at the Space Telescope Science
Institute. The Digitized Sky Surveys were produced at the Space Telescope Science Institute
under U.S. Government grant NAG W-2166. The images of these surveys are based on
photographic data obtained using the Oschin Schmidt Telescope on Palomar Mountain and
the UK Schmidt Telescope. The plates were processed into the present compressed digital
form with the permission of these institutions.
A. Model calculations
A.1. Collisional pumping
Observations of H2CO in dark clouds show that the anomalous absorption of the 6 and
2 cm lines is due to collisions with H2 that selectively overpopulate the lower levels of the
lines (Evans et al. 1975). We calculated the non–LTE equilibrium populations of the first
40 levels of ortho–H2CO assuming excitation by the 2.7 K background and collisions with
H2. Green (1991) has calculated excitation rates of these levels for collisions with He taking
advantage of the spherical symmetry of the He potential for kinetic temperatures T = 10 to
300 K. According to Green (1991), excitation rates by H2 collisions could be 2.2 times higher
than those by He because of the smaller reduced mass and differences in the interaction
potentials. We will show the results of the calculations under the assumption that the H2–
H2CO collisional rates are the same as the He–H2CO rates. Probabilities for the radiative
transitions were taken from Jaruschewski et al. (1986).
The optical depths of the transitions involved in the pumping mechanism are generally
larger than 1 at high densities and a radiative transfer calculation is required. Two limiting
approximations in the radiation transport are often considered in molecular clouds: the
large velocity gradient (LVG) model and the microturbulent model (Leung & Liszt 1976).
The LVG model assumes that the line profile is dominated by systematic motion of the gas
while the microturbulent model assumes that the turbulent velocity is much larger than any
systematic motion. S 140 is likely to have several velocity components, but the existence
of large systematic motions in molecular clouds and the validity of the LVG model has not
– 13 –
been well established in other molecular clouds (e.g. Evans et al. (1975), Zuckerman & Evans
(1974), Zhou et al. (1990)). We therefore use the microturbulent model, and for simplicity
we will use the escape probability formalism to account for photon trapping in a turbulent
medium in plane-parallel slab of mean total optical depth τt that is perpendicular to the line
of sight. In this case the A–value of a transition in the equations of statistical equilibrium is
multiplied by a “loss probability” P (τ, τt) that depends on the mean optical depth τ in the
slab. There are many different ways to define P (τ, τt), which can differ by several orders of
magnitude for large optical depths. We will use the form suggested by Hummer & Storey
(1992) for a uniform medium with no continuum absorption:
P (τ, τt) =
[K2(τ) +K2(τt − τ)] , (A1)
where
K2(τ) =
dx φ(x)E2[τφ(x)] , (A2)
and E2 is the second exponential integral function. The function K2(τ) can be calculated
from fits by Hummer (1981) for a normalized Doppler profile, φ(x) = exp(−x2)/
Equation (A1) can be viewed as the single flight escape probability through either side
of the slab averaged over the line profile. The probability that a photon of an isotropic
background reaches optical depth τ in this slab is also given by equation (A1), and the
blackbody continuum is thus attenuated by a factor P (τ, τt). The mean optical thickness is
given by
nu/gu
nl/gl
dz , (A3)
where σlu is the absorption cross section of the transition and ∆νD is the Doppler width.
We assumed a Doppler width of 2 km s−1 (FWHM = 2
ln 2∆νD = 3.3 km s
−1). The level
population and its statistical weight are given by n and g respectively with subindex u for
the upper and l for the lower level.
The emergent line brightness temperature with subtracted background T0 is found by
direct integration of the source function throughout the slab as
∆Tb = Tb − T0
= T0[ exp(−T )− 1]
exp(−T + τ)
nl/gl
nu/gu
dτ , (A4)
where
– 14 –
1− nu/gu
nl/gl
dz , (A5)
is the total mean optical depth of a slab of thickness L and T = τtφ(0) is the line–center
total optical depth.
Equation A4 gives the correct asymptotic limits: Tb → T0 when the density goes to
0 and Tb → T for high densities. The population densities and the optical depths as a
function of position in the slab in equations (A1) and (A3) were calculated iteratively. A
convergence of 10−3 K in Tb was achieved after a few iterations for T ≤ 40 K and H2
densities n(H2) < 10
6 cm−3. For higher densities and temperatures the procedure becomes
unstable. Fig. 5 shows the calculated ∆Tb for a 1 pc–thick slab as a function of the H2
density and a constant H2CO abundance of 2 × 10−9 with respect to H2 (Hasegawa et al.
(1992), Leung et al. (1984)). The assumed thickness of the slab has an important effect in
the anomalous absorption as shown in Fig. 5. As the thickness of the slab decreases, the
effectiveness of the pumping mechanism that cools the line decreases.
(a) (b)   (c)
Fig. 5.— (a) Brightness temperature minus background continuum vs. density for different
kinetic temperatures T of a 1–pc thick slab. (b) Contour intervals are 0.2 K from ∆Tb = −1
to 1 K. (c) Brightness temperature minus background continuum vs. density for different
slab thicknesses at T = 40 K.
Garrison et al. (1975) identified some transitions, like 111 → 312, that produce selection
effects in the excitation of some levels that cool the H2CO doublets. We have tested our model
for possible variations of the collision rates. An overall increase of collisional rates by a factor
of 2.2 decreases ∆Tb in figure 5 by 0.3 to 0.5 K for T ≥ 20 K and n(H2) > 1.6 × 104 cm−3.
For lower T and n(H2) there is very little variation in the predicted Tb.
– 15 –
A.2. A PDR model for CO
The UV field has a higher influence on the CO brightness temperature than on the
H2CO brightness temperature. Both molecules are quickly photodissociated near the edge
of clouds, but the larger abundance of the CO molecule makes its chemistry and interaction
with radiation more complex. In order to take into account the variation of CO abundance
along the line of sight, we used the Meudon PDR code to calculate the CO brightness
temperature of a plane–parallel slab irradiated by a UV field (Le Bourlot et al. 1993). A
detailed description of a revised version of the code is given by Le Petit et al. (2006).
The sharp separation between the molecular, atomic and ionized emissions suggests
that the L1204/S140 interface is a PDR viewed nearly edge–on (Hayashi & Murata 1992)
irradiated by HD 211880. The angle of incidence of the star’s radiation on the PDR boundary
is an unknown parameter but appears to be more-or-less perpendicular. Furthermore, the
infrared embedded sources are probably young stars that may also enhance the radiation
field (Evans et al. 1989).
We ran the code with its parameters set to represent a plane–parallel slab irradiated
from one side by a UV field with an enhancement factor χ = 200 with respect to the Draine
(1978) average interstellar radiation field. In a PDR, the gas is heated by the photoelec-
tric emission from grains and PAH’s, H2 formation in grains, UV pumping in the Lyman
and Werner bands, gas–grain collisions, photoionization, and photodissociation. As the UV
radiation is absorbed deeper into the cloud, other processes like cosmic rays and chemical
reaction energies become important. Cooling is produced by fine–structure and molecular
line emission. Shocks and turbulence can keep a PDR away from isobaric equilibrium. How-
ever in our model the temperature and density of the PDR were kept constant in order to
compare the results with our H2CO model in Fig. 6. We used the chemical network given
for S140 by the Meudon group at its Internet site2, which does not include H2CO. Thus the
CO and H2CO calculations represent different models, and Fig. 6 is given only as indication
of the local conditions that produce the emission and absorption for each molecule.
A.3. Photodissociation of H2CO
The H2CO molecule is quickly photodissociated into CO and H2 or H in the average UV
interstellar radiation field with a rate of 1.0 × 10−9 exp(−1.7AV ) s−1 (van Dishoeck 1988).
Keene et al. (1985) estimated that far–ultraviolet radiation (FUV) from the star HD211880
2http://aristote.obspm.fr/MIS/pdr/exe.html
– 16 –
CO H CO
Fig. 6.— Brightness temperature minus background continuum vs. density for different
kinetic temperatures T for both the 12CO(1-0) line and the 6 cm H2CO line (Note that
∆Tb(k) is plotted ”negative” compared to Figure 5).
will have an enhancement factor of G0 = 150 with respect to the average interstellar field
(Habing 1968) at the ionization front, although Spaans et al. (1997) found that a more
intense radiation field may be needed to explain the H2 rotational emission. H2CO has a
dissociation energy of 3.61 ± 0.03 eV (Suto Wang & Lee 1986), and a photodissociation rate
of 1.0 × 10−9 sec−1 in the interstellar field (van Dishoeck 1988) while for CO the values
are 11.2 eV and 2.0 × 10−10 sec−1 respectively. Thus it is possible that the H2CO will be
selectively photodissociated near the S140 ionization front and the bright PDR region, where
the CO emission peaks. Detailed PDR model calculations by Li et al. (2002) show that at
20′ from the ionization front, where we observe the H2CO maximum, G0 ≤ 20 and AV ∼ 15.
We added 62 reactions involving H2CO and H2CO
+ taken from the UMIST data base
(Woodall et al., 2007) 3 to the chemical network of the Meudon group mentioned above and
ran a PDR model with G0 = 200, T = 40K and constant density of 10
3 cm−3. We found
that H2CO has significant abundance only at depths of Av > 7 while CO becomes important
at Av > 4, which shows that photodestruction could explain the offset between the CO and
the H2CO peaks.
3http://www.udfa.net/
– 17 –
B. Profile moments in detail
– 18 –
Table 2. Profile moments for each position of Fig. 2. The upper limits correspond to 3σ.
The symbol “. . . ” indicates no data were available, while “nf” indicates that data were
available but no reliable fit could be made. Values marked with “yes” in column 6 are
plotted as squares in Fig. 4, and were used for the least square fits.
Offset (l,b) 1000× I(H2CO) 〈V 〉 I(CO) 〈V 〉 Included in
arcmin K km s−1 km s−1 K km s−1 km s−1 fits
40, -70 < -24.3 nf . . . . . .
30, -70 < -18.9 nf . . . . . .
20, -70 < -25.8 nf . . . . . .
10, -70 < -23.4 nf 4.1 ± 0.7 -7.6 ± 1.3
0, -70 -42.1 ± 8.4 -7.5 ± 1.5 3.5 ± 0.6 -8.7 ± 1.5 yes
-10, -70 -30.3 ± 7.1 -6.2 ± 1.5 8.6 ± 0.8 -8.3 ± 0.8 yes
-20, -70 < -21.6 nf < 2.1 nf
40, -60 < -22.5 nf . . . . . .
30, -60 < -24.0 nf . . . . . .
20, -60 < -26.7 nf 5.9 ± 0.8 -7.0 ± 1.1
10, -60 < -23.7 nf 5.1 ± 0.6 -8.3 ± 1.0
0, -60 < -19.2 nf < 1.8 nf
-10, -60 -27.9± 7.1 -6.7 ± 0.5 < 2.1 nf
-20, -60 < -28.8 nf < 1.5 nf
40, -50 < -25.5 nf . . . . . .
30, -50 -109.3 ± 8.9 -2.6 ± 0.2 14.8 ± 1.1 -1.8 ± 0.2 yes
20, -50 < -25.2 nf 10.5 ± 0.8 -5.0 ± 0.4
10, -50 < -21.6 nf 8.2 ± 0.6 -7.0 ± 0.5
0, -50 < -18.9 nf < 2.4 nf
-10, -50 < -16.5 nf < 2.4 nf
-20, -50 < -24.6 nf < 4.8 nf
40, -40 < -23.1 nf . . . . . .
30, -40 < -22.8 nf 3.1 ± 0.6 -6.2 ± 1.2
20, -40 < -22.2 nf 7.2 ± 0.8 -7.2 ± 0.9
10, -40 < -23.1 nf 4.5 ± 0.5 -8.8 ± 1.1
0, -40 < -22.5 nf < 2.7 nf
-10, -40 < -26.1 nf < 2.7 nf
-20, -40 < -18.6 nf < 3.0 nf
40, -30 < -22.5 nf 3.4 ± 0.5 -9.0 ± 1.5
30, -30 < -22.8 nf 4.0 ± 0.7 -8.9 ± 1.5
– 19 –
Table 2—Continued
Offset (l,b) 1000× I(H2CO) 〈V 〉 I(CO) 〈V 〉 Included in
arcmin K km s−1 km s−1 K km s−1 km s−1 fits
20, -30 < -25.5 nf < 1.8 nf
10, -30 < -30.9 nf 5.2 ± 0.6 -8.6 ± 1.0
0, -30 < -20.1 nf < 4.2 nf
-10, -30 < -26.7 nf < 1.8 nf
-20, -30 < -22.2 nf < 2.1 nf
40, -20 . . . . . . < 2.7 nf
30, -20 < -30.9 nf 3.5 ± 0.7 -9.4 ± 1.9
20, -20 < -24.0 nf < 2.7 nf
10, -20 -31.6 ± 6.0 -8.2 ± 1.6 8.2 ± 1.1 -8.9 ± 1.2 yes
0, -20 < -29.1 nf 8.5 ± 1.4 -9.5 ± 1.7
-10, -20 < -19.5 nf 6.8 ± 0.3 -8.7 ± 0.9
-20, -20 < -22.2 nf < 1.5 nf
40, -10 . . . . . . < 1.8 nf
30, -10 < -35.7 nf < 2.4 nf
20, -10 < -16.8 nf < 3.0 nf
10, -10 -43.1 ± 8.2 -8.1 ± 1.6 4.7 ± 1.1 -10.3 ± 2.8 yes
0, -10 -76.2 ± 13.8 -9.5 ± 1.8 11.7 ± 0.8 -9.0 ± 0.6 yes
-10, -10 -82.7 ± 7.7 -8.1 ± 0.8 18.6 ± 0.6 -8.2 ± 0.3 yes
-20, -10 < -24.3 nf < 2.7 nf
40, 0 < -33.0 nf 2.8 ± 0.5 -11.6 ± 2.3
30, 0 < -28.8 nf < 3.3 nf
20, 0 -63.8 ± 7.8 -7.8 ± 0.9 3.8 ± 1.1 -8.9 ± 3.0 yes
10, 0 -93.7 ± 11.3 -8.7 ± 1.1 9.4 ± 0.7 -9.9 ± 0.8 yes
0, 0 -272.6 ± 9.8 -7.8 ± 0.3 21.6 ± 0.7 -8.0 ± 0.3 yes
-10, 0 -159.8 ± 8.6 -8.0 ± 0.4 46.5 ± 1.6 -7.8 ± 0.3 yes
-20, 0 < -24.0 nf 10.2 ± 0.7 -7.7 ± 0.6
40,+10 . . . . . . < 1.7 nf
30, +10 < -28.8 nf < 0.9 nf
20, +10 -52.0 ± 8.2 -8.7 ± 1.4 5.3 ± 0.4 -9.1 ± 0.7 yes
10, +10 -114.1 ± 9.7 -6.2 ± 0.6 11.1 ± 1.3 -7.1 ± 0.9 yes
– 20 –
Table 2—Continued
Offset (l,b) 1000× I(H2CO) 〈V 〉 I(CO) 〈V 〉 Included in
arcmin K km s−1 km s−1 K km s−1 km s−1 fits
-70.3 ± 7.7 -10.3 ± 1.2 13.5 ± 1.1 -10.6 ± 1.0 yes
0, +10 -247.0 ± 10.1 -7.6 ± 0.3 22.8 ± 1.6 -8.5 ± 0.6 yes
-10, +10 -118.9 ± 9.2 -8.1 ± 0.6 22.3 ± 2.0 -8.6 ± 0.8 yes
-20, +10 < -19.2 nf < 0.9 nf
40, +20 . . . . . . < 1.2 nf
30, +20 < -25.2 nf 3.1 ± 0.4 -8.3 ± 1.7
20, +20 -42.3 ± 9.3 -7.1 ± 1.6 6.7 ± 0.7 -9.9 ± 1.0 yes
-82.0 ± 7.4 -11.3 ± 1.1 nf nf
10, +20 -49.8 ± 6.9 -6.9 ± 1.0 7.0 ± 1.2 -8.0 ± 1.4 yes
-63.1 ± 5.5 -10.8 ± 1.0 8.3 ± 0.9 -11.0 ± 1.4 yes
0, +20 -68.3 ± 9.7 -8.9 ± 1.3 6.1 ± 0.7 -10.8 ± 1.3 yes
-10, +20 -56.4 ± 6.8 -8.6 ± 1.1 5.5 ± 0.5 -10.8 ± 1.1 yes
-20, +20 < -29.7 nf < 2.7 nf
40, +30 . . . . . . < 1.6 nf
30, +30 < -27.9 nf < 1.6 nf
20, +30 < -19.5 nf 3.4 ± 0.8 -11.4 ± 2.7
10, +30 -54.1 ± 8.8 -9.3 ± 1.6 5.6 ± 0.7 -10.3 ± 1.2 yes
0, +30 < -18.0 nf 5.0 ± 0.6 -11.1 ± 1.3
-10, +30 < -27.0 nf 2.0 ± 0.4 -10.2 ± 2.2
-20, +30 < -23.4 nf < 3.0 nf
– 21 –
REFERENCES
Bally, J., Reipurth, B., Walawender, J., & Armond, T., 2002, AJ, 124, 2152
Blair, G. N., Evans, N. J., Vanden Bout, P. A., & Peters W. L., 1978, ApJ, 219, 893-913
Crampton, D., & Fisher, W. A., 1974, Pub. Dom. Astrophys. Obs., 14, 283
Cohen, R. J., Matthews, N., Few, R. W., & Booth, R. S., 1983, MNRAS, 203, 1123
Dame, T. M., Hartmann, D., & Thaddeus, P., 2001, ApJ, 547, 792
Draine, B. T., 1978, ApJS, 36, 595.
Evans, N. J., II, Zuckerman, B., Sato, T., & Morris, G. 1975, ApJ, 199, 383.
Evans, N. J., II, Rubin, R. H., & Zuckerman, B. 1980, ApJ239, 839
Evans, N. J., II, Kutner, M. L., & Mundy, L. G., 1987, ApJ, 323, 145
Evans, N. J., II, Mundy, L. G., Kutner, M. L.; Depoy, D. L., 1989, ApJ, 346, 212.
Garrison, B. J., Lester, W. A., Jr., Miller, W. H., & Green, S., 1975, ApJ, 200, L175.
Green, S. 1991, ApJS, 76, 979.
Guilloteau S., & Forveille T. 1989, Grenoble Image and Line Data Analysis System
(GILDAS), IRAM, http://www.iram.fr/IRAMFR/GILDAS
Habing, H. J., 1968, Bull. Astron. Inst. Netherlands, 19, 421.
Hasegawa, T. I., Herbst, E., & Leung, C. M., 1992, ApJS, 82, 167.
Hayashi M., & Murata Y., 1992, PASJ, 44, 391
Hayashi, M. & Murata, Y., 1992, PASJ, 44, 391.
Helfer, T. T., & Blitz, L., 1997, ApJ, 478, 233
Heyer, M. H., Carpenter, J. M., & Ladd, E. F., 1996, ApJ, 463, 630
Hummer, D. G., 1981, J. Quant. Spec. Radiat. Transf., 26, 187.
Hummer, D. G., & Storey, P. J., 1992, MNRAS, 254, 277.
http://www.iram.fr/IRAMFR/GILDAS
– 22 –
Jaruschewski, S., Chandra, S., Varshalovich, D. A., & Kegel, W. H. 1986, A&AS, 63, 307.
Keene, J., Blake, G. A., Phillips, T. G., Huggins, P. J., & Beichman, C. A., 1985, ApJ, 299,
Le Bourlot, J., Pineau Des Forets, G., Roueff, E. & Flower, D. R., 1993, å, 267, 233.
Le Petit, F., Nehmé, C., Le Bourlot, J. & Roueff, E., 2006, ApJS, 164, 506.
Le Teuff Y. H., Millar T. J., & Markwick A. J., 2000, A&AS, 146, 157.
Leung, C. M., & Liszt, H. S., 1976, ApJ, 208, 732.
Leung, C. M., Herbst, E., & Huebner, W. F., 1984, ApJS, 56, 231.
Li, W., Evans, N. J., II, Jaffe, D. T., van Dishoeck, E. F., & Thi, W. F., 2002, ApJ, 568,
Lynds, B. T., 1962, ApJS, 7, 1
Park, Y., & Minh Y., 1995, JKAS, 28, 255
Preibisch, T., Balega, Y. Y., Schertl, D., Smith, M. D., & Weigelt, G., 2001, A&A, 378, 539
Preibisch, T., & Smith, M. D, 2002, A&A, 383, 540
Rodŕıguez, M. I., Allen, R., Loinard, L., & Wiklind, T., 2006, ApJ, 652, 1230
Sharpless, S., 1959 , ApJS, 4, 257
Snyder, L. E., Buhl, D., Zuckerman, B., Palmer, P., 1969, PhRvL, 22, 679
Spaans, M & van Dishoeck, E. F., 1997, å, 323, 953
Sugitani, K., & Fukui, Y., 1987, IAUS, 115, 75
Suto, M., Wang, X., & Lee, L.C., 1986, JChPh, 85, 4228
Thaddeus, P., 1972, ApJ, 173, 317
Tafalla, M., Bachiller, R., & Martin-Pintado, J., 1993, ApJ, 403, 175
Timmermann, R., Bertoldi, F., Wright, C. M., Drapatz, S., Draine, B. T., Haser, L.,&
Sternberg, A., 1996, A&A, 315L, 281
Townes, C. H., & Cheung, A. C., 1969, ApJ, 157L, 103
– 23 –
Ungerechts, H., Winnewisser, G., Walmsley, C. M., 1986, A&A, 157, 207
van Dishoeck, E. F. 1988, Rate Coefficients in Astrochemistry, Millar, T. J. & Williams, D.
A. (ed.), Dordrecht: Kluwer, 1988, p. 49.
Vanden Bout, P. A., Snell, R. L., & Wilson, T. L., 1983, A&A, 118, 337
Zhou, S., Evans, N. J. II, Butner, H. M., Kutner, M. L., Leung, C. M., & Mundy, L. G.,
1990, ApJ, 363, 168.
Zuckerman, B., Evans, N. J., II 1974, ApJ, 192, L149.
This preprint was prepared with the AAS LATEX macros v5.2.
	Introduction
	Data
	Results
	Comparison with other molecular tracers
	Discussion
	Conclusions
	Model calculations
	Collisional pumping
	A PDR model for CO
	Photodissociation of H2CO
	Profile moments in detail
ABSTRACT
  We report observations of the dust cloud L1204 with the Onsala 25-m telescope
in the 6 cm (1$_{11}-1_{10}$) transition of \htco. The observed region includes
the
  S140 H${\alpha}$ arc. This spectral line is seen here in absorption against
the cosmic microwave background, indicating the presence of widespread warm
molecular gas at intermediate densities. Overall, the distributions of H$_2$CO
and CO (taken from the literature) are fairly similar, though significant
differences exist at small scales. Most notably, while the CO peak is nearly
coincident with the S140 H${\alpha}$ arc, the maximum H$_2$CO absorption is
clearly separated from it by a full 10$'$ beam ($\sim$ 3 pc). We argue that
these differences result from differing abundances and excitation requirements.
The CO(1-0) line is more optically thick and more biased towards warm gas than
the H$_2$CO 6 cm line. On the other hand, formaldehyde is more easily
photodissociated and is, therefore, a poorer tracer of the molecular gas
located immediately behind Photon Dominated Regions.

<|endoftext|><|startoftext|>
Introduction 2
Acknowledgements 3
1. The dimer model on graphs with boundary 3
1.1. Dimers on graphs with boundary 3
1.2. Dimers on surface graphs with boundary 4
2. Kasteleyn orientations on surface graphs with boundary 5
2.1. Kasteleyn orientations 5
2.2. Discrete spin structures 5
2.3. The Pfaffian formula for the partition function 6
3. Cutting and gluing 8
3.1. Cutting and gluing graphs with boundary 8
3.2. Cutting and gluing surface graphs with boundary 9
3.3. Cutting and gluing discrete spin structures 10
3.4. Cutting Pfaffians 12
4. Quantum field theory for dimers 13
4.1. Quantum field theory on graphs 13
4.2. Quantum field theory for dimers on graphs 13
4.3. The dimer model as the theory of free Fermions 14
5. Dimers on bipartite graphs and height functions 16
5.1. Composition cycles on bipartite graphs 16
5.2. Height functions for planar bipartite graphs 16
5.3. Height functions for bipartite surface graphs 19
5.4. The dimer quantum field theory on bipartite surface graphs 22
References 23
Date: October 22, 2018.
1991 Mathematics Subject Classification. Primary: 82B20; Secondary: 57R15.
http://arxiv.org/abs/0704.0273v1
2 DAVID CIMASONI AND NICOLAI RESHETIKHIN
Introduction
A dimer configuration on a graph Γ is a choice of a family of edges of Γ, called
dimers, such that each vertex of Γ is adjacent to exactly one dimer. Assigning
weights to the edges of Γ allows to define a probability measure on the set of dimer
configurations. The study of this measure is called the dimer model on Γ. Dimer
models on graphs have a long history in statistical mechanics [6, 12], but also show
interesting aspects involving combinatorics, probability theory [10, 4], real algebraic
geometry [9, 8], etc...
A remarkable fact about dimer models was discovered by P.W. Kasteleyn in
the 60’s: the partition function of the dimer model can be written as a linear
combination of 22g Pfaffians of N ×N matrices, where N is the number of vertices
in the graph and g the genus of a closed oriented surface Σ where the graph can be
embedded. The matrices are signed-adjacency matrices, the sign being determined
by an orientation of the edges of Γ called a Kasteleyn orientation. If the graph
is embedded in a surface of genus g, there are exactly 22g equivalence classes of
Kasteleyn orientations, defining the 22g matrices. This Pfaffian formula for the
partition function was proved by Kasteleyn in [6] for the cases g = 0, 1, and only
stated for the general case [7]. A combinatorial proof of this fact and the exact
description of coefficients for all oriented surfaces first appeared much later [11, 14].
The number of equivalence classes of Kasteleyn orientations on a graph Γ em-
bedded in Σ is also equal to the number of equivalence classes of spin structures on
Σ. An explicit construction relating a spin structure on a surface with a Kasteleyn
orientation on a graph with dimer configuration was suggested in [10]. In [3], we in-
vestigated further the relation between Kasteleyn orientations and spin structures.
This allows to understand Kasteleyn orientations on a graph embedded in Σ as
discrete spin structures on Σ. We also used this relation to give a geometric proof
of the Pfaffian formula for closed surfaces. Our final formula can be expressed as
follows: given a graph Γ embedded in a closed oriented surface Σ of genus g, the
partition function of the dimer model on Γ is given by
Z(Γ) =
ξ∈S(Σ)
Arf(ξ)Pf(Aξ(Γ)),
where S(Σ) denotes the set of equivalence classes of spin structures on Σ, Arf(ξ) =
±1 is the Arf invariant of the spin structure ξ, and Aξ(Γ) is the matrix given by
the Kasteleyn orientation corresponding to ξ.
The first part of the present paper is devoted to the extension of the results
obtained in [3] to dimer models on graphs embedded in surfaces with boundary
(Sections 1 and 2). We then show how the operations of cutting and gluing act
on discrete spin structures and how they change the partition function (Section 3).
These operations define the structure of a functorial quantum field theory in the
spirit of [2, 13], as detailed in Section 4. We then give two equivalent reformulations
of the dimer quantum field theory: the “Fermionic” version, which describes the
partition function of the dimer model as a Grassman integral, and the “Bosonic”
version, the equivalent description of dimer models on bipartite surface graphs in
terms of height functions. This special case of bipartite graphs is the subject of
Section 5.
DIMERS ON SURFACE GRAPHS AND SPIN STRUCTURES. II 3
Throughout this paper, Σ is a compact surface, possibly disconnected and possi-
bly with boundary, endowed with the counter-clockwise orientation. All results can
be extended to the case of non-orientable surfaces, which will be done in a separate
publication. We refer to [14] for a combinatorial treatment of dimer models on
non-orientable surface graphs.
Acknowledgements. We are grateful to J. Andersen, M. Baillif, P. Teichner and
A. Vershik for inspiring discussions. We also thankfully acknowledge the hospitality
of the Department of Mathematics at the University of Aarhus. The work of D.C.
was supported by the Swiss National Science Foundation. This work of N.R. was
partially supported by the NSF grant DMS–0307599, by the CRDF grant RUM1–
2622, by the Humboldt foundation and by the Niels Bohr research grant.
1. The dimer model on graphs with boundary
1.1. Dimers on graphs with boundary. In this paper, a graph with boundary
is a finite graph Γ together with a set ∂Γ of one valent vertices called boundary
vertices . A dimer configuration D on a graph with boundary (Γ, ∂Γ) is a choice of
edges of Γ, called dimers , such that each vertex that is not a boundary vertex is
adjacent to exactly one dimer. Note that some of the boundary vertices may be
adjacent to a dimer of D, and some may not. We shall denote by ∂D this partition
of boundary vertices into matched and non-matched. Such a partition will be called
a boundary condition for dimer configurations on Γ.
A weight system on Γ is a positive real valued function w on the set of edges of
Γ. It defines edge weights on the set D(Γ, ∂Γ) of dimer configurations on (Γ, ∂Γ)
w(D) =
w(e),
where the product is taken over all edges occupied by dimers of D.
Fix a boundary condition ∂D0. Then, the Gibbs measure for the dimer model
on (Γ, ∂Γ) with weight system w and boundary condition ∂D0 is given by
Prob(D | ∂D0) =
Z(Γ;w | ∂D0)
where
Z(Γ;w | ∂D0) =
D:∂D=∂D0
w(D),
the sum being on all D ∈ D(Γ, ∂Γ) such that ∂D = ∂D0.
Let V (Γ) denote the set of vertices of Γ. The group
G(Γ) = {s : V (Γ) → R>0}
acts on the set of weight systems on Γ as follows: (sw)(e) = s(e+)w(e)s(e−), where
e+ and e− are the two vertices adjacent to the edge e. Note that (sw)(D) =
v s(v)w(D) and Z(Γ; sw | ∂D0) =
v s(v)Z(Γ;w | ∂D0), both products being on
the set of vertices of Γ matched by D0. Therefore, the Gibbs measure is invariant
under the action of the group G(Γ).
Note that the dimer model on (Γ, ∂Γ) with boundary condition ∂D0 is equivalent
to the dimer model on the graph obtained from Γ by removing all edges adjacent
to non-matched boundary vertices.
4 DAVID CIMASONI AND NICOLAI RESHETIKHIN
Given two dimer configurations D and D′ on a graph with boundary (Γ, ∂Γ),
let us define the (D,D′)-composition cycles as the connected components of the
symmetric difference C(D,D′) = (D ∪D′)\(D ∩D′). If ∂D = ∂D′, then C(D,D′)
is a 1-cycle in Γ with Z2-coefficients. In general, it is only a 1-cycle (rel ∂Γ).
1.2. Dimers on surface graphs with boundary. Let Σ be an oriented compact
surface, not necessarily connected, with boundary ∂Σ. A surface graph with bound-
ary Γ ⊂ Σ is a graph with boundary (Γ, ∂Γ) embedded in Σ, so that Γ ∩ ∂Σ = ∂Γ
and the complement of Γ \ ∂Γ in Σ \ ∂Σ consists of open 2-cells. These conditions
imply that the graph Γ := Γ ∪ ∂Σ is the 1-skeleton of a cellular decomposition of
Note that any graph with boundary can be realized as a surface graph with
boundary. One way is to embed the graph in a closed surface of minimal genus,
and then to remove one small open disc from this surface near each boundary vertex
of the graph.
A dimer configuration on a surface graph with boundary Γ ⊂ Σ is simply a
dimer configuration on the underlying graph with boundary (Γ, ∂Γ). Given two
dimer configurations D and D′ on a surface graph Γ ⊂ Σ, let ∆(D,D′) denote
the homology class of C(D,D′) in H1(Σ, ∂Σ;Z2). We shall say that two dimer
configurations D and D′ are equivalent if ∆(D,D′) = 0 ∈ H1(Σ, ∂Σ;Z2). Note
that given any three dimer configurations D,D′, and D′′ on Γ ⊂ Σ, we have the
identity
(1) ∆(D,D′) + ∆(D′, D′′) = ∆(D,D′′)
in H1(Σ, ∂Σ;Z2).
Fix a homology class β ∈ H1(Σ, ∂Σ;Z2), a dimer configuration D1 ∈ D(Γ, ∂Γ)
and a boundary condition ∂D0. The associated partial partition function is defined
Zβ,D1(Γ;w | ∂D0) =
D:∂D=∂D0
∆(D,D1)=β
w(D),
where the sum is taken over all D ∈ D(Γ, ∂Γ) such that ∂D = ∂D0 and ∆(D,D1) =
The equality (1) implies that
Zβ,D1(Γ;w | ∂D0) = Zβ+∆(D0,D1),D0(Γ;w | ∂D0).
Furthermore, the relative homology class β′ = β + ∆(D0, D1) lies in the image of
the canonical homomorphism j : H1(Σ,Z2) → H1(Σ, ∂Σ;Z2). Hence,
Zβ′,D0(Γ;w | ∂D0) =
α:j(α)=β′
Zα(Γ, w | ∂D0),
where the sum is taken over all α ∈ H1(Σ,Z2) such that j(α) = β
′, and
Zα(Γ;w | ∂D0) =
D:∂D=∂D0
∆(D,D0)=α
w(D).
Therefore the computation of the partition function Zβ,D1(Γ;w | ∂D0) boils down
to the computation of Zα(Γ;w | ∂D0) with α ∈ H1(Σ;Z2). We shall give a Pfaffian
formula for this latter partition function in the next section (see Theorem 2.4).
DIMERS ON SURFACE GRAPHS AND SPIN STRUCTURES. II 5
2. Kasteleyn orientations on surface graphs with boundary
2.1. Kasteleyn orientations. Let K be an orientation of the edges of a graph Γ,
and let C be an oriented closed curve in Γ. We shall denote by nK(C) the number
of times that, traveling once along C following its orientation, one runs along an
edge in the direction opposite to the one given by K.
A Kasteleyn orientation on a surface graph with boundary Γ ⊂ Σ is an orien-
tation K of the edges of Γ = Γ ∪ ∂Σ which satisfies the following condition: for
each face f of Σ, nK(∂f) is odd. Here ∂f is oriented as the boundary of f , which
inherits the orientation of Σ.
Using the proof of [3, Theorem 3.1], one easily checks that if ∂Σ is non-empty,
then there always exists a Kasteleyn orientation on Γ ⊂ Σ. More precisely, we have
the following:
Proposition 2.1. Let Γ ⊂ Σ be a connected surface graph, possibly with boundary,
and let C1, . . . , Cµ be the boundary components of Σ with the induced orientation.
Finally, let n1, . . . , nµ be 0’s and 1’s. Then, there exists a Kasteleyn orientation on
Γ ⊂ Σ such that 1 + nK(−Ci) ≡ ni (mod 2) for all i if and only if
n1 + · · ·+ nµ ≡ V (mod 2),
where V is the number of vertices of Γ.
Proof. First, let us assume that there is a Kasteleyn orientation K on Γ ⊂ Σ such
that 1 + nK(−Ci) ≡ ni for all i. Let Σ
′ be the closed surface obtained from Σ by
pasting a 2-disc Di along each boundary component Ci. Let Γ
′ ⊂ Σ′ be the surface
graph obtained from Γ as follows: for each i such that ni = 1, add one vertex in the
interior of Di and one edge (arbitrarily oriented) between this vertex and a vertex
of Ci. The result is a Kasteleyn orientation on Γ
′ ⊂ Σ′, with Σ′ closed. By [3,
Theorem 3.1], the number V ′ of vertices of Γ′ is even. Hence,
0 ≡ V ′ ≡ V + n1 + · · ·+ nµ (mod 2).
Conversely, assume Γ ⊂ Σ is a surface graph with n1+ · · ·+nµ ≡ V (mod 2). Paste
2-discs along the boundary components of Σ as before. This gives a surface graph
Γ′ ⊂ Σ′ with Σ′ closed and V ′ even. By [3, Theorem 3.1], there exists a Kasteleyn
orientation K ′ on Γ′ ⊂ Σ′. It restricts to a Kasteleyn orientation K on Γ ⊂ Σ with
1 + nK(−Ci) ≡ ni for all i. �
Recall that two Kasteleyn orientations are called equivalent if one can be ob-
tained from the other by a sequence of moves reversing orientations of all edges
adjacent to a vertex. The proof of [3, Theorem 3.2] goes through verbatim: if
non-empty, the set of equivalence classes of Kasteleyn orientations on Γ ⊂ Σ is an
affine H1(Σ;Z2)-space. In particular, there are exactly 2
b1(Σ) equivalence classes of
Kasteleyn orientations on Γ ⊂ Σ.
2.2. Discrete spin structures. As in the closed case, any dimer configuration
D on a graph Γ allows to identify equivalence classes of Kasteleyn orientations on
Γ ⊂ Σ with spin structures on Σ. Indeed, [3, Theorem 4.1] generalizes as follows.
Given an oriented simple closed curve C in Γ, let ℓD(C) denote the number of
vertices v in C whose adjacent dimer of D sticks out to the left of C in Σ. Also, let
V∂D(C) be the number of boundary vertices v in C not matched by D, and such
that the interior of Σ lies to the right of C at v.
6 DAVID CIMASONI AND NICOLAI RESHETIKHIN
Theorem 2.2. Fix a dimer configuration D on a surface graph with boundary
Γ ⊂ Σ. Given a class α ∈ H1(Σ;Z2), represent it by oriented simple closed curves
C1, . . . , Cm in Γ. If K is a Kasteleyn orientation on Γ ⊂ Σ, then the function
qKD : H1(Σ;Z2) → Z2 given by
qKD (α) =
Ci · Cj +
(1 + nK(Ci) + ℓD(Ci) + V∂D(Ci)) (mod 2)
is a well-defined quadratic form on H1(Σ;Z2).
Proof. Fix a dimer configuration D on (Γ, ∂Γ) and a Kasteleyn orientation K on
Γ ⊂ Σ. Let Σ′ be the surface (homeomorphic to Σ) obtained from Σ by adding a
small closed collar to its boundary. For every vertex v of ∂Γ that is not matched
by a dimer of D, add a vertex v′ near v in the interior of the collar and an edge
between v and v′. Let us denote by Γ′ the resulting graph in Σ′. Putting a
dimer on each of these additional edges, and orienting them arbitrarily, we obtain
a perfect matching D′ and an orientation K ′ on Γ′. Although Γ′ ⊂ Σ′ is not
strictly speaking a surface graph, all the methods of [3, Section 4] apply. Indeed,
Kuperberg’s vector field defined near Γ′ clearly extends continuously to the collar.
As in the closed case, it also extends to the faces with even index singularities. Using
the perfect matching D′ on Γ′, we obtain a vector field f(K ′, D′) with even index
singularities, which determines a spin structure ξf(K′,D′) on Σ
′. Johnson’s theorem
[5] holds for surfaces with boundary, so this spin structure defines a quadratic
form q on H1(Σ
′;Z2) = H1(Σ;Z2). If C is a simple close curve in Γ
′ ⊂ Σ′, then
q([C]) + 1 = nK
(C) + ℓD′(C) as in the closed case. The proof is completed using
the equalities nK
(C) = nK(C) and ℓD′(C) = ℓD(C) + V∂D(C). �
Since Johnson’s theorem holds true for surfaces with boundary and [3, Proposi-
tion 4.2] easily extends, we have the following corollary.
Corollary 2.3. Let Γ ⊂ Σ be a surface graph, non-necessarily connected, and pos-
sibly with boundary. Any dimer configuration D on Γ ⊂ Σ induces an isomorphism
of affine H1(Σ;Z2)-spaces
ψD : K(Γ ⊂ Σ) −→ S(Σ)
from the set of equivalence classes of Kasteleyn orientations on Γ ⊂ Σ onto the set
of spin structures on Σ. Furthermore, ψD − ψD′ is equal to the Poincaré dual of
∆(D,D′). In particular, ψD = ψD′ if and only if D and D
′ are equivalent dimer
configurations. �
2.3. The Pfaffian formula for the partition function. Let Γ be a graph, not
necessarily connected, and possibly with boundary, endowed with a weight system
w. Realize Γ as a surface graph Γ ⊂ Σ, and fix a Kasteleyn orientation K on it.
The Kasteleyn coefficient associated to an ordered pair (v, v′) of distinct vertices
of Γ is the number
aKvv′ =
εKvv′(e)w(e),
where the sum is on all edges e in Γ between the vertices v and v′, and
εKvv′(e) =
1 if e is oriented by K from v to v′;
−1 otherwise.
DIMERS ON SURFACE GRAPHS AND SPIN STRUCTURES. II 7
One also sets aKvv = 0. Let us fix a boundary condition ∂D0 and enumerate the
matched vertices of Γ by 1, 2, . . . , 2n. Then, the corresponding coefficients form a
2n× 2n skew-symmetric matrix AK(Γ;w | ∂D0) = A
K called the Kasteleyn matrix .
Let D be a dimer configuration on (Γ, ∂Γ) with ∂D = ∂D0, given by edges
e1, . . . , en matching vertices iℓ and jℓ for ℓ = 1, . . . , n. Let σ be the permutation
(1, . . . , 2n) 7→ (i1, j1, . . . , in, jn), and set
εK(D) = (−1)σ
εKiℓjℓ(eℓ),
where (−1)σ denotes the sign of σ. Note that εK(D) does not depend on the choice
of σ, but only on the dimer configuration D.
Finally, recall that the Arf invariant of a (possibly degenerate) quadratic form
q on H := H1(Σ;Z2) is defined by
Arf(q) =
(−1)q(α).
If there is a component γ of ∂Σ such that q(γ) 6= 0, then one easily checks that
Arf(q) = 0. On the other hand, if q(γ) = 0 for all boundary components γ of Σ,
then Arf(q) takes the values +1 or −1.
Theorem 2.4. Let Γ ⊂ Σ be a surface graph, not necessarily connected, and possi-
bly with boundary. Let b1(Σ) denote the dimension of H1(Σ;Z2), and let g denote
the genus of Σ. Then,
Zα(Γ;w | ∂D0) =
2b1(Σ)
(−1)q
εK(D0)Pf(A
for any α ∈ H1(Σ;Z2), and
Z(Γ;w | ∂D0) =
Arf(qKD0)ε
K(D0)Pf(A
where both sums are over the 2b1(Σ) equivalence classes of Kasteleyn orientations
on Γ ⊂ Σ. Furthermore, Arf(qKD0)ε
K(D0) does not depend on D0.
Proof. First note that if the theorem holds for two surface graphs, then it holds
for their disjoint union. Therefore, it may be assumed that Σ is connected. The
first formula follows from Theorem 2.2: the proof of Theorem 4 and the first half
of the proof of Theorem 5 of [3] generalize verbatim to the case with (possible)
boundary. The second formula can be obtained from the first one by summing
over all α ∈ H1(Σ;Z2). However, this requires some cumbersome computations,
so let us give another proof of this equality. As mentioned in Section 1, the dimer
model on (Γ, ∂Γ) with boundary condition ∂D0 is equivalent to the dimer model
on the graph Γ′ = Γ′(∂D0) obtained from Γ by removing all edges adjacent to non-
matched boundary vertices. Let w′ denote the restriction of w to Γ′. If Γ ⊂ Σ is a
surface graph with boundary, then Γ′ ⊂ Σ′ is a surface graph, where Σ′ is the closed
oriented surface obtained from Σ by gluing discs along all boundary components.
By [3, Theorem 5.3],
Z(Γ;w | ∂D0) = Z(Γ
′;w′) =
Arf(qK
(D0)Pf(A
K′(Γ′;w′)),
8 DAVID CIMASONI AND NICOLAI RESHETIKHIN
the sum being on all equivalence classes of Kasteleyn orientations on Γ′ ⊂ Σ′.
Such a Kasteleyn orientation K ′ extends uniquely to a Kasteleyn orientation K on
Γ ⊂ Σ such that qKD0(γ) = 0 for all boundary component γ of Σ. Furthermore,
(D0) = ε
K(D0) and A
K′(Γ′;w′) = AK(Γ;w | ∂D0). Since Arf(q
) = 0 for all
other Kasteleyn orientations, the theorem follows. �
3. Cutting and gluing
3.1. Cutting and gluing graphs with boundary. Let (Γ, ∂Γ) be a graph with
boundary, and let us fix an edge e of Γ. Let (Γ{e}, ∂Γ{e}) denote the graph with
boundary obtained from (Γ, ∂Γ) as follows: cut the edge e in two, and set ∂Γ{e} =
∂Γ ∪ {v′, v′′}, where v′ and v′′ are the new one valent vertices. Iterating this
procedure for some set of edges E leads to a graph with boundary (ΓE, ∂ΓE), which
is said to be obtained by cutting (Γ, ∂Γ) along E.
Note that a dimer configuration D ∈ D(Γ, ∂Γ) induces an obvious dimer config-
uration DE ∈ D(ΓE, ∂ΓE): cut in two the dimers of D that belong to E.
A weight system w on Γ induces a family of weight systems (wt
)t on ΓE indexed
by t : E → R>0, as follows: if e is an edge of Γ which does not belong to E, set
(e) = w(e); if e ∈ E is cut into two edges e′, e′′ of ΓE, set w
(e′) = t(e)w(e)1/2 and
(e′′) = t(e)−1w(e)1/2. Note that this family of weight systems is an orbit under
the action of the subgroup of G(ΓE) consisting of elements s such that s(v) = 1 for
all v ∈ V (Γ) and s(v′) = s(v′′) whenever v′, v′′ ∈ ∂ΓE come from the same edge of
Let us now formulate how the cutting affects the partition function. The proof
is straightforward.
Proposition 3.1. Fix a boundary condition ∂D0 on (Γ, ∂Γ) and a set E of edges
of Γ. Then, given any parameter t : E → R>0,
Z(Γ;w | ∂D0) =
Z(ΓE;w
| ∂DI0),
where the sum is taken over all subsets I of E and ∂DI0 is the boundary condition
on (ΓE, ∂ΓE) induced by ∂D0 and I: a vertex of ∂ΓE is matched in ∂D
0 if and only
if it is matched in ∂D0 or it comes from an edge in I. �
The operation opposite to cutting is called gluing: pick a pair of boundary
vertices of Γ, and glue the adjacent edges e′, e′′ along these vertices into a single
edge e. In order for the result to be a graph, it should be assumed that e′ and e′′
are different edges of Γ. We shall denote by (Γϕ, ∂Γϕ) the graph obtained by gluing
(Γ, ∂Γ) according to a pairing ϕ of several vertices of ∂Γ.
Note that a dimer configuration D ∈ D(Γ, ∂Γ) induces a dimer configuration
Dϕ ∈ D(Γϕ, ∂Γϕ) if and only if the boundary condition ∂D on ∂Γ is compatible
with ϕ, i.e: ϕ relates matched vertices with matched vertices. Obviously, a dimer
configuration DE is compatible with the pairing ϕ which glues back the edges of E,
and (DE)ϕ = D on ((ΓE)ϕ, (∂ΓE)ϕ) = (Γ, ∂Γ).
An edge weight system w on Γ induces an edge weight system wϕ on Γϕ as
follows:
wϕ(e) =
w(e) if e is an edge of Γ;
w(e′)w(e′′) if e is obtained by gluing the edges e′ and e′′ of Γ.
DIMERS ON SURFACE GRAPHS AND SPIN STRUCTURES. II 9
If E is a set of edges of Γ and ϕ is the pairing which glues back these edges, then
)ϕ = w for any t : E → R>0.
The effect of gluing on the partition function is best understood in the language
of quantum field theory. We therefore postpone its study to Section 4.
3.2. Cutting and gluing surface graphs with boundary. Let Γ ⊂ Σ be a
surface graph with boundary. Let C be a simple curve in Σ which is “in general
position” with respect to Γ, in the following sense:
(i) it is disjoint from the set of vertices of Γ;
(ii) it intersects the edges of Γ transversally;
(iii) its intersection with any given face of Σ is connected.
Let ΣC be the surface with boundary obtained by cutting Σ open along C. Also,
let ΓC := ΓE(C) be the graph with boundary obtained by cutting (Γ, ∂Γ) along the
set E(C) of edges of Γ which intersect C, as illustrated in Figure 1.
Γ ⊂ Σ
ΓC ⊂ ΣC
Figure 1. Cutting a surface graph Γ ⊂ Σ along a curve C.
Obviously, ΓC ⊂ ΣC is a surface graph with boundary. We will say that it is
obtained by cutting Γ ⊂ Σ along C. Abusing notation, we shall write wtC for the
weight system wt
E(C) on ΓC .
A class β ∈ H1(Σ, ∂Σ;Z2) induces βC ∈ H1(ΣC , ∂ΣC ;Z2) via
H1(Σ, ∂Σ;Z2) → H1(Σ, ∂Σ ∪N(C);Z2) ≃ H1(ΣC , ∂ΣC ;Z2).
Here N(C) denotes a neighborhood of C in Σ, the first homomorphism is induced
by inclusion, and the second one is the excision isomorphism. Note that given
any two dimers configurations D and D′ on Γ ⊂ Σ, ∆(DC , D
C) = ∆(D,D
′)C in
H1(ΣC , ∂ΣC ;Z2).
This easily leads to the following refinement of Proposition 3.1.
Proposition 3.2. Fix β ∈ H1(Σ, ∂Σ;Z2), D
′ ∈ D(Γ, ∂Γ), and a boundary condi-
tion ∂D0 on (Γ, ∂Γ). Then, given any parameter t : E(C) → R>0,
Zβ,D′(Γ;w | ∂D0) =
I⊂E(C)
ZβC ,D′C (ΓC ;w
C | ∂D
where the sum is taken over all subsets I of E(C) and ∂DI0 is the boundary condition
on (ΓC , ∂ΓC) induced by ∂D0 and I. �
Let us now define the operation opposite to cutting a surface graph with bound-
ary. Pick two closed connected subsets M1,M2 of ∂Σ, which are not points, and
satisfy the following properties:
(i) M1 ∩M2 ⊂ ∂M1 ∪ ∂M2 and ∂M1 ∪ ∂M2 is disjoint from ∂Γ;
10 DAVID CIMASONI AND NICOLAI RESHETIKHIN
(ii) the intersection of each given face of Σ with M1 ∪M2 is connected;
(iii) there exists an orientation-reversing homeomorphism ϕ : M1 → M2 which
induces a bijection M1 ∩ ∂Γ → M2 ∩ ∂Γ such that for all v in M1 ∩ ∂Γ, v
and ϕ(v) are not adjacent to the same edge of Γ.
Let Γϕ ⊂ Σϕ be obtained from the surface graph Γ ⊂ Σ by identifying M1 and M2
via ϕ and removing the corresponding vertices of Γ. This is illustrated in Figure 2.
By the conditions above, the pair Γϕ ⊂ Σϕ remains a surface graph. It is said to
be obtained by gluing Γ ⊂ Σ along ϕ.
Γ ⊂ Σ Γϕ ⊂ Σϕ
−→ M2
Figure 2. Gluing a surface graph Γ ⊂ Σ along ϕ : M1 → M2.
Note that any surface graph ΓC ⊂ ΣC obtained by cutting Γ ⊂ Σ along some
curve C in general position with respect to Γ satisfies the conditions listed above.
Furthermore, (ΓC)ϕ ⊂ (ΣC)ϕ = Γ ⊂ Σ, where ϕ is the obvious homeomorphism
identifying the two closed subsets of ∂ΣC coming from C. Conversely, if C denotes
the curve in Σϕ given by the identification ofM1 andM2 via ϕ, then it is in general
position with respect to Γϕ, and (Γϕ)C ⊂ (Σϕ)C .
3.3. Cutting and gluing discrete spin structures. Let Γ ⊂ Σ be a surface
graph with boundary, and let C be a simple curve in Σ in general position with
respect to Γ. As noted above, any dimer configurationD on (Γ, ∂Γ) induces a dimer
configuration DC on (ΓC , ∂ΓC). If two dimer configurations D,D
′ ∈ D(Γ, ∂Γ) are
equivalent, then DC , D
C ∈ D(ΓC , ∂ΓC) are equivalent as well:
∆(DC , D
C) = ∆(D,D
′)C = 0 ∈ H1(ΣC , ∂ΣC ;Z2).
A Kasteleyn orientation K on Γ ⊂ Σ induces a Kasteleyn orientation KC on
ΓC ⊂ ΣC as follows. Let KC be equal to K on all edges of ΓC coming from edges
of Γ. For all the new edges of ΓC , there is a unique orientation which satisfies the
Kasteleyn condition, since each face of Σ is crossed at most once by C. One easily
checks that if K and K ′ are equivalent Kasteleyn orientations, then KC and K
are also equivalent. Hence, there is a well-defined operation of cutting discrete spin
structures on a surface with boundary.
This is not a surprise. Indeed, the inclusion ΣC ⊂ ΣC ∪ N(C) = Σ induces
a homomorphism i∗ : H1(ΣC ;Z2) → H1(Σ;Z2). The assignment q 7→ qC = q ◦ i∗
defines a map from the quadratic forms on H1(Σ;Z2) to the quadratic forms on
H1(ΣC ;Z2), which is affine over the restriction homomorphism i
∗ : H1(Σ;Z2) →
H1(ΣC ;Z2). By Johnson’s theorem, it induces an affine map between the sets of
DIMERS ON SURFACE GRAPHS AND SPIN STRUCTURES. II 11
spin structures S(Σ) → S(ΣC). By Corollary 2.3, there is a unique map K(Γ ⊂
Σ) → K(ΓC ⊂ ΣC) which makes the following diagram commute:
(2) K(Γ ⊂ Σ)
∼= ψD
// K(ΓC ⊂ ΣC)
∼= ψDC
S(Σ) // S(ΣC).
This map is nothing but [K] 7→ [KC ].
Now, let K be a Kasteleyn orientation on a surface graph Γ ⊂ Σ, and let
ϕ : M1 →M2 be an orientation-reversing homeomorphism between two closed con-
nected subsets in ∂Σ, as described above. We shall say that a Kasteleyn orientation
K on Γ ⊂ Σ is compatible with ϕ if the following conditions hold:
(i) whenever two edges e′, e′′ of Γ are glued into a single edge e of Γϕ, the
orientation K agrees on e′ and e′′, giving an orientation Kϕ on e;
(ii) the induced orientation Kϕ is a Kasteleyn orientation on Γϕ ⊂ Σϕ.
The Kasteleyn orientationKϕ on Γϕ ⊂ Σϕ is said to be obtained by gluing K along
Given any Kasteleyn orientation K on Γ ⊂ Σ, the induced orientation KC on
ΓC ⊂ ΣC is compatible with the map ϕ such that (ΣC)ϕ = Σ; furthermore, (KC)ϕ
is equal to K. Conversely, if K is a Kasteleyn orientation on Γ ⊂ Σ which is
compatible with ϕ, and C denotes the curve in Σϕ given by the identification of
M1 and M2 via ϕ, then (Kϕ)C is equal to K. With these notations, any dimer
configuration D on Γ which is compatible with ϕ satisfies (Dϕ)C = D. Therefore,
diagram (2) gives
K(Γϕ ⊂ Σϕ)
∼= ψDϕ
// K(Γ ⊂ Σ)
∼= ψD
S(Σϕ) // S(Σ),
where both horizontal maps are affine over i∗ : H1(Σϕ;Z2) → H
1(Σ;Z2). Under-
standing the gluing of Kasteleyn orientations (up to equivalence) now amounts to
understanding the restriction homomorphism i∗. Using the exact sequence of the
pair (Σϕ,Σ), one easily checks the following results:
– The restriction homomorphism i∗ is injective, unlessM1 andM2 are disjoint
and belong to the same connected component of Σ. In this case, the kernel
of i∗ has dimension 1.
– The homomorphism i∗ is onto unless M1 ∪ M2 is a 1-cycle and the cor-
responding connected component of Σϕ is not closed. In this case, the
cokernel of i∗ has dimension 1.
This leads to the four following cases. Fix a Kasteleyn orientation K on Γ ⊂ Σ.
(1) If i∗ is an isomorphism, then there exist a Kasteleyn orientation K ′ equiv-
alent to K which is compatible with ϕ. Furthermore, the assignment
[K] 7→ [K ′ϕ] gives a well-defined map between K(Γ ⊂ Σ) and K(Γϕ ⊂ Σϕ).
(2) If i∗ is onto but not injective, then there exist K ′,K ′′ ∼ K which are
compatible with ϕ, inducing two distinct well-defined maps [K] 7→ [K ′ϕ]
and [K] 7→ [K ′′ϕ] between K(Γ ⊂ Σ) and K(Γϕ ⊂ Σϕ).
12 DAVID CIMASONI AND NICOLAI RESHETIKHIN
(3) If i∗ is injective but not onto, then M1 ∪M2 is a 1-cycle, oriented as part
of the boundary of Σ. There exist K ′ ∼ K which is compatible with ϕ if
and only if the following condition holds:
nK(M1) + n
K(M2) ≡ 0 (mod 2) if M1 and M2 are disjoint;
nK(M1 ∪M2) ≡ 1 (mod 2) otherwise.
(Note that this condition only depends on the equivalence class of K.) In
this case, it induces a well-defined class [K ′ϕ] in K(Γϕ ⊂ Σϕ).
(4) Finally, assume i∗ is neither onto nor injective. If K satisfies the condition
above, then there exist K ′,K ′′ ∼ K which are compatible with ϕ, inducing
two well-defined maps [K] 7→ [K ′ϕ] and [K] 7→ [K
ϕ]. On the other hand,
if K does not satisfy the condition above, then it does not contain any
representative which is compatible with ϕ.
3.4. Cutting Pfaffians. Let us conclude this section with one last observation.
Let Γ ⊂ Σ be a surface graph with boundary, and let C be a simple curve in Σ.
The equality
Z(Γ;w | ∂D0) =
I⊂E(C)
Z(ΓC ;w
C | ∂D
of Proposition 3.1 can be understood as the Taylor series expansion of the function
Z(Γ;w | ∂D0) in the variables (w(e))e∈E(C). Clearly, if E(C) = {ei1 , . . . , eik}, then
w(eiℓ)
∂kZ(Γ;w | ∂D0)
∂w(ei1) · · · ∂w(eik)
(0) = Z(ΓC ;w
C | ∂D
By Theorem 2.4, the partition function Z(Γ;w | ∂D0) can be expressed as a lin-
ear combination of Pfaffians of matrices AK(Γ;w | ∂D0) depending on Kasteleyn
orientations K of Γ ⊂ Σ such that qKD0(γ) = 0 for all boundary component γ of
Σ. Recall that any such orientation K extends to a Kasteleyn orientation KC on
ΓC ⊂ ΣC . Furthermore, all equivalence classes of Kasteleyn orientations such that
(D0)C
(γ) = 0 for all boundary component γ of ΣC are obtained in this way. (This
follows from the fact that the map [K] 7→ [KC ] is affine over the restriction homo-
morphism.) Finally, the partition function Z(ΓC ;w
C | ∂D0) can also be expressed
as a linear combination of Pfaffians of matrices AKC (ΓC ;w
C | ∂D0) via Theorem
Gathering all these equations, we obtain a relation between the Pfaffian of the
matrix AK(Γ;w | ∂D0) and the Pfaffian of the matrix A
KC (ΓC ;w
C | ∂D0). This
relation turns out to be exactly the equation below, a well-known property of Pfaf-
fians.
Proposition 3.3. Let A = (aij) be a skew-symmetric matrix of size 2n. Given
an ordered subset I of the ordered set α = (1, . . . , 2n), let AI denote the matrix
obtained from A by removing the ith row and the ith column for all i ∈ I. Then,
for any ordered set of indices I = (i1, j1, . . . , ik, jk),
∂kPf(A)
∂ai1j1 · · · ∂aikjk
= (−1)σ(I)Pf(AI),
where (−1)σ(I) denote the signature of the permutation which sends α to the ordered
set I(α\I). �
DIMERS ON SURFACE GRAPHS AND SPIN STRUCTURES. II 13
4. Quantum field theory for dimers
4.1. Quantum field theory on graphs. Let (Γ, ∂Γ) be a graph with boundary,
and let us assume that each vertex v in ∂Γ is oriented, that is, endowed with some
sign εv. In the spirit of the Atiyah-Segal axioms for a (0 + 1)-topological quantum
field theory [2, 13], let us define a quantum field theory on graphs as the following
assignment:
(1) Fix a finite dimensional complex vector space V .
(2) To the oriented boundary ∂Γ, assign the vector space
Z(∂Γ) =
εv=+1
εv=−1
where V ∗ denotes the vector space dual to V .
(3) To a finite graph Γ with oriented boundary ∂Γ and weight system w, assign
some vector Z(Γ;w) ∈ Z(∂Γ), with Z(∅;w) = 1 ∈ C = Z(∅).
Note that any orientation preserving bijection f : ∂Γ → ∂Γ′ induces an isomor-
phism Z(f) : Z(∂Γ) → Z(∂Γ′) given by permutation of the factors. This assign-
ment is functorial: if g : ∂Γ′ → ∂Γ′′ is another orientation preserving bijection,
then Z(g ◦ f) = Z(g) ◦Z(f). Finally, if f : ∂Γ → ∂Γ′ extends to a homeomorphism
F : Γ → Γ′, then Z(f) maps Z(Γ) to Z(Γ′). Note also that Z(−∂Γ) = Z(∂Γ)∗, and
that Z(∂Γ ⊔ ∂Γ′) = Z(∂Γ)⊗ Z(∂Γ′).
The main point is that we require the following gluing axiom. Let Γ be a graph
with oriented boundary ∂Γ, such that there exists two disjoint subsets X1, X2 of ∂Γ
and an orientation reversing bijection ϕ : X1 → X2 (i.e. εϕ(v) = −εv for all v ∈ X1).
Obviously, ϕ induces a linear isomorphism Z(ϕ) : Z(X1) → Z(X2)
∗. Let Γϕ denote
the graph with boundary ∂Γϕ = ∂Γ \ (X1 ∪X2) obtained by gluing Γ according to
ϕ, and let wϕ be the corresponding weight system on Γϕ (recall Section 3.1). Let
Bϕ denote the composition
Z(∂Γ) = Z(∂Γϕ)⊗ Z(X1)⊗ Z(X2) → Z(∂Γϕ)⊗ Z(X2)
∗ ⊗ Z(X2) → Z(∂Γϕ),
where the first homomorphism is given by id⊗Z(ϕ)⊗ id, and the second is induced
by the natural pairing Z(X2)
∗ ⊗ Z(X2) → C. We require that
Bϕ(Z(Γ;w)) = Z(Γϕ;wϕ).
Remark. In the same spirit, one can define a quantum field theory on surface graphs .
Here, the vector Z(Γ ⊂ Σ;w) ∈ Z(∂Γ) might depend on the realization of Γ as a
surface graph Γ ⊂ Σ, and the gluing axiom concerns gluing of surface graphs, as
defined in Section 3.2.
4.2. Quantum field theory for dimers on graphs. Let us now explain how
the dimer model on weighted graphs with boundary defines a quantum field theory.
As vector space V , choose the 2-dimensional complex vector space with fixed basis
a0, a1. Let α0, α1 denote the dual basis in V
∗. To a finite graph Γ with oriented
boundary ∂Γ and weight system w, assign
Z(Γ;w) =
Z(Γ;w | ∂D) a(∂D) ∈ Z(∂Γ),
where the sum is on all possible boundary conditions ∂D on ∂Γ, and
a(∂D) =
εv=+1
aiv(∂D) ⊗
εv=−1
αiv(∂D) ∈ Z(∂Γ).
14 DAVID CIMASONI AND NICOLAI RESHETIKHIN
Here, iv(∂D) = 1 if the vertex v is matched by ∂D, and iv(∂D) = 0 otherwise.
Let us check the gluing axiom. First note that Bϕ(a(∂D)) = 0 unless ∂D is
compatible with ϕ (i.e: unless ϕ(v) is matched in ∂D if and only if v is matched in
∂D). In such a case, B(a(∂D)) = a(∂D|∂Γϕ), where ∂D|∂Γϕ denotes the restriction
of the boundary condition ∂D to ∂Γϕ ⊂ ∂Γ. All the possible boundary conditions
∂Dϕ on ∂Γϕ are given by such restrictions. Therefore,
Bϕ(Z(Γ;w)) =
∂D⊃∂Dϕ
Z(Γ;w | ∂D)
a(∂Dϕ),
the interior sum being on all boundary conditions ∂D on ∂Γ that are compatible
with ϕ, and such that ∂D|∂Γϕ = ∂Dϕ. By definition,
∂D⊃∂Dϕ
Z(Γ;w | ∂D) =
D:∂D⊃∂Dϕ
w(D) = Z(Γϕ;wϕ | ∂Dϕ).
Therefore, the gluing axiom is satisfied.
4.3. The dimer model as the theory of free Fermions. Let W be an n-
dimensional vector space. The choice of an ordered basis in W induces an isomor-
phism between its exterior algebra
W = ⊕nk=0
W and the algebra generated
by elements φ1, . . . , φn with defining relations φiφj = −φjφi. This space is known
as the Grassman algebra generated by φ1, . . . , φn. The choice of an ordered basis
in W also defines a basis in the top exterior power of W . The integral over the
Grassman algebra of W of an element a ∈
W is the coordinate of a in the top
exterior power of W with respect to this basis. It is denoted by
a dφ.
There is a scalar product on the Grassman algebra generated by φ1, . . . , φn; it
is given by the Grassman integral
(3) < F,G >=
F (φ)G(ψ)dφdψ.
Note that the monomial basis is orthonormal with respect to this scalar product.
One easily shows (see e.g. the Appendix to [3]) that the Pfaffian of a skew symmetric
matrix A = (aij) can be written as
Pf(A) =
i,j=1
φiaijφj
Let us now use this to reformulate the quantum field theory of dimers in terms of
Grassman integrals. Let Γ ⊂ Σ be a (possibly disconnected) surface graph, possibly
with boundary. Let us fix a numbering of the vertices of Γ, a boundary condition
∂D0 on ∂Γ and a Kasteleyn orientation K on Γ ⊂ Σ. Let a
ij be the Kasteleyn
coefficient associated to K and the vertices i, j of Γ (recall Section 2). By Theorem
2.4 and the identity above,
Z(Γ;w | ∂D0) =
Arf(qKD0)ε
K(D0)
i,j∈V (D0)
dφ∂D0 ,
where the sum is over all 2b1(Σ) equivalence classes of Kasteleyn orientations on
Γ ⊂ Σ, V (D0) denotes the set of vertices of Γ that are matched by D0, and dφ∂D0 =
DIMERS ON SURFACE GRAPHS AND SPIN STRUCTURES. II 15
∧i∈V (D0)dφi. This leads to the formula
Z(Γ;w | ∂D0) =
i,j∈V (D0)
DK∂D0φ,
where DK∂D0φ = Arf(q
)εK(D0) d∂D0φ. Let us point out that this measure does
not depend on the choice of D0, but only on the induced boundary condition ∂D0.
Now, the numbering of the vertices of Γ gives a numbering of the vertices of ∂Γ.
This induces a linear isomorphism between Z(∂Γ) and the Grassman algebra
generated by (φi)i∈∂Γ. The image of the partition function under this isomorphism
is the following element of the Grassman algebra of boundary vertices:
Z(Γ;w) =
Z(Γ;w | ∂D0)
i∈V (∂D0)
(∂Γ),
where V (∂D0) = V (D0) ∩ ∂Γ. This leads to
Z(Γ;w) =
i,j∈V (D0)
DK∂D0φ
i∈V (∂D0)
i,j∈V (Γ)
where DKφ = Arf(qKD0)ε
K(D0) ∧i/∈∂Γ dφi. This measure depends only on K, but
not on D0.
We can now formulate the dimer model as the theory of free (Gaussian) Fermions:
(1) To the boundary of Γ ⊂ Σ, we assign
(∂Γ), the Grassman algebra gener-
ated by the ordered set ∂Γ;
(2) To a surface graph Γ ⊂ Σ with ordered set of vertices V (Γ) and weight
system w, we assign the element Z(Γ ⊂ Σ;w) of
(∂Γ) given by
Z(Γ ⊂ Σ;w) =
i,j∈V (Γ)
where the sum is over all 2b1(Σ) equivalence classes of Kasteleyn orientations
on Γ ⊂ Σ, and DKφ = Arf(qKD0)ε
K(D0) ∧i/∈∂Γ dφi.
The gluing axiom now takes the following form. Let Γϕ ⊂ Σϕ denote the surface
graph with boundary obtained by gluing Γ ⊂ Σ along some orientation-reversing
homeomorphism ϕ : M1 → M2 (see Section 3.2). Recall that ϕ induces a bijection
between the two disjoint sets X1 = ∂Γ ∩M1 and X2 = ∂Γ ∩M2. Therefore, it
induces an isomorphism Z(ϕ) :
(X1) →
(X2). Consider the map Bϕ given by
the composition
(∂Γ) =
(∂Γϕ)⊗
(X1)⊗
(X2) →
(∂Γϕ)⊗
(X2) →
(∂Γϕ).
Here, the first homomorphism is given by id⊗ (h ◦Z(ϕ))⊗ id, where h :
(X2) →
∗ is the isomorphism induced by the scalar product (3). Then, we require
Bϕ(Z(Γ ⊂ Σ;w)) = Z(Γϕ ⊂ Σϕ;wϕ).
We already know that this equality holds. Indeed, Z(Γ ⊂ Σ;w) just depends on
(Γ, w), and the formula above is nothing but the gluing axiom for Z(Γ;w) translated
in the formalism of Grassman algebras. However, it can also be proved from scratch
using the results of Section 3.3 together with well-known properties of Pfaffians.
16 DAVID CIMASONI AND NICOLAI RESHETIKHIN
5. Dimers on bipartite graphs and height functions
5.1. Composition cycles on bipartite graphs. Recall that a bipartite structure
on a graph Γ is a partition of its set of vertices into two groups, say blacks and
whites, such that no edge of Γ joins two vertices of the same group. Equivalently,
a bipartite structure can be regarded as a 0-chain
v black
v white
v ∈ C0(Γ;Z).
A bipartite structure induces an orientation on the edges of Γ, called the bipartite
orientation: simply orient all the edges from the white vertices to the black ones.
Using this orientation, a dimer configuration D ∈ D(Γ, ∂Γ) can now be regarded
as a 1-chain with Z-coefficients
e ∈ C1(Γ;Z)
such that ∂D = β in C0(Γ, ∂Γ;Z) = C0(Γ;Z)/C0(∂Γ;Z). Therefore, given two
dimer configurations D,D′ on Γ, their difference D −D′ is a 1-cycle (rel ∂Γ) with
Z-coefficients, denoted by C(D,D′). Its connected components are called (D,D′)-
composition cycles . In short, a bipartite structure on a graph allows to orient the
composition cycles.
5.2. Height functions for planar bipartite graphs. Let us now assume that
the bipartite graph Γ is planar without boundary, i.e. that it can be realized
as a surface graph Γ ⊂ S2. Let X denote the induced cellular decomposition
of the 2-sphere, which we endow with the counter-clockwise orientation. Since
H1(X ;Z) = H1(S
2;Z) = 0, the 1-cycle C(D,D′) is a 1-boundary, so there exists
σD,D′ ∈ C2(X ;Z) such that ∂σD,D′ = C(D,D
′). Let hD,D′ ∈ C
2(X ;Z) be given
by the equality
σD,D′ =
f∈F (X)
hD,D′(f) f ∈ C2(X ;Z),
where the sum is over all faces of X . The cellular 2-cochain hD,D′ is called a height
function associated to D,D′. Since H2(X ;Z) = H2(S
2;Z) = Z, the 2-chain σD,D′
is uniquely defined by D,D′ up to a constant, and the same holds for hD,D′ . Hence,
one can normalize all height functions by setting hD,D′(f0) = 0 for some fixed face
f0. This is illustrated in Figure 3.
Alternatively, hD,D′ can be defined as the only h ∈ C
2(X ;Z) such that h(f0) = 0
and h increases by 1 when a (D,D′)-composition cycle is crossed in the positive
direction (left to right as we cross). It follows that for any height function h and
any two 2-cells f1 and f2,
|h(f1)− h(f2)| ≤ d(f1, f2),
where d(f1, f2) is the distance between f1 and f2 in the dual graph, i.e. the minimal
number of edges crossed by a path connecting an point inside f1 with a point inside
f2. This can be regarded as a Lipschitz property of height functions. Note also
that for any three dimer configurations D, D′ and D′′ on Γ, the following cocycle
equality holds:
hD,D′ + hD′,D′′ = hD,D′′ .
DIMERS ON SURFACE GRAPHS AND SPIN STRUCTURES. II 17
Figure 3. An example of a bipartite planar graph with two dimer
configurations D (solid) and D′ (traced lines). The corresponding
height function hD,D′ (where f0 is the outer face) and (D,D
composition cycles are pictured on the right hand side.
The Lipschitz condition stated above leads to the following definition. Given a
fixed 2-cell f0 of the cellular decomposition X induced by Γ ⊂ S
2, set
H(X, f0) = {h ∈ C
2(X ;Z) |h(f0) = 0 and |h(f1)− h(f2)| ≤ d(f1, f2) ∀f1, f2}.
Given h ∈ H(X, f0), let C(h) denote the oriented closed curves formed by the
set of oriented edges e of Γ such that h increases its value by 1 when crossing e in
the positive direction. (In other words, C(h) = ∂σ, where σ ∈ C2(X ;Z) is dual to
h ∈ C2(X ;Z).) Obviously, there is a well-defined map
D(Γ)×D(Γ) → H(X, f0), (D,D
′) 7→ hD,D′
with C(hD,D′) = C(D,D
′). However, this map is neither injective nor surjective
in general. Indeed, the number of preimages of a given h is equal to the number of
dimer configurations on the graph obtained from Γ by removing the star of C(h).
Depending on Γ ⊂ S2, this number can be zero, or arbitrarily large.
To obtain a bijection, we proceed as follows. Fix a dimer configuration D0 on Γ.
Let C(D0) denote the set of all C ⊂ Γ consisting of disjoint oriented simple 1-cycles,
such that the following condition holds: for all e ∈ D0, either e is contained in C
or e is disjoint from C. Finally, set
HD0 (X, f0) = {h ∈ H(X, f0) |C(h) ∈ C(D0)}.
Proposition 5.1. Given any h ∈ HD0(X, f0), there is unique dimer configuration
D ∈ D(Γ) such that hD,D0 = h. Furthermore, given any two dimer configurations
D0, D1 on Γ, we have a canonical bijection
HD0(X, f0) → HD1(X, f0)
given by h 7→ h+ hD0,D1 .
Proof. One easily checks that the assignment D 7→ C(D,D0) defines a bijection
D(Γ) → C(D0). Furthermore, there is an obvious bijection HD0(X, f0) → C(D0)
given by h 7→ C(h). This induces a bijection D(Γ) → HD0(X, f0) and proves the
first part of the proposition. The second part follows from the first one via the
cocycle identity hD,D0 + hD0,D1 = hD,D1 . �
18 DAVID CIMASONI AND NICOLAI RESHETIKHIN
Let us now consider an edge weight system w on the bipartite planar graph Γ.
Recall that the Gibbs measure of D ∈ D(Γ) is given by
Prob(D) =
Z(Γ;w)
where w(D) =
e∈D w(D) and Z(Γ;w) =
D∈D(Γ)w(D). Let us now fix a dimer
configurationD0 and a face f0 ofX , and use the bijectionD(Γ) → HD0(X, f0) given
by D 7→ hD,D0 to translate this measure into a probability measure on HD0(X, f0).
To do so, we shall need the following notations: given an oriented edge e of Γ,
wβ(e) =
w(e) if the orientation on e agrees with the bipartite orientation;
w(e)−1 otherwise.
This defines a group homomorphism wβ : C1(X ;Z) → R>0. Finally, given any
f ∈ F (X), set
qf = wβ(∂f),
where ∂f is oriented as the boundary of the counter-clockwise oriented face f . This
number qf is called the volume weight of the face f .
Proposition 5.2. The Gibbs measure on D(Γ) given by the edge weight system w
translates into the following probability measure on HD0(X, f0):
ProbD0(h) =
ZD0,f0(X, q)
where
q(h) =
f∈F (X)
f and ZD0,f0(X ; q) =
h∈HD0(X,f0)
q(h).
Furthermore, this measure is independant of the choice of f0. Finally, the bijection
HD0(X, f0) → HD1(X, f0) given by h 7→ h+hD0,D1 is invariant with respect to the
measures ProbD0 and ProbD1 .
Proof. For any D ∈ D(Γ), we have
w(D)w(D0)
w(e)−1 = wβ(C(D,D0))
= wβ(∂σD,D0) = wβ
f∈F (X)
hD,D0(f)∂f
f∈F (X)
wβ(∂f)
hD,D0(f) =
f∈F (X)
hD,D0 (f)
f = q(hD,D0).
The proposition follows easily from this equality. �
Let V (Γ) (resp. E(Γ)) denote the set of vertices (resp. of edges) of Γ. Recall
that the group
G(Γ) = {s : V (Γ) → R>0}
acts on the set of weight systems on Γ by (sw)(e) = s(e+)w(e)s(e−), where e+ and
e− are the two vertices adjacent to the edge e. As observed in Section 1.1, the
Gibbs measure on D(Γ) is invariant under the action of the group G(Γ).
Note also that this action is free unless Γ is bipartite. In this later case, the
1-parameter family of elements sλ ∈ G(Γ) given by sλ(v) = λ if v is black and
DIMERS ON SURFACE GRAPHS AND SPIN STRUCTURES. II 19
sλ(v) = λ
−1 if v is white act as the identity on the set of weight systems. Hence, if
Γ is bipartite, the number of “essential” parameters is equal to |E(Γ)|− |V (Γ)|+1.
If this bipartite graph is planar, then
|E(Γ)| − |V (Γ)|+ 1 = |F (X)| − χ(S2) + 1 = |F (X)| − 1.
The |F (X)| volume weights qf are invariant with respect to the action of G(Γ). They
can be normalized in such a way that
f∈F (X) qf = 1, giving exactly |F (X)| − 1
parameters. Thus, in the height function formulation of the Gibbs measure, only
essential parameters appear.
5.3. Height functions for bipartite surface graphs. Let us now address the
general case of a bipartite surface graph Γ ⊂ Σ, possibly disconnected, and possibly
with boundary ∂Γ ⊂ ∂Σ. Fix a family γ = {γi}
i=1 of oriented simple curves in Γ
representing a basis in H1(Σ, ∂Σ;Z). Note that such a family of curves exists since
Γ is the 1-squeletton of a cellular decomposition X of Σ.
Given any D,D′ ∈ D(Γ, ∂Γ), the homology class of C(D,D′) = D −D′ can be
written in a unique way
[C(D,D′)] =
D,D′(i)[γi] ∈ H1(Σ, ∂Σ;Z),
with a
D,D′(i) ∈ Z. Hence, C(D,D
i=1 a
D,D′(i)γi is a 1-boundary (rel ∂X),
that is, there exists σ
D,D′ ∈ C2(X, ∂X ;Z) = C2(X ;Z) such that
(4) C(D,D′)− ∂σ
D,D′ −
D,D′(i)γi ∈ C1(∂X ;Z).
The 2-cochain h
D,D′ ∈ C
2(X ;Z) dual to σ
D,D′ is called a height function associated
to D,D′ with respect to γ. Since Z2(X, ∂X ;Z) = H2(X, ∂X ;Z) = H2(Σ, ∂Σ;Z) ∼=
H0(Σ;Z), the 2-chain σ
D,D′ is uniquely determined byD,D
′ and γ up to an element
of H0(Σ;Z), and the same holds for h
D,D′ . In other words, the set of height
functions associated toD,D′ with respect to γ is an affineH0(Σ;Z)-space: it admits
a freely transitive action of the abelian group H0(Σ;Z). One can normalize the
height functions by choosing some family F0 of faces of X , one for each connected
component of X , and by setting h
D,D′(f0) = 0 for all f0 ∈ F0.
Given h ∈ C2(X ;Z), set C(h) = ∂σ ∈ C1(X ;Z), where σ ∈ C2(X ;Z) is dual to
h ∈ C2(X ;Z). Given a fixed D0 ∈ D(Γ, ∂Γ), let C(D0) denote the set of all C ⊂ Γ
consisting of disjoint oriented 1-cycles (rel ∂Γ) such that the following condition
holds: for all e ∈ D0, either e is contained in C or e is disjoint from C.
Finally, let H
(X,F0) denote the set of pairs (h, a) ∈ C
2(X ;Z) × Zb1 which
satisfy the following properties:
– h(f0) = 0 for all f0 in F0;
– there exists C ∈ C(D0) such that C − C(h)−
i=1 a(i)γi ∈ C1(∂X ;Z).
We obtain the following generalization of Proposition 5.1. The proof is left to the
reader.
Proposition 5.3. Given any (h, a) ∈ H
(X,F0), there is a unique dimer config-
uration D ∈ D(Γ, ∂Γ) such that h
= h and a
= a. Furthermore, given any
20 DAVID CIMASONI AND NICOLAI RESHETIKHIN
two dimer configuration D0, D1 ∈ D(Γ, ∂Γ), there is a canonical bijection
(X,F0) → H
(X,F0)
given by (h, a) 7→ (h+ h
D0,D1
, a+ a
D0,D1
Recall that the boundary conditions on dimer configurations induce a partition
D(Γ, ∂Γ) =
D(Γ, ∂Γ | ∂D′0),
where D(Γ, ∂Γ | ∂D′0) = {D ∈ D(Γ, ∂Γ) | ∂D = ∂D
0}. This partition translates
into a partition of H
(X,F0) via the bijection D(Γ, ∂Γ) → H
(X,F0) given by
D 7→ (h
). Indeed, let F∂(X) denote the set of boundary faces of X , that
is, the set of faces of X that are adjacent to ∂Σ. The choice of a boundary condition
∂D′0 (together with F0) determines h
(f) for all D such that ∂D = ∂D′0 and
all f ∈ F∂(X). The actual possible values of h
on the boundary faces depend
on γ, D0 and F0; they can be determined explicitely. We shall denote by ∂h such
a value of a height function on boundary faces, and call it a boundary condition for
height functions. In short, we obtain a partition
(X,F0) =
(X,F0 | ∂h
indexed by all possible boundary conditions on height functions h
. Each bound-
ary condition on dimer configurations corresponds to one boundary condition on
height functions via D 7→ h
Let us now consider an edge weight system w on the bipartite graph Γ, and a
fixed boundary condition ∂D′0. Recall that the Gibbs measure for the dimer model
on (Γ, ∂Γ) with weight system w and boundary condition ∂D′0 is given by
Prob(D | ∂D′0) =
Z(Γ;w | ∂D′0)
where
Z(Γ;w | ∂D′0) =
D∈D(Γ,∂Γ | ∂D′0)
w(D).
Let us realize Γ as a surface graph Γ ⊂ Σ, fix a dimer configuration D0 ∈
D(Γ, ∂Γ), a family γ = {γi} of oriented simple curves in Γ representing a basis
in H1(Σ, ∂Σ;Z), and a collection F0 of faces of the induced cellular decomposition
X of Σ, one face for each connected component of X . We can use the bijection
D(Γ, ∂Γ | ∂D′0) → H
(X,F0 | ∂h
0) given by D 7→ (h
) to translate the
Gibbs measure into a probability measure on H
(X,F0 | ∂h
To do so, let us first extend the weight system w to all edges of X by setting
w(e) = 1 for all boundary edges of X . As in the planar case, define wβ : C1(X ;Z) →
R>0 as the group homomorphism such that, for any oriented edge e of X ,
wβ(e) =
w(e) if the orientation on e agrees with the bipartite orientation;
w(e)−1 otherwise.
DIMERS ON SURFACE GRAPHS AND SPIN STRUCTURES. II 21
Note that this makes sense even for boundary edges where there is no bipartite
orientation, as w(e) = 1 for such edges. Consider the parameters
qf = wβ(∂f) for all f ∈ F (X) \ F∂(X);
qi = wβ(γi) for all 1 ≤ i ≤ b1.
We obtain the following generalization of Proposition 5.2:
Proposition 5.4. Given an element (h, a) ∈ H
(X,F0), set
q(h, a) =
f∈F (X)\F∂(X)
1≤i≤b1
Then, the Gibbs measure for the dimer model on (Γ, ∂Γ) with weight system w
and boundary condition ∂D′0 translates into the following probability measure on
(X,F0 | ∂h
ProbD0(h, a | ∂h
q(h, a)
D0,F0
(X ; q | ∂h′0)
where
D0,F0
(X ; q | ∂h′0) =
(h,a)∈H
(X,F0 | ∂h
q(h, a).
Furthermore, the measure is independant of the choice of F0. Finally, the bijection
(X,F0 | ∂h
0) → H
(X,F0 | ∂h
1) given by (h, a) 7→ (h+ h
D0,D1
, a+ a
D0,D1
invariant with respect to the measures ProbD0 and ProbD1 .
Proof. For any D ∈ D(Γ, ∂Γ | ∂D′0), equation (4) leads to
C(D,D0)− ∂σ
i=1 a
(i)γi
Computing the first term, we get
wβ(C(D,D0)) =
w(e)−1 = w(D)w(D0)
As for the second one,
wβ(∂σ
) = wβ
f∈F (X)
hD,D0(f)∂f
f∈F (X)
hD,D0 (f)
Since wβ(γi) = qi, these equations lead to
w(D) = w(D0)
f∈F (X)
1≤i≤b1
i = λ · q(h
where λ = w(D0)
f∈F∂(X)
f depends only on D0 and ∂D
0. The proposi-
tion follows easily from this equality. �
Let us count the number of essential parameters in the dimer model on (Γ, ∂Γ)
with some boundary condition partitioning ∂Γ into (∂Γ)nm ⊔ (∂Γ)m, matched and
non-matched vertices. We have |E(Γ)|− |(∂Γ)nm| edge weights, with an action of a
(|V (Γ)|−|(∂Γ)nm|)-parameter group. Since Γ is bipartite, there is a b0(Γ)-parameter
22 DAVID CIMASONI AND NICOLAI RESHETIKHIN
subgroup acting as the identity. Therefore, the number of essential parameters is
equal to
|E(Γ)| − |V (Γ)|+ b0(Γ) = |E(X)| − |∂Γ| − |V (X)|+ b0(X)
= |F (X)| − |∂Γ| − χ(X) + b0(X)
= |F (X) \ F∂(X)|+ b1(Σ)− b2(Σ).
The numbers |F (X) \ F∂(X)| and b1(Σ) correspond to the parameters qf and qi.
Furthermore, the parameters qf can be normalized by
f qf = 1, the product being
on all faces of a given closed component of Σ. Therefore, we obtain exactly the
right number of parameters in this height function formulation of the dimer model.
Remark. Note that all the results of the first part of the present section can be
adapted to the general case of a non-necessarily bipartite surface graph: one simply
needs to work with Z2-coefficients. However, the height function formulation of the
dimer model using volume weights does require a bipartite structure. It is unknown
whether a reformulation of the dimer model with the right number of parameters
is possible in the general case.
5.4. The dimer quantum field theory on bipartite surface graphs. Let us
now use these results to reformulate the dimer quantum field theory on bipartite
graphs. Let Γ ⊂ Σ be a bipartite surface graph, and let X denote the induced
cellular decomposition of Σ. Fix a dimer configuration D0 ∈ D(Γ, ∂Γ), a family
γ = {γi} of oriented simple curves in Γ representing a basis in H1(Σ, ∂Σ;Z), and a
choice F0 of one face in each connected component of X .
(1) To ∂X , assign
Z(∂X) =
f∈F∂(X)
where W is the complex vector space with basis {αn}n∈Z, and F∂(X) de-
notes the set of faces of X adjacent to the boundary.
(2) To X with weight system q = {qf}f∈F (X) ∪ {qi}1≤i≤b1(Σ), assign
D0,F0
(X ; q) =
D0,F0
(X ; q | ∂h)α(∂h) ∈ Z(∂X),
where
D0,F0
(X ; q | ∂h) =
(h,a)∈H
(X,F0 | ∂h)
f∈F (X)\F∂(X)
1≤i≤b1(Σ)
and α(∂h) =
f∈F∂(X)
f αh(f).
Recall the notation a(∂D) ∈ Z(∂Γ) of Section 4.2. The bijection D(Γ, ∂Γ) →
(X,F0) induces an inclusion j : Z(∂Γ) →֒ Z(∂X) such that
j(a(∂D)) =
f∈F∂(X)
DIMERS ON SURFACE GRAPHS AND SPIN STRUCTURES. II 23
Therefore, using the proof of Proposition 5.4,
j(Z(Γ;w)) =
Z(Γ;w | ∂D) j(a(∂D))
D0,F0
(X ; q | ∂h)w(D0)
f∈F∂(X)
f∈F∂(X)
αh(f)
= w(D0)Z
D0,F0
(X ; q),
where the weight system q is obtained from w by qf = wβ(∂f) and qi = wβ(γi).
In this setting, the gluing axiom makes sense only when the data β, D0 and γ
are compatible with the gluing map ϕ. In such a case case, it holds by the equality
above and the results of Section 4.2.
The equivalence between the quantum field theories formulated in Section 4.3
and in the present section should be regarded as a discrete version of the boson-
fermion correspondence on compact Riemann surfaces (see [1]).
References
1. L. Álvarez-Gaumé, J.-B. Bost, G. Moore, P. Nelson and C. Vafa, Bosonization on higher genus
Riemann surfaces, Comm. Math. Phys. 112, (1987) 503–552.
2. M. Atiyah, Topological quantum field theories, Inst. Hautes Études Sci. Publ. Math. 68,
(1988) 175–186.
3. D. Cimasoni and N. Reshetikhin, Dimers on surface graphs and spin structures. I, to appear
in Comm. Math. Phys.
4. H. Cohn, R. Kenyon and J. Propp, A variational principle for domino tilings. J. Amer. Math.
Soc. 14, (2001) 297–346.
5. D. Johnson, Spin structures and quadratic forms on surfaces. J. London Math. Soc. (2) 22,
(1980) 365–373.
6. W. Kasteleyn, Dimer statistics and phase transitions. J. Mathematical Phys. 4, (1963) 287–
7. W. Kasteleyn, Graph Theory and Theoretical Physics (Academic Press, London 1967) pp.
43–110.
8. R. Kenyon and A. Okounkov, Planar dimers and Harnack curves. Duke Math. J. 131, (2006)
499–524.
9. R. Kenyon, A. Okounkov and S. Sheffield, Dimers and amoebae. Ann. of Math. (2) 163,
(2006) 1019–1056.
10. G. Kuperberg, An exploration of the permanent-determinant method. Electron. J. Combin.
5, (1998) Research Paper 46, 34 pp. (electronic).
11. A. Galluccio and M. Loebl, On the theory of Pfaffian orientations. I. Perfect matchings and
permanents. Electron. J. Combin. 6, (1999) Research Paper 6, 18 pp. (electronic).
12. B. McCoy and T.T. Wu, The two-dimensional Ising model (Harvard University Press, Cam-
bridge Massachusetts 1973).
13. G. Segal, The definition of conformal field theory, Differential geometrical methods in theo-
retical physics (Como, 1987), 165–171, NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci., 250,
Kluwer Acad. Publ., Dordrecht, 1988.
14. G. Tesler, Matchings in graphs on non-orientable surfaces, J. Combin. Theory Ser. B 78
(2000), no. 2, 198–231.
Department of Mathematics, UC Berkeley, 970 Evans Hall, Berkeley, CA 94720, USA
E-mail address: cimasoni@math.berkeley.edu
E-mail address: reshetik@math.berkeley.edu
	Introduction
	Acknowledgements
	1. The dimer model on graphs with boundary
	1.1. Dimers on graphs with boundary
	1.2. Dimers on surface graphs with boundary
	2. Kasteleyn orientations on surface graphs with boundary
	2.1. Kasteleyn orientations
	2.2. Discrete spin structures
	2.3. The Pfaffian formula for the partition function
	3. Cutting and gluing
	3.1. Cutting and gluing graphs with boundary
	3.2. Cutting and gluing surface graphs with boundary
	3.3. Cutting and gluing discrete spin structures
	3.4. Cutting Pfaffians
	4. Quantum field theory for dimers
	4.1. Quantum field theory on graphs
	4.2. Quantum field theory for dimers on graphs
	4.3. The dimer model as the theory of free Fermions
	5. Dimers on bipartite graphs and height functions
	5.1. Composition cycles on bipartite graphs
	5.2. Height functions for planar bipartite graphs
	5.3. Height functions for bipartite surface graphs
	5.4. The dimer quantum field theory on bipartite surface graphs
	References
ABSTRACT
  In a previous paper, we showed how certain orientations of the edges of a
graph G embedded in a closed oriented surface S can be understood as discrete
spin structures on S. We then used this correspondence to give a geometric
proof of the Pfaffian formula for the partition function of the dimer model on
G. In the present article, we generalize these results to the case of compact
oriented surfaces with boundary. We also show how the operations of cutting and
gluing act on discrete spin structures and how they change the partition
function. These operations allow to reformulate the dimer model as a quantum
field theory on surface graphs.

<|endoftext|><|startoftext|>
New version announcement for TaylUR, an
arbitrary-order diagonal automatic
differentiation package for Fortran 95
G.M. von Hippel 1
Department of Physics, University of Regina, Regina, Saskatchewan, S4S 0A2,
Canada
Abstract
We present a new version of TaylUR, a Fortran 95 module to automatically compute
the numerical values of a complex-valued function’s derivatives with respect to sev-
eral variables up to an arbitrary order in each variable, but excluding mixed deriva-
tives. The new version fixes a potentially serious bug in the code for exponential-
related functions that could corrupt the imaginary parts of derivatives, as well as
being compatible with a wider range of compilers.
Key words: automatic differentiation, higher derivatives, Fortran 95
PACS: 02.60.Jh, 02.30.Mv
1991 MSC: 41-04, 41A58, 65D25
NEW VERSION PROGRAM SUMMARY
Manuscript Title: New version announcement for TaylUR, an arbitrary-order diag-
onal automatic differentiation package for Fortran 95
Authors: G.M. von Hippel
Program Title: TaylUR
Journal Reference:
Catalogue identifier:
Licensing provisions: none
Programming language: Fortran 95
Computer: Any computer with a conforming Fortran 95 compiler
Operating system: Any system with a conforming Fortran 95 compiler
Keywords: automatic differentiation, higher derivatives, Fortran 95
PACS: 02.60.Jh, 02.30.Mv
Email address: vonhippg@uregina.ca (G.M. von Hippel).
URL: http://uregina.ca/~vonhippg/ (G.M. von Hippel).
1 Corresponding author
Preprint submitted to Elsevier Science 4 November 2018
http://arxiv.org/abs/0704.0274v1
Classification: 4.12 Other Numerical Methods, 4.14 Utility
Catalogue identifier of previous version: ADXR v1 0
Journal reference of previous version: Comput. Phys. Commun. 174 (2006) 569-576
Does the new version supersede the previous version?: yes
Nature of problem:
Problems that require potentially high orders of derivatives with respect to some
variables or derivatives of complex-valued functions, such as e.g. expansions of Feyn-
man diagrams in particle masses in perturbative Quantum Field Theory.
Solution method:
Arithmetic operators and Fortran intrinsics are overloaded to act correctly on ob-
jects of a defined type taylor, which encodes a function along with its first few
derivatives with respect to the user-defined independent variables. Derivatives of
products and composite functions are computed using Leibniz’s rule and Fàa di
Bruno’s formula.
Reasons for the new version:
The previous version [1] contained a potentially serious bug in the functions over-
loading the exponential-related intrinsics (EXP, LOG, SIN, COS, TAN, SINH, COSH,
TANH), which could corrupt the imaginary parts of derivatives. It also contained
some features which caused it to crash when compiled with certain compilers (no-
tably the NAG and Lahey/Fujitsu compilers).
Summary of revisions:
The bug in the exponential-related intrinsics has been corrected. A number of ad-
ditional changes have been made to the code to enable better compatibility with a
greater range of compilers, including the NAG and Lahey/Fujitsu compilers. Users
of some of these compilers may have to define useintrinsic as a preprocessor sym-
bol when compiling TaylUR.
Restrictions:
Memory and CPU time constraints may restrict the number of variables and Taylor
expansion order that can be achieved. Loss of numerical accuracy due to cancella-
tion may become an issue at very high orders.
Unusual features:
No mixed higher-order derivatives are computed. The complex conjugation opera-
tion assumes all independent variables to be real.
Running time:
The running time of TaylUR operations depends linearly on the number of vari-
ables. Its dependence on the Taylor expansion order varies from linear (for linear
operations) through quadratic (for multiplication) to exponential (for elementary
function calls).
References:
[1] G. M. von Hippel, TaylUR, an arbitrary-order diagonal automatic differentiation
package for Fortran 95, Comput. Phys. Commun. 174 (2006) 569-576.
ABSTRACT
  We present a new version of TaylUR, a Fortran 95 module to automatically
compute the numerical values of a complex-valued function's derivatives with
respect to several variables up to an arbitrary order in each variable, but
excluding mixed derivatives. The new version fixes a potentially serious bug in
the code for exponential-related functions that could corrupt the imaginary
parts of derivatives, as well as being compatible with a wider range of
compilers.

<|endoftext|><|startoftext|>
Mapping radii of metric spaces∗
George M. Bergman
November 4, 2018
Dedicated to the memory of David Gale
Abstract
It is known that every closed curve of length ≤ 4 in Rn (n > 0) can be surrounded by a sphere
of radius 1, and that this is the best bound. Letting S denote the circle of circumference 4, with the
arc-length metric, we here express this fact by saying that the mapping radius of S in Rn is 1.
Tools are developed for estimating the mapping radius of a metric space X in a metric space Y. In
particular, it is shown that for X a bounded metric space, the supremum of the mapping radii of X in
all convex subsets of normed metric spaces is equal to the infimum of the sup norms of all convex linear
combinations of the functions d(x,−) : X → R (x ∈ X).
Several explicit mapping radii are calculated, and open questions noted.
1 The definition, and three examples.
Definition 1. We will denote by Metr the category whose objects are metric spaces, and whose morphisms
are nonexpansive maps. That is, for metric spaces X and Y we let
(1) Metr(X,Y ) = {f : X → Y | (∀x0, x1 ∈ X) d(f(x0), f(x1)) ≤ d(x0, x1)}.
Throughout this note, a map of metric spaces will mean a morphism in Metr.
Given a nonempty subset A of a metric space Y, we define its radius by
(2) radY (A) = infy∈Y supa∈A d(a, y),
a nonnegative real number or +∞. For metric spaces X and Y, we define the mapping radius of X in Y
(3) map-rad(X,Y ) = supf∈Metr(X,Y ) radY (f(X))
= supf∈Metr(X,Y ) infy∈Y supx∈X d(f(x), y).
If X is a metric space and Y a class of metric spaces, we likewise define
(4) map-rad(X,Y) = supY ∈Y map-rad(X,Y )
= supY ∈Y, f∈Metr(X,Y ) infy∈Y supx∈X d(f(x), y).
(The term “mapping radius” occurs occasionally in complex analysis with an unrelated meaning [15, Def. 7.11].)
All vector spaces in this note will be over the field of real numbers unless the contrary is stated.
The result stated in the first sentence of the abstract has been discovered many times [5], [6], [18], [19],
[25]. (Usually, the length of the closed curve is given as 1 and the radius of the sphere as 1/4, but the
scaled-up version will be more convenient here.) Let us obtain it in somewhat greater generality.
∗2000 Mathematics Subject Classifications. Primary: 54E40. Secondary: 46B20, 46E15, 52A40.
Keywords: nonexpansive map between metric spaces, maximum radius of image, convex subset of a normed vector space.
Any updates, errata, related references etc., learned of after publication will be noted at
http://math.berkeley.edu/∼ gbergman/papers/ .
http://arxiv.org/abs/0704.0275v2
http://math.berkeley.edu/\protect 
Lemma 2. Let S denote the circle of circumference 4, with the arc-length metric. Then for any nonzero
normed vector space V, we have map-rad(S, V ) = 1.
Proof. In V, any 1-dimensional subspace U is isometric to the real line R, and we can map S into R, and
hence into U, by “folding it flat”, getting for image an interval of length 2. Since this interval has points at
distance 2 apart, its radius in V cannot be less than 1, so map-rad(S, V ) ≥ 1.
For the reverse inequality, consider any map f : S → V. We wish to find a point y ∈ V having distance
≤ 1 from every point of f(S). Let p and q be any two antipodal points of S, and let
y = (f(p) + f(q))/2.
Every point x ∈ S lies on a length-2 arc between p and q in S, hence d(p, x) + d(q, x) = 2, hence
d(f(p), f(x))+d(f(q), f(x)) ≤ 2, i.e., (d(f(p), f(x))+d(f(q), f(x)))/2 ≤ 1, so d((f(p)+f(q))/2, f(x)) ≤ 1,
as claimed.
Let us make explicit the argument used at the very last step above. It is the c1 = c2 = 1/2 case of
(5) If c1, . . . , cn are nonnegative real numbers summing to 1, and v1, . . . , vn are elements of a
normed vector space V, then for all w ∈ V, d(
ci vi, w) ≤
ci d(vi, w).
This can be seen by writing the left-hand side as ||(
civi) − w|| = ||
ci (vi − w)|| ≤
||ci (vi − w)|| =
ci d(vi, w).
Consider next the union X of two circles S0 and S1, each of circumference 4, intersecting in a pair of
points antipodal in each (e.g., take for S0 and S1 any two distinct great circles on a sphere of circumfer-
ence 4), again with the arc-length metric. We can show that this X also has mapping radius ≤ 1 in V by
the same argument as before, except that where we previously used an arbitrary pair of antipodal points,
we are now forced to use precisely the pair at which our circles intersect. We are not so restricted in the
example showing that radius 1 can actually be achieved – we can stretch one circle taut between any two
antipodal points, and for most choices of those points, we have a great deal of freedom as to what to do with
the other circle. In any case, we have
Lemma 3. Let X be the union of two circles S0 and S1, each of circumference 4, intersecting in a pair
of points antipodal in each, with the arc-length metric. Then for any nonzero normed vector space V, we
have map-rad(X,V ) = 1.
We could apply the same method to any number of circles joined at a common pair of antipodal points; but
let us move in a different direction. Again picturing S0 and S1 as great circles on a sphere of circumference 4
in Euclidean 3-space, assume they meet at right angles, and call their points of intersection the north and
south poles. Let us bring in a third circle, S2, the equator, and let X = S0 ∪ S1 ∪ S2, again with the arc
length metric.
We no longer have a pair of antipodal points belonging to all three circles; rather, we have three pairs of
points, S1 ∩ S2 = {p0, q0}, S2 ∩ S0 = {p1, q1}, and S0 ∩ S1 = {p2, q2}. Now given a normed vector space
V and a map f : X → V in Metr, suppose we let
(6) y = (f(p0) + f(q0) + f(p1) + f(q1) + f(p2) + f(q2))/6.
What can we conclude about d(y, f(x)) for x ∈ X ?
Say x ∈ S2. Since both {p0, q0}, and {p1, q1} are pairs of antipodal points of S2, we have d(p0, x) +
d(q0, x) = d(p1, x) + d(q1, x) = 2. The same will not be true of d(p2, x) and d(q2, x). To determine how
large these can get, let us take x ∈ S2 as far as possible (under our arclength metric) from the intersections
of S2 with our two circles through the poles p2 and q2. This happens when x is at the midpoint of any
of the quadrants into which p0, q0, p1 and q1 divide S2; in this situation, d(p2, x) = d(q2, x) = 3/2.
(Each quadrant has arc-length 1, and one has to go a quadrant and a half to get from p2 or q2 to x.) We
see, in fact, that for any x ∈ S2 we have d(p2, x) = d(q2, x) ≤ 3/2, hence d(p2, x) + d(q2, x) ≤ 3. Now
applying any map f : X → V, and invoking (5) with all ci = 1/6, we see that for y as in (6) we have
d(y, f(x)) ≤ (2 + 2 + 3)/6 = 7/6. We have proved this for x ∈ S2; by symmetry, it is also true for x lying
on S0 or S1. This allows us to conclude, not that map-rad(X,V ) = 1 as in the preceding two cases, but
(7) map-rad(X,V ) ≤ 7/6.
And in fact, there do exist maps f : X → V with radV (f(X)) > 1. To describe such a map, note that
X can be identified with the 1-skeleton of a regular octahedron of edge 1. In the next few paragraphs, let
us put aside our picture of X in terms of great circles on a sphere, and replace it with this (straight-edged)
octahedral skeleton.
If we look at our octahedron in Euclidean 3-space from a direction perpendicular to one of its faces, we see
that face and the opposite one as overlapping, oppositely oriented equilateral triangles, with vertices joined
by the remaining 6 edges, which look like a regular hexagon. Now suppose we regard these two opposite
triangular faces as made of stiff wire, and the other 6 edges as made of string. Then if we bring the planes of
the two wire triangles closer to one another, the string edges will loosen. Suppose, however, that we rotate
the top triangle clockwise as they approach one another, so that three of those strings are kept taut, while
the other three become still looser. When the planes of our wire triangles meet, those wire triangles will
coincide, and the three taut string edges will fall together with the three edges of that triangle, while the
three loose ones become loops, hanging from the three vertices. Let us lock the two wire triangles together,
and pull the three loops taut, radially away from the center of symmetry of the triangle.
What we then have is the image of a certain map f in Metr from our octahedral skeleton X into a
plane, which we can identify with R2. We see that radR2(f(X)) will be the distance from the center of
symmetry of our figure to each of the three points to which the drawn-out loops are stretched; i.e., the sum of
the distance from the center of symmetry to each vertex of the triangle, and the length of the stretched loop
attached thereto. The former distance is two thirds of the altitude of the triangle, (2/3)(
3/2) = 1/
and the latter length is 1/2 (since the loop doubles back), so
(8) radR2(f(X)) = 1/
3 + 1/2. Hence, map-rad(X,R2) ≥ 1/
3 + 1/2 > 1.
This shows that our three-circle space does indeed behave differently from the preceding one- and two-
circle examples.
However 1/
3 + 1/2 ≈ 1.0773, which falls well short of the upper bound 7/6 ≈ 1.1667 of (7).
We can overcome this deficiency by using a different norm on R2. Let V be R2 with the norm whose
unit disc is the region enclosed by a regular hexagon H of unit side. Note that the 6 sides of H are parallel
to the 6 radii joining 0 to the vertices of H, hence these sides have length 1 in the new metric, just as
in the Euclidean metric, and indeed, any line segment in one of those directions will have the same length
in both metrics. Now let us map X, still pictured as the 1-skeleton of a regular octahedron of side 1 in
Euclidean 3-space, into V so that, as before, two opposite triangles are embedded isometrically (now under
the metric of V ), and made to fall together with each other and with three of the other edges, while the
remaining three edges form loops that are stretched radially outward as far as they will go. Let us moreover
take the sides of our image-triangle to be parallel to three sides of H.
The map X → R2 that does this is almost the same one as before. The 9 edges that end up parallel to
edges of H are mapped exactly as before, since distances in those directions are the same in the two metrics.
The three folded loops end up set-theoretically smaller than before, since the new metric is greater in their
direction than is the Euclidean metric, and they go out a distance 1/2 in the new metric before turning back;
but they still contribute the value 1/2 to the calculation of the radius of our image of X. The significant
change in that calculation concerns the distance from the center of our triangle to its three vertices. Looking
at our triangle as a translate of one of the 6 equilateral triangles into which H is decomposed by its radii,
we see that the altitude of that triangle is equal to its side in this metric (since the midpoint of a side of H
has the same distance, 1, from the origin as a vertex of H does). Hence the distance from the center to a
vertex is 2/3. Adding to this the distance 1/2 from that vertex to the end of the loop attached to it, we
get 2/3 + 1/2 = 7/6. Assuming that the center of our triangle is indeed the minimizing point defining the
radius (i.e., is a value of y that yields the infimum (2); we will verify this in Lemma 5), this achieves the
upper bound (7). Summarizing, and making a few supplementary observations, we have
Lemma 4. Let X be the 1-skeleton of a regular octahedron of side 1, under the arc-length metric. Then
for any nonzero normed vector space V,
(9) 1 ≤ map-rad(X,V ) ≤ 7/6.
The exact value of map-rad(X,V ) is 1 if V is 1-dimensional, is ≥ 1/
3 + 1/2 if V is Rn (n ≥ 2)
under the Euclidean norm, and is 7/6 if V is R2 under the norm having for unit circle a regular hexagon.
Proof. The lower bound 1 in (9) is gotten as in the last full sentence before Lemma 3, by regarding X
as S0 ∪ S1 ∪ S2, straightening out one of these circles to cover a segment of length 2 in a 1-dimensional
subspace of V, and letting the other two circles collapse into that line in any way. (Or for a construction
that relies less on geometric intuition, pick any p ∈ S0, map X into R by the function d(p,−), note that
this map sends p and the point antipodal to p on S0 to 0 and 2, respectively, and embed R in V.) As
before, such an image of X has points 2 units apart, and so has radius ≥ 1 in V by the triangle inequality.
The upper bound 7/6 was obtained in (7).
To see that when V itself is 1-dimensional, the value 1 is not exceeded, note that the distance between
any two points of X is ≤ 2. Hence the image of X under any map into such a V is a segment of length
≤ 2, hence of radius ≤ 1.
The lower bounds 1/
3 + 1/2 and 7/6 for V = R2 with the two indicated norms were obtained above
by explicit mappings.
Let us now justify the assumption we made just before the statement of the above lemma, about the
center from which we computed the radius.
Lemma 5. Let V be a normed vector space, A a nonempty subset of V, and G a finite group of isometries
of V which preserve A. Then
(10) radV (A) = infy∈V G supa∈A d(a, y),
where V G is the fixed-point set of G.
In particular, if V G is a singleton {v0}, then
(11) radV (A) = supa∈A d(a, v0).
Proof. Given v ∈ V, let
(12) y = |G|−1
and note that this point lies in V G, and that for any a ∈ A,
(13) d(a, y) = d(a, |G|−1
gv) ≤
|G|−1d(a, gv) = |G|−1
d(g−1a, v)
≤ |G|−1
supb∈A d(b, v) = supb∈A d(b, v).
Here the first inequality holds by (5) and the second by considering b = g−1a. Hence for y ∈ V G defined
by (12), supa∈A d(a, y) ≤ supa∈A d(a, v), from which (10) follows. The final assertion is a special case.
For V = R2 with a regular hexagon as unit circle, the group G generated by a rotation by 2π/3 about
any point is an isometry of V, and if we take that point to be the center of symmetry of the set f(X) we
were looking at above, G preserves f(X) and has that center of symmetry as unique fixed point; so the
above lemma justifies our description of the radius of f(X) in terms of distance from that point. In the
earlier computation using the Euclidean metric on R2, we “saw” that the radius was measured from the
center of symmetry; this is now likewise justified by Lemma 5.
Lemma 4 leaves open
Question 6. For V = R2 under the Euclidean norm, and X the 1-skeleton of a regular octahedron of
side 1, where does map-rad(X,V ) lie within [1/
3 + 1/2, 7/6] ?
For V = Rn, again with the Euclidean norm, but n > 2, is the answer the same?
Having whetted our appetite with this example, let us prove some general results.
2 General properties of mapping radii.
Lemma 7. Let X, X ′, Y, Y ′ be nonempty metric spaces, Y and Y′ classes of such metric spaces, and
V and V ′ normed vector spaces.
(i) If there exists a surjective map h : X → X ′ (or more generally, a map X → X ′ with dense image) in
Metr, then map-rad(X ′, Y ) ≤ map-rad(X,Y ).
(ii) If Y ′ ⊆ Y, then for any nonempty subset A of Y ′ we have radY ′(A) ≥ radY (A). Here equality will
hold if Y ′ is a retract of Y ; i.e., if the inclusion of Y ′ in Y has a left inverse in Metr.
Hence if Y ′ is a retract of Y, then map-rad(X,Y ′) ≤ map-rad(X,Y ). In particular, this is true if Y
is a normed vector space (or more generally, a convex subset of such a space) and Y ′ the fixed subspace
(respectively, subset) of a finite group G of affine isometries of Y.
(iii) If Y′ ⊆ Y, then map-rad(X,Y′) ≤ map-rad(X,Y).
(iv) In contrast to (i) and (ii), for X ′ ⊆ X, either of the numbers map-rad(X ′, Y ) and map-rad(X,Y )
can be greater than the other, and if Y ′ ⊆ Y, or if Y ′ is a surjective image of Y in Metr, either of the
numbers map-rad(X,Y ′) and map-rad(X,Y ) can be greater than the other.
Proof. (i) Suppose h : X → X ′ has dense image. Then for any f : X ′ → Y, fh(X) is dense in f(X ′),
hence radY (fh(X)) = radY (f(X
′)), so the terms of the supremum defining map-rad(X,Y ) include all the
terms of the supremum defining map-rad(X ′, Y ), from which the asserted inequality follows.
(ii) The terms of the infimum defining radY (A) include the terms of the infimum defining radY ′(A),
giving the first inequality.
If there exists a retraction e of Y onto Y ′, then for every y ∈ Y and a ∈ A we have d(a, e(y)) ≤ d(a, y),
since e is nonexpansive and fixes points of A. Hence supa∈A d(a, e(y)) ≤ supa∈A d(a, y), and taking the
infimum of this over y ∈ Y, we get radY ′(A) ≤ radY (A). This and the previous inequality give the asserted
equality. Since Metr(X,Y ′) ⊆Metr(X,Y ), we also get map-rad(X,Y ′) ≤ map-rad(X,Y ), as claimed.
If Y is a convex subset of a normed vector space, and Y ′ the fixed set of a finite group G as in the final
assertion, note that the function e(v) = |G|−1
gv used in the proof of Lemma 5 is nonexpansive:
d(e(v), e(w)) = ||e(v)− e(w)|| = ||e(v − w)|| ≤ ||v − w|| = d(v, w),
and is a retraction of Y onto Y G.
(iii) This is again a case of suprema of a smaller and a larger set of real numbers.
(iv) The assertion for X ′ ⊆ X can be seen from the following mapping radii, where subsets of R are
given the induced metric:
map-rad({0}, {0, 2}) = 0, map-rad({0, 2}, {0, 2}) = 2, map-rad({0, 1, 2}, {0, 2}) = 0.
The assertion for Y ′ ⊆ Y is shown by the observations
map-rad({0, 2}, {0}) = 0, map-rad({0, 2}, {0, 2}) = 2, map-rad({0, 2}, {0, 1, 2}) = 1.
Finally, to get the case where Y ′ is a surjective image of Y, note that we have surjections {0, 3} → {0, 2} →
{0, 1} in Metr, and that
map-rad({0, 2}, {0, 3}) = 0, map-rad({0, 2}, {0, 2}) = 2, map-rad({0, 2}, {0, 1}) = 1.
(With a bit more work, one can construct sets X, Y0, Y1, Y2 ⊆ R such that each Yi+1 is both a subset and
a surjective image of Yi, and such that map-rad(X,Y0) < map-rad(X,Y1) > map-rad(X,Y2).)
To state consequences of the above results, let us fix some notation.
Definition 8. For n ≥ 0, n-dimensional Euclidean space, i.e., Rn with the Euclidean norm, will be denoted
n. The class of all Euclidean spaces, {En | n ≥ 0}, will be denoted Euc.
The class of all normed vector spaces, regarded as metric spaces, will be denoted NmV. The class of all
convex subsets of normed vector spaces, regarded as metric spaces, will be denoted Conv.
The diameter of a metric space X will be defined by diam(X) = supx,y∈X d(x, y).
Corollary 9. If X is a nonempty metric space, then
(14) map-rad(X,E1) ≤ map-rad(X,E2) ≤ . . . ≤ map-rad(X,En) ≤ . . . ,
with supremum map-rad(X,Euc). Further,
(15) diam(X)/2 = map-rad(X, E1) ≤ map-rad(X, Euc) ≤ map-rad(X, NmV)
≤ map-rad(X, Conv) ≤ map-rad(X, Metr) = radX(X) ≤ diam(X).
Proof. Since En is the fixed subspace of a reflection of En+1, the final assertion of Lemma 7(ii) gives (14).
(We could have put “0 = map-rad(X, E0) ≤” at the left end of (14); but this would complicate some
references we will want to make to (14) later.) By definition, map-rad(X,Euc) is the supremum of these
values.
To see the initial equality of (15), note on the one hand that under any nonexpansive map f : X → E1,
the images of any two points of X are ≤ diam(X) apart, hence f(X) must lie in an interval of length
≤ diam(X), and any interval in E1 has radius half its length, so map-rad(X, E1) ≤ diam(X)/2. On the
other hand, for x, y ∈ X, the function d(x,−) : X → E1 is nonexpansive, and the images of x and y under
this map are d(x, y) apart, whence the radius of f(X) is at least half this value. Taking the supremum
over all x and y, we get map-rad(X, E1) ≥ diam(X)/2.
The next four steps, inequalities among mapping radii, are instances of Lemma 7(iii). In the equality
following these, the direction “≤” simply says that nonexpansive maps are radius-nonincreasing, while “≥”
holds because one of the maps in the supremum defining map-rad(X, Metr) is the identity map of X. The
final inequality is immediate.
We note in passing some cases where these mapping radii are easy to evaluate.
Corollary 10. If a metric space X satisfies radX(X) = diam(X)/2, then all terms of (15) through
radX(X) are equal (and hence also equal to all terms of (14)).
In particular, this is true whenever (i) X is a finite tree, with edges of arbitrary positive lengths, under
the arc-length metric, or (ii) X has an isometry ρ with a fixed point 0 such that for every x ∈ X,
d(x, ρ(x)) = 2 d(x, 0).
Proof. The first sentence is clear from (15). To get the two classes of examples, it suffices to show in each
case that radX(X) ≤ diam(X)/2, since (15) gives the reverse inequality.
In case (i), X is compact, so we may choose x, y ∈ X with d(x, y) = diam(X). The unique non-self-
intersecting path between x and y is isometric to a closed interval, and so has a midpoint p, satisfying
d(x, p) = d(y, p) = diam(X)/2; it now suffices to show that d(z, p) ≤ diam(X)/2 for all z ∈ X. Consider
the unique non-self-intersecting path from p to z. Because X is a tree, that path out of p cannot have
nontrivial intersection with both the path from p to x and the path from p to y; assume it meets the
latter only in p. Then the unique non-self-intersecting path from z to y is the union of the path from z to
p and the path from p to y, and we know that it has length ≤ diam(X), so subtracting off diam(X)/2,
the length of the path from p to y, we conclude that the length of the path from z to p is ≤ diam(X)/2,
as required.
In case (ii), we have radX(X) ≤ supx∈X d(x, 0) = supx∈X d(x, ρ(x))/2 ≤ diam(X)/2.
Examples falling under case (ii) above include all centrally symmetric subsets of normed vector spaces
containing 0, under the induced metric, and a hemisphere under the geodesic metric.
A less trivial result, now. Recall that in proving the upper bounds on the mapping radii of Lemmas 2,
3 and 4, we in effect chose formal weighted combinations of points of X, and used these to specify convex
linear combinations of points of f(X) ⊆ V. We abstract this technique below. In the statement of the
theorem, as a convenient way to express formal weighted combinations of points of X, we use probability
measures on X with finite support. (Recall that a probability measure on X is a nonnegative-valued
measure µ such that µ(X) = 1, and that µ is said to have support in a set X0 if it is zero on every subset
of X −X0. Apologies for the double use of “d” below, for the distance function of the metric space and the
“d” of integration.)
Theorem 11. Let X be a nonempty metric space. Then
(16) map-rad(X,Conv) = infµ supx∈X
d(x, z) dµ(z),
where the infimum is over all probability measures µ on X with finite support.
Proof. We first prove “≤”, imitating the argument of Lemmas 2, 3 and 4. We must show, for any nonexpan-
sive map f : X → C, where C is a convex subset of a normed vector-space V, and any probability measure
µ on X with finite support, that
(17) radC(f(X)) ≤ supx∈X
d(x, z) dµ(z).
For any point x of X, let µx denote the probability measure on X with singleton support {x}.
Since the µ of (17) is a probability measure with finite support, it has the form c1µx1 + · · · + cnµxn ,
where x1, . . . , xn are points of X, and c1, . . . , cn are nonnegative real numbers summing to 1. The point
ci f(xi) lies in C, so by definition of the radius, the left-hand side of (17) is ≤ supx∈X d(y, f(x)) =
supx∈X d(
ci f(xi), f(x)), which by (5) is ≤ supx∈X
ci d(f(xi), f(x)), which, because f is nonex-
pansive, is ≤ supx∈X
ci d(x, xi). The sum in this expression is the integral in (17), giving the desired
inequality.
In proving the direction “≥” in (16), we may assume the metric space X is bounded, since otherwise
it has infinite diameter, in which case (15) tells us that the left hand side of (16) is infinite. Assuming
boundedness, we shall display a particular embedding e of X in a convex subset C of a normed vector
space U, such that radC(e(X)) is greater than or equal to the right-hand side of (16).
Let U be the space of all continuous bounded real-valued functions on X, under the sup norm, let
e : X → U take each x ∈ X to the function d(x,−) (this e is easily seen to be nonexpansive) and let C
be the convex hull of e(X). Now for x ∈ X, its image e(x) = d(x,−) can be written y 7→
d(y, z) dµx(z).
Hence an arbitrary u ∈ C, i.e., a convex linear combination of these functions, will have the same form, but
with µx replaced by a convex linear combination µ of the measures µx, i.e., a general probability measure
µ on X with finite support. For such a function u, and any x ∈ X, the distance d(e(x), u) in C is the
sup norm of u− e(x), which is at least the value of u− e(x) at x ∈ X, which is u(x)− 0 =
d(x, z) dµ(z).
The radius of e(X) in C is thus at least the infimum over all µ of the supremum over all x of this integral,
which is the right-hand side of (16).
Recall that when we obtained our bound (7) on the mapping radius of the 1-skeleton of an octahedron,
analogy and good luck led us to the formal linear combination of points of X used in (6) (in effect, a
probability measure µ), which turned out to give the optimal bound. In general we ask
Question 12. Let X be a finite graph with edges of possibly unequal lengths, under the arc-length metric.
Must there be a probability measure µ on X with finite support that realizes the infimum of (16)?
Is there an algorithm for finding such a µ if it exists, or if not, for evaluating (16)?
We cannot expect in general that a measure of the desired sort will have support in the set of vertices of
the graph X, as happened in Lemma 4. E.g., if X is isometric to a circle with arc-length metric, one can
show that a measure µ realizes the infimum of (16) if and only if it gives equal weight to p and q whenever
p and q are antipodal points; so if X is, say, an equilateral polygon with an odd number of vertices, µ
cannot be concentrated in the vertices.
A class of examples generalizing our octahedral skeleton, which it would be of interest to examine, are
the 1-skeleta of cross polytopes [7].
A situation simpler than that of Question 12 is that of a finite metric space X. Here the determination
of the right-hand side of (16) is a problem in linear programming; whether it has an elegant solution I don’t
know. The determination of map-rad(X,En) for such a space X is, similarly, in principle, a problem in
calculus.
In Corollary 10, we saw that the mapping radius is easy to compute for a space that has “a robust
center”. Using the preceding theorem, let us show the same for a space with a pair of “robust antipodes”.
Corollary 13. Suppose the metric space X has a pair of points p and q such that
(18) (∀ r ∈ X) d(p, r) + d(r, q) = d(p, q).
Then letting D = d(p, q), we have diam(X) = D, and map-rad(X,Conv) = D/2. Thus, the terms of (15)
through map-rad(X,Conv) are all equal to D/2.
In particular, this is true if X is the 1-skeleton of a regular tetrahedron or of a parallelopiped (in
particular, of a cube), with the arc-length metric, or is the 0-skeleton of any of the regular polyhedra other
than the tetrahedron, with metric induced by the arc-length metric on the 1-skeleton of that polyhedron.
The property (18) is, of course, inherited by any subspace of X containing p and q.
Proof. For any two points r, r′ ∈ X, we have
2 d(r, r′) ≤ (d(r, p)+d(p, r′))+(d(r, q)+d(q, r′)) = (d(p, r)+d(r, q))+(d(p, r′)+d(r′, q)) = 2D,
so d(r, r′) ≤ D, whence diam(X) = D. Now let µ be the probability measure giving weight 1/2 to each
of p and q. For this µ, the integral on the right-hand side of (16) has value D/2 for all x, hence the
supremum of that integral over x is D/2, hence (16) shows that map-rad(X,Conv) ≤ D/2. Comparing
with the first term of (15), we see that all the the terms of (15) through map-rad(X,Conv) (though not,
as before, through map-rad(X,Metr)) are equal.
For X the 1-skeleton of a regular tetrahedron, we get (18) on taking for p and q the midpoints of
two opposite edges. For X the 1-skeleton of a parallelopiped, we can use any two antipodal points (not
necessarily vertices. In picturing this case, it may help to note that X is isometric to the 1-skeleton of a
rectangular parallelopiped.) In the 0-skeleton cases, we use any pair of opposite vertices. In each case, the
verification of (18) is not hard.
The final sentence is clear.
So, for instance, for the 1-skeleta of the tetrahedron and cube of edge 1, the 6-tuples of terms of (15) (not
distinguishing terms shown connected by equals-signs) are (1, 1, 1, 1, 3/2, 2) and (3/2, 3/2, 3/2, 3/2, 3, 3)
respectively. (The reason the last two numbers are equal for the cube, but distinct for the tetrahedron, is
that for the cube, the function x 7→ supy d(x, y) is 3 for all x, while for the tetrahedron, it ranges from
a maximum value 2 at the midpoints of the edges to a minimum value 3/2 at the vertices. In neither of
these cases is the maximum twice the minimum, so neither of them falls under Corollary 10.)
Let us note a curious feature of the construction used in Theorem 11: it has what at first looks like a
universal property (part (i) of the next result) but turns out not to be (part (ii)).
Corollary 14 (to proof of Theorem 11). Let X be a bounded metric space, let U be the space of continuous
bounded real-valued functions on X under the sup norm (cf. second half of the proof of Theorem 11), and
let e : X → U be the map taking each x ∈ X to the function d(x,−).
Now let f : X → V be any map (in Metr) from X into a normed vector space V. Then
(i) For every family of points x1, . . . , xn ∈ X, every family c1, . . . , cn of nonnegative real numbers summing
to 1, and every x ∈ X, one has
(19) d(f(x),
ci f(xi)) ≤ d(e(x),
ci e(xi)).
However,
(ii) Given points x1, . . . , xn ∈ X, and two families of nonnegative real numbers b1, . . . , bn and c1, . . . , cn,
each summing to 1, it is not necessarily true that
(20) d(
bi f(xi),
ci f(xi)) ≤ d(
bi e(xi),
ci e(xi)).
Thus, the convex hull of e(X) need not admit a map (in Metr) to the convex hull of f(X) making a
commuting triangle with e and f.
Proof. (i) may be seen by combining the calculations of the last sentence of the proof of the “≤” direction
of Theorem 11, which shows that d(f(x),
ci f(xi)) ≤
ci d(x, xi), and the end of the proof of the “≥”
direction, which, by evaluating e(xi) and e(x) as elements of the function-space U at the point x, shows
that d(e(x),
ci e(xi)) ≥
ci d(x, xi).
To get (ii), let X again be a circle of circumference 4 with arc-length metric, and let x0, x1, x2, x3 ∈ X
be four points equally spaced around it. Note that for any u ∈ X, we have d(x0, u) + d(x2, u) = 2 =
d(x1, u)+d(x3, u). Hence if we choose the bi and ci so that the right-hand-side of (20) is d((e(x0)+e(x2))/2,
(e(x1)+e(x3))/2), we see that this value is 0. On the other hand, if we map X into E
1 by f(x) =
1 − max(d(x0, x), 1), then of the f(xi), only f(x0) is nonzero, so the left-hand side is not 0, so (20)
fails.
There are, in fact, a different normed vector space U and mapping e : X → U for which the universal
property of (20) does hold [23, Theorem 2.2.4]; we examine this construction in an appendix, §6.
3 Some explicit mapping radii.
A classical result of H. E.W. Jung is, in effect, an evaluation of the mapping radius in En of a very simple
metric space.
Theorem 15 (after Jung [17]). Let D∞ denote an infinite metric space in which the distances between
distinct points are all 1. (The cardinality does not matter as long as it is infinite.) Then the values of
map-rad(D∞, E
n) for n = 0, 1, 2, . . . are, respectively,
(21) 0 < 1/2 < 1/
3/8 < . . . <
n/(2(n+ 1)) < . . . .
Hence, map-rad(D∞, Euc) = 1/
Likewise, for any positive integer m, if we let Dm be an m-element metric space with all pairwise
distances 1, then for every n ≥ 0,
(22) map-rad(Dm, E
r/(2(r + 1)) , where r = min(m−1, n).
Hence, map-rad(Dm, Euc) =
(m− 1)/(2m) .
Summary of proof. The main result of [17] is that every subset of En of diameter ≤ 1 has radius ≤
n/(2(n+ 1)) . This gives map-rad(D∞, E
n/(2(n+ 1)) . On the other hand, the n+ 1 vertices of
the n-simplex of edge 1 in En form a subset of radius exactly
n/(2(n+ 1)) , and clearly D∞ can be
mapped onto that set, establishing equality. Taking the limit of this increasing sequence as n → ∞, one
gets map-rad(D∞, Euc) = 1/
Clearly, the hypothesis m > n works as well as m =∞ in concluding as above that map-rad(Dm, En) =
n/(2(n+ 1)) . For m ≤ n, on the other hand, any image of Dm in En lies in an affine subspace that can be
identified with Em−1, so in that case we get map-rad(Dm, E
n) = map-rad(Dm, E
m−1) =
(m− 1)/(2m) .
Combining these results, we get (22) and the final conclusion.
The inequalities (21) show that each step of (14) can be strict. What about the steps of (15)? If we
identify terms connected by equal-signs, then (15) lists six possibly distinct values, connected by five ≤-signs.
Three of these ≤-signs are shown strict by the 3-point metric space D3 of the above theorem, for which,
I claim, the 6-tuple of values is (1/2, 1/
3 , 2/3, 2/3, 1, 1). The first of these values, and the last two,
are clear, and the second comes from the above theorem (line after (22)). To evaluate the remaining two
values, map-rad(D3, NmV) and map-rad(D3,Conv), consider the embedding e : D3 → U as in the last
paragraph of the proof of Theorem 11. The space U used there can in this case be described as R3 under
the sup norm; let C be the convex hull in U of
(23) e(D3) = {(0, 1, 1), (1, 0, 1), (1, 1, 0)}.
Then Lemma 7(ii) (in particular, the final sentence) tells us that radC(e(D3)) is the common distance of
the three points of e(D3) from the unique point of C invariant under cyclic permutation of the coordinates,
namely (2/3, 2/3, 2/3). This common distance is 2/3 (since each member of e(D3) has a zero coordinate),
so radC(e(D3)) = 2/3, and by Theorem 11, this is map-rad(D3,Conv). Since map-rad(D3, NmV) ≤
map-rad(D3, Conv), to show that map-rad(D3, NmV) is also 2/3 it will suffice to obtain a nonexpansive
map f of D3 into a vector space V such that radV (f(D3)) = 2/3. This may be done by using the same
mapping as above, but translated by (−2/3,−2/3,−2/3), so that the affine span of its image becomes a
vector subspace of R3, which, with its induced norm, we take as our V. The preceding argument now gives
radV (f(D3)) = 2/3.
For a space showing strict inequality at the final step of (15), radX(X) ≤ diam(X), one can use any
nontrivial instance of Corollary 10; for instance, the unit interval [0, 1], for which that corollary shows that
the 6-tuple in question is (1/2, 1/2, 1/2, 1/2, 1/2, 1).
This leaves the step
(24) map-rad(X,NmV) ≤ map-rad(X,Conv).
I thought at first that equality had to hold here: that for a C a convex subset of a normed vector space V
and any A ⊆ C (in particular, the image of any map of a metric space into C), one had radV (A) = radC(A).
However, this is not so: consider the untranslated case (23) of the above D3 example, and note that the point
(1/2, 1/2, 1/2) ∈ U has distance 1/2 from each point of (23); so radU (e(D3)) ≤ 1/2 < 2/3 = radC(e(D3)).
Nonetheless we have seen that for X = D3, equality holds in (24). Here, however, is an example (which
it took attempts spread over many months to find) for which that inequality is strict.
Consider the graph with 7 vertices, x, y0, y1, y2, z0, z1, z2, and 9 edges: a length-1 edge from x to each
of the yi, and a length-2 edge from yi to zj whenever i 6= j; and let X be the vertex-set of this graph,
with arc-length metric. Thus, for all i 6= j we have
(25) d(x, yi) = 1, d(x, zi) = 3, d(yi, yj) = 2,
d(yi, zj) = 2, d(yi, zi) = 4, d(zi, zj) = 4.
Let us first find map-rad(X,Conv), using Theorem 11. We must maximize the infimum (16) over the
convex linear combinations of µx, . . . , µz2 . By Lemma 5, it suffices to maximize that expression over points
invariant under permutations of the subscripts; i.e., over convex linear combinations of
(26) µx, µy = (µy1 + µy2 + µy3)/3, µz = (µz1 + µz2 + µz3)/3.
We find that
(27) µx(x) = 0, µx(yi) = 1, µx(zi) = 3,
µy(x) = 1, µy(yi) = 4/3, µy(zi) = 8/3,
µz(x) = 3, µz(yi) = 8/3, µz(zi) = 8/3.
Any convex linear combination of these three functions has value ≥ 8/3 at each zi; so every value of the
supremum in (16) is at least 8/3. Moreover, taking µ = µy (or more generally, µ = (1− t)µy + tµz for any
t ∈ [0, 5/6]), we see that this value 8/3 is attained; so
(28) map-rad(X,Conv) = 8/3.
The idea of our verification that map-rad(X,NmV) is strictly smaller than (28) will be to use the non-
convex affine combination (3µy − µx)/2 of the functions (27), so as to reduce somewhat the highest values
of µy, those at the zi, without bringing the values at other points up by too much. But since we don’t have
the analog of Theorem 11 for non-convex combinations (and indeed, that analog is not true in general – if it
were, then 2µy − µx would lead to a still better result, but it does not), we must calculate by hand rather
than calling on such a theorem. So suppose f is a nonexpansive map of X into a normed vector space V,
and let
(29) p = (f(y0) + f(y1) + f(y2)− f(x))/2.
We need to bound the distances between p and the points of f(X). In view of the symmetry of (29), it will
suffice to bound the distances to f(x), f(y0) and f(z0). We calculate
(30) d(p, f(x)) = || (f(y0) + f(y1) + f(y2)− f(x)− 2f(x))/2 ||
≤ (||f(y0)− f(x)||+ ||f(y1)− f(x)|| + ||f(y2)− f(x)||)/2 ≤ (1 + 1 + 1)/2 = 3/2.
d(p, f(y0)) = || (f(y0) + f(y1) + f(y2)− f(x)− 2f(y0))/2 ||
≤ (||f(y1)− f(y0)||+ ||f(y2)− f(x)||)/2 ≤ (2 + 1)/2 = 3/2.
d(p, f(z0)) = || (f(y0) + f(y1) + f(y2)− f(x)− 2f(z0))/2 ||
≤ (||f(y0)− f(x)||+ ||f(y1)− f(z0)||+ ||f(y2)− f(z0)||)/2 ≤ (1 + 2 + 2)/2 = 5/2.
Taking the maximum of these values, we get
(31) map-rad(X,NmV) ≤ 5/2 < 8/3 = map-rad(X,Conv),
a strict inequality, as claimed.
The above observations suggest the question: Which normed vector spaces V have the property that
the radius of every subset X of V is the same whether evaluated in V, or in an arbitrary convex subset of
V containing X ? This is examined in an appendix, §7.
The example of (23) showed that the radius of a subset of a normed vector space could change when one
passed to a larger normed vector space. Let us note a curious consequence.
Lemma 16. Let U be R3 under the sup norm, and U0 ⊆ U be {(a, b, c) ∈ U | a + b + c = 0}. Then
there is no isometric reflection U → U having U0 as its fixed subspace. In fact, no finite group of affine
isometries of any normed vector space W containing U has U0 as its fixed subspace.
Proof. Let W be any normed vector space containing U, and let f : D3 → U0 be given by f(x) =
e(x) − (2/3, 2/3, 2/3), for e as in the paragraph containing (23). The first sentence of Lemma 7(ii) gives
radW (f(D3)) ≤ radU (f(D3)), which we saw is < radU0(f(D3)). On the other hand, if W had a finite group
G of affine isometries with fixed subspace U0, then Lemma 5 would give radW (f(D3)) = radU0(f(D3)).
Returning to (14) and (15), let us for simplicity reduce the number of independent values by “normalizing”
to the case diam(X) = 2, and ask for more detailed information than those inequalities.
Question 17. Let X run over all metric spaces of diameter 2. What can one say about the geometry of
the resulting sets of sequences
(32) {(map-rad(X,E1), map-rad(X,E2), . . . , map-rad(X,En), . . . )} ⊆ RN,
(33) {(map-rad(X, Euc), map-rad(X, NmV), map-rad(X, Conv), map-rad(X, Metr))} ⊆ R4 ?
Can one describe them exactly? Are they convex; or do they become convex on replacing the entries by
their logarithms, or under some other natural change of coordinates?
If two successive terms of a member of (32) are equal, is the sequence constant from that point on?
Another family of questions, suggested by Theorem 15, is
Question 18. For n ≥ 2, what can one say about the set of nonnegative real numbers that can be written
map-rad(X,En) for finite metric spaces X in which all distances are integers?
Are all such real numbers “constructible”, i.e., obtainable from rational numbers by a finite sequence of
square roots and ring operations?
Is this set well-ordered for each n ? (It has a smallest element 0, and a next-to-smallest element 1/2.)
Does this set change if “finite metric spaces X . . . ” is weakened to “bounded metric spaces X . . . ”?
For m < n, can one assert any inclusion between the sets of mapping radii into Em, and into En ? Are
there values that occur as map-rad(X,Euc) for some X, but not as map-rad(X ′,En) for any X ′ and n ?
(E.g., can 1/
2 be written in the latter form?)
We end this section with an observation made in [12] for the spaces En, which in fact holds for closed
convex subsets of arbitrary finite-dimensional normed spaces.
Lemma 19 (cf. [12, Proposition 29, p.14, and second paragraph of p.46]). If C is a closed convex subset of
a normed vector space V of finite dimension n, and A a subset of C with > n elements, then radC(A) =
supA0 radC(A0), where A0 runs over the n+1-element subsets of A.
Proof. “≥” is clear; so it suffices to show that if for some real number r, each A0 is contained in a closed
ball of radius r centered at a point of C, then so is A. Now for each a ∈ A, the set of v ∈ C such that
a lies in the closed ball in C of radius r about v is the closed ball in C of radius r about a, hence a
compact convex subset of V. To say that a set A0 is contained in some closed ball of radius r centered
at a point of C is to say that the intersection of these sets, as a runs over A0, is nonempty. By Helly’s
Theorem ([14], [8]), if a family of compact convex subsets of Rn has the property that every system of n+1
members of this family has nonempty intersection, then so does the whole family; which in this case means
that all of A is contained in a ball of the indicated sort.
The above lemma does not imply the corresponding statement for mapping radii. For example, let
X = {x, y0, y1, y2}, where x has distance 1/2 from each of the yi, and these have distance 1 from each
other. The maximum of the mapping radii in E2 of 3-element subsets of X is map-rad({y0, y1, y2},E2) =
map-rad(D3,E
2) = 1/
3. But map-rad(X,E2) ≤ radX(X) = 1/2.
On the other hand, for this example, map-rad(X,E2) can be described as the infimum over p ∈ X of
the supremum of map-rad(X0,E
2) over all 3-element subsets X0 of X containing p. So we ask
Question 20. Does there exist, for every positive integer n, a positive integer N and a formula which
for every metric space X of ≥ N elements, and every normed vector space V of dimension n, expresses
map-rad(X,V ), using the operations of suprema and infima, in terms of the numbers map-rad(X0, V ), for
N -element subsets X0 ⊆ X ?
4 Realizability of mapping radii.
For a subset A of a metric space Y, let us say that radY (A) is realized if the infimum in the definition (2)
of that expression is attained, that is, if there exists y ∈ Y such that A is contained in the closed ball of
radius radY (A) about y.
Likewise, for metric spaces X and Y, let us say that map-rad(X,Y ) is realized if the supremum in the
definition of that expression is attained; that is, if there exists an f : X → Y such that radY (f(X)) =
map-rad(X,Y ). (This does not presume that radY (f(X)) is realized.)
Lemma 21. Let X and Y be nonempty metric spaces.
(i) If Y is compact, then for any subset A ⊆ Y, radY (A) is realized.
(ii) If X and Y are both compact, then map-rad(X,Y ) is realized.
However
(iii) For X compact and Y bounded and complete, or for X bounded and complete and Y compact,
map-rad(X,Y ) may fail to be realized.
Proof. (i) follows from the fact that for bounded A, supa∈A d(a, y) is a continuous function of y, hence
assumes a minimum on Y.
To get (ii), we note that Metr(X,Y ) is a closed subset of the function space Y X , which is compact
because Y is, so Metr(X,Y ) is compact in the function topology. We would like to say that the real-valued
map on this space given by f 7→ radY (f(X)) is continuous, and hence assumes a maximum. For general
X, this continuity does not hold, as will follow from the second statement of (iii); but I claim that it holds if
X is compact. For given f ∈Metr(X,Y ) and ε > 0, compactness allows us to cover X by finitely many
open balls of radius ε/3, say centered at x1, . . . , xn. Consider the neighborhood of f in Metr(X,Y ) given
U = {g ∈Metr(X,Y ) | d(f(xi), g(xi)) < ε/3 (i = 1, . . . , n)}.
Taking any x ∈ X and y ∈ Y, note that there exists i such that d(xi, x) < ε/3; hence for g ∈ U,
|d(f(x), y) − d(g(x), y)| ≤ d(f(x), g(x)) ≤ d(f(x), f(xi)) + d(f(xi), g(xi)) + d(g(xi), g(x))
≤ ε/3 + ε/3 + ε/3 = ε.
Thus, the two functions associating to every y ∈ Y the numbers supx∈X d(f(x), y) and supx∈X d(g(x), y)
differ everywhere by ≤ ε, whence the infima of these functions, radY (f(X)) and radY (g(X)) differ by ≤ ε,
giving continuity of f 7→ radY (f(X)), which, as noted above, yields (ii).
(iii) For an example with X but not Y compact, let X = D2, i.e., a space consisting of two points
at distance 1 apart, and let Y = {y2, y3, . . . , yn, . . . } ∪ {z}, with d(ym, yn) = 1 (m 6= n) and d(yn, z) =
1 − 1/n. Note that the radius in Y of a point-pair {z, yn} or {ym, yn} with m < n is 1 − 1/n. Now
Metr(X,Y ) consists of all set-maps X → Y, and it follows from the above calculation that map-rad(X,Y ) =
supn(1− 1/n) = 1, but that this value is not achieved. (If we had not specified that Y should be complete,
we could have used the simpler example, X = D2, Y = [0, 1).)
For an example with Y but not X compact, let X = {x2, x3, . . . , xn, . . . }, with d(x2n, x2n+1) = 2−1/n,
and all other pairs of distinct points having distance 1; and let Y = [0, 2] ⊆ E1. Note that if a map X → Y
is to have radius > 1/2, it must send some pair of points to values differing by > 1, and by our metric
on X, these two points must have the forms x2n, x2n+1. Since all other points have distance 1 from these
two, the images of all other points must fall within the interval between their images. Hence the image of
our map falls within an interval of length ≤ 2− 1/n for some positive n, i.e., of length < 2, and hence of
radius < 1. But such images can have radii arbitrarily close to 1, again giving a mapping radius that is not
realized.
Corollary 22. Suppose Y is a metric space in which every closed bounded subset is compact. Then
(i) For every bounded nonempty subset A ⊆ Y, radY (A) is realized.
(ii) If the isometry group of Y is transitive, or more generally, if Y has a bounded subset which meets
every orbit of that group, then for every compact nonempty metric space X, map-rad(X,Y ) is realized.
Proof. (i) Let radY (A) = r, choose any a0 ∈ A, and let Y ′ be the closed ball of any radius r′ > diam(A) ≥
r about a0 in Y. By assumption Y
′ is compact. We see that A ⊆ Y ′, and that every point y ∈ Y with
supa∈A d(a, y) ≤ r′ lies in Y ′. Since infy∈Y supa∈A d(a, y) = r, the space Y contains points y for which
supa∈A d(a, y) comes arbitrarily close to r; hence it will contain points for which that value is arbitrarily
close to r and is ≤ r′. Points with this latter property lie in Y ′, whence radY ′(A) is also equal to r, and
applying part (i) of the preceding lemma with Y ′ for Y gives the desired conclusion.
(ii) Suppose every orbit of the isometry group of Y meets the closed ball of radius c about y0 ∈ Y, and
let x0 be any point of X. Then every f : X → Y may be adjusted by an isometry of Y (which will preserve
the radius of f(X)) so that we get d(f(x0), y0) ≤ c, and after this adjustment, f(X) will lie in the closed
ball of radius c + diam(X) about y0. Letting Y
′ denote the closed ball of any radius r′ > c + diam(X)
about y0, we see as in the proof of (i) that the radii of these image sets f(X) in Y
′ will equal their radii
within Y, and applying part (ii) of the preceding lemma with Y ′ for Y, we get the desired conclusion.
5 Related literature (and one more question).
Lemma 2 above, determining the mapping radius of a circle in a normed vector space V, occurs frequently
in the literature (with E3 or En for V ) as an offshoot of the proof of Fenchel’s Theorem, the statement
that the total curvature of a closed curve C in E3 is at least 2π, with equality only when C is planar
and convex [9, Satz I]. To prove that theorem, Fenchel noted that this total curvature is the length of the
curve in the unit sphere S2 traced by the unit tangent vector to C, and that that curve cannot lie wholly
in an open hemisphere of S2 (nor in a closed hemisphere unless C is planar). He completed the proof by
showing [9, Satz I′ ] that a closed curve of length < 2π (respectively, equal to 2π) in S2 must lie in an
open hemisphere (respectively, must either lie in an open hemisphere or be a union of two great semicircles).
In our language, this says that a circle of circumference < 2π, made a metric space using arc-length, has
mapping radius < π/2 in S2 and (along with some additional information) that the circle of arc-length
exactly 2π has mapping radius π/2.
Subsequent authors [5], [6, Lemma on p.30], [16], [19], [21], [22] gave simpler proofs of Fenchel’s Satz I′
(similar to our proof of Lemma 2), and/or generalized that result from S2 to Sn, and/or obtained the more
precise result that the mapping radius of a circle of length L ≤ 2π in Sn is L/4, and/or noted that the
same method also gives the analogous result with En, or indeed any of a large class of geometric structures,
in place of Sn.
The last-mentioned generalizations were based on the observation that the concept of the midpoint of a
pair of points can be defined, and behaves nicely, in many geometric contexts. I do not know whether more
general convex linear combinations, such as we used in (5) and in the proof of Theorem 11, can be defined
outside the context of vector spaces so as to behave nicely; hence the emphasis in this note on vector spaces
and their convex subsets. A.Weinstein (personal communication) suggests that an approach to “averaging”
of points introduced by Cartan and developed further by Weinstein in [24] might serve this function. J.Lott
(personal communication) points similarly to the concepts of Hadamard space [2] and Busemann convex
space [4].
The results on closed curves of length L in the unit sphere cited above all take L ≤ 2π. If we write S1L
for a circle of circumference L with arc-length metric, and Sn for the unit n-sphere (of circumference 2π)
with geodesic distance as metric, it is clear that the result map-rad(S1L, S
n) = L/4 cannot be expected to
hold when L > 2π; but it would be interesting to investigate how that mapping radius does behave as a
function of L. For all L, map-rad(S1L, S
n) < π, since a curve of fixed length cannot come arbitrarily close
to every point of Sn, and if it misses the open disk of geodesic radius r about a point p, then it is contained
in the closed disc of geodesic radius π − r about the antipodal point.
Many of the papers referred to above consider arcs as well as closed curves; i.e., also study map-rad([0, L],
Sn), and prove that for L ≤ π, this equals L/2. Again, the case of larger L would be of interest. So we
Question 23. For fixed n > 1, how does map-rad(S1L, S
n) behave for L > 2π, and how does map-rad([0, L],
Sn) behave for L > π, as functions of L ?
For instance, are these two functions piecewise analytic?
It seems likely that there will be ranges of values of L in which different configurations of a closed curve or
arc give maximum radius, and that the value of this radius will be an analytic function of L within each such
range. (I conjecture that for all L between 2π and a value somewhat greater than 3π, map-rad(S1L, S
will be realized by a “3-peaked crown”, consisting of 6 arcs of great circles, with midpoints equally spaced
along a common equator. For map-rad(S1L, S
n) with n > 2, I have no guesses.)
Many papers in this area also consider the smallest “box” – in various senses – into which one can fit
all curves, or closed curves, of unit length [5] [13] [20], or all point-sets of unit diameter [10]. These do not
translate into statements about our concept of mapping radius for two reasons. First, they deal with arc
length in the Euclidean metric, but with “boxes” which, though they could in many cases be considered
closed balls in another metric, are not balls in the Euclidean metric; and our formalism of mapping radius
does not look at more than one metric on Y at a time. Second, they generally allow rotations as well as
translations in fitting the box around the curve, while in looking at radii we only have one closed ball of
each radius centered at a given point.
The intuitive interest of Question 23 above arises in part from a special property of the sphere: that a
large open or closed ball, i.e., one that falls just short of covering Sn, has for complement a small closed or
open ball. For spaces Y not having this property, the most natural analogs of those questions might be the
corresponding questions about “mapping co-radii”, given by the definitions
(34) coradY (A) = supy∈Y infa∈A d(a, y) (A ⊆ Y ),
(35) map-corad(X,Y ) = inff∈Metr(X,Y ) coradY (f(X))
= inff∈Metr(X,Y ) supy∈Y infx∈X d(f(x), y)
(cf. (2) and (3)). So, for instance, one might ask about the values of map-corad([0, L], B2) for B2 the closed
unit disc in R2, as a function of L.
(I’m not sure that “co-radius” is a good choice of term: one could argue that that term would more ap-
propriately apply either to radY (Y−A), or to what in the notation of (34) would be written coradY (Y −A).
So the above names are just suggestions, which others may choose to revise.)
6 Appendix: The Arens-Eells space of X.
At the end of §2, I mentioned that every metric space X admits an embedding in a normed vector
space U having the universal property that Corollary 14(ii) showed that the embedding we were considering
there did not have. The construction in question was introduced by Arens and Eells [1], and its universal
property noted by Weaver [23, Theorem 2.2.4], who calls it the Arens-Eells space of X. Weaver is there
most interested in this space as a pre-dual to the Banach space of Lipschitz functions on X. I will sketch
below a motivation for the same object in terms of the universal property. My description will also make a
couple of technical choices different from those of [1] and [23].
Essentially the same construction arises in mathematical economics, in the study of the “transportation
problem” [11], cf. [23, §2.3]. What to us will be the norm of an element of the Arens-Eells space appears
there as the minimum cost of transporting goods from a given set of sources to a given set of markets.
To lead up to the construction, let a metric space X be given, consider any map (as always, nonexpansive)
f of X into a normed vector space V, and let us ask, as a sample question: If we know the distances among
four points x1, x2, x3, x4 ∈ X, what can we say about ||f(x1) + f(x2)− f(x3)− f(x4)|| ?
Clearly, this will be bounded above by ||f(x1)−f(x3)|| + ||f(x2)−f(x4)|| ≤ d(x1, x3) + d(x2, x4). The
other way of pairing terms of opposite sign similarly gives the bound d(x1, x4) + d(x2, x3). Hence
(36) ||f(x1) + f(x2)− f(x3)− f(x4)|| ≤ min(d(x1, x3) + d(x2, x4), d(x1, x4) + d(x2, x3)).
For a similar, but slightly less straightforward case, suppose we want to bound ||3f(x1) + f(x2) −
2f(x3) − 2f(x4)||. We cannot, as before, pair off terms whose coefficients in this expression happen to be
the same except for sign. There are, however, ways of breaking up that expression as a linear combination
of differences; and a little experimentation shows that all ways of doing so are convex combinations of two
extreme decompositions. These two cases lead to the bound
(37) ||3f(x1) + f(x2)− 2f(x3)− 2f(x4)|| ≤
min(2d(x1, x3)+d(x1, x4)+d(x2, x4), 2d(x1, x4)+d(x1, x3)+d(x2, x3)).
We will not stop here to prove that (36) and (37) are best bounds. Let us simply observe that these
considerations suggest that the norm of such a linear combination of images of points of X under a universal
map e : X → U should be given by an infimum of linear combinations of the numbers d(x, y) (x, y ∈ X)
with nonnegative real coefficients, the infimum being taken over all such linear expressions which, when each
d(x, y) is replaced by e(x)− e(y), give the required element.
An obvious problem is that the only elements we get in this way are those in which the sum of the
coefficients of the members of e(X) is 0. This difficulty is intrinsic in the situation: There will not in fact
exist a nonexpansive map of X into a normed vector space having the standard sort of universal mapping
property with respect to such maps, because, though the condition of nonexpansivity bounds the distances
among images of points of X, it does not bound the distances between such images and 0; so universality
would force the images of points of X to have infinite norm.
What we can get, rather, is a set-map e of X into a vector space U, and a norm on the subspace U0
of linear combinations of images of points of X with coefficients summing to 0, such that for all x, y ∈ X,
||e(x) − e(y)|| ≤ d(x, y), and which has the universal property that given any nonexpansive map f of X
into a normed vector space V, there exists a unique vector-space homomorphism g : U → V which satisfies
f = ge, and is nonexpansive on U0. Observe that the norm on U0 induces a metric on each coset of that
subspace; in particular, on the coset U1 of elements in which the sum of all coefficients is 1, which is the
affine span of the image of X. The map of X into that coset is nonexpansive, and the asserted universal
property of e is easily seen to yield (20), the property that the construction of §2 failed to have.
Weaver’s answer to the same distance-to-0 problem is to use metric spaces with basepoint, and basepoint-
respecting maps, the basepoint of a vector space being 0. This has the advantage of giving a universal
property in the conventional sense, with both U and V in the category of normed vector spaces. However,
it requires one to make a possibly unnatural choice of basepoint in X ; changes in that choice induce isometries
on the universal space, which, though affine, are not linear. The approach I actually find most natural is
to regard what I have called U1 as a “normed affine space”, that is, a set with a simply transitive group
of “translation” maps by elements of a normed vector space, and to note that U1 has a genuine universal
property in the category of normed affine spaces. However, the development of that concept would be an
excessive excursion for this appendix. Still another approach would be to work with “normed” vector spaces
where the norm is allowed to take on the value +∞. In any case, it is straightforward to verify that the
Arens-Eells space of X as described in [23] and my U0 are isometrically isomorphic, so below I will quote
results of Weaver’s, tacitly restated for my version of the construction.
The details, now: let U be the vector space of all real-valued (i.e., not necessarily nonnegative) measures
µ on X with finite support, and, as before, for each x ∈ X let µx be the probability measure with support
{x}. Thus, {µx | x ∈ X} is a basis of U. Let U0 ⊆ U denote the subspace of measures µ satisfying
µ(X) = 0. Let W similarly denote the space of all real-valued measures on X × X with finite support;
for each (x, y) ∈ X ×X, let νx,y be the probability measure with support {(x, y)}, and let W+ ⊆ W be
the cone of nonnegative linear combinations of the νx,y, i.e., the nonnegative-valued measures on X ×X.
Finally, let D : W → U be the linear map defined by the condition
(38) D(νx,y) = µx − µy for x, y ∈ X,
which clearly has image U0. We now define the norm of any µ ∈ U0 by
(39) ||µ|| = inf
ν∈W+, D(ν)=µ
(x,y)∈X×X
d(x, y) dν .
It is easy to verify that this indeed gives a norm with the desired universal property. The one verification that
is not immediately obvious is that it is a norm rather than a pseudonorm; i.e., that it is nonzero for nonzero
µ ∈ U0. To get this, one first proves the desired universal property in the wider context of pseudonormed
vector spaces, then notes that given any nonzero µ =
ai µxi ∈ U0 (I finite, all ai nonzero), one can
find a nonexpansive map f : X → R which is zero at all but one of the xi, say xi0 , from which it follows
by the universal property that ||µ|| ≥ |
ai f(xi)| = |ai0 f(xi0)| > 0.
Weaver [23, Theorem 2.3.7(b)] shows that the infimum in (39) is always attained, and in fact, by a ν
whose “support” in X (the set of points which appear as x or y in terms νx,y having nonzero coefficient
in the expression for ν) coincides with the support of µ (the set of x such that µx appears with nonzero
coefficient in the expression for µ). Our next proposition strengthens this result a bit. For brevity, we will
call on Weaver’s result in the proof, but I will sketch afterward how the argument can be made self-contained.
We will use the following notation and terminology. Given ν =
bj νxj ,yj ∈ W (where J is a finite
set, the pairs (xj , yj) for j ∈ J are distinct, and all bj 6= 0), let Γ(ν) be the directed graph having for
vertices all points of X, and for directed edges the finitely many pairs (xj , yj) (j ∈ J). Let us define the
positive support of a directed graph Γ as the set of vertices which are initial points of its edges, and its negative
support as the set of vertices which are terminal points. For ν ∈ W, we will call the positive and negative
supports of Γ(ν) the positive and negative supports of ν. On the other hand, for µ =
ai µxi ∈ U0, let
us define its positive support to be {xi | ai > 0}, and its negative support to be {xi | ai < 0}. These are
clearly disjoint. Note that when ν ∈W+, the positive support of D(ν) is contained in the positive support
of ν, and contains all elements thereof that are not also in the negative support of ν, and that the negative
support of D(ν) has the dual properties.
When we speak of a cycle in a directed graph, we shall mean a cycle in the corresponding undirected
graph; we shall also understand that in a cycle no vertex is traversed more than once. Note that a cycle
of length 1 in Γ(ν) can only arise when a term νx,x has nonzero coefficient in ν, while a cycle of length
2, i.e., the presence of two edges between x and y, can only occur if νx,y and νy,x both have nonzero
coefficients. But a cycle of length n > 2 involving a given sequence of vertices may arise in any of 2n ways,
depending on the orientations of the edges.
We now prove
Proposition 24 (cf. [11, Theorem 3.3, p.84]). Let µ ∈ U0. Then the infimum of (39) is attained by an
element ν ∈W+ (not necessarily unique) whose positive and negative supports coincide respectively with the
positive and negative supports of µ, and whose graph Γ(ν) has no cycles.
Proof. As mentioned, Weaver proves the existence of a ν ∈ W+ with D(ν) = µ which achieves the infi-
mum (39) and has the same support as µ. Let ν be chosen, first, to have these properties; second, among
such elements, to minimize the total number of edges in Γ(ν), and finally, to minimize the sum of the
coefficients of all the νx,y in its expression. This last condition is achievable because the set of elements of
W+ which are linear combinations of a given finite family of the νx,y, and for which the coefficients of these
elements are all ≤ some constant, is compact; so after finding some ν with D(ν) = µ which achieves the
minimum (39), has the same support as µ, and minimizes the number of edges in Γ(ν), we may restrict our
search for elements also minimizing the coefficient-sum to the compact set of elements having these properties
and having every coefficient less than or equal to the coefficient-sum of the element we have found.
Suppose, now, that Γ(ν) has a cycle. Thus, we may choose distinct vertices p1, . . . , pk, and for each
j ∈ {1, . . . , k}, a term νpj ,pj+1 or νpj+1,pj occurring with positive coefficient in ν, where the subscripts j
are taken modulo k. (If k > 2 and both νpj ,pj+1 and νpj+1,pj occur in ν, we choose one of these arbitrarily.
If k = 2, we make sure that the terms we choose for j = 1, 2 are distinct, one being νp1,p2 and the other
νp2,p1 .) For each j ∈ {1, . . . , k} let us now define ν′pj ,pj+1 to be νpj ,pj+1 if that is the jth term in the
list we have chosen, or −νpj+1,pj if the jth term in that list is νpj+1,pj , and let ν′ =
ν′pj ,pj+1 ∈ W.
In general, ν′ /∈ W+, but for all λ ∈ R near enough to 0, we have ν + λν′ ∈ W+, since the relevant
coefficients in ν are strictly positive. Note that for each j, D(ν′pj ,pj+1) = µj − µj+1, hence D(ν
′) = 0,
hence D(ν + λν′) = D(ν) = µ.
Clearly,
(x,y)∈X×X
d(x, y) d(ν+λν′) is an affine function of λ. Hence it must be constant, otherwise,
using small λ of appropriate sign, we would get a contradiction to the assumption that ν achieves the
minimum of (39); so all the elements ν+λν′ achieve this same minimum. Now some choice of λ will cause
λν′ to exactly cancel the smallest among the coefficients of terms νpj ,pj+1 or νpj+1,pj in our cycle in Γ(ν).
Thus, ν + λν′ contradicts the minimality assumption on the number of edges in Γ(ν). This contradiction
shows that Γ(ν) has no cycles.
Next, let us compare the positive and negative supports of ν with those of µ. We have chosen ν so
that its support, namely the union of its positive and negative supports, coincides with the support of µ;
and since µ = D(ν), the positive and negative supports of ν will each contain the corresponding support
of µ. So if these inclusions are not both equalities, we must have a vertex p which is both in the positive
and the negative support of ν; i.e., such that there is an edge (q, p) of Γ(ν) leading into p, and an
edge (p, r) leading out of it. Let ν′ = νq,r − νq,p − νp,r. Like the element denoted by that symbol in the
preceding argument, this satisfies D(ν′) = 0. Let us again form ν + λ ν′, this time choosing the value
λ > 0 which leads to the cancellation of the smaller of the coefficients of νq,p and νp,r in ν, or of both
if these coefficients are equal. Since this does not reverse the sign of either of these coefficients, ν + λ ν′
still belongs to W+. Note that
(x,y)∈X×X
d(x, y) d(ν+λν′) ≤
(x,y)∈X×X
d(x, y) dν, since by the triangle
inequality, d(q, r) ≤ d(q, p) + d(p, r); so the property of minimizing the latter integral among elements of
W+ mapped to µ by D has not been lost. Also, Γ(ν + λν
′) has dropped at least one edge that belonged
to Γ(ν), since at least one coefficient was canceled, and has gained at most one edge, namely (q, r) (if that
was not previously present); so the total number of edges has not increased. Finally, when we look at the
sum of all the coefficients, we see that the coefficient of νq,r has increased by λ, while those of νq,p and
νp,r have both decreased by λ, so there has been a net change of −λ < 0. Thus, we have a contradiction to
our choice of ν as minimizing that sum. This completes the proof of the main assertion of the proposition.
Let us verify, finally, the parenthetical comment that the ν of the proposition may not be unique. Let
X be a 4-point space {x1, x2, x3, x4} where the distance between every pair of distinct points is 1, and let
µ = µx1 + µx2 − µx3 − µx4 . It is not hard to check that in this case, the only elements ν that can possibly
satisfy the conditions of the proposition are νx1,x3 + νx2,x4 and νx1,x4 + νx2,x3 . Since these give the same
value for the integral of (39), d(x1, x3) + d(x2, x4) = 2 = d(x1, x4) + d(x2, x3), each satisfies our conditions.
(Of course, for most choices of metric on this set X, one of these two values is smaller than the other,
and we then get a unique ν satisfying the conditions of the proposition.)
To get a self-contained version of the above proof which includes the existence result we cited from [23],
one may start by looking at any finite subset X0 of X containing the support of µ, verify by compactness
as above that the infimum of (39) over elements ν with support contained in X0 is achieved, then note
that any element in the support of ν but not in the support of µ must belong to both the positive and
negative supports of ν, a situation excluded by the proof. Letting X0 then run over all finite subsets of
X containing the support of µ, one sees that the infimum of (39) exists, and is simply the infimum with ν
restricted to have support in the support of µ.
We remark that the final condition of the above proposition, that Γ(ν) have no cycles, is not entailed
by the other conditions. E.g., returning to X = {x1, x2, x3, x4} with all distances 1, we see that every
convex linear combination ν of the two elements that we found, νx1,x3 + νx2,x4 and νx1,x4 + νx2,x3 , still
minimizes (39), and still has support X ; but if ν is a proper convex combination of those two elements,
then Γ(ν) = Γ(νx1,x3+ νx2,x4) ∪ Γ(νx1,x4+ νx2,x3), which contains (indeed, is) a cycle.
Let us now show, however, that when, as in the statement of the proposition, Γ(ν) is cycle-free, it
uniquely determines ν. Thus, the calculation of the norm (39) reduces in principle to checking finitely
many ν.
Lemma 25. Let µ ∈ U0, and let Γ be a directed graph with vertex-set X and without cycles. Then there
is at most one ν ∈W (and so, a fortiori, at most one ν ∈ W+) such that D(ν) = µ and Γ(ν) ⊆ Γ.
To characterize this element ν, consider any edge (x, y) in Γ. Let Γx (containing x) and Γy (containing
y) be the connected components into which the connected component of Γ containing (x, y) separates when
that edge is removed. Then the coefficient in ν of νx,y is the common value of
dµ and −
dµ, i.e.,
is both the sum of the coefficients of µz over z in Γx, and the negative of the corresponding sum over Γy.
Proof. We will prove the assertion of the second paragraph, from which that of the first clearly follows.
Writing µ = D(ν), the contributions to the expression
dµ from any term νp,q such that both p and
q lie in Γx clearly cancel, while terms such that neither p nor q lies in Γx contribute nothing. This leaves
the νx,y term, which contributes precisely its coefficient, leading to the first description of that coefficient.
Likewise, this term contributes the negative of its coefficient to
dµ, yielding the second description.
Corollary 26. Suppose µ ∈ U0 is integer-valued, and let ν be an element of W+ with the properties that
D(ν) = µ, that ν has the same positive and negative supports as µ, and that Γ(ν) has no cycles.
Then for every x ∈ X such that the coefficient of µx in µ is ±1, the vertex x is a leaf of Γ(ν).
Hence, if µ has the property that the coefficient of every µx is ±1, then ν is induced, in the obvious
way, by a bijection between the positive support of µ and the negative support of µ. Thus, in that case,
letting n be the common cardinality of these supports, there are exactly n! such ν ∈ W+.
Proof. Consider any x such that µx has coefficient +1 in µ. Then x is in the positive support of Γ(ν),
but not in the negative support. The latter condition says that ν involves no terms νy,x (y ∈ X), so +1
is the sum of the coefficients in ν of the terms νx,y (y ∈ X − {x}). Since ν ∈ W+, these coefficients are
nonnegative, and by Lemma 25 they are integers, so as they sum to 1, only one of them can be nonzero,
making x a leaf. The same argument, mutatis mutandis, gives the case where the coefficient of µx is −1.
The assertion of the final paragraph clearly follows, since a directed graph in which every vertex is a leaf
corresponds to a bijection between “source” and “sink” vertices.
As sample applications, recall the two computations at the beginning of this section, with which we
motivated the construction of our universal embedding e : X → U0. In our present notation, what we
were doing was evaluating the norms in U0 of elements of the two forms µx1 + µx2 − µx3 − µx4 and
3µx1 + µx2 − 2µx3 − 2µx4. In the first case, the last paragraph of the above corollary leads to just two
graphs, and hence two values of ν one of which must achieve the infimum (39), namely νx1,x3 + νx2,x4 and
νx1,x4 + νx2,x3 , showing that if f is our universal map e, equality holds in (36), and for general f, (36) is
the best bound. (This also establishes the example that we said was “not hard to check” in the next-to-last
paragraph of the proof of Proposition 24.)
In the case µ = 3µx1 + µx2 − 2µx3 − 2µx4, Corollary 26 says that x2 is a leaf of Γ(ν). As it lies in the
positive support of µ, the vertex it is attached to must lie in the negative support, i.e., must be either x3
or x4. In the former case, subtracting νx2,x3 from ν will give an element ν
′ ∈ W+ which is sent by D to
3µx1 − µx3 − 2µx4 . Since this ν′ has only one element, x1, in its positive support, its graph is uniquely
determined, giving ν′ = νx1,x3 +2νx1,x4 , hence ν = νx2,x3 + νx1,x3 +2νx1,x4 . The case where x2 is attached
to x4 similarly gives ν = νx2,x4 + 2νx1,x3 + νx1,x4 , and these together show that (37) is a best bound.
I referred earlier to the mathematical economist’s “transportation problem”. There, our d(x, y) corre-
sponds to the cost of transporting a unit quantity of goods from location x to location y; so our definition of
||µ|| describes the minimum cost of transporting goods produced and consumed at locations and in quantities
specified by µ.
Incidentally, the first assertion of Corollary 26 does not remain true if we weaken “the coefficient of µx
in µ is ±1” to “the coefficient of µx has least absolute value among the nonzero coefficients occurring in
µ.” For instance, suppose µ has the form 3µx1 − 4µx2 + 2µx3 − 4µx4 + 3µx5 . Then one of the elements of
W+ satisfying the conditions of Corollary 26 is ν = 3νx1,x2 + νx3,x2 + νx3,x4 + 3νx5,x4 . Here Γ(ν) has the
form x1 → x2 ← x3 → x4 ← x5, so x3, despite having smallest coefficient in µ, is not a leaf.
Proposition 24 sheds some light on our earlier “partial universality” result, Corollary 14(i). Given any
convex linear combination of points of our universal image of X,
aiµxi (ai > 0,
ai = 1), and any
point x ∈ X (which for simplicity we will assume is not one of the xi, though the argument can be adjusted
to the case where it is), the difference µx −
aiµxi is an element of U0 with positive support {x} and
negative support {xi | i ∈ I}. For this situation, the conditions of Proposition 24 clearly lead to a unique
Γ(ν), to wit, the tree whose edges are all pairs (x, xi) (i ∈ I), and hence to the unique choice ν =
ai νx,xi .
Thus, the right-hand side of (39) comes to
ai d(x, xi), which is equal to the right-hand side of (19). In
contrast, when one considers the difference between two general convex linear combinations of elements µx,
as in Corollary 14(ii), there may be many directed graphs satisfying the conditions of Proposition 24, so the
norm of that difference doesn’t have a simple expression.
The universality of the Arens-Eells space U yields a formula for map-rad(X,NmV) analogous to our
description (16) of map-rad(X,Conv); namely,
(40) map-rad(X,NmV) = infµ∈U1 supx∈X infν∈W+, D(ν)=µ−µx
y,z∈X
d(y, z) dν(y, z).
But this is cumbersome to use. E.g., the reader might try working through a verification, for the space
described by (25), of the statement that the µ implicit in (29), (µy0 + µy1 + µy2 − µx)/2, does indeed lead
to the infimum of (40), showing that map-rad(X,NmV) = 5/2, and not a smaller value.
Given an element µ ∈ U0, it would be interesting to look for bounds on the number of distinct graphs
Γ(ν) corresponding to elements ν ∈W+ as in the first sentence of Corollary 26. (This is simply a function
of the coefficients occurring in µ, as a family of positive real numbers with multiplicities.) To start with,
one might look for bounds in terms of the cardinalities of the positive and negative supports of µ.
Weaver [23] also gets a description of the universal nonexpanding map of X into a normed complex
vector space, paralleling the description for the real case, but he notes [23, p.43, next-to-last paragraph of
§2.2] that in the complex case it is no longer true that the infimum corresponding to (39) is always attained
by a ν having the same support as µ. (In the complex version of (39), by the way, one must replace dν by
|dν|, instead of restricting ν to a “positive cone” as above, since there is no natural analog of that cone.
Weaver takes this approach for both the real and complex cases; my use of W+ for the real case is one of
the different technical choices that I have made.) For instance, if X = {x, y0, y1, y2} with d(x, yi) = 1 and
d(yi, yj) = 2 (i 6= j), and if µ = µy0 + ωµy1 + ω2µy2 , where ω is a primitive cube root of unity, then the
minimizing ν is νy0,x + ωνy1,x + ω
2νy2,x, which makes that integral 3, while the best ν having support
in the support of µ, {y0, y1, y2}, is 13 (1− ω)νy0,y1 +
(ω − ω2)νy1,y2 + 13 (ω
2 − 1)νy2,y0 , of which each term
contributes 1
3 · 2 to that integral, giving a total of
3 · 2 =
12 > 3. If in this space we replace the
point x by a sequence of points x1, x2, . . . , such that d(xm, yi) = 1 + 1/m and d(xm, xn) = |1/m− 1/n|,
the above µ still has ||µ|| = 3, but the infimum defining that norm is not achieved.
7 Appendix: Translating convex sets to 0.
In §3, we saw that the radius of a subset X of a normed vector space V could be larger when measured
within a convex subset C of V than within the whole space V. If we regard this as a pathology, we would
like to know in which V it does not occur. We shall obtain partial results below, which, we will see, make
it likely that for n > 2, the only norms on Rn for which it does not happen are those giving a structure
isomorphic to En.
Observe that the radius of X, whether within V or within a convex subset C, is determined by the set
of closed balls containing X, and that these are all convex; hence that radius is a function of the convex hull
of X. So our question reduces to the case where X is convex. Moreover, if X shows the above behavior
with respect to one convex subset C of V, it will show it with respect to any smaller convex subset in
which it lies; these two observations reduce our question to the case where C = X. This reduction is the
equivalence of conditions (41) and (42) of the next lemma. Condition (43) then reformulates the problem.
(Note that in (43) and similar statements throughout this section, an expression such as “C − v” will
denote the translate of the set C by the vector −v, in contrast to notations such as X−{x} for set-theoretic
difference, used occasionally in earlier sections.)
Lemma 27. If V is a locally compact normed vector space, with closed unit ball B, then the following
conditions are equivalent:
(41) For every nonempty subset X of V, and convex subset C of V containing X, one has
radV (X) = radC(X).
(42) For every nonempty convex subset C of V, one has radV (C) = radC(C).
(43) Every nonempty closed convex subset C of B has a translate C − v which contains 0 and is
again contained in B.
Proof. We have noted the equivalence of (41) and (42); let us prove (42) equivalent to (43).
(43)⇒ (42): Dilating by arbitrary constants, we see that if (43) holds for B, then it holds for rB for
all positive real numbers r. Moreover, the statement that C − v contains 0 and is contained in rB is
equivalent to saying that v ∈ C and that v+ rB contains C; i.e., that C is contained in the ball of radius
r about v ∈ C. Thus (43) says that if a closed convex set C is contained in some closed ball about some
point of V (taken without loss of generality to be 0), then it is contained in a ball of the same radius about
one of its own points. This yields the case of (42) where C is closed. The facts that the convex hull of a
finite subset of V is compact, hence closed, and that the radius of an arbitrary set is the supremum of the
radii of its finite subsets, allow us to deduce the general case of (42) from the case of closed C.
(42)⇒ (43): If C is a closed convex subset of V contained in B, then radV (C) ≤ 1, so by (42),
radC(C) ≤ 1. Moreover, compactness of C implies that the set of radii of closed balls containing C and
centered at points v ∈ C achieves this minimum radC(C) ≤ 1, so that C is contained in a translate v+B
(v ∈ C), i.e., C − v ⊆ B.
Now (43) is a statement purely about the convex set B in the topological vector space V, so our question
becomes that of which subsets B of a topological vector space V satisfy it. (In the statement of the lemma,
the topology and the set B both arise from the normed structure on V ; but that relation is not needed by
the statement of (43) alone.) Here are some pieces of language, one ad hoc, the rest more or less familiar,
that we will use in examining this question.
Definition 28. If C ⊆ B are convex subsets of a real vector space V, with 0 ∈ B, we shall call C parkable
in B if there exists v ∈ C such that C − v ⊆ B. When clear from context, “in B” may be omitted.
If V is a real topological vector space, we will call sets of the form {x ∈ V | L(x) = a}, where L is a
nonzero continuous linear functional on V and a ∈ R, hyperplanes, while sets of the form {x ∈ V | L(x) ≥
a} will be called closed half-spaces.
A subset S of a vector space V will be called centrally symmetric if S = −S. A center of symmetry of
a subset S of V will mean a point v ∈ V such that S − v is centrally symmetric; equivalently, such that
S = 2v − S. (Note that a center of symmetry of a nonempty convex set belongs to that set.)
Lemma 29. Let B be a compact convex subset of Rn containing 0. Then the following conditions are
equivalent:
(44) The intersection of B with every hyperplane A that meets B is parkable.
(45) The intersection of B with every closed half-space H that meets B is parkable.
(46) Every nonempty closed convex subset C of B is parkable (= (43) above).
Proof. (46)⇒ (44) is clear; we will show (44)⇒ (45)⇒ (46).
(44)⇒ (45): Let H be a closed half-space in Rn, bounded by a hyperplane A, and meeting B. If H∩B
contains 0 it is trivially parkable, so assume the contrary. Thus B meets both H and its complement,
hence it meets their common boundary A, so by (44) there exists v ∈ A ∩ B such that (A ∩B) − v ⊆ B.
I claim that (H ∩ B) − v is also contained in B. Indeed, let p ∈ H ∩ B; we wish to show p − v ∈ B.
Intersecting our sets with the subspace of V spanned by p and v, and taking appropriate coordinates, we
may assume that n = 2, that A is the line {(x, y) | y = 1} ⊆ R2, and that v is the point (0, 1). H will
be the closed half-plane {(x, y) | y ≥ 1}, so we can write p = (xp, yp) with yp ≥ 1.
In this situation, A ∩ B will be a line segment (possibly degenerate) extending from a point (s, 1) to
a point (t, 1) (s ≤ t). Since (A ∩ B) − v ⊆ B, B also contains the segment from (s, 0) to (t, 0). Note
that if xp were > t, then the point where the line segment from p = (xp, yp) ∈ B to (t, 0) ∈ B meets A
would have x-coordinate > t, contradicting the assumption that A ∩ B terminates on the right at (t, 1);
so xp ≤ t. Similarly, xp ≥ s. Thus, xp ∈ [s, t], so (xp, 0) ∈ B. Hence p − v = (xp, yp−1) lies on the line
segment connecting p = (xp, yp) ∈ B with (xp, 0) ∈ B, hence lies in B, as claimed.
(45)⇒ (46): Suppose C is a nonempty closed convex subset of B which is not parkable. By compactness
of B, among the translates of C contained in B there is (at least) one that minimizes its distance to 0 in
the Euclidean norm on Rn; let us assume C itself has this property. Let p be the point of C nearest to
0 in that norm, and let A be the hyperplane passing through p and perpendicular (again in the Euclidean
norm) to p regarded as a vector. Then C will lie wholly in the half-space H bounded by A and not
containing 0. (For if we had q ∈ C not lying in H, then points close to p on the line segment from p to q
would be nearer to 0 than p is.) Assuming (45), H ∩B is parkable; say v ∈ H ∩B with (H ∩B)− v ⊆ B.
Since v ∈ H, if we write v as the sum ap+ q of a scalar multiple of p and a vector q perpendicular to p,
the coefficient a will be ≥ 1, and so in particular, positive. It follows that for sufficiently small positive c,
the point p − cv will be closer to 0 than p is; moreover, if we take such a c that is ≤ 1, (H ∩ B) − cv
will still be contained in B, since H ∩B and (H ∩B)− v are. Hence C − cv is contained in B, and has
a point p− cv which is closer to 0 than p is, contradicting our minimality assumption on C and p.
(In the above result, we could have replaced Rn by any real Hilbert space.)
Clearly, the closed Euclidean unit ball in Rn satisfies (44), and hence (45) and (46); hence since those
conditions are preserved by invertible linear transformations, so does the closed region enclosed by any
ellipsoid centered at 0. On the other hand, our example in the paragraph containing (23), of a normed
vector space in which (41) failed, had for its unit ball B a 3-cube centered at 0, showing that our properties
fail for that B. To see geometrically the failure of (44) for that B, choose a vertex of that cube and pass
a plane A through the three vertices adjacent thereto; it is not hard to see that A ∩ B is not parkable.
One can similarly show that none of the regular polyhedra centered at 0 satisfy (44), nor a circular cylinder
centered at 0, nor the solid obtained by attaching a hemisphere to the top and bottom of such a cylinder.
In fact, for n > 2, I know of no compact convex subset of Rn with nonempty interior that does satisfy that
condition, other than the regions enclosed by ellipsoids centered at 0. The situation is different for n = 2,
as shown by point (d) of the next result.
Lemma 30. Suppose B is a centrally symmetric convex subset of Rn. Then
(a) Any nonempty convex subset C ⊆ B that has a center of symmetry is parkable in B.
Hence, assuming in the remaining points that B is also compact, we have
(b) If the intersection of B with every hyperplane A that meets B has a center of symmetry, then B
satisfies the equivalent conditions (44)-(46).
In particular,
(c) If B is the closed region enclosed by an ellipsoid in Rn, then B satisfies (44)-(46), and
(d) If n = 2, then without further restrictions, B satisfies (44)-(46).
Proof. Let C be as in (a), with center of symmetry z ∈ C. Then for every x ∈ C, 2z − x ∈ C ⊆ B, so by
central symmetry of B, we have x− 2z ∈ B. Averaging x and x− 2z, we get x− z ∈ B. Thus C− z ⊆ B,
so C is parkable.
It follows that any B as in (b) satisfies (44), hence by Lemma 29, all of (44)-(46).
In the situation of (c), the intersection of B with a hyperplane A, if nonempty, is either a point or the
region enclosed by an ellipsoid in A, hence has a center of symmetry, while in (d) the intersection of B
with every line in R2 that meets B is a point or a closed line segment, hence has a center of symmetry; so
in each case, (b) gives the asserted conclusion.
Question 31. Let n ≥ 3, and suppose B is a compact convex subset of Rn having nonempty interior and
containing 0. Of the implications (i)⇒ (ii)⇒ (iii), which we have noted hold among the conditions listed
below, is either or both reversible?
(i) B is an ellipsoid centered at 0.
(ii) B is centrally symmetric, and for every hyperplane A meeting B, A ∩B has a center of symmetry.
(iii) Every closed convex subset of B is parkable in B.
Branko Grünbaum has pointed out to me a similarity between this question and the result of W.Blaschke
[3, pp.157–159] that if E is a smooth compact convex surface in R3 with everywhere nonzero Gaussian
curvature, such that when E is illuminated by parallel rays from any direction, the boundary curve of the
bright side lies in a plane, then E is an ellipsoid. I believe that methods similar to Blaschke’s may indeed
show that both implications of Question 31 are reversible. To see why, suppose B is a compact convex
subset of R3 with nonempty interior containing 0, which satisfies (iii) above, and whose boundary E is (as
in Blaschke’s result) a smooth surface with everywhere nonzero Gaussian curvature. Let A be any plane
through 0, and A′ the plane gotten by shifting A a small distance. Now the vectors that can possibly park
A′ ∩ B are constrained by the directions of the tangent planes to E at the points of A′ ∩ E (which are
well-defined because E is assumed smooth), and if we take A′ sufficiently close to A, these tangent planes
become close to the corresponding tangent planes at the points of A ∩ E. Applying the above observations
to planes A′ on both sides of A, one can deduce that all the tangent planes to E along A∩E must contain
vectors in some common direction (I am grateful to Bjorn Poonen for this precise formulation of a rough
idea I showed him); in other words, that A ∩ E is the boundary of the bright side when E is illuminated
by parallel rays from that direction. By definition, A∩E lies in the plane A; so we have the situation that
Blaschke considered, except that we have started with planarity and concluded that the curve is a boundary
of illumination, rather than vice versa.
That last difference is probably not too hard to overcome. More serious is the smoothness assumption
on E, used in both the above discussion and Blaschke’s argument. Finally, can the result be pushed from
n = 3 to arbitrary n ≥ 3 ? I leave it to those more skilled than I in the subject to see whether these ideas
can indeed be turned into a proof that (iii)⇒(i) in Question 31.
A related argument which can be extracted from a step in Blaschke’s development shows that a compact
convex subset B of R2 containing 0 and satisfying (46), whose boundary is a smooth curve containing
no line segments, must be centrally symmetric. Again, one would hope to remove the conditions on the
boundary.
One can ask about a converse to another of our observations:
Question 32. Suppose C is a compact convex subset of Rn (n > 2) such that for every centrally symmetric
compact convex subset B of Rn containing a translate C′ of C, the set C′ is parkable in B. Must C
have a center of symmetry?
Here the behavior of a given C can change depending on whether the dimension of the ambient vector
space is 2 or – as in the above question – larger: a triangle C has the above property in R2 by Lemma 30(d),
but not in R3, as we saw in the example where B was a cube.
Returning to the “pathology” which motivated the considerations of this section, one important case is
where the radius of a subset X of a normed vector space V decreases when V is embedded in a larger
normed vector space W. The next lemma determines how far down the radius of a given X can go.
Lemma 33. Let V be a normed vector space, and X a bounded subset of V. Then
(47) infW⊇V radW (X) = radV {(x− y)/2 | x, y ∈ X},
where W ranges over all normed vector spaces containing V. This infimum is realized by a W in which V
has codimension 1.
Proof. First consider any normed vector space W containing V, and suppose X is contained in the closed
ball of radius r about w ∈W. That ball has w as a center of symmetry, so it also contains {2w−y | y ∈ X},
hence taking midpoints of segments connecting that set to points x ∈ X, it contains {w+ (x− y)/2 | x, y ∈
X}. Translating by −w, we see that the ball of radius r about 0 contains {(x− y)/2 | x, y ∈ X}, so r is
at least the right-hand side of (47). This gives the inequality “≥ ” in (47); it remains to construct a W for
which radW (X) equals that right-hand side.
Before doing this, note that (47) holds trivially if X is empty or a singleton; so assuming it is neither,
let us re-scale and assume without loss of generality that the right-hand side of (47) equals 1. Since the set
whose radius is taken there is centrally symmetric, that set is contained in the closed unit ball BV of V.
(Cf. the proof of Lemma 30(a), which works not just for Rn, but for any normed vector space with B its
closed unit ball; or the proof of Lemma 5, applied to the 2-element group generated by x 7→ −x.) Now let
W = V ⊕ R, let us identify V with V × {0} ⊆ W, and let us take for the closed unit ball BW of W the
closure of the convex hull of
(48) {(x, 1) | x ∈ X} ∪ BV ∪ {(−x,−1) | x ∈ X}.
(We understand “closure” to mean “with respect to the product topology”, since we don’t have a norm until
we have made the above definition.) It is easy to see that any point in the convex hull of (48) whose second
coordinate is 0 is a convex linear combination of a point of BV and a member of the set on the right-hand
side of (47); but by assumption that set is contained in BV ; so in fact, BW ∩ V = BV , so the norm of W
indeed extends that of V.
But BW contains the translate {(x, 1) | x ∈ X} of X, hence X is contained in the closed ball of radius
1 about (0,−1), hence has radius ≤ 1 in W.
Even the case V = En is not immune to this phenomenon, since even in that case, the overspace W of
the above construction is generally not Euclidean. For instance, if we take for X an equilateral triangle in
2 centered at the origin, it is not hard to see that {(x− y)/2 | x, y ∈ X} is a hexagon whose vertices are
the midpoints of the edges a regular hexagon with the same circumcircle as X ; so the radius of X decreases
in W by the ratio of the inradius to the circumradius of a regular hexagon, in other words, by
This will not, of course, happen for a centrally symmetric X (cf. Lemma 30 or (47)). Other cases for
which it cannot happen depend on the metric structure: if X is a right or obtuse triangle in E2, or more
generally, any bounded set containing a diameter of a closed ball in which it lies, its radius clearly cannot
go down under extension of the ambient normed vector space (cf. Corollary 10).
8 Acknowledgements.
In addition to persons acknowledged above, I am indebted to W.Kahan for showing me an exercise he
had given his Putnam-preparation class, of proving Lemma 2 in E3, and for subsequently pointing out that
my solution to that exercise worked in any normed vector space; to Nik Weaver for information about his
results in [23], and to David Gale for pointing out the connection between the construction of §6 and results
in mathematical economics.
References
[1] Richard F. Arens and James Eells, Jr., On embedding uniform and topological spaces, Pacific J. Math.
6 (1956) 397–403. MR 18, 406e.
[2] Martin R. Bridson and André Haefliger, Metric spaces of non-positive curvature, Grundlehren der Math-
ematischen Wissenschaften, v.319. Springer, 1999. MR 2000k:53038.
[3] Wilhelm Blaschke, Kreis und Kugel, Leipzig, 1916, reprinted by Chelsea Publishing Company, New
York, 1949. MR 17, 887b.
[4] Herbert Busemann, Spaces with non-positive curvature, Acta Math. 80 (1948) 259–310. MR 10,623g.
[5] G. D. Chakerian and M. S. Klamkin, Minimal covers for closed curves, Math. Mag., 46 (1973) 55–61.
MR 47#2496.
[6] S. S. Chern, Curves and surfaces in Euclidean space, Studies in Global Geometry and Analysis, pp.16-56,
Math. Assoc. Amer., Studies in Math., v.4, 1967. MR 35#3610.
[7] H. S. M. Coxeter, Regular Polytopes, Methuen & Co.; Pitman, 1948; 1949. MR 10,261e, and for
subsequent editions, MR 27#1856 and MR 51#6554.
[8] Ludwig Danzer, Branko Grünbaum and Victor Klee, Helly’s theorem and its relatives, Proc. Sympos.
Pure Math., Vol. VII, pp. 101–180, Amer. Math. Soc., Providence, R.I., 1963. MR 28#524.
[9] Werner Fenchel, Über Krümmung und Windung geschlossener Raumkurven, Math. Ann., 101 (1929)
238–252.
[10] David Gale, On inscribing n-dimensional sets in a regular n-simplex, Proc. Amer. Math. Soc., 4 (1953)
222–225. MR 14,787b.
[11] David Gale, The theory of linear economic models, McGraw-Hill, 1960; University of Chicago Press,
1989. MR 22#6599.
[12] Hugo Hadwiger and Hans Debrunner, Combinatorial geometry in the plane, translated by Victor Klee,
with a new chapter and other additional material supplied by the translator. Holt, Rinehart and Win-
ston, New York 1964 vii + 113 pp. MR 29#1577.
[13] J. H̊astad, S. Linusson and J. Wästlund A smaller sleeping bag for a baby snake, Discrete Comput.
Geom., 26 (2001) 173–181. MR 2002b:52012.
[14] E. Helly, Über Mengen konvexer Körper mit gemeinschaftlichen Punkten, Jahresber. Deutsch. Math.-
Verein., 32 (1923) 175–176.
[15] Einar Hille, Analytic function theory, v. II, Ginn and Co., Boston, 1962. MR 34#1490.
[16] R. A. Horn, On Fenchel’s Theorem, Amer. Math. Monthly, 78 (1971) 380-381. MR 44#2142.
[17] Heinrich W. E. Jung, Über die kleinste Kugel, die eine räumliche Figur einschliesst, J. Reine Angew.
Math., 123 (1901) 241–257.
[18] J. C. C. Nitsche, The smallest sphere containing a rectifiable curve, Amer. Math. Monthly, 78 (1971)
881–882. MR 45#480.
[19] H. Rutishauser and H. Samelson, Sur le rayon d’une sphére dont la surface contient une courbe fermée,
C. R. Acad. Sci. Paris, 227 (1948) 755–757. MR 10, 321c; correction, MR 10, p.856.
[20] Jonathan Schaer and John E. Wetzel, Boxes for curves of constant length, Israel J. Math., 12 (1972)
257–265. MR 47#5726.
[21] B. Segre, Sui circoli geodetici di una superficie a curvatura totale constante, che contengono nell’interno
una linea assegnata, Boll. Un. Mat. Ital., 13 (1934) 279–283. Zbl 10, 271.
[22] Philip C. Tonne, A simple closed curve on a hemisphere, Houston J. Math., 10 (1984) 585.
MR 86b:53004.
[23] Nik Weaver, Lipschitz algebras, World Scientific Publishing, River Edge, NJ, 1999, xiv+223 pp., ISBN:
981-02-3873-8. MR 2002g:46002.
[24] Alan Weinstein, Almost invariant submanifolds for compact group actions, J. Eur. Math. Soc. 2 (2000)
53–86. MR 2002d:53076.
[25] John E. Wetzel, Covering balls for curves of constant length, Enseignement Math., (2) 17 (1971) 275–
277. MR 48#12315.
George M. Bergman
Department of Mathematics
University of California
Berkeley, CA 94720-3840
gbergman@math.berkeley.edu
	The definition, and three examples.
	General properties of mapping radii.
	Some explicit mapping radii.
	Realizability of mapping radii.
	Related literature (and one more question).
	Appendix: The Arens-Eells space of X.
	Appendix: Translating convex sets to 0.
	Acknowledgements.
ABSTRACT
  It is known that every closed curve of length \leq 4 in R^n (n>0) can be
surrounded by a sphere of radius 1, and that this is the best bound. Letting S
denote the circle of circumference 4, with the arc-length metric, we here
express this fact by saying that the "mapping radius" of S in R^n is 1.
  Tools are developed for estimating the mapping radius of a metric space X in
a metric space Y. In particular, it is shown that for X a bounded metric space,
the supremum of the mapping radii of X in all convex subsets of normed metric
spaces is equal to the infimum of the sup norms of all convex linear
combinations of the functions d(x,-): X --> R (x\in X).
  Several explicit mapping radii are calculated, and open questions noted.

<|endoftext|><|startoftext|>
Introduction
Total intensity surveys reveal a number of radio spurs that can be joined into small
circles on the sky, so-called radio loops. One of these is Loop I, which has an intriguing
filament called the North-Polar Spur (NPS). It has been concluded by several authors (e.g.
Berkhuijsen et al. 1971; Heiles 1979; Salter 1983, and references therein) that the radio loops
are correlated with expanding gas and dust shells, energized by supernovae or stellar winds.
Accordingly, the Loop I superbubble was attributed to stellar winds from the SCO-CEN OB
association and supernova activity in the same vicinity, with the NPS being the brightest
segment of a supernova remnant (SNR). Its magnetic field (B-field) has been modelled by
Spoelstra (1972) and Heiles (1998) based on radio and optical polarization data, suggesting
that the local B-field is deformed by an expanding shell. According to a model by Weaver
1Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada
T6G 2V4.
2National Research Council Canada, Herzberg Institute of Astrophysics, Dominion Radio Astrophysical
Observatory, P.O. Box 248, Penticton, BC, V2A 6J9, Canada; maik.wolleben@nrc-cnrc.gc.ca
http://arxiv.org/abs/0704.0276v1
– 2 –
(1979), an SNR, produced by a member of the SCO-CEN association, has expanded inside
the Loop I bubble. Its shell is just beginning to encounter and interact with the surface
of the surrounding H I shell (GS 331+14-15). There has been a debate over the expansion
velocity (vexp) of GS 331+14-15: Weaver gives vexp ≈ 2 km s
−1, based on data from the
northern hemisphere, while Heiles (1984) and Sofue et al. (1974) find a higher vexp of 19 and
25 km s−1. In a picture by Bochkarev (1987) the H I shell has already passed the Sun.
The relatively low expansion velocity of the neutral gas surrounding Loop I implies an
age of some 106 years and hence suggests the NPS to be part of an old SNR. At this age,
however, an SNR is well beyond its radiative phase and its optical and radio emission become
invisible. On the other hand, X-ray and radio emission have been observed towards the NPS,
indicating an age at least one order of magnitude less. Based on X-ray data, Borken & Iwan
(1977) suggest that Loop I is an old but recently reheated SNR. This was refined by Egger
(1995) who argue that Loop I is caused by a shock from a recent supernova, heating the
inner wall of the SCO-CEN supershell, and giving rise to the prominent X-ray feature of the
Recently, Wolleben et al. (2006) published the new DRAO Low-Resolution Polarization
Survey of the northern sky at 1.4 GHz1, which is part of the Canadian Galactic Plane Survey
(Taylor et al. 2003). Making use of these new data as well as the WMAP polarization data
at 23 GHz (Page et al. 2006), a new and detailed study of the Loop I region can be
performed. In this paper I describe a model consisting of two interweaving magnetic shells,
which explains some newly detected polarization and depolarization structures, correctly
predicts the shape of the NPS, as well as providing a clue to the origin of X-ray emission
observed towards the NPS.
2. Large Polarized Structures In The DRAO Survey
The polarization data used in this paper are shown in Figures 1 and 2 The NPS has
long been known to be one of the most intensely polarized features of the sky. Its filaments
cover large parts of the northern Galactic hemisphere at latitudes of b & 30◦, with an abrupt
drop of polarized emission below 30◦ latitude at 1.4 GHz. Surprisingly, structures of the
NPS are very evident in polarized intensity but its filamentary structure is not detectable
in polarization angle. In the following, previously unknown polarization features detected in
1See also Wolleben (2005). The data are publicly available and can be downloaded at:
http://www.drao.nrc.ca/26msurvey, http://www.mpifr-bonn.mpg.de/div/konti/26msurvey, or from
CDS, Strasbourg (Wolleben et al. 2005)
http://www.drao.nrc.ca/26msurvey
http://www.mpifr-bonn.mpg.de/div/konti/26msurvey
– 3 –
the DRAO polarization survey, which are of relevance for this study, are described:
2.1. High Latitude Polarized Emission (HLPE)
In Fig. 1 (top), towards the Galactic poles (b = ±90◦), a systematic pattern of polarized
emission and polarization angle is visible (HLPE, hereafter). At 408 MHz, the all-sky map
of Haslam et al. (1982) shows a weak counterpart of the HLPE in total intensity (Stokes-I)
with excess emission at the Galactic poles of ∼ 4 K (northern pole) and ∼ 5 K (southern
pole). The Stockert survey at 1420 MHz (Reich & Reich 1986) also shows weak Stokes-I
emission at the poles of ∼ 60 mK (northern pole) and ∼ 200 mK (southern pole). These
temperatures imply a spectral index of −3.4 and −2.6 for the northern and southern HLPE,
respectively, and thereby suggest that the HLPE is synchrotron emission.
What could cause enhanced synchrotron emission around the Galactic poles – the
HLPE?
First, the HLPE could be (polarized) synchrotron emission from the local spiral arm,
which would produce a similar pattern of polarization angles towards the Galactic poles.
Second, the HLPE could be caused by the Galactic halo. However, neither can explain
the distinct boundaries of the HLPE, which are observed in the second and third Galactic
quadrant (cf. thin white lines in Fig. 1 top). In fact, synchrotron emission from the halo
or local spiral arm should be observable at all latitudes where the local Galactic B-field is
perpendicular to the line-of-sight, and not just at the poles.
2.2. New Radio Loop
A previously unknown filament of polarized emission in the southern Galactic hemi-
sphere (hereafter referred to as “New Loop”) is revealed by the DRAO polarization survey.
The New Loop has an excess polarized intensity of ∼ 170 mK at 1.4 GHz relative to its
surroundings. The elongated shape of the New Loop can be fitted by a small circle. From a
fit by eye, the center of this circle is at l = 345◦, b = 0◦, with a radius of 65◦ (dashed line
in Fig. 1 top). Faint counterparts can be found in total intensity at 408 MHz (∼ 5 K) and
1420 MHz (∼ 250 mK), which gives a spectral index of −2.4. This filament can also be seen
in the WMAP polarization data at 23 GHz.
– 4 –
2.3. Depolarization Band
A minimum of polarized emission from the first Galactic quadrant from ∼ 0◦ to ∼ 70◦ at
1.4 GHz, within a band confined by remarkably sharp boundaries at b ≈ ±30◦ (cf. Fig 1 top),
is seen with great clarity for the first time. The percentage polarization within this area is
around 3%, compared to 20 - 30% outside the band. Obviously, strong depolarization within
the Galactic plane must “destroy” the polarized emission from the first Galactic quadrant.
The Depolarization Band is prominent in the DRAO data at 1.4 GHz but not seen in the
WMAP data at 23 GHz, which ephasises the assumption that it is caused by depolarization
due to Faraday rotation. The length of the line-of-sight makes differential Faraday rotation
a likely depolarization mechanism2. However, the absence of polarized emission from the
NPS below b = 30◦, in the DRAO data at 1.4 GHz as well as in the WMAP data at 23 GHz,
cannot be attributed solely to the Depolarization band. It must reflect an inherent property
of the emission region. This is explained by the model presented in this paper.
3. The Model
The model consists of two synchrotron emitting shells: S1 and S2. The shells are
spherical with constant shell thickness. No emission is produced outside the shells. The Sun
resides within S1 between its inner and outer surface (see Fig. 3 and 4). Theoretical models
of magnetic fields in supershells (e.g. Ferriere et al. 1991; Tomisaka 1992) predict that the
ambient B-field is pushed by the shock wave and compressed within the expanding shell.
The model described here resembles these theoretical B-fields in the simplest way: Magnetic
field lines run from the polar caps (the magnetic poles) along longitudes of the shell. The
appearance of the B-field of the shell, that is, its B⊥ and B‖ components, depends on the
vantage point. In case of a large nearby shell the projection can result in a complicated
B-field pattern on the sky.
Each shell is described by 8 parameters (Tab. 1): the center coordinates l, b, d; the
inner (rin) and outer (rout) radius of the shell; the angle between the B-field and the line-
of-sight to the Galactic center (Bθ) and to Galactic north (Bφ); and two scaling factors
describing the intrinsic brightness of each shell. For simplicity, it is assumed that there is
no Faraday rotation within the shells. The model described here is an attempt to reproduce
the polarized emission – the emission tracing the regular B-field – because this model does
2Differential Faraday rotation occurs if synchrotron emitting and Faraday rotating regions are mixed. In
this case, polarization vectors with different orientation may superimpose and cancel each other (see also
Burn 1966).
– 5 –
not take irregular or turbulent B-field components into account. Therefore, the model must
be fitted to polarization maps, as is done here. Moreover, polarization maps at frequencies
around 1 GHz or less are believed to show rather local emission
due to depolarization, which helps avoiding confusion with unrelated background emis-
sion.
The model uses different field orientations in the two shells. Where the shells overlap in
space this is obviously not correct. But the complex shape of the edge of the Local Bubble (cf.
Fig 3) suggests a very complex evolution, probably the outcome of many earlier stellar winds
and supernova events. Any previous large-scale magnetic field has probably been tangled by
this complex evolution. Presumably the Local Bubble is not a unique circumstance, so the
implication is that most regions in the Galaxy are not characterized by a large-scale quasi-
uniform component. The two-field model proposed is a simple approximation to a complex
reality, and is unlikely to be correct over the full extent of the shells. However, the good fit
to the data demonstrates that it is an adequate model for most of the Loop I region.
The model was used to calculate the integrated Stokes U and Q values for each line-
of-sight through the shell complex. Polarized intensity and polarization angle were derived
from these integrated values, thereby accounting for depolarization due to superposition of
differently oriented polarization vectors along the line-of-sight (see Appendix for a more
detailed description). In the northern hemisphere, DRAO data were preferred over WMAP
data because of the better sensitivity. Some regions in the DRAO maps were masked out
however: the “Fan-Region” (see Fig. 1), the two H II-regions S 27 (l = 4◦, b = 22◦) and S 7
(l = 350◦, b = 22◦), and a ±3◦ strip along the Galactic plane. In the southern hemisphere,
23 GHz WMAP data (scaled to 1.4 GHz using a spectral index of −3) were preferred over
the 1.4 GHz polarization survey of Testori et al. (2004) because the western caps lie at b ≈ 0◦
where depolarization at 1.4 GHz is likely to be strong. In the WMAP data the Galactic plane
was masked out because a simple “scaling up” of WMAP data from 23 GHz to 1.4 GHz,
without taking depolarization effects into account, would result in wrong polarized intensities
in regions where Faraday rotation along the line-of-sight is high. The southern Galactic pole
in the WMAP was masked out because here sensitivity is too low to permit accurate fitting.
Fitting was done by computer, applying an algorithm that searched for the minimum
of the square-root of the sum of the differences between modelled and observed polarized
intensities. The algorithm randomly modified model parameters until a good fit was achieved.
Different initial start values for the fit were chosen to evaluate the uniqueness of the best-fit.
In order to determine the confidence ranges of the best-fit parameters, models with slightly
different sets of parameters were tried and visually inspected.
– 6 –
4. Discussion
Figures 2 and 5 show polarized intensity and polarization angle maps of the best-fit
model. At high Galactic latitudes the HLPE is correctly reproduced by S1. Only S1 produces
HLPE because the Sun is located inside this shell, leading to local emission around the
Sun. S2 produces the polarized emission of the NPS. At intermediate latitudes, where the
line-of-sight through the B⊥ component of S1 is longest, the New Loop is seen. At low
latitudes (|b| . 30◦) synchrotron emission from the two shells is reduced because the path
lengths through the shells are short and B⊥ is small. Although the model was only fitted to
polarized intensities, the polarization angles at intermediate and high Galactic latitudes are
remarkably well reproduced.
The predicted western cap region does not fit the observations as well as the eastern cap
region. However, the western caps are likely to be inside the Local Bubble, while the eastern
caps may have expanded into a denser medium. Hence, the “real” shells are not likely to
be of perfectly spherical shape. However, a comprehensive model which includes the full
complexity of the ISM around Loop I is beyond the scope if this paper, and is probably
impossible for want of adequate data.
Portions of the two proposed synchrotron shells currently overlap in space, which, pro-
jected onto the sky, results in a ring-like region in the northern Galactic hemisphere (see
Fig. 5 bottom). This region roughly agrees with the location of 1.5 keV X-ray emission from
the NPS. It is therefore suggested that the X-ray emission is produced by recent interaction
of the two shells within this region.
5. Possible Formation History
The SCO-CEN association can be divided into three subgroups: Lower Centaurus Crux
(LCC), Upper Centaurus Lupus (UCL), and Upper Scorpius (US). Máız-Apellániz (2001)
calculated the positions of the center of each subgroup in the past, taking the effects of solar
motion, Galactic rotation, and motions in the z-direction into account. Accordingly, the
stars whose paths come closest to the center of S1 are those of the LCC3, which crossed this
point 6± 2 Myr ago at a distance of 70 pc from the Sun. Closest to the center of S2 is the
US group at its current location (l = 350◦, b = 20◦, 145 pc away from the Sun). Stellar
activity within the LCC subgroup started 11 to 12 Myr ago, and 5 to 6 Myr ago within the
3Note that the scattering of members of these subgroups across the sky means that the UCL is almost
as good a candidate.
– 7 –
US (de Geus et al. 1989).
The picture that emerges from these data can be summarized as follows. The first
supernovae are expected to take place about 3-5 Myr after formation of an association.
Thus, about 7 to 8 Myr ago, supernova explosions began to occur within the LCC and,
subsequently, delivered enough energy to inflate a bubble around LCC – the S1-shell. The
US, instead, has only recently reached its evolutionary time scale and has just begun inflating
the Loop I bubble (S2). The shock front of Loop I (S2) hit the LCC bubble (S1) just recently
(104 years ago or less), giving rise to the X-ray emission observed toward the NPS.
In this picture S1 is about 6 Myr old, based on the positional coincidence with the LCC
at this time. S2 is 1-2 Myr old, based on the the age of the US. This suggests that S1 is more
evolved than S2, which may explain why almost no observational tracers of S1 can be found
in total intensity surveys. S1 seems to be an almost dissolved shell, whose last observable
remnants are the HLPE and the New Radio Loop.
The scenario proposed here has compelling similarities to the models of H I shells in
this region by Weaver (1979) and Bochkarev (1987), mentioned earlier in the Introduction.
The ages derived above and best-fit radii of the model give expansion velocities of 8 and
32 km s−1 for S1 and S2, respectively, using the standard model for the kinematic age of
stellar wind bubbles with tkin = 0.6R/vexp (Weaver et al. 1977). Taking into account the
uncertainties in this estimate, these velocities agree with the velocities of 2 km s−1 (Weaver
1979) and 19-25 km s−1 (Heiles 1984; Sofue et al. 1974) found for H I gas in this region. This
may imply that the H I distribution in this region is the result of two superimposed H I
shells that are expanding with different velocities.
6. Conclusions
Based on an analysis of the new DRAO polarization survey and scaled WMAP data, a
model consisting of two synchrotron emitting shells is developed that reproduces large-scale
structures in the polarized sky. One of these shells, S1, has reached the Sun and gives rise
to polarized synchrotron emission at high Galactic latitudes, the HLPE described in this
paper. Where the path length through S1 is longest and its B-field is perpendicular to the
line-of-sight over its whole length in the emitting region, a new radio loop is detected in
polarized intensity. A younger shell, S2, correctly reproduces emission from the NPS. A
scenario is proposed in which S2 has recently started to interact with S1, possibly giving
rise to the observed X-ray emission. The model also predicts correctly the low polarization
from S1 and S2 within the Depolarization Band. The picture of the local ISM suggested in
– 8 –
this paper is in agreement with previous studies outlined in the Introduction, although the
two-shell geometry of the model makes a comparison with previous models difficult.
The author would like to thank T. L. Landecker, E. M. Berkhuijsen, R. Kothes, A. D.
Gray, and P. Vaudrevange for comments on the manuscript. The Dominion Radio Astrophys-
ical Observatory is a National Facility operated by the National Research Council Canada.
The Canadian Galactic Plane Survey is a Canadian project with international partners, and
is supported by the Natural Sciences and Engineering Research Council (NSERC). I acknowl-
edge the use of the Legacy Archive for Microwave Background Data Analysis (LAMBDA).
Support for LAMBDA is provided by the NASA Office of Space Science.
A. Magnetic Field Used In The Model
In this model an approximation for the magnetic field ~Bi inside an expanding shell is
used. Its strength and direction depend on the direction (θ, ϕ) of the ambient magnetic field
B̃ as well as the inner and outer radii of the shell (rin, rout). Looking at Fig. 6, it is easy
to see that the magnitude of ~Bi is maximal / zero where the expansion is perpendicular /
parallel to B̃. For |~x| > rout and |~x| < rin the magnetic field strength ~Bi is assumed to be
zero.
The direction of ~Bi is given by
B̂i =
|~ρi|
, (A1)
where ~ρi = b̂ − (b̂ · x̂)x̂ with b̂ the unit vector of the ambient field B̃ and x̂ the unit vector
pointing towards the i-th line-of-sight element. The field strength of ~Bi can be evaluated to
ni = (b̂ · x̂)
2, (A2)
giving rise to the Stokes parameters
ui ∝ ni B
(γ+1)/2
⊥ sin
2 arctan
qi ∝ ni B
(γ+1)/2
⊥ cos
2 arctan
for the i-th line-of-sight. Here B⊥ =
B2l +B
b and Bl, Bb are the projections of
~Bi onto
the line-of-sight. An energy spectral index of the synchrotron radiation of γ ≈ 2.8 is taken.
Assuming no intrinsic Faraday rotation, the polarized intensity and the polarized position
angle is finally found
U2 +Q2,
PA = 1
arctan
– 9 –
where U =
ui and Q =
REFERENCES
Berkhuijsen, E. M., Haslam, C. G. T., & Salter, C. J. 1971, A&A, 14, 252
Bochkarev, N. G. 1987, Ap&SS, 138, 229
Borken, R. J., & Iwan, D.-A. C. 1977, ApJ, 218, 511
Burn, B. J. 1966, MNRAS, 133, 67
Egger, R. J. 1995, ASP Conf. Ser. 80: The Physics of the Interstellar Medium and Inter-
galactic Medium, 80, 45
Ferriere, K. M., Mac Low, M.-M., & Zweibel, E. G. 1991, ApJ, 375, 239
de Geus, E. J., de Zeeuw, P. T., & Lub, J. 1989, A&A, 216, 44
Haslam, C. G. T., Stoffel, H., Salter, C. J., & Wilson, W. E. 1982, A&AS, 47, 1
Heiles, C. 1979, ApJ, 229, 533
Heiles, C. 1984, ApJS, 55, 585
Heiles, C. 1998, LNP Vol. 506: IAU Colloq. 166: The Local Bubble and Beyond, 506, 229
Máız-Apellániz, J. 2001, ApJ, 560, L83
Page, L., et al. 2006, ArXiv Astrophysics e-prints, arXiv:astro-ph/0603450
Reich, P., & Reich, W. 1986, A&AS, 63, 205
Salter, C. J. 1983, Bulletin of the Astronomical Society of India, 11, 1
Sfeir, D. M., Lallement, R., Crifo, F., & Welsh, B. Y. 1999, A&A, 346, 785
Snowden, S. L., et al. 1995, ApJ, 454, 643
Sofue, Y., Hamajima, K., & Fujimoto, M. 1974, PASJ, 26, 399
Spoelstra, T. A. T. 1972, A&A, 21, 61
Taylor, A. R., et al. 2003, AJ, 125, 3145
http://arxiv.org/abs/astro-ph/0603450
– 10 –
Testori, J. C., Reich, P., & Reich, W. 2004, in The Magnetized Interstellar Medium, ed. B.
Uyaniker, W. Reich, & R. Wielebinski, 57
Tomisaka, K. 1992, PASJ, 44, 177
Weaver, R., McCray, R., Castor, J., Shapiro, P., & Moore, R. 1977, ApJ, 218, 377
Weaver, H. 1979, IAU Symp. 84: The Large-Scale Characteristics of the Galaxy, 84, 295
Wolleben, M. 2005, Ph.D. Thesis,
Wolleben, M., Landecker, T. L., Reich, W., & Wielebinski, R. 2005, VizieR Online Data
Catalog, 344, 80411
Wolleben, M., Landecker, T. L., Reich, W., & Wielebinski, R. 2006, A&A, 448, 411
This preprint was prepared with the AAS LATEX macros v5.2.
– 11 –
Table 1. Best-fit parameters of the model.
l b d rin rout Bθ Bφ
Shell (deg) (deg) (pc) (pc) (pc) (deg) (deg)
S1 346 3 78 72 91 71 −72
±5 ±5 ±10 ±10 ±10 ±30 ±30
S2 347 37 95 63 87 25 25
±15 ±15 ±10 ±15 ±10 ±30 ±30
– 12 –
Fig. 1.— Map of polarized intensity in units of mK (top) and polarization angle (bottom) in
Galactic coordinates, taken from the DRAO polarization survey at 1.4 GHz (northern hemi-
sphere) and the WMAP polarization data smoothed to 4◦ resolution (southern hemisphere).
The data are shown in rectangular projection to expose structures around the Galactic poles.
The polarized intensity map exhibits three extended features: the NPS (from l = 310◦ to
50◦ and b = 30◦ to 80◦), the New Radio Loop (centered at l = 40◦, b = −55◦, about 40◦
in diameter), and the so-called “Fan-Region” (centered at l = 140◦, b = 5◦, about 60◦ in
diameter). The solid lines mark the observable extent of the HLPE in the second and third
Galactic quadrant. The dashed line shows a circle fitted through the New Loop and one of
the two extended “arms” of the NPS (at about b = 70◦). Black contours in the bottom panel
show 1.5 keV X-ray emission associated with the NPS at levels of 200 and 350×10−6counts/s
(taken from Snowden et al. 1995).
– 13 –
Fig. 2.— Observed (left) and modelled (right) maps of the northern (top) and southern
(bottom) Galactic poles with polarization vectors overlayed. The contours show polarized
intensity from the DRAO and WMAP data from 50 to 400 mK, in steps of 50 mk.
– 14 –
Fig. 3.— The sketch displays cuts through the model viewed looking down on the Galactic
plane towards negative latitudes (top) and the vertical plane through the Sun and perpen-
dicular to the line-of-sight looking towards l = 90◦ (bottom). The two shells are indicated by
solid (S1) and dashed (S2) lines. The Sun is indicated by the circled dot. Thin lines indicate
the B-field orientation of each shell (the B-field component parallel to the image plane). The
stippled region shows the Na I distribution around the Local Bubble (taken from Sfeir et al.
1999). Filled triangles and squares show the centers of the LCC and US subgroup today,
5 Myr ago, and 10 Myr ago (only for LCC) (taken from Máız-Apellániz 2001).
– 15 –
Fig. 4.— An “engineering drawing” to help the reader depict the 3-dimensional structure
of the two proposed shells. The solid line indicates S1, and the dashed line indicates S2.In
contrast to Fig. 3 these drawings show projections of the two shells rather than cuts through
the Galaxy. The Sun is located at the origin of the coordinate system. The figure shows
the shells as seen from the Galactic anti-center towards the Galactic center (top left), seen
sideways (top right), and seen from above (bottom).
– 16 –
Fig. 5.— Modelled maps of polarized intensity in units of mK (top) and polarization angle
(bottom). The dashed line from Fig. 1 is repeated in the top panel. The black contour in the
bottom panel indicates the region where the S1 and S2 shells overlap in space. The model
was merely fitted to polarized intensity which means that “Fig. 5 top” (model) was fitted
to “Fig. 1 top” (survey). Nevertheless, the predicted polarization angles (“Fig. 5 bottom”)
resemble the observed pattern (“Fig. 1 bottom”) remarkably well. The Fan-Region is not
part of the model described in this paper.
– 17 –
Fig. 6.— Geometry of the spherical shell. rin and rout are the inner and outer radii of the
shell, respectively; B̃ is the background magnetic field and b̂ is its unit vector; ~Bi is the
magnetic field inside the shell along the i-th line-of-sight element; and x̂ is the unit vector
pointing towards the i-th line-of-sight element. Region I and II indicate regions with strong
and weak B-fields, respectively.
	Introduction
	Large Polarized Structures In The DRAO Survey
	High Latitude Polarized Emission (HLPE)
	New Radio Loop
	Depolarization Band
	The Model
	Discussion
	Possible Formation History
	Conclusions
	Magnetic Field Used In The Model
ABSTRACT
  The North Polar Spur (NPS) is the brightest filament of Loop I, a large
circular feature in the radio continuum sky. In this paper, a model consisting
of two synchrotron emitting shells is presented that reproduces large-scale
structures revealed by recent polarization surveys. The polarized emission of
the NPS is reproduced by one of these shells. The other shell, which passes
close to the Sun, gives rise to polarized emission towards the Galactic poles.
It is proposed that X-ray emission seen towards the NPS is produced by
interaction of the two shells. Two OB-associations coincide with the centers of
the shells. A formation scenario of the Loop I region is suggested.

<|endoftext|><|startoftext|>
Introduction
Let F be a family of sets. The Helly number h(F) of F is the minimal
positive integer h such that if a finite subfamily K ⊂ F satisfies
K′ 6= ∅
for all K′ ⊂ K of cardinality ≤ h, then
K 6= ∅. Helly’s classical theorem
(1913, see e.g. [3]) asserts that the Helly number of the family of convex sets
in Rd is d+ 1.
Helly’s theorem and its numerous extensions are of central importance
in discrete and computational geometry (see [3, 10]). It is of considerable
interest to understand the role of convexity in these results, and to find
suitable topological extensions. Indeed, it is often the case that topological
methods provide a deeper understanding of the underlying combinatorics
behind Helly type theorems. Helly himself realized in 1930 (see [3]) that in
his theorem, convex sets can be replaced by topological cells if you impose
the additional requirement that all non-empty intersections of these cells are
again topological cells. Helly’s topological version of his theorem also follows
from the later nerve theorems of Borsuk, Leray and others (see below).
The following result was conjectured by Grünbaum and Motzkin [8] , and
proved by Amenta [1]. A family of sets G is an (F , r)-family if for any finite
G ′ ⊂ G, the intersection
G ′ is a union of at most r disjoint sets from F .
Theorem 1.1 (Amenta). Let F be the family of compact convex sets in Rd.
Then for any (F , r)-family G
h(G) ≤ r(d+ 1) .
The main motivation for the present paper was to find a topological ex-
tension of Amenta’s Theorem.
Let X be a simplicial complex on the vertex set V . The induced sub-
complex on a subset of vertices S ⊂ V is X [S] = {σ ∈ X : σ ⊂ S}. The
link of a subset A ⊂ V is lk(X,A) = {τ ∈ X : τ ∪ A ∈ X, τ ∩ A = ∅ } .
The geometric realization of X is denoted by |X|. We identify X and |X|
when no confusion can arise. All homology groups considered below are with
rational coefficients, i.e. Hi(X) = Hi(X ;Q) and H̃i(X) = H̃i(X ;Q).
The rational Leray number L(X) ofX is the minimal d such that H̃i(Y ) =
0 for all induced subcomplexes Y ⊂ X and i ≥ d. The Leray number can be
regarded as a simple topologically based “complexity measure” of X . Note
that L(X) = 0 iff X is a simplex, and L(X) ≤ 1 iff X is the clique complex
of a chordal graph (see [9]). It is well-known (see e.g. [7]) that L(X) ≤ d
iff H̃i(lk(X, σ)) = 0 for all σ ∈ X and i ≥ d. Leray numbers have also
significance in commutative algebra, since L(X) is equal to the Castelnuovo-
Mumford regularity of the Stanley-Reisner ring of X over Q (see [7]).
From now on we assume that V1, . . . , Vm are finite disjoint 0-dimensional
complexes, and denote their join by V1 ∗ · · · ∗ Vm. Let ∆m−1 be the simplex
on the vertex set [m] = {1, . . . , m}, and let π denote the simplicial projection
from V1 ∗ · · · ∗ Vm onto ∆m−1 given by π(v) = i if v ∈ Vi. For a subcomplex
X ⊂ V1 ∗ · · ·∗Vm, let r(X, π) = max{|π
−1(π(x))| : x ∈ |X|}. Our main result
is the following
Theorem 1.2. Let Y = π(X) and r = r(X, π). Then
L(Y ) ≤ rL(X) + r − 1 . (1)
Example: For r ≥ 1, d ≥ 2 let m = rd, and consider a partition [m] =
k=1Ak with |Ak| = d. For i ∈ [m] let Vi = {i} × [r]. Denote by ∆(A) the
simplex on vertex set A, with boundary ∂∆(A) ≃ S |A|−2. For k, j ∈ [r] let
Akj = Ak × {j}, and let
Xk = ∆(A1k) ∗ · · · ∗∆(Ak−1,k) ∗ ∂∆(Akk) ∗∆(Ak+1,k) ∗ · · · ∗∆(Ark) .
Let X =
k=1Xk. Then L(X) = d − 1, and the projection π : X → ∆m−1
satisfies r(X, π) = r. Since π(X) = ∂∆m−1, it follows that L(π(X)) = m−1.
Hence equality is attained in (1).
As mentioned earlier, Theorem 1.2 is motivated by an application in com-
binatorial geometry. The nerve N(F) of a family of sets F , is the simplicial
complex whose vertex set is F and whose simplices are all F ′ ⊂ F such that
F ′ 6= ∅. It is easy to see that
h(F) ≤ 1 + L(N(F)). (2)
A finite family F of compact sets in some topological space is a good cover if
for any F ′ ⊂ F , the intersection
F ′ is either empty or contractible. If F
is a good cover in Rd, then by the Nerve Lemma (see e.g. [2]) L(N(F)) ≤ d,
hence follows the Topological Helly’s Theorem: h(F) ≤ d + 1. Theorem 1.2
implies a similar topological generalization of Amenta’s theorem.
Theorem 1.3. Let F is a good cover in Rd. Then for any (F , r)-family G
h(G) ≤ r(d+ 1) .
The proof of Theorem 1.2 combines a vanishing theorem for the multiple
point sets of a projection, with an application of the image computing spec-
tral sequence due to Goryunov and Mond [5]. In Section 2 we describe the
Goryunov-Mond result. In Section 3 we prove our main result, Proposition
3.1, which is then used to deduce Theorem 1.2. The proof of Theorem 1.3 is
given in Section 4.
2 The Image Computing Spectral Sequence
For X ⊂ V1 ∗ · · · ∗ Vm and k ≥ 1 define the multiple point set Mk by
Mk = {(x1, . . . , xk) ∈ |X|
k : π(x1) = · · · = π(xk)} .
Let W be a Q-vector space with an action of the symmetric group Sk.
Denote Alt = 1
sign(σ)σ ∈ Q[Sk]. Then
AltW = {Altw : w ∈ W} =
{w ∈ W : σw = sign(σ)w for all σ ∈ Sk} . (3)
The natural action of Sk on Mk induces an action on the rational chain
complex C∗(Mk) and on the rational homology H∗(Mk). The idempotence of
Alt implies that
Alt H∗(Mk) ∼= H∗(AltC(Mk)) . (4)
The following result is due to Goryunov and Mond [5] (see also [4] and [6]).
Theorem 2.1 (Goryunov and Mond). Let Y = π(X) and r = r(X, π). Then
there exists a homology spectral sequence {Erp,q} converging to H∗(Y ) with
E1p,q =
AltHq(Mp+1) 0 ≤ p ≤ r − 1, 0 ≤ q
0 otherwise
Remark: The E1 terms in the original formulation of Theorem 2.1 in [5],
are given by E1p,q = AltHq(D
p+1) where
k = closure{(x1, . . . , xk) ∈ |X|
k : π(x1) = · · · = π(xk), xi 6= xj for i 6= j} .
The isomorphism
AltHq(D
p+1) ∼= AltHq(Mp+1)
which implies (5), is proved in Theorem 3.4 in [6]. Indeed, as noted there, the
inclusionDp+1 → Mp+1 induces an isomorphism Alt Cq(D
p+1) ∼= AltCq(Mp+1)
already at the alternating chains level.
3 Homology of the Multiple Point Set
In this section we study the homology of a generalization of the multiple
point set. For subcomplexes X1, . . . , Xk ⊂ V1 ∗ · · · ∗ Vm, let
M(X1, . . . , Xk) = {(x1, . . . , xk) ∈ |X1| × · · · × |Xk| : π(x1) = · · · = π(xk)} .
In particular, if X1 = · · · = Xk = X then M(X1, . . . , Xk) = Mk.
We identify the generalized multiple point set M(X1, . . . , Xk) with the
simplicial complex whose p-dimensional simplices are {wi0 , . . . , wip}, where
1 ≤ i0 < · · · < ip ≤ m, wij = (vij ,1, . . . , vij ,k) ∈ V
and {vi0,r, . . . , vip,r} ∈ Xr
for all 1 ≤ r ≤ k. The main ingredient in the proof of Theorem 1.2 is the
following
Proposition 3.1. H̃j
M(X1, . . . , Xk)
= 0 for j ≥
i=1 L(Xi).
The proof of Proposition 3.1 depends on a spectral sequence argument
given below. We first recall some definitions. Let K be a simplicial complex.
The subdivision sd(K) is the order complex of the set of the non-empty
simplices of K ordered by inclusion. For σ ∈ K let DK(σ) denote the order
complex of the interval [σ, ·] = {τ ∈ K : τ ⊃ σ}. Let
DK(σ) denote the
order complex of the interval (σ, ·] = {τ ∈ K : τ % σ}. Note that
DK(σ) is
isomorphic to sd(lk(K, σ)) via the simplicial map τ → τ − σ. Since DK(σ)
is contractible, it follows that Hi
DK(σ),
DK(σ)
∼= H̃i−1
lk(K, σ)
for all
i ≥ 0.
For σ ∈ V1 ∗ · · · ∗ Vm, let σ̃ =
i∈π(σ) Vi. Note that if σ2 ∈ X2, . . . , σk ∈ Xk
then there is an isomorphism
M(X1, σ2, . . . , σk) ∼= X1[∩
i=2σ̃i] . (6)
For 0 ≤ p ≤ n =
i=2 dimXi let
S ′p = {(σ2, . . . , σk) ∈ X2 × · · · ×Xk :
dim σi ≥ n− p}
and let Sp = S
p − S
p−1. For σ = (σ2, . . . , σk) ∈ S
p let
Aσ = M(X1, σ2, . . . , σk)×DX2(σ2)× · · · ×DXk(σk) ,
Bσ = M(X1, σ2, . . . , σk)×
DX2(σ2)× · · · ×
DXj (σj)× · · · ×DXk(σk)
Proposition 3.2. There exists a homology spectral sequence {Erp,q} converg-
ing to H∗
M(X1, . . . , Xk)
such that
E1p,q =
i1,...,ik≥0
i1+···+ik=p+q
i=2σ̃i]
H̃ij−1
lk(Xj, σj)
for 0 ≤ p ≤ n , 0 ≤ q , and E1p,q = 0 otherwise.
Proof: For 0 ≤ p ≤ n let
σ∈S′p
Aσ ⊂ M(X1, . . . , Xk)× sd(X2)× · · · × sd(Xk).
Write K = Kn, and consider the projection on the first coordinate θ : K →
M(X1, . . . , Xk). Let (x1, . . . , xk) ∈ M(X1, . . . , Xk), and let σi denote the
minimal simplex in Xi that contains xi. Then the fiber
(x1, . . . , xk)
= {(x1, . . . , xk)} ×DX2(σ2)× · · · ×DXk(σk)
is a cone, hence K is homotopy equivalent to M(X1, . . . , Xk). The filtration
∅ ⊂ K0 ⊂ · · · ⊂ Kn = K gives rise to a homology spectral sequence {E
converging to H∗(K) ∼= H∗(M(X1, . . . , Xm)). The E
p,q terms are computed
as follows. First note that
Kp−1 =
Bσ . (8)
Secondly,
Aσ −Bσ
∩ Aσ′ = ∅ for σ 6= σ
′ ∈ Sp. Hence
H∗(Aσ, Bσ) . (9)
Applying excision, (8),(9), and the Künneth formula we obtain:
E1p,q = Hp+q(Kp, Kp−1)
∼= Hp+q
Hp+q(Aσ, Bσ) ∼=
i1,...,ik≥0
i1+···+ik=p+q
M(X1, σ2, . . . , σk)
DXj (σj),
DXj(σj)
i1,...,ik≥0
i1+···+ik=p+q
i=2σ̃i]
H̃ij−1
lk(Xj , σj)
Proof of Proposition 3.1: If L(Xj) = 0 for all 1 ≤ j ≤ k, then all the
Xj’s are simplicies, say Xj = σj . It follows that M(X1, . . . , Xk) is isomorphic
to the simplex
j=1 π(σj) and thus has vanishing reduced homology in all
nonnegative dimensions. Suppose then that m =
j=1 L(Xj) > 0. Without
loss of generality we may assume that L(X1) > 0. Let i1, . . . , ik ≥ 0 such that
j=1 ij ≥ m. Then either i1 ≥ L(X1) and then Hi1
i=2σ̃i]
= 0, or there
exists a 2 ≤ j ≤ k such that ij − 1 ≥ L(Xj) and then H̃ij−1
lk(Xj, σj)
By (7) it follows that E1p,q = 0 if p + q ≥ m, hence H̃j
M(X1, . . . , Xk)
for all j ≥ m.
Remark: If all the Vj’s are singletons then M(X1, . . . , Xk) is isomorphic to
j=1Xj . Hence Proposition 3.1 implies the following result of [7].
Corollary 3.3 ([7]). If X1, . . . , Xk are simplicial complexes on the same
vertex set, then
Xj) ≤
L(Xj) .
Proof of Theorem 1.2: Let Y = π(X) and r = r(X, π). Assuming as we
may that L(X) > 0, we have to show that Hm(Y ) = 0 for m ≥ rL(X)+r−1.
By Theorem 2.1 it suffices to show that Alt Hq(Mp+1) = 0 for all pairs (p, q)
such that p ≤ r − 1 and p + q ≥ rL(X) + r − 1. Indeed, p ≤ r − 1 implies
that q ≥ rL(X) ≥ (p + 1)L(X), thus Hq(Mp+1) = 0 by Proposition 3.1.
4 A Topological Amenta Theorem
Proof of Theorem 1.3: Suppose G = {G1, . . . , Gm} is an (F , r)-family.
Write Gi =
j=1 Fij , where ri ≤ r and Fij
Fij′ = ∅ for 1 ≤ j 6= j
′ ≤ ri.
Let Vi = {Fi1, . . . , Firi} and consider the nerve
X = N({Fij : 1 ≤ i ≤ m, 1 ≤ j ≤ ri}) ⊂ V1 ∗ · · · ∗ Vm .
Let ∆m−1 be the simplex on the vertex set {G1, . . . , Gm} and let π denote
the projection of V1 ∗ · · · ∗Vm into ∆m−1 given by π(Fij) = Gi. Then π(X) =
N(G). Let y ∈ |N(G)| and let σ = {Gi : i ∈ I} be the minimal simplex in
N(G) such that y ∈ |σ|. Then
|π−1(y)| = |{ (ji : i ∈ I) :
Fiji 6= ∅ }| . (10)
On the other hand
(ji:i∈I)
Fiji (11)
and the union on the right is a disjoint union. The assumption that G is an
(F , r) family, together with (10) and (11), imply that |π−1(y)| ≤ r for all
y ∈ |N(G)|. Since F is a good cover in Rd, the Leray number of the nerve
satisfies L(X) = L(N(F)) ≤ d. Therefore by (2) and Theorem 1.2
h(G) ≤ 1 + L(N(G)) = 1 + L(π(X)) ≤
1 + rL(X) + r − 1 ≤ r(d+ 1) .
References
[1] N. Amenta, A short proof of an interesting Helly-type theorem. Discrete
Comput. Geom. 15(1996), 423–427.
[2] A. Björner, Topological methods, in Handbook of Combinatorics (R.
Graham, M. Grötschel, and L. Lovász, Eds.) , 1819–1872, North-
Holland, Amsterdam, 1995.
[3] J. Eckhoff, Helly, Radon and Carathéodory type theorems, in Handbook
of Convex Geometry (P.M. Gruber and J.M. Wills Eds.), North–Holland,
Amsterdam, 1993.
[4] V. Goryunov, Semi-simplicial resolutions and homology of images and
discriminants of mappings. Proc. London Math. Soc. 70(1995), 363–385.
[5] V. Goryunov, D. Mond, Vanishing cohomology of singularities of map-
pings. Compositio Math. 89(1993), 45–80.
[6] K. Houston, An introduction to the image computing spectral sequence.
Singularity theory (Liverpool, 1996), 305–324, London Math. Soc. Lec-
ture Note Ser., 263, Cambridge Univ. Press, Cambridge, 1999.
[7] G. Kalai, R. Meshulam, Intersections of Leray complexes and regu-
larity of monomial ideals, Journal of Combinatorial Theory Ser. A.,
113(2006), 1586–1592.
[8] B. Grünbaum, T. Motzkin, On components in some families of sets,
Proc. Amer. Math. Soc. 12(1961), 607–613.
[9] G. Wegner, d-Collapsing and nerves of families of convex sets, Arch.
Math. (Basel) 26(1975) 317–321.
[10] R. Živaljević, Topological methods, in Handbook of discrete and compu-
tational geometry, (J. Goodman and J. O’Rourke Eds.), 209–224, CRC
Press, Boca Raton, 1997.
ABSTRACT
  Let X be a simplicial complex on the vertex set V. The rational Leray number
L(X) of X is the minimal d such that the rational reduced homology of any
induced subcomplex of X vanishes in dimensions d and above. Let \pi be a
simplicial map from X to a simplex Y, such that the cardinality of the preimage
of any point in |Y| is at most r. It is shown that L(\pi(X)) \leq r L(X)+r-1.
One consequence is a topological extension of a Helly type result of Amenta.

<|endoftext|><|startoftext|>
Introduction
Spin foam models were first introduced as a space-time alternative to the spin network
description of states in loop quantum gravity [3]. The most studied spin foam models
are due to Barrett and Crane [8,9]. A spin foam is a discretization of space-time where
the fundamental degrees of freedom are the areas labelling its 2-dimensional faces.
An important goal in the investigation of spin foam models is to obtain predictions
that can be compared to the large scale, classical, or semiclassical behavior of gravity.
This work continues the numerical investigation of the physical properties of spin foam
models of Riemannian quantum gravity begun in [5–7, 13]. In this paper, we extend
the computations to the q-deformed Barrett-Crane model and to larger space-time
triangulations.
The main applications of q-deformation are two-fold. On the one hand, it can act
as a regulator for divergent models, as is apparent in the link between the Ponzano-
Regge [27] and Turaev-Viro [31] models. On the other hand, Smolin [30] has argued
that q-deformation is necessary to account for a positive cosmological constant. Both
of these aspects are explored in more detail in Section 2.2. A surprising result of our
work is evidence that the limit, as the cosmological constant is taken to zero through
positive values, is discontinuous.
Large triangulations are necessary to approximate semiclassical space-times. The
possibility of obtaining numerical results from larger triangulations takes us one step
closer to that goal and increases the number of facets from which the physical proper-
ties of a spin foam model may be examined. As an example, we are able to study how
the spin-spin correlation varies with the distance between faces in the triangulation.
http://arxiv.org/abs/0704.0278v1
This paper is structured as follows. We begin in Section 2 by reviewing the basics
of q-deformation and discussing in detail its aforementioned applications. Section 3
reviews the details of the Barrett-Crane model, summarizes the necessary changes
for its q-deformation, and defines several observables associated to spin foams. In
Section 4, we review the existing numerical simulation techniques and how they need
to be generalized to handle q-deformation and larger triangulations. Section 5 presents
the results of our numerical simulations. In Section 6, we give our conclusions and list
some avenues for future research. The Appendix briefly summarizes our notational
conventions and useful formulas.
2 Deformation of su(2)
In this section, we describe the q-deformation of the Lie algebra su(2) into the algebra
suq(2) (also denoted Uq(su(2))), the representations of suq(2), and the applications
of q-deformation. The deformations of spin(4) are then obtained through the isomor-
phism spin(4) ∼= su(2)⊕ su(2).
The following is part of the general subject of quantum groups [21]. Here we shall
concentrate solely on the su(2) and spin(4) cases.
2.1 The algebra suq(2) and its representations
The Lie algebra su(2) is generated by the well known Pauli matrices σi, which obey
the commutation relations
[σ+, σ−] = 4σ3, [σ3, σ+] = 2σ+, [σ3, σ−] = −2σ−, (1)
where σ± = σ1 ± iσ2. The universal enveloping algebra of su(2) is the associative
algebra generated by σ± and σ3 subject to the above identities, with the Lie bracket
being interpreted as [A,B] = AB −BA.
The q-deformed algebra suq(2) is constructed by replacing σ3 with another gen-
erator. Formally, it is thought of as Σ = q
σ3 , where q ∈ C with the exceptions
q 6= 0, 1,−1. The Lie bracket relations are replaced by the identities
[σ+, σ−] = 4
Σ2 − Σ−2
q − q−1
, Σσ+ = qσ+Σ, Σσ− = −qσ−Σ. (2)
We can rewrite q = 1+ 2ε and think of ε as a small complex number. Then, formally
at leading order in ε, the substitution Σ = q
σ3 = 1 + εσ3 + O(ε
2) reduces the
deformed identities (2) to the standard Lie algebra relations (1). The associative
algebra generated by σ± and σ3 subject to the deformed identities (2) is the algebra
suq(2).
For generic q, that is, when q is not a root of unity, the finite-dimensional irreducible
representations of suq(2) are classified by a half-integer, j = 0, 1/2, 1, 3/2, . . . , referred
to as the spin, in direct analogy with the representations of su(2) and the theory of
angular momentum. The dimension of the representation j is 2j + 1. When q =
exp(iπ/r) is a 2rth root of unity (ROU), where r > 2 is an integer called the ROU
parameter, the representations j are still defined, but become reducible for j > (r −
2)/2. They decompose into a sum of representations with spin at most (r − 2)/2 and
so-called trace 0 ones, whose nature will be explained below.
For the purposes of this paper we are concerned only with intertwiners between
representations of suq(2), i.e., linear maps commuting with the action of the algebra,
and their (quantum) traces1.
Any such intertwiner can be constructed from a small set of generators and elemen-
tary operations on them. These constructions, as well as traces, can be represented
graphically. Such graphs are called (abstract) spin networks. Their calculus is well
developed and is described in [18], whose conventions we follow throughout the paper
with one exception: we use spins (half-integers) instead of twice-spins (integers). A
brief review of our notation and conventions can be found in the Appendix.
Trace 0 representations of suq(2) are so called because the trace of an intertwiner
from such a representation to itself is always zero. Thus, they can be freely discarded,
as they do not contribute to the evaluation of q-deformed spin networks.
2.2 Applications of q-deformation
Deformation, especially with q = exp(iπ/r) a 2rth primitive ROU, is important for
spin foam models for at least two reasons. Replacing q = 1 by some ROU can act as
a regulator for a model whose partition function and observable values are otherwise
divergent. Also, suq(2) spin networks
2 naturally appear when considering a positive
cosmological constant in loop quantum gravity.
The original Ponzano-Regge model [27] attempts to express the path integral for
3-dimensional Riemannian general relativity as a sum over labelled triangulations of a
3-manifold. The edges of the triangulation are labelled by discrete lengths, identified
with spin labels of irreducible SU(2) representations. Each tetrahedron contributes a
6j-symbol factor to the summand, normalized to ensure invariance of the overall sum
under change of triangulation. Unfortunately, the Ponzano-Regge model turned out
to be divergent. Motivated by the construction of 3-manifold invariants, Turaev and
Viro were able to regularize the Ponzano-Regge model [1, 31] by replacing the SU(2)
6j-symbols with their q-deformed analogs at a ROU q. The key feature of the regu-
larization is the truncation of the summation to only the irreducible representations
of suq(2) of non-zero trace, which leaves only a finite number of terms in the model’s
partition function.
A version of the Barrett-Cranemodel, derived from a group field theory by De Pietri,
Freidel, Krasnov and Rovelli [16] (DFKR for short), was also found to be divergent.
A q-deformed version of the same model at a ROU q is similarly regularized (see Sec-
tion 3.2). Some numerical results for the regularized version of this model are given
in Section 5.2.
The argument linking q-deformation to the presence of a positive cosmological
constant is due to Smolin [29] and is given in more refined form in [30]. It is briefly
summarized as follows. Loop quantum gravity begins by writing the degrees of free-
dom of general relativity in terms of an SU(2) connection on a spatial slice and the
slice’s extrinsic curvature. A state in the Schrödinger picture, a wave function on
the space of connections, can be constructed by integrating the Chern-Simons 3-form
over the spatial slice. This state, known as the Kodama state, simultaneously satisfies
all the canonical constraints of the theory and semiclassically approximates de Sit-
1When q = 1, this notion of trace reduces up to sign to the usual trace of a linear map, but is
slightly different otherwise, cf. [10, Chapter 4].
2These are graphs embedded in a 3-manifold, labelled by representations of suq(2). They are
similar to but distinct from the abstract spin networks referred to above. See [4] for the distinction.
ter spacetime, which is a solution of the vacuum Einstein equations with a positive
cosmological constant. The requirement that the Kodama state also be invariant un-
der large gauge transformations implies discretization of the cosmological constant,
Λ ∼ 1/r, with r a positive integer. The coefficients of the Kodama state in the spin
network basis are obtained by evaluating the labelled graph, associated to a basis
state, as an abstract suq(2) spin network. Here the deformation parameter q is a
ROU, q = exp(iπ/r), where the ROU parameter r is identified with the discretization
parameter of the cosmological constant.
Given the heuristic link [4] between spin networks of loop quantum gravity and
spin foams, it is natural to q-deform a spin foam model as an attempt to account
for a positive cosmological constant. With this aim, Noui and Roche [23] have given
a q-deformed version of the Lorentzian Barrett-Crane model. The possibility of q-
deformation has been with the Riemannian Barrett-Crane model since its inception [8]
and all the necessary ingredients have been present in the literature for some time. In
the next section these details are collected in a form ready for numerical investigation.
3 Deformation of the Barrett-Crane model
Consider a triangulated 4-manifold. Let ∆n denote the set of n-dimensional simplices
of the triangulation. The dual 2-skeleton is formed by associating a dual vertex, edge
and polygonal face to each 4-simplex, tetrahedron, and triangle of the triangulation,
respectively. A spin foam is an assignment of labels, usually called spins, to the dual
faces of the dual 2-skeleton. Each dual edge has 4 spins incident on it, while each dual
vertex has 10. A spin foam model assigns amplitudes AF , AE and AV , that depend on
all the incident spins, to each dual face, edge and vertex, respectively. The amplitude
Z(F ) assigned to a spin foam F is the product of the amplitudes for individual cells of
the 2-complex, while the total amplitude Ztot assigned to a triangulation is obtained
by summing over all spin foams based on the triangulation:
Z(F ) =
AF (f)
AE(e)
AV (v), Ztot =
Z(F ). (3)
Some models, such as those based on group field theory [16,17,24], also include a sum
over triangulations in the definition of the total partition function.
3.1 Review of the undeformed model
The Riemannian Barrett-Crane model was first proposed in [8]. Its relation to the
Crane-Yetter [15] spin foam model is analogous to the relation of the Plebanski [26]
formulation of general relativity (GR) to 4-dimensional BF theory with Spin(4) as the
structure group. Both BF theory and the Crane-Yetter model are topological and the
latter is considered a quantization of the former [2]. In the Plebanski formulation, GR
is a constrained version of BF theory. Similarly, the Barrett-Crane model restricts
the spin labels summed over in the Crane-Yetter model. With this restriction, Barrett
and Crane hoped to produce a discrete model of quantum (Riemannian) GR.
3.1.1 Dual vertex amplitude
All amplitudes are defined in terms of spin(4) spin networks. However, given the
isomorphism spin(4) ∼= su(2)⊕ su(2), all irreducible representations of spin(4) can be
written as tensor products of irreducible representations of su(2). The Barrett-Crane
model specifically limits itself to balanced representations, which are of the form j⊗ j,
where j is the irreducible representation of su(2) of spin j. Since the tensor product
corresponds to a juxtaposition of edges in a spin network, any spin(4) spin network
may be written as an su(2) spin network where an edge labelled j ⊗ j is replaced by
two parallel edges, each labelled j. To avoid redundancy of notation, we use a single
j instead of j ⊗ j to label spin(4) spin network edges. We then distinguish them from
su(2) networks by placing a bold dot at every vertex.
The Barrett-Crane vertex is an intertwiner between four balanced representations:
e . (4)
The graphs on the right hand side of the definition are su(2) spin networks and the
sum runs over all admissible labels e. The graphical notation and the conditions for
admissibility are defined in the Appendix.
The above expression defines the Barrett-Crane vertex in a way that breaks ro-
tational symmetry. However, it can be shown that the vertex is in fact rotation-
ally symmetric. Up to normalization, this property makes the Barrett-Crane vertex
unique [28]. The above formula defines a vertical splitting of the vertex. A ninety
degree rotation will define an analogous horizontal splitting. Both possibilities are
important in the derivation of the algorithm presented in Section 4.1.
Given a 4-simplex v of a triangulation, the corresponding vertex of the dual 2-
complex is assigned the amplitude
AV (v) =
j1,0j1,1
j1,4j1,2
j2,1 j2,4
j2,2 j2,3
. (5)
This spin network is called the 10j-symbol. The 4-simplex v is bounded by five tetra-
hedra, which correspond to the vertices of the 10j graph. The four edges incident on
a vertex correspond to the four faces of the corresponding tetrahedron; the spin labels
are assigned accordingly. The edge joining two vertices corresponds to the face shared
by corresponding tetrahedra. Evaluation of the 10j-symbol is discussed in Section 4.1.
While the crossing structure depicted above is immaterial in the undeformed case, it
is essential at nontrivial values of q. It is given here for reference.
3.1.2 Dual edge and face amplitudes
The original paper of Barrett and Crane did not specify dual edge and face amplitudes.
Three different dual edge and face amplitude assignments were considered in a previous
paper [7]. We concentrate on the same possibilities.
For the Perez-Rovelli model [25], we have
AF (f) = j , AE(e) =
j1 j2 j3 j4
. (6)
For the DFKR model [16], we have
AF (f) = j , AE(e) =
. (7)
For the Baez-Christensen model [7], we have
AF (f) = 1, AE(e) =
. (8)
The bubble diagram, when translated into su(2) spin networks, corresponds to two
bubbles (see Appendix)
. (9)
and evaluates to (2j + 1)2.
The so-called eye diagram simply counts the dimension of the space of 4-valent
intertwiners, which is also the number of admissible e-edges summed over in Equa-
tion (4). In symmetric form, it is given by
1 + min{2j, s− 2J} if positive and s is integral,
0 otherwise,
where s =
k jk, j = mink jk, and J = maxk jk.
3.2 The q-deformed model
Thanks to graphical notation, the q-deformation of the spin foam amplitudes described
above is straightforward, with only a few subtleties. The main distinction is that
q-deformed graphs are actually ribbon (framed) graphs with braiding. Thus, any
undeformed spin network has to be supplemented with information about twists and
crossings before evaluation.
In [32], Yetter generalized the Barrett-Crane 4-vertex for a q-deformed version of
spin(4). Since spin(4) ∼= su(2) ⊕ su(2), there is a two parameter family of possible
deformations of the Lie algebra, spinq,q′ (4)
∼= suq(2) ⊕ suq′(2). Yetter singles out
the one parameter family q′ = q−1, restricted to balanced representations, since it
preserves the invariance of the Barrett-Crane vertex under rotations. This family also
has especially simple curl and twist identities:
, (11)
where the left factor of j ⊗ j corresponds to suq(2) and the right one to suq−1(2), and
the 3-vertex is the obvious juxtaposition of two suq(2) and suq−1(2) 3-vertices. Once
this deformation is adopted, the ribbon structure can be ignored [32], so one only
needs to specify the crossing structure for a given spin(4) spin network to obtain a
well-defined q-evaluation.
There are three basic graphs needed to define the Barrett-Crane simplex ampli-
tudes: the bubble, the eye, and the 10j-symbol. The evaluation of the bubble graph,
Equation (9), is [2j + 1]2, where the quantum integer [2j + 1] is defined in the Ap-
pendix. Remarkably, the value of the eye diagram turns out not to depend on q and
its value is still given by Equation (10). The only exception is when q is a ROU with
parameter r. Then, the dimension of the space of 4-valent intertwiners changes to
1 + min{2j, s− 2J}
r − 1−max{2J, s− 2j}
if positive and
s is integral,
0 otherwise,
where again s =
k jk, j = mink jk, and J = maxk jk.
The 10j-symbol is the only network with a non-planar graph. Originally, it was
defined in terms of the 15j-symbol from the Crane-Yetter model. This 15j-symbol
was defined with q-deformation in mind, so its crossing and ribbon structure was fully
specified [14, Section 3]. Adapted to the 10j-graph, it can be summarized as follows:
Consider a 4-simplex. The dual 1-skeleton of the boundary has five dual vertices and
ten dual edges, and is the complete graph K5 on these five dual vertices. If we remove
one of the (non-dual) vertices from the boundary of the 4-simplex, what remains is
homeomorphic to R3. For any such homeomorphism, the embedding of K5 into R
can be projected onto a 2-dimensional plane. The crossing structure of the 10j graph
is defined by such a projection. It is illustrated in Equation (5). Although, with
crossings, the 10j graph is no longer manifestly invariant under permutations of its
vertices, it can be shown to be so.
3.3 Observables
The definition of observables in a spin foam model of quantum gravity is still open
to interpretation (see Section 6 of [7] for a brief discussion). For a fixed spin foam,
the half-integer spin labels of its faces are the fundamental variables of the model.
Practically speaking, any observable of a spin foam model should be an expectation
value of some function O(F ) of the spin labels of a spin foam F , averaged over all spin
foams with amplitudes specified by Equation (3):
〈O〉 =
O(F )Z(F )
. (13)
In this paper we choose to concentrate on a few observables representative of the
kind of quantities computable in a spin foam model. As before, fix a triangulation of
a 4-manifold, let ∆2 represent the set of its faces and let j : ∆2 → {0, 1/2, 1, . . .} be
the spin labelling. We define:
J(F ) =
⌊j(f)⌉ , (14)
(δJ)2(F ) =
(⌊j(f)⌉ − 〈J〉)
, (15)
A(F ) =
⌊j(f)⌉ ⌊j(f) + 1⌉, (16)
Cd(F ) =
f,f ′∈∆2
dist(f,f ′)=d
⌊j(f)⌉ ⌊j(f ′)⌉ − 〈J〉
〈(δJ)2〉
. (17)
where ⌊n⌉ denotes a quantum half-integer (see Appendix), | · | denotes cardinality,
dist(f, f ′) denotes the distance between faces, and Nd is a normalization factor (see
below for the definition of distance and Nd). These observables represent average spin
per face, variance of spin per face, average area per face, and spin-spin correlation as
a function of d.
The choice of observables given above is somewhat arbitrary. For instance, there
are several subtly distinct choices for the expression for (δJ)2. Fortunately, they all
yield expectation values that are nearly identical. The expression given above has the
technical advantage of falling into the class of so-called single spin observables. These
are observables whose expectation value can be directly obtained from the knowledge
of probability with which spin j occurs on any face of a spin foam. All of J , (δJ)2,
and A are single spin observables, while Cd is not.
Note that on a fixed triangulation with no other background geometry, there is
no physical notion of distance. We can, instead, define a combinatorial analog. For
any two faces f and f ′ of a given triangulation, let dist(f, f ′) be the smallest number
of face-sharing tetrahedra that connect f to f ′. Given the discrete structure of our
spacetime model, it is conceivable that this combinatorial distance, multiplied by a
fundamental unit of length, approximates some notion of distance derived from the
dynamical geometry of the spin foam model.
The correlation function Cd may be thought of as analogous to a normalized 2-
point function of quantum field theory. The d-degree of face f is the number of faces
f ′ such that dist(f, f ′) = d. If the d-degree of every face is the same, the normalization
factor Nd can be taken to be the number of terms in the sum (17), that is, the number
of face pairs separated by distance d. This choice ensures the inequality |Cd| ≤ 1. If
not all faces have the same d-degree, then the normalization factor has to be modified
Nd = |∆2|Dd, (18)
where Dd is the maximum d-degree of a face, which reduces to the simpler definition
in the case of uniform d-degree.
The choice of the q-dependent expression ⌊j⌉, instead of simply using the half-
integer j, is motivated in Section 5.1. For some q, the argument of the square root in
A(F ) may be negative or even complex. In that case, a branch choice will have to be
made. Luckily, if q = 1, q is a ROU, or q is real, the expression under the square root
is always non-negative.
4 Numerical simulation
The key development that made possible numerical simulation of variations of the
(undeformed) Barrett-Crane model [6,7] is the development by Christensen and Egan
of a fast algorithm for evaluating 10j-symbols [13]. In this section, we show how
this algorithm generalizes to the q-deformed case and discuss numerical evaluation of
observables for the previously described spin foam models.
4.1 The q-deformation of the fast 10j algorithm
The derivation of the Christensen-Egan algorithm given in [13] is contingent on the
possibility of splitting the Barrett-Crane 4-vertex as in Equation (4) and on the re-
coupling identity, Equation (43) of the Appendix. Both identities still hold in the
q-deformed case. The validity of the 4-vertex splitting was proved by Yetter [32] and
the recoupling identity is a standard part of suq(2) representation theory.
The only remaining detail of the algorithm’s generalization is the crossing structure
of the 10j graph, which was established in Section 3.2. However, its only consequence
is an extra factor from the twist implicit in the bubble diagram of Section 4 of [13],
cf. Equation (50) of the Appendix. We will not reproduce the derivation of the algo-
rithm here. However, the way in which the twist arises is schematically illustrated in
Figure 1. Note that the triviality of the twist for Yetter’s balanced representations,
Equation (11), does not apply here since the twist occurs separately in distinct suq(2)
networks.
The algorithm itself can be summarized in the following form:
{10j} = (−)2S
m1,m2
φ tr[M4M3M2M1M0]. (19)
The 10j-symbol depends on the ten spins ji,k, (i = 1, 2, k = 0, . . . , 4) specified in
Equation (5). The overall prefactor depends on the total spin S =
i,k ji,k and the
per-term prefactor is
φ = (−)m1−m2 [2m1 + 1][2m2 + 1]q
m1(m1+1)−m2(m2+1). (20)
(a) (b)
(c) (d)
Figure 1: In reference to [13], (a) corresponds to Equation (1), (b) corresponds to
Equation (2), while (c) and (d) correspond to the “ladder” and “bubble” diagrams
of Section 4, respectively. The illustrated twist introduces the explicitly q-dependent
factor into Equation (20).
The exponents of (−) and q are always integers. The Mk are matrices (not all of
the same size) of dimensions compatible with the five-fold product and trace. Their
matrix elements are
[2lk + 1](T1)
θ(j2,k−1, lk+1, j1,k) θ(j2,k+1, lk+1, j1,k+1)
, (21)
lk j2,k mi
lk+1 j2,k−1 j1,k
θ(j2,k, lk+1,mi)
. (22)
The quantum integers [n], as well as the theta θ(a, b, c) and tetrahedral Tet[· · ·] suq(2)
spin networks are defined in the Appendix.
The quantities lk and mi are spin labels (half-integers). They are constrained by
admissibility conditions (parity conditions and triangle inequalities). The parity of
each index is determined by the conditions
lk ≡ j1,k + j2,k ≡ j1,k−1 + j2,k−2, (23)
mi ≡ lk + j2,k−1, (24)
for i = 1, 2 and k = 0, . . . , 4, where ≡ denotes equivalence mod 1 and the second
subscript of j is taken mod 5. Summation bounds are determined by the triangle in-
equalities, which must be checked for each trivalent vertex introduced in the derivation
of the algorithm. They boil down to
lb3(j1,k, j2,k, j2,k−1) ≤ mi ≤ j1,k + j2,k + j2,k−1, (25)
|j1,k−1 − j2,k−2| ≤ lk ≤ j1,k−1 + j2,k−2, (26)
|j1,k − j2,k| ≤ lk ≤ j1,k + j2,k, (27)
|mi − j2,k−1| ≤ lk ≤ mi + j2,k−1, (28)
for i = 1, 2 and k = 0, . . . , 4, where we have used the notation
lb3(a, b, c) = 2max{a, b, c} − (a+ b+ c). (29)
When q = exp(iπ/r) is a ROU, extra inequalities must be taken into account to
exclude summation over reducible representations. These are
mi ≥ j1,k + j2,k + j2,k−1 − (r − 2), (30)
mi ≤ ub3(j1,k, j2,k, j2,k−1) + (r − 2), (31)
lk ≤ (r − 2)− (j1,k + j2,k), (32)
lk ≤ (r − 2)− (j1,k−1 + j2,k−2), (33)
lk ≤ (r − 2)− (m+ j2,k−1), (34)
where now
ub3(a, b, c) = 2min{a, b, c} − (a+ b+ c).
If any of the parity constraints or inequalities cannot be satisfied, the 10j-symbol
evaluates to zero.
This algorithm has been implemented and tested in the q = 1 and ROU cases,
for both j and r up to several hundreds. Unfortunately, for generic q, when Q =
max{|q|, |q|−1} > 1, the quantum integers grow exponentially as |[n]| ∼ Qn. Such a
rapid growth makes the sums involved in this algorithm numerically unstable. It is
still possible to use this algorithm with Q close to 1 or symbolically, using rational
functions of q instead of limited precision floating point numbers. Symbolic computa-
tion is, however, significantly slower (by up to a factor of 106) than its floating point
counterpart. The software library spinnet which implements these and other spin
network evaluations is available from the authors and will be described in a future
publication.
4.2 Positivity and statistical methods
The sums involved in evaluating expectation values of observables, as in Equation (13),
are very high-dimensional. For instance, a minimal triangulation of the 4-sphere (seen
as the boundary of a 5-simplex) contains 20 faces. Hence, any brute force evaluation
of an expectation value, even on such a small lattice, involves a sum over the 20-
dimensional space of half-integer spin labels.
Fortunately, in the undeformed case, the total amplitude Z(F ) for a closed spin
foam is never negative3 [5]. The proof for the q = 1 case generalizes to the ROU
case. One need only realize two facts. The first is that, in the ROU case, quantum
integers are non-negative. The second is that, for q a ROU, an suq−1(2) spin network
evaluates to the complex conjugate of the corresponding suq(2) spin network. The
3We expect the same thing to hold in Lorentzian signature [5, 12].
disjoint union of any two such spin networks evaluates to their product, the absolute
value squared of either of them, and hence is non-negative. Then, the same positivity
result follows as from Equation (1) of [5]. This positivity allows us to treat Z(F ) as
a statistical distribution and use Monte Carlo methods to extract expectation values
with much greater efficiency than brute force summation.
The main tool for evaluating expectation values is the Metropolis algorithm [20,22].
The algorithm consists of a walk on the space of spin labellings. Each step is randomly
picked from a set of elementary moves and is either accepted or rejected based on
the relative amplitudes of spin foam configurations before and after the move. An
expectation value is extracted as the average of the observable over the configurations
constituting the walk. Elementary moves for spin foam simulations are discussed in
the next section.
A Metropolis-like algorithm is possible even if individual spin foam amplitudes
Z(F ) are negative or even complex. However, if the total partition function Ztot sums
to zero, then the expectation values in Equation (13) become ill defined. Moreover, in
numerical simulations, if Ztot is even close to zero, expectation value estimates may
exhibit great loss of precision and slow convergence. In the path-integral Monte Carlo
literature, this situation is known as the sign problem [11]. Still, the sign problem
need not occur or, depending on the severity of the problem, there may be ways of
effectively dealing with it.
Independent Metropolis runs can be thought of as providing independent estimates
of a given expectation value. Thus, the error in the computed value of an observable
can be estimated through the standard deviation of the results of many independent
simulation runs [19].
4.3 Elementary moves for spin foams
The choice of elementary moves for spin foam simulations must satisfy several criteria.
Theoretically, the most important one is ergodicity. That is, any spin foam must
be able to transform into any other one through a sequence of elementary moves
which avoid configurations with zero amplitude. Practically, it is important that
these moves usually preserve admissibility. A spin foam F is called admissible if the
associated amplitude Z(F ) is non-zero. If, starting with an admissible spin foam,
most elementary moves produce an inadmissible spin foam, the simulation will spend
a lot of time rejecting such moves without any practical benefit.
As before, consider a fixed triangulation of a compact 4-manifold. The parity
conditions (23) imposed on the ji,k,
j1,k + j2,k ≡ j1,k−1 + j2,k−2, 0 ≤ k ≤ 4,
when taken together with the total spin foam amplitude (3), provide strong constraints
on admissible spin foams. One can show that a move that changes spin labels by ±1/2
(mod 1) on each face of a closed surface in the dual 2-skeleton preserves the parity
constraint. We take as the elementary moves the moves that change the spin labels
by ±1/2 on the boundaries of the dual 3-cells of the dual 3-complex; the dual 3-cells
correspond to the edges of the triangulation. If the manifold has non-trivial mod 2
homology in dimension 2, additional moves would be necessary, but for the examples
we consider the moves above suffice. From a practical point of view, extra moves
might improve the simulation’s equilibration time. For instance, in the ROU case,
parity preserving moves that change the spins from 0 to (r − 2)/2 or (r − 3)/2 were
introduced, since spins close to either admissible extreme may have large amplitudes.
This property of the Perez-Rovelli and Baez-Christensen models is illustrated in the
following section.
Unfortunately, the inequalities constraining spin labels do not have a similar geo-
metric interpretation and cannot be used to easily restrict the set of elementary moves
in advance.
5 Results
Using methods described in the previous section, we ran simulations of the three vari-
ations of the Barrett-Crane model described in Section 3 and obtained expectation
values for observables listed in Section 3.3. While previous work [7] performed simula-
tions only on the minimal triangulation of the 4-sphere, which we will refer to simply
as the minimal triangulation, we have extended the same techniques to arbitrary tri-
angulations of closed manifolds.
5.1 Discontinuity of the r → ∞ limit
The most striking result we can report is a discontinuity in the transition to the limit
r → ∞, where r, a positive integer, is the ROU parameter with q = exp(iπ/r). As
r → ∞, the deformation parameter q tends to its classical value 1. If we interpret
the cosmological constant as inversely proportional to r, Λ ∼ 1/r, this limit also
corresponds to Λ → 0, through positive values. For a fixed spin foam, the amplitudes
and observables we study tend continuously to their undeformed values as r → ∞.
However, we find that observable expectation values do not tend to their undeformed
values in the same limit, that is, 〈O〉r 9 〈O〉q=1 as r → ∞.
The discontinuity is most simply illustrated with the single spin distribution, that
is the probability of finding spin j at any spin foam face. This probability can be
estimated from the histogram of all spin labels that have occurred during a Monte
Carlo simulation. The points in Figure 2(a) show the single spin distributions for the
Baez-Christensen model with r = 50 and q = 1. The curves show the corresponding
single bubble amplitude. It is the amplitude Z(Fj) of a spin foam Fj with all spin
labels zero, except for the boundary of an elementary dual 3-cell, whose faces are all
labelled with spin j. The amplitudes and distributions are normalized as probabil-
ity distributions so their sums over j yield 1. The similarity between the points and
the continuous curves is consistent with the hypothesis that spin foams with isolated
bubbles dominate the partition function sum. The behavior of the single spin dis-
tribution for the Perez-Rovelli model is very similar, except that its peaks are much
more pronounced.
Note that the undeformed single spin distribution has a single peak at j = 0, while
the r = 50 case has two peaks, one at j = 0 and the other at j = (r − 2)/2, the
largest non-trace 0 irreducible representation. The bimodal nature of the single spin
distribution has an important impact on the large r behavior of observable expectation
values, as is most easily seen with single spin observables (Section 3.3). For instance,
if we consider the average, j̄, of the half-integers j, the large j peak would dominate
the expectation value and 〈j̄〉 would diverge linearly in r, as r → ∞. On the other
hand, since J is the average of the quantum half-integers ⌊j⌉, 〈J〉 at least approaches
a constant in the same limit. This is illustrated in Figure 2(b).
Figure 2: (a) Single spin distribution and single bubble amplitude for the Baez-
Christensen model. The distribution was obtained from 109 steps of Metropolis sim-
ulation on a triangulation with 202 faces (cf. Section 5.3). (b) Some single spin ob-
servables as functions of j, with r = 50.
However, as shown in Figure 3, this limit is not the same as the undeformed
expectation value. At the same time, as can be seen from the plot of the Perez-
Rovelli average area in the same figure, there are some observables whose large r
limits are at least very close to the undeformed values. The area observable summand
⌊j⌉ ⌊j + 1⌉ is exactly zero at both j = 0 and j = (r − 2)/2, while the spin
observable summand Jj = ⌊j⌉ is zero at j = 0 but still positive at j = (r − 2)/2,
Figure 2(b). The large j peak of the Perez-Rovelli model is very narrow and thus
the expectation value of a single spin observable is strongly influenced by its value at
j = (r − 2)/2.
The data for larger triangulations is qualitatively similar.
5.2 Regularization of the DFKR model
As expected, the ROU deformation of the DFKR model yields a finite partition func-
tion and finite expectation values. For instance, its single spin distribution for r = 40
is illustrated in Figure 4. The divergence of the amplitude for large spins in the unde-
formed, q = 1, case makes numerical simulation impossible without an artificial spin
cutoff. Thus, we do not have an undeformed analog of the single spin distribution.
For the minimal triangulation, the ROU spin distribution deviates slightly from the
single bubble amplitude close to the boundaries of admissible j. For the larger trian-
gulation, the deviation is much more pronounced and is not restricted to the edges.
This suggests that there are other significant contributions to the partition function
besides single bubble spin foams.
Note the large weight associated with spins around j = r/4. Around this value of
j, both the area Aj =
⌊j⌉ ⌊j + 1⌉ and the spin Jj = ⌊j⌉ attain their maximal values
and are proportional to r. Thus, it is natural to expect their expectation values to
grow linearly in r, which is consistent with the divergent nature of the undeformed
DFKR model. This is precisely the behavior shown in Figure 5. On the minimal
triangulation, the best linear fits for the average spin expectation value and for the
Figure 3: Observables for the Baez-Christensen (BCh) and Perez-Rovelli (PR) models
as functions of the ROU parameter r. For large r, observables do not in general tend
to their undeformed, q = 1, values; arrows show the deviation. Some observables
were scaled to fit on the graph. Data is from Metropolis simulations on the minimal
triangulation.
square root of the average spin variance are
〈J〉r = 0.146 r − 0.064, (35)
(δJ)2
= 0.014 r + 0.187. (36)
For larger triangulations, the dependence of these observables is also approximately
linear in r, with only slight variation in the effective slope.
5.3 Spin-spin correlation
The ability to work with larger lattices allows us to explore a broader range of observ-
ables. One of them is the spin-spin correlation function Cd defined in Section 3.3. In
general 〈C0〉 = 1 and 〈Cd〉 → 0 for large d. The decay of the correlation shows how
quickly the spin labels on different spin foam faces become independent. A positive
value of 〈Cd〉 indicates that, on average, any two faces distance d apart both have
spins above (or both below) the mean 〈J〉. On the other hand, a negative value of
〈Cd〉 indicates that, on average, any two faces distance d apart have one spin above
and one below the mean 〈J〉.
A small triangulation limits the maximum distance between faces. For example,
the minimal triangulation has maximum distance d = 3. Larger triangulations of
the 4-sphere were obtained by refining the minimal one by applying Pachner moves
randomly and uniformly over the whole triangulation. We restricted the Pachner
moves to those that did not decrease the number of simplices.
Figure 4: Single spin distributions and single bubble amplitudes for the DFKR model.
The distributions were obtained from 109 steps of Metropolis simulation on the mini-
mal triangulation and on a triangulation with 202 faces (cf. Section 5.3).
The largest triangulation we have used has maximum distance d = 6. Its corre-
lations for different models are shown in Figure 6 along with those from the minimal
triangulation. Correlation functions for different values of ROU parameter r (including
the q = 1 case) and other triangulations are qualitatively similar.
Notice the small negative dip for small values of d for the Perez-Rovelli and Baez-
Christensen models. As discussed in previous sections, the partition functions of these
models are dominated by spin foams with isolated bubbles. The correlation data is
consistent with this hypothesis. The values of the spins assigned to faces of the bubble
will be strongly correlated, while the values of the spins on two faces, one of which
lies on the bubble and the other does not, should be strongly anti-correlated. Since
a given face usually has fewer nearest neighbors that lie on the same bubble than
that do not, on average, the short distance correlation is expected to be negative. At
slightly larger distances, the correlation function turns positive again. This indicates
that on a larger triangulations, spin foams with several isolated bubbles contribute
strongly to the partition function. Although, with so few data points, it is difficult
to extrapolate the behavior of the correlation function to larger triangulations and
distances, its features are qualitatively similar to that of a condensed fluid, where the
density-density correlation function exhibits oscillations on the scale of the molecular
dimensions.
Note that the behavior of the DFKR correlation function is significantly different
from the other two. This is also consistent with the already observed fact that its
partition function has strong contributions from other than single or isolated bubble
spin foams.
Figure 5: Observables for the DFKR model: area 〈A〉, average spin 〈J〉, spin standard
deviation
〈(δJ)2〉. Metropolis simulation, minimal triangulation. Error bars are
smaller than the data points.
6 Conclusion
We have numerically investigated the behavior of physical observables for the Perez-
Rovelli, DFKR, and Baez-Christensen versions of the Barrett-Crane spin foam model.
Each version assigns different dual edge and face amplitudes to a spin foam, and these
choices greatly affect the behavior of the resulting model. The behavior of the models
was also greatly affected by q-deformation.
The limiting behavior of observables was found to be discontinuous in the limit of
large ROU parameter r, i.e., q = exp(iπ/r) close to its undeformed value of 1. This
result is at odds with the physical interpretation of the relation Λ ∼ 1/r between the
cosmological constant Λ and the ROU parameter. Finally, the behavior of the exam-
ined physical observables, especially of the spin-spin correlation function, indicates the
dominance of isolated bubble spin foams in the Perez-Rovelli and Baez-Christensen
partition functions, while less so for the the DFKR one.
Some questions raised by these results deserve attention. For instance, it is not
known whether the same q → 1 limit behavior will be observed when q is taken
through non-ROU values. While calculations with max{|q|, |q|−1} > 1 are numerically
unstable, they should still be possible for |q| ∼ 1.
Another important project is to perform a more extensive study of the effects of
triangulation size in order to better understand the semi-classical limit.
Finally, all of this work should also be carried out for the Lorentzian models, which
are physically much more interesting but computationally much more difficult.
These and other questions will be the subject of future investigations.
Figure 6: Spin-spin correlation functions for the Baez-Christensen (BCh), Perez-
Rovelli (PR) and DFKR models, on the minimal triangulation (6 vertices, 15 edges, 20
faces, 15 tetrahedra, and 6 4-simplices) as well as a larger triangulation (23 vertices,
103 edges, 202 faces, 200 tetrahedra, and 80 4-simplices). ROU parameter r = 10.
Acknowledgements
The authors would like to thank Wade Cherrington for helpful discussions. The first
author was supported by NSERC and FQRNT postgraduate scholarships and the
second author by an NSERC grant. Computational resources for this project were
provided by SHARCNET.
A Spin network notation and conventions
Quantum integers are a q-deformation of integers. For an integer n, the corresponding
quantum integer is denoted by [n] and is given by
[n] =
qn − q−n
q − q−1
. (37)
In the limit q → 1, we recover the regular integers, [n] → n. Note that [n] is invariant
under the transformation q 7→ q−1. When q = exp(iπ/r) is a root of unity (ROU), for
some integer r > 1, an equivalent definition is
[n] =
sin(nπ/r)
sin(π/r)
. (38)
This expression is non-negative in the range 0 ≤ n ≤ r. Quantum factorials are
defined as
[n]! = [1][2] · · · [n]. (39)
In many cases, q-deformed spin network evaluations can be obtained from their unde-
formed counterparts by simply replacing factorials with quantum factorials. For con-
venience, when dealing with half-integral spins, we also define quantum half-integers
⌊j⌉ =
when j is a half-integer.
Abstract suq(2) spin networks can be approached from two different directions.
They can represent contractions and compositions of suq(2)-invariant tensors and in-
tertwiners [10]. At the same time, they can represent traces of tangles evaluated
according to the rules of the Kauffman bracket [18]. Either way, the computations
turn out to be the same. We present here formulas for the evaluation of a few spin
networks of interest.
The single bubble network evaluates to what is sometimes called the superdimension
of the spin-j representation:
j = (−)2j [2j + 1]. (41)
(As in the rest of the paper, the spin labels are half-integers.)
Up to a constant, there is a unique 3-valent vertex (corresponding to the Clebsch-
Gordan intertwiner) whose normalization is fixed up to sign by the value of the θ-
network :
θ(a, b, c) =
(−)s[s+ 1]![s− 2a]![s− 2b]![s− 2c]!
[2a]![2b]![2c]!
, (42)
where s = a+ b + c. The θ-network is non-vanishing, together with the three-vertex
itself, if and only if s is an integer and the triangle inequalities are satisfied: a ≤ b+ c,
b ≤ c+ a, and c ≤ a+ b. In addition, when q is a ROU, one extra inequality must be
satisfied: s ≤ r − 2. The triple (a, b, c) of spin labels is called admissible if θ(a, b, c) is
non-zero.
The recoupling identity gives the transformation between different bases for the
linear space of 4-valent tangles (or intertwiners):
(−)2e[2e+ 1]Tet
a b e
c d f
θ(a, d, e) θ(c, b, e)
e , (43)
where the sum is over all admissible labels e and the value of the tetrahedral network
a b e
c d f
m≤S≤M
(−)S [S + 1]!
i[S − ai]!
j [bj − S]!
, (44)
where
[bj − ai]! E ! = [2A]![2B]![2C]![2D]![2E]![2F ]! (45)
a1 = (a+ d+ e) b1 = (b+ d+ e+ f) (46)
a2 = (b + c+ e) b2 = (a+ c+ e+ f) (47)
a3 = (a+ b+ f) b3 = (a+ b+ c+ d) (48)
a4 = (c+ d+ f) m = max{ai} M = min{bj}. (49)
Due to parity constraints, the ai, bj, m, M , and S are all integers.
Since the three-vertex is unique up to scale, its composition with with a braiding
applied to two incoming legs yields a multiplicative factor:
= (−)a+b−cqa(a+1)+b(b+1)−c(c+1)
. (50)
Note that the above braiding factor is not invariant under the transformation q 7→ q−1,
while the bubble, tetrahedral and θ-networks are all invariant under this transforma-
tion, by virtue of their expressions in terms of quantum integers.
References
[1] Archer F and Williams R M 1991 Physics Letters B 273 438–44
[2] Baez J C 1996 Letters in Mathematical Physics 38 129–43 (Preprint arXiv:q-alg/9507006)
[3] Baez J C 1998 Classical and Quantum Gravity 15 1827–58 (Preprint arXiv:gr-qc/9709052)
[4] Baez J C 2000 Lecture Notes in Physics 543 25–94 (Preprint arXiv:gr-qc/9904025)
[5] Baez J C and Christensen J D 2002 Classical and Quantum Gravity 19 2291–306 (Preprint
arXiv:grqc/0110044)
[6] Baez J C, Christensen J D, and Egan G 2002 Classical and Quantum Gravity 19 6489–513
(Preprint arXiv:gr-qc/0208010)
[7] Baez J C, Christensen J D, Halford T R, and Tsang D C 2002 Classical and Quantum Gravity
19 4627–48 (Preprint arXiv:gr-qc/0202017)
[8] Barrett J W and Crane L 1998 Journal of Mathematical Physics 39 3296–302 (Preprint arXiv:gr-
qc/9709028)
[9] Barrett J W and Crane L 2000 Classical and Quantum Gravity 17 3101–18 (Preprint arXiv:gr-
qc/9904025)
[10] Carter J S, Flath D E, and Saito M 1995 The Classical and Quantum 6j-Symbols (Princeton,
New Jersey: Princeton University Press)
[11] Ceperley D and Alder B 1986 Science 231 555–60
[12] Cherrington J W and Christensen J D 2006 Classical and Quantum Gravity 23 721–36 (Preprint
arXiv:gr-qc/0509080)
[13] Christensen J D and Egan G 2002 Classical and Quantum Gravity 19 1184–93 (Preprint
arXiv:gr-qc/0110045)
[14] Crane L, Kauffman L H, and Yetter D N 1997 Journal of Knot Theory and Its Ramifications 6
177–234 (Preprint arXiv:hep-th/9409167)
[15] Crane L and Yetter D N 1993 A categorical construction of 4D topological quantum field theories
Quantum Topology ed L H Kauffman and R A Baadhio (Singapore: World Scientific Press) pp
120–30
[16] De Pietri R, Freidel L, Krasnov K, and Rovelli C 2000 Nuclear Physics B 574 785–806 (Preprint
arXiv:hep-th/9907154)
[17] Freidel L 2005 International Journal of Theoretical Physics 44 1769–83 (Preprint arXiv:hep-
th/0505016)
[18] Kauffman L H and Lins S L 1994 Temperley-Lieb Recoupling Theory and Invariants of 3-
Manifolds (Princeton, New Jersey: Princeton University Press)
[19] Kikuchi M and Ito N 1993 Journal of the Physical Society of Japan 62 3052
[20] Landau D P and Binder K 2005 A Guide to Monte Carlo Simulations in Statistical Physics 2nd
ed (Cambridge: Cambridge University Press)
[21] Majid S 2000 Foundations of Quantum Group Theory (Cambridge: Cambridge University Press)
[22] Metropolis N, Rosenbluth A W, Rosenbluth M N, Teller A H, and Teller E 1953 Journal of
Chemical Physics 21 1087–92
[23] Noui K and Roche P 2003 Classical and Quantum Gravity 20 3175–214 (Preprint arXiv:gr-
qc/0211109)
[24] Perez A 2003 Classical and Quantum Gravity 20 R043–104 (Preprint arXiv:gr-qc/0301113)
[25] Perez A and Rovelli C 2001 Nuclear Physics B 599 255–82 (Preprint arXiv:gr-qc/0006107)
[26] Plebański J F 1977 Journal of Mathematical Physics 18 2511–20
[27] Ponzano G and Regge T 1968 Semiclassical limit of Racah coefficients Spectroscopic and Group
Theoretical Methods in Physics ed F Bloch et al (Amsterdam: North-Holland) pp 1–98
[28] Reisenberger M P 1999 Journal of Mathematical Physics 40 2046–54 (Preprint arXiv:gr-
qc/9809067)
[29] Smolin L 1995 Journal of Mathematical Physics 36 6417–55 (Preprint arXiv:gr-qc/9505028)
[30] Smolin L 2002 Quantum gravity with a positive cosmological constant Preprint arXiv:hep-
th/0209079
[31] Turaev V G and Viro O Y 1992 Topology 31 865–902
[32] Yetter D N 1999 Journal of Knot Theory and Its Ramifications 8 815–29 (Preprint
arXiv:math.QA/9801131)
	Introduction
	Deformation of su(2)
	The algebra suq(2) and its representations
	Applications of q-deformation
	Deformation of the Barrett-Crane model
	Review of the undeformed model
	Dual vertex amplitude
	Dual edge and face amplitudes
	The q-deformed model
	Observables
	Numerical simulation
	The q-deformation of the fast 10j algorithm
	Positivity and statistical methods
	Elementary moves for spin foams
	Results
	Discontinuity of the r limit
	Regularization of the DFKR model
	Spin-spin correlation
	Conclusion
	Spin network notation and conventions
ABSTRACT
  We numerically study Barrett-Crane models of Riemannian quantum gravity. We
have extended the existing numerical techniques to handle q-deformed models and
arbitrary space-time triangulations. We present and interpret expectation
values of a few selected observables for each model, including a spin-spin
correlation function which gives insight into the behaviour of the models. We
find the surprising result that, as the deformation parameter q goes to 1
through roots of unity, the limit is discontinuous.

<|endoftext|><|startoftext|>
Introduction
Observations of H I velocities perpendicular to the disk (vz) are necessary for studies
of both the interstellar medium (ISM) (McKee & Ostriker 1977, Kulkarni & Heiles 1988,
Braun 1992, 1997) and disk dynamics (Oort 1932; Rupen 1987; Lockman & Gehman 1991;
Merriefield 1993; Malhotra 1994, 1995; Olling 1995) because they set a direct upper limit
on the thermal and kinetic temperature of the gas. Hence the H I velocities perpendicular
to the disk are an important dynamical tracer and as such can be used to constrain, both
the gas mass distribution in the plane and its vertical structure (i.e., density as a function
of height-z above the plane)(van der Kruit & Shostak 1982,1984, Lockman & Gehman 1991,
Malhotra 1995).
http://arxiv.org/abs/0704.0279v1
– 2 –
The behavior of the velocity dispersions as a function of galactic radius is important
for determinations of the shape of dark matter halos. To date even the most sophisticated
methods (e.g. Olling 1995, 1996) assume either a constant or an azimuthally symmetric
velocity dispersion. Our measurements can therefore be used with studies of edge-on systems
to determine radial variations in the mass to light (M/L) ratio.
Processes associated with star formation,such as stellar winds and multiple supernova
explosions are thought to put energy into the ISM in the form of mechanical energy, starlight
(which leads to photoelectric emission from dust grains), and cosmic rays. The velocity
dispersion in the z direction is intimately connected to the forces holding the gas against
gravitational instabilities and hence to star-formation in the disk (e.g. Mac Low & Klessen
2004, Li, Mac Low & Klessen 2005). Measuring the degree of correlation between the
locations of star-forming regions and those of high dispersion is a good method to investigate
the relation between star-related energy sources and H I bulk motions.
The face-on spiral galaxy NGC 1058 (e.g. Eskridge et al. 2002) is ideal for studies of
H I vz dispersions. Its low inclination (4
◦–11◦) (Lewis 1987, van der Kruit & Shostak 1984)
means that the gradient in rotational velocity is small across the beam and therefore it does
not significantly corrupt measurements of velocities perpendicular to the disk.
Single dish studies of NGC 1058 (Allen & Shostak 1979, Lewis 1975, Lewis 1984) lack
the resolution to trace the dispersion across the disk but through modeling of the rotational
component these authors estimate it to range between 7 and 9 km/sec. In a series of papers
(1982-1984) van der Kruit & Shostak analyze H I emission profiles in a number of face-on
galaxies and determine that the velocity dispersions in NGC 1058 range only between 7 to
8 km/sec at all radii with very little variation. Dickey, Hanson, & Helou (1990) find the
that velocity dispersion in NGC 1058 decreases with optical surface brightness but that in
the extended gas disk, beyond the Holmberg radius the velocity dispersion is 5.7 km/sec
everywhere, that is no variations with spiral phase or H I surface density are found.
All previous determinations of the H I velocity dispersion in NGC 1058 have been
hindered by low spatial (e.g., Lewis 1984) and/or spectral (e.g., van der Kruit & Shostak
1984) resolution as well as by relatively poor sensitivity, requiring smoothing over large
sections of the galactic disk, or missing up to 40% of the total flux (Dickey, Hanson, &
Helou 1990). These trade-offs have led to significantly different conclusions about the H I
velocity dispersion. Our sensitive observations at high spatial and velocity resolution as well
as recovery of the entire single dish flux, allowed us to accurately measure the profile widths
even in the outskirts of the H I disk, to resolve the arm from the interarm regions, and to
analyze in detail the H I profile shapes, not just their breadths throughout the H I disk.
– 3 –
2. Observations and Data Reductions
The 21cm line of neutral hydrogen in NGC 1058 was observed with the VLA in the C
and CS1configurations. The C configuration data was taken on 14 and 15 June 1993 for a
total time on-source of 12.23 hours. The D configuration observation were performed on 7
and 8 November 1993 for a total time on source of 2.67 hours, and CS configuration data was
collected on January 3, 1995 for a total time on-source of 5.42 hours. Rupen (1997,1998)
gives a detailed account of the UV coverage in each of the configurations, compares the
merits of each configuration, and discusses the benefits of combining them.
Both the C and CS configurations have a maximum baseline of 3.6 km, while the D
configuration has a maximum baseline 1 km). The minimum baseline, which determines
the size of the most extended feature which can be observed by the VLA, is 35 m. All
observations were taken in dual polarization mode and Hanning smoothing was applied on-
line; resulting in 127 independent spectral channels with a velocity width of 2.58 km/sec.
We followed the normal AIPS calibration procedures and used the same flux (3C48) and
phase (0234+285) calibrators throughout. The continuum emission was approximated as a
linear fit to visibilities in 20 line-free channels on each side of the signal, and this fit was then
subtracted from the uv-data in all the channels. Rupen (1999), gives a detailed description
of the bandpass calibration and continuum subtraction.
The data cube presented here was deconvolved using the CLEAN algorithm as imple-
mented by AIPS task IMAGR.2, iterated until the residuals were nearly zero and the flux
density in the CLEAN model was stable. The cube was tapered to a resolution of 30′′× 29′′,
or 1.3 × 1.3 kpc at a distance of 10 Mpc (Ferguson et al. 1998). Rupen (1997), presents
a detailed comparison of several cleaning algorithms and motivates the use of the CLEAN
algorithm for this data. A more general discussion of CLEAN as implemented in AIPS is
given in chapter 5, of the AIPS cookbook as well as in Cornwell, Braun, & Briggs 1999. A
more specific examination of deconvolution algorithms as applied on our NGC 1058 data is
presented in Rupen (1997,1999).
The RMS noise level in the line channels of the cube was 0.5 and mJy/beam, corre-
sponding to a column density of 1.6× 1018 cm−2 per channel. The H I integrated line profile
1The CS (shortened C) configuration moves two antennas from intermediate stations in the standard C
configuration to the center of the array. The resulting short spacings significantly increase the sensitivity of
the array to extended structure, while maintaining the same spatial resolution (Rupen 1997).
2A description of IMAGR can be found in the AIPS cookbook available online at
http://www.aoc.nrao.edu/aips/cook.html.
http://www.aoc.nrao.edu/aips/cook.html
– 4 –
agrees with those obtained in single dish studies (Allen & Shostak 1979) after both the single
dish and the VLA data are corrected for primary beam response.
Figure 1 presents the frames of the 30′′ data cube. Each image (traditionally called
a channel map) in this figure represents the 21 cm line emission at a certain velocity, the
abscisa and ordinate axis are the RA and Dec coordinates. The lack of artifacts in these
images (such as a negative bowl around the galaxy) also suggests that the images have been
correctly deconvolved.
3. Results and Analysis
The general properties of NGC 1058 are presented in Table 1. Figure 2 shows intensity
weighted mean velocity contours atop the H I intensity map and presents the H I spiral
structure of NGC 1058. Figure 2 also illustrates the superb sensitivity and resolution of
these studies, which allowed us to measure the H I emission at distances of approximatively
10 kpc from the center of the disk and to differentiate between the arms and the inter-arms.
We characterize the widths of the H I profiles by the relative dispersions σv of the best
Gaussian fit 3; the fits were done using a least squares minimization algorithm.
Figures 3 and 4, show the observed profiles for a few pixels throughout NGC 1058 from
the 45′′ and the 30′′ data sets respectively. The single Gaussians which best approximate
the shapes of these profiles, as well as their residuals are also shown. Note, that each of the
profiles is representative of the remainder of the spectra, it is not a best find or the result
of averaging over large areas of the disk or velocity space.
While the residual patterns suggest that a single Gaussian is not a good functional
description of the H I profiles in NGC 1058, the FWHM and σv derived from the single
Gaussian fits track well the intrinsic width of the H I spectrum. This proportionality allows
us to describe the widths of the profile in terms of the results of our least squares fitting.
The general characteristics of the velocity dispersion will be discussed in terms of the 45′′and
30′′ cubes.
3 The FWHM is often use to characterize the H I line widths. This tradition is based upon the fact that
H I profiles are modeled by one or multiple Gaussians, where the flux as a function of velocity v is given by
f(v) =
(v − v0)
and v0 is the velocity at associated with the peak flux.
– 5 –
4. General Characteristics of the Velocity Dispersion
Figure 5 presents the distribution of velocity dispersions across the disk of NGC 1058.
Unlike previous observers of NGC 1058, we find a wide range of dispersions from 4 to 14
km sec−1 in addition to a few extremely narrow profiles with σv ∼ 3.5 km sec−1. These
narraow profiles are found in regions of relatively low column density at radii greater than
300′′ or 13 kpc. There are three regions of high dispersion which stand out in Figure 5:
one in the center (labeled C and ∼ 4.5 kpc across) and two others symmetric about the
center in the North-West (N ∼ 3 kpc) and South-East (S ∼ 3 × 5 kpc) of the center. We
find no obvious correlation between high H I velocity dispersion and stars or star formation
tracers such as Hα (Figure 10), radio contiuum, SNe except in the central region C. The
most probable explanation for the observed highest dispersions outside the central region
(i.e. in N and S) are small scale (≤ 0.7 kpc) bulk motions (see section 7 below). In the
southern, part the disk could be warping (van der Kruit & Shostak 1984, Shen & Sellwood
2006) leading to the observed broad profiles. Figure 6 shows that H I profiles from N and
S are also assymmetric. However, a similar explanation for region N would suggest rather
impressive small-scale structure in the warp as it would requiere the inclination to change
∼ 3 degrees over a region smaller than 0.7 kpc in diameter, if we assume from Tully Fisher
an intrinsic rotation velocity of 150 km s−1 . A more exciting alternative explanation for the
bulk motions observed in N is that they are caused by the infall of gas left over from galaxy
formation. However, this is somewhat difficult to reconcile with the relatively low column
density in these regions.
Two global trends are evident from the derived dispersions: a radial fall-off, shown in
Figure 7, and a predominance of the broadest profiles in the inter-arm regions of the galaxy
(Figure 9). Ferguson et al. (1998) used deep Hα observations to reveal the presence of H II
regions in the central 6 kpc of NGC 1058. There are several knots of high dispersion (12 to
13.5 km/sec) in region C, with most of the profiles measuring between 7.5 and 11 km/sec.
However, none of the star formation sites outside the central ∼2 kpc discovered in that study
seem to affect the width of the profiles. Also, regions N and S are located in the inter-arm
regions and are not associated with sufficiently strong star formation to be detected in the
Ferguson et al. (1998) study. Therefore we find that the dispersions do not correlate with
star formation as shown from the overlay of Feruson et al (1998)’s Hα map atop contours of
velocity dispersion Figure 10.
Figure 8 shows the kinetic energy in the gas associated with motions perpendicular to
the disk. Because only a qualitative behaviour was of interest here, the kinetic energy in
vertical motions at a certain pixel location was roughly approximated as the product of total
intensity times the square of the velocity dispersion. Approximated as such, the kinetic
– 6 –
energy in vertical motions does not follow the decline in star light which drops with radius
as ∼ exp
. Figure 9 shows that, the broadest profiles seem to be found in relatively low
column density areas between the spiral arms (as traced by H I). As such we do not find a
correlation between the velocity dispersion of stars in the disk or the H I column density.
The dissimilarity between stars, star-formation, H I intensity, and the kinetic energy in
the gas implies that processes other than those directly associated with stars put energy into
the ISM. Sellwood & Balbus (1999) suggested that magnetic fields with strengths of a few
micro-gauss in these extended disks allow energy to be extracted from galactic differential
rotation through MHD-driven turbulence. While that mechanism predicted a uniform dis-
persion outside of the optical disk, in an attempt to explain lower quality data on NGC 1058,
a similar mechanism has the potential of explaining the level and behaviour of the velocity
dispersions as a function of radius (Sellwood, private communication). The Sellwood & Bal-
bus (1999) paper generated significant work on numerical models that predict the occurance
of the magnetorotational instability in galactic disks (e.g. Dziourkevitch, Elstner, & Rudiger
2004, Pionteck & Ostriker 2004).
5. Profile Shapes
Any model that would explain how energy is put into the ISM must account for the
shape of the profiles in NGC 1058. A single Gaussian least-squares fitting routine was run
on the 30′′ and 45′′ data sets. In both cases, we found that while the signal to noise for most
profiles was excellent the chisq per degree of freedom was larger than a few, the residuals
also suggested that the wings were broader than those of a Gaussian.
To understand whether how the H I line shapes varied throughout the galaxy, the profiles
were normalized by flux, aligned so that their peaks were at the same central velocity. These
were plotted in units of FWHM (2.354σv), using the parameters from the single Gaussian
fits to control the scaling. This was done to reveal only the difference in the line shapes and
not other differences such as the width of peak intensity. While stacking up the 45′′ profiles
it became clear that almost all profiles appeared to have the same shape. Figure 13 suggests
that despite it being non-Gaussian, the shapes of the line profiles are identical throughout
most of the galaxy when scaled by σv and their peak flux and aligned so that their peaks
occur at the same velocity.
For the 45′′ data, median shapes from profiles within different width and peak intensity
ranges were compared and found to be identical within the error-bars. The method used to
derive such median line shapes is fairly straightforward. After the pixel selection (by FWHM,
– 7 –
location in the galaxy, etc.) the profiles corresponding to every pixel were normalized in
intensity dividing by the peak flux. A grid of 63 channels for the 45′′ data and 108 for the
30′′ data was set up to replace the velocity axis from units of km/sec to units of FWHM.
For example suppose that the velocity corresponding to the peak of a certain profile (in the
45′′ cube) is Vcen km s
−1 and that its FWHM is FW km s−1. Only the channels between
Vcen − 3×FW and Vcen + 3×FW were used in deriving the median shape. The normalized
fluxes corresponding to these channels were then resampled onto a grid where each bin (i.e.;
channel) is 6FW divided by the number of channels. Each bin therefore contains a certain
number distribution of normalized fluxes; these fluxes were then sorted and the middle value
is taken as the median.
Median profiles were also derived and compared from various areas throughout the
disk and the line shapes appeared similar everyhwere except in N and S, where the profiles
where more asymmetric as previously discussed. For brevity we present just two of these
tests in Figure 14. The same median comparison tests were done on the 30′′ data. At the
30′′ resolution median profiles derived for certain ranges of peak flux and from various areas
throughout the galaxy were also identical. However, the median profiles derived for various
ranges of FWHM appeared to vary in the shape of their wings, perhaps because of the lower
signal to noise in this data set, and to the smaller number of broad (σv ≥ 10km s−1) than
that of narrow lines.
Throughout our analysis we assumed that the noise characteristic in each profile is
random and that the rms noise is the same regardless of the strengh of the signal. This
assumption need not be true as deconvolution algorithms seem to produce noise that is
proportional in a non-linear fashion with signal (Rupen 97). However different noise for
different flux levels will hardly lead to a universal, non-Gaussian line shape. A double
Gaussian (a narrow and a broad component) as shown in Figure 15 is a good fit to the
median profile derived from the 45′′ data.
6. Kinetic Energy Distribution
The uniformity of the profile shape in NGC 1058 suggests that on scales of 2.5 kpc,
the neutral gas is being stirred into the same distribution of energy per unit mass and that
this distribution is different than that for other galaxies (e.g. the Milky Way). Figure 16
shows the normalized kinetic energy (KE) distribution for the Milky Way and for NGC 1058.
This comparison is only qualitative. The term “normalized” in the case of the Milky Way
refers to the fact that the KE distribution was obtained from a model (i.e. double Gaussian
fit) of the H I emission at the North Galactic Pole; this model was presented in Kulkarni
– 8 –
& Fich (1985), hereafter KF85, and it only includes “normal” emission, i.e. it does not
include emission from the H I falling into the disk. These authors corrected for the infalling
emission by assuming that the huge bump on one side of the profile represented infalling
gas. To remove the bump they reflected the profile about the velocity corresponding to the
peak flux and obtained the profile shown in Figure 16. The units of the KF plot are Kelvin2
km2 s−2. The units for the NGC 1058 are arbitrary, and the term “normalized” in this
case means that instead of flux or temperature, we use flux divided by peak flux, and bin
numbers instead of velocities. To compare those qualitatively we aligned the KF profile with
the NGC 1058 median profile from the entire 45′′ data set. The aligning was done by fitting
a single Gaussian to the KF85 profile and to the NGC 1058 median profile. We require and
that the limits of the KF85 and the NGC 1058 profiles span an equal number of FWHM.
The striking feature in the Galactic energy distribution, also noted by Kulkarni & Fich
(1985) is the almost constant kinetic energy for about 50 km/sec. In contrast, NGC 1058’s
KE curve is more centrally peaked. Presumably the KE distribution is set both by the
galactic potential as well as explosive ISM events (such as SNe, star formation, and infalling
gas). It is not perfectly clear how these factors have shaped the energy distribution as a
function of velocity of either the Milky Way or NGC 1058. function of velocity.
7. Beam Smearing and Bulk Motions
An accurate study of the profile shapes throughout the galaxy require us to understand
the effect of beam smearing on our measurements. Consider a round spiral disk with gas
moving in circular orbits at a velocity vcirc; attaching polar coordinates to this disk (rd, θ) and
letting the angle between the normal to the plane of the galaxy and line-of sight be referred
to as the inclination angle (i) the observed radial velocity on a set of sky-coordinates (x, y)
will be
vz(x, y) = vz(rd, θ)cos(i) + vcirc(rd) sin(i)cos(θ) + vred
where vred is the velocity of the galaxy for which with respect to the observer. Obviouosly a
smaller i is (a more face-on) makes it easier to measure the true vz distribution. A gradient
in the vcirc(rd)sin(i)cos(θ) across the resolution element (beam) will increase the width of
the profile and confuse the measurements of the velocities perpendicular to the disk. This
problem is known as beam smearing.
Two tests were performed using our highest resolution (15′′) data to quantify the effect
of beam smearing on our measurements of σv. First, we determined the maximum in-plane
velocity difference within a beam which would contribute to the width of the line profile at
a certain position in the galaxy (i.e. at a pixel). This was done by finding the maximum
– 9 –
difference (hereafter Mdiff) between the central velocity vcen (i.e. the velocity associated
with the peak flux as derived from the single Gaussian fit to the 15′′ data) of the H I pixel
and the central velocities of all the pixels within a square with 32′′ sides centered on that
pixel. Figure 11 shows the map of these maximum difference. This method is based on the
assumption that differences in vcen are due to gas motions in the plane of the galaxy. This
test shows that σv is correlated with Mdiff . The correlation between σv and Mdiff suggests
the exitence of bulk motions on scales smaller and equal to those probed by our highest
resolution data 15′′ (0.7 kpc).
Finding Mdiff across NGC 1058’s disk gives an upper limit to the broadening of the
H I profiles. To better understand the effect of beam smearing on our observations we
constructed a simple model of how H I in NGC 1058 would appear if it was an infinitely
cold disk; we then convolved this model with a 30′′ beam and ran our Gaussian least squares
fitting routine on the resulting H I profiles. The widths of these final model profiles were
significantly smaller than those measured in NGC 1058 (Figure 12) suggesting that beam
smearing does not have a significant impact on our observations.
8. Summary and Conclusions
Excellent resolution and high sensitivity H I observations of NGC 1058 show an intrigu-
ing picture of the interstellar medium throughout this galaxy: the velocity dispersion ranges
from 4 to 14 km/sec but is not correlated with star formation or the spiral arms, which is
another major ISM regulator. Global trends such as a radial fall-off must be explained in the
context of significant local effects; most notable among these are isolated, resolved regions
of high velocity dispersions as well as significant scatter in the dispersion at a given radius.
In summary unlike some previous studies, we find that the dispersion is not constant and it
does not simply decline with radius. we also find that there is no tight correlation between
the width of the profiles and the spiral arms.
The most probable source for the highest dispersions observed outside the central regions
are small scale (≤0.7 kpc) bulk motions. The energy sources supporting such motions are not
entirely clear. The disk is warped in the southern part (van der Kruit & Shostak 1984, Shen
& Sellwood 2006), leading to the observed broad profiles: however, a similar explanation for
region N would suggest a rather impressive small-scale structure in the warp, as it would
require the inclination to change by ∼ 3 degrees over a size smaller than 0.7 kpc.
There is no obvious correlation with stars or star formation tracers such as Hα, radio
continuum, SNe except in region C; nor is it clear what role, if any, is played by spiral arms
– 10 –
in driving the observed small scale bulk motions. Some of the measured velocity dispersions
are higher than the 10 km s−1 canonical sound speed in the ISM, but since we cannot easily
measure directly the pressure and 3-dimensional density structure of the gas, we cannot
determine the exact sound speed to know if we are indeed seeing supersonic motions.
The shapes of the H I profiles in NGC 1058 are non-Gaussian and hence cannot be
explained as emission from single temperature gas. Therefore, it is not clear whether these
narrow profiles are evidence of a lower thermal balance point between heating and cooling
mechanisms in NGC 1058’s outskirts as compared to the rest of the galaxy.
A double Gaussian description of the H I profile is far from a complete surprise. The
surprise is the constancy between the broad and narrow components throughout NGC 1058’s
H I disk. In previous studies (e.g. Mebold 1972, Young & Lo 1996) it was found that some
of the H I profiles were well described by double Gaussians, and these associated the narrow
Gaussians with the CNM and the broad with the WNM. Young & Lo (1996) found that
the narrow component existed only in regions of high H I column density, next to areas
with active star formation. It is unlikely that the universal profile in NGC 1058 can be
explained as a combination of cold and warm medium for the narrow and broad component
respectively, because it seems difficult to have the same ratio of warm to cold gas in regions
associated with stars and star fromation and at radii three times the optical R25. Also, high
resolution observations in other galaxies (Braun 1998) showed that the CNM dissappears at
the edges of the optical disk.
Further quests on the observational front such as (at what resolution will this universality
break down, is this spatial scale particular to NGC 1058, can we see the same shape and/or its
universality in other systems), as well as theoretical efforts to model mechanisms of injecting
energy into the ISM, and determine how that energy dissipates throughout a fractal ISM are
necessary to understand the full significance of the universal profile in our 45′′ data cube.
A.P. would like to thank Jacqueline van Gorkom for invaluable help in designing the
experiment, as well as during during the analysis process and in editing this document.
A.P. would also like to thank Liese van Zee, Mordecai Mac-Low and Jennifer Donovan
for their helpfull suggestions and discussions. The National Radio Astronomy Observatory
is a facility of the National Science Foundation operated under cooperative agreement by
Associated Universities, Inc..
REFERENCES
Allen, R. J., and Shostak, G. S. 1979, Astron. Astrophys. Suppl. 35, 163
– 11 –
Braun, R. and Walterbros R. A. M. , 1992 ApJ, 386, 120
Braun, R., 1997 ApJ484, 637
Braun, R., astro-ph/9804320 Interstellar Turbulence, Proceedings of the 2nd Guillermo Haro
Conference. Edited by Jose Franco and Alberto Carraminana. Cambridge University
Press, 1999., p.12
Dickey, J.M. and Lockman, F.J., 1990, Annu. Rev. Astron. Astrophys. 28, 215
Dickey, J.M., Mebold, U., Stanimirovic, S., Staveley-Smith L., 2000, ApJ, 536. 756D
Dziourkevitch, N., Elstner, D., & Rudiger, G., 2004, A& A, 423, L29
Eskridge, P. B., Frogel, J. A., Pogge, R. W., Quillen, A. C., et al. 2002, ApJS, 143, 73
Ferguson, A., Wyse, R. F. G., Gallagher, J.S., Hunter, D.A., 1998 ApJ506, 19
Ferguson, A., Gallagher, J. S., Wyse, R. F. G., AJ, 116, 673
Kulkarni, S.R. and Fich, M., 1985 ApJ289, 792
Kulkarni, S.R. and Heiles, C., 1988 in Galactic and Extragalactic Radio Astronomy, ed. G.L.
Verschuur& K. I. Kellermann (New York: Springer Verlag), 95
Lewis, B.M., 1984 ApJ285, 453
Lewis, B.M., 1987 ApJS63, 515
Lewis, B.M., 1987 Obs. 107, 201L
Li, Y., Mac Low, M.M., & Klessen, R.S., 2005, ApJ, 620, L19
Lockman, F. J., 1984, ApJ, 283, 90
Lockman, F. J. and Gehman, C. S., 1991, ApJ, 382, 182
Mac Low, M.M., & Klessen, R. S., 2004, Rev. Mod. Phys. 76, 125
Malhotra, S., 1994, ApJ, 433, 687
Malhotra, S., 1995 ApJ, 448, 132
McKee, C.F.,Ostriker, J.P. 1977, ApJ, 218, 148
Mebold, U., 1972, A& A, 19, 13
http://arxiv.org/abs/astro-ph/9804320
– 12 –
Merrifield, M. R., 1993, MNRAS, 261, 233
Olling, R., 1995, PhD Thesis
Olling, R., 1996, AJ 112, 457
Oort, J. H., 1932, Bull. Astron. Inst. Netherlands, 6, 349
Piontek, R. A., & Ostriker, E. C., 2004, ApJ 601, 905
van der Kruit, P.C., Shostak, G.S., 1982 A&A, 115, 293S
Rupen, M. P., 1987 PhD thesis
Rupen, M. P., 1997, VLA Scientific Memorandum, No. 172: A Test of the CS (Shortened
C) Configuration available at http://www.vla.nrao.edu/memos/sci
Rupen, M. P., 1998, VLA Scientific Memorandum, No. 172: A Test of the CS (Shortened
C) Configuration available at http://www.vla.nrao.edu/memos/sci/175/cstest2/
Rupen, M.P., 1999, Spectral Line Observing II: Calibration and Analysis in ASP Conf.
Ser.,180, Synthesis Imaging in Radio Astronomy II ed. by Taylor, G.B., Carilli, C.L.,
& Perley, R.A. pg. 229
van der Kruit, P.C., Shostak, G.S., 1984 A&A, 134, 258V
Shen, J. & Sellwood, J. A., MNRAS, 2006 370, 2
Sellwood, J. A. & Balbus, S.A., ApJ 1999, 511, 660
Shostak, G.S., van der Kruit, P.C., 1984 A&A, 132, 20S
Young, L.M., Lo, K.Y. 1996 ApJ, 462, 203
Young, L.M., Lo, K.Y. 1997 ApJ, 476, 127
Young, L.M., Lo, K.Y. 1997 ApJ, 490, 710
This preprint was prepared with the AAS LATEX macros v5.2.
http://www.vla.nrao.edu/memos/sci
http://www.vla.nrao.edu/memos/sci/175/cstest2/
– 13 –
Table 1. General Properties
NGC 1058
R.A.(B1950) 02 40 23.2
Dec (B1950) +37 07 48.0
Morphological type Sc
Vsys [km/sec] 518
LB [L⊙] 1.5× 109
MHI [M⊙] 2.3× 109
SFR [M⊙ yr
−1]a 3.5× 10−2
D25×d25 [arcmin]b 3.0× 2.8
Distance[Mpc]c 10
Physical equivalent of 1′′ 48.5 pc
Inclinationd 4–11◦
Environmente member of the NGC 1023 Group
aSFR stands for Star Formation Rate, it was calculated from Hα fluxes by Ferguson,
Gallagher & Wyse (1998)
bNASA/IPAC extragalactic database (NED)
cFerguson, Gallager, & Wyse (1998)
dvan der Kruit & Shostak (1984)
e Lewis (1975)
– 14 –
Fig. 1.— Sample channel maps for NGC 1058–Each square image represents the H I 21 cm
line emission within a velocity range of 2.58 km s−1 where the central velocity of that range
is given on the upper left corner of each image. The x and y axis of every channel map gives
the Right Ascension (RA) and Declination (Dec) coordinates,and are identical to these in
Figure 2. Such sample channel maps can be assembled together in the same way the frames
of a movie are put together to make what is refered to as a data cube.
– 15 –
Fig. 2.— Intensity weighted mean velocity contours atop H I intensity map for NGC 1058
Intensity weighted mean velocity contours atop H I intensity map (grey) for NGC 1058—This
figure was made using the 15′′ data cube with a sensitivity of 0.5 mJy/beam corresponding
to a column density of 1.6 × 1018 cm−2. The physical resolution of this image is 0.7 kpc and
the velocity contours range between 500 and 558 km/sec in 2 km/sec increments. The H I
disk in NGC 1058 extends to a diameter of more than 20 kpc.
– 16 –
Fig. 3.— Four sample of H I profiles (crosses), the Gaussian fit (solid line) and the residual pattern
(red) with σv from the Gaussian fit: 12.7 km/sec (upper left, pixel from a region of high dispersion),
5.95 km/sec (upper right, pixel from a region of low dispersion), 7.6 km/sec (lower left pixel from
an interarm region), 9.3 km/sec (lower right, pixel from an arm region). The x axis is in km/sec
and the y axis represents H I intensity in mJy/beam. The shown residuals indicate that a single
Gaussian function does not adequately describe the line shapes. However the width of the Gaussian
does track the breadth of the H I profile. This figure is based on the 45′′ data cube and the single
Gaussian fits done for that cube.
– 17 –
Fig. 4.— Three samples of H I profiles (solid line), the Gaussian fit (squares) and the residual
pattern (dashed) with σv from the Gaussian fit: 3.8, 7.6, and 13.2 respectively. Figure based
on the 30′′ data cube and the single Gaussian fits done for that cube. The x axis is in km/sec
and the y axis represents H I intensity in mJy/beam.
– 18 –
Fig. 5.— Distribution of dispersions throughout NGC 1058; the regions of highest dispersion
are labeled N, C, and S. The x and y axis are the RA and Dec in B1950 coordinates. This
figure is based on the results of the single Gaussian fit performed on the 30′′ data. The
contours are in km/sec and start in steps of 0.5 km/sec. Black is used for dispersions
between 5.5 to 7, cyan for 7.5 to 9, green for 9.5 to 11, red for 11.5 to 13, and magenta for
13.5 to 15 km/sec.
– 19 –
Fig. 6.— Four sample profiles in the regions with highest asymmetries (of order few percent);
The x axis of each of the four plots represents velocity and is in units of km s−1 and the
y axis represents H I intensity in mJy/beam, where the beam refers to the point spread
function of the observations. The upper left is from region N, upper right from C, lower left
from S, and lower right from a region West of S. Regions N,C, and S are shown and labeled
in Figure 3.
– 20 –
Fig. 7.— Radial dependece of NGC 1058’s σvs—The x axis (in arcseconds) gives the radius
while the y axis (in km/sec) gives the σv as derived from single Gaussian least squares fits
to the 30′′ data cube. The filled circles represent points with error bars,less than 12.5% of
σv, the empty circles – points with error bars between 12.5% and 25% and the dots– points
with errors greater than 25%. Despite a few high σv regions (N,S in Figure 5, the radial
falloff is evident.
– 21 –
Fig. 8.— The energy in the neutral gas was roughly approximated as the product of the
total H I intensity and the square of the velocity dispersion. Both the green and the black
lines represent azimuthal averages of concentric rings around the center of NGC 1058. The
black error bars show the rms in each of these rings. The red line shows the exponential fit
to the stellar data. The energy in the gas falls off with radius much slower than the stellar
luminosity suggesting that processes other than those associated with star input energy are
responsible for heating the gas at large radiae.
– 22 –
Fig. 9.— σv contours atop of an H I total intensity map—The contours range from 4 to 14
km/sec in steps of 0.5 km/sec and are based on the single Gaussian fit to the 30′′, NGC 1058
data cube. Note that regions N and S of high σv are located in the inter-arms.
– 23 –
Fig. 10.— Hα greyscale from Ferguson, Gallager, & Wyse 1998, atop dispersion contours as
in Figure 5.
– 24 –
Fig. 11.— Contours of maximum potential beam smearing in km s−1. Black is used for
dispersions maximum beam smearing effect of 1, 2, and 3 km/sec, cyan for 4,5,6, green for
7,8,9, red for 10, 11, and 12, and magenta for values of 30 km/sec and above. 5.5 to 7,
cyan for 7.5 to 9, green for 9.5 to 11, red for 11.5 to 13, and magenta for 13.5 to 15 km/sec.
The magenta contours are regions where the H I emssion was very faint or non-existent. As
such the Gaussian fitting routine employed produced spurious results. The square shape of
some of the contours is an artifact of the method employed in determining the maximum
beam smearing. This figure is based on the 15′′ data cube. Note that the highest velocity
gradients are found in regions N and S and south-west of C. Regions N, S, and C are shown
and labeled in Figure 5.
– 25 –
Fig. 12.— Effect of beam smearing on σv measurements at 30
′′ resolution. The x axis in
in km/sec and the y axis is in mJy/beam. The connected squares represent the observed
profile. The narrow profiles were obtained by modeling the velocity profiles associated with
an infinitely cold disk and then convolving that model with a 30′′ beam, and running the
Gaussian least squares fitting routine on the convolved cube.
– 26 –
Fig. 13.— Median Profile (red) atop all H I profiles from NGC 1058 45′′ data cube. The x
axis is flux/peak flux and the y axis is velocity minus the central velocity and divided by the
FWHM.
– 27 –
Fig. 14.— Median profiles derived for certain FWHM ranges (left pannel);and for peak flux
ranges (right pannel) from the 45′′ NGC 1058 data set. The x axis is not in [km/sec] but
represents the grid (bin number) on which the profiles were set. Please refer to text for a
detailed explanation. The y axis is the flux divided by peak flux. In the left pannel black
is used for median profiles with widths (FWHM) between 14 to 18 km/sec, cyan for 18 to
22 km/sec, green 22 and red for 26 to 30 km/sec. In the right pannel a solid line is used
for profiles with peak fluxes between 10 and 25 mJy/beam, a dotted line is used for profiles
with peaks between 25 and 40 mJy/beam, short dash for 40 to 55, and long dash for 55 to
70 mJy/beam.
– 28 –
Fig. 15.— Double Gaussian Fit to the Universal Profile — The x axis is in bins as described
in the text and the y axis is the flux normalized to the peak intensity. The two Gaussian
components used to fit the Median profile and their sum are shown in dotted lines. The
residuals are shown in red. Error bars based on the rms in each bin are also given. The
ratio between the areas of the broad and narrow components is 1.35 while that between their
FWHM is 2.09.
– 29 –
Fig. 16.— Normalized Kinetic Energy distributions for the Milky Way (top) and NGC 1058
(bottom) and corresponding H I profiles. The top figure was obtained from a double Gaussian
decomposition of the North Galactic Pole H I emission, from Kulkarni & Fich (1985). For the
top figure, the x axis represents velocity in units of km s−1 and the y axis is the normalized
energy in units of Kelvin km2 s−2. The inset upper right figure shows the H I profile
from which the normalized energy curve for the Milky Way’s North Galactic Pole emission
was estimated. The bottom figure shows the qualitative behaviour of the kinetic energy
distribution with velocity in NGC 1058. Here the x axis represents a bin number (ref.to
text) while the y axis represents the qualitative behaviour of the kinetic energy distribution
in NGC 1058. This figure suggests that the kinetic energy in the Galactic North Galactic
Pole emission is more evenly distributed in velocity than that in NGC 1058.
	Introduction
	Observations and Data Reductions
	Results and Analysis
	General Characteristics of the Velocity Dispersion 
	Profile Shapes
	Kinetic Energy Distribution
	 Beam Smearing and Bulk Motions
	Summary and Conclusions
ABSTRACT
  We present excellent resolution and high sensitivity Very Large Array (VLA)
observations of the 21cm HI line emission from the face-on galaxy NGC 1058,
providing the first reliable study of the HI profile shapes throughout the
entire disk of an external galaxy. Our observations show an intriguing picture
of the interstellar medium; throughout this galaxy velocity-- dispersions range
between 4 to 15 km/sec but are not correlated with star formation, stars or the
gaseous spiral arms. The velocity dispersions decrease with radius, but this
global trend has a large scatter as there are several isolated, resolved
regions of high dispersion. The decline of star light with radius is much
steeper than that of the velocity dispersions or that of the energy in the gas
motions.

<|endoftext|><|startoftext|>
Introduction
From the realization [26, 27, et seq.] that all cataclysmic variables (CVs)
are interacting binary stars, their existence posed a dilemma for theories of
binary evolution. The notion that close binary stars might evolve in ways
fundamentally different from isolated stars was rooted in the famous ‘Algol
paradox’ (that the cooler, lobe-filling subgiant or giant components among
these well-known eclipsing binaries are less massive, but more highly evolved,
than their hotter main-sequence companions). The resolution of that para-
dox invoked large-scale mass transfer reversing the initial mass ratios of these
binaries [34]. Indeed, model calculations assuming conservation of total mass
and orbital angular momenum are qualitatively consistent with the main fea-
tures of Algol-type binaries. Even if quantitative consistency between models
http://arxiv.org/abs/0704.0280v1
2 Ronald F. Webbink
and observational data generally requires some losses of mass and angular
momentum among Algol binaries (e.g., [10, 21, 8]), the degree of those losses
is typically modest, and the remnant binary is expected to adhere closely to
an equilibrium core mass-radius relation for low-mass giant stars (see, e.g.,
the pioneering study of AS Eri by Refsdal, Roth & Weigert [46]). Those rem-
nant binaries are typically of long orbital period (days to weeks) in compari-
son with CVs, and furthermore typically contain helium white dwarfs of low
mass, especially in the short-period limit. In contrast, CVs evidently contain
relatively massive white dwarfs, in binary systems of much shorter orbital
periods (hours), that is, with much smaller total energies and orbital angular
momenta.
In an influential analysis of the Hyades eclipsing red dwarf/white dwarf
binary BD +16◦ 516 (= V471 Tau), Vauclair [57] derived a total system mass
less than the turnoff mass of the Hyades, and noted that the cooling age of
the white dwarf component was much smaller than the age of the cluster.
He speculated that V471 Tau in its present state was the recent product of
the ejection of a planetary nebula by the white dwarf. Paczynski [41] realized
that, immediately prior to that event, the white dwarf progenitor must have
been an asymptotic giant branch star of radius ∼600 R⊙, far exceeding its
current binary separation ∼3R⊙. He proposed that the dissipation of orbital
energy provided the means both for planetary nebula ejection and for the
severe orbital contraction between initial and final states, a process he labeled
‘common envelope evolution’ (not to be confused with the common envelopes
of contact binary stars). Discovery soon followed of the first ‘smoking gun’,
the short-period eclipsing nucleus of the planetary nebula Abell 63 [3].
Over the succeeding three decades, there have been a number of attempts
to build detailed physical models of common envelope evolution (see [55] for
a review). These efforts have grown significantly in sophistication, but this
phenomenon presents a daunting numerical challenge, as common envelope
evolution is inherently three-dimensional, and the range of spatial and tem-
poral scales needed to represent a common envelope binary late in its inspiral
can both easily exceed factors of 103. Determining the efficiency with which
orbital energy is utilized in envelope ejection requires such a code to conserve
energy over a similarly large number of dynamical time scales.
Theoretical models of common envelope evolution are not yet capable of
predicting the observable properties of objects in the process of inspiral. If
envelope ejection is to be efficient, then the bulk of dissipated orbital energy
must be deposited in the common envelope on a time scale short compared
with the thermal time scale of the envelope, else that energy be lost to ra-
diation. The duration of the common envelope phase thus probably does not
exceed ∼103 years. However, general considerations of the high initial orbital
angular momenta of systems such as the progenitor of V471 Tau, and the fact
that most of the orbital energy is released the envelope only very late in the
inspiral have led to a consensus view [60, 32, 33, 65, 50] that the planetary
nebulae they eject should be bipolar in structure, with dense equatorial rings
Common Envelope Evolution Redux 3
absorbing most of the initial angular momentum of the binary, and higher-
velocity polar jets powered by the late release of orbital energy. Indeed, this
appears to be a signature morphology of planetary nebulae with binary nuclei
(e.g., [2]), although it may not be unique to binary nuclei.
2 The Energetics of Common Envelope Evolution
Notwithstanding the difficulties in modeling common envelope evolution in
detail, it is possible to calculate with some confidence the initial total energy
and angular momentum of a binary at the onset of mass transfer, and the
corresponding orbital energy and angular momentum of any putative remnant
of common envelope evolution.
Consider an initial binary of component masses M1 and M2, with orbital
semimajor axis Ai. Its initial total orbital energy is
Eorb,i = −
GM1M2
. (1)
Let star 1 be the star that initiates interaction upon filling its Roche lobe. If
M1c is its core mass, and M1e = M1 − M1c its envelope mass, then we can
write the initial total energy of that envelope as
Ee = −
GM1M1e
λR1,L
, (2)
where R1,L is the Roche lobe radius of star 1 at the onset of mass transfer (the
orbit presumed circularized prior to this phase), and λ is a dimensionless pa-
rameter dependent on the detailed structure of the envelope, but presumably
of order unity. For very simplified models of red giants – condensed poly-
tropes [40, 14, 16] – λ is a function only of me ≡ Me/M = 1 − Mc/M , the
ratio of envelope mass to total mass for the donor, and is well-approximated
λ−1 ≈ 3.000− 3.816me + 1.041m
e + 0.067m
e + 0.136m
e , (3)
to within a relative error < 10−3.
For the final orbital energy of the binary we have
Eorb,f = −
GM1cM2
, (4)
where Af is of course the final orbital separation. If a fraction αCE of the
difference in orbital energy is consumed in unbinding the common envelope,
αCE ≡
orb − E
, (5)
4 Ronald F. Webbink
αCEλr1,L
M1 −M1c
, (6)
where r1,L ≡ R1,L/Ai is the dimensionless Roche lobe radius of the donor
at the start of mass transfer. In the classical Roche approximation, r1,L is a
function only of the mass ratio, q ≡ M1/M2 [7]:
r1,L ≈
0.49q2/3
0.6q2/3 + ln(1 + q1/3)
. (7)
Typically, the second term in brackets in (6) dominates the first term.
As formulated above, our treatment of the outcome of common envelope
evolution neglects any sources or sinks of energy beyond gravitational terms
and the thermal energy content of the initial envelope (incorporated in the
parameter λ). The justification for this assumption is again that common
envelope evolution must be rapid compared to the thermal time scale of the
envelope. This implies that radiative losses (or nuclear energy gains – see
below) are small. They, as well as terminal kinetic energy of the ejecta, are
presumably reflected in ejection efficiencies αCE < 1. We neglect also the
rotational energy of the common envelope (invariably small in magnitude
compared to its gravitational binding energy), and treat the core of the donor
star and the companion star as inert masses, which neither gain nor lose
mass or energy during the course of common envelope evolution. One might
imagine it possible that net accretion of mass by the companion during inspiral
might compromise this picture. However, the common envelope is typically
vastly less dense than the companion star. and may be heated to roughly
virial temperature on infall. A huge entropy barrier arises at the interface
between the initial photosphere of the companion and the common envelope
in which it is now embedded, with a difference in entropy per particle of order
(µmH/k)∆s ≈ 4–6. The rapid rise in temperature and decrease in density
through the interface effectively insulates the accreting companion thermally,
and strongly limits the fraction of the very rarified common envelope it can
retain upon exit from that phase [62, 15].
Common envelope evolution entails systemic angular momentum losses as
well as systemic mass and energy losses. Writing the orbital angular momen-
tum of the binary,
GM21M
2A(1 − e
M1 +M2
, (8)
in terms of the total orbital energy, E = −GM1M2/2A, we find immediately
that the ratio of final to initial orbital angular momentum is
)3/2 (
M1c +M2
M1 +M2
)−1/2 (
)1/2 (
1− e2f
1− e2i
. (9)
Since M1c < M1 and we expect the initial orbital eccentricity to be small (ei ≈
0), it follows that any final energy state lower than the initial state (|Ef | >
Common Envelope Evolution Redux 5
|Ei|) requires the loss of angular momentum. The reverse is not necessarily
true, so it is the energy budget that most strongly constrains possible outcomes
of common envelope evolution.
3 Does Common Envelope Evolution Work?
As an example of common envelope energetics, let us revisit the pre-CV
V471 Tau, applying the simple treatment outlined above. It is a member of
the Hyades, an intermediate-age metal-rich open cluster (t = 650 Myr, [Fe/H]
= +0.14) with turnoff mass MTO = 2.60 ± 0.06M⊙ [28]. The cooling age of
the white dwarf is much smaller than the age of the cluster (tcool,WD = 10
yr [39] – but see the discussion there of the paradoxical fact that this most
massive of Hyades white dwarfs is also the youngest). Allowing for the possi-
bility of significant mass loss in a stellar wind prior to the common envelope
phase, we may take MTO for an upper limit to the initial mass M1 of the
white dwarf component. The current masses for the white dwarf and its dK2
companion, as determined by O’Brien et al. [39] are MWD = 0.84± 0.05M⊙,
MK = 0.93±0.07M⊙, with orbital separation A = 3.30±0.08R⊙. A 2.60M⊙
star of Hyades metallicity with a 0.84M⊙ core lies on the thermally-pulsing
asymptotic giant branch, with radius (maximum in the thermal pulse cycle)
which we estimate at Ri = 680R⊙ = R1,L, making Ai = 1450R⊙. With this
combination of physical parameters, we derive an estimate of αCEλ = 0.057
for V471 Tau. Equation (3) then implies αCE = 0.054. This estimate of course
ignores any mass loss prior to common envelope evolution (which would drive
αCE to lower values), or orbital evolution since common envelope evolution
(which would drive αCE to higher values). In any event, the status of V471 Tau
would appear to demand only a very small efficiency of envelope ejection.1
The fact that V471 Tau is a double-lined eclipsing member of a well-
studied cluster provides an exceptionally complete set of constraints on its
prior evolution. In all other cases of short-period binaries with degenerate or
compact components, available data are inadequate to fix simultaneously both
the initial mass of the compact component and the initial binary separation,
for example. To validate the energetic arguments outlined above, one must
resort to consistency tests, whether demonstrating the existence of physically-
plausible initial conditions that could produce some individual system, or else
1 The anomalously small value of αCE deduced for V471 Tau may be connected to
its puzzlingly high white dwarf mass and luminosity: O’Brien et al. [39] suggest
that it began as a heirarchical triple star, in which a short-period inner binary
evolved into contact, merged (as a blue straggler), and later engulfed its lower-
mass companion in a common envelope. An overmassive donor at the onset of
common envelope evolution would then have a more massive core than produced
by its contemporaries among primordially single stars, and it would fill its Roche
lobe with a more massive envelope at somewhat shorter orbital period, factors all
consistent with a larger value of αCE having led to V471 Tau as now observed.
6 Ronald F. Webbink
following a plausible distribution of primordial binaries wholesale through the
energetics of common envelope evolution and showing that, after application
of appropriate observational selection effects, the post-common-envelope pop-
ulation is statistically consistent with the observed statistics of the selected
binary type. In the cases of interacting binaries, such as CVs, one should allow
further for post-common-envelope evolution. Nevertheless, within these limi-
tations, binary population synthesis models show broad consistency between
the outcomes of common envelope evolution and the statistical properties of
CVs and pre-CVs [4, 24, 44, 17, 63], as well as with most super-soft X-ray
sources [5], for assumed common envelope ejection efficiencies typically of or-
der αCE ≈ 0.3–0.5.
A useful tool in reconstructing the evolutionary history of a binary, used
implicitly above in analyzing V471 Tau, is the mass-radius diagram spanned
by single stars of the same composition as the binary. Figure 1 illustrates
such a diagram for solar-composition stars from 0.08 M⊙ to 50 M⊙. In it are
plotted various critical radii marking, as a function of mass, the transition
from one evolutionary phase to the next.2 Since the Roche lobe of a binary
component represents a dynamical limit to its size, its orbital period fixes the
mean density at which that star fills its Roche lobe,
logPorb(d) ≈
log(RL/R⊙)−
log(M/M⊙)− 0.455 , (10)
to within a very weak function of the binary mass ratio. The mass and radius of
any point in Fig. 1 therefore fixes the orbital period at which such a star would
fill its Roche lobe, just as the orbital period of a binary fixes the evolutionary
state at which such a star initiates mass transfer.
2 Not all evolutionary phases are represented here. In a binary, a donor initiates
mass transfer when it first fills its Roche lobe; if it would have done so at a prior
stage of evolution, then its present evolutionary state is ‘shadowed’, in the sense
that it only occurs by virtue of the binary not having filled its lobe previously.
Thus, for example, low- and intermediate-mass stars cannot in general initiate
mass transfer during core helium burning, because they would have filled their
Roche lobes on the initial ascent of the giant branch.
Fig. 1 (facing page). The mass-radius diagram for stars of solar metallicity, con-
structed from the parametric models of stellar evolution by Hurley, Pols, & Tout [19]
and models of thermally-pulsing asymptotic giant branch stars by Wagenhuber &
Weiss [58]. Also plotted in the locus of asymptotic giant branch stars at the onset
of the superwind, after Willson [64]; beyond this radius, systemic mass loss drives
orbital expansion faster than nuclear evolution drives stellar expansion, and a bi-
nary will no longer be able to initiate tidal mass transfer. The unlabeled dotted
line terminating at the junction between lines labeled ‘helium core flash’ and ‘core
helium ignition’ marks the division between those helium cores (at lower masses)
which evolve to degeneracy if stripped of their envelope, and those (at higher masses)
which ignite helium non-degenerately and become helium stars.
Common Envelope Evolution Redux 7
8 Ronald F. Webbink
In Fig. 2, the corresponding core masses of low- and intermediate-mass
stars are plotted in the mass-radius diagram. For a binary which is the imme-
diate product of common envelope evolution, the mass of the most recently
formed white dwarf (presumably the spectroscopic primary) equals the core
mass of the progenitor donor star. That donor (presuming it to be of solar
metallicity) must be located somewhere along the corresponding core mass
sequence in Fig. 2, with the radius at any point along that sequence corre-
sponding to the Roche lobe radius at the onset of the mass transfer, and the
mass a that point corresponding to the initial total mass of the donor. Thus,
if the mass of the most recently-formed white dwarf is known, it is possible
to identify a single-parameter (e.g., initial mass or initial radius of the donor)
family of possible common-envelope progenitors.
Using a mapping procedure similar to this, Nelemans & Tout [35] recently
explored possible progenitors for detached close binaries with white dwarf
components. Broadly speaking, they found solutions using (6) for almost all
systems containing only one white dwarf component. Only three putative
post-common-envelope systems failed to yield physically-plausible values of
αCEλ: AY Cet (G5 III + DA, Porb = 56.80 d [49]), Sanders 1040 (in M67:
G4 III + DA, Porb = 42.83 d [56]), and HD 185510 (=V1379 Aql: gK0 +
sdB, Porb = 20.66 d [22]). The first two of these systems are non-eclipsing,
but photometric masses for their white dwarf components are extremely low
(estimated at ∼0.25M⊙ and 0.22M⊙, respectively), with Roche lobe radii
consistent with the limiting radii of very low-mass giants as they leave the
giant branch (cf. Fig. 2, above). They are thus almost certainly post-Algol
binaries, and not post-common-envelope binaries. HD 185510 is an eclipsing
binary; a spectroscopic orbit exists only for the gK0 component [9]. The mass
(0.304 ± 0.015M⊙) and radius (0.052 ± 0.010R⊙) of the sdB component,
deduced from model atmosphere fitting of IUE spectra combined with solution
of the eclipse light curve, place it on a low-mass white dwarf cooling curve,
rather than among helium-burning subdwarfs [22]. Indeed, from fitting very
detailed evolutionary models to this system, Nelson & Eggleton [38] found a
Fig. 2 (facing page). The mass-radius diagram for low- and intermediate-mass
stars, as in Fig. 1, but with loci of constant core mass added. The solid lines added
correspond to core masses interior to the hydrogen-burning shell, dashed lines to
those interior to the helium-burning shell. Solid lines intersecting the base of the
giant branch (dash-dotted curve) correspond to helium core masses of to 0.15, 0.25,
0.35, 0.5, 0.7, 1.0, 1.4, and 2.0 M⊙; those between helium ignition and the initial
thermal pulse to 0.7, 1.0, 1.4, and 2.0 M⊙, and those beyond the initial thermal pulse
to 0.7, 1.0, and 1.4 M⊙. Dashed lines between helium ignition and initial thermal
pulse correspond to carbon-oxygen core masses of 0.35, 0.5, 0.7, 1.0, and 1.4 M⊙.
Beyond the initial thermal pulse, helium and carbon-oxygen core masses converge,
with the second dredge-up phase reducing helium core masses above ∼0.8 M⊙ to
the carbon-oxygen core.
Common Envelope Evolution Redux 9
10 Ronald F. Webbink
post-Algol solution they deemed acceptable. It thus appears that these three
problematic binaries are products of quasi-conservative mass transfer, and not
common envelope evolution.
The close double white dwarfs present a more difficult conundrum, how-
ever. Nelemans et al. [36, 37, 35] found it impossible using the energetic argu-
ments (6) outlined above to account for the existence of a most known close
double white dwarfs. Mass estimates can be derived for spectroscopically de-
tectable components of these systems from their surface gravities and effective
temperatures (determined from Balmer line fitting). The deduced masses are
weakly dependent on the white dwarf composition, and may be of relatively
modest accuracy, but they are independent of the uncertainties in orbital
inclination afflicting orbital solutions. These mass estimates place the great
majority of detectable components in close double white dwarf binaries below
∼0.46M⊙, the upper mass limit for pure helium white dwarfs (e.g., [51]). They
are therefore pure helium white dwarfs, or perhaps hybrid white dwarfs (low-
mass carbon-oxygen cores with thick helium envelopes). While reconstructions
of their evolutionary history yield physically-reasonable solutions for the final
common envelope phase, with values for 0 < αCEλ < 1, the preceding phase of
mass transfer, which gave rise to the first white dwarf, is more problematic. If
it also proceeded through common envelope evolution, the deduced values of
αCEλ ≤ −4 for that phase are unphysical. Nelemans & Tout [35] interpreted
this paradox as evidence that descriptions of common envelope evolution in
terms of orbital energetics, as described above, are fundamentally flawed.
4 An Alternative Approach to Common Envelope
Evolution?
Nelemans et al. [36] proposed instead parameterizing common envelope evo-
lution in terms of γ, the ratio of the fraction of angular momentum lost to
the fraction of mass lost:
Ji − Jf
M1 −M1,c
M1 +M2
. (11)
Both initial and final orbits are assumed circular, so the ratio of final to initial
orbital separations becomes
M1c +M2
M1 +M2
M1 −M1c
M1 +M2
. (12)
Among possible solutions leading to known close double white dwarfs, Nele-
mans & Tout [35] find values 1 < γ . 4 required for the second (final) common
envelope phase, and 0.5 . γ < 3 for the first (putative) common envelope
phase. They note that values in the range 1.5 < γ < 1.7 can be found among
possible solutions for all common envelope phases in their sample, not only
Common Envelope Evolution Redux 11
those leading to known double white dwarfs, but those leading to known pre-
CV and sdB binaries as well.
The significance of this finding is itself open to debate. At one extreme, it
would seem implausible for any mechanism to remove less angular momentum
per unit mass than the orbital angular momentum per unit mass of either
component in its orbit (so-called Jeans-mode mass loss). At the other extreme,
a firm upper limit to γ is set by vanishing final orbital angular momentum,
Jf . If M1c and M2 can be regarded as fixed, the corresponding limits on γ are
M1 +M2
M1 −M1c
> γ >
M1 +M2
M1 −M1c
M1 +M2
M1c +M2
. (13)
In a fairly typical example, M1c = M2 =
M1, γ is inevitably tightly con-
strainted for any conceivable outcome: 5
> γ > 5
. The ratio of final to initial
orbital separation, Af/Ai, is extremely sensitive to γ near the upper limit of
its range. It is therefore not surprising to find empirical estimates of γ clus-
tering as they do – their values merely affirm the fact that Af must typically
be much smaller than Ai.
The unphysically large or, more commonly, negative values of αCEλ noted
above for the first mass transfer phase in the production of close white dwarf
binaries [35] implies that the orbital energies of these binaries have increased
through this phase (or, at any rate, decreased by significantly less than the
nominal binding energies of their common envelopes). Such an increase in
orbital energy is a hallmark of slow, quasi-conservative mass transfer, on a
thermal or nuclear time scale. Thermal time scale mass transfer is driven by
relaxation of the donor star toward thermal evolution; the re-expansion of
the donor following mass ratio reversal is powered by the (nuclear) energy
outflow from the core of the star. Likewise, the bulk expansion of the donor
star in nuclear time scale mass transfer draws energy from nuclear sources in
that star. It appears, therefore, that the first phase of mass transfer among
known close double white dwarfs cannot have been a common envelope phase,
but must instead have been a quasi-conservative phase, notwithstanding the
difficulties that conclusion presents, as we shall now see.
The dilemma that the close double white dwarfs present is illustrated in
Figs. 3 and 4. Figure 3 shows the distribution of immediate remnants of
mass transfer among solar-metallicity binaries of low and intermediate mass,
for a relatively moderate initial mass ratio. Conservation of total mass and
orbital angular momentum have been assumed. The remnants of the intial
primary include both degenerate helium white dwarfs, and nondegenerate he-
lium stars which have lost nearly all of their hydrogen envelopes. The helium
white dwarfs lie almost entirely along the left-hand boundary, the line labeled
‘envelope exhaustion’ in Fig. 1. (The extent of this sequence is more apparent
in the distribution of remnant secondaries.) Their progenitors have enough
angular momentum to accommodate core growth in the terminal phases of
mass transfer. In the calculation shown, the least massive cores grow from
12 Ronald F. Webbink
Common Envelope Evolution Redux 13
0.11M⊙ to ∼0.18M⊙ by the completion of mass transfer. In contrast, virtu-
ally all binaries leaving nondegenerate helium star remnants have too little
angular momentum to recover thermal equilibrium before they have lost their
hydrogen envelopes; for them, there is no slow nuclear time scale phase of
mass transfer, and core growth during mass transfer is negligible. The lowest-
mass helium star remnants have nuclear burning lifetimes comparable to their
hydrogen-rich binary companions, now grown through mass accretion. Those
more massive than ∼0.8M⊙ develop very extended envelopes during shell he-
lium burning, and will undergo a second phase of mass transfer from primary
to secondary, not reflected here; such massive white dwarfs are absent in the
Nelemans & Tout [35] sample, and so are omitted here.
In Fig. 4, the remnants of the first phase of conservative mass transfer
illustrated in Fig. 3 are followed through the second phase of mass transfer,
using (6). Because the remnants of the first phase have second-phase donors
much more massive their companions, and nearly all have deep convective en-
velopes, they are unstable to dynamical time scale mass transfer, and undergo
common envelope evolution. The systems labeled ‘Without Wind Mass Loss’
have been calculated assuming that no orbital evolution or mass loss occurs
between the end of the first phase of mass transfer and common envelope
evolution. It is assumed furthermore that αCE = 1, in principle marking the
most efficient envelope ejection energetically possible. Binary orbital periods
of 0.1, 1.0, and 10 days (assuming equal component masses) are indicated for
reference.
Observed close double white dwarfs, as summarized by Nelemans &
Tout [35], have a median orbital period of 1.4 d, and mass (spectroscopic pri-
mary) 0.39M⊙. Among double-lines systems, nearly equal white dwarf masses
are strongly favored, with the median q = 1.00. Clearly, most observed dou-
ble white dwarfs are too long in orbital period (have too much total energy
and angular momentum) to have evolved in the manner assumed here. Fur-
thermore, the computed binary mass ratios are typically more extreme than
observed, with the second-formed core typically 1.3-2.5 times as massive as
the first. The problem is that, while remnant white dwarfs or low-mass he-
Fig. 3 (facing page). Products of mass- and angular momentum-conservative mass
transfer for a typical initial mass ratio. The radii indicated refer to Roche lobe radii at
the onset or termination of mass transfer, as appropriate. To avoid common envelope
evolution, the donor stars (the region outlined in bold toward the lower right in the
diagram) must have radiative envelopes, and so arise between the terminal main
sequence and base of the giant branch. Their mass transfer remnants are outlined
in bold at the center-left of the diagram, with the remnant accretors at upper right.
The regions mapped are truncated in each case at a lower initial donor mass of
1.0 M⊙ and upper initial donor core mass of 0.7 M⊙. Lines of constant initial core
mass (with values as in Fig. 2) are indicated for the initial and remnant primaries.
Lines of constant remnant primary mass are indicated for the remnant secondaries.
14 Ronald F. Webbink
Common Envelope Evolution Redux 15
lium stars with suitable masses can be produced in the first, conservative mass
transfer phase, the remnant companions have envelope masses too large, and
too tightly bound, to survive the second (common envelope) phase of inter-
action at orbital separations and periods as large as observed. Evidently, the
progenitors of these double white dwarfs have lost a significant fraction of
their initial mass, while gaining in orbital energy, prior to the final common
envelope phase. These requirements can be fulfilled by a stellar wind, pro-
vided that the process is slow enough that energy losses in the wind can be
continuously replenished from nuclear energy sources.
The requisite mass loss and energy gain are possible with stellar wind
mass loss during the non-interactive phase between conservative and common
envelope evolution, or with stellar winds in nuclear time-scale mass transfer
or the terminal (recovery) phase of thermal time-scale mass transfer. Systemic
mass loss during or following conservative mass transfer will (in the absence
of angular momentum losses) shift the remnant regions to the left and upward
in Fig. 3 (subject to the limit posed by envelope exhaustion), while systemic
angular momentum losses shift them downwards. More extreme initial mass
ratios shift them downwards to the left.
The net effect of wind mass loss is illustrated by the regions labeled ‘With
Wind Mass Loss’ in Fig. 4. For simplicity, it is assumed here that half the
remnant mass of the original secondary was lost in a stellar wind prior to
the common envelope phase. Mass loss on this scale not only significantly re-
duces the mass of the second-formed white dwarf relative to the first, but the
concomitant orbital expansion produces wider remnant double white dwarfs,
bringing this snapshot model into good accord with the general properties of
real systems. Losses of this magnitude might be unprecedented among single
stars prior to their terminal superwind phase, but they have been a persis-
tent feature of evolutionary studies of Algol-type binaries [10, and references
Fig. 4 (facing page). Remnants of the second, common envelope, phase of mass
transfer of the systems shown in Fig. 3. Masses refer to the final remnants of the
original secondaries, and radii to their Roche lobe radii. Two groups of remnants are
shown. Those at lower right labeled ‘Without Wind Mass Loss’ follow directly from
the distributions of remnant primaries and secondaries shown in Fig. 3. Because
the remnant secondaries straddle the helium ignition line in Fig. 3, across which
core masses are discontinuous (see Fig. 2), the distribution of their post-common-
envelopes remnants is fragmented, some appearing as degenerate helium white dwarf
remnants (lower left), some as helium main sequence star remnants (lower center),
and the remainder as shell-burning helium star remnants (upper center). These latter
two groups overlap in the mass-radius diagram. The remnant distributions labeled
‘With Wind Mass Loss’ assume that the remnant secondaries of conservative mass
transfer lose half their mass in a stellar wind prior to common envelope evolution.
They too are fragmented, into degenerate helium white dwarf remnants (lower left)
and shell-burning helium stars (upper right). Within each group of remnants, lines
of constant remnant primary mass are shown, as in Fig. 3.
16 Ronald F. Webbink
therein] and, indeed, of earlier studies of close double white dwarf forma-
tion [11]. In the present context, their existence appears inescapable, if not
understood.
5 Long-Period Post-Common-Envelope Binaries and the
Missing Energy Problem
If the properties of short-period binaries with compact components can be
reconciled with the outcomes of common envelope evolution as expected from
simple energetics arguments, a challenge to this picture still comes from the
survival of symbiotic stars and recurrent novae at orbital separations too large
to have escaped tidal mass transfer earlier in their evolution. Notwithstanding
this author’s earlier hypothesis that the outbursting component in the recur-
rent nova T CrB (and its sister system RS Oph) might be a nondegenerate star
undergoing rapid accretion [59, 29, 61], it is now clear that the hot components
in both of these systems must indeed be hot, degenerate dwarfs [48, 6, 1, 18].
Furthermore, the short outburst recurrence times of these two binaries de-
mand that the degenerate dwarfs in each must have masses very close to the
Chandrasekhar limit.
The complexion of the problem posed by these systems can be illustrated
by a closer examination of T CrB itself. Its orbital period (P = 227.53 d)
and spectroscopic mass function (f(m) = 0.299M⊙) are well-established
from the orbit of the donor M3 III star [23]. The emission-line orbit for
the white dwarf [25] now appears very doubtful [18], but the system shows
very strong ellipsoidal variation (e.g., [1]), suggesting that the system is near
a grazing eclipse. Following Hric, et al. [18], I adopt MWD = 1.38 M⊙
amd MM3 = 1.2 M⊙. The Roche lobe radius of the white dwarf is then
RL,WD = 84 R⊙, nearly an order of magnitude larger than can be accom-
modated from the energetics arguments presented above, even for αCE = 1,
assuming solar metallicity for the system (see Fig. 5). A similar discrepancy
occurs for RS Oph.
It is evident that these long-period binaries are able to tap some en-
ergy source not reflected in the energy budget in (6). One possibility, dis-
cussed repeatedly in studies of planetary nebula ejection ([30, 42], more re-
cently [58, 12, 13]) is that the recombination energy of the envelope comes
Fig. 5 (facing page). Post-common-envelope masses and Roche lobe radii for bina-
ries consisting of a white dwarf or helium star plus a 1.2M⊙ companion, computed
with αCE = 1. Remnant systems inhabit the regions outlined in bold, and spanned
vertically by lines of constant white dwarf/helium star mass of 0.25, 0.35, 0.5, 0.7,
1.0, 1.4, and 2.0M⊙. Other initial sequences, encoded as in Fig. 2, have been mapped
through common envelope evolution. The location in this diagram of the white dwarf
in the recurrent nova T CrB is also indicated.
Common Envelope Evolution Redux 17
18 Ronald F. Webbink
into play. For solar composition material (and complete ionization), that re-
combination energy amounts to 15.4 eVamu−1, or 1.49 × 1013 erg g−1. For
tightly-bound envelopes on the initial giant branch of the donor, this term
is of little consequence; but near the tip of the low-mass giant branch, and
on the upper aymptotic giant branch of intermediate-mass stars, it can be-
come comparable with, or even exceed, the gravitational potential energy of
the envelope. In the model calculations of thermally-pulsing asymptotic giant
branch stars by Wagenhuber & Weiss [58], the threshold for spontaneous ejec-
tion by envelope recombination occurs consistently when the stellar surface
gravity at the peak thermal pulse luminosity falls to
log gHRI = −1.118± 0.042 . (14)
This threshold marks the presumed upper limit to the radii of lower-mass
asymptotic giant branch stars in Figs. 1 et seq. in the present paper. In fact,
the total energies of the envelopes of these stars formally becomes decidedly
positive even before the onset of the superwind phase, also shown in these
figures.
Whether single stars successfully tap this ionization energy in ejecting
planetary nebulae is still debated, but the circumstances of mass transfer in
binary systems would seem to provide a favorable environment for doing so.
In the envelopes of extended giants and asymptotic giant branch stars, pho-
tospheric electron densities and opacities are dominated by heavy elements;
the middle of the hydrogen ionization zone is buried at optical depths of order
τ ∼ 105. Adiabatic expansion of the envelope of the donor into the Roche lobe
of its companion can therefore trigger recombination even as the recombina-
tion radiation is itself trapped and reprocessed within the flow, much as the
same process occurs in rising convective cells.
Other possible energy terms exist that have been neglected in the energet-
ics arguments above: rotational energy, tidal contributions, coulomb energy,
magnetic fields, etc. But Virial arguments preclude most of these terms from
amounting to more than a minor fraction of the internal energy content of
the common envelope at the onset of mass transfer, when the energy budget
is established. The only plausible energy source of significance is the input
from nuclear reactions. In order for that input to be of consequence, it must
of course occur on a time scale short compared with the thermal time scale of
the common envelope. Taam [53] explored the possibility that shell burning
in an asymptotic giant branch core could be stimulated by mixing induced
dynamically in the common envelope (see also [54, 52]). Nothing came of
this hypothesis: mixing of fresh material into a burning shell required tak-
ing low-density, high-entropy material from the common envelope and mixing
it downward many pressure scale heights through a strongly stable entropy
gradient to the high-density, low-entropy burning region. In the face of strong
buoyancy forces, dynamical penetration is limited to scales of order a pressure
scale height.
Common Envelope Evolution Redux 19
6 Common Envelope Evolution with Recombination
The notion that recombination energy my be of importance specifically to
common envelope evolution is not new. It has been included, at least para-
metrically in earlier studies, for example, by Han et al. [13], who introduced
a second α-parameter, αth, characterizing the fraction of the initial thermal
energy content of the common envelope available for its ejection. The initial
energy kinetic/thermal content of the envelope is constrained by the Virial
Theorem, however, and it is not clear that there is a compelling reason for
treating it differently from, say, the orbital energy input from the inspiraling
cores. We choose below to formulate common envelope evolution in terms of
a single efficiency parameter, labeled here βCE to avoid confusion with αCE
as defined above.
By combining the standard stellar structure equations for hydrostatic equi-
librium and mass conservation, we can obtain an expression for the gravita-
tional potential energy, Ωe, of the common envelope:
Ωe ≡ −
dM = 3 PV
P dV , (15)
where subscripts c refer to the core-envelope boundary, and ∗ to the stellar
surface. This is, of course, the familiar Virial Theorem applied to a stellar
interior.
It is convenient to split the pressure in this integral into non-relativistic
(particle), Pg, and relativistic (photon), Pr, parts. The envelopes of giants un-
dergoing common envelope evolution are sufficiently cool and non-degenerate
to make the classical ideal gas approximation an excellent one for the particle
gas. One can then write
P = Pg + Pr =
ur , (16)
where ug and ur are kinetic energy densities of particle and radiation gases,
respectively. The total internal energy density of the gas is
u = ug + ur + uint , (17)
where the term uint now appearing represents non-kinetic contributions to
the total energy density of the gas, principally the dissociation and ionization
energies plus internal excitation energies of bound atoms and molecules. The
overwhelmingly dominant terms in uint are the ionization energies: uint ≈
ρχeff .
Integrating over the stellar envelope, we obtain for the total energy Ee of
the envelope:
Ee = Ωe + Ue
3 PU |
− 2Ug − Ur
+ (Ug + Ur + Uint)
= −4πR3cPc − Ug + Uint , (18)
20 Ronald F. Webbink
where we explicitly take P∗ → 0. In fact, experience shows that, for red-giant
like structures, Rc is so small that the first right-hand term in the last equality
can generally be neglected. In that case, we get the familiar Virial result, but
with the addition of a term involving the ionization/excitation/dissociation
energy available in the gas, Uint ≈ Meχeff , which becomes important for dif-
fuse, loosely-bound envelopes.
In the context of common envelope evolution, it is of course the dissipated
orbital energy, E
orb−E
orb, that must unbind the envelope. However, the inclu-
sion of Uint in Ee now opens the possibility that the common envelope began
with positive total energy; that is, in the usual αCE-prescription, it is possible
for λ−1 to be zero or even negative, which has the undesirable consequence
that αCE need not lie in the interval 0 ≤ αCE ≤ 1 for all physically-possible
outcomes. However, the gravitational potential energy of the envelope, Ωe, is
negative-definite, and by comparing it with all available energy sources (or-
bital energy released plus internal energy of the envelope), we can define an
ejection efficiency βCE that has the desired property, 0 ≤ βCE ≤ 1:
βCE ≡
orb − E
orb)− Ue
4πR3cPc + 2Ug + Ur
orb − E
orb + Ug + Ur + Uint
. (19)
By analogy to the form factor λ in the conventional αCE formalism above, we
can define separate form factors λΩ for the gravitational potential energy and
λP for the gas plus radiation contributions to the (kinetic) internal energy of
the envelope:
and Ug + Ur =
, (20)
In contrast, the recombination energy available can be written simply in terms
of an average ionization energy per unit mass,
Uint = Meχeff . (21)
The ratio of final to initial orbital separation then becomes
1 + 2
βCEλΩr1,L
λPr1,L
χeffAi
M1 −M1c
In the limit that radiation pressure Pr, ionization energy (Uint), and the
boundary term (4πR3cPc) are all negligible, then 2λΩ → λP → λ and
βCE → 2αCE/(1 + αCE).
Fig. 6 (facing page). Post-common-envelope masses and Roche lobe radii as in
Fig. 5, but with recombination energy included, computed from (22) with βCE = 1,
with the approximation 2λΩ = λP = λ from (3). At small separations, the differences
are inconsequential, but substantially larger final separations are allowed when Af &
10R⊙ (RL & 3R⊙).
Common Envelope Evolution Redux 21
22 Ronald F. Webbink
The ability to tap the recombination energy of the envelope has a profound
effect on the the final states of the longest-period intermediate-mass binaries,
those that enter common envelope evolution with relatively massive, degener-
ate carbon-oxygen (or oxygen-neon-magnesium) cores. As is evident in Fig. 6,
possible final states span a much broader range of final orbital separations.
Indeed, for the widest progenitor systems, the (positive) total energy of the
common envelope can exceed the (negative) orbital energy of the binary, mak-
ing arbitrarily large final semimajor axes energetically possible.3 The inclusion
of recombination energy brings both T CrB and RS Oph within energetically
accessible post-common-envelope states. It suffices as well to account for the
exceptionally long-period close double white binary PG 1115+166, as sug-
gested by Maxted, et al. [31].
7 Conclusions
Re-examination of global constraints on common envelope evolution leads to
the following conclusions:
Both energy and angular momentum conservation pose strict limits on the
outcome of common envelope evolution. Of these two constraints, however,
energy conservation is much the more demanding.
The recent study of close double white dwarf formation by Nelemans &
Tout [35] shows clearly that their progenitors can have lost little orbital en-
ergy through their first episodes of mass transfer. Since common envelope
ejection must be rapid if it is to be efficient, its energy budget is essentially
fixed at its onset by available thermal and gravitational terms. The preser-
vation of orbital energy through that first phase of mass transfer therefore
indicates that the observed close double white dwarfs escaped common en-
velope formation in that first mass transfer phase. They evidently evolved
through quasi-conservative mass transfer. However, strictly mass- and angular
momentum-conservative mass transfer leaves remnant accretors that are too
massive and compact to account for any but the shortest-period close double
white dwarfs. Significant mass loss and the input of orbital energy prior to the
onset of the second (common envelope) phase of mass transfer are required.
The requisite energy source must be of nuclear evolution, which is capable of
driving orbital expansion and stellar wind losses during the slower (thermal
recovery or nuclear time scale) phases of quasi-conservative mass transfer, or
during the interval between first and second episodes of mass transfer. Details
of this process remain obscure, however.
3 The final orbit remains constrained by the finite initial orbital angular momentum
of the binary. Final semimajor axes much in excess of the initial semimajor axis
may be energetically allowed, but the finite angular momentum available means
that they cannot be circular – see (9) – an effect which has been neglected in
Fig. 6.
Common Envelope Evolution Redux 23
Long-period cataclysmic variables such as T CrB and RS Oph pose a
more extreme test of common envelope energetics. With their massive white
dwarfs, the evident remnants of much more massive initial primaries, they
are nevertheless too low in total systemic mass to be plausible products of
quasi-conservative mass transfer, but too short in orbital period to have es-
caped tidal mass transfer altogether. They must be products of common en-
velope evolution, but to have survived at their large separations, they demand
the existence of a latent energy reservoir in addition to orbital energy to as-
sist in envelope ejection. It appears that these binaries efficiently tap ioniza-
tion/recombination energy in ejecting their common envelopes. That reservoir
is demonstrably adequate to account for the survival of these binaries. Its in-
clusion requires only a simple revision to the parameterization of common
envelope ejection efficiency.
Acknowledgement. This work owes its existence to both the encouragement and the
patience of Gene Milone, to whom I am most grateful. Thanks go as well to Ron
Taam for a useful discussion of possible loopholes in common envelope theory, and
to Jarrod Hurley for providing the source code described in Hurley, et al. (2000).
This work was supported in part by grant AST 0406726 to the University of Illinois,
Urbana-Champaign, from the US National Science Foundation.
References
1. K. Belczyński, J. Miko lajewska: MNRAS 296, 77 (1998)
2. H.E. Bond: Binarity of Central Stars of Planetary Nebulae. In Asymmetrical
Planetary Nebulae II: From Origins to Microstructures, ed by J.H. Kastner, N.
Soker, S. Rappaport (ASP Conf. Ser. Vol. 199, San Francisco 2000), pp. 115-123
3. H.E.Bond, W. Liller, E.J. Mannery: ApJ 223, 252 (1978)
4. M. de Kool: A&A 261, 188 (1992)
5. R. Di Stefano, S. Rappaport: ApJ 437, 733 (1994)
6. D. Dobrzycka, S.J. Kenyon, D. Proga, J. Miko lajewska, R.A. Wade: AJ 111,
2090 (1996)
7. P.P. Eggleton: ApJ 268, 368 (1983)
8. P. Eggleton: Evolutionary Processes in Binary and Multiple Stars (CUP, Cam-
bridge)
9. F.C. Fekel, G.W. Henry, M.R. Busby, J.J. Eitter: AJ 106, 2170 (1993)
10. G. Giuricin, F. Mardirossian: ApJS 46, 1 (1981)
11. Z. Han: MNRAS 296, 1019 (1998)
12. Z. Han, P. Podsiadlowski, P.P. Eggleton: MNRAS 270, 12 (1994)
13. Z. Han, P. Podsiadlowski, P.P. Eggleton: MNRAS 272, 800 (1995)
14. R. Härm, M. Schwarzschild: ApJS 1, 319 (1955)
15. M.S. Hjellming, R.E. Taam: ApJ 370, 709 (1991)
16. M.S. Hjellming, R.F. Webbink: ApJ 318, 794 (1987)
17. S.B. Howell, L.A. Nelson, S. Rappaport: ApJ 550, 897 (2001)
18. L. Hric, K. Petrik, Z. Urban, P. Niarchos, G.C. Anupama: A&A 339, 449 (1998)
19. J.R. Hurley, O.R. Pols, C.A. Tout: MNRAS 315, 543 (2000)
24 Ronald F. Webbink
20. I. Iben, Jr., M. Livio: PASP 105, 1373 (1993)
21. I. Iben, Jr., A.V. Tutukov: ApJ 284, 719 (1984)
22. C.S. Jeffery, T. Simon: MNRAS 286, 487 (1997)
23. S.J. Kenyon, M.R. Garcia: AJ 91, 125 (1986)
24. U. Kolb: A&A 271, 149 (1993)
25. R.P. Kraft: ApJ 127, 625 (1958)
26. R.P. Kraft: ApJ 135, 408 (1962)
27. R.P. Kraft: ApJ 139, 457 (1964)
28. Y. Lebreton, J. Fernandes, T. Lejeune: A&A 374, 540 (2001)
29. M. Livio, J.W. Truran, R.F. Webbink: ApJ 308, 736 (1996)
30. L.B. Lucy: AJ 72, 813 (1967)
31. P.F.L. Maxted, M.R. Burleigh, T.R. Marsh, N.P. Bannister: MNRAS 334, 833
(2002)
32. M. Morris: ApJ 249, 572 (1981)
33. M. Morris: PASP 99, 1115 (1987)
34. D.C. Morton: ApJ 132, 146 (1960)
35. G. Nelemans, C.A. Tout: MNRAS 356, 753 (2005)
36. G. Nelemans, F. Verbunt, L.R. Yungelson, S.F. Portegies Zwart: A&A 360,
1011 (2000)
37. G. Nelemans, L.R. Yungelson, S.F. Portegies Zwart, F. Verbunt: A&A 365, 491
(2001)
38. C.A. Nelson, P.P. Eggleton: ApJ 552, 664 (2001)
39. M.S. O’Brien, H.E. Bond, E.M. Sion: ApJ 563, 971 (2001)
40. D.E. Osterbrock: ApJ 118, 529 (1953)
41. B. Paczynski: B. Common Envelope Binaries. In Structure and Evolution of
Close Binary Systems, ed by P.P. Eggleton, S. Mitton, J.A.J. Whelan (Reidel,
Dordrecht 1976), pp 75-80
42. B. Paczyński, J. Zió lkowski: AcA 18, 255 (1968)
43. L. Pastetter, H. Ritter: A&A 214, 186 (1989)
44. M. Politano: ApJ 465, 338 (1996)
45. F.A. Rasio, M. Livio: ApJ 471, 366 (1996)
46. S. Refsdal, M.L. Roth, M.L., A. Weigert: A&A 36, 113 (1974)
47. E.L. Sandquist, R.E. Taam, A. Burkert: ApJ 533, 984 (2000)
48. P.L. Selvelli, A. Cassatella, R. Gilmozzi: ApJ 393, 289 (1992)
49. T. Simon, F.C. Fekel, D.M. Gibson, Jr.: ApJ 295, 153 (1985)
50. N. Soker: ApJ 496, 833 (1998)
51. A.V. Sweigart, L. Greggio, A. Renzini: ApJ 364, 527 (1990)
52. R.E. Taam: The Common Envelope Phase of Binary Evolution. In Interacting
Binary Stars, ed by A.W. Shafter (ASP Conf. Ser., Vol. 56, San Francisco 1994),
pp 208-217
53. R.E. Taam: private communication (2007)
54. R.E. Taam, P. Bodenheimer: ApJ 337, 849 (1989)
55. R.E. Taam, E.L. Sandquist: ARA&A 38, 113 (2000)
56. M. van den Berg, F. Verbunt, R.D. Mathieu: A&A 347, 866 (1999)
57. G. Vauclair: A&A 17, 437 (1972)
58. J. Wagenhuber, A. Weiss: A&A 290, 807 (1994)
59. R.F. Webbink: Nature 262, 271 (1976)
60. R.F. Webbink: The Evolutionary Significance of Recurrent Novae. In Changing
Trends in Variable Star Research, IAU Colloq. No. 46, ed by F.M. Bateson, J.
Smak, I.H. Urch (U. Waikato, Hamilton, NZ 1979), pp 102-118
Common Envelope Evolution Redux 25
61. R.F. Webbink, M. Livio, J.W. Truran, M. Orio: ApJ 314, 653 (1987)
62. R.F. Webbink: Late Stages of Close Binary SystemsClues to Common Envelope
Evolution. In Critical Observations Versus Physical Models for Close Binary
Systems, ed by K.-C. Leung (Gordon & Breach, New York 1988), pp 403-446
63. B. Willems, U. Kolb: A&A 419, 1057 (2004)
64. L.A. Willson: ARA&A 38, 573 (2000)
65. H.W. Yorke, P. Bodenheimer, R.E. Taam: ApJ 451, 308 (1995)
	Common Envelope Evolution Redux
	Ronald F. Webbink
ABSTRACT
  Common envelopes form in dynamical time scale mass exchange, when the
envelope of a donor star engulfs a much denser companion, and the core of the
donor plus the dense companion star spiral inward through this dissipative
envelope. As conceived by Paczynski and Ostriker, this process must be
responsible for the creation of short-period binaries with degenerate
components, and, indeed, it has proven capable of accounting for short-period
binaries containing one white dwarf component. However, attempts to reconstruct
the evolutionary histories of close double white dwarfs have proven more
problematic, and point to the need for enhanced systemic mass loss, either
during the close of the first, slow episode of mass transfer that produced the
first white dwarf, or during the detached phase preceding the final, common
envelope episode. The survival of long-period interacting binaries with massive
white dwarfs, such as the recurrent novae T CrB and RS Oph, also presents
interpretative difficulties for simple energetic treatments of common envelope
evolution. Their existence implies that major terms are missing from usual
formulations of the energy budget for common envelope evolution. The most
plausible missing energy term is the energy released by recombination in the
common envelope, and, indeed, a simple reformulation the energy budget
explicitly including recombination resolves this issue.

<|endoftext|><|startoftext|>
The Source of Turbulence in
Astrophysical Disks:
An Ill-posed Problem.
Denis Richard, 
NASA Ames Research Center
UCSC-Ames Planet and Star Formation Meeting
March 29, 2007
Astrophysical Disks
Disks are ubiquitous in Astrophysics : 
* Planetary disks
* Circumstellar Disks (around young stars)
* Binary systems
* Active Galactic Nuclei (around black holes)
Therefore, understanding disks is fundamental to understand
planetary and stellar formation and evolution. 
Wide range of sizes :
From Saturn's rings ( ~ 107 km) to AGN disks (~ parsec = 3.1 10 13 km)
Wide variety of complex physical processes, 
one of them being the transport of angular momentum. 
Astrophysical Disks (Artist view)
Astrophysical Disks (HST)
Turbulence in Disks
* Accretion Disk : gas and “dust” falling inward, toward the central object.
* Thin disk = H/R << 1    →   Keplerian rotation : Ω ~ r -3/2 (no radial or 
vertical velocity in first approximation)
* To maintain stationary rotation, angular momentum needs to be 
transported outward.
* Need for an adequate transport mechanism.
* Molecular viscosity is too small. 
*  Turbulence :  instability mechanism ? Transport properties ?
* Early models : Shakura & Sunyaev 1973 :
* Turbulence most likely generated by differential rotation (shear flow).
* Ad hoc model for transport : turbulent viscosity  based on smallest constraints : 
 = α . Cs . H 
A Short History of Turbulence Models for Accretion Disks
* 1973 – 1991 : Shakura & Sunyaev era : 
- Source of turbulence unknown (shear flow assumption ?)
- Work on turbulence, in order to improve transport model.
- meanwhile, transport model : ν
 = α . Cs . H 
* 1991  :  Magneto Rotational Instability rediscovered
  Weak magnetic field coupled to shear flow gives rise to a linear instability.
  (Chandrasekhar 1960, Balbus & Hawley 1991)
* 1991 – present : MRI era :  
- Source of turbulence : MRI
- Turbulent model : ν
 = α . Cs . H 
Turbulent Viscosity Model
What is it ?
* Analytically : A description of turbulent transport as a diffusive mechanism.
* For numerical simulation : a type of basic subgrid model.
Where does it come from ?
* Built ad hoc (alpha-viscosity) : relevant length scale x relevant velocity.
* Measured experimentally (lab or numerical) : Reynold stress.
When should it be used ?
* When studying anything BUT turbulence as a fundamental physical process.
* When simulations do not have adequate resolution to describe the  whole range of 
scales of the flow, in which case the viscosity model has to be chosen as to describe 
only the subgrid scales. 
 From Instabilities to Turbulence
Two types of Instabilities :
* Linear : flow unstable to infinetesimal perturbations (super-critical transitions)
ex : thermal convection, Rayleigh/centrifugal instability in rotating flows.
* Non-Linear : flow unstable to finite amplitudes (sub-critical transitions)  
ex : plan shear flow, Differential rotation (?)
Analytical : Linear can be well treated (transition) / no general model for Non-Linear.
Numerical : Linear : generally low Reynolds, large scale flows = “lower” resolution
      Non-linear : generally high Reynolds, small scale flows = “high” resolution
Laboratory : In theory can study both types equally well.
In an Astrophysical context : Both types are equally difficult to study,
because what ultimately matters is the turbulent state, which will generally be at very 
high Reynolds, thus difficult to describe. 
Thus, while MRI has been around for more than 15 years in the disk community, there is 
no associated description for turbulent transport. An instability easy to describe 
does not mean that the induced turbulence is equally easy to quantify.  
Polemic ? What Polemic ?
* Little to no doubt that MRI is at work in most accretion disks.
* But : Is MRI the only turbulent mechanism relevant to Astrophysical disks ?
“Schools of thoughts” :
MRI  school
* MRI is necessary to power accretion disks
   (Only MRI can provide adequate angular momentum transport.) 
* MRI is sufficient to power all accretion disks 
                       (giving birth to such things as “Dead zone models”)
Instability X school
* Instability X is also relevant to Astrophysical Disks dynamics.
  (Differential Rotation, Plane Shear, Strato-rotational, Baroclinic,...)
No-school school
* Just tell me how turbulence acts in my disk, so that I can react, 
  coagulate, form a planet, evolve, etc... 
Arguments against Differential Rotation
Analytical : radial and azimuthal fluctuations can
                    not grow at the same time.
                    (Balbus, Hawley & Stone, 1996) 
Problem : this set of equations is linear,
                       thus irrelevant to non-linear instabilities
                       (~ Rayleigh criterion)
                        
Numerical : No-show in numerical simulations
                    (Balbus, Hawley & Stone, 1996) 
Problem : Reynolds numbers are “low”
                       (a 1024 3 grid can simulate a maximum  
                        Re of order 10,000)
                        
Experimental : H.Ji, M.Burin, E.Schartman & 
J.Goodman, 2006 (accompanied by comments by
S.Balbus) 
Can Differential Rotation lead to Turbulence in Disks ?
* High Reynolds Shear flow : Astrophysical Disks Re ~ 106 to 1020
* Early (1930's) laboratory experiments : 
         Couette-Taylor flow unstable at high Re.
  (Richard & Zahn, 1999) 
Data from Wendt (1933) 
and Taylor (1936)
Can Differential Rotation lead to Turbulence in Disks ?
* New experimental setup (Richard, 2001).
Conclusion : In a laboratory
experiment (a not-so-close analog to 
a disk), differential rotation can give 
rise to turbulence despite published
arguments.
Differential Rotation may lead 
to turbulence in Keplerian disks.
Princeton Experiment
(H.Ji, M.Burin, E.Schartman & J.Goodman, 2006)
(Comments  requested by meeting organizers...)
* Couette-Taylor flow.
* No flow-visualization, due to technical difficulties (private 
communication, M.Burin). Turbulent flows can not be positively 
differenciated from laminar flows. (As stated in the paper itself.)
This is why Ji et al never claim that the flows are stable. They 
merely discuss fluctuation levels.
* Boundary conditions : 
* Stubby aspect ratio with independently rotating rings to 
         compensate and avoid large scale circulation.
* clever setup but : 
- experimental calibration does not agree with 
                       numerical simulations of this setup 
                       (see Burin et al, 2006).
- calibration done “blind” (no visualization) and only close by the Rayleigh 
  boundary where the flow could be turbulent (priv. comm. M.Burin). Should 
  have been done in a  regime where there are no doubts that the flow is 
         laminar. Could have been returning the flow to a laminar regime after a sub-
  critical transition.
Princeton Experiment
(H.Ji, M.Burin, E.Schartman & J.Goodman, 2006)
(Comments  requested by meeting organizers...)
* Main argument for Astrophysical flows : 
* measured β < 6.2 10-6
*  then compares 
          α and β parameters 
          numerically, by deriving 
   a formulation of the α viscosity for the laboratory setup. Concludes that α and β 
   should have a similar value, thus that α ~ β < 6.2 10-6  is too small (α ~ 10 -3).
* BUT : this formulation for the α viscosity has meaning only in a thin keplerian disk 
  where Cs = Ω.H. It makes absolutely no sense for this experimental setup.
* What makes sense is to compare α and β value in an Astrophysical context.
  For α.Ω.H2 = β.Ω.R2, then β / α = (H/R)2, therefore for α ~ 10 -3, and 0.001< H < 0.1, 
β ~ 10 -9 - 10 -5, still provides adequate transport of angular momentum.
Conclusion
* The issue of differential rotation is still very much open.
* The debate about the origin of turbulence in disks should be a search to characterize 
the turbulent state of disks and to model transport properties. Today, it sometimes 
seems to be more an effort to eliminate non-fashionable instabilities. 
* Considering the complexity of disks, it would be naive to think that one process only 
participate in angular momentum process.
* Discouraging work on various instabilities is damaging for disk understanding. Not 
only angular momentum is transported. Chemistry, planet formation, etc. are affected by 
turbulence that may not ultimately be relevant for angular momentum transport.
* Should be acknowledged :
- The limitations of our tools : numerical, analytical, and experimental. 
   Being able to describe a process better/more easily does not make it more 
   relevant or important. 
- The lack of observational constraints on disks physics.
Real object
Analytical model
LES / DNS models
Laboratory model
...but quite possibly...Hopefully... ...while most often
         presented as :Tools
ABSTRACT
  An critical overview of the current state of research in turbulence in
astrophysical disks.

<|endoftext|><|startoftext|>
Introduction
	Pragmatic space-time codes:
	Decoding metric computation
	A simplified metric
	Search for good puncturation matrices
	Numerical results
	Conclusions
	Acknowledgments
	References
ABSTRACT
  This paper considers the use of punctured convolutional codes to obtain
pragmatic space-time trellis codes over block-fading channel. We show that good
performance can be achieved even when puncturation is adopted and that we can
still employ the same Viterbi decoder of the convolutional mother code by using
approximated metrics without increasing the complexity of the decoding
operations.

<|endoftext|><|startoftext|>
Introduction
In the paper [17], Jones introduced a certain Markov trace on the tower of
Hecke algebras H(An−1) associated to the Coxeter groups Sn = W (An−1), which
are the symmetric groups. When Jones’ trace is restricted to one of the algebras
H = H(An−1), it is degenerate, but its radical is an ideal, J , of H and so we obtain
a generically nondegenerate trace on the algebra H/J , which is the Temperley–Lieb
algebra TLn occurring in statistical mechanics [25] (the trace is the matrix trace
of a transfer matrix algebra).
In [19], Kazhdan and Lusztig introduced a remarkable polynomial Px,w(q) for
any elements x, w in a Coxeter group W . These polynomials have important ap-
plications in representation theory. Although the polynomials have an elementary
1991 Mathematics Subject Classification. 20C08, 20F55, 57M15.
Typeset by AMS-TEX
http://arxiv.org/abs/0704.0283v1
2 R.M. GREEN
definition, the only obvious way to compute them is using a rather complicated
recurrence relation. One of the main obstructions to computing the polynomials
efficiently is a fast way to compute the integer µ(x, w), which is the coefficient of
q(ℓ(w)−ℓ(x)−1)/2 in Px,w(q). In [12], the author showed how Jones’ trace can be used
to compute the leading coefficients µ(x, w) ∈ Z in the case where x and w are fully
commutative elements of W (in the sense of [24]). In this paper, we will investigate
the analogous phenomenon in Coxeter type En. This includes Coxeter groups of
types A and D as special cases.
The algebras TLn may be defined in terms of generators and relations in a
way that generalizes readily to Coxeter systems of other types. These generalized
Temperley–Lieb algebras have been studied for Coxeter type En by a number of
people [2, 3, 7]. Although the Coxeter groups of type En are infinite for n > 8, the
Hecke algebra quotient TL(En) in this case is still finite dimensional. In [2], tom
Dieck constructed a diagrammatic representation of TL(En), although the question
of whether this is a realisation—a faithful representation—is not tackled. In §9, we
will prove
Theorem 1.1. The diagrammatic representation of TL(En) given in [2] is injec-
tive.
The closing remarks of [2] state without proof that this representation can be
used to define a Markov trace on the tower of algebras TL(En). In Theorem 8.11,
we will prove this claim and furthermore we will show that there is a unique such
Markov trace. Although this is similar to what happens in type A, the analogous
claim for Coxeter type D is false.
This trace is also remarkable for other reasons: after suitable rescaling, it is a
tabular trace in the sense of [10], and a generalized Jones trace in the sense of [12].
The fact that the trace is tabular implies that it is (generically) nondegenerate on
the algebras TL(En). The fact that we have a generalized Jones trace will lead
to the following theorem (proved in §9) where the monomial basis elements bw are
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 3
defined in §3.
Theorem 1.2. Let {bw : w ∈ Wc} be the monomial basis of TL(En) indexed by the
fully commutative Coxeter group elements, and let tr be the unique Markov trace on
the tower of algebras TL(En). If x, y ∈ Wc, then the coefficient of v−1 in tr(bxby−1)
(after expansion as a power series) is µ̃(x, y), where
µ̃(x, y) =
µ(x, y) if x ≤ y,
µ(y, x) if x 6≤ y,
and µ(a, b) is the integer defined in [19].
We will also show in §9 how µ̃(x, y) may be evaluated non-recursively using the
diagram calculus.
2. Traces and Markov traces
By a trace on an R-algebra A, we mean an R-linear map t : A −→ R such that
t(ab) = t(ba) for all a, b ∈ A. The radical of the trace is the set of all a ∈ A such
that t(ab) = 0 for all b ∈ A. The radical is always an ideal of A, and if it is trivial,
the trace is said to be nondegenerate. In any case, if I is the radical of t, then t
induces a nondegenerate trace on the quotient algebra R/I.
The set of traces on an R-algebra A has a natural R-module structure. In the
special case where ρ is a representation of an R-algebra A, then the matrix trace
associated to ρ is a trace in the above sense, which means that, if A is semisimple,
the Grothendieck group of A gives a Z-lattice in the space of traces, generated by
the traces of the simple modules.
We will be particularly concerned with algebras where the base ring R is obtained
by extending scalars from the ring of Laurent polynomials A = Z[v, v−1] to some
ring F ⊗ A. This has the effect of specializing the parameter v to an invertible
element of F . In this situation, a trace is called generically nondegenerate if it is
nondegenerate as a trace over A, and if it also remains nondegenerate as a trace
over F ⊗A for all but finitely many specializations of v.
4 R.M. GREEN
Suppose now that R is an integral domain and {An : n ≥ N} is a family of unital
R-algebras such that An is a subalgebra of An+1 for all n ≥ N . Let A∞ be the
associated direct limit. Suppose also that there is a set of elements {gn : n ∈ N}
such that gn+1 ∈ An+1\An for all n and such that {gn : n ≤ M} is an algebra
generating set for AM . Following [5, §4], we may now introduce the notion of
Markov trace.
Definition 2.1. Maintain the above notation, and let F be a field containing R.
A Markov trace on A∞ with parameter z ∈ F is an F -linear map τ : A∞ −→ F
satisfying the following conditions:
(i) τ(1) = 1;
(ii) τ(hbn+1) = zτ(h) for n ≥ N and h ∈ An;
(iii) τ(hh′) = τ(h′h) for all h, h′ ∈ A∞.
Jones [17] proved that there is a unique Markov trace with parameter z on the
tower of Hecke algebras of type An, and that the only one of these traces that
passes to the Temperley–Lieb quotient is the one with parameter z = (v + v−1)−1.
This is an important observation in the construction of the Jones polynomial, be-
cause conditions (ii) and (iii) for the trace are what is needed to ensure that the
polynomial is invariant under the two types of “Markov move”.
Some other notable work on Markov traces includes that of Geck and Lam-
bropoulou [4], who classified the Markov traces in Coxeter types B and D, using a
suitable extension of the above definition. Lambropoulou [20] extended this work
(in type B) to generalized and cyclotomic Hecke algebras of type B.
For the purposes of studying Temperley–Lieb type quotients of Hecke algebras,
a better definition of Markov traces seems to be one that appears in work of Seifert
[22] and recent work of Gomi [6, Definition 3.7]. In this case, one retains conditions
(i) and (iii) of Definition 2.1 and replaces condition (ii) by the requirement that
τ(aTs) = zsT (a)
whenever we have a ∈ H(WI) for some parabolic subgroup WI corresponding to
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 5
I ⊆ S\{s}. (In other words, we require condition (ii) to hold for all generators of
An+1, not just one particular generator.) Here, zs is an indeterminate depending
on the conjugacy class of s in W .
In this paper, we will restrict our attention to the tower of algebras TL(En),
and in this case, the above definitions happen to agree; however, they do not agree
in the corresponding question for type Dn. In the latter case, it can be shown that
the Seifert–Gomi formulation produces a unique Markov trace, and Definition 2.1
does not.
3. The algebras TL(En)
Let X = X(En) be a Coxeter graph of type En, where n ≥ 6. Following [3], we
label the vertices of X by 0, 1, . . . , n− 1 in such a way that 1, 2, 3, . . . , n− 1 lie in
a straight line, and such that 3 is the unique vertex of degree 3, which is adjacent
to 2, 4 and 0. Figure 1 shows the case n = 6.
Figure 1. Coxeter graph of type E6
 4  5 3 2 1
Let W (En) be the associated Coxeter group with distinguished set of generating
involutions
S(En) = {si : i is a vertex of X(En)}.
In other words, W = W (En) is given by the presentation
W = 〈S(En) | (st)m(s,t) = 1 for m(s, t) < ∞〉,
where m(s, s) = 1, m(s, t) = 2 if s and t are not adjacent in X , and m(s, t) = 3
if s and t are adjacent in X . The elements of S = S(En) are distinct as group
elements, and m(s, t) is the order of st. Denote by Hq = Hq(En) the Hecke algebra
6 R.M. GREEN
associated to W . This is a Z[q, q−1]-algebra with a basis consisting of (invertible)
elements Tw, with w ranging over W , satisfying
TsTw =
Tsw if ℓ(sw) > ℓ(w),
qTsw + (q − 1)Tw if ℓ(sw) < ℓ(w),
where ℓ is the length function on the Coxeter group W , w ∈ W , and s ∈ S. If
n > 8, the group W is infinite and Hq has infinite rank as an A-algebra.
For the applications we have in mind, it is convenient to extend the scalars of
Hq to produce an A-algebra H, where A = Z[v, v−1] and v2 = q, and to define a
scaled version of the T -basis, {T̃w : w ∈ W}, where T̃w := v−ℓ(w)Tw. We will write
A+ and A− for Z[v] and Z[v−1], respectively.
A product w1w2 · · ·wn of elements wi ∈ W is called reduced if
ℓ(w1w2 · · ·wn) =
i ℓ(wi). We reserve the terminology reduced expression for
reduced products w1w2 · · ·wn in which every wi ∈ S. We write
L(w) = {s ∈ S : ℓ(sw) < ℓ(w)}
R(w) = {s ∈ S : ℓ(ws) < ℓ(w)}.
The set L(w) (respectively, R(w)) is called the left (respectively, right) descent set
of w.
Call an element w ∈ W complex if it can be written as a reduced product
x1wss′x2, where x1, x2 ∈ W and wss′ is the longest element of some rank 2 parabolic
subgroup 〈s, s′〉 such that s and s′ correspond to adjacent vertices in the Coxeter
graph En. Denote by Wc(En) the set of all elements of W that are not complex.
The elements of Wc = Wc(En) are the fully commutative elements of [24]; they
are characterized by the property that any two of their reduced expressions may be
obtained from each other by repeated commutation of adjacent generators.
Let J(En) be the two-sided ideal of H generated by the elements
T1 + Ts + Tt + Tst + Tts + Tsts,
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 7
where (s, t) runs over all pairs of elements of S for which m(s, t) = 3. Follow-
ing Graham [7, Definition 6.1], we define the generalized Temperley–Lieb algebra
TL(En) to be the quotient A-algebra H(En)/J(En). We denote the corresponding
epimorphism of algebras by θ : H(En) −→ TL(En). Let tw (respectively, t̃w) de-
note the image in TL(En) of the basis element Tw (respectively, T̃w) of H. If s ∈ S,
we define bs ∈ TL(En) by bs = v−11 + t̃s.
A more convenient description of TL(En) for the purposes of this paper is by
generators and relations (as in [3, §2.2]). Since the Laurent polynomial v + v−1
occurs frequently, we denote it by δ.
Proposition 3.1. As a unital A-algebra, TL(En) is given by generators {bs : s ∈
S} and relations
b2s = δbs,
bsbt = btbs if m(s, t) = 2,
bsbtbs = bs if m(s, t) = 3.
The following basis theorem will be used freely in the sequel.
Theorem 3.2 [3, 7].
(i) The set {t̃w : w ∈ Wc} is a free A-basis for TL(En).
(ii) If w ∈ Wc and w = si1si2 · · · sir is reduced, then the element
bw = bsi
· · · bsir
is a well-defined element of TL(En).
(iii) The set {bw : w ∈ Wc} is a free A-basis for TL(En).
Proof. Part (i) is due to Graham [7, Theorem 6.2]. Parts (ii) and (iii) are stated
by Fan in [3, §2.2], and more details may be found in [13, Proposition 2.4]. �
Definition 3.3 [3, §2.3]. Let P = P (n) denote the set of subsets of the Coxeter
graph En that consist of non-adjacent vertices. We allow P to include the empty set,
8 R.M. GREEN
∅. For any A ∈ P , let i(A) be the product of the elements of S(En) corresponding
to the vertices in A (with i(∅) = 1); note that the order of the product is immaterial
since the vertices in A correspond to commuting generators. Let A,B ∈ P . We say
that A and B are neighbours if and only if 1 + #(A ∩ B) = #A = #B, and the
two vertices in (A∪B)\(A∩B) are adjacent in En. Define an equivalence relation
on P by taking the reflexive and transitive closure of the relation A ∼ B if A and
B are neighbours. Let P̄ denote the set P/ ∼ .
Example 3.4. In type E7, let A = {0, 2, 4, 6} and B = {0, 1, 4, 6}. In this case,
i(A) = b0b2b4b6 and i(B) = b0b1b4b6, A and B are neighbours, and the equivalence
class of A is precisely {A,B}.
Definition 3.5 [3, §6.3]. Let n ≥ 6.
If n is odd, we define P ′ = P ′(n) to be the subset of P (n) consisting of the sets
{(n− 1)− 2j : 0 ≤ j ≤ N} : 0 ≤ N ≤ n− 1
together with the set
{n− 1, n− 3, n− 5, . . . , 4} ∪ {0}
and the empty set.
If n is even, we define P ′ = P ′(n) be the subset of P (n) consisting of the sets
{(n− 1)− 2j : 0 ≤ j ≤ N} : 0 ≤ N ≤ n− 2
together with the empty set.
Example 3.6. In type E6, we have
P ′ = {{5}, {5, 3}, {5, 3, 1}, ∅} .
In type E7, we have
P ′ = {{6}, {6, 4}, {6, 4, 2}, {6, 4, 2, 0}, {6, 4, 0}, ∅} .
The importance of the set P ′ comes from the following
Proposition 3.7 (Fan, [3, Lemma 8.1.2]). The set P ′ constitutes a complete set
of equivalence class representatives for P with respect to ∼. �
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 9
4. Cells and the a-function
In §4, we recall the definitions of the a-function and cells arising from the mono-
mial basis. Most of this material comes from the papers [3] and [10], or is implicit
in them.
Definition 4.1 [3, Definition 2.3.1]. The a-function a : Wc −→ Z≥0 is defined by
a(w) := max
{#A : w = xi(A)y is reduced}
for w ∈ Wc.
Proposition 4.2. Let w ∈ Wc and let f ∈ A. Define the degree, deg f , of f to
be the largest integer n such that vn occurs with nonzero coefficient in f , with the
convention that deg 0 = −∞. Denote the structure constants with respect to the
monomial basis by gx,y,z ∈ A, namely
bxby =
gx,y,zbz.
(i) The structure constant gx,y,z is either zero or a nonnegative power of δ, and,
given x and y, we have gx,y,z 6= 0 for a unique z.
(ii) If s ∈ S and gs,y,z 6∈ Z, then gs,y,z = δ, ℓ(sy) < ℓ(y) and y = z. Similarly, if
gx,s,z 6∈ Z, then gx,s,z = δ, ℓ(xs) < ℓ(x) and x = z.
(iii) We have a(w) = maxx,y∈Wc deg gx,y,w.
(iv) We have a(w) = maxx,y∈Wc deg gw,x,y.
Proof. Parts (i) and (ii) are well known and follow easily from [3, Proposition 5.4.1].
Part (iii) is proved in [10, Proposition 4.2.3] using the results of [3].
The proof of [3, Theorem 5.5.1] shows that
deg gw,x,y ≤ min(a(w), a(x)),
which means that
x,y∈Wc
deg gw,x,y ≤ a(w).
10 R.M. GREEN
Conversely, [3, Lemma 5.2.6] shows that
bwbw−1 = (v + v
−1)a(w)bd
for some d ∈ Wc, so taking x = w−1 and y = d, we find that
x,y∈Wc
deg gw,x,y ≥ a(w),
which completes the proof of (iv). �
Definition 4.3 [3, Definition 4.1].
For any w,w′ ∈ Wc, we say that w′ ≤L w if there exists bx such that gx,w,w′ 6= 0,
where g is as in Proposition 4.2.
For any w,w′ ∈ Wc, we say that w′ ≤R w if there exists bx such that gw,x,w′ 6= 0.
For any w,w′ ∈ Wc, we say that w′ ≤LR w if there exist bx and by such that
bxbwby = cbw′ for some c 6= 0.
We write w ∼L w′ to mean that both w′ ≤L w and w ≤L w′. Similarly, we
define w ∼R w′ and w ∼LR w′.
The relation ∼L (respectively, ∼R, ∼LR) is an equivalence relation, and the
corresponding equivalence classes of Wc are called the left (respectively, right, two-
sided) cells.
It is clear from the definitions and the fact that the identity element is a monomial
basis element that two-sided cells are unions of left cells, and also unions of right
cells.
Proposition 4.4.
(i) Let w ∈ Wc. If we have w = xi(A)y reduced for some A such that #A = a(w),
then i(A) ∼LR w and w ∼R xi(A).
(ii) The a-function is constant on left, right, and two-sided cells.
(iii) If w,w′ ∈ Wc are such that w′ ≤R w and w′ 6∼R w, then a(w′) > a(w). An
analogous statement holds for left cells and two-sided cells.
(iv) The right cell containing i(A) is precisely the set
{w ∈ Wc : w = i(A)x reduced, a(w) = #A}.
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 11
(v) A left cell and a right cell contained in the same two-sided cell intersect in a
unique element.
Proof. Statement (i) is proved during the argument establishing [3, Theorem 4.5.1.].
The fact that the a-function is constant on two-sided cells is implicit in the proof
of [3, Theorem 4.5.1]. Since two-sided cells are unions of left (or right) cells, part
(ii) follows.
Suppose now that w,w′ ∈ Wc are such that w′ ≤R w and w′ 6∼R w. An inductive
argument using the definition of ≤R reduces the problem to the case where there
is some s ∈ S such that bwbs is a multiple of bw′ , so let us assume that this is
the situation. By [3, Corollary 4.2.2], the assumption that w′ ≤R w implies that
a(w′) ≥ a(w). The statement follows unless a(w′) = a(w), so suppose we are in
this case.
Let us write w = xi(A)y as in statement (i). Now [3, Lemma 4.2.5], applied to
the element xi(A) and the sequence of generators corresponding to ys, shows that
we have w′ = xi(A)y′ reduced. By part (i), we find that w′ ∼R xi(A), and thus
that w′ ∼R w, a contradiction.
The statement for left cells follows by a symmetrical argument, and the statement
for two-sided cells follows from the previous claims and the fact that if w′ ≤LR w,
then there is a chain
w′ = w1, w2, w3, . . . , wk = w
where, for each 1 ≤ i < k, we have either wi ≤L wi+1 or wi ≤R wi+1. This
completes the proof of (iii).
Part (iv) is [3, Proposition 4.4.3].
Part (v) is well known and follows from the proof of [3, Theorem 6.1.2]. �
Remark 4.5. For finite and affine Weyl groups, the a-function defined above is
known by [23, Theorem 3.1] to be the restriction of Lusztig’s more general a-
function [21] restricted to the subset Wc.
Although it is not true that each of the monomial cells studied above is a cell
12 R.M. GREEN
in the sense of Kazhdan–Lusztig [19], it can be shown fairly easily that each left
(respectively, right, two-sided) monomial cell is a subset of some left (respectively,
right, two-sided) Kazhdan–Lusztig cell.
5. Traces on the algebras TL(En)
In §5, we will extend scalars and deal with a K-form of TL(En), where K is
a field containing A and a square root of δ. (The existence of
δ is needed for
compatibility with [3], but can ultimately be removed; see Remark 6.4.) We write
TLK(En) := K ⊗A TL(En). We aim to classify the traces, τ : TLK(En) −→ K,
that is, linear functions τ with the property that τ(ab) = τ(ba) for all a, b ∈
TLK(En). It is clear that the set of all traces on TLK(En) is a K-vector space
(dependent in principle on K and δ). The main result of §5 is that there is a basis
for this vector space in natural bijection with the set P ′ of §3.
The next result shows how τ naturally induces a function P/ ∼ −→ K.
Lemma 5.1. Maintain the notation of Definition 3.3. Suppose A,B ∈ P are such
that A ∼ B, and let τ : TLK(En) −→ K be a trace. Then τ(i(A)) = τ(i(B)).
Proof. The proof immediately reduces to the case where A and B are neighbours.
Let s (respectively, t) be the element of S corresponding to the unique element
of A\B (respectively, B\A). It is immediate from the definitions that i(A) =
bsi(A ∩B) = i(A ∩B)bs and i(B) = bti(A ∩B) = i(A ∩B)bt. We then have
τ(i(A)) = τ(bsi(A ∩B)) = τ(bsbtbsi(A ∩B))
= τ(btbsi(A ∩B)bs) = τ(btbsbsi(A ∩B))
= δτ(btbsi(A ∩B))
= τ(btbtbsi(A ∩B)) = τ(btbsi(A ∩B)bt)
= τ(btbsbti(A ∩B)) = τ(bti(A ∩B))
= τ(i(B)),
as required. �
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 13
Lemma 5.2. Any trace τ : TLK(En) −→ K is determined by its values on the set
{i(A) : A ∈ P}.
Proof. Suppose the values of τ(i(A)) are known for each A ∈ P . We will show how
to compute the value of τ(bw), where w ∈ Wc is arbitrary.
Let us write w = xi(A)y reduced as in Proposition 4.4 (i). Using a reverse
induction, we will assume that the values of τ(bw′) for a(w
′) > a(w) = #A, if
such w′ exist, have been determined. By the defining relations of TL(En), we have
bi(A)bi(A) = δ
#Abi(A), and so we have
τ(bw) = τ(bxbi(A)by)
= δ−#Aτ(bxbi(A)bi(A)by)
= δ−#Aτ(bi(A)bybxbi(A)).
Now i(A)y and xi(A) lie in Wc because w does, and Proposition 4.4 (i) and (ii)
shows that a(w) = a(xi(A)). By Proposition 4.2 (i), we have
bi(A)bybxbi(A) = bi(A)ybxi(A) = δ
for some z ∈ Wc, and it is clear from the definitions that z ≤L xi(A). By Proposi-
tion 4.4 (ii) and (iii), we see that
a(z) ≥ a(xi(A)) = a(w) = #A.
If a(z) > #A then our inductive hypothesis determines the value of τ(δcbz),
which in turn determines the value of τ(bw). We may therefore assume that a(z) =
#A. To complete the proof, it is enough to show that z = i(A), because the value
of τ(bz) will then have been determined by our assumptions.
Let s ∈ A. Since bsbi(A) = δbi(A) by the defining relations, the definition of bz
shows that bsbz = δbz. By Proposition 4.2 (ii), this means that ℓ(sz) < ℓ(z), and
it follows that A ⊆ L(z). Because A is a set of commuting generators, standard
14 R.M. GREEN
properties of Coxeter groups show that we can write z = i(A)z′ reduced. Applying
Proposition 4.4 (iv) to the fact that a(z) = #A shows that z ∼R i(A). A symmet-
rical argument then shows that we have z ∼L i(A). By Proposition 4.4 (v), this
can only happen if z = i(A). �
Theorem 5.3. For each Ā ∈ P̄ (as in Definition 3.3), there is a unique trace
τĀ : TLK(En) −→ K such that for each B ∈ P we have
τĀ(i(B)) =
1 if B ∈ Ā,
0 otherwise.
The set
{τĀ : Ā ∈ P̄}
is a K-basis for the set of all traces τ : TLK(En) −→ K.
Proof. It is clear from the definition of trace that the traces from TLK(En) to K
form a K-vector space. Lemmas 5.1 and 5.2 show that this space has dimension at
most the size of P̄ .
Fan [3, Theorem 5.6.1] shows that TLK(En) is semisimple and that is then a
direct sum of |P̄ | matrix rings. This proves that the dimension of the space of traces
is at least the size of P̄ , and thus that the space has the claimed dimension.
A dimension count, together with another application of lemmas 5.1 and 5.2,
then shows that there are unique traces τĀ with the properties claimed, and that
they form a basis. �
We now come to the central definition of the paper.
Definition 5.4. The trace tr : TLK(En) −→ K is defined by
Ā∈P̄
δ−#AτĀ,
where τĀ is as in Theorem 5.3.
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 15
Corollary 5.5. Any trace τ : TLK(En) −→ K satisfies τ(bw) = τ(bw−1) for all
w ∈ Wc.
Proof. It follows from Proposition 3.1 that there is a unique A-linear antiautomor-
phism ∗ : TL(En) −→ TL(En) fixing the generators bs. We may extend this to a
K-linear antiautomorphism ∗ : TLK(En) −→ TLK(En). If a ∈ TLK(En), let us
write a∗ for ∗(a). Note that if A ∈ P , then i(A) is invariant under ∗, because i(A)
is a product of commuting generators bs.
Given a trace τ : TLK(En) −→ K, the K-linear map τ ′ : TLK(En) −→ K
defined by τ ′(a) = τ(a∗) is also a trace. Since τ and τ ′ agree on all elements i(A)
for A ∈ P , Lemma 5.2 shows that τ = τ ′, and the assertion follows. �
Remark 5.6. The trace tr will turn out to induce the Markov trace of the title.
Note that the definition makes sense because Ā, B̄ ∈ P̄ implies #A = #B.
Traces on Hecke algebras of finite Coxeter groups are known have a property
similar to that given in Corollary 5.5; see [5, Corollary 8.2.6] for more details.
6. Cellular structure and the a-funtion
In §6, we explain how the trace tr is particularly compatible with the structure
of TL(En) as a cellular algebra, in the sense of [8]. We will not recall the complete
definition of a cellular algebra here, but we summarize below the properties of the
cellular structure that are important for our purposes.
Definition 6.1. Let Λ be the set of two-sided cells for TL(En), equipped with the
partial order induced by ≤LR. For each λ ∈ Λ, let M(λ) be an indexing set for
the left cells contained in λ; note that the inversion map on the Coxeter group W
induces a bijection between the set of left cells in λ and the set of right cells in λ
(see the remarks at the end of [3, §4.4]).
Proposition 6.2. Maintain the above notation.
(i) Let T, U ∈ M(λ) for some fixed λ ∈ Λ. Then T ∩ U contains a unique element,
w, and we define CT,U = bw.
16 R.M. GREEN
(ii) The A-algebra anti-automorphism ∗ : TL(En) −→ TL(En) defined by ∗(bw) =
bw−1 satisfies ∗(CT,U ) = CU,T . In particular, we have w2 = 1 if and only if
bw = CT,T for some T .
(iii) Suppose that CP,Q and CR,S are arbitrary monomial basis elements, and define
CT,U by the condition
CP,QCR,S = δ
aCT,U
(which makes sense by Proposition 4.2 (i)). If P,Q,R, S, T and U all belong
to the same two-sided cell, then P = T and S = U ; if, furthermore, we have
Q = R, then a = a(CT,U ). If it is not the case that P = T , S = U and Q = R,
then we have a < a(CT,U ).
Proof. Parts (i) and (ii), which are originally due to Graham [7], are proved in [10,
Proposition 4.2.1]. Part (iii) is proved in [10, propositions 4.2.1 and 4.2.3] using
the results of [3]. �
Proposition 6.3. For all w ∈ Wc, we have tr(bw) = δa, where a = −a(w) if
w2 = 1, and a < −a(w) otherwise.
Proof. Let λ be the two-sided cell containing w. We will prove the statement by
induction on the partial order on two-sided cells given in Definition 6.1. Writing
w = CT,U for T, U ∈ M(λ), as in Proposition 6.2 (i), and applying Proposition 6.2
(ii), we see that the condition w2 = 1 is equivalent to T = U .
By Proposition 4.4, there exists a product of a(w) commuting generators, i(A), in
λ. Define V ∈ M(λ) by the condition CV,V = bi(A). Since tr is a trace, Proposition
6.2 (iii) shows that
tr(CT,U ) = δ
−a(w)tr(CT,V CV,U ) = δ
−a(w)tr(CV,UCT,V ).
By Proposition 4.2 (i), we have
CV,UCT,V = δ
bCX,Y
for some b ≥ 0 and some basis element CX,Y . There are now two cases to consider.
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 17
The first possibility is that CX,Y comes from the two-sided cell λ. (If T = U ,
this case must occur by Proposition 6.2 (iii).) In this case, we have X = Y = V ,
and thus CX,Y = bi(A). Proposition 6.2 (iii) then shows that b = a(w) if T = U ,
and b < a(w) otherwise. Since we have tr(bi(A)) = δ
−a(w) by definition of tr, we
have tr(CT,U ) = δ
−a(w)+b−a(w), and the result follows.
The other possibility is that CX,Y comes from a two-sided cell λ
′ with λ′ < λ,
and T 6= U . In this case, Proposition 4.4 (iii) shows that a(CX,Y ) > a(w). By
the inductive hypothesis, we know that tr(CX,Y ) = δ
a′ , where a′ ≤ −a(CX,Y ) <
−a(w). This means that tr(CT,U ) = δ−a(w)+b+a
. By propositions 4.2 (iii) and 4.4
(ii), we have b ≤ a(w), and thus tr(CT,U ) = δa for a < −a(w), as required. �
Remark 6.4. The above proposition shows that we do not actually need
δ ∈ k to
define tr. From now on, we need only assume that K is a field containing A.
Proposition 6.5. If K is the field of fractions of the power series ring Z[[v−1]],
then tr is a nondegenerate trace on TLK(En), and
tr(CP,QCR,S)− δQRδPS ∈ v−1Q[[v−1]],
where δQR and δPS are the Kronecker delta.
Proof. An element x of K is uniquely representable in the form
where λi ∈ Q for all i. If x 6= 0, we define deg x to be the largest integer j such
that λj 6= 0. If x, y 6= 0 then deg(xy) = deg x + deg y, so the facts that deg δ = 1
and deg 1 = 0 imply that deg δa = −a.
The second assertion follows from the fact that deg δa = −a combined with
Proposition 4.4 (ii), Proposition 6.2 (iii) and Proposition 6.3.
We will now show that for any nonzero a ∈ TLK(En), we have tr(aa∗) 6= 0, from
which the assertion follows. We have
λwbw,
18 R.M. GREEN
and by clearing denominators (thus multiplying a by a nonzero scalar), we may
assume that we have λw ∈ A for all w ∈ Wc. Choose w′ with λw′ 6= 0 and
N(w′) := deg λw′ maximal, and let cw′ be the (integer) coefficient of v
N(w′) in λw′ .
Setting aw′ = v
−N(w′)λw′bw′ , we then have
tr(aw′a
w′) = c
2 mod v−1Q[[v−1]].
If λw′′ 6= 0 but deg λw′′ is not maximal, we may again define aw′′ = v−N(w
′′)λw′′bw′′ ,
but then
tr(aw′′a
w′′) ∈ v−1Q[[v−1]].
Since the integers c2 are strictly positive, it follows that
tr((v−N(w
′)a)(v−N(w
′)a)∗) 6∈ v−1Q[[v−1]],
which completes the proof. �
Proposition 6.6. Let K be the field of fractions of the power series ring Z[[v−1]],
and let K ′ be the subfield of K consisting of the field of fractions of Z[[v−2]].
(i) The field TLK(En) has a unique structure as a Z2-graded algebra over K
which vn has degree n mod 2 and K ′ is precisely the set of elements of degree 0
mod 2.
(ii) The algebra TLK(En) has a unique structure as a Z2-graded algebra over K
in which vn has degree n mod 2 and the generators bs have degree 1 mod 2.
We denote the even subalgebra consisting of elements of degree 0 mod 2 by
TLK′(En).
(iii) Let τ : TLK(En) −→ K be any trace. Then there are unique K ′-linear maps
τ(0), τ(1) : TLK′(En) −→ K ′ such that τ(0) + vτ(1) is the restriction of τ to
TLK′(En), and furthermore, τ(0) and τ(1) are themselves traces.
Proof. Recall from the proof of Proposition 6.5 that K = Q((v−1)) = Q[v][[v−1]],
so that each element x ∈ K has a unique expression of the form
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 19
where qi ∈ Q and N ∈ Z depends on x. Similar reasoning shows that the subfield
K ′ of K then consists precisely of those elements for which qi = 0 whenever i is
odd. Part (i) is a consequence of this construction.
The assertion of (ii) is immediate from the observation that the defining relations
of Proposition 3.1 respect the given grading.
Let π : K −→ K ′ be the map
where
q′i =
qi if i is even,
0 otherwise.
Our description ofK ′ shows that π is a K ′-linear map. Denoting the restriction of τ
to TLK′(En) by τ
′, it follows that π ◦τ ′ is a trace on TLK′(En). Since τ(0) = π ◦τ ′,
the maps τ(0), vτ(1) = τ
′ − τ(0) and τ(1) are also traces, completing the proof of
(iii). �
Note that any trace from TLK′(En) to K
′ extends uniquely to a trace from
TLK(En) to K by tensoring by K ⊗K′ −.
Lemma 6.7. The trace tr : TLK(En) −→ K arises from a trace
tr′ : TLK′(En) −→ K ′
by extension of scalars.
Proof. We use the notation of §5. Note that if A ∈ P , then i(A) is an element of
TLK(En) of degree #A mod 2. We also have tr(i(A)) = δ
#A, which is an element
of K of degree #A mod 2.
Recall that TLK′(En) is a K
′-subalgebra of TLK(En) and note that if y, z are
homogeneous elements of TLK(En), then yz and zy have the same degree. The
argument of Lemma 5.2 now shows that if x is an element of TLK′(En), we have a
relation
tr(x) = tr
Ā∈P̄
λĀi(A)
20 R.M. GREEN
where for each Ā ∈ P̄ , we have λĀ(i(A)) ∈ TLK′(En). By the first paragraph of
the proof, λĀ must be homogeneous of degree #A mod 2, and tr(λĀi(A)) ∈ K ′.
The proof is completed by the observation that any x ∈ TLK(En) is uniquely
expressible as x(0) + vx(1) for x(0), x(1) ∈ TLK′(En) (compare with Proposition 6.6
(iii)). �
Corollary 6.8. If w ∈ Wc and tr(bw) = δa as in Proposition 6.3, then a ≡ λ(w)
mod 2.
Proof. By Lemma 6.7, we have deg tr(bw) = ℓ(w) mod 2, so the assertion follows
from the fact that deg δ = 1. �
§7. tom Dieck’s diagram calculus
In [2], tom Dieck introduced a diagram calculus for the algebras TL(En). To
give a rigorous definition of tom Dieck’s diagram calculus, as we do here, we first
need to recall the graphical definition of the Temperley–Lieb algebra. We start by
recalling Jones’ formalism of k-boxes [18], following the approach of Martin and
the author in [15]. For further details and references, the reader is referred to [11,
Definition 7.1. Let k be a nonnegative integer. The standard k-box, Bk, is the
set {(x, y) ∈ R2 : 0 ≤ x ≤ k + 1, 0 ≤ y ≤ 1}, together with the 2k marked points
1 = (1, 1), 2 = (2, 1), 3 = (3, 1), . . . , k = (k, 1),
k + 1 = (k, 0), k + 2 = (k − 1, 0), . . . , 2k = (1, 0).
Definition 7.2. Let X and Y be embeddings of some topological spaces (such
as lines) into the standard k-box. Multiplication of such embeddings to obtain a
new embedding in the standard k-box shall, where appropriate, be defined via the
following procedure on k-boxes. The product XY is the embedding obtained by
placing X on top of Y (that is, X is first shifted in the plane by (0, 1) relative to Y ,
so that marked point (i, 0) in X coincides with (i, 1) in Y ), rescaling vertically by a
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 21
scalar factor of 1/2 and applying the appropriate translation to recover a standard
k-box.
Definition 7.3. Let k be a nonnegative integer. Consider the set of smooth em-
beddings of a single curve (which we usually call an “edge”) in the standard k-box,
such that the curve is either closed (isotopic to a circle) or its endpoints coincide
with two marked points of the box, with the curve meeting the boundary of the
box only at such points, and there transversely.
By a smooth diffeomorphism of this curve we mean a smooth diffeomorphism of
the copy of R2 in which it is embedded, that fixes the boundary, and in particular
the marked points, of the k-box, and takes the curve to another such smooth
embedding. (Thus, the orbit of smooth diffeomorphisms of one embedding contains
all embeddings with the same endpoints.)
A concrete Brauer diagram is a set of such embedded curves with the property
that every marked point coincides with an endpoint of precisely one curve. (In
examples we can represent this set by drawing all the curves on one copy of the
k-box. Examples can always be chosen in which no ambiguity arises thereby.)
Two such concrete diagrams are said to be equivalent if one may be taken into
the other by applying smooth diffeomorphisms to the individual curve embeddings
within it.
There is an obvious map from the set of concrete diagrams to the set of pair
partitions of the 2k marked points. It will be evident that the image under this
map is an invariant of concrete diagram equivalence.
The set Bk(∅) is the set of equivalence classes of concrete diagrams. Such a class
(or any representative) is called a Brauer diagram.
Let D1, D2 be concrete diagrams. Since the k-box multiplication defined above
internalises marked points in coincident pairs, corresponding curve endpoints in
D1D2 may also be internalised seamlessly. Each chain of curves concatenated in
this way may thus be put in natural correspondence with a single curve. Thus
the multiplication gives rise to a closed associative binary operation on the set of
22 R.M. GREEN
concrete diagrams. It will be evident that this passes to a well defined multiplication
on Bk(∅). Let R be a commutative ring with 1. The elements of Bn(∅) form the
basis elements of an R-algebra PBn (∅) with this multiplication.
A curve in a diagram that is not a closed loop is called propagating if its endpoints
have different y-values, and non-propagating otherwise. (Some authors use the
terms “through strings” and “arcs” respectively for curves of these types.)
Note that in a Brauer diagram drawn on a single copy of the k-box it is not gen-
erally possible to keep the embedded curves disjoint. Let Tk(∅) ⊂ Bk(∅) denote the
subset of diagrams having representative elements in which the curves are disjoint.
Representatives of this kind are called Temperley–Lieb diagrams.
It will be evident that PBn (∅) has a subalgebra with basis the subset Tk(∅). (That
is to say, the disjointness property is preserved under multiplication.) We denote
this subalgebra Pn(∅)
Because of the disjointness property there is, for each element of Tk(∅), a unique
assignment of orientation to its curves that satisfies the following two conditions.
(i) A curve meeting the r-th marked point of the standard k-box, where r is odd,
must exit the box at that point.
(ii) Each connected component of the complement of the union of the curves in the
standard k-box may be oriented in such a way that the orientation of a curve
coincides with the orientation induced as part of the boundary of the connected
component.
Note that the orientations match up automatically in composition. If D1 and
D2 are equivalent concrete Temperley–Lieb diagrams, the diffeomorphisms that
give rise to the equivalence set up a bijection between the connected components
of D1 and those of D2.
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 23
Figure 2. A pillar diagram corresponding to an element of T8(∅)
1 2 3 4 5 6 7 8
910111213141516
Definition 7.4. A pillar diagram consists of a pair (D, f), where D ∈ Tk(∅) is a
Temperley–Lieb diagram and f is a function from the connected components of D
to Z≥0, such that any component with anticlockwise orientation is mapped to zero.
On the diagram D, we indicate the values of f on the clockwise connected com-
ponents either by writing in the appropriate integer, or by inserting k disjoint discs
(the “pillars” of [2]).
The set of pillar diagrams arising from the set Tk(∅) will be denoted Tk(•).
Example 7.5. Let k = 8. A pillar diagram corresponding to an element of Tk(•)
is shown in Figure 2. Note that there are 10 connected components, precisely 7 of
which inherit a clockwise orientation. The values of f on these 7 components are
3, 2, 2, 1, 0, 0, 0.
We define an algebra Pn(•), analogous to Pn(∅), with the set Tk(•) as a basis.
The multiplication is k-box multiplication with the added convention that function
values on the connected components are additive. (This is natural if one represents
the function values with pillars as in Figure 2.)
For our purposes, we need to apply an equivalence relation on the concrete
diagrams of Tk(•). Locally, this is given by the relation shown in Figure 3.
24 R.M. GREEN
Figure 3. A topological reduction rule
In the notation where clockwise regions are labelled by nonnegative integers, the
relation of Figure 3 is that shown in Figure 4.
Figure 4. Alternative notation for the topological reduction
If the regions labelled k and l are connected to each other, Figure 3 shows that
we have k = l > 1 and p = k − 1. On the other hand, if the regions labelled k
and l are genuinely distinct, that is, the arcs shown on the left hand side of figure
3 are not sections of some longer arc, then we have p = k+ l− 1 ≥ 1. In the latter
case, it is not possible for any regions labelled by the integer zero to be created or
destroyed by the topological reduction. Note that the other partial regions shown
in figures 2 and 3 have anticlockwise orientation, and as such they are labelled by
the integer 0.
Definition 7.6. If L is a closed loop in a concrete diagram of Tk(•), we define
m(L) to be the integer label of the region immediately interior to L; in particular,
we have m(L) = 0 if L has anticlockwise orientation.
Let R be a commutative ring with 1. The R-algebra PEn (•) is the quotient of
the R-algebra Pn(•) obtained by applying the following three relations:
(i) for each closed loop L whose immediate interior is labelled 1 and whose imme-
diate exterior is necessarily labelled 0, relabel the immediate interior of L by 0
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 25
and remove L;
(ii) for each closed loop L whose immediate interior is labelled 0 and whose imme-
diate exterior is labelled k, relabel the immediate interior of L by k, remove L
and multiply by δ;
(iii) for each region R labelled by k ≥ 2 (whether or not R is a closed loop), decrease
the label of R by 1 and multiply by δ.
A basis for PEn (•) may be obtained by using the notion of “reduced” diagrams
given in [2, §2] and Bergman’s diamond lemma [1]. However, we do not pursue this
because we do not need it for our purposes.
Definition 7.7. Suppose n > 1 and 1 ≤ k < n.
The diagram Enk of PEn (•) is the one where each point i is connected by a
propagating edge to point 2n+1− i, unless i ∈ {k, k+1, 2n−k, 2n+1−k}. Points
k and k + 1 are connected by an edge, as are points 2n − k and 2n + 1 − k. All
regions are labelled by 0.
The diagram Bnk of PEn (•) is the one where each point i is connected by a
propagating edge to point 2n+ 1− i, and all regions are labelled by 0, except the
rectangular region bounded by k, k+1, 2n− k and 2n+1− k, which is labelled by
Proposition 7.8. There is a unique homomorphism ρ : TL(En) −→ PEn (•) of
unital A-algebras sending b0 to Bn3 and bs to Ens for i ∈ {1, 2, . . . , n − 1}, where
the numbering of generators is as in §3.
Proof. This is a routine (but important) exercise using the presentation of Propo-
sition 3.1, and is essentially the same as the proof of [2, Theorem 2.5]. �
We shall see later that ρ is in fact a faithful representation. We will not determine
the image of ρ, but this can be done by an inductive combinatorial argument similar
to those in [9, §5].
26 R.M. GREEN
§8. Existence and uniqueness of the Markov trace
There is a well-known embedding ιn : TL(En) −→ TL(En+1) sending bs to bs
for each generator of TL(En) (see [3, §6.3]). This means that the tower of algebras
TL(En), equipped with the generators bs, fits into the framework of Markov traces
defined in §2. We recall the definition in order to fix some notation.
Definition 8.1. Let K be a field containing A. A Markov trace on TLK(E∞) with
parameter z ∈ K is a K-linear map τ : TLK(E∞) −→ K satisfying the following
conditions:
(i) τ(1) = 1;
(ii) τ(hbn) = zτ(h) for n ≥ 6 and h ∈ TLK(En);
(iii) τ(hh′) = τ(h′h) for all n ≥ 6 and h, h′ ∈ TLK(En).
Remark 8.2. Note that in condition (ii), bn is the unique generator in TL(En+1)
that does not lie in TL(En). As mentioned in [3, §2.2], the algebras TL(En) are
quotients of the Hecke algebras of the Coxeter groups W (En), and bs = q
−1/2(Ts+
1), where the Ts are the usual generators for the Hecke algebra as given in [16, §7].
This means that the Markov trace can also be regarded as a trace on a tower of
Hecke algebras.
Proposition 8.3. If τ is a Markov trace on TLK(E∞), then the parameter z must
be equal to δ−1, and τ is unique. Restricted to TL(En), such a Markov trace must
agree with the trace tr.
Proof. Let n ≥ 6. Part (ii) of Definition 8.1 shows that τ(bn−1bn) = zτ(bn−1). On
the other hand, the defining relations and part (iii) of the definition show that
τ(bn−1bn) = δ
−1τ(bn−1(bn−1bn)) = δ
−1τ(bn−1bnbn−1) = δ
−1τ(bn−1),
proving the assertion about the parameter.
To prove the other assertions, it suffices to show that, regarding TLK(En) as
a subalgebra of TLK(E∞), we have τ(i(A)) = δ
−#A for A ∈ P = P (n). Choose
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 27
such an A. It follows from Definition 3.3 that for sufficiently large N ≥ n, and
identifying A in the obvious way with an element of P (N), we can find B ∈ P (N)
with A ∼ B and B∩{b0, b1, . . . , b5} = ∅. The first assertion together with repeated
applications of part (ii) of Definition 8.1 (and one application of part (i)) now show
that τ(i(B)) = δ−#B = δ−#A, and Lemma 5.1 completes the proof. �
To prove that the Markov trace on TLK(E∞) exists, we make use of the diagram
calculus, as hinted in [2, §6].
Definition 8.4. Let k be a nonnegative integer. The standard k-cone is obtained
from the standard k-box by identifying each pair of points {(x, 0), (x, 1)} for each
0 ≤ x ≤ k+1, and identifying all the points in the set {(k+1, y) : 0 ≤ y ≤ 1}. The
standard k-cone is homeomorphic to a closed disc.
Let D be a diagram in PEk (•). The trace diagram, D, of D is obtained by
identifying the boundary points of the k-box bounding D to form the standard
k-cone.
Figure 5. The trace diagram of the pillar diagram in Figure 2
Example 8.5. The trace diagram D corresponding to the diagram D of Figure 2
is shown in Figure 5.
Notice that the outer part of the trace diagram (regarded as a disc) will always
have an anticlockwise orientation and thus be labelled by 0. Consequently, any
28 R.M. GREEN
regions in the trace diagram not labelled by zero must be bounded by at least one
closed loop. (It is possible for the closed loops to be nested.)
Definition 8.6. Let g : Z≥0 −→ Z≥0 be given by
g(c) =
1 if c = 0,
c− 1 if c ≥ 1.
If D is a trace diagram for TL(En), we define the content, c(D), of D to be the
integer
g(f(L)),
where the sum is over all the connected components L of D that are interior to at
least one closed loop, and where f(L) is the integer assigned to L as in Definition
Example 8.7. The content of the trace diagram in Figure 5 is
g(2) + g(3) + g(3) = 5.
Lemma 8.8. The content of a trace diagram D is invariant under the topological
reduction rule shown in Figure 3.
Proof. Consider the application of the topological reduction rule to a diagram that
looks locally like the situation in Figure 6.
Figure 6. Labelling of points involved in the topological relation
 C  D
As in the discussion following Figure 4, there are two cases to consider, according
as the two pillar regions are connected or not in D.
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 29
There are four cases to consider, according as there is an oriented curve in D
from point A to point C, and (independently) according as there is an oriented
curve in D from point D to point B.
Suppose first that there is no oriented curve in D from point A to point C, and
also that there is no oriented curve in D from point D to point B. In this case,
the two pillar regions are genuinely distinct, and applying the topological relation
does not produce any new closed loops. We are then in the case p = k + l − 1 ≥ 1
of Figure 4, so the summands (k − 1) and (l − 1) appearing in Definition 8.6 are
replaced by a single ((k + l − 1)− 1), leaving the content unchanged.
We next deal with the case where there is an oriented curve from point A to
point C, but no oriented curve from point D to point B. In this case, the two pillar
regions are connected to each other, and the application of the topological rule
produces a new closed loop (labelled zero) from the curve originally connecting
point A to point C. We are now in the case k = l > 1 of Figure 4. This will change
one of the summands (k− 1) of Definition 8.6 to (k− 2), and a new summand of 1
will be produced, corresponding to the new closed loop. The content thus remains
unchanged.
Consideration of the case where there is an oriented curve from point D to point
B, but not from point A to point C, proceeds in exactly the same way. The last case,
in which both oriented curves exist, also works similarly, except that the oriented
curves shown in Figure 6 are already part of a closed loop. Application of the
topological relation splits this closed loop into two closed loops, again producing
an extra summand of 1 and changing a summand (k − 1) to (k − 2), leaving the
content unchanged. �
Lemma 8.9. There is a well-defined K-linear map
τ•n : PEn (•) −→ K
such that for each pillar diagram D, τ•n(D) = δ
c(D). If x, y ∈ PEn (•), we have
τ•n(xy) = τ
n(yx).
30 R.M. GREEN
Proof. For the first assertion, we need to check relations (a)–(c) of Definition 7.6.
Relation (iii) holds by Lemma 8.8.
In relation (i), we have D = D1, where D1 is the result of removing a loop
labelled 1 from D. Since c(D) = c(D1), we have τ
n(D) = τ
n(D1).
In relation (ii), we have D = δD2, where D2 is the result of removing a loop
labelled 0 from D. Since c(D) = c(D2) + 1, we have τ
n(D) = τ
n(D2).
By linearity, we only need check the second assertion in the case where x and y
are pillar diagrams, and this is immediate from the construction of trace diagrams
from pillar diagrams. �
It is not hard to see that there is an algebra embedding ι•n : PEn (•) −→ PEn+1(•)
analogous to the map ιn. Given a pillar diagram D of PEn (•), ι•(D) is the diagram
obtained by adding a vertical line on the right of the diagram.
Lemma 8.10. Let D be a pillar diagram of PEn (•).
(i) We have τ•n+1(ι
n(D)) = δτ
n(D).
(ii) Let En+1n be as in Definition 7.7. Then we have τ
n(D) = τ
n+1(ιn(D)En).
Proof. Part (i) follows from the observation that the trace diagram ι•n(D) differs
from the trace diagram D only in having a single extra closed loop, labelled 0.
A short calculation involving diagrams shows that the trace diagrams D and
ιn(D)En are equivalent, from which part (ii) follows. �
Theorem 8.11. Let τn : TLK(En) −→ K be the trace defined by
τn(x) = δ
−nτ•n(ρ(x)).
The family of traces {τn : n ≥ 6} is compatible with the direct limit of algebras
TLK(En) and gives the unique Markov trace on TLK(E∞). Furthermore, the
Markov trace agrees with the traces tr of Definition 5.4.
Proof. The maps τn are traces by Proposition 7.8 and Lemma 8.9. They are
compatible with the direct limit by Lemma 8.10 (i). Since τ•n(1) = δ
n, we have
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 31
τn(1) = 1. Condition (ii) of Definition 8.1 follows from part (ii) of Lemma 8.10.
Uniqueness of the Markov trace, and agreement with the traces tr, is given by
Proposition 8.3. �
9. Proofs and applications
Proof of Theorem 1.1. We need to show that the homomorphism ρ of Proposition
7.8 is injective, and there is no loss in passing to the field of fractions K of Z[[v−1]].
In this case, Proposition 6.5 and Theorem 8.11 show that the unique Markov trace
on TLK(En), which can be defined on Im(ρ), is nondegenerate on TLK(En). The
conclusion follows. �
Proposition 9.1. The linear map
(1 + v−2)nτn = v
−nτ•n ◦ ρ
restricted to TL(En) takes values in A. It is a tabular trace in the sense of [10],
and a positive generalized Jones trace in the sense of [12].
Proof. The first assertion comes from the fact that τ•n evaluated on a diagram (such
as an element of the form ρ(bw) for w ∈ Wc) yields a nonnegative integer power of
To check that (1+v−2)nτn is a tabular trace, we need to check that axiom (A5) of
[10, Definition 1.3.4] is satisfied. We have just shown that (1+v−2)nτn takes values
in A, and it is clear from Theorem 8.11 that (1+ v−2)nτn is a trace. We have seen
in Corollary 5.5 and Proposition 6.2 (ii) that (1 + v−2)nτn(x) = (1 + v
−2)nτn(x
for all x ∈ TL(En). All that remains to check is that
τ(va(CS,T )CS,T ) = δS,T mod v
−1A−.
This follows from propositions 6.2 (ii) and 6.3 once we observe that we have
(1 + v−2)n = 1 mod v−2Q[[v−1]],
32 R.M. GREEN
regarded as power series in Q[v][[v−1]].
To show that (1+ v−2)nτn is a generalized Jones trace (see [12, Definition 2.9]),
two further conditions must be checked. One of these is precisely that established
by Lemma 6.7; the other is that, for x, y ∈ Wc, we should have
(1 + v−2)nτn(cxcy−1) =
1 mod v−1A− if x = y,
0 mod v−1A− otherwise,
where {cw : w ∈ Wc} is the canonical basis of TL(En) defined by J. Losonczy
and the author in [14]. By [14, Theorem 3.6], this is nothing other than the
basis {bw : w ∈ Wc} in this case. The corresponding property for tr (instead of
(1+v−2)nτn) follows from Proposition 6.5, and the assertion for (1+v
−2)nτn follows
from the fact that (1 + v−2)n = 1 mod v−2A−.
A generalized Jones trace is positive if it sends canonical basis elements to el-
ements of N[v, v−1]. This holds for (1 + v−2)nτn by Proposition 6.3: in this case,
(1 + v−2)nτn(bw) = δ
b for some b ≥ 0, so that (1 + v−2)n ∈ N[v, v−1]. �
Remark 9.2. Proposition 9.1 corrects the proof of [10, Theorem 4.3.5], where the
proof that the tabular trace takes the same values on x and x∗ contains a gap.
Proof of Theorem 1.2. By [12, Theorem 7.10], the conclusion of Theorem 1.2 holds
for a generalized Jones trace if the underlying Coxeter group has “Property F”
and a bipartite Coxeter graph. Clearly the graphs En are bipartite, because they
contain no circuits. Property F holds by [12, Remark 3.5]; see [13, Lemma 5.6] for
a fuller explanation.
To complete the proof, we simply have to transfer the result from (1+v−2)nτn to
the Markov trace, which follows from the fact that (1+v−2)n = 1 mod v−2A−. �
The next result is an easier to use version of Theorem 1.2.
Corollary 9.3. Let x, y ∈ Wc(En). Then we have
µ̃(x, y) =
1 if τ•n ◦ ρ(bxby−1) = δn−1,
0 otherwise.
ON THE MARKOV TRACE FOR TEMPERLEY–LIEB ALGEBRAS OF TYPE En 33
Proof. This follows from Theorem 1.2 together with the observation that bxby−1 =
δbbw for some b ≥ 0 and w ∈ Wc, and the fact that τ•n sends diagrams to positive
powers of δ. �
Remark 9.4. It follows from [13, Theorem 4.6 (iv)] and [14, Theorem 3.6] that the
monomial basis element bx is the projection of the Kazhdan–Lusztig basis element
C′x ∈ H(En). Regarding tr and τ•n ◦ ρ as traces on the Hecke algebra, Theorem
1.2 and Corollary 9.3 can be used to evaluate the trace on products of certain
Kazhdan–Lusztig basis elements, without evaluating the product (which would be
difficult). Another noteworthy property of these results is that they give non-
recursive formulae for certain of the integers µ(x, y).
Remark 9.5. In [7, §9], Graham showed that if x, w ∈ Wc for TL(En) then µ(x, y) ∈
{0, 1}, and also produced a nonrecursive method of finding all the x with µ(x, y) = 1
for a fixed y. (In [7], x and y are said to be “close” if µ̃(x, y) = 1.) However, unlike
the results above, this does not give an efficient way to compute µ(x, y) when both
of x and y are specified. Corollary 9.3 can therefore be regarded as a quick way to
tell if two elements are close or not.
Remark 9.6. It is possible to modify Theorem 1.2 and Corollary 9.3 so that they
provide a nonrecursive way to test whether two diagrams represent the same algebra
element. However, we do not pursue this here for reasons of space.
Example 9.7. Consider the Coxeter system of type En with n = 6, and generators
s0, . . . , s5 as numbered in Figure 1. Define y = s1s2s4s0s5 and
w = s1s2s3s4s0s3s5s2s4s1s3s2s0s3s4s5;
these are both reduced expressions for fully commutative elements. The diagrams
ρ(by) and ρ(bw) are shown in figures 7 and 8 respectively. To evaluate τ
n(bybw−1),
we invert the diagram for bw, compose it with by and identify boundary points to
produce a trace diagram. The trace diagram so obtained is shown in Figure 9 (up
34 R.M. GREEN
to equivalence), and by inspection, it has content 1 + 1 + 1 + (3− 1) = 5 = n − 1.
It follows from Corollary 9.3 that µ(y, w) = 1.
Figure 7. The diagram ρ(by) of Example 9.7
REFERENCES 35
Figure 8. The diagram ρ(bw) of Example 9.7
Figure 9. The trace diagram corresponding
to τ•6 ◦ ρ(bybw−1) of Example 9.7
Acknowledgement
I am grateful to P.P. Martin for helpful comments on an early version of this
paper.
References
36 REFERENCES
[1] G.M. Bergman, The diamond lemma for ring theory, Adv. Math. 29 (1978), 178–218.
[2] T. tom Dieck, Bridges with pillars: a graphical calculus of knot algebra, Topology Appl. 78
(1997), 21–38.
[3] C.K. Fan, Structure of a Hecke algebra quotient, J. Amer. Math. Soc. 10 (1997), 139–167.
[4] M. Geck and S. Lambropoulou, Markov traces and knot invariants related to Iwahori–Hecke
algebras of type B, J. Reine Angew. Math. 482 (1997), 191–213.
[5] M. Geck and G. Pfeiffer, Characters of finite Coxeter groups and Iwahori–Hecke algebras,
Oxford University Press, Oxford, 2000.
[6] Y. Gomi, The Markov traces and the Fourier transforms, J. Algebra 303 (2006), 566–591.
[7] J.J. Graham, Modular representations of Hecke algebras and related algebras, Ph.D. thesis,
University of Sydney, 1995.
[8] J.J. Graham and G.I. Lehrer, Cellular algebras, Invent. Math. 123 (1996), 1–34.
[9] R.M. Green, Generalized Temperley–Lieb algebras and decorated tangles, J. Knot Th. Ram.
7 (1998), 155–171.
[10] R.M. Green, Tabular algebras and their asymptotic versions, J. Algebra 252 (2002), 27–64.
[11] R.M. Green, On planar algebras arising from hypergroups, J. Algebra 263 (2003), 126–150.
[12] R.M. Green, Generalized Jones traces and Kazhdan–Lusztig bases, J. Pure Appl. Alg. (to
appear; math.QA/0509362).
[13] R.M. Green, Star reducible Coxeter groups, Glasgow Math. J. 48 (2006), 583–609.
[14] R.M. Green and J. Losonczy, Canonical bases for Hecke algebra quotients, Math. Res. Lett.
6 (1999), 213–222.
[15] R.M. Green and P.P. Martin, Constructing cell data for diagram algebras, J. Pure Appl. Alg.
(in press; math.RA/0503751).
[16] J.E. Humphreys, Reflection Groups and Coxeter Groups, Cambridge University Press, Cam-
bridge, 1990.
[17] V.F.R. Jones, Hecke algebra representations of braid groups and link polynomials, Ann. of
Math. (2) 126 (1987), 335–388.
[18] V.F.R. Jones, Planar Algebras, I (preprint).
[19] D. Kazhdan and G. Lusztig, Representations of Coxeter groups and Hecke algebras, Invent.
Math. 53 (1979), 165–184.
[20] S. Lambropoulou, Knot theory related to generalized and cyclotomic Hecke algebras of type
B, J. Knot Th. Ram. 8 (1999), 621–658.
[21] G. Lusztig, Cells in affine Weyl groups, Algebraic groups and related topics, Adv. Studies
Pure Math 6, North-Holland and Kinokuniya, Tokyo and Amsterdam, 1985, pp. 255–287.
[22] B.G. Seifert, The spherical trace on inductive limits of Hecke algebras of type A, B, C, D
and factors, Quart J. Math. 41 (1990), 109–126.
[23] J.Y. Shi, Fully commutative elements and Kazhdan–Lusztig cells in the finite and affine
Coxeter groups, II, Proc. Amer. Math. Soc. 133 (2005), 2525–2531.
[24] J.R. Stembridge, On the fully commutative elements of Coxeter groups, J. Algebraic Combin.
5 (1996), 353–385.
[25] H.N.V. Temperley and E.H. Lieb, Relations between percolation and colouring problems and
other graph theoretical problems associated with regular planar lattices: some exact results
for the percolation problem, Proc. Roy. Soc. London Ser. A 322 (1971), 251–280.
http://arxiv.org/abs/math/0509362
http://arxiv.org/abs/math/0503751
ABSTRACT
  We show that there is a unique Markov trace on the tower of Temperley--Lieb
type quotients of Hecke algebras of Coxeter type $E_n$ (for all $n \geq 6$). We
explain in detail how this trace may be computed easily using tom Dieck's
calculus of diagrams. As applications, we show how to use the trace to show
that the diagram representation is faithful, and to compute leading
coefficients of certain Kazhdan--Lusztig polynomials.

<|endoftext|><|startoftext|>
Introduction
Quasinormal modes (QNMs) were originally observed in considering the scattering or emission of
gravitational waves by Schwarzschild black holes [1]. It was found that a characteristic damped
oscillation, which only depends on the black hole mass, dominated the time evolution in a certain
period of time. Since then QNMs have been investigated extensively both analytically and numerically.
For a general review and classification, see Refs. [2, 3]. From numerical studies, an asymptotic formula
for quasinormal frequencies of Schwarzschild black holes was obtained [4]:
2GMωn ≈ 0.0874247+
i+O[n−1/2]. (1)
The real part in the above formula was later postulated to be 1
ln 3 [5] based on a discrete area
spectrum of quantum black holes proposed in Ref. [6]. This was confirmed later by Motl and Neitzke
[7]. The recent surge of interest in the QNMs derived from its possible application in determining
the Immirzi parameter in loop quantum gravity[8]. The numerical value ln 3 in the real part of the
asymptotic quasinormal frequencies in Schwarzschild black holes was at first taken as a hint that the
relevant gauge group in loop quantum gravity is SO(3) instead of the commonly believed SU(2).
However, as shown in Ref. [7], the value ln 3 is not universal and one should take the argument with
a grain of salt.
Another interesting application of QNMs was pointed out by Horowitz and Hubeny in their study
of a scalar field in the background of a Schwarzschild anti-de Sitter black hole [9]. According to
AdS/CFT correspondence, a large black hole in AdS spacetime corresponds to a thermal state in CFT
[10]. They argued the decay of the scalar field corresponds to the decay of a perturbation of this state.
In the BTZ black hole, a one-to-one correspondence was found between the QNMs in the bulk and
the poles of the retarded correlation function in the dual conformal field theory on the boundary [11].
The idea of dS/CFT correspondence has also been proposed and formulated [15]. Since there is a
cosmological horizon in de Sitter spacetime, QNMs may also be defined in principle. Similar studies of
QNMs have also been carried out in de Sitter spacetime trying to lent support for such correspondence
[16]. However, the situation there is more subtle and it seems QNMs only exist in odd dimensions [3].
Therefore, it is not clear whether such correspondence makes sense in even dimensions, and further
study is necessary.
2 Perturbative calculation of the asymptotic form of quasi-
normal frequencies
In Ref. [12], the author calculated the first order correction to the asymptotic form of quasinormal
frequencies of a Schwarzschild black hole using a WKB analysis. The result was extended to include
the scalar field case using the monodromy analysis developed by Motl and Neitzke [13]. The agreement
with numerical results is excellent. We will begin with a brief review of their method which made
systematic expansion more accessible. In a background spacetime described by a metric gµν , a massless
scalar Φ satisfies the following Klein-Gordon equation:
−g∂νΦ
= 0. (2)
For four dimensional Schwarzschild black holes, the metric is given by
ds2 = −f(r)dt2 + f(r)−1dr2 + r2dΩ2,
with f(r) = (1− r0
) and r0 = 2GM. Let
Φ(r, t,Ω) = r φ(r)Ylm(Ω) e
iωt. (3)
φ(r) now satisfies the following equation:
− f(r) d
+ V (r)φ = ω2φ, (4)
V (r) = (1 − r0
l(l + 1)
By a simple modification in the potential V (r) [2],
V (r) = (1− r0
l(l + 1)
(1− j2)r0
, (5)
the previous equation can also describes linearized perturbation of the metric or an electromagnetic
test fields. Here, j = 0, 1, 2 which is the spin of the relevant field. They can also be classified as
the tensor, vector, and scalar types of perturbation to the background Schwarzschild metric using the
master equations derived by Ishibashi and Kodama [14]. Introducing the tortoise coordinate:
x(r) = r + r0 ln(r/r0 − 1),
one obtain a Schrodinger-like equation
+ V [r(x)]
φ = ω2φ. (6)
Because of our convention in eq (3), QNMs are defined through the following out-going wave
boundary condition:
φ(x) ∼
eiωx as x → −∞ (horizon),
e−iωx as x → ∞ (spatial infinity),
assuming Reω > 0. Define
f(x) = eiωx φ ∼
e2iωx as x → −∞,
1 as x → ∞.
According to Ref. [7], the boundary condition at the horizon translates to the monodromy of f(x)
around it
M(r0) = e4πωr0 . (9)
The same monodromy can also be accounted for by those around r = 0 and r = ∞, and it has
been shown that only the former one is non-trivial. To find the monodromy around r = 0, one need
to introduce the complex coordinate variable
z = ω(x− iπr0) = ω[r + r0 ln(1− r/r0)], (10)
which is vanishing at the black hole singularity r = 0. In the limit |r/r0| ≪ 1, the potential can be
expanded as a series in
z/(ωr0):
V (z) = −ω
2(1− j2)
3l(l+ 1) + 1− j2
2(−ωr0)1/2z3/2
− 3l(l+ 1) + 1− j
2(−ωr0)3/2z1/2
+ . . . . (11)
Note that the third term in the above expression is of order (−ωr0)−3/2 and would not contribute
until we consider third order perturbation. To second order in perturbation theory, the wavefunction
can be expanded as
φ = φ(0) +
φ(1) +
φ(2) +O(ω−3/2). (12)
The zeroth, first and second order equations are given by
dφ(0)
1− j2
φ(0) = 0; (13)
dφ(1)
1− j2
φ(1) =
−ωr0 δV (z)φ(0); (14)
dφ(2)
1− j2
φ(2) =
−ωr0 δV (z)φ(1), (15)
respectively. Here,
δV (z) =
3l(l+ 1) + 1− j2
2(−ωr0)1/2z3/2
. (16)
Define φ
(z) to be the two linearly independent solutions to the zeroth order equation
(z) =
J±j/2(z). (17)
In the asymptotic region z ≫ 1
(z) ≈ cos[z − π(1 ± j)/4]. (18)
It has been shown by Musiri and Siopsis that φ
can be expressed in terms of φ
+ (z) = Cφ
+ (z)
dz1 δV (z1)φ
(z1)φ
+ (z1)− Cφ
dz1 δV (z1)φ
+ (z1)φ
+ (z1); (19)
(z) = Cφ
+ (z)
dz1 δV (z1)φ
(z1)φ
(z1)− Cφ(0)− (z)
dz1 δV (z1)φ
+ (z1)φ
(z1). (20)
where C =
−ωr0/ sin(πj/2) [13]. Similarly, φ(2)± can in turn be expressed in terms of φ
+ (z) = Cφ
+ (z)
dz2 δV (z2)φ
(z2)φ
+ (z2)− Cφ
dz2 δV (z2)φ
+ (z2)φ
+ (z2); (21)
(z) = Cφ
+ (z)
dz2 δV (z2)φ
(z2)φ
(z2)− Cφ(0)− (z)
dz2 δV (z2)φ
+ (z2)φ
(z2). (22)
In the limit, z → ∞,
(z) = c−± φ
+ (z)− c+± φ
(z); (23)
(z) = d−± φ
+ (z)− d+± φ
(z). (24)
Here,
c±± = C
dz1 δV (z1)φ
(z1)φ
(z1); (25)
d±± = C
dz1δV (z2) δV (z1)φ
+ (z2)φ
(z1)− φ(0)− (z2)φ
+ (z1)
(z1). (26)
Notice that φ
defined in eq (17) are in fact linearly dependent to each other when j is an even
integer. As a result, each of these coefficients is divergent by itself in these cases. It is reassuring to
see that all the divergent pieces cancel among themselves so that physically interested quantities do
have a smooth limit when j is an even integer. In zeroth order, the combination
φ(0)(z) = φ
+ (z)− e−iπ(j/2)φ
(z) ∼ e−iz (27)
in the asymptotic region z ≫ 1. This can be extended to second order
φ(z) =
+ (z) +
+ (z) +
+ (z)
− e−iπ(j/2)
1− ξ√
(z) +
(z) +
, (28)
by introducing two parameters ξ and ζ. Naturally, they are determined by the condition that the
coefficient of the eiz term is vanishing when z → ∞:
ξ = ξ+ + ξ−; (29)
ζ = −ξξ− + d++ eiπj/2 − d+− + d−− e−iπj/2 − d−+, (30)
where
ξ+ = c++ e
iπj/2 − c+−, ξ− = c−− e−iπj/2 − c−+. (31)
Substitute the above result back to eq (28), we have
φ(z) = i eiπ(1−j)/4 sin(πj/2) e−iz
1− ξ−√
ξ(ξm + c+−)− d−− e−iπj/2 + d−+
, (32)
where the identity c−+ = c+− has been used to simplify the expression.
When going around the black hole singularity by 3π, φ
and φ
both pick up an extra phase:
( e3iπz) = e3iπ(2±j)/2φ
(−z); (33)
( e3iπz) = e3iπ(3±j)/2φ
(−z). (34)
Consequently,
φ( e3iπz) = e3iπ(1+j)/2
+ (−z)− i
+ (−z)−
+ (−z)
− e−iπ(j/2)
1− ξ√
e3iπ(1−j)/2
(−z)− i 1√
(−z)− 1
. (35)
To second order,
φ( e3iπz) = −i eiπ(1−j)/4 sin(3πj/2) e−iz
(1 + ie3iπj)ξ+ + (1 + i)ξ−√
−ωr0(−1 + ei3πj)
−(1 + i)ξ ξ− + [(1 + ei3πj)(d++ eiπj/2 − d−+) + 2d−− e−iπj/2 − 2d+−]
−ωr0(−1 + ei3πj)
+ . . . , (36)
where the term eiz is not relevant for our calculation and has been neglected. Taking the ratio between
the coefficients of the term e−iz in eqs (36) and (32), we obtain the monodromy to second order:
M(r0) = −[1 + 2 cos(jπ)]
∆2c +∆2d
. (37)
Here,
(1 + ie3iπj)ξ+ + (i+ e
3iπj)ξ−
(−1 + ei3πj)
; (38)
∆2c = −(1− i)ξ+ ξ− − ξc+−; (39)
∆2d =
(1 + ei3πj)(d++ e
iπj/2 + d−− e
−iπj/2)− 2d+− − 2d−+ ei3πj
(−1 + ei3πj)
. (40)
The terms ∆2c and ∆2d depend on coefficients cµν and dµν , respectively. Although our expression for
∆1 here is different from that in Ref. [13] by a phase factor, our final result is identical to their.
Making use of the formula
I1(µ, ν) ≡
dz z−1/2Jµ(z)Jν(z) =
π/2Γ(1+2µ+2ν
Γ(3−2µ−2ν
)Γ(3+2µ−2ν
)Γ(3−2µ+2ν
, (41)
one can obtain explicitly
c++ =
3l2 + 3l+ 1− j2
) Γ(1−2j
) Γ(1+2j
) sin[
π(1−2j)
2 sin( j π
; (42)
c−− =
3l2 + 3l+ 1− j2
) Γ(1−2j
) Γ(1+2j
) sin[
π(1+2j)
2 sin( j π
; (43)
c+− =
3l2 + 3l+ 1− j2
) Γ(1−2j
) Γ(1+2j
) sin[
π(1−2j)
] sin[
π(1+2j)
2 sin( j π
. (44)
Note that
c−− = −c++(j → −j); c+− = −c−+(j → −j). (45)
These relation are also obeyed by dµν ’s, which can be used to reduce our work. With the above results,
we are ready to find ∆1 and ∆2c in eq (40):
∆1 = −
i(3l2 + 3l + 1− j2) Γ2(1
) cos(
) cos(jπ)
2π3/2[1 + 2 cos(jπ)]
; (46)
∆2c = −
(3l2 + 3l + 1− j2)2 Γ4(1
) Γ2(1−2j
) Γ2(1+2j
) cos(jπ)
1152π3
. (47)
The double integral
I2(µ2, ν2;µ1, ν1) ≡
dz1 z
1 Jµ2(z2)Jν2(z2)Jµ1(z1)Jν1 (z1) (48)
can be expressed in terms of the generalized hypergeometric functions, but the general formula is quite
complicated and not particularly illuminating. Therefore, we will just give the final result explicitly
for the coefficients d++ and d+−:
d++ = −
3l2 + 3l+ 1− j2
cot( jπ
) 5G4(
, 1+j
, 1−j
, 1, 2+j
, 2−j
576 sin2( j π
3l2 + 3l+ 1− j2
cot( j π
) cot(j π) Γ(1+2j
) Γ(1+2j
) Γ2(1+j
288 sin( jπ
1 + 2j
1 + j
1 + j
1 + 2j
2 + j
2 + j
5 + 2j
, 1 + j; 1)
3l2 + 3l+ 1− j2
) Γ2(1−2j
) Γ2(1+2j
) sin2[
π(1−2j)
] sin[
π(1+2j)
2π3 sin2( j π
; (49)
d+− = −
3l2 + 3l+ 1− j2
) 5G4(
, 1+j
, 1−j
, 1, 2+j
, 2−j
1152 sin3( j π
3l2 + 3l+ 1− j2
cot2( j π
) cot(j π) Γ(1−2j
) Γ(1−2j
) Γ2(1−j
1− 2j
1− 2j
5− 2j
, 1− j; 1)
3l2 + 3l+ 1− j2
) Γ2(1−2j
) Γ2(1+2j
) sin2[
π(1−2j)
] sin2[
π(1+2j)
2304 π3 sin2( j π
. (50)
Here, we have used the regularized generalized hypergeometric function 5G4(a1, a2, a3, a4, a5; b1, b2, b3, b4; z)
so that the pole structure of each term in these expressions are more explicit. It is related to the usual
generalized hypergeometric function by
5G4(a1, a2, a3, a4, a5; b1, b2, b3, b4; z) =
5F4(a1, a2, a3, a4, a5; b1, b2, b3, b4; z)
Γ(b1)Γ(b2)Γ(b3)Γ(b4)
. (51)
The other two coefficients can be obtained by relations analogous to those in eq (45)
d−− = −d++(j → −j); d−+ = −d+−(j → −j). (52)
On the face of it, each of the dµν ’s has a third order pole coming from terms involving the generalized
hypergeometric function when j is an even integer. On closer look, we see there are some cancelation
among the divergences and in the end all they have are just simple poles in such limit similar to the
cµν ’s. Another possible divergence arises in d−− when j = 1, which will again be canceled when we
calculate the monodromy.
It is now straightforward to obtain ∆2d by making use of the following two identities
−4π7/2 cos2(
) cot(jπ)Γ(
1 + 2j
1 + 2j
) Γ2(
1 + j
1 + 2j
1 + j
1 + j
1 + 2j
2 + j
2 + j
5 + 2j
, 1 + j; 1)
+4π7/2 cos2(
) cot(jπ)Γ(
1 − 2j
1− 2j
) Γ2(
1− 2j
1− 2j
5− 2j
, 1− j; 1)
−Γ4(1
) Γ2(
1 − 2j
) Γ2(
1 + 2j
) sin[
π(1− 2j)
] sin[
π(1 + 2j)
] = 0; (53)
−4π7/2 cos(jπ
1 + 2j
1 + 2j
) Γ2(
1 + j
1 + 2j
1 + j
1 + j
1 + 2j
2 + j
2 + j
5 + 2j
, 1 + j; 1)
−4π7/2 cos(
1− 2j
1 − 2j
) Γ2(
1 − j
1− 2j
1− 2j
5− 2j
, 1− j; 1);
+8π5 Γ(
) 5G4(
1 + j
2 + j
−Γ4(1
) Γ2(
1 − 2j
) Γ2(
1 + 2j
) cos(
)[1− cos(jπ)] = 0. (54)
Eventually, we achieve the following nice result
∆2d =
(3l2 + 3l+ 1− j2)2 Γ4(1
) Γ2(1−2j
) Γ2(1+2j
) cos(jπ)
1152π3[1 + 2 cos(jπ)]
, (55)
where all divergences have been canceled out.
Together with the result from eq (47), the asymptotic form of quasinormal frequencies of a four
dimensional Schwarzschild black hole is found to be
4πωr0 = (2n+ 1)πi+ ln[1 + 2 cos(jπ)]
i(3l2 + 3l + 1− j2) Γ2(1
) Γ(1−2j
) Γ(1+2j
) cos( jπ
) cos(jπ)
2π3/2[1 + 2 cos(jπ)]
(3l2 + 3l+ 1− j2)2 Γ4(1
) Γ2(1−2j
) Γ2(1+2j
) cos2(jπ)
576π3[1 + 2 cos(jπ)]2(−ωr0)
+O[(−ωr0)−3/2]. (56)
The physically interested cases are
≈ (2n+ 1)πi+ ln 3 + 1− i√
(l2 + l − 1)Γ4(1/4)
2π3/2
(l2 + l − 1)2Γ8(1/4)
2592π3
, for j = 2; (57)
≈ (2n+ 1)πi+ ln 3 + 1− i√
(l2 + l + 1/3)Γ4(1/4)
2π3/2
(l2 + l+ 1/3)2Γ8(1/4)
288π3
, for j = 0; (58)
≈ 2nπi+ i2π(l
2 + l)2
, for j = 1. (59)
A few comments are in order. First, all the second order corrections are purely imaginary. In particular,
when j = 2 (gravitational perturbation) the numerical coefficients of the i/n term (after divided by
4π) are 0.739, 3.58, 49.7 for l = 2, 3, 6, respectively. They are in good agreement with the known
numerical studies [4]. As for the real part, our result predicts vanishing correction. For j = 2, this is
again consistent with the numerical results in Ref. [4] for l = 2, 3. For l = 6 the numerical result is
0.263, which seems to be contradictory to ours. However, the numerical value for l = 6 has opposite
sign relative to those of l = 2, 3. This is peculiar, since in all other cases a given type of corrections
are always of the same sign irrespective of the specific value of angular momentum. Therefore, we
believe more study is needed to clarify whether there is really a discrepancy. As for the j = 1 case,
the numerical study in Ref. [17] suggests the leading correction is of the form b
. However, this
does not necessarily mean the two results are inconsistent. In fact, one can only extract the behavior
of the leading correction to the real part from their Fig. 2 and further numerical study is needed to
confirm or refute our prediction.
3 Conclusion
In sum, we have calculated to second order the correction to the asymptotic form of quasinormal
frequencies for Schwarzschild black holes in four dimensions. Most of our results are consistent with
the numerical ones when available. In cases where there seem to be contradiction, we think further
numerical studies are needed to clarify the situation. It would also be helpful if more detailed numerical
studies can be carried out for the j = 0 case so that more thorough comparisons are possible. It would
be interesting to generalize the method to other spacetime backgrounds [18]. Extension to higher order
is also desirable. It might enable us to find a quantitative prediction for the ”algebraically special”
frequencies in Schwarzschild black holes, where the quasinormal frequency is purely imaginary and it
increases with the fourth power of l [19, 4].
Acknowledgment
The author thanks Chong-Sun Chu for helpful discussions. The work is supported in part by the
National Science Council and the National Center for Theoretical Sciences, Taiwan.
References
[1] C. V. Vishveshwara, ”Scattering of gravitation radiation by a Schwarzschild black-hole,” Nature
227, 936 (1970); W. H. Press, ”Long Wave Trains of Gravitational Waves from a Vibrating
black hole,” Astrophys. J. Lett. 170, L105 (1971).
[2] H. P. Nollert, ”Quasinormal modes: the characteristic ’sound’ of black holes and neutron stars,”
Class Quantum Grav. 16, R159 (1999); K. D. Kokkotas and B. G. Schmidt, ”Quasi-normal
modes of stars and black holes,” Living Rev. Relativ. 2, 2 (1999) [gr-qc/9909058].
http://arxiv.org/abs/gr-qc/9909058
[3] J. Noatario and R. Schiappa, ”On the classification of asymptotic quasinormal frequencies
for d–dimensional black hole and quantum gravity,” Adv. Theor. Math. Phys. 8, 1001 (2004)
[hep-th/0411267].
[4] H. P. Nollert, ”Quasinormal Modes of Schwarzschild Black Holes: the Determination of Quasi-
normal Frequencies of Very Large Imaginary Parts,” Phys. Rev. D 47, 5253 (1993); N. An-
dersson, ”On the Asymptotic Distribution of Quasinormal-mode Frequencies of Schwarzschild
Black Holes, ” Class Quantum Grav. 10, L61 (1993).
[5] S. Hod, ”Bohr’s Correspondence Principle and the Area Spectrum of Quantum Black Holes,”
Phys. Rev. Lett. 81, 4293 (1998) [gr-qc/9812002].
[6] J. D. Bekenstein and V. F. Mukhanov, ”Spectroscopy of the Quantum Black Hole,”
Phys. Lett. B360, 7 (1995) [gr-qc/9505012].
[7] L. Motl, ”An Analytic Computation of Asymptotic Schwarzschild Quasinormal Frequencies,
” Adv. Theor. Math. Phys. 6, 1135 (2003) [gr-qc/0212096]; L. Motl, and A. Neitzke,
”Asymptotic Black Hole Quasinormal Frequencies,” Adv. Theor. Math. Phys. 7, 307 (2003)
[hep-th/0301173].
[8] O. Dreyer, ”Quasinormal Modes, the Area Spectrum, and Black Hole Entropy,” Phys. Rev.
Lett. 90, 081301 (2003) [gr-qc/0211076].
[9] G. T. Horowitz and V. E. Hubeny, ”Quasinormal Modes of AdS Black Holes and the Approach
to Thermal Equilibrium,” Phys. Rev. D62, 024027 (2000) [hep-th/9909056].
[10] J. Maldacena, ”The Large N Limit Of Superconformal Field Theories And Supergravity,”
Adv. Theor. Math. Phys. 2, 231 (1998) [hep-th/9711200]; E. Witten, ”Anti-De Sitter Space
And Holography,” Adv. Theor. Math. Phys. 2, 253 (1998) [hep-th/9802150]; S. Gubser,
I. Klebanov, and A. Polyakov”Gauge Theory Correlators From Noncritical String Theory,”
Phys. Lett. B 428, 105 (1998) [hep-th/9802109].
[11] D. Birmingham, I. Sachs, and S. N. Solodukhin, ”Conformal Field Theory Interpretation of
Black Hole Quasinormal Modes,” Phys. Rev. Lett. 88, 151301 (2002) [hep-th/0112055].
http://arxiv.org/abs/hep-th/0411267
http://arxiv.org/abs/gr-qc/9812002
http://arxiv.org/abs/gr-qc/9505012
http://arxiv.org/abs/gr-qc/0212096
http://arxiv.org/abs/hep-th/0301173
http://arxiv.org/abs/gr-qc/0211076
http://arxiv.org/abs/hep-th/9909056
http://arxiv.org/abs/hep-th/9711200
http://arxiv.org/abs/hep-th/9802150
http://arxiv.org/abs/hep-th/9802109
http://arxiv.org/abs/hep-th/0112055
[12] A. Maassen van den Brink, ”WKB analysis of the Regge- Wheeler Equation Down in the
Frequency Plane,” J. Math. Phys. 45, 327 (2004) [gr-qc/0303095].
[13] S. Musiri and G. Siopsis, Perturbative calculation of quasi-normal modes of Schwarzschild black
holes, Class. Quant. Grav. 20 L285, (2003) [hep-th/0308168].
[14] A. Ishibashi and H. Kodama, ”A Master Equation for Gravitational Perturbations of
Maximally Symmetric Black Holes in Higher Dimensions,” Prog. Theor. Phys. 110, 701
(2003) [hep-th/0305147]; A. Ishibashi and H. Kodama, ”Stability of Higher Dimen-
sional Schwarzschild Black Holes,” Prog. Theor. Phys. 110, 901 (2003) [hep-th/0305185];
A. Ishibashi and H. Kodama, ”Master Equations for Perturbations of Generalized Static
Black Holes with Charge in Higher Dimensions,” Prog. Theor. Phys. 111, 29 (2004)
[hep-th/0308128].
[15] E. Witten, ”Quantum Gravity In De Sitter Space,” [hep-th/0106109]; A. Strominger,
”The dS/CFT Correspondence,” JHEP 0110, 034 (2001) [hep-th/0106113]; D. Klemm,
”Some Aspects Of The De Sitter/CFT Correspondence,” Nucl. Phys. B 625, 295 (2002)
[hep-th/0106247].
[16] E. Abdalla, B. Wang, A. Lima-Santos, and W. G. Qiu, ”Support Of dS/CFT Correspon-
dence From Perturbations Of Three-Dimensional Space-Time,” Phys. Lett. B 538, 435 (2002)
[hep-th/0204030]; E. Abdalla, K. H. C. Castello-Branco, and A. Lima-Santos, ”Sup-
port of dS/CFT Correspondence from Space-time Perturbations,” Phys. Rev. D 66, 104018
(2002) [hep-th/0208065]; Y. S. Myung and N. J. Kim, ”Difference between AdS and dS
Spaces: Wave Equation Approach,” Class. Quant. Grav. 21, 63 (2004) [hep-th/0304231];
T. R. Choudhury and T. Padmanabhan, ”Quasinormal Modes in Schwarzschild-de Sitter
Space-time: A Simple Derivation of the Level Spacing of the Frequencies,” Phys. Rev. D 69,
064033 (2004) [gr-qc/0311064].
[17] V. Cardoso, J. P. S. Lemos and S. Yoshida, Quasinormal Modes of Schwarzschild Black Holes
in Four and Higher Dimensions, Phys. Rev. D69, 044004 (2004) [gr-qc/0309112].
[18] S. Musiri, S. Ness and G. Siopsis, ”Perturbative Calculation of Quasi-normal Modes of AdS
Schwarzschild Black Holes, Phys. Rev. D 73, 064001 (2006), [hep-th/0511113]; F.-W. Shu
http://arxiv.org/abs/gr-qc/0303095
http://arxiv.org/abs/hep-th/0308168
http://arxiv.org/abs/hep-th/0305147
http://arxiv.org/abs/hep-th/0305185
http://arxiv.org/abs/hep-th/0308128
http://arxiv.org/abs/hep-th/0106109
http://arxiv.org/abs/hep-th/0106113
http://arxiv.org/abs/hep-th/0106247
http://arxiv.org/abs/hep-th/0204030
http://arxiv.org/abs/hep-th/0208065
http://arxiv.org/abs/hep-th/0304231
http://arxiv.org/abs/gr-qc/0311064
http://arxiv.org/abs/gr-qc/0309112
http://arxiv.org/abs/hep-th/0511113
and Y.-G. Shen, ” it Perturbative Calculation of Quasinormal Modes of d-Dimensional Black
Holes,” JHEP 0608, 087 (2006) [hep-th/0605128].
[19] S. Chandrasekhar, ”On Algebraically Special Perturbations of Black Holes,” Proc. R. Soc.
London A392, 1 (1984).
http://arxiv.org/abs/hep-th/0605128
	Introduction
	Perturbative calculation of the asymptotic form of quasinormal frequencies
	Conclusion
ABSTRACT
  We analytically calculate to second order the correction to the asymptotic
form of quasinormal frequencies of four dimensional Schwarzschild black holes
based on the monodromy analysis proposed by Motl and Neitzke. Our results are
in good agreement with those obtained from numerical calculation.

<|endoftext|><|startoftext|>
Introduction
Carbon nanotubes are prototypical of quasi-one dimensional graphene nanos-
tructures. The approximate electronic structure of a carbon nanotube with
diameter D is understood starting from the graphene dispersion relation, i.e.
the Dirac cone E = ~v0|k|, and quantizing the angular momentum about the
axis so that En = ~v0
k2z + k
n, where kn = (n + a)π/R, where a is 0 or 1/2
depending on whether the nanotube is metallic or semiconducting. A metal-
lic nanotube has two dispersionless bands that cross the Fermi level while a
semiconducting nanotube has a band-gap Eg = γ0a0/R ∼ 0.4eV·nm/R. This
property, that graphene nanostructures can be metallic or semiconducting de-
pending on their shape carries over to nanopatterned graphene ribbons as
shown below.
High purity multiwalled carbon nanotubes (as well as single walled nanotubes)
were found to be room-temperature ballistic conductors [1]. This property re-
quires (at least) that electrons traverse the length of the nanotube without
scattering. This discovery coincided with predictions of the effect by Ando
[2,3], and by Todorov and White [4] who demonstrated that the chiral nature
of the charge carriers in nanotubes inhibits backscattering [in all graphene
structures (including graphene), chirality results from the equivalence of the
A and B sub-lattices]. Ando first recognized the formal analogy between neu-
trino wave functions and those that describe electrons near the Fermi level
in nanotubes (and in graphene). Neutrinos are massless fermions that are de-
scribed by the Weyl’s equation (or massless Dirac equation) [3]. The quantum
number associated with chirality is the pseudospin which, like spin, can have
two values. Unlike spin, the pseudospin is coupled to the momentum. In order
to backscatter an electron, the scattering potential must reverse both the mo-
mentum and the pseudospin. Interactions that act equivalently on A and B
atoms (like long-range potentials) conserve pseudospin and cannot backscatter
charge carriers.
Ballistic conduction is only one of the favorable electronic properties of carbon
nanotubes. Others are the extremely weak electron-phonon coupling [1,5], the
excellent FET characteristics [6], and the robustness of the material itself. All
of these properties indicate that nanotubes could be used for nanoelectronics.
Unfortunately, incorporation of nanotubes in large-scale integrated electronic
architectures proves to be so daunting that it may never be realized. Harness-
ing these properties requires graphitic materials that are related to carbon
nanotubes, but which are more manageable.
Precisely these theoretical considerations led us in 2001 to speculate that 2D
graphene could serve these purposes. We initiated experiments on epitaxially
grown graphene on single crystal silicon carbide. Much of the earlier efforts fo-
cused on producing and characterizing the epitaxial graphene material. While
we have achieved some success, much work remains. To fully exploit the prop-
erties of nanopatterned epitaxial graphene, one must control the graphene
material, its structure, and the chemistry and morphology of defined edges.
These are the challenges for graphene-based nanoelectronics. The most im-
portant feature of 2D epitaxial graphene is that interconnected structures
can in principle be patterned on the scale of an entire wafer. If, like carbon
nanotubes, the carriers remain ballistic, it will lead to a fascinating world of
coherent carbon-based electronics.
The discovery of the intriguing properties of deposited exfoliated graphene has
recently caused overwhelming excitement in the 2D electron gas community
[7,8,9,10]. This very fascinating material clearly demonstrates the chiral na-
ture of the charge carriers, as it manifests in several properties, of which the
anomalous phase in the quantum Hall effect is the most striking. The spon-
taneous rippling caused by the Mermin-Wagner transition [11,12,13] and the
absence of the weak anti-localization, possibly due to the gauge field at the
ripples [14], as well as the recently discovered high-field splitting of the Landau
levels [15] are all very important effects that still require full explanation.
The possibility that epitaxial graphene may serve as a platform for carbon-
based nanoelectronics has further greatly amplified the interest in this field,
especially in the electronics community. However, epitaxial graphene and de-
posited exfoliated graphene are very different materials. Epitaxial graphene is
generally multi-layered whereas exfoliated graphene has only one layer. There-
fore, epitaxial graphene, is a much more complex material; in fact it represents
a class of materials. It may seem that epitaxial graphene is simply ultrathin
graphite, but this is emphatically not so. Experimentally, the charge carri-
ers in epitaxial graphene are found to be chiral and the band structure is
clearly related to the Dirac cone [16,17,18,19,20,21]. To lowest order, epitaxial
graphene appears to consist of stacked, non-interacting graphene sheets, the
first of which is highly charged and the others carry much lower charge. In con-
trast to deposited exfoliated graphene, anomalous phase-transition-like state-
changes are often observed in transport measurements of epitaxial graphene,
that are probably related to weak interlayer interactions.
These first measurements suggest that, like most layered quasi-2D conducting
materials, epitaxial graphene is poised to present a host of interesting new
phenomena. A snapshot of the emerging science and technology of epitaxial
graphene is given here.
0 1 2 3 4 5 6
Number of graphene layers
100 200 300 400 500
Energy(eV)
(a) Si:C=0.75
(b) Si:C=1.59
(c) Si:C=0.14
Fig. 1. Model of Si:C Auger peak intensity ratio versus number of graphene layers
for SiC(0001) substrates. Solid line: Model with interface layer of C adatoms at
1/3 their bilayer density. Dotted line: Model with interface layer of Si adatoms at
1/3 their bilayer density. Dashed line: Model with bulk-terminated SiC(0001). Inset
shows Auger spectra obtained after (a) ex-situ H2 etching (no UHV preparation),
(b) UHV anneal at 1150◦C (LEED
3 pattern), (c) UHV anneal at 1350◦C
(LEED 6
3 × 6
3 pattern).
2 Epitaxial graphene formation and characterization
It is well known that ultrathin graphitic films grow on hexagonal silicon
carbide crystals [22,23,24,25,26]. Specifically they grow on the 0001 (silicon-
terminated) and 0001 (carbon-terminated) faces of 4H- and 6H-SiC when crys-
tals are heated to about 1300◦C in ultra-high vacuum (UHV). It is also possi-
ble to grow these films at more moderate vacuum conditions using ovens with
controlled background gas. The epitaxial growth is established by examining,
for example, the LEED patterns after various growth times (see e.g., Fig. 3).
Growth on the Si face is slow and terminates after relatively short times at high
temperatures. The growth on the carbon face apparently does not self-limit
so that relatively thick layers (∼ 4 up to 100 layers) can be achieved.
For thin layers, we can estimate the graphene thickness by modeling mea-
sured Auger-electron intensities [16] or photoelectron intensities [17]. Fig. 1
shows model results for the Si:C Auger intensity ratio for graphene grown on
SiC(0001) substrates, with three different assumptions for the interface layer
between bulk SiC and the graphene layers (see caption). The Auger model,
valid for both 4H and 6H polytypes, includes the relative sensitivity factors
for Si and C [27], attenuation of the 3keV incident electrons and of the Auger
electrons exiting from successively deeper layers [28,29,30], and the electron
collection angle (42◦). Thicker multilayer graphene can be measured via con-
ventional ellipsometry.
Scanning tunneling microscopy images of monolayer graphene on the surfaces
of 4H- and 6H-SiC(0001) (Si-face) show large flat regions with a characteristic
Fig. 2. STM topographs (0.8 V sample bias, 100 pA) of nominally 1 ML epitaxial
graphene on SiC(0001). Top: Image showing large flat regions of 6
3 × 6
3 re-
construction and regions where the reconstruction has not fully formed. Next-layer
islands are also seen. Bottom: A region of 6
3 reconstruction, imaged through
the overlying graphene layer.
hexagonal corrugation of ∼ 0.3 Å on a 1.9-nm period (Fig. 2). Small-scale
images resolve the graphene atomic lattice throughout [16,23], but with a
factor 10×-20× smaller amplitude. Imaging for the monolayer is apparently
dominated by interface states of an underlying reconstruction of the SiC. In
conjunction with the graphene overlayer is a 6
3 × 6
3R30◦ reconstruction
with respect to the bulk-terminated SiC surface. The detailed reconstruction
of this surface is still a matter of debate [31]. Successive graphene layers show
much less influence of the interface states [16], but the 1.9-nm corrugation
period (6 × 6 with respect to the SiC bulk-terminated surface) is still visible
in both STM and LEED for the thickest Si-face films we have prepared [5-6
monolayers (ML)].
To date, most transport measurements have been done on multilayer graphene
grown on the carbon face [SiC(0001) substrates]. This material is grown in an
RF-induction furnace at pressures of ∼ 10−5 Torr. Because the initial film-
growth is very rapid, it is rare to obtain films thin enough for direct STM
and LEED studies of those layers near the SiC interface. As a consequence of
charge transfer from the SiC, these layers are the most important for electrical
transport. Surface x-ray scattering has proved to be a useful tool for extracting
quantitative information about the C-face-grown material.
Figure 3 shows LEED patterns from two graphene films grown on 4H-SiC(0001)
substrates. According to the Auger ratios, these were nominally (a) 3 ML
graphene, and (b) 4 ML graphene. The LEED pattern in Fig. 3(a) shows rel-
 = 0.022Å-1
 = 0.003Å-1
 (Å-1)
2.90 2.92 2.94 2.96 2.98 3.00
Si-Face
C-Face
Graphite (011) 
rotated 32.2o
 = 0.005Å-1
φφφφ (degrees)
-3 -2 -1 0 1 2 3
-2.2o 2.2o
φφφφ (degrees)
-33 -32 -31 -30 -29 -28 -27
(011) Graphite
Graphite [01 ]
(c) (d)
(a) (b)Graphite
[1100]
[1000]
Fig. 3. LEED and x-ray diffraction from multilayer graphene grown on 4H-SiC(0001)
substrates. (a) LEED pattern (71 eV) for ∼ 3 ML graphene, (b) LEED pattern (103
eV) for ∼ 4 ML graphene (unlabeled sets of 6-fold spots in (a) and (b) are from a√
3R30◦ SiC interface reconstruction). (c) Radial x-ray scans through (top) the
(10ℓ) graphite rod, and (bottom) across the diffuse arcs seen in (b). (d) Azimuthal
x-ray scans across (top) the graphite (10ℓ) rod and (bottom) the diffuse rods seen
in (b).
atively good registry to the SiC substrate (with the unit cell rotated by 30◦,
as for Si-face material), whereas the film in Fig. 3(b) shows some rotational
disorder. The evidence suggests that epitaxial growth does occur at the inter-
face, but that succeeding graphene sheets do not have strong rotational order.
Interestingly, the diffuse rings in Fig. 3(b) are clearly centered around a mini-
mum in intensity on the SiC azimuth, indicating some preferential alignment,
as discussed below.
While there is azimuthal disorder in the film, the long range vertical order of
the film is much larger than is observed for Si-face grown films [32]. This is
demonstrated in Fig. 3(c) that shows radial x-ray diffraction scans through
both the graphite (10ℓ) graphite rod (φ = −30.0◦ in the [1100] SiC direction)
and through the diffuse rings (φ = 2.2◦ in the SiC [1000] direction). The x-
ray profiles for both the (10ℓ = 1.5) and diffuse rods on the C-face graphene
are nearly 10 times narrower than those for Si-face films. The profile widths
are inversely related to the size of order graphene domains; L = 2π/∆qr. For
Si-face films the order graphene regions are ∼ 290Å while for the C-face films
the domains are ∼ 2100Å. The domain size estimated this way is most likely
a lower limit on the actual size of a graphene sheet. A continuous graphene
sheet (typically 3000Å terrace width) folded over a SiC step would break the
scattered x-ray coherence from the two regions, but may have a much smaller
influence on the electronic structure. Note that even the diffuse rings have
domain sizes of ∼ 1200Å.
In fact, the rotationally disordered graphene has a structure. Fig. 3(d) shows x-
ray azimuthal scans through both the graphite (10ℓ) and diffuse graphite rods.
The diffuse rings are in fact peaked at ±2.2◦ relative to the SiC azimuth. This
angle is not arbitrary. It corresponds to a structure were two vertically stacked
graphene sheets are commensurate if rotated with respect to one another by
cos 11/13 = 32.204◦ [33]. Both 30◦ and ±2.204◦ rotated graphene are also
nearly commensurate with the SiC 6
3 × 6
3 R30 seen in Si-face grown
graphene [see Fig. 3(a)] It therefore seems that during graphitization large
graphene sheets are free to rotate with respect to each other and lock in, on
average, to these preferred orientations on the SiC C-face.
In addition to the difference in long range and orientational order of films
grown on the two polar faces of SiC, the vertical roughness of the multilayer
graphene is very different. X-ray diffraction reveals that the rms roughness of
the C-face multilayer films is less than 0.05 Å over the 2 µm coherence length
of the beam [34]. On the Si-face the roughness is much larger (∼ 0.2Å [35]),
presumably as a consequence of the 6 × 6 corrugation (see Fig. 2).
Finally, x-ray reflectivity experiments show two other important features of
multilayer graphene grown on the C-face of SiC. First, the first layer of
graphene sits 1.62Å above the last SiC layer [34,36]. This bond length is nearly
equal to the bond length of diamond (1.54Å) and suggest that the substrate
bond to the first graphene layer is much stronger than a van der Waals in-
teraction. In fact ab intio calculations find and very similar bond distance
[36]. These calculations show that the first graphene layer is in fact insulat-
ing. Only the formation of the second graphene layer gives rise to an electron
dispersion curve showing a Dirac cone. Thus the first graphene layer can be
interpreted as a “buffer” layer between the substrate and an isolated layer
with the electronic properties of an isolated graphene sheet.
The second important result from the x-ray reflectivity is that the graphene
interlayer spacing is significantly larger than bulk graphite [34]. The measured
value is 3.368Å which is between the value of bulk graphite and turbostratic
graphite. This larger spacing suggest a significant density of stacking faults.
This is not too surprising given the rotational disorder in the C-face films.
For a random stacking fault model the layer spacing can be used to estimate
the stacking fault density to be one every other layer [34,37]. This type of
density suggest that the AB stacking order, that would destroy the graphene
electronic character, is nearly lost in these films and may significantly impact
the transport properties of these films.
Fig. 4. Infrared transmission spectroscopy of epitaxial graphene with about 10 layers
revealing Landau level structure. (a) Infrared transmission spectrum at B = 0.4 T
and T = 1.9 K, showing a series of absorption peaks. (Inset) The absorption maxima
positions as a function of field showing the
B dependence that is characteristic
for a chiral “massless” Dirac particle. (b) Schematic diagram of the Landau levels
En(B) in which the only parameter is v0 that is found to be 10
8 cm/s. The arrows
indicate the observed transitions. EF is determined from the lowest field for which
the n = 0 to n = 1 transition is observed.
3 Landau level spectroscopy of epitaxial graphene
Dirac particle properties of the charge carries in epitaxial graphene multilayers
have been beautifully demonstrated in Landau Level spectroscopy by Sadowski
et al. [20]. (See Sadowski et. al. in this issue for a summary and update).
We summarize some of the results here. In these measurements, an epitaxial
graphene sample is illuminated by infrared light in a magnetic field at low
temperatures. The absorption is measured as a function of photon energy at
various magnetic field strengths. An example of such a spectrum is shown
in Fig. 4. The various absorption lines are identified as transitions between
various Landau levels. The transitions energy are found to accurately follow
En = v0
2ne~B. The exact
B dependence is the hallmark of a ”massless”
Dirac particle (more precisely, of a linear density of states); massive particles
have a linear B dependence. Moreover, a gap at the tip of the Dirac cone
also distorts the
B behavior. The Fermi velocity is determined from the
dispersion of the transitions with magnetic field to be v0 = 1.03 × 108 cm/s,
which is close to its value for exfoliated graphene. The n = 0 to n = 1
transition is observed only for B ≥ 0.16 T, which indicates that the n = 1
level is just depopulated at that field. Hence, -15 meV < EF < 15 meV and
n ≈ 1.5×1010 /cm2 and the Fermi wavelength is ≈ 300 nm. It is further found
that the intensity of the signal scales with the thickness of the film. These
experiments demonstrate that epitaxial graphene consists of stacked graphene
layers, whose electronic band structure is characterized by a Dirac cone with
chiral charge carriers. Remarkably, there is no evidence for a gap nor for a
SiC with 
scratches
Flatten SiC by H2
etching Graphitization
Deposit
contacts
E-beam
lithography
DevelopO2 plasma 
Lift off HSQ 
Bare SiC HSQ Exposed 
Graphene Metal
contact
Resist spin-
coating
bonding
Fig. 5. Patterning epitaxial graphene
deviation of the linear density of states: undistorted Dirac cone properties are
directly observed as close as 20 meV to the Dirac point in the n = 0 − n = 1
transitions.
Epitaxial graphene is clearly not graphite, which has a different spectrum and
an entirely different electronic structure (see Sadowski et al. in this issue). This
difference reflects that epitaxial graphene does not have the Bernal stacking
that would lift the pseudospin degeneracy [34]. Hence epitaxial graphene is a
form of multilayered graphene that is structurally and electronically distinct
from graphite. These experiments probe the low charge density bulk of the
epitaxial graphene layer. Below we discuss the highly charged interface layer.
4 Patterning epitaxial graphene
Epitaxial graphene samples are patterned using a variety of microelectronics
patterning methods. Features down to several tens of nanometers are produced
by standard e-beam lithography methods. The method is outlined in Fig. 5.
5 Transport in 2D epitaxial graphene
The first published transport measurements on epitaxial graphene were made
on a Hall bar patterned on a graphene film with about 3 layers on the sili-
con face of 4H-SiC [16]. The mobility of the sample was relatively low (1100
cm2/V·s) nevertheless the Shubnikov-de Haas oscillations are clearly distin-
guished (see Fig. 6) [38]. Resistance maxima in graphene are expected at
fields Bn when the Fermi energy intercept the Landau levels, i.e. for EF =
2ne~Bn, where v0 ≈ 108 cm/s is the Fermi velocity, hence Bn = (EF /v0)2/2ne~ =
B1/n. For normal electrons maxima are found when EF = (n + 1/2)eBn~/m,
Fig. 6. 2D transport measured in a 400 µm by 600 µm Hall bar on 3 layer epitaxial
graphene on the Si face. Mobility µ = 1200 cm2/V·s, coherence length lφ = 300 nm.
(a) Magnetoresistance at T =0.3, 2 and 4 K showing well developed SdH peaks,
indicated with their Landau indices n; the Hall resistance at 0.3 K (dashed line),
shows a weak feature at the expected Hall plateau position. The amplitude of the
weak localization peak at B = 0 corresponds to 1G0. (b) Landau plot; the linear
extrapolation passes through the origin demonstrating the anomalous Berry’s phase
characteristic of graphene. (c) The Lifshitz-Kosevich analysis of the n = 2 and n = 3
peaks which correspond to graphene with a Fermi velocity vF = 7.2 × 105 cm/s.
hence Bn = EF m/(n + 1/2)~e. Therefore the Landau plot (a plot of n ver-
sus 1/Bn) of a Dirac particle intercepts the origin whereas the Landau plot
of a normal electron intercepts the y axis at n = 1/2. The intercept should
occur at 0 when the Berry’s phase is anomalous. This shows that the Landau
plot provides a ready method to identify a Dirac particle when the quantum
Hall measurements are not feasible. The Landau plot (Fig. 7) for data on a
sample similar to the one of [16] passes through the origin indicating that the
Berry’s phase is anomalous. The Hall coefficient at 0.3 K is found to be 330
Ω/T corresponding to a charge density of 2 × 1012 electrons/cm2. (Note that
for a Dirac particle it should be 6500/B1 = 450 Ω/T.) From v0 = 10
8 cm/s
we further find that EF ≈ 1680 K. The large charge density is caused by the
built-in electric field at the SiC-graphene interface, which dopes the interfa-
cial graphene layer. This layer carries most of the current (and causes the SdH
oscillations). The charge density of the top layers is more than 2 order of mag-
nitude smaller (see above) and they are expected to be much more resistive.
The temperature dependence of the SdH peak amplitudes is determined by
the Landau level spacing En+1(B)−En(B) and given by the Lifshitz-Kosevich
equation: An(T ) ∼ u/ sinh(u) where u = 2πk2BT/∆E(B) [39]. From this fit
we find that at B = 7 T, (E3(B) − E2(B))/kB = 250 K (compared with 340
K predicted for graphene at this carrier density) and that the Dirac point is
about 1290 K below EF .
This sample shows ample evidences that the carriers in the high-charge-density
layer, like those in the low-density layers, are Dirac electrons. However the
quantum Hall effect is not observed. Instead, only weak undulations are seen
in the Hall resistance. It was assumed that higher mobility samples would
enhance the QHE and subsequent work progressed in that direction. Note also
the intense weak localization peak near B = 0 indicative of significant point-
defect scattering. Due to the high current density, the interface graphene layer
dominates the transport, although the other layers are expected to contribute,
and more so in 2D structures than in quasi 1D structures (see below).
Graphene grown on the Si face typically has low electron mobilities. The very
thin films are relatively unprotected from even slight residual oxidizing gases
that damage the graphene [32]. Work is still progressing to improve Si face
graphene films.
On the other hand, graphene grown on the C face has much higher mobili-
ties [18]. The films are also considerably thicker so that the high-density layer
at the interface is more protected [34]. Fig. 7 shows the MR measurements
of a Hall bar (100 µm ×1000 µm) at several temperatures [21]. The SdH
oscillations are barely discernable, which is generally the case for our high
mobility 2D samples. The reason for this is not likely due to sample inho-
mogeneity. The Landau plot of the oscillations reveals the anomalous Berry’s
phase, characteristic of Dirac electrons. Furthermore the charge density is
3.8× 1012 electrons/cm2. The charge density from the Hall effect is 4.6× 1012
electrons/cm2. The Lishitz-Kosevich analysis of the peak heights agrees with
the expected Landau level spacing for a Dirac particle.
A striking feature of this sample is that the weak localization peak is very
weak, ∼ 0.07G0 (compared with the sample in Fig. 6) which indicates that
point defect density in this sample is low and these defects are possibly lo-
calized entirely at the patterned edges of the Hall bar. On the other hand, a
marked temperature dependent depression of the conductance at low fields is
observed. This feature suggests weak anti-localization that is expected when
Dirac electrons are scattered by long-range potentials [2,3]. These could be due
to the localized counterions in the SiC substrate. In fact the amplitude, field
and temperature dependence of this feature match predictions of the weak
anti-localization very well [40].
Another typical feature is the large positive magnetoresistance and a kink in
the Hall resistance at low fields. These features (as well as the small discrep-
ancy in the charge density) could be due to the other layers of density n . 1010
−5 0 5
B (T)
−0.02 0 0.02
−0.02 0 0.02
B (T)
3 4 5 6 7 8 9
B (T)
0 0.1 0.2 0.3 0.4
1/B (T−1)
0 2 4 6 8
B (T)
(a) (b)
(d) (e)
Fig. 7. 2D transport in a 100 µm × 1000 µm Hall bar on a ∼10 layer eptitaxial
graphene film on the C face. a) Resistance as a function of the magnetic field. Inset,
dash-dot lines, low field MR at various temperatures (1.4, 4.2, 7, 10, 15, 20, 30,
50 K). (b) Low field MR after subtracting 50 K data as a background. dash-dot
lines, experimental data, which show suppressed weak localization peak around
zero. The positive MR above 0.02 T reveal the weak anti-localization effect. Solid
lines, fits to the theory by McCann et al.. (c) High field MR after subtracting a
parabolic background at several temperatures(4, 7, 15, 30 K). Well defined SdH
oscillations can be seen down to 2.5 T. (d) Landau plot for SdH oscillations, which
intercept y axis at zero. (e) Landau level spacing obtained by Lifshitz-Kosevich
analysis. Squares: experiment. Solid line, theoretical prediction for ∆E assuming
vF = 0.82 × 108 cm/s, dash-dot line: vF = 108 cm/s.
/cm2 [20], although no SdH features can be attributed to them. It should be
noted that the critical field Bc for which extreme quantum limit is reached
(where EF coincides with the n = 0 Landau level, i.e. at about 30 meV above
the Dirac point) is also very low: Bc ≤ 160 mT (see Fig. 4).
The Hall resistance is featureless (except for extremely weak ripples) and shows
no evidence for quantum Hall plateaus, as is for a typical high mobility 2D
samples.
The transport properties of a narrower ribbon are shown in Fig. 8. It is at
once clear that the SdH oscillations are much more pronounced. The Landau
plot corresponds quite well with the expectations for a Dirac particle with a
velocity 0.7×108 cm/s. This ribbon shows evidence for weak anti-localization.
A more pronounced weak localization peak compared with Fig. 7 is observed.
3 4 5 6 7 8 9
B (T)
0 0.1 0.2 0.3
 (T−1)
0 5 10
B (T)
(b) (c)
Fig. 8. Intermediate width Hall bar: 1 µm × 5 µm. The zero field resistance is 502
Ω. (a) High field MR after subtracting a smooth background at several tempera-
tures(4, 10, 20, 30, 50, 70 K). (b) Landau plot. B1 = 53T, intercept 0.13±0.02 (c)
Square: Landau level spacing ∆E obtained by fitting the temperature dependence of
SdH amplitudes to LK equation. Solid line, theoretical prediction for ∆E assuming
vF = 0.7 × 108 cm/s, dash-dot line: vF = 108 cm/s
However the Hall resistance, which is quite similar to that in Fig. 6, shows no
evidence for the QHE.
6 Transport in quasi-1D epitaxial graphene
Quantum confinement effects manifest in narrow ribbons. As for 2D Hall bars,
this interface graphene layer is charged with about 4 × 1012 electrons/cm2
which corresponds to a Fermi wavelength of about 20 nm. Since the Fermi
wavelength of the low-density layers is about 400 nm, consequently for ribbons
that are narrower than 500 nm, these layers contribute little to the transport.
For very narrow ribbons (≤ 100 nm) with rough edges, the low-density layers
are expected to be insulating, since there are no propagating modes (channels).
Figure 9 shows the Hall resistance and the magnetoresistance of a narrow
ribbon (see Ref. [18] for details). The Landau levels for a graphene ribbon are
approximately given by
En(B, W ) ≈ [En(W )4 + En(B)4]1/4 (1)
where EB(n) =
2neBv20~ and EW (n) = nπ~v0/W [41]. Confinement effects
become apparent for low fields, approximately when the cyclotron diameter
becomes greater than the ribbon width. Confinement will then cause deviation
Fig. 9. Narrow Hall bar 500 nm × 6 µm. The zero field resistance is 1125 Ω. (a)
Magnetoresistance oscillations for temperatures ranging from 4-58 K after subtrac-
tion of a smooth background. (b) Landau plot of the magnetoresistance peaks. The
deviation for large from linearity is due to quantum confinement. (c) The energy
gap between the Fermi level and the lowest unoccupied Landau level is found from
the Lifshitz-Kosevich analysis (inset) of the peaks and increases linearly with field
for large fields and saturates for low fields. The saturation confirms quantum con-
finement.
from the linearity in the Landau plot as seen in Fig. 9. The Lifshitz-Kosevich
analysis confirms the confinement. For high magnetic fields the energy sepa-
ration between the Landau levels increases with increasing field as expected,
while for low field the energy separation saturates and is determined by the
quantum confinement. Note that this analysis does not require a determination
of the locations of the magnetoresistance peaks (Ref. [18]).
The mobilities of the graphene ribbons appears to increase with decreasing
width, Fig. 11. This effect may be related to the reduced back-scattering with
decreasing number of conducting channels. On the other hand, back-scattering
at the ribbon edges should become relatively more important with decreasing
width. The amplitudes of the SdH oscillations are much more pronounced for
narrow ribbons than for high mobility 2D Hall bars.
A relatively large fraction of the high-mobility narrower Hall bar samples do
not exhibit SdH oscillations at all, as seen in Fig. 10. Occasionally rather
complex magnetoresistance structures that in many cases appear not to be
random but exhibit features that are approximately linear in field (like in the
Aharonov-Bohm effect). Several of these systems are found to be coherent
and ballistic. In one case the resistance of a 0.5× 5 µm Hall bar abruptly and
reversibly drops by an order of magnitude at T = 200 K to below 10 Ω/sq. It
appears that scattering at the edges is specular without any back-scattering.
0 2 4 6 8
B (T)
Fig. 10. Magnetoresistance of a 0.2 µm × 1 µm ribbon. The experiment were done
at 4, 8, 12, 30, 45, 60, 90 K, from top to bottom. The resistance has been shifted
for clarity, except for 4 K. The amplitude of the weak localization peak at zero field
is about 1G0.
10 -1 100 101 102 103
Width (micron)
mobility  @4K with Ns=3.4 1012/cm2
10 -2 10-1 100 101 102
Width ( µm )
T=4 K T=250K
Width (µm) Width (µm)
Fig. 11. The width dependence of mobility.
The effects point to a correlated electronic system (Levy, Berger, de Heer et
al., to be published).
7 Structure dependent properties and the absence of the quantum
Hall effect
A key focus of epitaxial graphene research is to develop a new graphene-
based electronics material with shape tunable properties. The intrinsic width
dependent bandgap of graphene ribbons has been born out experimentally
in back-gated deposited exfoliated graphene ribbons [42]. We have not yet
demonstrated the effect in epitaxial graphene, primarily due to problems in
gating the material, which we hope to solve soon.
Currently we have reasonable statistics that appear to suggest that the mobil-
ities of the ribbons actually increase with decreasing ribbon width (Fig. 11).
This intriguing property could be due to the fact that the system becomes
more one-dimensional with decreasing width and thereby that backscattering
is inhibited. On the other hand, the decreasing width also implies that the
edges (which are presumed to be rough) become more important and enhance
the scattering. Apparently that effect is not dominant.
It is remarkable that the SdH oscillations are extremely weak except for very
low mobility samples, that are known to be quite defective (as in Fig. 6).
In fact the SdH oscillations are almost imperceptible in the 2D sample (the
amplitudes are only 0.001 of the mean resistance) even though they are well
resolved up to the 15th Landau level. The weak localization peak is weak
(∼ 0.07G0) and evidence is seen for weak anti-localization. In contrast, the
oscillation of the 2nd Landau level in the low mobility sample 2D is large (0.3
of the mean resistance); this sample may exhibit the quantum Hall effect at
high fields. Furthermore, the weak localization peak is intense (∼ 1G0) In the
intermediate regime, the 1 µm width ribbon exhibits well resolved SdH peaks
(0.016 of the mean resistance) while the weak localization peak is 0.52G0,
weak anti-localization is also present.
Narrow ribbons exhibit more intense weak-localization peaks, well-resolved
SdH oscillations, quantum confinement peaks, and high mobilities but no ev-
idence for the quantum Hall effect. It may be assumed that the QHE in the
high-density layer is shorted out by the low density layers, however this is
not bourn out in simulations. For example, it is not possible to ”convert” the
oscillations of Fig. 9 to those of Fig. 7 by adding the conductivity of many
graphene layers to the former. Note that the relative SdH oscillation ampli-
tudes in Fig. 7 are 16 times smaller than in Fig. 8, while they are more than
20 times smaller in Fig. 6, while the square resistances of all three are within
a factor of 3 from each other.
The fact that the most intense SdH peaks in 2D samples are seen in the most
defective samples, leads us to conclude that defects, specifically in the ”bulk”
of the sample (i.e. away from the edges) are required for large amplitude SdH
peaks, and hence for the QHE.
This point of view is strengthened by the fact that a coulomb (electrostatic)
potential cannot trap Dirac particles [43,44]. Hence, if scattering away from
the edges is primarily from (long-range) coulomb potentials due to counter
ions in the SiC substrate, then these potentials cannot trap the carriers. It
is well known that localized states in the bulk are required for the QHE so
that the absence of such states would inhibit the QHE [45,46]! It would be of
course very important that this conclusion is verified since it so dramatically
departs from observations in deposited exfoliated graphene samples, which
further underscores fundamental differences in these materials.
References
[1] S. Frank, P. Poncharal, Z. L. Wang, W. A. de Heer, Science 280 (5370) (1998)
1744–1746.
[2] T. Ando, T. Nakanishi, J. Phys. Soc. Jpn. 67 (5) (1998) 1704–1713.
[3] T. Ando, T. Nakanishi, R. Saito, J. Phys. Soc. Jpn. 67 (8) (1998) 2857–2862.
[4] C. T. White, T. N. Todorov, Nature 393 (6682) (1998) 240–242.
[5] T. Hertel, G. Moos, Phys. Rev. Lett. 84 (21) (2000) 5002–5005.
[6] S. J. Tans, A. R. M. Verschueren, C. Dekker, Nature 393 (6680) (1998) 49–52.
[7] K. S. Novoselov, A. K. Geim, S. V. Morozov, D. Jiang, M. I. Katsnelson, I. V.
Grigorieva, S. V. Dubonos, A. A. Firsov, Nature 438 (7065) (2005) 197–200.
[8] Y. B. Zhang, Y. W. Tan, H. L. Stormer, P. Kim, Nature 438 (7065) (2005)
201–204.
[9] N. M. R. Peres, F. Guinea, A. H. C. Neto, Phys. Rev. B 73 (12) (2006) 125411.
[10] V. P. Gusynin, S. G. Sharapov, Phys. Rev. B 71 (12) (2005) 125124.
[11] N. D. Mermin, H. Wagner, Phys. Rev. Lett. 17 (22) (1966) 1133–1136.
[12] P. C. Hohenberg, Phys. Rev. 158 (2) (1967) 383–386.
[13] J. C. Meyer, A. K. Geim, M. I. Katsnelson, K. S. Novoselov, T. J. Booth,
S. Roth, Nature 446 (7131) (2007) 60–63.
[14] S. V. Morozov, K. S. Novoselov, M. I. Katsnelson, F. Schedin, L. A.
Ponomarenko, D. Jiang, A. K. Geim, Phys. Rev. Lett. 97 (1) (2006) 016801.
[15] Y. Zhang, Z. Jiang, J. P. Small, M. S. Purewal, Y. W. Tan, M. Fazlollahi, J. D.
Chudow, J. A. Jaszczak, H. L. Stormer, P. Kim, Phys. Rev. Lett. 96 (13) (2006)
136806.
[16] C. Berger, Z. M. Song, T. B. Li, X. B. Li, A. Y. Ogbazghi, R. Feng, Z. T. Dai,
A. N. Marchenkov, E. H. Conrad, P. N. First, W. A. de Heer, J. Phys. Chem.
B 108 (52) (2004) 19912–19916.
[17] E. Rollings, G.-H. Gweon, S. Zhou, B. Mun, J. McChesney, B. Hussain,
A. Fedorov, P. First, W. de Heer, A. Lanzara, J. Phys. Chem. Solids 67 (9-
10) (2006) 2172–2177.
[18] C. Berger, Z. M. Song, X. B. Li, X. S. Wu, N. Brown, C. Naud, D. Mayo,
T. B. Li, J. Hass, A. N. Marchenkov, E. H. Conrad, P. N. First, W. A. de Heer,
Science 312 (5777) (2006) 1191–1196.
[19] T. Ohta, A. Bostwick, T. Seyller, K. Horn, E. Rotenberg, Science 313 (5789)
(2006) 951–954.
[20] M. L. Sadowski, G. Martinez, M. Potemski, C. Berger, W. A. de Heer, Phys.
Rev. Lett. 97 (26) (2006) 266405.
[21] X. S. Wu, X. B. Li, Z. M. Song, C. Berger, W. A. de Heer, to appear in Phys.
Rev. Lett.
[22] A. J. van Bommel, J. E. Crombeen, A. van Tooren, Surf. Sci. 48 (2) (1975)
463–472.
[23] F. Owman, P. Martensson, Surf. Sci. 369 (1-3) (1996) 126–136.
[24] L. Li, I. S. T. Tsong, Surf. Sci. 351 (1-3) (1996) 141–148.
[25] I. Forbeaux, J. M. Themlin, A. Charrier, F. Thibaudau, J. M. Debever, Appl.
Surf. Sci. 162 (2000) 406–412.
[26] A. Charrier, A. Coati, T. Argunova, F. Thibaudau, Y. Garreau, R. Pinchaux,
I. Forbeaux, J. M. Debever, M. Sauvage-Simkin, J. M. Themlin, J. Appl. Phys.
92 (5) (2002) 2479–2484.
[27] G. Ertl, J. Kuppers, Low energy electrons and surface chemistry, VCH, 1985.
[28] C. Powell, A. Jablonski, I. Tilinin, S. Tanuma, D. R. Penn, J. Electron
Spectrosc. Relat. Phenom. 98-99 (1999) 1–15.
[29] S. Tanuma, C. Powell, D. Penn, Surf. Interface Anal. 17 (1991) 927–939.
[30] S. Tanuma, C. Powell, D. Penn, Surf. Interface Anal. 17 (1991) 911–926.
[31] W. Chen, H. Xu, L. Liu, X. Gao, D. Qi, G. Peng, S. C. Tan, Y. Feng, K. P.
Loh, A. T. S. Wee, Surf. Sci. 596 (1-3) (2005) 176–186.
[32] J. Hass, R. Feng, T. Li, X. Li, Z. Zong, W. A. de Heer, P. N. First, E. H.
Conrad, C. A. Jeffrey, C. Berger, Appl. Phys. Lett. 89 (14) (2006) 143106.
[33] A. N. Kolmogorov, V. H. Crespi, Phys. Rev. B 71 (23) (2005) 235415.
[34] J. Hass, R. Feng, J. Millán-Otoya, X. Li, M. Sprinkle, P. N. First, C. Berger,
W. A. de Heer, E. H. Conrad, cond-mat/0702540.
[35] J. Hass, J. Millán-Otoya, M. Sprinkle, P. N. First, C. Berger, W. A. de Heer,
E. H. Conrad, (to be published).
[36] F. Varchon, R. Feng, J. Hass, X. Li, B. N. Nguyen, C. Naud, P. Mallet, J. Y.
Veuillen, C. Berger, E. H. Conrad, L. Magaud, cond-mat/0702311.
[37] R. Franklin, Acta Cryst. 4 (1951) 253.
[38] The data in Fig. 2 of [16] are inverse resistance as a function of field, so that
the maxima of resistance correspond to minima in the plot.
[39] I. M. Lifshitz, A. M. Kosevich, Sov. Phys. JETP 2 (4) (1956) 636–645.
[40] E. McCann, K. Kechedzhi, V. I. Fal’ko, H. Suzuura, T. Ando, B. L. Altshuler,
Phys. Rev. Lett 97 (14) (2006) 146805.
[41] N. M. R. Peres, A. H. C. Neto, F. Guinea, Phys. Rev. B 73 (24) (2006) 241403.
[42] M. Y. Han, B. Oezyilmaz, Y. Zhang, P. Kim, cond-mat/0702511.
[43] O. Klein, Z. Phys. 53 (1929) 157.
[44] A. D. Martino, L. Dell’Anna, R. Egger, Phys. Rev. Lett. 98 (6) (2007) 066802.
[45] D. Yoshioka, The Quantum Hall Effect, Springer, Berlin, 2002.
[46] S. Ilani, J. Martin, E. Teitelbaum, J. H. Smet, D. Mahalu, V. Umansky,
A. Yacoby, Nature 427 (6972) (2004) 328–332.
	Introduction
	Epitaxial graphene formation and characterization
	Landau level spectroscopy of epitaxial graphene
	Patterning epitaxial graphene 
	Transport in 2D epitaxial graphene
	Transport in quasi-1D epitaxial graphene
	Structure dependent properties and the absence of the quantum Hall effect
	References
ABSTRACT
  Graphene multilayers are grown epitaxially on single crystal silicon carbide.
This system is composed of several graphene layers of which the first layer is
electron doped due to the built-in electric field and the other layers are
essentially undoped. Unlike graphite the charge carriers show Dirac particle
properties (i.e. an anomalous Berry's phase, weak anti-localization and square
root field dependence of the Landau level energies). Epitaxial graphene shows
quasi-ballistic transport and long coherence lengths; properties which may
persists above cryogenic temperatures. Paradoxically, in contrast to exfoliated
graphene, the quantum Hall effect is not observed in high mobility epitaxial
graphene. It appears that the effect is suppressed due to absence of localized
states in the bulk of the material.Epitaxial graphene can be patterned using
standard lithography methods and characterized using a wide array of
techniques. These favorable features indicate that interconnected room
temperature ballistic devices may be feasible for low dissipation high-speed
nanoelectronics.

<|endoftext|><|startoftext|>
Introduction
Computerized tomography has had a huge impact on medical diagnostics.
Numerous methods of tomographic medical imaging have been developed
and are being developed (e.g., the “standard” X-ray, single-photon emission,
positron emission, ultrasound, magnetic resonance, electrical impedance, op-
tical) [55, 59, 75, 76, 77]. The designers of these modalities strive to in-
crease the image resolution and contrast, and at the same time to reduce
the costs and negative health effects of these techniques. However, these
goals are usually rather contradictory. For instance, some cheap and safe
methods with good contrast (like optical or electrical impedance tomogra-
phy) suffer from low resolution, while some high resolution methods (such
as ultrasound imaging) often do not provide good contrast. Recently re-
searchers have been developing novel hybrid methods that combine different
physical types of signals, in hope to alleviate the deficiencies of each of the
types, while taking advantage of their strengths. The most successful exam-
ple of such a combination is the Thermoacoustic Tomography (TAT)
∗Mathematics Department, Texas A& M University, College Station, TX 77843-3368,
USA. kuchment@math.tamu.edu
†Mathematics Department, University of Arizona, Tucson, AZ 77843-3368, USA.
leonk@math.arizona.edu
http://arxiv.org/abs/0704.0286v2
1[62]. Albeit not being a common feature in clinics yet, TAT scanners are
actively researched, developed and already manufactured, for instance by
OptoSonics, Inc. (http://www.optosonics.com/), founded by the pioneer of
TAT R. Kruger.
After a substantial effort, major breakthroughs have been achieved in the
last couple of years in the mathematical modeling of TAT. The aim of this
article is to survey this recent progress and to describe the relevant models,
mathematical problems, and reconstruction procedures arising in TAT, and
to provide references to numerous research publications on this topic.
The main thrust of this text is toward mathematical methods; consid-
erations of the text length, as well as authors’ background do not let us
discuss in any detail industrial and physical set-ups and parameters of the
TAT technique, and limitations of the corresponding mathematical models.
Fortunately, the excellent recent surveys by M. Xu and L.-H. V. Wang [117]
and by A. A. Oraevsky and A. A. Karabutov [87, 88] accomplish all of these
tasks, and thus the reader is advised to consult with them for all such is-
sues (see also the recent textbook [113]). On the other hand, in spite of
the significant recent progress in mathematics of TAT, there is no compre-
hensive survey text addressing in details the relevant mathematical issues,
although the surveys [88, 117] do mention some mathematical reconstruction
techniques.
The structure of the paper is a follows: Section 2 contains a brief descrip-
tion of the TAT procedure. The next Section 3 provides the mathematical
formulation of the TAT problem. In general, it is formulated as an inverse
problem for the wave equation. However, in the case of the constant sound
speed, it can be also described in terms of a spherical mean operator (a
spherical analog of the Radon transform). The section also contains the list
of natural questions to be addressed concerning this model. These issues are
addressed then one by one in the following sections. In particular, Section
4 discusses uniqueness of reconstruction, i.e. the question of whether the
data collected in TAT is sufficient for recovery of the information of interest.
Albeit, for all practical purposes this issue is resolved in Corollary 2, we pro-
vide an additional discussion of unresolved uniqueness problems, which are
probably of more academic interest. Section 5 addresses inversion formulas
and algorithms. In Section 6 effects of having only partial data are discussed.
1TAT is also called Photoacoustic (PAT) or Optoacoustic (OAT) Tomography and is
sometimes abbreviated as TCT, which stands for Thermoacoustic Computed Tomography
http://www.optosonics.com/
Section 7 contains results concerning the so called range conditions, i.e. the
conditions that all ideal data must satisfy. Section 8 provides additional
remarks and discussions of the issues raised in the previous sections. The
paper ends with an Acknowledgments section and bibliography. Concerning
the latter, we need to mention that the engineering and biomedical literature
on TAT is rather vast and no attempt has been made in this text to create
a comprehensive bibliography of the topic from the engineering prospective.
The references in [87, 88, 109, 112, 117] to a large extent fill this gap. The
authors, however, have tried to present a sufficiently complete review of the
existing literature on mathematics of TAT.
2 Thermoacoustic tomography
In TAT, a short duration EM pulse is sent through a biological object (e.g.,
woman’s breast in mammography) with the aim of triggering a thermoa-
coustic response in the tissue. As it is explained in [117], the radiofrequency
(RF) and the visible light frequency ranges are currently considered to be the
most suitable for this purpose. Since mathematics works exactly the same
way in both of these frequency ranges, we will not make such distinction and
will be talking about just “an EM pulse”. E.g., in Figure 1 a microwave
pulse is assumed. In most cases the pulse is spatially wide, so that the
Figure 1: The TAT procedure.
whole object is more or less uniformly irradiated. Some part of EM energy
is absorbed throughout the object. The amount of energy absorbed at a
location strongly depends on local biological properties of the cells. Oxygen
saturation, concentration of hemoglobin, density of the microvascular net-
work (angiogenesis), ionic conductivity, and water content are among the
parameters that influence the absorption strongly [117]. Thus, if the energy
absorption distribution function f(x) were known, it would provide a great
diagnostic tool. For instance, it could be useful for detecting cancerous cells
that absorb several times more energy in the RF range than the healthy
ones [62, 88, 115, 117]. However, as an imaging tool neither RF waves, nor
visual light alone would provide acceptable resolution. In the RF case, this
is due to the long wave length. One can use shorter microwaves, but this
will be at the expense of the penetration depth. In the optical region, the
problem is with the multiple scattering of light. So, a different mechanism,
the so called Photoacoustic Effect [46, 107, 113, 117], is used to image f(x).
Namely, the EM energy absorption results in thermoelastic expansion and
thus in a pressure wave p(x, t) (an ultrasound signal) that can be measured
by transducers placed around the object. Now one can attempt to recover the
function f(x) (the image) from the measured data p(x, t). Such a measuring
scheme, utilizing two types of waves, brings about the high resolution of the
ultrasound diagnostics and the high contrast of EM waves. It overcomes the
adverse effect of the low contrast of ultrasound with respect to soft tissue.
In fact, such a low contrast is a good thing here, allowing one to assume in
the first approximation that the sound speed is constant. This often used
approximation is not always appropriate, but it is the most studied case at
the moment. Later on in this text we will describe some initial considerations
of the variable sound speed case, following [4, 57].
For this TAT method (and in particular, for the mathematical model
described below) to work, several conditions must be met. For instance, the
time duration of the EM pulse must be shorter than the time it takes the
sound wave to traverse the smallest feature that needs to be reconstructed.
The ultrasound detector must be able to resolve the time scale of the duration
of the EM pulse. On the other hand, the transducer must be also able to
detect much lower frequencies. Thus, one needs to have extra-wide-band
transducers, and these are currently available. One can find the technical
discussion of all these issues, for instance, in [88, 117]. In this text we will
assume that all these conditions are met and thus the mathematical models
described are applicable.
In the next section we present a mathematical description of the relation
between f(x) and p(x, t) (similar mathematical problems arise in sonar [73]
and radar [81] imaging, as well as in geophysics [27]).
3 Mathematical model of TAT:
wave equation and the spherical mean trans-
3.1 The wave equation model
We assume that the ultrasound speed at location x is equal to c(x). Then,
modulo some constant coefficients that we will assume all to be equal to 1,
the pressure wave p(x, t) satisfies the following problem for the standard wave
equation [28, 107, 115]:
ptt = c
2(x)∆xp, t ≥ 0, x ∈ R
p(x, 0) = f(x),
pt(x, 0) = 0
The goal is to find, using the data measured by transducers, the initial value
f(x) at t = 0 of the solution p(x, t).
In order to formalize what data is in fact measured, one needs to specify
what kind of transducers is used, as well as the geometry of the measurement.
By the geometry of the measurement we mean the distribution of locations
of transducers used to collect the data.
We briefly describe here the commonly considered measurement proce-
dure, which uses point detectors. Line and planar detectors have also been
suggested (see Section 8.1.1). It is too early to judge which one of them
will become most successful, but the one using point transducers has been
more thoroughly studied mathematically and experimentally, and thus will
be mostly addressed in this article. In this case, the transducers are assumed
to be point-like, i.e. of sufficiently small dimension. A transducer at time
t measures the average pressure over its surface at this time, which for the
small size of the transducer can be assumed to be just the value of p(y, t) at
the location y of the transducer. Dimension count shows immediately that in
order to have enough data for reconstruction of the function f(x), one needs
to collect data from the transducers’ locations y running over a surface S in
3. Thus, the data at the experimentalist’s disposal is the function g(y, t)
that coincides with the restriction of p(x, t) to the set of points y ∈ S.
Taking into account that the measurements produce the values g(y, t) of
the pressure p(x, t) of (1) on S × R+, the set of equations (1) extends to
become
ptt = c
2(x)∆xp, t ≥ 0, x ∈ R
p(x, 0) = f(x),
pt(x, 0) = 0
p(y, t) = g(y, t), y ∈ S × R+
Figure 2: An illustration to (2).
The problem now reduces to finding the initial value f(x) in (2) from the
knowledge of the lateral data g(x, t) (see Figure 3.1). A person familiar with
PDEs might suspect first that there is something wrong with this problem,
since we seem to have insufficient data for the recovery of the solution of the
wave equation in a cylinder from the lateral values alone. This, however,
is an illusion, since in fact there is a significant additional restriction: the
solution holds in the whole space, not just inside the cylinder S × R+. We
will see soon that in most cases, the data is sufficient for recovery of f(x).
3.2 Spherical mean model
We now introduce an alternative formulation of the problem that works in
the constant speed case only. We will assume that the units are chosen in
such a way that c(x) = 1. The known Poisson-Kirchhoff formula [25, Ch.
VI, Section 13.2, Formula (15)] for the solution of (1) gives
p(x, t) = c
(t(Rf)(x, t)) , (3)
where
(Rf)(x, r) =
|y|=1
f(x+ ry)dA(y) (4)
is the spherical mean operator applied to the function f(x), and dA is the
normalized area element on the unit sphere in R3. Hence, knowledge of the
function g(x, t) for x ∈ S and all t ≥ 0 essentially means knowledge of the
spherical mean Rf(x, t) at all points (x, t) ∈ S × R+. One thus is lead to
studying the spherical mean operator R : f → Rf and in particular its
restriction RS to the points x ∈ S only (these are the points where we place
transducers):
RSf(x, t) =
|y|=1
f(x+ ty)dA(y), x ∈ S, t ≥ 0. (5)
This explains why, in many works on TAT, the spherical mean operator has
been the model of choice. Albeit the (unrestricted) spherical mean operator
has been studied rather intensively and for a long time (e.g., [17, 25, 58]),
its version RS with the centers restricted to a subset S appears to have been
studied since early 1990s only [1]-[14], [16, 26, 30, 31, 33, 32, 34, 35, 36, 39,
41, 63, 64, 68, 69, 71, 72, 73, 77, 80, 82, 83, 89, 90, 91, 95, 96, 97, 104, 121]
and offers quite a few new and often hard questions.
In what follows, we will alternate between these two (PDE and integral
geometry) interpretations of the TAT model, since each of them has its own
advantages.
3.3 Main mathematical problems of TAT
We now formulate the typical list of problems one would like to address in
order to implement the TAT reconstruction.
1. For which sets S ∈ R3 is the data collected by transducers placed along
S sufficient for unique reconstruction of f? In terms of the spheri-
cal mean operator, the question is whether RS has zero kernel on an
appropriate class of functions, say continuous with compact supports.
2. If the data collected from S is sufficient, what are inversion formulas
and algorithms?
3. How stable is the inversion?
4. What happens if the data is “incomplete”?
5. What is the space of all possible “ideal” data g(t, y) collected on a sur-
face S? Mathematically (and in the constant sound speed case) it is
the question of describing the range of the operator RS in appropriate
function spaces. This question might seem to be unusual (for instance,
to people used to partial differential equations), but in tomography im-
portance of knowing the range of Radon type transforms is well known.
Such information is used to improve inversion algorithms, complete in-
complete data, discover and compensate for certain data errors, etc.
(e.g., [30, 38, 39, 40, 53, 54, 55, 76, 77, 90]).
4 Uniqueness of reconstruction
Many of the problems of interest to TAT can be formulated in any dimension
d, albeit the practical dimensions are only d = 3 and d = 2. We will consider
an arbitrary dimension d whenever we see this suitable.
Let S ⊂ Rd be the set of locations of transducers and f be a compactly
supported function (one can show that for purposes of uniqueness of recon-
struction problem, one can always assume that f is smooth [7]). Does the
absence of the signal on the transducers, i.e. g(t, y) = 0 for all t and y in
S, imply that f = 0? If the answer is a “yes,” we call S - a uniqueness
set, otherwise a non-uniqueness set. In other words, in terms of TAT, the
uniqueness sets are those that distributing transducers along them provides
enough data for unique reconstruction of the function f(x).
In terms of the wave equation, uniqueness sets are the sets of complete
observability, i.e. such that observing the motion on this set only, one gets
enough information to reconstruct the whole oscillation. In terms of the
spherical mean operator, the question is of whether the equality RSf = 0
implies that f = 0.
We will address this problem for the constant sound speed case first.
4.1 Constant speed case
As it has been discussed, the dimension count makes it clear that S must
be (d − 1)-dimensional, i.e. a surface in 3D or a curve in 2D. We will
also see that most of such surfaces are “good”, i.e. are uniqueness ones (or,
in other words, provide enough information for reconstruction). Thus, we
should rather discuss the problem of describing the “bad”, non-uniqueness
sets. The following simple statement is very important and not immediately
obvious.
Lemma 1 [7, 71, 72, 121] Any non-uniqueness set S is a set of zeros of a
(non-trivial) harmonic polynomial. In particular,
1. If there is no non-zero polynomial vanishing on S, then S is a unique-
ness set.
2. If there is no non-zero harmonic function vanishing on S, then S is a
uniqueness set.
The proof of this lemma is very simple. It works under the assumption
of exponential decay of the function f(x), not necessarily of compactness of
its support. It also introduces some polynomials that play significant role in
the whole analysis of the spherical mean operator RS.
Let k ≥ 0 be an integer. Consider the convolution
Qk(x) = |x|
2k ∗ f(x) =
|x− y|2kf(y)dy. (6)
This is clearly a polynomial of degree at most 2k. Rewriting the integral in
polar coordinates centered at x and using radiality of |x − y|, one sees that
Qk(x) is determined if we know the values Rf(x, t) of the spherical mean of
f centered at x:
Qk(x) = cd
t2k+d−1Rf(x, t)dt.
In particular, If RSf ≡ 0, then each polynomial Qk vanishes on S.
Another observation that is easy to justify is that if the function f is
exponentially decaying (e.g., is compactly supported), then if all polynomials
Qk vanish identically, the function itself must be equal to zero. (This is not
necessarily true anymore if f and its derivatives decay only faster than any
power, rather than exponentially.)
Thus, we conclude that if f is not identically equal to zero, then there is at
least one non-zero polynomial Qk. Since, as we discussed, equality RSf = 0
implies that Qk|S = 0, we conclude that S must be algebraic.
Now notice the following simple to verify equality (with a non-zero con-
stant ck):
∆Qk = ckQk−1, (7)
where ∆ is the Laplace operator. This implies that the lowest k non-zero
polynomial Qk is harmonic. Since Qk|S = 0, this proves the lemma.
Consider now the case when S is a closed (hyper-)surface (i.e., the bound-
ary of a bounded domain). Since, as it is well known, there is no non-zero
harmonic function in the domain that would vanish at the boundary (the
spectrum of the Dirichlet Laplace operator is strictly positive), we conclude
that such S is a uniqueness set for harmonic polynomials. Thus, we get the
following important
Corollary 2 [7, 63] Any closed surface is uniqueness set for the spherical
mean Radon transform.
An older alternative proof of this corollary provides an additional insight
into the problem. We thus sketch it here. Let us assume for simplicity
that the dimension d ≥ 3 is odd (even dimensions require a little bit more
work). Suppose that the closed surface S remains stationary (nodal) for the
oscillation described by (1). Since the oscillation is unconstrained and the
initial perturbation is compactly supported, after a finite time, the interior
of S will become stationary. On the other hand, we can think that S is
fixed (since it is not moving anyway). Then, the energy inside S must stay
constant. This is the contradiction that proves the statement of Corollary 2.
We will see in the next Section that the same method works in some cases
of variable sound speed, providing the needed uniqueness of reconstruction
result.
This corollary resolves the uniqueness problems for most practically used
geometries. It fails, however, if f does not decay sufficiently fast (see [3],
where it is shown in which Lp(Rd) classes of functions f(x) closed surfaces
remain uniqueness sets).
It also provides uniqueness for some “limited data” problems. For in-
stance, if S is an open (even tiny) piece of an analytic closed surface Σ, it
suffices. Indeed, if it did not, then it would be a part of an algebraic non-
uniqueness surface. Uniqueness of analytic continuation would show then
that the whole Σ is a non-uniqueness set, which we know to be incorrect.
This result, however, does not say that it would be practical to reconstruct
using observations from a tiny S. We will see later that this would not lead
to a satisfactory reconstructions, due to instabilities.
A geometry sometimes used is the planar one, i.e. detectors are placed
along a plane S (line in the 2D). In this case, there is no uniqueness of
reconstruction when the sound speed is constant. Indeed, if f(x) is odd with
respect to S, then clearly all measured data g(t, y) will vanish. However, it is
well known [25, 58] that functions even with respect to S can be recovered.
What saves the day in TAT is that the object to be imaged is located on one
side of S. Then, extending f(x) as an even function with respect to S, one
can still recover it from the data.
Although, for all practical purposes the uniqueness of reconstruction prob-
lem is essentially resolved by the Corollary 2, the complete understanding of
uniqueness problem has not been achieved yet. Thus, we include below some
known theoretical results and open problems.
4.1.1 Non-uniqueness sets in R2.
In this Section, we follow the results and exposition of [7, 71, 72] in discussing
uniqueness sets in 2D. What are simple examples of non-uniqueness sets? As
we have already mentioned, any line S (or a hyperplane in higher dimensions)
is a non-uniqueness set, since any function f odd with respect to S will clearly
produce no signal: RSf = 0. Analogously, consider a Coxeter system ΣN of
N lines passing through a point and forming equal angles (see Fig. 3).
Figure 3: Coxeter cross ΣN .
Choosing the intersection point as the pole and expanding functions into
Fourier series with respect to the polar angle, it is easy to discover existence
of an infinite dimensional space of functions that are odd with respect to
each of the N lines. Thus, such a cross ΣN is also a non-uniqueness set. Less
obviously, one can use the infinite dimensional freedom just mentioned to
add any finite set Φ of points still preserving non-uniqueness. The following
major and very non-trivial result was conjectured in [71, 72] and proven in
[7]. It shows that there are no other bad sets S besides the ones we have just
discovered:
Theorem 3 A set S ⊂ R2 is a non-uniqueness set for the spherical mean
transform in the space of compactly supported functions, if and only if
S ⊂ ωΣN ∪ Φ,
where ΣN is a Coxeter system of lines, ω is a rigid motion of the plane, and
Φ is a finite set.
A sketch of a rather intricate proof of this result is provided in Section
4.1.2 Higher dimensions
Here we present a believable conjecture of how the result should look like in
higher dimensions.
Conjecture 4 [7]A set S ⊂ Rd is a non-uniqueness set if and only if S ⊂
ωΣ ∪ Φ, where Σ is the surface of zeros of a homogeneous harmonic polyno-
mial, ω is a rigid motion of Rd, and Φ is an algebraic surface of dimension
at most d− 2.
Figure 4: A picture of a 3-dimensional non-uniqueness set.
The progress towards proving this conjecture has been slow, albeit some
partial cases have been treated ([1]-[12]). E.g., in some cases one can prove
that S is a ruled surface (i.e., consists of lines), but proving that these lines
(rules) pass through a common point remains a challenge. It is known,
though, that both the zero sets of homogeneous harmonic polynomials and
algebraic subsets of dimension at most d − 2 are non-uniqueness sets [2, 7],
and thus one should avoid using them as placements of transducers for TAT.
4.1.3 Relations to other areas of analysis
The problem of injectivity of RS has relations to a wide variety of areas of
analysis (see [1, 7] for many examples). In particular, the following interpre-
tation is important:
Theorem 5 [7, 63] The following statements are equivalent:
1. S ⊂ Rd is a non-uniqueness set for the spherical mean operator.
2. S is a nodal set for the wave equation, i.e. there exists a non-zero
compactly supported f such that the solution of the wave propagation
problem 
= ∆u,
u(x, 0) = 0,
ut(x, 0) = f(x)
vanishes on S for any moment of time.
3. S is a nodal set for the heat equation, i.e. there exists a non-zero
compactly supported f such that the solution of the problem
= ∆u,
u(x, 0) = f(x)
vanishes on S for any moment of time.
The interpretation in terms of the wave equation provides important PDE
tools and insights, which have lead to a recent progress [33, 12] (albeit it
has not lead yet to a complete alternative proof of Theorem 3). The rough
idea, originally introduced in [33], is that if S is a nodal set, then it might be
considered as the fixed boundary. In this case, the signals must go around S.
However, in fact, there is no obstacle, so signals can propagate along straight
lines. Thus, in order to avoid discrepancies in arrival times, S must be very
special. One can find details in [33] and in [12].
4.2 Uniqueness in the case of a variable sound speed
It is shown in [35, Theorem 4] that uniqueness of reconstruction also holds in
the case of a smoothly varying (strictly positive) sound speed, if the source
function f(x) is completely surrounded by the observation surface S (in other
words, if there is no US signal coming from outside of S). The proof uses
the celebrated unique continuation result by D. Tataru [108].
One can also establish uniqueness of reconstruction in the case of the
source not necessarily completely surrounded by S. However, here we need
to impose an additional non-trapping condition on the sound speed. We
assume that the sound speed is strictly positive c(x) > c > 0 and such that
c(x)− 1 has compact support, i.e. c(x) = 1 for large x.
Consider the Hamiltonian system in R2nx,ξ with the Hamiltonian H =
c2(x)
|ξ|2: 
x′t =
= c2(x)ξ
ξ′t = −
∇ (c2(x)) |ξ|2
x|t=0 = x0, ξ|t=0 = ξ0.
The solutions of this system are called bicharacteristics and their projections
into Rnx are rays.
We will assume that the following non-trapping condition holds:
all rays (with ξ0 6= 0) tend to infinity when t→ ∞.
Theorem 6 [4] Under the non-trapping conditions formulated above, com-
pactly supported function f(x) is uniquely determined by the data g measured
on S for all times. (No assumption of f being supported inside S is imposed.)
One should mention that ray trapping can occur for some sound speed
profiles. For instance, if c(x) = |x| for some range r1 < |x| < r2, then there
are rays trapped in this spherical shell. We are not sure what happens in
this case to the uniqueness of reconstruction statement of Theorem 6 and
inversion formula of Theorem 7.
5 Reconstruction: formulas and examples
Here we will address the procedures of actual reconstruction of the source
f(x) from the data g(t, y) measured by transducers.
5.1 Constant sound speed
We assume here that the sound speed is constant and normalized to be equal
to 1.
5.1.1 Inversion formulas
Before we move to our case of interest, which is spheres centered on a closed
surface S surrounding the object to be imaged, we briefly refer to related
but somewhat different work. Namely, the problem of recovering functions
from integrals over spheres centered on a (hyper)plane S has attracted a
lot of attention over the years. Albeit, as it has been mentioned before,
there is no uniqueness in this case (functions odd with respect to S are
annihilated), even functions can be recovered. Thus also functions supported
on one side of the plane can be as well, by means of their even extension.
Many explicit inversion formulas and procedures have been obtained for this
situation [16, 26, 31, 39, 41, 60, 77, 80, 89, 90, 101, 102, 103]. We will not
provide any details here, since this acquisition geometry is not very useful.
In particular, this is due to “invisibility” of some parts of the interfaces,
see Section 6, which arises from truncating the plane. The same problem
is encountered with some other unbounded acquisition surfaces, such as a
surface of an “infinitely” long cylinder.
Thus, it is more practical to place transducers along a closed surface
surrounding the object. The simplest surface of this type is a sphere.
5.1.2 Fourier expansion methods
Let us assume that S is the unit sphere in Rn. We would like to reconstruct
a function f(x) supported inside S from the known values of its spherical
integrals g(y, r) with the centers on S:
g(y, r) =
f(y + rω)rn−1dω, y ∈ S.
The first inversion procedures for the case of spherical acquisition were de-
scribed in [82] in 2D and in [83] in 3D. These solutions were obtained by
harmonic decomposition of the measured data and the sought function, and
by equating coefficients of the corresponding Fourier series.
In particular, the 2-D algorithm of [82] is based on the Fourier decompo-
sition of f and g in angular variables:
f(x) =
fk(ρ)e
ikϕ, x = (ρ cos(ϕ), ρ sin(ϕ)) (9)
g(y(θ), r) =
gm(r)e
ikθ, y = (R cos(θ), R sin(θ)).
Following [82] we consider the Hankel transform ĝm,J(λ) of the Fourier coef-
ficients gm(r) (divided by 2πr)
ĝm,J(λ) =
gm(r)J0(λr)dr = H0
gm(r)
. (10)
To simplify the presentation we introduce the convolution GJ(λ, y) of the
sought function with the Bessel function J0(λ|x− y|).
GJ(λ, y) =
f(x)J0(λ|x− y|)dx, (11)
One can notice that ĝm,J(λ) are the Fourier coefficients of GJ(λ, y) in θ:
ĝm,J(λ) =
GJ(λ, y)e
−imθdθ. (12)
Now coefficients fm(ρ) can be recovered from gm(r) by application of the
addition theorem for the Bessel function J0(λ|x− y|):
J0(λ|x− y|) =
Jm(λ|x|)Jm(λ|y|)e
−im(ϕ−θ). (13)
Indeed, by substituting equations (9) and (13) into (11), and (11) into (12)
one obtains
ĝm,J(λ) = 2πJm(λ|R|)
fm(ρ)Jm(λρ)ρdρ = Hm(fm(ρ)),
where Hm is the m-th order Hankel transform. Since the latter transform
is self-invertible, the coefficients fm(ρ) can be recovered by the following
formula
fm(ρ) = Hm
ĝm,J(λ)
Jm(λ|R|)
Jm(λ|R|)
gm(r)
, (14)
which is the main result of [82]. Function f(x) can now be reconstructed by
summing series (9).
Note that the above method requires a division of the Hankel transform
of the measured data by Bessel functions Jm that have infinitely many zeros.
Theoretically, there is no problem; the Hankel transform H0
gm(r)
has to
have zeros that would cancel those in the denominator. However, since the
measured data always contain some error, the exact cancelation is not likely
to happen, and one needs a sophisticated regularization scheme to keep the
total error bounded.
This problem can be avoided by replacing in (10) Bessel function J0 by
Hankel function H
ĝm,H(λ) =
gm(r)H
0 (λr)dr.
The addition theorem for H
0 (λ|x− y|) takes form
0 (λ|x− y|) =
Jm(λ|x|)H
m (λ|y|)e
−im(ϕ−θ),
and by proceeding as before one can obtain the following formula for fm(ρ):
fm(ρ) = Hm
ĝm,H(λ)
m (λ|R|)
m (λ|R|)
gm(r)H
0 (λr)dr
Unlike Jm, Hankel functions H
m (t) do not have zeros for all real values of t
and therefore problems with division by zeros do not arise in this amended
version of the method [82].
This derivation can be repeated in 3-D, with the exponentials eikθ replaced
by the spherical harmonics, and with cylindrical Bessel functions replaced by
their spherical counterparts. By doing this one will arrive at the Fourier
series method of [83]. Our use of Hankel function H
0 above is similar to
the way the authors of [83] utilized spherical Hankel function h
0 to avoid
the divisions by zero.
5.1.3 Filtered backprojection methods
The favorite way of inverting Radon transform for tomography purposes is
by using filtered backprojection type formulas, which involve filtration in
Fourier domain followed (or preceded) by a backprojection. In the case of
the set of spheres centered on a closed surface (e.g., sphere) S, one expects
such a formula to involve a filtration with respect to the radius variable and
then some integration over the set of spheres passing through the point of
interest. For quite a while, no such type formula had been discovered. This
did not prevent practitioners from reconstructions, since good approximate
inversion formulas (parametrices) could be developed, followed by an iterative
improvement of the reconstruction, see e.g. reconstruction procedures in
[114, 115, 118, 119, 120], and especially [96, 97].
The first set of exact inversion formulas of the filtered backprojection type
was discovered in [33]. These formulas were obtained only in odd dimensions.
Several different variations of such formulas (different in terms of exact order
of the filtration and backprojection steps) were developed. Let us denote by
g(p, r) = r2RSf the spherical integral, rather than the average, of f . Then
various versions of the 3D inversion formulas that reconstruct a function f(x)
supported inside S from its the spherical mean data RSf , read:
f(x) = − 1
g(y, |y − x|)dA(y),
f(x) = − 1
g(y, t)
) ∣∣∣∣∣
t=|y−x|
dA(y),
f(x) = − 1
g(y,t)
)) ∣∣∣∣∣
t=|y−x|
dA(y).
Recently, analogous formulas were obtained for even dimensions in [32]. De-
noting by g, as before the spherical integrals (rather than averages) of f , the
formulas of [32] in 2D look as follows:
f(x) =
g(y, t) log(t2 − |x− y|2) dt dl(y), (16)
f(x) =
g(y, t)
log(t2 − |x− y|2) dt dl(y), (17)
A different set of explicit inversion formulas that work in arbitrary dimensions
was presented in [69].
f(x) =
4(2π)n−1
n(y)h(y, |x− y|)dA(y). (18)
h(y, t) =
Y (λt)
J(λt′)g(y, t′)dt′
−J(λt)
Y (λt′)g(y, t′)dt′
λ2n−3dλ, (19)
J(t) =
Jn/2−1(t)
tn/2−1
, Y (t) =
Yn/2−1(t)
tn/2−1
Jn/2−1(t) and Yn/2−1(t) are respectively the Bessel and Neumann functions
of order n/2− 1, and n(y) is the vector of exterior normal to ∂B.
In 2-D equations (18), (19) can be simplified to yield the following recon-
struction formula
f(x) = −
g(y, t′)
|x− y|2 − t′2
 dl(y).
A similar simplification is also possible in 3D resulting in the formula
f(x) =
g(y, t)
) ∣∣∣∣∣
t=|y−x|
dA(y). (20)
Equation (20) is equivalent to one of the formulas derived in [116] for the 3D
case. It is interesting to notice that the “universal” formula of [116] holds for
all geometries when the backprojection type formulas are known: spherical,
cylindrical, and planar. It is not very likely that such explicit formulas would
be available for any closed surfaces S different from spheres (see a related
discussion in [15, 27]).
5.1.4 Series solutions for arbitrary geometries
Although, as we have just mentioned, we do not expect such explicit formulas
to be derived for non-spherical closed surfaces S, there is, however, a different
approach [70] that theoretically works for any closed S and that is practically
useful in some non-spherical geometries.
Let λ2m and um(x) be the eigenvalues and normalized eigenfunctions of
the Dirichlet Laplacian −∆ on the interior Ω of a closed surface S:
∆um(x) + λ
mum(x) = 0, x ∈ Ω, Ω ⊆ R
n, (21)
um(x) = 0, x ∈ S,
||um||
|um(x)|
2dx = 1.
As before, we would like to reconstruct a compactly supported function f(x)
from the known values of its spherical integrals g(y, r) with the centers on S:
g(y, r) =
f(y + rω)rn−1dω, y ∈ S.
We notice that um(x) is the solution of the Dirichlet problem for the Helmholtz
equation with zero boundary conditions and the wave number λm, and thus
it admits the Helmholtz representation
um(x) =
Φλm(|x− y|)
um(y)ds(y) x ∈ Ω, (22)
where Φλm(|x− y|) is a free-space rotationally invariant Green’s function of
the Helmholtz equation (21).
The eigenfunctions {um(x)}
0 form an orthonormal basis in L2(Ω). There-
fore, f(x) can be represented by the series
f(x) =
αmum(x) (23)
um(x)f(x)dx.
Since f(x) is C10 , series (23) converges pointwise. A reconstruction formula of
αm, and thus of f(x), will result if we substitute representation (22) into (23)
and interchange the order of integrations. Indeed, after a brief calculation
we will get
um(x)f(x)dx =
I(y, λm)
um(y)dA(x), (24)
where
I(y, λ) =
Φλ(|x− y|)f(x)dx. (25)
Certainly, the need to know the spectrum and eigenfunctions of the
Dirichlet Laplacian imposes a severe constraint on the surface S. However,
there are simple cases when the eigenfunctions are well known, and fast sum-
mation formulas for the corresponding series are available. Such is the case
of a cubic measuring surface S (see [70]); the eigenfunctions um are products
of sine functions
um(x) =
πm1x1
πm2x2
πm3x3
, (26)
where m = (m1, m2, m3), m1, m2, m3 ∈ N, and the eigenvalues are easily
found as well
λm = π
2|m|2/R2. (27)
Sum (23) is just a regular 3-D Fourier sine series easily computable by ap-
plication of the Fast Sine Fourier transform algorithm. The algorithmic
trick that allows one to calculate fast the coefficients αm consists in com-
puting first integrals (25) on a uniform mesh in λ. This is easily done by
a one-dimensional Fast Cosine Fourier transform algorithm, with Φλ(t) =
cos(λt)/t. The normal derivatives of um(x) are also products of sine func-
tions, this time two-dimensional ones. This, in turn, permits rapid evaluation
of integrals
I(y, λ) ∂
um(y)dA(x) for each mesh value of λ, and for each
one of the six faces ∂Ωi, i = 1, ..., 6 of the cube. Finally, the computation of
αm using equation (24) reduces to the interpolation in the spectral param-
eter λ, since the integrals in the right hand side of this equation have been
computed for the mesh values of this parameter (not for λm). Due to oscil-
latory nature of the integrals (25) a low order interpolation here would lead
to inaccurate reconstructions. Luckily, however, these integrals are analytic
functions of parameter λ (due to the finite support of g). Hence, high order
polynomial interpolation is applicable, and numerics yields very good results.
The algorithm we just described requires O(m3 logm) floating point op-
erations if the reconstruction is to be performed on an m×m×m Cartesian
grid, from comparably discretized data measured on a cubic surface. In prac-
tical terms, it yields reconstructions in the matter of several seconds on grids
with total number of nods exceeding a million [70].
5.1.5 Time reversal (backpropagation) methods
In the constant speed case, the following approach is possible in 3D: due
to the validity of the Huygens’ principle (i.e., the signal escapes from any
bounded domain in finite time), the pressure p(t, x) inside S will become
equal to zero for any time T larger than the time required to cross the domain
(i.e., time that it takes the sound to move along the diameter of S, which
for c = 1 equals the diameter). Thus, one can impose the zero conditions
on p(t, x) for t = T and solve the wave equation (2) back in time, using the
measured data g as the boundary values. The solution of this well posed
problem at t = 0 gives the desired source function f(x). Such methods have
been successfully implemented [22].
Although in 2D or in presence of sound speed variations, Huygens’ princi-
ple does not hold anymore, and thus the signal theoretically will stay forever,
one can find good approximate solutions using a similar approach [4, 18], see
discussions of the variable speed case below.
5.1.6 Examples of reconstructions and additional remarks about
the inversion formulas
• It is well known that different analytic inversion formulas in tomogra-
phy can behave differently in numerical implementation (e.g., in terms
of their stability), However, numerical implementation seems to show
that the analytic (backprojection type) formulas (15)-(20), in spite of
some of them being not equivalent, work equally well. See, for example
the results of an analytic formula reconstruction in 3D shown in Fig.
• It is worth noting that although formulas (15)-(16) and (18)-(20) will
yield identical results when applied to functions that can be represented
as the spherical mean Radon transform of a function supported inside
S, they are in general not equivalent when applied to functions with
larger supports. Simple examples (e.g., of f being the characteristic
Figure 5: A mathematical phantom in 3D (left) and its reconstruction using
an analytic inversion formula.
set of a large ball containing S) show that these two types of formulas
provide different reconstructions.
• An interesting observation is that backprojection formulas (15)-(20) do
not reconstruct the function f correctly inside the surface S, if f has
support reaching outside S. For instance, applying the reconstruction
formulas to the function RS(χ|x|≤3) leads to an incorrect reconstruction
of the value of f = χ|x|≤3 inside S = {|x| ≤ 1}. (Here by χV we denote
the characteristic function of the set V , i.e. it takes the value 1 in V
and zero outside. So, χ|x|≤3 is the characteristic function of the ball of
radius 3 centered at the origin.)
An another example: if one adds to the phantom shown in Fig. 5
two balls to the right of the surrounding sphere S, this leads to strong
artifacts, as seen on Fig. 6.
What is the reason for such a distortion? If one does not know in ad-
vance that f has support inside S, the backprojection formulas shown
before use insufficient information to recover a function with a larger
support, and thus uniqueness of reconstruction is lost. Then the for-
mulas misinterpret the data, wrongly assuming that they came form a
function supported inside S and thus reconstructing the function in-
correctly.
Notice that the series reconstruction of the preceding Section is free of
such problem. E.g., the reconstruction shown in Fig. 7 confirms this.
Figure 6: A perturbed reconstruction, due to presence of two additional balls
outside S (not shown on the picture).
Figure 7: In the phantom shown on the left, most disks are located outside
the square acquisition surface S indicated by the dotted line. This does not
perturb the reconstruction inside S (right).
5.2 Reconstruction in the variable speed case
We will assume here that the sound speed c(x) is smooth, positive, constant
for large x, and non-trapping. Although most analytic techniques we de-
scribed above do not work in the variable speed case, some formulas can be
derived and algorithms can be designed. This work is in a beginning stage
and the results described below most surely can and will be improved.
5.2.1 “Analytic” inversions
Let us denote by Ω the interior of the observation surface S, i.e. the area
where the object to be imaged is located. Consider in Ω the operator A =
−c2(x)∆ with zero Dirichlet conditions on the boundary S = ∂Ω. This
operator is self-adjoint, if considered in the weighted space L2(Ω; c−2(x)).
We also denote by E the operator of harmonic extension, which trans-
forms a function φ on S to a harmonic function on Ω which coincides with φ
on S.
The following result provides a formula for reconstructing f from the data
Theorem 7 [4] The function f(x) in (2) can be reconstructed in Ω as fol-
lows:
f(x) = (Eg|t=0)−
2 sin (τA
2 )E(gtt)(x, τ)dτ. (28)
The validity of this result hinges upon decay estimates for the solution (so
called local energy decay [29, 110, 111]), which hold under the non-trapping
condition. These estimates guarantee a qualified decay of the solution p(t, x)
inside any bounded region, e.g. in Ω, when time t increases. In odd dimen-
sions decay is exponential, but only polynomial in even dimensions. The
decay can be used instead of Huygens’ principle to solve the wave equation
backwards, starting at the infinite time. This leads to the formula (28).
Due to functions of the operator A being involved, it is not that clear how
explicit this formula can be made. For instance, it would be interesting to
see whether one can derive from (28) a backprojection inversion formula for
the case of a constant sound speed and S being a sphere (we have already
seen that such formulas are known).
5.2.2 Backpropagation
The exponential decay at large values of time can be used as follows: for a
sufficiently large T , one can assume that the solution is practically zero at
t = T . Thus, imposing zero initial conditions at t = T and solving in reverse
time direction, one arrives at t = 0 to an approximation of f(x) [18].
5.2.3 Eigenfunction expansions
One natural way to try to use the formula (28) is to use eigenfunction expan-
sion of the operator A in Ω (assuming that such expansion is known). This
immediately leads to the following result:
Theorem 8 Under the same conditions on the sound speed as before, func-
tion f(x) can be reconstructed inside Ω from the data g in (2), as the following
L2(B)-convergent series:
f(x) =
fkψk(x), (29)
where the Fourier coefficients fk can be recovered using one of the following
formulas:
fk = λ
k gk(0)− λ
sin (λkt)g
k(t)dt,
fk = λ
k gk(0) + λ
cos (λkt)g
k(t)dt, or
fk = −λ
sin (λkt)gk(t)dt = −λ
sin (λkt)g(x, t)
(x)dxdt,
gk(t) =
g(x, t)
(x)dx.
Here ν denotes the external normal to S.
One notices that this is a generalization to the variable sound speed case
of the expansion method of [70], discussed in Section 5.1.4. An interesting
feature is that, unlike in [70], we do not need to know the whole space Green’s
function for A (which is certainly not known).
It is not clear yet how feasible numerical implementation of this approach
could be.
6 Partial data. “Visible” and “invisible” sin-
gularities
Uniqueness of reconstruction does not imply practical recoverability, since the
reconstruction procedure might be severely unstable. This is well known to
be the case, for instance, in incomplete data situations in X-ray tomography,
and even for complete data problems in some imaging modalities, such as
the electrical impedance tomography [64, 68, 76, 77].
In order to describe the results below, we need to explain the notion
of the wave front set WF (f) of a function f(x). This set carries detailed
information on singularities of f(x). It consists of pairs (x, ξ) of a point x
in space and a wave vector (Fourier domain variable) ξ 6= 0. It is easier to
say what it means that a point (x0, ξ0) is not in the wave front set WF (f).
This means that one can smoothly cut-off f to zero at a small distance from
x0 in such a way that the Fourier transform φ̂f(ξ) of the resulting function
φ(x)f(x) decays faster than any power of ξ in directions that are close to
the direction of ξ0. We remind the reader that if this Fourier transform
decays that way in all directions, then f(x) is smooth near the point x0. So,
the wave front set contains pairs (x0, ξ0) such that f is not smooth near x0,
and ξo indicates why it is not: the Fourier transform does not decay well in
this direction. For instance, if f(x) consists of two smooth pieces joined non-
smoothly across a smooth interface Σ, then WF (f) contains pairs (x, ξ) such
that x is in Σ and ξ is normal to Σ at x. One can find simple introduction
to the notions of microlocal analysis, such as the wave front set, for instance
in [106].
Analysis done in [99] for the constant speed case (equivalently, for the
spherical mean transform RS), showed which parts of the wave front (and
thus singularities) of a function f can be recovered from its partial X-ray
data. An analog of this result also holds for the spherical mean transform
RS [73] (see also [120] for a practical discussion). We formulate it below in
an imprecise form (see [73] for precise formulation).
Theorem 9 [73] A wavefront set point (x, ξ) of f is “stably recoverable”
from RSf if and only if there is a circle (sphere in higher dimensions) centered
on S, passing through x, and normal to ξ at this point.
As we have already mentioned, this result does not exactly hold the way it is
formulated and needs to include some precise conditions (see [73, Theorem
3]). The statement is, for instance, correct if S is a smooth hypersurface and
the support of f lies on one side of the tangent plane to S at the center of
the sphere mentioned in the theorem.
Talking about jump singularities only (i.e., interfaces between smooth
regions inside the object to be imaged), this result says that in order for a
piece of the interface to be stably recoverable (dubbed “visible”), one should
have for each point of this interface, a sphere centered at S and tangent to
the interface at this point. Otherwise, the interface will be blurred away
(even if there is a uniqueness of reconstruction theorem). The reason is that
if all spheres of integration are transversal to the interface, the integration
smoothes off the singularity, and therefore its recovery becomes highly unsta-
ble (numerically, one has to deal with inversion of a matrix with exponentially
fast decaying singular values). The Figure 8 below shows an example of an
incomplete data reconstruction from spherical mean data. One sees clearly
the effect of disappearance of the parts of the boundaries that are not touched
tangentially by circles centered at transducers’ locations.
Figure 8: Effect of incomplete data: the phantom (left) and its incomplete
data reconstruction. The transducers were located along a 180o circular arc
(the left half of a large circle surrounding the squares).
7 Range conditions
As it has already been mentioned, the space of functions g(t, y) that could
arise as exact data measured by transducers (i.e., the range of the data),
is very small (of infinite codimension in the spaces of all functions of t >
0, y ∈ S). Knowing this space (range) is useful for many theoretical and
practical purposes (reconstruction algorithms, error corrections, incomplete
data completion, etc.), and thus has attracted a lot of attention (e.g., [30,
38, 39, 40, 53, 54, 64, 66, 67, 68, 74, 76, 77, 78, 90, 100].
For instance, for the standard Radon transform
f(x) → g(s, ω) =
x·ω=s
f(x)dx, |ω| = 1,
the range conditions on g(s, ω) are:
1. evenness: g(−s,−ω) = g(s, ω)
2. moment conditions: for any integer k ≥ 0, the kth moment
Gk(ω) =
skg(ω, s)ds
extends from the unit circle of vectors ω to a homogeneous polynomial
of degree k in ω.
The evenness condition is obviously necessary and is kind of “trivial”. It
seems that the only non-trivial conditions are the moment ones. However,
here the standard Radon transform misleads us, as it often happens. In fact,
for more general transforms of Radon type it is often easy (or easier) to find
analogs of the moment conditions, while analogs of the evenness conditions
are often elusive (see [64, 66, 67, 76, 77, 84] devoted to the case of SPECT
(single photon emission tomography)). The same happens in TAT.
Let us deal first with the case of a constant sound speed, when one can
think of the spherical mean transform RS instead of the wave equation model.
An analog of the moment conditions was already present implicitly (without
saying that these were range conditions) in [71, 72, 7] and explicitly formu-
lated as such in [95]. Indeed, our discussion in Section 4 of the polynomials
Qk provides the following conditions of the moment type:
Moment conditions [7, 71, 72, 95] on data g(p, r) = RSf(p, r) look as
follows: for any integer k ≥ 0, the moment
Mk(ω) =
r2k+d−1g(p, r)dr
can be extended from S to a (non-homogeneous) polynomial Qk(x) of degree
at most 2k.
These conditions, however, are incomplete, and in fact infinitely many
others, which play the role of an analog of evenness, need to be added.
Complete range descriptions for RS when S is a circle in 2D were discov-
ered in [13] and then in odd dimensions in [34]. They were then extended
to any dimension and interpreted in several different ways in [6]. These
conditions happen to be intimately related to PDEs and spectral theory.
Figure 9:
In order to describe these conditions, we need to introduce some notations.
Let B be the unit ball in Rd, S - the unit sphere, and C - the cylinder B×[0, 2]
(see Fig. 9).
We introduce the spherical mean operator RS as before:
RSf(x, t) =
|y|=1
f(x+ ty)dA(y), x ∈ S.
Several different range descriptions for RS were provided in [6], out of
which we only show a few:
Theorem 10 [6] The following three statements are equivalent:
1. The function g ∈ C∞0 (S × [0, 2]) is representable as RSf for some
f ∈ C∞0 (B). (In other words, g represents an ideal (free of errors) set
of TAT data.)
2. (a) The moment conditions are satisfied.
(b) Let −λ2 be any eigenvalue of the Laplace operator in B with zero
Dirichlet conditions and ψλ be the corresponding eigenfunction.
Then the following orthogonality condition is satisfied:
S×[0,2]
g(x, t)∂νψλ(x)jn/2−1(λt)t
n−1dxdt = 0. (31)
Here jp(z) = cp
Jp(z)
is the so called spherical Bessel function.
3. (a) The moment conditions are satisfied.
(b) Let ĝ(x, λ) =
g(x, t)jn/2−1(λt)t
n−1dt. Then, for any m ∈ Z, the
mth spherical harmonic term ĝm(x, λ) of ĝ(x, λ) vanishes at all
zeros λ 6= 0 of Bessel function Jm+n/2−1(λ).
Remark 11 [6]
1. In odd dimensions, moment conditions are not necessary, and thus con-
ditions 2(b) or 3(b) suffice. (A similar earlier result was established for
a related transform in [34].)
2. The range conditions (2) of the previous Theorem are also necessary
when S is the boundary of any bounded domain, not necessarily a
sphere.
3. An analog of these conditions can be derived for a variable sound speed
(without non-trapping conditions imposed).
8 Concluding remarks
8.1 Variations of the TAT procedure
8.1.1 Planar and linear transducers
Assuming that transducers are point-like, is clearly an approximation, and
in fact a transducer measures the average pressure over its area. It has
been rightfully claimed that the point approximation for transducers should
lead to some blurring in the reconstructions. This, as well as intricacies of
reconstructions from the data obtained by point transducers, triggered recent
proposals for different types of transducers (see [20, 21], [47]-[52], [92, 93]).
In these papers, it was suggested to use either planar, or line detectors.
In the first case [47], the detectors are assumed to be large and planar,
ideally assumed to be approximations of infinite planes that are placed tan-
gentially to a sphere containing the object. Thus, the data one collects is
the integrals of the pressure over these planes, for all values of t > 0. If one
takes the standard 3D Radon transform of the pressure p(x, t) with respect
to x:
P (x, t) 7→ q(s, t, ω) =
x·ω=s
p(x, t)dA(x),
where dA is the surface measure and ω is a unit vector in R3, this is well
known to reduce the 3D Laplace operator ∆x to the second derivative ∂
2/∂s2
[30, 38, 39, 40, 53, 54], and thus the 3D wave equation to the string vibra-
tion problem. The measured data provide the boundary conditions for this
problem. The initial conditions in (1) mean evenness with respect to time,
and thus the standard d’Alambert formula leads to the immediate realiza-
tion that the measured data is just the 3D Radon transform of f(x). Thus,
the reconstruction boils down to the well known inversion formulas for the
Radon transform.
Another proposal ([20, 21], [49]-[52], [92, 93]) is to use line detectors
that provide line integrals of the pressure p(x, t). Such detectors can be
implemented optically, using either Fabry-Perot [20], or Mach-Zehnder [93]
interferometers.
Suppose that the object is surrounded by a surface that is rotation in-
variant with respect to the z-axis. It is suggested to place the line detectors
perpendicular to the z-axis and tangential to the surface. The same consid-
eration as above then shows that after the 2D Radon (or X-ray, which in
2D is the same) transform in each plane orthogonal to z-axis, the 3D wave
equation converts into the 2D one for the Radon data. The measurements
provide the boundary data. Thus, the reconstruction boils down to solving
a 2D problem similar to the one in the case of point detectors, and then
inverting the 2D Radon transform.
Due to the recent nature of these two projects, it appears to be too early
to judge which one will be superior in the end. For instance, it is not clear
beforehand, whether the approximation of infinite size (length, area) of the
linear or planar detectors works better than the zero dimension approxima-
tion for point detectors. Further developments will resolve these questions.
8.1.2 Direct imaging techniques
Some direct imaging techniques have been suggested, which might not require
mathematical reconstructions. See, for instance, [79] about an acoustic lens
system.
8.1.3 Using contrast agents
Contrast agents to improve TAT imaging have been developed (e.g., [24]).
8.1.4 Passive thermoacoustic imaging
The TAT model we have considered can be called “active thermoacoustic to-
mography,” due to the set-up when the practitioner creates the signal. There
has been some recent development of the “passive thermoacoustic tomogra-
phy,” where the thermoacoustic signal is used to image the temperature
sources present inside the body. One can find a survey of this area in [94].
8.2 Uniqueness
8.2.1 Sketch of the proof of Theorem 10
We provide here a brief outline of the rather technical proof of Theorem 10.
Suppose that f is compactly supported, not identically zero, and such
that RSf = 0. Our previous considerations show that one can assume that
S is an algebraic curve (not a straight line) that is contained in the set of
zeros of a non-trivial harmonic polynomial. Now one touches the boundary
of the support of f from outside by a circle centered on S. Then microlocal
analysis of the operator RS (which happens to be an analytic Fourier Integral
Operator, FIO [19, 42, 43, 44, 45, 65, 98]) shows that, due to the equality
RSf = 0, at the tangency point the vector co-normal to the sphere should not
belong to the analytic wave front of f (microlocal regularity of solutions of
RSf = 0). This, for instance, can be also extracted from the results of [105].
On the other hand, a theorem by Hörmander and Kashiwara [56, Theorem
8.5.6] shows that this vector must be in the analytic wave front set, since
f = 0 on one side of the sphere (a microlocal version of uniqueness of analytic
continuation). This way, one gets a contradiction. Unfortunately, the life is
not so easy, and the proof sketched above does not go through smoothly, due
to possible cancelation of wavefronts at different tangency points. Then one
has to involve the geometry of zeros of harmonic polynomials [37] to exclude
the possibility of such a cancelation.
Thus, the proof uses microlocal analysis and geometry of zeros of har-
monic polynomials. Both these tools have their limitations. For instance,
the microlocal approach (at least, in the form it is used in [7]) does not al-
low considerations of non-compactly supported functions. Thus, the validity
of the Theorem for arbitrarily fast decaying, but not compactly supported,
functions is still not established, albeit it most certainly holds. On the other
hand, the geometric part does not work that well in dimensions higher than
two. Development of new approaches is apparently needed in order to over-
come these hurdles. A much simpler PDE approach has emerged recently
[33] (see also [12] and the next Section), albeit its achievements have been
limited so far.
8.2.2 Some open problems concerning uniqueness
As it has already been mentioned, one can consider the practical problems
about uniqueness resolved. However, the mathematical understanding of the
uniqueness problem for the restricted spherical mean operators RS is still
unsatisfactory. Here are some questions that still await their resolution:
1. Describe uniqueness sets in dimensions larger than 2 (prove the Con-
jecture 4). Recent limited progress, as well as variations on this theme
can be found in [1]-[12].
2. Prove Theorem 3 without using microlocal and harmonic polynomial
tools.
3. Prove Theorem 3 on uniqueness sets S under the condition of suffi-
ciently fast decay (rather than compactness of support) of the function.
Very little is known for the case of functions without compact support.
The main known result is of [3], which describes for which values of
1 ≤ p ≤ ∞ the result of Corollary 2 still holds:
Theorem 12 [3] Let S be the boundary of a bounded domain in Rd and
f ∈ Lp(Rd) such that RSf ≡ 0. If p ≤ 2d/(d−1), then f ≡ 0 (and thus
S is injectivity set for this space). This fails for any p > 2d/(d− 1).
8.3 Inversion
Albeit closed form (backprojection type) inversion formulas are available now
for the cases of S being a plane (and object on one side from it), cylinder,
and a sphere, there is still some mystery surrounding this issue.
1. Can one write a backprojection type inversion formula in the case of
the constant sound speed for a closed surface S which is not a sphere?
We suspect that the answer to this question is negative (see also related
discussion in [15, 27]).
2. The inversion formulas for S being a sphere assume that the object to
be imaged is inside S. One can check on simplest examples that if the
support of function f(x) reaches outside S, the inversion formulas do
not reconstruct the function correctly even inside of S. See [5] for a
discussion.
3. The I. Gelfand’s school of integral geometry has developed a marvelous
machinery of the so called κ operator, which provides a general ap-
proach to inversion and range descriptions for transforms of Radon type
[38, 39]. In particular, it has been applied to the case of integration of
various collections (“complexes”) of spheres in [39, 41]. This consider-
ation seems to suggest that one should not expect explicit closed form
inversion formulas for RS when S is a sphere. We, however, know that
such formulas have been discovered recently [33, 69]. This apparent
controversy has not been resolved.
4. Can one derive any more explicit analytic formulas from (28)?
5. Can the series expansion formulas of Theorem 8 be efficiently imple-
mented?
One can also mention that in some works [15, 23] it is suggested to use in
the TAT problem not only the values of the pressure measured by transducers
on the observation surface S, but its normal derivative to S as well. If one
knows both, then taking Fourier transform in the time variable and using the
whole space Green’s function for the Helmholtz equation leads immediately
to a reconstruction formula for the solution (which seems to be much simpler
than what is proposed in [23]). The problem is that this normal derivative is
not measured by TAT devices. Under some circumstances (e.g., when there
are no sources of ultrasound outside S), one can prove the theoretical pos-
sibility of recovering the missing normal derivative. This, however, does not
seem to us to be a plausible procedure. In rare cases (planar, cylindrical,
or spherical surface S), when involvement of the normal derivative can be
eliminated (e.g., [15, 27]), this might lead to feasible inversion algorithms,
but in these cases, as explained before in this text, explicit and nicely im-
plementable analytic inversion formulas are available. So, jury is still out on
this issue as well.
8.4 Stability
Stability of inversion when S is a sphere surrounding the support of f(x) is
the same as for the standard Radon transform, as the results of [91] and sec-
ond statement of Theorem 11 show. However, if the support reaches outside,
albeit Corollary 2 still guarantees uniqueness of reconstruction, stability (at
least for the parts outside S) is gone. Indeed, Theorem 9 shows that some
parts of singularities of f outside S will not be stably “visible.”
8.5 Range
As Theorem 9 states, the range conditions 2 and 3 of Theorem 10 are neces-
sary also for non-spherical closed surfaces S and for functions with support
outside S. They, however, are not expected to be sufficient, since Theorem 9
indicates that one might expect non-closed ranges in some cases. The same
applies for non-constant sound speed case.
Acknowledgments
The work of the first author was partially supported by the NSF DMS grants
0604778 and 0648786. The second author was partially supported by the
DOE grant DE-FG02-03ER25577 and NSF DMS grant 0312292. Part of
this work was completed when the first author was at the Isaac Newton
Institute for Mathematical Sciences. The authors express their gratitude to
the NSF, DOE and INI for this support. The authors thank M. Agranovsky
for extremely useful discussions, M. Anastasio, G. Beylkin and M. Klibanov
for providing preprints and references, and the reviewers for very helpful
remarks.
References
[1] Agranovsky, M. 1997 Radon transform on polynomial level sets and
related problems. Israel Math. Conf. Proc., 11, 1-21.
[2] Agranovsky, M. 2000 On a problem of injectivity for the Radon trans-
form on a paraboloid. Analysis, geometry, number theory: the mathe-
matics of Leon Ehrenpreis (Philadelphia, PA, 1998). Contemp. Math.
251, AMS, Provodence, RI, 1–14.
[3] Agranovsky, M., Berenstein, C., & Kuchment, P. 1996 Approximation
by spherical waves in Lp-spaces, J. Geom. Anal. 6, no. 3, 365–383.
[4] Agranovsky, M. & Kuchment, P. 2007 Uniqueness of reconstruction and
an inversion procedure for thermoacoustic and photoacoustic tomogra-
phy with variable sound speed. Inverse Problems 23, 2089–2102.
[5] Agranovsky, M., Kuchment, P. & Kunyansky, L. 2007 On reconstruc-
tion formulas and algorithms for the thermoacoustic and photoacoustic
tomography. Submitted.
[6] Agranovsky, M., Kuchment, P., & Quinto, E. T. 2007 Range descriptions
for the spherical mean Radon transform. J. Funct. Anal. 248, 344–386.
[7] Agranovsky, M. & Quinto, E. T. 1996 Injectivity sets for the Radon
transform over circles and complete systems of radial functions. Journal
of Functional Analysis, 139, 383–414.
[8] Agranovsky, M. & Quinto, E. T. 2001 Geometry of stationary sets for
the wave equation in Rn:the case of finitely suported initial data. Duke
Math. J., 107, no. 1, 57–84.
[9] Agranovsky, M. & Quinto, E. T. 2003 Stationary sets for the wave equa-
tion in crystallographic domains. Trans. AMS, 355, no. 6, 2439–2451.
[10] Agranovsky, M. & Quinto, E. T. 2006 Remarks on stationary sets for the
wave equation. Integral Geometry and Tomography, Contemp. Math.
405, 1–11.
[11] Agranovsky, M., Volchkov, V. V. & Zalcman, L. 1999 Conical uniqueness
sets for the spherical Radon transform. Bull. London Math. Soc., 31, no.
4, 363–372.
[12] Ambartsoumian, G. & Kuchment, P. 2005 On the injectivity of the
circular Radon transform. Inverse Problems 21, 473–485.
[13] Ambartsoumian, G. & Kuchment, P. 2006 A range description for the
planar circular Radon transform. SIAM J. Math. Anal. 38, no. 2, 681–
[14] Ambartsoumian, G. & Patch, S. 2007 Thermoacoustic tomography: nu-
merical results. Proceedings of SPIE 6437, Photons Plus Ultrasound:
Imaging and Sensing 2007: The Eighth Conference on Biomedical Ther-
moacoustics, Optoacoustics, and Acousto-optics, Alexander A. Oraevsky,
Lihong V. Wang, Editors, 64371B.
[15] Anastasio, M. A., Zhang, J., Modgil, D., and La Rivière, P. J. 2007
Application of inverse source concepts to photoacoustic tomography,
preprint.
[16] Andersson, L.-E.. 1988 On the determination of a function from spherical
averages. SIAM J. Math. Anal. 19 no. 1, 214–232.
[17] Asgeirsson, L. 1937 Über eine Mittelwerteigenschaft von Lösungen ho-
mogener linearer partieller Differentialgleichungen zweiter Ordnung mit
konstanten Koeffizienten. Ann. Math., 113, 321–346.
[18] Bangerth, W., Georgieva-Hristova, Y. & Kuchment, P. 2007 On recon-
struction in thermoacoustic tomography with variable speed, in prepa-
ration.
[19] Beylkin, G. 1984 The inversion problem and applications of the gener-
alized Radon transform. Comm. Pure Appl. Math. 37, 579–599.
[20] Burgholzer, P., Hofer, C., Paltauf, G., Haltmeier, M., & Scherzer, O.
2005 Thermoacoustic tomography with integrating area and line detec-
tors. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency
Control 52(9), 1577–1583.
[21] Burgholzer, P.,, Hofer, C., Matt, G. J., Paltauf, G., Haltmeier, M.
& Scherzer, O. 2006 Thermoacoustic tomography using a fiber-based
Fabry-Perot interferometer as an integrating line detector. Proc. SPIE
6086, 434–442.
[22] Burgholzer, P., Matt, G., Haltmeier, M. & Patlauf, G. 2007 Exact and
approximate imaging methods for photoacoustic tomography using an
arbitrary detection surface. Phys. Rev. E 75, 046706.
[23] Clason, C. and Klibanov, M. 2007 Quasireversibility method in ther-
moacoustic tomography in heterogeneous medium, preprint.
[24] Copland, J. A. et al. 2004 Bioconjugated gold nanoparticles as a molec-
ular based contrast agent: implications for imaging of deep tumors using
optoacoustic tomography. Molecular Imaging and Biology 6, no. 5, 341–
[25] Courant, R. & Hilbert, D. 1962 Methods of Mathematical Physics, Vol-
ume II Partial Differential Equations, Interscience, New York.
[26] Denisjuk, A. 1999 Integral geometry on the family of semi-spheres. Fract.
Calc. Appl. Anal. 2, no. 1, 31–46.
[27] Devaney, A. J. and Beylkin, G. 1984 Diffraction tomography using arbi-
trary transmitter and receiver surfaces. Ultrasonic Imaging 6, 181-193.
1984.
[28] Diebold, G. J., Sun, T. & Khan, M. I. 1991 Photoacoustic monopole
radiation in one, two, and three dimensions. Phys. Rev. Lett. 67, no.
24, 3384–3387.
[29] Egorov, Yu. V. & Shubin, M. A. 1992 Partial Differential Equations I.
Encyclopaedia of Mathematical Sciences, (Springer Verlag), 30, 1–259
[30] Ehrenpreis, L. 2003 The Universality of the Radon Transform, Oxford
Univ. Press.
[31] Fawcett, J. A. 1985 Inversion of n-dimensional spherical averages. SIAM
J. Appl. Math. 45, no. 2, 336–341.
[32] Finch, D., Haltmeier, M. & Rakesh 2007 Inversion of spherical
means and the wave equation in even dimensions. Preprint arXiv
math.AP/0701426.
[33] Finch, D., Patch, S. & Rakesh 2004 Determining a function from its
mean values over a family of spheres. SIAM J. Math. Anal. 35, no. 5,
1213–1240.
[34] Finch, D. & Rakesh 2006 The range of the spherical mean value operator
for functions supported in a ball. Inverse Problems 22, 923-938.
[35] Finch, D. & Rakesh 2007 Recovering a function from its spherical mean
values in two and three dimensions. Preprint.
http://arxiv.org/abs/math/0701426
[36] Finch, D. & Rakesh 2007 The spherical mean value operator with centers
on a sphere. Preprint. To appear in Inverse Problems.
[37] Flatto, L., Newman, D. J. & Shapiro, H. S. 1966 The level curves of
harmonic functions. Trans. Amer. Math. Soc. 123, 425–436.
[38] Gelfand, I., Gindikin, S. & Graev M. 1980 Integral geometry in affine
and projective spaces. J. Sov. Math. 18, 39–167.
[39] Gelfand, I., Gindikin, S. & Graev M. 2003 Selected Topics in Integral
Geometry. Transl. Math. Monogr. v. 220, Amer. Math. Soc., Providence
[40] Gelfand, I., Graev M. & Vilenknin, N. 1965 Generalized Functions, v.
5: Integral Geometry and Representation Theory, Acad. Press.
[41] Gindikin, S. 1995 Integral geometry on real quadrics, in Lie groups and
Lie algebras: E. B. Dynkin’s Seminar, 23–31, Amer. Math. Soc. Transl.
Ser. 2, 169, Amer. Math. Soc., Providence, RI.
[42] Greenleaf, A. & Uhlmann, G. 1990 Microlocal techniques in integral
geometry. Contemporary Math. 113, 149–155.
[43] Guillemin, V. 1975 Fourier integral operators from the Radon transform
point of view. Proc. Symposia in Pure Math., 27, 297–300.
[44] Guillemin, V. 1985 On some results of Gelfand in integral geometry.
Proc. Symposia in Pure Math., 43, 149–155.
[45] Guillemin, V. & Sternberg S. 1977 Geometric Asymptotics. Amer. Math.
Soc., Providence, RI.
[46] Gusev, V. E. & Karabutov, A. A. 1993 Laser Optoacoustics. American
Inst. of Physics, NY.
[47] Haltmeier, M., Burgholzer, P., Paltauf, G. & Scherzer, O. 2004 Ther-
moacoustic computed tomography with large planar receivers. Inverse
Problems 20, 1663–1673.
[48] Haltmeier, M., Schuster, T. & O. Scherzer. 2005 Filtered backprojec-
tion for thermoacoustic computed tomography in spherical geometry.
Mathematical Methods in the Applied Sciences, 28, 1919–1937.
[49] Haltmeier, M., Paltauf, G., Burgholzer, P. & Scherzer, O. 2005 Ther-
moacoustic Tomography with integrating line detectors. Proc. SPIE
5864:586402-8.
[50] Haltmeier, M., Burgholzer, P., Hofer, C., Paltauf, G., Nuster, R. &
Scherzer, O. 2005 Thermoacoustic tomography using integrating line
detectors. Ultrasonics Symposium 1, 166–169.
[51] Haltmeier, M., Scherzer, O., Burgholzer, P. & Paltauf, G. 2005 Ther-
moacoustic Computed Tomography with large planar receivers. ECMI
Newsletter 37, pp. 31-34. http://www.it.lut.fi/mat/EcmiNL/ecmi37/
[52] Haltmeier, M., & Fidler, T. Mathematical Challenges Aris-
ing in Thermoacoustic Tomography with Line Detectors, preprint
arXiv:math.AP/0610155.
[53] Helgason, S. 1980 The Radon Transform, Birkhäuser, Basel.
[54] Helgason, S. 2000 Groups and Geometric Analysis. Amer. Math. Soc.,
Providence, R.I.
[55] Herman, G.(Ed.) 1979 Image Reconstruction from Projections . Topics
in Applied Physics, v. 32, Springer Verlag, Berlin, New York.
[56] Hörmander, L. 1983 The Analysis of Linear Partial Differential Opera-
tors, vol. 1, Springer-Verlag, New York.
[57] Jin, X. & Wang, L. V. 2006 Thermoacoustic tomography with correction
for acoustic speed variations. Physics in Medicine and Biology 51, 6437–
6448.
[58] John, F. 1971 Plane Waves and Spherical Means Applied to Partial
Differential Equations, Dover.
[59] Kak, A. C. & Slaney, M. 2001 Principles of Computerized Tomographic
Imaging. SIAM, Philadelphia.
[60] Köstli K. P., Frenz, M., Bebie, H. & Weber H. P. 2001 Temporal back-
ward projection of optoacoustic pressure transients using Fourier trans-
form methods. Phys. Med. Biol. 46, 1863–1872
http://www.it.lut.fi/mat/EcmiNL/ecmi37/
http://arxiv.org/abs/math/0610155
[61] Kruger, R. A., Kiser, W. L., Reinecke, D. R. & Kruger, G. A. 2003 Ther-
moacoustic computed tomography using a conventional linear trans-
ducer array. Med. Phys. 30, no 5, 856–860.
[62] Kruger, R. A., Liu. P., Fang, Y. R. & Appledorn, C. R. 1995 Photoa-
coustic ultrasound (PAUS)reconstruction tomography. Med. Phys. 22,
1605–1609.
[63] Kuchment, P. 1993, unpublished.
[64] Kuchment, P. 2006 Generalized Transforms of Radon Type and Their
Applications. in [85], pp. 67–91.
[65] Kuchment, P., Lancaster, K. & Mogilevskaya, L. 1995 On local tomog-
raphy. Inverse Problems, 11, 571–589.
[66] Kuchment, P. & Lvin S. 1990 Paley-Wiener theorem for the exponential
Radon transform. Acta Applicandae Mathematicae, no.18, 251–260.
[67] Kuchment, P. & Lvin S. 1991 The Range of the Exponential Radon
Transform. Soviet Math Dokl, 42, no.1, 183–184.
[68] Kuchment, P. & Quinto, E. T. 2003. Some problems of integral geometry
arising in tomography. Chapter XI in [30].
[69] Kunyansky, L. 2007 Explicit inversion formulae for the spherical mean
Radon transform. Inverse problems 23 (2007), 737-783.
[70] Kunyansky, L. 2007 A series solution and a fast algorithm for the
inversion of the spherical mean Radon transform. Preprint arXiv
math.AP/0701236.
[71] Lin, V. & Pinkus, A. 1993 Fundamentality of ridge functions. J. Approx.
Theory, 75, 295–311.
[72] Lin, V. & Pinkus, A 1994 Approximation of multivariate functions. In
Advances in Computational Mathematics, H. P. Dikshit & C. A. Mic-
chelli, Eds., World Sci. Publ., 1–9.
[73] Louis, A. K. & Quinto, E. T. 2000 Local tomographic methods in Sonar.
In Surveys on solution methods for inverse problems, Springer, Vienna,
147–154.
http://arxiv.org/abs/math/0701236
[74] Lvin S. 1994 Data correction and restoration in emission tomography,
in E.T. Quinto, M. Cheney, and P. Kuchment (Editors), Tomography,
Impedance Imaging, and Integral Geometry, Lectures in Appl. Math.,
vol. 30, AMS, Providence, RI, 149–155.
[75] Mathematics and Physics of Emerging Biomedical Imaging,
The National Academies Press 1996. Available online at
http://www.nap.edu/catalog.php?record id=5066#toc.
[76] Natterer, F. 1986 The mathematics of computerized tomography, Wiley,
New York.
[77] Natterer, F. & Wübbeling, F. 2001 Mathematical Methods in Image Re-
construction, Monographs on Mathematical Modeling and Computation
5, SIAM, Philadelphia, PA.
[78] Nessibi, M. M., Rachdi, L. T. & Trimeche, K. 1995 Ranges and inversion
formulas for spherical mean operator and its dual. J. Math. Anal. Appl.
196, no. 3, 861–884.
[79] Niederhauser, J. J., Jaeger, M., Lemor, R., Weber, P. & Frenz, M.
2005 Combined ultrasound and optoacoustic system for real-time high-
contrast vascular imaging in vivo. IEEE Transactions on medical Imag-
ing 24, 436–440.
[80] Nilsson, S. 1997 Application of fast backprojection techniques for some
inverse problems of integral geometry. Linkoeping studies in science and
technology, Dissertation 499, Dept. of Mathematics, Linkoeping univer-
sity, Linkoeping, Sweden.
[81] Nolan, C. J. & Cheney, M. 2002 Synthetic aperture inversion. Inverse
Problems 18, 221–235.
[82] Norton, S. J. 1980 Reconstruction of a two-dimensional reflecting
medium over a circular domain: exact solution. J. Acoust. Soc. Am.
67, 1266–1273.
[83] Norton, S. J. & Linzer, M. 1981 Ultrasonic reflectivity imaging in three
dimensions: exact inverse scattering solutions for plane, cylindrical, and
spherical apertures. IEEE Transactions on Biomedical Engineering, 28,
200–202.
http://www.nap.edu/catalog.php?record_id=5066#toc
[84] Novikov, R. 2002 On the range characterization for the two-dimensional
attenuated X-ray transform. Inverse Problems 18, 677–700.
[85] Olafsson, G. & Quinto, E. T. (Editors), 2006 The Radon Transform, In-
verse Problems, and Tomography. American Mathematical Society Short
Course January 3–4, 2005, Atlanta, Georgia, Proc. Symp. Appl. Math.,
v. 63, AMS, RI.
[86] Oraevsky, A. A., Esenaliev, R. O., Jacques, S. L. Tittel, F. K. 1996
Laser optoacoustic tomography for medical diagnostics principles. Proc.
SPIE 2676, 22.
[87] Oraevsky, A. A. & Karabutov, A. A. 2002 In Handbook of Optical
Biomedical Diagonstics, edited by V. V. Tuchin, SPIE, Bellingham, WA,
Chap. 10.
[88] Oraevsky A. A.& A. A. Karabutov, A. A., 2003 Optoacoustic Tomogra-
phy, Ch. 34 In Biomedical Photonics Handbook, edited by T. Vo-Dinh,
CRC, Boca Raton, FL, Chap. 34, 34-1 – 34-34.
[89] Palamodov, V. P. 2000 Reconstruction from limited data of arc means.
J. Fourier Anal. Appl. 6, no. 1, 25–42.
[90] Palamodov, V. P. 2004 Reconstructive Integral Geometry, Birkhäuser,
Basel.
[91] Palamodov, V. 2006 Remarks on the general Funk transform. Preprint,
Tel Aviv University, August.
[92] Paltauf, G., Burgholzer, P., Haltmeier, M. & O. Scherzer 2005 Ther-
moacoustic Tomography using optical Line detection. Proc. SPIE 5864,
7–14.
[93] Paltauf, G., R. Nuster, Haltmeier, M. & Burgholzer, P. 2007(?) Ther-
moacoustic Computed Tomography using a Mach-Zehnder interferome-
ter as acoustic line detector. Submitted
[94] Passechnik, V. I., Anosov, A. A. & Bograchev, K. M. 2000 Fundamentals
and prospects of passive thermoacoustic tomography. Critical reviews in
Biomed. Eng. 28, no. 3&4, 603–640.
[95] Patch, S. K. 2004 Thermoacoustic tomography - consistency conditions
and the partial scan problem. Phys. Med. Biol. 49, 1–11.
[96] Popov D. A. & Sushko, D. V. 2002 A parametrix for the problem of
optical-acoustic tomography. Dokl. Math. 65, no. 1, 19–21.
[97] Popov D. A. & Sushko, D. V. 2004 Image restoration in optical-acoustic
tomography. Problems of Information Transmission 40, no. 3, 254–278.
[98] Quinto, E. T. 1980 The dependence of the generalized Radon transform
on defining measures. Trans. Amer. Math. Soc. 257, 331–346.
[99] Quinto, E. T. 1993 Singularities of the X-ray transform and limited data
tomography in R2 and R3. SIAM J. Math. Anal. 24, 1215–1225.
[100] Quinto, E. T. 2006 An introduction to X-ray tomography and Radon
transforms. In [85], 1–23.
[101] Ramm, A. G. 1985 Inversion of the backscattering data and a problem
of integral geometry. Phys. Lett. A 113, no. 4, 172–176.
[102] Ramm, A. G. 2002 Injectivity of the spherical means operator. C. R.
Math. Acad. Sci. Paris 335, no. 12, 1033–1038.
[103] Romanov, V. G. 1967 Reconstructing functions from integrals over a
family of curves. Sib. Mat. Zh. 7, 1206–1208.
[104] Schuster, T. & Quinto, E. T. 2005 On a regularization scheme for linear
operators in distribution spaces with an application to the spherical
Radon transform. SIAM J. Appl. Math. 65 (4), 1369–1387.
[105] Stefanov, P. & Uhlmann, G. Integral geometry of tensor fields on a class
of non-simple Riemannian manifolds. Preprint arXiv:math/0601178.
[106] Strichartz, Robert S. 2003 A Guide to Distribution Theory and Fourier
Transforms. World. Sci.
[107] Tam, A. C. 1986 Applications of photoacoustic sensing techniques. Rev.
Mod. Phys. 58, no. 2, 381–431.
[108] Tataru, D. 1995 Unique continuation for solutions to PDEs; between
Hörmander’s theorem and Holmgren’s theorem. Comm. PDE 20, 814–
http://arxiv.org/abs/math/0601178
[109] Tuchin, V. V. (Editor) 2002 Handbook of Optical Biomedical Diagnos-
tics. SPIE, Bellingham, WA.
[110] Vainberg, B. 1975 The short-wave asymptotic behavior of the solutions
of stationary problems, and the asymptotic behavior as t → ∞ of the
solutions of nonstationary problems. Russian Math. Surveys, 30, no. 2,
1–58.
[111] Vainberg, B. 1982. Asymptotics methods in the Equations of Mathe-
matical Physics. (Gordon & Breach)
[112] Vo-Dinh, T. (Editor) 2003 Biomedical Photonics Handbook. CRC, Boca
Raton, FL.
[113] Wang, L. V. &Wu, H. 2007 Biomedical Optics. Principles and Imaging.
Wiley-Interscience.
[114] Wang, X., Pang, Y., Ku, G., Xie, X., Stoica, G. & Wang, L. 2003
Noninvasive laser-induced photoacoustic tomography for structural and
functional in vivo imaging of the brain. Nature Biotechnology, 21, no.
7, 803–806.
[115] Xu, M. & Wang, L.-H. V. 2002 Time-domain reconstruction for ther-
moacoustic tomography in a spherical geometry. IEEE Trans. Med.
Imag. 21, 814–822.
[116] Xu, M. & Wang, L.-H. V. 2005 Universal back-projection algorithm for
photoacoustic computed tomography. Phys. Rev. E 71, 016706.
[117] Xu, M. & Wang, L.-H. V. 2006 Photoacoustic imaging in biomedicine.
Review of Scientific Instruments 77, 041101-01 – 041101-22.
[118] Xu, Y., Feng, D. & Wang, L.-H. V. 2002 Exact frequency-domain re-
construction for thermoacoustic tomography: I. Planar geometry. IEEE
Trans. Med. Imag. 21, 823–828.
[119] Xu, Y., Xu, M. & Wang, L.-H. V. 2002 Exact frequency-domain re-
construction for thermoacoustic tomography: II. Cylindrical geometry.
IEEE Trans. Med. Imag. 21, 829–833.
[120] Xu, Y., Wang, L., Ambartsoumian, G. & Kuchment, P. 2004 Recon-
structions in limited view thermoacoustic tomography. Medical Physics,
31(4), 724–733.
[121] Zobin, N. 1993. Unpublished.
	Introduction
	Thermoacoustic tomography
	Mathematical model of TAT: wave equation and the spherical mean transform
	The wave equation model
	Spherical mean model
	Main mathematical problems of TAT
	Uniqueness of reconstruction
	Constant speed case
	Non-uniqueness sets in R2.
	Higher dimensions
	Relations to other areas of analysis
	Uniqueness in the case of a variable sound speed
	Reconstruction: formulas and examples
	Constant sound speed
	Inversion formulas
	Fourier expansion methods
	Filtered backprojection methods
	Series solutions for arbitrary geometries
	Time reversal (backpropagation) methods
	Examples of reconstructions and additional remarks about the inversion formulas
	Reconstruction in the variable speed case
	``Analytic'' inversions
	Backpropagation
	Eigenfunction expansions
	Partial data. ``Visible'' and ``invisible'' singularities
	Range conditions
	Concluding remarks
	Variations of the TAT procedure
	Planar and linear transducers
	Direct imaging techniques
	Using contrast agents
	Passive thermoacoustic imaging
	Uniqueness
	Sketch of the proof of Theorem 10
	Some open problems concerning uniqueness
	Inversion
	Stability
	Range
ABSTRACT
  The paper presents a survey of mathematical problems, techniques, and
challenges arising in the Thermoacoustic and Photoacoustic Tomography.

<|endoftext|><|startoftext|>
Search for Very High Energy Emission from Gamma-Ray
Bursts using Milagro
P. M. Saz Parkinson for the Milagro Collaboration1
Santa Cruz Institute for Particle Physics, University of California, Santa Cruz, CA 95064
Abstract. Gamma-Ray Bursts (GRBs) have been detected at GeV energies by EGRET and models predict emission at > 100
GeV. Milagro is a wide field (2 sr) high duty cycle (> 90%) ground based water Cherenkov detector that records extensive
air showers in the energy range 100 GeV to 100 TeV. We have searched for very high energy emission from a sample of 106
gamma-ray bursts (GRB) detected since the beginning of 2000 by BATSE, BeppoSax, HETE-2, INTEGRAL, Swift or the
IPN. No evidence for emission from any of the bursts has been found and we present upper limits from these bursts.
Keywords: gamma-ray sources; gamma-ray bursts; astronomical observations: gamma-ray
PACS: 98.70.Rz,95.85.Pw
Some of the most important contributions to understanding gamma-ray bursts (GRBs) have come from observations
of afterglows over a wide spectral range [1]. Many GRB models predict very high energy (VHE, > 100 GeV) emission
from GRBs at a level comparable to that at MeV energies (e.g. [2, 3]). EGRET detected several GRBs at energies
above 100 MeV, indicating that the spectrum of GRBs extends at least out to 1 GeV, with no evidence for a spectral
cut-off [4]. A second component was also found in one burst which extended up to at least 200 MeV and had a much
slower temporal decay than the main burst [5]. At very high energies, there has been no conclusive emission detected
for any single GRB, though a search for counterparts to 54 BATSE bursts with Milagrito, a prototype of Milagro,
found evidence for emission from one burst, with an after trials significance slightly greater than 3σ [6]. At these high
energies, gamma rays are attenuated by the redshift-dependent extra-galactic background light (EBL) [7], making the
detection of VHE emission from GRBs very challenging.
A search for an excess of events above those due to the background was carried out for each of the 106 satellite-
detected GRBs in our sample (see Table 1). These represent all the GRBs known to have occurred within the field
of view of Milagro during its first seven years of operations (2000-2006)2. Milagro detected no significant emission
from any of these bursts, and fluence upper limits are given in Table 1.
We acknowledge Scott Delay and Michael Schneider for their dedicated efforts in the construction and maintenance of the Milagro experiment.
This work has been supported by the National Science Foundation (under grants PHY-0245234, -0302000, -0400424, -0504201, -0601080, and
ATM-0002744) the US Department of Energy (Office of High-Energy Physics and Office of Nuclear Physics), Los Alamos National Laboratory,
the University of California, and the Institute of Geophysics and Planetary Physics.
REFERENCES
1. van Paradijs, J., Kouveliotou, C. & Wijers, R. A. M. J. 2000, Annual Review of Astronomy and Astrophysics 38, 379
2. Dermer, C. D., Chiang, J., & Mitman, K. E. 2000, ApJ 537, 785
3. Zhang, B. & Mészáros, P. 2001, ApJ 559, 110
4. Dingus, B. L. 2001, in Aharonian, F. A. and Volk, H. J. (eds), High Energy Gamma Ray Astronomy, AIP Conf. Proc., 558, 383
5. Gonzalez, M. M. et al. 2003, Nature 424, 749
6. Atkins, R. et al. 2000, ApJL 533, L119
7. Primack, J. R. et al. 2005, in Aharonian, F., Volk, H., and Horns, D. (eds) Gamma 2004 Heidelberg, AIP Conf. Proc., 745, 23-33
1 A. Abdo, B. T. Allen, D. Berley, E. Blaufuss, S. Casanova, B. L. Dingus, R. W. Ellsworth, M. M. Gonzalez, J. A. Goodman, E. Hays,
C. M. Hoffman, B. E. Kolterman, C. P. Lansdell, J. T. Linnemann, J. E. McEnery, A. I. Mincer, P. Nemethy, D. Noyes, J. M. Ryan, F. W. Samuelson,
P. M. Saz Parkinson, A. Shoup, G. Sinnis, A. J. Smith, G. W. Sullivan, V. Vasileiou, G. P. Walker, D. A. Williams, X. W. Xu and G. B. Yodh
2 GRB 060218, due to its long duration of more than 2000 s moved out of Milagro’s field of view after the start of the burst. The limit presented
here is for the initial 10 s hard spike reported by the instrument team.
http://arxiv.org/abs/0704.0287v1
TABLE 1. GRBs in the Milagro field of view (2000-2006). Column 1 is the GRB name. A superscript refers to the number
of IPN error regions in the Milagro field of view. A superscript of one implies only one of two error regions fell in the
Milagro field of view, while a two implies that both did, and they are listed one after the other. Column 2 gives the duration
of the burst (in seconds), column 3 the zenith angle (in degrees), column 4 the measured redshift, column 5 the satellite(s)
detecting the GRB, and column 6 gives the Milagro 99% confidence upper limit on the 0.2–20 TeV fluence in erg cm−2.
Numbers in bold (also labelled with a #) take into account absorption by the EBL (using the Primack 05 model) for a redshift
given in column 4. Those with three dots imply the redshifts are so high that all the emission is expected to be absorbed.
GRB Dur. θ z Inst. 99% UL
000113 370 21 ... BATSE 5.5e-6
0001311 12 41 ... IPN 6.5e-7
000205 23 25 ... BSAX 6.9e-7
000206 10 39 ... BSAX 9.3e-7
000212 8 2.2 ... BATSE 1.1e-6
000220 2.4 49 ... BATSE 1.1e-5
000226 10 32 ... BATSE 3.4e-6
000226b1 94.5 32 ... IPN 7.8e-7
000301C 14 38 2.03 BATSE ...
000302 120 32 ... BATSE 6.8e-6
000314 12.8 45 ... BSAX 3.6e-5
000317 550 6.4 ... BATSE 7.9e-6
000330 0.2∗ 30 ... BATSE 1.0e-6
000331 55 38 ... BATSE 1.2e-5
000402 120 48 ... BSAX 4.5e-5
000408 2.5 31 ... BATSE 1.0e-6
000424 5 36 ... BATSE 7.6e-7
000508 30 34 ... BATSE 3.7e-6
0006071 0.12 42 ... IPN 4.6e-7
000615 10 39 ... BSAX 1.6e-6
000630 20 32 ... IPN 2.2e-6
0007072 18 43 ... IPN 1.9e-6
0007072 18 41 ... IPN 1.0e-6
000727 10 41 ... IPN 2.6e-6
000730 7 19 ... IPN 4.2e-7
0008211 8 27 ... IPN 6.9e-7
0008301 8 35 ... IPN 9.1e-7
000926 25 16 2.04 IPN ...
001017 10 42 ... IPN 2.2e-6
001018 31 32 ... IPN 2.1e-6
001019 10 20 ... IPN 1.1e-6
001105 30 8.5 ... IPN 1.4e-6
001204 0.44 48 ... BSAX 1.2e-5
010104 2 20 ... IPN 4.0e-7
010220 150 27 ... BSAX 2.1e-6
010613 152 25 ... IPN 2.9e-6
010706 48 37 ... IPN 2.6e-6
010903 41 49 ... IPN 2.9e-5
010921 24.6 10 0.45 HETE 2.9e-5#
011130 83.2 34 ... HETE 3.4e-6
011212 84.4 33 ... HETE 6.7e-6
020311 11.5 27 ... IPN 1.7e-7
0204292 16 39 ... IPN 4.6e-7
0204292 16 30 ... IPN 3.0e-7
020625b 125 38 ... HETE 5.7e-6
020702 26 34 ... IPN 1.4e-6
0209081 17 19 ... IPN 7.3e-7
020914 9 5.7 ... IPN 4.2e-7
021104 19.7 13 ... HETE 7.5e-7
021112 7.1 34 ... HETE 9.4e-7
021113 20 18 ... HETE 6.4e-7
021211 6 35 1.01 HETE ...
030413 15 27 ... IPN 1.0e-6
030823 56 33 ... HETE 2.8e-6
GRB Dur. θ z Inst. 99% UL
031026 0.24 45 ... IPN 1.1e-6
031220 23.7 43 ... HETE 4.0e-6
040506 175 49 ... IPN 6.0e-6
040924 0.6 43 0.859 HETE 1.4e-3#
041211 30.2 43 ... HETE 4.8e-6
041219a 520 27 ... INTGR. 5.8e-6
050124 4 23 ... Swift 3.0e-7
050213 17 23 ... IPN 1.3e-6
050319 15 45 3.24 Swift ...
050402 8 40 ... Swift 2.1e-6
050412 26 37 ... Swift 1.7e-6
050502 20 43 3.793 INTGR. ...
050504 80 28 ... INTGR. 1.3e-6
050505 60 29 4.3 Swift ...
050509b 0.128 10 0.226? Swift 1.1e-6#
050522 15 23 ... INTGR. 5.1e-7
050607 26.5 29 ... Swift 8.9e-7
050703 26 26 ... IPN 1.2e-6
050712 35 39 ... Swift 2.5e-6
050713b 30 44 ... Swift 4.0e-6
050715 52 37 ... Swift 1.7e-6
050716 69 30 ... Swift 1.6e-6
050820 20 22 2.612 Swift ...
051103 0.17 50 0.001? IPN 4.2e-6#
051109 36 9.7 2.346 Swift ...
051111 20 44 1.55 Swift ...
051211b 80 33 ... INTGR. 2.6e-6
051221 1.4 42 0.55 Swift 9.8e-4#
051221b 61 26 ... Swift 1.8e-6
060102 20 40 ... Swift 2.0e-6
060109 10 22 ... Swift 4.1e-7
060110 15 43 ... Swift 3.0e-6
060111b 59 37 ... Swift 2.3e-6
060114 100 41 ... INTGR. 5.1e-6
060204b 134 31 ... Swift 2.7e-6
060210 5 43 3.91 Swift ...
060218 10 44.6 0.03 Swift 3.8e-5#
060306 30 46 ... Swift 7.2e-6
060312 30 44 ... Swift 3.3e-6
060313 0.7 47 ... Swift 2.7e-6
060403 25 28 ... Swift 1.0e-6
060427b 0.22 16 ... IPN 2.1e-7
060428b 58 27 ... Swift 1.1e-6
060507 185 47 ... Swift 1.8e-5
060510b 330 43 4.9 Swift ...
060515 52 42 ... Swift 9.6e-6
060712 26 35 ... Swift 3.8e-6
060814 146 23 ... Swift 2.5e-6
060904A 80 14 ... Swift 2.4e-6
060906 43.6 29 3.685 Swift ...
061002 20 45 ... Swift 4.0e-6
061126 191 28 ... Swift 4.3e-6
061210 0.8 23 0.41? Swift 6.1e-6#
061222a 115 30 ... Swift 5.6e-6
ABSTRACT
  Gamma-Ray Bursts (GRBs) have been detected at GeV energies by EGRET and
models predict emission at > 100 GeV. Milagro is a wide field (2 sr) high duty
cycle (> 90 %) ground based water Cherenkov detector that records extensive air
showers in the energy range 100 GeV to 100 TeV. We have searched for very high
energy emission from a sample of 106 gamma-ray bursts (GRB) detected since the
beginning of 2000 by BATSE, BeppoSax, HETE-2, INTEGRAL, Swift or the IPN. No
evidence for emission from any of the bursts has been found and we present
upper limits from these bursts.

<|endoftext|><|startoftext|>
Introduction
	Formulae in the grand canonical model
	The canonical model solution
	Generating grand canonical results from the canonical ensemble
	Bimodality: the basic formulae in grand canonical and canonical ensemble
	Representative Results
	Connection between probability distribution of the largest fragment and average multiplicity
	Summary
	Acknowledgement
	References
ABSTRACT
  We address two issues in the thermodynamic model for nuclear disassembly.
Surprisingly large differences in results for specific heat were seen in
predictions from the canonical and grand canonical ensembles when the nuclear
system passes from liquid-gas co-existence to the pure gas phase. We are able
to pinpoint and understand the reasons for such and other discrepancies when
they appear. There is a subtle but important difference in the physics
addressed in the two models. In particular if we reformulate the parameters in
the canonical model to better approximate the physics addressed in the grand
canonical model, calculations for observables converge. Next we turn to the
issue of bimodality in the probability distribution of the largest fragment in
both canonical and grand canonical ensembles. We demonstrate that this
distribution is very closely related to average multiplicities. The
relationship of the bimodal distribution to phase transition is discussed.

<|endoftext|><|startoftext|>
Introduction to Superconductivity,
McGraw-Hill, Inc., New York (1996).
[10] G. Agnolet et al., Phys. Rev. B 39, 8934 (1989).
[11] A. Safonov et al., Phys. Rev. Lett. 81, 4545 (1998)
[12] D. Resnick et al., Phys. Rev. Lett. 47, 1542 (1981).
[13] Z. Hadzibabic et al., Nature 441, 1118 (2006).
[14] T. Simula and P. Blakie, Phys. Rev. Lett. 96, 020404
(2006).
[15] A. Trombettoni et al., New J. Phys. 7, 57 (2005).
[16] Three circularly polarized laser beams (λ = 810.1nm) in-
tersect in a tripodlike configuration, with θ = 6.6◦ angles
to the z-axis. Calculation of the optical dipole potential
[17] includes counterrotating terms and interaction with
both the D1 and D2 lines, as well as the ‘fictitious mag-
netic field’ due to the circular polarization. The tilted
bias field of the TOP trap makes P ≈ 0.5[17].
[17] R. Grimm et al., Adv. At. Mol. Opt. Phys. 42, 95 (2000).
[18] J. Javanainen, Phys. Rev. A 60, 4902 (1999).
[19] K. Burnett et al., J. Phys. B 35, 1671 (2002).
[20] To avoid radial flows during VOL ramp-up, RTF is kept
constant by balancing the lattice-enhanced mean field
pressure with radial confinement due to the optical lattice
envelope, by the choice of a 67µm 1/e2 intensity waist.
Axial (z) confinement is due to the magnetic trap alone.
[21] D. Petrov et al., Phys. Rev. Lett. 87, 050404 (2001).
[22] In the axial condensate region between z =
−Rz/3,+Rz/3, where according to our 3D GPE
simulations 85% of the tunnel current is localized
and hence the relative phase is measured, axial phase
fluctuations [21] vary between ≈ 600mrad (“cold” data
in Fig. 2) and 800mrad (“hot”) in the regime J/T ≈ 1.
[23] Within a dataset, the ramp-down rate is kept fixed, tr =
τ × VOL/1.3 kHz, τ = 18ms if not otherwise indicated.
[24] D. Scherer et al., Phys. Rev. Lett. 98, 110402 (2007).
[25] J is averaged over junctions within the central 11% of
the array area, from which all quantitative experimental
results are extracted.
[26] D. Ananikian and T. Bergeman, Phys. Rev. A 73, 013604
(2006).
[27] M. Naraschewski and D. Stamper-Kurn, Phys. Rev. A
58, 2423 (1998).
[28] Following [24], in simulations we count a vortex if all
three phase differences in an elemental triangle of junc-
tions are ∈ (0, π), or if all are ∈ (−π, 0).
[29] The very short annihilation time of configuration I pairs
[ τ < 5ms in Fig. 3(c) ] is not unexpected: their spacing
3 = 2.8µm is comparable to the diameter of a vortex
core in the bulk condensate (after VOL rampdown).
[30] L. Pitaevskii and S. Stringari, Phys. Rev. Lett. 87,
180402 (2001).
ABSTRACT
  We observe the proliferation of vortices in the
Berezinskii-Kosterlitz-Thouless regime on a two-dimensional array of
Josephson-coupled Bose-Einstein condensates. As long as the Josephson
(tunneling) energy J exceeds the thermal energy T, the array is vortex-free.
With decreasing J/T, vortices appear in the system in ever greater numbers. We
confirm thermal activation as the vortex formation mechanism and obtain
information on the size of bound vortex pairs as J/T is varied.

<|endoftext|><|startoftext|>
Introduction
The binary content of a globular cluster is important in determining the frequency and
nature of cluster stellar exotica, as well as the dynamical evolution of the cluster. It has long
been recognized that binary formation is inevitable in a self-gravitating system1. Indeed,
the presence of binaries as a central energy source is vital to avoid complete core-collapse
(Goodman & Hut 1989). However, only more recently has it been realised that globular
clusters must also have formed with a sizeable binary population (see Hut et al. 1992 for
an early review). That globular clusters harbour a mixture of dynamically formed and
primordial binaries can be used to understand observations of their stellar content, such as
the diverse blue straggler population in 47 Tucanae (Mapelli et al. 2004).
Knowledge of the likely primordial binary fraction of globular clusters is essential as
input to models of globular cluster evolution. It also provides a constraint on the cluster
formation process. Considering that the presence of binaries in the cluster core has a pro-
nounced effect on the core properties and cluster evolution (Hut 1996), knowledge of the
central binary frequency is also important. Indications are that this is relatively small – of
the order of 20% (e.g. Bellazzini et al. 2002) or less (e.g. Cool & Bolton 2002) – when
compared to the frequencies of binaries observed in the solar neighbourhood (Duquennoy &
Mayor 1991) and open clusters such as M67 (Fan et al. 1996) which are of the order of 50%.
It would be particularly useful to take measurements of the current binary fraction in
globular clusters – whether that be in the core or outer regions – and extrapolate backwards to
gain a reliable determination of the primordial binary content. However, processes involved
1The ten-body gravitational calculations of von Hoerner (1960) are the earliest N -body calculations
published. They were continued until the first binary formed, at which point the calculations were halted.
– 3 –
in the intervening cluster evolution make this difficult. For example, binaries can be formed
and destroyed in a variety of interactions between cluster members (Hurley & Shara 2002).
Binaries will on average be more massive than single stars and thus are affected differently
by mass segregation. Also, the escape rates of single stars and binaries will differ. Finally,
the internal evolution of the components of binaries can also lead to binaries’ destruction.
Current simulation techniques have been designed to model these (and other) processes
(Aarseth 2003) and have reached the level of sophistication required to produce realistic clus-
ter models. In this way the link between primordial and current cluster binary populations
can be investigated directly (e.g. Hurley et al. 2005; Ivanova et al. 2005). Aarseth (1996)
conducted an N -body simulation starting with 10 000 stars and a 5% binary frequency where
notably the stars were drawn from a realistic initial mass function (IMF), the cluster was
subject to the tidal field of the Galaxy, and both stellar and binary evolution were modelled.
This model cluster had a half-life of about 2Gyr at which point the core binary frequency
had risen to 20% primarily owing to mass-segregation. Thus binaries were not preferentially
depleted. In this case it was not necessary to include a large initial binary fraction in order
to halt core-collapse and yield a significant observed abundance in the central regions. The
earlier work of McMillan & Hut (1994) reported N -body simulations of 2 000 stars or less
and binary frequencies in the range of 5-20%. They included the Galactic tidal field but
only considered point-mass dynamics. McMillan & Hut (1994) showed that there is a criti-
cal primordial binary frequency of 10-15% below which the binaries are destroyed before the
cluster dissolves owing to the tidal field. Furthermore, they found that above this critical
value there exists a minimum possible binary mass fraction for the cluster – this result could
be used with observations of present-day binary frequency to place limits on the primordial
frequency. We note that the McMillan & Hut (1994) simulations were restricted to equal-
mass stars, and the binaries were a factor of two heavier then single stars – this could give
misleading results when applied to real clusters2.
These N -body simulations were definitely in the open cluster regime. Dynamical pro-
cesses which destroy (and equally may create) cluster binaries are density dependent. In
addition, the central stellar density of a cluster is a function of the number, N , of cluster
members. Thus, it is not clear that these prior results apply to globular cluster conditions.
More recently Ivanova et al. (2005) have conducted Monte Carlo simulations of clusters
2 Binaries would naturally be twice as massive as single stars on average if binaries form by random
pairings independent of the stellar IMF. In general correlated masses are assumed (e.g. Kroupa 1995)
although the exact situation is unclear – the recent survey of stars in the solar neighborhood and in young
open clusters compiled by Halbwachs et al. (2003) shows a distribution of mass-ratios, q, with a broad peak
between 0.2− 0.7 but also a sharp peak for q > 0.8.
– 4 –
with up to 5 × 105 members and core number densities ranging from 103 − 106 stars pc−3.
They show that an initial binary frequency of 100% is required to produce a current core
binary frequency of 10% for a globular cluster such as 47 Tucanae. Depletion of binaries
in the cluster core is found to be the result of stellar evolution processes as well as three-
and four-body dynamical interactions. It is our intention in this paper to test these claims
by utilising direct N -body simulations of star clusters with up to N = 100 000 members
initially.
One aspect that will affect the evolution of the cluster binary population is the orbital
parameters of the primordial binaries – in particular the initial ratio of hard to soft binaries.
The boundary between these two regimes is determined by the mean kinetic energy of the
cluster stars (with binaries represented by their center-of-mass motion) where hard binaries
have a binding energy in excess of 2/3 of the mean kinetic energy (Hut et al. 1992). We
note that a useful estimate for the boundary in terms of the binary orbital separation is
given by twice the cluster half-mass radius divided by N . In three-body single–binary star
interactions hard binaries tend to harden and provide kinetic heating for the cluster (Heggie
1975; Hut 1983). Soft binaries are less strongly bound (and thus on average are wider) and
are efficiently destroyed in three- and four-body encounters. As noted by Hut et al. (1992) it
is for this reason that soft binaries are not generally included in cluster models. A common
misconception is that the omission of soft binaries is to aid the speed of simulation; however
it is binaries near the hard/soft boundary that provide the main threat to efficient simulation
(Aarseth 2003). The omission is more a realisation that soft binaries have little impact on
the cluster dynamics or exotic star formation and so the focus is on the more meaningful
binaries, so to speak. Neglecting soft binaries has the capacity to alter binary fractions in
the halo of a model cluster as binary encounters tend to occur in or near the cluster core.
For this reason we will attempt to account for any omitted soft binary populations when
making binary fraction comparisons.
Our simulation method and initial conditions are detailed in Sections 2 and 3. Results
are given in Section 4 followed by discussion in Section 5. We briefly summarize our results
in Section 6.
2. Models
All simulations utilized in this work were performed using the NBODY4 code (Aarseth
1999) on GRAPE-6 boards (Makino 2002) located at the American Museum of Natural
History. NBODY4 uses the 4th-order Hermite integration scheme and an individual timestep
algorithm to follow the orbits of cluster members and invokes regularization schemes to deal
– 5 –
with the internal evolution of small-N subsystems (see Aarseth 2003 for details). Stellar and
binary evolution of the cluster stars are performed in concert with the dynamical integration
as described in Hurley et al. (2001).
The results of four extensive simulations (detailed below) form the dataset for this
paper. We will make use of data from two simulations that have previously been reported
in the literature – a simulation starting with 95 000 single stars and 5 000 binaries (Shara &
Hurley 2006) and a simulation starting with 12 000 single stars and 12 000 binaries (Hurley
et al. 2005). The former contained 100 000 members at birth, if we count each binary as
one object, and thus had a primordial binary frequency of 5%. We will refer to this as the
K100-5 simulation. After about nine Gyr of evolution the cluster membership was reduced
by half and at an age of 15 − 16Gyr the model cluster had reached the end of the main
core-collapse phase (associated with a minimum in core-radius, after which the size of the
core stabilizes, in relative terms). Figure 1a shows the behaviour of the core radius as the
K100-5 model evolves. Also shown is the 10% Lagrangian radius – the radius which encloses
the inner 10% of the cluster by mass. From Figure 1a we see that initially the inner regions of
the cluster expand owing to stellar evolution mass-loss before two-body effects take over and
drive a prolonged period of contraction. When the cluster is about 12 half-mass relaxation
times old (as denoted across the top of Fig. 1a) the core radius reaches a minimum of 0.17 pc
and the main core-collapse phase is halted. The 10% Lagrangian radius at this point is
0.94 pc. The core density of the model begins at 102 stars pc−3 and increases to a maximum
of 104 stars pc−3 just before termination of the model at 20Gyr.
The core radius in Figure 1 is actually the density radius commonly used in N -body
simulations (Casertano & Hut 1985). It is calculated from the density weighted average
of the distance of each star from the density centre (Aarseth 2003). This definition, in
combination with the effects of three-body interactions and the movement of binaries across
the core boundary, allows for the small-scale fluctuations in core radius observed in Figure 1.
Such fluctuations could be smoothed out (see Heggie, Hut & Trenti 2006, for example)
but we have chosen not to do this. This N -body core-radius is distinct from observational
determinations of core-radius calculated, for example, from the surface brightness profile
(SBP) of a cluster. As discussed by Wilkinson et al. (2003) there is no general relation
between the two quantities but usually the N -body value is the lesser of the two. This
is supported by an in-depth analysis of the core-radius evolution of the K100-5 simulation
that will be presented in an upcoming paper (Hurley, 2007, in preparation). Preliminary
results show that the the core-radius obtained from the two-dimensional projected SBP of
the K100-5 model agrees well with the N -body core radius for the first 7Gyr of evolution
but is about twice as large by the time the model reaches 16Gyr of age. Thus the binary
fraction within the 10% Lagrangian radius may often be a better number to compare with
– 6 –
central binary fractions quoted for real clusters and we will give both this and the core binary
fraction in our results.
The second model had a primordial binary frequency of 50% and was tailored to in-
vestigate the evolution and stellar populations of the old open cluster M67. It had 24 000
members at birth and we will refer to this as the K24-50 simulation. It had a half-life of about
2Gyr and after 4Gyr of evolution only 2 000 stars and binaries remained. The core density
was about 102 stars pc−3 on average, reaching a maximum of 350 stars pc−3 at 3 480Myr with
a corresponding core radius of 0.3 pc. Figure 1b shows the evolution of the core and 10%
Lagrangian radii for the K24-50 simulation.
To investigate the evolution of binary fractions across a range of star cluster models we
will also make use of two simulations that have yet to be published. These are a simulation
that started with 90 000 single stars and 10 000 binaries (K100-10) and a simulation that
started with 40 000 single stars and 10 000 binaries (K50-20). In Table 1 we summarize the
properties of the four simulations.
For each model the initial setup is as follows. Masses for the single stars are drawn
from the IMF of Kroupa, Tout & Gilmore (1993) between the mass limits of 0.1 and 50M⊙.
Each binary mass is chosen from the IMF of Kroupa, Tout & Gilmore (1991), as this had
not been corrected for the effect of binaries, and the component masses are set by choosing
a mass-ratio from a uniform distribution. We assume that all stars are on the zero-age
main sequence (ZAMS) when the simulation begins and that any residual gas from the star
formation process has been removed. We use a Plummer density profile (Aarseth, Hénon
& Wielen 1974) and assume the stars and binaries are in virial equilibrium when assigning
the initial positions and velocities. There is no primordial segregation by mass, binary
properties, or any other discriminating factor in these models. Each cluster is subject to a
standard Galactic tidal field – a circular orbit in the Solar neighborhood. Stars are removed
from the simulation when their distance from the density centre exceeds twice that of the
tidal radius of the cluster. The metallicity of the stars in the two simulations starting with
100 000 stars (K100-5 and K100-10) was set to be Z = 0.001 while both the K24-50 and
K50-20 simulations were assigned solar metallicity (Z = 0.02).
3. Binary Period Distributions
The orbital separations of the 5 000 primordial binaries in the K100-5 simulation (Shara
& Hurley 2006) were drawn from the log-normal distribution suggested by Eggleton, Fitchett
& Tout (1989) with a peak at 30 au. This distribution is based on the properties of doubly-
– 7 –
bright visual binaries in the Bright Star Catalogue (Hoffleit 1983) and is in agreement with
the survey data of Duquennoy & Mayor (1991) for binaries in the solar neighbourhood –
although the latter observations do not rule out a flat distribution. Orbital eccentricities of
the primordial binaries were assumed to follow a thermal distribution (Heggie 1975). In the
K100-5 model the initial separation distribution was capped at 100 au. With a half-mass
radius of 6.7 pc for the initial model the hard/soft binary boundary is at about 30 au. Thus
the maximum of 100 au excludes only the softest binaries from the distribution. Binaries
with an initial pericentre distance less than five times the radius of the primary star were
rejected in the setup of the model – for binaries closer than this it is assumed that interaction
during the formation process and on the Hayashi track would lead to collision. Rather
than enact such a collision we simply choose another set of binary parameters from the
distributions. In this way the intended primordial binary fraction is preserved. The resulting
period distribution of the K100-5 model is shown in Figure 2a. We see that the distribution
is peaked at 105 d and does not extend beyond 106 d. The K100-10 simulation had the
same binary setup as that of the K100-5 model. The K50-20 simulation also used the same
Eggleton, Fitchett & Tout (1989) distribution of orbital separations but with a cap at 50 au.
Primordial binaries in the M67 (or K24-50) simulation of Hurley et al. (2005) have
orbital separations drawn from a flat distribution of log a (Abt 1983). An upper cutoff
of 50 au was applied so that soft binaries were not included in the model – with a half-
mass radius of 3.9 pc the hard/soft binary limit for the starting model was about 40 au.
For this model very close primordial orbits were also rejected. The corresponding period
distribution for the primordial binaries in the K24-50 simulation is shown in Figure 2b. We
note that a goal of the K24-50 simulation was to reproduce the relatively large number of
blue stragglers observed in M67. For this purpose an Eggleton, Fitchett & Tout (1989)
separation distribution was ruled out as it did not lead to enough blue straggler production
from Case A mass transfer in close binaries. Uncorrelated masses of the component stars in
binaries were also ruled out for the same reason (see Hurley et al. 2005 for details).
During this work we will be making comparisons to the Monte Carlo models presented by
Ivanova et al. (2005). In their study binary periods were chosen from a uniform distribution
in logP between the limits of 0.1 and 107 d. Thus they assumed a wider distribution of
primordial binaries. If, for example, the Eggleton, Fitchett & Tout (1989) distribution used
in the K100-5 simulation was extended to include all periods up to 107 d, rather than being
curtailed at 100 au, the 5 000 binaries that make up the distribution shown in Figure 2a
would represent about 5/6 of the full population. So effectively there would be 1 000 soft
binaries that have been neglected and the true primordial frequency would be 6%. One could
then assume that these soft binaries were broken-up at the very start of the simulation –
although this may not be true for soft binaries residing in the less-dense outer regions of
– 8 –
the cluster. However, we note that there is no evidence that binary periods in star clusters
should extend as far as 107 d (Meylan & Heggie 1997).
In terms of hard binaries one could argue, for the sake of semantics, that in comparison
to a population drawn from a uniform distribution of periods extending from 0.1 d (without
restriction) our initial distributions are under-sampling the contribution of hard binaries. A
key point here is that short-period binaries were not excluded from the primordial popula-
tions of our simulations by some ad-hoc process. Instead the distribution of orbital periods is
dictated by using distributions borne from observations in combination with accounting for
pre-main sequence (MS) evolution – before contracting along the Hayashi track the stellar
radius of a pre-MS star can be a factor of five or more greater than on the ZAMS (Siess,
Dufour & Forestini 2000) and birth periods must allow for this (Kroupa 1995). Pre-MS evo-
lution was not considered by Ivanova et al. (2005) although they did reject systems where
one or both stars would initially fill their Roche-lobes at pericentre – this was also assumed
in our models.
4. Results
In Figure 3 we show the evolution of the core binary fraction for the four N -body
simulations introduced above. Also shown is the binary fraction within the 10% Lagrangian
radius and the overall binary fraction of the model clusters.
Except at late times in the K24-50 model, when the cluster has lost more than 90%
of its original mass and is nearing dissolution, we see that in each case the cluster binary
fraction remains close to the primordial value. Focussing on the K100-5 simulation, Figure 4
shows the fractions of single stars and binaries (compared to their respective initial number)
in the cluster. Following on from Figure 3a the fractions are similar at all times as expected.
However, Figure 4 also shows the fractions of single stars and binaries that have escaped the
cluster and we see that from about 2Gyr onwards the fractional escape rate of single stars
is greater than that of the binaries. At the end of the simulation (20Gyr) the difference is
34%. This is offset somewhat by evolution processes (stellar and binary) that destroy binaries
(see the dotted line in Figure 4). These processes include binaries becoming unbound due
to supernova mass-loss and/or kicks (only relevant for the first 100Myr of evolution) and
mass transfer-induced mergers in close binaries. The remaining difference is balanced by
the destruction of binaries in dynamical encounters and this becomes more important as the
cluster evolves. We note that even though the cluster binary fraction is relatively static as
the cluster evolves the characteristics of the binary population change markedly over time
with hard binaries favoured at late times.
– 9 –
Evident from Figure 3 is an overall trend for the core binary fraction to increase with
time, irrespective of simulation type. For the core binary population of the K100-5 model we
see that this rises from an initial 5% to as high as 40% around the time that the core-collapse
phase is halted. After this time the core binary fraction becomes quite noisy owing to the
small size of the core (see Figure 1) and the small numbers of binaries and stars in the core.
However, the value always remains greater than the initial value. We see also from Figure 3a
that the binary frequency within the inner 10% Lagrangian radius rises to a maximum of
16% just prior to the end of the core-collapse phase.
It is important to note at this point that we are working with radii derived from spherical
data whereas observational determinations of binary fractions are based on two-dimensional
projected data. With our models it is possible to test the effect of this discrepancy on our
findings. If we calculate the 10% Lagrangian radius for model K100-5 from a two-dimensional
projection we find that the radius is reduced by about 20-40% across the evolution (the
choice of projection axis does not affect this result). This is consistent with the expectation
suggested by Fleck et al. (2006). A similar relationship is reported by Baumgardt, Makino &
Hut (2005) in that the half-light radius (calculated from projected data) is approximately half
the size of the half-mass radius (calculated from spherical data). However, the binary fraction
within the projected 10% Lagrangian radius of our K100-5 model is almost indistinguishable
from that of the result shown in Figure 3a (the dotted curve).
We now aim to understand the processes underlying the evolution of the core binary
fraction of star clusters, focussing again on the K100-5 simulation. Figure 5 shows the
number of single stars and binaries in the core, relative to their total number in the cluster,
as the cluster evolves. For the first 10Gyr of evolution the ratio of binaries in the core to
binaries in the cluster is fairly static – roughly 1 in 10 binaries is in the core. However,
the ratio of single stars found in the core is decreasing sharply over the same timeframe
and thus single stars are being lost from the core at a greater rate than from the cluster in
general (comparing Figs. 4 and 5). From 10Gyr onwards the ratio of binaries in the core
also decreases. This corresponds to a period of increasing core density: prior to 10Gyr the
core density of stars hovers around the 102 stars pc−3 mark but from 10− 15Gyr it increases
by an order of magnitude. The binary fraction continues to rise in the core over this period
indicating that single stars continue to be lost from the core at a greater rate than binaries.
We note that mass loss from stellar evolution is reduced considerably at this stage compared
to earlier in the cluster lifetime when more massive stars were present.
Figure 6 confirms that the number of binaries in the core is decreasing with time even
though the binary fraction, fb,c, is increasing. We also see from this figure that at least half of
the binaries in the core at any time were not present in the core the last time the population
– 10 –
was sampled (this is done at intervals of 80Myr). So the core binary population is by no
means static as many binaries are being created/destroyed, or moving in and out of the core,
on the 80Myr timescale. It is important to note for comparison that the relaxation time
in the core is approximately 200Myr initially and decreases to about 50Myr at late times.
Individual binaries in cluster cores are both promiscuous and mobile – transient residents.
In Figure 7 we examine the fraction of core binaries that were created in exchange
interactions. These are short-lived 3- and 4-body gravitational encounters where a star is
exchanged into an existing binary displacing one of the members of that binary (Heggie
1975). Thus it is a process by which primordial binaries can be destroyed and replaced by
new dynamical, or exchange, binaries. We see from Figure 7a that these non-primordial
binaries come to dominate the core population towards the end of the core-collapse phase
in the K100-5 simulation. Figure 7a also shows that the double-degenerate binary content
increases steadily in the core with time and comprises about 30% of the core binaries subse-
quent to the completion of the core-collapse phase. In Figure 7b we see that the exchange
binary content in the core of the K100-10 model does not reach the heights of the K100-5
model. Presumably this is a consequence of the lower core-density of the K100-10 model.
The fraction of double-degenerate binaries is similar – any decrease in double-degenerate pro-
duction via dynamical means in the K100-10 model is compensated by the increased number
of primordial binaries. The fraction of exchange binaries in the core of the K24-50 simulation
(Figure 7d is comparatively low whereas the K50-20 simulation (Figure 7c exhibits a much
larger fraction. Clearly there is a positive correlation between core density and the fraction
of exchange binaries in the core.
Figure 8a shows the number of binaries created and destroyed in exchange interactions
occurring in the core in intervals of 80Myr. Also shown is the number of core binaries
destroyed by all processes (exchanges, orbital perturbations, supernovae, mergers) in each
interval. The key point to note here is that on average exchange interactions are creating
as many binaries as they are destroying. For the entire cluster there were 1 024 binaries
destroyed in exchange interactions during the simulation and 933 binaries created.
Figure 8b looks at the movement of binaries in and out of the core as the cluster evolves.
Across each 80Myr interval it shows the fraction of core binaries that move out of the core
during the interval and the fraction of binaries that have moved into the core during the
interval. We see that the inwards and outwards fluxes are equal. Also shown is the fraction
of binaries entering the core that have previously been in the core – most binaries that leave
the core eventually revisit it. We see a pattern where binaries move outwards across the
core boundary owing to recoil velocities from gravitational encounters, or as a result of the
shrinking core. The core binary population is then replenished by binaries sinking inwards
– 11 –
owing to mass-segregation effects. In the discussion below we will refer to this pattern as
binary convection. We note that binaries on radial orbits with a moderate to high eccentricity
will also make an apparent contribution to this process.
An analysis of binary disruption for the K100-5 simulation is given in Figure 9 in terms
of cumulative events. Exchange interactions and orbital perturbations from nearby stars are
by far the dominant causes of binary disruption and these are shown in the top panel. We see
that perturbation events are more likely at early times in the evolution but, as soft binaries
disappear and the binary population becomes skewed towards hard binaries, exchange events
eventually overtake perturbations as the major cause of disruption. However there is an
important distinction to make between these two types of event. Exchange interactions are
counted as a disruption event in Figure 9a even if the event also leads to the creation of a
new binary and as we have seen in Figure 8a this is more than likely. On the other hand if
a binary is broken up owing to an orbital perturbation (also known as a fly-by) there is no
possibility of a replacement binary being created in the event.
The lower panel of Figure 9 shows the number of binaries that were ejected from the core
and escaped the cluster. There is a sharp correlation between the incidence of escape and
the increase in core density after 10Gyr. Even so, the total number of binaries lost owing
to this process remains an order of magnitude less than either perturbation or exchange
disruption. There is an initial burst of stellar/binary evolution induced mergers in short-
period primordial binaries followed by a gradual depletion of binaries owing to this process
and collisions in highly eccentric binaries. The cluster had a total of 287 binaries that
experienced either a merger or an internal collision and 67 of these events occurred in the core.
We also see from Figure 9b that supernova events do not make a meaningful contribution to
depletion of the core binary population.
Figure 10 repeats Figure 9 for the K24-50 simulation. In this simulation mergers and
collisions are the most likely cause of core binary loss. This is linked to the increased
primordial binary fraction and decreased core density, compared to the K100-5 simulation.
For similar reasons exchange disruption is more likely than perturbed disruption over the
course of the evolution. In fact in this simulation even the loss of binaries from the core as
a result of escape is greater than that from perturbed break-up. A key distinction between
the K24-50 and K100-5 simulations is that in the K24-50 case the ratio of binary destruction
to creation in exchanges is 3:1 whereas it was close to 1:1 for the K100-5 simulation.
The effect of a substantial primordial binary population on the evolution of open clusters
has been documented in the past (McMillan, Hut & Makino 1990, for example and see
Meylan & Heggie 1997 for a review). The main results are that in comparison to simulations
without primordial binaries the core-collapse phase of evolution is less dramatic and the
– 12 –
cluster lifetime is reduced. Little has been done on this subject for globular clusters to
date primarily because direct simulations have not been possible. However, our simulations
starting with 100 000 stars can start to shed some light on the expected behaviour. We see
from Table 1 that increasing the primordial binary frequency from 5% (K100-5 simulation)
to 10% (K100-10) does not reduce the cluster half-life significantly. In contrast the K24-50
simulation with 50% binaries has a half-life of 2 060Myr while a comparable simulation of
30 000 single stars with no primordial binaries has a half-life of 3 600Myr. As noted in Hurley
& Shara (2003) the presence of a large number of primordial binaries in an open cluster leads
to an enhanced rate of escaping stars via recoil velocity kicks obtained in 3-body interactions.
In comparison, the K100-5 and K100-10 clusters have deeper potential wells and also the
change in binary fraction between the two models is much less than for the open cluster
example. So a sharp change in the escape rate is not to be expected.
Figure 11 shows that the core radius evolution of the K100-5 and K100-10 simulations
is similar up to 15Gyr (when the K100-10 simulation was stopped). We note however that
the core density of the K100-10 model at this time is only half that of the K100-5 model. So
the presence of additional primordial binaries has reduced the number density of stars in the
core. Also in Figure 11 we compare the core radius evolution of a 100 000 star simulation
with no primordial binaries (a K100-0 model). Here we see that the core radius evolution
is slightly more irregular but overall the evolution is once again similar up to 15Gyr. After
core-collapse has been halted the situation is different as the single star model experiences a
fluctuating, and generally increasing, core-radius while the core-radius of the K100-5 model
remains approximately constant (see Figure 1a). The K100-0 model has a greater core
density than the K100-5 model at the end of the main core-collapse phase. The fluctuating
core-radius of the K100-0 model in the post-core-collapse phase is indicative of the core
bounce and subsequent oscillations expected for such a model – these phenomena are more
pronounced for models without primordial binaries (see the related discussion in Heggie &
Hut 2003 and Heggie, Trenti & Hut 2006).
In Figure 12 we investigate the radial distribution of the K100-5 binary population at
times of 6, 12 and 18Gyr, i.e. before, during and after the deep core-collapse phase. We see
that outside of the half-mass radius the binary fraction is effectively constant with radius
and changes little with time. The binary population in this region is also dominated by
primordial binaries – exchange binaries are unlikely to be found outside of the half-mass
radius. Within the half-mass radius the binary fraction rises sharply towards the centre of
the cluster and binaries become more centrally concentrated as the cluster evolves. Note
that the inner radial bin corresponds to the inner 5% Lagrangian radius so the core is not
resolved in Figure 12.
– 13 –
5. Discussion
Our N -body results clearly show that the core binary fraction of an evolved star cluster
is expected to be greater than the primordial binary fraction. We see this behaviour in each
of the models presented and at all times in the evolution. The most striking case is our main
model (K100-5) which started with 95 000 single stars and 5 000 binaries and experienced a
factor of eight increase in the core binary fraction after 16Gyr of evolution (coinciding with
the end of the main core-collapse phase).
At face value our results appear to be in clear contradiction to the Monte-Carlo results
recently presented by Ivanova et al. (2005). Their main reference model has a primordial
binary fraction fb,0 = 1.0 and a stellar density of nc = 10
5 stars pc−3. Processes such
as exchange interactions, orbital perturbations, binary evolution and mass-segregation are
included and the model is reduced to fb,c = 0.095 at 14Gyr. Ivanova et al. (2005) then repeat
the simulation with fb,0 = 0.5 and end up with fb,c = 0.07 so, as they note, the relationship
between primordial binary fraction and final core binary fraction is not linear. Coming at
this from the other direction our models show a possible saturation effect as the primordial
binary fraction increases. Looking back at Figures 3a and 3b we see that the core binary
fraction of the K100-5 model at the 14Gyr mark is 0.2 (up from fb,0 = 0.05) while it is 0.3
(up from fb,0 = 0.1) for the K100-10 model. So the simulation with the lower primordial
binary fraction has experienced the greater relative increase in core binary content. Our
K24-50 model, which started with fb,0 = 0.5, has fb,c = 0.8 at a similar dynamical age so
the relative increase is less again. These results raise the possibility that decreasing fb,0
below 0.5 in the Monte Carlo models may lead to conditions where fb,c can increase. It is
interesting to note that the idealized models presented recently by Heggie, Trenti & Hut
(2006) showed saturation effects in the core for initial binary frequencies greater than 10%
and also recorded an increase in the core binary fraction with time.
Ivanova et al. (2005) also performed a model to compare with 47 Tucanae. This was
similar to their main reference model although slightly more dense and with an increased
velocity dispersion. The result for fb,0 = 1.0 was fb,c = 0.07 – this lead to the conclusion
that the primordial binary frequencies of globular clusters such as 47 Tucanae must have
been close to 100% to explain current observations. However, Ivanova et al. (2005) also
ran the same simulation with fb,0 = 0.75, 0.5 and 0.25 and reported little or no variation in
the final core binary fraction. It would seem safe to assume that repeating the simulation
with fb,0 = 0.1 may give the same result or even an increase in binary fraction. This would
act to remove any obvious discrepancies between the N -body and Monte Carlo results. We
would certainly be interested in seeing the results of a Monte Carlo simulation conducted
with fb,0 = 0.1 and a similar setup of the primordial binary population as used in this work
– 14 –
– much easier than repeating a large N -body simulation with 100% binaries.
A major distinction between our N -body models and the Monte Carlo simulations
mentioned above is that the stellar density is at least an order of magnitude greater in the
latter. Fortunately, Ivanova et al. (2005) performed a simulation with nc = 10
3 stars pc−3
which facilitates a more direct comparison with our K50-20 model which had a similar core
density throughout the evolution. The K50-20 model experienced an almost factor of two
increase in core binary fraction as it evolved from 0 − 8Gyr. The comparable Monte Carlo
model showed a reduction in core binary fraction of more than a factor of two over the same
period. So there is an obvious deviation in behaviour. Of course there is a large difference
in the primordial binary fractions (0.2 compared to 1.0). The effect of this will be discussed
further below. However, we note at this stage that the initial hard binary fraction in the
Monte Carlo model was ∼ 30% (Ivanova, private communication) and this rose to 37% – so
the hard binary fraction increases and subsequently the models do show agreement at some
level. Another consideration is the velocity dispersion which is generally around 3−4 km s−1
for our models and was set to 10 km s−1 for most of the Monte Carlo models. However,
Ivanova et al. (2005) did perform two models (D4 and M12) similar in all respects except
that σ = 10 km s−1 in one and 4.5 km s−1 in the other. There was no significant difference in
the final core binary fractions of these models.
In Section 3 we discussed that in the setup of our models we might be neglecting a
fraction of soft binaries from the true primordial population. This results from imposing
a maximum initial orbital separation and at most would cause the binary fraction to be
underestimated by a few per cent. Thus we are confident that our choice of initial parameters
for the binary populations in our models is not affecting the result that the core binary
fraction increases as a cluster evolves. We also note that differences in the setup of primordial
binaries between our simulations and those of Ivanova et al. (2005) make it difficult to
directly compare quoted binary frequencies. For example, by not accounting for pre-MS
stellar radii as we do, Ivanova et al. (2005) have a greater relative number of close binaries
in their primordial populations. Such an excess would result in a greater number of evolution-
induced binary mergers. If we were to adopt the period distribution and methods used by
Ivanova et al. (2005) we would need to choose ∼ 11 000 binaries in order to recover the 5 000
in our K100-5 model at birth. This gives an effective primordial binary frequency of 11%, for
the sake of comparison. The effective primordial binary frequency for the K24-50 simulation
would be 80%. Adopting these values, in the worst case scenario, would still not lead us to
conclude that the core binary fraction of an evolved cluster is decreased from the primordial
value.
The comparable rates of binary disruption and creation owing to exchanges in our
– 15 –
K100-5 simulation indicates that 3-body interactions dominated over 4-body interactions.
This is because the most likely outcome of a binary-binary encounter is a binary and two
single stars. So a binary is lost from the overall count. This is not the case for binary-
single encounters where the most likely outcome is a binary and a single star, although the
pairing of stars in the binary and/or the orbital parameters may have changed. By contrast,
exchange interactions in the K24-50 simulation produced a binary disruption rate much
higher than the binary creation rate. Here we had a much higher proportion of primordial
binaries and thus binary-binary encounters were more likely. Thus, in terms of exchange
interactions, increasing the primordial binary fraction can lead to a greater rate of binary
destruction. This would certainly be expected to be true of models with comparable stellar
densities. However, a competing effect comes from the fact that the central density is less for
simulations with higher primordial binary fractions. We certainly see this when comparing
our K100-5 and K100-10 models. The setup of these models was identical in all respects
except for the change in primordial binary frequency from 5% to 10%. The models have
similar half-lives and we showed that the core radius evolution is also similar. So at any
particular time in the evolution they are at a comparable dynamical age. But there is one
clear difference – the model with twice as many primordial binaries has a central stellar
density that is a factor of two less. This translates to a lower incidence of close stellar
encounters and as we saw from Figure 7 a greatly reduced fraction of exchange binaries
in the core. Previous simulations, albeit with small-N , have indicated that the effects of
primordial binaries saturate at some level (Wilkinson et al. 2003) so this is not necessarily
a trend that we expect will continue as the primordial binary fraction is increased towards
unity. However it is certainly significant for clusters with frequencies of 10% or less.
Another point to note is that in a 3-body exchange, not only is a binary not lost, but also
a more massive single star is swapped for a less massive one, increasing the likelihood that
the single star will be lost from the core via mass-segregation. So the exchange process has
indirectly increased the core binary fraction. The process of binary convection that became
evident from Figure 8b also is related to mass-segregation and acts to keep the core binary
fraction healthy. Both single stars and binaries in the core are subject to velocity kicks from
gravitational encounters. These kicks can remove an object from the core and even from the
cluster entirely. For binaries this is less likely to occur primarily because they are on average
more massive than single stars. Also, the average stellar mass decreases radially outwards
in an evolved cluster. So if a core binary suddenly finds itself outside of the core it can be
expected to be one of the more massive objects in its new local environment and thus to
quickly sink back towards the core. We note that we found the movement of binaries inwards
and outwards across the core boundary, as exhibited by Figures 6 and 8, to be quite striking.
Our K100-5 N -body simulation creates a realistic model of a moderate-size globular
– 16 –
cluster. It combines stellar and binary evolution with a self-consistent treatment of the
cluster dynamics. It includes primordial binaries and accounts for the tidal field of the
Galaxy. Thus it provides us with a solid picture of how such a cluster evolves. Single stars
escape from the cluster at a greater rate than binaries do – single stars are less massive on
average so they are more likely to be tidally stripped after segregating to the outer regions
of the cluster and also more likely to be ejected from the cluster in gravitational encounters.
However, binaries are also lost from the cluster population owing to supernova disruption,
evolution-induced mergers and dynamical encounters. These effects balance and the ratio
of single stars to binaries is similar at all times in the evolution. As the cluster evolves
binaries sink towards the centre and the binary fraction increases in the central regions.
The core radius decreases as core-collapse proceeds and dynamical encounters become more
prevalent. These encounters not only break-up binaries but also create new binaries. The
cluster evolves to a state where primordial binaries dominate the binary population in the
outer regions and non-primordial binaries dominate towards the centre.
In the centre of the cluster soft binaries are broken-up as a result of orbital perturbations
from gravitational encounters. Binaries become involved in exchange interactions, primarily
three-body, but these tend to create as many binaries as they destroy. Hard binaries are lost
when the components merge as a result of close binary evolution or a collision at periastron.
These are ongoing processes as the cluster evolves. At an age of 10Gyr the rate of exchange
interactions is greater than that of perturbed break-ups and mergers. However, perturbed
break-ups are the dominant cause of binary loss. This is compared to the Monte Carlo
model of Ivanova et al. (2005) which found that evolutionary mergers were the dominant
event at the same age. We also find that after 10Gyr, as the core density increases, that
binaries can be kicked out of the cluster directly from the core. Partly as a result of the
combination of these processes the number of binaries in the core decreases as the cluster
evolves. Also to blame is the movement of binaries outwards across the core boundary owing
to the decreasing size of the core and recoil velocities invoked in gravitational encounters.
However, the movement of single stars outwards across the core boundary is greater and the
net effect is an increase in the core binary fraction. This is also helped by binary convection
where binaries that were previously resident in the core are cycled back in.
Noting that the typical membership of Galactic globular clusters exceeds 300 000 stars
(Gnedin & Ostriker 1997, for example) we must ask the question – to what extent can
we expect this behaviour to extend to globular clusters in general? We can start with
the ejection rate, tej, of stars from an isolated cluster calculated by Hénon (1969) which
gives tej ∝ ln (0.4N) trh (Binney & Tremaine 1987). Here trh is the half-mass relaxation
timescale and we can relate this to behaviour near the core of a cluster if we assume that
core-mass scales with total mass and that radii do not vary appreciably with cluster mass.
– 17 –
This indicates that the relative rate of outward binary ejection and inward mass-segregation
(which occurs on a relaxation timescale) is only weakly dependent on the cluster mass. If
we look in detail at the local relaxation timescale this scales as
ρ ln (0.4N)
(Davies, Piotto & De Angeli 2004, as derived from Binney & Tremaine 1987) where σ is the
velocity dispersion of the cluster stars and ρ is the mass density. We can take σ ∝
M/rh ∝
c and ρ ∝ Mc, using the above assumptions, to show that tr ∝ M
c / ln (0.4N). Here
Mc is the cluster core-mass, M is the total cluster mass and rh is the half-mass radius. The
timescale for a typical binary in the core of a globular cluster to have a close encounter with
another star scales as
tenc ∝
(Davies, Piotto & De Angeli 2004) where n is the number density and n ∼ ρ if the average
stellar mass is of the order of M⊙, as it is in an evolved cluster core. This gives us tenc ∝
c . To escape the core a binary must acquire a boost in energy of the order of GMc/2 rc
(where G is the Gravitational constant). So, assuming that the average energy imparted in
an encounter does not vary strongly with mass, we have tej ∝ M
c . This rather simplified
analysis returns Hénon’s result and shows that asM (or N) increases there will be relatively
less binary convection as both the ejection and relaxation timescales increase. However, the
effect on the observed core binary fraction can be expected to be minimal.
We cannot definitively use our results to make predictions regarding globular clusters
such as 47 Tucanae because the central density in these clusters is at least an order of
magnitude higher than that reached by our models. However, we note that our model with
the highest core density showed the greatest increase in core binary fraction. Furthermore,
we have considered a range of cluster types. It does not appear, from our simulations, that
an initial binary fraction anywhere near as high as 100% is required to give a core population
of 20% or less at later times. We also note that proper-motion cleaned colour-magnitude
diagrams recently presented for NGC6397 (Richer et al. 2006) and M4 (Richer et al. 2004)
show a distinct lack of binaries in regions outside of the cluster centre – this cannot be
reconciled with a large primordial binary population.
6. Summary
We have presented a range of simulations typical of rich open clusters and moderate-size
globular clusters. In each case we find that the fraction of binaries in the core of a cluster
– 18 –
does not decrease as the cluster evolves. In fact the overriding trend is for an increase in core
binary fraction from the primordial value. Thus we do not agree with Ivanova et al. (2005)
that the binary fraction in the core will be depleted in time. We also do not agree that
models of globular cluster evolution need necessarily include large populations of primordial
binaries.
Our simulations have shown that the binary population in the core of a cluster is con-
tinually being replenished by stars from outside the core, many of which were previously in
the core. This is a process we have termed binary convection. We also find that the binary
content of an evolved star cluster is dominated by exchange binaries provided that the stellar
density is relatively high. This is true of our moderate-size globular cluster models and we
expect it to be true in more massive clusters. We also show that increasing the primordial
binary fraction does not necessarily lead to an increase in the final binary fraction – in fact
it gives more scope for binary depletion. A key and paradoxical result is that a final binary
fraction that can be achieved by choosing a higher primordial binary fraction may also be
replicated by choosing an initially lower binary fraction.
We find that the overall binary fraction of a cluster does not vary appreciably from the
primordial value as a cluster evolves. This is a result of binary destruction being balanced
by a greater rate of escape of single stars compared to binaries. We also find that the
primordial binary frequency of a cluster is well preserved outside of the cluster half-mass
radius. Therefore, observations of the current binary fraction in these regions is a good
indicator of the primordial binary fraction while determination of the core binary fraction
provides an upper limit.
We acknowledge the generous support of the Cordelia Corporation and that of Edward
Norton which has enabled AMNH to purchase GRAPE-6 boards and supporting hardware.
We thank the anonymous referee for extremely helpful comments and especially for alerting
us to the scaling considerations.
– 19 –
REFERENCES
Aarseth, S., Hénon, M., & Wielen, R. 1974, A&A, 37, 183
Aarseth, S. J. 1996, in Proc. IAU Symp. 174, Dynamical evolution of star clusters: con-
frontation of theory and observations., ed. P. Hut & J. Makiino (Dordrecht: Kluwer),
Aarseth, S. J. 1999, PASP, 111, 1333
Aarseth, S. J. 2003, Gravitational N-body Simulations: Tools and Algorithms (Cambridge
Monographs on Mathematical Physics). Cambridge University Press, Cambridge
Abt, H. A. 1983, ARA&A, 21, 343
Baumgardt, H., Makino, J., & Hut P. 2005, ApJ, 620, 238
Bellazzini, M., Fusi Pecci, F., Messineo, M., Monaco, L., & Rood, R. T. 2002, AJ, 123, 1509
Binney, J., & Tremaine, S. 1987, Galactic Dynamics. Princeton University Press, Princeton
Casertano, S. & Hut, P. 1985, ApJ, 298, 80
Cool, A. M., & Bolton, A. S. 2002, in ASP Conference Series 263, Stellar Collisions, Mergers
and their Consequences, ed. M.M. Shara (San Francisco: ASP), 163
Davies, M.B., Piotto, G., & De Angeli, F. 2004, MNRAS, 348, 129
Duquennoy, A., & Mayor, M. 1991, A&A, 248, 485
Eggleton, P.P., Fitchett, M., & Tout, C.A. 1989, ApJ, 347, 998
Fan, X., et al. 1996, AJ, 112, 628
Fleck, J.-J., Boily, C.M., Lançon, A., & Deiters S. 2006, MNRAS, 369, 1392
Gnedin, O.Y., & Ostriker, J.P. 1997, ApJ, 474, 223
Goodman, J., & Hut, P. 1989, Nature, 339, 40
Halbwachs, J.L., Mayor, M., Udry, S., & Arenou, F. 2003, A&A, 397, 159
Heggie, D.C. 1975, MNRAS, 173, 729
Heggie, D.C., & Hut, P. 2003, The Gravitational Million-Body Problem. Cambridge Univer-
sity Press, Cambridge
– 20 –
Heggie, D.C., Trenti, M., & Hut, P. 2006, MNRAS, 368, 677
Hénon, M. 1969, A&A, 2, 151
Hoffleit, D. 1983, The Bright Star Catalogue (4th ed.; New Haven: Yale University Obser-
vatory)
Hurley, J. R., Tout, C. A., Aarseth, S. J., & Pols, O.R. 2001, MNRAS, 323, 630
Hurley, J. R., & Shara, M.M. 2002, ApJ, 570, 184
Hurley, J. R., & Shara, M.M. 2003, ApJ, 589, 179
Hurley, J. R., Pols, O. R., Aarseth, S. J., & Tout, C. A. 2005, MNRAS, 363, 293
Hut, P. 1983, ApJ, 272, L29
Hut, P., McMillan, S., Goodman, J., Mateo, M., Phinney, E. S., Pryor, C., Richer, H. B.,
Verbunt, F., & Weinberg, M. 1992, PASP, 104, 981
Hut, P. 1996, in Proc. IAU Symp. 174, Dynamical evolution of star clusters: confrontation
of theory and observations., ed. P. Hut & J. Makiino (Dordrecht: Kluwer), 121
Ivanova, N., Belczynski, K., Fregeau, J.M., & Rasio, F.A. 2005, MNRAS, 358, 572
Kroupa, P. 1995, MNRAS, 277, 1507
Kroupa, P., Tout, C. A., & Gilmore, G. 1991, MNRAS, 251, 293
Kroupa, P., Tout, C. A., & Gilmore, G. 1993, MNRAS, 262, 545
Makino, J. 2002, in ASP Conference Series 263, Stellar Collisions, Mergers and their Conse-
quences, ed. M.M. Shara (San Francisco: ASP), 389
Mapelli, M., Sigurdsson, S., Colpi, M., Ferraro, F. R., Possenti, A., Rood, R. T., Sills, A.,
& Beccari, G. 2004, ApJ, 605, L29
McMillan, S., & Hut, P. 1994, ApJ, 427, 793
McMillan, S., Hut, P., & Makino, J. 1990, ApJ, 362, 522
Meylan, G. & Heggie, D.C. 1997, A&ARv, 8, 1
Richer, H. B., Fahlman, G. G., Brewer, J., Davis, S., Kalirai, J., Stetson, P.B., Hansen, B.
M. S., Rich, R. M., Ibata, R. A., Gibson, B. K., & Shara, M. M. 2004, AJ, 127, 2771
– 21 –
Richer, H. B., et al. 2006, Science, 313, 936
Shara, M. M., & Hurley, J. R. 2006, ApJ, 646, 464
Siess, L., Dufour, E., & Forestini, M. 2000, A&A, 358, 593
von Hoerner, S. 1960, Z. Astrophys., 50, 184
Wilkinson, M. I., Hurley, J. R., Mackey, A. D., Gilmore, G. F., & Tout, C. A. 2003, MNRAS,
343, 1025
This preprint was prepared with the AAS LATEX macros v5.2.
– 22 –
Fig. 1.— Evolution of the core radius (solid line) and the radius containing the inner 10% of
the cluster mass (dotted line) for: a) the K100-5 simulation; and b) the K24-50 simulation.
The numbers across the top show the number of half-mass relaxation times that have elapsed.
Note that Ns,0 and Nb,0 refer to the number of single stars and binaries, respectively, in the
starting model.
– 23 –
Fig. 2.— Period distribution of the primordial binary populations in: a) the K100-5 sim-
ulation (starting with 5 000 binaries); and b) the K24-50 simulation (starting with 12 000
binaries).
– 24 –
Fig. 3.— Evolution of the binary fraction in the core (solid line), within the 10% Lagrangian
radius (dotted line), and for the entire cluster (dashed line). Results are shown for the: a)
K100-5; b) K100-10; c) K50-20; and d) K24-50 simulations (see Table 1 for details).
– 25 –
Fig. 4.— Fraction of single stars (solid line) and binaries (dashed line) remaining in the
cluster as a function of time (lines decreasing from top-left). Each population is scaled by the
initial number of that population. Also shown are the fractions of single stars and binaries
that have escaped from the cluster (lines increasing from bottom-left). The dotted line is the
combined fraction of binaries lost to escape and binary/stellar evolution processes. Results
are for the K100-5 simulation.
– 26 –
Fig. 5.— Number of single stars in the core as a fraction of the number of single stars in the
cluster (solid line) and number of binaries in the core as a fraction of the number of binaries
in the cluster (dashed line). Results are for the K100-5 simulation.
– 27 –
Fig. 6.— Number of core binaries as a function of time (solid line). Also shown at each
time is the number of binaries that have remained in the core from the previous sampling
(dashed line). Results are for the K100-5 simulation and the data are sampled every 80Myr.
– 28 –
Fig. 7.— Fraction of binaries in the core that were created in an exchange interaction (solid
line) and fraction of core binaries that contain two degenerate stars (dotted line). Results
are shown for the: a) K100-5; b) K100-10; c) K50-20; and d K24-50 simulations (as described
in Table 1).
– 29 –
Fig. 8.— Statistics regarding core binaries across intervals of 80Myr as the K100-5 model
cluster evolves. Shown are: a) number of binaries destroyed in an exchange interaction
occurring in the core (solid line), number of binaries created in exchange interactions in the
core (dashed line) and the number of binaries destroyed by any means (dotted line); and b)
fraction of binaries that have moved out of the core but remained in the cluster (solid line:
as a fraction of the number of binaries in the core at the start of the interval), number of
binaries that have moved into the core (dashed line: as a fraction of the number of binaries
in the core at the end of the interval) and the fraction of binaries entering the core that
have previously resided in the core (dotted line). Note that the data have been moderately
smoothed – over a width of three bins (or 240Myr). Further smoothing would hide the
naturally irregular behaviour of the binary destruction/creation processes.
– 30 –
Fig. 9.— Cumulative numbers of events that lead to the destruction of binaries in the core.
Shown are: a) binaries broken-up in exchange encounters (solid line) and binaries broken-up
owing to orbital perturbations (dotted line); b) binaries that were ejected from the core
and escaped from the cluster (dashed line), binaries broken-up as a result of supernovae
explosions (dotted line) and binaries in which the stars merged (solid line) – this includes
stellar evolution induced mergers and collisions at periastron in highly eccentric binaries.
Results are for the K100-5 simulation.
– 31 –
Fig. 10.— As for Figure 9 but for the K24-50 simulation.
– 32 –
Fig. 11.— Comparison of core-radius evolution for models starting with 100 000 stars. The
K100-5 simulation is taken as a reference model and shown are differences between the core
radius of this model and models starting with 0% (dotted line) and 10% primordial binaries
(K100-10: dashed line). The difference is scaled by the core radius of the K100-5 model.
Note that for each simulation the core radius used is the average core radius in a 250Myr
interval.
– 33 –
Fig. 12.— Binary data as a function of radial position for the K100-5 model. Shown at
times of 6Gyr (solid line), 12Gyr (dashed line) and 18Gyr (dotted line) are: a) distribu-
tion of binary fraction; b) fraction of binaries that are primordial. At each time there are
twenty radial bins each containing the same mass, i.e. corresponding to Lagrangian radii
incremented by 5%. Thus the core is not resolved.
– 34 –
Table 1. Details of the four N -body simulations utilised in this work. Columns 1 and 2
show the number of single stars and binaries in the starting model. The distribution used
to select the orbital separations of the primordial binaries is given in column 3 and this is
followed by the maximum applied to the distribution (in au). Column 5 lists the primordial
binary fraction and in column 6 we show the typical stellar density in the core for the
simulation (stars/pc3). The half-life of the simulation (time in Myr for Ns +Nb to drop to
half the initial value) is given in column 7 and finally an identifying label is supplied for
each simulation.
Ns,0 Nb,0 ψ(a) amax fb,0 nc t1/2 label
95000 5000 EFT30 100 0.05 102 − 104 8920 K100-5
90000 10000 EFT30 100 0.10 100− 500 8850 K100-10
40000 10000 EFT30 50 0.20 103 5560 K50-20
12000 12000 log a 50 0.50 100− 350 2060 K24-50
	Introduction
	Models
	Binary Period Distributions
	Results
	Discussion
	Summary
ABSTRACT
  We investigate the evolution of binary fractions in star clusters using
N-body models of up to 100000 stars. Primordial binary frequencies in these
models range from 5% to 50%. Simulations are performed with the NBODY4 code and
include a full mass spectrum of stars, stellar evolution, binary evolution and
the tidal field of the Galaxy. We find that the overall binary fraction of a
cluster almost always remains close to the primordial value, except at late
times when a cluster is near dissolution. A critical exception occurs in the
central regions where we observe a marked increase in binary fraction with time
-- a simulation starting with 100000 stars and 5% binaries reached a core
binary frequency as high as 40% at the end of the core-collapse phase
(occurring at 16 Gyr with ~20000 stars remaining). Binaries are destroyed in
the core by a variety of processes as a cluster evolves, but the combination of
mass-segregation and creation of new binaries in exchange interactions produces
the observed increase in relative number. We also find that binaries are cycled
into and out of cluster cores in a manner that is analogous to convection in
stars. For models of 100000 stars we show that the evolution of the core-radius
up to the end of the initial phase of core-collapse is not affected by the
exact value of the primordial binary frequency (for frequencies of 10% or
less). We discuss the ramifications of our results for the likely primordial
binary content of globular clusters.

<|endoftext|><|startoftext|>
Approaching the Heisenberg limit in an atom laser
M. Jeppesen,1 J. Dugué,1, 2 G. R. Dennis,1 M. T. Johnsson,1 C. Figl,1 N. P. Robins,1 and J. D. Close1
Australian Research Council Centre Of Excellence for Quantum-Atom Optics,
Department of Physics, The Australian National University, Canberra, ACT 0200, Australia
Laboratoire Kastler-Brossel, 24 rue Lhomond, 75231 Paris Cedex 05, France
We present experimental and theoretical results showing the improved beam quality and reduced
divergence of an atom laser produced by an optical Raman transition, compared to one produced
by an RF transition. We show that Raman outcoupling can eliminate the diverging lens effect that
the condensate has on the outcoupled atoms. This substantially improves the beam quality of the
atom laser, and the improvement may be greater than a factor of ten for experiments with tight
trapping potentials. We show that Raman outcoupling can produce atom lasers whose quality is
only limited by the wavefunction shape of the condensate that produces them, typically a factor of
1.3 above the Heisenberg limit.
PACS numbers: 03.75.Pp,03.75.Mn
Experiments in ultracold dilute atomic gases have had
an enormous impact on physics. The realization of Bose-
Einstein condensates (BECs), degenerate Fermi gases,
BEC-BCS crossover systems, and many others have re-
sulted in many fundamental insights and a wealth of new
results in both experiment and theory. One exciting sys-
tem to emerge from this research is the atom laser, a
highly coherent, directional beam of degenerate atoms,
controllably released from a BEC [1, 2, 3, 4, 5, 6, 7, 8].
The atom lasers demonstrated so far have produced
beams many orders of magnitude brighter than is pos-
sible with thermal atomic beams [9].
Atom laser beams show great promise for studies
of fundamental physics and in high precision measure-
ments [10]. In the future, it will be possible to produce
quadrature squeezing in atoms lasers, to use atom lasers
to produce correlations and entanglement between mas-
sive particles [11], as well as high precision interferome-
ters both on earth and in space [12]. For all these it will
be crucial to develop atom lasers with output modes that
as clean as possible in amplitude and phase, to allow sta-
ble modematching, just as it was crucial for optical lasers.
The beam quality factor M2, introduced for atom lasers
by J.-F. Riou et al. [13, 14], is a measure of how far the
beam deviates from the Heisenberg limit, and is defined
∆x∆px, (1)
where ∆x is the beam width, measured at the waist,
and ∆px is the transverse momentum spread. An ideal
(Gaussian) beam would therefore have M2 = 1 along
both its principal transverse axes. A number of exper-
imental works have shown that the beam quality of an
atom laser is strongly affected by the interaction of the
outcoupled atoms with the BEC from which it is pro-
duced [13, 15, 16, 17, 18]. As the atoms fall through the
condensate, the repulsive interaction acts as a diverging
lens to the outcoupled atoms. This leads to a divergence
in the atom laser beam and (because the BEC is a non-
ideal lens) a poor quality transverse beam profile. Such
behavior may cause problems in mode matching the atom
laser beam to another atom laser, a cavity or to a waveg-
FIG. 1: (color online). Top: Sequence of atom laser beams
showing the improved beam profile of a Raman atom laser.
The atom laser beams were produced using RF (a) and Ra-
man (b and c) transitions. The angle between the Raman
beams (see Fig. 2 (a)) was θ = 30◦ in (b) and θ = 140◦ in
(c), corresponding to a kick of 0.5h̄k (0.3 cm/s) and 1.9h̄k
(1.1 cm/s) respectively. The outcoupling rate differs be-
tween each atom laser. Below: Comparison of experimental
(dashed) and theoretical (solid) beam profiles 500 µm below
the BEC. The height of each theoretical curve has been scaled
to match experimental data.
http://arxiv.org/abs/0704.0291v2
uide. Experiments on atom lasers in waveguides have
produced beams with improved spatial profile [7]. How-
ever, precision measurements with atom interferometry
are likely to require propagation in free space, to avoid
introducing noise from the fluctuations in the waveguide
itself [12].
In a recent Letter [13], it was shown that the quality of
a free space atom laser is improved by outcoupling from
the base of the condensate. Our scheme, however, en-
ables the production of a high quality atom laser while
outcoupling from the center of the condensate. This is
desirable for a number of reasons: First, because the clas-
sical noise level is determined by the outcoupling Rabi
frequency, then outcoupling from the center, where the
density is greatest, gives the highest possible output flux
for a given classical noise level [19]. Second, outcoupling
from the center allows the longest operating time (for a
quasicontinuous atom laser) since the condensate can be
drained completely. Third, outcoupling from the center
minimizes the sensitivity of the output coupling to con-
densate excitations or external fluctuations.
In a recent Letter [9], we have demonstrated a contin-
uously outcoupled atom laser where the output coupler
is a coherent multi-photon (Raman) transition [6, 20]. In
this scheme, the atoms receive a momentum kick from the
absorption and emission of photons. They leave the con-
densate more quickly, so that adverse effects due to the
mean-field repulsion from the condensate are reduced.
In this Letter, we report measurements of a substantial
improvement in the beam quality M2 using this outcou-
pling. In Fig. 1, we show absorption images of atom
laser beams outcoupled from the center of a BEC with
(a) negligible momentum kick, (b) a kick of 0.3 cm/s,
and 1.1 cm/s (c). As the kick increases, the divergence
is reduced and the beam profile improved.
In our experiment, we create 87Rb BECs of
5× 105 atoms in the |F = 1,mF = −1〉 state via stan-
dard runaway evaporation of laser cooled atoms. We
use a highly stable, water cooled QUIC magnetic trap
(axial frequency ωy = 2π × 12 Hz and radial frequency
ωρ = 2π × 128 Hz, with a bias field of B0 = 2 G). We
control drifts in the magnetic bias by using high stability
power supplies and water cooling. This stability allows
us to precisely and repeatably address the condensate.
We produce the atom laser by transferring the atoms to
the untrapped |F = 1,mF = 0〉 state and letting them
fall under gravity. To outcouple atoms with negligible
momentum kick we induce spin flips via an RF field of
a frequency corresponding to the Zeeman shift in the
center of the condensate. Alternatively, we induce the
spin flips via an optical Raman transition. The setup
is shown in Fig. 2 (a). Two optical Raman beams, sep-
arated by an angle θ, propagate in the plane of grav-
ity and the magnetic trap bias field. The momentum
transfer to the atoms through absorption and emission of
the photons is 2h̄k sin(θ/2), with k the wave number of
FIG. 2: (color online) (a) Experimental schematic (not to
scale) showing the BEC, Raman lasers, and trapping coils.
(b) Cross section along the two strong axes of the magnetic
trap, showing the BEC, outcoupling surface, and atom laser
trajectories. Note that the field of view in (b) is rotated 90◦
with respect to (a).
the laser beams. The Raman laser beams are produced
from one 700 mW diode laser. We can turn the laser
power on or off in less than 200 ns using a fast switching
AOM in a double pass configuration. After the switch-
ing AOM, the light is split and sent through two separate
AOMs, again each in a double pass configuration. The
frequency difference between the AOMs corresponds to
the Zeeman plus kinetic energy difference between the
initial and final states of the two-photon Raman transi-
tion. We stabilize the frequency difference by running
the 80 MHz function generators driving the AOMs from
a single oscillator. The beams are then coupled via single
mode, polarization maintaining optical fibers directly to
the BEC through a collimating lens and waveplate, pro-
viding a maximum intensity of 2500 mW/cm2 per beam
at the BEC. The polarization of the beams is optimized
to achieve maximum outcoupling with a downward kick
and corresponds to π polarization for the upper beam
and σ+ for the lower beam.
The outcoupling resonance is set to the center of the
BEC for both RF and Raman outcoupling, as shown
in Fig. 2 (b). This point is found by performing spec-
troscopy on the BEC using 100 ms of weak output cou-
pling at varying RF frequencies, and measuring the num-
ber of atoms remaining in the condensate after the output
coupling time [3]. A typical calibration curve is shown in
Fig. 3 (a), in this case for RF outcoupling. We operate
both RF and Raman output couplers at the point of max-
imum outcoupling rate. We further check this frequency
by ensuring that a continuous beam can still be produced
when the initial condensate is very small, which can only
happen when outcoupling from the center.
We observe the system using standard absorption
imaging along the y (weak trapping) direction, on the
F = 2 → F ′ = 3 transition, with a 200 µs pulse of
repumping light (F = 1 → F ′ = 2) 1 ms prior to imag-
ing. From these images we are able to extract the rms
width of the atom laser as a function of fall distance (see
FIG. 3: (a) Output coupling spectroscopy showing the oper-
ating point at the center of the BEC, solid curve to guide the
eye. (b) The rms beam width for RF and Raman atom lasers.
The dots represent experimental measurements and the solid
curves our theoretical predictions.
Fig. 3 (b)), which we use to calculateM2 (details below).
To model the system, we use a two-step method fol-
lowing [13]. Inside the condensate, we use the WKB
approximation, by integrating the phase along the clas-
sical trajectories of atoms moving in the Thomas-Fermi
potential of the condensate (an inverted paraboloid) [16].
After this, we propagate the atom laser wavefunction us-
ing a Kirchoff-Fresnel diffraction integral over the surface
of the condensate:
ψ(r) =
dS′ · [G∇′ ψ − ψ∇′G], (2)
where G = G(r, r′) is the Green’s function for the Hamil-
tonian in the gravitational potential V (r) = −mgz [21].
Therefore, the model includes only interactions between
condensate atoms and beam atoms; interactions between
atoms within the beam are ignored. The integral in
Eq. (2) is formally a two dimensional surface integral
over the whole condensate. However for simplicity, fol-
lowing [13], we neglect divergence in the weak trapping
axis and only consider cross sections in the plane of the
strong trapping axes, and so the integral becomes one
dimensional. A 3D wavefunction is built up by calculat-
ing the atom laser in a series of planes along the weak
trapping axis.
We ignore the effects of the magnetic field on the atom
laser. The atom laser state |F = 1,mF = 0〉 is unaffected
to first order by the magnetic field, but is weakly anti-
trapped due to the second order Zeeman effect, with an
effective trapping frequency of ω2nd = 2π × 2.6 Hz. The
transverse position of an atom in such a potential is
x(t) = x0 cosh(ω2ndt) ≈ x0(1 + ω
2/2). (3)
For the 1 mm (14 ms) propagation we consider here the
transverse position is affected by less than 3%. We also
ignore the AC Stark effect of the Raman beams on the
atom laser, because the intensity of the beams does not
change significantly over the 1 mm propagation.
We have checked the validity of this model against
a solution of the full 3D Gross-Pitaesvskii (GP) equa-
tion, including beam-beam interactions. To find the
atom laser wavefunction at large distances below the con-
densate (up to 1 mm), we transfer the GP model to a
freely falling frame once the atom laser wavefunction has
reached steady state. The details of the calculation will
be the basis of a future publication. The two models give
good agreement.
Calculating the quality factor M2 of the atom laser
directly from Eq. (1) requires measurement of the beam
width at the waist ∆x0. Because the BEC acts as a
diverging lens on the atom laser, the beam waist is virtual
and located above the BEC, and so it is not possible to
measure the beam quality M2 using Eq. 1 only. For
our simulations, M2 is calculated equivalently from the
wavefunction ψ(x, y, z) at some height z below the BEC
in which the atom laser has reached the paraxial regime:
(M2/2)2 = (∆x(z))2(∆kx(z))
2 − C(z)2, (4)
where ∆x(z) is beam width and C(z) is the curvature-
beam width product [22]:
C(z) =
dx. (5)
In practice it is difficult to measure the wavefunction
phase, and hence C(z). However the beam width, in
the paraxial regime, obeys:
∆(x(t))2 = (∆x0)
2 + (∆vx)
2(t− tw)
2, (6)
where tw is the time when the beam is at its waist, and
∆x0 is the beam waist. In principle M
2 may be deter-
mined simply from measurements of the beam width at
different heights. In our experiment, we can only measure
the beam width in the far field, at distances greater than
300 µm below the condensate (observation at distances
less than 300 µm are prevented by the condensate expan-
sion after trap switchoff.) In the far field the second term
of Eq. 6 dominates, and so only the velocity spread can
be measured. Therefore we calculate ∆x0 and tw from
the model, tw = mC(z)/(h̄∆k
x), with tw negative since
the waist is virtual and located above the BEC. We then
fit to the experimental data to find ∆vx.
In Fig. 4, we present the theoretical and experimental
results. We find that as the kick increases, the beam qual-
ity is improved and the divergence is reduced. For our
parameters, we find that for an RF atom laserM2 = 2.2,
and for a Raman atom laser M2 = 1.4 with the maxi-
mum two photon kick. As the kick increases,M2 contin-
ues to improve, and approaches but does not reach the
Heisenberg limit of one. It asymptotes to a limit slightly
above that, which for our parameters is equal to 1.3. In
this regime of large kick, the interaction of the outcou-
pled atoms with the condensate becomes negligible, and
FIG. 4: (a) Calculated quality factor M2 of an atom laser.
The dots are the experimental measurements, and the solid
line our theoretical predictions. (b) M2 as a function of trap-
ping frequency for an RF atom laser (dashed line), a kick
of 0.5h̄k (0.3 cm/s) (dotted line), and 2h̄k (1.1 cm/s) (solid
line). The condensate number was N = 5 × 105 atoms, and
the aspect ratio ωρ/ωy was 10.
the transverse atom laser wavefunction is approximately
the free space evolution of the condensate wavefunction
(along the outcoupling surface). It is therefore limited
by the non-ideal (non-Gaussian) condensate wavefunc-
tion itself. We calculate the product ∆x∆px for the con-
densate wavefunction (taken through the central horizon-
tal plane of the condensate) to be 1.3. We have therefore
improved the beam quality M2 by 50 percent, down to
a factor of 1.4 above the Heisenberg limit. In addition,
our simulations show that (using the same maximum two
photon kick) it is possible to reach the condensate limit
even for much tighter trapping potentials. In Fig 4 (b),
we show the results of simulations for increasing trap fre-
quencies, up to ω = 2π × 300 Hz. As the trap frequency
increases, theM2 worsens, up toM2 = 14 for RF outcou-
pling from a 2π×300 Hz trap. For the maximum Raman
two photon kick, the increase is only to M2 = 1.7 for
the same 2π × 300 Hz trap. Only for traps of less than
2π×50 Hz is the beam quality of an RF atom laser within
5 percent of that of a Raman atom laser.
With higher order Raman transitions [23], it will be
possible to reach the condensate limit even for exper-
iments with traps of several kilohertz. It will also be
possible to reach the Heisenberg limit by completely re-
moving the atomic interaction, for example by using a
Feschbach resonance. Using Raman lasers phase locked
to the 6.8 GHz hyperfine splitting will prevent populat-
ing the anti-trapped state, and produce a truly two state
atom laser [18, 24]. Such lasers, combined with the high
quality transverse mode of Raman atom lasers, could be
used in a continuous version of the atomic Mach-Zehnder
Bragg interferometer [25], and in the development of
atomic local oscillators.
We thank Ruth Mills for useful discussions. CF ac-
knowledges funding from the Alexander von Humboldt
foundation. This work was financially supported by the
Australian Research Council Centre of Excellence pro-
gram. Numerical simulations were done at the APAC
National Supercomputing Facility.
∗ Electronic address: matthew.jeppesen@anu.edu.au;
URL: http://www.acqao.org
[1] H. M. Wiseman, Phys. Rev. A 56, 2068 (1997).
[2] M.-O. Mewes, M. R. Andrews, D. M. Kurn, D. S. Durfee,
C. G. Townsend, and W. Ketterle, Phys. Rev. Lett. 78,
582 (1997).
[3] I. Bloch, T. W. Hänsch, and T. Esslinger, Phys. Rev.
Lett. 82, 3008 (1999).
[4] F. Gerbier, P. Bouyer, and A. Aspect, Phys. Rev. Lett.
86 (2001).
[5] G. Cennini, G. Ritt, C. Geckeler, and M. Weitz, Phys.
Rev. Lett. 91, 240408 (2003).
[6] E. W. Hagley, L. Deng, M. Kozuma, J. Wren, K. Helmer-
son, S. L. Rolston, and W. D. Phillips, Science 283, 1706
(1999).
[7] W. Guerin, J.-F. Riou, J. P. Gaebler, V. Josse, P. Bouyer,
and A. Aspect, Phys. Rev. Lett. 97, 200402 (2006).
[8] A. Öttl, S. Ritter, and T. Köhl, M. Esslinger, Phys. Rev.
Lett. 95, 090404 (2005).
[9] N. P. Robins, C. Figl, S. A. Haine, A. K. Morrison,
M. Jeppesen, J. J. Hope, and J. D. Close, Phys. Rev.
Lett. 96, 140403 (2006).
[10] M. A. Kasevich, Science 298, 1363 (2002).
[11] S. A. Haine, M. K. Olsen, and J. J. Hope, Phys. Rev.
Lett. 96, 133601 (2006).
[12] Y. Le Coq, J. A. Retter, S. Richard, A. Aspect, and
P. Bouyer, App. Phys. B 84, 627 (2006).
[13] J.-F. Riou, W. Guerin, Y. L. Coq, M. Fauquembergue,
V. Josse, P. Bouyer, and A. Aspect, Phys. Rev. Lett. 96,
070404 (2006).
[14] A. E. Siegman, IEEE. J. Quantum Electron. 27, 1146
(1991).
[15] M. Köhl, T. Busch, K. Mølmer, T. W. Hänsch, and
T. Esslinger, Phys. Rev. A 72, 063618 (2005).
[16] T. Busch, M. Köhl, T. Esslinger, and K. Mølmer, Phys.
Rev. A 65, 043615 (2002).
[17] Y. L. Coq, J. H. Thywissen, S. A. Rangwala, F. Gerbier,
S. Richard, G. Delannoy, P. Bouyer, and A. Aspect, Phys.
Rev. Lett. 87, 170403 (2001).
[18] A. Öttl, S. Ritter, M. Kohl, and T. Esslinger, Rev. of Sci.
Instrum. 77, 063118 (2006).
[19] N. P. Robins, A. K. Morrison, J. J. Hope, and J. D. Close,
Phys. Rev. A 72, 031606 (2005).
[20] J. Ruostekoski, T. Gasenzer, and D. Hutchinson, Phys.
Rev. A 68, 011604 (2003).
[21] C. J. Bordé, C. R. Acad. Sci. Paris 4, 509 (2001).
[22] J.-F. Riou, Ph.D. thesis, Institut D’Optique (2006).
[23] M. Kozuma, L. Deng, E. W. Hagley, J. Wen, R. Lutwak,
K. Helmerson, S. L. Rolston, and W. D. Phillips, Phys.
Rev. Lett. 82, 871 (1999).
[24] J. Dugué, N. P. Robins, C. Figl, M. Jeppesen, P. Sum-
mers, M. T. Johnsson, J. J. Hope, and J. D. Close, Phys.
Rev. A 75, 053602 (2007).
[25] Y. Torii, Y. Suzuki, M. Kozuma, T. Sugiura, T. Kuga,
L. Deng, and E. W. Hagley, Phys. Rev. A 61, 041602
(2000).
mailto:matthew.jeppesen@anu.edu.au
http://www.acqao.org
ABSTRACT
  We present experimental and theoretical results showing the improved beam
quality and reduced divergence of an atom laser produced by an optical Raman
transition, compared to one produced by an RF transition. We show that Raman
outcoupling can eliminate the diverging lens effect that the condensate has on
the outcoupled atoms. This substantially improves the beam quality of the atom
laser, and the improvement may be greater than a factor of ten for experiments
with tight trapping potentials. We show that Raman outcoupling can produce atom
lasers whose quality is only limited by the wavefunction shape of the
condensate that produces them, typically a factor of 1.3 above the Heisenberg
limit.

<|endoftext|><|startoftext|>
Introduction 2
2 Overview of the cone jet-finding algorithm 5
3 IR unsafety in the midpoint algorithm 6
4 An exact seedless cone jet definition 8
4.1 One-dimensional example . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4.2 The two-dimensional case . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.2.1 General approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.2.2 Specific computational strategies . . . . . . . . . . . . . . . . . . . 11
4.3 The split–merge part of the cone algorithm . . . . . . . . . . . . . . . . . . 14
5 Tests and comparisons 16
5.1 Measures of IR (un)safety . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
5.2 Speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5.3 Rsep: an inexistent problem . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.4 Physics impact of seedless v. midpoint cone . . . . . . . . . . . . . . . . . 23
5.4.1 Inclusive jet spectrum . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.4.2 Jet masses in 3-jet events . . . . . . . . . . . . . . . . . . . . . . . 26
6 Conclusions 30
A Further computational details 32
A.1 Cone multiplicities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
A.2 Computational complexity of the split–merge step . . . . . . . . . . . . . . 32
B Proof of IR safety of the SISCone algorithm 33
B.1 General aspects of the proof . . . . . . . . . . . . . . . . . . . . . . . . . . 34
B.2 Split–merge ordering variable . . . . . . . . . . . . . . . . . . . . . . . . . 37
1 Introduction
Two broad classes of jet definition are generally advocated [1] for hadron colliders. One
option is to use sequential recombination jet algorithms, such as the kt [2] and Cam-
bridge/Aachen algorithms [3], which introduce a distance measure between particles, and
repeatedly recombine the closest pair of particles until some stopping criterion is reached.
While experimentally these are starting to be investigated [4, 5], the bulk of measurements
are currently carried out with the other class of jet definition, cone jet algorithms (see e.g.
[6]). In general there are indications [7] that it may be advantageous to use both sequential
recombination and cone jet algorithms because of complementary sensitivities to different
classes of non-perturbative corrections.
Cone jet algorithms are inspired by the idea [8] of defining a jet as an angular cone
around some direction of dominant energy flow. To find these directions of dominant
energy flow, cone algorithms usually take some (or all) of the event particles as ‘seeds’,
i.e. trial cone directions. Then for each seed they establish the list of particles in the trial
cone, evaluate the sum of their 4-momenta, and use the resulting 4-momentum as a new
trial direction for the cone. This procedure is iterated until the cone direction no longer
changes, i.e. until one has a “stable cone”.
Stable cones have the property that the cone axis a (a four-vector) coincides with the
(four-vector) axis defined by the total momentum of the particles contained in the cone,
D (pin cone, a) = 0 , with pin cone =
pi Θ(R−D(pi, a)) , (1)
where D(p, a) is some measure of angular distance between the four-momentum p and the
cone axis a, and R is the given opening (half)-angle of the cone, also referred to as the cone
radius. Typically one defines D2(p, a) = (yp− ya)2+(φp−φa)2, where yp, ya and φp, φa are
respectively the rapidity and azimuth of p and a.
Two types of problem arise when using seeds as starting points of an iterative search
for stable cones. On one hand, if one only uses particles above some momentum threshold
as seeds, then the procedure is collinear unsafe. Alternatively if any particle can act as a
seed then one needs to be sure that the addition of an infinitely soft particle cannot lead
to a new (hard) stable cone being found, otherwise the procedure is infrared (IR) unsafe.
The second of these problems came to fore in the 1990’s [9], when it was realised that
there can be stable cones that have two hard particles on opposing edges of the cone and
no particles in the middle, e.g. for configurations such as
pt1 > pt2; R < D(p1, p2) < (1 + pt2/pt1)R. (2)
In traditional iterative cone algorithms, p1 and p2 each act as seeds and two stable cones
are found, one centred on p1, the other centred on p2. The third stable cone, centred
between p1 and p2 (and containing them both) is not found. If, however, a soft particle
is added between the two hard particles, it too acts as a seed and the third stable cone is
then found. The set of stable cones (and final jets) is thus different with and without the
soft particle and there is a resulting non-cancellation of divergent real soft production and
corresponding virtual contributions, i.e. the algorithm is infrared unsafe.
Infrared unsafety is a serious issue, not just because it makes it impossible to carry
out meaningful (finite) perturbative calculations, but also because it breaks the whole
relation between the (Born or low-order) partonic structure of the event and the jets that
one observes, and it is precisely this relation that a jet algorithm is supposed to codify:
it makes no sense for the structure of multi-hundred GeV jets to change radically just
because hadronisation, the underlying event or pileup threw a 1 GeV particle in between
them.
A workaround for the above IR unsafety problem was proposed in [9]: after finding
the stable cones that come from the true seed particles, add artificial “midpoint” seeds
between pairs of stable cones and search for new stable cones that arise from the midpoint
seeds. For configurations with two hard particles, the midpoint fix resolved the IR unsafety
issue. It was thus adopted as a recommendation [6] for Run II of the Tevatron and is now
coming into use experimentally [10, 11].
Recently, it was observed [1] that in certain triangular three-point configurations there
are stable cones that are not identified even by the midpoint procedure. While these can be
identified by extended midpoint procedures (e.g. midpoints between triplets of particles)
[12, 13], in this article (section 3) we show that there exist yet other 3-particle configurations
for which even this fix does not find all stable cones.
Given this history of infrared safety problems being fixed and new ones being found,
it seems to us that iterative1 cone algorithms should be abandoned. Instead we believe
that cone jet algorithms should solve the mathematical problem of demonstrably finding
all stable cones, i.e. all solutions to eq. (1). This kind of jet algorithm is referred to as
an exact seedless cone jet algorithm [6] and has been advocated before in [16]. With an
exact seedless algorithm, the addition of one or more soft particles cannot lead to new
hard stable cones being found, because all hard stable cones have already been (provably)
found. Therefore the algorithm is infrared safe at all orders.
Two proposals exist for approximate implementations of the seedless jet algorithm
[6, 17]. They both rely on the event being represented in terms of calorimeter towers,
which is far from ideal when considering parton or hadron-level events. Ref. [6] also pro-
posed a procedure for an exact seedless jet algorithm, intended for fixed-order calculations,
and implemented for example in the MCFM and NLOJet fixed order (NLO) codes [18, 19].2
This method takes a time O
to find jets among N particles. While perfectly ad-
equate for fixed order calculations (N ≤ 4), a recommendation to extend the use of such
seedless cone implementations more generally would have little chance of being adopted
experimentally: the time to find jets in a single (quiet!) event containing 100 particles
would approach 1017 years.
Given the crucial importance of infrared safety in allowing one to compare theoretical
predictions and experimental measurements, and the need for the same algorithm to be
used in both, there is a strong motivation for finding a more efficient way of implementing
the seedless cone algorithm. Section 4 will show how this can be done, first in the context
of a simple one-dimensional example (sec. 4.1), then generalising it to two dimensions (y,
φ, sec. 4.2) with an approach that can be made to run in polynomial (N2 lnN) time. As
1A more appropriate name might be the doubly iterative cone algorithm, since as well as iterating the
cones, the cone algorithm’s definition has itself seen several iterations since its original introduction by
UA1 in 1983 [14], and even since the Snowmass accord [15], the first attempt to formulate a standard,
infrared and collinear-safe cone-jet definition, over 15 years ago.
2Section 3.4.2 of [6] is the source of some confusion regarding nomenclature, because after discussing
both the midpoint and seedless algorithms, it proceeds to show some fixed-order results calculated with
the seedless algorithm, but labelled as midpoint. Though both algorithms are IR safe up to the order that
was shown, they would not have given identical results.
in recent work on speeding up the kt jet-algorithm [20], the key insights will be obtained
by considering the geometrical aspects of the problem. Section 4.3 will discuss aspects of
the split–merge procedure.
In section 5 we will study a range of physics and practical properties of the seedless
algorithm. Given that the split–merge stage is complex and so yet another potential source
of infrared unsafety, we will use Monte Carlo techniques to provide independent evidence
for the safety of the algorithm, supplementing a proof given in appendix B. We will
examine the speed of our coding of the algorithm and see that it is as fast as publicly
available midpoint codes. We will also study the question of the relation between the low-
order perturbative characteristics of the algorithm, and its all-order behaviour, notably
as concerns the ‘Rsep’ issue [21, 1]. Finally we highlight physics contexts where we see
similarities and differences between our seedless algorithm and the midpoint algorithm.
For inclusive quantities, such as the inclusive jet spectrum, perturbative differences are of
the order of a few percent, increasing to 10% at hadron level owing to reduced sensitivity to
the underlying event in the seedless algorithm. For exclusive quantities we see differences
of the order of 10− 50%, for example for mass spectra in multi-jet events.
2 Overview of the cone jet-finding algorithm
Algorithm 1 A full specification of a modern cone algorithm, governed by four param-
eters: the cone radius R, the overlap parameter f , the number of passes Npass and a
minimum transverse momentum in the split–merge step, pt,min. Throughout, particles are
to be combined by summing their 4-momenta and distances are to be calculated using the
longitudinally invariant ∆y and ∆φ distance measures (where y is the rapidity).
1: Put the set of current particles equal to the set of all particles in the event.
2: repeat
3: Find all stable cones of radius R (see Eq. (1)) for the current set of particles, e.g.
using algorithm 2, section 4.2.2.
4: For each stable cone, create a protojet from the current particles contained in the
cone, and add it to the list of protojets.
5: Remove all particles that are in stable cones from the list of current particles.
6: until No new stable cones are found, or one has gone around the loop Npass times.
7: Run a Tevatron Run-II type split–merge procedure [6], algorithm 3 (section 4.3), on
the full list of protojets, with overlap parameter f and transverse momentum threshold
pt,min.
Before entering into technical considerations, we outline the structure of a modern cone
jet definition as algorithm 1, largely based on the Tevatron Run-II specification [6]. It is
governed by four parameters. The cone radius R and overlap parameter f are standard
and appeared in previous cone algorithms. The Npass variable is new and embodies the
suggestion in [1] that one should rerun the stable cone search to eliminate dark towers [21],
particle pt [GeV] y φ
1 400 0 0
2 110 0.9R 0
3 90 2.3R 0
4 1.1 1.5R 0
Table 1: Particles 1–3 represent a hard configuration. The jets from this hard configuration
are modified in the midpoint cone algorithm when one adds the soft particle 4.
i.e. particles that do not appear in any stable cones (and therefore never appear in jets)
during a first pass of the algorithm, even though they can correspond to significant energy
deposits. A sensible default is Npass = ∞ since, as formulated, the procedure will in any
case stop once further passes find no further stable cones. The pt,min threshold for the
split–merge step is also an addition relative to the Run II procedure, inspired by [12, 7].
It is discussed in section 4.3 together with the rest of the split–merge procedure and may
be set to zero to recover the original Run II type behaviour, a sensible default.
The main development of this paper is the specification of how to efficiently carry out
step 3 of algorithm 1. In section 3 we will show that the midpoint approximation for
finding stable cones fails to find them all, leading to infrared unsafety problems. Section 4
will provide a practical solution. Code corresponding to this algorithm is available publicly
under the name of ‘Seedless Infrared Safe Cone’ (SISCone).
3 IR unsafety in the midpoint algorithm
Until now, the exact exhaustive identification of all stable cones was considered to be too
computationally complex to be feasible for realistic particle multiplicities. Instead, the
Tevatron experiments streamline the search for stable cones with the so-called ’midpoint
algorithm’ [9]. Given a seed, the latter calculates the total momentum of the particles
contained within a cone centred on the seed, uses the direction of this momentum as a new
seed and iterates until the resulting cone is stable. The initial set of seeds is that of all
particles whose transverse momentum is above a seed threshold s (one may take s = 0 to
obtain a collinear-safe algorithm). Then, one adds a new set of seeds given by all midpoints
between pairs of stable cones separated by less than 2R and repeats the iterations from
these midpoint seeds.
The problem with the midpoint cone algorithm can be seen from the configurations of
table 1, represented also in fig. 1. Using particles 1 − 3, there exist three stable cones.
In a pt-scheme recombination procedure (a pt weighted averaging of y and φ) they are at
y ≃ {0.194R, 1.53R, 2.3R}.3 Note however that starting from particles 1, 2, 3 as seeds, one
only iterates to the stable cones at y ≃ 0.194R and y = 2.3R. Using the midpoint between
3In a more standard E-scheme (four-momentum) recombination procedure the exact numbers depend
slightly on R, but the conclusions are unchanged.
p t/GeV p t/GeV
(a) (b)
y0 1 2 3−1
y0 1 2 3−1
Figure 1: Configuration illustrating one of the IR unsafety problems of the midpoint jet
algorithm (R = 1); (a) the stable cones (ellipses) found in the midpoint algorithm; (b)
with the addition of an arbitrarily soft seed particle (red wavy line) an extra stable cone
is found.
these two stable cones, at y ≃ 1.247R, one iterates back to the stable cone at y ≃ 0.194R,
therefore the stable cone at y = 1.53R is never found. The result is that particles 1 and 2
are in one jet, and particle 3 in another, fig.1a.
If additionally a soft particle (4) is present to act as a seed near y = 1.53R, fig.1b, then
the stable cone there is found from the iterative procedure. In this case we have three
overlapping stable cones, with hard-particle content 1 + 2, 2 + 3 and 3. What happens
next depends on the precise splitting and merging procedure that is adopted. Using that
of [6] then for f < 0.55 the jets are merged into a single large jet 1 + 2+ 3, otherwise they
are split into 1 and 2 + 3. Either way the jets are different from those obtained without
the extra soft seed particle, meaning that the procedure is infrared unsafe. In contrast, a
seedless approach would have found the three stable cones independently of the presence
of the soft particle and so would have given identical sets of jets.
The infrared divergence arises for configurations with 3 hard particles in a common
neighbourhood plus one soft one (and a further hard electroweak boson or QCD parton
to balance momentum). Quantities where it will be seen include the NLO contribution
to the heavy-jet mass in W/Z+2-jet (or 3-jet) events, the NNLO contribution to the
W/Z+2-jet cross section or the 3-jet cross section, or alternatively at NNNLO in the
inclusive jet cross section. The problem might therefore initially seem remote, since the
theoretical state of the art is far from calculations of any of these quantities. However
one should recall that infrared safety at all orders is a prerequisite if the perturbation
series is to make sense at all. If one takes the specific example of the Z+2-jet cross
section (measured in [10]) then the NNLO divergent piece would be regulated physically
by confinement at the non-perturbative scale ΛQCD, and would give a contribution of order
s ln pt/ΛQCD. Since αs(pt) ln pt/ΛQCD ∼ 1, this divergent NNLO contribution will be
of the same order as the NLO piece αEWα
s. Therefore the NLO calculation has little formal
meaning for the midpoint algorithm, since contributions involving yet higher powers of αs
Observable 1st miss cones at Last meaningful order
Inclusive jet cross section NNLO NLO
W/Z/H + 1 jet cross section NNLO NLO
3 jet cross section NLO LO
W/Z/H + 2 jet cross section NLO LO
jet masses in 3 jets, W/Z/H + 2 jets LO none
Table 2: Summary of the order (α4s or α
sαEW ) at which stable cones are missed in various
processes with a midpoint algorithm, and the corresponding last order that can be mean-
ingfully calculated. Infrared unsafety first becomes visible one order beyond that at which
one misses stable cones.
will be parametrically as large as the NLO term.4 The situation for a range of processes is
summarised in table 2.
4 An exact seedless cone jet definition
One way in which one could imagine trying to ‘patch’ the seed-based iterative cone jet-
algorithm to address the above problem would be to use midpoints between all pairs of
particles as seeds, as well as midpoints between the initial set of stable cones.5 However
it seems unlikely that this would resolve the fundamental problem of being sure that one
will systematically find all solutions of eq. (1) for any ensemble of particles.
Instead it is more appropriate to examine exhaustive, non-iterative approaches to the
problem, i.e. an exact seedless cone jet algorithm, one that provably finds all stable cones,
as advocated already some time ago in [16].
For very low multiplicities N , one approach is that suggested in section 3.3.3 of [6] and
used in the MCFM [18] and NLOJet [19] next-to-leading order codes. One first identifies
all possible subsets of the N particles in the event. For each subset S, one then determines
the rapidity (yS) and azimuth (φS) of the total momentum of the subset, pS =
i∈S pi
and then checks whether a cone centred on yS , φS contains all particles in S but no other
particles. If this is the case then S corresponds to a stable cone. This procedure guarantees
that all solutions to eq. (1) will be found.
In the above procedure there are ∼ 2N distinct subsets of particles and establishing
whether a given subset corresponds to a stable cone takes time O (N). Therefore the
time to identify all stable cones is O
. For the values of N (≤ 4) relevant in fixed-
order calculations, N2N time is manageable, however as soon as one wishes to consider
4As concerns the measurement [10], the discussion is complicated by the confusion surrounding the
nomenclature of the seedless and midpoint algorithms — while it seems that the measurement was carried
out with a true midpoint algorithm, the calculation probably used the ‘midpoint’ as defined in section
3.4.2 of [6] (cf. footnote 2), which is actually the seedless algorithm, i.e. the measurements and theoretical
predictions are based on different algorithms.
5This option was actually mentioned in [6] but rejected at the time as impractical.
etc...
Figure 2: Representation of points on a line and the places where a sliding segment has a
change in its set of enclosed points.
parton-shower or hadron-level events, with dozens or hundreds of particles, N2N time is
prohibitive. A solution can only be considered realistic if it is polynomial in N , preferably
with not too high a power of N .
As mentioned in the introduction, approximate procedures for implementing seedless
cone jet algorithms have been proposed in the past [6, 17]. These rely on considering the
momentum flow into discrete calorimeter towers rather than considering particles. As such
they are not entirely suitable for examining the full range event levels, which go from fixed-
order (few partons), via parton shower level (many partons) and hadron-level, to detector
level which has both tracking and calorimetry information.
4.1 One-dimensional example
To understand how one might construct an efficient exact seedless cone jet algorithm, it is
helpful to first examine a one-dimensional analogue of the problem. The aim is to identify
all solutions to eq. (1), but just for (weighted) points on a line. The equivalent of a cone
of radius R is a segment of length 2R.
Rather than immediately looking for stable segments one instead looks for all distinct
ways in which the segment can enclose a subset of the points on the line. Then for each
separate enclosure one calculates its centroid C (weighted with the pt of the particles) and
verifies whether the segment centred on C encloses the same set of points as the original
enclosure. If it does then C is the centre of a stable segment.
A simple way of finding all distinct segment-enclosures is illustrated in fig.2. First one
sorts the points into order on the line. One then places the segment far to the left and slides
it so that it goes infinitesimally beyond the leftmost point. This is a first enclosure. Then
one slides the segment again until its right edge encounters a new point or the left edge
encounters a contained point. Each time either edge encounters a point, the point-content
of the segment changes and one has a new distinct enclosure. Establishing the stability of
each enclosure is trivial, since one knows how far the segment can move in each direction
without changing its point content — so if the centroid is such that the segment remains
within these limits, the enclosure corresponds to a stable segment.
The computational complexity of the above procedure, N lnN , is dominated by the
need to sort the points initially: there are O (N) distinct enclosures and, given the sorted
list, finding the next point that will enter or leave an edge costsO (1) time, as does updating
the weighted centroid (assuming rounding errors can be neglected), so that the time not
associated with the sorting step is O (N).
(a) (b) (c) (d)
Figure 3: (a) Some initial circular enclosure; (b) moving the circle in a random direction
until some enclosed or external point touches the edge of the circle; (c) pivoting the circle
around the edge point until a second point touches the edge; (d) all circles defined by pairs
of edge points leading to the same circular enclosure.
4.2 The two-dimensional case
4.2.1 General approach
The solution to the full problem can be seen as a 2-dimensional generalisation of the
above procedure.6 The key idea is again that of trying to identify all distinct circular
enclosures, which we also call distinct cones (by ‘distinct’ we mean having a different point
content), and testing the stability of each one. In the one-dimensional example there was a
single degree of freedom in specifying the position of the segment and all distinct segment
enclosures could be obtained by considering all segments with an extremity defined by a
point in the set. In 2 dimensions there are two degrees of freedom in specifying the position
of a circle, and as we shall see, the solution to finding all distinct circular enclosures will
be to examine all circles whose circumference lies on a pair of points from the set.
To see in detail how one reaches this conclusion, it is useful to examine fig. 3. Box (a)
shows a circle enclosing two points, the (red) crosses. Suppose, in analogy with fig. 2 that
one wishes to slide the circle until its point content changes. One might choose a direction
at random and after moving a certain distance, the circle’s edge will hit some point in the
plane, box (b), signalling that the point content is about to change. In the 1-dimensional
case a single point, together with a binary orientation (taking it to be the left or right-hand
point) were sufficient to characterise the segment enclosure. However in the 2-dimensional
case one may orient the circle in an infinite number of ways. We can therefore pivot the
circle around the boundary point. As one does this, at some point a second point will then
touch the boundary of the circle, box (c).
The importance of fig. 3 is that it illustrates that for each and every enclosure, one
can always move the corresponding circle (without changing the enclosure contents) into
a position where two points lie on its boundary.7 Conversely, if one considers each circle
6We illustrate the planar problem rather than the cylindrical one since for R < π/2 the latter is a
trivial generalisation of the former.
7There are two minor exceptions to this: (a) for any point separated from all others by more than 2R,
the circle containing it can never have more than that one point on its edge — any such point forms a
stable cone of its own; (b) there may be configurations where three or more points lie on the same circle
whose boundary is defined by a pair of points in the set, and considers all four permutations
of the edge points being contained or not in the enclosure, then one will have identified
all distinct circular enclosures. Note that one given enclosure can be defined by several
distinct pairs of particles, which means that when considering the enclosures defined by all
pairs of particles, we are likely to find each enclosure more than once, cf. fig. 3d.
A specific implementation of the above approach to finding the stable cones is given
as algorithm 2 below. It runs in expected time O (Nn ln n) where N is the total number
of particles and n is the typical number of particles in a circle of radius R.8 The time
is dominated by a step that establishes a traversal order for the O (Nn) distinct circular
enclosures, much as the one-dimensional (N lnN) example was dominated by the step
that ordered the O (N) distinct segment enclosures.9 Some aspects of algorithm 2 are
rather technical and are explained in the subsubsection that follows. A reader interested
principally in the physics of the algorithm may prefer to skip it on a first reading.
4.2.2 Specific computational strategies
A key input in evaluating the computational complexity of various algorithms is the knowl-
edge of the number of distinct circular enclosures (or ‘distinct cones’) and the number of
stable cones. These are both estimated in appendix A.1, and are respectively O (Nn) and
(expected) O (N).
Before giving the 2-dimensional analogue of the 1-d algorithm of section 4.1 we examine
a simple ‘brute force’ approach for finding all stable cones. One takes all ∼ Nn pairs of
points within 2R of each other and for each pair identifies the contents of the circle and
establishes whether it corresponds to a stable cone, at a cost of O (N) each time, leading to
an overall N2n total cost. This is to be compared to a standard midpoint cone algorithm,
whose most expensive step will be the iteration of the expected O (Nn) midpoint seeds,
for a total cost also of N2n, assuming the average number of iterations from any given seed
to be O (1).10
One can reduce the computational complexity by using some of the ideas from the 1-d
example, notably the introduction of an ordering for the boundary points of circles, and
the use of the boundary points as sentinels for instability. Specifically, three elements will
be required:
i) one needs a way of labelling distinct cones that allows one to test whether two cones
are the same at a cost of O (1);
of radius R (i.e. are cocircular) — given a circle defined by a pair of them, the question of which of the
others is in the circle becomes ambiguous and one should explicitly consider all possible combinations of
inclusion/exclusion; a specific case of this is when there are collinear momenta (coincident points), which
can however be dealt more simply by immediately merging them.
8Given a detector that extends to rapidities y < ymax, n/N ∼ πR2/(4πymax), which is considerably
smaller than 1 — this motivates us to distinguish n from N .
9For comparison we note that the complexity of public midpoint algorithm implementations scales as
10In both cases one can reduce this to Nn2 by tiling the plane into squares of edge-length R and
restricting the search for the circle contents to tiles in the vicinity of the circle centre.
Algorithm 2 Procedure for establishing the list of all stable cones (protojets). For sim-
plicity, parts related to the special case of multiple cocircular points (see footnote 7) are
not shown. They are a straightforward generalisation of steps 6 to 13.
1: For any group of collinear particles, merge them into a single particle.
2: for particle i = 1 . . . N do
3: Find all particles j within a distance 2R of i. If there are no such particles, i forms
a stable cone of its own.
4: Otherwise for each j identify the two circles for which i and j lie on the circumference.
For each circle, compute the angle of its centre C relative to i, ζ = arctan ∆φiC
5: Sort the circles found in steps 3 and 4 into increasing angle ζ .
6: Take the first circle in this order, and call it the current circle. Calculate the total
momentum and checkxor for the cones that it defines. Consider all 4 permutations
of edge points being included or excluded. Call these the “current cones”.
7: repeat
8: for each of the 4 current cones do
9: If this cone has not yet been found, add it to the list of distinct cones.
10: If this cone has not yet been labelled as unstable, establish if the in/out status
of the edge particles (with respect to the cone momentum axis) is the same as
when defining the cone; if it is not, label the cone as unstable.
11: end for
12: Move to the next circle in order. It differs from the previous one either by a
particle entering the circle, or one leaving the circle. Calculate the momentum for
the new circle and corresponding new current cones by adding (or removing) the
momentum of the particle that has entered (left); the checkxor can be updated by
XORing with the label of that particle.
13: until all circles considered.
14: end for
15: for each of the cones not labelled as unstable do
16: Explicitly check its stability, and if it is stable, add it to the list of stable cones
(protojets).
17: end for
ii) one needs a way of ordering one’s examination of cones so that one can construct the
cones incrementally, so as not to pay the (at least, see below) O (
n) construction
price anew for each cone;
iii) one needs a way limiting the number of cones for which we carry out a full stability
test (which also costs at least
To label cones efficiently, we assign a random q-bit integer tag to each particle. Then we
define a tag for combinations of particles by taking the logical exclusive-or of all the tags of
the individual particles (this is easily constructed incrementally and is sometimes referred
to as a checkxor). Then two cones can be compared by examining their tags, rather
than by comparing their full list of particles. With such a procedure, there is a risk of
two non-identical cones ending up with identical tags (‘colliding’), which strictly speaking
will make our procedure only ‘almost exact’. The probability p of a collision occurring is
roughly the square of the number of enclosures divided by the number of distinct tags.
Since we have O (Nn) enclosures, this gives p ∼ N2n2/2q. By taking q sufficiently large
(in a test implementation we have used q = 96) and using a random number generator
that guarantees that all bits are decorrelated [22], one can ensure a negligible collision
probability.11
Given the ability to efficiently give a distinct label to distinct cones, one can address
points ii) and iii) mentioned above by following algorithm 2. Point (ii) is dealt with by
steps 2–6, 12 and 13: for each particle i, one establishes a traversal order for the circles
having i on their edge — the traversal order is such that as one works through the circles,
the circle content changes only by one particle at a time, making it easy to update the
momentum and checkxor for the circle.12 One maintains a record of all distinct cones in
the form of a hash (as a hash function one simply takes log2Nn bits of the tag), so that it
only takes O (1) time to check whether a cone has been found previously.
Rather than explicitly checking the stability of each distinct cone, the algorithm exam-
ines whether the multiple edge points that define the cone are appropriately included/excluded
in the circle around the cone’s momentum axis, step 10. All but a tiny fraction of unstable
cones fail this test, so that at the end of step 14 one has a list (of size O (N)) of candidate
stable cones — at that point one can carry out a full stability test for each of them. This
therefore deals with point (iii) mentioned above.
The dominant part of algorithm 2 is the ordering of the circles, step 5, which takes
n lnn time and must be repeated N times. Therefore the overall cost is Nn ln n. As
well as computing time, a significant issue is the memory use, because one must maintain
a list of all distinct cones, of which there are O (Nn). One notes however that standard
11A more refined analysis shows that we need only worry about collisions between the tags of stable cones
and other (stable or unstable) cones — since there are O (N) stable cones, the actual collision probability
is more likely to be O
/2q. In practice for N ∼ 104 and n ∼ 103 (a very highly populated event)
and using q = 96, this gives p ∼ 10−18. In principle to guarantee an infinitesimal collision probability
regardless of N, q should scale as lnN , however N will in any case be limited by memory use (which scales
as Nn) so a fixed q is not unreasonable.
12Rounding errors can affect the accuracy of the momentum calculated this way; the impact of this can
be minimised by occasionally recomputing the momentum of the circle from scratch.
implementations of the split–merge step of the cone algorithm also require O (Nn) storage,
albeit with a smaller coefficient.
It is worth highlighting also an alternative approach, which though slower, O
Nn3/2
has lower memory consumption and also avoids the small risk inexactness from the check-
xor. It is similar to the brute-force approach, but uses 2-dimensional computational ge-
ometry tree structures, such as quad-trees [23] or k-d trees [24]. These involve successive
sub-divisions of the plane (in quadrants, or pairs of rectangles), similarly to what is done
in 1-dimensional binary trees. They make it possible to check the stability of a given circle
n time (the time is mostly taken by identifying tree cells near the edge of the circle,
of which there are O (
n)), giving an overall cost of Nn3/2. The memory use of this form
of approach is O (N
n), simply the space needed to store the stable-cone contents.13
4.3 The split–merge part of the cone algorithm
The split–merge part of our cone algorithm is basically that adopted for Run-II of
the Tevatron [6]. It is shown in detail as algorithm 3. Since it does not depend on the
procedure used to find stable cones, it may largely be kept as is. We do however include
the following small modifications:
1. The run II proposal used Et throughout the split–merge procedure. This is not
invariant under longitudinal boosts. We replace it with p̃t, a scalar sum of the
transverse momenta of the constituents of the protojet. This ensures that the results
are both boost-invariant and infrared safe. We note that choosing instead pt (a
seemingly natural choice, made for example in the code of [19, 13]) would have led
to IR unsafety in purely hadronic events — the question of the variable to be used
for the ordering is actually a rather delicate one, and we discuss it in more detail in
appendix B.2.
2. We introduce a threshold pt,min below which protojets are discarded (step 2 of algo-
rithm 3). This parameter is motivated by the discussion in [6] concerning problems
associated with an ‘excess’ of stable cones in seedless algorithms, notably in events
with significant pileup. It provides an infrared and collinear safe way of removing the
resulting large number of low pt stable cones. By setting it to zero one recovers a be-
haviour identical to that of the Run-II algorithm (modulo the replacement Et → p̃t,
above), and we believe that in practice zero is actually a sensible default value. We
note that a similar parameter is present in PxCone [12, 7].
13Though here we are mainly interested in exact approaches, one may also examine the question of
the speed of the approximate seedless approach of Volobouev [17]. This approach represents the event
on a grid and essentially calculates the stability of a cone at each point of the grid using a fast-Fourier
transformation (FFT). In principle, for this procedure to be as good as the exact one, the grid should be
fine enough to resolve each distinct cone, which implies that it should have O (Nn) points; therefore the
FFT will require O (Nn lnNn) time, which is similar in magnitude to the time that is needed by the exact
algorithm. An open question remains that of whether a coarser grid might nevertheless be ‘good enough’
for many practical applications.
Algorithm 3 The disambiguated, scalar p̃t based formulation of a Tevatron Run-II type
split–merge procedure [6], with overlap threshold parameter f and transverse momentum
threshold pt,min. To ensure boost invariance and IR safety, for the ordering variable and the
overlap measure, it uses of p̃t,jet =
i∈jet |pt,i|, i.e. a scalar sum of the particle transverse
momenta (as in a ‘pt’ recombination scheme).
1: repeat
2: Remove all protojets with pt < pt,min.
3: Identify the protojet (i) with the highest p̃t.
4: Among the remaining protojets identify the one (j) with highest p̃t that shares
particles (overlaps) with i.
5: if there is such an overlapping jet then
6: Determine the total p̃t,shared =
k∈i&j |pt,k| of the particles shared between i and
7: if p̃t,shared < fp̃t,j then
8: Each particle that is shared between the two protojets is assigned to the one to
whose axis it is closest. The protojet momenta are then recalculated.
9: else
10: Merge the two protojets into a single new protojet (added to the list of protojets,
while the two original ones are removed).
11: end if
12: If steps 7–11 produced a protojet that coincides with an existing one, maintain
the new protojet as distinct from the existing copy(ies).
13: else
14: Add i to the list of final jets, and remove it from the list of protojets.
15: end if
16: until no protojets are left.
3. After steps 7–11, the same protojet may appear more than once in the list of protojets.
For example a protojet may come once from a single original stable cone, and a second
time from the splitting of another original stable cone. The original statement of the
split–merge procedure [6] did not address this issue, and there is a resulting ambiguity
in how to proceed. One option (as is done for example in the seedless cone code of
[19]) is to retain only a single copy of any such identical protojets. This however
introduces a new source of infrared unsafety: an added soft particle might appear in
one copy of the protojet and not the other and the two protojets would then no longer
be identical and would not be reduced to a single protojet. This could (and does
occasionally, as evidenced in section 5.1) alter the subsequent split–merge sequence.
If one instead maintains multiple identical protojets as distinct entities (as is done in
the codes of [13, 18]), then the addition of a soft particle does not alter the number
of hard protojet entries in the protojet list and the split–merge part of the algorithm
remains infrared safe. We therefore choose this second option, and make it explicit
as step 12 of algorithm 3.
The split–merge procedure is guaranteed to terminate because the number of overlapping
pairs of protojets is reduced each time an iteration of the loop finds an overlap. A proof of
the infrared safety of this (and the other) parts of our formulation of the cone algorithm is
given in appendix B. The computational complexity (O (N2)) of the split–merge procedure
is generally smaller than that of the stable-cone search, and so we relegate its discussion
to appendix A.2.
Finally, before closing this section, let us return briefly to the top-level of the cone
formulation, algorithm 1 and the question of the loop over multiple passes. This loop
contains just the stable-cone search, and one might wonder why the split–merge step has
not also been included in the loop. First consider pt,min = 0: protojets found in different
passes cannot overlap, and the split–merge procedure is such that if a particle is in a
protojet then it will always end up in a jet. Therefore it is immaterial whether the split–
merge step is kept inside or outside the loop. The advantage of keeping it outside the loop
is that one may rerun the algorithm with multiple overlap values f simply by repeating
the split–merge step, without repeating the search for stable cones. For pt,min 6= 0 the
positioning of the split–merge step with respect to the Npass loop would affect the outcome
of the algorithm if all particles not found in first-pass jets were to be inserted into the
second pass stable-cone search. Our specific formulation constitutes a design choice, which
allows one to rerun with different values of f and pt,min without repeating the stable-cone
search.
5 Tests and comparisons
5.1 Measures of IR (un)safety
In section 4 we presented a procedure for finding stable cones that is explicitly IR safe. In
appendix B we provide a proof of the IR safety of the rest of the algorithm. The latter is
rather technical and not short, and while we have every reason to believe it to be correct,
we feel that there is value in supplementing it with complementary evidence for the IR
safety of the algorithm. As a byproduct, we will obtain a measure of the IR unsafety of
various commonly used formulations of the cone algorithm.
To verify the IR safety of the seedless cone algorithm, we opt for a numerical Monte
Carlo approach, in analogy with that used in [25] to test the more involved recursive
infrared and collinear safety (a prerequisite for certain kinds of resummation). The test
proceeds as follows. One generates a ‘hard’ event consisting of some number of randomly
distributed momenta of the order of some hard scale pt,H , and runs the jet algorithm on the
hard event. One then generates some soft momenta at a scale pt,S ≪ pt,H , adds them to the
hard event (randomly permuting the order of the momenta) and reruns the jet algorithm.
One verifies that the hard jets obtained with and without the soft event are identical. If
they are not, the jet algorithm is IR unsafe. For a given hard event one repeats the test
with many different add-on soft events so as to be reasonably sure of identifying most hard
events that are IR unsafe. One then repeats the whole procedure for many hard events.
Algorithm Type IR unsafe Code
JetClu Seeded, no midpoints 2h+1s [9] [13]
SearchCone Seeded, search cone [21], midpoints 2h+1s [1] [13]
MidPoint Seeded, midpoints (2-way) 3h+1s [1] [13]
MidPoint-3 Seeded, midpoints (2-way, 3-way) 3h+1s [13]
PxCone Seeded, midpoints (n-way), non-standard SM 3h+1s [12]
Seedless [SM-pt] Seedless, SM uses pt 4h+1s
a [here]
Seedless [SM-MIP] Seedless, SM merges identical protojets 4h+1sb [here]
Seedless [SISCone] Seedless, SM of algorithm 3 no [here]
aFailures on 4h+1s arise only for R > π/4; for smaller R, failures arise only for higher multiplicities
bFailures for 4h+1s are extremely rare, but become more common for 5h+1s and beyond
Table 3: Summary of the various cone jet algorithms and the code used for tests here;
SM stands for “split–merge”; Nh+Ms indicates that infrared unsafety is revealed with
configurations consisting of N hard particles and M soft ones, not counting an additional
hard, potentially non-QCD, particle to conserve momentum. All codes have been used in
the form of plugins to FastJet (v2.1) [20].
The hard events are produced as follows: we choose a linearly distributed random
number of momenta (between 2 and 10) and for each one generate a random pt (linearly
distributed, 2−24pt,H ≤ pt ≤ pt,H , with pt,H = 1000GeV), a random rapidity (linearly
distributed in −1.5 < y < 1.5) and a random φ. For each hard event we also choose
random parameters for the jet algorithm, so as to cover the jet-algorithm parameter space
(0.3<R<1.57, 0.25<f <0.95, linearly distributed, the upper limit on R being motivated
by the requirement that R < π/2; the pt,min on protojets is set to 0 and the number of
passes is set to 1). For each add-on soft event we generate between 1 and 5 soft momenta,
distributed as the hard ones, but with the soft scale pt,S = 10
−100GeV replacing pt,H .
We note that the hard events generated as above do not conserve momentum — they
are analogous to events with a missing energy component or with identified photons or
leptons that are not given as inputs to the jet clustering. For the safety studies on the
full SISCone algorithm, we therefore also generate a set of hard events which do have
momentum conservation, analogous to purely hadronic events.
To validate our approach to testing IR safety, we apply it to a range of cone jet algo-
rithms, listed in table 3, including the many variants that are IR unsafe. In PxCone the
cut on protojets is set to 1GeV and in the SearchCone algorithm the search cone radius
is set to R/2.
The fraction of hard events failing the safety test is shown in fig. 4 for each of the jet
algorithms.14 All jet algorithms that are known to be IR unsafe do indeed fail the tests.
14The results are based on 80 trial soft add-on events for each hard event and should differ by no more
than a few percent (relative) from a full determination of the IR safety for each hard event (which would be
obtained in the limit of an infinite number of trial soft add-on events for each hard event). For SISCone we
only use 20 soft add-on events, so as to make it possible to probe a larger number of hard configurations.
10-5 10-4 10-3 10-2 10-1 1 
Fraction of hard events failing IR safety test
JetClu
SearchCone
PxCone
MidPoint
Midpoint-3
Seedless [SM-pt]
Seedless [SM-MIP]
Seedless (SISCone)
50.1%
48.2%
16.4%
15.6%
0.17%
< 10-9
Figure 4: Failure rates for the IR safety tests. The algorithms are as detailed in table 3.
Seeded algorithms have been used with a zero seed threshold. The events used do not
conserve momentum (i.e. have a missing energy component), except for the seedless SM-pt
case (where all events conserve momentum, to highlight the issue that arises in that case)
and for SISCone (where we use a mix of momentum conserving and non-conserving events
so as to fully test the algorithm). Further details are given in the text
One should be aware that the absolute failure rates depend to some extent on the way we
generated the hard events, and so are to be interpreted with caution. Having said that,
our hard events have a complexity similar to the Born-level (lowest-order parton-level)
of events that will be studied at LHC, for example in the various decay channels of tt̄H
production, and so both the order of magnitudes of the failure rates and their relative sizes
should be meaningful.
Algorithms that fail on ‘2h+1s’ events have larger failure rates than those that fail
on ‘3h+1s’ events, as would be expected — they are ‘more’ infrared unsafe. One notes
the significant failure rates for the midpoint algorithms, ∼ 16%, and the fact that adding
3-way midpoints (i.e. between triplets of stable cones) has almost no effect on the failure
rate, indicating that triangular configurations identified as IR unsafe in [1] are much less
important than others such as that discussed in section 3. PxCone’s smaller failure rate
seems to be due not to its multi-way midpoints, but rather to its specific split–merge
procedure which leads to fewer final jets (so that one is less sensitive to missing stable
cones).
Seedless algorithms with problematic split–merge procedures lead to small failure rates
(restricting one’s attention to small values of R, these values are further reduced). One
might be tempted to argue that such small rates of IR safety failure are unlikely to have
a physical impact and can therefore be ignored. However there is always a risk of some
specific study being unusually sensitive to these configurations, and in any case our aim
here is to provide an algorithm whose IR safety is exact, not just approximate.
Finally, with a ‘good’ split–merge procedure, that given as algorithm 3, none of the over
5 × 109 hard events tested (a mix both with and without momentum conservation) failed
the IR safety test. For completeness, we have carried out limited tests also for Npass = ∞
and with a pt,min on protojets of 100GeV, and have additionally performed tests with a
larger range of rapidities (|y| < 3), collinearly-split momenta, cocircular configurations,
three scales instead of two scales and again found no failures. These tests together with
the proof given in appendix B give us a good degree of confidence that the algorithm truly
is infrared safe, hence justifying its name.
5.2 Speed
As can be gathered from the discussion in [6], reasonable speed is an essential requirement
if a new variant of cone jet algorithm is to be adopted. To determine the speed of various
cone jet algorithms, we use the same set of events taken for testing the FastJet formulation
of the kt jet algorithm in [20] — these consist of a single Pythia [26] dijet event (with
pt,jets ≃ 50GeV) to which we add varying numbers of simulated minimum bias events so
as to vary the multiplicity N . Thus the event structure should mimic that of LHC events
with pileup.
Figure 5 shows the time needed to find jets in one event as a function of N . Among
the seeded jet algorithms we consider only codes that include midpoint seeds. For the
(CDF) midpoint code [13], written in C++, there is an option of using only particles above
a threshold s as seeds and we consider both the common (though collinear unsafe) choice
s = 1GeV and the (collinear safe but IR unsafe) s = 0GeV. The PxCone code [12],
written in Fortran 77, has no seed threshold.
Our seedless code, SISCone, is comparable in speed to the fastest of the seeded codes,
the CDF midpoint code with a seed threshold s = 1GeV, and is considerably faster than
the codes without a seed threshold (not to mention existing exact seedless codes which take
∼ 1 s to find jets among 20 particles and scale as N2N). Its run time also increases more
slowly with N than that of the seeded codes, roughly in agreement with the expectation
of SISCone going as Nn ln n (with a large coefficient) while the others go as N2n. The
midpoint code with s = 1GeV has a more complex N -dependence presumably because
we have run the timing on a single set of momenta, and the proportionality between the
number of seeds and N fluctuates and depends on the event structure.
For comparison purposes we have also included the timings for the FastJet (v2) kt imple-
mentation, which for these values ofN uses a strategy that involves a combination ofN lnN
and Nn dependencies. Timings for the FastJet implementation of the Aachen/Cambridge
algorithm are similar to those for the kt algorithm.
 0.001
 0.01
 100  1000  10000
CDF midpoint (s=0 GeV)
CDF midpoint (s=1 GeV)
PxCone
SISCone
kt (fastjet)
Figure 5: Time to cluster N particles, as a function of N , for various algorithms, with
R = 0.7 and f = 0.5, on a 3.4GHz Pentium R© IV processor. For the CDF midpoint
algorithm, s is the threshold transverse momentum above which particles are used as
seeds.
5.3 Rsep: an inexistent problem
Suppose we have two partons separated by ∆R and with transverse momenta pt1 and pt2
(pt1 > pt2). Both partons end up in the same jet if the cone containing both is stable, i.e.
< 1 + z , z =
, (3)
where the result is exact for small R or with pt-scheme recombination. Equivalently one
can write the probability for two partons to be clustered into a single jet as
P2→1(∆R, z) = Θ
1 + z −
. (4)
The limit on ∆R/R ranges from 1 for z = 0 to 2 for z = 1. This z-dependent limit is the
main low-order perturbative difference between the cone algorithm and inclusive versions
of sequential recombination ones like the kt or Cambridge/Aachen algorithms, since the
latter merge two partons into a single jet for ∆R/R < 1, independently of their energies.
Rsep  = 1.3?
TWO JETS
 0  0.5  1  1.5  2  2.5
∆R / R
ONE JET
NP: 2 jets?
PT: 1 jet?
Figure 6: Schematic representation of the phase space region in which two partons will
end up in a single cone jet versus two jets, at the 2-parton level (PT) and, according to
the Rsep statement, after showering and hadronisation (NP).
A statement regularly made about cone algorithms (see for example [21, 1, 27]) is
that parton showering and hadronisation reduce the stability of the cone containing the
‘original’ two partons, leading to a modified ‘practical’ condition for two partons to end
up in a single jet,
< min (Rsep , 1 + z) , (5)
or equivalently,
P2→1(∆R, z) = Θ
1 + z − ∆R
Rsep −
, (6)
with Rsep ≃ 1.3 [28, 29].15 This situation is often represented as in figure 6, which depicts
the ∆R, z plane, and shows the regions in which two partons are merged into one jet or
resolved as two jets. The boundary ∆R = 1+z corresponds to eq. (3), while the alternative
boundary at ∆R = Rsep is eq. (5).
So large a difference between the low-order partonic expectation and hadron-level results
would be quite a worrying feature for a jet algorithm — after all, the main purpose of a
jet algorithm is to give as close a relation as possible between the first couple of orders of
perturbation theory and hadron level.16
The evidence for the existence of eq. (6) with Rsep = 1.3 seems largely to be based [28,
29] on merging two events (satisfying some cut on the jet pt’s), running the jet-algorithm
on the merged event, and examining at what distance particles from the two events end
up in the same jet. This approach indicated that particles were indeed less likely to end
15The name Rsep was originally introduced [30] in the context of NLO calculations of hadron-collider
jet-spectra, but with a different meaning — there it was intended as a free parameter to model the lack
of knowledge about the details of the definition of the cone jet algorithm used experimentally. This is
rather different from the current use as a parameter intended to model our inability to directly calculate
the impact of higher-order and non-perturbative dynamics of QCD in cone algorithms.
16The apparent lack of correspondence is considered sufficiently severe that in some publications (e.g.
[11]) the NLO calculation is modified by hand to compensate for this.
 0  0.5  1  1.5  2  2.5
∆R / Rcone
Prob. 2 kt subjets → 1 cone jet
 = 1; Rcone = 0.4
a) parton level
 0  0.5  1  1.5  2  2.5
∆R / Rcone
b) hadron level
Figure 7: The probability P2→1(∆R, z) for two kt-algorithm subjets to correspond to a
single cone jet, as a function of pt1/pt2 and ∆R for the two kt subjets. Events have been
generated with Herwig [31] (hadron-level includes the underlying event) and the results
are based on studying all kt jets with pt > 50GeV and |y| <1. Further details are to be
found in the text.
up in the same jet if they were more than 1.3R apart, however the result is an average
over a range of z values making it hard to see whether eq. (6) is truly representative of the
underlying physics.17
To address the question in more depth we adopt the following strategy. Rather than
combining different events, we use one event at a time, but with two different jet algorithms.
On one hand we run SISCone with a fairly small value of R, Rcone = 0.4. Simultaneously
we run inclusive kt jet-clustering [2] on the event, using a relatively large R (Rkt = 1.0),
and identify any hard kt-jets. For each hard kt jet we undo its last clustering step so as to
obtain two subjets, S1 and S2 — these are taken to be the analogues of the two partons.
We then examine whether there is a cone jet that contains more than half of the pt of
each of S1 and S2. If there is, the conclusion is that the two kt subjets have ended up
(dominantly) in a single cone jet.
The procedure is repeated for many events, and one then examines the probability,
P2→1(∆R, z), of the two kt subjets being identified with a single cone jet, as a function of
the distance ∆R between the two subjets, S1 and S2, and the ratio z of their pt’s. The
17A preliminary version of [27] showed more differential results; these, however, seem not to be in the
definitive version.
results are shown in fig. 7 both at parton-shower level and at hadron level, as simulated
with Herwig [31]. The middle contour corresponds to a probability of 1/2. At parton-
shower level this contour coincides remarkably well with the boundary defined by eq. (3),
up to ∆R/R = 1.7. It is definitely not compatible with eq. (5) with Rsep = 1.3. Beyond
∆R/R = 1.7 the contour bends a little and one might consider interpreting this as an
Rsep ≃ 1.8.18 However, in that region the transition between P = 1 and P = 0 is broad,
and to within the width of the transition, there remains good agreement with eq. (3) — it
seems more natural therefore to interpret the small deviation from eq. (3) as a Sudakov-
shoulder type structure [32], which broadens and shifts the Θ-function of eq. (4), as would
happen with almost any discontinuity in a leading-order QCD distribution.
Once one includes hadronisation effects in the study, fig. 7b, one finds that the transition
region broadens further, as is to be expected. Now the P = 1/2 contour shifts away slightly
from the 1 + z result at small z as well. However, once again this shift is modest, and of
similar size as the breadth of the transition region.
To verify the robustness of the above results we have examined other related indicators.
One of them is the probability, P2→2 of finding two cone jets, each containing more than
half of the transverse momentum of just one of the kt subjets. At two-parton level, one
expects P1→2 + P2→2 = 1. Deviation from this would indicate that our procedure for
matching cone jets to kt jets is misbehaving. We find that the relation holds to within
around 15% over most of the region, deviating by at most ∼ 25% in a small corner of phase
space ∆R/R ≃ 1.5, z ≃ 0.2. Another test is to examine the fraction F2 of the softer S2’s
transverse momentum that is found in the cone that overlaps dominantly with S1. At two-
parton level this should be equal to P2→1, but this would not be the case after showering
if there were underlying problems with our matching procedure. We find however that F2
does agree well with P2→1. These, together with yet further tests, lead to us to believe that
conclusions drawn from fig. 7 are robust. Finally, while these results have been obtained
within a Monte Carlo simulation, Herwig, a similar study could equally be well carried
experimentally on real events.
So, in contrast to statements that are often made about the cone jet algorithm, the
perturbative picture of when two partons will recombine, given by eq. (4), seems to be a
relatively good indicator of what happens even after perturbative radiation and hadronisa-
tion. In particular the evidence that we have presented strongly disfavours the Rsep-based
modification, eq. (6). This is a welcome finding, and should help provide a firmer basis for
cone-based phenomenology.
5.4 Physics impact of seedless v. midpoint cone
In this section, we discuss the impact on physical measurement of switching from a mid-
point type algorithm to a seedless IR-safe one such as SISCone. We study two physical
observables, the inclusive jet spectrum and the jet mass spectrum in 3-jet events. The
18Such a value has been mentioned to us independently by M. Wobisch in the context of unpublished
studies of jet shapes for the SearchCone algorithm [21].
spectra have been obtained by generating events with a Monte-Carlo either at fixed order
in perturbation theory (NLOJet [19]) or with parton showering and hadronisation (Pythia
[26]), and by performing the jet analysis on each event using three different algorithms
(each with R = 0.7 and f = 0.5, and additionally in the case of SISCone, Npass = 1 and
pt,min = 0):
1. SISCone: the seedless, IR-safe definition described in algorithms 1–3;
2. midpoint(0): the midpoint algorithm using all particles as seeds;
3. midpoint(1): the midpoint algorithm using as seeds all particles above a threshold
of 1 GeV.
We have used a version of the CDF implementation of the midpoint algorithm modified to
have the split–merge step based on p̃t rather than pt (so that it corresponds to algorithm 4.3
with pt,min = 0). The motivation for this is that we are mainly interested in the physics
impact of having midpoint versus all stable cones, and the comparison is simplest if the
subsequent split–merge procedure is identical in both cases.19
We shall first present the results obtained for the inclusive jet spectrum and then discuss
the jet mass spectrum in 3-jet events. Most studies carried out in this section have used
kinematics corresponding to the Tevatron Run II, i.e. a centre-of-mass energy
s = 1.96
TeV, and usually, for simplicity we have chosen not to impose any cuts in rapidity.
5.4.1 Inclusive jet spectrum
As discussed in section 3, the differences between the midpoint algorithm and SISCone are
expected to start when we have 3 particles in a common neighbourhood plus one to balance
momentum. For pure QCD processes this corresponds to 2 → 4 diagrams, O (α4s). This is
NNLO for the inclusive spectrum. Though a NNLO calculation of the inclusive spectrum
is beyond today’s technology (for recent progress, see [33]), we can easily calculate the
O (α4s) difference between midpoint and SISCone, using just tree-level 2 → 4 diagrams,
since the difference between the algorithms is zero at orders α2s and α
s, i.e. we can neglect
two-loop 2 → 2 diagrams and one-loop 2 → 3 diagrams. The significance of the difference
can be understood by comparing to the leading order spectrum, which is identical for the
two algorithms.
Figure 8 shows the resulting spectra: the upper plot gives the leading order inclusive
spectrum together with the difference between SISCone and midpoint(0) at O (α4s). The
lower plot shows the relative difference. One sees that the use of the IR-safe seedless cone
algorithm introduces modest corrections, of order 1-2%, in the inclusive jet spectrum. This
order of magnitude is roughly what one would expect, since the differences only appear at
19We could also have compared SISCone with a midpoint algorithm using pt in the split–merge (a
common default); the figures we show below would have stayed unchanged at the 1% level for the inclusive
spectrum, while for the jet masses the effects range between a few percent at moderate masses and 10−20%
in the high-mass tail.
20 40 60 80 100 120 140 160 180 200
inclusive pT spectrum (all y)
SISCone (Born level, 0(αs
|midpoint(0) -- SISCone| 0(αs
NLOJet
R=0.7, f=0.5
20 40 60 80 100 120 140 160 180 200
pT (GeV)
-0.02
-0.01
20 40 60 80 100 120 140 160 180 200
pT (GeV)
-0.02
-0.01
Figure 8: (a) Inclusive jet spectrum: the upper curve gives the leading-order (O (α2s))
spectrum, while the lower (blue) curve gives the difference between the SISCone and mid-
point(0) algorithm, obtained from the O (α4s) tree-level amplitude; (b) the relative differ-
ence.
-0.04
-0.02
 50  100  150  200
pt [GeV]
pp−  √s = 1.96 TeV
R=0.7, f=0.5, |y|<0.7Pythia 6.4
(a) hadron-level (with UE)
hadron-level (no UE)
parton-level
 50  100  150  200
pt [GeV]
pp    √s = 14 TeV
R=0.7, f=0.5, |y|<0.7Pythia 6.4
(b) hadron-level (with UE)
hadron-level (no UE)
parton-level
Figure 9: Relative difference between the inclusive jet spectra for midpoint(1) and SIS-
Cone, obtained from Pythia at parton level, hadron level without underlying event (UE)
contributions, and hadron level with UE. Shown (a) for Tevatron collisions and (b) for
LHC collisions.
relative order α2s. As we will see below, larger differences will appear when one examines
more exclusive quantities.
In addition, we have used Herwig and Pythia to investigate the differences between
midpoint(1) and SISCone with parton showering. Both generators give similar results, and
we show the results just of Pythia, fig. 9a. The difference at parton level is very similar
to what was observed at fixed order. At hadron level without underlying event (UE)
corrections, the difference remains at the level of 1−2% (though it changes sign); once one
includes the underlying event contributions, the difference increases noticeably at lower
pt — this is because the midpoint(1) algorithm receives somewhat larger UE corrections
than SISCone. Since the underlying event is one of the things that is likely to change from
Tevatron to LHC, in figure 9b we show similar curves for LHC kinematics. At parton level
and at hadron level without the underlying event, the results are essentially the same as for
the Tevatron. With the underlying event included, the impact of the missing stable cones
in the midpoint algorithm reaches of the order of 10 to 15%, and thus starts to become
quite a significant effect. With Herwig, we find that the impact is little smaller because its
underlying event is smaller than Pythia’s at the LHC.
5.4.2 Jet masses in 3-jet events
As well as the inclusive jet pT spectrum, we can also study more exclusive quantities. One
example is the jet-mass spectrum in multi-jet events. Jet-masses are potentially of interest
for QCD studies, particle mass measurements [34] and new physics searches, where they
could be used to identify highly boosted W/Z/H bosons or top quarks produced in the
decays of new heavy particles [35].
The simplest multi-jet events in which to study jet masses are 3-jet events. There, the
0 10 20 30 40 50 60 70 80 90 100
M (GeV)
Mass spectrum of jet 2
midpoint(0) -- SISCone
SISCone
NLOJet
R=0.7, f=0.5
0 10 20 30 40 50 60 70 80 90 100
M (GeV)
2 Mass spectrum of jet 2
midpoint(0) -- SISCone
SISCone
NLOJet
R=0.7, f=0.5
∆ R23 < 1.4
Figure 10: Mass spectrum of the second hardest jet as obtained with the different cone
algorithms on tree-level 4-particle events (generated with NLOJet): the plots shows the
relative difference between the midpoint and SISCone results. In the upper plot we consider
all three-jet events satisfying the transverse-momentum cuts, while in the lower plot (note
scale) we consider only those in which second and third jet are separated by ∆R23 < 2R.
 0  10  20  30  40  50
(a) SISCone
midpoint(0)
midpoint(1)
 0  10  20  30  40  50  60  70  80
 0.01
Pythia 6.4 R=0.7, f=0.5
SISCone
midpoint(0)
midpoint(1)
-0.75
-0.25
 0.25
 0  10  20  30  40  50  60  70  80
M (GeV)
(c) midpoint(0)
 0  10  20  30  40  50  60  70  80
-0.75
-0.25
 0.25
M (GeV)
(d) midpoint(1)
Figure 11: Mass spectrum of the third hardest jet obtained from the different cone algo-
rithms run on three-jet Pythia events. The top-left (top-right) plot shows the spectrum in
linear (logarithmic) scale and the bottom plots show the relative difference between each
midpoint algorithm and SISCone. See the text for the details of the event selection.
masses of all the jets vanish at the 3-particle level. The first order at which the jet masses
become non-zero is O (α4s) and this is also the order at which differences appear between
the midpoint and seedless cone algorithms. Therefore, as in section 5.4.1, we generate
2 → 4 tree-level events, but now keep only those with exactly 3 jets with pT ≥ 20 GeV in
the final state. We further impose that the hardest jet should have a pT of at least 120
GeV and the second hardest jet a pT of at least 60 GeV. With these cuts we can compute
the jet-mass spectrum for each of the three jets and for the three different algorithms.
In the upper plot of Figure 10, we show the relative difference “(midpoint(0) - SIS-
Cone)/SISCone” for the mass spectrum of the second hardest jet. In the lower plot we
show the same quantity for events in which we have placed an additional requirement that
the y−φ distance between the second and third jets be less than 2R (such distance cuts are
often used when trying to reconstruct chains of particle decays). The midpoint algorithm’s
omission of certain stable cones leads to an overestimate of the mass spectrum by up to
∼ 10% without a distance cut (much smaller differences are observed for the first and third
jet) and of over 40% with a distance cut. The problem is enhanced by the presence of the
distance cut because many more of the selected events then have three particles in a com-
mon neighbourhood, and this is precisely the situation in which the midpoint algorithm
misses stable cones (cf. section 3).
We emphasise also that the NLO calculation of these mass spectra would be impossible
with a midpoint algorithm, because the 10− 40% tree-level differences would be converted
into an infrared divergent NLO contribution.
A general comment is that the problems seen here for the midpoint algorithm without
a distance cut are of the same general order of magnitude as the 16% failure rate in the
IR safety tests of section 5.1, suggesting that the absolute failure rates given there are a
good indicator of the degree of seriousness of issues that can arise in generic studies with
the infrared unsafe algorithms.
In addition to this fixed-order parton-level analysis, we have studied the jet masses in
3-jet events at hadron level (i.e. after parton showering and hadronisation) using events
generated with Pythia. At hadron level many more seeds are present, due to the large
particle multiplicity. One might therefore expect the midpoint algorithm to become a
good approximation to the seedless one.
For the mass of the second hardest jet, i.e. the quantity we studied at fixed order in
figure 10, we find that the midpoint and seedless algorithms do give rather similar results
at hadron level. In other words differences that we see in a leading order calculation are
not propagated through to the full hadron level result. This is a serious practical issue
for the midpoint algorithm, because a jet algorithm’s principal role is to provide a good
mapping between low-order parton level and hadron level.
Nevertheless, despite the many seeds that are present at hadron level, we find that there
are still some observables for which the midpoint algorithm’s lack of stable cones does have
a large impact even at hadron level. This is the case that the mass distribution of the
third hardest jet, shown in figure 11 (obtained without a distance cut) on both linear and
logarithmic scales so as to help visualise the various regions of the distribution. Moderate
differences are present in the peak region, but in the tail of the distribution they become
large, up to 50%. They are greater for midpoint(1) than for midpoint(0), because the seed
threshold causes fewer stable cones to be found with the midpoint(1) algorithm.
These results have been checked using the Herwig Monte-Carlo. We have observed
similar differences at parton-shower level, at the hadron level and at the hadron level
including underlying event, both in the peak of the distribution and in the tail. We note
that hadronisation corrections are substantial in the tail of the distribution, both for the
midpoint and SISCone algorithms.
The above results confirm what one might naturally have expected: while very inclusive
quantities may not be overly sensitive to the deficiencies of one’s jet algorithm, as one
extends one’s investigations to more exclusive quantities, those deficiencies begin to have
a much larger impact.
6 Conclusions
Given the widespread use of cone jet algorithms at the Tevatron and their foreseen contin-
ued use at LHC, it is crucial that they be defined in an infrared safe way. This is necessary
in general so as to ensure that low-order parton-level considerations about cone jet-finding
hold also for the fully showered, hadronised jets that are observed in practice. It is also a
prerequisite if measurements are to be meaningfully compared to fixed order (LO, NLO,
NNLO) predictions.
The midpoint iterative cone algorithm currently in use is infrared unsafe, as can be seen
by examining the sets of stable cones that are found for simple three-parton configurations.
This may seem surprising given that the midpoint algorithm was specifically designed to
avoid an earlier infrared safety problem — however the midpoint infrared problem appears
at one order higher in the coupling, and this is presumably why it was not identified in the
original analyses. The tests shown in section 5.1 suggest that the midpoint-cone infrared
safety problems, while smaller than without the midpoint, are actually quite significant
(∼ 15%).
We therefore advocate that where a cone jet algorithm is used, it be a seedless variant.
For such a proposal to be realistic it is crucial that the seedless variant be practical. The
approaches adopted in fixed order codes take O
time and are clearly not suitable in
general. Here we have shown that it is possible to carry out exact seedless jet-finding in ex-
pectedO
Nn3/2
time withO
Nn1/2
storage, or almost exactly20 in expected O (Nn lnn)
time with O (Nn) storage (we recall that N is the total number of particles, n the typical
number of particles in a jet). The second of these approaches has been implemented in a
C++ code named SISCone, available also as a plugin for the FastJet package. For N ∼ 1000
it is comparable in speed to the existing CDF midpoint code with 1GeV seeds. While this
is considerably slower than the N lnN and related FastJet strategies [20] for the kt and
Cambridge/Aachen jet algorithms, it remains within the limits of usability and provides
for the first time a cone algorithm that is demonstrably infrared and collinear safe at all
orders, and suitable for use at parton level, hadron level and detector level.
20with a failure probability that can be made arbitrarily small and that we choose to be . 10−18.
As well as being infrared safe, a jet algorithm must provide a faithful mapping between
expectations based on low-order perturbative considerations, and observations at hadron
level. There has been considerable discussion of worrisome possible violations of such a
correspondence for cone algorithms, the “Rsep” issue. For SISCone we find however that
the correspondence holds well.
An obvious final question is that of the impact on physics results of switching from
the midpoint to the seedless cone. For inclusive quantities, one expects the seedless cone
jet algorithm to give results quite similar to those of the midpoint cone, because the IR
unsafety of the midpoint algorithm only appears at relatively higher orders. This is borne
out in our fixed order and parton-shower studies of the inclusive jet spectrum where we
see differences between the midpoint and SISCone algorithms of about a couple of percent.
At moderate pt at hadron level, the differences can increase to 5− 10%, because SISCone
has a lower sensitivity to the underlying event, a welcome ‘fringe-benefit’ of the seedless
algorithm.
For less inclusive quantities, for example the distribution of jet masses in multi-jet
events, differences can be significant. We find that for 3-jet events, the absence of some
stable cones (i.e. infrared unsafety) in the midpoint algorithm leads to differences compared
to SISCone at the∼ 10% level at leading order (α4s) in a large part of the jet-mass spectrum.
Greater effects still, up to 50%, are seen with specific cuts at fixed order, and in the tails of
the jet-mass spectra for parton-shower events. Thus, even if the infrared safety issues of the
midpoint algorithm appear to be at the limit of today’s accuracy when examining inclusive
quantities, for measurements of even moderate precision in multi-jet configurations (of
increasing interest at Tevatron and omnipresent at LHC), the use of a properly defined
cone algorithm such as SISCone is likely to be of prime importance.
Acknowledgements
We are grateful to Markus Wobisch for many instructive discussions about cone algorithms,
Steve Ellis and Joey Huston for exchanges about their IR safety and Rsep, Matteo Cacciari
for helpful suggestions on the SISCone code and Giulia Zanderighi for highlighting the
question of collinear safety. We thank them all, as well as George Sterman, for useful
comments and suggestions on the manuscript. We also gratefully acknowledge Mathieu
Rubin for a careful reading of an early version of the manuscript, Andrea Banfi for pointing
out a relevant reference and Torbjörn Sjöstrand for assistance with Pythia. The infrared
unsafe configuration shown here was discovered subsequent to discussions with Mrinal
Dasgupta on non-perturbative properties of cone jet algorithms. This work has been
supported in part by grant ANR-05-JCJC-0046-01 from the French Agence Nationale de
la Recherche. G.S. is funded by the National Funds for Scientific Research (Belgium).
Finally, we thank the Galileo Galilei Institute for Theoretical Physics for hospitality and
the INFN for partial support during the completion of this work.
A Further computational details
A.1 Cone multiplicities
In evaluating the computational complexity of (computational) algorithms for various
stages of the cone jet algorithm it is necessary to know the numbers of distinct cones
and of stable cones. Such information also constitutes basic knowledge about cone jet
definitions, which may for example be of relevance in understanding their sensitivity to
pileup, i.e. multiple pp interactions in the same bunch crossing.
Since large multiplicities will be due to pileup, let us consider a simple model for the
event structure which mimics pileup, namely a set of momenta distributed randomly in y
and φ and all with similar pt’s (or alternatively with random pt’s in some limited range).
Given that the particles will be spread out over a region in y, φ that is considerably
larger than the cone area, in addition to N , the total number of particles, it is useful to
introduce also n, the number of points likely to be contained in a region of area πR2.
The first question to investigate is that of the number of distinct cones. The number
of pairs of points that has to be investigated is O (Nn). However some of these pairs of
points will lead to identical cones. It is natural to ask whether, despite this, the number
of distinct cones is still O (Nn). To answer this question, one may examine how far one
can displace a cone in any given direction before its point content changes. The area swept
when moving a cone a distance δR is 4RδR, and the average number of points intersected
is 4ρR δR where ρ = O (n/R2) is the density of points (per unit area). Therefore the
distance moved before the cone edge is likely to touch a point is δR = (4ρR)−1 = O (R/n).
Correspondingly the area in which one can move the centre of cone without changing the
cone’s contents is π(δR)2 = O (R2/n2). Given that the total area is O (R2N/n) we have
that the number of distinct cones is O (Nn), the same magnitude as the number of relevant
point pairs.
Let us now consider the number of stable cones. If we take a cone at random and sum
its momenta then the resulting momentum axis will differ from the original cone axis by an
amount typically of order R/
n (since the standard deviation of y and φ for set of points
in the cone is O (R)). The probability of the difference being . R/n in both the y and
φ directions (i.e. the probability that the new axis contains the same set of particles) is
∼ (R/n)2/(R/
n)2 ∼ 1/n. Therefore the number of stable cones is O (N). This assumes
a random distribution of particles. There may exist special classes of configurations for
which the number of stable cones is greater than O (N). Therefore timing results that are
sensitive to the number of stable cones are to be understood as “expected” results rather
than rigorous upper bounds.
A.2 Computational complexity of the split–merge step
To study the computational complexity of the split–merge step, we work with the expec-
tation that there are O (N) initial protojets (as discussed above) and that there will be
roughly N/n ≪ N final jets (since there are O (n) particles per jet). It is reasonable to
assume that there will be roughly equal numbers of merging and splitting operations. Split-
ting leaves the number of protojets unchanged, while merging reduces it by 1. Therefore
there will be O (N) split–merge steps before we reach the final list of jets.
There are three kinds of tasks in the split–merge procedure. Firstly one has to maintain
a list of jets ordered in p̃t, both for finding the one with highest p̃t and for searching through
the remaining jets (in order of decreasing p̃t) to find an overlapping one. Maintaining the
jets in order is easily accomplished with a balanced tree (for example a priority_queue
or multiset in C++), at a cost of N lnN for the initial construction and lnN per update,
i.e. a total of N lnN , which is small compared to the remaining steps.
In examining the complexity of finding the hardest overlapping jet one needs to know
the cost of comparing two jets for overlap as well as the typical number of times this will
have to be done. A naive comparison of two jets takes time n. Using a 2d tree structure
such as a quadtree or k-d tree (as suggested also by Volobouev [17]), this can be reduced
n. The number of jets to be compared before an overlap is found will depend on the
event structure — if one assumes that jet positions are decorrelated with their p̃t’s, then
O (N/n) comparisons will have to be made each time around the loop. The total cost of
this will therefore be N2/
n (N2) with (without) a 2d tree.
Finally each merging/splitting procedure will take
n (n) time with (without) a tree,
so the total time spent merging and splitting will be O (N
n) (or O (Nn) without a tree).
The dominant step is the search for overlapping jets, which will have a total cost of
n (with a sizable coefficient), or N2 without any 2d tree structures. Since in practice
N2 is smaller than the Nn lnn needed to find the stable cones, here the introduction of a
tree structure gives little overall advantage.
A final comment concerns memory usage: when not using any tree structures, the list of
protojets and their contents requires O (Nn) space, which is the same order of magnitude
as the storage needed for identifying the set of stable cones in the first place. With a tree
structure this can be reduced to O (N
B Proof of IR safety of the SISCone algorithm
In this appendix, we shall explicitly prove that SISCone, algorithms 1–3, is infrared safe.
This means that if we run SISCone first with a set of hard particles, then with the same set
of hard particles together with additional soft particles, then: (a) all jets found in the event
without soft particles will be found also in the event with the soft particles; (b) any extra
jets found in the event with soft particles will themselves be soft, i.e. they will not contain
any of the hard particles. If either of these conditions fails in a finite region of phasespace
for the hard particles, then the cancellation between (soft) real and virtual diagrams will
be broken at some order of perturbation theory, leading to divergent jet cross sections.
We will first discuss the proof using a simplifying assumption: two protojets with
distinct hard particle content have distinct values for the split–merge ordering variable,
p̃t. We shall then discuss subtleties associated with various ordering variables, and explain
why p̃t is a valid choice.
B.1 General aspects of the proof
By soft particles, we understand particles whose momenta are negligible compared to the
hard ones. Specifically, for any set of hard particles {p1, . . . , pn} and any set of soft ones
{p̄1, . . . , p̄m}, we consider a limit in which all soft momenta are scaled to zero, so that they
do not affect any momentum sums,
{p̄j}→0
pi. (7)
In what follows, the limit of the momenta of the soft particles being taken to zero will be
implicit.
Let us now compare two different runs of the cone algorithm: in the first one, referred to
as the “hard event”, we compute the jets starting with a list of hard particles {p1, . . . , pN},
and, in the second one, referred to as the “hard+soft event”, we compute the jets with the
same set of hard particles plus additional soft particles {p̄1, . . . , p̄M}. As mentioned above,
the IR safety of the SISCone algorithm amounts to the statements (a) that for every jet
in the hard event there is a corresponding jet in the hard+soft event with identical hard
particle content (plus possible extra soft particles) and (b) that there are no hard jets in
the hard+soft event that do not correspond to a jet in the hard event. To prove this, we
shall proceed in two steps: first, we shall show that the determination of stable cones is IR
safe, then that the split–merge procedure is also IR safe.
The IR safety of the stable-cone determination is a direct consequence of the fact that:
• each cone initially built from the hard particles only was determined by two particles
in algorithm 2. This cone is thus still present when adding soft particles and, because
of eq. (7), is still stable. Hence, all stable cones from the hard event are also present
after inclusion of soft particles, the only difference being that they also contain extra
soft particles which do not modify their momentum.
• no new stable cone containing hard particles can appear. Indeed, if a new stable
cone appeared, Snew with content {pα1 , . . . , pαn, p̄ᾱ1 , . . . , p̄ᾱm}, then the fact that its
momentum
pαi +
p̄ᾱj corresponds to a stable cone, implies, by eq. (7), that the
cone with just the hard momenta pαi is also stable. However as shown in section 4.2
all stable cones in the hard event have already been identified, therefore this cone
cannot be new.
From these two points, one can deduce that after the determination of the stable cones we
end up with two different kinds of stable cones: firstly, there are those that are the same as
in the hard event but with possible additional soft particles; and secondly there are stable
cones that contain only soft particles. So, the ‘hard content’ of the stable cones has not
been changed upon addition of soft particles and algorithm 2 is IR safe.
The main idea behind the proof of the IR safety of the split–merge process, algorithm 3,
is to show by induction that the hard content of the protojets evolves in the same way for
the hard and hard+soft event. Since the hard content is the same at the beginning of the
process, it will remain so all along the split–merge process which is what we want to prove.
There is however a slight complication here: when running algorithm 3 over one itera-
tion of the loop in the hard event, we sometimes have to consider more than one iteration
of the loop in the hard+soft event. As we shall shortly see, in that case, only the last of
these iterations modifies the hard content of the jets and it does so in the same way as in
the hard event step.
So, let us now follow the steps of algorithm 3 in parallel for the hard and hard+soft
event, and show that they are equivalent as concerns the hard particles. In the following
analysis, item numbers coincide with the corresponding step numbers in algorithm 3.
2: If pt,min is non-zero, all purely soft protojets will be removed from the hard+soft
event and by eq. (7) the same set of hard protojets will be removed in the hard and
hard+soft event. Thus the correspondence between the hard protojets in the two
events will persist independently of pt,min.
3: In general, protojets with identical hard content will have nearly identical p̃t values,
whereas protojets with different hard-particle content will have substantially different
p̃t values.
21 Therefore the addition of soft particles will not destroy the p̃t ordering
and the protojet with the largest p̃t in the hard event, i will have the same hard
content as the one in the hard+soft event (let us call it i′).
4: The selection of the highest-p̃t protojet j (j
′ in the hard+soft case) that overlaps with
i (i′) can differ in the hard and hard+soft events, and we need to consider separately
the cases where this does not, or does happen. The first case, C1, is that i′ and j′
overlap in their hard content — because of the common p̃t ordering, j
′ must then
have the same hard content as j. The second case, C2, is that i′ and j′ only overlap
through their soft particles, so j′ cannot be the ‘same’ jet as j (since j by definition
overlaps with i through hard particles). By following the remaining part of the loop,
we shall show that in the first case all modifications of the hard content are the same
in the hard and hard+soft events, while, for the second case, the iteration of the loop
in the hard+soft event does not modify any hard content of the protojets. In this
second case, we then proceed to the next iteration of the loop in the hard+soft event
but stay at the same one for the hard event.
C1: The two protojets i′ and j′ overlap in their hard content
6,7: We need to compute the fraction of p̃t shared by the two protojets. Since the
hard contents of i (j) and i′ (j′) are identical, the fraction of overlap, given
by the hard content only, will be the same in the hard and hard+soft events.
Hence, the decision to split or merge the protojets will be identical.
21As mentioned already, this point is more delicate than it might seem at first sight. We come back to
it in the second part of this appendix.
8: Since the centres of both protojets are the same in the hard and hard+soft
events, the decision to attribute a hard particle to one protojet or the other will
be the same in both events. Hence splitting will reorganise hard particles in the
same way for the hard+soft event as for the hard one.
10: In both the hard and the hard+soft events, the merging of the two protojets
will result in a single protojet with the same hard content.
C2: The two protojets i′ and j′ overlap through soft particles only
6,7: Since the fraction of p̃t shared by the protojets will be 0 in the limit eq. (7), the
two protojets will be split.
8: In the splitting, only shared particles, i.e. soft particles, will be reassigned to
the first or second protojet. The hard content is therefore left untouched, as is
the p̃t ordering of the protojets.
11: At the end of the splitting/merging of the overlapping protojets, we have to consider
the two possible overlap cases separately: in the first case, the hard contents of the
protojets are modified in the same way for the hard and hard+soft event. This case
is thus IR safe. In the second case, the iteration of the loop in the hard+soft event
does not correspond to any iteration of the loop in the hard event. However the hard
content of the protojets in the hard+soft event is not modified and the p̃t ordering of
the jets remains identical; at the next iteration of the hard+soft loop, the new j′ may
once again have just soft overlap with i′ and the loop will thus continue iterating,
splitting the soft parts of the jets, but leaving the hard content of the jets unchanged.
This will continue until j′ corresponds to the j of the hard event, i.e. we encounter
case 1.22 Therefore even though we may have gone around the loop more times in
the hard+soft event, we do always reach a stage where the split–merge operation in
the hard+soft event coincides with that in the hard event, and so this part of the
procedure is infrared safe.
5,14: Up to possible intermediate loops involving case 2 above, when the protojet i has no
overlapping protojets in the hard event, the corresponding i′ in the hard+soft event
has no overlaps either. Final jets will thus be added one by one with the same hard
content in the hard and hard+soft events.
This completes the proof that the SISCone algorithm is IR safe, modulo subtleties related
to the ordering variable, as discussed below. Regarding the ‘merge identical protojets’
(MIP) procedure:
22Note that the second case can only happen a finite number of times between two occurrences of the
first case: as the p̃t ordering is not modified during the second case, each time around the loop the overlap
will involve a j′ with a lower p̃t than in the previous iteration, until one reaches the j
′ that corresponds
to j.
12: In algorithm 3, we do not automatically merge protojets appearing with the same
content during the split–merge process. This is IR safe. If instead we allow for two
identical protojets to be automatically merged, then when two protojets have the
same hard content but differ as a result of their soft content, they are automatically
merged in the hard event but not in the hard+soft event. This in turn leads to IR
unsafety of the final jets.
A final comment concerns collinear safety and cocircular points. When defining a
candidate cone from a pair of points, if additional points lie on the edge of the cone, then
there is an ambiguity as to whether they will be included in the cone. From the geometrical
point of view, this special case of cocircular points (on a circle of radius R) can be treated
by considering all permutations of the the cocircular points being included or excluded
from the circle contents. SISCone contains code to deal with this general issue. The case
of identically collinear particles, though a specific example of cocircularity, also adds the
problem that a circle cannot properly be defined from two identical points. For explicit
collinear safety we thus simply merge any collinear particles into a single particle, step 1
of algorithm 2. Given the resulting collinear-safe set of protojets, the split–merge steps
preserve collinear safety, since particles at identical y−φ coordinates are treated identically.
B.2 Split–merge ordering variable
Suppose we use some generic variable v (which may be pt, Et, mt, p̃t, etc.) to decide the
order in which we select protojets for the split–merge process. A crucial assumption in the
proof of IR safety is that two jets with different hard content will also have substantially
different values for v, i.e. the ordering of the v’s will not be changed by soft modifications.
If this is not the case then the choice of the hard protojets that enter a given split–merge
loop iteration can be modified by soft momenta, with a high likelihood that the final jets
will also be modified.
At first sight one might think that whatever variable is used, it will have different values
for distinct hard protojets. However, momentum conservation and coincident masses of
identical particles can introduce relations between the kinematic characteristics of distinct
protojets. Some care is therefore needed so as to ensure that these relations do not lead
to degeneracies in the ordering, with consequent ambiguities and infrared unsafety for the
final jets. In particular:
• Two protojets can have equal and opposite transverse momenta if between them they
contain all particles in the event (and the event has no missing energy or ‘ignored’
particles such as isolated leptons). It is probably fair to assume that no two protojets
will have identical longitudinal components, since in pp collisions the hard partonic
reaction does not occur in the pp centre of mass frame.
• Two protojets will have identical masses if they each stem exclusively from the same
kind of massive particle. The two massive particles may be undecayed (e.g. fully
reconstructed b-hadrons) or decayed (top, W , Z, H , or some non-standard new
particle), or even one decayed and the other not (some hypothetical particle with
a long lifetime).23 In the second case we can assume that two identical decayed
particles have different decay planes, because there is a vanishing phase space for
them to have identical decay planes.
Note that in a simple two-parton event almost any choice of variable will lead to a degen-
eracy (no sensible invariant will distinguish the two particles), however this specific case
is not problematic because for R < π/2 neither of the two partons can be in a protojet
that overlaps with anything else. From the point of view of IR safety, it is only for ‘fat’
(non-collimated) hard protojets that we need worry about the problem of degeneracies
in the split–merge ordering, because only then will there be overlaps whose resolution is
ambiguous in the presence of degeneracies.
Let us now consider what occurs with various possible choices for the split–merge
variable.
pt: This choice, adopted in certain codes [13, 19], can be seen to have a problem for events
with momentum conservation in the hadronic part, because if two non-overlapping
protojets contain, between them, all the hard particles then they will have identical
pt’s. If they each overlap with a common third protojet, the resulting split–merge
sequence will be ambiguous. Table 4 provides an example of such an event. The
simplest occurrences of this problem (4h+ 1s) apply only to R > π/4 (four particles
must form at least 3 fat protojets). The problem arises also for smaller R values, but
only at higher multiplicities.
mt: A workaround for the event of table 4 is to use the transverse mass, mt =
p2t +m
In pure QCD, with all particles stable, this is a good variable, because even if two
fat protojets have identical pt’s through momentum conservation, the fact that they
are ‘fat’ implies that they will be massive (over and above intrinsic particle masses),
and the phase space for them to have identical masses vanishes, thus killing any
IR divergences. However, for events with two identical decaying particles, two fat
protojets resulting from the particle decays can have identical pt’s (by momentum
conservation) and identical masses (because the decaying particles were identical).
This could happen for example in the fully hadronic decay channel for tt̄ events.
Thus, this choice is not advisable in a general purpose algorithm.
Et: The variable used in the original run II proposal was Et [6]. It has the drawback that
it is not longitudinally boost invariant: at central rapidity it is equal to mt, while
at high rapidities it tends to pt. Because the phase space for two protojets to have
identical rapidities vanishes (recall that we do not fix the partonic centre-of-mass),
two protojets with identical pt’s and masses will have different Et’s, because the
23Strictly speaking, for all scenarios of decayed heavy particles, the finite width Γ of the particle ensures
that the two jets actually have slightly different masses, breaking any degeneracies. In practice however,
ΓW,Z,t ∼ 1GeV and (for a light Higgs) ΓH ≪ ΛQCD, whereas for the width to save us from the dangers of
degeneracies we would need Γ ≫ ΛQCD.
event 1
n px py pz
0 86.01 66 0
1 64 -66 0
2 -77 -70 0
3 -73 70 0
4 -0.01 0 2
event 2
n px py pz
0 85.99 66 0
1 64 -66 0
2 -77 -70 0
3 -73 70 0
4 0.01 0 2
Table 4: Illustration of two events that conserve transverse momentum and differ only
through a soft particle, but lead to different hard jets with a split–merge procedures that
uses pt as the ordering variable and for measuring overlap. All the particles are to be taken
massless. For R = 0.9 and f = 0.7 each event has stable cones consisting of {01}, {23}
and {12}, as well as all single particles. The slight difference in momenta between the two
events, to balance the soft particle, causes the {01} ({23}) protojet to have the largest pt
in the first (second) event, it splits with {12} (merges with {12}), leading after further
split–merge steps to two hard jets, {01} and {23} (one hard ‘monster’ jet, {0123}).
degree of ‘interpolation’ between between pt and mt will be different. This resolves
the degeneracy and should cure the resulting IR safety issue, albeit at the expense
of introducing boost-dependence.
p̃t: The scalar sum of transverse momenta of the protojet constituents, p̃t, has the prop-
erty that it is equal to mt if all particles in the protojet have identical rapidities,
while it is equal to pt (i.e. the vector sum) if all particles have identical azimuths.
For a decayed massive particle, it essentially interpolates between pt and mt accord-
ing to the orientation of the decay plane. The phase space for all particles to have
identical azimuths vanishes, as does the phase space for the decay products of two
heavy particles to have identically oriented decay planes. Therefore this choice re-
solves any degeneracies, as is needed for infrared safety. Another advantage of p̃t is
that adding a particle to a protojet always increases its p̃t (this is not the case for pt
or Et), ensuring that the degree of overlap between a pair of jets is always bounded
by 1. Since it is also boost invariant, it is the choice that we recommend and that
we adopt as our default.24
Note that the above considerations hold for any split–merge procedure that relies on order-
ing the jets according to a single-jet variable. One might also consider ordering according
to variables determined from pairs of protojets: e.g. first split-merge the pair of protojets
with the largest (or alternatively smallest) overlap, recalculate all overlaps, and then repeat
until there are no further overlaps. However this specific example would also be dangerous,
24One might worry about the naturalness of a variable that depends on the decay plane of heavy particles
— however, any unnaturalness is present anyway in the split–merge procedure since if two particles decay
purely in the transverse plane then there is a likelihood of having overlapping protojets, whereas if they
decay in longitudinally oriented decay planes they will not overlap.
since the particles that are common to protojets a and b (say) could also be the particles
that are common between a and c, once again leading to an ambiguous split–merge se-
quence. One protojet-pair ordering variable that might be free of this problem is the y−φ
distance between the protojets, however we have not investigated it in detail.
A final comment concerns the impact of the split–merge procedure on non-global [36]
resummations for jets [37], in which one is interested in determining which of a set of
ordered soft particles are in a given hard jet. A soft and collinear splitting inside the jet
can modify the p̃t (or Et or mt) of the jet by an amount of the same order of magnitude
as a soft, large-angle emission near the edge of the jet. In events with two back-to-back
narrow jets, for which there is a near degeneracy between the p̃t’s of the two hard jets,
this can affect which of the two hard protojets split–merges first with an overlapping soft
protojet, leading to ambiguities in the assignment of the soft particles to the two hard jets.
This interaction between collinear and soft modes is somewhat reminiscent of that in [38],
though the origin and structure are kinematical in our case. Considering only branchings
with transverse momenta above ǫpt,hard, for R > π/4 this is likely to be relevant in events
with two equally soft particles (α2s ln ǫ) and n soft-collinear splittings (α
2n ǫ) giving an
overall contribution αn+2s ln
2n+1 ǫ. This competes with the normal soft-ordered non-global
logarithms, starting from order α3s ln
3 ǫ. For R ≤ π/4, the problem will only arise with a
greater number of equally soft large-angle particles, and so will be further suppressed by
powers of αs.
References
[1] TeV4LHC QCD Working Group et al., hep-ph/0610012.
[2] S. Catani, Y. L. Dokshitzer, M. H. Seymour and B. R. Webber, Nucl. Phys. B 406
(1993) 187; S. D. Ellis and D. E. Soper, Phys. Rev. D 48 (1993) 3160 [hep-ph/9305266].
[3] Y. L. Dokshitzer, G. D. Leder, S. Moretti and B. R. Webber, JHEP 9708, 001 (1997)
[hep-ph/9707323]; M. Wobisch and T. Wengler, hep-ph/9907280; M. Wobisch, “Mea-
surement and QCD analysis of jet cross sections in deep-inelastic positron proton
collisions at
s = 300GeV,” DESY-THESIS-2000-049.
[4] V. M. Abazov et al. [D0 Collaboration], Phys. Lett. B 525 (2002) 211
[hep-ex/0109041].
[5] A. Abulencia et al. [CDF II Collaboration], Phys. Rev. Lett. 96 (2006) 122001
[hep-ex/0512062].
[6] G. C. Blazey et al., hep-ex/0005012.
[7] M. H. Seymour and C. Tevlin, JHEP 0611 (2006) 052 [hep-ph/0609100].
[8] G. Sterman and S. Weinberg, Phys. Rev. Lett. 39 (1977) 1436.
http://arxiv.org/abs/hep-ph/0610012
http://arxiv.org/abs/hep-ph/9305266
http://arxiv.org/abs/hep-ph/9707323
http://arxiv.org/abs/hep-ph/9907280
http://arxiv.org/abs/hep-ex/0109041
http://arxiv.org/abs/hep-ex/0512062
http://arxiv.org/abs/hep-ex/0005012
http://arxiv.org/abs/hep-ph/0609100
[9] S.D. Ellis, private communication to the OPAL Collaboration; D.E. Soper and H.-C.
Yang, private communication to the OPAL Collaboration; L.A. del Pozo, Univer-
sity of Cambridge PhD thesis, RALT–002, 1993; R. Akers et al. [OPAL Collabora-
tion], Z. Phys. C 63, 197 (1994); M. H. Seymour, Nucl. Phys. B 513 (1998) 269
[hep-ph/9707338].
[10] V. Abazov et al. [D0 Collaboration], hep-ex/0608052.
[11] A. Abulencia et al. [CDF Run II Collaboration], hep-ex/0512020.
[12] L. A. del Pozo and M. H. Seymour, pxcone (unpublished code).
[13] The CDF Collaboration’s implementation of the Teva-
tron Run-II cone definition [6] is available at
http://www.pa.msu.edu/~huston/Les_Houches_2005/Les_Houches_SM.html
[14] G. Arnison et al. [UA1 Collaboration], Phys. Lett. B 132 (1983) 214.
[15] J. E. Huth et al., in Snowmass Summer Study (1990) pp. 134–136.
[16] N. Kidonakis, G. Oderda and G. Sterman, Nucl. Phys. B 525 (1998) 299
[hep-ph/9801268].
[17] I. Volobouev, presentation at MC4LHC meeting, CERN, July 2006.
[18] J. Campbell and R. K. Ellis, Phys. Rev. D 65 (2002) 113007 [hep-ph/0202176].
[19] Z. Nagy, Phys. Rev. Lett. 88 (2002) 122003 [hep-ph/0110315]; Phys. Rev. D 68 (2003)
094002 [hep-ph/0307268].
[20] M. Cacciari and G. P. Salam, Phys. Lett. B 641 (2006) 57 [hep-ph/0512210].
[21] S. D. Ellis, J. Huston and M. Tonnesmann, in Proc. of the APS/DPF/DPB Sum-
mer Study on the Future of Particle Physics (Snowmass 2001) ed. N. Graf, p. P513
[hep-ph/0111434].
[22] M. Luscher, Comput. Phys. Commun. 79 (1994) 100 [hep-lat/9309020].
[23] H. Samet, ACM Computing Surveys (CSUR) 16 (1984) 187.
[24] J. L. Bentley, Communications of the ACM 18 (1975) 509.
[25] A. Banfi, G. P. Salam and G. Zanderighi, JHEP 0503 (2005) 073 [hep-ph/0407286];
Phys. Lett. B 584 (2004) 298 [hep-ph/0304148].
[26] T. Sjostrand et al., Comput. Phys. Commun. 135, 238 (2001) [hep-ph/0010017];
T. Sjostrand et al., hep-ph/0308153.
http://arxiv.org/abs/hep-ph/9707338
http://arxiv.org/abs/hep-ex/0608052
http://arxiv.org/abs/hep-ex/0512020
http://www.pa.msu.edu/~huston/Les_Houches_2005/Les_Houches_SM.html
http://arxiv.org/abs/hep-ph/9801268
http://arxiv.org/abs/hep-ph/0202176
http://arxiv.org/abs/hep-ph/0110315
http://arxiv.org/abs/hep-ph/0307268
http://arxiv.org/abs/hep-ph/0512210
http://arxiv.org/abs/hep-ph/0111434
http://arxiv.org/abs/hep-lat/9309020
http://arxiv.org/abs/hep-ph/0407286
http://arxiv.org/abs/hep-ph/0304148
http://arxiv.org/abs/hep-ph/0010017
http://arxiv.org/abs/hep-ph/0308153
[27] J. M. Campbell, J. W. Huston and W. J. Stirling, Rept. Prog. Phys. 70 (2007) 89
[hep-ph/0611148].
[28] F. Abe et al. [CDF Collaboration], Phys. Rev. D 45 (1992) 1448.
[29] B. Abbott, M. Bhattacharjee, D. Elvira, F. Nang and H. Weerts [for the D0 Collabo-
ration], FERMILAB-PUB-97-242-E.
[30] S. D. Ellis, Z. Kunszt and D. E. Soper, Phys. Rev. Lett. 69 (1992) 3615
[hep-ph/9208249].
[31] G. Marchesini, B. R. Webber, G. Abbiendi, I. G. Knowles, M. H. Seymour and
L. Stanco, Comput. Phys. Commun. 67 (1992) 465; G. Corcella et al., JHEP 0101
(2001) 010 [hep-ph/0011363].
[32] S. Catani and B. R. Webber, JHEP 9710 (1997) 005 [hep-ph/9710333].
[33] A. Daleo, T. Gehrmann and D. Maitre, hep-ph/0612257.
[34] S. Fleming, A. H. Hoang, S. Mantry and I. W. Stewart, hep-ph/0703207; A. Hoang
and S. Mantry, presentations at the “Ringberg workshop on non-perturbative QCD
of jets”, Ringberg Castle, 8–10 January 2007.
[35] J. Huston, private communication; A. L. Fitzpatrick, J. Kaplan, L. Randall and
L. T. Wang, hep-ph/0701150; B. Lillie, L. Randall and L. T. Wang, hep-ph/0701166.
W. Skiba and D. Tucker-Smith, hep-ph/0701247; B. Holdom, hep-ph/0702037;
J. M. Butterworth, J. R. Ellis and A. R. Raklev, hep-ph/0702150.
[36] M. Dasgupta and G. P. Salam, Phys. Lett. B 512 (2001) 323 [hep-ph/0104277], JHEP
0203 (2002) 017 [hep-ph/0203009]; A. Banfi, G. Marchesini and G. Smye, JHEP 0208
(2002) 006 [hep-ph/0206076].
[37] R. B. Appleby and M. H. Seymour, JHEP 0212 (2002) 063 [hep-ph/0211426]; A. Banfi
and M. Dasgupta, Phys. Lett. B 628 (2005) 49 [hep-ph/0508159]; Y. Delenda, R. Ap-
pleby, M. Dasgupta and A. Banfi, JHEP 0612 (2006) 044 [hep-ph/0610242].
[38] J. R. Forshaw, A. Kyrieleis and M. H. Seymour, JHEP 0608 (2006) 059
[hep-ph/0604094].
http://arxiv.org/abs/hep-ph/0611148
http://arxiv.org/abs/hep-ph/9208249
http://arxiv.org/abs/hep-ph/0011363
http://arxiv.org/abs/hep-ph/9710333
http://arxiv.org/abs/hep-ph/0612257
http://arxiv.org/abs/hep-ph/0703207
http://arxiv.org/abs/hep-ph/0701150
http://arxiv.org/abs/hep-ph/0701166
http://arxiv.org/abs/hep-ph/0701247
http://arxiv.org/abs/hep-ph/0702037
http://arxiv.org/abs/hep-ph/0702150
http://arxiv.org/abs/hep-ph/0104277
http://arxiv.org/abs/hep-ph/0203009
http://arxiv.org/abs/hep-ph/0206076
http://arxiv.org/abs/hep-ph/0211426
http://arxiv.org/abs/hep-ph/0508159
http://arxiv.org/abs/hep-ph/0610242
http://arxiv.org/abs/hep-ph/0604094
	Introduction
	Overview of the cone jet-finding algorithm
	IR unsafety in the midpoint algorithm
	An exact seedless cone jet definition
	One-dimensional example
	The two-dimensional case
	General approach
	Specific computational strategies
	The split–merge part of the cone algorithm
	Tests and comparisons
	Measures of IR (un)safety
	Speed
	Rsep: an inexistent problem
	Physics impact of seedless v. midpoint cone
	Inclusive jet spectrum
	Jet masses in 3-jet events
	Conclusions
	Further computational details
	Cone multiplicities
	Computational complexity of the split–merge step
	Proof of IR safety of the SISCone algorithm
	General aspects of the proof
	Split–merge ordering variable
ABSTRACT
  Current cone jet algorithms, widely used at hadron colliders, take event
particles as seeds in an iterative search for stable cones. A longstanding
infrared (IR) unsafety issue in such algorithms is often assumed to be solvable
by adding extra `midpoint' seeds, but actually is just postponed to one order
higher in the coupling. A proper solution is to switch to an exact seedless
cone algorithm, one that provably identifies all stable cones. The only
existing approach takes N 2^N time to find jets among N particles, making it
unusable at hadron level. This can be reduced to N^2 ln(N) time, leading to
code (SISCone) whose speed is similar to that of public midpoint
implementations. Monte Carlo tests provide a strong cross-check of an
analytical proof of the IR safety of the new algorithm, and the absence of any
'R_{sep}' issue implies a good practical correspondence between parton and
hadron levels. Relative to a midpoint cone, the use of an IR safe seedless
algorithm leads to modest changes for inclusive jet spectra, mostly through
reduced sensitivity to the underlying event, and significant changes for some
multi-jet observables.

<|endoftext|><|startoftext|>
Introduction
The JPC = 1−− resonances near the new flavor thresholds: Υ(4S), ψ(3770), and φ(1020) are
the well known sources in e+e− experiments of pairs of the new-flavor mesons: respectively
BB̄, DD̄, and KK̄. A number of experimental approaches depends on the knowledge of the
relative yield of pairs of charged and neutral mesons:
Rc/n =
σ(e+e− → P+P−)
σ(e+e− → P 0P̄ 0)
, (1)
where P stands for the pseudoscalar meson, i.e. B, D, or K, and dedicated measurements
of such ratio have been done at the Υ(4S) resonance [1] at ψ(3770) [2] and at φ(1020) [3].
The values of the ratio Rc/n at all three discussed resonances are close to one due to these
resonances being isotopic scalars, and it is the deviation of the discussed ratio from one that
presents phenomenological interest. This deviation is generally contributed by the following
factors: the isospin violation due to the Coulomb interaction between the charged mesons
and due to the isotopic mass difference between charged and neutral mesons, and, in the
case of the KK̄ production at the φ(1020) resonance, a non-negligible nonresonant isovector
production amplitude. The latter effect can be studied and described as the “tail of the ρ
resonance”, while the isospin breaking due to the mass difference is usually accounted for
as a kinematical effect in the P wave production cross section factor p3, where p is the the
c.m. momentum of each of the mesons. The Coulomb effect has attracted a considerable
theoretical attention. The expression for this effect in the ratio Rc/n in the limit, where the
resonance and the charged mesons are considered as point-like particles [4] has the simple
textbook form:
δRc/n =
, (2)
with α being the QED constant and v the velocity of each of the (charged) mesons in the
c.m. frame. However for the production of the real-life mesons the analysis is complicated
by the charge form factors of the mesons[5], by the form factor in the vertex of interaction of
the resonance with the meson pair [5, 6] and generally by the strong interaction between the
mesons [7, 8, 9]. In particular, it has been argued [8, 9] that the modification of the Coulomb
effect by the strong (resonant) interaction between the mesons is quite significant. The
previously considered picture of the strong interaction was however somewhat unrealistic.
Namely, it has been assumed [8, 9] that the wave function in the I = 1 state of the meson
pair is vanishing at short but finite distances, which would correspond to a singular behavior
of the strong interaction at finite distances. In this paper we derive the formulas for the
Coulomb effect in the ratio Rc/n under the standard assumption about the strong scattering
amplitude in the channels with I = 0 and I = 1. We find that in the case of the Υ(4S) and
ψ(3770) resonances, where the heavy meson pairs are produced by the isotopically singlet
electromagnetic current of the corresponding heavy quark, the strong-interaction effect in
the Coulomb correction depends on the scattering phase δ1 in the I = 1 channel and is a
smooth function of the energy across the resonance, while in the case of the Kaon production
at and near the φ(1020) there is also a smooth dependence on the nonresonant part of the
strong scattering phase δ0 in the isoscalar channel inasmuch as there is a contribution of the
isovector production amplitude at these energies. In either case we find that the behavior
of the Coulomb effect is smooth on the scale of the resonance width, unlike the behavior
previously found [8, 9] under less realistic assumptions.
We further notice that essentially the same calculation can be applied to considering the
effect on the ratio Rc/n of the isotopic mass difference ∆m between the charged and neutral
mesons, at least in the first order in ∆m, by considering the mass difference as a perturbation
by a (constant) potential. In this way we find that the result coincides with the linear in ∆m
term in the ratio of the kinematical factors p3 only in the limit of vanishing strong scattering
phase. Once the latter phase is taken into account, there arises a correction whose relative
contribution is determined by the parameter (p a) with a being the characteristic range of
the strong interaction. We therefore conclude that the conventionally used p3 approximation
for this effect may be somewhat applicable to the KK̄ production at the φ(1020) resonance,
where p ≈ 120MeV, but becomes quite questionable for the DD̄ production at the ψ(3770),
where p ≈ 280MeV.
The strong-scattering phase in the P -wave state of mesons produced in e+e− annihilation
near the threshold is proportional to p3. We therefore expect the discussed effects of the
strong interaction in the ratio Rc/n to exhibit a measurable variation with energy. A mea-
surement of this variation can thus provide an information on the strong scattering phases,
which is not readily available by other means.
The material in the paper is organized as follows. In Sec. 2 we consider the production of
meson pairs by an isosinglet source and derive the formula for the correction to Rc/n due to
a generic isospin-violating interaction potential V (r) viewed as a perturbation. In Sec. 3 we
generalize this treatment to the situation where the source is a coherent mixture of I = 0 and
I = 1. The specific expressions corresponding to the Coulomb interaction and the isotopic
mass difference are considered in Sec. 4. Sec. 5 contains phenomenological estimates of the
constraints on the parameters of the strong interaction between heavy mesons based on the
currently available data [1, 2] for BB̄ and DD̄ production. Finally, in Sec. 6 we summarize
our results.
2 General formulas for an isoscalar source
We start with considering the behavior of the scattering wave functions of a meson-antimeson
pair in the limit of exact isotopic symmetry, i.e. neglecting any Coulomb effects and the
isotopic mass difference. We adopt the standard picture (see e.g. in the textbook [10]),
where the strong interaction is confined within the range of distances r < a, so that beyond
that range, at r > a the motion of the mesons is free. The two relevant independent solutions
to the Schrödinger equation at r > a for the radial wave function in the P wave are the free
outgoing wave
f(pr) =
eipr (3)
and its complex-conjugate, f ∗(pr), describing the incoming wave. A general wave function
of a pair of neutral mesons, φn(r) as well as of a pair of charged mesons, φc(r), in this region
is a linear superposition of these two solutions.
In the region of strong interaction, i.e. at r < a, the isotopic symmetry selects as
independent channels the states with definite isospin, I = 0 and I = 1, corresponding to
the wave functions φ0 = φc + φn and φ1 = φc − φn. The detailed behavior of the I = 0 and
I = 1 wave functions inside the strong interaction region is not important for the present
treatment, and the important point is that the non-singular at r = 0 ‘inner’ wave functions
match at r = a particular linear superpositions of the incoming and outgoing waves (which
superpositions in fact correspond to standing waves):
χ0(r) = e
iδ0 f(pr) + e−iδ0 f ∗(pr) ,
χ1(r) = e
iδ1 f(pr) + e−iδ1 f ∗(pr) , (4)
where δ0 and δ1 are the strong scattering phases in respectively the isoscalar and isovector
states.
Consider now the production of meson pairs by a source localized inside the region of
strong interaction, i.e. r < a, such as e.g. the electromagnetic current. The wave function
of the produced meson pairs at r ≤ a is then determined by both the source and the
strong interaction, and the relevant solution to the Schrödinger equation is chosen by the
requirement that asymptotically at large distances, r → ∞, only an outgoing wave is present.
Let us first consider the simple case where the relevant electromagnetic current is a pure
isotopic singlet, which is the case for DD̄ and BB̄ pair production. Then in the limit of
exact isotopic symmetry the outgoing waves for the ‘n’ and the ‘c’ channels have exactly the
same amplitude, which for our present purpose can be chosen as one:
φ(0)c (r) = f(pr) and φ
n (r) = f(pr) at r → ∞ , (5)
where the superscript (0) stands for the approximation of exact isotopic symmetry. It can be
noted that the approximation of the free motion beyond the region of the strong interaction
in fact makes the expressions in Eq.(5) applicable at all r > a, i.e. all the way down to the
matching point r = a. It is helpful to notice for a later discussion that at the matching point
the I = 1 wave function is vanishing while the I = 0 function φ
0 contains only the outgoing
wave. When continued into the strong interaction region, i.e. at r < a, the function φ0
evolves into the solution determined by the strong interaction and the source.
The isospin-violating effects of the Coulomb interaction and of the mass difference ∆m
between the charged and neutral mesons can be generally described as being due to a presence
of an extra potential V (r) in the ‘c’ channel beyond the region of the strong interaction:
V = −α/r for the Coulomb interaction effect and a constant potential V = 2∆m describing
the mass difference. In other words the wave function φn of the ‘n’ channel is still determined
at r > a by the radial Schrödinger equation for free P -wave motion2, while the equation for
the ‘c’ channel function φc reads as
+ p2 −mV (r)− 2
φc(r) = 0 . (6)
It is assumed throughout the present consideration that the isospin-breaking potential
exists only at distances beyond the range of the strong interaction, i.e. that V (r) has support
only at r > a. The justification for such treatment is that in the region of the strong force
2Clearly, in the considered here first order in the isospin violation only the difference of the interaction
between the two channels is important, thus any such difference can be relegated to one channel, while
keeping the other one unperturbed. Also, any effect of the mass difference in the kinetic term p2/m is of
order v2/c2 as compared to the discussed here effect of ∆m in the overal energy difference between the two
channels, and is totally neglected in our treatment.
small isospin-violating effects are compared to the energy of the strong interaction, so that
the contribution of any such effects arising at r < a is very small, while in the region r > a
the relative contribution of the potential V (r) is determined by its ratio to the kinetic energy
of the mesons, which is small near the threshold.
It should be emphasized that although the interaction at distances r > a is present only
in the ‘c’ channel, the wave functions in both channels are modified in comparison with those
in Eq.(5), as a result of the coupling between channels imposed by the boundary conditions
at r = a. According to the setting of the problem of production of the meson pairs by a
localized source, the appropriate modified functions are those containing at r → ∞ only the
outgoing waves
φc → (1 + x) f(pr), φn → (1 + y) f(pr) , (7)
where the (complex) coefficients x and y arise due to the potential V , and are proportional to
V in the considered here first order of perturbation theory. These coefficients determine the
ratio of the production amplitudes: Ac/An = 1+ x− y, and the discussed here modification
of the yield ratio:
Rc/n = 1 + 2Rex− 2Re y . (8)
The modified wave function in both channels is subject to two conditions:
i: The channel with neutral mesons has only an outgoing wave at all r > a. In other words,
the expression for φn(r) in Eq.(7) is valid at all r down to r = a;
ii: The wave function of the channel with isospin I = 1 at r ≤ a should be proportional to
the standing-wave solution matching the function χ1 in Eq.(4), since there is no source for
the I = 1 state of the meson pairs.
These two conditions are sufficient to fully determine the modified functions at r > a and
thus to find the coefficients x and y.
The first order in V (r) perturbation of the wave function in the channel with charged
mesons is found in the standard way, using the P wave Green’s function G+(r, r
′) satisfying
the equation
+ p2 −
G+(r, r
′) = δ(r − r′) , (9)
and the condition that G+(r, r
′) contains only an outgoing wave when either of its arguments
goes to infinity. The Green’s function is constructed from two solutions of the homogeneous
equation, i.e. from the functions f(pr) and f ∗(pr), as
G+(r, r
2 i p
[f(pr) f ∗(pr′) θ(r − r′) + f(pr′) f ∗(pr) θ(r′ − r)] , (10)
where θ is the standard unit step function. The perturbation δφc is then found as
δφc(r) = m
G+(r, r
′) V (r′) f(pr′) dr′ . (11)
One readily finds from this explicit form of the solution that δφc contains only the outgoing
wave at asymptotic distances r → ∞:
δφc |r→∞ = −
f(pr)
V (r′) |f(pr′)|2 dr′ , (12)
so that the coefficient x is purely imaginary:
x = − i
V (r′) |f(pr′)|2 dr′ (13)
and gives no contribution to the ratio of the production rates Rc/n described by Eq.(8)3.
Consider now the matching of the wave functions at r = a. In this region of r one has
r < r′ in the integral in Eq.(11) so that the correction in the ‘c’ channel has only an incoming
wave:
δφc(r) |r→a = η f
∗(pr) (14)
η = −
V (r′) [f(pr′)]
dr′ . (15)
The wave functions φ0 = φc + φn and φ1 = φc − φn corresponding to the states with isospin
I = 0 and I = 1 are then found as
φ0 |r→a = 2f(pr) + y f(pr) + η f
∗(pr) and φ1 |r→a = η f
∗(pr)− y f(pr) . (16)
One can now apply the condition ii to determine the coefficient y. Indeed, the condition for
the wave function φ1 at r → a to be proportional to f ∗(pr) + e2iδ1 f(pr) requires y to be
given by
y = −η e2iδ1 . (17)
Upon substitution in Eq.(8) this yields
Rc/n = 1 +
e2iδ1
e2ipr
V (r) dr
 . (18)
3It can be noticed that the integral in Eq.(13) is divergent, which corresponds to the infrared-divergent
behavior of the perturbation for the phase of the wave function, logarithmic for the Coulomb interaction
and linear for a constant potential. This slight technical difficulty can be readily resolved, for our present
purposes, by introducing an infrared regularizing factor exp(−λ r) in the potential and setting λ → 0 in the
end result.
3 Mixed isoscalar and isovector source
The formula (18) gives the general expression for the isospin-breaking effect in the considered
yield ratio for the case where the mesons are produced by an isoscalar source. The presented
consideration can also be extended to a situation where the source is a general coherent mix-
ture of an isoscalar and isovector. The specific isotopic composition of the source determines
the ratio of the coefficients of the amplitudes of the running outgoing waves in the I = 1
and I = 0 channels at the matching point r = a, which ratio we denote as A1/A0, thus
defining A1 and A0 as the production amplitudes in the respective channels (in the limit of
exact isotopic symmetry). In this situation the generalization of the expressions in Eq.(5) for
radial wave functions in the ‘outer’ region r > a in the zeroth order in the isospin violation
can be written as
φ(0)c (r) = (A0 + A1) f(pr) and φ
n (r) = (A0 − A1) f(pr) . (19)
The isospin violation in the asymptotic form of these wave functions at r → ∞ can then be
parametrized, similarly to Eq.(7), by complex coefficients x and y as
φc → (A0 + A1) (1 + x) f(pr), φn → (A0 − A1) (1 + y) f(pr) , (20)
so that the yield ratio is found from
Rc/n =
A0 + A1
A0 − A1
(1 + 2Rex− 2Re y ) . (21)
The coefficient x, similarly to the previous discussion and the equation (13), is purely
imaginary and in fact does not contribute in Eq.(21), while the coefficient y is found from the
appropriately modified conditions on the wave functions. Namely, the previously discussed
condition i remains applicable, so that the asymptotic expression in Eq.(20) for the ‘n’
channel function remains valid in the entire ‘outer’ region r > a down to the matching point
r = a. In order to allow for the isovector component of the source the condition ii has to be
modified as will be described few lines below.
The perturbation by the potential V (r) of the ‘c’ channel wave function at the matching
point r = a is readily found, similarly to Eq.(14), as
δφc(r) | r→a = η (A0 + A1) f ∗(pr) (22)
with η given by Eq.(15).
One can now write the expressions for the resulting ‘outer’ wave functions in the isotopic
channels at the matching point:
φ0(r) | r→a = 2A0 f(pr) + η (A0 + A1) f ∗(pr) + y (A0 −A1)f(pr) =
2A0 + y (A0 −A1)− η (A0 + A1) e2iδ0
f(pr) + η (A0 + A1) e
iδ0 χ0(r) (23)
φ1(r) | r→a = 2A1 f(pr) + η (A0 + A1) f ∗(pr)− y (A0 −A1)f(pr) =
2A1 − y (A0 −A1)− η (A0 + A1) e2iδ1
f(pr) + η (A0 + A1) e
iδ1 χ1(r) , (24)
with χ0 and χ1 being the standing wave functions from Eq.(4) in the corresponding isotopic
channels, which when evolved in the region of strong interaction contain no singularity at
r = 0. The remaining parts in the latter expressions for the functions φ0 and φ1 describe
the proper running outgoing waves. These parts, when continued down in r into the strong
interaction region evolve to match the source at r < a. The ratio of the amplitudes of
the isovector and the isoscalar running waves is determined by the isotopic composition of
the source, and by the isotopically symmetric propagation through the strong-interaction
region. Thus the ratio of the amplitudes of these waves at r = a does not depend on the
isospin-breaking effects at r > a and should be equal to A1/A0. Applying this condition to
the isotopic wave functions given by the expressions (23) and (24), one finds the equation
for the coefficient y:
2A1 − y (A0 − A1)− η (A0 + A1) e2iδ1
2A0 + y (A0 − A1)− η (A0 + A1) e2iδ0
. (25)
This equation in fact replaces in this more general situation the previously discussed condi-
tion ii, which condition and the ensuing result in Eq.(17) are readily recovered in the limit
A1/A0 = 0 from Eq.(25).
Considering that both y and η are of the first order in the potential V , it is sufficient to
use the linear expansion of the equation (25) in y and η, finding in this way the solution for
y in the form
y = −ηA0 e
2iδ1 − A1 e2iδ0
A0 − A1
, (26)
and thus arriving at the final formula for the relative yield:
Rc/n =
A0 + A1
A0 −A1
2iδ1 − A1 e2iδ0
A0 − A1
e2ipr
V (r) dr
. (27)
Given that A0 = |A0| eiδ0 and A1 = |A1| eiδ1, the amplitude-dependent factor in this formula
can also be written in terms of the real ratio ρ = |A1/A0| as
2iδ1 −A1 e2iδ0
A0 −A1
= e2iδ1
1− ρ ei(δ0−δ1)
1− ρ e−i(δ0−δ1)
. (28)
4 The Coulomb and the mass-difference effects
The general formulas in Eq.(18) and (27) can now be applied to a discussion of the specific
isospin-breaking effects in the e+e− production of meson pairs at and near the threshold
resonances. We start with considering the effect of the Coulomb interaction. In a detailed
treatment of this correction one should include the realistic form factors of the mesons,
which cut off at short distances the difference in the electromagnetic interactions between
the charged and neutral mesons. In the present discussion we replace for simplicity the
gradual cutoff of the Coulomb interaction by an abrupt cutoff at an effective range r = ac,
where generally ac ≥ a 4. The master integral with the Coulomb potential V (r) = −α/r in
the equations (18) and (27) then takes the form
e2ipr
V (r) dr =
cos 2pac
2(pac)2
sin 2pac
− Ci(2pac)
− cos 2pac
sin 2pac
2(pac)2
− Si(2pac)
2 (pac)2
− ln(2 pac) + 1− γE
(pac)
, (29)
where the integral sine and cosine are defined in the standard way:
Si(z) =
sin t
and Ci(z) = −
cos t
and γE = 0.577 . . . is the Euler’s constant. The latter line in Eq.(29) shows few first terms of
the expansion of the integral in the parameter (pac). This expansion illustrates the behavior
of the correction toward the threshold. For the purpose of this illustration one can consider
first the simpler expression in Eq.(18). The imaginary part, which determines the discussed
Coulomb effect in Rc/n in the limit where there is no strong scattering, δ1 → 0, is not singular
4As previously mentioned, any extension of the isospin-breaking potential inside the strong interaction
region can result only in very small corrections.
at pac → 0, and the textbook formula (2) is recovered in this limit. The real part of the
integral in Eq.(29) is singular at small pac, but it multiplies in Eq.(18) the factor sin δ1. The
P -wave scattering phase in its turn is proportional at small momenta to p3: δ1 ∼ (pa)3, so
that the overall contribution of the real part of the integral is not singular at the threshold
either. Considering a more general expression for the Coulomb effect for the case of an
isotopically mixed source, following from the equation (27), one can readily arrive at the
same conclusion that the singular in (pac) real part of the integral (29) does not lead to
an actual singularity, since it only enters the ratio Rc/n multiplied by a combination of the
phases δ0 and δ1 (cf. Eq.(28)), each vanishing as p
3 toward the threshold.
As previously mentioned, the effect of the isotopic mass difference corresponds to that of
a constant potential V = 2∆m extending from the range of the strong interaction r = a to
infinity. The master integral with such potential has the form
e2ipr
V (r) dr =
2 cos 2 pa
+ sin 2 pa+ i
2 sin 2 pa
− cos 2 pa
− 2 pa+ 3 i+O
(pa)2
. (30)
In the limit of vanishing strong scattering phases the mass correction to Rc/n is determined
by only the imaginary part of the integral, which in the limit of small pa thus yields
Rc/n = 1− 3∆m
= 1− 3∆m
, (31)
where E is the total kinetic energy of the meson pair, and the found expression coincides
with the linear in ∆m term in the expansion of the usually assumed ratio of the kinematical
factors (p+/p0)
3. Clearly, in the more realistic case of presence of the strong scattering the
real part of the integral in Eq.(30) also contributes and the simple kinematical approximation
is generally invalidated.
5 Phenomenological estimates
In this section we discuss application of our formulas to interpreting the data on the charged
to neutral meson yield ratio Rc/n at the near-threshold resonances Υ(4S), ψ(3770) and
φ(1020). The purpose of this discussion is to illustrate the effect of the strong scatering on
the isospin breaking corrections, and we use here the simplified picture of a abrupt cutoff
of the Coulomb interaction and of the isotopic mass difference effects. Such simplification
generally can be used as long as the parameter (pa) is not large. A detailed analysis should
likely involve a model of a gradual cutoff, since the details of the transition become important
at lager momenta.
5.1 Υ(4S)
The simplest case for the study of the isospin breaking corrections in the relative production
of heavy mesons is offered by the BB̄ pair production near and at the Υ(4S) resonance.
Indeed, this process only is due to the purely isosinglet electromagnetic current of the b
quarks, and the isotopic mass difference between the B mesons is very small: ∆mB =
−0.33±0.28MeV [11], so that any deviation of the ratio Rc/n from one is essentially entirely
due to the Coulomb interaction. On the other hand, the parameter α/v for the Coulomb
effect in this case is the largest due to small velocity of the B mesons: at the energy of the
Υ(4S) peak vB/c ≈ 0.06. In particular, the numerical value in the expression (2) is 0.19.
The experimental data [1] however indicate a significantly smaller deviation of Rc/n from one.
The BaBar data with the smallest errors give Rc/n = 1.006±0.036±0.031. Such behavior is
likely a result of a combined effect of the meson and production vertex form factors [5, 6] and
of the discussed here modification of the Coulomb correction by the strong scattering phase.
These effects can in principle be separated and studied quantitatively by measuring the
energy dependence of the ratio Rc/n near the Υ(4S) resonance. With the presently available
data we can only use a simplified parametrization of the form factor effects by introducing
an abrupt cutoff for the Coulomb interaction at r = ac ≥ a and thereby estimate the likely
regions in the (ac, δ1) plane. Such estimate from the equations (18) and (29) is shown in Fig.1
as a one-sigma area, corresponding to the BaBar data with the statistical and systematic
errors added in quadrature: Rc/n = 1.006± .048. Clearly, more precise data from dedicated
measurements of the ratio Rc/n are needed for a better understanding of the parameters of
strong interaction between the B mesons.
0.2 0.4 0.6 0.8 1 1.2
Figure 1: The one sigma area (shaded) in the (ac, δ1) plane corresponding to the BaBar data
on the B+B−/B0B̄0 yield ratio at the Υ(4S) resonance.
5.2 ψ(3770)
The largest isospin-breaking effect in the DD̄ production at the ψ(3770) is that due to the
mass difference between the charged and the neutral D mesons: ∆mD = 4.78±0.10MeV [11].
The most precise measurements of this process have been done [2] at the energy
3773MeV. At this energy the momentum of each charged D meson is p+ = 254MeV and
that for a neutral D meson is p0 = 287MeV. Thus the ratio of the kinematical factors
(p+/p0)
3 ≈ 0.69 is significantly less than one. The Coulomb effect is somewhat smaller.
Indeed, the velocity of a charged meson at this energy is v+/c = 0.135 and the expression
(2) gives numerically 0.085. One can notice that if the kinematical and the Coulomb factors
are combined in a straightforward way to estimate Rc/n = (p+/p0)
3 [1 + πα/(2v+)] ≈ 0.75,
this would be in a very good agreement with the experimental number [2]: Rc/n = 0.776 ±
0.024+0.014
−0.006. Thus it is quite likely that at this particular energy there is a considerable
cancelation between the strong-interaction effects in the yield ratio, and such cancelation by
itself imposes constraints on the parameters of strong interaction between the D mesons,
which constraints is interesting to analyze.
An analysis of the strong-interaction effects along the lines discussed in the present paper
generally runs into two difficulties. One is that our approach is accurate only in the linear
in ∆m approximation, while the actual effect of the isotopic mass difference between the
D mesons is not very small. However, numerically, the first term in the expansion of the
kinematical factor (Eq.(31)) gives 0.67, which is quite close to the mentioned above value
0.69, and it looks like the linear term gives a reasonable approximation. The other point
is that the cutoff parameter ac for the Coulomb interaction at short distances does not
necessarily coincide with the range parameter a used for the short-distance cutoff of the
effect of the mass difference. However, as previously noted, the Coulomb effect is somewhat
small at the energy of the ψ(3770) resonance, and for the purpose of preliminary estimates
we set ac = a in our numerical analysis. In order to allow for possible errors introduced by
our approximations in comparing with the data, we linearly add a theoretical uncertainty
of 0.03 units to the combined in quadrature statistical and experimental errors. Proceeding
in this way we find that the only region in the (a, δ1) plane at a < 2 fm consistent with the
CLEO-c data at one sigma level is the one shown in Fig.2.
0.2 0.4 0.6 0.8 1
Figure 2: The area (shaded) in the (a, δ1) plane corresponding to the CLEO-c data on the
D+D−/D0D̄0 yield ratio at the ψ(3770) resonance. The uncertainty shown includes a one
sigma experimental error with our estimate of the theoretical uncertainty added linearly.
It is interesting to compare the plots in the Figures 1 and 2. In the heavy quark limit
applied to both b and c quarks the strong interaction between the heavy mesons should be
the same, corresponding to the same range parameters a and ac. The scattering phase δ1
for these two systems is generally different due to different masses. However, provided there
are no isovector ‘molecular’ bound states, the sign of the phase should be the same, with
the absolute value of the phase for heavier B mesons being larger than for the D mesons.
The comparison with the data for the D mesons favors small values of the range parameter,
as indicated by Fig.2. If one also assumes that ac ≈ a for the B mesons, the short range
of ac, according to Fig.1, is compatible with the B mesons data at a negative scattering
phase δ1, which sign of δ1 is also in agreement with the D meson data. A negative sign of δ1
corresponds to a repulsion, which for the I = 1 state of heavy meson pairs can be expected
on general grounds [12].
5.3 φ(1020)
We believe that the production of KK̄ pairs in e+e− annihilation at and near the φ(1020)
resonance merits a separate analysis along the lines discussed in the present paper and using
detailed data similar to those in Ref.[3]. As is known, this production receives a small but
measurable nonresonant contribution from the isovector part of the electromagnetic current
of the u and d quarks, which corresponds to an isotopically mixed source. Furthermore, it
has been pointed out [13] that a detailed theoretical analysis of the K+K−/K0K̄0 yield ratio
at the φ(1020) resonance produces a result which possibly is at a meaningful variance with
the data.
At present we limit ourselves to noticing that the formula in Eq.(27), applicable in this
situation, describes a smooth behavior of the considered isospin breaking effects across the
resonance in the I = 0 channel. Indeed, the I = 0 scattering phase at energy E near the
resonance energy E0 is given by the Breit-Wigner formula
e2iδ0 =
∆− i γ
∆+ i γ
e2iδ̃0 , (32)
where ∆ = E −E0, δ̃0 is the nonresonant scattering phase in the isoscalar channel, and γ is
the width parameter. Both δ̃0 and γ are smooth functions of the energy proportional to p
small momentum, and γ(E0) determines the resonance width Γ as γ = Γ/2. The ratio of the
isovector and isoscalar production amplitudes can then be parametrized near the resonance
∆+ i γ
ei (δ1−δ̃0) , (33)
where µ is a parameter with dimension of energy: µ ∼ mφ − mρ. The amplitude ratio
entering the correction factor in Eq.(27) can then be written in the form
2iδ1 − A1 e2iδ0
A0 − A1
= e2iδ1
µ− (∆− i γ) e−i(δ1−δ̃0)
µ− (∆ + i γ) e+i(δ1−δ̃0)
, (34)
which manifestly shows that this ratio is a pure phase factor of a complex quantity slowly
varying across the φ(1020) resonance.
6 Summary
We have considered the effects of the isospin breaking by the Coulomb interaction and by
the isotopic mass difference in the relative yield Rc/n of pairs of charged and neutral mesons
near threshold by a compact source, such as in the production of heavy mesons in e+e−
annihilation. These effects are modified by the strong interaction scattering phases. The
general formula for a situation where the source is an arbitrary coherent mixture of an
isoscalar and isovector is given by Eq.(27). In particular, for a purely isoscalar source, which
is the case for the e+e− annihilation into DD̄ and BB̄ pairs the strong-interaction effect is
determined by the scattering phase δ1 in the I = 1 channel (Eq.(18)). As a practical matter
we find that under the standard assumptions about the strong scattering amplitudes in the
near-threshold resonance region the ratio Rc/n has a smooth behavior with energy showing
no abnormal rapid variation on the scale of the resonance width. The energy dependence
of this ratio is rather determined by the non-resonant scattering scattering phase(s). In
the P -wave the phase δ1 is proportional to p
3, so that a measurement of the behavior ratio
Rc/n with energy can provide information on this phase, which is not readily accessible by
other means. The behavior of the ratio Rc/n at larger energies away from the threshold also
depends on the details of the onset of the strong interaction between the heavy mesons at
short distances and on the behavior of their electromagnetic form factors, and a study of
this behavior can provide an insight into these properties of the heavy-light hadrons.
Acknowledgements
The work of MBV is supported, in part, by the DOE grant DE-FG02-94ER40823.
References
[1] J.P. Alexander et al. [CLEO Collaboration], Phys.Rev.Lett. 86, 2737 (2001);
S.B. Athar et al. [CLEO Collaboration], Phys.Rev. D 66, 052003 (2002);
B. Aubert et al. [BABAR Collaboration], Phys.Rev. D 65, 032001 (2002);
B. Aubert et al. [BABAR Collaboration], Phys.Rev. D 69, 071101 (2004);
N.C. Hastings et al. [Belle Collaboration], Phys.Rev. D 67, 052004 (2003).
[2] Q. He et al. [CLEO Collaboration], Phys.Rev.Lett. 95, 121801 (2005) [Erratum-ibid.
96, 199903 (2006)]
[3] M.N. Achasov et al. [SND Collaboration], Phys.Rev. D 63, 072002 (2001).
[4] D. Atwood and W.J. Marciano, Phys.Rev. D 41, 1736 (1990).
[5] G.P. Lepage, Phys.Rev. D 42, 3251 (1990).
[6] N. Byers and E. Eichten, Phys.Rev. D 42, 3885 (1990).
[7] R. Kaiser, A.V. Manohar, and T. Mehen, Report hep-ph/0208194, Aug. 2002 (unpub-
lished)
[8] M.B. Voloshin, Mod.Phys.Lett. A 18, 1783 (2003).
[9] M.B. Voloshin, Phys.Atom.Nucl. 68, 771 (2005) [Yad.Fiz. 68, 804 (2005)].
[10] L.D. Landau and E.M. Lifshits, Quantum Mechanics (Non-relativistic Theory), Third
Edition, Pergamon, Oxford, 1977.
[11] W.M. Yao et al. [Particle Data Group], J.Phys. G 33, 1 (2006).
[12] M.B. Voloshin and L.B. Okun, JETP Lett. 23, 333 (1976).
[13] A. Bramon, R. Escribano, J.L. Lucio M. and G. Pancheri, Phys.Lett. B 486, 406 (2000)
http://arxiv.org/abs/hep-ph/0208194
	Introduction
	General formulas for an isoscalar source
	Mixed isoscalar and isovector source
	The Coulomb and the mass-difference effects
	Phenomenological estimates
	(4S)
	(3770)
	(1020)
	Summary
ABSTRACT
  We revisit the problem of interplay between the strong and the Coulomb
interaction in the charged-to-neutral yield ratio for $B {\bar B}$ and $D {\bar
D}$ pairs near their respective thresholds in $e^+e^-$ annihilation. We
consider here a realistic situation with a resonant interaction in the isospin
I=0 channel and a nonresonant strong scattering amplitude in the I=1 state. We
find that the yield ratio has a smooth behavior depending on the scattering
phase in the I=1 channel. The same approach is also applicable to the $K {\bar
K}$ production at the $\phi(1020)$ resonance, where the Coulomb effect in the
charged-to-neutral yield ratio is generally sensitive to the scattering phases
in both the isoscalar and the isovector channels. Furthermore, we apply the
same approach to the treatment of the effect of the isotopic mass difference
between the charged and neutral mesons and argue that the strong-scattering
effects generally result in a modification to the pure kinematical effect of
this mass difference.

<|endoftext|><|startoftext|>
Introduction
In the imminent LHC environment, where one expects to have an experi-
mental luminosity precision tag at the level of 2%, [1] the requirement for
the theoretical precision tag on the corresponding luminosity processes, such
as single W,Z production with the subsequent decay into light lepton pairs,
should be at the 0.67% level in order not to compromise, unnecessarily, the
over-all precision of the respective LHC luminosity determinations. This dic-
tates that multiple gluon and photon radiative effects must be controlled at
the stated precision. The theory of QED ⊗ QCD exponentiation [2] allows
for the simultaneous resummation of multiple gluon and multiple photon ra-
diative effects in LHC physics processes, to be realized ultimately by MC
methods on an event-by-event basis in the presence of parton showers, in
a framework which allows us to systematically improve the accuracy of the
calculations without double-counting of effects, in principle to all orders in
both αs and α. Such a theoretical framework opens the way to the desired
theoretical precision tag on the LHC luminosity processes.
Our starting point for the new QED⊗QCD resummation theory [2] is
the QCD resummation theory presented in Ref. [3]. This resummation is an
exact rearrangement of the QCD perturbative series based on the N = 1 term
in the exponent in the formal proof of exponentiation in non-Abelian gauge
theories in the eikonal approximation, as given in Ref. [4]. This exponential
is augmented with a sum of residuals which take into account the remaining
contributions to the perturbative series exactly to all orders in αs.
therefore have an exact result whereas the resummation theory in Ref. [4] and
those in Refs. [5–7] are approximate. Recently, an alternative resummation
theory, the soft-collinear effective theory(SCET) [8], has been developed to
treat double resummation of soft and collinear effects. Since we have an exact
re-arrangement of the perturbative series, we could introduce the results from
Refs. [5–8] into our representation as well. Such introductions will appear
elsewhere.
The need for the extension of the QCD resummation theory to QED⊗
QCD resummation was already suggested by the results in Refs. [9–14], where
1If desired, our overall expoential factor can be made to include all of the terms in the
exponent in Ref. [4], in principle.
it was shown that in the evolution of the structure functions the inclusion of
the QED contributions leads to effects at the level of ∼ 0.3%, already almost
half of the error budget discussed above. We will find similar size effects from
the threshold region of heavy gauge boson production. All of these must be
taken into account if one wants ∼ 1.0% for the theoretical precision tag.
The discussion is organized as follows. In Section 2, we review the exten-
sion of the YFS theory to an exact resummation theory for QCD. Section
3 presents the further extension to QED⊗QCD. Section 4 contains the ap-
plication to heavy gauge boson production with the attendant discussion of
shower/ME matching. Section 5 contains some concluding remarks.
2 Extension of YFS Theory to QCD
We consider a parton-level single heavy boson production process such as
q + q̄′ → V + n(g) + X → ℓ̄ℓ′ + n(g) + X , where V = W±, Z, and ℓ =
e, µ, ℓ′ = νe, νµ(e, µ) respectively for V = W
+(Z), and ℓ = νe, νµ, ℓ
′ = e, µ
respectively for V = W−. It has been established [3] that the cross section
may be expressed as
dσ̂exp =
dσ̂n = eSUMIR(QCD)
(2π)4
eiy·(p1+p2−q1−q2−
kj)+DQCD
× ˜̄βn(k1, . . . , kn)
where gluon residuals ˜̄βn(k1, . . . , kn), defined by Ref. [3], are free of all in-
frared divergences to all orders in αs(Q). The functions SUMIR(QCD) and
DQCD, together with the basic infrared functions B
QCD, B̃
QCD, and S̃
QCD are
specified in Ref. [3]. We call attention to the essential compensation between
the left over genuine non-Abelian IR virtual and real singularities between
the phase space integrals
dPh β̄n and
dPh β̄n+1 that really allows us to
isolate ˜̄βj and distinguishes QCD from QED, where no such compensation
occurs. The result in (1) has been realized by Monte Carlo methods [3]. See
also Refs. [15–17] for exact O(α2s) and Refs. [18–20] for exact O(α) results
on the heavy gauge boson production processes which we discuss here.
Apparently, we can not emphasize too much the exactness of (1). Some
confusion seems to exist because it does not show explicitly an ordered ex-
ponential operator for an appropriate ordering prescription, path-ordered,
time-ordered, etc. The essential point is that, in (1), we have evaluated
the matrix elements of these operators and written the result in terms of
the over-all exponent shown therein and the residuals ˜̄βj. This allows us to
maintain exactness to all orders in αs.
3 QED⊗QCD Resummation Theory
The new QED⊗QCD theory is obtained by simultaneously exponentiating
the large IR terms in QCD and the exact IR divergent terms in QED, so that
we arrive at the new result
dσ̂exp = e
SUMIR(QCED)
n,m=0
d3kj1
d3k′j2
(2π)4
eiy·(p1+q1−p2−q2−
k′j2 )+DQCED
× ˜̄βn,m(k1, . . . , kn; k
1, . . . , k
where the new YFS [21,22] residuals, ˜̄βn,m(k1, . . . , kn;k
1, . . . , k
m), with n hard
gluons and m hard photons, defined in Ref. [2], represent the successive
application of the YFS expansion first for QCD and subsequently for QED.
The functions SUMIR(QCED), DQCED are determined from their QCD
analogs SUMIR(QCD), DQCD via the substitutions
BnlsQCD → B
QCD + B
QED ≡ B
QCED,
B̃nlsQCD → B̃
QCD + B̃
QED ≡ B̃
QCED, (3)
S̃nlsQCD → S̃
QCD + S̃
QED ≡ S̃
everywhere in expressions for the latter functions given in Ref. [3]. We stress
that if desired the exponent corresponding the N th Gatherall exponent for
N > 1 can be systematically included in the QCD exponents SUMIR(QCD),
DQCD if desired, with a corresponding change in the respective residuals
˜̄βn,m(k1, . . . , kn; k
1, . . . , k
m). The residuals
˜̄βn,m(k1, . . . , kn; k
1, . . . , k
m) are
free of all infrared singularities, and the result in (2) is a representation that
is exact and that can therefore be used to make contact with parton shower
MC’s without double counting or the unnecessary averaging of effects such as
the gluon azimuthal angular distribution relative to its parent’s momentum
direction.
In the respective infrared algebra (QCED) in (2), the average Bjorken x
values
xavg(QED) ∼= γ(QED)/(1 + γ(QED)),
xavg(QCD) ∼= γ(QCD)/(1 + γ(QCD)),
where γ(A) = 2αACA
(Ls − 1), A = QED, QCD, with CA = Q
f , CF , respec-
tively, for A = QED, QCD and the big log Ls, imply that QCD dominant
corrections happen an order of magnitude earlier than those for QED. This
means that the leading ˜̄β0,0-level gives already a good estimate of the size of
the interplay between the higher order QED and QCD effects which we will
use to illustrate (2) here.
4 QED⊗ QCD Threshold Corrections and
Shower/ME Matching at the LHC
The cross section for the processes pp → V +n(γ)+m(g)+X → ℓ̄ℓ′+n′(γ)+
m(g) +X , where V, ℓ, ℓ′ are the vector-boson / lepton combinations defined
in Section 3, may be constructed from the parton-level cross section via the
usual formula (we use the standard notation here [2])
dσexp =
dxidxjFi(xi)Fj(xj)dσ̂exp(xixjs), (4)
In this section, we will use the result in (2) here with semi-analytical methods
and structure functions from Ref. [23] to examine the size of QED⊗QCD
threshold corrections. A Monte Carlo realization will appear elsewhere [24].
First, we wish to make contact with the existing literature and stan-
dard practice for QCD parton showers as realized by HERWIG [25] and/or
PYTHIA [26]. Eventually, we will also make contact with the new parton
distribution function evolution MC algorithm in Ref. [27]. We intend to
combine our exact YFS-style resummation calculus with HERWIG and/or
PYTHIA by using the latter to generate a parton shower starting from the
initial (x1, x2) point at factorization scale µ, after this point is provided by
the {Fi}. This combination of theoretical constructs can be systematically
improved with exact fully exclusive results order-by-order in αs, where cur-
rently the state of the art in such a calculation is the work in Ref. [28] which
accomplishes the combination of an exact O(αs) correction with HERWIG,
where the gluon azimuthal angle is averaged in the combination.
The issue of this being an exact rearrangement of the QCD and QED
perturbative series requires some comment. Unlike the threshold resumma-
tion techniques in Refs. [5–7], we have a resummation which is valid over the
entire phase space. Thus, it is readily applicable to an exact treatment of
the respective phase space in its implementation via MC methods.
We may illustrate how the combination with PYTHIA/HERWIG may
proceed as follows. We note that, for example, if we use a quark mass
mq as our collinear limit regulator, DGLAP [29] evolution of the structure
functions allows us to factorize all the terms that involve powers of the big log
Lc = lnµ
2/m2q −1 in such a way that the evolved structure function contains
the effects of summing the leading big logs L = lnµ2/µ20 where the evolution
involves initial data at the scale µ0. This gives us a result independent of mq
for mq ↓ 0. In the DGLAP theory, the factorization scale µ represents the
largest pT of the gluon emission included in the structure function.
In practice, when we use these structure functions with an exact result
for the residuals in (2), it means that we must in the residuals omit the
contributions from gluon radiation at scales below µ. This can be shown to
amount in most cases to replacing Ls = ln ŝ/m
q − 1 → Lnls = ln ŝ/µ
2 but
in any case it is immediate how to limit the pT in the gluon emission
that we do not double count effects. In other words, we apply the standard
QCD factorization of mass singularities to the cross section in (2) in the
standard way. We may do it with either the mass regulator for the collinear
singularities or with dimensional regularization of such singularities. The
final result should be independent of this regulator and this is something
that we may use as a cross-check on the results.
This would in practice mean the following: We first make an event with
the formula in (4) which would produce an initial beam state at (x1, x2) for
the two hard interacting partons at the factorization scale µ from the struc-
ture functions {Fj} and a corresponding final state X from the exponenti-
ated cross section in dσ̂exp(xixjs), where we stress that the latter has had all
collinear singularities factorized so that it is much more convergent then its
analog in LEP physics for the electroweak theory for example. The standard
Les Houches procedure [30] of showering this event (x1, x2, X) would then be
used, employing backward evolution of the initial partons. If we restrict the
pT as we have indicated above, there would be no double counting of effects.
Let us call this pT matching of the shower from the backward evolution and
the matrix elements in the QCED exponentiated cross section.
It is possible, however, to be more accurate in the use of the exact result
in (2). Just as the residuals ˜̄βn,m(k1, . . . , kn; k
1, . . . , k
m) are computed order
by order in perturbation theory from the corresponding exact perturbative
results by expanding the exponents in (2) and comparing the appropriate
corresponding coefficients of the respective powers of αnαms , so too can the
shower formula which is used to generate the backward evolution be expanded
so that the product of the shower formula’s perturbative expansion, the per-
turbative expansion of the exponents in (2), and the perturbative expansions
of the residuals can be written as an over-all expansion in powers of αnαms
and required to match the respective calculated exact result for given order.
In this way, new shower subtracted residuals, {
βn,m(k1, . . . , kn; k
1, . . . , k
are calculated that can be used for the entire gluon pT phase space with an
accuracy of the cross section that should in principle be improved compared
with the first procedure for shower matching presented above. Both ap-
proaches are under investigation, where we note that the shower subtracted
2 Here, we refer to both on-shell and off-shell emitted gluons.
residuals have been realized for the exact O(α) luminosity Bhabha process
at DAPHNE energies by the authors in Ref. [31].
Returning to the general discussion, we compute, with and without QED,
the ratio rexp = σexp/σBorn, where we do not use the narrow resonance ap-
proximation, for we wish to set a paradigm for precision heavy vector boson
studies. The formula which we use for σBorn is obtained from that in (4) by
substituting dσ̂Born for dσ̂exp therein, where dσ̂Born is the respective parton-
level Born cross section. Specifically, we have from (1) the ˜̄β0,0-level result
σ̂exp(x1x2s) =
∫ vmax
dv γQCED v
γQCED−1FYFS(γQCED)
eδYFS σ̂Born((1− v)x1x2s)
where we intend the well-known results for the respective parton-level Born
cross sections and the value of vmax implied by the experimental cuts under
study.
What is new here is the value for the QED⊗QCD exponent
γQCED =
+ 2CF
Lnls (6)
where Lnls = ln x1x2s/µ
2 when µ is the factorization scale. The functions
FYFS(γQCED) and δYFS(γQCED) are well-known [22] as well:
FY FS(γQCED) =
e−γQCEDγE
Γ(1 + γQCED)
δY FS(γQCED) =
γQCED +
2ζ(2)−
where ζ(2) is Riemann’s zeta function of argument 2, i.e., π2/6, and γE is
Euler’s constant, i.e., 0.5772. . . .
Using these formulas in (4) allows us to get the results
rexp =
1.1901 , QCED ≡ QCD+QED, LHC
1.1872 , QCD, LHC
1.1911 , QCED ≡ QCD+QED, Tevatron
1.1879 , QCD, Tevatron.
We see that QED is at the level of .3% at both LHC and FNAL. This is stable
under scale variations [2]. We agree with the results in Refs. [15,16,18–20] on
both of the respective sizes of the QED and QCD effects. Furthermore, the
QED effect is similar in size to structure function results found in Refs. [9–13].
5 Conclusions
We have shown that YFS theory (EEX and CEEX), when extended to non-
Abelian gauge theory, allows simultaneous exponentiation of QED and QCD,
QED⊗QCD exponentiation. For QED⊗QCD we find that full MC event
generator realization is possible in a way that combines our calculus with
HERWIG and PYTHIA in principle. Semi-analytical results for QED (and
QCD) threshold effects agree with literature on Z production. As QED is
at the .3% level, it is needed for LHC theory predictions at . 1%. The cor-
responding analysis of the W production is in progress. We have illustrated
a firm theoretical basis for the realization of the complete O(α2s , ααs, α
2) re-
sults needed for the FNAL/LHC/RHIC/ILC physics and all of the latter are
in progress.
Acknowledgments
Work partly supported by US DOE grant DE-FG02-05ER41399 and by
NATO grant PST.CLG.980342. S.A.Y. thanks the organizers of the 2007
Cracow Epiphany Conference for hospitality. B.F.L.W. thanks Prof. W.
Hollik for the support and kind hospitality of the MPI, Munich, while a
part of this work was completed. We also thank Prof. S. Jadach for useful
discussions.
References
[1] M. Dittmar, F. Pauss and D. Zurcher, Phys. Rev. D56 (1997) 7284; M.
Rijssenbeek, in Proc. HCP2002, ed. M. Erdmann (Karlsruhe, 2002) 424;
M. Dittmar, ibid. 431
[2] C. Glosser, S. Jadach, B.F.L. Ward and S.A. Yost, Mod. Phys. Lett.
A19 (2004) 2119; B.F.L. Ward, C. Glosser, S. Jadach and S.A. Yost,
Int. J. Mod. Phys. A20 (2005) 3735 and in Proc. ICHEP04, vol. 1, eds.
H. Chen et al., (World Sci., Singapore, 2005) 588; B.F.L. Ward and
S. Yost, in Proc. 2005 HERA-LHC Workshop, CERN-2005-014, eds.
A. DeRoeck and H. Jung (CERN, Geneva, 2005) 304; and references
therein.
[3] B.F.L. Ward and S. Jadach, Acta Phys. Polon. B33 (2002) 1543 and in
Proc. ICHEP02, ed. S. Bentvelsen et al. (North Holland, Amsterdam,
2003) 275; Mod. Phys. Lett.A14 (1999) 491; D.B. DeLaney, S. Jadach,
C. Shio, G. Siopsis, B.F.L. Ward, Phys. Lett. B342 (1995) 239; D.
DeLaney et al., Mod. Phys. Lett. A12 (1997) 2425; D.B. DeLaney et al.,
Phys. Rev.D52 (1995), erratum ibid.D66 (2002) 019903; and references
therein.
[4] J.G.M. Gatherall, Phys. Lett. B133 (1983) 90.
[5] G. Sterman, Nucl. Phys. B281 (1987) 310.
[6] S. Catani and L. Trentadue, Nucl. Phys. B327 (1989) 323; ibid. B353
(1991) 183.
[7] E. Berger and H. Contopanagos, Phys. Rev. D57 (1998) 253; and refer-
ences therein.
[8] C.W. Bauer, S. Fleming, D. Pirjol and I.W. Stewart, Phys. Rev. D63,
114020 (2001); C.W. Bauer, D. Pirjol and I.W. Stewart, Phys. Rev.
D65, 054022 (2002); and references therein.
[9] S. Haywood, P.R. Hobson, W. Hollik and Z. Kunszt, in Proc. 1999
CERN Workshop on Standard Model Physics (and more) at the LHC,
CERN-2000-004, eds. G. Altarelli and M.L. Mangano (CERN, Geneva,
2000) 122.
[10] H. Spiesberger, Phys. Rev. D52 (1995) 4936.
[11] W.J. Stirling, “Electroweak Effects in Parton Distribution Functions,”
talk presented at ESF Exploratory Workshop, Electroweak Radiative
Corrections to Hadronic Observables at TeV Energies, Durham, Sept.
2003.
[12] M. Roth and S. Weinzierl, Phys. Lett. B590 (2004) 190.
[13] W. J. Stirling et al., in Proc. ICHEP04, eds. H. Chen et al. (World Sci.,
Singapore, 2005) 527.
[14] J. Blumlein and H. Kawamura, Nucl. Phys. B708, 467 (2005); Acta
Phys. Polon. B33 (2002) 3719; and references rtherein.
[15] R. Hamberg, W.L. van Neerven and T. Matsuura, Nucl. Phys. B359
(1991) 343.
[16] W.L. van Neerven and E.B. Zijlstra, Nucl. Phys. B382 (1992) 11; ibid.
B680 (2004) 513; and references therein.
[17] C. Anastasiou et al., Phys. Rev. D69 (2004) 094008.
[18] U. Baur, S. Keller and W.K. Sakumoto, Phys. Rev. D57 (1998) 199; U.
Baur, S. Keller and D. Wackeroth, ibid. D59 (1998) 013002; U. Baur et
al., ibid. D65 (2002) 033007; and references therein.
[19] S. Dittmaier and M. Kramer, Phys. Rev. D65 (2002) 073007; and ref-
erences therein
[20] Z.A. Zykunov, Eur. Phys. J.C3 (2001) 9; and references therein.
[21] D.R. Yennie, S.C. Frautschi, and H. Suura,Ann. Phys. 13 (1961) 379;
see also K.T. Mahanthappa, Phys. Rev. 126 (1962) 329, for a related
analysis.
[22] See also S. Jadach et al., Comput. Phys. Commun. 102 (1997) 229; S.
Jadach, M. Skrzypek and B.F.L. Ward, Phys. Rev. D55 (1997) 1206;
S. Jadach, B.F.L. Ward and Z. Wa̧s, Phys. Rev. D63 (2001) 113009; S.
Jadach, B.F.L. Ward and Z. Wa̧s, Comp. Phys. Commun. 130 (2000)
260; S. Jadach et al., ibid. 140 (2001) 432, 475.
[23] A.D. Martin et al., Phys. Rev. D51 (1995) 4756.
[24] S. Jadach et al., to appear.
[25] G. Corcella et al., hep-ph/0210213 and references therein.
[26] T. Sjostrand et al., hep-ph/0308153.
http://arxiv.org/abs/hep-ph/0210213
http://arxiv.org/abs/hep-ph/0308153
[27] S. Jadach and M. Skrzypek, Acta Phys. Pol. B35 (2004) 735;
hep-ph/0504263, 0504205.
[28] S. Frixione and B. Webber, J. High Energy Phys. 0206 (2002) 029; S.
Frixione, P. Nason and B. Webber, ibid. 0308 (2003) 007; and references
therein.
[29] G. Altarelli and G. Parisi, Nucl. Phys. B126 (1977) 298; Yu. L. Dok-
shitzer, Sov. Phys. JETP 46 (1977) 641; L.N. Lipatov, Yad. Fiz. 20
(1974) 181; V. Gribov and L. Lipatov, Sov. J. Nucl. Phys. 15 (1972)
675, 938.
[30] E. Boos et al., hep-ph/0109068.
[31] G. Balossini et al., Nucl. Phys. Proc. Suppl. 162, 59 (2006); in Proc.
ICHEP06, in press, and references therein.
http://arxiv.org/abs/hep-ph/0504263
http://arxiv.org/abs/hep-ph/0109068
	Introduction
	Extension of YFS Theory to QCD
	QEDQCD Resummation Theory
	QED QCD Threshold Corrections and Shower/ME Matching at the LHC
	Conclusions
ABSTRACT
  We present the theory of QED x QCD resummation and its interplay with
shower/matrix element matching in precision LHC physics scenarios. We
illustrate the theory using single heavy gauge boson production at hadron
colliders.

<|endoftext|><|startoftext|>
Introduction
The study of arrangements is a very important subject in discrete and computa-
tional geometry, where one studies arrangements of n subsets of Rk (often referred
to as objects of the arrangements) for fixed k and large values of n (see [1] for a
survey of the known results from this area). The precise nature of the objects in an
arrangements will be discussed in more details below. Common examples consist
of arrangements of hyperplanes, balls or simplices in Rk. More generally one con-
siders arrangements of objects of “bounded description complexity”. This means
that each set in the arrangement is defined by a first order formula in the language
of ordered fields involving at most a constant number of polynomials whose degrees
are also bounded by a constant (see [12]).
Key words and phrases. Combinatorial Complexity, O-minimal Structures, Homotopy Types,
Arrangements.
The author was supported in part by NSF grant CCF-0634907.
2000 MATHEMATICS SUBJECT CLASSIFICATION 14P10, 14P25
http://arxiv.org/abs/0704.0295v3
2 SAUGATA BASU
In this paper we consider parametrized families of arrangements. The question
we will be interested in most, is the number of “topologically” distinct arrange-
ments which can occur in such a family (precise definition of the topological type
of an arrangement is given later (see Definition 1.6)). Parametrized arrangements
occur quite frequently in practice. For instance, take any arrangement A in Rk1+k2
and let π : Rk1+k2 → Rk2 be the projection on the last k2 co-ordinates. Then for
each z ∈ Rk2 , the intersection of the arrangement A with the fiber π−1(z), is an ar-
rangement Az in R
k1 and the family of the arrangements {Az}z∈Rk2 is an example
of a parametrized family of arrangements. Even though the number of arrange-
ments in the family {Az}z∈Rk2 is infinite, it follows from Hardt’s triviality theorem
generalized to o-minimal structures (see Theorem 4.2 below) that the number of
“topological types” occurring amongst them is finite and can be effectively bounded
in terms of the n, k1, k2 up to multiplication by a constant that depends only on
the particular family from which the objects of the arrangements are drawn. If
by topological type we mean homeomorphism type, then the best known upper
bound on the number of types occurring is doubly exponential in k1, k2. However,
if we consider the weaker notion of homotopy type, then we obtain a singly ex-
ponential bound. We conjecture that a singly exponential bound also holds for
homeomorphism types as well.
We now make precise the class of arrangements that we consider and also the
notion of topological type of an arrangement.
1.1. Combinatorial Complexity in O-minimal Geometry. In order to put the
study of the combinatorial complexity of arrangements in a more natural mathe-
matical context, as well as to elucidate the proofs of the main results in the area,
a new framework was introduced in [2] which is a significant generalization of the
settings mentioned above. We recall here the basic definitions of this framework
from [2], referring the reader to the same paper for further details and examples.
We first recall an important model theoretic notion – that of o-minimality –
which plays a crucial role in this generalization.
1.1.1. O-minimal Structures. O-minimal structures were invented and first studied
by Pillay and Steinhorn in the pioneering papers [13, 14]. Later the theory was
further developed through contributions of other researchers, most notably van
den Dries, Wilkie, Rolin, Speissegger amongst others [20, 21, 22, 25, 26, 15]. We
particularly recommend the book by van den Dries [19] and the notes by Coste [6]
for an easy introduction to the topic as well as the proofs of the basic results that
we use in this paper.
Definition 1.1 (o-minimal structure). An o-minimal structure over a real closed
field R is a sequence S(R) = (Sn)n∈N, where each Sn is a collection of subsets of R
(called the definable sets in the structure) satisfying the following axioms (following
the exposition in [6]).
(1) All algebraic subsets of Rn are in Sn.
(2) The class Sn is closed under complementation and finite unions and inter-
sections.
(3) If A ∈ Sm and B ∈ Sn then A×B ∈ Sm+n.
(4) If π : Rn+1 → Rn is the projection map on the first n co-ordinates and
A ∈ Sn+1, then π(A) ∈ Sn.
(5) The elements of S1 are precisely finite unions of points and intervals.
TOPOLOGICAL TYPES OF PARAMETRIZED ARRANGEMENTS 3
The class of semi-algebraic sets is one obvious example of such a structure, but
in fact there are much richer classes of sets which have been proved to be o-minimal
(see [6, 19]).
1.1.2. Admissible Sets. We now recall from [2] the definition of the class of sets that
will play the role of sets with bounded description complexity mentioned above.
Definition 1.2 (admissible sets). Let S(R) be an o-minimal structure over R
and let T ⊂ Rk+ℓ be a fixed definable set. Let π1 : R
k+ℓ → Rk (respectively
π2 : R
k+ℓ → Rℓ) be the projections onto the first k (respectively last ℓ) co-ordinates.
T ⊂ Rk+ℓ
We will call a subset S of Rk to be a (T, π1, π2)-set if
S = Ty = π1(π
2 (y) ∩ T )
for some y ∈ Rℓ.
If T is some fixed definable set, we call a family of (T, π1, π2)-sets to be a
(T, π1, π2)-family. We wil also refer to a finite (T, π1, π2)-family as an arrangement
of (T, π1, π2)-sets.
1.2. Stable Homotopy Equivalence. For any finite CW-complex X we denote
by SX the suspension of X and for n ≥ 0, we denote by SnX the n-fold iterated
suspension S ◦ S ◦ · · · ◦ S
︸ ︷︷ ︸
n times
Note that if i : X →֒ Y is an inclusion map, then there is an obvious induced
inclusion map Sni : SnX →֒ SnY between the n-fold iterated suspensions of X and
Recall from [17] that for two finite CW-complexes X and Y , an element of
(1.1) {X ;Y } = lim
[SiX,SiY ]
is called an S-map (or map in the suspension category). An S-map f ∈ {X ;Y } is
represented by the homotopy class of a map f : SNX → SNY for some N ≥ 0.
Definition 1.3 (stable homotopy equivalence). An S-map f ∈ {X ;Y } is an S-
equivalence (also called a stable homotopy equivalence) if it admits an inverse f−1 ∈
{Y ;X}. In this case we say that X and Y are stable homotopy equivalent.
If f ∈ {X ;Y } is an S-map, then f induces a homomorphism
f∗ : H∗(X,Z) → H∗(Y,Z)
between the homology groups of X and Y .
The following theorem characterizes stable homotopy equivalence in terms of
homology.
Theorem 1.4. [8, pp. 604] Let X and Y be two finite CW-complexes. Then X and
Y are stable homotopy equivalent if and only if there exists an S-map f ∈ {X ;Y }
which induces isomorphisms f∗ : H∗(X,Z) → H∗(Y,Z).
4 SAUGATA BASU
1.3. Diagrams and Co-limits. The arrangements that we consider are all finitely
triangulable. In other words, the union of objects of an arrangement is homeomor-
phic to a finite simplicial complex, and each individual object in the arrangement
will correspond to a sub-complex of this simplicial complex. It will be more con-
venient to work in the category of finite regular cell complexes, instead of just
simplicial complexes.
Let A = {A1, . . . , An}, where each Ai is a sub-complex of a finite regular cell
complex. We will denote by [n] the set {1, . . . , n} and for I ⊂ [n] we will denote by
AI (respectively AI) the regular cell complexes
Ai (respectively
Ai). Notice
that if J ⊂ I ⊂ [n], then
AJ ⊂ AI ,
AI ⊂ AJ .
We will call the collection of sets {|AI |}I⊂[n] together with the inclusion maps
iI,J : |AI | →֒ |AJ |, J ⊂ I, the diagram of A. Notice that (even though we do not
use this fact), |A[n]| is the co-limit of the diagram of A. For I ⊂ [n] we will denote
by A[I] the sub-arrangement {Ai | i ∈ I}.
1.4. Diagram Preserving Maps. Now let A = {A1, . . . , An}, B = {B1, . . . , Bn}
where each Ai, Bj is a sub-complex of a finite regular cell complex for 1 ≤ i, j ≤ n.
Definition 1.5 (diagram preserving maps). We call a map f : |A[n]| → |B[n]|
to be diagram preserving if f(|AI |) ⊂ |BI | for every I ⊂ [n]. (Notice that the
above property is equivalent to f(|Ai|) ⊂ |Bi| for every i ∈ [n] but the previous
property will be more convenient for us later when we extend the definition of
diagram preserving maps to homotopy co-limits (see Definition 3.3).) We say that
two maps f, g : |A[n]| → |B[n]| are diagram homotopic if there exists a homotopy
h : |A[n]| × [0, 1] → |B[n]|, such that h(·, 0) = f, h(·, 1) = g and h(·, t) is diagram
preserving for each t ∈ [0, 1].
More generally, we call a map f : SN |A[n]| → SN |B[n]| to be diagram preserving
if f(SN |AI |) ⊂ S
N |BI | for every I ⊂ [n]. We say that two maps f, g : S
N |A[n]| →
SN |B[n]| are diagram homotopic if there exists a homotopy h : SN |A[n]| × [0, 1] →
SN |B[n]| such that h(·, 0) = f, h(·, 1) = g and h(·, t) is diagram preserving for each
t ∈ [0, 1].
We say that f : |A[n]| → |B[n]| is a diagram preserving homeomorphism if there
exists a diagram preserving inverse map g : |B[n]| → |A[n]| such that the induced
maps g ◦ f : |A[n]| → |A[n]| and f ◦ g : |B[n]| → |B[n]| are Id|A[n]| and Id|B[n]|,
respectively.
We say that f : |A[n]| → |B[n]| is a diagram preserving homotopy equivalence
if there exists a diagram preserving inverse map g : |B[n]| → |A[n]| such that the
induced maps g◦f : |A[n]| → |A[n]| and f ◦g : |B[n]| → |B[n]| are diagram homotopic
to Id|A[n]| and Id|B[n]|, respectively.
We say that an S-map f ∈ {|A[n]|; |B[n]|} is a diagram preserving stable homo-
topy equivalence if it is represented by a diagram preserving map
f̃ : SN |A[n]| → SN |B[n]|
such that there exists a diagram preserving inverse map
g̃ : SN |B[n]| → SN |A[n]|
TOPOLOGICAL TYPES OF PARAMETRIZED ARRANGEMENTS 5
for which the induced maps
g̃ ◦ f̃ : SN |A[n]| → SN |A[n]|,
f̃ ◦ g̃ : SN |B[n]| → SN |B[n]|
are diagram homotopic to IdSN |A[n]| and IdSN |B[n]|, respectively.
Translating these topological definitions into the language of arrangements, we
say that:
Definition 1.6 (topological type of an arrangement). Two arrangements A,B are
homeomorphic (respectively homotopy equivalent, stable homotopy equivalent) if
there exists a diagram preserving homeomorphism (respectively homotopy equiva-
lence, stable homotopy equivalence) between them.
Remark 1.7. Note that, since two definable sets might be stable homotopy equiv-
alent, without being homotopy equivalent (see [18, pp. 462]), and also homotopy
equivalent without being homeomorphic, the notions of homeomorphism type, ho-
motopy type and stable homotopy type are each strictly weaker than the previous
The main results of this paper can now be stated.
1.5. Main Results. Let S(R) be an o-minimal structure over R, T ⊂ Rk1+k2+ℓ
a closed and bounded definable set, and let π1 : R
k1+k2+ℓ → Rk1+k2 (respectively,
π2 : R
k1+k2+ℓ → Rℓ, π3 : R
k1+k2 → Rk2) denote the projections onto the first
k1 + k2 (respectively, the last ℓ, the last k2) co-ordinates. For any collection A =
{A1, . . . , An} of (T, π1, π2)-sets, and z ∈ R
k2 , we will denote by Az the collection
of sets, {A1,z, . . . , An,z}, where Ai,z = Ai ∩ π
3 (z), 1 ≤ i ≤ n.
A fundamental theorem in o-minimal geometry is Hardt’s trivialization theorem
(Theorem 4.2 below) which says that there exists a definable partition of Rk2 into
a finite number of definable sets {Ti}i∈I such that for each i ∈ I, all fibers Az with
z ∈ Ti are definably homeomorphic. A very natural question is to ask for an upper
bound on the size of this partition (which will also give an upper bound on the
number of homeomorphism types amongst the arrangements Az, z ∈ R
Hardt’s theorem is a corollary of the existence of cylindrical cell decompositions
of definable sets proved in [11] (see also [19, 6]). When A is a (T, π1, π2)-family
for some fixed definable set T ⊂ Rk1+k2+ℓ, with π1 : R
k1+k2+ℓ → Rk1+k2 , π2 :
k1+k2+ℓ → Rℓ, π2 : R
k1+k2 → Rk2 the usual projections, and #A = n, the
quantitative definable cylindrical cell decomposition theorem in [2] gives a doubly
exponential (in k1k2) upper bound on the cardinality of I and hence on the number
of homeomorphism types amongst the arrangements Az, z ∈ R
k2 . A tighter (say
singly exponential) bound on the number of homeomorphism types of the fibers
would be very interesting but is unknown at present. Note that we cannot hope for
a bound which is better than singly exponential because the lower bounds on the
number of topological types proved in [5] also applies in our situation.
In this paper we give tighter (singly exponential) upper bounds on the number of
homotopy types occurring amongst the fibers Az, z ∈ R
k2 . We prove the following
theorems. The first theorem gives a bound on the number of stable homotopy types
of the arrangements Az, z ∈ R
k2 , while the second theorem gives a slightly worse
bound for homotopy types.
6 SAUGATA BASU
Theorem 1.8. There exists a constant C = C(T ) > 0 such that for any collection
A = {A1, . . . , An} of (T, π1, π2)-sets the number of distinct stable homotopy types
amongst the arrangements Az, z ∈ R
k2 is bounded by
C · n(k1+1)k2 .
If we replace stable homotopy type by homotopy type, we obtain a slightly weaker
bound.
Theorem 1.9. There exists a constant C = C(T ) > 0 such that for any collection
A = {A1, . . . , An} of (T, π1, π2)-sets the number of distinct homotopy types occuring
amongst the arrangements Az, z ∈ R
k2 is bounded by
C · n(k1+3)k2 .
2. Background
In this section we describe some prior work in the area of bounding the number of
homotopy types of fibers of a definable map and their connections with the results
presented in this paper.
We begin with a definition.
Definition 2.1 (A-sets). Let A = {A1, . . . , An}, such that each Ai ⊂ R
k is a
(T, π1, π2)-set. For I ⊂ {1, . . . , n}, we let A(I) denote the set
(2.1)
i∈I⊂[n]
j∈[n]\I
(Rk \Aj)
and we will call such a set to be a basic A-set. We will denote by C(A) the set of
non-empty connected components of all basic A-sets.
We will call definable subsets S ⊂ Rk defined by a Boolean formula whose atoms
are of the form, x ∈ Ai, 1 ≤ i ≤ n, an A-set. An A-set is thus a union of basic
A-sets. If T is closed, and the Boolean formula defining S has no negations, then S
is closed by definition (since each Ai is closed) and we call such a set an A-closed
Moreover, if V is any closed definable subset ofRk, and S is anA-set (respectively
A-closed set), then we will call S∩V to be an (A, V )-set (respectively (A, V )-closed
set).
2.1. Bounds on the Betti numbers of Admissible Sets. The problem of
bounding the Betti numbers of A-sets is investigated in [2], where several results
known in the semi-algebraic and semi-Pfaffian case are extended to this general
setting. In particular, we will need the following theorem proved there.
Theorem 2.2. [2] Let S(R) be an o-minimal structure over R and let T ⊂ Rk+ℓ
be a closed definable set. Then, there exists a constant C = C(T ) > 0 depending
only on T such that for any arrangement A = {A1, . . . , An} of (T, π1, π2)-sets of
k the following holds.
For every i, 0 ≤ i ≤ k,
(2.2)
D∈C(A)
bi(D) ≤ C · n
TOPOLOGICAL TYPES OF PARAMETRIZED ARRANGEMENTS 7
Remark 2.3. The main intuition behind the bound in Theorem 2.2 (as well as similar
results in the semi-algebraic and semi-Pfaffian settings) is that the homotopy type
(or at least the Betti numbers) of a definable set in Rk defined in terms of n sets
belonging to some fixed definable family, depend only on the interaction of these
sets at most k + 1 at a time. This is reminiscent of Helly’s theorem in convexity
theory (see [7]) but in a homotopical setting. This observation is also used to give
an efficient algorithm for computing the Betti numbers of arrangements (see [3,
Section 8]). However, the proof of Theorem 2.2 in [2] (as well as the proofs of
similar results in the semi-algebraic [4] and semi-Pfaffian settings [10]) depends on
an argument involving the Mayer-Vietoris sequence for homology, and does not
require more detailed information about homotopy types. In Section 3 below, we
make the above intuition mathematically precise.
We prove two theorems (Theorems 3.6 and 3.7 below) and these auxiliary re-
sults are the keys to proving the main results of this paper (Theorems 1.8 and
1.9). Moreover, these auxiliary results could also be of independent interest in the
quantitative study of arrangements.
2.2. Homotopy types of the fibers of a semi-algebraic map. Theorem 2.2
gives tight bounds on the topological complexity of an A-set in terms of the cardi-
nality of A, assuming that the sets in A belong to some fixed definable family. A
problem closely related to the problem we consider in this paper is to bound the
number of topological types of the fibers of a projection restricted to an arbitrary
A-set.
More precisely, let S ⊂ Rk1+k2 be a set definable in an o-minimal structure over
the reals (see [19]) and let π : Rk1+k2 → Rk2 denote the projection map on the
last k2 co-ordinates. We consider the fibers, Sz = π
−1(z) ∩ S for different z in
k2 . Hardt’s trivialization theorem, (Theorem 4.2 below) shows that there exists
a definable partition of Rk2 into a finite number of definable sets {Ti}i∈I such that
for each i ∈ I and any point zi ∈ Ti, π
−1(Ti) ∩ S is definably homeomorphic to
Szi × Ti by a fiber preserving homeomorphism. In particular, for each i ∈ I, all
fibers Sz with z ∈ Ti are definably homeomorphic.
In case S is an A-set, with A a (T, π1, π2)-family for some fixed definable set
T ⊂ Rk1+k2+ℓ, with π1 : R
k1+k2+ℓ → Rk1+k2 , π2 : R
k1+k2+ℓ → Rℓ, π2 : R
k1+k2 →
k2 , the usual projections, and #A = n, the quantitative definable cylindrical cell
decomposition theorem in [2] gives a doubly exponential (in k1k2) upper bound
on the cardinality of I and hence on the number of homeomorphism types of the
fibers of the map π3|S . A tighter (say singly exponential) bound on the number
of homeomorphism types of the fibers would be very interesting but is unknown at
present.
Recently, the problem of obtaining a tight bound on the number of topological
types of the fibers of a definable map for semi-algebraic and semi-Pfaffian sets
was considered in [5], and it was shown that the number of distinct homotopy
types of the fibers of such a map can be bounded (in terms of the format of the
formula defining the set) by a function singly exponential in k1k2. In particular,
the combinatorial part of the bound is also singly exponential. A more precise
statement in the case of semi-algebraic sets is the following theorem which appears
in [5].
Theorem 2.4. [5] Let P ⊂ R[X1, . . . , Xk1 , Y1, . . . , Yk2 ], with deg(P ) ≤ d for each
P ∈ P and cardinality #P = n. Then, for any fixed P-semi-algebraic set S the
8 SAUGATA BASU
number of different homotopy types of fibers π−1(y) ∩ S for various y ∈ π(S) is
bounded by
(2k1nk2d)
O(k1k2).
Remark 2.5. The proof of Theorem 2.4 however has the drawback that it relies
on techniques involving perturbations of the original polynomials in order to put
them in general position, as well as Thom’s Isotopy Theorem, and as such does not
extend easily to the o-minimal setting. The main results of this paper (see Theorem
1.8 and Theorem 1.9) extend the combinatorial part of Theorem 2.4 to the more
general o-minimal category.
Remark 2.6. Even though the formulation of Theorem 2.4 seems a little different
from the main theorems of this paper (Theorems 1.8 and 1.9), they are in fact
closely related. In fact, as a consequence of Theorem 1.9 we obtain bounds on the
number of homotopy types of the fibers of S for any fixed A-set S, analogous to
the one in Theorem 2.4.
More precisely we have:
Theorem 2.7. Let S(R) be an o-minimal structure over R, and T ⊂ Rk1+k2+ℓ a
closed and bounded definable set, and π1 : R
k1+k2+ℓ → Rk1+k2 , π2 : R
k1+k2+ℓ →
ℓ, and π3 : R
k1+k2 → Rk2 the projection maps. Then, there exists a constant
C = C(T ) > 0, such that for any collection A = {A1, . . . , An} of (T, π1, π2)-sets,
for any fixed A-set S the number of distinct homotopy types of fibers π−13 (z)∩S for
various z ∈ π3(S) is bounded by
C · n(k1+3)k2 .
A similar result with a bound of C · n(k1+1)k2 holds for stable homotopy types
as well.
3. A Topological Comparison Theorem
As noted previously, the main underlying idea behind our proof of Theorem 1.8
is that the homotopy type of an A-set in Rk depends only on the interaction of sets
in A at most (k + 1) at a time. In this section we make this idea precise.
We show that in case A = {A1, . . . , An}, with each Ai a definable, closed and
bounded subset of Rk, the homotopy type of any A-closed set is determined by a
certain sub-complex of the homotopy co-limit of the diagram of A. The crucial fact
here is that this sub-complex depends only on the intersections of the sets in A at
most k + 1 at a time.
In order to avoid technical difficulties, we restrict ourselves to the category of
finite, regular cell complexes (see [24] for the definition of a regular cell complex).
The setting of finite, regular cell complexes suffices for us, since it is well known
that closed and bounded definable sets in any o-minimal structure are finitely tri-
angulable, and hence, are homeomorphic to regular cell complexes.
3.1. Topological Preliminaries. Let A = {A1, . . . , An}, where each Ai is a sub-
complex of a finite regular cell complex. We now define the homotopy co-limit of
the diagram of A.
TOPOLOGICAL TYPES OF PARAMETRIZED ARRANGEMENTS 9
3.1.1. Homotopy Co-limits. Let ∆[n] denote the standard simplex of dimension n−1
with vertices in [n] (and by |∆[n]| the corresponding closed geometric simplex). For
I ⊂ [n], we denote by ∆I the (#I − 1)-dimensional face of ∆[n] corresponding to I.
The homotopy co-limit, hocolim(A), is a CW-complex defined as follows.
Definition 3.1 (homotopy co-limit).
hocolim(A) =
I⊂[n]
∆I ×AI/ ∼
where the equivalence relation ∼ is defined as follows.
For I ⊂ J ⊂ [n], let sI,J : |∆I | →֒ |∆J | denote the inclusion map of the face |∆I |
in |∆J |, and let iI,J : |AJ | →֒ |AI | denote the inclusion map of |AJ | in |AI |.
Given (s,x) ∈ |∆I |×|AI | and (t,y) ∈ |∆J |×|AJ | with I ⊂ J , then (s,x) ∼ (t,y)
if and only if t = sI,J(s) and x = iI,J(y).
Note that there exist two natural maps
fA : |hocolim(A)| → |A
[n]|,
gA : |hocolim(A)| → |∆[n]|
defined by
(3.1) fA(s,x) = s,
(3.2) gA(s,x) = x.
where (s,x) ∈ |∆Ic | × c, c is a cell in A
[n] and Ic = {i ∈ [n] | c ∈ Ai}.
Notice that we have
|hocolim(A)| =
I⊂[n]
|∆I | × |AI | ⊂
I⊂[n]
|∆I | × A
Definition 3.2 (truncated homotopy co-limits). For any m, 0 ≤ m ≤ n, we will
denote by hocolimm(A) the sub-complex of hocolim(A) defined by
(3.3) hocolimm(A) = g
A (skm(∆[n])).
Definition 3.3 (diagram preserving maps between homotopy co-limits). Replacing
in Definition 1.5, |A[n]| and |B[n]|, by |hocolim(A)| and |hocolim(B)| respectively,
as well as |AI | and |BI | by f
A (|AI |) and f
B (|BI |) respectively, we get definitions
of diagram preserving homotopy equivalences and stable homotopy equivalences
between |hocolim(A)| and |hocolim(B)|, and more generally for anym ≥ 0, between
|hocolimm(A)| and |hocolimm(B)|.
Definition 3.4. We say thatA ≈m B if there exists a diagram preserving homotopy
equivalence
φ : |hocolimm(A)| → |hocolimm(B)|.
We say that A ∼m B, if there exists a diagram preserving stable homotopy
equivalence φ ∈ {hocolimm(A); hocolimm(B)}, represented by
φ̃ : SN |hocolimm(A)| → S
N |hocolimm(B)|,
for some N > 0.
10 SAUGATA BASU
Remark 3.5. Note that in the above definition the map φ need not be induced by
a diagram preserving map φ : A[n] → B[n] (respectively, φ̃ : SN |hocolimm(A)| →
SN |hocolimm(B)|). Indeed if it was the case then the proofs of Theorems 3.6 and
3.7 below would be simplified considerably.
The two following theorems are the crucial topological ingredients in the proofs
of our main results.
Theorem 3.6. Let A = {A1, . . . , An},B = {B1, . . . , Bn} be two families of sub-
complexes of a finite regular cell complex, such that:
(1) Hi(|A
[n]|,Z),Hi(|B
[n]|,Z) = 0, for all i ≥ k, and
(2) A ∼k B.
Then, A and B are stable homotopy equivalent.
Theorem 3.7. Let A = {A1, . . . , An},B = {B1, . . . , Bn} be two families of sub-
complexes of a finite regular cell complex, such that:
(1) dim(Ai), dim(Bi) ≤ k, for 1 ≤ i ≤ n, and
(2) A ≈k+2 B.
Then, A and B are homotopy equivalent.
We now state two corollaries of Theorems 3.6 and 3.7 which might be of interest.
Given a Boolean formula θ(T1, . . . , Tn) containing no negations and a family of
sub-complexes A = {A1, . . . , An} of a finite regular cell complex, we will denote
by Aθ the sub-complex defined by the formula, θA, which is obtained from θ by
replacing in θ the atom Ti by Ai for each i ∈ [n], and replacing each ∧ (respectively
∨) by ∩ (respectively ∪).
Corollary 3.8. Let A = {A1, . . . , An},B = {B1, . . . , Bn} be two families of sub-
complexes of a finite regular cell complex, satisfying the same conditions as in The-
orem 3.6. Let θ(T1, . . . , Tn) be a Boolean formula without negations. Then, |Aθ|
and |Bθ| are stable homotopy equivalent.
Corollary 3.9. Let A = {A1, . . . , An},B = {B1, . . . , Bn} be two families of sub-
complexes of a finite regular cell complex, satisfying the same conditions as in The-
orem 3.7. Let θ(T1, . . . , Tn) be a Boolean formula without negations. Then, |Aθ|
and |Bθ| are homotopy equivalent.
3.2. Proofs of Theorems 3.6 and 3.7. Let A and B as in Theorem 3.6.
We need a preliminary lemma.
Lemma 3.10.
|A[n]| is diagram preserving homotopy equivalent to |hocolim(A)|.
Proof. Consider the map
fA : |hocolim(A)| → |A
defined in (3.1).
Clearly, if x ∈ c, f−1A (c) = |∆Ic |. Now applying Smale’s version of the Vietoris-
Begle Theorem [16] we obtain that fA is a homotopy equivalence. Clearly, fA is
diagram preserving. Moreover, (see for instance the proof of Theorem 6 in [16])
there exists an cellular inverse map
hA : |A
[n]| → |hocolim(A)|
such that fA ◦ hA is diagram preserving, and is a homotopy inverse of fA. �
TOPOLOGICAL TYPES OF PARAMETRIZED ARRANGEMENTS 11
We can now prove Theorems 3.6 and 3.7.
Proof of Theorem 3.6. Let hA : |A
[n]| → |hocolim(A)| be a diagram preserving
homotopy equivalence known to exist by Lemma 3.10. Since hA is cellular, and
dim |A[n]| ≤ k, its image is contained in hocolimk(A) since by definition (Eqn.
(3.3))
skk(hocolim(A)) ⊂ hocolimk(A).
We will denote by hA,B : S
N |hocolimk(A)| → S
N |hocolimk(B)| a map represent-
ing a diagram preserving stable homotopy equivalence known to exist by hypothesis
(which we assume to be cellular).
Let iB,k : S
N |hocolimk(B)| →֒ S
N |hocolim(B)| denote the inclusion map. The
map iB,k induces isomorphisms
(iB,k)∗ : Hj(hocolimk(B),Z) → Hj(hocolim(B),Z)
for 0 ≤ j ≤ k − 1.
Consequently, the map fB ◦ iB,k induces isomorphisms
(fB ◦ iB,k)∗ : Hj(hocolimk(B),Z) → Hj(B
[n],Z)
for 0 ≤ j ≤ k − 1.
Composing the maps, SNhA, hAB, iB,k,S
NfB we obtain that the map,
NfB ◦ iB,k ◦ hAB ◦ S
NhA : S
N |A[n]| → SN |B[n]|
induces isomorphisms
(SNfB ◦ iB,k ◦ hA,B,k ◦ S
NhA)∗ : Hj(|A
[n]|,Z) → Hj(|B
[n]|,Z)
for all j ≥ 0.
Moreover, the map SNfB ◦ iB,k ◦ hAB ◦ S
NhA is diagram preserving since each
constituent of the composition is diagram preserving. It now follows from Theorem
1.4 that the S-map represented by
φ = SNfB ◦ iB,k ◦ hAB ◦ S
NhA : S
N |A[n]| → SN |B[n]|,
is a diagram preserving stable homotopy equivalence. �
Before proving Theorem 3.7 we first need to recall a few basic facts from homo-
topy theory.
Definition 3.11 (k-equivalence). A map f : X → Y between two regular cell
complex is called a k-equivalence if the induced homomorphism
f∗ : πi(X) → πi(Y )
is an isomorphism for all 0 ≤ i < k, and an epimorphism for i = k, and we say that
X is k-equivalent to Y . (Note that k-equivalence is not an equivalence relation).
We also need the following well-known fact from algebraic topology.
Proposition 3.12. Let X,Y be finite regular cell complexes with
dim(X) < k, dim(Y ) ≤ k,
and f : X → Y a k-equivalence. Then, f is a homotopy equivalence between X and
Proof. See [23, pp. 69]. �
12 SAUGATA BASU
Proof of Theorem 3.7. The proof is along the same lines as that of the proof of
Theorem 3.6. Let hA : |A
[n]| → |hocolim(A)| be a diagram preserving homotopy
equivalence known to exist by Lemma 3.10. By the same argument as before, its
image is contained in |hocolimk+2(A)|.
We will denote by hA,B : |hocolimk+2(A)| → |hocolimk+2(B)| a diagram preserv-
ing homotopy equivalence known to exist by hypothesis.
Let iB,k+2 : |hocolimk+2(B)| →֒ |hocolim(B)| denote the inclusion map. The
map iB,k+2 induces isomorphisms
(iB,k+2)∗ : πj(hocolimk+2(B)) → πj(hocolim(B))
for 0 ≤ j ≤ k+1. This is a consequence of the exactness of the homotopy sequence
of the pair (hocolim(B), hocolimk+2(B)) (see [18]).
Consequently, the map fB ◦ iB,k induces isomorphisms
(gB ◦ iB,k)∗ : πj(hocolimk+2(B)) → πj(B
for 0 ≤ j ≤ k + 1.
Composing the maps, hA, hAB, iB,k+2, fB we obtain that the map
fB ◦ iB,k ◦ hAB ◦ hA : |A
[n]| → |B[n]|
induces isomorphisms
(fB ◦ iB,k ◦ hA,B,k ◦ hA)∗ : πj(A
[n]) → πj(B
for 0 ≤ j ≤ k + 1.
Moreover, the map fB◦iB,k◦hAB◦hA is diagram preserving since each constituent
of the composition is diagram preserving. It now follows from Proposition 3.12 that
the map
φ = fB ◦ iB,k ◦ hAB ◦ hA : |A
[n]| → |B[n]|
is a diagram preserving homotopy equivalence. �
Proof of Corollary 3.8. First note that since the formula θ does not contain nega-
tions, writing θ as a disjunction of conjunctions, there exists Σ ⊂ 2[n] such that
AI (respectively, Bθ =
BI). Let A
′ = {AI | I ∈ Σ} (respectively,
B′ = {BI | I ∈ Σ}). It follows from the hypothesis that
A′ ∼k B
Now apply Theorem 3.6. �
Proof of Corollary 3.9. The proof is similar to that of Corollary 3.8 using Theorem
3.7 in place of Theorem 3.6 and is omitted. �
4. Proofs of the Main Theorems
4.1. Summary of the main ideas. We first summarize the main ideas under-
lying the proof of Theorem 1.8. The proof of Theorem 1.9 is similar and differs
only in technical details. Let A = {A1, . . . ,An} be a (T, π1, π2)- arrangement in
k1+k2 . Using Proposition 4.7, we obtain a definable partition, {Cα}α∈I (say) of
k2 , into connected locally closed definable sets Cα ⊂ R
k2 , with the property that
as z varies over Cα, we get for each I ⊂ [n] with #I ≤ k1 + 1 isomorphic (and
continuously varying) triangulations of the sub-arrangement A[I]. Moreover, these
triangulations are downward compatible in the sense that the restriction to A[J ]
TOPOLOGICAL TYPES OF PARAMETRIZED ARRANGEMENTS 13
of the triangulation of A[I], refines that of A[J ] for each J ⊂ I (cf. Proposition
4.7 below). These facts allow us to prove that for any z1, z2 ∈ Cα the truncated
homotopy co-limits |hocolimk1(Az1)| and |hocolimk1(Az2)| are homotopy equiva-
lent by a diagram preserving homotopy equivalence. More precisely, we first prove
that the thickened homotopy co-limits |hocolim+k1(Az1 , ε̄)| and |hocolim
(Az2 , ε̄)|
are homeomorphic, and then use Proposition 4.8 to deduce that |hocolimk1(Az1)|
and |hocolimk1(Az2)| are homotopy equivalent. Theorem 3.6 then implies that Az1
is stable homotopy equivalent to Az2 by a diagram preserving stable homotopy
equivalence. It remains to bound the number of elements in the partition {Cα}α∈I .
We use Theorem 2.2 to obtain a bound of C · n(k1+1)k2 on this number, where C is
a constant which depends only on T .
In order to prove Theorem 1.8 we recall a few results from o-minimal geometry.
We first note an elementary property of families of admissible sets (see [2] for a
proof).
Observation 4.1. Suppose that T1, . . . , Tm ⊂ R
k+ℓ are definable sets, π1 : R
k+ℓ →
k and π2 : R
k+ℓ → Rℓ the two projections. Then, there exists a definable sub-
set T ′ ⊂ Rk+ℓ+m depending only on T1, . . . , Tm, such that for any collection of
(Ti, π1, π2) families Ai, 1 ≤ i ≤ m, the union
Ai is a (T
′, π′1, π
2)-family, where
π′1 : R
k+m+ℓ → Rk and π′2 : R
k+ℓ+m → Rℓ+m are the projections onto the first k,
and the last ℓ+m co-ordinates respectively.
4.2. Hardt’s Triviality for Definable Sets. One important technical tool will
be the following o-minimal version of Hardt’s triviality theorem.
Let X ⊂ Rk×Rℓ and A ⊂ Rk be definable subsets of Rk×Rℓ and Rℓ respectively,
and let π : X → Rℓ denote the projection map on the last ℓ co-ordinates.
We say that X is definably trivial over A if there exists a definable set F and a
definable homeomorphism
h : F ×A → X ∩ π−1(A),
such that the following diagram commutes.
F ×A X ∩ π−1(A)
In the diagram above π2 : F ×A → A is the projection onto the second factor. We
call h a definable trivialization of X over A.
If Y is a definable subset of X , we say that the trivialization h is compatible with
Y if there is a definable subset G of F such that h(G×A) = Y ∩ π−1(A). Clearly,
the restriction of h to G×A is a trivialization of Y over A.
Theorem 4.2 (Hardt’s theorem for definable families). Let X ⊂ Rk × Rℓ be a
definable set and let Y1, . . . , Ym be definable subsets of X. Then, there exists a
finite partition of Rℓ into definable sets C1, . . . , CN such that X is definably trivial
over each Ci, and moreover the trivializations over each Ci are compatible with
Y1, . . . , Ym.
14 SAUGATA BASU
Remark 4.3. We first remark that it is straightforward to derive from the proof of
Theorem 4.2 that the definable sets C1, . . . , CN can be chosen to be locally closed,
and can be expressed as, C1 = R
ℓ \ B1, C2 = B1 \ B2, . . . , CN = BN−1 \ BN for
closed definable sets B1, . . . , BN . Clearly, the closed definable sets B1, . . . , BN ,
determine the sets Ci of the partition.
Remark 4.4. Note also that it follows from Theorem 4.2, that there are only a finite
number of topological types amongst the fibers of any definable map f : X → Y
between definable sets X and Y . This remark would be used a number of times
later in the paper.
Since in what follows we will need to consider many different projections, we
adopt the following convention.
Notation 4.5. Given m and p, p ≤ m, we will denote by
π≤pm : R
m → Rp
(respectively π>pm : R
m → Rm−p) the projection onto the first p (respectively the
last m− p) coordinates.
4.3. Definable Triangulations. A triangulation of a closed and bounded defin-
able set S is a simplicial complex ∆ together with a definable homeomorphism from
|∆| to S. Given such a triangulation we will often identify the simplices in ∆ with
their images in S under the given homeomorphism.
We call a triangulation h1 : |∆1| → S of a definable set S, to be a refinement of
a triangulation h2 : |∆2| → S if for every simplex σ1 ∈ ∆1, there exists a simplex
σ2 ∈ ∆2 such that h1(|σ1|) ⊂ h2(|σ2|).
Let S1 ⊂ S2 be two closed and bounded definable subsets of R
k. We say that a
definable triangulation h : |∆| → S2 of S2, respects S1 if for every simplex σ ∈ ∆,
h(σ) ∩ S1 = h(σ) or ∅. In this case, h
−1(S1) is identified with a sub-complex of ∆
and h|h−1(S1) : h
−1(S1) → S1 is a definable triangulation of S1. We will refer to
this sub-complex by ∆|S1 .
We introduce the following notational conventions in order to simplify arguments
used later in the paper.
Notation 4.6. If T ⊂ Rk1+k2+ℓ be any definable subset of Rk1+k2+ℓ, for each m ≥ 0,
and (z,y0, . . . ,ym) ∈ R
k2+(m+1)ℓ, we will denote by Tz,y0,...,ym ⊂ R
k1 the definable
1≤i≤m
{x ∈ Rk1 | (x, z) ∈ Tyi}. For {j0, . . . , jm′} ⊂ [m], we will denote by
πm,j0,...,jm′ : R
(m+1)ℓ → R(m
′+1)ℓ the projection map on the appropriate blocks of
co-ordinates.
It is well known that compact definable sets are triangulable and moreover the
usual proof of this fact (see for instance [6]) can be easily extended to produce
a definable triangulation in a parametrized way. We will actually need a family
of such triangulations satisfying certain compatibility conditions mentioned before.
The following proposition states the existence of such families. We omit the proof
of the proposition since it is a technical but straightforward extension of the proof
of existence of triangulations for definable sets.
Proposition 4.7 (existence of m-adaptive triangulations). Let T ⊂ Rk1+k2+ℓ be a
closed and bounded definable subset of Rk1+k2+ℓ and let m ≥ 0. For each 0 ≤ p ≤ m,
there exists
TOPOLOGICAL TYPES OF PARAMETRIZED ARRANGEMENTS 15
(1) a definable partition {Cp,α}α∈Ip of R
k2+(p+1)ℓ, into locally closed sets, de-
termined by a sequence of definable closed sets, {Bp,α}α∈Ip (see Remark 4.3
above), and
(2) for each α ∈ Ip, a definable continuous map,
hp,α : |∆p,α| × Cp,α →
(z,y0,...,yp)∈Cp,α
Tz,y0,...,yp
where ∆p,α is a simplicial complex, and such that for each (z,y0, . . . ,yp) ∈
Cp,α, the restriction of hp,α to |∆p,α| × (z,y0, . . . ,yp) is a definable trian-
gulation
hp,α : |∆p,α| × (z,y0, . . . ,yp) → Tz,y0,...,yp
of the definable set Tz,y0,...,yp respecting the subsets, Tz,y0 , . . . , Tz,yp, and
(3) for each subset {j0, . . . , jp′} ⊂ [p], (Idk2 , πp,j0,...,jp′ )(Cp,α) ⊂ Cp′,β for some
β ∈ Ip′ , and for each (z,y0, . . . ,yp) ∈ Cp,α, the definable triangulation of
Tz,yj0 ,...,yjp′
induced by the triangulation
hp,α : |∆p,α| × (z,y0, . . . ,yp) → Tz,y0,...,yp
is a refinement of the definable triangulation,
hp′,β : |∆p′,β| × (z,yj0 , . . . ,yjp′ ) → Tz,yj0 ,...,yjp′
(We will call the family {hp,α}0≤p≤m,α∈Ip an m-adaptive family of triangulations
of T .)
We will also need the following technical result.
Proposition 4.8. Let Ct ⊂ R
k, t ≥ 0 be a definable family of closed and bounded
sets, and let C ⊂ Rk+1 be the definable set
Ct × {t}. If for every 0 ≤ t < t
Ct ⊂ Ct′ , and C0 = π
k+1(C ∩ (π
−1(0)), then there exists t0 > 0 such that, C0
has the same homotopy type as Ct for every t with 0 ≤ t ≤ t0.
Proof. The proof given in [4] (see Lemma 16.17) for the semi-algebraic case can
be easily adapted to the o-minimal setting using Hardt’s triviality for definable
families instead of for semi-algebraic ones. �
We now introduce another notational convention.
Notation 4.9. Let F(x) be a predicate defined over R+ and y ∈ R+. The notation
∀(0 < x ≪ y) F(x) stands for the statement
∃z ∈ (0, y) ∀x ∈ R+ (if x < z, then F(x)),
and can be read “for all positive x sufficiently smaller than y, F(x) is true”.
More generally,
Notation 4.10. For ε̄ = (ε0, . . . , εn) and a predicate F(ε̄) over R
+ we say “for all
sufficiently small ε̄, F(ε̄) is true” if
∀(0 < ε0 ≪ 1)∀(0 < ε1 ≪ ε0) · · · ∀(0 < εn ≪ εn−1)F(ε̄).
16 SAUGATA BASU
4.4. Infinitesimal Thickenings of the Faces of a Simplex. We will need the
following construction.
Let ε̄ = (ε0, . . . , εn) ∈ R
+ , with 0 ≤ εn < · · · < ε0 < 1. Later we will require ε̄
to be sufficiently small (see Notation 4.10).
For a face ∆J ∈ ∆[n], we denote by CJ(ε̄) the subset of |∆J | defined by
CJ (ε̄) = {x ∈ |∆J | | dist(x, |∆I |) ≥ ε#I−1 for all I ⊂ J}.
Note that,
|∆[n]| =
I⊂[n]
CI(ε̄).
Figure 1. The complex ∆[n].
I ⊂ J ⊂ K = [n]
CI(ε̄)
CI(ε̄) ∩ CJ(ε̄) ∩ CK(ε̄)
CI(ε̄) ∩ CJ(ε̄)
CJ(ε̄)
CJ(ε̄) ∩ CK(ε̄)
CK(ε̄)
Figure 2. The corresponding complex C(∆[n]) with I ⊂ J ⊂ K = [n].
Also, observe that for sufficiently small ε̄ > 0, the various CJ (ε̄)’s are all home-
omorphic to closed balls, and moreover all non-empty intersections between them
TOPOLOGICAL TYPES OF PARAMETRIZED ARRANGEMENTS 17
also have the same property. Thus, the cells CJ (ε̄)’s together with the non-empty
intersections between them form a regular cell complex, C(∆[n], ε̄), whose underly-
ing topological space is |∆[n]| (see Figures 1 and 2).
Definition 4.11. We will denote by C(skm(∆[n]), ε̄) the sub-complex of C(∆[n], ε̄)
consisting of the cells CI(ε̄)’s together with the non-empty intersections between
them where |I| ≤ m+ 1.
We now use thickened simplices defined above to define a thickened version of
the homotopy co-limit of an arrangement A.
4.5. Thickened Homotopy Co-limits. Given an m-adaptive family of triangu-
lations of T (cf. Proposition 4.7), {hp,α}0≤p≤m,α∈Ip and z ∈ R
k2 , we define a cell
complex, hocolim+m(Az) (best thought of as an infinitesimally thickened version
of hocolimm(Az)), whose associated topological space is homotopy equivalent to
|hocolimm(Az)|.
Definition 4.12 (the cell complex hocolim
m(Az)). Let Cm denote the cell complex
C(skm(∆[n]), ε̄) defined previously (cf. Definition 4.11).
Let C be a cell of Cm. Then, C ⊂ |∆I | for a unique simplex ∆I with I =
{i0, . . . , im′} ⊂ [n], m
′ ≤ m, and (following notation introduced before in Definition
4.11)
C = CI1(ε̄) ∩ · · · ∩ CIp(ε̄),
with I1 ⊂ I2 ⊂ · · · ⊂ Ip ⊂ I and p ≤ m
We denote by K(C, ε̄) the cell complex consisting of the cells
C × hm′,α(|σ|, z,yi0 , . . . ,yim′ )
with α ∈ Im′ , (z,yi0 , . . . ,yim′ ) ∈ Cα,m′ , σ ∈ ∆m′,α, and hm′,α(|σ|, z,yi0 , . . . ,yim′ ) ⊂
Az,I . We denote
(4.1) hocolim+m(Az, ε̄) =
K(C).
The compatibility properties (properties (2) and (3) in Proposition 4.7) of the m-
adaptive family of triangulations of T , {hp,α}0≤p≤m,α∈Ip , ensure that hocolim
m(Az, ε̄)
defined above is a regular cell complex. Notice that, since the map fA defined in
Eqn. 3.1 extends to |hocolim+m(Az, ε̄), the notion of diagram preserving maps ex-
tend to |hocolim+m(Az, ε̄) as well.
We now prove:
Lemma 4.13. Let z ∈ Rℓ and m ≥ 0. Then, for all sufficiently small ε̄ > 0,
|hocolim+m(Az, ε̄)| is homotopy equivalent to |hocolimm(Az)| by a diagram preserv-
ing homotopy equivalence.
Proof. Let N = |hocolim+m(Az, ε̄)|. First replace εm by a variable t in the definition
ofN to obtain a closed and bounded definable set, Nmt , and observe thatN
t ⊂ N
for all 0 < t < t′ ≪ 1.
Now apply Proposition 4.8 to obtain that N is homotopy equivalent to Nm0 .
Now, replace εm−1 by t in the definition of N
0 to obtain N
t , and applying
Proposition 4.8 obtain that Nm0 is homotopy equivalent to N
0 . Continuing in
this way we finally obtain that, N is homotopy equivalent to N00 = |hocolimm(Az)|.
18 SAUGATA BASU
Moreover, the diagram preserving property is clearly preserved at each step of the
proof. �
Proof of Theorem 1.8. Recall that for m ≥ 0, and (z,y0, . . . ,ym) ∈ R
k2+(m+1)ℓ,
we denote by Tz,y0,...,ym the definable set
Tz,yi ⊂ R
Now apply Proposition 4.7 to the set T with m = k1 to obtain an k1-adaptive
family of triangulations {hp,α}1≤p≤k1,α∈Ip .
We now fix {y1, . . . ,yn} ⊂ R
ℓ and let A = {A1, . . . , An} with Ai = Tyi ⊂
k1+k2 . For each z ∈ Rk2 , we will denote by Az = {A1,z, . . . , An,z} where Ai,z =
{x ∈ Rk1 | (x, z) ∈ Ai}.
For α ∈ Ik1 , and 1 ≤ i0 < · · · < ik1 ≤ n, we will denote by Bk1,α,i0,...,ik1 ⊂ R
the definable closed set
Bk1,α,i0,...,ik1 = {z ∈ R
ℓ | (z,y0, . . . ,yk1) ∈ Bk1,α}.
α∈Ik1
{Bk1,α,i0,...,ik1 | 1 ≤ i0 < i1 < · · · < ik1 ≤ n},
and let C ∈ C(B). Theorem 1.8 will follow from the following two lemmas.
Lemma 4.14. For any z1, z2 ∈ C, Az1 is stable homotopy equivalent to Az2 .
Proof. Clearly, by Theorem 3.6 it suffices to prove that |hocolimk1(Az1 )| is diagram
preserving homotopy equivalent to |hocolimk1(Az2 )|.
The compatibility properties of the triangulations ensure that that the complex
|hocolim+k1(Az1 , ε̄) is isomorphic to |hocolim
(Az2 , ε̄) and hence |hocolim
(Az1 , ε̄)|
is homeomorphic to |hocolim+k1(Az1 , ε̄)|.
Using Lemma 4.13 we get a diagram preserving homotopy equivalence
φ : |hocolimk1(Az1 )| → |hocolimk1(Az2 )|.
It now follows from Theorem 3.6 that the arrangements Az1 and Az2 are stable
homotopy equivalent. �
Lemma 4.15. There exists a constant C(T ) such that the cardinality of C(B) is
bounded by C · n(k1+1)k2 .
Proof. Notice that eachBk1,α, α ∈ Ik1 is a definable subset ofR
k2+(k1+1)ℓ depending
only on T . Also, the cardinality of the index set Ik1 is determined by T .
Hence, the set B consists of
definable sets, each one of them is a
(Bk1,α, π
k2+(k1+1)ℓ
, π>k2
k2+(k1+1)ℓ
for some α ∈ Ik1 . Using Observation 4.1, we have that B is a (B, π
2)-set for
some B determined only by T . Now apply Theorem 2.2. �
The theorem now follows from Lemmas 4.14 and 4.15 proved above. �
Proof of Theorem 1.9. The proof is similar to that of Theorem 1.8 given above,
except we use Theorem 3.7 instead of Theorem 3.6, and this accounts for the slight
worsening of the exponent in the bound. �
TOPOLOGICAL TYPES OF PARAMETRIZED ARRANGEMENTS 19
Proof of Theorem 2.7. Using a construction due to Gabrielov and Vorobjov [9] (see
also [2]) it is possible to replace any given A-set by a closed bounded A′-set (where
A′ is a new family of definable closely related to A with #A′ = 2k(#A)), such that
the new set has the same homotopy type as the original one. Using this construction
one can directly deduce Theorem 2.7 from Theorem 1.9. We omit the details. �
References
1. Pankaj K. Agarwal and Micha Sharir, Arrangements and their applications, Handbook of
computational geometry (J. Urrutia J.R. Sack, ed.), North-Holland, Amsterdam, 2000, pp. 49–
119. MR 1746675
2. S. Basu, Combinatorial complexity in o-minimal geometry, preprint at
arXiv:math.CO/0612050, 2006, An extended abstract appears in the Proceedings of
the ACM Symposium on the Theory of Computing, 2007.
3. , Algorithmic semi-algebraic geometry and topology – recent progress and open prob-
lems, Surveys on Discrete and Computational Geometry: Twenty Years Later, Contemporary
Mathematics, vol. 453, American Mathematical Society, 2008, pp. 139–212.
4. S. Basu, R. Pollack, and M.-F. Roy, Algorithms in real algebraic geometry, Algorithms
and Computation in Mathematics, vol. 10, Springer-Verlag, Berlin, 2006 (second edition).
MR 1998147 (2004g:14064)
5. S. Basu and N. Vorobjov, On the number of homotopy types of fibers of a definable map,
Journal of the London Mathematical Society (2007), 757–776.
6. M. Coste, An introduction to o-minimal geometry, Istituti Editoriali e Poligrafici Internazion-
ali, Pisa, 2000, Dip. Mat. Univ. Pisa, Dottorato di Ricerca in Matematica.
7. L. Danzer, B. Grunbaum, and V. Klee, Helly’s theorem and its relatives, Convexity, Pro-
ceedings of Symposia In Pure Mathematics, vol. VII, American Mathematical Society, 1963,
pp. 101–180.
8. Jean Dieudonné, A history of algebraic and differential topology. 1900–1960, Birkhäuser
Boston Inc., Boston, MA, 1989. MR 995842 (90g:01029)
9. A. Gabrielov and N. Vorobjov, Approximation of definable sets by compact families and upper
bounds on homotopy and homology, preprint at arXiv:math.AG/0710.3028v1, 2007.
10. D. Grigoriev and N. Vorobjov, Solving systems of polynomial inequalities in subexponential
time, Journal of Symbolic Computation 5 (1988), 37–64.
11. J.F. Knight, A. Pillay, and C. Steinhorn, Definable sets in ordered structures. II., Trans.
Amer. Math. Soc. 295 (1986), no. 2, 593–605. MR 0833698 (88b:03050b)
12. J. Matousek, Lectures on discrete geometry, Graduate Texts in Mathematics, vol. 212,
Springer-Verlag, 2002.
13. A. Pillay and C. Steinhorn, Definable sets in ordered structures. I., Trans. Amer. Math. Soc.
295 (1986), no. 2, 565–592. MR 0833697 (88b:03050a)
14. , Definable sets in ordered structures. III., Trans. Amer. Math. Soc. 309 (1988), no. 2,
469–576. MR 0943306 (89i:03059)
15. J.-P. Rolin, P. Speissegger, and A. J. Wilkie, Quasianalytic Denjoy-Carleman classes and
o-minimality, J. Amer. Math. Soc. 16 (2003), no. 4, 751–777 (electronic). MR 1992825
(2004g:14065)
16. S. Smale, A Vietoris mapping theorem for homotopy, Proc. Amer. Math. Soc. 8:3 (1957),
604–610.
17. E. H. Spanier and J. H. C. Whitehead, Duality in relative homotopy theory, Ann. of Math.
(2) 67 (1958), 203–238. MR 0105105 (21 #3850)
18. E.H. Spanier, Algebraic topology, McGraw-Hill, New York, 1966.
19. L. van den Dries, Tame topology and o-minimal structures., LMS Lecture Notes, vol. 248,
Cambridge University Press, 1998.
20. Lou van den Dries and Chris Miller, Geometric categories and o-minimal structures, Duke
Math. J. 84 (1996), no. 2, 497–540. MR 1404337 (97i:32008)
21. Lou van den Dries and Patrick Speissegger, The real field with convergent generalized power
series, Trans. Amer. Math. Soc. 350 (1998), no. 11, 4377–4421. MR 1458313 (99a:03036)
22. , The field of reals with multisummable series and the exponential function, Proc.
London Math. Soc. (3) 81 (2000), no. 3, 513–565. MR 1781147 (2002k:03057)
20 SAUGATA BASU
23. O. Ya. Viro and D. B. Fuchs, Homology and cohomology, Topology. II, Encyclopaedia Math.
Sci., vol. 24, Springer, Berlin, 2004, Translated from the Russian by C. J. Shaddock, pp. 95–
196. MR 2054457
24. George W. Whitehead, Elements of homotopy theory, Graduate Texts in Mathematics, vol. 61,
Springer-Verlag, New York, 1978. MR 516508 (80b:55001)
25. A. J. Wilkie, Model completeness results for expansions of the ordered field of real numbers
by restricted Pfaffian functions and the exponential function, J. Amer. Math. Soc. 9 (1996),
no. 4, 1051–1094. MR 1398816 (98j:03052)
26. , A theorem of the complement and some new o-minimal structures, Selecta Math.
(N.S.) 5 (1999), no. 4, 397–421. MR 1740677 (2001c:03071)
School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, U.S.A.
E-mail address: saugata.basu@math.gatech.edu
	1. Introduction
	1.1. Combinatorial Complexity in O-minimal Geometry
	1.2. Stable Homotopy Equivalence
	1.3. Diagrams and Co-limits
	1.4. Diagram Preserving Maps
	1.5. Main Results
	2. Background
	2.1. Bounds on the Betti numbers of Admissible Sets
	2.2. Homotopy types of the fibers of a semi-algebraic map
	3. A Topological Comparison Theorem
	3.1. Topological Preliminaries
	3.2. Proofs of Theorems ?? and ??
	4. Proofs of the Main Theorems
	4.1. Summary of the main ideas
	4.2. Hardt's Triviality for Definable Sets
	4.3. Definable Triangulations
	4.4. Infinitesimal Thickenings of the Faces of a Simplex
	4.5. Thickened Homotopy Co-limits
	References
ABSTRACT
  Let ${\mathcal S}(\R)$ be an o-minimal structure over $\R$, $T \subset
\R^{k_1+k_2+\ell}$ a closed definable set, and $$ \displaylines{\pi_1:
\R^{k_1+k_2+\ell}\to \R^{k_1 + k_2}, \pi_2: \R^{k_1+k_2+\ell}\to \R^{\ell}, \
\pi_3: \R^{k_1 + k_2} \to \R^{k_2}} $$ the projection maps.
  For any collection ${\mathcal A} = \{A_1,...,A_n\}$ of subsets of
$\R^{k_1+k_2}$, and $\z \in \R^{k_2}$, let $\A_\z$ denote the collection of
subsets of $\R^{k_1}$, $\{A_{1,\z},..., A_{n,\z}\}$, where $A_{i,\z} = A_i \cap
\pi_3^{-1}(\z), 1 \leq i \leq n$. We prove that there exists a constant $C =
C(T) > 0,$ such that for any family ${\mathcal A} = \{A_1,...,A_n\}$ of
definable sets, where each $A_i = \pi_1(T \cap \pi_2^{-1}(\y_i))$, for some
$\y_i \in \R^{\ell}$, the number of distinct stable homotopy types of $\A_\z,
\z \in \R^{k_2}$, is bounded by $ \displaystyle{C \cdot n^{(k_1+1)k_2},} $
while the number of distinct homotopy types is bounded by $ \displaystyle{C
\cdot n^{(k_1+3)k_2}.} $ This generalizes to the general o-minimal setting,
bounds of the same type proved in \cite{BV} for semi-algebraic and
semi-Pfaffian families. One main technical tool used in the proof of the above
results, is a topological comparison theorem which might be of independent
interest in the study of arrangements.

<|endoftext|><|startoftext|>
USC-07/HEP-B3 hep-th/0704.0296
Generalized Twistor Transform And Dualities
With A New Description of Particles With Spin
Beyond Free and Massless1
Itzhak Bars and Bora Orcal
Department of Physics and Astronomy,
University of Southern California, Los Angeles, CA 90089-0484, USA.
Abstract
A generalized twistor transform for spinning particles in 3+1 dimensions is constructed
that beautifully unifies many types of spinning systems by mapping them to the same twistor
, thus predicting an infinite set of duality relations among spinning systems with
different Hamiltonians. Usual 1T-physics is not equipped to explain the duality relationships
and unification between these systems. We use 2T-physics in 4+2 dimensions to uncover
new properties of twistors, and expect that our approach will prove to be useful for practi-
cal applications as well as for a deeper understanding of fundamental physics. Unexpected
structures for a new description of spinning particles emerge. A unifying symmetry SU(2, 3)
that includes conformal symmetry SU(2, 2) =SO(4, 2) in the massless case, turns out to be
a fundamental property underlying the dualities of a large set of spinning systems, including
those that occur in high spin theories. This may lead to new forms of string theory back-
grounds as well as to new methods for studying various corners of M theory. In this paper
we present the main concepts, and in a companion paper we give other details [1].
I. SPINNING PARTICLES IN 3+1 - BEYOND FREE AND MASSLESS
The Penrose twistor transform [2]-[5] brings to the foreground the conformal symmetry
SO(4, 2) in the dynamics of massless relativistic particles of any spin in 3 + 1 dimensions.
The transform relates the phase space and spin degrees of freedom xµ, pµ, s
µν to a twistor
and reformulates the dynamics in terms of twistors instead of phase space. The
1 This work was partially supported by the US Department of Energy, grant number DE-FG03-84ER40168.
http://arxiv.org/abs/0704.0296v2
twistor ZA is made up of a pair of SL(2, C) spinors µ
α̇, λα, α, α̇ = 1, 2, and is regarded as
the 4 components A = 1, 2, 3, 4 of the Weyl spinor of SO(4, 2) =SU(2, 2).
The well known twistor transform for a spinning massless particle is [5]
µα̇ = −i (x̄+ iȳ)α̇β λβ, λαλ̄β̇ = pαβ̇, (1.1)
where (x̄+ iȳ)
(xµ + iyµ) (σ̄µ)
, and pαβ̇ =
pµ (σµ)αβ̇ , while σµ = (1, ~σ) , σ̄µ =
(−1, ~σ) are Pauli matrices. xµ + iyµ is a complexification of spacetime [2]. The helicity
h of the particle is determined by p · y = h. The spin tensor is given by sµν = εµνρσyρpσ,
and it leads to 1
sµνsµν = h
2. The Pauli-Lubanski vector is proportional to the momentum
εµνρσs
νρpσ = (y · p) pµ−p2yµ = hpµ, appropriate for a massless particle of helicity h.
The reformulation of the dynamics in terms of twistors is manifestly SU(2, 2) covariant.
It was believed that twistors and the SO(4, 2)=SU(2, 2) symmetry, interpreted as conformal
symmetry, govern the dynamics of massless particles only, since the momentum pµ of the
form pαβ̇ = λαλ̄β̇ automatically satisfies p
µpµ = 0.
However, recent work has shown that the same twistor ZA =
that describes massless
spinless particles (h = 0) also describes an assortment of other spinless particle dynamical
systems [6][7]. These include massive and interacting particles. The mechanism that avoids
pµpµ = 0 [6][7] is explained following Eq.(6.9) below. The list of systems includes the
following examples worked out explicitly in previous publications and in unpublished notes.
The massless relativistic particle in d = 4 flat Minkowski space.
The massive relativistic particle in d = 4 flat Minkowski space.
The nonrelativistic free massive particle in 3 space dimensions.
The nonrelativistic hydrogen atom (i.e. 1/r potential) in 3 space dimensions.
The harmonic oscillator in 2 space dimensions, with its mass ⇔ an extra dimension.
The particle on AdS4, or on dS4.
The particle on AdS3×S1 or on R× S3.
The particle on AdS2×S2.
The particle on the Robertson-Walker spacetime.
The particle on any maximally symmetric space of positive or negative curvature.
The particle on any of the above spaces modified by any conformal factor.
A related family of other particle systems, including some black hole backgrounds.
In this paper we will discuss these for the case of d = 4 with spin (h 6= 0). It must be
emphasized that while the phase spaces (and therefore dynamics, Hamiltonian, etc.) in these
systems are different, the twistors
µα̇, λα
are the same. For example, the massive particle
phase space (xµ, pµ)massive and the one for the massless particle (x
µ, pµ)massless are not the
same (xµ, pµ) , rather they can be obtained from one another by a non-linear transformation
for any value of the mass parameter m [6], and similarly, for all the other spaces mentioned
above. However, under such “duality” transformations from one system to another, the
twistors for all the cases are the same up to an overall phase transformation
µα̇, λα
massive
µα̇, λα
massless
= · · · =
µα̇, λα
. (1.2)
This unification also shows that all of these systems share the same SO(4, 2)=SU(2, 2)
global symmetry of the twistors. This SU(2, 2) is interpreted as conformal symmetry for the
massless particle phase space, but has other meanings as a hidden symmetry of all the other
systems in their own phase spaces. Furthermore, in the quantum physical Hilbert space, the
symmetry is realized in the same unitary representation of SU(2, 2) , with the same Casimir
eigenvalues (see (7.16,7.17) below), for all the systems listed above.
The underlying reason for such fantastic looking properties cannot be found in one-time
physics (1T-physics) in 3+1 dimensions, but is explained in two-time physics (2T-physics)
[8] as being due to a local Sp(2, R) symmetry. The Sp(2, R) symmetry which acts in phase
space makes position and momentum indistinguishable at any instant and requires one extra
space and one extra time dimensions to implement it, thus showing that the unification relies
on an underlying spacetime in 4+2 dimensions. It was realized sometime ago that in 2T-
physics twistors emerge as a gauge choice [9], while the other systems are also gauge choices
of the same theory in 4+2 dimensions. The 4+2 phase space can be gauge fixed to many
3+1 phase spaces that are distinguishable from the point of view of 1T-physics, without any
Kaluza-Klein remnants, and this accounts for the different Hamiltonians that have a duality
relationship with one another. We will take advantage of the properties of 2T-physics to
build the general twistor transform that relates these systems including spin.
Given that the field theoretic formulation of 2T-physics in 4+2 dimensions yields the
Standard Model of Particles and Forces in 3+1 dimensions as a gauge choice [10], including
spacetime supersymmetry [11], and given that twistors have simplified QCD computations
[12][13], we expect that our twistor methods will find useful applications.
II. TWISTOR LAGRANGIAN
The Penrose twistor description of massless spinning particles requires that the pairs
µα̇, iλ̄α̇
or their complex conjugates (λα, iµ̄
α) be canonical conjugates and satisfy the he-
licity constraint given by
Z̄AZA = λ̄α̇µ
α̇ + µ̄αλα = 2h. (2.1)
Indeed, Eq.(1.1) satisfies this property provided y · p = h. Here we have defined the 4̄ of
SU(2, 2) as the contravariant twistor
Z̄A ≡
Z†η2,2
λ̄α̇ µ̄
, η2,2 =
= SU (2, 2) metric. (2.2)
The canonical structure, along with the constraint Z̄AZA = 2h follows from the following
worldline action for twistors
− 2hÃ
, DτZ
− iÃZA. (2.3)
In the case of h = 0 it was shown that this action emerges as a gauge choice of a more
general action in 2T-physics [6][7]. Later in the paper, in Eq.(4.1) we give the h 6= 0
2T-physics action from which (2.3) is derived as a gauge choice. The derivative part of
this action gives the canonical structure S0 =
dτiZ̄A
λ̄α̇∂τµ
α̇ + µ̄α∂τλα
that requires
µα̇, iλ̄α̇
or their complex conjugates (λα, iµ̄
α) to be canonical conjugates.
The 1-form Ãdτ is a U(1) gauge field on the worldline, DτZ
A is the U(1) gauge covariant
derivative that satisfies δε
for δεÃ = ∂ε/∂τ and δεZ
A = iεZA. The
term 2hÃ is gauge invariant since it transforms as a total derivative under the infinitesimal
gauge transformation. 2hÃ was introduced in [6][7] as being an integral part of the twistor
formulation of the spinning particle action.
Our aim is to show that this action describes not only massless spinning particles, but also
all of the other particle systems listed above with spin. This will be done by constructing
the twistor transform from ZA to the phase space and spin degrees of freedom of these
systems, and claiming the unification of dynamics via the generalized twistor transform.
This generalizes the work of [6][7] which was done for the h = 0 case of the action in (2.3).
We will use 2T-physics as a tool to construct the general twistor transform, so this unification
is equivalent to the unification achieved in 2T-physics.
III. MASSLESS PARTICLE WITH ANY SPIN IN 3+1 DIMENSIONS
In our quest for the general twistor transform with spin, we first discuss an alternative
to the well known twistor transform of Eq.(1.1). Instead of the yµ (τ) that appears in the
complexified spacetime xµ + iyµ we introduce an SL(2, C) bosonic2 spinor vα̇ (τ) and its
complex conjugate v̄α (τ) , and write the general vector yµ in the matrix form as yα̇β =
hvα̇v̄β + ωpα̇β , where ω (τ) is an arbitrary gauge freedom that drops out. Then the helicity
condition y · p = h takes the form v̄pv = 1. Furthermore, we can write λα = pαβ̇vβ̇ since this
automatically satisfies λαλ̄β̇ = pαβ̇ when p
2 = (v̄pv − 1) = 0 are true. With this choice of
variables, the Penrose transform of Eq.(1.1) takes the new form
λα = (pv)α , µ
α̇ = [(−ix̄p+ h) v]α̇ , p2 = (v̄pv − 1) = 0, (3.1)
where the last equation is a set of constraints on the degrees of freedom xµ, pµ, v
α̇, v̄α.
If we insert the twistor transform (3.1) into the action (2.3), the twistor action turns into
the action for the phase space and spin degrees of freedom xµ, pµ, v
α̇, v̄α
ẋµpµ −
p2 + ih
(v̄p)Dτv −Dτv (pv)
− 2hÃ
. (3.2)
where Dτv = v̇ − iÃv is the U(1) gauge covariant derivative and we have included the
Lagrange multiplier e to impose p2 = 0 when we don’t refer to twistors. The equation of
motion for Ã imposes the second constraint v̄pv−1 = 0 that implies U(1) gauge invariance3.
From the global Lorentz symmetry of (3.2), the Lorentz generator is computed via
Noether’s theorem Jµν = xµpν − xνpµ + sµν , with sµν = i
hv̄ (pσ̄µν + σµνp) v. The helic-
2 This is similar to the fermionic case in [5]. The bosonic spinor v can describe any spin h.
3 If this action is taken without the U(1) constraint
Ã = 0
, then the excitations in the v sector describe
an infinite tower of massless states with all helicities from zero to infinity (here we rescale
2hv → v)
Sall spins =
ẋµpµ −
v̄pv̇ − v̇pv
(3.3)
The spectrum coincides with the spectrum of the infinite slope limit of string theory with all helicities
v̄pv. This action has a hidden SU(2, 3) symmetry that includes SU(2, 2) conformal symmetry. This is
explained in the rest of the paper by the fact that this action is a gauge fixed version of a 2T-physics
master action (4.1,5.4) in 4+2 dimensions with manifest SU(2, 3) symmetry. A related approach has been
pursued also in [15]-[18] in 3+1 dimensions in the context of only massless particles. Along with the
manifestly SU(2, 3) symmetric 2T-physics actions, we are proposing here a unified 2T-physics setting for
discussing high spin theories [14] including all the dual versions of the high spin theories related to the
spinning physical systems listed in section (I).
ity is determined by computing the Pauli-Lubanski vector W µ = 1
εµνλσsνλpσ = (hv̄pv) p
The helicity operator hv̄pv reduces to the constant h in the U(1) gauge invariant sector.
The action (3.2) gives a description of a massless particle with any helicity h in terms of
the SL(2, C) bosonic spinors v, v̄. We note its similarity to the standard superparticle action
[20][21] written in the first order formalism. The difference with the superparticle is that the
fermionic spacetime spinor θα̇ of the superparticle is replaced with the bosonic spacetime
spinor vα̇, and the gauge field Ã imposes the U(1) gauge symmetry constraint v̄pv − 1 = 0
that restricts the system to a single, but arbitrary helicity state given by h.
Just like the superparticle case, our action has a local kappa symmetry with a bosonic
local spinor parameter κα (τ), namely
α̇ = p̄α̇βκβ, δκxµ =
((δκv̄)σµv − v̄σµ (δκv)) , (3.4)
µ = 0, δκe = −ih
κ̄ (Dτv)−
, δκÃ = 0. (3.5)
These kappa transformations mix the phase space degrees of freedom (x, p) with the spin
degrees of freedom v, v̄. The transformations δκxµ, δκe are non-linear.
Let us count physical degrees of freedom. By using the kappa and the τ -reparametrization
symmetries one can choose the lightcone gauge. From phase space xµ, pµ there remains 3
positions and 3 momentum degrees of freedom. One of the two complex components of
vα̇ is set to zero by using the kappa symmetry, so vα̇ =
. The phase of the remaining
component is eliminated by choosing the U(1) gauge, and finally its magnitude is fixed by
solving the constraint v̄pv − 1 = 0 to obtain vα̇ = (p+)−1/2
. Therefore, there are no
independent physical degrees of freedom in v. The remaining degrees of freedom for the
particle of any spin are just the three positions and momenta, and the constant h that
appears in sµν . This is as it should be, as seen also by counting the physical degrees of
freedom from the twistor point of view. When we consider the other systems listed in the
first section, we should expect that they too are described by the same number of degrees
of freedom since they will be obtained from the same twistor, although they obey different
dynamics (different Hamiltonians) in their respective phase spaces.
The lightcone quantization of the the massless particle systems described by the actions
(3.2,3.3) is performed after identifying the physical degrees of freedom as discussed above.
The lightcone quantum spectrum and wavefunction are the expected ones for spinning mass-
less particles, and agree with their covariant quantization given in [15]-[19].
IV. 2T-PHYSICS WITH SP(2, R) , SU(2, 3) AND KAPPA SYMMETRIES
The similarity of (3.2) to the action of the superparticle provides the hint for how to
lift it to the 2T-physics formalism, as was done for the superparticle [22][9] and the twistor
superstring [23][24]. This requires lifting 3+1 phase space (xµ, pµ) to 4+2 phase space
XM , PM
and lifting the SL(2, C) spinors v, v̄ to the SU(2, 2) spinors VA, V̄
A. The larger
set of degrees of freedom XM , PM , VA, V̄
A that are covariant under the global symmetry
SU(2, 2) =SO(4, 2) , include gauge degrees of freedom, and are subject to gauge symmetries
and constraints that follow from them as described below.
The point is that the SU(2, 2) invariant constraints on XM , PM , VA, V̄
A have a wider set
of solutions than just the 3+1 system of Eq.(3.2) we started from. This is because 3+1 di-
mensional spin & phase space has many different embeddings in 4+2 dimensions, and those
are distinguishable from the point of view of 1T-physics because target space “time” and
corresponding “Hamiltonian” are different in different embeddings, thus producing the dif-
ferent dynamical systems listed in section (I). The various 1T-physics solutions are reached
by simply making gauge choices. One of the gauge choices for the action we give below in
Eq.(4.1) is the twistor action of Eq.(2.3). Another gauge choice is the 4+2 spin & phase
space action in terms of the lifted spin & phase space XM , PM , VA, V̄
A as given in Eq.(5.4).
The latter can be further gauge fixed to produce all of the systems listed in section (I)
including the action (3.2) for the massless spinning particle with any spin. All solutions still
remember that there is a hidden global symmetry SU(2, 2) =SO(4, 2) , so all systems listed
in section (I) are realizations of the same unitary representation of SU(2, 2) whose Casimir
eigenvalues will be given below.
For the 4 + 2 version of the superparticle [22] that is similar to the action in (5.4), this
program was taken to a higher level in [9] by embedding the fermionic supercoordinates in
the coset of the supergroup SU(2, 2|1) /SU(2, 2)×U(1). We will follow the same route here,
and embed the bosonic SU(2, 2) spinors VA, V̄
A in the left coset SU(2, 3) /SU(2, 2)×U(1) .
This coset will be regarded as the gauging of the group SU(2, 3) under the subgroup
[SU (2, 2)× U (1)]
from the left side. Thus the most powerful version of the action that
reveals the global and gauge symmetries is obtained when it is organized in terms of the
XMi (τ), g (τ) and Ã (τ) degrees of freedom described as
4+2 phase space
XM (τ)
PM (τ)
≡ XMi (τ) , i = 1, 2, doublets of Sp (2, R) gauge symmetry,
group element g (τ) ⊂ SU (2, 3) subject to [SU (2, 2)× U(1) ]L × U(1)L+R gauge symmetry.
We should mention that the h = 0 version of this theory, and the corresponding twistor prop-
erty, was discussed in [6], by taking g (τ) ⊂SU(2, 2) and dropping all of the U(1)’s. So, the
generalized theory that includes spin has the new features that involves SU(2, 2) →SU(2, 3)
and the U(1) structures. The action has the following form
XNj ηMN + Tr
(iDτg) g
− 2hÃ
, (4.1)
where εij =
is the antisymmetric Sp(2, R) metric, and DτX
i = ∂τX
the Sp(2, R) gauge covariant derivative, with the 3 gauge potentials Aij = εikA
For SU(2, 3) the group element is pseudo-unitary, g−1 = (η2,3) g
† (η2,3)
, where η2,3 is the
SU(2, 3) metric η2,3 =
. The covariant derivative Dτg is given by
Dτg = ∂τg − iÃ [q, g] , q =
14×4 0
 (4.2)
where the generator of U(1)L+R is proportional to the 5×5 traceless matrix
q ∈ u(1) ∈ su(2, 3)L+R . The last term of the action −2hÃ, which is also the last term
of the action (2.3), is invariant under the U(1)L+R since it transforms to a total derivative.
Finally, the 4× 4 traceless matrix (L) BA ∈su(2, 2) ∈su(2, 3) that appears on the left side of
g (or right side of g−1) is
(L) BA ≡
LMN , LMN = εijXMi X
j = X
MPN −XNPM . (4.3)
where ΓMN =
ΓM Γ̄N − ΓN Γ̄M
are the 4×4 gamma-matrix representation of the 15 gen-
erators of SU(2, 2). A detailed description of these gamma matrices is given in [11].
The symmetries of actions of this type for any group or supergroup g were discussed
in [9][23][24][7]. The only modification of that discussion here is due to the inclusion of
the U(1) gauge field Ã. In the absence of the Ã coupling the global symmetry is given
by the transformation of g (τ) from the right side g (τ) → g (τ) gR where gR ⊂SU(2, 3)R.
However, in our case, the presence of the coupling with the U(1)L+R charge q breaks the
global symmetry down to the (SU(2, 2)×U(1))R subgroup that acts on the right side of g.
So the global symmetry is given by
global: g (τ) → g (τ) hR, hR ∈ [SU(2, 2)× U(1)]R ⊂ SU(2, 3)R. (4.4)
Using Noether’s theorem we deduce the conserved global charges as the [SU(2, 2)×U(1)]R
components of the the following SU(2, 3)R Lie algebra valued matrix J(2,3)
J(2,3) = g
, J2,3 = η2,3 (J2,3)
(η2,3)
, (4.5)
The traceless 4× 4 matrix (J ) BA =
ΓMNJMN is the conserved SU(2, 2) =SO(4, 2) charge
and J0 is the conserved U(1) charge. Namely, by using the equations of motion one can
verify ∂τ (J ) BA = 0 and ∂τJ0 = 0. The spinor charges jA, j̄A are not conserved4 due to the
coupling of Ã. As we will find out later in Eq.(6.8), jA is proportional to the twistor
J0ZA, (4.6)
up to an irrelevant gauge transformation. It is important to note that J and J0 are invariant
on shell under the gauge symmetries discussed below. Therefore they generate physical
symmetries [SU(2, 2)×U(1)]R under which all gauge invariant physical states are classified.
The local symmetries of this action are summarized as
Sp (2, R)×
SU (2, 2) 3
kappa
kappa U (1)
(4.7)
The Sp(2, R) is manifest in (4.1). The rest corresponds to making local SU(2, 3) transfor-
mations on g (τ) from the left side g (τ) → gL (τ) g (τ) , as well as transforming XMi =
XM , PM
as vectors with the local subgroup SU(2, 2)L =SO(4, 2) , and A
ij under the
kappa. The 3/4 kappa symmetry which is harder to see will be discussed in more detail
below. These symmetries coincide with those given in previous discussions in [9][23][24][7]
despite the presence of Ã. The reason is that the U(1)L+R covariant derivative Dτg in
Eq.(4.2) can be replaced by a purely U(1)R covariant derivative Dτg = ∂τg + igqÃ because
the difference drops out in the trace in the action (4.1). Hence the symmetries on left side
of g (τ) → gL (τ) g (τ) remain the same despite the coupling of Ã.
We outline the roles of each of these local symmetries. The Sp(2, R) gauge symmetry
can reduce XM , PM to any of the phase spaces in 3+1 dimensions listed in section (I). This
4 In the high spin version of (4.1) with Ã = 0, the global symmetry is SU(2, 3)R and jA, j̄
A are conserved.
is the same as the h = 0 case discussed in [6]. The [SU(2, 2)×U(1)]L gauge symmetry can
reduce g (τ) ⊂SU(2, 3) to the coset g → t (V ) ∈SU(2, 3) /[SU(2, 2)×U(1)]L parameterized
by the SU(2, 2)×U(1) spinors
VA, V̄
as shown in Eq.(5.3). The remaining 3/4 kappa
symmetry, whose action is shown in Eq.(5.15), can remove up to 3 out of the 4 parameters
in the VA. The U(1)L+R symmetry can eliminate the phase of the remaining component
in V . Finally the constraint due to the equation of motion of Ã fixes the magnitude of V .
In terms of counting, there remains only 3 position and 3 momentum physical degrees of
freedom, plus the constant h, in agreement with the counting of physical degrees of freedom
of the twistors.
It is possible to gauge fix the symmetries (4.7) partially to exhibit some intermediate
covariant forms. For example, to reach the SL(2, C) covariant massless particle described by
the action (3.2) from the 2T-physics action above, we take the massless particle gauge by
using two out of the three Sp(2, R) gauge parameters to rotate the M = +′ doublet to the
(τ) =
, and solving explicitly two of the Sp(2, R) constraints X2 = X ·P = 0
XM = (
xµ (τ)), PM = (
x · p ,
pµ (τ)). (4.8)
This is the same as the h = 0 massless case in [6]. There is a tau reparametrization gauge
symmetry as a remnant of Sp(2, R) . Next, the [SU(2, 2)×U(1)]L gauge symmetry reduces
g (τ) → t (V ) written in terms of
VA, V̄
as given in Eq.(5.3), and the 3/4 kappa symmetry
reduces the SU(2, 2) spinor VA →
to the two components SL(2, C) doublet vα̇, with a
leftover kappa symmetry as discussed in Eqs.(3.4-3.5). The gauge fixed form of g is then
g = exp
2hvα̇
0 0 0
2hv̄α 0
1 hvα̇v̄β
2hvα̇
0 1 0
2hv̄α 1
∈ SU (2, 3) . (4.9)
The inverse g−1 = (η2,3) g
† (η2,3)
is given by replacing v, v̄ by (−v) , (−v̄) . Inserting the
gauge fixed forms of X,P, g (4.8,4.9) into the action (4.1) reduces it to the massless spin-
ning particle action (3.2). Furthermore, inserting these X,P, g into the expression for the
current in (4.5) gives the conserved SU(2, 2) charges J (see Eqs.(5.9,5.20)) which have the
significance of the hidden conformal symmetry of the gauge fixed action (3.2). This hidden
symmetry is far from obvious in the form (3.2), but it is straightforward to derive from the
2T-physics action as we have just outlined.
Partial or full gauge fixings of (4.1) similar to (4.8,4.9) produce the actions, the hidden
SU(2, 2) symmetry, and the twistor transforms with spin of all the systems listed in section
(I). These were discussed for h = 0 in [6], and we have now shown how they generalize
to any spin h 6= 0, with further details below. It is revealing, for example, to realize that
the massive spinning particle has a hidden SU(2, 2) “mass-deformed conformal symmetry”,
including spin, not known before, and that its action can be reached by gauge fixing the
action (4.1), or by a twistor transform from (2.3). The same remarks applied to all the other
systems listed in section (I) are equally revealing. For more information see our related paper
Through the gauge (4.8,4.9), the twistor transform (3.1), and the massless particle action
(3.2), we have constructed a bridge between the manifestly SU(2, 2) invariant twistor action
(2.3) for any spin and the 2T-physics action (4.1) for any spin. This bridge will be made
much more transparent in the following sections by building the general twistor transform.
V. 2T-PHYSICS ACTION WITH XM , PM , VA, V̄
A IN 4+2 DIMENSIONS
We have hinted above that there is an intimate relation between the 2T-physics action
(4.1) and the twistor action (2.3). In fact the twistor action is just a gauged fixed version
of the more general 2T-physics action (4.1). Using the local SU(2, 2) =SO(4, 2) and local
Sp(2, R) symmetries of the general action (4.1) we can rotateXM (τ) , PM (τ) to the following
form that also solves the Sp(2, R) constraints Xi ·Xj = X2 = P 2 = X · P = 0 [6][7]
XM = (
0), PM = (
0). (5.1)
This completely eliminates all phase space degrees of freedom. We are left with the gauge
fixed action Sh =
(Dτg) g
− 2hÃ
, where (iL) → 1
′−′, and
′−′ = 1. Due to the many zero entries in the 4×4 matrix Γ−′− [6], only one column
from g in the form
and one row from g−1 in the form
Z̄A,−Z̄5
can contribute in
the trace, and therefore the action becomes Sh =
iZ̄AŻA − iZ̄5Ż5 + Ã
Z̄5Z5 − 2h
Here Z̄5Ż5 drops out as a total derivative since the magnitude of the complex number Z5 is
a constant Z̄5Z5 = 2h. Furthermore, we must take into account Z̄
AZA − Z̄5Z5 = 0 which
is an off-diagonal entry in the matrix equation g−1g = 1. Then we see that the 2T-physics
action (4.1) reduces to the twistor action (2.3) with the gauge choice (5.1)5.
Next let us gauge fix the 2T-physics action (4.1) to a manifestly SU(2, 2) =SO(4, 2)
invariant version in flat 4+2 dimensions, in terms of the phase space & spin degrees of
freedom XM , PM , VA, V̄
A. For this we use the [SU(2, 2)×U(1)]left symmetry to gauge fix g
gauge fix: g → t (V ) ∈ SU(2, 3)
[SU(2, 2)× U(1)]left
(5.2)
The coset element t (V ) is parameterized by the SU(2, 2) spinor V and its conjugate V̄ =
V †η2,2 and given by the 5×5 SU(2, 3) matrix6
t (V ) =
1− 2hV V̄
)−1/2
1− 2hV̄ V
)−1/2
2hV̄ 1
 . (5.3)
The factor 2h is inserted for a convenient normalization of V. Note that the first matrix
commutes with the second one, so it can be written in either order. The inverse of the group
element is t−1 (V ) = (η2,3) t
† (η2,3)
= t (−V ) , as can be checked explicitly t (V ) t (−V ) = 1.
Inserting this gauge in (4.1) the action becomes
Ẋ · P −
AijXi ·Xj −
ΩMNLMN − 2hÃ
V̄ LV
1− 2hV̄ V
(5.4)
XNj ηMN − 2hÃ
V̄ LV
1− 2hV̄ V
(5.5)
where
i = ∂τX
j − ΩMNXiN (5.6)
is a covariant derivative for local Sp(2, R) as well as local SU(2, 2) =SO(4, 2) but with a
composite SO(4, 2) connection ΩMN (V (τ)) given conveniently in the following forms
ΩMNΓMN =
(i∂τ t) t
SU(2,2)
ΩMNLMN = −Tr
(i∂τ t) t
. (5.7)
Thus, Ω is the SU(2, 2) projection of the SU(2, 3) Cartan connection and given explicitly as
ΩMNΓMN = 2h
V̇ − V V̄ V̇
V̄ − V
V̇ − V̄ V̇ V
1− 2hV̄ V
1− 2hV̄ V
) + h
V̄ V̇ − V̇ V
1− 2hV̄ V
) (5.8)
5 In the high spin version of (4.1) without Ã (see footnote (3)), we replace Z5 = e
Z̄AZA and after
dropping a total derivative, the twistor equivalent becomes Sall spins =
iZ̄AŻA + Z̄Zφ̇
. For a
more covariant version that displays the SU(2, 3) global symmetry, we introduce a new U(1) gauge field
for the overall phase of
and write Sall spins =
iZ̄AŻA − iZ̄5Ż5 + B̃
Z̄AZA − Z̄5Z5
6 Arbitrary fractional powers of the matrix
1− 2hV V̄
are easily computed by expanding in a series and
then resuming to obtain
1− 2hV V̄
= 1+ V V̄
1− 2hV̄ V
)γ − 1
/V̄ V.
The action (5.4,5.5) is manifestly invariant under global SU(2, 2) =SO(4, 2) rotations, and
under local U (1) phase transformations applied on VA, V̄
A. The conserved global symmetry
currents J and J0 can be derived either directly from (5.4) by using Noether’s theorem, or
by inserting the gauge fixed form of g → t (V ) into Eq.(4.5)7 J(2,3) = t−1
1− 2hV V̄
1− 2hV V̄
J0, J0 =
2hV̄ LV
1− 2hV̄ V
(5.9)
1− 2hV V̄
LV 1√
1− 2hV̄ V
(5.10)
According to the equation of motion for Ã that follows from the action (5.4) we must have
the following constraint (this means U(1) gauge invariant physical sector)
V̄ LV
1− 2hV̄ V
= 1. (5.11)
Therefore, in the physical sector the conserved [SU(2, 2)×U(1)]right charges take the form
physical sector: J0 = 2h, J =
1− 2hV V̄
1− 2hV V̄
. (5.12)
Let us now explain the local kappa symmetry of the action (5.4,5.5). The action (5.4) is
still invariant under the bosonic local 3/4 kappa symmetry inherited from the action (4.1).
The kappa transformations of g (τ) in the general action (5.4) correspond to local coset
elements exp
∈SU(2, 3)left/[SU(2, 2)×U(1)]left with a special form of the spinor KA
KA = Xi ·
Γκi (τ)
, (5.13)
with κiA (τ) two arbitrary local spinors8. Now that g has been gauge fixed g → t (V ), the
kappa transformation must be taken as the naive kappa transformation on g followed by a
[SU(2, 2)×U(1)]left gauge transformation which restores the gauge fixed form of t (V )
t (V ) → t (V ′) =
Tr (ω)
t (V ) (5.14)
The SU(2, 2) part of the restoring gauge transformation must also be applied on XM , PM .
Performing these steps we find the infinitesimal version of this transformation [22]
δκV =
1− 2hV V̄
1− 2hV̄ V
, δκX
i = ω
MNXiN , δκA
ij = see below, (5.15)
7 In the high spin version (Ã = 0) the conserved charges include jA as part of SU(2, 3)R global symmetry.
It is then also convenient to rescale
2hV → V in Eqs.(5.3-5.10) to eliminate an irrelevant constant.
8 In this special form only 3 out of the 4 components of KA are effectively independent gauge parameters.
This can be seen easily in the special frame for XM , PM given in Eq.(5.1).
where ωMN (K, V ) has the same form as ΩMN in Eq.(5.8) but with V̇ replaced by the
δκV given above. The covariant derivative D̂τX
i in Eq.(5.6) is covariant under the local
SU(2, 2) transformation with parameter ωMN (K, V ) (this is best seen from the projected
Cartan connection form Ω = [(i∂τ t) t
−1]SU(2,2)). Therefore, the kappa transformations (5.15)
inserted in (5.5) give
δκSh =
Xi ·Xj + iT r
(Dτ t) t
. (5.16)
In computing the second term the derivative terms that contain ∂τK have dropped out in
the trace. Using Eq.(5.13) we see that
LK = 1
εliXMl X
XLj ΓMNΓLκ
j (5.17)
εliXMl X
j (ΓMNL + ηNLΓM − ηMLΓN )κl (5.18)
εliXi ·Xj
Xl · Γκj
(5.19)
The completely antisymmetric XMi X
l ΓMNL term in the second line vanishes since i, j, l
can only take two values. The crucial observation is that the remaining term in LK is
proportional to the dot products Xi ·Xj. Therefore the second term in (5.16) is cancelled by
the first term by choosing the appropriate δκA
ij in Eq.(5.16), thus establishing the kappa
symmetry.
The local kappa transformations (5.15) are also a symmetry of the global SU(2, 3)R
charges δκJ = δκJ0 = δκjA = 0 provided the constraints Xi · Xj = 0 are used. Hence
these charges are kappa invariant in the physical sector.
We have established the global SO(4, 2) and local Sp(2, R)× (3/4 kappa)×U(1) symme-
tries of the phase space action (5.4) in 4+2 dimensions. From it we can derive all of the phase
space actions of the systems listed in section (I) by making various gauge choices for the
local Sp(2, R)× (3/4 kappa)×U(1) symmetries. This was demonstrated for the spinless case
h = 0 in [6]. The gauge choices for XM , PM discussed in [6] now need to be supplemented
with gauge choices for VA, V̄
A by using the kappa×U(1) local symmetries.
Here we demonstrate the gauge fixing described above for the massless particle of any spin
h. The kappa symmetry effectively has 3 complex gauge parameters as explained in footnote
(8). If the kappa gauge is fixed by using two of its parameters we reach the following forms
, V̄ A → (0 v̄α) , V̄ V → 0,
1− 2hV V̄
)−1/2 →
. (5.20)
By inserting this gauge fixed form of V, and the gauge fixed form of X,P given in Eq.(4.8),
into the action (5.4) we immediately recover the SL(2, C) covariant action of Eq.(3.2). The
U(1) gauge symmetry is intact. The kappa symmetry of the action of Eq.(3.2) discussed in
Eqs.(3.4,3.5) is the residual 1/4 kappa symmetry of the more general action ((5.4).
For other examples of gauge fixing that generates some of the systems in the list of section
(I) see our related paper [1].
VI. GENERAL TWISTOR TRANSFORM (CLASSICAL)
The various formulations of spinning particles described above all contain gauge degrees
of freedom of various kinds. However, they all have the global symmetry SU(2, 2)=SO(4, 2)
whose conserved charges J BA are gauge invariant in all the formulations. The most sym-
metric 2T-physics version gave the J BA as embedded in SU(2, 3)R in the SU(2, 2) projected
form in Eq.(4.5)
SU(2,2)
. (6.1)
Since this is gauge invariant, when gauge fixed, it must agree with the Noether charges
computed in any version of the theory. So we can equate the general phase space version of
Eq.(5.9) with the twistor version that follows from the Noether currents of (2.3) as follows
J = Z(h)Z̄(h) − 1
Z(h)Z̄(h)
1− 2hV V̄
1− 2hV V̄
J0 (6.2)
The trace corresponds to the U(1) charge J0 = Tr
Z(h)Z̄(h)
J0 = Z
(h)Z̄(h) =
1− 2hV V̄
1− 2hV V̄
. (6.3)
In the case of h = 0 this becomes
Z(0)Z̄(0) = L. (6.4)
Therefore the equality (6.3) is solved up to an irrelevant phase by
Z(h) =
1− 2hV V̄
Z(0). (6.5)
By inserting (6.4) into the constraint (5.11) we learn a new form of the constraints
V̄ Z(0) =
1− 2hV̄ V , V̄ Z(h) = 1. (6.6)
In turn, this implies
Z(0) =
1− 2hV̄ V
(6.7)
which is consistent9 with Z(0)Z̄(0) = L , and its vanishing trace Z̄(0)Z(0) = 0 since LL = 0
(due to X2 = P 2 = X · P = 0). Putting it all together we then have
Z(h) =
1− 2hV V̄
1− 2hV̄ V
V. (6.8)
We note that this Z(h) is proportional to the non-conserved coset part of the SU(2, 3) charges
J2,3, that is jA =
(h) given in Eqs.(4.5,4.6) or (5.10), when g and L are replaced by
their gauge fixed forms, and use the constraint10 J0 = 2h.
The key for the general twistor transform for any spin is Eq.(6.5), or equivalently (6.8).
The general twistor transform between Z(0) and XM , PM which satisfies Z(0)Z̄(0) = L is
already given in [6] as
Z(0) =
= −i X
, λ(0)α λ̄
X+P µ −XµP+
(σµ)αβ̇ . (6.9)
Note that (X+P µ −XµP+) is compatible with the requirement that any SL(2, C) vector
constructed as λ
must be lightlike. This property is satisfied thanks to the Sp(2, R)
constraints X2 = P 2 = X · P = 0 in 4+2 dimensions, thus allowing a particle of any
mass in the 3 + 1 subspace (since P µPµ is not restricted to be lightlike). Besides satisfying
Z(0)Z̄(0) = L, this Z(0) also satisfies Z̄(0)Z(0) = 0, as well as the canonical properties of
twistors. Namely, Z(0) has the property [6]
dτ Z̄(0)∂τZ
(0) =
dτ ẊMPM . (6.10)
From here, by gauge fixing the Sp(2, R) gauge symmetry, we obtain the twistor transforms
for all the systems listed in section (I) for h = 0 directly from Eq.(6.9), as demonstrated
in [6]. All of that is now generalized at once to any spin h through Eq.(6.5). Hence (6.5)
together with (6.9) tell us how to construct explicitly the general twistor Z
A in terms
9 To see this, we note that Eqs.(6.4,6.6) lead to LV V̄ L
1−2hV̄ V =
Z(0)Z̄(0)V V̄ Z(0)Z̄(0)
1−2hV̄ V = Z
(0)Z̄(0) = L.
10 For the high spin version (Ã = 0) we don’t use the constraint. Instead, we use Z(h) = 1√
1−2hV V̄
Z(0) only
in its form (6.5), and note that, after using Eq.(6.4), the jA in Eq.(5.10) takes the form jA =
1−2hV̄ V
, and it is possible to rescale h away everywhere
2hV → V.
of spin & phase space degrees of freedom XM , PM , VA, V̄
A. Then the Sp(2, R) and kappa
gauge symmetries that act on XM , PM , VA, V̄
A can be gauge fixed for any spin h, to give
the specific twistor transform for any of the systems under consideration.
We have already seen in Eq.(6.2) that the twistor transform (6.5) relates the conserved
SU(2, 2) charges in twistor and phase space versions. Let us now verify that (6.5) provides
the transformation between the twistor action (2.3) and the spin & phase space action (5.4).
We compute the canonical structure as follows
dτ Z̄(h)∂τZ
(h) =
dτ Z̄(0)
1− 2hV V̄
1− 2hV V̄
(6.11)
Z̄(0) 1√
1−2hV V̄
1−2hV V̄
+Z̄(0) 1
1−2hV V̄ ∂τZ
(6.12)
Ẋ · P + Tr
(i∂τ t) t
(6.13)
The last form is the canonical structure of spin & phase space as given in (5.4). To prove this
result we used Eq.(6.10), footnote (6), and the other properties of Z(0) including Eqs.(6.4-
6.7), as well as the constraints X2 = P 2 = X · P = 0, and dropped some total derivatives.
This proves that the canonical properties of Z(h) determine the canonical properties of spin
& phase space degrees of freedom and vice versa.
Then, including the terms that impose the constraints, the twistor action (2.3) and the
phase space action (5.4) are equivalent. Of course, this is expected since they are both gauge
fixed versions of the master action (4.1), but is useful to establish it also directly via the
general twistor transform given in Eq.(6.5).
VII. QUANTUM MASTER EQUATION, SPECTRUM, AND DUALITIES
In this section we derive the quantum algebra of the gauge invariant observables J BA
and J0 which are the conserved charges of [SU(2, 2)×U(1)]R. Since these are gauge invariant
symmetry currents they govern the system in any of its gauge fixed versions, including in
any of its versions listed in section (I). From the quantum algebra we deduce the constraints
among the physical observables J BA ,J0 and quantize the theory covariantly. Among other
things, we compute the Casimir eigenvalues of the unitary irreducible representation of
SU(2, 2) which classifies the physical states in any of the gauge fixed version of the theory
(with the different 1T-physics interpretations listed in section (I)).
The simplest way to quantize the theory is to use the twistor variables, and from them
compute the gauge invariant properties that apply in any gauge fixed version. We will apply
the covariant quantization approach, which means that the constraint due to the U(1) gauge
symmetry will be applied on states. Since the quantum variables will generally not satisfy
the constraints, we will call the quantum twistors in this section ZA, Z̄
A to distinguish them
from the classical Z
A , Z̄
(h)A of the previous sections that were constrained at the classical
level. So the formalism in this section can also be applied to the high spin theories (discussed
in several footnotes up to this point in the paper) by ignoring the constraint on the states.
According to the twistor action (2.3) ZA and iZ̄
A (or equivalently λα and iµ̄
α) are canon-
ical conjugates. Therefore the quantum rules (equivalent to spin & phase space quantum
rules) are
ZA, Z̄
= δ BA . (7.1)
These quantum rules, as well as the action, are manifestly invariant under SU(2, 2) . In
covariant SU(2, 2) quantization the Hilbert space contains states which do not obey the
U(1) constraint on the twistors. At the classical level the constraint was J0 = Z̄Z = 2h,
but in covariant quantization this is obeyed only by the U(1) gauge invariant subspace of
the Hilbert space which we call the physical states. The quantum version of the constraint
requires Ĵ0 as a Hermitian operator applied on states (we write it as Ĵ0 to distinguish it
from the classical version)
Ĵ0 =
A + Z̄AZA
, Ĵ0|phys〉 = 2h|phys〉. (7.2)
The operator Ĵ0 has non-trivial commutation relations with ZA, Z̄
A which follow from the
basic commutation rules above
Ĵ0, ZA
= −ZA,
Ĵ0, Z̄
= Z̄ A. (7.3)
By rearranging the orders of the quantum operators ZAZ̄
A = Z̄AZA+4 we can extract from
(7.2) the following relations
Z̄Z = Ĵ0 − 2, T r
= Ĵ0 + 2. (7.4)
Furthermore, by using Noether’s theorem for the twistor action (2.3) we can derive the 15
generators of SU(2, 2) in terms of the twistors and write them as a traceless 4 × 4 matrix
J BA at the quantum level as follows
J BA = ZAZ̄B −
δ BA =
ZZ̄ −
Ĵ0 + 2
. (7.5)
In this expression the order of the quantum operators matters and gives rise to the shift
J0 → Ĵ0 + 2 in contrast to the corresponding classical expression. The commutation rules
among the generators J BA and the ZA, Z̄A are computed from the basic commutators (7.1),
J BA , ZC
= −δ BC ZA +
J BA , Z̄D
= δ DA Z̄
B − 1
Z̄D δ BA (7.6)
J BA ,J DC
= δ DA J BC − δ BC J DA ,
Ĵ0,J BA
= 0. (7.7)
We see from these that the gauge invariant observables J BA satisfy the SU(2, 2) Lie algebra,
while the ZA, Z̄
A transform like the quartets 4, 4̄ of SU(2, 2) . Note that the operator Ĵ0
commutes with the generators J BA , therefore J BA is U(1) gauge invariant, and furthermore
Ĵ0 must be a function of the Casimir operators of SU(2, 2) . When Ĵ0 takes the value 2h
on physical states, then the Casimir operators also will have eigenvalues on physical states
which determine the SU(2, 2) representation in the physical sector.
From the quantum rules (7.3), it is evident that the U(1) generator Ĵ0 can only have
integer eigenvalues since it acts like a number of operator. More directly, through Eq.(7.4)
it is related to the number operator Z̄Z. Therefore the theory is consistent at the quantum
level (7.2) provided 2h is an integer.
Let us now compute the square of the matrix J BA . By using the form (7.5) we have
(JJ ) =
ZZ̄ − Ĵ0+2
ZZ̄ − Ĵ0+2
= ZZ̄ZZ̄ − 2 Ĵ0+2
ZZ̄ +
Ĵ0+2
where we have used
Ĵ0, ZAZ̄
= 0. Now we elaborate
ZZ̄ZZ̄
Ĵ0 − 2
Z̄B =
Ĵ0 − 1
B where
we first used (7.4) and then (7.3). Finally we note from (7.5) that ZAZ̄
B = J BA + Ĵ0+24 δ
Putting these observations together we can rewrite the right hand side of (JJ ) in terms of
J and Ĵ0 as follows11
(JJ ) =
Ĵ20 − 4
. (7.8)
11 A similar structure at the classical level can be easily computed by squaring the expression for J in Eq.(6.2)
and applying the classical constraint J0 = Z̄
AZA = 2h. This yields the classical version J CA J BC =
J BA + 316J
A = hJ BA + 34h
2δ BA , which is different than the quantum equation (7.8). Thus, the
quadratic Casimir at the classical level is computed as C2 =
J20 = 3h
2 which is different than the
quantum value in (7.16).
This equation is a constraint satisfied by the global [SU(2, 2)×U(1)]R charges J BA , Ĵ0 which
are gauge invariant physical observables. It is a correct equation for all the states in the
theory, including those that do not satisfy the U(1) constraint (7.2). We call this the
quantum master equation because it will determine completely all the SU(2, 2) properties of
the physical states for all the systems listed in section (I) for any spin.
By multiplying the master equation with J and using (7.8) again we can compute JJJ .
Using this process repeatedly we find all the powers of the matrix J
(J )n = αnJ + βn, (7.9)
where
αn(Ĵ0) =
Ĵ0 − 1
Ĵ0 − 2
Ĵ0 + 2
, (7.10)
βn(Ĵ0) =
Ĵ20 − 4
αn−1(Ĵ0). (7.11)
Remarkably, these formulae apply to all powers, including negative powers of the matrix J .
Using this result, any function of the matrix J constructed as a Taylor series takes the form
f (J ) = α
(7.12)
where
Ĵ0 − 1
Ĵ0 − 2
Ĵ0 + 2
, (7.13)
Ĵ0 − 1
Ĵ0 + 2
Ĵ0 − 2
Ĵ0 − 2
Ĵ0 + 2
 . (7.14)
We can compute all the Casimir operators by taking the trace of J n in Eq.(7.9), so we
find12
Cn(Ĵ0) ≡ Tr (J )n = 4βn(Ĵ0) =
Ĵ20 − 4
αn−1(Ĵ0). (7.15)
In particular the quadratic, cubic and quartic Casimir operators of SU(2, 2) =SO(6, 2) are
computed at the quantum level as
C2(Ĵ0) =
Ĵ20 − 4
, C3(Ĵ0) =
Ĵ20 − 4
Ĵ0 − 4
, (7.16)
C4(Ĵ0) =
Ĵ20 − 4
7Ĵ20 − 32Ĵ0 + 52
. (7.17)
12 Other definitions of Cncould differ from ours by normalization or linear combinations of the Tr (J n).
The eigenvalue of the operator Ĵ0 on physical states Ĵ0|phys〉 = 2h|phys〉 completely fixes
the unitary SU(2, 2) representation that classifies the physical states, since the most general
representation of SO(4, 2) is labeled by the three independent eigenvalues of C2, C3 and C4.
Obviously, this result is a special representation of SU(2, 2) since all the Casimir eigenvalues
are determined in terms of a single half integer number h. Therefore we conclude that all of
the systems listed in section (I) share the very same unitary representation of SU(2, 2) with
the same Casimir eigenvalues given above.
In particular, for spinless particles
Ĵ0 → h = 0
we obtain C2 = −3, C3 = 6, C4 = −394 ,
which is the unitary singleton representation of SO(4, 2) =SU(2, 2). This is in agreement
with previous covariant quantization of the spinless particle in any dimension directly in
phase space in d + 2 dimensions, which gave for the SO(d, 2) Casimir the eigenvalue as
MN → 1 − d2/4 on physical states that satisfy X2 = P 2 = X · P = 0 [8].
So, for d = 4 we get C2 = −3 in agreement with the quantum twistor computation above.
Note that the classical computation either in phase space or twistor space would give the
wrong answer C2 = 0 when orders of canonical conjugates are ignored and constraints used
classically.
Of course, having the same SU(2, 2) Casimir eigenvalue is one of the infinite number of
duality relations among these systems that follow from the more general twistor transform
or the master 2T-physics theory (4.1). All dualities of these systems amount to all quantum
functions of the gauge invariants J BA that take the same gauge invariant values in any of
the physical Hilbert spaces of the systems listed in section (I).
All the physical information on the relations among the physical observables is already
captured by the quantum master equation (7.8), so it is sufficient to concentrate on it. The
predicted duality, including these relations, can be tested at the quantum level by computing
and verifying the equality of an infinite number of matrix elements of the master equation
between the dually related quantum states for the systems listed in section (I). In the
case of the Casimir operators Cn the details of the individual states within a representation
is not relevant, so that computation whose result is given above is among the simplest
computations that can be performed on the systems listed in section (I) to test our duality
predictions. This test was performed successfully for h = 0 at the quantum level for some
of these systems directly in their own phase spaces [26], verifying for example, that the free
massless particle, the hydrogen atom, the harmonic oscillator, the particle on AdS spaces,
all have the same Casimir eigenvalues C2 = −3, C3 = 6, C4 = −394 at the quantum level.
Much more elaborate tests of the dualities can be performed both at the classical and
quantum levels by computing any function of the gauge invariant J BA and checking that it
has the same value when computed in terms of the spin & phase space of any of the sys-
tems listed in section (I). At the quantum level all of these systems have the same Casimir
eigenvalues of the Cn for a given h. So their spectra must correspond to the same unitary
irreducible representation of SU(2, 2) as seen above. But the rest of the labels of the repre-
sentation correspond to simultaneously commuting operators that include the Hamiltonian.
The Hamiltonian of each system is some operator constructed from the observables J BA ,
and so are the other simultaneously diagonalizable observables. Therefore, the different
systems are related to one another by unitary transformations that sends one Hamiltonian
to another, but staying within the same representation. These unitary transformations are
the quantum versions of the gauge transformations of Eq.(4.7), and so they are the duality
transformations at the quantum level. In particular the twistor transform applied to any of
the systems is one of those duality transformations. By applying the twistor transforms we
can map the Hilbert space of one system to another, and then compute any function of the
gauge invariant J BA between dually related states of different systems. The prediction is
that all such computations within different systems must give the same result.
Given that J BA is expressed in terms of rather different phase space and spin degrees
of freedom in each dynamical system with a different Hamiltonian, this predicted duality is
remarkable. 1T-physics simply is not equipped to explain why or for which systems there
are such dualities, although it can be used to check it. The origin as well as the proof of
the duality is the unification of the systems in the form of the 2T-physics master action
of Eq.(4.1) in 4+2 dimensions. The existence of the dualities, which can laboriously be
checked using 1T-physics, is the evidence that the underlying spacetime is more beneficially
understood as being a spacetime in 4+2 dimensions.
VIII. QUANTUM TWISTOR TRANSFORM
We have established a master equation for physical observables J at the quantum level.
Now, we also want to establish the twistor transform at the quantum level expressed as
much as possible in terms of the gauge invariant physical quantum observables J . To this
end we write the master equation (7.8) in the form
J − 3
Ĵ0 − 2
J + 1
Ĵ0 + 2
= 0. (8.1)
Recall the quantum equation (7.5) J + Ĵ0+2
= ZZ̄, so the equation above is equivalent to
J − 3
Ĵ0 − 2
Z = 0. (8.2)
This is a 4× 4 matrix eigenvalue equation with operator entries. The general solution is
Ĵ0 + 2
V̂ (8.3)
where V̂A is any spinor up to a normalization. This is verified by using the master equation
(8.1) which gives
J − 3
Ĵ0 − 2
J − 3
Ĵ0 − 2
J + 1
Ĵ0 + 2
V̂ = 0. Not-
ing that the solution (8.3) has the same form as the classical version of the twistor transform
in Eq.(6.8), except for the quantum shift J0 → Ĵ0 + 2, we conclude that the V̂A introduced
above is the quantum version of the VA discussed earlier (up to a possible renormalization
as belonging to the coset SU(2, 3) /[SU(2, 2)×U(1)].
Now V̂A is a quantum operator whose commutation rules must be compatible with those
of ZA, Z̄
A, Ĵ0 and J BA . Its commutation rules with J BA , Ĵ0 are straightforward and fixed
uniquely by the SU(2, 2)×U(1) covariance
Ĵ0, V̂A
= −V̂A,
Ĵ0, V̂
, (8.4)
J BA , V̂C
= −δ BC V̂A +
V̂C δ
J BA , V̂
= δ DA V̂
δ BA . (8.5)
Other quantum properties of V̂A follow from imposing the quantum property Z̄Z = Ĵ0 −
2 in (7.4). Inserting Z of the form (8.3), using the master equation, and observing the
commutation rules (8.4), we obtain
J + Ĵ0 + 2
V̂ = 1. (8.6)
13 The quantum version of V̂ is valid in the whole Hilbert space, not only in the subspace that satisfies the
U(1) constraint Ĵ0 → 2h. In particular, in the high spin version, already at the classical level we must
take V̂ = V (
J0) and then rescale it V
2h → V as described in previous footnotes. So in the full
quantum Hilbert space we must take V̂ =
2hV (Ĵ0 + γ)
−1/2 (or the rescaled version V
2h → V ) with
the possibly quantum shifted operator (Ĵ0 + γ)
−1/2.
This is related to (5.11) if we take (5.9) into account by including the quantum shift J0 →
Ĵ0 + 2. Considering (8.3) this equation may also be written as
V̂ Z = Z̄V̂ = 1. (8.7)
Next we impose
ZA, Z̄
= δ BA to deduce the quantum rules for [V̂A, V̂
]. After some
algebra we learn that the most general form compatible with
ZA, Z̄
= δ BA is
V̂A, V̂
= − V̂ V̂
Ĵ0 − 1
δ BA +
M(J − 3 Ĵ0 − 2
) + (J − 3 Ĵ0 − 2
, (8.8)
where M BA is some complex matrix and M̄ = (η2,2)M
† (η2,2)
. The matrix M BA could not
be determined uniquely because of the 3/4 kappa gauge freedom in the choice of V̂A itself.
A maximally gauge fixed version of V̂A corresponds to eliminating 3 of its components
V̂2,3,4 = 0 by using the 3/4 kappa symmetry, leaving only A ≡ V̂1 6= 0. Then we find V̂
1,2,4
and V̄ 3 = A†. Let us analyze the quantum properties of this gauge in the context of the
formalism above. From Eq.(8.6) we determine A = (J 13 )
e−iφ, where φ is a phase, and
then from Eq.(8.3) we find ZA.
J 1A +
Ĵ0 + 2
)−1/2
e−iφ, Z̄A = eiφ
)−1/2
J A3 +
Ĵ0 + 2
(8.9)
We see that, except for the overall phase, ZA is completely determined in terms of the
gauge invariant J BA . We use a set of gamma matrices ΓM given in ([6],[11]) to write
J BA = 14iJ
MN (ΓMN)
A as an explicit matrix so that ZA can be written in terms of the
15 SO(4, 2) =SU(2, 2) generators JMN . We find
J12 + 1
J+− + 1
′−′ + Ĵ0+2
(J+1 + iJ+2)
′1 + iJ+
e−iφ√
, (8.10)
and Z̄A =
Z†η2,2
. The orders of the operators here are important. The basisM = ±′,±, i
with i = 1, 2 corresponds to using the lightcone combinations X±
′ ±X1′
, X± =
(X0 ±X1).
From our setup above, the ZA, Z̄
A in (8.10) are guaranteed to satisfy the twistor commuta-
tion rules
ZA, Z̄
= δ BA provided we insure that the V̂A, V̂
have the quantum properties
given in Eqs.(8.4,8.5,8.8). These are satisfied provided we take the following non-trivial
commutation rules for φ
φ, Ĵ0
= i, [φ, J12] =
Ĵ0, e
= ±e±iφ, [J12, e±iφ] = ±1
e±iφ (8.11)
while all other commutators between φ and JMN vanish. Then (8.8) becomes [V̂A, V̂
] = 0,
so M BA vanishes in this gauge. Indeed one can check directly that only by using the Lie
algebra for the JMN , Ĵ0 and the commutation rules for φ in (8.11), we obtain
ZA, Z̄
= δ BA ,
which a remarkable form of the twistor transform at the quantum level.
The expression (8.10) for the twistor is not SU(2, 2) covariant. Of course, this is because
we chose a non-covariant gauge for V̂A. However, the global symmetry SU(2, 2) is still intact
since the correct commutation rules between the twistors and JMN or the J BA as given in
(7.6,7.7) are built in, and are automatically satisfied. Therefore, despite the lack of manifest
covariance, the expression for ZA in (8.10) transforms covariantly as the spinor of SU(2, 2) .
It is now evident that one has many choices of gauges for V̂A. Once a gauge is picked
the procedure outlined above will automatically produce the quantum twistor transform in
that gauge, and it will have the correct commutation rules and SU(2, 2) properties at the
quantum level. For example, in the SL(2, C) covariant gauge of Eq.(5.20), the quantum
twistor transform in terms of JMN is
µα̇ =
Jµν (σ̄
vβ̇ +
′−′vα̇, λα =
′µ (σµ)αβ̇ v
β̇. (8.12)
with the constraint
v̄σµvJ
+′µ = 1. (8.13)
This gauge for V̂M covers several of the systems listed in section (I). The spinless case was
discussed at the classical level in ([6]). The quantum properties of this gauge are discussed
in more detail in ([1]).
The result for ZA in (8.10) is a quantum twistor transform that relies only on the gauge
invariants J BA or equivalently JMN . It generalizes a similar result in [6] that was given at the
classical level. In the present case it is quantum and with spin. All the information on spin is
included in the generators JMN = LMN +SMN . There are other ways of describing spinning
particles. For example, one can start with a 2T-physics action that uses fermions ψM (τ)
[27] instead of our bosonic variables VA (τ) . Since we only use the gauge invariant J
MN , our
quantum twistor transform (8.3) applies to all such descriptions of spinning particles, with
an appropriate relation between V̂ and the new spin degrees of freedom. In particular in the
gauge fixed form of V̂ that yields (8.10) there is no need to seek a relation between V̂ and
the other spin degrees of freedom. Therefore, in the form (8.10), if the JMN are produced
with the correct quantum algebra SU(2, 2) =SO(4, 2) in any theory, (for example bosonic
spinors, or fermions ψM , or the list of systems in section (I), or any other) then our formula
(8.3) gives the twistor transform for the corresponding degrees of freedom of that theory.
Those degrees of freedom appear as the building blocks of JMN . So, the machinery proposed
in this section contains some very powerful tools.
IX. THE UNIFYING SU(2, 3) LIE ALGEBRA
The 2T-physics action (4.1) offered the group SU(2, 3) as the most symmetric unifying
property of the spinning particles for all the systems listed in section (I), including twistors.
Here we discuss how this fundamental underlying structure governs and simplifies the quan-
tum theory.
We examine the SU(2, 3) charges J BA , Ĵ0, jA, j̄A given in (4.5,5.9,5.10). Since these are
gauge invariant under all the gauge symmetries (4.7) they are physical quantities that should
have the properties of the Lie algebra14 of SU(2, 3) in all the systems listed in section (I).
Using covariant quantization we construct the quantum version of all these charges in terms
of twistors. By using the general quantum twistor transform of the previous section, these
charges can also be written in terms of the quantized spin and phase space degrees of freedom
of any of the relevant systems.
The twistor expressions for Ĵ0,J BA are already given in Eqs.(7.2,7.5)
Ĵ0 =
A + Z̄AZA
, J BA = ZAZ̄B −
Ĵ0 + 2
δ BA . (9.1)
We have seen that at the classical level (jA)classical =
J0ZA and now we must figure out
14 Even when jA is not a conserved charge when the U(1) constraint is imposed, its commutation rules are
still the same in the covariant quantization approach, independently than the constraint.
the quantum version jA =
Ĵ0 + αZA that gives the correct SU(2, 3) closure property
jA, j̄
= J BA +
A . (9.2)
The coefficient 5
is determined by consistency with the Jacobi identity
jA, j̄
j̄B, jC
[jC , jA, ] , j̄
= 0, and the requirement that the commutators of jA with
J BA , Ĵ0 be just like those of ZA given in Eqs.(7.6,7.7), as part of the SU(2, 3) Lie algebra.
So we carry out the computation in Eq.(9.2) as follows
jA, j̄
Ĵ0 + αZAZ̄
Ĵ0 + α− Z̄B
Ĵ0 + α
Ĵ0 + αZA (9.3)
Ĵ0 + α
Ĵ0 + α− 1
Z̄BZA (9.4)
Ĵ0 + α− 1
ZA, Z̄
+ ZAZ̄
B (9.5)
= δ BA
Ĵ0 + α− 1 +
Ĵ0 + 2
+ J BA (9.6)
To get (9.4) we have used the properties ZAf
Ĵ0 + 1
ZA and Z̄
Ĵ0 − 1
Z̄B for any function f
. These follow from the commutator
Ĵ0, ZA
= −ZA
written in the form ZAĴ0 =
Ĵ0 + 1
ZA which is used repeatedly, and similarly for Z̄
B. To
get (9.6) we have used
ZA, Z̄
= δ BA and then used the definitions (9.1). By comparing
(9.6) and (9.2) we fix α = 1/2. Hence the correct quantum version of jA is
Ĵ0 +
ZA = ZA
Ĵ0 −
. (9.7)
The second form is obtained by using ZAf
Ĵ0 + 1
Note the following properties of the jA, j̄
j̄AjA =
Ĵ0 −
Ĵ0 −
Ĵ0 −
Ĵ0 − 2
(9.8)
Ĵ0 +
Ĵ0 +
Ĵ0 +
Ĵ0 + 2
(9.9)
which will be used below.
With the above arguments we have now constructed the quantum version of the SU(2, 3)
charges written as a 5× 5 traceless matrix
Ĵ2,3 =
quantum
(9.10)
B − 1
Ĵ0 +
Ĵ0 +
 , (9.11)
with Ĵ0,J given in Eq.(9.1).
At the classical level, the square of the matrix J2,3 vanishes since L2 = 0 as follows
(J2,3)
classical
= g−1
g = 0. (9.12)
At the quantum level we find the following non-zero result which is SU(2, 3) covariant
Ĵ2,3
ZZ̄ − 1
Ĵ0 +
Ĵ0 +
(9.13)
Ĵ2,3
− 1. (9.14)
By repeatedly using the same equation we can compute all powers
Ĵ2,3
, and by taking
traces we obtain the Casimir eigenvalues of the SU(2, 3) representation. For example the
quadratic Casimir is
Ĵ2,3
= −5. (9.15)
Written out in terms of the charges, Eq.(9.14) becomes
− 1. (9.16)
Collecting terms in each block we obtain the following relations among the gauge invariant
charges J , Ĵ0, j, j̄
− jj̄ + 5
+ 1 = 0, (9.17)
j − jĴ0 +
j = 0, (9.18)
−j̄j +
Ĵ0 + 1 = 0. (9.19)
Combined with the information in Eq.(9.9) the first equation is equivalent to the master
quantum equation (7.8). After using jĴ0 = Ĵ0j+ j, the second equation is equivalent to the
eigenvalue equation (8.2) whose solution is the quantum twistor transform (8.3). The third
equation is identical to (9.8).
Hence the SU(2, 3) quantum property
Ĵ2,3
Ĵ2,3
− 1, or equivalently
Ĵ2,3 + 2
Ĵ2,3 +
= 0, governs the quantum dynamics of all the sytems listed in sec-
tion (I) and captures all of the physical information, twistor transform, and dualities as a
property of a fixed SU(2, 3) representation whose generators satisfy the given constraint.
This is a remarkable simple unifying description of a diverse set of spinning systems, that
shows the existence of the sophisticated higher structure SU(2, 3) for which there was no
clue whatsoever from the point of view of 1T-physics.
X. FUTURE DIRECTIONS
One can consider several paths that generalizes our discussion, including the following.
• It is straightforward to generalize our theory by replacing SU(2, 3) with the super-
group SU(2, (2 + n) |N) . This generalizes the spinor VA to V aA where a labels the
fundamental representation of the supergroup SU(n|N) . The case of N = 0 and n = 1
is what we discussed in this paper. The case of n = 0 and any N relates to the
superparticle with N supersymmetries (and all its duals) discussed in [22] and in
[6][7]. The massless particle gauge is investigated in [17], but the other cases listed
in section (I) remain so far unexplored. The general model has global symmetry
SU(2, 2)×SU(n|N)×U(1) ⊂ [SU(2, (2 + n) |N)]R if a U(1) gauging is included, or the
full global symmetry [SU(2, (2 + n) |N)]R in its high spin version. It also has local
gauge symmetries that include bosonic & fermionic kappa symmetries embedded in
[SU(2, (2 + n) |N)]L as well as the basic Sp(2, R) gauge symmetry. The gauge sym-
metries insure that the theory has no negative norm states. In the massless particle
gauge, this model corresponds to supersymmetrizing spinning particles rather than
supersymmetrizing the zero spin particle. The usual R-symmetry group in SUSY is
replaced here by SU(n|N)×U(1) . For all these cases with non-zero n,N , the 2T-
physics and twistor formalisms unify a large class of new 1T-physics systems and
establishes dualities among them.
• One can generalize our discussion in 4+2 dimensions, including the previous paragraph,
to higher dimensions. The starting point in 4+2 dimensions was SU(2, 2) =SO(4, 2)
embedded in g =SU(2, 3) . For higher dimensions we start from SO(d, 2) and seek a
group or supergroup that contains SO(d, 2) in the spinor representation. For example
for 6+2 dimensions, the starting point is the 8×8 spinor version of SO(8∗) =SO(6, 2)
embedded in g =SO(9∗) =SO(6, 3) or g =SO(10∗) =SO(6, 4) . The spinor variables
in 6+2 dimensions VA will then be the spinor of SO(8
∗) =SO(6, 2) parametrizing the
coset SO(9∗) /SO(8∗) (real spinor) or SO(10∗) /SO(8∗)×SO(2) (complex spinor). This
can be supersymmetrized. The pure superparticle version of this program for various
dimensions is discussed in [6][7], where all the relevant supergroups are classified.
That discussion can now be taken further by including bosonic variables embedded
in a supergroup as just outlined in the previous item. As explained before [6][7], it
must be mentioned that when d + 2 exceeds 6 + 2 it seems that we need to include
also brane degrees of freedom in addition to particle degrees of freedom. Also, even in
lower dimensions, if the group element g belongs to a group larger than the minimal
one [6][7], extra degrees of freedom will appear.
• The methods in this paper overlap with those in [28] where a similar master quantum
equation technique for the supergroup SU(2, 2|4) was used to describe the spectrum of
type-IIB supergravity compactified on AdS5×S5. So our methods have a direct bearing
onM theory. In the case of [28] the matrix insertion
in the 2T-physics action was
generalized to
L(4,2)
L(6,0)
to describe a theory in 10+2 dimensions. This approach to
higher dimensions can avoid the brane degrees of freedom and concentrate only on the
particle limit. Similar generalizations can be used with our present better develped
methods and richer set of groups mentioned above to explore various corners of M
theory.
• One of the projects in 2T-physics is to take advantage of its flexible gauge fixing
mechanisms in the context of 2T-physics field theory. Applying this concept to the
2T-physics version of the Standard Model [10] will generate duals to the Standard
Model in 3+1 dimensions. The study of the duals could provide some non-perturbative
or other physical information on the usual Standard Model. This program is about to
be launched in the near future [29]. Applying the twistor techniques developed here
to 2T-physics field theory should shed light on how to connect the Standard Model
with a twistor version. This could lead to further insight and to new computational
techniques for the types of twistor computations that proved to be useful in QCD
[12][13].
• Our new models and methods can also be applied to the study of high spin theories
by generalizing the techniques in [14] which are closely related to 2T-physics. The
high spin version of our model has been discussed in many of the footnotes, and
can be supersymmetrized and written in higher dimensions as outlined above in this
section. The new ingredient from the 2T point of view is the bosonic spinor VA and
the higher symmetry, such as SU(2, 3) and its generalizations in higher dimensions or
with supersymmetry. The massless particle gauge of our theory in 3+1 dimensions
coincides with the high spin studies in [15]-[18]. Our theory of course applies broadly to
all the spinning systems that emerge in the other gauges, not only to massless particles.
The last three sections on the quantum theory discussed in this paper would apply
also in the high spin version of our theory. The more direct 4+2 higher dimensional
quantization of high spin theories including the spinor VA (or its generalizations V
is obtained from our SU(2, 3) quantum formalism in the last section.
• One can consider applying the bosonic spinor that worked well in the particle case to
strings and branes. This may provide new string backgrounds with spin degrees of
freedom other than the familiar Neveu-Schwarz or Green-Schwarz formulations that
involve fermions.
More details and applications of our theory will be presented in a companion paper [1].
We gratefully acknowledge discussions with S-H. Chen, Y-C. Kuo, and G. Quelin.
[1] I. Bars and B. Orcal, in preparation.
[2] R. Penrose, “Twistor Algebra,” J. Math. Phys. 8 (1967) 345; “Twistor theory, its aims and
achievements, in Quantum Gravity”, C.J. Isham et. al. (Eds.), Clarendon, Oxford 1975, p.
268-407; “The Nonlinear Graviton”, Gen. Rel. Grav. 7 (1976) 171; “The Twistor Program,”
Rept. Math. Phys. 12 (1977) 65.
[3] R. Penrose and M.A. MacCallum, “An approach to the quantization of fields and space-time”,
Phys. Rept. C6 (1972) 241; R. Penrose and W. Rindler, Spinors and space-time II, Cambridge
Univ. Press (1986).
[4] A. Ferber, Nucl. Phys. B 132 (1977) 55.
[5] T. Shirafuji, “Lagrangian Mechanics of Massless Particles with Spin,” Prog. Theor. Phys. 70
(1983) 18.
[6] I. Bars and M. Picon, “Single twistor description of massless, massive, AdS, and other in-
teracting particles,” Phys. Rev. D73 (2006) 064002 [arXiv:hep-th/0512091]; “Twistor Trans-
form in d Dimensions and a Unifying Role for Twistors,” Phys. Rev. D73 (2006) 064033,
[arXiv:hep-th/0512348].
[7] I. Bars, “Lectures on twistors,” [arXiv:hep-th/0601091], appeared in Superstring Theory and
M-theory, Ed. J.X. Lu, page ; and in Quantum Theory and Symmetries IV, Ed. V.K. Dobrev,
Heron Press (2006), Vol.2, page 487 (Bulgarian Journal of Physics supplement, Vol. 33).
[8] I. Bars, C. Deliduman and O. Andreev, “ Gauged Duality, Conformal Symmetry and Space-
time with Two Times” , Phys. Rev. D58 (1998) 066004 [arXiv:hep-th/9803188]. For reviews
of subsequent work see: I. Bars, “ Two-Time Physics” , in the Proc. of the 22nd Intl. Col-
loq. on Group Theoretical Methods in Physics, Eds. S. Corney at. al., World Scientific 1999,
[arXiv:hep-th/9809034]; “ Survey of two-time physics,” Class. Quant. Grav. 18, 3113 (2001)
[arXiv:hep-th/0008164]; “ 2T-physics 2001,” AIP Conf. Proc. 589 (2001), pp.18-30; AIP Conf.
Proc. 607 (2001), pp.17-29 [arXiv:hep-th/0106021].
[9] I. Bars, “ 2T physics formulation of superconformal dynamics relating to twistors and super-
twistors,” Phys. Lett. B 483, 248 (2000) [arXiv:hep-th/0004090]. “Twistors and 2T-physics,”
AIP Conf. Proc. 767 (2005) 3 [arXiv:hep-th/0502065].
[10] I. Bars, “The standard model of particles and forces in the framework of 2T-physics”, Phys.
Rev. D74 (2006) 085019 [arXiv:hep-th/0606045]. For a summary see “The Standard Model
as a 2T-physics theory,” arXiv:hep-th/0610187.
[11] I. Bars, Y-C. Kuo, “Field Theory in 2T-physics with N = 1 supersymme-
try”, arXiv:hep-th/0702089; ibid. “Supersymmetric 2T-physics field theory”,
arXiv:hep-th/0703002.
[12] F. Cachazo, P. Svrcek and E. Witten, “ MHV vertices and tree amplitudes in gauge the-
ory”, JHEP 0409 (2004) 006 [arXiv:hep-th/0403047]; “ Twistor space structure of one-
loop amplitudes in gauge theory”, JHEP 0410 (2004) 074 [arXiv:hep-th/0406177]; “Gauge
theory amplitudes in twistor space and holomorphic anomaly”, JHEP 0410 (2004) 077
[arXiv:hep-th/0409245].
[13] For a review of Super Yang-Mills computations and a complete set of references see:
F.Cachazo and P.Svrcek, “Lectures on twistor strings and perturbative Yang-Mills theory,”
PoS RTN2005 (2005) 004, [arXiv:hep-th/0504194].
http://arxiv.org/abs/hep-th/0512091
http://arxiv.org/abs/hep-th/0512348
http://arxiv.org/abs/hep-th/0601091
http://arxiv.org/abs/hep-th/9803188
http://arxiv.org/abs/hep-th/9809034
http://arxiv.org/abs/hep-th/0008164
http://arxiv.org/abs/hep-th/0106021
http://arxiv.org/abs/hep-th/0004090
http://arxiv.org/abs/hep-th/0502065
http://arxiv.org/abs/hep-th/0606045
http://arxiv.org/abs/hep-th/0610187
http://arxiv.org/abs/hep-th/0702089
http://arxiv.org/abs/hep-th/0703002
http://arxiv.org/abs/hep-th/0403047
http://arxiv.org/abs/hep-th/0406177
http://arxiv.org/abs/hep-th/0409245
http://arxiv.org/abs/hep-th/0504194
[14] M. A. Vasiliev, JHEP 12 (2004) 046, [hep-th/0404124].
[15] S. Fedoruk, J. Lukierski, “Massive relativistic particle models with bosonic counterpart of
supersymmetry,” Phys.Lett. B632 (2006) 371-378 [hep-th/0506086].
[16] S. Fedoruk, E. Ivanov, “Master Higher-spin particle,” Class. Quant. Grav. 23 (2006) 5195-5214
[hep-th/0604111].
[17] S. Fedoruk, E. Ivanov, J. Lukierski, “Massless higher spin D=4 superparticle with both
N=1 supersymmetry and its bosonic counterpart,” Phys. Lett. B641 (2006) 226-236
[hep-th/0606053].
[18] S. Fedoruk, E. Ivanov, “New model of higher-spin particle,” [hep-th/0701177].
[19] V.G. Zima, S. Fedoruk, “Spinor (super)particle with a commuting index spinor”, JETP Lett.
61 (1995) 251-256.
[20] I. Bars and A. Hanson, Phys. Rev. D13 (1976) 1744.
[21] R. Casalbuoni, Phys. Lett. 62B (1976) 49; ibid. Nuovo Cimento 33A (1976) 389; L. Brink and
J. Schwarz, Phys. Lett. 100B (1981) 310 .
[22] I. Bars, C. Deliduman and D. Minic, “Supersymmetric Two-Time Physics”, Phys. Rev. D59
(1999) 125004, hep-th/9812161; “Lifting M-theory to Two-Time Physics”, Phys. Lett. B457
(1999) 275 [arXiv:he:hep-th/9904063].
[23] I. Bars, “Twistor superstring in 2T-physics,” Phys. Rev. D70 (2004) 104022,
[arXiv:hep-th/0407239].
[24] I. Bars, “Twistors and 2T-physics,” AIP Conf. Proc. 767 (2005) 3 , [arXiv:hep-th/0502065].
[25] I. Bars and Y-C. Kuo, “Interacting two-time Physics Field Theory with a BRST gauge In-
variant Action”, hep-th/0605267.
[26] I. Bars, “Conformal symmetry and duality between free particle, H-atom and harmonic oscilla-
tor”, Phys. Rev. D58 (1998) 066006 [arXiv:hep-th/9804028]; “Hidden Symmetries, AdSd×Sn,
and the lifting of one-time physics to two-time physics”, Phys. Rev. D59 (1999) 045019
[arXiv:hep-th/9810025].
[27] I. Bars and C. Deliduman, Phys. Rev. D58 (1998) 106004, [arXiv:he:hep-th/9806085.]
[28] I. Bars, “ Hidden 12-dimensional structures in AdS5 x S
5 and M4 x R6 supergravities,” Phys.
Rev. D 66, 105024 (2002) [arXiv:hep-th/0208012]; “ A mysterious zero in AdS(5) x S
5 super-
gravity,” Phys. Rev. D 66, 105023 (2002) [arXiv:hep-th/0205194].
[29] I. Bars and G. Quelin, in prepapartion.
http://arxiv.org/abs/hep-th/0404124
http://arxiv.org/abs/hep-th/0506086
http://arxiv.org/abs/hep-th/0604111
http://arxiv.org/abs/hep-th/0606053
http://arxiv.org/abs/hep-th/0701177
http://arxiv.org/abs/hep-th/9812161
http://arxiv.org/abs/hep-th/9904063
http://arxiv.org/abs/hep-th/0407239
http://arxiv.org/abs/hep-th/0502065
http://arxiv.org/abs/hep-th/0605267
http://arxiv.org/abs/hep-th/9804028
http://arxiv.org/abs/hep-th/9810025
http://arxiv.org/abs/hep-th/9806085
http://arxiv.org/abs/hep-th/0208012
http://arxiv.org/abs/hep-th/0205194
	Spinning Particles in 3+1 - Beyond Free and Massless 
	Twistor Lagrangian
	Massless Particle With Any Spin in 3+1 Dimensions
	2T-physics With Sp( 2,R) , SU(2,3) and Kappa Symmetries
	2T-physics Action with XM,PM,VA,A in 4+2 Dimensions
	General Twistor Transform (Classical)
	Quantum Master Equation, Spectrum, and Dualities
	Quantum Twistor Transform
	The Unifying SU( 2,3)  Lie algebra
	Future Directions
	References
ABSTRACT
  A generalized twistor transform for spinning particles in 3+1 dimensions is
constructed that beautifully unifies many types of spinning systems by mapping
them to the same twistor, thus predicting an infinite set of duality relations
among spinning systems with different Hamiltonians. Usual 1T-physics is not
equipped to explain the duality relationships and unification between these
systems. We use 2T-physics in 4+2 dimensions to uncover new properties of
twistors, and expect that our approach will prove to be useful for practical
applications as well as for a deeper understanding of fundamental physics.
Unexpected structures for a new description of spinning particles emerge. A
unifying symmetry SU(2,3) that includes conformal symmetry SU(2,2)=SO(4,2) in
the massless case, turns out to be a fundamental property underlying the
dualities of a large set of spinning systems, including those that occur in
high spin theories. This may lead to new forms of string theory backgrounds as
well as to new methods for studying various corners of M theory. In this paper
we present the main concepts, and in a companion paper we give other details.

<|endoftext|><|startoftext|>
Mon. Not. R. Astron. Soc. 000, 1–?? (2007) Printed 20 August 2019 (MN LATEX style file v2.2)
Remnant evolution after a carbon-oxygen white dwarf merger
S.-C. Yoon1,2⋆, Ph. Podsiadlowski3 and S. Rosswog4
1Astronomical Institute ”Anton Pannekoek”, University of Amsterdam, Kruislaan 403, 1098 SJ, Amsterdam, The Netherlands
2Department of Astronomy & Astrophysics, University of California, Santa Cruz, CA95064, USA
3Department of Astrophysics, University of Oxford, Keble Road, Oxford OX1 3RH, UK
4School of Engineering and Science, Jacobs University Bremen†, Campus Ring1, Bremen 28759, Germany
Accepted/ Received
ABSTRACT
We systematically explore the evolution of the merger of two carbon-oxygen (CO) white
dwarfs. The dynamical evolution of a 0.9 M⊙ + 0.6 M⊙ CO white dwarf merger is followed
by a three-dimensional SPH simulation. The calculation uses a state-of-the art equation of
state that is coupled to an efficient nuclear reaction network that accurately approximates all
stages from helium burning up to nuclear statistical equilibrium. We use an elaborate pre-
scription in which artificial viscosity is essentially absent, unless a shock is detected, and a
much larger number of SPH particles than earlier calculations. Based on this simulation, we
suggest that the central region of the merger remnant can, once it has reached quasi-static
equilibrium, be approximated as a differentially rotating CO star, which consists of a slowly
rotating cold core and a rapidly rotating hot envelope surrounded by a centrifugally supported
disc. We construct a model of the CO remnant that mimics the results of the SPH simulation
using a one-dimensional hydrodynamic stellar evolution code and then follow its secular evo-
lution, where we include the effects of rotation on the stellar structure and the transport of
angular momentum. The influence of the Keplerian disc is implicitly treated by considering
mass accretion from the disc onto the hot envelope. The stellar evolution models indicate that
the growth of the cold core is controlled by neutrino cooling at the interface between the core
and the hot envelope, and that carbon ignition in the envelope can be avoided despite high
effective accretion rates. This result suggests that the assumption of forced accretion of cold
matter that was adopted in previous studies of the evolution of double CO white dwarf merger
remnants may not be appropriate. Specifically we find that off-center carbon ignition, which
would eventually lead to the collapse of the remnant to a neutron star, can be avoided if the
following conditions are satisfied: (1) when the merger remnant reaches quasi-static equilib-
rium, the local maximum temperature at the interface between the core and the envelope must
be lower than the critical limit for carbon-ignition. (2) Angular-momentum loss from the cen-
tral merger remnant should not occur on a time scale shorter than the local neutrino cooling
time scale at the interface. (3) The mass-accretion rate from the centrifugally supported disc
must be sufficiently low (Ṁ . 5 × 10−6...10−5 M⊙ yr
−1). Our results imply that at least
some products of double CO white dwarfs merger may be considered good candidates for
the progenitors of Type Ia supernovae. In this case, the characteristic time delay between the
initial dynamical merger and the eventual explosion would be ∼ 105 yr.
Key words: Stars: evolution – Stars: white dwarf – Stars: accretion – Supernovae: general –
1 INTRODUCTION
The coalescence of two carbon-oxygen (CO) white dwarfs with
a combined mass in excess of the Chandrasekhar limit has long
been considered a promising path towards a Type Ia supernova (SN
Ia; Iben & Tutukov 1984; Webbink 1984). Indeed, in the last few
⋆ E-mail: scyoon@science.uva.nl (SCY); podsi@astro.ox.ac.uk (PhP);
s.rosswog@iu-bremen.de (SR)
† formerly International University Bremen
years, a few massive double CO white dwarf systems have been
found that have periods short enough for them to merge within
a Hubble time (e.g. Napiwotzki et al. 2002, 2004). This double-
degenerate (DD) scenario can also easily explain the lack of hy-
drogen and helium lines in most SN Ia spectra and the occur-
rence of SNe Ia both in old and young star-forming systems (e.g.
Branch et al. 1995).
Theoretically, the final fate of double CO white dwarf merg-
ers has been much debated. Previous studies assumed that the dy-
namical disruption of the Roche-lobe filling secondary should lead
c© 2007 RAS
http://arxiv.org/abs/0704.0297v2
2 S.-C. Yoon, Ph. Podsiadlowski & S. Rosswog
Figure 1. Schematic illustration of the configuration of the remnant of a
double CO white dwarf merger once quasi-static equilibrium has been es-
tablished.
to the formation of a thick disc around the primary white dwarf
(Tutukov & Yungelson 1979; Mochkovitch & Livio 1989, 1990).
Therefore, accretion of CO-rich matter from the thick disc onto
the central cold white dwarf has been studied for investigating the
evolution of such mergers by many authors (Nomoto & Iben 1985;
Saio & Nomoto 1985, 1998, 2004; Piersanti et al. 2003a,b). As ac-
cretion rates from the thick disc should be close to the Edding-
ton limit (Ṁ ≈ 10−5 M⊙yr
−1), most of those studies concluded
that carbon ignition in the envelope of the accreting white dwarf
is an inevitable consequence of such rapid accretion of CO-rich
matter. Once carbon ignites off-center, the burning flame propa-
gates inwards on a relatively short time scale (∼ 5000 yr), and
the CO white dwarf is transformed into an ONeMg white dwarf
(Saio & Nomoto 1985, 1998). When the mass of the ONeMg white
dwarf approaches the Chandrasekhar limit, electron capture onto
Ne and Mg is expected to lead to the gravitational collapse of
the white dwarf to a neutron star (Nomoto & Kondo 1991; see
Dessart et al. 2006 and Kitaura, Janka & Hillebrandt 2006 for re-
cent studies of such collapse).
However, the evolution of the remnants of double CO white
dwarf mergers is not yet well understood. For instance, it has
been debated whether the accretion rate decreases when the ac-
creting white dwarf reaches critical rotation (Piersanti et al. 2003a;
Saio & Nomoto 2004). More importantly, the canonical descrip-
tion of the merger remnant as a primary white dwarf + thick
disc system is clearly an oversimplification. In previous three-
dimensional smoothed particle hydrodynamics (SPH) simula-
tions (Benz et al. 1990; Segretain, Chabrier & Mochkovitch 1997;
Guerrero et al. 2004; see also Sect. 2), a large fraction of the dis-
rupted secondary and the outermost layers of the primary form an
extended hot envelope around the cold core containing most of the
primary mass. The rest of the secondary mass becomes a centrifu-
gally supported disc in the outermost layers of the merger rem-
nant. Interestingly, the merger remnant reaches a state of quasi-
static equilibrium within a few minutes from the onset of the dy-
namical disruption of the secondary. As the structure of the cold
core plus the hot envelope appears to have a fairly spheroidal shape
(see below) rather than the toroidal shape obtained with a zero-
temperature equation of state (Mochkovitch & Livio 1989, 1990),
the merger remnant may be better described as a differentially ro-
tating single CO star consisting of a slowly rotating cold core and
a rapidly rotating hot extended envelope surrounded by a Keple-
rian disc, as illustrated in Fig. 1, than the previously adopted pri-
mary white dwarf + thick disc system. The further evolution of the
merger must therefore be determined by the thermal cooling of the
hot envelope and the redistribution of the angular momentum in-
side the central remnant, and accretion of matter onto the envelope
from the Keplerian disc.
With this new approach to the problem in mind, we here re-
visit both the dynamical and the secular evolution of double CO
white dwarf mergers. In the following section (Sect. 2), we present
the numerical results of an SPH simulation of the dynamical evolu-
tion of the coalescence of a 0.9 M⊙ WD and a 0.6 M⊙ CO white
dwarf up to the stage of quasi-hydrostatic equilibrium, and we care-
fully investigate the structure of the merger remnant. In Sect. 3, we
construct models of the central remnant in quasi-static equilibrium
state (primary + hot extended envelope) which mimic the SPH re-
sult and calculate the thermal evolution of the merger remnant using
a hydrodynamic stellar evolution code. In particular, the conditions
for avoiding off-center carbon ignition are systematically explored.
In Sect. 4, we conclude this work by discussing uncertainties in our
assumptions, the implications for Type Ia supernovae and future
work.
2 DYNAMICAL EVOLUTION OF THE MERGER
Before discussing the subsequent thermal evolution after the co-
alescence of a double CO white dwarf coalescence binary, we
investigate the configuration of the remnant in quasi-static equi-
librium in some detail. For this purpose, we have carried out
a SPH simulation of the dynamical process of the coalescence
of two CO white dwarfs of 0.9 M⊙ and 0.6 M⊙, respec-
tively. Our simulation uses a 3D smoothed particle hydrodynam-
ics (SPH) code that is an offspring of a code developed to simu-
late neutron star mergers (Rosswog et al. 2000; Rosswog & Davies
2002; Rosswog & Liebendörfer 2003). It uses an artificial viscos-
ity scheme with time-dependent parameters (Morris & Monaghan
1997). In the absence of shocks, the viscosity parameters have a
very low value (α = 0.05 and β = 0.1; most SPH implemen-
tations use values of α = 1...1.5 and β = 2...3); if a shock is
detected, a source term (Rosswog et al. 2000) guarantees that the
parameters rise to values that are able to resolve the shock prop-
erly without spurious post-shock oscillations. To suppress artificial
viscosity forces in pure shear flows, we additionally apply a switch
originally suggested by Balsara (1995).
To account for the energetic feedback onto the fluid from
nuclear transmutations, we use a minimal nuclear reaction net-
work developed by Hix et al. (1998). It couples a conventional α-
network stretching from He to Si with a quasi-equilibrium-reduced
α-network. Although a set of only seven nuclear species is used,
this network reproduces the energy generation of all burning stages
from He-burning to NSE very accurately. For details and tests we
refer to Hix et al. (1998). We use the HELMHOLTZ equation of
state (EOS), developed by the Center for Astrophysical Thermonu-
clear Flashes at the University of Chicago. This EOS allows to
freely specify the chemical composition of the gas and can be cou-
pled to nuclear reaction networks. The electron/positron equation
of state has been calculated without approximations, i.e. it makes
no assumptions about the degree of degeneracy or relativity; the
exact expressions are integrated numerically to machine precision.
The nuclei in the gas are treated as a Maxwell-Boltzmann gas,
the photons as blackbody radiation. The EOS is used in tabular
form with densities ranging from 10−10 6 ρYe 6 10
11 g cm−3
and temperatures from 104 to 1011 K. A sophisticated, biquin-
tic Hermite polynomial interpolation is used to enforce thermody-
namic consistency (i.e. the Maxwell-relations) at interpolated val-
ues (Timmes & Swesty 2000).
We use a MacCormack predictor-corrector method (e.g.
Lomax et al. 2001) with individual particle time steps to evolve
the fluid. With our standard parameters for the tree-opening cri-
c© 2007 RAS, MNRAS 000, 1–??
Remnant evolution after a carbon-oxygen white dwarf merger 3
Figure 2. Dynamical evolution of the coalescence of a 0.6 M⊙ + 0.9 M⊙ CO white dwarf binary. The panels in the left column show the density in the
orbital plane, the panels in the right column the temperature in units of 106 K. Lengths are in code units (= 109 cm).
c© 2007 RAS, MNRAS 000, 1–??
4 S.-C. Yoon, Ph. Podsiadlowski & S. Rosswog
Figure 3. Dynamical evolution of the coalescence of a 0.6 M⊙ + 0.9 M⊙ CO white dwarf binary. Continued from Fig. 2.
c© 2007 RAS, MNRAS 000, 1–??
Remnant evolution after a carbon-oxygen white dwarf merger 5
Figure 4. The evolution of the local peak of temperature during the merger
of two CO white dwarfs of 0.6M⊙ and 0.9M⊙, respectively, as a function
of time after the onset of the simulation.
terion and the integration, this time marching implementation con-
serves the total energy to better than 4 × 10−3 and the total an-
gular momentum to better than 2 × 10−4. Note that this could,
in principle, be improved even further by taking into account the
so-called “grad-h”-terms (Springel & Hernquist 2002; Monaghan
2002; Price 2004) and extra-terms arising from adapting gravita-
tional smoothing terms (Price & Monaghan 2006).
To avoid numerical artifacts, we only use equal mass SPH
particles. For the initial conditions, we therefore stretch a uniform
particle distribution according to a function that has been derived
from solving the 1D stellar structure equations. This technique is
described in detail in Rosswog, Ramirez-Ruiz & Hix (2007). This
particle setup is then further relaxed with an additional damping
force (e.g. Rosswog, Speith & Wynn 2004) so that the particles can
settle into their true equilibrium configuration. The calculations are
performed with 2×105 SPH particles, a much larger particle num-
ber than could be afforded by previous calculations, and run up to
a much longer evolutionary time (5 minutes) than previous calcu-
lations (see Table 1).
Figs. 2 and 3 show the dynamical evolution of the merging
process of the double white dwarf system considered in this study.
The panel in the left columns show the densities and the panel in
the right columns the temperatures (in units of 106 K) in the orbital
plane. The secondary is completely disrupted within 1.7 minutes,
and mass accretion onto the primary induces local heating near the
surface of the primary. Fig. 4 shows the evolution of the maximum
temperature as a function of time. The peak in temperature reaches
1.7×109 K at t ≃ 1.0 min, where t = 0.0 marks the moment when
the simulation starts. Carbon ignites when T & 109 K, but nuclear
burning is quenched soon due to the local expansion of the hottest
region, as is also observed in the simulations of Guerrero et al.
(2004). The total amount of energy released due to nuclear burn-
ing is about 1045 erg.
Segretain, Chabrier & Mochkovitch (1997) considered the
same initial white dwarf masses as in the present study. But
they adopted the original artificial viscosity prescription of
Monaghan & Varnas (1988), which is known to introduce spuri-
ous forces in shear flows, and they did not include nuclear burning
(Table 1). By the end of their calculation (t = 1.56 min), Tmax
reached 8×108 K, while in our simulation, it decreases to 8×108 K
only when t ∼ 1.7 min. Interestingly, Tmax decreases further af-
Figure 5. Top: Density contour of the merger remnant in the x− z plane at
t = 5.3 min. Here one code unit corresponds to 109 cm. Middle: Thermo-
dynamic structure of the merger remnant at t = 5.31 min: shown are the
temperature and the density as a function of distance from the centre, along
the positive x- and z-axis, as indicated. Bottom: Angular velocity in units
of the local Keplerian value at t = 5.31 min, along the positive/negative
x- and y-axis of the merger remnant.
terwards in our calculation, as shown in Fig. 4, and reaches a steady
value at Tmax ≃ 5.6 × 10
8 K when t & 2.5min. In the other cal-
culations by Benz et al. (1990) and Guerrero et al. (2004), the dy-
namical evolution of the merger was not followed for more than 2
minutes either, and we cannot directly compare our results to theirs.
However, we suspect that Tmax would also decrease further in the
systems they considered if they had continued their calculations for
a longer evolutionary time. It should also be noted that energy dis-
sipation by artificial viscosity might lead to overheating, and that
c© 2007 RAS, MNRAS 000, 1–??
6 S.-C. Yoon, Ph. Podsiadlowski & S. Rosswog
thermal diffusion – which may play an important role in the outer-
most layers – is not considered in the present study. It is thus likely
that Tmax in the quasi-static equilibrium state may be even lower
in reality than in our simulation.
Fig. 5 shows the structure of the merger remnant at quasi-static
equilibrium. The central region with R . 109 cm (Mr . 1.1 M⊙)
has a fairly spheroidal shape, and a centrifugally supported disc
appears at R & 109 cm where the angular velocity is close to
the Keplerian value. The fraction of the secondary mass contained
in the Keplerian disc is larger in our simulation (about 67%) than
in Segretain, Chabrier & Mochkovitch (1997) (about 41 %). The
innermost core (R . 3 × 108 cm; Mr . 0.6 M⊙) is essen-
tially isothermal, and the temperature has its peak value (T ≃
5.6 × 108 K) at R ≃ 5 × 108 cm and Mr ≃ 0.85 M⊙. The
disc material extends over 4× 109 cm along the z-axis as the tem-
perature is still high; if thermal diffusion were included, the disc
would become much thinner on a short time scale of a few hours.
Therefore, our simulation confirms the remnant structure at quasi-
static equilibrium that is illustrated in Fig. 1. In the next section, we
investigate the secular evolution of the merger from such a quasi-
static equilibrium state.
3 SECULAR EVOLUTION OF THE MERGER REMNANT
3.1 Physical assumptions and methods
Our SPH simulation shows that the remnant of the merger of two
CO white dwarfs (0.9 M⊙ + 0.6 M⊙) in the state of quasi-static
equilibrium has the following features (see Fig. 6):
(i) The core is cold and nearly isothermal.
(ii) The local peak of temperature (Tp) is located at a mass co-
ordinate slightly less than the primary mass.
(iii) A steep gradient in temperature appears at the interface be-
tween the core and the local peak of temperature.
(iv) The interface is rather widely extended into the primary
(∆Minterface ≈ 33 % of the primary mass), and the mass of the
quasi-isothermal cold core (Mcore) is about 77 % of the primary
mass.
(v) The mass of the outer envelope above the local peak of tem-
perature contains about 33 % of the mass of the secondary, and the
rest of the secondary forms a Keplerian disc.
Let us define Tp as the local peak of temperature at quasi-static
equilibrium, MCM as the mass of the central remnant (cold core +
hot envelope), and Mp as the location of Tp in the mass coordi-
nate (i.e., Mp = Mcore + ∆Minterface; see Fig. 6). To construct
models of the central remnant, we use a one-dimensional hydrody-
namic stellar evolution code which incorporates the effects of ro-
tation on the stellar structure, transport of angular momentum due
to the shear instability, Eddington-Sweet circulation, and the Gol-
dreich, Schubert and Fricke instability, and dissipation of rotational
energy due to shear motions. The effects of magnetic fields are ne-
glected (see Sec. 4). More details about the code are described in
(Yoon & Langer 2004; hereafter YL04) and references therein.
In order to mimic the temperature profile of the central rem-
nant as obtained from the SPH simulation, we artificially deposit
energy in the envelope, using the following prescription for a white
dwarf with M = MCR:
e(Mr) = A(T
(Mr)− T (Mr)) [erg g
], (1)
where
Figure 6. Initial model of the central remnant for sequences Sa1 – Sa11.
The top and middle panels show temperature as a function of the mass co-
ordinate and radius, respectively. The solid curve in the bottom panel gives
the angular-velocity profile as a function of radius. The dashed curve de-
notes the angular velocity in units of the local Keplerian value.
T ′(Mr) =
3 · 107 K+ (7 · 107 K− 3 · 107 K)
Mcore
if Mr < Mcore,
Tp − (Tp − 7 · 10
Mp−Mr
Mp−Mcore
if Mcore 6 Mr 6 Mp,
C − (C − Tp)
log[ρ(Mr)/ρs]
log[ρ(Mp)/ρs]
if Mr > Mp.
In this way, the temperature profile in the central remnant model
follows T ′(Mr). Here, A and C are constants. We use A =
105 erg g−1 s−1 K−1 and C = 2× 108 K in most cases.
A rotational profile is imposed as
c© 2007 RAS, MNRAS 000, 1–??
Remnant evolution after a carbon-oxygen white dwarf merger 7
Table 1. Comparison of SPH simulations of double CO white dwarf mergers. The columns list: M1 and M2: the masses of the primary and the secondary,
respectively; NoP: the total number of particles used; νsph: the type of artificial viscosity employed, ‘std.’ refers to Monaghan & Varnas (1988); Network:
type of nuclear network employed; tsim: evolutionary time that has elapsed by the end of the calculation; Tmax: maximum temperature obtained during the
simulation; and Tp: the local peak of temperature at the end of the calculation
Ref.∗ M1 M2 NoP νsph Network tsim Tmax TP
1 1.2 M⊙ 0.9 M⊙ ∼ 7× 103 std. None 51 sec. ? ∼ 109 K
2 0.8 M⊙ 0.6 M⊙ ∼ 4× 104 std. + Balsara-switch alpha network 50 sec. 1.4× 109 K ?
2 1.0 M⊙ 0.6 M⊙ ∼ 4× 104 std. + Balsara-switch alpha network 65 sec. 1.6× 109 K ?
2 1.0 M⊙ 0.8 M⊙ ∼ 4× 104 std. + Balsara-switch alpha network 65 sec. 2.0× 109 K ?
3 0.9 M⊙ 0.6 M⊙ ∼ 6× 104 std None 1.56 min. ? ∼ 7× 108 K
4 0.9 M⊙ 0.6 M⊙ 2× 105 see Rosswog et al. (2000) QSE-alpha network 5.3 min. 1.7× 109 K 5.6× 108 K
∗1: Benz et al. (1990), 2: Guerrero et al. (2004), 3: Segretain, Chabrier & Mochkovitch (1997); 4: Present Study
Table 2. Merger remnant model sequences. Each column lists the following: No.: sequence label, MCR: mass of the central remnant, Mcore: mass of the
quasi-isothermal core, Mp: location of the local peak of temperature in the mass coordinate, Tp: the local peak of temperature, ρp: density at Mr = Mp,
τJ: adopted time scale for angular momentum loss according to Eq. (4), Ṁacc: adopted mass accretion rate from the Keplerian disc, C-ig: off-center ignition
of carbon, MWD,ig: total mass of the central remnant when off-center carbon ignition occurs, Mr,ig: location of off-center carbon ignition in the mass
coordinate.
No. MCR Mcore Mp Tp ρp τJ Ṁacc C-ig MWD,ig Mr,ig
[M⊙] [M⊙] [M⊙] [108 K] [106 g/cm3] [yr] 10−6 M⊙/yr [M⊙] [M⊙]
Sa1 1.11 0.6 0.84 5.6 0.8 ∞ 0.0 No - -
Sa2 1.11 0.6 0.84 5.6 0.8 102 0.0 Yes 1.11 0.80
Sa3 1.11 0.6 0.84 5.6 0.8 103 0.0 Yes 1.11 0.80
Sa4 1.11 0.6 0.84 5.6 0.8 104 0.0 Yes 1.11 0.85
Sa5 1.11 0.6 0.84 5.6 0.8 105 0.0 No - -
Sa6 1.11 0.6 0.84 5.6 0.8 105 10.0 Yes 1.34 1.09
Sa7 1.11 0.6 0.84 5.6 0.8 105 5.0 Yes 1.34 1.20
Sa8 1.11 0.6 0.84 5.6 0.8 105 2.0 No - -
Sa9 1.11 0.6 0.84 5.6 0.8 105 1.0 No - -
Sa10 1.11 0.6 0.84 5.6 0.8 5 · 105 5.0 No - -
Sa11 1.11 0.6 0.84 5.6 0.8 5 · 105 1.0 No - -
Aa1 1.25 0.6 0.93 5.0 2.3 ∞ 0.0 No - -
Aa2 1.25 0.6 0.93 5.0 2.3 102 0.0 Yes 1.250 0.90
Aa3 1.25 0.6 0.93 5.0 2.3 103 0.0 Yes 1.250 0.92
Aa4 1.25 0.6 0.93 5.0 2.3 104 0.0 Yes 1.250 1.12
Aa5 1.25 0.6 0.92 5.0 2.3 105 0.0 No - -
Aa6 1.25 0.6 0.92 5.0 2.3 106 0.0 No - -
Aa7 1.25 0.6 0.92 5.0 2.3 105 10.0 Yes 1.360 1.20
Aa8 1.25 0.6 0.92 5.0 2.3 105 5.0 No - -
Aa9 1.25 0.6 0.92 5.0 2.3 105 1.0 No - -
Aa10 1.25 0.6 0.92 5.0 2.3 106 10.0 Yes 1.382 1.22
Ab1 1.25 0.7 0.92 5.0 3.1 ∞ 0.0 No - -
Ab2 1.25 0.7 0.92 5.0 3.1 103 0.0 Yes 1.250 0.97
Ab3 1.25 0.7 0.92 5.0 3.1 104 0.0 No - -
Ab4 1.25 0.7 0.92 5.0 3.1 105 0.0 No - -
Ab5 1.25 0.7 0.92 5.0 3.1 105 10.0 Yes 1.344 1.21
Ab6 1.25 0.7 0.92 5.0 3.1 105 5.0 No - -
Ac1 1.25 0.5 0.88 5.9 1.6 ∞ 0.0 Yes 1.250 0.84
Ad1 1.25 0.6 0.92 6.0 1.8 ∞ 0.0 Yes 1.250 0.90
Ad2 1.25 0.6 0.92 6.0 1.8 106 5.0 Yes 1.252 0.90
Ae1 1.25 0.6 0.90 6.8 1.5 ∞ 0.0 Yes 1.250 0.87
Ba1 1.363 0.82 0.95 5.0 12.2 ∞ 0.0 No - -
Ba2 1.363 0.82 0.95 5.0 12.2 102 0.0 Yes 1.363 0.95
Ba3 1.363 0.82 0.95 5.0 12.2 103 0.0 Yes 1.363 1.12
Ba4 1.363 0.82 0.95 5.0 12.2 104 0.0 No - -
Ba5 1.363 0.82 0.95 5.0 12.2 105 0.0 No - -
Ba6 1.363 0.82 0.95 5.0 12.2 105 10.0 Yes 1.398 1.34
Ba7 1.363 0.82 0.95 5.0 12.2 105 5.0 Yes 1.483 1.43
Ba8 1.363 0.82 0.95 5.0 12.2 105 1.0 No - -
Ta1 1.25 0.60 0.86 5.0 28.8 ∞ 0.0 No - -
c© 2007 RAS, MNRAS 000, 1–??
8 S.-C. Yoon, Ph. Podsiadlowski & S. Rosswog
Figure 7. (a) Evolution of a non-rotating white dwarf accreting with a constant accretion rate of Ṁ = 10−5 M⊙ yr−1 with an initial mass of 0.9 M⊙
(Seq. N0.9) in the density – temperature plane. (b) The local effective accretion rate (Ṁeff,r := 4πr
2ρv) as a function of density in Seq. N0.9, at different
evolutionary epochs as indicated by the labels. (c) – (f) The rates of energy loss/production due to neutrino (ǫν ) cooling, compressional heating (ǫcomp),
nuclear energy generation (ǫnuc) and thermal diffusion (ǫth) at different evolutionary epochs. Note that here ǫν , ǫcomp and ǫnuc represent the values which
are used in the evolutionary calculations, while ǫth is an order-of-magnitude estimate according to Eq. (7).
Ω(Mr) =
ΩO, if Mr < Mcore,
ωO + (1− ωO)
Mr−Mcore
MCR−Mcore
if Mr > Mcore,
where ΩO = 0.2
GMCR/R3 and ωO = ΩO/
GMcore/r
core.
As shown in Fig. 6, this simple assumption gives a rotational ve-
locity profile that is morphologically similar to that found in the
SPH simulation: a steep gradient at the interface between the core
and the envelope, and a local peak in the envelope. Within our 1-D
approximation of the effects of rotation, the exact shape of the ro-
tational velocity profile does not affect the main conclusions of the
present work for the following reasons. Firstly, the velocity gradient
at the interface is adjusted to the threshold value for the dynamical
shear instability on a very short time scale (see below, and discus-
sions in YL04). Secondly, our 1-D approximation underestimates
the effect of the centrifugal force on the stellar structure in layers
which rotate more rapidly than about 60 % critical (YL04), and un-
certainties due to this limit are much greater than due to the shape
of Ω(r) in the outer layers of the envelope. Possible uncertainties
due to this limitation are critically discussed in Sect. 4.
The central remnant may lose angular momentum by out-
ward angular momentum transport into the Keplerian disc
(Popham & Narayan 1991; Paczyński 1991) and/or by the gravita-
tional wave radiation, e.g., due to the r-mode instability (Andersson
1998; Friedman & Morsink 1998). Our code cannot properly de-
scribe any of these effects, and here we consider them simply by
assuming a constant time scale for the angular momentum loss (τJ;
see Knaap 2004; cf. Piersanti et al. 2003a), such that the specific
c© 2007 RAS, MNRAS 000, 1–??
Remnant evolution after a carbon-oxygen white dwarf merger 9
Table 3. Accreting white dwarf model sequences with a constant accretion
rate (Ṁ = 10−5 M⊙/yr). The columns list: No: sequence label, Minit:
initial mass, logLinit/L⊙ : initial luminosity, MWD,ig: the total mass of
the white dwarf when carbon ignites off-center, and Mr,ig: location of car-
bon ignition in the mass coordinate. Sequences with ‘N’ are for non-rotating
models, and ‘R’ for rotating models.
No Minit logLinit/L⊙ MWD,ig Mr,ig
N0.7 0.7 −2.118 0.999 0.793
N0.8 0.8 −2.128 1.010 0.862
N0.9 0.9 −2.188 1.039 0.939
N1.0 1.0 −2.137 1.087 1.024
N1.1 1.1 −2.170 1.150 1.114
N1.2 1.2 −2.119 1.225 1.207
R0.8 0.8 −2.114 1.297 1.038
R0.9 0.9 −2.119 1.249 1.050
R1.0 1.0 −2.082 1.207 1.069
R1.1 1.1 −2.050 1.205 1.127
angular momentum of each mass shell decreases over a time step
∆t by an amount
∆ji = ji [1− exp(−∆t/τJ)] . (4)
Mass accretion from the Keplerian disc is also considered in
some model sequences, with different values for the accretion rate
(Ṁacc). The angular-momentum accretion is treated in the same
way as in YL04: the accreted matter is assumed to carry angular
momentum at a value close to the Keplerian value if the surface
velocity of the central remnant is below critical, while no angular-
momentum accretion is allowed otherwise.
Model sequences with different sets of MCR, Mcore, Mp, τJ,
Ṁ , and Tp are calculated, as summarized in Table 2. The initial
model in Seqs S is intended to reproduce the result of our SPH sim-
ulation, where MCR = 1.10 M⊙ and Mp ≈ 0.84 M⊙ are adopted.
We also assume MCR = 1.25 M⊙ and Mp ≈ 0.9 M⊙ in Seq. A,
and MCR = 1.364 M⊙ and Mp = 0.95 M⊙ in Seq. B, to simulate
mergers of 0.9 − 1.0 M⊙ + 0.7 − 1.0 M⊙ white dwarf binaries.
At a given MCR, different sets of Mcore, Mp, and Tp are marked
in the sequence label by minor characters (a, b, c, d, e), while dif-
ferent sets of τJ and Ṁacc are indicated by Arabian numbers. For
instance, sequences Sa1 – Sa11 have the same initial merger model,
but different values for τJ and Ṁacc. Rotation is neglected in a test
sequence Ta1 (i.e., the models are non-rotating). The temperature
and angular-velocity profiles in the initial central remnant model of
Seqs Sa1 - Sa11 are shown in Fig. 6. The temperature (a few to sev-
eral 108 K) and the size (∼ 109 cm) of the envelope appear to be
comparable to those obtained from the SPH simulation (see Fig. 5).
For comparison, we also ran model sequences for classi-
cal cold-matter accretion with a constant accretion rate of Ṁ =
10−5 M⊙/yr, for both non-rotating and rotating cases, as summa-
rized in Table 3.
3.2 Results
3.2.1 Classical models of cold-matter accretion
Before discussing the central remnant models, let us first investi-
gate the evolution of classical cold-matter accreting white dwarf
models in detail. In these models, the accreted matter is assumed to
have the same entropy as the surface value of the accreting white
dwarf. As shown in previous studies (e.g. Nomoto & Iben 1985),
the thermal evolution of rapidly accreting white dwarfs is deter-
Figure 8. Local effective mass accretion rate (Ṁeff,r ≡ 4πr
2ρv) as a
function of density in the models of sequence R0.9, at different evolutionary
epochs.
mined by the interplay of compressional heating and thermal dif-
fusion. Fig. 7 shows an example of the evolution of such accret-
ing white dwarf models for an initial WD mass of 0.9 M⊙ and
a constant accretion rate of Ṁacc = 10
−5 M⊙ yr
−1 (Seq. N0.9;
Table 3).
Fig. 7a shows that the temperature increases continuously in
the envelope (ρ . 106 g cm−3), and finally carbon burning be-
comes significant at ρ ≃ 5.6 × 105 g cm−3 and T ≃ 6× 108 K
when t ≃ 1.3 × 104 yr. In Figs. 7c –f, the rates of compressional
heating (ǫcomp), neutrino cooling (ǫν ), nuclear energy generation
(ǫnuc) and thermal diffusion (ǫth) are shown. In our stellar evolu-
tion code, the compressional heating rate is calculated according
ǫcomp =
(Kippenhahn & Weigert 1990). Neutrino cooling rates are obtained
following Itoh et al. (1996). While ǫcomp, ǫν , and ǫnuc in the figures
correspond to the values that are used for the evolutionary calcula-
tions, the thermal diffusion rate (ǫth) – which is only calculated
implicitly in the code – can only be estimated to within an order-
of-magnitude from
ǫth ≈ TCP/τth . (6)
Here CP denotes the specific heat at constant pressure, and τth the
local thermal diffusion time scale defined as
τth ≡ H
P/K , (7)
where HP is the pressure scale height, and K
[(4acT 3)/(3CPκρ
2)] is the thermal diffusivity. It is clear
from Fig. 7 that the local peak of temperature is located where the
compressional heating rate begins to dominate over the thermal
diffusion rate (ρ ≈ 105 g cm−3), as expected. The neutrino
cooling rate also increases as the temperature in the envelope
becomes higher, but nuclear energy generation becomes significant
before neutrino cooling dominates the thermal evolution, inducing
a carbon-burning flash around ρ ≃ 5.6 · 105 g cm−3.
As Table 3 shows, and consistent with the findings of
Nomoto & Iben (1985), such off-center carbon flashes occur re-
gardless of the initial mass of the white dwarf, if Ṁacc ≈
10−5 M⊙ yr
−1. The results with models including rotation show
that carbon ignition may be delayed if the effect of rotation is in-
cluded (Table 3; see also Piersanti et al. 2003a and Saio & Nomoto
2004). The reason is that the local effective mass accretion rate
c© 2007 RAS, MNRAS 000, 1–??
10 S.-C. Yoon, Ph. Podsiadlowski & S. Rosswog
Figure 9. (a) Evolution of the central remnant in Seq. Sa1 in the log ρ − T plane. The dotted curve gives the critical temperature where the nuclear energy
generation rate due to carbon burning equals the energy loss rate due to neutrino cooling. (b) The local effective accretion rate (Ṁeff,r ≡ 4πr
2ρvr) as a
function of density in the merger remnant model of Seq. Sa1, at different evolutionary epochs as indicated by the labels. (c) – (f) The rates of energy loss/gain
due to neutrino (ǫν ) cooling, compressional heating (ǫcomp), nuclear energy generation (ǫcomp) and thermal diffusion (ǫth) as a function of density in the
central remnant models of Seq. Sa1 at different evolutionary epochs. Note that here ǫν , ǫcomp and ǫnuc represent the values which are used in the evolutionary
calculations, while ǫth is an order-of-magnitude estimate according to Eq. (7).
(Ṁeff,r ≡ 4πr
2ρv) inside the white dwarf at a given mass is lower
because of the centrifugal force. For instance, in Seq. N0.9, we
have Ṁeff,r ≈ 10
−5 M⊙ yr
−1 at around ρ = 5 × 105 g cm−3
when t ≃ 104 yr (Fig. 7b), but Ṁeff,r is lowered by a factor of
two in the corresponding rotating model at a similar epoch (i.e.,
Ṁeff,r ≈ 5 × 10
−6 M⊙ yr
−1), as revealed in Fig. 8. However,
carbon ignition occurs well before the white dwarf reaches the
Chandrasekhar limit, in all model sequences considered. Thus, ro-
tation by itself cannot change the conclusion of the previous work
that the coalescence of double CO white dwarfs should lead to
accretion-induced collapse rather than a thermonuclear explosion,
unless the accretion rate is significantly lowered, as was also shown
by Piersanti et al. (2003a) and Saio & Nomoto (2004).
3.2.2 Sequences without angular-momentum loss and mass
accretion
Having understood the physics of the thermal evolution of CO
white dwarfs which accrete cold matter with a rate close to the Ed-
dington limit, we now investigate the evolution of the central rem-
nant model consisting of a cold core and a hot envelope as described
in Sect. 3.1. First, we examine the results of the model sequences
where both angular-momentum loss and mass accretion from the
Keplerian disc are neglected (i.e., τJ = ∞ and Ṁ = 0; Seqs Sa1,
Aa1, Ab1, Ac1, Ad1, Ae1, Ba1, & Ta1).
Fig. 9a illustrates the evolution of the central remnant for
MCR = 1.10 M⊙ in Seq. Sa1 in the density – temperature
plane. Note that the local peak of temperature at t = 0.0 (Tp =
5.6× 108 K) is significantly below the critical temperature for car-
c© 2007 RAS, MNRAS 000, 1–??
Remnant evolution after a carbon-oxygen white dwarf merger 11
bon ignition (TC−ig; dotted curve in Fig 9a). It is shown in Fig. 9b
that the local effective accretion rate (Ṁeff,r) remains relatively
high (5×10−6−10−5 M⊙ yr
−1) around ρ = 106 g cm−3, where
the local peak of temperature is located, for about 5000 yrs. Despite
such high effective accretion rates, the temperature peak continu-
ously decreases, although the inner core becomes somewhat hotter
due to compression, and the central remnant finally becomes a cold
white dwarf. A few remarkable differences compared to the stan-
dard accreting white dwarf models are found in this regard. Firstly,
since the envelope is very hot, neutrino cooling – in particular by
photoneutrinos – is significant from the beginning, and even domi-
nant over the thermal diffusion at the interface between the core and
the envelope as shown in Fig. 9c. In cold-matter accreting white
dwarfs, neutrino cooling becomes important only after a significant
amount of mass has been accreted (Fig. 7). Secondly, the compres-
sional heating rate is slightly lower than the neutrino cooling rate
around the local peak of temperature. As the contraction of the cen-
tral remnant is mainly determined by the thermal evolution of the
envelope, the local accretion rate is in fact controlled by the cool-
ing process. This explains why we have ǫcomp ≈ ǫν around the
local peak of temperature for the initial ∼ 104 yrs, and why the
local peak of temperature continuously decreases despite the rela-
tively high effective accretion rate. This conclusion is the same for
all other sequences with a TP that is significantly lower than TC−ig
(Seq. Aa1, Ab1, & Ba1) including the non-rotating case (Seq. Ta1;
Table 2).
We find that, in Seq. Sa1, the differentially rotating layers at
the interface between the core and the envelope in the initial central
remnant model are stable against the dynamical shear instability
(DSI). They are, however, unstable to the DSI in other sequences,
where the interface is more degenerate (see YL04 for discussions
on the DSI). Consequently, in Seq. Aa1 for example, the rate of
rotational energy dissipation (ǫrot) appears to be very high initially
(Fig. 10). The differentially rotating layers are rapidly smeared out
by the dynamical shear instability (see the discussion in Sect. 2 in
YL04), and ǫrot falls below the thermal diffusion and/or neutrino
cooling rate only within 20 yrs. Hence we conclude that the rota-
tional energy dissipation does not play an important role for the
long-term evolution of the central remnant.
Fig. 11 shows how the evolution of the central remnant
changes if the local peak of temperature in the initial model (Tp) is
close to or above the critical limit for carbon burning (TC−ig), with
Seqs Ac1 and Ae1 as examples. In contrast to Seq. Sa1 or Aa1, car-
bon burning dominates the evolution very soon in both sequences,
and the temperature increases rapidly. Although the further evolu-
tion has not been followed in the present study, it is most likely that
the carbon-burning flame propagates inward such that the central
remnant is converted into an ONeMg white dwarf within several
thousand years as shown by Saio & Nomoto (1998).
As summarized in Table 2, all other sequences follow the same
evolutionary pattern: off-center carbon ignition is avoided in Seqs
Ab1, Ba1, and Ta1 where Tp is significantly below TC−ig, while
carbon ignites off-center in the other sequences where Tp & TC−ig.
It is thus remarkable that the thermal evolution of the central rem-
nant is sensitively determined by the local peak of temperature in
the quasi-static equilibrium state.
In conclusion, in the absence of angular-momentum loss and
mass accretion from the Keplerian disc, the thermal evolution of
the central remnant is roughly controlled by neutrino cooling at the
interface between the core and the envelope, and off-center carbon
burning may be avoided as long as Tp < TC−ig, while it seems
inevitable if Tp & TC−ig.
Figure 10. Upper panel: The angular velocity relative to the local maximum
as a function of the mass coordinate in the central remnant models of Seq.
Aa1 at different evolutionary epochs. Lower panel: the rate of rotational
energy dissipation (ǫrot; see YL04) as a function of the mass coordinate in
the corresponding models shown in the upper panel.
3.3 Effect of angular momentum loss
In Seqs Sa2 – Sa5, the central remnant has the same initial condi-
tions as in Seq. Sa1, angular momentum loss from the white dwarf
with different time scales τJ is considered according to Eq. (4).
Note that off-center carbon ignition occurs in Seqs Sa2, Sa3 &
Sa4, where τJ . 10
4 yr, while it is avoided in Seq. Sa5 where
τJ = 10
5 yr. These results indicate that off-center carbon igni-
tion should be induced if the angular-momentum loss occurs too
rapidly for neutrino cooling or thermal diffusion to control the ef-
fective mass accretion. For instance, Fig. 12 shows that in Seq. Sa4,
where τJ = 10
4 yr, the effective mass accretion rate reaches a few
10−5 M⊙ yr
−1 at the interface between the core and the envelope
(ρ ≈ 106 g cm−3), and the compressional heating rate exceeds the
neutrino cooling rate.
It is shown that the critical angular-momentum-loss time
scale, τJ, for off-center carbon ignition (τJ,crit) is smaller for Seqs
Ab and Ba than for Seqs Sa and Aa: τJ,crit ≈ 10
3 for Seqs Ab
and Ba, and τJ,crit ≈ 10
4 for Seqs Sa and Aa. This is due to the
different local thermodynamic properties at the interface between
the core and the envelope in different central remnant models. As
shown in Fig. 13, higher density and/or temperature at the interface
result in a shorter neutrino cooling time, making it possible to avoid
local heating for a smaller τJ. In other words, τJ,crit roughly cor-
responds to the time scale for neutrino cooling at the local peak of
temperature (τν,p).
From this experiment, we conclude that, in the absence of
c© 2007 RAS, MNRAS 000, 1–??
12 S.-C. Yoon, Ph. Podsiadlowski & S. Rosswog
Figure 11. Evolution of the central remnant in Seqs Ac1 (upper panel) and
Ae1 (lower panel), in the log ρ−T plane. The dotted curve gives the critical
temperature where nuclear energy generation rate due to carbon burning
equals to energy loss rate due to neutrino cooling.
mass accretion from the Keplerian disc, carbon ignition may be
avoided in the central remnant, if Tmax,init < TC−ig, and if
τJ > τν,p.
3.4 Mass accretion from the Keplerian disc
In reality, mass accretion from the Keplerian disc onto the central
remnant is expected. The accretion rate is determined by the vis-
cosity of the disc, which is not well known. However, we expect
the accretion rate from a Keplerian disc may be significantly lower
than from a pressure-supported thick disc that was assumed in pre-
vious studies. Our results, as summarized in Table 2, indicate that
even with mass accretion, the central remnant with Tp < TC−ig can
avoid off-center carbon ignition if the accretion rate is sufficiently
low (i.e., Ṁ < 5× 10−6...10−5 M⊙ yr
−1), and if τJ > τν,p (see
Table 2).
The thermal history of the central remnant in those sequences
where carbon ignites off-center is similar to that of the white dwarf
in classical accretion model sequences. However, as the central
remnant has a rapidly rotating hot envelope, carbon ignition is sig-
nificantly delayed compared to the case of classical accretion. In
Seq. N1.2, where Minit = 1.2 M⊙ and Ṁacc = 10
−5 M⊙ yr
carbon ignites only when about 0.025 M⊙ is accreted, while in
Seq. Aa7 more than 0.15 M⊙ have to be accreted to induce carbon
ignition at the same accretion rate, despite its higher initial mass.
On the other hand, the comparison of Seq. Aa7 with Seq. Aa10
indicates that off-center carbon ignition is delayed if the central
remnant keeps more angular momentum. The critical accretion rate
for inducing off-center carbon ignition is thus difficult to precisely
determine, as our 1-D models significantly underestimate the effect
of the centrifugal force, especially in the envelope where carbon
ignites. In addition, the physics of angular momentum loss/gain is
not well understood yet, as discussed in Yoon & Langer (2005).
Note that MWD,ig in Seqs Aa10, Ba6 and Ba7 is already very
close, or even above the Chandrasekhar limit. However, the cen-
tral density in those models is still smaller by an order of mag-
nitude than the critical limit for carbon ignition due to the effect
of rotation. As the carbon-burning flame will propagate inwards
within several thousand years (Saio & Nomoto 1998), only about
∼ 0.05 M⊙ may be further accreted by the time the burning flame
reaches the center, and the central density may not become high
enough to induce a thermonuclear explosion before the whole cen-
tral remnant is converted into an ONeMg white dwarf. (Super-)
Chandrasekhar mass ONeMg white dwarfs produced in this way
will eventually collapse to a neutron star (see Yoon & Langer 2005;
Dessart et al. 2006).
On the other hand, the white dwarf continuously grows
to/above the Chandrasekhar limit (≈ 1.4 M⊙) without suffering
carbon ignition (neither at the center nor off-center) in Seqs Sa8,
Sa9, Sa10, Sa11, Aa8, Aa9, Ab6, and Ba8. The outcome in these
cases is thus the formation of a (super-) Chandrasekhar mass CO
white dwarf, which will eventually explode as a Type Ia super-
nova. The mass of the exploding white dwarf should depend on
the amount of angular momentum (Yoon & Langer 2005) and can-
not exceed the mass budget of merging white dwarfs. Fig. 14 shows
the evolutionary paths of the central remnant for Seqs Sa8, Sa9 &
Sa11 as examples in the mass – angular momentum plane. Note that
the central remnant initially has a large amount of angular momen-
tum (J = 1.11×1050 erg s), such that without loss/gain of angular
momentum, it should accrete matter until it reaches M ≃ 1.68 M⊙
where it explodes in a SN Ia explosion. In Seqs Sa8 and Sa9, the ac-
cretion time scale (τacc) is longer than the angular momentum loss
time scale, and the total angular momentum of the white dwarf con-
tinuously decreases while the total mass increases. Consequently,
carbon ignites at the center when the white dwarf grows to 1.50M⊙
and 1.42 M⊙ for Seqs Sa8 and Sa9, respectively. In Seq. Sa11, on
the other hand, both mass and angular momentum of the central
remnant continuously increase, given that τacc . τJ, and a SN Ia
explosion is expected only when M ≃ 1.70 M⊙. Note that this
is even larger than the mass budget of the binary system consid-
ered for this sequence (i.e, 0.9 M⊙+0.6 M⊙). In nature, the white
dwarf must stop growing in mass when M = 1.5 M⊙, and a SN Ia
explosion will be induced only when a sufficient amount of angular
momentum has been removed, e.g. via gravitational wave radiation,
as illustrated by the path Sa11-B in Fig. 14.
4 CONCLUSION AND DISCUSSION
We have explored the dynamical and secular evolution of the
merger of double CO white dwarf binaries whose total mass ex-
ceeds the Chandrasekhar limit. Based on our new SPH simula-
tion of the coalescence of two CO white dwarfs of 0.9 M⊙ and
0.6 M⊙, we suggest that the immediate post-merger remnant is
best described as a differentially rotating CO star consisting of a
slowly rotating cold core and a rapidly rotating hot envelope that is
surrounded by a Keplerian disc rather than as “cold white dwarf +
thick disc” system, as in previous investigations. The evolution of
such a CO star is determined by the thermal evolution of the en-
velope, and the growth of the core is controlled by the cooling due
c© 2007 RAS, MNRAS 000, 1–??
Remnant evolution after a carbon-oxygen white dwarf merger 13
Figure 12. Same as in Fig. 9, but for Seq. Sa4.
Figure 13. Contour lines of the neutrino cooling time scale (τν ≡
TCp/ǫν ) in the log ρ − T plane. The level at each line gives log τν in
units of years. The dashed line denotes the critical temperature for carbon
ignition. The local peak of temperature and the corresponding density in the
initial model of Seqs Sa, Aa, Ab, Ac, Ae, and Ba are marked by the filled
symbols as indicated by the labels.
to neutrino emission and thermal diffusion, which is fundmentally
different from the assumption of “forced accretion of cold matter”.
Our 1-D stellar evolution models of the central remnant, i.e.
the cold core and the hot envelope, which include the effects of
rotation, indicate that there are three necessary conditions for the
merger remnant to avoid off-center carbon ignition such that a SN
Ia may be produced:
(i) The local peak of temperature of the merger remnant at the
interface between the core and the envelope must be lower than the
critical temperature for carbon ignition (Tp < TC−ig).
(ii) The time scale for angular-momentum loss from the central
remnant by must be larger than the neutrino cooling time scale at
the interface (τJ > τν,P).
(iii) Mass accretion from the Keplerian disc onto the cen-
tral remnant must be sufficiently slow (Ṁacc . 5 ×
10−6...10−5 M⊙ yr
Our new SPH simulation confirms that at least the first condition
(Tp < TC−ig) should be fulfilled in the CO white dwarf binary
considered.
As emphasized in Sect. 3.1, our 1-D models significantly un-
derestimate the effect of the centrifugal force on the stellar structure
in the rapidly rotating outermost layers. However, since thermal
diffusion always dominates over both neutrino cooling and com-
pressional heating in the outer envelope (ρ . 105...106 g cm−3)
above the interface, as shown in Figs. 7 and 12, the detailed struc-
ture of the rapidly rotating outermost layers above the interface may
not significantly affect our results on the thermal evolution of the
merger remnant, as long as the angular momentum of the envelope
is not lost faster than the local neutrino cooling time scale at the
interface. On the other hand, mass accretion from the Keplerian
disc should occur preferentially along the equatorial plane of the
envelope. As shown in the SPH simulation, the envelope is more
extended along the equatorial plane, where most angular momen-
tum is deposited, than along the polar axis, and the resultant com-
pressional heating must be much weakened, compared to the case
of our 1-D models. The enhanced role of rotation must thus help
c© 2007 RAS, MNRAS 000, 1–??
14 S.-C. Yoon, Ph. Podsiadlowski & S. Rosswog
to increase the critical mass accretion rate for inducing off-center
carbon ignition, in favor of producing a Type Ia supernova.
We have concluded that the loss of angular momentum on a
short time scale (τJ . τν,p ≈ 10
4...105 yr) may induce off-center
carbon ignition even when Tmax,init < TC−ig. Rapidly rotating
compact stars may experience loss of angular momentum by gravi-
tational wave radiation, due to either the bar-mode instability or the
r-mode instability. The onset of the dynamical or secular bar-mode
instability requires a very high ratio of the rotational energy to the
gravitational energy: Erot/Egrav & 0.2 for the dynamical bar-
node instability, and Erot/Egrav & 0.14 for the secular bar-mode
instability (e.g. Shapiro & Teukolsky 1983). As both our 1-D mod-
els and SPH simulation give a value of Erot/Egrav that is much
lower (about 0.06 – 0.07) than those critical limits, the bar-mode in-
stability may not be relevant. The r-mode instability may operate,
in principle, even with such a low Erot/Egrav (Andersson 1998;
Friedman & Morsink 1998). However, we estimate that the growth
time of the r-mode instability (τr), using our central remnant mod-
els and following Lindblom (1999), is & 106 yr, which is much
longer than the local neutrino cooling time scale (τν,p ≈ 10
4 yr).
Alternatively, angular momentum might be transported from the
accreting star into the Keplerian disc when the accretor reaches
critical rotation. Calculations by Saio & Nomoto (2004) indicate,
however, that the decrease of the total angular momentum due to
such an effect is not significant in accreting white dwarfs. In con-
clusion, neither gravitational wave radiation nor outward angular-
momentum transport is likely to lead to a rapid loss of angular
momentum from the central remnant such that τJ < τν,p, unless
magnetic torques are important.
The central remnant may be enforced to rotate rigidly on a
short time scale in the presence of strong magnetic torques (cf.
Spruit 2002). The central remnant in both our SPH simulation and
1-D models has Jtot > 10
50 erg s, which is significantly higher
than the maximum limit a rigidly rotating white dwarf can retain,
as shown in Fig. 14. This means that if magnetic torques led to rigid
rotation, a large amount of angular momentum should be trans-
ported into the Keplerian disc (Case a in Fig. 14), or mass shed-
ding of super-critically spun-up layers should occur from the cen-
tral remnant (Case b in Fig. 14). In Case a, the local density around
the interface should increase by several factors by the time when
the central remnant reaches rigid rotation as implied by Fig. 15.
Off-center carbon ignition might be inevitable in this case due to a
resultant high effective accretion rate, if the time for angular mo-
mentum redistribution were shorter than the local cooling time due
to neutrino losses. In Case b, on the other hand, the local density
at the interface might not increase if mass shedding from the cen-
tral remnant occurred at a sufficiently high rate. Therefore, the role
of magnetic fields in the merger evolution remains uncertain at the
current stage and is a challenging subject for future work.
The coalescence of more massive double CO white dwarf bi-
naries is likely to result in a higher maximum temperature due to the
enhanced role of gravity. Consequently, given the important role of
the maximum temperature in the merger remnant for its final fate,
less massive binary CO white dwarfs may be favored for the pro-
duction of SNe Ia from such a channel.
We note that there are number of potentially important factors
that have not been included in either the present study or previous
simulations. These include the following points:
(i) The previous and present simulations assumed that white
dwarfs are cold prior to the merging process. However,
Iben, Tutukov & Fedorova (1998) point out that tidal interactions
Figure 14. Evolution of the central remnant in the mass – angular momen-
tum plane. The thick solid curve shows the angular momentum of a rigidly
rotating white dwarf with critical rotation at the surface as a function of the
white dwarf mass. The thick dashed curve and the thick dot-dashed curve
give the critical angular momentum for a differentially rotating CO white
dwarf to reach carbon ignition at the center (ρc = 2 × 109 g cm−3), and
electron-capture induced collapse (ρc = 10×1010 g cm−3), respectively,
according to Yoon & Langer (2005). A SN Ia explosion is expected in the
hatched region. The filled circle denotes the initial model of the central rem-
nant in Seq. Sa. The evolution of the central remnant in Seqs Sa 8, Sa9 and
Sa11 is shown by the thin dotted curves, as indicated. The thin solid curves
denote possible evolutionary paths of the central remnant with strong mag-
netic torques that may enforce rigid rotation, with loss of angular momen-
tum but without mass shedding (Case ’a’), and with both loss of angular
momentum and mass shedding (Case ’b’). See the text for more details.
Figure 15. The density profile in the initial model of the central remnant
in Seqs Sa (solid curve), and in a corresponding hot (Tc = 108 K) white
dwarf model that rotates rigidly at critical rotation at the surface (dashed
curve).
might heat up the white dwarfs as the orbit shrinks, which could
weaken the gravitational potential of the primary. Furthermore, as
the temperature of white dwarfs is a function of their age, younger
progenitors should have more extended envelopes, which may re-
sult in a lower Tp.
(ii) A thin hydrogen/helium envelope must be present initially in
both the primary and the secondary. As hydrogen or helium should
ignite at a much lower temperature than carbon, the influence of
the release of nuclear energy during the merger process may be
even more important than shown in the existing SPH simulations,
c© 2007 RAS, MNRAS 000, 1–??
Remnant evolution after a carbon-oxygen white dwarf merger 15
which is likely to lower Tp. Furthermore, neutrino losses, which
were neglected in the present study, would also tend to reduce Tp.
(iii) At a given total mass (Mtot = Mprimary + Msecondary),
different mass ratios of the white dwarf components (q ≡
Msecondary/Mprimary) must result in different merger structures.
(iv) A lower q at a given Mtot may not only lead to a stronger
gravitational potential of the primary, but also to a lower mass
accretion rate during the dynamical mass transfer (Guerrero et al.
2004). As the former and the latter will tend to increase and de-
crease Tp, respectively, quantitative studies are necessary to predict
how Tp will change with q.
Finally, another important ingredient that needs to be consid-
ered is thermal diffusion during the dynamical evolution. As shown
above, the mass-accretion rate from the Keplerian disc onto the en-
velope of the central remnant is one of the most important factors
that critically determine the final fate of double CO white dwarf
mergers. The accretion rates depend on the structure of the Kep-
lerian disc at thermal equilibrium, which can be only understood
by including thermal diffusion in future simulations. But here we
emphasize again that the accretion rates from a centrifugally sup-
ported Keplerian disc should be significantly lower than those from
a pressure-supported thick disc that was previously assumed, which
opens the possibility for at least some double CO white dwarf
mergers to produce SNe Ia.
ACKNOWLEDGMENTS
We are grateful to Norbert Langer and Ken’ichi Nomoto for many
useful suggestions and comments. SCY is supported by the VENI
grant (639.041.406) of the Netherlands Organization for Scientific
Research (NWO). The computations have been performed on the
JUMP supercomputer at the Höchstleistungsrechenzentrum Jülich.
REFERENCES
Andersson, N., 1998, ApJ, 502, 708
Balsara, D.S., 1995, JCoPh., 121, 357
Benz, W., Cameron, A.G.W., Press, W.H., & Bowers, R.L., 1990,
ApJ, 348, 647
Branch, D., Livio, M., Yungelson, L. R., Boffi, F.R., Baron, E.,
1995, PASP, 107, 1019
Dessart, L., Burrows, A., Ott, C., Livne, E., Yoon, S.-C., &
Langer, N., 2006, ApJ, 644, 1063
Friedman, J.L., & Morsink, S.M, 1998, ApJ, 502, 714
Guerrero, J., Garcı́a-Berro, E., & Isern, J., 2004, A&A, 413, 257
Hix, W.R. et al., 1998, ApJ, 503, 332
Iben, I. Jr., & Tutukov, A., 1984, ApJS, 55, 335
Iben, I.Jr., Tutukov, A.V., & Fedorova, A.V., 1998, ApJ, 503, 344
Itoh, N., Hayashi, H., Hishikawa, A., & Kohyama, Y., 1996, ApJS,
102, 411
Kippenhahn, R., & Weigert, A., 1990, Stellar Structure and Evo-
lution, Springer-Verlag
Kitaura, F.S., Janka, H.-Th., & Hillebrandt, W., 2006, A&A, 450,
Knaap, R.C.J., 2004, Master Thesis, Utrecht University
Lomax,H., Pulliam,T.H. and Zingg,D.W., 2001, Fundamentals of
Computational Fluid Dynamics, Springer, Heidelberg
Lindblom, L, 1999, Phys. Rev. D, 60, 4007
Mochkovitch, R., & Livio, M., 1989, A&A, 209, 111
Mochkovitch, R., & Livio, M., 1990, A&A, 236, 378
Monaghan, J.J., & Varnas, S.R., 1988, MNRAS, 231, 515
Monaghan, J.J., 2002, SPH compressible turbulence, MNRAS
335, 843
Morris, J.P., & Monaghan, J.J., 1997, J. Comp. Phys., 136, 41
Napiwotzki, R., Karl, C., & Nelemans, G. et al., 2004, RMxAC,
20, 113
Napiwotzki, R., Koester, D., & Nelemans, G., et al., 2002, A&A,
386, 957
Nomoto, K., & Iben, I.Jr., 1985, ApJ, 297, 53
Nomoto, K., & Kondo, Y., 1991, ApJ, 367, L19
Paczyński, B., 1991, ApJ, 370, 597
Piersanti, L., Gagliardi, S., Iben, I.Jr., & Tornambé, A., 2003a,
ApJ, 583, 885
Piersanti, L., Gagliardi, S., Iben, I.Jr., & Tornambé, A., 2003b,
ApJ, 598, 1229
Popham, R., & Narayan, R., 1991, ApJ, 370, 604
Price, D., 2004, PhD thesis (Cambridge)
Price, D. & Monaghan, J.J., accepted by MNRAS,
[astro-ph/0610872]
Rosswog, S., & Davies, M.B., 2002, MNRAS, 334, 481
Rosswog, S., Davies, M.B., Thielemann, F.-K., & Piran, T., 2000,
A&A, 360,171
Rosswog, S., & Liebendörfer, M., 2003, MNRAS, 342, 673
Rosswog, S., Ramirez-Ruiz, E. and Hix, W.R., to be submitted
Rosswog, S., Speith, R. & Wynn, G., 2004, MNRAS, 351, 1121
Saio, H., & Nomoto, K., 1985, A&A, 150, 21
Saio, H., & Nomoto, K., 1998, ApJ, 500, 388
Saio, H., & Nomoto, K., 2004, ApJ, 615, 444
Segretain, L., Chabrier, G., & Mochkovitch, R., 1997, ApJ, 481,
Shapiro, S.L., & Teukolsky, S.A., 1983, Black Holes, White
Dwarfs and Neutron Stars: The Physics of Compact Objects,
Willey-Interscience
Springel, V. & Hernquist, L., 2002, MNRAS, 333, 649
Spruit, H.C., 2002, A&A, 381, 923
Tutukov, A.V., & Yungelson, L., 1979, Acta. Astr., 29, 665
Timmes, F.X. & Swesty, F.D., 2000, ApJS, 126, 501
Webbink, R.F., 1984, ApJ, 277, 355
Yoon, S.-C., & Langer, N., 2004, A&A,419, 623 (YL04)
Yoon, S.-C., & Langer, N., 2005, A&A, 435, 967
c© 2007 RAS, MNRAS 000, 1–??
http://arxiv.org/abs/astro-ph/0610872
ABSTRACT
  We systematically explore the evolution of the merger of two carbon-oxygen
(CO) white dwarfs. The dynamical evolution of a 0.9 Msun + 0.6 Msun CO white
dwarf merger is followed by a three-dimensional SPH simulation. We use an
elaborate prescription in which artificial viscosity is essentially absent,
unless a shock is detected, and a much larger number of SPH particles than
earlier calculations. Based on this simulation, we suggest that the central
region of the merger remnant can, once it has reached quasi-static equilibrium,
be approximated as a differentially rotating CO star, which consists of a
slowly rotating cold core and a rapidly rotating hot envelope surrounded by a
centrifugally supported disc. We construct a model of the CO remnant that
mimics the results of the SPH simulation using a one-dimensional hydrodynamic
stellar evolution code and then follow its secular evolution. The stellar
evolution models indicate that the growth of the cold core is controlled by
neutrino cooling at the interface between the core and the hot envelope, and
that carbon ignition in the envelope can be avoided despite high effective
accretion rates. This result suggests that the assumption of forced accretion
of cold matter that was adopted in previous studies of the evolution of double
CO white dwarf merger remnants may not be appropriate. Our results imply that
at least some products of double CO white dwarfs merger may be considered good
candidates for the progenitors of Type Ia supernovae. In this case, the
characteristic time delay between the initial dynamical merger and the eventual
explosion would be ~10^5 yr. (Abridged).

<|endoftext|><|startoftext|>
Introduction
	2. Preliminaries
	3. Abstract Jackson's inequality in a Banach space
	4. The examples of application of the abstract Jackson's inequality in particular spaces
	4.1. Jackson's inequalities in Lp(2) and C(2)
	4.2. Jackson's inequalities of the approximation by exponential type entire functions in the space Lp(R, p)
	References
ABSTRACT
  For an arbitrary operator A on a Banach space X which is a generator of
C_0-group with certain growth condition at the infinity, the direct theorems on
connection between the smoothness degree of a vector $x\in X$ with respect to
the operator A, the order of convergence to zero of the best approximation of x
by exponential type entire vectors for the operator A, and the k-module of
continuity are given. Obtained results allows to acquire Jackson-type
inequalities in many classic spaces of periodic functions and weighted $L_p$
spaces.

<|endoftext|><|startoftext|>
Parametrized Post-Newtonian Expansion of Chern-Simons Gravity
Stephon Alexander1 and Nicolás Yunes1
Center for Gravitational Wave Physics, Institute for Gravitational Physics and Geometry and Department of Physics,
The Pennsylvania State University, University Park, PA 16802, USA
(Dated: November 4, 2018)
We investigate the weak-field, post-Newtonian expansion to the solution of the field equations
in Chern-Simons gravity with a perfect fluid source. In particular, we study the mapping of this
solution to the parameterized post-Newtonian formalism to 1 PN order in the metric. We find
that the PPN parameters of Chern-Simons gravity are identical to those of general relativity, with
the exception of the inclusion of a new term that is proportional to the Chern-Simons coupling
parameter and the curl of the PPN vector potentials. We also find that the new term is naturally
enhanced by the non-linearity of spacetime and we provide a physical interpretation for it. By
mapping this correction to the gravito-electro-magnetic framework, we study the corrections that
this new term introduces to the acceleration of point particles and the frame-dragging effect in
gyroscopic precession. We find that the Chern-Simons correction to these classical predictions could
be used by current and future experiments to place bounds on intrinsic parameters of Chern-Simons
gravity and, thus, string theory.
PACS numbers: 11.25.Wx, 95.55.Ym, 04.60.-m, 04.80.Cc
I. INTRODUCTION
Tests of alternative theories of gravity that modify gen-
eral relativity (GR) at a fundamental level are essential to
the advancement of physics. One formalism that has had
incredible success in this task is the parameterized post-
Newtonian (PPN) framework [1, 2, 3, 4, 5, 6]. In this
formalism, the metric of the alternative theory is solved
for in the weak-field limit and its deviations from GR are
expressed in terms of PPN parameters. Once a metric
has been obtained, one can calculate predictions of the
alternative theory, such as light deflection and the perihe-
lion shift of Mercury, which shall depend on these PPN
parameters. Therefore, experimental measurements of
such physical effects directly lead to constraints on the
parameters of the alternative theory. This framework,
together with the relevant experiments, have already
been successfully employed to constrain scalar-tensor the-
ories (Brans-Dicke, Bekenstein) [7], vector-tensor the-
ories (Will-Nordtvedt [8], Hellings-Nordtvedt [9]), bi-
metric theories (Rosen [10, 11]) and stratified theories
(Ni [12]) (see [13] for definitions and an updated review.)
Only recently has this framework been used to study
quantum gravitational and string-theoretical inspired
ideas. On the string theoretical side, Kalyana [14] investi-
gated the PPN parameters associated with the graviton-
dilaton system in low-energy string theory. More re-
cently, Ivashchuk, et. al. [15] studied PPN parameters
in the context of general black holes and p-brane spher-
ically symmetric solutions, while Bezerra, et. al. [16]
considered domain wall spacetimes for low energy effec-
tive string theories and derived the corresponding PPN
parameters for the metric of a wall. On the quan-
tum gravitational side, Gleiser and Kozameh [17] and
more recently Fan, et. al. [18] studied the possibility
of testing gravitational birefringence induced by quan-
tum gravity, which was proposed by Amelino-Camelia,
el. al. [19] and Gambini and Pullin [20]. Other non-
PPN proposals have been also put forth to test quan-
tum gravity, for example through gravitational waves
[21, 22, 23, 24, 25, 26, 27, 28], but we shall not discuss
those tests here.
Chern-Simons (CS) gravity [29, 30] is one such ex-
tension of GR, where the gravitational action is mod-
ified by the addition of a parity-violating term. This
extension is promising because it is required by all 4-
dimensional compactifications of string theory [31] for
mathematical consistency because it cancels the Green-
Schwarz anomaly [32]. CS gravity, however, is not unique
to string theory and in fact has its roots in the standard
model, where it arises as a gravitational anomaly pro-
vided that there are more flavours of left handed leptons
than right handed ones. Moreover the CS extension to
GR can arise via the embedding of the three dimensional
Chern-Simons topological current into a 4D space-time
manifold, decsribed by Jackiw and Pi [30]
Chern-Simons gravity has been recently studied in
the cosmological context. In particular, this framework
was used to shed light on the anisotropies of the cos-
mic microwave background (CMB) [33, 34, 35] and the
leptogenesis problem [34, 36, 37]. Parity violation has
also been shown to produce birefringent gravitational
waves [28, 29], where different polarizations modes ac-
quire varying amplitudes. These modes obey different
propagation equations because the imaginary sector of
the classical dispersion relation is CS corrected. Different
from [20], in CS birefringence the velocity of the gravita-
tional wave remains that of light.
In this paper we study CS gravity in the PPN frame-
work, extending the analysis of [38] and providing some
missing details. In particular, we shall consider the ef-
fect of the CS correction to the gravitational field of,
for instance, a pulsar, a binary system or a star in the
weak-field limit. These corrections are obtained by solv-
http://arxiv.org/abs/0704.0299v1
ing the modified field equations in the weak-field limit for
post-Newtonian (PN) sources, defined as those that are
weakly-gravitating and slowly-moving [39]. Such an ex-
pansion requires the calculation of the Ricci and Cotton
tensors to second order in the metric perturbation. We
then find that CS gravity leads to the same gravitational
field as that of classical GR and, thus, the same PPN
parameters, except for the inclusion of a new term in the
vectorial sector of the metric, namely
0i = 2ḟ (∇× V )i , (1)
where ḟ acts as a coupling parameter of CS theory and
Vi is a PPN potential. We also show that this solution
can be alternatively obtained by finding a formal solu-
tion to the modified field equations and performing a PN
expansion, as is done in PN theory. The full solution
is further shown to satisfy the additional CS constraint,
which leads to equations of motion given only by the di-
vergence of the stress-energy tensor.
The CS correction to the metric found here leads to
an interesting interpretation of CS gravity and forces us
to consider a new type of coupling. The interpretation
consists of thinking of the field that sources the CS cor-
rection as a fluid that permeates all of spacetime. Then
the CS correction in the metric is due to the “dragging” of
such a fluid by the motion of the source. Until now, cou-
plings of the CS correction to the angular momentum of
the source had been neglected by the string theory com-
munity. Similarly, curl-type terms had also been consid-
ered unnecessary in the traditional PPN framework, since
previous alternative gravity theories had not required it.
As we shall show, in CS gravity and thus in string the-
ory, such a coupling is naturally occurring. Therefore, a
proper PPN mapping requires the introduction of a new
curl-type term with a corresponding new PPN parameter
of the type of Eq. (1).
A modification to the gravitational field leads natu-
rally to corrections of the standard predictions of GR.
In order to illustrate such a correction, we consider the
CS term in the gravito-electro-magnetic analogy [40, 41],
where we find that the CS correction accounts for a mod-
ification of gravitomagnetism. Furthermore, we calculate
the modification to the acceleration of point particles and
the frame dragging effect in the precession of gyroscopes.
We find that these corrections are given by
δai = −
c2 r2
δΩi = − ḟ
c3 r2
ni − v
, (2)
where m and v are the mass and velocities of the source,
while r is the distance to the source and ni = xi/r is a
unit vector, with · and × the flat-space scalar and cross
products. Both corrections are found to be naturally en-
hanced in regions of high spacetime curvature. We then
conclude that experiments that measure the gravitomag-
netic sector of the metric either in the weak-field (such as
Gravity Probe B [42]) and particularly in the non-linear
regime, will lead to a direct constraint on the CS cou-
pling parameter ḟ . In this paper we develop the details
of how to calculate these corrections, while the specifics
of how to actually impose a constraint, which depend
on the experimental setup, are beyond the scope of this
paper.
The remainder of this paper deals with the details of
the calculations discussed in the previous paragraphs.
We have divided the paper as follows: Sec. II describes
the basics of the PPN framework; Sec. III discusses CS
modified gravity, the modified field equations and com-
putes a formal solution; Sec. IV expands the field equa-
tions to second order in the metric perturbation; Sec. V
iteratively solves the field equations in the PN approx-
imation and finds the PPN parameters of CS gravity;
Sec. VI discusses the correction to the acceleration of
point particles and the frame dragging effect; Sec. VII
concludes and points to future research.
The conventions that we use throughout this work are
the following: Greek letters represent spacetime indices,
while Latin letters stand for spatial indices only; semi-
colons stand for covariant derivatives, while colons stand
for partial derivatives; overhead dots stand for deriva-
tives with respects to time. We denote uncontrolled re-
mainders with the symbol O(A), which stands for terms
of order A. We also use the Einstein summation con-
vention unless otherwise specified. Finally, we use ge-
ometrized units, where G = c = 1, and the metric signa-
ture (−,+,+,+).
II. THE ABC OF PPN
In this section we summarize the basics of the PPN
framework, following [6]. This framework was first
developed by Eddington, Robertson and Schiff [1, 6],
but it came to maturity through the seminal papers of
Nordtvedt and Will [2, 3, 4, 5]. In this section, we de-
scribe the latter formulation, since it is the most widely
used in experimental tests of gravitational theories.
The goal of the PPN formalism is to allow for com-
parisons of different metric theories of gravity with each
other and with experiment. Such comparisons become
manageable through a slow-motion, weak-field expansion
of the metric and the equations of motion, the so-called
PN expansion. When such an expansion is carried out
to sufficiently high but finite order, the resultant solu-
tion is an accurate approximation to the exact solution
in most of the spacetime. This approximation, however,
does break down for systems that are not slowly-moving,
such as merging binary systems, or weakly gravitating,
such as near the apparent horizons of black hole binaries.
Nonetheless, as far as solar system tests are concerned,
the PN expansion is not only valid but also highly accu-
rate.
The PPN framework employs an order counting-
scheme that is similar to that used in multiple-scale anal-
ysis [43, 44, 45, 46]. The symbol O(A) stands for terms of
order ǫA, where ǫ ≪ 1 is a PN expansion parameter. For
convenience, it is customary to associate this parameter
with the orbital velocity of the system v/c = O(1), which
embodies the slow-motion approximation. By the Virial
theorem, this velocity is related to the Newtonian poten-
tial U via U ∼ v2, which then implies that U = O(2) and
embodies the weak-gravity approximation. These expan-
sions can be thought of as two independent series: one
in inverse powers of the speed of light c and the other in
positive powers of Newton’s gravitational constant.
Other quantities, such as matter densities and deriva-
tives, can and should also be classified within this order-
counting scheme. Matter density ρ, pressure p and spe-
cific energy density Π, however, are slightly more com-
plicated to classify because they are not dimensionless.
Dimensionlessness can be obtained by comparing the
pressure and the energy density to the matter density,
which we assume is the largest component of the stress-
energy tensor, namely p/ρ ∼ Π/ρ = O(2). Derivatives
can also be classified in this fashion, where we find that
∂t/∂x = O(1). Such a relation can be derived by noting
that ∂t ∼ vi∇i, which comes from the Euler equations of
hydrodynamics to Newtonian order.
With such an order-counting scheme developed, it is
instructive to study the action of a single neutral particle.
The Lagrangian of this system is given by
L = (gµνu
−g00 − 2g0ivi − gijvivj
where uµ = dxµ/dt = (1, vi) is the 4-velocity of the parti-
cle and vi is its 3-velocity. From Eq. (3), note that knowl-
edge of L to O(A) implies knowledge of g00 to O(A), g0i
to O(A − 1) and gij to O(A − 2). Therefore, since the
Lagrangian is already known to O(2) (the Newtonian so-
lution), the first PN correction to the equations of motion
requires g00 to O(4), g0i to O(3) and gij to O(2). Such
order counting is the reason for calculating different sec-
tors of the metric perturbation to different PN orders.
A PPN analysis is usually performed in a particular
background, which defines a particular coordinate sys-
tem, and in an specific gauge, called the standard PPN
gauge. The background is usually taken to be Minkowski
because for solar system experiments deviations due to
cosmological effects are negligible and can, in principle,
be treated as adiabatic corrections. Moreover, one usu-
ally chooses a standard PPN frame, whose outer regions
are at rest with respect to the rest frame of the universe.
Such a frame, for example, forces the spatial sector of the
metric to be diagonal and isotropic [6]. The gauge em-
ployed is very similar to the PN expansion of the Lorentz
gauge of linearized gravitational wave theory. The differ-
ences between the standard PPN and Lorentz gauge are
of O(3) and they allow for the presence of certain PPN
potentials in the vectorial sector of the metric perturba-
tion.
The last ingredient in the PPN recipe is the choice of
a stress-energy tensor. The standard choice is that of a
perfect fluid, given by
T µν = (ρ+ ρΠ+ p)uµuν + pgµν . (4)
Such a stress-energy density suffices to obtain the PN
expansion of the gravitational field outside a fluid body,
like the Sun, or of compact binary system. One can show
that the internal structure of the fluid bodies can be ne-
glected to 1 PN order by the effacement principle [39] in
GR. Such effacement principle might actually not hold
in modified field theories, but we shall study this subject
elsewhere [47].
With all these machinery, on can write down a super-
metric [6], namely
g00 = −1 + 2U − 2βU2 − 2ξΦW + (2γ + 2 + α3 + ζ1
− 2ξ)Φ1 + 2 (3γ − 2β + 1 + ζ2 + ξ)Φ2
+ 2 (1 + ζ3)Φ3 + 2 (3γ + 3ζ4 − 2ξ)Φ4 − (ζ1 − 2ξ)A,
g0i = −
(4γ + 3 + α1 − α2 + ζ1 − 2ξ)Vi
(1 + α2 − ζ1 + 2ξ)Wi,
gij = (1 + 2γU) δij , (5)
where δij is the Kronecker delta and where the PPN
potentials (U,ΦW ,Φ1,Φ2,Φ3,Φ4,A, Vi,Wi) are defined
in Appendix A. Equation (5) describes a super-metric
theory of gravity, because it reduces to different met-
ric theories, such as GR or other alternative theo-
ries [6], through the appropriate choice of PPN param-
eters (γ, β, ξ, α1, α2, α3, ζ1, ζ2, ζ3, ζ4). One could obtain
a more general form of the PPN metric by performing
a post-Galilean transformation on Eq. (5), but such a
procedure shall not be necessary in this paper.
The super-metric of Eq. (5) is parameterized in terms
of a specific number of PPN potentials, where one usually
employs certain criteria to narrow the space of possible
potentials to consider. Some of these restriction include
the following: the potentials tend to zero as an inverse
power of the distance to the source; the origin of the co-
ordinate system is chosen to coincide with the source,
such that the metric does not contain constant terms;
and the metric perturbations h00, h0i and hij transform
as a scalar, vector and tensor. The above restrictions
are reasonable, but, in general, an additional subjective
condition is usually imposed that is based purely on sim-
plicity: the metric perturbations are not generated by
gradients or curls of velocity vectors or other generalized
vector functions. As of yet, no reason had arisen for re-
laxing such a condition, but as we shall see in this paper,
such terms are indeed needed for CS modified theories.
What is the physical meaning of all these parame-
ters? One can understand what these parameters mean
by calculating the generalized geodesic equations of mo-
tion and conservation laws [6]. For example, the param-
eter γ measures how much space-curvature is produced
by a unit rest mass, while the parameter β determines
how much “non-linearity” is there in the superposition
law of gravity. Similarly, the parameter ξ determines
whether there are preferred-location effects, while αi rep-
resent preferred-frame effects. Finally, the parameters ζi
measure the amount of violation of conservation of to-
tal momentum. In terms of conservation laws, one can
interpret these parameters as measuring whether a the-
ory is fully conservative, with linear and angular momen-
tum conserved (ζi and αi vanish), semi-conservative, with
linear momentum conserved (ζi and α3 vanish), or non-
conservative, where only the energy is conserved through
lowest Newtonian order. One can verify that in GR,
γ = β = 1 and all other parameters vanish, which implies
that there are no preferred-location or frame effects and
that the theory is fully conservative.
A PPN analysis of an alternative theory of gravity then
reduces to mapping its solutions to Eq. (5) and then de-
termining the PPN parameters in terms of intrinsic pa-
rameters of the theory. The procedure is simply as fol-
lows: expand the modified field equations in the metric
perturbation and in the PN approximation; iteratively
solve for the metric perturbation to O(4) in h00, to O(3)
in h0i and to O(2) in hij ; compare the solution to the
PPN metric of Eq. (5) and read off the PPN parameters
of the alternative theory. We shall employ this procedure
in Sec. V to obtain the PPN parameters of CS gravity.
III. CS GRAVITY IN A NUTSHELL
In this section, we describe the basics of CS gravity,
following mainly [29, 30]. In the standard CS formalism,
GR is modified by adding a new term to the gravitational
action. This term is given by [30]
SCS =
d4xf (⋆R R) , (6)
where mpl is the Planck mass, f is a prescribed external
quantity with units of squared mass (or squared length
in geometrized units), R is the Ricci scalar and the star
stands the dual operation, such that
R⋆R =
Rαβγδǫ
αβµνRγδµν , (7)
with ǫµνδγ the totally-antisymmetric Levi-Civita tensor
and Rµνδγ the Riemann tensor.
Such a correction to the gravitational action is inter-
esting because of the unavoidable parity violation that
is introduced. Such parity violation is inspired from CP
violation in the standard model, where such corrections
act as anomaly-canceling terms. A similar scenario oc-
curs in string theory, where the Green-Schwarz anomaly
is canceled by precisely such a CS correction [32], al-
though CS gravity is not exclusively tied to string the-
ory. Parity violation in CS gravity inexorably leads to
birefringence in gravitational propagation, where here
we mean that different polarization modes obey differ-
ent propagation equations but travel at the same speed,
that of light [29, 30, 36, 47]. If CS gravity were to
lead to polarization modes that travel at different speeds,
then one could use recently proposed experiments [17]
to test this effect, but such is not the case in CS grav-
ity. Birefringent gravitational waves, and thus CS grav-
ity, have been proposed as possible explanations to the
cosmic-microwave-background (CMB) anisotropies [36],
as well as the baryogenesis problem during the inflation-
ary epoch [33].
The magnitude of the CS correction is controlled by the
externally-prescribed quantity f , which depends on the
specific theory under consideration. When we consider
CS gravity as an effective quantum theory, then the cor-
rection is suppressed by some mass scale M , which could
be the electro-weak scale or some other scale, since it is
unconstrained. In the context of string theory, the quan-
tity f has been calculated only in conservative scenar-
ios, where it was found to be suppressed by the Planck
mass. In other scenarios, however, enhancements have
been proposed, such as in cosmologies where the string
coupling vanishes at late times [48, 49, 50, 51, 52, 53, 54,
55, 56, 57, 58], or where the field that generates f cou-
ples to spacetime regions with large curvature [59, 60] or
stress-energy density [28, 47]. For simplicity, we here as-
sume that this quantity is spatially homogeneous and its
magnitude is small but non-negligible, so that we work
to first order in the string-theoretical correction. There-
fore, we treat ḟ as an independent perturbation parame-
ter, [70] unrelated to ǫ, the PN perturbation parameter.
The field equations of CS modified gravity can be ob-
tained by varying the action with respect to the metric.
Doing so, one obtains
Gµν + Cµν = 8πTµν , (8)
where Gµν is the Einstein tensor, Tµν is a stress-energy
tensor and Cµν is the Cotton tensor. The latter tensor is
defined via
Cµν = −
(µDαRν)β + (Dσf,τ )
⋆Rτ (µ
where parenthesis stand for symmetrization, g is the de-
terminant of the metric, Da stands for covariant differ-
entiation and colon subscripts stand for partial differen-
tiation.
Formally, the introduction of such a modification to
the field equations leads to a new constraint, which is
compensated by the introduction of the new scalar field
degree of freedom f . This constraint originates by re-
quiring that the divergence of the field equations vanish,
namely
DµCµν =
Dνf (
⋆RR) = 0, (10)
where the divergence of the Einstein tensor vanished by
the Bianchi identities. If this constraint is satisfied, then
the equations of motion for the stress-energy DµT
µν are
unaffected by CS gravity. A common source of confusion
is that Eq. (10) is sometimes interpreted as requiring that
R⋆R also vanish, which would then force the correction
to the action to vanish. However, this is not the case be-
cause, in general, f is an exact form (d2f = 0) and, thus,
Eq. (10) only implies an additional constraint that forces
all solutions to the field equations to have a vanishing
The previous success of CS gravity in proposing plau-
sible explanations to important cosmological problems
prompts us to consider this extension of GR in the weak-
field regime. For this purpose, it is convenient to rewrite
the field equations in trace-reversed form, since this form
is most amenable to a PN expansion. Doing so, we find,
Rµν + Cµν = 8π
Tµν −
, (11)
where the trace of the Cotton tensor vanishes identically
and T = gµνT
µν is the four dimensional trace of the
stress-energy tensor. To linear order, the Ricci and Cot-
ton tensors are given by [30]
Rµν = −
�hµν +O(h)2,
Cµν = −
ǫ̃0αβ(µ�ηhν)β,α +O(h)2, (12)
where ǫ̃αβγδ is the Levi-Civita symbol, with conven-
tion ǫ̃0123 = +1, and �η = −∂2t + ηij∂i∂j is the flat
space D’Alambertian, with ηµν the Minkowski metric. In
Eq. (12), we have employed the Lorentz gauge condition
α = h,µ/2, where h = g
µνhµν is the four dimensional
trace of the metric perturbation.
The Cotton tensor changes the characteristic behav-
ior of the Einstein equations by forcing them to become
third order instead of second order. Third-order par-
tial differential equations are common in boundary layer
theory [43]. However, in CS gravity, the third-order con-
tributions are multiplied by a factor of f and we shall
treat this function as a small independent expansion pa-
rameter. Therefore, the change in characteristics in the
modified field equations can also be treated perturba-
tively, which is justified because eventhough ḟ might be
enhanced by standard model currents, extra dimensions
or a vanishing string coupling, it must still carry some
type of mass suppression.
The trace-reversed form of the field equations is useful
because it allows us to immediately find a formal solution.
Inverting the D’Alambertian operator we obtain
Hµν = −16π �−1η
Tµν −
+O(h)2, (13)
where we have defined an effective metric perturbation
Hµν ≡ hµν + ḟ ǫ̃0αβ(µhν)β,α. (14)
Note that this formal solution is identical to the formal
PN solution to the field equations in the limit ḟ → 0.
Also note that the second term in Eq. (14) is in essence
a curl operator acting on the metric. This antisymmetric
operator naturally forces the trace of the CS correction
to vanish, as well as the 00 component and the symmetric
spatial part.
From the formal solution to the modified field equa-
tions, we immediately identify the only two possible non-
zero CS contributions : a coupling to the vector compo-
nent of the metric h0i; and coupling to the transverse-
traceless part of the spatial metric hTTij . The latter
has already been studied in the gravitational wave con-
text [29, 30, 47] and it vanishes identically if we require
the spatial sector of the metric perturbation to be confor-
mally flat. The former coupling is a new curl-type contri-
bution to the metric perturbation that, to our knowledge,
had so far been neglected both by the string theory and
PPN communities. In fact, as we shall see in later sec-
tions, terms of this type will force us to introduce a new
PPN parameter that is proportional to the curl of certain
PPN potentials.
Let us conclude this section by pushing the formal so-
lution to the modified field equations further to obtain a
formal solution in terms of the actual metric perturba-
tion hµν . Combining Eqs. (13) and (14) we arrive at the
differential equation
hµν+ḟ ǫ̃
(µhν)β,α = −16π �−1η
Tµν −
+O(h)2.
Since we are searching for perturbations about the gen-
eral relativistic solution, we shall make the ansatz
hµν = h
µν + ḟζµν +O(h)2, (16)
where h
µν is the solution predicted by general relativity
h(GR)µν ≡ −16π �−1η
Tµν −
, (17)
and where ζµν is an unknown function we are solving for.
Inserting this ansatz into Eq. (15) we obtain
ζµν+ḟ ǫ̃
(µζν)β,α = 16πǫ̃
(µ∂α�
Tν)β −
gν)βT
We shall neglect the second term on the left-hand side be-
cause it would produce a second order correction. Such
conclusion was also reached when studying parity viola-
tion in GR to explain certain features of the CMB [35].
We thus obtain the formal solution
ζµν = 16πǫ̃
(µ∂α�
Tν)β −
gν)βT
and the actual metric perturbation to linear order be-
comes
hµν = −16π �−1η
Tµν −
+ 16πḟ ǫ̃kℓi�−1η
δi(µTν)ℓ,k −
δi(µην)ℓT,k
+O(h)2,
where we have used some properties of the Levi-Civita
symbol to simplify this expression. The procedure pre-
sented here is general enough that it can be directly ap-
plied to study CS gravity in the PPN framework, as well
as possibly find PN solutions to CS gravity.
IV. PN EXPANSION OF CS GRAVITY
In this section, we perform a PN expansion of the field
equations and obtain a solution in the form of a PN se-
ries. This solution then allows us to read off the PPN
parameters by comparing it to the standard PPN super-
metric [Eq. (5)]. In this section we shall follow closely the
methods of [6] and [61] and indices shall be manipulated
with the Minkowski metric, unless otherwise specified.
Let us begin by expanding the field equations to second
order in the metric perturbation. Doing so we find that
the Ricci and Cotton tensors are given to second order
Rµν = −
�ηhµν − 2hσ(µ,ν)σ + h,µν
2hρ(µ,ν)λ − hµν,ρλ − hρλ,µν
hρλ,µhρλ,ν + h
ν,λ (21)
− hρµ,λhρν,λ +
h,λ − 2hλρ,ρ
hµν,λ − 2hλ(µ,ν)
+O(h)3,
Cµν = −
ǫ̃0αβ(µ
�ηhν)β,α − hσβ,αν)σ
ǫ̃0αβ(µ
�ηhν)β,α − hσβ,αν)σ
2hν)(λ,α) − hλα,ν)
β − 2hσ(λ,β)σ + h,βλ
− 2Q̂Rν)β,α
ǫ̃σαβ(µ
2h0(σ,τ) − hστ,0
hτ [β,α]ν) − hν)[β,α]τ
hµλǫ̃
0αβ(λ
�ηhν)β,α − hσβ,αν)σ
ǫ̃0αβ(µ
β,α − hσβ,ασλ)
hνλ +O(h)3. (22)
where index contraction is carried out with the
Minkowski metric and where we have not assumed any
gauge condition. The operator Q̂(·) takes the quadratic
part of its operand [of O(h)2] and it is explained in more
detail in Appendix B, where the derivation of the expan-
sion of the Cotton tensor is presented in more detail. In
this derivation, we have used the definition of the Levi-
Civita tensor
ǫαβγδ = (−g)1/2ǫ̃αβγδ =
ǫ̃αβγδ +O(h)2, (23)
ǫαβγδ = −(−g)−1/2ǫ̃αβγδ = −
ǫ̃αβγδ +O(h)2.
Note that the PN expanded version of the linearized Ricci
tensor of Eq. (21) agrees with previous results [6]. Also
note that if the Lorentz condition is enforced, several
terms in both expressions vanish identically and the Cot-
ton tensor to first order reduces to Eq. (12), which agrees
with previous results [30].
Let us now specialize the analysis to the standard PPN
gauge. For this purpose, we shall impose the following
gauge conditions
k − 1
h,j = O(4),
k − 1
hkk,0 = O(5), (24)
where hkk is the spatial trace of the metric perturbation.
Note that the first equation is the PN expansion of one of
the Lorentz gauge conditions, while the second equation
is not. This is the reason why the previous equations
where not expanded in the Lorentz gauge. Nonetheless,
such a gauge condition does not uniquely fix the coordi-
nate system, since we can still perform an infinitesimal
gauge transformation that leaves the modified field equa-
tions invariant. One can show that the Lorentz and PPN
gauge are related to each other by such a gauge transfor-
mation. In the PPN gauge, then, the Ricci tensor takes
the usual form
R00 = −
∇2h00 −
h00,ih00,
hijh00,ij +O(6),
R0i = −
∇2h0i −
h00,0i +O(5),
Rij = −
∇2hij +O(4), (25)
which agrees with previous results [6], while the Cotton
tensor reduces to
C00 = O(6),
C0i = −
ḟ ǫ̃0kli∇2h0l,k +O(5),
Cij = −
ḟ ǫ̃0kl(i∇2hj)l,k +O(4), (26)
where ∇ = ηij∂i∂j is the Laplacian of flat space [see Ap-
pendix B for the derivation of Eq. (26).] Note again the
explicit appearance of two coupling terms of the Cotton
tensor to the metric perturbation: one to the transverse-
traceless part of the spatial metric and the other to the
vector metric perturbation. The PN expansions of the
linearized Ricci and Cotton tensor then allow us to solve
the modified field equations in the PPN framework.
V. PPN SOLUTION OF CS GRAVITY
In this section we shall proceed to systematically solve
the modified field equation following the standard PPN
iterative procedure [6]. We shall begin with the 00 and
ij components of the metric to O(2), and then proceed
with the 0i components to O(3) and the 00 component
to O(4). Once all these components have been solved for
in terms of PPN potentials, we shall be able to read off
the PPN parameters adequate to CS gravity.
A. h00 and hij to O(2)
Let us begin with the modified field equations for the
scalar sector of the metric perturbation. These equations
are given to O(2) by
∇2h00 = −8πρ, (27)
because T = −ρ. Eq. (27) is the Poisson equation, whose
solution in terms of PPN potentials is
h00 = 2U +O(4). (28)
Let us now proceed with the solution to the field equa-
tion for the spatial sector of the metric perturbation.
This equation to O(2) is given by
∇2hij + ḟ ǫ̃0kl(i∇2hj)l,k = −8πρδij , (29)
where note that this is the first appearance of a Cotton
tensor contribution. Since the Levi-Civita symbol is a
constant and ḟ is only time-dependent, we can factor
out the Laplacian and rewrite this equation in terms of
the effective metric Hij as
∇2Hij = −8πρδij , (30)
where, as defined in Sec. III,
Hij = hij + ḟ ǫ̃0kl(ihj)l,k. (31)
The solution of Eq. (30) can be immediately found in
terms of PPN potentials as
Hij = 2Uδij +O(4), (32)
which is nothing but Eq. (13). Recall, however, that in
Sec. III we explicitly used the Lorentz gauge to simplify
the field equations, whereas here we are using the PPN
gauge. The reason why the solutions are the same is that
the PPN and Lorentz gauge are indistinguishable to this
order.
Once the effective metric has been solved for, we can
obtain the actual metric perturbation following the pro-
cedure described in Sec. III. Combining Eq. (31) with
Eq. (32), we arrive at the following differential equation
hij + ḟ ǫ̃
(ihj)l,k = 2Uδij. (33)
We look for solutions whose zeroth-order result is that
predicted by GR and the CS term is a perturbative cor-
rection, namely
hij = 2Uδij + ḟζij , (34)
where ζ is assumed to be of O(ḟ)0. Inserting this ansatz
into Eq. (33) we arrive at
ζij + ḟ ǫ̃
(iζj)l,k = 0, (35)
where the contraction of the Levi-Civita symbol and the
Kronecker delta vanished. As in Sec. III, note that the
second term on the left hand side is a second order cor-
rection and can thus be neglected to discover that ζij
vanishes to this order.
The spatial metric perturbation to O(2) is then simply
given by the GR prediction without any CS correction,
namely
hij = 2Uδij +O(4). (36)
Physically, the reason why the spatial metric is unaf-
fected by the CS correction is related to the use of a
perfect fluid stress-energy tensor, which, together with
the PPN gauge condition, forces the metric to be spa-
tially conformally flat. In fact, if the spatial metric were
not flat, then the spatial sector of the metric perturba-
tion would be corrected by the CS term. Such would
be the case if we had pursued a solution to 2 PN or-
der, where the Landau-Lifshitz pseudo-tensor sources a
non-conformal correction to the spatial metric [39], or if
we had searched for gravitational wave solutions, whose
stress-energy tensor vanishes [29, 36]. In fact, one can
check that, in such a scenario, Eq. (30) reduces to that
found by [29, 30, 36, 47] as ρ → 0. We have then found
that the weak-field expansion of the gravitational field
outside a fluid body, like the Sun or a compact binary, is
unaffected by the CS correction to O(2).
B. h0i to O(3)
Let us now look for solutions to the field equations for
the vector sector of the metric perturbation. The field
equations to O(3) become
∇2h0i +
h00,0i +
ḟ ǫ̃0kli∇2h0l,k = 16πρvi, (37)
where we have used that T 0i = −T0i. Using the lower
order solutions and the effective metric, as in Sec. III, we
obtain
∇2H0i + U,0i = 16πρvi, (38)
where the vectorial sector of the effective metric is
H0i = h0i +
ḟ ǫ̃0klih0l,k. (39)
We recognize Eq. (38) as the standard GR field equation
to O(3), except that the dependent function is the effec-
tive metric instead of the metric perturbation. We can
thus solve this equation in terms of PPN potentials to
obtain
H0i = −
Wi, (40)
where we have used that the superpotential X satisfies
X,0j = Vj − Wj (see Appendix A for the definitions.)
Combining Eq. (39) with Eq. (40) we arrive at a differ-
ential equation for the metric perturbation, namely
h0i +
ḟ ǫ̃0klih0l,k = −
Wi. (41)
Once more, let us look for solutions that are perturbation
about the GR prediction, namely
h0i = −
Wi + ḟζi, (42)
where we again assume that ζi is of O(ḟ)0. The field
equation becomes
ḟ (∇× ζ)i =
(∇× V )i +
(∇×W )i
where (∇×A)i = ǫijk∂jAk is the standard curl operator
of flat space. As in Sec. III, note once more that the
second term on the left-hand side is again a second order
correction and we shall thus neglect it. Also note that
the curl of the Vi potential happens to be equal to the
curl of the Wi potential. The solution for the vectorial
sector of the actual gravitational field then simplifies to
h0i = −
Wi + 2ḟ (∇× V )i +O(5). (44)
We have arrived at the first contribution of CS mod-
ified gravity to the metric for a perfect fluid source.
Chern-Simons gravity was previously seen to couple to
the transverse-traceless sector of the metric perturbation
for gravitational wave solutions [29, 30, 36, 47]. The CS
correction is also believed to couple to Noether vector
currents, such as neutron currents, which partially fueled
the idea that this correction could be enhanced. However,
to our knowledge, this correction was never thought to
couple to vector metric perturbations. From the analy-
sis presented here, we see that in fact CS gravity does
couple to such terms, even if the matter source is neu-
trally charged. The only requirement for such couplings
is that the source is not static, ie. that the object is ei-
ther moving or spinning relative to the PPN rest frame
so that the PPN vector potential does not vanish. The
latter is suppressed by a relative O(1) because in the far
field the velocity of a compact object produces a term of
O(3) in Vi, while the spin produces a term of O(4). In
a later section, we shall discuss some of the physical and
observational implications of such a modification to the
metric.
C. h00 to O(4)
A full analysis of the PPN structure of a modified the-
ory of gravity requires that we solve for the 00 component
of the metric perturbation to O(4). The field equations
to this order are
∇2h00 −
h00,ih00,i +
hijh00,ij = 4πρ [1
v2 − U + 1
, (45)
where the CS correction does not contribute at this order
(see Appendix B.) Note that the h0i sector of the metric
perturbation to O(3) does not feed back into the field
equations at this order either. The terms that do come
into play are the h00 and hij sectors of the metric, which
are not modified to lowest order by the CS correction.
The field equation, thus, reduce to the standard one of
GR, whose solution in terms of PPN potentials is
h00 = 2U − 2U2 + 4Φ1 + 4Φ2 +2Φ3 +6Φ4 +O(6). (46)
We have thus solved for all components of the metric per-
turbation to 1 PN order beyond the Newtonian answer,
namely g00 to O(4), g0i to O(3) and gij to O(2).
D. PPN Parameters for CS Gravity
We now have all the necessary ingredients to read off
the PPN parameters of CS modified gravity. Let us begin
by writing the full metric with the solutions found in the
previous subsections:
g00 = −1 + 2U − 2U2 + 4Φ1 + 4Φ2 + 2Φ3 + 6Φ4 +O(6),
g0i = −
Wi + 2ḟ (∇× V )i +O(5),
gij = (1 + 2U) δij +O(4). (47)
One can verify that this metric is indeed a solution of
Eqs. (27), (29), (37) and (45) to the appropriate PN or-
der and to first order in the CS coupling parameter. Also
note that the solution of Eq. (47) automatically satis-
fies the constraint ⋆RR = 0 to linear order because the
contraction of the Levi-Civita symbol with two partial
derivatives vanishes. Such a solution is then allowed in
CS gravity, just as other classical solutions are [62], and
the equations of motion for the fluid can be obtained di-
rectly from the covariant derivative of the stress-energy
tensor.
We can now read off the PPN parameters of the CS
modified theory by comparing Eq. (5) to Eq. (47). A vi-
sual inspection reveals that the CS solution is identical
to the classical GR one, which implies that γ = β = 1,
ζ = 0 and α1 = α2 = α3 = ξ1 = ξ2 = ξ3 = ξ4 = 0 and
there are no preferred frame effects. However, Eq. (5)
contains an extra term that cannot be modeled by the
standard PPN metric of Eq. (5), namely the curl contri-
bution to g0i. We then see that the PPN metric must be
enhanced by the addition of a curl-type term to the 0i
components of the metric, namely
g0i ≡ −
(4γ + 3 + α1 − α2 + ζ1 − 2ξ)Vi
(1 + α2 − ζ1 + 2ξ)Wi + χ (r∇× V )i , (48)
where χ is a new PPN parameter and where we have
multiplied the curl operator by the radial distance to
the source, r, in order to make χ a proper dimension-
less parameter. Note that there is no need to introduce
any additional PPN parameters because the curl of Wi
equals the curl of Vi. In fact, we could have equally pa-
rameterized the new contribution to the PPN metric in
terms of the curl of Wi, but we chose not to because Vi
appears more frequently in PN theory. For the case of
CS modified gravity, the new χ parameter is simply
χ = 2
, (49)
which is dimensionless since ḟ has units of length. If an
experiment could measure or place bounds on the value
of χ, then ḟ could also be bounded, thus placing a con-
straint on the CS coupling parameter.
VI. ASTROPHYSICAL IMPLICATIONS
In this section we shall propose a physical interpreta-
tion to the CS modification to the PPN metric and we
shall calculate some GR predictions that are modified by
this correction. This section, however, is by no means a
complete study of all the possible consequences of the CS
correction, which is beyond the scope of this paper.
Let us begin by considering a system of A nearly spher-
ical bodies, for which the gravitational vector potentials
are simply [6]
V i =
viA +
, (50)
W i =
(vA · nA)niA +
where mA is the mass of the Ath body, rA is the field
point distance to the Ath body, niA = x
A/rA is a unit
vector pointing to the Ath body, vA is the velocity of the
Ath body and J iA is the spin-angular momentum of the
Ath body. For example, the spin angular momentum for
a Kerr spacetime is given by J i = m2ai, where a is the
dimensionless Kerr spin parameter. Note that if A = 2
then the system being modeled could be a binary of spin-
ning compact objects, while if A = 1 it could represent
the field of the sun or that of a rapidly spinning neutron
star or pulsar.
In obtaining Eq. (50), we have implicitly assumed a
point-particle approximation, which in classical GR is
justified by the effacement principle. This principle pos-
tulates that the internal structure of bodies contributes
to the solution of the field equations to higher PN order.
One can verify that this is indeed the case in classical
GR, where internal structure contributions appear at 5
PN order. In CS gravity, however, it is a priori unclear
whether an analogous effacement principle holds because
the CS term is expected to couple with matter current via
standard model-like interactions. If such is the case, it is
possible that a “mountain” on the surface of a neutron
star [63] or an r-mode instability [64, 65, 66] enhances
the CS contribution. In this paper, however, we shall
neglect these interactions, and relegate such possibilities
to future work [47].
With such a vector potential, we can calculate the CS
correction to the metric. For this purpose, we define the
correction δg0i ≡ g0i − g(GR)0i , where g
0i is the GR
prediction without CS gravity. We then find that the CS
corrections is given by
δg0i = 2
(vA × nA)i −
(JA · nA)
where the · operator is the flat-space inner product and
where we have used the identities ǫ̃ijk ǫ̃klm = δilδjm −
δimδjl and ǫ̃ilk ǫ̃jlm = 2δij . Note that the first term of
Eq. (51) is of O(3), while the second and third terms
are of O(4) as previously anticipated. Also note that ḟ
couples both to the spin and orbital angular momentum.
Therefore, whether the system under consideration is the
Solar system (vi of the Sun is zero while J i is small), the
Crab pulsar (vi is again zero but J i is large) or a binary
system of compact objects (neither vi nor J i vanish),
there will in general be a non-vanishing coupling between
the CS correction and the vector potential of the system.
From the above analysis, it is also clear that the CS
correction increases with the non-linearity of the space-
time. In other words, the CS term is larger not only for
systems with large velocities and spins, but also in re-
gions near the source. For a binary system, this fact
implies that the CS correction is naturally enhanced
in the last stages of inspiral and during merger. Note
that this enhancement is different from all previous en-
hancements proposed, since it does not require the pres-
ence of charge [28, 47], a fifth dimension with warped
compactifications [59, 60], or a vanishing string cou-
pling [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58]. Un-
fortunately, the end of the inspiral stage coincides with
the edge of the PN region of validity and, thus, a com-
plete analysis of such a natural enhancement will have to
be carried out through numerical simulations.
In the presence of a source with the vector potentials of
Eq. (50), we can write the vectorial sector of the metric
perturbation in a suggestive way, namely
g0i =
viA −
vA − v(eff)A
(eff)
A · nA − 2
(eff)
where we have defined an effective velocity and angular
momentum vector via
viA(eff) = v
A − 6ḟ
J iA(eff) = J
A − ḟmAviA, (53)
or in terms of the Newtonian orbital angular momentum
= rA × pA and linear momentum piA(N) = mAv
LiA(eff) = L
A(N) − 6ḟ (nA × JA)
J iA(eff) = J
A − ḟpiA. (54)
From this analysis, it is clear that the CS corrections
seems to couple to both a quantity that resembles the or-
bital and the spin angular momentum vector. Note that
when the spin angular momentum vanishes the vectorial
metric perturbation is identical to that of a spinning mov-
ing fluid, but where the spin is induced by the coupling
of the orbital angular momentum to the CS term.
The presence of an effective CS spin angular momen-
tum in non-spinning sources leads to an interesting phys-
ical interpretation. Let us model the field that sources
ḟ as a fluid that permeates all of spacetime. This field
could be, for example, a model-independent axion, in-
spired by the quantity introduced in the standard model
to resolve the strong CP problem [67]. In this scenario,
then the fluid is naturally “dragged” by the motion of any
source and the CS modification to the metric is nothing
but such dragging. This analogy is inspired by the er-
gosphere of the Kerr solution, where inertial frames are
dragged with the rotation of the black hole. In fact, one
could push this analogy further and try to construct the
shear and bulk viscosity of such a fluid, but we shall not
attempt this here. Of course, this interpretation is to be
understood only qualitatively, since its purpose is only
to allow the reader to picture the CS modification to the
metric in physical terms.
An alternative interpretation can be given to the CS
modification in terms of the gravito-electro-magnetic
(GED) analogy [40, 41], which shall allow us to eas-
ily construct the predictions of the modified theory. In
this analogy, one realizes that the PN solution to the
linearized field equations can be written in terms of a
potential and vector potential, namely
ds2 = − (1− 2Φ)dt2 − 4 (A · dx) dt+ (1 + 2Φ) δijdxidxj ,
where Φ reduces to the Newtonian potential U in the
Newtonian limit [41] and Ai is a vector potential related
to the metric via Ai = −g0i/4. One can then construct
GED fields in analogy to Maxwell’s electromagnetic the-
ory via
Ei = − (∇Φ)i − ∂t
Bi = (∇×A)i , (56)
which in terms of the vectorial sector of the metric per-
turbation becomes
Ei = − (∇Φ)i + 1
Bi = −1
(∇× g)i , (57)
where we have defined the vector gi = g0i. The geodesic
equations for a test particle then reduce to the Lorentz
force law, namely
F i = −mEi − 2m (v ×B)i . (58)
We can now work out the effect of the CS correction
on the GED fields and equations of motion. First note
that the CS correction only affects g. We can then write
the CS modification to the Lorentz force law by defining
δai = ai − ai
, where ai
is the acceleration vector
predicted by GR, to obtain,
δai =
δġi +
(v × δΩ)i , (59)
where we have defined the angular velocity
δΩi = (∇× δg)i . (60)
The time derivative of the vector gi is of O(5) and can
thus be neglected, but the angular velocity cannot and it
is given by
δΩi = −
3 (vA · nA)niA − viA
, (61)
which is clearly of O(3). Note that although the first
term between square brackets cancels for circular orbits
because niA is perpendicular to v
A to Newtonian order,
the second term does not. The angular velocity adds a
correction to the acceleration of O(4), namely
δai = −3
(vA · nA) (vA × nA)i , (62)
which for a system in circular orbit vanishes to Newto-
nian order. One could use this formalism to find the
perturbations in the motion of moving objects by inte-
grating Eq. (62) twice. However, for systems in a circular
orbit, such as the Earth-Moon system or compact bina-
ries, this correction vanishes to leading order. Therefore,
lunar ranging experiments [68] might not be able to con-
straint ḟ .
Another correction to the predictions of GR is that
of the precession of gyroscopes by the so-called Lense-
Thirring or frame-dragging effect. In this process, the
spin angular momentum of a source twists spacetime in
such a way that gyroscopes are dragged with it. The
precession angular velocity depends on the vector sector
of the metric perturbation via Eq. (61). Thus, the full
Lense-Thirring term in the precession angular velocity of
precessing gyroscopes is
ΩiLT = −
J iA(eff) − 3n
JA(eff) · nA
. (63)
Note that this angular velocity is identical to the GR
prediction, except for the replacement J iA → J iA(eff).
In CS modified gravity, then, the Lense-Thirring effect
is not only produced by the spin angular momentum of
the gyroscope but also by the orbital angular momentum
that couples to the CS correction. Therefore, if an ex-
periment were to measure the precession of gyroscopes
by the curvature of spacetime (see, for example, Gravity
Probe B [42]) one could constraint ḟ and thus some in-
trinsic parameters of string theory. Note, however, that
the CS correction depends on the velocity of the bodies
with respect to the inertial PPN rest-frame. In order to
relate these predictions to the quantities that are actually
measured in the experiment, one would have to transform
to the experiment’s frame, or perhaps to a basis aligned
with the direction of distant stars [6].
Are there other experiments that could be performed
to measure such a deviation from GR? Any experiment
that samples the vectorial sector of the metric would in
effect be measuring such a deviation. In this paper, we
have only discussed modifications to the frame-dragging
effect and the acceleration of bodies through the GED
analogy, but this need not be the only corrections to
classical GR predictions. In fact, any predictions that de-
pends on g0i indirectly, for example via Christoffel sym-
bols, will probably also be modified unless the corrections
is fortuitously canceled. In this paper, we have laid the
theoretical foundations of the weak-field correction to the
metric due to CS gravity and studied some possible cor-
rections to classical predictions. A detailed study of other
corrections is beyond the scope of this paper.
VII. CONCLUSION
We have studied the weak-field expansion of the solu-
tion to the CS modified field equations in the presence
of a perfect fluid PN source in the point particle limit.
Such an expansion required that we linearize the Ricci
and Cotton tensor to second order in the metric pertur-
bation without any gauge assumption. An iterative PPN
formalism was then employed to solve for the metric per-
turbation in this modified theory of gravity. We have
found that CS gravity possesses the same PPN parame-
ters as those of GR, but it also requires the introduction
of a new term and PPN parameter that is proportional
to the curl of the PPN vector potentials. Such a term
is enhanced in non-linear scenarios without requiring the
presence of standard model currents, large extra dimen-
sions or a vanishing string coupling.
We have proposed an interpretation for the new term
in the metric produced by CS gravity and studied some
of the possible consequences it might have on GR predic-
tions. The interpretation consists of picturing the field
that sources the CS term as a fluid that permeates all of
spacetime. In this scenario, the CS term is nothing but
the “dragging” of the fluid by the motion of the source.
Irrespective of the validity of such an interpretation, the
inclusion of a new term to the weak-field expansion of
the metric naturally leads to corrections to the standard
GR predictions. We have studied the acceleration of
point particles and the Lense-Thirring contribution to
the precession of gyroscopes. We have found that both
corrections are proportional to the CS coupling parame-
ter and, therefore, experimental measurement of these ef-
fects might be used to constraint CS and, possibly, string
theory.
Future work could concentrate on studying further the
non-linear enhancement of the CS correction and the
modifications to the predictions of GR. The PPN analysis
performed here breaks down very close to the source due
to the use of a point particle approximation in the stress
energy tensor. One possible research route could consists
of studying the CS correction in a perturbed Kerr back-
ground [69]. Another possible route could be to analyze
other predictions of the theory, such as the perihelion
shift of Mercury or the Nordtvedt effect. Furthermore,
in light of the imminent highly-accurate measurement of
the Lense-Thirring effect by Gravity Probe B, it might be
useful to revisit this correction in a frame better-adapted
to the experimental setup. Finally, the CS modification
to the weak-field metric might lead to non-conservative
effects and the breaking of the effacement principle [47],
which could be studied through the evaluation of the
gravitational pseudo stress-energy tensor. Ultimately, it
will be experiments that will determine the viability of
CS modified gravity and string theory.
Acknowledgments
The authors acknowledge the support of the Center
for Gravitational Wave Physics funded by the National
Science Foundation under Cooperative Agreement PHY-
01-14375, and support from NSF grants PHY-05-55-628.
We would also like to thank Cliff Will for encouraging
one of us to study the PPN formalism and Pablo Laguna
for suggesting one of us to look into the PPN expansion
of CS gravity. We would also like to thank R. Jackiw,
R. Wagoner and Ben Owen for enlightening discussions
and comments.
APPENDIX A: PPN POTENTIALS
In this appendix, we present explicit expressions for
the PPN potentials used to parameterize the metric in
Eq. (5). These potentials are the following:
|x− x′|
d3x′,
ρ′v′i
|x− x′|
d3x′,
ρ′v′j(x− x′)j(x− x′)i
|x− x′|3
d3x′,
ρ′ρ′′
(x− x′)i
|x− x′|3
(x′ − x′′)i
|x− x′′| −
(x− x′′)i
|x′ − x′′|
d3x′d3x′′,
ρ′v′2
|x− x′|
d3x′, Φ2 ≡
ρ′U ′
|x− x′|
d3x′,
|x− x′|
d3x′, Φ4 ≡
|x− x′|
d3x′,
v′i (x− x′)
|x− x′| d
ρ′|x− x′|d3x′. (A1)
These potentials satisfy the following relations
∇2U = −4πρ, ∇2Vi = −4πρvi,
∇2Φ1 = −4πρv2, ∇2Φ2 = −4πρU,
∇2Φ3 = −4πρΠ, ∇2Φ4 = −4πp,
∇2X = −2U (A2)
The potential X is sometimes referred to as the super-
potential because it acts as a potential for the Newtonian
potential.
APPENDIX B: LINEARIZATION OF THE
COTTON TENSOR
In this appendix, we present some more details on the
derivation of the linearized Cotton tensor to second order.
We begin with the definition of the Cotton tensor [30] in
terms of the symmetrization operator, namely
Cµν = − 1√
(Dσf) ǫ
σαβ(µDαR
β + (Dστf)
Rτ(µ|σ|ν)
Using the symmetries of the Levi-Civita and Riemann
tensor, as well as the fact that f depends only on time,
we can simplify the Cotton tensor to
Cµν = (−g)−1ḟ
ǫ̃0αβ(µRν)β,α + ǫ̃
0αβ(µΓ
Γ0στ ǫ̃
σαβ(µRν)τ αβ
. (B2)
Noting that the determinant of the metric is simply g =
−1 + h, so that (−g)−1 = 1 + h, we can identify four
terms in the Cotton tensor
A = ḟ ǫ̃
0αβ(µ
L̂Rν)β,α
B = ḟ ǫ̃
0αβ(µhρρ
L̂Rν)β,α
C = ḟ ǫ̃
0αβ(µ
L̂Rλβ
ǫ̃σαβ(µ
L̂Γ0στ
L̂Rν)ταβ
E = ḟ ǫ̃
0αβ(µ
Q̂Rν)β,α
, (B3)
where the L̂ operator stands for the linear part of its
operand, while the Q̂ operator isolates the quadratic part
of its operand. For example, if we act L̂ and Q̂ on (1+h)n,
where n is some integer, we obtain
L̂(1 + h)n
= nh,
Q̂(1 + h)n
n(n− 1)
h2.(B4)
Let us now compute each of these terms separately.
The first four terms are given by
A = −
ǫ̃0αβ(µ
β,α − hσβ,νασ
B = −
hǫ̃0αβ(µ
β,α − hσβ,νασ
C = −
ǫ̃0αβ(µ
hν)λ,α + h
α,λ − hλα,ν)
β − hσλ,βσ − hσβ,λσ + h,λβ
ǫ̃σαβ(µ
2h0(σ,τ) − hστ,0
hτ [β,α]
ν − hν [β,α]τ
The last term of the Cotton tensor is simply the deriva-
tive of the Ricci tensor which we already calculated to
second order in Eq. (21). In order to avoid notation clut-
ter, we shall not present it again here, but instead we
combine all the Cotton tensor pieces to obtain
Cµν = − ḟ
ǫ̃0αβ(µ
β,α − hσβ,ασν)
ǫ̃0αβ(µ
β,α − hσβ,ασν)
2hν)(λ,α) − hλα,ν)
β − 2hσ(λ,β)σ + h,βλ
− 2Q̂Rν)β,α
ǫ̃σαβ(µ
2h0(σ,τ) − hστ,0
hτ [β,α]
ν) − hν)[β,α]τ
+O(h)3
where its covariant form is
Cµν = −
ǫ̃0αβ(µ
�ηhν)β,α − hσβ,αν)σ
ǫ̃0αβ(µ
�ηhν)β,α − hσβ,αν)σ
2hν)(λ,α) − hλα,ν)
β − 2hσ(λ,β)σ + h,βλ
− 2Q̂Rν)β,α + hνλ
β,α − hσβ,ασλ)
ǫ̃σαβ(µ
2h0(σ,τ) − hστ,0
hτ [β,α]ν) − hν)[β,α]τ
hµλǫ̃
0αβ(λ
�ηhν)β,α − hσβ,αν)σ
+O(h)3. (B7)
For the PPN mapping of CS modified gravity, only the
00 component of the metric is needed to second order,
which implies we only need C00 to O(h)2. This compo-
nent is given by
C00 =
ǫ̃ijk0
2h0(i,ℓ) − hiℓ,0
hℓ[k,j]0 − h0[k,j]ℓ
h0ℓǫ̃
0jk(ℓ
�ηh0k,j − hik,j0i
+O(h)3, (B8)
where in fact the last term vanishes due to the PPN gauge
condition. Note that this term is automatically of O(6),
which is well beyond the required order we need in h00.
[1] L. I. Schiff, Proc. Nat. Acad. Sci. 46, 871 (1960).
[2] K. Nordtvedt, Phys. Rev. 169, 1017 (1968).
[3] K. J. Nordtvedt and C. M. Will, Astrophys. J. 177, 775
(1972).
[4] C. M. Will, Astrophys. J. 163, 611 (1971).
[5] C. M. Will, Astrophys. J. 185, 31 (1973).
[6] C. M. Will, Theory and experiment in gravitational
physics (Cambridge University Press, Cambridge, UK,
1993).
[7] R. V. Wagoner, Phys. Rev. D1, 3209 (1970).
[8] C. M. Will and K. J. Nordtvedt, Astrophys. J. 177, 757
(1972).
[9] R. W. Hellings and K. Nordtvedt, Phys. Rev. D 7, 3593
(1973).
[10] N. Rosen, Annals Phys. 84, 455 (1974).
[11] D. L. Lee, W.-T. Ni, C. M. Caves, and C. M. Will, As-
trophys. J. 206, 555 (1976).
[12] D. L. Lee, A. P. Lightman, and W. T. Ni, Phys. Rev.
D10, 1685 (1974).
[13] C. M. Will, Living Reviews in Relativity 9 (2006).
[14] S. Kalyana Rama, ArXiv High Energy Physics - Theory
e-prints (1994), hep-th/9411076.
[15] V. D. Ivashchuk, V. S. Manko, and V. N. Melnikov,
ArXiv General Relativity and Quantum Cosmology e-
prints (2001), gr-qc/0101044.
[16] V. B. Bezerra, L. P. Colatto, M. E. Guimarães, and R. M.
Teixeira Filho, Phys. Rev. D 65, 104027 (2002), gr-
qc/0104038.
[17] R. J. Gleiser and C. N. Kozameh, Phys. Rev. D 64,
083007 (2001), gr-qc/0102093.
[18] Y.-Z. Fan, D.-M. Wei, and D. Xu (2007), astro-
ph/0702006.
[19] G. Amelino-Camelia, J. Ellis, N. E. Mavromatos, D. V.
Nanopoulos, and S. Sarkar, Nature (London) 393, 763
(1998), astro-ph/9712103.
[20] R. Gambini and J. Pullin, Phys. Rev. D59, 124021
(1999), gr-qc/9809038.
[21] C. M. Will, Phys. Rev. D 57, 2061 (1998), gr-qc/9709011.
[22] L. S. Finn and P. J. Sutton, Phys. Rev. D 65, 044022
(2002).
[23] P. D. Scharre and C. M. Will, Phys. Rev. D 65, 042002
(2002), gr-qc/0109044.
[24] P. J. Sutton and L. S. Finn, Class. Quantum Grav. 19,
1355 (2002), gr-qc/0112018.
[25] C. M. Will and N. Yunes, Class. Quantum Grav. 21, 4367
(2004), gr-qc/0403100.
[26] E. Berti, A. Buonanno, and C. M. Will, Class. Quantum
Grav. 22, S943 (2005), gr-qc/0504017.
[27] E. Berti, A. Buonanno, and C. M. Will, Phys. Rev. D 71
(2005), gr-qc/0411129.
[28] S. Alexander, L. S. Finn, and N. Yunes, in progress
(2007).
[29] S. Alexander and J. Martin, Phys. Rev. D71, 063526
(2005), hep-th/0410230.
[30] R. Jackiw and S. Y. Pi, Phys. Rev. D68, 104012 (2003),
gr-qc/0308071.
[31] J. Polchinski, String theory. Vol. 2: Superstring theory
and beyond (Cambridge University Press, Cambridge,
UK, 1998).
[32] M. B. Green, J. H. Schwarz, and E. Witten, SUPER-
STRING THEORY. VOL. 2: LOOP AMPLITUDES,
ANOMALIES AND PHENOMENOLOGY (Cambridge
University Press (Cambridge Monographs On Mathemat-
ical Physics), Cambridge, Uk, 1987).
[33] A. Lue, L.-M. Wang, and M. Kamionkowski, Phys. Rev.
Lett. 83, 1506 (1999), astro-ph/9812088.
[34] M. Li, J.-Q. Xia, H. Li, and X. Zhang (2006), hep-
ph/0611192.
[35] S. H. S. Alexander (2006), hep-th/0601034.
[36] S. H. S. Alexander, M. E. Peskin, and M. M. Sheik-
Jabbari, Phys. Rev. Lett. 96, 081301 (2006), hep-
th/0403069.
[37] S. H. S. Alexander and J. Gates, S. James, JCAP 0606,
018 (2006), hep-th/0409014.
[38] S. Alexander and N. Yunes (2007), hep-th/0703265.
[39] L. Blanchet, Living Rev. Rel. 9, 4 (2006), and references
therein, gr-qc/0202016.
[40] K. S. Thorne, R. H. Price, and D. A. MacDonald, Black
holes: The membrane paradigm (Black Holes: The Mem-
brane Paradigm, 1986).
[41] B. Mashhoon (2003), gr-qc/0311030.
[42] A discussion of the history, technology and
physics of Gravity Probe B can be found at
http://einstein.standfod.edu.
[43] C. M. Bender and S. A. Orszag, Advanced mathematical
methods for scientists and engineers 1, Asymptotic meth-
ods and perturbation theory (Springer, New York, 1999).
[44] J. Kevorkian and J. D. Cole, Multiple scale and singular
perturbation methods (Springer, New York, 1991), and
references therein.
[45] N. Yunes, W. Tichy, B. J. Owen, and B. Brügmann,
Phys. Rev. D74, 104011 (2006), gr-qc/0503011.
[46] N. Yunes and W. Tichy, Phys. Rev. D74, 064013 (2006),
gr-qc/0601046.
[47] S. Alexander, B. Owen, and N. Yunes, work in progress.
[48] R. H. Brandenberger and C. Vafa, Nucl. Phys. B316,
391 (1989).
[49] A. A. Tseytlin and C. Vafa, Nucl. Phys. B372, 443
(1992), hep-th/9109048.
[50] A. Nayeri, R. H. Brandenberger, and C. Vafa, Phys. Rev.
Lett. 97, 021302 (2006), hep-th/0511140.
[51] C.-Y. Sun and D.-H. Zhang (2006), hep-th/0611101.
[52] D. H. Wesley, P. J. Steinhardt, and N. Turok, Phys. Rev.
D72, 063513 (2005), hep-th/0502108.
[53] S. Alexander, R. H. Brandenberger, and D. Easson, Phys.
Rev. D62, 103509 (2000), hep-th/0005212.
[54] R. Brandenberger, D. A. Easson, and D. Kimberly, Nucl.
Phys. B623, 421 (2002), hep-th/0109165.
[55] T. Battefeld and S. Watson, Rev. Mod. Phys. 78, 435
(2006), hep-th/0510022.
[56] R. H. Brandenberger, A. Nayeri, S. P. Patil, and C. Vafa
(2006), hep-th/0608121.
[57] R. Brandenberger (2007), hep-th/0702001.
[58] P. Brax, C. van de Bruck, and A.-C. Davis, Rept. Prog.
Phys. 67, 2183 (2004), hep-th/0404011.
[59] L. Randall and R. Sundrum, Phys. Rev. Lett. 83, 3370
(1999), hep-ph/9905221.
[60] L. Randall and R. Sundrum, Phys. Rev. Lett. 83, 4690
(1999), hep-th/9906064.
[61] C. W. Misner, K. S. Thorne, and J. A. Wheeler, Gravi-
tation (W. H. Freeman & Co., San Francisco, 1973).
[62] D. Guarrera and A. J. Hariton (2007), gr-qc/0702029.
[63] B. J. Owen, Phys. Rev. Lett. 95, 211101 (2005), astro-
ph/0503399.
[64] S. Chandrasekhar, Phys. Rev. Lett. 24, 611 (1970).
[65] J. L. Friedman and B. F. Schutz, Astrophys. J. 222, 281
(1978).
[66] L. Lindblom, B. J. Owen, and S. M. Morsink, Phys. Rev.
Lett. 80, 4843 (1998), gr-qc/9803053.
[67] M. Dine, W. Fischler, and M. Srednicki, Phys. Lett.
B104, 199 (1981).
[68] J. Murphy, T. W., K. Nordtvedt, and S. G. Turyshev,
Phys. Rev. Lett. 98, 071102 (2007), gr-qc/0702028.
[69] N. Yunes and J. A. Gonzalez, Phys. Rev. D73, 024010
(2006), gr-qc/0510076.
[70] Formally, ḟ by itself is dimensional, so it cannot be
treated as an expansion parameter. A dimensionless pa-
rameter can, however, be constructed by dividing ḟ by
some length scale squared.
http://einstein.standfod.edu
ABSTRACT
  We investigate the weak-field, post-Newtonian expansion to the solution of
the field equations in Chern-Simons gravity with a perfect fluid source. In
particular, we study the mapping of this solution to the parameterized
post-Newtonian formalism to 1 PN order in the metric. We find that the PPN
parameters of Chern-Simons gravity are identical to those of general
relativity, with the exception of the inclusion of a new term that is
proportional to the Chern-Simons coupling parameter and the curl of the PPN
vector potentials. We also find that the new term is naturally enhanced by the
non-linearity of spacetime and we provide a physical interpretation for it. By
mapping this correction to the gravito-electro-magnetic framework, we study the
corrections that this new term introduces to the acceleration of point
particles and the frame-dragging effect in gyroscopic precession. We find that
the Chern-Simons correction to these classical predictions could be used by
current and future experiments to place bounds on intrinsic parameters of
Chern-Simons gravity and, thus, string theory.

<|endoftext|><|startoftext|>
Scaling of Resistance and Electron Mean Free Path of Single-Walled Carbon
Nanotubes
Meninder Purewal1, Byung Hee Hong2, Anirudhh Ravi2, Bhupesh Chandra3, James Hone3, and Philip Kim2
Department of Applied Physics, Columbia University, New York, New York 10027
Department of Physics, Columbia University, New York, New York 10027 and
Department of Mechanical Engineering, Columbia University, New York, New York 10027
We present an experimental investigation on the scaling of resistance in individual single walled
carbon nanotube devices with channel lengths that vary four orders of magnitude on the same
sample. The electron mean free path is obtained from the linear scaling of resistance with length at
various temperatures. The low temperature mean free path is determined by impurity scattering,
while at high temperature the mean free path decreases with increasing temperature, indicating that
it is limited by electron-phonon scattering. An unusually long mean free path at room temperature
has been experimentally confirmed. Exponentially increasing resistance with length at extremely
long length scales suggests anomalous localization effects.
Single walled carbon nanotubes (SWNTs) are 1D con-
ductors that exhibit a rich variety of low dimensional
charge transport phenomena [1], including ballistic con-
duction [2, 3, 4, 5, 6], localization [7] and 1D variable
range hopping [8]. The electron mean free path, Lm, is
one of the important length scales that characterize the
different 1D transport regimes. One method of determin-
ing Lm in SWNTs is to measure ballistic conduction for a
given device channel length. However, this method yields
a lower bound of Lm, and works only at low temperature
[2, 3, 4, 5] or at higher temperature for small length scales
(<60 nm) [6]. Another approach to obtain Lm at room
temperature is to employ scanning probe microscopy to
measure the linear scaling of the channel resistance [9], or
use non-invasive multi-terminal measurements [10]. Due
to the experimental limitations of these approaches, the
characterization of Lm for the same SWNTs over a range
of temperatures is yet to be realized.
Recent advances in the growth of extremely long
SWNTs (>1 mm) [11] now allow for an intensive study
on their intrinsic properties. In this letter, we present
experimental measurements on the scaling behavior of
resistance in individual, millimeter long SWNTs for the
temperature range of 1.6 - 300 K. From the linear scaling
of resistance, the temperature dependent electron mean
free path is calculated for each temperature. Beyond the
linear scaling regime, we observe that the resistance in-
creases exponentially with length, indicating localization
behavior.
Macroscopically long and straight individual SWNTs
were grown on a degenerately doped Si/SiO2 substrate
(tox = 500 nm) using the chemical vapor deposition
method described in Ref.[11]. This was followed by the
fabrication of multiple Pd electrodes with various sepa-
rations (200 nm- 400 µm)(Fig. 1(a)). Pd electrodes were
chosen to create highly transparent SWNT-electrode con-
tacts [4]. The diameters of the SWNTs were measured
by atomic force microscope (AFM). We chose SWNTs
with diameter d less than 2.5 nm to exclude any possi-
bility of including multiwalled nanotubes (MWNT) in
-15 0 15
-30 -15 0
500 m500 m
Vg (V)
Vg (V)
0.8 m
0.8 m
1.5 m
1.5 m
FIG. 1: (a)Optical image showing typical SWNT devices with
multiple Pd electrodes. (Inset) Scanning electron microscope
image of an isolated SWNT contacted with these electrodes.
Room temperature ISD(Vg) of selected channel lengths for (b)
metallic SWNT (M1) and (c) semiconducting SWNT (SC3)
with VSD = 6.4 and 2.7 mV, respectively.
this study. In addition, we confirmed that the high
bias saturation current is < 30 µA for all SWNTs stud-
ied [12], assuring that the samples consisted of single
tubes rather than small bundles or MWNTs. The sub-
strate was used as a gate electrode to tune the chemical
potential of the sample by the application of a gate volt-
age Vg. A small dc source-drain bias voltage (< 10 mV),
VSD, was applied between pairs of consecutive electrodes,
and the two-terminal linear response conductance was
determined from the measured source-drain current ISD.
Fig. 1(b-c) shows the measured ISD as a function of
VSD for selected channel length sections on two repre-
sentative SWNTs. All curves exhibit a ‘gap’ like fea-
ture - a range of Vg where ISD is suppressed. On
the same SWNT, every device (pair of consecutive elec-
http://arxiv.org/abs/0704.0300v2
trodes) shows a similar ISD(Vg) up to a length-dependent
multiplicative factor, once we align the centers of the
gap region for each curve. The similarity of the ISD(Vg)
behavior in different sections for each SWNT sample
indicates that the corresponding ‘gap’ features are de-
rived from the intrinsic electronic structure of the SWNT
rather than the effects of random local variation.
We use the qualitatively different ISD(Vg) behaviors of
different SWNTs to categorize them as metallic (M-NT)
or semiconducting nanotubes (S-NT). Typical S-NTs
(Fig. 1(c)) exhibit an off current region ISD < 10
when the Fermi energy EF lies n the energy gap [13, 14].
On the other hand, a weaker suppression of ISD(Vg) is
observed in the ‘small gap’ region in M-NTs (Fig. 1(b)).
The ‘small gap’ in M-NTs has been attributed to the
curvature-induced energy gap Eg <100 meV [15], which
is distinguished from the S-NT energy gap, which scales
with diameter as Eg ∼ 1/d (nm) [1]. Among the 11
SWNTs we studied in this letter, we found 4 M-NTs
and 7 S-NTs. Each of these SWNTs exhibit a gap cen-
tered at Vg > 0, indicating their p-doped nature. At
large negative gate voltage (Vg < −20 V), EF lies well
outside of the gap region and ISD(Vg) saturates to I
whose value depends only on the applied VSD and chan-
nel length L of the SWNT section. The two-terminal
resistance of the SWNT section is then obtained from
R(L) = VSD/I
SD. We note that four-terminal resistance
measurements are possible for each section by utilizing
the available multiple electrode configuration. However,
in our experiment, the four terminal measurements yield
essentially similar results to the two terminalR(L), which
prevents separation of the ‘contact’ resistance contribu-
tion from R(L). Such inseparable contact resistance be-
tween SWNT-metal electrodes was reported to be caused
by the invasiveness of metal contacts [16].
We designed many pairs of electrodes with different L
on each SWNT so that the scaling of R(L) can be studied
for a specific sample at a given temperature T . Fig. 2(a)
show R(L) of a representative SWNT measured in the
temperature range of 1.6 - 300 K and with an L range of
200 nm - 50 µm. In these ranges, R(L) increases linearly
and appears to converge to a finite value for small L (in-
set to Fig. 2(a)). We found that this scaling behavior can
be described well by a simple linear dependence with an
offset: R(L) = ρL +Rc, where ρ and Rc are interpreted
as the 1D resistivity and contact resistance, respectively.
The solid lines in Fig. 2(a) are the two parameter line
fits of the data points at a given T value. From these
fits, Rc(T ) and ρ(T ) are obtained as shown in Fig. 2(b)
and Fig. 2(c), respectively. For this sample, Rc remains
fairly constant at ∼8 kΩ and ρ(T ) exhibits typical metal-
lic behavior, i.e. it decreases with T and saturates to a
value ρsat at low temperatures. Similar scaling behavior
of R(L) is observed in other SWNTs, from which both Rc
and ρ(T ) are extracted within the linear scaling regime.
Table I summarizes d, Rc, and ρsat for the 4 M-NTs
200 K
300 K
110 K
1.65 K
L ( m)
T (K) T (K)
0 100 200 300
100 200 300
(b) (c)
0 20 40
L ( m)
FIG. 2: (a) (Inset) R(L) for sample M1 at select temperatures
ranging from 1.6 - 300 K. (Main) A log-log plot highlights the
behaviors at different lengths scaling 3 orders of magnitude.
From the linear fits (solid lines) of these data points, we ob-
tain the 1D resistivity (b) and the contact resistance (c) at
different temperatures. The dashed line in (c) represents RQ.
and 7 S-NTs considered in this study. To understand the
scaling of R(L) in Fig. 2, we begin with the two-terminal
Landauer-Buttiker formula applied to SWNTs [9]. If we
consider 4 low-energy channels in the SWNT, 2 each for
spin and band degeneracy, then the scaling of resistance
is given by R(L) = (h/4e2)(L/Lm + 1) + Rnc, where e
and h are electron charge and Plank constant and Lm
and Rnc are the electron mean free path and the non-
transparent contact resistance, respectively. Note that
we separate out the contribution of Rnc from the total
contact resistance Rc, so that the contact resistance be-
comes the quantum resistance RQ = h/4e
2 when the con-
tacts become fully transparent. From the experimentally
obtained ρ(T ) and Rc, we can deduce Lm = RQ/ρ(T )
and Rnc = Rc − RQ for each of our SWNT samples. In
particular, we note that Rnc <∼ RQ for the majority of
our samples, suggesting that the barrier at the contacts
is very thin and adds only a negligible contribution when
L becomes substantially large.
We now discuss the temperature dependent behavior
of the mean free path. Fig. 3 is the central result of
this letter, showing Lm(T ) of the SWNTs listed in Ta-
ble I. Overall, Lm(T ) exhibits different behaviors in
two regimes separated by Tcr: (i) the high tempera-
ture regime (T > Tcr) where Lm ∼ T
−1 (dashed line
in Fig. 3), which indicates that inelastic scattering be-
TABLE I: Device characteristics for SWNTs used in this study. The character M (SC) is designated for metallic (semiconduct-
ing) SWNTs.
M1 M2 M3 M4 SC1 SC2 SC3 SC4 SC5 SC6 SC7
d(nm) 2.0 ± .2 1.3 ± .4 1.7 ± .6 1.6 ± .4 1.6 ± .4 1.8 ± .6 1.9 ± .4 2.1 ± .2 2.2 ± .2 2.0 ± .6 2.2 ± .2
Rc(kΩ) 7.9 ± .8 11.5 ± 2.9 8.3 ± 2.5 12.0 ± 4.4 10.2 ± 4.5 14.9 ± 5.7 10.4 ± .9 7.0 ± 2.3 25.4 ± 4.2 6.9 ± 40 21.8 ± 14
ρsat(kΩ/µm) 0.76 ± .02 0.87 ± .02 0.93 ± .01 6.5 ± .08 2.95 ± .05 3.61 ± .05 4.64 ± .01 5.91 ± .12 8.13 ± .31 14.1 ± .19 16.3 ± .13
(µm) 8.56 ± .23 7.65 ± .17 7.07 ± .08 1.00 ± .01 2.24 ± .04 1.83 ± .03 1.40 ± .01 1.10 ± .02 0.80 ± .03 0.47 ± .01 0.40 ± .01
1001 10
T (K)
FIG. 3: (color online) The electron mean free path for the
samples listed in Table I at different temperatures. Most
metallic SWNTs (open circles) saturate at higher values than
that of semiconductors (closed circles). The dashed line rep-
resents T−1 dependence. The insets show scanning gate
microscopy images taken on devices SC2 (upper) and SC7
(lower). Lighter color corresponds less current in the SWNT.
The defects in the SWNT are highlighted by the bright region
(suppressed current) on the SWNT. The scale bar is 500nm.
tween electrons and acoustic phonons is dominant [9, 17]
regardless of chirality [18]; and (ii) the low temperature
regime (T < Tcr) where Lm saturates to the the tube
specific Lsatm . In this low temperature limit, the phonons
freeze out and Lsatm is determined by the temperature in-
dependent elastic scattering with impurities. We believe
the widely spread Lsatm values (0.4-10 µm) in (ii) are a
result of each SWNT sample having a static disorder of
different strengths and densities. We employ scanning
gate microscopy (SGM) [19] to image this static disor-
der. Indeed, the SGM images on S-NTs (insets to Fig. 3)
reveal that the SWNT with a shorter Lsatm shows more
defects. Note also that we have experimentally confirmed
that Lm is generally much higher for M-NTs than that
of S-NTs. This is an indication that the scattering of
electrons is strongly suppressed in M-NTs, as predicted
by Ando et al. [20] and McEuen et al. [21]. In M-NTs
we have experimentally shown that the ballistic electron
0 100 200
1.65 K
110 K
L (µm)
T (K)
0 100 200
L (µm)
101 102
0 100 200 300 400
1.65 K
110 K
L (µm)
0.5 M
T (K)
10050
L (µm)
100 10210110-1
FIG. 4: R(L) in the non-linear regimes for samples (a) SC6
(Lsatm ≈ 460 nm) and (b)M3 (L
m ≈ 7 µm). Note that the
data is magnified in (a) for clarity. The dashed line is an ex-
tension of the linear regime and the solid line is a fit for all
data. Rdev shows the absolute value of the difference between
the actual device resistance and the corresponding linear resis-
tance at 110 K (lower inset a) and 1.65 K (lower inset b).The
non-linearity increases with decreasing temperature, which is
reflected in the value of Lc(upper insets).
conduction is possible for channel lengths up to 8 µm at
low temperature and 0.8 µm even at room temperature.
Finally, we turn our attention to the non-linear scaling
of R(L). Fig. 4 presents R(L) beyond the linear scal-
ing regime of a representative S-NT and M-NT. At ex-
tremely long length scales and low temperatures, R(L)
deviates from the linear dependence extended from the
linear regime (dashed lines in main figure and see also
Rdev = R(L) − Rc − RQL/Lm in lower insets). Since
R(L << Lsatm ) ∼ RQ for all temperatures, we empha-
size here that this non-linear behavior in R(L) is solely
due to increasing electron scattering in the bulk part of
the SWNTs rather than an increasing barrier between
the SWNT and electrodes. In order to experimentally
determine the critical length scale Lc beyond which the
non-linear behaviors is dominant, we use a phenomeno-
logical equation: R(L) = Rc + RQ(L/Lm + e
L/Lc) to
fit the data (solid curves in Fig. 4). While Lc shows a
strong sample dependent behavior, generally we found
Lc >> Lm in all temperature ranges, with the tempera-
ture dependence exhibiting a trend of increasing Lc with
increasing T (upper insets to Fig. 4). This observed be-
havior of Lc(T ) excludes the quantum interference re-
lated to strong localization effects such as Anderson Lo-
calization [7] from the possible scenarios. In particular, in
the high temperature regime (T > Tcr), the phase coher-
ence length Lφ is limited by the phase-breaking electron-
phonon scattering, and thus Lφ ∼ Lm << Lc, inviting
further study to elucidate the observed localization be-
havior beyond the strong localization limit [22, 23].
In conclusion, we determine the length dependent re-
sistance for SWNTs with channel lengths ranged 200 nm
- 400 µm. From the scaling behavior we evaluate the
electron mean free path and localization length of the
SWNT for a range of temperatures. While the low tem-
perature mean free path is determined by the impurity
scattering, an unusually long mean free path is demon-
strated at room temperature, even with the dominant
electron-phonon scattering.
We thank I. Aleiner, B. Altshuler, and P. Jarillo-
Herrero for helpful discussions. This work is supported
by the NSF NIRT(ECS 0507111), CAREER (DMR-
0349232), NSEC (CHE-0117752), and the New York
State Office of Science, Technology, and Academic Re-
search (NYSTAR).
[1] R. Saito, G. Dresselhaus, and M.S. Dresselhaus, Physical
Properties of Carbon Nanotubes (Imperial College Press,
London 1998).
[2] J. Kong, E. Yenilmez, T.W. Tombler, W. Kim, H. Dai,
R.B. Laughlin, L. Liu, C.S. Jayanthi, and S.Y. Wu, Phys.
Rev. Lett. 87, 106801 (2001).
[3] W. Liang, M. Bockrath, D. Bozovic, J.H. Hafner, M.
Tinkham, and H. Park, Nature 411, 665 (2001).
[4] D. Mann, A. Javey, J. Kong, Q. Wang, and H. Dai, Nano
Lett. 3, 1541 (2003).
[5] A. Javey, J. Guo, Q. Wang, M. Lundstrom and H. Dai,
Nature, 424, 654 (2003).
[6] A. Javey, J. Guo, M. Paulsson, Q. Wang, D. Mann, M.
Lundstrom, and H. Dai, Phys. Rev. Lett. 92, 106804
(2004).
[7] C. Gomez-Navarro, P.J. de Pablo, J. Gomez-Herrero, B.
Biel, F.J. Garcia-Vidal, A. Rubio and F. Flores, Nat.
Mater. 4, 534 (2005).
[8] B. Gao, D.C. Glattli, B. Placais and A. Bachtold, Phys.
Rev. B 74, 085410 (2006).
[9] J. Park, S. Rosenblatt, Y. Yaish, V. Sazonova, H. Us-
tunel, S. Braig, T.A. Arias, P.W. Brouwer and P.L.
McEuen, Nano Lett. 4, 517 (2004).
[10] B. Gao, Y.F. Chen, M.S. Fuhrer, D.C. Glattli, and A.
Bachtold, Phys. Rev. Lett. 95, 196802 (2005).
[11] B.H. Hong, J.Y. Lee, T. Beetz, Y. Zhu, P. Kim, and K.S.
Kim, J. Am. Chem. Soc. 127, 15336 (2005).
[12] Z. Yao, C.L. Kane, and C. Dekker, Phys. Rev. Lett. 84,
2941 (2000).
[13] S.J. Tans, A.R.M. Verschueren, and C. Dekker, Nature
393, 49 (1998).
[14] J. Appenzeller, J. Knoch, V. Derycke, R. Martel, S. Wind
and Ph. Avouris, Phys. Rev. Lett. 89, 126801 (2002).
[15] C. Zhou, J. Kong, and H. Dai, Phys. Rev. Lett. 84, 5604
(2000).
[16] A. Bezryadin, A. R. M. Verschueren, S. J. Tans, and C.
Dekker, Phys. Rev. Lett. 80, 4036 (1998).
[17] V. Perebeinos, J. Tersoff, and Ph. Avouris, Phys. Rev.
Lett. 94, 086802 (2005).
[18] X. Zhou, J. Park, S. Huang, J. Liu, and P. L. McEuen,
Phys. Rev. Lett. 95, 146805 (2005).
[19] A. Bachtold, M.S. Fuhrer, S. Plyasunov, M. Forero, E.H.
Anderson, A. Zettl, and P.L. McEuen, Phys. Rev. Lett.
84, 6082 (2000).
[20] T. Ando and T. Nakanishi, Jpn. J. Appl. Phys. 67, 1704
(1998).
[21] P.L. McEuen, M. Bockrath, D.H. Cobden, Y. G. Yoon,
and S.G. Louie, Phys. Rev. Lett. 83, 5098 (1999).
[22] F. Triozon, S. Roche, A. Rubio, and D. Mayou, Phys.
Rev. B 69, 121410(R) (2004).
[23] R. Avriller, S. Latil, F. Triozon, X. Balse, and S. Roche,
Phys. Rev. B 74, 121406(R) (2006).
ABSTRACT
  We present an experimental investigation on the scaling of resistance in
individual single walled carbon nanotube devices with channel lengths that vary
four orders of magnitude on the same sample. The electron mean free path is
obtained from the linear scaling of resistance with length at various
temperatures. The low temperature mean free path is determined by impurity
scattering, while at high temperature the mean free path decreases with
increasing temperature, indicating that it is limited by electron-phonon
scattering. An unusually long mean free path at room temperature has been
experimentally confirmed. Exponentially increasing resistance with length at
extremely long length scales suggests anomalous localization effects.

<|endoftext|><|startoftext|>
Introduction
XTi , Yi
= {Xi,1, ...,Xi,d, Yi}ni=1 be a length n realization of a (d+ 1)-dimensional
strictly stationary process following the heteroscedastic model
Yi = m (Xi) + σ (Xi) εi,m (Xi) = E (Yi|Xi) , (1.1)
in which E (εi |Xi ) = 0, E
ε2i |Xi
= 1, 1 ≤ i ≤ n. The d-variate functions m, σ are the
unknown mean and standard deviation of the response Yi conditional on the predictor vector
Xi, often estimated nonparametrically. In what follows, we let
XT , Y, ε
have the stationary
distribution of
XTi , Yi, εi
. When the dimension of X is high, one unavoidable issue is the
“curse of dimensionality”, which refers to the poor convergence rate of nonparametric esti-
mation of general multivariate function. Much effort has been devoted to the circumventing
of this difficulty. In the words of Xia, Tong, Li and Zhu (2002), there are essentially two
approaches: function approximation and dimension reduction. A favorite function approxima-
tion technique is the generalized additive model advocated by Hastie and Tibshirani (1990),
Address for correspondence: Lijian Yang, Department of Statistics and Probability, Michigan State Univer-
sity, East Lansing, MI 48824, USA. E-mail: yang@stt.msu.edu
http://arxiv.org/abs/0704.0302v2
2 LI WANG AND LIJIAN YANG
see also, for example, Mammen, Linton and Nielsen (1999), Huang and Yang (2004), Xue
and Yang (2006 a, b), Wang and Yang (2007). An attractive dimension reduction method is
the single-index model, similar to the first step of projection pursuit regression, see Friedman
and Stuetzle (1981), Hall (1989), Huber (1985), Chen (1991). The basic appeal of single-index
model is its simplicity: the d-variate function m (x) = m (x1, ..., xd) is expressed as a univariate
function of xT θ0 =
p=1 xpθ0,p. Over the last two decades, many authors had devised various
intelligent estimators of the single-index coefficient vector θ0 = (θ0,1, ..., θ0,d)
, for instance,
Powell, Stock and Stoker (1989), Härdle and Stoker (1989), Ichimura (1993), Klein and Spady
(1993), Härdle, Hall and Ichimura (1993), Horowitz and Härdle (1996), Carroll, Fan, Gijbels
and Wand (1997), Xia and Li (1999), Hristache, Juditski and Spokoiny (2001). More recently,
Xia, Tong, Li and Zhu (2002) proposed the minimum average variance estimation (MAVE) for
several index vectors.
All the aforementioned methods assume that the d-variate regression function m (x) is
exactly a univariate function of some xT θ0 and obtain a root-n consistent estimator of θ0. If
this model is misspecified (m is not a genuine single-index function), however, a goodness-of-fit
test then becomes necessary and the estimation of θ0 must be redefined, see Xia, Li, Tong
and Zhang (2004). In this paper, instead of presuming that underlying true function m is
a single-index function, we estimate a univariate function g that optimally approximates the
multivariate function m in the sense of
g (ν) = E
m (X)|XT θ0 = ν
, (1.2)
where the unknown parameter θ0 is called the SIP coefficient, used for simple interpretation
once estimated; XT θ0 is the latent SIP variable; and g is a smooth but unknown function used
for further data summary, called the link prediction function. Our method therefore is clearly
interpretable regardless of the goodness-of-fit of the single-index model, making it much more
relevant in applications.
We propose estimators of θ0 and g based on weakly dependent sample, which includes
many existing nonparametric time series models, that are (i) computationally expedient and
(ii) theoretically reliable. Estimation of both θ0 and g has been done via the kernel smoothing
techniques in existing literature, while we use polynomial spline smoothing. The greatest
advantages of spline smoothing, as pointed out in Huang and Yang (2004), Xue and Yang
(2006 b) are its simplicity and fast computation. Our proposed procedure involves two stages:
estimation of θ0 by some
n-consistent θ̂, minimizing an empirical version of the mean squared
error, R(θ) = E{Y − E(Y |XT θ)}2; spline smoothing of Y on XT θ̂ to obtain a cubic spline
estimator ĝ of g. The best single-index approximation to m(x) is then m̂(x) = ĝ
xT θ̂
Under geometrically strong mixing condition, strong consistency and
n-rate asymptotic
SINGLE-INDEX PREDICTION MODEL 3
normality of the estimator θ̂ of the SIP coefficient θ0 in (1.2) are obtained. Proposition 2.2 is
the key in understanding the efficiency of the proposed estimator. It shows that the derivatives
of the risk function up to order 2 are uniformly almost surely approximated by their empirical
versions.
Practical performance of the SIP estimators is examined via Monte Carlo examples. The
estimator of the SIP coefficient performs very well for data of both moderate and high dimension
d, of sample size n from small to large, see Tables 1 and 2, Figures 1 and 2. By taking advantages
of the spline smoothing and the iterative optimization routines, one reduces the computation
burden immensely for massive data sets. Table 2 reports the computing time of one simulation
example on an ordinary PC, which shows that for massive data sets, the SIP method is much
faster than the MAVE method. For instance, the SIP estimation of a 200-dimensional θ0 from
a data of size 1000 takes on average mere 2.84 seconds, while the MAVE method needs to spend
2432.56 seconds on average to obtain a comparable estimates. Hence on account of criteria (i)
and (ii), our method is indeed appealing. Applying the proposed SIP procedure to the rive flow
data of Iceland, we have obtained superior forecasts, based on a 9-dimensional index selected
by BIC, see Figure 5.
The rest of the paper is organized as follows. Section 2 gives details of the model spec-
ification, proposed methods of estimation and main results. Section 3 describes the actual
procedure to implement the estimation method. Section 4 reports our findings in an extensive
simulation study. The proposed SIP model and the estimation procedure are applied in Section
5 to the rive flow data of Iceland. Most of the technical proofs are contained in the Appendix.
2. The Method and Main Results
2.1. Identifiability and definition of the index coefficient
It is obvious that without constraints, the SIP coefficient vector θ0 = (θ0,1, ..., θ0,d)
identified only up to a constant factor. Typically, one requires that ‖θ0‖ = 1 which entails
that at least one of the coordinates θ0,1, ..., θ0,d is nonzero. One could assume without loss of
generality that θ0,d > 0, and the candidate θ0 would then belong to the upper unit hemisphere
Sd−1+ =
(θ1, ..., θd) |
p=1 θ
p = 1, θd > 0
For a fixed θ = (θ1, ..., θd)
, denote Xθ = X
T θ, Xθ,i = X
i θ, 1 ≤ i ≤ n. Let
mθ (Xθ) = E (Y |Xθ) = E {m (X) |Xθ} . (2.1)
Define the risk function of θ as
R (θ) = E
{Y −mθ (Xθ)}2
= E {m (X)−mθ (Xθ)}2 + Eσ2 (X) , (2.2)
4 LI WANG AND LIJIAN YANG
which is uniquely minimized at θ0 ∈ Sd−1+ , i.e.
θ0 = arg min
θ∈Sd−1+
R (θ) .
Remark 2.1. Note that Sd−1+ is not a compact set, so we introduce a cap shape subset of
Sd−1+
Sd−1c =
(θ1, ..., θd) |
θ2p = 1, θd ≥
1− c2
, c ∈ (0, 1)
Clearly, for an appropriate choice of c, θ0 ∈ Sd−1c , which we assume in the rest of the paper.
Denote θ−d = (θ1, ..., θd−1)
, since for fixed θ ∈ Sd−1+ , the risk function R (θ) depends only
on the first d− 1 values in θ, so R (θ) is a function of θ−d
R∗ (θ−d) = R
θ1, θ2, ..., θd−1,
1− ‖θ−d‖22
with well-defined score and Hessian matrices
S∗ (θ−d) =
R∗ (θ−d) , H
∗ (θ−d) =
∂θ−d∂θ
R∗ (θ−d) . (2.3)
Assumption A1: The Hessian matrix H∗ (θ0,−d) is positive definite and the risk function R
is locally convex at θ0,−d, i.e., for any ε > 0, there exists δ > 0 such that R
∗ (θ−d)−R∗ (θ0,−d) <
δ implies ‖θ−d − θ0,−d‖2 < ε.
2.2. Variable transformation
Throughout this paper, we denote by Bda =
x ∈ Rd |‖x‖ ≤ a
the d-dimensional ball with
radius a and center 0 and
∣the kth order partial derivatives of m are continuous on B
the space of k-th order smooth functions.
Assumption A2: The density function of X, f (x) ∈ C(4)
, and there are constants
0 < cf ≤ Cf such that
cf/Vold
≤ f (x) ≤ Cf/Vold
, x ∈ Bda
f (x) ≡ 0, x /∈ Bda
For a fixed θ, define the transformed variables of the SIP variable Xθ
Uθ = Fd (Xθ) , Uθ,i = Fd (Xθ,i) , 1 ≤ i ≤ n, (2.4)
SINGLE-INDEX PREDICTION MODEL 5
in which Fd is the a rescaled centered Beta {(d+ 1) /2, (d+ 1) /2} cumulative distribution
function, i.e.
Fd (ν) =
∫ ν/a
Γ (d+ 1)
Γ {(d+ 1) /2}2 2d
1− t2
)(d−1)/2
dt, ν ∈ [−a, a] . (2.5)
Remark 2.2. For any fixed θ, the transformed variable Uθ in (2.4) has a quasi-uniform [0, 1]
distribution. Let fθ (u) be the probability density function of Uθ, then for any u ∈ [0, 1]
fθ (u) =
d (v)
fXθ (v) , v = F
d (u) ,
in which fXθ (v) = lim△ν→0 P (ν ≤ Xθ ≤ ν +△ν). Noting that xθ is exactly the projection of
x on θ, let Dν = {x|ν ≤ xθ ≤ ν +△ν} ∩Bda, then one has
P (ν ≤ Xθ ≤ ν +△ν) = P (X ∈ Dν) =
f (x) dx.
According to Assumption A2
cfVold(Dν)
Vold (B
≤ P (ν ≤ Xθ ≤ ν +△ν) ≤
CfVold(Dν)
Vold (B
On the other hand
Vold(Dν) = Vold−1(Jν)△ν + o (△ν) ,
where Jν = {x|xθ = v} ∩Bda. Note that the volume of Bda is πd/2ad/Γ (d/2 + 1) and
Vold−1 (Jν) = π(d−1)/2
a2 − ν2
)(d−1)/2
Γ {(d+ 1)/2} ,
Vold−1(Jν)
Vold (B
Γ (d+ 1)
}(d−1)/2
Therefore 0 < cf ≤ fθ (u) ≤ Cf <∞, for any fixed θ and u ∈ [0, 1].
In terms of the transformed SIP variable Uθ in (2.4), we can rewrite the regression function
mθ in (2.1) for fixed θ
γθ (Uθ) = E {m (X) |Uθ} = E {m (X) |Xθ} = mθ (Xθ) , (2.6)
then the risk function R (θ) in (2.2) can be expressed as
R (θ) = E
{Y − γθ (Uθ)}2
= E {m (X)− γθ (Uθ)}2 + Eσ2 (X) . (2.7)
2.3. Estimation Method
6 LI WANG AND LIJIAN YANG
Estimation of both θ0 and g requires a degree of statistical smoothing, and all estimation
here is carried out via cubic spline. In the following, we define the estimator θ̂ of θ0 and the
estimator ĝ of g.
To introduce the space of splines, we pre-select an integer n1/6 ≪ N = Nn ≪ n1/5 (log n)−2/5,
see Assumption A6 below. Divide [0, 1] into (N + 1) subintervals Jj = [tj, tj+1), j = 0, ..., N −
1, JN = [tN , 1], where T := {tj}Nj=1 is a sequence of equally-spaced points, called interior knots,
given as
t1−k = ... = t−1 = t0 = 0 < t1 < ... < tN < 1 = tN+1 = ... = tN+k,
in which tj = jh, j = 0, 1, ..., N +1, h = 1/ (N + 1) is the distance between neighboring knots.
The j-th B-spline of order k for the knot sequence T denoted by Bj,k is recursively defined by
de Boor (2001).
Denote by Γ(k−2) = Γ(k−2) [0, 1] the space of all C(k−2) [0, 1] functions that are polynomials
of degree k−1 on each interval. For fixed θ, the cubic spline estimator γ̂θ of γθ and the related
estimator m̂θ of mθ are defined as
γ̂θ (·) = arg min
γ(·)∈Γ(2)[0,1]
{Yi − γ (Uθ,i)}2 , m̂θ (ν) = γ̂θ {Fd (ν)} . (2.8)
Define the empirical risk function of θ
R̂ (θ) = n−1
{Yi − γ̂θ (Uθ,i)}2 = n−1
{Yi − m̂θ (Xθ,i)}2 , (2.9)
then the spline estimator of the SIP coefficient θ0 is defined as
θ̂ = arg min
θ∈Sd−1c
R̂ (θ) ,
and the cubic spline estimator of g is m̂θ with θ replaced by θ̂, i.e.
ĝ (ν) =
arg min
γ(·)∈Γ(2)[0,1]
Yi − γ
{Fd (ν)} . (2.10)
2.4. Asymptotic results
Before giving the main theorems, we state some other assumptions.
Assumption A3: The regression function m ∈ C(4)
for some a > 0.
Assumption A4: The noise ε satisfies E (ε |X) = 0, E
ε2 |X
= 1 and there exists a positive
constant M such that sup
|ε|3 |X = x
< M . The standard deviation function σ (x) is
continuous on Bda,
0 < cσ ≤ inf
x∈Bda
σ (x) ≤ sup
x∈Bda
σ (x) ≤ Cσ <∞.
SINGLE-INDEX PREDICTION MODEL 7
Assumption A5: There exist positive constants K0 and λ0 such that α (n) ≤ K0e−λ0n holds
for all n, with the α-mixing coefficient for
XTi , εi
defined as
α (k) = sup
B∈σ{Zs,s≤t},C∈σ{Zs,s≥t+k}
|P (B ∩ C)− P (B)P (C)| , k ≥ 1.
Assumption A6: The number of interior knots N satisfies: n1/6 ≪ N ≪ n1/5 (log n)−2/5.
Remark 2.3. Assumptions A3 and A4 are typical in the nonparametric smoothing literature,
see for instance, Härdle (1990), Fan and Gijbels (1996), Xia, Tong Li and Zhu (2002). By
the result of Pham (1986), a geometrically ergodic time series is a strongly mixing sequence.
Therefore, Assumption A5 is suitable for (1.1) as a time series model under aforementioned
assumptions.
We now state our main results in the next two theorems.
Theorem 1. Under Assumptions A1-A6, one has
θ̂−d−→ θ0,−d, a.s.. (2.11)
Proof. Denote by (Ω,F ,P) the probability space on which all
XTi , Yi
are defined. By
Proposition 2.2, given at the end of this section
‖θ−d‖2≤
R̂∗ (θ−d)−R∗ (θ−d)
−→ 0, a.s.. (2.12)
So for any δ > 0 and ω ∈ Ω, there exists an integer n0 (ω), such that when n > n0 (ω),
R̂∗ (θ0,−d, ω) − R∗ (θ0,−d) < δ/2. Note that θ̂−d = θ̂−d (ω) is the minimizer of R̂∗ (θ−d, ω),
so R̂∗
θ̂−d (ω) , ω
− R∗ (θ0,−d) < δ/2. Using (2.12), there exists n1 (ω), such that when
n > n1 (ω), R
θ̂−d (ω) , ω
− R̂∗
θ̂−d (ω) , ω
< δ/2. Thus, when n > max (n0 (ω) , n1 (ω)),
θ̂−d (ω) , ω
−R∗ (θ0,−d) < δ/2 + R̂∗
θ̂−d (ω) , ω
−R∗ (θ0,−d) < δ/2 + δ/2 = δ.
According to Assumption A1, R∗ is locally convex at θ0,−d, so for any ε > 0 and any ω, if
θ̂−d (ω) , ω
−R∗ (θ0,−d) < δ, then
∥θ̂−d (ω)−θ0,−d
∥ < ε for n large enough , which implies
the strong consistency.
Theorem 2. Under Assumptions A1-A6, one has
θ̂−d−θ0,−d
d−→ N {0,Σ (θ0)} ,
where Σ (θ0) = {H∗ (θ0,−d)}−1Ψ(θ0) {H∗ (θ0,−d)}−1, H∗ (θ0,−d) = {lpq}d−1p,q=1 and Ψ(θ0) =
{ψpq}d−1p,q=1 with
lp,q = −2E [{γ̇pγ̇q + γθ0 γ̈p,q} (Uθ0)] + 2θ0,qθ
E [{γ̇pγ̇d (Uθ0) + γθ0 γ̈p,d} (Uθ0)]
+2θ−30,dE [(γθ0 γ̇d) (Uθ0)]
θ20,d + θ
I{p=q} + θ0,pθ0,qI{p 6=q}
+2θ0,pθ
0,dE [{γ̇pγ̇q + γθ0 γ̈p,q} (Uθ0)]− 2θ0,pθ0,qθ
γ̇2d + γθ0 γ̈d,d
(Uθ0)
8 LI WANG AND LIJIAN YANG
ψpq = 4E
γ̇p − θ0,pθ−10,dγ̇d
γ̇q − θ0,qθ−10,dγ̇d
(Uθ0) {γθ0 (Uθ0)− Y }
in which γ̇p and γ̈p,q are the values of
∂θp∂θq
γθ taking at θ = θ0, for any p, q = 1, 2, ..., d−1
and γθ is given in (2.6).
Remark 2.4. Consider the Generalized Linear Model (GLM): Y = g
XT θ0
+σ (X) ε, where
g is a known link function. Let θ̃ be the nonlinear least squared estimator of θ0 in GLM.
Theorem 2 shows that under the assumptions A1-A6, the asymptotic distribution of the θ̂−d
is the same as that of θ̃. This implies that our proposed SIP estimator θ̂−d is as efficient as if
the true link function g is known.
The next two propositions play an important role in our proof of the main results. Propo-
sition 2.1 establishes the uniform convergence rate of the derivatives of γ̂θ up to order 2 to
those of γθ in θ. Proposition 2.2 shows that the derivatives of the risk function up to order 2
are uniformly almost surely approximated by their empirical versions.
Proposition 2.1. Under Assumptions A2-A6, with probability 1
θ∈Sd−1c
u∈[0,1]
|γ̂θ (u)− γθ (u)| = O
log n+ h4
, (2.13)
1≤p≤d
θ∈Sd−1c
1≤i≤n
{γ̂θ (Uθ,i)− γθ (Uθ,i)}
log n√
, (2.14)
1≤p,q≤d
θ∈Sd−1c
1≤i≤n
∂θp∂θq
{γ̂θ (Uθ,i)− γθ (Uθ,i)}
log n√
. (2.15)
Proposition 2.2. Under Assumptions A2-A6, one has for k = 0, 1, 2
‖θ−d‖≤
∂kθ−d
R̂∗ (θ−d)−R∗ (θ−d)
= o(1), a.s..
Proofs of Theorem 2, Propositions 2.1 and 2.2 are given in Appendix.
3. Implementation
In this section, we will describe the actual procedure to implement the estimation of θ0
and g. We first introduce some new notation. For fixed θ, write the B-spline matrix as
Bθ = {Bj,4 (Uθ,i)}n, Ni=1,j=−3 and
Pθ = Bθ
θ (3.1)
as the projection matrix onto the cubic spline space Γ
n,θ. For any p = 1, ..., d, denote
Ḃp =
Bθ, Ṗp =
SINGLE-INDEX PREDICTION MODEL 9
as the first order partial derivatives of Bθ and Pθ with respect to θ.
Let Ŝ∗(θ−d) be the score vector of R̂
∗ (θ−d), i.e.
Ŝ∗(θ−d) =
R̂∗ (θ−d) . (3.2)
The next lemma provides the exact forms of Ŝ∗(θ−d).
Lemma 3.1. For the score vector of R̂∗ (θ−d) defined in (3.2), one has
Ŝ∗ (θ−d) = −n−1
ṖpY − θpθ−1d Y
, (3.3)
where for any p = 1, 2, ..., d
ṖpY = 2Y
T (I−Pθ) Ḃp
θ Y, (3.4)
where Ḃp =
{Bj,3 (Uθ,i)−Bj+1,3 (Uθ,i)} Ḟd (Xθ,i)h−1Xi,p
}n, N
i=1,j=−3
Ḟd (x) =
Γ (d+ 1)
aΓ {(d+ 1) /2}2 2d
I (|x| ≤ a) .
Proof. For any p = 1, 2, ..., d, the derivatives of B-splines in de Boor (2001) implies
Ḃp =
Bj,4 (Uθ,i)
}n, N
i=1,j=−3
Bj,4 (Uθ,i)
}n, N
i=1,j=−3
Bj,3 (Uθ,i)
tj+3 − tj
Bj+1,3 (Uθ,i)
tj+4 − tj+1
Ḟd (Xθ,i)Xi,p
}n, N
i=1,j=−3
{Bj,3 (Uθ,i)−Bj+1,3 (Uθ,i)} Ḟd (Xθ,i) h−1Xi,p
}n, N
i=1,j=−3
Next, note that
Ṗp = Ḃp
θ +Bθ
= Ḃp
θ +Bθ
θ +Bθ
Since
BTθ Bθ
BTθ Bθ
BTθ Bθ
θ Bθ +
)−1 ∂
BTθ Bθ
and ∂
BTθ Bθ
= ḂTp Bθ +B
θ Ḃp, thus
)−1 (
p Bθ +B
θ Ḃp
10 LI WANG AND LIJIAN YANG
Hence
Ṗp = (I−Pθ) Ḃp
θ +Bθ
p (I−Pθ) .
Thus, (3.4) follows immediately.
In practice, the estimation is implemented via the following procedure.
Step 1. Standardize the predictor vectors {Xi}ni=1 and for each fixed θ ∈ Sd−1c obtain the
CDF transformed variables {Uθ,i}ni=1 of the SIP variable {Xθ,i}
through formula (2.5), where
the radius a is taken to be the 95% percentile of {‖Xi‖}ni=1.
Step 2. Compute quadratic and cubic B-spline basis at each value Uθ,i, where the number
of interior knots N is
N = min
n1/5.5
, (3.5)
Step 3. Find the estimator θ̂ of θ0 by minimizing R̂
∗ through the port optimization routine
with (0, 0, ..., 1)
as the initial value and the empirical score vector Ŝ∗ in (3.3). If d < n, one
can take the simple LSE (without the intercept) for data {Yi,Xi}ni=1 with its last coordinate set
positive.
Step 4. Obtain the spline estimator ĝ of g by plugging θ̂ obtained in Step 3 into (2.10).
Remark 3.1. In (3.5), c1 and c2 are positive integers and [ν] denotes the integer part of ν. The
choice of the tuning parameter c1 makes little difference for a large sample and according to our
asymptotic theory there is no optimal way to set these constants. We recommend using c1 = 1
to save computing for massive data sets. The first term ensures Assumption A6. The addition
constrain c2 can be taken from 5 to 10 for smooth monotonic or smooth unimodel regression
and c2 > 10 if has many local minima and maxima, which is very unlikely in application.
4. Simulations
In this section, we carry out two simulations to illustrate the finite-sample behavior of our
SIP estimation method. The number of interior knots N is computed according to (3.5) with
c1 = 1, c2 = 5. All of our codes have been written in R.
Example 1. Consider the model in Xia, Li, Tong and Zhang (2004)
Y = m (X) + σ0ε, σ0 = 0.3, 0.5, ε
i.i.d∼ N(0, 1)
where X = (X1,X2)
T ∼N(0, I2), truncated by [−2.5, 2.5]2 and
m (x) = x1 + x2 + 4exp
− (x1 + x2)2
x21 + x
. (4.1)
If δ = 0, then the underlying true functionm is a single-index function, i.e., m (X) =
2XT θ0+
XT θ0
, where θT0 = (1, 1) /
2. While δ 6= 0, then m is not a genuine single-index
SINGLE-INDEX PREDICTION MODEL 11
function. An impression of the bivariate function m for δ = 0 and δ = 1 can be gained in
Figure 1 (a) and (b), respectively.
Table 1: Report of Example 1 (Values out/in parentheses: δ = 0/δ = 1)
σ0 n θ0 BIAS SD MSE Average MSE
5e− 04 0.00825 7e− 05
(−0.00236) (0.02093) (0.00044) 7e− 05
−6e− 04 0.00826 7e− 05 (0.00043)
(0.00174) (0.02083) (0.00043)
−0.00124 0.00383 2e− 05
(−0.00129) (0.01172) (0.00014) 2e− 05
−0.00124 0.00383 2e− 05 (0.00014)
(0.00110) (0.01160) (0.00013)
0.00121 0.01346 0.00018
(−0.00137) (0.02257) (0.00051) 0.00018
−0.00147 0.01349 0.00018 (0.00051)
(0.00062) (0.02309) (0.00052)
−0.00204 0.00639 4e− 05
(−0.00229) (0.01205) (0.00015) 4e− 05
0.00197 0.00637 4e− 05 (0.00015)
(0.00208) (0.01190) (0.00014)
For δ = 0, 1, we draw 100 random realizations of each sample size n = 50, 100, 300 respec-
tively. To demonstrate how close our SIP estimator is to the true index parameter θ0, Table 1
lists the sample mean (MEAN), bias (BIAS), standard deviation (SD), the mean squared error
(MSE) of the estimates of θ0 and the average MSE of both directions. From this table, we find
that the SIP estimators are very accurate for both cases δ = 0 and δ = 1, which shows that
our proposed method is robust against the deviation from single-index model. As we expected,
when the sample size increases, the SIP coefficient is more accurately estimated. Moreover, for
n = 100, 300, the total average is inversely proportional to n.
Example 2. Consider the heteroscedastic regression model (1.1) with
m (X) = sin
, σ (X) = σ0
5− exp
5 + exp
) , (4.2)
in which Xi = {Xi,1, ...,Xi,d}T and εi, i = 1, ..., n, are
i.i.d∼ N (0, 1), σ0 = 0.2. In our simulation,
the true parameter θT0 = (1, 1, 0, ..., 0, 1)/
3 for different sample size n and dimension d. The
12 LI WANG AND LIJIAN YANG
superior performance of SIP estimators is borne out in comparison with MAVE of Xia, Tong,
Li and Zhu (2002). We also investigate the behavior of SIP estimators in the previously
unemployed cases that sample size n is smaller than or equal to d, for instance, n = 100, d =
100, 200 and n = 200, d = 200, 400. The average MSEs of the d dimensions are listed in Table
2, from which we see that the performance of the SIP estimators are quite reasonable and in
most of the scenarios n ≤ d, the SIP estimators still work astonishingly well where the MAVEs
become unreliable. For n = 100, d = 10, 50, 100, 200, the estimates of the link prediction
function g from model (4.2) are plotted in Figure 2, which is rather satisfactory even when
dimension d exceeds the sample size n.
Theorem 1 indicates that θ̂−d is strongly consistent of θ0,−d. To see the convergence, we
run 100 replications and in each replication, the value of ‖θ̂ − θ0‖/
d is computed. Figure
3 plots the kernel density estimations of the 100 ‖θ̂ − θ0‖ in Example 2, in which dimension
d = 10, 50, 100, 200. There are four types of line characteristics which correspond to the two
sample sizes, the dotted-dashed line (n = 100), dotted line (n = 200), dashed line (500) and
solid line (n = 1000). As sample sizes increasing, the squared errors are becoming closer to 0,
with narrower spread out, confirmative to the conclusions of Theorem 1.
Lastly, we report the average computing time of Example 2 to generate one sample of size
n and perform the SIP or MAVE procedure done on the same ordinary Pentium IV PC in
Table 2. From Table 2, one sees that our proposed SIP estimator is much faster than the
MAVE. The computing time for MAVE is extremely sensitive to sample size as we expected.
For very large d, MAVE becomes unstable to the point of the breaking down in four cases.
5. An application
In this section we demonstrate the proposed SIP model through the river flow data of
Jökulsá Eystri River of Iceland, from January 1, 1972 to December 31, 1974. There are 1096
observations, see Tong (1990). The response variables are the daily river flow (Yt), measured in
meter cubed per second of Jökulsá Eystri River. The exogenous variables are temperature (Xt)
in degrees Celsius and daily precipitation (Zt) in millimeters collected at the meteorological
station at Hveravellir.
This data set was analyzed earlier through threshold autoregressive (TAR) models by
Tong, Thanoon and Gudmundsson (1985), Tong (1990), and nonlinear additive autoregressive
(NAARX) models by Chen and Tsay (1993). Figure 4 shows the plots of the three time series,
from which some nonlinear and non-stationary features of the river flow series are evident. To
make these series stationary, we remove the trend by a simple quadratic spline regression and
these trends (dashed lines) are shown in Figure 4. By an abuse of notation, we shall continue
to use Xt, Yt, Zt to denote the detrended series.
SINGLE-INDEX PREDICTION MODEL 13
In the analysis, we pre-select all the lagged values in the last 7 days (1 week), i.e., the
predictor pool is {Yt−1, ..., Yt−7,Xt,Xt−1, ...,Xt−7, Zt, Zt−1, ..., Zt−7, }. Using BIC similar to
Huang and Yang (2004) for our proposed spline SIP model with 3 interior knots, the following
9 explanatory variables are selected from the above set {Yt−1, ..., Yt−4,Xt,Xt−1,Xt−2, Zt, Zt−1}.
Based on this selection, we fit the SIP model again and obtain the estimate of the SIP coefficient
θ̂ = {−0.877, 0.382,−0.208, 0.125,−0.046,−0.034, 0.004,−0.126, 0.079}T . Figure 5 (a) and (b)
display the fitted river flow series and the residuals against time.
Next we examine the forecasting performance of the SIP method. We start with estimating
the SIP estimator using only observations of the first two years, then we perform the out-of-
sample rolling forecast of the entire third year. The observed values of the exogenous variables
are used in the forecast. Figure 5 (c) shows this SIP out-of-sample rolling forecasts. For the
purpose of comparison, we also try the MAVE method, in which the same predictor vector is
selected by using BIC. The mean squared prediction error is 60.52 for the SIP model, 61.25 for
MAVE, 65.62 for NAARX, 66.67 for TAR and 81.99 for the linear regression model, see Chen
and Tsay (1993). Among the above five models, the SIP model produces the best forecasts.
6. Conclusion
In this paper we propose a robust SIP model for stochastic regression under weak depen-
dence regardless if the underlying function is exactly a single-index function. The proposed
spline estimator of the index coefficient possesses not only the usual strong consistency and
n-rate asymptotically normal distribution, but also is as efficient as if the true link function
g is known. By taking advantage of the spline smoothing method and the iterative method,
the proposed procedure is much faster than the MAVE method. This procedure is especially
powerful for large sample size n and high dimension d and unlike the MAVE method, the
performance of the SIP remains satisfying in the case d > n.
Acknowledgment
This work is part of the first author’s dissertation under the supervision of the second
author, and has been supported in part by NSF award DMS 0405330.
Appendix
A.1. Preliminaries
In this section, we introduce some properties of the B-spline.
Lemma A.1. There exist constants c > 0 such that for
j=−k+1 αj,kBj,k up to order k = 4
ch1/r ‖α‖r ≤
j=−k+1 αj,kBj,k
3r−1h
)1/r ‖α‖r , 1 ≤ r ≤ ∞
ch1/r ‖α‖r ≤
j=−k+1 αj,kBj,k
≤ (3h)1/r ‖α‖r , 0 < r < 1
14 LI WANG AND LIJIAN YANG
where α := (α−1,2, α0,2, ..., αN,2, ..., αN,4). In particular, under Assumption A2, for any fixed θ
ch1/2 ‖α‖2 ≤
j=−k+1
αj,kBj,k
≤ Ch1/2 ‖α‖2 .
Proof. It follows from the B-spline property on page 96 of de Boor (2001),
j=−k+1Bj,k ≡
3 on [0, 1]. So the right inequality follows immediate for r = ∞. When 1 ≤ r < ∞, we use
Hölder’s inequality to find
j=−k+1
αj,kBj,k
j=−k+1
|αj,k|r Bj,k
j=−k+1
1−1/r
= 31−1/r
j=−k+1
|αj,k|r Bj,k
Since all the knots are equally spaced,
−∞Bj,k (u) du ≤ h, the right inequality follows from
j=−k+1
αj,kBj,k (u)
du ≤ 3r−1h ‖α‖rr .
When r < 1, we have
j=−k+1
αj,kBj,k
j=−k+1
|αj,k|r Brj,k.
Since
j,k (u) du ≤ tj+k − tj = kh and
j=−k+1
αj,kBj,k (u)
du ≤ ‖α‖rr
Brj,k (u) du ≤ 3h ‖α‖
the right inequality follows in this case as well. For the left inequalities, we derive from Theorem
5.4.2, DeVore and Lorentz (1993)
|αj,k| ≤ C1h−1/r
∫ tj+1
j=−k+1
αj,kBj,k (u)
for any 0 < r ≤ ∞, so
|αj,k|r ≤ Cr1h−1
∫ tj+1
j=−k+1
αj,kBj,k (u)
SINGLE-INDEX PREDICTION MODEL 15
Since each u ∈ [0, 1] appears in at most k intervals (tj,tj+k), adding up these inequalities, we
obtain that
‖α‖rr ≤ C1h
∫ tj+k
j=−k+1
αj,kBj,k (u)
du ≤ 3Ch−1
j=−k+1
αj,kBj,k
The left inequality follows.
For any functions φ and ϕ, define the empirical inner product and the empirical norm as
〈φ,ϕ〉θ =
φ (u)ϕ (u) fθ (u) du, ‖φ‖22,n,θ = n
φ2 (Uθ,i) .
In addition, if functions φ,ϕ are L2 [0, 1]-integrable, define the theoretical inner product and
its corresponding theoretical L2 norm as
‖φ‖22,θ =
φ2 (u) fθ (u) du, 〈φ,ϕ〉n,θ = n
φ (Uθ,i)ϕ (Uθ,i) .
Lemma A.2. Under Assumptions A2, A5 and A6, with probability 1,
θ∈Sd−1c
k,k′=2,3,4
1≤j,j′≤N
Bj,k, Bj′,k′
Bj,k, Bj′,k′
log n
Proof. We only prove the case k = k′ = 4, all other cases are similar. Let
ζθ,j,j′,i = Bj,4 (Uθ,i)Bj′,4 (Uθ,i)− EBj,4 (Uθ,i)Bj′,4 (Uθ,i) ,
with the second moment
Eζ2θ,j,j′,i = E
B2j,4 (Uθ,i)B
j′,4 (Uθ,i)
EBj,4 (Uθ,i)Bj′,4 (Uθ,i)
where
EBj,4 (Uθ,i)Bj′,4 (Uθ,i)
}2 ∼ N−2, E
B2j,4 (Uθ,i)B
j′,4 (Uθ,i)
∼ N−1 by Assumption A2.
Hence, Eζ2θ,j,j′,i ∼ N−1. The k-th moment is given by
∣ζθ,j,j′,i
∣Bj,4 (Uθ,i)Bj′,4 (Uθ,i)− EBj,4 (Uθ,i)Bj′,4 (Uθ,i)
≤ 2k−1
∣Bj,4 (Uθ,i)Bj′,4 (Uθ,i)
∣EBj,4 (Uθ,i)Bj′,4 (Uθ,i)
where
∣EBj,4 (Uθ,i)Bj′,4 (Uθ,i)
k ∼ N−k, E
∣EBj,4 (Uθ,i)Bj′,4 (Uθ,i)
k ∼ N−1. Thus, there exists
a constant C > 0 such that E
∣ζθ,j,j′,i
k ≤ C2k−1k!Eζ2j,j′,i. So the Cramér’s condition is satisfied
with Cramér’s constant c∗. By the Bernstein’s inequality (see Bosq (1998), Theorem 1.4, page
31), we have for k = 3
ζθ,j,j′,i
≤ a1 exp
25m22 + 5c
+ a2 (k)α
q + 1
])6/7
16 LI WANG AND LIJIAN YANG
where
δn = δ
log n√
, a1 = 2
δ2 (nN)
log2 n
25m22 + 5c
, m22 ∼ N−1,
a2 (3) = 11n
, m3 = max
1≤i≤n
∥ζθ,j,j′,i
≤ cN1/3.
Observe that 5cδn = o(1) by Assumption A6, then by taking q such that
≥ c0 log n,
q ≥ c1n/ log n for some constants c0, c1, one has a1 = O(n/q) = O (log n), a2 (3) = o
Assumption A6 again. Assumption A5 yields that
q + 1
])6/7
K0 exp
q + 1
])}6/7
≤ Cn−6λ0c0/7.
Thus, for fixed θ ∈ Sd−1c , when n large enough
ζθ,j,j′,i
≤ c log n exp
−c2δ2 log n
+ Cn2−6λ0c0/7. (A.1)
We divide each range of θp, p = 1, 2, ..., d − 1, into n6/(d−1) equally spaced intervals with
disjoint endpoints −1 = θp,0 < θp,1 < ... < θp,Mn = 1, for p = 1, ..., d − 1. Projecting these
small cylinders onto Sd−1c , the radius of each patch Λr, r = 1, ...,Mn is bounded by cM
Denote the projection of the Mn points as θr =
θr,−d,
1− ‖θr,−d‖22
, r = 0, 1, ...,Mn.
Employing the discretization method, sup
θ∈Sd−1c
1≤j,j′≤N
∣ζθ,j,j′,i
∣ is bounded by
0≤r≤Mn
1≤j,j′≤N
∣ζθr,j,j′,i
∣+ sup
0≤r≤Mn
1≤j,j′≤N
∣ζθ,j,j′,i − ζθr,j,j′,i
∣ . (A.2)
By (A.1) and Assumption A6, there exists large enough value δ > 0 such that
ζθr,j,j′,i
≤ n−10,
which implies that
1≤j,j′≤N
ζθr ,j,j′,i
N2Mnn
−10 ≤ C
n−3 <∞.
Thus, Borel-Cantelli Lemma entails that
0≤r≤Mn
1≤j,j′≤N
ζθr ,j,j′,i
log n√
, a.s.. (A.3)
SINGLE-INDEX PREDICTION MODEL 17
Employing Lipschitz continuity of the cubic B-spline, one has with probability 1
0≤r≤Mn
1≤j,j′≤N
ζθ,j,j′,i − ζθr,j,j′,i
M−1n h
−6) . (A.4)
Therefore Assumption A2, (A.2), (A.3) and (A.4) lead to the desired result.
Denote by Γ = Γ(0)∪Γ(1)∪Γ(2) the space of all linear, quadratic and cubic spline functions
on [0, 1]. We establish the uniform rate at which the empirical inner product approximates the
theoretical inner product for all B-splines Bj,k with k = 2, 3, 4.
Lemma A.3. Under Assumptions A2, A5 and A6, one has
An = sup
θ∈Sd−1c
γ1,γ2∈Γ
〈γ1, γ2〉n,θ − 〈γ1, γ2〉θ
‖γ1‖2,θ ‖γ2‖2,θ
log n
, a.s.. (A.5)
Proof. Denote without loss of generality,
j=−k+1
αjkBj,k, γ2 =
j=−k+1
βjkBj,k,
for any two 3 (N + 3)-vectors
α =(α−1,2, α0,2, ..., αN,2, ..., αN,4) , β =(β−1,2, β0,2, ..., βN,2, ..., βN,4) .
Then for fixed θ
〈γ1, γ2〉n,θ =
j=−k+1
αj,kBj,k (Uθ,i)
j=−k+1
βj,kBj,k (Uθ,i)
j=−k+1
j′=−k+1
αj,kβj′,k′
Bj,k, Bj′,k′
‖γ1‖22,θ =
j=−k+1
j′=−k+1
αj,kαj′,k′
Bj,k, Bj′ ,k′
‖γ2‖22,θ =
j=−k+1
j′=−k+1
βj,kβj′,k′
Bj,k, Bj′ ,k′
According to Lemma A.1, one has for any θ ∈ Sd−1c ,
c1h ‖α‖22 ≤ ‖γ1‖
2,θ ≤ c2h ‖α‖
2 , c1h ‖β‖
2 ≤ ‖γ2‖
2,θ ≤ c2h ‖β‖
c1h ‖α‖2 ‖β‖2 ≤ ‖γ1‖2,θ ‖γ2‖2,θ ≤ c2h ‖α‖2 ‖β‖2 .
18 LI WANG AND LIJIAN YANG
Hence
An = sup
θ∈Sd−1c
γ1∈γ,γ2∈Γ
〈γ1, γ2〉n,θ − 〈γ1, γ2〉θ
‖γ1‖2,θ ‖γ2‖2,θ
‖α‖∞ ‖β‖∞
c1h ‖α‖2 ‖β‖2
× sup
θ∈Sd−1c
k,k′=2,3,4
1≤j,j′≤N
Bj,k, Bj′ ,k′
Bj,k, Bj′ ,k′
An ≤ c0h−1 sup
θ∈Sd−1c
k,k′=2,3,4
1≤j,j′≤N
Bj,k, Bj′ ,k′
Bj,k, Bj′ ,k′
which, together with Lemma A.2, imply (A.5).
A.2. Proof of Proposition 2.1
For any fixed θ, we write the response YT = (Y1, ..., Yn) as the sum of a signal vector γθ,
a parametric noise vector Eθ and a systematic noise vector E, i.e.,
Y = γθ +Eθ +E,
in which the vectors γTθ = {γθ (Uθ,1) , ..., γθ (Uθ,n)}, ET = {σ (X1) ε1, ..., σ (Xn) εn} and ETθ =
{m (X1)− γθ (Uθ,1) , ...,m (Xn)− γθ (Uθ,n)}.
Remark A.1. If m is a genuine single-index function, then Eθ0 ≡ 0, thus the proposed SIP
model is exactly the single-index model.
Let Γ
n, θ be the cubic spline space spanned by {Bj,4 (Uθ,i)}
, −3 ≤ j ≤ N for fixed θ.
Projecting Y onto Γ
n, θ yields that
γ̂θ = {γ̂θ (Uθ,1) , ..., γ̂θ (Uθ,n)}T = ProjΓ(2)
γθ + ProjΓ(2)
Eθ + ProjΓ(2)
where γ̂θ is given in (2.8). We break the cubic spline estimation error γ̂θ (uθ) − γθ (uθ) into a
bias term γ̃θ (uθ)− γθ (uθ) and two noise terms ε̃θ (uθ) and ε̂θ (uθ)
γ̂θ (uθ)− γθ (uθ) = {γ̃θ (uθ)− γθ (uθ)}+ ε̃θ (uθ) + ε̂θ (uθ) , (A.6)
where
γ̃θ (u) = {Bj,4 (u)}T−3≤j≤N V
〈γθ, Bj,4〉n,θ
, (A.7)
ε̃θ (u) = {Bj,4 (u)}T−3≤j≤N V
〈Eθ, Bj,4〉n,θ
, (A.8)
ε̂θ (u) = {Bj,4 (u)}T−3≤j≤N V
〈E, Bj,4〉n,θ
. (A.9)
SINGLE-INDEX PREDICTION MODEL 19
In the above, we denote by Vn,θ the empirical inner product matrix of the cubic B-spline basis
and similarly, the theoretical inner product matrix as Vθ
Vn,θ =
θ Bθ =
Bj′,4, Bj,4
j,j′=−3
,Vθ =
Bj′,4, Bj,4
j,j′=−3 . (A.10)
In Lemma A.5, we provide the uniform upper bound of
∞. Before
that, we first describe a special case of Theorem 13.4.3 in DeVore and Lorentz (1993).
Lemma A.4. If a bi-infinite matrix with bandwidth r has a bounded inverse A−1 on l2 and
κ = κ (A) := ‖A‖2
is the condition number of A, then
∞ ≤ 2c0 (1− ν)
, with
c0 = ν
−2r ∥
, ν =
κ2 − 1
)1/4r (
κ2 + 1
)−1/4r
Lemma A.5. Under Assumptions A2, A5 and A6, there exist constants 0 < cV < CV such
that cVN
−1 ‖w‖22 ≤ wTVθw ≤ CVN−1 ‖w‖
2 and
−1 ‖w‖22≤ w
Vn,θw ≤ CVN−1 ‖w‖22 , a.s., (A.11)
with matrices Vθ and Vn,θ defined in (A.10). In addition, there exists a constant C > 0 such
θ∈Sd−1c
≤ CN, a.s., sup
θ∈Sd−1c
∞ ≤ CN. (A.12)
Proof. First we compute the lower and upper bounds for the eigenvalues of Vn,θ. Let w be any
(N + 4)-vector and denote γw (u) =
j=−3wjBj,4 (u), then Bθw = {γw (Uθ,1) , ..., γw (Uθ,n)}
and the definition of An in (A.5) from Lemma A.3 entails that
‖γw‖22,θ (1−An) ≤ w
Vn,θw = ‖γw‖22,n,θ ≤ ‖γw‖
2,θ (1 +An) . (A.13)
Using Theorem 5.4.2 of DeVore and Lorentz (1993) and Assumption A2, one obtains that
‖w‖22 ≤ ‖γw‖
2,θ = w
Vθw =
wjBj,4
‖w‖22 , (A.14)
which, together with (A.13), yield
−1 ‖w‖22 (1−An) ≤ w
Vn,θw ≤ CfCN−1 ‖w‖22 (1 +An) . (A.15)
Now the order of An in (A.5), together with (A.14) and (A.15) implies (A.11), in which cV =
cfC,CV = CfC. Next, denote by λmax (Vn,θ) and λmin (Vn,θ) the maximum and minimum
eigenvalue of Vn,θ, simple algebra and (A.11) entail that
−1 ≥ ‖Vn,θ‖2 = λmax (Vn,θ) ,
= λ−1min (Vn,θ) ≤ c
V N, a.s.,
20 LI WANG AND LIJIAN YANG
κ := ‖Vn,θ‖2
= λmax (Vn,θ)λ
min (Vn,θ) ≤ CV c
V <∞, a.s..
Meanwhile, let wj = the (N + 4)-vector with all zeros except the j-th element being 1, j =
−3, ..., N . Then clearly
j Vn,θwj =
B2j,4 (Uθ,i) = ‖Bj,4‖
, ‖wj‖2 = 1,−3 ≤ j ≤ N
and in particular
0 Vn,θw0 ≤ λmax (Vn,θ) ‖w0‖2 = λmax (Vn,θ) ,
−3Vn,θw−3 ≥ λmin (Vn,θ) ‖w−3‖2 = λmin (Vn,θ) .
This, together with (A.5) yields that
κ = λmax (Vn,θ)λ
min (Vn,θ) ≥
wT0 Vn,θw0
wT−3Vn,θw−3
‖B0,4‖2n,θ
‖B−3,4‖2n,θ
‖B0,4‖2θ
‖B−3,4‖2θ
1 +An
which leads to κ ≥ C > 1, a.s. because the definition of B-spline and Assumption A2 ensure
that ‖B0,4‖2θ ≥ C0 ‖B−3,4‖
for some constant C0 > 1. Next applying Lemma A.4 with
κ2 − 1
)1/16 (
κ2 + 1
)−1/16
and c0 = ν
, one gets
≤ 2ν−8N (1− ν)−1 =
CN, a.s.. Hence part one of (A.12) follows. Part two of (A.12) is proved in the same fashion.
In the following, we denote by QT (m) the 4-th order quasi-interpolant of m corresponding
to the knots T , see equation (4.12), page 146 of DeVore and Lorentz (1993). According to
Theorem 7.7.4, DeVore and Lorentz (1993), the following lemma holds.
Lemma A.6. There exists a constant C > 0, such that for 0 ≤ k ≤ 2 and γ ∈ C(4) [0, 1]
(γ −QT (γ))(k)
h4−k,
Lemma A.7. Under Assumptions A2, A3, A5 and A6, there exists an absolute constant C > 0,
such that for function γ̃θ (u) in (A.7)
θ∈Sd−1c
(γ̃θ − γθ)
h4−k, a.s., 0 ≤ k ≤ 2, (A.16)
Proof. According to Theorem A.1 of Huang (2003), there exists an absolute constant C > 0,
such that
θ∈Sd−1c
‖γ̃θ − γθ‖∞ ≤ C sup
θ∈Sd−1c
γ∈Γ(2)
‖γ − γθ‖∞ ≤ C
h4, a.s., (A.17)
SINGLE-INDEX PREDICTION MODEL 21
which proves (A.16) for the case k = 0. Applying Lemma A.6, one has for 0 ≤ k ≤ 2
θ∈Sd−1c
{QT (γθ)− γθ}
≤ C sup
θ∈Sd−1c
h4−k ≤ C
h4−k, (A.18)
As a consequence of (A.17) and (A.18) for the case k = 0, one has
θ∈Sd−1c
‖QT (γθ)− γ̃θ‖∞ ≤ C
h4, a.s.,
which, according to the differentiation of B-spline given in de Boor (2001), entails that
θ∈Sd−1c
{QT (γθ)− γ̃θ}
h4−k, a.s., 0 ≤ k ≤ 2. (A.19)
Combining (A.18) and (A.19) proves (A.16) for k = 1, 2.
Lemma A.8. Under Assumptions A1, A2, A4 and A5, there exists an absolute constant C > 0,
such that
1≤p≤d
θ∈Sd−1c
{γ̃θ (Uθ,i)− γθ (Uθ,i)}ni=1
h3, a.s., (A.20)
1≤p,q≤d
θ∈Sd−1c
∂θp∂θq
{γ̃θ (Uθ,i)− γθ (Uθ,i)}ni=1
h2, a.s.. (A.21)
Proof. According to the definition of γ̃θ in (A.7), and the fact that QT (γθ) is a cubic spline
on the knots T
{{QT (γθ)− γ̃θ} (Uθ,i)}ni=1 = Pθ {{QT (γθ)− γθ} (Uθ,i)}
which entails that
{{QT (γθ)− γ̃θ} (Uθ,i)}ni=1 =
Pθ {{QT (γθ)− γθ} (Uθ,i)}ni=1
= Ṗp {{QT (γθ)− γθ} (Uθ,i)}ni=1 +Pθ
{{QT (γθ)− γθ} (Uθ,i)}ni=1 .
Since
{{QT (γθ)− γθ} (Uθ,i)}ni=1 =
(Uθ,i)
{QT (γθ)− γθ} (Uθ,i)Xip
applying (A.19) to the decomposition above produces (A.20). The proof of (A.21) is similar.
22 LI WANG AND LIJIAN YANG
Lemma A.9. Under Assumptions A2, A5 and A6, there exists a constant C > 0 such that
θ∈Sd−1c
∥n−1BTθ
∞ ≤ Ch, a.s., sup
1≤p≤d
θ∈Sd−1c
n−1ḂTp
≤ C, a.s., (A.22)
θ∈Sd−1c
‖Pθ‖∞ ≤ C, a.s., sup
1≤p≤d
θ∈Sd−1c
≤ Ch−1, a.s.. (A.23)
Proof. To prove (A.22), observe that for any vector a ∈ Rn, with probability 1
∥n−1BTθ a
∞ ≤ ‖a‖∞ max−3≤j≤N
Bj,4 (Uθ,i)
≤ Ch ‖a‖∞ ,
n−1ḂTp a
≤ ‖a‖∞ max−3≤j≤N
{(Bj,3 −Bj+1,3) (Uθ,i)} Ḟd (Xθ,i)Xi,p
≤ C ‖a‖∞ .
To prove (A.23), one only needs to use (A.12), (A.22) and (3.1).
Lemma A.10. Under Assumptions A2 and A4-A6, one has with probability 1
θ∈Sd−1c
BTθ E
= max
−3≤j≤N
Bj,4 (Uθ,i)σ (Xi) εi
log n√
, (A.24)
1≤p≤d
θ∈Sd−1c
BTθ E
= sup
1≤p≤d
θ∈Sd−1c
ḂTpE
log n√
. (A.25)
Similarly, under Assumptions A2, A4-A6, with probability 1
θ∈Sd−1c
BTθ Eθ
= sup
θ∈Sd−1c
−3≤j≤N
Bj,4 (Uθ,i) {m (Xi)− γθ (Uθ,i)}
log n√
(A.26)
1≤p≤d
θ∈Sd−1c
BTθ Eθ
log n√
, a.s.. (A.27)
Proof. We decompose the noise variable εi into a truncated part and a tail part εi = ε
i,1 +
εDni,2 +m
i , where Dn = n
η (1/3 < η < 2/5), εDni,1 = εiI {|εi| > Dn},
εDni,2 = εiI {|εi| ≤ Dn} −m
i = E [εiI {|εi| ≤ Dn} |Xi] .
It is straightforward to verify that the mean of the truncated part is uniformly bounded by
D−2n , so the boundedness of B spline basis and of the function σ
2 entail that
θ∈Sd−1c
Bj,4 (Uθ,i) σ (Xi)m
n−2/3
SINGLE-INDEX PREDICTION MODEL 23
The tail part vanishes almost surely
P {|εn| > Dn} ≤
D−3n <∞.
Borel-Cantelli Lemma implies that
Bj,4 (Uθ,i) σ (Xi) ε
, for any k > 0.
For the truncated part, using Bernstein’s inequality and discretization as in Lemma A.2
θ∈Sd−1c
1≤j≤N
Bj,4 (Uθ,i) σ (Xi) ε
log n/
, a.s..
Therefore (A.24) is established as with probability 1
θ∈Sd−1c
n−2/3
log n/
log n/
The proofs of (A.25), (A.26) are similar as E {m (Xi)− γθ (Uθ,i) |Uθ,i } ≡ 0, but no truncation
is needed for (A.26) as sup
θ∈Sd−1c
1≤i≤n
|m (Xi)− γθ (Uθ,i)| ≤ C <∞. Meanwhile, to prove (A.27),
we note that for any p = 1, ..., d
[Bj,4 (Uθ,i) {m (Xi)− γθ (Uθ,i)}]
According to (2.6), one has γθ (Uθ) ≡ E {m (X) |Uθ}, hence
E [Bj,4 (Uθ) {m (X)− γθ (Uθ)}] ≡ 0,−3 ≤ j ≤ N, θ ∈ Sd−1c .
Applying Assumptions A2 and A3, one can differentiate through the expectation, thus
[Bj,4 (Uθ) {m (X)− γθ (Uθ)}]
≡ 0, 1 ≤ p ≤ d,−3 ≤ j ≤ N, θ ∈ Sd−1c ,
which allows one to apply the Bernstein’s inequality to obtain that with probability 1
[Bj,4 (Uθ,i) {m (Xi)− γθ (Uθ,i)}]
(nh)−1/2 log n
which is (A.27).
Lemma A.11. Under Assumptions A2 and A4-A6, for ε̂θ (u) in (A.9), one has
θ∈Sd−1c
u∈[0,1]
|ε̂θ (u)| = O
log n
, a.s.. (A.28)
24 LI WANG AND LIJIAN YANG
Proof. Denote â ≡ (â−3, · · · , âN )T =
BTθ Bθ
BTθ E = V
n−1BTθ E
, then ε̂θ (u) =
j=−3 âjBj,4 (u), so the order of ε̂θ (u) is related to that of â. In fact, by Theorem 5.4.2
in DeVore and Lorentz (1993)
θ∈Sd−1c
u∈[0,1]
|ε̂θ (u)| ≤ sup
θ∈Sd−1c
‖â‖∞ =
θ∈Sd−1c
n−1BTθ E
≤ CN sup
θ∈Sd−1c
∥n−1BTθ E
∞ , a.s.,
where the last inequality follows from (A.12) of Lemma A.5. Applying (A.24) of Lemma A.10,
we have established (A.28).
Lemma A.12. Under Assumptions A2 and A4-A6, for ε̃θ (u) in (A.8), one has
θ∈Sd−1c
u∈[0,1]
|ε̃θ (u)| = O
log n
, a.s.. (A.29)
The proof is similar to Lemma A.11, thus omitted.
The next result evaluates the uniform size of the noise derivatives.
Lemma A.13. Under Assumptions A2-A6, one has with probability 1
1≤p≤d
θ∈Sd−1c
1≤i≤n
ε̂θ (Uθ,i)
(nh3)−1/2 log n
, (A.30)
1≤p≤d
θ∈Sd−1c
1≤i≤n
ε̃θ (Uθ,i)
(nh3)−1/2 log n
, (A.31)
1≤p,q≤d
θ∈Sd−1c
1≤i≤n
∂θp∂θq
ε̂θ (Uθ,i)
(nh5)−1/2 log n
, (A.32)
1≤p,q≤d
θ∈Sd−1c
1≤i≤n
∂θp∂θq
ε̃θ (Uθ,i)
(nh5)−1/2 log n
. (A.33)
Proof. Note that
ε̂θ (Uθ,i)
= (I−Pθ) Ḃp
θ E+Bθ
p (I−Pθ)E.
Applying (A.24) and (A.25) of Lemma A.10, (A.12) of Lemma A.5, (A.22) and (A.23) of
Lemma A.9, one derives (A.30). To prove (A.31), note that
ε̃θ (Uθ,i)
{PθEθ} = ṖpEθ +Pθ
Eθ = T1 + T2, (A.34)
in which
(I−Pθ) Ḃp −Bθ
(I−Pθ) Ḃp −Bθ
BTθ Bθ
ḂTpBθ
BTθ Bθ
BTθ Eθ
SINGLE-INDEX PREDICTION MODEL 25
T2 = Bθ
BTθ Bθ
BTθ Eθ
By (A.24), (A.12), (A.22) and (A.23), one derives
θ∈Sd−1c
‖T1‖∞ = O
n−1/2N3/2 log n
, a.s., (A.35)
while (A.27) of Lemma A.10, (A.12) of Lemma A.5
θ∈Sd−1c
‖T2‖∞ = N ×O
n−1/2h−1/2 log n
n−1/2h−3/2 log n
, a.s.. (A.36)
Now, putting together (A.34), (A.35) and (A.36), we have established (A.31). The proof for
(A.32) and (A.33) are similar.
Proof of Proposition 2.1. According to the decomposition (A.6)
|γ̂θ (u)− γθ (u)| = |{γ̃θ (u)− γθ (u)}+ ε̃θ (u) + ε̂θ (u)| .
Then (2.13) follows directly from (A.16) of Lemma A.7, (A.28) of Lemma A.11 and (A.29) of
Lemma A.12. Again by definitions (A.8) and (A.9), we write
{(γ̂θ − γθ) (Uθ,i)} =
(γ̃θ − γθ) (Uθ,i) +
γ̃θ (Uθ,i) +
ε̂θ (Uθ,i) .
It is clear from (A.20), (A.30) and (A.31) that with probability 1
1≤p≤d
θ∈Sd−1c
1≤i≤n
(γ̃θ − γθ) (Uθ,i)
1≤p≤d
θ∈Sd−1c
1≤i≤n
ε̃θ (Uθ,i)
ε̂θ (Uθ,i)
)−1/2
log n
Putting together all the above yields (2.14). The proof of (2.15) is similar.
A.3. Proof of Proposition 2.2
Lemma A.14. Under Assumptions A2-A6, one has
θ∈Sd−1c
R̂ (θ)−R (θ)
= o(1), a.s..
Proof. For the empirical risk function R̂ (θ) in (2.9), one has
R̂ (θ) = n−1
{γ̂θ (Uθ,i)−m (Xi)− σ (Xi) εi}2
26 LI WANG AND LIJIAN YANG
= n−1
{γ̂θ (Uθ,i)− γθ (Uθ,i) + γθ (Uθ,i)−m (Xi)− σ (Xi) εi}2 ,
hence
R̂ (θ) = n−1
{γ̂θ (Uθ,i)− γθ (Uθ,i)}2 + n−1
σ2 (Xi) ε
+2n−1
{γ̂θ (Uθ,i)− γθ (Uθ,i)} {γθ (Uθ,i)−m (Xi)− σ (Xi) εi}
{γθ (Uθ,i)−m (Xi)}2 + 2n−1
{γθ (Uθ,i)−m (Xi)}σ (Xi) εi,
where γ̂θ (x) is defined in (2.8). Using the expression of R (θ) in (2.7), one has
θ∈Sd−1c
∣R̂ (θ)−R (θ)
∣ ≤ I1 + I2 + I3 + I4,
I1 = sup
θ∈Sd−1c
{γ̂θ (Uθ,i)− γθ (Uθ,i)}2
I2 = sup
θ∈Sd−1c
{γ̂θ (Uθ,i)− γθ (Uθ,i)} {γθ (Uθ,i)−m (Xi)− σ (Xi) εi}
I3 = sup
θ∈Sd−1c
{γθ (Uθ,i)−m (Xi)}2 − E {γθ (Uθ)−m (X)}2
I4 = sup
θ∈Sd−1c
σ2 (Xi) ε
i − Eσ2 (X)
{γθ (Uθ,i)−m (Xi)}σ (Xi) εi
Bernstein inequality and strong law of large number for α mixing sequence imply that
I3 + I4 = o(1), a.s.. (A.37)
Now (2.13) of Proposition 2.1 provides that
θ∈Sd−1c
u∈[0,1]
|γ̂θ (u)− γθ (u)| = O
n−1/2h−1/2 log n+ h4
, a.s.,
which entail that
I1 = O
n−1/2h−1/2 log n
, a.s., (A.38)
I2 ≤ O
(nh)−1/2 log n+ h4
× sup
θ∈Sd−1c
|γθ (Uθ,i)−m (Xi)− σ (Xi) εi| .
Hence
I2 ≤ O
n−1/2h−1/2 log n+ h4
, a.s.. (A.39)
The lemma now follows from (A.37), (A.38) and (A.39) and Assumption A6.
SINGLE-INDEX PREDICTION MODEL 27
Lemma A.15. Under Assumptions A2 - A6, one has
θ∈Sd−1c
1≤p≤d
R̂ (θ)−R (θ)
− n−1
ξθ,i,p
n−1/2
, a.s., (A.40)
in which
ξθ,i,p = 2 {γθ (Uθ,i)− Yi}
γθ (Uθ,i)−
R (θ) , E (ξθ,i,p) = 0. (A.41)
Furthermore for k = 1, 2
θ∈Sd−1c
R̂ (θ)−R (θ)
n−1/2h−1/2−k log n+ h4−k
, a.s.. (A.42)
Proof. Note that for any p = 1, 2, ..., d
R̂ (θ) = n−1
{γ̂θ (Uθ,i)− Yi}
γ̂θ (Uθ,i) ,
R (θ) = E
{γθ (Uθ)−m (X)}
γθ (Uθ)
{γθ (Uθ)−m (X)− σ (X) ε}
γθ (Uθ)
Thus E (ξθ,i,p) = 2E
{γθ (Uθ,i)− Yi} ∂∂θpγθ (Uθ,i)
R (θ) = 0 and
R̂ (θ)−R (θ)
= (2n)
ξθ,i,p + J1,θ,p + J2,θ,p + J3,θ,p, (A.43)
J1,θ,p = n
{γ̂θ (Uθ,i)− γθ (Uθ,i)}
(γ̂θ − γθ) (Uθ,i) ,
J2,θ,p = n
{γθ (Uθ,i)−m (Xi)− σ (Xi) εi}
(γ̂θ − γθ) (Uθ,i) ,
J3,θ,p = n
{γ̂θ (Uθ,i)− γθ (Uθ,i)}
γθ (Uθ,i) .
Bernstein inequality implies that
θ∈Sd−1c
1≤p≤d
ξθ,i,p
n−1/2 log n
, a.s.. (A.44)
28 LI WANG AND LIJIAN YANG
Meanwhile, applying (2.13) and (2.14) of Proposition 2.1, one obtains that
θ∈Sd−1c
1≤p≤d
|J1,θ,p| = O
log n+ h4
)−1/2
log n+ h3
n−1h−2 log2 n+ h7
, a.s.. (A.45)
Note that
J2,θ,p = n
{γθ (Uθ,i)−m (Xi)− σ (Xi) εi}
(γ̃θ − γθ) (Uθ,i)
−n−1 (E+Eθ)T
{Pθ (E+Eθ)} .
Applying (2.13), one gets
θ∈Sd−1c
1≤p≤d
J2,θ,p + n
−1 (E+Eθ)
{Pθ (E+Eθ)}
, a.s.,
while (A.24), (A.26) and (A.12) entail that with probability 1
θ∈Sd−1c
1≤p≤d
n−1 (E+Eθ)
{Pθ (E+Eθ)}
log n
×N ×N ×O
log n
n−1N log2 n
θ∈Sd−1c
1≤p≤d
|J2,θ,p| = O
h3 + n−1N log2 n
, a.s.. (A.46)
Lastly
J3,θ,p − n−1
(γ̃θ − γθ)
γθ (Uθ,i) = n
−1 (E+Eθ)
BTθ Bθ
By applying (A.24), (A.26), and (A.12), it is clear that with probability 1
θ∈Sd−1c
1≤p≤d
n−1BTθ E+n
BTθ Bθ
log n
×N ×O
h+ (nN)
log n
n−1 log2 n+ (nN)−1/2 log n
while by applying (A.16) of Lemma A.7, one has
θ∈Sd−1c
1≤p≤d
(γ̃θ − γθ)
γθ (Uθ,i)
, a.s.,
SINGLE-INDEX PREDICTION MODEL 29
together, the above entail that
θ∈Sd−1c
1≤p≤d
|J3,θ,p| = O
h4 + n−1 log2 n+ (nN)−1/2 log n
, a.s.. (A.47)
Therefore, (A.43), (A.45), (A.46), (A.47) and Assumption A6 lead to (A.40), which, together
with (A.44), establish (A.42) for k = 1.
Note that the second order derivative of R̂ (θ) and R (θ) with respect to θp, θq are
{γ̂θ (Uθ,i)− Yi}
∂θp∂θq
γ̂θ (Uθ,i) +
γ̂θ (Uθ,i)
γ̂θ (Uθ,i)
E {γθ (Uθ)−m (X)}
∂θp∂θq
γθ (Uθ) + E
γθ (Uθ)
γθ (Uθ)
The proof of (A.42) for k = 2 follows from (2.13), (2.14) and (2.15).
Proof of Proposition 2.2. The result follows from Lemma A.14, Lemma A.15, equations
(A.50) and (A.51).
A.4. Proof of the Theorem 2
Let Ŝ∗p (θ−d) be the p-th element of Ŝ
∗ (θ−d) and for γθ in (2.6), denote
ηi,p := 2
γ̇p − θ0,pθ−10,dγ̇d
(Uθ0,i) {γθ0 (Uθ0,i)− Yi} , (A.48)
where γ̇p is value of
γθ taking at θ = θ0, for any p, q = 1, 2, ..., d − 1.
Lemma A.16. Under Assumptions A2-A6, one has
1≤p≤d−1
Ŝ∗p (θ0,−d)− n−1
n−1/2
, a.s.. (A.49)
Proof. For any p = 1, ..., d − 1
Ŝ∗p (θ−d)− S∗p (θ−d) =
− θpθ−1d
R̂ (θ)−R (θ)
Therefore, according to (A.40), (A.41) and (A.48)
ηi,p = n
ξθ0,i,p − θ0,pθ
ξθ0,i,d, E (ηi,p) = 0,
1≤p≤d−1
Ŝ∗p (θ0,−d)− S∗p (θ0,−d)− n−1
n−1/2
, a.s..
Since S∗ (θ−d) attains its minimum at θ0,−d, for p = 1, ..., d − 1
S∗p (θ0,−d) ≡
− θpθ−1d
R (θ)
which yields (A.49).
30 LI WANG AND LIJIAN YANG
Lemma A.17. The (p, q)-th entry of the Hessian matrix H∗ (θ0,−d) equals lp,q given in Theo-
rem 2.
Proof. It is easy to show that for any p, q = 1, 2, ..., d,
R (θ) =
E {m (X)− γθ (Uθ)}2 = −2E
γθ (Uθ)
γθ (Uθ)
∂θp∂θq
R (θ) = −2E
γθ (Uθ)
γθ (Uθ) + γθ (Uθ)
∂θp∂θq
γθ (Uθ)
Note that
R∗ (θ−d) =
R (θ)−
R (θ) , (A.50)
∂θp∂θq
R∗ (θ−d) =
∂θp∂θq
R (θ)−
∂θp∂θd
R (θ)−
∂θd∂θq
R (θ)
1− ‖θ−d‖22
R (θ) +
∂θd∂θd
R (θ) . (A.51)
R∗ (θ−d) = −2E
γθ (Uθ)
γθ (Uθ)
+ 2θ−1
γθ (Uθ)
γθ (Uθ)
∂θp∂θq
R∗ (θ−d) = −2E
γθ (Uθ)
γθ (Uθ) + γθ (Uθ)
∂θp∂θq
γθ (Uθ)
+2θqθ
γθ (Uθ)
γθ (Uθ) + γθ (Uθ)
∂θp∂θd
γθ (Uθ)
1− ‖θ−d‖22
γθ (Uθ)
γθ (Uθ)
+2θpθ
γθ (Uθ)
γθ (Uθ) + γθ (Uθ)
∂θp∂θq
γθ (Uθ)
−2θpθqθ−2d E
γθ (Uθ)
+ γθ (Uθ)
∂θd∂θd
γθ (Uθ)
Therefore we obtained the desired result.
Proof of Theorem 2. For any p = 1, 2, ..., d − 1, let
fp (t) = Ŝ
tθ̂−d + (1− t) θ0,−d
, t ∈ [0, 1],
fp (t) =
tθ̂−d+(1− t) θ0,−d
θ̂q − θ0,q
SINGLE-INDEX PREDICTION MODEL 31
Note that Ŝ∗ (θ−d) attains its minimum at θ̂−d, i.e., Ŝ
≡ 0. Thus, for any p = 1, 2, ..., d−
1, tp ∈ [0, 1], one has
−Ŝ∗p (θ0,−d) = Ŝ∗p
− Ŝ∗p (θ0,−d) = fp (1)− fp (0)
∂θqθp
tpθ̂−d + (1− tp) θ0,−d
q=1,...,d−1
θ̂−d−θ0,−d
−Ŝ∗ (θ0,−d) =
∂θq∂θp
tpθ̂−d + (1− tp) θ0,−d
p,q=1,...,d−1
θ̂−d − θ0,−d
Now (2.11) of Theorem 1 and Proposition 2.2 with k = 2 imply that uniformly in p, q =
1, 2, ..., d − 1
∂θq∂θp
tpθ̂−d+(1− tp) θ0,−d
−→ lq,p, a.s., (A.52)
where lp,q is given in Theorem 2. Noting that
θ̂−d−θ0,−d
is represented as
∂θq∂θp
tpθ̂−d + (1− tp) θ0,−d
p,q=1,...,d−1
nŜ∗ (θ0,−d) ,
where Ŝ∗ (θ0,−d) =
Ŝ∗p (θ0,−d)
and according to (A.48) and Lemma A.16
Ŝ∗p (θ0,−d) = n
ηp,i + o
n−1/2
, a.s., E (ηp,i) = 0.
Let Ψ (θ0) = (ψpq)
p,q=1
be the covariance matrix of
Ŝ∗p (θ0,−d)
with ψpq given in
Theorem 2. Cramér-Wold device and central limit theorem for α mixing sequences entail that
nŜ∗ (θ0,−d)
d−→ N {0,Ψ(θ0)} .
Let Σ (θ0) = {H∗ (θ0,−d)}−1Ψ(θ0)
{H∗ (θ0,−d)}T
, withH∗ (θ0,−d) being the Hessian matrix
defined in (2.3). The above limiting distribution of
nŜ∗ (θ0,−d), (A.52) and Slutsky’s theorem
imply that
θ̂−d−θ0,−d
d−→ N {0,Σ (θ0)} .
References
Bosq, D. (1998). Nonparametric Statistics for Stochastic Processes. Springer-Verlag, New
York.
32 LI WANG AND LIJIAN YANG
Carroll, R., Fan, J., Gijbles, I. and Wand, M. P. (1997). Generalized partially linear single-
index models. J. Amer. Statist. Assoc. 92 477-489.
Chen, H. (1991). Estimation of a projection -persuit type regression model. Ann. Statist. 19
142-157.
de Boor, C. (2001). A Practical Guide to Splines. Springer-Verlag, New York.
DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation: Polynomials and
Splines Approximation. Springer-Verlag, Berlin.
Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman
and Hall, London.
Friedman, J. H. and Stuetzle, W. (1981). Projection pursuit regression. J. Amer. Statist.
Assoc. 76 817-823.
Härdle, W. (1990). Applied Nonparametric Regression. Cambridge University Press, Cam-
bridge.
Härdle, W. and Hall, P. and Ichimura, H. (1993). Optimal smoothing in single-index models.
Ann. Statist. 21 157-178.
Härdle, W. and Stoker, T. M. (1989). Investigating smooth multiple regression by the method
of average derivatives. J. Amer. Statist. Assoc. 84 986-995.
Hall, P. (1989). On projection pursuit regression. Ann. Statist. 17 573-588.
Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Chapman and Hall,
London.
Horowitz, J. L. and Härdle, W. (1996). Direct semiparametric estimation of single-index
models with discrete covariates. J. Amer. Statist. Assoc. 91 1632-1640.
Hristache, M., Juditski, A. and Spokoiny, V. (2001). Direct estimation of the index coefficients
in a single-index model. Ann. Statist. 29 595-623.
Huang, J. Z. (2003). Local asymptotics for polynomial spline regression. Ann. Statist. 31
1600-1635.
Huang, J. and Yang, L. (2004). Identification of nonlinear additive autoregressive models. J.
R. Stat. Soc. Ser. B Stat. Methodol. 66 463-477.
SINGLE-INDEX PREDICTION MODEL 33
Huber, P. J. (1985). Projection pursuit (with discussion). Ann. Statist. 13 435-525.
Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of
single-index models Journal of Econometrics 58 71-120.
Klein, R. W. and Spady. R. H. (1993). An efficient semiparametric estimator for binary
response models. Econometrica 61 387-421.
Mammen, E., Linton, O. and Nielsen, J. (1999). The existence and asymptotic properties of
a backfitting projection algorithm under weak conditions. Ann. Statist. 27 1443-1490.
Pham, D. T. (1986). The mixing properties of bilinear and generalized random coefficient
autoregressive models. Stochastic Anal. Appl. 23 291-300.
Powell, J. L., Stock, J. H. and Stoker, T. M. (1989). Semiparametric estimation of index
coefficients. Econometrica. 57 1403-1430.
Tong, H. (1990) Nonlinear Time Series: A Dynamical System Approach. Oxford, U.K.:
Oxford University Press.
Tong, H., Thanoon, B. and Gudmundsson, G. (1985) Threshold time series modeling of two
icelandic riverflow systems. Time Series Analysis in Water Resources. ed. K. W. Hipel,
American Water Research Association.
Wang, L. and Yang, L. (2007). Spline-backfitted kernel smoothing of nonlinear additive
autoregression model. Ann. Statist. Forthcoming.
Xia, Y. and Li, W. K. (1999). On single-index coefficient regression models. J. Amer. Statist.
Assoc. 94 1275-1285.
Xia, Y., Li, W. K., Tong, H. and Zhang, D. (2004). A goodness-of-fit test for single-index
models. Statist. Sinica. 14 1-39.
Xia, Y., Tong, H., Li, W. K. and Zhu, L. (2002). An adaptive estimation of dimension
reduction space. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 363-410.
Xue, L. and Yang, L. (2006 a). Estimation of semiparametric additive coefficient model. J.
Statist. Plann. Inference 136, 2506-2534.
Xue, L. and Yang, L. (2006 b). Additive coefficient modeling via polynomial spline. Statistica
Sinica 16 1423-1446.
Table 2: Report of Example 2
Sample Size n Dimension d
Average MSE Time
MAVE SIP MAVE SIP
4 0.00020 0.00018 1.91 0.19
10 0.00031 0.00043 2.17 0.10
30 0.00106 0.00285 2.77 0.13
50 0.00031 0.00043 3.29 0.10
100 0.00681 0.00620 5.94 0.31
200 0.00529 0.00407 27.90 0.49
4 0.00008 0.00008 3.28 0.09
10 0.00012 0.00017 3.93 0.13
30 0.00017 0.00058 5.41 0.15
50 0.00032 0.00127 8.48 0.16
100 — 0.00395 — 0.44
200 — 0.00324 — 0.73
4 0.00004 0.00003 5.32 0.17
10 0.00005 0.00007 7.49 0.24
30 0.00006 0.00017 10.08 0.26
50 0.00007 0.00030 15.42 0.24
100 0.00015 0.00061 40.81 0.54
200 — 0.00197 — 1.44
4 0.00002 0.00001 14.44 0.76
10 0.00002 0.00003 24.54 0.79
30 0.00002 0.00008 32.51 0.83
50 0.00002 0.00010 52.93 0.89
100 0.00003 0.00012 143.07 0.99
200 0.00004 0.00020 386.80 1.96
400 — 0.00054 — 4.98
4 0.00001 0.00001 33.57 1.95
10 0.00001 0.00001 62.54 3.64
30 0.00001 0.00002 92.41 1.95
50 0.00001 0.00003 155.38 2.72
100 0.00001 0.00005 275.73 1.81
200 0.00008 0.00006 2432.56 2.84
400 — 0.00010 — 9.35
−2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5
−1 0 1 2
−1 0 1 2
Figure 1: Example 1. (a) and (b) Plots of the actual surface m in model (4.1) with re-
spect to δ = 0, 1; (c) and (d) Plots of various univariate functions with respect to δ = 0, 1:
XTi θ̂, Yi
, 1 ≤ i ≤ 50 (dots); the univariate function g (solid line); the estimated function of
g by plugging in the true index coefficient θ0 (dotted line); the estimated function of g by
plugging in the estimated index coefficient (dashed line) θ̂ = (0.69016, 0.72365)T for δ = 0 and
(0.72186, 0.69204)T for δ = 1.
−2 −1 0 1 2
n= 100 ,  d= 10
−2 −1 0 1 2
n= 100 ,  d= 50
−2 −1 0 1 2
n= 100 ,  d= 100
−3 −2 −1 0 1 2 3
n= 100 ,  d= 200
Figure 2: Example 2. Plots of the spline estimator of g with the estimated index parameter θ̂
(dotted curve), cubic spline estimator of g with the true index parameter θ0 (dashed curves),
the true function m (x) in (4.2) (solid curve), and the data scatter plots (dots).
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35
Density Estimation, d=10
n=100
n=200
n=500
n=1000
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35
Density Estimation, d=50
n=100
n=200
n=500
n=1000
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35
Density Estimation, d=100
n=100
n=200
n=500
n=1000
0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35
Density Estimation, d=200
n=100
n=200
n=500
n=1000
Figure 3: Example 2. Kernel density estimators of the 100 ‖θ̂ − θ0‖/
0 200 400 600 800 1000
0 200 400 600 800 1000
0 200 400 600 800 1000
Figure 4: Time plots of the daily Jökulsá Eystri River data (a) river flow Yt (solid line) with its
trend (dashed line) (b) temperature Xt (solid line) with its trend (dashed line) (c) precipitation
Zt (solid line) with its trend (dashed line).
+++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++
+++++++++++++++++++++++++++++++++++++
++++++
+++++
+++++
++++++++++
++++++++++
+++++++
+++++++
++++++++++++
+++++
++++++++++++
+++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++++++
++++++++++++++++++
+++++
+++++
++++++
+++++++
+++++
+++++
+++++
+++++
++++++++
+++++
++++++++++++++
+++++++
+++++++++++++++++++++++++
++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++
++++++++++
+++++++++++++++++++++
++++++
++++++
++++++
++++++++++
+++++++++
+++++
++++++++++++++++++++++++++++++
+++++++
++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
0 200 400 600 800 1000
0 200 400 600 800 1000
800 900 1000 1100
++++++++++++++++++++++++++++++++++++++++++++++++++
++++++
++++++++++
+++++++++++++++++++++
++++++
++++++
++++++
++++++++++
+++++++++
+++++
++++++++++++++++++++++++++++++
+++++++
++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Figure 5: (a) The scatter plot of the river flow (“+”) and the fitted plot of the river flow (line)
and (b) Residuals of the fitted SIP model (c) Out-of-sample rolling forecasts (line) of the river
flow for the entire third year (“+”) based on the first two years’ river flow.
ABSTRACT
  For the past two decades, single-index model, a special case of projection
pursuit regression, has proven to be an efficient way of coping with the high
dimensional problem in nonparametric regression. In this paper, based on weakly
dependent sample, we investigate the single-index prediction (SIP) model which
is robust against deviation from the single-index model. The single-index is
identified by the best approximation to the multivariate prediction function of
the response variable, regardless of whether the prediction function is a
genuine single-index function. A polynomial spline estimator is proposed for
the single-index prediction coefficients, and is shown to be root-n consistent
and asymptotically normal. An iterative optimization routine is used which is
sufficiently fast for the user to analyze large data of high dimension within
seconds. Simulation experiments have provided strong evidence that corroborates
with the asymptotic theory. Application of the proposed procedure to the rive
flow data of Iceland has yielded superior out-of-sample rolling forecasts.

<|endoftext|><|startoftext|>
Introduction
The Pierre Auger Observatory in Malargüe, Argentina, is designed to study
the origin of ultra-high energy cosmic rays with energies above 1018 eV. While
Preprint submitted to Astroparticle Physics 30 October 2018
http://arxiv.org/abs/0704.0303v2
still under construction, scientific data taking began in 2004, and first results
have been published [1,2,3].
The Pierre Auger Observatory is a hybrid detector that combines two tech-
niques traditionally used to measure cosmic ray air showers: surface particle
detection and air fluorescence detection. Both detector types measure the cos-
mic ray primary indirectly, using the Earth’s atmosphere as part of the de-
tector medium. When the primary particle enters the atmosphere, it interacts
with air molecules, initiating a cascade of secondary particles, the so-called ex-
tensive air shower. Surface detectors in the form of ground arrays sample the
shower front as it impacts the ground, whereas air fluorescence detectors make
use of the fact that the particles in the air shower excite nitrogen molecules
in the air, causing UV fluorescence. Using photomultiplier cameras to record
air shower UV emission, we can observe showers as they develop through the
atmosphere and obtain a nearly calorimetric estimate of the shower energy.
Upon completion, the surface detector (SD) array of the Pierre Auger Ob-
servatory will comprise 1600 water Cherenkov detector tanks, deployed in a
hexagonal grid over an area of 3000 km2, and four fluorescence detector (FD)
stations overlooking the SD from the periphery. An advantage of combining
both detector types at the same site is the possibility to cross-calibrate. Based
on the subset of events seen with both detectors, the nearly calorimetric in-
formation of the FD provides the energy calibration of the SD.
For the calibration to be meaningful, the properties of the calorimeter, i.e.
the atmosphere, must be well-known. At the Pierre Auger Observatory, this is
achieved by an extensive program to monitor the atmosphere within the overall
FD aperture and measure atmospheric attenuation and scattering properties
in the 300 to 400 nm wavelength band recorded by the FDs [4,5,6].
Two primary forms of atmospheric light scattering need to be considered:
molecular, or Rayleigh, scattering, mainly due to nitrogen and oxygen molecules;
and aerosol scattering due to airborne particulates. The angular distribution
of scattered light in both types of scattering may be described by a phase
function P (θ), defined as the probability per unit solid angle of scattering
through an angle θ.
Rayleigh scattering allows for an analytical treatment, and assuming isotropic
scattering, the Rayleigh phase function has the well known 1 + cos2 θ angular
dependence. Matters are more complicated for aerosols, because the scatter-
ing cross section depends on the size distribution and shape of the scatter-
ers. Forward scattering typically dominates in this case, but the fraction of
forward-scattered light varies strongly with aerosol type. Moreover, a rigorous
analytical treatment is not possible, though the literature gives various ap-
proximations. For example, if one assumes spherical particles with a known or
estimated size distribution, then aerosol scattering can be described analyti-
cally using Mie theory [7]. In practice, however, aerosols vary a great deal in
size and shape, and the aerosol content of the atmosphere changes on short
time scales as wind lifts up dust, weather fronts pass through, or rain removes
dust from the atmosphere.
The FD reconstruction of the primary cosmic ray particle energy must account
not only for light that is “lost” between the shower and the camera due to
scattering, but also for direct and indirect Cherenkov light contributing to
the FD signal. The amount of Cherenkov light seen by the FDs depends on
the viewing angle, i.e. the angle between the shower axis and the FD line of
sight, and can be calculated once the geometry of the air shower is determined.
At small viewing angles, direct Cherenkov light dominates, while at viewing
angles greater than ∼ 20◦, the FDs detect mainly “indirect” Cherenkov light
scattered into the FD field of view. To calculate this scattered component, the
aerosol phase function needs to be known. Finally, a small multiple scattering
component also adds to the contamination of the fluorescence light and must
be removed [8].
The Aerosol Phase Function (APF) light sources [9,10], in conjunction with
the fluorescence detectors at the Pierre Auger Observatory, are designed to
measure the aerosol phase function on an hourly basis during FD data tak-
ing. The APF light sources direct a near-horizontal pulsed light beam across
the field of view of a nearby FD. The aerosol phase function can then be re-
constructed from the intensity of the light observed by the FD cameras as a
function of scattering angle. Since the FD telescopes cover about 180◦ in az-
imuth, the aerosol phase function is measured over a wide range of scattering
angles.
Currently, APF light sources are installed and operating at two of the FDs.
With their ability to measure the angular distribution of the scattered light,
the APF light sources are meant to complement other atmospheric monitoring
tools at the Auger site which measure the optical depth, and therefore the
amount of attenuation due to aerosols.
This paper describes the design and performance of the APF light sources.
It is structured as follows. Section 2 gives a description of the APF facilities.
Section 3 describes how the aerosol phase function is determined from the
APF data. In Section 4, we show first results for data taken between June and
December 2006. Section 5 summarizes the paper.
Fig. 1. Schematic layout of the Pierre Auger Observatory. The shaded area indicates
the shape and size of the surface detector area. The fluorescence detectors are placed
at the periphery of the surface detector array. The field of view of the 6 bays of each
fluorescence detector (FD) is indicated by the lines. From the Central Laser Facility
(CLF) [6] in the center of the surface detector array, a pulsed UV laser beam is
directed into the sky, providing another test beam which can be observed by the FDs.
2 APF Light Sources
2.1 Detector Buildings, Optics, and Electronics
The Auger FD comprises four detector stations (see Fig. 1). At present, the
sites at Los Leones, Coihueco, and Los Morados are completed and fully opera-
tional, while the fourth site at Loma Amarilla is under construction. APF light
sources are operating at the Coihueco and Los Morados FD sites. Both were
built by the University of New Mexico group [10]. Fig. 2 shows a photograph
of the APF container building at Los Morados.
Each APF building contains sources which operate at different wavelengths in
the region of interest between 300 nm and 400 nm. During the initial studies
described in this paper, only one light source with a Johnson U-band filter
of central wavelength 350 nm was used. However, in the near future, we plan
to operate the light sources at several wavelengths to study the wavelength
dependence of the phase function over the full range of the FD sensitivity.
The light beam is provided by a broad-band Xenon flash lamp source from
Fig. 2. Photos of the enclosure (left) and the light source (right) at the Los Morados
APF facility. In the photograph on the left, the Los Morados FD can be seen on the
horizon (to the left of the container).
Perkin Elmer Optoelectronics (model LS-1130-4 FlashPac with FX-1160 flash
lamp). The Xenon flash lamps were chosen because of their excellent stability
in intensity and pulse shape. A Johnson/Cousins (Bessel) U-band filter from
Omega Optical Inc. (part number XBSSL/U/50R) selects a central wavelength
of ∼ 350 nm, FWHM 60 nm) from the broad flash lamp spectrum. The beam
is focused using a 20.3 cm diameter UV enhanced aluminum spherical mirror
(speed f/3) from Edmund Scientific Co. (part number R43-589). All optical
components are assembled on a commercial optical plate. We use Thor optical
table parts, assembled from Nomex Epoxy/Fiberglass 1.91 cm panels from
TEKLAM (part number N507EC).
The Xenon lamps rest inside refurbished 6.1 m shipping containers, and the
light is sent through a 0.749 cm thick acrylite UV transmitting window (Cyro
Industries acrylite OP-4 UVT acrylic). Each light source provides a nearly
horizontal beam of divergence ≤ 10 mrad pulsed across the field of view of the
nearby fluorescence detector. Computer control occurs from the correspond-
ing FD building. A serial radio link (YDI Wireless, model 651-900001-001
(TranzPoint ESC-II Kit)) connects the computer to a commercial ADC/relay
system (model ADC-16F 16 channel 8 bit ADC and RH-8L 8-relay card from
Electronic Energy Control Inc.) at the light source.
Once during each hour of FD data taking, the ADC/relay system enables a
1 Hz GPS pulser (CNS Systems Inc., model CNSC01 with TAC32 software)
and a 12 V to 24 V inverter to power the Xenon flash lamps. Each lamp fires
a set of 5 shots, pulsed at 2 second intervals. The APF events are flagged by
the FD data acquisition system and the corresponding FD data are stored on
disk in especially designated APF data files.
When the light sources are not operating, only the radio link and the ADC
board are powered. The total current draw is therefore only ∼ 0.2 A at 12 V,
and the whole system can be powered by batteries recharged during the day
with 12 V solar panels (two Siemens SP75 75 W solar modules with Trace
C35 controller).
2.2 APF Signals in the Fluorescence Detectors
The light beam produced by the APF sources is observed by the cameras of
the corresponding FD site. The FD detectors of the Pierre Auger Observatory
are described in detail elsewhere [11]. Here, we only give a short summary of
the main characteristics relevant for the analysis of APF shots.
Each Auger FD site contains six bays, and each bay encloses a UV telescope
composed of a spherical light-collecting mirror, a photomultiplier camera at
the focal surface, and a UV transmitting filter in the aperture. The mirrors
have a radius of curvature of 3.4 m and an area of about 3.5 × 3.5 m2. The
camera consists of 440 photomultipliers with a hexagonal bialkaline photo-
cathode, arranged in a 20× 22 array. Each camera has a field of view of 30.0◦
in azimuth and 28.6◦ in elevation, covering an elevation angle range from 1.6◦
to 30.2◦ above horizon. To reduce optical aberrations, including coma, the FD
telescopes use Schmidt optics with a circular diaphragm of diameter 2.2 m
placed at the center of curvature of the mirror, and a refractive corrector ring
at the telescope aperture.
Fig. 3 shows an APF shot as seen by the Coihueco FD. Five out of the 6 bays
of the Coihueco FD site observe light from the Coihueco APF facility. In this
figure, the light travels from right to left. Fig. 4 shows the relative positions
of the APF source and the FD at the Coihueco site. The geometry is in part
dictated by the local topography, and consequently is slightly different for the
Los Morados site.
Fig. 3. The APF pulse as seen by the Coihueco FD. The light travels from right
to left, and each PMT Cluster observes 30◦ in azimuth. Note that the projection of
the approximately horizontal APF beam onto the spherical FD surface results in a
curved track.
Fig. 4. Scheme of the location of the Coihueco APF light source relative to the
Coihueco FD. Located at the center is the Coihueco FD with its field of view in-
dicated. The value of α is 26◦ and β is 38◦, measured from the North. The shot
direction γ is about 24◦.
3 Determination of the Aerosol Phase Function
The signal from the APF light source observed by the ith pixel of a fluorescence
detector can be expressed as
Si = I0 · Ti ·
·∆zi ·∆Ωi · ǫi . (1)
In this equation, I0 is the light source intensity; Ti is the transmission factor
e−ri/Λtot which accounts for light attenuation from the beam to the pixel; ri
is the distance from the beam to the detector; Λtot, Λm, and Λa are the total,
molecular, and aerosol extinction length, respectively; and σ−1m dσm/dΩ and
σ−1a dσa/dΩ are the normalized differential molecular and aerosol scattering
cross sections, respectively, which are identical to the phase functions Pm(θ)
and Pa(θ). The integral of Pm(θ) and Pa(θ) over all solid angles is equal to 1.
Finally, ∆zi, ∆Ωi, and ǫi are the track length, detector solid angle, and the
efficiency for the ith pixel of the detector.
The data come in the form of total PMT signal per pixel from a particular shot.
Those data are binned as a function of azimuth and averaged between the five
shots taken within 10 seconds. In this analysis, 5◦ bins are used, although the
fit is relatively insensitive to the number of bins. Each FD pixel is hexagonally
shaped, so for those lying at the boundary of two azimuth bins, the fractional
area of the hexagon in each bin is used to properly distribute the signal. The
signal in each pixel is divided by ∆zi, 1/r
i and ǫi to correct for the geometry of
the beam and pixel calibration. Note that in the roughly cylindrical geometry
of the FD-APF beam, the ∆zi and 1/r
i corrections almost completely cancel
Typical values for the aerosol extinction length in dry atmospheres are be-
tween 10 km and 20 km, reaching 40 km for very clear conditions. Since the
perpendicular distance from the beam to the FD is only on the order of a
few hundred meters, it is reasonable to assume full atmospheric transmission
(Ti = 1) over the length of the beam. In reality, this assumption does not
hold well for the most distant beam points, so these points are not used in
the present study. In the near future, measurements of the extinction length
from the Auger lidar stations [5] will be used to improve the APF analysis.
In another approximation, we assume that the extinction lengths are identi-
cal for each pixel for single measurements and do not require an index i. In
principle, the extinction length depends on the number density of scatterers
and is therefore a function of the density (temperature, pressure) of the air.
Given corrections for geometry, attenuation, and pixel efficiency, Eq. 1 reduces
Si = C ·
, (2)
where C is a constant whose value is unimportant because arbitrary units are
sufficient in determining the phase function.
From the theory of Rayleigh scattering it is known that the Rayleigh phase
function is
Pm(θ) =
16 π(1 + 2 γ)
(1 + 3 γ) + (1− γ) cos2 θ
where γ accounts for the effect of molecular anisotropy on Rayleigh scattering.
For isotropic scattering, γ = 0, this reduces to the familiar
Pm(θ) =
(1 + cos2 θ) . (4)
The effect of the anisotropy is small and wavelength-dependent. Bucholtz [12]
Fig. 5. Schematic of track seen by ith pixel.
estimates γ ≃ 0.015 at 360 nm and concludes that the correction leads to
a ∼ 3% systematic increase in the Rayleigh scattering cross section, and a
fractional change ≤ 1.5% from the approximate (1 + cos2 θ). In our analysis,
only the shape of the function is relevant, and we use Eq. 4 as an approximation
of Eq. 3.
The aerosol phase function is often parameterized by the Henyey-Greenstein
function [13]:
Pa(θ) =
1− g2
(1 + g2 − 2gµ)3/2
, (5)
where µ = cos θ and g is an asymmetry parameter equal to the mean cosine of
the scattering angle: g = 〈cos θ〉. The parameter g is a measure of how much
light is scattered in the forward direction; a greater g means more light is
forward-scattered. Values for g range from g = 1 (total forward scattering) to
g = −1 (total backward scattering), with g = 0 indicating isotropic scattering.
The Henyey-Greenstein function works well for pure forward scattering, but
it cannot describe realistic aerosol conditions, which typically give rise to non-
negligible backscattering. Following [14,15], we modify Eq. 5 so that
Pa(θ) =
1− g2
(1 + g2 − 2gµ)3/2
3µ2 − 1
2(1 + g2)3/2
. (6)
The new term in this expression is proportional to the second Legendre poly-
nomial, and it is introduced to describe the extra backscattering component.
The value f is a fit parameter used to tune the relative strength of forward to
backward scattering.
The binned APF signal observed in the FD is therefore subjected to a 4-
parameter fit:
Si = A · (1 + µ
i ) +B · (1− g
(1 + g2 − 2gµi)3/2
3µ2i − 1
2(1 + g2)3/2
, (7)
where A, B, g and f are the fit parameters.
In principle, the parameters A and B, which describe the relative amount of
Rayleigh and Mie scattering, can be determined from measurements of the
extinction lengths Λm and Λa and assumptions about the particle albedo,
i.e. the ratio of light scattered by the aerosol particle in all directions to the
amount of incoming light. The albedo is close to one if the particle is mostly
reflective. Since local information on the extinction lengths was not available
for this analysis, we use A and B as additional fit parameters. We find that
the distinct shapes of the two phase functions does allow a determination of
A and B from the data themselves.
At Coihueco, the APF signal is seen in 5 out of the 6 mirrors, so the track is
visible over ∼ 150◦ in azimuth. At the boundary between each mirror there
is some overlap in the fields of view of pixels. This overlap produces a double
counting of signal resulting in the value of bins at boundaries being too large.
These bins are simply ignored in the fit. The values of the other bins and their
errors are obtained from the mean and standard deviation of the five APF
shots in each shot sequence.
On clear nights with few or no aerosols, the fit to Eq. 7 returns unphysical val-
ues for the parameters B, f , and g. In those cases, we re-fit the data to a pure
Rayleigh function by setting B, f , and g equal to zero. Two examples of fits,
one for a night with aerosol content, and one for a night with pure Rayleigh
scattering, are shown in Fig. 6. The aerosol, molecular, and total phase func-
tions are shown. The aerosol phase function is obtained by subtracting the
molecular component determined by the fit.
We fit the data only over a subrange of the available scattering angles, from
θmin ≃ 32.5
◦ to θmax ≃ 147.3
◦. As Fig. 6 indicates, the data deviates from
the theoretical prediction for scattering angles below θmin and above θmax.
At smaller and larger angles, several effects corrupt the signal and make it
unusable for the fit to the phase function. Due to the local geometry at the
Coihueco site (see Fig. 4), the APF shot is not visible for θ < 24◦, and below
30◦, the signal is incomplete because the beam is still partially beneath the
detector field of view. At large scattering angles, the beam is at a rapidly in-
creasing distance to the corresponding FD bay, and attenuation of light from
the beam to the detector becomes important. As mentioned earlier, because
local measurements of the optical depth are not yet available, we simply as-
sume T = 1. As measurements of T become available, the attenuation of light
scattered at large angles can be used to correct the data.
]° [θscattering angle 
0 20 40 60 80 100 120 140 160 180
A         46.2±  8602 
B         1814± 2.696e+05 
g         0.0036± 0.6804 
f         0.0147± 0.4977 
Aerosol Phase Function
Total phase function
Rayleigh phase function
Mie phase function
]° [θscattering angle 
0 20 40 60 80 100 120 140 160 180
10000
15000
20000
25000
30000
35000
40000
45000
50000
A         19± 1.011e+04 Aerosol Phase Function
Total phase function
Rayleigh phase function
Mie phase function
Fig. 6. Two examples for APF data fits on different days. In the upper plot (June
28, 2006, 5:12 am local time) aerosols are visible. Data are fit to the function given
in Eq. 7. The phase function in the lower plot (July 2, 2006, 3:12 am local time) is
consistent with pure Rayleigh scattering. Data are fit to Eq. 7, with B = 0, f = 0,
and g = 0. Error bars for both plots are the standard deviation of the 5 APF events.
In order to apply geometrical corrections when binning the data, the angle at
which the APF light source shoots (γ in Fig. 4) with respect to the FD and the
elevation angle of the shot direction needs to be known. We determined these
values from the data themselves. The elevation angle was determined from a
reconstruction of APF shots with the FD offline reconstruction [16], and γ was
determined from the analysis of APF shots on nights where aerosol scattering
was negligible. The data from these nights were fit to the Rayleigh component
of the phase function, with the position of the minimum (nominally at 90◦
scattering angle) as a free parameter. The fit value of this angle was then used
to deduce the direction which the APF light source shoots relative to the FD
(∼ 24◦ at Coihueco).
4 First Results
We have applied the analysis described in Section 3 to data recorded between
June and December 2006 at the Coihueco site. Since the APF light sources
operate during all nights of FD operation, this data set includes all moonless
nights, with the exception of nights with rain or strong winds when the FDs
remain closed. Fig. 7 shows the distribution of the asymmetry parameter g
(left) and the backscatter parameters f (right). For most nights with aerosol
contamination, the value of g at the experiment site in Malargüe is ∼ 0.6,
with an average of 0.59 and a standard deviation of 0.07 for the data period
analyzed here. Values of g = 0 indicate hours where the measured phase
function can be described by pure Rayleigh scattering, so the aerosol phase
function is effectively negligible. Fig. 7 also shows the asymmetry parameter
as a function of time for the analyzed period. With the limited amount of data
taken so far, no conclusions concerning seasonal variations can be drawn. The
asymmetry parameter appears to be stable during the observed time period.
With more data becoming available over the next few years, we plan to monitor
the month-to-month variation in g and analyze possible correlations with other
weather measurements.
One of the main tasks of the APF, in addition to providing the in situ aerosol
phase function for every hour of FD data taking, is the identification of “clear”
nights with small aerosol contamination. These nights play an important role
in the calibration of other atmospheric monitoring devices such as the Central
Laser Facility (CLF) [6]. On clear nights, the measured phase function can be
described by pure Rayleigh scattering (measurements where this is the case
appear as g = 0 in Fig. 7).
To confirm the reliability of the fit where both the normalization of the Mie
and the Rayleigh contribution are fit parameters, Fig. 8 shows the Rayleigh
normalization factor A for the same data set. One might expect the molecular
asymmetry parameter g
0 0.2 0.4 0.6 0.8 1
backscatter parameter f
-0.2 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6
Jun 2006 Jul 2006 Aug 2006 Sep 2006 Oct 2006 Nov 2006 Dec 2006
Fig. 7. Top: Distribution of the asymmetry parameter g (top left) and the backscatter
parameter f (top right) for all measurements performed between June and December
2006. Values of g = 0 (and f = 0) indicate that the phase function can be described
with pure Rayleigh scattering. Bottom: Asymmetry parameter g as a function of
time.
contribution to be rather stable, and in fact this parameter does not change
much with time.
It is instructive to compare the average asymmetry parameter obtained from
the APF with model expectations and measurements at comparable locations.
Typically, measurements are performed at optical wavelengths and cannot be
directly compared to measurements at UV wavelengths. However, a compi-
lation at different wavelengths from 450 nm to 700 nm [17] shows that the
wavelength dependence of g is small; values at 450 nm are a few percent
larger than at 550 nm.
Rayleigh normalization
4000 6000 8000 10000 12000 14000 16000
Fig. 8. Distribution of the Rayleigh normalization parameter A for all measurements
performed between June and December 2006.
To first order, g = 0.7 is often used as a generic value for g in radiative transfer
models. A smaller value for g is expected at dry locations. A parameterization
of aerosol optical properties by d’Almeida et al. [18] suggests values for g
between 0.64 and 0.83 at 550 nm depending on aerosol type and season, with
higher averages for high relative humidity.
The Pierre Auger Observatory is located east of the Andes in the Pampa
Amarilla, an arid high plateau at 1420 m a.s.l., so values around 0.6 are within
expectations. For comparison, recent measurements carried out in the South-
ern Great Planes of the US [19] yield values for g at 550 nm of 0.60± 0.03 for
dry conditions and 0.65± 0.05 for ambient conditions.
The aerosol phase function most commonly used in fluorescence detector data
analysis, both for the High Resolution Fly’s Eye (HiRes) Experiment [20],
which operated in Utah between 1997 and 2006, and the Pierre Auger FD de-
tectors, is the function obtained from a desert aerosol simulation by Longtin [21].
Longtin’s desert model is based on Mie scattering theory and assumes that the
desert atmosphere has three major components: carbonaceous particles, water-
soluble particles, and sand. For each aerosol component, the model assumes a
characteristic log normal size distribution and refractive index. Longtin per-
formed his calculations for several wavelengths and wind speeds; those made
at 550 nm with a wind speed of 10 m/s most closely match the 300 nm to
400 nm nitrogen fluorescence band observed by the FDs and have therefore
been traditionally used in air fluorescence data analysis.
]° [θscattering angle 
0 20 40 60 80 100 120 140 160 180
Aerosol Phase Function
 = 550 nm, wind = 10 m/s)λLongtin (
This work: f = 0.5, g = 0.7
This work: f = 0.4, g = 0.6
Fig. 9. Comparison of the Longtin aerosol phase function (desert atmosphere simu-
lated with a wind speed of 10 m/s) with the default phase function used in the Auger
atmospheric database (f = 0.5, g = 0.7) and the typical phase function measured by
the APF (f = 0.4, g = 0.6).
Fig. 9 compares the Longtin aerosol phase function at 550 nm to the modi-
fied Henyey-Greenstein function of Eq. 6 with two sets of f and g: f = 0.5
and g = 0.7, the default values used by the Auger atmospheric database; and
f = 0.4 and g = 0.6, the values determined in this study to be more typical
of the detector location. The comparison shows that, on average, the differ-
ence between the Longtin function and the measured phase function is small
for those scattering angles relevant in fluorescence measurements — ∼ 30◦
to 150◦. Only at the largest scattering angles above 160◦ do the phase func-
tions differ notably. This region is outside the current range of validity of our
measurement.
Our primary interest in aerosol scattering is its effect on the air shower recon-
struction, most notably the determination of the shower energy. However, it is
not straightforward to estimate the extent to which the use of measured rather
than averaged values of f and g changes the energy reconstruction, as this
depends strongly on other atmospheric parameters, for example the aerosol
optical depth. Rather than singling out the phase function measurement, we
need to study the effect of the combined measurement of all atmospheric pa-
rameters, a task which is beyond the scope of this paper.
We can, however, get an estimate of the relevance of the phase function mea-
surement by studying its effect on the energies of events that are of particu-
lar importance for the energy calibration of the detector, the “golden hybrid
hist1
Mean   0.6996
RMS    0.7913
) / Eapf - Estd(E
-10 -8 -6 -4 -2 0 2 4 6 8 10
210 hist1
Mean   0.6996
RMS    0.7913
f > 0, g > 0
Mean   0.71
RMS    0.79
f = 0, g = 0
Mean   1.87
RMS    2.34
Weather Conditions
f > 0, g > 0
f = 0, g = 0
Fig. 10. Differences in the energies of golden hybrid events reconstructed with de-
fault phase function values (Estd) and those reconstructed using phase function fit
parameters determined from APF measurements (Eapf). The red (bold) histogram
represents data taken during nights with measurable aerosols; the blue (light) his-
togram depicts events observed on purely molecular nights.
events.” These are events observed by one or more fluorescence detectors and
three or more surface array tanks. For “golden hybrid events” observed by the
Coihueco FD site between June and December 2006, we performed the recon-
struction twice: first, using the default parameters f = 0.5 and g = 0.7 to
estimate aerosol scattering; and second, using the fit parameters determined
from APF measurements. In both cases atmospheric extinction was simu-
lated using an average aerosol profile model representative of the Malargüe
site [22,23].
Fig. 10 depicts the relative differences in energies caused by reconstructing
showers with the default phase function and the measured phase function.
The red (bold) histogram represents data taken during nights with aerosol
contamination (f > 0, g > 0) while the blue (light) histogram represents
data taken during nights where according to the APF analysis scattering is
purely molecular. The correction is typically of order one percent. However,
on those nights when aerosol loading is extremely low, so that atmospheric
scattering may be characterized as purely molecular, the use of the default
scattering parameters causes larger errors in the shower reconstruction. Under
such conditions, the total phase function lacks the strong forward-scattering
component typical of aerosols. During these periods, incorrectly accounting
for aerosol scattering starts to impact the energy calibration of the detector.
A correct determination of the phase function on a regular basis is therefore
an important part of the atmospheric monitoring efforts at the site.
5 Conclusions and Outlook
As part of the atmospheric monitoring program at the Pierre Auger Obser-
vatory, the aerosol phase function at 350 nm is routinely measured at two of
the four FD sites. A first analysis of data taken from June to December 2006
shows that values of g = 〈cos θ〉 ≃ 0.6 for the mean cosine of the scattering
angle θ are typical for aerosols at the site of the experiment. Over the next
several years, the APF light sources will produce a data set of unprecedented
size of the scattering properties of aerosols. This data set will enable us to
carefully study any seasonal change in the aerosol content. The APF light
sources and the other atmospheric monitoring instruments at the Auger site
will accumulate one of the largest sets of continuous measurements in the
300 nm to 400 nm range ever recorded for a single location.
The APF light sources are currently operating at a wavelength of 350 nm only.
In the near future, we will add regular measurements at 330 nm and 390 nm to
study the dependence of the phase function on the wavelength of the scattered
light.
Acknowledgements
We are grateful to the following agencies and organizations for financial sup-
port: The APF light sources were built by a grant from the Department of
Energy (DOE) Office of Science (USA) (DE-FG03-92ER40732). Parts of the
APF analysis were performed during the 2006 REU (Research Experience for
Undergraduates) program at Columbia University’s Nevis Laboratories which
is supported by the National Science Foundation (USA) under contract num-
ber NSF-PHY-0452277.
References
[1] P. Mantsch (for the Pierre Auger Collaboration), The Pierre Auger Observatory
Status and Progress, in Proc. 29th Int. Cosmic Ray Conference, Pune, India
(2005) (astro-ph/0604114); www.auger.org.
[2] J. Abraham et al. (Pierre Auger Collaboration), Astroparticle Phys.
27 (2007) 155.
[3] J. Abraham et al. (Pierre Auger Collaboration), Astroparticle Phys., in press
(2007) (astro-ph/0607382).
http://arxiv.org/abs/astro-ph/0604114
http://arxiv.org/abs/astro-ph/0607382
[4] M. Mostafá (for the Pierre Auger Collaboration), Atmospheric Monitoring
for the Pierre Auger Fluorescence Detector, in Proc. 28th Int. Cosmic Ray
Conference, Tsukuba, Japan (2003), 2 (HE1.3), 465.
[5] S.Y. BenZvi et al., Nucl. Instr. Meth. A, in press (2007) (astro-ph/0609063).
[6] B. Fick et al., Journal of Instrumentation 1 (2006) P11003.
[7] G. Mie, Ann. Phys. 25 (1908) 377.
[8] M.D. Roberts, J. Phys. G: Nucl. Part. Phys 31 (2005) 1291.
[9] J.A.J. Matthews, R. Clay, for the Pierre Auger Collaboration, Atmospheric
Monitoring for the Auger Fluorescence Detector, in Proc. 27th Int. Cosmic Ray
Conference, Hamburg, Germany (2001), 2 (HE 1.8), 745.
[10] J.A.J. Matthews et al., APF Light Sources for the Auger Southern Observatory,
in Proc. 28th Int. Cosmic Ray Conference, Tsukuba, Japan (2003), 2 (HE 1.5),
[11] The Pierre Auger Collaboration, Performance of the Fluorescence Detectors of
the Pierre Auger Observatory, in Proc. 29th Int. Cosmic Ray Conference, Pune,
India (2005) (astro-ph/0508389).
[12] A. Bucholtz, Appl. Opt. 34 (1995) 2765.
[13] L. Henyey and J. Greenstein, Astrophys. Journal 93 (1941) 70.
[14] E.S. Fishburne, M.E. Neer, and G. Sandri, Voice Communication via Scattered
Ultraviolet Radiation, Report 274, Vol. 1, Aeronautical Research Associates of
Princeton, Princeton, NJ (1976).
[15] F. Riewe and A.E.S. Green, Applied Optics 17 (1978) 1923.
[16] S. Argirò et al., submitted to Computer Physics Communications (2007).
[17] M. Fiebig and J.A. Ogren, J. Geophys. Res. 111 (2006) D21204.
[18] G.A. d’Almeida, P. Koepke, and E.P. Shettle, Atmospheric Aerosols: Global
Climatology and Radiative Characteristics, A. Deepak Publishing, Hampton,
Virginia (1991).
[19] E. Andrews et al., J. Geophys. Res. 111 (2006) D05S04.
[20] G.B. Thomson et al. (for the HiRes Collaboration), Nucl. Phys. Proc. Suppl.
136 (2004) 28; www.cosmic-ray.org.
[21] D.R. Longtin, A Wind Dependent Desert Aerosol Model: Radiative Properties,
Air Force Geophysics Laboratories, AFL-TR-88-0112, 1988.
[22] S.Y. BenZvi et al. (for the Pierre Auger Collaboration), Measurement of
Aerosols at the Pierre Auger Observatory, in Proc. 30th Int. Cosmic Ray
Conference, Mérida, México (2007).
http://arxiv.org/abs/astro-ph/0609063
http://arxiv.org/abs/astro-ph/0508389
[23] M. Prouza (for the Pierre Auger Collaboration), Systematic Study of
Atmosphere-Induced Influences and Uncertainties on Shower Reconstruction
at the Pierre Auger Observatory, in Proc. 30th Int. Cosmic Ray Conference,
Mérida, México (2007).
	Introduction
	APF Light Sources
	Detector Buildings, Optics, and Electronics
	APF Signals in the Fluorescence Detectors
	Determination of the Aerosol Phase Function
	First Results
	Conclusions and Outlook
	Acknowledgements
	References
ABSTRACT
  Air fluorescence detectors measure the energy of ultra-high energy cosmic
rays by collecting fluorescence light emitted from nitrogen molecules along the
extensive air shower cascade. To ensure a reliable energy determination, the
light signal needs to be corrected for atmospheric effects, which not only
attenuate the signal, but also produce a non-negligible background component
due to scattered Cherenkov light and multiple-scattered light. The correction
requires regular measurements of the aerosol attenuation length and the aerosol
phase function, defined as the probability of light scattered in a given
direction. At the Pierre Auger Observatory in Malargue, Argentina, the phase
function is measured on an hourly basis using two Aerosol Phase Function (APF)
light sources. These sources direct a UV light beam across the field of view of
the fluorescence detectors; the phase function can be extracted from the image
of the shots in the fluorescence detector cameras. This paper describes the
design, current status, standard operation procedure, and performance of the
APF system at the Pierre Auger Observatory.

<|endoftext|><|startoftext|>
Introduction
Throughout history we have used concepts from our current technology as
metaphors to describe our world. Examples of this are the description of the
body as a factory during the Industrial Age, and the description of the brain as a
computer during the Information Age. These metaphors are useful because they
extend the knowledge acquired by the scientific and technological developments
to other areas, illuminating them from a novel perspective. For example, it is
common to extend the particle metaphor used in physics to other domains, such
as crowd dynamics [27]. Even when people are not particles and have very com-
plicated behaviour, for the purposes of crowd dynamics they can be effectively
described as particles, with the benefit that there is an established mathemati-
cal framework suitable for this description. Another example can be seen with
cybernetics [4, 28], where the system metaphor is used: everything is seen as a
system with inputs, outputs, and a control that regulates the internal variables
of the system under the influence of perturbations from its environment. Yet
another example can be seen with the computational metaphor [60], where the
universe can be modelled with simple discrete computational machines, such as
cellular automata or Turing machines.
Having in mind that we are using metaphors, this paper proposes to extend
the concept of information to describe the world: from elementary particles to
galaxies, with everything in between, particularly life and cognition. There is no
suggestion on the nature of reality as information [58]. This work only explores
the advantages of describing the world as information. In other words, there are
no ontological claims, only epistemological.
In the next section, the motivation of the paper is presented, followed by a
section describing the notion of information to be used throughout the paper. In
Section 4, eight tentative laws of information are put forward. These are applied
to the notions of life (Section 5) and cognition (Section 6). The paper closes
presenting future work and conclusions.
2 Why Information?
There is a great interest in the relationship between energy, matter, and infor-
mation [32, 54, 43]. One of the main reasons for this arises because this rela-
tionship plays a central role in the definition of life: Hopfield [30] suggests that
the difference between biological and physical systems is given by the meaning-
ful information content of the former ones. Not that information is not present
in physical systems, but—as Roederer puts it—information is passive in physics
and active in biology [49]. However, it becomes complicated to describe how this
information came to be in terms of the physical laws of matter and energy. In
this paper the inverse approach is proposed: let us describe matter and energy in
terms of information. If atoms, molecules and cells are described as information,
there is no need of a qualitative shift (from non-living to living matter) while
describing the origin and evolution of life: this is translated into a quantitative
shift (from less complex to more complex information).
There is a similar problem when we study the origin and evolution of cog-
nition [20]: it is not easy to describe cognitive systems in terms of matter and
energy. The drawback with the physics-based approach to the studies of life and
cognition is that it requires a new category, that in the best situations can be
referred to as “emergent”. Emergence is a useful concept, but it this case it is
not explanatory. Moreover, it stealthily introduces a dualist view of the world:
if we cannot relate properly matter and energy with life and cognition, we are
forced to see these as separate categories. Once this breach is made, there is
no clear way of studying or understanding how systems with life and cognition
evolved from those without it. If we see matter and energy as particular, simple
cases of information, the dualist trap is avoided by following a continuum in the
evolution of the universe. Physical laws are suitable for describing phenomena
at the physical scale. The tentative laws of information presented below aim at
being suitable for describing phenomena at any scale. Certainly, there are other
approaches to describe phenomena at multiple scales, such as general systems
theory and dynamical systems theory. These approaches are not exclusive, since
one can use several of them, including information, to describe different aspects
of the same phenomena.
Another benefit of using information as a basic descriptor for our world is
that the concept is well studied and formal methods have already been developed
[14, 46], as well as its philosophical implications have been discussed [19]. Thus,
there is no need to develop a new formalism, since information theory is well
established. I borrow this formalism and interpret it in a new way.
Finally, information can be used to describe other formalisms: not only par-
ticles and waves, but also systems, networks, agents, automata, and computers
can be seen as information. In other words, it can contain other descriptions
of the world, potentially exploiting their own formalisms. Information is an
inclusive formalism.
3 What Is Information?
Extending the notion of Umwelt [57], the following notion of information can be
given:
Notion 1 Information is anything that an agent can sense, perceive, or observe.
This notion is in accordance with Shannon’s [52], where information is seen
as a just-so arrangement, a defined structure, as opposed to randomness [12, 13],
and it can be measured in bits. This notion can be applied to everything that
surrounds us, including matter and energy, since we can perceive it—because it
has a defined structure—and we are agents, according to the following notion:
Notion 2 An agent is a description of an entity that acts on its environment
[22, p. 39].
Noticing that agents (and their environments) are also information (as they
can be perceived by other agents, especially us, who are the ones who describe
them as agents), an agent can be a human, a cell, a molecule, a computer
program, a society, an electron, a city, a market, an institution, an atom, or a
star. Each of these can be described (by us) as acting in their environment,
simply because they interact with it. However, not all information is an agent,
e.g. temperature, color, velocity, hunger, profit.
Notion 3 The environment of an agent consists of all the information interact-
ing with it.
Information will be relative to the agent perceiving it2. Information can exist
in theory “out there”, independently of an agent, but for practical purposes, it
can be only spoken about once an agent—not necessarily a human—perceives
/ interacts with it. The meaning of the information will be given by the use
the agent perceiving it makes of it [59], i.e. how the agent responds to it [7].
Thus, Notion 1 is a pragmatic one. Note that perceived information is different
from the meaning that an agent gives to it. Meaning is an active product of the
interaction between information and the agent perceiving it [13, 44].
Like this, an electron can be seen as an agent, which perceives other electrons
as information. The same description can be used for molecules, cells, and
animals. We can distinguish:
First order information is that which is perceived directly by an agent. For
example, the information received by a molecule about another molecule
Second order information is that which is perceived by an agent about in-
formation perceived by another agent. For example, the information per-
ceived by a human observer about a molecule receiving information about
another molecule.
Most of the scientific descriptions about the world are second order informa-
tion, as we perceive how agents perceive and produce information. The present
approach also introduces naturally the role of the observer in science, since ev-
erything is “observing” the (limited, first order) information it interacts with
from its own perspective. Humans would be second-level observers, observing
the information observed by information. Everything we can speak about is
observed, and all agents are observers.
Information is not necessarily conserved, i.e. it can be created, destroyed, or
transformed. These can take place only through interaction. Computation can
be seen as the change in information, be it creation, destruction, or transfor-
mation. Matter and energy can be seen as particular types of information that
cannot be created or destroyed, only transformed, along with the well-known
properties that characterize them.
2Shannon’s information [52] deals only with the technical aspect of the transmission of
information and not with its meaning, i.e. it neglects the semantic aspect of communication.
The amount of information required to describe a process, system, object,
or agent determines its complexity [46]. According to our current knowledge,
during the evolution of our universe there has been a shift from simple informa-
tion towards more complex information [2] (the information of an atom is less
complex than that of a molecule, than that of a cell, than that of a multicellular
organism, etc.). This “arrow of complexity”[11] in evolution can guide us to
explore general laws of information.
4 Tentative Laws of Information
Seeing the world as information allows us to describe general laws that can be
applied to everything we can perceive. Extending Darwin’s theory [15], the
present framework can be used to reframe “universal Darwinism” [17], which
explores the idea of evolution beyond biological systems. In this work, the laws
that describe the general behaviour of information as it evolves are introduced.
These laws are only tentative, in the sense that they are only presented with
arguments in favour of them, but they still need to be thoroughly tested.
4.1 Law of Information Transformation
Since information is relative to the agents perceiving it, information will poten-
tially be transformed as different agents perceive it. Another way of stating this
law is the following: information will potentially be transformed by interacting
with other information. This law is a generalization of the Darwinian principle
of random variation, and ensures novelty of information in the world. Even
when there might be static information, different agents can perceive it differ-
ently and interact with it, potentially transforming it. Through evolution, the
transformation of information generates a variety or diversity that can be used
by agents for novel purposes.
Since information is not a conserved quantity, it can increase (created), de-
crease (destroyed), or be maintained as it is transformed.
As an example, RNA polymerase (RNAP) can make errors while copying
DNA onto RNA strands. This slight random variation can lead to changes
in the proteins for which the RNA strands serve as templates. Some of these
changes will lead to novel proteins that might improve or worsen the function of
the original proteins.
The transformation of information can be classified as follows:
Dynamic. Information changes itself. This could be considered as “objective,
internal” change.
Static. The agent perceiving the information changes, but the information itself
does not change. There is a dynamic change but in the agent. This could
be considered as “subjective, internal” change.
Active. An agent changes information in its environment. This could be con-
sidered as an “objective, external” change.
Stigmergic. An agent makes an active change of information, which changes
the perception of that information by another agent. This could be con-
sidered as “subjective, external” or “intersubjective” change.
4.2 Law of Information Propagation
Information propagates as fast as possible. Certainly, only some information
manages to propagate. In other words, we can assume that different informa-
tion has a different “ability” to propagate, also depending on its environment.
The “fitter” information, i.e. that which manages to persist and propagate faster
and more effectively, will prevail over other information. This law generalizes
the Darwinian principle of natural selection, the maximum entropy production
principle [37] (entropy can also be described as information), and Kauffman’s
tentative fourth law of thermodynamics3. It is interesting that this law contains
the second law of thermodynamics, as atoms interact, propagating informa-
tion homogeneously. It also describes living organisms, where genetic informa-
tion is propagated across generations. And it also describes cultural evolution,
where information is propagated among individuals. Life is “far from thermo-
dynamic equilibrium” because it constrains [32] the (more simple) information
propagation at the thermodynamic scale, i.e. the increase of entropy, exploiting
structures to propagate (or maintain) the (more complex) information at the
biological scale.
In relation with the law of information transformation, as information re-
quires agents to perceive it, information will be potentially transformed. This
source of novelty will allow for the “blind” exploration of better ways of propa-
gating information, according to the agents perceiving it and their environments.
Extending the previous example, if errors in transcription made by RNAP are
beneficial for its propagation (which entails the propagation of the cell producing
RNAP), cells with such novel proteins will have better chances of survival than
their “cousins” without transcription errors.
The propagation of information can be classified as follows:
Autonomous. Information propagates by itself. Strictly speaking, this is not
possible, since at least some information is determined by the environment.
However, if more information is produced by itself than by its environment,
we can call this autonomous propagation (See Section 5).
Symbiotic. Different information cooperates, helping to propagate each other.
Parasitic. Information exploits other information for its own propagation.
Altruistic. Information promotes the propagation of other information at the
cost of its own propagation.
3“The workspace of the biosphere expands, on average, as fast as it can in this coconstruct-
ing biosphere” [32, p. 209]
4.3 Law of Requisite Complexity
Taking into account the law of information transformation, transformed infor-
mation can increase, decrease, or maintain its previous complexity, i.e. amount
[46]. However, more complex information will require more complex agents to
perceive, act on, and propagate it. This law generalizes the cybernetic law of
requisite variety [4]. Note that simple agents can perceive and interact with
part of complex information, but they cannot (by themselves) propagate it. An
agent cannot perceive (and thus contain) information more complex than itself.
For simple agents, information that is complex for us will be simple as well.
As stated above, different agents can perceive the same information in different
ways, giving it different meanings.
The so called “arrow of complexity” in evolution [11] can be explained with
this law. If we start with simple information, its transformation will produce by
simple drift [39, 41] increases in the complexity of information, without any goal
or purpose. This occurs simply because there is an open niche for information to
become more complex as it varies. But this also promotes agents to become more
complex to exploit novel (complex) information and propagate it. Evolution does
not need to favour complexity in any way: information just propagates to every
possible niche as fast as possible, and it seems that there is often an “adjacent
possible” [32] niche of greater complexity.
For example, it can be said that a protein (as an agent) perceives some
information via its binding sites, as it recognizes molecules that “fit” a site. More
complex molecules will certainly need more complex binding sites. Whether
complex molecules are better or worse is a different matter: some will be better,
some will be worse. But for those which are better, the complexity of the
proteins must match the complexity of the molecules perceived. If the binding
site perceives only a part of the molecule, then this might be confused with
other molecules which share the perceived part. Following the law of information
transformation, there will be a variety of complexities of information. The law
of requisite complexity just states that the increase in complexity of information
is determined by the ability of agents to perceive, act on, and propagate more
complex information.
Since more complex information will be able to produce more variety, the
speed of the complexity increase will escalate together with the complexity of
the information.
4.4 Law of Information Criticality
Transforming and propagating information will tend to a critical balance be-
tween its stability and its variability. Propagating information maintains itself
as much as possible, but transforming information varies it as much as possi-
ble. This struggle leads to a critical balance analogous to the “edge of chaos”
[36, 31], self-organized criticality [8, 1], and the “complexity from noise” princi-
ple [6]. The homeostasis of living systems can also be seen as the self-regulation
of information criticality.
This law can generalize Kauffman’s four candidate laws for the coconstruc-
tion of a biosphere [32, Ch. 8]. Their relationship with this framework demands
further discussion, which is out of the scope of this paper.
A well known example can be seen with cellular automata [36] and random
Boolean networks [31, 21, 23]: stable (ordered) dynamics limit considerably or
do not allow change of states so information cannot propagate, while variable
(chaotic) dynamics change the states too much, losing information. Following
the law of information propagation, information will tend to a critical state
between stability and variability to maximize its propagation: if it is too stable,
it will not propagate, and if it is too variable, it will be transformed. In other
words, “critical” information will be able to propagate better than stable or
variable one, i.e. as fast as possible (cf. law of information propagation).
4.5 Law of Information Organization
Information produces constraints that regulate information production. These
constraints can be seen as organization [32]. In other words, evolving information
will be organized (by transformation and propagation) to regulate information
production. According to the law of information criticality, this organization
will lie at a critical area between stability and variability. And following the
law of information propagation, the organization of information will enable it to
propagate as fast as possible.
This law can also be seen as information having a certain control over its
environment, since the organization of information will help it withstand pertur-
bations. It has been shown [33, 47, 34] that using this idea as a fitness function
can lead to the evolution of robust and adaptive agents, namely maximizing the
mutual information between sensors and environment.
A clear example of information producing its own organization can be seen
with living systems, which are discussed in Section 5.
4.6 Law of Information Self-organization
Information tends to its preferred, most probable state. This is actually a tautol-
ogy, since observers determine probabilities after observing tendencies of infor-
mation dynamics. Still, this tautology can be useful to describe and understand
phenomena. This law lies at the heart of probability theory and dynamical sys-
tems theory [5]. The dynamics of a system tend to a subset of its state space,
i.e. attractors, depending on its history. This simple fact reduces the possibility
space of information, i.e. a system will tend towards a small subset of all pos-
sible states. If we describe attractors as “organized”, then we can describe the
dynamics of information in terms of self-organization [25].
Pattern formation can be described as information self-organizing, and re-
lated to the law of information propagation. Information will self-organize in
“fit” patterns that are the most probable (defined a posteriori).
Understanding different ways in which self-organization is achieved by trans-
forming information can help us understand better natural phenomena [24] and
design artificial systems [22]. For example, random Boolean networks can be
said to self-organize towards their attractors [23].
4.7 Law of Information Potentiality
An agent can give different potential meanings to information. This implies
that the same information can have different meanings. Moreover, meaning—
while being information—can be independent of the information carrying it, i.e.
depend only on the agent observing it. Thus, different information can have the
same potential meaning. The precise meaning of information will be given by
an agent observing it within a specific context.
The potentiality of information allows the effective communication between
agents. Different information has to be able to acquire the same meaning
(homonymy), while the same information has to be able to acquire different
meanings (polysemy) [44]. The relationship between the laws of information
and communication is clear, but beyond the scope of this paper.
The law of information potentiality is related to a passive information trans-
formation, i.e. a change in the agent observing information.
In spite of information potentiality, not all meanings will be suitable for
all information. In other words, pure subjectivism cannot dictate meanings of
information. By the law of information propagation, some meanings will be
more suitable than others and will propagate. The suitability of meanings will
be determined by their use and context [59]. However, there is always a certain
freedom to subjectively transform information.
For example, a photon can be observed as a particle, as a wave, or as a
particle-wave. The suitability of each given meaning is determined by the context
in which the photon is described/observed.
4.8 Law of Information Perception
The meaning of information is unique for an agent perceiving it in unique, always
changing open contexts. If meaning of information is determined by the use an
agent makes of it, which is embedded in an open environment, we can go to
such a level of detail that the meaning will be unique. Certainly, agents make
generalizations and abstractions of perceptions in order to be able to respond to
novel information. Still, the precise situation and context will never be repeated.
This makes perceived information unique. The implication of this is that the
response to any given information might be “unexpected”, i.e. novelty can
arise. Moreover, the meaning of information can be to a certain extent arbitrary.
This is related with the law of information transformation, as the uniqueness
of meaning allows the same information perceived differently by the same or
different agents to be statically transformed.
This law is a generalization of the first law of human perception: “whatever
is perceived can be perceived only from a uniquely situated place in the overall
structure of points of view” [29, p. xxiv] (cited in [44, p. 250]). We can describe
agents perceiving information as filtering it. An advantage of humans and other
agents is that we can choose which filter to use to perceive. The suggestion
is not that “unpleasant” information should be solipsistically ignored, but that
information can be potentially actively transformed.
For example, T lymphocytes in an immune system can perceive foreign agents
and attack them. Even when the response will be similar for similar foreign
agents, each perception will be unique, a situation that always leaves space for
novelty.
Scales of perception
Different information is perceived at different scales of observation [9]. As the
scale tends to zero, then the information tends to infinite. For lower scales, more
information and details are perceived. The uniqueness of information perception
dominates at these very low (spatial and temporal) scales. However, as gener-
alizations are made, information is “compressed”, i.e. only relevant aspects of
information are perceived4. At higher scales, more abstractions and general-
izations are made, i.e. less information is perceived. When the scale tends to
infinite, the information tends to zero. In other words, no information is needed
to describe all of the universe, because all the information is already there. This
most abstract understanding of the world is in line with the “highest view” of
Vajrayana Buddhism [45]. Implications at this level of description cannot be
right or wrong, because there is no context. Everything is contained, but no
information is needed to describe it, since it is already there. This “maximum”
understanding is also described as vacuity, which leads to bliss [45, p. 42].
Following the law of information criticality, agents will tend to a balance
where the perceived information is minimal but maximally predictive [51] (at a
particular scale): few information is cheaper, but more information in general
entails a more precise predictability. The law of requisite complexity applies at
particular scales, since a change of scale will imply a change of complexity of
information [9].
5 On the Notion of Life
There is no agreed notion of life, which reflects the difficulty of defining the
concept. Still, many researchers have put forward properties that characterize
important aspects of life. Autopoiesis is perhaps the most salient one, which
notes that living systems are self-producing [55, 38]. Still, it has been argued
that autopoiesis is a necessary but not sufficient property for life [50]. The
relevance of autonomy [10, 42, 35] and individuality [40, 35] for life have also
been highlighted .
These approaches are not unproblematic, since no living system is completely
autonomous. This follows from the fact that all living systems are open. For
4The relevance is determined by the context, i.e. different aspects will be relevant for
different contexts.
example, we have some degree of autonomy, but we are still dependent on food,
water, oxygen, sunlight, bacteria living in our gut, etc. This does not mean that
we should abandon the notion of autonomy in life. However, we need to abandon
the sharp distinction between life and non-life [11, 35], as different degrees of
autonomy escalate gradually, from the systems we considered as non-living to
the ones we consider as living. In other words, life has to be a fuzzy concept.
Under the present framework, living and non-living systems are information.
Rather than a yes/no definition, we can speak about a “life ratio”:
Notion 4 The ratio of living information is the information produced by itself
over the information produced by its environment.
Being more specific—since all systems also receive information—a system
with a high life ratio produces more (first order) information about itself than
the one it receives from its environment. Following the law of information orga-
nization, this also implies that living information produces more of its own con-
straints (organization) to regulate itself than the ones produced by its environ-
ment, and thus it has a greater autonomy. All information will have constraints
from other (environmental) information, but we can measure (as second-order
information) the proportion of internal over external constraints to obtain the
life ratio. If this is greater than one, then the information regulates by itself more
than the proportion that is regulated by external information. In the opposite
case, the life ratio would be less than one.
Following the law of information propagation, evolution will tend to informa-
tion with higher life ratios, simply because this can propagate better, as it has
more “control” and autonomy over its environment. When information depends
more on its environment for its propagation, it has a higher probability of being
transformed as it interacts with its environment.
Note that the life ratio depends on spatial and temporal scales at which
information is perceived. For example, for some microorganisms observed at a
scale of years , the life ratio would be less than one, but if observed at a scale of
seconds, the life ration would be greater than one.
Certainly, some artificial systems would be considered as living under this
notion. However, we can make a distinction between living systems embodied
in or composed by biological cells [16], i.e. life as we know it, and the rest, i.e.
life as it could be. The latter ones are precisely those explored by artificial life.
6 On the Notion of Cognition
Cognition is certainly related with life [53]. The term has taken different mean-
ings in different contexts, but all of them can be generalized into a common
notion [20]. Cognition comes from the Latin cognoscere, which means “get to
know”. Like this,
Notion 5 A system is cognitive if it knows something [20, p.135].
From Notion 2, all agents are cognitive, since they “know” how to act on
their environment, giving (first order) meaning to their environmental informa-
tion. Thus, there is no boundary between non-cognitive and cognitive systems.
Throughout evolution, however, there has been a gradual increase in the com-
plexity of cognition [20]. This is because all agents can be described as possessing
some form of cognition, i.e. “knowledge” about the (first-order) information they
perceive5.
Following the law of requisite complexity, evolution leads to more complex
agents, to be able to cope with the complexity of their environment. This is
precisely what triggers the (second-order) increase in the complexity of cognition
we observe.
Certainly, there are different types of cognition6. We can say that a rock
“knows” about gravity because it perceives its information, which has an effect
on it, but it cannot react to this information. Throughout evolution, infor-
mation capable of maintaining its integrity has prevailed over that which was
not. Robust information is that which can resist perturbations to maintain its
integrity. The ability to react to face perturbations to maintain information
makes information adaptive, increasing its probability of maintenance. When
this reaction is made before it occurs, the information is anticipative7. As in-
formation becomes more complex (even if only by information transformation),
the mechanisms for maintaining this information also become more complex, as
stated by the law of requisite complexity. This has led gradually to the advanced
cognition that animals and machines posses.
7 Future Work
The ideas presented here still need to be explored and elaborated further. One
way of doing this would be with a simulation-based method. Being inspired by
ǫ-machines [51, 26], one could start with “simple” agents that are able to per-
ceive and produce information, but cannot control their own production. These
would be let to evolve, measuring if complexity increases as they evolve. The
hypothesis is that complexity would increase (under which conditions still re-
mains to be seen), to a point where “ǫ-agents” will be able to produce themselves
depending more on their own information than that of the environment. This
would be similar to the evolution in Tierra [48] or Avida [3] systems, only that
self-replication would not be inbuilt. The tentative laws of information presented
in Section 4 would be better defined if such a system was studied.
One important aspect that remains to be studied is the representation of
5One could argue that, since agency (and thus cognition) is already assumed in all agents,
this approach is not explanatory. But I am not trying to explain the “origins” of agency, since
I assume it to be there from the start. I believe that we can only study the evolution and
complexification of agency and cognition, not their “origins”.
6For example, human, animal, plant, bacterial, immune, biological, adaptive, systemic, and
artificial [20].
7For a more detailed treatment on robustness, adaptation, and anticipation, see [22]
thermodynamics in terms of information. This is because the ability to per-
form thermodynamic work is a characteristic property of biological systems [32].
This work can be used to generate the organization necessary to sustain life
(cf. law of information organization). It is difficult to describe life in terms
of thermodynamics, since it entails new characteristic properties not present in
thermodynamic systems. But if we see the latter ones as information, it will be
easier to describe how life—also described as information—evolves from them,
as information propagates itself at different scales.
A potential application of this framework would be in economy, considering
capital, goods, and resources as information (a non-conserved quantity) [18].
A similar benefit (of non-conservation) could be given in game theory: if the
payoff of games is given in terms of information (not necessarily conserved), non-
zero sum games could be easier to grasp than if the payoff is given in material
(conserved) goods.
It becomes clear that information (object), the agent perceiving it (subject)
and the meaning-making or transformation of information (action) are deeply
interrelated. They are part of the same totality, since one cannot exist without
the others. This is also in line with Buddhist philosophy. The implications of an
informational description of the world for philosophy have also to be addressed,
since some schools have focussed on partial aspects of the object-subject-action
trichotomy. Another potential application of the laws of information would be
in ethics, where value can be described accordingly to the present framework.
8 Conclusions
This paper introduced general ideas that require further development, extension
and grounding in particular disciplines. Still, a first step is always necessary, and
hopefully feedback from the community will guide the following steps of this line
of research.
Different metaphors for describing the world can be seen as different lan-
guages: they can refer to the same objects without changing them. And each
can be more suitable for a particular context. For example, English has several
advantages for fast learning, German for philosophy, Spanish for narrative, and
Russian for poetry. In other words, there is no “best” language outside a par-
ticular context. In a similar way, I am not suggesting that describing the world
as information is more suitable than physics to describe physical phenomena,
or better than chemistry to describe chemical phenomena. It would be redun-
dant to describe particles as information if we are studying only particles. The
suggested approach is meant only for the cases when the physical approach is
not sufficient, i.e. across scales, constituting an alternative worth exploring to
describe evolution.
It seems easier to describe matter and energy in terms of information than
vice versa. Moreover, information could be used as a common language across
scientific disciplines [56].
Acknowledgements
I should like to thank Irun Cohen, Inman Harvey, Francis Heylighen, David
Krakauer, Antonio del Rı́o, Marko Rodriguez, David Rosenblueth, Stanley
Salthe, Mikhail Prokopenko, Clément Vidal, and Héctor Zenil for their useful
comments and suggestions.
Bibliography
[1] Adami, Christoph, “Self-organized criticality in living systems”, Phys. Lett. A 203
(1995), 29–32.
[2] Adami, Christoph, “What is complexity?”, Bioessays 24, 12 (December 2002), 1085–
1094.
[3] Adami, Chris, and C. Titus Brown, “Evolutionary learning in the 2d artificial life system
”Avida””, Proc. Artificial Life IV (R. Brooks and P. Maes eds.), MIT Press (1994),
377–381.
[4] Ashby, W. Ross, An Introduction to Cybernetics, Chapman & Hall London (1956).
[5] Ashby, W. Ross, “Principles of the self-organizing system”, Principles of Self-
Organization (Oxford, ) (H. V. Foerster and G. W. Zopf, Jr. eds.), Pergamon (1962),
255–278.
[6] Atlan, H., “On a formal definition of organization”, J Theor Biol 45, 2 (June 1974),
295–304.
[7] Atlan, Henri, and Irun R. Cohen, “Immune information, self-organization and mean-
ing”, Int. Immunol. 10, 6 (1998), 711—717.
[8] Bak, Per, Chao Tang, and KurtWiesenfeld, “Self-organized criticality: An explanation
of the 1/f noise”, Phys. Rev. Lett. 59, 4 (July 1987), 381–384.
[9] Bar-Yam, Y., “Multiscale variety in complex systems”, Complexity 9, 4 (2004), 37–45.
[10] Barandarian, Xabier, “Behavioral adaptive autonomy. a milestone in the ALife route
to AI?”, Artificial Life IX Proceedings of the Ninth International Conference on the
Simulation and Synthesis of Living Systems (J. Pollack, M. Bedau, P. Husbands,
T. Ikegami, and R. A. Watson eds.), MIT Press (2004), 514–521.
[11] Bedau, Mark A., “Four puzzles about life”, Artificial Life 4 (1998), 125–140.
[12] Cohen, Irun R., Tending Adam’s Garden: Evolving the Cognitive Immune Self, Academic
Press London (2000).
[13] Cohen, Irun R., “Informational landscapes in art, science, and evolution”, Bulletin of
Mathematical Biology 68, 5 (July 2006), 1213–1229.
[14] Cover, Thomas M., and Joy A. Thomas, Elements of Information Theory, Wiley-
Interscience (July 2006).
[15] Darwin, Charles, The Origin of Species, Wordsworth (1998).
[16] De Duve, Christian, Live Evolving: Molecules, Mind, and Meaning, Oxford University
Press (2003).
[17] Dennett, Daniel, Darwin’s Dangerous Idea, Simon & Schuster (1995).
[18] Farmer, J. Doyne, and Neda Zamani, “Mechanical vs. informational components of price
impact”, EPJ B 55, 2 (2007), 189–200.
[19] Floridi, Luciano ed., The Blackwell Guide to Philosophy of Computing and Information,
Blackwell (2003).
[20] Gershenson, Carlos, “Cognitive paradigms: Which one is the best?”, Cognitive Systems
Research 5, 2 (June 2004), 135–156.
[21] Gershenson, Carlos, “Introduction to random Boolean networks”, Workshop and Tu-
torial Proceedings, Ninth International Conference on the Simulation and Synthesis of
Living Systems (ALife IX) (Boston, MA, ) (M. Bedau, P. Husbands, T. Hutton, S. Ku-
mar, and H. Suzuki eds.), (2004), 160–173.
[22] Gershenson, Carlos, Design and Control of Self-organizing Systems, CopIt Arxives
Mexico (2007), http://tinyurl.com/DCSOS2007.
[23] Gershenson, Carlos, “Guiding the self-organization of random Boolean networks”, The-
ory in Biosciences (In Press).
[24] Gershenson, Carlos, “The sigma profile: A formal tool to study organization and its
evolution at multiple scales”, Complexity (In Press).
[25] Gershenson, Carlos, and Francis Heylighen, “When can we call a system self-
organizing?”, Advances in Artificial Life, 7th European Conference, ECAL 2003 LNAI
2801 (Berlin, ) (W. Banzhaf, T. Christaller, P. Dittrich, J. T. Kim, and J. Ziegler
eds.), Springer (2003), 606–614.
[26] Görnerup, Olof, and James P. Crutchfield, “Hierarchical self-organization in the
finitary process soup”, Artificial Life In Press (2008), Special Issue on the Evolution of
Complexity.
[27] Helbing, Dirk, and Tamás Vicsek, “Optimal self-organization”, New Journal of Physics
1 (1999), 13.1–13.17.
[28] Heylighen, Francis, and Cliff Joslyn, “Cybernetics and second order cybernetics”,
Encyclopedia of Physical Science and Technology, (R. A. Meyers ed.) 3rd ed. vol. 4.
Academic Press New York (2001), pp. 155–170.
[29] Holquist, M., “Introduction”, Art and Answerability, (M. M. Bakhtin ed.). University
of Texas Press Austin (1990).
[30] Hopfield, J. J., “Physics, computation, and why biology looks so different”, Journal of
Theoretical Biology 171 (1994), 53–60.
[31] Kauffman, S. A., The Origins of Order, Oxford University Press (1993).
[32] Kauffman, Stuart A., Investigations, Oxford University Press (2000).
[33] Klyubin, Alexander S., Daniel Polani, and Chrystopher L. Nehaniv, “Organization of
the information flow in the perception-action loop of evolved agents perception-action loop
of evolved agents”, Proceedings of 2004 NASA/DoD Conference on Evolvable Hardware
(R. S. Zebulum, D. Gwaltney, G. Hornby, D. Keymeulen, J. Lohn, and A. Stoica
eds.), IEEE Computer Society (2004), 177–180.
[34] Klyubin, Alexander S., Daniel Polani, and Chrystopher L. Nehaniv, “Representations
of space and time in the maximization of information flow in the perception-action loop”,
Neural Computation 19 (2007), 2387–2432.
[35] Krakauer, D. C., and P. M. A. Zanotto, “Viral individuality and limitations of the
life concept”, Protocells: Bridging Nonliving and Living Matter, (S. Rasmussen, M. A.
Bedau, L. Chen, D. Deamer, D. C. Krakauer, N. Packard, and D. P. Stadler eds.).
MIT Press (2007).
[36] Langton, Christpher, “Computation at the edge of chaos: Phase transitions and emer-
gent computation”, Physica D 42 (1990), 12–37.
[37] Martyushev, L. M., and V. D. Seleznev, “Maximum entropy production principle in
physics, chemistry and biology”, Physics Reports 426, 1 (April 2006), 1–45.
[38] McMullin, Barry, “30 years of computational autopoiesis: A review”, Artificial Life 10,
3 (Summer 2004), 277–295.
[39] McShea, Daniel W., “Metazoan complexity and evolution: Is there a trend?”, Evolution
50 (1996), 477–492.
[40] Michod, Richard E., Darwinian Dynamics: Evolutionary Transitions in Fitness and
Individuality, Princeton University Press Princeton, NJ (2000).
[41] Miconi, T, “Evolution and complexity: the double-edged sword”, Artificial Life 14, 3
(Summer 2008), 325–344, Special Issue on the Evolution of Complexity.
[42] Moreno, Alvaro, and Kepa Ruiz-Mirazo, “The maintenance and open-ended growth of
complexity in nature: information as a decoupling mechanism in the origins of life”, Re-
framing Complexity: Perspectives from the North and South, (F. Capra, A. Juarrero,
P. Sotolongo, and J. van Uden eds.). ISCE Publishing (2006).
[43] Morowitz, Harold, and D. Eric Smith, “Energy flow and the organization of life”, Tech.
Rep. no. 06-08-029, Santa Fe Institute, (2006).
[44] Neuman, Yair, Reviving the Living: Meaning Making in Living Systems vol. 6 of Studies
in Multidisciplinarity, Elsevier Amsterdam (2008).
[45] Nydahl, Ole, The Way Things Are: A living Approach to Buddhism for today’s world.,
O Books (2008).
[46] Prokopenko, Mikhail, Fabio Boschetti, and Alex J. Ryan, “An information-theoretic
primer on complexity, self-organisation and emergence”, Complexity 15, 1 (2009), 11–28.
[47] Prokopenko, M., V. Gerasimov, and I. Tanev, “Evolving spatiotemporal coordination
in a modular robotic system”, From Animals to Animats 9: 9th International Conference
on the Simulation of Adaptive Behavior (SAB 2006) (S. Nolfi, G. Baldassarre, R. Cal-
abretta, J. C. T. Hallam, D. Marocco, J.-A. Meyer, O. Miglino, and D. Parisi
eds.), vol. 4095 of Lecture Notes in Computer Science, Springer (2006), 558–569.
[48] Ray, T. S., “An approach to the synthesis of life”, Artificial Life II, (C. Langton,
C. Taylor, J. D. Farmer, and S. Rasmussen eds.) vol. XI of Santa Fe Institute Studies
in the Sciences of Complexity. Addison-Wesley Redwood City, CA (1991), pp. 371–408.
[49] Roederer, Juan G., Information and its Role in Nature, Springer-Verlag Heidelberg
(May 2005).
[50] Ruiz-Mirazo, Kepa, and Alvaro Moreno, “Basic autonomy as a fundamnental step in
the synthesis of life”, Artificial Life 10, 3 (Summer 2004), 235–259.
[51] Shalizi, Cosma R., Causal Architecture, Complexity and Self-Organization in Time
Series and Cellular Automata, PhD thesis University of Wisconsin at Madison (2001).
[52] Shannon, C. E., “A mathematical theory of communication”, Bell System Technical
Journal 27 (July and October 1948), 379–423 and 623–656.
[53] Stewart, John, “Cognition = life : Implications for higher-level cognition”, Behavioural
processes 35 (1995), 311–326.
[54] Umpleby, Stuart, “Physical relationships among matter, energy and information”, Cy-
bernetics and Systems 2004 (Vienna, ) (R. Trappl ed.), vol. 1, Austrian Society for
Cybernetic Studies, (2004), 124–6.
[55] Varela, Francisco J., Humberto R. Maturana, and R. Uribe., “Autopoiesis: The
organization of living systems, its characterization and a model”, BioSystems 5 (1974),
187–196.
[56] von Baeyer, Hans Christian, Information: The New Language of Science, Harvard
University Press Cambridge, MA (2004).
[57] von Uexküll, Jakob, “A stroll through the worlds of animals and men”, Instinctive
Behavior: The Development of a Modern Concept, (C. H. Schiller ed.). International
Universities Press New York (1957), pp. 5–80.
[58] Wheeler, John Archibald, “Information, physics, quantum: the search for links”, Com-
plexity, Entropy, and the Physics of Information, (W. H. Zurek ed.) vol. VIII of Santa
Fe Institute Studies in the Sciences of Complexity. Perseus Books Reading, MA (1990).
[59] Wittgenstein, Ludwig, Philosophical Investigations 3rd ed., Prentice Hall (1999).
[60] Wolfram, Stephen, A New Kind of Sciene, Wolfram Media (2002).
	1 The World as Evolving Information
	1 Introduction
	2 Why Information?
	3 What Is Information?
	4 Tentative Laws of Information
	4.1 Law of Information Transformation
	4.2 Law of Information Propagation
	4.3 Law of Requisite Complexity
	4.4 Law of Information Criticality
	4.5 Law of Information Organization
	4.6 Law of Information Self-organization
	4.7 Law of Information Potentiality
	4.8 Law of Information Perception
	5 On the Notion of Life
	6 On the Notion of Cognition
	7 Future Work
	8 Conclusions
ABSTRACT
  This paper discusses the benefits of describing the world as information,
especially in the study of the evolution of life and cognition. Traditional
studies encounter problems because it is difficult to describe life and
cognition in terms of matter and energy, since their laws are valid only at the
physical scale. However, if matter and energy, as well as life and cognition,
are described in terms of information, evolution can be described consistently
as information becoming more complex.
  The paper presents eight tentative laws of information, valid at multiple
scales, which are generalizations of Darwinian, cybernetic, thermodynamic,
psychological, philosophical, and complexity principles. These are further used
to discuss the notions of life, cognition and their evolution.

<|endoftext|><|startoftext|>
Polymerization Force Driven Buckling of Microtubule Bundles Determines the Wavelength of
Patterns Formed in Tubulin Solutions
Yongxing Guo, Yifeng Liu, Jay X. Tang, and James M. Valles, Jr.
Physics Department, Brown University, Providence, RI 02912
(Dated: June 8, 2021)
We present a model for the spontaneous formation of a striated pattern in polymerizing microtubule solutions.
It describes the buckling of a single microtubule (MT) bundle within an elastic network formed by other similarly
aligned and buckling bundles and unaligned MTs. Phase contrast and polarization microscopy studies of the
temporal evolution of the pattern imply that the polymerization of MTs within the bundles creates the driving
compressional force. Using the measured rate of buckling, the established MT force-velocity curve and the
pattern wavelength, we obtain reasonable estimates for the MT bundle bending rigidity and the elastic constant
of the network. The analysis implies that the bundles buckle as solid rods.
Microtubules (MTs), a major component of the eukary-
otic cytoskeleton [1], can form various structures and pat-
terns. For example, in vivo, MTs organize into the spindles
and asters essential for mitosis [2] and the parallel arrays and
stripes necessary for directing early processes in embryogene-
sis [3, 4]. Many in vitro studies of MT organization have been
performed in order to elucidate the mechanisms underlying
the formation of these structures [5, 6, 7, 8]. Of particular
relevance here are the striped birefringent patterns [Fig. 1(a)],
which spontaneously form from polymerizing a purified tubu-
lin solution without motor proteins or MT associated proteins.
Hitt et al. attributed these patterns to the formation of ne-
matic liquid crystalline domains [5]. Tabony et al., on the
other hand, proposed that a reaction-diffusion based mecha-
nism drives the formation of MT stripes [6]. Our recent in-
vestigations imply a starkly different scenario in which the
local MT alignment into wave-like structures occurs through
a collective process of MT bundling and buckling [9]. MTs
that are aligned by a static magnetic field [10, 11] or convec-
tive flow [9] during the initial stage of polymerization sponta-
neously form bundles in tubulin solutions with concentrations
of a few mg/ml. These bundles elongate and buckle in co-
ordination with neighboring bundles into a wave-like shape.
The nesting of the buckled bundles can quantitatively account
for the MT density and orientation variations leading to the
striped birefringent pattern [9]. We proposed that a compres-
sional force is generated by MT polymerization occurring uni-
formly along the bundle contours. The buckling wavelength
is controlled by the bending rigidity of the bundles and the
elasticity of the background network of MTs. This interesting
initial assessment calls for further investigation of the micro-
scopic picture of the bundle elongation, the MT buckling force
and the buckling mode selection mechanism.
Here we present a mechanical model for the process in
addition to new experimental data on the time evolution of
the bundle contour length and solution birefringence that pro-
vide direct support for the validity of the model. The model
considers the instability of a single MT bundle under a com-
pressional force, embedded in an elastic network formed by
both bundled and dispersed MTs. Time lapse phase contrast
and quantitative polarized light microscopy imply that MT
polymerization within the bundles provides the compressional
FIG. 1: Image of a MT birefringent pattern and a sketch of the me-
chanical buckling model. (a) Striped birefringent pattern [23]. The
image was taken between crossed polarizers with the polarization di-
rections at 45◦ with respect to the x axis. (b) Schematic drawing
of buckled MT bundles surrounded by an elastic MT network (gray
background). The white dashed line depicts the central bundle before
buckling. The white sinusoidal curve depicts the elongated bundle
after buckling and the gray sinusoidal curves represent the neighbor-
ing MT bundles. ξ (x) is the transverse displacement for the central
bundle.
force. Specifically, they reveal that the bundles elongate uni-
formly along their contours while maintaining constant radii
consistent with growth through the elongation of the individ-
ual MTs comprising them. We make predictions for the char-
acteristic buckling wavelength using the bundle bending rigid-
ity and the critical buckling force estimated from the measured
MT force-velocity curve. The measured wavelength of about
600 µm implies that the bundles bend as solid rods.
We envision initially the microtubule solution to consist of
an array of straight and parallel bundles aligned along the x
axis and embedded in a network composed of dispersed MTs
as in Fig. 1(b). All of the bundles experience a similar com-
pressional force that grows to a critical value, causing them to
buckle. To describe the buckling, we consider a single bun-
dle in the center of the sample and characterize its interaction
with the network using a single elastic constant, α , such that
αξ (x) is the elastic restoring force exerted by the network on
the bundle per unit length. Treating the bundle as a rod with a
bending rigidity, K, under a uniform compressional force, F ,
the force balance in the y direction at the onset of the buckling
is given by [12, 13, 14]
ξ (x)
∂ξ (x)
]+αξ (x) = 0 (1)
Performing a standard normal mode stability analysis of
Eq. (1) using ξ (x) ∝ eikx yields a relation between the angu-
lar wavenumber, k, and the compressional force, F = α/k2 +
Kk2, which suggests a minimum or critical compressional
force Fc for a buckling solution. The critical compressional
force is Fc = 2
Kα , and the characteristic wavelength is
λc = 2π/k = π
8K/Fc = 2π
K/α (2)
The resultant characteristic wavelength [Eq. (2)] agrees
with the prediction for λc based on energy minimization [9].
This model predicts buckling in a higher mode than the fun-
damental one as in classic Euler buckling.
In agreement with experiments, this model implies that the
orientation of MT bundles in a striped sample varies continu-
ously in space [9]. In contrast, previous models had suggested
that discrete and alternate angular orientations of the MTs
formed the striated patterns [15]. In addition, the weak depen-
dence of the buckling wavelength on the mechanical parame-
ters is consistent with the small variations in both the observed
buckling wavelength across a single macroscopic sample and
the patterns formed under different conditions (for example,
samples with different tubulin concentrations and samples in
containers with different size.).
Time lapse phase contrast microscopy reveals that the MT
bundles elongate uniformly along their contour during buck-
ling, which is consistent with polymerization occurring uni-
formly along the bundles. The elongation is illustrated in the
phase images Fig. 2(a) and Fig. 2(b), showing a fixed region
taken 12 and 100 minutes after polymerization initiation, re-
spectively. The three white curves in each image are computer
generated traces of bundle contours that extend between se-
lected fiducial marks. The fiducial marks are visible as dark
spots in the images. To generate the white curves, we pre-
sumed that the bundles followed the striations in the images
and traced the stripes between the fiducial marks, whose po-
sitions were tracked using the MetaMorph imaging software
(Universal Imaging, West Chester, PA). Specifically, we de-
termined the local striation orientation at each pixel by calcu-
lating a Fast Fourier Transform (FFT) of the area around the
pixel, shown, for example, in Fig. 2(c). The FFT appeared
as an elongated spot oriented perpendicular to the striation di-
rection [Fig. 2(d)]. The radially integrated FFT intensity has
a peak at a specific azimuthal angle [Fig. 2(d)] that is perpen-
dicular to the striation orientation. In this way, the lengths
of three segments along a MT bundle were recorded every 30
seconds and plotted in Fig. 2(f), (g) and (h). The normal-
ized lengths of these three segments grew at nearly the same,
constant rate, shown in Fig. 2(i), implying that the MT bun-
dles elongate uniformly along their contour instead of growing
FIG. 2: Illustration and measurements of the uniform elongation of
MT bundles [23]. (a,b) Phase contrast images of a sample region,
show progression of the pattern over one hour. MT bundles are dis-
cerned by the thin striations. The image contrast is enhanced for bet-
ter visualization. Segments 1 through 3 are adjacent pieces of a con-
tour followed by bundles. The segment ends are defined by fiducial
marks. (c) Magnified view of the region denoted by the white box in
(b), showing an encircled fiducial mark. (d) Fast Fourier Transform
(FFT) of (c). (e) The radially averaged FFT intensity plotted versus
the azimuthal angle θ and fit using a Gaussian function. The local
bundle orientation is orthogonal to the angle at which the Gaussian
fit peaks. (f-h) Length of segments 1 (f), 2 (g) and 3 (h) as a func-
tion of time. (i) Lengths of the three segments as functions of time,
normalized to their lengths at 46 minutes.
solely at their ends. It further suggests that the bundles elon-
gate through polymerization of their constituent MTs, which
start and end at random places along a bundle. The uni-
form growth of all MTs within the bundle justifies a uniform
elongation rate and the use of a uniform compressional force
throughout the bundle in the mechanical model, giving rise to
the sinusoidal ξ (x) over the entire pattern.
Additional quantitative information about the microscopic
picture of the buckling is gained through time-lapse bire-
fringence measurements. PolScope (CRI, Cambridge, MA)
images, taken sequentially at a fixed sample region [16],
yielded the time evolution at each pixel of both the retar-
dance (∆ ≡ bire f ringence× h, where h is the sample thick-
ness) and the slow axis direction (ϕ(x), orientation of MT
bundles) [17]. Two representative PolScope images of a sin-
gle region taken at different stages of self-organization are
shown in Fig. 3(a) and 3(b). The slow axis variation, ϕ(x),
along the white lines in Fig. 3(a) and 3(b) can be fit to
ϕ(x) = atan[A 2π
cos( 2π
(x + x0))], indicating that the bundle
follows ξ (x) = Asin( 2π
(x+ x0)) with a single wavelength λ ,
buckling amplitude A, and offset x0 [Fig. 3(c)]. The resultant
wavelength, λ ≈ 600 µm, is plotted in Fig. 3(d). The nor-
FIG. 3: Time evolution of a MT pattern obtained by measuring the
retardance and slow axis of the sample using a PolScope imaging
system [16]. (a,b) Retardance images of a sample region at 12 and
100 min of self-organization, respectively. The gray bar shows the
retardance magnitude scale and the white pins provide the slow axis
orientation. The straight white lines represent the slow axis line scan
position. (c) Slow axis line scan (black) and the fitted slow axis ori-
entation ϕ(x) = atan[A 2π
cos( 2π
(x+x0))] (gray) at 100 min. (d) The
dominant buckling wavelength λ , obtained from the fitted shapes of
the bundle at individual time points. (e) The length evolution of the
fitted bundle contour. L0 = 1544 µm is the initial unbuckled length of
the bundle. The segment before the arrow designates a latent period
prior to the onset of the buckling. (f) The magnitude of the retardance
averaged over the white lines as shown in (a,b) versus the normalized
length L/L0.
malized contour length calculated from the fits, L(t)/L0, grew
nearly linearly with time at a normalized rate of L̇(t)/L0≈ 1 %
per min [Fig. 3(e)]. Simultaneously, the retardance magnitude
averaged over the white line in Fig. 3(a) increased roughly
in proportion to L(t)/L0 [Fig. 3(f)]. Based on the nesting
model we proposed earlier and assuming that neighboring
MT bundles do not coalesce, the average retardance goes as
∆(t) ∼ δ × n(t)L(t)/L0 [9, 17], where n(t) is the number of
MTs in the cross section of a bundle and δ is the retardance of
a single MT. Therefore, the linear relation between ∆(t) and
L(t)/L0 implies that n(t) remains constant throughout buck-
ling. Thus, the elongation of MT bundles occurs through the
polymerization of MTs within the bundles and does not in-
volve the incorporation of new MTs to existing bundles.
With the above observations and model, we can quanti-
tatively characterize the elastic properties of the bundle (K)
and network (α). We begin with the implications of the mea-
sured wavelength λ . In order to predict λ from the mechani-
cal buckling model, we need to estimate K and F [Eq. (2)].
Two limits exist for K. If tight packing (solid model) of
the MTs inside the bundle is assumed, then Ksolid = n2KMT,
where KMT ≈ 3.4× 10−23 N ·m2 is the bending rigidity of
a single MT [18, 19]. If MTs slide freely inside the bun-
dle, then Kslip = nKMT. We employ the measured force-
velocity relation, f (v) = C1 ln[C2/(v + C3)] (C1 = 1.89 pN,
C2 = 1.13 µm/min and C3 = −0.08 µm/min [18]), for a sin-
gle MT and presume F = n f (v), where v is the average
elongation rate of individual MT inside the bundle. Writ-
ing the average length of MTs inside the bundle as lMT,
the elongation rate of a single MT is then approximately
v(lMT) = lMT × L̇(t)/L0. Using the models for K, F and
Eq. (2), we derive predictions of λ for both the solid model,
λsolid = π
8nKMT/ f (v(lMT)), and the slip model, λslip =
8KMT/ f (v(lMT)). Each depends on lMT and n. Using
n = 280 [9], we plot the wavelength over a reasonable range
of individual MT lengths ([1]) in Fig. 4. The solid model for
K appears much more reasonable than the slip model. The
fact that K depends quadratically on n in our system suggests
that MTs are fully coupled (acting like a solid material) inside
the bundle, similar to the behavior of F-actin bundles held to-
gether through depletion forces [20]. The bundling of initially
aligned MTs can be attributed to the depletion force induced
by unpolymerized tubulin dimers, oligomers and even short
MTs [9].
The conclusion that the bundles bend as solid rods appar-
ently conflicts with the picture of elongation, that involves the
growth and relative sliding of individual MTs within the bun-
dles. We speculate that the explanation involves two distinct
time scales: the time for a MT to come to mechanical equi-
librium with its neighbors following the insertion of a tubulin
dimer to its end, τmech, and the average interval between in-
sertions, τdimer. In the limit τmech < τdimer, strong coupling be-
tween the MTs in the bundle can occur leading to the solid rod
result. The opposite limit intuitively leads to weak coupling
between the MTs within a bundle. We estimate τdimer ≈ 0.1s
from our data, which seems quite long compared to the times
characterizing the relative motion of neighboring MTs on the
molecular length scales relevant to τmech. The exact molecu-
lar picture, which goes beyond the scope of our model, needs
further study.
Using the solid model for K, we can calculate the remain-
ing model parameter, α , from Eq. (2): α = Kslip(2π/λexpt)4 ≈
0.032Pa. This value is remarkably small compared to that
estimated for a single MT buckling inside a cell (α∗ ≈
2700Pa [12]). We identify two contributors to the difference
between α and α∗. In general, α ∼ G, where G is the elas-
tic shear modulus of the surrounding network. G ∼ 1Pa in
our system [21], while G∗ ∼ 1000Pa for the surrounding cy-
toskeleton network inside the cell [22]. The other contributor
is the coordination of the buckling of the MT bundles, which
reduces the distortion of the surrounding network, and thus
weakens the effective restoring force and α (analysis in prepa-
FIG. 4: Theoretically calculated wavelength (λ ) as a function of the
average length of MTs (lMT) inside the bundle at the onset of buck-
ling. In the solid model λsolid = π
8nKMT/ f (v(lMT)), and in the
slip model λslip = π
8KMT/ f (v(lMT))). λexpt is the experimentally
observed buckling wavelength (dashed line).
ration).
In summary, using microscopic studies of the temporal evo-
lution of the striated MT patterns, we show that the polymer-
ization of MTs within the bundles causes uniform elongation.
This in turn creates the driving compressional force which
ultimately causes the MT bundles to buckle. It is this coor-
dinated buckling that produces the striped birefringent pat-
tern. The proposed mechanical buckling model adequately
describes the buckling process. It predicts a critical buckling
force and a characteristic wavelength, which depend on the
elasticity of the surrounding network and the bending rigid-
ity of the MT bundles. Combing the bending rigidity of MT
bundles and the established MT force-velocity curve with the
mechanical model, we obtain a reasonable estimate for the
elastic constant of the network and find that MTs inside the
bundle are fully coupled.
We thank Allan Bower for help in understanding the elas-
tic constant α and thank L. Mahadevan and Thomas R.
Powers for valuable discussions. This work was supported
by NASA (NNA04CC57G, NAG3-2882) and NSF (DMR
0405156, DMR 0605797).
[1] A. Desai and T. J. Mitchison, Annu. Rev. Cell Dev. Biol. 13, 83
(1997).
[2] D. Bray, Cell Movement: From Molecules to Motility (Garland,
New York, 2001).
[3] R. P. Elinson and B. Rowning, Dev. Biol. 128, 185 (1988).
[4] G. Callaini, Development 107, 35 (1989).
[5] A. L. Hitt, A. R. Cross, and R. C. Williams, J. Biol. Chem. 265,
1639 (1990).
[6] J. Tabony, Science 264, 245 (1994).
[7] F. J. Nédélec, T. Surrey, A. C. Maggs, and S. Leibler, Nature
389, 305 (1997).
[8] C. E. Walczak, I. Vernos, T. J. Mitchison, E. Karsenti, and
R. Heald, Current Biology 8, 903 (1998).
[9] Y. Liu, Y. Guo, J. M. Valles, and J. Tang, Proc. Natl. Acad. Sci.
U.S.A. 103, 10654 (2006).
[10] W. Bras, G. P. Diakun, J. F. Dı́az, G. Maret, H. Kramer, J. Bor-
das, and F. J. Medrano, Biophys. J. 74 (1998).
[11] N. Glade and J. Tabony, Biophys. Chem. 115, 29 (2005).
[12] C. P. Brangwynne, F. C. MacKintosh, S. Kumar, N. A. Geisse,
J. Talbot, L. Mahadevan, K. K. Parker, D. E. Ingber, and D. A.
Weitz, J. Cell Biol. 173, 733 (2006).
[13] J. R. Gladden, N. Z. Handzy, A. Belmonte, and E. Villermaux,
Phys. Rev. Lett. 94, 035503 (2005).
[14] L. D. Landau and E. M. Lifshitz, Theory of Elasticity (Oxford,
New York, 1986), 3rd ed.
[15] J. Tabony and N. Glade, Langmuir 18, 7196 (2002); C. Papa-
seit, L. Vuillard, and J. Tabony, Biophys. Chem. 79, 33 (1999);
J. Tuszynski, M. V. Sataric, and et al., Physics Letters A 340,
175 (2005).
[16] The sample was polymerized from a 5 mg/ml tubulin solution
(2 mM GTP, 3.5% in molar ratio of Oregon Green conjugated
taxol to tubulin dimers, 100 mM pipes, 1 mM EGTA, 2 mM
MgSO4, PH 6.9) in a 40×8×0.4mm3 glass cuvette which was
exposed to 9 T vertical static magnetic field for 5 minutes at
37◦C (the magnetic field direction is along the long axis of the
cuvette). The cuvette was then laid flat on the microscope stage
and a coherently buckled area was chosen for observation and
measurement at 30◦C.
[17] R. Oldenbourg, E. D. Salmon, and P. T. Tran, Biophys. J. 74,
645 (1998).
[18] M. Dogterom and B. Yurke, Science 278, 856 (1997).
[19] J. A. Tuszyński, T. Luchko, S. Portet, and J. M. Dixon, Eur.
Phys. J. E 17, 29 (2005).
[20] M. M. A. E. Claessens, M. Bathe, E. Frey, and A. R. Bausch,
Nature Materials 5, 748 (2006).
[21] M. Sato, W. H. Schwartz, S. C. Selden, and T. D. Pollard, J.
Cell Biol. 106, 1205 (1988).
[22] R. E. Mahaffy, C. K. Shih, F. C. MacKintosh, and J. Käs, Phys.
Rev. Lett. 85, 880 (2000).
[23] The sample was polymerized from 8 mg/ml tubulin solution
(same buffer condition as in [16]) in a 40×10×1mm3 quartz
cuvette and was subjected to convective flow (induced by asym-
metrical thermal contacts, with the left and bottom surfaces in
contact with a 37◦C waterbath-warmed aluminum holder and
other sides exposed to 30◦C ambient) for the first 9 minutes.
The cuvette was then laid flat on the microscope stage for ob-
servation and measurement at 30◦C.
	References
ABSTRACT
  We present a model for the spontaneous formation of a striated pattern in
polymerizing microtubule solutions. It describes the buckling of a single
microtubule (MT) bundle within an elastic network formed by other similarly
aligned and buckling bundles and unaligned MTs. Phase contrast and polarization
microscopy studies of the temporal evolution of the pattern imply that the
polymerization of MTs within the bundles creates the driving compressional
force. Using the measured rate of buckling, the established MT force-velocity
curve and the pattern wavelength, we obtain reasonable estimates for the MT
bundle bending rigidity and the elastic constant of the network. The analysis
implies that the bundles buckle as solid rods.

<|endoftext|><|startoftext|>
Neutron Inelastic Scattering Processes as Background for Double-Beta Decay
Experiments
D.-M. Mei,1, 2, ∗ S.R. Elliott,1 A. Hime,1 V. Gehman,1, 3 and K. Kazkaz3, †
Los Alamos National Laboratory, Los Alamos, NM 87545
The Department of Earth Science and Physics, University of South Dakota, Vermillion, South Dakota 57069
Center for Experimental Nuclear Physics and Astrophysics, and
Department of Physics, University of Washington, Seattle, WA 98195
(Dated: October 29, 2018)
We investigate several Pb(n, n′γ) and Ge(n, n′γ) reactions. We measure γ-ray production from
Pb(n, n′γ) reactions that can be a significant background for double-beta decay experiments which
use lead as a massive inner shield. Particularly worrisome for Ge-based double-beta decay experi-
ments are the 2041-keV and 3062-keV γ rays produced via Pb(n, n′γ). The former is very close to
the 76Ge double-beta decay endpoint energy and the latter has a double escape peak energy near
the endpoint. We discuss the implications of these γ rays on past and future double-beta decay
experiments and estimate the cross section to excite the level that produces the 3062-keV γ ray.
Excitation γ-ray lines from Ge(n, n′γ) reactions are also observed. We consider the contribution of
such backgrounds and their impact on the sensitivity of next-generation searches for neutrinoless
double-beta decay using enriched germanium detectors.
PACS numbers: 23.40.-s, 25.40.Fq, 29.40.Wk
I. INTRODUCTION
Neutrinoless double-beta decay plays a key role in
understanding the neutrino’s absolute mass scale and
particle-antiparticle nature [1, 2, 3, 4]. If this nuclear de-
cay process exists, one would observe a mono-energetic
line originating from a material containing an isotope
subject to this decay mode. One such isotope that may
undergo this decay is 76Ge. Germanium-diode detectors
fabricated from material enriched in 76Ge have estab-
lished the best half-life limits and the most restrictive
constraints on the effective Majorana mass for the neu-
trino [5, 6]. One analysis [7] of the data in Ref. [6] claims
evidence for the decay with a half-life of 1.2 × 1025 y.
Planned Ge-based double beta decay experiments [8, 9]
will test this claim. Eventually, these future experiments
target a sensitivity of > 1027 y or ∼ 1 event/ton-year
to explore mass values near that indicated by the atmo-
spheric neutrino oscillation results.
The key to these experiments lies in the ability to re-
duce intrinsic radioactive background to unprecedented
levels and to adequately shield the detectors from ex-
ternal sources of radioactivity. Previous experiments’
limiting backgrounds have been trace levels of natural
decay chain isotopes within the detector and shielding
components. The γ-ray emissions from these isotopes
can deposit energy in the Ge detectors producing a con-
tinuum, which may overwhelm the potential neutrino-
∗Permanent Address: The Department of Earth Science and
Physics, University of South Dakota, Vermillion, South Dakota
57069
†Permanent Address: Lawrence Livermore National Laboratory,
Livermore CA 94550
less double-beta-decay signal peak at 2039 keV. Great
progress has been made identifying the location and ori-
gin of this contamination, and future efforts will substan-
tially reduce this contribution to the background. The
background level goal of 1 event/ton-year, however, is an
ambitious factor of ≈ 400 improvement over the currently
best achieved background level [6]. If the efforts to re-
duce the natural decay chain isotopes are successful, pre-
viously unimportant components of the background must
be understood and eliminated. The potential for neutron
reactions to be one of these background components is
the focus of this paper. The work of Mei and Hime[10]
recognized that (n, n′γ) reactions will become important
for ton-scale double-beta decay experiments. Specifically,
we have studied neutron reactions in Pb and Ge, materi-
als that play important roles in the Majorana [8] design.
But since lead is used by numerous low-background ex-
periments, the results will have wider utility.
This paper presents measurements and simulations of
Pb(n, n′γ) and Ge(n, n′γ) reactions and estimates the re-
sulting background for Ge-detector based, double-beta
decay experiments for a given neutron flux. With these
results, we then use the neutron flux, energy spectrum,
angular distribution, multiplicity and lateral distribu-
tions determined in [10] to estimate the background in
Ge detectors situated in underground laboratories. In
Section II we describe the experiments, the data, and the
simulations. In Sections III and IV we describe the analy-
sis of these data. Section IV also discusses the important
Pb(n, n′γ) production of γ rays at 2041 and 3062 keV.
The former is dangerously near the 2039-keV Q-value for
zero-neutrino double-beta decay in 76Ge and the latter
can produce a double-escape peak line at 2040 keV. These
dangerous processes for Ge-based double-beta decay ex-
periments are discussed for the first time in this work.
Section V determines an overall background model for
http://arxiv.org/abs/0704.0306v4
our detector and the implications of this model for fu-
ture experimental designs. It also considers the relevant
merits of Cu versus Pb as shielding materials, and the
use of depth to mitigate these backgrounds is discussed.
We also consider the possibility that the double-escape
peak of the 3062-keV γ ray could contribute to the signal
claimed in Ref. [7]. Finally, we summarize our conclu-
sions in Section VI.
II. THE MEASUREMENTS
We collected five data sets to explore the implications
of (n, n′γ) for double-beta decay experiments. All mea-
surements were done in our basement laboratory at Los
Alamos National Laboratory. The laboratory building is
at an atmospheric depth of 792 g/cm2 and provides about
1 mwe concrete (77 g/cm2) overburden against cosmic
ray muons.
Three data sets were taken with a CLOVER detec-
tor [11]. This detector is a set of 4 n-type, segmented
germanium detectors. The four crystals have a total nat-
ural germanium mass of 3 kg and each crystal is seg-
mented in half. The CLOVER detector and its opera-
tion in our laboratory were described in Ref. [12]. The
remaining two measurements were done with a PopTop
detector [13] set up in coincidence with a NaI detector.
The PopTop is a 71.8-mm long by 64-mm diameter p-
type Ge detector. Taking into account the central bore,
the detector is 215 cm3 or 1.14 kg. The NaI crystal is
15.25-cm long by 15.25-cm diameter and is directly con-
nected to a photo-tube. All data were read out using
a pair of X Ray Instrumentation Associates (XIA) [14]
Digital Gamma Finder Four Channel (DGF4C) CAMAC
modules. The CAMAC crate is connected to the PCI bus
of a Dell Optiplex computer running Windows 2000. The
system was controlled using the standard software sup-
plied by XIA. This data acquisition software runs in the
IGOR Pro environment [15] and produces binary data
files that were read in and analyzed using the ROOT
framework[16].
The data sets include:
1. A background run with the CLOVER
2. A Th-wire source run with the CLOVER
3. An AmBe source run with the CLOVER using two
different geometries of moderator
4. An AmBe source run with the PopTop surrounded
by lead
5. An AmBe source run with the PopTop surrounded
by copper
In this section we describe the experiments and the data
collected.
A. The Experimental Configurations
The CLOVER was surrounded by 10 cm of lead shield-
ing to reduce the signal from ambient radioactivity. Un-
derneath and above the lead was 5 cm of 30%-loaded
borated polyethylene to reduce thermal neutrons. The
background run done in this configuration lasted 27.13
live-days. The configuration for the Th-source run was
similar, but with some lead removed to expose the detec-
tor to the source. The Th source run had a live time of
1337 seconds.
The setup was modified somewhat from this
background-run configuration for the measurements with
the AmBe source. Fig. 1 shows the configuration for
one of the AmBe measurements. For these data, the
CLOVER was shielded on four sides with 10 cm of lead.
The AmBe source, 30 mCi of 241Am with a calibrated
neutron yield of ≈ 63,000 Hz (±0.7%), was on one side
of the CLOVER with 5 cm of lead and a layer of pure
polyethylene moderator (either 10 or 15 cm thick) be-
tween the source and detector. The data acquisition sys-
tem is inactive during data transfer. Only the AmBe
runs had a large enough event rate for the dead time to
be appreciable. A 6.13-h live-time data run (57% live)
was taken with 15 cm of moderator, and another 3.57-h
live-time data (38% live) run was taken with 10 cm mod-
erator (pictured). For the analysis presented below, the
data from these two configurations were combined, and
thus the AmBe-CLOVER data set contains 9.7 h of live
time. The observed energy spectrum extended from ≈
10-3100 keV for these data sets.
During the analysis of the AmBe data, we observed
a weak line at 3062 keV. This energy corresponds to a
γ-ray transition in 207Pb, and we therefore hypothesized
that it was generated via Pb(n, n′γ). The double-escape-
peak (DEP) energy (2040 keV) associated with this γ ray
is very dangerous for 76Ge neutrinoless double-beta de-
cay experiments because it falls so close to the transition
energy (2039 keV). Furthermore because the DEP is a
single-site energy deposition, it cannot be distinguished
from double-beta decay through event topology. This is
in contrast to a full-energy γ-ray peak, which tends to
consist of several interactions and therefore is a multiple-
site deposition. (See [12] for a discussion of the use of
event topology to reduce background in Ge detectors.)
The final two measurements were intended to study
this 3062-keV line in the spectrum and demonstrate its
origin. In both cases a PopTop Ge detector faced a 15.25
cm by 15.25 cm NaI detector for coincidence data. By se-
quentially placing a Pb and then a Cu absorber between
an AmBe source and a PopTop Ge detector, we tested
the hypothesis that the line was due to neutron interac-
tions in Pb. By looking for an coincident energy deposit
in the NaI detector, we could be assured the Ge detector
signal originated from a neutron interaction in the sam-
ple. An energy deposit threshold in the NaI of greater
than 200 keV was required for a coincidence. The Pop-
Top was placed 27.3 cm from the NaI detector with the
source placed 20.3 cm (7 cm) from the Ge (NaI) detector.
For the lead study, 5 cm of lead was placed directly be-
tween the Ge detector and the source. Additional lead, in
the form of 5-cm-thick bricks was positioned around the
4 sides of the Ge detector to reduce room background.
For the copper study, a 0.5-cm thick Cu tube was placed
around the PopTop and a 5-cm Cu block was placed be-
tween the PopTop and the source. For this final run, all
the lead was removed. For both of these sets of data, the
observed spectra extended from ≈ 125 keV to ≈ 9 MeV.
For the PopTop data, the Pb and Cu runs were of 19.12
h and 17.76 h live-time, respectively.
FIG. 1: The CLOVER detector as configured for the AmBe
source run. The setup at the time of this photograph used
4′′ of polyethylene. One wall of the lead shield was removed
only to clarify the relationship between the AmBe source,
moderator, and the CLOVER.
B. The Data sets
The crystals were individually calibrated, and the re-
sulting spectra summed together to form a single his-
togram. The peaks within each of the 3 CLOVER data
sets were identified and their intensities determined. If an
event had 2 crystals that responded in coincidence, the
histogram would have two entries. Therefore the spectra
we analyzed and simulated included all single-crystal en-
ergy deposits. By not eliminating events that registered
signals in more than 1 of the CLOVER Ge detectors,
we maximized the event rate. The peak strengths were
estimated by fitting a Gaussian shape to peaks and a
flat background to the spectrum in the region near the
peak. For the nuclear recoil lines, the peak shape was
assumed to be a triangle and not Gaussian. In Table I
the uncertainties derive from this fit. A summary of the
peak strengths is given in Table I and the spectra them-
selves are shown in Fig. 2. The data sets were chosen
to help decouple line blendings. Because the rates in
all peaks and continua are much higher for the source-
induced data than for the background, features in those
spectra are due to the sources and other contributions
can be safely ignored. For example, the 2614.5-keV line
can arise from either the decay of 208Tl or 208Pb(n, n′γ).
When exposed to a Th source, Tl decay dominates the
spectrum, whereas when exposed to an AmBe source,
(n, n′γ) dominates. Hence by normalizing the rate in
this line to the rate in a pure neutron-induced transition
(e.g. the 596-keV 74Ge(n,n’γ)), we can determine the
relative contribution of the two processes to the back-
ground spectrum. In fact, in the background data, both
processes contribute to this line.
Some comments on our choices for line identification
are in order. For an isotope such as 72Ge where a neutron
capture leads to a stable nucleus, almost all (n,γ) lines
could also be interpreted as (n, n′γ) lines in the result-
ing nucleus; in this case 73Ge. For isotopes within the
detector however, such as the 53.5-keV 72Ge(n,γ) tran-
sition, the competing 73Ge(n, n′γ) line would be a sum
of this γ-ray energy and the recoil nucleus energy. At
these low energies where the recoil is a fair fraction of
the γ-ray energy, the (n, n′γ) would simply contribute to
the continuum and not be observed as a line. For the
high energy cases, the blend of a mono-energetic γ-ray
line and a (n, n′γ) process might be present.
For the calibration runs, our threshold was approxi-
mately 70 keV. Also note that we used a thorium wire as
a calibration source. Since the wire is pure natural Th,
we observe the Th X rays in that data. In contrast, the
background run shows lines from the thorium chain as a
contaminant, therefore those lines are absent.
In all spectra, there are a few lines we have not iden-
tified.
TABLE I: A summary of the observed lines in the various spectra taken
with the CLOVER detector. Blank entries indicate that no significant
peak feature above the continuum was found. Single and double escape
peaks are labeled by SEP and DEP respectively. Line assignments for
which we are unsure are indicated by a question mark. Line energies are
taken from the Table of Isotopes [17].
Energy Process Count Rates
(keV) backgrnd Thorium CLOVER
(per hr) (Hz) AmBe (Hz)
23.4 70Ge(n,γ) 1.017(5)
46.5 210Pb 112.75(42)
72Ge(n,γ)
61.01(31) 2.079(8)
63.2 234Th 93.02(38)
67.7 230Th 24.05(19) 0.591(4)
68.8 72Ge(n,γ) 0.440(4)
72.80 Pb x-ray 10.5(1) 0.506(4)
74.97 Pb x-ray 663.5(1.0) 34.1(2) 1.703(7)
76.7 Unidentified 29.0(2)
228Th
Pb x-ray
Pb x-ray
115.28(42) 10.9(1) 0.930(5)
Pb x-ray 55.65(29) 16.8(1) 0.262(3)
Continued
TABLE I – continued
Energy Process Count Rates
(keV) backgrnd Thorium CLOVER
(per hr) (Hz) AmBe (Hz)
89.9 Th x-ray 32.4(2)
92.7 234Th 171.38(51) 0.060(1)
93.4 Th x-ray 47.5(2)
96.0 115In(n,γ) ? 0.166(2)
99.5 228Ac 13.71(15) 3.0(1)
105.3 Unidentified 8.99(12)
104.8
105.6
Th x-ray 20.6(1)
108.7 Th x-ray 7.6(1)
109.9 19F(n, n′γ) 43.00(26) 0.506(4)
129.1 228Ac 12.91(14) 3.2(1)
139.7 74Ge(n,γ) 47.20(27) 2.339(8)
143.9 230Th 20.03(18)
154.0 228Ac 7.69(11) 1.5(1)
159.7 77mGe 0.114(2)
162.4 115In(n,γ) 10.71(13) 1.073(6)
174.9 70Ge(n,γ) 7.45(11) 0.763(5)
186.1
186.2
226Ra
115In(n,γ)
114.60(42) 0.323(3)
197.1
198.4
19F(n, n′γ)
71Ge sum
81.04(35) 2.328(8)
199.2 228Ac 0.66(2)
202.6 115In(n,γ) 0.061(1)
209.5 228Ac 19.38(17) 10.9(1)
215.5 228Th 2.43(6) 0.92(3)
238.6 212Pb 295.77(67) 139.9(1) 0.105(2)
242.0 214Pb 57.49(30) 9.2(1)
247.1 70Ge(n,γ) 0.070(1)
253.7 74Ge(n,γ) 2.76(7) 0.410(3)
270.2 228Ac 21.39(18) 9.1(1)
273.0 115In(n,γ) 0.055(1)
277.4
208Tl
208Pb(n, n′γ)
12.98(14) 5.4(1) 0.086(2)
284.6 Unidentified 2.79(7)
288.1 212Bi ? 1.03(3)
295.2 214Pb 58.02(30)
297.2
298.7
72Ge(n,γ)
115In(n,γ)
0.068(1)
300.1 212Pb 18.59(17) 9.7(1)
306.2 70Ge(n,γ) 1.12(4) 0.046(1)
321.4 228Ac 0.67(2)
326.0 70,72Ge(n, n′γ) 0.487(4)
328.3 228Ac 10.02(12) 8.2(1)
332.9 228Ac 1.14(3)
335.5 115In(n,γ) 0.028(1)
338.7 228Ac 45.13(26) 31.5(2)
351.9 214Pb 95.11(38)
354.1 Unidentified 0.043(1)
385.1 115In(n,γ) 0.048(1)
391.3 70Ge(n,γ) 0.053(1)
409.8 228Ac 4.05(8) 4.3(1)
416.9 116mIn 2.21(6) 0.359(3)
438.9 Unidentified 2.54(6)
445.2 74Ge(n,γ) 0.037(1)
452.3 212Bi? 0.82(2)
463.3 228Ac 10.96(13) 9.2(1)
474.0 72Ge(n,γ) ? 2.54(6)
Continued
TABLE I – continued
Energy Process Count Rates
(keV) backgrnd Thorium CLOVER
(per hr) (Hz) AmBe (Hz)
478.6 228Ac 0.39(2)
470-485
10B(n,α)
7Li∗(γ)7Li
Doppler
Broadened
signf.
492.9 73Ge(n,γ) 0.123(2)
499.9 70Ge(n,γ) 0.453(4)
503.9 228Ac 0.34(2)
509.3
510.7
510.7
510.9
228Ac
208Tl
208Pb(n, n′γ)
Annih.γ
171.93(51) 16.8(1) 3.409(10)
516.2 35Cl(n,γ) 0.160(2)
537.5 206Pb(n, n′γ) 5.12(9) 0.158(2)
562.9
563.0
228Ac
76Ge(n, n′γ)
12.83(14) 1.52(3) 0.244(3)
569.7 207Pb(n, n′γ) 14.17(15) 0.422(4)
572.3 228Ac 0.53(2)
574.7 74Ge(n, γ) 0.091(2)
583.1
208Tl
208Pb(n, n′γ)
71.64(33) 49.6(2) 0.256(3)
595.9
74Ge(n, n′γ)
73Ge(n,γ)
59.90(30) 1.869(7)
608.3 73Ge(n,γ) 0.333(3)
609.2 214Bi 60.11(30)
629.6 72Ge(n, n′γ) 0.078(2)
648.2 115In(n,γ) 0.025(1)
657.2 206Pb(n, n′γ) 0.047( 1)
662.0 137Cs 9.04(12)
663.8 206Pb(n, n′γ) 0.069(1)
669.0 70Ge(n, n′γ) 0.030(1)
692.4 72Ge(n, n′e−) 87.70(37) 2.406(8)
701.0 74Ge(n, n′γ) 0.082(2)
708.2 70Ge(n,γ) 0.176(2)
727.3 212Bi 15.72(16) 11.7(1)
747.7 70Ge(n,γ) 0.047(1)
755.3 228Ac 2.19(6) 1.52(3)
763.1 208Tl 0.85(3)
763.1 208Pb(n, n′γ)? 0.032(1)
766.6
768.4
224mPa
214Bi
4.85(9)
771.8 228Ac 2.02(6) 2.11(4)
782.0 228Ac 0.67(2)
785.5 212Bi 4.19(8) 1.57(3)
786.3
786.8
35Cl(n,γ)
208Pb(n, n′γ)
0.041(1)
788.4
788.7
35Cl(n,γ)
70Ge(n,γ)
0.064(1)
795.0 228Ac 7.92(11) 6.1(1)
798.0 208Pb(n, n′γ) 0.023(1)
803.1 206Pb(n, n′γ) 20.90(18) 0.850(5)
806.2 214Bi 3.03(7)
808.2 70Ge(n,γ) 0.048(1)
818.6 116mIn 0.064(1)
824.9 1.03(4)
830.4 228Ac 0.70(2)
834.1 72Ge(n, n′γ) 45.15(26) 0.290(3)
Continued
TABLE I – continued
Energy Process Count Rates
(keV) backgrnd Thorium CLOVER
(per hr) (Hz) AmBe (Hz)
835.6 228Ac 2.30(4)
840.4 228Ac 1.19(3)
843.8 27Al(n, n′γ) 4.48(8) 0.112(2)
846.9 76Ge(n, n′γ) 0.062(1)
860.4
860.4
208Tl
208Pb(n, n′γ)
8.64(12) 6.0(1) 0.090(2)
865.0 Unidentified 0.094(2)
867.9 73Ge(n,γ) 4.25(8) 0.466(4)
881.0 206Pb(n, n′γ) 2.50(6) 0.151(2)
892.9 212Bi 0.42(2)
894.3 72Ge(n, n′γ) 0.029(1)
897.8 207Pb(n, n′γ) 6.28(10) 0.199(2)
904.1 228Ac 0.94(3)
911.2 228Ac 48.86(27) 37.1(2)
934.1 214Bi 1.99(6)
958.4 228Ac 0.37(2)
960.9 74Ge(n, n′γ) 0.095(2)
964.4 228Ac 11.22(13) 6.2(1)
968.8 228Ac 26.83(20) 21.6(1)
981.0 206,8Pb(n, n′γ) 0.035(1)
988.4 228Ac 1.95(5) 0.20(1)
993.7
995.1
74Ge(n, n′γ)
206Pb(n, n′γ)
0.026(1)
999.5 74Ge(n, n′γ) 0.034(1)
1001.5 224mPa 8.03(11)
1004.5 228Ac 0.17(1)
1014.5 27Al(n, n′γ) 7.46(11) 0.173(2)
1033.1 228Ac 0.20(1)
1040.1
1040.8
70Ge(n, n′γ) 16.89(16) 0.210(2)
1063.7 207Pb(n, n′γ) 8.47(11) 0.145(2)
1065.0 228Ac 0.47(2)
1078.8 212Bi 1.11(4) 0.62(2)
1093.9 208Tl sum 511+583 0.90(3)
1095.8
1096.9
207Pb(n, n′γ)
70Ge(n,γ)
116mIn
7.81(11) 0.464(4)
1101.3 74Ge(n, n′γ) 0.123(2)
1105.6 74Ge(n, n′γ) 0.019(1)
1110.4 228Ac sum 0.52(2)
1120.6 214Bi 11.70(13)
1122.5 228Ac sum 0.26(1)
208Pb(n, n′γ)
72Ge(n, n′γ)
0.022(1)
1131.6 73Ge(n,γ)? 1.34(5) 0.034(1)
1139.4 70Ge(n,γ) 1.34(5) 0.077(2)
1153.5 228Ac 0.15(1)
1155.2 214Bi 1.08(4)
1164.9
1166.0
35Cl(n, n′γ)
72Ge(n, n′γ)
0.49(3) 0.072(1)
1173.5 60Co 13.67(14)
1201.2 p(n,γ)d DEP 0.415(3)
1204.2
73Ge(n,γ)
74Ge(n, n′γ)
9.39(12) 0.163(2)
1226.7 74Ge(n, n′γ) 0.017(1)
1238.4 214Bi 5.22(9)
1246.9 228Ac 0.58(2)
Continued
TABLE I – continued
Energy Process Count Rates
(keV) backgrnd Thorium CLOVER
(per hr) (Hz) AmBe (Hz)
1261.0 74Ge(n, n′γ) 0.019(1)
1281.0 214Bi 0.74(3)
1286-7 228Ac Blend 0.14(1)
1293.5 116mIn 4.09(8) 0.462(4)
1298.8 70Ge(n,γ) 0.087(2)
1332.5 74Ge(n, n′γ) 0.018(1)
1332.5 60Co 12.27(14)
1344.5
1345.9
1347.7
74Ge(n,γ)
206Pb(n, n′γ)
70Ge(n,γ)
0.52(3) 0.017(1)
1374.2
228Ac sum
964 + 409
911 + 463
0.28(1)
1378.0 214Bi 3.24(7)
1378.8 70Ge(n,γ) 0.065(1)
1393.8 206Pb(n, n′γ) 0.016(1)
1401.5 214Bi 0.58(3)
1408.6 214Bi 1.84(5)
1413.6 73Ge(n, n′γ) 0.018(1)
1431.1 228Ac 0.15(1)
1433.5 206Pb(n, n′γ) 0.020(1)
1436.9 208Pb(n, n′γ) 0.017(1)
1459.2 228Ac 0.80(2)
1461.0 40K 30.18(22) 0.066(1)
1463.9 72Ge(n, n′γ) 0.114(2)
1466.8 206Pb(n, n′γ) 0.032(1)
1471.6 73Ge(n,γ) 0.047(1)
1489.2 74Ge(n, n′γ) 0.025(1)
1496.2 228Ac 0.73(3) 0.85(3)
1501.7 228Ac 0.42(2)
1508.9 116mIn 0.069(1)
1508.9 214Bi 1.95(3)
1512.7 212Bi 0.38(2)
1538 214Bi 0.54(3)
1557.1 228Ac 0.14(1)
1580.8 228Ac 0.68(3) 0.55(2)
1588.3 228Ac 6.06(10) 2.94(5)
1592.5
1592.5
1593.0
208Tl DEP
208Pb DEP
207Pb(n, n′γ)
7.24(11) 2.10(4) 0.107(2)
1599.3 214Bi 0.38(2)
1601.1
1602.0
35Cl(n,γ)
74Ge(n, n′γ)
0.018(1)
1614.9 208Pb(n, n′γ)? 0.020(1)
1620.5 212Bi 1.81(5) 1.32(3)
1625.0 228Ac 0.34(2)
1630.7 228Ac 1.82(5) 1.46(3)
1631.5
1632.0
74Ge(n, n′γ)
70Ge(n,γ)
0.021(1)
1634.0 76Ge(n,γ) 0.016(1)
1638.3 228Ac 0.43(3) 0.41(2)
1640.4
208Pb(n, n′γ)
74,76Ge(n, n′γ)
0.026(1)
1661.3 214Bi 0.36(2)
1666.3 228Ac 0.17(1)
1699.5 206Pb(n, n′γ)? 0.021(1)
1704.5 206Pb(n, n′γ) 0.72(3) 0.041(1)
Continued
TABLE I – continued
Energy Process Count Rates
(keV) backgrnd Thorium CLOVER
(per hr) (Hz) AmBe (Hz)
1710.9 72Ge(n, n′γ) 0.327(3)
1712.2 p(n,γ)d SEP 0.156(2)
1725.7 207Pb(n, n′γ) 0.029(1)
1729.6 214Bi 3.59(7)
1764.7 214Bi 11.68(13)
1779.0
27Al(n,γ)
28Al ⇒28 Si
2.13(6) 0.127(2)
1806.0 212Bi 0.11(1)
1844.5 206Pb(n, n′γ) 0.044(1)
1846.9 214Bi 2.26(6)
1940.4 74Ge(n, n′γ) 0.027(1)
1951.1 35Cl(n,γ) 0.032(1)
1959.3 35Cl(n,γ) 0.023(1)
2092.1
2092.7
206Pb(n, n′γ)
207Pb(n, n′γ)
0.039(1)
2103.8
208Tl SEP
208Pb SEP
5.21(9) 2.25(4) 0.090(2)
2112.1 116mIn 0.061(1)
2118.5 214Bi 0.64(3)
2204.0 214Bi 3.73(8)
2223.3 p(n,γ)d 5.813(13)
2390.5 116mIn 0.015(1)
2448.5 214Bi 0.51(3)
2614.5
208Tl
208Pb(n, n′γ)
39.39(25) 16.3(1) 0.729(5)
2650.3 206Pb(n, n′γ)? 0.011(1)
2686 sum 208Tl? 0.12(1)
2892 sum 208Tl 0.08(1)
3061.9 207Pb(n, n′γ) 0.010(1)
C. Neutron Spectra Simulation
Fast neutrons (from 100 MeV to 1 GeV or more) tend
to produce additional neutrons through nuclear reactions
as they traverse high-Z material. In particular the flux
of neutrons will increase several-fold while the average
neutron energy decreases through these processes. As a
result, fast neutrons will penetrate deep into a shield pro-
ducing additional neutrons at lower energies. These low
energy neutrons (∼< 20 MeV) give rise to a substantial
γ-ray flux because (n,n
γ) cross sections are large near
10 MeV, but become small at higher energies. Hence
it is these secondary lower-energy neutrons that inter-
act with the shield and detector materials to produce γ
rays, which can give rise to background in double-beta
decay experiments. To understand the process by which
high energy neutrons influence the low-energy neutron
flux and, in turn, the observed γ-ray flux, we simulated
neutrons impinging on an outer shield and tracked how
their spectrum changed as the particles traversed the
shield. We also simulated the production of neutron-
induced γ rays and how the Ge detector responded to
Energy (keV)
0 50 100 150 200 250 300 350
AmBe Spectrum
Background Spectrum
Energy (keV)
400 500 600 700 800 900
AmBe Spectrum
Background Spectrum
Energy (keV)
1000 1200 1400 1600 1800 2000
AmBe Spectrum
Background Spectrum
Energy (keV)
2000 2200 2400 2600 2800 3000 3200
AmBe Spectrum
Background Spectrum
FIG. 2: The AmBe and background spectra taken with the
CLOVER.
them. Specifically, we performed simulations of several
geometries including:
1. A simulation of the cosmic-ray produced neutrons
with energy up to 1 GeV at our lab in Los Alamos
and their propagation through a 10-cm Pb shield.
The response of the CLOVER detector to γ rays
produced by neutron interactions in the shield was
simulated. This simulation, compared to our back-
ground data, tests the precision to which we can
model neutron production, scattering with sec-
ondary neutron production and (n,n’γ) interac-
tions.
2. A simulation of the neutron flux induced on the
CLOVER from the AmBe source (neutron energy
≤11.2 MeV) passing through 15 cm of polyethylene
before impinging on the 10-cm Pb shield. Since the
neutron flux is of low energy, this simulation tests
the precision to which we model (n,n’γ) interac-
tions.
3. A simulation of the neutron flux with neutron en-
ergies up to a few GeV expected at 3200 mwe deep
due to cosmic-ray µ interactions in the surround-
ing rock and 30-cm lead shield and the resulting
response of the CLOVER to the flux of γ rays aris-
ing from this flux. This simulation permits us to
estimate rates in detectors situated in underground
laboratories.
The first two of these simulations are to verify the code’s
predictive power. The third is to aid in understanding the
utility of depth to avoid neutron-induced backgrounds.
The simulation package GEANT3-GCALOR [18, 19]
is described in detail in Ref. [10]. In general, (n, n′)
reactions leave the target nucleus in a highly excited
state which subsequently decays via a γ-ray cascade to
the ground state. In the simulation, inelastic scattering
cross sections for excitation to a given level depends on
the properties of the ground and excited states. These
cross sections were calculated using in-house-written
code based on Hauser-Feshbach [20] theory modified by
Moldauer [21]. The validation of the Hauser-Feshbach
theory has been the subject of several studies [22, 23, 24].
The simulated γ-ray flux arises from the relaxation of the
initial excited-state distribution, which includes a large
number of levels (60 states for 208Pb(n, n′γ) reactions,
for example). The nuclear levels and their decay chan-
nels were provided by the ENSDF[25] database through
the GEANT package. Note however, that the simula-
tion did not predict every possible transition. In particu-
lar the important 2041-keV and 3062-keV emissions from
Pb were not part of this simulation. This situation arises
because the simulation packages only have (n, n′) cross
sections for the lowest lying excited states for most nu-
clei. It is set to zero for most other levels. The details of
this simulation are described in detail in Ref. [10]. Here
we study the effectiveness of the simulation to predict
spectra resulting from (n,n’γ).
The simulation was done by generating neutrons with
the appropriate energy spectrum outside the lead shield
and propagating them through the shield including sec-
ondary interactions that may add to the neutron flux and
alter the energy spectrum. Fig. 3 shows a comparison
between the data and the simulations for the CLOVER
background run and AmBe run. Note only neutrons
as primary particles were simulated for this comparison
and the dominant difference between the two spectra is
due to the room’s natural radioactivity and non-neutron
µ-induced processes. Here we excluded those processes
from the simulation to emphasize the spectral shape, in-
cluding lines, that are a direct result of neutron inter-
actions. The similarity of the spectra in Fig. 3 indicates
that the measured background spectrum is dominated by
neutron-induced reactions.
The uncertainty in the simulation is calculated by com-
paring the well known peaks in Table II which shows
a comparison of the simulation to the line production
for both background and AmBe runs. The measured
neutron produced lines are within about 5% of the pre-
dicted values from simulation, as is the continuum rate
in the AmBe data. Therefore the (n,n’γ) rates are well-
simulated for nuclear states with well-defined cross sec-
tions. (The continuum for the background data includes
processes that were not simulated and hence is not a
good measure of the uncertainty.) Because the neutron
flux estimates come from these line strengths (See Sec-
tion III), the uncertainty in the flux cancels in these es-
timates. The uncertainty in the measured neutron flux
and spectrum underground(≈35%) constrains the preci-
sion to which such simulations can be verified and is well
described in Ref. [10]. This 35% uncertainty due to the
flux is much larger than the uncertainty for the γ-ray
line production described above. Therefore, a total un-
certainty of 35% is used for all predictions of line rates
underground throughout this paper.
III. THE NEUTRON FLUX
In this section, we use the data to determine the neu-
tron fluxes we observed during our various experimental
configurations. We then compare our measured cosmic-
ray induced flux with that predicted from past measure-
ments and our simulation.
A. Ge(n, n′γ) Analysis
Spectral lines that indicate neutron interactions in nat-
ural Ge detectors have been studied previously. See Ref-
erences [26, 27, 28, 29], for example. In particular, the
sawtooth-shaped peaks due to 72,74Ge(n, n′) at 693 keV
and 596 keV respectively are clear indications of neu-
trons and have been used to deduce neutron fluxes [30].
Operating Ge detectors in a low-background configura-
tion, these lines can be used to help interpret the back-
ground components. Recent double-beta decay experi-
ments [5, 6] have constructed their detectors from Ge en-
riched in isotope 76. Although an appreciable amount of
74Ge remained (14%), 70,72,73Ge are depleted. For such
Energy (keV)
500 1000 1500 2000 2500 3000
Am-Be, MC
Am-Be, Data
Background, Data
Background, MC
Am-Be, MC
Am-Be, Data
Background, Data
Background, MC
Energy (keV)
500 550 600 650 700 750 800
Am-Be, MC
Am-Be, Data
Background, Data
Background, MC
Am-Be, MC
Am-Be, Data
Background, Data
Background, MC
FIG. 3: Comparison of the measured and simulated AmBe
spectra for the CLOVER detector surrounded by 10 cm lead
and 15 cm of moderator. The upper plot shows the energy
range between 10 - 3100 keV. The lower plot shows the range
470 - 830 KeV where the most significant (n, n′γ) lines can
be seen. The simulated AmBe neutron spectrum was nor-
malized to the AmBe source strength for 6.13-h live-time.
The measured total neutron flux in the background spectrum
(see Section IIIC) was used to normalize the simulated back-
ground spectrum. Note, only neutrons as primary particles
were simulated for this comparison and the difference between
the spectra is due to the room’s natural radioactivity and non-
neutron, µ-induced processes.
detectors, only lines originating in isotopes 74 and 76
are useful for neutron interaction analysis. As these ex-
periments reach for lower background, neutron-induced
backgrounds become a greater concern and the diagnos-
tic tools more important.
Neutrons from (α,n) and fission reactions have an en-
ergy spectrum with an average energy similar to the
AmBe spectrum used in this study. Furthermore, the
average energy of the AmBe neutrons is similar to that
of the neutrons within the hadronic cosmic-ray flux im-
pinging on our surface laboratory although the latter ex-
tend to much higher energies. Therefore the Ge-detector
signatures indicating the presence of neutrons described
above will be similar to those arising from neutrons origi-
nating from the rock walls of an underground laboratory.
However, low-background experiments that use Ge de-
tectors are typically deep underground and are shielded
TABLE II: A comparison of the simulated to measured rates
(Background: per hour and AmBe: Hz) for several lines pro-
duced by neutron interactions. The 2041-keV and 3062-keV
lines are not included in the simulation.
Process γ-ray Background-CLOVER
Energy Simulation Measurement
74Ge(n, n′γ) 596 keV 56.21 59.90(30)
74Ge(n, n′γ) 254 keV 2.63 2.76(7)
76Ge(n, n′γ) 2023 keV 3.2×10−7 below sensitivity
206Pb(n, n′γ) 537 keV 4.82 5.12(9)
207Pb(n, n′γ) 898 keV 6.21 6.28(10)
206Pb(n, n′γ) 1706 keV 0.69 0.72(3)
206Pb(n, n′γ) 2041 keV none not seen
Continuum region 2000-2100 keV 110.2 187.35(19)
207Pb(n, n′γ) 3062 keV none not seen
Process γ-ray AmBe-CLOVER
Energy Simulation Measurement
74Ge(n, n′γ) 596 keV 1.8 1.87
74Ge(n, n′γ) 254 keV 0.36 0.41
76Ge(n, n′γ) 2023 keV 8.5×10−4 below sensitivity
206Pb(n, n′γ) 537 keV 0.15 0.16
207Pb(n, n′γ) 898 keV 0.14 0.20
206Pb(n, n′γ) 1706 keV 0.04 0.04
206Pb(n, n′γ) 2041 keV none not seen
Continuum region 2000-2100 keV 7.01 7.33
207Pb(n, n′γ) 3062 keV none 0.01
from environmental radioactivity by a thick shield. This
shield, typically made of Pb, is then usually surrounded
by a neutron moderator. This configuration is effective at
greatly reducing the neutron flux originating from (α,n)
and fission reactions in the cavity walls of the under-
ground laboratory. In contrast, although neutrons origi-
nating from µ interactions underground are much rarer,
they have much higher energy. Therefore these µ-induced
neutrons can penetrate the shield more readily and be-
come a major fraction of the neutrons impinging on the
detector.
B. The AmBe Neutron Flux
The estimate of the flux of neutrons with energies
greater than 692 keV is given by [30, 31, 32]
Φn = k
, (1)
where I is the counts s−1 under the asymmetric 692-keV
peak, V is the volume of the detector in cm3 (566 cm3)
and k is a parameter found by Ref. [30] to be 900 ± 150
cm. For the 15-cm moderator data, this formula predicts
a neutron flux of 2.3/(cm2 s) whereas our simulation,
using the known flux of the source, predicts 1.8/(cm2
s). This difference (20-30%) is somewhat greater than
the 17% uncertainty claimed by Ref. [30]. The geometry
for our measurement was complicated and perhaps this
added complexity of neutron propagation contributes to
the difference. For the uncertainty associated with the
flux of neutrons produced from cosmic ray µ, we use the
35% value as it is much larger than the value associated
with Eqn. 1.
For the Am-Be neutron source, the rate in the 692-keV
peak is 2.406 ± 0.008 Hz. This results in Φamben = 3.8
± 1.1 /(cm2 s). This rate is an average over the two
moderator configurations. The neutron flux during the
10-cm moderator run is estimated to be about a factor
2.3 larger than for the 15-cm moderator run. For the
PopTop-AmBe run on Pb for the raw data (in coinci-
dence with the NaI detector), the effective flux was 8.6
± 2.6/(cm2 s) (0.26 ± 0.08 /(cm2 s)).
C. Cosmic-ray Induced Neutron Fluxes
In the background spectrum the rate in the 692-keV
peak is 87.7 ± 0.4/hr. Using Eqn. (1) with I = (2.44 ±
0.06)×10−2 Hz for the background spectrum, one obtains
a fast neutron flux of Φbackn = (3.9 ± 1.2)×10
−2 /(cm2
s) at the detector in our surface laboratory.
Ref. [30] provides a similar formula to estimate the
thermal neutron flux, which is accurate to approximately
30%. Using the intensity of the 139.68-keV γ-ray line of
75mGe:
980I139.68
139.68 + 1.6)V
, (2)
139.68 ≃ 1−
1− e−V
V 1/3
where I = 47.2 ± 0.3 /h = 0.013 Hz is the event rate in
the peak of 139.68-keV line and V is the volume of the
detector in cm3. Using V = 566 cm3 we obtain Φbackth =
(9.1 ± 2.7)×10−3 /(cm2 s). We also measure the thermal
neutron flux for the Am-Be neutron source, Φambeth = 1.6
± 0.5 /(cm2 s).
Thus the total neutron flux incident on the Ge detector
measured for the background run is approximately Φbacktot
= Φbackn + Φ
th = (4.8 ± 0.7) × 10
−2 /(cm2 s).
D. Neutron Flux as a Function of Depth
In our basement laboratory, there are 3 primary
sources of environmental neutrons. The largest contri-
bution comes from the hadronic cosmic ray flux. The
next largest arises from µ interactions in the 77 g/cm2
thick overhead concrete layer in the building. Finally
there is the negligible contribution from (α,n) and fission
neutrons from natural radioactivity in the room. The at-
mospheric depth at the altitude of our laboratory is 792
g/cm2. Including the concrete, the depth is 869 g/cm2.
Using the analysis of Ziegler [33, 34], the flux at our lab
due to the hadronic flux can be estimated to be 3.0 times
larger than that at sea level. The flux at sea level has
been measured to be 1.22 × 10−2 /(cm2 s) [35] result-
ing in a flux in our laboratory of 3.7 × 10−2 /(cm2 s).
To estimate the additional neutron flux originating from
µ interactions in the concrete above our laboratory, we
rely on our simulations of neutron generation and prop-
agation. The simulation predicts 1.4 × 10−2 /(cm2 s)
(3.3 × 10−2 /(cm2 s)) for the muon-induced (hadronic)
neutron flux inside the lead shield for a total simulated
neutron flux of 4.7 × 10−2 /(cm2 s) in acceptable agree-
ment with our measurement of (4.8 ± 2.2) × 10−2 /(cm2
s) = (1.51 ± 0.69) × 106 ± /(cm2 y). The success of this
simulation lends credence to the neutron flux estimate in
the following sections.
The neutron flux onto the detector will be increased
due to the neutron interactions with shield materials and
neutron back-scattering from the cavity walls. For exam-
ple, our simulations show that the fast neutron flux will
increase by a factor of ≈ 10 by traversing a 30-cm lead
layer. Also, neutrons will backscatter from the cavity
walls and reflect back toward the experimental appara-
tus, effectively increasing the impinging neutron flux by
a factor of 2-3 depending on the specific geometries of
the detector and experimental hall. Therefore, it is im-
portant to account for these effects when estimating the
neutron flux at the detector.
Muon-induced neutron production in different shield-
ing materials and in the detector itself was also stud-
ied in Ref. [10]. For example, with 30 cm of lead sur-
rounding a CLOVER-style detector at 3200 mwe, the to-
tal muon-induced neutron flux impinging on the detector
was calculated to be (8.6 ± 4.0) × 10−8 /(cm2 s) = 2.7
± 1.2/(cm2 y). Some of the interactions resulting from
these neutrons would be eliminated by a µ veto. Assum-
ing a veto efficiency of 90% for muons traversing this lead
shield, the effective neutron flux is estimated to be (2.0
± 0.9) × 10−8 /(cm2 s) = 0.63 ± 0.29 /(cm2 y). The
energy spectrum of neutrons at the lead/detector bound-
ary at 3200 mwe is shown in Fig. 4 and has an average
value of 45 MeV.
The average energy of the µ-induced neutrons is 100-
200 MeV and much higher than that of (α,n) neutrons
(≈ 5 MeV). The simulated flux of the µ-induced neutrons
((2 ± 0.9) × 10−8 /cm2 s = 0.63 ± 0.29/cm2 y) inside
the detector shield at a depth of 3200 mwe is a factor 2.4
greater than the simulated (α,n) flux surviving the shield
((0.85 ± 0.39)× 10−8 /cm2 s = 0.27 ± 0.12/cm2 y). The
average energy of these (α,n) originating neutrons is 3-5
MeV at the detector surface.
With this estimate of the neutron flux at depth and
with our measurements of the Pb and Ge neutron-
induced detector response, we can proceed to estimate
these processes in underground Ge-detector experiments.
There are effects in addition to the incident flux, however,
that must be taken into account when extrapolating our
surface laboratory results to different geometries and lo-
Energy (MeV)
10 1 10
: Lead/Detector boundary!
,n):  Lead/Detector boundary!(
FIG. 4: The effective neutron flux onto the simulated detector
described in the text at a depth of 3200 mwe. Shown are the
neutron flux from two sources: (1) the effective neutron flux
induced by muons that transverse the surrounding rock and
shielding materials assuming a 90% muon-veto efficiency and
(2) the neutron flux from (α,n) reactions in the rock.
cations.
1. As the thickness of the Pb shield increases, addi-
tional secondary neutrons will be generated. Our
simulation predicts that a factor kshield = 2.16
more neutrons will be produced by a 30-cm thick
shield as compared to a 10-cm thick shield.
2. As the energy of the neutrons increases, the number
of multiply scattered neutrons increases and there-
fore the number of interactions that might produce
a γ ray increases. For the average energy of neu-
trons at our surface laboratory (at 3200 mwe), the
average scattering length is λL=7.1 cm (λUG = 12.5
3. Also as the energy increases, the number of states
that can be excited in the target nucleus increases.
In the shield at our surface laboratory (at 3200
m.w.e), the average neutron energy is 6.5 MeV (45
MeV).
All of these factors can be incorporated into a scaling
formula derived from our simulation. The rate (RUGROI)
of background near the region of interest (ROI) in an
underground laboratory can be related to that measured
in our surface laboratory (RLROI) as
ROI = (1 +
)kshield((
EUGn − Ex
ELn − Ex
)0.8)
ROI ,
where ΦLn (Φ
n ) is the neutron flux in our surface labo-
ratory (at 3200 mwe), En is the neutron energy and Ex
is the excitation energy for a typical level. This formula
reproduces our simulated results well and the uncertainty
of its use is dominated by the precision of simulation. Us-
ing the 2.6-MeV level in 208Pb as an example, RUGROI ∼
1.7 × 10−5RLROI . Fig. 5 shows a comparison between the
Monte Carlo simulation and the scaling formula Eqn. 4
for several lines.
-ray Energy (keV)γ
500 1000 1500 2000 2500
-610×
Monte Carlo simulation
Scaling formula
FIG. 5: The comparison between the Monte Carlo simulation
of a detector as described in the text and the scaling formula
for several excitation lines. The 35% uncertainties shown in
this figure arise from the cross section uncertainty and the
statistical uncertainty of determining the peak counts in the
simulated spectra. The latter dominates.
IV. ANALYSIS
A. Pb(n, n′γ) analysis
If fast neutrons are present, then one will also see γ-ray
lines from Pb(n, n′) interactions. In very low background
configurations, γ rays from neutron-induced excitations
in 208Pb and 207Pb can be masked by or confused for de-
cays of 208Tl and 207Bi respectively. Therefore it is the
stronger transitions in 206Pb (537.5, 1704.5 keV) that are
most useful for determining if these processes are taking
place. In 207Pb the relative strength of the 898-keV tran-
sition, compared to the 570- and 1064-keV transitions, is
much stronger when it originates from 207Pb(n,n’γ) as
opposed to 207Bi β decay to 207Pb. Therefore this line
can also be used as a tell-tale signature of neutron inter-
actions.
Our data show indications of 206,207,208Pb(n, n′γ). As
noted earlier, the 2614-keV γ ray from 208Pb can orig-
inate from 208Tl decay or from 208Pb(n,n’γ). The 692-
keV peak arises only from neutron interactions on 72Ge.
Since the Pb shielding was similar in both the background
and AmBe runs, we can compare the ratio of the rate in
the 2614-keV peak to that in the 72Ge(n, n′γ) 692-keV
peak in the two data sets to deduce the fraction of the
2614-keV in the background run that can be attributed
to neutron interactions. This ratio in the Am-Be spec-
trum is 0.30 and that in the background spectrum is 0.45,
and therefore, we conclude that ≈ 67% of the strength
in the background run is due to neutron reactions and
the remainder is due to 208Tl decay. Clearly, in our sur-
face laboratory, environmental neutrons are a significant
contributor to the observed signal.
B. The Special Cases of the 2023-keV, 2041-keV
and 3062-keV γ rays
The 2023-keV level in 76Ge can be excited by neutrons.
The simulation predicts that this line is too weak to be
seen in the CLOVER AmBe data, but the CLOVER is
built of natural Ge. In the enriched detectors planned
by future double-beta decay experiments, the fraction of
isotope 76 is much larger and this line would be enhanced.
Still our simulation (Table V) predicts it would be a very
small peak.
The 3714-keV level in 206Pb can emit a 2041-keV γ
ray. We only observed a candidate γ-ray peak in the
coincidence data (Ge detector event in coincidence with
a 4.4-MeV γ ray in the NaI detector) with the AmBe
source radiating the Pb shield around the PopTop detec-
tor. The magnitude of this peak if it exists is small and
not convincing. We use this data set to place a limit on
the production rate of this line as it results in the most
conservative limit.
In the AmBe-irradiated CLOVER and the non-
coincidence PopTop spectra, we observed a 3062-keV γ
ray that we assign to a transition from the 3633-keV level
in 207Pb. (See Fig. 6.) This line is only present when
Pb surrounds the detector: It is absent when Cu forms
the shield. The statistical sensitivity was too weak in
the PopTop coincidence spectrum to observe this weak
line. In the CLOVER AmBe spectrum, the rate of this
line is 5.3 × 10−3 that of 596-keV 74Ge peak rate and
in the raw AmBe PopTop with Pb spectrum the ratio is
4.5 × 10−3. From these data we can estimate an approx-
imate rate that these dangerous backgrounds would be
produced for a given neutron flux. In our surface labora-
tory, the CLOVER background rate for the 74Ge 596-keV
peak was 59.9 events/hr. This leads to a predicted rate of
0.3 events/hr in the 3062-keV peak. Note that our data
indicate that any peak at 3062 keV is statistically weak
(≤ 0.2 events/hr) but reasonably consistent with this pre-
diction. Note, in the AmBe-CLOVER runs, polyethylene
blocks were used to increase the flux of thermal neutrons.
It appears that these blocks contain some Cl and there-
fore we see indications of Cl(n,γ) lines. Even though 35Cl
has a neutron capture line at 3062-keV, we do not assign
the observed line in our data to that process. Because the
35Cl(n,γ) line at 2863 keV is not observed and because
the line at 1959 keV is weak, we conclude that assigning
this line to Cl would be inconsistent with the predicted
line ratios for neutron capture. However, a concern re-
garding our assignment of the 3062-keV line to 207Pb is
that one also expects a 2737-keV emission from the same
3633-keV level. This companion γ ray is not observed
in our data and we plan future measurements dedicated
to measuring the neutron-induced relative intensities of
these two lines. If we assume that the entire rate (0.01
Hz) of the 3062-keV line is due to (n, n′γ), we can make a
crude estimate of the cross section by scaling to the rate
in the 2614-keV line. The cross section for the 2614-keV
(n,n’γ) is 2.1 b ± 10% [23]. Using this cross section, the
relative rates in the two peaks, the different isotopic ra-
tios of 208Pb and 207Pb, and the different γ-ray detection
efficiencies, the average cross section for 207Pb(4.5-MeV
n, n′ 3062-keV γ ray) is estimated to be 75 mb. The
uncertainty is estimated to be about 20% or ∼ 15 mb.
Energy (keV)
3000 3020 3040 3060 3080 3100
AmBe activation, 6.1 h with 6" poly, 3.6 h with 4" poly
FIG. 6: The energy spectrum near 3000 keV showing the
207Pb(n, n′γ) 3062-keV γ-ray line in the AmBe spectrum with
the CLOVER.
From measurements with the CLOVER and a 56Co
source, which has γ-ray energies near 3100 keV, we expect
0.13 DEP events per full-energy γ-ray event. Therefore,
in our surface lab, we expect 0.03 events/hr in the DEP
at 2039 keV due to 207Pb(n, n′γ). This is well below
our continuum rate of 2.5 events/keV-hr or 10/hr in an
energy window corresponding to a 4-keV wide peak.
Since our simulation does not predict all these lines, we
summarize the measured rates normalized to the neutron
flux in Table III to provide simple scaling to different ex-
perimental configurations. The uncertainties in Table III
are estimates based on a minor contribution of the sta-
tistical uncertainty in the peak strengths and a major
contribution resulting from the ≈ 35% uncertainty in the
neutron flux determination as described in Section III B.
Because the uncertainty is mostly systematic, there is a
good possibility that the total uncertainties for each in-
dividual measurement are correlated. Therefore, to esti-
mate the average values in this Table, we took a straight
average of the individual values and then assigned an un-
certainty equal to the largest fractional value. This pro-
cedure, although not rigorous, is more conservative than
a weighted average. In addition, some peaks were not
observed in all spectra. The upper limits on the strength
of these peaks were estimated from the rates of weak-
est peaks observed near the associated energy region in
the spectrum. Such peaks are considered to represent
the level of sensitivity of our peak detection procedures.
The 2041-keV line is a special case. We quote an up-
per limit based on the only spectrum that indicated a
possible peak.
These measurements were done for a CLOVER de-
tector inside a 10-cm Pb shield. The relative energy-
dependent efficiency (ǫrel) for a full-energy peak in the
CLOVER can be approximated by,
TABLE III: The raw count rates for select processes normalized to neutron flux of 1/(cm2y). When extrapolating to neutron
energies distant from that near the measurements, the uncertainty (35%) associated with the extrapolation must be included.
See text for a discussion of the uncertainty estimates in this table especially with respect to the average.
Process Rate Rate Rate Rate
AmBe-CLOVER AmBe-PopTop Background Average
( events
206Pb(n, n′537− keV γ) 13.9±4.9 20.5±7.2 12.2±4.3 15.5±5.4
74Ge(n, n′596− keV γ) 164±57 unresolveda 142±50 153±54
207Pb(n, n′898− keV γ) 17.5±6.1 21.2±7.4 14.9±5.2 17.9±6.3
206Pb(n, n′1705− keV γ) 3.6±1.3 3.2±1.1 1.7±0.6 2.8±1.0
206Pb(n, n′2041− keV γ) <6.3 < 0.5 <1.7 < 5.0b
207Pb(n, n′3062− keV γ) 0.88±0.31 0.6±0.21 <1.0 0.7±0.2
( events
(keV ty)
Continuum Rate from Pb,Ge(n, n′γ) 2.6±0.9 2.0±0.7 2.5±0.9 2.4±0.8
aIn the PopTop data, the 596-keV line was not resolved from
nearby lines.
bThe 2041 line was not observed in any of our spectra, however,
a weak peak-like feature was present in the AmBe-PopTop coinci-
dence data. We used the upper limit for the rate in that peak as
the ”average” as we considered this to be most conservative.
ǫrel = 0.15 + 0.93e
−(Eγ−148)
766 , (5)
where Eγ is the γ-ray energy in keV. This expression is
normalized to 1.0 at 209 keV and is estimated to have an
accuracy of about 20% near 200 keV improving to about
10% at 2600 keV. The quoted relative efficiency for each
of the 4 individual CLOVER detectors is 26% at 1.33
MeV as quoted by the manufacturer. Table III does not
incorporate this efficiency correction, therefore the table
presents the measured count rates with a minimum of
assumptions. The thickness of Pb is large compared to
the mean free path of the γ rays of interest, therefore, the
scaling should hold for other thick-shield configurations.
Even so, the rates will be geometry-dependent so these
results can only be considered guides when applied to
other experimental designs. The rate of these excitations
also depends on neutron energy. For the background run
(AmBe run) the average neutron energy is ≈ 6.5 MeV (≈
5.5 MeV). Our simulations predict that the rate of these
excitations scales as energy to the 0.81 power.
V. DISCUSSION
A. A Model of the CLOVER Background
We can use these experimental results to create a back-
ground model for our surface lab and deduce the contri-
bution to the continuum near 2039 keV due to (n,n’γ) re-
actions. We then use simulation of high-energy neutron
production and propagation to extrapolate this model
to better understand experiments done at depth. The
measured rate for the continuum near 2039 keV was 14.8
events/(keV kg d). For the Th-wire data, this continuum
rate was 0.10 events/(keV s) (2900 events/(keV kg d))
and for the AmBe data it was 0.09 events/(keV s) (2600
events/(keV kg d)). To determine the neutron-induced
continuum rates in the AmBe data, however, we have
to correct for the contribution from the tail of two high-
energy γ rays that are not part of the neutron-induced
spectrum in the background. These are the γ rays from
the 2223-keV p(n, γ)d and the 4.4-MeV γ rays originat-
ing from the (α,n) reaction of the AmBe source itself.
Although only ≈ 10% of these AmBe γ rays penetrate
the 5-cm Pb shield, there is still a significant flux.
A simple simulation of the detector response to 2223-
keV γ rays can easily determine ratio of the rate in the
2039-keV region to that in the full-energy peak. Simula-
tion indicates that this ratio is 5.2 × 10−2 /keV. Since
the full-energy peak count rate is 5.813 Hz, we find this
contribution to the continuum is 0.03 events/(keV s). For
the 4.4-MeV γ ray from the source itself, simulation must
determine an absolute rate in the continuum because the
high-energy threshold prevented the observation of the
full-energy peak or its escape peaks. The simulation pre-
dicts 0.03 events/(keV s). Subtracting these two contri-
butions from the continuum rate for the AmBe source
near 2039 keV results in a final value of 0.03 events/(keV
s) or 860 events/(keV kg d).
Our background measurements were done without a
cosmic ray anti-coincidence system. From auxiliary mea-
surements with a scintillator in coincidence with the
CLOVER and a similar shielding geometry, we measured
the rate of µ passing through the detector. In the contin-
uum near the 2039-keV region, the rate is 5.4 events/(keV
kg d).
From the Th-wire source data, we measure the ratio
TABLE IV: A summary of the count rate in the CLOVER
background data in the energy region near 2039 keV based
on the model deduced for the surface lab described in the
text. The precision of the neutron-induced and muon-induced
spectra simulations (Section IIC and Ref [10]) is estimated to
be about 35%. We take this to be a conservative estimate for
the uncertainties associated with this Table.
Process CLOVER Event Rate
Surface Lab
events/(keV kg d)
neutron-induced 8.3±2.9
208Tl Compton scattering 0.7±0.3
high energy µ continuum 5.4±1.9
Total from model 14.4±5.0
Measured Rate 14.8±0.2
of the rate in the continuum near 2039 keV to that in
the 2614 keV (16.3 Hz) peak to be 6 × 10−3/keV. Of
the 2614-keV peak rate in the background data, ≈ 33%
is due to 208Tl decay. Scaling from the 2614-keV peak in
the background data, the count rate near the 2039-keV
region due to the Compton tail of the 208Tl 2614-keV
peak is 0.7 events/(keV kg d).
The remainder of the 2614-keV peak is due to neutron-
induced processes. The contribution due to neutrons can
be estimated from the AmBe data. For the AmBe data,
the ratio of the rate in the continuum near the 2039-
keV region (0.03 events(keV s)) to that for the 596-keV
74Ge(n, n′γ) peak (1.87 Hz) is 1.6 × 10−2/keV. Scaling
from the 74Ge peak rate in the background data (59.9/h)
indicates a rate of 7.8 events/(keV kg d) in the contin-
uum near 2039 keV. That is, 53% of the events in that
region are due to neutrons. One can do a similar scal-
ing from the 692-keV 72Ge rates. Here the ratio is 1.3 ×
10−2/keV and the continuum rate is 8.8 events/(keV kg
d). We use the average of the Ge values as our estimate
(8.3 events/(keV kg d) = 3030 events/(keV kg y)) for
the neutron induced contribution to the continuum rate.
Table IV summarizes the deduced contributions to the
spectrum in the 2039-keV region in the CLOVER back-
ground spectrum and the following section discusses how
these data are used along with simulation to estimate
rates in experimental apparatus underground.
B. Solving the Problem with Overburden
The primary purpose of this study is to better under-
stand the impact of neutrons on the background for fu-
ture double-beta decay experiments. In this subsection,
we use neutron fluxes from our simulations of the surface
laboratory, measurements with the AmBe source, and
simulations of the neutron flux in an underground lab-
oratory to estimate the contribution of neutron-induced
backgrounds underground. In the following subsection,
we examine data from previous underground experi-
ments.
The simulation of neutron processes in the 10-cm
Pb shield and Ge comprising the CLOVER detector at
the altitude of our laboratory predicts about 1594±558
events/(keV kg y) between 2000 and 2100 keV due to lead
excitation and about 1337±468 events/(keV kg y) in this
energy region due to germanium excitation. Our mea-
sured value for the neutron-induced events is 3030±1061
events/(keV kg y) to be compared with this predicted
value of 2931±1026 events/(keV kg y).
The simulation of the CLOVER within a 30-cm
lead shield at a depth of 3200 mwe, predicts about
0.019±0.007 events/(keV kg y) contributed from lead
excitation and about 0.016±0.006 events/(keV kg y)
contributed from germanium excitation for a total of
0.035±0.012 events/(keV kg y). One can also just
scale our surface-laboratory measurement of the neutron-
induced rate near 2039 keV by the factor derived from
Eqn. 4 above. This results in 0.05±0.02 events/(keV
kg y). For a detector like the CLOVER, analysis based
on pulse shape discrimination (PSD), and the response
of individual segments or crystals can help reduce back-
ground based on its multiple-site energy deposit nature.
These backgrounds can then be distinguished from the
single-site energy deposit character of double-beta decay.
We have measured the background reduction factor via
these techniques to be ≈ 5.9 for the CLOVER [12].
Reference [10] provides a quick reference formula to
estimate the neutron flux as a function of depth. The
µ flux and its associated activity is reduced by ≈ 10 for
each 1500 m.w.e of added depth. Future double-beta
decay experiments hope to reach backgrounds near 0.25
events/(keV t y). Our estimate of the rate at 3200 mwe
is 35-50 events/(keV t y), which is a factor of 150 above
the goal. Hence, greater depths would be desirable.
C. Discussion of Previous Underground
Experiments
Previous Ge-based double-beta decay experiments con-
ducted deep underground [5, 6] set the standard for low
levels of background. The future proposals however [8, 9]
hope to build experiments with much lower backgrounds.
In this subsection, we estimate how large the neutron
contribution was to the previous efforts and future de-
signs. Using the scaling summarized in Eqn. 4, we can
compare the expectations of our simulated underground
apparatus with previously published results. Table V
shows this comparison. This table also presents a sum-
mary of how the rates would be affected by a change in
depth only. The IGEX collaboration [5] has not pub-
lished its data in sufficient detail to do a similar com-
parison. Other underground Ge detector experiments do
not have the required sensitivity.
The Heidelberg-Moscow experiment [6] is a critical
case study for such backgrounds and it was operated at
TABLE V: A summary of the key count rates arising from neutron interactions in the CLOVER background data in the energy
region near 2039 keV as predicted by our analysis for three representative depths. The shield thickness is taken to be 30 cm
and a veto system with an assumed efficiency of 90% is included. Except for the 2023-keV line, we used the scaling of Eqn. 4 to
scale our CLOVER background measurements to the 3200 mwe depth and then used the muon fluxes at WIPP, Gran Sasso, and
SNOLAB[10] to scale to the other depths. The Ge rates are also scaled for an enriched detector (86% isotope 76, 14% isotope
74). The scalings require the results from the simulations. The uncertainty is dominated by the simulated flux uncertainty
and is estimated to be 35%. Since we did not observe the 2023-keV line, we used simulation to predict the rate. We used the
measured upper limit for the 2041-keV line. For comparison, the results of Ref. [37] is shown. Reference [37] claims a result for
zero-neutrino double-beta decay in an experiment performed at 3200 mwe. We entered the claimed event rate for that process
in the same row as the 2041-keV line for comparison. The rate limits for the other lines assigned to Ref.[37] result from our
estimates based on the figures in their papers and does not come directly from their papers.
Process 1600 mwe 3200 mwe 6000 mwe Ref. [37]
74Ge 596 keV 19400±6800/(t y) 1130±400/(t y) 15±5/(t y) <800/(t y)
76Ge 2023 keV 30±10/(t y) 2±1/(t y) 0.02±0.01/(t y) 300/(t y)
206Pb 537 keV 4400±1500/(t y) 250±88/(t y) 3.4±1.2/(t y)
207Pb 898 keV 5300±1900/(t y) 310±110/(t y) 4.2±1.5/(t y)
206Pb 1705 keV 610±210/(t y) 36±13/(t y) 0.5±0.2/(t y)
206Pb 2041 keV <1300±450/(t y) <74±26/(t y) <1.0±0.3/(t y) 400/(t y)
207Pb 3062 keV 145±51/(t y) 8.4±2.9/(t y) 0.1±0.03/(t y) <71/(t y)
continuum 880±310/(keV t y) 50±17/(keV t y) 0.7±0.24/(keV t y) 110/(keV t y)
a depth of 3200 mwe. One is clearly led to consider if
the 3062-keV γ ray can explain the signal reported in
Ref. [36, 37]. Figure 36 in Ref. [37], shows that no more
than a few counts can be assigned to a 3062-keV γ ray. If
the 23 counts assigned to double-beta decay were actually
a DEP from this γ ray, the one would expect 175 counts
or so in the 3062-keV peak. Therefore, it is difficult to
explain the claimed peak by this mechanism. It is also
clear that the predicted rate of the 2041-keV γ ray is too
low to explain their data. The data from Ref. [37] show
lines at 570 and 1064 keV and the authors assigned these
lines to 207Bi present in the Cu. However, the spectra dis-
played in Fig. 13 of that paper shows that only detectors
surrounded by Pb indicate the 570-keV line. Since there
is no evidence for the 898-keV line in the data, we agree
with the 207Bi assignment, however, we hypothesize that
it must reside in the Pb and not the Cu. Perhaps this
contamination is cosmogenically produced in Pb when it
resides on the surface and not as a result of bomb testing
as hypothesized by the authors[37].
Reference [37] also observed lines at 2011, 2017, 2022,
and 2053 keV. These lines had rates of approximately
500/(t y), 500/(t y), 300/(t y) and 380/(t y) respectively.
The line at 2022 keV is near a line we predict at 2023 keV.
Reference [37] attributes these lines to weak transitions
in 214Bi. From our analysis it is indicated that a negligi-
ble fraction of the peak at 2023 keV is neutron-induced.
However, since the predicted strength of the tell-tale lines
that would indicate a presence of neutron interactions is
just below the sensitivity of that experiment, this conclu-
sion is not without uncertainty. It has been pointed out
that the strength in the 2022-keV line is too strong with
respect to the 214Bi branching ratios even when sum-
ming uncertainties are taken into account [38, 39]. The
analysis in Ref. [38], however, was based on a incorrectly
normalized Fig. 1 in Ref [6]. A recent analysis [39] tak-
ing this into account still points to an inconsistency in
the line strengths. This discrepancy could be resolved if
one attributes a significant fraction of that peak to neu-
tron interactions on 76Ge. Such an attribution is not
supported by our simulations.
Reference [36] simulates the background in the
Heidelberg-Moscow experiment resulting in a predicted
signal of 646 ± 93 counts in the region between 2000 and
2100 keV during an exposure of 49.6 kg-y. This is a count
rate of 130/(keV t y) to be compared with the quoted
measured value for the data period simulated of 160/(keV
t y). Their estimate indicates that only 0.2/(keV t y) are
due to neutrons and they argue that µ-generated neu-
trons are a negligible contribution. Our estimates indi-
cate that neutrons are a more significant contribution and
that the µ contribution is significant. We are aware of no
direct neutron flux measurements for neutrons above 25
MeV. The flux of neutrons with energy greater than 25
MeV is estimated in Ref. [36] to be 10−11/(cm2 s) and
they considered these neutrons to produce a negligible
contribution to the background. In contrast, the simula-
tion in Ref. [10] gives 56 × 10−11/(cm2 s) at 3200 m.w.e
for the neutrons with energy greater than 25 MeV. We
use this higher flux value and as a result, our estimate
of the background rate near 2 MeV of 50/(keV t y) is
comparable to the excess (30/(keV t y)) of the measured
rate in comparison to the simulated rate in Ref. [36].
D. Is Copper an Alternative to Lead?
One has to consider the existence of a DEP line at the
double-beta decay endpoint a serious design considera-
tion for Ge-detector experiments. From the above anal-
ysis, the dangerous lines at 2041 and 3062 keV due to
Pb(n, n′γ) are not significant contributors to the spec-
trum of Ref. [37]. However, as future efforts reduce
the natural activity irradiating the detectors, these Pb-
neutron interactions will become important. One solu-
tion could be the use of Cu as a shield instead of Pb.
Copper is rather expensive and building the entire shield
of this material is probably not necessary. A thick inner
liner of Cu might suffice, but if a peak is observed and
Pb is present near the detector, arguments based on the
spectrum near 3062 keV will be critical.
Although the problematic lines we observed in the Pb
data were absent in our Cu data, the shields were too
dissimilar to make a quantitative comparison regarding
the effectiveness of reducing the continuum background.
Furthermore, our experience with the lead and the sim-
ulation of (n, n′γ) spectra reduces confidence in the con-
clusion regarding the Cu in the absence of such data. We
are preparing better experimental studies to address this
question.
VI. CONCLUSION
As double-beta decay experiments become more sen-
sitive, the potential background must be constrained to
ever-lower levels. Much progress has been made in reduc-
ing naturally-occurring radioactive isotopes from mate-
rials from which the detector is constructed. As these
isotopes that have traditionally limited the experimen-
tal sensitivity are eliminated, rarer processes will become
the dominant contributors. Here we have considered
neutron-induced processes and have quantified them. Re-
actions involving neutrons can result in a wide variety of
contributions to the background. That is, no single com-
ponent is likely to dominate. Therefore, tell-tale signa-
tures for neutrons are needed and were identified in this
work.
In addition to the general continuum background
that neutrons might produce, two specific dangerous
Pb(n, n′γ) lines were identified. These two backgrounds
can be significantly reduced using depth and/or an inner
layer of Cu within the shield. In particular, the 3062-
keV transition in 207Pb has a double escape peak at the
endpoint energy for double-beta decay in 76Ge. A com-
parison of past double-beta decay data indicates the rate
of this transition is too small to explain a claim of double-
beta decay.
VII. ACKNOWLEDGMENT
We thank R.L. Brodzinski for discussions regarding the
historical production of 207Bi and its possible presence in
the environment. We thank John Wilkerson and Jason
Detwiler for useful suggestions and discussion. Finally,
we also thank Alan Poon and Werner Tornow for use-
ful discussions and a careful reading of the manuscript.
This work was supported in part by Laboratory Directed
Research and Development at Los Alamos National Lab-
oratory.
[1] Steve R. Elliott and Petr Vogel, Annu. Rev. Nucl. Part.
Sci. 115 (2002)
[2] Steven R. Elliott and Jonathan Engel, J. Phys. G: Nucl.
Part. Phys. 30, R183 (2004).
[3] F.T Avignone III, G.S. King III and Yuri Zdesenko, New
Journal of Physics (in press 2004).
[4] A.S. Barabash, Physics of Atomic Nuclei, 67, No. 3, 438
(2004).
[5] C.E. Aalseth, et al., Phys. Rev. C v59, 2108 (1999).
[6] H.V. Klapdor-Kleingrothaus, et al., Eur. Phys. J A12,
147 (2001);
[7] H.V. Klapdor-Kleingrothaus, et al., Nucl. Instr. and
Meth. A522, 371(2004).
[8] R. Gaitskell, et al., nucl-ex/0311013 (2003).
[9] I. Abt, et al., “A New 76Ge Double Beta Decay Experi-
ment at LNGS” , hep-ex/0404039 (2004).
[10] D.-M. Mei and A.Hime, Phys. Rev. D73 053004 (2006).
[11] The CLOVER detector is manufactured by Canberra Eu-
rysis, 800 Research Parkway, Meriden CT 06450, USA.
[12] S. R. Elliott, et al. , Nucl. Instr. and Meth. A558 504
(2006).
[13] The PopTop detector is manufactured by ORTEC, 801
South Illinois Avenue, Oak Ridge, TN 37830.
[14] X-Ray Instruments Associates, 8450 Central Ave.,
Newark CA 94560, USA.
[15] Wavemetrics Inc., PO Box 2088, Lake Oswego, OR
97035, USA.
[16] Rene Brun and Fons Rademakers, Nucl. Instr. and Meth.
389, 81 (1997); http://root.cern.ch
[17] Table of Isotopes, Eds. Richard B. Firestone and Virginia
S. Shirley, John Wiley and Sons, Inc., New York (1996).
[18] R.Brun, et al., “GEANT3”, CERN DD/EE/84-1 (re-
vised), 1987.
[19] C.Zeitnitz, et al. , Nucl. Instr. and Meth. in Phys. Res.
A349, 106 (1994).
[20] W. Hauser and H. Feshbach, Phys. Rev. 87, 366 (1952).
[21] P. A. Moldauer, Phys. Rev. 123, 968 (1961).
[22] Richard M. Lindstrom, et al. , Nucl. Instr. and Meth.
A299, 425 (1990).
[23] H. Vonach, et al. , Phys. Rev. C 50 (1994) 1952.
[24] A. Pavlik, et al. , Phys. Rev. C 57, 2416 (1998).
[25] The National Nuclear Data Center, Brookhaven National
Laboratory, Upton, New York, 11973-5000.
[26] K.C. Chung, et al., Phys. Rev. C 2, 139 (1970).
[27] R.L. Bunting and J.J. Kraushaar, Nucl. Instrum. Meth.
118, 565 (1974).
[28] Richard M. Lindstrom, et al. , Nucl. Instrum. Meth.
A299, 425 (1990).
[29] G. Fehrenbacher, R. Meckbach, and H.G. Paretzke, Nucl.
Instrum. Meth. A372, 239 (1996).
http://arxiv.org/abs/nucl-ex/0311013
http://arxiv.org/abs/hep-ex/0404039
http://root.cern.ch
[30] G.P. Škoro, et al. , Nucl. Instrum. Meth. A316, 333
(1992).
[31] P.H. Stelson, et al. Nucl. Instr. and Meth. A98, 481
(1972).
[32] R. Wordel, et al. , Nucl. Instr. and Meth. A369, 557
(1996).
[33] J.F. Ziegler, IBM J. Res. Develop. 40, 19 (1996).
[34] J.F. Ziegler, IBM J. Res. Develop. 42, 117 (1998).
[35] P. Goldhagen, et al. , Nucl. Instr. and Meth. A476, 42
(2002).
[36] C. Dörr, H.V. Klapdor-Kleingrothaus, Nucl. Instr. and
Meth. A513, 596 (2003).
[37] H.V. Klapdor-Kleingrothaus, A. Dietz, I.V. Krivosheina,
and O. Chkvorets, Nucl. Instr. and Meth. A522, 371
(2004).
[38] C.E. Aalseth, et al. , Mod. Phys. Lett. A17, 1475 (2002).
[39] R. L. Brodzinski, “Regarding the Over-Estimation of
the Intensity of the Minor Photopeaks of 214Bi in
the Heidelberg-Moscow Experiment,” PNNL-SA-49010
(2006).
ABSTRACT
  We investigate several Pb$(n,n'\gamma$) and Ge$(n,n'\gamma$) reactions. We
measure $\gamma$-ray production from Pb$(n,n'\gamma$) reactions that can be a
significant background for double-beta decay experiments which use lead as a
massive inner shield. Particularly worrisome for Ge-based double-beta decay
experiments are the 2041-keV and 3062-keV $\gamma$ rays produced via
Pb$(n,n'\gamma$). The former is very close to the ^{76}Ge double-beta decay
endpoint energy and the latter has a double escape peak energy near the
endpoint. Excitation $\gamma$-ray lines from Ge$(n,n'\gamma$) reactions are
also observed. We consider the contribution of such backgrounds and their
impact on the sensitivity of next-generation searches for neutrinoless
double-beta decay using enriched germanium detectors.

<|endoftext|><|startoftext|>
Introduction
It is now well-established that most stars are members of binary systems at birth, and
that many of these stars are surrounded by disks similar to those found around young single
stars (see, e.g., the recent review by Monin et al. 2006). Thus, understanding the origin of
binaries is vital to understanding the star formation process. The predominance of binaries
also means that, based on number of systems alone, most potential sites of planet formation
lie in multiple systems. However, interactions between stars and disks in binary systems
can alter disk structure (Beckwith et al. 1990; Jensen et al. 1994; Osterloh & Beckwith 1995;
Jensen et al. 1996a,b; Jensen & Mathieu 1997), resulting in a more complicated environment
for planet formation. Nonetheless, the discovery of planets in relatively close binary systems
(Eggenberger et al. 2004) shows that binary systems are viable sites of planet formation.
Understanding the extent to which binaries modify the structure of their surrounding disks
is important for understanding the possible diversity of planetary system environments.
In addition, though the mass ratios are different, the interactions between stellar binary
companions and disks involve the same physics as those between planetary companions and
disks (e.g., D’Angelo et al. 2006) but are more easily observable. Thus, an understanding
of binary-disk interactions may help us understand planet formation around single stars as
well.
A binary star system may have up to three disks: two circumstellar disks, one each
around the primary and secondary; and a circumbinary disk outside the binary orbit. Both
analytic calculations and numerical simulations show that the region between these disks
is not stable for orbiting disk material. However, the question of how easily material can
flow from the outer, circumbinary disk across the gap to the circumstellar disks has not
been clear, either from an observational or a theoretical standpoint. The spectral energy
distributions of some young binaries show the clear signature of a cleared central region, while
other, apparently similar systems do not (Jensen & Mathieu 1997). Theoretical analyses by
Lin & Papaloizou (1993) and Artymowicz & Lubow (1994) suggested that material in the
region around the binary orbit is cleared, creating a quasi-equilibrium structure with three
– 3 –
distinct disks and a cleared region between them. If the gap between disks is impermeable,
the disks evolve independently of each other. Since the presence of a binary companion may
increase the rate of accretion (Clarke 1992; Ostriker et al. 1992) and the binary orbit presents
a constraint on the size of each circumstellar disk, the circumstellar disks would be exhausted
much more quickly than disks around single stars. However, smoothed particle hydrodynamic
simulations by Artymowicz & Lubow (1996) (hereafter AL96; see also Günther & Kley 2002)
predicted that material may indeed flow from the circumbinary disk to the circumstellar
environment, with the accretion rate varying with the phase of the binary orbit.
If such periodic accretion occurs in young binaries, it may be detectable observationally
by a periodic brightening of the system as the material flowing from the circumbinary disk
shocks when it collides with the circumstellar disk(s) or accretes onto the stellar surface(s).
Observations of the T Tauri spectroscopic binary DQ Tau by Mathieu et al. (1997) showed
such brightening, occurring at the binary orbital period. In addition, DQ Tau shows periodic
variations in spectral veiling and emission line intensities with orbital phase (Basri et al.
1997), providing strong support for the broad picture of mass flow across gaps suggested by
AL96. However, subsequent searches in other young, short-period binary systems surrounded
by disks have yielded mixed results. Alencar et al. (2003) did not find periodic photometric
variations in AK Sco, but they did find that the blue wing of the Hα line, and both the
blue and red wings of the Hβ line, vary with the binary orbital period, as does the total
Hα equivalent width. V4046 Sgr has shown periodic photometric variations at the binary
orbital period (Quast et al. 2000; Mekkaden 2000), though an earlier study did not find
such variations (Byrne 1986). Like AK Sco, however, V4046 Sgr does show variations in the
equivalent width and shape of Balmer lines as a function of orbital phase (Stempels & Gahm
2004). No dedicated photometric monitoring of UZ Tau E has been reported in the literature
to date, but in previous spectroscopic observations, neither Hα equivalent width nor spectral
veiling has shown any obvious dependence on binary orbital phase (Mart́ın et al. 2005).
Motivated by previous observational work and a desire to understand accretion in binary
systems, we have undertaken a photometric monitoring campaign for the pre–main-sequence
(PMS) spectroscopic binary UZ Tau E. UZ Tau, in the Taurus-Auriga star-forming region,
was first discovered to be variable by Bohlin during a bright outburst in 1921 (Bailey 1921;
Bohlin 1923) and identified as a ∼ 3.′′7 binary by Joy & van Biesbroeck (1944), one of the first
pre–main-sequence binaries to be identified. Subsequently, both components of the binary
were found to be binary themselves, making this a quadruple system. Simon et al. (1992)
and Ghez et al. (1993) identified UZ Tau W as a binary system, and Mathieu et al. (1996)
identified UZ Tau E to be a single-lined spectroscopic binary with a 19.1-day period. UZ Tau
W, a 0.′′34 binary (47.6 AU assuming a distance of 140 pc to Taurus-Auriga; Kenyon et al.
1994), is separated from UZ Tau E by 3.′′78 (530 AU; Simon et al. 1995a). Prato et al. (2002)
– 4 –
detected absorption lines of the secondary star in the near infrared spectrum of UZ Tau E,
measuring the mass ratio M2/M1 = 0.28 ± 0.01. Mart́ın et al. (2005) presented additional
radial velocity data for UZ Tau E; they found a binary orbital period of 18.979 days and an
eccentricity of 0.14. UZ Tau E shows strong Hα emission, indicative of ongoing accretion, and
strong infrared and millimeter excess emission from circumstellar and circumbinary disks;
the circumbinary disk has been resolved at λ = 1.3 mm (Jensen et al. 1996a) and 2.6 mm
(Dutrey et al. 1996) with a size and mass comparable to disks around single stars, showing
that the close spectroscopic companion has not significantly decreased the disk mass, in
contrast to the 50-AU pair in UZ Tau W, where the presence of a companion at a separation
comparable to typical disk sizes has greatly reduced the presence of circumstellar material.
In this paper, we present our new photometric observations, as well as a re-determination
of the binary orbital parameters from new and existing radial velocity observations. We
then examine periodicities in the data, showing that the photometric data vary at the binary
orbital period. Finally, we interpret the results in the context of the model of pulsed accretion
in binary systems.
2. Observations
2.1. Photometry
In order to search for periodic photometric variations, we have obtained new photometry
of UZ Tau E. Our photometric observations were made with the 0.6-m Perkin Telescope
at the Van Vleck Observatory (VVO) at Wesleyan University, and with ANDICAM on the
1.3-m SMARTS Telescope at CTIO. See Table 1 for details of the observations. The data
were reduced using standard techniques.
Since UZ Tau E and UZ Tau W are separated by 3.′′78, we sought to minimize con-
tamination of the UZ Tau E photometry by light from UZ Tau W by rejecting images with
FWHM greater than 7 pixels, and by using a relatively small (3-pixel-radius) photometric
aperture. Light curves of UZ Tau E and W show no correlation, indicating that the UZ Tau
E photometry is uncontaminated.
We performed differential photometry on UZ Tau E using USNO-B1.0 1158-0057597
as a comparison star. The UZ Tau field is relatively sparse, and in some of the SMARTS
images, this was the only star that was sufficiently bright to serve as a comparison star,
so we used it as the sole comparison in all of our photometry. The star was verified to be
non-variable at the few percent level, showing a standard deviation of 0.02 magnitudes by
comparison with several other stars of similar brightness using the wider-field VVO images.
– 5 –
The USNO-B1.0 Catalog (Monet et al. 2003) gives magnitudes for this star from the Second
Palomar Sky Survey of B = 16.39, R = 13.75, and I = 12.8. Although these filters are not
identical to those used in our CCD observations, we adopted these values for the magnitude
of the reference star to set the zero point of our light curves. In addition, we adopted
V = 14.99 by noting that the R − I color for the comparison star suggests a spectral type
of ∼ M0, and adopting a corresponding B − V color. Adopting these values allows us to
determine approximate colors for UZ Tau E, though we caution that the absolute scaling
of both the individual magnitudes and the colors is uncertain. The differential photometry
and the color changes, which form the basis of our analysis, are both unaffected by this
systematic uncertainty in the zero point.
2.2. Spectroscopy
We did not acquire new spectra of UZ Tau E solely for this program, but we did
make new measurements of a number of spectra taken during the course of other programs.
Some of these spectra were kindly supplied by Marcos Huerta. These are echelle spectra
from McDonald observatory spanning 14 nights in January 2002, with R = 46, 000 and
wavelength coverage of 5460–6760 Å. The observations and data reduction are described in
detail in Huerta et al. (2005). Additional echelle spectra of UZ Tau E were taken at Keck
(R = 31, 000) with the setup described in Basri & Reiners (2006) and at Lick (R = 48, 000)
with the setup described in Alencar & Basri (2000).
We used the spectra from Huerta to measure radial velocities of UZ Tau E. Spectra of
the weak-lined T Tauri star V819 Tau were used as a radial velocity standard. By cross-
correlating the UZ Tau E spectra against the V819 Tau spectra, we measured heliocentric
radial velocities of UZ Tau E, assuming vhelio = 14.4±1.5 km s
−1 for V819 Tau (Walter et al.
1988). The resulting velocities are given in Table 2. Radial velocities were measured using
several different echelle orders with strong absorption lines; the quoted uncertainties reflect
the dispersion in these different measurements, as well as the uncertainty of V819 Tau’s
radial velocity.
In addition, we also measured the equivalent width of the Hα line in both sets of spectra
in order to track changes in accretion rate over time. Equivalent widths are given in Table
2, with an estimated uncertainty of 10%.
– 6 –
3. Binary orbital parameters
In order to assess whether or not any periodic photometric variations detected in UZ
Tau E are synchronized with the binary orbit, we need to have an accurate knowledge of
the orbital parameters. These have been determined previously by Mathieu et al. (1996),
Prato et al. (2002), and Mart́ın et al. (2005), but the number of radial velocity points avail-
able is still relatively small, especially for the secondary, leaving open the prospect of further
improvements to the orbital parameters. To that end, we have re-analyzed the spectroscopic
orbit using data published in Prato et al. (2002) and Mart́ın et al. (2005) as well as our new
radial velocity measurements (Section 2.2).
We fit the radial velocity data using the Binary Star Combined Solution software
(Gudehus 2001), the ORBIT code (Forveille et al. 1999), and our own custom-written IDL
code; all gave the same solution. The best-fit phased radial velocity curve is shown in Figure
1 and the orbital parameters are given in Table 3.
The best-fit period of 19.131±0.003 days is inconsistent with the value of 18.979±0.007
days found by Mart́ın et al. (2005). Examination of the power spectrum of the velocity data
used by Mart́ın et al. shows that the 18.979-day period appears to be an alias of the true
period, caused by beating of the 19.131-day period with two six-year gaps in the radial
velocity data; there is a corresponding alias at 19.3 days. Re-fitting only the data used by
Mart́ın et al., we find that the two periods both correspond to local minima in χ2 space, with
reduced χ2 = 8.8 for P = 19.131 days and reduced χ2 = 10.7 for P = 18.979 days. When
the new radial velocity data are added, the fit for P = 19.131 days improves to reduced
χ2 = 8.1 while that for P = 18.979 days worsens to χ2 = 11.7, as expected if 19.131 days is
the correct period.
We note that we have not added any additional radial velocity measurements of the
secondary, and thus the mass ratio remains more uncertain than the other orbital elements,
resting on the six secondary radial velocities presented by Prato et al. (2002).
UZ Tau E is one of only a handful of pre–main-sequence systems with measured stellar
masses (see Mathieu et al. 2006 for a recent review). Because the total system mass has been
measured (Simon et al. 2000), the spectroscopic orbital parameter M sin3 i can be used to
determined the orbital inclination. This can then be compared with the observed inclination
of the circumbinary disk. While this was done by Simon et al. (2000) and Prato et al.
(2002), we revisit this issue here using our newly-determined orbital parameters for UZ
Tau E. Combined with M = 1.31 ± 0.08 M⊙ (Simon et al. 2000), our orbital parameters
give sin iorbit = 0.81± 0.05, or iorbit = 54
◦ ± 5◦. This is in excellent agreement with the disk
inclinations of 54◦±3◦ and 56◦±2◦ measured from interferometric images of the λ = 1.3 mm
– 7 –
continuum emission and the CO line emission, respectively (Simon et al. 2000). Thus, the
binary orbit and the circumbinary disk are coplanar. Since the disk inclination is measured
at scales of ∼ 100 AU and the binary orbit is only a few tenths of an AU, this co-planarity
apparently extends over the entire disk. Though there are several theoretical studies of
how tilted circumstellar disks interact with a binary system (e.g., Papaloizou & Terquem
1995; Larwood et al. 1996; Bate et al. 2000; Lubow & Ogilvie 2000), we know of no studies
of the timescale for alignment of a circumbinary disk if it is initially tilted relative to the
binary orbit. Studies of the effects of a planetary-mass companion on a tilted external
disk (Lubow & Ogilvie 2001) suggest that such disk tilts do decay over time, however. The
example of UZ Tau E shows that co-planarity over the entire disk exists already by an age
of a few Myr, suggesting that circumbinary disks either form already aligned with the orbit,
or come into alignment very quickly. This is similar to the result of Jensen et al. (2004),
who found that circumstellar disks in young binaries tend to be aligned with each other, and
thus presumably with the binary orbit.
4. Periodic variations
4.1. Photometry
In order to determine whether the system is varying in phase with the binary orbit, or
in any other systematic way, we have searched the photometric data for periodic signals. We
begin by searching for evidence of periodicity, without pre-supposing a particular period.
A Lomb-Scargle periodogram (Scargle 1982) of the I-band data (the band with the largest
number of points and best time coverage; Table 1) is shown in Figure 2. There is a strong peak
at a period of 19.20± 0.03 days. The false-alarm probability (FAP) of this peak is less than
0.001 according to the formulation of Horne & Baliunas (1986). While this FAP calculation
is strictly applicable only to evenly-spaced data, a Monte Carlo bootstrapping method (e.g.,
Stassun et al. 1999) confirms that this period is statistically significant at better than 99.9%
confidence. The period uncertainty reported above, which follows from the formulation of
Kovacs (1981), is probably underestimated as it assumes that the underlying signal is well
described by a single sinusoid.
Though it is slightly off the main peak of the power spectrum, there is significant power
at the binary period of 19.131 days. Periodograms of the B, V , and R data are similar
(Figure 3), showing peaks near the binary period, but with broader peaks and higher false-
alarm probabilities, perhaps due to the more limited time coverage of the data in those
bands.
– 8 –
To refine the period and to better estimate its uncertainty we next performed a phase
dispersion minimization (PDM) analysis (Stellingwerf 1978), which is particularly well-suited
to periodic variability that is highly non-sinusoidal and/or to data with large intrinsic scatter,
both of which apply to the photometry of UZ Tau E. The PDM search of the I-band data
yields a best period of P = 19.17±0.05 d, where the uncertainty was determined empirically
from the 1/e folding scale of the PDM merit function. The same analysis on the V -band
data gives P = 19.15± 0.04 d.
Schwarzenberg-Czerny (1989) argues that a related test, the one-way analysis of vari-
ance, is the most powerful statistic of this kind for detection of periodic signals. Applying
that test to our data yields P = 19.16± 0.03 d for the I-band data, and P = 19.15± 0.05 d
for the V -band data. Following Schwarzenberg-Czerny (1989), the period uncertainty was
determined using a “post-mortem” analysis that measures the 1-σ confidence interval of the
primary periodogram peak, defined by its width at the mean noise power level of the peri-
odogram in the vicinity of this peak. As above, a Monte Carlo permutation analysis of the
light curves confirms that this period is statistically significant at better than 99.9% confi-
dence. Combining these estimates, our best-fit photometric period is P = 19.16 ± 0.04 d,
consistent with the binary orbital period.
Figure 4 shows I-band light curves for all three observing seasons, folded at the binary
orbital period of 19.131 days. As suggested by the periodogram analysis, all show indications
of periodic behavior, with a broad minimum near orbital phase 0.5. The data from the 2004–
2005 season show the smoothest variability, but this is also the season with the smallest
number of data points. Clearly there is significant random variability as well, with scatter
of roughly 0.6 magnitudes at all orbital phases.
Figure 5 shows folded B, V , R, and I lightcurves from the 2003–2004 and 2005–2006
seasons. Broadly speaking, the BV R data show the same behavior as the I-band light curves,
with large-amplitude variability that appears to have both periodic and random components.
We note that the R band includes the Hα line, which may complicate the interpretation of
the light curve.
All four bands show a gradual increase in brightness over the three years of our ob-
servations, with the mean magnitude changing by 0.3 mags at I band from 2003–2004 to
2005–2006. To separate long-term variations from the shorter-term variations of interest
here, a linear trend (with a slope of roughly 0.15 mag / yr at I band) has been fit to each
band. The folded light curves using these de-trended data, and combining all three observing
seasons, are shown in Figure 6.
In addition to our photometric data, previous data on UZ Tau have shown some evidence
– 9 –
of both long-term trends and periodic variations. Bohlin (1923), in one of the first papers
to mention UZ Tau, reported on a major flare and then a four-magnitude overall decline in
brightness from 1921–1923. He also noted that there was a short-period variation with a
period of 10–20 days, which encompasses the period of the variations reported here. Bohlin’s
measurements are for the entire UZ Tau system, but later examination of Bohlin’s plates by
Herbig (1977) showed that it was UZ Tau E that brightened dramatically in 1921.
Variations in color of the system can also give clues about the cause of the variability.
Figure 7 shows the V − I color as a function of I magnitude and orbital phase. The system
shows a behavior commonly seen in T Tauri stars, appearing redder when fainter and bluer
when brighter (Herbst et al. 1994). This behavior is consistent either with periodic changes
in extinction (causing both dimming and reddening when the extinction is higher) or in
accretion (adding additional blue light when the accretion rate is higher).
4.2. Spectroscopy
If the periodic photometric variations are due to changes in accretion rate, one might
expect accompanying variations in Hα emission or spectral veiling, common tracers of accre-
tion. Mart́ın et al. (2005) searched for both of these in UZ Tau E and did not find evidence
of either. With our new spectra, we can revisit the question of Hα variability.
Figure 8 shows the equivalent width of the Hα emission line as a function of binary
orbital phase. The variations do not appear to be strongly correlated with orbital phase.
There is some evidence for lower Hα equivalent widths around phase 0.4–0.8, as seen in the
photometric data, but the data are relatively sparse in that phase range as well.
While a lack of periodic variability would be at odds with the photometric data, we note
that periodicity may not be as obvious in the spectroscopic data, since the two datasets differ
in two important respects. First, the Hα data have much sparser sampling; they span a total
of eight years (1994–2002), but with only a handful of points during a given year. Second,
they do not overlap at all with the photometric data. Given that the photometric data show
both long-term trends and short-term scatter in addition to the periodic variations, and that
T Tauri stars in general are known to show significant random variability at Hα, it may be
difficult to separate random and periodic variations (if any) without a dedicated monitoring
campaign, preferably one that includes simultaneous photometric and spectroscopic mea-
surements. We conclude that while periodic spectroscopic variations similar to those seen
in the photometry are not definitively present in these spectroscopic data, neither are they
ruled out.
– 10 –
5. Discussion
The photometric data (and possibly the spectroscopic data) show periodic variability
at the binary orbital period, suggesting that there is a link between the variability and in-
teractions of the binary with its circumstellar and/or circumbinary material. In this section,
we first argue that the periodic variations are unlikely to be due to stellar rotation, and
then we examine how well the observed behavior matches what is expected from the pulsed
accretion model of Artymowicz & Lubow (1996). Finally, we examine the available data for
other spectroscopic binaries to assess whether or not there is evidence for periodic accretion
as a general phenomenon.
5.1. Could the variations be due to rotation?
Periodic variability is not uncommon in photometric studies of PMS stars. Indeed, dedi-
cated monitoring surveys of rich star-forming regions (e.g., Mandel & Herbst 1991; Attridge & Herbst
1992; Choi & Herbst 1996; Stassun et al. 1999; Rebull 2001; Herbst et al. 2002a) have now
discovered hundreds of PMS stars exhibiting periodic variability, the result of surface-
brightness inhomogeneities (i.e. starspots) that rotate in and out of view with the stellar
rotation period. The periodic variability observed in UZ Tau E is very unlikely to be the
result of such rotationally modulated spot signals, for several reasons. First, the rotation
periods of low-mass PMS stars are nearly always shorter than about 12 days, while the pe-
riod of the variations reported here is 19 days. Among 150 low-mass PMS stars in the Orion
Nebula Cluster, only two stars (∼ 1%) have Prot > 15 d (Herbst & Mundt 2005). Second,
rotationally modulated spot signals are typically sinusoidal, and stable over many cycles or,
in some cases, many years. In contrast, the periodic signal we have found in UZ Tau E
is decidedly non-sinusoidal, with considerable scatter; the “bright” state has a duty cycle
of ∼ 60%. Thus, it is either intrinsically non-sinusoidal, or shows substantial phase shift-
ing from one cycle to the next; neither of these is consistent with rotationally-modulated
variability.
The rotation period distributions discussed above are presumably dominated by single
stars or member of wide binaries, while tidal interactions between the stars in a close binary
system can synchronize the orbital and rotational periods. However, in the case of eccentric
systems like UZ Tau E, pseudo-synchronization (in which the stellar angular velocity is
synchronized with the orbital angular velocity at periastron) occurs instead, since the tidal
interactions are strongest around periastron (Hut 1981). The predicted pseudo-synchronous
rotation period for UZ Tau E, using the weak friction formulation of Hut (1981) and the
orbital parameters in Table 3, is 11.4 ± 1.2 d, inconsistent with the observed variability
– 11 –
period.
We can also estimate the rotation period of the UZ Tau E primary directly if three
quantities are known: the inclination irot of the star’s rotation axis, the star’s projected
rotational velocity v sin irot, and the stellar radius. Of these, the inclination is typically
impossible to measure, except under special circumstances.
As noted in Section 3, the dynamical mass measurement of UZ Tau E allows us to
determine the binary orbital inclination iorbit. Based on studies of other binary systems, it
is reasonable to assume that this inclination is the same as that of the stellar rotation axis,
irot. The most detailed study comparing the orientations of these axes in binary systems
is that of Hale (1994). Considering spectral types of F5–K5, he finds that binaries with
separations less than 30–40 AU tend to exhibit co-planarity between rotational equators and
orbital planes, while wider binaries have random orientations. Using a similar method, Weis
(1974) found a tendency for the stellar rotational equators to align with the binary orbit
among primaries in F star binaries. Interestingly, Weis (1974) did not find a tendency toward
co-planarity between rotational and orbital planes among A stars, suggesting that caution
is necessary when comparing stars of different masses. Similarly, Guthrie (1985) found no
correlation between orbital inclination and v sin i among 23 A2–A9 binaries with semi-major
axes of 10–70 AU. The low mass and short period of UZ Tau E suggest, however, that the
conclusions of Hale (1994) are most applicable here.
Prato et al. (2002) find L = 0.63+0.19
−0.17
L⊙ and Teff = 3700 ± 150 K for the primary in
UZ Tau E. Combining these values yields R = 1.9 ± 0.2 R⊙. Hartmann & Stauffer (1989)
find v sin i = 15.9±4.0 km s−1 for UZ Tau E using optical spectra, consistent with the value
v sin i = 16 ± 2 km s−1, which we measure from our new spectra and adopt here. Since
absorption lines of the secondary of UZ Tau E have only been seen in near-infrared spectra
and are not evident in any of our optical spectra, we take this to be the projected rotation
velocity of the primary. Combining these measurements with sin iorbit = 0.81±0.05 (Section
3), and assuming iorbit = irot, we find Prot = 4.9± 0.8 d. If iorbit 6= irot, we find Prot ≤ 6± 1 d
since sin i ≤ 1. Thus, uncertainty on the inclination cannot reconcile the photometric period
with the inferred rotation period.
The most uncertain remaining quantity is v sin i, but since Hartmann & Stauffer (1989)
measured v sin i from 11 different spectra of UZ Tau E, with self-consistent results from two
different parts of the spectrum (including spectra near λ = 5200 Å) and consistency with our
new v sin i measurement, it is unlikely that line broadening from photospheric lines of the
faint, red secondary could lead to an overestimate of v sin i by a factor of four. Similarly, given
the uncertainties on L and Teff , it is difficult to see how the radius could be underestimated
by a factor of four. Thus, we conclude that the observed periodic variations are unlikely to
– 12 –
be due to stellar rotation.
5.2. Evidence for pulsed accretion
We have shown above that UZ Tau E exhibits periodic photometric variations that have
the same period as the binary orbit, and that these variations are unlikely to be caused by
stellar rotation. Here, we examine the predictions made by the pulsed accretion model of
AL96 and compare them to our observations.
5.2.1. What are the predictions?
Broadly speaking, Artymowicz & Lubow (1996) predict that a binary with an eccentric
orbit and a circumbinary disk will have an accretion flow from the circumbinary disk, and
thus onto the circumstellar disks or stellar surfaces, that varies periodically at the binary
orbital period.
The exact behavior of the accretion rate with orbital phase depends on the binary orbital
parameters. AL96 show the results of two simulations, one for mass ratio M2/M1 = 0.43 and
eccentricity e = 0.1, and another for M2/M1 = 0.79 and e = 0.5. The former shows accretion
that varies relatively smoothly over the orbital period, while the latter is strongly peaked
at periastron. As noted by AL96, the exact timing of the accretion variability depends
on the orbital parameters, most strongly on e. Some previous observational studies of T
Tauri spectroscopic binaries have focused specifically on looking for enhanced accretion near
periastron; however, we note here that the actual prediction of the model is more general
than that, and that the peak accretion rate need not come near periastron.
5.2.2. How well do the data match the predictions?
First, we note that our observations match the general predictions of AL96 quite well,
in that there are indeed periodic photometric variations at the binary orbital period, which
are readily interpretable as a variable accretion rate. The comparison with the spectroscopic
data is more ambiguous; if more intensive monitoring of the Hα line in UZ Tau E were to
show that there are no orbit-modulated Hα variations, it would present a problem for the
model.
For a more specific comparison with our data, Figure 9 shows the variations of accretion
– 13 –
with orbital phase predicted by AL96 for a binary with M2/M1 = 0.43, e = 0.1. UZ Tau E
has a more extreme mass ratio (M2/M1 = 0.30) and larger eccentricity (e = 0.33) than this,
but these parameters are closer to those of UZ Tau E than those of the other simulation in
AL96. AL96 do note that the timing of the maxima of the accretion depend largely on e
rather than M2/M1. Since e for UZ Tau E is intermediate between the two models calculated
by AL96, we might then expect the maximum accretion to come between the phase of ∼ 0.75
they calculate for the low-e case and the phase of ∼ 1 for the high-e case.
For comparison with our data, we have taken the logarithm of the variations of accretion
rate predicted by AL96 to shift them onto a “magnitude-like” scale, and added an arbitrary
offset and scale factor to match the mean of the data and amplitude of the variations. The
phase of the minimum predicted by this simulation does not match our data well; when the
model is given a shift of +0.2 in orbital phase, there is better agreement between the model
predictions and the data. This scaling and shifting to match the data is obviously ad hoc,
but it allows us to compare the phase width of the observed variations, which appear to
match the predictions relatively well. In addition, this shifted position of the maximum is
indeed between the two cases calculated by AL96, as expected if eccentricity is the dominant
factor in determining the timing of maximum accretion.
5.3. Evidence for periodic accretion in other T Tauri binaries
The discussion and data above show that looking for evidence of periodic accretion can
be complicated, with other sources of variability perhaps being important and masking the
effect in small datasets, and with the exact behavior expected to be a function of the specific
binary orbital parameters. That said, is evidence for pulsed accretion seen in other young
binary systems? In Table 4 we present characteristics of young binaries with periods of
less than one year and evidence of circumbinary material, in order of increasing eccentricity.
Below, we examine the observational data for some of these systems, attempting to relate
them to what we see in UZ Tau and exploring similarities and differences. Unfortunately,
the small number of systems and their somewhat heterogeneous properties means that it is
difficult to generalize, so we offer these comments in the spirit of attempting to pull together
the existing data, rather than arguing one way or the other for the validity of the AL96
model for the sample as a whole.
– 14 –
5.3.1. DQ Tau
DQTau was the first system to be scrutinized for evidence of pulsed accretion. Mathieu et al.
(1997) showed that the photometric variations are modulated at the binary orbital period,
and Basri et al. (1997) showed that the Hα line and spectral veiling are as well. Fortuitously,
the mass ratio and eccentricity of DQ Tau are quite similar to those of the high-eccentricity
case modeled by AL96, allowing for specific comparison with the theory. The timing and
phase width of the photometric and spectroscopic variations match the predictions well,
being sharply peaked near periastron. However, the DQ Tau observations did show consid-
erable orbit-to-orbit variation, with the periastron brightening being seen roughly 65% of
the time. This is reminiscent of the large scatter that we see in the UZ Tau light curves;
clearly the periodic accretion process is not exactly repeatable from orbit to the next, nor is
it the only source of variability.
5.3.2. AK Sco
The orbital eccentricity and binary mass ratio of AK Sco are quite similar to those of DQ
Tau, and indeed, simulations by Günther & Kley (2002) for a binary with AK Sco’s orbital
parameters predict pulsed accretion. Thus, it comes as some surprise that the system does
not show periodic photometric variability, despite extensive monitoring (Alencar et al. 2003).
The overall variability is large (up to 1.5 mags in y), but apparently random. There are
periodic variations in the Balmer lines, but they are not sharply peaked around periastron.
Examining Table 4, we note two properties of AK Sco that are quite different from those
of DQ Tau or UZ Tau. First, AK Sco is considerably hotter and more luminous. Thus,
accretion variations of the same luminosity as those occurring in UZ Tau and DQ Tau would
result in substantially smaller magnitude changes, which could be swamped by the large
random variability. Second, AK Sco has considerably lower millimeter flux than either of
the other two systems. If the systems are fit with similar disk models (in which the disk
is assumed to be optically thin at millimeter wavelengths in its outer regions), AK Sco’s
disk mass is an order of magnitude smaller than that of DQ Tau or UZ Tau E (Jensen et al.
1996a; Jensen & Mathieu 1997; Mathieu et al. 1997). Alencar et al. (2003) fit AK Sco with
an optically-thick disk model that has a comparable mass to the disk models fit to DQ Tau
and UZ Tau E. However, such disk models have not been fit to DQ Tau or UZ Tau E, and
would presumably result in even larger disk masses for those systems. In a direct comparison
of λ = 1.1 mm flux, DQ Tau and UZ Tau E are 3 and 5 times brighter than AK Sco at
roughly the same distance, presumably reflecting larger disk masses. It is possible that a
somewhat lower-mass disk has different dynamics, and that the accretion flow in the AK Sco
– 15 –
system is fundamentally different than that in the other systems with more massive disks.
5.3.3. GW Ori
This system has a near-circular orbit, and thus would not be expected to show pulsed
accretion under the model set forth by AL96. However, Stempels & Gahm (2004) quote
Artymowicz, private communication, as saying that pulsed accretion is possible for systems
with circular orbits as well, and indeed D’Angelo et al. (2006) show that this occurs for giant
planets embedded in disks. Thus, pulsed accretion appears to be possible for at least some
circular-orbit systems and thus may be for GW Ori as well, though the larger gap cleared by
a stellar companion (Artymowicz & Lubow 1994) will clear some of the disk resonances that
might contribute to disk eccentricity growth in a system with a planetary-mass companion.
Like AK Sco, GW Ori is very luminous, and shows significant random variability, though
no obvious periodic variability. It does have a much more massive disk than AK Sco, however,
and indeed than any of the systems considered here. Because of its much larger semimajor
axis, and to some extent its circular orbit, GW Ori is much more likely to have significant
circumstellar disks, as the stars do not approach each other very closely at periastron. Thus,
material flowing from the circumbinary disk may merge with the circumstellar disks and then
accrete more gradually onto the stars, rather than falling directly on (or near) the stellar
surfaces as is expected to happen in the shorter-period systems. If the infalling material does
not shock strongly as it merges with the circumstellar disk, and if any density enhancements
are smoothed out somewhat by the time the material reaches the stellar surface, then any
photometric signature of the periodic infall would be weakened. We note that UZ Tau E
likely has circumstellar disks as well (Jensen et al. 1996a), so a similar effect could be at work
in reducing the amplitude of the periodic variability relative to the stochastic variability.
5.3.4. V4046 Sgr
Like GW Ori, V4046 Sgr has a nearly circular orbit. However, V4046 Sgr has shown
periodic photometric variations at the binary orbital period (Quast et al. 2000; Mekkaden
2000). These variations persist over several years and are relatively sinusoidal (Walter,
unpublished data, 2003–2005). Unlike the other binaries discussed here, in this case stellar
rotation is a plausible explanation of the observed variations. It is common for stellar
rotational periods to become synchronized with the binary orbital period, particularly for
short-period binaries like V4046 Sgr. Given the short period (resulting in stronger tidal
– 16 –
interactions and a shorter synchronization time scale) and the somewhat older age of this
system (∼ 10 Myr), synchronization is plausible, and indeed is supported by detailed analysis
of the system (Stempels & Gahm 2004). However, rotation does not explain the periodic
Balmer line variations observed, which Stempels & Gahm (2004) attribute to accumulations
of gas co-rotating with the binary orbit.
5.3.5. ROXs 42 and ROXs 43B
These two spectroscopic binaries are both weak-lined T Tauri stars (Bouvier & Appenzeller
1992; Walter et al. 1994), indicating less-active accretion than some of the other systems dis-
cussed here. Neither has been detected at millimeter wavelengths, yielding only an upper
limit on the disk masses (Skinner et al. 1991; Jensen et al. 1996b). Both systems show
mid-infrared excesses, indicating the presence of circumbinary material, and a lack of near-
infrared excess, which can be modeled as a cleared central region in the disk (Jensen & Mathieu
1997). The fact that both are higher-order multiple systems complicates matters; ROXs 42
(NTTS 162814−2427) is a triple system with a separation of 0.′′15 (Lee 1992; Ghez et al.
1993), while ROXs 43B (NTTS 162819−2423S) has a wide companion at 4.′′8 which is itself a
close binary system (Walter et al. 1994; Simon et al. 1995b). Since the evidence for the pres-
ence of a substantial disk rests on the low-spatial-resolution IRAS detections, it is possible
that the excess is associated with the wider companions rather than arising from circumbi-
nary disks around the spectroscopic binaries. In any case, the lack of millimeter detections
indicates that there is less disk mass in these two systems than in the others discussed here.
Neither system has been intensively monitored over timespans that would be necessary to
detect periodic photometric variations at the relatively long orbital periods. ROXs 42 shows
evidence for some semi-regular variations over roughly 1.5 orbital periods (Zakirov et al.
1993), while the combined light of the ROX 43 system shows only a 0.1-magnitude varia-
tion, with evidence of a 1.5-day or 3-day periodicity, presumably attributable to rotation of
one or more of the stars (Shevchenko & Herbst 1998).
5.3.6. KH 15D
The unusual pre–main-sequence system KH 15D (V582 Mon) is a spectroscopic bi-
nary that undergoes deep (∆I ∼ 3.5 mag) eclipses, thought to arise due to occultation
from a circumbinary disk (Hamilton et al. 2001; Herbst et al. 2002b; Hamilton et al. 2005;
Winn et al. 2006 and references therein). While the system has an eccentricity and mass
ratio that would suggest that pulsed accretion might be present, the photometric variations
– 17 –
at the binary orbital period are dominated by the deep eclipses. Furthermore, the depth and
detailed shape of these eclipses are evolving with time (Winn et al. 2003; Johnson & Winn
2004; Maffei et al. 2005; Johnson et al. 2005; Winn et al. 2006), making it very difficult to
determine whether there might currently be an additional, smaller-amplitude component
with the same period that is related to accretion rather than occultation. Winn et al. (2003)
showed that the current deep eclipses did not occur during the first half of the twentieth
century, raising the possibility of searching for evidence of accretion-related variability at
earlier epochs. Their limit of one mag on the variability during that time does not preclude
accretion-related variations like those seen in UZ Tau E. The ∼ 0.9-mag periodic varia-
tions seen from the 1960’s through the 1980’s (Johnson & Winn 2004; Maffei et al. 2005;
Johnson et al. 2005), however, are relatively smooth and are well-fit by the eclipse model
(Winn et al. 2006), placing a limit on how much any accretion-related component was con-
tributing to the variability during that time. Since the inferred mass ratio and eccentricity
for KH 15D are similar to those of DQ Tau (Table 4), we might expect accretion-related vari-
ability to be strongly peaked around periastron, which is also when the current deep eclipses
occur. This might help explain several anomalously bright points seen during eclipses in the
late 1990’s that are not well-fit by the model of Winn et al. (2006).
The precessing circumbinary disk occultation model of Winn et al. (2006) is quite suc-
cessful in reproducing the shape and ongoing evolution of the light curve, and we do not
suggest that accretion explains most of the photometric variations. We note, however, the
possibility that such an additional component might be sporadically present (with the same
period) and that, if it is, this could complicate the modeling of the historical evolution of
the light curve, especially during earlier, more-sparsely-sampled epochs.
6. Conclusions
We have shown that the pre–main-sequence binary UZ Tau E shows clear photometric
variability at the binary orbital period of 19.13 days. This variability is consistent with a
model in which material in the circumbinary disk is periodically perturbed by the binary in
its eccentric orbit and falls from the outer disk, across the cleared central gap and onto the
stars or their circumstellar disks. There is significant scatter in the light curves, indicating
that this “pulsed accretion” may not occur during every binary orbit. Hα equivalent widths
show some suggestion of periodic variability, but it is not definitive.
The apparently intermittent behavior of the accretion, and the presence of other, random
sources of variability, suggest that searches for this sort of accretion signature require well-
sampled datasets with long time baselines in order to detect any periodic component. In
– 18 –
particular, simultaneous photometric and spectroscopic monitoring of UZ Tau E in the future
will help determine whether the Hα variations show a periodic component, as the photometric
variations do.
The good overall agreement between theory and observations suggests that resonant
interactions between stars (and, by extension, planets) and disks are indeed important in
determining disk structure and dynamics, while the random component of the observed
behavior shows that there is still work to be done in understanding the full complexity of
these interactions.
We gratefully acknowledge the support of the National Science Foundation through
grant AST-0307830. We thank the referee, Steve Lubow, for useful comments that improved
this paper. We are grateful to Michael Meyer, David Cohen, and Larry Marschall for useful
discussions; to Marcos Huerta and Pat Hartigan for use of their spectra of UZ Tau E; to
Matthew Richardson for assistance with data reduction; to Peter Collings for translating
early papers on UZ Tau from German to English; and to Thierry Forveille for use of his
ORBIT code. MS and FW are grateful for Stony Brook University’s partial support of
their participation in the SMARTS consortium. This research has made use of the SIMBAD
database, operated at CDS, Strasbourg, France, and of NASA’s Astrophysics Data System.
REFERENCES
Alencar, S. H. P. & Basri, G. 2000, AJ, 119, 1881
Alencar, S. H. P., Melo, C. H. F., Dullemond, C. P., Andersen, J., Batalha, C., Vaz, L. P. R.,
& Mathieu, R. D. 2003, A&A, 409, 1037
Artymowicz, P. & Lubow, S. H. 1994, ApJ, 421, 651
—. 1996, ApJ, 467, L77+, AL96
Attridge, J. M. & Herbst, W. 1992, ApJ, 398, L61
Bailey, S. I. 1921, Harvard College Observatory Bulletin, 759, 1
Basri, G., Johns-Krull, C. M., & Mathieu, R. D. 1997, AJ, 114, 781
Basri, G. & Reiners, A. 2006, AJ, 132, 663
Bate, M. R., Bonnell, I. A., Clarke, C. J., Lubow, S. H., Ogilvie, G. I., Pringle, J. E., &
Tout, C. A. 2000, MNRAS, 317, 773
– 19 –
Beckwith, S. V. W., Sargent, A. I., Chini, R. S., & Güsten, R. 1990, AJ, 99, 924
Bohlin, K. 1923, Astronomische Nachrichten, 218, 203
Bouvier, J. & Appenzeller, I. 1992, A&AS, 92, 481
Byrne, P. B. 1986, Irish Astronomical Journal, 17, 294
Choi, P. I. & Herbst, W. 1996, AJ, 111, 283
Clarke, C. 1992, in ASP Conf. Ser. 32: IAU Colloq. 135: Complementary Approaches to
Double and Multiple Star Research, 176–+
D’Angelo, G., Lubow, S. H., & Bate, M. R. 2006, ApJ, 652, 1698
Dutrey, A., Guilloteau, S., Duvert, G., Prato, L., Simon, M., Schuster, K., & Menard, F.
1996, A&A, 309, 493
Eggenberger, A., Udry, S., & Mayor, M. 2004, A&A, 417, 353
Forveille, T., Beuzit, J.-L., Delfosse, X., Segransan, D., Beck, F., Mayor, M., Perrier, C.,
Tokovinin, A., & Udry, S. 1999, A&A, 351, 619
Ghez, A. M., Neugebauer, G., & Matthews, K. 1993, AJ, 106, 2005
Gudehus, D. H. 2001, Bulletin of the American Astronomical Society, 33, 850
Günther, R. & Kley, W. 2002, A&A, 387, 550
Guthrie, B. N. G. 1985, MNRAS, 215, 545
Hale, A. 1994, AJ, 107, 306
Hamilton, C. M., Herbst, W., Shih, C., & Ferro, A. J. 2001, ApJ, 554, L201
Hamilton, C. M., Herbst, W., Vrba, F. J., Ibrahimov, M. A., Mundt, R., Bailer-Jones,
C. A. L., Filippenko, A. V., Li, W., Béjar, V. J. S., Ábrahám, P., Kun, M., Moór, A.,
Benkő, J., Csizmadia, S., DePoy, D. L., Pogge, R. W., & Marshall, J. L. 2005, AJ,
130, 1896
Hartmann, L. & Stauffer, J. R. 1989, AJ, 97, 873
Herbig, G. H. 1977, ApJ, 217, 693
Herbst, W., Bailer-Jones, C. A. L., Mundt, R., Meisenheimer, K., & Wackermann, R. 2002a,
A&A, 396, 513
– 20 –
Herbst, W., Hamilton, C. M., Vrba, F. J., Ibrahimov, M. A., Bailer-Jones, C. A. L., Mundt,
R., Lamm, M., Mazeh, T., Webster, Z. T., Haisch, K. E., Williams, E. C., Rhodes,
A. H., Balonek, T. J., Scholz, A., & Riffeser, A. 2002b, PASP, 114, 1167
Herbst, W., Herbst, D. K., Grossman, E. J., & Weinstein, D. 1994, AJ, 108, 1906
Herbst, W. & Mundt, R. 2005, ApJ, 633, 967
Horne, J. H. & Baliunas, S. L. 1986, ApJ, 302, 757
Huerta, M., Hartigan, P., & White, R. J. 2005, AJ, 129, 985
Hut, P. 1981, A&A, 99, 126
Jensen, E. L. N., Koerner, D. W., & Mathieu, R. D. 1996a, AJ, 111, 2431
Jensen, E. L. N. & Mathieu, R. D. 1997, AJ, 114, 301
Jensen, E. L. N., Mathieu, R. D., Donar, A. X., & Dullighan, A. 2004, ApJ, 600, 789
Jensen, E. L. N., Mathieu, R. D., & Fuller, G. A. 1994, ApJ, 429, L29
—. 1996b, ApJ, 458, 312
Johnson, J. A. & Winn, J. N. 2004, AJ, 127, 2344
Johnson, J. A., Winn, J. N., Rampazzi, F., Barbieri, C., Mito, H., Tarusawa, K.-i., Tsvetkov,
M., Borisova, A., & Meusinger, H. 2005, AJ, 129, 1978
Joy, A. H. & van Biesbroeck, G. 1944, PASP, 56, 123
Kenyon, S. J., Dobrzycka, D., & Hartmann, L. 1994, AJ, 108, 1872
Kovacs, G. 1981, Ap&SS, 78, 175
Larwood, J. D., Nelson, R. P., Papaloizou, J. C. B., & Terquem, C. 1996, MNRAS, 282, 597
Lee, C.-W. 1992, PhD thesis, AA(Wisconsin Univ., Madison.)
Lin, D. N. C. & Papaloizou, J. C. B. 1993, in Protostars and Planets III, 749–835
Lubow, S. H. & Ogilvie, G. I. 2000, ApJ, 538, 326
—. 2001, ApJ, 560, 997
Maffei, P., Ciprini, S., & Tosti, G. 2005, MNRAS, 357, 1059
– 21 –
Mandel, G. N. & Herbst, W. 1991, ApJ, 383, L75
Manset, N. & Bastien, P. 2003, AJ, 125, 3274
Manset, N., Bastien, P., & Bertout, C. 2005, AJ, 129, 480
Mart́ın, E. L., Magazzù, A., Delfosse, X., & Mathieu, R. D. 2005, A&A, 429, 939
Mathieu, R. D., Adams, F. C., Fuller, G. A., Jensen, E. L. N., Koerner, D. W., & Sargent,
A. I. 1995, AJ, 109, 2655
Mathieu, R. D., Adams, F. C., & Latham, D. W. 1991, AJ, 101, 2184
Mathieu, R. D., Baraffe, I., Simon, M., Stassun, K. G., & White, R. 2006, in Protostars and
Planets V
Mathieu, R. D., Martin, E. L., & Magazzu, A. 1996, Bulletin of the American Astronomical
Society, 28, 920
Mathieu, R. D., Stassun, K., Basri, G., Jensen, E. L. N., Johns-Krull, C. M., Valenti, J. A.,
& Hartmann, L. W. 1997, AJ, 113, 1841
Mekkaden, M. V. 2000, in IAU Symposium, 31P–+
Monet, D. G., Levine, S. E., Canzian, B., Ables, H. D., Bird, A. R., Dahn, C. C., Guetter,
H. H., Harris, H. C., Henden, A. A., Leggett, S. K., Levison, H. F., Luginbuhl, C. B.,
Martini, J., Monet, A. K. B., Munn, J. A., Pier, J. R., Rhodes, A. R., Riepe, B., Sell,
S., Stone, R. C., Vrba, F. J., Walker, R. L., Westerhout, G., Brucato, R. J., Reid,
I. N., Schoening, W., Hartley, M., Read, M. A., & Tritton, S. B. 2003, AJ, 125, 984
Monin, J. L., Clarke, C. J., Prato, L., & McCabe, C. 2006, in Protostars and Planets V
Osterloh, M. & Beckwith, S. V. W. 1995, ApJ, 439, 288
Ostriker, E. C., Shu, F. H., & Adams, F. C. 1992, ApJ, 399, 192
Papaloizou, J. C. B. & Terquem, C. 1995, MNRAS, 274, 987
Prato, L., Simon, M., Mazeh, T., Zucker, S., & McLean, I. S. 2002, ApJ, 579, L99
Quast, G. R., Torres, C. A. O., de La Reza, R., da Silva, L., & Mayor, M. 2000, in IAU
Symposium, 28P–+
Rebull, L. M. 2001, AJ, 121, 1676
– 22 –
Scargle, J. D. 1982, ApJ, 263, 835
Schwarzenberg-Czerny, A. 1989, MNRAS, 241, 153
Shevchenko, V. S. & Herbst, W. 1998, AJ, 116, 1419
Simon, M., Chen, W. P., Howell, R. R., Benson, J. A., & Slowik, D. 1992, ApJ, 384, 212
Simon, M., Dutrey, A., & Guilloteau, S. 2000, ApJ, 545, 1034
Simon, M., Ghez, A. M., Leinert, C., Cassar, L., Chen, W. P., Howell, R. R., Jameson, R. F.,
Matthews, K., Neugebauer, G., & Richichi, A. 1995a, ApJ, 443, 625
—. 1995b, ApJ, 443, 625
Skinner, S. L., Brown, A., & Walter, F. M. 1991, AJ, 102, 1742
Stassun, K. G., Mathieu, R. D., Mazeh, T., & Vrba, F. J. 1999, AJ, 117, 2941
Stellingwerf, R. F. 1978, ApJ, 224, 953
Stempels, H. C. & Gahm, G. F. 2004, A&A, 421, 1159
Walter, F. M., Brown, A., Mathieu, R. D., Myers, P. C., & Vrba, F. J. 1988, AJ, 96, 297
Walter, F. M., Vrba, F. J., Mathieu, R. D., Brown, A., & Myers, P. C. 1994, AJ, 107, 692
Weis, E. W. 1974, ApJ, 190, 331
Winn, J. N., Garnavich, P. M., Stanek, K. Z., & Sasselov, D. D. 2003, ApJ, 593, L121
Winn, J. N., Hamilton, C. M., Herbst, W. J., Hoffman, J. L., Holman, M. J., Johnson, J. A.,
& Kuchner, M. J. 2006, ApJ, 644, 510
Zakirov, M. M., Azimov, A. A., & Grankin, K. N. 1993, Informational Bulletin on Variable
Stars, 3898, 1
This preprint was prepared with the AAS LATEX macros v5.2.
– 23 –
Table 1. Observations of UZ Tau E
Telescope Exp. time (s) Filter(s) Season # of nights Timespan in days
SMARTS (1.3m) 5 BV RI 2003–2004 63 170
2005–2006 6 42
30 V RI 2005–2006 9 33
VVO (0.6m) 60 I 2004–2005 16 126
2005–2006 20 128
– 24 –
Table 2. Radial Velocities and Hα EW
Julian Date vhelio (km s
−1) Hα EW (Å)a
2450416.82 · · · 88.4
2450783.93 · · · 42
2450783.94 · · · 45.1
2450784.96 · · · 38
2450785.11 · · · 57
2450835.65 · · · 54:
2451060.99 · · · 45.6
2451061.00 · · · 39.8
2451077.14 · · · 63.8
2451120.92 · · · 101
2451137.94 · · · 51.3
2451138.88 · · · 62.6
2451162.86 · · · 69.3
2451163.82 · · · 57.8
2451164.79 · · · 57.5
2451165.78 · · · 58.3
2451166.79 · · · 61.7
2451169.84 · · · 48.9
2451504.73 · · · 35
2451507.68 · · · 25:
2451508.67 · · · 49.1
2451509.71 · · · 57
2451510.66 · · · 71
2451517.98 · · · 75
2451523.68 · · · 42.7
2451524.74 · · · 49.3
2451525.76 · · · 40.8
2451527.68 · · · 46.6
2451528.68 · · · 58.9
2451529.66 · · · 61.8
2451530.60 · · · 69.1
– 25 –
Table 2—Continued
Julian Date vhelio (km s
−1) Hα EW (Å)a
2452280.60 −4.3 ± 2.1 44
2452281.74 −3.1 ± 1.5 51
2452282.76 −5.8 ± 1.8 81
2452283.66 −4.9 ± 4.6 78
2452284.71 · · · b 77
2452286.66 6.2 ± 2.6 90
2452287.65 17.1 ± 1.7 87
2452288.68 27.2 ± 2.1 59
2452289.69 29.6 ± 3.0 65
2452290.72 37.8 ± 7.0 67
2452291.67 28.5 ± 2.1 58
2452292.64 29.1 ± 3.2 50
2452293.67 22.0 ± 3.6 45
2452579.11 · · · 50
aPositive values denote emission.
bThe spectrum on this date was too noisy
to allow measurement of an accurate radial
velocity.
– 26 –
Table 3. Binary orbital parameters for UZ Tau E
Period (days) 19.131± 0.003
e 0.33± 0.04
JD of periastron 2451328.3± 0.5
ω 239◦ ± 9◦
a sin i (AU) 0.124± 0.003
γ (km s−1) 13.9± 0.7
K1 (km s
−1) 17.3± 1.4
K2 (km s
−1) 57.4± 4.7
M sin3 i (M⊙) 0.69± 0.13
M2/M1 0.30± 0.03
Table 4. CTTS spectroscopic binaries
Binary Period e M2/M1 Spectral L Disk Mass
a Photometric Balmer line References
System (days) Type (L⊙) (M⊙) periodicity? periodicity?
V4046 Sgr 2.421 ≤ 0.01 0.94 K5 0.82 0.0085 Yes (∆B≈0.1) Yes 1, 2, 3
GW Ori 241.9 0.04 SB1 G0 26 0.3 ? (∆V≈0.7) ? 4, 5
UZ Tau E 19.131 0.33 0.30 M1 0.91 0.063 Yes (∆I≈0.8) Maybe 6, 7, 8
ROXs 43B 89.1 0.41 SB1 G0 0.4 < 0.00037 ? (∆V=0.1) ? 1, 9, 10, 11
AK Sco 13.609 0.47 0.99 F5 8.40 0.002 No (∆y≈1.5) Yes 1, 12, 13
ROXs 42 35.95 0.48 0.92 K4 0.4 < 0.00025 ? (∆V=0.4) ? 1, 10, 11, 14, 15
DQ Tau 15.804 0.56 0.97 K7–M1 0.95 0.020 Yes (∆V≈0.5) Yes 16, 17
KH 15D 48.38 0.57–0.65 0.83b K7c 0.4c · · · Eclipse (∆I≈3.5) ? 18, 19, 20
aDisk mass estimates were made assuming that the disk is at least partially optically thin at mm wavelengths. An optically-
thick model for AK Sco (Alencar et al. 2003) yields a disk mass of 0.02 M⊙.
bDerived from the stellar luminosity ratio that best fits the eclipse data (Winn et al. 2006).
cProperties of the secondary star, since the primary is never visible.
References. — 1. Jensen & Mathieu (1997). 2. Quast et al. (2000). 3. Mekkaden (2000). 4. Mathieu et al. (1991). 5.
Mathieu et al. (1995). 6. This work. 7. Prato et al. (2002). 8. Jensen et al. (1996a). 9. Shevchenko & Herbst (1998).
10. Manset & Bastien (2003). 11. Bouvier & Appenzeller (1992). 12. Alencar et al. (2003). 13. Manset et al. (2005). 14.
Lee (1992) 15. Walter et al. (1994). 16. Mathieu et al. (1997). 17. Basri et al. (1997). 18. Hamilton et al. (2001). 19.
Hamilton et al. (2005). 20. Winn et al. (2006).
– 28 –
Fig. 1.— The best-fit spectroscopic orbit for UZ Tau E. Crosses show velocities of the pri-
mary; those enclosed in boxes (in red in the on-line edition) show new radial velocity mea-
surements presented here. Open diamonds show velocities of the secondary from Prato et al.
(2002).
– 29 –
Fig. 2.— The Lomb-Scargle periodogram for the I-band data. The periodogram peaks at
a period of 19.20 days, with a false-alarm probability of 0.001. The dashed line shows the
binary orbital period of 19.131 days. The smaller peaks visible flanking the main peak (lower
panel) are near the alias periods expected for beat periods between one-year and two-year
periods (caused by the seasonal gaps in the data) and the binary period.
– 30 –
Fig. 3.— The Lomb-Scargle periodograms for the B, V , R, and I-band data. All show
significant power near the binary orbital period.
– 31 –
Fig. 4.— The I-band magnitude for UZ Tau E folded at the binary orbital period of 19.131
days and plotted against the binary orbital phase. Top to bottom, data from 2003–2004,
2004–2005, and 2005–2006.
– 32 –
Fig. 5.— The BV RI magnitudes for UZ Tau E folded at the binary orbital period and
plotted against the binary orbital phase. Left, 2003–2004; right, 2005-2006. The open circles
in the lower right plot show the VVO I-band data, which do not have corresponding B, V,
and R data.
– 33 –
Fig. 6.— The BV RI magnitudes for UZ Tau E folded at the binary orbital period and
plotted against the binary orbital phase, after removing a long-term linear trend from each
band.
– 34 –
Fig. 7.— V − I color vs. I magnitude and vs. orbital phase for 2003–2004 (closed circles)
and 2005–2006 (open circles). The system is redder when fainter and bluer when brighter,
the expected behavior either for changes in extinction or for brightening due to increased
accretion.
– 35 –
Fig. 8.— Top: Equivalent width of the Hα line as a function of binary orbital phase. Squares
are our new measurements; triangles are from Mart́ın et al. (2005). Bottom: For comparison,
the phased I-band data. There is some suggestion of reduced Hα equivalent width at phases
of 0.4–0.8 as seen in the photometric data, but the data are too sparse there for there to be
clear evidence for periodic variability of the Hα emission.
– 36 –
Fig. 9.— Left: The theoretical predictions of Artymowicz & Lubow (1996) for the depen-
dence of accretion rate on binary orbital phase in a binary with mass ratio M2/M1 = 0.43,
e = 0.1. The top curve shows the total accretion, while the lower curves show accretion onto
the secondary (higher dark curve) and primary (lower dark curve). Right: The same total
accretion curve, but placed onto a logarithmic scale and shifted vertically for comparison
with the phased I-band data. The model here has been given an ad hoc shift of +0.2 in
phase, roughly what is expected given the binary eccentricity (see Section 5.2.2).
	Introduction
	Observations
	Photometry
	Spectroscopy
	Binary orbital parameters
	Periodic variations
	Photometry
	Spectroscopy
	Discussion
	Could the variations be due to rotation?
	Evidence for pulsed accretion
	What are the predictions?
	How well do the data match the predictions?
	Evidence for periodic accretion in other T Tauri binaries
	DQ Tau
	AK Sco
	GW Ori
	V4046 Sgr
	ROXs 42 and ROXs 43B
	KH 15D
	Conclusions
ABSTRACT
  Close pre-main-sequence binary stars are expected to clear central holes in
their protoplanetary disks, but the extent to which material can flow from the
circumbinary disk across the gap onto the individual circumstellar disks has
been unclear. In binaries with eccentric orbits, periodic perturbation of the
outer disk is predicted to induce mass flow across the gap, resulting in
accretion that varies with the binary period. This accretion may manifest
itself observationally as periodic changes in luminosity. Here we present a
search for such periodic accretion in the pre-main-sequence spectroscopic
binary UZ Tau E. We present BVRI photometry spanning three years; we find that
the brightness of UZ Tau E is clearly periodic, with a best-fit period of 19.16
+/- 0.04 days. This is consistent with the spectroscopic binary period of 19.13
days, refined here from analysis of new and existing radial velocity data. The
brightness of UZ Tau E shows significant random variability, but the overall
periodic pattern is a broad peak in enhanced brightness, spanning more than
half the binary orbital period. The variability of the H-alpha line is not as
clearly periodic, but given the sparseness of the data, some periodic component
is not ruled out. The photometric variations are in good agreement with
predictions from simulations of binaries with orbital parameters similar to
those of UZ Tau E, suggesting that periodic accretion does occur from
circumbinary disks, replenishing the inner disks and possibly extending the
timescale over which they might form planets.

<|endoftext|><|startoftext|>
Effect of node deleting on network structure
Ke Deng,∗ Heping Zhao, and Dejun Li
Department of Physics, Jishou University, Jishou, Hunan 416000, People’s Republic of China
The ever-increasing knowledge to the structure of various real-world networks has uncovered their
complex multi-mechanism-governed evolution processes. Therefore, a better understanding to the
structure and evolution of these networked complex systems requires us to describe such processes
in more detailed and realistic manner. In this paper, we introduce a new type of network growth
rule which comprises of adding and deleting of nodes, and propose an evolving network model to
investigate the effect of node deleting on network structure. It is found that, with the introduction of
node deleting, network structure is significantly transformed. In particular, degree distribution of the
network undergoes a transition from scale-free to exponential forms as the intensity of node deleting
increases. At the same time, nontrivial disassortative degree correlation develops spontaneously as a
natural result of network evolution in the model. We also demonstrate that node deleting introduced
in the model does not destroy the connectedness of a growing network so long as the increasing rate
of edges is not excessively small. In addition, it is found that node deleting will weaken but not
eliminate the small-world effect of a growing network, and generally it will decrease the clustering
coefficient in a network.
I. INTRODUCTION
Network structure is of great importance in the topological characterization of complex systems in reality. Actually,
these networked complex systems have been found to share some common structural characteristics, such as the small-
world properties, the power-law degree distribution, the degree correlation, and so on [1, 2, 3]. In the theoretical
description of these findings, the Watts-Strogatz (WS) model [4] provides a simple way to generate networks with
the small-world properties. Barabási and Albert (BA) [5], with a somewhat different aim, proposed an evolving
network model to explain the origin of power-law degree distribution. In this model, by considering two fundamental
mechanisms: growth and preferential attachment (PA), power-law degree distribution emerges naturally from network
evolution. Based on the framework of BA model, many other mechanisms were introduced into network evolution
to reproduce some more complex observed network structures [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17], such as the
degree distribution of broad scale and single scale [6], as well as the degree correlation [17]. These further studies
show that real networked systems may undergo a very complex evolution process governed by multiple mechanisms
on which the occurrence of network structures depends. Therefore, to get a better understanding of the structure and
evolution of complex networks, describing such processes in more detailed and realistic manner is necessary.
In the BA’s framework, the growing nature of real-world networks is captured by a BA-type growth rule. According
to this rule, one node is added into the network at each time step, intending to mimic the growing process of real
systems. This rule gives an explicit description to the real-network’ growing process which, however, can in fact be
much more complex. One fact is that in many real growing networks, there are constant adding of new elements, but
accompanied by permanent removal of old elements (deletion of nodes) [18, 19, 20, 21, 22, 23]. Take the food webs for
a example: there are both additions and losses of nodes (species) at ecological and evolutionary time scales by means
of immigration, emigration, speciation, and extinction [18]. Likewise, for Internet and the World Wide Web (WWW),
node-deleting is reported experimentally in spit of their rapid expansion of size [19, 20, 21, 22, 23]. In the Internet’s
Autonomous Systems (ASs) map case, a node is an AS and a link is a relationship between two ASs. An AS adding
means a new Internet Service Provider (ISP) or a large institution with multiple stub networks joins the Internet. An
AS deleting happens due to the permanent shutdown of the corresponding AS as it is, for example, out of business.
Investigations of the evolution of real Internet maps from 1997 to 2000 verified such network mechanism [19, 20, 21].
The same is for the evolution of WWW, in which the deletions of invalid web pages are also frequently discovered
[22, 23]. In most cases, the deletion of a node is also accompanied with the removal of all edges once attached to it.
These facts justify the investigation of node-deletion’s influence on network structure. In this paper, we introduce a
new type of network growth rule which comprises of adding and deleting of nodes, and propose an evolving network
model to investigate the effect of node deleting on the network structure. Before now several authors have proposed
some models on node removal in networks, such as AJB networks in which a portion nodes are simultaneously removed
∗Electronic address: dengke@jsu.edu.cn
http://arxiv.org/abs/0704.0308v1
mailto:dengke@jsu.edu.cn
from the network [24], and also the decaying [25] and mortal [26] networks, which concerns networks’ scaling property
and critical behavior respectively. Sarshar et al [27] investigated the ad hoc network with node removal, focusing
on the compensatory process to preserve true scale-free state. They are different from present work, in which node
deleting is treated as an ubiquitous mechanism accompanied with the evolution of real-world networks.
This paper is organized as follows. In Section II, an evolving network model taking account of the effect of node
deleting is introduced which reduces to a generalized BA model when the effect of node deleting vanishes. Then
the effect of node deleting on network structure are investigated in five aspects: degree distribution (Section III),
degree correlation (Section IV), size of giant component (Section V), average distance between nodes (Section VI)
and clustering (Section VII). Finally, Section VIII presents a brief summary.
II. THE MODEL
We consider the following model. In the initial state, the network has m0 isolated nodes. At each time step, either
a new node is added into the network with probability Pa or a randomly chosen old node is deleted from the network
with probability Pd = 1 − Pa, where Pa is an adjustable parameter. When a new node is added to the network, it
connects to m (m 6 m0) existing node in the network according to the preferential probability introduced in the BA
model [5], which reads
kα + 1
β(kβ + 1)
where kα is the degree of node α. When an old node is deleted from the network, edges once attached to it are
removed as well. In the model, Pa is varied in the range of 0.5 < Pa ≤ 1, since in the case of Pa 6 0.5 the network
can not grow. In order to give a chance for isolated nodes to receive a new edge, we choose preferential probability
Πα proportional to kα + 1 [7]. Note that when Pa = 1, our model reduces to a generalized BA model [28].
To get a general knowledge to the effect of node deleting on network structure, firstly, a simple analysis to the
surviving probability D(i, t) is helpful. Here, D(i, t) is defined as the probability that a node is added into the
network at time step i, and this node (the ith node) has not been deleted until time step t, where t > i. Supposing
that a node-adding event happens at time step i
, and the probability that the i′th node has not been deleted until
time step t is denoted as D′(i′, t). Then, due to the independence of events happened at each time step, it is easy to
verify that D′(i′, t + 1) = D′(i′, t)[1 − (1 − Pa)/N(t)] with D′(i′, i′) = 1, where N(t) = (2Pa − 1)t is the number of
nodes in the network at moment t (in the limit of large t). In the continuous limit, we obtain
∂D′(i′, t)
= − (1− Pa)
(2Pa − 1)t
D′(i′, t), (2)
which yields
D′(i′, t) =
)−(1−Pa)/(2Pa−1)
. (3)
Thus to get the D(i, t) we should multiply D′(i′, t) with Pa, i.e.
D(i, t) = Pa
)−(1−Pa)/(2Pa−1)
. (4)
One can easily find that D(i, t) decreases rapidly as t increases and/or as i decreases provided 0.5 < Pa < 1. It is well
known that highly connected nodes, or hubs, play very important roles in the structural and functional properties of
growing networks [1, 2, 3]. The formation of hubs needs a long time to gain a large number of connections. As a
consequence, according to Eq. (4), a large portion of potential hubs are deleted during the network evolution. Thus
it can be expected that the introduction of node deleting has nontrivial effects on network structure. In the following
we show how network structure can be effected by the node deleting introduced in present model.
III. DEGREE DISTRIBUTION
The degree distribution p(k), which gives the probability that a node in the network possesses k edges, is a very
important quantity to characterize network structure. In fact, p(k) has been suggested to be used as the first criteria
to classify real-world networks [6]. Therefore it is necessary to investigate the effect of node deleting on the degree
distribution of networks firstly. Now we adopt the continuous approach [29] to give a qualitative analysis of p(k)
for our model with slight node deletion (i.e., when Pd is very small). Supposing that there is a node added into the
network at time step i′, and this node is still in the network at time t, let k(i′, t) be the degree of the i′th node at
time t, where t > i′. Then the increasing rate of k(i′, t) is
∂k(i′, t)
= Pam
k(i′, t) + 1
− (1− Pa)
k(i′, t)
, (5)
where
S(t) =
D′(i′, t)[k(i′, t) + 1] (6)
and the
′ denotes the sum of all i′ during the time step between 0 and t. It is easy to verify that the first term
in Eq. (5) is the increasing number of links of the i′th node due to the preferential attachment made by the newly
added node. The second term in Eq. (5) accounts for the losing of a link of the i′th node during the process of node
deletion, which happened with the probability k(i′, t)/N(t).
Firstly we solve for the S(t) and get
S(t) = (2Pa − 1) (2Pam+ 1) t (7)
(see the Appendix for details). Inserting Eq. (7) back into Eq. (5), one gets
∂k(i′, t)
Ak(i′, t) +B
, (8)
where
2P 2am− Pam+ Pa − 1
(2Pa − 1)(2Pam+ 1)
(2Pa − 1)(2Pam+ 1)
. (10)
When Ak +B > 0, the solution of Eq. (8) is
k(i′, t) =
(Am+B)
. (11)
Now, to get the probability p(k, t) that a randomly selected node at time t will have degree k, we need to calculate the
expected number of nodes Nk(t) with degree k at time t. Then the p(k, t) can be obtained from p(k, t) = Nk(t)/N(t),
where N(t) is the total number of nodes at time t. Let Ik(t) represent the set of all possible nodes with degree k at
time t, then one gets
p(k, t) =
Nk(t)
i∈Ik(t)
D(i, t). (12)
In the continuous-time approach, the number of nodes in Ik(t) is the number of i’s for which k 6 k(i, t) 6 k + 1,
and it is approximated to |∂k(i, t)/∂i|−1i=ik , where ik is the solution of the equation k(i, t) = k. To proceed with our
analysis, now we make the approximation that all nodes in Ik(t) have the same surviving probability D(ik, t) [44].
Under this mean-field approximation, Eq. (12) can be written as
p(k, t) =
D(ik, t)
∂k(i, t)
. (13)
From Eq. (11), we obtain
Ak +B
t. (14)
1 2 3 4 5 6 7 8 9 10
FIG. 1: Pmina [defined in Eq. (20)] as a function of m.
∂k(i, t)
= (Am+B)
t (Ak +B)
−(A+1)/A
. (15)
Inserting Eq. (14) back into Eq. (4) we get
D(ik, t) = Pa
Ak +B
)(A−B)/A
Inserting Eqs. (15) and (16) into Eq. (13), and noting that N(t) = (2Pa − 1)t, we get
p(k, t) =
2Pa − 1
(Am+B)
(B−A+1)/A
(Ak +B)
−(B+1)/A
, (17)
which is a generalized power-law form with the exponent
B + 1
= 2 +
Pam+ 1
2P 2am− Pam+ Pa − 1
. (18)
We point out again that equation (11) is only valid when Ak +B > 0, which translates into A > 0, i.e.
2P 2am− Pam+ Pa − 1 > 0. (19)
Considering that Pa > 0.5, Eq. (19) is satisfied when
Pa > P
(m− 1) +
m2 + 6m+ 1
. (20)
In Fig. 1, we plot Pmina as a function of m. One can see from Fig. 1 that the curve divides our model into two regimes.
(i) Pa > P
a : in this case Ak + B > 0 and equation (11) is valid. Thus, the degree distribution of the network
p(k) exhibits a generalized power-law form. (ii) Pa > P
a : In this case Ak + B > 0 can not be always satisfied
and equation (11) is not valid. Therefore, our continuous approach fails to predict the behavior of p(k), and we will
investigate it with numerical simulations. The Pmina (m), as one can find from Fig. 1, decreases with the increase of
In the power-law regime [Pa > P
a (m)], the behavior of p(k) is predicted by Eqs. (17) and (18), which are obtained
using a mean-field approximation [Eq. (13)]. One can easily verify that such approximation is only exact when Pa = 1,
in which case Eq. (18) turns into γ = 3 + 1/m, in good agreement with the results obtained from generalized BA
100 101 102 103
1.00 0.95 0.90
 power-law fit 
 pa=1.0
 pa=0.95
 pa=0.9
 pa=0.8
 pa=0.7
 pa=0.6
 pa=0.55 
 pa=0.51
exponential fit 
FIG. 2: Cumulative degree distribution P (k) for networks with system size N = 100000 and different values of Pa, in logarithmic
scales. The dash line is power-law fit for Pa = 1. The solid line is the exponential fit for Pa = 0.51. In the simulation, we
set m0 = m = 5 and each distribution is based on 10 independent realizations. Inset plots the power-law exponential γ as a
function of Pa. The continuous curve is according to the analytic result of Eq. (18), and circles to the simulation results.
model studied in Ref [28]. If Pmina (m) < Pa < 1, Eqs. (17) and (18) still give qualitative predictions for the model:
with slight node deletion, p(k) of the network is still power-law, and the exponential γ increases with the decrease of
Pa (inset of Fig. 2).
In remaining regime [Pa < P
a (m)], the limiting case is Pa → 0.5, in which the growth of network is suppressed (a
very slowly growing one). Similar non-growing networks have been studied, for example, for the Model B in Ref[30],
and the degree distribution has the exponential form. Here we conjecture that, in this regime, p(k) of our model
crossovers to an exponential form, which is verified by the numerical simulation results below.
Now we verify the above analysis with numerical simulations. In Fig. 2, we give the cumulative degree distributions
P (k) [3] of the networks with different Pa. As Pa gradually decreases from 1 to 0.5, Fig. 2 shows an interesting
transition process which can be roughly divided into three stages. (1) 0.9 6 Pa 6 1: In this stage, the model works in
the power-law regime and the power-law exponent γ increases as Pa decreases. Inset of Fig. 2 gives the comparison
between the value of γ predicted by Eq. (18) and the one obtained from numerical simulations. One sees that the
theory and the simulation results are in perfect agreement for Pa = 1. As Pa decreases, however, the agreement is only
qualitative and the deviation between theory and simulation becomes more and more obvious. As we have mentioned
above, such increasing deviation is due to the mean-field approximation used in the analysis. These results tell us that
slight node deletion does not cause deviation of the network from scale-free state, but only increases its power-law
exponent. Such robustness of power-low p(k) revealed here gives an explanation to the ubiquity of scale-free networks
in reality. It should be noted that a very similar robustness has also been found in the study of network resilience,
where simultaneously deleting of a portion of nodes was taken into account in static scale-free networks [24]. (2)
0.5 < Pa 6 0.6: In this stage, the model works in the regime of Pa < P
a (m). As one sees from Fig. 2, P (k) of
the network behaviors exponentially. This result indicates that with manifest node deletion, the network will deviate
from scale-free state and become exponential. (3) 0.6 < Pa < 0.9: In this stage, a crossover of the model from the
power-law regime to the exponential regime is found, in which the P (k) is no longer pure scale-free but truncated by
an exponential tail. As one can see, the truncation in P (k) increases as Pa decreases.
Besides the power-law degree distribution, it is now known that p(k) in real world may deviate from a pure power-
law form [18, 31, 32, 33, 34]. According to the extent of deviation, p(k) of real systems has been classified into three
groups [6]: scale-free (pure power-law), broad scale (power-law with a truncation), and single scale (exponential).
Many mechanisms, such as aging [6, 8, 9], cost [6], and information filtering [10], have been introduced into network
growth to explain these distributions. Here, the results of Fig. 2 indicate that a modified version of growth rule can
lead to all the three kinds of p(k) in reality, and it provides another explanation for the origin of the diversity of
degree distribution in real-world: such diversity may be a natural result of network growth.
0.0 2.0x104 4.0x104 6.0x104 8.0x104 1.0x105
-0.12
-0.10
-0.08
-0.06
-0.04
-0.02
 pa=1       pa=0.9
 pa=0.8    pa=0.7
 pa=0.6    pa=0.55
FIG. 3: Assortativity coefficient r plotted with network size N , for different Pa in the model. In the simulation, m0 = m = 5.
Result of each curve is based on 10 independent realizations.
IV. DEGREE CORRELATION
It has been recently realized that, besides the degree distribution, structure of real networks are also characterized
by degree correlations [19, 35, 36, 37, 38]. This translates into the fact that degrees at the end of any given edge in real
networks are not usually independent, but are correlated with one another, either positively or negatively. A network
in which the degrees of adjacent nodes are positively (negatively) correlated is said to show assortative (disassortative)
mixing by degree. An interesting observation emerging from the comparing of real networks of different types is that
most social networks appear to be assortatively mixed, whereas most technological and biological networks appear to
be disassortative. The level of degree correlation can be quantified by the assortativity coefficient r lying in the range
−1 6 r 6 1, which can be written as
i jiki −
(ji + ki)
(j2i + k
(ji + ki)
for practical evaluation on an observed network, where ji, ki are the degrees of the vertices at the ends of the ith edge,
with i = 1, . . . ,M [35]. This formula gives r > 0(r < 0) when the corresponding network is positively (negatively)
correlated, and r = 0 when there is no correlation [45].
Recently, Maslov et al [39] and Park et al [40] have proposed a possible explanation for the origin of such correlation.
They show for a network the restriction that there is at most one edge between any pair of nodes induces negative
degree correlations. This restriction seems to be an universal mechanism (indeed, there is no double edges in most
real networks), therefore, the authors of Ref. [40] conjecture that disassortativity by degree is the normal state of
affairs for a network. Although only a part of the measured correlation can be explained in the way of Ref. [40], this
universal mechanism does give a promising explanation for the origin of degree correlation observed in real networks
of various types.
It will be of great interest to discuss the effect of node deleting on degree correlation. In Fig. 3, we give the
assortativity coefficient r as a function of network size N , for different Pa in our model, for m = 5. As one sees
from Fig. 3, for each value of Pa, after a transitory period with finite-size effect, each r of networks tends to reach
a steady value. When Pa = 1, r → 0 as N becomes large. This result indicates that networks in the BA model are
uncorrelated, in agreement with results obtained in previous studies [35, 38]. When Pa < 1, nontrivial negative degree
correlations spontaneously develop as networks evolve. One can see from Fig. 3 that the steady value of r in the model
decreases with the decreasing Pa. In particular, when Pa 6 0.6, the value of r is about −0.1. These results indicate
that node deleting leads to disassortative mixing by degree in evolving networks. To make such relation more clear,
in Fig. 4, we plot r of networks in our model as a function of Pa, for different m. As the Fig. 3 indicates, when the
network size is larger than 40000, the assortativity coefficient r is nearly stable. So all results in Fig. 4 are obtained
from networks with N = 40000. Fig. 4 gives us the same relation between r and Pa shown in Fig. 3. What is more,
it tells us that for a given Pa, r will increase with the increasing m. The increment gets its maximum between m = 1
1.0 0.9 0.8 0.7 0.6 0.5
-0.20
-0.16
-0.12
-0.08
-0.04
 m=14
 m=15
FIG. 4: Assortativity coefficient r as a function of Pa, for different m in the model. In the simulation, N = 40000. Result of
each curve is based on 10 independent realizations.
0.0 2.0x104 4.0x104 6.0x104 8.0x104 1.0x105
-0.06
 pa=1      pa=0.9
 pa=0.8    pa=0.7
 pa=0.6    pa=0.55
FIG. 5: Assortativity coefficient r plotted with network size N , for different Pa in the randomly growing network model. In
the simulation, m0 = m = 5 and each curve is based on 10 independent realizations.
and other values. We point out that this is because when m = 1, the network has been broke up into small separate
components (see the following section). We can also find from Fig. 4 that the gap between different curves decreases
with the increasing m and the curves tend to merge at large m.
Now we give some explanations to the above observations. In the BA model, the network being uncorrelated is
the result of a competition between two factors: the growth and the preferential attachment (PA). On the one hand,
networks with pure growth is positively correlated. This is because the older nodes, also tending to be higher degree
ones, have a higher probability of being connected to one another, since they coexisted earlier. In Fig. 5, we compute
the assortativity coefficient r of a randomly growing network, which grows by the growth rule of BA-type, while the
newly added nodes connect to randomly chosen existing ones. As one can see from Fig. 5 that pure growth leads to
positive r. On the other hand, the introduction of PA makes the connection between nodes tend to be negatively
correlated, since newly added nodes (usually low degree ones) prefer to connect to highly connected ones. Then
degree correlation characteristic of the BA model is determined by this two factors. In Fig. 6, we plot the average
degree of the nearest neighbor < k >nn as a function of k in the BA model. It is found that nodes with large k show
no obvious biases in their connections. But there is a short disassortative mixing region when k is relatively small
(also reported in Ref. [41], see Fig.1a therein). Such phenomenon can be explained by the effect of these two factor:
10 20 30 40 50 60 70 80 90 100
FIG. 6: Average degree of the nearest neighbor as a function of k for the BA model. In the simulation, N = 10000 and
m = m0 = 5. Result of each curve is based on 1000 independent realizations.
1.0 0.9 0.8 0.7 0.6 0.5
1.0 0.9 0.8 0.7 0.6 0.5
FIG. 7: The relative size of the largest component S as a function of Pa for m = 2, 3, 4, 5. Inset gives the same curve for m = 1.
In the simulations, N = 100000. All results are based on 10 independent realizations.
Growth together with PA makes nodes with large k equally connect to both large and small degree nodes, and the
latter makes nodes with small degree be disassortatively connected. Now, we introduce node-deletion. According to
Eq. (4), depression of the growth of large-degree nodes also decreases the connections between them, therefore makes
the correlation negative. We also investigate the effect of node deleting on the r of the randomly growing network,
and obtained similar results. As one sees from Fig. 5, depression of connections between higher degree nodes causes
the network less positively correlated, and with stronger node-deletion, negatively correlated. Finally, with regard to
the effect of m in this relation (Fig. 4), larger m means more edges are established according to the PA probability
Eq. (1). We conjecture that the orderliness of newly added nodes connecting to large degree nodes will be weakened
by the increasing randomness as m becomes larger, thus leading to a less negative correlation. Such randomness can
not always increase and, as we see from Fig. 4, for large m, e.g., m ≥ 14, the curves tend to merge together.
V. SIZE OF GIANT COMPONENT
In a network, a set of connected nodes forms a component. If the relative size of the largest component S in a
network approaches a nonzero value when the network is grown to infinite size, this component is called the giant
component of the network [1, 2, 3]. In most previously studied growing models [1, 2, 3], due to the BA-type growth
rule they adopted, there is only one huge component in the network, i.e., S ≡ 1. In this extreme case the network
gains a perfect connectedness. The opposite case of S = 1 is the extreme of S = 0, in which case the network,
made up of small components, exhibits no connectedness. Experiments indicate that some real networks seem to
lie in somewhere between these two extreme: they contain a giant component as well as many separate components
[2, 3, 42, 43]. For example, According to Ref.[42], in May of 1999, the entire WWW, containing 203 × 106 pages,
consisted of a giant component of 186× 106 pages and the disconnected components (DC) of about 17× 106 pages. In
general, the introduction of node deletion in our model will cause the emergence of separate components even isolated
nodes in the network. What we interest here is the connectedness of the network. In Fig. 7 we plot the relative size
of the largest component S in the model, as a function of Pa, for m = 2, 3, 4, 5, where m is the number of edges
generated with the adding of a new node. One sees from Fig. 7 that for any 0.5 < Pa ≤ 1, a giant component can be
observed in the model if m > 1. In addition, for the same Pa, S increase as the increase of m. While when m = 1, the
network is found to be broke up into separate components if Pa < 1. For example, when Pa = 0.9, S of the network
with N = 100000 rapidly drops to 0.034. Inset of Fig. 7 gives the S Vs Pa curve for m = 1. These results indicate
that node deleting does not destroy the connectedness of a growing network so long as the increasing rate of edges is
not excessively small.
VI. AVERAGE DISTANCE BETWEEN NODES
Now we study the effect of node deletion on networks’ average distance L between nodes. Here the distance between
any two nodes is defined as the number of edges along the shortest path connecting them. It has been revealed that,
despite their often large size, most real networks present a relatively short L, showing the so-called small-world effect
[1, 2, 3, 4]. Such an effect has a more precise meaning: networks are said to show the small-world effect if the value
of L scales logarithmically or slower with network size for fixed mean degree. This logarithmic scaling can be proved
for a variety of network models [1, 2, 3]. As we have demonstrated in Section V, node deleting does not destroy the
connectedness of the network in our model for any m > 1, since there is always a giant component exists. Here in
our simulation, we calculate L of the giant component of the network in our model using the burning algorithm [3].
In Fig. 8, we plot L as a function of network size N , for different Pa in our model. As one can see from the figure,
for any 0.5 < Pa ≤ 1, a logarithmic scaling L ∼ lnN is obtained, while the proportional coefficient increases with
the decrease of Pa. Furthermore, for a given N , L increases with the decrease of Pa. These results tell us that node
deleting will weaken but not eliminate the small-world effect of a growing network.
VII. CLUSTERING
Finally, we investigate the effect of node deletion on network’s cluster coefficient C, which is defined as the average
probability that two nodes connected to a same other node are also connected. For a selected node i with degree ki
in the network, if there are Ei edges among its ki nearest neighbors, the cluster coefficient Ci of node i is defined as
ki (ki + 1)
. (22)
Then the clustering coefficient of the whole network is the average of all individual Ci. In Fig. 9, we plot C of the
giant component in the network as a function of network size N , for different Pa. As one sees from Fig. 9, for each Pa,
the clustering coefficient C of our model decreases with the network size, following approximately a power law form.
Such size-dependent property of C is shared by many growing network model [1, 2, 3]. Moreover, as Fig. 9 shows,
for the same network-size N , C decreases as Pa decreases. The results of Fig. 9 indicate that node deleting weakens
network’s clustering.
VIII. CONCLUSION
In summary, we have introduced a new type of network growth rule which comprises of adding and deleting of nodes,
and proposed an evolving network model to investigate effects of node deleting on network structure. It has been
102 103 104 105
 Pa=1
 Pa=0.9
 Pa=0.8
 Pa=0.7
 Pa=0.6
 Pa=0.55
 Pa=0.51
FIG. 8: Average distance L of the giant component in the network as a function of network size N , for different Pa in the
model. The chose of some parameters: m0 = m = 5. These curves are results of 10 independent realizations.
102 103 104 105
 Pa=1
 Pa=0.9
 Pa=0.8
 Pa=0.7
 Pa=0.6
 Pa=0.55
FIG. 9: Cluster coefficient C of the giant component in the network as a function of network size N , for different Pa. In the
simulation we set m0 = m = 5. These curves are results of 10 independent realizations.
found that, with the introduction of node deleting, network structure was significantly transformed. In particular,
degree distribution of the network undergoes a transition from scale-free to exponential forms as the intensity of node
deleting increased. At the same time, nontrivial disassortative degree correlation spontaneously develops as a natural
result of network evolution in the model. We also have demonstrated that node deleting introduced in our model does
not destroy the connectedness of a growing network so long as the increasing rate of edge is not excessively small. In
addition, it has been observed that node deleting will weaken but not eliminate the small-world effect of a growing
network. Finally, we have found that generally node deleting will decrease the clustering coefficient in a network.
These nontrivial effects justify further studies of the effect of node deleting on network function [3], which include
topics such as percolation, information and disease transportation, error and attack tolerance, and so on.
Acknowledgments
The authors thank Doc. Ke Hu for useful discussions. This work is supported by the National Natural Science
Foundation of China, Grant No. 10647132, and Natural Science Foundation of Hunan Province, China, Grant No.
00JJY6008.
APPENDIX: THE CALCULATION OF S(T )
To get S(t), we multiply both sides of Eq. (5) by D′(i′, t) and sum up all i′ between 0 and t:
∂k(i′, t)
D′(i′, t) = Pa(m− 1)−
1− Pa
(2Pa − 1)t
S(t) + 1. (A.1)
To get the above equation we have used the definition of S(t) [Eq. (6)] and the following equation:
D′(i′, t) =
D(i, t)di. (A.2)
The left-hand side of Eq. (A.1) can be simplified as:
∂ {[k(i′, t) + 1]D′(i′, t)}
[k(i′, t) + 1]
∂D′(i′, t)
[k(i′, t) + 1]D′(i′, t)
− [k(t, t) + 1]D(t, t)
[k(i′, t) + 1]D′(i′, t)
Pa − 1
(2Pa − 1)t
Substituting the above expression in Eq. (A.1), and noting that k(t, t) = m and D(t, t) = Pa, we get
∂S(t)
2(Pa − 1)
(2Pa − 1)t
S(t) + 2Pam+ 1.
The solution to the above equation is
S(t) = (2Pa − 1) (2Pam+ 1) t.
[1] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
[2] S. N. Dorogovtsev, and J. F. F. Mendes, Adv. Phys. 51, 1079 (2002).
[3] M. E. J. Newman, SIAM Review 45, 167 (2003).
[4] D. J. Watts and S.H. Strogatz, Nature (London) 393, 440 (1998).
[5] A.-L. Barabási and R. Albert, Science, 286, 509 (1999).
[6] L. A. N. Amaral, A. Scala, M. Barthélémy and H. E. Stanley, Proc. Natl. Acad. Sci. U.S.A. 97, 11149 (2000).
[7] R. Albert and A.-L. Barabási, Phys. Rev. Lett. 85, 5234 (2000).
[8] S. N. Dorogovtsev and J. F. F. Mendes, Phys. Rev. E 62, 1842 (2000).
[9] K. Klemm and V. M. Egúıluz, Phys. Rev. E 65, 036123 (2002).
[10] S. Mossa, M. Barthélémy, H. E. Stanley and L. A. N. Amaral, Phys. Rev. Lett. 88, 138701 (2002).
[11] Z. Liu, Y.-C. Lai and N. Ye, Phys. Rev. E 66, 036112 (2002).
[12] S. Fortunato, A. Flammini and F. Menczer, Phys. Rev. Lett. 96, 218701 (2006).
[13] W. Jeżewski, Phys. Rev. E 66, 067102 (2002).
[14] R. Xulvi-Brunet and I. M. Sokolov, Phys. Rev. E 66, 026118 (2002).
[15] A. Vázquez, Phys. Rev. E 67, 056104 (2003).
[16] Tao Zhou, Gang Yan and B.-H. Wang, Phys. Rev. E 71, 046141 (2005).
[17] Wen-Xu Wang, Bo Hu, Tao Zhou, Bing-Hong Wang, and Yan-Bo Xie, Phys. Rev. E 72, 046140 (2005).
[18] J. A. Dunne, R. J. Williams and N. D. Martinez, Proc. Natl. Acad. Sci. U.S.A. 99, 12917 (2002).
[19] K.-I. Goh, B. Kahng and D. Kim, Phys. Rev. Lett. 88, 108701 (2002).
[20] Q. Chen et al., The origins of power laws in Internet topologies revisited, in Proceedings of the 21st Annual Joint Conference
of the IEEE Computer and Communications Societies, IEEE Computer Society (2002).
[21] A. Vázquez, R. Pastor-Satorras1 and A. Vespignani, Phys. Rev. E 65, 066130 (2002).
[22] S. Lawrence and C. Lee Giles, Science, 280, 98 (1998).
[23] B. A. Huberman and L. A. Adamic, Nature (London), 401, 131 (1999).
[24] R. Albert, H. Jeong and A.-L. Barabási, Nature (London) 406, 378 (2000).
[25] S. N. Dorogovtsev and J. F. F. Mendes, EuroPhys. Lett. 52, 33 (2000).
[26] J. L. Slater, B. D. Hughes and K. A. Landman, Phys. Rev. E 73, 066111 (2006).
[27] N. Sarshar and V. Roychowdhury, Phys. Rev. E 69, 026101 (2004).
[28] S. N. Dorogovtsev, J. F. F. Mendes and A. N. Samukhin, Phys. Rev. Lett. 85, 4633 (2000).
[29] S. N. Dorogovtsev and J. F. F. Mendes, Phys. Rev. E 63, 056125 (2001).
[30] A.-L. Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999).
[31] M. E. J. Newman, Computer Physics Communications 147, 40 (2002).
[32] J. Camacho, R. Guimerà and L. A. N. Amaral, Phys. Rev. Lett. 88, 228102 (2002).
[33] M. E. J. Newman, S. Forrest and J. Balthrop, Phys. Rev. E 66, 035101 (2002).
[34] H. Jeong, S. P. Mason, A.-L. Barabási and Z. N. Oltvai, Nature (London), 411, 41 (2001).
[35] M. E. J. Newman, Phys. Rev. Lett. 89, 208701 (2002).
[36] R. Pastor-Satorras1, A. Vázquez and A. Vespignani, Phys. Rev. Lett. 87, 258701 (2001).
[37] S. Maslov and K. Sneppen, Science, 296, 910 (2002).
[38] M. E. J. Newman, Phys. Rev. E 67, 026126 (2003).
[39] S. Maslov, K. Sneppen and A. Zaliznyak, e-print cond-mat/0205379.
[40] J. Park and M. E. J. Newman, Phys. Rev. E 68, 026112 (2003).
[41] Huang Zhuang-Xiong, Wang Xin-Ran and Zhu Han, Chinese Physics 13, 273 (2004).
[42] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, in Proceedings of
the 9th WWW Conference (Elsevier, Amsterdam, 2000), p. 309.
[43] S. N. Dorogovtsev, J. F. F. Mendes and A. N. Samukhin, Phys. Rev. E 64, 025101(R) (2001).
[44] It seems that this is not a very good approximation, since investigations indicate that values of (∂D(i, t)/∂i) |i=ik are large
and increase rapidly with the decrease of Pa. Thus the analysis here is a qualitative one and only suit for the condition of
slight node deletion in the model.
[45] Another way to represent degree correlation is to calculate the mean degree of the nearest neighbors of a vertex as a function
of the degree k of that vertex. Although such way is explicit to characterize degree correlation for highly heterogeneously
organized networks, for less heterogeneous networks (this is the case in the proposed model when the intensity of node
deleting increases, see Fig. 2), it may be very nosy and difficult to interpret. So here we adopt the assortativity coefficient
r to characterize degree correlation in the model.
http://arxiv.org/abs/cond-mat/0205379
	INTRODUCTION
	THE MODEL
	DEGREE DISTRIBUTION
	DEGREE CORRELATION
	SIZE OF GIANT COMPONENT
	AVERAGE DISTANCE BETWEEN NODES
	CLUSTERING
	CONCLUSION
	Acknowledgments
	THE CALCULATION OF S(T)
	References
ABSTRACT
  The ever-increasing knowledge of the structure of various real-world networks
has uncovered their complex multi-mechanism-governed evolution processes.
Therefore, a better understanding of the structure and evolution of these
networked complex systems requires us to describe such processes in a more
detailed and realistic manner. In this paper, we introduce a new type of
network growth rule which comprises addition and deletion of nodes, and propose
an evolving network model to investigate the effect of node deleting on network
structure. It is found that, with the introduction of node deleting, network
structure is significantly transformed. In particular, degree distribution of
the network undergoes a transition from scale-free to exponential forms as the
intensity of node deleting increases. At the same time, nontrivial
disassortative degree correlation develops spontaneously as a natural result of
network evolution in the model. We also demonstrate that node deleting
introduced in the model does not destroy the connectedness of a growing network
so long as the increasing rate of edges is not excessively small. In addition,
it is found that node deleting will weaken but not eliminate the small-world
effect of a growing network, and generally it will decrease the clustering
coefficient in a network.

<|endoftext|><|startoftext|>
Introduction
It is well known that the Hamiltonian cycle problem(HCP) is one of the standard
NP-complete problem [1]. As for digraphs, even when the digraphs on this case:
planar digraphs with indegree 1 or 2 and outdegree 2 or 1 respectively, it is still
NP − Complete which is proved by J.Plesńık [2].
Let us named a simple strong connected digraphs with at most indegree 1 or
2 and outdegree 2 or 1 as Γ digraphs. This paper solves the HCP of Γ digraphs
with following main results.
Theorem 1. Given an incidence matrix Cnm of Γ digraph, building a mapping:F =
, then F is a incidence matrix of undirected balanced bipartite graph
G(X,Y ;E), which obeys the following properties:
c1. |X | = n,|Y | = n,|E| = m
∀xi ∈ X ∧ 1 ≤ d(xi) ≤ 2
∀yi ∈ Y ∧ 1 ≤ d(yi) ≤ 2
c3. G has at most n
components which is length of 4.
Let us named the undirected balanced bipartite graph G(X,Y : E) of Γ
digraph as projector graph.
http://arxiv.org/abs/0704.0309v3
Theorem 2. Let G be the projector graph of a Γ graph D(V,A), determining
a Hamiltonian cycle in Γ digraph is equivalent to find a perfect match M in
G and r(C′) = n − 1, where C′ is the incidence matrix of D′(V, L) ⊆ D and
L = {ai|ai ∈ D ∧ ei ∈ M}.
Let the each component of G corresponding to a boolean variable, a mono-
tonic function f(M) is build to represents the number of component in D. Based
on this function, the maximum number of non-isomorphism perfect matching is
linear, thus complexity of Γ digraphs has a answer.
Theorem 3. Given the incidence matrix Cnm of a Γ digraph , the complexity
of finding a Hamiltonian cycle existing or not is O(n4)
The concepts of cycle and rank of graph are given in section 2. Then theorems
1,2,3 are proved in sections 3,4,5 respectively. The last section discusses the P
versus NP in more detail.
2 Definition and properties
Throughout this paper we consider the finite simple (un)directed graph D =
(V,A) (G(V,E), respectively), i.e. the graph has no multi-arcs and no self loops.
Let n and m denote the number of vertices V and arcs A (edges E, respectively),
respectively.
As conventional, let |S| denote the number of a set S. The set of vertices V
and set of arcs of A of a digraph D(V,A) are denoted by V = {vi|1 ≤ i ≤ n}
and A = {aj |(1 ≤ j ≤ m) ∧ aj =< vi, vk >, (vi 6= vk ∈ V )} respectively,
where < vi, vk > is a arc from vi to vk. Let the out degree of vertex vi denoted
by d+(vi), which has the in degree by denoted as d
−(vi) and has the degree
d(vi) which equals d
+(vi) + d
−(vi). Let the N
+(vi) = {vj | < vi, vj >∈ A}, and
N−(vi) = {vj | < vj , vi >∈ A}.
Let us define a forward relation ⊲⊳ between two arcs as following, ai ⊲⊳ aj =
vk iff ai =< vi, vk > ∧aj =< vk, vj >. It is obvious that |ai ⊲⊳ ai| = 0 .
A cycle L is a set of arcs (a1, a2, . . . , al) in a digraph D, which obeys two
conditions:
c1. ∀ai ∈ L, ∃aj , ak ∈ L \ {ai}, ai ⊲⊳ aj 6= aj ⊲⊳ ak ∈ V
c2. |
ai 6=aj∈L
ai ⊲⊳ aj | = |L|
If a cycle L obeys the following conditions, it is a simple cycle.
c3. ∀L′ ⊂ L, L′ does not satisfy both conditions c1 and c2.
A Hamiltonian cycle L is also a simple cycle of length n = |V | ≥ 2 in digraph.
As for simplify, this paper given a sufficient condition of Hamiltonian cycle in
digraph.
Lemma 1. If a digraph D(V,A) include a sub graph D′(V, L) with following
two properties, the D is a Hamiltonian graph.
c1. ∀vi ∈ D′ → d+(vi) = 1 ∧ d−(vi) = 1,
c2. |L| = |V | ≥ 2 and D′ is a strong connected digraph.
A graph that has at least one Hamiltonian cycle is called a Hamiltonian
graph. A graph G=(V ;E) is bipartite if the vertex set V can be partitioned into
two sets X and Y (the bipartition) such that ∃ei ∈ E, xj ∈ X, ∀xk ∈ X \ {xj},
(ei ⊲⊳ xj 6= ∅ → ei ⊲⊳ xk = ∅) (ei, Y , respectively). if |X | = |Y |, We call that
G is a balanced bipartite graph. A matching M ⊆ E is a collection of edges
such that every vertex of V is incident to at most one edge of M , a matching of
balanced bipartite graph is perfect if |M | = |X |. Hopcroft and Karp shows that
constructs a perfect matching of bipartite in O((m + n)
n) [3]. The matching
of bipartite has a relation with neighborhood of X .
Theorem 4. [4] A bipartite graph G = (X,Y ;E) has a matching from X into
Y if and only if |N(S)| ≥ S, for any S ⊆ X.
Lemma 2. A even length of simple cycle consist of two disjoin perfect matching.
Two matrices representation for graphs are defined as follows.
Definition 1. [5] The incidence matrix C of undirected graph G is a two di-
mensional n×m table, each row represents one vertex, each column represents
one edge, the cij in C are given by
cij =
1, if vi ∈ ej;
0, otherwise.
It is obvious that every column of an incidence matrix has exactly two 1
entries.
Definition 2. [5] The incidence matrix C of directed graph D is a two dimen-
sional n×m table, each row represents one vertex, each column represents one
arc the cij in C are given by
cij =
1, if < vi, vi >⊲⊳ aj = vi;
−1, if aj ⊲⊳< vi, vi >= vi;
0, otherwise.
It is obvious to obtain a corollary of the incidence matrix as following.
Corollary 1. Each column of an incidence matrix of digraph has exactly one 1
and one −1 entries.
Theorem 5. [5] The C is the incidence matrix of a directed graph with k com-
ponents the rank of C is given by
r(C) = n− k (3)
In order to convince to describe the graph D properties, in this paper, we
denotes the r(D) = r(C).
3 Divided incidence matrix and Projector incidence
matrix
Firstly, let us divided the matrix of C into two groups.
C+ = {cij |cij ≥ 0 otherwise 0 } (4)
C− = {cij |cij ≤ 0 otherwise 0 } (5)
It is obvious that the matrix of C+ represents the forward arc of a digraph
and C− matrix represents the backward arc respectively. A corollary is deduced
as following.
Corollary 2. A digraph D = (V,A) is strong connected if and only if the rank
of divided incidence matrix satisfies r(C+) = r(C−) = |V |.
Secondly, let us combined the the C+ and C− as following matrix.
In more additional, let F represents as an incidence matrix of undirected
graph G(X,Y ;E). The F is named as projector incidence matrix of C and
G is named as projector graph , where X represents the vertices V + of D, Y
represents the vertices of V − respectively. In another words we build a mapping
F : D → G and denotes it as G = F (D). So the F (D) has 2n vertices and
m edges if D has n vertices and m arcs. We also build up a reverse mapping:
F−1 : G → D When G is a projector graph. To simplify, we also denotes the
arcs ai = F
−1(ei), v
i = F
−1(xi) and v
i = F
−1(yi).
3.1 Proof of Theorem 1
Firstly, let us prove the theorem 1.
Proof. c1. Since Γ digraph is strong connected, then each vertices of Γ digraph
has at least one forward arcs, each row of C+ has at least one 1 entries, and
the U represents the C+ , so
|U | = n
the same principle of C−, each row of C− has at least one −1 entries, and
the V represents the C− , so
|V | = n
Since the columns of F equal to the columns of C,
|E| = m
c2. Since the degree of each vi of Γ digraph is 1 ≤ d+(vi) ≤ 2,
∀ui ∈ U ∧ 1 ≤ d(ui) ≤ 2
Since the degree of each vi of Γ digraph is 1 ≤ d−(vi) ≤ 2,
∀vi ∈ V ∧ 1 ≤ d(vi) ≤ 2
c3. Let us prove by contradiction, suppose there are k > n
components with
length of 4 in G. Since D is strong connected, according to the corollary 2,
r(F ) = 3n
− q ≥ r(C+) = n, where q ≥ k is number of components (in-
cluding k components with length of 4). Thus q ≤ n
, then there are only x
components without length 4, where x is
x = q − k < n
Suppose the remind x components with length of t (at least t vertices con-
nected by some edges), then 4k + xt = 3n
. So tx = 3n
− 4k < n
. According
to the equation 7, the t < 2. It is contradict that the D is strong connected.
3.2 The cycle in digraph corresponding matching in projector graph
Secondly, let us given the properties after mapping Hamiltonian cycle L of D
into the sub graph M of projector graph G.
Lemma 3. If a Hamiltonian cycle L of D mapping into a forest M of projector
graph G, the forest M consist of |L| number of trees which has only two node
and one edge, and M has a unique perfect matching.
Proof. Let the Γ digraph D(V,A) has a sub digraph D′(V, L) which exists one
Hamiltonian cycle and |L| = n, the incidence matrix C of L could be permutation
as follows.
1 0 0 . . . 0 −1
−1 1 0 . . . 0 0
0 −1 1 . . . 0 0
0 0 −1 . . . 0 0
0 0 0 . . . 0 0
0 0 0 . . . −1 1
. (8)
It is obvious that each row of F has only one 1 entry and each column of F
has two 1 entries.
According to theorem 1, F represents a balanced bipartite graph G(X,Y ;E)
that each vertex has one edge connected, and each edge ei connect on vertex
xi ∈ X , another in Y , in another words, ∃ei ∈ E xj ∈ X ,∀xk ∈ X \ {xj}, ei ⊲⊳
xj 6= ∅ → ei ⊲⊳ xk = ∅(ei, Y ,respectively). According the matching definition,
M is a matching, since |E| = |L|, E is a perfect matching. and pair of vertices
between X and Y only has one edge, so M is a forest, and each tree has only
two node with one edge.
4 Proof of Theorem 2
Proof. ⇒ Let the Γ digraph D(V,A) has a sub digraph D′(V, L) which is a
Hamiltonian cycle and |L| = n, let matrix C′ represents the incidence matrix of
D′, so r(C′) = n − 1; According to lemma 3, the projector graph F (D′) has a
perfect matching, thus F (D) also has a perfect matching.
⇐ Let G(X,Y ;E) be a projector graph of the Γ graphD(V,A),M is a perfect
matching in G. Let D′(V, L) be a sub graph of D(V,A) and L = {ai|ai ∈ D∧ei ∈
M}. Since r(L) = n− 1, D′(V, L) is a strong connected digraph. it deduces that
∀vi ∈ D′,d+(vi) ≥ 1 ∧ d−(vi) ≥ 1. Suppose ∃vi ∈ D′, d+(vi) > 1 (d−(vi) > 1
respectively), Since |M | = n, it deduces that
i=1 d(vi) > 2n+ 1, which imply
that |L| > n. this is contradiction with L = {ai|ai ∈ D ∧ ei ∈ M} and |M | = n.
So ∀vi ∈ D′, d+(vi) = d−(vi) = 1, According the lemma 1, D′ has a Hamiltonian
cycle.
5 Number of perfect matching in projector graph
Let us considering the number of perfect matching in G . Firstly, let us consid-
ering a example as shown in figure 1.
Figure 1. Original Digraph D
Then the projector graph is shown in figure 2.
Figure 2. Projector graph G
. . .
Given a perfect matching M , each component(cycle) in G has two partition
edges belong to M . Let us code component Gi which |Gi| > 2 and matching M
to a binary variable.
1, if Gi ∩M = {ej, ek, . . .};
0, if Gi ∩M = {el, eq, . . .}.
Now there are two cases for the number of perfect matching.
Label edge. In that cases, the Code(M1) = {0, 0, 1} is different with Code(M2) = {0, 1, 0}.
If there are k number of components(cycles), then there are 2k perfect match-
Unlabel edge. In that cases, the Code(M1) = {0, 0, 1} is isomorphic to Code(M2) = {0, 1, 0}.
The same principle that Code(M3) = {0, 1, 1} is isomorphic to Code(M4) =
{1, 1, 0} but is not isomorphic to Code(M1).
Then let us summary the maximal number of perfect matching in these two
cases.
Lemma 4. The maximal number of labeled perfect matching in a projector graph
G is 2
4 , but the maximal number of unlabeled perfect matching in a projector
graph G is n
Proof. According to the theorem 1, there at most n
components with a compo-
nents which is length of k = 4. When k=2, there are only one perfect matching
in G; When k = 4, there are n
components which is C4, and so on when k = 6,
there are n
components which is C6, etc, so on. According to the lemma 2, each
simple cycle has divided the perfect matching into two class. So maximal number
perfect matching in the non isomorphism cycle which is 2
4 . Since in unlabeled
cases, every C4 cycle is isomorphism, the maximal number of perfect matching
is 2 ∗ n
Review the example 1 again, it is easy find that follow proposition.
Proposition 1. Given two perfect matching M1 and M2 in projector graph G,
if code(M1) = code(M2), then the r(F−1(M1)) = r(F−1(M2)).
5.1 Proof of Theorem 3
Now let us proof the theorem 3.
Proof. Let G be a project balanced bipartition of D. According theorem 1, the
Γ graph is equivalent to find a perfect match M in a project G.
According to the lemma 4, the maximal number non isomorphism perfect
matching in G is only n.
Thus it is only need exactly enumerate all of non isomorphism perfect match-
ing M , then obtain the value = r(F−1(M)),if value = n − 1, then the ei ∈ M
is also ei ∈ C, where C ⊂ D is a Hamiltonian cycle.
Since the complexity of rank of matrix is O(n3), finding a simple cycle in
a component with degree 2 is O(n2), and obtaining a perfect matching of a
bipartite graph is O((m+ n)
n) < O(n2) [3]. Then all exactly algorithms need
to calculate the n time o(n3). Thus the complexity is O(n4).
Since the non isomorphism perfect matching comes from the coding of edges
in the component of G, it is not easy implementation.
Let us give two recursive equation to obtain a perfect matching M from G.
Suppose there are k component G1, G2, . . .Gk in G where Gi is a component
with degree 2 and |Ei| ≥ 3.
M ′ =
M(t)⊗Gt, Gt is a cycle ;
M(t), otherwise.
M(t+ 1) =
M ′, if r(F−1(M ′)) > r(F−1(M(t))) ;
M(t), otherwise.
where t ≤ k − 1, when t = 0, M(0) is the initial perfect matching from G.
When r(F−1(M(t))) = n− 1, According the theorem 1, the A = F−1(M(t))
is a Hamiltonian cycle solution. If all of r(F−1(M(t))) < n − 1, then there has
no Hamiltonian cycle in D.
Since the non isomorphism perfect matching M in G is poset, the function
r(F−1(M)) in G is monotonic, so this approach is exactly approach.
Let us give a example to illustrate the approach in detail.
Example 1. Considering the digraph D in figure 1, then the projector graph G
in figure 2.
Let M(0) = {e1, e8, e22, e9, e10, e3, e20, e11, e19, e5, e18, e6, e17, e7, e15, e16}.
Thus the r(F−1(M(0)) = n−3. LetM ′ = r(F−1(M(0)⊗G3),then r(F−1(M ′) =
n−4, thus M(1) = M(0) and then turn to G2,G1. At last it obtain the solution.
Considering the equation 11, let it substituted by following equations when
r(M ′) = n− 1 and t < k − 1.
M(t+ 1) = M ′ if r(F−1(M ′)) ≥ r(F−1(M(t))) (12)
It is obvious that all non-isomorphism Hamiltonian cycle could obtain by the
repeat check the equation 12 and the equation r(M ′) = n− 1.
In conversely, if a Hamiltonian cycle of Γ digraphs is given, it represents a
perfect matching M in its projector graph G. Thus the equation 12 and Theo-
rem 3 follows a corollary.
Corollary 3. Given a Hamiltonian Γ digraph, the complexity of determining
another non-isomorphism Hamiltonian cycle is polynomial time.
5.2 The HCP in digraph with bound two
Let us extend the Theorem 3 to digraphs with d+(v) ≤ 2 and d−(v) ≤ 2 in this
section.
Theorem 6. The complexity of finding a Hamiltonian cycle existing or not in
digraphs with degree d+(v) ≤ 2 and d−(v) ≤ 2 is polynomial time.
Proof. Suppose a digraph D(V,A) having a vertex vi is shown as figure 3, which
is d(vi) = 2 ∧ d−(vi) = 2
Figure 3. A vertex with degree than 2
❍❍❍❍❍❍❥ ♠
❍❍❍❍❍❍❥a3
Let us spilt this vertex to two vertices that one of vertex has degree with
in degree 2 or out degree 1 , another vertex has degree with in degree 1 or out
degree 2 as shown in figure 4. Then the D is derived to a new Γ graph S.
Figrue 4 A vertex in D is mapping to a vertex in Γ digraph
It is obvious that each vertex in the Γ graph S has increase 1 vertices and 1
arcs of D. Suppose the worst cases is each vertex in D has in degree 2 and out
degree 2, the total vertices in S has 2n vertices.
According to the theorem 3, obtain a Hailtonian cycle L′ in S is no more
then O(n4), then the D will has a Hamiltonian cycle L′ = L ∩ A.
6 Discussion P versus NP
The P versus NP is a famous open problem in computer science and math-
ematics, which means to determine whether very language accepted by some
nondeterministic algorithm in polynomial time is also accepted by some deter-
ministic algorithm in polynomial time [6]. Cook give a proposition for the P
versus NP .
Proposition 2. If L is NP-complete and L ∈ P , then P = NP .
According above proposition and the result above section, P versus NP
problem has a answer.
Theorem 7. P = NP
Proof. As the result of [2], the complexity of HCP in digraph with bound two
is NP − complete. According the theorem 6, the complexity of HCP in digraph
with bound two is also P , thus according to proposition 2, P = NP .
In fact, the [2] proves that 3SAT �p HCP of Γ digraph, since 3SAT is a
NPC problem, which also implies that P = NP .
7 Conclusion
According to the theorem 6, the complexity of determining a Hamiltonian cycle
existence or not in digraph with bound degree two is in polynomial time. And
according to the theorem 7, P versus NP problem has closed, P = NP .
Acknowledgements
The author would like to thank Prof. Kaoru Hirota for valuable suggestions,
thank Prof. Jørgen Bang-Jensen who called mine attention to the paper [2], and
thank Andrea Moro for useful discussions.
References
1. Papadimitriou, C. H. Computational complexity , in Lawler, E. L., J. K. Lenstra, A.
H. G. Rinnooy Kan, and D. B. Shmoys, eds., The Traveling Salesman Problem: A
Guided Tour of Combinatorial Optimization. Wiley, Chichester, UK. (1985), 37–85
2. J.Plesńık,The NP-Completeness of the Hamiltonian Cycle Problem in Planar di-
graphs with degree bound two, Journal Information Processing Letters, Vol.8(1978),
199–201
3. J.E. Hopcroft and R.M. Karp , An n5/2 Algorithm for Maximum Matchings in
Bipartite Graphs . SIAM J. Comput. Vol.2, (1973), 225–231
4. P. Hall, On representative of subsets, J. London Math. Soc. 10, (1935), 26–30
5. Pearl, M, Matrix Theory and Finite Mathematics,McGraw-Hill, New York,(1973),
332–404.
6. Stephen Cook. The P Versus NP Problem ,”http://citeseer.ist.psu.edu/302888.html”
,2000.
	 The Complexity of HCP in Digraps with Degree Bound Two
	Guohun Zhu
	Introduction
	Definition and properties
	Divided incidence matrix and Projector incidence matrix
	Proof of Theorem ?? 
	The cycle in digraph corresponding matching in projector graph
	Proof of Theorem ?? 
	 Number of perfect matching in projector graph 
	Proof of Theorem ?? 
	 The HCP in digraph with bound two
	 Discussion P versus NP
	Conclusion
ABSTRACT
  The Hamiltonian cycle problem (HCP) in digraphs D with degree bound two is
solved by two mappings in this paper. The first bijection is between an
incidence matrix C_{nm} of simple digraph and an incidence matrix F of balanced
bipartite undirected graph G; The second mapping is from a perfect matching of
G to a cycle of D. It proves that the complexity of HCP in D is polynomial, and
finding a second non-isomorphism Hamiltonian cycle from a given Hamiltonian
digraph with degree bound two is also polynomial. Lastly it deduces P=NP base
on the results.

<|endoftext|><|startoftext|>
Introduction
GHz-Peaked-Spectrum (GPS) radio sources are powerful
(P1.4 GHz ≥ 10
25 W Hz−1), compact (≤ 1 kpc), and have con-
vex radio spectra, and they make up a significant fraction (≈
10%) of the bright radio source sample, see O’Dea (1998) for
a review. In general, the presence of large scale emission asso-
ciated with GPS galaxies is rare, about a few percent in a GPS
sample (Stanghellini et al. 2005). Most GPS sources appear to
be truly compact and isolated.
Their small size is most likely due to their youth (< 104
years) according to a spectral aging analysis (Murgia 2003). A
couple of GPS sources are certainly young radio sources whose
kinematic age from lobe proper motions has been measured
and these sources are also identified as Compact Symmetric
Objects (CSOs). There is compelling evidence in favour of the
youth scenario of GPS sources and CSOs, see e.g. Owsianik
& Conway (1998), Tschager et al. (2000), Polatidis & Conway
(2003), and Orienti et al. (2007). The GPS sources and CSOs
are the key objects to study the early evolution of power-
ful radio-loud AGN. A unification scenario assumes that GPS
sources evolve into Compact Steep Spectrum sources (1-15
kpc), which in turn, evolve into classical extended radio sources
(> 15 kpc), i.e. FR I/II radio sources (Fanti et al. 1995, Snellen
et al. 2000, de Vries et al. 2007).
Send offprint requests to: X. Liu: liux@ms.xjb.ac.cn
GPS galaxies are dominated by lobe/jet emission on both
sides of the central engine, and are thought to be relatively free
of beaming effects. The GPS galaxies show very low polariza-
tion (about less than 0.5% at 5 GHz, Dallacasa 2004, Xiang
et al. 2006). The low integrated polarization could be due to
large Faraday depths around the radio source, which would de-
polarize the radio emission, implying that their host-AGNs are
probably edge-on to us.
Since GPS sources live in the narrow line region of AGN,
it is likely that their low frequency radio emission will be ab-
sorbed due to either synchrotron self-absorption or free-free
absorption, giving rise to a peaked radio spectrum. Therefore,
GPS sources are also suitable for studying radio absorption and
scattering in AGNs.
We have carried out EVN (European VLBI Network) ob-
servations of 19 GPS sources, 15 of them are from the Parkes
Half Jansky (PHJ) sample (Snellen et al. 2002) with declina-
tion > −5◦ and not observed with VLBI before. Four sources
are from our previous observation list which we have observed
with the EVN at 2.3/8.4 GHz and/or 5 GHz (see Xiang et al.
2005, 2006). We aimed at imaging the GPS sources at 1.6 GHz,
in order to confirm whether the GPS sources are double-lobe
sources, and to find CSO candidates. For the sources with ob-
servations at 2.3, 5.0 and 8.4 GHz, the 1.6 GHz images will
further provide information on their source structure and inten-
http://arxiv.org/abs/0704.0310v1
2 X. Liu et al.: VLBI observations of nineteen GHz-Peaked-Spectrum radio sources at 1.6 GHz
sity at lower frequency, for further spectral study of the GPS
sources in the future.
2. Observations and data reduction
The observations were carried out on 3 March 2006 at
1.65 GHz using the MK5 recording system with a band-
width of 32 MHz and sample rate of 256 Mbps in dual
circular polarization. The EVN antennae in this experiment
were Effelsberg, Westerbork, Jodrell, Medicina, Noto, Onsala,
Torun, Hartebeesthoek, Urumqi and Shanghai. Snapshot obser-
vations of 19 sources (Table 1) in a total of 24 hours were made.
OQ208 and DA193 were observed as calibrators. The data cor-
relation was completed at JIVE.
The total flux densities of the sources were also measured
at 5 GHz with Urumqi 25m telescope in order to find any flux
variability. The values are listed in Table 2.
The Astronomical Image Processing System (AIPS) has
been used for editing, a-priori calibration, fringe-fitting, self-
calibration, imaging and model fitting of the data.
3. Results and comments on individual sources
We list the basic information of the sources in Table 1, and the
parameters derived from the VLBI images in Table 3. We com-
ment on the results of each source and give a short discussion.
We use S ∝ ν−α to define the spectral index. Optical informa-
tion and redshifts of the GPS sources in the PHJ sample are
given by de Vries et al. (2007), as listed in Table 1.
3.1. J0210+0419 (PKS B0208+040)
The 1.6 GHz VLBI image (Fig. 1) is the first VLBI image of
the source. It shows a double-lobe structure, and is most likely
a CSO. Optical observations did not result in an identification
with a lower limit of mR > 24.1, but it is identified with a mag-
nitude of Ks=18.3 (de Vries et al. 2007).
3.2. J0323+0534 (4C+05.14)
The 1.6 GHz VLBI image (Fig. 2) is the first VLBI image of the
source, and it exhibits a strong diffuse component and a weak
extended component in the south. Both are likely lobe emis-
sion. About 38% total flux density (estimated from Table 1) is
resolved out in the VLBI image, due to the diffuse components.
For its size of 490 pc, the source can be a CSO.
3.3. J0433−0229 (4C−02.17)
The 1.6 GHz VLBI image (Fig. 3) is the first VLBI image of the
source, and the main component is diffuse and extended in the
north-south direction, and a possible weak component in the
south. About 18% total flux density (estimated from Table 1)
is resolved out in the VLBI image. Either a core-jet or a CSO
classification is possible for the source.
3.4. J0913+1454 (PKS B0910+151)
The 1.6 GHz VLBI image (Fig. 4) is the first VLBI image of
the source. It shows double structure and both components are
further resolved. There is probably a hot-spot imbedded in the
bright one. We consider it as a CSO candidate.
3.5. J1057+0012 (PKS B1054+004)
The 1.6 GHz VLBI image (Fig. 5) is the first VLBI image of
the source. There is a bright compact component followed by
a secondary component and a series of possible weak compo-
nents in the east, indicating this is a core-jet source. A flux vari-
ability of (−11.4 ± 3.6)% over 15 years at 5 GHz, as reported
in Table 2, is consistent with the core-jet classification.
3.6. J1109+1043 (PKS B1107+109)
The 1.6 GHz VLBI image (Fig. 6) is the first VLBI image of
the source. It is a double structure, and can be a CSO candi-
date. The total flux density (1270 mJy estimated from Table 1)
is completely restored in the VLBI image (1370 mJy, increased
by 8%). There is also an indication of total flux increasing
(4.9 ± 12.4)% at 5 GHz in Table 2 but with a large error.
3.7. J1135−0021 (4C−00.45)
The 1.6 GHz VLBI image (Fig. 7) is the first VLBI image of
the source. It shows a double-lobe structure, and with a size of
720 pc, we classify the source as a CSO.
3.8. J1203+0414 (PKS B1200+045)
The 1.6 GHz VLBI image (Fig. 8) is the first VLBI image of
the source. The triple structure may consist of a core and two
sided emission, or a one sided core-jet source. The quasar as
newly identified by de Vries et al (2007), is possibly a core-jet
one, but still we keep the source as a CSO candidate.
3.9. J1352+0232 (PKS B1349+027)
The 1.6 GHz VLBI image (Fig. 9) is the first VLBI image of
the source. It shows a double-lobe like structure, for its size of
918 pc we consider it as a CSO.
3.10. J1352+1107 (4C+11.46)
The 1.6 GHz VLBI image (Fig. 10) is the first VLBI image of
the source. It appears to have a compact double structure or a
core-jet alike, and seems diffuse emission around the source.
About 30% total flux density (estimated from Table 1) is re-
solved out in the VLBI image. Either a core-jet or a compact
double classification is possible.
3.11. J1600−0037 (PKS B1557−004)
The 1.6 GHz VLBI image (Fig. 11) is the first VLBI image of
the source, and it has an overall double structure, the eastern
X. Liu et al.: VLBI observations of nineteen GHz-Peaked-Spectrum radio sources at 1.6 GHz 3
Table 1. The GPS sources. Columns (1),(2) source names; (3) optical identification (G: galaxy, QSO: quasar, EF: empty field);
(4) optical magnitude; (5) redshift (de Vries et al. 2007, those with * are a photometric estimated by Tinti et al. 2005); (6) linear
scale factor pc/mas [H0 = 71kms
−1Mpc−1 and q0 = 0.5 have been assumed]; (7) maximum angular size from the observation; (8)
maximum linear size; (9) 1.4 GHz flux density from the NVSS; (10) 2.7 GHz flux density from Snellen sample and the NED; (11)
low frequency spectral index; (12) higher frequency spectral index ( computed from columns 9 and 10); (13) turnover frequency;
(14) peak flux density; (15) references for the spectral information, 1 Snellen et al. 2002, 2 de Vries et al. 1997, 3 Stanghellini et
al. 1998, where S ∝ ν−α.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
S ource id mR z pc/mas θ L S 1.4 S 2.7 αl αh νm S m re f
mas pc mJy Jy GHz Jy
J0210+0419 B0208+040 G 18.3Ks 1.5* 6.1 90 948 0.56 0.80 0.4 1.3 1
J0323+0534 4C+05.14 G 19.2 0.1785 2.7 180 490 2793 1.60 0.85 0.4 7.1 1
J0433−0229 4C−02.17 G 19.1 0.530 5.1 80 408 1462 1.04 0.52 0.4 3.0 1
J0913+1454 B0910+151 G 22.9 0.47* 4.9 80 881 0.54 0.75 0.6 1.1 1
J1057+0012 B1054+004 G 22.3 0.65* 5.5 80? 898 0.58 0.67 0.4 1.6 1
J1109+1043 B1107+109 G 22.6 0.55* 5.2 60 1481 0.80 0.94 0.5 2.4 1
J1135−0021 4C−00.45 G 21.9 0.975 6.0 120 720 1268 0.76 0.78 0.4 2.9 1
J1203+0414 B1200+045 QSO 18.8 1.221 6.1 75 458 1146 0.85 0.45 0.4 1.4 1
J1352+0232 B1349+027 G 20.0 0.607 5.4 170 918 1145 0.78 0.58 0.4 2.0 1
J1352+1107 4C+11.46 G 21.0 0.891 5.9 50 295 1538 0.78 1.03 0.4 3.6 1
J1600−0037 B1557−004 G 50 1168 0.54 1.17 1.0 1.2 1
J1648+0242 4C+02.43 G 22.1 0.824 5.8 0.61 0.4 3.4 1
J2058+0540 4C+05.78 G 23.4 1.381 6.1 160 970 1213 0.65 0.95 0.4 3.1 1
J2123−0112 B2121−014 G 23.3 1.158 6.1 80 488 1087 0.64 -0.56 0.75 0.5 1.8 2
J2325−0344 B2322−040 G 23.5 1.509 6.0 75 450 1224 0.91 -0.42 0.75 1.4 1.3 2
J0917+1113 B0914+114 EF 190 800 0.31 -0.1 1.6 0.3 2.3 3
J1753+2750 B1751+278 G 21.7 0.86* 5.9 50 625 0.46 -0.27 0.57 1.4 0.6 2
J1826+2708 B1824+271 G 22.9 45 332 0.23 -0.39 0.75 1.0 0.4 2
J2325+7917 B2323+790 G 19.5V 32 1136 -0.3 0.75 1.4 1.2 2
component has some extension in the west-east direction. A
flux variability of (13± 6.9)% at 5 GHz in Table 2 may suggest
this is a core-jet source.
3.12. J1648+0242 (4C+02.43)
The GPS source is not detected with VLBI. It is an NVSS
double-lobe source, and totally resolved out in the VLBI ob-
servation.
3.13. J2058+0540 (4C+05.78)
The 1.6 GHz VLBI image (Fig. 12) is the first VLBI image of
the source. It shows a double-lobe source, and for the size of
970 pc, we suggest this is a CSO.
3.14. PKS B2121−014
The 1.6 GHz VLBI image (Fig. 13) shows a double-lobe struc-
ture, it is similar to that at 2.3 and 5 GHz (Xiang et al. 2005,
2006), except that a weak jet-like emission ‘B’, which appears
at 2.3 and 5 GHz, is missing, probably due to absorption at the
lower frequency 1.6 GHz. The source is a CSO for the source
size of 488 pc.
3.15. PKS B2322−040
The 1.6 GHz VLBI image (Fig. 14) exposes a central emission
region between the two lobes ‘A’ and ‘B’, which is probably a
core embedded in the central region. The ‘core’ emission is not
detected at higher frequencies (Xiang et al. 2005, 2006), but it
emerges at 1.6 GHz near the peak frequency (1.4 GHz) of the
GPS source. There is a flux increase of (4.0 ± 1.9)% over 15
years at 5 GHz (Table 2), may suggest that the core is currently
active. The source can be a CSO for its size of 450 pc.
3.16. PKS B0914+114
The 1.6 GHz VLBI image (Fig. 15) exhibits a core ‘A’, jet fea-
ture ‘B’ and two lobes ‘C’, ‘E’. The western one ‘E’ emerges at
this frequency. Labiano et al. (2007) have identified an empty
field (> 25 mR) at the FIRST position of the source, and con-
cluded that the previously identified nearby disk galaxy (a red-
shift of 0.178) is not the host of this radio source 0914+114.
For the typical compact symmetric structure, we consider the
source is a CSO.
4 X. Liu et al.: VLBI observations of nineteen GHz-Peaked-Spectrum radio sources at 1.6 GHz
Table 2. Source flux and possible variability, columns 2-4 are flux densities at 5.0 GHz (PKS90), 4.85 GHz (Gregory &
Condon 1991, and Griffith et al. 1995) and 4.85 GHz flux measured with the Urumqi 25m telescope on 2007/1/24 (J1648+0242,
J2058+0540, 1824+271 and 2121−014 were not well measured due to source confusion or weak); column 5 is a flux variability
computed from columns 3,4.
S ource S 5.0 S 4.85 S 4.85Ur δS 4.85
mJy mJy mJy %
J0210+0419 300 298±19 302 ± 10 1.3 ±3.1
J0323+0534 830 819±44 868 ± 9 6.0±4.6
J0433−0229 640 640±35 637 ± 14 -0.5±3.3
J0913+1454 300 315±43 297 ± 8 -5.7±10.3
J1057+0012 370 396±23 351 ± 6 -11.4±3.6
J1109+1043 400 408±56 428 ± 8 4.9±12.4
J1135−0021 440 446±25 427 ± 8 -4.3±3.6
J1203+0414 520 640±35 611 ± 7 -4.5 ±4.1
J1352+0232 470 469 ± 7
J1352+1107 410 447±62 418 ± 5 -6.5±11.9
J1600−0037 180 187±14 212 ± 3 13.4±6.9
J1648+0242 260 337±20
J2058+0540 340 356±21
2121−014 320 345±21
2322−040 500 524±29 545 ± 20 4.0 ±1.9
0914+114 140 134±19 140 ± 1 4.5 ±14.1
1751+278 298±39 292 ± 6 -2.0±10.8
1824+271 122±17
2323+790 491 ± 7
OQ208 2421±217 2514 ± 12 3.8±8.8
3.17. 1751+278 (MG2 J175301+2750)
The 1.6 GHz structure (Fig. 16) is similar to what we got be-
fore at 1.6 GHz (Xiang et al. 2002), and confirms that there
is jet-like emission ‘C’ and ‘D’ associated with the southern
component ‘B’, indicating this is a core-jet source.
3.18. B2 1824+271
The 1.6 GHz VLBI image (Fig. 17) exposes a symmetric dou-
ble structure and jet-like emission associated with the two
lobes, confirming this is a CSO as we have suggested (Xiang et
al. 2006).
3.19. [WB92] 2323+790
The 1.6 GHz image (Fig. 18) shows a central component ‘A’
and a weak one ‘B+C’ in the north-west, and the components
‘A’ and ‘B+C’ show steep spectra between 1.6 GHz and 5 GHz
(Xiang et al. 2006). The source can be a CSO candidate.
4. Discussion
In the sample (Table 1), J1648+0242 is an NVSS double source
and is not detected in this VLBI observation; all others are
point-like in the NVSS images, indicating that GPS sources
are compact. Except four sources (J1057+0012, J1352+1107,
J1600−0037 and 1751+278), 14 out of 18 sources exhibit dou-
ble or triple VLBI structure and can be CSOs or CSO can-
didates though some of them have no measured redshift. The
sources with redshift show double or triple structure with sizes
< 1 kpc, suggesting these GPS sources are certainly compact
and likely CSOs.
The mini double-lobe sources or CSOs could be more sta-
ble in flux density than other type of compact sources. We have
measured the flux densities for the sources (Table 2) at 4.85
GHz and compared with the values observed 15 years ago, we
found that 12 among 14 GPS sources are likely stable in flux
(1σ level), two sources (J1057+0012 and J1600−0037) show
about 10% variability in 3σ and 2σ level respectively. The flux
variability on J1057+0012 and J1600−0037 is consistent with
their core-jet classification. ‘Core-jet’ sources are defined to
show a one-sided jet, and the jet is often closely pointing to us
(from a pole-on AGN). It is hard to estimate the real source size
due to Doppler boosting, hence the ‘core-jet’ sources might not
be young radio sources even if they appear to be compact in
some cases.
In addition, some sources are resolved out in our VLBI
image by more than 10% of total flux estimated from
Table 1, probably due to diffuse emission associated with
lobes and tail/jet emission. They are J0210+0419 (-14%),
J0323+0534 (-38%), J0433−0229 (-18%), J1352+0232 (-
15%), J1135+1107 (-31%), J2058+0540 (-12%), 2322−040 (-
15%), and J1648+0242 is completely resolved out. The VLBI
flux densities of the other nine sources at 1.6 GHz are consis-
tent with the estimated total flux densities within an error of
X. Liu et al.: VLBI observations of nineteen GHz-Peaked-Spectrum radio sources at 1.6 GHz 5
Table 3. The component parameters of the VLBI images at 1.6 GHz. The columns give: (1) source name and possible classifica-
tion (CSOc: CSO candidate, cj: core-jet); (2) total cleaned flux density of image at 1.6 GHz; (3) component identification labled
to Xiang et al. 2002, 2005, 2006; (4),(5) peak and integral intensity of a fitted Gaussian component at 1.6 GHz in the AIPS task
JMFIT; (6),(7) major/minor axes and position angle of component at 1.6 GHz; (8),(9) distance and position angle relative to the
first component; (10) brightness temperature of component.
1 2 3 4 5 6 7 8 9 10
Name S vlbi Comp S p S int θ1 × θ2 PA d PA T b
class mJy mJy mJy mas×mas ◦ mas ◦ 108K◦
J0210+0419 715 A 290 375 5.7 × 2.9 169 0 11.9
CSOc B 118 205 10 × 3.6 175 68.2 ± 0.1 -153.2 ± 0.1 2.2
J0323+0534 1497 A 536 1270 32.6 × 11.1 70 0 0.5
CSO B 117 548 57.5 × 21.7 18 122.7 ± 2.9 -167.2 ± 0.4 0.1
J0433−0229 1095 A 407 1045 14.5 × 4.7 171 0 2.4
CSOc/cj B 49 111 9.9 × 5.5 88 53.0 ± 0.3 164.4 ± 0.2 0.3
J0913+1454 796 A 211 501 8.2 × 4.9 65 0 2.1
CSOc B 55 157 8.9 × 6.2 80 56.8 ± 0.1 73.3 ± 0.1 0.4
J1057+0012 810 A 381 550 3.6 × 2.7 6.6 0 17.5
cj B 37 58 6.2 × 2.1 9.7 9.7 ± 0.2 114.8 ± 0.9 1.3
J1109+1043 1370 A 420 984 5.7 × 4.7 110 0 6.6
CSOc B 143 311 5.4 × 4.2 91 46.3 ± 0.1 104.4 ± 0.1 2.6
J1135−0021 1025 A 279 454 6.6 × 2.7 142 0 4.9
CSO B 143 271 8.6 × 3.0 153 85.8 ± 0.1 164.4 ± 0.1 1.7
J1203+0414 1029 A 571 850 4.7 × 2.9 107 0 25.1
CSOc B 51 81 5.0 × 3.8 78 18.1 ± 0.2 103.0 ± 0.4 1.6
C 31 35 6 × 6 0 58.2 ± 0.2 104.2 ± 0.2 1.9
J1352+0232 885 A 173 480 7.4 × 4.5 53 0 2.1
CSO B 30 120 8.9 × 6.3 110 165.7 ± 0.3 -111.9 ± 0.1 0.2
J1352+1107 896 A 198 395 8.1 × 5.4 11.7 0 2.0
CSOc/cj B 106 202 6.8 × 6.3 10 3.5 ± 0.1 53.1 ± 0.1 1.1
J1600−0037 936 A 394 607 5.4 × 4.1 157 0
cj B 125 255 7.9 × 5.0 86 24.7 ± 0.1 83.5 ± 0.1
J2058+0540 914 A 356 513 7.6 × 3.7 163 0 8.2
CSO B 182 403 12.6 × 4.5 128 127.2 ± 0.1 172.1 ± 0.1 2.3
2121−014 976 A 363 594 5.1 × 3.9 123 0 10.4
CSO C 191 415 7.4 × 5.0 126 59.4 ± 0.1 85.4 ± 0.1 2.9
2322−040 965 A 229 509 14.6 × 3.3 161 0 3.6
CSO B 74 160 11.7 × 5.3 162 40.6 ± 0.2 171.3 ± 0.1 0.8
C 65 196 19.3 × 4.7 1 22.0 ± 0.4 171.1 ± 0.2 0.5
0914+114 578 A 37 50 4.9 × 2.7 16 0
CSO B 29 65 9.1 × 4.3 66 45.6 ± 0.1 80.1 ± 0.1
C 242 360 4.2 × 3.9 90 84.2 ± 0.1 81.6 ± 0.1
E 28 40 4.4 × 3.4 63 86.1 ± 0.1 -95.9 ± 0.1
1751+278 596 A 400 522 4.8 × 2.9 71 0 14.5
cj B 26 37 8 × 3 15 20.5 ± 0.1 -128.6 ± 0.2 0.5
C 12.4 21 7 × 5 59 26.6 ± 0.3 -119.8 ± 0.3 0.2
D 8 20 19.5 × 3.6 176 41.4 ± 0.4 -104.4 ± 0.8 0.1
1824+271 296 A 145 174 3.9 × 2.1 164 0
CSO B 42 69 6.1 × 4.3 137 21.8 ± 0.1 -83.4 ± 0.1
2323+790 900 A 439 621 5.8 × 2.1 159 0
CSOc B+C 107 155 7.9 × 3.1 118 19.2 ± 0.1 -71.1 ± 0.1
10% the estimated amplitude uncertainty of the EVN observa-
tions.
5. Summary and conclusion
1. We obtained total intensity 1.6 GHz VLBI images of 17
GPS sources for the first time. The majority (80%) show
mini-double-lobe radio structure, indicating that they are
CSOs or candidates, and their host AGNs could be edge-
on to us. This result suggests that there is a high incidence
of mini double-lobe sources and CSOs in the GPS source
sample.
6 X. Liu et al.: VLBI observations of nineteen GHz-Peaked-Spectrum radio sources at 1.6 GHz
2. The sources J0323+0534, J1135−0021, J1352+0232,
J2058+0540, 2121−014 and 2322−040 with measured red-
shift, are double-lobed with sizes of < 1 kpc, and are clas-
sified as CSOs.
3. Three sources (J1057+0012, J1600−0037 and 1751+278)
are classified as core-jet sources according to their mor-
phologies and flux variability.
4. The 1.6 GHz images of the sources 0914+114, 1824+271,
2121−014 and 2322−040, for which we had observations
at 2.3, 5.0 and 8.4 GHz, have provided information on their
source structure and spectra at the lower frequency, permit-
ting further spectral study in the future.
Acknowledgements. We thank the referee Alvaro Labiano, and
Nathan de Vries for comments. The European VLBI Network is a
joint facility of European, Chinese, South African and other radio as-
tronomy institutes funded by their national research councils. This re-
search has made use of the NASA/IPAC Extragalatic Database (NED)
which is operated by the Jet Propulsion Laboratory, Caltech, under
contract with NASA. This work was partly supported by the Natural
Science Foundation of China (NSFC).
References
de Vries N., Snellen I. A. G., Schilizzi R. T., Lehnert M. D., Bremer
M. N., 2007, A&A 464, 879
de Vries W. H., Barthel P. D., O’Dea C. P., 1997, A&A 321, 105
Dallacasa D., 2004, in Proceedings of the 7th EVN Symposium, eds:
Bachiller R., Colomer F., et al.
Fanti, C., Fanti, R., Dallacasa, D., Schilizzi, R. T., Spencer, R. E.,
Stanghellini, C. 1995, A&A 302, 317
Gregory P. C., Condon J. J., 1991, ApJS 75, 1011
Griffith M. R., Wright A. E., Burke B. F., Ekers R. D., 1995, ApJS 97,
Labiano A, Barthel P. D., O’Dea C. P., de Vries W. H., Perez I., Baum
S. A., 2007, A&A 463, 97L
Murgia M., 2003, PASA 20, 19
O’Dea C. P. 1998, PASP 110, 493
Orienti M., Dallacasa D., Stanghellini C., 2007, A&A 461, 923
Polatidis A. G., Conway J. E., 2003, PASA 20, 69
Owsianik I., Conway J. E., 1998, A&A 337, 69
Snellen I. A. G., Lehnert M. D., Bremer M. N., Schilizzi R. T., 2002,
MNRAS 337, 981
Snellen I. A. G., Schilizzi R. T., Miley G. K., de Bruyn A. G., Bremer
M. N., Röttgering H. J. A., 2000, MNRAS 319, 445
Tschager W., Schilizzi R. T., Röttgering H. J. A., Snellen I. A. G.,
Miley G. K., 2000, A&A 360, 887
Stanghellini C., O’Dea C. P., Dallacasa D., Cassaro P., Baum S. A.,
Fanti R., Fanti C., 2005, A&A 443, 891
Stanghellini C., O’Dea C. P., Dallacasa D., Baum S. A., Fanti R., Fanti
C., 1998, A&AS 131, 303
Tinti S.,Dallacasa D., de Zotti G., Celotti A., Stanghellini C., 2005,
A&A 432, 31
Xiang L., Stanghellini C., Dallacasa D., Haiyan Z., 2002, A&A 385,
Xiang L., Dallacasa D., Cassaro P., Jiang D., Reynolds C., 2005, A&A
434, 123
Xiang L., Reynolds C., Strom R. G., Dallacasa D., 2006, A&A 454,
X. Liu et al.: VLBI observations of nineteen GHz-Peaked-Spectrum radio sources at 1.6 GHz 7
J0210+0419
0 100 200
MilliARC SEC
60 40 20 0 -20 -40 -60 -80 -100
Fig. 1. J0210+0419 at 1.65 GHz, the restoring beam is 10.2 ×
7.5 mas with PA −75.5◦, the peak is 293 mJy/beam, the con-
tours are 12 mJy/beam times levels -1, 1, 2, 4, 8, 16, 32, 64,
100, 200, 400, 800, and the same levels are used in the follow-
ing images, the grey scale unit is mJy.
J0323+0534
MilliARC SEC
150 100 50 0 -50 -100 -150
Fig. 2. J0323+0534 at 1.65 GHz, the restoring beam is 21.3 ×
17.7 mas with PA −20.2◦, the peak is 586 mJy/beam, the first
contour is 50 mJy/beam.
J0433-0229
0 100 200 300 400
MilliARC SEC
100 80 60 40 20 0 -20 -40 -60 -80
Fig. 3. J0433−0229 at 1.65 GHz, the restoring beam is 8.0×6.3
mas with PA −1.7◦, the peak is 435 mJy/beam, the first contour
is 15 mJy/beam.
J0913+1454
0 50 100 150 200
MilliARC SEC
100 80 60 40 20 0 -20 -40
Fig. 4. J0913+1454 at 1.65 GHz, the restoring beam is 6.9×4.6
mas with PA 24.4◦, the peak is 217 mJy/beam, the first contour
is 6 mJy/beam.
8 X. Liu et al.: VLBI observations of nineteen GHz-Peaked-Spectrum radio sources at 1.6 GHz
J1057+0012
0 100 200 300
MilliARC SEC
100 80 60 40 20 0 -20 -40
Fig. 5. J1057+0012 at 1.65 GHz, the restoring beam is 7.3×3.3
mas with PA 13.4◦, the peak is 390 mJy/beam, the first contour
is 6 mJy/beam.
J1109+1043
0 100 200 300 400
MilliARC SEC
100 80 60 40 20 0 -20 -40 -60
Fig. 6. J1109+1043 at 1.65 GHz, the restoring beam is 7.9×3.3
mas with PA 16.5◦, the peak is 434 mJy/beam, the first contour
is 10 mJy/beam.
J1135-0021
0 100 200
MilliARC SEC
100 80 60 40 20 0 -20 -40 -60 -80
Fig. 7. J1135−0021 at 1.65 GHz, the restoring beam is 7.8×5.3
mas with PA 20.4◦, the peak is 281 mJy/beam, the first contour
is 8 mJy/beam.
J1203+0414
0 200 400
MilliARC SEC
100 80 60 40 20 0 -20 -40 -60
Fig. 8. J1203+0414 at 1.65 GHz, the restoring beam is 7.0×4.9
mas with PA 15.2◦, the peak is 575 mJy/beam, the first contour
is 10 mJy/beam.
X. Liu et al.: VLBI observations of nineteen GHz-Peaked-Spectrum radio sources at 1.6 GHz 9
J1352+0232
MilliARC SEC
50 0 -50 -100 -150 -200
Fig. 9. J1352+0232 at 1.65 GHz, the restoring beam is 6.7×3.2
mas with PA 9.9◦, the peak is 183 mJy/beam, the first contour
is 8 mJy/beam.
J1352+1107
MilliARC SEC
100 50 0 -50 -100
Fig. 10. J1352+1107 at 1.65 GHz, the restoring beam is 14.0×
12.5 mas with PA −75.3◦, the peak is 379 mJy/beam, the first
contour is 20 mJy/beam.
J1600-0037
0 100 200 300 400
MilliARC SEC
80 60 40 20 0 -20 -40 -60
Fig. 11. J1600−0037 at 1.65 GHz, the restoring beam is 7.4 ×
5.5 mas with PA −39.8◦, the peak is 400 mJy/beam, the first
contour is 8 mJy/beam.
J2058+0540
MilliARC SEC
150 100 50 0 -50 -100 -150
Fig. 12. J2058+0540 at 1.65 GHz, the restoring beam is 10.6×
6.5 mas with PA 0.7◦, the peak is 362 mJy/beam, the first con-
tour is 10 mJy/beam.
10 X. Liu et al.: VLBI observations of nineteen GHz-Peaked-Spectrum radio sources at 1.6 GHz
B2121-014
0 100 200 300
MilliARC SEC
100 80 60 40 20 0 -20 -40 -60
Fig. 13. 2121−014 at 1.65 GHz, the restoring beam is 5.9× 5.5
mas with PA −1.4◦, the peak is 370 mJy/beam, the first contour
is 15 mJy/beam.
2322-040
MilliARC SEC
80 60 40 20 0 -20 -40 -60 -80
Fig. 14. 2322−040 at 1.65 GHz, the restoring beam is 8.6× 6.8
mas with PA −2.1◦, the peak is 233 mJy/beam, the first contour
is 10 mJy/beam.
0914+114
MilliARC SEC
50 0 -50 -100 -150 -200
Fig. 15. 0914+114 at 1.65 GHz, the restoring beam is 8.3× 4.7
mas with PA 16.8◦, the peak is 246 mJy/beam, the first contour
is 1 mJy/beam.
MilliARC SEC
80 60 40 20 0 -20 -40 -60 -80
1751+278
Fig. 16. 1751+278 at 1.65 GHz, the restoring beam is 10.7×6.0
mas with PA 7.0◦, the peak is 406 mJy/beam, the first contour
is 3 mJy/beam.
X. Liu et al.: VLBI observations of nineteen GHz-Peaked-Spectrum radio sources at 1.6 GHz 11
1824+271
MilliARC SEC
60 40 20 0 -20 -40 -60 -80
Fig. 17. 1824+271 at 1.65 GHz, the restoring beam is 9.0× 5.1
mas with PA 8.1◦, the peak is 146 mJy/beam, the first contour
is 1 mJy/beam.
2323+790
MilliARC SEC
60 40 20 0 -20 -40 -60 -80
Fig. 18. 2323+790 at 1.65 GHz, the restoring beam is 11.6×5.4
mas with PA −82◦, the peak is 438 mJy/beam, the first contour
is 10 mJy/beam.
	Introduction
	Observations and data reduction
	Results and comments on individual sources
	J0210+0419 (PKS B0208+040)
	J0323+0534 (4C+05.14)
	J0433-0229 (4C-02.17)
	J0913+1454 (PKS B0910+151)
	J1057+0012 (PKS B1054+004)
	J1109+1043 (PKS B1107+109)
	J1135-0021 (4C-00.45)
	J1203+0414 (PKS B1200+045)
	J1352+0232 (PKS B1349+027)
	J1352+1107 (4C+11.46)
	J1600-0037 (PKS B1557-004)
	J1648+0242 (4C+02.43)
	J2058+0540 (4C+05.78)
	PKS B2121-014
	PKS B2322-040
	PKS B0914+114
	1751+278 (MG2 J175301+2750)
	B2 1824+271 
	[WB92] 2323+790
	Discussion
	Summary and conclusion
ABSTRACT
  Aims and Methods: We present the results of VLBI observations of nineteen
GHz-Peaked-Spectrum (GPS) radio sources at 1.6 GHz. Of them, 15 sources are
selected from the Parkes Half Jansky (PHJ) sample (Snellen 2002), 4 others are
from our previous observation list. We aimed at imaging the structure of GPS
sources, searching for Compact Symmetric Objects (CSOs) and studying the
absorption for the convex radio spectra of GPS sources.
  Results: We obtained total intensity 1.6 GHz VLBI images of 17 sources for
the first time. Of them, 80% show mini-double-lobe radio structure, indicating
that they are CSOs or candidates, and their host AGNs could be edge-on to us.
This result suggests that there is a high incidence of mini double-lobe sources
(or CSOs) in the PHJ sample. The sources J0323+0534, J1135-0021, J1352+0232,
J2058+0540, J2123-0112 and J2325-0344 with measured redshift, showing
double-lobe structure with sizes of <1 kpc, are classified as CSOs. Three
sources J1057+0012, J1600-0037 and J1753+2750 are considered as core-jet
sources according to their morphologies and flux variability.

<|endoftext|><|startoftext|>
7 Moment switching in nanotube magnetic force
probes
John R Kirtley1,2,3, Zhifeng Deng4, Lan Luan4, Erhan
Yenilmez1, Hongjie Dai5, and Kathryn A Moler1,4
1 Department of Applied Physics and Geballe Laboratory for Advanced Materials,
Stanford University, Stanford, California 94305 USA
E-mail: jkirtley@stanford.edu
2 IBM Watson Research Center, Route 134 Yorktown Heights, NY 10598 USA
3 Faculty of Science and Technology and MESA+ Institute for Nanotechnology,
University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands
4 Department of Physics and Geballe Laboratory for Advanced Materials, Stanford
University, Stanford, California 94305 USA
5 Department of Chemistry, Stanford University, Stanford, California 94305 USA
Abstract.
A recent advance in improving the spatial resolution of magnetic force microscopy
(MFM) uses as sensor tips carbon nanotubes grown at the apex of conventional silicon
cantilever pyramids and coated with a thin ferromagnetic layer [1]. Magnetic images of
high density vertically recorded media using these tips exhibit a doubling of the spatial
frequency under some conditions [1]. Here we demonstrate that this spatial frequency
doubling is due to the switching of the moment direction of the nanotube tip. This
results in a signal which is proportional to the absolute value of the signal normally
observed in MFM. Our modeling indicates that a significant fraction of the tip volume
is involved in the observed switching, and that it should be possible to image very high
bit densities with nanotube magnetic force sensors.
PACS numbers: 75.75.+a,78.67.Ch
Spatial period doubling has been observed for several carbon nanotube tips with
different track widths and bit densities. The MFM images reported here were made
using the nanotube tip shown in the inset of Figure 1. It was approximately 250nm long,
with a ferromagnetic coating to a total tip diameter of 16nm. The means of producing
metal-coated carbon nanotube tips on AFM cantilevers and the techniques for imaging
magnetic media using these tips have been described previously [1, 2]. Briefly, carbon
nanotubes were grown using wafer-scale chemical vapor deposition at the apexes of the
pyramids of commercial silicon tips intended for tapping mode atomic force microscopy.
For the present measurements the nanotubes were shortened to a length of about 250
nm using an electrical cutting method [3], aligned approximately perpendicular to the
cantilever using a focused ion beam [2], and then coated to a total thickness of about
16 nm with a Ti/Co/Ti trilayer by e-beam evaporation from a direction parallel to the
http://arxiv.org/abs/0704.0311v1
Moment switching in nanotube magnetic force probes 2
nanotube long axis (see the inset in Figure 1). The magnetic imaging was done using the
Tapping/LiftTM mode of a Digital Instruments Nanoscope III SPM at room temperature
in air. In this mode, the topography of the sample is first determined for each line with
a scan at low tip-sample spacing z0, then the tip is retracted a specified distance and
a second line scan is made while recording the deviation of the phase angle δ of the
cantilever response with the cantilever driven slightly below its resonance frequency. δ
is proportional to dFz/dz0, the derivative of the force on the cantilever with respect to
The images presented here were made on a vertically polarized magnetic medium.
A collection of phase shift images at 300 kilo-flux changes per inch (kfci) and different
tip-sample spacings (z0’s) are shown in Figure 2(a), and at constant lift height (15nm)
and different bit densities in Figure 2(b). Cross-sections of the data through the center
of the tracks are displayed in Figure 2(c,d). Modeling of this data as described below
is shown in Figure 2(e,f). At high values of z0 and high bit densities the phase images
have the same period as written, but at low z0’s and low bit densities anomalies in
the images gradually develop into sharp features with double the original periodicity.
These anomalies are due to switching of the orientation of the magnetic moment of
the tip. This can be demonstrated most convincingly by inspecting the images and
cross-sections of Figure 2(b,d). In this case a background from the sections of the image
without written bits has been subtracted out, and it can be seen that at low bit densities
the phase shift always stays below the average background level, keeping the tip-sample
force attractive by switching of the tip moment direction just as the force derivative
crosses zero.
This conclusion is supported by detailed modeling: We assume that the magnetic
medium is composed of slabs of length s in the x direction, height h in the z
direction, and width W in the y direction (Figure 1(a)). The slabs have a uniform
magnetization with moment direction alternating between parallel and anti-parallel to
the z-axis direction. The tip is assumed to have a square cross-section with width w
and length L, with the tip end a distance z0 from the upper surface of the medium.
The magnetic fields above the sample (displayed as field lines in Figure 1(b)) were
calculated both analytically and numerically. In our numerical modeling the vector
between the individual medium dipole moments [xm, ym, zm] and a position [x, y, z] is
~r = (x− xm)x̂+ (y − ym)ŷ + (z − zm)ẑ. Then the z and x-component of the field from
the individual dipoles is given by
Bz(~r) =
3(z − zm)2
Bx(~r) =
3µ0mm
(x− xm)(z − zm), (1)
with r = |~r|. By is given by setting x → y and xm → ym in the second equation.
If we assume that the tip can also be represented by a sum of point dipoles ~mt = mtn̂,
where n̂ is a unit vector in the tip moment direction, the force gradient on the tip is
Moment switching in nanotube magnetic force probes 3
Medium
z0+L1
Figure 1. (a) Tip and sample geometry in model. The inset is a scanning electron
microscope image of the cobalt-coated nanotube tip. (b) Calculated field lines for a
300 kfci track above the center of the domains in the magnetic medium, using the
parameters described in the text. The arrows indicate calculated moment orientations
for a series of tip-sample spacings of 15, 22.5, 28, 33, 39, and 45 nm using the model and
parameters described in the text. At small tip-sample spacings the tip moment flips
to be always anti-aligned with the moment directly below the tip. At large tip-sample
spacings the tip moments are only slightly perturbed by the medium fields.
given by
mt cos(φ)
d2Bz(x, y, z)
tip,medium
3µ0mmmt cos(φ)
− 30(z − zm)
35(z − zm)4
where rtm is the distance between the individual dipole elements in the tip and medium,
and φ is the angle between the tip moment direction and the z-axis.
Moment switching in nanotube magnetic force probes 4
Figure 2. (a) Collection of phase shift images for a 300 kfci track of alternating
vertically recorded media moments at different lift heights. Each line of the image has
an offset such that the average phase shift value for that line is zero. (b) Image of a
section of vertically recorded media with a series of tracks with different bit densities,
at a lift height of 15 nm. A background has been subtracted from this image such
that the regions between the tracks have zero averaged phase shifts. (c) Cross-sections
through the centers of the tracks in (a). (d) Cross-sections through the centers of the
tracks in (b). The zero phase shift levels for each cross-section are indicated by dashed
lines. (e) Modeling of (c) as described in the text. The lines in (c) and (e) have been
offset vertically for clarity. (f) Modeling of the tracks in (d) as described in the text,
with the zero phase shift levels indicated by dashed lines.
Moment switching in nanotube magnetic force probes 5
For positions near the center of the tracks we obtained analytical expressions for
the magnetic fields by assuming that the magnetically oriented slabs have infinite width
W in the y direction. If we take the boundary condition
Hz(~r)|z=0+ −Hz(~r)|z=0− = Hz(~r)|z=−h− −Hz(~r)|z=−h+ = σ(~r), (3)
where σ(~r) is the surface magnetic charge, the magnetic field above the sample can be
written as [4]
Hz(~r, z) =
d2~kA
−i~k·~r, (4)
where
H,z = e
−kz(1− e−kh)
d2~r σ(~r)ei
~k·~r. (5)
Taking the surface magnetic charge σ(~r) to be uniform in y,
σ(~r) = M 2ns < x < (2n+ 1)s
= −M (2n− 1)s < x < 2ns, (6)
We find
tan−1
(2n+ 1)s+ x
+ tan−1
(2n− 1)s+ x
− 2 tan−1 z
2ns+ x
− tan−1 z + h
(2n+ 1)s+ x
− tan−1 z + h
(2n− 1)s+ x
+ 2 tan−1
z + h
2ns+ x
Since Hz = dΦ/dz, Φ a scalar potential, the x-component of the field can be written as
Hx = dΦ/dx, which leads to
Hx = −
((2n+ 1)s+ x)2
+ log
((2n− 1)s+ x)2
−2 log
(2ns+ x)2
− log
(z + h)2
((2n+ 1)s+ x)2
− log
(z + h)2
((2n− 1)s+ x)2
+ 2 log
(z + h)2
(2ns+ x)2
Hy is zero by symmetry. Figure 3 displays the calculated fields in the z and x directions
(a,c), the tip moment orientation angle φ (b), and the switching fields < Hz >c and
< Hx >c (c) for the best fit to the data as described below. It is interesting to note
that the discontinuous switches of φ are controlled predominantly by the size of the x
component of the field. This can be understood by examining the trajectory in field
that the tip takes in moving from one domain to the next. The ovals in Figure 3c are the
calculated trajectories of < Hz > vs. < Hx > for the tip heights listed in the caption.
The “asteroid” is the critical field calculated for the energy functional form of Eq. 1
of the main text and has the form first predicted by Stoner and Wohlfarth[5]. The tip
moment direction is predicted to switch when the field trajectories cross the critical field
asteroid (solid dots in Figure 3c). This happens for relatively large values of | < Hx > |
and small values of | < Hz > |. When the field trajectory crosses the Stoner-Wohlfarth
Moment switching in nanotube magnetic force probes 6
0 1 2 3 4
15 nm
45 nm
0 1 2 3 4
15 nm 30 nm
33 nm
45 nm
–2 –1 0 1 2
<Hx>Mt/K1
Figure 3. (color online) (a) Calculated magnetic fields, averaged over the tip volume
and multiplied by the tip saturation magnetization Mt divided by the anisotropy
parameter K1, above a vertically recorded magnetic medium with infinite extent in
the y-direction, 12 nm thick in the z-direction, with magnetization oriented in the
z-direction and alternating in the x-direction with period 2s = 185 nm, for spacings
between the bottom of the tip and the medium of z0 = 15, 22.5, 26, 28, 30, 33, 39, and
45 nm. The tip has an assumed square cross-section 16nm on a side, and the averaging
is over a tip length L1 = 48nm. The calculated magnetic fields < Hz > normal to
the magnetic medium are offset by 3 units. The best fit values MtMm/K1 = 17.5,
σw/µ0L1K1 = 0 (see main text Figure 3) were used for these calculations. (b)
Calculated variation of the tip moment orientation angle φ relative to the z axis.
(c) The ovals show the calculated trajectories of < Hz > Mt/K1 vs < Hx > Mt/K1
for the various tip heights. The “asteroid” plots the values of the critical fields. The
intersections of these two sets of curves, indicated by solid symbols, are the fields at
which switching occurs in the simulations.
Moment switching in nanotube magnetic force probes 7
asteroid at large values of | < Hz > | the tip has already switched to the low energy
configuration.
For numerical work the medium was assumed to be composed of a collection of
individual dipole moments ~m = mmẑ, where mm = ±Mmv, with Mm the saturation
magnetization and v the volume of the individual medium elements. In what follows
we take both the tip and medium volume elements to be cubes 4 nm on a side. This
results in agreement to within a few percent between our numerical work and analytical
expressions for the medium magnetic fields. Halving the size of the volume elements (to
cubes 2 nm on a side) changes the calculated force derivative curve for z0 = 30 nm (see
Figure 2c) by about 2%.
To model the dynamics of the tip flip process, we conceptually divide the tip into
two domains, one with length L1 close to the medium, the other with length L − L1
further away. Each has sufficiently strong exchange fields that the entire volume within
each domain has the same moment orientation[6]. The section of the tip furthest from
the medium is assumed to have its moment parallel to the z axis; that closest to the
medium has its moment at an angle φ relative to the z-axis (Figure 1). Then the energy
of the tip in an external magnetic field can be written as:
E(φ) = µ0w
2L1K1
µ0L1K1
(1− cos φ) + sin2 φ
(< Hz > cos φ+ < Hx > sin φ)
where we have taken the simplest non-trivial forms for the domain wall energy (first
term) and the anisotropy energy (second term)[5, 7]. The third term in Eq. 9 is the
energy of the dipole moments of the tip in the external magnetic field. Here µ0K1 is
the anisotropy energy density, σw is the domain wall energy per unit area, Mt is the tip
saturation magnetization, and < Hz > and < Hx > are the magnetic fields in the z and
x directions respectively averaged over the tip volume from z = z0 to z = z0 + L1. To
simulate the magnetic force images, the tip moment is at first assumed to be parallel
to the z-axis. The tip is moved to a new position, the local fields are calculated and
averaged over the tip volume, φ is moved to the new local minimum in energy (Eq.
9), the force gradient is calculated, and the process is repeated. This modeling results
in the cross-sections displayed in Figure 2(c,f), which reproduce the absence of tip
switching at high bit densities and high lift heights, and the presence of tip switching
at low bit densities and low lift heights. When tip switching occurs, the modeling also
reproduces the fact that the tip-sample force gradient always stays negative, with the
tip moment reversing as the z-component of the field crosses zero. The quantitative
interpretation of MFM images in the presence of tip switching is straightforward once it
is recognized that the phase shift δ is proportional to the negative of the absolute value
of the tip sample force gradient (-|dFz/dz0|). The experimental phase shift oscillation
amplitudes decrease much more rapidly than the modeling for bit densities above about
500 kfci (Figure 2f). We believe that this is because the as written bits do not have
as abrupt moment orientation reversals as our idealized model. Our modeling indicates
Moment switching in nanotube magnetic force probes 8
that nanotube tips with the geometry of Figure 1 could be used to image bits with sharp
moment direction transitions with densities above 2000 kfci (13 nm/flux reversal).
Figure 4 compares the maximum minus the minimum value for δ along a cross-
section through the center of the bits at a bit density of 300 kfci as a function of
tip height z0. The modeling results in this Figure are labeled by the length L1 of
tip that is allowed to reorient its magnetic moment. There are three parameters
in this analysis - a global multiplicative factor, MtMm/K1, and the reduced domain
wall energy σw/µ0L1K1. Figure 4b plots the best fit values for MtMm/K1 and
(∆δexperimental−∆δmodel)2/(N−1) (N the number of data points). In all cases the
best fit value for σw/µ0L1K1 is 0, and the χ
2 value at σw/µ0L1K1 = 1 is approximately
double that when σw/µ0L1K1 = 0. Increasing the domain wall energy requires larger
switching fields: the best fit value at L1= 64 nm for MtMm/K1 increases from 20.8 to
28.6 when σw/µ0L1K1 increases from 0 to 1. The domain wall energy for Co is reported
to be σw = 25 ± 3J/m2[8]. This leads to σw/µ0L1K1 = 0.85, using L1 = 48nm and
K1=0.25 M
t (neglecting crystalline anisotropy), with Mt = 1.4× 106A/m, so that our
fits are consistent with the calculated wall energy, if one allows for a doubling of the best
χ2 value. The tip end may be magnetically poorly coupled to the rest of the tip because
of an inhomogeneity or grain boundary: it appears (Figure 1 inset) to have granularity
on the scale of a few tens of nm and a kink about 50 nm from its end. We have observed
qualitatively similar spatial frequency doubling using several nanotube tips, particularly
in the smallest diameter tips, where inhomogeneities and weak magnetic coupling are
fundamentally more difficult to avoid.
The shift ∆ω in the resonance frequency ω0 of the cantilever due to a force gradient
between the tip and sample dFz/dz0 is given by ∆(ω)/ω0 = −(dFz/dz0)/2k, where k
is the spring constant of the cantilever. The phase shift δ of the cantilever response is
then given by
tan δ =
/ω − ω/ω′
, (10)
where Q is the quality factor, ω is the driving frequency and ω
is the perturbed
resonance frequency of the cantilever. At z0 = 15nm our model predicts an excursion of
∆(dF/dz)/µ0MtMm ≈ 2 × 10−9m. Using a driving frequency at optimal sensitivity
ω = ω
(1 − 1/
8Q), k = 2.8N/m, Q=350, ω0 = 168kHz, and estimating Mt =
1.4 × 106A/m [9] and Mm = 2 × 105A/m [10], this corresponds to an excursion
in the phase shift ∆δ = 0.7o, in reasonable agreement with the experimental value
of ∆δ = 0.4o given the uncertainties in the values for the saturation magnetizations.
Using Mt| < Hx >c |/K1 <∼ 2 [5], the best fit value MtMm/K1 ∼ 17 (Figure 4)
implies a critical field of approximately 2.4×104 A/m, much smaller than the saturation
magnetization of cobalt of 1.4×106 A/m, but comparable to a switching field of 3.2×104
A/m reported for 30 nm thick, 0.34 µm wide, 2.04 µm long ellipsoidal amorphous
cobalt nanodots [11]. This reduction in switching field could result from competition
between the crystalline and shape anisotropies [9] if, for example, the uniaxial crystalline
anisotropy favors moment alignment along the tip radial direction, while the shape
Moment switching in nanotube magnetic force probes 9
10 20 30 40 50
z0 (nm)
Experiment
L1= 0
L1= 4 nm
L1= 16 nm
L1= 32 nm
L1= 48 nm
L1= 256 nm
0 100 200 300
L1 (nm)
0 100 200 300
0.0000
0.0025
0.0050
0.0075
0.0100
Figure 4. (color online) (a) Full-scale variation in phase angle along cross-sections
through the centers of the recorded tracks in Figure 2b. The + symbols represent
experiment. The other symbols represent modeling as described in the text. The
modeling curves are labeled by the length L1 of tip that switches magnetic moment
orientation. (b) Best fit values for MtMm/K1, and χ
(∆δexp −∆δmod)2/(N − 1)
as a function of the tip switching length L1, with σw/µ0L1K1 = 0.
anisotropy favors the axial direction. The tip material could also have a complicated
structure incorporating grain and domain boundaries, reducing the anisotropy energy.
The highly non-uniform fields in our case could also play a role in reducing the switching
field.
Although our analysis has been presented using a specific model for the tip
dynamics, the conclusion that the tip moment flips at relatively small fields can be
presented simply: The magnetic field at the tip must be less than the magnetic field
at the medium surface, which is given by the saturation magnetization of the medium.
The fields required to flip the tip are expected, using for example the Stoner-Wohlfarth
model [5], to be about the saturation magnetization of the tip, which is for epitaxial
cobalt much larger than the saturation magnetization of the medium. The switching
fields of our nanotube, just as for amorphous nanodots, are much smaller than those
of epitaxial Co nanodots [7, 9], which are comparable to the saturation magnetization
Moment switching in nanotube magnetic force probes 10
of cobalt. It might be possible to avoid switching and the attendant spatial frequency
doubling by developing processes to epitaxially coat the nanotube or by using single-
crystal nanorods. However, such tips would also generate larger local magnetic fields
at the sample, increasing the possibility of changing the magnetic state of the sample,
particularly for the smallest samples. Our results show that reliable information on the
moment orientations of the media can be inferred even in the presence of tip switching
if it is realised that the MFM signal is proportional to the absolute magnitude of the
tip-sample force gradient.
Acknowledgments
We would like to thank Dennis Adderton of First Nano and Dr. Steve Minne of Veeco
Instruments for the AFM probes, and Dr. David Guarisco of Maxtor Corporation for
the recorded disks. This work was supported by the Center for Probing the Nanoscale
(CPN), an NSF NSEC, NSF Grant No. PHY-0425897, by NSF Grant No. DMR
0103548, by the Dutch Foundation for Research on Matter (FOM), the Netherlands
Organization for Scientific Research (NWO), and the Dutch STW NanoNed program.
References
[1] Deng Z, Yenilmez E, Leu J, Hoffman J E, Straver E W J, Dai H, and Moler K A 2004 Appl. Phys.
Lett. 85 6263
[2] Deng Z, Yenilmez E, Reilein A, Leu J, Dai H and Moler K A 2006 Appl. Phys. Lett. 88 023119
[3] Yenilmez E, Wang Q, Chen R J, Wang D W, and Dai H J 2002 Appl. Phys. Lett. 80 2225
[4] Steifel B, “Magnetic Force Microscopy at Low Temperatures and in Ultra High Vacuum -
Application on High Temperature Superconductors”, Inauguraldissertation, University of Basel,
1998.
[5] Stoner E C and Wohlfarth E P 1948 Phil. Trans. Royal Soc. London A 240 599
[6] Kittel C, 1946 Physical Review 70 965
[7] Bonet E, Wernsdorfer W, Barbara B, Benoit A, Mailly D and Thiaville A, 1999 Phys. Rev. Lett.
83 4188
[8] Hehn M, Padovani S, Ounadjela K and Bucher J P, 1996 Phys. Rev. B 54 3428
[9] Otani Y, Kohda T, Novosad V, Fukamichi K, Yuasa S and Katayama T, 2000 J. Appl. Phys. 87
[10] Ross C A 2001 Annu. Rev. Mater. Res. 203
[11] Johnson J A, Grimsditch M, Metlushko V, Vavassori P, Illic B, Neuzil P and Kumar R 2000 Appl.
Phys. Lett. 77 4410
ABSTRACT
  A recent advance in improving the spatial resolution of magnetic force
microscopy (MFM) uses as sensor tips carbon nanotubes grown at the apex of
conventional silicon cantilever pyramids and coated with a thin ferromagnetic
layer. Magnetic images of high density vertically recorded media using these
tips exhibit a doubling of the spatial frequency under some conditions. Here we
demonstrate that this spatial frequency doubling is due to the switching of the
moment direction of the nanotube tip. This results in a signal which is
proportional to the absolute value of the signal normally observed in MFM. Our
modeling indicates that a significant fraction of the tip volume is involved in
the observed switching, and that it should be possible to image very high bit
densities with nanotube magnetic force sensors.

<|endoftext|><|startoftext|>
Introduction
	Dark Energy and Cosmic Structure
	Simulation details
	The Consequences of Distance Matching
	Results
	Evolving Dark Energy and Structure Growth
	Conclusion
ABSTRACT
  For dynamical dark energy cosmologies we carry out a series of N-body
gravitational simulations, achieving percent level accuracy in the relative
mass power spectra at any redshift. Such accuracy in the power spectrum is
necessary for next generation cosmological mass probes. Our matching procedure
reproduces the CMB distance to last scattering and delivers subpercent level
power spectra at z=0 and z~3. We discuss the physical implications for probing
dark energy with surveys of large scale structure.

<|endoftext|><|startoftext|>
arXiv:0704.0313v1  [cond-mat.str-el]  3 Apr 2007
Typeset with jpsj2.cls <ver.1.2> Letter
Possibility of Gapless Spin Liquid State by One-dimensionalization
Yuta Hayashi∗ and Masao Ogata
Department of Physics, University of Tokyo, Hongo, Bunkyo-ku, Tokyo, 113-0033
Motivated by the observation of a gapless spin liquid state in κ-(BEDT-TTF)2Cu2(CN)3, we
analyze the anisotropic triangular lattice S = 1/2 Heisenberg model with the resonating valence
bond mean-field approximation. Paying attention to the small quasi-one-dimensional anisotropy
of the material, we take an approach from one-dimensional (1D) chains coupled with frustrating
zig-zag bonds. By calculating one-particle excitation spectra changing anisotropy parameter
J ′/J from the decoupled 1D chains to the isotropic triangular lattice, we find almost gapless
excitations in the wide range from the 1D limit. This one-dimensionalization by frustration is
considered to be a candidate for the mechanism of the gapless spin liquid state.
KEYWORDS: gapless spin liquid, κ-(BEDT-TTF)2Cu2(CN)3, anisotropic triangular lattice, frustration,
one-dimensionalization
Organic conductors are one of the fascinating materi-
als which have low-dimensionality and relatively strong
electron correlations. So far, various physical states
have been observed and investigated intensively.1 Among
them, magnetism in the Mott insulating phase next to
the unconventional superconductivity has been attract-
ing considerable attention. This phase is observed in the
family of κ-(BEDT-TTF)2X, where BEDT-TTF (ET)
denotes bis(ethylenedithio)-tetrathiafulvalene and X rep-
resents a monovalent anion. Similarities to that of high-
Tc cuprates are worthy of note. Another stimulating
problem concerning magnetism is ground state proper-
ties of geometrically frustrated spin systems such as a tri-
angular lattice and a Kagomé lattice. These two intrigu-
ing issues meet in a material κ-(ET)2Cu2(CN)3, which
is a Mott insulator having a nearly isotropic triangular
lattice, and it has been in the spotlight of late.
According to 1H NMR measurements at ambient pres-
sure,2 κ-(ET)2Cu2(CN)3 shows no indication of long-
range magnetic order (LRMO) down to 32mK. This
is 4 orders of magnitude below the exchange constant
J ∼ 250K estimated from the temperature dependence
of susceptibility. Recently, a similar result has been ob-
tained by zero-field muon spin relaxation measurements,
which have observed no LRMO down to 20mK.3 These
results suggest that a quantum spin liquid state is real-
ized in the ground state. On the other hand, the static
susceptibility remains finite down to 1.9K, and spin-
lattice relaxation rate 1/T1 shows power-law temperature
dependence below 1K. These imply that almost gapless
spin excitation exists. This fact is a significant feature of
the spin liquid phase observed in this material.
Since Anderson’s proposal of a resonating valence
bond (RVB) state,5 enormous number of studies have
been made on the triangular lattice spin system. It is
now a general view that the ground state of the isotropic
triangular lattice Heisenberg model has LRMO, such as
the 120◦ structure.6–9 On the other hand, if one ne-
glects the LRMO and assumes a disordered ground state,
the mean-field theory of RVB state gives a spin-gap
∗E-mail address: yhayashi@hosi.phys.s.u-tokyo.ac.jp
Table I. Anisotropy of effective transfer integrals in κ-(ET)2X.
The definition of t and t′ are not as usual (see the text).
Anion X t′/t
Cu2(CN)3 0.94
Cu(NCS)2 1.19
Cu[N(CN)2]Br 1.33
Cu[N(CN)2]Cl 1.47
Cu(CN)[N(CN)2] 1.47
Ag(CN)2·H2O 1.67
I3 1.72
state with dx2−y2+idxy-wave symmetry, which is called
“d+id state”.10–12 This RVB state, describing an insu-
lating spin system, corresponds to a projected BCS state
at half-filling in which doubly occupied states are ex-
cluded. Thus, the existing theories show that the ground
state has LRMO in general, and if the magnetic order
is destroyed in some reason, the d+id fullgap state will
appear. If we regard the Mott insulating phase of κ-
(ET)2Cu2(CN)3 in low temperatures as an isotropic tri-
angular lattice spin system, the results of NMR and sus-
ceptibility measurements, which suggest neither LRMO
nor spin gap, cannot be explained.
In this letter, we pay attention to small anisotropy of
κ-(ET)2Cu2(CN)3 and propose a new possibility for un-
derstanding its gapless spin liquid state. As shown in Ta-
ble I, only κ-(ET)2Cu2(CN)3 has an opposite anisotropy
among the family of κ-(ET)2X studied in the past. Here,
the effective transfer integrals t and t′ are defined in-
versely to the conventional way; t = 0 corresponds to
the square lattice, and t′ = 0 the decoupled chains.
Therefore, κ-(ET)2Cu2(CN)3 has quasi-one-dimensional
(Q1D) anisotropy rather than an isotropic triangular lat-
tice. Considering that the pure 1D spin system has no
LRMO and gapless spin excitation, it is likely that this
Q1D anisotropy is concerned with the formation of the
gapless spin liquid state in κ-(ET)2Cu2(CN)3.
Based on the above consideration, we study the
Heisenberg model on an anisotropic triangular lattice,
which is equivalent to 1D chains coupled with zig-zag
http://arxiv.org/abs/0704.0313v1
2 J. Phys. Soc. Jpn. Letter Author Name
bonds as shown in Fig. 1. The Hamiltonian is given by
<i,i′>
JSi · Si′ +
<i,j>
J ′Si · Sj , (1)
where <i, i′> and <i, j> represent the summation over
intrachain and interchain nearest-neighbor pairs with an-
tiferromagnetic coupling constant J and J ′, respectively
(see Fig. 1). We investigate the anisotropy parameter
range J ′/J = 0.0-1.0, in which the model interpolates
between the decoupled chains (J ′ = 0) and the isotropic
triangular lattice (J ′ = J).
In the following, we consider a projected BCS state
defined as
∣p-BCS
, (2)
where PG is the Gutzwiller projection operator which ex-
cludes double occupancy and
is a BCS mean-field
wave function. Since it is difficult to treat the Gutzwiller
projection analytically, we apply an RVB mean-field ap-
proximation to the Hamiltonian (1) and calculate the
one-particle excitation spectra. To put it more con-
cretely, we introduce mean fields ∆ij ≡
ci↑cj↓
, ξij ≡
and obtain its excitation spectrum by
diagonalizing the mean-field Hamiltonian. This approxi-
mation is equivalent to the “Gutzwiller approximation”
which replaces the effect of the Gutzwiller projection op-
erator with the statistical weight gs as
p-BCS
∣Si ·Sj
∣p-BCS
∣Si ·Sj
. (3)
In the simplest Gutzwiller approximation, the statisti-
cal weight is given as gs = 4/(1 + δ)
2 where δ is the
density of holes,15 and in the case of half-filling (δ = 0),
gs = 4. Although double occupancy is no longer excluded
from wave functions in this approximation, it is known
in the research of high-Tc superconductivity that the
RVB mean-field (Gutzwiller) approximation gives quali-
tatively good results.
The spin operators Si ·Sj in the Hamiltonian (1) can
be rewritten by the fermion operators as
Si · Sj =
ci↑ − c†i↓ci↓
cj↑ − c†j↓cj↓
cj↑ + c
. (4)
Fig. 1. The anisotropic triangular lattice Heisenberg model with
intrachain coupling J and interchain zig-zag coupling J ′.
τ1, τ2, τ3 are lattice vectors.
By introducing the mean fields, we can rewrite the
Hamiltonian as
HMF =
ck↑+ c
+ h.c.
except for constant terms. Here, ξk and ∆k are given by
ξk ≡ −3Jξτ1cos(k · τ 1)
− 3J ′
ξτ 2cos(k · τ 2) + ξτ 3cos(k · τ 3)
, (6)
∆k ≡ 3J∆τ1cos(k · τ 1)
+ 3J ′
∆τ 2cos(k · τ 2) + ∆τ3cos(k · τ 3)
, (7)
where τ 1 = (1, 0), τ 2 = (1/2,
3/2), τ 3 = (1/2,−
as shown in Fig. 1, and
ci+τ↑
ci+τ↓
, ∆τ ≡
ci↑ci+τ↓
. (8)
On the analogy of BCS theory, we obtain self-consistent
equations at zero temperature
ξτ i = −
eik·τ i
∆τ i =
e−ik·τ i
with a quasiparticle excitation spectrum
+ |∆k|2. (10)
We determine the order parameters ∆τ i , ξτ i (i = 1, 2, 3)
by solving self-consistent equations (9) numerically, and
obtain the one-particle excitation spectrum Ek.
Firstly, we verify our method in 1D limit (J ′/J = 0).
According to the exact solution, the ground state is a
spin disordered state and the excitation spectrum is “des
Cloizeaux-Pearson mode” with S = 1.16 In the present
RVB mean-field theory, the one-particle excitation spec-
trum becomes
Ek = 3J
+ |∆τ 1 |
2 |cos kx| (11)
in the 1D limit. This clearly realizes gapless excitations
at kx = ±π/2. Note that this one-particle excitation
describes a spin singlet breaking, i.e. S = 1/2 spinon
excitation, whereas the des Cloizeaux-Pearson mode de-
scribes S = 1 spin-wave (magnon) excitation. Thus, two-
spinon excitations with kx = π/2 and kx = −π/2 form an
S = 1 magnon with kx = 0. This means that the present
gapless excitation spectrum obtained in the RVB mean-
field theory is consistent with the exact des Cloizeaux-
Pearson mode.
Nextly, we show the results of 0 ≤ J ′/J ≤ 1 case,
focusing on the following parameters
+ |∆τ 1 |
D23 ≡
+ |∆τ2 |
+ |∆τ3 |
Because of the SU(2) degeneracy at half-filling,10, 15 these
parameters are determined uniquely regardless of the de-
generate ground states. Actually, the excitation spectrum
J. Phys. Soc. Jpn. Letter Author Name 3
can be written as
= 9J2D21 cos
+ 9J ′2D223
+ cos2
Therefore, D1, D23 determine the dispersion relations
along the chains (τ 1) and between the chains (τ 2,τ 3),
respectively. Their J ′/J dependence calculated in the
system size L = 1200 (N = L2) are plotted in Fig. 2.
A notable feature is that D23 remains very small com-
pared to D1, in spite of the comparatively large J
to J ′/J ∼ 0.25. When D23 = 0 the system is a pure 1D
chain. Indeed, when J ′/J = 0, the right-hand side of the
self-consistent equations of ξτ 2 , ξτ3 , ∆τ2 , ∆τ 3 become
all equal to zero. As we show later, D23 is very small
for J ′/J . 0.25 and vanishes when J ′/J → 0. This in-
dicates that there are scarcely any correlations between
spins of different chains, and practically 1D state is real-
ized. As J ′/J approaches unity, D23 gradually increases
and becomes equal to D1.
Finally, we show in Fig. 3 the J ′/J dependence of the
one-particle excitation spectra Ek in (12). We find that
the structure of excitation spectra in 0 ≤ J ′/J . 0.25
has little difference from that of the decoupled chains
(J ′/J = 0.0). As a result, almost gapless excitations are
realized in this wide parameter range. This means that
practically 1D state is realized, which is also expected
from the behavior of D23 in Fig. 2. When J
′/J exceeds
0.25, the excitation gap gradually increases globally in
the first Brillouin zone (1BZ). However, the shape of the
whole spectrum is almost unchanged until the J ′/J be-
comes as large as about 0.6. Moreover, focusing on the
lowest energy excitations (dark areas in the contour plot
shown in Fig. 3), their locations in the 1BZ do not deviate
from those in the 1D limit (kx = ±π/2) for J ′/J . 0.8.
Additionally, when kx = ±π/2, the excitation spectrum
Ek is independent of ky, i.e., Ek = 3J
′D23. This is be-
cause the frustration of two interchain couplings (corre-
sponding to the lattice vector τ 2 and τ 3) cancel the ky
dependence. This fact is rather important, since it indi-
cates that the excited quasiparticles along the kx = ±π/2
lines feel free to move along the ky direction. This is the
same condition as in the 1D limit, except for the exis-
Fig. 2. Anisotropy dependence of D1 and D23 for L = 1200. Note
that D23 is very small compared to D1 in a wide range 0 ≤
J ′/J .0.25.
tence of a finite energy gap.
Figure 4 shows the minimum gap energy in the 1BZ
as a function of anisotropy J ′/J , changing the system
size L. We can see the almost gapless excitations in the
wide parameter range 0 ≤ J ′/J . 0.25, as is already
expected. It is quite natural that this behavior is simi-
Fig. 3. Anisotropy dependence of the one-particle excitation
spectra. Contour plots of the spectra are on the left, and sections
along ky = 0 line are on the right. The hexagons with broken
lines represent 1BZ of the triangular lattice. Up to J ′/J ∼ 0.25,
the spectra for each anisotropy are hardly distinguishable, and
the one-dimensionality strongly remains for large J ′/J .
4 J. Phys. Soc. Jpn. Letter Author Name
lar to that of D23, considering that the minimum energy
excitations are located along kx = ±π/2 for J ′/J . 0.6.
By plotting the same data for various system size, L,
in a semi-log scale (Fig. 4), we can see a discontinuous
jump for every size. We find that this critical value J ′c/J
vanishes very slowly as (lnL)−1. Thus, the discontinu-
ity is an artifact of finite-size calculation. We also find
that the minimum gap energy is finite when infinitesimal
J ′ is introduced. Actually, we can fit the J ′ dependence
as aJ ′ exp(−bJ/J ′)17 for J ′/J . 0.6 as shown in Fig.
4. Considering that the minimum gap energy is already
about 3 orders of magnitude below J at J ′/J ∼ 0.25, it
can be said that almost gapless excitation is realized in
0 ≤ J ′/J . 0.25. This result is fairly suggestive com-
pared with the previous series expansion18 and linear
spin wave19, 20 studies, all of which suggest a spin dis-
ordered state in the parameter range J ′/J . 0.25.
From the above results, we conclude that there is a
strong tendency to form a 1D-like excitation spectrum
for the triangular lattice spin system with anisotropy
0 ≤ J ′/J . 0.6. Furthermore, even if the anisotropy
is as large as 0.6 . J ′/J . 0.8, we can still expect 1D-
like behavior for quasiparticles except for the existence of
the excitation gap. Let us here discuss the relation to κ-
(ET)2Cu2(CN)3. The anisotropy of spin exchange inter-
actions in this material can be estimated from J = 4t2/U
(U being the onsite Coulomb repulsion) as J ′/J ∼ 0.89.
At this anisotropy, a rather large excitation gap exists as
shown in Fig. 4. We consider two possibilities to under-
stand the gaplessness. One is that the small gap region in
Fig. 4 expands to large values of J ′/J by some factors not
considered in the present model. For example, If long-
distance exchange interactions, quantum fluctuation or
multiple spin exchange effect14 (higher order terms of
the Heisenberg model) suppress not only LRMO but also
the spin gap, we can reproduce the gapless spin liquid
state at large J ′/J . These possibilities remain as future
problems. Another possibility is that the anisotropy J ′/J
of κ-(ET)2Cu2(CN)3 deviates from the above estimation
Fig. 4. (Color Online) Anisotropy dependence of the minimum
gap energy in the 1BZ (right axis) for L=60(diamond), 120(plus),
300(square), 600(cross) and 1200(triangle). The semi-log plots of
the same quantity are also shown (left axis). The solid line is a
fitted exponential function aJ ′ exp(−bJ/J ′), where a = 3.50 and
b = 1.61. We find that the observed critical behavior is an artifact
of finite size calculation (see the text).
due to, for example, a finite U effect.21 If it is in the
range J ′/J < 0.25, the excitation gap is sufficiently small
and the susceptibility behavior (finite at 1.9K whereas
J ∼ 250K) can be explained.
In summary, we analyzed an anisotropic triangular lat-
tice Heisenberg model using RVB mean-field approxima-
tion in order to investigate the physical origin of the gap-
less spin liquid state observed in κ-(ET)2Cu2(CN)3. We
payed attention to the Q1D anisotropy of this material,
and took an approach from the 1D limit. As a result of
calculations, we found that a practically 1D state with
almost gapless excitations is realized in the wide range of
the anisotropy parameter 0 ≤ J ′/J . 0.25. Furthermore,
one-dimensionality remained strongly even in J ′/J >
0.25 due to the geometrical frustration of interchain cou-
plings. We consider this “one-dimensionalization by frus-
tration” as a candidate for the mechanism of the gapless
spin liquid state, although the full understanding has not
yet been achieved.
This work was partly supported by a Grant-in-Aid
for Scientific Research on Priority Areas of Molecular
Conductors (No. 15073210) from the Ministry of Edu-
cation, Culture, Sports, Science and Technology, Japan,
and also by a Next Generation Supercomputing Project,
Nanoscience Program, MEXT, Japan.
1) For a review, see T.Ishiguro, K.Yamaji and G.Saito: Organic
Superconductors (Springer-Verlag, Berlin, 1998), 2nd ed.
2) Y.Shimizu, K.Miyagawa, K.Kanoda, M.Maesato and G.Saito:
Phys. Rev. Lett. 91 (2003) 107001.
3) S.Ohira, Y.Shimizu, K.Kanoda and G.Saito: J. Low Temp.
Phys. 142 (2006) 153.
4) T.Komatsu, N.Matsukawa, T.Inoue and G.Saito: J. Phys. Soc.
Jpn 65 (1996) 1340.
5) P.W.Anderson: Mater. Res. Bull. 8 (1973) 153.
6) B.Bernu, P.Lecheminant, C.Lhuillier and L.Pierre: Phys. Rev.
B 50 (1994) 10048.
7) N.Elstner, R.R.P.Singh and A.P.Young: Phys. Rev. Lett. 71
(1993) 1629.
8) P.Lecheminant, B.Bernu, C.Lhuillier and L.Pierre: Phys. Rev.
B 52 (1995) 9162.
9) L.Capriotti, A.E.Trumper and S.Sorella: Phys. Rev. Lett. 82
(1999) 3899.
10) M.Ogata: J. Phys. Soc. Jpn. 72 (2003) 1839.
11) G.Baskaran: Phys. Rev. Lett. 91 (2003) 097003.
12) T.Watanabe, H.Yokoyama, Y.Tanaka, J.Inoue and M.Ogata:
J. Phys. Soc. Jpn 73 (2004) 3404.
13) Y.Shimizu, K.Miyagawa, K.Kanoda, M.Maesato and G.Saito:
Prog. Theor. Phys. Suppl. 159 (2005) 52.
14) G.Misguich, C.Lhuillier, B.Bernu and C.Waldtmann: Phys.
Rev. B 60 (1999) 1064.
15) F.C.Zhang, C.Gros, T.M.Rice and H.Shiba: Supercond. Sci.
Technol. 1 (1988) 36.
16) J.des Cloizeaux and J.J.Pearson: Phys. Rev. 128 (1962) 2131.
17) We would like to thank T.Misawa for pointing out this possi-
bility.
18) W.Zheng, R.H.McKenzie and R.R.P.Singh: Phys. Rev. B 59
(1999) 14367.
19) J.Merino, R.H.McKenzie, J.B.Marston and C.H.Chung: J.
Phys. Condens. Matter 11 (1999) 2965.
20) A.E.Trumper: Phys. Rev. B 60 (1999) 2987.
21) H.Otsuka: Phys. Rev. B 57 (1998) 14658.
ABSTRACT
  Motivated by the observation of a gapless spin liquid state in
$\kappa$-(BEDT-TTF)$_2$Cu$_2$(CN)$_3$, we analyze the anisotropic triangular
lattice $S=1/2$ Heisenberg model with the resonating valence bond mean-field
approximation. Paying attention to the small quasi-one-dimensional anisotropy
of the material, we take an approach from one-dimensional (1D) chains coupled
with frustrating zig-zag bonds. By calculating one-particle excitation spectra
changing anisotropy parameter $J'/J$ from the decoupled 1D chains to the
isotropic triangular lattice, we find almost gapless excitations in the wide
range from the 1D limit. This one-dimensionalization by frustration is
considered to be a candidate for the mechanism of the gapless spin liquid
state.

<|endoftext|><|startoftext|>
Extra dimensions and Lorentz invariance violation
Viktor Baukh∗ and Alexander Zhuk†
Department of Theoretical Physics and Astronomical Observatory,
Odessa National University, 2 Dvoryanskaya St., Odessa 65026, Ukraine
Tina Kahniashvili‡
CCPP, New York University, 4 Washington Place, New York, NY 10003, USA
National Abastumani Astrophysical Observatory, 2A Kazbegi Ave, Tbilisi, GE-0160 Georgia
We consider effective model where photons interact with scalar field corresponding to conformal
excitations of the internal space (geometrical moduli/gravexcitons). We demonstrate that this
interaction results in a modified dispersion relation for photons, and consequently, the photon group
velocity depends on the energy implying the propagation time delay effect. We suggest to use the
experimental bounds of the time delay of gamma ray bursts (GRBs) photons propagation as an
additional constrain for the gravexciton parameters.
PACS numbers: 04.50.+h, 11.25.Mj, 98.80.-k
Lorentz invariance (LI) of physical laws is one of the
corner stone of modern physics. There is a number of ex-
periments confirming this symmetry at energies we can
approach now. For example, on a classical level, the ro-
tation invariance has been tested in Michelson-Morley
experiments, and the boost invariance has been tested
in Kennedy-Torhndike experiments [1]. Although, up
to now, LI is well established experimentally, we can-
not say surely that at higher energies it is still valid.
Moreover, modern astrophysical and cosmological data
(e.g. UHECR, dark matter, dark energy, etc) indicate
for a possible LI violation (LV). To resolve these chal-
lenges, there are number of attempts to create new phys-
ical models, such as M/string theory, Kaluza-Klein mod-
els, brane-world models, etc. [1].
In this paper we investigate LV test related to photon
dispersion measure (PhDM). This test is based on the
LV effect of a phenomenological energy-dependent speed
of photon [2, 3, 4, 5, 6, 7, 8], for recent studies see Ref.
[9] and references therein.
The formalism that we use is based on the analogy
with electromagnetic waves propagation in a magnetized
medium, and extends previous works [8, 10, 11]. In our
model, instead of propagation in a magnetized medium,
the electromagnetic waves are propagating in vacuum
filled with a scalar field ψ. LV occurs because of an in-
teraction term f(ψ)F 2 where F is an amplitude of the
electromagnetic field. Such an interaction might have
different origins. In the string theory ψ could be a dila-
ton field [12, 13]. The field ψ could be associated with
geometrical moduli. In brane-world models the similar
term describes an interaction between the bulk dilaton
and the Standard Model fields on the brane [14]. In
Ref. [15], such an interaction was obtained in N = 4
∗Electronic address: bauch˙vGR@ukr.net
†Electronic address: zhuk@paco.net
‡Electronic address: tinatin@phys.ksu.edu
super-gravity in four dimensions. In Kaluza-Klein mod-
els the term f(ψ)F 2 has the pure geometrical origin, and
it appears in the effective, dimensionally reduced, four
dimensional action (see e.g. [16, 17]). In particular, in
reduced Einstein-Yang-Mills theories, the function f(ψ)
coincides (up to a numerical prefactor) with the volume
of the internal space. Phenomenological (exactly solv-
able) models with spherical symmetries were considered
in Refs. [18]. To be more specific, we consider the model
which is based on the reduced Einstein-Yang-Mills the-
ory [17], where the term ∝ ψF 2 describes the interaction
between the conformal excitations of the internal space
(gravexcitons) and photons. It is clear that the similar
LV effect exists for all types of interactions of the form
f(ψ)F 2 mentioned above.
Obviously, the interaction term f(ψ)F 2 modifies the
Maxwell equations, and, consequently, results in a mod-
ified dispersion relation for photons. We show that
this modification has rather specific form. For example,
we demonstrate that refractive indices for the left and
right circularly polarized waves coincide with each other.
Thus, rotational invariance is preserved. However, the
speed of the electromagnetic wave’s propagation in vac-
uum differs from the speed of light c. This difference
implies the time delay effect which can be measured via
high-energy GRB photons propagation over cosmological
distances (see e.g. Ref. [9]). It is clear that gravexcitons
should not overclose the Universe and should not result
in variations of the fine structure constant. These de-
mands lead to a certain constrains for gravexcitons (see
Refs. [17, 19]). We use the time delay effect, caused by
the interaction between photons and gravexcitons, to get
additional bounds on the parameters of gravexcitons.
The starting point of our investigation is the Abelian
part of D-dimensional action of the Einstein-Yang-Mills
theory:
SEM = −
|g|FMNFMN , (1)
http://arxiv.org/abs/0704.0314v4
mailto:bauch_vGR@ukr.net
mailto:zhuk@paco.net
mailto:tinatin@phys.ksu.edu
where the D-dimensional metric, g = gMN (X)dX
dXN = g(0)(x)µνdx
µ ⊗ dxν + a21(x)g(1), is defined on
the product manifold M = M0 × M1. Here, M0 is
the (D0 = d0 + 1)-dimensional external space. The d1-
dimensional internal space M1 has a constant curvature
with the scale factor a1(x) ≡ LPl expβ1(x). Dimensional
reduction of the action (1) results in the following effec-
tive D0-dimensional action [17]
S̄EM = −
|g̃(0)| [(1−Dκ0ψ)FµνFµν ] , (2)
which is written in the Einstein frame with the D0-
dimensional metric, g̃
µν = (exp d1β̄
1)−2/(D0−2)g
Here, κ0ψ ≡ −β̄1
(D0 − 2)/d1(D − 2) ≪ 1 and β̄1 ≡
β1 − β10 are small fluctuations of the internal space scale
factor over the stable background β10 (0 subscript de-
notes the present day value). These internal space scale-
factor small fluctuations/oscillations have the form of
a scalar field (so called gravexciton [20]) with a mass
mψ defined by the curvature of the effective potential
(see for detail [20]). Action (2) is defined under the
approximation κ0ψ < 1 that obviously holds for the
condition1 ψ < MPl. κ
0 = 8π/M
Pl is four dimen-
sional gravitational constant, MPl is the Plank mass,
D = 2
d1/[(D0 − 1)(D − 1)] is a model dependent con-
stant. The Lagrangian density for the scalar field ψ reads:
|g̃(0)|(−g̃µνψ,µψ,ν−m2ψψψ)/2. For simplicity we
assume that g̃0 is the flat Friedman-Lemaitre-Robertson-
Walker (FLRW) metric with the scale factor a(t).
Let’s consider Eq. (2). It is worth of noting that the
D0-dimensional field strength tensor, Fµν , is gauge in-
variant.2 Secondly, action (2) is conformally invariant in
the case when D0 = 4. The transform to the Einstein
frame does not break gauge invariance of the action (2),
and the electromagnetic field is antisymmetric as usual,
Fµν = ∂µAν − ∂νAµ. Varying (2) with respect to the
electromagnetic vector potential,
−g (1−Dκ0ψ)Fµν
= 0. (3)
The second term in the round brackets Dκ0ψFµν reflects
the interaction between photons and the scalar field ψ,
and as we show below, it is responsible for LV. In par-
ticular, coupling between photons and the scalar field ψ
makes the speed of photons different from the standard
speed of light. Eq. (3) together with Bianchi identity
(which is preserved in the considered model due to gauge-
invariance of the tensor, Fµν [17]) defines a complete set
1 In the brane-world model the prefactor κ0 in the expression for
κ0ψ is replaced by the parameter proportional to M
[14].
Thus, the smallness condition holds for ψ < MEW .
2 Eq. (2) can be rewritten in the more familiar form S̄EM =
−(1/2)
|g̃(0)|F̄µν F̄
µν [17]. The field strength tensor
F̄µν is not gauge invariant here.
of the generalized Maxwell equations. As we noted, ac-
tion (2) is conformally invariant in the 4D dimensional
space-time. So, it is convenient to present the flat FLRW
metric g̃0 in the conformally flat form: g̃0µν = a
2ηµν ,
where ηµν is the Minkowski metric.
Using the standard definition of the electromagnetic
field tensor, Fµν , we obtain the complete set of the
Maxwell equations in vacuum,
∇ ·B = 0 , (4)
∇ ·E = Dκ0
1−Dκ0ψ
(∇ψ ·E) , (5)
∇×B = ∂E
− Dκ0ψ̇
1−Dκ0ψ
1−Dκ0ψ
[∇ψ ×B] , (6)
∇×E = −∂B
, (7)
where all operations are performed in the Minkowski
space-time, η denotes conformal time related to physi-
cal time t as dt = a(η)dη, and an overdot represents a
derivative with respect to conformal time η.
Eqs. (4) and (7) correspond to Bianchi identity, and
since it is preserved, Eqs. (4) and (7) keep their usual
forms. Eqs. (5) and (6) are modified due to interactions
between photons and gravexcitons (∝ κ0ψ). These mod-
ifications have simple physical meaning: the interaction
between photons and the scalar field ψ acts as an effective
electric charge eeff . This effective charge is proportional
to the scalar product of the ψ field gradient and the E
field, and it vanishes for an homogeneous ψ field. The
modification of Eq. (6) corresponds to an effective cur-
rent Jeff , which depends on both electric and magnetic
fields. This effective current is determined by variations
of the ψ field over the time (ψ̇) and space (∇ψ). For
the case of a homogeneous ψ field the effective current is
still present and LV takes place. The modified Maxwell
equations are conformally invariant. To account for the
expansion of the Universe we rescale the field components
asB,E → B,E a2 [21].
To obtain a dispersion relation for photons, we use
the Fourier transform between position and wavenumber
spaces as,
F(k, ω) =
dη d3x e−i(ωη−k·x)F(x, η) ,
F(x, η) =
(2π)4
dω d3kei(ωη−k·x)F(k, ω) . (8)
Here, F is a vector function describing either the elec-
tric or the magnetic field, ω is the angular frequency of
the electro-magnetic wave measured today, and k is the
wave-vector. We assume that the field ψ is an oscilla-
tory field with the frequency ωψ and the momentum q,
so ψ(x, η) = Cei(ωψη−q·x) , C = const . Eq. (4) implies
B ⊥ k. Without loosing of generality, and for simplic-
ity of description we assume that the wave-vector k is
oriented along the z axis. Using Eq. (7) we get E ⊥ B.
A linearly polarized wave can be expressed as a super-
position of left (L, −) and right (R, +) circularly polar-
ized (LCP and RCP) waves. Using the polarization basis
of Sec. 1.1.3 of Ref. [22], we derive E± = (Ex± iEy)/
Rewriting Eqs. (4) - (7) in the components,3 for LCP
and RCP waves we get,
(1 − n2+)E+ = 0, (1− n2(−))E
− = 0 , (9)
where n+ and n− are refractive indices for RCP and LCP
electromagnetic waves
n2+ =
k2 [1−Dκ0ψ(1 + qz/k)]
ω2 [1−Dκ0ψ(1 + ωψ/ω)]
= n2− . (10)
In the case when LI is preserved the electromagnetic
waves propagating in vacuum have n+ = n− = n =
k/ω ≡ 1. For the electromagnetic waves propagating in
the magnetized plasma, k/ω 6= 1, and the difference be-
tween the LCP and RCP refractive indices describes the
Faraday rotation effect, α ∝ ω(n+ − n−) [23]. In the
considered model, since n+ = n− the rotation effect is
absent, but the speed of electromagnetic waves propaga-
tion in vacuum differs from the speed of light c (see also
Ref. [24] for LV induced by electromagnetic field cou-
pling to other generic field). This difference implies the
propagation time delay effect, ∆t = ∆l(1−∂k/∂ω) (∆l is
a propagation distance), ∆t is the difference between the
photon travel time and that for a ”photon” which travels
at the speed of light c. Here, t is physical synchronous
time. This formula does not take into account the evo-
lution of the Universe. However, it is easy to show that
the effect of the Universe expansion is negligibly small.
Solving the dispersion relation as a square equation,
we obtain
ω2ψ − q2z
(Dκ0ψ)2
, (11)
where ± signs correspond to photons forward and back-
ward directions respectively.
The modified inverse group velocity (11) shows that
the LV effect can be measured if we know the gravexciton
frequency ωψ, z-component of the momentum qz and its
amplitude ψ. For our estimates, we assume that ψ is
the oscillatory field, satisfying (in local Lorentz frame)
the dispersion relation, ω2ψ = m
ψ + q
2, where mψ is the
mass of gravexcitons4. Unfortunately, we do not have
3 We have defined the system of 6 equations with respect to 6
components of the vectors E and B. This system has non-trivial
solutions only if its determinant is nonzero. From this condition
we get the dispersion relation. The Faraday rotation effect is
absent if the matrix has a diagonal form.
4 To get physical values of the corresponding parameters we should
rescale them by the scale factor a.
any information concerning parameters of gravexcitons
(some estimates can be found in [17, 19]). Thus, we
intend to use possible LV effects (supposing it is caused
by interaction between photons and gravexcitons) to set
limits on gravexciton parameters. For example, we can
easily get the following estimate for the upper limit of
the amplitude of gravexciton oscillations:
|ψ| ≈ 1√
MPl , (12)
where for ω and mψ we can use their physical values.
In the case of GRB with ω ∼ 1021 ÷ 1022Hz ∼ 10−4 ÷
10−3GeV and ∆l ∼ 3 ÷ 5 × 109y ∼ 1017sec the typical
upper limit for the time delay is ∆t ∼ 10−4sec [9]. For
these values the upper limit on gravexciton amplitude of
oscillations is5
|κ0ψ| ≈
10−13GeV
. (13)
This estimate shows that our approximation κ0ψ < 1
works for gravexciton masses mψ > 10
−13GeV. Future
measurements of the time-delay effect for GRBs at fre-
quencies ω ∼ 1 − 10GeV would increase significantly
the limit up to mψ > 10
−9GeV. On the other hand,
Cavendish-type experiments [26, 27]) exclude fifth force
particles with masses mψ . 1/(10
−2cm) ∼ 10−12GeV
which is rather close to our lower bound for ψ field
masses. Respectively we slightly shift the considered
mass lower limit to be mψ ≥ 10−12GeV. These masses
considerably higher than the mass corresponding to the
equality between the energy densities of the matter and
radiation (matter/radiation equality), meq ∼ Heq ∼
10−37GeV, where Heq is the Hubble ”constant” at mat-
ter/radiation equality. It means that such ψ-particles
start to oscillate during the radiation dominated epoch
(see appendix). Another bound on the ψ-particles masses
comes from the condition of their stability. With re-
spect to decay ψ → γγ the life-time of ψ-particles is
τ ∼ (MPl/mψ)3tPl [17], and the stability conditions re-
quires that the decay time should be greater than the age
of the Universe. According this we consider light gravex-
citons with masses mψ ≤ 10−21MPl ∼ 10−2GeV ∼ 20me
(where me is the electron mass).
As an additional restriction arises from the condi-
tion that such cosmological gravexcitons should not
overclose the observable Universe. This reads mψ .
meq(MPl/ψin)
4 which implies the following restriction
for the amplitude of the initial oscillations: ψin .
(meq/mψ)
MPL << MPl [19]. Thus, for the range of
masses 10−12GeV ≤ mψ ≤ 10−2GeV, we obtain respec-
tively ψin . 10
−6MPl and ψin . 10
−9MPl. According to
5 We thank R. Lehnert to point that in addition of the time de-
lay effect the Cherenkov effect could be used to constrain the
electromagnetic field and ψ field coupling strength [25].
Eq. (A.3), we can also get the estimate for the amplitude
of oscillations of the considered gravexciton at the present
time. Together with the non-overcloseness condition,
we obtain from this expression that |κ0ψ| ∼ 10−43 for
mψ ∼ 10−12GeV and ψin ∼ 10−6MPl and |κ0ψ| ∼ 10−53
for mψ ∼ 10−2GeV and ψin ∼ 10−9MPl. Obviously, it is
much less than the upper limit (13). Note, as we men-
tioned above, gravexcitons with masses mψ & 10
−2GeV
can start to decay at the present epoch. However, taking
into account the estimate |κ0ψ| ∼ 10−53, we can easily
get that their energy density ρψ ∼ (|κ0ψ|2/8π)M2Plm2ψ ∼
10−55g/cm3 is much less than the present energy density
of the radiation ργ ∼ 10−34g/cm3. Thus, ρψ contributes
negligibly in ργ . Otherwise, the gravexcitons with masses
mψ & 10
−2GeV should be observed at the present time,
which, obviously, is not the case.
Additionally, it follows from Eq. (42) in Ref. [17]
that to avoid the problem of the fine structure constant
variation, the amplitude of the initial oscillations should
satisfy the condition: ψin . 10
−5MPl which, obviously,
completely agrees with our upper bound ψin . 10
−6GeV.
Summarizing we shown that LV effects can give addi-
tional restrictions on parameters of gravexcitons. First,
we found that gravexcitons should not be lighter than
10−13GeV. It is very close to the limit following from the
fifth-force experiment. Moreover, experiments for GRB
at frequencies ω > 1GeV can result in significant shift of
this lower limit making it much stronger than the fifth-
force estimates. Together with the non-overcloseness con-
dition, this estimate leads to the upper limit on the am-
plitude of the gravexciton initial oscillations. It should
not exceed ψin . 10
−6GeV. Thus, the bound on the ini-
tial amplitude obtained from the fine structure constant
variation is one magnitude weaker than our one even for
the limiting case of the gravexciton masses. Increasing
the mass of gravexcitons makes our limit stronger. Our
estimates for the present day amplitude of the gravexci-
ton oscillations, following from the obtained above lim-
itations, show that we cannot use the LV effect for the
direct detections of the gravexcitons. Nevertheless, the
obtained bounds can be useful for astrophysical and cos-
mological applications. For example, let us suppose that
gravexcitons with masses mψ > 10
−2GeV are produced
during late stages of the Universe expansion in some re-
gions and GRB photons travel to us through these re-
gions. Then, Eq. (A.3) is not valid for such gravexcitons
having astrophysical origin and the only upper limit on
the amplitude of their oscillations (in these regions) fol-
lows from Eq. (13). In the case of TeV masses we get
|κ0ψ| ∼ 10−16. If GRB photons have frequencies up to
1 TeV, ω ∼ 1TeV, then this estimate is increased by 6
orders of magnitude.
Acknowledgments
We thank G. Dvali, G. Gabadadze, A. Gruzinov, G.
Melikidze, B. Ratra, and A. Starobinsky for stimulating
discussions. T. K. and A. Zh. acknowledge hospital-
ity of Abdus Salam International Center for Theoreti-
cal Physics (ICTP) where this work has been started.
A.Zh. would like to thank the Theory Division of CERN
for their kind hospitality during the final stage of this
work. T.K. acknowledges partial support from INTAS
061000017-9258 and Georgian NSF ST06/4-096 grants.
A. Appendix: Dynamics of Light Gravexcitons
In this appendix we briefly summarize the main prop-
erties of the light gravexcitons necessary for our inves-
tigations. The more detail description can be found in
Refs. [17, 19].
The effective equation of motion for massive cosmolog-
ical gravexciton6 is
ψ + (3H + Γ)
ψ +m2ψψ = 0 , (A.1)
where H ∼ 1/t and Γ ∼ m3ψ/M2Pl are the Hubble pa-
rameter and decay rate (ψ → γγ) correspondingly. This
equation shows that at times when the Hubble parame-
ter is less than the gravexciton mass: H . mψ the scalar
field begins to oscillate (i.e. time tin ∼ H−1in ∼ 1/mψ
roughly indicates the beginning of the oscillations):
ψ ≈ CB(t) cos(mψt+ δ) . (A.2)
We consider cosmological gravexcitons with masses
10−12GeV ≤ mψ ≤ 10−2GeV. The lower bound fol-
lows both from the fifth-force experiments and Eq. (13).
The upper bound follows from the demand that the life-
time of these particles (with respect to decay ψ → γγ)
is larger than the age of the Universe: τ = 1/Γ ∼
(MPl/mψ)
tPl ≥ 1019sec > tuniv ∼ 4 × 1017 sec. Thus,
we can neglect the decay processes for these gravexci-
tons. Additionally, it can be easily seen that these par-
ticles start to oscillate before teq ∼ H−1eq when the en-
ergy densities of the matter and radiation become equal
to each other (matter/radiation equality). According to
the present WMAP data for the ΛCDM model it holds
Heq ≡ meq ∼ 10−56MPl ∼ 10−28eV. Thus, considered
particles have masses mψ >> meq and start to oscil-
late during the radiation dominated stage. They will not
overclose the observable Universe if the following condi-
tion is satisfied: mψ . meq(MPl/ψin)
4, where ψin is the
amplitude of the initial oscillations at the moment tin
(see Eq. (18) in Ref. [19]).
Prefactors C and B(t) in Eq. (A.2) for con-
sidered light gravexcitons respectively read: C ∼
(ψin/MPl) (MPl/mψ)
and B(t) ∼ MPl (MPlt)−3s/2.
Here, s = 1/2, 2/3 for oscillations during the radiation
6 We have seen that the interaction between gravexcitons and or-
dinary matter (in our case it is 4D-photons) is suppressed by the
Planck scale. Thus, gravexcitons are weakly interacting massive
particles (WIMPs).
dominated and matter dominated stages, correspond-
ingly. We are interested in the gravexciton oscillations
at the present time t = tuniv. In this case s = 2/3 and
for B(tuniv) we obtain: B(tuniv) ∼ t−1univ ≈ 10−61MPl.
Thus, the amplitude of the light gravexciton oscillations
at the present time reads:
|κ0ψ| ∼ 10−60
. (A.3)
[1] V. A. Kostelecký, Third meeting on CPT and Lorentz
Symmetry (World Scientific, Singapore, 2005); G. M.
Shore, Nucl. Phys. B 717, 86 (2005). D. Mattingly, Liv-
ing Rev. Rel. 8, 5 (2005); T. Jacobson, S. Liberati, and
D. Mattingly, Ann. Phys. 321, 150 (2006).
[2] G. Amelino-Camelia, et al., Nature, 393, 763, (1998).
[3] J. R. Ellis, K. Farakos, N. E. Mavromatos, V. A. Mit-
sou and D. V. Nanopoulos, Astrophys. J. 535, 139
(2000); J. R. Ellis, N. E. Mavromatos, D. V. Nanopoulos,
A. S. Sakharov and E. K. G. Sarkisyan, Astropart. Phys.
25, 402 (2006).
[4] V. A. Kostelecký and M. Mewes, Phys. Rev. Lett. 87,
251304 (2001); G. Amelino-Camelia and T. Piran, Phys.
Rev. D 64, 036005 (2001).
[5] S. Sarkar, Mod. Phys. Lett. A 17, 1025 (2002)
[6] R. C. Myers and M. Pospelov, Phys. Rev. Lett. 90,
211001 (2003).
[7] T. Piran, in Planck Scales Effects in Astrophysics and
Cosmology, eds. J. Kowalski-Glikman and G. Amelino-
Camelia (Springer, Berlin, 2005), p. 351.
[8] T. Jacobson, S. Liberati, and D. Mattingly, Nature 424,
1019 (2003).
[9] M. Rodŕıguez Mart́ınez and T. Piran, J. Cosmo. As-
tropart. Phys. 4, 006 (2006).
[10] S. M. Carroll, G. B. Field, and R. Jackiw, Phys. Rev. D
41, 141601 (1990).
[11] T. Kahniashvili, G. Gogoberidze, and B. Ratra, Phys.
Lett. B 643, 81 (2006).
[12] M.B. Green, J.H. Schwarz and E. Witten, 1987 Super-
string Theory, (Cambridge: Cambridge Univ. Press).
[13] T. Damour and A.M. Polyakov, Nucl. Phys. B 423, 532
(1994).
[14] A. Zhuk, Int. J. Mod.Phys. D 11, 1399 (2002).
[15] V.A. Kostelsky, R. Lehnert and M. Perry, Phys.Rev. D
68, 123511 (2003).
[16] P. Loren-Aguilar, E. Garcia-Berro, J. Isern and Yu.A.
Kubyshin, Class.Quant.Grav. 20, 3885, (2003).
[17] U. Günther, A. Starobinsky and A. Zhuk, Phys. Rev.
D69, 044003 (2004).
[18] K.P. Stanyukovich and V.N. Melnikov, 1983 Hydrody-
namics, fields and constants in theory of gravity, (in Rus-
sian); U. Bleyer, K.A.Bronnikov, S.B.Fadeev and V.N.
Melnikov, gr-qc/9405021.
[19] U. Günther and A. Zhuk, Int. J. Mod. Phys. D 13, 1167
(2004).
[20] U. Günther and A. Zhuk, Phys. Rev. D 56, 6391 (1997).
[21] B. Ratra, Astrophys. J. Lett. 391, L1 (1992); D. Grasso
and H. R. Rubinstein, Phys. Rept. 348, 163 (2001);
M. Giovannini, Class. Quant. Grav. 22, 363 (2005).
[22] D. A. Varshalovich, A. N. Moskalev, and V. K. Kher-
sonskii, Quantum Theory of Angular Momentum (World
Scientific, Singapore, 1988).
[23] N. A. Krall and A. W. Trivelpiece, Principles of Plasma
Physics (McGraw-Hill, New York, 1973).
[24] M. B. Cantcheff, Eur. Phys. J. C 46, 247 (2006).
[25] R. Lehnert and R. Potting, Phys. Rev. Lett. 93, 110402
(2004), Phys. Rev. D 70, 125010 (2004).
[26] G. R. Dvali and M. Zaldarriaga, Phys. Rev. Lett. 88,
091303 (2002).
[27] E.G. Adelberger, B.R. Heckel, A.E. Nelson,
Ann.Rev.Nucl.Part.Sci. 53, 77 (2003).
http://arxiv.org/abs/gr-qc/9405021
ABSTRACT
  We consider effective model where photons interact with scalar field
corresponding to conformal excitations of the internal space (geometrical
moduli/gravexcitons). We demonstrate that this interaction results in a
modified dispersion relation for photons, and consequently, the photon group
velocity depends on the energy implying the propagation time delay effect. We
suggest to use the experimental bounds of the time delay of gamma ray bursts
(GRBs) photons propagation as an additional constrain for the gravexciton
parameters.

<|endoftext|><|startoftext|>
Introduction.
Let ξ(t) be a random process with measurable phase space (X,Σ(X)). Consider the
measurable connected domain D ∈ Σ(X) and small parameter ǫ. The investigations of
asymptotics of sojourn probability (small deviations)
P (ξ(t) ∈ ǫD, t ∈ [0, T ]) (1)
is jointed with many practice and theoretical problems [1-4]. In the literature, it was
researched both rough asymptotics of principal member of (1)(log from it)[5] and exact
asymptotics of diffusion processes of (1)[6-8]. In the works [9,10] was proved of algo-
rithms of expansions of exact asymptotics of small deviation for diffusion and piecewise
deterministic random processes for one-dimensional case.
The purpose this article is to present the algorithm of expansion of small deviation for
many-dimensional diffusion processes and to define all constants of principal member.
In Section 1 our main result is stated and proved. In section 2 we consider the limits
theorems about numbers of unabsorbed diffusion particles by boundaries of small domain.
I. The expansion.
We shall investigate of asymptote of following probability
P (ǫ, x) = P (ξ(t) ∈ ǫD, 0 ≤ t ≤ T ) , ǫ → 0,
where ξ(t) ∈ Rd is solution of the following stochastic differential equation
dξ(t) = a(t, ξ(t))dt+
bi(ξ(t))dwi(t), ξ(0) = x ∈ ǫD. (2)
where functions
1991 Mathematics Subject Classification. 60 J 65.
Key words and phrases. parabolic problem,small domain, algorithm of expansion, number of unab-
sorbed processes.
Typeset by AMS-TEX
http://arxiv.org/abs/0704.0315v1
2 VITALII A. GASANENKO
bi(x), a(t, x) : R
d → Rd and R+ ×R
d → Rd.
are differentiable.
Set σij(x) =
bik(x)b
k(x).
It is known that P (ǫ, x) = uǫ0(T, x). Here u
0(t, x) is solution of the following parabolic
boundary problem at 0 ≤ t ≤ T
∂uǫ0(t, x)
i,j=1
σij(x)
∂2uǫ0(t, x)
∂xi∂xj
ai(T − t, x)
∂uǫ0(t, x)
, x ∈ Dǫ;
u(t, x)|t=0 = 1; x ∈ Dǫ; u(t, x) = 0 x ∈ ∂Dǫ, 0 ≤ t ≤ T. (3)
where Dǫ = ǫD. It is assumed that D is a connected bounded domain from R
m; the
boundary ∂Q is the Lyapunov surface C(1,λ) and 0 ∈ D. We interest of the asymptotic
expansion ǫ → 0 of solution this problem uǫ0(t, x) at ǫ → 0.
We define the differential operator A : 1
1≤i,j≤d
σij(0)
∂xi∂xj
. Let σ be a matrix with
the following property
1≤i,j≤d
σij(0)zizj ≥ µ|~z|
Here µ, there is a fixed positive number, and ~z = (z1, · · · , zd) is an arbitrary real
vector.
This operator acts in the following space
HA = {u : u ∈ L2(D) ∩ Au ∈ L2(D) ∩ u(∂D) = 0}
with inner product (u, v)A = (Au, v). Here (, ) is inner product in L2(Q). The opera-
tor A is a positive operator[11]. It is known that the following eigenvalue problem
Au = −λu, u(∂D) = 0
has infinite set of real eigenvalues λi → ∞ and
0 < λ1 < λ2 < · · · < λs < · · · .
The corresponding eigenfunctions
f11, . . . , f1n1 , · · · , fs1, . . . , fsns , · · ·
form the complete system of functions both in HA and L
2(Q) := {u : u ∈ L2(Q) ∩
u(∂Q) = 0}. Here the number nk is equal to multiplicity of eigenvalue λk.
It is often convenient to present the system of eigenfunctions by one index: {fn(z)}.
The corresponding system of eigenvalues {λn} will be with recurrences. We shall use it
We introduce the spectral function
e(x, y, λ) =
fj(x)fj(y).
We shall need in the following theorem from the monograph [12].
THE SMALL DEVIATION OF MANY-DIMENSIONAL DIFFUSION PROCESSES 3
Theorem 1 ([12].Th.17.5.3). . There exists such constant Cα that
x,y∈D
|Dαx,ye(x, y, λ)| ≤ Cαλ
(n+|α|)/2
Here α is multi-index.
Theorem 2. . If the surface ∂D is Lyapunov surface and
(t,z)∈[0,T ]×D,1≤i,j≤d
∂ai(t, z)
∂bi(z)
∂ai(T − t, z)
then the following relation takes place at ǫ → 0
P (ǫ, zǫ) = exp
µ(t)dt
c1mf1m(z) (1 +O(ǫ)) , at z ∈ D,
where
µ(t) =
σij(0)ai(t, 0)aj(t, 0)− δijai(t, 0)aj(t, 0)
and c1m =
f1m(z)dz.
Proof. Make the change of variables and function
xi = ziǫ, u
1 = u
0 exp
ak(T − t, 0)zk
Now we obtain the following parabolic problem for function uǫ1
∂uǫ1(t, z)
i,j=1
σij(ǫz)
∂2uǫ1(t, z)
∂zi∂zj
ai(T − t, ǫz)−
σij(ǫz)aj(T − t, 0)
∂uǫ1(t, z)
σij(ǫz)ai(T − t, 0)aj(T − t, 0)− δijai(T − t, 0)aj(T − t, ǫz)− ǫ
∂ai(T − t, 0)
uǫ1, z ∈ D;
1(t, z)|t=0 = exp
ak(T, 0)zk
; z ∈ D; uǫ1(t, z) = 0 z ∈ ∂D, 0 ≤ t ≤ T.
We will construct the asymptotic expansion of solution for this initial - boundary
problem in the following form
uǫ1(t, z) =
vk(t, z)ǫ
k. (5)
Note that the famous expansion
4 VITALII A. GASANENKO
ak(T, 0)zk
= 1 + ǫ
ak(T, 0)zk +
ak(T, 0)zk
+ · · · ,
defines the initial conditions for vk, k ≥ 0:
v0(0, z) = 1, v1(0, z) =
ak(T, 0)zk, v2(0, z) =
ak(T, 0)zk
· · · .
Using the first fragment of Taylor series in zero point under conditions of theorem we
can obtain the following representations
σij(ǫz) = σij(0) + ǫσ
ij(z), ai(T − t, ǫz) = ai(T − t, 0) + ǫa
i(T − t, z), 1 ≤ i, j ≤ d (6)
where
z∈D,ǫ∈[0,1],1≤i,j≤d
|σǫij(z)| < ∞, sup
z∈D,t∈[0,T ],ǫ∈[0,1],1≤i≤d
|aǫi(T − t, z)| < ∞
Now, after substitution of (5),(6) to (4) we conclude that the v0 satisfies the problem
i,j=1
σij(0)
∂zi∂zj
 v0 + µ(t)v0 (7)
v0|∂D = 0; v0(0, z) = 1, z ∈ D.
µ(t) =
σij(0)ai(T − t, 0)aj(T − t, 0)− δijai(T − t, 0)aj(T − t, 0)
Further, let us denote by Bǫ(t, z) the operator C
2(D) → C(D), for f ∈ C2(D) it’s
defined as follows:
ǫ(t, z)f =
i,j=1
σǫij(z)
∂zi∂zj
ai(T − t, ǫz)−
σij(ǫz)aj(T − t, 0)
i,j=1
σǫij(z)ai(T − t, 0)aj(T − t, 0)− δijai(T − t, 0)a
j(T − t, z)−
∂ai(T − t, 0)
i,j=1
σǫij(z)
∂zi∂zj
Aǫ1(t, z)f + ǫA
2(t, z).
THE SMALL DEVIATION OF MANY-DIMENSIONAL DIFFUSION PROCESSES 5
Now, formally the functions vk, k ≥ 1 are defined by the following recurrence system
problems
i,j=1
σij(0)
∂zi∂zj
 vk +Bǫ(t, z)vk−1 (8)
v0|∂D = 0; vk(0, z) =
ak(T − t, 0)zk
, z ∈ D.
We shall solve the problems of (7),(8) by method of separation of variables. According
to this method the solutions are defined in the form
vk(t, z) =
qk,n(t)fn(z). (9)
For definition of principal number it suffices to construct of the v0. If we substitute
(9) at k = 0 to (7) then we obtain
−q̇0,n(t)−
q0,n(t) + µ(t)q0,n(t)
fn(z) = 0.
Set c0,n =
fn(z)dz (coefficients of expansion of indicator of set D). The initial
condition of v0 has the following stating
v0(0, z) =
q0,n(0)fn(z) =
c0,nfn(z) =
c0,lmflm(z), z ∈ D.
By definition of system of functions {fn(z)}, now we have the system of ordinary
differential equations
q̇0,n(t) +
− µ(t)
q0,n(t) = 0, q0,n(0) = c0,n.
From the latter one we have
q0,n(t) = c0,n exp
µ(s)ds
A0 = sup
ǫ≤1,z∈D;i,j
|σǫij(z)|, L0 =
l≥1,1≤m≤nl
(c0,ml)
A1 = sup
0≤ǫ≤1,z∈D,t∈[0,T ];i,j
ai(T − t, ǫz)−
σij(ǫz)aj(T − t, 0)
= sup
0≤ǫ≤1,z∈D,t∈[0,T ];i,j
ij(z)ai(T − t, 0)aj(T − t, 0)− δijai(T − t, 0)a
j(T − t, z)−
∂ai(T − t, 0)
We have the following relations for eigenvalues λl
6 VITALII A. GASANENKO
2/d ≤ λl ≤ k2l
2/d, max(k1, k2) < ∞
Applying Cauchy-Bunyakovskii inequality, Theorem 1 and the latter one, we get
aǫi,j(z)
∂zi∂zj
−λltǫ
µ(s)ds
c0,ml
aǫi,j(z)
∂2fml(z)
∂zi∂zj
≤ A0d
−λltǫ
µ(s)ds
(c0,ml)
∂2fml(z)
∂zi∂zj
≤ A0dC2,2L0
µ(s)ds
l ≤ exp
K0. (10)
Here K0 < ∞.
Reasoning similarly we convince ourselves that for other parts of Bǫ(t, z)v0 the fol-
lowing estimations take place
|Aǫ1(t, z)v0| ≤ A1dC1,1L0
µ(s)ds
l ≤ exp
K0,1; (11)
|Aǫ2(t, z)v0| ≤ A2dC0,0L0
µ(s)ds
l ≤ exp
K0,2, (12)
where max{K0,1,K0,2} < ∞.
Now let us estimate the coefficients βǫn(t) of expansion of B
ǫ(t, z)v0 by system {fn}n≥1.
Applying (10)-(12) and Cauchy-Bunyakovskii inequality, we get
|βǫn(t)| = |
Bǫ(t, z)v0(t, z)fn(z)dz| ≤
(Bǫ(t, z)v0)
n(z)dz
≤ exp(−λ1tǫ
K0 +K0,1
+ ǫK0,2
The latter one now gives
βǫn(s)ds| ≤ ǫγǫ(t), (13)
THE SMALL DEVIATION OF MANY-DIMENSIONAL DIFFUSION PROCESSES 7
where
0≤ǫ≤1,t∈[0,T ]
γǫ(t) < ∞.
Finally, let us estimate the difference rǫ(t, z) = uǫ1(t, z)−v0(t, z). By definition, r
ǫ(t, z)
is solution of the following problem
i,j=1
σij(0)
∂zi∂zj
 rǫ +Bǫ(T − t, z)v0 z ∈ D; (14)
rǫ(t, z)|t=0 = exp
ak(T, 0)zk
− 1; z ∈ D; rǫ(t, z) = 0 z ∈ ∂D, 0 ≤ t ≤ T.
It is clear that rǫ(0, z) we can present as ǫrǫ1(0, z), where r
1(0, z) is uniform bounded
function of variables ǫ ∈ [0, 1] and z ∈ D. So, the coefficients of expansion this function
by system {fn(z)} have the following forms
rǫ(0, z)fn(z)dz = ǫµ
n, where sup
0≤ǫ≤1
(µǫn)
= M < ∞. (13)
Now we have the solution of (14) in the following form
rǫ(t, z) = ǫ
µǫn exp{−λntǫ
βǫn(s)ds}fn(z)
Applying latter one ,(13),(15), Theorem 1 and Cauchy-Bunyakovskii inequality we get
at t > 0
|rǫ(t, z)ǫ−1| ≤
(µǫn)
exp{−λntǫ
βǫn(s)ds}λ
n } ≤
≤ MC0,0 exp{−λ1tǫ
−2}K0,3, where K0,3 < ∞.
The proof of theorem is completed.
Remark 1. According to the above system of problems for definition of the functions
vk, k ≥ 1, we outline the construction of coefficients qk,n)(t) for the series (8):
q̇k,n(t) = +
+ µǫk−1,n(t)
qk,n(t),
qk,n(0) =
vk(0, z)fn(z)dz =
am(T, 0)zm
fn(z)dz
Here µǫk−1,n(t) =
fn(z)B
ǫ(t, z)vk−1(t, z)dz.
8 VITALII A. GASANENKO
Remark 2. Theorem 2 is coordinated with results of works [6-8] where the principal
member of small deviations in ball are investigated for more simple SDE.
II. The rarefaction of set of diffusion processes by boundaries of small
domains.
The following problem was investigated in works[13,14]. Let a set identical diffusion
random processes start at the initial time from the different points of domain D. These
processes are diffusion processes with absorbtion on the boundary ∂D. We are interested
in distribution of the number yet absorbed at the moment T . The initial number and
initial position of diffusion processes are defined either a random Poisson measure[14]
or deterministic measure [13]. The proved limits theorems described the situation when
T → ∞ and initial number of diffusion processes depended on T and it increased at the
rise of T . The role of normalizing function played principal member of asymptote of
solution of according parabolic problem at T → ∞.
Henceforth we shall assume that considered diffusion processes satisfy of the SDE (2)
with different initial points.
Now we consider the situation when initial number of absorbing diffusion processes
in small domain ǫD depends on ǫ → 0 and it increase under the condition of decrease
of ǫ. It is not hard to show, that now normalizing function is the principal member of
parabolic problem (3) at ǫ → 0.
The proofs of stated below theorems repeat the proofs of according theorems from
[13,14] almost word for word.
We will denote by η(ǫ, T ) the number of remaining processes in the region ǫD at the
moment T .
We will also assume that σ-additive measure ν is given on the Σν- algebra sets from
D, ν(D) < ∞. All eigenfunctions fij : D → R
1 are (Σν ,ΣY ) measurable. Here ΣY is
system of Borel sets from R1. Let ⇒ denote the weak convergence of random values or
measures.
At the beginning we assume that initial number and position of diffusion processes are
defined by deterministic measure N(ǫB, ǫ), B ∈ D. Thus, N(ǫB, ǫ) is equal to number of
starting points in the set ǫB.
Let us denote by νǫ(·) the measure
νǫ(ǫB) = exp
N(ǫB, ǫ).
where B ∈ Σν .
By definition of measure νǫ(·), we have
dνǫ(x) =
, if x = xk, k = 1, · · · , N(ǫD, ǫ)
0, otherwise.
Theorem 3. Under the assumptions of the Theorem 2 let the N(ǫ·, ǫ) satisfies the con-
dition
νǫ(ǫ ·) ⇒
ν(·).
THE SMALL DEVIATION OF MANY-DIMENSIONAL DIFFUSION PROCESSES 9
Then η(ǫ, T ) ⇒ η(T ) if ǫ → 0 where η(T ) has Poisson distribution function with
parameter
a(T ) = exp
µ(s)ds
F (z)dν(z),
where F (z) =
f1i(z)c1i, c1i =
f1i(z)dz
and µ(t) is the function from Theorem 2.
Now we consider the case when the initial number and positions of processes are
defined by the random Poisson measure µ(·, ǫ) in ǫD:
P (µ(ǫA, ǫ) = k) =
mk(ǫA, ǫ)
−m(ǫA,ǫ)
where m(ǫ ·, ǫ) is finitely additive positive measure on ǫD for fixed ǫ.
We assign
g(ǫ) = exp
Theorem 4. Under the assumptions of the Theorem 2 we suppose that m(ǫ·, ǫ) holds
the condition
m(ǫB, ǫ)g(ǫ) = ν(B), B ∈ Σν .
Then η(ǫ, T ) ⇒ η(T ) if ǫ → 0 where η(T ) has the Poisson distribution function with
the parameter a(T ) from Theorem 3.
References
1. GrahamR.,Path integral formulation of general diffusion processes, Z.Phys.(1979),B
26,pp.281-290.
2. Onsager L. andMachlup S. Fluctuation and irreversible processes, I,II, Phys.Rev.(1953)
91,pp.1505-1512,1512-1515.
3. Li W. V.,Shao Q.-M., Gaussian processes:inequalities, small ball probabilities and
applications, in : Stochastic Processes:Theory and Methods, in : Handbook of Statistics,
vol.19, 2001, pp. 533-597.
4. Lifshits M.A., Asymptotic behavior of small ball probabilities, in Probab. Theory
and Math.Statist., Proc. VII International Vilnius Conference (1998), pp. 453-468.
5. Lifshits M., Simon T., Small deviations for fractional stable processes, Ann. I. H.
Poincare - PR 41 (2005) pp. 725-752.
6. Mogulskii A.A, The method of Fourier for determination of asymptotics of small
deviations of Wiener process, Siberian Math. Journ. (1982),v.22,no.3,pp.161-174.
7. Fujita T. and Kotani S., The Onsager - Machlup Function for diffusion processes,
J.Math.Kyoto Uneversity.- 1982.-vol.22,no.22.pp.131-153.
8. Zeitoni O., On the Onsager-Machlup functional of diffusion processes around non
C2 curves, Ann. Probab.(1989),vol.17, no.3, pp.1037-1054.
10 VITALII A. GASANENKO
9. Gasanenko V.A., The total asymptotic expansion of sojourn probability of diffusion
process in thin domain with moving boundaries, Ukraine Math. Journ. (1999),v.51, no.
9, pp.1155-1164.
10. Gasanenko V.A., The jump like processes in thin domain, Analytic questions of
stochastic system, Kyiv:Institute of Mathematics (1992), pp. 4-9.
11. Mihlin S.G. Partial differential linear equations (1977), Vyshaij shkola, Moskow,
12.L.Hörmander, The analysis of Linear Partial differential Operators III (1985),
Spinger-Verlag.
13.Fedullo A., Gasanenko V.A., Limit theorems for rarefaction of set of diffusion
processes by boundaries, Theory of Stochastic Processes vol. 11(27), no.1-2,2005, pp.23-
14.Fedullo A., Gasanenko V.A.,Limit theorems for number of diffusion processes,
which did not absorb by boundaries, Central European Journal of Mathematics 4(4),
2006, pp.624-634.
Institute of Mathematics, National Academy of Science of Ukraine, Tereshchenkivska
3, 252601, Kiev, Ukraine
E-mail address: gs@imath.kiev.ua or gsn@ckc.com.ua
ABSTRACT
  We lead the algorithm of expansion of sojourn probability of many-dimensional
diffusion processes in small domain. The principal member of this expansion
defines normalizing coefficient for special limit theorems.

<|endoftext|><|startoftext|>
Spin-orbit coupling effect on the persistent currents in mesoscopic
ring with an Anderson impurity
Guo-Hui Ding and Bing Dong
Department of Physics, Shanghai Jiao Tong University, Shanghai, 200240, China
(Dated: November 4, 2018)
Abstract
Based on the finite U slave boson method, we have investigated the effect of Rashba spin-
orbit(SO) coupling on the persistent charge and spin currents in mesoscopic ring with an Anderson
impurity. It is shown that the Kondo effect will decrease the magnitude of the persistent charge
and spin currents in this side-coupled Anderson impurity case. In the presence of SO coupling,
the persistent currents change drastically and oscillate with the strength of SO coupling. The SO
coupling will suppress the Kondo effect and restore the abrupt jumps of the persistent currents. It
is also found that a persistent spin current circulating the ring can exist even without the charge
current in this system.
PACS numbers: 73.23.Ra, 71.70.Ej, 72.25.-b
http://arxiv.org/abs/0704.0319v1
I. INTRODUCTION
Recently the spin-orbit(SO) interaction in semiconductor mesoscopic system has attracted
a lot of interest[1]. Due to the coupling of electron orbital motion with the spin degree of
freedom, it is possible to manipulate and control the electron spin in SO coupling system by
applying an external electrical field or a gate voltage, and it is believed that the SO effect
will play an important role in the future spintronic application. Actually, various interesting
effects resulting from SO coupling have already been predicted, such as the Datta-Das spin
field-effect transistor based on Rashba SO interaction[2] and the intrinsic spin Hall effect[3].
In this paper we shall focus our attention on the persistent charge current and spin cur-
rent in mesoscopic semiconductor ring with SO interaction. The existence of a persistent
charge current in a mesoscopic ring threaded by a magnetic flux has been predicted decades
ago[4], and has been extensively studied in theory[5, 6, 7, 8, 9] and also observed in various
experiments[10, 11, 12]. The reason that a persistent charge current exists may be inter-
preted as that the magnetic flux enclosed by the ring introduces an asymmetry between
electrons with clockwise and anticlockwise momentum, thus leads to a thermodynamic state
with a charge current without dissipation. For a mesoscopic ring with a texture like inho-
mogeneous magnetic field, D. Loss et al.[13] predicted that besides the charge current there
are also a persistent spin current. The origin of the persistent spin current can be related
to the Berry phase acquired when the electron spin precesses during its orbital motion. The
persistent spin current has also been studied in semiconductor system with Rashba SO cou-
pling term[14, 15, 16]. Recently it is shown that a semiconductor ring with SO coupling can
sustain a persistent spin current even in the absence of external magnetic flux[17].
For the system of a mesoscopic ring with a magnetic impurity, the persistent charge
current has been investigated in the context of a mesoscopic ring coupled with a quantum
dot[18, 19, 20, 21, 22, 23, 24], where the quantum dot acts as an impurity level and will
introduce charge or spin fluctuations to the electrons in the ring. The Kondo effect arising
from a localized electron spin interacting with a band of electrons will be essential in the
charge transport in the ring. But to our knowledge in these systems the SO effect hasn’t
been considered. It might be expected that the interplay between the Kondo effect and
the SO coupling in the ring can give new features in the persistent currents. In this paper
we shall address this problem and investigate the SO effect on persistent charge and spin
currents in the ring system with an Anderson impurity. The Anderson impurity can act as
a magnetic impurity when the impurity level is in single electron occupied state and as well
as a barrier potential in empty occupied regime.
The outline of this paper is as follows. In section II we introduce the model Hamiltonian
of the system and also the method of calculation by finite-U slave boson approach[25, 26,
27, 28]. In section III the results of persistent charge current and spin current are presented
and discussed. In Section IV we give the summary.
II. MESOSCOPIC RING WITH AN ANDERSON IMPURITY
The electrons in a closed ring with SO coupling of Rashba term can be described by
following Hamiltonian in the polar coordinates[14, 29]
Hring = ∆(−i
[(σx cosϕ+ σy sinϕ)(−i
) + h.c.] , (1)
where ∆ = h̄2/(2mea
2), a is the radius of the ring. αR will characterize the strength of
Rashba SO interaction. Φ is the external magnetic flux enclosed by the ring, and Φ0 =
2πh̄c/e is the flux quantum.
We can write the above Hamiltonian in terms of creation and annihilation operators of
electrons in the momentum space,
Hring =
mσcmσ + 1/2
[tm(c
m+1↓cm↑ + c
m−1↑cm↓) + h.c.] , (2)
where ǫm = ∆(m+φ)
2, tm = αR(m+φ),(m = 0,±1, · · · ,±M) with φ = Φ/Φ0. One can see
that the SO interaction causes the m mode electrons coupled with m + 1 and m − 1 mode
electrons and spin-flip process. We consider the system with a side-coupled impurity which
can be described by the Anderson impurity model,
σdσ + Und↑nd↓ . (3)
The tunneling between the impurity level and the ring are given by
Hd−ring = tD
(d†σcmσ + h.c) . (4)
Then the total Hamiltonian for the system should be
H = Hring +Hd +Hd−ring . (5)
In order to treat the strong on-site Coulomb interaction in the impurity level. we adopt
the finite-U slave boson approach[25, 26]. A set of auxiliary bosons e, pσ, d is introduced for
the impurity level, which act as projection operators onto the empty, singly occupied(with
spin up and spin down), and doubly occupied electron states on the impurity, respectively.
Then the fermion operators dσ are replaced by dσ → fσzσ, with zσ = e
†pσ + p
σ̄d. In order
to eliminate un-physical states, the following constraint conditions are imposed :
σpσ +
e†e+ d†d = 1, and f †σfσ = p
σpσ + d
†d(σ =↑, ↓). Therefore, the Hamiltonian can be rewritten
as the following effective Hamiltonian in terms of the auxiliary boson e, pσ, d and the pesudo-
fermion operators fσ:
Heff =
mσcmσ + 1/2
[tm(c
m+1↓cm↑ + c
m−1↑cm↓) + h.c.]
σfσ + Ud
σcmσ + h.c.) + λ
p†σpσ + e
†e+ d†d− 1)
λ(2)σ (f
σfσ − p
σpσ − d
†d) , (6)
where the constraints are incorporated by the Lagrange multipliers λ(1) and λ(2)σ . The first
constraint can be interpreted as a completeness relation of the Hilbert space on the impurity
level, and the second one equates the two ways of counting the fermion occupancy for a given
spin. In the framework of the finite-U slave boson mean field theory[25, 26], the slave boson
operators e, pσ, d and the parameter zσ are replaced by real c numbers. Thus the effective
Hamiltonian is given as
HMFeff =
mσcmσ + 1/2
[tm(c
m+1↓cm↑ + c
m−1↑cm↓) + h.c.]
ǫ̃dσf
σfσ +
(t̃Dσf
σcmσ + h.c.) + Eg , (7)
where t̃Dσ = tDzσ represents the renormalized tunnel coupling between the impurity and
the mesoscopic ring. zσ can be regarded as the wave function renormalization factor. ǫ̃dσ =
σ is the renormalized impurity level and Eg = λ
2+d2−1)−
d2) + Ud2 is an energy constant.
In this mean field approximation the Hamiltonian is essentially that of a non-interacting
system, hence the single particle energy levels can be calculated by numerical diagonalization
of the Hamiltonian matrix. Then the ground state of this system |ψ0 > can be constructed
by adding electrons to the lowest unoccupied energy levels consecutively . By minimizing
the ground state energy with respect to the variational parameters a set of self-consistent
equations can be obtained as in Ref.[27,28], and they can be applied to determine the
variational parameters in the effective Hamiltonian.
III. THE PERSISTENT CHARGE CURRENT AND SPIN CURRENT
In this section we will present the results of our calculation of the persistent charge current
and spin current circulating the mesoscopic ring. Since there is still some controversial in the
literature for the definition of the spin current operator in the ring system with SO coupling
term[30]. We give both the formula of charge and spin currents used in this paper explicitly.
It is easy to obtain that the ϕ component of electron velocity operator in this SO coupled
ring is
[2∆(−i
+ φ) + αR(σx cosϕ+ σy sinϕ)] . (8)
Thereby the charge current operator is define as Î = −evϕ, and in terms of creation and
annihilation operator it can be written as
Î = −
c†mσcmσ(m+ φ) + αR
m+1↓cm↑ + c
m−1↑cm↓)] . (9)
At zero temperature, the persistent charge current is given by the expectation value of the
above charge current operator in the ground state, I = 1
< ψ0|Î|ψ0 >, and it can also be
calculated from the expression
I = −c
< ψ0|
|ψ0 > , (10)
where Egs is the ground state energy.
In Fig.1 the persistent charge current vs. the enclosed magnetic flux is plotted for a
set of values for the SO coupling strength. Here we have taken the model parameters
∆ = 0.01, tD = 0.3, U = 2.0 and the total number of electrons N is around 100. In this
case one can obtain the Fermi energy of the system EF = 6.25 and the level spacing δ = 0.5
around the Fermi surface. We consider the energy level of the Anderson impurity is well
below the Fermi energy( with ǫd − EF = −1.0), therefore the Anderson impurity is in the
Kondo regime. One can see in Fig.1 that the characteristic features of persistent charge
current depends on the parity of the total number of electrons(N), and can be distinguished
by two cases with N odd and N even. This is attributed to the different occupation patterns
of the highest occupied single particle energy level in the mean field effective Hamiltonian.
The persistent charge current for the system with N +2 electrons is different from that with
N electrons by a π phase shift IN+2(φ) = IN(φ+ π). In case (I) where the electron number
is odd(N = 4n− 1 and N = 4n + 1), one electron is almost localized on the impurity level
and forming a singlet with electron cloud in the conducting ring. This phenomena leads to
the well known Kondo effect. Fig.1 shows that the Kondo effect decreases the magnitude
of the persistent charge current, and also makes its curve shape resemble sinusoidal. In the
presence of finite SO coupling(αR < ∆), the spin-up and spin-down electrons are coupled and
it causes the splitting of the twofold degenerated energy levels in the effective Hamiltonian.
It turns out that the Kondo effect is suppressed and the abrupt jumps of the persistent
charge current with similarity to that of ideal ring case appears. It is explained in Ref.[14]
that the jumps of the persistent charge current in the case of odd number of electrons are
due to a crossing of levels with opposite spin. In case (II) where N is even (N = 4n and
N = 4n+2), The Kondo effect is manifested that the magnitude of persistent charge current
is significantly suppressed compared with ideal ring case and the rounding of the jumps of
persistent charge current due to the level crossing. In the presence of finite SO coupling, the
persistent charge current decreases with increasing the SO coupling strength when αR < ∆.
Fig.2 displays the persistent charge current as a function of the SO coupling strength
αR at different enclosed magnetic flux. The persistent charge current exhibits oscillations
with increasing the value of αR for both the systems with even or odd number of electrons.
Therefore by tuning the SO coupling strength, the magnetic response of this system can
change from paramagnetic to diamagnetic and vice versa. It indicates that SO coupling
can play a important role in electron transport in this mesoscopic ring. The curve of the
persistent charge current for odd number of electrons shows discontinuity in its derivation,
this can be attributed the level crossing in the energy spectrum by changing αR. It is also
noted that the position of this discontinuity for odd N also corresponds to the peak or valley
in even N case.
Since the electron has the spin degree of freedom as well as the charge, the electron
motion in the ring may give rise to a spin current besides the charge current. Now we turn
to study the persistent spin current in the ground state. The spin current operator is defined
by Ĵv = (v
ϕσv + σvv
ϕ)/2, which can be written explicitly as
Ĵv =
{2∆(−i
+ φ)σv +
[(σx cosϕ+ σy sinϕ)σv + h.c.]} , (11)
Therefore the three component of spin current operator in terms of creation and annihi-
lation operators are given by
Ĵz =
m↑cm↑ − c
m↓cm↓)(m+ φ)] , (12)
Ĵx =
m↑cm↓ + c
m↓cm↑)(m+ φ) +
m+1σ + c
m−1σ)cmσ] , (13)
Ĵy =
[−2i∆
m↑cm↓ − c
m↓cm↑)(m+ φ)− i
m+1σ − c
m−1σ)cmσ] , (14)
The expectation value of the spin current Jv =
< ψ0|Ĵv|ψ0 >.
In our calculation we find that only the z component of the spin current is nonzero in the
ground state. Fig.3 shows the persistent spin current Jz vs. magnetic flux at different SO
coupling strength. The persistent spin current is a periodic function of the magnetic flux
φ, which has the even parity symmetry Jz(−φ) = Jz(φ) and also an additional symmetry
Jz(φ) = Jz(π−φ). It is noted that the persistent spin current has quite different dependence
behaviors on magnetic flux compared with the persistent charge current in Fig.1. In the
presence of finite SO coupling, the persistent spin current is nonzero both for the systems
with odd N and even N at zero magnetic flux, it indicates that a persistent spin current can
be induced solely by SO interaction without accompany a charge current. This phenomena
is also shown in Ref.[17] where a SO coupling/normal hybrid ring was considered.
In Fig.4 the persistent spin current Jz as a function of SO coupling strength is plotted.
In the absence of SO coupling αR = 0, the persistent spin current is exactly zero for both
even and odd number electron system. In the presence of SO coupling, The persistent spin
current becomes nonzero and shows oscillations with increasing αR. It can change from
positive to negative values or vice versa by tuning the SO coupling strength. The sign of the
persistent spin current also shows dependence on the enclosed magnetic flux. For the system
with odd N , there is abrupt jumps in the curve of persistent spin current at certain value of
αR, the reason for the jump is the same as that in the charge current, and is due to the level
crossing in the energy spectrum. It is noted that the position of the jump coincides with
that of the persistent charge current. This kind of characteristic feature of the persistent
currents might provide a useful way to detect the SO coupling effects in semiconductor ring
system.
IV. CONCLUSIONS
In summary, we have investigated the Rashba SO coupling effect on the persistent charge
current and spin current in a mesoscopic ring with an Anderson impurity. The Anderson
impurity leads to the Kondo effect and decreases the amplitude of the persistent charge and
spin current in the ring. In the semiconducting ring with SO interaction, the persistent
charge current changes significantly by tuning the SO coupling strength, e.g. from the
paramagnetic to diamagnetic current. Besides the persistent charge current, there also
exists a persistent spin current, which also oscillates with the SO coupling strength. It is
shown that at zero magnetic flux a persistent spin current can exist even without the charge
current. Since the persistent spin current can generate an electric field[31], one might expect
that experiments on semiconductor ring with Rashba SO coupling can detect the persistent
spin current.
Acknowledgments
This project is supported by the National Natural Science Foundation of China, the
Shanghai Pujiang Program, and Program for New Century Excellent Talents in University
(NCET).
[1] I. Zutic, J. Fabian, and S. Das Sarma, Rev. Mod. Phys. 76, 323 (2004).
[2] S. Datta and B. Das, Appl. Phys. Lett. 56, 665(1990).
[3] S. Murakami, N. Nagaosa, and S. C. Zhang, Science 301, 1348 (2003); J. Sinova, D. Culcer, Q.
Niu, N. A. Sinitsyn, T.Jungwirth, and A. H. MacDonald, Phys. Rev. Lett., 92, 126603(2004).
[4] M. Büttiker, Y. Imry, and R. Landauer, Phys. Lett.96A, 365 (1983).
[5] H. F. Cheung, Y. Gefen, E. K. Riedel, and W. H. Shih, Phys. Rev. B 37, 6050 (1988).
[6] D. Loss and P. Goldbart, Phys. Rev. B 43, 13762 (1991).
[7] G. Montambaux, H. Bouchiat, D. Sigeti, and R. Friesner, Phys. Rev. B 42, 7647 (1990).
[8] Y. Meir, Y. Gefen, and O. Entin-Wohlman, Phys. Rev. Lett. 63, 798 (1989).
[9] B. L. Altshuler, Y. Gefen, and Y. Imry, Phys. Rev. Lett. 66, 88(1991).
[10] L. P. Lévy, G. Dolan, J. Dunsmuir, and H. Bouchiat, Phys. Rev. Lett. 64, 2074 (1990).
[11] V. Chandrasekhar, R. A. Webb, M. J. Brady, M. B. Ketchen, W. J. Gallagher, and A. Klein-
sasser, Phys. Rev. Lett. 67, 3578 (1991).
[12] D. Mailly, C. Chapelier, and A. Benoit, Phys. Rev. Lett. 70, 2020 (1993).
[13] D. Loss, P. Goldbart, and A. V. Balatsky, Phys. Rev. Lett., 65, 1655 (1990); D. Loss and P.
M. Goldbart, Phys. Rev. B 45, 13544 (1992).
[14] J. Splettstoesser, M. Governale, and U. Zülicke, Phys. Rev. B 68, 165341 (2003).
[15] J. S. Shen and K. Chang, Phys. Rev. B 74,235315(2006).
[16] R. Citro and F. Romeo, Phys. Rev. B75,073306(2007).
[17] Q. F. Sun, X. C. Xie, and J. Wang, cond-mat/0605748.
[18] M. Büttiker and C. A. Stafford, Phys. Rev. Lett. 76, 495 (1996).
[19] V. Ferrari, G. Chiappe, E. V. Anda, and M. A. Davidovich, Phys. Rev. Lett. 82, 5088 (1999).
[20] I. Affleck and P. Simon, Phys. Rev. Lett.86, 2854 (2001); Phys. Rev. B64, 085308 (2001).
[21] K. Kang and S. C. Shin, Phys. Rev. Lett. 85, 5619 (2000).
[22] S. Y. Cho, K. Kang, C. K. Kim, and C. M. Ryu, Phys. Rev. B 64, 033314 (2001)
[23] H. P. Eckle, H. Johannesson, and C. A. Stafford, Phys. Rev. Lett. 87, 016602 (2001).
[24] H. Hu, G. M. Zhang, and L. Yu, Phys. Rev. Lett. 86, 5558 (2001).
[25] G. Kotliar and A. E. Ruckenstein, Phys. Rev. Lett. 57, 1362 (1986).
[26] V. Dorin and P. Schlottmann, Phys. Rev. B 47, 5095 (1993).
[27] B. Dong and X. L. Lei, Phys. Rev. B 63, 235306 (2001); Phys. Rev. B 65, R241304 (2002).
[28] G. H. Ding and B. Dong, Phys. Rev. B 67, 195327 (2003).
[29] F. E. Meijer, A. F. Morpurgo, and T. M. Klapwijk, Phys. Rev. B 66, 033107 (2002).
[30] Q. F. Sun and X. C. Xie, Phys. Rev. B 72, 245305(2005).
[31] F. Meir and D. Loss, Phys. Rev. Lett. 90, 167204 (2003).
http://arxiv.org/abs/cond-mat/0605748
0.0 0.2 0.4 0.6 0.8 1.0
0.0 0.2 0.4 0.6 0.8 1.0
0.0 0.2 0.4 0.6 0.8 1.0
(d)(b)
0.0 0.2 0.4 0.6 0.8 1.0
FIG. 1: The persistent charge current vs. magnetic flux for a set of values for the spin-orbit
coupling strength(αR/∆ = 0.0(solid line),0.5(dashed line), 0.7(dotted line),1.0(dash-dotted line)).
The total number of electrons N = 99 (a), 100(b), 101(c), 102(d). We take the other parameters
∆ = 0.01, td = 0.3, ǫd − EF = −1.0, U = 2.0 in the calculation. The persistent charge current is
measured in units of I0 = eN∆.
0 1 2 3 4
-0.15
-0.10
-0.05
0 1 2 3 4
-0.10
-0.05
(c)(a)
0 1 2 3 4
-0.10
-0.05
0 1 2 3 4
-0.10
-0.05
FIG. 2: The persistent charge current as a function of the spin-orbit coupling strength. The
magnetic flux (Φ/Φ0 = 0.125(solid line),0.25(dashed line), 0.375(dotted line)).
0.0 0.2 0.4 0.6 0.8 1.0
-0.15
-0.10
-0.05
0.0 0.2 0.4 0.6 0.8 1.0
-0.10
-0.05
0.0 0.2 0.4 0.6 0.8 1.0
-0.05
0.0 0.2 0.4 0.6 0.8 1.0
-0.05
FIG. 3: FIG.3: The persistent spin current Jz vs. magnetic flux for a set of values for the spin-
orbit coupling strength( with αR/∆ = 0.5(solid line),0.7(dashed line), 1.0(dotted line)). The panel
(a), (b), (c) and (d) corresponds the system with total number of electrons N = 99, 100, 101, 102,
respectively. The persistent spin current is measured in units of J0 = N∆, and we have taken the
other parameter values the same as that in Fig.1.
0 1 2 3 4
-0.10
-0.05
0 1 2 3 4
-0.10
-0.05
0 1 2 3 4
-0.10
-0.05
0 1 2 3 4
-0.10
-0.05
FIG. 4: FIG.4: The persistent spin current Jz as a function of the spin-orbit coupling strength.
The magnetic flux takes the value (Φ/Φ0 = 0.0(solid line),0.125(dashed line), 0.25(dotted line),
0.5(dash-dotted line)).
	introduction
	Mesoscopic ring with an Anderson impurity
	the persistent charge current and spin current 
	conclusions
	Acknowledgments
	References
ABSTRACT
  Based on the finite $U$ slave boson method, we have investigated the effect
of Rashba spin-orbit(SO) coupling on the persistent charge and spin currents in
mesoscopic ring with an Anderson impurity. It is shown that the Kondo effect
will decrease the magnitude of the persistent charge and spin currents in this
side-coupled Anderson impurity case. In the presence of SO coupling, the
persistent currents change drastically and oscillate with the strength of SO
coupling. The SO coupling will suppress the Kondo effect and restore the abrupt
jumps of the persistent currents. It is also found that a persistent spin
current circulating the ring can exist even without the charge current in this
system.

<|endoftext|><|startoftext|>
Introduction . . . . . . . . . . . . . . . . . . . . . . . p. 2
2. The Standard Diffusion Equation . . . . . . . . . . . . . . p. 4
3. The Time-Fractional Diffusion Equation . . . . . . . . . . . p. 8
4. The Cauchy Problem for the Time-Fractional Diffusion Equation p.10
5. The Signalling Problem for the Time-Fractional Diffusion Equation p.13
6. The Cauchy Problem for the Symmetric Space-Fractional
Diffusion Equation . . . . . . . . . . . . . . . . . . . . p.15
7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . p.21
A. The Riemann-Liouville Fractional Calculus . . . . . . . . . p.22
B. The Stable Probability Distributions . . . . . . . . . . . . p.31
References . . . . . . . . . . . . . . . . . . . . . . . p.41
1This paper is based on an invited talk given by Francesco Mainardi at the International
Workshop on Econophysics held at Bolyai College, Eötvös University, Budapest, on
July 21-27, 1997. The paper was originally edited as a contribution for the book J.
Kertesz and I. Kondor (Editors), Econophysics: an Emerging Science, Kluwer
Academic Publishers, Dordrecht (NL) that should contain selected papers presented at
that Workshop and should have appeared in 1998 or 1999. Unfortunately the book was
not published. The present e-print is a revised version (with up-date annotations and
references) of that unpublished contribution, but essentially represents our knowledge of
that early time.
http://arXiv.org/abs/0704.0320v1
Abstract
Fractional calculus allows one to generalize the linear, one-dimensional,
diffusion equation by replacing either the first time derivative or the second
space derivative by a derivative of fractional order. The fundamental
solutions of these generalized diffusion equations are shown to provide
probability density functions, evolving on time or variable in space, which
are related to the peculiar class of stable distributions. This property is
a noteworthy generalization of what happens for the standard diffusion
equation and can be relevant in treating financial and economical problems
where the stable probability distributions are known to play a key role.
1 Introduction
Non-Gaussian probability distributions are becoming more common as data
models, especially in economics where large fluctuations are expected. In
fact, probability distributions with heavy tails are often met in economics
and finance, which suggests to enlarge the arsenal of possible stochastic
models by non-Gaussian processes. This conviction started in the early
sixties after the appearance of a series of papers by Mandelbrot and
his associates, who point out the importance of non-Gaussian probability
distributions, formerly introduced by Pareto and Lévy, and related scaling
properties, to analyse economical and financial variables, as reported in
the recent book by Mandelbrot (1997). Some examples of such variables
are common stock prices changes, changes in other speculative prices, and
interest rate changes. In this respect many works by different authors have
recently appeared, see e.g. the recent books by Bouchaud & Potter (1997),
Mantegna & Stanley (1998) and the references therein quoted.
It is well known that the fundamental solution (or Green function) of
the Cauchy problem for the standard linear diffusion equation provides at
any time the probability density function (pdf) in space of the Gauss (or
normal) law. This law exhibits all moments finite thanks to its exponential
decay at infinity. In particular, the space variance of the Green function
is proportional to the first power of time, a noteworthy property that
can be understood by means of an unbiased random walk model for the
Brownian motion, see e.g. Feller (1957). Less known is the property for
which the fundamental solution of the Signalling problem for the same
diffusion equation, provides at any position a unilateral pdf in time, known
as Lévy law, using the terminology of Feller (1966-1973). Because of its
algebraic decay at infinity as t−3/2 , this law has all moments of integer
order divergent, and consequently its expectation value and variance are
infinite.
Both the Gauss and Lévy laws belong to the general class of stable probability
distributions, which are characterized by an index α (0 < α ≤ 2), called
index of stability or characteristic exponent. In particular, the index of the
Gauss law is 2 , whereas that of the Lévy law is 1/2 .
In this paper we consider two different generalizations of the diffusion
equation by means of fractional calculus, which allows us to replace either the
first time derivative or the second space derivative by a suitable fractional
derivative. Correspondingly, the generalized equation will be referred to
as the time-fractional diffusion equation or the symmetric, space-fractional
diffusion equation. Here we show how the fundamental solutions of this
equation for the Cauchy and Signalling problems provide probability density
functions related to certain stable distributions, so providing a natural
generalization of what occurs for the standard diffusion equation.
The plan of the paper is as follows. First of all, for the sake of convenience
and completeness, we provide the essential notions of Riemann-Liouville
Fractional Calculus and Lévy Stable Probability Distributions in Appendix
A and B, respectively.
In Section 2, we recall the basic results for the standard diffusion
equation concerning the fundamental solutions of the Cauchy and Signalling
problems. In particular we provide the derivation of these solutions by the
Fourier and Laplace transforms and the interpretation in terms of Gauss
and Lévy stable pdf , respectively.
In Section 3, we consider the time-fractional diffusion equation and we
formulate for it the basic Cauchy and Signalling problems to be treated in the
subsequent two sections. Here we adopt the Riemann-Liouville approach to
Fractional Calculus, and the related definition for the Caputo time-fractional
derivative of a causal function of time.
In Section 4, we solve the Cauchy problem for the time-fractional diffusion
equation by using the technique of Fourier transform and we derive the
corresponding fundamental solution in terms of a special function of Wright
type in the similarity variable. In this case the solution can be interpreted
as a noteworthy symmetric pdf in space with all moments finite, evolving
in time. In particular, its space variance turns out to be proportional to a
power of time equal to the order of the time-fractional derivative.
In Section 5, we derive the fundamental solution for the Signalling problem
of the time-fractional diffusion equation by using the technique of Laplace
transform. In this case the solution, still expressed in terms of a special
function of Wright type, can be interpreted as a unilateral stable pdf in
time, depending on position, with index of stability given by half of the
order of the time-fractional derivative.
In Section 6, we consider the symmetric, space-fractional diffusion equation.
Here we adopt the Riesz approach to Fractional Calculus, and the related
definition for the symmetric space-fractional derivative of a function of a
single space variable. Here we treat the Cauchy problem by technique
of Fourier transform and we derive the series representation of the
corresponding Green function. In this case the fundamental solution is
interpreted in terms of a symmetric stable pdf in space, evolving in time,
with index of stability given by the order of the space-fractional derivative.
To approximate such evolution we propose a random walk model, discrete
in space and time, which is based on the Grünwald-Letnikov approximation
of the fractional derivative.
Finally, Section 7 is devoted to conclusions and remarks on related work.
2 The standard diffusion equation
For the standard diffusion equation we mean the linear partial differential
equation
u(x, t) = D
u(x, t) , u = u(x, t) , (2.1)
where D denotes a positive constant with the dimensions L2 T−1 , x and t
are the space-time variables, and u = u(x, t) is the field variable, which is
assumed to be a causal function of time, i.e. vanishing for t < 0 .
The typical physical phenomenon related to such an equation is the heat
conduction in a thin solid rod extended along x , so the field variable u is
the temperature.
In order to guarantee the existence and the uniqueness of the solution,
we must equip (1.1) with suitable data on the boundary of the space-time
domain. The basic boundary-value problems for diffusion are the so-called
Cauchy and Signalling problems. In the Cauchy problem, which concerns
the space-time domain −∞ < x < +∞ , t ≥ 0 , the data are assigned at
t = 0+ on the whole space axis (initial data). In the Signalling problem,
which concerns the space-time domain x ≥ 0 , t ≥ 0 , the data are assigned
both at t = 0+ on the semi-infinite space axis x > 0 (initial data) and at
x = 0+ on the semi-infinite time axis t > 0 (boundary data); here, as mostly
usual, the initial data are assumed to be vanishing.
Denoting by g(x) and h(t) two given, sufficiently well-behaved functions, the
basic problems are thus formulated as following:
a) Cauchy problem
u(x, 0+) = g(x) , −∞ < x < +∞ ; u(∓∞, t) = 0 , t > 0 ; (2.2a)
b) Signalling problem
u(x, 0+) = 0 , x > 0 ; u(0+, t) = h(t) , u(+∞, t) = 0 , t > 0 . (2.2b)
Hereafter, for both the problems, we derive the classical results which will be
properly generalized for the fractional diffusion equation in the subsequent
sections.
Let us begin with the Cauchy problem. It is well known that this initial value
problem can be easily solved making use of the Fourier transform and its
fundamental solution can be interpreted as a Gaussian pdf in x. Adopting
the notation g(x) ÷ ĝ(κ) with κ ∈ R and
ĝ(κ) = F [g(x)] =
e+iκx g(x) dx ,
g(x) = F−1 [ĝ(κ)] = 1
e−iκx ĝ(κ) dκ ,
the transformed solution satisfies the ordinary differential equation of the
first order (
+ κ2 D
û(κ, t) = 0 , û(κ, 0+) = ĝ(κ) , (2.3)
and consequently it turns out to be
û(κ, t) = ĝ(κ) e−κ
2 D t . (2.4)
Then, introducing
Gdc (x, t) ÷ Ĝdc (κ, t) = e−κ
2 D t , (2.5)
where the upper index d refers to (standard) diffusion, the required solution,
obtained by inversion of (2.4), can be expressed in terms of the space
convolution u(x, t) =
−∞ Gdc (ξ, t) g(x − ξ) dξ , where
Gdc (x, t) =
t−1/2 e−x
2/(4D t) . (2.6)
Here Gdc (x, t) represents the fundamental solution (or Green function) of
the Cauchy problem, since it corresponds to g(x) = δ(x) . It turns out
to be a function in x , even and normalized, i.e. Gdc (x, t) = Gdc (|x|, t) and∫ +∞
−∞ Gdc (x, t) dx = 1 . We also note the identity
|x| Gdc (|x|, t) =
Md(ζ) , (2.7)
where ζ = |x|/(
D t1/2) is the well-known similarity variable and
Md(ζ) =
2/4 . (2.8)
We note that Md(ζ) satisfies the normalization condition
d(ζ) dζ = 1 .
The interpretation of the Green function (2.6) in probability theory is
straightforward since we easily recognize
Gdc (x, t) = pG(x;σ) :=
2/(2σ2) , σ2 = 2D t , (2.9)
where pG(x;σ) denotes the well-known Gauss or normal pdf spread out
over all real x (the space variable), whose moment of the second order, the
variance, is σ2 . The associated cumulative distribution function (cdf) is
known to be
PG(x;σ) :=
′;σ) dx′ =
1 + erf
, (2.10)
where erf (z) := (2/
0 exp (−u2) du denotes the error function.
Furthermore, the moments of even order of the Gauss pdf turn out to be∫ +∞
2n pG(x;σ) dx = (2n − 1)!!σ2n , so
x2n Gdc (x, t) dx = (2n − 1)!! (2D t)n , n = 1, 2, . . . . (2.11)
Let us now consider the Signalling problem. This initial-boundary value
problem can be easily solved by making use of the Laplace transform.
Adopting the notation h(t) ÷ h̃(s) with s ∈ C and
h̃(s) = L [h(t)] =
e−st h(t) dt ,
h(t) = L−1
h̃(t)
est h̃(s) ds ,
where Br denotes the Bromwich path, the transformed solution of the
diffusion equation satisfies the ordinary differential equation of the second
order
ũ(x, s) = 0 , ũ(0+, s) = h̃(s) , ũ(+∞, s) = 0 . (2.12)
and consequently it turns out to be
ũ(x, s) = h̃(s) e−(x/
D) s1/2 . (2.13)
Then introducing
Gds (x, t) ÷ G̃ds (x, s) = e−(x/
D) s1/2 , (2.14)
the required solution, obtained by inversion of (2.13), can be expressed in
terms of the time convolution, u(x, t) =
0 Gds (x, τ)h(t − τ) dτ , where
Gds (x, t) =
t−3/2 e−x
2/(4D t) . (2.15)
Here Gds (x, t) represents the fundamental solution (or Green function) of the
Signalling problem, since it corresponds to h(t) = δ(t) . We note that
Gds (x, t) = pLS(t;µ) :=
2π t3/2
e−µ/(2t) , t ≥ 0 , µ = x
, (2.16)
where pLS(t;µ) denotes the one-sided Lévy-Smirnov pdf spread out over all
non negative t (the time variable). The associated cdf is, see e.g. Feller
(1966-1971) and Prüss (1993),
PL(t;µ) :=
′;µ) dt′ = erfc
= erfc
, (2.17)
where erfc (z) := 1 − erf (z) denotes the complenatary error function.
The Lévy-Smirnov pdf has all moments of integer order infinite, since it
decays at infinity as t−3/2 . However, we note that the absolute moments of
real order ν are finite only if 0 ≤ ν < 1/2 . In particular, for this pdf the mean
is infinite, for which we can take the median as expectation value. From
PLs(tmed;µ) = 1/2 , it turns out that tmed ≈ 2µ , since the complementary
error function gets the value 1/2 as its argument is approximatively 1/2.
We note that in the common domain x > 0 , t > 0 the Green functions of
the two basic problems satisfy the identity
xGdc (x, t) = tGds (x, t) , (2.18)
that we refer to as the reciprocity relation between the two fundamental
solutions of the diffusion equation. Furthermore, in view of (2.7) and (2.18)
we recognize the role of the function of the similarity variable, Md(ζ) ,
in providing the two fundamental solutions; we shall refer to it as to the
normalized auxiliary function of the diffusion equation for both the Cauchy
and Signalling problems.
3 The time-fractional diffusion equation
By the time-fractional diffusion equation we mean the linear evolution
equation obtained from the classical diffusion equation by replacing the first-
order time derivative by a fractional derivative (in the Caputo sense) of order
α with 0 < α ≤ 2. In our notation it reads
, u = u(x, t) , 0 < α ≤ 2 , (3.1)
where D denotes a positive constant with the dimensions L2 T−α . From
Appendix A we recall the definition of the Caputo fractional derivative of
order α > 0 for a (sufficiently well-behaved) causal function f(t) , see (A.9),
Dα∗ f(t) :=
Γ(m − α)
(t − τ)m−α f (m)(τ) dτ , (3.2)
where m = 1, 2, . . . , and 0 ≤ m − 1 < α ≤ m . According to (3.2) we thus
need to distinguish the cases 0 < α ≤ 1 and 1 < α ≤ 2 . In the the latter case
(3.1) may be seen as a sort of interpolation between the standard diffusion
equation and the standard wave equation. Introducing
Φλ(t) :=
tλ−1+
, λ > 0 , (3.3)
where the suffix + is just denoting that the function is vanishing for t < 0 ,
we easily recognize that the equation (3.1) assumes the explicit forms :
if 0 < α ≤ 1 ,
Φ1−α(t) ∗
Γ(1 − α)
(t − τ)−α
dτ = D
; (3.4)
if 1 < α ≤ 2 ,
Φ2−α(t) ∗
Γ(2 − α)
(t − τ)1−α
dτ = D ∂
. (3.5)
Extending the classical analysis for the standard diffusion equation (2.1) to
the above integro-differential equations (3.4-5), the Cauchy and Signalling
problems are thus formulated as in equations (2.2), i.e.
a) Cauchy problem
u(x, 0+) = g(x) , −∞ < x < +∞ ; u(∓∞, t) = 0 , t > 0 ; (3.6a)
b) Signalling problem
u(x, 0+) = 0 , x > 0 ; u(0+, t) = h(t) , u(+∞, t) = 0 , t > 0 . (3.6b)
However, if 1 < α ≤ 2 , the presence in (3.5) of the second order time
derivative of the field variable requires to specify the initial value of the first
order time derivative ut(x, 0
+) , since in this case two linearly independent
solutions are to be determined. To ensure the continuous dependence of our
solution on the parameter α also in the transition from α = 1− to α = 1+ ,
we agree to assume ut(x, 0
+) = 0 .
We recognize that our fractional diffusion equation (3.1), when subject to
the conditions (3.6), is equivalent to the integro-differential equation
u(x, t) = g(x) +
(t − τ)α−1
dτ , (3.7)
where 0 < α ≤ 2 . Such integro-differential equation has been investigated
by several authors, including Schneider & Wyss (1989), Fujita (1990), Prüss
(1993) and Engler (1997).
In view of our subsequent analysis we find it convenient to put
, 0 < ν < 1 . (3.8)
In fact the analysis of the time-fractional diffusion equation turns out to
be easier if we adopt as a key parameter the half of the order of the
time-fractional derivative. In future we shall provide the symbol α with
other relevant meanings, as the index of stability of a stable probability
distribution or the order of the space derivative in the space-fractional
diffusion equation.
Henceforth, we agree to insert the parameter ν in the field variable, i.e.
u = u(x, t; ν) . By denoting the Green functions of the Cauchy and Signalling
problems by Gc(x, t; ν) and Gs(x, t; ν) , respectively, the solutions of the two
basic problems are obtained by a space or time convolution, u(x, t; ν) =∫ +∞
−∞ Gc(ξ, t; ν) g(x−ξ) dξ , u(x, t; ν) =
0 Gs(x, τ ; ν)h(t−τ) dτ , respectively.
It should be noted that Gc(x, t; ν) = Gc(|x|, t; ν) , since the Green function
turns out to be an even function of x .
In the following two sections we shall compute the two fundamental solutions
with the same techniques (based on Fourier and Laplace transforms) used
for the standard diffusion equation and we shall provide their interpretation
in terms of probability distributions. Most of the presented results are based
on the papers by Mainardi (1994), (1995), (1996), (1997) and by Mainardi
& Tomirotti (1995), (1997).
4 The Cauchy problem for the time-fractional
diffusion equation
For the fractional diffusion equation (3.1) subject to (3.6a) the application
of the Fourier transform leads to the ordinary differential equation of order
α = 2ν ,
+ κ2 D
û(κ, t; ν) = 0 , û(κ, 0+; ν) = ĝ(κ) , (4.1)
Using the results of Appendix A, see (A.22-30), the transformed solution is
û(κ, t; ν) = ĝ(κ)E2ν
−κ2 D t2ν
, (4.2)
where E2ν(·) denotes the Mittag-Leffler function of order 2ν , and conse-
quently for the Green function we have
Gc(x, t; ν) = Gc(|x|, t; ν) ÷ Ĝc(k, t; ν) = E2ν
−κ2D t2ν
. (4.3)
Since the Green function is a real and even function of x, its (exponential)
Fourier transform can be expressed in terms of the cosine Fourier transform
and thus is related to its spatial Laplace transform as follows
Ĝc(k, t; ν) = 2
Gc(x, t; ν) cos κx dx =
G̃c(s, t; ν)
s=+ik
+ G̃c(s, t; ν)
s=−ik
(4.4)
Indeed, a split occurs also in (4.3) according to the duplication formula for
the Mittag-Leffler function, see (A.26),
Ĝc(k, t; ν) = E2ν(−κ2 D t2ν) =
[Eν(+iκ
D tν) + Eν(−iκ
D tν)]/2 .
(4.5)
When ν 6= 1/2 the inversion of the Fourier transform in (4.5) cannot be
obtained by using a standard table of Fourier transform pairs; however, for
any ν ∈ (0, 1) such inversion can be achieved by appealing to the Laplace
transform pair (A.37) with r = |x| , and s = ±iκ . In fact, taking into
account the scaling property of the Laplace transform, we obtain from (4.5)
and (A.37)
Gc(|x|, t; ν) =
( |x|√
, (4.6)
where M(ζ; ν) is the special function of Wright type, defined by (A.31-33),
, (4.7)
the similarity variable. We note the identity
|x| Gc(|x|, t; ν) =
M(ζ; ν) , (4.8)
which generalizes to the time-fractional diffusion equation the identity (2.7)
of the standard diffusion equation. Since
0 M(ζ; ν) dζ = 1 , see (A.40),
the function M(ζ; ν) is the normalized auxiliary function of the fractional
diffusion equation.
We note that for the time-fractional diffusion equation the fundamental
solution of the Cauchy problem is still a bilateral symmetric pdf in x (with
two branches, for x > 0 and x < 0 , obtained one from the other by
reflection), but is no longer of Gaussian type if ν 6= 1/2 . In fact, for large
|x| each branch exhibits an exponential decay in the ”stretched” variable
|x|1/(1−ν) as can be derived from the asymptotic representation (A.36) of the
auxiliary function M(·; ν) . In fact, by using (4.7-8) and (A.36), we obtain
Gc(x, t; ν) ∼ a∗(t) |x|(ν−1/2)/(1−ν) exp
−b∗(t)|x|1/(1−ν)
, (4.9)
as |x| → ∞ , where a∗(t) and b∗(t) are certain positive functions of time.
Furthermore, the exponential decay in x provided by (4.9) ensures that all
the absolute moments of positive order of Gc(x, t; ν) are finite. In particular,
using (4.8) and (A.39) it turns out that the moments (of even order) are
x2n Gc(x, t; ν) dx =
Γ(2n + 1)
Γ(2νn + 1)
(Dt2ν)n , n = 0 , 1 , 2 , . . . (4.10)
The formula (4.10) provides a generalization of the corresponding formula
(2.11) valid for the standard diffusion equation, ν = 1/2 . Furthermore, we
recognize that the variance associated to the pdf is now proportional to Dt2ν ,
which for ν 6= 1/2 implies a phenomenon of anomalous diffusion. According
to a usual terminology in statistical mechanics, the anomalous diffusion is
said to be slow if 0 < ν < 1/2 and fast if 1/2 < ν < 1 .
In Figure 1, as an example, we compare versus |x| , at fixed t , the
fundamental solutions of the Cauchy problem with different ν (ν =
1/4 , 1/2 , 3/4 ). We consider the range 0 ≤ |x| ≤ 4 and assume D = t = 1 .
0 1 2 3 4
Figure 1: The Cauchy problem for the time-fractional diffusion equation.
The fundamental solutions versus |x| with a) ν = 1/4 , b) ν = 1/2 , c)
ν = 3/4 .
We note the different behaviour of the pdf in the cases of slow diffusion (ν =
1/4 ) and fast diffusion (ν = 3/4 ) with respect to the Gaussian behaviour
of the standard diffusion (ν = 1/2). In the limiting cases ν = 0 and ν = 1
we have
Gc(x, t; 0) =
e−|x|
, Gc(x, t; 1) =
δ(x −
D t) + δ(x +
. (4.11)
We also recognize from the appendix B that for 1/2 ≤ ν < 1 any branch
of the fundamental solution is proportional to the corresponding positive
branch of an extremal stable pdf with index of stability α = 1/ν , which
exhibits an exponential decay at infinity. In fact, applying (B.29) with
α = 1/ν and y = ζ = |x|/(
Dtν) , from (4.7-8) we obtain
Gc(|x|, t; ν) =
|x|/(
D tν) ; − (2 − 1/ν)
· p1/ν (|x|; +1, 1, 0) , 1 < 1/ν ≤ 2 .
(4.12)
We also note that the stable distribution in (4.12) satisfies the condition
p1/ν (x; +1, 1, 0) dx = ν , 1 < 1/ν ≤ 2 . (4.13)
5 The Signalling problem for the time-fractional
diffusion equation
For the fractional diffusion equation (3.1) subject to (3.6b) the application
of the Laplace transform leads to the ordinary differential equation of order
ũ(x, s; ν) , ũ(0+, s; ν) = h̃(s) , ũ(+∞, s; ν) = 0 . (5.1)
Thus the transformed solution reads
ũ(x, s; ν) = h̃(s) e−(x/
D) sν , (5.2)
so for the Green function we have
Gs(x, t; ν) ÷ G̃s(x, s; ν) = e−(x/
D) sν . (5.3)
When ν 6= 1/2 the inversion of this Laplace transform cannot be obtained by
looking in a standard table of Laplace transform pairs. Also here we appeal
to a Laplace transform pair related to the Wright-type function M(ζ; ν). In
fact, using (A.40) with r = t , and taking into account the scaling property
of the Laplace transform, we obtain
Gs(x, t; ν) = ν
D t1+ν
. (5.4)
Introducing the similarity variable ζ = x/(
Dtν) , we recognize the identity
tGs(x, t; ν) = ν ζ M(ζ; ν) , (5.5)
which is the counterpart for the Signalling problem of the identity (4.8) valid
for the Cauchy problem.
Comparing (5.5) with (4.8) we obtain the reciprocity relation between the
two fundamental solutions of the time-fractional diffusion equation, in the
common domain x > 0 , t > 0 ,
2ν xGc(x, t; ν) = tGs(x, t; ν) . (5.6)
The interpretation of Gs(x, t; ν) as a one-sided stable pdf in time is
straightforward: in this respect we need to apply (B.28), with index of
stability α = ν and variable y = ζ−1/ν = t (
D/x)1/ν , in (5.5). We obtain
Gs(x, t; ν) =
; − ν
 = pν (t; −1, 1, 0) . (5.7)
In Figure 2, as an example, we compare versus t , at fixed x , the fundamental
solutions of the Signalling problem with different ν (ν = 1/4 , 1/2 , 3/4 ). We
consider the range 0 ≤ t ≤ 3 and assume D = x = 1 .
We note the different behaviour of the pdf in the cases of slow diffusion
(ν = 1/4 ) and fast diffusion (ν = 3/4 ) with respect to the Lévy pdf for the
standard diffusion (ν = 1/2). In the limiting cases ν = 0 , 1 , we have
Gs(x, t; 0) = δ(t) , Gs(x, t; 1) = δ(t − x/
D) . (5.8)
0 1 2 3
Figure 2: The Signalling problem for the time-fractional diffusion equation.
The fundamental solutions versus t with a) ν = 1/4 , b) ν = 1/2 , c)
ν = 3/4 .
6 The Cauchy problem for the symmetric space-
fractional diffusion equation
The symmetric space-fractional diffusion equation is obtained from the
classical diffusion equation by replacing the second-order space derivative by
a symmetric space-fractional derivative (explained below) of order α with
0 < α ≤ 2 . In our notation we write this equation as
∂|x|α
, u = u(x, t;α) , x ∈ R , t ∈ R+0 , 0 < α ≤ 2 , (6.1)
where D is a positive coefficient with the dimensions Lα T−1 . The
fundamental solution for the Cauchy problem, Gc(x, t;α) is the solution of
(6.1), subject to the initial condition u(x, 0+;α) = δ(x) .
The symmetric space-fractional derivative of any order α > 0 of a sufficiently
well-behaved function φ(x) , x ∈ R , may be defined as the pseudo-
differential operator characterized in its Fourier representation by
d|x|α
φ(x) ÷ −|κ|α φ̂(κ) , x , k ∈ R , α > 0 . (6.2)
According to a usual terminology, −|κ|α is referred to as the symbol of our
pseudo-differential operator, the symmetric space-fractional derivative, of
order α . Here, we have adopted the notation introduced by Zaslavski, see
e.g. Saichev & Zaslavski (1997).
In order to properly introduce this kind of fractional derivative we need
to consider a peculiar approach to fractional calculus different from the
Riemann-Liouville one, already treated in Appendix A. This approach is
indeed based on the so-called Riesz potentials (or integrals), that we prefer
to consider later.
At first, let us see how things become highly transparent by using an
heuristic argument, originally due to Feller (1952). The idea is to start
from the positive definite differential operator
A := −
÷ κ2 = |κ|2 , (6.3)
whose symbol is |κ|2 , and form positive powers of this operator as pseudo-
differential operators by their action in the Fourier-image space, i.e.
Aα/2 :=
= |κ|α α > 0 . (6.4)
Thus the operator −Aα/2 can be interpreted as the required fractional
derivative, i.e.
Aα/2 ≡ − d
d|x|α
, α > 0 . (6.5)
We note that the operator just defined must not be confused with a power
of the first order differential operator d
for which the symbol is −iκ .
After the above considerations it is straightforward to obtain the Fourier
image of the Green function of the Cauchy problem for the space-fractional
diffusion equation. In fact, applying the Fourier transform to the equation
(6.1), subject to the initial condition u(x, 0+;α) = δ(x) , and accounting for
(6.2), we obtain
Gc(x, t;α) = Gc(|x|, t;α) ÷ Ĝc(k, t;α) = e−D t |κ|
, 0 < α ≤ 2 . (6.6)
We easily recognize that the Fourier transform of the Green function
corresponds to the canonic form of a symmetric stable distribution of index
of stability α and scaling factor γ = (Dt)1/α , see (B.8). Therefore we have
Gc(x, t;α) = pα(x; 0, γ, 0) , γ = (Dt)1/α . (6.7)
For α = 1 and α = 2 we easily obtain the explicit expressions of the
corresponding Green functions since in these cases they correspond to the
Cauchy and Gauss distributions,
Gc(x, t; 1) =
x2 + (D t)2
, (6.8)
see (B.5), and
Gc(x, t; 2)) =
2/(4D t) , (6.9)
in agreement with (2.6).
We easily recognize that
(D t)1/α
(6.10)
is the similarity variable for the space-fractional diffusion equation, in terms
of which we can express the Green function for any α ∈ (0, 2] . Indeed, we
recognize that
Gc(x, t;α) =
(D t)1/α
qα(η; 0) , (6.11)
where qα(η; 0) denotes the symmetric stable distribution of order α with
Feller-type characteristic function, see (B.14-15). Now we can express the
Green function using the Feller series expansions (B.21-22) with θ = 0 . We
obtain:
for 0 < α < 1 ,
qα(η; 0) = −
Γ(nα + 1)
, (6.12a)
for 1 < α ≤ 2 ,
qα(η; 0) =
(−1)m
Γ[(2m + 1)/α]
(2m)!
η2m . (6.12b)
In the limiting case α = 1 the above series reduce to geometrical series and
therefore are no longer convergent in all of C . In particular, they represent
the expansions of the function q1(η; 0) = 1/[π(1+η
2)] , convergent for η > 1
and 0 < η < 1 , respectively.
We also note that for any α ∈ (0, 2] the functions qα(η; 0) exhibit at the
origin the value qα(0; 0) = Γ(1/α)/(π α) , and at the queues, excluding the
Gaussian case α = 2 , the algebraic asymptotic behaviour, as η → ∞ ,
qα(η; 0) ∼
Γ(α + 1) sin
η−(α+1) , 0 < α < 2 . (6.13)
In Figure 3, as an example, we compare versus x , at fixed t , the fundamental
solutions of the Cauchy problem with different α (α = 1/2 , 1 , 3/2 , 2 ). We
consider the range −6 ≤ x ≤ +6 and assume D = t = 1 .
-6 -4 -2 0 2 4 6
-6 -4 -2 0 2 4 6
Figure 3: The Cauchy problem for the simmetric space-fractional diffusion
equation. The fundamental solutions versus x : plate a) α = 1/2
(continuous line), α = 1 (dashed line); plate b) α = 3/4 (continuous line),
α = 2 (dashed line).
Let us now express more properly our operator (6.4) (with symbol |κ|α)
as inverse of a suitable integral operator Iα whose symbol is |κ|−α . This
operator can be found in the approach by Marcel Riesz to Fractional
Calculus, see e.g. Samko, Kilbas & Marichev (1987-1993) and Rubin (1996).
We recall that for any α > 0 , α 6= 1 , 3 , 5 , . . . and for a sufficiently well-
behaved function φ(x) , x ∈ R , the Riesz integral or Riesz potential Iα and
its image in the Fourier domain read
Iα φ(x) :=
2Γ(α) cos(πα/2)
|x − ξ|α−1 φ(ξ) dξ ÷ φ̂(κ)
. (6.14)
On its turn, the Riesz potential can be written in terms of two Weyl integrals
Iα± according to
Iα φ(x) =
2 cos(πα/2)
Iα+φ(x) + I
−φ(x)
, (6.15)
where 
Iα+ φ(x) :=
(x − ξ)α−1 φ(ξ) dξ ,
Iα− φ(x) :=
(ξ − x)α−1 φ(ξ) dξ .
(6.16)
Then, at least in a formal way, the space-fractional derivative (6.2) turns
out to be defined as the opposite of the (left) inverse of the Riesz fractional
integral, i.e.
d|x|α
φ(x) := −I−α φ(x) = −
2 cos(πα/2)
I−α+ φ(x) + I
− φ(x)
. (6.17)
Notice that (6.14) and (6.17) become meaningless when α is an integer odd
number. However, for our range of interest 0 < α ≤ 2 , the particular case
α = 1 can be singled out since the corresponding Green function is already
known, see (6.8). Thus, excluding the case α = 1 , our space-fractional
diffusion equation (6.1) can be re-written, x ∈ R , t ∈ R+0 , as
= −D I−α u , u = u(x, t;α) , 0 < α ≤ 2 , α 6= 1 , (6.18)
where the operator I−α is defined by (6.16-17).
Here, in order to evaluate the fundamental solution of the Cauchy problem,
interpreted as a probability density, we propose a numerical approach,
original as far as we know, based on a (symmetric) random walk model,
discrete in space and time, see also Gorenflo & Mainardi (1998a), Gorenflo
& Mainardi (1998b) and Gorenflo, De Fabritiis & Mainardi (1999). We shall
see how things become highly transparent, in that we properly generalize
the classical random-walk argument of the standard diffusion equation
to our space-fractional diffusion equation (6.18). So doing we are in
position to provide a numerical simulation of the related (symmetric) stable
distributions in a way analogous to the standard one for the Gaussian law.
The essential idea is to approximate the left inverse operators I−α± by the
Grünwald-Letnikov scheme, on which the reader can inform himself in the
treatises on fractional calculus, see e.g. Oldham & Spanier (1974), Samko,
Kilbas & Marichev (1987-1993), Miller & Ross (1993), or in the recent review
article by Gorenflo (1997). If h denotes a ”small” positive step-length, these
approximating operators read
± φ(x) :=
(−1)k
φ(x ∓ kh) . (6.19)
Assume, for simplicity, D = 1 , and introduce grid points xj = j h with
h > 0 , j ∈ Z , and time instances tn = n τ with τ > 0 , n ∈ N0 . Let there
be given probabilities pj,k ≥ 0 of jumping from point xj at instant tn to
point xk at instant tn+1 and define probabilities yj(tn) of the walker being
at point xj at instant tn. Then, by
yk(tn+1) =
pj,k uj(tn) ,
pj,k =
pj,k = 1 , (6.20)
with pj,k = pk,j , a symmetric random walk (more precisely a symmetric
random jump) model is described. With the approximation
yj(tn) ≈
∫ (xj+h/2)
(xj−h/2)
u(x, tn) dx ≈ hu(xj , tn) , (6.21)
and introducing the ”scaling parameter”
2 | cos(απ/2)|
, (6.22)
we have solved
yj(tn+1) − yj(tn)
= − hI−α yj(tn) , (6.23)
for yj(tn+1) . So we have proved to have a consistent (for h → 0) symmetric
random walk approximation to (6.18) by taking
i) for 0 < α < 1 , 0 < µ ≤ 1/2 ,
−α yj(tn) = µ
+ yj(tn) + hI
− yj(tn)
pj,j = 1 − 2µ , pj,j±k = µ
)∣∣ , k ≥ 1 ;
(6.24)
ii) for 1 < α ≤ 2 , 0 < µ ≤ 1/(2α) ,


−α yj(tn) = µ
+ yj+1(tn) + hI
− yj−1(tn)
pj,j = 1 − 2µ α , pj,j±1 = µ
pj,j±k = µ
)∣∣∣ , k ≥ 2 .
(6.25)
We note that our random walk model is not only symmetric, but also
homogeneous, the transition probabilities pj,j±k not depending on the index
In the special case α = 2 we recover from (6.25) the well-known three-point
approximation of the heat equation, because pj,j±k = 0 for k ≥ 2 . This
means that for approximation of common diffusion only jumps of one step
to the right or one to the left or jumps of width zero occur, whereas for
0 < α < 2 (α 6= 1) arbitrary large jumps occur with power-like decaying
probability, as it turns out from the asymptotic analysis for the transition
probabilities given in (6.24-25). In fact, as k → ∞ , one finds
pj,j+k ∼
(τ/hα)
Γ(α + 1) sin
k−(α+1) , 0 < α < 2 . (6.26)
This result thus provides the discrete counterpart of the asymptotic
behaviour of the long power-law tails of the symmetric stable distributions,
as foreseen by (6.13) when 0 < α < 2 .
7 Conclusions
We have treated two generalizations of the standard, one-dimensional,
diffusion equation, namely, the time-fractional diffusion equation and the
symmetric space-fractional diffusion equation. For these equations we have
derived the fundamental solutions using the transform methods of Fourier
and Laplace, and exhibited their connections to extremal and symmetric
stable probability densities, evolving on time or variable in space. For the
symmetric space-fractional diffusion equation we have presented a stationary
(in time), homogeneous (in space) symmetric random walk model, discrete
in space and time, the step-lengths of the spatial grid and the time lapses
between transitions properly scaled. In the limit of infinitesimally fine
discretization this model (based on the Grünwald-Letnikov approximation
to fractional derivatives) is consistent with the continuous diffusion process,
i.e. convergent if interpreted as a difference scheme in the sense of numerical
analysis2.
From the mathematical viewpoint the field of such ”fractional” general-
izations is fascinating as there several mathematical disciplines meet and
come to a fruitful interplay: e.g. probability theory and stochastic processes,
2Further generalizations have been considered by us and our collaborators in other
papers, in which we have given a derivation of discrete random walk models related to
more general space-time fractional diffusion equations. For a comprehensive analysis, see
Gorenflo et al. (2002). Readers interested to the fundamental solutions of these fractional
diffusion equations are referred to the paper by Mainardi et al. (2001) where analytical
expressions and numerical plots are found.
integro-differential equations, transform theory, special functions, numerical
analysis. As one may take from our References, one can observe that since
some decades there is an ever growing interest in using the concepts of
fractional calculus among physicists and economists. Among economists we
like to refer the reader to a collection of papers on the topic of ”Fractional
Differencing and Long Memory Processes”, edited by Baillie & King (1996).
Appendix A: The Riemann-Liouville Fractional
Calculus
Fractional calculus is the field of mathematical analysis which deals with the
investigation and applications of integrals and derivatives of arbitrary order.
The term fractional is a misnomer, but it is retained following the prevailing
use. This appendix is mostly based on the recent review by Gorenflo &
Mainardi (1997). For more details on the classical treatment of fractional
calculus the reader is referred to Erdélyi (1954), Oldham & Spanier (1974),
Samko et al. (1987-1993) and Miller & Ross (1993).
According to the Riemann-Liouville approach to fractional calculus, the
notion of fractional Integral of order α (α > 0) is a natural consequence
of the well known formula (usually attributed to Cauchy), that reduces the
calculation of the n−fold primitive of a function f(t) to a single integral of
convolution type. In our notation the Cauchy formula reads
Jnf(t) := fn(t) =
(n − 1)!
(t − τ)n−1 f(τ) dτ , t > 0 , n ∈ N , (A.1)
where N is the set of positive integers. From this definition we note that
fn(t) vanishes at t = 0 with its derivatives of order 1, 2, . . . , n − 1 . For
convention we require that f(t) and henceforth fn(t) be a causal function,
i.e. identically vanishing for t < 0. In a natural way one is led to extend
the above formula from positive integer values of the index to any positive
real values by using the Gamma function. Indeed, noting that (n − 1)! =
Γ(n) , and introducing the arbitrary positive real number α , one defines the
Fractional Integral of order α > 0 :
Jα f(t) :=
(t − τ)α−1 f(τ) dτ , t > 0 , α ∈ R+ , (A.2)
where R+ is the set of positive real numbers. For complementation we define
J0 := I (Identity operator), i.e. we mean J0 f(t) = f(t) . Furthermore, by
Jαf(0+) we mean the limit (if it exists) of Jαf(t) for t → 0+ ; this limit
may be infinite.
We note the semigroup property JαJβ = Jα+β , α , β ≥ 0 , which implies
the commutative property JβJα = JαJβ , and the effect of our operators Jα
on the power functions
Jαtγ =
Γ(γ + 1)
Γ(γ + 1 + α)
tγ+α , α ≥ 0 , γ > −1 , t > 0 . (A.3)
These properties are of course a natural generalization of those known when
the order is a positive integer.
Introducing the Laplace transform by the notation L {f(t)} :=∫∞
−st f(t) dt = f̃(s) , s ∈ C , and using the sign ÷ to denote a Laplace
transform pair, i.e. f(t) ÷ f̃(s) , we note the following rule for the Laplace
transform of the fractional integral,
Jα f(t) ÷ f̃(s)
, α ≥ 0 , (A.4)
which is the generalization of the case with an n-fold repeated integral.
After the notion of fractional integral, that of fractional derivative of order
α (α > 0) becomes a natural requirement and one is attempted to substitute
α with −α in the above formulas. However, this generalization needs some
care in order to guarantee the convergence of the integrals and preserve the
well known properties of the ordinary derivative of integer order.
Denoting by Dn with n ∈ N , the operator of the derivative of order n ,
we first note that Dn Jn = I , Jn Dn 6= I , n ∈ N , i.e. Dn is left-inverse
(and not right-inverse) to the corresponding integral operator Jn . In fact
we easily recognize from (A.1) that
Jn Dn f(t) = f(t) −
f (k)(0+)
, t > 0 . (A.5)
As a consequence we expect that Dα is defined as left-inverse to Jα. For
this purpose, introducing the positive integer m such that m − 1 < α ≤ m ,
one defines the Fractional Derivative of order α > 0 :
Dα f(t) := Dm Jm−α f(t) , m − 1 < α ≤ m , m ∈ N , (A.6)
namely
Dα f(t)=
Γ(m − α)
(t − τ)α+1−m
, m − 1 < α < m,
f(t) , α = m.
(A.6′)
Defining for complementation D0 = J0 = I , then we easily recognize that
Dα Jα = I , α ≥ 0 , and
Dα tγ =
Γ(γ + 1)
Γ(γ + 1 − α)
tγ−α , α ≥ 0 , γ > −1 , t > 0 . (A.7)
Of course, these properties are a natural generalization of those known when
the order is a positive integer.
Note the remarkable fact that the fractional derivative Dα f is not zero for
the constant function f(t) ≡ 1 if α 6∈ N . In fact, (A.7) with γ = 0 teaches
us that
Dα1 =
Γ(1 − α)
, α ≥ 0 , t > 0 . (A.8)
This, of course, is ≡ 0 for α ∈ N, due to the poles of the gamma function in
the points 0,−1,−2, . . .. We now observe that an alternative definition of
fractional derivative, originally introduced by Caputo (1967) (1969) in the
late sixties and adopted by Caputo and Mainardi (1971) in the framework
of the theory of Linear Viscoelasticity, is
Dα∗ f(t) := J
m−α Dm f(t) m − 1 < α ≤ m , m ∈ N , (A.9)
namely
D ∗α f(t) =
Γ(m − α)
f (m)(τ)
(t − τ)α+1−m
dτ , m − 1 < α < m,
f(t) , α = m.
(A.9′)
This definition is of course more restrictive than (A.6), in that requires
the absolute integrability of the derivative of order m. Whenever we use
the operator Dα∗ we (tacitly) assume that this condition is met. We easily
recognize that in general
Dα f(t) := Dm Jm−α f(t) 6= Jm−α Dm f(t) := Dα∗ f(t) , (A.10)
unless the function f(t) along with its first m − 1 derivatives vanishes at
t = 0+. In fact, assuming that the passage of the m-derivative under the
integral is legitimate, one recognizes that, for m − 1 < α < m and t > 0 ,
Dα f(t) = Dα∗ f(t) +
Γ(k − α + 1)
f (k)(0+) , (A.11)
and therefore, recalling the fractional derivative of the power functions (A.7),
f(t) −
f (k)(0+)
= Dα∗ f(t) . (A.12)
The alternative definition (A.9) for the fractional derivative thus incorpo-
rates the initial values of the function and of its integer derivatives of lower
order. The subtraction of the Taylor polynomial of degree m − 1 at t = 0+
from f(t) means a sort of regularization of the fractional derivative. In
particular, according to this definition, the relevant property for which the
fractional derivative of a constant is still zero can be easily recognized, i.e.
Dα∗ 1 ≡ 0 , α > 0 . (A.13)
We now explore the most relevant differences between the two fractional
derivatives (A.6) and (A.9). We agree to denote (A.9) as the Caputo
fractional derivative to distinguish it from the standard Riemann-Liouville
fractional derivative (A.6). We observe, again by looking at (A.7), that
Dαtα−1 ≡ 0 , α > 0 , t > 0 .
From above we thus recognize the following statements about functions
which for t > 0 admit the same fractional derivative of order α , with
m − 1 < α ≤ m , m ∈ N ,
Dα f(t) = Dα g(t) ⇐⇒ f(t) = g(t) +
α−j , (A.14)
Dα∗ f(t) = D
∗ g(t) ⇐⇒ f(t) = g(t) +
m−j . (A.15)
In these formulas the coefficients cj are arbitrary constants.
For the two definitions we also note a difference with respect to the formal
limit as α → (m − 1)+ ; from (A.6) and (A.9) we obtain respectively,
Dα f(t) → Dm J f(t) = Dm−1 f(t) ; (A.16)
Dα∗ f(t) → J Dm f(t) = Dm−1 f(t) − f (m−1)(0+) . (A.17)
We now consider the Laplace transform of the two fractional derivatives.
For the standard fractional derivative Dα the Laplace transform, assumed to
exist, requires the knowledge of the (bounded) initial values of the fractional
integral Jm−α and of its integer derivatives of order k = 1, 2, . . . ,m−1 . The
corresponding rule reads, in our notation,
Dα f(t) ÷ sα f̃(s) −
Dk J (m−α) f(0+) sm−1−k , (A.18)
where m − 1 < α ≤ m .
The Caputo fractional derivative appears more suitable to be treated by
the Laplace transform technique in that it requires the knowledge of the
(bounded) initial values of the function and of its integer derivatives of
order k = 1, 2, . . . ,m− 1 , in analogy with the case when α = m . In fact, by
using (A.4) and noting that
Jα Dα∗ f(t) = J
α Jm−α Dm f(t) = Jm Dm f(t) = f(t) −
f (k)(0+)
(A.19)
we easily prove the following rule for the Laplace transform,
Dα∗ f(t) ÷ sα f̃(s) −
f (k)(0+) sα−1−k , m − 1 < α ≤ m . (A.20)
Indeed, the result (A.20), first stated by Caputo (1969) by using the
Fubini-Tonelli theorem, appears as the most ”natural” generalization of the
corresponding result well known for α = m .
Gorenflo and Mainardi (1997) have pointed out the major utility of the
Caputo fractional derivative in the treatment of differential equations of
fractional order for physical applications. In fact, in physical problems,
the initial conditions are usually expressed in terms of a given number
of bounded values assumed by the field variable and its derivatives of
integer order, no matter if the governing evolution equation may be a
generic integro-differential equation and therefore, in particular, a fractional
differential equation3.
We now analyze the most simple differential equations of fractional order,
including those which, by means of fractional derivatives, generalize the well-
known ordinary differential equations related to relaxation and oscillation
3We note that the Caputo fractional derivative was so named after the book by
Podlubny (1999). It coincides with that introduced, independently and a few later,
by Dzherbashyan and Nersesyan (1968) as a regularization of the Riemann-Liouville
fractional derivative. Nowadays, some Authors refer to it as the Caputo-Dzherbashyan
fractional derivative. The prominent role of this fractional derivative in treating initial
value problems was recognized in interesting papers by Kochubei (1989), (1990).
phenomena. Generally speaking, we consider the following differential
equation of fractional order α > 0 ,
Dα∗ u(t) = D
u(t) −
u(k)(0+)
= −u(t) + q(t) , t > 0 , (A.21)
where u = u(t) is the field variable and q(t) is a given function. Here m is
a positive integer uniquely defined by m − 1 < α ≤ m , which provides the
number of the prescribed initial values u(k)(0+) = ck , k = 0, 1, 2, . . . ,m−1 .
Implicit in the form of (A.21) is our desire to obtain solutions u(t) for which
the u(k)(t) are continuous. In particular, the cases of fractional relaxation
and fractional oscillation are obtained for 0 < α < 1 and 1 < α < 2 ,
respectively
The application of the Laplace transform through the Caputo formula (A.20)
yields
ũ(s) =
sα−k−1
sα + 1
sα + 1
q̃(s) . (A.22)
Now, in order to obtain the Laplace inversion of (A.22), we need to recall
the Mittag-Leffler function of order α > 0 , Eα(z) . This function, so named
from the great Swedish mathematician who introduced it at the beginning
of this century, is defined by the following series and integral representation,
valid in the whole complex plane,
Eα(z) =
Γ(αn + 1)
σα−1 e σ
σα − z
dσ , α > 0 . (A.23)
Here Ha denotes the Hankel path, i.e. a loop which starts and ends at −∞
and encircles the circular disk |σ| ≤ |z|1/α in the positive sense. It turns out
that Eα(z) is an entire function of order ρ = 1/α and type 1 .
The Mittag-Leffler function provides a simple generalization of the expo-
nential function, to which it reduces for α = 1 . Particular cases from which
elementary functions are recovered, are
= cosh z , E2
= cos z , z ∈ C , (A.24)
E1/2(±z1/2) = ez
1 + erf (±z1/2)
= ez erfc (∓z1/2) , z ∈ C , (A.25)
where erf (erfc) denotes the (complementary) error function. defined as
erf (z) :=
du , erfc (z) := 1 − erf (z) , z ∈ C .
A noteworthy property of the Mittag-Leffler function is based on the
following duplication formula
Eα(z) =
Eα/2(+z
1/2) + Eα/2(−z1/2)
. (A.26)
In (A.25-26) we agree to denote by z1/2 the main branch of the complex
root of z .
The Mittag-Leffler function is connected to the Laplace integral through the
equation ∫ ∞
e−u Eα (u
α z) du =
1 − z
α > 0 . (A.27)
The integral at the L.H.S. was evaluated by Mittag-Leffler who showed that
the region of its convergence contains the unit circle and is bounded by the
line Re z1/α = 1 . The above integral is fundamental in the evaluation of the
Laplace transform of Eα (−λ tα) with α > 0 and λ ∈ C . In fact, putting in
(A.27) u = st and uα z = −λ tα with t ≥ 0 and λ ∈ C , we get the Laplace
transform pair
Eα (−λ tα) ÷
sα + λ
, Re s > |λ|1/α . (A.28)
Then, using (A.28), we put for k = 0, 1, . . . ,m − 1 ,
uk(t) := J
keα(t) ÷
sα−k−1
sα + 1
, eα(t) := Eα(−tα) , (A.29)
and, from inversion of the Laplace transforms in (A.22), we find
u(t) =
ck uk(t) −
q(t − τ)u′0(τ) dτ . (A.30)
In particular, the formula (A.30) encompasses the solutions for α = 1 , 2 ,
since e1(t) = exp(−t) , e2(t) = cos t . When α is not integer, namely for
m − 1 < α < m , we note that m − 1 represents the integer part of α
(usually denoted by [α]) and m the number of initial conditions necessary
and sufficient to ensure the uniqueness of the solution u(t). Thus the m
functions uk(t) = J
keα(t) with k = 0, 1, . . . ,m−1 represent those particular
solutions of the homogeneous equation which satisfy the initial conditions
+) = δk h , h, k = 0, 1, . . . ,m − 1 , and therefore they represent the
fundamental solutions of the fractional equation (A.21), in analogy with the
case α = m . Furthermore, the function uδ(t) = −u′0(t) = −e′α(t) represents
the impulse-response solution.
The Mittag-Leffler function of order less than one turns out to be related
through the Laplace integral to another special function of Wright type,
denoted by M(z, ν) with 0 < ν < 1 , following the notation introduced
by Mainardi (1994, 1995). Since this function turns out to be relevant in
the general framework of fractional calculus with special regard to stable
probability distributions, we are going to summarize its basing properties.
For more details on this function, see Mainardi (1997), Appendix A.
Let us first recall the more general Wright function Wλ,µ(z) , z ∈ C , with
λ > −1 and µ > 0 . This function, so named from the British mathematician
who introduced it between 1933 and 1941, is defined by the following series
and integral representation, valid in the whole complex plane,
Wλ,µ(z) =
n! Γ(λn + µ)
eσ + zσ
−λ dσ
, (A.31)
where Ha denotes the Hankel path. It is possible to prove that the Wright
function is entire of order 1/(1+λ) , hence of exponential type if λ ≥ 0 . The
case λ = 0 is trivial since W0,µ(z) = e
z/Γ(µ) . The case λ = −ν , µ = 1 − ν
with 0 < ν < 1 provides the function M(z, ν) of special interest for us.
Specifically, we have
M(z; ν) := W−ν,1−ν(−z) =
W−ν,0(−z) , 0 < ν < 1 , (A.32)
and therefore from (A.31-32)
M(z; ν) =
(−z)n−1
(n − 1)!
Γ(ν n) sin (ν n π)
eσ − zσ
, 0 < ν < 1 .
(A.33)
In the series representation we have used the reflection formula for the
Gamma function, Γ(x) Γ(1−x) = π/ sin πx . Explicit expressions of M(z; ν)
in terms of simpler known functions are expected in particular cases when
ν is a rational number. Relevant cases are ν = 1/2 , 1/3 for which
M(z; 1/2) =
− z2/4
, (A.34)
M(z; 1/3) = 32/3 Ai
z/31/3
, (A.35)
where Ai denotes the Airy function.
When the argument is real and positive, i.e. z = r > 0 , the existence of
the Laplace transform of M(r; ν) is ensured by the asymptotic behaviour,
as derived by Mainardi & Tomirotti (1995), as r → +∞ ,
M(r/ν; ν) ∼ a(ν) r(ν − 1/2)/(1 − ν) exp
−b(ν) r1/(1 − ν)
, (A.36)
where a(ν) = 1/
2π (1 − ν) , b(ν) = (1 − ν)/ν .
It is an instructive exercise to derive the Laplace transform by interchanging
the Laplace integral with the Hankel integral in (A.33) and recalling the
integral representation (A.23) of the Mittag-Leffler function. We obtain the
Laplace transform pair
M(r; ν) ÷ Eν(−s) , 0 < ν < 1 . (A.37)
For ν = 1/2 , (A.37) with (A.25) and (A.34) provides the result, see e.g.
Doetsch (1974),
M(r; 1/2) :=
− r2/4
÷ E1/2(−s) := exp
erfc (s) . (A.38)
It would be noted that, since M(r, ν) is not of exponential order,
transforming term-by-term the Taylor series of M(r; ν) yields a series of
negative powers of s , which represents the asymptotic expansion of Eν(−s)
as s → ∞ in a certain sector around the real axis.
We also note that (A.37) with (A.23) allows us to compute the moments of
any real order δ ≥ 0 of M(r; ν) in the positive real axis. We obtain
r δ M(r; ν) dr =
Γ(δ + 1)
Γ(νδ + 1)
, δ ≥ 0 . (A.39)
When δ is integer we note that the moments are provided by the derivatives
of the Mittag-Leffler function in the origin, i.e.
rn M(r; ν) dr = lim
(−1)n
Eν(−s) =
Γ(n + 1)
Γ(νn + 1)
, (A.40)
where n = 0, 1, 2, . . . . The normalization condition
0 M(r; ν) dr =
Eν(0) = 1 is recovered for n = 0 . The relation with the Mittag-Leffler
function stated in (A.40) can be extended to the moments of non integer
order if we replace the ordinary derivative, of order n, with the corresponding
fractional derivative, of order δ 6= n, in the Caputo sense.
Another exercise on the function M concerns the inversion of the Laplace
transform exp(−sν) , either by the complex integral formula or by the formal
series method. We obtain the Laplace transform pair
M (1/rν ; ν) ÷ exp (−sν) , 0 < ν < 1 . (A.41)
For ν = 1/2 , (A.41) with (A.34) provides the known result, see e.g. Doetsch
(1974),
2 r3/2
M(1/r1/2; 1/2) :=
π r3/2
exp [− 1/(4r)] ÷ exp
−s1/2
. (A.42)
We recall that a rigorous proof of (A.41) was formerly given by Pollard
(1946), based on a formal result by Humbert (1945). The Laplace transform
pair was also obtained by Mikusiński (1959) and, albeit unaware of the
previous results, by Buchen & Mainardi (1975) in a formal way.
Appendix B: The Stable Probability Distributions
The stable distributions are a fascinating and fruitful area of research in
probability theory; furthermore, nowadays, they provide valuable models in
physics, astronomy, economics, and communication theory.
The general class of stable distributions was introduced and given this name
by the French mathematician Paul Lévy in the early 1920’s, see Lévy (1924,
1925). The inspiration for Lévy was the desire to generalize the celebrated
Central Limit Theorem, according to which any probability distribution
with finite variance belongs to the domain of attraction of the Gaussian
distribution.
Formerly, the topic attracted only moderate attention from the leading
experts, though there were also enthusiasts, of whom the Russian
mathematician Alexander Yakovlevich Khintchine should be mentioned first
of all. The concept of stable distributions took full shape in 1937 with the
appearance of Lévy’s monograph, see Lévy (1937-1954), soon followed by
Khintchine’s monograph, see Khintchine (1938).
The theory and properties of stable distributions are discussed in some
classical books on probability theory including Gnedenko & Kolmogorov
(1949-1954), Lukacs (1960-1970), Feller (1966-1971), Breiman (1968-1992),
Chung (1968-1974) and Laha & Rohatgi (1979). Also treatises on fractals
devote particular attention to stable distributions in view of their properties
of scale invariance, see e.g. Mandelbrot (1982) and Takayasu (1990). Sets of
tables and graphs have been provided by Mandelbrot & Zarnfaller (1959),
Fama & Roll (1968), Bo’lshev & Al. (1968) and Holt & Crow (1973).
Only recently, monographs devoted solely to stable distributions and related
stochastic processes have been appeared, i.e. Zolotarev (1983-1986), Janicki
& Weron (1994), Samorodnitsky & Taqqu (1994), Uchaikin & Zolotarev
(1999). We now can cite the paper by Mainardi, Luchko & Pagnini (2001)
where the reader can find (convergent and asymptotic) representations and
plots of the symmetric and non-symmetric stable densities generated by
fractional diffusion equations.
Stable distributions have three exclusive properties, which can be briefly
summarized stating that they 1) are invariant under addition, 2) possess
their own domain of attraction, and 3) admit a canonic characteristic
function.
Let us now illustrate the above properties which, providing necessary and
sufficient conditions, can be assumed as equivalent definitions for a stable
distribution. We recall the basic results without proof.
A random variable X is said to have a stable distribution P (x) = Prob {X ≤
x} if for any n ≥ 2 , there is a positive number cn and a real number dn such
X1 + X2 + . . . + Xn
= cn X + dn , (B.1)
where X1,X2, . . . Xn denote mutually independent random variables with
common distribution P (x) with X . Here the notation
= denotes equality
in distribution, i.e. means that the random variables on both sides have the
same probability distribution.
When mutually independent random variables have a common distribution
[shared with a given random variable X], we also refer to them as
independent, identically distributed (i.i.d) random variables [independent
copies of X]. In general, the sum of i.i.d. random variables becomes
a random variable with a distribution of different form. However, for
independent random variables with a common stable distribution, the sum
obeys to a distribution of the same type, which differs from the original
one only for a scaling (cn) and possibly for a shift (dn). When in (B.1) the
dn = 0 the distribution is called strictly stable.
It is known, see Feller (1966-1971), that the norming constants in (B.1) are
of the form
cn = n
1/α with 0 < α ≤ 2 . (B.2)
The parameter α is called the characteristic exponent or the index of stability
of the stable distribution.
We agree to use the notation X ∼ Pα(x) to denote that the random variable
X has a stable probability distribution with characteristic exponent α . We
simply refer to P (x) , p(x) := dP/dx (probability density function = pdf)
and X as α-stable distribution, density, random variable, respectively.
The definition (B.1) with the theorem (B.2) can be stated in an alternative
version that needs only two i.i.d. random variables. see also Lukacs (1960-
1970). A random variable X is said to have a stable distribution if for any
positive numbers A and B, there is a positive number C and a real number
D such that
AX1 + B X2
= C X + D , (B.3)
where X1 and X2 are independent copies of X . Then there is a number
α ∈ (0, 2] such that the number C in (B.3) satisfies Cα = Aα + Bα .
For a strictly stable distribution (B.3) holds with D = 0 . This implies that
all linear combinations of i.i.d. random variables obeying to a strictly stable
distribution is a random variable with the same type of distribution.
A stable distribution is called symmetric if the random variable −X has the
same distribution. Of course, a symmetric stable distribution is necessarily
strictly stable.
Noteworthy examples of stable distributions are provided by the Gaussian
(or normal) law (with α = 2) and by the Cauchy-Lorentz law (α = 1). The
corresponding pdf are known to be
pG(x;σ, µ) :=
e−(x − µ)
2/(2σ2) , x ∈ R , (B.4)
where σ2 denotes the variance and µ the mean, and
pC(x; γ, δ) :=
(x − δ)2 + γ2
, x ∈ R , (B.5)
where γ denotes the semi-interquartile range and δ the ”shift”.
Another (equivalent) definition states that stable distributions are the only
distributions that can be obtained as limits of normalized sums of i.i.d.
random variables. A random variable X is said to have a domain of
attraction,i.e. if there is a sequence of i.i.d. random variables Y1, Y2, . . .
and sequences of positive numbers {γn} and real numbers {δn}, such that
Y1 + Y2 + . . . Yn
d⇒X . (B.6)
The notation
d⇒ denotes convergence in distribution.
It is clear that the previous definition (B.1) yields (B.6), e.g. , by taking the
Yis to be independent and distributed like X . The converse is easy to show,
see Gnedenko & Kolmogorov (1949-1954). Therefore we can alternatively
state that a random variable X is said to have a stable distribution if it has
a domain of attraction.
When X is Gaussian and the Yis are i.i.d. with finite variance, then (B.6)
is the statement of the ordinary Central Limit Theorem. The domain
of attraction of X is said normal when γn = n
1/α ; in general, γn =
n1/α h(n) where h(x) , x > 0 , is a slow varying function at infinity, that
is, lim
h(ux)/h(x) = 1 for all u > 0 , see Feller (1971). The function
h(x) = log x , for example, is slowly varying at infinity.
Another definition specifies the canonic form that the characteristic function
(cf) of a stable distribution of index α must have. Recalling that the cf is
the Fourier transform of the pdf , we use the notation p̂α(κ) := 〈exp (iκX)〉 ÷
pα(x) . We first note that a stable distribution is also infinitely divisible, i.e.
for every positive integer n its cf can be expressed as the nth power of
some cf . In fact, using the characteristic function, the relation (B.1) is
transformed into
[p̂α(κ)]
n = p̂α(cn κ) e
idnκ . (B.7)
The functional equation (B.7) can be solved completely and the solution is
known to be
p̂α(κ;β, γ, δ) = exp {iδκ − γα |κ|α [1 + i (sign κ)β ω(|κ|, α)]} , (B.8)
where
ω(|κ|, α) =
tan (α π/2) , if α 6= 1 ,
−(2/π) log |κ| , if α = 1 . (B.9)
Consequently a random variable X is said to have a stable distribution if
there are four real parameters α, β, γ, δ with 0 < α ≤ 2 , −1 ≤ β ≤ +1 ,
γ > 0 , such that its characteristic function has the canonic form (B.8-9).
Then we write pα(x;β, γ, δ)÷ p̂α(κ;β, γ, δ) and X ∼ Pα(x;β, γ, δ) , so partly
following the notation of Holt & Crow (1973) and Samorodnitsky & Taqqu
(1994).
We note in (B.8-9) that β appears with different signs for α 6= 1 and α = 1 .
This minor point has been the source of great confusion in the literature, see
Hall (1980) for a discussion. The presence of the logarithm for α = 1 is the
source of many difficulties, so this case has often to be treated separately.
The cf (B.8-9) turns out to be a useful tool for studying α-stable distri-
butions and for providing an interpretation of the additional parameters,
β (skewness parameter), γ (scale parameter) and δ (shift parameter), see
Samorodnitsky & Taqqu (1994). When α = 2 the cf refers to the Gaussian
distribution with variance σ2 = 2 γ2 and mean µ = δ ; in this case the value
of the skewness parameter β is not specified because tan π = 0 , and one
conventionally takes β = 0 .
One easily recognizes that a stable distribution is symmetric if and only if
β = δ = 0 and is symmetric about δ if and only if β = 0 . Stable distributions
with extremal values of the skewness parameter are called extremal. One
can prove that all the extremal stable distributions with 0 < α < 1 are
one-sided, the support being R+0 if β = −1 , and R
0 if β = +1 .
For the stable distributions Pα(x;β, γ, δ) we now consider the asymptotic
behaviour of the tail probabilities, T+(λ) := Prob {X > λ} and T−(λ) :=
Prob {X < −λ} , as λ → ∞ . For the Gaussian case α = 2 the result is well
known, see e.g. Feller (1957),
α = 2 : T±(λ) ∼ 1
2/(4γ2)
, λ → ∞ . (B.10)
Because of the above exponential decay all the moments of the corresponding
pdf turn out to be finite, which is an exclusive property of this stable
distribution. For all the other stable distributions the singularity of the
characteristic function in the origin is responsible for the algebraic decay of
the tail probabilities as indicated below, see e.g. Samorodnitsky & Taqqu
(1994),
0 < α < 2 : lim
λα T±(λ) = Cα γ
α (1 ∓ β)/2 , (B.11)
where
x−α sin x dx
1 − α
Γ(2 − a) cos (απ/2)
, if α 6= 1 ,
2/π , if α = 1 .
(B.12)
We note that for extremal distributions (β = ±1) the above algebraic decay
holds true only for one tail, the left one if β = +1 , the right one if β = −1 .
The other tail is either identically zero if 0 < α < 1 (the distribution is
one-sided !), or exhibits an exponential decay if 1 ≤ α < 2 . Because of the
algebraic decay we recognize that
0 < α < 2 :
|x|>λ
pα(x;β, γ, δ) dx = O(λ
−α) , (B.13)
so the absolute moments of a stable non-Gaussian pdf turn out to be finite
if their order ν is 0 ≤ ν < α and infinite if ν ≥ α . We are now convinced
that the Gaussian distribution is the unique stable distribution with finite
variance. Furthermore, when α ≤ 1 , the first absolute moment 〈|X|〉 is
infinite as well, so we need to use the median to characterize the expected
value.
There is however a fundamental property shared by all the stable
distributions that we like to point out: for any α the stable pdf are unimodal
and indeed bell-shaped, i.e. their n-th derivative has exactly n zeros, see
Gawronski (1964).
We now come back to the cf of a stable distribution, in order to provide for
α 6= 1 and δ = 0 a simpler canonic form which allow us to derive convergent
and asymptotic power series for the corresponding pdf . We first note that
the two parameters γ and δ in (B.8), being related to a scale transformation
and a translation, are not so essential since they do not change the shape
of distributions. If we take γ = 1 and δ = 0 , we obtain the so-called
standardized form of the stable distribution and X ∼ Pα(x;β, 1, 0) is referred
to as the α-stable standardized random variable. Furthermore, we can choose
the scale parameter γ in such a way to get from (B.8-9) the simplified canonic
form used by Feller (1952, 1966-1971) and Takayasu (1990) for strictly stable
distributions (δ = 0) with α 6= 1 , which reads in an ad hoc notation,
q̂α(κ; θ) :=
eiκ y pα(y; θ) dy = exp
−|κ|α e±i θ π/2
, (B.14)
where the symbol ± takes the sign of κ . This canonic form, that we refer to
as the Feller canonic form, is derived from (B.8-9) if in addition to α 6= 1
and δ = 0 we require
γα = cos
, tan
= β tan
. (B.15)
Here θ is the skewness parameter instead of β and its domain is restricted
in the following region (depending on α)
|θ| ≤
α , if 0 < α < 1 ,
2 − α , if 1 < α < 2 . (B.16)
Thus, when we use the Feller canonic form for strictly stable distributions
with index α 6= 1 and skewness θ , we implicitly select the scale parameter
γ (0 < γ ≤ 1), which is related to α , β and θ by (B.15). Specifically, the
random variable Y ∼ Qα(y; θ) turns out to be related to the standardized
random variable X ∼ Pα(x;β, 1, 0) by the following relations
Y = X/γ , pα(x;β, 1, 0) = γ qα(y = γx; θ) , (B.17)
with 
γ = [cos (θπ/2)]1/α ,
θ = (2/π) arctan [β tan (απ/2)] ,
tan (θπ/2)
tan (απ/2)
(B.18)
We recognize that qα(y, θ) = qα(−y,−θ) , so the symmetric stable
distributions are obtained if and only if θ = 0 . We note that for the
symmetric stable distributions we get the identity between the standardized
and the Lévy canonic forms, since in (B.18) β = θ = 0 implies γ = 1 .
A particular but noteworthy case is provided by p2(x; 0, 1, 0) = q2(y; 0) ,
corresponding to the Gaussian distribution with variance σ2 = 2 .
The extremal stable distributions, corresponding to β = ±1 , are now
obtained for θ = ±α if 0 < α < 1 , and for θ = ∓(2 − α) if 1 < α < 2 ; for
them the scaling parameter turns out to be γ = [cos (|α|π/2)]1/α . It may be
an instructive exercise to carry out the inversion of the Fourier transform
when α = 1/2 and θ = −1/2 . In this case we obtain the analytical expression
for the corresponding extremal stable pdf , known as the (one-sided) Lévy-
Smirnov density,
q1/2(y;−1/2) =
y−3/2 e−1/(4y) , y ≥ 0 . (B.19)
The standardized form for this distribution can be easily obtained from
(B.19) using (B.17-18) with α = 1/2 and θ = −1/2 . We get γ =
[cos (−π/4)]2 = 1/2 , β = −1 , so
p1/2(x;−1, 1, 0) =
q1/2(x/2;−1/2) =
x−3/2 e−1/(2x) , (B.20)
where x ≥ 0 , in agreement with Holt & Crow (1973) [§2.13, p. 147].
Feller (1952) has obtained from (B.14) the following representations by
convergent power series for the stable distributions valid for y > 0 , with
0 < α < 1 (negative powers),
qα(y; θ) =
(−y−α)n Γ(nα + 1)
(θ − α)
, (B.21)
1 < α ≤ 2 (positive powers),
qα(y; θ) =
(−y)n
Γ(n/α + 1)
(θ − α)
. (B.22)
The values for y < 0 can be obtained from (B.21-22) using the identity
qα(−y; θ) = qα(y;−θ) , y > 0 . As a consequence of the convergence in all of
C of the series in (B.21-22) we recognize that the restrictions of the functions
y qα(y; θ) on the two real semi-axis turn out to be equal to certain entire
functions of argument 1/|y|α for 0 < α < 1 and argument |y| for 1 < α ≤ 2 .
It has be shown, see e.g. Bergström (1952), Chao Chung-Jeh (1953), that the
two series in (B.21-22) provide also the asymptotic (divergent) expansions to
the stable pdf with the ranges of α interchanged from those of convergence.
From (B.21-22) a relation between stable pdf with index α and 1/α can be
derived as noted in Feller (1966-1971). Assuming 1/2 < α < 1 and y > 0 ,
we obtain
q1/α(y
−α; θ) = qα(y; θ
∗) , θ∗ = α(θ + 1) − 1 . (B.23)
A quick check shows that θ∗ falls within the prescribed range, |θ∗| ≤ α ,
provided that |θ| ≤ 2 − 1/α .
We now consider two particular cases of the Feller series (B.21-22), of
particular interest for us, which turn out to be related to the entire function
of Wright type, M(z; ν) with 0 < ν < 1 , reported in Appendix A. These
cases correspond to the following extremal distributions
Φ1(y) := qα(y;−α) , y > 0 , 0 < α < 1 , (B.24)
Φ2(y) := qα(y;α − 2) , y > 0 , 1 < α ≤ 2 , (B.25)
for which the Feller series (B.21-22) reduce to
Φ1(y) =
(−1)n−1 y−αn−1 Γ(nα + 1)
sin (nπα) , y > 0 , (B.26)
Φ2(y) =
(−1)n−1 yn−1 Γ(n/α + 1)
, y > 0 . (B.27)
In fact, recalling the series representation of the general Wright function,
Wλ,µ(z) with λ > −1 , µ > 0 , see (A.31), and the definition of the function
M(z; ν) with 0 < ν < 1 , see (A.32-33), we recognize that
Φ1(y) =
W−α,0(−y−α) =
M(y−α;α) , y > 0 , (B.28)
Φ2(y) =
W−1/α,0(−y) =
M(y; 1/α) , y > 0 . (B.29)
We would like to remark that the above relations with the Wright functions
have been noted also by Engler (1997).
It is worth to point out that, whereas Φ1(y) totally represents the one-
sided stable pdf qα(y;−α) , 0 < α < 1 , with support in R+0 , Φ2(y) is the
restriction on the positive axis of qα(y;α− 2) , 1 < α ≤ 2 , whose support is
all of R . Since the function M(z; ν) turns out to be normalized in R+0 , see
(A.39-40), we also note
Φ1(y) dy = 1 ;
Φ2(y) dy = 1/α . (B.30)
Using the results (A.41) and (A.37) we can easily evaluate the Laplace
transforms of Φ1(y) and Φ2(y) , respectively. We obtain
L[Φ1(y)] = Φ̃1(s) = exp (−sα) , 0 < α < 1 , (B.31)
L[Φ2(y)] = Φ̃2(s) =
E1/α (−s) , 1 < α ≤ 2 , (B.32)
where E1/α(·) denotes the Mittag-Leffler function of order 1/α , see (A.23).
It is an instructive exercise to derive the asymptotic behaviours of Φ1(y) and
Φ2(y) as y → 0+ and y → +∞ . By using the expressions (B.28−29) in terms
of the function M and recalling the series and asymptotic representations of
this function, see (A.33) and (A.36), we obtain
Φ1(y) =
y−(2−α)/[2(1−α)] e−c1 y
−α/(1−α)
, as y → 0+ ,
Γ(1 − α)
y−α−1 [1 + O (y−α)] , as y → +∞ ,
(B.33)
Φ2(y) =
Γ(1 − 1/α)
[1 + O (y)] , as y → 0+ ,
y(2−α)/[2(α−1)] e−c2 y
α/(α−1)
, as y → +∞ ,
(B.34)
where c1 , c2 are positive constants depending on α . We note that the
exponential decay is found for Φ1(y) as y → 0+ but as y → +∞ for Φ2(y) .
Explicit expressions for stable pdf can be derived form those for the function
M(z; ν) when ν = 1/2 and ν = 1/3 , given in Appendix A, see (A.34-
35). Of course the ν = 1/2 expression can be used to recover the well-
known (symmetric) Gaussian distribution q2(y; 0) accounting for (B.29), and
the (one-sided) Lévy distribution q1/2(y;−1/2), see (B.19), accounting for
(B.28). The ν = 1/3 expression provides, accounting for (B.28),
q1/3(y;−1/3) = 3−1/3 y−4/3 Ai
(3y)−1/3
y−3/2 K1/3
(B.35)
where Ai denotes the Airy function and K1/3 the modified Bessel function of
the second kind of order 1/3 . The equivalence between the two expressions
in (B.35) can be proved in view of the relation, see Abramowitz & Stegun
(1965-1972) [(10.4.14)],
Ai (z) =
. (B.36)
The case α = 1/3 has also been discussed by Zolotarev (1983-1986), who
has quoted the corresponding expression of the pdf in terms of K1/3 .
A general representation of all stable distributions (thus including the
extremal distributions above considered) in terms of special functions has
been only recently achieved by Schneider (1986). In his remarkable
(but almost ignored) article, Schneider has established that all the stable
distributions can be characterized in terms of a general class of special
functions, the so-called Fox H functions, so named after Charles Fox (1961).
For details on Fox H functions, see e.g. the books Mathai & Saxena (1978),
Srivastava & Al. (1982) and the most recent paper by Kilbas and Saigo
(1999). These functions are expressed in terms of special integrals in the
complex-plane, the Mellin-Barnes integrals4.
4The names refer to the two authors, who in the first 1910’s developed the theory of
these integrals using them for a complete integration of the hypergeometric differential
equation. However, as pointed out in the the Bateman Project Handbook on High
Transcendental Functions, see Erdelyi (1953), these integrals were first used by S. Pincherle
in 1888. For a revisited analysis of the pioneering work of Pincherle (1853-1936, Professor
of Mathematics at the University of Bologna from 1880 to 1928) we refer to the paper by
Mainardi and Pagnini (2003).
References
[1] Abramowitz, M. and Stegun, I.A. (Editors) : Handbook of Mathematical
Functions, Dover, New York 1965. [reprint, 1972]
[2] Baillie, R.T. and King, M.L. (Editors) : Fractional Differencing and
Long Memory Processes, Journal of Econometrics, 73 1-324 (1996).
[3] Bergström, H. : On some expansions of stable distribution functions,
Ark. Mat., 2 375-378 (1952).
[4] Bol’shev, L.N., Zolotarev, V.M., Kedrova, E.S. and Rybinskaya, M.A.
: Tables of cumulative functions of on-sided stable distributions, Theor.
Probability Appl., 15 299-309 (1968).
[5] Bouchaud, J.-P and Potters, M. : Theory of Financial Risk: from Sta-
tistical Physics to Risk Management, Cambridge Univ. Press, Cambridge
2000. [English enlarged version of Théories des Risques Financiers, CEA,
Aléa, Saclay 1997.]
[6] Breiman, L. : Probability, SIAM, Philadelphia 1992. [The SIAM edition
is an unabridged, corrected republication of the 1st edn, Addison-Wesley,
Reading Mass. 1968]
[7] P.W. Buchen, and Mainardi, F. : Asymptotic expansions for transient
viscoelastic waves, J. de Mécanique, 14 597-608 (1975).
[8] Caputo, M. : Linear models of dissipation whose Q is almost frequency
independent, Part II., Geophys. J. R. Astr. Soc., 13 529-539 (1967).
[9] Caputo, M.: Elasticità e Dissipazione, Zanichelli, Bologna 1969.
[10] Caputo, M. and Mainardi, F. : Linear models of dissipation in anelastic
solids, Rivista del Nuovo Cimento (Ser. II), 1 161-198 (1971).
[11] Chung-Jeh, Chao : Explicit formula for the stable law of distribution,
Acta Math. Sinica, 3 177-185 (1953). [in Chinese with English summary]
[12] Chung, K.L. : A Course in Probability Theory, 2nd edn, Academic
Press, New York 1974. [1st edn, Harcourt Brace Jowvanovich, 1968]
[13] Dzherbashyan, M.M. and Nersesyan, A.B. : Fractional derivatives and
the Cauchy problem for differential equations of fractional order. Izv.
Acad. Nauk Armjanskvy SSR, Matematika, 3 3–29 (1968). [In Russian]
[14] Engler, H. : Similarity solutions for a class of hyperbolic integrodiffer-
ential equations, Differential Integral Eqns, 10 815-840 (1997).
[15] Erdélyi, A. (Editor) : Higher Transcendental Functions, Bateman
Project, McGraw-Hill, New York 1953; Vol. 1, Ch. 1, §1.19, p. 49.
[16] Erdélyi, A. (Editor) : Higher Transcendental Functions, Bateman
Project, McGraw-Hill, New York 1955; Vol. 3, Ch. 18, pp. 206-227.
[17] Erdélyi, A. (Editor) : Tables of Integral Transforms, Bateman Project
McGraw-Hill, New York 1954; Vol. 2, Ch. 13, pp. 181-212.
[18] Fama, E. and Roll, R. : Some properties of symmetric stable
distributions, J. Amer. Statist. Assoc., 63 817-836 (1968).
[19] Feller, W. : On a generalization of Marcel Riesz’ potentials and
the semi-groups generated by them, Meddelanden Lunds Universitets
Matematiska Seminarium (Comm. Sém. Mathém. Université de Lund),
Tome suppl. dédié a M. Riesz, Lund (1952), pp. 73-81.
[20] Feller, W. : An Introduction to Probability Theory and its Applications,
Vol. 1, 3rd edn, Wiley, New York 1968. [1st edn, 1957]
[21] Feller, W. : An Introduction to Probability Theory and its Applications,
Vol. 2, 2nd edn, Wiley, New York 1971. [1st edn. 1966]
[22] Fox, C. : The G and H functions as symmetrical Fourier kernels, Trans.
Amer. Math. Soc., 98 395-429 (1961).
[23] Fujita, Y. : Integrodifferential equation which interpolates the heat
equation and the wave equation, Osaka J. Math., 27 309-321, 797-804
(1990). [2 papers]
[24] Gawronski, W. : On the bell-shape of stable distributions, Annals of
Probability, 12 230-242 (1984).
[25] Gnedenko, B.V. and Kolmogorov, A.N. : Limit Distributions for Sums
of Independent Random Variables, Addison-Wesley, Cambridge, Mass.
1954. [English Transl. from the Russian edition 1949, with notes by K.L.
Chung, revised 1968]
[26] Gorenflo, R. : Fractional calculus: some numerical methods, in
Carpinteri, A. and Mainardi, F. (Eds.), Fractals and Fractional Calculus
in Continuum Mechanics, CISM Courses and Lectures # 378, Springer
Verlag, Wien 1997, pp. 277-290. [Reprinted in www.fracalmo.org]
[27] Gorenflo, R. and Mainardi, F. : Fractional calculus: integral and
differential equations of fractional order, in Carpinteri, A. and Mainardi,
F. (Eds.), Fractals and Fractional Calculus in Continuum Mechanics,
CISM Courses and Lectures # 378, Springer Verlag, Wien 1997, pp. 223-
276. [Reprinted in www.fracalmo.org]
[28] Gorenflo, R. and Mainardi, F. : Fractional calculus and stable
probability distributions, Archives of Mechanics, 50 377-388 (1998a).
[29] Gorenflo, R. and Mainardi, F. : Random walk models for space-
fractional diffusion processes, Fractional Calculus and Applied Analysis,
1 167-190 (1998b).
[30] Gorenflo, R. De Fabritiis, G. and Mainardi, F. : Discrete random walk
models for symmetric Lévy-Feller diffusion processes, Physica A, 269 79–
89 (1999).
[31] Gorenflo, R., Mainardi, F., Moretti, D., Pagnini, G. and Paradisi,
P. : Discrete random walk models for space-time fractional diffusion,
Chemical Physics, 284 521-541 (2002). Special issue on Strange Kinetics
Guest Editors: R. Hilfer, R. Metzler, A. Blumen, J. Klafter. [E-print
arXiv:cond-mat/0702072]
[32] Hall, P. : A comedy of errors: the canonical form for a stable
characteristic function, Bull. London Math. Soc., 13 23-27 (1980).
[33] Holt, D.R. and Crow, E.L. : Tables and graphs of the stable probability
density functions, J. Res. Nat. Bureau Standards, 77B 143-198 (1973).
[34] Humbert, P. : Nouvelles correspondances symboliques, Bull. Sci.
Mathém. (Paris, II ser.), 69 121-129 (1945).
[35] Janicki, A. and Weron, A. : Simulation and Chaotic Behavior of α-
Stable Stochastic Processes, Marcel Dekker, New York 1994.
[36] Khintchine, A.Y. : Limit Laws for Sums of Independent Variables,
ONTI, Moscow 1938 [in Russian]
[37] Kilbas, A.A. and Saigo, M. : On the H functions, Journal of Applied
Mathematics and Stochastic Analysis, 12 191-204 (1999).
[38] Kochubei, A.N. : A Cauchy problem for evolution equations of
fractional order, Differential Equations, 25 967–974 (1989). [English
translation from the Russian Journal Differentsial’nye Uravneniya]
[39] Kochubei, A.N. : Fractional order diffusion, Differential Equations,
26 485–492 (1990). [English translation from the Russian Journal
Differentsial’nye Uravneniya]
[40] Laha, R.G. and Rohatgi, V.K. : Probability Theory, Wiley, New York
1979.
[41] Lévy, P. : Théorie des erreurs. La Loi de Gauss et les lois exceptionelles,
Bull. Soc. Math. France, 52 49-85 (1924).
[42] Lévy, P. : Calcul des probabilités, Gauthier-Villars, Paris 1925: Part II,
Chap. 6.
[43] Lévy, P. : Théorie de l’addition des variables aléatoires, 2nd edn,
Gauthier-Villars, Paris 1954. [1st edn, 1937]
[44] Lukacs, E. : Characteristic Functions, 2nd edn, Griffin, London 1970.
[1st edn, 1960]
[45] Mainardi, F. : On the initial value problem for the fractional diffusion-
wave equation, in S. Rionero and T. Ruggeri (Eds), Waves and Stability
in Continuous Media, World Scientific, Singapore 1994, pp. 246-251.
[46] Mainardi, F. : The time fractional diffusion-wave equation, Radiofisika,
38 20-36 (1995). [English Translation: Radiophysics & Quantum
Electronics]
[47] Mainardi, F. : Fractional relaxation-oscillation and fractional diffusion-
wave phenomena, Chaos, Solitons & Fractals, 7 1461-1477 (1996).
[48] Mainardi, F. : Fractional calculus: some basic problems in continuum
and statistical mechanics, in Carpinteri, A. and Mainardi, F. (Eds),
Fractals and Fractional Calculus in Continuum Mechanics, CISM Courses
and Lectures # 378, Springer-Verlag, Wien 1997, pp. 291-348. [Reprinted
in www.fracalmo.org]
[49] Mainardi, F., Luchko, Yu. and Pagnini, G. : The fundamental solution
of the space-time fractional diffusion equation, Fractional Calculus and
Applied Analysis, 4 153-192 (2001). [E-print arXiv:cond-mat/0702419]
[50] Mainardi, F. and Pagnini, G. : Salvatore Pincherle, the pioneer of the
Mellin-Barnes integrals, Jour. Computational and Applied Mathematics,
153 331-342 (2003).
[51] Mainardi, F. and Tomirotti, M. : On a special function arising in the
time fractional diffusion-wave equation, in P. Rusev, I. Dimovski and V.
Kiryakova (Eds), Transform Methods and Special Functions, Sofia 1994,
Science Culture Technology, Singapore 1995, pp. 171-183.
[52] Mainardi, F. and Tomirotti, M. : Seismic pulse propagation with
constant Q and stable probability distributions, Annali di Geofisica, 40
1311-1328 (1997).
[53] Mandelbrot, B.B. and Zarnfaller, F. : Five place tables of certain stable
distributions, Technical Report RC-421, IBM Thomas J. Watson Research
Center, Yorktown Heights, New York, Dec 31, 1959.
[54] Mandelbrot, B.B. : The Fractal Geometry of Nature, Freeman, San
Francisco 1982.
[55] Mandelbrot, B.B. : Fractals and Scaling in Finance, Springer-Verlag,
New York 1997.
[56] Mantegna, R.N. and Stanley, H.E. : An Introduction to Econophysics,
Cambridge University Press, Cambridge 2000.
[57] Mathai, A.M. and Saxena, R.K. : The H-function with Applications in
Statistics and Other Disciplines, Wiley Eastern Ltd, New Delhi 1978.
[58] Mikusiński, J. : On the function whose Laplace transform is
exp (−sαλ) , Studia Math., 18 191-198 (1959).
[59] Miller, K.S. and Ross, B. : An Introduction to the Fractional Calculus
and Fractional Differential Equations, Wiley, New York 1993.
[60] Oldham, K.B. and Spanier, J. : The Fractional Calculus, Academic
Press, New York 1974.
[61] Pincherle, S. : Sulle funzioni ipergeometriche generalizzate, Atti R.
Accademia Lincei, Rend. Cl. Sci. Fis. Mat. Nat. (4), 4, 694-700, 792-799
(1888).
[62] Podlubny, I. : Fractional Differential Equations, Academic Press, San
Diego 1999. [Mathematics in Science and Engineering, Vol. 198]
[63] Pollard, H. : The representation of exp (−xλ) as a Laplace integral,
Bull. Amer. Math. Soc., 52, 908-910 (1946).
[64] Prüss, J. : Evolutionary Integral Equations and Applications,
Birkhäuser, Basel 1993.
[65] Rubin, B. : Fractional Integrals and Potentials, Pitman Monographs
and Surveys in Pure and Appl. Mathematics # 82, Longman, London
1996.
[66] Saichev, A.I and Zaslavsky, G.M. : Fractional kinetic equations:
solutions and applications, Chaos, 7 753-764 (1997).
[67] Samko, S.G., Kilbas, A.A. and Marichev, O.I. : Fractional Integrals and
Derivatives, Theory and Applications, Gordon and Breach, Amsterdam
1993. [Engl. Transl. from the Russian edition, 1987]
[68] Samorodnitsky, G. and Taqqu, M.S. : Stable non-Gaussian Random
Processes, Chapman & Hall, New York 1994.
[69] Schneider, W.R. : Stable distributions: Fox function representation and
generalization, in S. Albeverio, G. Casati and D. Merlini (Eds), Stochastic
Processes in Classical and Quantum Systems, Lecture Notes in Physics #
262, Springer Verlag, Berlin 1986, 497-511.
[70] Schneider, W.R. and Wyss, W. : Fractional diffusion and wave
equations, J. Math. Phys., 30 134-144 (1989).
[71] Srivastava, H.M., Gupta, K.C. and Goyal, S.P. : The H-Functions of
One and Two Variables with Applications, South Asian Publ., New Delhi
1982.
[72] H. Takayasu, H. : Fractals in the Physical Sciences, Manchester Univ.
Press, Manchester and New York 1990.
[73] Uchaikin, V.V. and Zolotarev, V.M. : Chance and Stability. Stable
Distributions and their Applications, VSP, Utrecht 1999. [Series ”Modern
Probability and Statistics”, No 3]
[74] Zolotarev, V.M. : One-dimensional stable distributions, Amer. Math.
Soc., Providence, R.I. 1986. [English Transl. from the Russian edition,
1982]
	Introduction
	The standard diffusion equation
	The time-fractional diffusion equation
	The Cauchy problem for the time-fractional diffusion equation
	The Signalling problem for the time-fractional diffusion equation
	The Cauchy problem for the symmetric space-fractional diffusion equation
	Conclusions
ABSTRACT
  Fractional calculus allows one to generalize the linear, one-dimensional,
diffusion equation by replacing either the first time derivative or the second
space derivative by a derivative of fractional order. The fundamental solutions
of these equations provide probability density functions, evolving on time or
variable in space, which are related to the class of stable distributions. This
property is a noteworthy generalization of what happens for the standard
diffusion equation and can be relevant in treating financial and economical
problems where the stable probability distributions play a key role.

<|endoftext|><|startoftext|>
Fabrication of half metallicity in a ferromagnetic metal
Kalobaran Maiti∗
Department of Condensed Matter Physics and Materials’ Science,
Tata Institute of Fundamental Research, Homi Bhabha Road, Colaba, Mumbai - 400 005, INDIA
(Dated: August 15, 2021)
We investigate the growth of half metallic phase in a ferromagnetic material using state-of-the-art
full potential linearized augmented plane wave method. To address the issue, we have substituted Ti
at the Ru-sites in SrRuO3, where SrRuO3 is a ferromagnetic material. Calculated results establish
Ti4+ valence states (similar to SrTiO3), which was predicted experimentally. Thus, Ti substitution
dilutes the Ru-O-Ru connectivity, which is manifested in the calculated results in the form of
significant band narrowing leading to finite gap between t2g and eg bands. At 75% substitution, a
large gap (> 2eV) appears at the Fermi level, ǫF in the up spin density of states, while the down spin
states contributes at ǫF characterizing the system a half-metallic ferromagnet. The t2g − eg gap can
be tailored judiciously by tuning Ti concentrations to minimize thermal effects, which is often the
major bottleneck to achieve high spin polarization at elevated temperatures in other materials. This
study, thus, provides a novel but simple way to fabricate half-metallicity in ferromagnetic materials,
which are potential candidates for spin-based technology.
PACS numbers: 85.70.Ay, 75.30.-m, 71.70.Ch, 71.15.Ap
The search of half metallic ferromagnetic materials has
seen an explosive growth in the recent times due to its
potential technological applications. In these materials,
the electronic density of states (DOS) at the Fermi level,
ǫF corresponds to only one kind of spin, while the other
spin density of states exhibit an energy gap at ǫF . Thus,
in the polarized condition, electronic conduction strongly
depends on the spin of the charge carriers; the material
is insulating for one kind of spin and metallic for the
other. This unique property makes them ideal candidates
for the development of spin-based electronics. Various
theoretical studies predicted half metallicity in Heusler
alloys [1], double perovskites [2], manganates [3], CrO2
[4], graphene nanoribbons [5] etc. However, experimen-
tal studies on very few materials such as manganates [3]
and CrO2 [4], etc. exhibit half metallicity at low temper-
atures. Thermal fluctuations often lead to a reduction in
spin polarization at elevated temperatures [6] making it
difficult for technological applications.
In this study, we investigate the evolution of the elec-
tronic density of states in SrRu1−xTixO3 as a function
of x. SrRuO3 is a ferromagnetic metal with Curie tem-
perature of 165 K. Spin polarization at ǫF is found to be
negative in the ferromagnetic ground state [7, 8]. SrTiO3,
on the other hand, is a band insulator. Various experi-
mental studies [9, 10] suggest (4+) valence state of Ti in
the intermediate compositions (similar to SrTiO3), which
corresponds to 3d0 electronic configuration. Thus, in ad-
dition to disorder effect, Ti substitution leads to a dilu-
tion of Ru-O-Ru connectivity. Transport measurements
in SrRu1−xTixO3 exhibit a range of novel phase tran-
sitions involving disorder induced correlated metal, An-
derson insulator, correlated insulator and band insulators
[11] for different values of x.
Using ab initio calculations, we find that Ti substitu-
tion at Ru-sites in ferromagnetic SrRuO3 leads to half
FIG. 1: (color online) Crystal structure of SrRu0.25Ti0.75O3.
In order to obtain the structure of SrRuTiO3, we replaced Ti2
by Ru, and all the Ti and Ru sites are made equivalent.
metallicity. Here, reduced Ru-O-Ru connectivity due to
Ti-substitution leads to significant narrowing of Ru 4d
band and thus, the up spin band moves below ǫF . In-
terestingly, the energy gap between t2g and eg bands can
be tuned by Ti-concentration. 75% substituted sample
exhibits gap as high as 2 eV. Experimental realization of
such method on different systems would provide a new
direction in the search of HMFs for spin-based technol-
The electronic density of states of SrRu1−xTixO3 for
x = 0.0, 0.5, 0.75 and 1.0 were calculated using state-
of-the-art full potential linearized augmented plane wave
method (FLAPW) within the local spin density approxi-
mations (LSDA) using WIEN2K software [12]. The crystal
structure of SrTiO3 is cubic with the lattice constant, a =
3.905 Å. SrRuO3 possesses close to cubic structure with
small orthorhombic distortion. This is manifested clearly
by the similar density of states (DOS) of SrRuO3 in real
structure vis-a-vis in the equivalent cubic structure [7].
Ti-substitution in SrRuO3 leads the system towards cu-
http://arxiv.org/abs/0704.0321v1
�� �� �� � � � �  !"
/0 12 34 5 6 7 8 9 :;
?@A BCDE
^_` ab cd efg
lmn op qr stu
z{| } ~� ���
��� �� ��
¡¢£¤¥¦ §¨©ª
FIG. 2: (color online) (a) TDOS, (b) Ti 3d PDOS, (c) Ru 4d
PDOS, (d) O 2p PDOS and (e) Sr 4d PDOS of SrRu1−xTixO3.
Thin and thick solid lines represent DOS corresponding to x
= 0.5 and 0.75, respectively.
bic structure. Thus, we have considered cubic structure
for all the calculations in this study. A typical unit cell
for SrRu0.25Ti0.75O3 is shown in Fig. 1. There are 8 for-
mula units in the unit cell constructed by doubling the
lattice constant of SrTiO3. In order to preserve cubic
symmetry, three types of Ti are considered occupying
corners (Ti1), edge centers (Ti2) and face centered posi-
tions (Ti3). The body centered position is occupied by
Ru. There are three non-equivalent oxygens; O1 forms
the octahedra around Ti1-sites, O2 forms the octahedra
around Ru-sites and the rest of the oxygen positions are
occupied by O3. Thus, the connectivity between Ru-sites
occurs via Ru-O2 bondings. The muffin-tin radii (RMT )
for Sr, Ru, Ti and O were set to 1.16 Å 0.95 Å 0.95 Å and
0.74 Å respectively. The convergence for different calcu-
lations were achieved considering 512 k points within the
first Brillouin zone. The error bar for the energy conver-
gence was set to < 0.25 meV per formula unit. In every
case, the charge convergence was achieved to be less than
10−3 electronic charge.
In Fig. 2, We show the total DOS calculated for
SrRu1−xTixO3 (x = 0.5 and 0.75) and the partial DOS
obtained by projecting the eigenstates onto the Ti 3d,
Ru 4d, O 2p and Sr 4d states. The figure exhibits 5
distinctly separable features. The energy region -1.5 eV
to -5 eV is primarily contributed by O 2p partial DOS
with negligible contributions from other electronic states.
Thus, these contributions are characterized due to the
non-bonding O 2p states. Sr 4d partial DOS shown in
Fig. 2(e) appear above 5 eV. The peak appears to shift
towards higher energy with increasing x. This can be
·¸ ¹º »¼ ½¾ ¿ À Á Â Ã ÄÅ
ÑÒÓÔÕ
×ØÙÚÛ
ÝÞ ßà
áâãäå
íîïðñ
óô õö
÷øùúûü ýþÿ�
�	���

�� ��
�����
89:;<
FIG. 3: (color online) (a) TDOS, (b) Ti 3d and Ru 4d PDOS,
(c) O 2p PDOS and (d) Sr 4d PDOS of SrTiO3 and SrRuO3.
Dashed line represent Sr 4d PDOS rescaled by 20 times.
understood by comparing the same in the end members,
SrTiO3 and SrRuO3 as demonstrated in Fig. 3. Sr 4d
states appear at much higher energies in SrTiO3 com-
pared to that in SrRuO3. One reason for such a large
shift may be related to the shift of the Fermi level to
the top of the O 2p band in SrTiO3. However, the shift
of Sr 4d band in the intermediate compositions, where
the Fermi level is pinned by the occupancy of the Ru 4d
band, indicates that the Madelung potential at Sr-sites
increases with the increase in Ti concentrations.
Ti 3d partial DOS appears 2 eV above the Fermi level.
This clearly demonstrates that the occupancy of Ti 3d
states is essentially zero and hence correspond to Ti4+
valency. Such valence states was predicted in the x-ray
photoemission spectra [9]. This study provides evidence
of such effect theoretically within the effective single par-
ticle approach itself. The width of the Ti 3d t2g band is
significantly small in x = 0.5 sample (∼ 0.65 eV), which
increases to 1.5 eV in x = 0.75 sample and 2.5 eV at x
= 1.0 (see Fig. 3).
Ru 4d partial DOS exhibit three regions. The narrow
and intense feature between the energy range -1.6 to 0.5
eV correspond to the electronic states having t2g sym-
metry. The electronic states above 1.8 eV appears due
to Ru 4d states having eg symmetry. Notably, the O
2p states also contribute in all the three energy regions.
Thus, DOS appearing below -5 eV can be attributed to
the Ru 4d - O 2p bonding states having a large O 2p
character, and the energy region above -1.5 eV are the
anti-bonding states having primarily Ru 4d character.
Most interestingly, both the compounds exhibit metallic
ground state. However, the t2g bandwidth, W reduces
significantly with the increase in x. While W is close to
2.6 eV in SrRuO3, it is about 1.7 eV for x = 0.5 and 0.54
eV for x = 0.75. Such reduction in W is understand-
able as Ti-substitution leads to a significant reduction
in the hopping interaction strength due to the reduced
degree of Ru-O-Ru connectivity. This is clearly evident
in Fig. 1; if we assume homogeneous distribution of Ru
and Ti atoms in the solid, all the RuO6 octahedra are
separated by TiO6 octahedra at x = 0.5. At x = 0.75,
the number of Ru-[O-Ti-O]-Ru connectivity reduces to
half of that at x = 0.5. Subsequently, U/W (U = local
Coulomb interactions strength) will increase significantly
and presumably play a role in the transport properties
in these compositions [11].
In order to understand the bonding of Ru 4d electronic
states with various O 2p states, we compare the Ru 4d t2g
and eg bands with the 2p bands corresponding to O1, O2
and O3 for x = 0.75 and 0.5 sample in Fig. 4(a) and 4(b),
respectively. All the oxygens are equivalent in the x = 0.5
sample. The energy distribution of O2 2p partial DOS is
almost identical in Fig. 4(a) to that observed in Ru 4d
partial DOS. This is expected as the RuO6 octahedra is
formed by O2 atoms only. The width of the O2 2p band
is significantly larger than that of O1 and O3. The most
interesting observation is that the t2g and eg bands are
separated by a distinct energy gap. This gap is already
visible in Ru 4d partial DOS of x = 0.5 sample in Fig.
4(b) and is absent in SrRuO3 as shown in Fig. 3 and in
the literature as well [7, 13].
We calculate the crystal field splitting of the Ru 4d
band by measuring the separation of the center of gravity
of the Ru 4d t2g and eg bands as shown in Fig. 4 by closed
circles in both the compositions. It is evident that crystal
field splitting, ∆ remains almost the same (∼ 2.1 eV) in
both the compositions and is very close to 2 eV found
in SrRuO3. Thus, the large energy gap between the t2g
and eg bands appears purely due to the band narrowing.
Such effect has strong implication in the magnetic phase
as described below.
It is already well established that the magnetic ground
state can be exactly described by these band structure
calculations [7, 14, 15, 16]. Thus, we have calculated the
ground state energies for ferromagnetic arrangement of
moments of the constituents using local spin-density ap-
proximations. Interestingly, the eigen energy for the fer-
romagnetic ground state in x = 0.5 sample is 5.67 meV/fu
lower than the lowest eigen energy for the non-magnetic
solution. This is higher than 1.2 meV/fu observed in
SrRuO3 in real structure and significantly smaller than
30.4 meV/fu observed in the equivalent cubic struc-
ture of SrRuO3. This energy difference between the
non-magnetic and magnetic solutions increases to 33.95
meV/fu in x = 0.75. All these results suggest that the
stability of the ferromagnetic ground state increases with
the decrease in the degree of charge delocalization of the
CD EF GH I J K L M
TU VW XY Z [ \ ] ^
bcd ef gh
ij kl
mn op
qrs tu vwxyz{|}
²³´ µ¶
·¸¹º»¼ ½¾¿À
FIG. 4: (color online) Ru 4d partial DOS with t2g and eg
symmetry are compared with the O 2p partial DOS in (a)
SrRu0.25Ti0.75O3 and (b) SrRu0.5Ti0.5O3.
ÇÈ É Ê Ë Ì
ÍÎ Ï Ð Ñ Ò
Ö× ØÙ ÚÛ Ü Ý Þ ß
àá âã äå æ ç è é
î ï ð ñ
ò ó ô õ
ùú ûü ýþ ÿ
�� �� �� �
��
� �� ���� �� ��
'()* +,-.
23 4567
[\]^_ `abc
ghij kl mnop
qr st
���� ����
�� ���� ����
 Energy (eV)
ª«¬­ ®¯°±
FIG. 5: (color online) Up and down spin density of stated
corresponding to (a) Ru 4d in SrRu0.5Ti0.5O3, (b) O 2p in
SrRu0.5Ti0.5O3, (c) Ru 4d in SrRu0.25Ti0.75O3, and (d) O
2p in SrRu0.25Ti0.75O3. This figure demonstrates that band
narrowing in Ru 4d band leads to a gap in the up spin channel
leading to half metallicity.
valence electrons.
The spin magnetic moment centered at Ru-sites is
found to be about 0.6 µB in x = 0.5 sample. Inter-
estingly, magnetic moment at the interstitial electronic
states is significantly large (∼ 0.36 µB). The moment
at the O sites is about 0.05 µB. The Ti sites also ex-
hibit very small moment (∼ -0.03 µB). Thus the total
magnetic moment of the solid becomes 1.24 µB per Ru-
atom. This is very similar to that observed (1.2 µB) in
SrRuO3. The magnetic moments increase significantly
with the increase in x. The moments at Ru site becomes
0.88 µB in x = 0.75 sample. The moments of the intersti-
tial states and 2p states at O2 sites also enhance to 0.66
µB and 0.066 µB, respectively. Thus, the total moment
turns out to be 1.99 µB, which is very close to the spin
only value of 2 µB corresponding to Ru 4t
2g electronic
configuration. It is to note here that although the local
moment of the highly extended 4d states is significantly
smaller than the spin only value as opposed to the case
in 3d transition metal oxides [15], Ru 4d moment induces
a large degree of polarization in the interstitial and O 2p
electrons. These results evidently suggest applicability
of Stoner description to capture magnetic properties of
these systems.
In order to investigate the exchange splitting and the
character of density of states in the vicinity of ǫF , we plot
the spin-resolved DOS corresponding to Ru 4d and O 2p
partial DOS in Fig. 5. In the x = 0.5 sample, both the up
and down spin states contribute at ǫF and the exchange
splitting is found to be about 0.47 eV. This is again very
similar to the case in SrRuO3 [7]. The exchange splitting
increases to 0.65 eV in x = 0.75 sample as shown in the
figure. Interestingly, the up spin band moves significantly
below ǫF and the contributions at ǫF appears only due
to the down spin states indicating a half-metallic behav-
ior. No contribution of the up spin states observed in
the total density of states (not shown here). Considering
the paucity of half-metallic materials for various tech-
nological applications, achieving half metallicity in the
ferromagnetic SrRuO3 by Ti-substitution is remarkable.
It is believed that the half metallicity can be achieved
via strong d − d hybridization in Heusler alloys involv-
ing two transition metal elements in the compound [17].
In transition metal oxides, often doping of large amount
of electrons or holes leads to a shift of the Fermi level
towards the energy gap of one spin channel leading to
half metallicity [3]. The primary difficulty to use these
systems in technological applications is the loss of half
metallicity at elevated temperatures, where thermal ex-
citations leads to significant mixing of various spin chan-
nels due to small energy gap at ǫF [6]. In the present
case, mechanism to achieve half metallicity is simple and
easily achievable experimentally. The most important
aspect is that the energy gap between t2g and eg bands
can be tailored judiciously by tuning the composition to
minimize thermal effects.
In summary, we investigate the possibility of fabricat-
ing half metallicity by Ti-substitution at the Ru-sites in
a ferromagnetic material, SrRuO3. The calculated re-
sults using FLAPW method within the local spin density
approximations reveal tetravalency of Ti in all the com-
positions consistent with the experimental predictions.
The Ru 4d band exhibit significant narrowing with the
increase in Ti-substitution; the crystal field splitting re-
mains almost the same across the whole series. Thus,
an energy gap develops between the t2g and eg bands,
which gradually grows with the increase in x. Conse-
quently, the up spin density of states exhibit an energy
gap at the Fermi level, while the down spin states still
contribute leading to half metallicity. Most interestingly,
the t2g − eg gap can be engineered by tuning x and thus
spin mixing effects due to thermal excitations can be min-
imized. This study thus provide a novel but simple way
to fabricate half metallicity in ferromagnetic materials,
which are potential candidates for spin based technol-
ogy. Experimental realization of this method would help
both chemists and physicists to cultivate new materials.
In addition, this study demonstrates that effective sin-
gle particle approaches provide a remarkable description
of the electronic properties of these systems, which are
predicted experimentally.
∗ Electronic mail: kbmaiti@tifr.res.in
[1] R.A. de Groot, F.M. Mueller, P.G. van Engen, and
K.H.J. Buschow, Phys. Rev. Lett. 50, 2024-2027 (1983),
[2] K.-I. Kobayashi, T. Kimura, H. Sawada, K. Terakura,
and Y. Tokura, Nature 395, 677-680 (1998).
[3] J.H. Park et al., Nature 392, 794-796 (1998).
[4] R.S. Keizer, S.T.B. Goennenwein, T.M. Klapwijk, G.
Miao, G. Xiao, and A. Gupta, Nature 439, 825-827
(2006).
[5] Y.-W. Son, M.L. Cohen, and S.G. Louie, Nature 444,
347-349 (2006).
[6] M. Ležaić, Ph. Mavropoulos, J. Enkovaara, G.
Bihlmayer, and S. Blügel, Phys. Rev. Lett. 97, 026404
(2006).
[7] K. Maiti, Phys. Rev. B 73, 235110 (2006).
[8] D.C. Worledge and T.H. Geballe, Phys. Rev. Lett. 85,
5182 (2000).
[9] J. Kim, J.-Y. Kim, B.-G. Park, and S.-J. Oh, Phys. Rev.
B 73, 235109 (2006).
[10] S. Ray, D.D. Sarma, and R. Vijayaraghavan, Phys. Rev.
B 73, 165105 (2006).
[11] K.W. Kim, J.S. Lee, T.W. Noh, S.R. Lee, and K. Char,
Phys. Rev. B 71, 125104 (2005).
[12] P. Blaha, K. Schwarz, G.K.H. Madsen, D. Kvasnicka, and
J. Luitz, WIEN2k, An Augmented Plane Wave + Lo-
cal Orbitals Program for Calculating Crystal Properties
(Karlheinz Schwarz, Techn. Universität Wien, Austria),
2001. ISBN 3-9501031-1-2.
[13] D.J. Singh, J. Appl. Phys. 79, 4818-4820 (1996).
[14] N. Hamada, H. Sawada, I. Solovyev, and K. Terakura,
Physica B 237-238, 11-13 (1997).
[15] D.D. Sarma, N. Shanthi, S.R. Barman, N. Hamada, H.
Sawada, and K. Terakura, Phys. Rev. Lett. 75, 1126
(1995).
[16] K. Maiti, Phys. Rev. B 73, 115119 (2006).
[17] I. Galanakis, P.H. Dederichs, and N. Papanikolaou, Phys.
Rev. B 66, 134428 (2002); ibid, 66, 174429 (2002).
ABSTRACT
  We investigate the growth of half metallic phase in a ferromagnetic material
using state-of-the-art full potential linearized augmented plane wave method.
To address the issue, we have substituted Ti at the Ru-sites in SrRuO3, where
SrRuO3 is a ferromagnetic material. Calculated results establish Ti4+ valence
states (similar to SrTiO3), which was predicted experimentally. Thus, Ti
substitution dilutes the Ru-O-Ru connectivity, which is manifested in the
calculated results in the form of significant band narrowing leading to finite
gap between t2g and eg bands. At 75% substitution, a large gap (> 2 eV) appears
at the Fermi level, e_F in the up spin density of states, while the down spin
states contributes at e_F characterizing the system a half-metallic
ferromagnet. The t2g - eg gap can be tailored judiciously by tuning Ti
concentrations to minimize thermal effects, which is often the major bottleneck
to achieve high spin polarization at elevated temperatures in other materials.
This study, thus, provides a novel but simple way to fabricate half-metallicity
in ferromagnetic materials, which are potential candidates for spin-based
technology.

<|endoftext|><|startoftext|>
Emergence of spatiotemporal chaos driven by far-field breakup of spiral waves in the
plankton ecological systems
Quan-Xing Liu,1 Gui-Quan Sun,1 Bai-Lian Li,2 and Zhen Jin1, ∗
Department of Mathematics, North University of China,
Taiyuan, Shan’xi 030051, People’s Republic of China
Ecological Complexity and Modeling Laboratory, Department of Botany and Plant Sciences,
University of California, Riverside, CA 92521-0124, USA
(Dated: October 25, 2018)
Alexander B. Medvinsky et al [A. B. Medvinsky, I. A. Tikhonova, R. R. Aliev, B.-L. Li, Z.-S. Lin,
and H. Malchow, Phys. Rev. E 64, 021915 (2001)] and Marcus R. Garvie et al [M. R. Garvie and C.
Trenchea, SIAM J. Control. Optim. 46, 775-791 (2007)] shown that the minimal spatially extended
reaction-diffusion model of phytoplankton-zooplankton can exhibit both regular, chaotic behavior,
and spatiotemporal patterns in a patchy environment. Based on that, the spatial plankton model
is furtherly investigated by means of computer simulations and theoretical analysis in the present
paper when its parameters would be expected in the case of mixed Turing-Hopf bifurcation region.
Our results show that the spiral waves exist in that region and the spatiotemporal chaos emerge,
which arise from the far-field breakup of the spiral waves over large ranges of diffusion coefficients
of phytoplankton and zooplankton. Moreover, the spatiotemporal chaos arising from the far-field
breakup of spiral waves does not gradually involve the whole space within that region. Our results
are confirmed by means of computation spectra and nonlinear bifurcation of wave trains. Finally,
we give some explanations about the spatially structured patterns from the community level.
PACS numbers: 87.23.Cc, 82.40.Ck, 82.40.Bj, 92.20.jm
Keywords: Spiral waves; Spatio-temporal pattern; Plankton dynamics; Reaction-diffusion system
I. INTRODUCTION
There is a growing interest in the spatial pattern dy-
namics of ecological systems [1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13]. However, many mechanisms of the spatio-
temporal variability of natural plankton populations are
not known yet. Pronounced physical patterns like ther-
moclines, upwelling, fronts and eddies often set the frame
for the biological process. Measurements of the underwa-
ter light field are made with state-of-the-art instruments
and used to calculate concentrations of phytoplankton
biomass (as chlorophyll) as well as other forms of organic
matter. Very high diffusion of the marine environment
would prevent the formation of any stable patch spatial
distribution with much longer life-time than the typical
time of biodynamics. Meanwhile, in addition to very
changeable transient spatial patterns, there also exist
other spatial patterns in marine environment, much more
stable spatial structure associated with ocean fronts, spa-
tiotemporal chaos [10, 11, 14], cyclonic rings, and so
called meddies [15]. In fact, it is significant to create
the biological basis for understanding spatial patterns
of plankton [16]. For instance, the impact of space on
the persistence of enriched ecological systems was proved
in laboratory experiments [17]. Recently, it has been
shown both in laboratory experiments [18] and theoreti-
cally [14, 19, 20, 21] that the existence of a spatial struc-
ture makes a predator-prey system less prone to extinc-
∗Corresponding author; Electronic address: jinzhn@263.net
tion. This is due to the temporal variations of the density
of different sub-populations can become asynchronous
and the events of local extinction can be compensated
due to re-colonization from other sites in the space [22].
During a long period of time, all the spiral waves have
been widely observed in diverse physical, chemical, and
biological systems [23, 24, 25, 26]. However, a quite lim-
ited number of documents [11, 12, 27, 28, 29] concern
the spiral wave pattern and its breakup in the ecological
systems.
The investigation of transition from regular patterns
to spatiotemporally chaotic dynamics in spatially ex-
tended systems remains a challenge in nonlinear sci-
ence [14, 23, 30, 31]. In a nonlinear ecology system, the
two most commonly seen patterns are spiral waves and
turbulence (spatio-temporal chaos) for the level of the
community [32]. It has been recently shown that sponta-
neous spatiatemoporal pattern formation is an instrinsic
property of a predator-prey system [11, 14, 33, 34, 35, 36]
and spatiotemporal structures play an important role in
ecological systems. For example, spatially induced speci-
ation prevents the extinction of the predator-prey mod-
els [11, 12, 37]. So far, plankton patchiness has been ob-
served on a wide range of spatial temporal scales [38, 39].
There exist various, often heuristic explanations of the
spatial patterns phenomenon for these systems. It should
be noted that, although conclusive evidence of ecological
chaos is still to be found, there is a growing number of
indications of chaos in real ecosystems [40, 41, 42, 43].
Recently developed models show that spatial self-
structuring in multispecies systems can meet both cri-
teria and provide a rich substrate for community-level
http://arxiv.org/abs/0704.0322v3
mailto:jinzhn@263.net
section and a major transition in evolution. In present
paper, the scenario in the spatially extended plankton
ecological system is observed by means of the numeri-
cal simulation. The system has been demonstrated to
exhibit regular or chaostic, depending on the initial con-
ditions and the parameter values [10, 29]. We find that
the far-field breakup of the spiral wave leads to complex
spatiotemporal chaos (or a turbulentlike state) in the spa-
tially extended plankton model (1). Our results show
that regular spiral wave pattern shifts into spatiotempo-
ral chaos pattern by modulating the diffusion coefficients
of the species.
II. MODEL
In this paper we study the spatially extended nutrient-
phytoplankton-zooplankton-fish reaction-diffusion sys-
tem. Following Scheffer’s minimal approach [44], which
was originally formulated as a system of ordinary diff-
ential equation (ODEs) and later developed models [10,
11, 29, 45, 46], as a further investigation, we study a
two-variable phytoplankton and zooplankton model on
the level of the community to describe pattern formation
with the diffusion. The dimensionless model is written
= rp(1 − p)−
1 + bp
h+ dp∇
2p, (1a)
1 + bp
h−mh− f
n2 + h2
+ dh∇
2h, (1b)
where the parameters are r, a, b, m, n, dp, dh, and
f which refer to work in Refs. [10, 11]. The explana-
tion of model (1) relates to the nutrient-phytoplankton-
zooplankton-fish ecological system [see Refs. [10, 29, 44]
for details]. The local dynamics are given by
g1(p, h) = rp(1− p)−
1 + bp
h, (2a)
g2(p, h) =
1 + bp
h−mh− f
n2 + h2
. (2b)
From the earlier results [45] about non-spatial system
of model (1) by means of numerical bifurcation analysis
show that the bifurcation and bistability can be found in
the system (1) when the parameters are varied within a
realistic range. For the fixed parameters (see the caption
of Fig. 1 and 2), we can see that the f controls the dis-
tance from Hopf bifurcation. For larger f , there exists
only one stable steady state. As f is decreased further,
the homogeneous steady state undergoes a saddle node
bifurcation (SN), that is fSN = 0.658. In this case, a
stable and an unstable steady state become existence.
Moreover, the bistability will emerge when the parame-
ter f lies the interval fSN > f > fc = 0.445 (this value is
more than the Hopf onset, fH = 0.3397). There are three
steady states: with these kinetics A and C are linearly
stable while B is unstable. Outside this interval, the sys-
tem (1) has unique nontrivial equilibrium. Recent stud-
ies [11, 29] shown that the systems (1) can well-develop
the spiral waves in the oscillation regime, but where the
authors only consider the special case, i.e., dp = dh. A
few important issue have not yet been properly addressed
such as the spatial pattern if dp 6= dh.
Here we report the result that emergence of spatiotem-
poral chaos due to breakup in the system under the
dh 6= dp case. We may now use the f and diffusion
ratio, ν = dh/dp, as control parameters to evaluate
the region for the spiral wave. Turing instability in
reaction-diffusion can be recast in terms of matrix sta-
bility [47, 48]. Such with the help of Maple software
assistance algebra computing, we obtain the parameters
space (f, ν) bifurcation diagrams of the spiral waves as
showing Fig. 2, in which two lines are plotted, Hopf line
(solid) and Turing lines (dotted) respectively. In domain
I, located above all three bifurcation lines, the homo-
geneous steady states is the only stable solution of the
system. Domain II are regions of homogeneous oscilla-
tion in two dimensional spaces [49]. In domain III, both
Hopf and Turing instabilities occur, (i.e., mixed Turing-
Hopf modes arise), in which the system generally pro-
duces the phase waves. Our results show that the system
has spiral wave in this regions. One can see that a Hopf
bifurcation can occur at the steady when the parameter
f passes through a critical values fH while the diffusion
coefficients dp = dh = 0 and the bifurcation periodic so-
lutions are stable. From our analysis (see Fig. 2), one
could also see that the diffusion can induce Turing type
instability for the spatial homogeneous stable periodic
solutions and the spatially extended model (1) exhibit
spatio-temporal chaos patterns. These spatial pattern
formation arise from interaction between Hopf and Tur-
ing modes, and their subharmonics near hte codimension-
two Hopf-Turing bifucation point. Special, it is interest-
ing that spiral wave and travelling wave will appear when
the parameters correspond to the Turing-Hopf bifurca-
tion region III in the spatially extended model (1), i.e.,
the Turing instability and Hopf bifurcation occur simul-
taneously.
III. NUMERICAL RESULTS
The simulation is done in a two-dimensional (2D)
Cartesian coordinate system with a grid size of 600×600.
The fourth order Runger-Kutta integrating method is
applied with a time step ∆t = 0.005 time unit and a
space step ∆x = ∆y = 0.20 length unit. The results
remain the same when the reaction-diffusion equations
were solved numerically in one and two spatial dimen-
sions using a finite-difference approximation for the spa-
tial derivatives and an explicit Euler method for the time
integration. Neumann (zero-flux) boundary conditions
FIG. 1: The sketch map for the bistability and the Hopf bi-
furcation in the system (2) with r = 5.0, a = 5.0, b = 5.0,
m = 0.6, and n = 0.4. The black curve is the g1(p, h). The
colored curves are g2(p, h) with different values of f . The red
curve: f = 0.3; the blue: f = 0.445; the green: f = 0.5; and
the cyan: f = 0.658.
5 10 15
Turing instability
FIG. 2: The sketch map of parameter space (f, ν) bifurcation
diagrams for the spatially extended system (1) with r = 5.0,
a = 5.0, b = 5.0, m = 0.6, dp = 0.05, and n = 0.4.
were emmployed in our simulation. The diffusion terms
in Eqs. (1a) and (1b) often describe the spatial mixing
of species due to self-motion of the organism. The typi-
cal diffusion coefficient of plankton patterns dp is about
0.05, based on the parameters estimatie of Refs [50, 51]
using the relationship between turbulent diffusion and
the scale of the space in the sea. In the previous stud-
ies [10, 11, 29, 45, 46], the authors provided a valueable
insight into the role of spatial pattern for the system (1)
if dp = dh. From the biological meaning, the diffusion
coefficients should satisfy dh ≥ dp. However, in nature
waters it is turbulent diffusion that is supposed to domi-
nate plankton mixing [52], when dh < dp is allowed. The
other reason for choosing such parameter is that it is well-
known new patterns, such as Turing patterns, can emerge
in reaction-diffusion systems in which there is an imbal-
ance between the diffusion coefficients dp and dh [23, 53].
Therefore, we set ν = dh/dp, and investigated whether a
spiral wave would break up into complex spatiotemporal
chaos when the diffusion ratio was varied. Throughout
this paper, we fix dp = 0.05 and dh is a control parameter.
In the following, we will show that the dynamic behav-
ior of the spiral wave qualitatively change as the control
parameter dh increases from zero, i.e., the diffusion ra-
tio ν increases from zero, to more than one. For large
ν (ν > 1), the outwardly rotating spiral wave is com-
pletely stable everywhere, and fills in the space when the
proper parameters are chosen, as shown in Fig. 3(A). Fig-
ure 3(A) shows a series of snapshots of a well-developed
single spiral wave formed spontaneously for the variable
p in system (1). The spiral is initiated on a 600×600 grid
by the cross-field protocol (the initial distribution chosen
in the form of allocated “constant-gradient” perturbation
of the co-existence steady state) and zero boundary con-
ditions are employed for simulations in the two dimen-
sions. From Fig. 3(A) we can see that the well-developed
spiral waves are formed firstly by the evolution. Inside
the domain, new waves emerge, but are evolved by the
spiral wave growing from the center. The spiral wave
can steadily grow and finally prevail over the whole do-
main (a movie illustrating the dynamical evolution for
this case [54] [partly movie−1, movie−2, and movie−3
for dh = 0.2]). Fig. 3(B) shows that the spiral wave
first break up far away from the core center and even-
tually relatively large spiral fragments are surrounded
by a ‘turbulent’ bath remain. The size of the surviv-
ing part of the spiral does not shrink when dh is further
decreasing until finally dh equals to 0, which is different
from phenomenon that is observed previous in the two-
dimensional space Belousov-Zhabotinsky and FitzHugn-
Nagumo oscillatory system [30, 31, 55, 56, 57], in which
the breakup gradually invaded the stable region near the
core center, and finally the spiral wave broke up in the
whole medium. Figure 3(C) is the time sequences (ar-
bitrary units) of the variables p and h at an arbitrary
spatial point within the spiral wave region, from which
we can see that the spiral waves are caused by the ac-
cepted as “phase waves” with substantially group veloc-
ity, phase velocity and sinusoidal oscillation rather than
the relaxational oscillation with large amplitude. This
breakup scenario is similar to the breakup of rotating
spiral waves observed in numerical simulation in chemi-
cal systems [30, 31, 55, 56, 57], and experiments in BZ
systems [58, 59], which shows that spiral wave breakup
in these systems was related to the Eckhaus instability
and more important, the absolute instability.
The corresponding trajectories of the spiral core and
the spiral arm (far away from the core center) at y = 300
are shown in Fig. 4, respectively. From Fig. 4, we can
see that the spiral core is not completely fixed, but oscil-
lates with a large amplitude. However, as dh decreases
to a critical value, an unstable modulation develops in
200 220 240 260 280 300
(D) t (arb. units)
FIG. 3: Well developed spiral waves and some properties of
them. The figures show simulations of the system (1) with
r = 5, a = 5, b = 5, m = 0.6, n = 0.4, dp = 0.05, and
f = 0.3. (A)Well developed spiral waves shown at subsequent
snapshot in time, dh = 0.2. (B) Far-field breakup of the spiral
waves shown at subsequent snapshot in time, dh = 0.002.
The white (black) areas correspond to maximum (minimum)
values of p [Additional movie format available from Ref. [54]].
(C) Oscillations of the variable p and h at an arbitrary spatial
point within the regular spiral wave region for both scenarios.
Each figure is ran the long time until it spatial patterns are
unchange.
regions which is far away from the spiral core (cf. the
middle column of the Fig. 4). These oscillations eventu-
ally grow large enough to cause the spiral arm far away
from the core to breakup into complex multiple spiral
waves, while the core region remains stable (the corre-
sponding movie can be viewed in the online supplemen-
tal in Ref. [54] [partly movie−1 and movie−2, and for
dh = 0.02]). Figures 3(B) and 4(B) show the dynamic
behavior for dh = 0.02, i.e., ν = 0.4. The regular tra-
jectories far away from the core are now the same as the
region of the spatial chaos (cf. the middle column of the
Fig. 4). It is shown that an decrease in the diffusion ra-
tio ν which leads to population oscillations of increasing
amplitude (cf. the left column of the Fig. 4). In the
tradition explain that the minimum value of the popula-
tion density decreases and population extinction becomes
more probable due to stochastic environmental perturba-
tions. However, from the spatial evolution of system (1)
(see Fig. 3), the temporal variations of the density of
different sub-population can become asynchronous and
the events of local extinction can be compensated due to
re-colonization (or diffusion) from other sites.
FIG. 4: The corresponding trajectories (from left to right)
for locations (300, 300), (250, 300), and (50, 300) respectively.
The parameters in (A), and (B) were the same as these in
Fig. 3(A) and (B), respectively.
Furthermore, it is well known that the basic arguments
in spiral stability analysis can be carried out by reducing
the system to one dimensional space [30, 31, 55, 56, 57].
Here we show some essential properties of the spiral
breakup resulting from the numerical simulation. In the
next section we will give the theoretical computation by
using the eigenvalue spectra. In this model, it is worth
noting that we do not neglect the oscillation of the dy-
namics in the core as shown in Fig. 4 due to the system
exhibiting spatial periodic wave trains when the model
is simulated in one-dimensional space. Breakup occurs
first far away from the core (the source of waves). The
spiral wave breaks towards the core until it gets to some
constant distance and then the surviving part of the spi-
ral wave stays stable. These minimal stable wavelengths
are called λmin. So the one-parameter family may be
described by a dispersion curve λ(dh) (see Fig. 5). The
minimal stable wavelength λmin of the spiral wave are
shown in Fig. 5 coming from the simulation in two di-
mensional space. The results of Fig. 5 can be interpreted
as follows: the minimal stable wavelengths decrease with
respect to the decrease of dh but eventually stay at a
relative constant value, which is that the stable spiral
waves are always existing for a larger region values of dh.
Space-time plots at different times are shown in Fig. 6
for two different dh, i.e., different ν, which display the
time evolution of the spiral wave along the cross section
in the two-dimensional images of Fig. 3(A) and (B). As
shown in Fig. 6(A) and (B) for dh = 0.2 and dh = 0.02
respectively, the waves far away from the core display
unstable modulated perturbation due to convective in-
stability [30, 31, 55, 56, 57], but this perturbation is
gradually advected to the left and right sides, and finally
disappears. The instability manifests itself to produce
the wave train breakup several waves from the far-field,
as shown in Figs. 6(B).
FIG. 5: Dependence of the wavelength λmin on the parameter
dh for the system (1) with r = 5.0, a = 5.0, b = 5.0, m = 0.6,
dp = 0.05, and n = 0.4. Note the log scale for dh.
IV. SPECTRA AND NONLINEAR
BIFURCATION OF THE SPIRAL WAVE
In this section, we concentrate on the linear stabil-
ity analysis of spiral wave by using the spectrum the-
ory [56, 60, 61, 62, 63]. From the results in Refs. [56, 62]
we know that the absolute spectrum must be computed
numerically for any given reaction-diffusion systems. In
practice, such computations only require discretization
in one-dimensional space and compare with computing
eigenvalues of the full stability problem on a large do-
main due to the spiral wave exhibitting traveling waves
in the plane (see Fig. 6 about the space-time graphes).
For spiral waves on the unbounded plane, the essential
FIG. 6: Space-time plots of variable p for different time and
dh. The parameters in (A), and (B) are the same as those in
Fig. 3(A) and (B), respectively.
spectrum is also required to compute, since it determined
only by the far-field wave trains of the spiral. The lin-
ear stability spectrum consists of point eigenvalues and
the essential spectrum that is a continuous spectrum for
spiral waves.
For sake of simplicity, the Eqs. (1a) and (1b) can been
written as following
= dp∇
2p+ g1(p, h), (3a)
= dh∇
2h+ g2(p, h). (3b)
Suppose that (p∗, h∗) are a solutions and refer to them
as steady spirals of Eq. (3) that rotate rigidly with a
constant angular velocity ω, and that are asymptotically
periodic along rays in the plane. In a coratating coordi-
nate frame, using the standardized analysis method for
the spiral waves [62, 63], the Eq. (3) is given by
= dp∇
ρ,θp+ ω
+ g1(p
∗, h∗), (4a)
= dh∇
ρ,θh+ ω
+ g2(p
∗, h∗), (4b)
where (ρ, θ) denote polar coordinates, spirals waves are
relative equilibria, then the statianry solutions p∗(ρ, θ)
and h∗(ρ, θ) both are 2π-periodic functions with θ = ϕ−
ωt. In Eqs. (4a) and (4b) the operator∇2ρ,θ denotes ∂ρρ+
A. Computation of spiral spectra
Next, we commpute the leading part of its linear stabil-
ity spectrum for the system (4). Consider the linearized
evolution equation in the rotating frame, the eigenvalue
problem of Eqs. (4a) and (4b) associated with the planar
spiral solutions p∗(ρ, θ) and h∗(ρ, θ) are given by
ρ,θp+ ω
∗, h∗)p+ gh1 (p
∗, h∗)h = λp, (5a)
ρ,θh+ ω
∗, h∗)p+ gh2 (p
∗, h∗)h = λh, (5b)
where g
1 , · · · , g
2 denote the derivatives of the nonlin-
ear functions and g
1(p, h) = r(1 − p) − rp −
(1+bp)2
, gh1 (p, h) = −
2(p, h) =
− abph
(1+bp)2
, and
gh2 (p, h) =
−m− 2fnh
n2+h2
+ 2fnh
(n2+h2)2
. We shall ignore
isolated eigenvalues that belong to the point spectrum,
instabilities caused by point eigenvalues lead to mean-
deringor drifting waves, or to an unstable tip motionin
in excitable media and oscillation media [56, 64, 65, 66].
This phenomenon is not shown in the present paper. In-
stead, we focus on the continuous spectrum that is re-
sponsible for the spiral wave breakup in the far field (see
Fig. 3(b)). By the results in Ref. [62], it turns out that
the boundary of the continuous spectrum depends only
on the limiting equation for ρ → ∞. Thus, we have that
λ is the boundary of the continuous spectrum if, and only
if the limiting equation
ρ,ρp+ ω
∗, h∗)p+ gh1 (p
∗, h∗)h = λp, (6a)
ρ,ρh+ ω
∗, h∗)p+ gh2 (p
∗, h∗)h = λh, (6b)
have solutions p(ρ, θ) and h(ρ, θ) for (ρ, θ) ∈ R+× [0, 2π],
which are bounded but does not decay as ρ → ∞. Since
spiral waves are rotating waves in the plane, the wave
train solutions have the form as u(t, x, y) = u(ρ, ϕ− ωt)
for an appropriate wave numbers k and temporal fre-
quency ω, where we assume that u is 2π-periodic in
its argument so that u(ξ) = u(ξ + 2π) for all ξ and
u = (p, h)T. Spiral waves converge to wave trains
u(ρ, ϕ − ωt) → uwt(kρ + ϕ − ωt) for ρ → ∞, which
are corresponding to asymptotically Archimedean in the
two-dimensional space. Assume that k 6= 0 and ω 6= 0,
and in this case, we can pass from the theoretical frame ρ
to the comoving frame ξ = kρ+ϕ−ωt (ξ ∈ R) in which
the eigenvalue equation (6) becomes
2∇2ξ,ξp+ωpξ+g
1(uwt(ξ))p+g
1 (uwt(ξ))h = λp, (7a)
2∇2ξ,ξh+ ωhξ + g
2(uwt(ξ))p + g
2 (uwt(ξ))h = λh.(7b)
Indeed, any nontrivial solution u(ξ) = (p(ξ), h(ξ))T cor-
responding to the linearization eigenvalue problem (7)
give a solution U(ρ, ·) of the eigenvalue problem for the
temporal period map of (3) in the corotating frame via
U(ρ, ·) = eλtu(kρ− ωt), U(ρ, T ) = eλTu(kρ− 2π).
We write the equations (7) as the first-order systems
= p1,
= h1,
= k−2d−1p
µp− ωp1 − g
1(uwt(ξ))p− g
1 (uwt(ξ))h
= k−2d−1
µh− ωh1 − g
2(uwt(ξ))p− g
2 (uwt(ξ))h
in the radial variable ρ. Then the spatial eigenvalues or
spatial Floquet exponents are deternined as the roots of
the Wronskian
A(λ, k) :=
0 0 1 0
0 0 0 1
(λ− g
1(uwt(ξ))) −
gh1 (uwt(ξ)) −
2(uwt(ξ))
(λ− gh2 (uwt(ξ))) 0 −
where k ∈ R. The function U(ρ, ·) = eλteikρu0(kρ− ωt)
satisfies the equation (3) when the spatial and temporal
exponents ik and λ satisfy the complex dispersion rela-
tion det(A(λ, k) − ik) = 0 for λ ∈ C. We call the ik
in spectrum of A(λ, k) as spatial eigenvalues or spatial
Floquet exponents.
The stability of the spiral waves state (p∗, h∗) on the
plane is determined by the essential spectrum given by
Σess = {λ ∈ C; det(A(λ, k) − ik) = 0 for some k ∈ R}.
Now, we compute the continuous spectrum with the
equation (9) that are parameterized by the wave num-
ber k. For each λ, there are infinitely many stable and
unstable spatial eigenvalues. We plot λ in the complex
plane associated spatial spectrum, see Fig. 7. By the ex-
plaination of Sandstede et al [60], one would know that
if the real part of the essentail spectra is positive, then
the associated eigenmodes grow exponentially toward the
boundary, i.e., they correspond to a far-field instability.
Note that we find the essentail spectra are not sensitive
to temporal frequency, ω.
Re(λ)
K30 K20 K10 0
Im(λ)
Re(λ)
K0.8 K0.6 K0.4 K0.2 0 0.2
Im(λ)
FIG. 7: The essentail spectra of wave trains are obtained by
using the algorithms outlined in Refs. [60, 61]. The param-
eters of (A) and (B) are corresponding to the values used in
the simulations of Fig. 3(A) and (B).
B. Existence and properties of wave trains
Suppose that a reaction-diffusion system on the one-
dimensional space such that the variables equal to a
homogeneous stationary solution. If the homogeneous
steady-state destabilizes, then its linearization accommo-
dates waves of the form ei(kx−ωt) for certain values k and
ω. Typically, near the transition to instability, small spa-
tially periodic travelling waves arise for any wave number
close to kc, which is the critical wavenumber. Their wave
speed is approximately equal to ωc
, where ωc is corre-
sponding to kc. In present paper, we focus exclusively on
the situation where ωc = 0 and kc 6= 0. The bifurcation
with ωc = 0 and kc 6= 0 is known as the Turing bifur-
cation, and the bifurcating spatially periodic steady pat-
terns are often referred to as Turing patterns. Another
class of moved patterns will appear when the instabilities
modulated by Hopf-Turing bifurcation, which is resem-
ble a travelling waves. Moreover, the common feature
of the spiral waves in one-dimensional space mentioned
above is the presence of wave trains which are spatially
periodic travelling waves of the form pwt(kx−ωt; k) and
hwt(kx − ωt; k), where pwt(φ; k) and hwt(φ; k) are 2π-
periodic about φ. Typically, the spatial wavenumber k
and the temporal frequency ω are related via the non-
linear dispersion relation ω = ω(k) so that the phase
velocity is given by
. (12)
A second quantity related to the nonlinear dispersion
relation is the group velocity, cg =
, of the wave
train which also play a central role in the spiral waves.
The group velocity cg gives the speed of propagation of
small localized wave-package perturbations of the wave
train [67]. Here, we are only concerned the existence of
travelling wave solution. In fact, the spiral waves move
at a constant speed outward from the core (see Fig. 6),
so that they have the mathematical form p(x, t) = P (z),
and h(x, t) = H(z) where z = x−cpt. Substituting these
solution forms into Eq. (3) gives the ODEs
+ g1(P,H) = 0, (13a)
+ g2(P,H) = 0. (13b)
Here, we investigate numerically the existence, speed
and wavelength of travelling wave patterns. Our ap-
proach is to use the bifurcation package Matcont 2.4 [68]
to study the pattern ODEs (13). To do this, the most
natural bifurcation parameters are the wave speed cp and
f , but they give no information about the stability of
travelling wave as solutions of the model PDEs (3).
Our starting point is the homogeneous steady state of
Eq. (13) with in the domain III of Fig. 2. The typical bi-
furcation diagrams are illustrated in Fig. 8, which shows
that steady spatially peroidic travelling waves exist for
the larger values of the speed cp, but it is unstable for
small values of cp. The changes in stability occur via
Hopf bifurcation, from which a branch of periodic orbits
emanate. Note that here we use the terms “stable” and
“unstable” as referring to the ODEs system (13) rather
than the model PDEs. Fig. 8(B) illustrates the max-
imun stable wavelength against the bifucation parame-
ter, speed cp, and the small amplitudes have very long
wavelength. It is known that cp =
, hence the tavelling
wave solution exist when the cp 6= 0, i.e., k 6= 0, ω 6= 0.
Using Matcont 2.4 package, it is possible to track the lo-
cus of the Hopf bifurcation points and the Limit point
(fold) bifurcation in a parameter plane, and a typical ex-
ample of this for the cp-f and cp-dh plane are illustrated
in Fig. 9. The travelling wave solutions exist for values
of cp and f lying in left of Hopf bifurcation locus (see
Fig. 9(A)). The same structure about the cp-dh plane
is shown in Fig. 9(B). These reuslts confirm our previ-
ous analysis coming from the algebra computation (see
Fig. 2) and the numerical results (see Fig. 6).
V. CONCLUSIONS AND DISCUSSION
We have investigated a spatially extended plank-
ton ecological system within two-dimensional space and
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
Speed, c
Hopf bifurcation point
FIG. 8: Typical bifurcation diagrams for the pattern
ODEs (13). (A) The spatially periodic travelling waves of
system (3) is existence. The changes in stability occur via
Hopf bifurcation, from which a branch of periodic orbits em-
anate. Thus unstable travelling waves appear. (B) Maxi-
mum stable wavelength along the bifurcation parametercp,
i.e., k 6= 0, ω 6= 0. The parameter values in (A) and (B) are
the same as Fig. 3(A).
found that its spatial patterns exhibit spiral waves dy-
namics and spatial chaos patterns. Specially, the sce-
nario of the spatiotemporal chaos patterns arising from
the far-field breakup is observed. Our research is based
on numerical analysis of a kinematic mimicking the dif-
fusion in the dynamics of marine organisms, coupled to a
two component plankton model on the level of the com-
munity. By increasing (decreasing) the diffusion ratio of
the two variables, the spiral arm first broke up into a
turbulence-like state far away from the core center, but
which do not invade the whole space. From the previous
studies in the Belousov-Zhabotinsky reaction, we know
the reason causing this phenomenon can be illuminated
theoretically by the M. Bär and L. Brusch [30, 31], as
well as by using the spectrum theory that poses by B.
Sandstede, A. Scheel et al [56, 60, 61, 69]. The far-field
breakup can be verified in field observation and is useful
to understand the population dynamics of oceanic ecolog-
ical systems. Such as that under certain conditions the
interplay between wake (or ocean) structures and bio-
logical growth leads to plankton blooms inside mesoscale
hydrodynamic vortices that act as incubators of primary
production. From Fig. 3 and corresponding the movies,
we see that spatial peridic bloom appear in the phyto-
plankton populations, and the details of spatial evolution
of the distribution of the phytoplankton population dur-
ing one bloom cycle, respectively.
In Ref. [70], the authors study the optimal control of
the model (1) from the spatiotemporal chaos to spiral
waves by the parameters for fish predation treated as a
multiplicative control variable. Spatial order emerges in a
range of spatial models of multispecies interactions. Un-
surprisingly, spatial models of multispecies systems often
0 0.5 1 1.5 2 2.5
Speed, c
Locus of Hopf bifurcation points
0.5 1 1.5 2
Speed, c
Locus of Hopf bifurcation points
FIG. 9: An illustration of the variations in parameter space of
the pattern ODEs (13). We plot the loci of Hopf bifurcation
points. (A) f − cp planes; (B) dh − cp planes. The parameter
values in (A) and (B) are the same as Fig. 3(A).
manifests very different behaviors from their mean-field
counterparts. Two important general features of spatial
models of multispecies systems are that they allow the
possibility of global persistence in spite of local extinc-
tions and so are usually more stable than their mean-field
equivalents, and have a tendency to self-organzie spa-
tially or regular spatiotemporal patterns [70, 71]. The
spatial structures produces nonrandom spatial patterns
such as spiral waves and spatiotemporal chaos at scales
much larger than the scale of interaction among individ-
uals level. These structures are not explicitly coded but
emerge from local interaction among individuals and lo-
cal diffusion.
As we know that plankton plays an important role in
the marine ecosystem and the climate, because of their
participation in the global carbon and nitrogen cycle at
the base of the food chain [72]. From the review [73], a
recently developed ecosystem model incorporates differ-
ent phytoplankton functional groups and their competi-
tion for light and multiple nutrients. Simulations of these
models at specific sites to explore future scenarios sug-
gest that global environmental change, including global-
warming-induced changes, will alter phytoplankton com-
munity structure and hence alter global biogeochemical
cycles [74]. The coupling of spatial ecosystem model to
global climate raises again a series of open questions on
the complexity of model and relevant spatial scales. So
the study of spatial model with large-scale is more impor-
tant in the ecological system. Basing on numerical simu-
lation on the spatial model, we can draft that the oceanic
ecological systems show permanent spiral waves and spa-
tiotemporal chaos in large-scale over a range of parame-
ter values dh, which indicates that periodically sustained
plankton blooms in the local area. As with all areas of
evolutionary biology, theoretical development advances
more quickly than does empiraical evidence. The most
powerful empirical approach is to conduct experiments
in which the spatial pattern can be measured directly,
but this is difficulties in the design. However, we can in-
directly measured these phenomenona by the simulation
and compared with the satellite pictures. For example,
the spatiotemporal chaos patterns agree with the per-
spective observation of the Fig. 3 in Ref. [73]. Also, some
satellite imageries [http://oceancolor.gsfc.nasa.gov] have
displayed spiral patterns that represent the phytoplank-
ton [the chlorophyll] biomass and thus demonstrated that
plankton patterns in the ocean occur on much broader
scales and therefore mechanisms thought diffusion should
be considered.
Acknowledgments
This work is supported by the National Natural Sci-
ence Foundation of China under Grant No. 10471040
and the Natural Science Foundation of Shan’xi Province
Grant No. 2006011009.
[1] R. E. Amritkar and Govindan Rangarajan. Spatially
synchronous extinction of species under external forcing.
Phys. Rev. Lett., 96(25):258102, 2006.
[2] Andrzej Pekalski and Michel Droz. Self-organized packs
selection in predator-prey ecosystems. Phys. Rev. E,
73(2):021913, 2006.
[3] Y.-Y. H. Sayama, M. A. M. de Aguiar, and M. Baranger.
Interplay between turing pattern formation and do-
main coarsening in spatially extended population models.
FORMA, 18:19, 2003.
[4] E. Gilad, J. von Hardenberg, A. Provenzale, M. Shachak,
and E. Meron. Ecosystem engineers: From pat-
tern formation to habitat creation. Phys. Rev. Lett.,
93(9):098105, 2004.
[5] Mark J Washenberger, Mauro Mobilia, and Uwe C
Täuber. Influence of local carrying capacity restrictions
on stochastic predator&ndash;prey models. J. Phys.:
Cond. Matt., 19(6), 2007.
[6] Mauro Mobilia, Ivan Georgiev, and Uwe Täuber. Phase
transitions and spatio-temporal fluctuations in stochastic
lattice lotkacvolterra models. J. Stat. Phys., 128(1):447–
483, 2007.
[7] Bernd Blasius, Amit Huppert, and Lewi Stone. Com-
plex dynamics and phase synchronization in spatially ex-
tended ecological systems. Nature, 399(6734):354–359,
1999.
[8] J. von Hardenberg, E. Meron, M. Shachak, and Y. Zarmi.
Diversity of vegetation patterns and desertification.
Phys. Rev. Lett., 87(19):198101, 2001.
[9] A. Provata and G. A. Tsekouras. Spontaneous formation
of dynamical patterns with fractal fronts in the cyclic
lattice lotka-volterra model. Phys. Rev. E, 67(5):056602,
2003.
[10] Alexander B. Medvinsky, Irene A. Tikhonova, Rubin R.
Aliev, Bai-Lian Li, Zhen-Shan Lin, and Horst Malchow.
Patchy environment as a factor of complex plankton dy-
namics. Phys. Rev. E, 64(2):021915, 2001.
[11] Alexander B. Medvinsky, Sergei V. Petrovskii, Irene A.
Tikhonova, Horst Malchow, and Bai-Lian Li. Spatiotem-
poral complexity of plankton and fish dynamics. SIAM
Review, 44:311–370, 2002.
[12] W. S. C. Gurney, A. R. Veitch, I. Cruickshank, and
G. McGeachin. circles and spirals: population persis-
tence in a spatially explicit predatorcprey model. Ecol-
ogy, 79(7):2516–2530, 1998.
[13] J. D. Murray. Mathematical biology. Interdisciplinary
applied mathematics. Springer, New York, 3rd edition,
2002.
[14] Sergei Petrovskii, Bai-Lian Li, and Horst Malchow. Tran-
sition to spatiotemporal chaos can resolve the paradox of
enrichment. Ecological Complexity, 1:37–47, 2004.
[15] Laurence Armi, Dave Hebert, Neil Oakey, James Price,
Philip L. Richardson, Thomas Rossby, and Barry Rud-
dick. The history and decay of a mediterranean salt lens.
Nature, 333(6174):649–651, 1988.
[16] Esa Ranta, Veijo Kaitala, and Per Lundberg. The Spa-
tial Dimension in Population Fluctuations. Science,
278(5343):1621–1623, 1997.
[17] L S Luckinbill. The effects of space and enrichment on a
predator-prey system. Ecology, 55:1142–1147, 1974.
[18] M Holyoak. Effects of nurient enrichment on predator-
prey metapopulation dynamics. J Anim. Ecol., 69:985–
997, 2000.
[19] Vincent A.A. Jansen. Regulation of predator-prey sys-
tems through spatial interactions: a possible solution to
the paradox of enrichment. Oikos, 74:384390, 1995.
[20] Vincent A.A. Jansen and Alun L. Lloyd. Local stabil-
ity analysis of spatially homogeneous solutions of multi-
patch systems. J. Math. Biol., 41:232252, 2000.
[21] Vincent A.A. Jansen. The dynamics of two diffusively
coupled predatorprey populations. Theor. Popul. Biol.,
59:119131, 2001.
[22] J. C. Allen, W. M. Schaffer, and D. Rosko. Chaos reduces
species extinction by amplifying local population noise.
Nature, 364:229232, 1993.
[23] M. C. Cross and P. C. Hohenberg. Pattern formation
outside of equilibrium. Rev. Mod. Phys., 65(3):851, 1993.
[24] Kyoung J. Lee, Raymond E. Goldstein, and Edward C.
Cox. Resetting wave forms in dictyostelium territories.
Phys. Rev. Lett., 87(6):068101, 2001.
[25] Satoshi Sawai, Peter A. Thomason, and Edward C. Cox.
An autoregulatory circuit for long-range self-organization
http://oceancolor.gsfc.nasa.gov
in dictyostelium cell populations. Nature, 433(7023):323–
326, 2005.
[26] Arthur T. Winfree. Varieties of spiral wave behavior:
An experimentalist’s approach to the theory of excitable
media. Chaos, 1(3):303–334, 1991.
[27] V. N. Biktashev, J. Brindley, A. V. Holden, and M. A.
Tsyganov. Pursuit-evasion predator-prey waves in two
spatial dimensions. Chaos, 14(4):988–994, 2004.
[28] M Garvie. Finite-difference schemes for reaction-diffusion
equation modeling predato-prey interactions in matlab.
Bull. Math. Biol., 69:931–956, 2007.
[29] Marcus R. Garvie and Catalin Trenchea. Optimal con-
trol of a nutrient-phytoplankton-zooplankton-fish sys-
tem. SIAM J. Contr. and Opti., 46(3):775–791, 2007.
[30] Markus Bär and Lutz Brusch. Breakup of spiral waves
caused by radial dynamics: Eckhaus and finite wavenum-
ber instabilities. New Journal of Physics, 6:5, 2004.
[31] Markus Bär and Michal Or-Guil. Alternative scenar-
ios of spiral breakup in a reaction-diffusion model with
excitable and oscillatory dynamics. Phys. Rev. Lett.,
82(6):1160–1163, 1999.
[32] Craig R Johnson and Maarten C Boerlijst. Selection at
the level of the community: the importance of spatial
structure. Trends Ecol. and Evol., 17:83–90, 2002.
[33] M Pascual. Diffusion-induced chaos in a spatial preda-
torprey system. Proc. R. Soc. Lond. B, 251:17, 1993.
[34] J. A. Sherratt, B. T. Eagan, and M. A. Lewis. Oscil-
lations and chaos behind predatorprey invasion: math-
ematical artifact or ecological reality? Phil. Trans. R.
Soc. Lond. B, 352:2138, 1997.
[35] S. V. Petrovskii and H. Malchow. Wave of chaos: new
mechanism of pattern formation in spatio-temporal pop-
ulation dynamics. Theor. Popul. Biol., 59:157174, 2001.
[36] J. A. Sherratt. Periodic travelling waves in cyclic preda-
torprey systems. Ecol. Lett., 4:3037, 2001.
[37] N. J. Savill and P. Hogeweg. Spatially induced speciation
prevents extinction: the evolution of dispersal distance in
oscillatory predator-prey models. Proc. R. Soc. Lond. B,
265(1390):25–32, 1998.
[38] E. R. Abraham. The generation of plankton patchiness
by turbulent stirring. Nature, 391:577–580, 1998.
[39] Carol L. Folt and Carolyn W. Burns. Biological drivers
of zooplankton patchiness. Nature, 14:300–305, 1999.
[40] M Scheffer. Should we expect strange attractors behind
plankton dynamics and if so, should we bother? J. Plank-
ton Res., 13:1291–1305, 1991.
[41] I. Hanski, P. Turchin, E. Korplmakl, and H. Henttonen.
Population oscillations of boreal rodents: regulation by
mustelid predators leads to chaos. Nature, 364:232235,
1993.
[42] S. Ellner and P. Turchin. Chaos in a noisy world: new
methods and evidence from time-series analysis. Am.
Nat., 145:343375, 1995.
[43] B. Dennis, R. A. Desharnais, J. M. Cushing, S. M. Hen-
son, and R. F. Costantino. Estimating chaos and com-
plex dynamics in an insect population. Ecol. Monogr.,
71:277303, 2001.
[44] M Scheffer. Fish and nutrients interplay determines algal
biomass: A minimal model. Oikos, 62:271–282, 1991.
[45] H. Malchow. Spatio-temporal pattern formation in non-
linear non-equilibrium plankton dynamics. Procc. R. Soc.
Lond. B, 251:103, 1993.
[46] M. Pascual. Diffusion-induced chaos in a spatial
predator-prey system. Procc. R. Soc. Lond. B, 251:1–7,
1993.
[47] Arnd Scheel. Radialy symmetric patterns of reaction-
diffusion systems. Mem. Amer. Math. Soc., 165:86, 2003.
[48] R A Satnoianu, M Menzinger, and P K Maini. Turing
instabilities in general system. J. Math. Biol., 41:493–
512, 2000.
[49] Quan-Xing Liu, Bai-Lian Li, and Zhen Jin. Resonant pat-
terns and frequency-locked induced by additive noise and
periodically forced in phytoplankton-zooplankton sys-
tem, 2007.
[50] Sven Erik Jørgensen and G. Bendoricchio. Fundamentals
of ecological modelling. Developments in environmental
modelling; 21. Elsevier, Amsterdam; New York, 3rd edi-
tion, 2001.
[51] Akira Okubo. Diffusion and ecological problems: mathe-
matical models. Biomathematics; v. 10. Springer-Verlag,
Berlin; New York, 1980.
[52] George Sugihara and Robert M. May. Nonlinear forecast-
ing as a way of distinguishing chaos from measurement
error in time series. Nature, 344(6268):734–741, 1990.
[53] A. M. Turing. The chemical basis of morphogenesis.
Philosophical Transactions of the Royal Society of Lon-
don. Series B, Biological Sciences, 237(641):37–72, 1952.
[55] Fagen Xie, Dongzhu Xie, and James N. Weiss. Inwardly
rotating spiral wave breakup in oscillatory reaction-
diffusion media. Phys. Rev. E, 74(2):026107, 2006.
[56] Björn Sandstede and Arnd Scheel. Absolute versus
convective instability of spiral waves. Phys. Rev. E,
62(6):7708–7714, 2000.
[57] S. M. Tobias and E. Knobloch. Breakup of spiral waves
into chemical turbulence. Phys. Rev. Lett., 80(21):4811–
4814, 1998.
[58] Q. Ouyang and J. M. Flesselles. Transition from spirals
to defect turbulence driven by a convective instability.
Nature, 379(6561):143–146, 1996.
[59] Qi Ouyang, H. L. Swinney, and G. Li. Transition from
spirals to defect-mediated turbulence driven by a doppler
instability. Phys. Rev. Lett., 84(5):1047–1050, 2000.
[60] Björn Sandstede and Arnd Scheel. Curvature effects
on spiral spectra: Generation of point eigenvalues near
branch points. Phys. Rev. E, 73:016217, 2006.
[61] Jens D.M. Rademacher, Björn Dandstede, and Arnd
Scheel. Computing absolute and essential spectra using
continuation. Physics D, 229:166–183, 2007.
[62] P. Wheeler and D. Barkley. Computation of Spiral Spec-
tra. SIAM J Appl. Dynam. Syst., 2006.
[63] Björn Sandstede and Arnd Scheel. Absolute and convec-
tive instabilities of waves on unbounded and large bound
domians. Physics D, 145:233–277, 2000.
[64] Dwight Barkley. Linear stability analysis of rotat-
ing spiral waves in excitable media. Phys. Rev. Lett.,
68(13):2090–2093, 1992.
[65] Dwight Barkley. Euclidean symmetry and the dynamics
of rotating spiral waves. Phys. Rev. Lett., 72(1):164–167,
1994.
[66] Igor Aranson, Lorenz Kramer, and Andreas Weber. Core
instability and spatiotemporal intermittency of spiral
waves in oscillatory media. Phys. Rev. Lett., 72(15):2316–
2319, 1994.
[67] Björn Sandstede and Arnd Scheel. Defects in oscillatory
media: Toward a classification. SIAM J. Appl. Dynam.
Syst., 3(1):1–68, 2004.
[68] A Dhooge, W Govaerts, Yu A Kuznetsov, W Mestrom,
A M Riet, and B Sautois. Matcont and Cl-Matcont: Con-
tinuation toolboxes in Matlab. Utrecht University, The
Netherlands, 2006.
[69] Paul Wheeler and Dwight Barkley. Computation of spiral
spectra. SIAM J. Appl. Dynam. Syst., 5:157–177, 2006.
[70] Maarten C. Boerlijst. The Geometry of Ecological Inter-
actions: Simplifying Spatial Complexity, chapter Spirals
and spots: Novel Evolutionary Phenomena through spa-
tial self-structuring, pages 171–182. Cambridge Univer-
sity Press, 2000.
[71] Ulf Dieckmann, Richard Law, and Johan A J Metz. The
Geometry of Ecological Interactions: Simplifying Spatial
Complexity. Cambridge University Press, 2000.
[72] J. Duinker and G. Wefer. Das co2-problem und die rolle
des ozeans. Naturwissenschaften, 81(6):237–242, 1994.
[73] M. Pascual. Computational ecology: From the complex
to the simple and back. Plos Comput. Biol., 1(2):101–
105, 2005.
[74] E. Litchman, C. A. Klausmeier, J. R. Miller, O. M.
Schofield, and P. G. Falkowski. Multi-nutrient, multi-
group model of present and future oceanic phytoplankton
communities. Biogeosciences, 3(4):585–606, 2006.
ABSTRACT
  Alexander B. Medvinsky \emph{et al} [A. B. Medvinsky, I. A. Tikhonova, R. R.
Aliev, B.-L. Li, Z.-S. Lin, and H. Malchow, Phys. Rev. E \textbf{64}, 021915
(2001)] and Marcus R. Garvie \emph{et al} [M. R. Garvie and C. Trenchea, SIAM
J. Control. Optim. \textbf{46}, 775-791 (2007)] shown that the minimal
spatially extended reaction-diffusion model of phytoplankton-zooplankton can
exhibit both regular, chaotic behavior, and spatiotemporal patterns in a patchy
environment. Based on that, the spatial plankton model is furtherly
investigated by means of computer simulations and theoretical analysis in the
present paper when its parameters would be expected in the case of mixed
Turing-Hopf bifurcation region. Our results show that the spiral waves exist in
that region and the spatiotemporal chaos emerge, which arise from the far-field
breakup of the spiral waves over large ranges of diffusion coefficients of
phytoplankton and zooplankton. Moreover, the spatiotemporal chaos arising from
the far-field breakup of spiral waves does not gradually involve the whole
space within that region. Our results are confirmed by means of computation
spectra and nonlinear bifurcation of wave trains. Finally, we give some
explanations about the spatially structured patterns from the community level.

<|endoftext|><|startoftext|>
General Sequential Quantum Cloning
Gui-Fang Dang and Heng Fan
Institute of Physics, Chinese Academy of Sciences, Beijing 100080, China.
(Dated: November 4, 2018)
Some multipartite quantum states can be generated in a sequential manner which may be im-
plemented by various physical setups like microwave and optical cavity QED, trapped ions, and
quantum dots etc. We analyze the general N to M (N ≤ M) qubits Universal Quantum Cloning
Machine (UQCM) within a sequential generation scheme. We show that the N to M sequential
UQCM is available. The case of d-level quantum states sequential cloning is also presented.
PACS numbers: 03.67.Mn, 03.65.Ud, 52.50.Dv
Quantum entanglement plays a key role in quantum
computation and quantum information [1]. Multipartite
entangled states arise as a resource for quantum infor-
mation processing tasks such as the well known quantum
teleportation[2], quantum communication [3, 4], clock
synchronization [5] etc. In general it is extremely dif-
ficult to generate experimentally multipartite entangled
states through single global unitary operations. In this
sense, the sequential generation of the entangled states
appears to be promising. Actually most of the quantum
computation networks are designed to implement quan-
tum logic gates through a sequential procedure [6]. Re-
cently sequential implementing of quantum information
processing tasks has been attracting much attention. It
is pointed out that photonic multiqubit states can be
generated by letting a source emit photonic qubits in a
sequential manner [7]. The general sequential generation
of entangled multiqubit states in the realm of cavity QED
was systematically studied in Refs.[8, 9]. It is also shown
that the class of sequentially generated states is identical
to the matrix-product-state (MPS) which is very useful
in study of spin chains of condensed matter physics [10].
On the other hand, much progress has already been
made in the past years in studying quantum cloning ma-
chines, for reviews see, for example, Refs.[11, 12, 13].
And various quantum cloning machines have been im-
plemented experimently by polarization of photons [14,
15, 16, 17, 18],nuclear spins in Nuclear Magnetic Reso-
nance [19, 20], etc. However, these experiments are for
1 to 2 (one qubit input and two-qubit output) or 1 to 3
cloning machines. The more general case will be much
difficult. There are some schemes proposed for the gen-
eral quantum cloning machines which are not in a sequen-
tial manner, see for example, [21, 22]. Recently a 1 to M
sequential universal quantum cloning is proposed [23] by
using the cloning transformation presented in Ref.[24].
Since it is in a sequential procedure, potentially it re-
duces the difficult in implementing this quantum cloning
machine. However, as is well known the collective quan-
tum cloning machine (the N identical input states are
cloned collectively to M copies) is better than the quan-
tum cloning machine which can only deal with the in-
dividual input(only one input is copied to several copies
each time). We know that the general N to M cloning
transformation is also available in Refs.[24, 25]. Then a
natural question arise is that whether the general N to
M sequential cloning machine is possible. In this Letter,
we will present the general sequential universal quantum
cloning machine.
The 1 to M cloning transformations used in Ref.[23]
was proposed by Gisin and Massar in Ref.[24]. And the
N toM UQCM was also presented in Ref.[24]. However,
to use the method proposed in Refs.[8, 23] to find the se-
quential cloning machine, the input state |Φ〉⊗N should
be expanded in computational basis {|0〉, |1〉}. The ex-
plicit quantum cloning transformations with this kind of
input were proposed by Fan et al in Ref.[25]. In this Let-
ter, based on the result of Ref.[25], the general sequential
UQCM will be presented.
As presented in Refs.[8, 23], the sequential generation
of a multiqubit state is like the following. Let HA be
a D-dimensional Hilbert space which acts as the ancil-
lary system, and a single qubit (e.g., a time-bin qubit)
is in a two-dimensional Hilbert space HB. In every step
of the sequential generation of a multiqubit state, a uni-
tary time evolution will be acting on the joint system
HA ⊗HB. We assume that each qubit is initially in the
state |0〉 which is like a blank or an empty state and
will not be written out in the formulas. So the unitary
time evolution is written in the form of an isometry V :
HA → HA⊗HB, where V =
i,α,β V
α,β |α, i〉〈β|, each V i
is a D×D matrix, and the isometry condition takes the
i=0 V
i†V i = 1. By applying successively n oper-
ations of V (not necessarily the same) on an initial ancil-
lary state |φI〉 ∈ HA, we obtain |Ψ〉 = V [n]...V [2]V [1]|φI〉.
The generated n qubits are in general an entangled state,
but the last step qubit-ancilla interaction can be chosen
so as to decouple the final multiqubit entangled state
from the auxiliary system, so the sequentially generated
state is
|ψ〉 =
i1...in=0
〈φF |V [n]in ...V [1]i1 |φI〉|in, ..., i1〉, (1)
where |φF 〉 is the final state of the ancilla. This is the
MPS. It was proven that any MPS can be sequentially
generated [8].
http://arxiv.org/abs/0704.0323v2
Suppose there are N identical pure quantum states
|Φ〉⊗N = (x0|0〉+x1|1〉)⊗N need to be cloned toM copies,
where |x0|2 + |x1|2 = 1. We know that the input state
can be represented by a basis in symmetric subspace.
|Φ〉⊗N =
xN−m0 x
CmN |(N −m)0,m1〉, (2)
where |(N − m)0,m1〉 denotes the symmetric and nor-
malized state with (N −m) qubits in the state |0〉 and m
qubits in the state |1〉, and we have CmN = N !/(N−m)!m!
in standard notation. So if we find the quantum cloning
transformations for all states in symmetric subspace, we
can clone N pure states to M copies. The UQCM with
input in symmetric subspace can be written as [25],
|(N −m)0,m1〉 → |ΦmM 〉, (3)
where
|ΦmM 〉 =
βmj |(M −m− j)0, (m+ j)1〉 ⊗Rj ,(4)
βmj =
M−N−j
M−m−jC
(m+j)
/CN+1M+1, (5)
where Rj are the ancillary states of the cloning machine
and are orthogonal with each other for different j. For
a sequential quantum cloning machine in this Letter, we
choose a realization Rj ≡ |(M −N − j)1, j0〉 for the an-
cilla states. This UQCM is optimal in the sense that
the fidelity between single qubit output state reduced
density operator ρoutreduced and the single input |Φ〉 is op-
timal. The optimal fidelity is F = 〈Φ|ρoutreduced|Φ〉 =
(MN +M + N)/M(N + 2), see Refs.[11, 12, 13] for re-
views and the references therein. A realization of this
UQCM with photon stimulated emission can be found in
Ref.[22] which is not in a sequential manner. We next
show that this general N to M UQCM can be generated
through a sequential procedure.
The basic idea is to show that the final state of the
cloning, |ΦmM 〉 in (4), can be expressed in its MPS form.
As shown in Ref.[8], any MPS can be sequentially gen-
erated. We shall follow the method, for example, as in
Refs.[23, 26]. By Schmidt decomposition, we first ex-
press the quantum state |ΦmM 〉 as a bi-partite state across
1 : 2... cut,
|ΦmM 〉 = λ
1 |0〉|φ
[2...(2M−N)]
1 〉+ λ
2 |1〉|φ
[2...(2M−N)]
Γ[1]i1α1 λ
|i1〉|φ[2...(2M−N)]α1 〉, (6)
where Γ
α1 = δα1,1,Γ
α1 = δα1,2, and λ
α1 are eigen-
values of the first qubit reduced density operator, and
we find λ
∑M−m−1
k=−m β
M−1/C
M , λ
∑M−m−1
k=−m β
mk+1C
M−1/C
m+k+1
M . To correspond with
the MPS in (1), we can define V
[1]i1
α1 = Γ
[1]i1
α1 . Suc-
cessively by Schmidt decomposition, the quantum state
|ΦmM 〉 in (4) is divided into a bi-partite state with the first
n qubits as one part, and the rest as another part, where
1 < n ≤M − 1. We find
|ΦmM 〉 =
j+1|(n− j)0, j1〉|φ
[(n+1)...(2M−N)]
j+1 〉, (7)
when 1 < n ≤M−N+m,n′ = n; whenM−N+m< n ≤
M − 1, n′ =M −N +m, λ[n]j+1 are eigenvalues of the first
n qubits reduced density operator of |ΦmM 〉. According to
the results in Eqs.(4,5), we can obtain,
j+1 =
M−m−n
m(j+k)
Cm+kM−n
m+j+k
. (8)
And we also have
|φ[(n+1)...(2M−N)]j+1 〉 =
M−m−n
β2m(j+k) ×
(m+k)
m+j+k
|(M − n−m− k)0, (m+ k)1〉 ⊗Rj+k.
By induction and a concise formula, we have
|Φn...(2M−N)]j+1 〉
αn,in
[n]in
(j+1)αn
λ[n]αn |in〉|φ
[(n+1)...(2M−N)]
[n−1]
|0〉|φ[(n+1)...(2M−N)]j+1 〉
+|1〉|φ[(n+1)...(2M−N)]j+2 〉
, (9)
where we denote
(j+1)αn
= δ(j+1)αn
n−1/(λ
[n−1]
n), (10)
(j+1)αn
= δ(j+2)αn
n−1/(λ
[n−1]
n ). (11)
Still we define that
V [n]inαnαn−1 = Γ
[n]in
αn−1αn
λ[n]αn . (12)
It is thus in the MPS representation. We can further con-
sider other cases including the ancilla state of the cloning
machine represented as Rj (Note it is not the ancilla state
in the MPS representation). We can find that the out-
put state of the general UQCM can be expressed as MPS
as in form (1). So it can be created sequentially. The
explicit results are summarized in the appendix.
We have shown that the output states of the general
UQCM in (4,5) are MPS’s and thus can be generated
sequentially. The sequential matrices V [n] of course de-
pend on the input |(N−m)0,m1〉 which are W-like states
and are generally multiqubit entangled. For later con-
venience, we denote V (m) to express that it depends
on input state for different m. By a straightforward
method, the sequential cloning operation, i.e., the iso-
metrices, depending on different input may take the form
m |(N − m)0,m〉〈(N − m)0,m1| ⊗ V (m). However,
this operation may need a single global unitary opera-
tor which involves N -qubit entangled states except for
m = 0,m = N . This contradicts with our aim that each
operation should be divided into sequential unitary oper-
ators in a quDit (quantum state in D-dimensional space)
times qubit system. Here we can use a scheme like the
following: the ancillary state interacts with each qubit
according to the (N + 1) × D-dimensional isometrices
CmN |0〉〈0|⊗N−m⊗|1〉〈1|⊗m⊗V (m) sequentially,
here a whole normalization factor is omitted. We know
that the operation |0〉〈0|⊗N−m ⊗ |1〉〈1|⊗m acts on each
qubit individually. Thus this scheme reduces the com-
plexity of the operation. This finishes our general se-
quential UQCM for the case of qubit. In case N = 1,
we recover the result of Ref.[23] for 1 to M cloning.
We should remark that similar as the case of sequen-
tial 1 to M UQCM in Ref.[23], for the general sequential
UQCM, the minimal dimension D of the ancillary state
grows linearly at most with M −N/2 + 1 for even N or
M − (N − 1)/2 for odd N .
Next we will consider a more general case that the se-
quential cloning machine is about the quantum state in d-
dimensional Hilbert space. We will use the d-dimensional
UQCM proposed by Fan et al in Ref. [25]. This UQCM
is a generalization of the cloning machine proposed in
Ref.[24] and we can use this UQCM to study its sequen-
tial form for d-dimensional case.
An arbitrary d-dimensional pure state takes the form
|Φ〉 =
i=0 xi|i〉 with
i=0 |xi|2 = 1. N identical pure
states can be expanded in terms of state in symmet-
ric subspace |Φ〉⊗N =
m1!...md!
xm10 ...x
d−1|~m〉,
where |~m〉 ≡ |m1, ...,md〉 is a symmetric state with mi
states of |i − 1〉, and also mi should satisfy a relation
i=1mi = N . The cloning transformations with states
in symmetric subspace can be written as
|~m〉 → |Φ~mM 〉 =
|~m+~j〉 ⊗ |~j〉, (13)
i=1 C
mi+ji
CM−NM+d−1
where ~j should satisfy
i ji = M − N . This cloning
machine is optimal and the corresponding fidelity of a
single quantum state between input and output is F =
(N(d+M) +M −N) /(d+N)M .
As for qubit system, we next show that the output
states for all symmetric states input can be expressed
as the sequential form. We consider the case 1 < n ≤
M − 1, and the state |Φ~mM 〉 is a bipartite state across
1...n : (n+ 1)... cut,
|Φ~mM 〉 =
|~j〉|φ[(n+1)...(M+1)]
〉 (15)
where
~m(~j−~m+~k)
i=1 C
ji+ki
, (16)
|φ[(n+1)...(M+1)]
~m(~j−~m+~k)
i=1 C
ji+ki
|~k〉|~j − ~m+ ~k〉/λ[n]
. (17)
By the same procedure as that of qubit case, we can
obtain the following
|φ[n...(M+1)]
[n]in
λ[n]αn |in〉|φ
(n+1)...(M+1)]
〉. (18)
Then we have
[n]in
= δαn(~j+~ein+1)
jin+1 + 1
[n−1]
. (19)
Still we can define V
[n]in
αnαn−1 = Γ
[n]in
αn−1αnλ
αn , and thus we
can find that each state |Φ~mM 〉 is a MPS and thus can be
sequentially generated. The detailed result of this part
will be presented elsewhere [27].
In conclusion, we show that the generalN toM univer-
sal quantum cloning machine can be implemented by a se-
quential manner. Since the sequential generation of mul-
tipartite state can be implemented in various physical se-
tups such as microwave and optical cavity QED, trapped
ions and quantum dots etc. This general sequential quan-
tum cloning machine may be implemented much easier
than the single global implementation scheme. This re-
duces dramatically the complexity in implementing the
general UQCM. We also show that for d-dimensional
quantum state, the sequential UQCM is also available.
Besides the universal cloning machine, the 1 toM phase-
covariant quantum cloning machine can also be sequen-
tially implemented. It will be interesting to consider sim-
ilarly the generalN toM phase-covariant cloning and the
economic phase-covariant cloning. The sequential asym-
metric quantum cloning machine may also be an inter-
esting topic.
Acknowledgements: HF was supported by ”Bairen”
program, NSFC and ”973” program (2006CB921107).
Appendix.–The explicit form of matrices V are pre-
sented as:
V [n]0αnαn−1 = δαnαn−1 ×
∑M−m−n
k=−m X
m+αn−1−1+k
∑M−m−n+1
k=−m X
M−n+1
m+αn−1−1+k
V [n]0αnαn−1 = δαnαn−1+1 ×
∑M−m−n
k=−m X
m+αn−1+k
∑M−m−n+1
k=−m X
M−n+1
m+αn−1−1+k
where notations X = β2m(αn−1−1+k), X
′ = β2m(αn−1+k)
are used. For case 1 < n ≤ M − N + m,αn−1 =
1, ..., n;αn = 1, ..., (n+1), and for caseM−N+m < n ≤
M − 1, αn−1, αn = 1, ..., (M −N +m+1). We can check
that the above defined V satisfies the isometry condition
V [n]in
V [n]in = 1. Similarly we have
V [M ]0αMαM−1 = δαMαM−1 ×
M−1−1−m)
M−1−1−m)
M−1−1
M−1−m)
V [M ]1αMαM−1 = δαM (αM−1+1) ×
M−1−m)
M−1−1−m)
M−1−1
M−1−m)
where 0 ≤ m ≤ N −m,αM−1, αM = 1, 2, ..., (M − N +
m+ 1).
For case concerning about ancilla state of the UQCM,
assume 1 ≤ l ≤M −N , we have
V [M+l]0αM+lαM+l−1 = δαM+l(αM+l−1−1) ×
αM+l−1 −m− 1
M −N − l + 1
V [M+l]1αM+lαM+l−1 = δαM+lαM+l−1 ×
M −N − l − αM+l−1 +m+ 1
M −N − l + 1
(1) For (m+ 1) ≤ αM+l ≤ (M −N +m− l+ 1),
(m+ 2) ≤ αM+l−1 ≤ (M −N +m− l+ 2),
[M+l]0
αM+lαM+l−1 = δαM+l(αM+l−1−1)
αM+l−1−m−1
M−N−l+1 .
For αM+l = (M −N +m− l + 2), 1 ≤ αM+l−1 ≤
(M −N +m+ 1), V [M+l]0αM+lαM+l−1 = 0. Otherwise
[M+l]0
αM+lαM+l−1 = δαM+lαM+l−1
(2) For (m+ 1) ≤ αM+l, αM+l−1 ≤
(M −N +m− l + 1), V [M+l]1αM+lαM+l−1 =
δαM+lαM+l−1
M−N−l−αM+l−1+m+2
M−N−l+1 . For αM+l =
(M −N +m− l + 2), 1 ≤ αM+l−1 ≤ (M −N +m+ 1),
[M+l]0
αM+lαM+l−1 = 0. Otherwise V
[M+l]0
αM+lαM+l−1 =
δαM+lαM+l−1
[1] C. H. Bennett and D. P. DiVincenzo, Nature 404, 247
(2000).
[2] C. H. Bennett, G.Brassard, C. Crepeau, R. Jozsa, A.
Peres, and W. Wootters, Phys. Rev. Lett. 70, 1895
(1993).
[3] D. Gottesman and I. Chuang, Nature 402, 390 (1999).
[4] R. Raussendorf and Hans J. Briegel, Phys. Rev. Lett. 86,
5188 (2000).
[5] V. Giovannetti, S. Lloyd, L. Maccone, Nature 412, 417
(2001).
[6] A. Barenco, et al, Phys. Rev. A 52, 3457 (1995).
[7] C. Saavedra, K. M. Gheri, T. Torma, J. I. Cirac, and P.
Zoller, Phys. Rev. A 61, 062311 (2000).
[8] C. Schon, E. Solano, F. Verstraete, J. I. Cirac, and M.
M. Wolf, Phys. Rev. Lett. 95, 110503 (2005).
[9] C. Schon, K. Hammerer, M. M. Wolf, J. I. Cirac, and E.
Solano, quant-ph/0612101,
[10] A. Affleck, T. Kennedy, E. H. Lieb, and H. Tasaki, Phys.
Rev. Lett. 59, 799 (1987).
[11] V. Scarani, S. Iblisdir, N. Gisin, A. Acin, Rev. Mod.
Phys. 77, 1225 (2005).
[12] N. J. Cerf, J. Fiurasek, Progress in Optics 49, 455 (Else-
vier 2006).
[13] H. Fan, Topics in Applied Physics 102, 63 (2006).
[14] A. Lamas-Linares, C. Simon, J. C. Howell, and D.
Bouwmeester, Science 296, 712 (2002).
[15] F. De Martini, V. Bužek, F. Sciarrino, and C. Sias, Na-
ture 419 815 (2002).
[16] D. Pelliccia, V. Schettini, F. Sciarrino, C. Sias, and F.
De Martini, Phys. Rev. A 68, 042306 (2003).
[17] M. T. M. Irvine, A. Lamas Linares, M. J. A. de Dood,
and D. Bouwmeester, Phys. Rev. Lett. 92, 047902 (2004).
[18] M. Ricci, F. Sciarrino, C. Sias, and F. De Martini, Phys.
Rev. Lett. 92, 047901 (2004).
[19] H. K. Cummins, C. Jones, A. Furze, N. F. Soffe, M.
Mosca, J. M. Peach, and J. A. Jones, Phys. Rev. Lett.
88, 187901 (2002).
[20] J. F. Du, et al, Phys. Rev. Lett. 94, 040505 (2005).
[21] C. Simon, G. Weihs, and A. Zeilinger, Phys. Rev. Lett.
84, 2993 (2000).
[22] H. Fan, G. Weihs, K. Matsumoto, and X. B. Wang, Phys.
Rev. A 67, 022317 (2003).
[23] Y. Delgado, L. Lamata, J. Leon, D. Salgado, and E.
Solano, Phys. Rev. Lett. , quant-ph/0607105.
[24] N. Gisin, and S. Massar, Phys. Rev. Lett. 79, 2153
(1997).
[25] H. Fan, K. Matsumoto, and M. Wadati, Phys. Rev. A
64, 064301 (2001).
[26] G. Vidal, Phys. Rev. Lett. 91, 147902 (2003).
[27] G. F. Dang and H. Fan, in preparation.
http://arxiv.org/abs/quant-ph/0612101
http://arxiv.org/abs/quant-ph/0607105
ABSTRACT
  Some multipartite quantum states can be generated in a sequential manner
which may be implemented by various physical setups like microwave and optical
cavity QED, trapped ions, and quantum dots etc. We analyze the general N to M
qubits Universal Quantum Cloning Machine (UQCM) within a sequential generation
scheme. We show that the N to M sequential UQCM is available. The case of
d-level quantum states sequential cloning is also presented.

<|endoftext|><|startoftext|>
Introduction
1.1. Miscellaneous facts about pseudospectrum. In recent years, there has been
a lot of interest in studying the pseudospectrum of non-selfadjoint operators. The
study of this notion has been initiated by noticing that for certain problems of sci-
ence and engineering involving non-selfadjoint operators, the predictions suggested by
spectral analysis do not match with the numerical simulations. This fact lets thinking
that in some cases the only knowledge of the spectrum of an operator is not enough to
understand sufficiently its action. To supplement this lack of information contained
in the spectrum, some new subsets of the complex plane called pseudospectra have
been defined. The main idea about the definition of these new subsets is that it is
interesting to study not only the points where the resolvent of an operator is not de-
fined, i.e. its spectrum, but also where this resolvent is large in norm. This explains
the following definition of the ε-pseudospectrum σε(A) of a matrix or an operator A,
σε(A) =
z ∈ C, ‖(zI −A)−1‖ ≥ 1
for any ε > 0, if we write by convention that ‖(zI − A)−1‖ = +∞ for every point z
belonging to the spectrum σ(A) of the operator.
Let us mention that there exists an abundant literature about this notion of pseu-
dospectrum. We refer here for the definition and some general properties of pseu-
dospectra to the paper [15] of L.N. Trefethen. Let us also point out the more recently
published book [16], which draws up a wide all-round view of this topic and gives a
lot of illustrations.
According to the previous definition, studying the pseudospectra of an operator is
exactly studying the level lines of the norm of its resolvent. What is interesting in
studying such level lines is that it gives some information about the spectral stability
http://arxiv.org/abs/0704.0324v1
of the operator. Indeed, pseudospectra can be defined in an equivalent way in term of
spectra of perturbations of the operator. For instance, we have for any A ∈ Mn(C),
σε(A) = {z ∈ C, z ∈ σ(A +B) for some B ∈ Mn(C) with ‖B‖ ≤ ε}.
It follows that a complex number z belongs to the ε-pseudospectrum of a matrix A if
and only if it belongs to the spectrum of one of its perturbations A+B with ‖B‖ ≤ ε.
More generally, if A is a closed unbounded linear operator with a dense domain on a
complex Hilbert space H , the result of Roch and Silbermann in [13] gives that
σε(A) =
B∈L(H), ‖B‖L(H)≤ε
σ(A +B),
where L(H) stands for the set of bounded linear operators on H . From this second
description, we understand the interest in studying such subsets if we want for example
to compute numerically some eigenvalues of an operator. Indeed, we start to do
it by discretizing this operator. This discretization and inevitable round-off errors
will generate some perturbations of the initial operator. Eventually, algorithms for
eigenvalues computing will determine the eigenvalues of a perturbation of the initial
operator, i.e. a value in a ε-pseudospectrum of the initial operator but not necessarily
a spectral one. This explains why it is important in such numerical computations to
understand if the ε-pseudospectra of studied operators contain more or less deeply
their spectra.
Let us first notice that this study is a priori non-trivial only for non-selfadjoint
operators, or more precisely for non-normal operators. Indeed, we have for a normal
operator A an exact expression of the norm of its resolvent given by the following
classical formula (see for example (V.3.31) in [8]),
(1.1.1) ∀z 6∈ σ(A), ‖(zI −A)−1‖ = 1
z, σ(A)
where d
z, σ(A)
stands for the distance between z and the spectrum of the operator,
when A is a closed unbounded linear operator with a dense domain on a complex
Hilbert space. This formula proves that the resolvent of a normal operator cannot
blow up far from its spectrum. It ensures the stability of its spectrum under small
perturbations because the ε-pseudospectrum is exactly equal in this case to the ε-
neighbourhood of the spectrum
(1.1.2) σε(A) =
z ∈ C : d
z, σ(A)
Nevertheless it is well-known that this formula (1.1.1) is no more true for non-normal
operators. For such operators, it can occur that their resolvents are very large in
norm far from their spectra. This induces that the spectra of these operators can be
very unstable under small perturbations. To illustrate this fact, let us consider the
case of the rotated harmonic oscillator and the following numerical computation of its
spectrum. The rotated harmonic oscillator is a simple example of elliptic quadratic
differential operator
Hc = D
x + cx
2, Dx = i
−1∂x,
with c = eiπ/4. The numerical computation is performed on the matrix discretization
(HcΨi,Ψj)L2(R)
1≤i,j≤N
where N is an integer taken equal to 100 and (Ψj)j∈N∗ stands for the basis of L
composed by Hermite functions. The black dots appearing on this computation stand
for the numerically computed eigenvalues. We can notice on this numerical simulation
that the computed low energies are very close to theoretical ones since the spectrum
Figure 1. Computation of some level lines of the norm of the resol-
vent ‖(Hc− z)−1‖ = ε−1 for the rotated harmonic oscillator Hc with
c = eiπ/4. The right column gives the corresponding values of log10 ε.
0 20 40 60 80 100 120 140 160
dim = 100
of the rotated harmonic oscillator is only composed of eigenvalues regularly spaced
out on the half-line eiπ/8R∗+,
σ(Hc) = {eiπ/8(2n+ 1) : n ∈ N}.
However we notice that it is no more true for the high energies. It occurs for them
some strong spectral instabilities, which lead to the computation of “false eigenvalues”
far from the half-line eiπ/8R∗+. Let us mention that some comparable computations
can be found in [3]. In this paper, we are interested in studying when and how this
kind of phenomena occurs in the class of elliptic quadratic differential operators.
1.2. Elliptic quadratic differential operators. We study here the class of elliptic
quadratic differential operators. It is the class of pseudodifferential operators defined
in the Weyl quantization
(1.2.1) q(x, ξ)wu(x) =
(2π)n
ei(x−y).ξq
(x+ y
u(y)dydξ,
by some symbols q(x, ξ), where (x, ξ) ∈ Rn×Rn and n ∈ N∗, which are some complex-
valued elliptic quadratic forms i.e. complex-valued quadratic forms verifying
(1.2.2) (x, ξ) ∈ Rn × Rn, q(x, ξ) = 0 ⇒ (x, ξ) = (0, 0).
Let us first notice that since the symbols of these operators are some quadratic forms,
these are only some differential operators, which are a priori non-selfadjoint because
their Weyl symbols are complex-valued. As mentioned before, the rotated harmonic
oscillator is an example of such an operator since we have
D2x + e
iθx2 = (ξ2 + eiθx2)w, 0 < θ < π,
if Dx = i
−1∂x. This operator is a very simple example of non-selfadjoint operator for
which we have noticed on the previous numerical simulation that it occurs some strong
spectral instabilities under small perturbations for its high energies. These phenomena
have been studied in several recent works. We can mention in particular the works of
L.S. Boulton [1], E.B. Davies [3], K. Pravda-Starov [10] and M. Zworski [18], which
have given a good understanding of these phenomena.
A question, which has been at the origin of this work, has been to study if these
phenomena peculiar to the rotated harmonic oscillator are representative, or not, of
what occurs more generally in the class of elliptic quadratic differential operators in
every dimension. We have tried to answer to the following questions:
- Does it always occur some strong spectral instabilities under small perturba-
tions for the high energies of these operators ?
- If it is not the case, is it possible to give a necessary and sufficient condition on
the Weyl symbols of these operators, which ensures their spectral stability ?
- Can we precisely describe the geometry, which separates the regions of the
resolvent sets where the resolvents of these operators blow up in norm from
the ones where one keeps a control on their sizes ?
To understand these spectral stability or instability phenomena, we need to study
the microlocal properties, which rule these phenomena in the class of elliptic quadratic
differential operators. Let us mention that it is M. Zworski who first underlined in [18]
the close link between these questions of spectral instabilities and some results of
microlocal analysis about the solvability of pseudodifferential operators.
1.3. Semiclassical pseudospectrum. To answer to these previous questions, it is
interesting to use a semiclassical setting and to study a notion of pseudospectrum
in this new setting. We define for a semiclassical family (Ph)0<h≤1 of operators on
L2(Rn), with a domain D, the following notions of semiclassical pseudospectra.
Definition 1.3.1. For all µ ≥ 0, the set
Λscµ (Ph) =
z ∈ C : ∀C > 0, ∀h0 > 0, ∃ 0 < h < h0, ‖(Ph − z)−1‖ ≥ Ch−µ
is called semiclassical pseudospectrum of index µ of the semiclassical family (Ph)0<h≤1.
The semiclassical pseudospectrum of infinite index is defined by
Λsc∞(Ph) =
Λscµ (Ph).
With this definition, the points in the complement of the semiclassical pseudospectrum
of index µ are the points of the complex plane where we have the following control of
the resolvent’s norm for sufficiently small values of the semiclassical parameter h,
(1.3.1) ∃C > 0, ∃h0 > 0, ∀ 0 < h < h0, ‖(Ph − z)−1‖ < Ch−µ.
To prove the existence of semiclassical pseudospectrum of index µ, we will study the
question of existence of semiclassical quasimodes
(1.3.2) ∀C > 0, ∀h0 > 0, ∃ 0 < h < h0, ∃uh ∈ D,
‖uh‖L2(Rn) = 1 and ‖Phuh − zuh‖L2(Rn) ≤ Chµ,
in some points z of the resolvent set, which can be considered as some “almost eigen-
values” in O(hµ) in the semiclassical limit. Let us notice that the definition chosen
here for the notions of semiclassical pseudospectra differ from the one given in [5] for
a semiclassical pseudodifferential operator. In fact, we have chosen a definition for
semiclassical pseudospectra inspired by the remark made p.388 in [5], because this
definition only depends on the properties of the semiclassical operator rather than on
its symbol.
The interest of working in a semiclassical setting is a matter of geometry. We can
explain this choice by the fact that it is easier for an elliptic quadratic differential oper-
ator q(x, ξ)w to describe the geometry of semiclassical pseudospectra of its associated
semiclassical operator (q(x, hξ)w)0<h≤1, than to describe directly the geometry of its
ε-pseudospectra. The semiclassical setting is particularly well-adapted for the study
of elliptic quadratic differential operators because there exists a simple link between
this semiclassical setting and the quantum one. Indeed, using that the symbols of
these operators are some quadratic forms q, we obtain from the change of variables,
y = h1/2x with h > 0, the following identity between the quantum operator q(x, ξ)w
and its associated semiclassical operator (q(x, hξ)w)0<h≤1,
(1.3.3) q(x, ξ)w − z
q(y, hη)w − z
if z ∈ C. This identity allows to get some information about the resolvent’s norm
behaviour of the quantum operator
q(x, ξ)w − z
if we have some information about semiclassical pseudospectra for its associated semi-
classical operator. Let us mention for example that if a non-zero complex number z
belongs to the semiclassical pseudospectrum of infinite index of the operator
(q(x, hξ)w)0<h≤1,
the identity (1.3.3) induces that the resolvent’s norm of the quantum operator blows
up along the half-line zR+ with a rate faster than any polynomials
(1.3.4) ∀N ∈ N, ∀C > 0, ∀η0 ≥ 1, ∃η ≥ η0, ‖
q(x, ξ)w − zη
)−1‖ ≥ CηN ,
and this, even if this half-line zR+ does not intersect the spectrum of the opera-
tor q(x, ξ)w. Conversely, in the case where z 6∈ Λscµ
q(y, hη)w
, z 6= 0 and 0 ≤ µ ≤ 1,
the identity (1.3.3) shows that we can find some positive constants C1 and C2 such
that the resolvent of the operator q(x, ξ)w remains bounded in norm in some regions
of the resolvent set of the shape
(1.3.5)
u ∈ C : |u| ≥ C1, d(∆, u) ≤ C2|proj∆u|1−µ
∩ C \ σ
q(x, ξ)w
where ∆ = zR+ and proj∆u stands for the orthogonal projection of u on the closed
half-line ∆. Indeed, we obtain from (1.3.1) and (1.3.3) that
∃C > 0, ∃η0 ≥ 1, ∀η ≥ η0,
q(x, ξ)w − ηeiargz
∥ < Cηµ−1,
which induces that for all v ∈ D
q(x, ξ)w
and η ≥ η0,
q(x, ξ)w − ηeiargz
L2(Rn)
≥ C−1η1−µ‖v‖L2(Rn),
q(x, ξ)w
stands for the domain of the operator q(x, ξ)w . Then, we can find a
constant η̃0 ≥ 1 such that if z̃ belongs to
u ∈ C : |u| ≥ η̃0, d(eiargzR+, u) ≤ 2−1C−1|projeiargzR+u|
∩ C\σ
q(x, ξ)w
|projeiargzR+ z̃| ≥ η0.
This induces using the previous estimates and the triangular inequality that if z̃
belongs to
u ∈ C : |u| ≥ η̃0, d(eiargzR+, u) ≤ 2−1C−1|projeiargzR+u|
∩ C\σ
q(x, ξ)w
we have for all v ∈ D
q(x, ξ)w
q(x, ξ)w − z̃
q(x, ξ)w − projeiargzR+ z̃
eiargzR+, z̃
‖v‖L2
≥ 2−1C−1|projeiargzR+ z̃|
1−µ‖v‖L2
≥ 2−1C−1η1−µ0 ‖v‖L2,
because µ ≤ 1. This last estimate shows that the resolvent of the operator q(x, ξ)w is
bounded in norm by 2Cη
0 on the set
u ∈ C : |u| ≥ η̃0, d(eiargzR+, u) ≤ 2−1C−1|projeiargzR+u|
∩ C\σ
q(x, ξ)w
We notice that depending directly on the value of the index µ, 0 ≤ µ < 1, the previous
set contains more or less deeply in its interior the half-line
{u ∈ C : |u| ≥ η̃0, u ∈ zR+}.
This fact explains why in the following we will precise carefully the index of the
semiclassical pseudospectrum to which a point does not belong when there is no
semiclassical pseudospectrum of infinite index in that point.
2. Statement of the results
2.1. Some notations and some preliminary facts about elliptic quadratic
differential operators. Let us begin by giving some notations and recalling known
results about elliptic quadratic differential operators. Let q be a complex-valued
elliptic quadratic form
q : Rnx × Rnξ → C
(x, ξ) 7→ q(x, ξ),
with n ∈ N∗, i.e. a complex-valued quadratic form verifying (1.2.2). The numerical
range Σ(q) of q is defined by the subset in the complex plane of all values taken by
this symbol
(2.1.1) Σ(q) = q(Rnx × Rnξ ),
and the Hamilton map F ∈ M2n(C) associated to the quadratic form q is uniquely
defined by the identity
(2.1.2) q
(x, ξ); (y, η)
(x, ξ), F (y, η)
, (x, ξ) ∈ R2n, (y, η) ∈ R2n,
where q
stands for the polar form associated to the quadratic form q and σ is the
symplectic form on R2n,
(2.1.3) σ
(x, ξ), (y, η)
= ξ.y − x.η, (x, ξ) ∈ R2n, (y, η) ∈ R2n.
Let us first notice that this Hamilton map F is skew-symmetric with respect to σ.
This is just a consequence of the properties of skew-symmetry of the symplectic form
and symmetry of the polar form
(2.1.4) ∀X,Y ∈ R2n, σ(X,FY ) = q(X ;Y ) = q(Y ;X) = σ(Y, FX) = −σ(FX, Y ).
Under this assumption of ellipticity, the numerical range of a quadratic form can
only take some very particular shapes. It is a consequence of the following result
proved by J. Sjöstrand (Lemma 3.1 in [14]),
Proposition 2.1.1. Let q : Rnx × Rnξ → C a complex-valued elliptic quadratic form.
If n ≥ 2, then there exists z ∈ C∗ such that Re(zq) is a positive definite quadratic
form. If n = 1, the same result is fulfilled if we assume besides that Σ(q) 6= C.
This proposition shows that the numerical range of an elliptic quadratic form can only
take two shapes. The first possible shape is when Σ(q) is equal to the whole complex
plane. This case can only occur in dimension n = 1. The second possible shape is
when Σ(q) is equal to a closed angular sector with a top in 0 and an opening strictly
lower than π.
Figure 2. Shape of the numerical range Σ(q) when Σ(q) 6= C.
Σ(zq)
Indeed, if Σ(q) 6= C, using that the set Σ(q) is a semi-cone
tq(x, ξ) = q(
tξ), t ∈ R+, (x, ξ) ∈ R2n,
because q is a quadratic form, we have
Σ(q) = R+z
if z is the non-zero complex number given by the proposition 2.1.1 and I is the compact
interval
I = 1 + i Im(zq)(K),
where K is the following compact subset of R2n,
(x, ξ) ∈ R2n : Re(zq)(x, ξ) = 1
The compactness of K is a direct consequence of the fact that Re(zq) is a positive
definite quadratic form.
Elliptic quadratic differential operators define some Fredholm operators (see Lemma 3.1
in [6] or Theorem 3.5 in [14]),
(2.1.5) q(x, ξ)w + z : B → L2(Rn),
where B is the Hilbert space
(2.1.6)
u ∈ L2(Rn) : xαDβxu ∈ L2(Rn) if |α+ β| ≤ 2
with the norm
‖u‖2B =
|α+β|≤2
‖xαDβxu‖2L2(Rn).
The Fredholm index of the operator q(x, ξ)w + z is independent of z and is equal to 0
if n ≥ 2. In the case where n = 1, this index can take the values −2, 0 or 2. More
precisely, this index is always equal to 0 if Σ(q) 6= C.
In the following, we will always assume that Σ(q) 6= C. Under this assumption,
J. Sjöstrand has proved in the theorem 3.5 in [14] (see also Lemma 3.2 and Theorem 3.3
in [6]) that the spectrum of an elliptic quadratic differential operator
q(x, ξ)w : B → L2(Rn),
is only composed of eigenvalues with finite multiplicity
(2.1.7) σ
q(x, ξ)w
λ∈σ(F ),
−iλ∈Σ(q)\{0}
rλ + 2kλ
(−iλ) : kλ ∈ N
where F is the Hamilton map associated to the quadratic form q and rλ is the dimen-
sion of the space of generalized eigenvectors of F in C2n belonging to the eigenvalue
λ ∈ C. Let us notice that the spectra of these operators is always included in the
numerical range of their Weyl symbols.
To end this review of preliminary properties of elliptic quadratic differential oper-
ators, let us underline that the property of normality in this class of operators can be
easily checked by computing the Poisson bracket of the real part and the imaginary
part of their symbols
(2.1.8) {Re q, Im q} =
∂Re q
∂Im q
∂Re q
∂Im q
Proposition 2.1.2. An elliptic quadratic differential operator
q(x, ξ)w : B → L2(Rn), n ∈ N∗,
is normal if and only if the quadratic form defined by the Poisson bracket of the real
part and the imaginary part of its symbol is equal to zero
(2.1.9) ∀(x, ξ) ∈ R2n, {Re q, Im q}(x, ξ) = 0.
Proof of Proposition 2.1.2. This proposition is a direct consequence of the composition
formula in Weyl calculus (see Theorem 18.5.4 in [7]), which induces that the Weyl
symbol of the commutator
[qw, (qw)∗] = [qw, qw] = −2i[(Re q)w, (Im q)w],
is equal to
−2i(Re q ♯ Im q − Im q ♯ Re q) = −2{Re q, Im q},
because Re q and Im q are some quadratic forms. The notation Re q ♯ Im q stands
for the Weyl symbol of the operator obtained by composition (Req)w(Imq)w. �
Remark. Let us notice that the symplectic invariance of the Poisson bracket (see
(21.1.4) in [7]),
(2.1.10) {(Re q) ◦ χ, (Im q) ◦ χ} = {Re q, Im q} ◦ χ,
if χ stands for a linear symplectic transformation of R2n, implies that the condition
(2.1.9) is symplectically invariant.
2.2. Statement of the main results. Let us consider an elliptic quadratic differ-
ential operator
q(x, ξ)w : B → L2(Rn).
We know from (2.1.7) that the spectrum of this operator is contained in the numerical
range of its symbol Σ(q). The following proposition gives a first localization of the
regions where the resolvent can blow up in norm and where spectral instabilities can
occur.
Proposition 2.2.1. Let q : Rn × Rn → C, n ∈ N∗, be a complex-valued elliptic
quadratic form. We have
∀z 6∈ Σ(q),
q(x, ξ)w − z
∥ ≤ 1
z,Σ(q)
where d
z,Σ(q)
stands for the distance from z to the numerical range Σ(q).
This result shows that the resolvent of an elliptic quadratic differential operator
cannot blow up in norm far from the numerical range of its symbol. We are now
going to study what kind of phenomena can occur in this particular set. There are
two cases to separate according to the property of normality or non-normality of the
operator.
2.2.1. Case of a normal operator. Let us consider a normal elliptic quadratic differ-
ential operator
q(x, ξ)w : B → L2(Rn).
Let us recall that according to the proposition 2.1.2 this property of normality is
exactly equivalent to the fact that
∀(x, ξ) ∈ R2n, {Re q, Im q}(x, ξ) = 0.
In this case, we have the classical formula (1.1.1) for its resolvent’s norm
(2.2.1) ∀z 6∈ σ
q(x, ξ)w
q(x, ξ)w − z
z, σ(q(x, ξ)w)
which induces that the ε-pseudospectrum of this operator is exactly equal to the
ε-neighbourhood of its spectrum
q(x, ξ)w
z ∈ C : d
z, σ(q(x, ξ)w)
, ε > 0.
This classical formula (2.2.1) ensures that the resolvent cannot blow up in norm far
from the spectrum and induces that the spectrum of such an operator is stable under
small perturbations.
Example 1. The operator
(2.2.2) q1(x, ξ)
w = −(1 + i)∂2x1 − ∂
+ 4(−1 + i)x1∂x1 + 2(−1 + i)x2∂x1 + 6ix2∂x2
+ 2ix1∂x2 + (6 + 5i)x
1 + (11 + i)x
2 + (10 + 4i)x1x2 − 2 + 5i,
is an example of a normal elliptic quadratic differential operator. Its spectrum is given
q1(x, ξ)
(2k1 + 1) + (2k2 + 1)
4 : (k1, k2) ∈ N2
Figure 3. Spectrum and a ε-pseudospectrum of the operator q1(x, ξ)
∗ ∗ ∗ ∗ ∗ ∗ ∗
∗ ∗ ∗ ∗ ∗ ∗
∗ ∗ ∗ ∗ ∗
∗ ∗ ∗ ∗
∗ ∗ ∗
Σ(q1)
Example 2. Let us notice that when the numerical range Σ(q) is reduced to a closed
half-line, the elliptic quadratic differential operator q(x, ξ)w is always normal since
{Re q, Im q} = |z|2{Re(z−1q), Im(z−1q)} = 0,
if z ∈ C∗ is chosen such that Im(z−1q) = 0. In fact, the operator q(x, ξ)w can in this
particular case be reduced after a conjugation by a unitary operator on L2(Rn) to the
operator
+ x2j),
where λj > 0 for all j = 1, ..., n.
Figure 4. Example of a normal elliptic quadratic differential operator.
2.2.2. Case of a non-normal operator. Let us consider a non-normal elliptic quadratic
differential operator
q(x, ξ)w : B → L2(Rn), n ∈ N∗.
We assume in the following that the numerical range Σ(q) is distinct from the whole
complex plane
(2.2.3) Σ(q) 6= C.
As mentioned in the section 2.1, this additional assumption is always fulfilled in
dimension n ≥ 2. It only excludes some very particular one-dimensional elliptic
quadratic differential operators (see the remark following the proposition 2.2.2 for
more precision about these operators).
Under this additional assumption, the numerical range Σ(q) is always a closed
angular sector with a top in 0 and a positive opening strictly lower than π.
2.2.2.a. On the pseudospectrum at the interior of the numerical range. Let us consider
the associated semiclassical elliptic quadratic differential operator
(q(x, hξ)w)0<h≤1.
We can build in every point of the interior of the numerical range Σ̊(q) some semi-
classical quasimodes.
Theorem 2.2.1. If the elliptic quadratic differential operator
q(x, ξ)w : B → L2(Rn), n ∈ N∗,
is non-normal and verifies Σ(q) 6= C then for all z ∈ Σ̊(q) and N ∈ N, there exist
h0 > 0 and a semiclassical family (uh)0<h≤h0 ∈ S(Rn) such that
‖uh‖L2(Rn) = 1 and ‖q(x, hξ)wuh − zuh‖L2(Rn) = O(hN ) when h → 0+.
This result induces the existence of semiclassical pseudospectrum of infinite index in
every point of the interior of the numerical range Σ̊(q).
According to (1.3.4), this result in the semiclassical setting induces that the resol-
vent’s norm of the quantum operator q(x, ξ)w blows up fastly along all the half-lines
belonging to the interior of the numerical range Σ̊(q),
(2.2.4) ∀z ∈ Σ̊(q), ∀N ∈ N, ∀C > 0, ∀η0 ≥ 1, ∃η ≥ η0, ‖
q(x, ξ)w − zη
)−1‖ ≥ CηN .
We deduce from (2.1.7) that as soon as an elliptic quadratic differential operator is
non-normal its resolvent blows up in norm in some regions of the resolvent set far
from its spectrum. This fact induces that the high energies of such an operator are
very unstable under small perturbations as we have already noticed on the numerical
computation performed for the rotated harmonic oscillator. It follows that in the class
of elliptic quadratic differential operators1 the property of spectral stability is exactly
equivalent to the property of normality:
σ(q(x, ξ)w) is stable under ⇔ q(x, ξ)w is a normal ⇔ {Re q, Im q} = 0.
small perturbations operator
By spectral stability, we mean here that the resolvent of these operators cannot blow
up in norm far from their spectra. Let us add that it is not very surprising to have
this property of spectral stability under the assumption of normality, but it is worth
1If we exclude the one-dimensional particular cases previously mentioned.
noticing that as soon as this property is violated, it occurs in this class of operators
some strong spectral instabilities under small perturbations for their high energies.
Examples. The two following operators
(2.2.5) q2(x, ξ)
w = −∂2x1 − 2∂
+ 4ix2∂x2 + 2x
1 + (4 + i)x
2 + 4x1x2 + 2i
(2.2.6) q3(x, ξ)
w = −(1 + i)∂2x1 − 2∂
+ 4(−1 + i)x1∂x1 + 2(1− i)x2∂x1 − 4ix1∂x2
+ (9 + 4i)x21 + (2 + i)x
2 − 4(1 + i)x1x2 − 2 + 2i,
are some examples of non-normal elliptic quadratic differential operators.
2.2.2.b. On the pseudospectrum at the boundary of the numerical range. Let us now
study what occurs on the boundary of the numerical range ∂Σ(q) for a non-normal
elliptic quadratic differential operator
q(x, ξ)w : B → L2(Rn).
Let us mention that we always assume that Σ(q) 6= C. Under these assumptions, the
boundary of the numerical range is composed of the union of the origin 0 and two
half-lines ∆1 and ∆2,
(2.2.7) ∂Σ(q) = {0} ⊔∆1 ⊔∆2,
that we can write
(2.2.8) ∆1 = z1R
+ and ∆2 = z2R
+ with z1, z2 ∈ ∂Σ(q) \ {0}.
We need to define a notion of order for the symbol q(x, ξ) on these two half-lines ∆j ,
j = 1, 2. Let us begin by recalling the classical definition of the order k(x0, ξ0) of a
symbol p(x, ξ) at a point (x0, ξ0) ∈ R2n (see section 27.2, chapter 27 in [7]). This
order k(x0, ξ0) is an element of the set N ∪ {+∞} defined by
(2.2.9) k(x0, ξ0) = sup
j ∈ Z : pI(x0, ξ0) = 0, ∀ 1 ≤ |I| ≤ j
where I = (i1, i2, ..., ik) ∈ {1, 2}k, |I| = k and pI stands for the iterated Poisson
brackets
pI = Hpi1Hpi2 ...Hpik−1 pik ,
where p1 and p2 are respectively the real and the imaginary part of the symbol p,
p = p1 + ip2. The order of a symbol q at a point z is then defined as the maximal
order of the symbol p = q − z at every point (x0, ξ0) ∈ R2n verifying
p(x0, ξ0) = q(x0, ξ0)− z = 0.
Let us underline that the symplectic invariance of the Poisson bracket (2.1.10) induces
the same property for the order of a symbol at a point.
Since here the symbol q is a quadratic form, all the iterated Poisson brackets are
also some quadratic forms. This property of degree two homogeneity of these Poisson
brackets induces that the symbol q has the same order at every point of each half-line
∆j , j = 1, 2. This allows to define the order of the symbol q on the half-line ∆j by
defining this order by this common value. Let us mention that this order can be finite
or infinite.
Examples. One can easily check that the Weyl symbol
ξ2 + eiθx2, 0 < θ < π,
of the rotated harmonic oscillator has an order equal to 2 on the both half-lines R∗+
and eiθR∗+, which composes the boundary of its numerical range. The symbol q2 of
the operator defined in (2.2.5) has an order equal to 2 on iR∗+ and to 6 on R
Σ(q2) = {z ∈ C : Re z ≥ 0, Im z ≥ 0}.
On the other hand, we can verify that the symbol q3 of the operator defined in (2.2.6)
is of infinite order on the half-line R∗+ and has an order equal to 2 on e
iπ/4R∗+,
Σ(q3) = {0} ∪ {z ∈ C∗ : 0 ≤ arg z ≤ π/4}.
In the case where the symbol is of finite order on a half-line ∆j , j = 1, 2, we have
the following result.
Theorem 2.2.2. If the Weyl symbol q(x, ξ) of a non-normal elliptic quadratic differ-
ential operator is of finite order kj on the half-line
∆j , j ∈ {1, 2}, ∆j ⊂ ∂Σ(q) \ {0},
then this order is necessary even and there is no semiclassical pseudospectrum of
index kj/(kj + 1) on ∆j for the associated semiclassical operator
∆j ⊂ C \ Λsckj/(kj+1)
q(x, hξ)w
Remark. Let us mention that we can more precisely establish that in dimension n ≥ 1,
the order kj is an even integer verifying
2 ≤ kj ≤ 4n− 2.
This result is proved in [12].
By rephrasing this result in a quantum setting, it follows from (1.3.5) and (2.1.7)
that when the symbol q of a non-normal elliptic quadratic differential operator q(x, ξ)w
is of finite order kj on a half-line
∆j , j ∈ {1, 2}, ∆j ⊂ ∂Σ(q) \ {0},
then the resolvent of this operator remains bounded in norm in a set of the following
(2.2.10)
u ∈ C : |u| ≥ C1, d(∆j , u) ≤ C2|proj∆ju|
where C1 and C2 are some positive constants.
As we will see in its proof, this absence of semiclassical pseudospectrum is linked
to some properties of subellipticity. Let us just underline for the moment that the
index kj/(kj + 1), which appears in this result is exactly equal to the loss appearing
in the subelliptic estimate hidden behind this result.
About the case of infinite order, the situation is much more complicated. Never-
theless, we can first notice in this case that we cannot expect to prove a stronger result
than an absence of semiclassical pseudospectrum of index 1. Indeed, we can easily
check on the example of the operator q3(x, ξ)
w defined in (2.2.6) that its spectrum is
given by
q3(x, ξ)
(2k1 + 1)
2 + (2k2 + 1)3
8 : (k1, k2) ∈ N2
We recall that the spectrum of this operator is only composed of eigenvalues and that
its symbol is of infinite order on R∗+. It follows from the structure of the spectrum and
(1.3.5) that if there is no semiclassical pseudospectrum of infinite index in a point of
the half-line R∗+, there is necessary no semiclassical pseudospectrum of index µ with
an index µ ≥ 1. In fact, we can prove by using a result of exponential decay in time
for the norm of contraction semigroups generated by elliptic quadratic differential
operators (see [12]) that there is never some semiclassical pseudospectrum of index 1
on all these half-lines of infinite order. Let us mention that this result of exponential
decay will not be proved here but it will be explained in the following how it induces
the absence of semiclassical pseudospectrum of index 1.
2.2.3. About the geometry of ε-pseudospectra for elliptic quadratic differential opera-
tors. Let us now explain what are the consequences of these results on the geometry
of ε-pseudospectra for elliptic quadratic differential operators. Let us begin by con-
sidering the one-dimensional case which is a bit particular. In dimension n = 1, an
elliptic quadratic differential operator can be reduced after a similitude and a conju-
gation by a unitary operator to the harmonic oscillator or to the rotated harmonic
oscillator.
Proposition 2.2.2. Let us consider q : R×R → C a complex-valued elliptic quadratic
form such that Σ(q) 6= C. For all h > 0, there exist a unitary operator (more precisely
a metaplectic operator) Uh on L
2(R), which is an automorphism of the spaces S(R)
and B, z ∈ C∗ and θ ∈ [0, π[ such that
∀h > 0, q(x, hξ)w = zUh
(hDx)
2 + eiθx2
U−1h .
Remark. In the case where Σ(q) = C, an elliptic quadratic differential operator
q(x, ξ)w can be reduced after a similitude and a conjugation by a unitary operator on
L2(Rn) to the operator defined in the Weyl quantization by the symbol
(ξ + ix)(ξ + ηx) with η ∈ C, Im η > 0,
(ξ − ix)(ξ + ηx) with η ∈ C, Im η < 0,
depending on the value of its Fredholm index, which is equal to −2 in the first case
and to 2 in the second one.
As we will see in the following, this proposition allows us to reduce the study of a
one-dimensional non-normal elliptic quadratic differential operator verifying
Σ(q) 6= C,
to the one of the rotated harmonic oscillator
Hθ = D
x + e
iθx2, 0 < θ < π.
Let us mention that the previous results (Theorem 2.2.1 and Theorem 2.2.2) were
already known in the particular case of the rotated harmonic oscillator. Indeed,
the existence of semiclassical quasimodes inducing the presence of semiclassical pseu-
dospectrum of infinite index in every point of the interior of the numerical range for
the associated semiclassical operator, is a direct consequence of a result proved by
E.B. Davies in [4] (Theorem 1). About the absence of semiclassical pseudospectrum
of index 2/3 on the boundary of the numerical range, this result has been proved for
the rotated harmonic oscillator in [10]2.
As proved in [10], this absence of semiclassical pseudospectrum allows to give a
proof of a conjecture stated by L.S. Boulton in [1]. It deals with the geometry of ε-
pseudospectra for the rotated harmonic oscillator. Let us now recall some facts about
this conjecture and some results proved by L.S. Boulton in [1].
2Let us recall that the value of the order is equal to 2 in this case.
L.S. Boulton has first proved (Theorem 3.3 in [1]) that the resolvent of the rotated
harmonic oscillator blows up in norm along all a family of curves of the following form
η 7→ bη + eiθηp,
where b and p are some positive constants verifying 1/3 < p < 3,
(2.2.11)
Hθ − (bη + eiθηp)
∥ → +∞ when η → +∞.
On the other hand, he also proved that the resolvent of this operator remains bounded
in norm on two half-stripes parallel to the half-lines R+ or e
iθR+. More precisely, he
proved that there exist some positive constants d and Md such that
(2.2.12) sup
, 0≤b≤d
Hθ − (η + ib)
∥ ≤ Md,
(2.2.13) sup
, 0≤b≤d
Hθ − eiθ(η − ib)
∥ ≤ Md.
These bounds provide some information about the shape of ε-pseudospectra of the
operator Hθ. Indeed, L.S. Boulton has proved using these results that for all suffi-
ciently small value of the positive parameter ε, the ε-pseudospectra of the rotated
harmonic oscillator is contained in the shaded set appearing on the following figure.
The eigenvalues appear on this figure marked by some ⋄.
Figure 5. A first localization of the ε-pseudospectra of the rotated
harmonic oscillator.
More precisely, L.S. Boulton proved that for all 0 < δ < 1 and m ∈ N, there exists
a positive constant ε0 such that for all 0 < ε < ε0,
(2.2.14) σε(Hθ) ⊂
{z ∈ C : |z − λn| < δ} ∪
λm+1 − δeiθ/2 + Sθ
where
λn = e
iθ/2(2n+ 1), n ∈ N
Sθ = {z ∈ C∗ : 0 ≤ arg z ≤ θ} ∪ {0}.
In fact, in view of some numerical calculations performed by E.B. Davies in [3],
L.S. Boulton has conjectured that the index p = 1/3 appearing in (2.2.11) is the
critical one in the following sense:
Let us consider 0 < p < 1/3, 0 < δ < 1 and m ∈ N. If bm,p and E are some positive
constants verifying
bm,pE + e
iθEp = λm and ∀η > E, arg zη < θ/2,
where zη = bm,pη + e
iθηp, let us set
Ωm,p =
|zη|eiα ∈ C : η ≥ E, arg zη ≤ α ≤ arg(zηeiθ)
L.S. Boulton has conjectured the following result.
Boulton’s conjecture. There exists ε0 > 0 such that for all 0 < ε < ε0,
(2.2.15) σε(Hθ) ⊂
{z ∈ C : |z − λn| < δ} ∪ Ωm,p.
The absence of semiclassical pseudospectrum of index 2/3 on the boundary of the
numerical range ∂Σ(q)\{0} for the rotated harmonic oscillator3 given by the theorem
2.2.2 shows that this index 1/3 is actually the critical one. Indeed, we can deduce
(2.2.15) from (2.2.10) (see [10] for more details) since here kj = 2, j ∈ {1, 2}. As
we will see, this theorem 2.2.2 is a consequence of a subelliptic estimate for gen-
eral semiclassical pseudodifferential operators proved by N. Dencker, J. Sjöstrand and
M. Zworski in [5] (Theorem 1.4). In the particular case of the rotated harmonic oscil-
lator, a more elementary proof of this result using only some non-trivial localization
scheme in the frequency variable is given in [10].
Let us notice that this inclusion (2.2.15) allows to give a sharp description of the ε-
pseudospectra of the rotated harmonic oscillator, which is optimal in view of (2.2.11).
Figure 6. Shape of the ε-pseudospectra of the rotated harmonic oscillator.
By coming back to the case of an arbitrary dimension n ≥ 1, let us finally underline
that using the theorem 2.2.2, we can give similar descriptions of the ε-pseudospectra
for non-normal elliptic quadratic differential operators, to the one given by L.S. Boul-
ton for the rotated harmonic oscillator, when the symbols of these operators are of
finite order on the two open half-lines, which compose the boundary of their numerical
ranges. The only difference with the particular case of the rotated harmonic oscillator
is that the critical indices, which appear in this description can be different. Indeed,
3The order of the rotated harmonic oscillator’s symbol is equal to 2 on ∂Σ(q) \ {0}.
these critical indices depend directly according to (2.2.10) on the order of the symbols
on the two half-lines composing the boundary of their numerical ranges. We refer the
reader to [10] for more details about the way of getting from (2.2.10) such descriptions
of ε-pseudospectra.
3. The proofs of the results
Before giving the proofs of the results stated in the previous section, let us begin by
recalling the symplectic invariance property of the Weyl quantization (see Theorem
18.5.9 in [7]). This symplectic invariance is actually the most important property of
the Weyl quantization.
For every affine symplectic transformation χ of R2n, there exists a unitary trans-
formation U on L2(Rn), uniquely determined apart from a constant factor of modulus
1, such that U is an automorphism of the spaces S(Rn), B and S ′(Rn), where B is
the Hilbert space defined in (2.1.6), and
(3.0.1) (a ◦ χ)(x, ξ)w = U−1a(x, ξ)wU,
for all a ∈ S ′(R2n). The operator U is a metaplectic operator associated to the affine
symplectic transformation χ.
This symplectic invariance of the Weyl quantization induces the same property for
the semiclassical pseudospectra of elliptic quadratic differential operators in the sense
that if
q : Rnx × Rnξ → C,
is a complex-valued elliptic quadratic form and χ is a linear symplectic transformation
of R2n, we have for all µ ∈ [0,∞],
(3.0.2) Λscµ
(q ◦ χ)(x, hξ)w
= Λscµ
q(x, hξ)w
To prove this fact, let us begin by noticing that for all a ∈ S ′(R2n) and h > 0, we
U−1h a(x, ξ)
wUh = a(h
−1/2x, h1/2ξ)w,
where
Uhf(x) = h
n/4f(h1/2x),
since according to the proof of Theorem 18.5.9 in [7], Uh is a metaplectic operator
associated to the linear symplectic transformation
(x, ξ) 7→ (h−1/2x, h1/2ξ).
Let us now consider the case where the symbol a is a quadratic form. The homogeneity
property of such a symbol implies that
∀h > 0, a(h−1/2x, h1/2ξ) = 1
a(x, hξ),
∀h > 0, U−1h a(x, ξ)
wUh =
a(x, hξ)w.
If q : Rnx × Rnξ → C is a complex-valued elliptic quadratic form and χ is a linear
symplectic transformation of R2n, we can notice that
(q ◦ χ)(x, hξ)w , h > 0,
is actually an elliptic quadratic differential operator since the symbol q◦χ is an elliptic
quadratic form. Let z ∈ C and U be a metaplectic operator associated to the linear
symplectic transformation χ. Using that U and Uh are some automorphisms of the
Hilbert space B and
(3.0.3) U−1h U
−1Uhq(x, hξ)
wU−1h UUh = U
−1hq(x, ξ)wUUh
= hU−1h (q ◦ χ)(x, ξ)
wUh = (q ◦ χ)(x, hξ)w ,
we obtain that
U−1h U
q(x, hξ)w − z
U−1h UUh =
(q ◦ χ)(x, hξ)w − z
Using finally that U−1h U
−1Uh is a unitary transformation of L
2(Rn), this identity
implies that
(q ◦ χ)(x, hξ)w − z
q(x, hξ)w − z
which proves (3.0.2). In the following, this property of symplectic invariance will
allow us to reduce certain symbols to some normal forms by choosing new symplectic
coordinates. We can now begin to prove the results stated in the previous section.
Let us start by the proof of the proposition 2.2.1.
Proof of Proposition 2.2.1. If the numerical range is equal to the whole complex plane,
there is nothing to prove. If Σ(q) 6= C, we have seen in the previous section that the
numerical range is necessary a closed angular sector with a top in 0 and an opening
strictly lower than π.
Let us consider z 6∈ Σ(q) and denote by z0 its orthogonal projection on the non-
empty closed convex set Σ(q). According to the shape of the numerical range, it
follows that z0 belongs to its boundary and that we can find a complex number
z1 ∈ C∗, |z1| = 1 such that
Σ(z1q) ⊂
z ∈ C : Re z ≥ 0
(3.0.4) z1z ∈
z ∈ C : Re z < 0
z,Σ(q)
= d(z1z, iR).
Using now that the operator i[Im(z1q)]
w is formally skew-selfadjoint, we obtain that
for all u ∈ S(Rn),
z1q(x, ξ)
wu− z1zu, u
L2(Rn)
= d(z1z, iR)‖u‖2L2(Rn) +
z1q(x, ξ)
L2(Rn)
.(3.0.5)
Then, since the quadratic form Re(z1q) is non-negative, we deduce from the symplectic
invariance of the Weyl quantization and the theorem 21.5.3 in [7] that there exists a
metaplectic operator U such that
z1q(x, ξ)
= U−1
+ x2j) +
j=k+1
with k, l ∈ N and λj > 0 for all j = 1, ..., k. By using that U is a unitary operator on
L2(Rn), we obtain that the quantity
z1q(x, ξ)
L2(Rn)
‖DxjUu‖2L2(Rn) + ‖xjUu‖
L2(Rn)
j=k+1
‖xjUu‖2L2(Rn),
is non-negative. Then, we can deduce from the Cauchy-Schwarz inequality, (3.0.4)
and (3.0.5) that for all u ∈ S(Rn),
z,Σ(q)
‖u‖L2(Rn) ≤ |z1| ‖q(x, ξ)wu− zu‖L2(Rn).
Finally, using the density of the Schwartz space S(Rn) in B and the fact that |z1| = 1,
we obtain that
∀z 6∈ Σ(q),
q(x, ξ)w − z
∥ ≤ 1
z,Σ(q)
since according to (2.1.7), σ
q(x, ξ)w
⊂ Σ(q). �
We now consider the one-dimensional case, which is a bit particular.
3.1. The one-dimensional case. In dimension n = 1, we can reduce the study of
complex-valued elliptic quadratic forms to exactly three normal forms after a simili-
tude and a real linear symplectic transformation.
Lemma 3.1.1. Let q : Rx × Rξ → C be a complex-valued elliptic quadratic form in
dimension 1. Then, there exists a linear symplectic transformation χ of R2 such that
the symbol q ◦ χ is equal to one of the following normal forms:
(i) α(ξ2 + eiθx2) with α ∈ C∗, 0 ≤ θ < π.
(ii) α(ξ + ix)(ξ + ηx) with α ∈ C∗, η ∈ C, Im η > 0.
(iii) α(ξ − ix)(ξ + ηx) with α ∈ C∗, η ∈ C, Im η < 0.
In the two last cases (ii) and (iii), the numerical range Σ(q) is equal to the whole
complex plane, Σ(q) = C.
Proof of Lemma 3.1.1. Let q : R2 → C be a complex-valued elliptic quadratic form.
Let us first consider the case where Σ(q) 6= C. We deduce from the proposition 2.1.1
that we can reduce our study to the case where Re q is a positive definite quadratic
form. Then, using Lemma 18.6.4 in [7], we can find a real linear symplectic transfor-
mation to reduce the quadratic form Re q to the normal form
λ(x2 + ξ2), with λ > 0.
It follows that there exist some real constants a, b and c such that
q(x, ξ) = λ
x2 + ξ2 + i(ax2 + 2bxξ + cξ2)
Then, we can choose an orthogonal matrix P ∈ O(2,R) diagonalizing the real sym-
metric matrix associated to the quadratic form ax2 + 2bxξ + cξ2,
with λ1, λ2 ∈ R. If P ∈ O(2,R) \ SO(2,R), we have
if σ0 is the matrix with determinant equal to −1,
and P̃ = Pσ0. It follows that we can always diagonalize the real symmetric matrix
associated to the quadratic form λ−1Im q by conjugating it by an element of SO(2,R).
Since the symplectic group is equal in dimension 1 to the group SL(2,R), we can after
a linear symplectic transformation of R2 reduce the quadratic form q to
x2 + ξ2 + i(γ1x
2 + γ2ξ
= α(ξ2 + reiθx2),
where γ1, γ2 ∈ R, α ∈ C∗, r > 0 and θ ∈] − π, π[. Let us notice that the elliptic-
ity of q actually implies that θ 6≡ π[2π]. Finally, using the real linear symplectic
transformation (x, ξ) 7→ (r−1/4x, r1/4ξ), we get a symbol of type (i),
αr1/2(ξ2 + eiθx2),
if 0 ≤ θ < π. If −π < θ < 0, we need to use besides the real linear symplectic
transformation (x, ξ) 7→ (ξ,−x) to obtain a symbol of type (i),
2 eiθ(ξ2 + e−iθx2).
Let us now assume that Σ(q) = C. Since the dimension is equal to 1, we can factor the
symbol q on C as a polynomial function of degree 2 in the variable ξ. Thus, according
to the dependence in the variable x of the polynomial function’s coefficients, we can
find some complex numbers λ1, λ2 and α ∈ C∗ such that
q(x, ξ) = α(ξ − λ1x)(ξ − λ2x).
The ellipticity assumption for the quadratic form q induces that
Im λj 6= 0,
if j = 1, 2. Using now the linear symplectic transformation (x, ξ) 7→ (x, ξ + Re λ1x),
we can assume that
(3.1.1) q(x, ξ) = α(ξ − irx)(ξ + bx),
with r ∈ R∗ and Im b 6= 0. Let us now check that the assumption Σ(q) = C induces
that r Im b < 0. Since
(ξ − irx)(ξ + bx) = ξ2 + (b − ir)xξ − irbx2,
the condition Σ(q) = C implies that for all (v, w) ∈ R2, there exists a solution
(x0, ξ0) ∈ R2 of the system
(3.1.2)
ξ2 +Re b xξ + r Im b x2 = v
xξ(Im b− r) − r Re b x2 = w.
Let us first notice that the second equation of (3.1.2) is fulfilled for all w ∈ R only if
Im b 6= r.
If w 6= 0, it follows from the second equation of (3.1.2) that x0 6= 0 and
(3.1.3) ξ0 =
w + r Re b x20
(Im b− r)x0
Let us consider the case where v = 0. Using (3.1.3) and the first equation of (3.1.2),
we obtain that
(w + r Re b x20)
2 +Re b (Im b− r)x20(w + r Re b x20) + r Im b (Im b− r)2x40 = 0.
We can rewrite this equation as fw(X0) = 0 if we set X0 = x
0 and
(3.1.4) fw(X) = r Im b
(Re b)2 + (Im b− r)2
X2 + w Re b (Im b+ r)X + w2.
Thus, the condition Σ(q) = C implies that there exists for all w 6= 0, a non-negative
solution X0 of the equation fw(X0) = 0. Since the quantity r Im b is assumed to be
non-zero, we first study the case where r Im b > 0. In this case, since
(3.1.5) f ′w(X) = 2r Im b
(Re b)2 + (Im b− r)2
X + w Re b (Im b+ r)
2r Im b
(Re b)2 + (Im b− r)2
because Im b 6= r, we have
(3.1.6) ∀X ∈ R+, fw(X) ≥ fw(0) = w2 > 0,
if w 6= 0 and
− w Re b (Im b+ r)
2r Im b
(Re b)2 + (Im b− r)2
) ≤ 0.
The estimate (3.1.6) shows that if r Im b > 0, the equation fw(X) = 0 has no non-
negative solution for all value of the parameter w 6= 0. This proves that the condition
Σ(q) = C induces that r Im b < 0. Using the linear symplectic transformation
(x, ξ) 7→ (|r|−1/2x, |r|1/2ξ),
we obtain the normal forms (ii) and (iii),
α|r|(ξ + ix)(ξ + ηx) with Im η > 0 and α|r|(ξ − ix)(ξ + ηx) with Im η < 0,
where η = |r|−1b. Finally, we can easily check that the numerical ranges of the normal
forms (ii) and (iii) are actually equal to the whole complex plane C. �
Let us notice that the proposition 2.2.2 and the remark following its statement are
some direct consequences of the symplectic invariance property of the Weyl quanti-
zation (see (3.0.3)) and the previous lemma. We can add that as proved after the
lemma 3.1 in [6], the Fredholm indices of the one-dimensional elliptic quadratic dif-
ferential operators with symbols of type (i), (ii) and (iii) are respectively equal to 0,
−2 and 2.
As we have mentioned in the previous section, the results of Theorem 2.2.1 and
Theorem 2.2.2 are already known in the particular case of the rotated harmonic oscil-
lator. The existence of semiclassical quasimodes inducing the presence of semiclassical
pseudospectrum of infinite index in every point of the interior of the numerical range
for the associated semiclassical operator, is a direct consequence of a result proved
by E.B. Davies in [4] (Theorem 1) and; the absence of semiclassical pseudospectrum
of index 2/3 on the boundary of the numerical range has been proved for the ro-
tated harmonic oscillator in [10]4. As we have previously mentioned (see (2.1.10) and
(3.0.2)), the property of non-normality, the order of symbols and the semiclassical
pseudospectra of elliptic quadratic differential operators are symplectically invariant.
These properties allow us to reduce by any real linear symplectic transformations the
symbols of the elliptic quadratic differential operators that we consider in our proof of
the theorem 2.2.1 and the theorem 2.2.2. By using the lemma 3.1.1, we deduce from
the results of the theorem 2.2.1 and the theorem 2.2.2 proved for the rotated harmonic
oscillator that they are therefore also fulfilled by all non-normal one-dimensional el-
liptic quadratic differential operators with a numerical range different from the whole
complex plane.
We now consider the multidimensional case. As we will see in the following, there is
a real jump of complexity between the one-dimensional case and the multidimensional
one. This jump is among other things a consequence of the complexity increase of
symplectic geometry in dimension n ≥ 2 and the larger diversity appearing in the
class of elliptic quadratic differential operators.
4Let us recall that the value of the order is equal to 2 in this case.
3.2. Case of dimension n ≥ 2. We only need to study the case of a non-normal
elliptic quadratic differential operator
(3.2.1) q(x, ξ)w : B → L2(Rn),
in dimension n ≥ 2. Let us recall that in this case, the numerical range Σ(q) is a
closed angular sector with a top in 0 and a positive opening strictly lower than π, and
that the proposition 2.1.2 gives that
(3.2.2) ∃(x0, ξ0) ∈ R2n, {Re q, Im q}(x0, ξ0) 6= 0.
Let us begin by studying what occurs at the interior of the numerical range Σ̊(q).
3.2.1. On the pseudospectrum at the interior of the numerical range. To prove the
existence of semiclassical quasimodes for the associated semiclassical operator given
by the theorem 2.2.1, we need a first purely algebraic step to characterize the points
belonging to the interior of the numerical range.
Let us consider the following decomposition of the numerical range
(3.2.3) Σ(q) = Ã ⊔ B̃,
where
(3.2.4) Ã =
z ∈ Σ(q) : ∃(x0, ξ0) ∈ R2n, z = q(x0, ξ0), {Re q, Im q}(x0, ξ0) 6= 0
(3.2.5) B̃ =
z ∈ Σ(q) : z = q(x0, ξ0) ⇒ {Re q, Im q}(x0, ξ0) = 0
The next section is devoted to give a geometrical description of these two sets. We
establish using purely algebraic arguments that
(3.2.6) Ã = Σ̊(q) and B̃ = ∂Σ(q).
This result is a consequence of the geometry induced by the quadratic setting to which
the studied symbols belong.
Let us begin by noticing that the symplectic invariance of the Poisson bracket
(2.1.10) induces the same property for the sets Ã and B̃. We can therefore use some
real linear symplectic transformation to reduce the symbol q. Since
{Re(zq), Im(zq)} = |z|2{Re q, Im q},
we deduce from this symplectic invariance, from the proposition 2.1.1 and the lemma
18.6.4 in [7] that after a similitude, we can reduce our study to the case where
(3.2.7) Re q(x, ξ) =
j + x
with λj > 0 for all j = 1, ..., n.
3.2.1.a. Geometrical description of the sets Ã and B̃. We begin by proving the fol-
lowing inclusion
(3.2.8) ∂Σ(q) ⊂ B̃.
Let us consider z ∈ ∂Σ(q) and (x0, ξ0) ∈ R2n such that z = q(x0, ξ0). This is
possible because the numerical range is a closed angular sector. If z = 0, the ellipticity
property of q implies that
(x0, ξ0) = (0, 0) and {Re q, Im q}(x0, ξ0) = 0,
because this Poisson bracket is also a quadratic form. This proves that z ∈ B̃. If
z ∈ ∂Σ(q) \ {0},
let us consider the global solution Y of the linear Cauchy problem
(3.2.9)
Y ′(t) = HRe q
Y (t)
Y (0) = (x0, ξ0),
associated to the Hamilton vector field of the symbol Re q,
HRe q =
(∂Re q
− ∂Re q
It is actually a linear Cauchy problem since Re q is a quadratic form. Setting
f(t) = Im q
Y (t)
a direct computation gives that
f ′(0) = {Re q, Im q}(x0, ξ0).
If f ′(0) 6= 0, we could find t0 6= 0 such that
|f(t0)| > |f(0)| = |Im z|.
Since Y is the flow associated to the Hamilton vector field of Re q, the quadratic form
Re q is constant under it. It follows that for all t ∈ R,
Y (t)
= Re q
Y (0)
= Re z
and provides a contradiction because, since z ∈ ∂Σ(q) \ {0}, this would imply in view
of the shape of the numerical range Σ(q) (see Figure 7) that
Y (t0)
6∈ Σ(q).
It follows that the Poisson bracket {Re q, Im q}(x0, ξ0) is necessary equal to 0 and
Figure 7.
q(Y (t
that z ∈ B̃. This ends the proof of the inclusion (3.2.8).
Let us now assume that
(3.2.10) ∂Σ(q) ⊂ B̃, ∂Σ(q) 6= B̃.
In this case, we could find
(3.2.11) z ∈ B̃ \ ∂Σ(q).
Let us first notice that z is necessary non-zero since 0 ∈ ∂Σ(q), and that Re z > 0,
since from (3.2.7),
(3.2.12) Σ(q) \ {0} ⊂ {z ∈ C∗ : Re z > 0}.
The fact that z belongs to the set B̃ implies that
(3.2.13)
Re q(x, ξ) = Re z
Im q(x, ξ) = Im z
=⇒ {Re q, Im q}(x, ξ) = 0.
We also know that there exists at least one solution to the system appearing in the
left-hand-side of (3.2.13). Since from (3.2.7), the quadratic form Re q is positive
definite, we can simultaneously reduce the quadratic forms Re q and Im q by finding
an isomorphism P of R2n such that in the new coordinates y = P−1(x, ξ),
(3.2.14) Re q(Py) =
y2j and Im q(Py) =
j with α1 ≤ ... ≤ αn.
Let us now consider the following quadratic form
(3.2.15) p(y) = {Re q, Im q}(Py).
We get from (3.2.13) and (3.2.14) that
(3.2.16)
j=1 y
j = Re z
j=1 αjy
j = Im z
=⇒ p(y) = 0.
Let us underline that the isomorphism P is not a priori a symplectic transformation
and that it does not preserve the Poisson bracket {Re q, Im q}.
We consider the two following sets
(3.2.17) E1 =
y ∈ R2n : r(y) = 0
where
(3.2.18) r(y) =
(3.2.19) E2 =
y ∈ R2n : p(y) = 0
The next lemma gives a first inclusion between these two sets E1 and E2.
Lemma 3.2.1. We have
(3.2.20) E1 ⊂ E2.
Proof of Lemma 3.2.1. Let y ∈ E1. If y = 0 then y belongs to E2 since from (3.2.15),
p is a quadratic form in the variable y. If y 6= 0, we set
y2j > 0 and ∀j = 1, ..., 2n, ỹj =
We recall from (3.2.12) that z ∈ B̃ \ ∂Σ(q) implies that Re z > 0. Then, since, on
one hand
ỹ2j = Re z,
and that, on the other hand, we have from (3.2.17) and (3.2.18) that
αj ỹ
y2j = Im z,
because y ∈ E1, we deduce from (3.2.16) and the homogeneity of degree 2 of the
quadratic form p that
p(ỹ) =
p(y) = 0.
According to (3.2.19), this proves that y ∈ E2 and ends the proof of the lemma 3.2.1.�
Then, we can notice from (3.2.14) that the boundary of the numerical range ∂Σ(q)
is given by
(3.2.21) (1 + iα1)R+ ∪ (1 + iαn)R+.
Since the numerical range Σ(q) is a closed set, the assumption
z ∈ B̃ \ ∂Σ(q) ⊂ Σ(q) \ ∂Σ(q) = Σ̊(q),
induces from (3.2.21) that
∈]α1, αn[.
This implies that the signature (r1, s1) of the quadratic form r defined in (3.2.18)
fulfills
(3.2.22) (r1, s1) ∈ N∗ × N∗ and r1 + s1 ≤ 2n.
Thus, we can assume after a new labeling that
(3.2.23) r(y) = a1y
1 + ...+ ar1y
− ar1+1y2r1+1 − ...− ar1+s1y
r1+s1
with aj > 0 for all j = 1, ..., r1+ s1. It follows from (3.2.17) and (3.2.23) that in these
new coordinates, the set E1 is the direct product of a proper cone C of R
r1+s1 and
R2n−r1−s1 ,
(3.2.24) E1 = C × R2n−r1−s1 .
Figure 8.
We are now going to prove that the two sets E1 and E2 are equal
(3.2.25) E1 = E2.
Let us reason by the absurd by assuming that it is not the case. Then, we could find
from the lemma 3.2.1,
(3.2.26) y0 ∈ E2 \ E1, y0 = (y′0, y′′0 ) with y′0 ∈ Rr1+s1 , y′′0 ∈ R2n−r1−s1 .
We deduce from (3.2.24) that y′0 6∈ C. Let us now recall an elementary geometrical
fact that we will use several times. This fact is that the intersection of a real line and
a real quadric surface is reduced to either 0, 1 or 2 points, or the line is completely
contained in the quadric surface. We first begin by proving that
(3.2.27) Rr1+s1 × {y′′ = y′′0} ⊂ E2.
Indeed, let us consider the affine subspace
F = {y ∈ R2n : y = (y′, y′′) ∈ Rr1+s1 × R2n−r1−s1 , y′′ = y′′0}.
We identify for more simplicity the space F to the space Rr1+s1 . We agree to say that
a point x′0 of R
r1+s1 belongs to the set E2 to mean that the point (x
0 ) belongs to
the set E2. With this convention, it is sufficient for proving the inclusion (3.2.27) to
consider some particular lines of Rr1+s1 , containing the point y′0 defined in (3.2.26)
and, which have an intersection with the cone C in at least two other different points
u′0 and v
0 (see Figure 9). These lines are necessary contained in the quadric surface
E2 because from the lemma 3.2.1,
E1 ⊂ E2,
and that there are at least three different points of intersection between these lines
and the quadric surface E2,
(u′0, y
0 ) ∈ C × R2n−r1−s1 = E1 ⊂ E2, (v′0, y′′0 ) ∈ C × R2n−r1−s1 = E1 ⊂ E2,
and (y′0, y
0 ) ∈ E2. Thus, we prove that the shaded disc appearing on the figure 10
is completely contained in the set E2. By using the cone structure of the set E2,
we can deduce that all the interior of the cone C (see Figure 11) is contained in E2.
Then, using again other particular intersections with some lines as on the figure 12,
we deduce from our identification of the space F to Rr1+s1 that the inclusion (3.2.27)
is fulfilled.
Figure 9.
We now prove that under these conditions, we have the identity
(3.2.28) E2 = R
Indeed, let us consider (ỹ′0, ỹ
0 ) ∈ R2n = Rr1+s1 × R2n−r1−s1 . If ỹ′0 ∈ C, then
(ỹ′0, ỹ
0 ) ∈ E2,
Figure 10.
These three points belong to E2.
The line D is contained in E2.
Figure 11.
because from (3.2.20) and (3.2.24), (ỹ′0, ỹ
0 ) ∈ E1 and E1 ⊂ E2. If, on the other hand
ỹ′0 6∈ C, we can choose a point u ∈ Rr1+s1 different from ỹ′0 such that u 6∈ C, and such
that the line containing ỹ′0 and u in R
r1+s1 , has an intersection with C in at least two
other different points v and w (see Figure 13). Thus, we can find some distinct real
numbers t1, t2 ∈ R \ {0, 1} such that
v = (1− t1)ỹ′0 + t1u ∈ C and w = (1− t2)ỹ′0 + t2u ∈ C.
Considering now the line
(1− t)(ỹ′0, ỹ′′0 ) + t(u, y′′0 ) : t ∈ R
we can notice that this real line contains at least three different points of E2:
(v, (1 − t1)ỹ′′0 + t1y′′0 ), (w, (1 − t2)ỹ′′0 + t2y′′0 ) and (u, y′′0 ).
Indeed, this is a consequence of the fact that v and w belong to C, and from (3.2.20),
(3.2.24) and (3.2.27). Thus, the line D is contained in the quadric surface E2. This
implies that (ỹ′0, ỹ
0 ) ∈ D ⊂ E2.
To sum up, we have proved that if the two sets E1 and E2 are different then the
set E2 is equal to R
2n. This fact induces in view of (3.2.19) that the quadratic form p
is identically equal to zero. By coming back to the first coordinates (x, ξ) = Py, it
Figure 12.
Figure 13.
follows from (3.2.15) that the quadratic form {Re q, Im q} is also identically equal to
zero, which contradicts (3.2.2). This proves the identity (3.2.25),
E1 = E2.
With this fact, we can resume our first reasoning by the absurd, which assume in
(3.2.11) the existence of a point z ∈ B̃ \ ∂Σ(q). Let us now consider y0 6∈ E1 = E2.
This is possible according to (3.2.2), (3.2.15) and (3.2.19). We deduce from (3.2.17)
and (3.2.19) that r(y0) and p(y0) are non-zero. By considering λ ∈ R∗ such that
p(y0) = λr(y0)
(3.2.29) r̃(y) = p(y)− λr(y),
it follows from (3.2.17), (3.2.19), (3.2.25) and (3.2.29) that
(3.2.30) E1 ⊂ {y ∈ R2n : r̃(y) = 0}.
This inclusion (3.2.30) is strict since
r̃(y0) = 0 and y0 6∈ E1.
By using now exactly the same reasoning as the one previously described to prove
(3.2.25), about the intersections of real lines and quadric surfaces, we prove that the
quadratic form r̃ is necessary identically equal to zero. Then, it follows from (3.2.29)
(3.2.31) p = λr.
By coming back to the first coordinates (x, ξ) = Py, we get using (3.2.14), (3.2.15),
(3.2.18) and (3.2.31) that for all (x, ξ) ∈ R2n,
(3.2.32) {Re q, Im q}(x, ξ) = λ
Im q(x, ξ)− Im z
Re q(x, ξ)
Let us now consider (x0, ξ0) ∈ R2n such that q(x0, ξ0) ∈ ∂Σ(q) \ {0}. This is possible
since the numerical range Σ(q) is a closed angular sector with a top in 0 and a positive
opening. We deduce from (3.2.5) and (3.2.8) that we necessarily have
{Re q, Im q}(x0, ξ0) = 0.
This induces from (3.2.32) that
(3.2.33) Im q(x0, ξ0) =
Re q(x0, ξ0),
because λ ∈ R∗. Since according to the shape of the numerical range Σ(q) and
(3.2.12),
q(x0, ξ0) ∈ ∂Σ(q) \ {0} ⊂ {z ∈ C : Re z > 0},
the identity (3.2.33) proves that the point z also belongs to the set ∂Σ(q), but it
contradicts the initial assumption
z ∈ B̃ \ ∂Σ(q).
Finally, this ends our reasoning by the absurd and proves (3.2.6).
3.2.1.b. Existence of semiclassical quasimodes at the interior of the numerical range.
To prove the existence of semiclassical quasimodes for the associated semiclassical
operator
(q(x, hξ)w)0<h≤1,
in every point of the numerical range’s interior (Theorem 2.2.1), we use an existence
result of semiclassical quasimodes for general pseudodifferential operators violating
the condition (Ψ)5. Let us mention that this result generalizes the two existence
results of semiclassical quasimodes given by E.B. Davies, in the case of Schrödinger
operators (Theorem 1 in [4]), and by M. Zworski in [17] and [18], for pseudodifferential
operators.
This existence result of semiclassical quasimodes can be stated as follows. Let us
consider a semiclassical symbol P (x, ξ;h) in S(〈(x, ξ)〉m, dx2 + dξ2) with m ∈ R+,
〈(x, ξ)〉2 = 1 + x2 + ξ2,
5The definition of the condition (Ψ) is recalled below.
where S(〈(x, ξ)〉m, dx2 + dξ2) stands for the following symbol class
S(〈(x, ξ)〉m, dx2 + dξ2) =
a(x, ξ;h) ∈ C∞(Rnx × Rnξ ,C) :
∀α ∈ N2n, sup
0<h≤1
‖〈(x, ξ)〉−m∂αx,ξa(x, ξ;h)‖L∞(R2n) < +∞
with a semiclassical expansion
(3.2.34) P (x, ξ;h) ∼
hjpj(x, ξ),
where for all j ∈ N, pj is a symbol of the class S(〈(x, ξ)〉m, dx2 + dξ2) independent
from the semiclassical parameter h.
Let z ∈ C, we assume that there exists a function q0 ∈ C∞b (R2n,C), where
C∞b (R
2n,C) stands for the set of bounded complex-valued functions on R2n with
all derivatives bounded, and a bicharacteristic curve, t ∈ [a, b] 7→ γ(t), of the real part
Re(q0(p0 − z)) of the symbol q0(p0 − z), with a < b, such that
(3.2.35) ∀t ∈ [a, b], q0
6= 0 and
q0(γ(a))
p0(γ(a))− z
> 0 > Im
q0(γ(b))
p0(γ(b))− z
Theorem 3.2.1. Under these assumptions (3.2.34) and (3.2.35), for all open neigh-
bourhood V of the compact set γ([a, b]) in R2n and for all N ∈ N, there exist h0 > 0
and (uh)0<h≤h0 a semiclassical family in S(Rn) such that
‖uh‖L2(Rn) = 1, FS
(uh)0<h≤h0
⊂ V and ‖P (x, hξ;h)wuh − zuh‖L2(Rn) = O(hN ),
when h → 0+.
The notation FS
(uh)0<h≤h0
stands for the frequency set of the semiclassical fam-
ily (uh)0<h≤h0 defined as the complement in R
2n of the set composed by the points
(x0, ξ0) ∈ R2n, for which there exists a symbol χ0(x, ξ;h) ∈ S(1, dx2 + dξ2) such that
χ0(x0, ξ0;h) = 1 and ‖χ0(x, hξ;h)wuh‖L2(Rn) = O(h∞),
when h → 0+.
This existence result of semiclassical quasimodes is an adaptation in a semiclassical
setting of the proof given by L. Hörmander in [7] for proving that the condition (Ψ) is a
necessary condition for the solvability of a pseudodifferential operator (Theorem 26.4.7
in [7]). The existence of this result has been first mentioned in [5]. A complete proof of
this adaptation in a semiclassical setting is given in [11]. This result shows that when
the principal symbol p0−z of the symbol P−z violates the condition (Ψ), there exists
in this point z some semiclassical quasimodes inducing the presence of semiclassical
pseudospectrum of infinite index for the semiclassical operator P (x, hξ;h)w.
Condition (Ψ). A complex-valued function p ∈ C∞(R2n,C) fulfills the condition (Ψ)
if there is no complex-valued function q ∈ C∞(R2n,C) such that the imaginary part
Im(qp) of the function qp changes sign from positive values to negative ones along
an oriented bicharacteristic of the symbol Re(qp) on which the function q does not
vanish.
By using the characterization given in the previous section for the interior of the
numerical range Σ̊(q) (see (3.2.4) and (3.2.6)), we are now going to prove that the
principal symbol q(x, ξ) − z of the semiclassical operator
q(x, hξ)w − z,
violates the condition (Ψ) for all z in Σ̊(q). This violation of the condition (Ψ) will
induce in view of the theorem 3.2.1 that for all z ∈ Σ̊(q) and N ∈ N, we can find a
semiclassical quasimode (uh)0<h≤h0 ∈ S(Rn), with h0 > 0, verifying
‖uh‖L2(Rn) = 1 and ‖q(x, hξ)wuh − zuh‖L2(Rn) = O(hN ) when h → 0+,
which will end the proof of Theorem 2.2.1.
Let us consider z ∈ Σ̊(q). We are now going to prove that there is actually a
violation of the condition (Ψ) for the symbol q − z. According to (3.2.4) and (3.2.6),
there are two cases to separate.
Case 1. Let us assume that there exists (x0, ξ0) ∈ R2n such that
(3.2.36) z = q(x0, ξ0), {Re(q − z), Im(q − z)}(x0, ξ0) = {Re q, Im q}(x0, ξ0) < 0.
By considering the solution of the following Cauchy problem
(3.2.37)
Y ′(t) = HRe q
Y (t)
Y (0) = (x0, ξ0),
we define the following function
(3.2.38) f(t) = Im q
Y (t)
− Im q(x0, ξ0).
As mentioned before, (3.2.37) is a linear Cauchy problem. It follows that its solution
Y is global and that the function f is well-defined on R. A direct computation using
(3.2.37) and (3.2.38) gives that for all t ∈ R,
(3.2.39) f ′(t) = {Re q, Im q}
Y (t)
Since from (3.2.36), (3.2.37), (3.2.38) and (3.2.39),
f(0) = 0, f ′(0) = {Re q, Im q}(x0, ξ0) < 0
and HRe q−Re z = HRe q, we deduce in this first case that the imaginary part of the
function q − z changes sign, at the first order, from positive values to negative ones
along the oriented bicharacteristic Y of the symbol Re q−Re z. This proves that the
symbol q − z actually violates the condition (Ψ).
Case 2. Let us now assume that there exists (x0, ξ0) ∈ R2n such that
(3.2.40) z = q(x0, ξ0), {Re(q − z), Im(q − z)}(x0, ξ0) = {Re q, Im q}(x0, ξ0) > 0.
We consider as in the previous case, the global solution Y of the Cauchy problem
(3.2.37) and the function f defined in (3.2.38). Since from (3.2.37), (3.2.38), (3.2.39)
and (3.2.40),
(3.2.41) f(0) = 0, f ′(0) = {Re q, Im q}(x0, ξ0) > 0,
we deduce this time that the imaginary part of the function q − z also changes sign,
at the first order, along the oriented bicharacteristic Y of the symbol Re q − Re z.
Nevertheless, this change of sign is done in the “wrong” way. It is a change of sign
from negative values to positive ones, which does not induce directly a violation of
the condition (Ψ). To check that there is actually a violation of the condition (Ψ)
in this second case, we need to study more precisely the behaviour of the function
Im q − Im z along this bicharacteristic Y .
We deduce from (3.2.41) that there exists ε > 0 such that
∀t ∈ [−ε, ε], f ′(t) > 0,
which induces that
(3.2.42) f(ε) > 0 and f(−ε) < 0,
since from (3.2.41), f(0) = 0. By using the following lemma, we obtain that for all
δ > 0, there exists a time t0(δ) > ε such that
(3.2.43) |Y
t0(δ)
− Y (−ε)| < δ.
Figure 14.
q(Y (�"))
z = q(Y (0))
q(Y ("))
Lemma 3.2.2. If Y (t) = (x(t), ξ(t)) is the C∞(R,R2n) function solving the linear
system of ordinary differential equations
Y ′(t) = HRe q
Y (t)
where Re q is the symbol defined in (3.2.7), then we have
∀t0 ∈ R, ∀ε > 0, ∀M > 0, ∃T1 > M, ∃T2 > M,
|Y (t0)− Y (t0 + T1)| < ε and |Y (t0)− Y (t0 − T2)| < ε.
Proof of Lemma 3.2.2. If Y (t0) = (a1, ..., an, b1, ..., bn) ∈ R2n, we deduce from (3.2.7)
that the function Y (t) = (x(t), ξ(t)) solves the following Cauchy problem
∀j = 1, ..., n,
x′j(t) = 2λjξj(t)
ξ′j(t) = −2λjxj(t)
xj(t0) = aj
ξj(t0) = bj.
It follows that for all j = 1, ..., n and t ∈ R,
(3.2.44)
xj(t) = bj sin
2(t− t0)λj
+ aj cos
2(t− t0)λj
ξj(t) = bj cos
2(t− t0)λj
− aj sin
2(t− t0)λj
Setting βj = λj/π for all j = 1, ..., n, we need to study two different cases.
Case 1: ∀j ∈ {1, ..., n}, βj ∈ Q. In this case, the function Y is periodic and the
result of Lemma 3.2.2 is obvious.
Case 2: (β1, ..., βn) 6∈ Qn. In this second case, we use the following classical result of
rational approximation: ∀ε > 0, ∀(θ1, ..., θn) ∈ Rn \Qn, ∃p1, ..., pn ∈ Z, ∃q ∈ N∗ such
0 < sup
j=1,...,n
If 0 < ε1 < 1/2, we can therefore find some integers p1,1, ..., p1,n ∈ Z and qε1 ∈ N∗
such that
0 < sup
j=1,...,n
|qε1βj − p1,j | < ε1.
j=1,...,n
|qε1βj − p1,j | > 0,
using again this result of rational approximation, we can find some other integers
p2,1, ..., p2,n ∈ Z and qε2 ∈ N∗ such that
0 < sup
j=1,...,n
|qε2βj − p2,j | < ε2.
By using this process, we build some sequences (pm,j)m∈N∗ of Z for j = 1, ..., n,
(εm)m∈N∗ of R
+ and (qεm)m∈N∗ of N
∗ such that for all m ≥ 2,
(3.2.45) 0 < sup
j=1,...,n
|qεmβj − pm,j | < εm =
j=1,...,n
∣qεm−1βj − pm−1,j
(3.2.46) 0 < εm <
The elements of the sequence (qεm)m∈N∗ are necessary two by two different. Indeed,
if qεk = qεl for k < l, this would imply according to (3.2.45) and (3.2.46) that
∀j = 1, ..., n, |pk,j − pl,j| ≤ |qεkβj − pk,j |+ |qεlβj − pl,j | < εk + εl < 1,
because 0 < ε1 < 1/2, which would induce that ∀j = 1, ..., n, pk,j = pl,j because pk,j
and pl,j are some integers; and would contradict (3.2.45) because
0 < sup
j=1,...,n
|qεlβj − pl,j | < εl ≤
j=1,...,n
|qεkβj − pk,j |.
Since the sequence (qεm)m∈N∗ is composed of integers two by two different, we can
assume after a possible extraction that qεm → +∞ when m → +∞. We deduce from
(3.2.44), (3.2.45) and (3.2.46) that
Y (t0 + qεm) → Y (t0) when m → +∞.
Then, considering (β̃1, ..., β̃n) = (−β1, ...,−βn), we obtain by using the same method
a sequence (q̃εm)m∈N∗ of integers such that q̃εm → +∞ and
Y (t0 − q̃εm) → Y (t0) when m → +∞.
This ends the proof of Lemma 3.2.2. �
Since from (3.2.42), f(−ε) < 0, we deduce from (3.2.38) and (3.2.43) that there
exists t0 > ε such that f(t0) is arbitrarily close to f(−ε). It follows in particular that
we can find t0 > ε such that f(t0) < 0. Since from (3.2.42), f(ε) > 0 and f(t0) < 0,
we deduce from (3.2.38) and (3.2.40) that the function
t 7→ Im q
Y (t)
− Im z,
changes sign from positive values to negative ones on the interval [ε, t0]. This proves
that the imaginary part of the function q−z actually changes sign from positive values
to negative ones along the oriented bicharacteristic Y of the symbol Re q−Re z; and
that the symbol q − z also violates in this second case the condition (Ψ). This ends
the proof of Theorem 2.2.1.
3.2.1.c. Another proof for the existence of semiclassical quasimodes. In the following
lines, we give another proof for the existence of semiclassical quasimodes in some
points of the numerical range’s interior. The result proved in this section is weaker
than the one given by the theorem 2.2.1, since we prove the existence of semiclassical
quasimodes in every point of the numerical range’s interior without a finite number
of particular half-lines.
Let us consider a non-normal elliptic quadratic differential operator
(3.2.47) q(x, ξ)w : B → L2(Rn),
in dimension n ≥ 2. We assume, as before, that (3.2.7) is fulfilled. Using that
the quadratic form Re q is positive definite, we can simultaneously reduce the two
quadratic forms Re q and Im q by choosing an isomorphism P of R2n such that in
the new coordinates y = P−1(x, ξ),
(3.2.48) r1(y) = Re q(Py) =
y2j , r2(y) = Im q(Py) =
with α1 ≤ ... ≤ αn. Let us study when the differential forms dr1(y) and dr2(y) are
linearly dependent on R i.e. when there exist (λ, µ) ∈ R2 \ {(0, 0)} such that
(3.2.49) λdr1(y) + µdr2(y) = 0.
It follows from (3.2.48) and (3.2.49) that for all j = 1, ..., 2n,
(3.2.50) (λ+ µαj)yj = 0.
If y 6= 0, then there exists j0 ∈ {1, ..., 2n} such that yj0 6= 0. This implies that
(3.2.51) λ+ µαj0 = 0.
We deduce from (3.2.50) and (3.2.51) that yj = 0 if αj 6= αj0 . Thus, we obtain that if
z ∈ Σ̊(q) \
(1 + iα1)R
+ ∪ ... ∪ (1 + iαn)R∗+
then the differential forms dRe q and dImq are linearly independent on R in every
point of the set q−1(z).
Figure 15.
(1 + i�
(1 + i�
(1 + i�
(1 + i�
(1 + i�
Let us consider such a point
z ∈ Σ̊(q) \
(1 + iα1)R
+ ∪ ... ∪ (1 + iαn)R∗+
Since the dimension n ≥ 2, we can apply the lemma 3.1 in [5] (see also the lemma 8.1
in [9]). It follows that for any compact, connected component Γ of q−1(z), we have
(3.2.52)
{Re q, Im q}(ρ)λq,z(dρ) = 0,
where λq,z stands for the Liouville measure on q
−1(z),
λq,z ∧ dRe q ∧ dIm q =
The set q−1(z) is a non-empty submanifold of codimension 2 in R2n. We deduce from
(3.2.4) and (3.2.6) that there exist (x0, ξ0) ∈ q−1(z) such that
(3.2.53) {Re q, Im q}(x0, ξ0) 6= 0.
Then, it follows from (3.2.52) and (3.2.53) that there necessary exists (x̃0, ξ̃0) ∈ q−1(z)
such that
(3.2.54) {Re q, Im q}(x̃0, ξ̃0) < 0.
Under this condition (3.2.54), we can use the reasoning given in the first studied case
(see (3.2.36)) to prove that the imaginary part of the function q−z changes sign, at the
first order, from positive values to negative ones along an oriented bicharacteristic of
the symbol Re q−Re z. This induces that the symbol q−z violates the condition (Ψ);
and we can conclude by using the theorem 3.2.1. Let us mention that we can also
directly use the existence result of semiclassical quasimodes given by M. Zworski in
[17] and [18]. This second proof gives the existence of semiclassical quasimodes in
every point belonging to the set
Σ̊(q) \
(1 + iα1)R
+ ∪ ... ∪ (1 + iαn)R∗+
3.2.2. On the pseudospectrum at the boundary of the numerical range. In this section,
we give a proof of the theorem 2.2.2. Let us consider a non-normal elliptic quadratic
differential operator
q(x, ξ)w : B → L2(Rn),
in dimension n ≥ 1. We assume that Σ(q) 6= C, and that its Weyl symbol q(x, ξ) is of
finite order kj on a half-line ∆j , j ∈ {1, 2} (See the definition given in (2.2.9)), which
composes the boundary of its numerical range
(3.2.55) ∂Σ(q) = {0} ⊔∆1 ⊔∆2.
As we have already done several times, we can reduce our study to case where (3.2.7)
is fulfilled.
Proof of Theorem 2.2.2. Let us consider the following symbol belonging to the
C∞b (R
2n,C) space, composed of bounded complex-valued functions on R2n with all
derivatives bounded
(3.2.56) r(x, ξ) =
q(x, ξ) − z
1 + x2 + ξ2
with z ∈ ∆j . Setting Σ̃(r) = r(R2n), we can first notice that
z ∈ ∂Σ(q) \ {0} ⇒ 0 ∈ ∂Σ̃(r).
Let us also notice that the symbol r fulfills the principal-type condition in 0. Indeed,
if (x0, ξ0) ∈ R2n was such that r(x0, ξ0) = 0 and dr(x0, ξ0) = 0, we would get from
(3.2.56) that
(3.2.57) dq(x0, ξ0) = 0.
Since from (3.2.7) and (3.2.57), we have
dRe q(x0, ξ0) = 2
(x0)jdxj + (ξ0)jdξj
this would imply that
(x0, ξ0) = (0, 0), q(x0, ξ0) = 0,
because q is a quadratic form and that λj > 0 for all j = 1, ..., n. On the other hand,
since r(x0, ξ0) = 0, we get from (3.2.56) that q(x0, ξ0) = z 6= 0 because
z ∈ ∆j ⊂ ∂Σ(q) \ {0},
which induces a contradiction. It follows that the symbol r actually fulfills the
principal-type condition in 0. Let us notice that, since symbol q is of finite order kj
in z, this induces in view of (3.2.56) that the symbol r is also of finite order kj in 0.
On the other hand, we deduce from (3.2.7) and (3.2.56) that the set
{(x, ξ) ∈ R2n : r(x, ξ) = 0} = {(x, ξ) ∈ R2n : q(x, ξ) = z},
is compact. Under these conditions, we can apply the theorem 1.4 in [5], which proves
that the integer kj is even and gives the existence of positive constants h0 and C1
such that
(3.2.58) ∀ 0 < h < h0, ∀u ∈ S(Rn), ‖r(x, hξ)wu‖L2(Rn) ≥ C1h
kj+1 ‖u‖L2(Rn).
Remark. We did not check the dynamical condition (1.7) in [5], because this assump-
tion is not necessary for the proof of Theorem 1.4. Indeed, this proof only use a part
of the proof of lemma 4.1 in [5] (a part of the second paragraph), where this condition
(1.7) is not needed.
By using some results of symbolic calculus given by Theorem 18.5.4 in [7] and (3.2.56),
we can write
(3.2.59) r(x, hξ)w(1 + x2 + h2ξ2)w = q(x, hξ)w − z + hr1(x, hξ)w + h2r2(x, hξ)w ,
(3.2.60) r1(x, ξ) = −ix
(x, ξ) + iξ
(x, ξ)
(3.2.61) r2(x, ξ) = −
(x, ξ) − 1
(x, ξ).
We can easily check from (3.2.56) that these functions r1 and r2 belong to the space
C∞b (R
2n,C), and we deduce from the Calderón-Vaillancourt theorem that there exists
a positive constant C2 such that for all u ∈ S(Rn) and 0 < h ≤ 1,
(3.2.62) ‖r1(x, hξ)wu‖L2 ≤ C2‖u‖L2 and ‖r2(x, hξ)wu‖L2 ≤ C2‖u‖L2.
It follows from (3.2.58), (3.2.59), (3.2.62) and the triangular inequality that for all
u ∈ S(Rn) and 0 < h < h0,
kj+1 ‖(1 + x2 + h2ξ2)wu‖L2(Rn)
≤ ‖r(x, hξ)w(1 + x2 + h2ξ2)wu‖L2(Rn)
≤ ‖q(x, hξ)wu− zu‖L2(Rn) + C2h(1 + h)‖u‖L2(Rn).
Since from the Cauchy-Schwarz inequality, we have for all u ∈ S(Rn) and 0 < h ≤ 1,
‖u‖2L2(Rn) ≤ ‖u‖2L2(Rn) + ‖xu‖2L2(Rn) + ‖hDxu‖2L2(Rn)
(1 + x2 + h2ξ2)wu, u
L2(Rn)
≤ ‖(1 + x2 + h2ξ2)wu‖L2(Rn)‖u‖L2(Rn),
we obtain that for all u ∈ S(Rn) and 0 < h < h0,
(3.2.63) C1h
kj+1 ‖u‖L2(Rn) ≤ ‖q(x, hξ)wu− zu‖L2(Rn) + C2h(1 + h)‖u‖L2(Rn).
Since kj ≥ 1, we deduce from (3.2.63) that there exist some positive constants h′0 and
C3 such that for all 0 < h < h
0 and u ∈ S(Rn),
‖q(x, hξ)wu− zu‖L2(Rn) ≥ C3h
kj+1 ‖u‖L2(Rn).
Using that the Schwartz space S(Rn) is dense in B and that the operator
q(x, hξ)w + z,
is a Fredholm operator of index 0, we obtain that for all 0 < h < h′0,
q(x, hξ)w − z
∥ ≤ C−13 h
kj+1 ,
which ends the proof of Theorem 2.2.2. �
About the case of infinite order, the situation is much more complicated. As
mentioned before, we cannot expect to prove a stronger result than an absence of
semiclassical pseudospectrum of index 1, but we can actually prove that there is never
some semiclassical pseudospectrum of index 1 on every half-line of infinite order, by
using a result of exponential decay in time for the norm of contraction semigroups
generated by elliptic quadratic differential operators proved in [12].
The result proved in [12] shows that the norm of a contraction semigroup
‖etq(x,ξ)
‖L(L2), t ≥ 0,
generated by an elliptic quadratic differential operator q(x, ξ)w with a Weyl symbol
verifying
Re q ≤ 0, ∃(x0, ξ0) ∈ R2n, Re q(x0, ξ0) 6= 0,
decreases exponentially in time
(3.2.64) ∃M,a > 0, ∀t ≥ 0, ‖etq(x,ξ)
‖L(L2) ≤ Me−at.
Let us consider a non-normal elliptic quadratic differential operator
q(x, ξ)w : B → L2(Rn),
in dimension n ≥ 1 such that Σ(q) 6= C. We explain in the following lines how (3.2.64)
allows to prove that there is never some semiclassical pseudospectrum of index 1 on
any open half-lines composing the boundary of the numerical range ∂Σ(q) \ {0}.
Let z ∈ ∂Σ(q)\{0}. Since the numerical range Σ(q) is a closed angular sector with
a top in 0 and a positive opening strictly lower than π, we can find ε ∈ {±1} such
(3.2.65) Re(εiz−1q) ≤ 0, ∃(x0, ξ0) ∈ R2n, Re(εiz−1q)(x0, ξ0) 6= 0.
Using the theorem 2.8 in [2], we obtain that for all η ∈ R,
q(x, ξ)w − ηz
= − iz−1ε
εiη − εiz−1q(x, ξ)w
= − iz−1ε
e−iεηsesεiz
−1q(x,ξ)wds.(3.2.66)
It follows from (3.2.64) and (3.2.65) that for all η ∈ R,
q(x, ξ)w − ηz
∥ ≤ |z|−1
‖esεiz
−1q(x,ξ)w‖L(L2)ds
≤ |z|−1
Me−asds = |z|−1M
< +∞,
which proves the absence of semiclassical pseudospectrum of index 1 on the half-line
zR∗+. We can actually use the theorem 2.8 in [2] because
iR ⊂ C \ σ
εiz−1q(x, ξ)w
Indeed, if it was not the case, we would deduce from (2.1.7) that there exists u0 ∈
B \ {0} and λ0 ∈ R such that
εiz−1q(x, ξ)wu0 = iλ0u0.
Since from (3.2.65), the quadratic form −Re(εiz−1q) is non-negative, we deduce from
the symplectic invariance of the Weyl quantization and the theorem 21.5.3 in [7] that
there exists a metaplectic operator U such that
(3.2.67) −
εiz−1q(x, ξ)
= U−1
+ x2j ) +
j=k+1
with k, l ∈ N and λj > 0 for all j = 1, ..., k. By using that U is a unitary operator on
L2(Rn), we obtain that
0 = − Re(iλ0u0, u0)L2
= − Re
εiz−1q(x, ξ)wu0, u0
εiz−1q(x, ξ)
u0, u0
‖DxjUu0‖2L2 + ‖xjUu0‖2L2
j=k+1
‖xjUu0‖2L2 ,
which induces that u0 = 0, because from (3.2.65) and (3.2.67), k + l ≥ 1. It follows
from (2.1.7) that there exists ε0 > 0 such that
εiz−1q(x, ξ)w
⊂ {z ∈ C : Re z ≤ −ε0}.
References
[1] L.S.Boulton, Non-self-adjoint harmonic oscillator semigroups and pseudospectra, J. Operator
Theory, 47, 413-429 (2002).
[2] E.B.Davies, One-Parameter Semigroups, Academic Press, London (1980).
[3] E.B.Davies, Pseudospectra, the harmonic oscillator and complex resonances, Proc. R. Soc. Lond.
A, 455, 585-599 (1999).
[4] E.B.Davies, Semi-classical states for non-self-adjoint Schrödinger operators, Comm. Math.
Phys., 200, 35-41 (1999).
[5] N.Dencker, J.Sjöstrand, M.Zworski, Pseudospectra of Semiclassical (Pseudo-)Differential Op-
erators, Comm. Pure Appl. Math., 57, 384-415 (2004).
[6] L.Hörmander, A Class of Hypoelliptic Pseudodifferential Operators with Double Characteristics,
Math. Ann., 217, 165-188 (1975).
[7] L.Hörmander, The analysis of linear partial differential operators (vol. I,II,III,IV), Springer
Verlag (1985).
[8] T.Kato, Perturbation Theory for Linear Operators, Springer-Verlag, Berlin (1980).
[9] A.Melin, J.Sjöstrand, Determinants of pseudodifferential operators and complex deformations
of phase space, Methods Appl. Anal., 9, no.2, 177-237 (2002).
[10] K.Pravda-Starov, A complete study of the pseudo-spectrum for the rotated harmonic oscillator,
J. London Math. Soc. (2) 73, 745-761 (2006).
[11] K.Pravda-Starov, Etude du pseudo-spectre d’opérateurs non auto-adjoints, PhD Thesis of the
University of Rennes 1, France (2006).
[12] K.Pravda-Starov, Contraction semigroups of elliptic quadratic differential operators, preprint
(2007).
[13] S.Roch, B.Silbermann, C∗-algebra techniques in numerical analysis, J. Oper. Theory 35, 241-
280 (1996).
[14] J.Sjöstrand, Parametrices for pseudodifferential operators with multiple characteristics, Ark.
för Mat., 12, 85-130 (1974).
[15] L.N.Trefethen, Pseudospectra of linear operators, Siam Review 39, 383-400 (1997).
[16] L.N.Trefethen, M.Embree, Spectra and Pseudospectra: The Behavior of Nonnormal Matrices
and Operators, Princeton University Press (2005).
[17] M.Zworski, A remark on a paper of E.B.Davies, Proc. Am. Math. Soc., 129, 2955-2957 (2001).
[18] M.Zworski, Numerical linear algebra and solvability of partial differential equations, Comm.
Math. Phys., 229, 293-307 (2002).
Department of Mathematics, University of California, Evans Hall, Berke-
ley, CA 94720, USA
E-mail address: karel@math.berkeley.edu
	1. Introduction
	1.1. Miscellaneous facts about pseudospectrum
	1.2. Elliptic quadratic differential operators
	1.3. Semiclassical pseudospectrum
	2. Statement of the results
	2.1. Some notations and some preliminary facts about elliptic quadratic differential operators
	2.2. Statement of the main results
	3. The proofs of the results
	3.1. The one-dimensional case
	3.2. Case of dimension n 2
	References
ABSTRACT
  We study the pseudospectrum of a class of non-selfadjoint differential
operators. Our work consists in a detailed study of the microlocal properties,
which rule the spectral stability or instability phenomena appearing under
small perturbations for elliptic quadratic differential operators. The class of
elliptic quadratic differential operators stands for the class of operators
defined in the Weyl quantization by complex-valued elliptic quadratic symbols.
We establish in this paper a simple necessary and sufficient condition on the
Weyl symbol of these operators, which ensures the stability of their spectra.
When this condition is violated, we prove that it occurs some strong spectral
instabilities for the high energies of these operators, in some regions which
can be far away from their spectra. We give a precise geometrical description
of them, which explains the results obtained for these operators in some
numerical simulations giving the computation of false eigenvalues far from
their spectra by algorithms for eigenvalues computing.

<|endoftext|><|startoftext|>
Introduction
Turbulent flows exhibit a notoriously complex and unpredictable dynamics: they
present a huge number of degrees of freedom, and their dynamics are both far from
equilibrium and dissipative [1, 2, 3]. The kinetic energy injected at large scale by shear
instability mecanisms is dissipated into heat by the molecular viscosity at small scales.
That is, dissipation and injection scales are distinct. Therefore, a transport process
through scales is necessary for a flow to be stationary. It is suspected that instability
mechanisms associated with non-linearities generate harmonics, therefore transfering
energy to smaller scales almost without dissipation. An equivalent picture would
consist in vortices stretching each other in such a way that a non-zero energy transfer
occurs toward smaller scales. This picture of cascade process was first proposed by
Richardson [4]. The cascade stops approximately in the range of scales where the
viscosity becomes efficient to damp velocity gradients. In the late thirties, Kolmogorov
derived from this idea a phenomenological theory accounting for the fluctuations of
various observables in fully developed turbulence [5]. In the present work, we are
http://arxiv.org/abs/0704.0325v3
Measurements of a dynamical temperature in turbulence. 2
neither concerned by the large (energy injection) scales, nor by the small (dissipation)
scales, but by the intermediate range. In this intermediate inertial range, we study
the transport process through scales, expected to be universal. Instead of scale l, one
often refers to the wave-number k = 2π/l.
The control parameter of the flow is the Reynolds number: Re = V L
, where L is the
macroscopic scale of the flow (integral scale, or correlation length), V is a characteristic
shear velocity at large scale, and ν is the kinematic viscosity of the fluid. It is also
the mean ratio of the inertial by the dissipative contribution of the forcing over a fluid
particle. Interesting predictions were derived by Kolmogorov (1941), that we use in the
following. Especially, the range of scales over which fluctuations occur scales as Re3/4.
The prediction for the exponent of the power spectral density as 〈|ṽ|2〉 ∝ k−5/3 is
among the most famous successes of this theory [1, 2, 3].
Our experimental system is discribed in detail in the next section. It is a thin string
held by its ends at constant tension across a turbulent flow. To formalize briefly, it
is an oscillator with multiple resonances, coupled to a particular ’thermostat’: the
turbulent flow. This string is used to probe the inertial range of a flow of high enough
Reynolds numbers. The device is ’calibrated’ by measuring the average (complex)
response to an external perturbation, and then used to measure the free fluctuations
caused by turbulence alone. Measurement of the displacement r(t) caused by the
turbulent forcing f(t) is performed with small piezoelectric transducers. We measure
the average response, i.e. the displacement on one end caused by a known broad band
forcing on the other end. Then, measurements of the displacement on one end alone
give information on the forcing fluctuations. Our study goes a step forward, in an
exploratory way. Knowing the average response function of the string and measuring
r(t), we invoque a version of the Fluctuation-Dissipation Theorem extended out of
equilibrium, to define an effective temperature of the turbulent flow. This effective
temperature happends to be scale-dependant.
In this work, fully developped turbulence is addressed from the point of view of
statistical mechanics. We first recall one important break-through: the statement of
the Fluctuation-Dissipation Theorem (FDT). Consider a pair of conjugate variables
(displacement r and force f) of a small system in thermal contact with a large
heat reservoir. In the present case the small system is the string, coupled to the
turbulent flow which is the reservoir. Displacement r and force f are conjugate in
the sense that their product is the work exerted by the flow on the string. The
theorem originates from the idea that spontaneous fluctuations r(t) should have the
same statistical properties as the relaxation of r(t) after the removal of an external
forcing perturbation. The main hypothesis needed to derive this theorem are: –
linear response between f and r, – thermal equilibrium between the system under
consideration and the thermostat, – thermal equilibrium of the thermostat itself. The
response function Hr,f is such that: r(t) =
Hx,f (t − t
′)f(t′)dt′. Equivalently it
can be written in the Fourier space as: r̃(ω) = H̃r,f f̃(ω). Under some hypothesis, the
fluctuations of r (its 2-times correlation function) are linked by a very simple relation
with the dissipative response of the system to a perturbation of the conjugate variable
f (imaginary part of the average response function). It is simply proportional, and
the coefficient is nothing but the temperature multiplied by the Boltzman constant:
kBT [6]. The validity of the hypothesis has to be discussed in each case. If they are
satisfied, the correlation function of the spontaneous fluctuations is proportional to
the response function, i.e. the factor is unique and constant. Moreover, this factor
Measurements of a dynamical temperature in turbulence. 3
is the same for all couples of conjugate variables, and this factor is kBT , where T is
the temperature of the system. The Boltzman constant kB ≃ 1.38 10
−23JK−1 is an
universal constant. This relation can be expressed in spectral variables:
〈|r̃(ω)|2〉 =
2 kBT
Im[H̃r,f (ω)]. (1)
In this expression of the FDT, 〈|r̃(ω)|2〉 is the power spectral density of the fluctuations
of the displacement r, as H̃r,f(ω) is the response function on r to the conjugate
variable f . Because the string is very thin, the drag is purely viscous. It is therefore
proportional to the velocity, which is in quadrature with the displacement. The
dissipation is therefore proportional to the imaginary part of the average response
function: Im[H̃].
In the perspective of constructing a non-equilibrium thermodynamics, the FDT has
been reconsidered by L. Cugliandolo and J. Kurchan, while investigating amorphous
materials relaxing after a thermal quench through the glass transition [7, 8].
We present in the following an exploratory approach of the question of turbulent
fluctuations using their extended formalism. The Fluctuation-Dissipation Ratio
(FDR) can be rewritten:
ω 〈r̃(ω)2〉
Im[H̃r,f(ω)]
= 2 kBTeff.(ω), (2)
where the temperature is replaced by an ’effective’ temperature Teff., function of
frequency ω. The frequency dependence of Teff. expresses the fact that different degrees
of freedom are not at equilibrium with each other, resulting in internal energy fluxes.
In other words, in our system, each (independent) mode of the string couples to
(non-independent) scale of the flow. As the flow is stationary, we average our
measurements on time, and finally obtain the frequency dependance of Teff. as defined
by equation 2. Measurements of the fluctuations of the string give Fourier components
of the excitation of the flow. We measure independently the fluctuations, and the
complex average response function to a specified excitation, in a way discussed below.
We propose to analyse these measurements with the criteria discussed above.
The paper is organised as follows. The next section describes the experimental setup,
turbulent flow properties, and the setting of the string. General properties of a
vibrating Melde string are also discussed. The measurements are shown in section 3:
response, fluctuations, and the Fluctuation Dissipation Ratio of this system. In section
4, we derive from Kolmogorov’s theory a simple scaling model for the fluctuations of
the drag, and therefore the FDR, which accounts for the exponent observed in the
whole range of accessible Re. The section 5 is devoted to a discussion of our results,
especially in comparison to several definitions of temperature in turbulence proposed
in the literature.
2. The Melde string and the experimental setup
The experimental setup is sketched in Fig. 1. A turbulent air jet originates from
a nozzle of diameter 5 cm. The flow facility we used is thoroughly described in
[9]. A thin stainless steel string of length 60 cm is located 2 m downstream the
nozzle, perpendicular to the axis of the flow. At this distance, the length of the
string is about the diameter of the turbulent jet. The displacement of the string is
measured using piezoelectric multi-layer ceramics at each end of the string. A piezo
Measurements of a dynamical temperature in turbulence. 4
is deformed by a voltage. Reciprocally, if the ceramic in compressed, a voltage is
generated. The relation between voltage and deformation is linear, and the frequency
response is almost flat in the frequency range we consider here. It can be used as
actuator or sensor. We have two piezos, one on each end of the string. The two
different measurements we perform are the following. 1) complex response function:
one (input) piezo is feeded with a white noise voltage through a power amplifier. The
source is that of a HP3562A signal analyser. Standing transverse waves appear in
the string, weakly perturbed by the turbulent fluctuations. Mecanical displacement
on the other end is transformed into a voltage by the other (output) piezo. It must
be amplified, and both input and output voltages are recorded synchronously with a
24 bits A/D converter. The acquisition frequency is 50 kHz. We call response the
time averaged ratio of the voltage amplitudes on input and output piezos, recorded
simultaneously. Voltages in and out are proportional respectively to the displacement
and the constraint (on the piezos). The dimension of the actual response is the inverse
of a stiffness, as what we measure is the ratio of voltages. Dimentional prefactors are
omited for simplicity, as they are constant for the same setup (string and transducers).
The diameter of the string is 100 µm, less than the viscous scale of the flow which
is about η ≃ 170 µm at the largest Re accessible. The equation of motion of the
PIEZOS
STAND
Figure 1. Eperimental setup: the thin steel wire is pulled across a turbulent air
jet by a 4 Kg weight on a rigid stand. Piezoelectric transducers are in mecanical
contact with the wire at each end.
undamped and unforced string is a linear wave equation. Its solutions with fixed ends
are standing waves r(x, t) = A cos(ωn t − knx), where A is the amplitude, t is time
and x is position along the wire. The discrete wave numbers are kn = n
, where L
Measurements of a dynamical temperature in turbulence. 5
is the length of the string and n is a positive integer. In a first approximation, the
waves are not dispersive: ωn = c kn, where c is the phase velocity. T is the tension of
the string and µ its mass per unit length, c =
T/µ ≃ 300 m/s. With a 4 kg weight
on one end, the string’s fundamental frequency is f0 = 344 Hz.
Dissipation is mainly due to friction on air, and causes little dispersion. More precise
treatment would require terms of dissipation in the wire itself and in the piezoelectric
transducers that fix the ends. We neglect this, as the amplitude remains small (a
few tens of micrometers) if compared to the length of the ceramic pile (3mm), or
even the wire diameter (100µm). The possible coupling with compression wave is not
relevant, as the range of frequency is distinct. (Compression wave speed in steel is a
few thousands of m/s, larger than what we consider here: c ≃ 300 m/s.) When this
wire is immersed into the turbulent flow, the resonant modes are excited by the drag
forcing. The quantities measured are averaged along the wire. They are therefore
global in space but local in scale, or more precisely in Fourier-space. The vortices
at scale l are expected to excite modes of wave-number k = 2π/l. In that sense,
the string is acting like a mechanical spectrometer, almost exactly like a Fabry-Perot
interferometer.
3. Measurements
Modulus of the response function is plotted in Fig. 2. It shows that the resonance
peaks are indeed very narrow, ensuring a very precise selection of wave-numbers:
the quality factor is approximately Q ≃ 4000. The imaginary part of the response
function is giving the dissipation. The width of the peaks in the modulus is also
Figure 2. Modulus of the response function versus the harmonic number, at
Re = 154000. The abscissa is given in non-dimensional coordinates, normalised
by the fundamental frequency.
linked to the dissipation, as well as the damping time after a perturbation. We used
in the following the measurement of the imaginary part of the response, but checked
that these different methods coincide. Only the resonant frequencies are considered
in this study, as they are much more sensitive to the velocity fluctuations. This is
Measurements of a dynamical temperature in turbulence. 6
especially important at large k, as the kinetic energy of the flow is small. Spectrum
of the fluctuation excited by the turbulent drag is shown in Fig. 3. Fluctuations
resonance peaks are clearly identified. Spurious vibrations are visible, mainly caused
by the vibrations of the stand. Because the peaks are very thin, long acquisitions
are necessary, as well as large windows for the FFT calculations (150000 points), in
order to achieve a sufficient resolution (0.33Hz). The protocol we used to find the
resonance frequencies, the value of the amplitude of fluctuations, and imaginary part
of the response, is the following. Resonance frequency is obtained by spline smoothing
each peak around the maximum amplitude of the response. Then, imaginary part is
measured after being also smoothed. The amplitude of the fluctuations peaks are
collected on the spectrum, after local smoothing around the maxima. One can see the
Figure 3. Spectrum of the resonance modes of the string excited by turbulent
drag fluctuations, at Re = 154000.
FDR in Fig. 4, called kBTeff., for several values of Re. Uncertainties on this ratio have
multiple origins. Errors indicated by the size of the symbols are those coming from
the determination of the resonance frequencies. Spurious vibrations of the stand are
difficult to handle: we perform measurements of response and fluctuations in the same
conditions, to reduce its influence on the ratio. We believe the scattering of the points
in Fig. 4 comes mainly from the weakening of signal/noise ratio for large frequencies,
simply because there is less energy in the flow at large k, especially at small Re.
The only possible escape on this point is to improve the coupling between the string
and the sensors. The wave-number has been rescaled with the internal viscous scale
η ∝ Re−3/4. The ordinates have been rescaled by an estimated number of degrees
of freedom: (L/η)3 ∝ Re9/4. These Re scalings are both usual consequences from
Kolmogorov’s theory. In other words, the “thermal energy” kBTeff. that the FDR is
representing in the framework of Cugliandolo et al ’s theory, is given per degree of
freedom. Assuming the number of degrees of freedom is the total number of particles
of size η in the total volume is usual, but crude. A more realistic description should
involve correlations between them, reducing this number. However, all the curves
collapse to a single power-law with this scaling. The exponent is discussed in the
Measurements of a dynamical temperature in turbulence. 7
Figure 4. Spectrum of the FDR, labelled as thermal agitation per degree of
freedom. Axis are rescaled with proper Reynolds number dependence, between
74000 and 170000. The size of the symbols represents the uncertainty in the
determination of the maxima of the peaks. The solid line is a k−11/3 power-law
given as an eye guide.
following section.
Please note that the equipartition of energy at equilibrium would require this spectrum
to be constant. There is no equilibrium between the Fourier modes, because of
the energy flux through scales. Moreover, they are not independent, and probably
not Gaussian. There is no reason to expect equipartition. Considering a kinematik
temperature as poportional to the kinetic energy, like in the kinetic theory of gases, it
would be: T ∝ 〈ṽ2〉. And, because of Kolmogorov’s theory it would scale as k−5/3.
The dependance we observe with our definition is much steeper.
4. Scaling law
Because the susceptibility of the string is very high at resonance, the half-wave-length
modes nλ/2 match with velocity structures of scale l (n is an integer). Therefore, the
wave number of the standing wave in the string k = n 2π/λ is the same as k = 2π/l.
The necessary condition for this matching is resonance. It also ensures that velocities
of the string and fluid equalise, which is crucial for the following argument.
Displacement is proportional to the drag forcing, itself proportional to velocity, as
drag is viscous: the string diameter-based Reynolds number is small (about 10).
The Melde string is not dispersive: ω = 2πf = ck, c being the wave velocity.
Therefore, the displacement is r = v/ω = v/(ck), and its power spectrum is:
〈r̃(ω)2〉 = 〈ṽ(ω)2〉(ck)−2 ∝ k−11/3. Because the viscous dissipation at each
resonance is proportional to frequency, the FDR of Eq. 2 is simply proportional to
c k 〈r̃(ω)2〉 ∝ k−11/3. Following Eq. 2, an effective “thermal agitation” defined by the
FDR would be: kBTeff. ∝ k
−11/3, in the inertial range of fully developed turbulence.
This exponent is compatible with the spectrum we measured, as can be seen in Fig.
Measurements of a dynamical temperature in turbulence. 8
5. Discussion
Theoretical characterisation of turbulence in terms of temperature were proposed in
the past by several authors. The temperatures as defined by T. M. Brown [10] and
B. Castaing [11] do not depend on k throughout the inertial range. The qualitative
idea is that the cascade transport process is efficient enough to equalise a quantity
they call temperature. In another model invoking an extremum principle, B. Castaing
proposed a definition of temperature, which might depend on scale [12]. In any case,
none of these theories invoke the FDR. On different basis, R. Robert and J. Sommeria
proposed a definition of temperature [13], only valid for 2D turbulence. It is not
expected to apply in a 3D flow.
Now, let’s consider our experimental results from the perspective of the three points of
reflexion we proposed in the first section, in relation with the FDT. 1- Linear response:
as we mentioned, the coupling between the string and the flow is purely viscous.
Therefore, drag force is proportional to velocity: f(t) = γ v(t), γ being a friction
coefficient. It is also the time-derivative of the position f(t) = γ ω r(t). Response is
linear in r, but the coefficient depends on frequency. 2- Are fluctuations and dissipation
proportional ? As we have seen, the measurements of the FDR are consistent with a
k−11/3 scaling, it is definitely not constant with respect to k. As our system is out
of equilibrium but stationary, there is no time evolution like the relaxation of glasses.
3- Setting a string in a turbulent flow allows to perform measurements on a couple
of conjugate force-displacement variables. We have no other set of observables to
compare with, for now.
We may ask whether what we measure is actually a temperature, in a dynamical
sense. If one assumes that each mode of the string is a harmonic oscillator, and that
a harmonic oscillator at equilibrium with a bath gives the temperature of this bath
through the FDR, then equilibrium between modes of the string and modes of the flow
means the temperature is equal: measurements give the temperature of the flow at this
corresponding scale. Such interpretation still rely on the assumption that FDR on the
oscillator gives the temperature of the oscilaror: this is our working hypothesis. By
equilibrium between modes of the string and the flow, we mean a ’no-flux’ condition
on energy. This is ensured by the high susceptibility of the string at resonance. In
other words, the probe and the reservoir are in equilibrium with each other for each
k, but equilibium is obviously not expected between one scale and another.
We have performed measurements on a turbulent flow, coupling to it a set of harmonic
oscillators: a Melde string. At equilibrium with the flow, in the sense that each mode
of the string couples with the fluid at scale l = πc/ω. It gives informations much
like a spectrometer, even though the flow itself is strongly out of equilibrium. This
is true, of course, as long as the response of the string is fast enough compared to
the frequencies of the velocity fluctuations. The displacement spectra are recorded at
different values of Re, as well as the complex response of the string over an excitation
(contributions of all the standing waves).
The matching of the string’s modes and hydrodynamic structures, what we call
equilibrium between the string and the flow, is still a questionable working hypothesis.
However, drawing inspiration from Cugliandolo et al ’s theory of non-equilibrium
temperature based on the FDR, we measured the Fluctuation over Dissipation Ratio
of our string in a turbulent flow, for different values of Re. The FDR, multiplied
by an appropriate power of the Reynolds number exhibits a unique power law, when
Reynolds number is between 74000 and 170000. The exponent is consistent with a
Measurements of a dynamical temperature in turbulence. 9
value −11/3 given by a very simple model derived from Kolmogorov 1941 theory.
Acknowledgments
We acknowledge B. Castaing, E. Leveque, P. Borgnat, F. Delduc, S. Ciliberto,
E. Bertin, and K. Gawedzki for many discussions. We also thank V. Bergeron,
T. Divoux, and V. Vidal for corrections on the manuscript and for many discussions.
Thanks to F. Dumas for his help in the construction of positioning devices. As this
system became a teaching experiment, several students contributed to this study as
part of their graduate lab-course. They are gratefully acknowledged: A. Louvet,
G. Bordes, I. Dossmann, J. Perret, C. Cohen, and M. Mathieu. We also thank the
guitar maker D. Teyssot, from Lyon, who gently gave us his thinnest E strings.
[1] L.D. Landau and E.M. Lifshitz. Course of Theoretical Physics: Fluid mechanics. Mir, 1971.
[2] A.S. Monin and A.M. Yaglom. Statistical fluid mechanics. MIT Press, Cambridge, 1975.
[3] U. Frisch. Turbulence: the legacy of A.N. Kolmogorov. Cambridge Univ. Press., 1995.
[4] L.F. Richardson. Weather prediction by numerical process. Cambridge Univ. Press, 1922.
[5] A.N. Kolmogorov. C. R. Acad. Sci. U.S.S.R., 30, 1941.
[6] M. Toda R. Kubo and N. Hashitsume. Statistical Physics II: Nonequilibrium Statistical
Mechanics, volume II. Springer, 1985.
[7] L. Cugliandolo and J. Kurchan. Phys. Rev. Lett., 71, 1993.
[8] J. Kurchan L. Cugliandolo and L. Peliti. Phys. Rev. E, 55, 1997.
[9] P. Marcq and A. Naert. Phys. of Fluids, 13, 2001.
[10] T.M. Brown. J. Phys. I, 15, 1982.
[11] B. Castaing. J. Phys. II, 6, 1996.
[12] B. Castaing. J. Phys. II, 50, 1989.
[13] J. Sommeria and R. Robert. J. Fluid Mech., 229, 1991.
	Introduction
	The Melde string and the experimental setup
	Measurements
	Scaling law
	Discussion
ABSTRACT
  We report on measurements of the transverse fluctuations of a string in a
turbulent air jet flow. Harmonic modes are excited by the fluctuating drag
force, at different wave-numbers. This simple mechanical probe makes it
possible to measure excitations of the flow at specific scales, averaged over
space and time: it is a scale-resolved, global measurement. We also measure the
dissipation associated to the string motion, and we consider the ratio of the
fluctuations over dissipation (FDR). In an exploratory approach, we investigate
the concept of {\it effective temperature} defined through the FDR. We compare
our observations with other definitions of temperature in turbulence. From the
theory of Kolmogorov (1941), we derive the exponent -11/3 expected for the
spectrum of the fluctuations. This simple model and our experimental results
are in good agreement, over the range of wave-numbers, and Reynolds number
accessible ($74000 \leq Re \leq 170000$).

<|endoftext|><|startoftext|>
Introduction
Mathai and Rathie (1975) consider various generalizations of Shannon en-
tropy (Shannon, 1948), called entropies of order α, and give various properties,
including additivity property, and characterization theorems. Recently, Mathai
and Haubold (2006, 2006a) explored a generalized entropy of order α, which
is connected to a measure of uncertainty in a probability scheme, Kerridge’s
(Kerridge, 1961) concept of inaccuracy in a scheme, and pathway models that
are considered in this paper.
As defined in Mathai and Haubold (2006, 2006a) the entropy Mk,α(P ) is a
non-additive entropy and his measure M∗k,α(P ) is an additive entropy. It is also
shown that maximization of the continuous analogue of Mk,α(P ), denoted by
Mα(f), gives rise to various functional forms for f , depending upon the types
of constraints on f .
http://arxiv.org/abs/0704.0326v2
Occasionally, emphasis is placed on the fact that Shannon entropy satisfies
the additivity property, leading to extensivity. It will be shown that when
the product probability property (PPP) holds then a logarithmic function can
give a sum and a logarithmic function enters into Shannon entropy due to the
assumption introduced through a certain type of recursivity postulate. The
concept of statistical independence will be examined in Section 1 to illustrate
that simply because of PPP one need not expect additivity to hold or that
one should not expect this PPP should lead to extensivity. The types of non-
extensivity, associated with a number of generalized entropies, are pointed out
even when PPP holds. The nature of non-extensivity that can be expected
from a multivariate distribution, when PPP holds or when there is statistical
independence of the random variables, is illustrated by taking a trivariate case.
Maximum entropy principle is examined in Section 2. It is shown that
optimization of measures of entropies, in the continuous populations, under
selected constraints, leads to various types of models. It is shown that the
generalized entropy of order α is a convenient one to obtain various probability
models.
Section 3 examines the types of differential equations satisfied by the various
special cases of the pathway model.
1.1. Product probability property (PPP) or statistical independence
of events
Let P (A) denote the probability of the event A. If the definition P (A∩B) =
P (A)P (B) is taken as the definition of independence of the events A and B then
any event A ∈ S, and S the sure event are independent. But A is contained in S
and then the definition of independence becomes inconsistent with the common
man’s vision of independence. Even if the trivial cases of the sure event S and
the impossible event φ are deleted, still this definition becomes a resultant of
some properties of positive numbers. Consider a sample space of n distinct
elementary events. If symmetry in the outcomes is assumed then we will assign
equal probabilities 1
each to the elementary events. Let C = A ∩B. If A and
B are independent then P (C) = P (A)P (B). Let
P (A) =
, P (B) =
, P (C) =
⇒ nz = xy, x, y, z = 1, 2, ..., n− 1, z < x, y (1)
deleting S and φ. There is no solution for x, y, z for a large number of n, for
example, n = 3, 5, 7. This means that there are no independent events in such
cases and it sounds strange from a common man’s point of view.
The term “independence” of events is a misnomer. This property should
have been called product probability property or PPP of events. There is no
reason to expect the information or entropy in a joint distribution to be the sum
of the information contents of the marginal distributions when the PPP holds
for the distributions, that is when the joint density or probability function is
a product of the marginal densities or probability functions. We may expect a
term due to the product probability to enter into the expression for the entropy
in the joint distribution in such cases. But if the information or entropy is
defined in terms of a logarithm, then naturally, logarithm of a product being
the sum of logarithms, we can expect a sum coming in such situations. This is
not due to independence or due to the PPP of the densities but due to the fact
that a functional involving logarithm is taken thereby a product has become
a sum. Hence not too much importance should be put on whether or not the
entropy on the joint distribution becomes sum of the entropies on marginal
distributions or additivity property when PPP holds.
1.2. How is logarithm coming in Shannon’s entropy?
Several characterization theorems for Shannon entropy and its various gen-
eralizations are given in Mathai and Rathie (1975. Modified and refined versions
of Shannon’s own postulates are given as postulates for the first theorem charac-
terizing Shannon entropy in Mathai and Rathie (1975). Apart from continuity,
symmetry, zero-indifference and normalization postulates the main postulate
in the theorem is a recursivity postulate, which in essence says that when the
PPP holds then the entropy will be a weighted sum of the entropies, thus in
effect, assuming a logarithmic functional form. The crucial postulate is stated
here. Consider a multinomial population P = (p1, ..., pm), pi > 0, i = 1, ...,m,
p1 + ... + pm = 1, that is, pi = P (Ai), i = 1, ...,m, A1 ∪ ... ∪ Am = S,
Ai ∩ Aj = φ, i 6= j. If any pi can take a zero value also then zero-indifferent
postulate, namely that the entropy remains the same when an impossible event
is incorporated into the scheme, is to be added. Let Hn(p1, ..., pn) denote the
entropy to be defined. Then the crucial recursivity postulate says that
Hn(p1, ..., pm−1, pmq1, .., pmqn−m+1)
= Hm(p1, ..., pm) + pmHn−m+1(q1, ..., qn−m+1) (2)
i=1 pi = 1,
∑n−m+1
i=1 qi = 1. This says that if the m-th event Am is par-
titioned into independent events P (Am ∩ Bj) = P (Am)P (Bj) = pmqj , j =
1, ..., n − m + 1 so that pm = pmq1 + ... + pmqn−m+1 then the entropy Hn(·)
becomes a weighted sum. Naturally, the result will be a logarithmic function
for the measure of entropy.
There are several modifications to this crucial recursivity postulate. One
suggested by Tverberg is that n−m+ 1 = 2 and q1 = q, q2 = 1− q, 0 < q < 1
and H2(q, 1 − q) is assumed to be Lebesgue integrable in 0 ≤ q ≤ 1. Again
a characterization of Shannon entropy is obtained. In all the characterization
theorems for Shannon entropy this recursivity property enters in one form or the
other as a postulate, which in effect implies a logarithmic form for the entropy
measure. Shannon entropy Sk has the following form:
Sk = −A
pi ln pi, pi > 0, i = 1, ..., k, p1 + ...+ pk = 1, (3)
where A is a constant. If any pi is assumed to be zero then 0 ln 0 is to be
interpreted as zero. Since the constant A is present, logarithm can be taken to
any base. Usually the logarithm is taken to the base 2 for ready application to
binary systems. We will take logarithm to the base e.
1.3. Generalization of Shannon entropy
Consider again a multinomial population P = (p1, ..., pk), pi > 0, i =
1, ..., k, p1 + ... + pk = 1. The following are some of the generalizations of
Shannon entropy Sk.
Rk,α(P ) =
i=1 p
, α 6= 1, α > 0, (4)
(Rényi entropy of order α of 1961)
Hk,α(P ) =
i=1 p
i − 1
21−α − 1
, α 6= 1, α > 0 (5)
(Havrda-Charvát entropy of order α of 1967)
Tk,α(P ) =
i=1 p
i − 1
, α 6= 1, α > 0 (6)
(Tsallis entropy of 1988)
Mk,α(P ) =
i=1 p
i − 1
, α 6= 1, −∞ < α < 2 (7)
(entropic form of order α)
M∗k,α(P ) =
i=1 p
, α 6= 1, −∞ < α < 2, (8)
(additive entropic form of order α).
When α → 1 all the entropies of order α described above in (4) to (7) go to
Shannon entropy Sk.
Rk,α(P ) = lim
Hk,α(P ) = lim
Tk,α(P ) = lim
Mk,α(P ) = lim
M∗k,α(P ) = Sk.
Hence all the above measures are called generalized entropies of order α.
Let us examine to see what happens to the above entropies in the case of a
joint distribution. Let pij > 0, i = 1, ...,m, j = 1, ..., n such that
j=1 pij =
1. This is a bivariate situation of a discrete distribution. Then the entropy in
the joint distribution, for example,
Mm,n,α(P,Q) =
j=1 p
ij − 1
. (10)
If the PPP holds and if pij = piqj , p1 + ... + pm = 1, q1 + ... + qn = 1,
pi > 0, i = 1, ...,m, qj > 0, j = 1, ..., n and if P = (p1, ..., pm), Q = (q1, ..., qn)
(α− 1)Mm,α (P ) Mn,α(Q) =
i − 1
j − 1
j + 1
= Mm,n,α(P,Q) −Mm,α(P )−Mn,α(Q).
Therefore
Mm,n,α(P,Q) = Mm,α(P ) +Mn,α(Q) + (α− 1)Mm,α(P )Mn,α(Q). (11)
If any one of the above mentioned generalized entropies in (4) to (8) is written
as Fm,n,α(P,Q) then we have the relation
Fm,n,α(P,Q) = Fm,α(P ) + Fn,α(Q) + a(α)Fm,α(P )Fn,α(Q). (12)
where
a(α) = 0 (Rényi entropy Rk,α(P ))
= 21−α − 1 (Havrda-Charvát entropy Hk,α(P ))
= 1− α (Tsallis entropy Tk,α(P ))
= α− 1 (entropic form of order α, i.e., Mk,α(P ))
= 0 (additive entropic form of order α, i.e., M∗k,α(P )). (13)
When a(α) = 0 the entropy is called additive and when a(α) 6= 0 the entropy
is called non-additive. As can be expected, when a logarithmic function is
involved, as in the cases of Sk(P ), Rk,α(P ),M
k,α(P ), the entropy is additive
and a(α) = 0.
1.4. Extensions to higher dimensional joint distributions
Consider a trivariate population or a trivariate discrete distribution pijk >
0, i = 1, ...,m, j = 1, ..., n, k = 1, ..., r such that
k=1 pijk = 1. If
the PPP holds mutually, that is, pair-wise as well as jointly, which then will
imply that
pijk = piqjsk,
pi = 1,
qj = 1,
sk = 1,
P = (p1, ..., pm), Q = (q1, ..., qn), S = (s1, ..., sr).
Then proceeding as before, we have for any of the measures described above in
(4) to (8), calling it F (·),
Fm,n,r,α(P,Q, S) = Fm,α(P ) + Fn,α(Q) + Fr,α(S) + a(α)[Fm,α(P )Fn,α(Q)
+Fm,α(P )Fr,α(S) + Fn,α(Q)Fr,α(S)]
+[a(α)]2Fm,α(P )Fn,α(Q)Fr,α(S) (14)
where a(α) is the same as in (13). The same procedure can be extended to any
multivariable situation. If a(α) = 0 we may call the entropy additive and if
a(α) 6= 0 then the entropy is non-additive.
1.5. Crucial recursivity postulate
Consider the multinomial population P = (p1, ..., pk), pi > 0, i = 1, ..., k, p1+
... + pk = 1. Let the entropy measure to be determined through appropriate
postulates be denoted by Hk(P ) = Hk(p1, ..., pk). For k = 2 let
f(x) = H2(x, 1− x), 0 ≤ x ≤ 1 or x ∈ [0, 1]. (15)
If another parameter α is to be involved in H2(x, 1−x) then we will denote f(x)
by fα(x). From (5) to (7) it can be seen that the generalized entropies of order
α of Havrda-Charvát (1967), Tsallis (1988, 2004) and Shannon (1948) entropy
satisfy the functional equation
fα(x) + bα(x)fα
= fα(y) + bα(x)f
for x, y ∈ [0, ) with x+ y ∈ [0, 1], with the boundary condition
fα(0) = fα(1) (17)
where
bα(x) = 1− x (Shannon entropy Sk(P ))
= (1− x)α (Harvda-Charvát entropy Hk,α(P ))
= (1− x)α (Tsallis entropy Tk,α(P ))
= (1− x)2−α (entropic form of order α, i.e., Mk,α(P )). (18)
Observe that the normalizing constant at x = 1
is equal to 1 for Hk,α(P ) and it
is different for other entropies. Thus equations (6),(7),(8), with the appropriate
normalizing constants fα(
), can give characterization theorems for the various
entropy measures. The form of bα(x) is coming from the crucial recursivity
postulate, assumed as a desirable property for the measures.
1.6. Continuous analogues
In the continuous case let f(x) be the density function of a real random
variable x. Then the various entropy measures, corresponding to the ones in (4)
to (8) are the following:
Rα(f) =
[f(x)]αdx
, α 6= 1, α > 0 (19)
(Rényi entropy of order α)
Hα(f) =
21−α − 1
[f(x)]αdx− 1
, α 6= 1, α > 0 (20)
(Havrda-Charvát entropy of order α)
Tα(f) =
[f(x)]αdx− 1
, α 6= 1, α > 0, (21)
(Tsallis entropy of order α)
Mα(f) =
[f(x)]2−αdx− 1
, α 6= 1, α < 2 (22)
(entropic form of order α)
M∗α(f) =
[f(x)]2−αdx
, α 6= 1, α < 2 (23)
(additive entropic form of order α).
As expected, Shannon entropy in this case is given by
S(f) = −A
f(x) ln f(x)dx (24)
where A is a constant.
Note that when PPP (product probability property) or statistical indepen-
dence holds then in the continuous case also we have the property in (12) and
(14) and then non-additivity holds for the measures analogous to the ones in
(3),(5),(6),(7) with a(α) remaining the same. Since the steps are parallel a
separate derivation is not given here.
2. Maximum Entropy Principle
If we have a multinomial population P = (p1, ..., pk), pi > 0, i = 1, ..., k, p1+
...+ pk = 1 or the scheme P (Ai) = pi, A1 ∪ ... ∪ Ak = S, P (S) = 1, Ai ∩ Aj =
φ, i 6= j then we know that the maximum uncertainty in the scheme or the
minimum information from the scheme is obtained when we cannot give any
preference to the occurrence of any particular event or when the events are
equally likely or when p1 = p2 = ... = pk =
. In this case, Shannon entropy
becomes,
Sk(P ) = Sk(
, ...,
) = −A
= A ln k (25)
and this is the maximum uncertainty or maximum Shannon entropy in this
scheme. If the arbitrary functional f is to be fixed by maximizing the entropy
then in (19) to (21) we have to optimize
[f(x)]αdx for fixed α, over all
functional f , subject to the condition
f(x)dx = 1 and f(x) ≥ 0 for all x.
For applying calculus of variation procedure we consider the functional
U = [f(x)]α − λ[f(x)]
where λ is a Lagrangian multiplier. Then the Euler equation is the following:
= 0 ⇒ αfα−1 − λ = 0 ⇒ f =
= constant. (26)
Hence f is the uniform density in this case, analogous to the equally likely
situation in the multinomial case. If the first moment E(x) =
xf(x)dx
is assumed to be a given quantity for all functional f then U will become the
following for (19) to (21).
U = [f(x)]α − λ1[f(x)]− λ2xf(x)
and the Euler equation leads to the power law. That is,
= 0 ⇒ αfα−1 − λ1 − λ2x = 0 ⇒ f = c1
. (27)
By selecting c1, λ1, λ2 appropriately we can create a density out of (27). For
α > 1 and λ2
> 0 the right side in (27) increases exponentially. If α = q > 1 and
= q − 1 then we have Tsallis’ q-exponential function from the right side of
(27). If α > 1 and λ2
= −(α−1) then (27) can produce a density in the category
of a type-1 beta. From (27) it is seen that the form of the entropies of Havrda-
CharvátHk,α(P ) and Tsallis Tk,α(P ) need special attention to produce densities
(Ferri et al. 2005). However, Tsallis has considered a different constraint on
E(x). If the density f(x) is replaced by its escort density, namely, µ[f(x)]α
where µ−1 =
[f(x)]αdx and if the expected value of x in this escort density
is assumed to be fixed for all functional f then the U of (26) becomes
U = fα − λ1f + µλ2xf
= 0 ⇒ αfα−1[1 + µλ2x] = λ1 ⇒ f =
(1+λ3x)
f = λ∗1[1 + λ3x]
where λ3 is a constant and λ
1 is the normalizing constant. If λ3 is taken as
λ3 = α− 1 then
f = λ∗1[1 + (α− 1)x]
α−1 . (28)
Then (28) for α > 1 is Tsallis statistics (Tsallis 2004, Cohen 2005). Then for
α < 1 also by writing α − 1 = −(1 − α) one gets the case of Tsallis statistics
for α < 1 (Ferri et al. 2005). These modifications and the consideration of
escort distribution are not necessary if we take the generalized entropy of order
α. Thus if we consider Mα(f) and if we assume that the first moment in f(x)
itself is fixed for all functional f then the Euler equation gives
(2− α)f1−α − λ1 + λ2x = 0 ⇒ f = λ̄
and for λ2
= 1− α we have Tsallis statistics (Tsallis 2004, Cohen 2005)
f = λ̄[1− (1− α)x]
1−α (29)
coming directly, where λ̄ is the normalizing constant.
Let us start with Mα(f) of (20) under the assumptions that f(x) ≥ 0 for all
f(x)dx = 1,
xδf(x)dx is fixed for all functional f and for a specified
δ > 0, f(a) is the same for all functional f , f(b) is the same for all functional
f , for some limits a and b, then the Euler equation becomes
(2 − α)f1−α − λ1 − λ2x
δ = 0 ⇒ f = c1[1 + c
1−α . (30)
If c∗1 is written as −s(1− α), s > 0 then we have, writing f1 for f ,
f1 = c1[1− s(1 − α)x
1−α , δ > 0, α < 1, 0 ≤ x ≤
[s(1− α)]
where 1 − s(1 − α)xδ > 0. For α < 1 or −∞ < α < 1 the right side of (31)
remains as a generalized type-1 beta model with the corresponding normalizing
constant c1. For α > 1, writing 1 − α = −(α − 1) the model in (31) goes to a
generalized type-2 beta form, namely,
f2 = c2[1 + s(α− 1)x
α−1 . (32)
When α → 1 in (31) or in (32) we have an extended or stretched exponential
form,
f3 = c3e
. (33)
If c∗1 in (30) is taken as positive then (30) for α < 1, α > 1, α → 1 will be
increasing exponentially. Hence all possible forms are available from (30). The
model in (31) is a special case of the distributional pathway model and for a
discussion of the matrix-variate pathway model see Mathai (2005). Special cases
of (31) and (32) for δ = 1 are Tsallis statistics (Gell-Mann and Tsallis, 2004;
Ferri et al. 2005).
Instead of optimizing Mα(f) of (22) under the conditions that f(x) ≥ 0
for all x,
f(x)dx = 1 and
xδf(x)dx is fixed, let us optimize under the
following conditions: f(x) ≥ 0 for all x,
f(x)dx < ∞ and the following two
moment-like expressions are fixed quantities for all functional f ,
x(γ−1)(1−α)f(x)dx = fixed ,
x(γ−1)(1−α)+δf(x)dx = fixed.
Then the Euler equation becomes
(2− α)f1−α −λ1x
(γ−1)(1−α)
− λ2x
(γ−1)(1−α)+δ = 0 ⇒
f = c xγ−1[1 + c∗xδ]
and for c∗ = −s(1 − α), s > 0, we have the distributional pathway model for
the real scalar case, namely
f(x) = c xγ−1[1− s(1− α)xδ ]
1−α , δ > 0, s > 0 (34)
where c is the normalizing constant. For α < 1, (34) gives a generalized type-1
beta form, for α > 1 it gives a generalized type-2 beta form and for α → 1
we have a generalized gamma form. For α > 1, (34) gives the superstatistics
of Beck (2006) and Beck and Cohen (2003). For γ = 1, δ = 1, (34) gives
Tsallis statistics (Tsallis 2004, Cohen 2005). Densities appearing in a number
of physical problems are seen to be special cases of (34), a discussion of which
may be seen from Mathai and Haubold (2006a). For example, (34) for δ =
2, γ = 3, α → 1, x > 0 is the Maxwell-Boltzmann density; for δ = 2, γ = 1, α →
1,−∞ < x < ∞ is the Gaussian density; for γ = δ, α → 1 is the Weibull density.
For γ = 1, δ = 2, 1 < q < 3 we have the Wigner function W (p) giving the atomic
moment distribution in the framework of Fokker-Planck equation, see Douglas,
Bergamini, and Renzoni (2006) where
W (p) = z−1q [1− β(1 − q)p
1−q , 1 < q < 3. (35)
Before closing this section we may observe one more property for Mα(f). As
an expected value
Mα(f) =
E[f(x)]1−α − 1
. (36)
But Kerridge’s (Kerridge, 1961) measure of “inaccuracy” in assigning q(x) for
the true density f(x), in the generalized form is
Hα(f : q) =
(21−α − 1)
E[q(x)]α−1 − 1
, (37)
which is also connected to the measure of directed divergence between q(x) and
f(x). In (37) the normalizing constant is 21−α−1, the same factor appearing in
Havrda-Charvt́ entropy. With different normalizing constants, as seen before,
(36) and (37) have the same forms as an expected value with q(x) replaced
by f(x) in (36). Hence Mα(f) can also be looked upon as a type of directed
divergence or “inaccuracy” measure.
3. Differential Equations
The functional part in (34), for a more general exponent, namely
g(x) =
= xγ−1[1− s(1 − α)xδ]
1−α , α 6= 1, δ > 0, β > 0, s > 0 (38)
is seen to satisfy the following differential equation for γ 6= 1 which defines the
differential pathway.
g(x) = (γ − 1)xγ−1[1− s(1− α)xδ]
−sβδxδ+γ−1[1− s(1− α)xδ]
(1−α)
. (39)
Then for δ =
(γ−1)(α−1)
, γ 6= 1, α > 1 we have
g(x) = (γ − 1)g(x)− sβδ[g(x)]1−
(1−α)
β (40)
= (γ − 1)g(x)− sδ[g(x)]α (41)
for β = 1, γ 6= 1, δ = (γ − 1)(α− 1), α > 1.
For γ = 1, δ = 1 in (38) we have
g(x) = −s[g(x)]η, η = 1−
(1 − α)
= −s[g(x)]α for β = 1. (43)
Here (43) is the power law coming from Tsallis statistics (Gell-Mann and Tsallis,
2004).
Acknowledgement The authors would like to thank the Department of Science
and Technology, Government of India, New Delhi, for the financial assistance for
this work under project No. SR/S4/MS:287/05 which enabled this collaboration
possible.
4. References
Beck, C. (2006). Stretched exponentials from superstatistics. Physica A, 365,
96-101.
Beck, C. and Cohen, E.G.D. (2003). Superstatistics. Physica A, 322, 267-275.
Cohen, E.G.D. (2005). Boltzmann and Einstein: Statistics and dynamics - An
unsolved problem. Pramana, 64, 635-643.
Douglas, P., Bergamini, S., and Renzoni, F. (2006). Tunable Tsallis distribution
in dissipative optical lattices. Physical Review Letters, 96, 110601-1-4.
Ferri, G.L., Martinez, S., and Plastino, A. (2005). Equivalence of the four
versions of Tsallis’s statistics. Journal of Statistical Mechanics: Theory and
Experiment, PO4009.
Gell-Mann, M. and Tsallis, C. (Eds.) (2004). Nonextensive Statistical Mechan-
ics: Interdisciplinary Applications. Oxford University Press, Oxford.
Havrda, J. and Charvát, F. (1967). Quantification method of classification pro-
cedures: Concept of structural α-entropy. Kybernetika, 3, 30-35.
Kerridge, D.F. (1961). Inaccuracy and inference. Journal of the Royal Statisti-
cal Society Series B, 23, 184-194.
Mathai, A.M. (2005). A pathway to matrix-variate gamma and normal densi-
ties. Linear Algebra and Its Applications, 396, 317-328.
Mathai, A.M. and Haubold, H.J. (2006). Pathway model, Tsallis statistics, su-
perstatistics and a generalized measure of entropy. Physica A , 375), 110-122.
Mathai,A.M. and Haubold, H.J. (2006a). On generalized distributions and path-
ways. arXiv:cond-mat/0609526v2.
Mathai, A.M. and Rathie, P.N. (1975). Basic Concepts in Information Theory
and Statistics: Axiomatic Foundations and Applications, Wiley Halstead, New
York and Wiley Eastern, New Delhi.
Rényi, A. (1961). On measure of entropy and information. Proceedings of the
Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1960,
University of California Press, 1961, Vol. 1, 547-561.
Shannon, C.E. (1948). A mathematical theory of communication. Bell System
Technical Journal, 27, 379-423, 547-561.
Tsallis, C. (1988). Possible generalization of Boltzmann-Gibbs statistics. Jour-
nal of Statistical Physics, 52, 479-487.
Tsallis, C. (2004). What should a statistical mechanics satisfy to reflect nature?,
Physica D, 193, 3-34.
http://arxiv.org/abs/cond-mat/0609526
ABSTRACT
  Product probability property, known in the literature as statistical
independence, is examined first. Then generalized entropies are introduced, all
of which give generalizations to Shannon entropy. It is shown that the nature
of the recursivity postulate automatically determines the logarithmic
functional form for Shannon entropy. Due to the logarithmic nature, Shannon
entropy naturally gives rise to additivity, when applied to situations having
product probability property. It is argued that the natural process is
non-additivity, important, for example, in statistical mechanics, even in
product probability property situations and additivity can hold due to the
involvement of a recursivity postulate leading to a logarithmic function.
Generalizations, including Mathai's generalized entropy are introduced and some
of the properties are examined. Situations are examined where Mathai's entropy
leads to pathway models, exponential and power law behavior and related
differential equations. Connection of Mathai's entropy to Kerridge's measure of
"inaccuracy" is also explored.

<|endoftext|><|startoftext|>
Introduction of the Ti4+ sublattice within the
Ru4+ sublattice provides a paradigmatic example, where
the charge density near Ti4+ sites is close to zero and each
Ru4+ site contributes 4 electrons in the valence band.
Such large charge fluctuation leads to a significant change
in spectral lineshape and a dip appears at ǫF (pseudo-
gap). Interestingly, the effects are much stronger in the
two dimensional (surface) electronic structure leading to
a soft gap at 50% substitution and eventually a hard gap
appears. Bulk electronic structure (3-dimensional), how-
ever, remains less influenced. A theoretical understand-
ing of these effects needs consideration of strong disorder
in addition to the electron correlation effects.
∗ Corresponding author: kbmaiti@tifr.res.in
[1] A. Fuhrmann, D. Heilmann, and H. Monien, Phys. Rev.
B 73, 245118 (2006).
[2] S.S. Kancharla and E. Dagotto, Phys. Rev. Lett. 98,
016402 (2007).
[3] Arti Garg, H.R. Krishnamurthy, and Mohit Randeria,
Phys. Rev. Lett. 97, 046403 (2006).
[4] N. Paris, K. Bouadim, F. Hebert, G.G. Batrouni, and
R.T. Scalettar, Phys. Rev. Lett. 98, 046403 (2007).
[5] J. Kim, J.-Y. Kim, B.-G. Park, and S.-J. Oh, Phys. Rev.
B 73, 235109 (2006), M. Abbate, J.A. Guevara, S.L.
Cuffini, Y.P. Mascarenhas, and E. Morikawa, Eur. Phys.
J. B 25, 203 (2002).
[6] S. Ray, D.D. Sarma, and R. Vijayaraghavan, Phys. Rev.
B 73, 165105 (2006).
[7] K.W. Kim, J.S. Lee, T.W. Noh, S.R. Lee, and K. Char,
Phys. Rev. B 71, 125104 (2005).
[8] R.S. Singh and K. Maiti, Solid State Commun, 140, 188
(2006).
[9] G. Cao, S. McCall, M. Shepard, J.E. Crow, and R.P.
Guertin, Phys. Rev. B 56, 321 (1997).
[10] K. Maiti and R.S. Singh, Phys. Rev. B 71, 161102(R)
(2005).
[11] K. Maiti, Phys. Rev. B 73, 235110 (2006).
[12] M. Takizawa, D. Toyota, H. Wadati, A. Chikamatsu, H.
Kumigashira, A. Fujimori, M. Oshima, Z. Fang, M. Lipp-
maa, M. Kawasaki, and H. Koinuma, Phys. Rev. B 72,
060404(R) (2005).
[13] K. Maiti, R.S. Singh, and V.R.R. Medicherla, Europhys.
Lett. (in print); Condmat/0604648.
[14] B.L. Altshuler and A.G. Aronov, Solid State Commun.
30, 115 (1979).
[15] D.D. Sarma et al., Phys. Rev. Lett. 80, 4004 (1998).
[16] A.L. Efros and B.I. Shklovskii, J. Phys. C: Solid State
Phys. 8, L49 (1975).
[17] J.G. Massey and M. Lee, Phys. Rev. Lett. 75, 4266
(1995).
[18] P. Blaha, K. Schwarz, G.K.H. Madsen, D. Kvasnicka, and
J. Luitz, WIEN2k, An Augmented Plane Wave + Lo-
cal Orbitals Program for Calculating Crystal Properties
(Karlheinz Schwarz, Techn. Universität Wien, Austria),
2001. ISBN 3-9501031-1-2.
[19] G. Treglia et. al., J. Physique 41, 281 (1980); ibid, Phys.
Rev. B 21, 3729 (1980); D.D. Sarma et al., Phys. Rev.
Lett. 57, 2215 (1986).
ABSTRACT
  We investigate the evolution of the electronic structure in SrRu_(1-x)Ti_xO_3
as a function of x using high resolution photoemission spectroscopy, where
SrRuO3 is a weakly correlated metal and SrTiO3 is a band insulator. The surface
spectra exhibit a metal-insulator transition at x = 0.5 by opening up a soft
gap. A hard gap appears at higher x values consistent with the transport
properties. In contrast, the bulk spectra reveal a pseudogap at the Fermi
level, and unusual evolution exhibiting an apparent broadening of the coherent
feature and subsequent decrease in intensity of the lower Hubbard band with the
increase in x. Interestingly, the first principle approaches are found to be
sufficient to capture anomalous evolutions at high energy scale. Analysis of
the spectral lineshape indicates strong interplay between disorder and electron
correlation in the electronic properties of this system.

<|endoftext|><|startoftext|>
Electroweak phase transitions in the MSSM with an
extra U (1)′
S.W. Ham(1), E.J. Yoo(2), and S.K. Oh(1,2)
(1) Center for High Energy Physics, Kyungpook National University,
Daegu 702-701, Korea
(2) Department of Physics, Konkuk University, Seoul 143-701, Korea
Abstract
We investigate the possibility of electroweak phase transition in the minimal
supersymmetric standard model (MSSM) with an extra U(1)′. This model has two
Higgs doublets and a singlet, in addition to a singlet exotic quark superfield. We
find that at the one-loop level this model may accommodate the electroweak phase
transitions that are strongly first-order in a reasonably large region of the parameter
space. In the parameter region where the phase transitions take place, we observe
that the lightest scalar Higgs boson has a smaller mass when the strength of the
phase transition becomes weaker. Also, the other three heavier neutral Higgs bosons
get more large masses when the strength of the phase transition becomes weaker.
http://arxiv.org/abs/0704.0328v1
I. INTRODUCTION
The baryon asymmetry of the universe can be dynamically generated during the evolution
of the universe, if the mechanism of baryogenesis satisfies the three Sakharov conditions
[1]. The three Sakharov conditions are: the presence of baryon number violation, the
violation of both C and CP, and a deviation from thermal equilibrium. It is known that
the universe can escape out of the thermal equilibrium by means of electroweak phase
transition, which should be strongly first-order in order to ensure sufficient deviation from
thermal equilibrium to generate the baryon asymmetry that is observed today. However,
it has been already recognized that the Standard Model (SM) has some difficulty to realize
the desired electroweak phase transition. The present experimental lower bound on the
mass of the SM Higgs boson does not allow the electroweak phase transition to be strongly
first-order [2, 3]. The electroweak phase transition is weakly first-order or higher order in
the SM. Thus, the SM is inadequate to generate sufficient baryon asymmetry. Moreover,
the amount CP violation in the Cabibbo-Kobayashi-Maskawa (CKM) matrix is too small
to account for the baryon asymmetry of the observed universe [4].
Consequently, new physical models beyond the SM have extensively been studied for
the possibility of reasonable explanation of the baryon asymmetry of the universe. Espe-
cially, the low energy supersymmetric models have been studied widely within the context
of electroweak baryogenesis [5-7]. The simplest supersymmetric model that includes the
SM is the minimal supersymmetric standard model (MSSM), which possesses in its su-
perpotential the µ term that accounts for the mixing between two Higgs doublets. The
µ parameter, which has the mass dimension, causes some problem with respect to its
energy scale [8]. Several possibilities have been investigated in the literature to solve the
so-called µ problem [9-12]. Introducing an additional U(1)′ to the MSSM is one of the
plausible explanations for the µ problem of the MSSM.
The MSSM with an extra U(1)′ can not only solve the µ problem but we will show
that it can also overcome the difficulties that the SM encounters when the SM tries to
satisfy the Sakharov conditions. This model can accommodate sufficient CP violation,
because it possesses other sources of CP violation besides the CKM matrix. It is possible
to realize the explicit CP violation in this model by means of complex CP phases arising
from the soft SUSY breaking terms [12].
Then, it is the purpose of this paper to show that this model indeed allows the strongly
first-order electroweak phase transitions such that it can successfully explain the baryo-
genesis. The characteristics of the electroweak phase transitions are determined essen-
tially by the temperature-dependent part of the Higgs potential. We construct the full
temperature-dependent Higgs potential at the one-loop level, and examine if the elec-
troweak phase transition may be strongly first-order. Two methods are employed for the
construction of the temperature-dependent Higgs potential. One method assumes that
the critical temperature at which the electroweak phase transition occurs is relatively
high, thus the temperature-dependent effective potential is approximated by retaining
only terms proportional to T 2, whereas the other method carries out numerically exact
integrations of the temperature-dependent effective potential. The thermal effects of par-
ticles whose masses are comparatively smaller than the critical temperature are included
at the one-loop level in the former method, whereas the particle content is different in the
latter method.
Either way, we obtain almost the same physical results. Unlike the MSSM, this model
allows a strongly first-order electroweak phase transition in a wide region of the parame-
ter space, and the first-order electroweak phase transition can be strong enough without
requiring a light stop quark. An interesting behavior of this model with respect to the
strongly first-order electroweak phase transition is that the mass of the lightest neutral
Higgs boson becomes larger when the phase transition gets stronger. On the other hand,
the masses of the other three neutral Higgs bosons become smaller when the phase tran-
sition gets stronger.
II. ZERO TEMPERATURE
The MSSM with an extra U(1)′ accommodates in its Higgs sector two Higgs doublets
H1 = (H
1 , H
1 ), H2 = (H
2 , H
2 ), and one Higgs singlet, S. In terms of these Higgs fields,
the relevant part of the superpotential of this model may be written as
W ≈ htQH2t
R + hbQH1b
R + hkSDLD̄R − λSH
ǫH2 , (1)
where we take into account only the third generation: tcR and b
R are, respectively, the
right-handed singlet top and bottom quark superfields, DR is the right-handed singlet
exotic quark (a vector-like down quark) superfield, Q is the left-handed SU(2) doublet
quark superfield of the third generation, and DL is the left-handed singlet exotic quark
superfield. Further, ht, hb and hk are, respectively, the dimensionless Yukawa coupling
coefficients of top, bottom, and exotic quark superfields, and ǫ is an antisymmetric 2× 2
matrix with ǫ12 = 1.
From the superpotential, at zero temperature, we can construct the Higgs potential
at the tree level, which may be read as
V0 = VF + VD + VS , (2)
where
VF = |λ|
2[(|H1|
2 + |H2|
2)|S|2 + |HT
1~σH1 +H
2~σH2)
(|H1|
2 − |H2|
(Q̃1|H1|
2 + Q̃2|H2|
2 + Q̃3|S|
2)2 ,
VS = m
2 +m2
2 +m2
|S|2 − [λAλ(H
ǫH2)S +H.c.] , (3)
where ~σ denotes the three Pauli matrices, g1, g2, and g
are the U(1), SU(2), and U(1)′
gauge coupling constants, respectively, Q̃1, Q̃2, and Q̃3 are the U(1)
′ hypercharges of H1,
H2, and S, respectively, and m
i (i = 1, 2, 3) are the soft SUSY breaking masses. In the
Higgs potential, λ and Aλ may in general be complex numbers. However, they will be
assumed to be real in the subsequent discussions, as we do not consider CP violation in
the Higgs sector. The soft masses are also assumed to be real, without loss of generality,
and they are eventually eliminated by imposing minimum conditions with respect to the
neutral Higgs fields, The gauge invariance of the superpotential under of U(1)′ requires
that the three U(1)′ hypercharges should satisfy Q̃1 + Q̃2 + Q̃3 = 0.
The above Higgs potential at the tree level would allow the three neutral Higgs fields
, and S to develop the vacuum expectation values (VEVs) v1(0), v2(0), and s(0),
respectively. Remark that these VEVs are obtained at zero temperature. However, for
simplicity, we omit the temperature dependence of these VEVs until next section where
we take into account the finite temperature effect.
The tree-level Higgs potential should now be corrected by the radiative one-loop effects.
In SUSY models, the radiative corrections due to the top and stop quarks contribute
most dominantly to the tree-level Higgs sector. Besides, if tanβ = v2/v1 is very large,
the radiative corrections due to the bottom and sbottom quarks should also be included
since they become no longer negligible. Furthermore, the radiative corrections due to the
exotic quark and squark may be important since the Yukawa coupling of the exotic quark
to the singlet field S can be large at the electroweak scale [11]. Therefore, we take into
account all the contributions from the top, bottom, exotic quark sector to the tree-level
Higgs potential.
The one-loop radiative corrections are evaluated by the effective potential method [13].
We assume that the squark masses are degenerate. Ignoring the mixings in the masses of
the squarks [14], the one-loop effective potential is given by
l=t,b,k
+ log
m̃2 +M2l
, (4)
where t, b, and k, respectively are top, bottom, and exotic quark fields including the
corresponding squark fields, Mt = ht|H2|, Mb = hb|H1|, Mk = hk|S| are the field-
dependent quark masses, and m̃ is the soft SUSY breaking mass, which is assumed that
m̃ = 1000 GeV ≫ mq (q= t, b, or k).
The Higgs sector of the present model consists of six physical Higgs bosons: a pair
of charged Higgs boson, one neutral pseudoscalar Higgs boson, and three neutral scalar
Higgs bosons. The tree-level mass of the charged Higgs boson is given by
m2C± = m
W − λ
2v2 +
2λAλs
sin 2β
, (5)
where v =
v21 + v
2 = 175 GeV and m
W = g
2/2 is the squared mass of the W boson.
At the tree level, the mass of the charged Higgs boson might be either smaller or larger
than the W boson mass.
The tree-level mass of the neutral pseudoscalar Higgs boson is given by
m2A =
2λAλv
sin 2α
, (6)
where tanα = (v/2s) sin 2β implies the splitting between the electroweak symmetry break-
ing scale and the extra U(1)′ symmetry breaking scale. Note that these tree-level masses
of both the neutral pseudoscalar and the charged Higgs bosons do not receive any radiative
corrections, because the squark masses are degenerate.
The tree-level squared masses of the three neutral scalar Higgs bosons are considerably
affected by the radiative corrections. Their squared masses at the one-loop level are given
as the eigenvalues of the 3×3 one-loop level mass matrix, whose elements may be written
M11 = m
Z cos
2 β + 2g
v2 cos2 β +m2A sin
2 β cos2 α + fa(m
M22 = m
Z sin
2 β + 2g
v2 sin2 β +m2A cos
2 β cos2 α + fa(m
t ) ,
M33 = 2g
2 +m2A sin
2 α + fa(m
M12 = g
Q̃1Q̃2v
2 sin 2β + (λ2v2 −m2Z/2) sin 2β −m
A cos β sin β cos
2 α ,
M13 = 2g
1 Q̃1Q̃3vs cos β + 2λ
2vs cosβ −m2A sin β cosα sinα ,
M23 = 2g
Q̃2Q̃3vs sin β + 2λ
2vs sinβ −m2A cos β cosα sinα , (7)
where m2Z = (g
)v2/2 is the squared mass of the Z boson, and the function fa(m
is defined as
3h2qm
m̃2 +m2q
4h2qm
m̃2 +m2q
(m̃2 +m2q)
. (8)
We assume that the masses of three scalar Higgs bosons Si are sorted such that mS1 ≤
mS2 ≤ mS3 .
III. FINITE TEMPERATURE
Now, let us study the temperature dependence of the Higgs potential in order to inves-
tigate the nature of the electroweak phase transition in the MSSM with an extra U(1)′.
We evaluate VT , the temperature-dependent part of the Higgs potential at the one-loop
level, using the effective potential method. It is given as [15]
l=B,F
dx x2 log
1± exp
x2 +m2l (φi)/T
, (9)
where B and F stand for bosons (t̃, b̃, and k̃) and fermions (t, b, and k), and nt = nb =
nk = −12 and nt̃ = nb̃ = nk̃ = 12. The negative sign is for bosons and the positive sign
is for fermions. Thus, the full Higgs potential at finite temperature at the one-loop level
is given by
V (T ) = V0 + V1 + VT (10)
For numerical analysis, we need to set the values of the relevant parameters of the
model. As in the previous section, the soft SUSY breaking mass is set as m̃ = 1000 GeV.
The quark masses are set as mt = 175 GeV, mb = 4 GeV, and mk = 400 GeV. From
these values, mq̃ =
m̃2 +m2q (q = t, b, k) yield the squark masses as mt̃ = 1015 GeV,
= 1000 GeV, and m
= 1077 GeV.
Some caution should be taken for setting the values of Q̃i (i=1, 2, 3), the U(1)
hypercharges of the Higgs doublets and the Higgs singlet. In the MSSM with an extra
U(1)′, the extra neutral gauge boson mass (mZ′) and the mixing angle (αZZ′) between the
two neutral gauge bosons (Z,Z ′) may impose strong constraints on the parameter values.
For our numerical analysis, mZ′ is estimated to be larger than 600 GeV, and αZZ′ smaller
than 2 × 10−3, for tan β = 3 and s(T = 0) = 500 GeV. Besides, as recent research has
suggested [10], we impose the constraint of Q̃1Q̃2 > 0. Further, the U(1)
′ gauge invariance
condition requires that Q̃3 = −(Q̃1 + Q̃2).
In this paper, we define new charges Qi = g
1Q̃i since Q̃i appear always together with
. Then, one may establish the allowed area in the (Q1, Q2)-plane by imposing the above
constraints. For tanβ = 3 and s(T = 0) = 500 GeV, the result is shown in Fig. 1, where
the small area near the point (Q1, Q2) = (-1, 0) and the upper right corner of Fig. 1
are the allowed areas. The hatched region is the excluded area. There are two specific
points in Fig. 1, marked by a star (∗) and a cross (+). The values of Q1 and Q2 at
the star-marked point correspond to the ν-model of E6 gauge group realizations [11]. We
would take the values of Q1 and Q2 at the cross-marked point, namely, (Q1, Q2) = (-1,
-0.1), and hence Q3 =1.1.
With these parameter values at hand, we would investigate the possibility of the
strongly first-order electroweak phase transition by using two different ways. The first
method is to retain only the dominant T 2-proportional part from the high-temperature
approximation of VT , and to take account only those particles whose masses are relatively
small [6]. The second method is to perform the integration in VT in numerically exact way,
and to consider only the contributions of top, bottom, and exotic quarks and squarks.
1. Method A
Let us start with the high temperature approximation of VT , which is expressed as [3]
VT ≈ −
i=t,b,k
T 2m2i (φi)
m4i (φi)
m2i (φi)
cFT 2
i=t̃,b̃,k̃
T 2m2i (φi)
Tm3i (φi)
m4i (φi)
m2i (φi)
cBT 2
, (11)
where log cF = 2.64 and log cB = 5.41. It is known that in the SM the high temperature
approximation is consistent with the exact integration of VT within 5 % at temperature
T for mF/T < 1.6 and mB/T < 2.2, where mF and mB are respectively the fermion mass
and the boson mass that participate in the potential.
We select those terms that are proportional to T 2 in the above expression, which
become most dominant at high temperature. Thus, we assume that the temperature at
which the electroweak phase transition takes place is sufficiently high. We also assume
that the U(1) and SU(2) gaugino masses M1 and M2 in the chargino and neutralino
sectors are very much larger than the other mass parameters. We take into account the
thermal effects due to the Higgs bosons, W , Z, and the extra U(1) gauge boson in the
boson sector, and t, b, k quarks, the lighter chargino, and the three light neutralinos in the
fermion sector, because their masses are relatively small as compared with temperature,
similarly to the analyses of previous articles [6]. Explicitly, the T 2 terms in the high
temperature approximation of VT can be expressed as
+ 4m2
+ 2m2
+ (2g2
+ 6g2
+ 6λ2)(|H1|
2 + |H2|
2) + 12λ2|S|2
+ 12g
2 + Q̃2
2 + Q̃2
|S|2) + 2g
Q̃1Q̃2(|H1|
2 + |H2|
Q̃2Q̃3(|H2|
2 + |S|2) + 2g
Q̃1Q̃3(|H1|
2 + |S|2)
1 (Q̃1 + Q̃2)(Q̃1|H1|
2 + Q̃2|H2|
2 + Q̃3|S|
+6(h2t |H2|
2 + h2b |H1|
2 + h2k|S|
. (12)
Now, the neutral scalar Higgs fields develop the temperature-dependent VEVs, v1(T ),
v2(T ), and s(T ), which we will simply denote v1, v2, and s, respectively. In terms of
these temperature-dependent VEVs, the vacuum at finite temperature is defined as the
minimum of V (T ) as
〈V (v1, v2, s, T )〉 = 〈V0〉+ 〈V1〉+ 〈VT 〉 , (13)
where
〈V0〉 = m
g21 + g
)2 + λ2(v2
s2 + v2
− 2λAλv1v2s+
(Q̃1v
+ Q̃2v
+ Q̃3s
2)2 ,
〈V1〉 = fb(m
t ) + fb(m
b) + fb(m
〈VT 〉 =
+ 4m2
+ 2m2
+ (2g2
+ 6g2
+ 6λ2)(v2
) + 12λ2s2
+ 12g
+ Q̃2
+ Q̃2
s2) + 2g
Q̃1Q̃2(v
1 Q̃2Q̃3(v
2 + s
2) + 2g
1 Q̃1Q̃3(v
1 + s
1 (Q̃1 + Q̃2)(Q̃1v
1 + Q̃2v
2 + Q̃3s
2) + 6(h2tv
2 + h
1 + k
. (14)
In the above expressions, the function fb is defined as
+ log
m̃2 +m2q
, (15)
and the soft SUSY breaking masses at the one-loop level are given as
cos 2β − λ2(s(0)2 + v(0)2 sin2 β) + λAλs(0) tanβ
Q̃1(Q̃1v(0)
2 cos2 β + Q̃2v(0)
2 sin2 β + Q̃3s(0)
2)− fc(m
b(0))
cos 2β − λ2(s(0)2 + v(0)2 cos2 β) + λAλs(0) cotβ
Q̃2(Q̃1v(0)
2 cos2 β + Q̃2v(0)
2 sin2 β + Q̃3s(0)
2)− fc(m
t (0))
= − λ2v(0)2 +
2s(0)
v(0)2Aλ sin 2β
Q̃3(Q̃1v(0)
2 cos2 β + Q̃2v(0)
2 sin2 β + Q̃3s(0)
2)− fc(m
k(0)) , (16)
where v1(0), v2(0), and s(0) are the VEVs evaluated at zero temperature in the preceding
section, tan β = v2(0)/v1(0), v(0) =
v1(0)2 + v2(0)2 = 175 GeV, and the function fc is
defined as
3h2qm
2 + 2 log
m̃2 +m2q
m̃2 +m2q
. (17)
Now, let us determine the critical temperature at which the electroweak phase tran-
sition takes place. In our analysis, the critical temperature is defined by a temperature
at which 〈V (T )〉 has two distinct minima with equal value, that is, a pair of degenerate
vacua. In order to have a pair of degenerate vacua, the potential 〈V (T )〉 should satisfy
the minimum condition of
0 = 2m2
s− 2λAλv1v2 + 2λ
1 Q̃3s(Q̃1v
1 + Q̃2v
2 + Q̃3s
2) + 2h2kmkfc(m
s[24λ2 + 24g
+ 20g
Q̃3(Q̃1 + Q̃2) + 12k
2] , (18)
which is obtained by calculating the first derivative of the full effective potential at the
finite temperature with respect to s.
For given parameter values at given temperature, one may solve the above minimum
condition to express s in terms of the other two VEVs, v1 and v2. Then, by substituting
s into 〈V (v1, v2, s, T )〉, one may obtain 〈V (v1, v2, T )〉 which depends only on v1 and v2.
By inspecting the shape of 〈V (v1, v2, T )〉 on the (v1, v2)-plane for given parameter values
at given temperature, we may determine whether it possess a pair of degenerate vacua or
In Fig. 2, the equipotential contours of 〈V (v1, v2, T )〉 are plotted on the (v1, v2)-
plane, where the parameter values are set as tanβ = 3, λ = 0.8, s(0) = 500 GeV,
mA = 1830 GeV, and the temperature is set as T = 100 GeV, which is actually the
critical temperature Tc. One can easily spot two distinct minima of 〈V (v1, v2, T )〉 on
the (v1, v2)-plane, namely, one at (0, 0) and the other at (275, 640) GeV. The phase of
the state is symmetric at the minimum point (0, 0) on the (v1, v2)-plane, whereas it is
broken at (275, 640) GeV. The electroweak phase transition may take place from (0, 0) to
(275, 640) GeV on the (v1, v2)-plane, which is evidently discontinuous and therefore it is
first-order.
The distance on the (v1, v2)-plane between the two minima of 〈V (v1, v2, T )〉, defined
as vc, determines the strength of the electroweak phase transition. The electroweak phase
transition is said to be strong if vc/Tc > 1, and weak otherwise. In Fig. 2, the distance is
calculated to be
(275− 0)2 + (640− 0)2 = 696 (GeV) . (19)
In Fig. 2, the strength of the electroweak phase transition is about vc/Tc = 6.9, which
definitely tells that the electroweak phase transition is a strong one. Therefore, the
particular parameter values set for Fig. 2 yields an electroweak phase transition which
is first-order as well as strong. Note that vc does not depend on s, that is, we need not
to know the values of s at the two minima to calculate vc. Actually, vc is the VEV at
the broken phase. The masses of the neutral scalar Higgs bosons at zero temperature
for the parameter values of Fig. 2 are obtained as mS1 = 56 GeV, mS2 = 807 GeV, and
mS3 = 1827 GeV.
We repeat the above job of analysis, varying the values of the relevant parameters. We
find that there are a large number of sets of parameter values that allow strongly first-order
electroweak phase transitions. Thus, the MSSM with an extra U(1)′ may accommodate
TABLE 1: Some sets of λ and mA that allow strongly first-order electroweak phase
transitions in the MSSM with an extra U(1)′, obtained by Method A. The values of other
parameters are fixed as tanβ = 3, s(0) = 500 GeV, m̃ = 1000 GeV, and Tc = 100
GeV. The pair of numbers in the third column are the coordinates of the broken-phase
minimum of 〈V (v1, v2, T )〉. The coordinates of its symmetric-phase minimum is (0, 0)
for all sets. The three numbers in the fourth column are the masses of S1, S2, and S3,
respectively. The number in the last column is the strength of the first-order electroweak
phase transition.
λ mA (GeV) (v1, v2) (GeV) mS1 , mS2 , mS3 (GeV) vc/Tc
0.1 478 (1750, 1650) 120, 524, 792 26
0.2 675 (1400, 1500) 118, 674, 796 23
0.3 900 (1200, 1400) 112, 786, 908 18
0.4 1109 (870, 1200) 104, 792, 1112 15
0.5 1306 (600, 1000) 93, 796, 1307 12
0.6 1486 (430, 850) 82, 800, 1485 8
0.7 1660 (340, 700) 70, 803, 1658 7
0.8 1830 (275, 640) 56, 807, 1827 6.9
the desired phase transitions for a wide region in its parameter space. Some of the results
are listed in Table 1, where tanβ = 3, s(0) = 500 GeV, and T = 100 GeV are fixed as
the values set in Fig. 2, whereas λ and mA have different values. The set of numbers in
the last row of Table 1 is the numerical result of Fig. 2.
Every set of numbers in each row of Table 1 gives 〈V (v1, v2, T )〉 a pair of degenerate
minima, the minimum of symmetric phase at (0, 0) on the (v1, v2)-plane, and the one of
broken phase at a different point on the (v1, v2)-plane as given in Table 1. The electroweak
phase transition is strongly first-order. One may easily observe in Table 1 that, as the
value of λ increases, a larger value of mA allow desired phase transitions. On the other
hand, the strength of the phase transition is reinforced if the value of λ decreases.
The masses of the neutral scalar Higgs bosons exhibit some interesting behavior. For a
larger value of mA, both S2 and S3 have also larger masses whereas S1 has a smaller mass.
The tendency is that the strength of the phase transition is reinforced if mS1 increases
and if mA, mS2 , and mS3 decrease. In the SM, the strength of the first order electroweak
phase transition decreases if its single Higgs boson mass is increased. Also, in the MSSM,
we have a weaker phase transition if the lighter one of its two scalar Higgs bosons has a
larger mass. In this regard, the tendency of our model is opposite to those of the SM or
the MSSM. One can see that this strange behavior also occurs in some parameter region
of a non-minimal SUSY model, as shown in Fig. 3 of Ref. [7].
2. Method B
The second method evaluates VT by exact integration to obtain the temperature-dependent
full potential V (T ) at one-loop level, where the thermal effects of top, bottom, and exotic
quarks and squarks are taken into account. The thermal effects of the gauge bosons can
be a help for strengthening the first-order electroweak phase transition, but we would
omit them, since the strength of the phase transition is already strong enough.
This method starts with the exact integral expression for 〈VT 〉 after replacing the
neutral Higgs fields by their VEVs as
〈VT 〉 = −
l=t,b,k
dx x2 log
1− exp
m2l (v1, v2, s)
l=t̃,b̃,k̃
dx x2 log
1 + exp
m̃2 +m2l (v1, v2, s)
 ,(20)
which is different from 〈VT 〉 of Method A, while 〈V0〉 and 〈V1〉 are the same as those of
Method A. From the full 〈V (T )〉 = 〈V0〉 + 〈V1〉 + 〈VT 〉, we obtain a minimum condition
for degenerate vacua as
0 = 2m2
s− 2λAλv1v2 + 2λ
)s+ 2g
Q3s(Q̃1v
+ Q̃2v
+ Q̃3s
+ 2h2kmkfc(m
dx x2
2h2ks exp(−
x2 +m2k/T
x2 +m2k/T
1 + exp(−
x2 +m2k/T
dx x2
2h2ks exp(−
x2 + (m̃2 +m2k)/T
x2 + (m̃2 +m2k)/T
1 + exp(−
x2 + (m̃2 +m2k)/T
] , (21)
where mk depends only on s and is independent from v1 and v2.
Solving the above minimum condition is harder than solving the corresponding mini-
mum condition of Method A. Nevertheless, we can solve it by using the bisection method
to express s in terms of the other parameters. Then, eliminating s from 〈V (T )〉, we can
obtain the expression for 〈V (v1, v2, T )〉 which depends only on v1 and v2. Subsequent
steps of numerical analysis are the same as the previous method.
In Fig. 3, equipotential contours of 〈V (v1, v2, T )〉 obtained by the present method is
plotted on the (v1, v2)-plane, where the parameter values are set slightly different from
the previous method: tan β = 3, λ = 0.8, s(0) = 500 GeV, mA = 1780 GeV, and T = 100
GeV. The shape of the equipotential contours of Fig. 3 is almost the same as that of
Fig. 2. One can see that there are two distinct minima in Fig. 3, just like Fig. 2:
one at (0, 0), and the other at (165, 440) GeV on the (v1, v2)-plane, indicating that the
phase transition is first order. The strength of the first-order phase transition is strong,
since vc/Tc = 4.7. The masses of the three scalar Higgs bosons are evaluated at zero
temperature as mS1 = 82 GeV, mS2 = 804 GeV, and mS3 = 1777 GeV.
TABLE 2: Some sets of λ and mA that allow strongly first-order electroweak phase
transitions in the MSSM with an extra U(1)′, obtained by Method B. Other descriptions
are the same as Table 1.
λ mA GeV (v1B, v2B) GeV mSi GeV vc/Tc
0.1 462 (1600, 1600) 121, 468, 791 22
0.2 663 (1400, 1400) 118, 662, 795 19
0.3 885 (1100, 1100) 113, 785, 894 15
0.4 1095 (800, 1200) 106, 792, 1098 14
0.5 1287 (680, 990) 97, 796, 1288 12
0.6 1457 (400, 750) 91, 799, 1456 8
0.7 1620 (300, 600) 86, 801, 1618 6
0.8 1780 (165, 440) 82, 804, 1777 4.7
Comparing Fig. 3 with Fig. 2, one may safely remark that Method A and Method
B lead qualitatively the same results. Either method, whether 〈VT 〉 is calculated by
direct integration or is simplified by high-temperature approximation, and whether the
participating particles at the one-loop level are somewhat exhaustive or selective, we
find that the MSSM with and extra U(1)′ allows strongly first-order electroweak phase
transitions for certain region in its parameter space.
We repeat the numerical analysis by varying the parameter values. and some of the
results are listed in Table 2. Like in Table 1, tan β = 3, s(0) = 500 GeV, and T = 100
GeV are fixed, whereas λ and mA are varied. The set of numbers in the last row of Table
2 is the numerical result of Fig. 3. Comparing Table 2 with Table 1, one may notice that
the numbers are slightly different from each other but the general behavior of the two
tables is exactly the same.
IV. DISCUSSIONS AND CONCLUSIONS
We investigate the MSSM with an extra U(1)′ if it could accommodate strongly first-
order electroweak phase transitions to provide sufficient baryon asymmetry, for reasonable
masses of scalar Higgs bosons. To do so, we need the temperature-dependent part of
the Higgs potential at the one-loop level. Explicitly, its expression is obtained by two
complementary methods: Method A employs high-temperature approximation and retains
only the most dominant T 2 terms, and takes into account the thermal effects at the one-
loop level of various participating particles. On the other hand, method B performs
numerical integrations, and the thermal effects of top, bottom, and exotic quarks and
squarks are accounted for.
Both methods lead us to essentially the same conclusion: the strongly first-order
electroweak phase transition is possible in the MSSM with an extra U(1)′, for a wide
region in its parameter space. The masses of the scalar Higgs bosons are obtained within
reasonably acceptable ranges. Accordingly, we may expect that the MSSM with an extra
U(1)′ can explain the baryon asymmetry of the universe.
We remark that the MSSM with an extra U(1)′ exhibits an interesting behavior with
respect to the correlation between the strength of the phase transition and the Higgs
boson masses. The MSSM with an extra U(1)′ is opposite to the SM or to the MSSM
in the sense that the mass of the lightest scalar Higgs boson increases when the strength
of the strongly first-order electroweak phase transition becomes stronger. In the SM, its
single Higgs boson has a larger mass when the strength of the first order electroweak
phase transition decreases. In the MSSM, we also have a larger mass for the lighter one
of its two scalar Higgs bosons when the phase transition becomes weaker.
ACKNOWLEDGMENTS
This research is supported by KOSEF through CHEP. The authors would like to
acknowledge the support from KISTI (Korea Institute of Science and Technology Infor-
mation) under ”The Strategic Supercomputing Support Program” with Dr. Kihyeon Cho
as the technical supporter. The use of the computing system of the Supercomputing
Center is also greatly appreciated.
[1] A.D. Sakharov, JETP Lett. 5, 24 (1967).
[2] V.A. Kuzmin, V.A. Rubakov, and M.E. Shaposhnikov, Phys. Lett. B 155, 36 (1985);
M.E. Shaposhnikov, JETP Lett. 44, 465 (1986); Nucl. Phys. B 287, 757 (1987);
Nucl. Phys. B 299, 797 (1988); L. McLerran, Phys. Rev. Lett. 62, 1075 (1989); N.
Turok and J. Zadrozny, Phys. Rev. Lett. 65, 2331 (1990); Nucl. Phys. B 358, 471
(1991); L. McLerran, M.E. Shaposhnikov, N. Turok, and M. Voloshin, Phys. Lett.
B 256, 451 (1991); M. Dine, P. Huet, R. S. Singleton Jr., and L. Susskind, Phys.
Lett. B 257, 351 (1991); A. I. Bochkarev, S. V. Kuzmin, and M. E. Shaposhnikov,
Phys. Lett. B 244, 257 (1990); Mod. Phys. Lett. A 2, 417 (1987); P. Arnold and O.
Espinosa, Phys. Rev. D 47, 3546 (1993); Z. Fodor and A. Hebecker, Nucl. Phys. B
432, 127 (1994); K. Kajantie, M. Laine, K. Rummukainen, and M. Shaposhnikov,
Phys. Rev. Lett. 77, 2887 (1996); A. G. Cohen, D. B. Kaplan, and A. E. Nelson,
Annu. Rev. Nucl. Part. Sci. 43, 27 (1993); M. Trodden, Rev. Mod. Phys. 71, 1463
(1999); A. Riotto and M. Trodden, Annu. Rev. Nucl. Part. Sci. 49, 35 (1999); F.
Csikor and Z. Fodor, and J. Heitger, Phys. Rev. Lett. 82, 21 (1999); F. Csikor, Z.
Fodor, and J. Heitger, Phys. Rev. Lett. 82, 21 (1999).
[3] G.W. Anderson and L.J. Hall, Phys. Rev. D 45, 2685 (1992).
[4] S. Barr, G. Segre, and A. Weldon, Phys. Rev. D 20, 2494 (1979); G.R. Farrar and
M.E. Shaposhnikov, Phys. Rev. D 50, 774 (1994).
[5] M. Carena, M. Quiros, and C.E.M Wagner, Phys. Lett. B 380, 81 (1996); Nucl.
Phys. B 524, 3 (1998); B. de Carlos and J. R. Espinosa, Nucl. Phys. B 503, 24
(1997); M. Laine and K. Rummukainen, Phys. Rev. Lett. 80, 5259 (1998); Nucl.
Phys. B 535, 423 (1998); J.M. Cline and G.D. Moore, Phys. Rev. Lett. 81, 3315
(1998); A.T. Davies, C.D. Froggatt, and R.G. Moorhouse, Phys. Lett. B 372, 88
(1996); S.J. Huber and M. G. Schmidt, Eur. Phys. J. C 10, 473 (1999); A. Menon,
D.E. Morrissey, and C.E.M. Wagner, Phys. Rev. D 70, 035005 (2004); S.W. Ham,
S.K. Oh, and D. Son, Phys. Rev. D 71, 015001 (2005); J. Kang, P. Langacker, T. Li,
and T. Liu, Phys. Rev. Lett. 94, 061801 (2005).
[6] M. Pietroni, Nucl. Phys. B 402, 27 (1993); M. Bastero-Gil, C. Hugonie, S. F. King,
D. P. Roy, and S. Vempati, Phys. Lett. B 489, 359 (2000); S.W. Ham, S.K. Oh, C.M.
Kim, E.J. Yoo, and D. Son, Phys. Rev. D 70, 075001 (2004).
[7] S.J. Huber and M.G. Schmidt, Nucl. Phys. B 606, 183 (2001).
[8] J.E. Kim and H.P. Nilles, Phys. Lett. B 138, 150 (1984).
[9] J.L. Hewett and T.G. Rizzo, Phys. Rep. 183, 193 (1989); A. Leike, Phys. Rep. 317,
143 (1999); M. Cvetic and P. Langacker, Phys. Rev. D 54, 3570 (1996); M. Cvetic,
D. A. Demir, J. R. Espinosa, L. Everett, and P. Langacker, Phys. Rev. D 54, 3570
(1996); D.A. Demir and N.K. Pak, Phys. Rev. D 57, 6609 (1998); Y. Daikoku and
D. Suematsu, Phys. Rev. D 62, 095006 (1998); H. Amini, New J. Phys. 5, 49 (2003).
[10] M. Cvetic, D.A. Demir, J.R. Espinosa, L.L. Everett, and P. Langacker, Phys. Rev.
D 56, 2861 (1997); Erratum-ibid. D 58, 119905 (1998).
[11] S.F. King, S. Moretti, and R. Nevzorov, Phys. Rev. D 73, 035009 (2006); Phys. Lett.
B 634, 278 (2006).
[12] D.A. Demir and L.L. Everett, Phys. Rev. D 69, 015008 (2004); S.W. Ham, E.J. Yoo,
and S.K. Oh, hep-ph/0703041.
[13] S. Coleman and E. Weinberg, Phys. Rev. D 7, 1888 (1973).
[14] Y. Okada, M. Yamaguchi, and T. Yanagida, Prog. Theor. Phys. 85, 1 (1991).
[15] L. Dolan and R. Jackiw, Phys. Rev. D 9, 3320 (1974).
http://arxiv.org/abs/hep-ph/0703041
FIGURE CAPTION
FIG. 1. : The allowed area in the (Q1, Q2)-plane. For tanβ = 3 and s(T = 0) = 500
GeV, the small area near the point (Q1, Q2) = (-1, 0) and the upper right corner are the
allowed areas, whereas the hatched region is the excluded area. There are two specific
points, marked by a star (∗) and a cross (+). The values of Q1 and Q2 at the star-marked
point correspond to the ν-model of E6 gauge group realizations. The values of Q1 and Q2
at the cross-marked point are (Q1, Q2) = (-1, -0.1), and hence Q3 =1.1. In our discussions,
we choose this point.
FIG. 2. : The plot of the equipotential contours of 〈V (v1, v2, T )〉 on the (v1, v2)-plane,
obtained by Method A. The parameter values are set as tan β = 3, λ = 0.8, s(0) = 500
GeV, mA = 1830 GeV, and the temperature is set as T = 100 GeV, which is actually the
critical temperature Tc. Notice two distinct minima of 〈V (v1, v2, T )〉 on the (v1, v2)-plane:
(0, 0) where the phase of the state is symmetric, and (275, 640) GeV, where the phase
of the state is broken. The electroweak phase transition may take place from (0, 0) to
(275, 640) GeV on the (v1, v2)-plane, which is evidently discontinuous and therefore it is
first order. The distance between the two minima is vc = 696 GeV, indicating that the
strength of the first-order phase transition is strong (vc/Tc > 1). The masses of the three
scalar Higgs bosons are obtained as mS1 = 56 GeV, mS2 = 807 GeV, and mS3 = 1827
FIG. 3. : The plot of the equipotential contours of 〈V (v1, v2, T )〉 on (v1, v2)-plane, ob-
tained by Method B. The parameter values are set as tan β = 3, λ = 0.8, s(0) = 500
GeV, mA = 1780 GeV, and Tc = 100 GeV. The coordinates of two minima are: (0, 0)
and (165, 440) GeV. The distance between the two minima is vc = 470 GeV, thus the
electroweak phase transition between the two minima is strongly first-order. The masses
of the three scalar Higgs bosons are obtained as mS1 = 82 GeV, mS2 = 804 GeV, and
mS3 = 1777 GeV.
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1
FIG. 1: The allowed area in the (Q1, Q2)-plane. For tanβ = 3 and s(T = 0) = 500
GeV, the small area near the point (Q1, Q2) = (-1, 0) and the upper right corner are the
allowed areas, whereas the hatched region is the excluded area. There are two specific
points, marked by a star (∗) and a cross (+). The values of Q1 and Q2 at the star-marked
point correspond to the ν-model of E6 gauge group realizations. The values of Q1 and Q2
at the cross-marked point are (Q1, Q2) = (-1, -0.1), and hence Q3 =1.1. In our discussions,
we choose this point.
0 50 100 150 200 250 300 350 400
V1 (GeV)
V2 (GeV)
FIG. 2: The plot of the equipotential contours of 〈V (v1, v2, T )〉 on the (v1, v2)-plane,
obtained by Method A. The parameter values are set as tan β = 3, λ = 0.8, s(0) = 500
GeV, mA = 1830 GeV, and the temperature is set as T = 100 GeV, which is actually the
critical temperature Tc. Notice two distinct minima of 〈V (v1, v2, T )〉 on the (v1, v2)-plane:
(0, 0) where the phase of the state is symmetric, and (275, 640) GeV, where the phase
of the state is broken. The electroweak phase transition may take place from (0, 0) to
(275, 640) GeV on the (v1, v2)-plane, which is evidently discontinuous and therefore it is
first order. The distance between the two minima is vc = 696 GeV, indicating that the
strength of the first-order phase transition is strong (vc/Tc > 1). The masses of the three
scalar Higgs bosons are obtained as mS1 = 56 GeV, mS2 = 807 GeV, and mS3 = 1827
0 25 50 75 100 125 150 175 200 225 250
V1 (GeV)
V2 (GeV)
FIG. 3: The plot of the equipotential contours of 〈V (v1, v2, T )〉 on (v1, v2)-plane, obtained
by Method B. The parameter values are set as tan β = 3, λ = 0.8, s(0) = 500 GeV, mA =
1780 GeV, and Tc = 100 GeV. The coordinates of two minima are: (0, 0) and (165, 440)
GeV. The distance between the two minima is vc = 470 GeV, thus the electroweak phase
transition between the two minima is strongly first-order. The masses of the three scalar
Higgs bosons are obtained as mS1 = 82 GeV, mS2 = 804 GeV, and mS3 = 1777 GeV.
	INTRODUCTION
	ZERO TEMPERATURE
	FINITE TEMPERATURE
	Method A
	Method B
	DISCUSSIONS AND CONCLUSIONS
ABSTRACT
  We investigate the possibility of electroweak phase transition in the minimal
supersymmetric standard model (MSSM) with an extra $U(1)'$. This model has two
Higgs doublets and a singlet, in addition to a singlet exotic quark superfield.
We find that at the one-loop level this model may accommodate the electroweak
phase transitions that are strongly first-order in a reasonably large region of
the parameter space. In the parameter region where the phase transitions take
place, we observe that the lightest scalar Higgs boson has a smaller mass when
the strength of the phase transition becomes weaker. Also, the other three
heavier neutral Higgs bosons get more large masses when the strength of the
phase transition becomes weaker.

<|endoftext|><|startoftext|>
Introduction
The review of the theory and applications of reaction-diffusion systems is con-
tained in many books and articles. In recent work authors have demonstrated
the depth of mathematics and related physical issues of reaction-diffusion equa-
tions such as nonlinear phenomena, stationary and spatio-temporal dissipative
pattern formation, oscillations, waves etc. (Frank, 2005; Grafiychuk, Datsko,
http://arxiv.org/abs/0704.0329v2
and Meleshko, 2006, 20076). In recent time, interest in fractional reaction-
diffusion equations has increased because the equation exhibits self-organization
phenomena and introduces a new parameter, the fractional index, into the equa-
tion. Additionally, the analysis of fractional reaction-diffusion equations is of
great interest from the analytical and numerical point of view.
The objective of this paper is to derive the solution of an unified model of
reaction-diffusion system (14), associated with the Caputo derivative and the
Riesz-Feller derivative. This new model provides the extension of the models
discussed earlier by Mainardi, Luchko, and Pagnini (2001), Mainardi, Pagnini,
and Saxena (2005), and Saxena, Mathai, and Haubold (2006a). The present
study is in continuation of our earlier work, Haubold and Mathai (1995, 2000)
and Saxena, Mathai, and Haubold (2006a, 2006b).
2 Results Required in the Sequel
In view of the results
J−1/2(x) =
cosx. (1)
and (Mathai and Saxena, 1978, p. 49), the cosine transform of the H-function
is given by
tρ−1cos(kt)Hm,np,q
(ap,Ap)
(bq,Bq)
dt (2)
n+1,m
q+1,p+2
(1−bq,Bq),(
(ρ,µ),(1−ap,ap),(
, (3)
where Re[ρ + µmin1≤j≤m(
)] > 0, Re[ρ+ µmax1≤j≤n
] < 0, |argα| < 1
πΩ, Ω >
k > 0 and Ω =
j=1 Bj −
j=m+1 Bj +
j=1 Aj −
j=n+1 Aj .
The Riemann-Liouville fractional integral of order ν is defined by (Miller and
Ross, 1993, p. 45; Kilbas et al., 2006)
t N(x, t) =
(t − u)ν−1N(x, u)du, (4)
where Re(ν) > 0.
The following fractional derivative of order α > 0 is introduced by Caputo
(1969; see also Kilbas et al., 2006) in the form
t f(x, t) =
Γ(m − α)
f (m)(x, τ)dτ
(t − τ)α+1−m
, m − 1 < α ≤ m, Re(α) > 0, m ∈ N.
∂mf(x, t)
, if α = m. (5)
where ∂
f(x, t) is the mth partial derivative of f(x,t) with respect to t.
The Laplace transform of the Caputo derivative is given by Caputo (1969;
see also Kilbas et al., 2006) in the form
L {0D
t f(x, t); s} = s
αF (x, s)−
sα−r−1f (r)(x, 0+), (m− 1 < α ≤ m). (6)
Following Feller (1952, 1971), it is conventional to define the Riesz-Feller
space-fractional derivative of order α and skewness θ in terms of its Fourier
transform as
F {xD
θ f(x); k} = −Ψ
α(k)f
∗(k), (7)
where
Ψθα(k) = |k|
αexp[i(signk)
], 0 < α ≤ 2, |θ| ≤ min {α, 2 − α} . (8)
When θ = 0, then (8) reduces to
F {xD
0 f(x); k} = −|k|
α, (9)
which is the Fourier transform of the Weyl fractional operator, defined by
xf(t) =
Γ(n − µ)
f(u)du
(t − u)µ−n+1
. (10)
This shows that the Riesz-Feller operator may be regarded as a generalization
of the Weyl operator.
Further, when θ = 0, we have a symmetric operator with respect to x that
can be interpreted as
0 = −
This can be formally deduced by writing −(k)α = −(k2)α/2. For 0 < α < 2 and
|θ| ≤ min {α, 2 − α}, the Riesz-Feller derivative can be shown to possess the
following integral representation in the x domain:
θ f(x) =
Γ(1 + α)
sin[(α + θ)π/2]
f(x + ξ) − f(x)
+ sin[(α − θ)π/2]
f(x − ξ) − f(x)
. (12)
Finally, we need the following property of the H-function (Mathai and Sax-
ena, 1978)
Hm,np,q
(ap,ap)
(bq ,Bq)
Hm,np,q
(ap,Ap/δ)
(bq,Bq/δ
, (δ > 0). (13)
3 Unified Fractional Reaction-Diffusion Equa-
In this section, we will investigate the solution of the reaction-diffusion equation
(14) under the initial conditions (15). The result is given in the form of the
following
Theorem. Consider the unified fractional reaction-diffusion model
t N(x, t) = ηxD
θ N(x, t) + Φ(x, t), (14)
where η, t > 0, x ∈ r; α, θ, β are real parameters with the constraints
0 < α ≤ 2, |θ| ≤ min(α, 2 − α), 0 < β ≤ 2, and the initial conditions
N(x, 0) = f(x), Nt(x, 0) = g(x) ); for x ∈ R,
|x|→±∞ N(x, t) = 0, t > 0. (15)
Here Nt(x, 0) means the first partial derivative of N(x, t) with respect to t
evaluated at t = 0, η is a diffusion constant and Φ(x, t) is a nonlinear function
belonging to the area of reaction-diffusion. Further xD
θ is the Riesz-Feller
space-fractional derivative of order α and asymmetry θ. 0D
t is the Caputo
time-fractional derivative of order β. Then for the solution of (14), subject to
the above constraints, there holds the formula
N(x, t) =
f∗(k)Eβ,1(−ηt
βΨθα(k))exp(−ikx)dk (16)
tg∗(k)Eβ,2(−ηk
αtβΨθα(k))exp(−ikx)dk
ξβ−1dξ
Φ∗(k, t − ξ)Eβ,β(−ηk
αtβΨθα(k))exp(−ikx)dk.
In equation (16) and the following, Eα,β(z) denotes the generalized Mittag-
Leffler function (Saxena, Mathai, and Haubold, 2004; Berberan-Santos, 2005;
Chamati and Tonchev, 2006).
Proof. If we apply the Laplace transform with respect to the time variable t,
Fourier transform with respect to space variable x, and use the initial conditions
(15) and the formula (7), then the given equation transforms into the form
∼(k, s) − sβ−1f∗(k) − sβ−2g∗(k) = −ηΨθα(k)N
∼(k, s) + Φ
∼(k, s),
where according to the conventions followed , the symbol ∼ will stand for the
Laplace transform with respect to time variable t and * represents the Fourier
transform with respect to space variable x.
Solving for N
∼ , it yields
∼(k, s) =
f∗(k)sβ−1
sβ + ηΨθα(k)
g∗(k)sβ−2
sβ + ηΨθα(k)
sβ + ηΨθα(k)
. (17)
On taking the inverse Laplace transform of (17) and applying the formula
a + sα
= tα−βEα,α−β+1(−at
α), (18)
where Re(s) > 0, Re(α) > 0, Re(α − β) > −1; it is seen that
N∗(k, t) = f∗(k)Eβ,1(−ηt
βΨθα(k)) + g
∗(k)tEβ,2(−ηt
βΨθα(k))
Φ∗(k, t − ξ)ξβ−1Eβ,β(−ηΨ
α(k)ξ
β)dξ. (19)
The required solution (16) is now obtained by taking the inverse Fourier trans-
form of (19). This completes the proof of the theorem.
4 Special Cases
When g(x) = 0, then by the application of the convolution theorem of the
Fourier transform to the solution (16) of the theorem, it readily yields
Corollary 1. The solution of the fractional reaction-diffusion equation
N(x, t) − η
N(x, t) = Φ(x, t), x ∈ R, t > 0, η > 0, (20)
with initial conditions
N(x, 0) = f(x), Nt(x, 0) = 0 for x ∈ R, 1 < β ≤ 2,
x→±∞ N(x, t) = 0, (21)
where η is a diffusion constant and Φ(x, t) is a nonlinear function belonging to
the area of reaction-diffusion, is given by
N(x, t) =
G1(x − τ, t)f(τ)dτ
(t − ξ)β−1dξ
G2(x − τ, t − ξ)Φ(τ, ξ)dτ, (22)
where
α − θ
G1(x, t) =
exp(−ikx)Eβ,1(−η|t
β |Ψθα(k))dk (23)
η1/αtβ/α
(1,1/α),(β,β/α),(1,ρ)
(1,1/α),(1,1),(1,ρ)
, (α > 0)
G2(x, t) =
exp(−ikx)Eβ,β(−ηt
βΨθα(k))dk
η1/αtβ/α
(1,1/α),(β,β/α),(1,ρ)
(1,1/α),(1,1),(1,ρ)
, (α > 0). (24)
In deriving the above results, we have used the inverse Fourier transform formula
F−1[Eβ,γ(−ηt
βΨαθ (k)); x] =
3,3 [
η1αtβ/α
(1,1/α),(γ,β/α),(1,ρ)
(1,1/α),(1,1),(1,ρ)
], (25)
where Re(β) > 0, Re(γ) > 0, which can be established by following a procedure
similar to that employed by Mainardi, Luchko, and Pagnini (2001). Next , if
we set f(x) = δ(x), Φ = 0, g(x) = 0, where δ(x) is the Dirac delta-function,
then we arrive at the following interesting result given by Mainardi, Pagnini,
and Saxena (2005).
Corollary 2. Consider the following space-time fractional diffusion model
∂βN(x, t)
= η xD
θ N(x, t), η > 0, x ∈ R, 0 < β ≤ 2, (26)
with the initial conditions N(x, t = 0) = δ(x), Nt(x, 0) = 0,
x→±∞ N(x, t) = 0
where η is a diffusion constant and δ(x) is the Dirac delta-function. Then for
the fundamental solution of (26) with initial conditions, there holds the formula
N(x, t) =
3.3 [
(ηtβ)1/α
(1,1/α),(1,β/α),(1,ρ)
(1,1/α),(1,1),(1,ρ)
], (27)
where ρ = α−θ
Some interesting special cases of (26) are enumerated below.
(i) We note that for α = β, Mainardi, Pagnini, and Saxena (2005) have
shown that the corresponding solution of (26), denoted by Nθα, which we call as
the neutral fractional diffusion, can be expressed in terms of elementary function
and can be defined for x > 0 as
Neutral fractional diffusion: 0 < α = β < 2; θ ≤ min {α, 2 − α} ,
Nθα(x) =
xα−1sin[(π/2)(α − θ)]
1 + 2xαcos[(π/2)(α − θ)] + x2α
. (28)
The neutral fractional diffusion is not studied at length in the literature.
Next we derive some stable densities in terms of the H-functions as special
cases of the solution of the equation (26)
(ii) If we set β = 1, 0 < α < 2; θ ≤ min {α, 2 − α}then (26) reduces to space
fractional diffusion equation, which we denote by Lθα(x) is the fundamental
solution of the following space-time fractional diffusion model:
∂N(x, t)
= η xD
θ N(x, t), η > 0, x ∈ R, (29)
with the initial conditions N(x, t = 0) = δ(x), limx→±∞N(x, t) = 0,, where η is a
diffusion constant and δ(x) is the Dirac-delta function. Hence for the solution
of (29) there holds the formula
Lθα(x) =
α(ηt)1/α
(ηt)1/α
(1,1),(ρ,ρ)
),(ρ,ρ)
, 0 < α < 1, |θ| ≤ α, (30)
where ρ = α−θ
. The density represented by the above expression is known as
α-stable Lévy density. Another form of this density is given by
Lθα(x) =
α(ηt)1/α
(ηt)1/α
(1− 1
),(1−ρ,ρ)
(0,1),(1−ρ,ρ)
, 1 < α < 2, |θ| ≤ 2 − α,
(iii) Next, if we take α = 2, 0 < β < 2, θ = 0, then we obtain the time
fractional diffusion, which is governed by the following time fractional diffusion
model:
∂βN(x, t)
N(x, t), η > 0, x ∈ R, 0 < β ≤ 2, (32)
with the initial conditions N(x, t = 0) = δ(x), Nt(x, 0) = 0,
x→±∞ N(x, t) = 0
where η is a diffusion constant and δ(x) is the Dirac delta-function, whose
fundamental solution is given by the equation
N(x, t) =
(ηtβ)1/2
(1,β/2)
(1,1)
. (33)
(iv) Further, if we set α = 2, β = 1 and θ → 0 then for the fundamental
solution of the standard diffusion equation
N(x, t) = η
N(x, t), (34)
with initial condition
N(x, t = 0) = δ(x), limx→±∞N(x, t) = 0, (35)
there holds the formula
N(x, t) =
η1/2t1/2
(1,1/2)
(1,1)
= (4πηt)−1/2exp[−
], (36)
which is the classical Gaussian density. For further details of these special cases
based on the Green function, one can refer to the paper by Mainardi, Luchko,
and Pagnini (2001) and Mainardi, Pagnini, and Saxena (2005).
Remark. Fractional order moments and the asymptotic expansion of the solu-
tion (27) are discussed by Mainardi, Luchko, and Pagnini (2001).
Finally, for β = 1/2 in (14), we arrive at
Corollary 3. Consider the following fractional reaction-diffusion model
t N(x, t) = ηxD
θ N(x, t) + Φ(x, t), (37)
where η, t > 0, x ∈ R; α, θ are real parameters with the constraints
0 < α ≤ 2, |θ| ≤ min(α, 2 − α), and the initial conditions
N(x, 0) = f(x), for x ∈ R, limx→±∞N(x, t) = 0. (38)
Here η is a diffusion constant and Φ(x, t) is a nonlinear function belonging to
the area of reaction-diffusion. Further xD
θ is the Riesz-Feller space fractional
derivative of order α and asymmetry θ and D
t is the Caputo time-fractional
derivative of order 1/2. Then for the solution of (37), subject to the above
constraints, there holds the formula
N(x, t) =
f∗(k)E1/2,1(−ηt
βΨθα(k))exp(−ikx)dk (39)
ξ−1/2dξ
Φ∗(kct − ξ)E 1
(−ηkαt1.2Ψθα(k))exp(−ikx)dk.
If we set θ = 0 in (39), then it reduces to the result recently obtained by the
authors (2006a) for the fractional reaction-diffusion equation.
5 References
Berberan-Santos, M.N. (2005). Properties of the Mittag-Leffler relaxation func-
tion, Journal of Mathematical Chemistry, 38, 629-635.
Caputo, M. (1969). Elasticita e Dissipazione, Zanichelli, Bologna.
Chamati, H. and Tonchev, N.S. (2006). Generalized Mittag-Leffler functions
in the theory of finite-size scaling for systems with strong anisotropy and/or
long-range interaction, Journal of Physics A: Mathematical and General, 39,
469-478.
Feller, W. (1952). On a generalization of Marcel Riesz’ potentials and the
semi-groups generated by them, Meddeladen Lund Universitets Matematiska
Seminarium (Comm. Sém. Mathém. Université de Lund ), Tome suppl. dédié
a M. Riesz, Lund, 73-81.
Feller, W. (1966). An Introduction to Probability Theory and its Applications,
Vol. II, John Wiley and Sons, New York.
Frank, T.D. (2005). Nonlinear Fokker-Planck Equations: Fundamentals and
Applications, Springer, Berlin Heidelberg New York.
Grafiychuk, V., Datsko, B., and Meleshko, V. (2006). Mathematical model-
ing of pattern formation in sub- and superdiffusive reaction-diffusion systems,
arXiv:nlin.AO/06110005 v3.
Grafiychuk, V., Datsko, B., and Meleshko, V. (2007). Nonlinear oscillations and
stability domains in fractional reaction-diffusion systems, arXiv:nlin.PS/0702013
Haubold, H.J. and Mathai, A.M. (2000). The fractional kinetic equation and
thermonuclear functions, Astrophysics and Space Science, 273, 53-63.
Haubold, H.J. and Mathai, A.M. (1995). A heuristic remark on the periodic
variation in the number of solar neutrinos detected on Earth, Astrophysics and
Space Science, 228, 113-124.
Kilbas, A.A., Srivastava, H.M., and Trujillo, J.J. (2006). Theory and Applica-
tions of Fractional Differential Equations, Elsevier, Amsterdam.
Mainardi, F., Luchko, Y., and Pagnini, G. (2001). The fundamental solution
of the space-time fractional diffusion equation, Fractional Calculus and Applied
Analysis. 4, 153-192.
Mainardi, F., Pagnini, G., and Saxena, R.K. (2005). Fox H-functions in frac-
tional diffusion, Journal of Computational and Applied Mathematics 178, 321-
Mathai, A.M. and Saxena, R.K. (1978). The H-function with Applications in
Statistics and Other Disciplines, John Wiley and Sons, New York, London, and
Sydney.
Miller, K.S. and Ross, B. (1993). An Introduction to the Fractional Calculus
and Fractional Differential Equations, John Wiley and Sons, New York.
Saxena, R.K., Mathai, A.M., and Haubold, H.J. (2004). On fractional kinetic
equations, Astrophysics and Space Science, 282, 281-287.
Saxena, R.K., Mathai, A.M., and Haubold, H.J. (2006a). Fractional reaction-
diffusion equations, Astrophysics and Space Science, 305, 289-296.
Saxena, R.K., Mathai, A.M., and Haubold, H.J. (2006b). Reaction-diffusion
systems and nonlinear waves, Astrophysics and Space Science, 305, 297-303.
Yu, R. and Zhang, H. (2006). New function of Mittag-Leffler type and its ap-
plication in the fractional diffusion-wave equation, Chaos, Solitons and Fractals
30, 946-955.
ABSTRACT
  This paper deals with the investigation of the solution of an unified
fractional reaction-diffusion equation associated with the Caputo derivative as
the time-derivative and Riesz-Feller fractional derivative as the
space-derivative. The solution is derived by the application of the Laplace and
Fourier transforms in closed form in terms of the H-function. The results
derived are of general nature and include the results investigated earlier by
many authors, notably by Mainardi et al. (2001, 2005) for the fundamental
solution of the space-time fractional diffusion equation, and Saxena et al.
(2006a, b) for fractional reaction- diffusion equations. The advantage of using
Riesz-Feller derivative lies in the fact that the solution of the fractional
reaction-diffusion equation containing this derivative includes the fundamental
solution for space-time fractional diffusion, which itself is a generalization
of neutral fractional diffusion, space-fractional diffusion, and
time-fractional diffusion. These specialized types of diffusion can be
interpreted as spatial probability density functions evolving in time and are
expressible in terms of the H-functions in compact form.

<|endoftext|><|startoftext|>
Introduction
Starting from its introduction in nuclear physics by Wigner,1) random matrix
theories have been applied to a wide range of problems ranging from the physics of
proteins2) to quantum gravity (see3), 4) for a historical review). Three reasons for
the ubiquity of random matrix theory come to mind. First, eigenvalues of large ran-
dom matrices have universal properties determined by symmetries. Second, random
matrices are models for disorder present in many physical systems. Third, random
matrix theories have a topological expansion which is important for applications to
quantum field theory. One of the attractive features of random matrix theory is that
analytical information can be obtained for complex systems which otherwise only
can be studied experimentally or numerically.
In this review we discuss applications of random matrix theory to QCD at
nonzero temperature and chemical potential. Since the order parameter for the
chiral phase transition5), 6) and the deconfining phase transition7), 8) are determined
by the infrared behavior of the eigenvalues of the Dirac operator, these eigenvalues
are essential for the phase transitions in QCD. Remarkably, the distribution of the
smallest Dirac eigenvalues is given by universal functions9)–13) that depend only on
one or two parameters, the chiral condensate and the pion decay constant. This
offers an alternative way to measure these constants on the lattice.14)–22)
§2. Random Matrix Theory in QCD
Chiral Random Matrix Theory (chRMT) is a theory with the global symmetries
of QCD, but matrix elements of the Dirac operator replaced by random numbers9), 10)
iW † m
, P (W ) ∼ e−NTrW †W . (2.1)
∗) e-mail address: split@nbi.dk
∗∗) e-mail address: jacobus.verbaarschot@stonybrook.edu
http://arxiv.org/abs/0704.0330v1
2 K.Splittorff and J.J.M. Verbaarschot
This random matrix model has the global symmetries and topological properties of
QCD. It is confining in the sense that only color singlets have a nonzero expecta-
tion value. It is now well understood that fluctuations of low-lying eigenvalues of
the Dirac operator are described by chRMT (see23)–28) for lectures and reviews).
Philosphically, this is important because of the realization that chaotic motion dom-
inates the dynamics of quarks at low energy. Practically, this is important because
we can use powerful random matrix techniques to calculate physical observables.
The condition for the applicability of chRMT is that the Compton wavelength
of Goldstone bosons associated with the mass scale z of these eigenvalues is much
larger than the size of the box. With the squared mass of the associated Goldstone
boson given by 2zΣ/F 2π , this condition reads
≪ Λ2. (2.2)
The second condition is necessary to factorize the partition function into a contribu-
tion from the lightest degrees of freedom and all heavier degrees of freedom. These
two conditions determine the microscopic domain of QCD. We stress that z is a scale
in the Dirac spectrum so that, for sufficiently large volumes, we always have eigenval-
ues in the domain (2.2) where eigenvalues fluctuate according to chRMT. This can be
shown rigorously from the following two observations.30), 31) First, the infrared Dirac
spectrum follows from a (partially quenched) chiral Lagrangian determined by chiral
symmetry, and the inequality (2.2) is the condition for factorization of the partition
function into a factor containing the constant modes and another factor containing
the nonzero momentum modes. Second, the factor with the constant modes is equal
to the large N limit of chiral random matrix theory.
In32), 33) the condition (2.2) was imposed on the quark masses and was the bases
for a systematic expansion of the chiral Lagrangian known as the ǫ expansion.
One feature that underlies universal properties of eigenvalues is that they be-
have as repulsive confined charges. This follows from the joint probability distri-
bution ∼
k<l(λ
)2 exp(−N
). It can be shown that eigenvalues
correlations at the micrsocopic scale are universal.34) The reason is spontaneous
symmetry breaking and a mass gap so that they can be described in terms of a
chiral Lagrangian.
2.1. Chiral Random Matrix Theory at µ 6= 0 and T 6= 0
A nonzero temperature does not change the fluctuating behavior of the Dirac
eigenvalues provided that chiral symmetry remains broken. However, a transition to
a different universality class takes place at the critical temperature. A random matrix
model that reproduces this universal behavior of QCD is obtained by replacing the
off-diagonal elements in (2.1) by35)
iW → iW + t, iW † → iW † − t with t = diag(−πT, πT ). (2.3)
This model has been studied elaborately in the literature (see e.g.35)–40)).
A nonzero chemical potential can be introduced analogously to the quark mass.
The requirement is that the small µ behaviour of the QCD partition function should
Random Matrix Theory 3
0.0 1.0 2.0 3.0 4.0
2µ/mπ
m=0.10
m=0.05
m=0.01
Fig. 1. Lattice results for Nc = 2 (taken from
55)) and phase quenched QCD with Nc = 3 (taken
from56))
be reproduced by the random matrix partition function. This achieved by modifying
(2.1) by41)
iW → iW + µ, iW † → iW † + µ, (2.4)
resulting in a nonhermitean Dirac operator with eigenvalues scattered in the complex
plane. The prescription (2.4) is not unique. A random matrix model that has had a
strong impact on recent developments is defined by42)
iW → iW + µH, iW † → iW † + µH with H† = H, (2.5)
where H is drawn from a Gaussian ensemble of random matrices. This model is in
the same universality class as (2.4) but is technically simpler since it can be worked
out by means of the complex orthogonal polynomial method.42)–46)
There are other types of random matrix models that have been applied to QCD.
For example models with random gauge fields such as the Eguchi-Kawai model47) or
its 2-dimensional version.48) QCD in 1 dimension49), 50) is a random matrix model
as well, with universally fluctuating Dirac eigenvalues. Also models with random
Wilson loops51), 52) have attracted significant interest.
§3. Phases of QCD and RMT
QCD-like theories with charged Goldstone bosons have a critical chemical poten-
tial equal to mπ/2. The phase transition to the Bose condensed phase can therefore
be described completely in terms of a chiral Lagragian. At the mean field level,53)
the kinetic terms of this chiral Lagrangian do not contribute, so that these results
can also be obtained from chiral random matrix theory. Indeed, the static part of
the chiral Lagrangian53), 54)
F 2πµ
2Tr[U,B][U †, B]−
ΣTr(MU +MU †). (3.1)
can also be obtained from the large N limit of the models (2.4) or (2.5).
4 K.Splittorff and J.J.M. Verbaarschot
Tricritial point
0.1 0.2 0.3 0.4 0.5 0.6 0.7
Fig. 2. QCD phase diagram in the µTm-space (taken from58))
In Fig. 1 we display lattice results for QCD with Nc = 2
55) and phase quenched
QCD.56) They show an impressive agreement with the results from (3.1) given by
the solid curves in both figures.
3.1. Schematic RMT Phase Diagram
The phase transition in QCD with Nc = 3 at µc = mN/3 cannot be analyzed
by means of chiral Lagrangians. Because of the sign problem lattice studies are not
possible either. In such situation there is long tradition to analyze the same problem
in a much simpler theory in the hope of obtaining at least a qualitative understanding
of the problem. For example, one dimensional QCD,49), 50) or more recently, super
Yang-Mills theory and AdS-CFT duality,57) been explored as toy models for QCD.
We will use random matrix theory at T 6= 0 and µ 6= 0, introduced in (2.3) and
(2.4) to obtain a qualitive understanding of the QCD phase diagram. Lattice QCD
simulations show that the chiral phase transition at µ = 0 is of second order or a
steep cross-over. At T = 0 we expect a first order phase transition at µc = mN/3.
It is natural that the first order line ends in a critical end point or joins the second
order critical line at the tricritical point (see Fig. 3.1, left). This is indeed what
is observed in random matrix theory58), 59) (see Fig. 3.1, right). A similar phase
diagram has also been obtained from the NJL model.60)–62)
Another scenario that was discovered in RMT is the splitting of the first order
line into two at nonzero isospin chemical potential.63) This behavior was also found
in a NJL model64), 65) but might not be stable against flavor mixing interactions.66)
§4. Dirac Spectrum in Theories Without a Sign Problem
Since the spectrum of the Dirac operator determines the chiral condensate, phase
transitions in QCD can be understood in terms of its spectral flow. In this section we
discuss theories with a positive fermion determinant such as QCD with two colors and
phase quenched QCD, where a probabilistic interpretation of the eigenvalue density
is possible. The relation between chiral symmetry breaking and Dirac spectra is
much more complicated when the fermion determinant is complex and its discussion
will be postponed to the next section.
The spectrum of an anti-Hermitean Dirac operator is purely imaginary with an
eigenvalue density that is proportional to the volume. If chiral symmetry is broken
spontaneously, the chiral condensate becomes discontinuous across the imaginary
axis in the thermodynamic limit. Chiral symmetry is restored if such discontinuity
Random Matrix Theory 5
mm m m m m
T < Tc
µ = 0
T > Tc
µ = 0 T < Tc
µ < µc
T < Tc
µ = µc
T < Tc
µ > µc
T > Tc
µ > µc
Fig. 3. Critical behavior of the Dirac spectrum. µc = mπ/2 for T = 0 and increases with T .
is absent for example by the formation of a gap in the Dirac spectrum, see eg.71) .
For µ 6= 0, the Dirac spectrum broadens into a strip of width 4µ2F 2π/Σ.49), 67)
The chemical potential becomes critical when the quark mass hits the edge of this
strip. At this point the chiral condensate starts rotating into a pion condensate.
Chiral symmetry restoration takes place when a gap forms at zero. A schematic
picture of the critical behavior of Dirac eigenvalues is shown in Fig. 3 and the spectral
flow of the Dirac eigenvalues with respect to increasing µ and T is summarized in
Fig. 4. One conclusion from this behavior is that Tc(µ) is a concave function
of µ, and that µc(T ) is a convex function of T . The spectral flow discussed in this
section is supported by lattice simulations at T 6= 0 and µ 6= 0 (See Fig. 5)
4.1. Dirac spectrum in the µ-plane
We could equally well have diagonalized the Dirac operator in a representation
where µγ0 is proportional to the identity,
det(D +m+ µγ0) = det(γ0(D +m) + µ). (4.1)
These eigenvalues are relevant to the baryon number density. A gap in the spectrum
develops at m 6= 0 (see Fig. 6), and the chemical potential becomes critical, µ =
mπ/2 when it hits the inner edge of the domain of eigenvalues.
Increasing µ
Increasing T
Fig. 4. Spectral flow of the Dirac spectrum (left) and phase diagram (right) with respect to µ and
T in phase quenched QCD and QCD with two colors.
6 K.Splittorff and J.J.M. Verbaarschot
1 1.5 2 2.5
b=0.35
b=0.3525
b=0.355
b=0.3575
b=0.36
1.76(t-0.93)
0.0 0.1 0.2 0.3
β=5.5
β=5.66
β=5.71
β=5.75 β=5.9
Fig. 5. Temperature and chemical potential dependence of Dirac eigenvalues. From left to right
taken from.70), 72)–74)
4.2. Quenched Lattice QCD Dirac Spectra at µ 6= 0
Small Dirac eigenvalues at µ 6= 0 have been computed in quenched QCD. The
analytical formulas for the average density of the small Dirac eigenvalues are avail-
able.68), 69) They were first derived68) by exploiting the Toda lattice hierarchy in the
flavor index. Comparisons of random matrix predictions68) for the radial spectral
density and lattice QCD results75), 76) are shown in the left panel of Fig. 7. In other
cases, such as the overlap Dirac operator77) and QCD with Nc = 2,
78) a similar
degree of agreement was found. Both the spectral density and two-point correlations
can be derived from the Lagrangian (3.1), i.e. they are determined by two param-
eters, Fπ and Σ. This can be exploited to extract these low-energy constants. For
example, Fπ and Σ were determined
19), 21) (see also20)) from the correlators shown
in the two right panels of Fig. 7.
§5. Chiral Symmetry Breaking at µ 6= 0
The full QCD partition function at µ 6= 0 which is the average of
det(D +m+ µγ0) = |det(D +m+ µγ0)|eiθ, θ 6= 0, (5.1)
has properties which are drastically different from the phase quenched partition
function where the phase factor is absent. In particular, µc = mN/3 instead of mπ/2,
so that the free energy remains µ-independent until µ = mN/3. For µ < mN/3 the
Fig. 6. Eigenvalues of γ0(D + m) for a random matrix Dirac operator at m = 0 (left), m 6= 0
(middle) (both taken from79)), and lattice QCD at m 6= 0 (right, taken from49)).
Random Matrix Theory 7
—– Splittorff-Verbaarschot-2004
—– Wettig-2004
0 2 4 6 8
−0.15
−0.05
V = 8
  10000 configs
µisoFπV
 = 0.159
1.27 1.37 1.47 π/2 1.67 1.77 1.87
angle (θ)
lattice: 6
, µa = 0.006
fit: µFV
 = 0.14
Fig. 7. The radial spectral density for (left, taken from75), 76)) and two-point correlations (middle
taken from19) and right taken from21)).
chiral condensate remains discontinuous at m = 0, whereas the chiral condensate
of the phase quenched theory approaches zero for m → 0 (see Fig. 5). The only
difference between the phase quenched partition function and the full QCD partition
function is the phase of the fermion determinant. We conclude that the phase factor
is responsible for the discontinuity of the chiral condensate. How can this happen if
for each configuration the support of the spectrum is approximately the same? This
problem known as the “Silver Blaze Problem”80) was solved in.6)
5.1. Unquenched Spectral Density
The spectral density for QCD with dynamical fermions is given by
ρNf (λ) = 〈
δ2(λ− λk)detNf (D +m+ µγ0)〉. (5.2)
Because of the phase of the fermion determinant, this density is in general complex
and can be decomposed as ρNf (λ) = ρNf=0(λ) + ρU (λ). The chiral condensate can
then be decomposed as ΣNf (m) = ΣNf=0(m) +ΣU (m), so that the discontinuity in
Σ(m) is due to ρU . Asymptotically it behaves as
ρU ∼ e
µ2F 2V e
iIm(λ)ΣV
and vanishes outside an ellips starting at Re(λ) = m (see Fig. 9).6) In the right part
of this figure we show the real part of the spectral density for QCD with one flavor
at nonzero chemical potential.
Scatter plot of Dirac eigenvalues
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
quark mass m
Support of spectrum
Chiral condensate
condensate
Quenched chiral
in full QCD
µ2F 2
Σ(m) = 1
Fig. 8. Chiral condensate of quenched and full QCD.
8 K.Splittorff and J.J.M. Verbaarschot
Dirac spectrum for Full QCD.
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
�����
Oscillating Region
quark mass m
-1000100
0.001
0.002
2F 2µ2
µ2F 2
Fig. 9. Support (left) and real part (right, taken from27)) of Dirac spectral density for QCD with
Nf = 1 and µ 6= 0.
This result explains the mechanism of chiral symmetry breaking at nonzero
chemical potential. The phase of the fermion determinant rotates the pion conden-
sate back into a chiral condensate, but it does so in an unexpected way.6) The same
mechanism is at play for 1d QCD at µ 6= 0.82)
§6. Phase of the Fermion Determinant
The magnitude of the sign problem can be measured by means of the expectation
value of the phase factor of the fermion determiant which can be defined in two ways
〈e2iθ〉Nf =
det(D + µγ0 +m)
det∗(D + µγ0 +m)
detNf (D + µγ0 +m)
, 〈e2iθ〉1+1∗ =
ZNf=2
Z1+1∗
The average 〈· · · 〉 is with respect to the Yang-Mills action. The sign problem is
managable when the average phase factor remains finite in the thermodynamic limit.
In the microscopic domain it is possible to obtain exact analytical expressions for
the average phase factor by exploiting the equivalence between QCD and RMT in
this domain. For µ < mπ/2 the free energy of both QCD and phase quenched
QCD are independent of µ. This does not imply that the average phase factor is
µ-independent. The µ-dependence originates from the charged Goldstone bosons
with mass mπ ± 2µ, and for Nf flavors the mean field result83), 84) for 〈exp(2iθ)〉
reads (1 − 4µ2/m2π)Nf+1. The exact result for the average phase factor for Nf = 2
is shown in Fig. 10 (right), where lattice results85) are also shown (left). The exact
result has an essential singularity at µ = 0, but its thermodyanmic limit agrees with
the mean result.
0 0.5 1 1.5
2µ/mπ
mΣV  =  4
mΣV >> 1
Fig. 10. Average phase factor. Lattice QCD results are shown left (taken from85)) and the exact
microscopic result83) is shown right.
Random Matrix Theory 9
§7. Conclusions
The equivalence of chiral random matrix theory and QCD has been exploited
succesfully to derive a host of analytical results. Among others, eigenvalue fluctua-
tions predicted by chRMT have been observed in lattice simulations, the phases of
QCD can be understood in terms of spectral flow, observables can be extracted from
the fluctuations of the smallest eigenvalues, the sign problem is not serious when the
quark mass is outside the domain of the eigenvalues, and mean field results can be
obtained from random matrix theory. Summarizing, chiral random matrix theory is
a powerful tool for analyzing the infrared domain of QCD.
Acknowledgements
The YITP is thanked for its hospitality. G. Akemann, J. Osborn and P.H.
Damgaard are acknowledged for valuable discussions. This work was supported by
US DOE Grant No. DE-FG-88ER40388 (JV), the Villum Kann Rasmussen Foun-
dation (JV), the Danish National Bank (JV) and the Carslberg Foundation (KS).
References
1) E.P. Wigner, Proc. Cam. Phil. Soc. 47 (1951) 790.
2) M. Sener and K. Schulten, Phys. Rev. E 65, 031916 (2002).
3) T. Guhr, A. Muller-Groeling and H. A. Weidenmuller, Phys. Rept. 299, 189 (1998).
4) P. J. Forrester, N. C. Snaith and J. J. M. Verbaarschot, J. Phys. A 36, R1 (2003).
5) T. Banks and A. Casher, Nucl. Phys. B 169, 103 (1980).
6) J. C. Osborn, K. Splittorff and J. J. M. Verbaarschot, Phys. Rev. Lett. 94, 202001 (2005).
7) C. Gattringer, Phys. Rev. Lett. 97, 032003 (2006).
8) F. Synatschke, A. Wipf and C. Wozar, arXiv:hep-lat/0703018.
9) E. V. Shuryak and J. J. M. Verbaarschot, Nucl. Phys. A 560, 306 (1993).
10) J. J. M. Verbaarschot, Phys. Rev. Lett. 72, 2531 (1994).
11) J. J. M. Verbaarschot and I. Zahed, Phys. Rev. Lett. 70, 3852 (1993).
12) S. M. Nishigaki, P. H. Damgaard and T. Wettig, Phys. Rev. D 58, 087704 (1998).
13) P. H. Damgaard and S. M. Nishigaki, Phys. Rev. D 63, 045012 (2001).
14) M. E. Berbenni-Bitsch et al. , Nucl. Phys. Proc. Suppl. 63, 820 (1998).
15) P. H. Damgaard et al. Phys. Lett. B 495, 263 (2000)
16) T. DeGrand, R. Hoffmann, S. Schaefer and Z. Liu, Phys. Rev. D 74, 054501 (2006).
17) H. Fukaya et al. [JLQCD Collaboration], arXiv:hep-lat/0702003.
18) C. B. Lang, P. Majumdar and W. Ortner, arXiv:hep-lat/0611010.
19) P. Damgaard, U. Heller, K. Splittorff and B. Svetitsky, Phys. Rev. D 72, 091501 (2005).
20) P. Damgaard, U. Heller, K. Splittorff, B. Svetitsky and D. Toublan, Phys. Rev. D 73,
105016 (2006).
21) J. C. Osborn and T. Wettig, PoS LAT2005, 200 (2006) [arXiv:hep-lat/0510115].
22) G. Akemann, P. H. Damgaard, J. C. Osborn and K. Splittorff, Nucl. Phys. B 766, 34
(2007).
23) M. A. Stephanov, J. J. M. Verbaarschot and T. Wettig, arXiv:hep-ph/0509286.
24) J. J. M. Verbaarschot and T. Wettig, Ann. Rev. Nucl. Part. Sci. 50, 343 (2000).
25) J. J. M. Verbaarschot, arXiv:hep-th/0502029.
26) M. A. Nowak, arXiv:hep-ph/0112296.
27) K. Splittorff, PoS LAT2006 023, arXiv:hep-lat/0610072.
28) G. Akemann, arXiv:hep-th/0701175.
29) J. J. M. Verbaarschot, Phys. Lett. B 368, 137 (1996).
30) J. C. Osborn, D. Toublan and J. J. M. Verbaarschot, Nucl. Phys. B 540, 317 (1999).
31) P. Damgaard, J. Osborn, D. Toublan and J. Verbaarschot, Nucl. Phys. B 547, 305 (1999).
10 K.Splittorff and J.J.M. Verbaarschot
32) J. Gasser and H. Leutwyler, Phys. Lett. B 188, 477 (1987).
33) H. Leutwyler and A. Smilga, Phys. Rev. D 46, 5607 (1992).
34) G. Akemann, P. H. Damgaard, U. Magnea and S. Nishigaki, Nucl. Phys. B 487, 721 (1997).
35) A. D. Jackson and J. J. M. Verbaarschot, Phys. Rev. D 53, 7223 (1996).
36) T. Wettig, H. A. Weidenmueller and A. Schaefer, Nucl. Phys. A 610, 492C (1996).
37) M. A. Stephanov, Phys. Lett. B 375, 249 (1996).
38) A. D. Jackson, M. K. Sener and J. J. M. Verbaarschot, Nucl. Phys. B 479, 707 (1996).
39) M. A. Nowak, G. Papp and I. Zahed, Phys. Lett. B 389, 341 (1996).
40) R. A. Janik, M. A. Nowak, G. Papp and I. Zahed, Phys. Lett. B 446, 9 (1999).
41) M. A. Stephanov, Phys. Rev. Lett. 76, 4472 (1996).
42) J. C. Osborn, Phys. Rev. Lett. 93, 222001 (2004).
43) G. Akemann and A. Pottier, J. Phys. A 37, L453 (2004).
44) Y.V. Fyodorov, B. Khoruzhenko and H.J. Sommers, Ann. Inst. Henri Poincaré: Phys.
Theor. 68, 449 (1998).
45) G. Akemann, Phys. Rev. Lett. 80, 072002 (2002); J. Phys. A: Math. Gen. 36, 3363 (2003).
46) M. C. Bergere, arXiv:hep-th/0311227; M. C. Bergere, arXiv:hep-th/0404126.
47) T. Eguchi and H. Kawai, Phys. Rev. Lett. 48, 1063 (1982).
48) D. J. Gross and E. Witten, Phys. Rev. D 21, 446 (1980).
49) P. E. Gibbs, Preprint PRINT-86-0389-GLASGOW, 1986.
50) N. Bilic and K. Demeterfi, Phys. Lett. B 212, 83 (1988).
51) B. Durhuus and P. Olesen, Nucl. Phys. B 184, 461 (1981).
52) A. Dumitru et al., Phys. Rev. D 70, 034511 (2004).
53) J.B. Kogut et al., Nucl. Phys. B 582, 477 (2000).
54) J.B. Kogut, M.A. Stephanov and D. Toublan, Phys. Lett. B 464, 183 (1999).
55) S. Hands et al., ZEur. Phys. J. C 17, 285 (2000).
56) J. B. Kogut and D. K. Sinclair, Phys. Rev. D 66, 034505 (2002).
57) G. Policastro, D. T. Son and A. O. Starinets, Phys. Rev. Lett. 87, 081601 (2001).
58) M. Halasz et al., Phys. Rev. D 58, 096007 (1998).
59) B. Vanderheyden and A. D. Jackson, Phys. Rev. D 62, 094010 (2000).
60) A. Barducci et al. Phys. Rev. D 41, 1610 (1990).
61) J. Berges and K. Rajagopal, Nucl. Phys. B 538, 215 (1999).
62) R. A. Janik, M. A. Nowak, G. Papp and I. Zahed, Nucl. Phys. A 642, 191 (1998).
63) B. Klein, D. Toublan and J. J. M. Verbaarschot, Phys. Rev. D 68, 014009 (2003).
64) A. Barducci, R. Casalbuoni, G. Pettini and L. Ravagli, Phys. Rev. D 72, 056002 (2005).
65) D. N. Walters and S. Hands, Nucl. Phys. Proc. Suppl. 140, 532 (2005).
66) M. Frank, M. Buballa and M. Oertel, Phys. Lett. B 562, 221 (2003).
67) D. Toublan and J. J. M. Verbaarschot, Int. J. Mod. Phys. B 15, 1404 (2001).
68) K. Splittorff and J. J. M. Verbaarschot, Nucl. Phys. B 683, 467 (2004).
69) G. Akemann, Nucl. Phys. B 730, 253 (2005).
70) R. Narayanan and H. Neuberger, Nucl. Phys. B 696, 107 (2004).
71) F. Farchioni et al. Phys. Rev. D 62, 014503 (2000).
72) P. Damgaard, U. Heller, R. Niclasen and K. Rummukainen, Nucl. Phys. B 583, 347 (2000).
73) I. Barbour et al., Nucl. Phys. B 275, 296 (1986);
74) S. Muroya, A. Nakamura, C. Nonaka and T. Takaishi, Prog. Theor. Phys. 110, 615 (2003).
75) T. Wettig, private communication.
76) G. Akemann and T. Wettig, Phys. Rev. Lett. 92, 102002 (2004) [Ibid. 96, 029902 (2006)].
77) J. Bloch and T. Wettig, Phys. Rev. Lett. 97, 012003 (2006).
78) G. Akemann et al., Nucl. Phys. Proc. Suppl. 140, 568 (2005).
79) M. Halasz, J. Osborn, M. Stephanov and J. Verbaarschot, Phys. Rev. D 61, 076005 (2000).
80) T. D. Cohen, Phys. Rev. Lett. 91, 222001 (2003); arXiv:hep-ph/0405043.
81) G. Akemann, J. Osborn, K. Splittorff and J. Verbaarschot, Nucl. Phys. B 712, 287 (2005).
82) L. Ravagli and J.J.M. Verbaarschot, in preparation.
83) K. Splittorff and J. J. M. Verbaarschot, Phys. Rev. Lett. 98, 031601 (2007).
84) K. Splittorff and J. J. M. Verbaarschot, arXiv:hep-lat/0702011.
85) D. Toussaint, Nucl. Phys. Proc. Suppl. 17, 248 (1990).
ABSTRACT
  We review applications of random matrix theory to QCD at nonzero temperature
and chemical potential. The chiral phase transition of QCD and QCD-like
theories is discussed in terms of eigenvalues of the Dirac operator. We show
that for QCD at $\mu \ne 0$, which has a sign problem, the discontinuity in the
chiral condensate is due to an alternative to the Banks-Casher relation. The
severity of the sign problem is analyzed in the microscopic domain of QCD.

<|endoftext|><|startoftext|>
Microsoft Word - MS737.rtf
Manuscript submitted as a Letter to the Editor. 
Title:   
Symmetries by base substitutions in the genetic code predict 2’ or 3’ aminoacylation of tRNAs. 
Authors: Jean-Luc Jestina, Christophe Souléb 
Addresses: 
aUnité de Chimie Organique, URA 2128 CNRS 
Département de Biologie Structurale et Chimie, Institut Pasteur 
25 rue du Dr. Roux, 75724 Paris 15, France 
email: jjestin@pasteur.fr (corresponding author) 
tel +33 1 4438 9496; fax +33 1 4568 8404 
bInstitut des Hautes Etudes Scientifiques, CNRS 
35 route de Chartres, 91440 Bures-sur-Yvette, France 
email: soule@ihes.fr 
Key words :  
Mutation; degeneracy; aminoacyl-tRNA synthetase; codon; symmetry breaking. 
Understanding why the genetic code is the way it is, has been the subject of numerous 
models and still remains largely a challenge (Freeland et al., 2000; Sella and Ardell, 
2006). Associations between codons and amino acids were suggested to rely on RNA-
amino acid interactions (Raszka and Mandel, 1972; Yarus, 1998). Closely related 
codons were put in correspondence with closely related amino acids within their 
biosynthetic pathways (Wong, 2005). Codons have also been grouped into systems 
characterized by interlocked thermodynamic cycles (Klump, 2006). Evolutionary 
models that minimise the number of the most frequent mutations provide a rationale for 
the fact that transitions at the third base of codons are mostly neutral mutations 
(Goldberg and Wittes, 1966). Similarly, minimization of the deleterious effects of 
sequence-dependent single-base deletions catalyzed by DNA polymerases provides a 
rationale for the assignment of stop signals to codons (Jestin and Kempf, 1997). While 
in-frame stop codons are strictly selected against, out-of-frame stop codons minimize 
the costs of ribosomal slippages (Seligmann and Pollock, 2004). In this context, the 
frequencies of codons were found to be highly dependent on the reading frame and 
highlighted a symmetrical codon pattern (Koch and Lehmann, 1997). As the genetic 
code is quasi-universal among living organisms, models do not need to be time-
dependent, even though time-dependent models have been suggested (Bahi and Michel, 
2004; Rodin and Rodin, 2006; Sella and Ardell, 2006). Symmetries in the genetic code 
are of special interest as they may highlight underlying organization principles of the 
code. A supersymmetric model for the evolution of the genetic code was proposed: 
successive breaking of these symmetries would provide an evolutive scenario for the 
decomposition into sets of synonymous codons (Hornos and Hornos, 1993; Bashford et 
al., 1997). When the amino acids are mapped to the vertices of a 28-gon, three two-fold 
symmetries were identified for three subsets of the cognate aminoacyl-tRNA 
synthetases (Yang, 2004). 
This letter reports complete sets of two-fold symmetries between partitions of the 
universal genetic code. By substituting bases at each position of the codons according 
to a fixed rule, it happens that properties of the degeneracy pattern or of tRNA 
aminoacylation specificity are exchanged. 
First the set of sixty-four codons of the genetic code was partitionned in two groups of 
thirty-two codons depending on whether the third base of triplets is necessary or not to 
define unambiguously an amino acid or a stop signal (property 1). Rumer reported a 
symmetry by base substitutions that alters property 1 (Rumer, 1966) . The substitutions 
exchanging T and G as well as A and C are applied to all three codon bases and are 
called Rumer’s transformation. If the third base is necessary to define an amino acid, 
then the symmetrical codon by Rumer’s transformation does not require the third base 
of codons to be defined so as to define unambiguously the amino acid. Conversely, if 
the third base does not have to be defined so as to define unambiguously an amino acid, 
then the symmetrical codon by Rumer’s transformation requires the third base to be 
given so as to define unambiguously the amino acid. More recently, one of the authors 
reported a symmetry that leaves unchanged property 1 (Jestin, 2006): this symmetry 
consists in applying to the first base of codons the substitutions exchanging G and C as 
well as T and A. For example, GCN codons coding for alanine are exchanged into CCN 
codons coding for proline; for GCN and CCN codons, the third base does not have to 
be defined so as to define unambiguously the amino acid. 
Here we report a third symmetry that alters property 1 (Fig.1). This symmetry is 
obtained by applying successively the two symmetries described above. It consists in 
applying the substitution exchanging A and G as well as C and T (a transition) to the 
first base in the codon, the substitution exchanging A and C as well as G and T (a 
transversion) to the second base in the codon, and the substitution exchanging A and C 
as well as G and T (a transversion) in the third base of the codon.  
We show further that the only other symmetries exchanging both groups into each other 
are obtained by combining the previous ones with a symmetry acting only on the third 
base of the codons (here we do not include the substitution on the second base which 
exchanges A and C when fixing G and T). This can be seen by counting the number of 
occurrences of A, C, G, and T as first, second or third base in a codon of each group. 
The result is given in Table 1. 
These symmetries are valid for the standard genetic code and for other genetic codes 
such as the vertebrate mitochondrial genetic code which has a higher degree of 
symmetry of its degeneracy pattern as noted earlier (Lehmann, 2000; Jestin, 2006). 
In addition to the existence of Rumer’s transformation, Shcherbak discussed the 
following Rumer’s rule (Shcherbak, 1989), which can be read off Table 1: the ratio R = 
C+G/T+A of the number of occurrences of C and G by the number of occurrences of T 
and A in positions 1, 2 and 3 is equal to 3, 3 and 1 respectively in codons of the first 
group (and hence it is 1/3, 1/3 and 1 for codons of the second group). Similarly, the 
ratio P = T+C/A+G is 1, 3 and 1 in positions 1, 2 and 3 of the first group of codons. 
Secondly, we considered another grouping of codons of the genetic code depending on 
whether the amino acids are acylated by amino acyl-tRNA synthetases at the 2’ or at 
the 3’ hydroxyl group of the tRNA’s last ribose (property 2) (Sprinzl and Cramer, 1975; 
Arnez and Moras, 1994). This classification of amino acyl-tRNA synthetases is very 
similar to the one based on sequence homology and on structural considerations (Eriani 
et al., 1990; Cusack, 1997). Class I synthetases contain HIGH and KMSKS consensus 
sequences, which are absent from class II amino acyl tRNA synthetases. At the 
structural level, class I synthetases also contain a Rossman fold, a domain that binds 
nucleotides, unlike class II synthetases. Class I enzymes catalyse acylation at the 2’ 
hydroxyl group of the tRNA while class II enzymes generally catalyse acylation at the 
3’ hydroxyl group of the tRNA. PheRS as a class II enzyme that catalyses acylation at 
the tRNA’s 2’ hydroxyl group is therefore an exception.  
The case of cysteinyl-tRNACys synthetase (CysRS) is ambiguous and was investigated 
recently. CysRS is a class I synthetase, but establishes contacts with the major groove 
of the acceptor stem of the tRNACys as commonly found for class II enzymes. The 
enzyme from Escherichia coli is able to catalyse the acylation reaction at both 2’ and 3’ 
hydroxyl groups of the tRNACys. The 2’ acylation is about one order of magnitude 
faster than the 3’ acylation when catalysed by E. coli cysteinyl-tRNA synthetase in 
vitro (Shitivelband and Hou, 2005).  
The following classification was then used for 2’ acylated amino acids (Ile, Leu, Met, 
Val, Trp, Tyr, Arg, Gln, Glu, Phe) and for 3’ acylated amino acids (His, Pro, Ser, Thr, 
Asn, Asp, Lys, Ala, Gly).  To the class of 2’ acylated amino acids we also added the 
stop signals, a choice partially justified by the fact that two stop codons of the 
mitochondrial code of vertebrates code for the 2’ acylated amino acid Arg in the 
universal code. Note that if cysteine were not in the class 3’, or if a stop signal was not 
in the class 2’, symmetries could not be identified. If cysteine is assigned to the class 2’ 
as suggested by the previous paragraph, the symmetries are broken. Loss of the 
symmetries might have occurred during the evolution of aminoacyl-tRNA synthetases 
and might be associated to the late appearance of this amino acid in the genetic code 
(Brooks and Fresco, 2002).  
When considering molecular properties such as polarity, volume and hydrophobicity, 
no statistical differences were noted between class 2’ and class I on one hand, class 3’ 
and class II on the other hand (Table 3). 
There exist two symmetries by base substitutions that exchange the class 2’ with the 
class 3’ of the corresponding codon groups (cf. Fig.2). They consist in applying the 
substitution exchanging A and C as well as G and T (a transversion) to the first base of 
the codon, the substitution exchanging A and G as well as C and T (a transition) to the 
second base of the codon, and the substitution exchanging A and C as well as G and T 
or A and T as well as C and G (a transversion) to the third base of the codon. These two 
symmetries differ by the substitution exchanging A and G as well as C and T in the 
third position. They are not related to those depicted in Figures 4 and 5 (Yang, 2004) as 
Yang’s three symmetries act only on three subsets of amino acids whereas the 
symmetries described herein are valid for the whole codon table. 
There are no other symmetries by base substitutions between the two classes 2’ and 3’, 
as can be seen by counting the occurrences of A, C, G and T in each class and each 
position (Table 2). Note also the following analog of the Rumer’s rule: both the ratio R 
= C+G / T+A  and the ratio Q = A+C / G+T  are equal to 1, 1/3, 1 in positions 1,  2,  3 
respectively in the class 2’ (and 1, 3, 1 in the class 3’).  
In this letter we have described new symmetries by base substitutions in the genetic 
code for partitions concerning the codon degeneracy level or the tRNA-aminoacylation 
class. Several evolutionary models have been proposed concerning tRNAs and their 
aminoacyl-tRNA synthetases (Martinez Gimenez and Tabares Seisdedos, 2002; 
Klipcan and Safro, 2004; Chechetkin, 2006; Di Giulio, 2006). Newly introduced amino 
acids may well have been selected to minimize the deleterious effects of 
mistranslations, and possibly according to their molecular volumes (Torabi et al., 
2006). A unique serie of binary divisions of the codon table was recently noted: when 
the same differentiation rule was applied at each division, the class I / class II pattern 
arose consistently (Delarue, 2007). Aminoacyl-tRNA synthetases are likely to have 
evolved by gene duplication and mutation of primordial synthetases within each class, 
as evidenced by sequence homology (Woese et al., 2000). Consistently, the symmetries 
highlighted in this manuscript require three base substitutions per codon, which are 
unlikely to happen, thereby shedding some light on the duplication and divergence 
mechanism of evolution among the two classes of aminoacyl-tRNA synthetases. 
Acknowledgements  :  
We thank H. Epstein, E. Yeramian, D. Moras, B. Prum and J. Perona for their help. 
References : 
Arnez, J. G., Moras, D. 1994. Aminoacyl-tRNA synthetase tRNA recognition. Oxford, 
IRL Press 61-81. 
Bahi, J. M., Michel, C. J. 2004. A stochastic gene evolution model with time dependent 
mutations. Bull. Math. Biol. 66, 763-778. 
Bashford, J. D., Tsohantjis, I., Jarvis, P. D. 1997. Codon and nucleotide assignments in a 
supersymmetric model of the genetic code. Phys. Lett. A 233, 481-488. 
Brooks, D. J., Fresco, J. R. 2002. Increased frequency of cysteine, tyrosine, and 
phenylalanine residues since the last universal ancestor. Mol. Cell. Proteomics 1, 
125-131. 
Chechetkin, V. R. 2006. Genetic code from tRNA point of view. J. Theor. Biol. 242, 922-
934. 
Cusack, S. 1997. Aminoacyl-tRNA synthetases. Curr. Opin. Struct. Biol. 7, 881-889. 
Delarue, M. 2007. An asymmetric underlying rule in the assignment of codons. RNA 13, 
161-169. 
Di Giulio, M. 2006. The non-monophyletic origin of the tRNA molecule and the origin of 
genes only after the evolutionary stage of the last universal common ancestor. J. 
Theor. Biol. 240, 343-352. 
Di Giulio, M., Capobianco, M. R., Medugno, M. 1994. On the optimization of the 
physicochemical distances between amino acids in the evolution of the genetic code. 
J. Theor. Biol. 168, 43-51. 
Eriani, G., Delarue, M., Poch, O., Gangloff, J., Moras, D. 1990. Partition of tRNA 
synthetases into two classes based on mutually exclusive sets of sequence motifs. 
Nature 347, 203-206. 
Freeland, S. J., Knight, R. D., Landweber, L. F., Hurst, L. D. 2000. Early fixation of an 
optimal genetic code. Mol. Biol. Evol. 17, 511-518. 
Goldberg, A. L., Wittes, R. E. 1966. Genetic code: aspects of organization. Science 153, 
420-424. 
Hornos, J. E. M., Hornos, Y. M. M. 1993. Algebraic model for the evolution of the 
genetic code. Phys. Rev. Lett. 71, 4401-4404. 
Jestin, J. L. 2006. Degeneracy in the genetic code and its symmetries by base 
substitutions. C. R. Biol. 329, 168-171. 
Jestin, J. L., Kempf, A. 1997. Chain-termination codons and polymerase-induced 
frameshift mutations. FEBS Letters 419, 153-156. 
Klipcan, L., Safro, M. 2004. Amino acid biogenesis, evolution of the genetic code and 
aminoacyl-tRNA synthetases. J. Theor. Biol. 228, 389-396. 
Klump, H. H. 2006. Exploring the energy landscape of the genetic code. Arch. Biochem. 
Biophys. 453, 87-92. 
Koch, A. J., Lehmann, J. 1997. About a symmetry of the genetic code. J. Theor. Biol. 189, 
171-174. 
Kyte, J., Doolittle, R. F. 1982. A simple method for displaying the hydropathic character 
of a protein. J. Mol. Biol. 157, 105-132. 
Lehmann, J. 2000. Physico-chemical constraints connected with the coding properties of 
the genetic system. J. Theor. Biol. 202, 129-144. 
Martinez Gimenez, J. A., Tabares Seisdedos, R. 2002. On the dimerization of the 
primitive tRNAs: implications in the origin of genetic code. J. Theor. Biol. 217, 493-
498. 
Raszka, M., Mandel, M. 1972. Is there a physical chemical basis for the present genetic 
code? J. Mol. Evol. 2, 38-43. 
Rodin, S. N., Rodin, A. S. 2006. Origin of the genetic code: first aminoacyl-tRNA 
synthetases could replace isofunctional ribozymes when only the second base of 
codons was established. DNA Cell Biol. 25, 365-375. 
Rumer, Y. B. 1966. About the codon's systematization in the genetic code. Proc. Acad. 
Sci. USSR 167, 1393-1394. 
Seligmann, H., Pollock, D. D. 2004. The ambush hypothesis: hidden stop codons prevent 
off-frame gene reading. DNA Cell Biol. 23, 701-705. 
Sella, G., Ardell, D. H. 2006. The coevolution of genes and genetic codes: Crick's frozen 
accident revisited. J. Mol. Evol. 63, 297-313. 
Shcherbak, V. I. 1989. Rumer's rule and transformation in the context of the co-operative 
symmetry of the genetic code. J. Theor. Biol. 139, 271-276. 
Shitivelband, S., Hou, Y. M. 2005. Breaking the stereo barrier of amino acid attachment to 
tRNA by a single nucleotide. J. Mol. Biol. 348, 513-521. 
Sprinzl, M., Cramer, F. 1975. Site of aminoacylation of tRNAs from Escherichia coli with 
respect to the 2'- or 3'-hydroxyl group of the terminal adenosine. Proc. Natl. Acad. 
Sci. USA 72, 3049-3053. 
Torabi, N., Goodarzi, H., Najafabadi, H. S. 2006. The case of an error minimizing set of 
coding amino acids. J. Theor. Biol. in press. 
Woese, C. R., Olsen, G. J., Ibba, M., Soll, D. 2000. Aminoacyl-tRNA synthetases, the 
genetic code, and the evolutionary process. Microbiol. Mol. Biol. Rev. 64, 202-236. 
Wong, J. T. 2005. Coevolution theory of the genetic code at age thirty. Bioessays 27, 416-
Yang, C. M. 2004. On the 28-gon symmetry inherent in the genetic code intertwined with 
aminoacyl-tRNA synthetases--the Lucas series. Bull. Math. Biol. 66, 1241-1257. 
Yarus, M. 1998. Amino acids as RNA ligands: a direct-RNA-template theory for the 
code's origin. J. Mol. Evol. 47, 109-117. 
Figure Legends : 
Figure 1 
Exchange of Group I (codons for which the third base does not have to be defined to 
specify the amino acid) into Group II (codons for which the third base must be 
defined to specify unambiguously the amino acid or the stop signal) by the 
transformation (AG/CT for the first base, GT/AC for the second and third bases). 
N=A,T,G or C; H=A,T or C; Y=T or C; R=A or G. 
Figure 2 
Exchange of the classes 2’ and 3’ by the transformation (AC/GT on the first base, 
AG/CT on the second base, AC/GT on the third base). The special case of cysteine is 
labelled by an asterisk and discussed in the text. 
Table I 
Number of occurences of the bases A, C, G and T at each position within the 
codon in each group. 
Table II 
Number of occurences of the bases A, C, G and T at each position within the 
codon in each class. 
Table III 
Statistical t-values computed from the data on hydrophobicity (Kyte and 
Doolittle, 1982), molecular volume and polarity (Di Giulio et al., 1994) 
comparing the class 2’ with class I, and the class 3’ with class II. These values are 
below the threshold of significance given in the Student’s table. 
                        A        C        G       T 
           
Base 1   Group I              4               12              12             4          
             Group II            12                4               4             12 
   ____________________________ 
Base 2   Group I               0               16              8              8    
              
              Group II            16                 0              8              8 
   ____________________________ 
Base 3   Group I               8                8               8              8 
                  
              Group II              8                8               8              8  
                                                     Table 1   
                   
                        A        C        G       T 
 Base 1    Class 2’            6               10           6           10 
                                   
               Class 3’            10               6             10            6 
_____________________________ 
Base 2    Class 2’             8                 0              8            16 
                        
               Class 3’             8                16             8              0 
_____________________________ 
Base 3     Class 2’           10                6             10             6 
                  
                Class 3’             6               10             6            10  
                                                    Table 2 
                                                                                                                
   Class 2’ / Class I  Class 3’ / Class II 
Hydrophobicity  0.07     0.11 
Polarity   0.017    0.019 
Volume   0.57     0.45 
                                                    Table 3
ABSTRACT
  This letter reports complete sets of two-fold symmetries between partitions
of the universal genetic code. By substituting bases at each position of the
codons according to a fixed rule, it happens that properties of the degeneracy
pattern or of tRNA aminoacylation specificity are exchanged.

<|endoftext|><|startoftext|>
Optical properties of theHolstein-t-Jmodel fromdynamicalmean-field theory
E. Cappelluti a,b,∗, S. Ciuchi c, S. Fratini d
aDipartimento di Fisica, Università “La Sapienza”, P.le A. Moro 2, 00185 Rome, Italy
bSMC Research Center and ISC, INFM-CNR, v. dei Taurini 19, 00185 Rome, Italy
cINFM and Dipartimento di Fisica, Università dell’Aquila, via Vetoio, I-67010 Coppito-L’Aquila, Italy
dInstitut Néel - CNRS & Université Joseph Fourier, BP 166, F-38042 Grenoble Cedex 9, France
Abstract
We employ dynamical mean-field theory to study the optical conductivity σ(ω) of one hole in the Holstein-t-J model. We provide
an exact solution for σ(ω) in the limit of infinite connectivity. We apply our analysis to Nd2−xCexCuO4. We show that our model
can explain many features of the optical conductivity in this compounds in terms of magnetic/lattice polaron formation.
Key words: magnetic/lattice polarons, spin fluctuations, optical conductivity, cuprates.
PACS: 71.10.Fd, 71.38.-k, 78.20.Bh, 75.30.Ds.
The problem of a single hole in the t-J model interact-
ing also with the lattice degrees of freedom has attracted
recently a notable interest in connection with the physical
properties of the underdoped high-T
cuprates [1,2,3,4]. An
important issue in this regime is the formation of lattice
or magnetic polarons (or both of them) and their mutual
interaction. Along this line, the one-particle properties (as
the effective mass, spectral function, etc.) have been widely
investigated with different techniques. Much less effort has
been however paid to the study of the optical properties.
On the analytical ground, the definition of the optical con-
ductivity (OC) in the single hole is a delicate matter which
needs particular care even for the pure t-J or Holstein
model [5,6]. On the other hand, numerical calculations on
clusters are limited by finite size effects [7]. As a general
rule, thus, the choice of a particular theoretical approach
depends on which property is under examination and on
its feasibility to investigate it.
In this paper we summarize the main results of our work
based on the dynamical mean-field theory (DMFT). Tech-
nical details will be presented in a forthcoming longer pub-
lication [8]. In the infinite coordination number limit z →
∞, we provide an exact solution for σ(ω) as a functional
of the local one-particle Green’s function at finite temper-
ature. It should be stressed that, due to the classical treat-
ment of the magnetic background, the DMFT solution for
∗ Corresponding author. Tel: (+39) 06-49937453 fax: (+39) 06-
49937440
Email address: emmcapp@roma1.infn.it (E. Cappelluti).
0 1 2 3 4
Ref. [7]
this work
λ=1, J/t=0.4, ω
Fig. 1. Comparison between the optical conductivity σ(ω) obtained
by our DMFT solution and Lanczos diagonalization in two dimen-
sions on a finite cluster (Ref. [7]).
z → ∞ is purely local so that it cannot describe the coher-
ent propagation of holes due to the spin fluctuations, nor
the metallic Drude-like peak in σ(ω). On the other hand,
the local properties (as the average number of phonons, size
of the magnetic polaron, etc.) are well captured by this ap-
proach, [9] as well as the incoherent contributions to the
OC. We can explicitly show this feature by comparing in
Fig. 1 our DMFT results with numerical calculations using
Lanczos diagonalization for a single hole in the 2DHolstein-
t-J model on a
10 cluster [7].
The remarkably good agreement of the overall shape as-
sesses the feasibility of our approach to investigate the in-
coherent contributions to the finite frequency OC. This is-
sue is particularly important in light of the intensive de-
bate about the origin of the mid-infrared (MIR) band in the
underdoped high-T
cuprates. Different interpretations for
this feature have been discussed in the literature, involving
Preprint submitted to Elsevier 29 October 2018
http://arxiv.org/abs/0704.0333v1
charge/spin fluctuations, stripe ordering, and other mecha-
nisms. This spread of differentmechanisms reflects the pres-
ence in this doping regime of several actors, which makes
it difficult to isolate each effect from the others. A simpler
and ideal situation is the case of electron-doped cuprates, as
Nd2−xCexCuO4. In these compounds, the long-range anti-
ferromagnetic (AF) order extents up to x ≃ 0.14, so that
the low doping regime x . 0.1 we are interested in, lies well
within the AF phase. On the experimental side, in addi-
tion, a detailed and exhaustive study of the optical conduc-
tivity as a function of temperature T and of the doping x
was recently provided in Ref. [10]. In that work the authors
showed that the low doping OC spectra are characterized
at low temperature by a MIR pseudogap, with an absorp-
tion band edge which varies from EMIR ≃ 0.5− 0.6 for x =
0.05 to EMIR ≃ 0.3− 0.4 for x = 0.1, and is barely distin-
guishable for x = 0.125. Quite interestingly, increasing the
temperature leads to a filling of the pseudogap, rather than
a closing of it. Also remarkable is the temperature depen-
dence of the MIR spectral weight which does not present
any signature at the long-range Néel temperature TN but
rather a kink to a higher “pseudogap” temperature T ∗.
We show here that our approach is able to describe all
these features, and in particular the MIR band edge, in
terms of an optical gap due to the formation of a mag-
netic/lattice polaron. We define T ∗ as the temperature
where the size of the spin polaron becomes larger than
the AF correlation length, that is the maximum tempera-
ture where an injected charge actually probes the magnetic
background. In this perspective we can identify T ∗ with the
mean field Néel temperature of our model, which represents
the temperature above which the system is described by
a paramagnetic state (rather than the onset of long range
order). From Ref. [10] we get for instance T ∗ = 440 K at
x = 0.05 and T ∗ = 200 K at x = 0.125. Using the Curie-
Weiss relation T ∗
= J/4 we estimate respectively J = 152
meV (J/t = 0.126) and J = 69 meV (J/t = 0.057). Note
that such values of J do not represent the bare exchange
interaction but rather the effective spin-exchange coupling
which is reduced by hole doping. We also set ω0 = 84 meV,
consistent with the energy window of the optical phonons
in the cuprates. The electron-phonon (el-ph) coupling con-
stant is fixed to λ = 0.75 in order to reproduce the exper-
imental MIR band edge ≈ 0.5− 0.6 eV in the optical con-
ductivity at x = 0.05, and we assume λ to be independent
of the doping x. Note that with these choices no more free
adjustable parameters remain.
In Fig. 2 we show the temperature evolution of the MIR
optical conductivity for the representative cases x = 0.05
and x = 0.125 (note that in order to compare with the ex-
perimental data of Ref. [10] the tail of a Drude-peak should
be superimposed). Most remarkable is the behavior of σ(ω)
at low temperature, which shows a well defined gap for
x = 0.05 while no gap is found for x = 0.125. This fea-
ture reflects the formation of the lattice polaron and its in-
terplay with the spin degrees of freedom. While the el-ph
coupling λ = 0.75 alone is not strong enough at x = 0.125
0 0.5 1
ω  [eV]
0.5 1 1.5
ω  [eV]
0 200 400
T  [K]
x=0.05
x=0.125
T=50K
T=440K
T=540K
T=540K
T=340K
T=50K
T=190K
Fig. 2. Temperature dependence of the optical conductivity σ(ω) for
x = 0.05 and x = 0.125. Solid lines are used for T ≤ T ∗, dashed
lines for T > T ∗. Inset: loss of the MIR spectral weight ∆Neff , as
defined in Ref. [10], as function of T for x = 0.05 (filled circles) and
x = 0.125 (empty squares). Arrows mark the corresponding T ∗.
(J/t = 0.057) to establish a spin/lattice polaron, the lo-
calization effects induced by the larger exchange coupling
J/t = 0.126 at x = 0.05 favor the lattice polaron forma-
tion. This leads thus to the opening of an optical gap in
σ(ω) (this key point will be extensively discussed in a forth-
coming publication[8]). Increasing T reduces the localiza-
tion effects induced by the magnetic ordering. This makes
the positive interplay with the el-ph coupling less effective,
leading to a progressive filling of the pseudogap. Note that
this effect disappears in the disordered magnetic case for
T > T ∗, and further increasing of T leads to a reduction of
the MIR optical conductivity which is spread on a larger
energy window. This is reflected in the characteristic tem-
perature behavior of the MIR spectral weight ∆Neff , as de-
fined in Ref. [10], which presents a kink at T ∗ (inset of Fig.
2)[11].
References
[1] A.S. Mishchenko and and N. Nagaosa, Phys. Rev. Lett. 93
(2004) 0236402; Phys. Rev. B 73 (2006) 092502.
[2] O. Rösch and O. Gunnarsson, Phys. Rev. Lett. 92 (2004) 146403;
Eur. Phys. J. B 43 (2005) 11.
[3] O. Gunnarsson and O. Rösch, Phys. Rev. B 73 (2006) 174521.
[4] P. Prelovšek, R. Zeyher, and P. Horsch, Phys. Rev. Lett. 96
(2006) 086402.
[5] M.P.H. Stumpf and D.E. Logan, Eur.Phys.J.B, 8 (1999) 377.
[6] S. Fratini and S. Ciuchi, Phys. Rev. B 74 (2006) 075101.
[7] B. Bäuml et al., Phys. Rev. B 58 (1998) 3663.
[8] E. Cappelluti, S. Ciuchi and S. Fratini, in preparation (2007).
[9] E. Cappelluti and S. Ciuchi, Phys. Rev. B 66 (2002) 165102.
[10] Y. Onose et., Phys. Rev. B 69 (2004) 024504.
[11] Since we do not find any isosbestic point in our calculations,
we use the experimental energy windows of Ref. [10] to define
∆Neff , namely ωmin = 0.12 eV, ωmax = 0.42 eV for x = 0.05
and ωmax = 0.21 eV for x = 0.125.
	References
ABSTRACT
  We employ dynamical mean-field theory to study the optical conductivity
$\sigma(\omega)$ of one hole in the Holstein-t-J model. We provide an exact
solution for $\sigma(\omega)$ in the limit of infinite connectivity. We apply
our analysis to Nd$_{2-x}$Ce$_x$CuO$_4$. We show that our model can explain
many features of the optical conductivity in this compounds in terms of
magnetic/lattice polaron formation.

<|endoftext|><|startoftext|>
Introduction 
The understanding of chemical reactivity and site selectivity of the molecular 
systems has been effectively handled by the conceptual density functional theory (DFT).1 
Chemical potential, global hardness, global softness, electronegativity and electrophilicity 
are global reactivity descriptors, highly successful in predicting global chemical reactivity 
trends. Fukui function (FF) and local softness are extensively applied to probe the local 
reactivity and site selectivity. The formal definitions of all these descriptors and working 
equations for their computation have been described. 1-4 Various applications of both  
global and local reactivity descriptors in the context of chemical reactivity and site 
selectivity have been reviewed in detail.3  
Parr et al. introduced the concept of Electrophilicity (ω) as a global reactivity index 
similar to the chemical hardness and chemical potential. 5 This new reactivity index 
measures the stabilization in energy when the system acquires an additional electronic 
charge ΔN from the environment. The electrophilicity is defined as  
                            ημω 2/2=                     (1)
In Eq. (1), μ ≈ -(I+A)/2 and η ≈ (I-A)/2 are the electronic chemical potential and the 
chemical hardness of the ground state of atoms and molecules, respectively, approximated 
in terms of the vertical ionization potential (I) and electron affinity (A). The 
electrophilicity is a descriptor of reactivity that allows a quantitative classification of the 
global electrophilic nature of a molecule within a relative scale. 5 
Fukui Function (FF) 6 is one of the widely used local density functional descriptors 
to model chemical reactivity and site selectivity and is defined as the derivative of the 
electron density ρ ( r ) with respect to the total number of electrons N in the system, at 
constant external potential ν ( r ) acting on an electron due to all the nuclei in the system  
[ ] [ ] )()()()( rvN Nrrvrf ∂∂== ρδδμ . (2) 
The condensed FF are calculated using the procedure proposed by Yang and 
Mortier,7 based on a finite difference method 
)()1( NqNqf kkk −+=
+  for nucleophilic attack (3a)
 )1()( −−=− NqNqf kkk          for electrophilic attack      (3b) 
[ ] 2)1()1( −−+= NqNqf kkok for radical attack (3c)
where kq is the electronic population of atom k in a molecule. 
Chattaraj et al.8 have introduced the concept of generalized philicity. It contains 
almost all information about hitherto known different global and local reactivity and 
selectivity descriptors, in addition to the information regarding electrophilic/nucleophilic 
power of a given atomic site in a molecule. It is possible to define a local quantity called 
philicity associated with a site k in a molecule with the help of the corresponding 
condensed- to- atom variants of FF, αkf  as 
αα ωω kk f=              (4)
where (α= +, - and 0) represents local philic quantities describing nucleophilic, 
electrophilic and radical attacks respectively. Eq. (4) predicts that the most electrophilic 
site in a molecule is the one providing the maximum value of ωk+. When two molecules 
react, which one will act as an electrophile (nucleophile) will depend on, which has a 
higher (lower) electrophilicity index. This global trend originates from the local behavior 
of the molecules or precisely at the atomic site(s) that is(are) prone to electrophilic 
(nucleophilic) attack. Recently the usefulness of electrophilicity index in elucidating the 
toxicity of polychlorinated biphenyls, benzidine and chlorophenol has been assessed in 
detail. 9-11 
In addition to the knowledge of global softness (S), which is the inverse of 
hardness, 12 different local softnesses 13 used to describe the reactivity of atoms in 
molecule, can be defined as 
k ks Sf
α α=  (5)
where (α= +, - and 0) represents local softness quantities describing nucleophilic, 
electrophilic and radical attacks respectively. Based on local softness, relative 
nucleophilicity (sk- /sk+) and relative electrophilicity (sk+ /sk-) indices have also been defined 
and their usefulness to predict reactive sites also been addressed to.14 It has been 
established that the quantum chemical model selected to derive wave function; population 
scheme used to obtain the partial charges and basis set employed in the molecular orbital 
calculations are important parameters, which significantly influence the FF values. 15-18  
The condensed philicity summed over a group of relevant atoms is defined as the 
“group philicity”. It can be expressed as19  
αα ωω  
where n is the number of atoms coordinated to the reactive atom, αωk  is the local 
electrophilicity of the atom k, and ωgα is the group philicity obtained by  adding the local 
philicity of the nearby bonded atoms. In this study19 the group nucleophilicity index (ωg+) 
of the selected systems is used to compare the chemical reactivity trends.  
Toro-Labbé et al20 have recently proposed a dual descriptor (Δf ( r )), which is 
defined as the difference between the  nucleophilic and electrophilic Fukui functions and is 
given by,  
                                     Δf(r) = [ (f +(r) - (f - (r) ] (7)
If Δf(r) > 0, then the site is favored for a nucleophilic attack, whereas if Δf (r) < 0, then the 
site may be favored for an electrophilic attack. The associated dual local softness have also 
been defined as,19 
                           Δsk = S (fk+ - fk-) = (sk+ - sk-) (8)
It is defined as the condensed version of Δf (r) multiplied by the molecular softness S. 
2. Multiphilic Descriptor 
In the light of the local philicity concept proposed by Chattaraj et al.8 and the dual 
descriptor derived by Toro-Labbé and coworkers,20 we propose a multiphilic descriptor 
using the unified philicity concept, which can concurrently characterize both nucleophilic 
and electrophilic nature of a chemical species. It is defined as the difference between the 
nucleophilic and electrophilic condensed philicity functions. It is an index of selectivity 
towards nucleophilic attack, which can as well characterize an electrophilic attack and is 
given by,21  
Δωk = [ωk+ - ωk- ] = ω [Δƒk] (9)
where Δƒk is the condensed-to-atom variant-k of Δƒ(r) (eq 7). If Δωk > 0, then the 
site k is favored for a nucleophilic attack, whereas if Δωk < 0, then the site k may be favored 
for an electrophilic attack. Because FFs are positive (0 < ƒk < 1), -1 < Δƒk < 1, and the 
normalization condition for Δωk is 
0=Δ=Δ ∑∑
k fωω  (10)
Although Δωk and Δfk will contain the same intramolecular reactivity information 
the former is expected to be a better intermolecular descriptor because of its global 
information content.  
We may analyze the nature of ( )rωΔ  in terms of that22 of ( )f rΔ  as follows: 
[ ]( )( )
ωω ⎛ ⎞∂∂⎛ ⎞
= ⎜ ⎟⎜ ⎟∂ ∂⎝ ⎠ ⎝ ⎠
∂ ∂⎛ ⎞ ⎛ ⎞
= +⎜ ⎟ ⎜ ⎟∂ ∂⎝ ⎠ ⎝ ⎠
 ( ) ( )
f r f r
= + Δ⎜ ⎟∂⎝ ⎠
 ( ) ( )
f r r
= + Δ⎜ ⎟∂⎝ ⎠
( ) ( )
r f r
⎡ ⎤∂ ∂⎛ ⎞ ⎛ ⎞
Δ = −⎢ ⎥⎜ ⎟ ⎜ ⎟∂ ∂⎝ ⎠ ⎝ ⎠⎣ ⎦
The multiphilicity descriptor, ( )rωΔ  is a measure of the difference between local 
and global (modulated by ( )f r ) reactivity variations associated with the electron 
acceptance/ removal. Incidentally, the variation of 
ω∂⎛ ⎞
⎜ ⎟∂⎝ ⎠
 across the periodic table is 
similar to that of μ.23 
2v vN N
⎡ ⎤∂ ∂⎛ ⎞
=⎜ ⎟ ⎢ ⎥∂ ∂⎝ ⎠ ⎣ ⎦
24 vv N
μ μ μ η
η η η
⎛ ⎞∂ ∂⎛ ⎞
= −⎜ ⎟ ⎜ ⎟∂ ∂⎝ ⎠⎝ ⎠
= −   
μ γ μ
= − = −  
Since γ is generally very small,24 
ω∂⎛ ⎞
⎜ ⎟∂⎝ ⎠
 is expected to follow the μ trend. 
Problems associated with the definition of η and the discontinuity25 in E as a 
function of N will be present in the ( )f rΔ  definition and the discontinuity in ( )rρ . 
 Similar type of differentiation has also been attempted by other research workers.26 
Also, to study the intra- and intermolecular reactivities another related descriptor 
namely, nucleophilicity excess ( ∓
gωΔ ) for a nucleophile, over the electrophilicity (net 
nucleophilicity) in it is defined as   
( )+−+− −=−=Δ ggggg ffωωωω ∓  (11)
where )(
ωω  and )(
ωω  are the group philicities of the 
nucleophile in the molecule due to electrophilic and nucleophilic attacks respectively. It is 
expected that the nucleophilicity excess ( ∓
gωΔ ) for a nucleophile should always be 
positive whereas it will provide a negative value for an electrophile in a molecule.  
In the present study, we use both the multiphilicity descriptor and nucleophilicity 
excess to probe the nature of attack/reactivity at a particular site in the selected systems. 
3. Computational Details 
The geometries of HCHO, CH3CHO, CH3COCH3, C2H5COC2H5, CH2=CHCHO 
CH3CH=CHCHO, NH2OH, CH3ONH2, CH3NHOH, OHCH2CH2NH2, CH3SNH2, 
CH3NHSH, SHCH2CH2NH2 and all-metal aromatic molecules, viz., MAl4– (M=Li, Na, K 
and Cu) are optimized by B3LYP/6-311+G** as available in the GAUSSIAN 98 package.27 
Various reactivity and selectivity descriptors such as chemical hardness, chemical 
potential, softness, electrophilicity and the appropriate local quantities employing natural 
population analysis (NPA)28, 29 scheme are calculated. HPA scheme (Stockholder 
Partitioning Scheme) 30 as implemented in the DMOL3 package 31 has also been used to 
calculate the local quantities employing BLYP/DND method. For all-metal aromatic 
molecules, ∆SCF method has been utilized to compute the ionization potential (IP) and 
electron affinity (EA) according to the equations (I=EN-1 - EN, A=EN - EN+1, where I and A 
are obtained from total electronic energy calculations on the N-1, N, N+1-electron systems 
at the neutral molecule geometry). 
4. Results and Discussion 
A series of carbonyl compounds is selected in the present study to probe the 
usefulness of the multiphilicity descriptor (Figure 1). A comparison with various other 
descriptors and the recently derived dual descriptor is also probed. Due to bipolar nature of 
C=O bond, both nucleophilic and electrophilic attacks are possible at C and O sites. It is 
noted that the rate of nucleophilic addition on the carbonyl compound be reduced by 
electron donating alkyl groups and enhanced by electron withdrawing ones. 32 Recently, 
we have studied a set of these carbonyl compounds in the light of philicity and group 
philicity.19 The global molecular properties of the selected series of carbonyl compounds 
are presented in Table 1. Various local quantities for particular sites of the selected systems 
are listed in Table 2 and Table 3. Selected compounds are grouped into two sets namely, 
nonconjugated and α, β-conjugated carbonyl compounds. 
For the nonconjugated carbonyl compounds, the carbon atom (C1) bearing the 
carbonyl group is expected to be the most reactive site towards a nucleophilic attack. Table 
2 lists the values of local reactivity descriptors using B3LYP/6-311+G** method for NPA 
derived charges of the selected molecules. NPA derived local quantities predict the 
expected maximum value for carbonyl carbon (C1) of all the selected molecules for fk+, sk+ 
and ωk+. But sk+/sk- is unable to provide the maximum value for C1 atom due to negative FF 
values. One important point to note is that among the descriptors fk+, sk+, ωk+ and sk+/sk-, 
+ value is capable of providing a clear distinction between carbonyl carbon (C1) and the 
oxygen site for nucleophilic attack. 
  Since, HPA derived charges generally provide non-negative FF values, we also 
made use of it for local reactivity analysis on carbonyl compounds. HPA derived local 
reactivity descriptors also predict the expected maximum value for C1 atom in the case of 
HCHO and CH3CHO but fails to predict for CH3COCH3 and C2H5COC2H5, where oxygen 
atom is shown to be prone towards nucleophilic attack. Nevertheless, the fk+ value of 
oxygen is almost same as that of carbonyl carbon (C1), thus making it difficult to make a 
clear decision on the electrophilic behavior of these atoms. Under these situation, dual 
descriptors Δf (r), Δs k and multiphilic descriptor Δω (r), give a helping hand. All these 
quantities provide a clear difference between nucleophilic and electrophilic attacks at a 
particular site with their sign. That is, they provide positive value for site prone for 
nucleophilic attack and a negative value at the site prone for electrophilic attack. The 
advantage of multiphilic descriptor Δω (r) is that they provide higher value in terms of 
magnitude compared to other dual descriptors. For instance, values of Δf(r), Δsk and Δω(r) 
for nucleophilic (electrophilic) attack at carbonyl carbon (oxygen) site of CH3CHO are 
1.06 (-0.93), 0.17 (-0.15), 3.03 (-2.65) respectively for NPA derived charges. Almost the 
same trend is followed in the case of HPA derived charges. 
The second group of compounds namely, α, β-conjugated carbonyl is elaborately 
studied in the recent past because of the presence of two reactive centers.33 The first 
reactive site is the carbon (C1) of the carbonyl, and the second is the carbon in the β 
position (C6). In such a case, the β carbon is activated because of the withdrawing 
mesomeric effect of the adjacent carbonyl group. As seen from Table 2 and Table 3, NPA 
derived charges give a maximum value for fk+ to carbonyl carbon whereas HPA derived 
charges provide maximum fk+ value to the β carbon atom (C6) in the case of CH2=CHCHO 
molecule. For CH3CH=CHCHO, NPA (HPA) provide maximum fk+ value of 0.44 (0.17) to 
carbonyl carbon (C1) compared to the β carbon site of 0.34 (0.16). This ambiguous 
behavior may be due to the dependence of local reactivity descriptors on the selection of 
basis set and population schemes. Further oxygen site shows high value for fk+ and other 
local descriptors, making it difficult to predict the proper electrophilic site. Even now Δω 
(r) exhibits high positive value on both carbons that are supposed to be electrophilic and a 
high negative value on the oxygen site disclosing clearly its nucleophilic character 
compared to other dual descriptors. Also it can be noted from Tables 2 and 3 that, even for 
molecules with more than one reactive sites, Δω (r) is capable of making a clear distinction 
among them in terms of their magnitude. That is, for molecules 6 and 7 having two 
reactive sites as carbon (C1) of the carbonyl and the carbon in the β position (C6), our 
descriptors are capable of distinctly identifying the stronger site 
(electrophilic/nucleophilic). 
Optimized structures along with atom numbering for the selected set of amines are 
presented in Figure 2. Global and local reactivity properties of the selected set of amines 
calculated using B3LYP/6-311+g** and BLYP/DND methods are presented in Tables 4 to 
6. Global reactivity trend based on ω, is given by  
B3LYP/6-311+g** method (Table 4) 
(i) CH3ONH2 > OHCH2CH2NH2 > CH3NHOH > NH2OH 
(ii) CH3NHSH > SHCH2CH2NH2 > CH3SNH2 
BLYP/DND method (Table 4) 
(i) CH3ONH2 > OHCH2CH2NH2  > NH2OH  > CH3NHOH  
(ii) CH3NHSH > SHCH2CH2NH2 > CH3SNH2 
Though both the methods show variation in reactivity trend for oxygen containing 
systems, trends related to sulfur containing systems are same. 
Based on NPA and HPA charge derived multiphilic descriptor at nitrogen site 
(∆ωN), following reactivity trend has been obtained,  
NPA (Table 5) 
(1) OHCH2CH2NH2 > CH3NHOH   > NH2OH  > CH3ONH2      
(2) CH3NHSH > SHCH2CH2NH2 > CH3SNH2   
HPA (Table 6) 
(1) OHCH2CH2NH2 > CH3ONH2 > NH2OH  > CH3NHOH        
(2) CH3NHSH > SHCH2CH2NH2 > CH3SNH2   
It may be noted that trends are same as ω for sulfur containing systems, but shows 
variations with respect to oxygen containing systems for both NPA and HPA charge 
derived ∆ωN. 
So for as the intramolecular reactivity trends are concerned, site with maximum 
negative value of ∆ωk is the most preferred site for electrophilic attack. Chemical intuition 
suggests that N site is more prone towards electrophilic attack. Table 7 lists the site with 
maximum negative value for ∆ωk for the selected set of amines. It is seen that with a few 
exception, N site is predicted as the most preferred site for electrophilic attack.  
Further in order to test ∆ωk along intrinsic reaction coordinate (IRC), we consider a 
cope rearrangement of hexa-1,5-diene. This is an example of [3,3] sigmatropic reaction. 
Figure 3 provides the optimized geometrical structures with atom numbering for the 
reactant, transition state and product calculated using B3LYP/6-31G* level of theory. 
Table 8 gives the global reactivity parameters of the reactant, transition state and product. 
As expected, hardness is minimum (2.48 eV) and the corresponding electrophilicity index 
is maximum (1.57 eV) at the transition state. Variation of global reactivity parameter along 
the IRC path is presented in Table 9 and Figure 4 (a-b).  Variation of energy (E) and ω 
along IRC path is given in Figure 5a. It is seen that both E and ω are maximum around the 
transition state indicating it as the most unstable structure along the IRC path. Figure 5 b 
provides the variation of hardness (η) and polarizability (α) along the IRC path. An inverse 
relationship exists between them. That is, η reaches a minimum whereas α becomes 
maximum at the transition state as expected.  
Variation of multiphilic descriptor (∆ωk) along IRC for the important atomic sites 
(C1 and C3/ C6 and C11) is presented in Figure 5. In going from reactant to product, C1 and 
C3 (C6 and C11) sites change their nature and become more prone towards electrophilic 
attack (nucleophilic attack) at the product side. This change in the nature of attack takes 
place around the transition state.  
In studying the importance of nucleophilicity excess ( ∓
gωΔ ) descriptor, a careful 
analysis on the electronic structure, property and reactivity of all-metal aromatic 
compounds, viz., MAl4– (M=Li, Na, K and Cu) is performed. The four membered 
aluminum unit Al4 present in all the molecules may be considered as a single unit. This 
unit can easily take part in charge transfer process with the M (≡Li, Na, K, Cu) atom in 
those complexes. 
 Figure 6 shows the various stable isomers of MAl4–. The C4v isomer of the MAl4– is 
reported as energetically most stable, least polarizable and hardest.34, 35 Table 10 presents 
the group philicity (ωg+, ωg–) values of the Al42– nucleophile and M+ (M=Li, Na, K, Cu) 
electrophile in the MAl4– isomers. It is found that in all MAl4– isomers the nucleophilicity 
of the Al42– aromatic unit overwhelms its electrophilic trend (i.e. +−
gg ωω ) and therefore 
gωΔ  is positive, whereas the electrophilicity of M
+ dominates over its nucleophilicity (i.e. 
gg ωω ) and therefore 
gωΔ  is negative as expected. It is important to note that 
gωΔ  of 
Al42– is maximum in the case of most stable C4v isomer of the MAl4– molecule. The order 
of the ∓
gωΔ  value of Al4
2– nucleophile in MAl4–, 
vvv CCC ∞24 , i.e. stabilization of an 
MAl4– isomer (except in KAl4–) increases its nucleophilicity and accordingly can be used 
as a better molecular cathode. It is also important to note that the nucleophilicity of the 
Al42– unit in MAl4– (C4v) increases as K Cu Na Li≺ ≺ ≺  according to the respective 
nucleophilicity excess values. Standard expressions1-5 for ∆N and ∆E in terms of group 
electronegativity and group hardness will provide additional insights into the electron 
transfer process.  
 Variation of kωΔ  along the IRC of three selected reactions,
36 viz., a) a 
thermoneutral reaction: Fa– + CH3-Fb → Fa-CH3 + Fb–, b) an endothermic reaction: HNO 
→ HON, c) an exothermic reaction: H2OO → HOOH is provided in figures 7 (a) – 7(c). 
For the thermoneutral reaction, both the Fa– (bond making) and Fb– (bond breaking) are 
nucleophilic. The net nucleophilicity of the Fa– atom is more than that of the Fb– atom along 
the IRC from reactant side to TS and the situation is reversed for the IRCs pertaining to the 
TS to product side. For the endothermic reaction, the net nucleophilicity of O (bond 
making) is higher than that of N (bond breaking) along the IRC. In the case of exothermic 
reaction, the O1 (bond making) atom is more electrophilic than its nucleophilic activity. 
Moreover, its Fukui function values as calculated through Mulliken Population Analysis 
(MPA) scheme become negative in some cases. For the thermoneutral reaction kωΔ  is 
minimum at the transition state. For other two reactions, kωΔ does not always follow the 
trend that the IRC corresponding to the minimum value of kω
±  (if not zero) is in 
accordance with the Hammond’s postulate.36 Figures 8 (a) – 8 (c) provide the profiles for 
the corresponding reaction forces.37  
 Apart from the important points corresponding to  the reactant (R), the transition 
state (TS) and the product (P) there exists two other important points associated with the 
configurations having the force maximum (Fmax) and the force minimum (Fmin). The 
zeroes, maxima and minima of the reaction force define key points along the reaction 
coordinate, which divide it into three reaction regions that are identified through vertical 
dashed lined in Figure 8. The first stage, in the reactant region, tends to be preparative in 
nature with emphasis in structural effects such as rotation, bond stretching, angle bending, 
etc., that will facilitate subsequent steps. The transition state region is mostly characterized 
by electronic rearrangements whereas the product region is mainly associated to structural 
relaxation necessary to reach the products. We have shown that analyzing a chemical 
reaction in terms of these regions can provide significant insight into its mechanism and 
the roles played by external factors, such as external potentials and solvents.37, 38 Partition 
of the activation energies in terms of the work done in going from i) R to Fmin: W1, ii) Fmin 
to TS: W2, iii) TS to Fmax: W3 and iv) Fmax to P: W4 gives the activation energy for the 
forward reaction (Ef#) as (W1+W2) and that of the reverse reaction (Er#) as -(W3+W4). 
Therefore the reaction energy (∆E0) becomes (Ef# – Er# = W1+W2+W3+W4). These values 
are provided in Table 11. As expected ∆E0 is zero, negative and positive for the 
thermoneutral, exothermic and endothermic reactions respectively. The skew-symmetric 
nature of the force profile for the thermoneutral reaction suggests that A=W1+W4 and 
B=W2+W3 would be zero. Similarly A, B would be positive (negative) for the 
endo(exo)thermic reactions. The transition state at the IRC=0 configuration lies at the 
middle between Fmax and Fmin configurations for the thermoneutral reaction whereas it lies 
towards the Fmin(Fmax) configurations for the exo(endo)thermic reaction, a signature of the 
Hammond postulate via reaction force.  
 Similar values of W1 and W2 (see Table 11) together with the changes observed in 
the nucleophilicity along the reaction coordinate for the thermoneutral SN2 substitution and 
for the exothermic reaction H2OO → HOOH indicate that structural and electronic 
reordering show up at the very beginning of the reaction, 37,38 through a sharp decrease of 
the nucleophilicity, this change practically ceases at the transition state of the exothermic 
reaction to reach the product value. It is interesting to note that in both cases the lowering 
of nucleophilicity of the key atoms from the reactants (Δω(Fa/Fb) ~ 0.014; Δω(O1) ~ 0.14) 
to the transition state  (Δω(Fa/Fb) ~ 0.004; Δω(O1) ~ 0.0) requires a similar amount of 
energy (9.54 kcal/mol and 7.39 kcal/mol, respectively). It can be observed in Table 11 that 
for the thermoneutral reaction W1>W2 indicating that the preparation step requires more 
energy than the transition to product step. On the other hand, the W2 values for the 
thermoneutral and exothermic reactions are quite close to each other and the work W1 
associated to the preparation step in the thermoneutral reaction is larger than that of the 
exothermic reaction, this indicates that in the SN2 reaction the structural reordering of the 
CH3 group to reach the D3h structure at the transition state is the key transformation that 
involve most of the activation energy. In the endothermic HNO → HON reaction the small 
changes of nucleophilicity together with large values of W1 and W2 indicates that the 
reaction is mainly driven by the structural reordering in the preparation step. 
5. Conclusions 
A multiphilicity descriptor (Δωk) is proposed and tested in this work. It is shown 
that, Δωk helps in identifying the electrophilic/nucleophilic nature of a specific site within 
a molecule. A comparison between different local reactivity descriptors is carried out on a 
set of carbonyl compounds. Also a selected set of amines is analyzed using Δωk. Further, 
we also consider a cope rearrangement of hexa-1,5-diene to test the variation of Δωk along 
IRC path. It is seen that Δωk presents a clear distinction between electrophilic and 
nucleophilic sites within a molecule in terms of their magnitude and sign. Hence they 
reveal the fact that multiphilic descriptor can effectively be used in characterizing the 
electrophilic/nucleophilic nature of a given site in a molecule. Also the importance of 
nucleophilicity excess ( ∓
gωΔ ) descriptor on the reactivity of all-metal aromatic 
compounds, viz., MAl4– (M=Li, Na, K and Cu) is successfully analyzed. Important insight 
into three different types of reactions, viz., a) thermoneutral, b) endothermic and c) 
exothermic are obtained through the analysis of the multiphilic descriptor profiles within 
the reaction regions defined by reaction force along the reaction path. 
The results discussed so far clearly show the importance of the selected descriptors, 
namely, multiphilic descriptor and nucleophilicity excess in analyzing the overall reactivity 
trends in molecular systems. 
Acknowledgment:   
PKC and DRR thank BRNS, Mumbai for financial assistance. JP and BSK thank the IIT 
Kharagpur for providing the facilities required for a summer project. JP also thanks the 
UGC for selecting him to carryout his Ph.D. work under FIP. ATL and SGO wish to thank 
financial support from FONDECYT, grant N° 1060590, FONDAP through project N° 
11980002 (CIMAT) and Programa Bicentenario en Ciencia y Tecnología (PBCT), 
Proyecto de Inserción Académica N° 8.  ATL is also indebted to the John Simon 
Guggenheim Foundation for a fellowship. 
References 
(1) Parr, R.G.; Yang, W. Density Functional Theory of Atoms and Molecules, Oxford 
University Press: Oxford, 1989. 
(2) Pearson, R. G. Chemical Hardness - Applications from Molecules to Solids, VCH-
Wiley: Weinheim, 1997. 
(3) Geerlings, P.; De Proft, F.; Langenaeker, W.  Chem. Rev. 2003, 103, 1793. 
(4) Special Issue of J. Chem. Sci. on Chemical Reactivity, 2005, Vol. 117, Guest Editor: 
Chattaraj, P. K.   
(5) Parr, R. G.; Szentpaly, L. V.; Liu, S. J. Am. Chem. Soc. 1999, 121, 1922. Chattaraj, 
P. K.; Sarkar, U.; Roy, D. R. Chem. Rev. 2006, 106, 2065. 
(6) Parr, R. G.; Yang, W.  J. Am. Chem. Soc., 1984, 106, 4049. Fukui, K.  Science 1987, 
218, 747. Ayers, P. W.; Levy, M.  Theor. Chem. Acc. 2000, 103, 353. 
(7) Yang, W.; Mortier, W. J. J. Am. Chem. Soc. 1986, 108, 5708. 
(8) Chattaraj, P. K. ; Maiti, B. ; Sarkar, U.  J. Phys. Chem. A. 2003, 107, 4973. 
(9) Parthasarathi, R.; Padmanabhan, J.; Subramanian, V.; Maiti, B.; Chattaraj, P. K. J. 
Phys. Chem. A. 2003, 107, 10346.  
(10) Parthasarathi, R.; Padmanabhan, J.; Subramanian, V.; Maiti, B.; Chattaraj, P. K. 
Current Sci. 2004, 86, 535. 
(11) Padmanabhan, J.; Parthasarathi, R.; Subramanian, V.; Chattaraj, P. K. Chem. Res. 
Tox. 2006, 19, 356. 
(12) Yang, W.; Parr, R. G. Proc. Natl. Acad. Sci. U.S.A. 1985, 82, 6723. 
(13) Lee, C.; Yang, W.; Parr, R. G.  J. Mol. Struct. (Theochem). 1988, 163, 305. 
(14) Bulat, F.A.; Chamorro, E.; Fuentealba, P.; Toro-Labbé, A. J. Phys. Chem. A 2004, 
108, 342. 
(15) Langenaeker, W.; De Proft, F.; Geerlings, P.  J. Mol. Struct. (Theochem). 1996, 362, 
175. 
(16) De Proft, F.; Martin, M. L. J. ; Geerlings, P.  Chem. Phys. Lett. 1996, 256, 400. 
(17) Contreras, R. ; Fuentealba, P. ; Galván, M. ; Pérez, P.  Chem. Phys. Lett. 1999, 304, 
405.  
(18) Thanikaivelan, P.; Padmanabhan, J.; Subramanian, V.; Ramasami, T.   Theo. Chem. 
Acc. 2002, 107, 326. 
(19) Parthasarathi, R.; Padmanabhan, J.; Elango, M.; Subramanian, V.; Chattaraj, P. K. 
Chem. Phys. Lett. 2004, 394, 225. 
(20) Morell, C.; Grand, A.; Toro-Labbé, A. J. Phys. Chem. A.  2005, 109, 205. Morell, C.; 
Grand, A.; Toro-Labbé, A. Chem. Phys. Lett. 2006, 425, 342. 
(21) J. Padmanabhan, R. Parthasarathi, V. Subramanian, P. K. Chattaraj, J. Phys. Chem. A 
110 (2006) 2739. 
(22) Morell, C.; Grand, A.; Toro-Labbe, A. Chem. Phys. Lett. 2006, 425, 342. 
(23) Chamorro, E.; Chattaraj, P. K.; Fuentealba, P. J. Phys. Chem. A 2003, 107, 7068. 
(24) Fuentealba, P.; Parr, R. G. J. Chem. Phys. 1991, 94, 5559. 
(25) Perdew, J. P.; Parr, R. G.; Levy, M.; Balduz, J. L., Jr. Phys. Rev. Lett. 1982, 49, 1691. 
(26) Ayers, P. W.; Morell, C.; De Proft, F.; Geerlings, P., unpublished work. 
(27) Gaussian 98, Revision A.5, Gaussian Inc., Pittsburgh, PA, 1998. 
(28) Reed, A. E.; Weinhold, F. J Chem Phys. 1983, 78, 4066. 
(29) Reed A E.; Weinstock, R. B.; Weinhold, F. J Chem Phys. 1985, 83, 735. 
(30) Hirshfeld, F. L.  Theor. Chim. Acta. 1977, 44, 129. 
(31) DMOL3, Accelrys, Inc. San Diego, California. 
(32) March, J. Advanced Organic Chemistry: Reactions, Mechanisms and Structure, 
Wiley & Sons: New York. 1998. 
(33) Patai, S.; Rappoport, Z. In The Chemistry of Alkenes; Interscience Publishers: 
London, 1964; p 469. Wong, S. S.; Paddon-Row, M. N.; Li, Y.; Houk, K. N. J. Am. 
Chem. Soc. 1990, 112, 8679. Langenaeker, W.; Demel, K.; Geerlings, P. J. Mol. 
Struct.: THEOCHEM 1992, 259, 317. Dorigo, A. E.; Morokuma, K. J. Am. Chem. 
Soc. 1989, 111, 6524.  
(34) (a) Li, X.; Kuznetsov, A. E.; Zhang, H.-F.; Boldyrev, A. I.; Wang, L.-S. Science 
2001, 291, 859. (b) Kuznetsov, A.; Birch, K.; Boldyrev, A. I.; Li, X.; Zhai, H.; Wang, 
L.-S. Science 2003, 300, 622. 
(35) Chattaraj, P. K.; Roy, D. R.; Elango, M.; Subramanian, V. J. Phys. Chem. A 2005, 
109, 9590. Roy, D. R.; Chattaraj, P. K.; Subramanian, V. Ind. J.Chem. A 2006, 45A, 
2369. Bulat, F.A.; Toro-Labbé, A. J. Phys. Chem. A 2003, 107, 3987. 
(36) Chattaraj, P. K.; Roy, D. R. J. Phys. Chem. A 2006, 110, 11401. Chattaraj, P. K.; 
Roy, D. R. J. Phys. Chem. A 2005, 109, 3771. 
(37) Toro-Labbé, A. J. Phys. Chem. A 1999, 103, 4398. Jaque, P.; Toro-Labbé, A. J. Phys. 
Chem. A 2000, 104, 995. Martínez, J.; Toro-Labbé, A. Chem. Phys. Lett. 2004, 392, 
132. Herrera, B.; Toro-Labbé, A. J. Chem. Phys. 2004, 121, 7096. Toro-Labbé, A.; 
Gutiérrez-Oliva, S.; Concha, M. C.; Murray, J. S.; Politzer, P. J. Chem. Phys. 2004, 
121, 4570. Gutiérrez-Oliva, S.; Herrera, B.; Toro-Labbé, A.; Chermette, H. J. Phys. 
Chem. A 2005, 109, 1748. 
(38)  Politzer, P.; Burda, J. V.; Concha, M. C.; Lane, P.; Murray, J. S.  J. Phys. Chem. A  
2006, 110, 756. Rincón, E.; Jaque, P.; Toro-Labbé, A. J. Phys. Chem. A 2006, 110, 
9478. Burda, J. V.; Toro-Labbé, A.; Gutiérrez-Oliva, S.; Murray, J. S.; Politzer, P. J. 
Phys. Chem. A  2007, in press. 
TABLE 1: Calculated Global Reactivity Properties of the Selected Molecules using 
B3LYP/6-311+g** and BLYP/DND method.  
η μ ω  S 
η μ ω  S 
Molecules B3LYP/6-311+g** (eV) BLYP/DND (eV) 
HCHO 2.960 -4.707 3.742 0.169 1.942 -4.260 4.673 0.258 
CH3CHO 3.115 -4.224 2.864 0.161 2.096 -3.791 3.425 0.238 
CH3COCH3 3.144 -3.910 2.432 0.159 2.133 -3.456 2.800 0.234 
C2H5COC2H5 3.153 -3.799 2.288 0.159 2.151 -3.367 2.635 0.233 
         
CH2=CHCHO 2.503 -4.904 4.805 0.200 1.545 -4.413 6.303 0.324 
CH3CH=CHCHO 2.542 -4.631 4.217 0.197 1.593 -4.132 5.359 0.314 
TABLE 2: Calculated Local Reactivity Properties of the Selected Molecules using B3LYP/6-311+g** method for NPA derived 
charges.  
Molecule  fk
- Δfk 
+- fk
HCHO C 0.8323 -0.1722 0.1406 -0.0291 -4.8331 3.1146 -0.6444 1.0045 0.1697 3.7591 
 O 0.0399 0.9409 0.0067 0.1589 0.0424 0.1494 3.5211 -0.9010 -0.1522 -3.3718 
CH3CHO C1 0.8178 -0.2416 0.1313 -0.0388 -3.3856 2.3419 -0.6917 1.0593 0.1700 3.0337 
 O 0.0072 0.9320 0.0012 0.1496 0.0077 0.0206 2.6691 -0.9250 -0.1484 -2.6485 
CH3COCH3 C1 0.3142 -0.2916 0.0500 -0.0464 -1.0772 0.7640 -0.7092 0.6058 0.0964 1.4732 
 O -0.2540 0.9286 -0.0404 0.1477 -0.2734 -0.6170 2.2582 -1.1820 -0.1881 -2.8755 
C2H5COC2H5 C1 0.3064 -0.2944 0.0486 -0.0467 -1.0408 0.7011 -0.6736 0.6007 0.0953 1.3746 
 O -0.2650 0.8751 -0.0420 0.1388 -0.3024 -0.606 2.0025 -1.1400 -0.1807 -2.6080 
CH2=CHCHO C6 0.2789 0.2070 0.0557 0.0413 1.3472 1.3402 0.9944 0.0719 0.0144 0.3458 
 C1 0.4355 -0.2288 0.0870 -0.0457 -1.9033 2.0926 -1.0995 0.6643 0.1327 3.1921 
 O -0.0560 0.9265 -0.0112 0.1851 -0.0605 -0.2700 4.4518 -0.9830 -0.1963 -4.7213 
CH3CH=CHCHO C6 0.3437 0.0926 0.0676 0.0182 3.7143 1.4494 0.3904 0.2511 0.0494 1.0590 
 C1 0.4408 -0.2365 0.0867 -0.0465 -1.8642 1.8592 -0.9973 0.6773 0.1332 2.8566 
 O -0.0670 0.9281 -0.0132 0.1825 -0.0721 -0.2820 3.9142 -0.9950 -0.1957 -4.1964 
TABLE 3: Calculated Local Reactivity Properties of the Selected Molecules using BLYP/DND method for HPA derived 
charges. 
Molecule  fk
- Δfk 
+- fk
HCHO C 0.3973 0.2373 0.1023 0.0611 1.6744 1.8563 1.1088 0.1600 0.0412 0.7476 
 O 0.3010 0.4232 0.0775 0.1090 0.7113 1.4064 1.9774 -0.1222 -0.0315 -0.5710 
CH3CHO C1 0.2998 0.1642 0.0715 0.0391 1.8267 1.0268 0.5624 0.1356 0.0324 0.4644 
 O 0.2708 0.3782 0.0646 0.0902 0.7165 0.9275 1.2953 -0.1074 -0.0256 -0.3678 
CH3COCH3 C1 0.2108 0.1154 0.0494 0.0271 1.8262 0.5902 0.3231 0.0954 0.0223 0.2671 
 O 0.2359 0.3499 0.0553 0.0820 0.6742 0.6605 0.9797 -0.1140 -0.0267 -0.3192 
C2H5COC2H5 C1 0.1346 0.0990 0.0313 0.0230 1.3598 0.3547 0.2609 0.0356 0.0083 0.0938 
 O 0.1449 0.2873 0.0337 0.0668 0.5045 0.3818 0.7570 -0.1424 -0.0331 -0.3752 
CH2=CHCHO C1 0.1780 0.1357 0.0577 0.0440 1.3117 1.1219 0.8553 0.0423 0.0137 0.2666 
 C6 0.2062 0.1253 0.0668 0.0406 1.6457 1.2997 0.7898 0.0809 0.0262 0.5099 
 O 0.1797 0.3414 0.0582 0.1106 0.5264 1.1326 2.1518 -0.1620 -0.0524 -1.0191 
CH3CH=CHCHO C6 0.1592 0.1114 0.0500 0.0350 1.4291 0.8532 0.5970 0.0478 0.0150 0.2562 
 C1 0.1741 0.1095 0.0547 0.0344 1.5900 0.9330 0.5868 0.0646 0.0203 0.3462 
 O 0.1739 0.2450 0.0546 0.0769 0.7098 0.9319 1.3130 -0.0710 -0.0223 -0.3810 
TABLE 4: Calculated Global Reactivity Properties of the Selected Molecules using 
B3LYP/6-311+g** and BLYP/DND method.  
η μ ω  S 
η μ ω  S 
Molecules B3LYP/6-311+g** (eV) BLYP/DND (eV) 
NH2OH 3.869 -3.553 1.632 0.129 3.411 -1.399 0.287 0.147 
CH3ONH2 3.630 -3.738 1.925 0.138 3.549 -3.053 1.313 0.141 
CH3NHOH 3.482 -3.392 1.652 0.144 3.229 -1.308 0.265 0.155 
OHCH2CH2NH2 3.343 -3.507 1.840 0.150 3.348 -2.689 1.080 0.149 
CH3SNH2 3.050 -3.331 1.819 0.164 2.447 -1.750 0.626 0.204 
CH3NHSH 3.148 -3.629 2.092 0.159 2.466 -3.596 2.622 0.203 
SHCH2CH2NH2 3.135 -3.417 1.862 0.159 2.521 -1.843 0.674 0.198 
TABLE 5: Calculated Local Reactivity Properties of the Selected Molecules using B3LYP/6-
311+g** method for NPA derived charges. 
Molecule  fk
 - sk
- Δfk 
+- fk
NH2OH N 0.1870 0.4140 0.0274 0.0607 2.2139 0.0536 0.1187 -0.2270 -0.0333 -0.0651 
 O 0.2390 0.2300 0.0350 0.0337 0.9623 0.0685 0.0659 0.0090 0.0013 0.0026 
CH3ONH2 C 0.0870 0.0680 0.1410 1.3130 0.0123 0.0096 0.7816 0.1142 0.0893 0.0190 
 N 0.1500 0.3510 0.0211 0.0495 2.3400 0.1969 0.4608 -0.2010 -0.0283 -0.2639 
 O 0.0720 0.1740 0.0101 0.0245 2.4167 0.0945 0.2284 -0.1020 -0.0144 -0.1339 
CH3NHOH C 0.0470 0.0740 0.0073 0.0115 1.5745 0.0124 0.0196 -0.0270 -0.0042 -0.0071 
 N 0.1200 0.3390 0.0186 0.0525 2.8250 0.0318 0.0898 -0.2190 -0.0339 -0.0580 
 O 0.2100 0.1770 0.0325 0.0274 0.8429 0.0556 0.0469 0.0330 0.0051 0.0087 
OHCH2CH2NH2 C1 0.0540 0.0330 0.0081 0.0049 0.6111 0.0583 0.0356 0.0210 0.0031 0.0227 
 C2 0.0400 0.0610 0.006 0.0091 1.5250 0.0432 0.0659 -0.0210 -0.0031 -0.0227 
 N 0.0630 0.3470 0.0094 0.0518 5.5079 0.0680 0.3746 -0.2840 -0.0424 -0.3066 
 O 0.1400 0.1010 0.0209 0.0151 0.7214 0.1511 0.1090 0.0390 0.0058 0.0421 
CH3SNH2 C 0.0550 0.0640 0.0112 0.0131 1.1636 0.0344 0.0400 -0.0090 -0.0018 -0.0056 
 N 0.1490 0.0820 0.0305 0.0168 0.5503 0.0932 0.0513 0.0670 0.0137 0.0419 
 S 0.3580 0.5510 0.0732 0.1126 1.5391 0.2239 0.3447 -0.1930 -0.0394 -0.1207 
CH3NHSH C 0.0530 0.0540 0.0107 0.0110 1.0189 0.1390 0.1416 -0.0010 -0.0002 -0.0026 
 N 0.1310 0.1740 0.0266 0.0353 1.3282 0.3434 0.4562 -0.0430 -0.0087 -0.1127 
 S 0.4530 0.4420 0.0919 0.0896 0.9757 1.1876 1.1588 0.0110 0.0022 0.0288 
SHCH2CH2NH2 C1 0.0780 0.0410 0.0155 0.0081 0.5256 0.0525 0.0276 0.0370 0.0073 0.0249 
 C2 0.0290 0.0250 0.0058 0.0050 0.8621 0.0195 0.0168 0.0040 0.0008 0.0027 
 N 0.0380 0.1270 0.0075 0.0252 3.3421 0.0256 0.0856 -0.0890 -0.0177 -0.0600 
 S 0.3890 0.4710 0.0772 0.0934 1.2108 0.2621 0.3173 -0.0820 -0.0163 -0.0552 
TABLE 6 Calculated Local Reactivity Properties of the Selected Molecules using BLYP/DND 
method for HPA derived charges. 
Molecule  fk
 - sk
- Δfk 
+- fk
NH2OH N 0.1837 0.9327 0.0237 0.1205 5.0777 0.2997 1.5218 -0.7490 -0.0970 -1.2220 
 O -0.0770 0.5114 -0.0100 0.0661 -6.6170 -0.1261 0.8344 -0.5890 -0.0760 -0.9610 
CH3ONH2 C 0.5410 0.0819 0.0746 0.0113 0.1513 1.0412 0.1576 0.4592 0.0633 0.8837 
 N -0.1510 0.2534 -0.0210 0.0349 -1.6740 -0.2913 0.4877 -0.4050 -0.0560 -0.7790 
 O -0.1790 0.9011 -0.0250 0.1242 -5.0267 -0.3450 1.7342 -1.0800 -0.1490 -2.0790 
CH3NHOH C 0.4598 0.1677 0.0660 0.0241 0.3647 0.7598 0.2771 0.2921 0.0419 0.4827 
 N -0.0580 0.7950 -0.0080 0.1142 -13.725 -0.0957 1.3136 -0.8530 -0.1220 -1.4090 
 O -0.2690 0.4537 -0.0390 0.0651 -1.6855 -0.4448 0.7497 -0.7230 -0.1040 -1.1940 
OHCH2CH2NH2 C1 0.1186 0.0254 0.0177 0.0038 0.2140 0.2181 0.0467 0.0932 0.0139 0.1715 
 C2 0.4003 0.1067 0.0599 0.0160 0.2666 0.7365 0.1964 0.2936 0.0439 0.5401 
 N -0.3040 0.9520 -0.0450 0.1424 -3.1337 -0.5589 1.7514 -1.2560 -0.1880 -2.3100 
 O -0.3340 0.5965 -0.0500 0.0892 -1.7842 -0.6151 1.0974 -0.9310 -0.1390 -1.7120 
CH3SNH2 C 0.0667 0.3358 0.0100 0.0502 5.0377 0.1226 0.6178 -0.2690 -0.0400 -0.4950 
 N -0.297 0.4790 -0.044 0.0717 -1.6119 -0.5467 0.8813 -0.7760 -0.1160 -1.4280 
 S 0.3671 0.6485 0.0549 0.0970 1.7667 0.6753 1.1931 -0.2810 -0.0420 -0.5180 
CH3NHSH C 0.1715 0.1732 0.0256 0.0259 1.0100 0.3154 0.3186 -0.0020 -0.0003 -0.0030 
 N -0.225 0.9064 -0.0340 0.1356 -4.0267 -0.4141 1.6676 -1.1320 -0.1690 -2.0820 
 S 0.3479 0.2249 0.0520 0.0336 0.6465 0.6400 0.4137 0.1230 0.01840 0.2262 
SHCH2CH2NH2 C1 0.0117 0.2268 0.0017 0.0339 19.432 0.0215 0.4172 -0.2150 -0.0320 -0.3960 
 C2 0.1651 0.0876 0.0247 0.0131 0.5309 0.3037 0.1612 0.0774 0.0116 0.1425 
 N -0.292 0.7628 -0.0440 0.1141 -2.6164 -0.5364 1.4035 -1.0540 -0.1580 -1.9400 
 S 0.1064 0.5646 0.0159 0.0845 5.3089 0.1957 1.0388 -0.4580 -0.0690 -0.8430 
TABLE 7: Atomic site with maximum value for multiphilic descriptor (∆ωk) for the 
selected set of amines. 
site with maximum value for ∆ωk molecule 
NPA HPA 
NH2OH N N 
CH3ONH2 O N 
CH3NHOH N N 
OHCH2CH2NH2 N N 
CH3SNH2 N S 
CH3NHSH N N 
SHCH2CH2NH2 N N 
TABLE 8: Global reactivity descriptors calculated at B3LYP/6-31G* level of theory. 
Species η 
(eV) 
(eV) 
(eV) 
Reactant 3.64 -2.89 1.15 
Transition State 2.48 -2.79 1.57 
Product 3.64 -2.89 1.15 
TABLE 9: Global reactivity descriptors along the intrinsic reaction coordinate 
calculated at B3LYP/6-31G* level of theory. 
Points along 
(Hartrees) 
(eV) 
(eV) 
(eV) 
(a.u.) 
1 -234.5673091 2.65 -2.7825 1.46 64.94 
2 -234.5661087 2.63 -2.7827 1.47 65.21 
3 -234.5649450 2.61 -2.7828 1.49 65.47 
4 -234.5638273 2.59 -2.7836 1.50 65.74 
5 -234.5627655 2.57 -2.7836 1.51 65.98 
6 -234.5617681 2.55 -2.7843 1.52 66.22 
7 -234.5608445 2.54 -2.7843 1.53 66.42 
8 -234.5600030 2.53 -2.7851 1.54 66.63 
9 -234.5592516 2.51 -2.7852 1.54 66.80 
10 -234.5585980 2.50 -2.7859 1.55 66.96 
11 -234.5580104 2.50 -2.7857 1.56 67.07 
12 -234.5575677 2.49 -2.7866 1.56 67.20 
13 -234.5575677 2.49 -2.7866 1.56 67.20 
14 -234.5580104 2.50 -2.7857 1.56 67.07 
15 -234.5585980 2.50 -2.7859 1.55 66.96 
16 -234.5592516 2.51 -2.7852 1.54 66.80 
17 -234.5600030 2.53 -2.7851 1.54 66.63 
18 -234.5608445 2.54 -2.7843 1.53 66.42 
19 -234.5617681 2.55 -2.7843 1.52 66.22 
20 -234.5627655 2.57 -2.7836 1.51 65.98 
21 -234.5638273 2.59 -2.7836 1.50 65.74 
22 -234.5649450 2.61 -2.7830 1.49 65.47 
23 -234.5661087 2.63 -2.7827 1.47 65.21 
24 -234.5673092 2.65 -2.7825 1.46 64.94 
TABLE 10: Group Philicity ( +
gω ) Values for Nucleophilic and Electrophilic 
Attacks Respectively for the Ionic Units of Different Isomers of LiAl4–, NaAl4–, 
KAl4– and CuAl4–. 
Isomers Ionic Unit 
gωΔ  
Al42– 0.0070 0.0095  0.0025 LiAl4– 
(C∞v) Li+ 0.0063 0.0037 -0.0025 
Al42– 1.3E-05 0.0055  0.0055 LiAl4– 
(C2v) Li+ 0.0068 0.0013 -0.0055 
Al42– -0.0372 0.2965  0.3338 LiAl4– 
(C4v) Li+ 0.4055 0.0718 -0.3338 
Al42– 0.0070 0.0102   0.0032 NaAl4– 
(C∞v) Na+ 0.0074 0.0042 -0.0032 
Al42– -0.0001 0.0078   0.0079 NaAl4– 
(C2v) Na+ 0.0096 0.0017 -0.0079 
Al42– -0.0073 0.1024   0.1097 NaAl4– 
(C4v) Na+ 0.1301 0.0204  -0.1097 
Al42– 0.0044 0.0095   0.0051 KAl4– 
(C∞v) K+ 0.0106 0.0054 -0.0051 
Al42– 0.0023 0.0101   0.0078 KAl4– 
(C2v) K+ 0.0118 0.0039 -0.0078 
Al42– 0.0008 0.0066   0.0057 KAl4– 
(C4v) K+ 0.0078 0.0021 -0.0057 
Al42– 0.0031 0.0036   0.0006 CuAl4– 
(C∞v) Cu+ 0.0014 0.0009 -0.0006 
Al42– 0.0036 0.0036   0.0048 CuAl4– 
(C2v) Cu+ 0.0008 0.0008 -0.0048 
Al42– 0.0178 0.0332   0.0154 CuAl4– 
(C4v) Cu+ 0.0131 -0.0023 -0.0154 
TABLE 11: Profiles of the forward activation energy ( #fEΔ ), reverse activation 
energy ( #rEΔ ) and reaction energy (
0EΔ ) of a thermoneutral reaction (Fa– + CH3-Fb 
→ Fa--CH3 + Fb–; an endothermic reaction (HNO → HON) and an exothermic 
reaction (H2OO → HOOH). 
Reaction #fEΔ
ξ1 ξ2 W1 W2 W3 W4 
Thermo-neutral 
B3LYP/6-311++G** 
9.54 9.54 0.0 -1.33 1.33 5.42 4.12 -4.12 - 5.42 
Endothermic 
B3LYP/6-311+G** 
75.39 34.84 40.55 -0.80 0.60 43.97 31.42 -13,20 - 21.64 
Exothermic 
B3LYP/6-311+G** 
7.39 52.85 -45.46 -0.65 0.87 3.93 3.46 - 19.99 - 32.86 
Figure 1. Optimized structures with atom numbering for the selected carbonyl 
compounds. 
Figure 2. Optimized structures with atom numbering for the selected amine systems. 
Reactant Transition State Product 
Figure 3:Optimized geometrical structures calculated using B3LYP/6-31G* level of 
theory. 
-234.568
-234.566
-234.564
-234.562
-234.560
-234.558
-234.556
 Energy (Hartree)
 Electrophilicity Index (eV)
Intrinsic Reaction Coordinate
lectrophilicity Index (eV
2.66  Chemical Hardness (eV)
 Polarizability (au)
Intrinsic Reaction Coordinate
olarizability (au)
Figure 4 (a-b):Variation of global reactivity descriptors along intrinsic reaction 
coordinate. 
-0.008
-0.006
-0.004
-0.002
0.000
0.002
0.004
0.006
Intrinsic Reaction Coordinate
 C1,C3 sites
-0.008
-0.006
-0.004
-0.002
0.000
0.002
0.004
0.006
Intrinsic Reaction Coordinate
 C6,C11 sites
Figure 5 (a-b): Variation of multiphilic descriptor along intrinsic reaction coordinate for 
the selected atomic sites. 
         MAl4– [C∞v] 
     MAl4– [C2v] 
      
      MAl4– [C4v] 
                         M=Li, Na, K, Cu 
Figure 6. Optimized structures of various isomers of MAl4– (M ≡ Li, Na, K, Cu). 
                
-3 -2 -1 0 1 2 3
-239.704
-239.702
-239.700
-239.698
-239.696
-239.694
-239.692
-239.690
-239.688
-239.686
-3 -2 -1 0 1 2 3
0.002
0.004
0.006
0.008
0.010
0.012
0.014
0.016
0.018
 Energy Δω (Fa)
 Δω (F
     (a)                   
                 
-2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0
-130.52
-130.50
-130.48
-130.46
-130.44
-130.42
-130.40
-130.38
-2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0
-0.07
-0.06
-0.05
-0.04
-0.03
-0.02
-0.01
 Energy
                
-2 -1 0 1 2 3
-151.60
-151.58
-151.56
-151.54
-151.52
-151.50
-2 -1 0 1 2 3
-0.02
 Energy
 (O2) Δω
 (O1)
     (c) 
Figure 7 (a-c): Profiles of net nucleophilicity (∆ωk) of along the path of the gas phase (a) 
thermoneutral SN2 substitution: Fa- + CH3-Fb → Fa-CH3 + Fb-, (b) endothermic reaction: 
HNO → HON and (c) exothermic reaction: H2OO → HOOH. Also shown is the profile 
of energy.  
Figure  8  Reaction force profiles along the reaction coordinate for (a)  thermoneutral 
reaction: Fa– + CH3-Fb → Fa--CH3 + Fb–; (b) endothermic reaction: HNO → HON;  (c) the 
exothermic reaction: H2OO → HOOH. The vertical dashed lines define the reaction 
regions as follows: reactant (left), transition state (middle) and product (right). 
-4 -2 0 2 4
-2 -1 0 1 2 3
-2 -1 0 1 2
max(a)
ABSTRACT
  In line with the local philicity concept proposed by Chattaraj et al.
(Chattaraj, P. K.; Maiti, B.; Sarkar, U. J. Phys. Chem. A. 2003, 107, 4973) and
a dual descriptor derived by Toro-Labbe and coworkers (Morell, C.; Grand, A.;
Toro-Labbe, A. J. Phys. Chem. A. 2005, 109, 205), we propose a multiphilic
descriptor. It is defined as the difference between nucleophilic (Wk+) and
electrophilic (Wk-) condensed philicity functions. This descriptor is capable
of simultaneously explaining the nucleophilicity and electrophilicity of the
given atomic sites in the molecule. Variation of these quantities along the
path of a soft reaction is also analyzed. Predictive ability of this descriptor
has been successfully tested on the selected systems and reactions.
Corresponding force profiles are also analyzed in some representative cases.
Also, to study the intra- and intermolecular reactivities another related
descriptor namely, the nucleophilicity excess (DelW-+) for a nucleophile, over
the electrophilicity in it has been defined and tested on all-metal aromatic
compounds.

<|endoftext|><|startoftext|>
Introduction
1.1. Objectives and motivations
In this paper, we deal with an Rd-valued Feller Markov process (Xt) with semigroup
(Pt)t≥0 and assume that (Xt) admits an invariant distribution ν0. The aim of this work is
to propose a way to approximate the whole stationary distribution Pν0 of (Xt). More pre-
cisely, we want to construct a sequence of weighted occupation measures (ν(n)(ω,dα))n≥1
on the Skorokhod space D(R+,R
d) such that ν(n)(ω,F )
n→+∞−→
F (α)Pν0(dα) a.s. for a
class of functionals F :D(R+,R
d) which includes bounded continuous functionals for the
Skorokhod topology.
One of our motivations is to develop a new numerical method for option pricing in sta-
tionary stochastic volatility models which are slight modifications of the classical stochas-
tic volatility models, where we suppose that the volatility evolves under its stationary
regime.
This is an electronic reprint of the original article published by the ISI/BS in Bernoulli,
2009, Vol. 15, No. 1, 146–177. This reprint differs from the original in pagination and
typographic detail.
1350-7265 c© 2009 ISI/BS
http://arxiv.org/abs/0704.0335v3
http://isi.cbs.nl/bernoulli/
http://dx.doi.org/10.3150/08-BEJ142
mailto:gpa@ccr.jussieu.fr
mailto:fpanloup@insa-toulouse.fr
http://isi.cbs.nl/BS/bshome.htm
http://isi.cbs.nl/bernoulli/
http://dx.doi.org/10.3150/08-BEJ142
Approximation of the distribution of a stationary Markov process 147
1.2. Background and construction of the procedure
This work follows on from a series of recent papers due to Lamberton and Pagès ([12, 13]),
Lemaire ([14, 15]) and Panloup ([18, 19, 20]), where the problem of the approximation
of the invariant distribution is investigated for Brownian diffusions and for Lévy-driven
SDE’s.1 In these papers, the algorithm is based on an adapted Euler scheme with de-
creasing step (γk)k≥1. To be precise, let (Γn) be the sequence of discretization times:
Γ0 = 0, Γn =
k=1 γk for every n≥ 1, and assume that Γn → +∞ when n→+∞. Let
(X̄Γn)n≥0 be the Euler scheme obtained by “freezing” the coefficients between the Γn’s
and let (ηn)n≥1 be a sequence of positive weights such that Hn :=
k=1 ηk →+∞ when
k→+∞. Then, under some Lyapunov-type stability assumptions adapted to the stochas-
tic processes of interest, one shows that for a large class of steps and weights (ηn, γn)n≥1,
ν̄n(ω, f) :=
ηkf(X̄Γk−1)
n→+∞−→
f(x)ν0(dx) a.s., (1)
(at least)2 for every bounded continuous function f .
Since the problem of the approximation of the invariant distribution has been deeply
studied for a wide class of Markov processes (Brownian diffusions and Lévy-driven SDE’s)
and since the proof of (1) can be adapted to other classes of Markov processes under some
specific Lyapunov assumptions, we choose in this paper to consider a general Markov pro-
cess and to assume the existence of a time discretization scheme (X̄Γk)k≥0 such that (1)
holds for the class of bounded continuous functions. The aim of this paper is then to inves-
tigate the convergence properties of a functional version of the sequence (ν̄n(ω,dα))n≥1.
Let (Xt) be a Markov and Feller process and let (X̄t)t≥0 be a stepwise constant time
discretization scheme of (Xt) with non-increasing step sequence (γn)n≥1 satisfying
γn = 0, Γn :=
n→+∞−→ +∞. (2)
Letting Γ0 := 0 and X̄0 = x0 ∈Rd, we assume that
X̄t = X̄Γn ∀t ∈ [Γn,Γn+1[ (3)
and that (X̄Γn)n≥0 can be simulated recursively.
We denote by (Ft)t≥0 and (F̄t)t≥0 the usual augmentations of the natural filtrations
(σ(Xs,0≤ s≤ t))t≥0 and (σ(X̄s,0≤ s≤ t))t≥0, respectively.
1Note that computing the invariant distribution is equivalent to computing the marginal laws of the
stationary process (Xt) since ν0Pt = ν0 for every t≥ 0.
2The class of functions for which (1) holds depends on the stability of the dynamical system. In
particular, in the Brownian diffusion case, the convergence may hold for continuous functions with
subexponential growth, whereas the class of functions strongly depends on the moments of the Lévy
process when the stochastic process is a Lévy-driven SDE.
148 G. Pagès and F. Panloup
For k ≥ 0, we denote by (X̄(k)t )t≥0 the shifted process defined by
t := X̄Γk+t.
In particular, X̄
t = X̄t. We define a sequence of random probabilities (ν
(n)(ω,dα))n≥1
on D(R+,R
d) by
ν(n)(ω,dα) =
ηk1{X̄(k−1)(ω)∈dα},
where (ηk)k≥1 is a sequence of weights. For t ≥ 0, (ν(n)t (ω,dx))n≥1 will denote the se-
quence of “marginal” empirical measures on Rd defined by
t (ω,dx) =
ηk1{X̄(k−1)
(ω)∈dx}
1.3. Simulation of (ν(n)(ω,F ))
For every functional F :D(R+,R
d)→R, the following recurrence relation holds for every
n≥ 1:
ν(n+1)(ω,F ) = ν(n)(ω,F ) +
(F (X(n)(ω))− ν(n)(ω,F )). (4)
Then, if T is a positive number and F :D(R+,R
d) → R is a functional depending only
on the trajectory between 0 and T , (ν(n)(ω,F ))n≥1 can be simulated by the following
procedure.
Step 0. (i) Simulate (X̄
t )t≥0 on [0, T ], that is, simulate (X̄Γk)k≥0 for k =
0, . . . ,N(0, T ), where
N(n,T ) := inf{k ≥ n,Γk+1 − Γn > T }
= max{k ≥ 0,Γk − Γn ≤ T }, n≥ 0, T > 0.
Note that n 7→N(n, t) is an increasing sequence since (γn) is non-increasing, and that
ΓN(n,T ) − Γn ≤ T < ΓN(n,T )+1 − Γn.
(ii) Compute F ((X̄
t )t≥0) and ν
(1)(ω,F ). Store the values of (X̄Γk) for k =
1, . . . ,N(0, T ).
Step n (n≥ 1). (i) Since the values (X̄Γk)k≥0 are stored for k = n, . . . ,N(n− 1, T ),
simulate (X̄Γk)k≥0 for k =N(n−1, T )+1, . . . ,N(n,T ) in order to obtain a path of (X̄
on [0, T ].
(ii) Compute F ((X̄
t )t≥0) and use (4) to compute ν
(n+1)(ω,F ). Store the values of
(X̄Γk) for k = n+ 1, . . . ,N(n,T ).
Approximation of the distribution of a stationary Markov process 149
Remark 1. As shown in the description of the procedure, one generally has to store
the vector [X̄Γn , . . . , X̄ΓN(n,T) ] at time n. Since (γn) is a sequence with infinite sum that
decreases to 0, it follows that the size of this vector increases “slowly” to +∞. For
instance, if γn = Cn
−ρ with ρ ∈ (0,1), its size is of order nρ. However, it is important
to remark that even though the number of values to be stored tends to +∞, that is
not always the case for the number of operations at each step. Indeed, since X̄(n+1)
is obtained by shifting X̄(n), it is usually possible to use, at step n+ 1, the preceding
computations and to simulate the sequence (F (X̄(n)))n≥0 in a “quasi-recursive” way.
For instance, such remark holds for Asian options because the associated pay-off can be
expressed as a function of an additive functional (see Section 5 for simulations).
Before outlining the sequel of the paper, we list some notation linked to the spaces
D(R+,R
d) and D([0, T ],Rd) of cadlag Rd-valued functions on R+ and [0, T ], respectively,
endowed with the Skorokhod topology. First, we denote by d1 the Skorokhod distance
on D([0,1],Rd) defined for every α, β ∈D([0,1],Rd) by
d1(α,β) = inf
t∈[0,1]
|α(t)− β(λ(t))|, sup
0≤s<t≤1
λ(t)− λ(s)
where Λ1 denotes the set of increasing homeomorphisms of [0,1]. Second, for T > 0,
φT :D(R+,R
d) 7→D([0,1],Rd) is the function defined by (φT (α))(s) = α(sT ) for every s ∈
[0,1]. We then denote by d the distance on D(R+,R
d) defined for every α,β ∈D(R+,Rd)
d(α,β) =
e−t(1∧ d1(φt(α), φt(β))) dt. (6)
We recall that (D(R+,R
d), d) is a Polish space and that the induced topology is the usual
Skorokhod topology on D(R+,R
d) (see, e.g., Pagès [16]). For every T > 0, we set
σ(πu,0≤ u≤ s),
where πs :D(R+,R
d)→Rd is defined by πs(α) = α(s). For a functional F :D(R+,Rd)→
R, FT denotes the functional defined for every α ∈D(R+,Rd) by
FT (α) = F (α
T ) with αT (t) = α(t ∧ T ) ∀t≥ 0. (7)
Finally, we will say that a functional F :D(R+,R
d)→R is Sk-continuous if F is contin-
uous for the Skorokhod topology on D(R+,R
d) and the notation “
=⇒” will denote the
weak convergence on D(R+,R
In Section 2, we state our main results for a general Rd-valued Feller Markov process.
Then, in Section 3, we apply them to Brownian diffusions and Lévy-driven SDE’s. Section
4 is devoted to the proofs of the main general results. Finally, in Section 5, we complete
this paper with an application to option pricing in stationary stochastic volatility models.
150 G. Pagès and F. Panloup
2. General results
In this section, we state the results on convergence of the sequence (ν(n)(ω,dα))n≥1 when
(Xt) is a general Feller Markov process.
2.1. Weak convergence to the stationary regime
As explained in the Introduction, since the a.s. convergence of (ν
0 (ω,dx))n≥1 to the
invariant distribution ν0 has already been deeply studied for a large class of Markov
processes (Brownian diffusions and Lévy driven SDE’s), our approach will be to derive
the convergence of (ν(n)(ω,dα))n≥1 toward Pν0 from that of (ν
0 (ω,dx))n≥1 to the
invariant distribution ν0. More precisely, we will assume in Theorem 1 that
(C0,1): (Xt) admits a unique invariant distribution ν0 and
0 (ω,dx)
=⇒ ν0(dx) a.s.,
whereas in Theorem 2, we will only assume that
(C0,2): (ν
0 (ω,dx))n≥1 is a.s. tight on R
We also introduce three other assumptions, (C1), (C2) and (C3,ε), regarding the conti-
nuity in probability of the flow x 7→ (Xxt ), the asymptotic convergence of the shifted time
discretization scheme to the true process (Xt) and the steps and weights, respectively.
(C1): For every x0 ∈Rd, ǫ > 0 and T > 0,
limsup
0≤t≤T
|Xxt −Xx0t | ≥ ǫ
= 0. (8)
(C2): (X̄t) is a non-homogeneous Markov process and for every n≥ 0, it is possible to
construct a family of stochastic processes (Y
(n,x)
t )x∈Rd such that
(i) L(Y (n,x)) D(R+,R
= L(X̄(n)|X̄(n)0 = x);
(ii) for every compact set K of Rd, for every T ≥ 0,
0≤t≤T
|Y (n,x)t −Xxt |
n→+∞−→ 0 in probability. (9)
(C3,ε): For every n≥ 1, ηn ≤CγnHεn.
Remark 2. Assumption (C2) implies, in particular, that asymptotically and uniformly
on compact sets of Rd, the law of the approximate process (X̄(n)), given its initial value,
is close to that of the true process.
If there exists a unique invariant distribution ν0, the second part of (C2) can be relaxed
to the following, less stringent, assertion: for all ǫ > 0, there exists a compact set Aǫ ⊂Rd
such that ν0(A
ǫ)≤ ǫ and such that
0≤t≤T
|Y (n,x)t −Xxt |
n→+∞−→ 0 in probability. (10)
Approximation of the distribution of a stationary Markov process 151
This weaker assumption can some times be needed in stochastic volatility models like
the Heston model (see Section 5 for details).
The preceding assumptions are all that we require for the convergence of (ν(n)(ω,dα))n≥1
to Pν0 along the bounded Sk-continuous functionals, that is, for the a.s. weak conver-
gence on D(R+,R
d). However, the integration of non-bounded continuous functionals
F :D([0, T ],Rd)→ R will need some additional assumptions, depending on the stability
of the time discretization scheme and on the steps and weights sequences. We will sup-
pose that F is dominated (in a sense to be specified later) by a function V : Rd → R+
that satisfies the following assumptions for some s≥ 2 and ε < 1.
H(s, ε): For every T > 0,
(i) sup
0≤t≤T
Vs(Y (n,x)t )
≤CTVs(x),
(ii) sup
0 (V)<+∞,
(iii)
E[V2(X̄Γk−1)]<+∞,
∆N(k,T )
E[Vs(1−ε)(X̄Γk−1)]<+∞,
where T 7→CT is locally bounded on R+ and ∆N(k,T ) =N(k,T )−N(k− 1, T ).
For every ε < 1, we then set
K(ε) = {V ∈ C(Rd,R+),H(s, ε) holds for some s≥ 2}.
Remark 3. Apart from assumption (i), which is a classical condition on the finite time
horizon control, the assumptions in H(s, ε) strongly rely on the stability of the time
discretization scheme (and then, to that of the true process). More precisely, we will see
when we apply our general results to SDE’s that these properties are some consequences
of the Lyapunov assumptions needed for the tightness of (ν
0 (ω,dx))n≥1.
We can now state our first main result.
Theorem 1. Assume (C0,1), (C1), (C2) and (C3,ε) with ε ∈ (−∞,1). Then, a.s., for
every bounded Sk-continuous functional F :D(R+,R
d)→R,
ν(n)(ω,F )
n→+∞−→
F (α)Pν0 (dα), (11)
where Pν0 denotes the stationary distribution of (Xt) (with initial law ν0).
Furthermore, for every T > 0, for every non-bounded Sk-continuous functional
F :D(R+,R
d)→ R, (11) holds a.s. for FT (defined by (7)) if there exists V ∈ K(ε) and
152 G. Pagès and F. Panloup
ρ ∈ [0,1) such that
|FT (α)| ≤C sup
0≤t≤T
Vρ(αt) ∀α ∈D(R+,Rd). (12)
In the second result, the uniqueness of the invariant distribution is not required and
the sequence (ν
0 (ω,dx))n≥1 is only supposed to be tight.
Theorem 2. Assume (C0,2), (C1), (C2) and (C3,ε) with ε ∈ (−∞,1). Assume that
0 (ω,dx))n≥1 is a.s. tight on R
d. We then have the following.
(i) The sequence (ν(n)(ω,dα))n≥1 is a.s. tight on D(R+,R
d) and a.s., for ev-
ery convergent subsequence (nk(ω))n≥1, for every bounded Sk-continuous functional
F :D(R+,R
d)→R,
ν(nk(ω))(ω,F )
n→+∞−→
F (α)Pν∞(dα), (13)
where Pν∞ is the law of (Xt) with initial law ν∞ being a weak limits for (ν
0 (ω,dx))n≥1.
Furthermore, for every T > 0, for every non-bounded Sk-continuous functional
F :D(R+,R
d)→R, (13) holds a.s. for FT if (12) is satisfied with V ∈K(ε) and ρ ∈ [0,1).
(ii) If, moreover,
l≥k+1
|∆ηℓ|
n→+∞−→ 0, (14)
then ν∞ is necessarily an invariant distribution for the Markov process (Xt).
Remark 4. Condition (14) holds for a large class of steps and weights. For instance,
if ηn = C1n
−ρ1 and γn = C2n
−ρ2 with ρ1 ∈ [0,1] and ρ2 ∈ (0,1], then (14) is satisfied if
ρ1 = 0 or if ρ1 ∈ (max(0,2ρ2 − 1),1).
2.2. Extension to the non-stationary case
Even though the main interest of this algorithm is the weak approximation of the pro-
cess when stationary, we observe that when ν0 is known, the algorithm can be used to
approximate Pµ0 if µ0 is a probability on R
d that is absolutely continuous with respect
to ν0.
Indeed, assume that µ0(dx) = φ(x)ν0(dx), where φ :R
d → R is a continuous non-
negative function. For a functional F :D(R+,R
d)→ R, denote by Fφ the functional de-
fined on D(R+,R
d) by Fφ(α) = F (α)φ(α(0)).
Then, if ν(n)(ω,dα)
(Sk)⇒ Pν0(dα) a.s., we also have the following convergence: a.s., for
every bounded Sk-continuous functional F :D(R+,R
d)→R,
ν(n)(ω,Fφ)
n→+∞−→
Fφ(α)Pν0 (dα) =
F (α)Pµ0 (dα).
Approximation of the distribution of a stationary Markov process 153
3. Application to Brownian diffusions and
Lévy-driven SDE’s
Let (Xt)t≥0 be a cadlag stochastic process solution to the SDE
dXt = b(Xt−) dt+ σ(Xt−) dWt + κ(Xt−) dZt, (15)
where b :Rd → Rd, σ :Rd 7→Md,ℓ (set of d× ℓ real matrices) and κ :Rd 7→Md,ℓ are con-
tinuous functions with sublinear growth, (Wt)t≥0 is an ℓ-dimensional Brownian motion
and (Zt)t≥0 is an integrable purely discontinuous R
ℓ-valued Lévy process independent of
(Wt)t≥0 with Lévy measure π and characteristic function given for every t≥ 0 by
E[ei〈u,Zt〉] = exp
ei〈u,y〉 − 1− i〈u, y〉π(dy)
Let (γn)n≥1 be a non-increasing step sequence satisfying (2). Let (Un)n≥1 be a sequence
of i.i.d. random variables such that U1
=N (0, Iℓ) and let ξ := (ξn)n≥1 be a sequence of
independent Rℓ-valued random variables, independent of (Un)n≥1. We then denote by
(X̄t)t≥0 the stepwise constant Euler scheme of (Xt) for which (X̄Γn)n≥0 is recursively
defined by X̄0 = x ∈Rd and
X̄Γn+1 = X̄Γn + γn+1b(X̄Γn) +
γn+1σ(X̄Γn)Un+1 + κ(X̄Γn)ξn+1. (16)
We recall that the increments of (Zt) cannot be simulated in general. That is why we
generally need to construct the sequence (ξn) with some approximations of the true
increments. We will come back to this construction in Section 3.2.
As in the general case, we denote by (X̄(k))k≥0 and (ν
(n)(ω,dα))n≥1 the sequences of
associated shifted Euler schemes and empirical measures, respectively.
Let us now introduce some Lyapunov assumptions for the SDE. Let EQ(Rd) denote
the set of essentially quadratic C2-functions V :Rd → R∗+ such that limV (x) = +∞ as
|x| →+∞, |∇V | ≤C
V and D2V is bounded. Let a ∈ (0,1] denote the mean reversion
intensity. The Lyapunov (or mean reversion) assumption is the following.
(Sa): There exists a function V ∈ EQ(Rd) such that:
(i) |b|2 ≤CV a, Tr(σσ∗(x)) + ‖κ(x)‖2 |x|→+∞= o(V a(x));
(ii) there exist β ∈R and ρ > 0 such that 〈∇V, b〉 ≤ β − ρV a.
From now on, we separate the Brownian diffusions and Lévy-driven SDE cases.
3.1. Application to Brownian diffusions
In this part, we assume that κ= 0. We recall a result by Lamberton and Pagès [13].
Proposition 1. Let a ∈ (0,1] such that (Sa) holds. Assume that the sequence (ηn/γn)n≥1
is non-increasing.
154 G. Pagès and F. Panloup
(a) Let (θn)n≥1 be a sequence of positive numbers such that
n≥1 θnγn < +∞ and
that there exists n0 ∈N such that (θn)n≥n0 is non-increasing. Then, for every positive r,
θnγnE[V
r(X̄Γn−1)]<+∞.
(b) For every r > 0,
0 (ω,V
r)<+∞ a.s. (17)
Hence, the sequence (ν
0 (ω,dx))n≥1 is a.s. tight.
(c) Moreover, every weak limit of this sequence is an invariant probability for the SDE
(15). In particular, if (Xt)t≥0 admits a unique invariant probability ν0, then for every
continuous function f such that f ≤CV r with r > 0, limn→∞ ν(n)0 (ω, f) = ν0(f) a.s.
Remark 5. For instance, if V (x) = 1 + |x|2, then the preceding convergence holds for
every continuous function with polynomial growth. According to Theorem 3.2 in Lemaire
[14], it is possible to extend these results to continuous functions with exponential growth,
but it then strongly depends on σ. Further the conditions on steps and weights can be
less restrictive and may contain the case ηn = 1, for instance (see Remark 4 of Lamberton
and Pagès [13] and Lemaire [14]).
We then derive the following result from the preceding proposition and from Theorems
1 and 2.
Theorem 3. Assume that b and σ are locally Lipschitz functions and that κ = 0. Let
a ∈ (0,1] such that (Sa) holds and assume that (ηn/γn) is non-increasing.
(a) The sequence (ν(n)(ω,dα))n≥1 is a.s. tight on C(R+,Rd)3 and every weak limit
of (ν(n)(ω,dα))n≥1 is the distribution of a stationary process solution to (15). In par-
ticular, when uniqueness holds for the invariant distribution ν0, a.s., for every bounded
continuous functional F :C(R+,Rd)→R,
ν(n)(ω,F )
n→+∞−→
F (x)Pν0 (dx). (18)
(b) Furthermore, if there exists s ∈ (2,+∞) and n0 ∈N such that
∆N(k,T )
is non-increasing and
∆N(k,T )
<+∞, (19)
3C(R+,R
d) denotes the space of continuous functions on R+ with values in R
d endowed with the
topology of uniform convergence on compact sets.
Approximation of the distribution of a stationary Markov process 155
then, for every T > 0, for every non-bounded continuous functional F :C(R+,Rd)→ R,
(18) holds for FT if the following condition is satisfied:
∃r > 0 such that |FT (α)| ≤C sup
0≤t≤T
V r(αt) ∀α ∈ C(R+,Rd).
Remark 6. If ηn =C1n
−ρ1 and γn =C2n
−ρ2 with 0< ρ2 ≤ ρ1 ≤ 1, then for s ∈ (1,+∞),
(19) is fulfilled if and only if s > 1/(1− ρ1). It follows that there exists s ∈ (2,+∞) such
that (19) holds as soon as ρ1 < 1.
Proof of Theorem 3. We want to apply Theorem 2. First, by Proposition 1, assumption
(C0,2) is fulfilled and every weak limit of (ν
0 (ω,dx)) is an invariant distribution. Second,
it is well known that (C1) and (C2) are fulfilled when b and σ are locally Lispchitz
sublinear functions. Then, since (C3,ε) holds with ε = 0, (18) holds for every bounded
continuous functional F . Finally, one checks that H(s,0) holds with V := V r (r > 0).
It is classical that assumption (a) is true when b and σ are sublinear. Assumption (b)
follows from Proposition 1(b). Let θn,1 = ηn/(γnH
n) and θn,2 =∆N(n,T )/(γnH
n). Using
(19) and the fact that (ηn/γn) is non-increasing yields that (θn,1) and (θn,2) satisfy the
conditions of Proposition 1 (see (35) for details). Then, (iii) and (iv) of H(s,0) are
consequences of Proposition 1(a). This completes the proof. �
3.2. Application to Lévy-driven SDE’s
When we want to extend the results obtained for Brownian SDE’s to Lévy-driven SDE’s,
one of the main difficulties comes from the moments of the jump component (see Panloup
[18] for details). For simplification, we assume here that (Zt) has a moment of order
2p≥ 2, that is, that its Lévy measure π satisfies the following assumption with p≥ 1:
(H1p) :
|y|>1
π(dy)|y|2p <+∞.
We also introduce an assumption about the behavior of the moments of the Lévy measure
at 0:
(H2q) :
|y|≤1
π(dy)|y|2q <+∞, q ∈ [0,1].
This assumption ensures that (Zt) has finite 2q-variations. Since
|y|≤1
|y|2π(dy) is finite,
this is always satisfied for q = 1.
Let us now specify the law of (ξn) introduced in (16). When the increments of (Zt) can
be exactly simulated, we denote by (E) the Euler scheme and by (ξn,E) the associated
sequence
= Zγn ∀n≥ 1.
156 G. Pagès and F. Panloup
When the increments of (Zt) cannot be simulated, we introduce some approximated Euler
schemes (P) and (W) built with some sequences (ξn,P ) and (ξn,W ) of approximations of
the true increment (see Panloup [19] for more detailed presentations of these schemes).
In scheme (P),
=Zγn,n,
where (Z·,n)n≥1 a sequence of compensated compound Poisson processes obtained by
truncating the small jumps of (Zt)t≥0:
Zt,n :=
0<s≤t
∆Zs1{|∆Zs|>un} − t
|y|>un
yπ(dy) ∀t≥ 0, (20)
where (un)n≥1 is a sequence of positive numbers such that un → 0. We recall that
n→+∞−→ Z locally uniformly in L2 (see, e.g., Protter [21]).
As shown in Panloup [19], the error induced by this approximation is very large when
the local behavior of the small jumps component is irregular. However, it is possible to
refine this approximation by a Wienerization of the small jumps, that is, by replacing
the small jumps by a linear transform of a Brownian motion instead of discarding them
(see Asmussen and Rosinski [2]). The corresponding scheme is denoted by (W) with ξn,W
satisfying
= ξn,P +
γnQnΛn ∀n≥ 1,
where (Λn)n≥1 is a sequence of i.i.d. random variables, independent of (ξn,P )n≥1 and
(Un)n≥1, such that Λ1
=N (0, Iℓ) and (Qn) is a sequence of ℓ× ℓ matrices such that
n)i,j =
|y|≤uk
yiyjπ(dy).
We recall the following result obtained in Panloup [18] in our slightly simplified frame-
work.
Proposition 2. Let a ∈ (0,1], p≥ 1 and q ∈ [0,1] such that (H1p), (H2q) and (Sa) hold.
Assume that the sequence (ηn/γn)n≥1 is non-increasing. Then, the following assertions
hold for schemes (E), (P) and (W).
(a) Let (θn) satisfy the conditions of Proposition 1. Then,
n≥1 θnγnE[V
p+a−1(X̄Γn−1)]<
(b) We have
0 (ω,V
p/2+a−1)<+∞ a.s. (21)
Hence, the sequence (ν
0 (ω,dx))n≥1 is a.s. tight as soon as p/2+ a− 1> 0.
Approximation of the distribution of a stationary Markov process 157
(c) Moreover, if Tr(σσ∗)+ ‖κ‖2q ≤CV p/2+a−1, then every weak limit of this sequence
is an invariant probability for the SDE (15). In particular, if (Xt)t≥0 admits a unique
invariant probability ν0, for every continuous function f such that f = o(V
p/2+a−1),
limn→∞ ν
0 (ω, f) = ν0(f) a.s.
Remark 7. For schemes (E) and (P), the above proposition is a direct consequence of
Theorem 2 and Proposition 2 of Panloup [18]. As concerns scheme (W), a straightforward
adaptation of the proof yields the result.
Our main functional result for Lévy-driven SDE’s is then the following.
Theorem 4. Let a ∈ (0,1] and p≥ 1 such that p/2+ a− 1> 0 and let q ∈ [0,1]. Assume
(H1p), (H
q) and (Sa). Assume that b, σ and κ are locally Lipschitz functions. If, more-
over, (ηn/γn)n≥1 is non-increasing, then the following result holds for schemes (E), (P)
and (W).
(a) The sequence (ν(n)(ω,dα))n≥1 is a.s. tight on D(R+,R
d). Moreover, if
Tr(σσ∗) + ‖κ‖2q ≤CV p/2+a−1 or 1
l≥k+1
|∆ηℓ|
n→+∞−→ 0, (22)
then every weak limit of (ν(n)(ω,dα))n≥1 is the distribution of a stationary process solu-
tion to (15).
(b) Assume that the invariant distribution is unique. Let ε≤ 0 such that (C3,ε) holds.
Then, a.s., for every T > 0, for every Sk-continuous functional F :D(R+,R
d)→R, (18)
holds for FT if there exist ρ ∈ [0,1) and s≥ 2, such that
|FT (α)| ≤C sup
0≤t≤T
V (ρ(p+a−1))/s(αt) ∀α ∈D(R+,Rd)
and if
∆N(k,T )
s(1−ε)
is non-increasing and
∆N(k,T )
s(1−ε)
<+∞. (23)
Remark 8. In (22), both assumptions imply the invariance of every weak limit of
0 (ω,dx)). These two assumptions are very different. The first is needed in Proposition
2 for using the Echeverria–Weiss invariance criteria (see Ethier and Kurtz [7], page 238,
Lamberton and Pagès [12] and Lemaire [14]), whereas the second appears in Theorem
2, where our functional approach shows that under some mild additional conditions on
steps and weights, every weak limit is always invariant.
For (23), we refer to Remark 6 for simple sufficient conditions when (γn) and (ηn) are
some polynomial steps and weights.
158 G. Pagès and F. Panloup
4. Proofs of Theorems 1 and 2
We begin the proof with some technical lemmas. In Lemma 1, we show that the a.s
weak convergence of the random measures (ν(n)(ω,dα))n≥1 can be characterized by the
convergence (11) along the set of bounded Lipschitz functionals F for the distance d.
Then, in Lemma 2, we show with some martingale arguments that if the functional
F depends only on the restriction of the trajectory to [0, T ], then the convergence of
(ν(n)(ω,F ))n≥1 is equivalent to that of a more regular sequence. This step is fundamental
for the sequel of the proof.
Finally, Lemma 4 is needed for the proof of Theorem 2. We show that under some mild
conditions on the step and weight sequences, any Markovian weak limit of the sequence
(ν(n)(ω,dα))n≥1 is stationary.
4.1. Preliminary lemmas
Lemma 1. Let (E,d) be a Polish space and let P(E) denote the set of probability
measures on the Borel σ-field B(E), endowed with the weak convergence topology. Let
(µ(n)(ω,dα))n≥1 be a sequence of random probabilities defined on Ω×B(E).
(a) Assume that there exists µ(∞) ∈ P(E) such that for every bounded Lipschitz func-
tion F :E→R,
µ(n)(ω,F )
n→+∞−→ µ(∞)(F ) a.s. (24)
Then, a.s., (µ(n)(ω,dα))n≥1 converges weakly to µ
(∞) on P(E).
(b) Let U be a subset of P(E). Assume that for every sequence (Fk)k≥1 of Lipschitz
and bounded functions, a.s., for every subsequence (µ(φω(n))(ω,dα)), there exists a sub-
sequence (µ(φω◦ψω(n))(ω,dα)) and a U -valued random probability µ(∞)(ω,dα) such that
for every k ≥ 1,
µ(ψω◦φω(n))(ω,Fk)
n→+∞−→ µ(∞)(ω,Fk) a.s. (25)
Then, (µ(n)(ω,dα))n≥1 is a.s. tight with weak limits in U .
Proof. We do not give a detailed proof of the next lemma, which is essentially based
on the fact that in a separable metric space (E,d), one can build a sequence of bounded
Lipschitz functions (gk)k≥1 such that for any sequence (µn)n≥1 of probability measures
on B(E), (µn)n≥1 weakly converges to a probability µ if and only if the convergence
holds along the functions gk, k ≥ 1 (see Parthasarathy [22], Theorem 6.6, page 47 for a
very similar result). �
For every n≥ 0, for every T > 0, we introduce τ(n,T ) defined by
τ(n,T ) := min{k ≥ 0,N(k,T )≥ n}=min{k ≤ n,Γk + T ≥ Γn}. (26)
Approximation of the distribution of a stationary Markov process 159
Note that for k ∈ {0, . . . , τ(n,T )− 1}, {X̄(k)t ,0≤ t≤ T } is �FΓn -measurable and
T − γτ(n,T )−1 ≤ Γn − Γτ(n,T ) ≤ T.
Lemma 2. Assume (C3,ε) with ε < 1. Let F :D(R+,R
d)→R be a Sk-continuous func-
tional. Let (Gk) be a filtration such that F̄Γk ⊂ Gk for every k ≥ 1. Then, for any T > 0:
(a) if FT (defined by (7)) is bounded,
ηk(FT (X̄
(k−1))−E[FT (X̄(k−1))/Gk−1])
n→+∞−→ 0 a.s.; (27)
(b) if FT is not bounded, (27) holds if there exists V :Rd→R+, satisfying H(s, ε) for
some s≥ 2, such that |FT (α)| ≤C sup0≤t≤T V(αt) for every α ∈D(R+,Rd); furthermore,
ν(n)(ω,FT )<+∞ a.s. (28)
Proof. We prove (a) and (b) simultaneously. Let Υ(k) be defined by Υ(k) = FT (X̄
(k)).
We have
(k−1) −E[Υ(k−1)/Gk−1])
(k−1) −E[Υ(k−1)/Gn]) (29)
ηk(E[Υ
(k−1)/Gn]−E[Υ(k−1)/Gk−1]). (30)
We have to prove that the right-hand side of (29) and (30) tend to 0 a.s. when n→+∞.
We first focus on the right-hand side of (29). From the very definition of τ(n,T ), we
have that {X̄(k)t ,0≤ t≤ T } is F̄Γn -measurable for k ∈ {0, . . . , τ(n,T )− 1}. Hence, since
FT is σ(πs,0≤ s≤ T )-measurable and F̄Γn ⊂ Gn, it follows that Υ(k) is Gn-measurable
and that Υ(k) = E[Υ(k)/Gn] for every k ≤ τ(n,T )− 1. Then, if FT is bounded, we derive
from (C3,ε) that
(k−1) −E[Υ(k−1)/Gn])
≤ 2‖FT ‖sup
k=τ(n,T )+1
k=τ(n,T )+1
H1−εn
(Γn − Γτ(n,T ))
160 G. Pagès and F. Panloup
≤ C(T )
H1−εn
n→+∞−→ 0 a.s.,
where we used the fact that (Hn)n≥1 and (γn)n≥1 are non-decreasing and non-increasing
sequences, respectively.
Assume, now, that the assumptions of (b) are fulfilled with V satisfying H(s, ε) for
some s≥ 2 and ε < 1. By the Borel–Cantelli-like argument, it suffices to show that
k=τ(n,T )+1
(k−1) −E[Υ(k−1)/Gn])
<+∞. (31)
Let us prove (31). Let ak := η
(s−1)/s
k and bk(ω) := η
(k−1) − E[Υ(k−1)/Gn]). The
Hölder inequality applied with p̄= s/(s− 1) and q̄ = s yields
k=τ(n,T )+1
akbk(ω)
k=τ(n,T )+1
)s−1( n
k=τ(n,T )+1
ηk|Υ(k−1) −E[Υ(k−1)/Gn]|s
Now, since FT (α) ≤ sup0≤t≤T V(α), it follows from the Markov property and from
H(s, ε)(i) that
E[|FT (X̄(k))|s/F̄Γk ]≤CE
0≤t≤T
Vs(X̄(k)t )/F̄Γk
≤CTVs(X̄Γk).
Then, using the two preceding inequalities and (C3,ε) yields
k=τ(n,T )+1
(k−1) −E[Υ(k−1)/Gn])
k=τ(n,T )+1
)s−1( n
k=τ(n,T )+1
ηkE[Vs(X̄Γk−1)]
k=τ(n,T )+1
k=τ(n,T )+1
Vs(X̄Γk−1)
k=τ(n,T )+1
t∈[0,S(n,T )]
Vs(X̄τ(n,T )t )
where S(n,T ) = Γn−1 − Γτ(n,T ) and C does not depend n. By the definition of τ(n,T ),
S(n,T )≤ T . Then, again using H(s, ε)(i) yields
k=τ(n,T )
(k−1) −E[Υ(k−1)/Gn])
s(1−ε)
E[Vs(X̄(τ(n,T )))].
Approximation of the distribution of a stationary Markov process 161
Since n 7→ N(n,T ) is an increasing function, n 7→ τ(n,T ) is a non-decreasing function
and Card{n, τ(n,T ) = k}=∆N(k+1, T ) :=N(k+1, T )−N(k,T ). Then, since n 7→Hn
increases, a change of variable yields
k=τ(n,T )+1
(k−1) −E[Υ(k−1)/Gn])
∆N(k,T )
s(1−ε)
E[Vs(X̄Γk−1)]<+∞,
by H(s, ε)(iv).
Second, we prove that (30) tends to 0. For every n≥ 1, we let
(E[Υ(k−1)/Gn]−E[Υ(k−1)/Gk−1]). (32)
The process (Mn)n≥1 is a (Gn)-martingale and we want to prove that this process is
L2-bounded. Set Φ(k,n) = E[FT (X̄
(k))/Gn]− E[FT (X̄(k))/Gk]. Since FT is σ(πs,0 ≤ s ≤
T )-measurable, the random variable Φ(k,n) is F̄ΓN(k,T) -measurable. Then, for every i ∈
{N(k,T ), . . . , n}, Φ(k,n) is Gi-measurable so that
E[Φ(i,n)Φ(k,n)] =E[Φ(k,n)E[Φ(i,n)/Gi]] = 0.
It follows that
E[M2n] =
E[(Φ(k−1,n))
] + 2
N(k−1,T )∧n
i=k+1
E[Φ(i−1,n)Φ(k−1,n)]. (33)
Then,
E[M2n] ≤
E[(Φ(k−1,n))
] + 2
N(k−1,T )
i=k+1
E[Φ(i−1,n)Φ(k−1,n)]
H2−εk
E[(Φ(k−1,n))
] (34)
H2−εk
N(k−1,T )
i=k+1
γi sup
E[Φ(i−1,n)Φ(k−1,n)]
162 G. Pagès and F. Panloup
where, in the second inequality, we used assumption (C3,ε) and the decrease of i 7→
1/H1−εi . Hence, if FT is bounded, using the fact that
∑N(k−1,T )
i=k+1 γi ≤ T yields
E[M2n]≤C
H2−εk
H2−ε1
<+∞ (35)
since ε < 1. Assume, now, that the assumptions of (b) hold and let FT be dominated
by a function V satisfying H(s, ε). By the Markov property, the Jensen inequality and
H(s, ε)(i),
E[(Φ(k,n))
0≤t≤T
V2(X̄(k)t )/F̄Γk
≤CTE[V2(X̄Γk)].
We then derive from the Cauchy–Schwarz inequality that for every n, k ≥ 1, for every
i ∈ {k, . . . ,N(k,T )},
|E[Φ(i,n)Φ(k,n)]| ≤C
E[V2(X̄Γi)]
E[V2(X̄Γk)]≤C sup
t∈[0,T ]
E[V2(X̄(k)t )]≤CE[V2(X̄Γk)],
where, in the last inequality, we once again used H(s, ε)(i). It follows that
E[M2n]≤C
H2−εk
E[V2(X̄Γk−1)]<+∞,
by H(s, ε)(iii). Therefore, (34) is finite and (Mn) is bounded in L
2. Finally, we derive
from the Kronecker lemma that
ηk(E[FT (X̄
(k−1))/Gn]−E[FT (X̄(k−1))/Gk−1])
n→+∞−→ 0 a.s.
As a consequence, supn≥1 ν
(n)(ω,FT )<+∞ a.s. if and only if
E[FT (X̄
(k−1))/Fk−1]<+∞ a.s.
This last property is easily derived from H(s, ε)(i) and (ii). This completes the proof. �
Lemma 3. (a) Assume (C1) and let x0 ∈Rd. We then have limx→x0 E[d(Xx,Xx0)] = 0.
In particular, for every bounded Lispchitz (w.r.t. the distance d) functional F :D(R+,R
R, the function ΦF defined by ΦF (x) = E[F (Xx)] is a (bounded) continuous function on
(b) Assume (C2). For every compact set K ⊂Rd,
E[d(Y n,x,Xx)]
n→+∞−→ 0. (36)
Approximation of the distribution of a stationary Markov process 163
Set ΦFn (x) = E[F (Y
n,x)]. Then, for every bounded Lispchitz functional F :D(R+,R
d)→R,
|ΦF (x)−ΦFn (x)|
n→+∞−→ 0 for every compact set K ⊂Rd. (37)
Proof. (a) By the definition of d, for every α, β ∈D(R+,Rd) and for every T > 0,
d(α,β)≤
1∧ sup
0≤t≤T
|α(t)− β(t)|
+ e−T . (38)
It easily follows from assumption (C1) and from the dominated convergence theorem
limsup
E[d(Xx,Xx0)]≤ e−T for every T > 0.
Letting T →+∞ implies that limx→x0 E[d(Xx,Xx0)] = 0.
(b) We deduce from (38) and from assumption (C2) that for every compact setK ⊂Rd,
for every T > 0,
limsup
E[d(Y n,x,Xx)]≤ e−T .
Letting T →+∞ yields (36). �
Lemma 4. Assume that (ηn)n≥1 and (γn) satisfy (C3,ε) with ε < 1 and (14). Then:
(i) for every t≥ 0, for every bounded continuous function f :Rd→R,
t (ω, f)− ν
0 (ω, f)
n→+∞−→ 0 a.s.;
(ii) if, moreover, a.s., every weak limit ν(∞)(ω,dα) of (ν(n)(ω,dα))n≥1 is the dis-
tribution of a Markov process with semigroup (Qωt )t≥0, then, a.s., ν
(∞)(ω,dα) is the
distribution of a stationary process.
Proof. (i) Let f :Rd →R be a bounded continuous function. Since X̄(k)t = X̄ΓN(k,t) , we
t (ω, f)− ν
0 (ω, f) =
ηk(f(X̄ΓN(k−1,t))− f(X̄Γk−1)).
From the very definition of N(n,T ) and τ(n,T ), one checks that N(k − 1, T )≤ n− 1 if
and only if τ(n,T )≥ k. Then,
ηkf(X̄Γk−1) =
τ(n,t)
ηN(k−1,t)+1f(X̄ΓN(k−1,t))
ηkf(X̄Γk−1)1{k−1/∈N({0,...,n},t)}.
164 G. Pagès and F. Panloup
It follows that
t (ω, f)− ν
0 (ω, f) =
τ(n,t)
(ηk − ηN(k−1,t)+1)f(X̄ΓN(k−1,t))
τ(n,t)+1
ηkf(X̄ΓN(k−1,t))
ηkf(X̄Γk−1)1{k−1/∈N({0,...,n},t)}.
Then, since f is bounded and since
ηk1{k−1/∈N({0,...,n},t)} =
τ(n,t)
ηN(k−1,t)+1
τ(n,t)
|ηk − ηN(k−1,t)+1|+
k=τ(n,t)+1
we deduce that
|ν(n)t (ω, f)− ν
0 (ω, f)| ≤ 2‖f‖∞
τ(n,t)
|ηk − ηN(k−1,t)+1|+
k=τ(n,t)+1
Hence, we have to show that the sequences of the right-hand side of the preceding in-
equality tend to 0. On the one hand, we observe that
|ηk − ηN(k−1,t)+1| ≤
N(k−1,T )+1
ℓ=k+1
|ηℓ − ηℓ−1| ≤ max
ℓ≥k+1
|∆ηℓ|
N(k−1,T )+1
Using the fact that
∑N(k−1,T )+1
ℓ=k γℓ ≤ T + γ1 and condition (14) yields
τ(n,t)
|ηk − ηN(k−1,t)+1|
n→+∞−→ 0.
On the other hand, by (C3,ε), we have
k=τ(n,T )+1
H1−εn
k=τ(n,T )+1
H1−εn
n→+∞−→ 0 a.s.,
which completes the proof of (i).
Approximation of the distribution of a stationary Markov process 165
(ii) Let Q+ denote the set of non-negative rational numbers. Let (fℓ)ℓ≥1 be an every-
where dense sequence in CK(Rd) endowed with the topology of uniform convergence on
compact sets. Since Q+ and (fℓ)ℓ≥1 are countable, we derive from (i) that there exists
Ω̃⊂Ω such that P(Ω̃) = 1 and such that for every ω ∈ Ω̃, every t ∈Q+ and every ℓ≥ 1,
t (ω, fℓ)− ν
0 (ω, fℓ)
n→+∞−→ 0.
Let ω ∈ Ω̃ and let ν(∞)(ω,dα) denote a weak limit of (ν(n)(ω,dα))n≥1. We have
t (ω, fℓ) = ν
0 (ω, fℓ) ∀t ∈Q+ ∀ℓ≥ 1
and we easily deduce that
t (ω, f) = ν
0 (ω, f) ∀t ∈R+ ∀f ∈ CK(Rd).
Hence, if ν(∞)(ω,dα) is the distribution of a Markov process (Yt) with semigroup (Q
t )t≥0,
we have, for all f ∈ CK(Rd),
Qωt f(x)ν
0 (ω,dx) =
f(x)ν
0 (ω,dx) ∀t≥ 0.
0 (ω,dx) is then an invariant distribution for (Yt). This completes the proof. �
4.2. Proof of Theorem 1
Thanks to Lemma 1(a) applied with E =D(R+,R
d) and d defined by (6),
ν(n)(ω,dα)
=⇒ Pν0(dα) a.s.⇐⇒ ν(n)(ω,F )
n→+∞−→
F (x)Pν0 (dx) a.s. (39)
for every bounded Lipschitz functional F :D(R+,R
d)→ R. Now, consider such a func-
tional. By the assumptions of Theorem 1, we know that a.s., (ν
0 (ω,dx))n≥1 converges
weakly to ν0. Set Φ
F (x) := E[F (Xx)], x ∈Rd. By Lemma 3(a), ΦF is a bounded contin-
uous function on Rd. It then follows from (C0,1) that
F (X̄
(k−1)
n→+∞−→
ΦF (x)ν0(dx) =
F (x)Pν0 (dx) a.s.
Hence, the right-hand side of (39) holds for F as soon as
ηk(F (X̄
(k−1))−ΦF (X̄(k−1)0 ))
n→+∞−→ 0 a.s. (40)
166 G. Pagès and F. Panloup
Let us prove (40). First, let T > 0 and let FT be defined by (7). By Lemma 2,
ηkFT (X̄
(k−1))− 1
ηkE[FT (X̄
(k−1))/F̄Γk−1 ]
n→+∞−→ 0 a.s. (41)
With the notation of Lemma 3(b), we derive from assumption (C2)(i) that
E[FT (X̄
(k−1))/F̄Γk−1 ] = Φ
k (X̄
(k−1)
Let N ∈N. On one hand, by Lemma 3(b),
k (X̄
(k−1)
0 )−ΦFT (X̄
(k−1)
0 ))1{|X̄(k−1)
n→+∞−→ 0 a.s. (42)
On the other hand, the tightness of (ν
0 (ω,dx))n≥1 on R
d yields
ψ(ω,N) := sup
0 (ω, (B(0,N)
N→+∞−→ 0 a.s.
It follows that, a.s.,
ηk|ΦFTk (X̄
(k−1)
0 )−ΦFT (X̄
(k−1)
0 )|1{|X̄(k−1)
≤ 2‖F‖∞ψ(ω,N)
N→+∞−→ 0.
Hence, a combination of (42) and (43) yields
∀T > 0 1
k (X̄
(k−1)
0 )−ΦFT (X̄
(k−1)
n→+∞−→ 0 a.s. (44)
Finally, let (Tℓ)ℓ≥1 be a sequence of positive numbers such that, Tℓ→+∞ when ℓ→+∞.
Combining (44) and (41), we obtain that, a.s., for every ℓ≥ 1,
limsup
ηk(F (X̄
(k−1))−ΦF (X̄(k−1)))
≤ lim sup
ηk(F (X̄
(k−1))−FTℓ(X̄(k−1)))
+ limsup
FTℓ (X̄
(k−1)
0 )−ΦF (X̄
(k−1)
Approximation of the distribution of a stationary Markov process 167
By the definition of d, |F − FTℓ | ≤ e−Tℓ . Then, a.s.,
limsup
ηk(F (X̄
(k−1))−ΦF (X̄(k−1)0 ))
≤ 2e−Tℓ ∀ℓ≥ 1.
Letting ℓ→+∞ implies (40).
The generalization to non-bounded functionals in Theorem 1 is then derived from (28)
and from a uniform integrability argument.
4.3. Proof of Theorem 2
(i) We want to prove that the conditions of Lemma 1(b) are fulfilled. Since (ν
0 (ω,dx))n≥1
is supposed to be a.s. tight, one can check that for every bounded Lipschitz functional
F :D(R+,R
d)→R, (40) is still valid. Then, let (Fℓ)ℓ≥1 be a sequence of bounded Lipschitz
functionals. There exists Ω̃⊂Ω with P(Ω̃) = 1 such that for every ω ∈ Ω̃, (ν(n)0 (ω,dx))n≥1
is tight and
ηk(Fℓ(X̄
(k−1)(ω))−ΦFℓ(X̄(k−1)0 (ω)))
n→+∞−→ 0 ∀ℓ≥ 1. (45)
Let ω ∈ Ω̃ and let φω :N 7→N be an increasing function. As (ν(φω(n))0 (ω,dx))n≥1 is tight,
there exists a convergent subsequence (ν
(φω◦ψω(n))
0 (ω,dx))n≥1. We denote its weak limit
by ν∞. Since Φ
Fℓ is continuous for every ℓ≥ 1 (see Lemma 3(a)),
(φω◦ψω(n))
0 (ω,Φ
n→+∞−→ ν∞(ΦFℓ) =
Fℓ(α)Pν∞(dα) ∀ℓ≥ 1.
We then derive from (45) that for every ℓ≥ 1
ν(φω◦ψω(n))(ω,Fℓ)
n→+∞−→
Fℓ(α)Pν∞(dα).
It follows that the conditions of Lemma 1(b) are fulfilled with U = {Pµ, µ ∈ I}, where
µ ∈P(Rd),∃ω ∈ Ω̃ and an increasing function φ :N 7→N, µ= lim
ν(φ(n))(ω,dα)
Hence, by Lemma 1(b), we deduce that (ν(n)(ω,dα))n≥1 is a.s. tight with U -valued limits.
Finally, Theorem 2(ii) is a consequence of condition (14) and Lemma 4(ii).
168 G. Pagès and F. Panloup
5. Path-dependent option pricing in stationary
stochastic volatility models
In this section, we propose a simple and efficient method to price options in stationary
stochastic volatility (SSV) models. In most stochastic volatility (SV) models, the volatil-
ity is a mean reverting process. These processes are generally ergodic with a unique
invariant distribution (the Heston model or the BNS model for instance (see below) but
also the SABR model (see Hagan et al. [8]), . . .). However, they are usually considered
in SV models under a non-stationary regime, starting from a deterministic value (which
usually turns out to be the mean of their invariant distribution). However, the instanta-
neous volatility is not easy to observe on the market since it is not a traded asset. Hence,
it seems to be more natural to assume that it evolves under its stationary regime than
to give it a deterministic value at time 0.4
From a purely calibration viewpoint, considering an SV model in its SSV regime will
not modify the set of parameters used to generate the implied volatility surface, although
it will modify its shape, mainly for short maturities. This effect can in fact be an asset
of the SSV approach since it may correct some observed drawbacks of some models (see,
e.g., the Heston model below).
From a numerical point of view, considering SSV models is no longer an obstacle, es-
pecially when considering multi-asset models (in the unidimensional case, the stationary
distribution can be made more or less explicit like in the Heston model; see below) since
our algorithm is precisely devised to compute by simulation some expectations of func-
tionals of processes under their stationary regime, even if this stationary regime cannot
be directly simulated.
As a first illustration (and a benchmark) of the method, we will describe in detail
the algorithm for the pricing of Asian options in a Heston model. We will then show
in our numerical results to what extent it differs, in terms of smile and skew, from the
usual SV Heston model for short maturities. Finally, we will complete this section with
a numerical test on Asian options in the BNS model where the volatility is driven by a
tempered stable subordinator. Let us also mention that this method can be applied to
other fields of finance like interest rates, and commodities and energy derivatives where
mean-reverting processes play an important role.
4When one has sufficiently close observations of the stock price, it is in fact possible to derive a rough
idea of the size of the volatility from the variations of the stock price (see, e.g., Jacod [10]). Then, using
this information, a good compromise between a deterministic initial value and the stationary case may
be to assume that the distribution µ0 of the volatility at time 0 is concentrated around the estimated
value (see Section 2.2 for application of our algorithm in this case).
Approximation of the distribution of a stationary Markov process 169
5.1. Option pricing in the Heston SSV model
We consider a Heston stochastic volatility model. The dynamic of the asset price process
(St)t≥0 is given by S0 = s0 and
dSt = St(rdt+
(1− ρ2)vt dW 1t + ρ
vt dW
dvt = k(θ− vt) dt+ ς
vt dW
where r denotes the interest rate, (W 1,W 2) is a standard two-dimensional Brownian
motion, ρ ∈ [−1,1] and k, θ and ς are some non-negative numbers. This model was
introduced by Heston in 1993 (see Heston [9]). The equation for (vt) has a unique (strong)
pathwise continuous solution living in R+. If, moreover, 2kθ > ς
2, then (vt) is a positive
process (see Lamberton and Lapeyre [11]). In this case, (vt) has a unique invariant
probability ν0. Moreover, ν0 = γ(a, b) with a= (2k)/ς
2 and b= (2kθ)/ς2. In the following,
we will assume that (vt) is in its stationary regime, that is, that
L(v0) = ν0.
5.1.1. Option price and stationary processes
Using our procedure to price options in this model naturally needs to express the option
price as the expectation of a functional of a stationary stochastic process.
Näıve method. (may work) Since (vt)t≥0 is stationary, the first idea is to express the
option price as the expectation of a functional of (vt)t≥0: by Itô calculus, we have
St = s0 exp
rt− 1
vs ds
vs dW
1− ρ2
vs dW
. (46)
Since
vs dW
s =Λ(t, (vt)) :=
vt − v0 − kθt+ k
vs ds
it follows by setting Mt =
vs dW
s that
St =Ψ(t, (vs), (Ms)), (47)
where Ψ is given for every t≥ 0, u and w ∈ C(R+,R) by
Ψ(t, u,w) = s0 exp
rt− 1
u(s) ds
+ ρΛ(t, u) +
1− ρ2w(t)
Then, let F :C(R+,R) → R be a non-negative measurable functional. Conditioning by
FW 2T yields
E[FT ((St)t≥0)] = E[F̃T ((vt)t≥0)],
170 G. Pagès and F. Panloup
where, for every u ∈ C(R+,R),
F̃T (u) = E
t, u,
u(s) dW 1s
For some particular options such as the European call or put (thanks to the Black–
Scholes formula), the functional F̃ is explicit. In those cases, this method seems to be
very efficient (see Panloup [20] for numerical results). However, in the general case, the
computation of F̃ will need some Monte Carlo methods at each step. This approach is
then very time-consuming in general – that is why we are going to introduce another
representation of the option as a functional of a stationary process.
General method. (always works) We express the option premium as the expectation
of a functional of a two-dimensional stationary stochastic process. This method is based
on the following idea. Even though (vt,Mt) is not stationary, (St) can be expressed as a
functional of a stationary process (vt, yt). Indeed, consider the following SDE given by
dyt =−yt dt+
vt dW
dvt = k(θ− vt) dt+ ς
vt dW
First, one checks that the SDE has a unique strong solution and that assumption (S1) is
fulfilled with V (x1, x2) = 1+ x
2. This ensures the existence of an invariant distribu-
tion ν̃0 for the SDE (see, e.g., Pagès [17]). Then, since (vt) is positive and has a unique
invariant distribution, the uniqueness of the invariant distribution follows. Then, assume
that L(y0, v0) = ν̃0. Since (vt,Mt) = (vt, yt − y0 +
ys ds), we have, for every positive
measurable functional F :C(R+,R)→R,
E[FT ((St)t≥0)] = E[FT ((ψ(t, vt,Mt))t≥0)]
= Eν̃0
t, vt, yt − y0 +
ys ds
where Pν̃0 is the stationary distribution of the process (vt, yt). Every option price can
then be expressed as the expectation of an explicit functional of a stationary process. We
will develop this second general approach in the numerical tests below.
Remark 9. The idea of the second method holds for every stochastic volatility model
for which (St) can be written as follows:
St =Φ
t, vt,
hi(|vs|) dY is
, (50)
where, for every i ∈ {1, . . . , p}, hi :R+ →R is a positive function such that hi(x) = o(|x|)
as |x| → +∞, (Y it ) is a square-integrable centered Lévy process and (vt) is a mean
reverting stochastic process solution to a Lévy driven SDE.
Approximation of the distribution of a stationary Markov process 171
In some complex models, showing the uniqueness of the invariant distribution may be
difficult. In fact, it is important to note at this stage that the uniqueness of the invariant
distribution for the couple (vt, yt) is not required. Indeed, by construction, the local
martingale (Mt) does not depend on the choice of y0. It follows that if L(y0, v0) = µ̃,
with µ̃ constructed such that L(v0) = ν0, (49) still holds. This implies that it is only
necessary that uniqueness holds for the invariant distribution of the stochastic volatility
process.
5.1.2. Numerical tests on Asian options
We recall that (vt) is a Cox–Ingersoll–Ross process. For this type of processes, it is well
known that the genuine Euler scheme cannot be implemented since it does not preserve
the non-negativity of the (vt). That is why some specific discretization schemes have
been studied by several authors (Alfonsi [1], Deelstra and Delbaen [5] and Berkaoui et al.
[4, 6]). In this paper, we consider the scheme studied by the last authors in a decreasing
step framework. We denote it by (v̄t). We set v̄0 = x > 0 and
v̄Γn+1 = |v̄Γn + kγn+1(θ− v̄Γn) + ς
v̄Γn(W
−W 2Γn)|.
We also introduce the stepwise constant Euler scheme (ȳt) of (yt)t≥0 defined by
ȳΓn+1 = ȳΓn − γn+1ȳΓn +
v̄Γn(W̃
− W̃ 1Γn), ȳ0 = y ∈R
Denote by (v̄
t ) and (ȳ
t ) the shifted processes defined by v̄
t := v̄Γk+t and ȳ
ȳΓk+t, and let (ν
(n)(ω,dα))n≥1 be the sequence of empirical measures defined by
ν(n)(ω,dα) =
ηk1{(v̄(k−1),ȳ(k−1))∈dα}.
The specificity of both the model and the Euler scheme implies that Theorems 1 and 2
cannot be directly applied here. However, a specific study using the fact that (9) holds
for every compact set of R∗+ ×R when 2kθ/ς2 > 1+ 2
6/ς (see Theorem 2.2 of Berkaoui
et al. [4] and Remark 9) shows that
ν(n)(ω,dα)
=⇒ Pν̃0(dα) a.s.
when 2kθ/ς2 > 1+ 2
6/ς . Details are left to the reader.
Let us now state our numerical results obtained for the pricing of Asian options with
this discretization. We denote by Cas(ν0,K,T ) and Pas(ν0,K,T ) the Asian call and put
prices in the SSV Heston model. We have
Cas(ν0,K,T ) = e
Ss ds−K
172 G. Pagès and F. Panloup
Pas(ν0,K,T ) = e
K − 1
Ss ds
With the notation of (49), approximating Cas(ν0,K,T ) and Pas(ν0,K,T ) by our proce-
dure needs to simulate the sequences (Cnas)n≥1 and (P
as)n≥1 defined by
Cnas =
Ψ(s, v̄(k−1), M̄ (k−1)) ds−K
Pnas =
K − 1
Ψ(s, v̄(k−1), M̄ (k−1)) ds
These sequences can be computed by the method developed in Section 1.3. Note that
the specific properties of the exponential function and the linearity of the integral imply
that (
Ψ(t, v̄(n−1), M̄ (n−1)) ds) can be computed quasi-recursively.
Let us state our numerical results for the Asian call with parameters
s0 = 50, r = 0.05, T = 1, ρ= 0.5,
θ = 0.01, ς = 0.1, k = 2.
We also assume that K ∈ {44, . . . ,56} and choose the following steps and weights: γn =
ηn = n
−1/3. In Table 1, we first state the reference value for the Asian call price obtained
for N = 108 iterations. In the two following lines, we state our results for N = 5.104 and
N = 5.105 iterations. Then, in the last lines, we present the numerical results obtained
Table 1. Approximation of the Asian call price
K 44 45 46 47 48 49 50
Asian call (ref.) 6.92 5.97 5.04 4.12 3.25 2.46 1.78
N = 5 · 104 6.89 6.07 5.07 4.13 3.18 2.49 1.77
N = 5 · 105 6.90 6.02 5.00 4.11 3.24 2.46 1.79
N = 5 · 104 (CP parity) 6.92 5.96 5.04 4.13 3.26 2.46 1.78
N = 5 · 105 (CP parity) 6.92 5.97 5.04 4.12 3.25 2.47 1.78
K 51 52 53 54 55 56
Asian call (ref.) 1.23 0.82 0.53 0.33 0.21 0.12
N = 5 · 104 1.21 0.81 0.51 0.34 0.22 0.11
N = 5 · 105 1.23 0.82 0.53 0.33 0.21 0.13
N = 5 · 104 (CP parity) 1.23 0.82 0.53 0.31 0.21 0.12
N = 5 · 105(CP parity) 1.23 0.82 0.53 0.33 0.21 0.13
Approximation of the distribution of a stationary Markov process 173
using the call-put parity
Cas(ν0,K,T )− Pas(ν0, S0,K,T ) =
(1− e−rT )−Ke−rT (52)
as a means of variance reduction. The computation times for N = 5.104 and N = 5.105
(using MATLAB with a Xeon 2.4 GHz processor) are about 5 s and 51 s, respectively. In
particular, the complexity is quasi-linear and the additional computations needed when
we use the call-put parity are negligible.
5.2. Implied volatility surfaces of Heston SSV and SV models
Given a particular pricing model (with initial value s0 and interest rate r) and its asso-
ciated European call prices denoted by Ceur(K,T ), we recall that the implied volatility
surface is the graph of the function (K,T ) 7→ σimp(K,T ), where σimp(K,T ) is defined for
every maturity T > 0 and strike K as the unique solution of
CBS(s0,K,T, r, σimp(K,T )) =Ceur(K,T ),
where CBS(s0,K,T, r, σ) is the price of the European call in the Black–Scholes model
with parameters s0, r and σ. When Ceur(K,T ) is known, the value of σimp(K,T ) can be
numerically computed using the Newton method or by dichotomy if the first method is
not convergent.
In this last part, we compare the implied volatility surfaces induced by the SSV and SV
Heston models where we suppose that the initial value of (vt) in the SV Heston model is
the mean of the invariant distribution, that is, we suppose that v0 = θ.
5 We also assume
that the parameters are those of (51), except the correlation coefficient ρ.
In Figures 1 and 2, the volatility curves obtained when T = 1 are depicted, whereas in
Figures 3 and 4, we set the strikeK atK = 50 and let the time vary. These representations
show that when the maturity is long, the differences between the SSV and SV Heston
models vanish. This is a consequence of the convergence of the stochastic volatility to its
stationary regime when T →+∞.
The main differences between these models then appear for short maturities. That is
why we complete this part by a representation of the volatility curve when T = 0.1 for
ρ= 0 and ρ= 0.5 in Figures 5 and 6, respectively. We observe that for short maturities,
the volatility smile is more curved and the skew is steeper. These phenomena seem
interesting for calibration since one well-known drawback of the standard Heston model
is that it can have overly flat volatility curves for short maturities.
5.3. Numerical tests on Asian options in the BNS SSV model
The BNS model introduced in Barndorff-Nielsen and Shephard [3] is a stochastic volatility
model where the volatility process is a Lévy-driven positive Ornstein–Uhlenbeck process.
5This choice is the most usual in practice.
174 G. Pagès and F. Panloup
Figure 1. ρ= 0, K 7→ σimp(K,1).
The dynamic of the asset price (St) is given by St = S0 exp(Xt),
dXt = (r− 12vt) dt+
vt dWt + ρdZt, ρ≤ 0,
dvt = −µvt dt+dZt, µ > 0,
Figure 2. ρ= 0.5, K 7→ σimp(K,1).
Approximation of the distribution of a stationary Markov process 175
Figure 3. ρ= 0, T 7→ σimp(50, T ).
where (Zt) is a subordinator without drift term and Lévy measure π. In the following,
we assume that (Zt) is a tempered stable subordinator, that is, that
π(dy) = 1{y>0}
c exp(−λy)
dy, c > 0, λ > 0, α∈ (0,1).
As in the Heston model, we want to use our algorithm as a way of option pricing when
the stochastic volatility evolves under its stationary regime and test it on Asian options
using the method described in detail in Section 5.1. This model does not require a specific
Figure 4. ρ= 0.5, T 7→ σimp(50, T ).
176 G. Pagès and F. Panloup
Figure 5. ρ= 0, T 7→ σimp(50, T ).
discretization and the approximate Euler scheme (P) (see Section 3.2) relative to (vt)
can be implemented using the rejection method. In Table 2, we present our numerical
results obtained for the following choices of parameters, steps and weights:
ρ=−1, λ= µ= 1, c= 0.01, α= 1
, γn = ηn = n
−1/3.
The computation times forN = 5.104 andN = 5.105 are about 8.5 s and 93 s, respectively.
Note that for this model, the convergence seems to be slower because of the approximation
of the jump component.
Figure 6. ρ= 0.5, T 7→ σimp(50, T ).
Approximation of the distribution of a stationary Markov process 177
Table 2. Approximation of the Asian call price in the BNS model
K 44 45 46 47 48 49 50
Asian call (ref.) 6.75 5.83 4.93 4.05 3.18 2.35 1.57
N = 5 · 104 6.83 5.91 5.01 4.10 3.22 2.35 1.51
N = 5 · 105 6.78 5.86 4.96 4.06 3.19 2.34 1.52
N = 5 · 104 (CP parity) 6.76 5.85 4.94 4.07 3.20 2.29 1.51
N = 5 · 105 (CP parity) 6.75 5.83 4.93 4.04 3.17 2.32 1.54
K 51 52 53 54 55 56
Asian call (ref.) 0.91 0.55 0.39 0.29 0.23 0.18
N = 5 · 104 0.77 0.46 0.33 0.27 0.22 0.19
N = 5 · 105 0.79 0.48 0.34 0.27 0.21 0.17
N = 5 · 104 (CP parity) 0.79 0.47 0.37 0.27 0.23 0.19
N = 5 · 105(CP parity) 0.83 0.50 0.36 0.28 0.22 0.17
Acknowledgement
The authors would like to thank Vlad Bally for interesting comments on the paper.
References
[1] Alfonsi, A. (2005). On the discretization schemes for the CIR (and Bessel squared) pro-
cesses. Monte Carlo Methods Appl. 11 355–384. MR2186814
[2] Asmussen, S. and Rosinski, J. (2001). Approximations of small jumps of Lévy processes
with a view towards simulation. J. Appl. Probab. 38 482–493. MR1834755
[3] Barndorff-Nielsen, O.E. and Shephard, N. (2001). Modelling by Lévy processes for financial
economics. In Lévy Processes 283–318. Boston: Birkhäuser. MR1833702
[4] Berkaoui, A., Bossy, M. and Diop, A. (2008). Euler scheme for SDE’s with non-Lipschitz
diffusion coefficient: Strong convergence. ESAIM Probab. Statist. 12 1–11. MR2367990
[5] Deelstra, G. and Delbaen, F. (1998). Convergence of discretized stochastic (interest rate)
processes with stochastic drift term. Appl. Stochastic Models Data Anal. 14 77–84.
MR1641781
[6] Diop, A. (2003). Sur la discrétisation et le comportement à petit bruit d’EDS unidimension-
nelles dont les coefficients sont à dérivées singulières. Ph.D. thesis, Univ. Nice Sophia
Antipolis.
[7] Ethier, S. and Kurtz, T. (1986). Markov Processes, Characterization and Convergence.
Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical
Statistics. New York: Wiley. MR0838085
[8] Hagan, D., Kumar, D., Lesniewsky, A. and Woodward, D. (2002). Managing smile risk.
Wilmott Magazine 9 84–108.
[9] Heston, S. (1993). A closed-form solution for options with stochastic volatility with appli-
cations to bond and currency options. Review of Financial Studies 6 327–343.
[10] Jacod, J. (2008). Asymptotic properties of realized power variations and related functionals
of semimartingales. Stochastic Process. Appl. 118 517–559. MR2394762
http://www.ams.org/mathscinet-getitem?mr=2186814
http://www.ams.org/mathscinet-getitem?mr=1834755
http://www.ams.org/mathscinet-getitem?mr=1833702
http://www.ams.org/mathscinet-getitem?mr=2367990
http://www.ams.org/mathscinet-getitem?mr=1641781
http://www.ams.org/mathscinet-getitem?mr=0838085
http://www.ams.org/mathscinet-getitem?mr=2394762
178 G. Pagès and F. Panloup
[11] Lamberton, D. and Lapeyre, B. (1996). Introduction to Stochastic Calculus Applied to Fi-
nance. London: Chapman and Hall/CRC. MR1422250
[12] Lamberton, D. and Pagès, G. (2002). Recursive computation of the invariant distribution
of a diffusion. Bernoulli 8 367–405. MR1913112
[13] Lamberton, D. and Pagès, G. (2003). Recursive computation of the invariant distribution
of a diffusion: The case of a weakly mean reverting drift. Stoch. Dynamics 4 435–451.
MR2030742
[14] Lemaire, V. (2007). An adaptive scheme for the approximation of dissipative systems.
Stochastic Process. Appl. 117 1491–1518. MR2353037
[15] Lemaire, V. (2005). Estimation numérique de la mesure invariante d’un processus de diffu-
sion. Ph.D. thesis, Univ. Marne-La Vallée.
[16] Pagès, G. (1985). Théorèmes limites pour les semi-martingales. Ph.D. thesis, Univ. Paris
[17] Pagès, G. (2001). Sur quelques algorithmes récursifs pour les probabilités numériques.
ESAIM Probab. Statist. 5 141–170. MR1875668
[18] Panloup, F. (2008). Recursive computation of the invariant measure of a SDE driven by a
Lévy process. Ann. Appl. Probab. 18 379–426. MR2398761
[19] Panloup, F. (2008). Computation of the invariant measure of a Lévy driven SDE: Rate of
convergence. Stochastic Process. Appl. 118 1351–1384.
[20] Panloup, F. (2006). Approximation du régime stationnaire d’une EDS avec sauts. Ph.D.
thesis, Univ. Paris VI.
[21] Protter, P. (1990). Stochastic Integration and Differential Equations. Berlin: Springer.
MR1037262
[22] Parthasarathy, K.R. (1967). Probability Measures on Metric Spaces. New York: Academic
Press. MR0226684
Received April 2007 and revised March 2008
http://www.ams.org/mathscinet-getitem?mr=1422250
http://www.ams.org/mathscinet-getitem?mr=1913112
http://www.ams.org/mathscinet-getitem?mr=2030742
http://www.ams.org/mathscinet-getitem?mr=2353037
http://www.ams.org/mathscinet-getitem?mr=1875668
http://www.ams.org/mathscinet-getitem?mr=2398761
http://www.ams.org/mathscinet-getitem?mr=1037262
http://www.ams.org/mathscinet-getitem?mr=0226684
	Introduction
	Objectives and motivations
	Background and construction of the procedure
	Simulation of ((n)(,F))n1
	General results
	Weak convergence to the stationary regime
	Extension to the non-stationary case
	Application to Brownian diffusions and Lévy-driven SDE's
	Application to Brownian diffusions
	Application to Lévy-driven SDE's
	Proofs of Theorems 1 and 2
	Preliminary lemmas
	Proof of Theorem 1
	Proof of Theorem 2
	Path-dependent option pricing in stationary stochastic volatility models
	Option pricing in the Heston SSV model
	Option price and stationary processes
	Numerical tests on Asian options
	Implied volatility surfaces of Heston SSV and SV models
	Numerical tests on Asian options in the BNS SSV model
	Acknowledgement
	References
ABSTRACT
  We build a sequence of empirical measures on the space D(R_+,R^d) of
R^d-valued c\`adl\`ag functions on R_+ in order to approximate the law of a
stationary R^d-valued Markov and Feller process (X_t). We obtain some general
results of convergence of this sequence. Then, we apply them to Brownian
diffusions and solutions to L\'evy driven SDE's under some Lyapunov-type
stability assumptions. As a numerical application of this work, we show that
this procedure gives an efficient way of option pricing in stochastic
volatility models.

<|endoftext|><|startoftext|>
Influence of Phonon dimensionality on Electron Energy Relaxation
J. T. Karvonen and I. J. Maasilta
Nanoscience Center, Department of Physics, P.O. Box 35, FIN-40014 University of Jyväskylä, Finland.
We studied experimentally the role of phonon dimensionality on electron-phonon (e-p) interaction
in thin copper wires evaporated either on suspended silicon nitride membranes or on bulk substrates,
at sub-Kelvin temperatures. The power emitted from electrons to phonons was measured using
sensitive normal metal-insulator-superconductor (NIS) tunnel junction thermometers. Membrane
thicknesses ranging from 30 nm to 750 nm were used to clearly see the onset of the effects of two-
dimensional (2D) phonon system. We observed for the first time that a 2D phonon spectrum clearly
changes the temperature dependence and strength of the e-p scattering rate, with the interaction
becoming stronger at the lowest temperatures below ∼ 0.5 K for the 30 nm membranes.
PACS numbers: 63.22.+m, 63.20.Kr, 85.85.+j
It is an established fact that at sub-Kelvin tempera-
tures the thermal coupling between conduction electrons
and the lattice becomes very weak [1]. This has signifi-
cant implications for the operation of low-temperature
detectors and coolers [2], or for any solid-state sys-
tems where dissipation and cooling are relevant. Low-
temperature electron-phonon (e-p) interaction has been
studied widely during the past decades, but mostly only
for the case in which the phonons are fully three di-
mensional (3D) [3, 4, 5, 6]. However, due to signifi-
cant advances in fabrication of thin suspended structures,
many practical devices and detectors exist in which the
phonons are expected to move freely only within the
plane of a membrane, forming a quasi-2D system [7].
The question how the two-dimensionality of the phonon
modes influences e-p interaction has been addressed the-
oretically for certain cases [8, 9, 10], but no clear exper-
imental observation of the effect has been reported to
date, although several attempts have been made [11, 12].
In this paper, we show for the first time experimen-
tally that the electron-phonon interaction clearly changes
depending on the dimensionality of the phonons, as ex-
pected from theory. E-p coupling was measured with the
help of sensitive NIS tunnel junction thermometry [13],
for thin Cu wires on suspended silicon nitride (SiNx)
membranes with thickness varying from 30 nm to 750
nm, which spans the transition from 2D to 3D phonons.
In addition, samples with identical Cu wires on bulk
substrates were also measured for comparison. For the
thinnest membranes, the e-p interaction was strengthened
in comparison with the bulk samples, and its tempera-
ture dependence changed significantly, as is predicted by
the theory [8, 9, 10]. The change was large enough to
give indirect evidence that the dispersive (ω ∼ k2), flex-
ural modes of the membrane likely play a major role in
the e-p interaction.
In the presence of stress-free boundaries, the bulk
transversal and longitudinal phonon modes (with sound
velocities ct and cl, respectively) couple to each other
and form a new set of eigenmodes, which in the case
of a suspended membrane are known as the horizontal
shear modes (h), and symmetric (s) and antisymmet-
ric (a) Lamb modes [14]. The frequencies ω for the h
modes are simply ω = ct
+ (mπ/d)2, where k‖ is the
wave vector component parallel to the membrane sur-
faces, d is the membrane thickness and the integer m is
the branch number. However, the dispersion relations of
the s and a Lamb modes cannot be given in a closed an-
alytical form, but have to be calculated numerically. The
lowest three branches, dominant for thin membranes at
low temperatures, have low frequency analytical expres-
sions: ωh = ctk‖, ωs = csk‖, and ωa =
k2‖, where
cs = 2ct
− c2t )/c
is the effective sound velocity of
the s mode, and m⋆ = ~
− c2t )/3c
is an
effective mass for the a-mode ”particle”. This lowest a-
mode with its quadratic dispersion is mostly responsible
for the non-trivial behavior of the e-p interaction [9, 10].
Note that already a single free surface affects the modes
[15] and the e-p interaction [16], as the bulk modes cou-
ple and form another new set of eigenstates, including
the surface localized Rayleigh-mode. Thus, the widely
observed result for e-p power flow P = ΣV (T 5e − T
from a metal volume V with Te the electron and Tp the
phonon temperature, is not expected to hold even for
thin enough films on bulk substrates.
A schematic of the Cu wire samples on suspended sil-
icon nitride membranes and the used measuring circuit
is shown in Fig. 1. 17 samples were made on either sus-
pended membranes or bulk substrates, where nitridized
(100) Si wafers with 30, 200 and 750 nm thick low-stress
SiNx top layers were used as the substrate for both cases.
The suspension of the SiNx membranes (size 600×300
µm2) was achieved by anisotropic backside wet etching of
the silicon substate in KOH, and the metallic structures
were fabricated using standard e-beam lithography and
multi-angle shadow mask evaporation techniques. As the
e-p interaction strength is sensitive to the thickness and
disorder level of the metal [17], we minimized its effect
by evaporating the Cu wires of a specific thickness on
all the different substrates simultaneously. Ultrathin Cu
layers (t=14-30 nm) were used to strengthen the effect of
the thin membranes. The oxide layer forming the tun-
nel junction barriers was produced by thermal oxidation
of Al. Table I presents the essential dimensions of the
http://arxiv.org/abs/0704.0336v2
samples discussed in this paper, measured by scanning
electron (SEM) and atomic force (AFM) microscopies.
The electron mean free path l was determined from the
resistance of the wire at base temperature 60 mK, using
the accurately measured dimensions of the wire.
TABLE I: Parameters for samples. M= suspended SiNx mem-
brane and B= bulk substrate. B6 had an oxidized Si sub-
strate.
Sample SiNx d Cu t V l τ (0.2K) τ (0.8K)
(nm) (nm) [(µm)3] (nm) (µs) (µs)
M1 30 14 2.71 5.7 2.6 0.16
B1 30 14 2.46 4.9 7.1 0.030
M2 200 14 2.44 4.6 15.0 0.11
B2 200 18 3.67 4.1 6.4 0.045
M3 30 19 5.50 11.2 2.2 0.30
B3 30 19 4.62 9.8 4.3 0.034
M4 750 22 6.09 10.3 3.1 0.030
B4 750 22 5.87 8.7 3.9 0.013
M5 30 32 6.09 22 1.8 0.31
B5 30 32 5.09 19 2.7 0.038
B6 - 32 7.10 22 1.6 0.031
CuAl Nb/Al
FIG. 1: (Color online) A Schematic of the suspended samples
and the measuring circuit. Red lines are the normal metal
Cu, light gray Al for SINIS-junctions and dark gray Al or Nb
for SN-junctions.
We used the hot-electron technique [3] to measure the
e-p interaction by overheating the electrons by Joule heat
power P and measuring the resulting electron tempera-
ture Te. All the samples had two electrically isolated
Cu normal metal wires next to each other (Fig. 1). The
longer wire (L = 500µm) was heated by applying a slowly
ramping voltage across the pair of superconducting Nb
(or Al) leads in direct metallic contact to Cu, forming
SN junctions. These junctions provide excellent electri-
cal, but very poor thermal conductance due to Andreev
reflection, as the junctions are biased within the super-
conducting gap ∆. Thus, due to the lack of outdiffu-
sion of electrons and the long length of the wire, input
heat is distributed uniformly in the interior of the wire
and the electron gas cools dominantly by phonons, in-
stead of diffusively [18] or by thermal photons [19]. Since
L >> Le−e, the electron-electron scattering length, elec-
tron temperature is also well defined without complica-
tions from non-equilibrium [20]. In our sample geometry
the electron temperature is measured with two additional
Al leads forming a NIS tunnel junctions pair (SINIS) in
the middle of heated wire, as a function of input Joule
power P = IV measured in a four probe configuration.
The purpose of the short Cu wire, with additional SI-
NIS thermometer on it, is to give an estimate of the local
phonon temperature Tp, as the e-p power flow depends
on both Te and Tp.
The current-biased Al SINIS thermometer is ideally
suited to measure temperature below a few Kelvins, [2]
due to its high sensitivity (in our DC measurement ∼ 0.1
mK at 0.1 K) and low power dissipation. In addition,
for all the data here, the SINIS voltage vs. temperature
response follows the BCS theory without fitting param-
eters very accurately at least down to ∼ 0.2 K, where
typically saturation sets in. This saturation depends on
the strength of the e-p interaction (size of thermometer
and type of substrate) and the amount of filtering, and
thus we conclude that it is most likely caused by external
noise heating. For this reason we take the most conser-
vative approach and assume that all saturation is caused
by it, in which case we can use BCS theory to convert
the measured voltage data for all temperatures.
Even if the electrons lose their energy overwhelmingly
to the phonons in our sample geometry, it is still pos-
sible that the measured temperature is not only deter-
mined by the e-p interaction. This is because the emitted
phonons could be removed so ineffectively from the mem-
brane that the phonon transmission becomes a bottleneck
for the energy flow. Bulk scattering of phonons at low
temperatures is very weak [7], even for thin disordered
membranes [21], as is boundary resistance for thin films
on bulk substrates [22, 23]. In contrast, almost noth-
ing quantitative is known about the boundary resistance
between a thin metal film and a thin 2D membrane, or
between a thin 2D membrane and a bulk substrate. How-
ever, it seems clear that if the combined metal film and
membrane thickness is below the thermal wavelength of
the phonons, the phonon modes in the two materials are
strongly coupled, leading to an effectively non-existent
boundary resistance. Hence, if we check that the mem-
brane temperature Tp is not too high compared to Te
(effective enough hot phonon removal), we can be confi-
dent that the measured Te reflects the e-p interaction.
Figure 2 shows the main result of the measurements,
with Te and Tp plotted vs. the heating power density
p = P/V for all membrane thicknesses (30 nm, 200 nm
and 750 nm). In addition, data from a few represen-
tative bulk samples are shown. Compared to the cor-
responding bulk substrate sample (B4), Te of the 750
nm membrane (M4) shows no difference at all, and it
effectively behaves as bulk. This is reasonable, because
for the 750 nm membrane the estimated dimensional-
ity cross-over temperature [24, 25] Tcr = ~ct/(2kBd) is
∼ 30 mK, with ct = 6200 m/s for SiN. The phonon
temperatures Tp, however, show a big difference: The
0.1 1 10 100 1000
 of M3      T
 of M2
 of M4       T
 of B1-B6
 of M1
 of M2
 of M4
 of B1 and B2
 of B4
Heating power density [pW / ( m)3] 
FIG. 2: (Color online) Measured electron and phonon tem-
peratures Te and Tp versus the applied heating power density
in log-log-scale.
bulk samples show almost no response from the satura-
tion value of the thermometer ∼ 190 mK, whereas the
membrane phonons heat up measurably, most likely due
to the boundary resistance between the membrane and
the bulk. Nevertheless, this increase in Tp for all sam-
ples is small enough not to influence the e-p interaction.
For the 200 nm thick membrane (M2) (Tcr ∼ 110 mK),
at low heating power densities [p < 40 pW/(µm)3] the
temperature dependence follows the behavior of the bulk
sample (B2), although with a difference in the absolute
value. This shows that the strength of the e-p coupling
weakens compared to the bulk. At higher powers and
temperatures (p > 40 pW/(µm)3, where Te > 0.6 K),
Te starts to increase more rapidly in the membrane sam-
ple, most likely due to the boundary resistance effects.
The phonons in the 30 nm thick membrane sample (M1)
are expected to be in the 2D limit at low temperatures
(Tcr ∼ 0.5K), and a clear sign of this can be seen in
Fig. 2 as a strongly different behavior of the measured
Te vs. p curve with respect to all other samples. Below
∼ 6 pW/(µm)3 the e-p coupling is notably stronger (Te
lower) than in the corresponding bulk (B1) or any other
sample, but again at highest temperatures the influence
of other effects starts to dominate over the e-p coupling.
To study the temperature dependence of the data in
Fig. 2 more accurately, we plot the logarithmic deriva-
tives d(log p)/d(logTe) in Fig. 3 (a)-(c). For low heat-
ing powers (T ne >> T
p ) Pe−p ≈ T
e , where n is the
power law of the e-p interaction, thus in that regime
d(log p)/d(logTe) = n. Typically this exponent is n ≈ 5
for thicker (t > 30 nm) metal films on bulk substrates
[3, 4, 17], if the disorder in the film is not too strong
[26, 27, 28]. From Fig. 3 (a) we first of all see that
for the 30 nm membrane sample M1, the difference to
the bulk sample B1 is very clear. The M1 data has a
0.1 1 10 100 1000
 M1   B1
 M2   B2
 M4   B4
Heating power density [pW / (  m)3]
FIG. 3: (Color online) Numerical logarithmic derivatives of
the measured data in Fig. 2. (a) Te data for M1 and B1, (b)
Te data for M2 and B2, (c) Te data for M4 and B4.
plateau of n ∼ 4.5 between p = 0.1 - 6 pW/(µm)3, while
for B1, n continuously decreases from much higher val-
ues. Note that the strong increase of d(log p)/d(logTe)
below p ∼ 0.1 pW/(µm)3 is caused by the saturation
of the Te measurement, and not by the e-p interaction.
The point where n starts deviating from n = 4.5 cor-
responds to Te ≈ 0.4 K, which is surprisingly consistent
with the estimated Tcr ∼ 0.5 K. In contrast, the tempera-
ture dependence of the 200 nm membrane (M2) and bulk
(B2) samples [Fig. 3 (b)] are identical with each other
and with the 30 nm bulk sample (B1), as long as the
e-p interaction is dominant (up to 40 pW/(µm)3). The
750 nm membrane (M4) and bulk (B4) samples also give
identical values of n [Fig. 3 (c)]. The difference between
sample pairs M4,B4 and M2,B2 is caused by the Cu wire
thickness, which is expected to influence the temperature
dependence strongly [16, 27].
Finally, we discuss the effect of the Cu wire thickness
on the measured e-p interaction. The results for the
thinnest 30 nm membrane samples, with Cu thickness
t = 14,19 and 32 nm are shown in Figs 4 (a) and (c).
It is apparent that the metal film thickness has only a
minor effect on the e-p interaction on thin membranes,
and only influences the boundary resistance in the 3D
limit, by increasing its effect for thicker t, as expected.
However, for wires on bulk substrates, Figs 4 (b) and (d),
the effect of the Cu wire thickness on e-p interaction is
more profound. The thinner the Cu film, the more its
temperature dependence deviates from n = 5, which, for
comparison, is observed for a more typical t = 32 nm
Cu wire on oxidized Si (B6). This behavior is qualita-
0.1 1 10 100 1000
0.1 1 10 100 1000
(a) (b)
Heating power density [pW /( m)3]
FIG. 4: (Color online) (a) Te versus p = P/V for 30 nm mem-
brane samples M1,M3,M5. (b) Te versus p for bulk samples,
from top to bottom B1 (top), B3, B5 and B6 (bottom). (c)
d(log p)/d(log T ) of the data in (a). (d) d(log p)/d(log T ) of
the data in (b). From top to bottom: green line B1 (top),
magenta B3, blue B5, Red B6 (bottom). In (d) noise has
been filtered to help the eye.
tively consistent with the predicted effect of the surface
phonon modes [16], but could also depend on the disor-
der, as the thickening of the film increases the mean free
path l (Table I) and pushes the sample closer to the clean
limit. An apparent exponent as high as ∼ 7 could pos-
sibly be explained by the combination of strong disorder
and surface modes, but again, detailed theory is lacking.
In conclusion, we have obtained the first clear evidence
that the electron-phonon interaction at low tempera-
tures changes quite significantly when the phonon modes
become two-dimensional. To quantify the effects, the
electron thermal relaxation times τ = γV Te/(dP/dTe),
where γ = 100 J/K2m3 for Cu, are presented in Table I
for all the samples at two temperatures Te = 0.2 and 0.8
K. At Te < 0.5 K, the thinnest membranes can have a
a factor 2-3 strengthening effect, whereas at higher tem-
peratures the thermal relaxation from membranes can be
an order of magnitude weaker compared to bulk samples.
The membrane close to transition region (d=200 nm) was
shown to have a weaker (∼ factor of two) e-p interaction
strength than the bulk samples. Thinning the metal film
on bulk substrates also leads to a sizeable weakening of
the e-p interaction. The observed power law exponent
for the 2D limit is consistent with n ≈ 4.5, and is much
smaller than the corresponding bulk exponent n = 6..7.
A reduction by more than a factor one gives indirect evi-
dence of the importance of the flexural, dispersive Lamb-
modes for the membrane electron-phonon interaction, in
agreement with theory [9, 10].
Discussions with T. Kühn and A. Sergeev and tech-
nical assistance by H. Niiranen are acknowledged. This
work was supported by the Academy of Finland project
Nos. 118665 and 118231, and by the Finnish Academy
of Sciences and Letters (J.T.K.).
[1] V. F. Gantmakher, Rep. Prog. Phys. 37, 317 (1974).
[2] F. Giazotto et al., Rev. Mod. Phys. 78, 217 (2006).
[3] M. L. Roukes et al., Phys. Rev. Lett. 55, 422 (1985).
[4] F. C. Wellstood, C. Urbina, and J. Clarke, Phys. Rev. B
49, 5942 (1994).
[5] M. Kanskar and M. N. Wybourne, Phys. Rev. Lett. 73,
2123 (1994).
[6] D. R. Schmidt, C. S. Yung, and A. N. Cleland, Phys.
Rev. B 69, 140301 (2004).
[7] A. N. Cleland, Foundations of Nanomechanics, Springer,
Berlin (2003).
[8] D. Belitz and S. Das Sarma, Phys. Rev. B 36, 7701
(1987).
[9] K. Johnson, M. N. Wybourne and N. Perrin, Phys. Rev.
B 50, 2035 (1994).
[10] B. A. Glavin et al., Phys. Rev. B 65, 205315 (2002).
[11] J. F. DiTusa et al., Phys. Rev. Lett. 68, 1156 (1992).
[12] Y. K. Kwong et al., J. Low Temp. Phys. 88, 261 (1992).
[13] J. M. Rowell and D. C. Tsui, Phys. Rev. B 14, 2456
(1976).
[14] B. A. Auld, Acoustic Fields and Waves in Solids, 2nd.
Ed., Robert E. Krieger Publishing, Malabar, 1990.
[15] M. A. Geller, Phys. Rev. B 70, 205421 (2004).
[16] S.-X. Qu, A. N. Cleland and M. R. Geller, Phys. Rev. B
72, 224301 (2005).
[17] J. T. Karvonen, L. J. Taskinen and I. J. Maasilta, J. Low
Temp. Phys. 146, 213 (2007).
[18] C. Hoffmann, F. Lefloch, and M. Sanquer, Eur. Phys. J.
B 29, 629 (2002).
[19] M. Meschke, W. Guichard, and J. P. Pekola, Nature 444,
187 (2006).
[20] H. Pothier et al., Phys. Rev. Lett. 79, 3490 (1997).
[21] T. Kühn et al., arXiv:0705.1936.
[22] E. T. Swartz and R. O. Pohl, Rev. Mod. Phys. 61, 605
(1989).
[23] In this work, a maximum 1-5 % effect for Te at 1K for
t = 15..30 nm.
[24] T Kühn et al., Phys. Rev. B 70, 125425 (2004).
[25] T. Kühn and I. J. Maasilta, Nucl. Instrum. Methods
Phys. Res. A 559, 724 (2006); cond-mat/0702542.
[26] A. Schmid, Z. Phys. 259, 421 (1973); in Localization,
Interaction and Transport Phenomena, Springer 1985.
[27] M. Yu. Reizer and A. V. Sergeev, Zh. Eksp. Teor. Fiz.
90, 1056 (1986) [Sov. Phys. JETP 63, 616 (1986)]; A.
Sergeev and V. Mitin, Phys. Rev. B 61, 6041 (2000).
[28] L. J. Taskinen and I. J. Maasilta, Appl. Phys. Lett. 89,
143511 (2006).
http://arxiv.org/abs/0705.1936
http://arxiv.org/abs/cond-mat/0702542
ABSTRACT
  We studied experimentally the role of phonon dimensionality on
electron-phonon (e-p) interaction in thin copper wires evaporated either on
suspended silicon nitride membranes or on bulk substrates, at sub-Kelvin
temperatures. The power emitted from electrons to phonons was measured using
sensitive normal metal-insulator-superconductor (NIS) tunnel junction
thermometers. Membrane thicknesses ranging from 30 nm to 750 nm were used to
clearly see the onset of the effects of two-dimensional (2D) phonon system. We
observed for the first time that a 2D phonon spectrum clearly changes the
temperature dependence and strength of the e-p scattering rate, with the
interaction becoming stronger at the lowest temperatures below $\sim$ 0.5 K for
the 30 nm membranes.

<|endoftext|><|startoftext|>
Introduction
The issues of blowup of smooth solutions and finite time singularities of
the vorticity field for 3D incompressible Euler equations are still a major
open problem. The Cauchy problem in 3D bounded axisymmetric cylindrical
domains is attracting considerable attention: with bounded, smooth, non-
axisymmetric 3D initial data, under the constraints of conservation of bounded
energy, can the vorticity field blow up in finite time? Outstanding numeri-
cal claims for this have recently been disproven [Ke], [Hou1], [Hou2]. The
classical analytical criterion of Beale-Kato-Majda [B-K-M] for non-blow up
in finite time requires the time integrability of the L∞ norm of the vorticity.
DiPerna and Lions [Li] have given examples of global weak solutions of the
3D Euler equations which are smooth (hence unique) if the initial conditions
are smooth (specifically in W1,p(D), p > 1). However, these flows are really
2-Dimensional in x1, x2, 3-components flows, independent from the third co-
ordinate x3. Their examples [DiPe-Li] show that solutions (even smooth ones)
of the 3D Euler equations cannot be estimated in W1,p for 1 < p <∞ on any
time interval (0, T ) if the initial data are only assumed to be bounded inW1,p.
Classical local existence theorems in 3D bounded or periodic domains by Kato
[Ka], Bourguignon-Brézis [Bou-Br] and Yudovich [Yu1], [Yu2] require some
minimal smoothness for the initial conditions (IC), e.g., in Hs(D), s > 5
The classical formulation for the Euler equations is
∂tV+ (V · ∇)V = −∇p, ∇ ·V = 0, (1.1)
V ·N = 0 on ∂D, (1.2)
where ∂D is the boundary of a bounded, connected domain D, N the normal
to ∂D, V(t, y) = (V1, V2, V3) the velocity field, y = (y1, y2, y3), and p is the
pressure.
The equivalent Lamé form [Ar-Khe]
∂tV + curlV ×V +∇
= 0, (1.3)
∇ ·V = 0, (1.4)
∂tω + curl(ω ×V) = 0, (1.5a)
ω = curlV, (1.5b)
implies conservation of Energy:
E(t) =
|V(t, y)|2 dy. (1.6)
The helicity Hel(t) [Ar-Khe], [Mof], is conserved:
Hel(t) =
V · ω dy, (1.7)
for D = R3 and when D is a periodic lattice. Helicity is also conserved for
cylindrical domains, provided that ω·N = 0 on the cylinder’s lateral boundary
at t = 0 (see [M-N-B-G]).
From the theoretical point of view, the principal difficulty in the analysis
of 3D Euler equations is due to the presence of the vortex stretching term
(ω · ∇)V in the vorticity equation (1.5a). The equations (1.3) and (1.5a) are
equivalent to:
∂tω + [ω,V] = 0, (1.8)
where [a, b] = curl (a × b) is the commutator in the infinite dimensional Lie
algebra of divergence-free vector fields [Ar-Khe]. This point of view has led to
celebrated developments in Topological Methods in Hydrodynamics [Ar-Khe],
[Mof]. The striking analogy between the Euler equations for hydrodynamics
and the Euler equations for a rigid body (the latter associated to the Lie
Algebra of the Lie group SO(3,R)) had already been pointed out by Moreau
[Mor1]; Moreau was the first to demonstrate conservation of Helicity (1961)
[Mor2]. This has led to extensive speculations to what extent/in what cases
are the solutions of the 3D Euler equations “close” to those of coupled 3D rigid
body equations in some asymptotic sense. Recall that the Euler equations for
a rigid body in R3 is:
mt + ω ×m = 0, m = Aω, (1.9a)
mt + [ω,m] = 0, (1.9b)
where m is the vector of angular momentum relative to the body, ω the
angular velocity in the body and A the inertia operator [Ar1], [Ar-Khe].
The Russian school of Gledzer, Dolzhansky, Obukhov [G-D-O] and Vishik
[Vish] has extensively investigated dynamical systems of hydrodynamic type
and their applications. They have considered hydrodynamical models built
upon generalized rigid body systems in SO(n,R), following Manakhov [Man].
Inspired by turbulence physics, they have investigated “shell” dynamical sys-
tems modeling turbulence cascades; albeit such systems are flawed as they
only preserve energy, not helicity. To address this, they have constructed and
studied in depth n-dimensional dynamical systems with quadratic homoge-
neous nonlinearities and two quadratic first integrals F1, F2. Such systems
can be written using sums of Poisson brackets:
i2,...,in
ǫi1i2...inpi4...in
− ∂F1
, (1.10)
where constants pi4...in are antisymmetric in i4, ..., in.
A simple version of such a quadratic hydrodynamic system was introduced
by Gledzer [Gl1] in 1973. A deep open issue of the work by the Gledzer-
Obukhov school is whether there exist indeed classes of I.C. for the 3D Cauchy
Euler problem (1.1) for which solutions are actually asymptotically close in
strong norm, on arbitrary large time intervals to solutions of such hydro-
dynamic systems, with conservation of both energy and helicity. Another
unresolved issue is the blowup or global regularity for the “enstrophy” of such
systems when their dimension n→ ∞.
This article reviews some current new results of a research program in
the spirit of the Gledzer-Obukhov school; this program builds-up on the re-
sults of [M-N-B-G] for 3D Euler in bounded cylindrical domains. Following
the original approach of [B-M-N1]-[B-M-N4] in periodic domains, [M-N-B-G]
prove the non blowup of the 3D incompressible Euler equations for a class
of three-dimensional initial data characterized by uniformly large vorticity in
bounded cylindrical domains. There are no conditional assumptions on the
properties of solutions at later times, nor are the global solutions close to
some 2D manifold. The initial vortex stretching is large. The approach of
proving regularity is based on investigation of fast singular oscillating limits
and nonlinear averaging methods in the context of almost periodic functions
[Bo-Mi], [Bes], [Cor]. Harmonic analysis tools based on curl eigenfunctions
and eigenvalues are crucial. One establishes the global regularity of the 3D
limit resonant Euler equations without any restriction on the size of 3D initial
data. The resonant Euler equations are characterized by a depleted nonlin-
earity. After establishing strong convergence to the limit resonant equations,
one bootstraps this into the regularity on arbitrary large time intervals of the
solutions of 3D Euler Equations with weakly aligned uniformly large vorticity
at t = 0. [M-N-B-G] theorems hold for generic cylindrical domains, for a set
of height/radius ratios of full Lebesgue measure. For such cylinders, the 3D
limit resonant Euler equations are restricted to two-wave resonances of the
vorticity waves and are vested with an infinite countable number of new con-
servation laws. The latter are adiabatic invariants for the original 3D Euler
equations.
Three-wave resonances exist for a nonempty countable set of h/R (h
height, R radius of the cylinder) and moreover accumulate in the limit of
vanishingly small vertical (axial) scales. This is akin to Arnold tongues [Ar2]
for the Mathieu-Hill equations and raises nontrivial issues of possible sin-
gularities/lack thereof for dynamics ruled by infinitely many resonant triads
at vanishingly small axial scales. In such a context, the 3D resonant Euler
equations do conserve the energy and helicity of the field.
In this review, we consider cylindrical domains with parametric resonances
in h/R and investigate in depth the structure and dynamics of 3D resonant
Euler systems. These parametric resonances in h/R are proven to be non-
empty. Solutions to Euler equations with uniformly large initial vorticity are
expanded along a full complete basis of elementary swirling waves (T2 in time).
Each such quasiperiodic, dispersive vorticity wave is a quasiperiodic Beltrami
flow; these are exact solutions of 3D Euler equations with vorticity parallel
to velocity. There are no Galerkin-like truncations in the decomposition of
the full 3D Euler field. The Euler equations, restricted to resonant triplets of
these dispersive Beltrami waves, determine the “resonant Euler systems”. The
basic “building block” of these (a priori ∞-dimensional) systems are proven
to be SO(3;C) and SO(3;R) rigid body systems:
U̇k + (λm − λn)UmUn = 0
U̇m + (λn − λk)UnUk = 0
U̇n + (λk − λm)UkUm = 0
(1.11)
These λ’s are eigenvalues of the curl operator in the cylinder, curlΦ±n =
±λnΦ±n ; the curl eigenfunctions are steady elementary Beltrami flows, and
the dispersive Beltrami waves oscillate with the frequencies ± h
, n3 ver-
tical wave number (vertical shear), 0 < ǫ < 1. Physicists [Ch-Ch-Ey-H] have
computationally demonstrated the physical impact of the polarization of Bel-
trami modes Φ± on intermittency in the joint cascade of energy and helicity
in turbulence.
Another “building block” for resonant Euler systems is a pair of SO(3;C)
or SO(3;R) rigid bodies coupled via a common principal axis of inertia/mo-
ment of inertia:
ȧk = (λm − λn)Γaman (1.12a)
ȧm = (λn − λk)Γanak (1.12b)
ȧn = (λk − λm)Γakam + (λk̃ − λm̃)Γ̃ak̃am̃ (1.12c)
ȧm̃ = (λn − λk̃)Γ̃anak̃ (1.12d)
ȧk̃ = (λm̃ − λn)Γ̃am̃an, (1.12e)
where Γ and Γ̃ are parameters in R defined in Theorem 4.10. Both reso-
nant systems (1.11) and (1.12) conserve energy and helicity. We prove that
the dynamics of these resonant systems admit equivariant families of homo-
clinic cycles connecting hyperbolic critical points. We demonstrate bursting
dynamics: the ratio
||u(t)||2Hs/||u(0)||2Hs , s ≥ 1
can burst arbitrarily large on arbitrarily small times, for properly chosen para-
metric domain resonances h/R. Here
||u(t)||2Hs =
2s|un(t)|2 . (1.13)
The case s = 1 is the enstrophy. The “bursting” orbits are topologically close
to the homoclinic cycles.
Are such dynamics for the resonant systems relevant to the full 3D Euler
equations (1.1)-(1.8)? The answer lies in the following crucial “shadowing”
Theorem 2.10. Given the same initial conditions, given the maximal time
interval 0 ≤ t < Tm where the resonant orbits of the resonant Euler equations
do not blow up, then the strong norm Hs of the difference between the exact
Euler orbit and the resonant orbit is uniformly small on 0 ≤ t < Tm, provided
that the vorticity of the I.C. is large enough. Paradoxically, the larger the
vortex streching of the I.C., the better the uniform approximation. This deep
result is based on cancellation of fast oscillations in strong norms, in the
context of almost periodic functions of time with values in Banach spaces
(Section 4 of [M-N-B-G]). It includes uniform approximation in the spaces
Hs, s > 5/2. For instance, given a quasiperiodic orbit on some time torus Tl
for the resonant Euler systems, the exact solutions to the Euler equations will
remain ǫ-close to the resonant quasiperiodic orbit on a time interval 0 ≤ t ≤
maxTi, 1 ≤ i ≤ l, Ti elementary periods, for large enough initial vorticity. If
orbits of the resonant Euler systems admit bursting dynamics in the strong
norms Hs, s ≥ 7/2, so do some exact solutions of the full 3D Euler equations,
for properly chosen parametrically resonant cylinders.
2 Vorticity waves and resonances of elemen-
tary swirling flows
We study initial value problem for the three-dimensional Euler equations with
initial data characterized by uniformly large vorticity:
∂tV+ (V · ∇)V = −∇p, ∇ ·V = 0, (2.1)
V(t, y)|t=0 = V(0) = Ṽ0(y) +
e3 × y (2.2)
where y = (y1, y2, y3), V(t, y) = (V1, V2, V3) is the velocity field and p is the
pressure. In Eqs. (1.1) e3 denotes the vertical unit vector and Ω is a constant
parameter. The field Ṽ0(y) depends on three variables y1, y2 and y3. Since
curl(Ω
e3 × y) = Ωe3, the vorticity vector at initial time t = 0 is
curlV(0, y) = curlṼ0(y) + Ωe3, (2.3)
and the initial vorticity has a large component weakly aligned along e3, when
Ω >> 1. These are fully three-dimensional large initial data with large initial
3D vortex stretching. We denote by Hsσ the usual Sobolev space of solenoidal
vector fields.
The base flow
Vs(y) =
e3 × y, curlVs(y) = Ωe3 (2.4)
is called a steady swirling flow and is a steady state solution (1.1)-(1.4), as
curl(Ωe3×Vs(y)) = 0. In (2.2) and (2.3), we consider I.C. which are an arbi-
trary (not small) perturbation of the base swirling flow Vs(y) and introduce
V(t, y) =
e3 × y + Ṽ(t, y), (2.5)
curlV(t, y) = Ωe3 + curlṼ(t, y), (2.6)
∂tṼ + curlṼ× Ṽ +Ωe3 × Ṽ + curlṼ×Vs(y) +∇p′ = 0, ∇ · Ṽ = 0, (2.7)
Ṽ(t, y)|t=0 = Ṽ0(y). (2.8)
Eqs. (2.1) and (2.7) are studied in cylindrical domains
C = {(y1, y2, y3) ∈ R3 : 0 < y3 < 2π/α, y21 + y22 < R2} (2.9)
where α and R are positive real numbers. If h is the height of the cylinder,
α = 2π/h. Let
Γ = {(y1, y2, y3) ∈ R3 : 0 < y3 < 2π/α, y21 + y22 = R2}. (2.10)
Without loss of generality, we can assume that R = 1. Eqs. (2.1) are consid-
ered with periodic boundary conditions in y3
V(y1, y2, y3) = V(y1, y2, y3 + 2π/α) (2.11)
and vanishing normal component of velocity on Γ
V ·N = Ṽ ·N = 0 on Γ; (2.12)
where N is the normal vector to Γ. From the invariance of 3D Euler equations
under the symmetry y3 → −y3, V1 → V1, V2 → V2, V3 → −V3, all results in
this article extend to cylindrical domains bounded by two horizontal plates.
Then the boundary conditions in the vertical direction are zero flux on the
vertical boundaries (zero vertical velocity on the plates). One only needs to
restrict vector fields to be even in y3 for V1, V2 and odd in y3 for V3, and
double the cylindrical domain to −h ≤ y3 ≤ +h.
We choose Ṽ0(y) in H
s(C), s > 5/2. In [M-N-B-G], for the case of “non-
resonant cylinders”, that is, non-resonant α = 2π/h, we have established
regularity for arbitrarily large finite times for the 3D Euler solutions for Ω
large, but finite. Our solutions are not close in any sense to those of the 2D
or “quasi 2D” Euler and they are characterized by fast oscillations in the e3
direction, together with a large vortex stretching term
ω(t, y) · ∇V(t, y) = ω1
, t ≥ 0
with leading component
V(t, y)
≫ 1. There are no assumptions on
oscillations in y1, y2 for our solutions (nor for the initial condition Ṽ0(y)).
Our approach is entirely based on sturying fast singular oscillating limits
of Eqs. (1.1)-(1.5a), nonlinear averaging and cancelation of oscillations in the
nonlinear interactions for the vorticity field for large Ω. This has been devel-
oped in [B-M-N2], [B-M-N3], and [B-M-N4] for the cases of periodic lattice
domains and the infinite space R3.
It is well known that fully three-dimensional initial conditions with uni-
formly large vorticity excite fast Poincaré vorticity waves [B-M-N2], [B-M-N3],
[B-M-N4], [Poi]. Since individual Poincaré wave modes are related to the
eigenfunctions of the curl operator, they are exact time-dependent solutions
of the full nonlinear 3D Euler equations. Of course, their linear superposition
does not preserve this property. Expanding solutions of (2.1)-(2.8) along such
vorticity waves demonstrates potential nonlinear resonances of such waves.
First recall spectral properties of the curl operator in bounded, connected
domains:
Proposition 2.1 ([M-N-B-G]) The curl operator admits a self-adjoint ex-
tension under the zero flux boundary conditions, with a discrete real spectrum
λn = ±|λn|, |λn| > 0 for every n and |λn| → +∞ as |n| → ∞. The corre-
sponding eigenfunctions Φ±n
curlΦ±n = ±|λn|Φ
n (2.13)
are complete in the space
U ∈ L2(D) : ∇ ·U = 0 and U ·N|∂D = 0 and
U dz = 0
(2.14)
Remark 2.2 In cylindrical domains, with cylindrical coordinates (r, θ, z), the
eigenfunctions admit the representation:
Φn1,n2,n3 = (Φr,n1,n2,n3(r),Φθ,n1,n2,n3(r),Φz,n1,n2,n3(r)) e
in2θeiαn3z, (2.15)
with n2 = 0,±1,±2, ..., n3 = ±1,±2, ... and n1 = 0, 1, 2, .... Here n1 indexes
the eigenvalues of the equivalent Sturm-Liouville problem in the radial coor-
dinates, and n = (n1, n2, n3). See [M-N-B-G] for technical details. From now
on, we use the generic variable z for any vertical (axial) coordinate y3 or x3.
For n3 = 0 (vertical averaging along the axis of the cylinder), 2-Dimensional,
3-component solenoidal fields must be expanded along a complete basis for
fields derived from 2D stream functions:
curl(φne3), φne3
, φn = φn(r, θ),
−△φn = µnφn, φn|∂Γ = 0, and
curlΦn =
curl(φne3), µnφne3
a, be3
denotes a 3-component vector whose horizontal projection is
a and vertical projection is be3.
Let us explicit elementary swirling wave flows which are exact solutions to
(2.1) and (2.7):
Lemma 2.3 For every n = (n1, n2, n3), the following quasiperiodic (T
time) solenoidal fields are exact solution of the full 3D nonlinear Euler equa-
tions (2.1):
V(t, y) =
e3 × y + exp(
Jt)Φn(exp(−
Jt)y) exp(±i
αΩt), (2.16)
n3 is the vertical wave number of Φn and exp(
Jt) the unitary group of rigid
body rotations:
0 −1 0
1 0 0
0 0 0
 , eΩJt/2 =
cos(Ωt
) − sin(Ωt
sin(Ωt
) cos(Ωt
0 0 1
 . (2.17)
Remark 2.4 These fields are exact quasiperiodic, nonaxisymmetric swirling
flow solutions of the 3D Euler equations. For n3 6= 0, their second components
Ṽ(t, y) = exp(
Jt)Φn(exp(−
Jt)y) exp(± in3
αΩt) (2.18)
are Beltrami flows (curlṼ×Ṽ ≡ 0) exact solutions of (2.7) with Ṽ(t = 0, y) =
Φn(y).
Ṽ(t, y) in Eq. (2.18) are dispersive waves with frequencies Ω
and n3α|λn|Ω,
where α = 2π
. Moreover, each Ṽ(t, y) is a traveling wave along the cylinder’s
axis, since it contains the factor
iαn3(±z ±
Note that n3 large corresponds to small axial (vertical) scales, albeit 0 ≤
α|n3/λn| ≤ 1.
Proof of Lemma 2.3. Through the canonical rigid body transformation for
both the field V(t, y) and the space coordinates y = (y1, y2, y3):
V(t, y) = e+ΩJt/2U(t, e−ΩJt/2y) +
Jy, x = e−ΩJt/2y, (2.19)
the 3D Euler equations (2.1), (2.2) transform into:
∂tU+ (curlU+Ωe3)×U = −∇
(|x1|2 + |x2|2) +
, (2.20)
∇ ·U = 0, U(t, x)|t=0 = U(0) = Ṽ0(x), (2.21)
For Beltrami flows such that curlU×U ≡ 0, these Euler equations (2.20)-
(2.21) in a rotating frame reduce to:
∂tU+Ωe3 ×U+∇π = 0, ∇ ·U = 0,
which are identical to the Poincaré-Sobolev nonlocal wave equations in the
cylinder [M-N-B-G], [Poi], [Sob], [Ar-Khe]:
∂tΨ+Ωe3 ×Ψ+∇π = 0, ∇ ·Ψ = 0, (2.22)
curl2Ψ−Ω2 ∂
Ψ = 0, Ψ ·N|∂D = 0. (2.23)
It suffices to verify that the Beltrami flows Ψn(t, x) = Φn(x) exp
±iαn3|λn|Ωt
where Φ±n (x) and ±|λn| are curl eigenfunctions and eigenvalues, are exact
solutions to the Poincaré-Sobolev wave equation, in such a rotating frame of
reference.
Remark 2.5 The frequency spectrum of the Poincaré vorticity waves (solu-
tions to (2.22)) is exactly ±iαn3|λn|Ω, n = (n1, n2, n3) indexing the spectrum
of curl. Note that n3 = 0 (zero frequency of rotating waves) corresponds to
2-Dimensional, 3-Components solenoidal vector fields.
We now transform the Cauchy problem for the 3D Euler equations (2.1)-
(2.2) into an infinite dimensional nonlinear dynamical system by expanding
V(t, y) along the swirling wave flows (2.16)-(2.18):
V(t, y) =
e3 × y
(2.24a)
+ exp
n=(n1,n2,n3)
un(t) exp
(2.24b)
V(t = 0, y) =
e3 × y + Ṽ0(y)
(2.24c)
Ṽ0(y) =
n=(n1,n2,n3)
un(0)Φn(y),
(2.24d)
where Φn denotes the curl eigenfunctions of Proposition 2.1 if n3 6= 0, and
curl(φne3), φne3
if n3 = 0 (2D case, Remark 2.2).
As we focus on the case where helicity is conserved for (2.1)-(2.2), we
consider the class of initial data Ṽ0 such that [M-N-B-G]:
curlṼ0 ·N = 0 on Γ,
where Γ is the lateral boundary of the cylinder.
The infinite dimensional dynamical system is then equivalent to the 3D
Euler equations (2.1)-(2.2) in the cylinder, with n = (n1, n2, n3) ranging over
the whole spectrum of curl, e.g.:
k3+m3=n3
k2+m2=n2
× < curlΦk ×Φm,Φn > uk(t)um(t)
(2.25)
curlΦ±k = ±λkΦ
k if k3 6= 0,
curlΦk =
curl(φke3), µkφke3
if k3 = 0
(2D, 3-components, Remark 2.2), similarly for m3 = 0 and n3 = 0. The inner
product < , > denotes the L2 complex-valued inner product in D.
This is an infinite dimensional system of coupled equations with quadratic
nonlinearities, which conserve both the energy
E(t) =
|un(t)|2
and the helicity
Hel(t) =
±|λn| |u±n (t)|2.
The quadratic nonlinearities split into resonant terms where the exponential
oscillating phase factor in (2.25) reduces to unity and fast oscillating non-
resonant terms (Ω >> 1). The resonant set K is defined in terms of vertical
wavenumbers k3,m3, n3 and eigenvalues ±λk, ±λm, ±λn of curl:
K = {± k3
= 0, n3 = k3 +m3, n2 = k2 +m2}. (2.27)
Here k2,m2, n2 are azimuthal wavenumbers.
We shall call the “resonant Euler equations” the following ∞-dimensional
dynamical system restricted to (k,m, n) ∈ K:
(k,m,n)∈K
< curlΦk ×Φm,Φn > ukum = 0, (2.28a)
un(0) ≡< Ṽ0,Φn >, (2.28b)
here curlΦ±k = ±λkΦ
k if k3 6= 0, curlΦk =
curl(φke3), µkφke3
if k3 = 0;
similarly for m3 = 0 and n3 = 0 (2D components, Remark 2.2). If there
are no terms in (2.28a) satisfying the resonance conditions, then there will be
some modes for which
Lemma 2.6 The resonant 3D Euler equations (2.28) conserve both energy
E(t) and helicity Hel(t). The energy and helicity are identical to that of the
full exact 3D Euler equations (2.1)-(2.2).
The set of resonances K is studied in depth in [M-N-B-G]. To summarize,
K splits into:
(i ) 0-wave resonances, with n3 = k3 = m3 = 0; the corresponding reso-
nant equations are identical to the 2-Dimensional, 3-Components Euler
equations, with I.C.
Ṽ0(y1, y2, y3) dy3.
(ii) Two-Wave resonances, with k3m3n3 = 0, but two of them are not null;
the corresponding resonant equations (called “catalytic equations”) are
proven to possess an infinite, countable set of new conservation laws
[M-N-B-G].
(iii) Strict three-wave resonances for a subset K∗ ⊂ K.
Definition 2.7 The set K∗ of strict 3 wave resonances is:
= 0, k3m3n3 6= 0, n3 = k3 +m3, n2 = k2 +m2
(2.29)
Note that K∗ is parameterized by h/R, since α = 2π
parameterizes the eigen-
values λn, λk, λm of the curl operator.
Proposition 2.8 There exist a countable, non-empty set of parameters h
which K∗ 6= ∅.
Proof. The technical details, together with a more precise statement, are
postponed to the proof of Lemma 3.7. Concrete examples of resonant ax-
isymmetric and helical waves are discussed in [Mah] ( cf. Figure 2 in the
article).
Corollary 2.9 Let
Ṽ0(y1, y2, y3) dy3 = 0, i.e. zero vertical mean for the
I.C. Ṽ0(y) in (2.2), (2.8), (2.24d) and (2.28b). Then the resonant 3D Euler
equations are invariant on K∗:
(k,m,n)∈K∗
λk < Φk ×Φm,Φn > ukum = 0, k3m3n3 6= 0, (2.30a)
un(0) =< Ṽ0,Φn > (2.30b)
(where Ṽ0 has spectrum restricted to n3 6= 0).
Proof. This is an immediate corollary of the “operator splitting” Theorem
3.2 in [M-N-B-G]. �
We shall call the above dynamical systems the “strictly resonant Euler
system”. This is an ∞-dimensional Riccati system which conserves Energy
and Helicity. It corresponds to nonlinear interactions depleted on K∗.
How do dynamics of the resonant Euler equations (2.28) or (2.30) approx-
imate exact solutions of the Cauchy problem for the full Euler equations in
strong norms? This is answered by the following theorem, proven in Section
4 of [M-N-B-G]:
Theorem 2.10 Consider the initial value problem
V(t = 0, y) =
e3 × y + Ṽ0(y), Ṽ0 ∈ Hsσ, s > 7/2
for the full 3D Euler equations, with ||Ṽ0||Hs
≤M0s and curlṼ0 ·N = 0 on Γ.
• Let V(t, y) = Ω
e3 × y + Ṽ(t, y) denote the solution to the exact Euler
equations.
• Let w(t, x) denote the solution to the resonant 3D Euler equations with
Initial Condition w(0, x) ≡ w(0, y) = Ṽ0(y).
• Let ||w(t, y)||Hsσ ≤Ms(TM ,M
s ) on 0 ≤ t ≤ TM , s > 7/2.
Then, ∀ǫ > 0, ∃ Ω∗(TM ,M0s , ǫ) such that, ∀Ω ≥ Ω∗:
Ṽ(t, y)− exp
un(t)e
−i n3
on 0 ≤ t ≤ TM , ∀β ≥ 1, β ≤ s− 2. Here || · ||Hβ is defined in (1.13).
The 3D Euler flow preserves the condition curlṼ0 · N = 0 on Γ, that is
curlV(t, y) · N = 0 on Γ, for every t ≥ 0 [M-N-B-G]. The proof of this
“error-shadowing” theorem is delicate, beyond the usual Gronwall differential
inequalities and involves estimates of oscillating integrals of almost periodic
functions of time with values in Banach spaces. Its importance lies in that
solutions of the resonant Euler equations (2.28) and/or (2.30) are uniformly
close in strong norms to those of the exact Euler equations (2.1)-(2.2), on
any time interval of existence of smooth solutions of the resonant system.
The infinite dimensional Riccati systems (2.28) and (2.30) are not just hydro-
dynamic models, but exact asymptotic limit systems for Ω ≫ 1. This is in
contrast to all previous literature on conservative 3D hydrodynamic models,
such as in [G-D-O].
3 Strictly resonant Euler systems: the SO(3)
We investigate the structure and the dynamics of the “strictly resonant Euler
systems” (2.30). Recall that the set of 3-wave resonances is:
(k,m, n) : ± k3
= 0, k3m3n3 6= 0,
n3 = k3 +m3, n2 = k2 +m2
(3.1)
From the symmetries of the curl eigenfunctions Φn and eigenvalues λn in the
cylinder, the following identities hold under the transformation n2 → −n2,
n3 → −n3
Φ(n1,−n2,−n3) = Φ∗(n1, n2, n3) ,
λ(n1,−n2,−n3) = λ(n1, n2, n3) .
(3.2)
where ∗ designates the complex conjugate (see Section 3, [M-N-B-G] for
details). The eigenfunctions Φ(n1, n2, n3) involve the radial functions
Jn2(β(n1, n2, αn3)r) and J
(β(n1, n2, αn3)r), with
λ2(n1, n2, n3) = β
2(n1, n2, αn3) + α
2n23;
β(n1, n2, αn3) are discrete, countable roots of equation (3.30) in [M-N-B-G],
obtained via an equivalent Sturm-Liouville radial problem. Since the curl
eigenfunctions are even in r → −r, n1 → −n1, we will extend the indices
n1 = 1, 2, ...,+∞ to −n1 = −1,−2, ... with the above radial symmetry in
mind.
Corollary 3.1 The 3-wave resonance set K∗ is invariant under the symme-
tries σj , j = 0, 1, 2, 3, where
σ0(n1, n2, n3) = (n1, n2, n3),
σ1(n1, n2, n3) = (−n1, n2, n3),
σ2(n1, n2, n3) = (n1,−n2, n3)
σ3(n1, n2, n3) = (n1, n2,−n3) .
Remark 3.2 For 0 < i ≤ 3, 0 < j ≤ 3, 0 < l ≤ 3 σ2j = Id, σiσj = −σl if
i 6= j and σiσjσl = −Id, for i 6= j 6= l. The σj do preserve the convolution
conditions in K∗.
We choose an α for which the set K∗ is not empty. We further take the
hypothesis of a single triple wave resonance (k,m, n), modulo the symmetries
Hypothesis 3.3 K∗ is such that there exists a single triple wave number
resonance (n, k,m), modulo the symmetries σj , j = 1, 2, 3 and σj(k) 6=
k, σj(m) 6= m, σj(n) 6= n for j = 2 and j = 3.
Under the above hypothesis, one can demonstrate that the strictly resonant
Euler system splits into three uncoupled systems in C3:
Theorem 3.4 Under hypothesis 3.3, the resonant Euler system reduces to
three uncoupled rigid body systems in C3:
+ i(λk − λm)CkmnUkUm = 0 (3.3a)
− i(λm − λn)CkmnUnU∗m = 0 (3.3b)
− i(λn − λk)CkmnUnU∗k = 0 (3.3c)
where Ckmn = i < Φk ×Φm,Φ∗n >, Ckmn real and the other two uncoupled
systems obtained with the symmetries σ2(k,m, n) and σ3(k,m, n). The energy
and the helicity of each subsystem are conserved:
k + UmU
m + UnU
n) = 0,
(λkUkU
k + λmUmU
m + λnUnU
n) = 0.
Proof. It follows from U−k = U
k , λ(−k) = λ(+k), similarly for m and
n; and in a very essential way from the antisymmetry of < Φk ×Φm,Φ∗n >,
together with curlΦk = λkΦk. That Ckmn is real follows from the eigenfunc-
tions explicited in Section 3 of [M-N-B-G]. �
Remark 3.5 This deep structure, i.e. SO(3;C) rigid body systems in C3 is
a direct consequence of the Lamé form of the full 3D Euler equations, cf. Eqs.
(1.3) and (2.7), and the nonlinearity curlV ×V.
The system (3.3) is equivariant with respect to the symmetry operators
(z1, z2, z3) → (z∗1 , z∗2 , z∗3), (z1, z2, z3) → (exp(iχ1)z1, exp(iχ2)z2, exp(iχ3)z3) ,
provided χ1 = χ2 + χ3. It admits other integrals known as the Manley-
Rowe relations (see, for instance [We-Wil]). It differs from the usual 3-
wave resonance systems investigated in the literature, such as in [Zak-Man1],
[Zak-Man2], [Gu-Ma] in that
(1) helicity is conserved,
(2) dynamics of these resonant systems rigorously “shadow” those of the
exact 3D Euler equations, see Theorem 2.10.
Real forms of the system (3.3) are found in Gledzer et al. [G-D-O], corre-
sponding to the exact invariant manifold Uk ∈ iR, Um ∈ R, Un ∈ R, albeit
without any rigorous asymptotic justification. The C3 systems (3.3) with
helicity conservation laws are not discussed in [G-D-O].
The only nontrivial Manley-Rowe conservation laws for the resonant sys-
tem (3.3), rigid body SO(3;C), which are independent from energy and he-
licity, are:
(rkrmrn sin(θn − θk − θm)) = 0,
where Uj = rj exp(iθj), j = k,m, n, and
E1 = (λk − λm)r2n − (λm − λn)r2k,
E2 = (λm − λn)r2k − (λn − λk)r2m.
The resonant system (3.3) is well known to possess hyperbolic equilibria
and heteroclinic/homoclinic orbits on the energy surface. We are interested
in rigorously proving arbitrary large bursts of enstrophy and higher norms
on arbitrarily small time intervals, for properly chosen h/R. To simplify
the presentation, we establish the results for the simpler invariant manifold
Uk ∈ iR, and Um, Un ∈ R.
Rescale time as:
t→ t/Ckmn.
Start from the system
U̇n + i(λk − λm)UkUm = 0
U̇k − i(λm − λn)UnU∗m = 0
U̇m − i(λn − λk)UnU∗k = 0
(3.4)
Assume that Uk ∈ iR and that Um, Un ∈ R: set p = iUk, q = Um and r = Un,
as well as λk = λ, λm = µ and λn = ν: then
ṗ+ (µ− ν)qr = 0
q̇ + (ν − λ)rp = 0
ṙ + (λ− µ)pq = 0
(3.5)
This system admits two first integrals:
E = p2 + q2 + r2 (energy)
H = λp2 + µq2 + νr2 (helicity)
(3.6)
System (3.5) is exactly the SO(3,R) rigid body dynamics Euler equations,
with inertia momenta Ij =
|λj | , j = k,m, n [Ar1].
Lemma 3.6 ([Ar1], [G-D-O]) With the ordering λk > λm > λn, i.e. λ > µ >
ν, the equilibria (0,±1, 0) are hyperbolic saddles on the unit energy sphere,
and the equilibria (±1, 0, 0), (0, 0,±1) are centers. There exist equivariant
families of heteroclinic connections between (0,+1, 0) and (0,−1, 0). Each
pair of such connections correspond to equivariant homoclinic cycles at (0, 1, 0)
and (0,−1, 0).
We investigate bursting dynamics along orbits with large periods, with
initial conditions close to the hyperbolic point (0, E(0), 0) on the energy sphere
E. We choose resonant triads such that λk > 0, λn < 0, λk ∼ |λn|, |λm| ≪ λk,
equivalently:
λ > µ > ν, λν < 0, |µ| ≪ λ and λ ∼ |ν|. (3.7)
Lemma 3.7 There exist h/R with K∗ 6= ∅, such that
λk > λm > λn, λkλn < 0, |λm| ≪ λk and λk ∼ |λn|.
Remark 3.8 Together with the polarity ± of the curl eigenvalues, these are
3-wave resonances where two of the eigenvalues are much larger in mod-
uli than the third one. In the limit |k|, |m|, |n| ≫ 1, λk ∼ ±|k|, λm ∼
±|m|, λn ∼ ±|n|, the eigenfunctions Φ have leading asymptotic terms which
involve cosines and sines periodic in r, cf. Section 3 [M-N-B-G]. In the
strictly resonant equations (2.30), the summation over the quadratic terms
becomes an asymptotic convolution in n1 = k1+n1. The resonant three waves
in Lemma 3.7 are equivalent to Fourier triads k + m = n, with |k| ∼ |n|
and |m| ≪ |k|, |n|, in periodic lattices. In the physics of spectral theory of
turbulence [Fri], [Les], these are exactly the triads responsible from transfer of
energy between large scales and small scales. These are the triads which have
hampered mathematical efforts at proving the global regularity of the Cauchy
problem for 3D Navier-Stokes equations in periodic lattices [Fe].
Proof of Lemma 3.7 ([M-N-B-G]) The transcendental dispersion law for
3-waves in K∗ for cylindrical domains, is a polynomial of degree four in ϑ3 =
1/h2:
P̃ (ϑ3) = P̃4ϑ
3 + P̃3ϑ
3 + P̃2ϑ
3 + P̃1ϑ3 + P̃0 = 0, (3.8)
with n2 = k2 +m2 and n3 = k3 +m3.
Then with hk =
β2(k1,k2,αk3)
, hm =
β2(m1,m2,αm3)
, hn =
β2(n1,n2,αn3)
, cf.
the radial Sturm-Liouville problem in Section 3, [M-N-B-G], the coefficients
of P̃ (ϑ3) are given by:
P̃4 = −3,
P̃3 = −4(hk + hm + hn),
P̃2 = −6(hkhm + hkhn + hmhn),
P̃1 = −12hkhmhn,
P̃0 = h
n + h
n + h
k − 2(hkhmh2n + hkhnh2m + hmhnh2k).
Similar formulas for the periodic lattice domain were first derived in [B-M-N2],
[B-M-N3], [B-M-N4]. In cylindrical domains the resonance condition for K∗
is identical to
ϑ3 + hk
ϑ3 + hm
ϑ3 + hn
with ϑ3 =
, hk = β
2(k)/k23 , hm = β
2(m)/m23, hn = β
2(n)/n23; Eq. (3.8) is
the equivalent rational form.
From the asymptotic formula (3.44) in [M-N-B-G], for large β:
β(n1, n2, n3) ∼ n1π + n2
+ ψ, (3.9)
where ψ = 0 if lim m2
= 0 (e.g. h fixed, m2/m3 → 0) and ψ = ±π2 if
lim m2
= ±∞ (e.g. m2
fixed, h → ∞). The proof is completed by taking
leading terms P̃0+ϑ3P̃1 in (3.8), ϑ3 =
≪ 1, and m2 = 0, k2 = O(1), n2 =
O(1). �
We now state a theorem for bursting of the H3 norm in arbitrarily small
times, for initial data close to the hyperbolic point (0, E(0), 0):
Theorem 3.9 (Bursting dynamics in H3). Let λ > µ > ν, λν < 0, |µ| ≪ λ
and λ ∼ |ν|. Let W (t) = λ6p(t)2 + µ6q(t)2 + ν6r(t)2 the H3-norm squared of
an orbit of (3.5). Choose initial data such that: W (0) = λ6p(0)2 + µ6q(0)2
with λ6p(0)2 ∼ 1
W (0) and µ6q(0)2 ∼ 1
W (0). Then there exists t∗ > 0, such
W (t) ≥
W (0)
where t∗ ≤ 6√
W (0)
µ2Ln(λ/|µ|)(λ/|µ|)−1.
Remark 3.10 Under the conditions of Lemma 3.7,
≫ 1, whereas
µ2(Ln(λ/|µ|))(λ/|µ|)−1 ≪ 1. Therefore, over a small time interval of length
O(µ2(Ln(λ/|µ|))(λ/|µ|)−1) ≪ 1, the ratio ||U(t)||H3/||U(0)||H3 grows up to
a maximal value O
(λ/|µ|)3
≫ 1. Since the orbit is periodic, the H3 semi-
norm eventually relaxes to its initial state after some time (this being a mani-
festation of the time-reversibility of the Euler flow on the energy sphere). The
“shadowing” theorem 2.10 with s > 7/2 ensures that the full, original 3D Eu-
ler dynamics, with the same initial conditions, will undergo the same type of
burst. Notice that, with the definition (1.13) of ‖ · ‖Hs , one has
||Ωe3 × y||H3 = ||curl3(Ωe3 × y)||L2 = 0 .
Hence the solid rotation part of the original 3D Euler solution does not con-
tribute to the ratio ||V(t)||H3/||V(0)||H3 .
Theorem 3.11 (Bursting dynamics of the enstrophy). Under the same con-
ditions for the 3-wave resonance, let Ξ(t) = λ2p(t)2 + µ2q(t)2 + ν2r(t)2 the
enstrophy. Choose initial data such that Ξ(0) = λ2p(0)2 + µ2q(0)2 + ν2r(0)2
with λ2p(0)2 ∼ 1
Ξ(0), µ2q(0)2 ∼ 1
Ξ(0). Then there exists t∗∗ > 0, such that
Ξ(t∗∗) ≥
where t∗∗ ≤ 1√
Ln (λ/|µ|) (λ/|µ|)−1 .
Remark 3.12 It is interesting to compare this mechanism for bursts with ear-
lier results in the same direction obtained by DiPerna and Lions. Indeed, for
each p ∈ (1,∞), each δ ∈ (0, 1) and each t > 0, Di Perna and Lions [DiPe-Li]
constructed examples of 2D-3 components solutions to Euler equations such
||V(0)||W 1,p ≤ ǫ while ||V(t)||W 1,p ≥ 1/δ .
Their examples essentially correspond to shear flows of the form
V(t, x1, x2) =
u(x2)
w(x1 − tu(x2), x2)
where u ∈W 1,px2 while w ∈W
. Obviously
curlV(t, x1, x2) =
(∂2 − tu′(x2)∂1)w(x1 − tu(x2), x2)
−∂1w(x1 − tu(x2), x2)
−u′(x2)
Thus, all components in curlV(t, x1, x2) belong to L
loc, except for the term
−tu′(x2)∂1w(x1 − tu(x2), x2) .
For each t > 0, this term belongs to Lp for all choices of the functions u ∈
W 1,px2 and w ∈ W
x1,x2
if and only if p = ∞. Whenever p < ∞, DiPerna and
Lions construct their examples as some smooth approximation of the situation
above in the strong W 1,p topology.
In other words, the DiPerna-Lions construction works only in cases where
the initial vorticity does not belong to an algebra — specifically to Lp, which
is not an algebra unless p = ∞.
The type of burst obtained in our construction above is different: in that
case, the original vorticity belongs to the Sobolev space H2, which is an algebra
in space dimension 3. Similar phenomena are observed in all Sobolev spaces
Hβ with β ≥ 2 — which are also algebras in space dimension 3.
In other words, our results complement those of DiPerna-Lions on bursts
in higher order Sobolev spaces, however at the expense of using more intricate
dynamics.
We proceed to the proofs of Theorem 3.9 and 3.11. We are interested in
the evolution of
Ξ = λ2p2 + µ2q2 + ν2r2 (enstrophy) (3.10)
Compute
Ξ̇ = −2
λ2(µ− ν) + µ2(ν − λ) + ν2(λ− µ)
pqr (3.11)
˙(pqr) = −(µ− ν)q2r2 − (ν − λ)r2p2 − (λ− µ)p2q2 (3.12)
Using the first integrals above, one has
 (3.13)
where V an is the Vandermonde matrix
V an =
1 1 1
λ µ ν
λ2 µ2 ν2
For λ 6= µ 6= ν 6= λ, this matrix is invertible and
V an−1 =
(λ−µ)(λ−ν)
−(µ+ν)
(λ−µ)(λ−ν)
(λ−µ)(λ−ν)
(µ−ν)(µ−λ)
−(ν+λ)
(µ−ν)(µ−λ)
(µ−ν)(µ−λ)
(ν−λ)(ν−µ)
−(λ+µ)
(ν−λ)(ν−µ)
(ν−λ)(ν−µ)
Hence
(λ− µ)(λ − ν)
(Ξ− (µ+ ν)H + µνE)
(µ− ν)(µ− λ)
(Ξ− (ν + λ)H + νλE)
(ν − λ)(ν − µ)
(Ξ− (λ+ µ)H + λµE)
(3.14)
so that
(µ− ν)q2r2 = − (Ξ− (ν + λ)H + νλE) (Ξ− (λ + µ)H + λµE)
(λ− µ)(λ − ν)(µ− ν)
(ν − λ)r2p2 = − (Ξ− (λ + µ)H + λµE) (Ξ− (µ+ ν)H + µνE)
(λ− µ)(λ − ν)(µ− ν)
(λ− µ)p2q2 = − (Ξ− (µ+ ν)H + µνE) (Ξ− (ν + λ)H + νλE)
(λ− µ)(λ − ν)(µ− ν)
Later on, we shall use the notations
x−(λ, µ, ν) = (µ+ ν)H − µνE
x0 (λ, µ, ν) = (µ+ λ)H − µλE
x+(λ, µ, ν) = (λ+ ν)H − λνE
(3.15)
Therefore, we find that Ξ satisfies the second order ODE
Ξ̈ =− 2Kλ,µ,ν ((Ξ− x−(λ, µ, ν))(Ξ − x0(λ, µ, ν))
+(Ξ− x0(λ, µ, ν))(Ξ − x+(λ, µ, ν)) + (Ξ− x+(λ, µ, ν))(Ξ − x0(λ, µ, ν)))
which can be put in the form
Ξ̈ = −2Kλ,µ,νP ′λ,µ,ν(Ξ) (3.16)
where Pλ,µ,ν is the cubic
Pλ,µ,ν(X) = (X − x−(λ, µ, ν))(X − x0(λ, µ, ν))(X − x+(λ, µ, ν)) (3.17)
Kλ,µ,ν =
λ2(µ− ν) + µ2(ν − λ) + ν2(λ− µ)
(λ− µ)(λ− ν)(µ − ν)
(3.18)
In the sequel, we assume that the initial data for (p, q, r) is such that
r(0) = 0 , p(0)(q(0) 6= 0
Let us compute
x−(λ, µ, ν) = λνp(0)
2 + µ2q(0)2 + µ(λ− ν)p(0)2
x0 (λ, µ, ν) = λ
2p(0)2 + µ2q(0)2
x+(λ, µ, ν) = λ
2p(0)2 +
ν + λ
µ2q(0)2
(3.19)
We shall also assume that
λ > µ > ν , λν < 0 , |µ| ≪ λ and λ ∼ |ν| (3.20)
Then Kλ,µ,ν > 0 — in fact Kλ,µ,ν ∼ 2, and Ξ is a periodic function of t such
Ξ(t) = x0(λ, µ, ν) , sup
Ξ(t) = x+(λ, µ, ν) (3.21)
with half-period
Tλ,µ,ν =
Kλ,µ,ν
∫ x+(λ,µ,ν)
x0(λ,µ,ν)
−Pλ,µ,ν(x)
(3.22)
We are interested in the growth of the (squared) H3 norm
W (t) = λ6p(t)2 + µ6q(t)2 + ν6r(t)2 (3.23)
Expressing p2, q2 and r2 in terms of E, H and Ξ, it is found that
λ6(Ξ− x−(λ, µ, ν))
(λ− µ)(λ − ν)
µ6(Ξ− x+(λ, µ, ν))
(µ− ν)(µ− λ)
ν6(Ξ− x0(λ, µ, ν))
(ν − λ)(ν − µ)
(3.24)
Hence, when Ξ = x+(λ, µ, ν), then
λ6(x+(λ, µ ν) − x−(λ, µ, ν))
(λ− µ)(λ− ν)
ν6(x+(λ, µ ν)− x0(λ, µ, ν))
(ν − λ)(ν − µ)
λ6(x+(λ, µ ν) − x−(λ, µ, ν))
(λ− µ)(λ− ν)
Let us compute
x+(λ, µ ν)− x−(λ, µ, ν) = (λ − µ)(λ− ν)p(0)2 +
ν + λ
µ2q(0)2
& −νλq(0)2 ∼ λ2q(0)2
(3.25)
We shall pick the initial data such that
W (0) = λ6p(0)6 + µ6q(0)6 with λ6p(0)2 ∼ 1
W (0) and µ6q(0)2 ∼ 1
W (0)
(3.26)
Hence, when Ξ reaches x+(λ, µ, ν), one has
λ8q(0)2
(λ− µ)(λ− ν)
µ6(λ − µ)(λ− ν)
W (0) ∼ 1
W (0) . (3.27)
Hence W jumps from W (0) to a quantity ∼ 1
W (0) in an interval of time
that does not exceed one period of the Ξ motion, i.e. 2Tλ,µ,ν . Let us estimate
this interval of time. We recall the asymptotic equivalent for the period of an
elliptic integral in the modulus 1 limit.
Lemma 3.13 Assume that x− < x0 < x+. Then
(x− x−)(x− x0)(x+ − x)
x+ − x−
x+−x0
x+−x−
uniformly in x−, x0, and x+ as
x+−x0
x+−x− → 1.
x+(λ, µ, ν) − x−(λ, µ, ν)
λ2q(0)2
∼ |µ|
W (0)
x0(λ, µ, ν) − x−(λ, µ, ν) = (λ− µ)(λ − ν)p(0)2 (3.28)
so that
x+−x0
x+−x−
1− (λ−µ)(λ−ν)p(0)
(λ−µ)(λ−ν)p(0)2+(µ(ν+λ)−νλ−µ2)q(0)2
∼ (λ− µ)(λ− ν)p(0)
2 + (µ(ν + λ)− νλ− µ2)q(0)2
2(λ− µ)(λ − ν)p(0)2
∼ q(0)
2p(0)2
W (0)/2µ6
W (0)/2λ6
Hence
2Tλ,µ,ν .
W (0)
≤ 12√
W (0)
(3.29)
Conclusion: collecting (3.26), (3.27) and (3.29), we see that the squared H3
norm W varies from W (0) to a quantity ∼ ρ6W (0) in an interval of time
. 12√
W (0)
µ2 ln ρ
. (Here ρ = λ/µ).
We now proceed to obtain similar bursting estimates for the enstrophy.
We return to (3.21) and (3.22). Pick the initial data so that
Ξ(0) = λ2p(0)2 + µ2q(0)2 with λ2p(0)2 ∼ 1
Ξ(0) and µ2q(0)2 ∼ 1
Ξ(0).
x+(λ, µ, ν) − x−(λ, µ, ν)
= (λ− µ)(λ − ν)p(0)2 +
ν + λ
µ2q(0)2
∼ 2λ2p(0)2 + λ2q(0)2 ∼
while
x0(λ, µ, ν)− x−(λ, µ, ν) = (λ− µ)(λ− ν)p(0)2 ∼ 2λ2p(0)2 ∼ Ξ(0).
Hence, in the limit as ρ = λ/|µ| → +∞, one has
2Tλ,µ,ν ∼
ρ2Ξ(0)
1− Ξ(0)1
ρ2Ξ(0)
2Ξ(0)
1− 2ρ−2
2Ξ(0)
And Ξ varies from
x0(λ, µ, ν) = Ξ(0) to x+(λ, µ, ν) ∼ ρ2Ξ(0)
on an interval of time of length Tλ,µ,ν . �
4 Strictly resonant Euler systems: the case of
3-waves resonances on small-scales
4.1 Infinite dimensional uncoupled SO(3) systems
In this section, we consider the 3-wave resonant set K∗ when
|k|2, |m|2, |n|2 ≥
, 0 < η ≪ 1,
i.e. 3-wave resonances on small scales; here |k|2 = k21 + k22 + k23 , where
(k1, k2, k3) index the curl eigenvalues, and similarly for |m|2, |n|2. Recall
that k2 + m2 = n2, k3 + m3 = n3 (exact convolutions), but that the sum-
mation on k1, m1 on the right hand side of Eqs. (2.30) is not a convolution.
However, for |k|2, |m|2, |n|2 ≥ 1
, the summation in k1, m1 becomes an
asymptotic convolution. First:
Proposition 4.1 The set K∗ restricted to |k|2, |m|2, |n|2 ≥ 1
, ∀η, 0 < η ≪ 1
is not empty: there exist at least one h/R with resonant three waves satisfying
the above small scales condition.
Proof. We follow the algebra of the exact transcendental dispersion law
(3.8) derived in the proof of Lemma 3.7. Note that P̃ (ϑ3) < 0 for ϑ3 =
large enough. We can choose hm =
β2(m1,m2,αm3)
= 0, say in the specific limit
→ 0, and β(m1,m2, αm3) ∼ m1π+m2 π2 +
. Then P̃0 = h
n > 0 and
P̃ (ϑ3) must possess at least one (transcendental) root ϑ3 =
In the above context, the radial components of the curl eigenfunctions in-
volve cosines and sines in βr
(cf. Section 3, [M-N-B-G]) and the summation in
k1, m1 on the right hand side of the resonant Euler equations (2.30) becomes
an asymptotic convolution. The rigorous asymptotic convolution estimates are
highly technical and detailed in [Fro-M-N]. The 3-wave resonant systems for
|k|2, |m|2, |n|2 ≥ 1
are equivalent to those of an equivalent periodic lattice
[0, 2π]× [0, 2π]× [0, 2πh], ϑ3 = 1h2 ; the resonant three wave relation becomes:
ϑ3 + ϑ1
ϑ3 + ϑ1
ϑ3 + ϑ1
= 0, (4.1a)
k +m = n, k3m3n3 6= 0. (4.1b)
The algebraic geometry of these rational 3-wave resonance equations has been
investigated in depth in [B-M-N3] and [B-M-N4]. Here ϑ1, ϑ2, ϑ3 are periodic
lattice parameters; in the small-scales cylindrical case, ϑ1 = ϑ2 = 1 (after
rescaling of n2, k2, m2), ϑ3 = 1/h
2, h height. Based on the algebraic
geometry of “resonance curves” in [B-M-N3], [B-M-N4], we investigate the
resonant 3D Euler equations (2.30) in the equivalent periodic lattices.
First, triplets (k,m, n) solution of (4.1) are invariant under the reflec-
tion symmetries σ0, σ1, σ2, σ3 defined in Corollary 3.1 and Remark 3.2: σ0 =
Id, σj(k) = (ǫi,jki), 1 ≤ i ≤ 3, ǫi,j = +1 if i 6= j, ǫi,j = −1 if i = j, 1 ≤ j ≤ 3.
Second the set K∗ in (4.1) is invariant under the homothetic transformations:
(k,m, n) → (γk, γm, γn), γ rational. (4.2)
The resonant triplets lie on projective lines in the wavenumber space, with
equivariance under σj , 0 ≤ j ≤ 3 and γ-rescaling. For every given equivariant
family of such projective lines, the resonant curve is the graph of ϑ3
versus
, for parametric domain resonances in ϑ1, ϑ2, ϑ3.
Lemma 4.2 (p.17, [B-M-N4]). For every equivariant (k,m, n), the resonant
curve in the quadrant ϑ1 > 0, ϑ2 > 0, ϑ3 > 0 is the graph of a smooth function
ϑ3/ϑ1 ≡ F (ϑ2/ϑ1) intersected with the quadrant.
Theorem 4.3 (p.19, [B-M-N4]). A resonant curve in the quadrant ϑ3/ϑ1 > 0,
ϑ2/ϑ1 > 0 is called irreducible if:
k23 k
m23 m
n23 n
 6= 0. (4.3)
An irreducible resonant curve is uniquely characterized by six non-negative
algebraic invariants P1, P2, R1, R2, S1, S2, such that
P21 ,P22
R21,R22
S21 ,S22
and permutations thereof.
Lemma 4.4 (p. 25, [B-M-N4]). For resonant triplets (k,m, n) associated to
a given irreducible resonant curve, that is verifying Eq. (4.3), consider the
convolution equation n = k +m. Let σi(n) 6= n, ∀i, 1 ≤ i ≤ 3. Then there
are no more that two solutions (k,m) and (m, k), for a given n, provided
the six non-degeneracy conditions (3.39)-(3.44) in [B-M-N4] for the algebraic
invariants of the irreducible curve are verified.
For more details on the technical non-degeneracy conditions, see the Ap-
pendix. An exhaustive algebraic geometric investigation of all solutions to
n = k +m on irreducible resonant curves is found in [B-M-N4]. The essence
of the above lemma lies in that given such an irreducible, “non-degenerate”
triplet (k,m, n) on K∗, all other triplets on the same irreducible resonant
curves are exhaustively given by the equivariant projective lines:
(k,m, n) → (γk, γm, γn), for some γ rational , (4.4)
(k,m, n) → (σjk, σjm,σjn), j = 1, 2, 3, (4.5)
and permutations of k and m in the above. Of course the homothety γ and
the σj symmetries preserve the convolution. This context of irreducible, “non-
degenerate” resonant curves yields an infinite dimensional, uncoupled system
of rigid body SO(3;R) and SO(3;C) dynamics for the 3D resonant Euler
equations (2.30).
Theorem 4.5 For any irreducible triplet (k,m, n) which satisfy Theorem 4.3,
and under the “non-degeneracy” conditions of Lemma 4.4 (cf. Appendix), the
resonant Euler equations split into the infinite, countable sequence of uncou-
pled SO(3;R) systems:
ȧk = Γkmn(λm − λn)aman, (4.6a)
ȧm = Γkmn(λn − λk)anak, (4.6b)
ȧn = Γkmn(λk − λm)akam, (4.6c)
for all (k,m, n) = γ(σj(k
∗), σj(m
∗), σj(n
∗)), γ = ±1,±2,±3..., 0 ≤ j ≤ 3.
(4.7)
k∗,m∗, n∗ are some relatively prime integer vectors in Z3 characterizing the
equivariant family of projective lines (k,m, n); Γkmn = i < Φk ×Φm,Φ∗n >,
Γkmn real.
Proof. Theorem 4.5 is a simpler version for invariant manifolds of more
general SO(3;C) systems. It is a straightforward corollary of Proposition 3.2,
Proposition 3.3, Theorem 3.3, Theorem 3.4 and Theorem 3.5 in [B-M-N4].
The latter article did not explicit the resonant equations and did not use the
curl-helicity algebra fundamentally underlying this present work. Rigorously
asymptotic infinite countable sequences of uncoupled SO(3;R), SO(3;C) sys-
tems are not derived via the usual harmonic analysis tools of Fourier modes,
in the 3D Euler context. Polarization of curl eigenvalues and eigenfunctions
and helicity play an essential role.
Corollary 4.6 Under the conditions λn∗ − λk∗ > 0, λk∗ − λm∗ > 0, the
resonant Euler systems (4.6) admit a disjoint, countable family of homoclinic
cycles. Moreover, under the conditions λn∗ ≫ +1, λm∗ ≪ −1, |λk∗ | ≪ λn∗ ,
each subsystem (4.6) possesses orbits whose Hs norms, s ≥ 1, burst arbitrarily
large in arbitrarily small times.
Remark 4.7 One can prove that there exists some Γmax, 0 < Γmax < ∞,
such that |Γkmn| < Γmax, for all (k,m, n) on the equivariant projective lines
defined by (4.7). Systems (4.6) “freeze” cascades of energy; their total enstro-
phy Ξ(t) =
(k,m,n)(λ
k(t) + λ
m(t) + λ
n(t)) remains bounded, albeit
with large bursts of Ξ(t)/Ξ(0), on the reversible orbits topologically close to
the homoclinic cycles.
4.2 Coupled SO(3) rigid body resonant systems
We now derive a new resonant Euler system which couples two SO(3;R)
rigid bodies via a common principle axis of inertia and a common moment
of inertia. This 5-dimensional system conserves energy, helicity, and is rather
interesting in that dynamics on its homoclinic manifolds show bursting cas-
cades of enstrophy to the smallest scale in the resonant set. We consider the
equivalent periodic lattice geometry under the conditions of Proposition 4.1.
In Appendix, we prove that for an “irreducible” 3-wave resonant set which
now satisfies the algebraic “degeneracy” (A-4), there exist exactly two “prim-
itive” resonant triplets (k,m, n) and (k̃, m̃, n), where k, m, k̃, m̃ are relative
prime integer valued vectors in Z3:
Lemma 4.8 Under the algebraic degeneracy condition (A-4) the irreducible
equivariant family of projective lines in K∗ is exactly generated by the follow-
ing two “primitive” triplets:
n = k +m, k = ak, m = bm, (4.8a)
n = k̃ + m̃, k̃ = a′σi(k) + b
′σj(m), (4.8b)
that is,
n = ak + bm, (4.8c)
n = a′σi(k) + b
′σj(m), (4.8d)
where σi 6= σj are some reflection symmetries, a, b, a′, b′ are relatively
prime integers, positive or negative, and k, m are relatively prime integer
valued vectors in Z3, that is:
(a, a′) = (b, b′) = (a, b) = (a′, b′) = 1, (k,m) = 1,
where ( , ) denotes the Greatest Common Denominator of two integers. All
other resonant wave number triplets are generated by the group actions σl,
l = 1, 2, 3, and homothetic rescalings (k,m, n) → γ(k,m, n), (k̃, m̃, n) →
γ(k̃, m̃, n), (γ ∈ Z) of the “primitive” triplets.
Remark 4.9 It can be proven that the set of such coupled “primitive” triplets
is not empty on the periodic lattice. The algebraic irreducibility condition of
Lemma 4.2 implies that ±k3/|k| = ±k̃3/|k̃| and ±m3/|m| = ±m̃3/|m̃|, which
is obviously verified in equations (4.8).
Theorem 4.10 Under conditions of Lemma 4.8 the resonant Euler system
reduces to a system of two rigid bodies coupled via an(t):
ȧk = (λm − λn)Γaman (4.9a)
ȧm = (λn − λk)Γanak (4.9b)
ȧn = (λk − λm)Γakam + (λk̃ − λm̃)Γ̃ak̃am̃ (4.9c)
ȧm̃ = (λn − λk̃)Γ̃anak̃ (4.9d)
ȧk̃ = (λm̃ − λn)Γ̃am̃an, (4.9e)
where Γ = i < Φk×Φm,Φ∗n >, Γ̃ = i < Φk̃×Φm̃,Φ∗n >. Energy and Helicity
are conserved.
Theorem 4.11 The resonant system (4.9) possesses three independent con-
servation laws:
E1 = a2k + (1− α)a2m, (4.10a)
E2 = a2n + αa2m + (1 − α̃)a2m̃, (4.10b)
E3 = a2k̃ + α̃a
m̃, (4.10c)
where
α = (λm − λk)/(λn − λk), (4.11a)
α̃ = (λm̃ − λn)/(λk̃ − λn). (4.11b)
Theorem 4.12 Under the conditions
λm < λk < λn, (4.12a)
λm̃ < λn < λk̃, (4.12b)
which imply α < 0, α̃ < 0, the equilibria (±ak(0), 0, 0, 0,±ak̃(0)) are hyper-
bolic for |ak̃(0)| small enough with respect to |ak(0)|. The unstable manifolds
of these equilibria are one dimensional, and the nonlinear dynamics of system
(4.9) are constrained on the ellipse E1 (4.10a) for ak(t), am(t), the hyperbola
E3 (4.10c) for ak̃(t), am̃(t), and the hyperboloid E2 (4.10b) for am(t), am̃(t),
an(t).
Theorem 4.13 Let the 2-manifold E1 ∩E2 ∩E3 be coordinatized by (am, am̃).
On this 2-manifold, the resonant system (4.9) is Hamiltonian, and therefore
integrable. Its Hamiltonian vector field h is defined by
ιhω = Γ(λn − λk)
− Γ̃(λn − λk̃)
, (4.13)
where ιhω designates the inner product of the symplectic 2-form
dam ∧ dam̃
akanak̃
(4.14)
with the vector field h.
Proof of Theorem 4.13: Eliminating ak(t) via E1, an(t) via E2, ak̃(t) via
E3, the resonant system (4.9) reduces to:
ȧm = ±Γ(λn − λk)(E1 − (1− α)a2m)
2 (E2 − αa2m + (α̃− 1)a2m̃)
ȧm̃ = ±Γ̃(λn − λk̃)(E2 − αa
m + (α̃− 1)a2m̃)
2 (E3 − α̃a2m̃)
after changing the time variable into
(E1 − (1− α)a2m)
2 (E2 − αa2m + (α̃− 1)a2m̃)
2 (E3 − α̃a2m̃)
2 ds .
On each component of the manifold E1 ∩E2 ∩E3, the following functionals
are conserved:
H(am, am̃) = ± Γ̃(λn − λk̃)
(E1 − (1− α)a2m)1/2
± Γ(λn − λk)
(E3 − α̃a2m̃)1/2
Observe that the system of two coupled rigid bodies (4.9) does not seem to ad-
mit a simple Lie-Poisson bracket in the original variables (ak, am, an, am̃, ak̃).
Yet, when restricted to the 2-manifold E1 ∩E2 ∩E3 that is invariant under the
flow of (4.9), it is Hamiltonian and therefore integrable.
This raises the following interesting issue: according to the shadowing
Theorem 2.10, the Euler dynamics remains asymptotically close to that of
chains of coupled SO(3;R) and SO(3;C) rigid body systems. Perhaps some
new information could be obtained in this way. We are currently investigating
this question and will report on it in a forthcoming publication [G-M-N].
Already the simple 5-dimensional system (4.9) has interesting dynamical
properties, wich we could not find in the existing literature on systems related
to spinning tops.
Consider for instance the dynamics of the resonant system (4.9) with I.C.
topologically close to the hyperbola equilibria (±ak(0), 0, 0, 0,±ak̃(0)). Un-
der the conditions of (4.12) and with the help of the integrability Theorem
4.13, it is easy to construct equivariant families of homoclinic cycles at these
hyperbolic critical points:
Corollary 4.14 The hyperbolic critical points (±ak(0), 0, 0, 0,±ak̃(0)) pos-
sess 1-dimensional homoclinic cycles on the cones
a2n + (1− α̃)a2m̃ = −αa2m (4.15)
with α < 0, α̃ < 0.
Note that these are genuine homoclinic cycles, NOT sums of heteroclinic
connections. Initial conditions for the resonant system (4.9) are now chosen
in a small neighborhood of these hyperbolic critical points, the corresponding
orbits are topologically close to these cycles. With the ordering:
λm < λk < λn, (4.16a)
|λk| ≪ |λm|, |λk| ≪ λn, (4.16b)
λm̃ < λn < λk̃, (4.16c)
|λm̃| ≪ λk̃, (4.16d)
λk̃ ≫ λn, (4.16e)
which can be realized with |a
| ≫ 1 and | b
| ≪ 1 in the resonant triplets
(4.8), we can demonstrate bursting dynamics akin to Theorem 3.9 and 3.11 for
enstrophy and Hs norms, s ≥ 2. The interesting feature is the maximization
of |ak̃(t)| near the turning points of the homoclinic cycles on the cones (4.15).
This corresponds to transfer of energy to the smallest scale k̃, λk̃.
In a publication in preparation, we investigate infinite systems of the cou-
pled rigid bodies equations (4.9).
APPENDIX
We focus on a resonant wave number triplet (n, k,m) ∈ (Z∗)3 verifying
• the convolution relation
n = k +m, (A-1)
• the resonant 3-wave resonance relation
± n3√
1 + ϑ2n
2 + ϑ3n
± k3√
1 + ϑ2k
2 + ϑ3k
± m3√
1 + ϑ2m
2 + ϑ3m
(A-2)
• the condition of “non-catalyticity”
k3m3n3 6= 0, (A-3)
• and the degeneracy condition of [B-M-N4] (see p26)
Giri,j(k,m) = kinjml + klmjni = 0, (A-4)
where (i, j, l) is a permutation of (1, 2, 3).
Then, we know (see lemma 3.5 (2) of [B-M-N4]) that the system of equations
(A-3)-(A-4) for the unknown k and m, given the vector n, admits exactly 4
solutions in Z3 × Z3:
(k,m), (m, k), (k̃, m̃), (m̃, k̃).
Here k and m are the two vectors of the original resonant triplet, whereas
k̃ = ασi(k), m̃ = βσj(m)
where
mikl −mlki
mikl +mlki
/∈ {0,±1} and β = mlkj −mjkl
mlkj +mjkl
/∈ {0,±1}
and where the symmetries σi and σj are defined by
σi : u = (ul)l=1,2,3 →
(−1)δilul
l=1,2,3
One verifies that
σ2i = σ
j = Id, σiσj = σjσi = −σl.
That is, the group generated by σi and σj is the Klein group Z/2Z× Z/2Z.
Let us first write the irrational numbers α and β under the irreducible
representation
, β =
, with a, a′, b, b′ ∈ Z∗ and (a, a′) = (b, b′) = 1,
where ( , ) denotes the Greatest Common Denominator of the integer pair.
From k̃ ∈ Z3, it follows that a|a′k; but since (a, a′) = 1, the Euclid’s lemma
yields that a|k. Similarly, b|m. Now set
k ∈ Z3, m = 1
m ∈ Z3.
Hence the integer vector n admits the two decompositions
n = ak + bm = a′σi(k) + b
′σj(m).
Since the function
z 7−→ z3√
1 + ϑ2z
2 + ϑ3z
is homogeneous of degree 0, we see that within the resonance condition (A-2)
we can replace each vector k,m and n by any colinear vectors - either integer
or not. Suppose now that there exists some positive integer d 6= 1 such that
d|k; then d|n, so that by setting
n, k0 =
k, m0 =
we finally obtain
n0 = ak0 + bk0 = a
′σi(k0) + b
′σj(m0).
The triplets (n0, ak0, bm0) and (n0, a
′σi(k0), b
′σj(m0)) further verify from the
above remark, the convolution relation (A-1) and the resonance relation (A-2).
Hence, without loss of generality, we can assume that the only positive integer
d such that d|k and d|m is 1; which we denote by
(k,m) = 1.
Equivalently,
k1Z+ k2Z+ k3Z+m1Z+m2Z+m3Z+ = Z.
Finally, suppose there exists some positive integer d 6= 1 such that d|a and
d|b. Then d|n; set
n, a0 =
a, b0 =
Observe that
Giri,j(a0k, b0m) =
Giri,j(ak, bm) = 0.
It follows from lemma 3.5 (2) of [B-M-N4] that the vector n0 of the resonant
triplet (n0, a0k, b0m) can also be written as
n0 = k̂ + m̂ with (n0, k̂, m̂) verifying (A-2).
But then
n = dn0 = ak + bm = a
′σi(k) + b
′σj(m) = dk̂ + dm̂.
From lemma 3.5 (2) of [B-M-N4], (dk̂, dm̂) must coincide with either one of
the pairs
(a′σi(k), b
′σj(m)), (b
′σj(m), a
′σi(k)).
In particular, d|a′k and d|b′m. Since d|a and (a, a′) = 1, we have (d, a′);
similarly (d, b′) = 1. But then Euclid’s lemma yields that d|k and d|m, which
contradicts the fact that (k,m) = 1. Hence we have proven that (a, b) = 1.
In a similar way, one can show that (a′, b′) = 1.
Conclusion: It follows from the above study that n ∈ Z∗ admits the two
decompositions
n = ak + bm = a′σi(k) + b
′σj(m)
(a, a′) = (b, b′) = (a, b) = (a′, b′) = 1, (k,m) = 1.
The triplets (n, ak, bm) and (n, a′σi(k), b
′σj(m)) both verify the resonant
condition (A-2) (from the homogeneity of this condition) as well as the condi-
tion of non-catalyticity (A-3). Indeed, aba′b′ 6= 0 and the condition (A-3) on
the initial triplet (n, k,m) imply that the reduced triplet (n, k,m) also verifies
(A-3)). Finally, the degeneracy condition (A-4)
Giri,j(ak, bm) = 0
is verified.
Acknowledgments. We would like to thank A.I. Bobenko, C. Bardos
and G. Seregin for very useful discussions. The assistance of Dr. B. S. Kim
is gratefully acknowledged. A.M. and B.N. acknowledge the support of the
AFOSR contract FA9550-05-1-0047.
References
[Ar1] Arnold, V.I., Mathematical methods of classical mechanics, Springer-
Verlag, New York-Berlin, 1978.
[Ar2] Arnold, V.I., Small denominators. I. Mappings of the circumference
onto itself, Amer. Math. Soc. Transl. Ser. 2, 46 (1965), p. 213-284.
[Ar-Khe] Arnold, V.I. and Khesin, B.A., Topological Methods in Hydrody-
namics, Applied Mathematical Sciences, 125, Springer, 1997.
[B-M-N1] Babin, A., Mahalov, A. and Nicolaenko, B., Global splitting, in-
tegrability and regularity of 3D Euler and Navier-Stokes equations for
uniformly rotating fluids, European J. Mechanics B/Fluids, 15 (1996),
p. 291-300.
[B-M-N2] Babin, A., Mahalov, A. and Nicolaenko, B., Global regularity and
integrability of 3D Euler and Navier-Stokes equations for uniformly ro-
tating fluids, Asymptotic Analysis, 15 (1997), p. 103–150.
[B-M-N3] Babin, A., Mahalov, A. and Nicolaenko, B., Global regularity of
3D rotating Navier-Stokes equations for resonant domains, Indiana Univ.
Math. J., 48 (1999), No. 3, p. 1133-1176.
[B-M-N4] Babin, A., Mahalov, A. and Nicolaenko, B., 3D Navier-Stokes and
Euler equations with initial data characterized by uniformly large vortic-
ity, Indiana Univ. Math. J., 50 (2001), p. 1-35.
[B-K-M] Beale, J.T., Kato, T. and Majda, A., Remarks on the breakdown of
smooth solutions for the 3D Euler equations, Commun. Math. Phys., 94
(1984), p. 61-66.
[Bes] Besicovitch, A.S., Almost Periodic Functions, Dover, New York, 1954.
[Bo-Mi] Bogoliubov, N.N. and Mitropolsky, Y. A., Asymptotic Methods in
the Theory of Non-linear Oscillations, Gordon and Breach Science Pub-
lishers, New York, 1961.
[Bou-Br] Bourguignon, J.P. and Brezis, H., Remark on the Euler equations,
J. Func. Anal., 15 (1974), p. 341-363.
[Ch-Ch-Ey-H] Chen, Q., Chen, S., Eyink, G.L. and Holm, D.D., Intermit-
tency in the joint cascade of energy and helicity, Phys. Rev. Letters, 90
(2003), p. 214503.
[Cor] Corduneanu, C., Almost periodic Functions, Wiley-Interscience, New
York, 1968.
[DiPe-Li] DiPerna, R.J. and Lions, P.L., Ordinary differential equations,
Sobolev spaces and transport theory, Invent. Math., 98 (1989), p. 511-
[Fe] Fefferman, C.L., Existence and smoothness of the Navier-Stokes equa-
tions, The millennium prize problems, Clay Math. Inst., Cambridge, MA
(2006), p. 57-67.
[Fri] Frisch, U., Turbulence: the legacy of A. N. Kolmogolov, Cambridge
University Press, 1995.
[Fro-M-N] Frolova, E., Mahalov, A. and Nicolaenko, B., Restricted interac-
tions and global regularity of 3D rapidly rotating Navier-Stokes equa-
tions in cylindrical domains, Journal of Mathematical Sciences, Springer,
to appear.
[Gl1] Gledzer, E.B., Systema gidrodinamicheskovo tipa, dopuskayuchaya dva
kvadratichnykh integrala dvizheniya, D. A. N. USSR, 209 (1973), No. 5.
[G-D-O] Gledzer, E.B., Dolzhanski, F.V. and Obukhov, A.M., Systemi gidro-
dinamitcheskovo tipa i ikh primetchnii, Nauka, Moscow, (1987).
[G-M-N] Golse, F., Mahalov, A., Nicolaenko, B., in preparation.
[Gu-Ma] Guckenheimer, J. and Mahalov, A., Resonant triad interaction in
symmetric systems, Physica D, 54 (1992), 267-310.
[Hou1] Hou, T.Y., Deng, J. and Yu, X., Geometric properties and nonblowup
of 3D incompressible Euler flow, C.P.D.E., 30 (2005), p. 225-243.
[Hou2] Hou, T.Y. and Li, R., Dynamic depletion of vortex stretching and non-
blowup of the 3D incompressible Euler equations, CALTECH, preprint
(2006).
[Ka] Kato, T., Nonstationary flows of viscous and ideal fluids in R3, J. Func.
Anal., 9 (1972), p. 296-305.
[Ke] Kerr, R.M., Evidence for a singularity of the three dimensional, incom-
pressible Euler equations, Phys. Fluids, 5 (1993), No. 7, p. 1725-1746.
[Les] Lesieur, M., Turbulence in fluids, 2nd edition, Kluwer, Dortrecht, 1990.
[Li] Lions, P.L., Mathematical Topics in Fluid Mechanics: Incompressible
Models Vol 1, Oxford University Press, 1998.
[Mah] Mahalov, A., The instability of rotating fluid columns subjected to a
weak external Coriolis force, Phys. Fluids A, 5 (1993), No. 4, p. 891-900.
[M-N-B-G] Mahalov, A., Nicolaenko, B., Bardos, C. and Golse, F., Non blow-
up of the 3D Euler equations for a class of three-dimensional initial data
in cylindrical domains, Methods and Applications of Analysis, 11 (2004),
No. 4, p. 605-634.
[Man] Manakhov, S.V., Note on the integration of Euler’s equations of the
dynamics of a n-dimensional rigid body, Funct. Anal. and Appl., 10
(1976), No. 4, p. 328-329.
[Mor1] Moreau, J.J., Une methode de cinematique fonctionelle en hydrody-
namicque, C.R. Acad. Sci. Paris, 249 (1959), p. 2156-2158
[Mor2] Moreau, J.J., Constantes d’un ilôt tourbillonaire en fluide parfait
barotrope, C.R. Acad. Sci. Paris, 252 (1961), p. 2810-2812
[Mof] Moffatt, H.K., The degree of knottedness of tangled vortex lines, J.
Fluid Mech., 106 (1969), p. 117-129.
[Poi] Poincaré, H., Sur la précession des corps déformables, Bull. As-
tronomique, 27 (1910), p. 321-356.
[Sob] Sobolev, S.L., Ob odnoi novoi zadache matematicheskoi fiziki, Izvestiia
Akademii Nauk SSSR, Ser. Matematicheskaia, 18 (1954), No. 1, p. 3–50.
[Vish] Vishik, S. M., Ob invariantnyh characteristikah kvadratichno-
nelineynyh sistem kaskadnovo tipa, D. A. N. USSR, 228 (1976), No.
6, p. 1269-1270.
[We-Wil] Weiland, J. and Wilhelmsson, H., Coherent nonlinear interactions
of waves in plasmas, Pergamon, Oxford, 1977.
[Yu1] Yudovich, V.I., Non stationary flow of an ideal incompressible liquid,
Zb. Vych. Mat., 3 (1963), p. 1032-1066
[Yu2] Yudovich, V.I., Uniqueness theorem for the basic nonstationary prob-
lem in th dynamics of an ideal incompressible fluid, Math. Res. Letters,
2 (1995), p. 27-38.
[Zak-Man1] Zakharov, V.E. and Manakov, S.V., Resonant interactions of
wave packets in nonlinear media, Sov. Phys. JETP Lett., 18 (1973),
243-245.
[Zak-Man2] Zakharov, V.E. and Manakov, S.V., The theory of resonance in-
teraction of wave packets in nonlinear media, Sov. Phys. JETP, 42 (1976),
842-850.
	Introduction
	Vorticity waves and resonances of elementary swirling flows
	Strictly resonant Euler systems: the SO(3) case
	Strictly resonant Euler systems: the case of 3-waves resonances on small-scales
	 Infinite dimensional uncoupled SO(3) systems 
	Coupled SO(3) rigid body resonant systems
ABSTRACT
  A class of three-dimensional initial data characterized by uniformly large
vorticity is considered for the Euler equations of incompressible fluids. The
fast singular oscillating limits of the Euler equations are studied for
parametrically resonant cylinders. Resonances of fast swirling Beltrami waves
deplete the Euler nonlinearity. The resonant Euler equations are systems of
three-dimensional rigid body equations, coupled or not. Some cases of these
resonant systems have homoclinic cycles, and orbits in the vicinity of these
homoclinic cycles lead to bursts of the Euler solution measured in Sobolev
norms of order higher than that corresponding to the enstrophy.

<|endoftext|><|startoftext|>
Introduction 
In addition to zinc dialkyldithiophosphate (ZDTP) additives, extensively used for their 
exceptional antioxidant and anti-wear properties under boundary conditions in automotive 
engines, lubricating oils contain several additives, among which there are detergent and 
dispersant additives whose main role is to keep oil insoluble contaminants and degradation 
products in suspension, at elevated temperature for the detergent additives, and at low 
temperatures for the dispersant ones. Organo molybdenum compounds such as molybdenum 
dithiocarbamate (MoDTC) are also used as friction modifiers for energy saving. However, 
when used together in formulated oils, additives interact in various ways resulting either in 
synergies or in adverse effects affecting the oil performance regarding anti-wear and friction 
behaviour, and modifying the characteristics of the protective surface films generated during 
friction (tribofilms). A lot of investigations have been conducted to evaluate the performances 
of additive mixtures and to determine the composition of associated tribofilms. Several 
factors were identified as playing a role: additive structure [1, 2], additives concentration [3-
6], base oil nature [7, 8], …, or combinations of these parameters. A detailed review on 
published information on that topic was written by Willermet [9]. Non-chemical parameters 
such as characteristics of the solid antagonists (hardness, roughness) or test conditions (load, 
temperature, sliding speed) [3, 10] also might influence the additive interactions. 
Among this variety of additive interactions, we will focus on that between ZDTP and 
MoDTC, extensively studied through chemical investigations. All published works agree 
upon the fact that friction and anti-wear performances of oils are improved when ZDTP and 
MoDTC are used together. The formation of molybdenum disulphide (MoS2) on the rubbing 
surfaces has been evidenced by several authors [11, 12]. Using UHV friction tests, coupled 
with high-resolution TEM observation of wear debris and spectroscopic studies, Grossiord et 
al. has given evidence for the mechanism of single MoS2 sheet lubrication [13]. 
The aim of this paper is to enlarge the knowledge of the local mechanical and frictional 
properties of anti-wear tribofilms to those of films obtained from lubricants containing 
different additives (ZDTP, MoDTC, detergent/dispersant) or mixtures of additives, in order to 
explore the ZDTP/MoDTC synergy on a mechanical point of view. The only published results 
on that topic are the recent papers from Ye et al. who performed AFM observations and 
nanoindentation measurements on ZDTP and ZDTP + MoDTC tribofilms [14, 15]. 
In the present study, nanoindentation tests with continuous stiffness measurements were 
performed on unwashed and solvent-washed tribofilms to determine their mechanical 
properties. The frictional behaviour of the tribofilms was investigated through local 
nanofriction experiments, conducted with the same device. The evolution of the friction 
coefficient as a function of the applied pressure for the different lubricant formulations 
leading to different tribofilms has been determined. 
2. Preliminary results obtained on ZDTP anti-wear tribofilms 
The structure and the rheological properties of anti-wear films from a zinc 
dialkyldithiophosphate (ZDTP) solution generated in a rolling/sliding contact, simulating 
engine valve train conditions, have been studied in detail with analytical and surface force 
tools and the results have been published by the authors in a previous paper [16]. As preamble 
to the present paper, only the main points are summarised here. 
The ZDTP solution was a commercial secondary alkyl ZDTP additive at 0.1% weight of 
phosphorus in a highly refined base oil. The ZDTP anti-wear films have a complex structure 
that has been determined by extensive use of surface analytical techniques. It has been shown 
that the ZDTP films consisted of at least three non-homogeneous layers: on the steel surface, 
there is a sulphide/oxide layer, which is almost completely covered by a protective phosphate 
layer, with the addition of a viscous overlayer of ZDTP degradation precipitates (alkyl 
phosphate precipitates). This latter layer was removed when the film was washed with an 
alkane solvent. Therefore, the properties of the ZDTP films have been studied both before and 
after solvent washing with n-heptane. First, sphere/plane squeeze experiments were 
performed with a surface force apparatus (SFA) on unwashed films, showing that the 
overlayer of alkyl phosphate precipitates was heterogeneous and discontinuous, with a 
thickness of about 900 nm. Second, the mechanical properties were obtained from 
nanoindentation experiments, performed after replacing the sphere by a diamond tip, and 
coupled with in-situ topographic imaging procedures to measure the contact area. From the 
indentation experiments, the properties of the films were determined from normal stiffness 
measurements and through the application of a rheological film model. On the unwashed 
specimens, the viscous layer of alkyl phosphate precipitates was detected by the indentation 
tests. It is a very soft layer, mobile under the diamond tip, with a thickness of a few hundred 
of nanometers, which was in good agreement with that of sphere/plane experiments. It was 
also shown that indentation experiments removed this overlayer in the proximity of the tip, 
probably through a shear flow mechanism. This procedure can be compared to a soft 
"mechanical" sweep and the mechanical properties of ZDTP tribofilms after such a cleaning 
were found to be similar to those of solvent washed specimens. The solvent washed 
tribofilms, comprising sulphide and phosphate layers, exhibited an elastoplastic behaviour 
and, during the loading stage of the indentation, the hardness and the Young's modulus of the 
phosphate layer increased from their initial values of about 2 GPa for the hardness and 
between 30 and 40 GPa for the Young's modulus. In particular, the initial hardness of the 
polyphosphate layer at the beginning of the indentation tests was close to the mean applied 
pressure during the films generation. This suggested that the layer accommodated the contact 
pressure in the tribotest or during the loading stage of the indentation, and could thus be 
regarded as a final and local pressure sensor. The characteristics of the full ZDTP films ensure 
gradual changes in mechanical properties between the substrate, bonding layers and outer 
layers with the viscous overlayer serving as the tribofilm's precursor. The properties of these 
layered films can thus adapt to a wide range of imposed conditions and provide appropriate 
level of resistance to contact between the metal surfaces. As the severity of loading increases, 
so too do the resistive forces within the film. This ensures that the shear plane remains located 
inside the ZDTP protective film, which explains the exceptional efficiency of ZDTP films as 
anti-wear films.  
3. Experimental 
3.1. Tribofilms 
The tribofilms were generated at Shell Research and Technology Centre, Thornton, U.K., 
with a reciprocating Amsler machine [17] designed to simulate the contact conditions of the 
cam/follower system in an internal combustion engine valve train. A flat block specimen 
(8 mm x 8 mm size, 4 mm thick) has a reciprocating motion in loaded contact with a rotating 
disc. The block and the disc were made in through-hardened EN31 steel. Special care was 
taken with the roughness of the blocks which were polished until the average roughness was 
Ra = 0.01 µm. The movement of the block was driven by a crank linked to the motion of the 
disc axis through a gearbox. The block motion was approximately sinusoidal and at the same 
frequency as the disc rotation. Load was applied to the contact by a spring arrangement, 
acting through a roller bearing. The surface in contact with the loading bearing (rear surface 
of the reciprocating element) was curved to permit self-alignment between the block and the 
disc. The films were generated at a normal load of 400 N (mean contact pressure of 0.36 
GPa), speed of 600 rev/min., block temperature of approximately 100°C for 5 hours. The 
lubricants consisted of a highly refined base oil with different commercial additives (details of 
the oil formulation are not relevant to the present work): 
- MoDTC solution, 
- ZDTP + MoDTC solution, 
- ZDTP + MoDTC + detergent/dispersant solution ("full formulation"). 
The rubbing area on the polished block was typically 5 mm long in the sliding direction. 
Previous analyses have shown that the composition in the centre of the wear track was 
reasonably uniform, while the composition within 1 mm of the ends of the wear track could 
vary significantly. The mechanical measurements on the films with the Surface Force 
Apparatus have been performed in the central area of the wear track. An additional unworn 
and polished block was used to obtain reference values for the EN31 steel substrate. 
To preserve the film structures, the blocks were stored in the base oil (containing 
predominantly paraffinic hydrocarbons, with very low concentration of polar compounds) 
immediately after production of the films in the reciprocating Amsler tests and they were 
immersed again, when not in use. 
3.2. Surface Force Apparatus 
The Ecole Centrale de Lyon Surface Force Apparatus (SFA) used in these experiments has 
been described in previous publications [18, 19]. The general principle is that a macroscopic 
spherical body or a diamond tip can be moved toward and away from a planar one (the ZDTP 
specimen) using the expansion and the vibration of a piezoelectric crystal, along the three 
directions, Ox, Oy (parallel to the plane surface) and Oz (normal to the plane surface). The 
plane specimen is supported by double cantilever sensors, measuring quasi-static normal and 
tangential forces (respectively Fz and Fx). Each of these is equipped with a capacitive sensor. 
The sensor's high resolution allows a very low compliance to be used for the force 
measurement (up to 2 x 10-6 m/N). Three capacitive sensors were designed to measure relative 
displacements in the three directions between the supports of the two solids, with a resolution 
of 0.01 nm in each direction. Each sensor capacitance was determined by incorporating it in 
an LC oscillator operating in the range 5 - 12 MHz [20]. 
3.3. Tests methodology 
All the experiments were conducted at room temperature. Preliminary results obtained on 
anti-wear films from a ZDTP solutions have shown that n-heptane washing damages the film 
[16]. That is why the blocks were tested first as obtained from the Amsler friction test, 
without any cleaning and second after washing with n-heptane. The unwashed specimens 
were mounted on the SFA as taken from the storage base oil. Excess of base oil was simply 
removed by placing the side of the specimen on absorbing paper, which allowed the surface to 
be always preserved by an oil film (thickness > 10 µm). 
Nanoindentation tests 
The aim of these tests was to determine the elastoplastic properties of the tribofilms (hardness 
and Young's modulus) and their “mechanical” structure (number of layers and estimation of 
the thickness of each layer that constitutes the film). The method used to perform 
nanoindentation experiment with the SFA has already been published in detail [21]. Specific 
procedures have been developed for the characterisation of ZDTP tribofilms and have been 
described in previous papers [16, 22]. In this study, the determination of the near surface 
mechanical properties (first nanometers) was obtained through a specific tip shape calibration, 
performed on a gold film deposited by magnetron sputtering onto a silicon substrate. This 
film was very smooth (peak to valley roughness around 1 nm, measured on a scan length of 
1 µm) and its hardness was constant versus depth from the surface and until the penetration 
depth equals the gold film's thickness [21]. 
For the nanoindentation experiments, a trigonal diamond tip with an angle of 115.12° between 
edges (Berkovitch type) was used. The indentation tests were performed in controlled 
displacement mode. The standard set-up included the continuous quasi-static measurements 
of the resulting normal force Fz versus the normal displacement Z, at a slow penetration 
speed, generally 0.1 to 0.5 nm/s. It also included the simultaneous measurements of the 
rheological behaviour (dissipative and conservative or elastic contributions) of the tested 
surface, thanks to simultaneous small sinusoidal motions at a frequency of 37 Hz, with an 
amplitude of about 0.2 nm RMS. Furthermore, using the Z feedback in the constant force 
mode and the tangential displacement of the indenter, the surface topography was imaged 
before and after the indentation test, with the same diamond tip. This was made practically 
possible because of the partial elastic recovery during the unloading cycle and hence the 
geometry of tip and indent were different which was necessary to permit resolution of the 
indent. For this scanning procedure, a constant normal load of 0.5 µN was typically used. 
Such in-situ imaging procedure enables the operator to choose precisely the location of the 
indentation test on the surface and, after the test, to quantify the plastic pile-up around the 
indent and thus to measure the actual contact area. 
Rheological film model 
The elastic properties of the films were very difficult to extract from the indentation tests 
because of both the influence of the substrate and of the film structure itself. They were 
obtained through the stiffness measurements, which are global (film+substrate) 
measurements. To extract the properties of each layer of the film, a simple model has been 
developed, and its main features are described as follows. The experimental stiffness versus 
normal displacement curve was identified with the elastic response of a structure composed of 
one or two homogeneous elastic layers on a substrate (semi-infinite elastic half space) 
indented by a rigid cylindrical punch of radius a. For such a system, modelled by two springs 
connected in series [23], the calculated global stiffness (Kz) depends on the reduced Young's 
modulus of the substrate (Es* with Es*=Es/(1-νs2)), measured on an unworn steel block, over 
the contact radius (a) and depends also on four unknown parameters which are the reduced 
Young's modulus (Ef*, Ef*=Ef/(1-νf2)) and the thickness (t) of each layer. For each test, their 
values were adjusted to obtain a good fit between the measured stiffness curve and the 
calculated one. This procedure provided the structure (one or two layers), the thickness and 
the reduced Young's modulus of each layer that constituted the tribofilms. Details are given in 
a previous paper [16]. Following this model, the global stiffness of a single layer system is 
given by:  
22K t a
a E aEz f s
π π * *
 (1) 
This simple model describes perfectly the behaviour of model systems such as gold layers on 
a silicon substrate [21]. In the case of tribofilms, deviations may be observed at a critical 
pressure or at a critical depth from which the experimentally measured stiffness may be found 
to exceed significantly the theoretical one. This is interpreted as a change in the surface 
properties due to the applied pressure and appears to be related to a measured hardness 
increase. Indeed, as the applied pressure can reach values much larger than the initial hardness 
value of the surface, the resulting plastic flow may induce a small volume reduction and 
molecular rearrangements which could be sufficient to induce a noticeable change in the 
mechanical properties. From a threshold pressure value, H0, the stiffness curve was then 
influenced both by the substrate's elasticity and by the change in mechanical properties. This 
pressure dependence can be introduced in the model by writing that in the deformed volume 
of material, when H>H0 (i.e. when the film accommodates the applied pressure through 
hardness increase), the film modulus Ef* is proportional to the hardness (the ratio Ef*/H 
remains constant). It gives the following equation: 
EE =  (2) 
Ef0* is the reduced Young's modulus value, when the applied pressure is equal to or lower 
than the threshold pressure H0. When necessary, by introducing this effect in our modelling 
and by adjusting the value of the threshold pressure, we were able to fit correctly the whole 
stiffness curve. An example of such fit is given on figure 1. The evolution of the film modulus 
Ef* versus plastic depth can also be extracted from equation 1 using the experimentally 
measured global (film+substrate) stiffness values Kz and the film's thickness, t, independently 
of equation 2. This permits a check on whether it is proportional to the hardness as assumed in 
equation 2. In the example shown figure 2, the calculated Young's modulus of the film (from 
equation 1 with a film thickness t=25 nm) is found to be proportional to the measured 
hardness with a mean ratio Ef*/H=16.5, in good agreement with the ratio 
Ef0*/H0=17/1.05=16.2 obtained from the stiffness fit. 
Full formulation (ZDTP+MoDTC+detergent/dispersant)
Solvent washed tribofilm
0 10 20 30 40 50
Penetration depth (nm)
Measured stiffness
Calculated stiffness, t=25 nm, Efo*=17 GPa,
without pressure accommodation
Calculated stiffness, t=25 nm, Efo*=17 GPa,
with pressure accommodation, Ho=1.05 GPa
Figure 1: Example of application of the rheological film model: measured and calculated 
global stiffness for a tribofilm obtained from the full formulation (MoDTC + ZDTP + 
detergent/dispersant). A good fit between the measured and the calculated values is obtained 
with a single layer system (thickness t=25 nm and reduced Young's modulus Ef0*=17 GPa) 
and a pressure accommodation effect from a threshold pressure H0=1.05 GPa. 
t = 25 nmEf0* = 17 GPa H0 = 1.05 GPa
Full formulation (ZDTP+MoDTC+detergent/dispersant)
Solvent washed tribofilm
0 10 20 30 40 50 60
Plastic depth (nm)
Reduced Young's modulus of the tribofilm, Ef*
Hardness of the tribofilm, H
t = 25 nmEf0* = 17 GPa H0 = 1.05 GPa
Full formulation (ZDTP+MoDTC+detergent/dispersant)
Solvent washed tribofilm
0 10 20 30 40 50 60
Plastic depth (nm)
Reduced Young's modulus of the tribofilm, Ef*
Hardness of the tribofilm, H
Figure 2: Example of evolution of film's reduced Young's modulus and hardness versus 
plastic depth, for a tribofilm obtained from the "full formulation" (MoDTC + ZDTP + 
detergent/dispersant). The Young's modulus of the film is calculated using equation 1 with the 
measured stiffness values and using only the film's thickness determined from the fit shown 
figure 1 (t = 25 nm). 
Nanofriction experiments 
Nanofriction experiments were conducted on the blocks by moving the diamond tip along Ox 
direction (parallel to the surface) at low speed (2 to 5 nm/s) along a distance of 0.5 µm. The 
objective of these tests was to determine how the friction coefficient varies as a function of 
the applied pressure. The tests were conducted at monitored increasing depth. During the 
tests, the normal, Fz, and the tangential, Fx, forces were recorded, which allowed us to 
calculate the apparent friction coefficient µ=Fx/Fz (see example figure 3). 
Full formulation (ZDTP+MoDTC+detergent/dispersant)
Solvent washed tribofilm
0 20 40 60 80 100 120 140
Time (s)
µ=Fx/Fz
0 5 10 15 20 25
Penetration depth (nm)
Indentation test
Nanofriction test Smaller contact area
Figure 3: Procedure used for the nanofriction tests. The diamond tip is oriented edge first and 
the nanofriction tests are conducted at monitored increasing depth. During the test, the 
normal (Fz) and tangential (Fx) forces are recorded. The friction coefficient µ=Fx/Fz is 
calculated. 
Large pile-up was observed in the case of nanofriction with the diamond tip oriented face 
first, which may induce large uncertainty in the calculation of the contact area. That is why 
the nanofriction tests were conducted edge first. In these conditions, the estimation of the 
applied pressure at a given depth was obtained using low load nanoindentation tests, made in 
the near proximity of the nanofriction tests. Assuming that, at a given depth, the hardness of 
the tribofilm should be the same for the friction test and for the near indentation test, the 
contact area, and then the applied pressure, were obtained from the difference between the 
normal force measured for the two tests at the same depth (see insert on figure 3). Using the 
in-situ imaging procedure, figure 4 shows an example of an image of the surface of a 
tribofilm after such a nanofriction test. 
100 nm
Beginning
of the testBeginning of the wear
Direction of friction
100 nm
Beginning
of the testBeginning of the wear
Direction of friction  
Figure 4: Typical image of the surface 
of a tribofilm after a nanofriction 
experiment. The image is obtained with 
the in-situ imaging procedure. 
4. Results 
The first part presents the mechanical properties of the different tribofilms, determined from 
the nanoindentation experiments. Their structure, one or two layers, and their thickness were 
deduced from the use of our rheological film model. 
The results concerning the frictional behaviour of the tribofilms are given in a second part. 
4.1. Structure and mechanical properties of the tribofilms 
MoDTC tribofilms 
The tribofilm obtained from base oil + MoDTC has been tested without washing and after 
washing with n-heptane. Even on the solvent washed block, it was not possible to make any 
local topographic image nor line scanning preliminary to the indentations tests, revealing that 
the film was very soft and was easily damaged by the diamond tip. Representative hardness 
curves obtained on the MoDTC tribofilms are shown on figure 5. 
MoDTC tribofilm
0 40 80 120 160
Plastic depth (nm)
Unwashed tribofilm
Solvent-washed tribofilm
Figure 5: Typical hardness curves obtained on the MoDTC tribofilms. Open symbols 
correspond to hardness curves obtained on the unwashed film. Black symbols correspond to 
hardness curves obtained on the solvent-washed film. 
Very low mechanical properties were measured on the unwashed MoDTC tribofilm. The 
surface hardness ranged from 0.02 to 0.1 GPa indicating the presence of a very soft overlayer 
covering the tribofilm. 
After washing with n-heptane, the indentation tests showed that this overlayer has been 
removed by the washing procedure. The remaining tribofilm was a soft homogeneous layer, 
whose hardness was typically in the range 0.4 - 0.5 GPa at the beginning of the tests. 
Adhesion to the diamond tip was detected at the end of the unloading part of the tests. The 
film thickness and the structure (number of layers) have been obtained from the stiffness 
measurements performed during the experiments using the rheological film model. 
The film appeared to be homogeneous in its thickness, and for most of the tests, its elastic 
behaviour corresponded to that of a single layer, with constant properties versus depth. The 
thickness of the film was found to be between 30 and 75 nm. The reduced Young's modulus 
was typically equal to 7 – 8 GPa. 
ZDTP + MoDTC tribofilms 
From optical observation, the ZDTP + MoDTC unwashed film was very thin. This was 
confirmed by the indentation tests. Prior to any contact, a very soft layer, 60 to 120 nm thick, 
was detected at the surface of the unwashed film. 
Indentation tests conducted after scanning or imaging the surface of the unwashed film 
("mechanical sweep") showed that the film was spatially heterogeneous. Its thickness and its 
mechanical properties varied depending on the test location: 
- In some places, only a very thin layer (a few nanometers thick) with a reduced Young's 
modulus of 50 GPa covered the work-hardened steel substrate (tests A and B on figure 6). 
- A thicker layer (15 to 30 nm) with a reduced Young's modulus of 50 – 80 GPa was found 
in other places (tests C and D on figure 6), sometimes with accommodation pressure 
effect (threshold pressure H0 = 4.8 GPa). Such layer behaves like the sulphide-oxide layer 
of the ZDTP tribofilm [16]. 
- Elsewhere, the structure of the tribofilm was more complex, with a soft layer covering a 
stiffer one. For example, test E on figure 6 corresponds to a soft layer, 12 nm thick, with 
properties comparable to those of the MoDTC tribofilm (hardness of 0.2 GPa and reduced 
Young's modulus of 5 GPa) which covers a stiffer layer, 18 nm thick, with a reduced 
Young's modulus of 50 GPa. 
This heterogeneity was confirmed by the indentation tests conducted on the solvent-washed 
ZDTP + MoDTC tribofilm, where, at least, three different types of film were identified: 
- In some places, the film behaved like a one layer system, able to accommodate the 
pressure (pressure threshold 2.8 GPa). Its thickness was between 35 nm and 150 nm. The 
surface hardness was about 2 – 3 GPa and the reduced Young's modulus was about 
55 - 65 GPa. 
- In other places, the film behaved like a bilayered structure: a surface layer, about 25 nm 
thick, with properties comparable to those of the MoDTC tribofilm (hardness of 
0.3 - 0.4 GPa, reduced Young's modulus of 8 GPa), covers a stiffer layer, 150 nm thick, 
with a reduced Young's modulus of about 80 GPa. 
- Elsewhere, the surface film was between 3 and 15 nm thick, with properties comparable to 
the lower properties measured on the ZDTP tribofilm (hardness about 1 – 1.5 GPa and 
reduced Young's modulus about 10 GPa). For some tests, this surface film was able to 
accommodate the pressure, with a pressure threshold of 1 – 1.5 GPa. It covers a stiffer 
layer, 10 to 55 nm thick, with a reduced Young's modulus varying from 60 to 110 GPa. 
ZDTP+MoDTC tribofilm
Unwashed block
0 20 40 60 80 100 120
Plastic depth (nm)
Prior to any contact
After imaging - test A
After imaging - test B
After imaging - test C
After imaging - test D
After imaging - test E
Figure 6: Representative hardness curves obtained on the unwashed ZDTP + MoDTC 
tribofilm, prior to any contact and after the imaging procedure. The film is spatially 
heterogeneous in thickness and in mechanical properties. 
ZDTP + MoDTC + detergent/dispersant tribofilms ("full formulation" tribofilms) 
Nanoindentation tests performed in fresh areas, prior to any contact showed that, at the 
surface of the unwashed tribofilm, there was a very soft layer, mobile under the diamond tip, 
with an apparent thickness of a few hundreds of nanometers. 
Representative hardness curves obtained on the unwashed block near these initial contacts are 
shown figure 7. Contrary to the ZDTP + MoDTC tribofilm, the film was found to be spatially 
homogeneous. Only its thickness was found to vary, depending on the tested area. A very thin 
softer layer was detected at the surface of the tribofilm, which did not resist to imaging nor 
scanning, except if the normal load was very low (lower than 0.3 µN). This layer had a 
hardness value (about 0.3 – 0.4 GPa) comparable to the hardness value of the MoDTC 
tribofilm. The observed large hardness increase when the load increased also indicated that 
the tribofilm had a great capability to accommodate the applied pressure. This result was 
confirmed by the interpretation of the stiffness measurements using the rheological model, 
which also showed that the tribofilm had a complex structure. At its surface, there was first a 
layer with a thickness of only a few nanometers (2 nm to 7 nm) and a reduced Young's 
modulus of 10 - 15 GPa. Then, there was a second layer (thickness between 20 nm and 140 
nm) with a higher reduced Young modulus of 65 – 80 GPa. 
A similar tribofilm was tested after n-heptane washing. It also had a great ability to 
accommodate the applied pressure. From the stiffness measurements, on most places, the film 
was found to behave like a film constituted by two layers. The surface layer was thin (5 to 25 
nm) with a reduced Young's modulus value in the range 15 – 20 GPa. The thickness of the 
underlayer was found to vary between 0 (no underlayer, example of figures 1 and 2) and 100 
nanometers and its elastic modulus was in the range 110 - 120 GPa. 
ZDTP + MoDTC + detergent/dispersant
Unwashed tribofilm
0 10 20 30 40 50 60
Plastic depth (nm)
First test, prior to any contact
Without preliminary scanning
After scanning or imaging
Figure 7: Representative hardness curves obtained on the unwashed ZDTP + MoDTC + 
detergent/dispersant tribofilm ("full formulation"), prior to any contact and in the region near 
the first contacts, either without preliminary surface scanning or after scanning/imaging 
procedure. 
Figure 8 compares representative hardness curves for all tested tribofilms. For the ZDTP + 
MoDTC tribofilms, three curves are plotted because of the variety of obtained results 
revealing the spatial heterogeneity of this tribofilm. A representative hardness curve for the 
ZDTP tribofilm tested in the same conditions in a previous study [16] has been added for 
comparison. 
0 5 10 15 20 25 30 35 40
Total penetration depth (nm)
ZDTP, solvent washed MoDTC, solvent washed
ZDTP + MoDTC, unwashed (2 tests) ZDTP + MoDTC, solvent washed
Full formulation, unwashed Full formulation, solvent washed
Figure 8: Comparison of the hardness curves obtained on the different tribofilms. The 
hardness curve obtained for a ZDTP anti-wear tribofilm obtained from a previous study is 
plotted for comparison. 
4.2.  Nanofriction experiments 
Nanofriction experiments were conducted on the three preceding tribofilms and also on a 
ZDTP tribofilm and on a ZDTP + detergent/dispersant tribofilm. In order to simplify the 
following graphs, only one representative curve was plotted for each tribofilm (or two when it 
was necessary to illustrate the dispersion when it was significant). 
Figure 9 shows the evolution of the friction force versus the normal force for the tested 
tribofilms. For a given formulation, there was very little difference between the results 
obtained on unwashed and on solvent washed tribofilms at low load, indicating that the 
solvent washing does not seem to affect the frictional behaviour of the tribofilm. This agrees 
with the idea that the soft viscous overlayer is supposed to serve as precursor for the tribofilm 
rather than that it plays a mechanical role during friction. 
0 3 6 9 12 15
Normal force, Fz (µN)
ZDTP, solvent washed ZDTP + Det/Disp, solvent washed
MoDTC, solvent washed ZDTP + MoDTC, solvent washed
ZDTP + MoDTC, solvent washed ZDTP + MoDTC, unwashed
Full formulation, unwashed Full formulation, solvent washed
Figure 9: Friction force (Fx) versus normal force (Fz) during nanofriction tests with 
increasing penetration depth for different tribofilms. 
It is also worth noting that the heterogeneity in mechanical properties found on the ZDTP + 
MoDTC tribofilm also exists in the frictional properties. For this tribofilm, the friction force 
at low normal loads may be comparable either to the friction force obtained for the ZDTP 
tribofilm or to the friction force obtained for the "full formulation" tribofilm. 
Under the present testing conditions, it can be observed that the lower friction forces were 
obtained for films containing MoDTC together with ZDTP. The higher were obtained for the 
tribofilm from MoDTC alone. 
Figure 10 shows the evolution of the friction coefficient versus mean pressure. The existence 
of low friction coefficient values (0.01<µ<0.05) appears to be related both to the presence of 
MoDTC additive in the initial lubricant and to the ability for the tribofilm to reach sufficiently 
high pressure values (1.5 – 3 GPa) during the friction test. Thus, the MoDTC tribofilm, which 
is not able to resist to the contact pressure by increasing its mechanical properties seems to be 
ineffective in reducing friction, contrary to the tribofilms containing ZDTP and MoDTC 
together, which are able to accommodate the contact pressure by increasing their mechanical 
properties. Nevertheless, both behaviours (high or low friction) were observed for the ZDTP 
+ MoDTC tribofilms. This is certainly due to the spatial heterogeneity of these tribofilms, 
which behave on some places like ZDTP tribofilms, or elsewhere like "full formulation" 
tribofilms. It was also observed that tribofilms formed without MoDTC were ineffective in 
reducing friction even if high contact pressures were reached during the friction tests. 
0 1 2 3 4 5 6
Mean pressure P (GPa)
ZDTP, solvent washed ZDTP + Det/Disp, solvent washed
MoDTC, solvent washed ZDTP + MoDTC, solvent washed
ZDTP + MoDTC, solvent washed ZDTP + MoDTC, unwashed
Full formulation, unwashed Full formulation, solvent washed
Figure 10: Apparent friction coefficient versus mean pressure for the different tested 
tribofilms. 
When the evolution of the friction coefficient is plotted versus penetration depth (figure 11), it 
appears that, when it existed, the low friction coefficient domain was detected a few 
nanometers below the surface of the tribofilm. It also shows that, for the full formulation, the 
low friction domain was deeper for the unwashed tribofilm than for the solvent washed one. 
The unwashed tribofilm appears to be covered by a surface layer with rather bad frictional 
properties, which can be removed by solvent washing or by "mechanical" sweep (low load 
scanning procedures for example). 
0 2 4 6 8 10 12 14 16 18 20
Penetration depth (nm)
ZDTP, solvent washed ZDTP + Det/Disp, solvent washed
MoDTC, solvent washed ZDTP + MoDTC, solvent washed
ZDTP + MoDTC, solvent washed ZDTP + MoDTC, unwashed
Full formulation, unwashed Full formulation, solvent washed
Figure 11: Apparent friction coefficient versus penetration depth for the different tested 
tribofilms. 
5. Discussion 
Because of the inhomogeneous and patchy nature of anti-wear tribofilms and of their low 
thickness, very few results are published concerning their mechanical properties [24-28]. 
Moreover, the differences in sample preparation and the diversity of used techniques and 
experimental procedures render delicate the comparison of the obtained results. For example, 
the Young’s modulus values given by Aktary et al. for a ZDTP tribofilm [28] are significantly 
higher that those we measured but one explanation can be that they did not take into account 
the substrate’s elasticity in their calculations, contrary to what is done in the current study. Or 
if we attempt to compare our results with those recently published by Ye et al. on ZDTP and 
ZDTP + MoDTC tribofilms [14, 15], this reveals significant differences. For example, Ye et 
al. found that both tribofilms possess the same hardness and modulus depth distributions, 
corresponding to continuously and functionally graded materials, when in the present work, 
the hardness curves for similar tribofilms did not coincide and the use of our rheological film 
model allowed us to describe the tribofilms as layered materials with properties adaptable to 
contact conditions. The hardness and modulus values, respectively 10 GPa at a contact depth 
of 30 nm and 215 GPa at a depth of 20 nm, that they reported are also significantly higher 
than those we measured and also higher than those given by Aktary et al. This could be due to 
differences in sample preparation and also certainly to the use of different methods and 
assumptions for the treatment of the nanoindentation data. 
Concerning the frictional behaviour of the tribofilms, the presented nanofriction tests were 
conducted in unlubricated conditions, at very low speed (2 to 5 nm/s) and the measured 
nanofriction coefficients corresponded to the friction between the diamond tip and the 
tribofilm (over its steel substrate). That is why it also seems difficult to compare our values to 
macroscopic friction coefficient values obtained on classical tribometers. The latter are 
representative of steel on steel contact in the presence of a tribofilm and are averaged over the 
whole contact surface. However, our local values are not far from the end of test Amsler 
macroscopic friction coefficient values published by Pidduck and Smith [25] for ZDTP, 
ZDTP + detergent/dispersant and ZDTP + friction modifier tribofilms. Moreover, these 
macroscopic values were found to be proportional, with a factor 0.7, to micro-friction 
coefficient values measured with Lateral Force Microscopy by the same authors, making 
them suggest that there may be a link between macro and micro-frictional behaviour of 
smooth regions of anti-wear tribofilms. Unfortunately, no tribofilm obtained from friction 
modifier alone were tested in this study, with which we could compare our results. 
Nevertheless, macroscopic friction coefficient values, in the range 0.10 – 0.14, measured on 
an alternative ball on plane tribometer were reported by Muraki and Wada [6] for oil 
containing MoDTC alone. They conclude that such lubricant was ineffective in reducing 
friction, contrary to the oil containing MoDTC together with ZDTP. More recently, similar 
high macroscopic friction coefficient values (in the range 0.095 – 0.2) were measured by 
Unnikrishnan et al. for oil containing MoDTC alone [29]. On the other hand, Grossiord et al. 
reported very low steady-state friction coefficient (0.04) measured for base oil + MoDTC 
during SRV friction tests, and a lower steady-state value (0.02) for friction tests in a UHV 
tribometer, carried out by sliding a macroscopic hemispherical steel pin again a flat covered 
by a MoDTC tribofilm [13]. From tests carried out in a high frequency reciprocating rig, 
Graham et al. [30] also reported that, in the absence of ZDTP, MoTDC additives were 
effective in reducing friction at a combination of high additive concentration and high 
temperature (up to 0.4% wt. and 200°C). Such diversity of results, certainly partly due to the 
various tests conditions, makes unreasonable a comparison between the very high 
nanofriction coefficient measured on the MoDTC tribofilm under the present testing 
conditions and those published values. As, regarding the literature, the formation of MoS2 was 
well established for MoDTC containing lubricants, the question is how can we explain such 
high friction coefficient during the nanofriction tests ? Or what caused the very low friction 
observed when ZDTP was used together with MoDTC ? From figure 10, the low friction 
coefficient values (0.01<µ<0.05) were observed for the MoDTC containing lubricants when 
the contact pressure was in the range 1.5 – 3 GPa (the question of the spatial heterogeneity of 
the ZDTP + MoDTC tribofilm will be discussed latter). These high pressures were measured 
for tribofilms able to increase their mechanical properties, thus accommodating the contact 
conditions, which was demonstrated to be the case for ZDTP anti-wear tribofilms [16]. On the 
other hand, high pressures were not reached for the soft MoDTC tribofilm. Thus, the easy 
sliding of the MoS2 sheets could result from a favourable orientation induced by sufficiently 
high contact pressure values. The ability of MoS2 sheets to orient in a favourable direction 
was reported by Grossiord et al. [31] and Martin et al. [32], who recently investigated 
tribochemical interactions between ZDTP, MoDTC and OCB (overbased detergent calcium 
borate) additives. Using high resolution TEM observations of wear debris, coupled with wear 
scar micro-spot XPS analysis, they observed perfectly oriented MoS2 sheets, with their basal 
plane parallel to the flaky wear fragments. Such "mechanical" interpretation of the role of the 
contact pressure agrees with previous work of Muraki et al. who studied the effect of roller 
hardness on the rolling sliding characteristics of MoDTC in the presence of ZDTP and 
concluded that the friction reduction effect increased with higher degree of roller hardness 
[10]. Yamamoto also reported that a necessary condition for improving the friction and wear 
characteristics of a lubricant was the formation of surface films composed of iron phosphates 
with high hardness and Mo-S compounds [11]. Concerning the spatial heterogeneity of the 
ZDTP + MoDTC tribofilms, it can be worth noting that using high resolution TEM 
observations of wear debris collected after friction tests, coupled with AES and XPS studies 
of rubbing surfaces, Grossiord et al. described the ZDTP + MoDTC tribofilm as being 
composed of a mixture of glassy zinc phosphate zones containing molybdenum, and carbon-
rich zones containing zinc and highly-dispersed MoS2 single sheets [13, 33]. 
The observation that, during the nanofriction tests, the low friction domain was located a few 
nanometers below the surface also corroborates this interpretation. As the nanofriction tests 
were conducted at increasing depth, the sufficiently high pressures were obtained after a few 
nanometers penetration depth inside the MoS2 containing layer (with properties similar to the 
MoDTC tribofilm), thanks to the presence of the underneath resisting anti-wear layer, whose 
characteristics are similar to those of the phosphate layer of the ZDTP tribofilm. 
Finally, combining the results obtained from the nanoindentation and nanofriction 
experiments, we can propose a possible schematic description of the anti-wear tribofilms 
obtained from the "full formulation" oil. Some assumptions are also made on what happened 
during nanofriction tests on such tribofilms (see figure 12 on which for convenient drawing, 
as the Berkovitch diamond tip is not sharp, it was represented by a flat punch). 
A soft layer containing non-oriented MoS2 sheets is present at the surface of the tribofilm 
(layer (a) in figure 12). This layer, 0 to 25 nm thick, has mechanical properties comparable 
with those of the MoDTC tribofilm (0.3 – 0.5 GPa for the hardness and 3 – 10 GPa for the 
reduced Young's modulus). Its friction coefficient is rather high. This layer is easily damaged 
or removed by the diamond tip during imaging or line-scanning procedures. When the contact 
pressure is sufficiently high, friction induces a favourable orientation of the MoS2 sheets, over 
a thickness of 1 or 2 nanometers (layer (b) in figure 12), resulting in very low friction 
coefficient values which combine with the anti-wear efficiency of the tribofilm. Under this 
layer, there is then an anti-wear layer (layer (c) in figure 12), with properties similar to those 
of the polyphosphate layer of the ZDTP tribofilm. Then, just over the substrate (noted (e)in 
figure 12), there is a bonding layer (layer (d) in figure 12) with high mechanical properties 
(oxides, sulfides). 
Figure 12: Possible schematic description of the anti-wear tribofilm obtained from the "full 
formulation" and orientation of the MoS2 planes of the outer layer resulting from a 
nanofriction tests (for convenient drawing, as the Berkovitch diamond tip is not sharp, it was 
represented by a flat punch). The thickness of each layer is arbitrary drawn as it varies 
significantly depending on the tested area (from zero when the layer is not present to a few 
tens of nanometers). 
(a) Soft layer containing non-oriented MoS2 sheets, with mechanical properties comparable 
to those of the MoDTC tribofilm, 
(b) Layer of favourably frictionally oriented MoS2 sheets with a typical thickness of 1 or 
2 nm, 
(c) Layer with properties similar to those of the polyphosphate layer of the ZDTP tribofilm, 
(d) Bonding layer with high mechanical properties (oxides, sulfides), 
(e) Steel substrate. 
6.  Conclusions 
Thanks to the combined used of (i) nanoindentation experiments with continuous stiffness 
measurements coupled with imaging procedures, (ii) a specifically developed rheological film 
model and (iii) nanofriction tests, synergistic effects of ZDTP and MoDTC on frictional 
behaviour of anti-wear tribofilms have been evidenced from mechanical considerations. One 
original feature of this study lies in the characterisation of unwashed anti-wear tribofilms with 
their full structure preserved. 
The structure and nanomechanical properties (hardness and reduced Young's modulus) of 
tribofilms formed with different mixtures of additives (ZDTP, MoDTC, detergent/dispersant) 
were first determined. 
Concerning the occurrence of very low friction (0.01<µ<0.05), the contact pressure was found 
to be a critical parameter. The low friction coefficient values were attributed to a favourable 
orientation of MoS2 sheets present in the outer layer of the tribofilms formed from MoDTC 
containing lubricants. Such a favourable orientation occurred only if sufficiently high contact 
pressure was reached. These high contact pressures were attained when ZDTP was used as oil 
additive together with MoDTC because one of the main characteristics of ZDTP additives is 
to form protective anti-wear tribofilms under boundary lubrication, with varying structure and 
properties with depth, among which is an amazing ability to increase their mechanical 
properties, thus accommodating the contact conditions. 
A possible schematic description of the tribofilms containing both ZDTP and MoDTC was 
deduced and a mechanism was proposed to account for the mechanical synergy that occurs 
during nanofriction tests on such tribofilms. 
Aknowledgement 
The authors thank Shell Research Limited for financial support and permission to publish. 
References 
[1] F. G. Rounds, ASLE Transactions, 24 (4) (1980) 431-440. 
[2] M. Muraki and H. Wada, Tribology International, 35 (2002) 857-863. 
[3] Z. Yin, M. Kasrai, M. Fuller, G. M. Bancroft, K. Fyfe and K. H. Tan, Wear, 202 (1997) 
172-191. 
[4] Z. Yin, M. Kasrai, G. M. Bancroft, K. Fyfe, M. L. Colaianni and K. H. Tan, Wear, 202 
(1997) 192-201. 
[5] P. A. Willermet, D. P. Dailey, R. O. Carter III, P. J. Schmitz, W. Zhu, J. C. Bell and D. 
Park, Tribology International, 28 (3) (1995) 163-175. 
[6] M. Muraki and H. Wada, in Lubricants and Lubrication - Proceedings of Leeds-Lyon 21, 
D. Dowson et al. , Elsevier, Tribology Series, 30, (1995) 409-422. 
[7] A. K. Misra, A. K. Mehrotra and R. D. Srivastava, Wear, 31 (2) (1975) 345-357. 
[8] M. D. Johnson, R. K. Jensen and S. Korcek, SAE Technical Paper Series, Engine Oil 
Rheology and Tribology (SP-1303) - n°972860, (1997) 37-47. 
[9] P. A. Willermet, Tribology Letters, 5 (1998) 41-47. 
[10] M. Muraki, Y. Yanagi and K. Sakaguchi, Japanese Journal of Tribology, 40 (2) (1995) 
41-51. 
[11] Y. Yamamoto, S. Gondo, T. Kamakura and N. Tanaka, Wear, 112 (1986) 79-87. 
[12] M. Muraki, Y. Yanagi and K. Sakaguchi, Tribology International, 30 (1) (1997) 69-75. 
[13] C. Grossiord, K. Varlot, J. M. Martin, T. Le Mogne, C. Esnouf and K. Inoue, Tribology 
International, 31 (12) (1998) 737-743. 
[14] J. Ye, M. Kano and Y. Yasuda, Tribology Letters, 13 (1) (2002) 41-47. 
[15] J. Ye, M. Kano and Y. Yasuda, Journal of Applied Physics, 93 (9) (2003) 5113-5117. 
[16] S. Bec, A. Tonck, J. M. Georges, R. C. Coy, J. C. Bell and G. W. Roper, Proc. R. Soc. 
Lond. A, 455 (1999) 4181-4203. 
[17] G. W. Roper and J. C. Bell, Society of Automotive Engineers Fuels and Lubricants 
Meeting and Exposition, Toronto, Canada, Paper SAE 952473, (1995)  
[18] A. Tonck, J. M. Georges and J. L. Loubet, J. Colloid Interface Sci., 126 (1988) 150-163. 
[19] A. Tonck, S. Bec, D. Mazuyer, J. M. Georges and A. A. Lubrecht, Journal of 
Engineering Tribology - Proc Instn Mech Engrs Part J, 213 (J5) (1999) 353-361. 
[20] J. M. Georges, A. Tonck, D. Mazuyer, E. Georges, J. L. Loubet and F. Sidoroff, J. Phys. 
II France, 6 (1996) 57-76. 
[21] S. Bec, A. Tonck, J. M. Georges, E. Georges and J. L. Loubet, Philosophical Magazine 
A, Revue CL, 74, (5), (1996) 1061-1072. 
[22] A. Tonck, S. Bec, J. M. Georges, J. C. Bell, R. C. Coy and G. W. Roper, in Lubrication 
at the Frontier : The Role of the Interface and Surface Layers in the Thin Films and Boundary 
Regimes, Proceedings of Leeds-Lyon 25, D. Dowson et al. , Elsevier, Tribology Series, 36, 
Amsterdam, The Netherlands, (1999) 39-47. 
[23] S. Bec and A. Tonck, in The Third Body Concept : Interpretation of Tribological 
Phenomena, Proceedings of Leeds-Lyon 22, G. Dalmaz, D. Dowsen, C.M. Taylor and T.H.C 
Childs, Elsevier, Tribology Series, 31, Amsterdam, the Netherlands, (1996) 173-184. 
[24] P. A. Willermet, R. O. Carter III, P. J. Schmitz, M. Everson, D. J. Scholl and W. H. 
Weber, Lubrication Sci., 9 (4) (1997) 325-348. 
[25] A. J. Pidduck and S. G.C., Wear, 212 (1997) 254-264. 
[26] O. L. Warren, J. F. Graham, P. R. Norton, J. E. Houston and T. A. Michalske, Tribology 
Letters, 4 (1998) 189-198. 
[27] J. F. Graham, C. McCague and P. R. Norton, Tribology Letters, 6 (1999) 149-157. 
[28] M. Aktary, M. T. McDermott and G. A. McAlpine, Tribology Letters, 12 (3) (2002) 155-
162. 
[29] R. Unnikrishnan, M. C. Jain, A. K. Harinarayan and A. K. Mehta, Wear, 252 (2002) 240-
249. 
[30] J. F. Graham, H. A. Spikes and S. Korcek, Tribology Transactions, 44 (4) (2001) 626-
636. 
[31] C. Grossiord, J. M. Martin, K. Varlot, B. Vacher and T. Le Mogne, Tribology Letters, 8 
(4) (2000) 203-212. 
[32] J. M. Martin, C. Grossiord, K. Varlot, B. Vacher, T. Le Mogne and Y. Yamada, 
Lubrication Science, 15 (2) (2003) 119-132. 
[33] C. Grossiord, J. M. Martin, T. Le Mogne, K. Inoue and J. Igarashi, Journal of Vacuum 
Science and Technology A, 17 (3) (1999) 884-890.
ABSTRACT
  The layered structure and the rheological properties of anti-wear films,
generated in a rolling/sliding contact from lubricants containing zinc
dialkyldithiophosphate (ZDTP) and/or molybdenum dialkyldithiocarbamate (MoDTC)
additives, have been studied by dynamic nanoindentation experiments coupled
with a simple modelling of the stiffness measurements. Local nano-friction
experiments were conducted with the same device in order to determine the
evolution of the friction coefficient as a function of the applied pressure for
the different lubricant formulations. For the MoDTC film, the applied pressure
in the friction test remains low (<0.5 GPa) and the apparent friction
coefficient is high ($\mu$ > 0.4). For the tribofilms containing MoDTC together
with ZDTP, which permits the applied pressure to increase up to a few GPa
through some accommodation process, a very low friction domain appears (0.01 <
$\mu$ < 0.05), located a few nanometers below the surface of the tribofilm.
This low friction coefficient is attributed to the presence of MoS2 planes
sliding over each other in a favourable configuration obtained when the
pressure is sufficiently high, which is made possible by the presence of ZDTP.

<|endoftext|><|startoftext|>
Lattice Boltzmann inverse kinetic approach for the incompressible Navier-Stokes
equations
Enrico Fonda1,Massimo Tessarotto1,2 and Marco Ellero3
1Dipartimento di Matematica e Informatica,
Università di Trieste, Italy
2Consorzio di Magnetofluidodinamica, Trieste, Italy
3Institute of Aerodynamics,
Technical University of Munich, Munich, Germany
(Dated: August 18, 2021)
In spite of the large number of papers appeared in the past which are devoted to the lattice
Boltzmann (LB) methods, basic aspects of the theory still remain unchallenged. An unsolved theo-
retical issue is related to the construction of a discrete kinetic theory which yields exactly the fluid
equations, i.e., is non-asymptotic (here denoted as LB inverse kinetic theory). The purpose of this
paper is theoretical and aims at developing an inverse kinetic approach of this type. In principle
infinite solutions exist to this problem but the freedom can be exploited in order to meet important
requirements. In particular, the discrete kinetic theory can be defined so that it yields exactly the
fluid equation also for arbitrary non-equilibrium (but suitably smooth) kinetic distribution func-
tions and arbitrarily close to the boundary of the fluid domain. This includes the specification
of the kinetic initial and boundary conditions which are consistent with the initial and boundary
conditions prescribed for the fluid fields. Other basic features are the arbitrariness of the ”equi-
librium” distribution function and the condition of positivity imposed on the kinetic distribution
function. The latter can be achieved by imposing a suitable entropic principle, realized by means of
a constant H-theorem. Unlike previous entropic LB methods the theorem can be obtained without
functional constraints on the class of the initial distribution functions. As a basic consequence, the
choice of the the entropy functional remains essentially arbitrary so that it can be identified with
the Gibbs-Shannon entropy. Remarkably, this property is not affected by the particular choice of
the kinetic equilibrium (to be assumed in all cases strictly positive). Hence, it applies also in the
case of polynomial equilibria, usually adopted in customary LB approaches. We provide different
possible realizations of the theory and asymptotic approximations which permit to determine the
fluid equations with prescribed accuracy. As a result, asymptotic accuracy estimates of customary
LB approaches and comparisons with the Chorin artificial compressibility method are discussed.
PACS numbers: 47.27.Ak, 47.27.eb, 47.27.ed
1 - INTRODUCTION - INVERSE KINETIC
THEORIES
Basic issues concerning the foundations classical hy-
drodynamics still remain unanswered. A remarkable as-
pect is related the construction of inverse kinetic theo-
ries (IKT) for hydrodynamic equations in which the fluid
fields are identified with suitable moments of an appropri-
ate kinetic probability distribution. The topic has been
the subject of theoretical investigations both regarding
the incompressible Navier-Stokes (NS) equations (INSE)
[1, 2, 3, 4, 5, 6] and the quantum hydrodynamic equations
associated to the Schrödinger equation [7]. The impor-
tance of the IKT-approach for classical hydrodynamics
goes beyond the academic interest. In fact, INSE rep-
resent a mixture of hyperbolic and elliptic pde’s, which
are extremely hard to study both analytically and nu-
merically. As such, their investigation represents a chal-
lenge both for mathematical analysis and for computa-
tional fluid dynamics. The discovery of IKT [1] provides,
however, a new starting point for the theoretical and nu-
merical investigation of INSE. In fact, an inverse kinetic
theory yields, by definition, an exact solver for the fluid
equations : all the fluid fields, including the fluid pres-
sure p(r, t), are uniquely prescribed in terms of suitable
momenta of the kinetic distribution function, solution
of the kinetic equation. In the case of INSE this per-
mits, in principle, to determine the evolution of the fluid
fields without solving explicitly the Navier-Stokes equa-
tion, nor the Poisson equations for the fluid pressure [6].
Previous IKT approaches [2, 3, 4, 5, 7] have been based
on continuous phase-space models. However, the inter-
esting question arises whether similar concepts can be
adopted also to the development of discrete inverse ki-
netic theories based on the lattice Boltzmann (LB) the-
ory. The goal of this investigation is to propose a novel
LB theory for INSE, based on the development of an IKT
with discrete velocities, here denoted as lattice Boltzmann
inverse kinetic theory (LB-IKT). In this paper we intend
to analyze the theoretical foundations and basic proper-
ties of the new approach useful to display its relation-
ship with previous CFD and lattice Boltzmann methods
(LBM) for incompressible isothermal fluids. In particu-
lar, we wish to prove that it delivers an inverse kinetic
http://arxiv.org/abs/0704.0339v1
theory, i.e., that it realizes an exact Navier-Stokes and
Poisson solver.
1a - Motivations: difficulties with LBM’s
Despite the significant number of theoretical and nu-
merical papers appeared in the literature in the last few
years, the lattice Boltzmann method [8, 9, 10, 11, 12,
13, 14] - among many others available in CFD - is prob-
ably the one for which a complete understanding is not
yet available. Although originated as an extension of
the lattice gas automaton [15, 16] or a special discrete
form of the Boltzmann equation [17], several aspects re-
garding the very foundation of LB theory still remain to
be clarified. Consequently, also the comparisons and ex-
act relationship between the various lattice Boltzmann
methods (LBM) and other CFD methods are made dif-
ficult or, at least, not yet well understood. Needless to
say, these comparisons are essential to assess the relative
value (based on the characteristic computational com-
plexity, accuracy and stability) of LBM and other CFD
methods. In particular the relative performance of the
numerical methods depend strongly on the characteris-
tic spatial and time discretization scales, i.e., the minimal
spatial and time scale lengths required by each numerical
method to achieve a prescribed accuracy. On the other
hand, most of the existing knowledge of the LBM’s prop-
erties originates from numerical benchmarks (see for ex-
ample [18, 19, 20]). Although these studies have demon-
strated the LBM’s accuracy in simulating fluid flows, few
comparisons are available on the relative computational
efficiency of the LBM and other CFD methods [17, 21].
The main reason [of these difficulties] is probably because
current LBM’s, rather than being exact Navier-Stokes
solvers, are at most asymptotic ones (asymptotic LBM’s),
i.e., they depend on one or more infinitesimal parame-
ters and recover INSE only in an approximate asymptotic
sense.
The motivations of this work are related to some of
the basic features of customary LB theory representing,
at the same time, assets and weaknesses. One of the
main reasons of the popularity of the LB approach lays
in its simplicity and in the fact that it provides an ap-
proximate Poisson solver, i.e., it permits to advance in
time the fluid fields without explicitly solving numeri-
cally the Poisson equation for the fluid pressure. How-
ever customary LB approaches can yield, at most, only
asymptotic approximations for the fluid fields. This is
because of two different reasons. The first one is the dif-
ficulty in the precise definition of the kinetic boundary
conditions in customary LBM’s, since sufficiently close to
the boundary the form of the distribution function pre-
scribed by the boundary conditions is not generally con-
sistent with hydrodynamic equations. The second reason
is that the kinetic description adopted implies either the
introduction of weak compressibility [8, 9, 11, 12, 13, 14]
or temperature [22] effects of the fluid or some sort of
state equation for the fluid pressure [23]. These assump-
tions, although physically plausible, appear unacceptable
from the mathematical viewpoint since they represent a
breaking of the exact fluid equations.
Moreover, in the case of very small fluid viscosity
customary LBM’s may become inefficient as a conse-
quence of the low-order approximations usually adopted
and the possible presence of the numerical instabilities
mentioned above. These accuracy limitations at low vis-
cosities can usually be overcome only by imposing severe
grid refinements and strong reductions of the size of the
time step. This has the inevitable consequence of rais-
ing significantly the level of computational complexity
in customary LBM’s (potentially much higher than that
of so-called direct solution methods), which makes them
inefficient or even potentially unsuitable for large-scale
simulations in fluids.
A fundamental issue is, therefore, related to the con-
struction of more accurate, or higher-order, LBM’s, ap-
plicable for arbitrary values of the relevant physical
(and asymptotic) parameters. However, the route which
should permit to determine them is still uncertain, since
the very existence of an underlying exact (and non-
asymptotic) discrete kinetic theory, analogous to the con-
tinuous inverse kinetic theory [2, 3], is not yet known.
According to some authors [24, 25, 26] this should be
linked to the discretization of the Boltzmann equation, or
to the possible introduction of weakly compressible and
thermal flow models. However, the first approach is not
only extremely hard to implement [27], since it is based
on the adoption of higher-order Gauss-Hermite quadra-
tures (linked to the discretization of the Boltzmann equa-
tion), but its truncations yield at most asymptotic the-
ories. Other approaches, which are based on ’ad hoc’
modifications of the fluid equations (for example, intro-
ducing compressibility and/or temperature effects [28]),
by definition cannot provide exact Navier-Stokes solvers.
Another critical issue is related to the numerical sta-
bility of LBM’s [29], usually attributed to the violation of
the condition of strict positivity (realizability condition)
for the kinetic distribution function [29, 30]. Therefore,
according to this viewpoint, a stability criterion should
be achieved by imposing the existence of an H-theorem
(for a review see [31]). In an effort to improve the ef-
ficiency of LBM numerical implementations and to cure
these instabilities, there has been recently a renewed in-
terest in the LB theory. Several approaches have been
proposed. The first one involves the adoption of entropic
LBM’s (ELBM [30, 32, 33, 34] in which the equilibrium
distribution satisfies also a maximum principle, defined
with respect to a suitably defined entropy functional.
However, usually these methods lead to non-polynomial
equilibrium distribution functions which potentially re-
sult in higher computational complexity [35] and less nu-
merical accuracy[36]. Other approaches rely on the adop-
tion of multiple relaxation times [37, 38]. However the
efficiency, of these methods is still in doubt. Therefore,
the search for new [LB] models, overcoming these limita-
tions, remains an important unsolved task.
1b - Goals of the investigation
The aim of this work is the development of an inverse
kinetic theory for the incompressible Navier-Stokes equa-
tions (INSE) which, besides realizing an exact Navier-
Stokes (and Poisson) solver, overcomes some of the lim-
itations of previous LBM’s. Unlike Refs. [2, 3], where a
continuous IKT was considered, here we construct a dis-
crete theory based on the LB velocity-space discretiza-
tion. In such a type of approach, the kinetic description
is realized by a finite number of discrete distribution func-
tions fi(r, t), for i = 0, k, each associated to a prescribed
discrete constant velocity ai and defined everywhere in
the existence domain of the fluid fields (the open set Ω×I
). The configuration space Ω is a bounded subset of the
Euclidean space R3and the time interval I is a subset of
R. The kinetic theory is obtained as in [2, 3] by introduc-
ing an inverse kinetic equation (LB-IKE) which advances
in time the distribution function and by properly defin-
ing a correspondence principle, relating a set of velocity
momenta with the relevant fluid fields.
To achieve an IKT for INSE, however, also a proper
treatment of the initial and boundary conditions, to be
satisfied by the kinetic distribution function, must be in-
cluded. In both cases, it is proven that they can be de-
fined to be exactly consistent - at the same time - both
with the hydrodynamic equations (which must hold also
arbitrarily close to the boundary of the fluid domain) and
with the prescription of the initial and Dirichlet bound-
ary conditions set for the fluid fields. Remarkably, both
the choice of the initial and equilibrium kinetic distri-
bution functions and their functional class remain essen-
tially arbitrary. In other words, provided suitable min-
imal smoothness conditions are met by the kinetic dis-
tributions function, for arbitrary initial and boundary
kinetic distribution functions, the relevant moment equa-
tions of the kinetic equation coincide identically with the
relevant fluid equations. This includes the possibility
of defining a LB-IKT in which the kinetic distribution
function is not necessarily a Galilean invariant.
This arbitrariness is reflected also in the choice of pos-
sible ”equilibrium” distribution functions, which remain
essentially free in our theory, and can be made for exam-
ple in order to achieve minimal algorithmic complexity.
A possible solution corresponds to assume polynomial-
type kinetic equilibria, as in the traditional asymptotic
LBM’s. These kinetic equilibria are well-known to be
non-Galilean invariant with respect to arbitrary finite
velocity translations. Nevertheless, as discussed in detail
in Sec.4, Subsection 4A, although the adoption of Galilei
invariant kinetic distributions is in possible, this choice
does not represent an obstacle for the formulation of a
LB-IKT. Actually Galilean invariance need to be fulfilled
only by the fluid equations. The same invariance prop-
erty must be fulfilled only by the moment equations of
the LB-IKT and not necessarily by the whole LB inverse
kinetic equation (LB-IKE).
Another significant development of the theory is the
formal introduction of an entropic principle, realized by
a constant H-theorem, in order to assure the strict pos-
itivity of the kinetic distribution function in the whole
existence domain Ω× I. The present entropic principle
departs significantly from the literature. Unlike previ-
ous entropic LBM’s it is obtained without imposing any
functional constraints on the class of the initial kinetic
distribution functions. Namely without demanding the
validity of a principle of entropy maximization (PEM,
[39]) in a true functional sense on the form of the distri-
bution function. Rather, it follows imposing a constraint
only on a suitable set of extended fluid fields, in particu-
lar the kinetic pressure p1(r, t).The latter is uniquely re-
lated to the actual fluid pressure p(r, t) via the equation
p1(r, t) = p(r, t) + Po(t), with Po(t) > 0 to be denoted
as pseudo-pressure. The constant H-theorem is therefore
obtained by suitably prescribing the function Po(t) and
implies the strict positivity. The same prescription as-
sures that the entropy results maximal with respect in the
class of the admissible kinetic pressures, i.e., it satisfies a
principle of entropy maximization. Remarkably, since
this property is not affected by the particular choice of
the kinetic equilibrium, the H-theorem applies also in the
case of polynomial equilibria. We stress that the choice
of the entropy functional remains essentially arbitrary,
since no actual physical interpretation can be attached to
it. For example, without loss of generality it can always
be identified with the Gibbs-Shannon entropy. Even pre-
scribing these additional properties, in principle infinite
solutions exist to the problem. Hence, the freedom can
be exploited to satisfy further requirements (for example,
mathematical simplicity, minimal algorithmic complex-
ity, etc.). Different possible realizations of the theory and
comparisons with other CFD approaches are considered.
The formulation of the inverse kinetic theory is also use-
ful in order to determine the precise relationship between
the LBM’s and previous CFD schemes and in particular
to obtain possible improved asymptotic LBM’s with pre-
scribed accuracy. As an application, we intend to con-
struct asymptotic models which satisfy with prescribed
accuracy the required fluid equations [INSE] and possi-
bly extend also the range of validity of traditional LBM’s.
In particular, this permits to obtain asymptotic accuracy
estimates of customary LB approaches. The scheme of
presentation is as follows. In Sec.2 the INSE problem
is recalled and the definition of the extended fluid fields
{V, p1} is presented. In Sec. 3 the basic assumptions
of previous asymptotic LBM’s are recalled. In.Sec.4 and
5 the foundations of the new inverse kinetic theory are
laid down and the integral LB inverse kinetic theory is
presented, while in Sec. 6 the entropic theorem is proven
to hold for the kinetic distribution function for properly
defined kinetic pressure. Finally, in Sec.7 various asymp-
totic approximations are obtained for the inverse kinetic
theory and comparisons are introduce with previous LB
and CFD methods and in Sec. 8 the main conclusions
are drawn.
2 - THE INSE PROBLEM
A prerequisite for the formulation of an inverse kinetic
theory [2, 3] providing a phase-space description of a clas-
sical (or quantum) fluid is the proper identification of the
complete set of fluid equations and of the related fluid
fields. For a Newtonian incompressible fluid, referred to
an arbitrary inertial reference frame, these are provided
by the incompressible Navier-Stokes equations (INSE) for
the fluid fields {ρ,V,p}
∇ ·V = 0, (1)
NV = 0, (2)
ρ(r,t) = ρo. (3)
There are supplemented by the inequalities
p(r,t) ≥ 0, (4)
ρo > 0. (5)
Equations (1)-(3) are defined in a open connected set
Ω ⊆ R3 (defined as the subset of R3 where ρ(r,t) > 0)
with boundary δΩ, while Eqs. (4) and (5) apply on its
closure Ω. Here the notation is standard. Thus, N is the
NS operator
NV ≡ρo
V +∇p+ f − µ∇2V, (6)
with D
+V · ∇ the convective derivative, f denotes
a suitably smooth volume force density acting on the fluid
element and µ ≡ νρo > 0 is the constant fluid viscosity.
In particular we shall assume that f can be represented
in the form
f = −∇Φ(r) + f1(r,t)
where we have separated the conservative ∇Φ(r) and the
non-conservative f1 parts of the force. Equations (1)-(3)
are assumed to admit a strong solution in Ω × I, with
I ⊂ R a possibly bounded time interval. By assumption
{ρ,V,p} are continuous in the closure Ω. Hence if in Ω×I,
f is at least C(1,0)(Ω×I), it follows necessarily that {V,p}
must be at least C(2,1)(Ω × I). In the sequel we shall
impose on {V,p} the initial conditions
V(r,to) = Vo(r), (7)
p(r, to) = po(r).
Furthermore, for greater mathematical simplicity, here
we shall impose Dirichlet boundary conditions on δΩ
V(·,t)|
δΩ = VW (·,t)|δΩ
p(·,t)|
δΩ = pW (·,t)|δΩ .
Eqs.(3) and (7)-(8) define the initial-boundary value
problem associated to the reduced INSE (reduced INSE
problem). It is important to stress that the previous
problem can also formulated in an equivalent way by re-
placing the fluid pressure p(r, t) with a function p1(r, t)
(denoted kinetic pressure) of the form
p1(r, t) = Po + p(r, t), (9)
where Po = Po(t) is prescribed (but arbitrary) real func-
tion of time and is at least Po(t) ∈ C
(1)(I). {V,p1} will
be denoted hereon as extended fluid fields and Po(t) will
be denoted as pseudo-pressure.
3 - ASYMPTOTIC LBM’S
3A - Basic assumptions
As is well known, all LB methods are based on a dis-
crete kinetic theory, using a so-called lattice Boltzmann
velocity discretization of phase-space (LB discretization).
This involves the definition of a kinetic distribution func-
tion f, which can only take the values belonging to a
finite discrete set {fi(r, t), i = 0, k} (discrete kinetic dis-
tribution functions). In particular, it is assumed that the
functions fi, for i = 0, k, are associated to a discrete set
of k+1 different ”velocities” {ai, i = 0, k} . Each ai is an
’a priori’ prescribed constant vector spanning the vector
space Rn (with n = 2 or 3 respectively for the treatment
of two- and three-dimensional fluid dynamics),and each
fi(r, t) is represented by a suitably smooth real function
which is defined and continuous in Ω×I and in particular
is at least C(k,j)(Ω× I) with k ≥ 3.
The crucial aspect which characterizes customary LB
approaches [8, 9, 10, 11, 12, 13, 14, 17, 40, 41] involves the
construction of kinetic models which allow a finite sound
speed in the fluid and hence are based on the assumption
of a (weak) compressibility of the same fluid. This is
realized by assuming that the evolution equation (kinetic
equation) for the discrete distributions fi(r, t) (i = 1, k),
depends at least one (or more) infinitesimal (asymptotic)
parameters (see below). Such approaches are therefore
denoted as asymptotic LBM’s. They are characterized
by a suitable set of assumptions, which typically include:
1. LB assumption #1: discrete kinetic equation and
correspondence principle: the first assumption con-
cerns the definition of an appropriate evolution
equation for each fi(r, t) which must hold (together
with all its moment equations) in the whole open
set Ω× I. In customary LB approaches it takes the
form of the so-called LB-BGK equation [13, 41, 42]
L(i)fi = Ωi(fi), (10)
where i = 0, k. Here L(i) is a suitable streaming
operator,
Ωi(fi) = −νc(fi − f
i ) (11)
(with νc ≥ 0 a constant collision frequency) is
known as BKG collision operator (after Bhatba-
gar, Gross and Krook [43]) and f
i is an ”equi-
librium” distribution to be suitably defined. In
customary LBM’s it is implicitly assumed that
the solution of Eq.(10), subject to suitable initial
and boundary conditions exists and is unique in
the functional class indicated above. In partic-
ular, usually L(i) is either identified with the fi-
nite difference streaming operator (see for example
[8, 11, 13, 42]), i.e., L(i)fi(r, t) = LFD(i)fi(r, t) ≡
[fi(r+ ai∆t, t+∆t)− fi(r, t)] or with the dif-
ferential streaming operator (see for instance [17,
40, 41])
L(i) = LD(i) ≡
+ ai ·
. (12)
Here the notation is standard. In particular, in the
case of the operator LFD(i), ∆t and c∆t ≡ Lo are
appropriate parameters which define respectively
the characteristic time- and length- scales associ-
ated to the LBM time and spatial discretizations.
A common element to all LBM’s is the assump-
tion that all relevant fluid fields can be identified,
at least in some approximate sense, with appro-
priate momenta of the discrete kinetic distribu-
tion function (correspondence principle). In par-
ticular, for neutral and isothermal incompressible
fluids, for which the fluid fields are provided re-
spectively by the velocity and pressure fluid fields
{Yj(r, t), j = 1, 4} ≡ {V(r, t), p(r, t)} , it is as-
sumed that they are identified with a suitable set
of discrete velocity momenta (for j = 1, 4)
Yj(r, t) =
i=0,k
Xji(r, t)fi(r, t), (13)
where Xji(r, t) (with i = 0, k and j = 1, k) are ap-
propriate, smooth real weight functions. In the
literature several examples of correspondence prin-
ciples are provided, a particular case being provided
by the so-called D2Q9 (V, p)-scheme [44, 45]
p(r, t) = c2
i=0,k
fi = c
i=0,k
i , (14)
V(r,t) =
i=1,k
aifi =
i=1,k
i , (15)
where k = 8 and c = min {|ai| > 0, i = 0, k} is a
characteristic parameter of the kinetic model to be
interpreted as test particle velocity. In customary
LBM’s the parameter cs =
(with D the dimen-
sion of the set Ω) is interpreted as sound speed of
the fluid. In order that the momenta (14) and (15)
recover (in some suitable approximate sense) INSE
, however, appropriate subsidiary conditions must
be met.
2. LB assumption #2: Constraints and asymptotic
conditions: these are based on the introduc-
tion of a dimensionless parameter ε, to be consid-
ered infinitesimal, in terms of which all relevant
parameters can be ordered. In particular, it is
required that the following asymptotic orderings
[17, 40, 41] apply respectively to the fluid fields
ρo,V(r, t), p(r, t), the kinematic viscosity ν = µ/ρo
and Reynolds number Re = LV/ν:
ρo,V(r, t), p(r, t) ∼ o(ε
0), (16)
[1 + o(ε)] ∼ o(εαR), (17)
Re ∼ 1/o(ε
αR), (18)
where αR ≥ 0. Here we stress that the position
for ν holds in the case of D2Q9 only, while the
generalization to 3D and other LB discretizations.
is straightforward. Furthermore, the velocity c and
collision frequency νc are ordered so that
c ∼ 1/o(εαc), (19)
νc ∼ 1/o(ε
αν ), (20)
∼ o(εα), (21)
with α ≡ αν−αc > 0; the characteristic length and
time scales, Lo ≡ c∆t and ∆t for the spatial and
time discretization are assumed to scale as
∼ o(εαL), (22)
∼ o(εαt), (23)
with αt, αL > 0. Here L and T are the (smallest)
characteristic length and time scales, respectively
for spatial and time variations of V(r, t) and p(r.t).
Imposing also that 1
results infinitesimal at least
of order
∼ o(εα)
it follows that it must be also αt − αL > 0. These
assumptions imply necessarily that the dimension-
less parameter M eff ≡ V
(Mach number) must be
ordered as
M eff ∼ O(εαc) (24)
(small Mach-number expansion).
3. LB assumption #3: Chapman-Enskog expansion -
Kinetic initial conditions, relaxation conditions: it
is assumed that the kinetic distribution function
fi(r, t) admits a convergent Chapman-Enskog ex-
pansion of the form
fi = f
i + δf
i + δ
i + .., (25)
where δ ≡ εα and the functions f
i (j ∈ N)
are assumed smooth functions of the form (multi-
scale expansion) f
i (ro, r1, r2, ..to, t1, t2, ..), where
rn = δ
nr, tn = δ
nt and n ∈ N. In typical LBM’s
the parameter δ is usually identified with ε (which
requires letting α = 1), while the Chapman-Enskog
expansion is usually required to hold at least up to
order o(δ2). In addition the initial conditions
fi(r, to) = f
i (r, to), (26)
(for i = 0, k) are imposed in the closure of the fluid
domain Ω. It is well known [46] that this position
generally (i.e., for non-stationary fluid fields), im-
plies the violation of the Chapman-Enskog expan-
sion close to t = to, since the approximate fluid
equations are recovered only letting δf
0, i.e., assuming that the kinetic distribution func-
tion has relaxed to the Chapman-Enskog form (25).
This implies a numerical error (in the evaluation of
the correct fluid fields) which can be overcome only
discarding the first few time steps in the numerical
simulation.
4. LB assumption #5: Equilibrium kinetic distribu-
tion: a possible realization for the equilibrium dis-
tributions f
i (i = 0, k) is given by a polynomial of
second degree in the fluid velocity [44]
i (r, t) = wi
[p− Φ(r)] + (27)
+wiρo
ai ·V
ai ·V
Here, without loss of generality, the case of the
D2Q9 LB discretization will be considered, with wi
and ai (for i = 0, 8) denoting prescribed dimension-
less constant weights and discrete velocities. Notice
that, by definition, f
i is not a Galilei scalar. Nev-
ertheless, it can be considered approximately in-
variant, at least with respect to low-velocity trans-
lations which do not violate the low-Mach number
assumption (24).
5. LB assumption #6: Kinetic boundary conditions:
They are specified by suitably prescribing the form
of the incoming distribution function at the bound-
ary δΩ. [47, 48, 49, 50, 51, 51, 52, 53, 54, 54, 55, 56,
57, 58, 59]. However, this position is not generally
consistent with the Chapman-Enskog solution (25)
(see related discussion in Appendix A). As a con-
sequence violations of the hydrodynamic equations
may be expected sufficiently close to the boundary,
a fact which may be only alleviated (but not com-
pletely eliminated) by adopting suitable grid refine-
ments near the boundary. An additional potential
difficulty is related to the condition of strict posi-
tivity of the kinetic distribution function [57] which
is not easily incorporated into the no-slip boundary
conditions [50, 51, 52].
3B - Computational complexity of asymptotic
LBM’s
The requirements posed by the validity of these hy-
potheses may strongly influence the computational com-
plexity of asymptotic LBM’s which is usually associated
to the total number of ”logical” operations which must
be performed during a prescribed time interval. There-
fore, a critical parameter of numerical simulation meth-
ods is their discretization time scale ∆t. This is - in turn
- related to the Courant number NC =
, where V
and Lo.denote respectively the sup of the magnitude of
the fluid velocity and the amplitudes of the spatial dis-
cretization. As is well known ”optimal” CFD simulation
methods typically allow Lo ∼ L and a definition of the
time step ∆t = ∆tOpt such that NC ∼
V ∆tOpt
∼ 1. In-
stead, for usual LBM’s satisfying the low-M eff assump-
tion (24), the Courant number is very small since it re-
sults NC = M
eff Lo
∼ O(εα)Lo
. This means that their
discretization time scale of ∆t is much smaller than ∆tOpt
and reads
∆t ∼M eff
∆tOpt. (28)
In addition, depending on the accuracy of the numeri-
cal algorithms adopted for the construction of the dis-
crete kinetic distribution function, also the ratio Lo
sults infinitesimal in the sense Lo
∼ o(εαL), with suitable
αL > 0. Finally, we stress that LB approaches based
on the adoption of the finite-difference streaming opera-
tor LFD(i) are usually only accurate to order o(∆t
2). For
them, therefore, the requirement placed by Eq.(28) might
be even stronger. This implies that traditional LBM’s
may involve a vastly larger computation time than that
afforded by more efficient numerical methods.
4 - NEW LB INVERSE KINETIC THEORY
(LB-IKT)
A basic issue in LB approaches [8, 11, 13, 42] con-
cerns the choice of the functional class of the discrete
kinetic distribution functions fi (i = 0, k) as well as the
related definition of the equilibrium discrete distribution
function f
i [which appears in the BGK collision opera-
tor; see Eq.(11)]. This refers in particular to their trans-
formation properties with respect to arbitrary Galilean
transformations, and specifically to their Galilei invari-
ance with respect to velocity translations with constant
velocity.
In statistical mechanics it is well known that the ki-
netic distribution function is usually assumed to be a
Galilean scalar. The same assumption can, in principle,
be adopted also for LB models. However, the kinetic
distribution functions fi and f
i do not necessarily re-
quire a physical interpretation of this type. In the se-
quel we show that for a discrete inverse kinetic theory it
is sufficient that fi and f
i be so defined that the mo-
ment equations coincide with the fluid equations (which
by definition are Galilei covariant). It is sufficient to de-
mand that both fi and f
i are identified with a ordinary
scalars with respect to the group of rotation in R2, while
they need not be necessarily invariant with respect to
arbitrary velocity translations. This means that fi is in-
variant only for a particular subset of inertial reference
frames. For example for a fluid which at the initial time
moves locally with constant velocity an element of this
set can be identified with the inertial frame which in the
same position is locally co-moving with the fluid.
The adoption of non-translationally invariant discrete
distributions fi is actually already well known in LBM
and results convenient for its simplicity. This means,
manifestly, that in general no obvious physical interpre-
tation can be attached to the other momenta of the dis-
crete kinetic distribution function. As a consequence,
the very definition of the concept of statistical entropy to
be associated to the f ′is is essentially arbitrary, as well as
the related principle of entropy maximization, typically
used for the determination of the equilibrium distribution
function f
i . Several authors, nevertheless, have investi-
gated the adoption of possible alternative formulations,
which are based on suitable definitions of the entropy
functional and/or the requirement of approximate or ex-
act Galilei invariance (see for example [29, 32, 62]).
4A - Foundations of LB-IKT
As previously indicated, there are several important
motivations for seeking an exact solver based on LBM.
The lack of a theory of this type represents in fact a
weak point of LB theory. Besides being a still unsolved
theoretical issue, the problem is relevant in order to de-
termine the exact relationship between the LBM’s and
traditional CFD schemes based on the direct discretiza-
tion of the Navier–Stokes equations. Following ideas re-
cently developed [2, 3, 4, 5, 7], we show that such a theory
can be formulated by means of an inverse kinetic theory
(IKT) with discrete velocities. By definition such an IKT
should yield exactly the complete set of fluid equations
and which, contrary to customary kinetic approaches in
CFD (in particular LB methods), should not depend on
asymptotic parameters. This implies that the inverse ki-
netic theory must also satisfy an exact closure condition.
As a further condition, we require that the fluid equa-
tions are fulfilled independently of the initial conditions
for the kinetic distribution function (to be properly set)
and should hold for arbitrary fluid fields. The latter re-
quirement is necessary since we must expect that the
validity of the inverse kinetic theory should not be lim-
ited to a subset of possible fluid motions nor depend on
special assumptions, like a prescribed range of Reynolds
numbers. In principle a phase-space theory, yielding an
inverse kinetic theory, may be conveniently set in terms of
a quasi-probability, denoted as kinetic distribution func-
tion, f(x, t). A particular case of interest (investigated in
Refs.[2, 3]) refers to the case in which f(x, t) can actu-
ally be identified with a phase-space probability density.
In the sequel we address both cases, showing that, to a
certain extent, in both cases the formulation of a generic
IKT can actually be treated in a similar fashion. This
requires the introduction of an appropriate set of consti-
tutive assumptions (or axioms). These concern in par-
ticular the definitions of the kinetic equation - denoted
as inverse kinetic equation (IKE) - which advances in
time f(x, t) and of the velocity momenta to be identified
with the relevant fluid fields (correspondence principle).
However, further assumptions, such as those involving
the regularity conditions for f(x, t) and the prescription
of its initial and boundary conditions must clearly be
added. The concept [of IKT] can be easily extended to
the case in which the kinetic distribution function takes
on only discrete values in velocity space. In the sequel
we consider for definiteness the case of the so-called LB
discretization, whereby - for each (r, t) ∈ Ω × I - the
kinetic distribution function is discrete, and in particu-
lar admits a finite set of discrete values fi(r, t) ∈ R, for
i = 0, k, each one corresponding to a prescribed constant
discrete velocity ai ∈ R
3 for i = 0, k.
4B - Constitutive assumptions
Let us now introduce the constitutive assumptions (ax-
ioms) set for the construction of a LB-IKT for INSE,
whose form is suggested by the analogous continuous
inverse kinetic theory [2, 3]. The axioms, define the
”generic” form of the discrete kinetic equation, its func-
tional setting, the momenta of the kinetic distribution
function and their initial and boundary conditions, are
the following ones:
Axiom I - LB–IKE and functional setting.
Let us require that the extended fluid fields {V,p1}
are strong solutions of INSE, with initial and boundary
conditions (7)-(8) and that the pseudo pressure po(t) is
an arbitrary, suitably smooth, real function. In particu-
lar we impose that the fluid fields and the volume force
belong to the minimal functional setting:
p1,ΦǫC
(2,1)(Ω× I),
VǫC(3,1)(Ω× I), (29)
(1,0)(Ω× I).
We assume that in the set Ω×I the following equation
LD(i)fi = Ωi(fi) + Si (30)
[LB inverse kinetic equation (LB-IKE)] is satisfied iden-
tically by the discrete kinetic distributions fi(r, t) for
i = 0, k. Here Ωi(fi) and LD(i) are respectively the BGK
and the differential streaming and operators [Eqs.(11)
and (12)], while Si is a source term to be defined. We
require that KB-IKE is defined in the set Ω× I, so that
Ωi(fi) and Si are at least that C
(1)(Ω × I) and contin-
uous in Ω × I. Moreover Ωi(fi), defined by Eq.(11), is
considered for generality and will be useful for compar-
isons with customary LB approaches. We remark that
the choice of the equilibrium kinetic distribution f
the BGK operator remains completely arbitrary. We
assume furthermore that in terms of fi the fluid fields
{V, p1} are determined by means of functionals of the
form MXj [fi] =
i=0,8
Xjfi (denoted as discrete velocity
momenta). For X = X1, X2 (with X1 = c
2, X2 =
these are related to the fluid fields by means of the equa-
tions (correspondence principle)
p1(r, t)− Φ(r) = c
i=0,8
fi = c
i=0,8
i , (31)
V(r,t)=
i=1,8
aifi =
i=1,8
i , (32)
where c = min {|ai| , i = 1, 8} is the test particle veloc-
ity and f
i is defined by Eq.(27) but with the kinetic
pressure p1 that replaces the fluid pressure p adopted
previously [44]. These equations are assumed to hold
identically in the set Ω × I and by assumption, fi and
i belong to the same functional class of real functions
defined so that the extended fluid fields belong to the
minimal functional setting (29). Moreover, without loss
of generality, we consider the D2Q9 LB discretization.
Axiom II - Kinetic initial and boundary conditions.
The discrete kinetic distribution function satisfies, for
i = 0, k and for all r belonging to the closure Ω, the
initial conditions
fi(r, to) = foi(r,to) (33)
where foi(r,to) (for i = 0, k) is a initial distribution func-
tion defined in such a way to satisfy in the same set the
initial conditions for the fluid fields
p1o(r) ≡ Po(to) + po(r)− Φ(r) = (34)
i=0,8
foi(r),
Vo(r) =
i=1,8
aifoi(r) . (35)
To define the analogous kinetic boundary conditions on
δΩ, let us assume that δΩ is a smooth, possibly moving,
surface. Let us introduce the velocity of the point of the
boundary determined by the position vector rw ∈ δΩ, de-
fined by Vw(rw(t), t) =
rw(t) and denote by n(rw, t)
the outward normal unit vector, orthogonal to the bound-
ary δΩ at the point rw. Let us denote by f
i (rw, t)
and f
i (rw , t) the kinetic distributions which carry the
discrete velocities ai for which there results respectively
(ai −Vw) ·n(rw , t) > 0 (outgoing-velocity distributions)
and (ai −Vw) · n(rw, t) ≤ 0 (incoming-velocity distribu-
tions) and which are identically zero otherwise. We as-
sume for definiteness that both sets, for which |ai| > 0,
are non empty (which requires that the parameter c
be suitably defined so that c > |Vw|). The bound-
ary conditions are obtained by prescribing the incom-
ing kinetic distribution f
i (rw , t), i.e., imposing (for all
(rw, t) ∈ δΩ× I)
i (rw, t) = f
oi (rw , t). (36)
Here f
oi (rw, t) are suitable functions, to be assumed
non-vanishing and defined only for incoming discrete ve-
locities for which (ai −Vw)·n(rw , t) ≤ 0. Manifestly, the
functions f
oi (rw, t) (i = 0, k) must be defined so that
the Dirichlet boundary conditions for the fluid fields are
identically fulfilled, namely there results
p1w(rw, t) = Po(t) + pw(rw, t)− Φ(r) = (37)
i=0,k
oi (rw, t) + f
i (rw, t)
Vw(rw, t) = (38)
i=1,k
oi (rw, t) + f
i (rw, t)
Here, again, the functions foi(r) and f
oi (rw, t) (for i =
0, k) must be assumed suitably smooth. A particular case
is obtained imposing identically for i = 0, k
foi(r,to) = f
i (r, to), (39)
oi (rw, t) = f
i (rw , t), (40)
where the identification with f
oi (rw, t) and f
oi (rw, t)
is intended respectively in the subsets ai ·n(rw, t) > 0 and
ai ·n(rw , t) ≤ 0. Finally, we notice that in case Neumann
boundary conditions are imposed on the fluid pressure,
Eq.(37) still holds provided pw(rw, t) is intended as a
calculated value.
Axiom III - Moment equations.
If fi(r, t), for i = 0, k, are arbitrary solutions of LB-
IKE [Eq.(30)] which satisfy Axioms I and II validity of
Axioms I and II, we assume that the moment equations of
the same LB-IKE, evaluated in terms of the moment op-
erators MXj [·] =
i=0,8
Xj ·, with j = 1, 2, coincide iden-
tically with INSE, namely that there results identically
[for all (r, t) ∈ Ω× I]
MX1 [Lifi − Ωi(fi)− Si] = ∇ ·V = 0, (41)
MX2 [Lifi − Ωi(fi)− Si] = NV = 0. (42)
Axiom IV - Source term.
The source term is required to depend on a finite num-
ber of momenta of the distribution function. It is as-
sumed that these include, at most, the extended fluid
fields {V,p1} and the kinetic tensor pressure
Π = 3
fiaiai − ρoVV. (43)
• Furthermore, we also normally require (except
for the LB-IKT described in Appendix B) that
Si(r, t) results independent of f
i (r,t), foi(r) and
fwi(rw , t) (for i = 0, k).
Although, the implications will made clear in the fol-
lowing sections, it is manifest that these axioms do not
specify uniquely the form (and functional class) of the
equilibrium kinetic distribution function f
i (r,t), nor
of the initial and boundary kinetic distribution func-
tions (33),(36). Thus, both f
i (r,t), foi(r,to) and the
related distribution they still remain in principle com-
pletely arbitrary. Nevertheless, by construction, the
initial and (Dirichlet) boundary conditions for the fluid
fields are satisfied identically. In the sequel we show that
these axioms define a (non-empty) family of parameter-
dependent LB-IKT’s, depending on two constant free pa-
rameters νc, c > 0 and one arbitrary real function Po(t).
The examples considered are reported respectively in the
following Sec. 5,6 and in the Appendix B.
5 - A POSSIBLE REALIZATION: THE
INTEGRAL LB-IKT
We now show that, for arbitrary choices of the distri-
butions fi(r,t) and f
i (r,t) which fulfill axioms I-IV, an
explicit (and non-unique) realization of the LB-IKT can
actually be obtained. We prove, in particular, that a pos-
sible realization of the discrete inverse kinetic theory, to
be denoted as integral LB-IKT, is provided by the source
Si = (44)
− ai ·
f1−µ∇
V −∇ ·Π+∇p
≡ S̃i,
where wi
is denoted as first pressure term. Holds, in
fact, the following theorem.
Theorem 1 - Integral LB-IKT
In validity of axioms I-IV the following statements
hold. For an arbitrary particular solution fi and for ar-
bitrary extended fluid fields :
A) if fi is a solution of LB-IKE [Eq.(30)] the moment
equations coincide identically with INSE in the set Ω×I;
B) the initial conditions and the (Dirichlet) boundary
conditions for the fluid fields are satisfied identically;
C) in validity of axiom IV the source term S̃i is non-
uniquely defined by Eq.(44).
Proof
A) We notice that by definition there results identically
S̃i =
aiS̃i = (46)
f−µ∇2V−∇ ·Π+∇p
On the other hand, by construction (Axiom I) fi (i =
1, k) is defined so that there results identically
i=0 Ωi =
0 and
i=0 aiΩi = 0. Hence the momenta MX1 ,MX2 of
LB-IKE deliver respectively
i=1,8
aifi = 0 (47)
i=1,8
aifi + ρoV · ∇V +∇p1 + f−µ∇
V = 0 (48)
where the fluid fields V,p1 are defined by Eqs.(31),(32).
Hence Eqs.(47) and (48) coincide respectively with the
isochoricity and Navier-Stokes equations [(1) and (2)].
As a consequence, fi is a particular solution of LB-IKE
iff the fluid fields {V,p1} are strong solutions of INSE.
B) Initial and boundary conditions for the fluid fields
are satisfied identically by construction thanks to Axiom
C) However, even prescribing νc, c > 0 and the real
function Po(t), the functional form of the equation can-
not be unique The non uniqueness of the functional form
of the source term S̃i(r, t) is assumed to be indepen-
dent of f
i (r,t) [and hence of Eq.(30)] is obvious. In
fact, let us assume that S̃i is a particular solution for
the source term which satisfies the previous axioms I-
IV. Then, it is always possible to add to Si arbitrary
terms of the form S̃i + δSi, with δSi 6= 0 which depends
only on the momenta indicated above, and gives van-
ishing contributions to the first two moment equations,
namely MXj [δSi] =
i=0,8
XjδSi = 0, with j = 1, 2. To
prove the non-uniqueness of the source term Si, it is suf-
ficient to notice that, for example, any term of the form
δSi =
F (r, t), with F (r, t) an arbitrary real
function (to be assumed, thanks to Axiom IV, a linear
function of the fluid velocity), gives vanishing contribu-
tions to the momentaMX1 ,MX2 . Hence S̃i is non-unique.
The implications of the theorem are straightforward.
First, manifestly, it holds also in the case in which the
BGK operator vanishes identically. This occurs letting
νc = 0 in the whole domain Ω × I. Hence the inverse
kinetic equation holds independently of the specific defi-
nition of f
i (r,t).
An interesting feature of the present approach lies in
the choice of the boundary condition adopted for fi(r,t),
which is different from that usually adopted in LBM’s
[see for example [14] for a review on the subject]. In par-
ticular, the choice adopted is the simplest permitting to
fulfill the Dirichlet boundary conditions [imposed on the
fluid fields]. This is obtained prescribing the functional
form of fi(r,t) on the boundary of the fluid domain (δΩ),
which is identified with a function foi(r, t).
Second, the functional class of fi(r,t), f
i (r,t) and of
foi(r, t) remains essentially arbitrary. Thus, in particu-
lar, the initial and boundary conditions, specified by the
same function foi(r, t), can be defined imposing the po-
sitions (39),(40). As further basic consequence, f
i (r,t)
and fi(r,t) need not necessarily be Galilei-invariant (in
particular they may not be invariant with respect to ve-
locity translations), although the fluid equations must
be necessarily fully Galilei-covariant. As a consequence
it is always possible to select f
i (r,t) and foi(r, t) based
on convenience and mathematical simplicity. Thus, be-
sides distributions which are Galilei invariant and sat-
isfy a principle of maximum entropy (see for example
[22, 30, 32, 34, 60, 61]), it is always possible to iden-
tify them [i.e., f
i (r,t), foi(r, t)] with a non-Galilean in-
variant polynomial distribution of the type (27) [mani-
festly, to be exactly Galilei-invariant each f
i (r,t) should
depend on velocity only via the relative velocity ui =
ai −V].
We mention that the non-uniqueness of the source term
S̃i can be exploited also by imposing that f
i (r,t) re-
sults a particular solution of the inverse kinetic equation
Eq.(30) and there results also foi(r, t) = f
i (r,t). In Ap-
pendix B we report the extension of THM.1 which is ob-
tained by identifying again f
i (r,t) with the polynomial
distribution (27).
6 - THE ENTROPIC PRINCIPLE - CONDITION
OF POSITIVITY OF THE KINETIC
DISTRIBUTION FUNCTION
A fundamental limitation of the standard LB ap-
proaches is their difficulty to attain low viscosities, due to
the appearance of numerical instabilities [14]. In numeri-
cal simulations based on customary LB approaches large
Reynolds numbers is usually achieved by increasing nu-
merical accuracy, in particular strongly reducing the time
step and the grid size of the spatial discretization (both
of which can be realized by means of numerical schemes
with adaptive time-step and using grid refinements).
Hence, the control [and possible inhibition] of numerical
instabilities is achieved at the expense of computational
efficiency. This obstacle is only partially alleviated by
approaches based on ELBM [22, 30, 32, 34, 60, 61]. Such
methods are based on the hypothesis of fulfilling an H-
theorem, i.e., of satisfying in the whole domain Ω × I
the condition of strict positivity for the discrete kinetic
distribution functions. This requirement is considered,
by several authors (see for example [26, 29, 62]), an es-
sential prerequisite to achieve numerical stability in LB
simulations. However, the numerical implementation of
ELBM typically induce a substantial complication of the
original algorithm, or require a cumbersome fine-tuning
of adjustable parameters [22, 37].
6A - The constant entropy principle and PEM
A basic aspect of the IKT’s here developed is the possi-
bility of fulfilling identically the strict positivity require-
ment by means of a suitable H-theorem which provides
also a maximum entropy principle. In particular, in this
Section, extending the results of THM.1 and 2, we intend
to prove that a constant H-theorem can be established
both for the integral and differential LB-IKT’s defined
above. The H-theorem can be reached by imposing for
the Gibbs-Shannon entropy functional the requirement
that for all t ∈ I there results
S(f) = −
i=0,8
fi ln(fi/wi) = 0, (49)
which implies that S(f) is necessarily maximal in a suit-
able functional set {f} . The result can be stated as fol-
lows:
Theorem 2 - Constant H-theorem
In validity of THM.1, let us assume that:
1) the configuration domain Ω is bounded;
2) at time to the discrete kinetic distribution functions
fi, for i = 0, 8, are all strictly positive in the set Ω.
Then the following statements hold:
A) by suitable definition of the pseudo pressure
Po(t), the Gibbs-Shannon entropy functional S(f) =
i=0,8
fi ln(fi/wi) can be set to be constant in the
whole time interval I. This holds provided the pseudo-
pressure Po(t) satisfies the differential equation
(1 + log fi) = (50)
ai · ∇fi − Ŝi
(1 + log fi) ,
where Ŝi = Si +
B) if the entropy functional S(f) =
i=0,8
fi ln(fi/wi) is constant in the whole
time interval I the discrete kinetic distribution functions
fi are all strictly positive in the whole set Ω× I;
C) an arbitrary solution of LB-IKE [Eq.(30)] which
satisfies the requirement A) is extremal in a suitable func-
tional class and maximizes the Gibbs-Shannon entropy .
Proof:
A) Invoking Eq.(30), there results
∂S(t)
[1 + log fi] = (51)
(ai · ∇fi − Si) (1 + log fi) ,
where Si is the source term, provided by Eq.(44). By
direct substitution it follows the thesis.
B) If Eq.(50) holds identically in there results ∀t ∈
I, S (t) = S (t0) , which implies the strict positivity of fi,
for all i = 0, 8.
C) Let us introduce the functional class
{f + αδf} = {fi = fi(t) + αδfi(t), i = 0, 8} , (52)
where α is a finite real parameter and the syn-
chronous variation δfi(t) is defined δfi(t) = dfi(t) ≡
∂fi(t)
dt. Introducing the synchronous variation of the en-
tropy, defined by δS (t) = ∂
, with ψ(α) =
S (f + αδf) , it follows
δS (t) = dt
∂S(t)
. (53)
Since in validity of Eq.(50) there results
∂S(t)
which in view of Eq.(53) implies also δS (t) = 0. It is im-
mediately follows that there results necessarily δ2S (t) ≤
0, i.e., S (t) is maximal. Therefore, the kinetic distribu-
tion function which satisfies IKE (Eq.(30)] is extremal in
the functional class of variations (52) and maximizes the
Gibbs-Shannon entropy functional.
6B - Implications
In view of statement B, THM.2 warrants the strict pos-
itivity of the discrete distribution functions fi (i = 0, 8)
only in the open set Ω × I, while nothing can be said
regarding their behavior on the boundary δΩ (on which
fi might locally vanish). However, since the inverse ki-
netic equation actually holds only in the open set Ω× I,
this does not affect the validity of the result. While the
precise cause of the numerical instability of LBM’s is still
unknown,the strict positivity of the distribution function
is usually considered important for the stability of the nu-
merical solution [29, 30]. It must be stressed that the nu-
merical implementation of the condition of constant en-
tropy Eq.(50) should be straightforward, without involv-
ing a significant computational overhead for LB simula-
tions. Therefore it might represent a convenient scheme
to be adopted also for customary LB methods.
7 - ASYMPTOTIC APPROXIMATIONS AND
COMPARISONS WITH PREVIOUS CFD
METHODS
A basic issue is the relationship with previous CFD nu-
merical methods, particularly asymptotic LBM’s. Here
we consider, for definiteness, only the case of the inte-
gral LB-IKT introduced in Sec.5. Another motivation is
the possibility of constructing new improved asymptotic
models, which satisfy with prescribed accuracy the re-
quired fluid equations [INSE], of extending the range of
validity of traditional LBM’s and fulfilling also the en-
tropic principle (see Sec.6). The analysis is useful in
particular to establish on rigorous grounds the consis-
tency of previous LBM’s. The connection [with previ-
ous LBM’s] can be reached by introducing appropriate
asymptotic approximations for the IKT’s, obtained by
assuming that suitable parameters which characterize the
IKT’s are infinitesimal (or infinite) (asymptotic parame-
ters). A further interesting feature is the possibility of
constructing in principle a class of new asymptotic LBM’s
with prescribed accuracy , i.e., in which the distribution
function (and the corresponding momenta) can be de-
termined with predetermined accuracy in terms of per-
turbative expansions in the relevant asymptotic parame-
ters. Besides recovering the traditional low-Mach number
LBM’s [17, 21, 40], which satisfy the isochoricity condi-
tion only in an asymptotic sense and are closely related to
the Chorin artificial compressibility method, it is possible
to obtain an improved asymptotic LBM’s which satisfy
exactly the same equation.
We first notice that the present IKT is characterized
by the arbitrary positive parameters νc, c and the initial
value Po(to), which enter respectively in the definition
of the BGK operator [see (11)], the velocity momenta
and equilibrium distribution function f
i . Both c and
Po(to) must be assumed strictly positive, while, to assure
the validity of THM.2, Po(to) must be defined so that
(for all i = 0, 8) f
i (r,to) > 0 in the closure Ω. Thanks
to THM.1.and 2 the new theory is manifestly valid for
arbitrary finite value of these parameters. This means
that they hold also assuming
o(εαν )
, (54)
o(εαc)
, (55)
Po(to) ∼ o(ε
0), (56)
where ε denotes a strictly positive real infinitesimal,
αν , αc > 0 are real parameters to be defined, while the
extended fluid fields {ρ,V, p1} and the volume force f
are all assumed independent of ε. Hence, with respect to
ε they scale
ρo,V,p1, f ∼ o(ε
0). (57)
As a result, for suitably smooth fluid fields (i.e., in va-
lidity of Axiom 1) and appropriate initial conditions for
fi(r, t), it is expected that the first requirement actually
implies in the whole set Ω× I the condition of closeness
fi(r, t) ∼= f
i (r, t) [1 + o(ε)] , consistent with the LB As-
sumption #4. To display meaningful comparisons with
previous LBM’s let us introduce the further assumption
that the fluid viscosity is small in the sense
µ ∼ o(εαµ), (58)
with αµ ≥ 1 another real parameter to be defined.
Asymptotic approximations for the corresponding LB-
IKE [Eq.(30)] can be directly recovered by introducing
appropriate asymptotic orderings for the contributions
appearing in the source term Si = S̃i. Direct inspec-
tion shows that these are provided by the (dimensional)
parameters
M effp,a ≡
, (59)
∣∣∇ ·Π−∇p
∣∣ , (60)
∣∣µ∇2V
∣∣ . (61)
The first two M effp,a and M
are here denoted respec-
tively as (first and second) pressure effective Mach num-
bers, driven respectively by the pressure time-derivative
and by the divergence of the pressure anisotropy Π−p1.
Furthermore, M
is denoted as velocity effective Mach
number. Physically relevant examples [of asymptotic
LBM’s] can be achieved by introducing suitable orderings
in terms of the single infinitesimal ε for the parameters
M effp,a ,M
.We stress that these orderings, in
principle, can be introduced without actually introducing
restrictions on the fluid fields, i.e., retaining the assump-
tion that the extended fluid fields are independent of ε.
Interesting cases are provided by the asymptotic order-
ings indicated below.
7A - Small effective Mach numbers (Meffp,a ,M
An important aspect of LB theory is the possibility
of constructing asymptotic LBM’s with prescribed accu-
racy with respect to the infinitesimal parameter ε, in the
sense that the fluid equations are satisfied at least cor-
rect up to terms of order o(εn) included, with n = 1
or 2, namely ignoring error terms of order o(εn+1) or
higher. Let us, first, consider the case in which all pa-
rametersM effp,a ,M
and M
are all infinitesimal w.r.
to ε (low-effective-Mach numbers). Since the parameters
c and νc are free, they can be defined so that that there
results c ∼ νc ∼ 1/o(ε) [which implies αc = αν = 1].
This requires
M effp,a ∼M
∼ o(ε2). (62)
If, we consider a low-viscosity fluid for which the kine-
matic viscosity ν = µ/ρo can be assumed of order ε [and
hence αµ = 1] it follows that
∼ o(ε2). (63)
Thanks to the assumptions (54)-(58) there follows ∇ ·
Π − ∇p ∼ o(ε) and µ∇2V ∼ o(ε),which implies that
the source term S̃i, ignoring corrections of order o(ε
becomes
S̃i ∼= S̃Ai [1 + o(ε)] , (64)
S̃Ai ≡ −
ai · f . (65)
It is immediate to determine the corresponding moment
equations, which read:
+∇ ·V = 0, (66)
NV = 0+ o(ε2), (67)
Formally the first equation can be interpreted as an evo-
lution equation for the kinetic pressure p1. Nevertheless,
in view of the ordering (62) it actually implies the iso-
choricity condition
∇ ·V = 0 + o(ε2). (68)
Instead, the second one [Eq.(67)]. due to the asymp-
totic approximation (63), reduces to the Euler equation.
Therefore in this case the asymptotic approximation (64)
is not adequate. To recover the correct Navier-Stokes
equation a more accurate approximation is needed, real-
ized requiring that the hydrodynamic equations are sat-
isfied correct to order o(ε3). A fist possibility is to con-
sider a more accurate approximation for the source term.
Restoring the pressure and viscous source terms in (64)
there results the asymptotic source term
S̃Bi ≡
− ai ·
f1−µ∇
, (69)
where in validity of the previous orderings
S̃i ∼= S̃Bi [1 + o(ε)] . (70)
The corresponding moment equations become therefore
∇ ·V = 0, (71)
NV = 0+ o(ε3). (72)
It is remarkable that in this case the isochoricity condi-
tion is exactly fulfilled, even if the source term is not the
exact one. For the sake of reference, it is interesting to
mention another possible small-Mach-number ordering.
This is obtained imposing for the parameters c and νc
, (73)
o(ε2)
, (74)
while requiring for ν = µ/ρo the same constraint adopted
by asymptotic LBM’s, namely Eq.(17). In this case one
can show that the moment equation (72) is actually satis-
fied correct to order o(ε3), while the isochoricity condition
is only satisfied to order o(ε2). The following theorem
can, in fact, be proven:
Theorem 3 - Low effective-Mach-numbers asymptotic
approximation
In validity of THM.1, let us invoke the following as-
sumptions:
1) LB assumptions #3 and #4 for the discrete kinetic
distributions fi ( i = 0, 8);
2) the free parameters c and νc are assumed to satisfy
the asymptotic orderings (73),(74);
3) the fluid viscosity µ is assumed of order µ ∼ o(ε)
4) the fluid viscosity µ is prescribed so that the kine-
matic viscosity ν = µ/ρo is defined in accordance to
Eq.(17);
5) the kinetic pressure p1 is assumed slowly varying in
the sense
∂ ln p1
∼ o(ε). (75)
It follows that the source term is approximated by
Eq.(64) and moment equations are provided by the
asymptotic equations:
+∇ ·V = 0 + o(ε3), (76)
NV = 0+ o(ε3), (77)
i.e., the isochoricity and NS equation are recovered re-
spectively correct to order o(ε2) and o(ε3).
Proof
First we notice that the ordering assumptions 2)-5)
require
M effp,a ∼ o(ε
3) (78)
∼ o(ε2), (79)
∼ o(ε4), (80)
which imply at least the validity of Eqs.(64)-(67). The
proof of Eqs.(76) and (77) is immediate. In both cases
it sufficient to notice that in validity of hypotheses 1)-3)
and in terms of a Chapman-Enskog perturbative solution
of Eq.(30) there results actually
−µ∇2V −∇ ·Π+∇p = O + o(ε3), (81)
and hence S̃i reduces to Eq.(64).
The predictions of THM.3 are relevant for comparisons
and to provide asymptotic accuracy estimates for previ-
ous asymptotic LBM’s [see Refs. [17, 21, 40]]. In fact,
the asymptotic moment equations (76) and (77) formally
coincide with the analogous moment equations predicted
by such theories, when the kinetic pressure p is replaced
by the fluid pressure p1 (i.e., if the function Po(t) is set
identically equal to zero). [17, 21, 40]. Nevertheless, the
accuracy of customary LBM’s depends on the properties
of the solutions of INSE. In fact, if one assumes
∂ ln p1
∼ o(ε0) (82)
the customary (V, p) asymptotic LBM [17, 21, 40] result
actually accurate only to order o(ε2). Therefore, in such
case to reach an accuracy of order o(ε3) the approxima-
tion (69) must be invoked for the source term.
The other interesting feature of Eqs.(76) and (77) is
that they provide a connection with the artificial com-
pressibility method (ACM) postulated by Chorin [63],
previously motivated merely on the grounds of an asymp-
totic LBM [21]. In fact, these coincides with the Chorin’s
pressure relaxation equation where c can be interpreted
as sound speed of the fluid. However - in a sense - this
analogy is purely formal and is only due to the neglect of
the first pressure source term in Si. It disappears alto-
gether in Eq.(71) if we adopt the more accurate asymp-
totic source term (69). A further difference is provided
by the adoption of the kinetic pressure p1 which replaces
the fluid pressure p (used in Chorin approach). We stress
that the choice of p1 here adopted, with Po(t) determined
by the entropic principle, represents an important differ-
ence, since it permits to satisfy everywhere in Ω× I the
condition of strict positivity for the discrete kinetic dis-
tribution functions.
7B - Finite pressure-Mach number Meffp,a
Another possible asymptotic ordering, usually not per-
mitted by customary asymptotic LBM’s, is the one in
which the test particle velocity is finite, namely c ∼ o(ε0),
the viscosity remains arbitrary and is taken of order
µ ∼ o(ε0) while again νc is assumed νc ∼ 1/o(ε
2) [i.e.,
αc = νc = 0, αν = 2]. In this case the pressure Mach
M effp,a number results finite, while velocity and the sec-
ond pressure Mach numbers are considered infinitesimal,
respectively of first and second order in ε, namely
M effp,a ∼ o(ε
∼ o(ε), (83)
∼ o(ε2).
To obtain the fluid equation with the prescribed accu-
racy, say of order o(ε2), it is sufficient to approximate
the source term S̃i in terms of S̃i ∼= S
1 + o(ε2)
. The
set of asymptotic moment equations coincide therefore
with Eqs.(71),(72). Again, the isochoricity condition is
exactly fulfilled, while in this case the NS equation is
accurate only to order o(ε2).
7C - Small effective pressure-Mach numbers
(Meffp,a ,M
) and finite velocity-Mach number (M
Finally, another interesting case is the one in which the
fluid viscosity µ remains finite (strongly viscous fluid),
i.e., in the sense µ ∼ o(ε0) [i.e., αµ = 0] while both
parameters c and νc are suitably large, and respectively
scale as c ∼ 1/o(ε), νc ∼ 1/o(ε
2) [i.e., αc = 1, αν = 2].
Due to assumptions (54)-(58) one obtains ∇ ·Π−∇p ∼
o(ε2) and µ∇2V ∼ o(ε0). It follows that the effective
Mach numbers scale respectively as
∼ o(ε3) (84)
M effp,a ∼ M
∼ o(ε2),
If we impose on µ also the same constraint set by Eq.(17),
the customary asymptotic LBM’s can be invoked also in
this case. However, since the first pressure and veloc-
ity Mach numbers are only second order accurate, the
NS equation is recovered to order o(ε2) only. Never-
theless, it is possible to recover with prescribed accuracy
the fluid equations (71),(72). This is obtained adopt-
ing the source term S̃i ∼= S̃Bi [see Eq.(69)]. As a basic
consequence, the isochoricity equation is satisfied exactly
(hence no meaningful analogy with Chorin’s approach
arises), while the NS equation results correct to order
o(ε3). These results provide a meaningful extension of
the customary asymptotic LBM’s. We stress that the
entropic approach here developed holds independently of
the asymptotic orderings here considered [for the param-
etersM effp,a ,M
]. Thus it can be used in all cases
to assure the strict positivity of the discrete distribution
function.
8 - CONCLUSIONS
In this paper we have presented the theoretical foun-
dations of a new phase-space model for incompressible
isothermal fluids, based on a generalization of customary
lattice Boltzmann approaches.We have shown that many
of the limitations of traditional (asymptotic) LBM’s can
be overcome. As a main result, we have proven that
the LB-IKT can be developed in such a way that it
furnishes exact Navier-Stokes and Poisson solvers, i.e.,
it is - in a proper sense - an inverse kinetic theory for
INSE. The theory exhibits several features, in particular
we have proven that the integral LB-IKT (see Sec.5):
1. determines uniquely the fluid pressure p(r, t) via
the discrete kinetic distribution function without
solving explicitly (i.e., numerically) the Poisson
equation for the fluid pressure. Although analo-
gous to traditional LBM’s, this is interesting since
it is achieved without introducing compressibility
and/or thermal effects. In particular the present
theory does not rely on a state equation for the
fluid pressure.
2. is complete, namely all fluid fields are expressed as
momenta of the distribution function and all hy-
drodynamic equations are identified with suitable
moment equations of the LB inverse kinetic equa-
tion.
3. allows arbitrary initial and boundary conditions for
the fluid fields.
4. is self-consistent : the kinetic theory holds for ar-
bitrary, suitably smooth initial conditions for the
kinetic distribution function. In other words, the
initial kinetic distribution function must remain ar-
bitrary even if a suitable set of its momenta are
prescribed at the initial time.
5. the associated the kinetic and equilibrium distri-
bution functions can always be chosen to belong to
the class of non-Galilei-invariant distributions. In
particular the equilibrium kinetic distribution can
always be identified with a polynomial of second
degree in the velocity.
6. is non-asymptotic, i.e., unlike traditional LBM’s it
does not depend on any small parameter, in partic-
ular it holds for finite Mach numbers.
7. fulfills an entropic principle, based on a constant-H
theorem. This theorem assures, at the same time,
the strict positivity of the discrete kinetic distri-
bution function and the maximization of the as-
sociated Gibbs-Shannon entropy in a properly de-
fined functional class. Remarkably the constant H-
theorem is fulfilled for arbitrary (strictly positive)
kinetic equilibria. This includes also the case of
polynomial kinetic equilibria.
A further remarkable aspect of the theory concerns the
choice of the kinetic boundary conditions to be satisfied
by the distribution function (Axiom II) and obtained by
prescribing the form of the incoming-velocity distribution
[see Eq.(36)]. Thanks to Eqs.(34),(35), this requirement
[of the LB-IKT] the boundary conditions for the fluid
fields are satisfied exactly while the fluid equations are
by construction identically fulfilled also arbitrarily close
to the boundary. This result, in a proper sense, applies
only to Dirichlet boundary conditions for the fluid fields
[see Eqs.(8)]. Nevertheless the same approach can be
in principle extended to the case of mixed or Neumann
boundary conditions for the fluid fields.
Moreover, we have shown that a useful implication of
the theory is provided by the possibility of constructing
asymptotic approximations to the inverse kinetic equa-
tion. This permits to develop a new class of asymptotic
LBM’s which satisfy INSE with prescribed accuracy, to
obtain useful comparisons with previous CFD methods
(Chorin’s ACM) and to achieve accuracy estimates for
customary asymptotic LBM’s. The main results of the
paper are represented by THM’s 1-3, which refer respec-
tively to the construction of the integral LB-IKT, to the
entropic principle and to construction of the low effective-
Mach-numbers asymptotic approximations. For the sake
of reference, also another type of LB-IKT, which admits
as exact particular solution the polynomial kinetic equi-
librium, has been pointed out (THM.1bis).
The construction of a discrete inverse kinetic theory of
this type for the incompressible Navier-Stokes equations
represents an exciting development for the phase-space
description of fluid dynamics, providing a new starting
point for theoretical and numerical investigations based
on LB theory. In our view, the route to more accurate,
higher-order LBM’s, here pointed out, will be important
in order to achieve substantial improvements in the effi-
ciency of LBM’s in the near future.
APPENDIX A
The basic argument regarding the accuracy of the
boundary conditions adopted by customary asymptotic
LBM’s is provided by Ref.[46]. In fact. let us assume that
on the boundary δΩ the incoming distribution function
i (rw , t) is prescribed according to Eqs.(33),(37) and
(38), being f
oi (rw, t) prescribed suitably smooth func-
tions which are non vanishing only only for incoming dis-
crete velocities ai for which (ai −Vw) ·n(rw, t) ≤ 0. For
definiteness, let us assume that f
oi (rw , t) ≡ f
i (rw, t)
where f
i (rw, t) denotes a suitable equilibrium distribu-
tion. It follows that suitably close to the boundary the
kinetic distribution differs from the Chapman-Enskog so-
lution (25). The numerical error can be overcome only
discarding the first few spatial grid (close to the bound-
ary) in the numerical simulation [46].
APPENDIX B
Unlike standard kinetic theory, the distinctive feature
of LB-IKT’s is the possibility of adopting a non-Galilei
invariant kinetic distribution function (i.e., non-invariant
with respect to velocity translations). Here we re-
port another example of discrete inverse kinetic theory
of this type. Let us modify Axiom IV so that to permit
that a particular solution of LB-IKE [Eq.(30)] is pro-
vided by fi = f
i . Here we identify f
i with the (non-
Galilei invariant) polynomial kinetic distribution defined
by Eq.(27) but with the kinetic pressure p1 that replace
the fluid pressure p. In this case one can prove that the
source term Si reads
Si = S
i ≡ S̃i +∆Si, (85)
where
∆Si =
(ai −V) · ∇V−
ai∇·V
· ai+
V − ai
3ai ·V
+ (86)
wiρoai · ∇
ai ·V
Here N1 ≡ N−ρo
, where N is the Navier-Stokes opera-
tor (6), namely N1 is the nonlinear operator which acting
onV yields N1V = ρoV·∇V+∇ [p1 − Φ (r)]+f1−µ∇
Hence, invoking INSE, ∆Si can also be written in the
equivalent form
∆Si =
(ai −V) · ∇V−
ai∇·V
· ai+
V − ai
3ai ·V
+ (87)
wiρoai · ∇
ai ·V
The following result holds:
Theorem 1bis - Differential LB-IKT
In validity of axioms I-IV and the assumption that
fi = f
i is a particular solution of Eq.(30), the following
statements hold:
i is a particular solution of LB-IKE [Eq.(30)]
if and only if the extended fluid fields {V,p1} are strong
solutions of INSE of class (29), with initial and boundary
conditions (7)-(8), and arbitrary pseudo pressure po(t) of
class C(1)(I).
Moreover, for an arbitrary particular solution fi and
for arbitrary extended fluid fields :
For an arbitrary particular solution fi :
B) fi is a solution of LB-IKE [Eq.(30)] if and only
if the extended fluid fields {V,p1} are arbitrary strong
solutions of INSE of class (29), with initial and boundary
conditions (7)-(8), and arbitrary pseudo pressure po(t) of
class C(1)(I);
C) the moment equations of L-B IKE coincide identi-
cally with INSE in the set Ω× I;
D) the initial conditions and the (Dirichlet) boundary
conditions for the fluid fields are satisfied identically;
E) the source term Si is uniquely defined by
Eqs.(85),(86);
Proof:
The proof of propositions A,B, C and D is analogous
to that provided in THM.1. Assuming Si = S
i , the
proof of B follows from straightforward algebra. In fact,
letting fi(r, t) = f
i (r, t) for all (r, t) ∈ Ω × I in the
LB-IKE [Eq.(30)], one finds that Eq.(30) is fulfilled iff
the fluid fields satisfy the Navier-Stokes, isochoricity and
incompressibility equations (1),(2) and (3). The proof
of proposition E can be reached in a similar way. The
uniqueness of the source term Si is an immediate conse-
quence of the uniqueness of the solutions for INSE.
ACKNOWLEDGEMENTS Useful comments and
stimulating discussions with K.R. Sreenivasan, Direc-
tor, ICTP (International Center of Theoretical Physics,
Trieste, Italy) are warmly acknowledged. Research de-
veloped in the framework of PRIN Project Fundamen-
tals of kinetic theory and applications to fluid dynamics,
magnetofluid dynamics and quantum mechanics (MIUR,
Ministry for University and Research, Italy), with the
support of the Consortium for Magnetofluid Dynamics,
Trieste, Italy.
[1] M. Ellero and M. Tessarotto, Bull. Am Phys. Soc. 45 (9),
40 (2000).
[2] M. Tessarotto and M. Ellero, RGD24 (Italy, July 10-16,
2004), AIP Conf. Proc. 762, 108 (2005).
[3] M. Ellero and M. Tessarotto, Physica A 355, 233 (2005).
[4] M. Tessarotto and M. Ellero, Physica A 373, 142 (2007);
arXiv: physics/0602140.
[5] M. Tessarotto and M. Ellero, “On the uniqueness of con-
tinuous inverse kinetic theory for incompressible fluids,”
in press on AIP Conf. Proc., RGD25 (St. Petersburg,
Russia, July 21-28, 2006); arXiv:physics/0611113.
[6] M. Tessarotto, M. Ellero, N. Aslan, M. Mond and
P. Nicolini, “An exact pressure evolution equation
for the incompressible Navier-Stokes equationsInverse”,
arXiv:physics/0612072 (2006).
[7] M. Tessarotto, M. Ellero and P. Nicolini, Phys.Rev. A
75, 012105 (2007); arXiv:quantum-ph/060691.
[8] G.R. McNamara and G. Zanetti, Phys. Rev. Lett. 61,
2332 (1988).
[9] F. Higuera, S. Succi and R. Benzi, Europhys. Lett. 9,345
(1989).
[10] S. Succi, R. Benzi, and F. Higuera, Physica D 47, 219
(1991).
[11] S. Chen, H. Chen, D. O. Martinez, and W. H. Matthaeus,
Phys.Rev. Lett. 67, 3776 (1991).
[12] R. Benzi et al., Phys. Rep. 222, 145 (1992).
[13] H. Chen, S. Chen, and W. Matthaeus, Phys. Rev. A 45,
R5339 (1992).
[14] S. Succi, The Lattice Boltzmann Equation for Fluid Dy-
namics and Beyond (Numerical Mathematics and Scien-
tific Computation), Oxford Science Publications (2001).
[15] U. Frisch, B. Hasslacher, and Y. Pomeau, Rev.Lett. 56,
1505 (1986).
[16] U. Frisch, D. d’Humieres, B. Hasslachaer, P. Lallemand,
Y. Pomeau, and J.-P. Rivet, Complex Syst. 1, 649 (1987).
[17] X. He and L-S. Luo, Phys. Rev. E 56, 6811 (1997).
[18] D. O. Martinez, W. H. Matthaeus, S. Chen, et al., Phys.
Fluids 6, 1285 (1994).
[19] S. L. Hou, Q. S. Zou, S. Y. Chen, G. Doolen, and A. C.
Cogley, J. Comput. Phys. 118, 329 (1995).
[20] X. He and G. Doolen, J. Comput. Phys. 134, 306 (1997).
http://arxiv.org/abs/physics/0602140
http://arxiv.org/abs/physics/0611113
http://arxiv.org/abs/physics/0612072
[21] X. He, G.D. Doolen, and T. Clark, J. Comp. Physics 179
(2), 439 (2002).
[22] S. Ansumali and I. V. Karlin, Phys. Rev. E 65, 056312
(2002).
[23] Y. Shi, T. S. Zhao and Z. L. Guo, Phys. Rev.E 73, 026704
2006.
[24] X. Shan and X. He, Phys. Rev. Lett. 80, 65 (1998).
[25] S. Ansumali, I.V. Karlin, and H. C. Öttinger, Euro-
phys.Lett. 63, 798 (2003).
[26] S.S. Chikatamarla and I.V. Karlin, Phys. Rev.Lett. 97,
190601 (2006).
[27] A. Bardow, I.V. Karlin, and A. A. Gusev, Europhys.
Lett. 75, 434 (2006).
[28] S. Ansumali and I.V. Karlin, Phys. Rev. Lett. 95, 260605
(2005).
[29] S. Succi, I.V. Karlin, and H. Chen, Rev. Mod. Phys. 74,
1203 (2002).
[30] B. M. Boghosian, J. Yepez, P.V. Coveney, and A. J. Wag-
ner, Proc. R. Soc. A 457, 717 (2001).
[31] M. E. McCracken and J. Abraham, Phys. Rev.E 71,
036701 (2005).
[32] I.V. Karlin and S. Succi, Phys. Rev. E 58, R4053 (1998).
[33] I.V. Karlin, A. N. Gorban, S. Succi, and V. Boffi,
Phys.Rev. Lett. 81, 6 (1998).
[34] I.V. Karlin, A. Ferrante, and H. C. Öttinger, Europhys.
Lett. 47, 182 (1999).
[35] W.-A. Yong and L.-S. Luo, Phys. Rev. E 67, 051105
(2003).
[36] P. J. Dellar, Europhys. Lett. 57, 690 (2002).
[37] P. Lallemand and L.-S. Luo, Phys. Rev. E 61, 6546
(2000).
[38] P. Lallemand and L.-S. Luo, Phys. Rev. E 68, 036706
(2003).
[39] E.T. Shannon, Phys. Rev. 106, 620 (1957).
[40] T. Abe, J. Comput. Phys. 131, 241 (1997).
[41] N. Cao, S.Chen, S.Jin, and D. Mart́ınez, Phys. Rev. E
55, R21 (1997).
[42] Y.-H. Qian, D. d’Humieres, and P. Lallemand, Euro-
phys.Lett. 17, 479 (1992).
[43] P. L. Bhatnagar, E. P. Gross, and M. Krook, Phys. Rev.
A 94,511 (1954).
[44] X. He and L.S. Luo, Phys. Rev. E 55, R6333 (1997).
[45] Q. Zou and X. He, Phys. Fluids 9, 1951 (1997).
[46] P.A.Skordos, Phys.Review E 48, 4823 (1993).
[47] R. Comubert, D. d’Humieres, and D. Levermore, Physica
D 47, 241 (1991).
[48] D.P. Ziegler, J. Stat. Phys. 71, 1171 (1993).
[49] I. Ginzbourg and P. M. Adler, J. Phys. II France 4, 191
(1994).
[50] A.J.C. Ladd, J. Fluid Mech. 271, 285 (1994).
[51] D.R. Noble, S. Chen, J.G. Georgiadis, R.O. Buckius,
Phys. Fluids 7 (l), 203 (1995).
[52] Q. Zou and H. He, Phys. Fluids 7, 1788 (1996).
[53] R.S. Maier, R.S. Bernard, and D.W. Grunau, Phys. Flu-
ids 7, 1788 (1996).
[54] S. Chen, Daniel Martınez and R.Mei, Phys. Fluids 8(9),
2528 (1996).
[55] R. Mei, L.S. Luo, and W. Shyy, J. Comput. Phys. 155,
307 (1999).
[56] M. Bouzidi, M. Firadaouss, and P. Lallemand, Phys. Flu-
ids 13, 452 (2001).
[57] S. Ansumali and I. V. Karlin, Phys. Rev. E 66, 026311
(2002).
[58] I. Ginzburg and D.d’Humieres, Phys. Rev. E 68, 066614
(2003).
[59] M. Junk and Z. Yang, Phys. Rev. E 72, 066701 (2005).
[60] S Ansumali and I. V. Karlin, Phys. Rev. E 62, 7999
(2000).
[61] S. Ansumali, I.V. Karlin, and H. C. Ottinger, Europhys.
Lett. 63, 798 (2003).
[62] B.M. Boghosian, P.J. Love, P. V. Coveney, I.V. Karlin,
S.Succi, J.Yepez, Phys.Rev E 68, 025103(R) (2003).
[63] A.J. Chorin, J.Comp.Phys. 2, 12 (1967).
ABSTRACT
  In spite of the large number of papers appeared in the past which are devoted
to the lattice Boltzmann (LB) methods, basic aspects of the theory still remain
unchallenged. An unsolved theoretical issue is related to the construction of a
discrete kinetic theory which yields \textit{exactly} the fluid equations,
i.e., is non-asymptotic (here denoted as \textit{LB inverse kinetic theory}).
The purpose of this paper is theoretical and aims at developing an inverse
kinetic approach of this type. In principle infinite solutions exist to this
problem but the freedom can be exploited in order to meet important
requirements. In particular, the discrete kinetic theory can be defined so that
it yields exactly the fluid equation also for arbitrary non-equilibrium (but
suitably smooth) kinetic distribution functions and arbitrarily close to the
boundary of the fluid domain. Unlike previous entropic LB methods the theorem
can be obtained without functional constraints on the class of the initial
distribution functions. Possible realizations of the theory and asymptotic
approximations are provided which permit to determine the fluid equations
\textit{with prescribed accuracy.} As a result, asymptotic accuracy estimates
of customary LB approaches and comparisons with the Chorin artificial
compressibility method are discussed.

<|endoftext|><|startoftext|>
Phonon-mediated decay of an atom in a surface-induced potential
Fam Le Kien,1,∗ S. Dutta Gupta,1,2 and K. Hakuta1
Department of Applied Physics and Chemistry, University of Electro-Communications, Chofu, Tokyo 182-8585, Japan
School of Physics, University of Hyderabad, Hyderabad, India
(Dated: August 4, 2021)
We study phonon-mediated transitions between translational levels of an atom in a surface-induced
potential. We present a general master equation governing the dynamics of the translational states
of the atom. In the framework of the Debye model, we derive compact expressions for the rates
for both upward and downward transitions. Numerical calculations for the transition rates are
performed for a deep silica-induced potential allowing for a large number of bound levels as well
as free states of a cesium atom. The total absorption rate is shown to be determined mainly by
the bound-to-bound transitions for deep bound levels and by bound-to-free transitions for shallow
bound levels. Moreover, the phonon emission and absorption processes can be orders of magnitude
larger for deep bound levels as compared to the shallow bound ones. We also study various types of
transitions from free states. We show that, for thermal atomic cesium with temperature in the range
from 100 µK to 400 µK in the vicinity of a silica surface with temperature of 300 K, the adsorption
(free-to-bound decay) rate is about two times larger than the heating (free-to-free upward decay)
rate, while the cooling (free-to-free downward decay) rate is negligible.
PACS numbers: 34.50.Dy,33.70.Ca
I. INTRODUCTION
Over the past few years, tight confinement of cold
atoms has drawn considerable attention. The interest in
this area is motivated not only by the fundamental na-
ture of the problem, but also by its potential applications
in atom optics and quantum information. A method
for microscopic trapping and guiding of individual atoms
along a nanofiber has been proposed [1]. Surface–atom
quantum electrodynamic effects have constituted another
interesting area, where a great deal of work has been
carried out. Modification of spontaneous emission of
an atom [2] and radiative exchange between two distant
atoms [3] mediated by a nanofiber have been investigated.
Surface-induced deep potentials have played a major role
and have received due attention in recent years. Oria
et al. have studied various theoretical schemes to load
atoms into such potentials [4, 5]. A rigorous theory of
spontaneous decay of an atom in a surface-induced po-
tential invoking the density-matrix formalism has been
developed [6]. The role of interference between the emit-
ted and reflected fields and also the role of transmission
into the evanescent modes were identified. Further cal-
culations on the excitation spectrum have been carried
out [7]. Bound-to-bound transitions were shown to lead
to significant effects like a large red tail of the excita-
tion spectrum as compared to the weak consequences of
free-to-bound transitions. A crucial step in this direction
was the experimental observation of the excitation spec-
trum and the channeling of the fluorescent photons along
the nanofiber [8], opening up avenues for novel quantum
information devices.
In most of the problems involving surface–atom inter-
action, the macroscopic surface is usually kept at room
temperature. Thus the pertinent question that can be
asked is what would be the effect of heating on the cold
atoms. It is understood that transfer of heat to the
trapped atoms will lead to a change in the occupation
probability of the vibrational levels as well as their co-
herence. Phonon-induced changes in the populations of
the vibrational levels have been studied by several groups
[5, 9, 10]. In a nice and compact treatment based on the
dyadic Green function and the Fermi golden rule, Henkel
et al. showed that the effects can be very different de-
pending on the nature of the atomic/molecular species
[9]. The time scales for various species were estimated.
It should be stressed that the trap considered by Henkel
et al. was not necessarily a surface trap and misses out
on many of the aspects of the surface–atom interaction
[9]. Based on the assumption that the surface–atom in-
teraction can be represented by a Morse potential, the
phonon-mediated decay was estimated by Oria et al. [5].
Their estimate was based on the formalism developed by
Gortel et al. [10]. However, all the previous theories
focus on only the transition rates and thus are not gen-
eral enough. In this paper, we present a general density-
matrix formalism to calculate the phonon-mediated de-
cay of populations as well as the changes in coherence.
We derive the relevant master equation for the density
matrix of the atom. We emphasize that our density-
matrix equation describes the full dynamics of the cou-
pling between trapped atoms and phonons and does not
assume any particular form of the trapping potential.
Under the Debye approximation, we derive compact ex-
pressions for the phonon-mediated decay rates. Numer-
ical calculations are carried out assuming the potential
model considered in [4]. In contrast to the previous work,
we include a large number of vibrational levels due to the
deep surface–atom potential. We show that there can be
significant differences in the decay rates when the initial
level is chosen as one of the shallow or deep bound levels.
We also calculate and analyze the decay rates for various
http://arxiv.org/abs/0704.0340v1
types of transitions from free states.
The paper is organized as follows. In Sec. II we de-
scribe the model. In Sec. III we derive the basic dynam-
ical equations for the phonon-mediated decay processes.
In Sec. IV we present the results of numerical calcula-
tions. Our conclusions are given in Sec. V.
II. DESCRIPTION OF THE MODEL SYSTEM
We assume the whole space to be divided into two re-
gions, namely, the half-space x < 0, occupied by a nondis-
persive nonabsorbing dielectric medium (medium 1), and
the half-space x > 0, occupied by vacuum (medium 2).
We examine a single atom moving in the empty half-
space x > 0. We assume that the atom is in a fixed
internal state |i〉 with energy h̄ωi. Without loss of gen-
erality, we assume that the energy of the internal state
|i〉 is zero, i.e. ωi = 0. We describe the interaction be-
tween the atom and the surface. We first consider the
surface-induced interaction potential and then add the
atom-phonon interaction.
A. Surface-induced interaction potential
In this subsection, we describe the interaction between
the atom and the surface in the case where thermal vi-
brations of the surface are absent. The potential en-
ergy of the surface–atom interaction is a combination of
a long-range van der Waals attraction and a short-range
repulsion [11]. Despite a large volume of research on the
surface–atom interaction, due to the complexity of sur-
face physics and the lack of data, the actual form of the
potential is yet to be ascertained [11]. For the purpose
of numerical demonstration of our formalism, we choose
the following model for the potential [4, 11]:
U(x) = Ae−αx − C3
. (1)
Here, C3 is the van der Waals coefficient, while A and
α determine the height and range, respectively, of the
surface repulsion. The potential parameters C3, A, and
α depend on the nature of the dielectric and the atom.
In numerical calculations, we use the parameters of fused
silica, for the dielectric, and the parameters of ground-
state atomic cesium, for the atom. The parameters for
the interaction between silica and ground-state atomic
cesium are theoretically estimated to be C3 = 1.56 kHz
µm3, A = 1.6× 1018 Hz, and α = 53 nm−1 [6].
We introduce the notation ϕν(x) for the eigenfunc-
tions of the center-of-mass motion of the atom in the
potential U(x). They are determined by the stationary
Schrödinger equation
+ U(x)
ϕν(x) = Eνϕν(x). (2)
Here m is the mass of the atom. In the numerical ex-
ample with atomic cesium, we have m = 132.9 a.u.
= 2.21 × 10−25 kg. The eigenvalues Eν are the center-
of-mass energies of the translational levels of the atom.
These eigenvalues are the shifts of the energies of the
translational levels from the energy of the internal state
|i〉. Without loss of generality, we assume that the
center-of-mass eigenfunctions ϕν(x) are real functions,
i.e. ϕ∗ν(x) = ϕν(x).
In Fig. 1, we show the potential U(x) and the wave
functions ϕν(x) of a number of bound levels with en-
ergies in the range from −1 GHz to −5 MHz. We also
plot the wave function of a free state with energy of about
4.25 MHz. In order to have some estimate about the spa-
tial extent of a wave function ϕν(x), we define a crossing
point xcross, which corresponds to the rightmost solution
of the equation U(x) = Eν . Note that, for shallow lev-
els, the wave function generally peaks close to the point
xcross. We plot the eigenvalue modulus |Eν | and the cross-
ing point xcross in Figs. 2(a) and 2(b), respectively. It is
clear from the figure that, for ν in the range from 0 to
300, the eigenvalue varies dramatically from about 158
THz to about 322 kHz, while the wave function extends
only up to 170 nm.
FIG. 1: Energies and wave functions of the center-of-mass
motion of an atom in a surface-induced potential. The pa-
rameters of the potential are C3 = 1.56 kHz µm
3, A =
1.6 × 1018 Hz, and α = 53 nm−1. The mass of the atom
is m = 2.21 × 10−25 kg. We plot bound levels with energies
in the range from −1 GHz to −5 MHz and also a free state
with energy of about 4.25 MHz.
FIG. 2: Eigenvalue modulus |Eν | (a) and crossing point xcross
(b) as functions of the vibrational quantum number ν. The
parameters used are as in Fig. 1.
We introduce the notation |ν〉 = |ϕν〉 and ων = Eν/h̄
for the state vectors and frequencies of translational lev-
els. Then, the Hamiltonian of the atom in the surface-
induced potential can be represented in the diagonal form
h̄ωνσνν . (3)
Here, σνν = |ν〉〈ν| is the population operator for the
translational level ν. We emphasize that the summation
over ν includes both the discrete (Eν < 0) and continuous
(Eν > 0) spectra. The levels ν with Eν < 0 are called
the bound (or vibrational) levels. In such a state, the
atom is bound to the surface. It is vibrating, or more
exactly, moving back and forth between the walls formed
by the van der Waals part and the repulsive part of the
potential. The levels ν with Eν > 0 are called the free (or
continuum) levels. The center-of-mass wave functions of
the bound states are normalized to unity. The center-of-
mass wave functions of the free states are normalized to
the delta function of energy.
B. Atom–phonon interaction
In this subsection, we incorporate the thermal vibra-
tions of the solid into the model. Due to the thermal
effects, the surface of the dielectric vibrates. The surface-
induced potential for the atom is then U(x− xs), where
xs is the displacement of the surface from the mean po-
sition 〈xs〉 = 0. We approximate the vibrating potential
U(x− xs) by expanding it to the first order in xs,
U(x− xs) = U(x) − U ′(x)xs. (4)
The first term, U(x), when combined with the kinetic
energy p2/2m, yields the Hamiltonian HA [see Eq. (3)],
which leads to the formation of translational levels of
the atom. The second term, −U ′(x)xs, accounts for the
thermal effects in the interaction of the atom with the
solid. Note that the quantity F = −U ′(x) is the force
of the surface upon the atom. Hence, the force of the
atom upon the surface is −F = U ′(x) and, consequently,
U ′(x)xs is the work required to displace the surface for
a small distance xs.
It is well known that, for a smooth surface, the gas
atom interacts only with the phonons polarized along
the x direction [10]. In the harmonic approximation, we
2MNωq
iqR + b†qe
−iqR). (5)
Here, M is the mass of a particle of the solid, N is
the particle number density, ωq and q are the frequency
and wave vector of the x-polarized acoustic phonons, re-
spectively, R = (0, y, z) is the lateral component of the
position vector (x, y, z) of the atom, and bq and b
q are
the annihilation and creation phonon operators, respec-
tively. Without loss of generality, we choose R = 0.
Meanwhile, the operator U ′ can be decomposed as U ′ =
νν′ σνν′ 〈ν|U ′|ν′〉, where σνν′ = |ν〉〈ν′| is the operator
for the translational transition ν ↔ ν′. Hence, the en-
ergy term −U ′(x)xs leads to the atom–phonon interac-
tion Hamiltonian [10]
HI = h̄
S(bq + b
q), (6)
gνν′σνν′ . (7)
Here we have introduced the atom–phonon coupling co-
efficients
gνν′ =
Fνν′√
2MNh̄
, (8)
Fνν′ = −
ϕν(x)U
′(x)ϕν′ (x)dx (9)
being the matrix elements for the force of the surface
upon the atom. We note that Fνν′ = −mω2νν′xνν′ , where
xνν′ = 〈ν|x|ν′〉 and ωνν′ = ων −ων′ are the surface–atom
dipole matrix element and the translational transition
frequency, respectively. Hence, the coupling coefficient
gνν′ depends on the dipole matrix element xνν′ and the
transition frequency ωνν′ . Since ωνν = 0, we have gνν =
We note that the Hamiltonian of the x-polarized acous-
tic phonons is given by
h̄ωqb
qbq. (10)
The total Hamiltonian of the atom–phonon system is
H = HA +HI +HB. (11)
We use the above Hamiltonian to study the phonon-
mediated decay of the atom.
III. DYNAMICS OF THE ATOM
In this section, we present the basic equations for the
phonon-mediated decay processes. We derive a general
master equation for the reduced density operator of the
atom in subsection IIIA, obtain analytical expressions
for the relaxation rates and frequency shifts in subsec-
tion III B, and calculate the rates and the shifts in the
framework of the Debye model in subsection III C.
A. Master equation
In the Heisenberg picture, the equation for the phonon
operator bq(t) is
ḃq(t) = −iωqbq(t)−
S(t), (12)
which has a solution of the form
bq(t) = bq(t0)e
−iωq(t−t0) − iWq(t). (13)
Here, t0 is the initial time and Wq is given by
Wq(t) =
e−iωq(t−τ)S(τ) dτ. (14)
Consider an arbitrary atomic operator O which acts only
on the atomic states but not on the phonon states. The
time evolution of this operator is governed by the Heisen-
berg equation
∂O(t)
[HA(t) +HI(t),O(t)], (15)
which, with account of Eqs. (6) and (13), yields
∂O(t)
[HA(t),O(t)]
[S(t),O(t)][bq(t0)e−iωq(t−t0) − iWq(t)]
[b†q(t0)e
iωq(t−t0) + iW †q(t)][O(t), S(t)].
We assume the initial density of the atom–phonon sys-
tem to be the direct product state
ρΣ(t0) = ρ(t0)ρB(t0), (17)
with the atom in an arbitrary state ρ(t0) and the phonons
in a thermal state
ρB(t0) = Z
−1 exp[−HB(t0)/kBT ]. (18)
Here, Z is the normalization constant and T is the tem-
perature of the phonon bath. For the initial condition
(17), the Bogolubov’s lemma [12], applied to an arbitrary
operator Θ(t), asserts the following:
〈Θ(t)bq(t0)〉 = n̄q〈[bq(t0),Θ(t)]〉, (19)
where the mean number of phonons in the mode q is
given by
n̄q =
exp(h̄ωq/kBT )− 1
. (20)
Let Θ be an atomic operator. We then have the commu-
tation relation [bq(t),Θ(t)] = 0, which yields
[bq(t0),Θ(t)] = ie
iωq(t−t0)[Wq(t),Θ(t)]. (21)
Combining Eq. (19) with Eq. (21) leads to
〈Θ(t)bq(t0)〉 = ieiωq(t−t0)n̄q〈[Wq(t),Θ(t)]〉. (22)
We perform the quantum mechanical averaging for ex-
pression (16) and use Eq. (22) to eliminate the phonon
operators bq(t0) and b
q(t0). The resulting equation can
be written as
∂〈O(t)〉
〈[HA(t),O(t)]〉
n̄q + 1√
〈[S(t),O(t)]Wq(t) +W †q(t)[O(t), S(t)]〉
〈Wq(t)[O(t), S(t)] + [S(t),O(t)]W †q(t)〉.
We note that Eq. (23) is exact. It does not contain
phonon operators explicitly. The dependence on the
phonon operators is hidden in the time shift of the oper-
ator S(τ) in expression (14) for the operator Wq(t).
We now show how the dependence of the operator
Wq(t) on the phonon operators can be approximately
eliminated. We assume that the atom–phonon coupling
coefficients gνν′ are small. The use of the zeroth-order
approximation σνν′(τ) = σνν′ (t)e
iωνν′(τ−t) in the expres-
sion for S(τ) [see Eq. (7)] yields
S(τ) =
gνν′σνν′ (t)e
iωνν′(τ−t), (24)
which is accurate to first order in the coupling coeffi-
cients. Inserting Eq. (24) into Eq. (14) gives
Wq(t) =
gνν′σνν′(t)δ−(ων′ν − ωq), (25)
where
δ−(ω) = lim
e−i(ω+iǫ)τ dτ
δ(ω). (26)
Here, in order to take into account the effect of adiabatic
turn-on of interaction, we have added a small positive
parameter ǫ to the integral and have used the limit t0 →
−∞. Introducing the notation
gνν′σνν′δ−(ων′ν − ωq), (27)
we can rewrite Eq. (23) in the form
∂〈O(t)〉
〈[HA(t),O(t)]〉
(n̄q + 1)〈[S(t),O(t)]Kq(t) +K†q(t)[O(t), S(t)]〉
n̄q〈Kq(t)[O(t), S(t)] + [S(t),O(t)]K†q(t)〉. (28)
In order to examine the time evolution of the reduced
density operator ρ(t) of the atom in the Schrödinger
picture, we use the relation 〈O(t)〉 = Tr[O(t)ρ(0)] =
Tr[O(0)ρ(t)], transform to arrange the operator O(0) at
the first position in each operator product, and eliminate
O(0). Then, we obtain the Liouville master equation
∂ρ(t)
= − i
[HA, ρ(t)]
(n̄q + 1){[Kqρ(t), S] + [S, ρ(t)K†q]}
n̄q{[S, ρ(t)Kq] + [K†qρ(t), S]}. (29)
Equations (28) and (29) are valid to second order in
the coupling coefficients. These equations allow us to
study the time evolution and dynamical characteristics of
the atom interacting with the thermal phonon bath. We
note that Eq. (29) is a particular form of the Zwanzig’s
generalized master equation, which can be obtained by
the projection operator method [13].
B. Relaxation rates and frequency shifts
We use Eq. (29) to derive an equation for the matrix
elements ρjj′ ≡ 〈j|ρ|j′〉 of the reduced density operator
of the atom. The result is
∂ρjj′
= −iωjj′ρjj′ +
(γejj′νν′ + γ
jj′νν′)ρνν′
[(γejν + γ
jν)ρνj′ + (γ
j′ν + γ
j′ν)ρjν ], (30)
where the coefficients
γejj′νν′ = 2π
n̄q + 1
gjνgj′ν′ [δ−(ωνj − ωq)
+ δ+(ων′j′ − ωq)],
γejν = 2π
n̄q + 1
gjµgνµδ−(ωνµ − ωq) (31)
γajj′νν′ = 2π
gjνgj′ν′ [δ−(ωj′ν′ − ωq)
+ δ+(ωjν − ωq)],
γajν = 2π
gjµgνµδ+(ωµν − ωq) (32)
are the decay parameters associated with the phonon
emission and absorption, respectively. Here, the nota-
tion δ+(ω) = δ
−(ω) has been used.
Equation (30) describes phonon-induced variations in
the populations and coherences of the translational levels
of the atom. We analyze the characteristics of the relax-
ation processes. For simplicity of mathematical treat-
ment, we first consider only transitions from discrete lev-
els. The equation for the diagonal matrix element ρjj for
a discrete level j can be written in the form
(γejjνν + γ
jjνν )ρνν
− (γejj + γajj + c.c.)ρjj
+ off-diagonal terms. (33)
When the off-diagonal terms are neglected, Eq. (33) re-
duces to a simple rate equation. It is clear from Eq. (33)
that the rate for the downward transition from an upper
level l to a lower level k (k < l) is
Rekl = γ
kkll = 2π
n̄q + 1
g2lkδ(ωlk − ωq), (34)
while the rate for the upward transition from a lower level
k to an upper level l (l > k) is
Ralk = γ
llkk = 2π
g2lkδ(ωlk − ωq). (35)
Equations (34) and (35) are in agreement with the re-
sults of Gortel et al. [10], obtained by using the Fermi
golden rule. We note that Rekl and R
lk with l ≤ k are
mathematically equal to zero because they have no phys-
ical meaning. For convenience, we introduce the notation
Rlk = R
lk, R
lk, or 0 for l < k, l > k, or l = k, respec-
tively. It is clear that the off-diagonal coefficients Rlk
with l 6= k are the rates of transitions. However, the di-
agonal coefficients Rkk have no physical meaning and are
mathematically equal to zero.
As seen from Eq. (33), the phonon-mediated depletion
rate of a level k is Γkk = 2Re(γ
kk + γ
kk). The explicit
expression for this rate is
Γkk = 2π
n̄q + 1
g2kµδ(ωkµ − ωq)
g2µkδ(ωµk − ωq). (36)
We note that Γkk =
µk +R
µk) =
µ Rµk. We can
write Γkk = Γ
kk + Γ
kk, where
Γekk =
Reµk (37)
Γakk =
Raµk (38)
are the contributions due to downward transitions
(phonon emission) and upward transitions (phonon ab-
sorption), respectively. In the above equations, the sum-
mation over µ can be extended to cover not only the
discrete levels but also the continuum levels.
Meanwhile, the equation for the off-diagonal matrix
element ρlk for a pair of discrete levels l and k can be
written in the form ∂ρlk/∂t = −(iωlk + γell + γall + γe∗kk +
γa∗kk)ρlk + . . . , or, equivalently,
= −i(ωlk +∆lk − iΓlk)ρlk + . . . . (39)
Here the frequency shift ∆lk is given by
∆lk =
n̄q + 1
ωlµ − ωq
ωµk + ωq
ωlµ + ωq
ωµk − ωq
, (40)
while the coherence decay rate Γlk is expressed as
Γlk = π
n̄q + 1
g2lµδ(ωlµ − ωq) + g2kµδ(ωkµ − ωq)
g2µlδ(ωµl − ωq) + g2µkδ(ωµk − ωq)
When we set l = k in Eq. (40), we find ∆kk = 0.
When we set l = k in Eq. (41), we recover Eq. (36).
We note that Γlk =
µl + R
µk + R
µl + R
µk)/2 =
µ(Rµl + Rµk)/2. Comparison between Eqs. (41) and
(36) yields the relation Γlk = (Γll +Γkk)/2. We can also
write Γlk = Γ
lk + Γ
lk, where Γ
µl + R
µk)/2
and Γalk =
µl + R
µk)/2 are the contributions due
to downward transitions (phonon emission) and upward
transitions (phonon absorption), respectively. In the
above equations, the summation over µ can be extended
to cover not only the discrete levels but also the contin-
uum levels.
We now discuss phonon-mediated transitions from con-
tinuum (free) levels. We start by considering free-to-
bound transitions. For a continuum level f with energy
Ef > 0, the center-of-mass wave function ϕf (x) is nor-
malized per unit energy. In this case, the quantity Rνf
becomes the density of the transition rate. A free level f
can be approximated by a level of a quasicontinuum [14].
A discretization of the continuum can be realized by using
a large box of length L with reflecting boundary condi-
tions [15]. We label En the energies of the eigenstates
in the box and φn(x) the corresponding wave functions.
Note that such states are standing-wave states [14, 15].
The relation between a quasicontinuum-state wave func-
tion φnf (x), normalized to unity in the box, and the cor-
responding continuum-state wave function ϕf (x), nor-
malized per unit energy, with equal energies Enf = Ef ,
is [15]
ϕf (x) ∼=
]−1/2
φnf (x)
)1/2 (
φnf (x). (42)
Consequently, for a single atom initially prepared in the
quasicontinuum standing-wave state |nf 〉 = |φnf 〉, the
rate for the transition to an arbitrary bound state |ν〉 is
approximately given by
Gνf =
vfRνf , (43)
where vf = (2Ef/m)1/2 is the velocity of the atom in the
initial standing-wave state |f〉. The phonon-mediated
free-to-bound decay rate (adsorption rate) is then given
Gνf , (44)
where the summation includes only bound levels. It is
clear from Eq. (43) that, in the continuum limit L → ∞,
the rate Gνf tends to zero. This is because a free atom
can be anywhere in free space and therefore the effect of
phonons on a single free atom is negligible.
In order to get deeper insight into the free-to-bound
transition rate density Rνf , we consider a macroscopic
atomic ensemble in the thermodynamic limit [14]. Sup-
pose that there are N0 atoms in a volume with a large
length L and a transverse cross section area S0. Assume
that all the atoms are in the same quasicontinuum state
|nf 〉 and interact with the dielectric independently. The
rate for the transitions of the atoms from the quasicon-
tinuum state |nf〉 to an arbitrary bound state |ν〉, defined
as the time derivative of the number of atoms in the state
|ν〉, is Dνf = N0Gνf . In order to get the rate for the con-
tinuum state |f〉, we need to take the thermodynamical
limit, where L → ∞ and N0 → ∞ but N0/L remains
constant. Then, the rate for the transitions of the atoms
from the continuum state |f〉 to an arbitrary bound
state |ν〉 is given by Dνf = πh̄ρ0S0vfRνf = 2πh̄NfRνf .
Here, ρ0 = N0/LS0 is the atomic number density and
Nf = ρ0S0vf/2 is the number of atoms incident into the
dielectric surface per unit time. It is clear that the tran-
sition rate Dνf is proportional to the incidence rate Nf
as well as the transition rate density Rνf . We emphasize
that Dνf is a characteristics for a macroscopic atomic en-
semble in the thermodynamic limit while Gνf is a mea-
sure for a single atom. When the length of the box, L,
and the number of atoms, N0, are finite, the dynamics of
the atoms cannot be described by the free-to-bound rate
Dνf directly. Instead, we must use the transition rate
per atom Gνf = Dνf/N0, which depends on the length
L of the box that contains the free atoms [see Eq. (43)].
In a thermal gas, the atoms have different velocities
and, therefore, different energies. For a thermal Maxwell-
Boltzmann gas with temperature T0, the distribution of
the kinetic energy Ef of the atomic center-of-mass motion
along the x direction is
P (Ef ) =
πkBT0
e−Ef/kBT0
. (45)
The transition rate to an arbitrary bound state |ν〉 is
then given by GνT0 =
GνfP (Ef ) dEf , i.e.
GνT0 =
e−Ef/kBT0RνfdEf , (46)
where λD = (2πh̄
2/mkBT0)
1/2 is the thermal de Broglie
wavelength. The phonon-mediated free-to-bound decay
rate (adsorption rate) is given by
GT0 =
GνT0 =
GfP (Ef ) dEf . (47)
In the above equation, the summation over ν includes
only bound levels. Note that Eq. (46) is in qualitative
agreement with the results of Refs. [5, 14].
It is easy to extend the above results to the case of
free-to-free transitions. Indeed, it can be shown that the
density of the rate for the transition from a quasicontin-
uum state |nf 〉, which corresponds to a free state |f〉, to
a different free state |f ′〉 is given by
Qf ′f =
vfRf ′f . (48)
For convenience, we introduce the notation Qef ′f = Qf ′f
or 0 for Ef ′ < Ef or Ef ′ ≥ Ef , respectively, and Qaf ′f =
Qf ′f or 0 for Ef ′ > Ef or Ef ′ ≤ Ef , respectively. Then, we
have Qf ′f = Q
f ′f , 0, or Q
f ′f for Ef ′ < Ef , Ef ′ = Ef , or
Ef ′ > Ef , respectively. The downward (phonon-emission)
and upward (phonon-absorption) free-to-free decay rates
for the free state |f〉 are given by
Qef =
Qef ′fdEf ′ (49)
Qaf =
Qaf ′fdEf ′ , (50)
respectively. The total free-to-free decay rate for the free
state |f〉 is Qf = Qef +Qaf =
Qf ′fdEf ′ .
For a thermal gas, we need to replace the transition
rate density Qf ′f and the decay rate Qf by Qf ′T0 =
Qf ′fP (Ef ) dEf and QT0 =
QfP (Ef ) dEf , respec-
tively, which are the averages of Qf ′f and Qf , respec-
tively, with respect to the energy distribution P (Ef )
of the initial state. Like in the other cases, we have
Qf ′T0 = Q
f ′T0
+Qaf ′T0 and QT0 = Q
+QaT0 , where
Qef ′T0 =
Qef ′fP (Ef ) dEf ,
Qaf ′T0 =
∫ Ef′
Qaf ′fP (Ef ) dEf (51)
are the downward and upward transition rate densities
QeT0 =
QefP (Ef ) dEf ,
QaT0 =
QafP (Ef ) dEf (52)
are the downward and upward decay rates. The thermal
decay ratesQeT0 andQ
describe the cooling and heating
processes, respectively. It can be easily shown thatQeT0 <
QaT0 , Q
> QaT0 , and Q
= QaT0 when T0 < T , T0 >
T , and T0 = T , respectively. The relation Q
< QaT0
(QeT0 > Q
), obtained for T0 < T (T0 > T ), indicates
the dominance of heating (cooling) of free atoms by the
surface.
C. Relaxation rates and frequency shifts in the
framework of the Debye model
In order to get insight into the relaxation rates and
frequency shifts, we approximate them using the Debye
model for phonons. In this model, the phonon frequency
ωq is related to the phonon wave number q as ωq = vq,
where v is the sound velocity. Furthermore, the summa-
tion over the first Brillouin zone is replaced by an integral
over a sphere of radius qD = (6π
2N/V )1/3, where V is the
volume of the solid. The Debye frequency and the Debye
temperature are given by ωD = vqD and TD = h̄ωD/kB,
respectively. For fused silica, we have v = 5.96 km/s,
NM/V = 2.2 g/cm3, and M = 9.98× 10−26 kg [16]. Us-
ing these parameters, we find qD = 109.29 × 106 cm−1,
ωD = 10.4 THz, and TD = 498 K. In order to perform
the summation over phonon states in the framework of
the Debye model, we invoke the thermodynamic limit,
i.e., replace
· · · = V
|q|≤qD
. . . dq =
. . . ω2qdωq. (53)
Then, for transitions between an upper level l and a lower
level k, where 0 < ωlk < ωD, Eqs. (34) and (35) yield
Rekl =
Mh̄ω3D
(n̄lk + 1)ωlkF
lk (54)
Ralk =
Mh̄ω3D
n̄lkωlkF
lk. (55)
Here, n̄lk is given by Eq. (20) with ωq replaced by ωlk.
We emphasize that, according to Eqs. (54) and (55),
the phonon-emission rate Rekl and the phonon-absorption
rate Ralk depend not only on the matrix element Flk of
the force but also on the translational transition fre-
quency ωlk. The frequency dependences of the transi-
tion rates are comprised of the frequency dependences
of the mean phonon number n̄lk, the phonon mode den-
sity 3Nω2lk/ω
D, and the matrix element Flk = −U ′lk =
−mω2lkxlk of the force. An additional factor comes from
the presence of the phonon frequency in Eq. (5) for
the surface displacement and, consequently, in the atom–
phonon interaction Hamiltonian (6). It is clear that an
increase in the phonon frequency leads to a decrease in
the mean phonon number and an increase in the phonon
mode density. The matrix element of the force usu-
ally first increases and then decreases with increasing
phonon frequency. Due to the existence of several com-
peting factors, the frequency dependences of the tran-
sition rates are rather complicated. They usually first
increase and then decrease with increasing phonon fre-
quency. We note that, for transitions with ωlk > ωD, we
have Rekl = R
lk = 0.
We conclude this section by noting that the use of Eq.
(53) in Eq. (40) yields the frequency shift
∆lk = ∆
lk +∆
lk , (56)
where
2Mh̄ω3D
F 2lµ
ωlµ − ω
F 2µk
ωµk + ω
ωdω (57)
Mh̄ω3D
ω2lµ − ω2
ω2µk − ω2
n̄ωωdω
are the zero- and finite-temperature contributions, re-
spectively. In Eq. (58), n̄ω is given by Eq. (20) with ωq
replaced by ω.
IV. NUMERICAL RESULTS AND
DISCUSSIONS
In this section, we present the numerical results based
on the analytical expressions derived in the previous
section for the phonon-mediated relaxation rates of the
translational levels of the atom. In particular, we use
Eqs. (54) and (55), obtained in the framework of the De-
bye model, for our numerical calculations. We consider
transitions from bound states as well as free states. The
transitions from bound states to other translational lev-
els occur in the case where the atom is initially already
adsorbed or trapped near the surface. The transitions
from free states to other translational levels occur in the
processes of adsorbing, heating, and cooling of free atoms
by the surface. Due to the difference in physics of the ini-
tial situations, we study the transitions from bound and
free states separately.
A. Transitions from bound states
FIG. 3: Phonon-emission rates Reν′ν from the vibrational lev-
els (a) ν = 280 and (b) ν = 120 to other levels ν′ as functions
of the lower-level energy Eν′ . The arrows mark the initial
states. The parameters of the solid are M = 9.98 × 10−26 kg
and ωD = 10.4 THz. The temperature of the phonon bath is
T = 300 K. Other parameters are as in Fig. 1.
FIG. 4: Phonon-absorption rates Raν′ν from the vibrational
levels (a) ν = 280 and (b) ν = 120 to other levels ν′ as func-
tions of the upper-level energy Eν′ . The left (right) panel
in each row corresponds to bound-to-bound (bound-to-free)
transitions. The arrows mark the initial states. The param-
eters used are as in Fig. 3. The temperature of the phonon
bath is T = 300 K.
We start from a given bound level and calculate the
rates of phonon-mediated atomic transitions, both down-
ward and upward. The profiles of the phonon-emission
(downward-transition) rate Reν′ν [see Eq. (54)] and the
phonon-absorption (upward-transition) rateRaν′ν [see Eq.
(55)] are shown in Figs. 3 and 4, respectively. The upper
(lower) part of each of these figures corresponds to the
case of the initial level ν = 280 (ν = 120), with energy
Eν = −156 MHz (Eν = −8.4 THz). The left (right) panel
of Fig. 4 corresponds to bound-to-bound (bound-to-free)
upward transitions. The temperature of the surface is
assumed to be T = 300 K. As seen from Figs. 3 and 4,
the transition rates have pronounced localized profiles.
Due to the competing effects of the mean phonon num-
ber, the phonon mode density, and the matrix element
of the force, the transition rates usually first increase
and then decrease with increasing phonon frequency. It
is clear from a comparison of Figs. 3(a) and 3(b) and
also a comparison of Figs. 4(a) and 4(b) that transitions
from shallow levels have probabilities orders of magni-
tude lower than those from deeper levels. The main rea-
son is that the wave functions of the shallow states are
spread further away from the surface than those for the
deep states. Due to this difference, the effects of the sur-
face vibrations are weaker for the shallow levels than for
the deep levels. Another pertinent feature that should
be noted from the figure is the following: Since transi-
tion frequencies involved are large, they may overshoot
the Debye frequency ωD = 10.4 THz, leading to a cutoff
on the lower (higher) side of the frequency axis for the
emission (absorption) curve.
In order to see the overall effect of the individual tran-
sition rates shown above, we add them up. First we ex-
amine the phonon-absorption rates of bound levels. The
total phonon-absorption rate Γaνν of a bound level ν is the
sum of the individual absorption rates Raµν over all the
upper levels µ, both bound and free [see Eq. (38)]. We
plot in Fig. 5 the contributions to Γaνν from two types
of transitions, bound-to-bound and bound-to-free (des-
orption) transitions. The solid curve of the figure shows
that the bound-to-bound phonon-absorption rate is large
(above 1010 s−1) for deep and intermediate levels. How-
ever, it reduces dramatically with increasing ν in the
region of large ν and becomes very small (below 10−5
s−1) for shallow levels. Meanwhile, the dashed curve of
Fig. 5 shows that the bound-to-free phonon-absorption
rate (i.e., the desorption rate) is zero for deep levels, since
the energy required for the transition is greater than the
Debye energy [5]. However, the desorption rate is sub-
stantial (above 105 s−1) for intermediate and shallow lev-
els. Thus, the total phonon-absorption rate Γaνν is mainly
determined by the bound-to-bound transitions in the case
of deep levels and by the bound-to-free transitions in the
case of shallow levels. One of the reasons for the dramatic
reduction of the bound-to-bound phonon-absorption rate
in the region of shallow levels is that the number of up-
per bound levels µ becomes small. The second reason is
that the frequency of each individual transition becomes
small, leading to a decrease of the phonon mode den-
sity. The third reason is that the center-of-mass wave
functions of shallow levels are spread far away from the
surface, leading to a reduction of the effect of phonons
on the atom.
Unlike the bound-to-bound phonon-absorption rate,
the bound-to-free phonon-absorption rate is substantial
in the region of shallow levels. This is because the free-
state spectrum is continuous and the range of the bound-
to-free transition frequency can be large (up to the De-
bye frequency ωD = 10.4 THz). The gradual reduction of
the bound-to-free phonon-absorption rate in the region of
shallow levels is mainly due to the reduction of the time
that the atom spends in the proximity of the surface.
FIG. 5: Contributions of bound-to-bound (solid curve) and
bound-to-free (dashed curve) transitions to the total phonon-
absorption rate Γaνν versus the vibrational quantum number
ν of the initial level. The parameters used are as in Fig. 3.
The temperature of the phonon bath is T = 300 K.
The total phonon-emission rate Γeνν [see Eq. (37)] and
the total phonon-absorption rate Γaνν [see Eq. (38)] are
shown in Fig. 6 by the solid and dashed curves, respec-
tively. It is clear from the figure that emission is com-
parable to but slightly stronger than absorption. Such a
dominance is due to the fact that phonon emission moves
the atom to a center-of-mass state closer to the surface
while phonon absorption changes the atomic state in the
opposite direction (see Figs. 1 and 2). Our results for the
rates are in good qualitative agreement with the results
of Oria et al., albeit with the Morse potential [5]. We
stress that we include a large number of vibrational lev-
els as a consequence of the deep silica–cesium potential.
Note that the earlier work on this theme involved much
fewer levels [5].
FIG. 6: Phonon-emission decay rate Γeνν (solid lines) and
phonon-absorption decay rate Γaνν (dashed lines) of a bound
level as functions of the vibrational quantum number ν. The
inset shows the rates in the linear scale to highlight the dif-
ferences in the dissociation limit. The parameters used are as
in Fig. 3. The temperature of the phonon bath is T = 300 K.
FIG. 7: Same as in Fig. 6 except that T = 30 K.
We next study the effect of temperature on the decay
rates. The results for the phonon-mediated decay rates
for T = 30 K are shown in Fig. 7. In contrast to Fig. 6,
the absorption rate is now much smaller than the corre-
sponding emission rate for both shallow and deep levels.
Thus, while it is difficult to distinguish the two log-scale
curves for deep and shallow levels at room temperature
(see Fig. 6), they are well resolved at low temperature.
B. Transitions from free states
We now calculate the rates for transitions from free
states to other levels. We first examine free-to-bound
transitions, which correspond to the adsorption process.
According to Eq. (43), the free-to-bound (more exactly,
quasicontinuum-to-bound) transition rate Gνf depends
not only on the continuum-to-bound transition rate den-
sity Rνf but also on the length L of the free-atom quan-
tization box. To be specific, we use in our numerical
calculations the value L = 1 mm, which is a typical size
of atomic clouds in magneto-optical traps [17].
FIG. 8: Free-to-bound transition rates Gνf for transitions
from the free plane-wave states with energies (a) Ef = 2 MHz
and (b) Ef = 3.1 THz to bound levels ν as functions of the
bound-level energy Eν . The arrows mark the energies of the
initial free states. The insets show Gνf on the log scale versus
Eν in the range from −200 MHz to −0.2 MHz to highlight
the rates to shallow bound levels. The length of the free-
atom quantization box is L = 1 mm. The temperature of the
phonon bath is T = 300 K. Other parameters are as in Fig. 3.
We plot in Fig. 8 the free-to-bound transition rate Gνf
[see Eq. (43)] as a function of the vibrational quantum
number ν. The upper (lower) part of the figure corre-
sponds to the case of the initial-state energy Ef = 2
MHz (Ef = 3.1 THz), which is close to the average ki-
netic energy per atom in an ideal gas with temperature
T0 = 200 µK (T0 = 300 K). We observe that the free-to-
bound transition rate first increases and then decreases
with increasing transition frequency ωfν = (Ef − Eν)/h̄.
Such behavior results from the competing effects of the
mean phonon number, the phonon mode density, and the
matrix element of the force, like in the case of bound-
to-bound transitions (see Fig. 3). We also see a cut-
off of the transition frequency, which is associated with
the Debye frequency. Comparison of Figs. 8(a) and 8(b)
shows that the transitions from low-energy free states
have probabilities orders of magnitude smaller than those
from high-energy free states. One of the reasons is that
the transition rate Gνf is proportional to the velocity
vf = (2Ef/m)1/2 [see Eq. (43)]. The dependence of
the transition rate density Rνf on the transition fre-
quency ωfν also plays an important role. Because of this,
the rates for the transitions from low-energy free states
to shallow bound levels are very small [see the inset of
Fig. 8(a)].
FIG. 9: Free-to-bound decay rate Gf as a function of the
free-state energy Ef . The inset highlights the magnitude and
profile of the decay rate for Ef in the range from 0 to 20
MHz. The temperature of the phonon bath is T = 300 K.
Other parameters are as in Fig. 8.
We show in Fig. 9 the free-to-bound decay rate Gf
[see Eq. (44)], which is a characteristic of the adsorp-
tion process, as a function of the free-state energy Ef .
We see that Gf first increases and then decreases with
increasing Ef . The increase of Gf with increasing Ef in
the region of small Ef (see the inset) is mainly due to
the increase in the atomic incidence velocity vf . In this
region, we have Gf ∝ vf ∝
Ef [see Eqs. (43) and
(44)]. For Ef in the range from 0 to 20 MHz, which is
typical for atoms in magneto-optical traps, the maximum
value of Gf is on the order of 10
4 s−1 (see the inset of
Fig. 9). Such free-to-bound (adsorption) rates are sev-
eral orders of magnitude smaller than the bound-to-free
(desorption) rates (see the dashed curve in Fig. 5). The
decrease of Gf with increasing Ef in the region of large
Ef is mainly due to the reduction of the atom–phonon
coupling coefficients.
FIG. 10: Free-to-bound transition rates GνT0 for transitions
from the thermal states with temperatures (a) T0 = 200 µK
and (b) T0 = 300 K to bound levels ν as functions of the
bound-level energy Eν . The insets show GνT0 on the log scale
versus Eν in the range from −200 MHz to −0.2 MHz to high-
light the rates to shallow bound levels. The temperature of
the phonon bath is T = 300 K. Other parameters are as in
Fig. 8.
FIG. 11: Free-to-bound decay rate GT0 as a function of the
atomic temperature T0 in the ranges (a) from 100 µK to 400
µK and (b) from 50 K to 350 K. The temperature of the
phonon bath is T = 300 K. Other parameters are as in Fig. 8.
In a thermal gas, the adsorption process is charac-
terized by the transition rate GνT0 [see Eq. (46)] and
the decay rate GT0 [see Eq. (47)], which are the av-
erages of the free-to-bound transition rate Gνf and the
free-to-bound decay rate Gf , respectively, over the free-
state energy distribution (45). We plot the free-to-bound
transition rate GνT0 and the free-to-bound decay rate
GT0 in Figs. 10 and 11, respectively. Comparison be-
tween Figs. 10(a) and 9(a) shows that the transition rates
from low-temperature thermal states and low-energy free
states look quite similar to each other. The reason is that
the spread of the energy distribution is not substantial
in the case of low temperatures. The spread of the en-
ergy distribution is however substantial in the case of
high temperatures, leading to the softening of the cut-
off frequency effect [compare Fig. 10(b) with Fig. 9(b)].
Figure 11 shows that the free-to-bound decay rate GT0
first increases and then reduces with increasing atomic
temperature T0. For T0 in the range from 100 µK to 400
µK, which is typical for atoms in magneto-optical traps,
the maximum value of GT0 is on the order of 10
4 s−1 [see
Fig. 11(a)]. Such free-to-bound (adsorption) rates are
several orders of magnitude smaller than the bound-to-
free (desorption) rates (see the dashed curve in Fig. 5).
Figure 11(a) shows that, in the region of low atomic tem-
perature T0, one has GT0 ∝
T0, in agreement with the
asymptotic behavior of Eqs. (46) and (47).
FIG. 12: Free-to-free transition rate densities Qf ′f for the
upward (solid lines) and downward (dashed lines) transitions
from the free states |f〉 with energies (a) Ef = 2 MHz and (b)
Ef = 3.1 THz to other free states |f
′〉 as functions of the final-
level energy Ef ′ . The arrows mark the energies of the initial
free states. The inset in part (a) shows Qf ′f versus Ef ′ in
the range from 0 to 4 MHz to highlight the small magnitude
of the rate density for downward transitions (dashed line).
The temperature of the phonon bath is T = 300 K. Other
parameters are as in Fig. 8.
We now examine free-to-free transitions, both upward
and downward, which corresponding to the heating and
cooling processes of free atoms by the surface. We plot in
Fig. 12 the free-to-free transition rate density Qf ′f [see
Eq. (48)] as a function of the final-level energy Ef ′ . The
upper (lower) part of the figure corresponds to the case
of the initial-state energy Ef = 2 MHz (Ef = 3.1 THz),
which is close to the average kinetic energy per atom in
an ideal gas with temperature T0 = 200 µK (T0 = 300
K). The rate densities are shown for the upward (phonon-
absorption) and downward (phonon-emission) transitions
by the solid and dashed lines, respectively. The fig-
ure shows that the free-to-free transition rate density in-
creases or decreases with increasing transition frequency
if the latter is not too large or is large enough, respec-
tively. We also observe a signature of the Debye cutoff
of the phonon frequency. Comparison of Figs. 12(a) and
12(b) shows that transitions from low-energy free states
have probabilities orders of magnitude smaller than those
from high-energy free states. Figure 12(a) and its inset
show that, when the energy of the free state is low, the
free-to-free downward (cooling) transition rate is very
small as compared to the free-to-free upward (heating)
transition rate.
FIG. 13: Free-to-free upward and downward decay rates Qaf
(solid lines) and Qef (dashed lines) as functions of the energy
Ef of the initial free state. The insets highlight the magni-
tudes and profiles of the decay rates for Ef in the range from
0 to 20 MHz. The temperature of the phonon bath is T = 300
K. Other parameters are as in Fig. 8.
We show in Fig. 13 the free-to-free upward (phonon-
absorption) and downward (phonon-emission) decay
rates Qaf [see Eq. (50)] and Q
f [see Eq. (49)] as functions
of the free-state energy Ef . We observe that Qaf and Qef
increase with increasing Ef in the range from 0 to 8 THz.
The increase of Qaf with increasing Ef in the region of
small Ef (see the left inset) is mainly due to the increase
in the atomic incidence velocity vf . In this region, we
have Qaf ∝ vf ∝
Ef [see Eqs. (48) and (50)]. The
increase of Qef with increasing Ef in the region of small
Ef (see the right inset) is due to not only the increase in
the atomic incidence velocity vf [see Eq. (48)] but also
the increase of the transition rate density Qef ′f and the
increase of the integration interval (0, Ef) [see Eq. (49)].
In this region, the dependence of Qef on the energy Ef
is of higher order than E3/2f . The left inset of Fig. 13
shows that, for Ef in the range from 0 to 20 MHz, the
maximum value of Qaf is on the order of 10
4 s−1. Such
free-to-free upward (heating) decay rates are comparable
to but about two times smaller than the corresponding
free-to-bound (adsorption) decay rates (see the inset of
Fig. 9). Meanwhile, the right inset of Fig. 13 shows that,
in the region of small Ef , the free-to-free downward (cool-
ing) decay rate Qef is very small.
FIG. 14: Free-to-free transition rate densities QafT0 for up-
ward transitions (solid lines) and QefT0 for downward tran-
sitions (dashed lines) from the thermal states with tempera-
tures (a) T0 = 200 µK and (b) T0 = 300 K to free levels f
as functions of the free-level energy Ef . The inset in part (a)
shows the rate densities versus Ef in the range from 0 to 8
MHz to highlight the small magnitude of QefT0 (dashed line).
The temperature of the phonon bath is T = 300 K. Other
parameters are as in Fig. 8.
FIG. 15: Free-to-free decay rates QaT0 (solid lines) and
(dashed lines) for upward and downward transitions, re-
spectively, as functions of the atomic temperature T0 in the
ranges (a) from 100 µK to 400 µK and (b) from 50 K to 350
K. For comparison, the free-to-bound decay rate GT0 is re-
plotted from Fig. 11 by the dotted lines. The temperature of
the phonon bath is T = 300 K. Other parameters are as in
Fig. 8.
In the case of a thermal gas, the phonon-mediated heat
transfer between the gas and the surface is characterized
by the free-to-free transition rate densitiesQafT0 andQ
[see Eqs. (51)] and the free-to-free decay rates QaT0and
QeT0 [see Eqs. (52)]. We plot the free-to-free transition
rate densities QafT0 and Q
in Fig. 14. Comparison
between Figs. 14(a) and 12(a) shows that the transition
rate densities from low-temperature thermal states and
low-energy free states are quite similar to each other.
The spread of the initial-state energy distribution is not
substantial in this case. However, the energy spread of
the initial state is substantial in the case of high tem-
peratures, concealing the cutoff frequency effect [com-
pare Fig. 14(b) with Fig. 12(b)]. We display the free-
to-free decay rates QaT0 and Q
in Fig. 15. The solid
and dashed lines correspond to the upward (heating) and
downward (cooling) transitions, respectively. For com-
parison, the free-to-bound decay rate (adsorption rate)
GT0 is re-plotted from Fig. 11 by the dotted lines. We
observe that, for T0 in the range from 100 µK to 400 µK
[see Fig. 15(a)], the adsorption rate GT0 (dotted line) is
about two times larger than the heating rate QaT0 (solid
line), while the cooling rate QeT0 (dashed line) is negligi-
ble. Figure 15(a) shows that, in the region of low atomic
temperatures, one has QT0
∼= QaT0 ∝
T0, in agreement
with the asymptotic behavior of expressions (52). The
figure also shows that QeT0 quickly increases with increas-
ing atomic temperature T0. The relation Q
< QaT0 ,
obtained for T0 < T , indicates the dominance of heating
of cold free atoms by the surface. The substantial mag-
nitude of the free-to-bound transition rate GT0 (dotted
line) indicates that a significant number of atoms can be
adsorbed by the surface. According to Fig. 15(b), the
free-to-free downward transition rate QeT0 (dashed line)
crosses the upward transition rate QaT0 (solid line) when
T0 = T = 300 K, and then becomes the dominant decay
rate. The relation QeT0 > Q
, obtained for T0 > T , indi-
cates the dominance of cooling of hot free atoms by the
surface.
V. CONCLUSIONS
In conclusion, we have studied the phonon-mediated
transitions of an atom in a surface-induced potential.
We developed a general formalism, which is applicable
for any surface–atom potential. A systematic derivation
of the corresponding density-matrix equation enables us
to investigate the dynamics of both diagonal and off-
diagonal elements. We included a large number of vi-
brational levels originating from the deep silica–cesium
potential. We calculated the transition and decay rates
from both bound and free levels. We found that the
rates of phonon-mediated transitions between transla-
tional levels depend on the mean phonon number, the
phonon mode density, and the matrix element of the force
from the surface upon the atom. Due to the effects of
these competing factors, the transition rates usually first
increase and then reduce with increasing transition fre-
quency. We focused on the transitions from bound states.
Two specific examples, namely, when the initial level is
a shallow level also when it can be one of the deep levels
have been worked out. We have shown that there can be
marked differences in the absorption and emission behav-
ior in the two cases. For example, both the absorption
and emission rates from the deep bound levels can be sev-
eral orders (in our case, six orders) of magnitude larger
than the corresponding rates from the shallow bound lev-
els. We also analyzed various types of transitions from
free states. We have shown that, for thermal atomic ce-
sium with temperature in the range from 100 µK to 400
µK in the vicinity of a silica surface with temperature of
300 K, the adsorption (free-to-bound decay) rate is about
two times larger than the heating (free-to-free upward de-
cay) rate, while the cooling (free-to-free downward decay)
rate is negligible.
Acknowledgments
We thank M. Chevrollier for fruitful discussions. This
work was carried out under the 21st Century COE pro-
gram on “Coherent Optical Science.”
[∗] Also at Institute of Physics and Electronics, Vietnamese
Academy of Science and Technology, Hanoi, Vietnam.
[1] V. I. Balykin, K. Hakuta, Fam Le Kien, J. Q. Liang, and
M. Morinaga, Phys. Rev. A 70, 011401(R) (2004); Fam
Le Kien, V. I. Balykin, and K. Hakuta, Phys. Rev. A 70,
063403 (2004).
[2] Fam Le Kien, S. Dutta Gupta, V. I. Balykin, and K.
Hakuta, Phys. Rev. A 72, 032509 (2005).
[3] Fam Le Kien, S. Dutta Gupta, K. P. Nayak, and K.
Hakuta, Phys. Rev. A 72, 063815 (2005).
[4] E. G. Lima, M. Chevrollier, O. Di Lorenzo, P. C. Se-
gundo, and M. Oriá, Phys. Rev. A 62, 013410 (2000).
[5] T. Passerat de Silans, B. Farias, M. Oriá, and M.
Chevrollier, Appl. Phys. B 82, 367 (2006).
[6] Fam Le Kien and K. Hakuta, Phys. Rev. A 75, 013423
(2007).
[7] Fam Le Kien, S. Dutta Gupta, and K. Hakuta, e-print
quant-ph/0610067.
[8] K. P. Nayak, P. N. Melentiev, M. Morinaga, Fam Le Kien,
V. I. Balykin, and K. Hakuta, e-print quant-ph/0610136.
[9] C. Henkel and M. Wilkens, Europhys. Lett. 47, 414
(1999).
[10] Z. W. Gortel, H. J. Kreuzer, and R. Teshima, Phys. Rev.
B 22, 5655 (1980).
[11] H. Hoinkes, Rev. Mod. Phys. 52, 933 (1980).
[12] N. N. Bogolubov, Commun. of JINR, E17-11822, Dubna
(1978); N. N. Bogolubov and N. N. Bogolubov Jr., Ele-
mentary Particles and Nuclei (USSR) 11, 245 (1980).
[13] R. Zwanzig, Lectures in Theoretical Physics, eds. W. E.
Brittin, B. W. Downs, and J. Downs (Interscience, New
York, 1961) Vol. 3, p. 106; G. S. Agarwal, Progress in
Optics, ed. E. Wolf (North-Holland, Amsterdam, 1973)
Vol. 11, p. 3; L. Mandel and E. Wolf, Optical Coherence
and Quantum Optics (Cambridge, New York, 1995) p.
[14] J. Javanainen and M. Mackie, Phys. Rev. A 58, R789
(1998); M. Mackie and J. Javanainen, ibid. 60, 3174
(1999).
[15] E. Luc-Koenig, M. Vatasescu, and F. Masnou-Seeuws,
Eur. Phys. J. D 31, 239 (2004).
[16] See, for example, G. P. Agrawal, Nonlinear Fiber Optics
(Academic, New York, 2001).
[17] H. J. Metcalf and P. van der Straten, Laser Cooling and
Trapping (Springer, New York, 1999).
http://arxiv.org/abs/quant-ph/0610067
http://arxiv.org/abs/quant-ph/0610136
ABSTRACT
  We study phonon-mediated transitions between translational levels of an atom
in a surface-induced potential. We present a general master equation governing
the dynamics of the translational states of the atom. In the framework of the
Debye model, we derive compact expressions for the rates for both upward and
downward transitions. Numerical calculations for the transition rates are
performed for a deep silica-induced potential allowing for a large number of
bound levels as well as free states of a cesium atom. The total absorption rate
is shown to be determined mainly by the bound-to-bound transitions for deep
bound levels and by bound-to-free transitions for shallow bound levels.
Moreover, the phonon emission and absorption processes can be orders of
magnitude larger for deep bound levels as compared to the shallow bound ones.
We also study various types of transitions from free states. We show that, for
thermal atomic cesium with temperature in the range from 100 $\mu$K to 400
$\mu$K in the vicinity of a silica surface with temperature of 300 K, the
adsorption (free-to-bound decay) rate is about two times larger than the
heating (free-to-free upward decay) rate, while the cooling (free-to-free
downward decay) rate is negligible.

<|endoftext|><|startoftext|>
Introduction
	IREE for scattering amplitudes in the hard kinematics
	IREE for the form factor f(q2) in QED
	IREE for the form factor g(q2) in QED
	e+e- -annihilation into a quark-antiquark pair
	e+e- -annihilation into a quark-antiquark pair and gluons
	Exponentiation of Sudakov electroweak double-logarithmic contributions
	Application of IREE to the polarized Deep-Inelastic Scattering
	Comparison of expressions (30) and (41) for g1NS
	Comparison of small-x asymptotics, neglecting the impact of q 
	Numerical comparison between Eqs. (30) and (41), neglecting the impact of q
	Analysis of the standard fits for q
	Correcting misconceptions
	Combining the total resummation and DGLAP
	Conclusion
	Acknowledgement
	References
ABSTRACT
  It is a brief review on composing and solving Infrared Evolution Equations.
They can be used in order to calculate amplitudes of high-energy reactions in
different kinematic regions in the double-logarithmic approximation.

<|endoftext|><|startoftext|>
Cofibrations in the Category of Frölicher Spaces:
Part I
Brett Dugmore
Cadiz Financial Strategists (Pty) Ltd, Cape Town, South Africa
Email: Brett.Dugmore@cadiz.co.za
Patrice Pungu Ntumba
Department of Mathematics and Applied Mathematics
University of Pretoria
Hatfield 0002, Republic of South Africa
Email: patrice.ntumba@up.ac.za
Abstract
Cofibrations are defined in the category of Frölicher spaces by weak-
ening the analog of the classical definition to enable smooth homotopy
extensions to be more easily constructed, using flattened unit intervals.
We later relate smooth cofibrations to smooth neighborhood deforma-
tion retracts. The notion of smooth neighborhood deformation retract
gives rise to an analogous result that a closed Frölicher subspace A of the
Frölicher space X is a smooth neighborhood deformation retract of X if
and only if the inclusion i : A →֒ X comes from a certain subclass of
cofibrations. As an application we construct the right Puppe sequence.
Subject Classification (2000): 55P05.
Key Words:Frölicher spaces, Flattened unit intervals, Smooth neighborhood de-
formation retracts, Smooth cofibrations, Cofibrations with FCIP, Puppe se-
quence.
1 Preliminaries
The purpose of this section is to survey brielfy the notion of Frölicher spaces.
Frölicher spaces arise naturally in physics, and do generalize the concept of
smooth manifolds. A Frölicher space, or smooth space as initially called by
Frölicher and Kriegl [7], is a triple (X, CX ,FX) consisting of a setX , and subsets
CX ⊆ XR, FX ⊆ RX such that
• FX ◦ CX = {f ◦ c| f ∈ FX , c ∈ CX} ⊆ C∞(R)
• ΦCX := {f : X → R| f ◦ c ∈ C∞(R) for all c ∈ CX} = FX
http://arxiv.org/abs/0704.0342v1
• ΓFX := {c : R → X | f ◦ c ∈ C∞(R)for all f ∈ FX} = CX
Frölicher and Kriegl [7], and Kriegl and Michor [10] are our main reference for
Frölicher spaces. The following terminology will be used in the paper: Given a
Frölicher space (X, CX ,FX), the pair (CX ,FX) is called a smooth structure; the
elements of CX and FX are called smooth curves and smooth functions respec-
tively. The topology assumed for a Frölicher space (X, CX ,FX) throughout the
paper is the initial topology TF induced by the set FX of functions. When there
is no fear of confusion, a Frölicher space (X, CX ,FX) will simply be denoted X .
The most natural Frölicher spaces are the finite dimensional smooth manifolds,
where if X is such a smooth manifold, then CX and FX consist of all smooth
curves R → X and smooth functions X → R. Euclidean finite dimensional
smooth manifolds Rn, when viewed as Frölicher spaces, are called Euclidean
Frölicher spaces. In the sequel, by Rn, n ∈ N, we mean the Frölicher space Rn,
equipped with its usual smooth manifold structure.
A Frölicher space X is called Hausdorff if and only if the smooth real-valued
functions on X are point-separating, i.e. if and only if TF is Hausdorff.
A Frölicher structure (CX ,FX) on a set X is said to be generated by a set
F0 ⊆ RX (resp. C0 ⊆ XR) if CX = ΓF0 and FX = ΦΓF0 (resp. FX = ΦC0
and CX = ΓΦC0 ). Note that different sets F0 ⊆ RX on the same set X may
give rise to a same smooth structure on X . A set mapping ϕ : X → Y between
Frölicher spaces is called a map of Frölicher spaces or just a smooth map if for
each f ∈ FY , the pull back f ◦ϕ ∈ FX . This is equivalent to saying that for each
c ∈ CX , ϕ ◦ c ∈ CY . For Frölicher spaces X and Y , C∞(X,Y ) will denote the
collection of all the smooth maps X → Y . The resulting category of Frölicher
spaces and smooth maps is denoted by FRL.
Some useful facts regarding Frölicher spaces can be gathered in the following
Theorem 1.1 The category FRL is complete (i.e. arbitrary limits exist ), co-
complete (i.e. arbitrary colimits exist), and Cartesian closed.
Given a collection of Frölicher spaces {Xi}i∈I , let X =
i∈I Xi be the set
product of the sets {Xi}i∈I and πi : X → Xi, i ∈ I, denote the projection map
(xi)i∈I 7→ xi. The initial structure on X is generated by the set
{f ◦ πi : f ∈ FXi}.
The ensuing Frölicher space (X,ΓF0, ϕΓF0) is called the product space of the
family {Xi}i∈I . Clearly,
ΓF0 = {c : R → X | if c(t) = (ci(t))i∈I , then ci ∈ CXi for every i ∈ I}.
Now, let
i∈I Xi be the disjoint union of sets {Xi}i∈I , and ιXi : Xi →
i∈I Xi
the inclusion map. Place the smooth final structure on
i∈I Xi corresponding
to the family {ιXi}i∈I . The resulting Frölicher space is called the coproduct of
{Xi}i∈I , and denoted
i∈I Xi, and
Xi = {f :
Xi → R| for each i ∈ I, f |Xi ∈ FXi}
is the collection of smooth functions for the coproduct.
Corollary 1.1 Let X, Y , and Z be Frölicher spaces. Then the following canon-
ical mappings are smooth.
• ev: C∞(X,Y )×X → Y , (f, x) 7→ f(x)
• ins:X → C∞(Y,X × Y ), x 7→ (y 7→ ins(x)(y) = (x, y))
• comp:C∞(Y, Z)× C∞(X,Y ) → C∞(X,Z), (g, f) 7→ g ◦ f
• f∗ : C∞(X,Y ) → C∞(X,Z), f∗(g) = f ◦ g, where f ∈ C∞(Y, Z)
• g∗ : C∞(Z, Y ) → C∞(X,Y ), g∗(f) = f ◦ g, where g ∈ C∞(X,Z).
Given Frölicher spaces X , Y , and Z; in view of the cartesian closedness of
the category FRL, the exponential law
C∞(X × Y, Z) ∼= C∞(X,C∞(Y, Z))
holds. Because FX = C∞(X,R), it follows by cartesian closedness of FRL that
the collection FX can be made into a Frölicher space on its own right.
Finally we would like to show how to construct smooth braking functions,
following Hirsch [8]. Smooth braking functions are tools that are behind most
results in this paper. In [11], it is shown that the function ϕ : R → R given by
ϕ(u) =
0 if u ≤ 0
u if u > 0
is smooth. Substituting x2 for u in the above function, one sees that the function
ψ : R → R, given by
ψ(x) =
0 if x ≤ 0
x2 if u > 0
is smooth. Now, let us construct a smooth function α : R → R with the following
properties. Let 0 ≤ a < b. α(t) should satisfy:
• α(t) = 0 for t ≤ a,
• 0 < α(t) < 1 for a < t < b,
• α is strictly increasing for a < t < b,
• α(t) = 1 for t ≥ b.
Define α : R → [0, 1] by
α(t) =
γ(x)dx
γ(x)dx
where γ(x) = ψ(x− a)ψ(b − x).
In the sequel, the notation αǫ, 0 < ǫ <
, will refer to a smooth braking
function with the following properties
• αǫ(t) = 0 for t ≤ ǫ,
• 0 < αǫ(t) < 1 for ǫ < t < 1− ǫ,
• α strictly increasing for ǫ < t < 1− ǫ,
• αǫ(t) = 1 for 1− ǫ ≤ t.
2 Basic Constructions of Homotopy Theory in
In this section, we define the fundamental notions of homotopy theory in the
category FRL, such as the homotopy relation and the mapping cylinder. We
begin with an overview of our approach to homotopy in FRL, and then discuss
alternate Frölicher structures on the unit interval which are used in this and
subsequent sections.
2.1 Our Approach to Homotopy Theory in FRL
One might begin investigating homotopy theory in FRL by simply following
the homotopy theory of topological spaces, replacing continuous functions with
smooth ones. One can certainly define the notion of a homotopy H : I×X → Y
between smooth maps H(0,−) and H(1,−) in this way (which we do). One can
even get as far as the left Puppe sequence (see [4]), but eventually difficulties
begin to arise.
Extending functions defined on a subspace of a Frölicher space tends to be
a little tricky, and so the definition of a cofibration in FRL is one that needs
careful consideration. We envisage to construct the right Puppe sequence in a
future paper. To do this we define a slightly weaker notion of cofibration than
the notion obtained from topological spaces. In addition, we define the mapping
cylinder of a smooth map f : X → Y using not the unit interval, but a modified
version called the weakly flattened unit interval, denoted I, which, as one
can show, is topologically homeomorphic to the unit interval. This modified
structure on the unit interval allows us to show that the inclusion of a space X
into the mapping cylinder of f : X → Y is a cofibration (in our weaker sense ).
The weakly flattened unit interval is useful, but it also has its drawbacks.
It would be ideal to have a single structure on the unit interval that can be
used throughout out homotopy theory, but the weakly flattened unit interval
is not suitable, because it has the rather restrictive property that a smooth
map f : I → I on the usual unit interval often does not define a smooth map
f : I → I unless the endpoints of the interval are mapped to the endpoints. This
restrictive property means that we only use the flattened unit intervals where
they are absolutely necessary.
In our future work, we will investigate whether with our modified notions of
cofibration and mapping cylinder, Baues’ cofibration axioms are satisfied.
2.2 Flattened Structures on the Unit Interval
We define two main Frölicher structures which we call the flattened unit in-
terval and the weakly flattened unit interval . Let (CI ,FI) be the subspace
structure induced on I by the inclusion I →֒ R.
Definition 2.1 The Frölicher space (I, CI,FI), where the structure (CI,FI) is
the structure generated by the set
F = {f ∈ FI| there exists 0 < ǫ < 14 with f(t) = f(0) for t ∈ [0, ǫ) and
f(t) = f(1) for t ∈ (1− ǫ, 1]},
is called the flattened unit interval.
It is easy to see that any continuous map c : R → [0, 1] defines a structure
curve on I if and only if it is smooth at every point t ∈ R, where c(t) ∈ (0, 1), .
We define the left (resp. right) flattened unit interval, denoted by I−
(resp. I+), to be the Frölicher space whose underlying set is the unit interval
[0, 1], and structure is the structure generated by the structure functions in FI
that are constant near 0 (resp. 1).
Definition 2.2 The Frölicher space (I, CI,FI), with the structure defined below
is called the weakly flattened unit interval. The underlying set is the unit
interval; the structure (CI,FI) is generated by the family
F = {f ∈ FI | lim
f(t) = 0, lim
f(t) = 0, n ≥ 1}.
We call the property, for all f ∈ F ,
f(t) = 0, lim
f(t) = 0, n ≥ 1,
the zero derivative property of f .
We shall prove that all structure functions on I have the zero derivative
property, in other words, FI = F . To that effect, we need the following lemma.
Lemma 2.1 Let c : R → R be a smooth real-valued function at t = t0, and let
f : R → R be a smooth real-valued function at t = c(t0). Then,
(f ◦ c)(t0) = f (n)(c(t0))(c′(t0))n + terms of the form
af (k)(c(t0))(c
′(t0))
m1(c′′(t0))
m2 . . . (c(n−1)(t0))
mn−1 ,
where k < n and a ∈ R. In addition, if a 6= 0 then at least one ofm2,m3, . . . ,mn−1
is also non-zero.
Proof. The proof is done by induction. For the sake of brevity, we call the
term f (n)(c(t0))(c
′(t0))
n the primary term for n, and the terms of the form
af (k)(c(t0))(c
′(t0))
m1(c′′(t0))
m2 . . . (c(n−1)(t0))
mn−1 the lower order terms for n.
The statement is true for n = 1 and for n = 2. Suppose the result is true for
n = k. To show that the result holds for n = k + 1, since
dtk+1
(f ◦ c)(t0) =
(f (k)(c(t0))(c
′(t0))
+terms of the form d
(af (j)(c(t0))(c
′(t0))
m1(c′′(t0))
m2 . . . (c(k−1)(t0))
mk−1),
where j < k + 1 and a ∈ R, we need only show that
(af (j)(c(t0))(c
′(t0))
m1(c′′(t0))
m2 . . . (c(k−1)(t0))
mk−1)
gives rise to lower terms for n = k + 1, which is by the way straightforward. �
Theorem 2.1 FI = {f ∈ FI | limt→0+ d
f(t) = 0 = limt→1−
f(t)} =: F
Proof. That F ⊆ FI is evident. We must show the reverse inequality. Let
0 < ǫ < 1
, and 0 < M < 1. Consider the function cM : R → R, given by
cM (t) = (1− αǫ(|t|))βM (t) + αǫ(|t|),
where αǫ : R → R is a smooth braking function as defined in the Preliminaries,
and βM : R → R is given by
βM (t) =
−Mt if t ≤ 0
t if t > 0
It is easily seen that cM is continuous over all R, and smooth over all R except
at t = 0. Also note that 0 < cM (t) < 1 for all t ∈ R, and cM (t) = βM (t) = 0 for
all 0 ≤ t < ǫ. Now,
cM (t) =
βM (t) = −M, for −ǫ < t < 0
cM (t) =
βM (t) = 1, for 0 < t < ǫ
For n > 1, we have
cM (t) =
βM (t) = 0, for t ∈ (−ǫ, 0) ∪ (0, ǫ).
We now show that for cM ∈ ΓF . To this end, let f ∈ F . To show that
f ◦ cM : R → R is smooth, it is obvious that we need only concentrate on the
point t = 0, because f ◦ c is smooth at every t 6= 0. It follows for t 6= 0, and
n ∈ N that Lemma 2.1 applies. But as t → 0, cM (t) → 0+, and so, letting
s = cM (t), we have
f (j)(cM (t)) = lim
f (j)(s) = 0,
for all j ∈ N, by the zero derivative property of f . Thus, as t approaches
the value 0, the primary term and all the lower order terms of d
(f ◦ cM )(t)
vanish, and we have shown that f ◦ cM is smooth at t = 0. This implies that
f ◦ cM ∈ C∞(R,R) for all f ∈ F . It follows that cM ∈ ΓF .
We are now ready to show that FI ⊆ F . To this end, suppose that we
are given a structure function f ∈ FI. We shall show that this f has the zero
derivative property, and is thus an element of F .
Since f ∈ FI, we know that f ◦ c is a smooth real-valued function for every
c ∈ ΓF . In particular, f ◦ cM is smooth for all 0 < M < 1. Thus, for any n ∈ N,
(f ◦ cM )(t) = lim
(f ◦ cM )(t).
As t→ 0−, cM (t) → 0+; let us consider the lower order terms for n. Each term
of the form
af (k)(cM (t))(c
M (t))
m1(c′′M (t))
m2 . . . (c
(n−1)
M (t))
has some term (c
(t))mi , for some i > 1, with mi 6= 0. But limt→0− c
(t) = 0,
if i > 1, and so
af (k)(cM (t))(c
M (t))
m1(c′′M (t))
m2 . . . (c
(n−1)
(t))mn−1 = 0.
So all the lower order terms fall away, therefore
limt→0−
(f ◦ cM )(t) = limt→0− f (n)(cM (t))(c′M (t))n
= limt→0− f
(n)(cM (t))(−M)n
= lims→0+ f
(n)(s)(−M)n,
where s = cM (t). In a similar way one shows that
(f ◦ cM )(t) = lim
f (n)(s).
But f◦cM is smooth, therefore lims→0+ f (n)(s)(−M)n = lims→0+ f (n)(s), which
implies that lims→0+ f
(n)(s) = 0.
We have shown that the zero derivative property of f holds for the left
endpoint of the unit interval. To show that the zero derivative property of f
holds for the right endpoint of f , note that dM : R → R, dM (t) = 1− cM (t), is
a smooth real-valued function with d(0) = 1, and 0 ≤ dM (t) ≤ 1 for all t ∈ R.
One can follow a similar procedure to the above, using dM instead of cM to
show that lims→1− f
(n) = 0. �
2.3 Some Properties of Smooth Functions between the
Flattened Unit Intervals
One has to be careful when dealing with the various flattened unit intervals. A
smooth function f : I → I from the R- Frölicher subspace unit interval I to
itself need not define a smooth function f : I → I, for example. Conversely,
not every smooth function f : I → I defines a smooth function f : I → I. In
particular, we need to be aware of the fact that addition and multiplication of
functions when defined between the various flattened unit intervals does not
preserve smoothness, as is the case with the usual unit interval.
Example 2.1
The function f : I → I, f(t) = 1
t is clearly smooth, but the corresponding
function f : I → I, given by the same formula, is not smooth. To see this, let
α : R → R be a smooth braking function with the properties that
• α(t) = −1, for t < − 3
• α(t) = t, for − 1
< t < 1
• α(t) = 1, for t > 3
Define c : R → I by c(t) = 1− |α(t)|. The curve c is smooth everywhere except
at t = 0, where c(0) = 1. However, every generating function f on I is constant
near 1, and so the composite f ◦ c is smooth. Thus c is a structure curve on I.
Now, f ◦ c : R → I is given by (f ◦ c)(t) = 1
(1 − |α(t)|). Let h : I → R be a
structure function with the properties that
• h(s) = 0, for s < 1
• h(s) = s, for 1
< s < 3
• h(s) = 1, for 7
Then (h ◦ f ◦ c)(t) = 1
(1− |α(t)|) for t near 0, and is not smooth at t = 0. Thus
f does not define a smooth function from I to I.
Example 2.2
The function f : I → I, f(t) =
t, is smooth, but the corresponding f : I → I,
given by the same formula, is not smooth. This follows from the fact that f is
smooth on the open interval (0, 1), and a generating function g on I is constant
near 0 and 1. On the side, f : I → I is not smooth, because if c : R → I is a
structure curve with c(t) = t2 near t = 0, then (f ◦ c)(t) = |t| near t = 0, which
is not smooth on I at t = 0.
Example 2.3
The functions f, g : I− → I−, given by f(t) = 1
t and g(t) = 1
are both
smooth, but the sum f(t) + g(t) = 1
is not smooth.
The following lemma follows from the definition of the Frölicher structures
on the various flattened unit intervals.
Lemma 2.2 Let f : I → I be a smooth function with the properties that f(0) =
0 and f(1) = 1. Then the following maps are smooth:
• f : I → I±,
• f : I → I,
• f : I± → I,
• f : I → I,
• f : I → I.
The function defined in the following example is for later reference.
Example 2.4
Let H : I × I− → I− be given by H(t, s) = (1 − α(t))s, where α : R → R is a
smooth braking function with the properties that
• α(t) = 0 for t < 1
• 0 ≤ α(t) ≤ 1 for all t ∈ R,
• α(t) = 1 for t > 3
We show that H is smooth. To see this, let f : I− → R be a generating function
on I−. So f is constant near 0. Now, let c : R → I × I− be a structure curve,
given by c(v) = (t(v), s(v)). The curve t is a structure curve on I, and so is
a smooth real-valued function for all v ∈ R, except possibly when t(v) = 0 or
t(v) = 1. Similarly, the curve s is a structure curve on I−, and so is smooth
for all v ∈ R except possibly when s(v) = 0. Now consider the composite
H ◦ c : R → I−. Clearly, α(t(v)) is smooth for all v, since the only possible
points for non-smoothness occur when t(v) = 0 or t(v) = 1, and α(t(v)) is
locally constant near these points. Consequently, H ◦ c is smooth everywhere
except possibly when s(v) = 0. Now, let’s consider f ◦H ◦ c : R → R; the only
possible points for non-smoothness are those in which s is 0, i.e. H◦ = 0. But f
is a structure generating function on I−, and so is locally constant near 0. This
shows that f ◦H ◦ c is smooth for all v ∈ R, and thus H is smooth.
2.4 Homotopy in FRL and Related Objects
Definition 2.3 (1) Let X be a Frölicher space, and x0, x1 ∈ X. We say that
x0 is smoothly path-connected to x1 if there is a smooth path c : I → X such
that c(0) = x0 and c(1) = x1. We write x0 ≃ x1. The relation ≃ is called
smooth homotopy when it is applied to hom-sets.
(2) Let f : X → Y be a map of Frölicher spaces. f is called a smooth
homotopy equivalence provided there exists a smooth map g : Y → X such that
f ◦ g ≃ 1Y and g ◦ f ≃ 1X .
One can show that smooth homotopy is a congruence in RFL. In practice, we
say that smooth maps f, g : X → Y are smoothly homotopic if there exists a
smooth map H : I ×X → Y with H(0,−) = f and H(1,−) = g. If A ⊆ X is
subspace of X , then we say that H is a smooth homotopy (rel A) if the map
H has the additional property that H(t, a) = a for each t ∈ I and a ∈ A. See
Cherenack [5] and Dugmore [6] for more detail regarding smooth homotopy.
The notion of deformation retract is fundamental to topological homotopy
theory. The following definitions are adapted for smooth homotopy, and will be
needed at a later stage.
Definition 2.4 Let A ⊆ X be a subspace of a Frölicher space X, and let i :
A →֒ X denote the inclusion map. Then
• We say that A is a retract of X if there exists a smooth map r : X → A
such that ri = 1A. We call r a retraction.
• We call A a weak deformation retract of X if the inclusion i is a smooth
homotopy equivalence.
• The subspace A is called a deformation retract of X if there exists a re-
traction r : X → A such that ir ≃ 1X .
• The subspace A is called a strong deformation retract of X if there exists
a retraction r : X → A such that ir ≃ 1X(relA).
Definition 2.5 The mapping cylinder If of f : X → Y is defined by the fol-
lowing pushout
I ×X // If
where i1 : X → I ×X is given by i1(x) = (1, x), for any x ∈ X. We denote the
elements of If by [t, x] or [y], where (t, x) ∈ I ×X and y ∈ Y .
Replacing I ×X in the above pushout diagram by I×X or I×X, we obtain
the flattened mapping cylinder If and weakly flattened mapping cylinder If of f
respectively. We use the same notation for elements of these flattened mapping
cylinders as described above for the mapping cylinder.
There is also a map i0 : X → I ×X , defined by i0(x) = (0, x) for x ∈ X . This
induces an inclusion map i′0 : X → If , which identifies X with the Frölicher
subspace i′0(X) of If . An inclusion is induced in a similar way for the flattened
mapping cylinders. If one identifies {0}×X to a point in the mapping cylinder
If of a map f : X → Y , then one obtains the mapping cone Tf of the
map f . In a similar fashion, we define the flattened mapping cone Tf and
weakly flattened mapping cone Tf of a smooth map f : X → Y .
2.5 Cofibrations in FRL
A cofibration is a map i : A→ X for which the problem of extending functions
from i(A) to X is a homotopy problem. In other words, if a map f : i(A) → Z
can be extended to a map f∗ : X → Z, then so can any map homotopic to f . For
topological spaces, the usual definition is phrased in a slightly more restrictive
way. The extension of a map g ≃H f , for some homotopy H : I × i(A) → Z, is
required to exist at every level of the homotopy simultaneously. In other words,
one requires each H(t,−) to be extendable in such a way that the resulting
homotopy H∗ : I ×X → Z is continuous.
We weaken this definition somewhat, to enable smooth homotopy extensions
to be more easily constructed using a flattening at the endpoints of the homo-
topy. This enables us to characterize smooth cofibrations in terms of a flattened
unit interval, and then later to relate smooth cofibrations to smooth neigh-
borhood deformation retracts. Our definition of smooth cofibration, though
different from from Cap’s definition, see [1], leads to several classical results as
does Cap’s. As pointed out by Cap, the analogue of the classical definition of
cofibration would not allow even {0} →֒ I to be a smooth cofibration. So, we
have the following
Definition 2.6 A smooth map i : A → X is called a smooth cofibration if,
corresponding to to every commutative diagram of the form
(0,1A)
f // Z
66mmmmmmmmmmmmmm
there exists a commutative diagram in FRL of the form
(0,1X )
::tttttttttt
where G′ : I × A → Z is given by G′(t, a) = G(αǫ(t), a) for some 0 < ǫ < 12 ,
and each t ∈ I, a ∈ A.
The problem of extending a map smoothly from a subspace of a Frölicher
space to the whole space is a more difficult problem than simply extending con-
tinuously. It is mainly for this reason that the definition of smooth cofibration
differs somewhat from the corresponding definition of a topological cofibration.
Lemma 2.3 Let i : A → X be a smooth cofibration, then i is an initial mor-
phism in FRL. In addition, if A is Hausdorff, then i is injective. So in this
case A can be regarded as a subspace of X.
Proof. Let us show that every smooth map f : A→ R factors through i, that
is for every f ∈ FA, there exists f̃ ∈ FX such that f = f̃ ◦ i. To this end,
consider the smooth map G : I × A → R, given by H(t, a) = tf(a). Clearly,
0|A = G(0,−), where 0 : X → R is the constant map 0. It follows that there is
map F : I ×X → R such that F ◦ (1× i) = G′. Then, clearly f̃ := F (1,−) has
the desired property.
The remaining part of the proof of Proposition 3.3, in [1], holds verbatim
here as well. �
In this paper, we are interested only in cofibrations that are injective. Hence-
forth, all cofibrations are assumed to be injective.
All topological cofibrations are inclusions, and this result is true for smooth
cofibrations too. The proof of the following lemma is essentially the same as
the proof given by James [9] for the topological result, although James’s proof
is in some sense dual to ours, using path-spaces in place of cartesian products
and the adjoint versions of our homotopies.
Lemma 2.4 A cofibration
i // X
is a smooth inclusion.
Proof. Let Ii be a mapping cylinder of i, and let j : X → Ii be the standard
inclusion map. Consider the smooth map γ : I → I, γ(t) = 1 − t, for all t ∈ I,
and the quotient map q : (I ×A) ⊔X → Ii; we have the following commutative
diagram
(0,1A)
j // Ii
66mmmmmmmmmmmmmm
where G(t, a) = [(1 − t, a)]. Notice that the map G is smooth. Since i is a
cofibration, we have the commutative diagram
(0,1X )
::uuuuuuuuuu
where G′(t, a) = G(αǫ(t), a) for some 0 < ǫ <
. Define U : X → Ii by
U(x) = F (1, x). We have U ◦ i = G′(1,−), where G′(1, a) = [(0, a)], for every
a ∈ A. Thus the assignment a 7→ G′(1, a) defines the usual inclusion of A into
the mapping cylinder. From this we deduce that U ◦ i is an inclusion, and hence
i is an inclusion. �
There is an equivalent formulation of definition 2.6, given in the following
lemma.
Lemma 2.5 A smooth map
i // X
is a cofibration if and only if, for every smooth map h : (0×X)∪(I−×i(A)) → Z,
the following diagram
(0×X) ∪ (I− × i(A)) h //
I− ×X
77oooooooooooooo
where j is the evident inclusion, exists in FRL.
Proof. Suppose that the inclusion A //
i // X is a smooth cofibration, and
suppose that h : (0 × X) ∪ (I− × i(A)) → Z is a smooth map. We have the
diagram
(0×B) ∪ (I− × i(A)) h //
I− ×X
We need to fill in a smooth map G : I− × X → Z which makes the resulting
diagram commute. To do this, notice that h|I−×i(A) is smooth, and thus the
corresponding map h|I × i(A), using the usual unit interval, is also smooth. We
have the following diagram
(0,1A)
h|0×X // Z
66mmmmmmmmmmmmmm
where h|0×X(0,−) : X → Z is denoted as h|0×X . The fact that i is a smooth
cofibration yields the following FRL-commutative diagram:
h0×X //
(0,1A)
::tttttttttt
where (h|I−×A)′(t, a) = h|I−×A(αǫ(t), a), for some 0 < ǫ < 12 . Now, chose a
smooth braking function β : R → R with the following properties.
• α(t) = 0 for t < ǫ
• α(t) = t for ǫ < t.
F may not be smooth on I− × A due to the flattening requirements of the left
flattened unit interval. To correct this, set G(t, a) = F (β(t), a). Notice that the
insertion of this braking function does not affect the commutativity conditions
of G, since the only adjustments to F occur in the first coordinate where the
map (h|I−×X)′ is constant.
Now, assume the converse, i.e. to every smooth map h : (0 × X) ∪ (I− ×
i(A)) → Z, corresponds a commutative diagram
(0×X) ∪ (I− × i(A)) h //
I− ×X
77oooooooooooooo
We wish to show that the inclusion i : A → X is a cofibration; so assume we
have the following diagram
(0,1A)
f // Z
66mmmmmmmmmmmmmm
There exists the diagram
(0,1A)
f // Z
66mmmmmmmmmmmmmm
where G′(t, a) = G(αǫ(t), a). Our hypothesis allows us to construct the diagram
(0×X) ∪ (I− × i(A))
f∪G′ //
I− ×X
77oooooooooooooo
Note that f ∪ G′ is smooth since αǫ(t) is constant near 0. Since H is smooth
on I− ×X it defines a smooth map on I ×X . One can verify that the diagram
(0,1X)
::tttttttttt
commutes as required. �
3 Smooth Neighborhood Deformation Retracts
This section is concerned with the formulation of a suitable notion of smooth
neighborhood deformation retract. For topological spaces, the statement that a
closed subspace A of X is a neighborhood deformation retract of X is equivalent
to the statement that the inclusion i : A →֒ X is a closed cofibration. We show
that in the category of Frölicher spaces there is a notion of smooth neighborhood
deformation retract that gives rise to an analogous result that a closed Frölicher
subspace A of the Frölicher space X is a smooth neighborhood deformation
retract of X if and only if the inclusion i : A →֒ X comes from a certain subclass
of cofibrations. As an application, we construct the right Puppe sequence.
3.1 SNDR pairs and SDR pairs
The definition of ‘smooth neighborhood deformation retract’ that we adopt in
this paper is similar to the definition of ‘R-SNDR pair’suggested in [6], but we
have modified the definition in order to retain only the essential aspects of ‘first
coordinate independence’ defined in [6].
We begin by defining the ‘first coordinate independence property’ of a func-
tion on a product of a Frölicher space with I (or I−, I+).
Definition 3.1 Let i : A → X be a smooth map, and c : R → X a structure
curve on X. Define
Λ(c, i) = {t∗ ∈ c−1(i(A))| there exists a sequence {tn} of real numbers
with limn→∞ tn = t∗ and each tn ∈ c−1(X − i(A))}.
The points in Λ(c, i) are those values in R where the curve ‘enters’ i(A) from
X − i(A), or ‘touches’ a point in i(A) whilst remaining in X − i(A) nearby.
Now, we are ready to define the ‘first coordinate independence property’ for a
structure function on a product.
Definition 3.2 Let i : A→ X be a smooth map and suppose f : I×X → R is
a structure function on I ×X. Let c : R → I ×X, given by c(s) = (t(s), x(s))
have the following properties
• The map x(s) is a structure curve on X.
• For all ǫ > 0, t(s) is a smooth real-valued function on R−∪s∗∈Λ(x,i)[s∗ −
ǫ, s∗ + ǫ].
If, for every such map c, the composite f ◦ c is a smooth real-valued function,
then we say that f : I×X → R has the first independence property (FCIP) with
respect to i.
Extending the definition, we say that a map g : I × X → Y has the FCIP
with respect to i if the composite h ◦ g : I ×X → R has the FCIP with respect
to i for every h ∈ FY .
Notice that we can formulate a similar definition of the FCIP if we replace
I throughout by I− or I+, leaving the rest of the definition unchanged. We will
have occasion to use this type of first coordinate independence property in the
later part of this work.
Note. Let i : A→ X , and suppose that we are given a map g : I×X → Y . Let
f : Y → R be a structure function on Y , and suppose that f ◦ g : I ×X → R
has the FCIP with respect to i for any such f . Then, given a smooth map
h : Y → Z, the composite f ′ ◦ h ◦ g : I×X → R has the FCIP with respect to
i for any structure function f ′ on Z.
The above note applies equally well if g : I− ×X → Y or g : I+ ×X → Y
has the FCIP with respect to i when composed with a smooth function h on Y .
Example 3.1
1. For any i : A→ X , the projection onto the second coordinate πX : I×X → X
has the FCIP.
2. Let α : R → R be a smooth braking function with the properties that
• α(t) = 0 if t < 1
• 0 < α(t) < 1 if 1
≤ t ≤ 3
• α(t) = 1 if 3
Consider 0 →֒ I−. Let H : I× I− → I− be given by H(t, s) = (1−α(t))s. Then,
f ◦H : I× I− → R has the FCIP with respect to the inclusion 0 →֒ I−, for any
f ∈ FI− .
Definition 3.3 Consider a smooth inclusion i : A →֒ X. Suppose that there
exists a smooth map u : X → I, with u−1(0) = i(A). If there exists a smooth
map H : I×X → X that satisfies the following properties:
• H has the FCIP with respect to i.
• H(0, x) = x for all x ∈ X.
• H(t, x) = x for all (t, x) ∈ I× i(A).
• H(1, x) ∈ i(A) for all x ∈ X with u(x) < 1,
then the pair (X,A) is called a smooth neighborhood deformation retract pair,
or SNDR pair for short.
If, in addition, H is such that H(1 × X) ⊂ i(A), then the pair (X,A) is
called a smooth deformation retract pair, or an SDR pair for short.
The subspace A is called a smooth neighborhood deformation retract or smooth
deformation retract of X if (X,A) is an SNDR pair or SDR pair, respectively.
The pair (u,H) is called a representation for the SNDR (or SDR) pair.
Example 3.2
1. The pair (X, ∅) is an SNDR pair. A representation is u(x) = 1, H(t, x) = x,
for each t ∈ I and x ∈ X .
2. The pair (X,X) is an SNDR pair. A representation is u(X) = 0, H(t, x) = x,
for each t ∈ I and x ∈ X .
Lemma 3.1 The pair (I−, 0) is an SDR pair.
Proof. Let α : R → R be the smooth braking function of Examples 3.1. A
representation for (I−, 0) as an SDR pair is (u,H), where u : I− → I and
H : I× I− → I− are given by u(s) = s, and H(t, s) = (1 − α(t))s. Clearly, the
identity u : I− → I is smooth. And the map H , as shown in Example 2.4, is
smooth and clearly has the FCIP with respect to the inclusion, since whenever
v approaches a value for which s(v) = 0, one has
g((1− α(t(v)))s(v)) = g(0)
for v in a neighborhood of this value and g ∈ FI− . �
Lemma 3.2 The pair (I, {0, 1}) is an SNDR pair.
Proof. A representation (u,H) for the SNDR pair can be given as follows.
Define u : I → I to be a bump function such that
• u(t) = 0 for t = 0 or t = 1,
• u(t) = 1 for t ∈ [ 1
• 0 < u(t) < 1 otherwise,
and let β : I → I be a braking function with the properties that β(s) = 0 for
0 ≤ s ≤ 1
, and β(s) = 1 for 3
≤ s ≤ 1. Let 0 < ǫ 1
, and define H : I× I → I by
H(t, s) = (1− αǫ(t))s+ αǫ(t)β(s). It is clear that H(0, s) = s, H(t, 0) = 0, and
H(t, 1) = 1. Suppose that u(s) < 1. Then, s ∈ [0, 1
) ∪ (3
, 1]. This implies that
β(s) = 0 or β(s) = 1. We then have H(1, s) = 0 or H(1, s) = 1, which means
that H(1, s) ∈ {0, 1} if u(s) < 1.
To see that H is smooth, let f : I → R be a generating function for the
flattened unit interval. The only possible points of non-smoothness are points
where t = 0, 1 and s = 0, 1. The braking function αǫ ensures that H is locally
constant in the tb variable whenever t is near 0 or 1, so no problem arises from
the t component. When s is near s = 0, we have H(t, s) near 0, and so the
generating function f is locally constant. Similarly, when s is near s = 1, we
have H(t, s) near 1, and the generating function f is again locally constant. �
We now show that the product of SNDR pairs is again an SNDR pair.
Theorem 3.1 Let i : A →֒ X and j : B →֒ Y be inclusion mappings. If (X,A)
and (Y,B) are SNDR pairs, then so is
(X × Y, (X ×B) ∪ (A× Y )).
If one of (X,A) or (Y,B) is an SDR pair, then so is the pair
(X × Y, (X ×B) ∪ (A× Y )).
Proof. Let α : R → I be a smooth braking function with the properties that
α(t) = 0 for t ≤ 1
, and α(t) = 1 for t ≥ 3
, and let β : R → R be a smooth
increasing braking function with the properties that β(t) = t for t ≤ 1
, and
β(t) = 1 for t ≥ 3
. Suppose that (u,H) and (v, J) are representations for the
SNDR pairs (X,A) and (Y,B), respectively. Let u : X → I, and v : Y → I be
given by u(x) = β(u(x)) and v(y) = β(v(y)) respectively. Define w : X×Y → I
by w(x, y) = u(x)v(y). The braking function β ensures smoothness of u and
v, and consequently of w. We have w−1(0) = (X × B) ∪ (A × Y ), as required.
Define Q : I×X × Y → X × Y as follows .
Q(t, x, y) =
(H(α(t), x), J(α(t), y)) if u(x) = v(y) = 0
(H(α(t), x), J(α(
)α(t), y)) if v(y) ≥ u(x), v(y) > 0,
(H(α(
)α(t), x), J(α(t), y)) if u(x) ≥ v(y), u(x) > 0.
We must show that Q is a smooth map, with the first coordinate independence
property with respect to the inclusion (X × B) ∪ (A × Y ) →֒ X × Y . We first
consider each part of the definition of Q separately. The first part is clearly
smooth. Let us verify that Q is smooth on the second part of its definition; the
third part is similar.
We need only focus on the component J(α(
)α(t), y). Each function
making up J(α(
)α(t), y) is smooth individually, so we need only pay extra
attention to those parts that involve flattened unit intervals, remembering that
addition and multiplication on the flattened unit interval need not preserve
smoothness, as is the case for the usual unit interval.
So let us consider α(
); it is smooth except possibly when
approaches
0 or 1, since it is here that structure curves on the flattened unit interval need
not be smooth in the usual sense. Clearly, if u(x) approaches 0 and v(y) does
not approach 0, then the braking function α ensures that
= 0 near such
points. If v(y) approaches 0, then u(x) must approach 0 too. This situation is
dealt with later.
Thus, Q, in part two of the definition, is smooth, and one can show similarly
that Q in the third part of the definition is smooth as well.
Let us now consider the overlaps of the three parts of the definition of Q.
Observe that if u(x) is in a sufficiently small neighborhood of v(y), with u(x) 6= 0
and v(y) 6= 0, then we have α(u(x)
) = 1, and so the second and third
parts of the definition of Q coincide here. Thus, it remains only to show that Q
is smooth as u(x) and v(y) both approach 0.
If Q is smooth in each of its coordinates then it is smooth, so consider the
coordinate involving the map J . Let c : R → I×X × Y be a structure that is
given by c(s) = (t(s), x(s), y(s)). Then, the map c1 : R → I× Y , given by
c1(s) =
(α(t(s)), y(s)) if u(x(s)) = v(y(s)) = 0
u(x(s))
v(y(s))
)α(t(s)), y(s)) if v(y(s)) ≥ u(x(s)), v(y(s)) > 0
(α(t(s)), y(s)) if u(x(s)) ≥ v(y(s)), u(x(s)) > 0
is a map satisfying the conditions of Definition 3.2, since its second coordinate is
smooth, but its first coordinate may be singular as v(y(s)) ( and hence u(x(s)))
approaches 0. Since J has the first coordinate independence property, the map
(Joc1)(s) =
J(α(t(s)), y(s)) if u(x(s)) = v(y(s)) = 0
u(x(s))
v(y(s))
)α(t(s)), y(s)) if v(y(s)) ≥ u(x(s)), v(y(s)) > 0
J(α(t(s)), y(s)) if u(x(s)) ≥ v(y(s)), u(x(s)) > 0
is smooth. Thus, Q ◦ c is smooth, and since c is arbitrary, Q is smooth. In a
similar way, the coordinate of Q involving H can be shown to be smooth.
We now verify that Q satisfies the required boundary conditions. When t =
0, all three lines defining Q reduce to (H(0, x), J(0, y)) = (x, y). Let x ∈ A and
y ∈ B; then u(x) = v(y) = 0. Therefore, Q reduces to (H(α(t), x), J(α(t), y)) =
(x, y). If x ∈ A and y /∈ B, then Q is given by the second part of its definition,
which reduces to (H(α(t), x), J(0, y)). The case when x /∈ A and y ∈ B is
similar. If t = 1 and 0 < w(x, y) < 1 then either 0 < u(x) < 1 or 0 < v(y) < 1.
Suppose that 0 < u(x) < 1. Then either u(x) ≤ v(y) or v(y) < u(x). If
u(x) ≤ v(y), then Q is given by the second part of its definition, which reduces
to (H(1, x), J(α(
, y)) ∈ i(A)× Y . If v(y) < u(x), then the third part of the
definition of Q applies and Q reduces to (H(α(
), x), J(1, y)) ∈ X × j(B).
Finally, we must show that for any f ∈ FX×Y , f ◦Q has the first coordinate
independence property with respect to the inclusion (X×B)∪(A×Y ) →֒ X×Y .
To this end, consider a map c : R → I×X×Y , given by c(s) = (t(s), x(s), y(s)).
Let {sn} be a sequence of real numbers converging to s∗ with c(sn) ∈ (X×Y )−
((A× Y ) ∪ (X ×B)), and c(s∗) ∈ (A× Y ) ∪ (X ×B). There are three cases to
consider.
• Suppose that c(s∗) ∈ A×B. Then x(s∗) ∈ A and y(s∗) ∈ B. The fact that
H and J have the first coordinate independence property with respect to
i and j respectively means that each coordinate of Q is smooth, and so Q
is smooth.
• Suppose that c(s∗) ∈ A × Y , and that y(s∗) /∈ B. Then at each of the
points c(sn), (Q ◦ c)(sn) is given by the second part of the definition of
Q, for n large enough. Since x(s∗) ∈ A, the component of Q involving H
is smooth, since H has the first coordinate independence property. For
any s in a neighborhood of s∗, α(
u(x(s))
v(y(s))
) = 0. Thus, the component of
Q involving J is constant for s in a neighborhood of s∗, and so is smooth
there.
• The case with c(s∗) ∈ X ×B, and x(s∗) /∈ A is similar to the second case
above.
For the last part of the theorem, suppose that (u,H) represent (X,A) as an
SDR pair. If we replace u by u′ = 1
u, then (u′, H) also represent (X,A) as an
SDR pair. Making the above constructions now with u′ in place of u, it follows
that w(x, y) < 1 for all (x, y) and so Q(1, x, y) ∈ (X × B) ∪ (A × Y ). This
completes the proof. �
4 Cofibrations
In this section, we show that for a subspace A ⊆ X that is closed in the under-
lying topology, the inclusion i : A → X is a cofibration if and only if (X,A) is
an SNDR pair.
Definition 4.1 Let i : A→ X be a cofibration. We call i a cofibration with
FCIP if any homotopy extension can be chosen to have the FCIP with respect
to i.
Using the equivalent formulation of the notion of cofibration, given by Lemma
2.5, we may restate Definition 4.1 as follows: A cofibration i : A → X is a
cofibration with the FCIP if and only if the map G that we may fill in to
complete the commutative diagram
(0×X) ∪ (I− ×A) h //
I− ×X
may be chosen to have the FCIP with respect to the inclusion i.
We have the following result, which corresponds to a similar topological
result.
Lemma 4.1 A smooth map i : A → X is a cofibration (with the FCIP) if
and only if (0 × X) ∪ (I− × A) is a retract of I− × X, (where the retraction
r : I− ×X → (0×X) ∪ (I− ×A) has the FCIP ).
Proof. In the one direction, suppose that (0 × X) ∪ (I− × A) is a retract of
I− ×X . We wish to complete the following diagram:
(0×X) ∪ (I− ×A) h //
I− ×X
By hypothesis, there exists r : I−×X → (0×X)∪ (I− ×A) such that r ◦ j = 1.
Define G = h ◦ r. If r has the FCIP, then so does h ◦ r.
Conversely, suppose that i : A → X is a cofibration (with the FCIP). We
may find a map r such that the diagram
(0×X) ∪ (I− ×A) 1//
(0 ×X) ∪ (I− ×A)
I− ×X
commutes. Thus, r ◦ j = 1. If i is cofibration with the FCIP with respect to i,
then r can be chosen to have the FCIP. �
The next theorem shows the relationship between cofibrations, retracts and
SNDR pairs.
Theorem 4.1 Let i : A → X be an inclusion, with A closed in the underlying
topology of X. Then the following are equivalent.
(1) The pair (X,A) is an SNDR pair.
(2) There is a smooth retraction r : I− × X → (0 ×X) ∪ (I− × A) with the
FCIP.
(3) The map i : A→ X is a cofibration with the FCIP.
Proof. To show that (1) and (2) are equivalent, note that the pair (I−×X, (0×
X) ∪ (I− × A)) is an SDR pair, as a consequence of Lemma 3.1 and Theorem
3.1. Let (w,Q) be a representation for the pair (I− ×X, (0×X)∪ (I− ×A)) as
an SDR pair, and let Q be constructed as in Theorem 3.1. Define
r : I− ×X → (0×X) ∪ (I− ×A)
by r(t, x) = Q(1, t, x), where (t, x) ∈ I− ×X . We observe that r has the FCIP,
since Q has this property, and Q has this property since each of its components
has this property.
The equivalence of (2) and (3) is Lemma 4.1.
We need only show that (2) implies (1). Let r : I−×X → (0×X)∪ (I−×A)
be a retraction with the FCIP with respect to i. Define H : I × X → X
by H(t, x) = (πX ◦ r)(α(t), x), where πX is the projection onto the second
coordinate, and α : R → R is a braking function with the following properties:
α(t) = 0 for t ≤ 0, α(t) = 1 for t ≥ 3
, and 0 < α(t) < 1 for 0 < t < 3
. This
braking function is necessary to ensure smoothness at the right endpoint of the
flattened unit interval I. Smoothness at the left endpoint is already taken care
of by the fact that r is defined in terms of the left flattened unit interval. The
map H satisfies the following properties:
• H has the FCIP since r has this property.
• H(0, x) = (πX ◦ r)(0, x) = x, for x ∈ X .
• H(t, x) = (πX ◦ r)(α(t), x) = x, for x ∈ A.
We now construct u : X → I. Let πI : I×X → I denote the projection onto I.
Define a smooth function β : R → R by
β(t) =
0 if t ≤ 0
t2 if t > 0.
Now, define u : X → I by
u(x) =
β(α(t) − (πI ◦ r)(1, x)(πI ◦ r)(α(t), x))dt
β(α(t))dt
It is clear that u is a smooth mapping.
We now verify that (u,H) represents (X,A) as an SNDR pair.
(1) Let x ∈ A. Clearly, (πI ◦ r)(1, x) = 1 and πI ◦ r)(α(t), x) = α(t), and so
β(α(t)− (πI ◦ r)(1, x)(πI ◦ r)(α(t), x))dt = 0. Thus, u(x) = 0, for all x ∈ A.
(2) Suppose that x ∈ X−A. Since 0×(X−A) is open in the underlying topology
on (0×X)∪ (I− ×A), we may choose an open neighborhood W ⊆ 0× (X −A)
of (0, x). Since r is continuous, there is a neighborhood V ⊆ I− ×X such that
r(V ) ⊆W ⊆ 0× (X −A). Now, consider the mapping qx : I → I×X , given by
qx(t) = (α(t), x), for each x ∈ X . This is clearly smooth. Thus, there exists a
neighborhood U ⊆ I− such that qx(U) ⊆ V . In other words, U × {x} ⊆ V . So,
we have (πI ◦ r)(α(t), x) = 0, for all t ∈ U . Thus, we have
u(x) =
β(α(t) − (πI ◦ r)(1, x)(πI ◦ r)(α(t), x))dt +
β(α(t))dt
β(α(t))dt
Combining this with part (1), we deduce that u−1(0) = A.
(3) Suppose that x is such that u(x) < 1. There must be a neighborhood U of I
such that (πI◦r)(1, x)(πI ◦r)(α(t), x) > 0, for t ∈ U . Thus (πI◦r)(1, x) > 0, but
this implies that r(1, x) ∈ I×A, and hence H(1, x) ∈ A. The proof is complete.
5 The Mapping Cylinder
In this section we show that the inclusion of X into the flattened mapping
cylinder If of a map f : X → Y is a cofibration with the FCIP.
Theorem 5.1 Let f : X → Y be a smooth map. Then, the pair (If , X) is an
SNDR pair.
Proof. Let α : I → R be a smooth braking function with the following proper-
ties: α(t) = 0 if 0 ≤ t ≤ 1
, α(t) = 1 if 3
≤ t ≤ 1, 0 < α(t) < 1, otherwise. Define
two more braking functions α1, α2 : I → R as follows: α1(0) = 0, 0 < α1(t) < 1
if 0 < t < 3
, α1(t) = 1 if
≤ t ≤ 1, and α2(t) = 0 if 0 ≤ t ≤ 34 , α2(t) = 1
≤ t ≤ 1. Now, define u : If → I by u([t, x]) = α1(t) and u([y]) = 1, for
(t, x) ∈ I×X and y ∈ Y . Define H : I× If → If by
H(s, [t, x]) = [(1 − α(s))t+ α(s)α2(t), x] if (t, x) ∈ I×X
H(s, [y]) = [y] if y ∈ Y .
That u is smooth comes from the fact that it is smooth when restricted to
each component of the coproduct (I×X)⊔Y ; it is thus smooth on the quotient
To see that the map H : I × If → If is smooth, note that since we are
working in a cartesian closed category, products commute with quotients, i.e. if
q is quotient, then so is 1× q, where 1 is an identity map. Thus, we may think
of H as being defined on the space
(I× I×X) ⊔ (I× Y )
where ∼ is the identification (t, 1, x) = (t, f(x)) for t ∈ I, and x ∈ X . Since H is
smooth when restricted to each component of the coproduct (I×I×X)⊔(I×Y ),
H is smooth on the quotient I× If .
We now verify that (u,H) is a representation for (If , X) as an SNDR pair.
• u−1(0) = [0, x] = i0(X).
• H(0, [t, x]) = [t, x] and H(0, [y]) = [y].
• H(s, [0, x]) = [0, x].
• If u[t, x] < 1, then t < 3
and so α2(t) = 0. Thus, H(1, [t, x]) = [0, x].
This completes the proof. �
Finally, we have the following important corollary.
Corollary 5.1 Given any smooth map f : X → Y , the inclusion X →֒ If is a
cofibration with the FCIP.
6 The Exact Sequence of a Cofibration
Our aim in this section is to show how one can use SNDR pairs to prove the
existence of the right exact Puppe sequence. We state the result in Theorem
6.1 and break the proof of the result up into a number of lemmas. We follow
the method used by Whitehead [12] for the topological case.
Throughout this section we work in the category FRL∗ of pointed Fr—’olicher
spaces, and basepoint preserving smooth maps.
Theorem 6.1 Let W be an object in FRL∗, and suppose that i : A →֒ X is a
cofibration in FRL∗. For any basepoint x0 ∈ A ⊆ X there is a sequence
. . . // [
A,W ]
Ti,W ]
X,W ]
A,W ] // . . .
. . . // [
A,W ]
// [Ti,W ]
// [X,W ]
// [A,W ]
which is an exact sequence in SETS∗, where j : X → Ti is the inclusion dis-
cussed in Paragraphe 2.4 and k : Ti →
A is the quotient map defined below.
It is, in fact, possible to prove that the sequence above is an exact sequence
of groups as far as
A,W ] and that the morphisms to this point are group
homomorphisms, but we shall not do so here.
The reduced(flattened)suspension of a pointed Frölicher space X is de-
fined as
X = (I/{0, 1}) ∧X,
where the reduced join is defined as for topological spaces with the identified
set taken as basepoint, and with 0 the basepoint of I.
In this section, whenever we refer to the suspension of a space , we mean
the reduced flattened suspension defined above.
Lemma 6.1 If (x,A) is an SNDR pair and p : X → X/A the quotient map,
then the sequence
i // X
p // X/A
is right exact.
Proof. To show that the given sequence is right exact we must show that for
any Frölicher space W the following sequence is exact in SETS:
[X/A,W ]
// [X,W ]
// [A,W ] .
It is easy to see that im p∗ ⊆ ker i∗. To see the reverse inclusion, let g : X →
W be an element of [X,W ], with g|A ≃ w0 (rel w0), where w0 ∈ W . Since
i // X is an SNDR pair, the map i is a cofibration, and so we may extend
w0 to a smooth map g
′ : X → W such that g′ ≃ g. But g′ is constant on A,
and so there exists a smooth map g1 : X/A → W such that p∗(g1) = g′. This
shows that ker i∗ ⊂ im p∗. �
Lemma 6.2 For any smooth map f : X → Y , the sequence
f // Y
l // Tf
is right exact, where l is the usual inclusion of Y into the mapping cone; i.e.
y 7→ [y] ∈ Tf .
Proof. One can show that there is a homotopy commutative diagram
i ��@
// Tf
where i, j, and l are the usual inclusions, and p is the quotient map that collapses
away {0} ×X to a point. Since, by Theorem 5.1, (If , X) is an SNDR pair, it
follows from Lemma 6.1 that the sequence
i // If
p // Tf
is right exact. It is fairly easy to show that j : Y → If is a homotopy equivalence.
Therefore, the sequence
f // Y
l // Tf
is right exact. �
Lemma 6.3 For any smooth map i : A → X, there is an infinite right exact
sequence
i // X
// Ti
// . . . i
// Tin−2
// Tin−1
// . . .
where in, n ≥ 1, are inclusion maps.
Proof. The pair (Ti, X) is an SNDR pair. The representation for the pair
(If , X) in Theorem 5.1 can be adapted to show this. One iterates the procedure
of Lemmas 6.1 and 6.2. �
One can easily see that there is an isomorphism between Ti/X and
Define q : Ti →
A to be the map which identifies X ⊂ Ti to a point, followed
by the isomorphism Ti/X →
Lemma 6.4 The sequence
// Ti
is right exact.
Proof. As noted above the pair (Ti, X) is an SNDR pair. We have the com-
mutative diagram
// Ti
where p : Ti → Ti/X is the identification map, and q0 : Ti/X →
A is an
isomorphism. The top line of the diagram is right exact, by Lemma 6.1, and so
the sequence
// Ti
is right exact. �
There is a commutative diagram
// Ti
where q1 is a homotopy equivalence. ( See Whitehead [12] for more details of
this map. ) Using commutative diagrams of this form, one can now proceed
almost exactly as one does in the topological situation, as in Whitehead [12] for
example, to get the following infinite right exact sequence:
i // X
// Ti
// . . .
. . . //
// . . .
The definition of right exactness now gives us the exact sequence of Theorem
References
[1] A. Cap, K-Theory for Convenient Algebra, Dissertationen, Faculty of
Mathematics, University of Vienna, 1993.
[2] Cherenack P., Applications of Frölicher Spaces to Cosmology, Ann. Univ.
Sci. Budapest 41(1998), 63-91.
[3] P. Cherenack , Frölicher versus Differential Spaces: A prelude to Cosmol-
ogy, Kluwer Academic Publishers 2000, 391-413.
[4] P. Cherenack, The Left Exactness of the Smooth Left Puppe Sequence.
In L. Tamassy and J. Szenthe, editors, New Developments in Differential
Geometry, (Proceedings of the Colloquium on Differential Geometry, De-
brecen, Hungary, July 26-30, 1994), Mathematics and Its Applications.
Kluwer Academic Publishers, 1996.
[5] P. Cherenack, Smooth Homotopy, Topology with Applications, (18):27-41,
1984.
[6] B. Dugmore, The Right Exactness of the Smooth Right Puppe Sequence.
Master’s Thesis, University of Cape Town, 1996.
[7] A. Frölicher, A. Kriegl, Linear Spaces and Differentiation Theory, J. Wiley
and Sons, New York, 1988.
[8] M.W. Hirsch, Differential Topology, GTM 33, Springer-Verlag, New York,
1976.
[9] I.M. James, General Topology and Homotopy Theory, Springer-Verlag,
Berlin, 1984.
[10] A. Kriegl, P. Michor, Convenient Settings of Global Analysis, Am. Math.
Soc., 1997.
[11] Jet Nestruev, Smooth Manifolds and Observables, Springer-Verlag New
York, Inc., 2003
[12] G.W. Whitehead, Elements of Homotopy Theory, Springer-Verlag, New
York, 1978.
	Preliminaries
	Basic Constructions of Homotopy Theory in FRL
	Our Approach to Homotopy Theory in FRL
	Flattened Structures on the Unit Interval
	Some Properties of Smooth Functions between the Flattened Unit Intervals
	Homotopy in FRL and Related Objects
	Cofibrations in FRL
	Smooth Neighborhood Deformation Retracts
	SNDR pairs and SDR pairs
	Cofibrations
	The Mapping Cylinder
	The Exact Sequence of a Cofibration
ABSTRACT
  Cofibrations are defined in the category of Fr\"olicher spaces by weakening
the analog of the classical definition to enable smooth homotopy extensions to
be more easily constructed, using flattened unit intervals. We later relate
smooth cofibrations to smooth neighborhood deformation retracts. The notion of
smooth neighborhood deformation retract gives rise to an analogous result that
a closed Fr\"olicher subspace $A$ of the Fr\"olicher space $X$ is a smooth
neighborhood deformation retract of $X$ if and only if the inclusion $i:
A\hookrightarrow X$ comes from a certain subclass of cofibrations. As an
application we construct the right Puppe sequence.

<|endoftext|><|startoftext|>
Experimental observation of structural crossover in binary mixtures of colloidal hard
spheres
Jörg Baumgartl1,∗, Roel P.A. Dullens1, Marjolein Dijkstra2, Roland Roth3 and Clemens Bechinger1
12. Physikalisches Institut, Universität Stuttgart, 70550 Stuttgart, Germany
2Soft Condensed Matter Group, Utrecht University, 3584 CC Utrecht, The Netherlands
3Max-Planck-Institut für Metallforschung, 70569 Stuttgart, Germany and
Institut für Theoretische und Angewandte Physik, Universität Stuttgart, 70569 Stuttgart, Germany
Using confocal-microscopy we investigate the structure of binary mixtures of colloidal hard spheres
with size ratio q = 0.61. As a function of the packing fraction of the two particle species, we observe
a marked change of the dominant wavelength in the pair correlation function. This behavior is in
excellent agreement with a recently predicted structural crossover in such mixtures. In addition, the
repercussions of structural crossover on the real-space structure of a binary fluid are analyzed. We
suggest a relation between crossover and the lateral extension of networks containing only equally
sized particles that are connected by nearest neighbor bonds. This is supported by Monte-Carlo
simulations which are performed at different packing fractions and size ratios.
PACS numbers: 82.70.Dd, 61.20.-p
Most systems in nature and technology are mixtures
of differently sized particles. Each distinct particle size
introduces another length scale and its competition gives
rise to an exceedingly rich phenomenology in compari-
son with single-component systems. Already the simplest
conceivable multi-component system, i.e. a binary mix-
ture of hard spheres, exhibits interesting and complex
behavior. Just a few examples include entropy driven
formation of binary crystals [1, 2, 3], frustrated crys-
tal growth [4], the Brazil nut effect [5], glass-formation
[6, 7] and entropic selectivity in external fields [8]. Al-
though interaction potentials in atomic systems are more
complex than those of hard spheres, the principle of vol-
ume exclusion is ubiquitous and thus always dominates
the short-range order in liquids [9]. Accordingly, hard
spheres form one of the most important and successful
model systems in describing fundamental properties of
fluids and solids. It has been demonstrated that many
of their features can be directly transferred to atomic
systems where fundamental mechanisms are often ob-
structed by additional material specific effects [10]. Bi-
nary hard sphere systems are fully characterized by their
size ratio q = σS/σB with σi the diameters of the small
(S) and big (B) spheres and the small and big sphere
packing fractions ηS , ηB , respectively.
The pair-correlation functions, gij(r), are the central
measure of structure in fluids; they describe the probabil-
ity of finding a particle of size i at distance r from another
particle of size j. It is well known that all pair-correlation
functions in any fluid mixture with short-ranged inter-
actions (not just hard spheres) exhibit the same type
of asymptotic decay, which can be either purely (mono-
tonic) exponential or exponentially damped oscillatory
([11] and references therein). This prediction, which is
valid in all dimensions, suggests that all pair-correlation
functions decay with a common wavelength and decay
length in the asymptotic limit. For binary hard-sphere
mixtures where ηB � ηS or ηS � ηB , this is rather obvi-
ous since the system is dominated by either big or small
particles. The pair-correlation functions will asymptot-
ically oscillate with a wavelength determined either by
σB (ηB � ηS) or σS (ηS � ηB). Rather surprising is
that the above statement is also valid for all other rela-
tive packing fractions where the system is not dominated
by particles of a single size ([11, 12]). Accordingly, in the
asymptotic limit the (ηS , ηB) phase diagram is divided
by a sharp crossover line where the decay lengths of the
contributions to gij(r) with the two wavelengths become
identical. Below and above this line, however, the pair-
correlation function is either determined by the diameter
of the small spheres or that of the big spheres [13].
Despite the generic character of structural crossover
and the close relationship between structural and me-
chanical properties, this effect has not been observed in
experiments as the asymptotic limit is difficult to reach
in scattering experiments on atomic and molecular liq-
uids. However, recent calculations suggest that struc-
tural crossover is already detectable at relatively small
distances [12]. Because colloidal particles are directly
accessible in real space, such systems provide an oppor-
tunity to explore the structure of binary fluids and to
investigate structural crossover experimentally.
As colloidal suspension we used an aqueous binary
mixture of small melamin particles (σS = 2.9µm) and
big polystyrene spheres (σB = 4.8µm). Addition of
salt screens residual electrostatic interactions thus lead-
ing to an effective hard sphere system. Since melamin
has a higher density (ρM = 1.51g/cm3) than polystyrene
(ρP = 1.05g/cm3) the sedimentation velocities are sim-
ilar and, hence, we obtain a homogeneous system after
mixing. The suspension was contained in a cylindrical
sample cell with a silica bottom plate to allow optical
imaging with an inverted confocal microscope in reflec-
tion mode (Leica TCS SP2). From the images, particle
positions were obtained with digital video microscopy
[14]. Strong layering at the bottom wall allowed us to
image only the first two-dimensional bottom layer of the
three-dimensional system. We define the packing fraction
Figure 1: Different paths with constant total packing fraction
η = ηS + ηB in the (ηS , ηB)- plane. Experimental data (open
symbols: η = 0.72, q = 0.61) are sorted into ten bins. The
bin size is indicated by the ’error bars’. Closed symbols cor-
respond to the MC-simulations (N: η = 0.62, q = 0.4) and
(•: η = 0.57, q = 0.5). For convenience all samples are la-
beled with numbers increasing in the direction indicated by
the arrows.
as ηi = πσ2i /4, with ρi the number density of component
i. Variation of the relative packing fractions of the par-
ticles was achieved by addition of small particles to a
suspension of big spheres (Fig.1). Thus, the total pack-
ing fraction in the two-dimensional bottom layer remains
constant for all samples: η = 0.72. In the following we
will refer to the different samples by the sample numbers
(No.) as given in Fig.1.
Typical snapshots of the system for different packing
fractions of big and small particles are shown in Figs.2A-
C. The images demonstrate how the structure of the bot-
tom layer changes from being rich in small particles (No.
1, Fig.2A) to being rich in big particles (No. 10, Fig.2C).
Fig.2B (No. 5) corresponds to about the same number
density of small and big spheres. In order to analyze the
samples for a possible structural crossover, we calculated
the pair correlation function from the determined particle
positions. To minimize statistical noise we did not distin-
guish between big and small spheres. This is justified be-
cause the crossover has been predicted to be visible in all
pair-correlation functions and thus also in any linear com-
bination [11, 12]. The dominating wavelength in the os-
cillations is identified by computing the total correlation
function htot(r) =
i,j xixjhij(r) =
ij xixj [gij(r)−1],
with the mole fraction xi = ρi/
i ρi of component i [12].
Fig.2D exemplarily shows ln |htot(r)| for samples No. 1,5,
and 9. Note that in this representation the oscillation
wavelength is halved. The correlation functions of sam-
ples No.1 and 9 clearly oscillate with a single wavelength,
respectively, given by ≈ σB/2 and ≈ σS/2. In contrast,
sample 5 does not show a dominating wavelength but
an interference of different length scales which is typical
near the structural crossover. It is important to mention,
Figure 2: A-C) Typical snapshots of the bottom layer of a
binary mixture observed with a confocal microscope used in
reflection mode. The mixtures correspond to sample 10 (A), 5
(B) and 1 (C). The field of view is 40×40µm2. D) Logarithmic
plot of the total correlation functions htot(r) for the experi-
mental binary mixtures with η = 0.72 ± 0.04. Correlation
functions are plotted for sample numbers 1,5 and 9 (compare
Fig.1) and are shifted in vertical direction for clarity. The hor-
izontal bars correspond to σB/2 and σS/2, respectively. E)
Fourier-transforms of htot(r) for the experimental data points
(compare Fig. 1). Vertical lines indicate the wave vectors k
corresponding to the diameters of the small (S) and big par-
ticles (B), respectively. (color online).
that this intermediate behavior is only observed for sam-
ples No.5 and 6, i.e. only for about 10% of the entire
range over which ηB and ηS was varied. The experimen-
tally identified crossover-region is in excellent agreement
with the theoretically calculated value of ηS ≈ 0.3 at
those size ratios, which were determined from the decay
of the pair correlation functions calculated within density
functional theory in the test particle limit [15]. Fig.2E
Figure 3: Visualization of the different bond-types as determined by a Delaunay triangulation: big-big (black), big-small
(yellow) and small-small (red). Different plots correspond to the sample numbers as indicated in Fig.1. The field of view is
180× 180µm2.
shows the Fourier transforms of htot(r) for all samples
where the rather sudden change of the dominating wave-
length is seen more clearly [16]. At small and high pack-
ing fractions, the correlations are clearly dominated by
frequencies corresponding to either small or large parti-
cles (vertical lines) while around sample No.5 hardly any
dominating frequency is observed. This experimentally
confirms structural crossover as well as its occurrence at
finite particle distances.
So far, structural crossover has been discussed in terms
Figure 4: Averaged radii of gyration 〈Rig〉 (normalized to L/2
with L2 the size of the field of view) of networks formed by
large (solid symbols) and small particles (open symbols) as a
function of the sample number for A) the experimental data,
B) the MC-simulations at η = 0.57 and q = 0.5 and, C) the
MC-simulations at η = 0.62 and q = 0.4. The correspond-
ing packing fraction of small particles ηS is indicated as well.
The grey area and the dashed line respectively indicate the
crossover as inferred from the correlation functions and from
density functional theory. (color online).
of pair correlation functions, i.e. spatially averaged quan-
tities. Since our experiments naturally provide detailed
structural information, we investigate what the reper-
cussions are of the structural crossover on the real-space
structure. We first subjected a Delaunay triangulation to
the set of particle centers and identified nearest-neighbor
bonds between big-big (black), big-small (yellow), and
small-small (red) particles, respectively (see Fig.3). As
observed in Fig.3, sample 1 predominantly consists of big-
big bonds which form a large network spreading across
the entire field of view. With increasing sample No., i.e.
increasing ηS , the number of small-small bonds increases,
which leads to fragmentation of the big-big network into
smaller, randomly distributed patches. At large sample
numbers, the role of big and small particles is inverted
and small-small bonds form a network spanning the en-
tire area (No.10). Having distinguished between differ-
ent bond-types, a natural and well-known measure of the
spatial extend of a network formed by ni particles of size
i at positions ~xik (k = 1 . . . n
i) is given by the radius of
gyration Rig =
k=1(~x
k − ~R
2, with ~Ri0 the cen-
troid position of the network. Computing this quantity
for all, say N iC , networks formed by connected particles
of size finally yields a weighted averaged radius of gyra-
tion 〈Rig〉 =
m=1 ni(m)R
g(m) where N
i denote the
total number of particles i. We calculated 〈Rig〉 for net-
works consisting of connected big or small particles and
plotted these values for our experimental data in Fig.4A
as a function of the sample number. At small and high
sample numbers the quantities saturate while a relatively
sharp transition with an intersection point occurs around
sample 6. This location is indeed in very good agreement
with the crossover transition as determined from the cor-
relation functions in Fig.2 and density functional theory
(also indicated in Fig. 4A). This suggests that the struc-
tural crossover corresponds to a competition between the
sizes of networks consisting of connected big or small par-
ticles, respectively.
As structural crossover is also predicted for other size
ratios and packing fractions, we use Monte-Carlo (MC)
simulations to test our findings for more dilute systems
with size ratios q = 0.5 and q = 0.4. The corresponding
paths through the phase diagram (see closed symbols in
Fig.1) were obtained from 2-dimensional simulations with
a fixed number of particles of about 0 < N < 3000 for
both species and box areas of about 1500σ2B employing
periodic boundary conditions. From the configurational
snapshots we first determined the region of crossover by
analyzing htot(r) (the correlation functions are sampled
using 104 MC cycles per particle). Then, we performed
the above described Delaunay triangulation to calculate
〈Rig〉 for networks of connected big or small particles, re-
spectively. The corresponding radii of gyration are plot-
ted in Fig.4B and C and show a similar behavior as in the
experiment. Again, the intersection points are consistent
with the crossover region as inferred from the correlation
functions and DFT calculations. Note that the crossover
region sensitively depends on the size ratio and packing
fractions. Both the experiment and Monte-Carlo sim-
ulations show that structural crossover is accompanied
by a pronounced change in the typical size of networks
consisting of connected big and small particles. By in-
troducing small particles into a system of big spheres,
connections between big particles are broken and, at the
same time, connections between small particles are made.
This sensitively affects the typical size of networks con-
taining connected, equally-sized particles and thereby the
chance of finding another particle with the same size at a
relatively large distance. Consequently, the change from
〈RBg 〉 > 〈RSg 〉 to 〈RSg 〉 > 〈RBg 〉 (and vice versa) provides
a simple real-space argument why the oscillation wave-
length of the gij(r) in the asymptotic limit is either set
by σB or σS .
We have experimentally demonstrated the structural
crossover in a binary colloidal hard sphere system. Fur-
thermore, we show that structural crossover is strongly
coupled to the size of networks containing connected
equally-sized particles only. Going across the structural
crossover, the size ratio of such networks comprised by
either connected big or small particles is reversed. We be-
lieve this real-space configurational picture of structural
crossover is not just applicable to binary hard spheres,
as structural crossover is a generic feature of mixtures
with competing length scales. Moreover, it shows inter-
esting similarities with force chains in granular matter
[17] and glassy systems [6, 7, 18] of dissimilar sized parti-
cles. Therefore, our finding may help to gain more insight
into structure-related properties in binary systems at an
universal level.
∗Electronic address: j.baumgartl@physik.uni-
stuttgart.de
[1] P. Bartlett, R. H. Ottewill and P. N. Pusey, Phys. Rev.
Lett. 68, 3801 (1992).
[2] A. B. Schofield, Phys. Rev. E 64, 51403 (2001).
[3] M. D. Eldrige, P. A. Madden and D. Frenkel, Nature
365, 35 (1993).
[4] V. W. A. de Villeneuve, R. P. A. Dullens, D. G. A. L.
Aarts, E. Groeneveld, J. H. Scherff, W. K. Kegel and
H. N. W. Lekkerkerker, Science 309, 1231 (2005).
[5] D. C. Hong, P. V. Quinn and S. Luding, Phys. Rev. Lett.
86, 3423 (2001).
[6] T. Eckert and E. Bartsch, Phys. Rev. Lett. 89, 125701
(2002).
[7] D. N. Perera and P. Harrowell, Phys. Rev. E 59, 5721
(1999).
[8] R. Roth and D. Gillespie, Phys. Rev. Lett. 95, 247801
(2005).
[9] S. Sastry, T. M. Truskett, P. G. Debenedetti, S. Torquato
and F. H. Stillinger, Mol. Phys. 95, 289 (1998).
[10] W. Poon, P. Pusey and H. N. W. Lekkerkerker, Physics
World April, 27 (1996).
[11] C. Grodon, M. Dijkstra, R. Evans and R. Roth, J. Chem.
Phys. 121, 7869 (2004).
[12] C. Grodon, M. Dijkstra, R. Evans and R. Roth, Mol.
Phys. 103, 3009 (2004).
[13] For very asymmetric size ratios, i.e. q < 0.3, there can be
additional regions in which oscillations at intermediate
wavelength can be observed.
[14] J. C. Crocker and D. G. Grier, J. Colloid Interface Sci.
179, 298 (1996).
[15] R. Roth, R. Evans and S. Dietrich, Phys. Rev. E 62, 5360
(2000).
[16] In two dimensions for radial symmetric functions the
Fouriertransform becomes a Besseltransform. However,
for the identification of the dominant wavelength the
usual Fouriertransform, which is numerically easier to
handle, predicts equivalent results
[17] C. S. O’Hern, S. A. Langer, A. J. Liu and S. R. Nagel,
Phys. Rev. Lett. 86, 111 (2001).
[18] N. Hoffman, F. Ebert, C. N. Likos, H. Löwen and G.
Maret, Phys. Rev. Lett. 97, 078301 (2006).
	References
ABSTRACT
  Using confocal-microscopy we investigate the structure of binary mixtures of
colloidal hard spheres with size ratio q=0.61. As a function of the packing
fraction of the two particle species, we observe a marked change of the
dominant wavelength in the pair correlation function. This behavior is in
excellent agreement with a recently predicted structural crossover in such
mixtures. In addition, the repercussions of structural crossover on the
real-space structure of a binary fluid are analyzed. We suggest a relation
between crossover and the lateral extension of networks containing only equally
sized particles that are connected by nearest neighbor bonds. This is supported
by Monte-Carlo simulations which are performed at different packing fractions
and size ratios.

<|endoftext|><|startoftext|>
Introduction
	New data / new sources
	Conclusions
ABSTRACT
  The present status and understanding of the "spectral sequence" of blazars is
discussed in the perspective of the upcoming GLAST launch. The vast improvement
in sensitivity will allow to i) determine more objectively the "average"
gamma-ray properties of classes objects ii) probe more deeply the ratio between
accretion power and jet power in different systems.

<|endoftext|><|startoftext|>
epl draft
A High Robustness and Low Cost Model for Cascading Failures
Bing Wang and Beom Jun Kim
Department of Physics, BK21 Physics Research Division, and Institute of Basic Science, Sungkyunkwan University,
Suwon 440-746, Korea
PACS 89.75.Hc – Networks and genealogical trees
PACS 05.10.-a – Computational methods in statistical physics and nonlinear dynamics
PACS 89.20.Hh – World Wide Web, Internet
PACS 89.75.Fb – Structures and organization in complex systems
Abstract. - We study numerically the cascading failure problem by using artificially created
scale-free networks and the real network structure of the power grid. The capacity for a vertex
is assigned as a monotonically increasing function of the load (or the betweenness centrality).
Through the use of a simple functional form with two free parameters, revealed is that it is indeed
possible to make networks more robust while spending less cost. We suggest that our method to
prevent cascade by protecting less vertices is particularly important for the design of more robust
real-world networks to cascading failures.
The network robustness has been one of the most central
topics in the complex network research [1]. In scale-free
networks, the existence of hub vertices with high degrees
has been shown to yield fragility to intentional attacks,
while at the same time the network becomes robust to
random failures due to the heterogeneous degree distribu-
tion [2–5]. On the other hand, for the description of dy-
namic processes on top of networks, it has been suggested
that the information flow across the network is one of the
key issues, which can be captured well by the betweenness
centrality or the load [6].
Cascading failures can happen in many infrastructure
networks, including the electrical power grid, Internet,
road systems, and so on. At each vertex of the power
grid, the electric power is either produced or transferred
to other vertices, and it is possible that from some reasons
a vertex is overloaded beyond the given capacity, which is
the maximum electric power the vertex can handle. The
breakdown of the heavily loaded single vertex will cause
the redistribution of loads over the remaining vertices,
which can trigger breakdowns of newly overloaded ver-
tices. This process will go on until all the loads of the
remaining vertices are below their capacities. For some
real networks, the breakdown of a single vertex is suffi-
cient to collapse the entire system, which is exactly what
happened on August 14, 2003 when an initial minor distur-
bance in Ohio triggered the largest blackout in the history
of United States in which millions of people suffered with-
out electricity for as long as 15 hours [7]. A number of as-
pects of cascading failures in complex networks have been
discussed in the literature [8–16], including the model for
describing cascade phenomena [8], the control and defense
strategy against cascading failures [9, 10], the analytical
calculation of capacity parameter [11], and the modelling
of the real-world data [12]. In a recent paper [16], the cas-
cade process in scale-free networks with community struc-
ture has been investigated, and it has been found that
a smaller modularity is easier to trigger cascade, which
implies the importance of the modularity and community
structure in cascading failures.
In the research of the cascading failures, the following
two issues are closely related to each other and of signif-
icant interests: One is how to improve the network ro-
bustness to cascading failures, and the other particularly
important issue is how to design manmade networks with
a less cost. In most circumstances, a high robustness and a
low cost are difficult to achieve simultaneously. For exam-
ple, while a network with more edges are more robust to
failures, in practice, the number of edges is often limited
by the cost to construct them. In brevity, it costs much to
build a robust network. Very recently, Schäfer et. al. pro-
posed a new proactive measure to increase the robustness
of heterogeneous loaded networks to cascades. By defin-
ing the load dependent weights, the network turns to be
more homogeneous and the total load is decreased, which
means the investment cost is also reduced [15]. In the
present Letter, for simplicity, we try to find a possible way
of protecting networks based on the flow along shortest-
http://arxiv.org/abs/0704.0345v1
B. Wang B.J. Kim
l/lmax
This work
ML model in Ref.[8]
Fig. 1: The capacity c is assigned as c = λ(l)l with the initial
load l. The step function λ(l) = 1 + αΘ(l/lmax − β) with two
free parameters α and β is used in our model. For comparison,
the curve for the Motter-Lai (ML) capacity model in Ref. [8],
where λ(l) = constant, is also shown.
hop path, first proposed by Motter-Lai [8]. Through the
use of our improved capacity model, we numerically exam-
ine the cascades in scale-free networks and the electrical
power grid network. Since for heterogeneously loaded net-
works, overload avalanches can be triggered by the failure
of only one of the most loaded vertices, the following re-
sults are all based on the removal of one vertex with the
highest load. Our results suggest that networks can indeed
be made more robust while spending less cost.
We first construct the Barabási-Albert (BA) scale-free
network [17] of the size N = 5000 with the average degree
〈k〉 ≈ 4 to study the cascading failures. The BA network is
characterized by the degree distribution p(k) ∼ k−γ with
the degree exponent γ = 3, and it has been shown that the
load distribution also exhibits the power-law behavior [6],
which means that there exist a few vertices with very large
loads.
The betweenness centrality for each vertex, defined as
the total number of shortest paths passing through it, is
used as the measure of the load and computed by using
the efficient algorithm [18]. The capacity cv for the vertex
v is assigned as
cv = λ(lv)lv, (1)
where lv is the initial load without failed vertices. Al-
though it should be possible to find, via a kind of the
variational approach, the optimal functional form of λ(lv)
which gives rise to the lower cost and the higher robust-
ness (see below for the definitions of the two) we in this
work simplify λ(lv) as shown in Fig. 1:
λ(lv) = 1 + αΘ(lv/lmax − β), (2)
where Θ(x) = 0(1) for x < 0(> 0) is the Heaviside step
function, lmax = maxv lv, and we use α ∈ [0,∞) and β ∈
[0, 1] as two control parameters in the model. In Ref. [8]
a constant λ has been used (see Fig. 1 for comparison),
which corresponds to the limiting case of β = 0 with the
identification λ = 1 + α in our model.
At the initial time t = 0, the vertex with the highest
load is removed from the network, and then new loads
for all other vertices are recomputed.1 We then check the
failure condition cv < lv(t) for each vertex, and remove
all overloaded vertices to get the network at t + 1. The
above process continues until all existing vertices fulfill the
condition cv > lv(t), and the size of the giant component
N ′ at the final stage is measured. The relative size of the
cascading failures is conveniently captured by the ratio [8]
, (3)
which we call the robustness from now on. For networks
of homogeneous load distributions, the cascade does not
happen and g ≈ 1 has been observed [8]. Also for net-
works of scale-free load distributions, one can have g ≈ 1
if randomly chosen vertices, instead of vertices with high
loads, are destroyed at the initial stage [8].
In general, one can split, at least conceptually, the to-
tal cost for the networks into two different types: On the
one hand, there should be the initial construction cost to
build a network structure, which may include e.g., the cost
for the power transmission lines in power grids, and the
cost proportional to the length of road in road networks.
Another type of the cost is required to make the given
network functioning, which can be an increasing function
of the amount of flow and can be named as the running
cost. For example, we need to spend more to have big-
ger memory sizes and faster network card and so on for
the computer server which delivers more data packets. In
the present Letter, we assume that the network structure
is given, (accordingly the construction cost is fixed), and
focus only on the running cost which should be spent in
addition to the initial construction cost.
Without consideration of the cost to protect vertices,
the cascading failure can be made never to happen by
assigning extremely high values to capacities. However,
in practice, the capacity is severely limited by cost. We
expect the cost to protect the vertex v should be an in-
creasing function of cv, and for convenience define the cost
λ(lv)− 1
/N. (4)
It is to be noted that for a given value of α, the original
Motter-Lai (ML) capacity model in Ref. [8] has always a
higher value of the cost than our model (see Fig. 1). Al-
though e = 0 at β = 1, it should not be interpreted as
a costfree situation; we have defined e only as a relative
measure in comparison to the case of λ(l) = 1 for all ver-
tices. For a given network structure, the key quantities
to be measured are g(α, β) and e(α, β), and we aim to in-
crease g and decrease e, which will eventually provide us
1In real situations of failures, the initial breakdown can happen
at any vertex in the network. However, the eventual scale of dam-
ages must be greater when a heavily loaded vertex is broken, and
accordingly we in this work restrict ourselves to the worst case when
the vertex with the highest load is initially broken.
A High Robustness and Low Cost Model for Cascading Failures
 0.002  0.003  0.004
α =1.00
 =0.30
 =0.25
 =0.20
 =0.15
 =0.10
 0.002  0.003  0.004
(b) α =0.30
 =0.25
 =0.20
 =0.15
 =0.10
 0  0.2  0.4  0.6  0.8  1
α =0.30
 =0.25
 =0.20
 =0.15
 =0.10
Fig. 2: Cascading failures in the BA network of the size N =
5000 and the average degree 〈k〉 ≈ 4, triggered by the removal
of a single vertex with the highest load. The robustness g
and the cost e in Eqs. (3) and (4) are shown in (a) and (b),
respectively, as functions of β at various α values [see Fig. 1 for
α and β, the two parameters in the function λ(l) in Eq. (2)].
(c) The relation between e and g at different α’s. Compared
with the ML model in Ref. [8], it is clearly shown that the
network can be made more robust but with less cost.
a way to achieve the high robustness and the low cost at
the same time.
In Fig. 2(a), we report the robustness g for the BA net-
work of the size N = 5000 with the average degree 〈k〉 ≈ 4
as a function of β at α = 0.10, 0.15, 0.20, 0.25, 0.30, and
1.0 (from bottom to top). As β increases further beyond
the region in Fig. 2(a), the robustness g is found to de-
crease toward zero (not shown here), which is as expected
since the larger β makes vertices with larger loads less pro-
tected (see Fig. 1). We also skip in Fig. 2 small values of
β below approximately 0.001: If β < lmin/lmax, with the
minimum load lmin, all vertices are given λ(l) = 1 + α,
equivalent to the ML model corresponding to β = 0. It
is shown in Fig. 2(a) that for α . 0.30, g first increases
and then decreases as β is increased, exhibiting a well-
developed maximum gmax at β = β
∗. This is a partic-
ularly interesting observation since the network becomes
more robust (larger g) by protecting less vertices (larger
β). In more detail, the curve for α = 0.20 in Fig. 2(a)
shows the maximum gmax ≈ 0.62 (at β
∗ ≈ 0.00133), which
is about 3.5 times bigger than g ≈ 0.175 (at β = 0). In
other words, the network can be made much more robust
by assigning smaller capacities to vertices with less loads.
For larger values of α, on the other hand, it is found that
gmax occurs at β = 0, which indicates that the above find-
ing, i.e., possibility of making network more robust by
protecting less vertices, does not hold, as exemplified by
the curve for α = 1 in Fig. 2(a).
The above observation is closely related with Ref. [9],
where it has been found that in order to reduce the size
of cascades (or to have a larger g), some of less loaded
vertices should be removed just after the initial attack. In
reality, however, we believe that the direct application of
this strategy of intentional breakdowns is not easy, for cas-
cading failures usually propagate across the whole network
very soon just after the initial breakdown. In contrast, we
propose in this work a way to make the network better
prepared to breakdowns, by protecting less vertices.
In order to look at the cost benefit of protecting less
vertices in a more careful way, we plot in Fig. 2(b) the
cost e in Eq. (4) versus β at various values of α. As is
expected from Fig. 1, the cost e is shown to be a mono-
tonically decreasing (increasing) function of β (α) at fixed
α (β). Take again the case with α = 0.20 as an exam-
ple with e(β∗) ≈ 0.153 and e(β = 0) = 0.2: It is then
concluded that for α = 0.2 one can make the network
3.5 (≈ 0.62/0.175) times more robust while spending only
76.5% (≈ 0.153/0.2) of the original cost.
In Fig. 2(c), we use the same data as in Fig. 2(a) and
(b), and show the relation between the robustness and
the cost for α = 0.10, · · · , 0.30 from bottom to top. For
comparison, the values (g,e) for β = 0, corresponding to
the ML model, are also displayed as symbols at the end
of curves. It is clearly shown that for a given α, one can
achieve the higher robustness and the lower cost by tuning
β toward the right-most point on each curve. We can also
use Fig. 2(c) to choose the most efficient way to get a
given robustness g: For example, suppose that g = 0.6
is the required robustness. The vertical line for g = 0.6
crosses several different curves, and one can choose the
crossing point which has the lowest cost.
We next study the cascading failures in the real net-
work structure of the North American power grid of the
size N = 4941 [19]. Although the electrical power grid
network is a very homogeneous network in terms of the
degree distribution, the load distribution, in a sharp con-
trast, shows a strong heterogeneity as shown in Fig. 3. In
other words, the degree distribution is more like an ex-
ponential one, while the load distribution is similar to the
power-law form. The broad load distribution can be one of
the reasons of the fragility of the power grid to cascading
failures [8].
We then apply, the same method as we used above, to
the power grid, and obtain g and e as functions of β for
B. Wang B.J. Kim
104 105 106
0 5 10 15 20
Fig. 3: The cumulative load distribution of power grid network
P (l) in log-log scale. The inset shows the cumulative degree
distribution P (k) of the power grid in linear-log scale.
 0  0.01  0.02  0.03  0.04  0.05
α =1.0
 =0.8
 =0.4
 =0.2
 =0.1
 0  0.01  0.02  0.03  0.04  0.05
α =1.0
 =0.8
 =0.4
 =0.2
 =0.1
 0  0.2  0.4  0.6  0.8
α =1.0
 =0.8
 =0.4
 =0.2
 =0.1
Fig. 4: Cascading failures in the electrical power grid of the size
N = 4941. (Compare with Fig. 2 for the corresponding plots
for the BA network.) The robustness g and the cost e versus
β at various α values are shown in (a) and (b), respectively,
while (c) is for the relation between e and g. Again, it is
shown that one can achieve the higher robustness and the less
cost simultaneously, by choosing the right-most point in (c).
0.001 0.002 0.003 0.004
=0.0   
  =0.2   
  =0.4   
  =0.6   
  =0.8   
  =1.0   
Fig. 5: Cascading failures in the BA network of the size N =
5000 and the average degree 〈k〉 ≈ 4, triggered by the removal
of a single vertex with the highest load. Each vertex’s capacity
is disturbed with probability ε for α = 0.2. The data are
averaged over 20 runs.
given values of α. Figure 4 for the cascading failures of
the power grid is in parallel to Fig. 2 for the BA network:
Fig. 4(a) for g versus β, (b) for e versus β, and (c) for e
versus g. There are some quantitative differences between
curves for the power grid and the BA network. However,
qualitatively speaking, both networks are shown to ex-
hibit the following common features: (i) For a given α,
the robustness has a maximum gmax at β = β
∗, (ii) e is a
monotonically decreasing function of β at a given α, and
(iii) there exists a lob-like structure in the g-e plane, which
indicates that one can make the network exhibit a higher
robustness and a lower cost at the same time than the cor-
responding values for the ML model. It is worth mention-
ing that the power grid in Fig. 4 can be made to show the
higher g and the lower e than the ML model in a broader
region of α: Even at α = 1, the power grid can have much
better robustness and much less cost in comparison to the
ML model. Specifically, at α = 1.0 the ML model has
g ≈ 0.40 and e = 1.0 while our model can yield g ≈ 0.73
and e ≈ 0.26 (at β ≈ 0.00583) [see Fig. 4(c)], which occurs
when only 26% of vertices are given the higher capacity
λ(l) = 2, and the other remaining 74% of vertices have
the lower capacity λ(l) = 1. In other words, by assigning
lower capacities to 74% of vertices, the network becomes
much more robust.
In reality, it is also interesting to observe the effect of
noise on the dynamical process. In Ref. [20], when noise
is introduced into the nonlinear dynamical system, it has
been shown that noise changes the singularity at a special
time to a statistical time distribution and shows various in-
teresting behaviors. In the present work, we are interested
in how the presence of noise influences the final cascading
failure behavior within our scheme. Here, we introduce
A High Robustness and Low Cost Model for Cascading Failures
effects of noise as an erroneous assignment of the capac-
ity function. In detail, at a given error probability ε, the
vertex v is assigned the capacity c′v instead of its correct
c′v = cv(1 + r), (5)
where r is the uniform random variable with zero mean
(r ∈ [−1, 1]). We believe that this erroneous behavior
is plausible in reality, since the perfect knowledge for the
true value of the load for each vertex may not be available,
which may cause an erroneous assignment of the capacity
on a vertex. In the limiting case of ε = 0, we recover our
error-free results presented above. In Fig. 5, we report the
results at α = 0.2 for the robustness g for the BA network
as a function of β for different error probability ε [see
Fig.2(a) for comparison]. It is seen that for small ε, the
overall behavior is qualitatively the same as in Fig. 2(a),
i.e., the existence of a well-developed robustness peak and
gradual decrease as β is increased. The peak height of the
robustness is found to decrease as ε is increased, indicating
the negative effect of the noise. An interesting observation
in Fig. 5 is that as ε becomes larger there exits a region
of β in which the robustness is actually higher than the
error-free case of ε = 0.
In summary, we have suggested a new capacity model to
cascading failures, by improving the existing ML capacity
model in Ref. [8]. The main idea in our model is the same
as in existing studies: In a highly heterogeneous network
with a broad load distribution, vertices with large loads
should be more protected by assigning large capacities.
Different from other studies in which the capacity is as-
signed in proportion to the load, i.e., c = λl, we generalize
the model so that the proportionality constant λ is now
changed to an increasing function λ(l) of l. In more detail,
we use the Heaviside step function for λ(l) characterized
by two parameters, the step height α, and the step posi-
tion β. By applying this capacity model to the artificial
BA network as well as the real network of the power grid,
we have clearly shown that it is indeed possible to make
the network more robust, while at the same time the cost
to assign capacities is drastically reduced. We believe that
our suggested model to assign capacities to vertices should
be practically useful in designing infrastructure networks
in an economic point of view. As a final remark, it needs
to be pointed out that the model proposed in this work
should be considered as only the first step to find the op-
timal functional form λ(l) of the capacity as a function of
the load. As a future work, we are planning to apply a
sort of variational method to find the optimal functional
form of λ(l).
B.J.K. was supported by grant No. R01-2005-000-
10199-0 from the Basic Research Program of the Korea
Science and Engineering Foundation.
REFERENCES
[1] Pastor-Satorras R. and Vespignani A., Evolution and
Structure of the Internet: A Statistical Physics Ap-
proach (Cambridge University Press, Cambridge, Eng-
land, 2004); Albert R. and Barabási A.-L., Rev. Mod.
Phys., 74 (2002) 47; Dorogovtsev S.N. and Mendes
J.F.F., Adv. Phys., 51 (2002) 1079; Newman M.E.J.,
SIAM Rev., 45 (2003) 167.
[2] Albert R., Jeong H. and Barabási A.-L., Nature, 406
(2000) 378.
[3] Cohen R., Erez K., ben-Avraham D. and Havlin S.,
Phys. Rev. Lett., 85, (2000) 4626; ibid. 86 (2001) 3682.
[4] Holme P., Kim B.J., Yoon C.N. and Han S.K., Phys.
Rev. E, 65 (2002) 056109.
[5] Wang B., Tang H.W., Guo C.H. and Xiu Z.L., Physica
A, 363, (2006) 591; Wang B., Tang H.W., Guo C.H.,
Xiu Z.L. and Zhou T., ibid. 368 (2006) 607.
[6] Goh K.I., Kahng B. and Kim D., Phys. Rev. Lett., 87
(2001) 278701.
[7] U.S.-Canada Power System Outage Task Force, Final
Report on the August 14th blackout in the United States
and Canada: Causes and Recommendations (United
States Department of Energy and National Resources
Canada, April 2004.)
[8] Motter A.E. and Lai Y.C., Phys. Rev. E, 66 (2002)
065102(R).
[9] Motter A.E., Phys. Rev. Lett., 93 (2004) 098701.
[10] Hayashi Y. and Miyazaki T., cond-mat/0503615.
[11] Zhao L., Park K. and Lai Y.C., Phys. Rev. E, 70, (2004)
035101(R); Zhao L., Park K., Lai Y.C. and Ye N., ibid.
72 (2005) 025104.
[12] Kinney R., Crucitti P., Albert R. and Latora V., Eur.
Phys. J. B, 46 (2005) 101.
[13] Crucitti P., Latora V. and Marchiori M., Phys. Rev. E,
69 (2004) 045104(R).
[14] Holme P. and Kim B.J., Phys. Rev. E, 65 (2002) 066109.
[15] Schäfer M., Scholz J. and Greiner M., Phys. Rev. Lett.,
96 (2006) 108701.
[16] Wu J. J, Gao Z.Y. and Sun H. J., Phys. Rev. E, 74,
(2006) 066111.
[17] Barabási A.-L. and Albert R., Science, 286 (1999) 509.
[18] Newman M.E.J., Phys. Rev. E, 64, (2001) 016132; Pro.
Natl. Acad. Sci., U.S.A. 98 (2001) 404.
[19] Watts D.J. and Strogatz S.H., Nature, 393 (1998) 440.
[20] Fogedby H. C., Poutkaradze V., Phys. Rev. E, 66,
(2002) 021103.
http://arxiv.org/abs/cond-mat/0503615
ABSTRACT
  We study numerically the cascading failure problem by using artificially
created scale-free networks and the real network structure of the power grid.
The capacity for a vertex is assigned as a monotonically increasing function of
the load (or the betweenness centrality). Through the use of a simple
functional form with two free parameters, revealed is that it is indeed
possible to make networks more robust while spending less cost. We suggest that
our method to prevent cascade by protecting less vertices is particularly
important for the design of more robust real-world networks to cascading
failures.

<|endoftext|><|startoftext|>
Diffuse X-ray Emission from the Carina Nebula Observed with
Suzaku
Kenji Hamaguchi1,2, the Suzaku η Carinae team and the Carinae D-1 team
1CRESST and X-ray Astrophysics Laboratory NASA/GSFC, Greenbelt, MD 20771
2Universities Space Research Association, 10211 Wincopin Circle, Suite 500,
Columbia, MD 21044
A number of giant HII regions are associated with soft diffuse X-ray emission. Among
these, the Carina nebula possesses the brightest soft diffuse emission. The required plasma
temperature and thermal energy can be produced by collisions or termination of fast winds
from main-sequence or embedded young O stars, but the extended emission is often observed
from regions apart from massive stellar clusters. The origin of the X-ray emission is unknown.
The XIS CCD camera onboard Suzaku has the best spectral resolution for extended
soft sources so far, and is therefore capable of measuring key emission lines in the soft band.
Suzaku observed the core and the eastern side of the Carina nebula (Car-D1) in 2005 Aug and
2006 June, respectively. Spectra of the south part of the core and Car-D1 similarly showed
strong L-shell lines of iron ions and K-shell lines of silicon ions, while in the north of the core
these lines were much weaker. Fitting the spectra with an absorbed thin-thermal plasma
model showed kT∼0.2, 0.6 keV and NH∼1−2×10
21 cm−2 with a factor of 2-3 abundance
variation in oxygen, magnesium, silicon and iron. The plasma might originate from an old
supernova, or a super shell of multiple supernovae.
§1. Extended X-ray Emission from the Star Forming Region
Soft X-ray emission nebulae with kT∼0.1–0.8 keV, log LX∼33-35 ergs s
−1, and
size of ∼1–103 pc accompany a number of giant HII regions (see Table 4 of Ref. 6).
Chandra observations of extended emission in a few star forming clusters indicate
that the emission may arise from the fast O star stellar winds thermalized either by
wind-wind collisions or by a termination shock. However, the emission is often found
outside of the massive stellar clusters, so that another origin, such as an otherwise
unrecognized supernova remnant, cannot be ruled out.
In principle, the origin of the diffuse emission can be determined by measuring
its composition. For example, the plasma should be overabundant in nitrogen and
neon if it originates from winds from nitrogen-rich Wolf-Rayet stars (WN), while it
would be overabundant in oxygen if it arises from a Type II SNR. The temperature
of the plasma, typically a few million degrees, makes soft X-ray band studies highly
desirable, because of the presence in this band of strong lines from these elements,
plus carbon, silicon and iron.
The Carina Nebula, which contains several evolved and main-sequence massive
stars such as η Car, WR 25 and massive stellar clusters such as Trumpler 14 (Tr
14), emits soft diffuse X-rays 10–100 times stronger than any other Galactic giant
HII region (LX ∼10
35 ergs s−1).4) The high surface brightness made possible the
discovery of the diffuse emission by the Einstein Observatory in the late 1970’s.
The Einstein observations revealed that the diffuse emission tends to be associated
typeset using PTPTEX.cls 〈Ver.0.9〉
http://arxiv.org/abs/0704.0346v1
2 K. Hamaguchi et al.
with optically bright regions containing massive stars. Recent Chandra observations
provided a point source free measurement of the diffuse flux,1) and suggested the
presence of a north-south Fe and Ne abundance gradient.5)
The X-ray CCD cameras (XISs: X-ray Imaging Spectrometer) onboard the
Suzaku observatory have the best spectral resolution for extended soft X-ray emission
and thus they provide good diagnostics of emission lines especially below ∼1 keV.
§2. Suzaku and XMM-Newton Observations of the Carina Nebula
Figure 1 shows a mosaic image of the Carina nebula between 0.4−7 keV created
from 32 XMM-Newton observations. The image depicts several bright X-ray point
sources: η Car (an LBV), WR25, WR22 (Wolf-Rayet stars), HD 93250, HD 93043
(O3 stars), and Tr 14, Tr 16 (massive stellar clusters). The image also clearly shows
apparently extended emission toward the east-west direction. In a color image (e.g.
Figure 1 of Ref. 2), XMM-Newton Image Gallery∗)) the emission is softer between
Tr 14, WR 25 and η Car.
We analyzed the Suzaku data of the core and the eastern side (named Car-D1)
of the Carina nebula taken on 2005 Aug. 29 and 2006 June 5. The XIS FOVs
of these observations are shown in Figure 1 with dotted lines. To investigate the
color variation in detail, we divided the core region into two and thus extracted
three spectra from two Suzaku observations (core-north, core-south and Car-D1).
The background was reproduced with the night earth data. The spectra showed
strong emission between 0.3 and 2 keV, which is probably dominated by soft diffuse
emission associated with the Carina nebula, while the spectra above 2 keV may be
explained with CXB, Galactic Ridge X-ray Emission, X-ray point sources resolved
with Chandra and unresolved pre-main-sequence stars.
Figure 2 shows an overlay of the BI spectra between 0.3–2 keV. The left panel
compares spectra of the core-north region with the core-south region. A strong differ-
ence is seen between 0.7 keV and 1.2 keV, which apparently is the source of the two
colors of diffuse emission. The band in which the difference is found is dominated by
emission lines from the iron L-shell complex. Additionally, the core-south spectrum
shows a stronger Si line. The Car-D1 spectrum shows similar intensity in the Si and
Fe lines to the core-south spectrum (right panel of Figure 2) while it shows relatively
strong magnesium and oxygen lines. All these spectra look similar except for these
emission lines. This suggests that the differences represent an elemental abundance
variation, and not a temperature difference.
This is supported by spectral fits of the individual spectra. All three spectra
between 0.3−2 keV were reproduced by an absorbed 2T thin-thermal plasma models
although the best-fit models are not formally acceptable. The plasma tempera-
tures of all three regions are ∼0.2 and ∼0.6 keV, and their column densities are
∼3×1021 cm−2, which is consistent with extinction toward the Carina nebula.3) The
abundances of some elements show a factor of 2−4 variations: the core-north region
has a factor of 2 lower silicon abundance and a factor of 4 lower iron abundance
∗) http://xmm.esac.esa.int/external/xmm science/gallery/public
Diffuse X-rays from the Carina nebula 3
Fig. 1. Mosaic image (∼90′×60′) of the Carina nebula between 0.4−7 keV created from 32 XMM-
Newton observations. The image is created with the ESAS package, divided by the exposure
map and smoothed with the adaptive smoothing technique. The dotted lines show the XIS
FOVs of the Suzaku observations of η Car (right) and the Car-D1 field (left). The solid lines
show source extraction regions for the spectral analysis.
than the core-south region, while the Car-D1 region has a factor of 2 higher oxygen
and magnesium abundances. On the other hand, spectral fits of the core region
with higher sensitivity around 0.5 keV gave small upper-limits (.0.02 solar) of the
nitrogen abundance.
§3. Origin of the Diffuse Plasma
The N/O abundance ratio inferred from the spectral fits is .0.4, over 20 times
less than around η Car. The abundance distribution is totally contrary to that
expected from stellar winds from evolved massive stars, unless the winds somehow
heat the interstellar matter without enriching it, thus leaving the X-ray plasma with
abundances typical of interstellar matter. At the same time, the X-ray luminosity
of the Carina Nebula is about two orders of magnitude higher than that of other
Galactic star forming regions, but the number of early O stars is only an order of
magnitude higher (see Table 4 in Ref. 6). These results suggest an additional energy
source is needed to power the X-ray emission in the Carina Nebula.
An obvious possibility is one or more core-collapse supernovae (i.e. Type Ib,c
or II), mentioned as a possibility by Ref. 6). The regions vary strongly in oxygen,
magnesium, silicon, and iron abundances. These elements are products of core-
collapse supernovae, and young SNRs such as Cas A and Vela show strong abundance
4 K. Hamaguchi et al.
Fe L complex Fe L complex
Fig. 2. Comparison of the XIS1 spectra between the fields – left: the core-north region (black) and
the core-south region (grey), right: the Car-D1 field (black) and the core-south region (grey).
The above labels demonstrate energies of emission lines detected (black) or concerned (grey)
with this result. Emission lines with the solid lines showed variation in their line intensity. Low
count rates of the Car-D1 spectrum below 1 keV is caused by degradation of soft response by
progressive contamination on the XIS.
variation from location to location. The total energy content in the hot gas of
∼2×1050 ergs is a modest fraction of the ∼1051 ergs of kinetic energy produced by a
canonical supernova, while assuming an iron abundance of 0.30 solar, the total iron
mass in the diffuse gas requires at least 3-5 supernovae.
Acknowledgements
K. H. is financially supported by a US Chandra grant No. GO3-4008A and US
Suzaku grant.
References
1) N. R. Evans, F. D. Seward, M. I. Krauss, T. Isobe, J. Nichols, E. M. Schlegel, and S. J.
Wolk, Astrophysical Journal 2003 (589), 509
2) K. Hamaguchi, R. Petre, H. Matsumoto, M. Tsujimoto, S. S. Holt, Y. Ezoe, H. Ozawa,
Y. Tsuboi, Y. Soong, S. Kitamoto, A. Sekiguchi, and M. Kokubun. Publication of Astro-
nomical Society of Japan 2007 (59), 151
3) M. A. Leutenegger, S. M. Kahn, and G. Ramsay. Astrophysical Journal 2003 (585), 1015
4) F. D. Seward and T. Chlebowski. Astrophysical Journal 1982 (256), 530
5) L. K. Townsley. Proceeding of the STScI May Symposium, ”Massive Stars: From Pop III
and GRBs to the Milky Way, 2006, (astro–ph/0608173)
6) L. K. Townsley, E. D. Feigelson, T. Montmerle, P. S. Broos, Y.-H. Chu, and G. P. Garmire.
Astrophysical Journal 2003 (593), 874
http://arxiv.org/abs/astro--ph/0608173
	Extended X-ray Emission from the Star Forming Region
	Suzaku and XMM-Newton Observations of the Carina Nebula
	Origin of the Diffuse Plasma
ABSTRACT
  A number of giant HII regions are associated with soft diffuse X-ray
emission. Among these, the Carina nebula possesses the brightest soft diffuse
emission. The required plasma temperature and thermal energy can be produced by
collisions or termination of fast winds from main-sequence or embedded young O
stars, but the extended emission is often observed from regions apart from
massive stellar clusters. The origin of the X-ray emission is unknown.
  The XIS CCD camera onboard Suzaku has the best spectral resolution for
extended soft sources so far, and is therefore capable of measuring key
emission lines in the soft band. Suzaku observed the core and the eastern side
of the Carina nebula (Car-D1) in 2005 Aug and 2006 June, respectively. Spectra
of the south part of the core and Car-D1 similarly showed strong L-shell lines
of iron ions and K-shell lines of silicon ions, while in the north of the core
these lines were much weaker. Fitting the spectra with an absorbed thin-thermal
plasma model showed kT~0.2, 0.6 keV and NH~1-2e21 cm-2 with a factor of 2-3
abundance variation in oxygen, magnesium, silicon and iron. The plasma might
originate from an old supernova, or a super shell of multiple supernovae.

<|endoftext|><|startoftext|>
Introduction
1.1 The Colin de Verdière number
At the end of 80’s, Yves Colin de Verdière introduced a graph parameter
µ(G) based on spectral properties of certain matrices associated with the
graph G.
Definition 1.1 Let G be a graph with n vertices. A Colin de Verdière
matrix for G is a symmetric n × n matrix M = (Mij) with the following
properties.
Research for this article was supported by the DFG Research Unit 565 “Polyhedral
Surfaces”.
http://arxiv.org/abs/0704.0349v3
(M1) M is a Schrödinger operator on G, that is
< 0, if ij is an edge of G;
= 0, if ij is not an edge of G and i 6= j.
(M2) M has exactly one negative eigenvalue, and this eigenvalue is simple.
(M3) If X is a symmetric n × n matrix such that MX = 0 and Xij = 0
whenever i = j or ij is an edge of G, then X = 0.
The set of all Colin de Verdière matrices for graph G is denoted by MG.
The Colin de Verdière number µ(G) is defined as the maximum corank of
matrices from MG:
µ(G) := max
dimkerM.
A Colin de Verdière matrix of maximum corank is called optimal.
Basically, the Colin de Verdière number is the maximum multiplicity of
the second least eigenvalue λ2 of a discrete Schrödinger operator M satis-
fying a certain stability assumption (M3). By replacing M with M − λ2Id,
we can make the second eigenvalue zero (M2), so that multiplicity be-
comes corank. Definition 1.1 was motivated by the study of Schrödinger
and Laplace operators associated with degenerating families of Riemannian
metrics on surfaces.
The parameter µ(G) turned out to be interesting on its own. In partic-
ular, it posesses the minor monotonicity property: if a graph H is a minor
of G, then µ(H) ≤ µ(G). By the Robertson-Seymour theorem this implies
that graphs with µ(G) ≤ n can be characterized by a finite set of forbidden
minors. For n up to four such characterizations are known and allow nice
topological reformulations: e. g. µ(G) ≤ 3 iff G is planar (that is doesn’t
have K5 or K3,3 as minors), and µ(G) ≤ 4 iff G is linklessly embeddable in
3 (that is doesn’t have any graph of the Petersen family as a minor). An
overview of results and open problems on the Colin de Verdière number can
be found in [4], [14], and [5]. The book [4] deals also with other spectral
invariants arising from discrete Schrödinger and Laplace operators.
1.2 Nullspace representations and Steinitz representations
Let M be a Colin de Verdière matrix for graph G with dimkerM = d.
Choose a basis (u1, . . . , ud) for kerM ⊂ R
n, fix a coordinate system in Rn,
and read off the coordinates of (uα):
(u1, . . . , ud) = (v1, . . . , vn)
The map that associates to every vertex i of G the vector vi ∈ R
d is called
a nullspace representation of the graph G.
Nullspace representations were studied in [11]. In a subsequent paper
[10] Lovász showed that, for a 3-connected planar G, the nullspace repre-
sentation with properly scaled vectors (vi) realizes G as the skeleton of a
convex 3-polytope. Lovász provided also an inverse construction that as-
sociated to every convex 3-polytope with 1-skeleton G a Colin de Verdère
matrix of corank 3. The proof that the constructed matrix had an appro-
priate signature was indirect, and a more geometric approach was desirable.
1.3 Hessian matrix of the volume as a Colin de Verdière
matrix
In this paper we relate the Lovász construction (that of a matrix from a poly-
tope) to the mixed volumes. Our approach allows a straightforward gener-
alization to higher dimensions. That is, we associate to every d-dimensional
convex polytope with 1-skeleton G a Colin de Verdière matrix for G of
corank d.
As a consequence, the graph of a convex d-dimensional polytope has
Colin de Verdière number at least d. This result is not really new, since it
follows from the minor monotonicity of µ, from the fact that the graph of a
d-polytope has Kd+1 as a minor [8], and from µ(Kd+1) = d.
Our result is based on the following observation. Take a convex d-
polytope P and deform it by shifting every facet parallelly to itself. Then
the Hessian matrix of the volume of P , where partial derivatives are taken
with respect to the distances of the shifts, has corank d and exactly one
positive eigenvalue. Besides, the mixed partial derivative
∂2vol(P )
∂xi∂xj
is positive
if the ith and the jth facets are adjacent, and vanishes otherwise. Thus
the negative of the Hessian matrix satisfies conditions (M1) and (M2) from
Definition 1.1. The condition (M3) follows quite easily, too.
The signature of the Hessian of the volume is encoded in the second
Minkowski inequality for mixed volumes together with Bol’s characteriza-
tion of the case of equality. For simple polytopes, the determination of the
signature of the Hessian is an essential part in the proof of the Alexandrov-
Fenchel inequality.
1.4 Plan of the paper
In Section 2.1 we recall the Lovász construction of a Colin de Verdière matrix
for the skeleton of a convex 3-polytope Q.
After inroducing some terminology and notation in Section 2.2, we show
in Section 2.3 that the Lovász matrix is minus the Hessian matrix of the
volume of the polar dual polytope Q∗.
In Section 2.4, dealing with 3-polytopes, we point out an interesting
identity (first found and used elsewhere [2]) between the Hessian matrix of
vol(Q∗) and the Hessian matrix of another geometric quantity associated
with Q. This gives another interpretation of the Lovász matrix M and
relates the equality dimkerM = 3 with the infinitesimal rigidity of the
polytope Q.
In Section 3.1 we discuss the (im)possibility of inverting the construction,
that is of finding a convex polytope whose Hessian matrix of the volume
equals to a given Colin de Verdière matrix.
In Section 3.2 we give an estimate of the negative eigenvalue (and thus
of the spectral gap) for the Hessian matrices of the volume.
Finally, in the Appendix we derive the signature of the Hessian from the
second Minkowski inequality and Bol’s condition. Although this seems to
be a folklore knowledge in narrow circles, we failed to find a written account
on this subject.
1.5 Acknowledgements
I am grateful to the organizers of the 2006 Oberwolfach conference “Discrete
Differential Geometry”, where the idea of this paper was born. I also thank
Ronald Wotzlaw for pointing me out a mistake in a preliminary version.
2 From a convex polytope to a Colin de Verdière
matrix
2.1 Lovász construction
Let us recall the Lovász construction of an optimal Colin de Verdière matrix
associated with a polytopal representation of a graph in R3.
Let Q ⊂ R3 be a convex polytope containing the coordinate origin in
its interior. Let G be the 1-skeleton of Q. We denote the vertices of G
by i, j, . . ., and the corresponding vertices of Q by vi, vj , . . .. Let Q
∗ be the
polar dual of Q. The vertices of Q∗ are denoted by wf , wg, . . ., where f, g, . . .
are faces of Q.
For ij ∈ G, consider the edge vivj of Q and the dual edge wfwg of Q
see Figure 1. It is easy to show that the vector wf − wg is orthogonal to
both vectors vi and vj, hence parallel to their cross product vi × vj. Thus
we have
wf − wg = Mij(vi × vj), (1)
with Mij < 0 (we agree to choose the labeling of wf and wg so that we get
the correct sign).
Further, consider the vector
v′i =
Mijvj,
vi × vj
Figure 1: To the definition of the matrix M .
where the sum extends over all vertices of G adjacent to i. From (1) it is
easy to see that vi × v
i = 0. Thus there exists a real number Mii such that
v′i = −Miivi. (2)
Putting Mij = 0 for distinct non-adjacent vertices i and j of G, we complete
the construction of the matrix M .
Theorem 2.1 (Lovász, [10]) The matrix M is a Colin de Verdière matrix
for the graph G.
The equation (2) can be rewritten as
Mijvj = 0. (3)
Thus M has corank at least 3. Since µ(G) ≤ 3 for planar graphs, M is an
optimal Colin de Verdière matrix for G.
The proof of Theorem 2.1 goes through a deformation argument, using
the fact that the space of convex 3-polytopes with a given graph is connected.
2.2 Polytopes with a given set of normals
Here we fix some terminology and notation needed in the subsequent sec-
tions.
All polytopes in this paper are assumed to be convex. A facet of a
d-dimensional polytope is a (d− 1)-dimensional face of it.
We will study families of polytopes with fixed facet normals. Let v1, . . . , vn
be vectors in Rd such that the coordinate origin lies in the interior of their
convex hull. Consider a d× n matrix formed by row vectors v⊤i :
V = (v1, . . . , vn)
Definition 2.2 Denote by P(V ) the set of all convex polytopes with the
outer facet normals v1, . . . , vn.
Every polytope in P(V ) is the solution set of a system of linear inequal-
ities:
P (x) = {p ∈ Rd |V p ≤ x},
where x = (xi)
i=1 ∈ R
n. Denote by Fi(x) the facet of P (x) with the outer
normal vi. We have
Fi(x) = {p ∈ P (x) | v
i p = xi}.
The numbers xi are called the support parameters of the polytope P (x).
The map P (x) 7→ x embeds P(V ) into Rn as an open convex subset. The
support parameter xi is proportional to the signed distance from 0 to the
affine hull of the facet Fi(x):
xi = ‖vi‖ · hi.
By vold we denote the volume of a d-dimensional polytope. We use the
subscript because both vold(P ) and vold−1(Fi) will occur in our formulas.
We omit the subscript at vol, when it seems reasonable to do so.
2.3 Interpreting and generalizing the Lovász construction
By definition of the polar dual, we have
Q∗ = {p ∈ R3 | v⊤i p ≤ 1 for all i}.
Thus Q∗ can be viewed as an element of the set P(V ) of polytopes with
facet normals (vi)i∈G. In terms of Section 2.2, Q
∗ = P (1, . . . , 1). Let’s vary
the support parameters of Q∗ and look how does this change its volume.
Lemma 2.3 Let M be the matrix constructed in Section 2.1. Then we have
Mij = −
∂2vol(P (x))
∂xi∂xj
x=(1,...,1)
where P (x) is as in Section 2.2.
Proof . Let Fi(x) be the facet of P (x) with the normal vi. It is not hard to
show that
∂vol3(P (x))
vol2(Fi(x))
Further, for i 6= j we have
∂vol2(Fi(x))
vol1(Fij(x))
‖vj‖ sin θij
∆vol1(Fj)
∆vol2(P )
Figure 2: Partial derivatives of the volume with respect to the support
parameters.
if faces Fi(x) and Fj(x) are adjacent; otherwise this derivative is zero. Here
Fij(x) is the common edge of Fi(x) and Fj(x), and θij is the angle between
the vectors vi and vj (i. e. the outer dihedral angle at the edge Fij). The
equations are illustrated in Figure 2 in one dimension lower and for ‖vi‖ = 1.
Thus at x = (1, . . . , 1) we have
∂2vol(P (x))
∂xi∂xj
vol1(Fij(x))
‖vi‖‖vj‖ sin θij
‖wf − wg‖
‖vi × vj‖
= −Mij (4)
for all i 6= j.
To deal with the case i = j, differentiate the well-known identity
vol2(Fj(x))
with respect to xi. This gives
∂2vol(P (x))
j 6=i
∂2vol(P (x))
∂xi∂xj
vj = 0. (5)
In view of (3) and (4), we have
∂2vol(P (x))
|x=(1,...,1) = −Mii. �
Lemma 2.3 suggests the following generalization of the Lovász construc-
tion.
Theorem 2.4 Let
P (x0) = {p ∈ Rn | v⊤i p ≤ x
i for all i}
be a convex polytope with outer facet normals vi and support parameters
x0i , i = 1, . . . , n. Let G be the dual 1-skeleton of P (x
0). Then the matrix M
defined by
Mij = −
∂2vol(P (x))
∂xi∂xj
is a Colin de Verdiére matrix for the graph G.
The corank of M is equal to d. In particular, µ(G) ≥ d for every graph
G that can be realized as the 1-skeleton of a d-dimensional polytope.
Proof . Similarly to Lemma 2.3, for adjacent facets Fi and Fj we have
∂2vold(P (x))
∂xi∂xj
vold−2(Fij(x))
‖vi‖‖vj‖ sin θij
where Fij is their common (d− 2)-face, and θij is the angle between vi and
vj. For non-adjacent Fi and Fj this derivative is zero. Therefore matrix M
satisfies property (M1) from Definition 1.1.
The proof of property (M2) is the most interesting part of the theo-
rem. The signature of the Hessian of the volume is encoded in the second
Minkowski inequality for mixed volumes enhanced by Bol’s condition for
equality.
Theorem A.10 in Section A states in particular that the matrix M has
corank d. The kernel of M is easy to identify: due to the equation (5) it
consists of the vectors ξ ∈ Rn such that ξi = v
i p for some vector p ∈ R
Assuming this description of kerM , let us prove that the matrix M
satisfies property (M3). If MX = 0, then there are vectors p1, . . . , pn ∈ R
such that Xij = v
i pj for all i, j. Fix j. Then by assumption on X we have
pj ⊥ vj and pj ⊥ vi for all ij ∈ G. But the normal vj to the face Fj and the
normals to the neighboring faces span the space Rd. Thus we have pj = 0
for all j, which implies X = 0.
As for the last sentence of the theorem, if G is the dual 1-skeleton of a
convex polytope P , then G is the skeleton of the polar (P − p)∗, where p is
any interior point of P . �
2.4 Case d = 3 and infinitesimal rigidity of convex polytopes
In the case d = 3 there is another interpretation of the matrix M . As in
Section 2.1, let Q be a convex polytope that has skeleton G and contains
0 in the interior. Triangulate the faces of Q by diagonals and cut Q into
pyramids with apices at 0 and triangles of the triangulation as bases. De-
note by ri the length of the edge that joins 0 to the vertex vi of Q. Now
deform the pyramids by changing the lengths ri and leaving the lengths of
boundary edges constant. During such deformation, the dihedral angles of
the pyramids change, and the total angle ωi around the i-th edge can be-
come different from 2π. By computing the derivatives of ωi explicitly, we
obtain ([2], Theorem 3.11)
vol1(Fij)
sin θij
= ‖vi‖‖vj‖ ·Mij, (7)
where we use the notations from Section 2.3. If we change the variables xi to
hi = ‖vi‖ · xi, so that hi is the distance of 0 from aff (Fi), then the equation
(7) takes a particularly nice form
∂vol2(Fi)
By (7), the matrix (∂ωi
) is obtained from the matrix M by multiplying
the i-th row and the i-th column with ‖vi‖, for all i. This implies
Corollary 2.5 The matrix (∂ωi
) is an optimal Colin de Verdière matrix for
graph G.
The fact that the matrix (∂ωi
) has corank 3 is equivalent to the infinitesi-
mal rigidity of the polytope Q. Indeed, every infinitesimal deformation (dri)
such that dωi = 0 for all i gives rise to an infinitesimal isometric deformation
of Q. The resulting deformation is trivial iff it is produced by moving the
apex 0 inside Q.
Another interesting fact is that the matrix (∂ωi
) is the Hessian matrix
of a geometric quantity related to the polytope Q (deformed by varying ri).
Namely, put
S(r) =
riκi +
ℓijθij,
where κi = 2π − ωi is the “curvature” along the i-th radial edge, and ℓij =
vol1(Fij) is the length of the edge vivj. Then the Schläfli formula implies
= κi.
Hence
∂2S(Q)
∂ri∂rj
∂2vol(Q∗)
∂hi∂hj
and both matrices are equal to the negative of the Lovász matrix M , up to
scaling the rows and columns by ‖vi‖.
3 Concluding remarks
3.1 What fails in the inverse construction
Let M be a Colin de Verdère matrix for the graph G. Is there a convex
polytope P such that M arises from P as a result of the construction de-
scribed in Section 2.3? Of course, in general the answer is no, because G
must be the dual skeleton of P , and P must have dimension d = dimkerM .
In particular, all vertices of G must have degrees at least d. But, due to the
minor monotonicity of µ, there exist trivalent graphs with µ(G) arbitrarily
large.
Nevertheless, it is worth looking at what fails when we try to reconstruct
the polytope P from matrix M .
Let u1, . . . , ud ∈ R
n be a basis of kerM . Let v⊤i be the i-th row in the
matrix (u1, . . . , ud). Then we have
Mijvj = 0 (8)
for all i. Therefore, the vectors v1, . . . , vn ∈ R
d are good candidates for the
outer normals to the faces of the polytope P . At this point we can already
fail, if the following assumptions aren’t fulfilled:
1. vi 6= 0 for all i, and vi 6= vj for all i 6= j;
2. for every i, the projections vij of vj on v
i for ij ∈ G satisfy the
previous assumption and span v⊥i .
We proceed assuming that these conditions hold. Codimension 2 faces
Fij of P must be in 1-to-1 correspondence with the edges of G, and their
volumes are determined by the matrix M :
vold−2(Fij) = Aij := −Mij‖vi‖‖vj‖ sin θij,
where θij is the angle between vi and vj .
Lemma 3.1 For every i, there exists a convex (d−1)-dimensional polytope
Fi ⊂ v
i with outer facet normals vij and facet volumes Aij , ij ∈ G.
Proof . By projecting the equation (8) on v⊥i , we obtain
Mij · vij = 0. (9)
Due to ‖vij‖ = ‖vj‖ sin θij, it follows that
Aij ·
‖vij‖
By Minkowski’s theorem [13, Section 7.1], this implies the existence of a
polytope Fi as stated in the lemma. �
The polytopes Fi in Lemma 3.1 should become facets of the polytope
P . But here is the second point where the reconstruction can fail: the j-th
facet Fij of Fi might be different from the i-th facet Fji of Fj ; the only thing
we know is vold−2(Fij) = Aij = vold−2(Fji).
In the case d = 3, however, this suffices: Fi are convex polygons and
fit together along their edges to form a polytope P . Conditions 1. and 2.
above hold if we assume that G is a 3-connected planar graph [11]. Thus for
3-connected planar graphs every Colin de Verdière matrix corresponds to a
polytope. This is one of the results of [10].
The following example shows that even for highly connected graphs the
number µ(G) can be bigger than the maximum dimension of a polytope
with 1-skeleton G.
Example Let Gn = K2,2,...,2 be the multipartite graph on 2n vertices (the
graph of an n-dimensional cross-polytope). By [9], µ(Gn) = 2n − 3 for
n ≥ 3. For n = 3, 4 the graph Gn can also be represented as the skeleton of
a (2n − 3)-dimensional convex polytope: for n = 3 this is the octahedron,
for n = 4 the join of two convex quadrilaterals in general position in R5.
For n ≥ 5, however, there is no (2n − 3)-dimensional convex polytope with
skeleton Gn. Indeed, by studying the Gale diagram [16, Lecture 6] of a
d-polytope with d + 3 vertices, one can show that the complement to the
graph of such polytope cannot have more than 4 edges.
Note that the equation (9) is reminiscent of the definition of a (d − 2)-
weight in [12].
3.2 Negative eigenvalue
Theorem 3.2 Let λ1 be the negative eigenvalue of the matrix (6). Then
the following inequality holds:
λ1 ≤ −d(d− 1) ·
vold(P (x
‖x0‖2
The equality takes place iff
x0i = c ·
vold−1(Fi(x
for all i and some constant c.
Proof . By induction on d, it is easy to show that the function vold(P (x)) is a
degree d homogeneous polynomial in x as long as the combinatorics of P (x)
does not change. For different combinatorics, the polynomials have different
coefficients. However, since vold(P (x)) is twice differentiable, we can apply
Euler’s homogeneous function theorem twice at the point x0, independently
on how generic the combinatorics of P (x0) is. This yields
(x0)⊤Mx0 = −d(d− 1) · vold(P (x
Since λ1 = min‖ξ‖=1 ξ
⊤Mξ, the inequality follows.
Since λ1 is the unique negative eigenvalue of M , the inequality turns
into equality iff Mx0 = λx0 for some λ. We have
j = −
∂vold−1(Fi(x
x0j = −(d− 1) ·
vold−1(Fi(x
Thus Mx0 = λx0 is equivalent to x0i = c ·
vold−1(Fi(x
, and the theorem is
proved. �
The number λ2 − λ1 is called the spectral gap. In our case λ2 = 0 by
definition. Thus Theorem 3.2 provides an estimate on the spectral gap of
the matrix M .
Usually, one seeks to make the spectral gap as large as possible, but in
order this to make sense for Colin de Verdère matices, one has to choose a
matrix norm, [4, Chapter 5.7]. The norm of the matrix (6) is a function of
its coefficients, which have a geometric meaning. Thus, as soon as the choice
of a matrix norm is made, one can try to solve the problem of the spectral
gap by geometric means (at least for 3-connected planar graphs, for which
every optimal Colin de Verdière matrix can be realized through a polytope).
A The second Minkowski inequality for mixed vol-
umes and the signature of the matrix
∂2vol
∂xi∂xj
The goal of this appendix is to prove Theorem A.10 that describes the sig-
nature of the matrix (6). The theorem is derived from the second Minkowski
inequality for mixed volumes and Bol’s condition for equality.
The relation between the theory of mixed volumes and infinitesimal rigid-
ity (as we know, the rank of matrix (6) accounts for the infinitesimal rigidity
of the dual polytope, see Section 2.4) was noticed long ago [1, 15]. In the
decades thereafter this phenomenon seemed to be forgotten. Quite recently,
Carl Lee and Paul Filliman [7] discovered it again.
A.1 The second Minkowski inequality and Bol’s condition
Definition A.1 Let P,Q ⊂ Rd be convex bodies. A mixed volume of P and
Q is a coefficient in the expansion
vol(λP + µQ) =
vol(Q, . . . , Q
︸ ︷︷ ︸
, P, . . . , P
︸ ︷︷ ︸
)λd−kµk (10)
with λ, µ > 0, where A + B for A,B ⊂ Rd denotes the Minkowski sum. In
particular,
vol(P, . . . , P ) = vol(P ).
In a similar way one defines the mixed volume of more than two convex
bodies. It turns out that the mixed volume is polylinear with respect to the
Minkowski addition and multiplication with positive scalars. A proof that
the expansion (10) takes place and more information on mixed volumes can
be found in [6, 13].
Theorem A.2 Let P,Q ⊂ Rd be convex bodies. Then the following holds:
1. (The second Minkowski inequality)
vol(Q,P, . . . , P )2 ≥ vol(P ) · vol(Q,Q,P, . . . , P ). (11)
2. (Bol’s condition) Assume that dimQ = d. Then equality holds in (11)
if and only if either dimP < d − 1 or P is homothetic to a (d − 2)-
tangential body of Q.
For a proof see [13, Theorem 6.2.1, Theorem 6.6.18]. Bol’s condition was
conjectured by Minkowski but proved only decades later by Bol, [3].
Definition A.3 If P ⊂ Q ⊂ Rd are d-dimensional convex polytopes, then
Q is called a p-tangential body of P iff P has a non-empty intersection with
every face of Q of dimension at least p.
A.2 Mixed volumes as derivatives of the volume
By substituting in (10) λ = 1 and µ = t, we obtain
vol(P + tQ) = vol(P ) + tdvol(Q,P, . . . , P )
d(d− 1)
vol(Q,Q,P, . . . , P ) + · · · (12)
for all t > 0, which can be seen as the Taylor expansion of vol. We will look
at it in the case when P and Q are polytopes with the same sets of facet
normals.
The space P(V ) of all polytopes with outer facet normals v1, . . . , vn is
defined in Section 2.2. We want to study the partial derivatives of the volume
of P (x) ∈ P(V ) with respect to the support parameters x. For brevity, let’s
use the notation
vol(x) := vol(P (x)).
Similarly, the mixed volume of polytopes from P(V ) will be written as a
function of the support parameters:
vol(x1, . . . , xd) := vol(P (x1), . . . , P (xd)).
Now we would like to compute vol(x+ ty) with the help of (12). This is
not as straightforward as it seems, because the support parameters behave
not quite linearly under the Minkowski addition. We have P (ty) = tP (y)
for t > 0. Also we have P (x) + P (y) ⊂ P (x + y), but the equality doesn’t
always hold. To describe the cases in which we do have the equality, we
need a new definition.
Definition A.4 The normal cone N(F,P ) of the face F of a polytope P ⊂
d is the set of vectors w ∈ Rd such that
(w⊤x) = max
(w⊤x).
The normal fan N(P ) is the decomposition of Rd into the normal cones of
the faces of P . If the normal fan N(Q) subdivides the normal fan N(P ),
then we write N(Q) > N(P ).
Note that the normal fan of a polytope P ∈ P(V ) has the rays R+vi as
1-dimensional cones. The higher-dimensional cones of the normal fan deter-
mine the combinatorics of P . Therefore polytopes with equal normal fans
are sometimes called strongly isomorphic.
We denote the normal fans of the polytopes from P(V ) by N(x) :=
N(P (x)). The following lemma is classical.
Lemma A.5 If N(y) > N(x), then P (x) + P (y) = P (x+ y).
Now we are ready to prove
Lemma A.6 Let y ∈ P(V ) be such that N(y) > N(x). Then
∇yvol(x) = d · vol(y, x, . . . , x),
∇2yvol(x) = d(d− 1) · vol(y, y, x, . . . , x),
where ∇y denotes the directional derivative along y.
Proof . Due to Lemma A.5 we have P (x + ty) = P (x) + tP (y). By substi-
tuting P = P (x) and Q = P (y) in (12), we obtain
vol(x+ ty) = vol(x) + tdvol(y, x, . . . , x)
d(d− 1)
vol(y, y, x, . . . , x) + · · · ,
which implies the lemma. �
Remark. For polytopes with the same normal fan (“strongly isomorphic
polytopes”), there is the following description of mixed volumes. Denote
P∆(V ) = {P (x) ∈ P(V ) |N(x) = ∆}.
By induction on d, it is easy to show that there exists a homogeneous poly-
nomial V∆ of degree d in n variables such that
vol(P (x)) = V∆(x),
for all x ∈ P∆(U). If we use the same symbol V∆ to denote the associated
symmetric polylinear form, then we have
vol(P (x(1)), . . . , P (x(d))) = V∆(x
(1), . . . , x(d))
for all x(1), . . . , x(d) ∈ P∆(V ).
A.3 From the second Minkowski inequality to the signature
of the Hessian of the volume
By geometric arguments similar to those in the proof of Lemma 2.3, the
function vol is twice continuously differentiable on P(V ). Therefore the
following definition makes sense.
Definition A.7 Let x ∈ P(V ). Define a symmetric bilinear form Φ on Rn
Φ(ξ, η) = ∇η∇ξvol(x).
Let y ∈ P(V ) be such that N(y) > N(x). By combining Euler’s homo-
geneous function theorem and Lemma A.6, we obtain
Φ(x, x) = d(d− 1) vol(x, . . . , x),
Φ(x, y) = d(d− 1) vol(y, x, . . . , x),
Φ(y, y) = d(d− 1) vol(y, y, x, . . . , x).
Lemma A.8 Let L ⊂ Rn be a 2-dimensional vector subspace such that
x ∈ L. Then the restriction of the form Φ to L has signature (+,−) or
(+, 0).
Proof . Let y ∈ P(V ) be such that N(y) > N(x). The second Minkowski
inequality (11) applied to P = P (x) and Q = P (y) can be rewritten as
Φ(x, x) Φ(x, y)
Φ(x, y) Φ(y, y)
≤ 0. (13)
Since, moreover, Φ(x, x) = d(d−1) vol(P ) > 0, it follows that the restriction
of Φ to span {x, y} has signature (+, 0) or (+,−).
It remains to show that every 2-subspace L ∋ x can be represented as
span {x, y} with N(y) > N(x). This is true since x is an interior point of
the set {y ∈ P(V ) |N(y) > N(x)}. (When we perturb x, we can create new
faces, but cannot destroy old ones.) �
Lemma A.9 The form Φ has corank d.
Proof . Let us exhibit a d-dimensional subspace of ker Φ. Associate with
every point p ∈ Rd a vector p ∈ Rn with coordinates
pi = 〈vi, p〉.
The polytope P (x+p) is the translate of P (x) by p. Therefore the directional
derivative ∇pvol(x) vanishes for all x, which implies Φ(p, η) = 0 for all η.
Thus we have
p ∈ kerΦ
for all p ∈ Rd.
Let ξ ∈ ker Φ. We need to show that ξ = p for some p ∈ Rd. Denote
the span of x and ξ by L. Then, by Lemma A.8, the restriction Φ|L has
signature (+, 0) and hence
Rξ = L ∩ ker Φ. (14)
Choose y ∈ L such that N(y) > N(x), and x and y are linearly independent.
Then the degeneracy of Φ|L means that we have an equality in (13) and
thus also in the Minkowski inequality for P = P (x) and Q = P (y). By
Bol’s condition, see Theorem A.2, this happens if and only if the polytope
P (x) is homothetic to a (d − 2)-tangential body of the polytope P (y). By
studying Definition A.3, we see that in P(V ) it is equivalent to P (x) being
homothetic to P (y). If P (x) is homothetic to P (y), then x = λy + p for
some p ∈ Rd, thus p ∈ L. Since p ∈ kerΦ, it follows that
Rp = L ∩ ker Φ.
By comparing this to (14), we conclude that ξ = µp = µp for some µ ∈ R.
Thus the kernel of Φ is confined to the vectors of the form p. �
Theorem A.10 The form Φ has corank d and exactly one positive eigen-
value, which is simple.
Proof . The corank of Φ is computed in Lemma A.9.
The form Φ has at least one positive eigenvector since Φ(x, x) > 0.
Assume that it has more than one. Then there exists a 2-subspace of Rn
on which Φ is positively definite. The subgroup of GL(Rn) that preserves Φ
acts transitively on the cone of positive directions. Thus there is a positive
2-subspace L that passes through x. This contradicts Lemma A.8. Theorem
is proved. �
References
[1] W. Blaschke. Ein Beweis für die Unverbiegbarkeit geschlossener kon-
vexer Flächen. Gött. Nachr., pages 607–610, 1912.
[2] A. Bobenko and I. Izmestiev. Alexandrov’s theorem, weighted Delau-
nay triangulations, and mixed volumes. Ann. Inst. Fourier (Grenoble),
58(2):447–505, 2008.
[3] G. Bol. Beweis einer Vermutung von H. Minkowski. Abh. Math. Sem.
Hansischen Univ., 15:37–56, 1943.
[4] Y. Colin de Verdière. Spectres de graphes, volume 4 of Cours Spécialisés.
Société Mathématique de France, Paris, 1998.
[5] Y. Colin de Verdière. Sur le spectre des opérateurs de type Schrödinger
sur les graphes. In Graphes, pages 25–52. Ed. Éc. Polytech., Palaiseau,
2004.
[6] G. Ewald. Combinatorial convexity and algebraic geometry, volume 168
of Graduate Texts in Mathematics. Springer-Verlag, New York, 1996.
[7] P. Filliman. Rigidity and the Alexandrov-Fenchel inequality. Monatsh.
Math., 113(1):1–22, 1992.
[8] B. Grünbaum. Convex polytopes, volume 221 of Graduate Texts in
Mathematics. Springer-Verlag, New York, second edition, 2003. Pre-
pared and with a preface by Volker Kaibel, Victor Klee and Günter M.
Ziegler.
[9] A. Kotlov, L. Lovász, and S. Vempala. The Colin de Verdière number
and sphere representations of a graph. Combinatorica, 17(4):483–521,
1997.
[10] L. Lovász. Steinitz representations of polyhedra and the Colin de
Verdière number. J. Combin. Theory Ser. B, 82(2):223–236, 2001.
[11] L. Lovász and A. Schrijver. On the null space of a Colin de Verdière
matrix. Ann. Inst. Fourier (Grenoble), 49(3):1017–1026, 1999. Sympo-
sium à la Mémoire de François Jaeger (Grenoble, 1998).
[12] P. McMullen. Weights on polytopes. Discrete Comput. Geom.,
15(4):363–388, 1996.
[13] R. Schneider. Convex bodies: the Brunn-Minkowski theory, volume 44
of Encyclopedia of Mathematics and its Applications. Cambridge Uni-
versity Press, Cambridge, 1993.
[14] H. van der Holst, L. Lovász, and A. Schrijver. The Colin de Verdière
graph parameter. In Graph theory and combinatorial biology (Balaton-
lelle, 1996), volume 7 of Bolyai Soc. Math. Stud., pages 29–85. János
Bolyai Math. Soc., Budapest, 1999.
[15] H. Weyl. Über die Bestimmung einer geschlossenen konvexen Fläche
durch ihr Linienelement. Zürich. Naturf. Ges., 61:40–72, 1916.
[16] G. M. Ziegler. Lectures on polytopes, volume 152 of Graduate Texts in
Mathematics. Springer-Verlag, New York, 1995.
	Introduction
	The Colin de Verdière number
	Nullspace representations and Steinitz representations
	Hessian matrix of the volume as a Colin de Verdière matrix
	Plan of the paper
	Acknowledgements
	From a convex polytope to a Colin de Verdière matrix
	Lovász construction
	Polytopes with a given set of normals
	Interpreting and generalizing the Lovász construction
	Case d=3 and infinitesimal rigidity of convex polytopes
	Concluding remarks
	What fails in the inverse construction
	Negative eigenvalue
	The second Minkowski inequality for mixed volumes and the signature of the matrix (2 volxi xj)
	The second Minkowski inequality and Bol's condition
	Mixed volumes as derivatives of the volume
	From the second Minkowski inequality to the signature of the Hessian of the volume
ABSTRACT
  The Colin de Verdi\`ere number $\mu(G)$ of a graph $G$ is the maximum corank
of a Colin de Verdi\`ere matrix for $G$ (that is, of a Schr\"odinger operator
on $G$ with a single negative eigenvalue). In 2001, Lov\'asz gave a
construction that associated to every convex 3-polytope a Colin de Verdi\`ere
matrix of corank 3 for its 1-skeleton.
  We generalize the Lov\'asz construction to higher dimensions by interpreting
it as minus the Hessian matrix of the volume of the polar dual. As a corollary,
$\mu(G) \ge d$ if $G$ is the 1-skeleton of a convex $d$-polytope.
  Determination of the signature of the Hessian of the volume is based on the
second Minkowski inequality for mixed volumes and on Bol's condition for
equality.

<|endoftext|><|startoftext|>
Introduction
Jupiter Trojans are small bodies of the Solar System located in the Jupiter
Lagrangian points L4 and L5. Up to now more than 2000 Trojans have been
discovered, ∼ 1150 belonging to the L4 cloud and ∼ 950 to the L5 one.
The number of L4 Trojans with radius greater than 1 km is estimated to be
around 1.6 ×105 (Jewitt et al., 2000), comparable with the estimated main
belt population of similar size.
The debate about the origin of Jupiter Trojans and how they were trapped
in librating orbits around the Lagrangian points is still open to several possi-
bilities. Considering that Trojans have orbits stable over the age of the Solar
System (Levison et al, 1997, Marzari et al. 2003) their origin must date back
to the early phase of the solar system formation. Some authors (Marzari &
Scholl, 1998a,b; Marzari et al., 2002) suggested that they formed very close
to their present location and were trapped during the growth of Jupiter.
Morbidelli et al. (2005) suggested that Trojans formed in the Kuiper belt
and were subsequently captured in the Jupiter L4 and L5 Lagrangian points
during planetary migration, just after Jupiter and Saturn crossed their mu-
tual 1:2 resonances. In this scenario, Jupiter Troians would give important
clues on the composition and accretion of bodies in the outer regions of the
solar nebula.
Several theoretical studies conclude that Jupiter Trojan clouds are at
least as collisionally evolved as main belt asteroids (Shoemaker et al., 1989;
Binzel & Sauter, 1992; Marzari et al., 1997; Dell’Oro et al., 1998). This
result is supported by the identification of several dynamical families, both
in the L4 and L5 swarm (Shoemaker et al., 1989, Milani, 1993, Beaugé and
Roig, 2001).
Whatever the Trojan origin is, it is plausible to assume that they formed be-
yond the frost line and that they are primitive bodies, are possibly composed
of anhydrous silicates and organic compounds, and possibly still contain ices
in their interior. Several observations of Trojans in the near infrared region
(0.8-2.5 µm) have failed to clearly detect any absorption features indicative
of water ice (Barucci et al, 1994; Dumas et al, 1998; Emery & Brown, 2003,
2004; Dotto et al., 2006). Also in the visible range Trojan spectra appear
featureless (Jewitt & Luu, 1990; Fornasier et al., 2004a, Bendjoya et al.,
2004; Dotto et al., 2006). Up to now only 2 objects (1988 BY1 and 1870
Glaukos) show the possible presence of faint bands (Jewitt & Luu, 1990).
However, these bands are comparable to the peak to peak noise and are not
yet confirmed.
Recently, mineralogical features have been detected in emissivity spectra of
three Trojan asteroids measured by the Spitzer Space Telescope. These fea-
tures are interpreted as indicating the presence of fine-grained silicates on
the surfaces (Emery et al. 2006).
Several questions about Jupiter Trojans’ dynamical origin, physical prop-
erties, composition and link with other groups of minor bodies such as outer
main belt asteroids, cometary nuclei, Centaurs and KBOs are still open.
In order to shed some light on these questions, we have carried out a spectro-
scopic and photometric survey of Jupiter Trojans at the 3.5m New Technol-
ogy Telescope (NTT) of the European Southern Observatory (La Silla, Chile)
and at the 3.5m Telescopio Nazionale Galileo (TNG), La Palma, Spain. In
this paper we present new visible spectroscopic and photometric data, ob-
tained during 7 observing nights, carried out at ESO-NTT on April 2003,
May 2004, and January 2005, for a total of 47 objects belonging to the L5
(23 objects) and L4 (24 objects) swarms. Considering also the results already
published in Fornasier et al. (2004a) and Dotto et al. (2006), obtained in
the framework of the same project, we collected a total sample of 80 Jupiter
Trojan visible spectra, 47 belonging to the L5 clouds and 33 to the L4. This
is the largest homogeneous data set available up to now on these primitive
asteroids.
The principal aim of our survey was the investigation of Jupiter Trojans
belonging to different dynamical families. In fact, since dynamical families
are supposed to be formed from the collisional disruption of parent bodies,
the investigation of the surface properties of small and large family members
can help in understanding the nature of these dynamical groups and might
provide a glimpse of the interior structure of the larger primordial parent
bodies.
We also present an analysis of the visible spectral slopes for all the data in
our survey along with those available in the literature, for a total sample of
142 Trojans.
This enlarged sample allowed us to carry out a significant statistical investi-
gation of the Trojans’ spectral property distributions, as a function of their
orbital and physical parameters, and in comparison with other classes of mi-
nor bodies in the outer Solar System. We also discuss the spectral slope
distribution within the Trojan families.
2 Observations and data reduction
[HERE TABLE 1 AND 2]
The data were obtained in the visible range during 3 different observing
runs at ESO-NTT: 10 and 11 April 2003 for the spectroscopic and pho-
tometric investigation of 6 members of the 4035 1986 WD and 1 member
of 1986 TS6 families; 25 and 26 May 2004 for a spectroscopic survey of L4
Eurybates family; 17, 18, and 19 January 2005 for the spectroscopic and pho-
tometric investigation of 5 Anchises, 6 Misenus, 5 Panthoos, 2 Cloanthus, 2
Sarpedon and 3 Phereclos family members (L5 swarm).
We selected our targets from the list of Jupiter Trojan families provided by
Beaugé and Roig (2001 and P.E.Tr.A. Project at www.daf.on.br/ froig/petra/).
The authors have used a cluster-detection algorithm called Hierarchical Clus-
tering Method (HCM, e.g. Zappalà et al., 1990) to find asteroid families
among Jupiter Trojans starting from a data–base of semi-analytical proper
elements (Beaugé & Roig, 2001). The identification of families is performed
by comparing the mutual distances with a suitable metric in the proper el-
ements’ space. The clustering chain is halted when the mutual distance,
measuring the incremental velocity needed for orbital change after the puta-
tive parent body breakup, is larger than a fixed cut-off value. A lower cutoff
implies a higher statistical significance of the family. Since families in L4 are
on average more robust than those around L5 (Beaugé and Roig, 2001), we
prefer to adopt a cutoff of 100 m/s for the L4 cloud and of 150 m/s for L5.
For the very robust Eurybates family we decided to limit our survey to those
family members defined with a cutoff of 70 m/s.
All the data were acquired using the EMMI instrument, equipped with a
2x1 mosaic of 2048×4096 MIT/LL CCD with square 15µm pixels. For the
spectroscopic investigation during May 2004 and January 2005 runs we used
the grism #1 (150 gr/mm) in RILD mode to cover the wavelength range
4100–9400 Å with a dispersion of 3.1 Å/px (200 Å/mm) at the first order,
while on April 2003 we used a different grism, the #7 (150 gr/mm), covering
the spectral range 5200–9500 Å, with a dispersion of 3.6 Å/px at the first
order. April 2003 and January 2005 spectra were taken through a 1 arcsec
wide slit, while during May 2004 we used a larger slit (1.5 arcsec). The slit
was oriented along the parallactic angle during all the observing runs in order
to avoid flux loss due to the atmospheric differential refraction.
For most objects, the total exposure time was divided into several (usually
2-4) shorter acquisitions. This allowed us to check the asteroid position in
the slit before each acquisition, and correct the telescope pointing and/or
tracking rates if necessary. During each night we also recorded bias, flat–
field, calibration lamp (He-Ar) and several (6-7) spectra of solar analog stars
measured at different airmasses, covering the airmass range of the science
targets. During 17 January 2005, part of the night was lost due to some
technical problems and only 2 solar analog stars were acquired. The ratio
of these 2 stars show minimal variations (less than 1%) in the 5000–8400 Å
range, but higher differences at the edges of this range. For this reason we
omit the spectral region below 4800 Å for most of the asteroids acquired that
night.
The spectra were reduced using ordinary procedures of data reduction as
described in Fornasier et al. (2004a). The reflectivity of each asteroid was
obtained by dividing its spectrum by that of the solar analog star closest
in time and airmass to the object. Spectra were finally smoothed with a
median filter technique, using a box of 19 pixels in the spectral direction for
each point of the spectrum. The threshold was set to 0.1, meaning that the
original value was replaced by the median value if the median value differs
by more than 10% from the original one. The obtained spectra are shown
in Figs. 1–5. In Table 1 and Table 2 we report the circumstances of the
observations and the solar analog stars used respectively for the L5 and L4
family members.
[TABLE 3]
The broadband color data were obtained during the April 2003 and Jan-
uary 2005 runs just before the Trojans’ spectral observation. We used the
RILD mode of EMMI for wide field imaging with the Bessell-type B, V, R,
and I filters (centered respectively at 4139, 5426, 6410 and 7985Å). The ob-
servations were carried out in a 2 × 2 binning mode, yielding a pixel scale
of 0.33 arcsec/pixel. The exposure time varied with the object magnitude:
typically it was about 12-90s in V, 30-180s in B, 12-70s in R and I filters.
The CCD images were reduced and calibrated with a standard method (For-
nasier et al., 2004a), and absolute calibration was obtained through the ob-
servations of several Landolt fields (Landolt, 1992). The instrumental mag-
nitudes were measured using aperture photometry with an integrating radius
typically about three times the average seeing, and sky subtraction was per-
formed using a 5-10 pixels wide annulus around each object.
The results are reported in Table 3. From the visual inspection and the radial
profiles analysis of the images, no coma was detected for any of the observed
Trojans.
On May 2004, as the sky conditions were clear but not photometric, we did
not perform photometry of the Eurybates family targets.
3 Results
[TABLE 4 AND 5]
For each Trojan we computed the slope S of the spectral continuum using
a standard least squared technique for a linear fit in the wavelength range
between 5500 and 8000 Å. The choice of these wavelength limits has been
driven by the spectral coverage of our data. We choose 5500 Å as the lower
limit because of the different instrumental setup used during different ob-
serving runs (with some spectra starting at wavelength ≥ 5200 Å), while
beyond 8000 Å our spectra are generally noisier due to a combination of the
CCD drop-off in sensitivity and the presence of the strong atmospheric water
bands.
The computed slopes and errors are listed in Table 4 and 5. The reported er-
ror bars take into account the 1σ uncertainty of the linear fit plus 0.5%/103Å
attributable to the use of different instruments and solar analog stars (esti-
mated from the different efficiency of the grism used, and from flux losses
due to different slit apertures). In Table 4 and 5 we also report the taxo-
nomic class derived following the Dahlgren & Lagerkvist (1995) classification
scheme.
In the L5 cloud we find 27 D–, 3 DP–, 2 PD–, and 1 P–type objects. In
the L4 cloud we find 10 C–type and 7 P–type objects inside the Eurybates
family, while for the Menelaus, 1986 TS6 and 1986 WD families, including
the data published in Dotto et al. (2006), we get 9 D–, 3 P–, 3C–, and 1
DP–type asteroids.
The majority of the spectra are featureless, although some of the observed
Eurybates’ members show weak spectral absorption features (Fig. 5). These
features are discussed in the following section.
We derived an estimated absolute magnitude H by scaling the measured
V magnitude to r = ∆ = 1 AU and to zero phase assuming G=0.15 (Bowell
et al., 1989). The estimated H magnitude of each Trojan might be skewed
uncertain rotational phase, as the lightcurve amplitudes of Trojans might
vary up to 1 magnitude. In order to investigate possible size dependence in-
side each family, and considering that IRAS diameters are available for very
few objects, we estimate the size using the following relationship:
1329× 10−H/5
where D is the asteroid diameter, p is the geometric albedo, and H is the abso-
lute magnitude. We use H derived from our observations when available, and
from the ASTORB.DAT file (Lowell observatory) for the Eurybates mem-
bers, for which we did not carry out visible photometry. We evaluated the
diameter for an albedo range of 0.03–0.07, assuming a mean albedo of 0.04
for these dark asteroids (Fernandez et al., 2003). The resulting D values are
reported in Tables 4 and 5.
3.1 Dynamical families: L5 swarm
3.1.1 Anchises
[FIGURE 1]
We investigated 5 of the 15 members of the Anchises family (Fig. 1): 1173
Anchises, 23549 1994 ES6, 24452 2000 QU167, 47967 2000 SL298 and 124729
2001 SB173 on 17 January 2005. For 4 out of 5 observed objects we omit
the spectral range below 4800Å due to low S/N ratio and problems with the
solar analog stars. The spectral behavior is confirmed by photometric data
(see Table 3). All the obtained spectra are featureless.
The Anchises family survives at a cutoff corresponding to relative veloc-
ities of 150 m/s. The biggest member, 1173 Anchises, has a diameter of
126 km (IRAS data) and has the lowest spectral slope (3.9 %/103Å) among
the investigated family members. It is classified as P–type, while the other
4 members are all D–types. Anchises was previously observed in the 4000-
7400Å region by Jewitt & Luu (1990), who reported a spectral slope of 3.8
%/103Å, in perfect agreement with the value we found. The three 19-29 km
sized objects have a steeper spectral slope (7.4-9.2 %/103Å), while the small-
est object, 2001 SB173 (spectral slope = 14.78±0.99 %/103Å) is the reddest
one (Table 4).
Even with the uncertainties in the albedo and diameter, a slope–size rela-
tionship is evident among the observed objects, with smaller–fainter members
redder than larger ones (Fig. 7).
3.1.2 Misenus
[FIGURE 2]
For this family we investigated 6 members (11663 1997 GO24, 32794 1989
UE5, 56968 2000 SA92, 99328 2001 UY123, 105685 2000 SC51 and 120453
1988 RE12) out of the 12 grouped at a relative velocity of 150 m/s. The
family survives with the same members also at a stringent cut-off velocity
of 120 m/s. The spectra, together with magnitude color indices transformed
into linear reflectance, are shown in Fig. 2, while the color indices are reported
in Table 3. All the spectra are featureless with different spectral slope values
covering the 4.6–15.9 %/103Å range (Table 4): 1988 RE12 has the lowest
spectral slope and is classified as P–type, 3 objects (11663, 32794 and 2000
SC51) are in the transition region between P– and D– type, with very similar
spectral behavior, while the two other observed members are D–types. Of
these last, 56968 has the highest spectral slope not only inside the family
(15.86 %/103Å) but also inside the whole L5 sample analyzed in this paper.
All the investigated Misenus members are quite faint and have diameters
of a few tens of kilometers. No clear size-slope relationship has been found
inside this family (Fig. 7).
No other data on the Misenus family members are available in the literature,
so we do not know if the large gap between the spectral slope of 56968 and
those of the other 5 investigated objects is real or it could be filled by other
members not yet observed. If real, 56968 can be an interloper inside the
family.
3.1.3 Panthoos
[FIGURE 3]
The Panthoos family has 59 members for a relative velocity cutoff of 150
m/s. We obtained new spectroscopic and photometric data of 5 members:
4829 Sergestus, 30698 Hippokoon, 31821 1999 RK225, 76804 2000 QE and
111113 2001 VK85 (Fig. 3). Three objects presented by Fornasier et al.
(2004a) as belonging to the Astyanax family (23694 1997 KZ3, 32430 2000
RQ83, 30698 Hippokoon) and one to the background population (24444 2000
OP32) are now included among the members of the Panthoos family. Peri-
odic updates of the proper elements can change the family membership. In
particular the Astyanax group disappeared in the latest revision of dynami-
cal families, and its members are now in the Panthoos family within a cutoff
of 150m/s. The Panthos family survives also a cutoff of 120 m/s, with 7
members, and 90 m/s, with 6 members.
We observed 30698 Hippokoon during two different runs (on 9 Nov. 2002
and on 18 Jan. 2005), and both spectral slopes and colors are in agreement
inside the error bars (see Table 3, Table 4, and Fornasier et al., 2004a). No
other data on the Panthoos family are available in the literature.
The analysis of the 8 members (for 24444 only photometry is available)
show featureless spectra with slopes that seem to slightly increase as the
asteroid size decreases (Table 4 and Fig. 7). However, all the members have
dimensions very similar within the uncertainties, making it difficult for any
slope-size relationship to be studied. The largest member, 4829 Sergestus, is
a PD–type with a slope of about 5 %/103Å, while all the other investigated
members are D–types.
3.1.4 Cloantus
[FIGURE 4]
We observed only 2 out of 8 members of the Cloantus family (5511 Cloan-
thus and 51359 2000 SC17, see Fig. 4) as grouped at a cutoff corresponding
to relative velocities of 150 m/s. This family survives at a stringent cutoff
and 3 members (including the two that we observed) also survive for relative
velocities of 60 m/s. Both of the observed objects are D–types with very
similar, featureless, reddish spectra (Table 4 and Fig. 7). 5511 Cloanthus
was observed also by Bendjoya et al. (2004), who found a slope of 13.0±0.1
%/103Å in the 5000-7500 Å wavelength range, while we measure a value of
10.84±0.15 %/103Å. Our spectrum has a higher S/N ratio than the spectrum
by Bendjoya et al. (2004), and it is perfectly matched by our measured color
indices that confirm the spectral slope. This difference cannot be caused by
the slightly different spectral ranges used to measure the slope, but could
possibly be due to heterogeneous surface composition.
3.1.5 Phereclos
The Phereclos family comprises 15 members at a cutoff of 150 m/s. The
family survives with 8 members also at a cutoff of 120m/s. We obtained
spectroscopic and photometric data of 3 members (9030 1989 UX5, 11488
1988 RM11 and 31820 1999 RT186, see Fig 4), that, together with the 4
spectra (2357 Phereclos, 6998 Tithonus, 9430 1996 HU10, 18940 2000QV49)
already presented by Fornasier et al. (2004a), allow us to investigate about
half of the Phereclos family population defined at a cutoff of 150m/s. The
spectral slope of these objects, all classified as D–type except one PD–type
(11488), varies from 5.3 to 11.3 %/103Å (Table 4). The size of the fam-
ily members ranges from about 20 km in diameter for 31820 to 95 km for
2357, but we do not observe any clear slope-diameter relationship (Fig. 7 and
Table 4).
3.1.6 Sarpedon
We obtained new spectroscopic and photometric data of 2 members of the
Sarpedon family (48252 2001 TL212 and 84709 2002 VW120), whose spectra
and magnitude color indices are reported in Fig. 4 and Table 4. Including
the previous observations (Fornasier et al., 2004a) of 4 other members (2223
Sarpedon, 5130 Ilioneus, 17416 1988 RR10, and 25347 1999 RQ116), we have
measurements of 6 of the 21 members of this family dynamically defined at a
cutoff of 150 m/s. All the 6 aforementioned objects, except 25347, constitute
a robust clustering which survives up to 90 m/s with 9 members. The cluster
which contains (2223) Sarpedon was also recognized as a family by Milani
(1993).
All the 6 investigated members have very similar colors (see Table 3) and
spectral behavior. The spectral slope (Fig. 7) varies over a very restricted
range, from 9.6 to 11.6 %/103Å (Table 4), despite a significant variation of
the estimated size (from the 18 km of 17416 to the 105 km of 2223). Con-
sequently, the surface composition of the Sarpedon family members appears
to be very homogeneous.
3.2 Dynamical families: L4 swarm
3.2.1 Eurybates
[FIGURE 5]
Eurybates family members were observed in May 2004. The selection of
the targets was made on the basis of a very stringent cutoff, corresponding
to relative velocities of 70 m/s, that gives a family population of 28 objects.
We observed 17 of these members (see Table 2) that constitute a very robust
clustering in the space of the proper elements: all the members we studied,
except 2002 CT22, survive at a cutoff of 40 m/s.
The spectral behavior of these objects (Fig. 5) is quite homogeneous with
10 asteroids classified as C–type and 7 as P–type. The spectral slopes (Ta-
ble 5) range from neutral to moderately red (from -0.5 to 4.6 %/103Å). The
slopes of six members are close to zero (3 slightly negative) with solar-like col-
ors. The asteroids 18060, 24380, 24420, and 39285, all classified as C–types,
clearly show a drop off of reflectance for wavelength shorter than 5000–5200
Å. The presence of the same feature in the spectra of 2 other members (1996
RD29 and 28958) is less certain due to the lower S/N ratio. This absorp-
tion is commonly seen on main belt C–type asteroids (Vilas 1994; Fornasier
et al. 1999), where is due to the intervalence charge transfer transitions
(IVCT) in oxidized iron, and is often coupled with other visible absorption
features related to the presence of aqueous alteration products (e.g. phyl-
losilicates, oxides, etc). These IVCTs comprise multiple absorptions that are
not uniquely indicative of phyllosilicates, but are present in the spectrum
of any object containing Fe2+ and Fe3+ in its surface material (Vilas 1994).
Since no other phyllosilicate absorption features are present in the C-type
spectra of the Eurybates family, there is no evidence that aqueous alteration
processes occurred on the surface of these bodies.
In Fig. 8 we show the spectral slopes versus the estimated diameters for
the Eurybates family members. All the observed objects, except the largest
member (3548) that has a diameter of about 70 km and exhibit a neutral (∼
solar-like) spectral slope, are smaller than ∼ 40 km and present both neutral
and moderately red colors. The spectral slopes are strongly clustered around
S = 2%/103Å, with higher S values restricted to smaller objects (D< 25
3.2.2 1986 WD
[FIGURE 6]
We investigated 6 out of 17 members of the 4035 1986 WD family that is
dynamically defined at a cutoff of 130 m/s (Fig. 6 and Table 2). Three of our
targets (4035, 6545 and 11351) were already observed by Dotto et al. (2006):
for 6545 and 11351 there is a good consistency between our spectra and
those already published. 4035 was observed also by Bendjoya et al. (2004):
all the spectra are featureless, but Bendjoya et al. (2004) obtain a slope of
8.8 %/103Å, comparable to the one here presented, while Dotto et al. (2006)
found a higher value (see Table 5). This could be interpreted as due to the
different rotational phases seen in the three observations, and could indicate
some inhomogeneities on the surface of 4035.
The observed family members show heterogeneous behaviors (Fig. 8),
with spectral slopes ranging from neutral values for the smaller members
(24341 and 14707) to reddish ones for the 3 members with size bigger than
50 km (4035, 6545, and 11351). For this family, it seems that a size-slope
relationship exists, with smaller members having solar colors and spectral
slopes increasing with the object’ sizes.
3.2.3 1986 TS6
The 1986 TS6 family includes 20 objects at a cut-off of 100 m/s. We present
new spectroscopy and photometry of a single member, 12921 1998 WZ5
(Fig. 6). The spectrum we present here is flat and featureless, with a spectral
slope of 4.6±0.8%/103Å. Dotto et al. (2006) presented a spectrum obtained
a month after our data (in May 2003) that has a very similar spectral slope
3.7± 0.8%/103Å. Previously, 12917 1998 TG16, 13463 Antiphos, 12921 1998
WZ5, 15535 2000 AT177, 20738 1999 XG191, and 24390 2000 AD177 were
included in the Makhoan family. Refined proper elements now place all of
these bodies in the 1986 TS6 family.
In Fig. 8 we report the spectral slopes vs. estimated diameters of the
6 observed members. The family shows different spectral slopes with the
presence of both P–type (12921 and 13463) and D–type asteroids (12917,
15535, 20738, and 24390). Due to the very similar diameters, a slope-size
relationship is not found.
[FIGURE 7 AND 8]
4 Discussion
The spectra of Jupiter Trojan members of dynamical families show a range
of spectral variation from C– to D–type asteroids. With the exception of the
L4 Eurybates family, all the observed objects have featureless spectra, and
we cannot find any spectral bands which could help in the identification of
minerals present on their surfaces. The lack of detection of any mineralogy
diagnostic feature might indicate the formation of a thick mantle on the Tro-
jan surfaces. Such a mantle could be formed by a phase of cometary activity
and/or by space weathering processes as demonstrated by laboratory exper-
iments on originally icy surfaces (Moore et al., 1983; Thompson et al., 1987;
Strazzulla et al., 1998; Hudson & Moore, 1999).
A peculiar case is constituted by the Eurybates family, which shows a pre-
ponderance of C–type objects and a total absence of D–types. Moreover,
this is the only family in which some members exhibit spectral features at
wavelengths shorter than 5000–5200 Å, most likely due to the intervalence
charge transitions in materials containing oxidized iron (Vilas 1994).
4.1 Size vs spectral slope distribution:
Individual families
The plots of spectral slopes vs. diameters are shown in Fig. 7 and 8. A
relationship between spectral slopes and diameters seems to exist for only
three of the nine families we studied. In the Anchises and Panthoos families,
smaller objects have redder spectra, while for the 1986 WD family larger
objects have the redder spectra.
Moroz et al. (2004) have shown that ion irradiation on natural complex
hydrocarbons gradually neutralizes the spectral slopes of these red organic
solids. If the process studied by Moroz et al. (2004) occurred on the surface of
Jupiter Trojans, the objects having redder spectra have to be younger than
those characterized by bluish-neutral spectra. In this scenario the largest
and spectrally reddest objects of the 1986 WD family could come from the
interior of the parent body and expose fresh material. In the case of the
Anchises and Panthoos families the spectrally reddest members, being the
smallest, could come from the interior of the parent body, or alternatively
could be produced by more recent secondary fragmentations. In particular,
small family members may be more easily resurfaced, as significant collisions
(an impactor having a size greater than a few percent of the target), as well
as seismic shaking and recoating by fresh dust, may occur frequently at small
sizes.
[FIGURE 9]
4.2 Size vs slope distribution:
The Trojan population as a whole
[TABLE 6]
As compared to the data available in literature, our sample strongly con-
tributed to the analysis of fainter and smaller Trojans, with estimated di-
ameters smaller than 50 km. Jewitt & Luu (1990), analyzing a sample of
32 Trojans, found that the smaller objects were redder than the bigger ones.
However, our data play against the existence of a possible color-dimension
trend. In fact, the spectral slope’s range of the objects smaller than 50 km
is similar to that of the larger Trojans, as shown in Fig. 9.
The Eurybates family strongly contributes to the population of small
spectrally neutral objects, filling the region of bodies with mean diameter
D<40 km and with spectral slopes smaller than 3 %/103Å.
In order to carry out a complete analysis of the spectroscopic and pho-
tometric characteristics of the whole available data set on Jupiter Trojans,
we considered all the visible spectra published in the literature: Jewitt &
Luu (1990, 32 objects), Fitzimmons et al. (1994, 3 objects), Bendjoya et
al. (2004, 34 objects), Fornasier et al. (2004a, 26 L5 objects), and Dotto et
al. (2006, 24 L4 Trojans). We also add several Trojans spectra (11 L4 and
3 L5 Trojans) from the files available on line (Planetary data System archive,
pdssbn.astro.umd.edu, and www.daf.on.br/∼lazzaro/S3OS2-Pub/s3os2.htm)
from the SMASS I, SMASS II and S3OS2 surveys (Xu et al., 1995; Bus &
Binzel, 2003; Lazzaro et al., 2004). Including all these data, we compile a
sample of 142 different Trojans, 68 belonging to the L5 cloud and 74 belong-
ing to the L4. We performed the taxonomic classification of this enlarged
sample, on the basis of the Dahlgren and Lagerkvist (1995) scheme, by ana-
lyzing spectral slopes computed in the range 5500-8000 Å. Different authors,
of course, considered different spectral ranges for their own slope gradient
evaluations: Jewitt & Luu (1990) and Fitzimmons et al. (1994) use the
4000-7400 Å and Bendjoya et al. (2004, Table 2) used a slightly different
ranges around 5200-7500 Å. Since all the cited papers show spectra with lin-
ear featureless trends, the different wavelength ranges used for the spectral
gradient computation by Bendjoya et al. (2004) and Jewitt & Luu (1990)
are not expected to influence the obtained slopes.
In order to search for a dependency of the spectral slope distribution with
the size of the objects, all observations (from this paper as well as from the
literature) were combined. The objects were isolated in 5 size bins (smaller
than 25 km, 25–50 km, 50–75 km, 75-100 km and larger than 100 km). Each
bin contains between 20 and 50 objects. These subsamples are large enough
to be compared using classical statistical tests: the t-test, which estimates
if the mean values are compatible, the f-test, which checks if the widths of
the distributions are compatible (even if they have different means), and
the KS test, which compares directly the full distributions. A probability is
computed for each test; a small probability indicates that the tested distri-
butions are not compatible, i.e. the objects are not randomly extracted from
the same population, while a large probability value has no meaning (i.e. it
is not possible to assure that both samples come from the same population,
we can just say in that case that they are not incompatible). In order to
quantify the probability levels that we consider as significant, the same tests
were run on randomized distributions (see Hainaut & Delsanti 2002 for the
method). Since probability lower than 0.04-0.05 does not appear in these
randomized distributions, we consider that values smaller than 0.05 indicate
a significant incompatibility.
Each sub-sample was compared with the four others – the results are sum-
marized in Table 6. The average slope of the 5 bins are all compatible among
each other. The only marginally significant result is that the width of the
slope distribution among the larger objects (diam. > 100 km) is narrower
than that of all the smaller objects.
This narrower color distribution could be due to the aging processes affect-
ing the surface of bigger objects, which are supposed to be older. The wider
color distribution of small members is possibly related to the different ages
of their surfaces: some of them could be quite old, while some other could
have been recently refreshed.
4.3 Spectral slopes and L4/L5 Clouds
[HERE FIGURES 10 AND 11]
Considering only the Trojan observations reported in this paper, the aver-
age slope is 8.84±3.03%/103Å for the L5 population, and 4.57±4.01%/103Å
for the L4.
Considering now all the spectra available in the literature, the 68 L5
Trojans have an average slope of 9.15±4.19%/103Å, and the 78 L4 objects,
6.10±4.48%/103Å. Performing the same statistical tests as above, it appears
that these two populations are significantly different. In particular, the av-
erage slopes are incompatible at the 10−5 level.
Nevertheless, as described in Section 3.2.1, the Eurybates family members
have quite different spectral characteristics than the other objects and con-
stitute a large subset of the whole sample. Indeed, comparing their distribu-
tion with the whole populations, they are found significantly different at the
10−10 level. In other words, the Eurybates family members do not constitute
a random subset of the other Trojans.
Once excluded the Eurybates family, the remaining 61 Trojans from the
L4 swarm have an average slope of 7.33±4.24%/103Å. The very slight dif-
ference of average slope between the L5 and remaining L4 objects is very
marginally significant (probability of 1.6%), and the shape and width of the
slope distributions are compatible with each other.
The taxonomic classification we have performed shows that the majority
(73.5%) of the observed L5 Trojans (Fig. 10) are D–type (slope > 7 %/103
Å) with featureless reddish spectra, 11.8% are DP/PD –type (slope between
5 and 7 %/103 Å), 10.3% are P–type, and only 3 objects are classified as
C–type (4.4%).
In the L4 swarm (Fig. 11), even though the D–type still dominate the
population (48.6%), the spectral types are more heterogeneous as compared
to the L5 cloud, with a higher percentage of neutral-bluish objects: 20.3%
are P–type, 8.1% are DP/PD-type, 12.2% are C–type, and 10.8% of the
bodies have negative spectral slope. The higher percentage of C– and P–
type as compared to the L5 swarm is strongly associated with the presence
of the very peculiar Eurybates family. Among 17 observed members 10 are
classified as C–types (among which 3 have negative spectral slopes) and 7 are
P–types. Considering the 57 asteroids that compose the L4 cloud without
the Eurybates family, we find percentages of P, and PD/DP –types very
similar to those of the L5 cloud (14.0% and 10.5% respectively), a smaller
percentage of D–types (63.2%) and of the C–types (3.5%), and the presence
of a 8.8% Trojans with negative spectral slopes.
The visible spectra of the Eurybates members are very similar to those of
C–type main belt asteroids, Chiron-like Centaurs, and cometary nuclei. This
similarity is compatible with three different scenarios: the family could have
been produced by the fragmentation of a parent body very different from
all the other Jupiter Trojans (in which case the origin of such a peculiar
parent must still be assessed); this could be a very old family where space
weathering processes have covered any differences in composition among the
family members and flattened all the spectra; this could be a young family
where space weathering processes occurred within time scales smaller than
the age of the family. In the last two cases the Eurybates family would give
the first observational evidence of spectra flattened owing to space weathering
processes. This would then imply, according to the results of Moroz et al.
(2004), that its primordial composition was rich in complex hydrocarbons.
The knowledge of the age of the Eurybates family is therefore a fundamental
step to investigate the nature and the origin of the parent body, and to assess
the effect of space weathering processes on the surfaces of its members.
The present sample of Jupiter Trojans suggests a more heterogeneous
composition of the L4 swarm as compared to the L5 one. As previously
noted by Bendjoya et al. (2004), the L4 swarm contains a higher percentage
of C– and P–type objects. This result is enhanced by members of the Eu-
rybates family, but remains even when these family members are excluded.
Moreover, the dynamical families belonging to the L4 cloud are more robust
than those of the L5 one, surviving with densely populated clustering even
at low relative velocity cut-off. We therefore could argue that the L4 cloud
is more collisionally active than the L5 swarm. Nevertheless, we still cannot
intepret this in terms of the composition of the two populations, since we
cannot exclude that as yet unobserved C– and P–type families are present
in the L5 cloud.
4.4 Orbital Elements
[HERE FIGURE 12 and TABLES 7 and 8 ]
We analyzed the spectral slope as a function of the Trojans’ orbital el-
ements. As an illustration, Fig. 12 shows the B − R color distribution as
a function of the orbital elements. In order to investigate variations with
orbital parameters, the Trojan population is divided in 2 sub samples: those
with the considered orbital element lower than the median value, and those
with the orbital element higher than the median (by construction, the two
subsamples have the same size). Taking a as an example, half the Trojans
have a < 5.21AU, and half have a larger than this value.
The mean color, the color dispersion, and the color distribution of the
2 subsamples are compared using the three statistical tests mentioned in
Section 4.2. The method is discussed in details in Hainaut & Delsanti (2002).
The tests are repeated for all color and spectral slope distributions. The
results are the following.
• q, perihelion distance: the color distribution of the Trojans with small
q is marginally broader than that of Trojans with larger q. This result
is not very strong (5%), and is dominated by the red-end of the visible
wavelength. Removing the Eurybates from the sample maintains the
result, at the same weak level.
• e, eccentricity: the distribution shows a similar result, also at the weak
5% significance. The objects with larger e have broader color distribu-
tion than those with lower e. This result is entirely dominated by the
Eurybates’ contribution.
• i, inclination: objects with smaller inclination are significantly bluer
than those with larger i. This result is observed at all wavelengths. It
is worth noting that this is contrary to what is usually observed on
other Minor Bodies in the Outer Solar System survey (MBOSSes),
where objects with high i, or more generally, high excitation E =√
e2 + sin2 i, are bluer (Hainaut & Delsanti, 2002; Doressoundiram et
al., 2005). This can also be visually appreciated in Fig. 12. This result
is also completely dominated by the Eurybates’ contribution. The non-
Eurybates Trojans do not display this trend.
• E =
e2 + sin2 i, orbital excitation: the objects with small E are also
significantly bluer than those with high E. This result is also com-
pletely dominated by the Eurybates’ contribution. The non-Eurybates
Trojans do not display this trend.
In summary this analysis shows that the Eurybates sub-sample of the
Trojans is well separated in orbital elements and in colors.
For the other Minor Bodies in the outer Solar System, the relation be-
tween color and inclination–orbital excitation (objects with a higher orbital
excitation tend to be bluer) is interpreted as a relation between excitation
and surface aging/rejuvenating processes (Doressoudiram et al., 2005). The
Eurybates family has low excitation and neutral-blue colors, suggesting that
the aging/rejuvenating processes affecting them are different from the other
objects. This could be due to different surface compositions, different irradi-
ation processes, or different collisional properties – which would be natural
for a collisional family.
5 Comparison with other outer Solar System
minor bodied
5.1 Introduction and methods
[HERE FIGURES 13 AND 14]
The statistical tests set described in section 4.2 has also been applied
to compare the colors and the spectral slopes distribution of the Trojans
with those of the other minor bodies in the outer Solar System taken from
the updated, on-line version of the Hainaut & Delsanti (2002) database.
Figure 13, as an example, displays the (R-I) vs (V-R) diagrams, while Fig. 14
shows the (B-V) and (V-R) color distributions, as well as the spectral slope
distribution of the different classes of objects. The tests were performed
on all the color indices derived from filters in the visible (UBVRI) and near
infrared range (JHK) but in Table 7 and 8 we summarize the most significant
results.
In order to “calibrate” the significant probabilities, additional artificial
classes are also compared: first, the objects which have an even internal
number in the database with the odd ones. As this internal number is purely
arbitrary, both classes are statistically indistinguishable. The other tested
pair is the objects with a “1999” designation versus the others. Again, this
selection criterion is arbitrary, so the pseudo-classes it generates are sub-
sample of the total population, and should be indistinguishable. However, as
many more objects have been discovered in all the other years than during
that specific year, the size of these sub-samples are very different. This
permits us to estimate the sensitivity of the tests on sample of very different
sizes. Some of the tests found the arbitrary populations incompatible at the
5% level, so we use 0.5% as a conservative threshold for statistical significance
of the distribution incompatibility
5.2 Results
Table 7 and Fig. 14 clearly show that the Trojans’ colors distribution is
different as compare to that of Centaurs, TNOs and comets. Trojans are at
the same time bluer, and their distribution is narrower than all the other
populations. Using the statistical tests (see Table 8), we can confirm the
significance of these results.
• The average colors of the Trojans are significantly different from those
of all the other classes of objects (t-test), with the notable exception of
the short period comet nuclei. Refining the test to the Eurybates/non-
Eurybates, it appears that the Eurybates have marginally different
mean colors, while the non-Eurybates average colors are indistinguish-
able from those of the comets.
• Considering the full shape of the distribution (KS test), we obtain the
same results: the Trojans colors distributions are significantly differ-
ent from those of all the other classes, with the exception of the SP
comets, which are compatible. Again, this result becomes stronger
separating the Eurybates: their distributions are different from those
of the comets, while the non-Eurybates ones are indistinguishable.
• The results when considering the widths of the color distributions (f-
test) are slightly different. Classes of objects with different mean colors
could still have the same distribution width. This could suggest that a
similar process (causing the width of the distribution) is in action, but
reached a different equilibrium point (resulting in different mean val-
ues). This time, all classes are incompatible with the Trojans, including
the comets, with strong statistical significance.
In order to further explore possible similarities between Trojans and other
classes, the comparisons were also performed with the neutral Centaurs.
These were selected with S < 20%/103Å); this cut-off line falls in the gap
between the ”neutral” and ”red” Centaurs (Peixinho et al., 2003, Fornasier
et al., 2004b).
The t-Test (mean color) only reveals a very moderate incompatibility be-
tween the Trojans and neutral Centaurs, at the 5% level, i.e. only marginally
significant. On the other hand, the f-Test gives some strong incompatibilities
in various colors (moderate in B− V and H −K, very strong in R− I), but
the two populations are compatible for most of the other colors. Similarly,
only the R − I KS-test reveals a strong incompatibility. It should also be
noted that only 18 neutral Centaurs are known in the database. In summary,
while the Trojans and neutral Centaurs have fairly similar mean colors, their
color distributions are also different.
6 Conclusions
From 2002, we carried out a spectroscopic and photometric survey of Jupiter
Trojans, with the aim of investigating the members of dynamical families.
In this paper we present new data on 47 objects belonging to several dy-
namical families: Anchises (5 members), Cloanthus (2 members), Misenus (6
members), Phereclos (3 members), Sarpedon (2 members) and Panthoos (5
members) from the L5 swarm; Eurybates (17 members), 1986 WD (6 mem-
bers), and Menelaus (1 member) for the L4 swarm. Together with the data
already published by Fornasier et al. (2004a) and Dotto et al. (2006), taken
within the same observing program, we have a total sample of 80 Trojans,
the largest homogeneous data set available to date on these primitive aster-
oids. The main results coming from the observations presented here, and
from the analysis including previously published visible spectra of Trojans,
are the following:
• Trojans’ visible spectra are mostly featureless. However, some mem-
bers of the Eurybates family show a UV drop-off in reflectivity for
wavelength shorter than 5000–5200 Å that is possibly due to interva-
lence charge transfer transitions (IVCT) in oxidized iron.
• The L4 Eurybates family strongly differs from all the other families
in that it is dominated by C– and P–type asteroids. Also its spectral
slope distribution is significantly different when compared to that of
the other Trojans (at the 10−10 level).
This family is very peculiar and is dynamically very strong, as it sur-
vives also at a very stringent cutoff (40 m/s). Further observations in
the near-infrared region are strongly encouraged to look for possible
absorption features due to water ice or to material that experienced
aqueous alteration.
• The average spectral slope for the L5 Trojans is 9.15±4.19%/103Å, and
6.10±4.48%/103Å for the L4 objects. Excluding the Eurybates, the L4
average slope values becomes 7.33±4.24%/103Å. The slope distribu-
tions of the L5 and of the non-Eurybates L4 are indistinguishable.
• Both L4 and L5 clouds are dominated by D–type asteroids, but the L4
swarm has an higher presence of C– and P–type asteroids, even when
the Eurybates family is excluded, and appears more heterogeneous in
composition as compared to the L5 one.
• We do not find any size versus spectral slope relationship inside the
whole Trojans population.
• The Trojans with higher orbital inclination are significantly redder than
those with lower i. While this trend is the opposite of that observed
for other distant minor bodies, this effect is entirely dominated by the
Eurybates family.
• Comparing the Trojans colors with those of other distant minor bod-
ies, they are the bluest of all classes, and their colors distribution is the
narrowest. This difference is mostly due to the Eurybates family. In
fact, if we consider only the Trojan population without the Eurybates
members, their average colors and overall distributions are not distin-
guishable from that of the short period comets. However, the widths
of their color distributions are not compatible. The similarity in the
overall color distributions might be caused by the small size of the short
period comet sample rather than by a physical analogy. The Trojans
average colors are also fairly similar to those of the neutral Centaurs,
but the overall distributions are not compatible.
After this study, we have to conclude that Trojans have peculiar charac-
teristics very different from those of all the other populations of the outer
Solar System.
Unfortunately, we still cannot assess if this is due to differences in the physi-
cal nature, or in the aging/rejuvenating processes which modified the surface
materials in different way at different solar distances. Further observations,
mainly in V+NIR spectroscopy and polarimetry, are absolutely needed to
better investigate the nature of Jupiter Trojans and to definitively assess if a
genetical link might exist with Trans-Neptunian Objects, Centaurs and short
period comets.
Acknowledgments
We thank Beaugé and Roig for kindly providing us with updated Trojan
family list, and R.P. Binzel and J.P. Emery for their useful comments in the
reviewing process.
References
Barucci, M. A., Lazzarin, M., Owen, T., Barbieri, C., Fulchignoni, M.,
1994. Near–infrared spectroscopy of dark asteroids. Icarus 110, 287-291.
Beaugé, C., Roig, F., 2001. A Semianalytical Model for the Motion of
the Trojan Asteroids: Proper Elements and Families. Icarus 53, 391-415.
Bendjoya, P., Cellino, A., Di Martino, M., Saba, L., 2004. Spectroscopic
observations of Jupiter Trojans. Icarus 168, 374-384.
Binzel, R. P., Sauter, L. M., 1992. Trojan, Hilda, and Cybele asteroids -
New lightcurve observations and analysis. Icarus 95, 222-238.
Bowell, E., Hapke, B., Domingue, D., Lumme, K., Peltoniemi, J., Harris,
A.W., 2003. Application of photometric models to asteroids. In Asteroids
II (R.P Binzel, T. Gehrels, M.S. Matthews, eds) Univ. of Arizona Press,
Tucson, pp. 524–556.
Bus, S. J., Binzel, R.P., 2003. Phase II of the Small Main-Belt Asteroid
Spectroscopic Survey. The Observations. Icarus 158, 106–145.
Dahlgren, M., Lagerkvist, C. I., 1995. A study of Hilda asteroids. I. CCD
spectroscopy of Hilda asteroids. Astron. Astrophys. 302, 907-914.
Dell’Oro, A., Marzari, P., Paolicchi F., Dotto, E., Vanzani, V., 1998.
Trojan collision probability: a statistical approach. Astron. Astrophys. 339,
272-277.
Doressoundiram, A., Peixinho, N., Doucet, C., Mousis, O., Barucci, M. A.,
Petit, J. M., Veillet, C., 2005. The Meudon Multicolor Survey (2MS) of
Centaurs and trans-neptunian objects: extended dataset and status on the
correlations reported. Icarus 174, 90–104.
Dotto, E., Fornasier, S., Barucci, M. A., Licandro, J., Boehnhardt, H.,
Hainaut, O., Marzari, F., de Bergh, C., De Luise, F., 2006. The surface com-
position of Jupiter Trojans: Visible and Near–Infrared Survey of Dynamical
Families. Icarus 183, 420-434
Dumas, C., Owen, T., Barucci, M. A., 1998. Near-Infrared Spectroscopy
of Low-Albedo Surfaces of the Solar System: Search for the Spectral Signa-
ture of Dark Material. Icarus 133, 221-232.
Emery, J. P., Brown, R. H., 2003. Constraints on the surface composition
of Trojan asteroids from near-infrared (0.8-4.0 µm) spectroscopy. Icarus 164,
104-121.
Emery, J. P., Brown, R. H., 2004. The surface composition of Trojan
asteroids: constraints set by scattering theory. Icarus 170, 131-152.
Emery, J. P., Cruikshank, D. P., Van Cleve, J., 2006. Thermal emission
spectroscopy (5.2 38 µm of three Trojan asteroids with the Spitzer Space
Telescope: Detection of fine-grained silicates. Icarus 182, 496-512.
Fernandez Y. R., Sheppard, S. S., Jewitt, D. C., 2003. The Albedo Dis-
tribution of Jovian Trojan Asteroids. Astron. J. 126, 1563-1574.
Fitzsimmons, A., Dahlgren, M., Lagerkvist, C. I., Magnusson, P., Williams,
I. P., 1994. A spectroscopic survey of D-type asteroids. Astron. Astrophys.
282, 634-642.
Fornasier, S., Lazzarin, M., Barbieri, C., Barucci, M. A., 1999. Spec-
troscopic comparison of aqueous altered asteroids with CM2 carbonaceous
chondrite meteorites. Astron. Astrophys. 135, 65-73
Fornasier, S., Dotto, E., Marzari, F., Barucci, M.A., Boehnhardt, H.,
Hainaut, O., de Bergh, C., 2004a. Visible spectroscopic and photometric
survey of L5 Trojans : investigation of dynamical families. Icarus, 172, 221–
Fornasier, S., Doressoundiram, A., Tozzi, G. P., Barucci, M. A., Boehn-
hardt, H., de Bergh, C., Delsanti A., Davies, J., Dotto, E., 2004b. ESO Large
Program on Physical Studies of Trans-Neptunian Objects and Centaurs: fi-
nal results of the visible spectroscopic observations. Astron. Astrophys. 421,
353-363.
Hainaut, O. R., Delsanti, A. C., 2002. Colors of Minor Bodies in the
Outer Solar System. A statistical analysis. Astron. Astroph. 389, 641-664.
Hudson, R.L., Moore, M.H. 1999. Laboratory Studies of the Formation
of Methanol and Other Organic Molecules by Water+Carbon Monoxide Ra-
diolysis: Relevance to Comets, Icy Satellites, and Interstellar Ices. Icarus
140, 451-461.
Jewitt, D. C., Luu, J. X., 1990. CCD spectra of asteroids. II - The Tro-
jans as spectral analogs of cometary nuclei. Astron. J. 100, 933-944.
Jewitt, D. C., Trujillo, C. A., Luu, J. X., 2000. Population and Size Dis-
tribution of Small Jovian Trojan Asteroids. Astron. J. 120, 1140-1147
Landolt, A. U., 1992. UBVRI photometric standard stars in the magni-
tude range 11.5–16.0 around the celestial equator. Astron. J . 104, 340-371,
436-491.
Lazzaro, D., Angeli, C. A., Carvano, J. M., Mothé-Diniz, T., Duffard, R.,
Florczak, M., 2004. S3OS2: the visible spectroscopic survey of 820 asteroids.
Icarus 172, 179–220.
Levison, H., Shoemaker, E. M., Shoemaker, C. S., 1997. The dispersal of
the Trojan asteroid swarm. Nature 385, 42-44.
Marzari, F., Farinella, P., Davis, D. R., Scholl, H., Campo Bagatin, A.,
1997. Collisional Evolution of Trojan Asteroids. Icarus 125, 39-49.
Marzari, F., Scholl, H., 1998a. Capture of Trojans by a Growing Proto-
Jupiter. Icarus 131, 41-51.
Marzari, F., Scholl, H., 1998b. The growth of Jupiter and Saturn and the
capture of Trojans. Astron. Astroph. 339, 278-285
Marzari, F., Scholl, H., Murray, C., Lagerkvist, C., 2002. Origin and Evo-
lution of Trojan Asteroids. In Asteroids III, W. F. Bottke Jr., A. Cellino,
P. Paolicchi, and R. P. Binzel (eds), University of Arizona Press, Tucson,
725-738.
Marzari, F., Tricarico, P., Scholl, H., 2003. Stability of Jupiter Trojans
investigated using frequency map analysis: the MATROS project. MNRAS
345, 1091-1100.
Milani, A., 1993. The Trojan asteroid belt: Proper elements, stability,
chaos and families. Celest. Mech. Dynam. Astron. 57, 59-94.
Morbidelli, A., Levison, H. F., Tsiganis, K., Gomes, R., 2005. Chaotic
capture of Jupiter’s Trojan asteroids in the early Solar System. Nature 435,
462-465.
Moore, M.H., Donn, B., Khanna, R., A’Hearn, M.F., 1983. Studies of
proton-irradiated cometary-type ice mixtures. Icarus 54, 388-405.
Moroz L., Baratta G., Strazzulla G., Starukhina L., Dotto E., Barucci
M.A., Arnold G., Distefano E. 2004. Optical alteration of complex organics
induced by ion irradiation: 1. Laboratory experiments suggest unusual space
weathering trend. Icarus 170, 214-228.
Peixinho, N., Doressoundiram, A., Delsanti, A., Boehnhardt, H., Barucci,
M. A., Belskaya, I., 2003. Reopening the TNOs color controversy: Centaurs
bimodality and TNOs unimodality. Astron. Astrophys. 410, 29–32.
Shoemaker, E. M., Shoemaker, C. S., Wolfe, R. F., 1989. Trojan aster-
oids: populations, dynamical structure and origin of the L4 and L5 swarms.
In Binzel, Gehrels, Matthews (Eds.), Asteroids II. Univ. of Arizona Press,
Tucson, pp. 487-523.
Strazzulla, G., 1998. Chemistry of Ice Induced by Bombardment with
Energetic Charged Particles. In Solar System Ices (B. Schmitt, C. de Bergh,
M. Festou, eds.), Kluwer Academic, Dordrecht, Astrophys. Space Sci Lib.
Thompson, W.R., Murray, B.G.J.P.T., Khare, B.N., Sagan, C. 1987. Col-
oration and darkening of methane clathrate and other ices by charged particle
irradiation - Applications to the outer solar system. JGR 92, 14933-14947.
Xu, S., Binzel, R. P., Burbine, T. H., Bus, S. J., 1995. Small main-belt
asteroid spectroscopic survey: Initial results. Icarus 115, 1–35.
Vilas, F. 1994. A quick look method of detecting water of hydration in
small solar system bodies. LPI 25, 1439-1440.
Zappala, V., Cellino, A., Farinella, P., Knez̆ević, Z., 1990. Asteroid fam-
ilies. I - Identification by hierarchical clustering and reliability assessment.
Astron. J. 100, 2030-2046.
Tables
Table 1: Observing conditions of the investigated L5 asteroids. For each
object we report the observational date and universal time, total exposure
time, number of acquisitions with exposure time of each acquisition, airmass,
and the observed solar analogs with their airmass.
Obj Date UT Texp (s) nexp air. Solar An. (air.)
Anchises
1173 17 Jan 05 06:06 60 1×60s 1.42 HD76151 (1.48)
23549 17 Jan 05 07:20 480 2×240s 1.60 HD76151 (1.48)
24452 17 Jan 05 07:54 960 4×240s 1.44 HD76151 (1.48)
47967 17 Jan 05 05:34 800 2×400s 1.38 HD76151 (1.48)
2001 SB173 17 Jan 05 06:28 1200 2×600s 1.35 HD76151 (1.48)
Cloanthus
5511 19 Jan 05 06:04 960 4×240s 1.26 HD76151 (1.12)
51359 19 Jan 05 04:13 660 1×660s 1.36 HD76151 (1.12)
Misenus
11663 17 Jan 05 05:13 400 1×400s 1.21 HD44594 (1.12)
32794 18 Jan 05 03:13 1800 2×900s 1.39 HD28099 (1.44)
56968 17 Jan 05 04:31 400 2×400s 1.21 HD44594 (1.12)
1988 RE12 18 Jan 05 04:12 2000 2×1000s 1.31 HD28099 (1.44)
2000 SC51 18 Jan 05 06:09 1320 2×660s 1.16 HD44594 (1.17)
2001 UY123 18 Jan 05 06:46 1320 2×660s 1.32 HD44594 (1.17)
Phereclos
9030 18 Jan 05 08:19 1000 1×1000s 1.37 HD44594 (1.17)
11488 19 Jan 05 03:31 1320 2×660s 1.99 HD76151 (1.12)
31820 19 Jan 05 07:02 1320 2×660s 1.35 HD76151 (1.11)
Sarpedon
48252 18 Jan 05 02:32 1320 2×660s 1.30 HD28099 (1.44)
84709 19 Jan 05 05:35 1320 2×660s 1.34 HD76151 (1.12)
Panthoos
4829 17 Jan 05 08:37 720 3×240s 1.45 HD76151 (1.48)
30698 18 Jan 05 01:54 1320 2×660s 1.73 HD28099 (1.44)
31821 18 Jan 05 05:27 1320 2×660s 1.35 HD28099 (1.44)
76804 17 Jan 05 03:35 1800 3×600s 1.38 HD44594 (1.12)
2001 VK85 18 Jan 05 07:31 2000 2×1000s 1.23 HD44594 (1.17)
Table 2: Observing conditions of the investigated L4 asteroids. For each
object we report the observational date and universal time, total exposure
time, number of acquisitions with exposure time of each acquisition, airmass,
and the observed solar analogs with their airmass.
Obj Date UT Texp (s) nexp air. Solar An. (air.)
Eurybates
3548 25 May 04 05:14 600 2×300s 1.02 SA107-684 (1.19)
9818 26 May 04 00:13 780 1×780s 1.19 SA102-1081(1.15)
13862 25 May 04 03:35 1200 2×600s 1.09 SA107-998 (1.15)
18060 25 May 04 02:47 1500 2×750s 1.07 SA107-998 (1.15)
24380 25 May 04 06:53 780 1×780s 1.18 SA107-684 (1.19)
24420 25 May 04 08:49 900 1×900s 1.59 SA112-1333 (1.17)
24426 26 May 04 00:13 1440 2×720s 1.13 SA107-684 (1.17)
28958 26 May 04 07:14 1800 2×900s 1.35 SA107-684 (1.17)
39285 25 May 04 05:40 2700 3×900s 1.09 SA107-684 (1.19)
43212 25 May 04 07:39 2340 3×780s 1.39 SA110-361 (1.15)
53469 25 May 04 02:05 1800 2×900s 1.04 SA107-998 (1.15)
65150 26 May 04 01:59 3600 4×900s 1.07 SA102-1081 (1.20)
65225 26 May 04 03:40 3600 4×900s 1.04 SA107-684 (1.17)
1996RD29 26 May 04 05:12 2700 3×900s 1.10 SA107-684 (1.17)
2000AT44 25 May 04 04:14 1800 2×900s 1.04 SA107-684 (1.19)
2002CT22 26 May 04 00:49 2400 4×600s 1.08 SA102-1081 (1.15)
2002EN68 26 May 04 08:10 1800 2×900s 1.62 SA107-684 (1.17)
1986 WD
4035 10 Apr 03 03:28 600 1×600s 1.09 SA107-684 (1.15)
6545 10 Apr 03 02:39 900 1×900s 1.16 SA107-684 (1.15)
11351 10 Apr 03 09:21 900 1×900s 1.28 SA107-684 (1.15)
14707 11 Apr 03 08:11 1200 1×1200s 1.15 SA107-684 (1.15)
24233 11 Apr 03 02:29 1200 1×1200s 1.39 SA107-684 (1.37)
24341 11 Apr 03 05:47 900 1×900s 1.16 SA107-684 (1.17)
1986 TS6
12921 10 Apr 03 07:33 900 1×900s 1.39 SA107-684 (1.40)
Table 3: Visible photometric observations of L4 and L5 Trojans (ESO-NTT
EMMI): for each object, date, computed V magnitude, B-V, V-R and V-
I colors are reported. The given UT is for the V filter acquisition. The
observing photometric sequence (V-R-B-I) took a few minutes.
Object date UT V B-V V-R V-I
1986 WD
4035 10 Apr 03 03:11 16.892±0.031 0.752±0.040 0.473±0.042 0.926±0.055
4035 10 Apr 03 04:22 16.981±0.031 0.752±0.040 0.495±0.042 0.945±0.055
6545 10 Apr 03 02:22 17.558±0.031 0.734±0.041 0.499±0.042 0.935±0.055
11351 10 Apr 03 09:03 18.407±0.032 0.739±0.044 0.498±0.044 0.900±0.057
14707 11 Apr 03 06:46 18.666±0.031 0.751±0.041 0.401±0.033 0.804±0.055
14707 11 Apr 03 08:37 18.873±0.031 0.754±0.041 0.424±0.033 0.790±0.056
24233 11 Apr 03 01:33 18.894±0.034 0.704±0.051 0.481±0.037 0.899±0.058
24341 11 Apr 03 05:05 19.376±0.032 0.713±0.043 0.369±0.035 0.759±0.057
1986 TS6
12921 10 Apr 03 07:12 18.393±0.031 0.673±0.040 0.421±0.042 0.786±0.055
L5 cut off 150m/s
Anchises
1173 17 Jan 05 05:54 16.595±0.024 0.811±0.034 0.402±0.035 0.805±0.038
23549 17 Jan 05 07:09 18.969±0.050 0.800±0.071 0.485±0.068 0.872±0.075
24452 17 Jan 05 07:48 18.757±0.043 0.872±0.056 0.441±0.056 0.847±0.066
47967 17 Jan 05 05:27 19.382±0.044 0.899±0.058 0.489±0.069 0.965±0.075
2001 SB173 17 Jan 05 06:20 19.882±0.043 0.992±0.060 0.503±0.064 0.927±0.078
Cloanthus
5511 19 Jan 05 05:52 17.968±0.020 0.906±0.027 0.442±0.027 0.968±0.032
51359 19 Jan 05 03:54 19.631±0.102 0.864±0.201 0.447±0.131 0.885±0.164
Misenus
11663 17 Jan 05 05:05 18.473±0.022 0.837±0.030 0.409±0.030 0.872±0.039
32794 18 Jan 05 03:07 19.685±0.038 0.923±0.065 0.393±0.056 0.879±0.057
56968 17 Jan 05 04:18 18.596±0.026 0.986±0.040 0.494±0.033 1.003±0.036
1988 RE12 18 Jan 05 04:00 20.892±0.081 0.826±0.132 0.388±0.108 0.871±0.106
2000 SC51 18 Jan 05 06:03 19.876±0.038 1.016±0.055 0.444±0.059 0.896±0.056
2001 UY123 18 Jan 05 06:41 19.869±0.047 0.890±0.058 0.537±0.056 0.971±0.063
Phereclos
9030 18 Jan 05 08:14 18.397±0.020 0.887±0.024 0.493±0.027 0.973±0.028
11488 19 Jan 05 02:57 18.931±0.066 0.868±0.101 0.430±0.079 0.848±0.084
31820 19 Jan 05 06:39 20.041±0.077 0.889±0.093 0.520±0.091 0.916±0.123
Sarpedon
48252 18 Jan 05 02:25 19.878±0.060 0.949±0.100 0.467±0.093 0.903±0.090
84709 19 Jan 05 05:10 19.862±0.068 0.855±0.087 0.462±0.090 1.010±0.094
Panthoos
4829 17 Jan 05 08:18 18.430±0.029 0.851±0.050 0.420±0.039 0.792±0.052
30698 18 Jan 05 01:45 19.353±0.036 – 0.472±0.042 0.865±0.047
31821 18 Jan 05 05:21 19.328±0.076 0.980±0.111 0.440±0.097 0.901±0.108
76804 17 Jan 05 03:21 19.471±0.065 0.803±0.082 0.446±0.070 0.889± 0.080
2001 VK85 18 Jan 05 07:23 20.179±0.038 0.822±0.063 0.462±0.048 1.020±0.050
Table 4: L5 families. We report for each target the absolute magnitude H
and the estimated diameter (diameters marked by ∗ are taken from IRAS
data), the spectral slope S computed between 5500 and 8000 Å and the
taxonomic class (T) derived following Dahlgren & Lagerkvist (1995) classi-
fication scheme. The asteroids marked with a were observed by Fornasier
et al. (2004a), and their spectral slope values have been recomputed in the
5500-8000 Å wavelength range; asteroids 23694, 30698 and 32430, previously
Astyanax members, have been reassigned to the Panthoos family due to re-
fined proper elements.
Obj H D (km) S (%/103Å) T
Anchises
1173 8.99 ∗126+11
3.87±0.70 P
23549 12.04 26+4
8.49±0.88 D
24452 11.85 29+5
7.42±0.70 D
47967 12.15 25+4
9.21±0.78 D
2001 SB173 12.77 19+3
14.78±0.99 D
Cloanthus
5511 10.43 55+8
10.84±0.65 D
51359 12.25 24+6
12.63±1.30 D
Misenus
11663 10.95 44+7
6.91±0.70 DP
32794 12.77 19+3
6.59±0.88 DP
56968 11.72 30+5
15.86±0.71 D
1988 RE12 13.20 16+2
4.68±1.20 P
2000 SC51 12.69 20+3
6.54±0.98 DP
2001 UY123 12.75 19+3
8.28±0.88 D
Phereclos
a2357 8.86 ∗95+4
9.91±0.68 D
a6998 11.43 34+5
11.30±0.75 D
9030 11.14 40+6
10.35±0.76 D
a9430 11.47 35+5
10.02±0.90 D
11488 11.82 29+5
5.37±0.92 PD
a18940 11.81 29+4
7.13±0.75 D
31820 12.63 20+3
7.53±0.80 D
Sarpedon
a2223 9.25 ∗95+4
10.20±0.65 D
a5130 9.85 71+11
10.45±0.65 D
a17416 12.83 18+3
10.80±0.90 D
a25347 11.59 33+5
10.11±0.83 D
48252 12.84 18+3
9.62±0.82 D
84709 12.70 19+3
11.64±0.84 D
Panthoos
4829 11.16 39+6
5.03±0.70 PD
a23694 11.61 32+5
8.20±0.72 D
30698 12.14 25+4
8.23±1.00 D
a30698 12.27 25+4
9.08±0.82 D
a32430 12.23 25+4
8.12±1.00 D
31821 11.99 27+4
10.58±0.82 D
76804 12.16 25+4
7.29±0.71 D
2001 VK85 12.79 19+3
14.39±0.81 D
Table 5: L4 Families. We report for each target the absolute magnitude H
and the estimated diameter (diameters marked by ∗ are taken from IRAS
data, while absolute magnitudes marked by ∗∗ are taken from the astorb.dat
file of the Lowell Observatory), the spectral slope S computed between 5500
and 8000 Å, and the taxonomic class (T) derived following Dahlgren &
Lagerkvist (1995) classification scheme. The asteroids marked with a were
observed by Dotto et al. (2006), and their spectral slope values have been
recomputed in the 5500-8000 Å wavelength range.
Obj H D (km) S (%/103Å) T
Eurybates
3548 9.50∗∗ ∗72+4
-0.18±0.57 C
9818 11.00∗∗ 42+6
2.12±0.72 P
13862 11.10∗∗ 40+6
1.59±0.70 C
18060 11.10∗∗ 40+6
2.86±0.60 P
24380 11.20∗∗ 38+6
0.34±0.65 C
24420 11.50∗∗ 33+5
1.65±0.70 C
24426 12.50∗∗ 21+3
4.64±0.80 P
28958 12.10∗∗ 25+4
-0.04±0.80 C
39285 12.90∗∗ 17+3
0.25±0.69 C
43212 12.30∗∗ 23+4
1.19±0.78 C
53469 11.80∗∗ 29+4
0.17±0.80 C
65150 12.90∗∗ 17+3
4.14±0.70 P
65225 12.80∗∗ 18+3
0.97±0.85 C
1996RD29 13.06∗∗ 16+3
2.76±0.89 P
2000AT44 12.16∗∗ 24+3
-0.53±0.83 C
2002CT22 12.04∗∗ 26+4
2.76±0.73 P
2002EN68 12.30∗∗ 23+3
3.60±0.98 P
1986 WD
4035 9.72 ∗68+5
9.78±0.61 D
a4035 9.30∗∗ ∗68+5
15.19±0.61 D
6545 10.42 55+8
11.32±0.63 D
a6545 10.00∗∗ 66+10
9.88±0.56 D
11351 10.88 44+7
10.26±0.67 D
a11351 10.50∗∗ 53+8
10.44±0.61 D
14707 11.25 38+6
−9.4 -1.06±1.00 C
24233 11.58 33+5
−8.0 6.37±0.67 DP
24341 11.99 27+4
-0.26±0.71 C
1986 TS6
12917 11.61 32+5
10.98±0.68 D
12921 11.12 40+6
4.63±0.75 P
a12921 10.70∗∗ 48+7
3.74±1.00 P
13463 11.27 37+6
4.37±0.65 P
15535 10.70 48+7
10.67±0.65 D
20738 11.67 31+5
8.84±0.70 D
24390 11.80 29+5
9.53±0.62 D
Table 6: Results of the statistical analysis on the spectral slope distribution as a function of the diameters. For each
test bin, the average slope and the dispersion are listed; the size of the sample is reported in parenthesis. For each pair
of subsamples, the probability that both are randomly extracted from the same global sample is listed, as estimated
by the t-, f- and ks-test, respectively. Low probability indicates significant differences between the subsamples.
Diameter range 0–25 km 25–50 km 50–75 km 75–100 km >100 km
S average±σ 7.17±4.79 (22) 6.92±4.69 (48) 8.91±4.68 (26) 6.74±5.85 (21) 7.87±2.88 (21)
(%/103Å)
0–25 0.842 0.876 0.579 0.213 0.903 0.575 0.792 0.370 0.775 0.551 0.017 0.494
25–50 0.088 0.985 0.150 0.897 0.216 0.519 0.286 0.011 0.275
50–75 0.176 0.289 0.469 0.344 0.019 0.440
75–100 0.442 0.001 0.469
Table 7: Mean color indices and spectral slope of different classes of minor bodies of the outer Solar System. For each
class the number of objects considered is also listed.
Color Plutinos Cubewanos Centaurs Scattered Comets Trojans
B-V 36 87 29 33 2 74
0.895± 0.190 0.973± 0.174 0.886± 0.213 0.875± 0.159 0.795± 0.035 0.777± 0.091
V-R 38 96 30 34 19 80
0.568± 0.106 0.622± 0.126 0.573± 0.127 0.553± 0.132 0.441± 0.122 0.445± 0.048
V-I 34 64 25 25 7 80
1.095± 0.201 1.181± 0.237 1.104± 0.245 1.070± 0.220 0.935± 0.141 0.861± 0.090
V-J 10 14 11 8 1 12
2.151± 0.302 1.750± 0.456 1.904± 0.480 2.041± 0.391 1.630± 0.000 1.551± 0.120
V-H 3 7 11 4 1 12
2.698± 0.083 2.173± 0.796 2.388± 0.439 2.605± 0.335 1.990± 0.000 1.986± 0.177
V-K 2 5 9 2 1 12
2.763± 0.000 2.204± 1.020 2.412± 0.396 2.730± 0.099 2.130± 0.000 2.125± 0.206
R-I 34 64 25 26 8 80
0.536± 0.135 0.586± 0.148 0.548± 0.150 0.517± 0.102 0.451± 0.059 0.416± 0.057
J-H 11 17 21 11 1 12
0.403± 0.292 0.370± 0.297 0.396± 0.112 0.348± 0.127 0.360± 0.000 0.434± 0.064
H-K 10 16 20 10 1 12
-0.034± 0.171 0.084± 0.231 0.090± 0.142 0.091± 0.136 0.140± 0.000 0.139± 0.041
Slope 38 91 30 34 8 80
(%/103Å) 19.852± 10.944 25.603± 13.234 20.601± 13.323 18.365± 12.141 10.722± 6.634 7.241± 3.909
Table 8: Statistical tests performed to compare the color and slope distributions of different classes of minor bodies
(Plt= Plutinos, Resonant TNOs; QB1= Cubiwanos, Classical TNOs; Cent= Centaurs; Scat= scattered TNOs; Com=
Short Period Comet nuclei) with those of Trojans. The first five columns consider all the Trojans, the second five only
the Eurybates family, the third five only the non-Eurybates family Trojans. For each color, the first line shows the
number of objects used for the comparison (2nd is the number of Trojans), and the second line reports the probability
resulting from the test. A very low value indicates that the two compared distributions are not statistically compatible.
Probabilities are in boldface when the size of the samples is large enough for the value to be meaningful.
f-test
Color All Trojans Only Eurybates Only NON-Eurybates
Plt QB1 Cent Scat Com Plt QB1 Cent Scat Com Plt QB1 Cent Scat Com
B-V 36 74 83 74 29 74 33 74 2 74 36 14 83 14 29 14 33 14 2 14 36 60 83 60 29 60 33 60 2 60
0.000 0.000 0.000 0.000 0.600 0.001 0.001 0.000 0.005 0.722 0.000 0.000 0.000 0.000 0.598
V-R 38 80 92 80 30 80 34 80 19 80 38 17 92 17 30 17 34 17 19 17 38 63 92 63 30 63 34 63 19 63
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
R-I 34 80 62 80 25 80 26 80 8 80 34 17 62 17 25 17 26 17 8 17 34 63 62 63 25 63 26 63 8 63
0.000 0.000 0.000 0.000 0.773 0.000 0.000 0.000 0.001 0.335 0.000 0.000 0.000 0.000 0.185
Slope 38 80 87 80 30 80 34 80 8 80 38 17 87 17 30 17 34 17 8 17 38 63 87 63 30 63 34 63 8 63
0.000 0.000 0.000 0.000 0.020 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
t-test
Color All Trojans Only Eurybates Only NON-Eurybates
Plt QB1 Cent Scat Com Plt QB1 Cent Scat Com Plt QB1 Cent Scat Com
B-V 36 74 83 74 29 74 33 74 2 74 36 14 83 14 29 14 33 14 2 14 36 60 83 60 29 60 33 60 2 60
0.001 0.000 0.012 0.002 0.608 0.000 0.000 0.001 0.000 0.139 0.003 0.000 0.025 0.006 0.858
V-R 38 80 92 80 30 80 34 80 19 80 38 17 92 17 30 17 34 17 19 17 38 63 92 63 30 63 34 63 19 63
0.000 0.000 0.000 0.000 0.916 0.000 0.000 0.000 0.000 0.083 0.000 0.000 0.000 0.000 0.532
R-I 34 80 62 80 25 80 26 80 8 80 34 17 62 17 25 17 26 17 8 17 34 63 62 63 25 63 26 63 8 63
0.000 0.000 0.000 0.000 0.154 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.001 0.000 0.502
Slope 38 80 87 80 30 80 34 80 8 80 38 17 87 17 30 17 34 17 8 17 38 63 87 63 30 63 34 63 8 63
0.000 0.000 0.000 0.000 0.185 0.000 0.000 0.000 0.000 0.008 0.000 0.000 0.000 0.000 0.404
KS-test
Color All Trojans Only Eurybates Only NON-Eurybates
Plt QB1 Cent Scat Com Plt QB1 Cent Scat Com Plt QB1 Cent Scat Com
B-V 36 74 83 74 29 74 33 74 2 74 36 14 83 14 29 14 33 14 2 14 36 60 83 60 29 60 33 60 2 60
0.001 0.000 0.001 0.004 0.330 0.002 0.000 0.035 0.000 0.065 0.003 0.000 0.002 0.047 0.468
V-R 38 80 92 80 30 80 34 80 19 80 38 17 92 17 30 17 34 17 19 17 38 63 92 63 30 63 34 63 19 63
0.000 0.000 0.000 0.000 0.040 0.000 0.000 0.000 0.000 0.008 0.000 0.000 0.000 0.000 0.056
R-I 34 80 62 80 25 80 26 80 8 80 34 17 62 17 25 17 26 17 8 17 34 63 62 63 25 63 26 63 8 63
0.000 0.000 0.000 0.000 0.201 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.587
Slope 38 80 87 80 30 80 34 80 8 80 38 17 87 17 30 17 34 17 8 17 38 63 87 63 30 63 34 63 8 63
0.000 0.000 0.000 0.000 0.088 0.000 0.000 0.000 0.000 0.002 0.000 0.000 0.000 0.000 0.211
Figure captions
Fig. 1 - Reflectance spectra of 5 Anchises family members (L5 swarm).
The photometric color indices are also converted to relative reflectance and
overplotted on each spectrum. Spectra and photometry are shifted by 0.5 in
reflectance for clarity.
Fig. 2 - Reflectance spectra of 6 Misenus family members (L5 swarm).
The photometric color indices are also converted to relative reflectance and
overplotted on each spectrum. Spectra and photometry are shifted by 0.5 in
reflectance for clarity.
Fig. 3 - Reflectance spectra of 5 Panthoos family members (L5 swarm).
The photometric color indices are also converted to relative reflectance and
overplotted on each spectrum. Spectra and photometry are shifted by 0.5 in
reflectance for clarity. For asteroid 30698, the B-V color is missing as a B
filter measurement was not available.
Fig. 4 - Reflectance spectra of 2 Cloantus, 3 Phereclos and 2 Sarpedon
family members (L5 swarm). The photometric color indices are also con-
verted to relative reflectance and overplotted on each spectrum. Spectra and
photometry are shifted by 1.0 in reflectance for clarity.
Fig. 5 - Reflectance spectra of the 17 Eurybates family members (L4
swarm). Spectra are shifted by 0.5 in reflectance for clarity.
Fig. 6 - Reflectance spectra of the 6 1986 WD family members and 12921,
which is a member of the 1986 TS6 family (all belonging to the L4 swarm).
Spectra are shifted by 1.0 in reflectance for clarity.
Fig. 7 - Plot of the spectral slope versus the estimated diameter for the
families observed in the L5 swarm.
Fig. 8 - Plot of the spectral slope versus the estimated diameter for the
families observed in the L4 swarm.
Fig. 9 - Plot of the observed spectral slopes versus the estimated diameter
for the whole population of Jupiter Trojans investigated by us and available
from the literature. The errors on slopes and diameters are not plotted to
avoid confusion.
Fig. 10 - Histogram of L5 Trojans taxonomic classes.
Fig. 11 - Histogram of L4 Trojans taxonomic classes (Neg indicates ob-
jects with negative spectral slope).
Fig. 12 - Color distributions as functions of the absolute magnitude
M(1, 1), the inclination i [degrees], the orbital semi-major axis a [AU], the
perihelion distance q [AU], the eccentricity e, and the orbital energy E (see
text for definition). We include all the available colors for distant minor bod-
ies (TNOs, Centaurs, and cometary nuclei, see Hainaut & Delsanti 2002).
The Plutinos (resonant TNOs) are red filled triangles, Cubiwanos (classical
TNOs) are pink filled circles, Centaurs are green open triangles, Scattered
TNOs are blue open circles, and Trojans are cyan filled triangles.
Fig. 13 - V −R versus R−I color-color diagram for the observed Trojans
and all distant minor bodies available in the updated Hainaut & Delsanti
(2002) database. The solid symbols are for the Trojans (square for Eurby-
bates, triangles for others). The open symbols are used as following: tri-
angles for Plutinos, circles for Cubiwanos, squares for Centaurs, pentagons
for Scattered, and starry square for Comets. The continuous line represents
the ”reddening line”, that is the locus of objects with a linear reflectivity
spectrum. The star symbol represents the Sun.
Fig. 14 - Cumulative function and histograms of the B − V and V − R
color distributions and of the spectral slope for all the considered classes of
objects. The dotted line marks the solar colors.
4000 5000 6000 7000 8000 9000
Figure 1:
4000 5000 6000 7000 8000 9000
Figure 2:
4000 5000 6000 7000 8000 9000
Figure 3:
Figure 4:
Figure 5:
Figure 6:
Figure 7:
Figure 8:
Figure 9:
Figure 10:
Figure 11:
M(1,1) E
a [AU] q [AU]
Figure 12:
Figure 13:
Figure 14:
	Introduction
	Observations and data reduction
	Results
	Dynamical families: L5 swarm
	Anchises
	Misenus
	Panthoos
	Cloantus
	Phereclos
	Sarpedon
	Dynamical families: L4 swarm
	Eurybates
	1986 WD
	1986 TS6
	Discussion
	Size vs spectral slope distribution:Individual families
	Size vs slope distribution:  The Trojan population as a whole
	Spectral slopes and L4/L5 Clouds
	Orbital Elements
	Comparison with other outer Solar System minor bodied
	Introduction and methods
	Results
	Conclusions
ABSTRACT
  We present the results of a visible spectroscopic and photometric survey of
Jupiter Trojans belonging to different dynamical families carried out at the
ESO-NTT telescope. We obtained data on 47 objects, 23 belonging to the L5 swarm
and 24 to the L4 one. These data together with those already published by
Fornasier et al. (2004a) and Dotto et al. (2006), constitute a total sample of
visible spectra for 80 objects. The survey allows us to investigate six
families (Aneas, Anchises, Misenus, Phereclos, Sarpedon, Panthoos) in the L5
cloud and four L4 families (Eurybates, Menelaus, 1986 WD and 1986 TS6). The
sample that we measured is dominated by D--type asteroids, with the exception
of the Eurybates family in the L4 swarm, where there is a dominance of C- and
P-type asteroids. All the spectra that we obtained are featureless with the
exception of some Eurybates members, where a drop--off of the reflectance is
detected shortward of 5200 A. Similar features are seen in main belt C-type
asteroids and commonly attributed to the intervalence charge transfer
transition in oxidized iron. Our sample comprises fainter and smaller Trojans
as compared to the literature's data and allows us to investigate the
properties of objects with estimated diameter smaller than 40--50 km. The
analysis of the spectral slopes and colors versus the estimated diameters shows
that the blue and red objects have indistinguishable size distribution. We
perform a statistical investigation of the Trojans's spectra property
distributions as a function of their orbital and physical parameters, and in
comparison with other classes of minor bodies in the outer Solar System.
Trojans at lower inclination appear significantly bluer than those at higher
inclination, but this effect is strongly driven by the Eurybates family.

<|endoftext|><|startoftext|>
Introduction
Following early hypotheses (Phillips & Mutel, 1982; Carvalho,
1985) suggesting that the gigahertz-peaked spectrum (GPS)
and compact steep spectrum (CSS) could be young objects,
Readhead et al. (1996) proposed an evolutionary scheme uni-
fying three classes of radio-loud AGNs (RLAGNs): symmet-
ric GPS objects – CSOs (compact symmetric objects); sym-
metric CSS objects – MSOs (medium-sized symmetric objects)
and large symmetric objects (LSOs). In this scheme GPS/CSO
sources with linear sizes less than 1 kpc1 would evolve into
CSS/MSOs with subgalactic sizes (<20 kpc) and these in turn
would eventually become LSOs during their lifetimes. Two
pieces of evidence definitely point towards GPS/CSS sources
being young objects: lobe proper motions (up to 0.3c) giving
kinematic ages as low as ∼103 years for CSOs (Owsianik et al.,
1998; Giroletti et al., 2003; Polatidis & Conway, 2003) and ra-
diative ages typically ∼105 years for MSOs (Murgia et al.,
1999). Although these AGNs are small-scale objects, in some
cases CSO/GPS sources are associated with much larger ra-
dio structures that extend out to many kiloparsecs. In these
cases, it has been suggested that the CSO/GPS stage rep-
resents a period of renewed activity in the life cycle of
the AGN (Stanghellini et al., 2005, and references therein).
Reynolds & Begelman (1997) have also proposed a model in
which extragalactic radio sources are intermittent on timescales
Send offprint requests to: M. Kunert-Bajraszewska
e-mail: magda@astro.uni.torun.pl
1 For consistency with earlier papers in this field, the following
cosmological parameters have been adopted throughout this paper:
H0=100 km s
−1 Mpc−1 and q0=0.5. Throughout this paper, the spectral
index is defined such that S ∝ να.
of ∼104–105 years. Following the above scenarios and also an
earlier suggestion by Readhead et al. (1994) and O’Dea & Baum
(1997) that there exists a large population of compact, short-
lived objects, Marecki et al. (2003, 2006) concluded that the evo-
lutionary track proposed by Readhead et al. (1996) is only one
of many possible tracks. A lack of stable fuelling from the black
hole can inhibit the growth of a radio source, and consequently
it will never reach the LSO stage, at least in a given phase of its
activity.
Observational support for the above ideas has been pro-
vided by Gugliucci et al. (2005). They calculated the kinematic
ages for a sample of CSOs with well-identified hotspots. It ap-
pears that the kinematic age distribution drops sharply above
∼500 years, suggesting that in many CSOs activity may cease
early. It is, therefore, possible that only some of them evolve
any further. Our observations have shown that young, fading
compact sources do indeed exist (Kunert-Bajraszewska et al.,
2005; Marecki et al., 2006; Kunert-Bajraszewska et al., 2006,
hereafter Papers II, III, and IV, respectively). A double source,
0809+404, described in Paper IV is our best example of a very
compact – i.e. very young – fader. The VLBA multifrequency
observations have shown it to have a diffuse, amorphous struc-
ture, devoid of a dominant core and hotspots. Giroletti et al.
(2005) have analysed the properties of a sample of small-size
sources and found a very good example of a kiloparsec-scale
fader (1855+37). It is to be noted that re-ignition of activity in
compact radio sources is not ruled out. In this paper – the fifth
and the last of the series – VLBA observations of 10 CSS and
CSO sources that are potential candidates for compact faders
or objects with intermittent activity are presented. One of these
sources, 1045+352, is of particular interest not only because it
http://arxiv.org/abs/0704.0351v2
2 M. Kunert-Bajraszewska and A. Marecki: FIRST-based survey of compact steep spectrum sources
has a puzzling radio structure, but it also appears to be a broad
absorption line (BAL) quasar.
As their name somewhat suggests, BAL quasars have very
broad, blue-shifted absorption lines arising from high-ionization
transitions such as C IV, Si IV, N V, etc. (e.g C IV 1549Å). They
constitute ∼10% of the optically selected radio-quiet quasars
with the absorption arising from gas outflow at velocities up to
∼0.2 c (Hewett & Foltz, 2003). In fact, BAL quasars have been
divided into two categories, as 10% of them also show absorp-
tion troughs in low-ionization lines such as Mg II 2800Å. This
group has been designated as LoBAL quasars and the others as
HiBAL ones. The high ionization level and continuous absorp-
tion over a wide velocity range is hard to reconcile with absorp-
tion by individual clouds. Rather, they indicate that BAL regions
exist in both BAL and non-BAL quasars and evidence, accumu-
lated from optically selected BAL quasars, indicates an orienta-
tion hypothesis to explain their nature. It would appear that BAL
quasars are normal quasars seen along a particular line of sight,
e.g. a line of sight skimming the edge of the accretion disk or
torus (Weymann et al., 1991; Elvis, 2000). Murray et al. (1995)
have proposed a model in which the line of sight to a BAL quasar
intersects an outflow or wind that is not entirely radial, e.g. an
outflow that initially emerges perpendicular to the accretion disk
and is then accelerated radially.
For quite a long time it was believed that BAL quasars were
never radio-loud. This view was challenged by Becker et al.
(1997), who discovered the first radio-loud BAL quasar when
using the VLA FIRST survey to select quasar candidates.
Five radio-loud BAL quasars were then identified in NVSS
by Brotherton et al. (1998). Since then, the number of radio-
loud BAL QSOs has increased considerably (Becker et al., 2000;
Menou et al., 2001), following identification of new quasar can-
didates selected from the FIRST survey. Most of the BAL
quasars in the Becker et al. (2000) sample tended to be com-
pact at radio frequencies with either a flat or steep radio spec-
trum. Those with steep spectra could be related to GPS and CSS
sources. A variety of their spectral indices also suggested a wide
range of orientations, contrary to the interpretation favoured
from optically selected quasars. Moreover, Becker et al. (2000)
indicated that the frequency of BAL quasars in their sample was
significantly greater (factor ∼2) than inferred from optically se-
lected samples and that the frequency of BAL quasars appeared
to show a complex dependence on radio loudness.
The radio morphology of BAL quasars is important because
it can indicate inclination in BALs, and therefore yields a di-
rect test of the orientation model. However, information about
the radio structure of BAL quasars is still very limited. Prior
to 2006, only three BAL quasars, FIRST J101614.3+520916
(Gregg et al., 2000), PKS 1004+13 (Wills et al., 1999), and
LBQS 1138−0126 (Brotherton et al., 2002) were known to have
a double-lobed FR II radio morphology on kiloparsec scales,
although this interpretation was doubtful for PKS 1004+13
(Gopal-Krishna & Wiita, 2000). Recently, the population of
FR II-BAL quasars has increased to ten objects (excluding PKS
1004+13) following the discoveries of Gregg et al. (2006) and
Zhou et al. (2006), although some of these still require confir-
mation. Their symmetric structures indicate an “edge-on” ori-
entation, which in turn supports an alternative hypothesis de-
scribed as “unification by time”, with BAL quasars charac-
terised as young or recently refuelled quasars (Becker et al.,
2000; Gregg et al., 2000). There has been only one attempt (at
1.6 GHz with the EVN) to image radio structures of the smallest
(and possibly the youngest) BAL quasars (Jiang & Wang, 2003)
from the Becker et al. (2000) sample. This paper presents high
frequency VLBA images of another very compact BAL quasar
— 1045+352, which makes it the BAL quasar with the best
known radio structure to date.
2. The observations and data reduction
The five papers of this series are concerned with a sample
of 60 candidates selected from the VLA FIRST catalogue
(White et al., 1997)2 which could be weak CSS sources. The
sample selection criteria have been given in Kunert et al. (2002)
(hereafter Paper I). All the sources were initially observed with
MERLIN at 5 GHz and the results of these observations led to
the selection of several groups of objects for further study with
MERLIN and the VLA (Paper II), as well as the VLBA and
the EVN (Papers III and IV). The last of those groups contains
10 sources that, because of their structures (very faint “haloes”
or possible core-jet structures), were not included in the other
groups as they were less likely to be candidates for faders.
However, to complete the investigation of the primary sample,
1.7, 5, and 8.4-GHz VLBA observations of 10 sources listed in
Table 1 together with their basic properties, were carried out on
13 November 2004 in a snapshot mode with phase-referencing.3
Each target source scan was interleaved with a scan on a phase
reference source and the total cycle time (target and phase-
reference) was ∼9 minutes including telescope drive times, with
∼7 minutes actually on the target source per cycle. The cycles
for a given target-calibrator pair were grouped and rotated round
the three frequencies, although the source 1059+351 was only
observed at 1.7 GHz with the VLBA because of its very low flux
density as measured at 5 GHz by MERLIN (13 mJy).
The whole data reduction process was carried out using
standard AIPS procedures but, in addition to this, corrections
for Earth orientation parameter (EOP) errors introduced by the
VLBA correlator also had to be made. For each target source
and at each frequency, the corresponding phase-reference source
was mapped, and the phase errors so determined were applied to
the target sources, which were then mapped using a few cycles
of phase self-calibration and imaging. For some of the sources
a final amplitude self-calibration was also applied. IMAGR was
used to produce the final “naturally weighted”, total intensity im-
ages shown in Figs. 1 to 10. Three of the ten sources (1056+316,
1302+356, 1627+289) were not detected in the 8.4-GHz VLBA
observations, and 1425+287 has not been detected in any VLBA
observations. Flux densities of the principal components of the
sources were measured using the AIPS task JMFIT and are listed
in Table 3.
In addition to the observations described above, unpub-
lished 8.4-GHz VLA observations of five sources – 1056+316,
1126+293, 1425+287, 1627+289, 1302+356 – made in A-conf.
by Glen Langston (first four objects) and Patnaik et al. (1992)
have been included (Figs. 3, 5, 9, 10, and 7, respectively).
It was realised that because of poor u-v coverage at the higher
frequencies, some flux density could be missing and the resul-
tant spectral index maps were not considered to be reliable. Any
calculation of spectral indices from the flux densities quoted in
Table 3 should also be treated only as coarse approximations.
For 1045+352, 30-GHz continuum observations using the
Toruń 32-m radio telescope and a prototype (two-element
2 Official website: http://sundog.stsci.edu
3 Including this paper, the results of the observations of 46 sources
out of 60 candidates from the primary sample have been published. The
observations of 14 objects failed for different reasons.
http://sundog.stsci.edu
M. Kunert-Bajraszewska and A. Marecki: FIRST-based survey of compact steep spectrum sources 3
Table 1. Basic parameters of target sources
Source RA Dec ID mR z S 1.4 GHz logP1.4GHz S 4.85 GHz α
4.85GHz
1.4GHz LAS LLS
Name h m s ◦ ′ ′′ mJy W Hz−1 mJy ′′ h−1 kpc
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12)
1045+352 10 48 34.247 34 57 24.99 Q 20.86 1.604 1051 27.65 439 −0.70 ∼0.50 2.1
1049+384 10 52 11.797 38 11 43.83 G 20.76 1.018 712 27.04 205 −1.00 0.14 0.6
1056+316 10 59 43.236 31 24 20.59 G 21.10 0.307∗ 459 25.72 209 −0.63 0.50 1.4
1059+351 11 02 08.686 34 55 10.74 G 19.50 0.594∗ 702 26.52 252 −0.82 3.03 11.5
1126+293 11 29 21.738 29 05 06.40 EF — — 729 — 213 −0.99 0.79 —
1132+374 11 35 05.927 37 08 40.80 G — 2.880 638 28.00 218 −0.86 ∼0.30 1.1
1302+356 13 04 34.477 35 23 33.93 EF — — 483 — 185 −0.77 ∼0.20 —
1407+369 14 09 09.528 36 42 08.06 q 21.51 0.996∗ 538 26.89 216 −0.73 ∼0.25 1.1
1425+287 14 27 40.281 28 33 25.78 EF — — 859 — 198 −1.18 0.75 —
1627+289 16 29 12.290 28 51 34.25 EF — — 526 — 162 −0.95 ∼0.65 —
Description of the columns: (1) source name in the IAU format; (2) source right ascension (J2000) extracted from FIRST; (3) source declination
(J2000) extracted from FIRST; (4) optical identification: G - galaxy, Q - quasar, EF - empty field, q - star-like object, i.e. unconfirmed QSO; (5)
red magnitude extracted from SDSS/DR5; (6) redshift; (7) total flux density at 1.4 GHz extracted from FIRST; (8) log of the radio luminosity
at 1.4 GHz; (9) total flux density at 4.85 GHz extracted from GB6; (10) spectral index between 1.4 and 4.85 GHz calculated using flux densities
in columns (7) and (9); (11) largest angular size (LAS) measured in the 5-GHz MERLIN image – in most cases, as a separation between the
outermost component peaks, otherwise “∼” means measured in the image contour plot; (12) largest linear size (LLS).
∗ photometric redshift extracted from SDSS/DR5
receiver) of the One-Centimeter Receiver Array (OCRA-p,
Lowe et al., 2005) have also been made. The recorded output
from the receiver was the difference between the signals from
two closely-spaced horns effectively separated in azimuth so
that atmospheric variations were mostly cancelled out. The ob-
serving technique was such that the respective two beams were
pointed at the source alternately with a switching cycle of ∼50
seconds for a period of ∼6 minutes, thus measuring the source
flux density relative to the sky background on either side of the
source. The telescope pointing was determined from azimuth
and elevation scans across the point source Mrk 421. The pri-
mary flux density calibrator that was used was the planetary neb-
ula NGC 7027, which has an effective radio angular size of ∼8
arcseconds (Bryce et al., 1997) and for which a correction of the
flux density scale had to be made. However, as NGC 7027 was at
some distance from the target source, the point source 1144+402
was used as a secondary flux density calibrator. Corrections for
the effects of the atmosphere were determined from system tem-
perature measurements at zenith distances of 0◦ and 60◦.
3. Comments on individual sources
1045+352. The MERLIN and VLBA maps (Fig. 1) show this
source to be extended in both the NE/SW and NW/SE directions.
The central compact feature visible in all the maps is probably a
radio core with a steep spectrum. The VLBA image at 1.7 GHz
shows two symmetric protrusions – possibly jets – straddling the
core in a NE/SW direction, the SW emission being weaker than
in the NE. This structure is aligned with the NE/SW emission
visible in the 5-GHz MERLIN image, but the more extended dif-
fuse emission has been resolved out in the VLBA images. The
5-GHz VLBA image shows a core and a one-sided jet pointing
to the East. Some compact features in a NE direction are also
visible. The radio structure in the 8.4-GHz VLBA image is sim-
ilar to that at 5 GHz: an extended radio core and a jet pointing in
an easterly direction.
The observed radio morphology of 1045+352 could indicate
a restart of activity with the NE/SW radio emission being the
first phase of activity, now fading away, and the extension in the
NW/SE direction being a signature of the current active phase.
However, the above is only one of a number of possible interpre-
tations of the structure of 1045+352 – see further discussion in
Sect. 4.
According to Sloan Digital Sky Survey/Data Release 5
(SDSS/DR5), 1045+352 is a galaxy at RA= 10h48m34.s242,
Dec=+34◦57′24.′′95, which is marked with a cross in the
MERLIN map but the spectral observations carried out by
Willott et al. (2002) have shown 1045+352 to be a quasar with
a redshift of z = 1.604. It has been also classified as a HiBAL
object based upon the observed very broad C IV absorption, and
it is a very luminous submillimetre object with detections at both
850µm and 450µm (Willott et al., 2002).
The total flux of 1045+352 at 30 GHz measured by us using
OCRA-p is S 30GHz=69 mJy±7 mJy, which gives a steep spectral
index α = −1.01 between 4.85 GHz and 30 GHz.
1049+384. The 5-GHz MERLIN image (Fig. 2) shows it as
a triple core-jet structure with the brightest component re-
solved into a double structure extended in a NW/SE direc-
tion in the high resolution VLBA observations. The 1.7-GHz
VLBA image shows four radio components (in agreement with
Dallacasa et al., 2002), whereas the 5-GHz and 8.4-GHz VLBA
maps show only three components. However, the 5-GHz VLBA
image published by Orienti et al. (2004) shows all four compo-
nents, and they suggest that the two western components and the
two eastern ones are two independent radio sources. As pointed
by Orienti et al. (2004), it is difficult to classify the object, al-
though the idea that 1049+384 consists of two separate com-
pact, double sources is not very plausible because of the very
small separation, ∼ 0.09′′ (0.4 kpc), between these two poten-
tial objects. Although the spectral index calculations are very
uncertain, it is suggested that one of the eastern components
at RA= 10h52m11.s797, Dec=+38◦11′44.′′027 is a radio core (in
agreement with Orienti et al., 2004) from which jets emerge al-
ternately in opposite directions.
1049+384 is a galaxy with a redshift z = 1.018
(Riley & Warner, 1994), but according to Allington-Smith et al.
(1988) the optical spectrum of 1049+384 shows interme-
diate properties between a galaxy and a quasar. The opti-
4 M. Kunert-Bajraszewska and A. Marecki: FIRST-based survey of compact steep spectrum sources
1045+352         4994.000 MHz
peak flux density=230.21 mJy/beam, beam size=56 x 41 mas
first contour level=0.12 mJy/beam
RIGHT ASCENSION (J2000)
10 48 34.30 34.28 34.26 34.24 34.22 34.20
34 57 25.8
1045+352         1667.474 MHz
peak flux density=118.71 mJy/beam, beam size=13.1 x 8.2 mas
first contour level=0.80 mJy/beam
RIGHT ASCENSION (J2000)
10 48 34.256 34.254 34.252 34.250 34.248 34.246 34.244 34.242 34.240
34 57 25.14
25.12
25.10
25.08
25.06
25.04
25.02
25.00
24.98
24.96
24.94
1045+352        4987.474 MHz
peak flux density=13.64 mJy/beam, beam size=4.7 x 2.4 mas
first contour level=0.14 mJy/beam
RIGHT ASCENSION (J2000)
10 48 34.254 34.252 34.250 34.248 34.246 34.244
34 57 25.12
25.10
25.08
25.06
25.04
25.02
25.00
24.98
1045+352        8421.474 MHz
peak flux density=4.03 mJy/beam, beam size=2.7 x 1.5 mas
first contour level=0.14 mJy/beam
RIGHT ASCENSION (J2000)
10 48 34.251 34.250 34.249 34.248 34.247 34.246 34.245
34 57 25.08
25.07
25.06
25.05
25.04
25.03
25.02
25.01
Fig. 1. The MERLIN 5-GHz (upper left) and VLBA 1.7, 5, and 8.4-GHz maps of 1045+352. Contours increase by a factor 2, and
the first contour level corresponds to ≈ 3σ. A cross indicates the position of an optical object found using the SDSS/DR5.
cal object was included in SDSS/DR5 (RA= 10h52m11.s802,
Dec=+38◦11′44.′′00) and is marked in all maps with a cross.
1056+316. The 8.4-GHz VLA image (Fig. 3) shows this source
to have a double structure that, in the 5-GHz MERLIN image,
has been resolved into a radio core and probably a hotspot in
a NW radio lobe. Both components are visible in the 1.7-GHz
VLBA image, but neither has been detected in the higher fre-
quency VLBA images. The two weak features on either side of
the NW component in the 1.7-GHz VLBA image may be the
remains of extended emission that has been resolved out.
The optical counterpart of 1056+316 was included
in SDSS/DR5 (RA= 10h59m43.s145, Dec=+31◦24′23.′′31), to-
gether with a photometric redshift (Table 1). Its position is
marked with a cross in 8.4-GHz VLA map.
1059+351. The 5-GHz MERLIN map (Fig. 4) shows a bright
component that is probably a radio core, on almost opposite
sides of which is emission from compact features (hotspots)
within the two radio lobes. This structure agrees with the 1.4-
GHz VLA observations presented by Gregorini et al. (1988) and
Machalski & Condon (1983). Their images clearly show an S-
shaped morphology of 1059+351 with two very diffuse compo-
nents, the brighter one resolved into a double structure in 5-GHz
VLA observations (Machalski, 1998). One of these two com-
ponents is the NW hotspot visible in the 5-GHz MERLIN map,
and the second is probably a radio core visible in both the 5-GHz
MERLIN and 1.7-GHz VLBA images.
The optical counterpart of 1059+351 was included
in SDSS/DR5 (RA= 11h02m08.s727, Dec=+34◦55′08.′′79), to-
gether with a photometric redshift (Table 1). The position of the
optical object is marked with a cross in all maps and is well
correlated with the position of the radio core. Machalski (1998)
also measured a photometric redshift for 1059+351, which is
z = 0.37 and which differs from that in SDSS/DR5.
1126+293. The VLA 8.4-GHz and MERLIN 5-GHz maps
(Fig. 5) show three radio components, the brighter one proba-
bly being the core that was resolved into a core-jet structure in
M. Kunert-Bajraszewska and A. Marecki: FIRST-based survey of compact steep spectrum sources 5
1049+384        4994.000 MHz
peak flux density=116.27 mJy/beam, beam size=62 x 38 mas
first contour level=0.40 mJy/beam
RIGHT ASCENSION (J2000)
10 52 11.84 11.83 11.82 11.81 11.80 11.79 11.78 11.77 11.76
38 11 44.6
1049+384        1667.474 MHz
peak flux density=181.64 mJy/beam, beam size=11.6 x 8.2 mas
first contour level=0.09 mJy/beam
RIGHT ASCENSION (J2000)
10 52 11.810 11.805 11.800 11.795 11.790 11.785
38 11 44.14
44.12
44.10
44.08
44.06
44.04
44.02
44.00
43.98
43.96
43.94
1049+384        4987.474 MHz
peak flux density=21.07 mJy/beam, beam size=4.2 x 2.3 mas
first contour level=0.09 mJy/beam
RIGHT ASCENSION (J2000)
10 52 11.805 11.800 11.795 11.790
38 11 44.08
44.07
44.06
44.05
44.04
44.03
44.02
44.01
44.00
43.99
43.98
1049+384        8421.474 MHz
peak flux density=68.39 mJy/beam, beam size=2.6 x 1.2 mas
first contour level=0.15 mJy/beam
RIGHT ASCENSION (J2000)
10 52 11.800 11.798 11.796 11.794 11.792 11.790 11.788 11.786
38 11 44.06
44.05
44.04
44.03
44.02
44.01
44.00
43.99
Fig. 2. The MERLIN 5-GHz (upper left) map and VLBA 1.7, 5, and 8.4-GHz maps of 1049+384. Contours increase by a factor 2,
and the first contour level corresponds to ≈ 3σ. Crosses indicate the position of an optical object found using the SDSS/DR5 .
the 1.7-GHz VLBA image. The source was not detected in the 5
and 8.4-GHz VLBA observations.
1132+374. The 5-GHz MERLIN image shows (Fig. 6) a core-jet
structure that was resolved into a triple CSO object in the 1.7-
GHz VLBA image. The 5 and 8.4-GHz VLBA images show only
two components: a hotspot in the NE lobe and a radio core. This
source is identified with a very high redshift (z = 2.88) galaxy
(Eales & Rawlings, 1996).
1302+356. This source was observed with the VLA at 8.4 GHz
as a part of the JVAS survey (Patnaik et al., 1992). The result-
ing map shows a slightly extended EW object (Fig. 7). The 5-
GHz MERLIN image shows this to be a double source, and the
weak (∼10 mJy) eastern component could be part of a jet. The
bright component was resolved into a diffuse structure in the 1.7-
GHz VLBA image. The 5-GHz VLBA image shows only a sin-
gle component at the position of the maximum emission in the
1.7-GHz VLBA image, which is probably a radio core (Fig. 7).
There is no trace of this source in the 8.4-GHz VLBA image.
1407+369. The 5-GHz MERLIN image shows a core-jet struc-
ture in a NW direction that is resolved into a core and jet in
6 M. Kunert-Bajraszewska and A. Marecki: FIRST-based survey of compact steep spectrum sources
1056+316        8439.900 MHz
peak flux density=118.41 mJy/beam, beam size=270 x 257 mas
first contour level=0.06 mJy/beam
RIGHT ASCENSION (J2000)
10 59 43.45 43.35 43.25 43.15 43.05
31 24 23
1056+316        4994.000 MHz
peak flux density=80.83 mJy/beam, beam size=60 x 43 mas
first contour level=0.16 mJy/beam
RIGHT ASCENSION (J2000)
10 59 43.28 43.26 43.24 43.22 43.20
31 24 21.2
1056+316        1667.474 MHz
peak flux density=9.10 mJy/beam, beam size=13.9 x 5.5 mas
first contour level=0.30 mJy/beam
RIGHT ASCENSION (J2000)
10 59 43.265 43.255 43.245 43.235 43.225
31 24 20.6
Fig. 3. The VLA 8.4-GHz map, MERLIN 5-GHz map, and VLBA 1.7-GHz map of 1056+316. Contours increase by a factor 2, and
the first contour level corresponds to ≈ 3σ. A cross on the VLA map indicates the position of an optical object found using the
SDSS/DR5.
1059+351        4994.000 MHz
peak flux density=10.03 mJy/beam, beam size=89 x 69 mas
first contour level=0.15 mJy/beam
RIGHT ASCENSION (J2000)
11 02 08.85 08.80 08.75 08.70 08.65 08.60
34 55 10.0
1059+351        1667.474 MHz
peak flux density=8.07 mJy/beam, beam size=10.9 x 7.9 mas
first contour level=0.08 mJy/beam
RIGHT ASCENSION (J2000)
11 02 08.735 08.730 08.725 08.720
34 55 08.80
08.75
08.70
08.65
08.60
Fig. 4. The MERLIN 5-GHz map and VLBA 1.7-GHz map of 1059+351. Contours increase by a factor 2, and the first contour level
corresponds to ≈ 3σ. Crosses indicate the position of an optical object found using the SDSS/DR5.
all the VLBA maps (Fig. 8). The optical object was included
in SDSS/DR5 (RA= 14h09m09.s509, Dec=+36◦42′08.′′15) and is
marked with a cross in all maps. The redshift quoted in Table 1
is photometric.
1425+287. Both the VLA 8.4-GHz and MERLIN 5-GHz images
(Fig. 9) show a double structure for this source. The brighter
component seems to be a radio core, although this cannot be
confirmed because the source was not detected in the VLBA ob-
servations (Fig. 9).
1627+289. Both the VLA 8.4-GHz and MERLIN 5-GHz images
(Fig. 10) show this source to have a core-jet structure. The 1.7-
GHz VLBA image shows only the central extended feature that
was resolved into a core-jet structure in the 5-GHz VLBA image.
The source was not detected in the 8.4-GHz VLBA image.
4. Discussion
4.1. 1045+352 — a BAL quasar
1045+352 is a HiBAL quasar with a very reddened spectrum
showing a C IV broad absorption system (Willott et al., 2002).
Its projected linear size is only 2.1 kpc, which is consistent with
the observation of Becker et al. (2000) that, amongst radio loud
quasars, broad absorption lines are more commonly observed in
the smallest radio sources.
It is a very luminous submillimetre object, which together
with the template dust spectrum adopted by Willott et al. (2002),
indicates this source to be a hyperluminous infrared quasar, with
large amounts of dust in its host galaxy. Although 1045+352
is quite luminous at 151 MHz (2.88 Jy, Waldram et al., 1996),
which suggests the presence of some extended emission and
which, indeed, appears to be present in our MERLIN 5-GHz
M. Kunert-Bajraszewska and A. Marecki: FIRST-based survey of compact steep spectrum sources 7
1126+293        8439.900 MHz
peak flux density=65.97 mJy/beam, beam size=368 x 310 mas
first contour level=0.08 mJy/beam
RIGHT ASCENSION (J2000)
11 29 23.9 23.8 23.7 23.6 23.5 23.4
29 05 01
04 59
1126+293        4994.000 MHz
peak flux density=60.53 mJy/beam, beam size=62 x43 mas
first contour level=0.14 mJy/beam
RIGHT ASCENSION (J2000)
11 29 21.80 21.75 21.70 21.65
29 05 07.5
1126+293        1667.474 MHz
peak flux density=6.03 mJy/beam, beam size=14.0 x 4.2 mas
first contour level=0.15 mJy/beam
RIGHT ASCENSION (J2000)
11 29 21.762 21.760 21.758 21.756 21.754 21.752 21.750 21.748 21.746
29 05 06.50
06.48
06.46
06.44
06.42
06.40
06.38
06.36
06.34
06.32
06.30
Fig. 5. The VLA 8.4-GHz map, MERLIN 5-GHz map and VLBA 1.7-GHz map of 1126+293. Contours increase by a factor 2, and
the first contour level corresponds to ≈ 3σ.
image, the VLBA maps show the radio structure to be domi-
nated by jets and a core. The 30-GHz flux density of 1045+352
is also high, as would be expected from the VLBA structure.
Consequently, there could be synchrotron contamination of the
submillimetre flux. As shown by Blundell et al. (1999), either
the first-order or second-order polynomials can accurately pre-
dict the shape of the radio spectrum. Both models have been ap-
plied to the radio data of 1045+352 taken from the literature and
from this paper (Fig. 11), and show that a non-thermal compo-
nent could constitute at least ∼40% of the entire 850µm flux (the
parabolic fit). The linear fit agrees with calculations based upon
the 1.25 mm flux measured by Haas et al. (2006), who derived a
value of 94% for the non-thermal component part of the detected
850µm flux. It has to be noted here that the linear fit should be
treated as an upper limit for the synchrotron emission at submil-
limetre wavelengths, since the spectrum may steepen in the inter-
val between 30 GHz and the SCUBA wavebands. However, the
above can indicate values of infrared emission and dust mass of
1045+352 lower than estimated (Willott et al., 2002). This also
appears be consistent with the findings of Willott et al. (2003),
who have shown that there is no difference between the submil-
limetre luminosities of BAL and non-BAL quasars, which sug-
gest that a large dust mass is not required for quasars to show
BALs.
The radio luminosity at 1.4 GHz is high (Table 1), making
this source one of the most radio-luminous BAL quasars, with
a value similar to that of the first known radio-loud BAL QSO
with an FR II structure, FIRST J101614.3+520916 (Gregg et al.,
2000). Following Stocke et al. (1992), a radio-loudness param-
eter, R∗, defined as the K-corrected ratio of the 5-GHz radio
flux to 2500Å optical flux (Table 2) was calculated. For this,
a global radio spectral index, αradio = −0.8 and an optical spec-
tral index, αopt = −1.0, were assumed, and the SDSS g
′ mag-
nitude defined by Fukugita et al. (1996) was converted to the
Johnson-Morgan-Cousins B magnitude using the formula given
by Smith et al. (2002). Corrections were also made for intrin-
8 M. Kunert-Bajraszewska and A. Marecki: FIRST-based survey of compact steep spectrum sources
1132+374        4994.000 MHz
peak flux density=122.40 mJy/beam, beam size=58 x 44 mas
first contour level=0.18 mJy/beam
RIGHT ASCENSION (J2000)
11 35 05.98 05.96 05.94 05.92 05.90 05.88
37 08 41.6
1132+374        1667.474 MHz
peak flux density=38.64 mJy/beam, beam size=9.5 x 4.0 mas
first contour level=0.40 mJy/beam
RIGHT ASCENSION (J2000)
11 35 05.940 05.938 05.936 05.934 05.932 05.930 05.928 05.926
37 08 40.86
40.84
40.82
40.80
40.78
40.76
40.74
40.72
40.70
40.68
40.66
1132+374         4987.474 MHz
peak flux density=12.57 mJy/beam, beam size=3.1 x 1.2 mas
first contour level=0.14 mJy/beam
RIGHT ASCENSION (J2000)
11 35 05.936 05.934 05.932 05.930 05.928
37 08 40.82
40.80
40.78
40.76
40.74
40.72
40.70
1132+374        8421.474 MHz
peak flux density=8.74 mJy/beam, beam size=2.2 x 1.5 mas
first contour level=0.10 mJy/beam
RIGHT ASCENSION (J2000)
11 35 05.936 05.935 05.934 05.933 05.932 05.931 05.930
37 08 40.83
40.82
40.81
40.80
40.79
40.78
40.77
40.76
40.75
40.74
Fig. 6. The MERLIN 5-GHz (upper left) map and VLBA 1.7, 5, and 8.4-GHz maps of 1132+374. Contours increase by a factor 2,
and the first contour level corresponds to ≈ 3σ.
sic extinction (local to the quasar) calculated by Willott et al.
(2002), who assumed a Milky-Way extinction curve. Even af-
ter correction, log(R∗) > 1, which means that 1045+352 is still
radio-loud object. The angle between the jet axis and the line
of sight can be estimated using the core radio-to-optical lumi-
nosity ratio defined by Wills & Brotherton (1995) as log(RV ) =
log(Lcore) + 0.4MV − 13.69, where Lcore is a radio luminosity of
the core at 5-GHz rest frequency (the core flux density at 5 GHz
were taken from the VLBA image; see also Table 3), and MV
is the K-corrected absolute magnitude calculated using transfor-
mation equation V = g′−0.55(g′−r′)−0.03 (Smith et al., 2002).
From this, a value of ∼3.2 has been obtained for 1045+352, im-
plying an angle in the range θ ∼ 10◦ − 30◦ for the jet in the
observed asymmetric MERLIN 5-GHz radio morphology, and
can explain the high value of the radio-loudness parameter. An
assumption of θ = 20◦ yields the deprojected linear size of the
source of ∼ 6 kpc. As shown by White et al. (2006), BAL QSOs
are systematically brighter than non-BAL objects, which indi-
cates we are looking closer to the jet axis in quasars with BALs.
Based upon the small inclination angles of their BAL quasars,
Zhou et al. (2006) suggest that BAL features can be caused by
polar disk winds. Also, Saikia et al. (2001) and Jeyakumar et al.
(2005) found that the radio properties of CSS sources are con-
sistent with the unified scheme in which the axes of the quasars
are observed close to the line of sight. On the other hand, it
has been shown (Saikia et al., 2001; Jeyakumar et al., 2005) that
many CSS objects interact with an asymmetric medium in the
central regions of their host galaxies, and this can cause the ob-
served asymmetries. It is then likely that, also in the case of the
CSS quasar 1045+352, the environmental asymmetries might
play an important role. The jet power can be estimated from
the relationship between the radio luminosity and the jet power
given by Willott et al. (1999, Eq.(12)). However, because some
of the flux density of the 1045+352 can be beamed, the calcu-
lations have to be treated as an approximation. Assuming the
151-MHz flux density, which accounts for the extended emis-
M. Kunert-Bajraszewska and A. Marecki: FIRST-based survey of compact steep spectrum sources 9
1302+356        8452.400 MHz
peak flux density=109.96 mJy/beam, beam size=252 x 230 mas
first contour level=0.09 mJy/beam
RIGHT ASCENSION (J2000)
13 04 34.75 34.70 34.65 34.60 34.55 34.50 34.45 34.40 34.35 34.30
35 23 36
1302+356        4994.500 MHz
peak flux density=129.54 mJy/beam, beam size=62 x 39 mas
first contour level=0.18 mJy/beam
RIGHT ASCENSION (B1950)
13 02 13.86 13.84 13.82 13.80 13.78 13.76 13.74 13.72 13.70 13.68
35 39 38.5
1302+356        1667.474 MHz
peak flux density=19.62 mJy/beam, beam size=10.0 x 4.0 mas
first contour level=0.14 mJy/beam
RIGHT ASCENSION (J2000)
13 04 34.502 34.500 34.498 34.496 34.494 34.492 34.490 34.488 34.486
35 23 33.64
33.62
33.60
33.58
33.56
33.54
33.52
33.50
33.48
33.46
33.44
1302+356        4987.474 MHz
peak flux density=4.23 mJy/beam, beam size=3.8 x 1.5 mas
first contour level=0.07 mJy/beam
RIGHT ASCENSION (J2000)
13 04 34.498 34.497 34.496 34.495 34.494 34.493 34.492
35 23 33.57
33.56
33.55
33.54
33.53
33.52
33.51
33.50
33.49
Fig. 7. The VLA 8.4-GHz map (upper left), MERLIN 5-GHz map (upper right) and VLBA 1.7 and 5-GHz maps of 1302+356.
Contours increase by a factor 2, and the first contour level corresponds to ≈ 3σ.
sion and the radio emission from the jets, the jet kinetic power is
Q jet ∼ 10
44erg sec−1.
The projected linear size D of a radio quasar or radio
galaxy can be approximately related to the time, from the trig-
gering of activity, as the relationship between these variables
is only weakly dependent upon the radio luminosity. Using
the model of radio source evolution from Willott et al. (1999),
the age of 1045+352 was estimated to be ∼ 105 years (see
also Willott et al., 2002; Rawlings et al., 2004). For the calcu-
lations we assumed: θ = 20◦, β = 1.5, c1 = 2.3, n100 =
3000 e− m−3, a0 = 100 kpc (see Willott et al., 1999, for defini-
tions). Both the MERLIN and VLBA high frequency images
have revealed that two cycles of activity may have occurred dur-
ing these ∼ 105 years. The extended NE/SW emission is prob-
ably the remnant of the first phase of activity, which has been
very recently replaced by a new phase of activity pointing in a
NW/SE direction. It has been shown by Stanghellini et al. (2005)
that the extended emission observed for small-scale objects can
be the remnants of an earlier period of activity in these sources.
In the case of 1045+352, renewal of activity has been accompa-
nied by a reorientation of the jet axis.
Several processes can be used to explain a jet reorientation
in AGNs. There are strong observational and theoretical grounds
for believing that accretion disks around black holes may be
twisted or warped, and this can be caused by a number of pos-
sible physical processes. In particular, if there is a misalignment
between the axis of rotating black hole and the axis of its rotating
accretion disk, then the Lense-Thirring precession produces a
warp in the disk. This process is called the Bardeen-Peterson ef-
fect (Bardeen & Petterson, 1975). According to Pringle (1997),
disk warping can also be induced by internal instabilities in
the accretion disk caused by radiation pressure from the central
source.
A reorientation of the jet axis may also result from a merger
with another black hole. Merritt & Ekers (2002) have shown that
a rapid change in jet orientation can be caused by even a mi-
10 M. Kunert-Bajraszewska and A. Marecki: FIRST-based survey of compact steep spectrum sources
1407+369        4994.500 MHz
peak flux density=109.87 mJy/beam, beam size=52 x 46 mas
first contour level=0.12 mJy/beam
RIGHT ASCENSION (J2000)
14 09 09.56 09.54 09.52 09.50 09.48 09.46
36 42 08.8
1407+369        1667.474 MHz
peak flux density=147.37 mJy/beam, beam size=10.2 x 5.1 mas
first contour level=0.18 mJy/beam
RIGHT ASCENSION (J2000)
14 09 09.516 09.512 09.508 09.504 09.500
36 42 08.30
08.25
08.20
08.15
08.10
08.05
1407+369        4987.474 MHz
peak flux density=60.90 mJy/beam, beam size=3.6 x 2.0 mas
first contour level=0.16 mJy/beam
RIGHT ASCENSION (J2000)
14 09 09.512 09.510 09.508 09.506 09.504
36 42 08.22
08.20
08.18
08.16
08.14
08.12
08.10
1407+369        8421.474 MHz
peak flux density=24.34 mJy/beam, beam size=2.1 x 1.0 mas
first contour level=0.15 mJy/beam
RIGHT ASCENSION (J2000)
14 09 09.511 09.510 09.509 09.508 09.507 09.506
36 42 08.18
08.17
08.16
08.15
08.14
08.13
08.12
Fig. 8. The MERLIN 5-GHz map (upper left) and VLBA 1.7, 5, and 8.4-GHz maps of 1407+369. Contours increase by a factor 2,
and the first contour level corresponds to ≈ 3σ. Crosses indicate the position of an optical object found using the SDSS/DR5.
nor merger because of a spin-flip of the central active black
hole arising from the coalescence of inclined binary black holes.
According to Liu (2004), the Bardeen-Peterson effect can also
cause a realignment of a rotating SMBH and a misaligned ac-
cretion disk, where the timescale of such a realignment t < 105
years. If it is assumed that the typical speed of advance of ra-
dio lobes of young AGNs is υ ∼0.3c (Owsianik et al., 1998;
Giroletti et al., 2003; Polatidis & Conway, 2003), then distorted
jets of length, tυ <10 kpc for some CSS and GPS sources should
be observed, although the character of these disturbances is not
known. Liu (2004) shows that the interaction/realignment of
a binary and its accretion disk leads to the development of X-
shaped sources. 1045+352 is not a typical X-shaped source like
3C 223.1 or 3C 403 (Dennett-Thorpe et al., 2002; Capetti et al.,
2002). However, according to Cohen et al. (2005) the realign-
ment of a rotating SMBH followed by a repositioning of the ac-
cretion disk and jets is a plausible interpretation for misaligned
radio structures, even if they are not conspicuously X-shaped.
It is likely that in young sources such as 1045+352, the
gas has not yet settled into a regular disk following a merger
event and that separate clouds of gas and dust reaching the very
central regions of the source at different times disturb the sta-
bility of the accretion disk and affect the jet formation. Later,
these clouds could cause a renewal of activity. Numerical simu-
lations of colliding galaxies show that these usually merge com-
pletely after a few encounters in timescales up to ∼ 108 years
(Barnes & Hernquist, 1996). According to Schoenmakers et al.
(2000), multiple encounters between interacting galaxies can
cause interruptions of activity and lead to the many types of
sources that are observed in a restarted phase, such as double-
double radio galaxies. Nevertheless, it is unclear whether such
encounters can cause jet reorientation. On the other hand, the
dense medium of a host galaxy can frustrate the jets, and their
collisions with the dense surrounding medium can cause rapid
bends through large angles. In the case of 1045+352, the VLBA
images at the higher frequencies seem to show a jet emerging in
M. Kunert-Bajraszewska and A. Marecki: FIRST-based survey of compact steep spectrum sources 11
1425+287       8439.900 MHz
peak flux density=64.89 mJy/beam, beam size=305 x 267 mas
first contour level=0.08 mJy/beam
RIGHT ASCENSION (J2000)
14 27 38.7 38.6 38.5 38.4 38.3 38.2
28 33 17
1425+287        4994.500 MHz
peak flux density=62.37 mJy/beam, beam size=74 x 40 mas
first contour level=0.80 mJy/beam
RIGHT ASCENSION (J2000)
14 27 40.40 40.35 40.30 40.25 40.20
28 33 27.5
Fig. 9. The VLA 8.4-GHz map and MERLIN 5-GHz map of 1425+287. Contours increase by a factor 2, and the first contour level
corresponds to ≈ 3σ.
Frequency (Hz)
1045+352
Fig. 11. Spectral Energy Distribution (SED) of 1045+352 from
radio to submillimetre wavelengths. The errors are smaller than
the size of the symbols; 1.25 mm point (Haas et al., 2006) is
shown as a triangle, 850µm and 450 µm points (Willott et al.,
2002) are shown as filled circles, radio observations are shown
as asterisks. The solid curve is the parabolic fit f (x) = ax2+bx+c
to all radio data (yi), with a = −0.14, b = 1.91, c = −5.68, and
reduced χ2 = 12. The dashed curve is the linear fit f (x) = ax+ b
to radio data with ν > 1GHz, with a = −0.86, b = 7.91, and
reduced χ2 = 0.5.
a S/SE direction, but being bent through ∼ 60◦ to a NE direction
in the lower resolution 1.7-GHz image. The MERLIN lower res-
olution 5-GHz image might indicate that the jet has been bent
again and now emerges from the core in a NW direction.
It is difficult to find a convincing argument in favour of one
of the above-mentioned alternatives or to rule any of them out
based upon the extensive multifrequency data on 1045+352 pre-
sented here. However, if it is assumed that a merger is the most
probable cause of the ignition and restart of activity in radio
galaxies, this could mean that 1045+352 has undergone two
merger events in a very short period of time (∼ 105), which is un-
Table 2. 1045+352 properties
Parameter Value
u′ 22.12
g′ 21.38
r′ 20.81
i′ 20.14
z′ 20.08
AB 2.0
MB -22.05 (-24.05)
AV 1.5
MV -22.83 (-24.33)
log(R∗)(total) 4.9 (4.1)
log(R∗)(core) 3.8 (3.0)
Notes: Optical photometry from SDSS, corrected for Galactic extinc-
tion. AV taken from Willott et al. (2002). Quantities in parentheses are
corrected for intrinsic extinction.
likely. More probable is that the ignition of activity in 1045+352
has occurred during a merger event that is, as yet, incomplete and
that disturbed, misaligned radio jets result from the realignment
of a rotating SMBH or intermittent gas injection that interrupts
jet formation.
4.2. Other nine sources
Three sources from our sample (1126+293, 1407+369,
1627+289) show one- or two-sided core-jet structures, indicat-
ing that they are in an active phase of their evolution, although
the core-jet structure of 1126+293 is controversial. Our images
indicate that the western components are parts of the jet, which is
possibly precessing or being bent by interactions with the inter-
stellar medium. They could, however, also be hotspots of a radio
lobe. Unfortunately, our high frequency VLBA observations are
not sensitive enough to settle this problem. Three other sources
(1056+316, 1132+374, 1425+287) have visible radio cores and
parts of lobes or hotspots, indicating activity. 1132+374 is a CSO
object. In the case of one source, 1059+351, the VLBA obser-
12 M. Kunert-Bajraszewska and A. Marecki: FIRST-based survey of compact steep spectrum sources
1627+289        8439.900 MHz
peak flux density=75.42 mJy/beam, beam size=271 x 263 mas
first contour level=0.07 mJy/beam
RIGHT ASCENSION (J2000)
16 29 12.50 12.45 12.40 12.35 12.30 12.25 12.20 12.15 12.10 12.05
28 51 37
1627+289        4994.500 MHz
peak flux density=77.77 mJy/beam, beam size=70 x 39 mas
first contour level=0.15 mJy/beam
RIGHT ASCENSION (J2000)
16 29 12.36 12.34 12.32 12.30 12.28 12.26 12.24 12.22 12.20
28 51 35.5
1627+289        1667.474 MHz
peak flux density=40.35 mJy/beam, beam size=10.6 x 5.1 mas
first contour level=0.30 mJy/beam
RIGHT ASCENSION (J2000)
16 29 12.270 12.268 12.266 12.264 12.262 12.260 12.258
28 51 34.16
34.14
34.12
34.10
34.08
34.06
34.04
34.02
34.00
33.98
33.96
1627+289        4987.474 MHz
peak flux density=5.57 mJy/beam, beam size=4.2 x 1.9 mas
first contour level=0.15 mJy/beam
RIGHT ASCENSION (J2000)
16 29 12.267 12.266 12.265 12.264 12.263 12.262 12.261 12.260
28 51 34.11
34.10
34.09
34.08
34.07
34.06
34.05
34.04
34.03
34.02
Fig. 10. The VLA 8.4-GHz map (upper left), MERLIN 5-GHz map (upper right), and VLBA 1.7 and 5-GHz maps of 1627+289.
Contours increase by a factor 2, and the first contour level corresponds to ≈ 3σ.
vations show only a radio core, although the 5-GHz MERLIN
image of 1059+351 also shows remnants of the two radio
lobes of its “S” shaped structure visible at the VLA resolutions
(Machalski & Condon, 1983; Machalski, 1998). According to
Taylor et al. (1996) and Readhead et al. (1996), “S” symmetry
is observed in many compact sources and can be explained
by precession of the central engine. 1059+351 is the largest
source in our sample with a linear size of 45 kpc based upon
its largest angular size measured from 1.46-GHz VLA image
(Machalski & Condon, 1983).
The compact 1049+384 and 1302+356 steep spectrum
sources appeared to be low-frequency variables (LFV) at
151 MHz with very high (≥0.99) probabilities that their variabil-
ity is real (Minns & Riley, 2000). According to them, LFV ob-
jects are generally more compact than other CSS sources and
tend to exhibit steeper spectra than typical CSS sources. This
may be because of rapid spectral ageing, which might be ex-
pected for frustrated sources, or it might simply be because the
sources are at very high redshifts.
5. Conclusions
VLBA, VLA, and MERLIN images of ten compact steep
spectrum sources have been presented. One of these sources,
1045+352, is a very radio-luminous BAL quasar, whose com-
plex structure suggests restarted activity. This may have resulted
either from a merger event or from the infall of a cloud of gas,
that had cooled in the halo of the galaxy into the core region of
the source. The asymmetric radio jets of 1045+352 and the es-
timated angle suggest that some of the emission can be boosted,
although the intrinsic asymmetries cannot be ruled out. It has
also been confirmed that the 850µm flux of 1045+352 can be
severely contaminated by synchrotron emission, which may sug-
gest less than previously estimated values of infrared emission
and dust mass. Most of the radio-loud BAL quasars detected to
M. Kunert-Bajraszewska and A. Marecki: FIRST-based survey of compact steep spectrum sources 13
Table 3. Flux densities of sources principal components from the VLBA observations
Source RA DEC S1.7 GHz S5 GHz S8.4GHz θ1 θ2 PA
Name h m s ◦ ′ ′′ mJy mJy mJy mas mas ◦
(1) (2) (3) (4) (5) (6) (7) (8) (9)
1045+352 10 48 34.248 34 57 25.044 303.2 − − 15.0 11.0 60
10 48 34.249 34 57 25.061 − 3.5 − 2.0 1.0 76
10 48 34.248 34 57 25.041 − 21.8 7.1 7.0 1.0 101
10 48 34.248 34 57 25.043 − 32.7 12.3 4.0 3.0 95
1049+384 10 52 11.803 38 11 44.018 13.6 − − 3.0 1.0 14
10 52 11.797 38 11 44.027 11.4 3.9 6.9 2.0 2.0 121
10 52 11.789 38 11 44.031 182.1 33.6 12.9 8.0 1.0 119
10 52 11.787 38 11 44.048 218.5 23.9 2.3 5.0 3.0 177
1056+316 10 59 43.254 31 24 20.106 8.8 − − 0.9 0.1 7
10 59 43.235 31 24 20.538 43.6 − − 33.0 8.0 6
1059+351 11 02 08.726 34 55 08.709 8.1 − − 0.7 0.3 124
1126+293 11 29 21.755 29 05 06.402 7.3 − − 3.0 1.0 84
11 29 21.753 29 05 06.401 10.4 − − 13.0 4.0 53
1132+374 11 35 05.934 37 08 40.810 124.1 6.6 1.6 18.0 2.0 57
11 35 05.932 37 08 40.775 36.3 13.8 9.4 2.0 0.4 8
11 35 05.931 37 08 40.715 14.5 − − 5.0 0.8 105
1302+356 13 04 34.495 35 23 33.534 46.8 5.9 − 11.0 6.0 97
13 04 34.494 35 23 33.538 60.5 − − 15.0 7.0 147
1407+369 14 09 09.504 36 42 08.195 81.0 1.9 − 17.0 3.0 138
14 09 09.508 36 42 08.164 192.8 76.7 42.0 8.0 1.5 141
14 09 09.508 36 42 08.152 − 9.5 4.8 0.7 0.2 140
1627+289 16 29 12.264 28 51 34.062 111.5 8.1 − 10.0 6.0 58
Description of the columns: (1) source name in the IAU format; (2) component right ascension (J2000) as measured at 1.7 GHz; (3) component
declination (J2000) as measured at 1.7 GHz; (4) VLBA flux density in mJy at 1.7 GHz from the present paper; (5) VLBA flux density in mJy at
5 GHz from the present paper; (6) VLBA flux density in mJy at 8.4 GHz from the present paper; (7) deconvolved component major axis angular
size at 1.7 GHz obtained using JMFIT; (8) deconvolved component minor axis angular size at 1.7 GHz obtained using JMFIT; (9) deconvolved
major axis position angle at 1.7 GHz obtained using JMFIT. In the case the component is not visible in 1.7 GHz map the values for the last three
columns are taken from the 5-GHz image.
date have very compact radio structures similar to GPS and CSS
sources which are thought to be young. Therefore, the compact
structure and young age of 1045+352 fit well to the evolutionary
interpretation of radio-loud BAL QSOs.
According to the evolutionary model recently proposed by
Lipari & Terlevich (2006), BAL quasars are young systems with
composite outflows, and they are accompanied by absorption
clouds. The radio-loud systems may be associated with the later
stages of evolution, when jets have removed the clouds respon-
sible for the generation of BALs. The effect of orientation could
play a secondary role here. The above could explain the rarity of
extended radio structures showing BAL features (Gregg et al.,
2006).
Acknowledgements.
The VLBA is operated by the National Radio Astronomy Observatory (NRAO),
a facility of the National Science Foundation (NSF) operated under cooperative
agreement by Associated Universities, Inc. (AUI).
This research has made use of the NASA/IPAC Extragalactic Database (NED),
which is operated by the Jet Propulsion Laboratory, California Institute
of Technology, under contract with the National Aeronautics and Space
Administration.
Use has been made of the Sloan Digital Sky Survey (SDSS) Archive. The
SDSS is managed by the Astrophysical Research Consortium (ARC) for the
Participating Institutions: The University of Chicago, Fermilab, the Institute for
Advanced Study, the Japan Participation Group, The Johns Hopkins University,
Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy
(MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State
University, University of Pittsburgh, Princeton University, the United States
Naval Observatory, and the University of Washington.
We thank M. Gawroński for his help with the OCRA-p observations. The OCRA
project was supported by the Polish Ministry of Science and Higher Education
under grant 5 P03D 024 21 and the Royal Society Paul Instrument Fund.
We thank P.J. Wiita for a discussion and P. Thomasson for reading of the paper
and a number of suggestions.
This work was supported by the Polish Ministry of Science and Higher
Education under grant 1 P03D 008 30.
References
Allington-Smith, J., R., Spinrad, H., Djorgovski, S., & Liebert, J. 1988, MNRAS,
234, 1091
Bardeen, J. M., & Petterson, J. A. 1975, ApJ, 195, L65
Barnes, J. E., & Hernquist, L. 1996, ApJ, 471, 115
Becker, R. H., Gregg, M. D., Hook, I. M., et al. 1997, ApJ, 479, L93
Becker, R. H., White, R. L., Gregg, M. D., et al. 2000, ApJ, 538, 72
Blundell, K. M., Rawlings, S., & Willott, C. J. 1999, ApJ, 117, 677
Brotherton, M. S., van Breugel, W., Smith, R. J., et al. 1998, ApJ, 505, L7
Brotherton, M. S., Croom, S. M., De Breuck, C., Becker, R. H., & Gregg, M. D.
2002, AJ, 124, 2575
Bryce, M., Pedlar, A., Muxlow, T., Thomasson, P., & Mellema, G. 1997,
MNRAS, 284, 815
Capetti, A., Zamfir, S., Rossi, P., et al. 2002, A&A, 394, 39
Carvalho, J. C. 1985, MNRAS, 215, 463
Cohen, A. S., Clarke, T. E., Ferretti, L., & Kassim, N. E. 2005, ApJ, 620, L5
Dallacasa D., Tinti, S., Fanti, C., et al. 2002, A&A, 389, 115
Dennett-Thorpe J., Scheuer, P. A. G., Laing, R. A., et al. 2002, MNRAS, 330,
Eales, S., & Rawlings, S., 1996, ApJ, 460, 68
Elvis, M. 2000, ApJ, 545, 63
Fukugita, M., Ichikawa, T., Gunn, J. E., et al. 1996, AJ, 111, 1748
Giroletti, M., Giovannini, G., & Taylor, G. B. 2005, A&A, 441, 89
Giroletti, M., Giovannini, G., Taylor, G. B., et al. 2003, A&A, 399, 889
Gopal-Krishna, & Wiita, P. J. 2000, A&A, 363, 507
Gregg, M. D., Becker, R. H., Brotherton, M. S., et al. 2000, ApJ, 544, 142
Gregg, M. D., Becker, R. H., & de Vries, W. 2006, ApJ, 641, 210
Gregorini, L., Padrielli, L., Parma, P., & Gilmore, G. 1988, A&AS, 74, 107
Gugliucci, N. E., Taylor, G. B., Peck, A. B., & Giroletti, M. 2005, ApJ, 622, 136
14 M. Kunert-Bajraszewska and A. Marecki: FIRST-based survey of compact steep spectrum sources
Haas, M., Chini, R., Muller, S. A. H., Bertoldi, F., & Albrecht, M. 2006, A&A,
445, 115
Hewett, P. C., & Foltz, C. B. 2003, AJ, 125, 1784
Jeyakumar, S., Wiita, P. J., Saikia, D. J., & Hooda, J. S., 2005, A&A, 432, 823
Jiang, D. R., & Wang, T. G. 2003, A&A, 397, L13
Kunert, M., Marecki, A., Spencer, R. E., Kus, A. J., & Niezgoda J. 2002, A&A,
391, 47 (Paper I)
Kunert-Bajraszewska, M., Marecki, A., Thomasson, P., & Spencer, R. E. 2005,
A&A, 440, 93 (Paper II)
Kunert-Bajraszewska, M., Marecki, A., & Thomasson, P. 2006, A&A, 450,
945 (Paper IV)
Lipari, S. L., & Terlevich, R. J. 2006, MNRAS, 368, 1001
Liu, F. K., 2004, MNRAS, 347, 1357
Lowe, S. R., 2005, PhD thesis, University of Manchester
Machalski, J., & Condon, J. J. 1983, AJ, 88, 143
Machalski, J. 1998, A&AS, 128, 153
Marecki, A., Spencer, R. E., & Kunert, M. 2003, PASA, 20, 46
Marecki, A., Kunert-Bajraszewska, M., & Spencer, R. E. 2006, A&A, 449,
985 (Paper III)
Menou, K., Vanden Berk, D. E., & Ivezić, Ž. 2001, ApJ, 561, 645
Merritt, D., & Ekers, R. D. 2002, Science, 297, 1310
Minns, A. R., & Riley, J. M. 2000, MNRAS, 318, 827
Murgia, M., Fanti, C., Fanti, R., et al. 1999, A&A, 345, 769
Murray, N., Chiang, J., Grossman, S. A., & Voit, G. M. 1995, ApJ, 451, 498
O’Dea, C. P., & Baum, S. A. 1997, AJ, 113, 148
Orienti, M., Dallacasa, D., Fanti C., et al. 2004, A&A, 426, 463
Owsianik, I., Conway, J. E., & Polatidis, A. G. 1998, A&A, 336, L37
Patnaik, A. R., Browne, I. W. A., Wilkinson, P. N., & Wrobel, J. M. 1992,
MNRAS, 254, 655
Phillips, R. B., & Mutel, R. L. 1982, A&A, 106, 21
Polatidis, A. G., & Conway, J. E. 2003, PASA, 20, 69
Pringle, J. E. 1997, MNRAS, 292, 136
Rawlings, S., Willott, C. J., Hill, G. J., et al. 2004, MNRAS, 351, 676
Readhead, A. C. S., Xu, W., Pearson, T. J., Wilkinson, P. N., & Polatidis, A. G.
1994, in Compact Extragalactic Radio Sources, NRAO Workshop, ed. J. A.
Zenzus, K. Kellermann, 17
Readhead, A. C. S., Taylor, G. B., Xu, W., et al. 1996, ApJ, 460, 612
Reynolds, C. S., & Begelman, M. C. 1997, ApJ, 487, L135
Riley, J. M., & Warner, P., J. 1994, MNRAS, 269, 166
Saikia, D. J., Jeyakumar, S., Salter, C. J., et al. 2001, MNRAS, 321, 37
Schoenmakers, A. P., de Bruyn, A. G., Röttgering, H. J. A., van der Laan, &
Kaiser, C. R. 2000, MNRAS, 315, 371
Smith, J. A., Tucker, D. L., Kent, S., et al. 2002, AJ, 123, 2121
Stanghellini, C., O’Dea, C. P., Dallacasa, D., et al. 2005, A&A, 443, 891
Stocke, J. T., Morris, S. L., Weymann, J. T., & Foltz, C. B. 1992, ApJ, 396, 487
Taylor, G. B., Readhead, A. C. S., & Pearson, T. J. 1996, ApJ, 463, 95
Waldram, E. M., Yates, J. A., Riley, J. M., & Warner, P. J. 1996, MNRAS, 282,
Weymann, R. J., Morris, S. L., Foltz, C. B., & Hewett, P. C. 1991, ApJ, 373, 23
White, R. L., Becker, R. H., Helfand, D. J., & Gregg, M. D. 1997, ApJ, 475, 479
White, R. L., Helfand, D. J., Becker, R. H., Glikman, E., & de Vries, W. 2007,
ApJ, 654, 99
Willott, C. J., Rawlings, S., Blundell, K. M., & Lacy, M. 1999, MNRAS, 309,
Willott, C. J., Rawlings, S., Archibald, E. N., & Dunlop, J. S. 2002, MNRAS,
331, 435
Willott, C. J., Rawlings, S., & Grimes, J. A. 2003, ApJ, 598, 909
Wills, B. J., & Brotherton, M. S. 1995, ApJ, 448, L81
Wills, B. J., Brandt, W. N., & Laor, A. 1999, ApJ, 520, L91
Zhou, H., Wang, T., Wang, H., et al. 2006, ApJ, 639, 716
List of Objects
‘1045+352’ on page 3
‘1049+384’ on page 3
‘1056+316’ on page 3
‘1059+351’ on page 3
‘1126+293’ on page 4
‘1132+374’ on page 4
‘1302+356’ on page 4
‘1407+369’ on page 5
‘1425+287’ on page 5
‘1627+289’ on page 5
	Introduction
	The observations and data reduction
	Comments on individual sources
	Discussion
	1045+352 — a BAL quasar
	Other nine sources
	Conclusions
ABSTRACT
  Multifrequency VLBA observations of the final group of ten objects in a
sample of FIRST-based compact steep spectrum (CSS) sources are presented. The
sample was selected to investigate whether objects of this kind could be relics
of radio-loud AGNs switched off at very early stages of their evolution or
possibly to indicate intermittent activity. Initial observations were made
using MERLIN at 5 GHz. The sources have now been observed with the VLBA at 1.7,
5 and 8.4 GHz in a snapshot mode with phase-referencing. The resulting maps are
presented along with unpublished 8.4-GHz VLA images of five sources. Some of
the sources discussed here show a complex radio morphology and therefore a
complicated past that, in some cases, might indicate intermittent activity. One
of the sources studied - 1045+352 - is known as a powerful radio and
infrared-luminous broad absorption line (BAL) quasar. It is a young CSS object
whose asymmetric two-sided morphology on a scale of several hundred parsecs,
extending in two different directions, may suggest intermittent activity. The
young age and compact structure of 1045+352 is consistent with the evolution
scenario of BAL quasars. It has also been confirmed that the submillimetre flux
of 1045+352 can be seriously contaminated by synchrotron emission.

<|endoftext|><|startoftext|>
Introduction 
The present communication is devoted to the 
experimental investigation of relaxation phenomena in 
high-temperature superconductors of HoBa2Cu3O7-δ 
system. 
High-temperature superconductors are characterized 
by such high critical transition temperatures Тс in the 
superconducting state, they remain superconductors at 
temperatures when their thermal fluctuations energy 
becomes compared with the elastic energy, and also with 
the pinning energy [1]. It creates prerequisites for phase 
transitions.  Due to the layered crystal structure and 
anisotropy, which is a characteristic high-temperature 
superconductors, they reveal conditions for the 
appearance of different phases on B-T diagram.( B is 
magnetic induction, T-is temperature)[2-13]. As 
example, Abrikosov vortex lattice begin melting near the 
critical Тс temperature what is followed by the essential 
change of vortex continuum flow dynamics along with 
sharp change of character (dynamics) of relaxation 
phenomena. In high-temperature superconductors it is 
observed such relaxation processes as a slow logarithmic 
decrease of captured flux with time at temperatures much 
below their superconductive critical transition 
temperature Тс [14-16]. The logarithmic character of 
relaxation is explained by the Anderson [17]. Near Тс, in 
the range of Abrikosov vortex lattice melting, the 
logarithmic character of relaxation is changed by the 
power one with 2/3 exponent [18]. 
Consequently, the study of relaxation processes in 
high-temperature superconductors is an important 
problem.  
2. Experimental 
For Investigation it was used currentless mechanical 
method of Abrikosov vortex stimulated dynamics study 
by magnetic pulses revealing relaxation phenomena in 
vortex matter described in work [19]. This method is a 
development of currentless mechanical method of 
pinning investigations [20,21] and is based on pinning  
forces   countermoments measurements and viscous 
friction, acting on a axially symmetrical superconducting 
sample in an outer  (transverse) magnetic field.   
Countermoments of pinning forces and of viscous 
friction, acting on a superconductive sample from  
quantized vortex lines side (Abrikosov vortices) are 
defined the way as it was described in work [22,23].  The 
sensitivity of the method accordingly works [24], is 
equivalent to 10-8 V×cm-1   in the method of V-A 
characteristics. 
The high-temperature superconducting samples of 
HoBa2Cu3O7-δ system were prepared by the standard 
solid state reaction method. Samples were made 
cylindrical with height L=13mm and diameter d=6mm. 
Their critical temperature was Tc=92 K. The investigated 
samples were isotropic what was established by 
mechanical moment τ measurements appearing 
H > 1cH   with the penetration of Abrikosov vortices 
into a freely suspended on a thin elastic thread 
superconducting sample. The appearance of such 
moment ατ sinMH= , characteristic for anisotropic 
superconductors, is related with penetrating Abrikosov 
vortices and the mean magnetic moment M
 of a sample 
which could deviate on angle α  from the direction of 
outer magnetic field H
. In superconducting anisotropic 
samples it is presented energetically favorable directions 
for the arrangement of emerging (penetrating) vortex 
lines which in their turn are fastened by pinning centers 
creating aforementioned moment τ . The lack of τ  
moment is characteristic for isotropic and investigated by 
us samples, no matter magnetic field value and its 
previous orientation in respect to H
 in the axial 
symmetry plane. Pulsed magnetic fields were created by 
Helmholtz coils. The value of pulsed magnetic fields was 
changed in Oeh 2002 ÷=∆   limits. 
In experiments it was used both single and continuous 
pulsed with repetition frequency ν   from 2.5 s-1  to 500s-1 
.  The duration 
x of pulses was changed from 0,5 до 
500 �s. Magnetic pulse could be directed both parallel 
h||H) and perpendicularly (
h⊥H) to applied steady 
magnetic field H
, creating mixed state of 
superconducting sample. The standard pulsed generator 
and amplifier were used to feed Helmholtz coils. The 
current strength in coils reached up to  40÷50 A.  
Samples were high-temperature superconductors of 
HoBa2Cu3O7-δ system placed in the center between 
Helmholtz coils. 
The principal set-up of experiment is shown in fig.1 
[19,20]. In experiments it is measured the rotation angle 
2ϕ  of sample depending on the angle of rotation of a 
torsion head 1ϕ , transmitting the rotation to a sample by 
means of suspension having the torsion stiffness 
K ≈4·10-1 [dyn•cm], which can be replaced when 
necessary by a less stiff or stiffer one.  
The measurements were carried out at a constant speed 
of rotation of the torsion head, making ω1=1,8·10
-2 rad/s . 
Angles of rotation поворота   2ϕ   and 1ϕ  were 
determined with an accuracy of ±4,6·10-3 and  ±2,3·10-3 
rad, respectively. The uniformity of the magnetic field’s 
strength along a sample was below H
∆ = 10-3.
                  Fig. 1.  The schematic diagram and the geometry of the experiment. 1-sample, 2-upper elastic filament, 3-lower filament, 4 - leading head,  
                              5 - glass  road. φ is angle between Mr  and Hr  
To avoid effects, connected with the frozen magnetic 
fluxes, the lower part of the cryostat with the sample was 
put into a special cylindrical Permalloy screen, reducing 
the Earth magnetic field by the factor of 1200. After a 
sample was cooled by liquid nitrogen to the 
superconducting state, the screen was removed, a 
magnetic field of necessary intensity H  was applied and 
the 2 1( )ϕ ϕ  dependences were measured. To carry out 
measurements at different values of H , the sample was 
brought to the normal state by heating it to до T > cT   at 
H =0,  and only after returning sample and torsion head 
to the initial state 1 2 0ϕ ϕ= = , the experiment was 
repeated.  
                                            
3. Results and discussions 
During rotation of the sample both of normal and 
superconducting states in the 
absence external magnetic field ( H =0) the 2ϕ  
dependence versus 1ϕ  is linear and the condition is 
satisfied. 
       tωϕϕ == 21  
    The character of the   2 1( )ϕ ϕ   dependence is changed 
significantly, when the sample is in magnetic fields 
H > 1cH  at  T < cT . Typical 2 1( )ϕ ϕ dependences at 
T=77K and various magnetic fields for HoBa2Cu3O7-δ  
sample  ( length of a cylindrical sample L=13mm and 
diameter d=6mm ) is shown in Fig.2.   
                              
Fig.2. Dependence of the rotation angle of the sample HoBa2Cu3O7-δ   
2ϕ  on the rotation angle of the leading head 1ϕ  in magnetic field 
H=1000 Oe at T=77K. 
Three distinct regions are observed in Fig.2.  In the 
first (initial) region, the sample does not respond to the 
increase in  1ϕ  ,  i.e. to the applied and increase with 
time torsion torque as  1ϕ ~ )( 21 ϕϕτ −= K  or 
responds weakly. Such behavior of the sample can be 
explained by fact that Abrikosov vortices are not 
detached from pinning centers at small values of 1ϕ ~τ , 
but if the sample is still turned slightly, this can be 
caused by elastic deformation of magnetic force lines 
beyond it or, possibly, by separation of the most weakly 
fixed vortices. As it is seen from fig.2 , as soon as a 
certain critical value  φсmin depending on H is reached , 
the first region under goes a transition to the second 
region in which the velocity of the sample increases 
gradually with  1ϕ  increasing resulting from the 
progressive process of detachment of vortices from their 
corresponding pinning centers. One should expect that 
just in this region, in the rotating sample “the vortices 
fan” begins to unfold, in with the vortices are distributed 
according to the instantaneous angles of orientations with 
respect to the fixed external magnetic field.   In this case 
the of orientation angles of separate vortex filaments are 
limited from frϕ  to pinfr ϕϕ + , where  frϕ  is the angle 
on which the vortex filament can be turned with respect 
to  H
 by forces of viscous friction with the matrix of 
superconductor, and  pinϕ  is the angle on with the 
vortex filament can be turned by the most strong pinning 
center, studied for the first time in [25]. 
The gradual transition (at high 1ϕ  values)  to the third 
region where the linear 2 1( )ϕ ϕ dependence was 
observed, allows one to define the countermoments of 
pinning forces  pτ and  frτ , independently. Just in this 
region, when  21 ωω =  the torque τ , appeared to the 
uniformly rotating sample, is balanced by the 
countermoment  pτ  and  frτ . In particular, in the case 
of continuously rotating sample with frequency  
21 ωω =    one could find similarly to   [26,27]  the 
expression for the total braking torque  τ  [19] . 
Indeed, if we consider in this case a vortex element 
  moving with velocity ⊥υ
 perpendicular to sd
then the average force acting on this elements is 
dsFdsfd l
⊥ += υ
and the associated braking torque, exerted on the rotating 
specimen becomes:                    
υτ fdrd
 where r
is the vector pointing from the rotational axis 
to the vortex elements, lF  is the pinning force per flux 
thread per unit length, and η  is the viscosity coefficient. 
For a cylindrical specimen of radius R   and height L  
integrating over the individual contribution of all vortex 
gives a total braking torque τ  
 ωτττ 0+= p                                                        (1) 
   with      
=τ  ,                
 and 
B ηπτ
= ,               
Where B is the inductivity averaged over the sample, 
0Φ  is the flux quantum , L  is the height and R  is the 
radius of the sample. 
As it is shown in Fig.2, starting with the point   
(a), where 21 ωω = , to the superconducting sample 
uniformly rotating in the homogeneous stationary 
magnetic field  H=1000 Oe, is applied stationary 
dynamic torsion moment  fr
p τττ += .  
If in this region the torsion head is stopped, then at the 
expense of relaxation processes connected with the 
presence of viscous forces acting on vortex filaments, the 
sample will continue the rotation in the same direction 
(with decreasing velocity)   until it reaches a certain 
equilibrium position, depending on the H  value. The  
Fig.3 shows curves of  
2ϕ∆ time dependences at the 
stopped leading head for HoBa2Cu3O7-δ  sample at 
T=77K and  H=1000 Oe. 
Fig.3. Dependence of momentum 
relτ  on time  t  after the stopping of 
rotating head for HoBa2Cu3O7-δ      sample at  T=77K  and  H=1000 Oe.  
If during the relaxation after rotation of sample one 
applies the pulsed magnetic field in parallel to the outer 
magnetic field H
, then additional vortices, created as 
result of magnetic pulse, influence the structure already 
existing in the sample as “the vortex fan” what could 
result in the decrease of the angle of its unfolding or to 
its folding. The letter in its turn, would cause the 
additional change in the relaxation process taking place 
in the sample, and, correspondingly, results in the 
stepwise decrease of moment related with viscous 
forces frτ . 
But the change of relaxation process character and, 
correspondently, the stepwise decrease of moment could 
happen if the duration of magnetic pulse is larger as 
compared with the time necessary for creation of a new 
vortex structure, which will influence the 
superconducting sample relaxing in magnetic field. If it 
is the case, then at the small durations of magnetic pulses 
the relaxation curve, presented in Fig.3, doesn’t  change, 
but when this duration becomes the order of a time for 
penetration of vortices into the sample and the creation 
of vortex structure, then the aforementioned change of 
relaxation processes could principally appear. Namely, 
this situation when the duration �x of magnetic pulses 
was larger then the time for  Abrikosov vortex lattice 
creation �xс,  have been described by us our previous 
work [19], when it was shown that the influence of one 
magnetic pulse �h≈400 Oe (�h||H)  with duration 
30�сек>�xс   was stepwisely decreased the relτ  moment 
and the relaxation process continued with the reduction 
relτ on a level as far as a new magnetic pulse similar 
the first one is not applied. 
In the presented work it was studied the influence of 
different duration and amplitude pulses on relaxation 
processes in vortex matter. The results shown in  Fig.4 
on action of single pulses of different durations on 
relaxation processes in vortex matter and, consequently, 
on mechanical moment  
relτ revealed that at small pulses 
durations up to  15 �s  the relτ doesn’t change, but at 
duration of applied pulse  >15 �s  it is observed the 
stepwise change  
relτ , what speaks on the existence of 
the  �xс   threshold. 
Fig.4. Dependence of momentum � relτ on  the duration �x of 
magnetic field single pulse    � h=172 Oe applied in parallel to the 
main magnetic field  H=1000 Oe  at  T=77K  for  HoBa2Cu3O7-δ  
sample. 
This way one could say that the Abrikosov vortex 
lattice creation time in high-temperature isotropic 
superconductor of HoBa2Cu3O7-δ makes value on the 
order of 20�s. This value approximately on the order of 
value higher then time for the single-vortex creation for 
the first time measured by G. Boato, G.Gallinaro and C. 
Rizzuto [28],  who showed that this time is less than 10-5 
sec. 
In work [19]  it was also shown that continuous action  
of aforementioned pulses with the train frequency equal 
to 2,5 s-1  more sharply reveals their influence on 
relaxation processes in vortex matter and in these 
conditions the processes of penetration of vortices into 
superconductors bulk are made more sharply expressed. 
In fig.5 it is presented the clear picture of magnetic 
pulses continuous action with �h=172 Oe (�h||H)  , and 
the duration  20�sec, what  is larger than the �xс with the 
train frequency  ν  = 2,5 s-1. As it is seen from picture the 
pulses of 5, 10 and 15 �sec durations doesn’t  change 
)(tfrel =τ  which is observed at absence of magnetic 
pulses. 
The results presented in Fig.5 show that at durations of 
pulses in 20�sec, 30�sec and 40�sec the Abrikosov 
vortices penetrate into the superconductor. This way the 
threshold value on the magnetic pulses duration observed 
at the action of single pulses  (Fig.4)   coinside with the 
threshold observed when their repetition frequency is  ν  
= 2.5 s-1. 
Fig.5. Dependence of momentum  
relτ on time  t  after the stopping of 
rotating head with the influence since  t=5 min  on the relaxation 
process of  HoBa2Cu3O7-δ sample of the continuous magnetic field 
pulses      h=172 Oe  with  ν=2,5 s-1  frequency and different                            
durations     x=5;  10;  15;  20;  30;  and 40 �s.  Pulsed magnetic field 
was parallel to the main magnetic field  H=1000 Oe  at  T=77K.   
 In Fig.6 it is presented the curve of  relτ =f(t) 
dependence on time at the influence of magnetic pulses  �h=172  (�h||H)  the duration of which is below the time 
of Abrikosov vortex system creation �x=5�s<�xс 
(�xс≥15�s for the investigated HoBa2Cu3O7-δ). As it is 
seen from the picture when �x<�xс, the relaxation curve 
doesn’t change in spite the increase of the repetition 
frequency of magnetic pulses ν  from 2.5 up to 500 s-1. As 
soon as the duration of pulses exceeds the critical value 
and becomes  �x=30�s, the relaxation curve undergoes 
the essential  (stepwise)  change. For example in Fig.6 it 
is presented measurement for ν=5s-1  и ν= 500s-1. 
Fig.6. Dependence of momentum  
relτ on the time  t  after the 
stopping of rotating head with the influence since  t=10 min  on the 
HoBa2Cu3O7-δ  sample relaxation process of  the continuous                       
pulses magnetic field  with frequency ν=2,5 ÷500s-1  at �x=5�s< �xс , 
and also at �x= 30�s > �xс. The pulsed magnetic field �h=172  Oe was 
parallel to the main magnetic field  H=1000 Oe   at  T=77K.   
And finally, we have observed the threshold on the 
value of applied pulses. In Fig.7 it is shown that in spite 
the fact that we applied magnetic pulses of the  large 
duration 300�s>>�xс, much longer as compared with the 
time of Abrikosov vortex creation at small amplitudes of 
pulsed field  �h ~7, 11, 14 Oe  relτ =f(t) doesn’t change. 
The stepwise change of the relaxing moment 
relτ is 
revealed only at �h ~18 Oe  and higher. 
       
Fig.6. Dependence of momentum  
relτ on the time  t  after the 
stopping of rotating head with the application after 5 minutes on the 
HoBa2Cu3O7-δ  sample relaxation process of  the single magnetic                    
field  pulses �h=(7÷36)  Oe  with duration �x= 300�s >>�xс.                             
The pulsed magnetic field was parallel to the main magnetic field  
H=400 Oe at T=77K. 
The further investigations of relaxation phenomena are 
anticipated for anisotropic high-temperature 
superconductors among them in strongly anisotropic 
high-temperature superconductors of  Bi-Pb-Sr-Ca-Cu-O  
system.   
4. Conclusion 
      
The simple mechanical method of Abrikosov vortex 
stimulated dynamics investigations it was applied for the 
study of pulsed magnetic fields influence on relaxation 
phenomena in vortex matter of high- temperature 
superconductors. It was observed the change of 
relaxation processes in vortex matter as a result of pulsed 
magnetic field influence on it.   
    The study of influence of different duration and 
amplitude pulsed magnetic fields influence was revealed 
the existence of threshold phenomena. A small duration 
pulse doesn’t change the course of relaxation processes 
in vortex matter of isotropic high- temperature 
superconductor HoBa2Cu3O7-δ. When the duration of 
pulses exceeds some critical value (threshold), then their 
influence change the course of relaxation processes. The 
latter is revealed in a stepwise decrease of relaxing 
mechanical momentum
relτ , apparently, related with a 
sharp change of pinning and the rearrange of vortex 
system of superconducting sample as a result of 
penetration into its bulk of a new portion of vortices at 
application of pulsed field on the outer magnetic field 
creating the main vortex structure in the investigated 
HoBa2Cu3O7-δ sample. A new portion of vortices 
“shakes” the vortex lattice existing in a sample causing 
the detachment of vortices from a weak pinning centers 
what, apparently, is the reason for the stepwise decrease 
of mechanical momentum
relτ . 
 All these made it possible to define the Abrikosov 
vortex lattice creation time in HoBa2Cu3O7-δ which 
turned out to be on the order of value higher as compared 
with the time of single- vortex creation observed in type 
II superconductors. 
  Acknowledgements 
     
The work was supported by the grants of International 
Science and Technology Center (ISTC) G-389 and G-
593. 
                             
References: 
1. V.M. Pan, A.V.Pan , Low Temperature Physics, v27,  
    №9-10, pp. 991-1010. 
2. Brandt E. H., Esquinazi P., Weiss W. C. Phys. Rev.  
    Lett., 1991, v. 62. p.2330. 
3. Xu Y., Suenaga M. Phys. Rev. 1991, v. 43. p. 5516  
    Kopelevich Y., Esquinazi P.arXiv: cond-mat/0002019.   
 4. E. Koshelev and V. M. Vinokur,  Phys.  Rev. Lett. 73,   
     3580– 3583 (1994). 
5. E.W. Carlson, A.H. Castro Neto, and D.K.Campbell,  
    Phys. Rev. Lett., 1991, v. 90, p.087001. 
6. D. E. Farrell, J. P. Rice and D. M. Ginsberg, Phys.   
    Rev. Lett., 1991, v. 67, pp.1165-1168. 
7. S.M. Ashimov, J.G.Chigvinadze, Cond-mat/0306118. 
8. V.M.Vinokur, P.S. Kes and A.E. Koshelev., Physica C  
    168, (1990), 29-39. 
9. M.V.  Feigelman, V.B. Geshkenbein , A.I. Larkin,  
    Physica C 167 (1990) 177. 
10. J.G. Chigvinadze, A.A. Iashvili , T.V. Machaidze,    
      Phys.Lett.A. 300(2002) 524-528. 
11. J.G. Chigvinadze , A.A. Iashvili , T.V.  Machaidze,  
      Phys.Lett.A.300(2002) 311-316. 
12. C. J. Olson, G.T. Zimanyi, A.B. Kolton, N.  
      Gronbech-Iensen, Phys.Rev. Lett.   85(2000)5416. 
      C.J.Olson, C.Reichbardt, R.T.Scalettar, G.T.Zimanyi,  
      cond-mat/0008350. 
13. S.M. Ashimov, J.G. Chigvinadze, Physics Letters A  
      313 (2003) 238-242. 
14. Muller K. A., Tokashige., Bednorz J. G.- Phys. Rev.  
      Lett., 1987, v.58, p.1143.  
15. Touminen M., Goldman A. M., McCartney M. L.-  
      Phys. Rev. B, 1988, v.37, p.548. 
16. Klimenko A.G., Blinov A.G., Vesin Yu.I.,  
      StarikovM.A.- Pis’ma Zh.Eksp. Teor. Fiz., 1987,  
      v.46,Suppl., p.196.   
17. Anderson P. W. - Phys. Rev. Lett., 1962, v.9, p.303. 
18. A.A. Iashvili, T.V. Machaidze.  L.T.Paniashvili, and  
      J.G. Chigvinadze. Phys.,Chem., Techn., 1994, v. 7, N  
      2, pp. 297-300. 
19. J.G. Chigvinadze, J.V. Acrivos S.M. Ashimov, A.A.  
      Iashvili, T. V. Machaidze,   Th. Wolf.  Phys.Lett. A,  
      349,  264(2006). 
20. E. L. Andronikashvili, J.G. Chigvinadze, R.M.Kerr,  
      J. Lowell, K. Mendelsohn, J. S. Tsakadze.  
      Cryogenics, v. 9, N2, pp.119-121 (1969). 
21. J.G. Chigvinadze, Zh.Eksp.Teor.Fiz.,v. 65, N5, pp.  
      1923-1927 (1973). 
22. S.M. Ashimov, I.A. Naskidashvili et al., Low Temp.  
      Phys. 10, 479, (1984). 
23. S.M. Ashimov and J.G. Chigvinadze. Physics  
      Letters A, 313,  pp. 238-242. (2003).   
24. G. L. Dorofeev, E.F. Klimenko, Journ. Techn. Phys.  
      57, p.2291, (1987). 
25. B.H. Heise Rev.Mod.Phys.36, 64 (1964). 
26. M. Fuhrmans, C. Heiden, Proc. International  
      Discussion (Sonnenberg, Germany, 1974),  
      Göttingen, 1975, p. 223. 
27. M. Fuhrmans, C. Heiden, Cryogenics, 125, 451       
     (1976). 
28. G. Boato, G.Gallinaro and C. Rizzuto. Solid State  
      Communications, vol. 3, pp.173-176(1965).
ABSTRACT
  It is used the mechanical method of Abrikosov vortex stimulated dynamics
investigation in superconductors. With its help it was studied relaxation
phenomena in vortex matter of high-temperature superconductors. It established
that pulsed magnetic fields change the course of relaxation processes taking
place in vortex matter. The study of the influence of magnetic pulses differing
by their durations and amplitudes on vortex system of isotropic
high-temperature superconductors system HoBa2Cu3O7-d showed the presence of
threshold phenomena. The small duration pulses does not change the course of
relaxation processes taking place in vortex matter. When the duration of pulses
exceeds some critical value (threshold), then their influence change the course
of relaxation process which is revealed by stepwise change of relaxing
mechanical moment . These investigations showed that the time for formatting of
Abrikosov vortex lattice in HoBa2Cu3O7-d is of the order of 20 microsec. which
on the order of value exceeds the time necessary for formation of a single
vortex observed in type II superconductors.

<|endoftext|><|startoftext|>
Spin and pseudospin symmetries and the equivalent spectra of relativistic spin-1/2
and spin-0 particles
P. Alberto
Physics Department and Center for Computational Physics,
University of Coimbra, P-3004-516 Coimbra, Portugal
A. S. de Castro
Departamento de F́ısica e Qúımica, Universidade Estadual Paulista, 12516-410 Guaratinguetá, SP, Brazil
M. Malheiro
Departamento de F́ısica, Instituto Tecnológico de Aeronáutica,
CTA, 12228-900, São José dos Campos, SP, Brazil
and Instituto de F́ısica, Universidade Federal Fluminense, 24210-340 Niterói, Brazil
(Dated: November 4, 2018)
We show that the conditions which originate the spin and pseudospin symmetries in the Dirac
equation are the same that produce equivalent energy spectra of relativistic spin-1/2 and spin-0
particles in the presence of vector and scalar potentials. The conclusions do not depend on the
particular shapes of the potentials and can be important in different fields of physics. When both
scalar and vector potentials are spherical, these conditions for isospectrality imply that the spin-
orbit and Darwin terms of either the upper component or the lower component of the Dirac spinor
vanish, making it equivalent, as far as energy is concerned, to a spin-0 state. In this case, besides
energy, a scalar particle will also have the same orbital angular momentum as the (conserved) orbital
angular momentum of either the upper or lower component of the corresponding spin-1/2 particle.
We point out a few possible applications of this result.
PACS numbers: 11.30.-j,03.65.Pm
When describing some strong interacting systems it is often useful, because of simplicity, to approximate the
behavior of relativistic spin-1/2 particles by scalar spin-0 particles obeying the Klein-Gordon equation. An example
is the case of relativistic quark models used for studying quark-hadron duality because of the added complexity of
structure functions of Dirac particles as compared to scalar ones. It turns out that some results (e.g., the onset of
scaling in some structure functions) almost do not depend on the spin structure of the particle [1]. In this work we
will give another example of an observable, the energy, whose value may not depend on the spinor structure of the
particle, i.e., whether one has a spin-1/2 or a spin-0 particle. We will show that when a Dirac particle is subjected
to scalar and vector potentials of equal magnitude, it will have exactly the same energy spectrum as a scalar particle
of the same mass under the same potentials. As we will see, this happens because the spin-orbit and Darwin terms
in the second-order equation for either the upper or lower spinor component vanish when the scalar and vector
potentials have equal magnitude. It is not uncommon to find physical systems in which strong interacting relativistic
particles are subject to Lorentz scalar potentials (or position-dependent effective masses) that are of the same order
of magnitude of potentials which couple to the energy (time components of Lorentz four-vectors). For instance, the
scalar and vector (hereafter meaning time-component of a four-vector potential) nuclear mean-field potentials have
opposite signs but similar magnitudes, whereas relativistic models of mesons with a heavy and a light quark, like D-
or B-mesons, explain the observed small spin-orbit splitting by having vector and scalar potentials with the same sign
and similar strengths [2].
It is well-known that all the components of the free Dirac spinor, i.e., the solution of the free Dirac equation, satisfy
the free Klein-Gordon equation. Indeed, from the free Dirac equation
(i~γµ∂µ −mc)Ψ = 0 (1)
one gets
(−i~γν∂ν −mc)(i~γ
µ∂µ −mc)Ψ = (~
2∂µ∂µ +m
2c2)Ψ = 0 , (2)
where use has been made of the relation γµγν∂µ∂ν = ∂µ∂
µ. In a similar way, for the time-independent free Dirac
equation we would have
(cα · p+ βmc2)ψ = (−i~cα · ∇+ βmc2)ψ = Eψ , (3)
http://arxiv.org/abs/0704.0353v1
where, as usual, ψ(r) = Ψ(r, t) exp (i E t/~), α = γ0γ and β = γ0. Then, by left multiplying Eq. (3) by cα ·p+βmc2,
one gets the time-independent free Klein-Gordon equation
(c2p2 +m2c4)ψ = (−~2c2∇2 +m2c2)ψ = E2ψ , (4)
where the relation {β,α} = 0 was used. This all means that the free four-component Dirac spinor, and of course all
of its components, satisfy the Klein-Gordon equation. This is not surprising, because, after all, both free spin-1/2 and
spin-0 particles obey the same relativistic dispersion relation, E2 = p2c2 +m2c4, in spite of having different spinor
structures and thus different wave functions. Since there is no spin-dependent interaction, one expects both to have
the same energy spectrum.
We consider now the case of a spin-1/2 particle subject to a Lorentz scalar potential Vs plus a vector potential Vv.
The time-independent Dirac equation is given by
[cα · p+ β(mc2 + Vs)]ψ = (E − Vv)ψ (5)
It is convenient to define the four-spinors ψ± = P±ψ = [(I ± β)/2]ψ such that
, (6)
where φ and χ are respectively the upper and lower two-component spinors. Using the properties and anti-
commutation relations of the matrices β and α we can apply the projectors P± to the Dirac equation (5) and
decompose it into two coupled equations for ψ+ and ψ−:
cα · pψ− + (mc
2 + Vs)ψ+ = (E − Vv)ψ+ (7)
cα · pψ+ − (mc
2 + Vs)ψ− = (E − Vv)ψ− . (8)
Applying the operator cα · p on the left of these equations and using them to write ψ+ and ψ− in terms of α · pψ−
and α · pψ+ respectively, we finally get second-order equations for ψ+ and ψ−:
c2p2 ψ+ + c
[α · p∆]α · pψ+
E −∆+mc2
= (E −∆+mc2)(E − Σ−mc2)ψ+ (9)
c2p2 ψ− + c
[α · pΣ]α · pψ−
E − Σ−mc2
= (E −∆+mc2)(E − Σ−mc2)ψ− (10)
where the square brackets [ ] mean that the operator α · p only acts on the potential in front of it and we defined
Σ = Vv + Vs and ∆ = Vv − Vs. The second term in these equations can be further elaborated noting that the Dirac
αi matrices satisfy the relation αiαj = δij +
iǫijkSk where Sk, k = 1, 2, 3, are the spin operator components. The
second-order equations read now
c2 p2 ψ+ + c
[p∆] · pψ+ +
[p∆]× p · S ψ+
E −∆+mc2
= (E −∆+mc2)(E − Σ−mc2)ψ+ (11)
c2 p2 ψ− + c
[pΣ] · pψ− +
[pΣ]× p · S ψ−
E − Σ−mc2
= (E −∆+mc2)(E − Σ−mc2)ψ−. (12)
Now, if p∆ = 0, meaning that ∆ is constant or zero (if ∆ goes to zero at infinity, the two conditions are equivalent),
then the second term in eq. (11) disappears and we have
c2 p2ψ+ = (E −∆+mc
2)(E − Σ−mc2)ψ+ = [(E − Vv)
2 − (mc2 + Vs)
2]ψ+ , (13)
which is precisely the time-independent Klein-Gordon equation for a scalar potential Vs plus a vector potential Vv[14].
Since the second-order equation determines the eigenvalues for the spin-1/2 particle, this means that when p∆ = 0,
a spin-1/2 and a spin-0 particle with the same mass and subject to the same potentials Vs and Vv will have the same
energy spectrum, including both bound and scattering states. This last sufficient condition for isospectrality can be
relaxed to demand that just the combination mc2+Vs be the same for both particles, allowing them to have different
masses. This is so because this weaker condition does not change the gradient of ∆ and Σ and therefore the condition
p∆ = 0 will still hold. On the other hand, if the scalar and vector potentials are such that pΣ = 0, we would obtain a
Klein-Gordon equation for ψ−, and again the spectrum for spin-0 and spin-1/2 particles would be the same, provided
they are subjected to the same vector potential and mc2 + Vs is the same for both particles. If both Vs and Vv are
central potentials, i.e., only depend on the radial coordinate, then the numerators of the second terms in equations
(11) and (12) read
[p∆] · pψ+ +
[p∆]× p · S ψ+ = −~
∆′L · S ψ+ (14)
[pΣ] · pψ− +
[pΣ]× p · S ψ− = −~
Σ′L · S ψ− , (15)
where ∆′ and Σ′ are the derivatives with respect to r of the radial potentials ∆(r) and Σ(r), and L = r × p is the
orbital angular momentum operator. From these equations ones sees that these terms, which set apart the Dirac
second-order equations for the upper and lower components of the Dirac spinor from the Klein-Gordon equation and
thus are the origin of the different spectra for spin-1/2 and spin-0 particles, are composed of a derivative term, related
to the Darwin term which appears in the Foldy-Wouthuysen expansion, and a L · S spin-orbit term. If ∆′ = 0
(Σ′ = 0), then there is no spin-orbit term for the upper (lower) component of the Dirac spinor. In turn, since
the second-order equation determines the energy eigenvalues, this means that the orbital angular momentum of the
respective component is a good quantum number of the Dirac spinor. This can be a bit surprising, since one knows
that in general the orbital quantum number is not a good quantum number for a Dirac particle, since L2 does not
commute with a Dirac Hamiltonian with radial potentials. The reason why this does not happen in these cases was
reported in Refs. [3, 4], and we now review it in a slight different fashion. Let us consider in more detail the case of
spherical potentials such that ∆′ = 0. One knows that a spinor that is a solution of a Dirac equation with spherically
symmetric potentials can be generally written as
ψjm(r) =
gj l(r)
Yj lm(r̂)
j l̃ m
. (16)
where Yj lm are the spinor spherical harmonics. These result from the coupling of spherical harmonics and two-
dimensional Pauli spinors χms , Yj lm =
〈 l ml ; 1/2ms | j m 〉Ylmlχms , where 〈 l ml ; 1/2ms | j m 〉 is a
Clebsch-Gordan coefficient and l̃ = l ± 1, the plus and minus signs being related to whether one has aligned or
anti-aligned spin, i.e., j = l ± 1/2. The spinor spherical harmonics for the lower component satisfy the relation
j l̃m
= −σ · r̂Yj lm. The fact that the upper and lower components have different orbital angular momenta is related
to the fact, mentioned before, that L2 does not commute with the Dirac Hamiltonian
H = cα · p+ β(Vs +mc
2) + Vv = cα · p+ βmc
2 +ΣP+ +∆P− , (17)
where P± are the projectors defined above. However, when ∆
′ = 0, there is an extra SU(2) symmetry of H (so-called
“spin symmetry”) as first shown by Bell and Ruegg [5]. When we have spherical potentials, Ginocchio showed that
there is an additional SU(2) symmetry (for a recent review see [4]). The generators of this last symmetry are
L = LP+ +
α · pLα · pP− =
0 Up LUp
, (18)
where Up = σ · p/(
p2) is the helicity operator. One can check that L commutes with the Dirac Hamiltonian,
[H,L] = [cα · p,LP+ +
α · pLα · pP−] + [∆,
α · pLα · p] + [Σ,L]
= [∆,
α · pLα · p ] = 0 , (19)
where the last equality comes from the fact that ∆′ = 0. The Casimir L2 operator is given by L2 = L2P+ +
α · pP−. Applying this operator to the spinor ψjm (16), we get
2ψjm = L
α · pL2 α · pψ−
= ~2l(l+ 1)ψ+
α · p cL2 ψ+jm
E −∆+mc2
= ~2l(l + 1)ψ+
+ ~2l(l + 1)ψ−
= ~2l(l + 1)ψjm , (20)
where ψ±jm = P±ψjm and we used the relation, valid when ∆
′ = 0, ψ+jm = (E −∆ +mc
α · p
ψ−jm. From (20) we
see that ψjm is indeed an eigenstate of L
2. Thus the orbital quantum number of the upper component l is a good
quantum number of the system when the spherical potentials Vs(r) and Vv(r) are such that Vv(r) = Vs(r)+C∆, where
C∆ is an arbitrary constant. Also, according to we have said before, there is a state of a spin-0 particle subjected to
these same spherical potentials (or, at least, with a scalar potential such that the sum Vs +mc
2 is the same) that
has the same energy and the same orbital angular momentum as ψjm. In addition, the wave function of this scalar
particle would be proportional to the spatial part of the wave function of the upper component.
Note that the generator of the “spin symmetry” S is given by a similar expression as (18) just replacing L by ~/2σ
[4, 5], meaning that S2 ≡ S2 = 3/4 ~2I so that spin is also a good quantum number, as would be expected. Actually,
one can show that the total angular momentum operator J can be written as L + S, so that l, ml (eigenvalue of
Lz), s = 1/2, ms (eigenvalue of Sz) are good quantum numbers. Then, of course, j and m = ml +ms are also good
quantum numbers, but only in a trivial way, because there is no longer spin-orbit coupling. Therefore, in the spinor
(16) one could just replace the spinor spherical harmonic Yj lm by Yl mlχms and Yj l̃m by −σ · r̂ Yl mlχms . Note that if
∆ is a nonrelativistic potential, ∆ ≪ mc2 and ∆′ ≪ m2c4/(~c), i.e., it is slowly varying over a Compton wavelength.
In this case, the spin-orbit term will also get suppressed. In fact, the derivative of the ∆ potential is the origin of
the well-known relativistic spin-orbit effect which appears as a relativistic correction term in atomic physics or in the
v/c Foldy-Wouthuysen expansion (only the derivative of Vv appears because usually no Lorentz scalar potential Vs is
considered, and therefore ∆ = Vv).
When Σ′ = 0, or Vv(r) = −Vs(r) + CΣ, with CΣ an arbitrary constant, there is again a SU(2) symmetry, usually
called pseudospin symmetry ([5, 6]) which is relevant for describing the single-particle level structure of several nuclei.
This symmetry has a dynamical character and cannot be fully realized in nuclei because in Relativistic Mean-field
Theories the Σ potential is the only binding potential for nucleons [7, 8]. For harmonic oscillator potentials this is
no longer the case, since ∆, acting as an effective mass going to infinity, can bind Dirac particles [9, 10], even when
Σ = 0. As before, in the special case of spherical potentials, there is another SU(2) symmetry whose generators are
α · pLα · pP+ +LP− =
Up LUp 0
. (21)
In the same way as before, applying L̃
to ψjm, we would find that L̃
ψjm = ~
2 l̃(l̃ + 1)ψjm, that is, this time it
is the orbital quantum number of the lower component l̃ which is a good quantum number of the system and can
be used to classify energy levels. Again, provided the vector and scalar potentials are adequately related, there
would be a corresponding state of a spin-0 particle with the same energy and same orbital angular momentum l̃,
and, furthermore, its wave function would be proportional to the spatial part of the wave function of the lower
component. As before, the pseudospin symmetry generator S̃ can be obtained from L̃ by replacing L by ~/2σ. The
good quantum numbers of the system would be, besides l̃, m
, s̃ ≡ s = 1/2 and ms̃. Again, J = L̃ + S̃. It is
interesting that, as has been noted by Ginocchio [9], the generators of spin and pseudospin symmetries are related
through a γ5 transformation since S̃ = γ5Sγ5 and L̃ = γ5Lγ5. This property was used in a recent work to relate
spin symmetric and pseudospin symmetric spectra of harmonic oscillator potentials [11]. There it was shown that
for massless particles (or ultrarelativistic particles) the spin- and pseudo-spin spectra of Dirac particles are the same.
In addition, this means that spin-symmetric massless eigenstates of γ5 would be also pseudo-spin symmetric and
vice-versa. Since in this case ∆ = Σ = 0, or Vv = Vs = 0, this is, of course, just another way of stating the well-known
fact that free massless Dirac particles have good chirality.
Naturally, for free spin-1/2 particles described by spherical waves, both l and l̃ are good quantum numbers, which
just reflects the fact that one can have free spherical waves with any orbital angular momentum for the upper or
lower component and still have the same energy, as long as their linear momentum magnitude is the same, or, put in
another way, the energy of a free spin-1/2 particle cannot depend on its direction of motion.
In summary, we showed that when a relativistic spin-1/2 particle is subject to vector and scalar potentials such
that Vv = ±Vs + C±, where C± are constants, its energy spectrum does not depend on their spinorial structure,
being identical to the spectrum of a spin-0 particle which has no spinorial structure. This amounts to say that if
the potentials have these configurations there is no spin-orbit coupling and Darwin term. If the scalar and vector
potentials are spherical, one can classify the energy levels according to the orbital angular momentum quantum
number of either the upper or the lower component of the Dirac spinor. This would then correspond to having a
spin-0 particle with orbital angular momentum l or l̃, respectively. This spectral identity can of course happen only
with potentials which do not involve the spinorial structure of the Dirac equation in an intrinsic way. For instance, a
tensor potential of the form iβσµν (∂µAν − ∂νAµ) does not have an analog in the Klein-Gordon equation, so that one
could not have a spin-0 particle with the same spectrum as a spin-1/2 particle with such a potential. This is the case
of the so-called Dirac oscillator [12] (see [10] for a complete reference list), in which the Dirac equation contains a
potential of the form iβσ0imωri = imωβα · r. Another important potential, the electromagnetic vector potential A,
which is the spatial part of the electromagnetic four-vector potential, can be added via the minimal coupling scheme
to both the Dirac and the Klein-Gordon equations. Since α · (p− eA)α · (p− eA) = (p− eA)2 + 2e~∇×A · S, the
spectra of spin-0 and spin-1/2 particles cannot be identical as long as there is a magnetic field present, even though
the condition Vv = ±Vs +C± is fulfilled. It is important also to remark that, since for an electromagnetic interaction
Vv is the time-component of the electromagnetic four-vector potential, this last condition is gauge invariant in the
present case, in which we are dealing with stationary states, i.e, time-independent potentials. So, in the absence of a
external magnetic field (allowing, for instance, an electromagnetic vector potential A which is constant or a gradient
of a scalar function), a spin-0 and spin-1/2 particle subject to the same electromagnetic potential Vv and a Lorentz
scalar potential fulfilling the above relation would have the same spectrum.
The remark made above about the similarity of spin-0 and spin-1/2 wave functions can be relevant for calculations
in which the observables do not depend on the spin structure of the particle, like some structure functions. One such
calculation was made by Paris [13] in a massless confined Dirac particle, in which Vv = Vs. It would be interesting
to see how a Klein-Gordon particle would behave under the same potentials. More generally, this spectral identity
can also have experimental implications in different fields of physics, since, should such an identity be found, it would
signal the presence of a Lorentz scalar field having a similar magnitude as that of a time-component of a Lorentz
vector field, or at least differing just by a constant.
Acknowledgments
We acknowledge financial support from CNPQ, FAPESP and FCT (POCTI) scientific program.
[1] S. Jeschonnek and J. W. Van Orden, Phys. Rev. D 69, 054006 (2004).
[2] P. R. Page, T. Goldman, and J. N. Ginocchio, Phys. Rev. Lett. 86, 204.
[3] J. N. Ginocchio and A. Leviatan, Phys. Lett. B425, 1 (1998).
[4] J. N. Ginocchio, Phys. Rep. 414 165 (2005).
[5] J. S. Bell and H. Ruegg, Nucl. Phys. B98, 151 (1975).
[6] J. N. Ginocchio, Phys. Rev. Lett. 78, 436 (1997).
[7] P. Alberto, M. Fiolhais, M. Malheiro, A. Delfino, and M. Chiapparini, Phys. Rev. Lett. 86, 5015 (2001).
[8] P. Alberto, M. Fiolhais, M. Malheiro, A. Delfino, and M. Chiapparini, Phys. Rev. C 65, 034307 (2002).
[9] J. N. Ginocchio, Phys. Rev. Lett. 95, 252501 (2005).
[10] R. Lisboa, M. Malheiro, A. S. de Castro, P. Alberto, and M. Fiolhais, Phys. Rev. C 69, 024319 (2004).
[11] A. S. de Castro, P. Alberto, R. Lisboa, and M. Malheiro, Phys. Rev. C 73, 054309 (2006).
[12] D. Itô, K. Mori, and E. Carriere, Nuovo Cimento A 51, 1119 (1967); M. Moshinsky and A. Szczepaniak, J. Phys. A 22,
L817 (1989).
[13] M. W. Paris, Phys. Rev. C 68, 025201 (2003).
[14] There are some authors who introduce a scalar potential Vs in the Klein-Gordon equation by making the replacement
m2c4 → m2c4 +V2
. Here we introduce it, as most authors do, as an effective mass m∗ 2 = (m+Vs/c
2)2, since it is the way
that it is introduced in the Dirac equation. The two potentials are related by V2
= (mc2 + Vs)
−m2c4.
	Acknowledgments
	References
ABSTRACT
  We show that the conditions which originate the spin and pseudospin
symmetries in the Dirac equation are the same that produce equivalent energy
spectra of relativistic spin-1/2 and spin-0 particles in the presence of vector
and scalar potentials. The conclusions do not depend on the particular shapes
of the potentials and can be important in different fields of physics. When
both scalar and vector potentials are spherical, these conditions for
isospectrality imply that the spin-orbit and Darwin terms of either the upper
component or the lower component of the Dirac spinor vanish, making it
equivalent, as far as energy is concerned, to a spin-0 state. In this case,
besides energy, a scalar particle will also have the same orbital angular
momentum as the (conserved) orbital angular momentum of either the upper or
lower component of the corresponding spin-1/2 particle. We point out a few
possible applications of this result.

<|endoftext|><|startoftext|>
General asymptoti
 solutions
of the Einstein equations and
phase transitions in quantum gravity
Dmitry Podolsky
Helsinki Institute of Physi
s, University of Helsinki,
Gustaf Hällströmin katu 2, FIN00014, Helsinki, Finland
Email: dmitry.podolsky�helsinki.�
Abstra
t
We dis
uss generi
 properties of 
lassi
al and quantum theories of grav-
ity with a s
alar �eld whi
h are revealed at the vi
inity of the 
osmolog-
i
al singularity. When the potential of the s
alar �eld is exponential and
unbounded from below, the general solution of the Einstein equations
has quasi-isotropi
 asymptoti
s near the singularity instead of the usual
anisotropi
 Belinskii - Khalatnikov - Lifshitz (BKL) asymptoti
s. De-
pending on the strength of s
alar �eld potential, there exist two phases
of quantum gravity with s
alar �eld: one with essentially anisotropi
 be-
havior of �eld 
orrelation fun
tions near the 
osmologi
al singularity, and
another with quasi-isotropi
 behavior. The �phase transition� between the
two phases is interpreted as the 
ondensation of gravitons.
On leave from Landau Institute for Theoreti
al Physi
s, 119940, Mos
ow, Russia.
http://arxiv.org/abs/0704.0354v2
One pessimisti
 quotation from the golden era of �nding exa
t solutions of
the Einstein equations whi
h re�e
ted the relations between parti
le theorists
and experts in GR belongs to Ri
hard Feynman. Taking part in the Interna-
tional Conferen
e on Relativisti
 Theories of Gravitation at Warsaw, he was
writing to his wife [1℄: �I am not getting anything out of the meeting. I am
learning nothing. ... I get into arguments outside the formal sessions (say,
at laun
h) whenever anyone asks me a question or starts to tell me about his
�work�. The �work� is always: (1) 
ompletely un-understandable, (2) vague and
inde�nite, (3) something 
orre
t that is obvious and self-evident but worked out
by a long and di�
ult analysis, and presented as an important dis
overy, or (4)
a 
laim based on the stupidity of the author that some obvious and 
orre
t fa
t,
a

epted and 
he
ked for years, is in fa
t false ... (5) an attempt to do something
probably impossible but 
ertainly of no utility whi
h, it is �nally revealed in the
end, fails ... or (6) just plan wrong ... Remind me not to 
ome to any more
gravity 
onferen
es!� Certainly, I am well aware of that the work presented in
this essay 
ould belong to the 
lass (3) or (5) in the Feynman's 
lassi�
ation
(hopefully, not to the 
lass (6)!), but I will follow Feynman's own words [1℄:
�We all do it for the fun of it� trying to �nd my fun in identifying some links
whi
h 
onne
t the part of the 
ommon lore on general relativity named �Exa
t
solutions of the Einstein equations� to the problem of the GR quantization.
Of 
ourse, Feynman's interest was in the quantization of GR by applying the
path integral approa
h working so well in QED. Solutions of the Einstein equa-
tions de�ne saddle points of the a
tion
2 S = Sgravity + Smatter of the quantum
gravity with matter. However, the 
ontributions of these saddle points into the
partition fun
tion of the theory and �u
tuations near them
Dφmatter exp
(Sgravity + Smatter)
typi
ally have zero measure. In other words, the probability for an almost any
exa
t solution to des
ribe the observable features of the Universe or some parts
of it, to appear somehow from the quantum foam realized near the singularity
is in�nitely small, and the Feynman's anger is absolutely understandable.
Well, almost absolutely... Of 
ourse, there are several 
lasses of solutions
whi
h will be important for the quantum part of the story, too, and one 
an
without mu
h thinking immediately identify some:
1. Attra
tors : among them are Minkowski spa
etime, de Sitter (at least in
the sense of eternal in�ation [2℄) and anti de Sitter spa
etimes (a set of
AdS domains is mostly probably the global attra
tor of GR realized as
low-energy approximation of string theory [3℄); bla
k holes (S
hwarzs
hild,
Kerr, Reissner-Nordström, Kerr-Newman solutions), et
.
From now on, by the quantum theory of gravity we mean e�e
tive QFT of spin 2 �elds
[4℄ (plus matter �elds) � the one whi
h parti
les with energies E ≪ MP test. In this limit,
the e�e
ts of the non-renormalizability may be negle
ted. Although we dis
uss below the
situation whi
h is realized near the 
osmologi
al singularity, we limit the dis
ussion to time
s
ales t ≫ tP .
2. General solutions of the Einstein equations. As usual [5℄, a solution of the
Einstein equations is regarded as general if it 
ontains su�
ient number
of arbitrary fun
tions of 
oordinates. In the 
ase of Ri

i-�at spa
etimes,
this number is 4, and is equal to 8 in the presen
e of hydrodynami
 matter.
While any non-attra
tor type solution of the Einstein equations de�nes the
saddle point for the path integral (1) whi
h does have a vanishing 
ontribution
into the overall partition fun
tion, eventually it well settle down towards an
attra
tor solution due to the e�e
t of 
lassi
al perturbations and/or quantum
�u
tuations. The 
ontribution of attra
tor type saddle points into the partition
fun
tion (1) is therefore signi�
ant. However, the key word here is �eventually�.
For any non-attra
tor solution it takes a time tcoll before the solution rea
hes
its attra
tor asymptoti
s.
Let us 
onstru
t some initial state |Ψ(t = ti)〉 of quantum matter �elds in
a 
urved spa
etime and gravitons. The amplitude 〈Ψ(tf )|Ψ(ti)〉 is then de�ned
by the path integral (1) 
al
ulated on the 
losed S
hwinger-Keldysh 
ontour
from t = ti to t = tf and ba
k. Then, if tf ≪ tcoll, the 
orresponding attra
tor
saddle point does not give any noti
eable 
ontribution into the amplitude.
it is ne
essary to know the evolution of the quantum state |Ψ(t)〉 at time s
ales
t ≪ tcoll, we are for
ed to pay mu
h more attention to the type of saddle

orresponding to general solutions of the Einstein equations.
Certainly, the Einstein equations are hard to solve, and it is possible to �nd
something like their general solution only in physi
ally simpli�ed situations.
As was �rst shown by Belinskii, Khalatnikov and Lifshitz [6℄, asymptoti
ally,
the general solutions of the Einstein equations near the 
osmologi
al singularity
have the very same form for an almost arbitrary 
hoi
e of the matter 
ontent.
This asymptoti
s in the syn
hronous frame
is given by Kasner-like solution
ds2 = dt2 − γαβ(t,x)dxαdxβ , (2)
γαβ(t, x) = t
lαlβ + t
mαmβ + t
nαnβ . (3)
Both Kasner exponents p1, p2, p3 and Kasner axis ve
tors lα, mα and nα are
arbitrary fun
tions of spa
e 
oordinates. The Einstein equations provide two

onstraints on the Kasner exponents
p1 + p2 + p3 = 1, (4)
p21 + p
2 + p
3 = 1, (5)
as well as three other 
onstraints on arbitrary fun
tions of spa
e 
oordinates
present in (3). Taking into a

ount that the 
hoi
e of syn
hronous gauge
g00 = 1, g0α = 0 (6)
Of 
ourse, the time s
ale tcoll itself is a fun
tional of the initial state |Ψ(t = ti)〉.
Often, it is impossible to 
hoose the globally syn
hronous frame of referen
e due to the
limitations set by the 
asuality. However, everywhere in the text we dis
uss the physi
s in a
given 
asual pat
h.
leaves the freedom to make three-dimensional spa
e 
oordinate transformations,
one 
an easily see that the total number of arbitrary 
oordinate fun
tions in the
Kasner-like solution (2),(3) is equal to 4 as it should be expe
ted for a general
solution of Einstein equations 
orresponding to an empty spa
etime.
In the presen
e of the hydrodynami
 matter Kasner solution (2),(3) de-
s
ribes asymptoti
 behavior of metri
s near the singularity,
sin
e 
omponents
of energy-momentum tensor Tik grow slower at t → 0 then the 
omponents of
the Ri

i tensor.
Higher order 
orre
tions to the Kasner solution (2),(3), i.e.,
higher order terms in the expansion of γαβ(t, x) over powers of t play the role
of perturbations whi
h give rise to the time dependen
e of Kasner exponents pi
as well as Kasner axis ve
tors lα, mα and nα and to well-known BKL 
haoti
behavior. Therefore, the BKL solution is simultaneously a universal attra
-
tor for all solutions of the Einstein equations possessing a spa
elike singularity.
It means that no other saddle points 
ontribute into the amplitude (1) in the
vi
inity of the 
osmologi
al singularity.
In this essay, it will be �rst of all shown that in the presen
e of a s
alar �eld
with potential V (φ) whi
h is exponential and unbounded from below, the general
asymptoti
 solution of the Einstein equations is di�erent from the BKL solution
and is quasi-isotropi
 [8℄ (while the BKL solution is essentially anisotropi
). In
parti
ular, we will 
hoose potential of the form
V (φ) = −|V0|ch (λφ) . (7)
S
alar �eld potentials of this form appear in problems related to gauged super-
gravity models [10℄ and the ekpyroti
 s
enario [11℄. The 
osmologi
al singularity
realized in su
h theory is of the Anti de Sitter Big Crun
h type. The physi
s
in its vi
inity it is interesting by itself and even more so sin
e this type of
singularity seems to be realized quite often on the string theory lands
ape [3℄.
As in the 
ase dis
ussed in [6℄, it is 
onvenient to perform all 
al
ulations
in the syn
hronous frame of referen
e where g00 = 1, g0α = 0, gαβ = −γαβ ,
α, β = 1 . . . 3, i.e., the spa
etime interval has the form
ds2 = dt2 − γαβ(t, x)dxαdxβ . (8)
Near the hypersurfa
e t = 0 whi
h 
orresponds to the singularity, the spatial
metri
 
omponents behave as
γαβ(t,x) = aαβ(x)t
2q + cαβ(x)t
d + bαβ(x)t
(i,j)
(x)tfij . (9)
With the same pre
ision, one has in the vi
inity of singularity
φ(t,x) = ψ(x) + φ0(x)log(t) + φ1(x)t
f1 + φ2(x)t
f2 + · · · , (10)
Whi
h 
orresponds everywhere below to the spa
elike hypersurfa
e t = 0.
If there is a s
alar �eld in the matter 
ontent [7℄, BKL solution (3) remains general solution
of the Einstein equations with 
hanged Kasner 
onstraints (4),(5).
The quasi-isotropi
 solution for su
h potentials was �rst found at the ba
kground level in
[9℄, where it was also shown that it is the attra
tor. The goal we pursue in this essay is to
prove that the quasi-isotropi
 solution is also general and to understand how its instability
develops with the 
hange of the form of the potential.
with dots 
orresponding to higher order terms of φ(t,x) expansion in powers
of t. From the Einstein equations one �nds8 that the leading exponents in the
expansions (9) and (10) are de�ned by the expressions
, n = 2, d = 1− q, (11)
ψ(x) = Const, φ0(x) =
, f1 = 1− 3q, f2 = 2− q, (12)
cαα(x) = 2λφ1(x), c
α;β(x) =
1− 2q
1− 3q
φ0φ1,α(x), (13)
P̃ βα (x) + (1 − q)(qbγγ(x)δβα + (1 + q)bβα(x)) =
e−ψλφ2(x), (14)
− (1− q)bαα(x) =
(1− q)φ0φ2(x)−
−ψλφ2(x), (15)
where P̃ βα (x) is the 3-dimensional Ri

i tensor 
onstru
ted from 
omponents of
the tensor aβα(x) as 
omponents of metri
 tensor. Higher order terms in the
expansions (9) and (10) 
an be self
onsistently 
al
ulated by using the Einstein
equations and the orthogonality 
ondition
β = δ
α. (16)
One 
an immediately �nd from Eq. (16) that the higher order exponents in the
metri
 (9) are de�ned by
fij = i+ 2j − (3i+ 2j − 2)q, (17)
where i, j ∈ N. The n term in the metri
 expansion 
orresponds to i = 0, j = 1
and d term � to i = 1, j = 0. It is easy to see that there is no other exponents
in the expansion (9).
Let us examine the formulae (11)-(15) more 
losely and 
al
ulate the number
of arbitrary fun
tions present in this solution. First of all, one 
an immediately
see that the tensor aβα(x) is not 
onstrained by the Einstein equations. It has 6

omponents, and 3 of them 
an be made to be equal to 0 by a three-dimensional

oordinate transformation (the remnant gauge freedom of the syn
hronous gauge
(6)). Sin
e this tensor is used for lowering and rising the indi
es and represents
the leading term in the expansion (9), we will identify the term aαβt
ba
kground 
ontribution to γαβ(t, x). Furthermore, we see from Eqs. (14),(15)
that bαβ 
an be re
onstru
ted from the known tensor aαβ .
The tensor cαβ 
ontains three more arbitrary fun
tions of 
oordinates. In-
deed, it 
an be represented in the form
cβα(x) =
α + Y
;α + Y
Y γγ δ
α + c
(TT)β
α . (18)
Due to the limitations of spa
e we are unable to present the full derivation of the solution
here. It will be given in the forth
oming publi
ation [12℄.
The indi
es of all matri
es are lowered and raised by the tensor aαβ , for example, b
From Eq. (13) one 
an see that its tra
e part de�nes the value of φ2(x) 
on-
tributing to Eq. (10) and therefore provides one arbitrary fun
tion. Then, three

omponents of the ve
tor 
ontribution Yα(x) are �xed, and transverse tra
eless
part c
(TT)β
α (x) provides remaining two arbitrary fun
tions. We also note that
the cαβ term 
an be regarded as the leading term perturbation to the ba
kground

ontribution into γαβ . In parti
ular, it 
ontains the 
ontribution of s
alar per-
turbations (related to the tra
e of the tensor cβα) and tensor perturbations or
gravitons (related to the transverse tra
eless part of the tensor cβα).
The total number of arbitrary fun
tions in the solution (9),(10) is therefore
6, as one may expe
t for the general solution of the Einstein equations with a
s
alar �eld. By analysis similar to [6℄, one may show [9, 12℄ that the 
ontri-
butions of other matter �elds into the overall energy-momentum tensor grow
slower at t → 0 than the 
ontribution of the s
alar �eld. We 
on
lude that
the solution (9), (10) is the general asymptoti
 solution of Einstein equations
(with arbitrary matter 
ontent) near the 
osmologi
al singularity. Similarly to
the BKL solution, the quasi-isotropi
 solution is the universal attra
tor for all
solutions of the Einstein equations with s
alar �eld having the potential (7)
and arbitrary additional matter 
ontent whi
h possess the time-like singularity.
Again, under 
onsidered 
onditions, no other saddle points 
ontribute into the
amplitude (1) in the vi
inity of the Big Crun
h singularity.
It is instru
tive to understand how exa
tly the transition from the quasi-
isotropi
 regime (9),(10) near the singularity to the BKL anisotropi
 regime
(3) happens. This transition 
an be a
hieved by 
hanging the value of λwhile
keeping V0 �xed (or vise versa).
By 
onstru
tion, 2q < d = 1− q, i.e., the exponent aαβt2q in the expansion
(9) is leading. With the in
rease of q, the value of d de
reases and when q rea
hes
the 
riti
al value qc = 1/3, the 
ontributions aαβ(x)t
and cαβ(x)t
into the
expansion of the metri
 (9) be
ome of the same order. Similarly, one 
an 
he
k
that the values of higher order exponents (17) de
rease with the in
rease of q.
In parti
ular, all exponents with di�erent i's and similar j's be
ome of the same
order of magnitude at qc = 1/3. At q > qc = 1/3 the general asymptoti
 solution
of the Einstein equations near the singularity is given by Eq. (3) instead of Eq.
In fa
t, what we have just found is relevant for the quantum part of the
story, too, and in a sense is analogous to the spontaneous symmetry breaking
phenomenon in QFTs. Indeed, let us take the theory with a s
alar �eld
(Φ2 − v)2, (19)
set Φ(x, 0) = 0 as an initial 
ondition and 
ontinuously 
hange the value of
the parameter v. At v > 0 the solution Φ(t, x) = 0 of the 
lassi
al equations
of motion is perturbatively stable and 
orresponds to the true va
uum of the
theory at the quantum level. At ν < 0 the same solution be
omes 
lassi
ally
unstable, and Φ(t, x) rea
hes the �true� va
uum value Φ = ±
v during the time
t ∼ 1
log 1
(with the VEV of the operator Φ̂ having similar behavior at the
quantum level). Similar situation is realized in our 
ase.
At q < qc = 1/3 the quasi-isotropi
 solution (9),(10) is the general solution of
the Einstein equations; it is perturbatively stable by 
onstru
tion (without any
limitations on the weakness of the perturbations). At q > qc the quasi-isotropi
solution be
omes perturbatively unstable (perturbations de�ned by cαβ and
higher order terms grow faster than the ba
kground term aαβ at t→ 0).
Vise versa, at q > qc = 1/3 the BKL anisotropi
 solution if the Einstein
equations is general in the vi
inity of the 
osmologi
al singularity. It is stable
by 
onstru
tion with respe
t to arbitrary perturbations and the stability is lost
at q < qc.
This analysis remains valid for the quantum situation
sin
e the 
anoni-

al phase spa
e is in one-to-one 
orresponden
e with the spa
e of solutions of

lassi
al �eld equations [13℄, and both quasi-isotropi
 and BKL solutions are (a)
general and (b) universal attra
tors for other solutions of the Einstein equations
in the vi
inity of the time-like singularity.
The transition from the regime realized at q < 1/3 to the regime q > 1/3
probably 
orresponds in the quantum level to the 
ondensation of gravitational
perturbations. Indeed, one 
an interpret the higher order 
ontributions in the
expansion (9) as terms 
orresponding to the intera
tion between gravitational
degrees of freedom as well as higher order nonlinearities in the ba
kground. Our

on
lusion is based on the fa
t that at q = qc the spe
trum of the exponents
in the expansion (9) be
omes in�nitely dense. It is also possible to show that
the point of the �phase transition� qc = 1/3 
orresponds at the 
lassi
al level
to the situation when the 
hoi
e of globally syn
hronous frame of referen
e is
impossible near the singularity [12℄.
Let us summarize what have been found in the present essay. We have shown
that in the presen
e of the s
alar �eld with exponential potential unbounded
from below, the general asymptoti
 solution of the Einstein equations near
the 
osmologi
al singularity has quasi-isotropi
 behavior instead of anisotropi
found by [6℄. We have argued that at the quantum level there should exist a
phase transition between the quasi-isotropi
 and anisotropi
 phases, governed
by the strength of the s
alar �eld potential and interpreted this phase transition
as the 
ondensation of gravitational perturbations.
A
knowledgements
I am thankful to A.A. Starobinsky and D. Wesley for the dis
ussions and to
K. Enqvist for making helpful 
omments. While 
ondu
ting this work, I was
supported by Marie Curie Resear
h training network HPRN-CT-2006-035863.
One important 
omment regarding the quantization should be made. The quantum theory
of the s
alar �eld with the potential (7) is ta
hyoni
ally unstable and has neither well-de�ned
asymptoti
 |out〉 states, nor 〈out|in〉 S-matrix. However, the S
hwinger-Keldysh 〈in|in〉 S-
matrix is de�ned, and it is possible to make sense of the 
orresponding time-dependent theory
[12℄.
Referen
es
[1℄ R. Feynman, as told R. Leighton, �What do you 
are what other people
think?�, W.W. Norton, New-York, 1988.
[2℄ A. Linde, �Parti
le physi
s and in�ationary 
osmology�, Harwood, Switzer-
land, 1990 [hep-th/0503203℄; A. Linde, Phys. Lett. 175B, 395 (1983).
[3℄ A. Ceresole, G. Dall'Agata, A. Giryavets, R. Kallosh and A. Linde, Phys.
Rev. D 74 (2006) 086010 [hep-th/0605086℄; A. Linde, JCAP 0701 (2007)
022 [hep-th/0611043℄; T. Clifton, A. Linde, N. Sivanandam, JHEP 0702
(2007) 024 [hep-th/0701083℄.
[4℄ C. Burgess, in �Towards quantum gravity�, ed. D. Oriti, Cambridge Uni-
versity Press, 2006 [gr-q
/0606108℄; C. Burgess, hep-th/0701053.
[5℄ L.D. Landau and E.M. Lifshitz, �The 
lassi
al theory of �elds�, Pergamon
Press, 1979.
[6℄ V.A. Belinskii, E.M. Lifshitz and I.M. Khalatnikov, Adv. Phys. 19, 525
(1970); V.A. Belinskii, E.M. Lifshitz and I.M. Khalatnikov, Adv. Phys.
31, 639 (1982); B. Berger, D. Gar�nkle, J. Isenberg, V. Mon
rief, and M.
Weaver, Mod. Phys. Lett. A 13, 1565 (1998).
[7℄ V.A. Belinskii and I.M. Khalatnikov, Sov.Phys. JETP 36, 591 (1973);
L. Andersson, A. Rendall, Commun. Math. Phys. 218, 479 (2001)
[gr-q
/0001047℄.
[8℄ E.M. Lifshitz, I.M. Khalatnikov, ZhETF 39, 149 (1960) (in russian);
E.M. Lifshitz, I.M. Khalatnikov, Sov. Phys. Uspekhi 6, 495 (1964).
[9℄ J.K. Eri
kson, D. Wesley, P. Steinhardt, N. Turok, Phys. Rev. D 69 (2004)
063514 [hep-th/0312009℄.
[10℄ C.M. Hull, Class. Quant. Grav. 2, 343 (1985); R. Kallosh, A.
Linde, S. Prokushkin, M. Shmakova, Phys. Rev. D 65, 105016 (2002)
[hep-th/0110089℄; R. Kallosh, A. Linde, S. Prokushkin, M. Shmakova,
Phys. Rev. D 66, 123503 (2002) [hep-th/0208156℄.
[11℄ J. Khoury, B.A. Ovrut, P.J. Steinhardt, N. Turok, Phys. Rev. D 64, 123522
(2001) [hep-th/0103239℄; J. Khoury, B.A. Ovrut, N. Seiberg, P.J. Stein-
hardt, N. Turok, Phys. Rev. D 65, 086007 (2002) [hep-th/0108187℄; E.
Bu
hbinder, J. Khoury, B. Ovrut, hep-th/0702154.
[12℄ D. Podolsky, A. Starobinsky, in preparation.
[13℄ Č. Crn
ović and E. Witten, in Three hundred year of gravitation, eds. S.W.
Hawking and W. Israel, Cambridge University Press, 1987, p. 676; G.J.
Zu
kerman, in Mathemati
al aspe
ts of string theory, San-Diego 1986, Ed.
S.-T. Tau, Worlds S
ienti�
,1987, p.259.
http://arxiv.org/abs/hep-th/0503203
http://arxiv.org/abs/hep-th/0605086
http://arxiv.org/abs/hep-th/0611043
http://arxiv.org/abs/hep-th/0701083
http://arxiv.org/abs/gr-qc/0606108
http://arxiv.org/abs/hep-th/0701053
http://arxiv.org/abs/gr-qc/0001047
http://arxiv.org/abs/hep-th/0312009
http://arxiv.org/abs/hep-th/0110089
http://arxiv.org/abs/hep-th/0208156
http://arxiv.org/abs/hep-th/0103239
http://arxiv.org/abs/hep-th/0108187
http://arxiv.org/abs/hep-th/0702154
ABSTRACT
  We discuss generic properties of classical and quantum theories of gravity
with a scalar field which are revealed at the vicinity of the cosmological
singularity. When the potential of the scalar field is exponential and
unbounded from below, the general solution of the Einstein equations has
quasi-isotropic asymptotics near the singularity instead of the usual
anisotropic Belinskii - Khalatnikov - Lifshitz (BKL) asymptotics. Depending on
the strength of scalar field potential, there exist two phases of quantum
gravity with scalar field: one with essentially anisotropic behavior of field
correlation functions near the cosmological singularity, and another with
quasi-isotropic behavior. The ``phase transition'' between the two phases is
interpreted as the condensation of gravitons.

<|endoftext|><|startoftext|>
Introduction
In the last decade interest in the very cool, old white dwarf (WD)
halo population has grown. This interest is motivated by the pos-
sibility that these objects could account for a significant fraction
of the baryonic dark matter of our Galaxy. This idea is in accord
with discussions attempting to explain the microlensing events
in the Large Magellanic Cloud in terms of a halo WD popula-
tion – see, for example, Chabrier et al. 1996 and Hansen 1998.
Alcock et al. 1999 suggested that massive compact halo objects
(MACHOs) make up 20 to 100% of the dark matter in the halo,
with MACHOs having typical mass m ∼ 0.5 M�; more recently,
Calchi Novati et al. 2005 find a similar result from pixel lensing
in the line of sight to M31. Hence, in this scenario the search
for, and direct study of, halo WDs can provide constraints on
the fraction of dark matter in the Milk Way that is attributable to
these objects.
Oppenheimer et al. (2001, hereafter OHDHS) identified 38
high proper motion WDs; from their kinematics, the authors
Send offprint requests to: ducourant@obs.u-bordeaux1.fr
? Based on observations collected at the European Southern
Observatory, Chile (067.D-0107, 069.D-0054, 070.D-0028, 071.D-
0005, 072.D-0153, 073.D-0028)
concluded that they were members of a halo population. Since
then an intense discussion concerning the status of these objects
has taken place in the literature. A comprehensive review of
this debate is presented in Hansen and Liebert 2003 where the
conclusion is that the OHDHS interpretation is possibly over-
stated, but that complete conclusions are not possible without
further data. Other studies suggest that the disk and “thick disk”
Galactic populations can be used to explain the great majority
of the objects (Reid 2005, Kilic et al. 2005, Spagna et al. 2004,
Crézé et al. 2004, Holopainen & Flynn 2004, Flynn et al. 2003,
Silvestri et al. 2002). The importance of the high velocity WDs
cannot be understated in other contexts (e.g. the star forma-
tion history of the Galaxy, see also Davies, King & Ritter 2002,
Hansen 2003, Montiero et al. 2006). Moreover, several stud-
ies emphasise the importance of obtaining trigonometric par-
allaxes for candidate halo WDs (Bergeron & Leggett 2002,
Torres et al. 2002, Bergeron 2003). This is especially important
for the coolest WDs, whose spectral energy distributions show
remarkable departures from black–body distributions and which
are proving to be difficult to model accurately (Kowalski 2006,
Gates et al. 2004, Saumon & Jacobson 1999, Hansen 1998). In
the presence of such radical changes to the WD spectrum, the as-
sumption of a monotonic photometric parallax relation (e.g. as
2 C. Ducourant et al.: Parallaxes of halo white dwarf candidates
used in OHDHS) could break down and estimates of intrinsic
space velocities could be in error seriously. Furthermore, a recent
paper (Bergeron et al. 2005) concludes that precise distances are
mandatory to derive accurate kinematics and ages for the puta-
tive halo WDs and in order to derive their evolutionary status.
Aiming to clear up this question, in 2001 we started an ob-
serving program with the ESO 1.56-m Danish and ESO 2.2-m
telescopes to measure the trigonometric parallaxes of these stars.
Trigonometric parallax measurements remain the only direct un-
biased distance determination. They are of great importance in
the debate about the status of cool halo white dwarfs because
they are required to derive precise space velocities and ages
which are used for distinguishing between halo and disk mem-
bership. These trigonometric parallaxes lead to the re-calibration
of photometric distances used until now in this debate and allow
analysis of the cool halo white dwarf population with more con-
fidence. Unfortunately, due to limited observing time, only 15
stars on the OHDHS list have been observed to date. However,
this sub-sample provides important insight into the problem.
2. Observations
Astrometric observations of 15 of the OHDHS list of 38 halo
white dwarf candidates were performed at the ESO 2.2-m tele-
scope equipped with the WFI wide–field mosaic camera (with
0.238 ′′/pixel, a field of view FOV = 34′ × 33′, 4 × 2 mosaic of
2k × 4k CCDs), through the ESO 845 I filter. To reduce astro-
metric distortions and other instrumental effects, only data from
chip 51 (with FOV = 8′× 16′) were used in this work; target stars
were centered in the FOV of this chip.
Four epochs of observation were acquired at maximum
parallactic factor in Right Ascension in November 2002, July
2003, November 2003 and July 2004 with a total of 11 nights of
observations. Two parallactic periods (four observations over 1.5
years) are required, at a minimum, for a unique determination
of the parallax and proper motion. Two preliminary observing
runs were performed at the ESO 1.56-m Danish telescope in
July 2001 and July 2002 but the subsequent closure of the
telescope forced the authors to move the program to the ESO
2.2-m telescope. Data acquired at the Danish telescope were not
included in our final analysis to avoid systematic effects due to
the use of two different telescopes.
To minimize differential colour refraction effects (DCR), ob-
servations were performed around the transit of targets with hour
angles of less than 1 hour. Multiple exposures were taken at
each observation epoch to reduce the astrometric errors and to
estimate the precision of measurements. Exposure times varied
from 100 to 600 seconds depending on the magnitude of the tar-
get. Each field was observed from 20 to 35 times.
3. Astrometric Reduction
3.1. Measurements
Frames were measured using the DAOPHOT II package (Stetson
1987), fitting a PSF. The significance level of a luminosity en-
hancement over the local sky brightness which was regarded as
real was set to 7σ. The PSF routine was used to define a stel-
lar point spread function for each frame. Finally we obtained the
(x, y) measured positions, the internal magnitudes and associated
errors of all stars on each frame. There were typically 300 to 600
stars measured on each frame depending on the exposure time.
From these, a selection on the error in magnitude (ERRMAG)
as derived by the DAOPHOT II software was applied. Any ob-
servation with ERRMAG ≥ 0.15m was rejected. Objects fainter
than 1.5m brighter than a given image’s limiting magnitude were
also rejected from the analysis.
3.2. Cross-Identification
For each of the 15 different fields of view, we selected a “master”
or fiducial image from the set of 20 to 35 images. This master
frame for each object had the deepest limiting magnitude and
highest image quality. For each of the other images for a given
target object, the positions of all stars not rejected by the crite-
ria above were then cross–identified to the master image’s star
positions. Objects not detected on three or more frames were
excluded, yielding 100 to 200 stars in common in each field.
Frames containing less than Nmaster/3 stars in common with the
master frame were removed from the solution (where Nmaster is
the number of stars in the master frame). Note that the master
frame is processed in an identical fashion to the other frames
and is not assumed to be free of errors in the parallax solution.
In other words, the fiducial frames are not taken as an error-free
“truth”, but are simply used as a basis for coordinate transfor-
mations and correlation of star positions that comprise the astro-
metric grid used in the solution.
3.3. Differential Colour Refraction
Atmospheric refraction changes the apparent positions of stars in
ground–based observations and depends on the zenith distance
of the observations. For precision astrometry this effect must be
accounted for, because it can be many tens of milliarcsececonds
at even relatively modest zenith distances. In our case, another
effect becomes important as well, because the atmospheric re-
fraction of our target stars will not be identical to that of the
background stars used for our astrometric reference grid. Our
target stars (WDs) and the background stars (typically main–
sequence G or K stars) have different spectral energy distribu-
tions. Therefore, atmospheric refraction will affect them differ-
ently when observed through a given filter bandpass. This is
called a differential colour refraction (DCR) and is known to
cause spurious parallactic motion Monet et al. 1992. DCR can
affect both the Right Ascension (RA) and Declination of the tar-
get as derived with respect to the field stars. Observations in par-
allax programs are planned to maximize the parallactic factor in
RA so the parallax solution for the target will rely heavily on
the RA measured. Therefore the parallax derived is mainly per-
turbed by the DCR effects in RA which are critically dependent
on the zenith distance of a given observation.
We investigated the impact of such effects on the parallax
of white dwarfs through simulations. Using the usual formula
for atmospheric refraction, a blackbody approximation for white
dwarf and background stellar spectra, the Besançon Galaxy
model for background star characteristics (Robin et al. 1994)
and ESO 845 filter limits, we computed the average differen-
tial colour refraction effects between a white dwarf similar to
those of our list with effective temperatures, Teff , in the range
4000 K to 11000 K (Bergeron et al. 2005, Table 2) and a typical
background star (Teff ∼ 5000 K).
We present in Fig. 1 the effects of DCR in RA for white
dwarfs situated at δ = −30◦, covering the range of temperatures
of our targets. Fig. 1 demonstrates that the impact of DCR effects
were always less than 0.5 mas for observations taken with an
hour angle of less than one hour. Therefore, our observations
C. Ducourant et al.: Parallaxes of halo white dwarf candidates 3
were made specifically so that the hour angle never exceeded
one hour, and DCR corrections were not applied in this work.
Fig. 1. DCR effects in RA between a white dwarf of temperature
Teff and a mean background stars (Teff=5000K) at a Declination
of −30◦ (representative of our sample) for various hour angles of
observation. The DCR effects appear to be always lower than 0.5
mas for observations performed at less than 1 hour from merid-
ian which is the case of the present project. The DCR effects are
then negligible compared with other sources of astrometric error
and were not taken into account in this work.
3.4. Impact of Pixel Scale Errors on Parallax
Proper motions (µx, µy) and trigonometric parallax (πxy) of tar-
gets are determined by comparing the (x, y) measurements ex-
pressed in pixels. A scaling factor S f , the image pixel scale, is
applied to πxy to convert pixel measurements into physical units:
π = S fπxy; d(pc) = π−1, with S f expressed in ′′/pixel.
Derivation of the pixel scale can be achieved through a
cross–correlation between the (x, y) positions of stars on a given
master frame to corresponding values of (α, δ) for the subset of
stars that are also in a reference catalogue. Here we used the
2MASS catalogue (Cutri et al. 2003) to determine the orienta-
tion of the master frame on the sky and for the pixel scale deter-
mination. We selected the 2MASS catalogue as a reference cat-
alogue because of its accuracy and density although we note the
absence of proper motion corrections. Nevertheless the epoch
difference between our observations and the 2MASS catalogue
(3 years) would result in negligible corrections to the catalogue
positions with respect to the catalogue errors.
Errors on the scale so determined, resulting from catalogue
random errors, will produce errors in the distance determination
of the target. It is therefore important to quantify the impact of
the catalogue errors onto the distance of the target.
To measure this impact in the present work, we assumed N
reference stars equally spread over a square detector of side A.
The classical equation relating the (x, y) measurements of a stars
on the frame to its standard coordinates X(α, δ),Y(α, δ) in the
tangent plane to the celestial sphere is (with a similar equation
in the Y coordinate)
X = (ax + by + c)1/F, (1)
where (a,b,c) are the unknown “plate” constants and F the focal
length of the telescope (typically the value indicated in the ref-
erence manual). F is expressed in the same units as (x,y) and A
(pixel, mm). It is then easy to show that a fair approximation of
the variance of the estimation of parameter (a) is given by
�cat, (2)
where �cat is the catalogue precision (expressed in radians).
Similar results can be found in Eichhorn & Williams 1963. We
can express the parallax (in radians) as:
πxy, (3)
σ2π = π
σ2a, σπ =
�cat (4)
with F ∼ 13m, A ∼ 0.03m, we evaluate here σπ ∼ 10−4π. The
impact of the error of the catalogue on the parallax of the target
is far below the measurement errors (typically a few milliarcsec-
onds) and are therefore negligible.
3.5. Global Solution: Relative Parallax
The astrometric reduction of the whole set of data of each
field is performed iteratively through a global central overlap
procedure (Hawkins et al. 1998, Eichhorn 1997) in order to
determine simultaneously the position, the proper motion and
the parallax of each object of the field.
The following condition equations are written for each star
on each of the N frames considered (including the master frame).
These equations relate the measured coordinates to the stellar
astrometric parameters:
X0 + ∆X0 + µX(t − t0) + πFX(t) = a1x(t) + a2y(t) + a3 (5)
Y0 + ∆Y0 + µY (t − t0) + πFY (t) = b1x(t) + b2y(t) + b3 (6)
where (X0,Y0) are the known standard coordinate of the star at
the epoch t0 of the master frame, and (x(t), y(t)) its measured
coordinates on the frame (epoch t) to be transformed into the
master frame system. ∆X0, ∆Y0, µX , µY and π are the unknown
stellar astrometric parameters: (∆X0, ∆Y0) yield correction of the
standard coordinates of the star on the master frame, (µX , µY ) are
the projected proper motion in RA ∗cos(δ) and Dec, and π is the
parallax. Coefficients (ai, bi) are the unknown frame parameters
which describe the transformation to the master frame system.
(FX , FY ) are the parallax factors in standard coordinates. The
unknowns of this large over–determined system of equations are
the stellar astrometric parameters of each object, and the trans-
formation coefficients of each of the N frames considered. The
system of equations is singular and therefore the derived solution
is not unique; any solution will depend on the starting point of
the iterations. The usual technique to obtain a particular solution
is to introduce a set of constraints that the solution must satisfy.
In this work we chose to set strictly to zero the mean parallax of
the reference stars.
We used a Gauss–Seidel type iterative method to solve the
set of equations. At the first iteration all stellar parameters are
assumed null, we then computed the plate constants which are
4 C. Ducourant et al.: Parallaxes of halo white dwarf candidates
injected into the system of equations to derive the stellar param-
eters. These results are then used as the starting point of the fol-
lowing iteration. The iterative procedure converges usually at the
second or third iteration. A test of elimination at 3σ is applied to
remove poor observations either in the master frame fit or in the
stellar parameters fit. The stellar parameters fit equations have
been weighted by the mean residual of the master frame fit. This
weighting represents the quality of the measurements. The stars
used for the master frame fit are called here reference stars.
We applied this global treatment to the various observations
of the 15 fields observed and we derived for the targets a proper
motion and parallax with associated variances.
3.6. Conversion from Relative to Absolute Parallax
The parallaxes that we derived for our targets are relative to the
reference stars (for which we used the constraint
π = 0), sup-
posed placed at infinite distance. In fact these reference stars are
at a finite distance from Sun. We must therefore correct the rela-
tive parallax of the target from an estimate of the mean distance
of the reference stars to obtain the absolute parallax of the target.
The choice we made to keep as many reference stars as possible
in our calculation is interesting because statistically faint stars
have smaller parallax and require smaller correction.
There are several ways to estimate the mean distance of
reference stars: statistical methods relying on a model of the
Galaxy; spectroscopic parallax; and photometric parallax. For
the corrections from relative parallax to absolute parallax we
used a statistical method relying on simulations using the
Besançon Galaxy model (Robin et al. 1994) to derive the theo-
retical mean distance of reference stars. A simulation of each ob-
served field was performed, providing catalogues of distance and
apparent magnitude of simulated stars. We computed in these
catalogues mean distances and associated dispersion in magni-
tude bins of 0.2 mag, establishing a table of theoretical distances
with respect to apparent magnitude. Then we considered our ob-
served fields and we computed the weighted mean parallax and
associated dispersion of our reference stars using the theoretical
table. Finally we added this mean parallax of reference stars to
the relative parallax of our target leading to the absolute parallax
of the white dwarfs.
We give in Table 1 the relative to absolute corrections in mil-
liarcseconds as found from the Besançon Galaxy model in each
of the field treated.
4. Results
4.1. Distances of Halo White Dwarf Candidates
We present in Table 2 the proper motions and absolute parallaxes
of the fifteen halo white dwarf candidates as derived from this
work together with their absolute magnitude MV computed using
CCD V magnitudes from Bergeron et al. (2005).
One notices that WD2326–272, LP586–51, LP588–37, and
WD2324–595 are too distant to have a measurable parallax.
Eleven objects are at distances ranging from 19 pc to 90 pc from
the Sun. The parallax errors are about 1–2 mas corresponding
to relative precisions of 5 to 20%. WD2214–390, which is the
closest and brightest object, has a σπ = 2.6 mas. This poor pre-
cision is due to the short exposure time used to avoid saturation
problems and corresponding lower signal–to–noise ratio.
We present in Figs 8 and ?? the positions (empty circles),
their weighted mean (filled circles) and associated error bars at
Table 1. Relative to absolute corrections ∆π and associated
RMS (σ) as found from the Besançon Galaxy model in the
Galactic direction (l,b) together with number of reference stars
(N*) in magnitude interval [Jmin,Jmax].
Target l b ∆π σ N* Jmin Jmax
[◦] [◦] [mas] [mag]
WD2214-390 2.79 -55.37 1.3 0.3 38 13.1 16.2
WD2242-197 40.01 -59.42 1.0 0.3 97 14.0 18.4
WD2259-465 344.30 -60.62 1.1 0.2 83 13.6 18.0
LHS542 72.40 -59.70 1.2 0.3 42 13.4 17.0
WD2324-595 321.83 -54.34 1.1 0.2 62 13.3 17.0
WD2326-272 27.66 -71.06 1.3 0.4 80 14.2 18.7
LHS4033 90.24 -61.96 1.3 0.2 39 14.2 16.5
LHS4041 351.44 -74.66 1.4 0.3 37 13.5 16.2
LHS4042 6.55 -76.61 1.5 0.4 38 13.3 16.6
WD0045-061 118.54 -68.96 1.5 0.3 54 13.5 17.7
F351-50 314.26 -83.50 0.3 0.2 53 14.1 18.1
LP586-51 128.88 -63.30 1.3 0.3 47 14.1 17.4
WD0135-039 149.30 -64.53 1.3 0.2 82 14.4 19.0
LP588-37 150.44 -61.52 1.4 0.2 57 13.6 17.7
LHS147 178.72 -73.56 1.5 0.3 43 13.4 16.8
each epoch of observation, together with the fitted path for the
eleven most significant parallaxes, where π/σπ ≥ 4.
4.2. Comparison with Published Distances
We have compared our results with available data from the lit-
erature, employing both trigonometric and photometric paral-
laxes measured previously. We give in Table 3 the comparison
with published trigonometric parallaxes and in Figure 2 a com-
parison of the parallaxes derived in this work with photomet-
ric parallaxes (from OHDHS, where photometric parallax errors
were 20%). Parameters of a weighted linear fit between pho-
tometric and trigonometric parallaxes are: πtrig = a.πphot + b
with a = 1.08+/-0.08 and b = 3.21+/-1.56 [mas] with a reduced
χ2 =8.06.
Table 3. Comparison of trigonometric parallaxes from this
work (πThiswork) with published data (πext) for LHS 147
(Van Altena et al. 1995), LHS 4033 (Dahn et al. 2004) and LHS
542 (Bergeron et al. 2005).
Target πThiswork πext ∆π
[mas] [mas] [mas]
LHS 542 29.6 +/- 1.8 32.2 +/- 3.7 2.6
LHS 147 14.8 +/- 1.8 14.0 +/- 9.2 –0.8
LHS 4033 30.1 +/-1.8 33.9 +/- 0.6 3.8
Our parallaxes are in excellent agreement with the 3 pre-
viously published trigonometric parallaxes, within the errors
(which are considerably smaller in two cases than published val-
ues). In Fig. 2 one notices a clear systematic tendency of pho-
tometric parallaxes to be underestimated. This overestimation of
OHDHS distances is of importance in the calculation of WD
kinematics and space density.
C. Ducourant et al.: Parallaxes of halo white dwarf candidates 5
Table 2. Proper motion and absolute parallaxes of the fifteen halo white dwarf candidates, where µα∗ = µαcos(δ) and σµ=σµα∗=σµδ ;
π and σπ are the parallax and its precision, Dist the derived distance in parsec and MV the absolute magnitude. No value is given
for Dist and Mv when the parallax is not better than 3 σ. N* is the number of reference stars and Nf the Number of frames. Dphot
is the photometric distance from OHDHS and V is extracted from Bergeron et al. (2005) when available, otherwise (cases marked
by an asterix) it comes from Salim et al. (2004). Note that LHS 4041 is in the OHDHS sample, but is not listed in OHDHS Table 1
(see Table 4 of Salim et al. 2004)
Name α δ Epoch V µα∗ µδ σµ π σπ Dist Mv N* Nf Dphot
[J2000] [yr] [mag] [mas/yr] [mas] [pc] [mag] [pc]
WD2214–390 22 14 34.727 –38 59 07.05 2003.5 15.92 1009 –350 2.9 53.5 2.6 19 14.78 38 28 24
WD2242–197 22 41 44.252 –19 40 41.41 2003.5 19.74 359 +48 3.1 11.1 2.3 90 14.89 97 27 117
WD2259–465 22 59 06.633 –46 27 58.86 2002.9 19.56 402 –153 1.8 22.7 1.3 44 16.49 83 32 71
LHS542 23 19 09.518 –06 12 49.92 2003.5 18.15 –615 –1576 1.8 29.6 1.8 34 15.58 42 33 42
WD2324–595 23 24 10.165 –59 28 07.95 2003.5 16.79 136 –562 1.8 (3.1) 1.5 —- —- 62 25 58
WD2326–272 23 26 10.718 –27 14 46.68 2002.9 ∗19.92 574 –85 2.7 (6.2) 2.4 —- —- 80 17 108
LHS4033 23 52 31.941 –02 53 11.76 2002.9 16.98 631 298 2.5 30.1 1.8 33 14.38 39 26 63
LHS4041 23 54 18.793 –36 33 54.60 2002.9 ∗15.46 21 –662 1.8 13.4 1.5 75 11.10 37 27 59
LHS4042 23 54 35.034 –32 21 19.44 2003.5 17.41 421 –37 2.2 13.9 1.8 72 13.13 38 25 85
WD0045–061 00 45 06.325 –06 08 19.65 2002.9 18.26 111 –668 1.9 30.1 1.9 33 15.59 54 27 44
F351–50 00 45 19.695 –33 29 29.46 2003.5 19.01 1820 –1476 2.1 28.3 1.4 35 16.63 53 34 37
LP586–51 01 02 07.181 –00 33 01.82 2002.9 18.18 350 –118 3.6 (2.4) 2.7 —- —- 47 24 120
WD0135–039 01 35 33.685 –03 57 17.90 2002.9 19.68 456 –180 3.4 13.3 2.9 75 15.26 82 21 146
LP588–37 01 42 20.770 –01 23 51.38 2002.9 ∗18.50 112 –328 3.4 (1.4) 4.5 —- —- 57 17 120
LHS147 01 48 09.120 –17 12 14.08 2002.9 17.62 –115 –1094 2.1 14.8 1.8 68 13.46 43 29 71
Fig. 2. Comparison of parallaxes derived in this work with pho-
tometric parallaxes from OHDHS (errors are assumed 20% for
πphot). Parameters of a weighted linear regression (diagonal line)
between both types of parallaxes are π = 1.08πphot + 3.21 [mas]
with a reduced χ2 = 8.06. The photometric distances are sys-
tematically larger than the astrometric ones.
4.3. Proper Motions
We have compared the proper motions derived here with the
OHDHS proper motions in order to check wether some system-
atic effects could affect our proper motions derived on a 1.5 yr
time span and, as a result, our parallaxes. We present this com-
parison in Fig. 3 and Fig. 4. Error bars are drawn in both co-
ordinates but since the present work has much higher precision
than the photographic astrometry, the error bars in x are not vis-
ible. The slope of a linear regression between proper motions in
α cos(δ) derived in this work with the OHDHS proper motions
is 1.04 ± 0.02 with a reduced χ2 = 3.7. The equivalent linear
fit in proper motions in Declination has a slope of 1.01 ± 0.02
with a reduced χ2 = 0.7. For F351-50 (the largest error bars in
both figures), the accordance in RA and Dec proper motions is
not good. This is due to a known problem of contamination by
a background galaxy of the Schmidt plate measurements used
in the OHDHS work. Nevertheless the accordance is within 2σ.
These comparisons show excellent agreement between both sets
of proper motions, and argue against any systematic effects from
the present work.
4.4. Space Velocities
We derived the Galactic space velocities U, V, W
(Johnson and Soderblom 1987) for the white dwarfs using
the distances and proper motions measured here together with
radial velocities from Salim et al. 2004 (data available for 9
of the 15 white dwarfs treated here). Salim’s observed radial
velocities were corrected for a mean gravitational redshift of
+28km/s as suggested by the authors in their paper except in
the case of the very massive white dwarf LHS4033 were the
correction was taken from Dahn et al. 2004. U is radial toward
the Galactic center, V is in the direction of rotation and W
perpendicular to the Galactic disk. U,V and W were corrected
for the Sun’s peculiar velocity (Mihalas and Binney (1981)).
When no radial velocity was available from other studies, we
assumed Vr = 0 km/s. This approximation is acceptable due to
its minor impact on U,V velocities since the targets are located
6 C. Ducourant et al.: Parallaxes of halo white dwarf candidates
Fig. 3. Comparison of proper motions in RA cos(δ) with the
OHDHS proper motions. Error bars are drawn in both coordi-
nates but since the present work has much higher precision than
the photographic astrometry, error bars in abscissae are not visi-
ble. The slope of a linear regression (dotted line) is 1.04 ± 0.02
indicating good accordance between both proper motion data
sets with a reduced χ2 = 3.7.
close to South Galactic Cap (the effect was investigated in
OHDHS and shown to be negligible).
We present in Figure 5 the distribution of velocities in
the Galactic plane together with the velocity dispersion for
the disk (right most)(1, 2 and 3 σ), thick disk (middle)(1,
2 and 3 σ)( Fuhrmann 2004) and halo (left) (1 and 2σ)
(Chiba and Beers 2000) and in Figure 6 the component of mo-
tion perpendicular to the Galactic plane. These two figures con-
cern the 11 objects with parallax measured at the 4σ level or
better.
In Fig. 5 one notices that 4 of the 11 studied WDs have a
velocity incompatible at the 3σ level with the kinematic of the
disk and of the thick disk and that 6 of them are incompatible at a
2σ level. No star lies within the 1σ ellipse of the disk, primarily
because of selection effects in the original proper motion survey
that OHDHS based is based upon Hambly et al. 2005.
Obviously the choice of the center and dispersions of halo,
thick disk and disk ellipses is critical to classify objects as be-
longing to a particular population. We adopted recent values
which are in in the range of the values cited by Reid 2005
in his review: Disk (Fuhrmann 2004) : (U,V) = (7.7, −18.1)
km/s, (σU , σV ) = (42.6, 22.6) km/s; thick disk (Fuhrmann 2004):
(U, V) = (-18, −63) km/s, (σU , σV ) = (58, 41) km/s; halo
(Chiba and Beers 2000): (U,V) = (0, −180) km/s, (σU , σV ) =
(141, 106) km/s.
5. Discussion
As discussed above, OHDHS sparked a lively debate about
whether stellar remnants contribute to a significant fraction of
the baryonic component of the putative dark matter halo of our
Galaxy. The main criticisms have concerned interpretation, and
we do not address those here. However, the photographic pho-
Fig. 4. Comparison of proper motions in Declination derived
in this work with the OHDHS proper motions. Error bars are
drawn in both coordinates but since the present work has much
higher precision than the photographic astrometry, error bars in
abscissae are not visible. The slope of a linear regression (dotted
line) is 1.01 ± 0.02 indicating a good accordance between both
proper motion data sets with a reduced χ2 = 0.7.
tometry and use of a single photometric parallax relation are also
potential sources of systematic error. Both Salim et al. (2004)
and Bergeron et al. (2005) have shown that the original photom-
etry presented in OHDHS was as accurate as could be expected.
Here, we address the question of the accuracy of photometric
parallaxes directly via trigonometric determination of distances.
In Fig. 2 we compare the trigonometric parallaxes derived
here with the OHDHS photometric parallaxes. Parameters of a
weighted linear regression between both types of parallaxes are
π = 1.08 πphot + 3.21 with a reduced χ2 = 8.06. A clear under-
estimation of photometric parallaxes is visible in this figure with
only one point below the diagonal and three points more than
3σ above the relation. With the usual caveat of small number
statistics, this indicates some level of non–Gaussian scatter, or
at least a mean value for the relation that is not coincident with
π = πphot. The photometric parallax overestimates the distance.
This leads, of course, to an overestimation of tangential space
velocities based on proper motion and distance (as an aside, we
note that the quoted photometric parallax errors of 20% were
conservatively overestimated by OHDHS).
It is interesting to note that the mass distribution of hot
(Teff > 12, 000 K) DA WDs is not Gaussian and has a broad
tail on the high mass side (Należyty et al. 2005). Given that ra-
dius r ∝ m−1/3 for WDs, we would expect photometric paral-
laxes to tend to overestimate rather than underestimate distances
since some of the sample may have higher than average mass,
and correspondingly smaller radii, placing them nearer to the
Sun than typical objects of the same colour. Adding in a sprin-
kling of higher mass WDs with helium–dominated atmospheres
will introduce further systematic overestimation of distances. It
is almost certainly the case that the discrepant photometric par-
allaxes for WD2259–465 and WD0135–039 are caused by these
effects; indeed, this has been shown to be the case for LHS 4033
C. Ducourant et al.: Parallaxes of halo white dwarf candidates 7
Fig. 5. Distribution of velocities in the Galactic plane to-
gether with the velocity dispersion for the disk (right most)(1,
2 and 3 σ), thick disk (middle)(1, 2 and 3 σ)( Fuhrmann 2004)
and halo (left) (1 and 2σ) (Chiba and Beers 2000). Filled
squares correspond to objects with a measured radial velocity
(Salim et al. 2004) while open circles correspond to objects with
no Vr measurement. Only objects with parallax measured at the
4σ level or better are plotted.
which has a mass m ∼ 1.3 M� (Dahn et al. 2004). On the other
hand, the low–mass side of the mass distribution is by no means
perfectly Gaussian (e.g. due to low-mass, helium–core white
dwarfs formed in close binaries). Moreover, any overestimation
in distance leads to a corresponding underestimate of space den-
sity using the 1/Vmax technique. So the interpretation of the re-
sults from this relatively small sub–sample is rather complicated,
and it is only through detailed simulations compared with much
larger samples that significant progress is likely to be made con-
cerning the question of the kinematic population of such objects.
From the comparison of trigonometric and photometric
parallaxes (Fig. 2) we recalibrated photometric distances of
the original OHDHS sample and, using radial velocities from
Salim et al. 2004, we derived their associated recalibrated space
velocities. We present the recalibrated UV plane for the entire
OHDHS sample in Fig. 7.
When compared to Fig. 3 of OHDHS, the number of halo
objects has diminished. From the 38 original OHDHS halo can-
didates, 16 appear compatible with a halo status based on a 2σ
cut with the disk and thick disk velocity distributions (a 3σ cut
would reduce this number to 7), the remaining objects being now
located within the disk and thick disk 2 sigma ellipses. In the lit-
erature there is a large spread of the proposed values to charac-
terise the thick disk and halo populations in terms of kinematics.
For instance in Reid 2005 the velocity dispersions for thick disk
vary from 50 to 69 km/s in the U direction and from 39 to 58
km/s in the V direction. Even the center of velocity ellipsoid
varies from –30 to –63 km/s in the < V > coordinate from one
author to another. All this makes it very difficult to separate ob-
jects into halo and thick disk populations and requires a more
detailed analysis which is beyond the scope of the present paper.
Fig. 6. Component of motion perpendicular to the Galactic
plane (W) as function of
U2 + V2. Only objects with paral-
lax measured at the 4σ level or better and with available radial
velocity (Salim et al. 2004) arre plotted. The vertical line is the
OHDHS
U2 + V2 = 94 km/s cut.
The conclusions of OHDHS about local halo WD density
must be now reanalysed since the volume explored by their sur-
vey has changed (re-calibrated distances) and the number of halo
candidates has also changed. This will be the subject of a forth-
coming paper.
6. Acknowledgements
The authors wish to thank G. Daigne for helpful comments and
CAPES/COFECUB, FAPESP organizations and INR for sup-
porting the project.
References
Alcock, C. et al. 1999, in ASP Conf. Ser. 165, The third Stromlo Symposium:
the Galactic Halo, ed. B.K. Gibson, T.S. Axelrod and M.E. Putman (San
Francisco:ASP),362
Bergeron, P., Leggett, S. K. 2002, ApJ, 580, 1070
Bergeron, P. 2003, ApJ, 586, 201
Bergeron, P.; Ruiz, M.–T.; Hamuy, M.; Leggett, S. K.; Currie, M. J.; Lajoie,
C.–P.; Dufour, P. 2005, ApJ, 625, 838
Calchi Novati, S. et al. 2005, A&A, 443, 911
Chabrier G., Segretain L. and Mra D., 1996, ApJ, 468, L21-L24
Chiba, M., and Beers, T.C., 2000, AJ, 119, 2843
Crézé, M., Mohan, V., Robin, A. C., Reylé, C., McCraken, H. J., Cuillandre,
J.–C., Le Fèvre, O., Mellier, Y., 2004, A&A, 426, 65
Cutri R. M., Skrustskie M. F., Van Dyk S. et al., 2003
Dahn, C. C., Bergeron, P., Liebert, J., Harris, H. C., Canzian, B., Leggett, S. K.,
Boudreault, S. 2004, ApJ, 605, 400
Davies, M. B., King, A. R., Ritter, H. 2002, MNRAS, 333, 469
Eichhorn, H. and Williams, C.A. 1963, AJ, 68, 221
Eichhorn, H.1997, Astron. Astrophys., 327, 404
Flynn, C., Holopainen, J., Holmberg, J., 2003, MNRAS, 339, 817
Fuhrmann K. 2004, Astron. Nact. 325:3-80
Gates, E. et al. 2004, ApJ, 612, 129L
Hansen, B. M. S., 1998, Nature, 394, 860
Hansen, B. M. S. 2003, ApJ, 582, 915
Hansen, B. M. S. and Liebert, J. 2003, ARA&A, 41,465
Hambly, N. C., Digby, A. P., Oppenheimer, B. R., 2005, ASPC, 334, 113
8 C. Ducourant et al.: Parallaxes of halo white dwarf candidates
Fig. 7. Distribution of velocities of the original OHDHS sam-
ple with recalibrated parallaxes in the Galactic plane together
with the velocity dispersion for the disk (right most)(1, 2 and
3 σ), thick disk (middle) (1, 2 and 3 σ)( Fuhrmann 2004)
and halo (left) (1 and 2σ) (Chiba and Beers 2000). Filled
squares correspond to objects with a measured radial velocity
(Salim et al. 2004) while open circles correspond to objects with
no Vr measurement.
Hawkins, M. R. S., Ducourant, C., Jones, H. R. A. and Rapaport, M., 1998,
MNRAS, 294, 505
Holopainen, J., Flynn, C., 2004, MNRAS, 351, 721
Johnson, D. R. H., Soderblom, D. R. 1987, AJ, 93, 864
Kilic, M., Mendez, R. A., von Hippel, T., Winget, D. E., 2005, ApJ, 633, 1126
Kowalski, P. M. 2006, ApJ, 641, 488
Należyty, M., Madej, J., Althaus, L. G. 2005, ASPC 334, 107
Mihalas, D., Binney, J. 1981, ”Galactic astronomy”, second edition.
Monet D.G., Dahn C.C., Vrba F.J., Harris H.C., Pier J.R., Luginbuhl C.B., Ables
H.D., 1992, AJ, 103, 638
Montiero, H., Jao, W.–C., Henry, T., Subasavage, J., Beaulieu, T. 2006, ApJ, 638,
Oppenheimer, B. R., Hambly, N. C., Digby, A. P., Hodgkin, S. T., Saumon, D.
2001, Science, 292, 698 (OHDHS)
Reid, I. N., 2005, ARA&A, 43, 247
Robin, A., 1994, ApSS, 217, 163R
Salim, S., Rich, R. M., Hansen, B. M., Koopmans, L. V. E., Oppenheimer, B. R.,
Blandford, R. D., 2004, ApJ, 601, 1075
Torres, S., Garcı́a–Berro, E., Burket, A., Isern, J. 2002, MNRAS, 336, 971
Saumon, D.; Jacobson, S. B. 1999, ApJ, 511, L107
Silvestri, N. M., Oswalt, T. D., Hawley, S. L. 2002, AJ, 124, 1118
Spagna, A., Carollo, D., Lattanzi, M. G., Bucciarelli, B. 2004, A&A, 428, 451
Stetson Peter B., 1987, PASP, 99, 191
Van Altena, W. F., Lee J. T., Hoffleit E. D. 1995, General Catalogue
of Trigonometric Stellar Parallaxes, Fourth Edition, Yale University
Observatory
C. Ducourant et al.: Parallaxes of halo white dwarf candidates 9
Fig. 8. Observations along the fitted path expressed in mas.
	Introduction
	Observations
	Astrometric Reduction
	Measurements
	Cross-Identification
	Differential Colour Refraction
	Impact of Pixel Scale Errors on Parallax
	Global Solution: Relative Parallax
	Conversion from Relative to Absolute Parallax
	Results
	Distances of Halo White Dwarf Candidates
	Comparison with Published Distances
	Proper Motions
	Space Velocities
	Discussion
	Acknowledgements
ABSTRACT
  The status of 38 halo white dwarf candidates identified by Oppenheimer et al.
(2001) has been intensively discussed by various authors. In analyses
undertaken to date, trigonometric parallaxes are crucial missing data. Distance
measurements are mandatory to kinematically segregate halo object from disk
objects and hence enable a more reliable estimate of the local density of halo
dark matter residing in such objects.
  We present trigonometric parallax measurements for 15 candidate halo white
dwarfs (WDs) selected from the Oppenheimer et al. (2001) list. We observed the
stars using the ESO 1.56-m Danish Telescope and ESO 2.2-m telescope from August
2001 to July 2004. Parallaxes with accuracies of 1--2 mas were determined
yielding relative errors on distances of $\sim5$% for 6 objects, $\sim12$% for
3 objects, and $\sim20$% for two more objects. Four stars appear to be too
distant (probably farther than 100 pc) to have measurable parallaxes in our
observations. Distances, absolute magnitudes and revised space velocities were
derived for the 15 halo WDs from the Oppenheimer et al. (2001) list. Halo
membership is confirmed unambiguously for 6 objects while 5 objects may be
thick disk members and 4 objects are too distant to draw any conclusion based
solely on kinematics. Comparing our trigonometric parallaxes with photometric
parallaxes used in previous work reveals an overestimation of distance as
derived from photometric techniques. This new data set can be used to revise
the halo white dwarf space density, and that analysis will be presented in a
subsequent publication.

<|endoftext|><|startoftext|>
Introduction
Neutron stars following a core collapse supernova are rotating at birth and
can be subject to various nonaxisymmetric instabilities (see e.g. [1] for a re-
view). Among those, if the rotation rate is high enough so that the ratio of
rotational kinetic energy T to gravitational potential energy W , β ≡ T/|W |,
exceeds the critical value βd ∼ 0.27, inferred from studies with incompressible
Maclaurin spheroids, the star is subject to a dynamical bar-mode (l = m = 2
Preprint submitted to Elsevier 30 July 2021
http://arxiv.org/abs/0704.0356v1
f -mode) instability driven by hydrodynamics and gravity. Its study is highly
motivated nowadays as such an instability bears important implications in
the prospects of detection of gravitational radiation from newly-born rapidly
rotating neutron stars.
Simulations of the dynamical bar-mode instability are available in the litera-
ture, both using simplified models based on equilibrium stellar configurations
perturbed with suitable eigenfunctions [2,3,4,5], and more involved models for
the core collapse scenario [6,7,8,9], and in either case both in Newtonian grav-
ity and general relativity. Due to its superior simplicity the former approach
has received much more attention, notwithstanding that the conclusions drawn
from perturbed stellar models may not be straightforwardly extended to the
collapse scenario.
Newtonian simulations of triaxial instabilities following core collapse were first
performed by [6]. These showed that the bar-mode instability sets in when
β ≫ 0.27 and when the progenitor rotates rapidly and highly differentially.
Such conditions are met when the (artificial) depletion of internal energy to
trigger the collapse is large enough to produce a very compact core for which a
significant spun-up can be achieved. More recently, three-dimensional simula-
tions of the core collapse of rotating polytropes in general relativity have been
performed by [7]. These authors studied the evolution of the bar-mode insta-
bility starting with axisymmetric core collapse initial models which reached
values of β ∼ 0.27 during the infall phase. These simulations showed that the
maximum value of β achieved during collapse and bounce depends strongly
on the velocity profile, the total mass of the initial core, and on the equa-
tion of state. In agreement with the findings from the Newtonian simula-
tions of [6], the bar-mode instability sets in if the progenitor rotates rapidly
(0.01 ≤ β ≤ 0.02) and has a high degree of differential rotation. In addition,
the artificial depletion of pressure and internal energy to trigger the collapse,
leading to a compact core which subsequently spins up, also plays a key role
in general relativity for a noticeable growth of the bar-mode instability.
Whether the requirements inferred from numerical simulations are at all met
by the collapse progenitors remains unclear. As shown by [10] magnetic torques
can spin down the core of the progenitor, which leads to slowly rotating neu-
tron stars at birth (∼ 10 − 15ms). The most recent, state-of-the-art compu-
tations of the evolution of massive stars, which include angular momentum
redistribution by magnetic torques and spin estimates of neutron stars at
birth [11,12], lead to core collapse progenitors which do not seem to rotate
fast enough to guarantee the unambiguous growth of the canonical bar-mode
instability. Rapidly-rotating cores might be produced by an appropriate mix-
ture of high progenitor mass (M > 25M⊙) and low metallicity (N. Stergioulas,
private communication). In such case the progenitor could by-pass the Red
Supergiant phase in which the differential rotation of the core produces a
magnetic field by dynamo action which couples the core to the outer layers
of the star, transporting angular momentum outwards and spinning down the
core. According to [13] about 1% of all stars with M > 10M⊙ will produce
rapidly-rotating cores.
On the other hand, Newtonian simulations of the bar-mode instability from
perturbed equilibrium models of rotating stars have shown that βd ∼ 0.27
independent of the stiffness of the equation of state provided the star is not
strongly differentially rotating. The relativistic simulations of [5] yielded a
value of β ∼ 0.24 − 0.25 for the onset of the instability, while the dynamics
of the process closely resembles that found in Newtonian theory, i.e. unstable
models with large enough β develop spiral arms following the formation of
bars, ejecting mass and redistributing the angular momentum. As the degree
of differential rotation becomes higher Newtonian simulations have also shown
that βd can be as low as 0.14 [14]. More recently [15,16] have reported that
rotating stars with an extreme degree of differential rotation are dynamically
unstable against bar-mode deformation even for values of β of O(0.01).
Given its recent discovery and its potential astrophysical implications for post-
bounce core collapse dynamics and gravitational wave astronomy, we present
in this paper high resolution simulations of such low T/|W | bar-mode in-
stabilities. This work is further motivated in the light of the few numerical
simulations available in the literature. Our main goal is to revisit the simula-
tions by [15] on the low T/|W | bar-mode instability, and particularly to check
how sensitive the onset and development of the instability is to numerical
issues such as grid resolution. To this aim we perform Newtonian hydrody-
namical simulations of a subset of models analyzed by [15] using an adaptive
mesh refinement (AMR) code [17] which allows us to perform such three di-
mensional simulations with the highest resolution ever used. Our simulations
reveal the complex morphological features involved in the nonlinear dynamics
of the instability, where the excitation of Kelvin-Helmholtz-like fluid modes
influences the saturation of the bar-mode deformation. We advance that while
the overall trends found by [15] are confirmed by our work, the resolution
employed in the simulations does play a key role for the long-term behaviour
of the instability and for the nonlinear dynamics of rotating stars, which has
implications on the attainable amplitudes of the associated gravitational wave
signals. We note that we plan to upgrade the existing AMR code to account
for the effects of magnetic fields in order to attempt the current study in a
more realistic setup. The present work is a step towards that goal.
The paper is organized as follows: Section 2 gives a brief overview of the
equations to solve. Their solution is outlined in Section 3 which also contains
the bare details of the AMR code. The results of the simulations are discussed
in Section 4. Finally Section 5 presents our conclusions.
2 Mathematical framework
The evolution of a self-gravitating ideal fluid in the Newtonian limit is de-
scribed by the hydrodynamics equations and Poisson’s equation:
+∇ · (ρv) = 0 (1)
+ (v · ∇)v = −1
∇p−∇φ (2)
+∇ · [(E + p)v] = −ρv∇φ (3)
∇2φ = 4πGρ (4)
where x, v = dx
= (vx, vy, vz), and φ(t,x) are, respectively, the Eulerian
coordinates, the velocity, and the Newtonian gravitational potential. The total
energy density, E = ρǫ + 1
ρv2 , is defined as the sum of the thermal energy,
ρǫ, where ρ is the mass density and ǫ is the specific internal energy, and the
kinetic energy (where v2 = v2x + v
y + v
z). Pressure gradients and gravitational
forces are the responsible for the evolution. An equation of state p = p(ρ, ǫ)
closes the system. We use an ideal gas equation of state p = (Γ − 1)ρǫ with
Γ = 2.
The hydrodynamics equations, Eqs. (1–3), can be rewritten in flux-conservative
form:
∂f(u)
∂g(u)
∂h(u)
= s(u) (5)
where u is the vector of unknowns (conserved variables):
u = [ρ, ρvx, ρvy, ρvz, E] . (6)
The three flux functions Fα ≡ {f , g,h} in the spatial directions x, y, z, respec-
tively, are defined by
f(u) =
ρvx, ρv
x + p, ρvxvy, ρvxvz, (E + p)vx
g(u)=
ρvy, ρvxvy, ρv
y + p, ρvyvz, (E + p)vy
h(u) =
ρvz, ρvxvz, ρvyvz, ρv
z + p, (E + p)vz
and the source terms s are given by
Table 1
Overview of the initial models and results of the simulations. The rows report
the name of the model, the ratio of equatorial-to-polar radii (re/rp), the degree of
differential rotation (Â), the ratio of kinetic to potential energy (T/|W |), the size of
the computational grid (L) and the location of the corotation radius (rc) for the two
resolutions used: high (AMR H) and low (AMR L). In models R1H and R2H the
corotation radius lies outside the star. The real (frequency) and imaginary (growth
rate) parts of the bar-mode σ2 are shown, for the low and high resolution simulation
in comparison with the numerical results and linear analysis by [15]. Note that for
model D3 no linear analysis results are available.
Model D1 D2 D3 R1 R2
re/rp 0.805 0.605 0.305 0.305 0.255
Â 0.3 0.3 0.3 1.0 1.0
T/|W | 0.039 0.085 0.149 0.253 0.275
L/re 4.06 3.73 3.21 4.25 4.03
rc/re AMR L 0.38 0.47 0.58
AMR H 0.36 0.48 0.56 - -
Re(σ2)/Ω0 AMR L 0.76 0.58 0.41
AMR H 0.81 0.55 0.43 - 0.82
Shibata 0.80 0.60 0.45 0.92 0.75
linear 0.80 0.58 - 0.92 0.75
Im(σ2)/Ω0 AMR L 0.0042 0.0154 0.0200
AMR H 0.0089 0.0190 0.0240 0.0005 0.1960
Shibata 0.009-0.013 0.019-0.021 0.013 <0.002 0.23
linear 0.015 0.021 - <0.002 0.20
s(u) =
, ρvx
− ρvy
− ρvz
. (10)
System (5) is a three-dimensional hyperbolic system of conservation laws with
sources s(u).
3 Numerical approach
For our study of the low T/|W | bar-mode instability we perform high-resolution
simulations of rotating neutron stars using a Newtonian AMR hydrodynamics
code called MASCLET [17]. The implementation of the AMR technique in the
code follows the procedure developed by [18]. The hydrodynamics equations
are solved using a high-resolution shock-capturing scheme based upon Roe’s
Riemann solver and second-order cell reconstruction procedures, while Pois-
son’s equation for the gravitational field is solved using multigrid techniques.
The accuracy and performance of the MASCLET code has been assessed in a
number of tests [17]. We note that the code was originally designed for cos-
mological applications, and here it is applied to simulations of self-gravitating
stellar objects for the first time.
The simulations are performed with two different grid resolutions. The low
resolution grid consists of a box of size L with 1283 zones, yielding a fixed
resolution of L/128. We note that the effective resolution of our coarse grid
is comparable to that used by [15]. Correspondingly, the high resolution grid
consists of a base coarse grid of 1283 cells, and one level of refinement composed
of patches with maximum size of 643 cells (323 coarse cells). This yields a grid
resolution on the finest grid of L/256. This resolution is enough to resolve the
structures simulated, and hence no deeper refinement levels are needed. The
patches are dynamically allocated covering those regions of the star where the
highest resolution is required (highest densities). Typically only one patch is
needed for spheroidal models, and 4-8 in models with toroidal topology. The
use of AMR techniques in our high resolution simulations, allows us to save
about a factor 4 in CPU time and memory with respect to a unigrid simulation
with 2563 cells. No symmetries are imposed in the simulations. To the best of
our knowledge, in the investigations of the bar-mode instability performed by
previous groups, grid resolutions as high as the ones we use here were never
employed.
As customary in grid-based codes [19,20] the vacuum surrounding the star
is filled with a tenuous numerical atmosphere with density ρ/ρmax ≈ 10−12
and zero velocities, ρmax being the maximun density. Every grid cell with
ρ/ρmax < 10
−6 is reset to the atmosphere values. A correct treatment of the
atmosphere is essential for an accurate description of the stellar dynamics and
correct computation of the growth rates of unstable modes. We have checked
that values for the atmosphere higher than those we chose or a free evolution of
the atmosphere altogether, lead to remarkable changes in the mode behaviour,
growth rates, and frequencies. We have also checked that lower values for the
atmosphere do not produce those changes, which ensures that our evolutions
are not affected by the atmosphere values used in the simulations.
4 Results
4.1 Initial data
Differentially rotating stellar models in equilibrium are built according to the
method of [21], and used as initial data for the AMR evolution code. The stars
obey a polytropic equation of state P = KρΓ with index Γ = 2. As [15] the
profile of the angular velocity Ω is given by
(̟/re)2 + Â2
, (11)
where re is the equatorial radius of the star, Ω0 is the central angular ve-
locity, ̟ is the distance to the rotation axis, and Â parametrizes the degree
of differential rotation, from Â ≪ 1 for highly differentially rotating stars to
Â → ∞ for rigidly rotating stars. For comparison purposes these parameters
are chosen as in some of the models of [15], and are summarized in Table 1.
Models labelled D rotate with a high degree of differential rotation, as Â = 0.3,
and may therefore be subject to the low T/|W | bar-mode instability. We also
consider models almost rigidly rotating, labelled R, prone to experience the
“classical” bar-mode instability. Labels L and H in the models refer to low
and high resolution respectively.
Following [15] we perturb the initial density profile ρ(0) according to
ρ = ρ(0)
1 + δ
x2 − y2
, (12)
the perturbation of the pressure given by the equation of state accordingly.
A perturbation amplitude δ = 0.1 is used in all our simulations. As we show
below this form of the perturbation excites the l = m = 2 bar-mode. In
addition, grid discretization can leak small amounts of energy to all other
possible modes, which could in principle grow provided they were unstable
and the simulations were carried on for sufficiently long times.
4.2 Stability analysis
To compare with [15] we calculate the distortion parameters η+ and η× (and
η = (η2+ + η
)1/2) defined as
0 0.5 1 1.5 2
Ω / Ω0
0 0.21 0.43 0.65 0.86 1.1 1.3 1.5 1.7 1.9
Fig. 1. Power spectra of Am from m = 1 to m = 8 for model D3H.
Ixx − Iyy
Ixx + Iyy
, η× ≡
Ixx + Iyy
, (13)
where Iij(i, j = x, y, z) is the mass-quadrupole moment
Iij =
dx3ρ xixj . (14)
For the study of the growth rate and interaction of the different angular modes
within the star is useful to calculate the global quantity
dx3ρ(x) e−imϕ, (15)
and Am ≡ Am/A0. We follow the time evolution of modes with m ranging
from 1 to 8. Since our initial equilibrium models are axisymmetric and have
equatorial plane symmetry, all Am are zero initially, but once perturbed all
0 20 40 60 80 100
Fig. 2. Evolution of η for models R1H (upper panel) and R2H (lower panel). Expo-
nential fits to the peaks in the growing phase are overplotted as solid lines.
initial models exhibit a dominant m = 2 component. Assuming that the modes
behave as e−i(σmt−mϕ), the real part of σm can be obtained by Fourier trans-
forming Am. In particular Re(σ2), the bar-mode frequency, can be extracted
from either A2 or η as both represent the same mode. This is the dominant
mode in all our simulations and its frequency and growth rate are given in
Table 1. The latter corresponds to the imaginary part of σ2, which is calcu-
lated fitting an exponential to the peak values of η in the growing phase of
the evolution until the modes saturate. Other modes are also identified in the
simulations for values of Am with lower amplitudes. We have checked that
these modes are harmonics of the l = m = 2 mode so that they follow to good
accuracy the relation σm = mσp, σp being the pattern frequency, calculated
as σp = σ2/2. This is shown for model D3H in Fig. 1 which displays the spec-
trum of Am from m = 1 to m = 8 (in arbitrary units). The vertical dashed
lines in this figure indicate the location of the integer multiples of the pattern
frequency σp, their values indicated on the axis at the top of the figure. Each
spectrum for each mode is normalized to its own maximum for plotting pur-
poses. Note that the lower the mode amplitude the noisier the spectrum and
the less accurate the relation σm = mσp.
For the models of our sample subject to the “clasical” bar-mode deformation
(R1H and R2H), our simulations yield a value of β between 0.253 and 0.275,
in good agreement with the critical value for the onset of the dynamical bar-
mode instability. Model R1H is stable and model R2H is unstable. The growth
rates and frequencies reported in Table 1 agree with those of [15]. Note that
0 100 200 300
Fig. 3. Evolution of η for models D1 (upper panel), D2 (central panel) and D3 (lower
panel). Dashed lines correspond to low resolution and solid lines to high resolution.
Exponential fits to the peaks in the growing phase are overplotted as solid lines.
for model R1H, which is stable, the frequency for the m = 2 mode cannot be
computed. The time evolution of η for these two models is displayed in Fig. 2.
For the unstable model R2H, our simulations show the formation of a bar
which saturates for values of η+ and η× close to 1, i. e. in the full nonlinear
regime.
Fig. 3 shows the time evolution of η for models D in our sample, prone to suffer
the low T/|W | bar-mode instability. Solid lines correspond to high resolution
simulations and dashed lines to low resolution. For all three models the pattern
frequencies σp are such that there exists a corotation radius inside the star,
i.e. a radius at which the bar-mode rotates with the same angular velocity as
the fluid. The location of the corotation radius for all models of our sample
is reported in Table 1. As recently discussed by [22] the existence of such
corotation radius is a potential requirement for the ocurrence of the instability.
As becomes clear from Fig. 3, all models are unstable but grid resolution has
an important effect on the saturation of the instability once the nonlinear
phase has been reached, as well as in the long-term dynamics of the stars.
In the linear phase of models D1H and D2H, the growth rates and frequencies
agree with the results of [15] in both, the numerical simulations and the linear
analysis (see Table 1). In the linear phase of model D3H, our frequencies are
similar to the numerical results of [15], although our growth rates are about
a factor two larger. We emphasize that no results are reported in the linear
analysis for this model in the work of [15], and therefore this discrepancy can
be an effect of the resolution used or of the characteristics of each numerical
code. Increasing resolution leads to similar results in the frequencies but to
higher growth rates.
In the nonlinear phase, models D1 and D3 behave similarly for the two resolu-
tions used (see Fig. 3), and also similarly to the results by [15] (compare with
Fig. 3 of that paper). For model D2 we observe a radical change of behavior in
the nonlinear phase of the mode evolution depending on the grid resolution.
This has implications on the long-term dynamics of the star and, in particu-
lar, on the attainable amplitudes of the gravitational radiation emitted, as we
discuss below.
It is worth mentioning the possibility that the unstable mode at the start of
model D2H might excite some other mode in the corotation band, which could
not otherwise be excited for lower grid resolution. As discussed by [23,24] in
their study of differentially rotating shells, there are many zero-step modes in
the band, so that the whole continuous spectrum could potentially be excited.
In such case these modes would have very slow power-law growth.
For all our models we have checked mass conservation along the evolution. The
worst results are obtained for model D3H, for which mass is conserved within
2.5% error when the instability saturates. At the end of the simulation (after
48 orbital periods and 25000 iterations in the coarsest grid) the error has grown
to only 6%. For all other models mass conservation is even more accurate. Note
that these errors are within the round-off error of the code, and it is not related
to the conservation properties of the numerical scheme itself. For a regular grid
with 1283 cells and a simulation employing 25000 iterations, the accumulated
round-off error (binomial distribution) using single-precision arithmetics, is
about
1283 × 25000 × 10−8 = 0.0023 = 0.23%. Correspondingly, for a 2563
grid (with twice the number of iterations for the simulation) the error is about
0.9%. Taking into account that this error affects the nonlinear evolution of the
system, it is not surprising to have an error at the level of a few percent by
the end of our high resolution simulations, for all conserved quantities.
Figure 4 shows the evolution of Am for model D3 and for m ranging from
1 to 8 for our two resolutions. According to this figure, the only two modes
1e-06
0.0001
0 100 200
1e-06
0.0001
m=1, 3, 5, 7
m=1, 3, 5, 6, 7, 8
Fig. 4. Evolution of |Am| for model D3 with low resolution (top) and high resolution
(bottom). The m = 2 mode is represented with thick solid line, m = 4 with thin
solid line, m = 6 with dashed line, m = 8 with dot-dashed line, and all other odd
m with dotted lines.
relevant for the dynamics of the star are m = 2 and m = 4. All other modes
have smaller amplitudes and play no role in the dynamics. Note that for odd
m modes, the value of the integrated quantity Am, if close to zero, is extremely
sensitive to very small numerical asymmetries, which are induced by the patch
creation scheme of our AMR code. This explains the resolution differences in
the initial values for odd m modes in Fig. 4 (at t = 0 they start off at 10−8
level for the low resolution simulation), although they saturate at the same
value irrespective of the resolution.
An important diagnosis for the accuracy of the results is the location of the
center of mass during an evolution. The round-off error of the numerical code
imposes controlled errors in mass and linear momentum, which results in tiny
displacements of the center of mass. However small (one numerical cell in
our runs) this unphysical displacement may hinder the correct analysis of the
0 100 200 300
1e-06
0.0001
Fig. 5. Effects of the artificial displacement of the center of mass (of only one
numerical cell) on the time evolution of |A1| for model D2H. The thin solid line
shows a fictitiuos evolution resulting from the numerical artifact originated by the
center of mass displacement.
mode growth rates. For this reason all integrated quantities shown in Fig. 4
are computed after correcting for the displacement of the center of mass,
xnew = xold − xCM, in a post-processing stage of the data analysis. Were this
not done, a one-armed m = 1 mode would grow much faster than it should to
bring up fictitious features in the plots. This is shown for model D2H in Fig. 5.
The thick solid line in this figure corresponds to the evolution of the m = 1
mode taking into account the correction for the center of mass displacement,
while the thin solid line is the corresponding evolution of this mode without
the correction.
4.3 Gravitational waves
The growth and saturation of the instability is also imprinted on the gravita-
tional waves emitted. The gravitational waveforms h+ and h× for models D1,
D2, and D3, computed using the standard quadrupole formula, are shown in
Fig. 6. For a source of mass M located at a distance R those waveforms can
be calculated from the dimensionless waveform amplitudes a+ and a× as
h+,× = a+,×
sin2 θ
, (16)
using G = c = 1 units. The resulting chirp-like signal in all the models, partic-
ularly apparent for model D2L, indicates the presence of a bipolar distribution
of mass within the star (see Sec. 4.4).
-0.05
0 100 200 300
Fig. 6. Gravitational waves for models D1 to D3 extracted using the standard
quadrupole formula. Thick (thin) solid lines correspond to low (high) resolution.
Only the dimensionless waveform amplitude a+ is plotted.
As mentioned before, the effects of grid resolution on the evolution of the
nonlinear phase of the bar-mode are imprinted on the gravitational waveforms.
Thick solid lines in Fig. 6 are the waveforms which correspond to the low-
resolution models, and thin solid lines to the high-resolution counterparts.
The evolution of η for model D3, displayed in Fig. 3, shows little deviations
with grid resolution, and this translates into very similar gravitational wave
patterns (bottom panel of Fig. 6), the differences becoming more noticeable in
the nonlinear phase following saturation (Ω0t ≥ 75). For model D1 (top panel),
the differences also become more apparent at later times during the evolution,
in good agreement with the dissimilar behaviour of the matter dynamics in
this model, as encoded in the evolution of η in Fig. 3. As happens for model
D3 the first few cycles of the gravitational waveform, when the mode is still
in the linear phase, are accurately captured for both resolutions.
The major dependence of the waveform on the grid resolution is found for
model D2. Again, the linear phase for the growth of the bar deformation is
Ω 0 t=
70.40.0
Fig. 7. Snapshots of the density, vorticity, and specific angular momentum, for
model D3H, at three representative instants of the evolution. All snapshots show
slices of the stars in the equatorial plane. Quantities are normalized as follows:
max, rew
s , and l
ϕ/(rev
s ), where v
s is the initial velocity at the surface
of the star.
accurately captured irrespective of the resolution (and agrees with the per-
turbative results of [15]). This is signalled in the perfect overlapping of both
gravitational waveforms during the first three cycles (see the middle panel of
Fig. 6). However, the different nonlinear dynamics of the bar-mode deforma-
tion for this model, shown in the middle panel of Fig. 3, is severely imprinted
on the gravitational waveform. Model D2H emits gravitational waves which
have roughly one order of magnitude smaller amplitude than those computed
for the corresponding low resolution model.
4.4 Morphology
We next describe the morphological features encountered during the evolution
of some representative models. Fig. 7 shows three snaphsots of the evolution
of model D3H for the density (top), the azimuthal component of the vorticity,
~wϕ = (∇ × ~v)ϕ (middle), and the specific angular momentum, ~l = ~r × ~v
(bottom). From left to right the snapshots correspond to the initial time (Ω0t =
0), a time when the bar-mode instability is growing (Ω0t = 33.6), and the time
when the instability saturates (Ω0t = 70.4). Only the equatorial plane of the
stars is shown in all these plots. Animations of all simulations performed are
available at www.uv.es/∼cerdupa/bars/. We note that our AMR code is able
to dynamically place patches (e. g. between 4 and 8 in the D3H model) and
evolves the system with continuous matching between patches, as exemplified
in Fig. 7.
The evolution of model D3H shows that as the m = 2 mode grows the star
develops an ellipsoidal shape which remains spinning beyond saturation. Since
the low β m = 2 mode saturates at lower values (η ∼ 0.1) than the classical
bar-mode instability (η ∼ 1), no clear bars are visible in the density plot. At
late times (Ω0t > 100) a “boxy” structure becomes apparent as the m = 4
mode has grown to almost similar amplitude as the m = 2 mode (see anima-
tions and Fig. 4). No other global features can be seen, consistent with the
fact that |Am| ≪ 1 for all modes other than m = 2 and 4. The vorticity plot
shows that the m = 2 mode at Ω0t = 33.6 adopts the form of a two-armed
spiral winding up around the central parts of the star. As the mode begins to
saturate (Ω0t = 70.4) the spirals break apart into the outer layers in a turbu-
lent flow reminiscent of the (shear) Kelvin-Helmholtz instability, and shock as
they reach the atmosphere. These trends are also visible in the specific angular
momentum plot.
The presence of a corotation radius, at r/re = 0.56 for model D3H, seems
to play a role in the growth and saturation of the instability, in agreement
with the recent findings of [25]. As the bar-mode grows, pressure waves carry
angular momentum outside the corotation radius, which is deposited in the
outer layers of the star. This excites Kelvin-Helmholtz-like instabilities in the
fluid that break the mode outside the corotation radius. When this happens the
m = 2 instability stops growing and no more angular momentum is extracted.
Figure 8 shows late-time snapshots of the equatorial plane distribution of the
density perturbation, i.e. (ρ−ρ(0))/ρ(0)max, for models D2H and D3H. The times
are chosen well inside the nonlinear and saturation phase of the instability.
This figure helps to interpret the mode dynamics and its saturation along the
lines mentioned before: During the evolution the density perturbations are
shed in waves from the center towards the outer layers of the star. At late
times, when the instability saturates, such shedding stops, and the density
Fig. 8. Snapshots of the density perturbation at the equatorial plane for models D2H
and D3H. The white solid curves indicate the location of the corotation radius. The
white dashed boxes indicate the location of the patches for model D3H.
perturbation reaches the largest values outside the corotation radius (depicted
with white solid lines in Fig. 8), for either model.
We note in passing that the corotation radius in all our high resolution models
lies well inside the outer boundary of the finest box set up by the AMR
refinement pattern. (see, e.g. the white dashed boxes depicted in the right
panel of Fig. 8 indicating the location of the AMR patches for model D3H)
This rules out the possibility of a numerical artifact resulting from the patch
creation scheme of our AMR code being the cause for the different long-term
evolution between low and high resolution models, particularly noticeable for
model D2 in Fig. 3.
Finally, Fig. 9 shows a comparison between models D2L and D2H at Ω0t = 101
(i.e. well within the nonlinear phase), to highlight the effects of the numer-
ical resolution on the morphology. From top to bottom this panel shows a
schlieren plot (|∇ log ρ|) , ~wϕ, and ~l. The resolution differences in the evolu-
tion of model D2 become apparent from this figure. In particular, the “boxy”
structure becomes much more clearly visible in the low resolution simulation
(D2L), indicating an excessive growth rate of the m = 4 mode. The presence
of pressure waves is emphasized in the schlieren plot, very accurately captured
in model D2H. Those waves, once the flow is driven to turbulence past the
corotation radius, redistribute the angular momentum in the outer layers of
model D2L in a much more pronounced way than for model D2H.
Ω 0 t=
D2L D2H
Fig. 9. Resolution comparison between models D2L and D2H once the instability
has saturated. Only slices of the stars in the equatorial plane are shown.
5 Summary and outlook
We have presented AMR high-resolution simulations of the low T/|W | bar-
mode instability of extremely differentially rotating neutron stars. Our main
motivation has been to revisit the simulations by [15] on such instability,
assessing how sensitive the onset and development of the instability is to
numerical issues such as grid resolution. We have addressed the importance
of a correct treatment of delicate numerical aspects which may spoil three-
dimensional simulations in (Cartesian) grid-based codes, always hampered by
insufficient resolution, namely the handling of the low-density atmosphere sur-
rounding the star, the correction for the center of mass displacement, and the
mass and momentum conservation properties of the numerical scheme. Our
simulations have revealed the complex morphological features involved in the
nonlinear dynamics of the instability. We have found that in the nonlinear
phase of the evolution, the excitation of Kelvin-Helmholtz-like fluid modes
outside the corotation radii of the stellar models leads to the saturation of the
bar-mode deformation. While the overall trends reported in the investigation
of [15] are confirmed by our work, the resolution used to perform the simu-
lations may play a key role on the long-term behaviour of the instability and
on the nonlinear dynamics of rotating stars, which has only become apparent
for some specific models of our sample (namely model D2). This, in turn, has
implications on the attainable amplitudes of the associated gravitational wave
signals.
The work reported in this paper is a first step in our ongoing efforts of study-
ing the dynamical bar-mode instability within the magnetized core collapse
scenario.
Acknowledgements
The authors thank Harry Dimmelmeier, Nick Stergioulas, and Anna Wats
for useful comments. Research supported by the Spanish Ministerio de Edu-
cación y Ciencia (MEC; grants AYA2004-08067-C03-01, AYA2003-08739-C02-
02, AYA2006-02570). VQ is a Ramón y Cajal Fellow of the Spanish MEC.
Computations performed at the Servei d’Informática de la Universitat de
València (CERCA-CESAR).
References
[1] N. Stergioulas, Liv. Rev. Relativ. 6 (2003) 3
[2] J. E. Tohline, R. H. Durisen, & M. McCollough, ApJ 298 (1985) 220
[3] J. L. Houser, J. M. Centrella, & S. Smith, Phys. Rev. Lett. 72 (1994) 1314
[4] K. C. B. New, J. M. Centrella, & J. E. Tohline, Phys. Rev. D 62 (2000) 064019
[5] M. Shibata, T. W. Baumgarte, & S. L. Shapiro, ApJ 542 (2000) 453
[6] M. Rampp, E. Müller, & M. Ruffert, A&A 332 (1998) 969
[7] M. Shibata, & Y. Sekiguchi, Phys. Rev. D 71 (2005) 024014
[8] M. Saijo, Phys. Rev. D 71 (2005) 104038
[9] C. D. Ott, S. Ou, J. E. Tohline, & A. Burrows, ApJ 625 (2005) L119
[10] H. C. Spruit & E. S. Phinney, Nature 393 (1998) 139
[11] A. Heger, S. E. Woosley, & H. C. Spruit, ApJ 626 (2005) 350
[12] C. D. Ott, A. Burrows, T. A. Thompson, E. Livne, & R. Walder, ApJS 164
(2006) 130
[13] S. E. Woosley, & A. Heger, ApJ 637 (2006) 914
[14] J. M. Centrella, K. C. B. New, L. L. Lowe, & J. D. Brown, ApJ 550 (2001)
[15] M. Shibata, S. Karino, & Y. Eriguchi, MNRAS 334 (2002) L27
[16] M. Shibata, S. Karino, & Y. Eriguchi, MNRAS 343 (2003) 619
[17] V. Quilis, MNRAS 352 (2004) 1426
[18] M. J. Berger, P. Colella, J. Comp. Phys. 82 (1989) 64
[19] J. A. Font, M. Miller, W.-M. Suen, & M. Tobias, Phys. Rev. D 61 (2000) 0044011
[20] M. D. Duez, P. Marronetti, S. L. Shapiro, & T. W. Baumgarte, Phys. Rev. D
67 (2003) 024004
[21] Y. Eriguchi, & E. Müller, A&A, 147 (1984) 161
[22] A. L. Watts, N. Andersson, & D. I. Jones, ApJ 618 (2005) L37
[23] A. L. Watts, N. Andersson, H. Beyer, & B. F. Schutz, MNRAS 342 (2003) 1156
[24] A. L. Watts, N. Andersson, & R. L. Williams, MNRAS 350 (2004) 927
[25] M. Saijo & S. Yoshida, MNRAS 368 (2006) 1429
	Introduction
	Mathematical framework
	Numerical approach
	Results
	Initial data
	Stability analysis
	Gravitational waves
	Morphology
	Summary and outlook
	References
ABSTRACT
  It has been recently argued through numerical work that rotating stars with a
high degree of differential rotation are dynamically unstable against bar-mode
deformation, even for values of the ratio of rotational kinetic energy to
gravitational potential energy as low as O(0.01). This may have implications
for gravitational wave astronomy in high-frequency sources such as core
collapse supernovae. In this paper we present high-resolution simulations,
performed with an adaptive mesh refinement hydrodynamics code, of such low
T/|W| bar-mode instability. The complex morphological features involved in the
nonlinear dynamics of the instability are revealed in our simulations, which
show that the excitation of Kelvin-Helmholtz-like fluid modes outside the
corotation radius of the star leads to the saturation of the bar-mode
deformation. While the overall trends reported in an earlier investigation are
confirmed by our work, we also find that numerical resolution plays an
important role during the long-term, nonlinear behaviour of the instability,
which has implications on the dynamics of rotating stars and on the attainable
amplitudes of the associated gravitational wave signals.

<|endoftext|><|startoftext|>
Introduction
	 Hierarchical Meanfield Theory for Two Distinct Scales
	Cooperation in populations with hierarchical levels of mixing
	The RPS Game
	The Repeated Prisoner's Dilemma Game
	Discussion
	Acknowledgments
	Appendix
	References
ABSTRACT
  Population structure induced by both spatial embedding and more general
networks of interaction, such as model social networks, have been shown to have
a fundamental effect on the dynamics and outcome of evolutionary games. These
effects have, however, proved to be sensitive to the details of the underlying
topology and dynamics. Here we introduce a minimal population structure that is
described by two distinct hierarchical levels of interaction. We believe this
model is able to identify effects of spatial structure that do not depend on
the details of the topology. We derive the dynamics governing the evolution of
a system starting from fundamental individual level stochastic processes
through two successive meanfield approximations. In our model of population
structure the topology of interactions is described by only two parameters: the
effective population size at the local scale and the relative strength of local
dynamics to global mixing. We demonstrate, for example, the existence of a
continuous transition leading to the dominance of cooperation in populations
with hierarchical levels of unstructured mixing as the benefit to cost ratio
becomes smaller then the local population size. Applying our model of spatial
structure to the repeated prisoner's dilemma we uncover a novel and
counterintuitive mechanism by which the constant influx of defectors sustains
cooperation. Further exploring the phase space of the repeated prisoner's
dilemma and also of the "rock-paper-scissor" game we find indications of rich
structure and are able to reproduce several effects observed in other models
with explicit spatial embedding, such as the maintenance of biodiversity and
the emergence of global oscillations.

<|endoftext|><|startoftext|>
Flavor Physics in SUSY at large tan β
Paride Paradisi
Departament de F́ısica Teòrica and IFIC, Universitat de València-CSIC, E-46100, Burjassot, Spain.
We discuss the phenomenological impact of a particularly interesting corner of the MSSM: the
large tanβ regime. The capabilities of leptonic and hadronic Flavor Violating processes in shedding
light on physics beyond the Standard Model are reviewed. Moreover, we show that tests of Lepton
Universality in charged current processes can represent an interesting handle to obtain relevant
information on New Physics scenarios.
I. INTRODUCTION
Despite the great phenomenological success of the
Standard Model (SM), it is natural to consider this the-
ory only as the low-energy limit of a more general model.
The direct exploration of New Physics (NP) particles
at the TeV scale will be performed at the upcoming
LHC. A complementary strategy in looking for NP is
provided by high-precision low-energy experiments where
NP could be detected through the virtual effects of NP
particles. In particular, flavor-changing neutral-current
(FCNC) transitions may exhibit a sensitivity reach even
beyond that achievable by the direct searches at the LHC
while representing, at the same time, the best (or even
the only) tool to extract information about the flavor
structures of NP theories.
In view of the above considerations, it is clear that
flavor physics provides necessary and complementary in-
formation to those obtainable by the LHC.
Besides FCNC decays, also the Lepton Flavor Univer-
sality (LFU) tests (Kℓ2 and πℓ2) offer a unique opportu-
nity to probe the SM and thus, to shed light on NP: the
smallness of NP effects is more than compensated by the
excellent experimental resolution and the good theoreti-
cal control.
II. LFV IN SUSY
The discovery of neutrino masses and oscillations has
unambiguously pointed out the existence of the Lepton
Flavor Violation (LFV) thus, we expect this phenomenon
to occur also in the charged-lepton sector.
Within a SM framework with massive neutrinos,
FCNC transitions in the lepton sector like ℓi → ℓjγ are
strongly suppressed by the GIM mechanism at the level
of B(ℓi → ℓjγ) ∼ (mν/mW )
4 ∼ 10−50 well beyond any
realistic experimental resolution [1]. In this sense, the
search for FCNC transitions of charged leptons is one of
the most promising directions where to look for physics
beyond the SM.
Within a SUSY framework, LFV effects originate from
any misalignment between fermion and sfermion mass
eigenstates. In particular, if the light neutrino masses
are obtained via a see-saw mechanism, the radiatively
induced LFV entries in the slepton mass matrix (m2
are given by [2]:
)i6=j ≈ −
ν )i6=j ln
, (1)
where MX denote the scale of SUSY-breaking media-
tion andm0 the universal supersymmetry breaking scalar
mass. Since the see–saw equation 1 allows large (YνY
entries, sizable effects can stem from this running [2].
The determination of (m2
)i6=j would imply a complete
knowledge of the neutrino Yukawa matrix (Yν)ij , which
is not possible even if all the low-energy observables from
the neutrino sector were known. As a result, the predic-
tions of leptonic FCNC effects will remain undetermined
even in the very optimistic situation where all the rele-
vant NP masses were measured at the LHC.
This is in contrast with the quark sector, where similar
RGE contributions are completely determined in terms
of quark masses and CKM-matrix elements.
More stable predictions can be obtained embedding
the SUSY model within a Grand Unified Theory (GUT)
where the see-saw mechanism can naturally arise (such
as SO(10)). In this case the GUT symmetry allows us to
obtain some hints about the unknown neutrino Yukawa
matrix Yν . Moreover, in GUT scenarios there are other
contributions stemming from the quark sector [3]. These
effects are completely independent from the structure of
Yν and can be regarded as new irreducible LFV contribu-
tions within SUSY GUTs. For instance, within SU(5),
as both Q and ec are hosted in the 10 representation, the
CKM matrix mixing the left handed quarks will give rise
to off diagonal entries in the running of the right-handed
slepton soft masses [3].
There exist to different classes of LFV contributions to
rare decays:
i) Gauge-mediated LFV effects through the exchange
of gauginos and sleptons,
ii) Higgs-mediated LFV effects through effective non-
holomorphic Yukawa interactions [4] .
1 The effective light-neutrino mass matrix obtained from a see-
saw mechanism is mν = −YνM̂
ν 〈Hu〉
2, where M̂R is the
3 × 3 right-handed neutrino mass matrix and Yν are the 3 ×
3 Yukawa couplings between left- and right-handed neutrinos
(the potentially large sources of LFV), and 〈Hu〉 is the vacuum
expectation value of the up-type Higgs.
http://arxiv.org/abs/0704.0358v1
The above contributions decouple with the heaviest mass
in the slepton/gaugino loops mSUSY (case i)) or with the
heavy Higgs mass mH (case ii)).
In principle, mH and mSUSY refers to different mass
scales. Higgs mediated effects start being competitive
with the gaugino mediated ones when mSUSY is roughly
one order of magnitude heavier then mH and for tanβ ∼
O(50) [5].
While the appearance of LFV transitions would un-
ambiguously signal the presence of NP, the underlying
theory generating LFV phenomena will remain undeter-
mined, in general.
A powerful tool to disentangle among NP theories is
the study of the correlations of LFV transitions among
same families [5, 6, 7].
Interestingly enough, the predictions for the correla-
tions among LFV processes are very different in the
gauge- and Higgs-mediated cases [5]. In this way, if sev-
eral LFV transitions are observed, their correlated anal-
ysis could shed light on the underlying mechanism of
LFV. In the case of gauge-mediated LFV amplitudes the
ℓi → ℓjℓkℓk decays are dominated by the ℓi → ℓjγ
∗ dipole
transition, which leads to the unambiguous prediction:
B(ℓi → ℓjℓkℓk)
B(ℓi → ℓjγ)
B(µ− e in Ti)
B(µ→eγ)
≃αel . (3)
If some ratios different from the above were discovered,
then this would be clear evidence that some new process
is generating the ℓi → ℓj transition, with Higgs mediation
being a potential candidate 2.
As regards the Higgs mediated case, Br(τ → ljγ) still
gets generally the largest contribution among all the pos-
sible LFV decay modes [5]. The following approximate
relations hold [5]:
Br(τ → ljγ)
Br(τ → ljη)
& 1 ,
Br(τ → ljη)
Br(τ → ljµµ)
3+5δjµ
. (4)
Br(τ → ljee)
Br(τ → ljµµ)
3+5δjµ
. (5)
Br(µ → eγ)
Br(µAl → eAl)
∼ 10 ,
Br(µ → eee)
Br(µ → eγ)
∼ αel . (6)
On the other hand, a correlated study of processes of the
same type but relative to different family transitions, like
2 As recently shown in [7], a powerful tool to disentangle between
Little Higgs models with T parity (LHT) and SUSY theories
is a correlated analysis of LFV processes. In fact, LHT and
SUSY theories predict very different correlations among LFV
transitions [7].
Br(µ → eγ)/Br(τ → µγ) ∼ [(m2
)21/(m
2, provides
important information about the unknown structure of
the LFV source, i.e. (m2
)i6=j .
III. LFU IN SUSY
High precision electroweak tests, such as deviations
from the SM expectations of the LFU breaking, represent
a powerful tool to probe the SM and, hence, to constrain
or obtain indirect hints of new physics beyond it. Kaon
and pion physics are obvious grounds where to perform
such tests, for instance in the π → ℓνℓ and K → ℓνℓ
decays, where l = e or µ. In particular, the ratios
B(P → µν)
B(P → eν)
can be predicted with excellent accuracies in the SM,
both for P = π (0.02% accuracy [8]) and P = K (0.04%
accuracy [8]), allowing for some of the most significant
tests of LFU.
As recently pointed out in Ref. [9], large departures
from the SM expectations can be generated within a
SUSY framework with R-parity only once we assume i)
LFV effects, ii) large tanβ values.
Denoting by ∆r
NP the deviation from µ−e universal-
ity in RK due to NP, i.e.: R
K = (R
K )SM
1 + ∆r
it turns out that [9]:
|∆31R |
2 tan6β. (8)
The deviations from the SM could reach ∼ 1% in the
K case [9] (not far from the present experimental res-
olution [10]) and ∼ few × 10−4 in the R
π case while
maintaining LFV effects in τ decays at the 10−10 level. In
the pion case the effect is quite below the present experi-
mental resolution [11], but could well be within the reach
of the new generation of high-precision πℓ2 experiments
planned at TRIUMPH and at PSI. Larger violations of
LFU are expected in B → ℓν decays, with O(10%) devi-
ations from the SM in R
B and even order-of-magnitude
enhancements in R
B [12].
IV. FLAVOR PHYSICS AT LARGE tanβ AND
DARK MATTER
Within the MSSM, the scenario with large tanβ and
heavy squarks is particularly interesting. On the one
hand, values of tanβ ∼ 30–50 can allow the unification
of top and bottom Yukawa couplings, as predicted in
well-motivated grand-unified models [13]. On the other
hand, a Minimal Flavor Violating (MFV) structure [14]
with heavy (∼ TeV ) soft-breaking terms in the quark
sector and large tanβ ∼ 30 − 50 values leads to in-
teresting phenomenological virtues [12, 15]: the present
(g − 2)µ anomaly and the upper bound on the Higgs
boson mass can be easily accommodated, while satisfy-
ing all the present tight constraints in the electroweak
and flavor sectors. Additional low-energy signatures of
this scenario could possibly show up in the near future
in B(Bu → τν), B(Bs,d → ℓ
+ℓ−) and B(B → Xsγ). In
the following, as discussed in [16], we analyze the above
scenario under the additional assumption that the relic
density of a Bino-like lightest SUSY particle (LSP) ac-
commodates the observed dark matter distribution
0.094 ≤ ΩCDMh
2 ≤ 0.129 at 2σ C.L. . (9)
In the regime with large tanβ and heavy squarks, the
relic-density constraints can be easily satisfied mainly in
the so called A-funnel region [17] where MB̃ ≈ MA/2.
The combined constraints from low-energy observables
and dark matter in the tanβ–MH plane are illustrated
in Figure 1 (left). The light-blue areas are excluded since
the stau turns out to be the LSP, while the yellow band
denotes the allowed region where the stau coannihilation
mechanism is also active. The remaining bands corre-
spond to the following constraints/reference-ranges from
low-energy observables:
• B → Xsγ [1.01 < RBsγ < 1.24]: allowed region
between the two blue lines.
• aµ [2 < 10
−9(aexpµ − a
µ ) < 4 [18]]: allowed region
between the two purple lines.
• B → µ+µ− [Bexp < 8.0×10−8 [19]]: allowed region
below the dark-green line.
• ∆MBs [∆MBs = 17.35 ± 0.25 ps
−1 [20]]: allowed
region below the gray line.
• B → τν [0.8 < RBτν < 0.9]: allowed region be-
tween the two black lines [ red (green) area if all
the other conditions (but for aµ) are satisfied].
From Figure 1 (right), we deduce that there is a quite
strong correlation between ∆aµ and B(Bu → τν) thanks
to the A-funnel region condition MH ≈ 2M1. A SUSY
contribution to aµ of O(10
−9) generally implies a sizable
effect in 0.7 < B(Bu → τν) < 0.9. A more precise de-
termination of B(Bu → τν) is therefore a key element to
test this scenario.
The interplay of B physics observables, dark-matter
constraints, ∆aµ of O(10
−9), and LFV rates is shown
in Figure 2. For a natural choice of |δ12LL| = 10
B(µ → eγ) is in the 10−12 range, i.e. well within the
reach of MEG [21] experiment. On the other hand,
B(τ → µγ) lies within the 10−9 range for a |δ23LL| = 10
that is a natural size expected in many models.
Acknowledgments
I wish to thank the conveners of WG3 for the kind
invitation and G. Isidori, F. Mescia and D. Temes for
collaborations on which this talk is partly based. I also
acknowledge support from the EU contract No. MRTN-
CT-2006-035482, ”FLAVIANET” and from the Spanish
MEC and FEDER FPA2005-01678.
[1] W. M. Yao et al. [Particle Data Group], J. Phys. G 33
(2006) 1 [hppt://pdg.lbl.gov].
[2] F. Borzumati and A. Masiero, Phys. Rev. Lett. 57 (1986)
[3] R. Barbieri and L. J. Hall, Phys. Lett. B 338 (1994) 212
[hep-ph/9408406]; R. Barbieri, L. J. Hall and A. Strumia,
Nucl. Phys. B 445 (1995) 219 [hep-ph/9501334]; L. Cal-
ibbi, A. Faccia, A. Masiero and S. K. Vempati, Phys.
Rev. D 74, 116002 (2006) [hep-ph/0605139].
[4] K. S. Babu and C. Kolda, Phys. Rev. Lett. 89 (2002)
241802 [hep-ph/0206310].
[5] P. Paradisi, JHEP 0602, 050 (2006) [hep-ph/0508054];
P. Paradisi, JHEP 0608, 047 (2006) [hep-ph/0601100].
[6] A. Brignole and A. Rossi, Nucl. Phys. B 701, 3 (2004)
[hep-ph/0401100].
[7] M. Blanke, A. J. Buras, B. Duling, A. Poschenrieder and
C. Tarantino, hep-ph/0702136.
[8] W.J. Marciano and A. Sirlin, Phys.Rev.Lett. 71 3629
(1993); M.Finkemeier, Phys.Lett. B 387 391 (1996).
[9] A. Masiero, P. Paradisi and R. Petronzio, Phys. Rev. D
74, 011701 (2006) [hep-ph/0511289].
[10] L. Fiorini [NA48/2 Collaboration], talk presented at EPS
2005 July 21st-27th 2005 (Lisboa, Portugal).
[11] G. Czapek et al., Phys. Rev. Lett. 70 (1993) 17;
D. I. Britton et al., Phys. Rev. Lett. 68 (1992) 3000.
[12] G. Isidori and P. Paradisi, Phys. Lett. B 639 (2006) 499
[hep-ph/0605012].
[13] G. Anderson, S. Raby, S. Dimopoulos, L. J. Hall
and G. D. Starkman, Phys. Rev. D 49 (1994) 3660
[hep-ph/9308333].
[14] G. D’Ambrosio, G. F. Giudice, G. Isidori and A. Strumia,
Nucl. Phys. B645 (2002) 155.
[15] E. Lunghi, W. Porod and O. Vives, Phys. Rev. D 74
(2006) 075003 [hep-ph/0605177].
[16] G. Isidori, F. Mescia, P. Paradisi and D. Temes,
hep-ph/0703035.
[17] J. R. Ellis, L. Roszkowski and Z. Lalak, Phys. Lett. B
245 (1990) 545.
[18] K. Hagiwara, A. D. Martin, D. Nomura and T. Teubner,
hep-ph/0611102; M. Passera, Nucl. Phys. Proc. Suppl.
155 (2006) 365 [hep-ph/0509372].
[19] R. Bernhard et al. [CDF Collab.], hep-ex/0508058.
[20] A. Abulencia et al. [CDF - Run II Collab.], Phys. Rev.
Lett. 97 (2006) 062003 [AIP Conf. Proc. 870 (2006) 116]
[hep-ex/0606027].
[21] M. Grassi [MEG Collaboration], Nucl. Phys. Proc. Suppl.
149 (2005) 369.
http://arxiv.org/abs/hep-ph/9408406
http://arxiv.org/abs/hep-ph/9501334
http://arxiv.org/abs/hep-ph/0605139
http://arxiv.org/abs/hep-ph/0206310
http://arxiv.org/abs/hep-ph/0508054
http://arxiv.org/abs/hep-ph/0601100
http://arxiv.org/abs/hep-ph/0401100
http://arxiv.org/abs/hep-ph/0702136
http://arxiv.org/abs/hep-ph/0511289
http://arxiv.org/abs/hep-ph/0605012
http://arxiv.org/abs/hep-ph/9308333
http://arxiv.org/abs/hep-ph/0605177
http://arxiv.org/abs/hep-ph/0703035
http://arxiv.org/abs/hep-ph/0611102
http://arxiv.org/abs/hep-ph/0509372
http://arxiv.org/abs/hep-ex/0508058
http://arxiv.org/abs/hep-ex/0606027
FIG. 1: Left plot: Combined constraints from low-energy observables and dark matter in the tan β–MH plane setting [µ,Mℓ̃] =
[0.5, 0.4] TeV. The light-blue area is excluded by the dark-matter conditions [16]. Within the red (green) area all the reference
values of the low-energy observables (but for aµ) are satisfied. The yellow band denote the area where the stau coannihilation
mechanism is active (1 < Mτ̃R/MB̃ < 1.1); in this area the A-funnel region (where MH ≈ 2M1) and the stau coannihilation
region overlap. Right plot: ∆aµ = (gµ−g
µ )/2 vs. the slepton mass within the funnel region taking into account the B → Xsγ
constraint and setting RBτν > 0.7 (blue), RBτν > 0.8 (red), RBτν > 0.9 (green) [16]. The supersymmetric parameters have
been varied in the following ranges: 200 GeV ≤ M2 ≤ 1000 GeV, 500 GeV ≤ µ ≤ 1000 GeV, 10 ≤ tan β ≤ 50. In both plots,
we have set AU = −1 TeV, Mq̃ = 1.5 TeV, and imposed the GUT relation M1 ≈ M2/2 ≈ M3/6.
FIG. 2: Isolevel curves for B(µ → eγ) and B(τ → µγ) assuming |δ12LL| = 10
−4 and |δ23LL| = 10
−2 in the tan β–MH plane [16].
The green/red areas correspond to the allowed regions for the low-energy observables illustrated in Figure 1 for [µ,M
[0.5, 0.4] TeV.
ABSTRACT
  We discuss the phenomenological impact of a particularly interesting corner
of the MSSM: the large tan(beta) regime. The capabilities of leptonic and
hadronic Flavor Violating processes in shedding light on physics beyond the
Standard Model are reviewed. Moreover, we show that tests of Lepton
Universality in charged current processes can represent an interesting handle
to obtain relevant information on New Physics scenarios.

<|endoftext|><|startoftext|>
Introduction
Let Ω be a bounded hyperconvex domain in Cn. By PSH(Ω) we denote the set of plurisub-
harmonic (psh) functions on Ω. In [BT 1,2] the authors established and used the compari-
son principle to study the Dirichlet problem in PSH∩L∞loc(Ω). Recently, Cegrell introduced
a general class E of psh functions on which the complex Monge-Ampère operator (ddc.)n
can be defined. He obtained many important results of pluripotential theory in the class E .
For example, the ones on the comparison principle and solvability of the Dirichlet problem
(see [Ce 1-3]).
The main result of our paper are Theorem 4.1 and some Xing type comparision principles.
Theorem 4.1 is generalize Lemma 5.4 in [Ce1], Lemma 7.2 in [Åh] and Lemma 3.4 in [Ce3].
For definitions of Cegrell’s classes see Section 2. After giving some preliminaries, we start in
Proposition 3.1 with a comparison principle, which is analogous to a comparison principle
due to Xing (Lemma 1 in [Xi1]). It should be observed that our proof is quite different
from Xing’s proof, and the inequality we obtain is slightly stronger than Xing’s inequality,
even in the case of bounded psh functions. Using Proposition 3.1, we give in Theorem
3.5 a sufficient condition for Cn-capacity convergence of a sequence of psh functions in
the class F . This result should be compared to Theorem 3 of [Xi1] where the situation
of bounded psh functions was studied. Applying Theorem 3.5 we give generalizations of
recent results in [Cz] and [CLP] about convergences of multipole Green functions and a
criterion for pluripolarity, respectively. Section 4 focuses on Theorem 4.1 and Theorem
4.9. By applying Theorem 4.1 we give some results on class Cegrell’s classes. We prove in
Proposition 4.4 a local estimate for the Monge-Ampère measure in terms of the Beford-
Taylor relative capacity. As an application, we give in Theorem 4.5 a decomposition result
for Monge-Ampère measure, which is similar in spirit to Theorem 6.3 in [Ce1]. From
Proposition 3.1 and Theorem 4.1 we obtain easily a Xing type comparison principle for
functions in classes F and E .
Acknowledgment. We are grateful to Professor Urban Cegrell for useful discussions that
helped to improve the paper. We are grateful to Per Åhag for fruitful comments. This
work is supported by the National Research Program for Natural Sciences, Vietnam.
http://arxiv.org/abs/0704.0359v1
2. Preliminaries
First we recall some elements of pluripotential theory that will be used throughout the
paper. All this can be found in [BT2], [Ce1], [Ce2], [Le].
2.1. We will always denote by Ω a bounded hyperconvex domain in Cn unless other wise
stated. The Cn-capacity in the sense of Bedford and Taylor on Ω is the set function given
Cn(E) = Cn(E,Ω) = sup{
(ddcu)n : u ∈ PSH(Ω), −1 ≤ u ≤ 0}
for every Borel set E in Ω. It is proved in [BT2] that
Cn(E) =
(ddch∗E,Ω)
where h∗E,Ω is the upper regularization of the relative extremal function hE,Ω for E (relative
to Ω) i.e.,
hE,Ω(z) = sup{u(z) : u ∈ PSH
−(Ω), u ≤ −1 on E}.
The following concepts are taken from [Xi1] and [Xi2]
∗A sequence of functions uj on Ω is said to converge to a function u in Cn-capacity on a
set E ⊂ Ω if for every δ > 0 we have Cn({z ∈ E : |uj(z) − u(z)| > δ}) → 0 as j → ∞.
∗A family of positive measures {µα} on Ω is called uniformly absolutely continuous with
respect to Cn-capacity in a set E ⊂ Ω if for every ǫ > 0 there exists δ > 0 such that for
each Borel subset F ⊂ E with Cn(F)< δ the inequality µα(F)< ǫ holds for all α. We write
µα ≪ Cn in E uniformly for α.
2.2. The following classes of psh functions were introduced by Cegrell in [Ce1] and [Ce2]
E0 = E0(Ω) = {ϕ ∈ PSH
−(Ω) ∩ L∞(Ω) : lim
ϕ(z) = 0,
(ddcϕ)n < +∞},
F = F(Ω) = {ϕ ∈ PSH−(Ω) : ∃ E0(Ω) ∋ ϕj ց ϕ, sup
(ddcϕj)
n < +∞},
E = E(Ω) = {ϕ ∈ PSH−(Ω) : ∃ ϕK ∈ F(Ω) such that ϕK = ϕ on K, ∀K ⊂⊂ Ω},
Ea = Ea(Ω) = {u ∈ E(Ω) : (ddcu)n(E) = 0 ∀ E is pluripolar in Ω}.
For each u ∈ F(Ω), we set
e0(u) =
(ddcu)n.
2.3. Let A = {(wj , νj)}j=1,...,p be a finite subset of Ω × R
+. According to Lelong (see
[Le]), the pluricomplex Green function with poles in A is defined by
g(A)(z) = sup{u(z) : u ∈ LA}
where
LA = {u ∈ PSH
−(Ω) : u(z) − νj log |z − wj | ≤ O(1) as z → wj , j = 1, ..., p}
ν(A) =
νnj , Â = {wj}j=1,...,p.
2.4. We write lim
[u(z) − v(z)] ≥ a if for every ǫ > 0 there exists a compact set K in Ω
such that
u(z) − v(z) ≥ a− ǫ for z ∈ (Ω\K) ∩ {u > −∞}
v(z) = −∞ for z ∈ (Ω\K) ∩ {u = −∞}.
2.5. Xing’s comparison principle (see Lemma 1 in [Xi1]). Let Ω be a bounded open subset
in Cn and u, v ∈ PSH∩L∞(Ω) satisfy lim
[u(z)− v(z)] ≥ 0. Then for any constant r ≥ 1
and all wj ∈ PSH(Ω) with 0 ≤ wj ≤ 1, j = 1, 2, ..., n we have
(n!)2
{u<v}
(v − u)nddcw1 ∧ ... ∧ dd
{u<v}
(r − w1)(dd
v)n ≤
{u<v}
(r − w1)(dd
3. Some convergence theorems
In order to study the convergence of a sequence of psh functions in Cn-capacity, we start
with the following.
3.1. Proposition. a) Let u, v ∈ Fsuch that u ≤ v on Ω. Then for 1 ≤ k ≤ n
(v − u)kddcw1 ∧ ... ∧ dd
cwn +
(r − w1)(dd
cv)k ∧ ddcwk+1 ∧ ... ∧ dd
(r − w1)(dd
cu)k ∧ ddcwk+1 ∧ ... ∧ dd
for all wj ∈ PSH(Ω), 0 ≤ wj ≤ 1, j = 1, ..., k, wk+1, ..., wn ∈ F and all r ≥ 1.
b) Let u, v ∈ E such that u ≤ v on Ω and u = v on Ω\K for some K ⊂⊂ Ω. Then for
1 ≤ k ≤ n
(v − u)kddcw1 ∧ ... ∧ dd
cwn +
(r − w1)(dd
cv)k ∧ ddcwk+1 ∧ ... ∧ dd
(r − w1)(dd
cu)k ∧ ddcwk+1 ∧ ... ∧ dd
for all wj ∈ PSH(Ω), 0 ≤ wj ≤ 1, j = 1, ..., k, wk+1, ..., wn ∈ E and all r ≥ 1.
We proceed through some lemmas.
3.2. Lemma. Let u, v ∈ PSH ∩ L∞(Ω) such that u ≤ v on Ω and lim
[u(z) − v(z)] = 0.
(v − u)kddcw ∧ T ≤ k
(1 − w)(v − u)k−1ddcu ∧ T
for all w ∈ PSH(Ω), 0 ≤ w ≤ 1 and all positive closed currents T .
Proof. First, assume u, v ∈ PSH∩L∞(Ω), u ≤ v on Ω and u = v on Ω\K, K ⊂⊂ Ω. Then,
using the Stokes formula we obtain
(v − u)kddcw ∧ T =
(v − u)kddc(w − 1) ∧ T
(w − 1)ddc(v − u)k ∧ T
= −k(k − 1)
(1 − w)d(v − u) ∧ dc(v − u) ∧ T
(1 − w)(v − u)k−1ddc(u− v) ∧ T
(1 − w)(v − u)k−1ddc(u− v) ∧ T
(1 − w)(v − u)k−1ddcu ∧ T.
General case, for each ǫ > 0 we set vǫ = max(u, v − ǫ). Then vǫ ր v on Ω, vǫ ≥ u on Ω
and vǫ = u on Ω\K for some K ⊂⊂ Ω. Hence
(vǫ − u)
kddcw ∧ T ≤ k
(1 − w)(vǫ − u)
k−1ddcu ∧ T.
Since 0 ≤ vǫ − u ր v − u as ǫ ց 0, letting ǫ ց 0 we get
(v − u)kddcw ∧ T ≤ k
(1 − w)(v − u)k−1ddcu ∧ T.
3.3. Lemma. Let u, v ∈ PSH ∩ L∞(Ω) such that u ≤ v on Ω and lim
[u(z) − v(z)] = 0.
Then for 1 ≤ k ≤ n
(v − u)kddcw1 ∧ ... ∧ dd
cwn +
(r − w1)(dd
cv)k ∧ T
(r − w1)(dd
cu)k ∧ T.
for all w1, ..., wk ∈ PSH(Ω), 0 ≤ wj ≤ 1 ∀ j = 1, ..., k, wk+1, ..., wn ∈ E and all r ≥ 1.
Proof. To simplify the notation we set
T = ddcwk+1 ∧ ... ∧ dd
First, assume that u, v ∈ PSH ∩ L∞(Ω), u ≤ v on Ω, and u = v on Ω\K, K ⊂⊂ Ω. Using
Lemma 3.2 we get
(v − u)kddcw1 ∧ ... ∧ dd
cwn ≤ k
(v − u)k−1ddcw1 ∧ ... ∧ dd
cwk−1 ∧ dd
cu ∧ T
≤ ...
(v − u)ddcw1 ∧ (dd
cu)k−1 ∧ T
(v − u)ddcw1 ∧ [
(ddcu)i ∧ (ddcv)k−i−1] ∧ T
(w1 − r)dd
c(v − u) ∧ [
(ddcu)i ∧ (ddcv)k−i−1] ∧ T
(r − w1)dd
c(u− v) ∧ [
(ddcu)i ∧ (ddcv)k−i−1] ∧ T
(r − w1)[(dd
cu)k − (ddcv)k] ∧ T.
General case, for each ǫ > 0 we put vǫ = max(u, v − ǫ). Then vǫ ր v on Ω, vǫ ≥ u on Ω
and vǫ = u on Ω\K for some K ⊂⊂ Ω. Hence
(vǫ − u)
kddcw1 ∧ ... ∧ dd
cwn +
(r − w1)(dd
k ∧ T
(r − w1)(dd
u)k ∧ T.
Observe that 0 ≤ vǫ − u ր v− u and (dd
k ∧ T → (ddcv)k ∧ T weakly as ǫ ց 0, r−w1
is lower semicontinuous, by letting ǫ ց 0 we have
(v − u)kddcw1 ∧ ... ∧ dd
cwn +
(r − w1)(dd
cv)k ∧ T
(r − w1)(dd
cu)k ∧ T.
The proof is finished.
Proof of Proposition 3.1. a) Let E0 ∋ uj ց u and E0 ∋ vj ց v as in the definition of F .
Replace vj by max(uj , vj) we may assume that uj ≤ vj for j ≥ 1. By Lemma 3.3 we have
(vj − ut)
kddcw1 ∧ ... ∧ dd
cwn +
(r − w1)(dd
k ∧ ddcwk+1 ∧ ... ∧ dd
(r − w1)(dd
k ∧ ddcwk+1 ∧ ... ∧ dd
for t ≥ j ≥ 1. By Proposition 5.1 in [Ce2] letting t → ∞ in the above inequality we have
(vj − u)
w1 ∧ ... ∧ dd
(r − w1)(dd
k ∧ T
(r − w1)(dd
cu)k ∧ T
for j ≥ 1. Next letting j → ∞ again by Proposition 5.1 in [Ce2] we get the desired
conclusion.
b) Let G,W be open sets such that K ⊂⊂ G ⊂⊂ W ⊂⊂ Ω. According to the remark
following Definition 4.6 in [Ce2] we can choose a function ṽ ∈ F such that ṽ ≥ v and ṽ = v
on W . Set
u on G
ṽ on Ω\G
Since u = v = ṽ on W\K we have ũ ∈ PSH−(Ω). It is easy to see that ũ ∈ F , ũ ≤ ṽ and
ũ = u on W . By a) we have
(ṽ − ũ)kddcw1 ∧ ... ∧ dd
cwn +
(r − w1)(dd
cṽ)k ∧ ddcwk+1 ∧ ... ∧ dd
(r − w1)(dd
ũ)k ∧ ddcwk+1 ∧ ... ∧ dd
Since ũ = ṽ on Ω\G we have
(ṽ − ũ)kddcw1 ∧ ... ∧ dd
cwn +
(r − w1)(dd
cṽ)k ∧ ddcwk+1 ∧ ... ∧ dd
(r − w1)(dd
cũ)k ∧ ddcwk+1 ∧ ... ∧ dd
Since ũ = u, ṽ = v on W and u = v on Ω\K we obtain
(v − u)kddcw1 ∧ ... ∧ dd
cwn +
(r − w1)(dd
cv)k ∧ ddcwk+1 ∧ ... ∧ dd
(r − w1)(dd
cu)k ∧ ddcwk+1 ∧ ... ∧ dd
3.4. Proposition. Let u, v ∈ F and u ≤ v on Ω. Then
(v − u)nddcw1 ∧ ... ∧ dd
(−w1)[(dd
u)n − (ddcv)n]
for all wj ∈ PSH(Ω), −1 ≤ wj ≤ 0, j = 1, ..., n.
Proof. The proposition follows from Proposition 3.1 with k = n, r = 1 and wj are replaced
by wj + 1.
3.5. Theorem. Let u, uj ∈ F and uj ≤ u for j ≥ 1. Assume that sup
(ddcuj)
n < +∞
and ||(ddcuj)
n − (ddcu)n||E → 0 as j → ∞ for all E ⊂⊂ Ω. Then uj → u in Cn-capacity
on every E ⊂⊂ Ω as j → ∞.
Proof. Let Ω′ ⊂⊂ Ω and δ > 0. Put
Aj = {z ∈ Ω′ : |uj − u| ≥ δ} = {z ∈ Ω′ : u− uj ≥ δ}.
We prove that Cn(Aj) → 0 as j → ∞. Given ǫ > 0. By quasicontinuity of u and uj , there
is an open set G in Ω such that Cn(G) < ǫ, and uj |Ω\G, u|Ω\G are continuous. We have
Aj = Bj ∪ {z ∈ G : u− uj ≥ δ}.
where Bj = {z ∈ Ω′\G : u− uj ≥ δ} are compact sets in Ω and
Cn(Aj) ≤ lim
Cn(Bj) + ǫ
We claim that lim
Cn(Bj) = 0. By Proposition 3.4 we have
Cn(Bj) =
(ddch∗Bj )
(u− uj)
n(ddch∗Bj )
(−h∗Bj )[(dd
n − (ddcu)n]
{||(ddcuj)
n − (ddcu)n||K +
(−hΩ′)[(dd
n + (ddcu)n]}
{||(ddcuj)
n − (ddcu)n||K + sup
|hΩ′ |[sup
(ddcuj)
(ddcu)n]}.
As lim
hΩ′(z) = 0 there exists K ⊂⊂ Ω such that
|hΩ′ |[sup
(ddcuj)
(ddcu)n] < ǫ.
By the hypothesis
||(ddcuj)
n − (ddcu)n||K < ǫ for j > j0.
Cn(Bj) < 2ǫ for j > j0.
This proves the claim and hence the theorem.
As an application of Theorem 3.5 we have the following
3.6. Proposition. Let g(Aj) be multipolar Green functions on Ω such that
Âj = {w
1, ..., w
} → ∂Ω and sup
ν(Aj) = sup
)n < +∞
Then g(Aj) → 0 as j → ∞ in Cn-capacity.
Proof. By the hypothesis we have
(ddcg(Aj))
n(Ω) = sup
ν(Aj) < +∞
||(ddcg(Aj))
n||K → 0 as j → ∞ for all K ⊂⊂ Ω.
Theorem 3.5 implies that g(Aj) → 0 as j → ∞ in Cn-capacity.
This section ends up with a criterion for pluripolarity
3.7. Theorem. Let uj ∈ F such that sup
(ddcuj)
n < +∞.
Then there is a constant A > 0 such that
i)( lim
∗ ∈ F .
ii)Cn({z ∈ Ω : ( lim
∗(z) < −t}) ≤ A
iii){z ∈ Ω : lim
uj(z) = −∞} is pluripolar.
Proof. i) For each j ≥ 1 put vj = sup{uj, uj+1, ...}. By [Ce2] v
j ∈ F and
(ddcv∗j )
n ≤ sup
(ddcuj)
n < +∞.
By [Ce2] we have v∗j ց v ∈ F .
ii) By Proposition 3.1 in [CKZ] we have
Cn{z ∈ Ω : ( lim
∗(z) < −t} = Cn{z ∈ Ω : v(z) < −t} ≤
2ne0(v)
where A = 2ne0(v).
iii) According to [BT2] we have
Cn{z ∈ Ω : lim
uj(z) = −∞} = Cn{z ∈ Ω : v(z) = −∞} = 0.
Remark. Theorem 3.7 in the case where uj are multipole Green functions was proved by
D.Coman, N.Levenberg and A.Poletsky in Theorem 4.1 of [CLP].
4. Some properties of the Cegrell’s classes and applications
In this section, first we prove the following
4.1. Theorem. Let u, u1, ..., un−1 ∈ E , v ∈ PSH
−(Ω) and T = ddcu1 ∧ ... ∧ dd
cun−1.
ddc max(u, v) ∧ T |{u>v} = dd
cu ∧ T |{u>v}.
We need the following well-known fact.
4.2. Lemma. Let µ be a measure on Ω and f : Ω → R a measurable function on Ω. The
following are equivalent
i)µ(E) = 0 for all Borell sets E ⊂ {f 6= 0}.
fdµ = 0 for every measurable set E in Ω.
Proof. i)⇒ii) follows from:
fdµ =
E\{f=0}
fdµ +
E∩{f=0}
fdµ = 0
ii)⇒i). It suffices to show that µ = 0 on every Xδ = {f > δ > 0}. By the Hahn
decomposition theorem, there exist measurable subsets X+
and X−
of Xδ such that Xδ =
= ∅ and µ ≥ 0 on X+
, µ ≤ 0 on X−
. We have
δµ(X+
fdµ = 0
δµ(X−
fdµ = 0
Hence, µ(X+
) = µ(X−
) = 0. Therefore, we have µ = 0 on Xδ.
Proof of Theorem 4.1.
a) First we prove the proposition for v ≡ a < 0. According to the remark following
Definition 4.6 in [Ce2], without loss of generality we may assume that u, u1, ..., un−1 ∈ F .
Using Theorem 2.1 in [Ce2] we can find
E0 ∩ C(Ω̄) ∋ u
j ց u, E0 ∩ C(Ω̄) ∋ u
ց uk, k = 1, ..., n− 1.
Since {uj > a} is open we have
ddc max(uj, a) ∧ Tj |{uj>a} = dd
cuj ∧ Tj |{uj>a}.
Thus from the inclusion {u > a} ⊂ {uj > a} we obtain
ddc max(uj , a) ∧ Tj |{u>a} = dd
cuj ∧ Tj |{u>a}.
where Tj = dd
1 ∧ ... ∧ dd
n−1. By Corollary 5.2 in [Ce2], it follows that
max(u− a, 0)ddc max(uj , a) ∧ Tj → max(u− a, 0)dd
c max(u, a) ∧ T.
max(u− a, 0)ddcuj ∧ Tj → max(u− a, 0)dd
cu ∧ T.
Hence
max(u− a, 0)[ddc max(u, a) ∧ T − ddcu ∧ T ] = 0.
Using Lemma 4.2 we have
ddc max(u, a) ∧ T = ddcu ∧ T on {u > a}.
b) Assume that v ∈ PSH−(Ω). Since {u > v} =
{u > a > v}, it suffices to show
ddc max(u, v) ∧ T = ddcu ∧ T on {u > a > v}
for all a ∈ Q−. Since max(u, v) ∈ E , by a) we have
ddc max(u, v) ∧ T |{max(u,v)>a} = dd
c max(max(u, v), a) ∧ T |{max(u,v)>a}
= ddc max(u, v, a) ∧ T |{max(u,v)>a}.
(2) ddcu ∧ T |{u>a} = dd
c max(u, a) ∧ T |{u>a}.
Since max(u, v, a) = max(u, a) on set open {a > v} , we have
(3) ddc max(u, v, a) ∧ T |{a>v} = dd
c max(u, a) ∧ T |{a>v}.
Since {u > a > v} ⊂ {u > a}, {a > v}, {max(u, v) > a} and (1), (2), (3) we have
ddc max(u, v) ∧ T |{u>a>v} = dd
cu ∧ T |{u>a>v}.
The next result is an analogue of an inequality due to Demaily in [De2]
4.3. Proposition. a) u, v ∈ E such that (ddcu)n({u = v = −∞}) = 0. Then
(ddc max(u, v))n ≥ 1{u≥v}(dd
cu)n + 1{u<v}(dd
where 1E denotes the characteristic function of E.
b) Let µ be a positive measure which vanishes on all pluripolar subsets of Ω. Suppose
u, v ∈ E such that (ddcu)n ≥ µ, (ddcv)n ≥ µ. Then (ddc max(u, v))n ≥ µ.
Proof. a) For each ǫ > 0 put Aǫ = {u = v − ǫ}\{u = v = −∞}. Since Aǫ ∩ Aδ = ∅ for
ǫ 6= δ there exists ǫj ց 0 such that (dd
cu)n(Aǫj ) = 0 for j ≥ 1. On the other hand, since
(ddcu)n({u = v = −∞}) = 0 we have (ddcu)n({u = v− ǫj}) = 0 for j ≥ 1. Since Theorem
4.1 it follows that
(ddc max(u, v − ǫj))
n ≥ (ddc max(u, v − ǫj))
n|{u>v−ǫj} + (dd
c max(u, v − ǫj))
n|{u<v−ǫj}
= (ddcu)n|{u≥v−ǫj} + (dd
cv)n|{u<v−ǫj}
= 1{u≥v−ǫj}(dd
cu)n + 1{u<v−ǫj}(dd
≥ 1{u≥v}(dd
cu)n + 1{u<v−ǫj}(dd
cv)n.
Letting j → ∞ and by Remark under Theorem 5.15 in [Ce2] we get
(ddc max(u, v))n ≥ 1{u≥v}(dd
cu)n + 1{u<v}(dd
because max(u, v − ǫj) ր max(u, v) and 1{u<v−ǫj} ր 1{u<v} as j → ∞.
b) Argument as a)
4.4. Proposition. Let u1, ..., uk ∈ PSH(Ω) ∩ L
∞(Ω) and uk+1, ..., un ∈ E . Then
ddcu1 ∧ ... ∧ dd
cun = O((Cn(B))
n ) for all Borel sets B ⊂ Ω′ ⊂⊂ Ω.
B(a,r)
ddcu1 ∧ ... ∧ dd
cun = o((Cn(B(a, r)))
n ) as r → 0 for all a ∈ Ω.
where B(a, r) = {z ⊂ Cn : |z − a| < r}
Proof. We may assume that 0 ≤ uj ≤ 1 for j = 1, ..., k. On the other hand, by the remark
following Defintion 4.6 in [Ce2] we again may assume that uk+1, ..., un ∈ F .
i) For each open set B ⊂⊂ Ω, applying Proposition 3.1 we get
ddcu1 ∧ ... ∧ dd
cun =
(−h∗B)
kddcu1 ∧ ... ∧ dd
(−h∗B)
kddcu1 ∧ ... ∧ dd
(1 − u1)(dd
ch∗B)
k ∧ ddcuk+1 ∧ ... ∧ dd
(ddch∗B)
k ∧ ddcuk+1 ∧ ... ∧ dd
≤ k![
(ddch∗B)
n ∧ [
(ddcuk+1)
n ∧ ... ∧ [
(ddcun)
(by Corollary 5.6 in [Ce2])
≤ k!(e0(uk+1))
n ...(e0(un))
n .[Cn(B)]
≤ constants.[Cn(B)]
Hence
ddcu1 ∧ ... ∧ dd
cun ≤ constants.[Cn(B)]
for all Borel set B ⊂ Ω.
ii) By Proposition 3.1 we have
(−ϕ)kddcu1 ∧ ... ∧ dd
un ≤ k!
(1 − u1)(dd
ϕ)k ∧ ddcuk+1 ∧ ... ∧ dd
(ddcϕ)k ∧ ddcuk+1 ∧ ... ∧ dd
cun < +∞.
Hence (−ϕ)k ∈ L1(dd
cu1 ∧ ... ∧ dd
cun) for all ϕ ∈ F(Ω). Given a ∈ Ω let r0, R0 such that
B(a, r0) ⊂⊂ Ω ⊂⊂ B(a, R0). Then
|z − a|
≤ ga(z) ≤ log
|z − a|
for all z ∈ Ω, where ga denotes the Green function of Ω with pole at a. Since (−ga)
L1(dd
cu1 ∧ ... ∧ dd
cun), it follows that
B(a,r)
(−ga)
kddcu1 ∧ ... ∧ dd
cun → 0 as r → 0
Hence
(log r0 − log r)
B(a,r)
ddcu1 ∧ ... ∧ dd
cun ≤
B(a,r)
(−ga)
kddcu1 ∧ ... ∧ dd
cun → 0
as r → 0. This means that
B(a,r)
ddcu1 ∧ ... ∧ dd
cun = o((
log r0 − log r
)k) as r → 0
Combining this with the inequality
Cn(B(a, r),Ω) ≥ Cn(B(a, r), B(a, R0)) = (
logR0 − log r
)n = O((
log r0 − log r)n
we get
B(a,r)
ddcu1 ∧ ... ∧ dd
cun = o((Cn(B(a, r)))
The next result should be compared with Theorem 6.3 in [Ce1]
4.5. Theorem. Let u1, ..., un ∈ E . Then there exists ũ ∈ E
a such that
ddcu1 ∧ ... ∧ dd
cun = (dd
cũ)n + ddcu1 ∧ ... ∧ dd
cun|{u1=...=un=−∞}.
Proof. First, we write
ddcu1 ∧ ... ∧ dd
cun = µ + dd
cu1 ∧ ... ∧ dd
cun|{u1=...=un=−∞}.
where
µ = ddcu1 ∧ ... ∧ dd
cun|{u1>−∞}∪...∪{un>−∞}.
It is easy to see that µ ≪ Cn in every E ⊂⊂ Ω. Indeed, by Theorem 4.1 we have
ddcu1 ∧ ... ∧ dd
cun|{u1>−j} = dd
c max(u1,−j) ∧ ... ∧ dd
cun|{u1>−j}.
Hence, by Proposition 4.4 (i) it follows that ddcu1 ∧ ... ∧ dd
cun|{u1>−j} ≪ Cn in every
E ⊂⊂ Ω. Next, it remains to show that there exists ũ ∈ Ea such that µ = (ddcũ)n. Let
{Ωj} be an increasing exhaustion sequence of Ω. For each j ≥ 1 put µj = µ|Ωj . By [Åh]
there exists ũj ∈ F such that (dd
cũj)
n = µj . Notice that µj ր µ and
(ddcũj)
n ≤ µ ≤ (ddc(u1 + ... + un))
Applying the comparison principle we obtain
ũj ց ũ ≥ u1 + ... + un ∈ E .
Hence, ũ ∈ Ea and (ddcũ)n = lim
(ddcũj)
n = µ. The proof is thereby completed.
4.6. Corollary. u1, ..., un ∈ E . Then the following are equivalent
i) ddcu1 ∧ ... ∧ dd
cun ≪ Cn in every E ⊂⊂ Ω.
{u1=...=un=−∞}
ddcu1 ∧ ... ∧ dd
cun = 0.
{u1<−s,...,un<−s}∩E
ddcu1 ∧ ... ∧ dd
cun → 0 as s → +∞ for all E ⊂⊂ Ω.
Proof. Direct application of Theorem 4.5.
The comparison principle for class F was studied in [Ce3] and [H1]. By using Proposition
3.1 and Theorem 4.1 we prove a Xing type comparison principle for F
4.7. Theorem. Let u ∈ F , v ∈ E and 1 ≤ k ≤ n. Then
{u<v}
(v − u)kddcw1 ∧ ... ∧ dd
cwn +
{u<v}
(r − w1)(dd
cv)k ∧ ddcwk+1 ∧ ... ∧ dd
{u<v}∪{u=v=−∞}
(r − w1)(dd
cu)k ∧ ddcwk+1 ∧ ... ∧ dd
for all wj ∈ PSH(Ω), 0 ≤ wj ≤ 1, j = 1, ..., k, wk+1, ..., wn ∈ F and all r ≥ 1.
Proof. Let ǫ > 0. We set ṽ = max(u, v − ǫ). By a) in Proposition 3.1 we have
(ṽ − u)kddcw1 ∧ ... ∧ dd
cwn +
(r − w1)(dd
cṽ)k ∧ ddcwk+1 ∧ ... ∧ dd
(r − w1)(dd
cu)k ∧ ddcwk+1 ∧ ... ∧ dd
Since {u < ṽ} = {u < v − ǫ} and Theorem 4.1 we have
{u<v−ǫ}
(v− ǫ−u)kddcw1 ∧ ...∧ dd
cwn +
{u≤v−ǫ}
(r−w1)(dd
cv)k ∧ ddcwk+1 ∧ ...∧ dd
{u≤v−ǫ}
(r − w1)(dd
cu)k ∧ ddcwk+1 ∧ ... ∧ dd
{u<v}∪{u=v=−∞}
(r − w1)(dd
cu)k ∧ ddcwk+1 ∧ ... ∧ dd
Letting ǫ ց 0 we obtain
{u<v}
(v − u)kddcw1 ∧ ... ∧ dd
cwn +
{u<v}
(r − w1)(dd
cv)k ∧ ddcwk+1 ∧ ... ∧ dd
{u<v}∪{u=v=−∞}
(r − w1)(dd
cu)k ∧ ddcwk+1 ∧ ... ∧ dd
4.8. Corollary. Let u ∈ Ea such that u ≥ v for all functions v ∈ E satisfying (ddcu)n ≤
(ddcv)n. Then
{u<v}
(v − u)nddcw1 ∧ ... ∧ dd
{u<v}
(r − w1)(dd
{u<v}
(r − w1)(dd
for all v ∈ E , r ≥ 1 and all w1, ..., wn ∈ PSH(Ω), 0 ≤ w1, ..., wn ≤ 1.
Proof. Let {Ωj} be an increasing exhaustion sequence of relatively compact subdomains
of Ω. Set µj = 1Ωj1{u>−j}(dd
cu)n, where 1E denotes the characteristic function of E ⊂ Ω.
Applying Theorem 4.1 we have
µj = 1Ωj1{u>−j}(dd
c max(u,−j))n ≤ 1Ωj (dd
c max(u,−j))n.
Take φ ∈ E0(Ω) ∩ C(Ω̄). Put
φj = max(u,−j, ajφ)
where aj =
. Then φj = max(u,−j) on Ωj+1, φj ∈ E0 and
µj ≤ 1Ωj (dd
c max(u,−j))n = 1Ωj (dd
n ≤ (ddcφj)
By Ko lodziej’s theorem (see [Ko]) there exists uj ∈ E0 such that
(ddcuj)
n = µj = 1Ωj1{u>−j}(dd
cu)n, ∀ j ≥ 1.
for all j ≥ 1. By the comparison principle we have uj ց ũ ≥ u. On the other hand, since
(ddcu)n({u = −∞}) = 0, it follows that
(ddcuj)
n = 1Ωj1{u>−j}(dd
cu)n → (ddcu)n
weakly as j → ∞. Thus (ddcũ)n = lim
(ddcuj)
n = (ddcu)n. By the hypothesis we have
ũ = u. Applying Theorem 4.7 we get
{uj<v}
(v − uj)
nddcw1 ∧ ... ∧ dd
cwn +
{uj<v}
(r − w1)(dd
{uj<v}
(r − w1)(dd
{uj<v}
(r − w1)(dd
cu)n.
Letting j → ∞ we obtain
{u<v}
(v − u)nddcw1 ∧ ... ∧ dd
cwn +
{u<v}
(r − w1)(dd
Arguing as in Theorem 4.7 we prove a Xing type comparison principle for E .
4.9. Theorem. Let u, v ∈ E and 1 ≤ k ≤ n such that lim
[u(z) − v(z)] ≥ 0. Then
{u<v}
(v − u)kddcw1 ∧ ... ∧ dd
cwn +
{u<v}
(r − w1)(dd
cv)k ∧ ddcwk+1 ∧ ... ∧ dd
{u<v}∪{u=v=−∞}
(r − w1)(dd
cu)k ∧ ddcwk+1 ∧ ... ∧ dd
for all wj ∈ PSH(Ω), 0 ≤ wj ≤ 1, j = 1, ..., k, wk+1, ..., wn ∈ E and all r ≥ 1.
Proof. Let ǫ > 0. We set ṽ = max(u, v − ǫ). By b) in Proposition 3.1 we have
(ṽ − u)kddcw1 ∧ ... ∧ dd
cwn +
(r − w1)(dd
cṽ)k ∧ ddcwk+1 ∧ ... ∧ dd
(r − w1)(dd
u)k ∧ ddcwk+1 ∧ ... ∧ dd
Since {u < ṽ} = {u < v − ǫ} and Theorem 4.1 we have
{u<v−ǫ}
(v− ǫ−u)kddcw1 ∧ ...∧ dd
cwn +
{u≤v−ǫ}
(r−w1)(dd
cv)k ∧ ddcwk+1 ∧ ...∧ dd
{u≤v−ǫ}
(r − w1)(dd
cu)k ∧ ddcwk+1 ∧ ... ∧ dd
{u<v}∪{u=v=−∞}
(r − w1)(dd
cu)k ∧ ddcwk+1 ∧ ... ∧ dd
Letting ǫ ց 0 we obtain
{u<v}
(v − u)kddcw1 ∧ ... ∧ dd
cwn +
{u<v}
(r − w1)(dd
cv)k ∧ ddcwk+1 ∧ ... ∧ dd
{u<v}∪{u=v=−∞}
(r − w1)(dd
u)k ∧ ddcwk+1 ∧ ... ∧ dd
References
[Åh] P. Åhag, The complex Monge-Ampère operator on bounded hyperconvex domains,
Ph. D. Thesis, Ume̊a University, (2002).
[Bl1] Z. Blocki, On the definition of the Monge-Ampère operator in C2, Math. Ann., 328
(2004), 415-423.
[Bl2] Z. Blocki, Weak solutions to the complex Hessian equation, Ann. Inst. Fourier 55
(2005), 1735-1756.
[BT1] E. Bedford and B.A.Taylor, The Dirichlet problem for the complex Monge-Ampère
operator. Invent. Math.37 (1976), 1-44.
[BT2] E. Bedford and B.A.Taylor, A new capacity for plurisubharmonic functions. Acta
Math., 149 (1982), 1-40.
[BT3] E. Bedford and B.A.Taylor, Fine topology, Silov boundary, and (ddc)n. J. Funct.
Anal. 72 (1987), 225-251.
[Ce1] U. Cegrell, Pluricomplex energy. Acta Math., 180 (1998), 187-217.
[Ce2] U. Cegrell, The general definition of the complex Monge-Ampère operator. Ann.
Inst. Fourier (Grenoble) 54 (2004), 159-179.
[Ce3] U. Cegrell, A general Dirichlet problem for the complex Monge-Ampère operator,
preprint (2006).
[CKZ] U. Cegrell, S. Ko lodziej and A. Zeriahi, Subextention of plurisubharmonic functions
with weak singularities. Math. Zeit., 250 (2005), 7-22.
[Cz] R. Czyz, Convergence in capacity of the Perron-Bremermann envelope, Michigan
Math. J., 53 (2005), 497-509.
[CLP] D. Coman, N. Levenberg and E.A. Poletsky,Quasianalyticity and pluripolarity, J.
Amer. Math. Soc., 18 (2005), 239-252.
[De1] J-P. Demailly, Monge-Ampère operators, Lelong Numbers and Intersection theory,
Complex Analysis and Geometry, Univ. Ser. Math., Plenum, New York, 1993, 115-193.
[De2] J-P. Demailly, Potential theory in several variables, preprint (1989).
[Ko] S. Ko lodziej, The range of the complex Monge-Ampère operator, II, Indiana Univ.
Math. J., 44 (1995), 765-782.
[H1] P. Hiep, A characterization of bounded plurisubharmonic functions, Ann. Polon.
Math., 85 (2004), 233-238.
[H2] P. Hiep, The comparison principle and Dirichlet problem in the class Ep(f), p > 0,
Ann. Polon. Math., 88 (2006), 247-261.
[Le] P .Lelong, Notions capacitaires et fonctions de Green pluricomplexes dans les espaces
de Banach. C.R. Acad. Sci. Paris Ser. Imath., 305:71-76, 1987.
[Xi1] Y. Xing, Continuity of the complex Monge-Ampère operator. Proc. of Amer. Math.
Soc., 124 (1996), 457-467.
[Xi2] Y. Xing, Complex Monge-Ampère measures of pluriharmonic functions with bounded
values near the boundary. Cand. J. Math., 52, (2000),1085-1100.
Department of Mathematics
Hanoi University of Education (Dai hoc Su Pham Hanoi).
Cau giay, Ha Noi, VietNam
E-mail: phhiep−vn@yahoo.com
ABSTRACT
  In this article we will first prove a result about convergence in capacity.
Using the achieved result we will obtain a general decompositon theorem for
complex Monge-Ampere measues which will be used to prove a comparison principle
for the complex Monge-Ampere operator.

<|endoftext|><|startoftext|>
Introduction
Recent space-based observations revealed a presence of vari-
ous kinds of magnetohydrodynamic (MHD) waves and oscilla-
tions in the solar corona. These observations as well as mod-
eling of MHD waves are important as these waves contribute
to the coronal heating problem (Roberts 2000) and they may
consist unique tool of a coronal seismology (Edwin & Roberts
1983, Nakariakov & Ofman 2001). Fast kink (Aschwanden et
al. 1999, Nakariakov et al. 1999, Wang & Solanki 2004) and
sausage (Nakariakov 2003, Pascoe et al. 2007) as well as slow
(de Moortel et al. 2002, Wang et al. 2003) magnetosonic oscil-
lations were observed to be associated either with or without
a solar flare. Analytical studies of these oscillations in coronal
loops were carried on over the last few decades, amongst others,
by Edwin & Roberts (1982, 1983), Poedts & Boynton (1996),
Nakariakov (2003), Van Doorsselaere et al. (2004a,b), Ofman
(2005), Verwichte et al. (2006) and Diáz et al. (2006).
Coronal loops act as natural wave guides for magnetosonic
and torsional Alfvén waves. The later are purely azimuthal os-
cillations in cylindrical geometry. In the linear regime, Alfvén
oscillations do not lead to mass density perturbations. As a re-
sult, contrary to magnetosonic waves, torsional Alfvén waves
can be observed only spectroscopically. While propagating from
the base of the solar corona along open magnetic field lines,
these waves may lead to an increase of a spectral line width with
height (Hassler et al. 1990, Banerjee et al. 1998, Doyle et al.
1998). In closed magnetic field structures, such as coronal loops,
these waves can be observed indirectly as periodic variations of
non-thermal broadening of spectral lines (Zaqarashvili 2003).
Alongside magnetosonic waves, torsional oscillations can be
used to infer, in the framework of coronal seismology, plasma
properties inside oscillating loops. These oscillations are an ideal
Send offprint requests to: T. Zaqarashvili e-mail: temury@genao.org
tool of coronal seismology as their phase speed depends alone on
plasma quantities within the loop, while wave speeds of magne-
tosonic oscillations are influenced by plasma conditions in the
ambient medium. Having known mass density within a loop,
coronal seismology, that is based on torsional oscillations, en-
ables to estimate a magnetic field strength. Torsional oscilla-
tions are potentially important in the context of rapid attenua-
tion of coronal loop kink oscillations (Aschwanden et al. 1999,
Nakariakov et al. 1999). One of a few suggested mechanisms
of the attenuation is a resonant absorption of fast magnetosonic
kink waves by azimuthal Alfvén waves (Ruderman & Roberts
2002). This process may lead to a formation of torsional oscil-
lations in the outer part of a loop. As a result, spotting torsional
oscillations after the kink mode was attenuated would serve as
an evidence of this attenuation mechanism.
A theoretical study of Alfvén oscillations in a coronal loop
was carried on recently by Gruszecki et al. (2007) who con-
sidered impulsively generated oscillations in two-dimensional
straight and curved magnetic field topologies. They found that
lateral leakage of Alfvén waves into the ambient corona is neg-
ligibly small. However, mass density profiles were adopted ho-
mogeneous within the loop, while the real conditions there are
much more complex.
Despite of significant achievements in a development of re-
alistic models there is still much more effort required to develop
our knowledge of wave phenomena in coronal loops. A goal of
this paper is to study the influence of inhomogeneous mass den-
sity fields on spectrum of torsional oscillations. The paper is or-
ganized as follows. Analytical solutions for torsional oscillations
in a longitudinally inhomogeneous coronal loop are presented in
Sect. 2. The numerical results are showed in Sect. 3. Guidelines
for potential observations of these oscillations are presented in
Sect. 4. This paper is concluded by a discussion and a short sum-
mary of the main results in Sect. 5.
http://arxiv.org/abs/0704.0360v1
2 T. Zaqarashvili & K. Murawski: Torsional oscillations of a coronal loop
2. Analytical model of torsional oscillations
We consider a coronal loop of its inhomogeneous mass density
̺0(z) and length 2L, that is embedded in a uniform magnetic field
B = B0ẑ. Small amplitude torsional Alfvén waves in a cylindri-
cal coordinate system (r, φ, z), in which plasma profiles depend
on a longitudinal coordinate z only, can be described by the fol-
lowing linear equations:
4π̺0(z)
, (1)
, (2)
where uφ and bφ are the velocity and magnetic field components
of Alfvén waves.
These equations can be easily cast into a single wave equa-
V2A(z)
= 0 , (3)
where VA(z) = B0/
4π̺0(z) is the Alfvén speed. Assuming that
uφ ∼ exp(iωt), where ω is a wave frequency, we get the equation
uφ = 0 . (4)
For a trapped solution uφ must satisfy line-tying boundary con-
ditions which are implemented by setting
uφ(z = ±L) = 0 . (5)
Equation (4) with condition (5) consists the well-known Sturm-
Liuville problem which solution depends on the profile of VA(z).
We model the coronal loop by a rarefied plasma at the loop apex
(at z = 0) and by a compressed plasma at the loop footpoints
(z = ±L). Specifically, we adopt
̺0(z) = ̺00
1 + α2
, (6)
where ̺00 is the mass density at the loop apex and α
2 is a param-
eter which defines a strength of the inhomogeneity. For α2 = 0
the above mass density profile corresponds to a homogeneous
loop, while for a larger value of α2 the medium is more inhomo-
geneous. Figure 1 illustrates ̺0(z) for α
2 = 50. The mass density
is described by Eq. (6) with ̺00 = 10
−12 kg m−3 and L = 25 Mm.
Note that plasma is compressed at z = ±L. Substituting Eq. (6)
into Eq. (4), we obtain
1 + α2
uφ = 0 , (7)
where VA0 = B0/
4π̺00. With a use of the notation
y ≡ uφ, x ≡
z, a ≡ −
Eq. (7) can be rewritten in the form of Weber (parabolic cylinder)
equation (Abramowitz & Stegun 1964)
y = 0 . (9)
Fig. 1. Spatial profile of the background mass density, ̺0(z), given by
Eq. (6) with α2 = 50. The mass density and length are expressed in
units of 10−12 kg m−3 and 1 Mm, respectively.
Standard solutions to this equation are called Weber (parabolic
cylinder) functions (Abramowitz & Stegun 1964)
W(a,±x) =
(coshπa)1/4
G1y1(x) ∓
2G3y2(x)
, (10)
where
, G3 =
and y1(x), y2(x) are respectively even and odd solutions to
Eq. (9)
y1(x) = 1 + a
+ · · · ,
y2(x) = x + a
+ · · · .
2.1. Two limiting solutions
Periodic solutions to Eq. (9) can be written analytically in the
limiting cases: (a) for a large value of a but a moderate value
of x; (b) for a large x but a moderate a. The first (second) case
corresponds to α2 ≪ 1 (α2 ≫ 1).
2.1.1. Weakly inhomogeneous plasma
We consider first the case of a weakly inhomogeneous mass den-
sity field, i.e. α2 ≪ 1. In this case we have
a < 0, −a≫ x2, p ≡
−a . (12)
We adopt the following expansion (Abramowitz & Stegun
1964):
W(a, x) + iW(a,−x) =
2W(a, 0) exp [vr + i(px + π/4 + vi)] , (13)
where
W(a, 0) =
, (14)
vr = −
(x/2)2
(2p)2
2(x/2)4
(2p)4
+ · · · , vi =
2/3(x/2)3
+ · · · . (15)
T. Zaqarashvili & K. Murawski: Torsional oscillations of a coronal loop 3
As a result of relation −a≫ x2 we have from Eq. (13)
W(a, x) =
2W(a, 0) exp
cos ζ , (16)
W(a,−x) =
2W(a, 0) exp
sin ζ , (17)
ζ ≡ px + π/4 +
. (18)
The general solution to Eq. (9) is
uφ = c1W(a, x) + c2W(a,−x) , (19)
where c1 and c2 are constants.
For a homogeneous loop, i.e. α2 = 0, we recognize the well
known solution
uφ ∼ c1 cos (kz + π/4) + c2 sin (kz + π/4). (20)
Here wave number k satisfies the following homogeneous dis-
persion relation:
. (21)
Line-tying boundary conditions of Eq. (5) lead then to discrete
values of the wave frequency, viz.
1 + α2/6
, n = 1, 2, 3, . . . . (22)
From this dispersion relation we infer that in a comparison to
the loop with a homogeneous mass density distribution, ̺00, the
weakly inhomogeneous mass density field results in a decrease
of a wave frequency. This reduction is a consequence of the fact
that the inhomogeneous loop is denser at its footpoints, so the
average Alfvén speed is decreased. To show this, we compare the
results for the inhomogeneous loop with the homogeneous loop
with the same average density, so that both loops contain exactly
the same mass (Andries et al. 2005). We introduce a frequency
difference
∆ωn = ωn − ω̄n , (23)
where
ω̄n =
V̄A0 =
4π ¯̺0
corresponds to the average mass density
¯̺0 =
̺0(z)dz = ̺00
. (25)
Substituting Eq. (25) into Eq. (24), we obtain
ω̄n =
1 + α2/3
. (26)
From Eqs. (23) and (26) we find that ∆ωn ≤ 0. Here we infer
that in comparison to the average mass density case the wave
frequency is reduced, but as a result of α2 ≪ 1 the frequency
reduction is small. This is in a disagreement with Fermat’s law
and with the results of Murawski et al. (2004) who showed that
sound waves experience frequency increase in a case of a space-
dependent random mass density field.
2.1.2. Strongly inhomogeneous plasma
We discuss now a strongly inhomogeneous mass density case,
i.e. α2 ≫ 1. This case corresponds to x ≫ |a|. In this limit we
get (Abramowitz & Stegun 1964)
W(a, x) =
2k/x(s1(a, x) cos(ξ) − s2(a, x) sin(ξ)) , (27)
W(a,−x) =
2/kx(s1(a, x) sin(ξ) − s2(a, x) cos(ξ)) , (28)
where
− a ln x +
argΓ(1/2 + ia)
, (29)
1 + e2πa − eπa, (30)
s1(a, x) ∼ 1 +
1!2x2
2!22x4
− · · · , (31)
s2(a, x) ∼ −
1!2x2
2!22x4
+ · · · (32)
ur + ivr = Γ(r + 1/2 + ia)/Γ(1/2 + ia), r = 2, 4, . . . . (33)
The boundary conditions of Eq. (5) lead to the discrete fre-
quency spectrum
. (34)
Here we infer that the strongly inhomogeneous mass den-
sity field results in a significant decrease of a wave frequency in
comparison to the case of the loop with the constant density, ̺00.
This wave frequency decrease is a consequence of the fact that
the inhomogeneous loop is denser at its footpoints. Substituting
Eq. (34) into Eq. (23) we find that ∆ωn > 0. This wave frequency
decrease, in a comparison to the case of an average mass density
is now in an agreement with Fermat’s law and with the results of
Murawski et al. (2004).
3. Numerical results
Numerical simulations are performed for Eqs. (1), (2) with an
adaptation of CLAWPACK which is a software package de-
signed to compute numerical solutions to hyperbolic partial dif-
ferential equations using a wave propagation approach (LeVeque
2002). The simulation region (−L, L) is covered by an uniform
grid of 600 numerical cells. We verified by convergence studies
that this grid does not introduce much numerical diffusion and as
a result it represents well the simulation region. We set reflect-
ing boundary conditions at the left and right boundaries of the
simulation region.
Figure 2 shows a spatial profile of velocity uφ(z) for α
2 = 50,
drawn at t = 1000 s (solid line). This spatial profile results from
the initial Gaussian pulse that was launched at t = 0 in the center
of the simulation region, at z = 0. It is noteworthy that the sine-
wave profile of Eq. (20), which is valid for α2 = 0 (dashed line),
is distorted by the strong inhomogeneity which takes place for
the case of α2 = 50.
As a consequence of the inhomogeneity wave period is al-
tered. Figure 3 displays wave period P vs. inhomogeneity param-
eter α2. Diamonds represent the numerical solutions while the
solid lines correspond to the analytical solution to Eqs. (22) (top
panel) and (34) (bottom panel). Wave periods were obtained by
Fourier analysis of the wave signals that were collected in time
at the fixed spatial location, z = 0. It is discernible that the nu-
merical data fits quite well to the analytical curves. A growth of
4 T. Zaqarashvili & K. Murawski: Torsional oscillations of a coronal loop
Fig. 2. Numerically evaluated velocity profile uφ at t = 1000 s for α2 =
50 (solid line). This profile corresponds to the mode number n = 1. Note
that as a result of strong inhomogeneity, uφ departs from the sine-wave
which corresponds to α2 = 0. The dashed line corresponds to Eq. (20)
with c1 = c2 = 0.5.
Fig. 3. Wave period P = ω/2π vs. α2 for the mode number n = 1.
Diamonds correspond to the numerical solutions to Eqs. (1), (2). Solid
lines are drawn with the use of the analytical solution to Eqs. (22) and
(34). The wave period is expressed in seconds.
wave period P with α2 results from wave scattering on centers
of the inhomogeneity and it can be explained on simple physical
grounds. In an inhomogeneous field wave frequency ωn of the
torsional oscillations can be estimated from the following for-
mula:
V̄A0 , (35)
where V̄A0 is the averaged Alfvén speed that is expressed by Eq.
(24). Using P = 2π/ωn we obtain
4π ¯̺0
. (36)
As ¯̺0 grows with α, the growth of P with α results in.
4. Potential observations of torsional oscillations
Torsional oscillations of a coronal loop may result in periodic
variations of spectral line non-thermal broadening (expressed by
a half line width, ∆λB, hereafter HW) (Zaqarashvili 2003). For a
homogeneous loop, HW can be expressed as
∆λB =
uVA0λ
|sin(ωnt)sin(knz)| , (37)
where u is an amplitude of oscillations, λ is a wave length of
the spectral line and c is the light speed. Periodic variations of
spectral line width depend on a height above the solar surface:
a strongest variation corresponds to the wave antinode and the
place of a lack of line width variation corresponds to the nodes
(loop footpoints). Therefore, time series of spectroscopic ob-
servations may allow to determine a wave period. Knowing a
length of the loop, we may estimate the Alfvén speed, which in
turn gives a possibility to infer the magnetic field strength in the
corona. We estimate the expected value of line width variations
which result from torsional oscillations. For a typical coronal
Alfvén speed of ∼ 800 km/s, an amplitude of linear torsional
oscillation can be ∼ 40 km/s, which consists 5% of the Alfvén
speed. For the ”green” coronal line Fe XIV (5303 Å) from Eq.
(37) we obtain
∆λB ≈ 0.7 Å. (38)
This value is about twice larger than the original thermal broad-
ening of Fe XIV line. As a consequence, torsional oscillations
can be detected in time series of the green coronal line spectra.
For a weakly inhomogeneous distribution of mass density
along a loop, Eq. (22) enables to estimate the Alfvén speed at
the loop apex with the help of the observed period of HW vari-
ation and a loop length. For a strongly inhomogeneous density
profile along a loop, Eq. (34) shows that a wave period of tor-
sional oscillations is not just the ratio of the loop length to the
Alfvén speed, but it strongly depends on the rate of inhomo-
geneity, α2. Therefore, an additional effort is required in order
to apply the method of coronal seismology for torsional oscil-
lations. A spatial variation of mass density along the loop can
be estimated by a direct measurement of spectral line intensity
variation along the loop. Then, the estimated variation can be fit-
ted to Eq. (6), and hence a value of α2 can be inferred. Eq. (34)
provides a value of VA0 at the loop summit. Another possibility
is to collect time series of spectroscopic observations at differ-
ent positions of the loop. A spatial variation of line width along
the loop may be compared to the theoretical plot of uφ (Fig. 2),
which enables to estimate α2 and consequently Alfvén speed at
the loop apex (with a use of Eqs. (22) or (34)).
T. Zaqarashvili & K. Murawski: Torsional oscillations of a coronal loop 5
5. Discussion and summary
It is commonly believed that Alfvén waves are generated in the
solar interior either by convection (granulation, supergranula-
tion) or by any other kinds of plasma flow (differential rotation,
solar global oscillations). Due to their incompressible nature,
these waves may carry energy from the solar surface to the solar
corona and therefore they may significantly contribute to coro-
nal heating and solar wind acceleration. In closed magnetic loops
the Alfvén waves may set up the standing torsional oscillations,
while in opened magnetic structures these waves may propagate
up to the solar wind. As a result, observations of Alfvén waves
can be of vital importance to the problems of plasma heating and
particle acceleration.
The Alfvén waves that propagate along open magnetic field
lines may lead to a growth of a spectral line width with height
(Hassler et al. 1990, Banerjee et al. 1998; Doyle et al. 1998).
However, at some altitudes the spectral line width reveals a sud-
den fall off (Harrison et al. 2002; O’Shea et al. 2003, 2005). This
phenomenon was recently explained by resonant energy transfer
into acoustic waves (Zaqarashvili et al. 2006).
On the other hand, the photospheric motions may set up tor-
sional oscillations in closed magnetic loop systems, which can
be observed spectroscopically as periodic variations of spectral
line width (Zaqarashvili 2003). As a result, the observation of
Alfvén waves can be used as an additional powerful tool of coro-
nal seismology; the observed period and loop mean length en-
ables to estimate the Alfvén speed within a loop, which in turn
makes it possible to infer a mean magnetic field strength.
Besides their photospheric origin, torsional Alfvén waves
can be generated in the solar corona in a process of resonant ab-
sorption of the global oscillations (Ruderman & Roberts 2002,
Goossens et al. 2002, Andries et al. 2005, Terradas et al. 2006).
These oscillations may excite Alfvén waves in the outer inho-
mogeneous part of a loop, leading to attenuation of global oscil-
lations and amplification of torsional oscillations. These Alfvén
oscillations can be detected as periodic variations of spectral line
width. As a consequence, observations of Alfvén waves can be
a key for a determination of a damping mechanism of the loop
global oscillations.
Dynamics of torsional Alfvén waves in a homogeneous loop
can be easily solved. However, real coronal loops are longitudi-
nally inhomogeneous, which leads to alteration of wave dynam-
ics (Arregui et al. 2005, 2007, Van Doorsselaere et al. 2004a,b,
Donnelly et al. 2006, Dymova & Ruderman 2006, McEwan et
al. 2006). Therefore, the dynamics of Alfvén waves in longitudi-
nally inhomogeneous coronal loops must be understood in order
to provide analytical basis for potential observations of torsional
oscillations.
In this paper we discussed by analytical and numerical
means evolution of torsional Alfvén waves in an inhomogeneous
mass density field. The analytical efforts resulted in dispersion
relations which were obtained for a specific choice of an equilib-
rium mass density profile. These dispersion relations were writ-
ten explicitly for two limiting cases: (a) weekly inhomogeneous
and (b) strongly inhomogeneous mass density fields. From these
dispersion relations we inferred that the inhomogeneity results in
a wave frequency reduction in comparison to that of estimated at
the loop summit. This analytical finding is supported by the nu-
merical data which reveals that frequency reduction takes place
outside the region of validity of the analytical approach. As a
result of that we claim that a reduction of wave frequency is
ubiquitous for the inhomogeneous mass density field we consid-
ered. This reduction is a consequence of wave scattering on in-
homogeneity centers and it results from reduction of the average
Alfvén speed within a coronal loop. This frequency reduction
has important implications as far as wave observations are con-
cerned. The analytical formulae can be used for estimation of
coronal plasma parameters and therefore torsional Alfvén waves
consist an additional powerful tool of coronal seismology.
Acknowledgments: The authors express their thanks to the
referee, Prof. S. Poedts, for his stimulating comments. The work
of T.Z. is supported by the grant of Georgian National Science
Foundation GNSF/ST06/4-098. A part of this paper is sup-
ported by the ISSI International Programme ”Waves in the Solar
Corona”.
References
Abramowitz, M., & Stegun, I.A. 1964, Handbook of Mathematical Functions
(Washington, D.C.: National Bureau of Standards)
Andries, J., Goossens, M., Hollweg, J. V., Arregui, I., & Van Doorsselaere, T.
2005, A&A, 430, 1109
Arregui, I., Van Doorsselaere, T., Andries, J., Goossens, M., & Kimpe, D. 2005,
A&A, 441, 361
Arregui, I., Andries, J., Van Doorsselaere, T., Goossens, M., & Poedts, S. 2007,
A&A, 463, 333
Aschwanden, M.J., Fletcher, L., Schrijver, C.J., & Alexander, D. 1999, ApJ, 520,
Banerjee, D., Teriaca, L., Doyle, J., & Wilhelm, K. 1998, A&A, 339, 208
De Moortel I., Ireland J., Walsh R. W. & Hood A. W. 2002, Sol. Phys., 209, 61
Diáz, A., Zaqarashvili, T.V., & Roberts, B. 2006, A&A, 455, 709
Donnelly, G.R., Diáz, A., & Roberts, B. 2006, A&A, 457, 707
Doyle, J., Banerjee, D. & Perez, M. 1998, Sol. Phys., 181, 91
Dymova, M. V., & Ruderman, M. S. 2006, A&A, 457, 1059
Edwin, P.M., & Roberts, B. 1982, Sol. Phys., 76, 239
Edwin, P.M., & Roberts, B. 1983, Sol. Phys., 88, 179
Goossens, M., Andries, J., & Aschwanden, M.J. 2002, A&A, 394, L39
Gruszecki, M., Murawski, K., Solanki, S., & Ofman, L. 2007, A&A, (in press)
Harrison, R.A., Hood, A.W., & Pike, C.D. 2002, A&A, 392, 319
Hassler, D.M., Rottman, G.J., Shoub, E.C., & Holzer, T.E. 1990, ApJ, 348, L77
LeVeque, R.J. 2002, Finite Volume Methods for Hyperbolic Problems,
Cambridge University Press
McEwan, M. P., Donnelly, G. R., Diaz, A. J. & Roberts, B. 2006, A&A, 460,
Murawski, K., Nocera, L. and Pelinovsky, E. N. 2004, Waves in Random Media,
14, 109
Nakariakov, V.M., Ofman, L., Deluca, E.E., Roberts, B., & Davila, J.M. 1999,
Science, 285, 862
Nakariakov, V.M., & Ofman, L. 2001, A&A, 372, L53
Nakariakov, V.M. 2003, in The Dynamic Sun (Ed. B. Dwivedi), CUP
O’Shea, E., Banerjee, D., & Poedts, S. 2003, A&A, 400, 1065
O’Shea, E., Banerjee, D., & Doyle, J.G. 2005, A&A, 436, L35
Ofman, L. 2005, Adv. Space Res., 36, 1772
Pascoe, D.J., Nakariakov, V.M. & Arber, T.D. 2007, A&A, 461, 1149
Poedts, S. & Boynton, G.C. 1996, A&A, 306, 610
Roberts, B. 2000, Sol. Phys., 193, 139
Ruderman, M.S., & Roberts, B. 2002, ApJ, 577, 475
Terradas, J., Oliver, R., & Ballester, J.L. 2006, ApJ, 642, 533
Verwichte, E., Foullon, C., & Nakariakov, V.M. 2006, A&A, 449, 769
Wang, T., Solanki, S.K., Innes, D.E., Curdt, W., & Marsch, E. 2003, A&A, 402,
Wang, T.J., & Solanki S.K. 2004, A&A, 421, L33
Zaqarashvili, T.V., 2003, A&A, 399, L15
Zaqarashvili, T.V., Oliver, R., & Ballester, J.L. 2006, A&A, 456, L13
Van Doorsselaere, T., Andries, J., Poedts, S. & Goossens, M., 2004a, ApJ, 606,
Van Doorsselaere, T., Debosscher, A., Andries, J. & Poedts, S., 2004b, A&A,
424, 1065
	Introduction
	Analytical model of torsional oscillations
	Two limiting solutions
	Weakly inhomogeneous plasma
	Strongly inhomogeneous plasma
	Numerical results
	Potential observations of torsional oscillations
	Discussion and summary
ABSTRACT
  We explore the effect of an inhomogeneous mass density field on frequencies
and wave profiles of torsional Alfven oscillations in solar coronal loops.
Dispersion relations for torsional oscillations are derived analytically in
limits of weak and strong inhomogeneities. These analytical results are
verified by numerical solutions, which are valid for a wide range of
inhomogeneity strength. It is shown that the inhomogeneous mass density field
leads to the reduction of a wave frequency of torsional oscillations, in
comparison to that of estimated from mass density at the loop apex. This
frequency reduction results from the decrease of an average Alfven speed as far
as the inhomogeneous loop is denser at its footpoints. The derived dispersion
relations and wave profiles are important for potential observations of
torsional oscillations which result in periodic variations of spectral line
widths. Torsional oscillations offer an additional powerful tool for a
development of coronal seismology.

<|endoftext|><|startoftext|>
Introduction
	Performance Evaluation of PCCCs
	Determination of Parameters that Influence the Performance of Turbo Codes
	Rate-1/3 PCCCs
	Rate-1/2 Pseudo-randomly Punctured PCCCs
	Performance Comparison of Analytic to Simulation Results
	Conclusion
	References
ABSTRACT
  It has been observed that particular rate-1/2 partially systematic parallel
concatenated convolutional codes (PCCCs) can achieve a lower error floor than
that of their rate-1/3 parent codes. Nevertheless, good puncturing patterns can
only be identified by means of an exhaustive search, whilst convergence towards
low bit error probabilities can be problematic when the systematic output of a
rate-1/2 partially systematic PCCC is heavily punctured. In this paper, we
present and study a family of rate-1/2 partially systematic PCCCs, which we
call pseudo-randomly punctured codes. We evaluate their bit error rate
performance and we show that they always yield a lower error floor than that of
their rate-1/3 parent codes. Furthermore, we compare analytic results to
simulations and we demonstrate that their performance converges towards the
error floor region, owning to the moderate puncturing of their systematic
output. Consequently, we propose pseudo-random puncturing as a means of
improving the bandwidth efficiency of a PCCC and simultaneously lowering its
error floor.

<|endoftext|><|startoftext|>
Introduction
The Arctic Circle has first appeared in the study of domino tilings of large
Aztec diamonds [EKLP, JPS]. The name originates from the fact that in most
configurations the dominoes are ‘frozen’ outside the circle inscribed into the dia-
mond, while the interior of the circle is a disordered, or ‘temperate’, zone. Further
investigations of the domino tilings of Aztec diamonds, such as details of statistics
near the circle, can be found in [CEP, J1, J2]. Here we mention that the Arctic
Circle is a particular example of a limit shape in dimer models, in the sense that
it describes the shape of a spatial phase separation of order and disorder. Apart
from domino tilings, many more examples have been discussed recently, see, among
others, papers [CKP, CLP, KO, KOS, OR].
As long as only dimer models are considered, this amounts to restrict to dis-
crete free-fermionic models, although with nontrivial boundary conditions. Indeed,
many of them can be viewed as a six-vertex model at its Free Fermion point (the
correspondence being however usually not bijective), with suitably chosen fixed
boundary conditions. In particular, this is the case of domino tilings of Aztec dia-
monds [EKLP], and the corresponding boundary conditions of the six-vertex model
are the so-called Domain Wall Boundary Conditions (DWBC). Hence the problem
of limit shapes extends to the six-vertex model with generic weights, and with fixed
boundary conditions, among which the case of DWBC is the most interesting.
Historically, the six-vertex model with DWBC was first considered in paper [K]
within the framework of Quantum Inverse Scattering Method [KBI] to prove the
Gaudin hypothesis for norms of Bethe states. The model was subsequently solved
2000 Mathematics Subject Classification. 15A52, 82B05, 82B20, 82B23.
http://arxiv.org/abs/0704.0362v1
2 F. COLOMO AND A.G. PRONKO
in paper [I] where a determinant formula for the partition function was given; see
also [ICK] for a detailed exposition. Quite independently, the model was later
found, under certain restrictions on the vertex weights, to be deeply related with
enumerations of alternating sign matrices (see, e.g., [Br] for a review) and, as
already mentioned, to domino tilings of Aztec diamonds [EKLP].
Concerning the problem of limit shapes for the six-vertex model with DWBC,
as far as the Free Fermion point is considered, the relation with domino tilings
provided apparently an indirect proof of the corresponding Arctic Circle. The non-
bijective nature of the correspondence between the two models asked for more direct
results, purposely for the free-fermion six-vertex model, see [Zi1, FS, KP]. Out
of the Free Fermion point, however, only very few analytical results are available,
such as exact expressions for boundary one-point [BPZ] and two-point [FP, CP1]
correlation functions. The present knowledge on the subject is based mainly on
numerics [E, SZ, AR]; some steps towards finding the limit shapes of the model
have been done recently in [PR].
In the present note we propose a rather direct strategy to address the problem:
after briefly reviewing the six-vertex model with DWBC, we define a bulk corre-
lation function, the Emptiness Formation Probability (EFP), which discriminates
the ordered and disordered phase regions. We give for this correlation function two
equivalent representations, in terms of a determinant and of a multiple integral.
The core derivation of EFP is heavily based on the Quantum Inverse Scattering
Method [KBI], along the lines of papers [BPZ, CP1]; it is out of the scope of the
present paper, corresponding details being given in a separate publication [CP4].
Here our aim is to demonstrate how the limit shapes for the considered model
can be extracted from EFP in a suitable scaling limit, by making use of ideas and
techniques of Random Matrix Models.
To be more specific, and to establish a contact with previous results, we spe-
cialize here our further discussion to the case of free-fermion six-vertex model. We
show that the asymptotic analysis of multiple integral formula for EFP in the scal-
ing limit reduces to a saddle-point problem for a one-matrix model with a triple
logarithmic singularity, or triple Penner model. We argue that the limit shape cor-
responds to condensation of all saddle-point solutions to a single point. This allows
us to recover the known Arctic Circle and Ellipses.
As a comment to our approach, it is to be stressed that it is directly tailored on
the six-vertex model, rather than domino tilings. For this reason it is not restricted
to the free-fermion models, even if, of course, further significant efforts might be
necessary, essentially from the point of view of Random Matrix Model reformu-
lation, for application to more general situations. On the basis of our previous
results in [CP2], however, the application of the method to the particular case of
the so-called Ice Point of the model should be straightforward. This would provide
the limit shape of alternating sign matrices.
2. The model
2.1. The six-vertex model. The six-vertex model (for reviews, see [LW,
Ba]) is formulated on a square lattice with arrows lying on edges, and obeying the
so-called ‘ice-rule’, namely, the only admitted configurations are such that there are
always two arrows pointing away from, and two arrows pointing into, each lattice
vertex. An equivalent and graphically simpler description of the configurations of
THE ARCTIC CIRCLE REVISITED 3
w1 w2 w3 w4 w5 w6
Figure 1. The six allowed types of vertices in terms of arrows
and lines, and their Boltzmann weights.
Figure 2. A possible configuration of the six-vertex model with
DWBC at N = 4, in terms of arrows and lines.
the model can be given in terms of lines flowing through the vertices: for each arrow
pointing downward or to the left, draw a thick line on the corresponding edge. This
line picture implements the ‘ice-rule’ in an automated way. The six possible vertex
states and the Boltzmann weights w1, w2, . . . , w6 assigned to each vertex according
to its state are shown in Figure 1.
2.2. Domain Wall Boundary Conditions. The Domain Wall Boundary
Conditions (DWBC) are imposed on the N×N square lattice by fixing the direction
of all arrows on the boundaries in a specific way. Namely, the vertical arrows on the
top and bottom of the lattice point inward, while the horizontal arrows on the left
and right sides point outward. Equivalently, a generic configuration of the model
with DWBC can be depicted by N lines flowing from the upper boundary to the
left one. A possible state of the model both in terms of arrows and of lines is shown
in Figure 2.
2.3. Partition function. The partition function is defined, as usual, as a
sum over all possible arrow configurations, compatible with the imposed DWBC,
each configuration being assigned its Boltzmann weight, given as the product of all
the corresponding vertex weights,
arrow configurations
with DWBC
wn11 w
2 . . . w
Here n1, n2, . . . , n6 denote the numbers of vertices with weights w1, w2, . . . , w6,
respectively, in each arrow configuration (n1 + n2 + · · ·+ n6 = N
2.4. Anisotropy parameter and phases of the model. The six-vertex
model with DWBC can be considered, with no loss of generality, with its weights
invariant under the simultaneous reversal of all arrows,
w1 = w2 =: a , w3 = w4 =: b , w5 = w6 =: c .
4 F. COLOMO AND A.G. PRONKO
Under different choices of Boltzmann weights the six-vertex model exhibits different
behaviours, according to the value of the parameter ∆, defined as
a2 + b2 − c2
It is well known that there are three physical regions or phases for the six-vertex
model: the ferroelectric phase, ∆ > 1; the anti-ferroelectric phase, ∆ < −1; the
disordered phase, −1 < ∆ < 1. Here we restrict ourselves to the disordered phase,
where the Boltzmann weights are conveniently parameterized as
a = sin(λ+ η) , b = sin(λ− η) , c = sin 2η . (2.1)
With this choice one has ∆ = cos 2η. The parameter λ is the so-called spectral
parameter and η is the crossing parameter. The physical requirement of positive
Boltzmann weights, in the disordered regime, restricts the values of the crossing
and spectral parameters to 0 < η < π/2 and η < λ < π − η.
The special case η = π/4 (or ∆ = 0) is related to free fermions on a lattice, and
there is a well-known correspondence with dimers and domino tilings. In particular,
at λ = π/2, the ∆ = 0 six-vertex model with DWBC is related to the domino tilings
of Aztec diamond. For arbitrary λ ∈ [π/4, 3π/4], we shall refer to the ∆ = 0 case
as the Free Fermion line.
The case η = π/6 (i.e. ∆ = 1/2) and λ = π/2, where all weights are equal,
a = b = c, is known as the Ice Point; all configurations are given the same weight.
In this case there is a one to one correspondence between configurations of the
model with DWBC and N ×N alternating sign matrices.
2.5. Phase separation and limit shapes. The six-vertex model exhibits
spatial separation of phases for a wide choice of fixed boundary conditions, and,
in particular, in the case of DWBC. Roughly speaking, the effect is related to the
fact that ordered configurations on the boundary can induce, through the ice-rule,
a macroscopic order inside the lattice.
The notion of phase separation acquires a precise meaning in the scaling limit,
that is the thermodynamic/continuum limit, performed by sending the number of
sites N to infinity and the lattice spacing to zero, while keeping the total size of the
lattice fixed, e.g., to 1. On a finite lattice, several macroscopic regions may appear,
which in the scaling limit are expected to be sharply separated by some curves, the
so-called Arctic curves.
For the six-vertex model with DWBC the shape of the Artic curve, or limit
shape, has been found rigorously only on the Free Fermion line, and for the closely
related domino tilings of Aztec diamond [JPS, CEP, Zi1, FS, KP]. For generic
values of weights the limit shapes are not known, but the whole picture is strongly
supported both numerically [E, SZ, AR] and analytically [KZ, Zi2, BF, PR].
3. Emptiness Formation Probability
3.1. Definition. We shall use the following coordinates on the lattice: r =
1, . . . , N labels the vertical lines from right to left; s = 1, . . . , N labels the horizontal
lines from top to bottom. We may now introduce the correlation function FN (r, s),
measuring the probability for the first s horizontal edges between the r-th and
r+1-th line to be all ‘full’ (i.e. thick in the line picture, or with a left arrow in the
THE ARCTIC CIRCLE REVISITED 5
Figure 3. Emptiness Formation Probability. The sum in (3.1) is
performed over all configurations compatible with the drawn ar-
rows.
standard picture of the six-vertex model):
FN (r, s) =
‘constrained’
arrow configurations
with DWBC
wn11 w
2 . . . w
6 . (3.1)
Here the sum is performed over all arrow configurations on the N × N lattice,
subjected to the restriction of DWBC, and to the condition that all arrows on the
first s edges between the r-th and r + 1-th line should point left, see Figure 3.
Although this correlation function may appear rather sophisticated, it is com-
putable in some closed form by means of the Quantum Inverse Scattering Method,
on which DWBC are indeed tailored. It is the natural adaptation of the Empti-
ness Formation Probability of quantum spin chains to the present model. For this
reason, and to link to the common practice in the quantum integrable models com-
munity, even if FN (r, s) actually describes ‘fullness’ formation probability, we shall
call it Emptiness Formation Probability (EFP).
3.2. Qualitative discussion of FN (r, s). Let us restrict ourselves to the dis-
ordered regime, −1 < ∆ < 1, for definiteness. From previous analytical and nu-
merical work, in the large N limit the emergence of a limit shape, in the form of
a continuous closed curve touching once each of the four sides of the lattice, is ex-
pected. It follows that five regions emerge in the lattice: a central region, enclosed
by the curve, and four corner regions, lying outside the closed curve and delimited
by the sides of the lattice. The central region is disordered, while the four corners
are frozen, with mainly vertices of type 1, 3, 2, 4 (see Figure 1) appearing in the
top-left, top-right, bottom-right and bottom-left corner, respectively.
By construction, EFP is expected to be almost one in frozen regions of type 1,
or 3, bordering the top side of the lattice, and to be rather small otherwise. DWBC
exclude a region of type 3 to emerge in the upper part of the lattice. Hence FN (r, s)
describes, at a given value of r, as s increases, a transition from a frozen region of
vertices of type 1, where FN (r, s) ∼ 1, to a generic region where FN (r, s) ∼ 0.
It follows that FN (r, s) can describe only the upper left portion of the closed
curve, between its top and left contact points. Nevertheless, it should be mentioned
that the full curve can be built from the knowledge of its top left portion, just
6 F. COLOMO AND A.G. PRONKO
exploiting the crossing symmetry of the six-vertex model. Hence EFP, FN (r, s), is
well suited to describe limit shapes.
3.3. Some notations. For a given choice of parameters λ, η we define
sin 2η
sin(λ+ η) sin(λ− η)
and the integration measure on the real line
µ(x) := ex(λ−π/2)
sinh(ηx)
sinh(πx/2)
related to ϕ as follows:
µ(x) dx .
Let us introduce the complete set of monic orthogonal polynomial {Pn(x)}n=0,1,...
associated to the integration measure µ(x), with the orthogonality relation
Pn(x)Pm(x)µ(x) dx = hnδnm .
The square norms hn are completely determined by the measure µ(x), and may
be expressed, in principle, in terms of its moments. In the following we shall be
interested in the complete set of orthogonal polynomials {Kn(x)}n=0,1,... defined as
Kn(x) = n!ϕ
n+1 1
Pn(x) .
We moreover define
ω(ǫ) :=
sin(ǫ)
sin(ǫ− 2η)
, ω̃(ǫ) :=
sin(ǫ)
sin(ǫ + 2η)
Note that the following relation holds
a2 ω̃ − 2∆ab ω̃ω + b2 ω = 0 , (3.2)
allowing to express ω̃ in terms of ω.
3.4. Determinant representation. For EFP in the six-vertex model with
DWBC, the following representation holds:
FN (r, s) = (−1)
s det
1≤j,k≤s
KN−k(∂ǫj )
[ω(ǫj)]
[ω(ǫj)− 1]
1≤j<k≤s
[ω̃(ǫj)− 1] [ω(ǫk)− 1]
ω̃(ǫj)ω(ǫk)− 1
ǫ1=0,...,ǫs=0
. (3.3)
This representation has been obtained in the framework of the Quantum Inverse
Scattering Method [KBI], along the lines of analogous derivations worked out for
one-point and two-point boundary correlation functions of the model [BPZ, CP1].
The details of the derivation can be found in [CP4].
THE ARCTIC CIRCLE REVISITED 7
3.5. The boundary correlation function. If we consider expression (3.3)
when s = 1, we recover the boundary polarization, introduced and computed in
[BPZ]. It is convenient to consider the closely related boundary correlation function
HN (r) := FN (r, 1)− FN (r − 1, 1) .
As shown in [BPZ, CP1], the following representation holds:
HN (r) = KN−1(∂ǫ)
[ω(ǫ)]N−r
[ω(ǫ)− 1]N−1
We define the corresponding generating function
hN (z) :=
HN (r) z
r−1 . (3.4)
Noticing that ω(ǫ) → 0 as ǫ → 0, it can be shown that, given any arbitrary function
f(z) regular in a neighbourhood of the origin, the following inverse representation
holds
KN−1(∂ǫ)f(ω(ǫ))
(z − 1)N−1
hN(z)f(z) dz . (3.5)
Here C0 is a closed counterclockwise contour in the complex plane, enclosing the
origin, and no other singularity of the integrand.
3.6. Multiple integral representation. Plugging (3.5) into representation
(3.3), we readily obtain the following multiple integral representation for EFP:
FN (r, s) =
· · ·
dsω det
1≤j,k≤s
hN−k+1(ωj)
ωj − 1
ωN−r−1j
(ωj − 1)N
1≤j<k≤s
(ω̃j − 1)(ωk − 1)
ω̃jωk − 1
. (3.6)
Here ω̃j ’s should be expressed in terms of ωj ’s through (3.2). Indeed, due to (3.5),
relation (3.2) for functions ω(ǫ), ω̃(ǫ), translates directly into the same relation
between ωj and ω̃j , j = 1, . . . , s.
Representation (3.6), and all results in this Section hold for any choice of param-
eters λ and η within the disordered regime. Moreover, by analytical continuation
in parameters λ and η, these results can be easily extended to all other regimes.
The determinant in expression (3.6) is a particular representation of the parti-
tion function of the six-vertex model with DWBC, when the homogeneous limit is
performed only on a subset of the spectral parameters [CP3]. The structure of the
previous multiple integral representation therefore closely recalls analogous ones for
the Heisenberg XXZ quantum spin chain correlation functions [JM, KMT].
For generic values of λ and η, the orthogonal polynomialsKn(x), or the generat-
ing function hN (z), are known only in terms of rather implicit representations. For-
tunately, there are three notable exceptions [CP2]: the Free Fermion line (η = π/4,
−π/4 < λ < π/4, ∆ = 0), the Ice Point (η = π/6, λ = π/2, ∆ = 1/2), and the Dual
Ice Point (η = π/3, λ = π/2, ∆ = −1/2). In these three cases, the Kn(x) turn
out to be classical orthogonal polynomials, namely Meixner-Pollaczek, Continuous
Hahn and Continuous Dual Hahn polynomial, respectively. Correspondingly, the
8 F. COLOMO AND A.G. PRONKO
generating function can be represented explicitly in terms of terminating hyperge-
ometric functions that may simplify considerably further evaluation of EFP. In the
next Section we shall focus on the case of Free Fermion line.
4. Multiple integral representation at ∆ = 0
4.1. Specialization to η = π/4. We shall now restrict ourselves to the case
η = π/4. We have ∆ = 0, and the six-vertex model reduces to a model of free
fermions on the lattice. The parameter λ can still assume any value in the interval
(−π/4, π/4). It is convenient to trade λ for the new parameter
τ = tan2(λ− π/4) , 0 < τ < ∞ .
The symmetric point (related to the domino tiling of Aztec Diamond) corresponds
now to τ = 1. For generic values of τ we have:
ω̃ = −τω .
The generating function (3.4) is known explicitely (see [CP2] for details):
hN (z) =
1 + τz
1 + τ
Plugging this expression into (3.6), we get
FN (r, s) =
· · ·
dsω det
1≤j,k≤s
(1 + τωj)(ωj − 1)
(1 + τ)ωj
ωN−r−1j
(ωj − 1)N
1≤j<k≤s
(1 + τωj)(ωk − 1)
1 + τωjωk
. (4.1)
4.2. Symmetrization. After extracting a common factor
(1 + τωj)(ωj − 1)
(1 + τ)ωj
from the determinant in (4.1), we recognize it to be of Vandermonde type. We can
therefore collect from the integrand of (4.1) the double product
1≤j<k≤s
(1 + τωj)(ωj − 1)
(1 + τ)ωj
(1 + τωk)(ωk − 1)
(1 + τ)ωk
(1 + τωj)(ωk − 1)
1 + τωjωk
Noticing that the integration and the remaining of integrand are fully symmetric
under permutation of variables ω1, . . . , ωj , we can perform total symmetrization of
the previous double product over all its variables, with the result
(−1)s(s−1)/2
ωs−1j
1≤j<k≤s
(ωj − ωk)
THE ARCTIC CIRCLE REVISITED 9
Hence, we finally obtain the following representation for EFP on the Free Fermion
line:
FN (r, s) =
(−1)s(s+1)/2
s!(1 + τ)s(N−s)(2πi)s
· · ·
1≤j<k≤s
(ωj − ωk)
(1 + τωj)
(ωj − 1)s ω
. (4.2)
The appearance of a squared Vandermonde determinant in this expression naturally
recalls the partition functions of s× s Random Matrix Models.
5. Triple Penner model and Arctic Ellipses
5.1. Scaling limit. We shall now address the asymptotic behaviour of expres-
sion (4.2) for EFP in the ∆ = 0 case. We are interested in the limit N, r, s → ∞,
while keeping the ratios
r/N = x , s/N = y ,
fixed. In this limit, x, y ∈ [0, 1] will parameterize the unit square to which the
lattice is rescaled. Correspondingly EFP is expected to approach a limit function
F (x, y) := lim
FN (xN, yN) , x, y ∈ [0, 1] .
We shall exploit the standard approach developed for instance in the investigation
of asymptotic behaviour for Random Matrix Models. Before this let us however
point out some facts which holds already for any finite value of s.
5.2. A useful identity. Consider the quantity
IN (r, s) :=
(−1)s(s+1)/2
s!(1 + τ)s(N−s)(2πi)s
· · ·
1≤j<k≤s
(ωj − ωk)
(1 + τωj)
(ωj − 1)s ω
which differs from (4.2) only in the integration contours. Here C1 is a closed,
clockwise oriented contour (note the change in orientation) in the complex plane
enclosing point ω = 1, and no other singularity of the integrand. We have the
identity
IN (r, s) = 1 (5.1)
for any integer r, s = 1, . . . , N . The simplest way to prove the previous identity is
by shifting ωj → ωj + 1, and rewriting IN (r, s) as an Hankel determinant; indeed
we have
IN (r, s) =
(−1)s(s−1)/2
(1 + τ)s(N−s)
1≤j,k≤s
ωj+k−2−s(1 + τ + τω)N−s
(1 + ω)r
The entries of the Hankel matrix vanish for j+k > s+1, and hence the determinant
is simply given by the product of the antidiagonal entries, j + k = s + 1 (modulo
a sign (−1)s(s−1)/2 emerging from the permutation of all columns). Identity (5.1)
follows immediately.
10 F. COLOMO AND A.G. PRONKO
5.3. Saddle-point evaluation for large N and finite s. When using the
saddle-point method in variables ω1, . . . , ωs to evaluate the behaviour of FN (r, s) for
large N and r, and finite s, it is rather easy to see that the saddle-point equations
decouple at leading order, and that each saddle-point will be on the real axis,
contributing with a factor e−NSj with Sj positive.
If a given saddle-point is smaller than 1, the contour C0 can be deformed
through the saddle-point without encountering any pole, and its contribution will
vanish as e−NSj in the large N limit. If however the saddle-point, still on the real
axis, happens to be larger than 1, the deformation of the contour C0 through the
saddle-point will pick up the contribution of the pole at ω = 1 (with a reversed
orientation of the contour), and the j-th integral will behave as 1 + e−NSj . Hence,
in the large N limit (at fixed s) the quantity FN (r, s) will vanish unless all the
saddle-points are greater than 1, in which case FN (r, s) ∼ IN (r, s) = 1. Note that
in the present situation the s saddle-points coincide. A detailed analysis shows that
in this case the position of the s saddle-points depends on the value x = r/N as
τ(1−x)
. In correspondence to the value x0 =
, for which these saddle-
points are exactly 1, the function F (x, 0) has a step discontinuity. More precisely,
it is easy to show that for x ∈ [0, 1], F (x, 0) = θ(x − x0), where θ(x) is Heaviside
step function. From a physical point of view x0 is the contact point between the
limit shape and the boundary. What have been discussed here can easily be verified
in the case s = 1. The extension to finite s > 1 is rather direct as well.
5.4. Saddle-point equation. Having in mind the analogy with s × s Ran-
dom Matrix Models, and the scaling limit specified in Section 5.1, we rewrite our
expression for FN (r, s) at ∆ = 0 as follows:
FN (r, s) =
(−1)s(s+1)/2
s!(1 + τ)s
2(1/y−1)(2πi)s
· · ·
dsω exp
j,k=1
j 6=k
ln |ωj − ωk|
ln(τωj + 1)− ln(ωj − 1)−
ln(ωj)
. (5.2)
Both sums in the exponent are O(s2). The corresponding (coupled) saddle-point
equations read
ωj − 1
(1/y − 1)τ
τωj + 1
k 6=j
ωj − ωk
. (5.3)
A standard physical picture reinterprets the saddle-point equations as the equi-
librium condition for the positions of s charged particle confined to the real axis,
with logarithmic electrostatic repulsion, in an external potential. In the present
case the latter can be seen as generated by three external charges, 1, x/y, and
−(1/y − 1) at positions 1, 0, and −1/τ , respectively. It is natural to refer to this
model as the triple Penner model. Although the simple Penner [P] matrix model
has been widely investigated, not so much is known about the much more compli-
cate double Penner model [M, PW]. We have not been able to trace any previous
study concerning the triple Penner model.
THE ARCTIC CIRCLE REVISITED 11
5.5. The exact Green function at finite s. To investigate the structure of
solutions of the saddle-point equations (5.3) for large s we first introduce the Green
function
Gs(z) =
z − ωj
which, if the ωj ’s solves (5.3), has to satisfy the differential equation:
z(z − 1)(τz + 1)
sG′s(z) + s
2G2s(z)
− s(αz2 + βz + γ)sGs(z)
τs(s− 1)− αs2
z + (1 − τ)s(s− 1)− βs2 +Ω
2τs(s− 1)− αs2
. (5.4)
The coefficients α, β and γ are readily obtained as the coefficients of the second
order polynomial appearing in the numerator, when setting to common denominator
the left hand side of (5.3). We give them explicitly for later convenience:
α = τ
, β =
+ (1− τ)
, γ = −
The derivation of the differential equation is very standard (see, e.g., [SD]). The
left hand side is built by suitably combining the explicit definition of the Green
function and its derivative. The result has to be a polynomial of the first degree in
z, whose coefficients are constructed by matching the leading and first subleading
behaviour of the left hand side as |z| → ∞.
5.6. The first moment Ω. The quantity Ω appearing in (5.4) is defined as
the first moment of the solutions of the saddle-point equations:
It is related in a obvious way to the first subleading coefficient of Gs(z); indeed,
from the definition of the Green function, it is evident that
Gs(z) =
+O(z−3) , |z| → ∞ .
It is worth to emphasize that Ω is still unknown, and that in principle its value
should be determined self consistently by first working out the explicit solution
of Gs(z) (which will depend implicitly on Ω), from (5.4) and then demanding that
j=1 ωj evaluated from this solution coincides with Ω. The appearance of the un-
determined parameter Ω is a manifestation of the ‘two-cuts’ nature of the Random
Matrix Model related to (5.2), see, e.g., par. 6.7 of [D1].
5.7. The asymptotic Green function. We are now in condition to perform
the large s (and large N , r) limit at fixed x, y. In the limit, we can neglect terms of
order O(s) in the differential equation (5.4), which therefore reduces to an algebraic
equation for the limiting Green function G(z):
z(z−1)(τz+1)[G(z)]2−(αz2+βz+γ)G(z) = (τ−α)z+(1−τ−β)+Ω(2τ−α) . (5.5)
The previous algebraic equation has to be supplemented by the normalization con-
dition
G(z) ∼
, |z| → ∞ . (5.6)
12 F. COLOMO AND A.G. PRONKO
Hence the Green function describing the large s asymptotic distribution of solutions
for the saddle equation (5.3) reads:
G(z) =
2z(z − 1)(τz + 1)
(αz2 + βz + γ)
(αz2 + βz + γ)2 + 4z(z − 1)(τz + 1)[(τ − α)z +Ω(2τ − α) + 1− τ − β]
(5.7)
We have selected the positive branch of the square root, to satisfy the normalization
condition (note that the coefficient of z4 under the square root is (α − 2τ)2, and
α − 2τ is negative for any x, y ∈ [0, 1]). However, the expression for G(z) is not
completely specified yet, because Ω is still undetermined.
5.8. Limit shape and condensation of roots. The polynomial under the
square root is of fourth order, hence G(z) will have in general two cuts in the
complex plane. The emergence of a two-cut problem was already expected from
the appearance of the undetermined first moment Ω in (5.4). The discontinuity of
G(z) across these cuts defines, when positive, the density of solutions of the saddle-
point equations (5.3) when s → ∞. The problem of explicitly finding this density,
for arbitrary α, β, γ (or x, y), is a formidable one, not to mention the evaluation of
the corresponding ‘free energy’, and of the saddle-point contribution to the integral
in (5.2). But our aim is much more modest, since we are presently interested only
in the expression of the limit shape, i.e. in the curve in the square x, y ∈ [0, 1],
delimiting regions where F (x, y) = 0 from regions where F (x, y) = 1. Of course we
are here somehow assuming that the transition of F (x, y) from 0 to 1 is stepwise in
the scaling limit, but this is supported both by the physical interpretation of EFP
(in the disordered region, by definition, the number of ‘thin’ lines is macroscopic,
and the probability of finding no ‘thin’ horizontal edges immediately vanishes in
the scaling limit) and by the discussion of Section 5.3.
As explained in the discussion of the double Penner model in paper [PW], the
logarithmic wells in the potential can behave as condensation germs for the saddle-
point solutions. In our case, this can role can be played only by the ‘charge’ at
ω = 1 in the Penner potential since the charge at ω = −1/τ is always repulsive,
while the one at ω = 0 is larger than 1, at least in the region of interest. [PW]
have shown that condensation can occur only for charges less than or equal to 1,
since this will be the fraction of condensed solutions. This consideration, together
with the expected stepwise behaviour and the discussion in Section 5.3, suggest
the following picture for the evolution of saddle-point solution density from the
disordered region, F (x, y) ∼ 0, to the upper left frozen region, F (x, y) ∼ 1: in the
disordered region there is a macroscopic fraction of solutions which are real and
smaller than 1, while in the upper left frozen region this fraction vanish. On the
basis of the discussion here and in Sections 3.2 and 5.3, we shall assume that at
the transition between the two regions all saddle-point solutions have condensed at
ω = 1.
5.9. Main assumption. We claim that the Arctic curve in the square x, y ∈
[0, 1] separating the disordered phase from the upper left frozen phase is defined by
the condition that all solutions of the saddle-point equation lies at ω = 1.
In the derivation of the limit shape, this is indeed the only assumption to which
we are unable to provide a proof. There is in fact no guarantee, at this level, for
THE ARCTIC CIRCLE REVISITED 13
this possibility to occur, and limit shapes could in principle emerge from a different
condition. But if for some values of x, y ∈ [0, 1] we have all solutions of the saddle-
point equation condensing at ω = 1, then this provides a transition mechanism
from 0 to 1 for F (x, y), and this might correspondingly define some limit shape.
If all saddle-point solutions condensate at ω = 1, then we obviously have:
Ω = 1 .
Moreover, the complicate expression (5.7) for G(z) should simply reduce to
G(z) =
z − 1
, (5.8)
since we expect to have no cuts, and only one pole at z = 1 with unit residue.
5.10. Arctic Ellipses. Consider the quartic polynomial under the square root
in (5.7). It is convenient to rewrite it in terms of
α̃ := 2τ − α = τ
β̃ := 2− β = τ
x+ y − 1
y − x
γ̃ := −γ =
(5.9)
Note that α̃ and γ̃ are always positive for x, y ∈ [0, 1]. When Ω = 1, our quartic
polynomial reads
α̃2z4 + 2α̃β̃z3 + (β̃2 + 2α̃γ̃)z2 + 2β̃γ̃z + γ̃2 ,
which may be equivalently rewritten as
(α̃z2 + β̃z + γ̃)2 .
We see that the quartic polynomial reduces to a perfect square, and hence, when
Ω = 1, the two cuts of G(z) disappear, as expected.
Now, when Ω = 1, in our new notations, the Green function reads:
G(z) =
[(2τ − α̃)z2 + (2− β̃)z − γ̃] +
(α̃z2 + β̃z + γ̃)2
2z(τz + 1)(z − 1)
. (5.10)
We now require the coefficients α̃, β̃, γ̃ to be such that the polynomial under
the square root combines with the first part of the numerator in (5.10) to give
2z(τz+1) and simplify the Green function according to (5.8). Once we have chosen
a given branch of the square root (the positive one, in order to satisfy normalization
condition (5.6)), it is obvious that the required simplification can occur for any z
in the complex plane only if the second order polynomial α̃z2 + β̃z + γ̃ does not
change its sign, i.e. only if its two roots coincide, implying:
β̃2 − 4α̃γ̃ = 0 .
Rewriting the last relation in terms of x, y, through (5.9), we readily get
(1 + τ)2x2 + (1 + τ)2y2 − 2(1− τ2)xy − 2τ(1 + τ)x − 2τ(1 + τ)y + τ2 = 0 .
We have therefore recovered the limit shape, which in this Free Fermion case is the
well-known Arctic Ellipse (Arctic Circle for τ = 1) [JPS, CEP]. We recall that,
as discussed in Section 3.2, F (x, y) is non-vanishing only in the upper left region
14 F. COLOMO AND A.G. PRONKO
of the unit square. Therefore, concerning EFP, only the upper left portion of the
Arctic curve, between the two contact points at ( τ
, 0) and (1, 1
), is relevant.
6. Concluding remarks
Our starting point has been the definition of a relatively simple but relevant
correlation function for the six-vertex model with DWBC, the Emptiness Formation
Probability. We have provided both a determinant representation and a multiple
integral representation for the proposed correlation function. This is the first ex-
ample in literature of a bulk (as opposed to boundary) correlation function for the
considered model, for generic weights.
The multiple integral representation, specialized to the Free Fermion case, has
been studied in the scaling limit. In the standard picture of Random Matrix Mod-
els, we recognize the emergence of a triple Penner model. Assuming condensation of
the roots of saddle point equations in correspondence to a limit shape, we recover
the well-known Arctic Circle and Ellipse. It would be interesting to investigate
whether universality considerations of Random Matrix Models (see, e.g., [D2]) can
be extended to the Penner model in the neighbourhood of its logarithmic singular-
ities. This would imply directly the results of [CEP, J1, J2] on the Tracy-Widom
distribution and the Airy process, emerging in a suitably rescaled neighbourhood
of the Arctic Ellipse.
It is worth to stress that the multiple integral representation for EFP presented
in Section 3 can be studied beyond the usual Free Fermion situation. We expect
that condensation of roots of the saddle point equation in correspondence of the
limit shape is a general phenomenon. We believe that this assumption could be of
importance in addressing the problem of limit shapes in the six-vertex model with
DWBC.
Our derivation of the limit shape in the Free Fermion case uses the explicit
knowledge of function hN (z), standing in the multiple integral representation (3.6).
It is worth mentioning that function hN(z) is also known explicitly at Ice Point,
(∆ = 1/2), and Dual Ice Point, (∆ = −1/2), being expressible in terms of (poly-
nomial) Gauss hypergeometric function [Ze, CP2]. For instance, at Ice Point the
triple Penner model discussed above generalizes to a two-matrix Penner model.
This model can be studied along the lines presented here, thus providing a solution
to the longstanding problem of limit shape for Alternating Sign Matrices.
Acknowledgements
We thank Nicolai Reshetikhin for useful discussion, and for giving us a draft
of [PR] before completion. FC is grateful to Percy Deift, and Courant Institute
of Mathematical Science, for warm hospitality. AGP thanks INFN, Sezione di
Firenze, where part of this work was done. We acknowledge financial support from
MIUR PRIN program (SINTESI 2004). One of us (AGP) is also supported in
part by Civilian Research and Development Foundation (grant RUM1-2622-ST-
04), by Russian Foundation for Basic Research (grant 04-01-00825), and by the
program Mathematical Methods in Nonlinear Dynamics of Russian Academy of
Sciences. This work is partially done within the European Community network
EUCLID (HPRN-CT-2002-00325), and the European Science Foundation program
INSTANS.
THE ARCTIC CIRCLE REVISITED 15
References
[AR] D. Allison and N. Reshetikhin, Numerical study of the 6-vertex model with domain wall
boundary conditions, Ann. Inst. Fourier (Grenoble) 55 (2005) 1847–1869.
[Ba] R.J. Baxter, Exactly Solved Models in Statistical Mechanics, Academic press, San Diego,
1982.
[Br] D. M. Bressoud, Proofs and Confirmations: The Story of the Alternating Sign Matrix Con-
jecture, Cambridge University Press, Cambridge, 1999.
[BF] P. Bleher and V. Fokin, Exact Solution of the Six-Vertex Model with Domain Wall Boundary
Conditions. Disordered Phase, preprint (2005) arXiv:math-ph/0510033.
[BPZ] N.M. Bogoliubov, A.G. Pronko, and M.B. Zvonarev, Boundary correlation functions of the
six-vertex model, J. Phys. A: Math. Gen. 35 (2002) 5525–5541.
[CEP] H. Cohn, N. Elkies and J. Propp, Local statistics for random domino tilings of the Aztec
diamond, Duke Math. J. 85 (1996) 117–166.
[CKP] H. Cohn, R. Kenyon and J. Propp, A variational principle for domino tilings, J. Amer.
Math. Soc. 14 (2001) 297–346
[CLP] H. Cohn, M. Larsen and J. Propp, The shape of a typical boxed plane partition, New York
J. Math. 4 (1998) 137–165.
[CP1] F. Colomo and A.G. Pronko, On two-point boundary correlations in the six-vertex model
with DWBC, J. Stat. Mech.: Theor. Exp. JSTAT(2005)P05010, arXiv:math-ph/0503049.
[CP2] F. Colomo and A.G. Pronko, Square ice, alternating sign matrices, and classical orthogonal
polynomials, J. Stat. Mech.: Theor. Exp. JSTAT(2005)P01005, arXiv:math-ph/0411076.
[CP3] F. Colomo and A.G. Pronko, The role of orthogonal polynomials in the six-vertex model
and its combinatorial applications, J. Phys. A: Math. Gen. 39 (2006) 9015–9033.
[CP4] F. Colomo and A.G. Pronko, Emptiness Formation Probability in the Domain Wall six-
vertex marodel, in preparation.
[D1] P. Deift, Orthogonal Polynomials and Random Matrices: a Riemann-Hilbert approach,
Courant Lecture Notes in Mathematics, Amer. Math. Soc., Providence, RI, 2000.
[D2] P. Deift, Universality for mathematical and physical systems, preprint (2006) arXiv:
math-ph/0603038.
[E] K. Eloranta, Diamond Ice, J. Statist. Phys. 96 (1999) 1091–1109.
[EKLP] N. Elkies, G. Kuperberg, M. Larsen and J. Propp, Alternating sign matrices and domino
tilings , J. Algebraic Combin. 1 (1992) 111–132; 219–234.
[FP] O. Foda and I. Preston, On the correlation functions of the domain wall six-vertex model,
J. Stat. Mech.: Theor. Exp. JSTAT(2004)P11001.
[FS] P.L. Ferrari and H. Spohn, Domino tilings and the six-vertex model at its free fermion point,
J. Phys. A: Math. Gen. 39 (2006) 10297–10306.
[I] A.G. Izergin, Partition function of the six-vertex model in the finite volume, Sov. Phys. Dokl.
32 (1987) 878.
[ICK] A.G. Izergin, D.A. Coker and V.E. Korepin, Determinant formula for the six-vertex model,
J. Phys. A: Math. Gen. 25 (1992) 4315–4334.
[J1] K. Johansson, Non-intersecting paths, random tilings and random matrices, Probab. Theory
Related Fields 123 (2002) 225–280.
[J2] K. Johansson, The arctic circle boundary and the Airy process, Annals of Probability 33
(2005) 1–30.
[JM] M. Jimbo and T. Miwa, Algebraic analysis of solvable lattice models, CBMS Lecture Notes
Series, vol. 85, Amer. Math. Soc., Providence, RI (1995).
[JPS] W. Jockush, J. Propp and P. Shor, Random domino tilings and the arctic circle theorem,
preprint (1995) arXiv:math.CO/9801068.
[K] V.E. Korepin, Calculation of norms of Bethe wave functions, Comm. Math. Phys. 86 (1982)
391–418.
[KBI] V.E. Korepin, N.M. Bogoliubov, and A.G. Izergin, Quantum Inverse Scattering Method
and Correlation Functions, Cambridge University Press, Cambridge, 1993.
[KMT] N. Kitanine, J. M. Maillet and V. Terras, Correlation functions of the XXZ Heisenberg
spin-1/2 chain in a magnetic field, Nucl. Phys. B 567 (2000) 554–582.
[KO] R. Kenyon and A. Okounkov, Limit shapes and the complex Burgers equation, preprint
(2005) arXiv:math-ph/0507007.
16 F. COLOMO AND A.G. PRONKO
[KOS] R. Kenyon, A. Okounkov and S. Sheffield, Dimers and Amoebae, Ann. of Math. (2) 163
(2006) 1019–1056.
[KP] V. Kapitonov and A. Pronko, On the arctic ellipse phenomenon in the six-vertex model, in
preparation.
[KZ] V. Korepin, P. Zinn-Justin, Thermodynamic limit of the Six-Vertex Model with Domain
Wall Boundary Conditions, J. Phys. A 33 (2000) 7053–7066.
[LW] E.H. Lieb and F.Y. Wu, Two-dimensional ferroelectric models, in Phase Transitions and
Critical Phenomena, Vol. 1, edited by C. Domb and M.S. Green, Academic Press, London,
1972, pp. 321–490.
[M] Yu. Makeenko, Critical Scaling and Continuum Limits in the D > 1 Kazakov-Migdal Model,
Int.J.Mod.Phys. A10 (1995) 2615–2660.
[OR] A. Okounkov and N. Reshetikhin, Correlation function of Schur process with application
to local geometry of a random 3-dimensional Young diagram, J. Amer. Math. Soc. 16 (2003)
581–603,
[P] R.C. Penner, Perturbative series and the moduli space of Riemann surfaces, J. Diff. Geom.
27 (1988) 35–53.
[PR] K. Palamarchuk and N. Reshetikhin, The six-vertex model with fixed boundary conditions,
in preparation.
[PW] L. Paniak and N. Weiss, Kazakov-Migdal Model with Logarithmic potential and the Double
Penner Matrix Model, J. Math. Phys. 36 (1995) 2512–2530.
[SD] B. Sriram Shastry and A. Dhar, Solution of a generalized Stieltjes problem J. Phys. A: Math.
Gen. 34 6197-6208.
[SZ] O.F. Syljuasen and M.B. Zvonarev, Directed-loop Monte Carlo simulations of Vertex models,
Phys. Rev. E 70 (2004) 016118.
[Ze] D. Zeilberger, Proof of the refined alternating sign matrix conjecture, New York J. Math. 2
(1996) 59–68.
[Zi1] P. Zinn-Justin, The influence of boundary conditions in the six-vertex model, preprint (2002)
arXiv:cond-mat/0205192.
[Zi2] P. Zinn-Justin, Six-Vertex Model with Domain Wall Boundary Conditions and One-Matrix
Model, Phys. Rev. E 62 (2000), 3411–3418.
I.N.F.N., Sezione di Firenze and Dipartimento di Fisica, Università di Firenze, Via
G. Sansone 1, 50019 Sesto Fiorentino (FI), Italy
E-mail address: colomo@fi.infn.it
Saint Petersburg Department of Steklov Mathematical Institute of Russian Acad-
emy of Sciences, Fontanka 27, 191023 Saint Petersburg, Russia
E-mail address: agp@pdmi.ras.ru
ABSTRACT
  The problem of limit shapes in the six-vertex model with domain wall boundary
conditions is addressed by considering a specially tailored bulk correlation
function, the emptiness formation probability. A closed expression of this
correlation function is given, both in terms of certain determinant and
multiple integral, which allows for a systematic treatment of the limit shapes
of the model for full range of values of vertex weights. Specifically, we show
that for vertex weights corresponding to the free-fermion line on the phase
diagram, the emptiness formation probability is related to a one-matrix model
with a triple logarithmic singularity, or Triple Penner model. The saddle-point
analysis of this model leads to the Arctic Circle Theorem, and its
generalization to the Arctic Ellipses, known previously from domino tilings.

<|endoftext|><|startoftext|>
Introduction
The standard text-book presentation of special relativity follows closely that of Ein-
stein’s seminal paper of 1905 [1] in basing the theory on the Special Relativity Principle,
classical electromagnetism and the postulate of constant light speed. However an alterna-
tive and conceptually simpler approach to the physics of space and time, in the absence
of gravitational fields, is possible in which it is not necessary to consider light signals,
classical electromagnetism, or indeed, any dynamical theory whatsoever. The Lorentz
transformation (LT) was first derived in this way by Ignatowsky [2] in 1910. Purely
mathematical considerations imply, in such a derivation of the LT, the existence of a
maximum relative velocity, V , of two inertial frames. Use of relativistic kinematics then
shows that V is equal to the speed of light, c, when light in identified as a manifestation
of the propgation in space-time of massless particles –photons [3]. In this way Einstein’s
mysterious second postulate is derived from first principles. The fundamental axiom un-
derlying such an approach is the Reciprocity Principle (RP) [4, 3], discussed in Section
3 below, relating the the relative velocities of two inertial frames. Derivations of the LT
and the parallel velocity addition formula based on the RP and other simple axioms are
given in Ref. [3].
In the present paper the space-time properties of ponderable1 physical bodies in free
space, as described by Newton’s First Law of mechanics, are used together with the RP,
to demonstrate the invariance of the measured length of a ruler in uniform motion. The
proof given is valid in both Galilean and special relativity, since Newton’s First Law and
the RP hold in both theories.
The analysis presented is based on a careful definition of physical time concepts. In
particular, the ‘frame time’ or ‘proper times’ that appear in in the RP, are distinguished
from the ‘improper time’ or ‘apparent time’ (of a moving clock) that appear in the Time
Dilatation (TD) relation of special relativity.
The paper is organised as follows: The following section contains an elementary discus-
sion of the concepts of ‘space’, ‘time’ and ‘motion’ in physics, in connection with Newton’s
First Law. In Section 3, the RP is introduced and discussed in relation to Newton’s First
Law. It is pointed out that, because of the RP, ‘rulers are clocks’ and ‘clocks are rulers’
when the motion of ponderable bodies in free space is considered. In Section 4 the RP is
used to demonstrate the invariance of the measured length of a uniformly moving ruler.
In Section 5 the operational meanings of the time symbols appearing in the TD formula
of special relativity are discussed. This may be done in a ‘clock oriented’ manner in terms
of ’proper’ and ’improper’ times of the observed clock, or in an ‘observer oriented’ manner
in terms of the proper time of the observer’s local clock and the ‘apparent time’, as seen
by the observer, of the moving clock. Two specific experiments are described to exemplify
the operational meanings of the time symbols in the TD formula. A non-intuitive ‘length
expansion’ effect is found to relate similarly defined spatial intervals corresponding to the
observation of an event either in the rest frame of the clock, or in a frame in which it is
in uniform motion.
1That is bodies, with a non-vanishing Newtonian mass, which may be associated with an inertial
frame in which the body is at rest. No such frame may be associated with a massless object.
The results of the present paper show that the ‘length contraction’ effect and the
correlated ‘relativity of simultaneity’ effect of conventional special relativity do not exist.
A detailed discussion of the reason for the spurious nature of these effects of conventional
special relativity theory may be found in Refs. [5, 6, 7, 8, 9, 10] .
However, a genuine ‘relativistic length contraction’ effect does occur when distances be-
tween spatial coincidences of moving objects are observed from different inertial frames [11].
Also a genuine ‘relativity of simultaneity’ effect occurs when clocks at rest in two different
inertial frames are viewed from a third one [12, 13]. An alternative derivation, directly
from the RP, of the invariance of the measured spatial separation of two objects at rest
in the same inertial frame as well as the absence of the conventional ‘relativity of simul-
taneity’ effect is given in Ref. [9].
2 Physical time and Newton’s First Law of Mechan-
In physics the concepts of ‘time’ and ‘motion’ are inseparable. In a world in which
motion did not exist the physical concept of time would be meaningless. Similarly the
physical concepts of ‘space’ and ‘motion’ are inseperable. Without the concept of space,
no operational definition of motion is possible. The concept of historical time –the time
of the everyday world of human existence– requires the introduction of the further, and
equivalent, concepts of ‘uniform motion’ and ‘cyclic motion with constant period’. For
example, the unit of time the ‘year’ is identified with the (assumed constant) period of
rotation of the Earth about the Sun.
The idea of uniform motion entered into physics in a quantitative way with the for-
mulation of Newton’s First Law [14]
Every body continues in its state of rest, or uniform motion in a right
line unless it is compelled to change that state by forces impressed upon
This law gives an operational meaning to the physical concept of ‘uniform motion’.
It is defined by observations of the position of any ponderable object in ‘free space’ i.e.
in the absence of any mechanical interaction of the object with other objects. There is
a one-to-one correspondence between such a ponderable object and an ‘inertial frame’
of relativity theory. As will be discussed in the following section, one such ponderable
object, O, constitutes both a ruler and a clock for an observer in the rest frame of another
such object, O’, and vice versa.
When time is measured by using a cyclic physical phenomenon, e.g. an analogue clock,
time measurement reduces to recording the result of a spatial (or angular) measurement.
There is a one-to-one correspondence between the spatial coincidence of a stationary
‘mark’ on the face of the clock and a moving ‘pointer’, constituted by the hand of the clock,
and the time measurement [6]. A ‘time interval’ is measured by the angular separation of
two such ‘pointer-mark coincidences’. The implicit assumption is that the motion of the
pointer is ‘uniform’. There is an evident logical circularity here since ‘equal’ time intervals
measured by such an analogue clock assume that the angular velocity of the hand is
constant, whereas constant angular velocity is established by observation of equal angular
increments for equal time time intervals (i.e. also equal angular increments) recorded by
a second clock of supposedly known uniform rate. In practice, this conundrum is resolved
by an appeal to physics. For example, an undamped pendulum in a uniform gravitational
field is predicted, by the laws of mechanics, to have a constant period of oscillation.
Quantum mechanics predicts the same transition frequency and mean lifetimes for two
identical atoms in the same excited state, in the same physical environment, etc.
Measurements of ‘time’ are then ultimately observations of spatial phenomena, e.g.
the time measurement corresponding to observation of the number displayed by a digital
clock is a spatial perception. This will also be the case for time measurements related to
observation of two ponderable objects O and O’ in motion in free space that will now be
discussed.
3 The Reciprocity Principle: rulers are clocks, and
clocks are rulers
Consider two non-interacting ponderable objects O and O’, with arbitary motions in
free space. They are placed at the origins of inertial coordinate systems S and S’ with
axes orientated so that the x and x′ axes are parallel to the relative velocity vector of O
and O’. Without any loss of generality for the following discussion, it may be assumed
that O and O’ lie on the common x-x′ axis.
The Reciprocity Principle (RP) [4, 3, 9] is defined by the equation:
v = vO′O =
∂xO′O
= −vOO′ (3.1)
where xO′O ≡ xO′−xO and x
, or in words: ‘If the velocity of O’ relative to O
is ~v, the velocity of O relative to O’ is - ~v’. In many discussions of special relativity, the RP
is taken as ‘obvious’ and is often not even declared as a separate axiom. This is the case,
for example, in Einstein’s 1905 special relativity paper [1]. However, as first demonstrated
by Ignatowsky in 1910 [2], it is sufficient, together with some other weaker axioms such as
the homogeneity of space or single-valuedness of the transfomation equations, to derive [3]
the space-time Lorentz transformation and hence the whole of special relativity theory.
Eqn(3.1) looks very similar to the equation defining the relative velocity of two objects
A and B as observed in a single inertial reference frame (say S):
vAB ≡ vA − vB =
d(xA − xB)
= −vBA (3.2)
The crucial difference is the appearence in the RP, (3.1), of two different times t and t′.
The time t is the ‘frame time’ of S. i.e. the time registered by a synchronised clock at rest,
at any position in S, according to an observer also a rest in S. The frame time t′ is similarly
defined by an array of synchronised clocks at rest in S’. Eqn(3.1) (and its integral) gives a
relation between the times t and t′ Both t and t′ correspond to ‘proper times’ of clocks at
rest, whereas, as explained in Section 4 below, the Lorentz transformation relates instead
a proper time to an ‘improper time’ –the observed time of a clock in uniform motion.
Suppose now that O and O’ are equipped with local clocks that are observed to run at
exactly the same rate when they are both at rest in the same inertial frame. The direction
of the relative velocity vector ~v of O’ relative to O is such that they are approaching each
other at the frame times t and t′. The spatial separations of O and O’ in S and S’ are ℓ(t)
and ℓ′(t′) respectively, at times t and t′. Using the RP, a spatial coincidence of O and O’
will be observed at the time
tOO′ = t+
(3.3)
in S, and
= t′ +
ℓ′(t′)
(3.4)
in S’. The OO’ coincidence event will be mutually simultaneous in the frames S and S’.
Note that the OO’ spatial coincidence that is mutually simultaneous in S and S’
constitutes a pair of reciprocal pointer mark coincidences. In S the mark is at the position
of O and the moving pointer at the position of O’, whereas in S’ the position of O’
constitutes the mark and the position of O the pointer. A corollary is that all such pairs
of reciprocal pointer mark coincidences are mutually simultaneous. This is the basis of the
‘system external synchronistation’ [15] as introduced in Einstein’s first special relativity
paper [1] to synchronise clocks at rest in different inertial frames when they are in spatial
coincidence.
The observation of the OO’ coincidence event in both frames can be used to give a
condition that any other pair of events, one observed in S, the other observed in S’ are
mutually simultaneous. If the time of an event in S is t̃ and another event in S’ is t̃′ they
will be ‘mutually simultaneous’ providing that:
t̃′ − t̃ = t′
− tOO′ (3.5)
Combining (3.3)-(3.5) gives:
t̃′ − t̃ = t′
− tOO′ = t
′ − t+
ℓ′(t′)− ℓ(t)
(3.6)
If now events occuring at times t in S and t′ in S’ are mutually simultaneous, it follows
from (3.5) and (3.6) that ℓ(t) = ℓ′(t′), so that events which occur when O and O’ have
the same spatial separation in S and S’ are mutually simultaneous. A special case occurs
if the clock arrays in S and S’ are mutually synchronised so that ℓ(t) = ℓ′(t′ = t). There
is then a direct correlation between either t or t′ and the spatial separation of O and O’:
When mutually synchronised clocks in the frames S and S’ have the same reading, O and
O’ have the equal spatial separations in S and S’, and conversely, When O and O’ have
equal spatial separations in the frames S and S’, mutually synchronised clocks in S and S’
have the same reading.
The dependence of ℓ on t in Eqn(3.3) and ℓ′ on t′ in Eqn(3.4) means that each of
the objects may be considered to be an ‘inertial clock’ by an observer in the rest frame
of the other one. That is, t is measured by the spatial separation of O’ from O in S
and t′ is measured by the spatial separation of O from O’ in S’. Conversely, after mutual
synchronisation of the clock arrays in S and S’ at the instant when O and O’ are in spatial
coincidence, t measures the spatial separation of O’ and O in S (and so is effectively a
ruler in this frame) while t′ measures the spatial separation of O’ and O in S’, constituting
a ruler in this frame. Matching of these measurements of the separation of O and O’ with
the lengths of physical rulers at rest in S and S’ is now used to demonstrate the invariance
of the measured length of the length of a ruler in uniform motion –that is, the absence of
any relativistic length contraction effect– in this case.
4 Invariance of the measured length of a ruler in uni-
form motion
Figure 1: Rulers attached to objects O and O’ are viewed from the frame S (left) and S’
(right). The equality of the separations of O and O’ in S and S’ at time t = t′ = L/v,
predicted by the RP, is used to establish the invariance of the measured length of the
moving ruler R’ in S, or of the moving ruler R in S’ (see text).
Suppose that O and O’ are equipped with rulers R and R’, parallel to the x-x′ axis
as shown in Fig.1. O coincides with the mark MR(0) of the ruler R and O’ with the
mark MR′(10) of the ruler R’. A t = t′ = 0 (Fig.1a) O and O’ are in spatial coincidence.
The clock arrays in S and S’ are mutually synchronised at this time. The length of each
ruler in its rest frame is L. The object O’ now moves along the ruler R, being in spatial
coincidence with different marks of the ruler at different times. The object O moves in a
similar manner along the ruler R’. At any given time t the separation of O and O’ in S is
given by the corresponding ‘Pointer Mark Coincidence’ (PMC):
PMC(O′, t) ≡ O′(t)@MR(J) (4.1)
where the symbol before the ampersand denotes the moving ‘pointer’, and the symbol
after it the stationary ‘mark’ with which it is spatial coincidence2. Since
PMC(O, t) ≡ O(t)@MR(0) for all t (4.2)
and x[MR(0)] = 0 it follows that the separation of O and O’ in the frame S at time t is
given by:
dO′O(t) = x[MR(J)]− x[MR(0)] = x[MR(J)] (4.3)
where
x[MR(J)] =
and where, in Fig.1, Jmax = 10, is the ordinal number of the mark at the end of the ruler.
Thus the x-coordinate origin is at MR(0). Defining in a similar manner a PMC in the
frame S’:
PMC(O, t′) ≡ O(t′)@MR′(K) (4.4)
and since
PMC(O′, t′) ≡ O′(t′)@MR′(10) for all t′ (4.5)
the separation of O and O’ in S’ at the time t′ is
(t′) = x′[MR′(10)]− x′[MR′(K)] (4.6)
where
x′[MR′(K)] =
and where, in Fig.1, Kmax = 10. The spatial configurations in S and S’ at the times
t = t′ = L/v are shown in Fig.1b. The corresponding PMC are:
PMC(O′, L/v) ≡ O′(L/v)@MR(10) (4.7)
PMC(O, L/v) ≡ O(L/v)@MR′(0) (4.8)
It follows from (4.3) and (4.6) that
dO′O(L/v) = x[MR(10)]−x[MR(0)] = L = x
′[MR′(10)]−x′[MR′(0)] = d′
(L/v) (4.9)
Since O’ coincides with MR′(10) at all times it follows that, at t = L/v
x[MR′(10)] = x[O′] = x[MR(10)] (4.10)
Also, since O is in spatial coincidence with MR′(0) at t = t′ = L/v it follows that at
t = L/v,
x[MR′(0)] = x[O] = x[MR(0)] = 0 (4.11)
2This notation was introduced in Ref. [6]. Note the similarity with an e-mail address
Eqns(4.9)-(4.11) then give at t = L/v:
x[MR′(10)]− x[MR′(0)] = x[MR(10)]− x[MR(0)] = L (4.12)
That is, the measured length of the moving ruler R’ in the frame S, at t = L/v, is the
same as the length of the same ruler at rest –there is no ‘length contraction’ effect. A
similar calculation for the length of the ruler R as measured in the frame S’ gives, at
t′ = L/v:
x′[MR(10)]− x′[MR(0)] = x′[MR′(10)]− x′[MR′(0)] = L (4.13)
The length of the moving ruler R as measured in S’, at t′ = L/v, is the same as the length
of the same ruler at rest. The above calculations have used the equality of the spatial
separations of O and O’ in S and S’ at equal times of mutually synchronised clocks in these
frames, that follows from the RP, to establish, via corresponding PMCs, the equality of
the measured lengths of a ruler at rest, or in motion. Note that nowhere in any of the
calculations was the Lorentz transformation invoked. In fact the calculations are the same
in Galilean and special relativity, since the RP is equally valid for both.
5 The time dilatation effect; proper, improper and
apparent time intervals
All the times considered above were ‘frame times’ i.e. t and t′ are the times recorded
by a synchronised clock at rest at any position in S and S’ as viewed by an observer at rest
in these respective frames. In order to discuss the time dilatation effect it will be found
convenient to use the notation t(S), t′(S ′) for the frame times where the arguments S, S’
specify the reference frame of the observer of the clock. Such times are proper times of
such a clock. The Lorentz transformation relates the space-time coordinates (x′,t′(S ′)) of
an event specified in the frame S’ to those of the same event, (x,t′(S)) as observed in S, or
vice versa. The times t(S ′)[ t′(S)] which are those of clocks at rest in S[S’], as viewed from
S’[S] are called improper times. The space-time LT gives the following invariant interval
relation between corresponding space and time intervals in the frames S and S’:
c2(∆τ ′)2 = c2(∆t′(S))2 − (∆x)2 = c2(∆t′(S ′))2 − (∆x′)2 (5.1)
where ∆x ≡ x2 − x1 etc, while the inverse LT gives:
c2(∆τ)2 = c2(∆t(S ′))2 − (∆x′)2 = c2(∆t(S))2 − (∆x)2 (5.2)
In order to use the general interval relation (5.1) to derive the time dilatation effect it is
necessary to identify the time interval ∆t′(S ′) with the proper time interval of a clock at
rest in S’ (∆x′ = 0), and with equation of motion in S: ∆x = v∆t′(S). Using the latter
equation to eliminate ∆x from (5.1) and setting ∆x′ = 0 yields the time dilatation (TD)
relation:
∆t′(S) = γ∆t′(S ′) (5.3)
Figure 2: An experiment to illustrate the TD effcet viewed from S (left) and S’ (right). a)
The pulsed lamp PL at rest in S flashes at time t(S) = L/v and PL’ at rest in S’ flashes
at time t′(S ′) = L/v. b) The light signal from PL is observed at time t(S ′) = γL/v in the
frame S’, that from PL’ at time t′(S) = γL/v in the frame S. The PMCs corresponding to
the positions of observation of the signals in the different frames are indicated. See text
for discussion.
Figure 3: Spatial configurations in the frame S (left) and the frame S’ (right) are viewed
at different times. a) t(S) = t′(S ′) = 0; the Λ is created and moves to the right in the
plane of the figure with speed v =
3c/2. b) t(S) = t′(S ′) = T ′; the Λ is observed to decay
in the frame S’. The decay products move in the plane of the figure perpendicular to the
direction of motion of the Λ. c) t(S) = t′(S ′) = γT ′; the Λ is observed to decay in the
frame S. See text for discussion. The momentum vectors of the p and π− are drawn to
scale in the different reference frames. The spatial position of each particle is at the tail
of the corresponding momentum vector.
where γ ≡ 1/
1− (v/c)2, relating the improper to the proper time of a clock at rest in
S’. In a similar manner the interval relation (5.2) gives the TD relation for a clock at rest
in S and observed from S’:
∆t(S ′) = γ∆t(S) (5.4)
It is important to note the existence of four different time symbols, with different opera-
tional meanings in Eqns(5.3) and (5.4). The proper times t(S) and t′(S ′) (corresponding
to the ‘frame times’ t and t′ of the previous sections) and the improper times t(S ′) and
t(S ′). The notation for these times just introduced may be called ‘clock oriented’ since
only the readings of a single clock (observed either at rest, or in motion) appear in the TD
relations. In any actual experiment where the TD effect in measured, two clocks are nec-
essary, the observed moving clock, and another one at rest to measure the corresponding
time interval in the observer’s proper frame. If a clock at rest in S’ is observed from S as
in Eqn(5.3), the time interval ∆t′(S) is actually that, ∆τ , recorded by a similar clock, at
rest in S while ∆t′(S ′) is the corresponding time interval recorded by the (slowed-down)
moving clock. Since the observed rate of the moving clock depends on its motion, ∆t′(S ′)
is not a proper time interval for the observer in S. From the view-point of the latter this
is an ‘apparent’ (velocity-dependent) time interval that may be denoted simply as ∆t′,
to distinguish it from the observer’s proper time interval ∆τ . This gives an alternative
‘observer oriented’ time notation for the TD relations (5.3) and (5.4) above:
∆τ = γ∆t′ (5.5)
∆τ ′ = γ∆t (5.6)
This alternative notation has beeen employed in several previous papers by the present
author [6, 8, 11, 12, 13, 16].
In order to apply the TD relations (5.3) and (5.4), or (5.5) and (5.6), to any actual
or imagined experiment an operational definition must be given to the improper time
intervals of Eqns(5.3) and (5.4) or the apparent time intervals of (5.5) and (5.6). Two
examples of such definitions will be given, the first in a thought experiment to illustrate
the physical meaning of the TD effect, the second in an actual experiment typical of
many performed in particle physics, where the TD effect is used to measure the proper
decay time of an unstable particle. However as will be seen, the thought experiment and
actually realisable (and many times realised) one are similar in all essential features.
What notation is most convenient depends on the experiment considered. In the
observation of the TD effect in the last CERN muon g-2 experiment [17] where the time
interval ∆τ was directly measured by clocks in the laboratory frame, and ∆t′ was the
known muon rest-frame lifetime, it was natural to use Eqn(5.5). For the second of the
two experiments considered below where ∆τ is not directly measured but inferred from
spatial measurements in the frame S, the relation (5.3) relating connecting a proper time
in the frame S’ to an improper time in the frame S, is used.
In the thought experiment it is imagined that the objects O, O’ are each equipped
with local pulsed lamps PL, PL’. The objects O, O’ are in spatial coincidence at times
t(S) = t′(S ′) = 0 and are attached to rulers of length 2L in similar spatial configurations to
that shown in Fig.1a. The objects move apart with relative velocity v =
3c/2. As shown
in Fig.2a, at the times t(S) = t′(S ′) = L/v, PL and PL’ both flash, producing an isotropic
pulse of photons. The observation times in S of the photon signal produced by PL’, and
in S’ of the photon signal produced by PL, are given by Eqns(5.3) and (5.4) respectively.
Since γ = 2, these observations occur at the times t(S) = t′(S ′) = γL/v = 2L/v. The
corresponding spatial configurations of O and O’ at these times shown in Fig.2b. It can
be seen that the observation times of the light flashes in S and S’ correspond to different
PMCs of the objects O and O’ and to different spatial separations of the objects:
In S PL : PMC(MR′(10), L/v) ≡ MR′(10)@0 = MR′(10)@MR(0) (5.7)
PL′ : PMC(O′, γL/v) ≡ O′@MR(20) = MR′(20)@MR(20) (5.8)
In S′ PL′ : PMC(MR(10), L/v) ≡ MR(10)@O′ = MR(10)@MR′(20) (5.9)
PL : PMC(O, γL/v) ≡ O@MR′(0) = MR(0)@MR′(0) (5.10)
ℓ(γL/v)
ℓ(L/v)
ℓ′(γL/v)
ℓ′(L/v)
vγL/v
vL/v)
= γ (5.11)
The relations in (5.11) follow directly from the RP, while the PMC in (5.7)-(5.10) are
obtained from the geometry of Fig.2 and the invariance of the lengths of the moving
rulers derived in Section 3 above.
The different PMC corresponding to observations of the light flashes emitted by PL
and PL’ in different frames in (5.7)-(5.10) is deeply perplexing for common-sense con-
cepts of space and time. For example the photon bunches emitted by PL’ correspond
to MR(10)@MR′(20) in S’ and to MR′(20)@MR(20) in S. In some discussions of time
dilatation this apparent paradox is avoided by invoking a hypothetical contraction of a
moving ruler by a factor 1/γ [18]. This has the effect of shortening the moving ruler
R by a factor 1/2 in the right hand figure in Fig.2a, so that the PMC corresponding
to the flashing of PL’ becomes MR(20)@MR′(20), the same as in S with inversion of
pointer and mark. However, as demonstrated above, there is no such length contraction
effect, which, as pointed out elsewhere [5, 6, 7, 8, 9] is a spurious consequence of misin-
terpreting the space-time Lorentz transformation. Indeed the possibility of such a length
contraction effect is already excluded by inspection of Fig.2a. In the right hand figure, the
PMC correponding to the moving object O considered as a pointer is MR(0)@MR′(10).
Since O is in motion and R’ at rest no hypothetical length contraction effect operates
here. In the left hand figure the mutually simultaneous PMC in S is MR′(10)@MR(0)
so that at t(S) = t′(S ′) = L/v observers in S and S’ see reciprocal PMCs, i.e. ones
related by exchange of the pointer and mark symbols. If however the length contrac-
tion effect exists, the observer in S will see instead that the PMC corresponding to O is
MR′(0)@MR(0) at time t(S) = L/v. But from the RP this PMC must correspond to the
times t(S) = t′(S ′) = 2L/v (see Fig.2b) contrary to the assumption that t(S) = L/v. The
length contraction hypothesis therefore contradicts the corollory of the RP that states
that mutually simultaneous events in two frames have reciprocal PMCs, since it implies
that the reciprocal PMCs MR′(0)@MR(0) and MR(0)@MR′(0) are not mutually simul-
taneous.
The second example of a TD experiment illustrates a typical application of the effect
in particle physics (see Fig.3). A π− meson interacts with a proton in a thin plastic
target T to produce a Λ hyperon via the reaction3 π−p → ΛK0 The hyperon moves with
3The results of an actual such experiment constructed to test the ∆S = ∆Q rule in semileptonic
neutral kaon decays are described in Ref. [19].
velocity v =
3c/2 perpendicular to the plane of the target in the laboratory frame S.
After the time t′(S ′) = T ′ in its rest frame S’, it decays to a proton and a negative pion:
Λ → pπ−. These decay products are observed in the laboratory system. The experiment
is in every way similar to that shown in Fig.2. The object O is replaced by the target T,
the object O’ by the undecayed Λ or the kinematical system constructed from its decay
products. The photon pulse emitted by PL’ is replaced by the decay products of the
Λ. By reconstructing the trajectories of the decay p and π− in a particle detector the
position of the decay event and hence the decay length lD –the distance between the point
of production and decay of the Λ– in the frame S can be measured. Identification of the
p and π− and measurement of their momenta (typically by measurement of the curvature
of their trajectories in a known magnetic field ) enables the momentum P and the energy
E of the Λ to be determined. Since v = Pc2/E and γ = E/(mΛc
2) where mΛ is the mass
of the Λ, the proper decay time of the Λ is given by Eqn(5.3) as:
T ′ = ∆t′(S ′) =
∆t′(S)
(5.12)
The spatial configurations of T and the Λ at different times in the frames S and S’ are
shown in Fig.3. The spatial separations of T and the Λ at the observed instant of decay
in S and S’ obey the relation (5.11). This implies that this separation, in changing the
frame of observation from the rest frame of the Λ to the laboratory system in which it is
motion, undergoes a ‘length expansion’ by the factor γ. In accordance with Eqn(5.11), it
can be seen that this is a necessary consequence of the RP, given the existence of the TD
effect.
The mutally simultaneous events in S and S’ shown in Fig.3c, correspond, as they
must, to equal spatial separations of T and the physical object constituted by the decay
products, p and π−, of the Λ. However, in the frame S, these particles have just been
created and have vanishing spatial separation, whereas in S’ they are spatially separated by
a distance corresponding to a time-of-flight (γ− 1)T ′. This also seems highly paradoxical
when interpreted by commonsense classical concepts of space and time.
Acknowledgement
I thank the referee of the journal that rejected Ref. [11] for publication for correspon-
dence that was important for the clarificatiion of the ideas expressed in both the latest
version of Ref. [11] and the present paper.
Added Note
The calculations presented in the present paper are flawed by a major conceptual
misunderstanding which is rectified in later papers [20, 21] treating similar subjects.
At the time of writing the present paper, the author had correctly understood the
spurious nature of the ‘relativity of simultaneity’and ‘length contraction’ effects of con-
ventional special relativity theory [5, 7, 8, 10] but had not yet drawn the simple conclusion
that the existence of the genuine and experimentally-confirmed time dilatation effect then
necessarily implies that the Reciprocity Principle, as generally understood, also breaks
down in special relativity. This point is easily understood by considering the first member
of Eqn(3.1), written in a simplified notation as:
dxO′O
Transforming into the frame S’, the invariance of length intervals implies that
dxO′O = −dx
Since the time dilatation relation gives dt = γdt′, the Reciprocity Principle of (3.1) is
replaced by:
dxO′O
so that
= −γv
to be compared with v′ = −v given by (3.1).
The detailed calculations presented in Section 4 are correct and logically coherent
given the initial assumptions, but the configurations shown in the frame S’ in Fig.1 do not
correspond to observations in this frame of the coincidence events specified in the frame
S in the same space-time experiment. If this were the case, in the S’ frame configurations
in Fig.1 v should be replaced by γv and t and t′ should be related by time dilatation
relation t = γt′. In fact, what are shown in Fig.1 and considered in Section 4 are the
configurations in S of a primary experiment and in S’ of the corresponding but physically
independent reciprocal experiment [20, 22].
Nevertheless, the invariance of corresponding length intervals can be derived [21] by
considering the configurations in S and S’ in Fig.1b in the case that they are corresponding
ones, at the same epoch, in the same space-time experiment. In this case, as explained
above, the speed of O in S’ should be γv, not v. Consider, however, an object Õ with the
same x′ coordinate as O that does have the velocity v. The separation L′ of O and O’
in S’ is then equal to that between O’ and Õ. at the epoch of Fig.1b. Compare now the
configuration of O and O’ in S, with separation L with the corresponding one of Õ. and
O’ in S’ with separation L′. From the symmetry of the configurations it can be seen that
both L and L′ can depend only on v: L = L(v), L′ = L′(v). The reciprocity of the two
configurations is now invoked to give the condition, as stated by Pauli [23]:
The contraction of length at rest in S’ and observed from S is equal to
the length at rest in S as observed from S’.
The ‘length at rest in S’ ’ is L′ which ‘as observed from S’ is L, whereas the ‘length
at rest in S ’ is L which ‘as observed fron S’ ’ is L′. Denoting the contraction factor by
α(v), the above condition states that
L = α(v)L′, L′ = α(v)L
which implies that L = α(v)2L or α(v)2 = 1 so that L = L′ and the spatial separation
between O and O’ is the same in S and S’ at corresponding epochs. The same conclusion
is more simply reached by noting the symmetry of the configurations of O,O’ in S and
Õ,O’ in S’. and applying Leibnitz’ Principle of Sufficient Reason [21].
If, therefore, in the primary experiment, shown in S in Fig.2b and S’ in Fig.2a, the
configuration in S’ in Fig.2a is to correctly represent that corresponding to the configura-
tion in S in Fig.2b, the velocity v in S’ should be replaced by γv, so that when PL’ flashes
O’ is aligned with MR(20) in both S and S’. In the reciprocal experiment, shown in S in
Fig.2a and S’ in Fig.2b, v in S in Fig.2a should be replaced by γv so that O is aligned
with MR’(0) in both S and S’ when PL flashes.
Similarly, in the thought experiment of Fig.5, if the S’ frame configurations on the right
side of the figure are to represent observations in this frame of events shown in S by the
configurations on the left side, instead of what are actually shown which are configurations
of the physically independent reciprocal experiment, v should be replaced by γv in all the
S’ frame configurations. In this case, there is no mismatch between the spatial position of
the decay event in the two frames and the claimed ‘length expansion’ effect does not occur.
Indeed the claimed ‘... different PMC corresponding to observations of the light flashes
emitted by PL and PL’ in different frames in (5.7)-(5.10)’ is not only ‘...deeply perplex-
ing for common-sense concepts of space and time.’ it is the absurd (self-contradictory)
consequence of assuming, at the same time, that length intervals are invariant, time di-
latation occurs and the conventional interpretation of the Reciprocity Principle holds. In
conventional special relativity theory time dilatation and the Reciprocity Principle are
reconciled by invoking the spurious ‘length contraction’ effect dxO′O = −γx
[18]. so
that v′ = −v. The correct physical interpretation of the Reciprocity Principle is actually
the definition of the configuration in S’ of the physically-independent experiment that is
reciprocal to the primary one specified by the standard configuration of the frames S and
S’ [20, 22].
References
[1] A.Einstein,‘Zur Elektrodynamik bewegter Korper’, Annalen der Physik 17, 891
(1905). English translation by W.Perrett and G.B.Jeffery in ‘The Principle of Relativ-
ity’ (Dover, New York, 1952) P37-P65, or in ‘Einstein’s Miraculous Year’ (Princeton
University Press, Princeton, New Jersey, 1998) P123-P161.
[2] W.Ignatowsky, ‘Einige allgemine Bermerkungun Zum Relatitivitäsprinzip’, Phys.
Zeitschr. 11 972 (1910).
[3] J.H.Field, ‘A New Kinematical Derivation of the Lorentz Transformation and the
Particle Description of Light’, Helv. Phys. Acta. 70 542-564 (1997); arXiv pre-print:
http://xxx.lanl.gov/abs/physics/0410262. Cited 27 Oct 2004.
[4] V.Berzi and V.Gorini, ‘Reciprocity Principle and the Lorentz Transformation’, Journ.
Math. Phys. 10 1518-1524 (1969).
[5] J.H.Field, ‘The Local Space-Time Lorentz Transformation: a New Formulation
of Special Relativity Compatible with Translational Invariance’, arXiv pre-print:
http://xxx.lanl.gov/abs/physics/0501043. Cited 30 Nov 2007.
[6] J.H.Field, ‘The physics of space and time I: The description of rulers and clocks
in uniform translational motion by Galilean or Lorentz transformations’, arXiv pre-
print: http://xxx.lanl.gov/abs/physics/0612039. Cited 28 Mar 2008.
[7] J.H.Field, ‘Uniformly moving clocks in special relativity: Time dilatation,
but but no relativity of simultaneity or length contraction’, arXiv pre-print:
http://xxx.lanl.gov/abs/physics/0603135. Cited 4 Dec 2008.
[8] J.H.Field, ‘Clock rates, clock settings and the physics of the space-time Lorentz
transformation’, arXiv pre-print: http://xxx.lanl.gov/abs/physics/0606101. Cited 4
Dec 2007.
[9] J.H.Field, ‘Absolute simultaneity: Special relativity without light signals or synchro-
nised clocks’, arXiv pre-print: http://xxx.lanl.gov/abs/physics/0604010. Cited 6 Nov
2008.
[10] J.H.Field, ‘Translational invariance and the space-time Lorentz transformation with
arbitary spa-
tial coordinates’, arXiv pre-print: http://xxx.lanl.gov/abs/physics/0703185. Cited
15 Feb 2008.
[11] J.H.Field, ‘Relativistic velocity addition and the relativity of space and time inter-
vals’, arXiv pre-print: http://xxx.lanl.gov/abs/physics/0610065. Cited 6 Feb 2009.
[12] J.H.Field, ‘The train/embankment thought experiment, Einstein’s second pos-
tulate of special relativity and relativity of simultaneity’, arXiv pre-print:
http://xxx.lanl.gov/abs/physics/0606135. Cited 9 Jan 2009.
[13] J.H.Field, ‘Muon decays in the Earth’s atmosphere, time dilatation and relativity of
simultaneity’, arXiv pre-print: http://xxx.lanl.gov/abs/physics/0606188. Cited 22
Jan 2009.
[14] I.B.Cohen and R.S.Westfall ‘Newton’ (W.W.Norton Company, New York, 1995)
P233.
[15] R.Mansouri and R.U.Sexl, ‘A Test Theory of Special Relativity: I Simultaneity and
Clock Synchronisation’, Gen. Rel. Grav. 8, 497-513 (1977).
[16] J.H.Field, ‘The physics of space and time II: A reassessment of Einstein’s 1905 special
relativity paper’, arXiv pre-print: http://xxx.lanl.gov/abs/physics/0612041. Cited
14 Apr 2008.
[17] J.Bailey et al., ‘Measurements of relativistic time dilation for positive and negative
muons in circular orbit’, Nature 268, 301 (1979).
[18] N.D.Mermin ‘Its About Time’ (Princeton University Press, Princeton 2005) Figure
6.3. P67.
[19] J.C.Hart et. al ,‘ A test of the ∆S = ∆Q rule in Ke3 decay’, Nucl. Phys.B 66 317
(1973).
[20] J.H.Field, ‘Primary and reciprocal space-time experiments, relativistic reciprocity re-
lations and Einstein’s train-embankment thought experiment’, arXiv pre-print:http:
//xxx.lanl.gov/abs/0807.0158. Cited 1 Jul 2008.
[21] J.H.Field, ‘Space-time attributes of physical objects and the laws of space-time
physics’, arXiv pre-print: http://xxx.lanl.gov/abs/0809.4121. Cited 24 Sep 2008.
[22] J.H.Field, ‘The physics of space and time III: Classification of space-time experiments
and the twin paradox’, arXiv pre-print: http://xxx.lanl.gov/abs/0806.3671. Cited 23
Jun 2008.
[23] W.Pauli, ‘Relativitätstheorie’ (Springer, Berlin 2000). English translation,‘Theory of
Relativity’ (Pergamon Press, Oxford, 1958) Section 4, P11.
ABSTRACT
  Ponderable objects moving in free space according to Newton's First Law
constitute both rulers and clocks when one such object is viewed from the rest
frame of another. Together with the Reciprocity Principle this is used to
demonstrate, in both Galilean and special relativity, the invariance of the
measured length of a ruler in motion. The different times: `proper', `improper'
and `apparent' appearing in different formulations of the relativistic time
dilatation relation are discussed and exemplified by experimental applications.
A non-intuitive `length expansion' effect predicted by the Reciprocity
Principle as a necessary consequence of time dilatation is pointed out

<|endoftext|><|startoftext|>
Introduction
The B → ρK∗ charmless decays proceed through
dominant gluonic penguin loops and doubly Cabibbo-
suppressed tree processes, as shown in Fig. 2. The ex-
ternal tree diagram is only possible with a K∗+, and the
color-suppressed internal tree diagram with a ρ0. Hence
B+ → ρ+K∗0 is pure penguin.
According to isospin symmetry, the two modes with a
charged ρ are expected to have a branching fraction twice
as large as the two modes with a neutral ρ.
FIG. 2: Feynmann diagrams for the B → ρK∗ decay: gluonic
penguin, external tree and internal tree diagrams.
http://arxiv.org/abs/0704.0364v1
mailto:georges.vasseur@cea.fr
B. Results from Belle
Mbc (GeV/c
∆E (GeV)
FIG. 3: Projections of Mbc for events in the ∆E signal region
(left) and of ∆E in the Mbc signal region (right). The solid
curves show the results of the fit. The dashed curve is the
signal contribution. The hatched histograms represent the
continuum background. The sum of the b → c and continuum
background component is shown as dot-dashed lines.
0.4 0.8 1.2 1.6 2.0
M(π+π0) (GeV/c2)
0.64 0.84 1.04 1.24 1.44
M(K+π-) (GeV/c2)
FIG. 4: Signal yields obtained from the Mbc-∆E distribution
in bins of M(π+π0) (left) for events in the K∗0 region and in
bins of M(K+π−) (right) for events in the ρ+ region. The
points with error bars show the data. Solid curves show the
results of the fit. Hatched histograms are for the nonresonant
component.
Belle was the first experiment in 2005 to publish a re-
sult on the observation of the B+ → ρ+K∗0 mode [5],
on a sample of 275 millions of BB̄ pairs. A signal of
B+ → π+π0K+π− is extracted from the e+e− → qq̄
continuum and BB̄ backgrounds in an extended un-
binned maximum-likelihood fit using the B meson beam-
constrained mass Mbc and energy difference ∆E, as
shown in Fig 3.
The B+ → ρ+K∗0 signal is extracted by fits to Mbc
and ∆E in bins of the vector meson masses M(π+π0)
and M(K+π−), as shown in Fig 4. This is necessary
because there is a large nonresonant ρKπ background,
which gives a continuum in the distribution of M(K+π−).
Nethertheless there is a clear B+ → ρ+K∗0 signal of
85± 16 events with a significance of 5.2 σ.
As for fL, it is obtained by fitting simultaneously the
signal yields obtained from Mbc-∆E fits in bins of the two
helicity angles, assuming an S-wave Kπ system in the
ρKπ background. The results for the branching fraction
and fL in B
→ ρ+K∗0 are:
B = (8.9± 1.7± 1.2) 10−6,
fL = 0.43± 0.11
+0.05
−0.02.
The value found for fL is similar to the one found in φK
and its error is about twice as large as in φK∗.
C. Results from BABAR
 (GeV)0π+πm
0.5 1 1.5E
 (GeV)0π+πm
0.5 1 1.5E
 (GeV)-π+πm
0.6 0.8 1 1.2 1.4E
 (GeV)-π+πm
0.6 0.8 1 1.2 1.4E
 (GeV)-π+Km
0.8 1 1.2 1.4E
 (GeV)-π+Km
0.8 1 1.2 1.4E
 (GeV)-π+Km
0.8 1 1.2 1.4E
 (GeV)-π+Km
0.8 1 1.2 1.4E
FIG. 5: sPlots for the ππ (top) and Kπ (bottom) invariant
masses in the B+ → ρ+K∗
(left) and B0 → ρ0K∗
/B0 →
f0(980)K
∗0 (right) analyses. The points with error bars show
the data. The solid curve shows the signal and nonresonant
background contribution, the dashed curve is the nonreso-
nant background contribution (ρKπ except for the top right
plot where it represents the sum of f0(1370)K
∗, ππK∗, and
ππKπ). The arrows show the standard mass windows used
in the final fit.
More recently BABAR published an anlysis of all four
B → ρK∗ modes [6], performed on a sample of 232 mil-
lions of BB̄ pairs. It is based on an unbinned maximum-
likelihood fit, using seven variables: the B meson energy-
substituted mass mES and energy difference ∆E, a neural
network output or a Fischer discriminant combining sev-
eral event shape variables, the two vector meson masses,
and the two helicity angle cosines. The fit allows the
simultaneous extraction of the branching ratio and the
fraction of longitudinal polarization.
The major challenge in the analysis comes from the
nonresonant backgrounds, which share the same final
state as the signal. They are studied by enlarging the
vector meson mass windows, as illustrated in Fig. 5. As
in Belle, a large ρKπ background is seen in the mKπ dis-
tribution in the B+ → ρ+K∗0 mode. The Kπ system in
this background is measured to be mostly S-wave. The
situation is even more complex in the B0 → ρ0K∗0 mode,
since in addition to the ρKπ background there are sev-
eral contributions seen in the mππ distribution for a ρ
in contrast to the one for a ρ+. The f0(980) can be seen
clearly. In fact B → f0(980)K
∗, which is a scalar-vector
TABLE I: Results from BABAR on the B → ρK∗ modes: signal yield with its statistical uncertainty, significance (systematic un-
certainties included), branching fraction (90% confidence level upper limit in parentheses), fraction of longitudinal polarization
and direct CP asymmetry. (The numbers in brackets are not quoted as measurements.)
Mode Signal yield Significance (σ) B(×10−6) fL ACP
∗+ 51± 24 2.5 < 6.1 (3.6± 1.7± 0.8) [0.9 ± 0.2]
∗+ 60± 24 1.6 < 12.0 (5.4± 3.6± 1.6)
∗0 194± 29 7.1 9.6± 1.7± 1.5 0.52 ± 0.10± 0.04 −0.01 ± 0.16± 0.02
∗0 185± 30 5.3 5.6± 0.9± 1.3 0.57 ± 0.09± 0.08 0.09 ± 0.19± 0.02
a) b)
c) d)
 (GeV)ESm
 (GeV)ESm
5.25 5.26 5.27 5.28 )2 (GeV/cESm )2 (GeV/cESm
5.26 5.27 5.28 5.29
FIG. 6: Projections of mES of events passing a signal likeli-
hood threshold for (a) B+ → ρ0K∗
, (b) B+ → ρ+K∗
, (c)
∗+, (d) B0 → ρ0K∗
, (e) B+ → f0(980)K
∗+, and
(f) B0 → f0(980)K
∗0. The points with error bars show the
data. The solid curve is the fit function, the dashed curve
is the total background contribution, and the dotted curve is
the continuum background contribution.
mode, is considered as another signal to be measured in
the same maximum-likelihood fit. Also present are con-
tributions from the f0(1370) and nonresonant ππ. The
yields of the nonresonant backgrounds are fitted in the
enlarged mass windows, then extrapolated to the stan-
dard ones and fixed in the final fit with the standard mass
windows.
The projection plots in the B mass shown in Fig. 6 il-
lustrate the extraction of the signal from the continuum
and BB̄ backgrounds in the four B → ρK∗ channels and
the two B → f0(980)K
∗ modes. Table I summarizes the
results. No significant enough signals are observed for
B0 → ρ−K∗+ and B+ → ρ0K∗+, where upper limits at
the 90 % confidence level are set on the branching ratios.
For the latter a related signal B+ → f0(980)K
∗+ is ob-
served with a significance of 5.0 σ and a measured branch-
ing fraction of (5.2 ± 1.2 ± 0.5) 10−6. In B+ → ρ+K∗0,
the result is in very good agreement with the result from
Belle, with a similar precision. The B0 → ρ0K∗0 mode is
observed for the first time. The ratio between the branch-
ing fractions in these two modes is compatible with the
factor 2 expected from isospin symmetry.
The value of ACP is measured in the two significant
modes to be compatible with 0, as expected since there is
one dominant diagram. Finally fL is found close to 0.5 in
these two modes. It is compatible with the measurement
from Belle and has about the same precision. It is again
similar to the value found for φK∗.
III. MODES WITH ω
40 +ρω
10 ωω
20 *0Kω
20 0ρω
40 +ρω
20 ωω
 E (GeV)  ∆
-0.2 -0.1 0 0.1 0.2
 (GeV/cESM
5.25 5.26 5.27 5.28 5.29
20 0fω
FIG. 7: Projections of ∆E (left) and mES (right) of events
passing a signal likelihood threshold for, from top to bottom,
∗0, B+ → ωK∗
, B0 → ωρ0, B+ → ωρ+, B0 →
ωω, B0 → ωφ, and B0 → ωf0(980). The points with error
bars show the data. The solid curve is the fit function, the
dashed curve is the signal contribution, and the dot-dashed
curve is the background contribution.
TABLE II: Results from BABAR on modes involving an ω meson: signal yield with its statistical uncertainty, significance
(systematic uncertainties included), branching fraction (90% confidence level upper limit in parentheses), fraction of longitudinal
polarization and direct CP asymmetry. (The numbers in brackets are not quoted as measurements.)
Mode Signal yield Significance (σ) B(×10−6) fL ACP
∗0 55± 20 2.4 < 4.2 (2.4± 1.1± 0.7) [0.71 ± 0.25]
∗+ 8± 16 0.4 < 3.4 (0.6± 1.3± 1.0)
−18± 16 0.6 < 1.5 (−0.6± 0.7+0.8−0.3)
+ 156± 32 5.7 10.6± 2.1+1.6−1.0 0.82 ± 0.11± 0.02 0.04 ± 0.18± 0.02
→ ωω 48+24−19 2.1 < 4.0 (1.8
−0.9 ± 0.4) [0.71 ± 0.25]
→ ωφ 3.1±+4.4−8.5 0.3 < 1.2 (0.1± 0.5± 0.1)
    ρθcos
-0.5 0 0.5
|    ωθ|cos
0 0.5 1
FIG. 8: Projections of the helicity-angle cosines for ω (left)
and ρ+ (right) of events passing a signal likelihood threshold
from the fit for B+ → ωρ+ decays. The points with error bars
show the data. The solid curve is the fit function, the dashed
curve is the signal contribution, and the dot-dashed curve is
the background contribution.
On the same sample of 232 millions of BB̄ pairs,
BABAR has also recently published a search for several
vector-vector modes involving an ω meson [7]: B0 →
ωK∗0, B+ → ωK∗+, B0 → ωρ0, B+ → ωρ+, B0 → ωω,
and B0 → ωφ. The related vector-scalar mode B0 →
ωf0(980) was also searched for. An earlier search for
B → ωK∗ and B → ωρ on 89 millions of BB̄ pairs
resulted in the first observation of the B+ → ωρ+ chan-
nel [8].
The analysis is also based on an extended unbinned
maximim-likelihood fit using the same seven variables as
in the previous section. Nonresonant ππ and Kπ back-
grounds are fixed in the fit as determined from extrap-
olations from higher-mass regions. The projection plots
of ∆E and mES of Fig. 7 illustrate the extraction of the
signal from the continuum and BB̄ backgrounds in all
these modes. In most of them, no significant enough sig-
nal is seen. The only channel where a significant signal is
observed is B+ → ωρ+. Its measured branching fraction
is about 2 standard deviations smaller than the one of
B+ → ρ+ρ0 [2], while these two branching fractions are
naively expected to be equal. Table II summarizes the
results in all the modes. To calculate the branching frac-
tion, fL is left free in the fit for the three modes with a
signal significance greater than 2σ and is fixed otherwise.
Upper limits at the 90 % confidence level are set on the
branching fractions for the modes other than B+ → ωρ+.
The maximum-likelihood fit also provides the value of
fL in B
→ ωρ+, which is found to be 0.82 ± 0.11, a
high value expected for this tree-dominated mode. This
is illustrared in the projection plots of the helicity angle
cosines shown in Fig. 8. The direct CP asymmetry is also
measured and found to be compatible with 0.
IV. CONCLUSION
In summary, improved analyses with explicit consider-
ation of nonresonant backgrounds have been performed
on several charmless hadronic vector-vector decays of
the B meson. The B+ → ωρ+, B+ → ρ+K∗0, and
B0 → ρ0K∗0 modes have been observed and measured in
the past few years. Improved upper limits have been set
on the branching fraction of other vector-vector modes.
The recent results on vector-vector modes have also
brought more pieces to the polarization puzzle. The
penguin-dominated B+ → ρ+K∗0 and B0 → ρ0K∗0
modes have a fraction of longitudinal polarization of
about 0.5 like φK∗, while the tree-dominated B+ → ωρ+
mode has one closer to 1 like ρρ. As a lot of charmless
vector-vector modes have not yet been observed, new re-
sults can be expected with more data.
[1] M. Beneke et al., Phys. Lett. B 638, 68 (2006).
[2] A. Somov, contribution to this conference.
[3] K.F. Chen, contribution to this conference.
[4] G.W.S. Hou, contribution to this conference.
[5] J. Zhang et al., Phys. Rev. Lett. 95, 141801 (2005).
[6] B. Aubert et al., Phys. Rev. Lett. 97, 201801 (2006).
[7] B. Aubert et al., Phys. Rev. D 74, 051102 (2006).
[8] B. Aubert et al., Phys. Rev. D 71, 031103 (2005).
ABSTRACT
  The recent analyses of the following rare vector-vector decays of the B meson
are presented: rho K*, omega K*, omega rho, omega omega, and omega phi
charmless final states. The latest results indicate that the fraction of
longitudinal polarization is about 0.5 in penguin-dominated modes and close to
1 for tree-dominated modes.

<|endoftext|><|startoftext|>
arXiv:0704.0365v1  [cond-mat.supr-con]  3 Apr 2007
7 Extending the theory of phonon-mediated
superconductivity in quasi-2D
J.P.Hague
Department of Physics, Loughborough University, Loughborough, LE11 3TU
Abstract. I present results from an extended Migdal–Eliashberg theory of electron-phonon inter-
actions and superconductivity. The history of the electron-phonon problem is introduced, and then
study of the intermediate parameter regime is justified from the energy scales in the cuprate su-
perconductors. The Holstein model is detailed, and limiting cases are examined to demonstrate the
need for an extended theory of superconductivity. Results of the extended approximation are shown,
including spectral functions and phase diagrams. These are discussed with reference to Hohenberg’s
theorem, the Bardeen–Cooper–Schrieffer theory and Coulomb repulsion. [Published in: Lectures
on the physics of highly correlated electron systems X, p255-264, AIP Conference Proceedings
vol. 846 (2006)]
INTRODUCTION
Over the past half-century, the study of the role of electron-phonon interactions in
condensed matter physics has been an active and controversial field. Initially of interest
from the point of view of thermal properties, early models of the interactions between
lattice vibrations and electrons included the continuum Fröhlich model [1]. Interest in
electron-phonon interactions increased dramatically when in 1957, Bardeen, Cooper and
Schrieffer (BCS) published their famous theory of superconductivity [2], which directly
implicated phonons as the microscopic mechanism for the low temperature absence of
resistivity in a variety of metals. Until the discovery of the cuprate superconductors by
Bednorz and Müller in 1986 [3], the BCS picture was found to account well for all
superconducting materials - a remarkable success for a simple mean-field theory which
is only applicable at weak coupling!
Soon after the realisation that phonons were responsible for superconductivity, Eliash-
berg extended the theoretical description beyond the absolute weak coupling theory with
the famous Eliashberg equations [4]. In doing this, he built on the earlier work of Migdal,
who argued that a simple resummation of a certain class of Feynmann diagrams should
be sufficient to describe the limit of low phonon frequency [5]. Eliashberg’s theory can
be argued to be one of the first applications of the dynamical mean-field theory (DMFT)
[6], since (in its original sense) it ignores spatial fluctuations (momentum dependence)
in the self-energy, while keeping frequency dependent (dynamical) effects.
The purpose of this paper is to describe an extension to the theory of superconductivity
from electron-phonon interactions. The approach goes beyond the Eliashberg theory by
introducing the effects of spatial fluctuations and higher order terms in the perturbation
theory. The aim is to develop a theory which can be used for systems with stronger
coupling, larger phonon frequencies and reduced dimensionality. I begin by motivating
http://arxiv.org/abs/0704.0365v1
the need for a more sophisticated theory from the experimental viewpoint. I also discuss
limiting cases of the Holstein model, and how the large phonon frequency limit of that
model implies that the conventional theories of superconductivity are incomplete. I then
introduce the approximations needed to develop a more sophisticated theory. Finally
I present some results from the new approximation, and discuss them in relation to
Cuprate superconductors, and also with regard to conventional theories and the exact
Hohenberg theorem [7].
MOTIVATION
When the high-temperature cuprate superconductors were discovered in 1986 [3], the
possibility that phonons could be attributed to the microscopic mechanism was quickly
discounted by many people. In part, this was due to the absence of an isotope effect at
optimal doping, and also an assumption that phonon-mediated superconductivity could
not occur above 30K. The mechanism for high-TC superconductors remains highly
controversial, and many different hypotheses are suggested (some examples are spin
fluctuations [8] and exotic phonon mechanisms such as bipolarons [9]). An increasing
body of evidence shows that phonons as well as Coulomb repulsion have an effect on the
physics of the cuprate materials. I shall give a brief review of the current experimental
situation in this section, and argue that (1) Electron-phonon interactions need to be
treated on an equal footing to Coulomb repulsion if the Cuprates are to be understood,
and (2) In order to treat the phonons in the Cuprates, extensions to the current theories
of electron-phonon interactions and phonon-mediated superconductivity are required.
There are several experiments demonstrating strong electron-phonon coupling in the
cuprates. The most compelling is the existence of a strong isotope effect on exchanging
O16 for O18 [10]. There are also some more recent experiments which demonstrate
the effects of electron-phonon interactions in a transparent manner. Figure 1 shows
schematic representations of electron and phonon dispersions in the cuprates. Panel
(a) details the main features of the electronic dispersion measured by Angle-Resolved
Photo-Emmission Spectroscopy (ARPES) in the [11] direction [11]. At energies close to
the Fermi-surface, there are coherent excitations with a long lifetime. As εk = |ω0 − εF |
is approached, the gradient of the dispersion changes at a sharp kink. The phonon is
of the transverse optic variety, and its frequency (ω0) is of the order of 100meV. It
suffices here to mention that this is very large. The ratio of the gradients above and
below the kink is related to the dimensionless coupling constant (λ = g2/tω0), and it is
found that λ can take values of up to 2 [11]. Panel (b) shows a schematic representation
of some neutron scattering results measuring the phonon dispersion [12, 13]. Above
the transition temperature, this looks like the solid line, but as the system moves from
normal to superconducting state, the spectral weight in the circled area vanishes. This
indicates that the superconductivity (bound pairs of electrons) affects the phonons, and
is additional evidence for a strong electron-phonon coupling.
A frequent misconception about the cuprates is that electron-phonon terms in the
Hamiltonian can be neglected on the basis that they are small. To demonstrate that this
is not the case, figure 2 shows approximate energy scales in the cuprates. The largest
energy by far is the Coulomb repulsion (or Hubbard U ) which weighs in at some 10eV.
(a) (b)
INCOHERENT
COHERENT
WEIGHT LOST ON
TRANSITION
kk F 0
PHONON DISPERSIONELECTRON DISPERSION
FIGURE 1. Schematics showing the effect of electron-phonon interactions on the electron and phonon
dispersions in the cuprates. Both panels describe measurements along the [11] direction. Panel (a) shows
a schematic representation of the electronic dispersion measured by Angle-Resolved Photo-Emmission
Spectroscopy (ARPES) [11]. At energies close to the Fermi-surface, there are coherent excitations with
a long lifetime. As εk = |ω0 − εF | is approached, the gradient of the dispersion changes and a kink is
introduced. The phonon is of the transverse optic variety, and its frequency (ω0) is ∼ 75meV. The ratio
of the gradients above and below the kink is related to the coupling constant [11]. Panel (b) shows a
schematic representation of some neutron scattering results measuring the phonon dispersion [12, 13].
Above the transition temperature, this looks like the solid line, but as the system moves from the normal
to the superconducting state, the spectral weight in the shaded area vanishes. This indicates that the
superconducting state affects the phonons, and is further evidence for strong electron-phonon coupling.
Next is the intersite hopping integral t, which is of the order of 1eV. Using a simple
2nd order perturbation theory at strong coupling, an effective exchange interaction is
generated [14], with J = t2/U of the order of 100meV. This J often used to argue for a
spin-fluctuation theory of high-TC superconductivity that neglects phonons. The problem
with this viewpoint is immediately clear if one reviews the experimental data. First, the
energies of the phonons are also approximately 100meV, so they cannot be treated as
a small energy scale. Second, a dimensionless coupling constant of order unity implies
dimensionfull coupling g with similar magnitude. Thus with three very close energy
scales, it is important that the contributions from both phonon and Coulomb mechanisms
are treated on equal footing in a theory for the cuprates. Unfortunately, as I discuss in the
next section, current theories of electron-phonon interactions are not capable of handling
the large phonon energies and coupling constants in the cuprates. The remainder of this
paper focuses on how the theory can be extended to describe this regime.
MODEL AND LIMITS
A generic model of electron-phonon interactions includes the motion of the electrons
Hel, the motion of the ions (or phonons) Hph and the interaction between the electrons
and the phonons (which may be absorbed or emitted) which is denoted Hel−ph. In this
Energy: 10meV 100meV 1eV 10eV
T J t U
FIGURE 2. Schematic showing the energy scales in the cuprates. The largest energy by far is the
Coulomb repulsion (or Hubbard U) of order 10eV. The intersite hopping integral t, is ∼1eV. Using a
simple 2nd order perturbation theory, an effective exchange interaction is generated, with J = t2/U of
the order of 100meV. This J is then used to argue for the spin-fluctuation theory of high TC. However,
the energies of the phonons are also approximately 100meV and the dimensionful coupling g has around
the same value. Thus with 3 similar energy scales, it is important that the contributions from both spin-
fluctuations and phonon mechanisms are treated on equal footing.
way, H = Hel +Hel−ph +Hph is the total Hamiltonian.
Hel = ∑
kck ≈− ∑
<i j>σ
tc†iσ c jσ (1)
Hel−ph =−∑
k−qck(b
q +b−q)≈−∑
niσ gri (2)
Hph = ∑
b†kbk +
Mω20 r
The first term in the Hamiltonian is the general form for free electrons, i.e. the total
energy is the sum of the kinetic energies of all occupied states. In a special case,
which is known as the Holstein Hamiltonian, the electrons in a tight binding model
may hop between nearest-neighbour sites only, and εk = −2t ∑Di=1 cos(ki), where t is
the overlap integral. In the generic form of the electron-phonon interaction, an electron
may be scattered by absorbing a phonon with momentum −q or emitting a phonon with
momentum q. An additional approximation uses a momentum independent electron-
phonon coupling, g, and in that case the Fourier transform shows that the second
term connects the local ion displacement, ri to the local electron density. Finally, the
free phonon term may be simplified by using the Einstein approximation ωk ≈ ω0
and Fourier transforming, the bare phonon Hamiltonian is shown to be a series of
independent simple harmonic oscillators at each site index. The creation of electrons
and phonons is represented by c† and b† respectively, pi is the ion momentum and M the
ion mass. By choosing t = 0.25, a bandwidth of W = 2 is chosen. A small interplanar
hopping of t⊥ = 0.01 is included to remove the logarithmic singularity in the 2D density
of states at ε = 0.
Figure 3 shows the parameter space of the Holstein model. For very large phonon
frequency, the effective interaction is instantaneous, and a Lang–Firsov transformation
[15] results in an attractive Hubbard model (which is one of the standard models for
correlated electron systems) [16]. Alternatively, taking the limit of very small phonon
frequency, a fast moving electron cannot ‘see’ the nuclei move in the time it takes to
EXTENDED THEORY
CUPRATE?
FIGURE 3. Parameter space of the Holstein model. For very large phonon frequency, the effective
interaction is instantaneous, and a Lang–Firsov transformation results in an attractive Hubbard model.
Alternatively, taking the limit of very small phonon frequency, a fast moving electron cannot ‘see’ the
phonons move, and the problem maps to a static disorder problem (similar to the Falikov–Kimball model
[19]). This makes the phonon problem extremely hard, and little is known about the middle of the
parameter space. The range of the Eliashberg theory is shown in the bottom left corner. The expected
position of the cuprates is shown as the single diamond. The expected validity of an extended theory
including all 2nd order Feynman diagrams is also shown.
traverse many sites, so the problem maps to a static disorder problem (which is essen-
tially uncorrelated). One may therefore think of the phonon frequency as possessing the
ability to “tune” the effect of correlations, and one therefore obtains a second motivation
for the study of electron-phonon systems of trying to understand electronic correlations
[17]. The correlation tuning makes the phonon problem extremely hard, and little is
known about the intermediate regime of the parameter space. The range of the Eliash-
berg theory is shown in the bottom left corner. Contrary to Migdal’s assumption, the
theory cannot extend beyond intermediate coupling since renormalisation of the effec-
tive mass reduces εF invalidating the condition (Migdal’s theorem) ω0 ≪ εF [18, 9]. The
approximate position of the phonon parameters in the cuprates is shown as the single di-
amond. It is essential to correct the theory for weak to intermediate coupling at larger
phonon frequencies. The extension is clear by looking at the large phonon frequency
limit. The Hubbard limit requires that all 2nd order processes in U are included in the
self-energy, or the incorrect weak coupling limit is found. An extended theory including
all 2nd order Feynman diagrams is required to understand the weak coupling limit, from
small to large phonon frequency.
FIGURE 4. Series of Feynman diagrams used in the current approximation. Σ is the electron and Π the
phonon self-energy. Series (a) is the Migdal-Eliashberg approximation and (b) the vertex corrected series.
EXTENDING THE ELIASHBERG THEORY
Extending the Eliashberg theory involves inserting the lowest order vertex corrections
into the electron and phonon self energies. In the Eliashberg theory, emitted phonons
are reabsorbed in a last-out-first-in order. Vertex corrections essentially allow this order
to be changed once. Such contributions are shown diagrammatically in figure 4. All
the diagrams must be included in the calculation, or electron number would not be
conserved. Momentum dependence is included in the approximation, which is essential
in low-dimensions. The inclusion of vertex corrections leads to double 2-fold integration
over the Brillouin zone in combination with a double sum over matsubara frequencies,
which is time consuming for the numerics. In order to reduce the number of points in k-
space while maintaining the thermodynamic limit, the dynamical cluster approximation
is applied [20]. Additionally, superconducting states can be considered by using the
Nambu formalism. The full details of the implementation of the extended approximation
can be found in references [21] and [22].
Using a maximum entropy technique, it is possible to compute the spectral function
from the Matsubara axis Green function. Figure 5 shows the spectral function of the
Holstein model calculated using the extended Migdal–Eliashberg theory. The results are
qualitatively similar to ARPES measurements of the cuprates. In particular the change
between incoherent and coherent particles occurs at the phonon frequency (shown as the
dashed line), associated with a kink in the [11] direction. It is noted here that the effect
of the phonon self-energy is a softening of the phonon mode. In the standard ME theory
in 2D, the mode at the (π ,π) point is completely softened, leading to a fatal instability
of the theory. However, the vertex corrections act against this softening, and relieve the
instability. In such a way, it is clear that a vertex corrected Eliashberg theory is essential
for the study of quasi-2D materials [21].
One can also compute properties in the superconducting state. One such property is
the momentum-dependent pairing density, ns(k) = T ∑n F(iωn,k), where F(iωn,k) is
the anomalous Green function associated with the pairing of electrons with momentum
k and −k. It is possible to transform the momentum dependent order parameter to
determine the magnitude of individual spherical harmonics. Figure 6 shows such a
decomposition. A cluster size of NC = 64 is used, with U = 0.6 and ω0 = 0.4. Note
ω0=0.2, U=0.3, DCA(VC)
-2 -1
A(k,ω)
FIGURE 5. Spectral function of the Holstein model in the extended Migdal–Eliashberg theory. The
results are qualitatively similar to ARPES measurements of the cuprates. In particular the change between
incoherent and coherent particles occurs at the phonon frequency, associated with a kink in the [11]
direction. ©Institute of physics publishing 2003 [21].
how higher order harmonics develop as the filling is increased. In particular, it can be
seen that no single harmonic (such as the s-wave symmetry) is sufficient to describe the
order parameter. Some of the higher order terms come about due to increased pairing at
momentum k = (π/2,π/2), in particular, pairs with angular momentum.
Finally, by varying the temperature and chemical potential, the phase diagram can
be computed. Figure 7 shows phase diagrams of the Holstein model for the different
approximations. U = 0.6 and ω0 = 0.4.The top diagram shows the result from the
Eliashberg approximation (dynamical mean-field theory NC = 1). On the bottom the
results from the current approximation with NC = 4 are shown. The superconducting
order is suppressed close to half filling. Assuming a form for the density of states in 2D
(with small interplane hopping) of D(ε) = (1− t log((ε2+ t2⊥)/16t
2))/tπ2 (for |ε|< 4t)
[23], which matches the full density of states with reasonable accuracy. From this the
BCS result may be calculated using the expression
TC(n) = 2ω0 exp(−1/|U |D(µ(n)))/π , (4)
with the chemical potential taken from the self-consistent solution for a given n. This
result also drops off monotonically. Results in the dilute limit are in good agreement with
the BCS result (line with points). Close to half-filling, the DMFT result is significantly
smaller than the BCS result (which predicts TC(n = 1)> 0.07). The difference in results
-0.025
-0.02
-0.015
-0.01
-0.005
 0.005
 0.01
 1  1.1  1.2  1.3  1.4  1.5  1.6  1.7
s, m=0
d, m=0
g, m=0
g, m=4,-4
FIGURE 6. Decomposition of the order parameter into spherical harmonics. A cluster size, NC = 64 is
used, with U = 0.6 and ω0 = 0.4. Note how higher order harmonics develop as the filling is increased. In
particular, the g harmonics can be almost as strong as the s harmonics at n = 1.45. ©Institute of physics
publishing 2005 [22].
between the two mean-field theories at half-filling is due to the self-consistency in the
DMFT. When vertex corrections and spatial fluctuations are included, the dilute limit
is relatively unchanged. However at half-filling, there is a huge drop in the transition
temperature. The suppression at half-filling is a manifestation of Hohenberg’s theorem,
which implies that there may be no superconducting order in 2D. Here I have computed
for quasi-2D, so it is interesting that in real materials with low dimensional character the
maximum in superconductivity is shifted away from half-filling.
CONCLUDING REMARKS
I end the paper with a warning for constructing theories of high-temperature supercon-
ductivity using electron-phonon interactions alone, while neglecting the Coulomb re-
pulsion. If one takes the phase diagrams from the previous section, and assigns similar
energy scales to those in the cuprates, it is possible to obtain a temperature in Kelvins
for the maximum in the phase diagram at n = 1.2. This comes out as around 172K - one
could say approximately the TC in the cuprates.
So why isn’t this the solution for the cuprates? Cuprates are very tightly bound ma-
terials, which is why the “Fermi energy” is low, and the ratio ω0/εF is large enough
to justify extending Eliashberg theory. The problem is that a small Fermi energy also
means the the Hubbard U is a comparatively large quantity. On a simple mean-field
level, one can include the Coulomb repulsion in the theory of superconductivity. For ex-
ample, the Eliashberg equations can be extended to include an effective electron-electron
interaction (otherwise known as the Coulomb pseudopotential µC). The effect of this is
to modify λ → λ − µC. Substitution into equation 4 means that the transition temper-
 0.02
 0.04
 0.06
 0.08
U=0.6, ω0=0.4, Nc=4, VC
 0.01
 0.02
 0.03
 0.04
 1  1.2  1.4  1.6  1.8
 0.02
 0.04
 0.06
 0.08
 0.02
 0.04
 0.06
 0.08
 0.01
 0.02
 0.03
 0.04
 1  1.2  1.4  1.6  1.8
 0.02
 0.04
 0.06
 0.08
U=0.6, ω0=0.4, Nc=1
FIGURE 7. Phase diagrams of the Holstein model. U = 0.6 and ω0 = 0.4. The top diagram shows the
result from the Eliashberg approximation (dynamical mean-field theory NC = 1). Also shown is the BCS
result (line with points). On the bottom the results from the current approximation with NC = 4 are shown.
The superconducting order is suppressed close to half filling in the vertex corrected theory. ©Institute of
physics publishing 2005 [22].
ature is considerably reduced, or that superconductivity of the BCS type is completely
destroyed. Any phonon-based mechanism for the cuprates must address this point and
be compatible with the electron-electron interaction. Alternatively (and this is a warn-
ing against the other extreme) on the basis of the similarity of energy scales, any spin-
fluctuation mechanism (which is essentially Coulombic) must also treat the phonons (or
at least be compatible with them) to be plausible.
ACKNOWLEDGMENTS
I sincerely thank the organising committee of the course for their generous financial
support. Aspects of this research were carried out under the MPIPKS guest scien-
tist program, and as a visitor at the University of Leicester. I thank A.S.Alexandrov,
J.L.Beeby, E.M.L.Chung, N.d’Ambrumenil, J.K.Freericks, M.Jarrell, P.E.Kornilovitch,
J.H.Samson and M.Yethiraj for stimulating discussions, both about this work and the
problems of electron-phonon interactions and superconductivity in general. I acknowl-
edge support at Loughborough University under EPSRC grant no. EP/C518365/1.
REFERENCES
1. H.Fröhlich. Phys. Rev., 79:845, 1950.
2. J.Bardeen, L.N.Cooper, and J.R.Schrieffer. Phys. Rev., 108:1175, 1957.
3. J.G.Bednorz and K.A.Müller. Z. Phys. B, 64:189, 1986.
4. G.M.Eliashberg. JETP letters, 11:696, 1960.
5. A.B.Migdal. JETP letters, 7:996, 1958.
6. W.Metzner and D.Vollhardt. Phys. Rev. Lett., 62:324, 1989.
7. P.C.Hohenberg. Phys. Rev., 158:383, 1967.
8. P.W.Anderson. The theory of superconductivity in the high-TC cuprates. Princeton University Press,
1997.
9. E.K.H.Salje, A.S.Alexandrov, and W.Y.Liang, editors. Polarons and Bipolarons in high-TC super-
conductors and related materials. Cambridge University Press, 1995.
10. G.M.Zhao, M.B.Hunt, H.Keller, and K.A.Müller. Nature, 385:236, 1997.
11. A.Lanzara, P.V.Bogdanov, X.J.Zhou, S.A.Kellar, D.L.Feng, E.D.Lu, T.Yoshida, H.Eisaki,
A.Fujimori, K.Kishio, J.-I.Shimoyama, T.Noda, S.Uchida, Z.Hussa, and Z.-X.Shen. Nature,
412:6846, 2001.
12. R.J.McQueeney, Y.Petrov, T.Egami, M.Yethiraj, G.Shirane, and Y.Endoh. Phys. Rev. Lett., 82:628,
1999.
13. J-.H.Chung et al. Phys. Rev. B, 67:014517, 2003.
14. F.Gebhard. The Mott Metal-Insulator Transition - Models and Methods, volume 137 of Springer
tracts in modern physics. Springer, Heidelberg, 1997.
15. I.G.Lang and Yu.A.Firsov. Sov. Phys. JETP, 16:1301, 1963.
16. J.Hubbard. Proc. R. Soc. London Ser. A, 276:238, 1963.
17. J.P.Hague and N.d’Ambrumenil. J. Low Temp. Phys., 140:77–89, 2005.
18. J.P.Hague and N.d’Ambrumenil. cond-mat/0106355.
19. A.J.Millis, R.Mueller, and B.I.Shraiman. Phys. Rev. B, 54:5389, 1996.
20. M.Hettler, A.N.Tahvildar-Zadeh, M.Jarrell, T.Pruschke, and H.R.Krishnamurthy. Phys. Rev. B,
58:7475, 1998.
21. J.P.Hague. Electron and phonon dispersions of the two dimensional Holstein model: Effects of vertex
and non-local corrections. J. Phys.: Condens. Matter, 15:2535, 2003.
22. J.P.Hague. Superconducting states of the quasi-2d Holstein model: Effects of vertex and non-local
corrections. J. Phys.: Condens. Matter, 17:5663, 2005.
23. L.S.Macarie and N.d’Ambrumenil. J. Phys.: Condens. Matter, 7:3237, 1995.
ABSTRACT
  I present results from an extended Migdal-Eliashberg theory of
electron-phonon interactions and superconductivity. The history of the
electron-phonon problem is introduced, and then study of the intermediate
parameter regime is justified from the energy scales in the cuprate
superconductors. The Holstein model is detailed, and limiting cases are
examined to demonstrate the need for an extended theory of superconductivity.
Results of the extended approximation are shown, including spectral functions
and phase diagrams. These are discussed with reference to Hohenberg's theorem,
the Bardeen-Cooper-Schrieffer theory and Coulomb repulsion.

<|endoftext|><|startoftext|>
Introduction 2
2 The gravitational coupling. Some geometrical features 3
3 The horizon coalecence geometry 4
3.1 Case m = 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3.2 Case m 6= 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
4 Conclusions 11
A Proof of the finite nonzero physical distance 12
B Horizon coalescence as a flow on the line 13
1 Introduction
Monopoles have been subject of deep study and controversy all over the last century. This
is so because, although no experimental evidence of their existence has been found, many
theoretical issues make them almost unavoidable. They already appeared as solutions of
Maxwell equations as long as the null B-divergence condition was relaxed, that is, ∇·B 6=
0. It was Dirac [1] in the early thirties who first proposed the theoretical possibility of
creating an experiment to actually produce a “fake” monopole, in a way that its fakeness,
say, the Dirac string, was undetectable. As a consequence, the product of the electric
and the magnetic charges was quantized. Many years later, in 1959, the quantization
requirement was confirmed by the celebrated Aharonov-Bohm experiment [2].
Since 1954, owing to the papers by Yang and Mills [3] and by Utiyama [4], gauge
theories of a group of symmetry larger than U(1), in particular non abelian symmetry
groups SU(2) and SU(3) (which eventually would conform the Standard Model of par-
ticle physics) where gradually developped. In 1969, Lubkin [5] realized that monopoles
can be classified by the homotopy group of the gauge symmetry group of the theory,
so that the magnetic charge is replaced by the topological charge of the field configura-
tion. In the case of the Dirac monopole, the homotopy group π1 of U(1) is exactly Z.
However, it was not till 1975 that Yang [6] generalized the abelian monopole to the case
of an SU(2)-invariant gauge theory in six dimensions, see also [7]. Modern approaches
use the formalism of fiber bundles for a suitable description of monopoles. It generalizes
the traditional classification in terms of the homotopic group of the gauge theory. In
this way, magnetic monopoles are identified with the different instanton configurations
which come up basically as non trivial maps of the gauge group, usually SU(N), onto Sd,
where d is the spatial dimension. That is, magnetic monopoles are all those non trivial
principal bundles with group structure SU(N) that can be realized on the hypersurface
Sd. The classification coincides, as said before, with the different classes of homotopy
groups. The genalization of Yang monopoles to an arbitrary even dimension was carried
out in [8]. Using slightly different methods similar analysis have recently been done [9].
The reader can find good reviews on the subject in [10], [11] and the references therein.
As every existing object in nature, monopoles couple to gravity via their energy-
momentum tensor. The resulting geometry is obtained by solving the Yang-Mills-
Einstein equations, which get greatly simplified by imposing spherical symmetry (as
expected from a magnetic monopole field configuration). This geometry is fully specified
by choosing a point in the space of parameters {µ,m,Λ, k}, the meaning of which will
be explained in detail later on. For a given range of parameters, it is easy to prove
that the geometry presents both a cosmological and an event horizon. A full analogy
with the Schwarzschild-de Sitter solution reveals that, in these cases, the geometry is dy-
namically driven through the parameter space into a thermally stable point where both
horizons coalesce [12], the final line element being the analogue of Nariai’s spacetime in
four dimensions.
This paper is organized as follows: the next section sets a general framework and
fixes the notation used later. The main body of the article concerns the analysis of the
coalescence solutions. This is achieved in two subsections corresponding to the massless
and massive cases respectively. An explicit computation of the resulting geometry is
carried out in each case. A final section includes some conclusions and comments. Two
appendixes have been added to the article. They are topics which lie somehow out of the
main line of the paper, either for being technical aspects of a computation (Appendix
A) or for presenting a new idea the exposition of which would need a new section, as in
Appendix B. the absence of B, in turn, would not have prevented the reader from a full
understanding of the paper.
2 The gravitational coupling. Some geometrical fea-
tures
The gravitational effects of these monopoles have been recently studied [9]. It was done,
as usual, by minimally coupling the Yang-Mills energy-momentum tensor to gravity.
Variations of the Einstein-Hilbert action
− det g
(R− 2Λ)− 1
Tr|F |2
with respect to the metric tensor leads to
Gmn = 8πGTmn − gmnΛ, (2)
where
Tmn = γ
tr(F pm Fnp)−
gmntr(FpqF
is the energy momentum tensor of the YM strength field. The traces are taken in the
colour index and γ is the YM coupling constant. Finding general solutions for (2) is
a highly complicated problem. However, imposing spherical symmetry simplifies the
task enormously. According to this, the ansatz will be a spatially spherically symmetric
(2k + 2)-dimensional metric whose line element reads
ds2 = −∆dt2 +∆−1dr2 + r2dΩ22k. (4)
The last equation is consistent with (2) and (3) when [9]
∆(r) = 1− 2Gm
r2k−1
, (5)
where R =
k(2k+1)
is the de Sitter radius, µ2 is proportional to 1
and measures
the magnetic charge of the monopole, m comes up as a constant of integration with
dimensions of mass and G is the Newton constant in 2k + 2 spacetime dimension. At
first sight, (4) with (5) look like a Schwarzschild-de Sitter geometry in 2k + 1 spatial
dimensions with an extra term, the one involving µ, which seems to be independent of
the dimension of spacetime. It seems reasonable to think of this term as a contribution
of the magnetic monopole. This simple image, even if not exact1, is helpful and, unless
we face the vanishing limits, it may be kept in mind in the following.
The next step (and the next temptation) is to analyze how the causal structure
of this spacetime depends on given values of the parameters. The main body of this
work concerns a deep analysis of the solution in the case when parameters µ,Λ, m and
k allow the existence of two horizons. Then, inspired by the Schwarzschild-de Sitter
unstable solution, it is claimed that the system gets dynamically driven to a value of the
parameters where both horizons coalesce. Eventhough coordinate distance shrinks to
zero, physical distance does not. A generalized Nariai geometry “between” the horizons
is then explicitly obtained. The Nariai line element [13] is a nonsingular solution of the
Einstein’s vacuum equations with a positive cosmological constant, Rµν = Λgµν . It was
first found by Kasner [14] and its electrically charged generalization dates of 1959 [15].
However, the important fact that it emerges as an extremal limit of Schwarzschild-de
Sitter black holes was not noticed until 1983 [12].
Nariai spacetime in four dimensions is the direct product dS2 × S2, dS2 being no
more than the hyperbolic version of S2 as we change t → iτ . In 2k + 2 dimensions, the
solution gets generalized to dS2 × S2k. Again, it is the direct product of two constant
curvature spaces and admits a 3 + k(2k+ 1) group of isometries SO(2, 1)× SO(2k+ 1).
The space is homogeneous since the group acts transitively and is locally static, given
that a global dS-type spacetime cannot be described by merely one static coordinate
chart. In four dimensions, radii of curvature of the two product spaces are equal if the
black hole is neutral, and different in the charged case. If the black hole is electrically
charged, the respective radii a and b are different and related by the equation
a−2 + b−2 = 2Λ (6)
as shown in [15]. This relation will be generalized in the magnetic case, the object of
our study. A short but instructive recent work on the four dimensional geometry can be
found in [16].
3 The horizon coalecence geometry
Studying the horizons of a geometry like (4) is equivalent to searching the divergencies
of grr for finite values of the coordinates. This leads us to analyze the zeroes of function
∆(r), where the horizons will be located. For a certain range of values of {µ,Λ, k,m}
there will be two horizons. Finding this region in the parameter space will be the first
1The resulting geometry is, of course, not just the sum of terms of different geometries, but it casually
coincides. Differences are bound to exist on the limit of vanishing of a given contribution. For instance,
let us suppose that, given a set of parameters, say {m,µ,Λ, k }, we can switch off µ (by neglecting it
with respect to the others). The resulting geometry is topologically different to the one obtained by not
assuming any monopole at all at the beginning, that is, the limit does not coincide. However, in the
cases studied here, this is no more than an enough-to-be-aware-of subtlety.
task. After that, attention will be focused on the coalescence point of the horizons2. The
analysis consists of two steps, first, the parameterization of the coordinate separation
of the horizons (ǫ) and the calculation of the physical distance between them when
coalescence takes place (ǫ → 0). Then, following the strategy in [12], the computation
of the line element of the remaining geometry. This program is carried out on two cases:
m = 0 and m 6= 0, which are treated in the next subsections, respectively. The massless
case must be seen as a toy model of the massive one. This distinction is made not
merely for simplicity but also because, as will be explained, the mass parameter comes
out naturally for dynamical requirements.
3.1 Case m = 0
In the massless case, ∆(r) gets reduced to
∆(r) = 1−
. (7)
Solving ∆ = 0 is equivalent to finding the zeroes of a biquadratic equation as long as
r = 0 is not considered. We perform the change z ≡ r2 and solve a second order ordinary
equation. The horizons are found to be at
1− 4µ
z++ =
1− 4µ
. (9)
R > 2µ guarantees the existence of two positive solutions and, therefore, four solutions
for the quartic equation. Two of them, r+ = +
z+ and r++ = +
z++, correspond to the
radial coordinate of the inner (black hole) and outer (cosmological) horizon respectively.
If R = 2µ, both solutions coincide, which means that the horizons coalesce. As said
before, this does not mean that the geometry vanishes as a naive observation (given
a wrong choice of coordinates) would make one think. Physical distance between the
horizons, on the contrary, remains finite at the limit. In order to prove this, let us
compute it. For fixed time and angular coordinates, the physical distance is
D(µ,R) =
∫ r++
[−r4 +R2r2 − µ2R2]1/2
∫ z++
− µ2R2
z − R2
)2]1/2dz (10)
The requirement R > 2µ implies R
− µ2R2 > 0 so the above integral is exactly solved
as an cos−1-type. The result is
D(R) =
πR. (11)
Surprisingly, the physical distance does not depend on µ. It means that, given a cos-
mological constant Λ, one could “switch on” the monopole and go on till the horizons
coalesce but the distance would remain unchanged. However, because of quantization
2Coalescence as seen in Schwarzschild coordinates.
requirements, monopole charge µ cannot be tuned, but needs to have, instead, a fixed
value upto a sign. On the other hand, the cosmological constant, Λ, should be chosen
when writing the lagrangian. It means that changing its value does not drive us from
one model to another but implies an essential change in the theory [17]. Therefore, we
are not free to adjust any parameter arbitrarily as done with the mass of the black hole
in the Schwarzschild-de Sitter case. Then, eventhough physical reasons would lead the
horizons to coalesce, the absence of any free parameter in our model makes it impossible.
In the next section, m will come to our help as a free parameter for the model.
Despite the last remark, one could wonder about the kind of geometry that re-
mains when the horizons coalesce. This task, even if seems just a curious exercise now,
will be useful for the next section. Applying a technique similar to the one Gingspar
and Perry [12] used to study the geometry of Nariai’s solution, we proceed by, first,
parametrizing the separation of horizons as
R = 2µ(1 + ǫ2), (12)
in a way that coalescence corresponds to taking ǫ = 0. Then, we define a “wise” change
of coordinates
χ = cos−1
(r2 − r20)
τ = ǫ
, (13)
where A =
1− 4µ2
and r20 =
, and the angular coordinates remain unchanged. The
new coordinates (13) might seem randomly chosen at first sight. However, there are
some reasons that justify such a functional dependence. For instance, χ is nothing but
the physical distance between r+ and r. The timelike coordinate t is multiplied by i in
order to work in the Euclidean region3 and by ǫ because ∆/ǫ2 is expected to have a finite
limit when ǫ → 0. Now, we apply (12) and (13) and expand ∆(r(χ))dτ 2, ∆−1(r(χ))dχ2
and r2(χ) up to first order in ǫ. The line element (4) reads
ds2 = µ2dχ2 + µ2 sin2(χ)
1 + ǫ
2 cos(χ)
dτ 2 +
+ 2µ2
2 cos(χ)ǫ
dΩ22k. (14)
We take limit ǫ → 0 to obtain
ds2 = µ2
dχ2 + sin2(χ)dτ 2
+ 2µ2dΩ22k. (15)
As seen in (15), the 2k-sphere decouples from the rest. The resulting geometry is S2×S2k
for k ≥ 2. Notice the parallelism between this geometry and Nariai’s solution, which is
S2 × S2. The “classical” relation between radii (6) gets also generalized to
a−2 + b−2 = C0Λ, (16)
where C0 =
k(2k+1)
. The geometry (15) can be viewed as a “degenerate” black hole, in
which the two horizons have the same (maximum) size and are in thermal equilibrium.
This could be interpreted by an observer as a bath of radiation coming from both horizons
3τ will be periodic at both horizons, although different in each case. Equality will hold at the
coalescence point, when thermal stability is reached.
at a precise temperature [19]. The temperature can be calculated by means of surface
gravity κ, as computed in the new coordinates (13)
k(2k + 1)
. (17)
The entropy can also be computed as a quarter of the sum of the two horizons [18], so
k(2k + 1)
, (18)
where ω2k is the area of the 2k-dimension unit sphere.
3.2 Case m 6= 0
In the massive case we recover the full expression (5) for ∆. Since the singular point
r = 0 is not to be considered, we better analyze the function r2k−1∆(r)
∆̃ ≡ r2k−1∆ = −
r2k+1
+ r2k−1 − µ2r2k−3 − 2Gm. (19)
It is known that a polynomial equation with powers equal to or higher than five is not
generally solvable in a symbolical way. This happens for k ≥ 2. So, the purpose of doing
a study for the massive case analogous to that achieved in the first section is ruined.
Nevertheless, some information can be extracted from (19). We should first remember
the sign of the parameters: R2 > 0 (de Sitter), µ2 > 0 for k ≥ 2, and m will be free in
principle. Derivating (19) and equating to zero leads to a biquadratic equation of the
(2k + 1)r4 + (2k − 1)r2 − (2k − 3)µ2 = 0, (20)
which, as long as
Λµ2 ≤ k
(2k − 1)2
2k − 3
, (21)
has two positive (and two negative) roots, rmin and rmax ≡ rc. In terms of the cosmo-
logical constant
r2c ≡
k(2k − 1)
1− 4(2k − 3)Λµ
k(2k − 1)2
, (22)
rmin is obtained from (22) by swapping the sign of the square root. A quick look at (19)
shows that the smallest root is a minimum and the largest is a maximum of function ∆̃.
Now, let us plug rc into (19):
1. If m > 0, then (see fig.1)
a) ∆̃(rc) ≥ 0 implies that there are two event horizons, the black hole and the
cosmological horizon. The inequality gets saturated at the coalescence point.
b) ∆̃(rc) < 0 means that no horizon is found.
2. If m < 0, then (see fig.2)
a) ∆̃(rmin) < 0 together with ∆̃(rc) < 0 implies that there is just one Cauchy
horizon.
b) ∆̃(rmin) < 0 together with ∆̃(rc) > 0 assures the existence of a Cauchy horizon
and both black hole and cosmological horizon.
c) ∆̃(rmin) > 0 leaves us with the cosmological horizon only.
The case we will study is ∆̃(rmin) < 0 and ∆̃(rc) > 0 which, independently of the sign
of m, assures4 the existence of black hole and cosmological horizons. This corresponds
to values of m within range (see fig.3)
1.a 1.b
Figure 1: Case m > 0. The curve represents function ∆̃(r). Figure 1.a has two roots which
correspond to the black hole (r+) and cosmological horizon (r++) respectively. Figure 1.b
shows the absence of horizons.
2.a 2.b 2.c
Figure 2: Case m < 0. This time ∆̃(r) permits the existence of one (Cauchy) horizon as
in Figure 2.a, three horizons (Cauchy, black hole and cosmological) as in 2.b, or just the
cosmological horizon as shown in 2.c.
m− < m < m+, (23)
where
Gmc ≡ Gm+ =
1 + 2k
r2k−3c (r
c − 2µ2). (24)
The value of Gm− is obtained by replacing rc → rmin. In terms of Λ and µ we get
Gm± =
(2Λ)−k+1/2
1 + 2k
− k + 2k2 ±
k2(1− 2k)2 − 4Λµ2(2k − 3)k
]k−3/2
− k + 2k2 − 4Λµ2 ±
k2(1− 2k)2 − 4Λµ2(2k − 3)k
. (25)
4The value of m can be negative. That is because m should not be thought of as an entity with
physical meaning but as a geometrical parameter. Short calculation in (25) shows that m gets negative
values for Λµ2 ≥ k
(1 + 2k).
Hr,m+L
Hr,m-L
Figure 3: This figure shows the range of “masses” which are consistent with the existence of
both black hole and cosmological horizons. The curve ∆̃(r) “moves down” in the process of
coalescence.
The crucial point is that both horizons coalesce when rc is a root of (19) which
happens at m = mc(k,Λµ
2,Λ). Two relations have been imposed so far:
de∆(r;m)
|rc= 0,
that is, (20), which defines rc, and ∆(rc;mc) = 0 which leads to mc. In order for m to
be real, the bound which must be impossed on Λµ2 coincides with (21) which, in turn, is
nothing but the condition for the existence of two horizons. So, if a given a value for Λµ2
is low enough to produce two horizons, there always exists a real value of m which makes
them coalesce. Again, as in the Schwarzschild-de Sitter example, the system is unstable
and the equilibrium point is reached at m = mc. Unlike the massless case, plugging m
gives us enough room for maneuvre to drive the system to equilibrium.
At this point, we would like to remark that the procedure of horizon coalescence, as
studied in detail below, may be seen as a flow in a line which undergoes a Pitchfork
bifurcation at the coalescence point. Parameter m, moved by thermal instability, drives
the system to the critical situation. For concreteness see Appendix B.
Let us focus on the near coalescence point. This can be parameterized by
r = rc + δr = rc(1 + ǫ cosχ) (26)
m = mc − δm = mc(1 + bǫ2).
Parameterization of r also involves a change of coordinates r → χ and should be taken
as imposed at the moment although it will be justified later. The horizons will be
symmetrically located at: r+ = rc(1−ǫ) and r++ = rc(1+ǫ) which correspond to χ+ = π
and χ++ = 2π, respectively
5. The value of b as well as the absence of a linear term in ǫ of
the parameterization of m may be explained as follows. Near the coalescence point one
should Taylor expand ∆ around rc and have in mind that, for
≪ 1, ∆ is aproximately
parabolic, so that second order expansion is enough. By definition ∆(r+) = ∆(r++) = 0
5For a small enough ǫ, it is expected that the parabolic approach holds and, then, both horizons are
symmetrically located with respect to rc.
and ∆ reaches a maximum at rc. So,
0 = ∆(r++) = ∆(rc, m) + ∆
′(rc, m)(rcǫ) +
∆′′(rc, m)(rcǫ)
r2k−1c
∆′′(rc;mc)r
2, (27)
which means that
∆′′(rc;mc)r
. (28)
Calculating the physical distance near the coalescence point would, again, imply
solving the integration
D(ǫ) =
∫ r++
∆1/2(r)
, (29)
where r++ = r+ + 2rcǫ. Although the exact result is not computed, an explicit proof of
its finite nonzero value is given in Appendix A. The procedure of calculating the physical
distance also brings us some light on which is the change of coordinates that should be
made in order to understand the resulting geometry. It turns out to be
χ = cos−1
(r − rc)
t, (30)
where
k − 2k2 + 2Λr2c
is a dimensionless factor.
The coalescence of horizons takes place at ǫ = 0. In order to study the geometry at
the limit we proceed by calculating −∆dt2, ∆−1dr2 and r2 in the new coordinates (30)
and expand in ǫ around ǫ = 0. The new line element gets determined by taking the zero
order of the expansion. The relations for r and m in (26) are in accordance with (30),
where b takes the value of (28), by virtue of the parabolic approach. From (30), it is
straightforward to see that r2 takes a constant value r2c . Surprisingly, as in the massless
and Schwarzschild-de Sitter cases, the geometry splits in two disconnected parts which
lead to a product manifold S2 × S2k. The line element reads
ds2 = Br2c
dχ2 + sin2(χ)dτ 2
+ r2cdΩ
2k, (32)
where χ ∈ [π, 2π] and τ is periodic6.
As seen in (32), S2 has radius a2 = Br2c , and S2k has radius b2 = r2c . Now, the
generalized Bertotti relation (6) is
a−2 + b−2 =
2(1− k)
= CΛ, (33)
where C(k,Λµ2) is obtained by inserting (22) in (33). Note that C
k,Λµ2 = k(2k +
= C0, and then (33) turns into (16), that is, into the massless case. This is no
6τ is periodic on both horizon surfaces all over the process in order to avoid the conical singularity
at the horizons. At the coalescence point, however, both periods equal.
surprising since Λµ2 = k(2k + 1)/4 is the condition for coalescence in the massless case
(equivalent to R = 2µ), and, at the same time, it makes mc = 0. So, the massive
geometry is a consistent extension of the massless one. Now, fixing Λ does not determine
uniquely the geometry. Another dimensionless variable Λµ2 is required.
As in the last section, the geometry (32) can be viewed as a “degenerate” black hole,
in which the two horizons have the same (maximum) size and are in thermal equilibrium.
In the present case the temperature is given in terms of the surface gravity κ by
. (34)
In Planck units,the entropy associated with this solution may be calculated (given that
it is not extreme7) by means of the total area of the horizons as
c . (35)
4 Conclusions
The spherically symmetric solution of gravity due to a magnetic monopole in arbitrary
dimension has been studied, in particular, when the set of parameters {Λ, µ,m, k} allows
the existence of two horizons. In these cases, thermal instabilities drive a process of
horizon coalescence. Even though coordinate separation between the horizons shrinks
to zero, it has been proven in both the massless and the massive case that the physical
distance does not. The geometry of the remaining space between the horizons has been
calculated in both cases. They turned out to be Nariai-type solutions, that is, the product
of a 2-sphere and a 2k-sphere for a (2k + 2)-dimensional spacetime. In each solution,
the radii of the spheres are not independent. They are related by an elliptical equation
which should be understood as the generalization of the relation found by Bertotti. The
unique generalized equation involving these radii for both the massless and the massive
case has been given. After computing the line element in each case, the thermodynamical
properties (Hawking temperature and entropy) due to the existence of horizons have been
calculated.
The Yang monopole corresponds to the six dimensional case, where k = 2. The
geometry obtained after coalescence is S2 × S4 as can be explicitly read in (32). This
case is especially interesting since it may be described in String Theory (a realization
of the Yang monopole in Heterotic String Theory has recently been done [21] as well as
another complementary picture in Type-IIA String Theory [20]). In the same context,
it looks possible to find results (18) and (35) for the entropy by application of some
attractor mechanism [22, 23]. We believe that this would be an interesting topic to be
addressed in future research.
7A charge black hole is said to be extreme when it has the minimum mass. Then, as it cannot release
any energy without losing charge, it is supposed not to emit, and its associated Hawking temperature is
0. The black hole we are dealing with in this paper is extreme in the sense of carrying the “maximum
mass” allowed by the cosmological constant Λ. Obviously, the temperature will not be zero.
A Proof of the finite nonzero physical distance
Computing the physical distance is equivalent to performing the integration
∫ r++
∆1/2(r)
, (36)
where, for small ǫ, r++ = r+ + 2rcǫ. Divergencies might appear at the points where
∆ → 0. The case we have been considering all along section (3.2) concerns the existence
of two horizons which coalesce, that is, two single roots r+ and r++ of ∆ which join to
form a double one. Function ∆̃ can always be expressed as ∆̃ = (r − r+)(r++ − r)g(r),
where g(r) is a polynomial function of powers of degree 2k − 1 and no zeroes within the
range [r+, r++] are to be found by construction. Explicitly, equation (36) is
D(ǫ) =
∫ r++2rcǫ
(r − r+)1/2(r+ + 2rcǫ− r)1/2
rk−1/2
g1/2(r)
︸ ︷︷ ︸
. (37)
Now, h(r) is a continuous divergenceless strictly positive function in the compact [r+, r++],
which means that it will reach a positive maximum and minimum for certain r′s. Let us
call hmax and hmin the values of the function h in these points
8. Then
∫ r++2rcǫ
(r − r+)1/2(r+ + 2rcǫ− r)1/2
≤ D(ǫ) ≤
≤ hmax
∫ r++2rcǫ
(r − r+)1/2(r+ + 2rcǫ− r)1/2
. (38)
The integration can be performed:
∫ r++2rcǫ
(r − r+)1/2(r+ + 2rcǫ− r)1/2
= π. (39)
D(ǫ → 0) = πh(rc), (40)
where the value of rc is given in (22).
Integrations of form (39) are solved exactly by a cos−1 type function, and a nonzero
finite result is obtained. It is remarkable that the same can be said for any ∆ we would
choose, as long as no more than two single roots were to join to form a double one. The
key point is that (39), which could be problematic, is independent of ǫ and therefore the
distance is finite in the limit, when ǫ → 0. So, eventhough (39) was neither exactly the
physical distance in the massive case nor in Schwarzschild-de Sitter solution (however, it
was in the massless case as we have already seen in the first section), it is closely related
to it. This fact gives us a hint or, at least, justifies the change of coordinates we were
performing once and again to study the geometry at the limit ǫ → 0.
8These, in principle, depend on ǫ but coincide when ǫ → 0: hmin = hmax ≡ h(rc).
B Horizon coalescence as a flow on the line
The main phenomenon that concerns this paper, as said before, can be described in terms
of the dynamics of a vector field on the line. The coalescence point, in this picture, is no
more than a supercritical Pitchfork bifurcation. Let us remember some general features
of the dynamics of a one-dimensional flow. The equation of a general vector field on the
line can be expressed as:
ẋ = f(x, α) (41)
where f is any real function with real support, the dot means differentiation with respect
to t and α is a parameter of the model. Fixed points of (41) require ẋ = 0, which must
be obtained by finding the roots of f , that is
f(x∗, α) = 0. (42)
Equation (42) is solved by an n-collection of fixed points x∗i for a given value of α. Let
us suppose that f has three roots if α = α0. Fixed points come closer as α moves and
get “condensed” in a “fat” fixed point (bifurcation point) at α = αc. A paradigmatic
example of a Pitchfork bifurcation is shown by function
f(x) = x(α− x2). (43)
One question arises naturally now about the role the horizons play in this picture. Let
us claim that horizons are fixed points and the role of α is played by m. We will justify
this identification by constructing the vector flow.
Constructing a flow in a manifold (in our case it will be a line) is equivalent to giving
a family of curves r̄(t) which covers the manifold or part of it. Each of the curves gets
specified by the initial condition, say, r̄(t = 0). Now, let us consider geodesic motions.
Without loss of generality, the angular coordinates of our geometry will be frozen, θ and
φ are constants, and only radial curves r(t) are to be regarded. Static coordinate system
will serve us to describe the movement for any r ∈ (r+, r++). Let us invoke intuition at
this point. If r(0) is near the cosmological or the black hole horizon it is clear that a
test particle will move out of the region by approaching each horizon respectively. Then,
there is a point r = rg where the test particle will not “feel” any force and, consequently,
it will not move9. This is the first (unstable) fixed point.
Let us move the origin by defining r′ = r− rg, after this, primes will be dropped out
to simplify notation. The flow at each point will be determined by the physical velocity
ṙ(r) (as measured by an observer placed at r = 0) that a test particle would adquire at
r if it is dropped with ṙ = 0 at around r = 0 (as close as possible). It is not hard to see
that the velocity of the test particle, as seen by the static geodesic observer, is bound to
be zero at both horizons. So, horizons are fixed points. Now, our system can be treated
as a vector flow ṙ(r) which covers the region between the horizons. The vector flow has
three fixed points: {r+, r++, 0} where the first two are stable. As m runs towards mc,
the system shrinks into a Pitchfork bifurcation. Near the bifurcation point the flow can
be approximated by
ṙ = βr(r − r+)(r++ − r), (44)
9rg in our geometry, plays the role the asymptotic infinity does in Schwarzschild solution,that is, the
point where the time-like Killing vector should be normalized in order to define the horizon temperature.
Note that rg ≡ rc at the coalescence point, that is, when ǫ = 0.
where β is a positive constant which depends on µ, k and Λ. On the one hand, in the
coordinate system {χ, τ}, and using (30), we have
−→ ṙ =
. (45)
On the other hand, equation (44), expressed in the new coordinate system, reads ṙ =
ǫ3βr3c cosχ sin
2 χ, and so
= −iǫβr3cB cosχ sin2 χ. (46)
As expected, in the new coordinate system, every point converts into a fixed point as
horizons coalesce (ǫ → 0). Since the flux lines were identified with geodesics of test
particles, this can be understood as the abscence of forces at the end of the process.
Acknowledgment
We thank P. K. Townsend and Adil Belhaj for helpful discussions and Jean Nuyts for
critical reading of the manuscript. This work has been supported by MCYT ( Spain)
under grant FPA 2003-02948.
References
[1] P. A. M. Dirac, Quantised singularities in the electromagnetic field, Proc. Roy. Soc.
Lond. A 133, 60 (1931).
[2] Y. Aharonov and D. Bohm, Significance of Electromagnetic Potentials in the Quan-
tum Theory, Phys. Rev. 115, 485 (1959).
[3] C. N. Yang and R. L. Mills, Conservation of Isotopic Spin and Isotopic Gauge
Invariance, Phys. Rev. 96, 191 (1954).
[4] Ryoyu Utiyama, Invariant Theoretical Interpretation of Interaction, Phys. Rev. 101,
1597 (1956).
[5] E. Lubkin, Geometric Definition of Gauge Invariance, Ann. Phys. 23, 233 (1963).
[6] C. N. Yang, Generalization of Dirac’s monopole to SU2 gauge fields, J. Math. Phys.
19, 320 (1978).
[7] P. Goddard, J. Nuyts and D.I. Olive, Gauge Theories And Magnetic Charge, Nucl.
Phys. B 125, 1 (1977).
[8] Zalan Horvath, Laszlo Palla, Spontaneous Compactification And ’Monopoles’ In
Higher Dimensions., Nucl. Phys. B 142, 327 (1978).
[9] G.W. Gibbons and P.K. Townsend, Self-graviting Yang monopoles in all dimensions,
Class. Quantum Grav. 23, 4873 (2006).
[10] S. Coleman, The magnetic monopole fifty years later, in The Unity of the Funda-
mental Interactions, ed. A. Zichichi (Plenum, New York, 1983).
[11] E. J. Weinberg and P. Yi, Magnetic Monopole Dynamics, supersymmetry, and Du-
ality, Phys. Rept. 43, 65 (2007). hep-th/0609055.
[12] Paul Ginsparg and Malcom J. Perry, Semiclassical perdurance of de Sitter space,
Nuclear Physics B 222, 245 (1983).
[13] H. Nariai, Sci. Rep. Tohoku Univ., Ser. 1 35, 62 (1951).
[14] E. Kasner, Trans. Am. Math. Soc. 27, 101 (1925).
[15] B. Bertotti, Uniform Magnetic Field in the Theory of General Relativity, Phys. Rev.
116, 1331 (1959).
[16] Marcello Ortaggio, Impulsive waves in the Nariai Universe, Phys. Rev.D 65, 084046
(2002).
[17] G. W. Gibbons and S. W. Hawking, Cosmological Event Horizons, Thermodynam-
ics, And Particle Creation, Phys. Rev. D 15, 2738 (1977).
[18] S. W. Hawking and Simon F. Ross, Duality between electric and magnetic black
holes, Phys. Rev. D 52, 5865 (1995)
[19] R. Bousso and S. W. Hawking, Pair creation of black holes during inflation, Phys.
Rev. D 54, 6312 (1996).
[20] A. Belhaj, P. Diaz, A. Segui, On the Superstring Realization of the Yang Monopole,
(2007). hep-th/0703255.
[21] E. A. Bergshoeff, G. W. Gibbons and P. K. Townsend, Open M5-branes, Phys. Rev.
Lett. 97, 231601 (2006). hep-th/0607193.
[22] S. Ferrara, R. Kallosh, A. Strominger, N=2 extremal black holes, Phys. Rev. D 52,
5412 (1995).
[23] S. Ferrara, R. Kallosh, Supersymmetry and Attractors, Phys. Rev. D 54, 1514
(1996).
http://arxiv.org/abs/hep-th/0609055
http://arxiv.org/abs/hep-th/0703255
http://arxiv.org/abs/hep-th/0607193
	Introduction
	The gravitational coupling. Some geometrical features
	The horizon coalecence geometry
	Case m=0
	Case m=0
	Conclusions
	Proof of the finite nonzero physical distance
	Horizon coalescence as a flow on the line
ABSTRACT
  A detailed study of the geometries that emerge by a gravitating generalized
Yang monopole in even dimensions is carried out. In particular, those which
present black hole and cosmological horizons. This two-horizon system is
thermally unstable. The process of thermalization will drive both horizons to
coalesce. This limit is what is profusely studied in this paper. It is shown
that eventhough coordinate distance shrinks to zero, physical distance does
not. So, there is some remaining space which geometry has been computed and
identified as a generalized Nariai solution. The thermal properties of this new
spacetime are then calculated. Topics, as the elliptical relation between radii
of spheres in the geometry or a discussion about whether a mass-type term
should be present in the line element or not, are also included.

<|endoftext|><|startoftext|>
Introduction
In [1] a new formulation of general relativity was presented, named the
instanton representation of Plebanski gravity. The basic dynamical variables
are an SO(3, C) gauge connection Aaµ and a matrix Ψae taking its values in
two copies of SO(3, C).1 The consequences of the associated action IInst
were determined via its equations of motion, which hinge crucially on weak
equalities implied by the the initial value constraints. For these consequences
to be self-consistent, the constraint surface must be preserved for all time
by the evolution equations. The present paper will demonstrate that this
is indeed the case. We will not use the usual Hamiltonian formulation for
totally constrained systems [2], since we will not make use of any canonical
structure implied by IInst. Rather, we will deduce the time evolution of the
dynamical variables directly from the equations of motion of IInst.
Sections 2 and 3 of this paper present the instanton representation ac-
tion and derive the time evolution of the basic variables. Sections 4, 5 and 6
demonstrate that the nondynamical equations, referred to as the diffeomor-
phism, Gauss’ law and Hamiltonian constraints, evolve into combinations of
the same constraint set. The result is that the time derivatives of these con-
straints are weakly equal to zero with no additional constraints generated on
the system. While we do not use the usual Dirac method in this paper, the
result is still that the instanton representation is in a sense Dirac consistent.
We will make this inference clearer by comparison with the Ashtekar vari-
ables in the discussion section. On a final note, the terms ‘diffeomorphism’
and ‘Gauss’ law’ constraints are used loosely in this paper, in that we have
not specified what transformations of the basic variables these constraints
generate. The use of these terms is mainly for notational purposes, due to
their counterparts which appear in the Ashtekar variables.
2 Instanton representation of Plebanski gravity
The starting action for the instanton representation of Plebanski gravity is
given by [1]
IInst =
d3xΨaeB
F a0i + ǫkjmB
−iN(detB)1/2
Λ+ trΨ−1
, (1)
1Index labelling conventions for this paper are that symbols a, b, . . . from the begini-
ing of the Latin alphabet denote internal SO(3, C) indices while those from the middle
i, j, k, . . . denote spatial indices. Both of these sets of indices take takes 1, 2 and 3. The
Greek symbols µ, ν, . . . refer to spacetime indices which take values 0, 1, 2, 3.
where Nµ = (N,N i) are the lapse function and shift vector from metric
general relativity, and Λ is the cosmological constant. The basic fields are
Ψae and A
i , and we action (1) is defined only on configurations restricted
to (detB) 6= 0 and (detΨ) 6= 0.2 In the Dirac procedure one refers to Nµ
as nondynamical fields, since their velocities do not appear in the action.
While the velocity Ψ̇ae also does not appear, we will distinguish this field
from Nµ since the action (1), unlike for the latter, is nonlinear in Ψae.
The equation of motion for the shift vector N i, the analogue of the
Hamilton equation for its conjugate momentum Π ~N , is given by
δIInst
= ǫmjkB
eΨae = (detB)(B
−1)diψd ∼ 0, (2)
where ψd = ǫdaeΨae is the antisymmetric part of Ψae. This is equivalent to
the diffeomorphism constraint Hi owing to the nondegeneracy of B
a, and
we will often use Hi and ψd interchangeably in this paper. The equation of
motion for the lapse function N , the analogue of the Hamilton equation for
its conjugte momentum ΠN , is given by
δIInst
= (detB)1/2
Λ+ trΨ−1
= 0. (3)
Nondegeneracy of Ψae and the magnetic field B
e implies that on-shell, the
following relation must be satisfied
Λ + trΨ−1 = 0, (4)
which we will similarly take as synonymous with the Hamiltonian constraint.
The equation of motion for Ψae is
δIInst
= BkeF
0k + ǫkjmB
m + iN
detΨ(Ψ−1Ψ−1)ea ∼ 0, (5)
up to a term proportional to (4) which we have set weakly equal to zero.
One could attempt to define a momentum conjugate to Ψae, for which (5)
would be the associated Hamilton’s equation of motion. But since Ψae forms
part of the canonical structure of (1), then our interpretation is that this is
not technically correct.3
The equation of motion for the connection Aaµ is given by
δIInst
∼ ǫµσνρDσ(ΨaeF eνρ)−
4ǫmjkN
mBkeΨ[de]
+N(B−1)dj
Λ+ trΨ−1
, (6)
2The latter case limits the application of our results to spacetimes of Petroc Types I,
D and O (See e.g. [3] and [4].
3This is because (5) contains a velocity Ȧa
within F a
and will therefore be regarded
as an evolution equation rather than a constraint. This is in stark contrast with (2) and
(3), which are genuine constraint equations due to the absence of any velocities.
where we have defined
ea(x, y) ≡
δAai (x)
Bje(y) = ǫ
−δae∂k + fedaAdk
δ(3)(x, y); D
ea ≡ 0. (7)
The terms in large round brackets in (6) vanish weakly, since they are pro-
portional to the constraints (2) and (4) and their spatial derivatives. For
the purposes of this paper we will regard (6) as synonymous with
ǫµσνρDσ(ΨaeF
νρ) ∼ 0. (8)
In an abuse of notation, we will treat (5) and (8) as strong equalities in
this paper. This will be justified once we have completed the demonstration
that the constraint surface defined collectively by (2), (3) and the Gauss’
constraint from (8) is indeed preserved under time evolution. As a note
prior to proceeding we will often make the identification
N(detB)1/2
detΨ ≡
−g (9)
as a shorthand notation, to avoid cluttering many of the derivations which
follow in this paper.
2.1 Internal consistency of the equations of motion
Prior to embarking upon the issue of consistency of time evolution of the
initial value constraints, we will check for internal consistency of IInst, which
entails probing of the physical content implied by (8) and (5). First, equation
(8) can be decomposed into its spatial and temporal parts as
Di(ΨbfB
f ) = 0; D0(ΨbfB
f ) = ǫ
ijkDj(ΨbfF
0k). (10)
The first equation of (10) is the Gauss’ law constraint of a SO(3) Yang–
Mills theory, when one makes the identification of ΨbfB
f ∼ Eib with the
Yang–Mills electric field. The Maxwell equations for U(1) gauge theory
with sources (ρ, ~J), in units where c = 1, are given by
~∇ · ~B = 0; Ḃ = −~∇× ~E = 0; ~∇ · ~E = ρ; ~̇E = − ~J + ~∇× ~B. (11)
Equations (10) can be seen as a generalization of the first two equations of
(11) to SO(3) nonabelian gauge theory in flat space when one: (i) identifies
with the SO(3) generalization of the electric field ~E, and (ii) one
chooses Ψae = kδae for some numerical constant k.
When ρ = 0 and ~J = 0, then one has the vacuum theory and equations
(11) are invariant under the transformation
( ~E, ~B) −→ (− ~B, ~E). (12)
Then the second pair of equations of (11) become implied by the first pair.
This is the condition that the Abelian curvature Fµν , where F0i = Ei and
ǫijkFjk = Bi, is Hodge self-dual with respect to the metric of a conformally
flat spacetime. But equations (10) for more general Ψae encode gravitational
degrees of freedom, which as shown in [1] generalizes the concept of self-
duality to more general spacetimes solving the Einstein equations. Let us
first attempt to derive the analogue for (10) of the second pair of (11) in the
vacuum case. Acting on the first equation of (10) with D0 yields
D0Di(ΨbfB
f ) = DiD0(ΨbfB
f ) + [D0,Di](ΨbfB
f ) = 0. (13)
Substituting the second equation of (10) into the first term on the right hand
side of (13) and using the definition of temporal curvature as the commutator
of covariant derivatives on the second term we have
ijkDj(ΨbfF
0k)) + fbcdF
0iΨdfB
f = fbcd
0k +B
Ψdf = 0 (14)
where we have also used the spatial part of the commutator ǫijkDiDjva =
fabcB
b vc. Note that the term in brackets in (14) is symmetric in f and c,
and also forms the symmetric part of the left hand side of (5)
0i + i
−g(Ψ−1Ψ−1)fb + ǫijkBifB
k = 0, (15)
re-written here for completeness. To make progress from (14), we will sub-
stitute (15) into (14). This causes the last term of (15) to drop out due to
antisymmetry, which leaves us with
−gfbcd
Ψdf (Ψ
−1Ψ−1)fc +Ψdf (Ψ
−1Ψ−1)fc
= −2i
−gfbcdΨ−1dc . (16)
The equations are consistent only if (16) vanishes, which is the requirement
that Ψae = Ψea be symmetric. This of course is the requirement that the
diffeomorphism constraint (2) be satisfied. So the analogue of the second
pair of (11) in the vacuum case must be encoded in the requirement that
Ψae = Ψea be symmetric.
3 The time evolution equations
We must now verify that the initial value constraints are preserved under
time evolution defined by the equations of motion (5) and (6). These equa-
tions are respectively the Hodge duality condition
0k + i
−g(Ψ−1Ψ−1)fb + ǫijkN iBjbB
f = 0, (17)
and one of the Bianchi identity-like equations
ǫijkDj(ΨaeF
ok) = D0(ΨaeB
e). (18)
Since the initial value constraints were used to obtain the second line of (17)
from (1), then we must verify that these constraints are preserved under
time evolution as a requirement of consistency. Using F b0i = Ȧ
i −DiAb0 and
defining
−g(B−1)fi (Ψ
−1Ψ−1)fb + ǫmnkN
mBnb ≡ iHbk, (19)
Then equation (17) can be written as a time evolution equation for the
connection, which is not the same as a constraint equation as noted earlier
F b0i = −iHbi −→ Ȧbi = DiAb0 − iHbi . (20)
From equation (20) we can obtain the following equation governing time
evolution equation for the magnetic field
Ḃie = ǫ
ijkDjȦ
k = ǫ
ijkDj
0 − iHek
= febcB
0 − iǫijkDjHek = −δ~θB
e − iǫijkDjHek, (21)
which will be useful. On the first term on the right hand side of (21) we
have used the definition of the curvature as the commutator of covariant
derivatives. The notation δ~θ in (21) suggests that that B
e transforms as
a SO(3, C) vector under gauge transformations parametrized by θb ≡ Ab0.4
Since we have not specified anything about the canonical structure of IInst,
then δ~θ as used in (21) and in (24) should at this stage simply be regarded
as a definition useful for shorthand notation.
We will now apply the Liebnitz rule in conjunction with the definition
of the temporal covariant derivatives to (18) to determine the equation gov-
erning the time evolution of Ψae. This is given by
D0(ΨaeB
e) = B
eΨ̇ae +ΨaeḂ
e + fabcA
0(ΨceB
e) = ǫ
ijkDj(ΨaeF
0k). (22)
Substituting (21) and (20) into the left and right hand sides of (22), we have
BieΨ̇ae +Ψae
febcB
0 − iǫijkDjHek
+ fabcA
0(ΨceB
e) = −iǫijkDj(ΨaeHek).(23)
In what follows, it will be convenient to use the following transformation
properties for Ψae as A
i under SO(3, C) gauge transformations
δ~θΨae =
fabcΨce + febcΨac
Ab0; δ~θA
i = −DiAa0; δ~θB
e = −febcBibAc0. (24)
4We will make the identification with SO(3, C) gauge transformations later in this
paper when we bring in the relation of IInst with the Ashtekar variables.
Then using (24), the time evolution equations for the phase space variables
ΩInst can be written in the following compact form
Ȧbi = −δ~θA
i − iHbi ; Ψ̇ae = −δ~θΨae − iǫ
ijk(B−1)ei (DjΨaf )H
k . (25)
We have found evolution equations for Ψae and A
i from the covariant equa-
tions of Aaµ and the Hodge-duality condition We have obtained these without
using Poisson brackets, and by assuming that the Hamiltonian and diffeo-
morphism constraints are satisfied. Therefore the first order of business is
then to check for the preservation of the initial value constraints under the
time evolution generated by (25). This means that we must check that
the time evolution of the diffeomorphism, Gauss’ law and Hamiltonian con-
straints are combinations of terms proportional to the same constraints and
their spatial derivatives, and terms which vanish when the constraints hold.5
These constraints are given by
we{Ψae} = 0; (detB)(B−1)diψd = 0; (detB)1/2
Λ+ trΨ−1
= 0(26)
where (detB) 6= 0 and (detΨ) 6= 0. We will occasionally make the identifi-
cation
N(detB)1/2(detΨ)1/2 ≡
−g (27)
for a shorthand notation. Additionally, the following definitions are provided
for the vector fields appearing in the Gauss’ constraint
we = B
eDi; ve = B
e∂i (28)
where Di is the SO(3, C) covariant derivative with respect to the connection
Aai . Equations (26) are the equations of motion for the auxilliary fields A
N i and N .
4 Consistency of the diffeomorphism constraint un-
der time evolution
The diffeomorphism constraint is directly proportional to ψd = ǫdaeΨae, the
antisymmetric part of Ψae. So to establish the consistency condition for
this constraint, it suffices to show that the antisymmetric part of the second
equation of (25) weakly vanishes. This is given by
ǫdaeΨ̇ae = −δ~θ(ǫdaeΨae)− iǫdaeǫ
ijk(B−1)ei (DjΨaf )H
k , (29)
5This includes any nonlinear function of linear order or higher in the constraints, a
situation which involves the diffeomorphism constraint.
which splits into two terms. Using (24), one finds that the first term of (29)
is given by
−ǫdaeδ~θΨae = −ǫdae
fabcΨce +Ψacfebc
δebδdc − δecδbd
Ψce +
δdbδac − δdcδab
Ψdb − δbdtrΨ + δdbtrΨ−Ψbd
Ab0 = 2Ψ[bd]A
0 = −ǫdbhAb0ψh, (30)
which is proportional to the diffeomorphism constraint. The second term of
(30) has two contributions due to H
k as defined in (19). The first contribu-
tion reduces to
−iǫdaeǫijk(B−1)ei (DjΨaf )(H(1))
= −iǫdaeǫijk(B−1)ei (DjΨaf )
−g(B−1)gk(Ψ
−1Ψ−1)gf
= iǫdae(detB)
−1ǫegh(Ψ−1Ψ−1)gfB
hDjΨaf
= i(detB)−1(Ψ−1Ψ−1)gf
a − δgaδhd
vv{Ψaf}
= i(detB)−1(Ψ−1Ψ−1)gf
dva{Ψaf} − vd{Ψgf}
= i(detB)−1
(Ψ−1Ψ−1)dfGf + vd{Λ+ trΨ−1}
. (31)
The first term on the final right hand side of (31) is the Gauss’ constraint
and the second term is the derivative of a term direction proportional to
the Hamiltonian constraint.6 The second contribution to the second term
of (29) is given by
ǫdaeǫ
ijk(B−1)ei (DjΨaf )(H(2))
k = ǫdaeǫ
ijk(B−1)ei (DjΨaf )ǫmnkN
= ǫdac
n − δinδjm
(B−1)ei (DjΨaf )N
= ǫdaeN
i(B−1)eivf{Ψaf} −N jDj(ǫdaeΨae) = ǫdaeN i(B−1)efGa −N jDjψd.(32)
The result is that the time evolution of the diffeomorphism constraint is
directly proportional to
ψ̇d =
i(detB)−1(Ψ−1Ψ−1)da + ǫdaeN
i(B−1)ei
Ab0ǫbdh − δdhN jDj
ψh + i(detB)
vd{(−g)−1/2H}, (33)
which is a linear combination of terms proportional to the constraints (26)
and their spatial derivatives. The result is that the diffeomorphism con-
straint Hi = 0 is consistent with respect to the Hamiltonian evolution gen-
erated by the equations (25). So it remains to verify consistency of Gauss’
law and the Hamiltonian constraints Ga and H.
6We have added in a term Λ, which can be regarded as a constant of integration with
respect to the spatial derivatives from vd.
5 Consistency of the Gauss’ constraint under time
evolution
Having verified the consistency of the diffeomorphism constraint under time
evolution, we now move on to the Gauss’ constraint. Application of the
Liebnitz rule to the first equation of (26) yields
Ġa = Ḃ
eDiΨae +B
eDiΨ̇ae +B
fabfΨfe + febgΨag
Ȧai . (34)
Upon substituion of (21) and (25) into (34), we have
Ġa =
−δ~θB
e − iǫijkDjHek
DiΨae +B
−δ~θΨae − iǫ
ijk(B−1)ei (DjΨaf )H
fabfΨfe + febgΨag
−δ~θA
i − iHbi
.(35)
Using the Liebniz rule to combine the δ~θ terms of (35), we have
Ġa = −δ~θGa − iǫ
k)DiΨae +B
e Dm((B
−1)ei (DjΨaf )H
fabfΨfe + febgΨag
i . (36)
The requirement of consistency is that we must show that the right hand
side of (36) vanishes weakly. First, we will show that the third term on the
right hand side of (36) vanishes up to terms of linear order and higher in
the diffeomorphism constraint. This term, up to an insignificant numerical
factor, has two contributions. The first contribution is
fabfΨfe + febgΨag
Bie(H(1))
fabfΨfe + febgΨag
(Ψ−1Ψ−1)eb
fabf (Ψ
−1)fb + febg(Ψ
−1Ψ−1)ebΨag
∼ δ(1)a (~ψ) ∼ 0, (37)
which is directly proportional to a nonlinear function of first order in ψd
which is proportional to the diffeomorphism constraint. The second contri-
bution to the third term on the right hand side of (36) is
fabfΨfe + febgΨag
Bie(H(2))
fabfΨfe + febgΨag
ǫkmnN
kBme B
fabfΨfe + febgΨag
(detB)Nk(B−1)dkǫdeb
= (detB)Nk(B−1)dk
δfdδae − δfeδad
Ψfe + 2δdgΨag
= (detB)Nk(B−1)dk
Ψda − δadtrΨ + 2Ψad
≡ δ(2)a ( ~N) (38)
which does not vanish, and neither is it expressible as a constraint. For
the Gauss’ law constraint to be consistent under time evolution, a necessary
condition is that this δ
a ( ~N) term must be exactly cancelled by another
term arising from the variation.
Let us expand the terms in square brackets in (36). This is given, using
the Liebniz rule on the second term, by
ǫijk(DjH
k)(DiΨae) + ǫ
ijkBme Dm((B
−1)ei (DjΨae)H
= ǫijk(DjH
k)(DiΨae)− ǫijkBme (B−1)en(DmBng )(B−1)
i (DjΨaf )H
+ǫmjk(DmDjΨaf )H
+ ǫmjk(DjΨaf )(DmH
). (39)
The first and last terms on the right hand side of (39) cancel, which can be
seen by relabelling of indices. Upon application of the definition of curvature
as the commutator of covariant derivatives to the third term, then (39)
reduces to
−ǫijk(DnBng )(B−1)
i (DjΨaf )H
fabcΨcf + ffbcΨac
. (40)
The first term of (40) vanishes on account of the Bianchi identity and the
second term contains two contributions which we must evaluate. The first
contribution is given by
(H(2))
fabcΨcf + ffbcΨac
= (detB)Nk(B−1)dkǫdbf
fabcΨcf + ffbcΨac
= (detB)Nk(B−1)dk
δdaδfc − δdcδfa
Ψcf − 2δdcΨac
= (detB)Nk(B−1)dk
δdatrΨ−Ψda − 2Ψad
= −δ(2)a ( ~N ),(41)
with δ
a ( ~N) as given in (37). So putting the results of (39), (40) and (41)
into (36), we have
Ġa = −δ~θGa + δ
~N) + δ(1)a (
~ψ) + δ(1)a (
~ψ)− δ(2)a ( ~N) = −δ~θGa + 2δ
(1)(~ψ).(42)
The velocity of the Gauss’ law constraint is a linear combination of the
Gauss’ constraint with terms of the diffeomorphism constraint of linear or-
der and higher. Hence the time evolution of the Gauss’ law constraint is con-
sistent in the sense that we have defined, since δ(1)(~ψ) vanishes for ψd = 0.
6 Consistency of the Hamiltonian constraint un-
der time evolution
The time derivative of the Hamiltonian constraint, the third equation of
(26), is given by
((detB)1/2(detΨ)1/2
(Λ + trΨ−1) +
(Λ + trΨ−1) (43)
which has split up into two terms. The first term is directly proportional
to the Hamiltonian constraint, therefore it is already consistent. We will
nevertheless expand it using (21) and (25)
(B−1)di Ḃ
d + (Ψ
−1)aeΨ̇ae
(detB)1/2(detΨ)1/2(Λ + trΨ−1)
(B−1)di
−δ~θB
d − iǫijkDjHdk
+(Ψ−1)ae
−δ~θΨae − iǫ
ijk(B−1)ei (DjΨaf )H
H. (44)
We will be content to compute the δ~θ terms of (44). These are
(B−1)di δ~θB
d = (B
−1)di fdbfB
0 = δdbfdbfA
0 = 0 (45)
on account of antisymmetry of the structure constants, and
(Ψ−1)eaδ~θΨae = (Ψ
−1)ea
fabfΨfe + febgΨag
= 0, (46)
also due to antisymmetry of the structure constants. We have shown that
the first term on the right hand side of (43) is consistent with respect to
time evolution. To verify consistency of the Hamiltonian constraint under
time evolution, it remains to show that the second term is weakly equal to
zero. It suffices to show this just for the second term, in brackets, of (43)
(Λ + trΨ−1) = −(Ψ−1Ψ−1)feΨ̇ef
= (Ψ−1Ψ−1)ef
δ~θΨae − iǫ
ijk(B−1)ei (DjΨaf )H
, (47)
where we have used (25). Equation (47) has split up into two terms, of
which the first term is
(Ψ−1Ψ−1)eaδ~θΨae = (Ψ
−1Ψ−1)ea
fAbfΨfe + febgΨag
fabf (Ψ
−1)fa + febg(Ψ
−1)eg
Ab0 = m(
~ψ) ∼ 0 (48)
which vanishes weakly since it is a nonlinear function of at least linear order
in ψd. The second term of (47) splits into two terms which we must evaluate.
The first contribution is proportional to
(Ψ−1Ψ−1)eaǫijk(B−1)ei (DjΨaf )(H(1))
−g(Ψ−1Ψ−1)eaǫijk(B−1)ei (DjΨaf )(B−1)dk(Ψ−1Ψ−1)df
−g(Ψ−1Ψ−1)ea(Ψ−1Ψ−1)df (detB)−1ǫedgBjgDjΨaf
−g(detB)−1ǫedg(Ψ−1Ψ−1)ea(Ψ−1Ψ−1)dfvg{Ψaf} ≡ v{~ψ} (49)
for some vector field v. We have used the fact that the term in (49) quartic
in Ψ−1 in antisymmetric in a and f due to the epsilon symbol. Hence Ψaf
as acted upon by vg can only appear in an antisymmetric combination, and
is therefore proportional to the diffeomorphism constraint ψd whose spatial
derivatives weakly vanish. Hence (49) presents a consistent contribution to
the time evolution of H, which leaves remaining the second contribution to
the second term of (47). This term is proportional to
(Ψ−1Ψ−1)eaǫijk(B−1)ei (DjΨaf )(H(2))
= (Ψ−1Ψ−1)eaǫijk(B−1)ei (DjΨaf )ǫmnkN
n − δinδjm
(B−1)eiB
−1Ψ−1)ea(DjΨaf )
N i(B−1)eiB
f − δefN
(Ψ−1Ψ−1)ea(DjΨaf )
= (−g)−1/2N iHai vf{Ψaf} − (Ψ−1Ψ−1)fa(N jDjΨaf )
= (−1)−1/2N iHai Ga −N jDj(Λ + trΨ−1). (50)
The first term on the final right hand side of (50) is proportional to the
Gauss’ law constraint, and the second term is proportional to the derivative
of the Hamiltonian constraint. To obtain this second term we have added
in Λ as a constant of differentiation with respect to ∂j . Substituting (48),
(49) and (50) into (47), then we have
Ḣ =∼ Ô(~ψ) + (−g)−1/2N iHai Ga + T̂ ((−g)−1/2H), (51)
where Ô and T̂ are operators consisting of spatial derivatives acting to the
right and c numbers. The time derivative of the Hamiltonian constraint
is a linear combination of the Gauss’ law and Hamiltonian constraints and
its spatial derivatives, plus terms of linear order and higher in the diffeo-
morphism constraint and its spatial derivatives. Hence the Hamiltonian
constraint is consistent under time evolution.
7 Recapitulation
The final equations governing the time evolution of the initial value con-
straints are given weakly by
ψ̇d =
i(detB)−1(Ψ−1Ψ−1)da + ǫdaeN
i(B−1)ei
Ab0ǫbdh − δdhN jDj
ψh + i(detB)
vd{Λ + trΨ−1};
Ġa = −fabcAb0Gc + δ(1)a (~ψ);
ǫijk(B−1)di (DjH
k ) + ǫ
ijk(B−1)ei (Ψ
−1)ae(DjΨaf )H
−N j∂j
(Λ + trΨ−1)
+(−g)−1/2N iHai Ga −
−g(detB)−1ǫedg(Ψ−2Ψ−1)ea(Ψ−1Ψ−1)dfvg{ǫafhψh}+m(~ψ).(52)
Equations (52) show that all constraints derivable from the the action (1)
are preserved under time evolution, since their time derivatives yield linear
combinations of the same set of constraints and their spatial derivatives.
There are no additional constraints generated which implies that the action
(1) is consistent in the Dirac sense. On the other hand, we have not defined
the canonical structure of (52) or any Poisson brackets.
Equations (52) can be written schematically in the following form
~̇H ∼ ~H + ~G+H; ~̇G ∼ ~G+Φ( ~H); Ḣ ∼ H + ~G+Φ( ~H), (53)
where Φ is some nonlinear function of the diffeomorphism constraint ~H,
which is of at least first order in ~H. In the Hamiltonian formulation of a
theory, one identifies time derivatives of a variable f with via ḟ = {f,H}
the Poisson brackets of the variable with the Hamiltonian H . So while we
have not specified Poisson brackets, equation (53) implies the existence of
Poisson brackets associated to some Hamiltonian HInst for the action (1),
{ ~H,HInst} ∼ ~H + ~G+H; {~G,HInst} ∼ ~G+Φ( ~H);
{H,HInst} ∼ H +Φ( ~H) + ~G. (54)
So the main result of this paper has been to demonstrate that the instanton
representation of Plebanski gravity forms a consistent system, in the sense
that the constraint surface is preserved under time evolution. As a direction
of future research we will compute the algebra of constraints for (1) directly
from its canonical structure. Nevertheless it will be useful for the present
paper to think of equations (52) in the Dirac context, mainly for compari-
son with other formulations of general relativity. This will bring us to the
Ashtekar variables.
8 Discussion: Relation of the instanton represen-
tation to the Ashtekar variables
We will now provide the rationale for not following the Dirac procedure for
constrained systems [2] with respect to (1), by comparison with the Ashekar
formulation of GR. The action for the instanton representation (1) can be
written in the following 3+1 decomposed form
IInst =
0we{Ψae} − ǫijkN iBjaBkeΨae
−iN(detB)1/2(detΨ)1/2
Λ+ trΨ−1
, (55)
which regards Ψae and A
i as phase space variables. But the phase space of
(55) is noncanonical since its symplectic two form
δθInst = δ
d3xΨaeB
d3xBieδΨae ∧ δAai +
d3xΨaeǫ
ijkDj(δA
k) ∧ δAai , (56)
is not closed owing to the presence of the second term on the right hand
side. The initial stages of the Dirac procedure applied to (55) state that the
momentum conjugate to Aai yields the primary constraint
Πia =
δIInst
δȦai
= ΨaeB
e. (57)
Then making the identification σ̃ia = Π
a and upon substitution into (57)
and into (55), one obtains the action
IAsh =
σ̃iaȦ
0Ga −N iHi −
, (58)
which is the action for the Ashtekar complex formalism of general relativity
[5], [6], with σ̃ia being the densitized triad. This is a totally constrained sys-
tem with (Aa0, N
i, N), respectively the SO(3, C) rotation angle Aa0, the shift
vector N i and the densitized lapse function N = N(detσ̃)−1/2 as auxilliary
fields. The constraints in (58) smearing the auxilliary fields are the Gauss’
law, vector and Hamiltonian constraints
Ga = Diσ̃
a; Hi = ǫijkσ̃
a ; H = ǫijkǫ
abcσ̃iaσ̃
σ̃kc +B
. (59)
From (58) one reads off the symplectic two form ΩAsh given by
ΩAsh =
d3xδσ̃ia ∧ δAai = δ
d3xσ̃iaδA
= δθAsh, (60)
which is the exact functional variation of the canonical one form θAsh.
The actions (55) and (58) are transformable into each other only under
the condition (detB) 6= 0 and (detΨ) 6= 0. In (58) it is clear that σ̃ia and Aai
form a canonically conjugate pair, which suggests that (55) is a noncanonical
version of (58). The constraints algebra for (59) is
{ ~H[ ~N ], ~H [ ~M ]} = Hk
N i∂kMi −M i∂kNi
{ ~H[N ], Ga[θa]} = Ga[N i∂iθa];
{Ga[θa], Gb[λb]} = Ga
fabcθ
{H(N), ~H [ ~N ]} = H[N i∂iN
{H(N ), Ga(θa)} = 0;[
H(N),H(M )
= Hi[
N∂jM −M∂jN
H ij], (61)
which is first class due to closure of the algebra, and is therefore consistent
in the Dirac sense. Let us consider (61) for each constraint with the total
Hamiltonian HAsh and compare with (54). This is given schematically by
{ ~H,HAsh} ∼ ~H + ~G+H; {~G,HAsh} ∼ ~G+ ~H;
{H,HAsh} ∼ H + ~H. (62)
Comparison of (62) with (54) shows an essentially similar structure for the
top two lines involving ~H and ~G.7 But there is a marked dissimilarity with
respect to the Hamiltonian constraint H. Note that there is a Gauss’ law
constraint appearing in the right hand side of the last line of (54) whereas
there is no such constraint on the corresponding right hand side of (62).
This means that while the Hamiltonian constraint is gauge-invariant under
SO(3, C) gauge-transformations as implied by (61) and (62), this is not
the case in (54). This means that the action (1), which as shown in [1]
describes general relativity for Petrov Types I, D and O, has a different
role for the Gauss’ law and Hamiltonian constraints than the action (58),
which also describes general relativity. Therefore IInst and IAsh at some
level correspond to genuinely different descriptions of GR, a feature which
would have been missed had we applied the step-by-step Dirac procedure.
9 Appendix: Commutation relations for IInst
We will now infer the Poisson brackets for (55) by inference from the corre-
sponding canonical Ashtekar Poisson brackets
{Aai (x), σ̃
b (y)} = δ
(3)(x,y) (63)
along with the vanishing brackets
{Aai (x), Abj(y)} = {σ̃ia(x), σ̃
(x)} = 0. (64)
To find the analogue of (63) and (64) for (55), we will use the tranformation
equation
σ̃ia = ΨaeB
e, (65)
which corresponds to a noncanonical transformation. Substitution of (65)
into (63) yields
{Aai (x),Ψbf (y)Bif (y)} = δ
(3)(x,y)
{Aai (x),Ψbf (y)}B
(y) + Ψbf (x){Aai (x), B
(y)}. (66)
The second term on the right hand side of (66) vanishes on account of the
first relations of (64), and upon multiplying (66) by the inverse magnetic
field (B−1)ei , assumed to be nondegenerate, we obtain
{Aai (x),Ψbf (y)} = δab (B−1(y))
(3)(x,y). (67)
7The linearly versus nonlinearly of the diffeomorphism constraints on the right hand
side is just a minor difference.
This gives us the Poisson brackets {A,A} ∼ 0 and {A,Ψ} ∼ B−1, which
leaves remaining the brackets {Ψ,Ψ}. To obtain these, we substitute (65)
into the second equation of (64), yielding
{σ̃ia(x), σ̃bj(y)} = {Ψae(x)Bie(x),Ψbf (y)B
f (y)}
= Ψae(x){Bie(x),Ψbf (y)}B
(y) + {Ψae(x),Ψbf (y)}Bie(x)B
+Ψbf (x)Ψae(x){Bie(x), B
f (y)}+Ψbf (y){Ψae(x), B
f (y)}B
e(x) = 0. (68)
Noting that the third term vanishes on account of the first equation of (64),
equation (68) reduces to
{Ψae(x),Ψbf (y)}Bie(x)B
f (y)
+Ψae(x){Bie(x),Ψbf (y)}B
(y)−Ψbf (y){Bjf (y),Ψae(x)}B
e(x) = 0. (69)
The bottom two terms of (69) can be computed using (67)
{Bie(x),Ψbf (y)} = ǫimnDxm{Aen(x),Ψbf (y)} = ǫimnDxm(δeb (B−1(y))fnδ(3)(x,y)).(70)
Substituting (70) into (69) and cancelling a pair of magnetic fields, then we
have that
{Ψae(x),Ψbf (y)}Bie(x)B
f (y) = ǫ
Ψae(x)D
m +Ψba(y)D
δ(3)(x,y). (71)
Left and right multiplying (71) by the inverse of the magnetic fields, we have
{Ψae(x),Ψbf (y)} = ǫijm
(B−1(y))
mΨab(x)(B
−1(x))ei
+(B−1(x))eiD
mΨba(y)(B
−1(y))
δ(3)(x,y). (72)
One sees that the internal components of Ψae have nontrivial commutation
relations with themselves.
References
[1] Eyo Ita ‘Instanton representation of Plebanski gravity. Gravitational
instantons from the classical formulation.’ arXiv: gr-qc/0703057
[2] Paul Dirac ‘Lectures on quantum mechanics’ Yeshiva University Press,
New York, 1964
[3] Hans Stephani, Dietrich Kramer, Maclcolm MacCallum, Cornelius
Hoenselaers, and Eduard Herlt ‘Exact Solutions of Einstein’s Field
Equations’ Cambridge University Press
[4] R. Penrose and W. Rindler ‘Spinors and space-time’ Cambridge Mono-
graphs in Mathematical Physics
[5] Ahbay Ashtekar ‘New Hamiltonian formulation of general relativity’
Phys. Rev. D36(1987)1587
[6] Ahbay Ashtekar ‘New variables for classical and quantum gravity’ Phys.
Rev. Lett. Volume 57, number 18 (1986)
ABSTRACT
  The instanton representation of Plebanski gravity provides as equations of
motion a Hodge self-duality condition and a set of `generalized' Maxwell's
equations, subject to gravitational degrees of freedom encoded in the initial
value constraints of general relativity. The main result of the present paper
will be to prove that this constraint surface is preserved under time
evolution. We carry this out not using the usual Dirac procedure, but rather
the Lagrangian equations of motion themsleves. Finally, we provide a comparison
with the Ashtekar formulation to place these results into overall context.

<|endoftext|><|startoftext|>
Introduction
The organic Bechgaard salts (TMTSF)2X consist of
stacks of planar TMTSF (tetramethyltetraselenafulva-
lene) molecules separated by anions (X = PF6, AsF6,
ClO4, Br, etc.). The charge transport in these systems
is restricted to the direction along the molecular stacks,
making the Bechgaard salts prime examples of one-
dimensional metals. However, on cooling down most of
them undergo a metal-insulator transition which prevents
the onset of a superconducting state [1]. In Bechgaard
salts with noncentrosymmetric anions such as ReO4, BF4
or FSO3 the metal-insulator transition is related to the
anion ordering [2]. It was furthermore demonstrated that
in some cases the metal-insulator transition can be sup-
pressed by the application of external pressure, leading to
a superconducting ground state [3].
The case of the anions X=FSO3 in this class of ma-
terials is particularly interesting, since these anions are
noncentrosymmetric and in addition possess a permanent
electrical dipole moment. The first study of the basic prop-
erties of (TMTSF)2FSO3 has been reported by Wudl et
al. in 1982 [4]. Further studies have shown that this com-
a email: oleksiy.pashkin@physik.uni-augsburg.de
b email: christine.kuntscher@physik.uni-augsburg.de
pound has the highest superconducting transition temper-
ature (2.5 K at 8.5 kbar) among the Bechgaard salts. It
was proposed that this is due to the interaction of the
conducting electrons with the FSO3 anion dipoles [5]. A
recent detailed study [6] revealed a very rich pressure-
temperature phase diagram of (TMTSF)2FSO3 with a va-
riety of different phases, which have not been completely
identified up to now. Furthermore, by magnetoresistance
measurements a two-dimensional electronic behavior was
found in (TMTSF)2FSO3 under a pressure of around 6.2
kbar [7].
The interaction of the FSO3 anions with each other via
long-range Coulomb forces and with the centrosymmetric
surrounding formed by the TMTSF cations tends to order
the anions below a certain temperature. The first-order
structural phase transition related to this anion ordering
occurs at around TMI=89 K in (TMTSF)2FSO3 at ambi-
ent pressure. The change of the crystal structure modifies
the electronic band structure: The effective half-filled con-
ducting band splits into one filled and one empty band
separated by an energy gap, leading to a sharp metal-
insulator transition [5]. The structural analysis suggested
a modulation of the crystal structure with wavevector q =
(1/2, 1/2, 1/2) below the phase transition, which implies
an antiferroelectric state [8,2]. The ordering of the FSO3
anions modulates the lattice resulting in a new unit cell
http://arxiv.org/abs/0704.0368v1
2 A. Pashkin et al.: Metal-insulator transition in (TMTSF)2FSO3 probed by infrared spectroscopy
of size 2a × 2b × 2c. Thus, there are eight formula units
of (TMTSF)2FSO3 per unit cell in the low temperature
phase. Correspondingly, one can expect a splitting of each
vibrational mode into up to eight components [9].
The ratio of the energy gap to the transition tem-
perature in (TMTSF)2FSO3 is ∼ 12.5 [4], which is ap-
preciably higher than the value 3.5 predicted by the
mean-field theory for the Peierls transition. Therefore,
the metal-insulator in the Bechgaard salts with non-
centrosymmetric anions was attributed to a special type of
Peierls instability which originates from the anion-electron
coupling [10].
In this work we present the results of a temperature-
dependent polarized infrared reflectivity study of
(TMTSF)2FSO3 single crystals in the far- and mid-
infrared frequency range, in order to characterize the
change of electronic and vibrational properties during
the metal-insulator transition at TMI=89 K. This is the
first infrared spectroscopic investigation of the compound
(TMTSF)2FSO3. Our results allow a direct determination
of the charge gap in the insulating state. Furthermore, we
determined and analyzed the behavior of the vibrational
modes during the metal-insulator transition, which can
clarify details of the dipolar ordering.
2 Experimental
(TMTSF)2FSO3 single crystals were grown by standard
electrochemical techniques from TMTSF molecules and
tetrabutylammonium-FSO3. The studied samples have a
needle-like shape, with a size of approximately 2 × 0.2 ×
0.1 mm3. The samples were mounted on a cold-finger Cry-
oVac Konti-Mikro cryostat. The actual measuring temper-
ature was controlled by a sensor attached in direct vicin-
ity of the sample. The measurements were performed at
the infrared beamline of the synchrotron radiation source
ANKA. The polarized infrared reflectivity was measured
in the range 150 - 10000 cm−1 using a Bruker IRscope
II microscope attached to a IFS66v/S spectrometer. The
frequency resolution was 1 cm−1 for all measured spec-
tra. Optically transparent TPX and KBr cryostat win-
dows were used for the measurements in the far- and mid-
infrared frequency range, respectively.
3 Results and discussion
3.1 Electronic properties
The reflectivity spectra of (TMTSF)2FSO3 above and be-
low the metal-insulator transition temperature of 89 K for
both polarizationsE‖a and E‖b′ (along and perpendicular
to the stacking axis, respectively) are shown in Fig. 1. The
reflectivity data in the spectral region at around 450 cm−1
are affected by the absorption features of the far-infrared
TPX cryostat window and are therefore not shown.
At 290 K the reflectivity of the sample along the
stacking axis E‖a demonstrates a typical Drude behavior
E || a
 290 K
 45 K
8000250
E || b'
Frequency (cm
 290 K
 45 K
 Energy (meV)
E || a
Fig. 1. Reflectivity spectra of (TMTSF)2FSO3 above and be-
low the metal-insulator transition for E‖a and E‖b′.
(growth up to 1 when frequency tends to zero). In con-
trast, at 45 K , i.e., below TMI , the reflectivity is almost
frequency independent below 1000 cm−1, which is typical
for an insulating state.
The interference fringes observed below 400 cm−1 in
the spectra for both polarizations are due to the partial
transparency of the sample in the insulating phase. Per-
pendicular to the stacking axis (E‖b′), the optical reflec-
tivity and conductivity is much lower than along the a
axis. Nevertheless, the observed changes during the metal-
insulator transition are similar to those of the E‖a direc-
tion. These results demonstrate the opening of an energy
gap at the Fermi level for both studied directions.
The dramatic effect of the temperature decrease on the
electronic properties of (TMTSF)2FSO3 are more directly
seen in the optical conductivity spectra. The E‖a optical
conductivity σ1(ω) of (TMTSF)2FSO3 in the insulating
(at 45 K) and conducting phase (at 290 K) obtained by
means of Kramers-Kronig analysis is shown in Fig. 2. The
dominating feature of the spectrum at 45 K is a strong
charge transfer band due to electronic transitions across
the gap. The arrow shows the band gap (1500 cm−1) ob-
tained from the published temperature-dependent dc re-
sistivity measurements [4]. Obviously, the agreement of
this value with the onset of the optical interband tran-
A. Pashkin et al.: Metal-insulator transition in (TMTSF)2FSO3 probed by infrared spectroscopy 3
1000 10000
100 1000
2D = 1500 cm
Frequency (cm
 290 K
 45 K
E || a
 Energy (meV)
Fig. 2. E‖a optical conductivity spectra of (TMTSF)2FSO3
above and below the metal-insulator transition at TMI= 89 K.
Hatched area depicts the Drude model fit of the high temper-
ature optical conductivity.
sition is very good. On the other hand, the optical con-
ductivity at room temperature is mostly dominated by
the Drude response of the free carriers. The corresponding
fit using the Drude model is shown as the hatched area
in Fig. 2. Obviously, the Drude model provides a good
description of the measured room-temperature spectrum
excluding the electron-molecular vibration (emv) antires-
onance modes. The plasma frequency ωp = 8660 cm
and the scattering rate Γ ≃ 1450 cm−1 obtained from the
fit agree well with the Drude model parameters reported
for other TMTSF salts [11]. The obtained value of the dc
conductivity, σdc ≃ 860 (Ωcm)
−1, is in reasonable agree-
ment with the dc and microwave conductivity values of
1600 and 300 (Ωcm)−1, respectively, reported by Wudl et
al. [4].
3.2 Vibrational modes
The TMTSF molecule with the point group symmetry
D2h has in total 72 local vibrational modes classified ac-
cording to the following representations [12]
ΓD2h = (12ag + 11b3g + 11b1u + 11b2u)
+(6b1g + 7b2g + 7au + 7b3u), (1)
where the vibrations in the first brackets are polarized in
the molecular plane (perpendicular to the stacking a axis)
and the vibrations in the second brackets are polarized out
of the plane (along the stacking axis a). The symmetric
(gerade) vibrations are Raman active and the asymmet-
ric (ungerade) vibrations are infrared active excluding the
au silent modes. Some of the totally symmetric ag Ra-
man modes are expected to appear in the infrared spectra
for E‖a due to efficient emv coupling in the modulated
stacking structure [11,13].
Table 1. The eigenfrequencies and assignment of some vi-
brational modes observed in (TMTSF)2FSO3 for E‖a at 45 K
below TMI . All numbers are in cm
45 K calculated frequency1 assignment
580 571 ν3(a1) FSO3
728 702 ν51(b2u)
902, 911, 915, 916 ν8(ag)
917, 924, 932
1020, 1031, 1036 1060 ν7(ag)
1067, 1072 1060 ν7(ag)
1362, 14502 1469 ν4(ag)
1354, 1364, 1369 1369 ν6(ag)
1373, 1379, 1385
1550, 1584, 1606 1596 ν3(ag)
1847, 1854, 1863 1863 ν3(ag) + ν11(ag)
The tetrahedral FSO3 anion has C3v point group sym-
metry which gives in total nine vibrational modes
ΓC3v = 3a1(z, x
2 + y2, z2) + 3e(x, y, x2 − y2, xy, yz, xz),
where e species correspond to the doublets. Thus, in the
infrared spectra one expects six modes, with the 3a1 and
3e modes being polarized along and perpendicular to the
polar axis of the anion, respectively.
In this section we want to concentrate on the changes
in the infrared phonon spectra for both polarizations
across the metal-insulator transition. For E‖a several ag
vibrations of the TMTSF molecules become infrared ac-
tive in the insulating phase. This is due to the effec-
tive emv coupling of these vibrations to the on-chain
charge transfer band in the structure modulated due to
the anion ordering. The list of the new modes observed
below the transition together with their tentative as-
signment is given in Table 1. Most of them are emv
coupled ag modes polarized in the molecular plane or
their combination as a triplet at around 1850 cm−1. The
ν4(ag) mode involving the central C=C bond stretching is
known to have especially strong emv coupling and there-
fore it appears as a strong antiresonance mode in the
optical conductivity spectrum. It should be pointed out
that the observed appearance of ag modes for E‖a in
the ordered phase is typical only for (TMTSF)2X com-
pounds with non-centrosymmetric anions. In comparison,
(TMTTF)2X salts possess a stronger stack dimerization,
resulting in the emv coupling of the ag modes already
in the disordered phase, and therefore the anion ordering
transition causes only a frequency shift and an intensity
change of the emv coupled modes [15].
4 A. Pashkin et al.: Metal-insulator transition in (TMTSF)2FSO3 probed by infrared spectroscopy
1264 1280 1296
560 570 580 590 600 1140 1150 1160
1358 1365 1372
Frequency (cm
 Frequency (cm
 Frequency (cm
  Frequency (cm
Fig. 3. Reflectivity spectra (shifted for clarity) of some
phonons which experience changes during the metal-insulator
transition at 89 K: (a) vibration polarized along the a axis;
(b)-(d) vibrations polarized along the b′ axis.
The ν3(a1) vibrational mode of the FSO3 anion at
580 cm−1 is observed for the whole studied temperature
range. However, the lineshape of this mode in the metallic
phase above TMI is inverted with respect to the insulating
phase [see Fig. 3(a)], since the background dielectric con-
stant is negative as expected for highly conducting metals
at low frequencies. Such a change is a clear evidence for
the suppression of the Drude conductivity in the insulat-
ing phase of (TMTSF)2FSO3.
The mode at 728 cm−1 observed for temperatures
below TMI is particularly interesting, since its intensity
gradually increases on temperature decrease (see Fig. 4).
A similar behavior is found for the polarization perpen-
dicular to the stacks, E‖b′. Moreover, above the transi-
tion temperature a strong asymmetric mode is seen at
710 cm−1. This mode shifts to lower frequencies and gets
stronger with increasing temperature. This mode has not
been observed in any other earlier study of the Bechgaard
salts. Therefore, it would be natural to assign it to a vibra-
tion of FSO3 anion. However, such an assignment would be
in contradiction to the experimental observations, since:
680 700 720 740 760
E || b'
E || a
200 K
200 K
  Frequency (cm
Fig. 4. Reflectivity spectra (shifted for clarity) of the vibration
at around 710 cm−1 at different temperatures for E||a and
E||b′.
(i) the ν5(e) and ν2(a1) vibrations of FSO3 located close
to the observed mode have frequencies which are by more
than 100 cm−1 higher or lower [14]; (ii) the intensity of
the anion vibration should not vanish at the order-disorder
transition point. Thus, one has to attribute the modes at
around 710 and 728 cm−1 to vibrations of the TMTSF
molecules. We suggest that both modes originate from the
ν51(b2u) in-plane vibration of the TMTSF molecule. Ac-
cording to the normal-coordinate analysis [12,16] its fre-
quency for a free TMTSF0.5+ cation is 702 cm−1.
The corresponding atomic movements involve stretch-
ing of the Se-C side bond and rocking of the adjacent
methyl group. For the b2u vibration the inversion symme-
try of the molecule is not preserved, causing its infrared-
activity for the polarization perpendicular to the stacks.
However, it is known that in (TMTSF)2X salts the dipole
moment corresponding to the ν51(b2u) vibration is very
small, and therefore this mode can hardly be detected
even for E‖b′ where it should have the strongest inten-
sity [16]. Nevertheless, in (TMTSF)2FSO3 this mode is
particularly strong even at room temperature. This find-
ing can be explained by the electrical dipole of the FSO3
anion pointing towards the Se-F bond. Similar to other
A. Pashkin et al.: Metal-insulator transition in (TMTSF)2FSO3 probed by infrared spectroscopy 5
Fig. 5. Schematic illustration of the ν51(b2u) vibration coupled
to the reorientation of the FSO3 electrical dipole moment. The
projection of the crystal structure on the b− c plane is shown.
Only the Se (large open circles) and C (small filled circles)
atoms of the TMTSF molecules are presented, together with
the displacements of the Se atoms. The grey filled circles be-
tween molecules denote the positions of the FSO3 anions, the
bold arrows show the two possible orientations of the anion
dipole moment (p1 and p2). Because of the symmetry prop-
erties of the b2u vibration the reorientation of the electrical
dipole moment leads to a change of polarization ∆p in the
perpendicular direction for any orientation of the anion dipole
moment.
non-centrosymmetric anions (ReO4, ClO4 etc.) the FSO3
anion has two possible symmetrically equivalent orienta-
tions for which the dipole moment points towards the Se
atoms of the neighboring TMTSF molecules. This situa-
tion is sketched in Fig. 5, where p1 and p2 are two possible
orientations of the FSO3 electrical dipole moment. During
the vibration the dipole moment of the anion follows the
position of the Se atom. Due to the symmetry properties
of the b2u vibration the nearest Se atoms on both sides
of the anion move in the same direction. Thus, for both
possible orientations of the dipole the b2u vibration results
in a change of the average polarization along the direction
of ∆p (Fig. 5).
The described coupling mechanism between the b2u
vibration and the dipole moment of the anion in
(TMTSF)2FSO3 should lead to a strong enhancement of
the infrared strength of the ν51(b2u) vibration for E‖b
since ∆p has the largest projection along this direction.
On the other hand, ∆p is perpendicular to the stacking
axis and the b2u mode should not appear for E‖a. This
is indeed observed in our experiment above the transition
temperature. Below the transition the long-range order of
the anion sublattice builds up. Then the anion dipole mo-
ment orientation is determined by the modulation of the
whole lattice and it is not dependent on the movement of
neighboring TMTSF molecules, i.e., the ν51(b2u) vibration
is decoupled from the FSO3 anions. Therefore, its inten-
sity should drop abruptly below TMI , in agreement with
our observations (see Fig. 4). Moreover, the observed de-
crease of the intensity of the coupled b2u mode at around
Table 2. The eigenfrequency, width (given in bracket), and
assignment of some vibrational modes observed for E‖b′ at
selected temperatures. All numbers are in cm−1.
95 K 80 K 45 K assignment
580 (1.3) 580 (0.9) 580 (0.8) ν3(a1) FSO3
710 (7.1) 728 728 (2.0) ν51(b2u)
1154 (5.0) 1150 (3.4) 1150 (3.3) ν48(b2u)
1157 (3.8) 1158 (2.5)
1280 (5.1) 1276 (4.2) 1276 (2.1) ν4(e) FSO3
1288 (16) 1286 (4.4) 1286 (1.7)
1363 (3.2) 1361 (2.8) 1361 (1.3) ν47(b2u)
1366 (4.3) 1367 (2.4) 1367 (1.8)
710 cm−1 at 95 K compared to higher temperatures can be
explained by taking into account short-range order fluc-
tuations above the transition, evidence for which is also
given below. Indeed, in the large enough dynamical regions
where the anions are ordered, the coupling is suppressed
and therefore the strength of the ν51(b2u) should decrease.
On cooling down below TMI a vibration appears again
at somewhat higher frequency (728 cm−1) for E‖b′ and
its strength gradually increases with decreasing temper-
ature. We suggest that this is the same ν51(b2u) vibra-
tion described above. Since it is decoupled from the anion
sublattice, its frequency is expected to increase abruptly
below the transition. The increase in strength for both
polarizations should be obviously related to the tempera-
ture dependence of the order parameter (i.e., the degree
of lattice modulation). One of the possible mechanisms
can be the emv coupling of the ν51(b2u) vibration to the
charge transfer bands along a and b′ directions. However,
the detailed picture of this emv coupling is not clear, since
the symmetry of b2u mode does not allow such kind of
coupling. One can speculate that the electric field of the
FSO3 dipoles in the ordered phase distorts the TMTSF
molecules making them non-centrosymmetric. Then the
emv coupling may become allowed for the b2u(ν51) mode.
Noticeable changes in the phonon mode spectra across
the metal-insulator transition are observed for E‖b′. The
list of the parameters of these modes at temperatures
above and below TMI is given in Table 2. An obvious split-
ting into two components is seen for the ν48(b2u) mode at
1154 cm−1 [see Figure 3(b)]. In addition, the damping of
the split components directly below TMI is lower than the
damping of the single component directly above the tran-
sition (see Table 2). This difference is probably related
to the precursor short-range order fluctuations above the
transition, which can induce a small splitting already in
the disordered phase. An evidence for such fluctuations
was found in x-ray diffuse scattering experiments [8,2].
This effect is even more clearly seen in the splitting of
two other modes: the doublet ν4(e) vibration of the FSO3
anion at 1280 cm−1 [Fig. 3(c)] and the ν47(b2u) mode at
around 1365 cm−1 [Fig. 3(d)]. For each of these modes
6 A. Pashkin et al.: Metal-insulator transition in (TMTSF)2FSO3 probed by infrared spectroscopy
above TMI one can resolve two weakly split components.
However, below TMI the splitting abruptly increases and
the damping decreases (see Table 2) indicating the onset
of long-range order. Since the described effect is observed
not only for the FSO3 anion vibration but also for two vi-
brations of the TMTSF cation, we can conclude that the
short-range order fluctuations involve the modulation of
the whole (TMTSF)2FSO3 lattice and not only the anion
sublattice.
4 Conclusion
We have performed an infrared spectroscopic study of
the metal-insulator transition in (TMTSF)2FSO3. The ob-
tained optical conductivity spectra for E‖a show a Drude-
like conductivity above the anion ordering temperature
and a charge transfer band formed below the transition.
The onset of this band is in agreement with the energy
gap value of 1500 cm−1 obtained from transport measure-
ments [4].
The analysis of the infrared-active vibrations leads to
the following conclusions: (i) the crystal structure modula-
tion below the metal-insulator transition leads to a strong
emv coupling of several ag vibrations which therefore be-
come infrared-active; (ii) short-range order fluctuations of
the FSO3 anions and the corresponding lattice modulation
exist above the transition temperature, as it is seen from
the splitting of some infrared-active modes for E‖b′; (iii)
a new infrared-active mode located at around 710 cm−1
with a peculiar temperature behavior is detected and as-
signed to the coupling between the b2u TMTSF molecule
vibration and the electrical dipole moment of the FSO3
anion. The latter feature has not been observed in any
other (TMTSF)2X salt showing a metal-insulator transi-
tion. This points out the important role of the electrical
dipole moment of the anion on the structural and dynam-
ical properties of the (TMTSF)2FSO3 salt.
5 Acknowledgements
We acknowledge the ANKA Angströmquelle Karlsruhe for
the provision of beamtime and thank M. Süpfle, D. Moss,
and B. Gasharova for technical assistance at the ANKA
IR beamline. The financial support of the DFG (Emmy
Noether-program) is acknowledged.
References
1. T. Ishiguro, K. Yamaji, G. Saito, Organic Superconductors
(Springer, Berlin, 1998)
2. J.P. Pouget, S. Ravy, J. Phys. I France 6, 1501 (1996)
3. D. Jerome, Chem. Rev. 104, 5565 (2004)
4. F. Wudl, E. Aharon-Shalom, D. Nalewajek, J.V.
Waszczak, J. W. M. Walsh, J. L. W. Rupp, P. Chaikin,
R. Lacoe, M. Burns, T.O. Poehler et al., The Journal of
Chemical Physics 76(11), 5497 (1982)
5. R.C. Lacoe, S.A. Wolf, P.M. Chaikin, F. Wudl, E. Aharon-
Shalom, Phys. Rev. B 27(3), 1947 (1983)
6. Y.J. Jo, E.S. Choi, H. Kang, W. Kang, I.S. Seo, O.H.
Chung, Phys. Rev. B 67, 014516 (2003)
7. W. Kang, O.H. Chung, Y.J. Jo, H. Kang, I.S. Seo, Phys.
Rev. B 68, 073101 (2003)
8. R. Moret, J.P. Pouget, R. Comes, K. Bechgaard, J. Phys.
Colloq. France 44, 957 (1983)
9. C.C. Homes, J.E. Eldridge, Phys. Rev. B 40, 6138 (1989)
10. C.S. Jacobsen, H.J. Pedersen, K. Mortensen, G. Rindorf,
N. Thorup, J.B. Torrance, K. Bechgaard, J. Phys. C 15,
2651 (1982)
11. C.S. Jacobsen, D.B. Tanner, K. Bechgaard, Phys. Rev. B
28, 7019 (1983)
12. M. Meneghetti, R. Bozio, I. Zanon, C. Pelice, C. Ricotta,
M. Zanetti, J. Chem. Phys. 80, 6210 (1984)
13. C.C. Homes, J.E. Eldridge, Phys. Rev. B 42, 9522 (1990)
14. K. Nakamoto, Infrared and Raman Spectra of Inorganic
and Coordination Compounds (Wiley, New York, 1986)
15. C. Garrigou Lagrange, A. Graja, C. Coulon, P. Delhaes,
J. Phys. C: Solid State Phys. 17, 5437 (1984)
16. J.E. Eldridge, C.C. Homes, Phys. Rev. B 43, 13971 (1991)
	Introduction
	Experimental
	Results and discussion
	Conclusion
	Acknowledgements
ABSTRACT
  We present measurements of the infrared response of the quasi-one-dimensional
organic conductor (TMTSF)2$SO3 along (E||a) and perpendicular (E||b') to the
stacking axis as a function of temperature. Above the metal-insulator
transition related to the anion ordering the optical conductivity spectra show
a Drude-like response. Below the transition an energy gap of about 1500 cm-1
(185 meV) opens, leading to the corresponding charge transfer band in the
optical conductivity spectra. The analysis of the infrared-active vibrations
gives evidence for the long-range crystal structure modulation below the
transition temperature and for the short-range order fluctuations of the
lattice modulation above the transition temperature. Also we report about a new
infrared mode at around 710 cm-1 with a peculiar temperature behavior, which
has so far not been observed in any other (TMTSF)2X salt showing a
metal-insulator transition. A qualitative model based on the coupling between
the TMTSF molecule vibration and the reorientation of electrical dipole moment
of the FSO3 anion is proposed, in order to explain the anomalous behavior of
the new mode.

<|endoftext|><|startoftext|>
Introduction
When considering matter effects on neutrino oscillation, it is customary to
consider only the W -exchange interaction of the νe with the electrons in
matter. However, if new interactions beyond the Standard Model (SM) that
distinguish among the three generations of neutrinos exist, they can lead
to extra matter effects via radiative corrections to the Zνν vertex which
effectively violate neutral current universality, or via the direct exchange of
new particles between the neutrinos and matter particles.
For instance, topcolor assisted technicolor4 treats the third generation
differently from the first two and the Z ′ in this class of models couples
more strongly to the ντ than to the νe or νµ. In Extended Technicolor
(ETC) Models, such as that of Appelquist, Piai, and Shrock,5 the neutral
technimesons, which mix with the Z, couple to different generation fermions
differently, distinguishing among νe, νµ, and ντ . The diagonal ETC gauge
bosons also couple to the different generations differently, as well as the
∗Presenting Author
http://arxiv.org/abs/0704.0369v1
August 9, 2021 18:21 WSPC - Proceedings Trim Size: 9in x 6in SCGT06-takeuchi
large variety of leptoquark states in the model. Flavor distinguishing matter
effects from diagonal ETC and leptoquarks are induced by ETC gauge
boson mixing.
The effective Hamiltonian that governs neutrino oscillations in the pres-
ence of neutral-current lepton universality violation, or new physics that
couples to the different generations differently, is given by1
H = Ũ
λ1 0 0
0 λ2 0
0 0 λ3
 Ũ † = U
0 0 0
0 δm221 0
0 0 δm231
U † +
a 0 0
0 0 0
0 0 0
be 0 0
0 bµ 0
0 0 bτ
where U is the MNS matrix,
a = 2EVCC , VCC =
2GFNe = Ne
, (2)
is the usual matter effect due to W -exchange between νe and the electrons,
and be, bµ, bτ are the extra matter effects which we assume to be non-equal.
We define the parameter ξ as
bτ − bµ
= ξ . (3)
Then, the effective Hamiltonian can be rewritten as
H = Ũ
λ1 0 0
0 λ2 0
0 0 λ3
 Ũ † = U
0 0 0
0 δm221 0
0 0 δm231
U †+a
1 0 0
0 −ξ/2 0
0 0 +ξ/2
 , (4)
where we have absorbed the extra b-terms in the (1, 1) element into a.
The extra ξ-dependent contribution in Eq. (4) can manifest itself when
a > |δm231| (i.e. E & 10GeV for typical matter densities in the Earth) in
the νµ and ν̄µ survival probabilities as
P (νµ → νµ) ≈ 1− sin2
2θ23 −
δm231
P (ν̄µ → ν̄µ) ≈ 1− sin2
2θ23 +
δm231
, (5)
where
∆ ≈ ∆31c213 −∆21c212 , ∆ij =
δm2ij
L , cij = cos θij , (6)
and the CP violating phase δ has been set to zero. As is evident from
these expressions, the small shift due to ξ will be invisible if the value of
sin2 2θ23 is too close to one. However, if the value of sin
2 2θ23 is as low
August 9, 2021 18:21 WSPC - Proceedings Trim Size: 9in x 6in SCGT06-takeuchi
as sin2 2θ23 = 0.92 (the current 90% lower bound), and if ξ is as large
as ξ = 0.025 (the central value of from CHARM/CHARM II6), then the
shift in the survival probability at the first oscillation dip can be as large as
∼ 40%. If the Fermilab-NUMI beam in its high-energy mode2 were aimed at
a declination angle of 46◦ toward the planned Hyper-Kamiokande detector3
in Kamioka, Japan (baseline 9120 km), such a shift would be visible after
just one year of data taking, assuming a Mega-ton fiducial volume and
100% efficiency. The absence of any shift after 5 years of data taking would
constrain ξ to1
|ξ| ≤ ξ0 ≡ 0.005 , (7)
at the 99% confidence level.
In the following, we look at how this potential limit on ξ would translate
into constraints on the Z ′ in topcolor assisted technicolor, and various types
of leptoquarks. A more comprehensive analysis will be presented in Ref. 7.
2. Topcolor Assisted Technicolor
Though there are several different versions of topcolor assisted technicolor,4
we consider here the simplest in which the quarks and leptons transform
under the gauge group
SU(3)s × SU(3)w × U(1)s × U(1)w × SU(2)L (8)
with coupling constants g3s, g3w, g1s, g1w, and g, respectively. It is assumed
that g3s ≫ g3w and g1s ≫ g1w. SU(2)L is the usual weak-isospin gauge
group of the SM. The first and second generation fermions are assumed to
be charged only under SU(3)w×SU(2)L×U(1)w, while the third generation
fermions are assumed to be charged only under SU(3)s × SU(2)L ×U(1)s.
The U(1) charges for both cases are set equal to the SM hypercharge. At
scale Λ ∼ 1 TeV, technicolor, which is included in the model to generate the
W and Z masses, is assumed to become strong and generate a condensate
(of something which is left unspecified) which breaks the two SU(3)’s and
the two U(1)’s to their diagonal subgroups:
SU(3)s × SU(3)w → SU(3)c , U(1)s × U(1)w → U(1)Y , (9)
which we identify with the usual SM color and hypercharge groups. The
massless unbroken U(1) gauge boson Bµ and the massive broken U(1) gauge
boson Z ′µ are related to the original U(1)s × U(1)w gauge fields Ysµ and
Ywµ by
Z ′µ = Ysµ cos θ1 − Ywµ sin θ1
August 9, 2021 18:21 WSPC - Proceedings Trim Size: 9in x 6in SCGT06-takeuchi
Bµ = Ysµ sin θ1 + Ywµ cos θ1 (10)
where
tan θ1 =
. (11)
The currents to which the Bµ and Z
µ couple to are:
1sYsµ+g1wJ
1wYwµ = g
′ (cot θ1J
1s − tan θ1J
1s + J
1w)Bµ ,
where
. (13)
The current J
1s + J
1w is the SM hypercharge current, and g
′ is the
SM hypercharge coupling constant.
The exchange of the Z ′ leads to the current-current interaction
(cot θ1J1s − tan θ1J1w) (cot θ1J1s − tan θ1J1w) , (14)
the J1sJ1s part of which does not contribute to neutrino oscillations on the
Earth, while the J1wJ1w part is suppressed relative to the J1wJ1s part by
a factor of tan2 θ1 ≪ 1. Therefore, we only need to consider the J1sJ1w
interaction which only affects the propagation of ντ . The effective potential
felt by ντ due to this interaction is
Vντ =
, (15)
and the effective ξ is
ξTT =
Vντ − Vνµ
(g′/MZ′)
(g/MW )2
tan2 θW
sin2 θW
The limit |ξTT | ≤ ξ0 = 0.005 then translates into:
MZ′ ≥ MZ
sin2 θW
≈ 440GeV . (17)
Unfortunately, this potential limit from the measurement of ξ is weaker
than what is already available from precision electroweak data,8 and from
direct searches for pp̄ → Z ′ → τ+τ− at CDF.9,10
August 9, 2021 18:21 WSPC - Proceedings Trim Size: 9in x 6in SCGT06-takeuchi
3. Generation Non-diagonal Leptoquarks
The interactions of leptoquarks with ordinary matter can be described in
a model-independent fashion by an effective low-energy Lagrangian as dis-
cussed in Ref. 11. Assuming the fermionic content of the SM, the most gen-
eral dimensionless SU(3)C × SU(2)L ×U(1)Y invariant couplings of scalar
and vector leptoquarks satisfying baryon and lepton number conservation
are given by:
L = LF=2 + LF=0 , (18)
where
LF=2 =
ejL − dciLνjL) + g
γµejL) + g
γµejR)
V +2µ
γµνjL) + g
γµejR)
V −2µ
γµejL)Ṽ
2µ + (u
γµνjL)Ṽ
ejL)S
3 − (uciLejL + dciLνjL)S
νjL)S
, (19)
LF=0 =
(uiRejL) + h
(uiLejR)
(uiRνjL)− hij2R(diLejR)
(diRejL)S̃
2 + (diRνjL)S̃
(uiLγ
µνjL + diLγ
µejL) + h
(diRγ
µejR)
V 01µ
2(uiLγ
µejL)V
3µ + (uiLγ
µνjL − diLγµejL)V 03µ +
2(diLγ
µνjL)V
Here, the scalar and vector leptoquark fields are denoted by S and V , their
subscripts indicating the dimension of their SU(2)L representation, and the
superscripts indicating the sign of the weak-isospin of each component. We
allow for generation non-diagonal couplings with the indices i and j indicat-
ing the quark and lepton generation numbers, respectively. The subscript L
or R on the coupling constants indicate the chirality of the lepton involved
in the interaction. For simplicity, color indices have been suppressed. The
leptoquarks S1, ~S3, V2, Ṽ2 carry fermion number F = 3B + L = −2, while
the leptoquarks S2, S̃2, V1, ~V3 have F = 0. The interactions that affect neu-
trino oscillation are those with (ij) = (12) or (13).
August 9, 2021 18:21 WSPC - Proceedings Trim Size: 9in x 6in SCGT06-takeuchi
Table 1. Constraints on the leptoquark couplings with all the leptoquark
masses set to 300 GeV. To obtain the bounds for a different leptoquark mass
MLQ, simply rescale these numbers with the factor (MLQ/300 GeV)
LQ CLQ δλ
upper bound from |ξ| ≤ ξ0
S1 +3 |g
|2 − |g13
|2 0.01
~S3 +9 |g
|2 − |g13
|2 0.003
S2 −3 |h
|2 − |h13
|2 0.01
S̃2 −3 |h̃
|2 − |h̃13
|2 0.01
V2 +6 |g
|2 − |g13
|2 0.005
Ṽ2 +6 |g̃
|2 − |g̃13
|2 0.005
V1 −6 |h
|2 − |h13
|2 0.005
~V3 −18 |h
|2 − |h13
|2 0.002
It is straightforward to calculate the effective potentials due to the ex-
change of these leptoquarks, as well as the effective values of ξ.7 Assuming a
common mass for leptoquarks in the same SU(2)L weak-isospin multiplet,
the effective ξ due to the exchange of any particular type of leptoquark can
be written in the form
ξLQ = CLQ
δλ2LQ/M
g2/M2
δλ2LQ
. (21)
Here, CLQ is a constant prefactor, and δλ
LQ represents
δλ2LQ = |λ12LQ|2 − |λ13LQ|2 , (22)
where λ
is a generic coupling constant. The values of CLQ and δλ
for the different types of leptoquark are listed in Table 1. The constraint
|ξLQ| ≤ ξ0 translates into:
MLQ ≥
|CLQ||δλ2LQ|
2GF ξ0
|CLQ||δλ2LQ| × (1700GeV) . (23)
Alternatively, one can fix the leptoquark mass and obtain upper bounds on
the leptoquark couplings:
∣δλ2LQ
2GF ξ0
M2LQ ≈
300GeV
. (24)
The values when MLQ = 300GeV are listed in the rightmost column of
Table 1. Thought it is often stated that generation non-diagonal couplings
of leptoquarks are strongly constrained by the absence of flavor chang-
ing neutral currents, it is only the products of the (ij) = (12) and (13)
couplings with other couplings that are constrained.12 The limits on the
August 9, 2021 18:21 WSPC - Proceedings Trim Size: 9in x 6in SCGT06-takeuchi
individual couplings can be improved considerably. The current leptoquark
mass bounds from direct searches at the Tevatron, LEP, and HERA are in
the 200∼300 GeV range assuming generation diagonal couplings set equal
4πα. At the LHC, leptoquarks, if they exist, can be expected to be
pair-produced copiously through gluon-gluon fusion. The expected sensi-
tivity is up to about 1.5 TeV.13 Depending on the value assumed for δλ2LQ,
the bound from Eq. (23) can be competitive.
Acknowledgments
We would like to thank Drs. Andrew Akeroyd, Mayumi Aoki, Masafumi
Kurachi, Robert Shrock, and Hiroaki Sugiyama for helpful discussions. This
research was supported in part by the U.S. Department of Energy, grant
DE–FG05–92ER40709, Task A (Kao, Pronin, and Takeuchi).
References
1. M. Honda, N. Okamura, and T. Takeuchi, arXiv:hep-ph/0603268.
2. NUMI Technical Design Handbook,
available at http://www-numi.fnal.gov/numwork/tdh/tdh index.html
3. Y. Itow et al., arXiv:hep-ex/0106019;
updated version available at http://neutrino.kek.jp/jhfnu/.
4. C. T. Hill, Phys. Lett. B 345, 483 (1995); G. Buchalla, G. Burdman,
C. T. Hill, and D. Kominis, Phys. Rev. D 53, 5185 (1996).
5. T. Appelquist, M. Piai and R. Shrock, Phys. Rev. D 69, 015002 (2004).
6. J. Dorenbosch et al. [CHARM Collaboration], Phys. Lett. B 180, 303 (1986);
P. Vilain et al. [CHARM-II Collaboration], Phys. Lett. B 320, 203 (1994).
7. M. Honda, Y. Kao, N. Okamura, A. Pronin, and T. Takeuchi, in preparation.
8. R. S. Chivukula and J. Terning, Phys. Lett. B 385, 209 (1996);
W. Loinaz and T. Takeuchi, Phys. Rev. D 60, 015005 (1999).
9. D. Acosta et al. [CDF Collaboration], Phys. Rev. Lett. 95, 131801 (2005).
10. W. M. Yao et al. [Particle Data Group], J. Phys. G 33 (2006) 1.
11. W. Buchmüller, R. Rückl and D. Wyler, Phys. Lett. B 191, 442 (1987);
M. Tanabashi, p.412 of Ref. 10.
12. S. Davidson, D. C. Bailey and B. A. Campbell, Z. Phys. C 61, 613 (1994);
M. Leurer, Phys. Rev. D 49, 333 (1994); M. Leurer, Phys. Rev. D 50, 536
(1994); E. Gabrielli, Phys. Rev. D 62, 055009 (2000).
13. ATLAS detector and physics performance. Technical design report. Vol. 2,
CERN-LHCC-99-15, ATLAS-TDR-15; CMS physics : Technical Design Re-
port v.2 : Physics performance, CERN-LHCC-2006-021, CMS-TDR-008-2;
V. A. Mitsou, N. C. Benekos, I. Panagoulias and T. D. Papadopoulou, Czech.
J. Phys. 55, B659 (2005).
ABSTRACT
  New physics beyond the Standard Model can lead to extra matter effects on
neutrino oscillation if the new interactions distinguish among the three
flavors of neutrino. In Ref.1, we argued that a long-baseline neutrino
oscillation experiment in which the Fermilab-NUMI beam in its high-energy mode
is aimed at the planned Hyper-Kamiokande detector would be capable of
constraining the size of those extra matter effects, provided the vacuum value
of \sin^2 2\theta_{23} is not too close to one. In this talk, we discuss how
such a constraint would translate into limits on the coupling constants and
masses of new particles in models such as topcolor assisted technicolor.

<|endoftext|><|startoftext|>
The magnetization of a ferromagnetic body can be manipulated by transfer of spin angular momentum from a spin-polarized curren
Shaped angular dependence of the spin transfer torque and microwave generation without 
magnetic field  
O. Boulle1, V. Cros1, J. Grollier1, L. G. Pereira1,*, C. Deranlot1, F. Petroff1, G. Faini2, J. Barnaś3, 
A. Fert1 
1 Unité Mixte de Physique CNRS/Thales and Université Paris Sud XI, Route départementale 128, 
91767 Palaiseau, France 
2 Laboratoire de Photonique et de Nanostructures LPN-CNRS, Route de Nozay, 91460 
Marcoussis, France 
3 Department of Physics, Adam Mickiewicz University, Umultowska 85, 61-614 Poznań, Poland 
Abstract: The generation of oscillations in the microwave frequency range is one of the most 
important applications expected from spintronics devices exploiting the spin transfer 
phenomenon. We report transport and microwave power measurements on specially designed 
nanopillars for which a non-standard angular dependence of the spin transfer torque (wavy 
variation) is predicted by theoretical models. We observe a new kind of current-induced 
dynamics that is characterized by large angle precessions in the absence of any applied field, as 
this is also predicted by simulation with such a wavy angular dependence of the torque. This type 
of non-standard nanopillars can represent an interesting way for the implementation of spin 
transfer oscillators since they are able to generate microwave oscillations without applied 
magnetic field. We also emphasize the theoretical implications of our results on the angular 
dependence of the torque.  
The magnetization of a ferromagnetic body can be manipulated by transfer of spin angular 
momentum from a spin-polarized current. This is the concept of spin transfer introduced by 
Slonczewski [1] and Berger [2] in 1996. In most experiments, a spin-polarized current is injected 
from a spin polarizer into a “free” magnetic element, for example in pillar-shaped magnetic 
trilayers [3-6]. The phenomenon of spin transfer has a great potential for applications. It can be 
used either to switch a magnetic configuration (the configuration of a magnetic memory for 
example) [3-5] or to generate magnetic precessions and voltage oscillations in the microwave 
frequency range[6-7]. In the most usual situations, such oscillations are observed in the presence 
of a magnetic field. 
From a fundamental point of view, spin transfer effects raise two different types of 
problems [8]. First the spin transfer torque acting on a magnetic element is related to the 
transverse spin polarisation of the current (transverse meaning perpendicular to the magnetization 
axis of the element) and can be derived from spin-dependent transport equations [8-17]. On the 
other hand, the description of the magnetic excitations generated by the spin transfer torque raises 
problems of non-linear dynamics [8,18-20]. For example, in the simple limit where the excitation 
is supposed to be a uniform precession of the magnetization (macrospin approximation), this 
precession can be determined by introducing the spin transfer torque into a Landau-Lifshitz-
Gilbert (LLG) equation for the motion of the magnetic moment. However, the determination of 
the spin transfer torque and the description of the magnetization dynamics cannot be regarded as 
independent problems. In standard trilayered structures with in-plane magnetizations and with the 
usual angular dependence, a switching regime is found at zero and low magnetic field and the 
precession regime with generation of voltage oscillations is mainly observed above some 
threshold field [8]. We will show that a new behavior, characterized by large angle precessions in 
the absence of any magnetic field, can be obtained in specially designed structures presenting a 
non-standard dependence of the spin transfer torque as a function of the angle between the fixed 
magnetization of the polarizer and the magnetization of the free layer. This non-standard angular 
dependence of the torque, that we call “wavy”, is obtained by choosing materials with different 
spin diffusion lengths for the “fixed” and “free” magnetic layers, which changes the distribution 
of the spin currents and spin accumulations in the structure. 
The observation of spin transfer oscillations at zero field in structures with a “wavy” angular 
dependence of the torque can represent a new way to obtain spin transfer oscillators operating 
without any applied field, an other possible way being the use of exchange interactions or 
anisotropy to generate local effective fields or non-collinear equilibrium configurations [21]. In 
addition, the observation of a wavy angular dependence of the torque represents a valuable test of 
the theory and shows that realistic predictions of the spin transfer torque and its angular 
dependence in a given structure are now possible. As we will see, in the models we consider here 
[15-16], the torque is calculated from parameters which, for most of them, can be derived from 
former CPP-GMR experiments [22-23].  
The usual behaviour observed in pillars with in-plane magnetizations along an anisotropy axis 
corresponds to the standard angular dependence of the inset of Fig.1a, in which the torque starts 
from zero at ϕ = 0 (P equilibrium state with parallel magnetizations of the fixed and free 
magnetic layers) and keeps the same sign till it comes back to zero at ϕ = π (AP antiparallel 
state). At zero field and starting from a P state for example (Fig.1b), a negative current (electrons 
going from the free to the fixed layer in our convention) will destabilize the P state and stabilize 
the AP state, i.e. can switch the system from P to AP. In the presence of a large enough applied 
field favouring the P configuration, the torque cannot stabilize the AP state and leaves the system 
in an intermediate precession state. This is what we call the standard behaviour with irreversible 
switching at low field and precession at high field, as illustrated by Fig.2a (remark: in some low 
field experiments however, the irreversible switching is preceded by precessions in a very narrow 
current range just below the switching current). 
The non-standard behaviour with precession at zero and/or low field presented in this article 
is related to the existence of a wavy angular dependence of the torque acting on the free magnetic 
layer. This oscillatory angular dependence, with an inversion of the torque between ϕ = 0 and ϕ = 
π, is shown in Fig. 1a. We present the results of calculations in the models of Fert et al [15] and 
Barnaś et al [16-17] for a Py(8)/Cu(10)/Co(8) pillar. With respect to standard structures like 
Co/Cu/Co or Py/Cu/Py, the difference we have introduced is a large asymmetry between the spin 
diffusion lengths (SDL) in the magnetic layers, with a long SDL in Co ≈ 38 nm (at room 
temperature) and a short SDL in Py ≈ 4 nm [22-23]. The smaller spin asymmetry of the resistivity 
in Co could also affect the angular dependence but we have checked by additional calculations 
that the wavy variation comes primarily from the shorter SDL in the Py free layer and not from 
the different spin asymmetry coefficients, as this has been mistakenly written in Ref.[24]. The 
solid curves in Fig.1a correspond to the calculation in the model of Barnaś et al [16]. A wavy 
angular dependence is also predicted by the model of Fert et al [15] which gives the terms of first 
order in ϕ and (π-ϕ) in the vicinity of the colinear P or AP states  (the solid straight lines at the 
left and right edges of the graph in Fig.1). Due to the inversion at small values of ϕ, a negative 
current (Fig. 1c) now stabilizes not only the AP state but also the P one and should be ‘inactive’. 
This can be a solution, for example, to reduce the spin-transfer-induced noise that is detrimental 
to read heads. In contrast, an appropriate positive current can destabilize both the P and AP states, 
leading to a precessional solution the motion equation, even at zero field.  
To validate these predictions, we have performed transport and microwave power 
measurements at room temperature on Py(8)/Cu/Co(8) elliptical nanopillars of approximate 
dimensions 100x155 nm². Only the top Py layer (free layer) and the Cu spacer are etched 
through. The unetched Co layer (“fixed” layer) lies directly on the Ta/Cu bottom electrode. Very 
similar results have been obtained on Py(8)/Cu/Co(4)/IrMn nanopillars in which the extended Co 
layer is exchange biased by the IrMn one. We show in Fig. 2b the GMR signal of a 
Py(8)/Cu/Co(8) sample. Starting, for example, from large negative fields, the switching to an AP 
state at about 40 Oe is related to the magnetization reversal of the free layer (Py) to the positive 
direction, as this can be found from subsequent CIMS experiments in which the current-induced 
return to P is made harder by a larger positive field (consistently with a positive orientation of the 
Py magnetization in the switching to the AP state). From the GMR minor cycles of the Py layer 
(see Supplementary Information), we find that the coercive field of the Py layer is 90 Oe and the 
dipolar field acting on it is 43 Oe. 
The different behaviours observed for standard and wavy angular dependences are first 
illustrated in Fig.2a and 2c. In Fig. 2a, we show the standard variation of differential resistance 
(dV/dI) versus I measured on a  Py(4 nm)/Cu(10 nm)/Py(15 nm) pillar: starting from a P state, a 
negative current  induces an irreversible switching from P to AP at low field and a reversible 
variation with the characteristic peak of steady precessions at high field. In contrast, starting 
again from a P magnetic configuration with magnetizations in the positive field direction but now 
with a Py(8 nm)/Cu(10 nm)/Co(8 nm) pillar for which a wavy angular dependence is expected, 
we detect (Fig. 2c) reversible peaks of dV/dI for positive currents and at very small fields on both 
sides of Happ= 0. The peak current increases with increasing applied positive field as expected 
since the P state becomes more stable. We have also performed experiments with an AP initial 
state. We find that dV/dI first drops to the level of a P state at some positive current and then, at 
higher current, exhibits the same characteristic precession peak we observe in measurements with 
a P initial state (data not presented).  
In Fig. 3, we present microwave power spectra recorded with the same P initial state and for 
several values of the current. Fig.3a is for zero applied field (actually, Happ ≈ 2 Oe) and Fig.3b for 
zero effective in-plane field (after subtracting the dipolar field). Coloured dots in the insets 
indicate the values of the current on the corresponding dV/dI vs I curves. A peak in the 
microwave power spectrum turns out approximately in some current range above the maximum 
of dV/dI. The frequency f of the microwave peak increases with the current (blue shift), in 
contrast with the red shift generally observed in standard pillars with in-plane magnetization. 
Actually, with the standard angular dependence of the torque, the theoretical prediction is a 
succession of red and blue shift regimes at increasing current but, in experiments with in-plane 
applied fields, the crossover to a blue shift regime has been seldom observed [25]. In macrospin 
simulations, a blue shift in f is predicted for the regime of out-of-plane (OP) precessions and is 
also associated with a decrease of f with increasing in-plane field. As shown in Fig.3 c, we 
observe this decrease of f with Happ. In Fig.4a, we present the current-field diagram of the 
microwave power. Microwave signals are emitted only in the top left corner of the diagram, i.e. 
at low field and in a zone which is also a region of increased resistance (Fig.4b). No excitation is 
observed at higher field. 
We can therefore put forward two main results from our microwave power data: i) Pillars in 
which a wavy angular dependence of the spin transfer torque is expected, generate microwave 
oscillations, but, in contrast with the standard behavior, when there are excited by  positive 
currents and at zero field; ii) These microwave oscillations present a blue shift of their frequency 
with current, a behaviour generally associated with out-of-plane precessions.  
We first want to exclude that the effects described in the preceding paragraphs could arise 
from other origins than the wavy angular dependence of the STT. Could they arise from 
excitations of the Co “fixed” layer? We can first argue that the same behaviour is also observed 
when the 4nm thick and extended Co layer is pinned by an IrMn layer and that an excitation of a 
thin Co layer in the presence of such a strong pinning is quite improbable. We can also point out 
that, for un-pinned continuous magnetic layers, the switching current densities obtained by Chen 
et al. [26] are about one order of magnitude larger than ours. In addition, whereas a reduction of 
the thickness of the Co layer to 4 nm for the same 8nm Py thickness should make the excitation 
of Co easier (smaller current), our experimental results are in the opposite direction.  
The sample of Fig.2-3 exhibits the relatively simple behaviour predicted for a wavy angular 
dependence of the torque in a macrospin picture, i.e. precessions at zero and low field in positive 
current. However, in a series of five similar samples (with or without pinning by IrMn), we have 
also observed additional features in transport measurements. For example, in some samples and 
with an initial P state, we see not only peaks in dV/dI in positive current at zero or low field but 
also partial or total switchings in negative current. These excitations can be ascribed to a non-
uniform distribution of the magnetization [27]. For a part of the sample, the angle ϕ between the 
magnetizations of the two layers is above ϕc, the angle of torque inversion, and can be excited by 
a negative current. However, we emphasize that these additional excitations observed in transport 
measurements are never associated with peaks in the emitted power in the Gigahertz range. All 
the samples share the same main features with microwave emission only at low field in positive 
current.  
We now present the theoretical implications of our experimental results and first comment briefly 
on the origin of the wavy angular dependence of the spin transfer torque in our samples. The 
physics governing this angular dependence can be discussed simply by considering that, in all the 
models [8,13-17] based on interfacial absorption of the transverse spin component and boundary 
conditions of the mixing conductance type (the language can be different in different 
formalisms), the spin transfer torque is proportional to the transverse component of the spin 
accumulation in the spacer layer. The key point is that the spin accumulation in a nonmagnetic 
conductor is directly related to the gradient of the spin current along the current axis z, 
dzjdm m /)(−∝  [28]. In configurations close to the P state of a standard pillar, with a thick 
fixed layer and a thin free layer made of the same material, the spin polarization of the current in 
the spacer decreases from the fixed layer to the free layer.  This corresponds to a given sign of 
the spin accumulation. But an opposite sign is expected if, in the same configuration, the spin 
polarization of the current increases from the fixed layer to the free layer. This is what occurs for 
our Py(8nm)/Cu(10nm)/Co(8nm) pillars in an angular range close to the P configuration, as this 
can be seen from the spin accumulation calculated in the Section Methods. As shown in Fig 1a, 
calculations of STT based on two different models reflect this inversion of the spin accumulation 
by an inversion of the torque on the left part of the figure with respect to the standards case. 
However, as shown the figure, the inversion is a little less pronounced (less steep slope) in the 
model of Ref.[15] which goes beyond the simple mixing conductance approximation of Ref.[16].  
For a further understanding, we have performed additional macrospin simulations of the 
current-induced precessions by solving a Landau-Lifschitz Gilbert equation including a spin 
transfer term using parameters compatible with the actual structure of the measured samples (see 
Methods, the simulations have been performed by two of the co-authors, O.B. and J.G., 
independently of those published in Ref.[24]). The simulated current-field diagram at T = 0 K is 
presented on Fig 4.d with a colour scale corresponding to the change of resistance. At high field 
(Happ larger than the anisotropy field) and in the current range we have considered, the only 
excitations are in-plane (IP) precessions occurring above a threshold current Ic1 and associated 
with a small change of resistance (which also corresponds to a small microwave power). At low 
field, the IP precessions above Ic1 (black and blue trajectories in Fig. 4c) are followed by out-of-
plane (OP) precessions (orange and red trajectories) above a second current threshold Ic2.  
There is a general good agreement between the main features of the experimental and 
calculated phase diagrams. In particular, the zone of OP precessions in the top left corner of the 
diagram of Fig. 4d turns out to be also the zone where we measure the larger DC resistance 
increase (Fig. 4b) and also detect microwave excitations (Fig.4a). Quantitatively, if one compares 
the colours in Fig.4 b and c, one can see that the distribution of the resistance change in the 
diagram is well reproduced and that the experimental ΔR in the OP zone is only somewhat 
smaller than the calculated one (by about 20% in average). The simulations also give a 
distribution of microwave power (not shown) concentrated in OP top-left zone as in the 
experimental plot of Fig.4a but with a power which is about 80 times larger than the experimental 
one. This could be due to several reasons. First, there are certainly technical factors, like a large 
impedance mismatch in the detection circuit. Second, for the OP excitations, the limits of a 
macrospin approach for a quantitative prediction [6,30], have been put forward by several 
publications. Finally, for the IP precessions we could not detect in the microwave spectra, it can 
be pointed out that a very small variation of GMR is expected for angles between P and an angle 
similar to our ϕc in structures with our type of torque angular dependence [29]. This has also led 
us probably to overestimate the resistance change and the microwave power, since our 
calculation is based on a standard angular dependence of the GMR as sin2(ϕ/2). 
A confirmation that the zone of maximum resistance and microwave excitations in the top-
left corner (positive currents and low fields) of the diagrams in Fig.4a-b can be identified with the 
zone of Out-of-Plane precessions in the calculated diagram (Fig. 4c) comes from the current and 
field dependence of the frequency. As shown in the inset of Fig. 4c, the simulations predict that a 
decrease of the frequency at increasing current for IP precessions is followed by an increase at 
the transition to OP precessions. This is in agreement with the frequency blue shift of the 
microwave excitations detected in the same zone of the phase diagram. The simulations also 
predict correctly the red shift for the variation with the field. Our simulations therefore support 
the picture of a non-standard behavior induced by a wavy angular dependence of the STT torque 
and characterized by out-of-plane precessions excited by positive current at zero and low field. 
During the submission process, we learned that oscillations of vortex structures in thick Py 
layers excited by STT have been observed at relatively low field [31,32]. However this leads to 
oscillations at relatively low frequency, below 1 GHz for layers in our aspect ratio [33], and the 
oscillations above 3 GHz we observe cannot be explained by this mechanism. 
Leaning on recent theoretical models of spin transfer torque, our experimental results 
should help designing more efficient spin transfer oscillators operating in a very small or even 
without an applied magnetic field. This is a necessary step (among others) on the implementation 
of these new spintronics-based oscillators in a microwave receiver system for telecommunication 
applications. 
Methods : 
The multilayers are grown by sputtering onto oxidized Si substrates. Two types of stacks 
were deposited : structure 1 = Au(20 nm)/Cu(5 nm)/Py(8 nm)/Cu(10 nm)/Co(8 nm)/Ta(10 
nm)/Cu(80 nm)/Ta(10 nm) and structure 2 = Au(25 nm)/Py(8 nm)/ Cu(8 nm)/Co(4 nm)/IrMn(15 
nm)/Ru(15 nm)/Cu(35 nm). Py stands for Permalloy. The results we present, are on a nanopillar 
with structure 1, but very similar results are observed with structure 2 when the fixed Co layer is 
pinned with an IrMn layer. This indicates that, even without an IrMn pinning layer, the 
magnetization of the extended Co layer is similarly fixed. 
For the nanofabrication process, we first defined (by e-beam lithography, evaporation 
deposition and lift-off) a Ti(15 nm)/Au (55 nm) elliptical mask on the magnetic multilayer. Then, 
the magnetic pillar is etched by ion milling with a real-time monitoring by mass spectroscopy 
down to the Cu/Co interface. The bottom electrode is defined by optical lithography and ion 
milling. The next step is a planarization of the pillar with a Su-8 resist layer that is also used to 
electrically isolate the bottom and the top electrode. The Su-8 layer on the top of the pillar is 
removed by reactive ion etching. Finally, the top Ti/Au electrode is defined by optical 
lithography, evaporation deposition and lift-off.  
We measured both the dc resistance and the differential resistance dV/dI using an additional 
20µA ac current modulated at 5kHz. For the frequency-domain measurements, we applied a dc 
current on the sample through a bias-T. The high frequency voltage signal is then amplified (68 
dB) and analysed on a commercial spectrum analyzer. The power spectra we show are extracted  
from the spectrum analyser (we do not correct them from a calibration done for quantities like the 
frequency-dependent amplifier gain, the attenuation in the transmission lines, and the impedance 
mismatches). They are only obtained by subtracting a reference spectra measured at Idc = 0 in the 
same magnetic field conditions. Note that the measured emitted power is therefore only a fraction 
of the actual emitted power from the pillars. Both transport and frequency measurements have 
been performed at room temperature and with in-plane magnetic field. 
The torques of Fig.1a have been calculated by introducing in the models of Refs.[16] and 
[17] parameters mostly derived from CPP-GMR experimental data [22-23]. For respectively Au, 
Py, Cu, Co and Ta, these parameters are: bulk resistivity ρ (μΩ.cm) = 2, 15, 2.9, 24, 170; bulk 
spin asymmetry coefficient β = 0, 0.76, 0, 0.46, 0; spin diffusion length lsf (nm) = 35, 4, 350, 38, 
10. For the interface parameters, respectively Au/Cu, Cu/Py, Cu/Co, Co/Ta, Ta/Cu, the 
parameters are : interfacial resistance rb (fΩ.m²) = 0.17, 0.5, 0.51, 0.5, 0.5; interfacial spin 
asymmetry coefficient γ : 0, 0.7, 0.77, 0.7, 0; interfacial spin memory loss coefficient δ : 0.13, 
0.25, 0.25, 0.25, 0.1. Note that the values of the Co and Ta resistivity have been measured on thin 
film we had grown in the same conditions. The unknown value of lsf in Ta has been estimated by 
fitting the calculated and experimental variations of resistance ΔR. We have also used the same 
parameters in routine programmes [34] developed for the CPP-GMR to calculate the spin 
accumulation in the spacer layer for our structure and in a standard structure Py (15 nm)/Cu (10 
nm)/Py(2 nm), respectively – 2.2 and + 2.0 in arbitrary units and check the change of sign at the 
origin of the wavy angular dependence. 
For the simulations of the magnetization dynamics, we have solved a Landau Lifschitz 
Gilbert equation including a spin transfer term of the form M1x(M1xM2) with the angular 
dependence shown in Fig.1a. The calculations are performed at zero temperature. The saturation 
magnetization µ0Ms = 0.87 T has been derived from ferromagnetic resonance experiments 
performed on a Cu(6nm)/Py(7nm)/Cu(6nm) layer at room temperature. The other parameters are 
the anisotropy field Han= 0.009 T, the gyromagnetic factor γ0 = 2.21 105(s.A/m)-1, α = 0.011. The 
area of the pillars is about 1.38 104 nm², as derived by scanning electron microscopy.  
References:  
[1] Slonczewski, J.C. Current-driven excitation of magnetic multilayers. J. Magn. Magn. Mater. 
159, L1-L7 (1996). 
[2] Berger, L. Emission of spin waves by a magnetic multilayer traversed by a current. Phys. Rev. 
B 54, 9353-9358 (1996). 
[3] Katine, J.A., Albert, F.J., Buhrman, R.A., Myers, E.B., Ralph, D.C., Current-driven 
magnetization reversal and spin-wave excitations in Co/Cu/Co pillars. Phys. Rev. Lett. 84, 3149-
3152, (2000). 
[4] Grollier, J., Cros, V., Hamzić, A., George, J.M., Jaffres, H., Fert, A., Faini, G., Ben Youssef, 
J., Le Gall, H., Magnetization reversal in Co/Cu/Co pillars by spin injection. Appl. Phys. Lett. 78, 
3663-3665 (2001).  
[5] Urazhdin, S., Birge, N.O., Pratt, W.P., Bass, J. Switching current versus magnetoresistance in 
magnetic multilayer nanopillars. Appl. Phys. Lett. 84, 1516-1518 (2004). 
[6] Kiselev, S.I. et al., Microwave oscillations of a nanomagnet driven by a spin-polarized 
current. , Nature 425, 380-383 (2003). 
[7] Rippard, W.H., Pufall, M.R., Kaka, S., Russek, S.E., Silva, T.J., Direct-current induced 
dynamics in Co90Fe10/Ni80Fe20. Phys. Rev. Lett. 92, 027201 (2004). 
[8] Stiles, M.D. and Miltat, J. in Spin Dynamics in Confined Magnetic Structures, III, edited by 
B. Hillebrands and A. Thiaville (Springer, Berlin, 2006).  
[9] Waintal, X., Myers, E.B., Brouwer, P.W., Ralph, D.C., Role of spin-dependent interface 
scattering in generating current-induced torques in magnetic multilayers. Phys. Rev. B 62, 12317-
12327 (2000).  
[10] Slonczewski, J., Currents and torques in metallic magnetic multilayers. J. Magn. Magn. 
Mater. 247, 324-338 (2002). 
[11] Zhang, S., Levy, P.M., Fert, A., Mechanisms of spin-polarized current-driven magnetization 
switching. Phys. Rev. Lett. 88, 236601 (2002). 
[12] Stiles, M.D. and Zangwill, A., Anatomy of spin-transfer torque. Phys. Rev. B 66, 014407 
(2002). 
[13] Brataas, A., Bauer, G. E. W., Kelly, P. J., Non-collinear magnetoelectronics, Physics 
Reports 427, 157-256 (2006) 
[14] Kovalev, A., Brataas, A., Bauer, G.E.W. Spin transfer in diffusive ferromagnet–normal 
metal systems with spin-flip scattering. Phys. Rev. B 66, 224424 (2002). 
[15] Fert, A., Cros, V., George, J.M., Grollier, J., Jaffres., H., Hamzić, A., Vaurès, A., Faini, G., 
Ben Youssef, J., Le Gall, H., Magnetization reversal by injection and transfer of spin. J. Magn. 
Magn. Mater. 272, 1706-1711 (2004). 
[16] Barnaś, J., Fert, A., Gmitra, M., Weymann, I., Dugaev, V.K., From giant magnetoresistance 
to current-induced switching by spin transfer. Phys. Rev. B 72, 024426 (2005) 
[17] Barnaś, J., Fert, A., Gmitra, M., Weymann, I., Dugaev, V.K., Macroscopic description of 
spin transfer torque. Mat. Sc. Engin. B 126, 271-274 (2006). 
[18] Rezende, S. M., de Aguiar, F. M., Azevedo, A., Magnon excitation by spin-polarized direct 
currents in magnetic nanostructures. Phys. Rev. B 73, 094402 (2006). 
[19] Bertotti, G. et al., Magnetization switching and microwave oscillations in nanomagnets 
driven by spin-polarized currents. Phys. Rev. Lett. 94, 127206 (2005). 
[20] Slavin, A.N., Kabos, P., Approximate theory of microwave generation in a current-driven 
magnetic nanocontact magnetized in an arbitrary direction. IEEE Trans. on Magnetics 41, 1264-
1273 (2005).  
[21] Rippard, W. H., Pufall, M. R., Silva, T. J. Quantitative studies of spin-momentum-transfer-
induced excitations in Co/Cu multilayer films using point-contact spectroscopy Appl. Phys. Lett. 
82, 1260-1262 (2003). 
[22] Bass, J., Pratt, W.P., Current-perpendicular (CPP) magnetoresistance in magnetic 
multilayers. J. Magn. Magn. Mat. 200, 274-289 (1999). 
[23] Bass, J., Pratt, W.P., Spin-Diffusion Lengths in Metals and Alloys, and Spin-Flipping at 
Metal/Metal Interfaces: an Experimentalist's Critical Review. cond-mat/0610085 (2006). 
[24] Gmitra, M. and Barnaś, J., Current-driven destabilization of both collinear configurations in 
asymmetric spin valves. Phys. Rev. Lett. 96, 207205 (2006). 
[25] Kiselev, S.I. et al., Spin-transfer excitations of permalloy nanopillars for large applied 
currents. Phys. Rev. B 72, 064430 (2005). 
[26] Chen, T. Y., Ji, Y., Chien, C. L., Switching by point-contact spin injection in a continuous 
film,  Appl. Phys. Lett. 84, 380-382 (2004)  
[27] Acremann, Y. et al. Time-resolved imaging of spin transfer switching : beyond the 
macrospin concept.  Phys. Rev. Lett. 96, 217202 (2006). 
[28] Valet, T.,  Fert, A., Theory of the perpendicular magnetoresistance in magnetic multilayers. 
Phys. Rev. B 48, 7099-7113 (1993). 
[29] Urazhdin S., Loloee R., Pratt Jr. W. P., Noncollinear spin transport in magnetic multilayers, 
Phys. Rev. B 71, 100401 (2005). 
[30] Berkov, D. V., Gorn, N. L., Magnetization precession due to a spin polarized current, Phys. 
Rev. B 72, 024455 (2005) 
[31] Pribiag, V. S., Krivorotov, I. N., Fuchs, G. D., Braganca, P. M., Ozatay, O., Sankey J. C., 
Ralph, D. C., Buhrman R. A., Magnetic vortex oscillator by dc spin-polarized current, cond-
mat/0702253 (2007). 
[32] Pufall, M. R., Rippard, W. H ., Schneider, Russek M., S. E., Low Field, Current-Hysteretic 
Oscillations in Spin Transfer Nanocontacts, , cond-mat/0702416 (2007). 
[33] Novosad, V., Fradin, F. Y., Roy, P. E., Buchanan, K. S., Guslienko, K. Yu., Bader, S. D. 
Magnetic vortex resonance in patterned ferromagnetic dots Phys. Rev. B 72, 024455 (2005)  
[34] H. Jaffrès, http://www.trt.thalesgroup.com/ump-cnrs-thales 
Correspondence and requests for materials should be adressed to V. C. The authors declare they 
have no competing financial interests 
Acknowledgements 
The authors thank M. Gmitra for the calculations of Fig 1b based on the model of Ref[16]. We 
would like also to acknowledge H. Hurdequint for FMR measurements, L. Vila for assistance in 
fabrication, O. Copie and B. Marcilhac for assistance in transport and frequency measurements 
and M.R. Pufall for discussions. This work was partly supported by the french National Agency 
of Research ANR through the PNANO program (MAGICO PNANO-05-044-02) and the EU 
through the Marie Curie Training network SPINSWITCH (MRTN-CT-2006-035327). J. B. 
acknowledges support by funds from the Polish Ministry of Science and Higher Education as a 
research project (2006-2009). 
*Present address : Instituto de Física, UFRGS, 91501-970 Porto Alegre, RS, Brazil 
Figure captions 
Figure 1 Angular dependence of the spin transfer torque for a standard and a ‘wavy’ 
angular dependence. a, Variation of the spin transfer torque on the free Py layer of a 
Au(infinite)/Cu(5 nm)/Py(8 nm)/Cu(10 nm)/Co(8 nm)/Ta(10 nm)/Cu(infinite) multilayer as a 
function of the angle ϕ between the magnetizations of the free Py and fixed Co layers for positive 
and negative currents. The solid curves are calculated in the model of Barnaś et al [17], the solid 
straight lines represent the slopes of the torque variation as the angle tends to 0 and π and have 
been derived from the small angle expression of Fert et al [16]. The parameters used in the 
calculations and mainly derived from CPP-GMR data are listed in the Section Methods. Inset : 
typical variation of the spin transfer torque as a function of the angle between the magnetizations 
of the free and fixed layers for a standard trilayer structure (case of Co/Cu/Co from Ref.[10]). b-
c, Sketches showing schematically the direction (blue arrow) of the spin transfer torque on the 
free layer for configurations close to the P and AP configurations of the free layer (m) and fixed 
layer (M) magnetizations for a standard (b) and a wavy (c) angular dependence of the torque. 
Figure 2 Transport measurements on nanopillars with standard or “wavy” angular 
dependence of the spin transfer torque. a, Differential resistance vs current measured for a 
nanopillar with a standard structure Py(15 nm)/Cu(10 nm)/Py(4 nm) at “low field” (H = 6 Oe) 
and “high field” (H = 133 Oe). In the latter case (precession), the applied field is larger than the 
coercive field equal to H = 133 Oe. Curves are offset for clarity. b-c : Transport data for a Co(8 
nm)/Cu(10 nm)/Py(8 nm). nanopillar. b, Resistance vs field at low current (I = 200 µA). c, 
Differential resistance vs current for different applied fields around zero. These fields correspond 
to the coloured symbols in b. 
Figure 3. Microwave power spectra for the Co(8 nm)/Cu(10 nm)/Py(8 nm) nanopillar of 
Fig.2b-c. a, Microwave power spectra for an applied field close to zero (Happl = 2 Oe) at different 
currents corresponding to the coloured symbols in the inset. Inset: dV/dI vs I for Happl = 2 Oe. b, 
Microwave spectra for different applied currents corresponding to the symbols in inset for an 
effective (applied + dipolar) field of about zero (Happ = 43 Oe). Inset in b : dV/dI vs I for Happ = 
43 Oe. c, Microwave spectra for I = 9 mA at different positive applied fields. Spectra are offset 
for clarity.  
Figure 4 Experimental and simulated spin-transfer-induced high frequency dynamics for a 
Co(8nm)/Cu10nm)/Py(8nm) nanopillar. a, Experimental integrated power between 0.1 to 8 
GHz in colour scale as a function of field and current. b, Normalized experimental resistance in 
colour scale as a function of field and current (a reference curve has been subtracted to the 
experimental R vs I curves to remove the changes in resistance due to Joule heating). c-d : 
Simulated dynamics of the magnetization in a macrospin approach c, Results of macrospin 
numerical calculations of LLG equation as a function of current and field at T = 0K. The black 
line indicates the onset of current-induced precession. Inset in c : Variation of the calculated 
frequency as a function the current for Happ = 0 Oe. d Magnetization trajectories for Happ=0 (black 
arrow in c) at several increasing applied currents.  
π/2 π
Standard angular
dependence of the spin 
transfer torque 
I > 0  
P stable       
AP unstable
(H = 0)
Wavy angular
dependence of the spin 
transfer torque 
I < 0  
P unstable       
AP stable 
(H = 0)
I < 0  
P stable       
AP stable 
(H = 0)
I > 0  
P unstable       
AP unstable
(H = 0)
Fig. 1 Boulle et al.
-10 -5 0 5 10
Current (mA)
 - 19 Oe
   - 3 Oe
     9 Oe
   22 Oe
-100 -50 0 50 100
12.95
12.97
12.99
Magnetic field (Oe)
-6 -4 -2 0 2 4 6
Current (mA)
Fig. 2 
Boulle et al.
9,5 mA
8,5 mA
7,5 mA
6,5 mA
4 6 8 10
12.25
12.39
I (mA)
I (mA)
1 2 3 4
36 Oe
24 Oe
- 4 Oe
Frequency (GHz)
-19 Oe
Happ ≈ 0  (2 Oe) a
Heff ≈ 0  
(Happ=43 Oe)
I = 9 mA
11 mA
10 mA
Fig. 3
Boulle et al.
9,5 mA
8,5 mA
7,5 mA
6,5 mA
4 6 8 10
12.25
12.39
I (mA)
I (mA)
1 2 3 4
36 Oe
24 Oe
- 4 Oe
Frequency (GHz)
-19 Oe
Happ ≈ 0  (2 Oe) a
Heff ≈ 0  
(Happ=43 Oe)
I = 9 mA
11 mA
10 mA
Fig. 3
Boulle et al.
4 6 8 10
12.25
12.39
I (mA)
I (mA)
1 2 3 4
36 Oe
24 Oe
- 4 Oe
Frequency (GHz)
-19 Oe
Happ ≈ 0  (2 Oe) a
Heff ≈ 0  
(Happ=43 Oe)
I = 9 mA
11 mA
10 mA
Fig. 3
Boulle et al.
0 50 100 150
 Magnetic field (Oe)
Power (pW)0 2.74
0 50 100 150
 Magnetic field (Oe)
  9.4 mA
  6.9 mA
  5.7 mA
  3.7 mA
0 50 100 150
4 6 8
Current (mA)
=0 Oe
Magnetic field (Oe)
plane
In plane
Parallel state
ΔRdc(mΩ)
ΔRdc(mΩ)
Fig 4.
Boulle et al.
	Article File #1
	page 2
	page 3
	page 4
	page 5
	page 6
	page 7
	page 8
	page 9
	page 10
	page 11
	page 12
	page 13
	Figure  1
	Figure 2
	Figure 3
	Figure 4
ABSTRACT
  The generation of oscillations in the microwave frequency range is one of the
most important applications expected from spintronics devices exploiting the
spin transfer phenomenon. We report transport and microwave power measurements
on specially designed nanopillars for which a non-standard angular dependence
of the spin transfer torque (wavy variation) is predicted by theoretical
models. We observe a new kind of current-induced dynamics that is characterized
by large angle precessions in the absence of any applied field, as this is also
predicted by simulation with such a wavy angular dependence of the torque. This
type of non-standard nanopillars can represent an interesting way for the
implementation of spin transfer oscillators since they are able to generate
microwave oscillations without applied magnetic field. We also emphasize the
theoretical implications of our results on the angular dependence of the
torque.

<|endoftext|><|startoftext|>
Dark energy interacting with neutrinos and dark matter: a
phenomenological theory
G. M. Kremer∗
Departamento de F́ısica, Universidade Federal do Paraná
Caixa Postal 19044, 81531-990 Curitiba, Brazil
October 29, 2018
Abstract
A model for a flat homogeneous and isotropic Universe composed of dark energy, dark
matter, neutrinos, radiation and baryons is analyzed. The fields of dark matter and neutrinos
are supposed to interact with the dark energy. The dark energy is considered to obey either
the van der Waals or the Chaplygin equations of state. The ratio between the pressure and the
energy density of the neutrinos varies with the red-shift simulating massive and non-relativistic
neutrinos at small red-shifts and non-massive relativistic neutrinos at high red-shifts. The
model can reproduce the expected red-shift behaviors of the deceleration parameter and of the
density parameters of each constituent.
The recent astronomical measurements of type-IA supernovae [1, 2, 3, 4] and the analysis of
the power spectrum of the CMBR [5, 6, 7, 8, 9] provided strong evidence for a present accelerated
expansion of the Universe [3, 10, 11, 12, 13, 14]; the nature of the responsible entity, called dark
energy, still remains unknown. Furthermore, the measurements of the rotation curves of spiral
galaxies [15] as well as other astronomical experiments suggest that the luminous matter represents
only a small amount of the massive particles of the Universe, and that the more significant amount
is related to dark matter. That offered a new setting for cosmological models with dark energy
and dark matter and in these contexts many interesting phenomenological models appear in the
literature analyzing the interaction of neutrinos [16, 17, 18] and dark matter [19, 20, 21, 22, 23, 24]
with dark energy. With respect to dark energy some exotic equations of state were proposed in the
literature and among others we quote the van der Waals [25, 26, 27, 28, 29] and the Chaplygin [30,
31, 32, 33] equations of state.
In the present work a very simple cosmological model – for a homogeneous, isotropic and flat
Universe composed by dark matter, dark energy, baryons, radiation and neutrinos – is investigated
where the dark energy is modeled either by the van der Waals or the Chaplygin equations of state
and interact with neutrinos and dark matter. Units have been chosen so that 8πG/3 = c = 1,
whereas the metric tensor has signature (+,−,−,−).
Let a homogeneous, isotropic and spatially flat Universe be characterized by the Robertson
Walker metric ds2 = dt2 − a(t)2δijdxidxj , where a(t) denotes the cosmic scale factor. The sources
of the gravitational field are related to a mixture of five constituents described by the fields of dark
energy, dark matter, baryons, neutrinos and radiation. The components of the energy-momentum
tensor of the sources is written as
(T µν) = diag(ρ,−p,−p,−p), (1)
where ρ and p denote the total energy density and pressure of the sources, respectively. In terms
of the energy densities and pressures of the constituents it follows
ρ = ρb + ρdm + ρr + ρν + ρde, p = pb + pdm + pr + pν + pde. (2)
∗kremer@fisica.ufpr.br
http://arxiv.org/abs/0704.0371v1
Above the indexes (b, dm, r, ν, de) refer to the baryons, dark matter, radiation, neutrinos and dark
energy, respectively.
The conservation law of the energy-momentum tensor T µν ;ν = 0 leads to the evolution equation
for the total energy density of the sources, namely
ρ̇+ 3
(ρ+ p) = 0, (3)
where the dot refers to a differentiation with respect to time.
The baryons and radiation are considered as non-interacting fields so that the evolution equa-
tions for their energy densities read
ρ̇b + 3
ρb = 0, ρ̇r + 4
ρr = 0, (4)
once the baryons represent a pressureless fluid, i.e., pb = 0, and the radiation pressure is given in
terms of its energy density by pr = ρr/3.
According to a model proposed by Wetterich [19] the evolution equation for the energy density
of a pressureless (pdm = 0) dark matter field which interacts with a scalar field φ is given by
ρ̇dm + 3
ρdm = βρdmφ̇. (5)
Here the scalar field plays the role of the dark energy and β is a constant which couples the fields
of dark matter and dark energy.
For interacting neutrinos with dark energy it is supposed that the evolution equation of the
energy density is given by (see [17, 18])
ρ̇ν + 3
(ρν + pν) = α(ρν − 3pν)φ̇. (6)
The coefficient α is connected with the mass of the neutrinos and for more details one is referred
to [17, 18] and to the references therein. Here α will be consider a phenomenological coefficient
that couples the dark energy field with the neutrinos. Note that if pν = ρν/3, there is no coupling
between the fields of dark energy and neutrinos. Moreover, it is also important to note that the
neutrinos in the past must behave as massless particles where the relationship between the pressure
and the energy density is pν = ρν/3. Due to the coupling of the neutrinos with the scalar field
they become massive and non-relativistic. For these reasons a barotropic equation of state for the
neutrinos is proposed where the ratio between the pressure and the energy density wν = pν/ρν ,
given in terms of the red-shift z, reads
K3(1/z)
K2(1/z)
K3(1/z)
K2(1/z)
. (7)
Above K2(1/z) and K3(1/z) are modified Bessel functions of second kind. For small values of z,
wν tends to the non-relativistic limit equal to 2/3, whereas for large values of z, wν tends to the
relativistic limit equal to 1/3. It is noteworthy that for red-shifts z ≈ 10 this ratio reaches the value
wν ≈ 1/3 and the coupling between the neutrinos and the dark energy is negligible. The expression
given in (7) is motivated by the equation of the specific heat of a relativistic gas (see e.g. [34]).
The evolution equation for the energy density of the dark energy field is obtained from equations
(2) through (6), yielding
ρ̇de + 3
(ρde + pde) = −αφ̇(ρν − 3pν)− βρdmφ̇. (8)
The energy density and pressure of the dark energy are connected with the scalar field by φ̇ =√
ρde + pde. Since the purpose of this work is to develop a phenomenological theory, it is assumed
0 2 4 6 8 10
vw Ωb
Figure 1: Density parameters as functions of red-shift: van der Waals fluid (solid lines) and Chap-
lygin fluid (dashed lines).
that the dark energy field behaves either as a van der Waals or a Chaplygin fluid with an equation
of state given by [28, 29, 30, 31, 32, 33]
pvw =
8wvwρvw
3− ρvw
− 3ρ2vw, pch = −
, (9)
where wvw and A are positive free parameters in the van der Waals and Chaplygin equations of
state, respectively.
For the determination of the time evolution of the energy densities one has to close the system
of differential equations by introducing the Friedmann equation
= ρ. (10)
From now on the red-shift will be used as a variable instead of time thanks to the following
relationships
ρ(1 + z)
. (11)
Equations (4) can be easily integrated leading to the well-known dependence of the energy
densities of the baryons and radiation with the red-shift
ρr(z) = ρr(0)(1 + z)
4, ρb(z) = ρb(0)(1 + z)
3, (12)
whereas equations (5), (6) and (8) become a system of coupled differential equations for the energy
densities ρdm, ρν and ρde, namely,
(1 + z)ρ′dm − 3ρdm
(ρde + pde)/ρ
= −βρdm, (13)
(1 + z)ρ′ν − 3(ρν + pν)
(ρde + pde)/ρ
= −α(ρν − 3pν), (14)
(1 + z)ρ′de − 3(ρde + pde)
(ρde + pde)/ρ
= βρdm + α(ρν − 3pν). (15)
In the above equations the prime refers to a differentiation with respect to the red-shift.
In order to solve the coupled system of differential equations (13) – (15) one has to specify initial
values for the energy densities at z = 0. The following initial values for the density parameters
Ωi(z) = ρi(z)/ρ(z) taken from the literature (see [35] for a review) were chosen: Ωde(0) = 0.72,
0 500 1000 1500 2000 2500 3000
Figure 2: Density parameters as functions of red-shift for a van der Waals fluid as dark matter.
Ωdm(0) = 0.229916, Ωb(0) = 5 × 10−2, Ωr(0) = 5 × 10−5, Ων(0) = 3.4 × 10−5. Moreover, one has
to specify values for the coupling parameters α and β and for the parameters wvw and A which
appear in the van der Waals and Chaplygin equations of state (9). One way to fix the two last
parameters is through the use of the value of the deceleration parameter q = 1/2+ 3p/2ρ at z = 0.
Indeed, by considering q(0) = −0.55 it follows wvw = 0.33851 and A = 0.50403. For the coupling
parameters two sets of values were chosen, namely, (a) α = 5 × 10−5 and β = −5 × 10−5 for the
van der Waals equation of state and (b) α = 10−1 and β = −10−2 for the Chaplygin equation of
state. Its is also important to note that by increasing the value of the coupling parameter α (and/
or β) the transfer of energy between the dark energy and neutrinos (and/or dark matter) becomes
more efficient.
In Fig. 1 the density parameters are plotted as functions of the red-shift for values in the range
0 ≤ z ≤ 10. The straight lines refer to the case where the van der Waals equation of state is used
to describe the dark energy field whereas the dashed lines correspond to the Chaplygin equation
of state. The two density parameters that represent the dark energy field are denoted by Ωvw and
Ωch. One can infer from this figure that the dark energy density parameter tends to zero for high
red-shifts when the van der Waals equation of state is used, whereas it tends to a constant value
for the Chaplygin equation of state. While for high red-shifts the van der Waals equation of state
simulates a cosmological constant with pvw = −ρvw, the pressure of the Chaplygin fluid vanishes
indicating that it becomes another component of the dark matter field (see also the behavior of
the pressures indicated in Fig. 4). It is also important to note that the density parameters of the
baryons and of the dark matter increase more with the red-shift for the van der Waals equation of
state, since there is an accentuated decrease in the density parameter of the dark energy for this
case. Note that the density parameters of the radiation and neutrinos are very small in this range
of the red-shift and are not represented in this figure.
The behavior of the density parameters for the cases of the van der Waals and Chaplygin
equations of state are shown in Figs. 2 and 3, respectively, for red-shifts in the range 0 ≤ z ≤ 3000.
One can conclude from these figures, as expected, that the density parameters of the neutrinos
and radiation increase with the red-shift whereas those of the baryons and dark matter decrease.
Furthermore, the equality between the “matter” and “radiation” fields occurs when z ≈ 3000 for
the case where the dark matter field is modeled as a van der Waals fluid and z ≈ 4200 for the case of
a Chaplygin fluid. This can be easily understood, since in the latter case the dark energy becomes
dark matter for high red-shifts contributing for the density parameter of the “matter” field.
In Fig. 4 are plotted the deceleration parameter and the ratio between the pressure and the
energy density for both cases, the large frame corresponding to the van de Waals fluid whereas the
small frame to the Chaplygin fluid. For both cases the deceleration parameter at z = 0 is equal to
q(0) = −0.55, since this value was fixed in order to find the parameters wvw and A in the equations
of state (9). The transition from the decelerated to the accelerated phase of the Universe occurs at
zT = 0.73 and zT = 0.53 for the van der Waals and Chaplygin equations of state, respectively. It
0 500 1000 1500 2000 2500 3000
 + Ων
Figure 3: Density parameters as functions of red-shift for a Chaplygin fluid as dark matter.
-0.75 -0.5 -0.25 0 0.25 0.5 0.75 1 1.25 1.5 1.751.5 2
-0.5 0 0.5 1 1.51.5 2
Figure 4: Deceleration parameter and ratio between the pressure and the energy density as functions
of red-shift: large frame (van der Waals), small frame (Chaplygin).
0 2 4 6 8 10
with interactions
without interactions
Figure 5: Density parameters as functions of red-shift for a Chaplygin fluid with and without
interactions.
is interesting to note that while the Chaplygin equation of state simulates a cosmological constant
with pch = −ρch for negative red-shifts which implies an accelerated phase of the Universe in the
future, the van der Waals equation of state leads to a positive pressure and brings the Universe to
another decelerated phase in the future. It is noteworthy to call attention that for positive values
of the red-shift, the solution of the coupled differential equations (13) through (15) predicts that
the van der Waals fluid behaves close to a cosmological constant with pvw ≈ −ρvw. This behavior
does not lead to a new transition from a decelerated to an accelerated phase in the very early
Universe, since the energy density of the radiation field increases so that the radiation pressure
becomes larger than that of the van der Waals fluid. For high red-shifts the Universe first becomes
dominated by the baryon and dark matter fields and for higher red-shifts by the radiation field.
This model does not attempt to model the inflationary period, where the inflaton field dominates
a short rapid evolution of the Universe.
As final remarks we call attention to the fact that one expects that the coupling between dark
energy, dark matter and neutrinos should be weak so that the parameters α and β are restricted to
small values. The difference between the parameters adopted for the van der Waals and Chaplygin
equations of state is due to stability conditions of the non-linear coupled system of differential
equations (13) – (15), the van der Waals equation of state being more unstable for large values
of these parameters than the Chaplygin equation of state. In Fig. 5 we have plotted the density
parameters as functions of the red-shift for the case where a Chaplygin equation of state is used as
dark energy. One can infer from this figure that the decay of the dark energy density parameter
and the increase of the dark matter density parameter with the red-shift are more pronounced when
there exists a coupling between the fields. The density parameter of the baryons remains unchanged
since the baryons are uncoupled.
As final comment it is important to note that even without couplings between the fields of
dark energy, dark matter and neutrinos, this phenomenological model – with the equations of state
of van der Waals and Chaplyging as dark energy – can describe satisfactorily the evolution of a
Universe whose constituents are dark energy, dark matter, baryons, neutrinos and radiation.
References
[1] S. Perlmutter et al. Astrophys. J. 517, 565 (1999).
[2] A. G. Riess et al. Astrophys. J. 560, 49 (2001).
[3] M. S. Turner and A. G. Riess, Astrophys. J. 569, 18 (2002).
[4] J. Tonry et al., Astrophys. J. 594, 1 (2003).
[5] C. L. Bennett et al. Astrophys. J. Suppl. 148, 1 (2003).
[6] H. V. Peiris et al., Astrophys. J. Suppl. 148, 213 (2003).
[7] C. Netterfield et al. Astrophys. J. 571, 604 (2002).
[8] N. Halverson et al. Astrophys. J. 568, 38 (2002).
[9] D. N. Spergel et al. Astrophys. J. Suppl. 148, 175 (2003).
[10] S. M. Carroll, astro-ph/0310342.
[11] B. Schmidt et al., Astrophys. J. 507, 46 (1998).
[12] G. Efstathiou, S. L. Bridle, A. N. Lasenby, M. P. Hobson and R. S. Ellis, astro-ph/9812226.
[13] A. G. Riess et al., Astrophys. J. 516, 1009 (1998).
[14] D. Huterer and M. S. Turner, Phys. Rev. D 60, 081301 (1999).
[15] M. Persic, P. Salucci and F. Stel Mon. Not. Roy. Astron. Soc. 281, 27 (1996).
[16] X.-J. Bi, B. Feng, H. Li and X. Zhang, Phys. Rev. D 72 123523 (2005).
[17] A. W. Brookfield, C. van de Bruck, D. F. Mota and D. Tocchini-Valentini, Phys. Rev. Lett.
96, 061301 (2006).
[18] A. W. Brookfield, C. van de Bruck, D. F. Mota and D. Tocchini-Valentini, Phys. Rev. D 73
083515 (2006).
[19] C. Wetterich, Nucl. Phys. B 302, 645 (1988).
[20] C. Wetterich, Astron. Astrophys. 301, 321 (1995).
[21] L. Amendola, Phys. Rev. D 62, 043511 (2000).
[22] J. B. Binder and G. M. Kremer, Braz. J. Phys. 35, 1038 (2005).
[23] G. W. Anderson and S. M. Carroll, astro-ph/9711288 (1997)
[24] J. B. Binder and G. M. Kremer, Gen. Relativ. Gravit. 38, 857 (2006).
[25] S. Capozziello, S. De Martino and M. Falanga, Phys. Lett. A 299, 494 (2002).
[26] S. Capozziello, V. F. Cardone, S. Carloni, S. De Martino, M. Falanga, A. Troisi and M. Bruni,
J. Cosmol. Astropart. Phys. 04 (2005) 005.
[27] V. F. Cardone, C. Tortora, A. Troisi and S. Capozziello, Phys. Rev. D 73 043508 (2006).
[28] G. M. Kremer, Phys. Rev. D 68, 123507 (2003).
[29] G. M. Kremer, Gen. Relativ. Gravit. 36, 1423 (2004).
[30] A. Yu. Kamenshchik, U. Moschella and V. Pasquier, Phys. Lett. B 511, 265 (2001).
[31] J. C. Fabris, S. V. B. Gonçalves and P. E. de Souza, Gen. Relativ. Gravit. 34, 53 (2002).
[32] M. C. Bento, O. Bertolami and A. A. Sen A. A. Phys. Rev. D66, 043507 (2002).
[33] G. M. Kremer, Gen. Relativ. Gravit. 35, 1459 (2003).
[34] C. Cercignani and G. M. Kremer, The Relativistic Boltzmann Equation: Theory and Applica-
tions (Birkhäuser, Basel, 2002).
[35] M. Fukugita and P. J. E. Peebles, Astrophys. J. 616, 643 (2004).
http://arxiv.org/abs/astro-ph/0310342
http://arxiv.org/abs/astro-ph/9812226
http://arxiv.org/abs/astro-ph/9711288
ABSTRACT
  A model for a flat homogeneous and isotropic Universe composed of dark
energy, dark matter, neutrinos, radiation and baryons is analyzed. The fields
of dark matter and neutrinos are supposed to interact with the dark energy. The
dark energy is considered to obey either the van der Waals or the Chaplygin
equations of state. The ratio between the pressure and the energy density of
the neutrinos varies with the red-shift simulating massive and non-relativistic
neutrinos at small red-shifts and non-massive relativistic neutrinos at high
red-shifts. The model can reproduce the expected red-shift behaviors of the
deceleration parameter and of the density parameters of each constituent.

<|endoftext|><|startoftext|>
Introduction
	The new representation
	A practical Example: The parametric exponential form of f
	Discussion and Conclusions
	References
ABSTRACT
  The constrained-search formulation of Levy and Lieb, which formally defines
the exact Hohenberg-Kohn functional for any N-representable electron density,
is here shown to be equivalent to the minimization of the correlation
functional with respect to the N-1 conditional probability density, where N is
number of electrons of the system. The consequences and implications of such a
result are here analyzed and discussed via a practical example.

<|endoftext|><|startoftext|>
Introduction
	Definition of the momentum operator and the reality of its expectation value
	The expectation value of the momentum operator in Cartesian and spherical polar coordinates
	Momentum expectation values in various bound states
	Particle in one-dimensional stiff potential wells
	Infinite square well potential
	Finite square well potential
	Dirac-delta potential
	Particle in one-dimensional slowly varying potentials
	Linear harmonic oscillator potential
	Pöschl-Teller potential
	Morse potential
	Position expectation values for various potentials
	Momentum expectation value for a three-dimensional slowly varying spherically symmetric potential
	The Hydrogen atom
	A discussion on Heisenberg's equation of motion and Ehrenfest theorem
	Conclusion
	References
ABSTRACT
  In quantum mechanics textbooks the momentum operator is defined in the
Cartesian coordinates and rarely the form of the momentum operator in spherical
polar coordinates is discussed. Consequently one always generalizes the
Cartesian prescription to other coordinates and falls in a trap. In this work
we introduce the difficulties one faces when the question of the momentum
operator in spherical polar coordinate comes. We have tried to point out most
of the elementary quantum mechanical results, related to the momentum operator,
which has coordinate dependence. We explicitly calculate the momentum
expectation values in various bound states and show that the expectation value
really turns out to be zero, a consequence of the fact that the momentum
expectation value is real. We comment briefly on the status of the angular
variables in quantum mechanics and the problems related in interpreting them as
dynamical variables. At the end, we calculate the Heisenberg's equation of
motion for the radial component of the momentum for the Hydrogen atom.

<|endoftext|><|startoftext|>
Environmental noise reduction for holonomic quantum gates
Daniele Parodi,1,2 Maura Sassetti,1,3 Paolo Solinas,4 and Nino Zangh̀ı1,2
1 Dipartimento di Fisica,
Università di Genova, Genova, Italy
2 Istituto Nazionale di Fisica Nucleare (Sezione di Genova),
Genova, Italy 3 INFM-CNR Lamia
Via Dodecaneso 33, 16146 Genova, Italy
4 Laboratoire de Physique Théorique de la Matière Condensée,
Université Pierre et Marie Curie,
Place Jussieu, 75252 Paris Cedex 05, France
(Dated: October 27, 2018)
We study the performance of holonomic quantum gates, driven by lasers, under the effect of
a dissipative environment modeled as a thermal bath of oscillators. We show how to enhance the
performance of the gates by suitable choice of the loop in the manifold of the controllable parameters
of the laser. For a simplified, albeit realistic model, we find the surprising result that for a long time
evolution the performance of the gate (properly estimated in terms of average fidelity) increases. On
the basis of this result, we compare holonomic gates with the so-called Stimulated Raman adiabatic
passage (STIRAP) gates.
PACS numbers: 03.67.Lx
I. INTRODUCTION
The major challenge for quantum computation is posed
by the fact that generically quantum states are very del-
icate objects quite difficult to control with the required
accuracy—typically, by means of external driving fields,
e.g., a laser. The interaction with the many degrees of
freedom of the environment causes decoherence; more-
over, errors in processing the information may lead to a
wrong output state.
Among the approaches aiming at overcoming these dif-
ficulties are those for which the quantum gate depends
very weakly on the details of the dynamics, in particu-
lar, the holonomic quantum computation (HQC) [1] and
the so-called Stimulated Raman adiabatic passage (STI-
RAP) [2, 3, 4]. In the latter, the gate operator is obtained
acting on the phase difference of the driving lasers dur-
ing the evolution, while in the former the same goal is
achieved by exploiting the non-commutative analogue of
the Berry phase collected by a quantum state during a
cyclic evolution. Concrete proposals have been put for-
ward, for both Abelian [5, 6] and non-Abelian holonomies
[7, 8, 9, 10, 11, 12]. The main advantage of the HQC
is the robustness against noise deriving from a imperfect
control of the driving fields [13, 14, 15, 16, 17, 18, 19, 20].
In a recent paper [21] we have shown that the dis-
turbance of the environment on holonomic gates can be
suppressed and the performance of the gate optimized
for particular environments (purely superohmic thermal
bath). In the present paper we consider a different sort
of optimization, which is independent of the particular
nature of the environment.
By exploiting the full geometrical structure of HQC,
we show how the performance of a holonomic gate can
be enhanced by a suitable choice of the loop in the man-
ifold of the parameters of the external driving field: by
choosing the optimal loop which minimizes the “error”
(properly estimated in terms of average fidelity loss). Our
result is based on the observation that there are different
loops in the parameter manifold producing the same gate
and, since decoherence and dissipation crucially depend
on the dynamics, it is possible to drive the system over
trajectories which are less perturbed by the noise. For a
simplified, albeit realistic model, we find the surprising
result that the error decreases linearly as the gating time
increases. Thus the disturbance of the environment can
be drastically reduced. On the basis of this result, we
compare holonomic gates with the STIRAP gates.
In Sec. II the model is introduced and the explicit
expression of the error is derived. In Sec. III we find
the optimal loop, calculate the error, make a comparison
with other approaches, and briefly sketch how to treat a
different coupling with the environment.
II. MODEL
The physical model is given by three degenerate (or
quasidegenerate) states, |+〉, |−〉, and |0〉, optically con-
nected to another state |G〉. The system is driven by
lasers with different frequencies and polarizations, acting
selectively on the degenerate states. This model describes
various quantum systems interacting with a laser radia-
tion, ranging from semiconductor quantum dots, such as
excitons [12] and spin-degenerate electron states [3], to
trapped ions [8] or neutral atoms [7].
The (approximate) Hamiltonian modeling the effect of
the laser on the system is (for simplicity, ~ = 1) [8, 12]
H0(t) =
j=+,−,0
ǫ|j〉〈j|+(e−iǫtΩj(t)|j〉〈G|+H.c)
, (1)
where Ωj(t) are the timedependent Rabi frequencies de-
http://arxiv.org/abs/0704.0376v2
pending on controllable parameters, such as the phase
and intensity of the lasers, and ǫ is the energy of the
degenerate electron states. The Rabi frequencies are
modulated within the adiabatic time tad, (which coin-
cides with the gating time), to produce a loop in the pa-
rameter space and thereby realize the periodic condition
H0(tad) = H0(0).
The Hamiltonian (1) has four time dependent eigen-
states: two eigenstates |Ei(t)〉 , i = 1, 2, called bright
states, and two eigenstates |Ei(t)〉, i = 3, 4, called dark
states. The two dark states have degenerate eigen-
value ǫ and the two bright states have timedependent
energies λ±(t) = [ǫ ±
ǫ2 + 4Ω2(t)]/2 with Ω2(t) =
i=±,0 |Ωi(t)|
2 [22].
The evolution of the state is generated by
Ut = Te
dt′H0(t
′), (2)
where T is the time-ordered operator. In the adiabatic
approximation, the evolution of the state takes place in
the degenerate subspace generated by |+〉, |−〉, and |0〉.
This approximation allows to separate the dynamic con-
tribution and the geometric contribution from the evolu-
tion operator. Expanding Ut in the basis of instantaneous
eigenstates of H0(t) (the bright and dark states), in the
adiabatic approximation, we have
Ut ∼=
′)dt′ |Ej(t)〉〈Ej(t)| Ut, (3)
where
Ut = Te
dτV (τ), (4)
here V is the operator with matrix elements Vij(t) =
〈Ei(t)|∂t|Ej(t)〉. The unitary operator Ut plays the role
of timedependent holonomic operator and is the funda-
mental ingredient for realizing complex geometric trans-
formation whereas
′)dt′ |Ej(t)〉〈Ej(t)| is the
dynamic contribution.
Consider Ut for a closed loop, i.e., for t = tad,
U = Utad . (5)
If the initial state |ψ0〉 is a superposition of |+〉 and |−〉,
then U|ψ0〉 is still a superposition of the same vectors (in
general, with different coefficients)[12]. Thus the space
spanned by |+〉 and |−〉 can be regarded as the “logi-
cal space” on which the “logical operator” U acts as a
“quantum gate” operator. Note that for t < tad, Ut|ψ0〉
has, in general, also a component along |0〉. However,
as it is easy to show [22], at any instant t < tad, Ut|ψ0〉
can be expanded in the twodimensional space spanned
by the dark states |E3(t)〉 and |E4(t)〉. It is important
to observe that U depends only on global geometric fea-
tures of the path in the parameter manifold and not on
the details of the dynamical evolution [1, 12].
To construct a complete set of holonomic quantum
gates, it is sufficient to restrict the Rabi frequencies
Ωj(t) in such a way that the norm Ω of the vector
~Ω = [Ω0(t),Ω+(t),Ω−(t)] is time independent and the
vector lies on a real three dimensional sphere [8, 12].
We parametrize the evolution on this sphere as Ω+(t) =
sin θ(t) cos φ(t), Ω−(t) = sin θ(t) sin φ(t) and Ω0(t) =
cos θ(t) with fixed initial (and final) point in θ(0) = 0,
the north pole By straightforward calculation we obtain
the analytical expression for V (t) in eq. (4), V (t) =
iσy cos[θ(t)]φ̇(t), where σy is the usual Pauli matrix
written in the basis of dark states. Thus, the oper-
ator (4) becomes Ut = cos[a(t)] − iσy sin[a(t)], here
a(t) =
dτφ̇(τ) cos θ(τ). Accordingly, the logical op-
erator U (5) is
U = cos a− iσy sin a, (6)
where
a = a(tad) =
∫ tad
dτφ̇(τ) cos θ(τ) (7)
is the solid angle spanned on the sphere during the evolu-
tion. Note that the are many paths on the sphere which
generate the same logical operator U , and span the same
solid angle a.
In a previous work we have studied how interaction
with the environment disturbs the logical operator U [21].
The goal of the present paper is to analyze whether and
how such a disturbance can be minimized for a given
U . To this end, we model the environment as a thermal
bath of harmonic oscillators with linear coupling between
system and environment [23]. The total Hamiltonian is
H = H0(t) +
α + cαxαA), (8)
where A is the system interaction operator called, from
now on, noise operator.
We now consider the time evolution of the reduced
density matrix of the system, determined by the Hamil-
tonian (8). We rely on the standard methods of the “mas-
ter equation approach,” with the environment treated in
the Born approximation and assumed to be at each time
in its own thermal equilibrium state at temperature T .
This allows to include the effect of the environment in
the correlation function (kB = 1)
g(τ) =
cos(ωτ) − i sin(ωτ)
Here the spectral density is
J(ω) =
δ(ω − ωα), (10)
at the low frequencies regimes, is proportional to ωs, with
s ≥ 0, i.e., s = 1 describes a Ohmic environments, typ-
ical of baths of conduction electrons, s = 3 describes
a super-Ohmic environment, typical of baths of phonons
[21, 24]. The asymptotic decay of the real part of g(τ) de-
fines the characteristic memory time of the environment.
Denoting with ρ̃(t) the time evolution of the reduced den-
sity matrix of the system in the interaction picture, e.g.,
ρ̃(t) = U
t ρUt, one has [24]
ρ̃(tad) = ρ(0) +
dτ{g(τ )[ÃÃ
ρ̃(t− τ )− Ã
ρ̃(t− τ )Ã]
+ g(−τ )[ρ̃(t− τ )Ã
Ã− Ãρ̃(t− τ )Ã
]. (11)
Here Ã and Ã′ stand for Ã(t) and Ã(t−τ), with the tilde
denoting the time evolution in the interaction picture.
In quantum information the quality of a gate is usually
evaluated by the fidelity F , which measures the closeness
between the unperturbed state and the final state,
F = 〈ψ0(0)|U
†ρ(tad)U|ψ0(0)〉, (12)
where |ψ0(0)〉 is the initial state, and ρ(tad) = U ρ̃(tad)U
is the reduced density matrix in the Schrödinger picture
starting from the initial condition ρ(0) = |ψ0(0)〉〈ψ0(0)|.
The average error is defined as the average fidelity loss,
i.e.,
δ =< 1−F >= 1− < 〈ψ0(0)|ρ̃(tad)|ψ0(0)〉 >, (13)
where < · · · > denotes averaging with respect to the
uniform distribution over the initial state |ψ0(0)〉.
The right-handside of Eq. (13) can be computed by
the following steps:
(1) solving Eq. (11) in strictly second order approxima-
tion; this approximation corresponds to replace ρ̃(t − τ)
with ρ(0);
(2) using the adiabatic approximation U(t − τ, t) ≈
exp(iτH0(t));
(3) expanding the scalar product in Eq. (13) with respect
to a complete orthonormal basis {|ϕn(t)〉}, n = 1, 2, 3,
orthogonal to |ψ0(t)〉. In this way, one obtains
∫ tad
dt G(t)|〈ψ0(t)|A|ϕn(t)〉|
, (14)
where
G(t) =
Re[g(τ)] cos(ω0nt) + Im[g(τ)] sin(ω0nt))
Here, ω0n = ω0−ωn are the energy differences associated
to the transition ψ0 ↔ φn, with ω0 = ǫ, ω1 = λ+, ω2 =
λ−, and ω3 = ǫ.
The interaction between system and environment is ex-
pressed by the noise operator A in Eq. (8). We shall now
make the assumption that A = diag{0, 0, 0, 1} in the |G〉,
|±〉, and |0〉 basis. In this case the transition between de-
generate states are forbidden, however the noise breaks
their degeneracy, shifting one of them. In spite of its sim-
ple form, this A is nevertheless a realistic noise operator
for physical semiconductor systems [4].
III. MINIMIZING THE ERROR
The problem can be stated in the following way: given
the noise operator A and the logical operator U , find a
path on the parameter space (the surface of the sphere,
described above) which minimizes the error δ.
The total error δ, given by Eq. (14), can be decom-
posed as
δ = δtr + δpd, (16)
where the transition error, δtr, is the contribution to the
sum of the nondegenerate states (ω0n 6= 0) and the pure
dephasing error δpd is the contribution of the degenerate
states (ω0n = 0). Thus
δpd =
∫ tad
sin(ωt)
sin2 2a(t)
sin4 θ(t) (17)
δtr =
n=+,−
1 + [(λn − ǫ)/Ω]2
∫ tad
sin2 2θ(t)dt,
where
Γ0n = J(|ω0n|)
|ω0n|
− sgn(ω0n)
correspond to the transition rates calculated by standard
Fermi golden rules, supposing, as usual, G(t) ≈ G(∞) for
g(τ) strongly peaked around τ = 0. In the following we
define for simplicity
n=+,−
1 + [(λn − ǫ)/Ω]2
Since we are interested at long time evolution, we start
discussing the transition error which dominates in this
regime [4, 25].
A. Transition rate
As explained in Sec. II, the holonomic paths are closed
curves on the surface of the sphere which start from the
north pole. It turns out that the curve minimizing δtr
can be found among the loops which are composed by a
simple sequence of three paths (see the Appendix): evo-
lution along a meridian (φ = const), evolution along a
�����
�����
���������
FIG. 1: The error δtr versus θM for two different a values:
a = π/2 (dashed line) and a = π/4 (full line) correspond to
NOT and Hadamard gate, respectively.
parallel (θ = const) and a final evolution along a merid-
ian to come back to the north pole.
The error δtr in (18), depends on a given by Eq. (7),
θM (the maximum angle spanned during the evolution
along the meridian), ∆φ (the angle spanned along the
parallel), and angular velocity v. We allow ∆φ ≥ 2π
which corresponds to cover more than one loop along
the parallel. The velocity along the parallel is v(t) =
φ̇(t) sin θ and that along the meridian is v(t) = θ̇(t). In
the following we assume that v is constant, and it cannot
exceed the maximal value of vmax, fixed by adiabatic
condition vmax ≪ Ω.
The parameters a, θM , and ∆φ are connected by the
relation a = ∆φ(1 − cos θM ). The error δtr is then
δtr = δ
tr + δ
tr, (20)
where
δMtr =
sin 4θM
is the contribution along the meridian and
δPtr = K
sin θM sin
2 2θM
1− cos θM
is the contribution along the parallel.
In Fig. 1 δtr is plotted for a = π/2 and a = π/4 (corre-
sponding to NOT and Hadamard gate, respectively) as a
function of θM . One can see that δtr has a local minimum
for θM = π/2 and a global minimum for θM = 0 where
the error vanishes. This suggests that the best choice is
to take θM as small as possible.
It is interesting to consider the dependence of δtr also
on the evolution time tad. For simplicity, we set the ve-
locity v = vmax. In this case, changing θM (and then
∆φ) corresponds to a change in the evolution time. We
obtain
θM = arccos
, (23)
5 10 15 20 25 30
vmaxtad
FIG. 2: The error δtr versus vmaxtad for two different a val-
ues: a = π/2 (dashed line) and a = π/4 (full line) correspond
to NOT and Hadamard gate, respectively. The dotted-dashed
line shows the value of the error at θ = π/2. The circles show
the critical value of vmaxtad above which the best loop is the
one with the minimal θM .
where
(vmaxtad)
2 + a2
. (24)
Using these relations, δMtr and δ
tr, given by (21) and
(22) become functions of tad, vmax, and a. Note that
m measures the space covered along the parallel, in fact
∆φ = 2πm.
In Fig. 2 we see the behavior of δtr as a function
of vmaxtad. The first minimum for both curves corre-
sponds to θM = π/2, then the curves for long tad de-
crease asymptotically to zero corresponding to the region
in which θM → 0. In this regime we have δtr ∝ 1/tad
which is drastically different from the results obtained
with other methods where δtr ∝ tad, (see Refs [4, 25]
and below Sec. III C). It should be observed that this
surprising results is a merit of holonomic approach which
allows to choose the loop in the parameter space, with-
out changing the logical operation as long as it subtends
the same solid angle. Observe that small θM and long
tad mean large value of m, i.e., multiple loops around the
north pole.
Figure 2 shows that, for a given gate, there is a criti-
cal value kc of vmaxtad which discriminate between the
choice of θM (e.g., k = 6 for the Hadamard gate and
k = 25 for the NOT gate). For vmaxtad < kc the best
choice for the loop is θM = π/2; For vmaxtad > kc the
best choice is the value of θM determined by eq. (23) and
(24).
Note that the region vmaxtad > kc is accessible with
physical realistic parameters [12]. For example, if we
choose the laser intensity Ω = 20 meV and vmax = Ω/50
(for which values the nonadiabatic transitions are forbid-
den), the critical parameter corresponds to the critical
time of 15 ps for the Hadamard gate and 42 ps for the
NOT gate.
B. Pure Dephasing
Until now we have ignored the pure dephasing effect
because we have assumed that it is negligible in com-
parison with the transition error for long evolution time.
Now, we check that the pure dephasing error contribu-
tion can indeed be neglected. We can write the pure
dephasing error using Eq. (17) and splitting to parallel
and meridian part as
δPpd =
∫ tad
Q[a(t)] sin ωt sin4 θM (25)
δMpd =
Q[a(t)]
sin ωt
sin4(vmaxt) + sin
vmaxt
,(26)
where Q[a(t)] = 1 + 1/2 sin2[2a(t)].
To estimate δpd we assume that tad is longer with re-
spect to the characteristic time of the bath. Remember-
ing that J(ω) ∝ ωs, the pure dephasing error behavior
along the parallel part at the temperature T is
δPpd ∝
, T ≪ 1/tad
, T ≫ 1/tad
while the along meridian is
δMpd ∝
. (28)
Then, we can conclude that the pure dephasing can al-
ways be neglected for long time evolution because it de-
creases faster than the transition error.
C. Comparison between HQC and STIRAP
We make a comparison between holonomic quantum
computation (HQC) and the STIRAP procedure which is
an analogous approach to process quantum information.
The STIRAP procedure ([2, 4]) is, in its basic points,
very similar to the holonomic information manipulation.
The level spectrum, the information encoding, the evolu-
tion produced by adiabatic evolving laser are exactly the
same. The fundamental difference is that in STIRAP
the dynamical evolution is fixed (we must pass through
a precise sequence of states) and then the correspond-
ing loop in the parameter space is fixed. In particular,
we go from the north pole to the south pole and back
to the north pole along meridians. Since the loop, as
in our model, is a sequence of meridian-parallel-meridian
path, we can calculate the error and make a direct com-
parison. In this case, the transition error results propor-
tional to δtr ∝ tad and grows linearly in time while for
HQC δtr ∝ 1/tad. Therefore, the HQC is fundamentally
the favorite for long application times with respect to the
STIRAP ones.
Moreover, we can show that the freedom in the choice
of the loop allows us to construct HQC which perform
better than the best STIRAP gates. In Ref. [4] the mini-
mum error (not depending on the evolution time) for STI-
RAP was obtained reaching a compromise between the
necessity to minimize the transition, pure dephasing error
and the constraint of adiabatic evolution. With realistic
physical parameters [21] (J(ω) = kω3e(−ω/ωc)
, Ω = 10
meV, ǫ = 1eV, vmax = Ω/50, k = 10
−2(meV)−2, ωc = 0.5
meV and for low temperature), the total minimum error
in Ref. [4] is δstirap = 10
−3. With the same parame-
ters, we still have the possibility to increase the evolution
time in order to reduce the environmental error. How-
ever, for evolution time tad = 50 ps we obtain a total
error δ = 1.5× 10−4 for the NOT gate and δ = 4× 10−5
for the Hadamard gate, respectively. As can be seen, the
logical gate performance is greatly increased.
D. More general noise
Until now we have discussed the possibility to minimize
the environmental error by choosing a particular loop in
the parameter sphere but the structure of the error func-
tional clearly depends on the system-environment inter-
action. Then one might wonder if the same approach can
be used for a different noise environment.
For this reason, we now briefly analyze the case of noise
matrix in the form A = diag{0, 1, 0,−1}. Again, for long
evolution we can neglect the contribution of the pure de-
phasing and focus on the transition error. In this case the
interesting part of the error functional takes the form
δtr = K[(
sin 2θ cos 2θ)2 + (sin θ sin 2φ)2]. (29)
Even if the analysis in this case is much more com-
plicated, it can be seen that δtr has an absolute mini-
mum for θM = 0. The long time behavior is the same
(δtr ∝ 1/tad) such that the results are qualitatively anal-
ogous to the above ones: for small θM loops (or long
evolution at fixed velocity) the holonomic quantum gate
presents a decreasing error. Then even in this case it is
possible to minimize the environmental error.
IV. CONCLUSIONS
In summary, we have analyzed the performance of
holonomic quantum gates in the presence of environmen-
tal noise by focusing on the possibility to have small
errors choosing different loops in the parameter mani-
fold. Due to the geometric dependence, we can imple-
ment the same logical gate with different loops. Since
different loops correspond to different dynamical evolu-
tions, we have used this freedom to construct an evolu-
tion through “protected” or “weakly influenced” states
leading to good holonomic quantum gates performances.
This allows to select (once that the physical parameter
are fixed) the best loop which minimizes the environ-
mental effect. (Note that this optimization procedure
is rather independent of the details of the simple model
we have considered and arguably, it could be extended
to more complicated systems without any substantial
modification.) We have shown that for long time evo-
lutions the noise decreases as 1/tad while in the other
cases it increases linearly with adiabatic time. We also
have shown that the same features can be found with
different kinds of noise suggesting the possibility to find
a way to minimize the environmental effect in the pres-
ence of any noise. These results open a new possibility
for implementation of holonomic quantum gates to build
quantum computation because they seem robust against
both control error and environmental noise.
Acknowledgment
The autors thank E. De Vito for useful discussions.
One of the authors (P. S.) acknowledges support from
INFN. Financial support by the italian MIUR via
PRIN05 and INFN is acknowledged.
APPENDIX A: MINIMIZING THEOREM
Let us consider the family Cn composed of the closed
curves generated by a sequence of n paths along a parallel
(θ = const) alternated with paths along a meridian (φ =
const). We call Cn a generic curve in this family. For
example, the family C1 contains all the closed curves com-
posed by the sequence of path meridian-parallel-meridian
while the family C2 contains the curves meridian-parallel-
meridian-parallel-meridian.
We argue that the closed curve minimizing the error
in Eq. (18) can be found in the C1 family. First, we
show that any closed curve in C2 spanning a solid angle
a on the sphere can be replaced by a closed curve in C1
spanning the same angle and producing a smaller error.
In analogous way any closed curve in C3 can be replaced
by a closed curve in C2 with smaller error and so on. By
induction we obtain that any closed curve in Cn can be
replaced by a curve in C1 spanning the same solid angle
but producing smaller error. Since the curve belonging
to Cn can approximate any closed curve on the sphere,
the best curve can be found in C1.
The crucial point is to show that any curve in C2 can be
replaced by a curve in C1. Let us consider a generic curve
C2 in C2 spanning a solid angle a: composed by a seg-
ment of a meridian (with θ going from 0 to θ1), a parallel
(spanning a ∆φ1 angle), meridian (with θ : θ1 → θ2), a
parallel (spanning a ∆φ2 angle), and finally a segment
to the north pole along a meridian. Let us consider two
closed curves C11 and C
1 in C1 subtending the same solid
angle a with, respectively, θ1 and θ2 as maximum angle
spanned during the evolution along the meridian. First
we analyze (20) along the meridian. Without losing gen-
erality, we can take θ1 < θ2; it is clear from Eq. (21) that
the value of δtr along the meridian for C
1 is smaller that
for C21 : δ
. We note from the Eq. (21), suitable
extended to C2, that the two paths along the meridians
depends only on θ2 and then produce the same error of
C21 ,
< δMC2
= δMC2 . (A1)
The difference between the contribution along the par-
allel is
δPC2−δ
= ∆φ1
sin θ1 sin
2 2θ1−
1− cos θ1
1− cos θ2
sin θ2 sin
2 2θ2
δPC2−δ
= ∆φ2
sin θ2 sin
2 2θ2−
1− cos θ2
1− cos θ1
sin θ1 sin
2 2θ1
Analysis of the positivity of the quantities given by
Eqs. (A2) and (A3) shows that δPC2 cannot be at the
same time smaller than δP
and δP
. In fact, there are
two possibilities: If δPC2 > δ
, from Eq. (A1) and (A3),
δC2 = δ
+ δPC2 > δ
+ δPC1
= δC1
, (A4)
and the best closed curve is C11 . If δ
, from Eqs.
(A1) and (A2),
δC2 = δ
+ δPC2 > δ
+ δPC2
= δC2
, (A5)
and the best closed curve is C21 .
In the same way it can be shown that any closed curve
in C3 can be replaced by a closed curve in C2 with smaller
error.
[1] P. Zanardi and M. Rasetti, Phys. Lett. A 264, 94 (1999). [2] Z. Kis and F. Renzoni, Phys. Rev. A 65, 032318 (2002).
[3] F. Troiani, E. Molinari, and U. Hohenester, Phys. Rev.
Lett. 90, 206802 (2003).
[4] K. Roszak, A. Grodecka, P. Machnikowski, and T. Kuhn,
Phys. Rev. B 71, 195333 (2005).
[5] J. A. Jones, V. Vedral, A. Ekert, and G. Castagnoli, Na-
ture (London) 403, 869 (2000).
[6] G. Falci, R. Fazio, G. M. Palma, J. Siewert, and V. Ve-
dral, Nature (London) 407, 355 (2000).
[7] R. G. Unanyan, B.W. Shore, and K. Bergmann, Phys.
Rev. A 59, 2910 (1999).
[8] L.-M. Duan, J.I. Cirac, and P. Zoller, Science 292, 1695
(2001).
[9] L. Faoro, J. Siewert, and R. Fazio, Phys. Rev. Lett. 90,
028301 (2003).
[10] I. Fuentes-Guridi, J. Pachos, S. Bose, V. Vedral, and S.
Choi, Phys. Rev. A 66, 022102 (2002).
[11] A. Recati, T. Calarco, P. Zanardi, J. I. Cirac, and P.
Zoller, Phys. Rev. A 66, 032309 (2002).
[12] P. Solinas, P. Zanardi, N. Zangh̀ı, and F. Rossi, Phys.
Rev. B 67, 121307(R) (2003).
[13] A. Carollo, I. Fuentes-Guridi, M. F. Santos, and V. Ve-
dral, Phys. Rev. Lett. 90, 160402 (2003).
[14] G. De Chiara and G. M. Palma, Phys. Rev. Lett. 91,
090404 (2003).
[15] A. Carollo, I. Fuentes-Guridi, M. F. Santos, and V. Ve-
dral, Phys. Rev. Lett. 92, 020402 (2004).
[16] V.I. Kuvshinov and A.V. Kuzmin, Phys. Lett. A, 316,
391 (2003).
[17] P. Solinas, P. Zanardi, and N. Zangh̀ı, Phys. Rev. A 70,
042316 (2004).
[18] S.-L. Zhu and P. Zanardi, Phys. Rev. A 72, 020301(R)
(2005).
[19] G. Florio, P. Facchi, R. Fazio, V. Giovannetti, and S.
Pascazio, Phys. Rev. A 73, 022327 (2006).
[20] I. Fuentes-Guridi, F. Girelli, and E. Livine, Phys. Rev.
Lett. 94, 020503 (2005)
[21] D. Parodi, M. Sassetti, P. Solinas, P. Zanardi, and N.
Zangh̀ı, Phys. Rev. A 73, 052304 (2006).
[22] The explicit expression for the bright states
is |E1〉 =
(Ω|e〉 +
Ωi|i〉) and |E2〉 =
(−Ω|e〉 +
Ωi|i〉); for the dark states is
|E3〉 = 1/(Ω
|Ω+|2 + |Ω−|2)[Ω0(Ω+|+〉 + Ω−|−〉) −
(Ω2 − |Ω0|
2)|0〉]) and |E4〉 = 1/
|Ω+|2 + |Ω−|2[Ω−|+〉−
Ω+|−〉].
[23] A. O. Caldeira and A. J. Leggett, Phys. Rev. Lett. 46,
211 (1981).
[24] U. Weiss, Quantum dissipative systems (World Scientific,
Singapore, 1999).
[25] R. Alicki, M. Horodecki, P. Horodecki, R. Horodecki, L.
Jacak, and P. Machnikowski, Phys. Rev. A 70, 010501(R)
(2004).
ABSTRACT
  We study the performance of holonomic quantum gates, driven by lasers, under
the effect of a dissipative environment modeled as a thermal bath of
oscillators. We show how to enhance the performance of the gates by suitable
choice of the loop in the manifold of the controllable parameters of the laser.
For a simplified, albeit realistic model, we find the surprising result that
for a long time evolution the performance of the gate (properly estimated in
terms of average fidelity) increases. On the basis of this result, we compare
holonomic gates with the so-called stimulated Raman adiabatic passage (STIRAP)
gates.

<|endoftext|><|startoftext|>
arXiv:0704.0377v3  [hep-ph]  22 Dec 2008
The lifetime of unstable particles in electromagnetic fields
Daniele Binosi1 and Vladimir Pascalutsa1, 2
1ECT* Trento, Villa Tambosi, Villazzano, I-38050 TN, Italy
2Institut für Kernphysik, Johannes Gutenberg Universität, Mainz D-55099, Germany
(Dated: October 30, 2018)
Abstract
We show that the electromagnetic moments of unstable particles (resonances) have an absorptive
contribution which quantifies the change of the particle’s lifetime in an external electromagnetic
field. To give an example we compute here the imaginary part of the magnetic moment for the
cases of the muon and the neutron at leading order in the electroweak coupling. We also consider
an analogous effect for the strongly-decaying ∆(1232) resonance. The result for the muon is
Imµ = eG2Fm
3/768π3, with e the charge and m the mass of the muon, GF the Fermi constant,
which in an external magnetic field of B Tesla give rise to the relative change in the muon lifetime
of 3 × 10−15 B. For neutron the effect is of a similar magnitude. We speculate on the observable
implications of this effect.
PACS numbers: 13.40.Em, 13.35.-r, 12.15.Lk, 23.40.-s
http://arxiv.org/abs/0704.0377v3
I. INTRODUCTION
The electromagnetic (e.m.) moments of a particle are among the few fundamental quan-
tities which describe the particle properties and as such have thoroughly been studied. The
most renowned examples are the magnetic moments of the electron and the muon which
have been measured to unprecendented accuracy and yielded a number of physical insights,
see[1] for recent reviews. What is far lesser known is that the e.m. moments of unstable
particles are complex numbers in general [2, 3]. Their imaginary part reflects, of course, the
unstable nature of the particle, however, the precise interpretation has been missing. In this
paper we work out the relation, suggested first by Holstein [4], which should exist between
the imaginary part of the magnetic moment and the effect of an external magnetic field on
particle’s lifetime.
The argument for such a relation is very simple. The (self-)energy of the particle with a
lifetime τ has an absorptive part, which has an interpretation of the width Γ = 1/τ . The
particle’s magnetic moment ~µ in the presence of magnetic field ~B induces the change in
the energy: −~µ · ~B. The latter contribution can then also change the width, provided the
magnetic moment has an absorptive part (Imµ 6= 0).
The decay properties of unstable particles, such as muon or neutron are extremely
well studied and are widely used for the precise determination of the Standard Model
parameters[5, 6]. There are also a plethora of studies of how these particles behave in
e.m. fields. A well-known example is the search for the neutron’s electric dipole moment[7].
In view of these studies it is compelling to investigate how the decay properties of unstable
particles may be affected by e.m. fields.
The lifetime of unstable quantum-mechanical systems is known to be affected by an
e.m. field. Positronium provides a textbook example[8], where the effect arises due to the
admixture of para- (S = 0) and ortho- (S = 1) positronium states with orbital momentum
l = 0 by the magnetic field interacting with the magnetic moments of the constituents. As
the result, already in the field of B = 0.2 Tesla, the lifetime of ortho-positronium decreases
by almost a factor of 2.
It is far from obvious how the same kind of an effect can arise for an elementary unstable
particle, e.g., the muon. The above-mentioned relation between the imaginary part of the
magnetic moment and the lifetime change may, therefore, provide us with both an interpre-
µ νµ µ
FIG. 1: The muon self-energy contributing to its decay width.
tation for the imaginary part of the magnetic moment and the means to compute the effect
of the lifetime change.
In the following we examine in detail the case of the muon, compute the leading contri-
bution to Imµ and the corresponding effect on the lifetime. Then we will briefly discuss the
cases of the neutron and of the ∆-resonance.
II. MUON DECAY (µ → e νeνµ)
The leading contribution to the muon decay width arises at two-loop level, see Fig. 1. For
our purposes, the W propagators in this graph can safely be assumed to be static — Fermi
theory. We also neglect the mass of the electron in the loops, since it leads to an under-
percent correction of O(me/m); here and in what follows, m is the muon mass. The graphs
with other Standard Model fermions (e.g., quarks) in the loops need not to be considered
here, because they cannot give any contribution to the muon width.
Using dimensional regularization, we compute this graph in d = 4 − 2ǫ dimensions (in
the limit ǫ → 0+),[14]
Σ (p/) =
(2π)d
2γµ(1− γ5) (p/− k/) γν
(p− k)2 + iε
Πµν(k). (1)
where MW is the W -boson mass, g = |e|/ sin θW is the electroweak coupling related to the
Fermi constant by GF/
2 = g2/8M2W , e is the charge, θW is the Weinberg angle, and
Πµν(k) =
d (d− 2)
(4π)d/2(d− 1)
Γ (ǫ)Γ (1− ǫ)2
Γ (2− 2ǫ)
× (−k2)−ǫ
k2gµν − kµkν
is the one-loop correction to the polarization tensor of the W boson. The decay width can
then be found as Γ = −2 ImΣ (p/ = m). A brief calculation shows that the self-energy has
the following form:
Σ (p/) = v(s) p/ (1− γ5) , (3)
with s = p2 and the scalar function v given by:
v(s) = −
G2F s
3(4π)4
− 2γE − 2 ln
+O(ǫ)
, (4)
where γE = −Γ ′(1) is the Euler’s constant. The absorptive part of this function stems from
the logarithm [ln(−s− iε) = ln s− iπ, for s > 0]:
Im v(s) = −
G2F s
384π3
. (5)
Terefore, the width is Γ = −2m Im v(m2), and the muon lifetime:
τ = 192π3/(G2Fm
5) ≃ 2.187× 10−6 sec, (6)
This result is of course long-known due to the seminal work of Feynman and Gell-Mann on
Fermi theory[9]. It is in a percent agreement with the experimental value[5]:
τ (exp) = (2.19703± 0.00004) 10−6 sec, (7)
The discrepancy is due the neglect of the electron mass and some radiative corrections,
c.f.[10]. We now investigate the influence of the e.m. field on the leading contribution given
by Eq. (6).
Let us denote by Σ (x, y;Aµ) the self-energy in the presence of an external e.m. field Aµ.
It is obtained by minimal substitution (∂µ → ∂µ − ieAµ) of the derivatives of all charged
fields into the self-energy of Fig. 1. Expanding in the e.m. coupling, we obtain:
Σ [x, y;Aµ] = Σ (i∂/
x) δ4(x− y)
dz Λµ(x, y; z)Aµ(z) +O(e
2A2), (8)
where Σ (i∂/) is the already computed self-energy in the vacuum, while Λ is the e.m. vertex
correction of Fig. 2, with static W ’s.
Denoting p (p′) the 4-momentum of the initial (final) muon and assuming the on-shell
situation (p2 = p′
= p · p′ = m2), the vertex correction has in the momentum space the
following general form:
Λµ(p′, p) = e
F γµ +G
(p+ p′)µ
+ FA γ
, (9)
where F , G and FA are complex numbers. Note that eF/2m is the correction to the magnetic
moment, and eF + eG is the correction to the electric charge. The Ward-Takahashi (WT)
identity:
(p′ − p) · Λ(p′, p) = e [Σ (p/)− Σ (p/′)] (10)
µ µνµ
FIG. 2: Electromagnetic correction to the muon decay.
with the self-energy in Eq. (3) leads to the following conditions:
F +G = −v(m2)− 2m2v′(m2), FA = v(m2) . (11)
Therefore, the term FA is in fact necessary by the e.m. gauge invariance. The γ5 terms, in
both self-energy and the vertex, are shown to vanish when summing over all the fermions
in Standard Model[11]. However, this does not happen for the imaginary part because the
heavier fermions do not contribute.
The expression for the graph in Fig. 2 is (in Fermi theory) given by
Λµ(p′, p) = −
64M4W
(2π)d
(2π)d
γβ(1− γ5)
2γα(1− γ5) (p/′ − k/1) γµ (p/− k/1) γβ (k/1 − k/2)
(k1 − k2)2 (p− k1)2 (p′ − k1)2
After a lengthy calculation we obtain the following result:
ImF =
384π3
, ImG =
, ImFA = −
384π3
, (13)
hence satisfying the gauge-invariance conditions Eq. (11), for Im v given by Eq. (5).
We would like to emphasize here that, of course, not only the magnetic moment, but
also the charge operator receives an imaginary contribution, equal to e Im(F +G). However,
through the WT identity, this contribution is completely fixed by the momentum dependence
of the self-energy, and therefore is not independent. The same holds for FA. We thus
discuss only the effect of the absorptive part of the magnetic moment, here given by Imµ =
e ImF/2m = eG2Fm
3/768π3.
The energy of the magnetic moment interacting with the magnetic field is equal to −µBz,
with Bz being the projection of the field along the muon spin. Then the total energy, in the
muon rest-frame, is given by: m − (i/2)Γ− µBz. We thus deduce that the absorptive part
yields the following change in the muon width:
∆Γ = 2 ImµBz =
192π3
Bz , (14)
while the change in the lifetime is ∆τ = −(∆Γ/Γ) τ , for ∆Γ/Γ ≪ 1.
Given this result, we conclude that the positively-charged muons live shorter (longer) in
a uniform magnetic field if their spin is aligned along (against) the field. For the relative
change in the width we find:
|eBz|
<∼ 3× 10−15B T−1, (15)
where B is the strength of the field in Tesla. Therefore, in moderate magnetic fields the
change in the muon lifetime is tiny, well beyond the present experimental accuracy (which
is at the ppm level). We will dwell on this more in the concluding part of the paper, but for
now we turn to a more technical issue.
It is interesting to observe that the result of Eq. (13), can simply be obtained by the
minimal substitution into Eq. (3), rather than into the electron propagator in Eq. (1). To
show this we go to coordinate space and hence write the self-energy as Σ (x, y) = Σ (i∂/
) δ(x − y). The minimal substitution to the first order in e leads to the following vertex
correction:
Λ̃µ(x, y; z) = − δ/δAµ(z)Σ (i∂/ + eA/ ) δ(x− y) |A=0 . (16)
Note that in general this is different from the vertex function in Eq. (8), since in the latter
the minimal substitution is performed also in the internal lines. The general form of Eq. (9),
of course, applies here as well, but now the scalar functions are completely specified by the
self-energy:
F̃ = −v(m2), G̃ = −2m2 v′(m2), F̃A = v(m2) . (17)
Substituting the explicit form of Im v, we see that this method unambiguously leads to
exactly the same result [Eq. (13)] as the full calculation. We emphasize though, that this
method cannot always work (see, e.g., Ref.[12]), as will also be clear from the following
examples. Nevertheless, it is worthwhile to investigate this method further, since knowing
whether it is applicable a priori can enormously facilitate the calculations.
III. NEUTRON DECAY AND THE ∆-RESONANCE
We consider now the neutron β-decay. Assuming exact V − A interaction (gA = 1) and
neglecting the electron mass (but not the proton mass, mp), the corresponding two-loop
self-energy can still be written in the form of Eq. (3). We introduce δ = (s −m2p)/2s and
treat it as a small parameter, since in the physical case (where s = m2n), δ ≃ 1.293 × 10−3.
A simple calculation then yields:
Im v(s) = −G
F |Vud|2
s2 δ5, (18)
where Vud is the relevant quark-mixing (CKM) matrix element. We note in passing that
this result leads to the lifetime of τn ≈ 622 sec, to be compared with the experimental
value of 886 sec. This 30% disagreement is largely due to the fact that in reality the axial
coupling gA deviates from 1. However, for our order-of-magnitude estimate this discrepancy
is unimportant.
What is important is that the derivative of the self-energy is enhanced by one power of
Im v′(s) = −(GF |Vud|)
s δ4 . (19)
and this opens the possibility for the enhancement of the effect in the lifetime. Namely, the
relative change in the neutron width then goes as
|∆Γn|
∼ µN |Bz|
mn −mp
<∼ 3× 10−14B T−1, (20)
where µN ≃ 3.15× 10−14 MeV T−1 is the nuclear magneton. A more precise analysis of this
effect for the neutron is beyond the scope of this paper. We focus instead on the example
of the ∆-resonance, where such an enhancement will be shown to be even more dramatic,
at least qualitatively.
The ∆ resonance decays strongly into the pion and nucleon, ∆ → πN , and the cor-
responding self-energy, to leading order in chiral effective-field theory, yields the following
result for the absorptive part[3]:
ImΣ∆(p/) = −23πλ
3C2 (α p/+mN ) , (21)
where the isospin symmetry is assumed, e.g., mp = mn = mN . The constant C =
hAm∆/8πfπ ≃ 1.5, where hA represents the πN∆ coupling and is fitted to the empira-
cal width of the ∆, fπ ≃ 93 MeV is the pion-decay constant, and m∆ = 1232 MeV is the ∆
π(b)(a)
FIG. 3: The leading chiral-loop correction to the magnetic moment of the ∆.
mass. For simplicity we neglect the pion mass (i.e., take the chiral limit). Then, in Eq. (21),
λ = (s−m2N)/2s, α = 1− λ. For s = m∆, λ ≈ (m∆ −mN)/mN ∼ 1/3 is a small parameter
in the chiral effective-field theory with ∆’s (see Ref.[13] for a recent review), and will so be
treated here too.
The absorptive part of the magnetic dipole moment of the ∆ arises at this order from
graphs in Fig. 3. These graphs, computed in Ref.[3], in the chiral limit yield the following
result (upto λ4 terms):
ImF (a) = 4πC2(λ− 3λ2 + 43
λ3) ,
ImG(a) = 4πC2(−λ+ 4λ2 − 71
λ3) ,
ImF (b) = 4πC2(λ2 + 1
λ3) , (22)
ImG(b) = −32
πC2λ3 ,
where F and G correspond with the decomposition in Eq. (9), with the superscript referring
to the corresponding graphs in Fig. 3; FA is absent in this case, of course.
First of all, we observe that this result satisfies the WT conditions, Eq. (11), for each of
the four charge states of the ∆,
∆++ : Im [F (a) +G(a) + F (b) +G(b)] = −2 ImΣ ′∆ ,
∆+ : Im [1
(F (a) +G(a)) + 2
(F (b) +G(b))] = − ImΣ ′∆ ,
∆0 : Im [−1
(F (a) +G(a)) + 1
(F (b) +G(b))] = 0 , (23)
∆− : − Im [F (a) +G(a)] = ImΣ ′∆ ,
where Σ ′∆ = ∂/∂p/Σ∆(p/)|p/=m∆, and hence ImΣ ′∆ = 4πC2(−λ2 + 73λ
At the same time, the ‘naive’ minimal-substitution procedure [Eq. (16)], that happens
to work for the muon, fails here miserably. It would predict that the magnetic moment
contribution would go with the same power as the self-energy [Eq. (17)], which for the
absorptive part means Imµ ∼ ImΣ (m∆) ∼ λ3. In reality it goes as λ. E.g., for the ∆+:
Imµ∆+ = (e/2m∆) Im[
F (a) + 2
F (b)]
π µNC
2 λ+O(λ2). (24)
The fact that the self-energy goes as λ3, while Imµ as λ has as a consequence the enhance-
ment of the lifetime change in the magnetic field by two powers of λ.
Quantitatively such enhancements of the lifetime change over the lifetime by the phase-
space volume do not make much difference in the above examples. However, it shows that
it might be useful to look for manifestations of the lifetime change in the medium where the
phase-space volume can be varied.
IV. CONCLUSIONS AND OUTLOOK
We have examined her a concept of the ‘absorptive magnetic moment’ — an intrinsic
property of an unstable particle, together with the width or the lifetime. It manifests itself
in the change of the particle’s lefetime in an external magnetic field, see Eq. (25) below.
We have computed this quantity for the examples of muon, neutron and ∆-resonance to
leading order in couplings. In all the three cosidered cases the effect on the lifetime is tiny
for normal magnetic fields: in a uniform field of 1 Tesla the change in the lifetime is of order
of 10−13 percent, at most.
In the case of the muon we have computed this effect to the leading order in the elec-
troweak coupling; the change in the lifetime is
∆τ = −2 ImµBz τ 2 = −96π3eBz/(G2Fm7) , (25)
or, numerically, |∆τ | <∼ 6× 10−21 (B/T) sec. A direct measurement of this effect is therefore
beyond the present experimental precision. Nevertheless, it is worthwhile to investigate
the effect of the magnetic field on the differential decay rates, with the hope that some
asymmetries could show a significantly bigger sensitivity.
A notable feature of this effect is that the relative change of the lifetime is inversely
proportional to the phase space. It goes as (mn −mp)−1 in the neutron case, and as (m∆ −
−2 in the ∆-resonance case. (The difference in power is apparently because the neutron
decays solely into fermions while the ∆ has a boson in the decay product.) One can expect
that in the conditions where the phase-space is significantly reduced, e.g. for the neutron in
nuclear medium, the effect of the lifetime change may become measurable.
Especially interesting would be to evaluate the manifestations of this effect in neutron star
formations. Not only the phase-space of the neutron decay is shrinking, the protons decay
too, and all that occurs in magnetic fields as large as 1010 Tesla. Even larger fields can be
achieved in atomic or nuclear systems. Finally, it is worthwhile to point out that in lattice
QCD studies strong magnetic fields are standardly used to compute the electromagnetic
properties of hadrons. Combined with the lattice techniques of extracting the width, the
relation between the absorprive part and the lifetime change may allow to compute the
former on the lattice for unstable hadrons.
Acknowledgments
We thank Barry Holstein and Marc Vanderhaeghen for a number of insightful discus-
sions. The work of V.P. is partially supported by the European Community-Research In-
frastructure Activity under the FP6 ”Structuring the European Research Area” programme
(HadronPhysics, contract RII3-CT-2004-506078).
[1] M. Passera, J. Phys. G 31, R75 (2005); J. P. Miller, E. de Rafael and B. L. Roberts, Rept.
Prog. Phys. 70, 795 (2007).
[2] L. V. Avdeev and M. Y. Kalmykov, Phys. Lett. B 436, 132 (1998).
[3] V. Pascalutsa and M. Vanderhaeghen, Phys. Rev. Lett. 94, 102003 (2005); Phys. Rev. D 77,
014027 (2008).
[4] B. R. Holstein, unpublished.
[5] W. M. Yao et al. [Particle Data Group],“Review of particle physics,”J. Phys. G 33, 1 (2006).
[6] D. Tomono [RIKEN RAL R77 Collaboration], AIP Conf. Proc. 842, 906 (2006); K. R. Lynch,
AIP Conf. Proc. 870, 333 (2006); J. S. Nico, AIP Conf. Proc. 870, 132 (2006); A. P. Serebrov
et al., arXiv:nucl-ex/0702009.
[7] P. G. Harris et al., Phys. Rev. Lett. 82, 904 (1999); C. A. Baker et al., Phys. Rev. Lett. 97,
131801 (2006).
[8] J. L. Basdevant and J. Dalibard, “Quantum Mechanics Solver,” (Springer, Berlin, 2005).
[9] R. P. Feynman and M. Gell-Mann, Phys. Rev. 109, 193 (1958).
[10] T. van Ritbergen and R. G. Stuart, Phys. Rev. Lett. 82, 488 (1999).
[11] A. Czarnecki and B. Krause, Nucl. Phys. Proc. Suppl. 51C, 148 (1996).
[12] J. H. Koch, V. Pascalutsa and S. Scherer, Phys. Rev. C 65, 045202 (2002).
[13] V. Pascalutsa, M. Vanderhaeghen and S. N. Yang, Phys. Rept. 437, 125 (2007).
[14] Our conventions are: metric (+,−,−,−), ε0123 = +1, γ5 = iγ0γ1γ2γ3, γ’s stand for Dirac
matrices and their totally-antisymmetric products: γµν = 1
[γµ, γν ], γµνα = 1
{γµν , γα},
γµναβ = 1
[γµνα, γβ].
ABSTRACT
  We show that the electromagnetic moments of unstable particles (resonances)
have an absorptive contribution which quantifies the change of the particle's
lifetime in an external electromagnetic field. To give an example we compute
here the imaginary part of the magnetic moment for the cases of the muon and
the neutron at leading order in the electroweak coupling. We also consider an
analogous effect for the strongly-decaying $\Delta$(1232) resonance. The result
for the muon is Im$ \mu = e G_F^2 m^3/768 \pi^3$, with $e$ the charge and $m$
the mass of the muon, $G_F$ the Fermi constant, which in an external magnetic
field of $B$ Tesla give rise to the relative change in the muon lifetime of
$3\times 10^{-15} B$. For neutron the effect is of a similar magnitude. We
speculate on the observable implications of this effect.

<|endoftext|><|startoftext|>
Introduction
For an integrable function a : {z ∈ C | |z| = 1} → C defined on the unit
circle in the complex plane, the n× n Toeplitz matrix Tn(a) with symbol a
is defined by
Tn(a)
= aj−k, j, k = 1, . . . , n, (1.1)
where ak is the kth Fourier coefficient of a,
a(eiθ)e−ikθ dθ. (1.2)
In this paper we study banded Toeplitz matrices for which the symbol has
only a finite number of non-zero Fourier coefficients. We assume that there
exist p, q ≥ 1 such that
a(z) =
k, ap 6= 0, a−q 6= 0. (1.3)
Department of Mathematics, Katholieke Universiteit Leuven, Celestijnenlaan 200B,
3001 Leuven, Belgium. (maurice.duits@wis.kuleuven.be, arno.kuijlaars@wis.kuleuven.be).
The first author is a research assistant of the Fund for Scientific Research – Flanders.
The authors were supported by the European Science Foundation Program MISGAM.
The second author is supported by FWO-Flanders project G.0455.04, by K.U. Leuven
research grant OT/04/21, by Belgian Interuniversity Attraction Pole NOSY P06/02, and
by a grant from the Ministry of Education and Science of Spain, project code MTM2005-
08648-C02-01.
http://arxiv.org/abs/0704.0378v1
2 MAURICE DUITS AND ARNO B.J. KUIJLAARS
Thus Tn(a) has at most p + q + 1 non-zero diagonals. As in [1, p. 263], we
also assume without loss of generality that
g.c.d. {k ∈ Z | ak 6= 0} = 1. (1.4)
We are interested in the limiting behavior of the spectrum of Tn(a) as
n → ∞. We use spTn(a) to denote the spectrum of Tn(a):
spTn(a) = {λ ∈ C | det(Tn(a)− λI) = 0}
Spectral properties of banded Toeplitz matrices are the topic of the recent
book [1] by Böttcher and Grudsky. We will refer to this book frequently,
in particular to Chapter 11 where the limiting behavior of the spectrum is
discussed.
The limiting behavior of spTn(a) was characterized by Schmidt and Spitzer
[10]. They considered the set
lim inf
spTn(a), (1.5)
consisting of all λ ∈ C such that there exists a sequence {λn}n∈N, with
λn ∈ spTn(a), converging to λ, and the set
lim sup
spTn(a), (1.6)
consisting of all λ such that there exists a sequence {λn}n∈N, with λn ∈
spTn(a), that has a subsequence converging to λ. Schmidt and Spitzer
showed that these two sets are equal and can be characterized in terms of
the algebraic equation
a(z) − λ =
k − λ = 0. (1.7)
For every λ ∈ C there are p+q solutions for (1.7), which we denote by zj(λ),
for j = 1, . . . , p+ q. We order these solutions by absolute value, so that
0 < |z1(λ)| ≤ |z2(λ)| ≤ · · · ≤ |zp+q(λ)|. (1.8)
When all inequalities in (1.8) are strict then the values zk(λ) are unambigu-
ously defined. If equalities occur then we choose an arbitrary numbering
so that (1.8) holds. The result by Schmidt and Spitzer [10], [1, Theorem
11.17], is that
lim inf
spTn(a) = lim sup
spTn(a) = Γ0 (1.9)
where
Γ0 := {λ ∈ C | |zq(λ)| = |zq+1(λ)|}. (1.10)
This result gives a description of the asymptotic location of the eigenvalues.
The eigenvalues accumulate on the set Γ0, which is known to be a disjoint
union of a finite number of (open) analytic arcs and a finite number of ex-
ceptional points [1, Theorem 11.9]. It is also known that Γ0 is connected
EIGENVALUES OF BANDED TOEPLITZ MATRICES 3
[13], [1, Theorem 11.19], and that C \ Γ0 need not be connected [1, Theo-
rem 11.20], [2, Proposition 5.2]. See [1] for many beautiful illustrations of
eigenvalues of banded Toeplitz matrices.
The limiting eigenvalue distribution was determined by Hirschman [5], [1,
Theorem 11.16]. He showed that there exists a Borel probability measure
µ0 on Γ0 such that the normalized eigenvalue counting measure of Tn(a)
converges weakly to µ0, as n → ∞. That is,
λ∈spTn(a)
δλ → µ0, (1.11)
where in the sum each eigenvalue is counted according to its multiplicity.
The measure µ0 is absolutely continuous with respect to the arclength mea-
sure on Γ0 and has an analytic density on each open analytic arc in Γ0,
which can be explicitly represented in terms of the solutions of the algebraic
equation (1.7) as follows. Equip every open analytic arc in Γ0 with an orien-
tation. The orientation induces ±-sides on each arc, where the +-side is on
the left when traversing the arc according to its orientation, and the −-side
is on the left. The limiting measure µ0 is then given by
dµ0(λ) =
zj+(λ)
zj−(λ)
dλ. (1.12)
where dλ is the complex line element on Γ0 (taken according to the orien-
tation), and where zj±(λ), λ ∈ Γ0, is the limiting value of zj(λ
′) as λ′ → λ
from the ± side of the arc. These limiting values exist for every λ ∈ Γ0,
with the possible exception of the finite number of exceptional points.
Note that the right-hand side of (1.12) is a priori a complex measure and
it is not immediately clear that it is in fact a probability measure. In the
original paper [5] and in the book [1, Theorem 11.16], the authors give a
different expression for the limiting density, from which it is clear that the
measure is non-negative. We prefer to work with the complex expression
(1.12), since it allows for a direct generalization which we will need in this
paper.
Note also that if we reverse the orientation on an arc in Γ0, then the ±-
sides are reversed. Since the complex line element dλ changes sign as well,
the expression (1.12) does not depend on the choice of orientation.
The following is a very simple example, which however serves as a moti-
vation for the results in the paper.
Example 1.1. Consider the symbol a(z) = z+1/z. In this case we find that
Γ0 = [−2, 2] and µ0 is absolutely continuous with respect to the Lebesgue
measure and has density
dµ0(λ)
4− λ2
, λ ∈ (−2, 2). (1.13)
4 MAURICE DUITS AND ARNO B.J. KUIJLAARS
This measure is well-known in potential theory and is called the arcsine
measure or the equilibrium measure of Γ0, see e.g. [9]. It has the property
that it minimizes the energy functional I defined by
I(µ) =
|x− y| dµ(x) dµ(y), (1.14)
among all Borel probability measures µ on [−2, 2]. The measure µ0 is also
characterized by the equilibrium condition
log |x− λ| dµ0(λ) = 0, x ∈ [−2, 2], (1.15)
which is the Euler-Lagrange variational condition for the minimization prob-
The fact that µ0 is the equilibrium measure of Γ0 is special for symbols
a with p = q = 1. In that case one may think of the eigenvalues of Tn(a) as
charged particles on Γ0, each eigenvalue having a total charge 1/n, that repel
each other with logarithmic interaction. The particles seek to minimize the
energy functional (1.14). As n → ∞, they distribute themselves according
to µ0 and µ0 is the minimizer of (1.14) among all probability measures
supported on Γ0.
The aim of this paper is to characterize µ0 for general symbols a of the
form (1.3) also in terms of an equilibrium problem from potential theory.
The corresponding equilibrium problem is more complicated since it involves
not only the measure µ0, but a sequence of p+ q − 1 measures
µ−q+1, µ−q+2, . . . , µ−1, µ0, µ1, . . . , µp−2, µp−1
that jointly minimize an energy functional.
2. Statement of results
2.1. The energy functional. To state our results we need to introduce
some notions from potential theory. Main references for potential theory in
the complex plane are [8] and [9].
We will mainly work with finite positive measures on C, but we will also
use ν1 − ν2 where ν1 and ν2 are positive measures. The measures need not
have bounded support. If ν has unbounded support then we assume that
log(1 + |x|) dν(x) < ∞. (2.1)
In that case the logarithmic energy of ν is defined as
I(ν) =
|x− y| dν(x)dν(y) (2.2)
and I(ν) ∈ (−∞,+∞].
EIGENVALUES OF BANDED TOEPLITZ MATRICES 5
Definition 2.1. We define Me as the collection of positive measures ν on
C satisfying (2.1) and having finite energy, i.e., I(ν) < +∞. For c > 0 we
define
Me(c) = {ν ∈ Me | ν(C) = c}. (2.3)
The mutual energy I(ν1, ν2) of two measures ν1 and ν2 is
I(ν1, ν2) =
|x− y| dν1(x)dν2(y). (2.4)
It is well-defined and finite if ν1, ν2 ∈ Me and in that case we have
I(ν1 − ν2) = I(ν1) + I(ν2)− 2I(ν1, ν2). (2.5)
If ν1, ν2 ∈ Me(c) for some c > 0, then
I(ν1 − ν2) ≥ 0, (2.6)
with equality if and only if ν1 = ν2. This is a well-known result if ν1 and ν2
have compact support [9]. For measures in Me(c) with unbounded support,
this is a recent result of Simeonov [11], who obtained this from a very elegant
integral representation for I(ν1 − ν2). It is a consequence of (2.6) that I is
strictly convex on Me(c), since
ν1 + ν2
(I(ν1) + I(ν2))− I
ν1 − ν2
(I(ν1) + I(ν2)) , for ν1, ν2 ∈ Me(c),
with equality if and only if ν1 = ν2.
Before we can state the equilibrium problem we also need to introduce
the sets
Γk := {λ ∈ C | |zq+k(λ)| = |zq+k+1(λ)|}, k = −q + 1, . . . , p− 1, (2.7)
which for k = 0 reduces to the definition (1.10) of Γ0. We will show that
each Γk is the disjoint union of a finite number of open analytic arcs and
a finite number of exceptional points. All Γk are unbounded, except for Γ0
which is compact.
The equilibrium problem will be defined for a vector of measures denoted
by ~ν = (ν−q+1, . . . , νp−1). The component νk is a measure on Γk satisfying
some additional properties that are given in the following definition.
Definition 2.2. We call a vector of measures ~ν = (ν−q+1, . . . , νp−1) admis-
sible if νk ∈ Me, νk is supported on Γk, and
νk(Γk) =
if k ≤ 0,
if k ≥ 0,
(2.8)
for every k = −q + 1, . . . , p − 1.
Now we are ready to state our first result. The proof is given in section
6 MAURICE DUITS AND ARNO B.J. KUIJLAARS
Theorem 2.3. Let the symbol a satisfy (1.3) and (1.4), and let the curves
Γk be defined as in (2.7). For each k ∈ {−q + 1, . . . , p − 1}, define the
measure µk on Γk by
dµk(λ) =
zj+(λ)
zj−(λ)
dλ, (2.9)
where dλ is the complex line element on each analytic arc of Γk according
to a chosen orientation of Γk (cf. discussion after (1.12)). Then
(a) ~µ = (µ−q+1, . . . , µp−1) is admissible.
(b) There exist constants lk such that
log |λ− x| dµk(x) =
log |λ− x| dµk+1(x) +
log |λ− x| dµk−1(x) + lk,
(2.10)
for k = −q + 1, . . . , p − 1, and λ ∈ Γk. Here we let µ−q and µp be
the zero measures.
(c) ~µ = (µ−q+1, . . . , µp−1) is the unique minimizer of the energy func-
tional J defined by
J(~ν) =
k=−q+1
I(νk)−
k=−q+1
I(νk, νk+1) (2.11)
for admissible vectors of measures ~ν = (ν−q+1, . . . , νp−1).
The relations (2.10) are the Euler-Lagrange variational conditions for the
minimization problem for J among admissible vectors of measures.
It may not be obvious that the energy functional (2.11) is bounded from
below. This can be seen from the alternative representation
J(~ν) =
I(ν0) +
k(k + 1) I
ν−q+k
− ν−q+k+1
k + 1
k(k + 1) I
− νp−k−1
k + 1
. (2.12)
We leave the calculation leading to this identity to the reader. Under the
normalizations (2.8) it follows by (2.6) that each term in the two finite sums
on the right-hand side of (2.12) is non-negative, so that
J(~ν) ≥
I(ν0).
Since ν0 is a Borel probability measure on Γ0 and Γ0 is compact, we indeed
have that the energy functional is bounded from below on admissible vectors
of measures ~ν.
The alternative representation (2.12) will play a role in the proof of The-
orem 2.3.
EIGENVALUES OF BANDED TOEPLITZ MATRICES 7
Yet another representation for J is
J(~ν) =
j,k=−q+1
Ajk I(νj , νk) (2.13)
where the interaction matrix A has entries
Ajk =
1, if j = k,
, if |j − k| = 1,
0, if |j − k| ≥ 2.
(2.14)
The energy functional in the form (2.13) and (2.14) also appears in the
theory of simultaneous rational approximation, where it is the interaction
matrix for a Nikishin system [7, Chapter 5].
It allows for the following physical interpretation: on each of the curves Γk
one puts charged particles with total charge (q+k)/q or (p−k)/p, depending
on whether k ≤ 0 or k ≥ 0. Particles that lie on the same curve repel each
other. The particles on two consecutive curves interact in the sense that
they attract each other but in a way that is half as strong as the repulsion
on a single curve. Particles on different curves that are not consecutive do
not interact with each other in a direct way.
2.2. The measures µk as limiting measures of generalized eigenval-
ues. By (1.12) and Theorem 2.3 we know that the measure µ0 that appears
in the minimizer of the energy functional J is the limiting measure for the
eigenvalues of Tn(a). It is natural to ask about the other measures µk that
appear in the minimizer. In our second result we show that the measures
µk can be obtained as limiting counting measures for certain generalized
eigenvalues.
Let k ∈ {−q+1, . . . , p− 1}. We use Tn(z−k(a−λ) to denote the Toeplitz
matrix with the symbol z 7→ z−k(a(z) − λ). For example, for k = 1, q = 1
and p = 2, we have
−k(a−λ)) =
a1 a0 − λ a−1
a2 a1 a0 − λ a−1
a2 a1 a0 − λ a−1
. . .
. . .
. . .
. . .
a2 a1 a0 − λ a−1
a2 a1 a0 − λ
a2 a1
Definition 2.4. For k ∈ {−q + 1, . . . , p − 1} and n ≥ 1, we define the
polynomial Pk,n by
Pk,n(λ) = detTn(z
−k(a− λ)) (2.15)
and we define the kth generalized spectrum of Tn(a) by
spk Tn(a) = {λ ∈ C | Pk,n(λ) = 0}. (2.16)
8 MAURICE DUITS AND ARNO B.J. KUIJLAARS
Finally, we define µk,n as the normalized zero counting measure of spk Tn(a)
µk,n =
λ∈spk Tn(a)
δλ (2.17)
where in the sum each λ is counted according to its multiplicity as a zero of
Pk,n.
Note that λ ∈ spk Tn(a) is a generalized eigenvalue (in the usual sense) for
the matrix pencil (Tn(z
−ka), Tn(z
−k)), that is, det(A − λB) = 0 with A =
−ka) and B = Tn(z
−k). If k = 0, then B = I and sp0 Tn(a) = spTn(a).
If k 6= 0, then B is not invertible and the generalized eigenvalue problem is
singular, causing that there are less than n generalized eigenvalues. In fact,
since Tn(z
−k(a−λ)) has exactly n−|k| entries a0−λ, we easily get that the
degree of Pk,n is at most n− |k| and so there are at most n− |k| generalized
eigenvalues. Due to the band structure of Tn(z
−k(a−λ)) the actual number
of generalized eigenvalues is substantially smaller.
Proposition 2.5. Let k ∈ {−q+1, . . . , p−1}. Let Pk,n(λ) = γk,nλdk,n + · · ·
have degree dk,n and leading coefficient γk,n 6= 0. Then
dk,n ≤
n, if k < 0,
n, if k > 0.
(2.18)
Equality holds in (2.18) if either k > 0 and n is a multiple of p, or k < 0
and n is a multiple of q, and in those cases we have
γk,n =
(−1)(k+1)na|k|n/q−q , if k < 0 and n ≡ 0 mod q,
(−1)(k+1)nakn/pp , if k > 0 and n ≡ 0 mod p.
(2.19)
We now come to our second main result. It is the analogue of the results
of Schmidt-Spitzer and Hirschman for the generalized eigenvalues.
Theorem 2.6. Let k ∈ {−q + 1, . . . , p− 1}. Then
lim inf
spk Tn(a) = lim sup
spk Tn(a) = Γk, (2.20)
φ(z) dµk,n(z) =
φ(z) dµk(z) (2.21)
holds for every bounded continuous function φ on C.
The key element in the proof of Theorem 2.6 is a beautiful formula of
Widom [14], see [1, Theorem 2.8], for the determinant of a banded Toeplitz
matrix. In the present situation Widom’s formula yields the following. Let
λ ∈ C be such that the solutions zj(λ) of the algebraic equation (1.7) are
mutually distinct. Then
Pk,n(λ) = detTn(z
−k(a− λ)) =
CM (λ) (wM (λ))
, (2.22)
EIGENVALUES OF BANDED TOEPLITZ MATRICES 9
where the sum is over all subsets M ⊂ {1, 2, . . . , p+ q} of cardinality |M | =
p− k and for each such M , we have
wM (λ) := (−1)p−kap
zj(λ), (2.23)
and (with M := {1, 2, . . . , p+ q} \M),
CM (λ) :=
zj(λ)
(zj(λ)− zl(λ))−1. (2.24)
The formula (2.22) shows that for large n, the main contribution comes from
those M for which |wM (λ)| is the largest possible. For λ ∈ C \ Γk there is a
unique such M , namely
M = Mk := {q + k + 1, q + k + 2, . . . , p + q} (2.25)
because of the ordering (1.8).
2.3. Overview of the rest of the paper. In section 3 we will state some
preliminary results about analyticity properties of the solutions zj of the al-
gebraic equation (1.7). These results will be needed in the proof of Theorem
2.3 which is given in section 4. In section 5 we will prove Proposition 2.5
and Theorem 2.6. Finally, we conclude the paper by giving some examples
in section 6.
3. Preliminaries
In this section we collect a number of properties of the curves Γk and the
solutions z1(λ), . . . , zp+q(λ) of the algebraic equation (1.7). For convenience
we define throughout the rest of the paper
Γ−q = Γp = ∅, and µ−q = µp = 0. (the zero-measure).
Occasionally we also use
z0(λ) = 0, zp+q+1(λ) = +∞.
3.1. The structure of the curves Γk. We start with a definition, cf. [1,
§11.2].
Definition 3.1. A point λ0 ∈ C is called a branch point if a(z) − λ0 = 0
has a multiple root. A point λ0 ∈ Γk is an exceptional point of Γk if λ0 is a
branch point, or if there is no open neighborhood U of λ such that Γk ∩U is
an analytic arc starting and terminating on ∂U .
If λ0 is a branch point, then there is a z0 such that a(z0) = λ0 and
a′(z0) = 0. Then we may assume that z0 = zq+k(λ0) = zq+k+1(λ0) for some
k and λ0 ∈ Γk. For a symbol a of the form (1.3), the derivative a′ has exactly
p+q zeros (counted with multiplicity), so that there are exactly p+q branch
points counted with multiplicity.
10 MAURICE DUITS AND ARNO B.J. KUIJLAARS
The solutions zk(λ) also have branching at infinity (unless p = 1 or q = 1).
There are p solutions of (1.7) that tend to infinity as λ → ∞, and q solutions
that tend to 0. Indeed, we have
zk(λ) =
−1/q(1 +O(λ−1/q)), for k = 1, . . . , q,
1/p(1 +O(λ−1/p)), for k = q + 1, . . . , p + q,
(3.1)
as λ → ∞. Here c1, . . . , cq are the q distinct solutions of cq = a−q (taken in
some order depending on λ), and cq+1, . . . , cp+q are the p distinct solutions
of cp = a−1p (again taken in some order depending on λ).
The following proposition gives the structure of Γk at infinity.
Proposition 3.2. Let k ∈ {−q+1, . . . , p−1}\{0}. Then there is an R > 0
such that Γk ∩ {λ ∈ C | |λ| > R} is a finite disjoint union of analytic arcs,
each extending from |λ| = R to infinity.
Proof. The proof is similar to the proof of [1, Proposition 11.8] where a
similar structure theorem was proved for finite branch points. We omit the
details. �
It follows from Proposition 3.2 that the exceptional points for Γk are in a
bounded set. Since the set of exceptional point is discrete we conclude that
there are only finitely many exceptional points. Then we have the following
result about the structure of Γk.
Proposition 3.3. For every k ∈ {−q + 1, . . . , p − 1}, the set Γk is the
disjoint union of a finite number of open analytic arcs and a finite number
of exceptional points. The set Γk has no isolated points.
Proof. This was proved for k = 0 in [10] and [1, Theorem 11.9]. For general
k, there are only finitely many exceptional points and the proof follows in a
similar way. �
3.2. The Riemann surface. From Proposition 3.3 it follows that the curves
Γk can be taken as cuts for the p + q-sheeted Riemann surface of the alge-
braic equation (1.7). We number the sheets from 1 to p+ q, where the kth
sheet of the Riemann surface is
Rk = {λ ∈ C | |zk−1(λ)| < |zk(λ)| < |zk+1(λ)|} = C \ (Γ−q+k−1 ∪ Γ−q+k).
(3.2)
Thus zk is well-defined and analytic on Rk.
The easiest case to visualize is the case where consecutive cuts are disjoint,
that is, Γ−q+k−1 ∩ Γ−q+k = ∅ for every k = 2, . . . , p+ q− 2. In that case we
have that Rk is connected to Rk+1 via Γ−q+k in the usual crosswise manner,
and zk+1 is the analytic continuation of zk across Γ−q+k.
The general case is described in the following proposition.
Proposition 3.4. Suppose A is an open analytic arc such that A ⊂ Γ−q+k,
for k = k1, . . . , k2, and A ∩ (Γ−q+k1−1 ∪ Γ−q+k2+1) = ∅. Then for k =
EIGENVALUES OF BANDED TOEPLITZ MATRICES 11
k1, . . . , k2+1, we have that the analytic continuation of zk across A is equal
to zk1+k2−k+1. Thus across A, we have that Rk is connected to Rk1+k2−k+1.
Proof. We have that
|zk1(λ)| = |zk1+1(λ)| = · · · = |zk2(λ)| = |zk2+1(λ)|
for λ ∈ A, with strict inequalities (<) for λ on either side of A. Choose an
orientation for A. Then there is a permutation π of {k1, . . . , k2 + 1} such
that zπ(k) is the analytic continuation of zk from the +-side of A to the
−-side of A.
Assume that there are k, k′ ∈ {k1, . . . , k2 + 1} such that k < k′ and
π(k) < π(k′). Take a regular λ0 ∈ A and a small neighborhood U of λ0 such
that A∩U = Γ−q+k ∩U = Γ−q+k′ ∩U and A∩U is an analytic arc starting
and terminating on ∂U . Then we have a disjoint union U = U+∪U−∪(A∩U)
where U+ (U−) is the part of U on the +-side (−-side) of A. The function
φ defined by
φ(λ) =
zk(λ)
zk′ (λ)
, for λ ∈ U+,
zπ(k)(λ)
zπ(k′)(λ)
, for λ ∈ U−,
has an analytic continuation to U , and satisfies |φ(λ)| < 1 for λ ∈ U+ ∪ U−
and |φ(λ)| = 1 for λ ∈ A ∩ U . This contradicts the maximum principle for
analytic functions. Therefore π(k) > π(k′) for every k, k′ ∈ {k1, . . . , k2 + 1}
with k < k′, and this implies that π(k) = k1 + k2 − k + 1 for every k =
k1, . . . , k2 + 1, and the proposition follows. �
3.3. The functions wk(λ). A major role is played by the functions wk,
which for k ∈ {−q + 1, . . . , p − 1}, are defined by
wk(λ) =
zj(λ), for λ ∈ C \ Γk. (3.3)
Note that wk = (−1)p−ka−1p w{1,...,k} in the notation of (2.23).
Proposition 3.5. The function wk is analytic in C \ Γk.
Proof. Since zj is analytic on Rj = C \ (Γ−q+j−1 ∪ Γ−q+j), see (3.2), we
obtain from its definition that wk is analytic in C \
j=1 Γ−q+j. Let A be
an analytic arc in Γ−q+j \ Γk for some j < k + q. Choose an orientation on
A. Since the arc is disjoint from Γk, we have that zj+(λ) = zπ(j)−(λ), for
λ ∈ A and j = 1, . . . , q+k, where π is a permutation of {1, . . . , q+k}. Since
wk is symmetric in the zj ’s for j = 1, . . . , q + k, it then follows that
wk+(λ) = wk−(λ), for λ ∈ A,
which shows the analyticity in C \Γk with the possible exception of isolated
singularities at the exceptional points of Γ−q+1, Γ−q+2, . . . , Γk−1. However,
each zj , and therefore also wk, is bounded near such an exceptional point,
so that any isolated singularity is removable. �
12 MAURICE DUITS AND ARNO B.J. KUIJLAARS
In the rest of the paper we make frequently use of the logarithmic de-
rivative w′k/wk of wk. By the fact that wk does not vanish on C \ Γk and
Proposition 3.5, it follows that w′k/wk is analytic in C \ Γk. By Proposition
3.4 it moreover has an analytic continuation across every open analytic arc
A ⊂ Γk. Near the exceptional points that are no branch points w′k/wk re-
mains bounded. At the branch points it can however have singularities of a
certain order.
Proposition 3.6. Let λ0 ∈ Γk be a branch point of Γk. Then there exists
an m ∈ N such that
w′k(λ)
wk(λ)
(λ− λ0)−m/(m+1)
, (3.4)
as λ → λ0 with λ ∈ C \ Γk.
Proof. Let 1 ≤ j ≤ q+k. We investigate the behavior of zj(λ) when λ → λ0
such that λ remains in a connected component of C \ (Γj−1 ∪ Γj). Then
zj(λ) → z0 for some z0 ∈ C with a(z0) = λ0. Let m0 + 1 be the multiplicity
of z0 as a solution of a(z) = λ0. Then
a(z) = λ0 + c0(z − z0)m0+1(1 +O(z − z0)), z → z0, (3.5)
for some nonzero constant c0. Therefore,
zj(λ) = z0 +O((λ− λ0)1/(m0+1)), (3.6)
z′j(λ) = O((λ− λ0)−m0/(m0+1)), (3.7)
for λ → λ0 such that λ remains in the same connected component of C \
(Γj−1 ∪ Γj). Let m be the maximum of all the multiplicities of the roots of
a(z) = λ0. Then it follows from (3.6) and (3.7) that
z′j(λ)
zj(λ)
= O((λ− λ0)−m/(m+1))
as λ → λ0 with λ ∈ C \ Γk. Then we obtain (3.4) in view of (3.3). �
We end this section by giving the asymptotics of w′k/wk for λ → ∞.
Proposition 3.7. As λ → ∞ with λ ∈ C \ Γk, we have
w′k(λ)
wk(λ)
− q+k
λ−1 +O
λ−1−1/q
, for k = −q + 1, . . . ,−1,
−λ−1 +O(λ−2), for k = 0,
λ−1 +O
λ−1−1/p
, for k = 1, . . . , p − 1.
(3.8)
Proof. This follows directly from (3.1) and (3.3). �
EIGENVALUES OF BANDED TOEPLITZ MATRICES 13
4. Proof of Theorem 2.3
We use the function wk introduced in (3.3). We define µk by the formula
(2.9) and we note that
dµk(λ) =
w′k+(λ)
wk+(λ)
w′k−(λ)
wk−(λ)
dλ. (4.1)
Proposition 4.1. For each k = −q + 1, . . . , p − 1, we have that µk is a
measure on Γk with total mass µk(Γk) = (q + k)/q if k ≥ 0, and µk(Γk) =
(p− k)/p if k ≥ 0.
Proof. We first show that µk is a measure, i.e., that it is non-negative on
each analytic arc of Γk. Let A be an analytic arc in Γk consisting only of
regular points. Let t 7→ λ(t) be a parametrization of A in the direction of
the orientation of Γk. Then
dµk(λ) =
w′k+(λ(t))
wk+(λ(t))
w′k−(λ(t))
wk−(λ(t))
λ′(t)dt
wk+(λ(t))
wk−(λ(t))
To conclude that µk is non-negative on A, it is thus enough to show that
Re log
wk+(λ)
wk−(λ)
= 0, for λ ∈ A, (4.2)
Im log
wk+(λ)
wk−(λ)
increases along A. (4.3)
Since |wk+(λ)| = |wk−(λ)| for λ ∈ A, we have (4.2) so that it only remains
to prove (4.3).
There is a neighborhood U of A such that U \ Γk has two components,
denoted U+ and U−, where U+ is on the +-side of Γk and U− on the −-side.
It follows from Proposition 3.4 that wk has an analytic continuation from
U− to U , which we denote by ŵk, and that |wk(λ)| < |ŵk(λ)| for λ ∈ U+,
and equality |wk+(λ)| = |ŵk(λ)| holds for λ ∈ A. Thus it follows that
Re log
wk(λ)
ŵk(λ)
≤ 0, for λ ∈ A,
where ∂
denotes the normal derivative to A in the direction of U+. Then by
the Cauchy-Riemann equations we have that Im log
wk+(λ)
ŵk+(λ)
is increasing
along A. Since ŵk+(λ) = wk−(λ) for λ ∈ A, we obtain (4.3). Thus µk is a
measure.
Next we show that µk is a finite measure, which means that we have to
show that
w′k+(λ)
wk+(λ)
w′k−(λ)
wk−(λ)
(4.4)
14 MAURICE DUITS AND ARNO B.J. KUIJLAARS
Figure 1. Illustration for the proofs of Propositions 4.1 and
4.2. The solid line is a sketch of a possible contour Γk. The
dashed line is the contour Γ̃k,R and the dotted line is the
boundary of a disk of radius R around 0.
is integrable near infinity on Γk and near every branch point on Γk. This
follows from Propositions 3.7 and 3.6. Indeed, from Proposition 3.7 it follows
w′k+(λ)
wk+(λ)
w′k−(λ)
wk−(λ)
λ−1−δ
as λ → ∞, λ ∈ Γk. (4.5)
where δ = 1/q if k < 0 and δ = 1/p if k > 0. Since δ > 0 we see that
(4.4) is integrable near infinity. For a branch point λ0 of Γk, we have from
Proposition 3.6 that there exist an m ≥ 1 such that
w′k+(λ)
wk+(λ)
w′k−(λ)
wk−(λ)
(λ− λ0)−m/(m+1)
as λ → λ0, λ ∈ Γk. (4.6)
This shows that (4.4) is integrable near every branch point. Thus µk is a
finite measure.
Finally we compute the total mass of µk. LetD(0, R) = {z ∈ C | |z| < R}.
Then for R large enough, so that D(0, R) contains all exceptional points of
Γk and all connected components of C \ Γk (if any),
µk(Γk ∩D(0, R)) =
Γk∩D(0,R)
w′k+(λ)
wk+(λ)
Γk∩D(0,R)
w′k−(λ)
wk−(λ)
(4.7)
where we have used the behavior (4.6) near the branch points in order to
be able to split the integrals. Again using (4.6) we can then turn the two
integrals into a contour integral over a contour Γ̃k,R as in Figure 1. The
contour Γ̃k,R passes along the ±-sides of Γk ∩D(0, R) and if we choose the
orientation that is also shown in Figure 1 (and which is independent of the
EIGENVALUES OF BANDED TOEPLITZ MATRICES 15
choice of orientation for Γk), then
µk(Γk ∩D(0, R)) =
Γ̃k,R
w′k(λ)
wk(λ)
dλ. (4.8)
The parts of Γ̃k,R that belong to bounded components of C \Γk form closed
contours along the boundary of each bounded component. By Cauchy’s
theorem their contribution to the integral (4.8) vanishes. The parts of Γ̃k,R
that belong to the unbounded components of C \Γk can be deformed to the
circle ∂D(0, R) with the clockwise orientation. Thus if we use the positive
orientation on ∂D(0, R) as in Figure 1, then we obtain from (4.8)
µk(Γk ∩D(0, R)) = −
∂D(0,R)
w′k(λ)
wk(λ)
Letting R → ∞ and using Proposition 3.7, we then find that µk is a measure
on Γk with total mass µk(Γk) = (q + k)/q if k ≤ 0, and µk(Γk) = (p− k)/p
if k ≥ 0. �
The following proposition is the next step in showing that the measures
µk from (2.9) satisfy the equations (2.10).
Proposition 4.2. For k = −q + 1, . . . , p− 1, we have that
dµk(x)
x− λ =
w′k(λ)
wk(λ)
, for λ ∈ C \ Γk, (4.9)
log |λ− x| dµk(x) = − log |wk(λ)|+ αk, for λ ∈ C, (4.10)
where αk is the constant
log |a−q|+ kq log |a−q|, if k ≤ 0,
log |a−q| − kp log |ap|, if k ≥ 0.
(4.11)
Proof. To prove (4.9), we follow the same arguments as in the calculation
of µk(Γk) in the end of the proof of Proposition 4.1. Let λ ∈ C \ Γk, and
choose R > 0 as in the proof of Proposition 4.1. We may assume R > |λ|.
Then similar to (4.7) and (4.8) we can write
Γk∩D(0,R)
dµk(x)
x− λ =
Γ̃k,R
w′k(x)
wk(x)(x− λ)
where Γ̃k,R has the same meaning as in the proof of Proposition 4.1, see also
Figure 1. As in the proof of Proposition 4.1 we deform to an integral over
∂D(0, R), but now we have to take into account that the integrand has a
pole at x = λ with residue w′k(λ)/wk(λ). Therefore, by Cauchy’s theorem
Γk∩D(0,R)
dµk(x)
x− λ =
w′k(λ)
wk(λ)
∂D(0,R)
w′k(x)
wk(x)(x− λ)
dx. (4.12)
Letting R → ∞ and using Proposition 3.7 gives (4.9).
16 MAURICE DUITS AND ARNO B.J. KUIJLAARS
Next we integrate (4.9) over a Jordan curve J in C \ Γk from λ1 to λ2.
x− λ dµk(x) dλ = −
∫ ∫ λ2
x− λ dλ dµk(x)
(log |λ1 − x| − log |λ2 − x|+ i∆J [arg(λ− x)]) dµk(x), (4.13)
where ∆J [arg(λ − x)] denotes the change in argument of λ − x as when λ
varies over J from λ1 to λ2. By (4.9) the integral (4.13) is equal to
w′k(λ)
wk(λ)
dλ = log |wk(λ2)| − log |wk(λ1)|+ i∆J [argwk(λ)]. (4.14)
Equating the real parts of (4.13) and (4.14) we get
(log |λ1 − x| − log |λ2 − x|) dµk(x) = − log |wk(λ1)|+ log |wk(λ2)|.
(4.15)
Since λ1 and λ2 can be taken arbitrarily in a connected component of C\Γk,
we find that there exists a constant αk ∈ R (which a priori could depend on
the connected component) such that
log |λ− x| dµk(x) = − log |wk(λ)|+ αk, (4.16)
for all λ in a connected component of C \ Γk. By continuity the equation
(4.16) extends to the closure of the connected component, which shows that
the same constant αk is valid for all connected components. Thus (4.16)
holds for all λ ∈ C.
The exact value of αk can then be determined by expanding (4.16) for
large λ. Suppose for example that k < 0. Then by (3.1) and (3.3)
|wk(λ)| =
|zj(λ)| = |a−q|(q+k)/q|λ|−(q+k)/q
1 +O(λ−1/q)
as λ → ∞. Thus
− log |wk(λ)| =
q + k
log |λ| − q + k
log |a−q|+O(λ−1/q). (4.17)
Since
log |λ− x| dµk(x) = log |λ|µk(Γk) + o(1) =
q + k
log |λ|+ o(1), (4.18)
as λ → ∞, the value (4.11) for αk follows from (4.16), (4.17), and (4.18). The
argument for k > 0 is similar. This completes the proof of the proposition.
To prove part (c) of Theorem 2.3 we also need the following lemma.
EIGENVALUES OF BANDED TOEPLITZ MATRICES 17
Lemma 4.3. Let ~ν1 = (ν1,−q+1 . . . , ν1,p−1) and ~ν2 = (ν2,−q+1 . . . , ν2,p−1) be
two admissible vectors of measures. Then J(~ν1 − ~ν2) is well defined and
J(~ν1 − ~ν2) ≥ 0, (4.19)
with equality if and only if ~ν1 = ~ν2.
Proof. Since both ~ν1 and ~ν2 have finite energy, we find that J(~ν1 − ~ν2) is
well defined. According to the alternative representation (2.12), we have
J(~ν1 − ~ν2) =
I(ν1,0 − ν2,0)
k(k + 1)I
ν1,−q+k
− ν2,−q+k
− ν1,−q+k+1
k + 1
ν2,−q+k+1
k + 1
k(k + 1)I
ν1,p−k
ν2,p−k
ν1,p−k−1
k + 1
ν2,p−k−1
k + 1
(4.20)
Using (2.6) and (2.8), we see that all terms in (4.20) are non-negative and
therefore (4.19) holds.
Suppose now that J(~ν1 − ~ν2) = 0. Then all terms in the right-hand side
of (4.20) are zero, so that
ν1,0 = ν2,0, (4.21)
ν1,−q+k
ν2,−q+k+1
k + 1
ν1,−q+k+1
k + 1
ν2,−q+k
, for k = 1, . . . , q − 1,
(4.22)
ν1,p−k
ν2,p−k−1
k + 1
ν1,p−k−1
k + 1
ν2,p−k
, for k = 1, . . . , p− 1.
(4.23)
Using (4.21) in (4.22) with k = q − 1, we find ν1,−1 = ν2,−1. Proceeding
inductively we then obtain from (4.22) that ν1,k = ν2,k for all k = −q +
1, . . . , 0. Similarly, from (4.21) and (4.23) it follows that ν1,k = ν2,k for
k = 0, . . . , p− 1, so that ~ν1 = ~ν2 as claimed. �
Now we are ready for the proof of Theorem 2.3.
Proof of Theorem 2.3. (a) In view of Proposition 4.1 it only remains to
show that µk ∈ Me for every k = −q + 1, . . . , p − 1. The decay estimate
(4.5) implies that
log(1 + |λ|) dµk(λ) < ∞.
The fact that I(µk) < +∞ follows from (4.10). Indeed,
I(µk) = −
log |λ− x|dµk(x)dµk(λ) =
(log |wk(λ)| − αk)dµk(λ)
18 MAURICE DUITS AND ARNO B.J. KUIJLAARS
and this is finite since µk is a finite measure on Γk with a density that decays
as in (4.5) and log |wk(λ)| is continuous on Γk and grows only as a constant
times log |λ| as λ → ∞. Thus ~µ is admissible and part (a) is proved.
(b) According to (4.10) we have
log |λ− x| dµk(x)−
log |λ− x| dµk+1(λ)−
log |λ− x| dµk−1(λ)
= −2 log |wk(λ)|+ 2αk + log |wk+1(λ)| − αk+1 + log |wk−1(λ)| − αk−1
= log
wk+1(λ)wk−1(λ)
wk(λ)
+ 2αk − αk+1 − αk−1
= log
zq+k+1(λ)
zq+k(λ)
+ 2αk − αk+1 − αk−1. (4.24)
Since |zq+k(λ)| = |zq+k+1(λ)| for λ ∈ Γk, we see from (4.24) that (2.10)
holds with constant
lk = 2αk − αk−1 + αk+1. (4.25)
Note that for k = −q + 1 and k = p − 1, we are using the convention that
µ−q = µp = 0, and we also have put α−q = αp = 0. This proves part (b).
(c) Let ~ν = (ν−q+1, . . . , νp−1) be any admissible vector of measures. From
the representation (2.13) we get
J(~ν) = J(~µ+ ~ν − ~µ)
= J(~µ) + J(~ν − ~µ) + 2
j,k=−q+1
AjkI(µj, νk − µk). (4.26)
Using (2.14), we find from (4.26)
J(~ν) = J(~µ) + J(~ν − ~µ) +
k=−q+1
I(2µk − µk−1 − µk+1, νk − µk) (4.27)
For each k = −q + 1, . . . , p − 1, we have
I(2µk − µk−1 − µk+1, νk − µk)
log |λ− x| d(2µk − µk−1 − µk+1)(x)
d(νk − µk)(λ) (4.28)
By (2.10) the inner integral in the right-hand side of (4.28) is constant for
λ ∈ Γk. Since νk and µk are finite measures on Γk with νk(Γk) = µk(Γk),
we find from (4.28) that
I(2µk − µk−1 − µk+1, νk − µk) = 0, for k = −q + 1, . . . , p− 1.
Then (4.27) shows that J(~ν) = J(~µ)+J(~ν−~µ), which by Lemma 4.3 implies
that J(~ν) ≥ J(~µ) and equality holds if and only if ~ν = ~µ. This completes
the proof of Theorem 2.3. �
EIGENVALUES OF BANDED TOEPLITZ MATRICES 19
5. Proofs of Proposition 2.5 and Theorem 2.6
5.1. Proof of Proposition 2.5. We will now prove Proposition 2.5, which
follows by a combinatorial argument.
Proof of Proposition 2.5. We prove (2.18) and (2.19) for k > 0. The case
k < 0 is similar. Let us first expand the determinant in the definition of
Pk,n(λ) = detTn(z
−k(a− λ)) =
(a− λ)j−π(j)+k. (5.1)
Here Sn denotes the set of all permutation on {1, . . . , n}. By the band struc-
ture of Tn(z
−k(a − λ)) it follows that we only have non-zero contributions
from permutations π that satisfy
k − p ≤ π(j) − j ≤ q + k, for all j = 1, . . . , n. (5.2)
Define for π ∈ Sn,
Nπ = {j | π(j) = j + k}. (5.3)
and denote the number of elements of Nπ by |Nπ|. For each π ∈ Sn we have
j=1(a−λ)j−π(j)+k is a polynomial in λ of degree at most |Nπ|. So by
(5.1)
dk,n = degPk,n ≤ max
|Nπ| (5.4)
where we maximize over permutations π ∈ Sn satisfying (5.2).
Let π ∈ Sn satisfying (5.2). We prove (2.18) by giving an upper bound
for |Nπ|. Since
j=1(π(j) − j) = 0 we obtain
(π(j) − j)+ =
(j − π(j))+, (5.5)
where (·)+ is defined as (a)+ = max(0, a) for a ∈ R. Each j ∈ Nπ gives a
contribution k to the left-hand side of (5.5). Therefore the left-hand side is
at least k|Nπ|. By (5.2) we have that each term in the right hand side is
at most p− k. Moreover, there are at most n− |Nπ| non-zero terms in this
sum. Combining this with (5.5) leads to
k|Nπ| ≤
(π(j) − j)+ =
(j − π(j))+ ≤ (n− |Nπ|)(p − k). (5.6)
Hence, if π is a permutation satisfying (5.2)
|Nπ| ≤
n(p− k)
. (5.7)
Now (2.18) follows by combining (5.7) and (5.4).
To prove (2.19), we assume that n ≡ 0 mod p. We claim that there exists
a unique π such that equality holds in (5.7). Then equality holds in both
20 MAURICE DUITS AND ARNO B.J. KUIJLAARS
inequalities of (5.6) and the above arguments show that this can only happen
π(j) = j + k, or π(j) = j − p+ k, (5.8)
for every j = 1, . . . , n. We claim that there exists a unique such permutation,
namely
π(j) =
j + k, if j ≡ 1, . . . , (p − k) mod p,
j − p+ k, if j ≡ (p− k + 1), . . . , p mod p.
(5.9)
To see this let π be a permutation satisfying (5.8). The numbers 1, . . . , p−
k can not satisfy π(j) = j−p+k and thus satisfy π(j) = j+k. On the other
hand, the numbers 1, . . . , k can not be the image of numbers j satisfying
π(j) = j + k, and thus π(j) = j − p + k for j = p − k + 1, . . . , p. So (5.9)
holds for j = 1, . . . , p. This means in particular that the restriction of π to
{p + 1, . . . , n} is again a permutation, but now on {p + 1, . . . , n}. By the
same arguments we then find that (5.9) holds for j = p + 1, . . . , 2p, and so
on. The result is that (5.9) is indeed the only permutation that satisfies
(5.8).
Finally, a straightforward calculation shows that the coefficient of λ(p−k)n/p
j=1(a − λ)j−π(j)+k with π as in (5.9) is nonzero and given by (2.19).
This proves the proposition. �
5.2. Proof of Theorem 2.6. Before we start with the proof of Theorem
2.6 we first prove the following proposition concerning the asymptotics for
Pk,n for n → ∞.
Proposition 5.1. Let Mk = {q + k + 1, . . . , p+ q}. We have that
Pk,n(λ) = (wMk(λ))
nCMk(λ) (1 +O(exp(−cKn)) , n → ∞, (5.10)
uniformly on compact subsets K of C \ Γk. Here cK is a positive constant
depending on K.
Proof. First rewrite (2.22) as
Pk,n(λ) = (wMk(λ))
nCMk(λ) (1 +Rk,n(λ)) . (5.11)
with Rk,n defined by
Rk,n(λ) =
M 6=Mk
(wM (λ))
nCM (λ)
(wMk(λ))
nCMk(λ)
. (5.12)
Let K be a compact subset of C \ Γk. If K does not contain branch points
then there exists A,B > 0 such that
A < |CM (λ)| < B (5.13)
for all λ ∈ K and M . Moreover, we have
wM (λ)
wMk(λ)
zq+k(λ)
zq+k+1(λ)
≤ sup
zq+k(λ)
zq+k+1(λ)
< 1, (5.14)
EIGENVALUES OF BANDED TOEPLITZ MATRICES 21
for all λ ∈ K and M 6= Mk. Therefore one readily verifies from (5.11)
that there exist cK such that |Rk,n(λ)| ≤ exp(−cKn) for all λ ∈ K and n
large enough. This proves the statement in case K does not contain branch
points.
Suppose that K does contain branch points. Without loss of generality
we can assume that all branch points lie in the interior of K (otherwise we
replace K by a bigger compact set). The boundary ∂K of K is a com-
pact set with no branch points and therefore (5.10) holds for ∂K by the
above arguments. Since wMk and CMk are analytic in K, we find by (5.11)
that Rk,n is analytic in K. The maximum modulus principle for analytic
functions states that supz∈K |Rk,n(z)| = supz∈∂K |Rk,n(z)| and thereby we
obtain that (5.10) also holds for K with the same constant cK = c∂K . �
We now state two particular consequences of (5.10).
Corollary 5.2. Let k ∈ {−q + 1, . . . , p − 1}. For every compact set K ⊂
C \ Γk we have that µk,n(K) = 0 for n large enough.
Proof. Let K be a compact subset of C \ Γk. By (5.10) it follows that Pk,n
has no zeros in K for large n. Since nµk,n(K) equals the number of zeros of
Pk,n in K the corollary follows. �
Corollary 5.3. Let k ∈ {−q + 1, . . . , p− 1}. We have that
dµk,n(x)
x− λ =
dµk(x)
x− λ , (5.15)
uniformly on compact subsets of C \ Γk.
Proof. Let K be a compact subset of C \ Γk. Note that
dµk,n(x)
x− λ =
λi∈spk Tn(a)
λi − λ
P ′k,n(λ)
nPk,n(λ)
, (5.16)
for all λ ∈ K. With Mk and cK as in Proposition 5.1 we obtain from (5.10)
P ′k,n(λ)
nPk,n(λ)
w′Mk(λ)
wMk(λ)
+O(1/n), n → ∞, (5.17)
uniformly on K. Let us rewrite the right-hand side of (5.17). By expanding
both sides of zq(a(z)− λ) = ap
j=1(z − zj(λ)) and collecting the constant
terms we obtain
(−zj(λ)) =
. (5.18)
Since λ /∈ Γk, we can split this product in two parts, take the logarithmic
derivative and use (3.3) and (2.23) to obtain
z′j(λ)
zj(λ)
j=q+k+1
z′j(λ)
zj(λ)
w′k(λ)
wk(λ)
w′Mk(λ)
wMk(λ)
. (5.19)
22 MAURICE DUITS AND ARNO B.J. KUIJLAARS
Combining (5.16), (5.17) and (5.19), we obtain
dµk,n(x)
x− λ =
w′k(λ)
wk(λ)
(5.20)
uniformly on K. Then (5.15) follows from (5.20) and (4.9). �
Now we are ready for the proof of Theorem 2.6.
Proof of Theorem 2.6.
First we prove (2.21). By Proposition 2.5 and the fact that ~µ is admissible,
we get (see (2.8))
µk,n(C) =
degPk,n ≤ µk(C), (5.21)
for every n ∈ N.
Let C0(C) be the Banach space of continuous functions on C that vanish
at infinity. The dual space C0(C)
∗ of C0(C) is the space of regular complex
Borel measures on C. By (5.21) the sequence (µk,n)n∈N belongs to the ball in
C0(C)
∗ centered at the origin with radius µk(C), which is weak
∗ compact by
the Banach-Alaoglu theorem. Let µk,∞ be the limit of a weak
∗ convergent
subsequence of (µk,n)n∈N.
By weak∗ convergence and Corollary 5.2 we obtain that µk,∞ is supported
on Γk. Combining this with (5.15) and the weak
∗ convergence leads to
dµk(x)
x− λ =
dµk,∞(x)
x− λ , (5.22)
for every λ ∈ C \ Γk. The integrals in (5.22) are known in the literature as
the Cauchy transforms of the measures µk and µk,∞. The Cauchy transform
on Γk is an injective map that maps measures on Γk to functions that are
analytic in C \ Γk (one can find explicit inversion formulae, see for example
the arguments in [9, Theorem II.1.4] or the Stieltjes-Perron inversion formula
in the special case Γk ⊂ R). Thus it follows from (5.22) that µk,∞ = µk.
Therefore
µk,n = µk (5.23)
in the sense of weak∗ convergence in C0(C)
∗. Thus (2.21) holds if φ is a
continuous function that vanishes at infinity.
From (5.21) and (5.23) it also follows that
µk,n(C) = µk(C), (5.24)
Then the sequence (µk,n)n∈N is tight. That is, for every ε > 0 there exists a
compact K such that µk,n(C \K) < ε for every n ∈ N. By a standard ap-
proximation argument one can now show that (2.21) holds for every bounded
continuous function φ on C.
Having (2.21) and Proposition 5.1, we can prove (2.20) as in [1, Theo-
rem 11.17]. Indeed, the sets lim infn→∞ spk Tn(a) and lim supn→∞ spk Tn(a)
equal the support of µk, which is Γk. �
EIGENVALUES OF BANDED TOEPLITZ MATRICES 23
–2 –1.5 –1 –0.5 0.5 1 1.5
lambda
–2 –1.5 –1 –0.5 0.5 1 1.5
lambda
Figure 2. Illustration for Example 1: The densities of the
measures µ0 (left) and µ1 (right) for a =
4(z+1)3
6. Examples
6.1. Example 1. As a first example consider the symbol a defined by
a(z) =
4(z + 1)3
. (6.1)
In this case we have p = 2 and q = 1. So we obtain two contours Γ0 and
Γ1 with two associated measures µ0 and µ1. This example appeared in [3],
in which the authors gave explicit expressions for Γ0 and µ0. The following
proposition also contains expressions for Γ1 and µ1. In what follows we take
the principal branches for all fractional powers.
Proposition 6.1. With a as in (6.1), we have that Γ0 = [0, 1] and
dµ0(λ) =
dλ. (6.2)
Moreover, Γ1 = (−∞, 0] and
dµ1(λ) =
)1/3 −
1− λ− 1
(−λ)2/3
dλ. (6.3)
Proof. A straightforward calculation shows that λ = 0 and λ = 1 are the
branch points.
Let λ ∈ Γ0 ∪ Γ1 and assume that λ is not a branch point. There exist
y1, y2 ∈ C such that y1 6= y2, |y1| = |y2| and a(y1) = a(y2) = λ. Then it
follows from (6.1) that |y1+1| = |y2+1|. Therefore y1 and y2 are intersection
points of a circle centered at −1 and a circle centered at the origin. Since
y1 6= y2, this means that y1 = y2 and therefore λ = a(y1) = a(y2) = a(y1) =
λ, so that λ ∈ R. A further investigation shows that a(z)−λ has 3 different
real zeros if λ > 1. If λ < 1 and λ 6= 0 then a(z) − λ has precisely 1 real
zero and 2 conjugate complex zeros. Therefore, Γ0 ∪ Γ1 = (−∞, 1].
24 MAURICE DUITS AND ARNO B.J. KUIJLAARS
−2 −1.5 −1 −0.5 0 0.5 1 1.5 2
k = 0
−2 −1.5 −1 −0.5 0 0.5 1 1.5 2
k = 1
Figure 3. Illustration for Example 1: The spectrum
spT50(a) (top) and the generalized spectrum sp1T50(a) (bot-
tom), for the symbol a =
4(z+1)3
Now we will show that Γ0 = [0, 1] and Γ1 = (−∞, 0]. By Cardano’s
formula the solutions of the algebraic equation a(z) = λ are given by
zj(λ) = −1−
3λ1/3
1 + (1− λ)1/2
+ ω−j
1− (1− λ)1/2
(6.4)
for λ ∈ [0, 1] and
zj(λ) = −1+
3(−λ)1/3
1 + (1− λ)1/2
− ω−j−2
(1− λ)1/2 − 1
(6.5)
for λ ∈ (−∞, 0]. Here ω = e2πi/3. One can check that |z1(λ)| = |z2(λ)| <
|z3(λ)| for λ ∈ (0, 1] and |z1(λ)| < |z2(λ)| = |z3(λ)| for λ ∈ (−∞, 0). More-
over, for λ = 0 we have z1(0) = z2(0) = z3(0) = −1. Therefore Γ0 = [0, 1]
and Γ1 = (−∞, 0].
The density (6.2) was already given in [3] and (6.3) follows in a similar
way. �
In Figure 2 we plotted the densities of µ0 and µ1. Note that, due to the
interaction between µ0 and µ1 in the energy functional, there is more mass
of µ0 near 0 than near 1. We also see that the singularities of the densities
for µ0 and µ1 are of order O(|λ|−2/3) for λ → 0, whereas the typical nature
of a singularity in each of the measures is a square root singularity. The
stronger singularity is due to the fact that a(z) − λ has a triple root for
λ = 0.
EIGENVALUES OF BANDED TOEPLITZ MATRICES 25
–4 –2 2 4
lambda
–4 –2 2 4
lambda
Figure 4. Illustration for Example 2: The densities of the
measures µ0 (left) and µ1 = µ−1 (right) for a(z) = z
2 + z +
z−1 + z−2.
In Figure 3 we plotted the eigenvalues and generalized eigenvalues for
n = 50. It is known that the eigenvalues are simple and positive [3, §2.3],
which we also see in Figure 3.
6.2. Example 2. For the symbol a defined by
a(z) = z2 + z + z−1 + z−2. (6.6)
we have p = q = 2. From the symmetry a(1/z) = a(z) it follows that
Γ−1 = Γ1 and µ−1 = µ1.
The interesting feature of this example is that the contours Γ0 and Γ±1
overlap. To be precise, the interval (−9/4, 0) is contained in all three con-
tours Γ−1,Γ0 and Γ1. This can be most easily seen by investigating the
image of the unit circle under a. Consider
a(eit) = 2 cos 2t+ 2cos t, for t ∈ [0, 2π). (6.7)
A straightforward analysis shows that for every λ ∈ (−9/4, 0), the equation
a(eit) = λ has four different solutions for t in [0, 2π). This means that the
four solutions of the equation a(z) = λ are on the unit circle, and so in
particular have the same absolute value.
The equation a(z) − λ = 0 can be explicitly solved by introducing the
variable y = z + 1/z. In exactly the same way as in the previous example
one can obtain the limiting measures. We will not give the explicit formulas,
but only plot the densities in Figure 4. The branch points are λ = −9/4,
λ = 0 and λ = 4. The contours are given by
Γ0 = [−9/4, 4], Γ−1 = Γ1 = (−∞, 0]. (6.8)
The densities have singularities at the branch points in the interior of their
supports. The singularities are only felt at one side of the branch points.
Consider first µ0, whose density has a singularity at 0. However the limiting
value when 0 is approached from the positive real axis is finite. The change
in behavior of µ0 has to do with the fact that z1 is analytic on (0, 4) but not
26 MAURICE DUITS AND ARNO B.J. KUIJLAARS
on (−9/4, 0). Therefore we find by (1.12) that
dµ0(λ) =
z1+(λ)
z2+(λ)
z1−(λ)
z2−(λ)
dλ (6.9)
on (−9/4, 0), and
dµ0(λ) =
z2+(λ)
z2−(λ)
dλ (6.10)
on (0, 4).
For µ−1 = µ1 a similar phenomenon happens at λ = −9/4. This is a
consequence of the fact that z1 has an analytic continuation into z2 when
we cross (−∞,−9/4), but it has an analytic continuation into z4 when we
cross (−9/4, 0).
6.3. Example 3. As a final example, consider the symbol
a(z) = zp + z−q, (6.11)
with p, q ≥ 1 and gcd(p, q) = 1. This example appeared in [10], where the
authors mentioned that Γ0 is given by the star
Γ0 = {rωj | j = 1, . . . , p+ q, 0 ≤ r ≤ R} (6.12)
with ω = e2πi/(p+q) and R = (p + q)p−p/(p+q)q−q/(p+q). The other contours
also have a star shape, namely
Γk = {(−1)krωj | j = 1, . . . , p+ q, 0 ≤ r < ∞} (6.13)
for k 6= 0. Note that the star Γk for k 6= 0 is unbounded.
In Figure 5 we plotted the eigenvalues and the generalized eigenvalues
for p = 2, q = 3 and n = 50. All the (generalized) eigenvalues appear
to lie exactly on the contours. In the special case p = 1 it is known that
the eigenvalues of Tn(a) lie indeed precisely on the star (6.12) and are all
simple (possibly except for 0) [4, Theorem 3.2], see also [6] for a connection
to Chebyshev-type quadrature.
6.4. Numerical stability. In Figure 3 and Figure 5 the eigenvalues and
the generalized eigenvalues of T50(a) were computed numerically. To control
the stability of the numerical computation of the eigenvalues one needs to
analyze the pseudo-spectrum. For banded Toeplitz matrices the pseudo-
spectrum is well understood [12, Th. 7.2]. To this date, a similar analysis
of the pseudo-spectrum for the matrix pencil (Tn(z
−ka), Tn(z
−k)) has not
been carried out. See [12, §X.45] for some remarks on the pseudo-spectrum
for the generalized eigenvalue problem.
EIGENVALUES OF BANDED TOEPLITZ MATRICES 27
−5 0 5
k = −2
−5 0 5
k = −1
−5 0 5
k = 0
−5 0 5
k = 1
Figure 5. Illustration for Example 3: The contours Γk and
the eigenvalues and generalized eigenvalues for T50(a) for the
symbol a = z2 + z−3.
References
1. A. Böttcher and S. M. Grudsky, Spectral Properties of Banded Toeplitz Matrices,
SIAM, Philadelphia, PA, 2005.
2. A. Böttcher and S. M. Grudsky, Can spectral values sets of Toeplitz band matrices
jump?, Linear Algebra Appl., 351-352 (2002), pp. 99-116.
3. E. Coussement, J. Coussement and W. Van Assche, Asymptotic zero distribution for
a class of multiple orthogonal polynomials, Trans. Amer. Math. Soc., (to appear)
4. M. Eiermann and R. Varga, Zeros and local extreme points of Faber polynomials
associated with hypocycloidal domains, Electron. Trans. Numer. Anal., 1 (1993), pp.
49-71.
5. I. I. Hirschman, Jr., The spectra of certain Toeplitz matrices, Illinois J. Math., 11
(1967), pp. 145-159.
6. A. Kuijlaars, Chebyshev quadrature for measures with a strong singularity, J. Comput.
Appl. Math., 65 (1995), pp. 207-214.
7. E. Nikishin and V. Sorokin, Rational Approximations and Orthogonality, Translations
of Mathematical Monographs 92, American Mathematical Society, Providence, RI,
(1991).
8. T. Ransford, Potential Theory in the Complex Plane, London Mathematical Society
Student Texts 28, Cambridge University Press, Cambridge, 1995.
9. E.B. Saff and V. Totik, Logartihmic Potentials with External Fields, Grundlehren der
Mathematischen Wissenschaften 316, Springer-Verlag, Berlin, 1997.
10. P. Schmidt and F. Spitzer, The Toeplitz matrices of an arbitrary Laurent polynomial,
Math. Scand., 8 (1960), pp. 15-38.
28 MAURICE DUITS AND ARNO B.J. KUIJLAARS
11. P. Simeonov, A weigthed energy problem for a class of admissible weights, Houston
J. Math., 31 (2005), pp. 1245-1260.
12. L.N. Trefethen and M. Embree, Spectra and Pseudospectra, Princeton University
Press, Princeton, NJ, 2005.
13. J.L. Ullman, A problem of Schmidt and Spitzer, Bull. Amer. Math. Soc., 73 (1967),
pp. 883-885.
14. H. Widom, On the eigenvalues of certain Hermitean operators, Trans. Amer. Math.
Soc., 88 (1958), pp. 491-522.
	1. Introduction
	2. Statement of results
	2.1. The energy functional
	2.2. The measures k as limiting measures of generalized eigenvalues
	2.3. Overview of the rest of the paper
	3. Preliminaries
	3.1. The structure of the curves k
	3.2. The Riemann surface
	3.3. The functions wk()
	4. Proof of Theorem 2.3
	5. Proofs of Proposition 2.5 and Theorem 2.6
	5.1. Proof of Proposition 2.5
	5.2. Proof of Theorem 2.6
	6. Examples
	6.1. Example 1
	6.2. Example 2
	6.3. Example 3
	6.4. Numerical stability
	References
ABSTRACT
  We study the limiting eigenvalue distribution of $n\times n$ banded Toeplitz
matrices as $n\to \infty$. From classical results of Schmidt-Spitzer and
Hirschman it is known that the eigenvalues accumulate on a special curve in the
complex plane and the normalized eigenvalue counting measure converges weakly
to a measure on this curve as $n\to\infty$. In this paper, we characterize the
limiting measure in terms of an equilibrium problem. The limiting measure is
one component of the unique vector of measures that minimes an energy
functional defined on admissible vectors of measures. In addition, we show that
each of the other components is the limiting measure of the normalized counting
measure on certain generalized eigenvalues.

<|endoftext|><|startoftext|>
FIG. 1. Knotted bead-spring polymer: Starting configuration with N=16384 beads; after 6 reduction steps (N=265); final 
configuration after 15 iterations (N=8) with the knotted (trefoil) region circled in red; and magnified.
 (enhanced online) 
Capturing knots in polymers 
Peter Virnau, Mehran Kardar  
Department of Physics, MIT, Cambridge, MA 
02139-4307, USA 
Yacov Kantor 
School of Physics and Astronomy, Tel Aviv 
University, 69978 Tel Aviv, Israel 
(received, published) 
[DOI: 10.1063/1.2130690]  
Visualizing topological properties is a particularly 
challenging task. Although algorithms can usually 
determine if a loop contains a knot, finding its 
exact location is difficult (and not necessarily 
well-defined).
     Here, we apply a reduction method by Koniaris 
and Muthukumar
, which was originally proposed 
to simplify polymers before calculating knot 
invariants. We start with one end and consider 
consecutive triangles formed by three adjacent 
monomers. If the triangle is not crossed by any of 
the remaining bonds, the particle in the middle is 
removed. Going back and forth between both ends 
we proceed until the configuration cannot be 
reduced any further (see Fig.1). 
    Although the method is not perfect (sometimes 
entangled, but unknotted regions remain), it 
provides us with a valuable impression on the 
typical number of knots, their respective location 
and sizes
     This work was supported by the DFG grant 
Vi237/1. 
 P. Virnau, Y. Kantor, and M. Kardar, J. Am. Chem. Soc., 
in press (2005). 
 W. G. Taylor, Nature 406, 916 (2000). 
 K. Koniaris and M. Muthukumar, J.Chem.Phys. 95, 2873 
(1991). 
 Pictures and movie were generated using the VMD 
visualization package; see W. Humphrey, A. Dalke, and K. 
Schulten,, J. Molec. Graphics 14, 33 (1996).  
Copyright (2005) American Institute of Physics. 
This article may be downloaded for personal use 
only. Any other use requires prior permission of 
the author and the American Institute of Physics.  
The following article appeared in the Gallery of Images in 
Chaos 15, 041103 (2005) 
and may be found at 
http://chaos.aip.org/chaos/gallery/toc_Dec05.jsp 
This version also contains a movie of the algorithm.
ABSTRACT
  This paper visualizes a knot reduction algorithm

<|endoftext|><|startoftext|>
Introduction. In this article we will consider a certain family of typed
branching diffusions that have particles which move (independently of each
other) in space according to a Brownian motion with variance controlled by
the particle’s type process. The type of each particle evolves as an Ornstein–
Uhlenbeck process and this type also controls the rate at which births occur.
The particular form of this model permits many explicit calculations, but
throughout we will strive to develop techniques that rely on general prin-
ciples as much as possible, so they might readily adapt to other situations.
This model was previously considered in [12, 13]; these papers form essential
foundations for this work, although we will recall various results as necessary.
Received December 2004; revised November 2006.
1Supported in part by an EPSRC studentship.
AMS 2000 subject classification. 60J80.
Key words and phrases. Spatial branching process, branching diffusion, multi-type
branching process, additive martingales, spine decomposition.
This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Applied Probability,
2007, Vol. 17, No. 2, 609–653. This reprint differs from the original in pagination
and typographic detail.
http://arxiv.org/abs/0704.0380v1
http://www.imstat.org/aap/
http://dx.doi.org/10.1214/105051606000000853
http://www.imstat.org
http://www.imstat.org
http://www.imstat.org/aap/
http://dx.doi.org/10.1214/105051606000000853
2 Y. GIT, J. W. HARRIS AND S. C. HARRIS
We will make some significant applications of the spine theory for branch-
ing processes. Inspired by the series of papers Lyons, Pemantle and Peres
[19], Lyons [18] and Kurtz, Lyons, Pemantle and Peres [16], spine techniques
have been instrumental in recent years in providing intuitive and elegant
proofs of many important classical and new results in the theory of branch-
ing processes. In this article we use the recent reformulation of the spine
method presented in [8], which follows in similar spirit to the branching
Brownian motion study of Kyprianou [17]. For a selection of other applica-
tions of spine techniques, for example, see [1, 6, 7, 23] and references therein.
1.1. The branching model. We define Nt to be the set of particles alive
at time t ≥ 0. For a particle u ∈ Nt, Xu(t) ∈ R is its spatial position, and
Yu(t) ∈ R is the type of u. We will label offspring using the Ulam–Harris
convention where, for example, if u=∅21 then particle u is the first child
of the second child of the initial ancestor, and we will write v > u if particle
v is a descendant of particle u. The configuration of the branching diffusion
at time t is given by the point process Xt := {(Xu(t), Yu(t)) :u ∈Nt}.
A particle’s type evolves as an Ornstein–Uhlenbeck process with an invari-
ant measure given by the standard normal density φ(y) and an associated
differential operator (generator)
Qθ :=
− y ∂
where θ > 0 is considered to be the temperature of the system. The spatial
motion of a particle of type y is a driftless Brownian motion on R with
variance
A(y) := ay2, where a≥ 0.
A particle of type y particle is replaced by two offspring at a rate
R(y) := ry2 + ρ, where r, ρ≥ 0.
Each offspring inherits its parent’s current type and spatial position, and
then moves off independently of all others. We use P x,y and Ex,y with x, y ∈
R to represent probability and expectation when the Markov process starts
with a single particle at position (x, y).
We will find the almost sure rate of exponential growth, D(γ,κ), of par-
ticles which are found simultaneously with spatial positions near −γt and
type positions near κ
t at large times t. From this we can deduce the speed
of extremal particles and hence the asymptotic shape of the particle system.
The main effort is required in identifying D(γ,κ) as the almost sure limit of
t−1 log
1{Xu(t)≤−γt;Yu(t)≥κ
A TYPED BRANCHING DIFFUSION 3
In particular, the convergence properties of two different families of addi-
tive martingales associated with the branching diffusion will lead directly to
the spatial exponential growth rates and an upper bound on the space-type
growth. For the remaining lower bound, we describe an explicit two-phase
mechanism for amassing the required number of particles with prescribed
space-type positions. The first phase involves building up an “excess” num-
ber of particles, each covering a certain proportion of the required spatial
distance. During their second phase, enough of these particles must succeed
in making a difficult and rapid ascent into the required position. The latter
phase is proved using an intuitive change of measure technique that induces
a spine construction.
The family of models we are considering is specific but nevertheless have
some features of fundamental significance that motivate the choices for Qθ,
R and A. If the spatial motion is ignored, we have investigated a binary
branching Ornstein–Uhlenbeck process in a quadratic breeding potential. In
contrast, Enderle and Hering [5] considered a branching Ornstein–Uhlenbeck
with constant branching rate but random offspring distribution. A quadratic
breeding potential is a critical rate for explosions in the population of parti-
cles. In a branching Brownian motion on R with binary splitting occurring
at rate xp at position x, the population will explode almost surely in finite
time if p > 2, whereas for p = 2 the expected number of particles explodes
while the total population remains finite for all time with probability 1 (see
[15], Chapter 5.12). The Ornstein–Uhlenbeck process is not only a canoni-
cal ergodic diffusion, but this type-motion has exactly the right drift to help
counteract the quadratic breeding rate. For high temperatures, θ > 8r, there
is a sufficiently strong mean-reversion in the type processes to ensure that
the expected total population size does not blow up; but for temperatures
θ ≤ 8r, the quadratic breeding overpowers the pull toward the origin, the
expected population blows up in a finite time and particles behave very dif-
ferently. Throughout this paper we consider only high temperatures θ > 8r,
deferring the low and critical temperature regimes to future work. Given
other choices, the quadratic spatial diffusion coefficient now becomes very
natural, enabling us to find explicit families of (fundamental) additive mar-
tingales since the linearized traveling-wave equation can be linked to the
classical harmonic oscillator equations from physics. The binary branching
mechanism was taken for simplicity; in principle our approach could extend
to general offspring distributions, although new features would arise from
possible extinctions and necessary offspring moment conditions. All these
choices make the models rich in structure, possessing some very challenging
features whilst remaining sufficiently tractable.
4 Y. GIT, J. W. HARRIS AND S. C. HARRIS
1.2. Application to reaction–diffusion equations. Following in the foot-
steps of McKean [20], the solution of the reaction–diffusion equation
+R(y)u(u− 1) + θ
− y∂u
with initial condition f(x, y)∈ [0,1] for all x, y ∈R, can be represented by
u(t, x, y) =Ex,y
f(Xu(t), Yu(t))
Of great importance for reaction–diffusion equations are traveling-wave so-
lutions (e.g., see [21]). In the present context, a solution to equation (1) of
the form u(t, x, y) := w(x− ct, y) is said to be a traveling-wave of speed c,
where w(x, y) solves the traveling-wave equation
+R(y)w(w − 1) + θ
− y∂w
= 0.(3)
Fundamental to our study of the branching diffusion are two families of “ad-
ditive” martingales, Z±λ (t) [defined at (6)], which are linked to the lineariza-
tion of (1). When θ > 8r, Harris and Williams [13] determined when Z−λ is
uniformly integrable (see Theorem 17) and then wλ(x, y) :=E
x,y exp(−Z−λ (∞))
yields a traveling wave of speed c−λ . This gives the existence of traveling
waves for all speeds c greater than some threshold c̃(θ) := inf c−λ .
Furthermore, combining the McKean representation (2) with the almost-
sure convergence result established in [12] (look ahead to Theorem 18) can
give results on the attraction toward traveling waves from given initial data.
For example, if − lnf(x, y)∼ eλxg(y) uniformly in y as x→∞ for some suit-
able g ∈ L2(φ), the solution u(x, y) to (1) with initial conditions f satisfies
u(t, x− c−λ t, y)→wλ(x+ x̂, y) as t→∞, where x̂ is some constant that can
be determined from g.
In future work we hope to develop the approach used for standard BBM
and the FKPP equation in [11], and prove that traveling waves of a given
speed c > c̃(θ) are unique (up to translation) and that no traveling waves
exist for speeds c < c(θ). We anticipate that our new results on the growth
rates of particles will aid in establishing some difficult estimates on the tail
behavior of any traveling wave, and hence assist in proving the conjectured
uniqueness. In addition, we expect our growth rate results will be essential
in obtaining broader classes of initial conditions that are attracted toward
traveling waves. In each of these problems, difficulties arise from the un-
bounded type space where, for example, some control must be gained over
the possible contributions to
u∈Nt log f(Xu(t) − ct, Yu(t)) from particles
that have large type positions in addition to large spatial positions.
A TYPED BRANCHING DIFFUSION 5
2. Main results. In this section, we will present our main results that
identify the growth rates found within the branching diffusion. We will give
an overview for our proofs, identifying the key ideas and techniques used,
as well as introducing some intuition for the dominant behavior of particles
that underpins our approach.
2.1. Martingales. The principal tools used throughout this paper are
two fundamental families of “additive” martingales, which were introduced
in [13].
Before defining the martingales we give some key definitions. Let
λmin :=−
θ− 8r
Let λ ∈R, with the following convention which we always use for λ:
λmin <λ< 0.
Also, define
µλ :=
θ(θ− 8r− 4aλ2), ψ±λ :=
E±λ := ρ+ θψ
λ , c
λ :=−E
λ /λ.(5)
Will will occasionally write E±λ as E
±(λ) in order to emphasize that E±λ are
really functions of λ; the ± superscripts will always distinguish these from
expectation operators. Note that λmin is the point beyond which µλ is no
longer a real number.
The martingales are Z−λ and Z
λ , defined for λ ∈ (λmin,0] as
Z±λ (t) :=
v±λ (Yu(t))e
λXu(t)−E±λ t,(6)
where v±λ (y) := exp(ψ
2) are strictly-positive eigenfunctions of the opera-
Qθ + 12λ
2A+R,
with corresponding eigenvalues E−λ <E
λ and A,R are the functions defined
in Section 1.1. This operator is self-adjoint on L2(φ) with the inner product
〈·, ·〉φ where 〈f, g〉φ :=
fgφdy and φ is the standard normal density. Note
that v−λ ∈ L2(φ), whereas v
λ /∈L2(φ) so is not normalizable.
The calculations of Section 3 make it easy to see these are martingales,
and throughout the paper we will need a variety of martingale convergence
results which are gathered together in Section 8. In particular, we will need
to know precisely when Z−λ is uniformly integrable with a strictly positive
limit, some further strong convergence results for other closely related sums
over particles (also identifying which particles contribute nontrivially to their
limits), and the rate of convergence to zero of the Z+λ martingales.
6 Y. GIT, J. W. HARRIS AND S. C. HARRIS
2.2. The asymptotic growth-rate of particles along spatial rays. As an
essential initial step toward determining the growth rate of particles in the
two-dimensional space-type domain, we first look at the growth rate of par-
ticles in the spatial dimension only.
For γ ≥ 0 and C ⊂R, define
Nt(γ;C) :=
1{Xu(t)≤−γt;Yu(t)∈C}.(7)
The limit giving the expected rate of growth,
t−1 logE(Nt(γ;R))
can be shown to exist and its value can be calculated to be
∆(γ) := inf
λ∈(λmin,0)
{E−λ + λγ}
a−1(θ − 8r)(4γ2 + θa).
An outline for this expectation calculation is given in Section 3.
It is now tempting to guess that the asymptotic speed of the spatially
left-most particle, c̃(θ), is given by
c̃(θ) := sup{γ :∆(γ)> 0}
r+ ρ+
2(2r+ ρ)2
θ− 8r
Recall that c̃(θ) = infλ∈(λmin,0) c
λ is also the minimum threshold for traveling
waves. In this particular situation, the guess that “expectation” and “almost
sure” right-most particle speeds agree was first proved rigorously using a
martingale change of measure technique in [13]. In this paper, we extend
this connection and prove that the “expected” and “almost sure” rates of
growth of particles with given speeds (Theorem 1) and given space-type
locations (Theorem 3) agree.
Theorem 1. Let γ ≥ 0 and y0 < y1. Under each P x,y law, the limit
D(γ) := lim
t−1 logNt(γ; [y0, y1])
exists almost surely and is given by
D(γ) =
∆(γ), if 0≤ γ < c̃(θ),
−∞, if γ ≥ c̃(θ).
A TYPED BRANCHING DIFFUSION 7
Note that symmetry in the process means there is a corresponding re-
sult for particles with spatial velocities greater than +γ (corresponding to
positive λ values). We may occasionally make use of such process symme-
tries without further comment. Then, since Nt(γ;R) is integer valued, the
asymptotic speed of the right-most particle follows immediately:
Corollary 2. Almost surely,
t−1 sup{Xu(t) :u ∈Nt}= c̃(θ).
This spatial growth rate result is proved in Section 10 using the martin-
gale results from Section 8. In fact, it is very easy to obtain the upper bound
by first dominating the indicator function with exponentials to reveal that
Nt(γ;R) ≤ exp{(E−λ + λγ)t}Z
λ (t), recalling that Z
λ is a convergent mar-
tingale, and then optimizing over the choice of λ. For the lower bound, we
will use a strong convergence result obtained in [12], combined with the idea
that each uniformly integrable martingale Z−λ essentially “counts” only the
particles of corresponding velocity −γ.
2.3. The asymptotic shape and growth of the branching diffusion. The
main result of this paper is the almost-sure rate of growth of particles which
are in the vicinity of −γt in space and near κ
t in type position at large
times t. For γ, κ≥ 0, it can be shown that the limit
t−1 logE(Nt(γ; [κ
t,∞)))(10)
exists and takes the value
∆(γ,κ) := inf
λ∈(λmin,0)
{E−λ + λγ − κ
2ψ+λ }
(θ− κ2)
θ(θ− 8r)(4aθγ2 + a2(θ+ κ2)2).
An outline of this expectation calculation is given in Section 3. Once again,
we will find that the “almost sure” rate of growth of particles agrees with
this “expected” rate exactly where there is growth in particle numbers.
Theorem 3. Let γ,κ ≥ 0 with ∆(γ,κ) 6= 0. Under each P x,y law, the
limit
D(γ,κ) := lim
t−1 logNt(γ; [κ
t,∞))
exists almost surely and is given by
D(γ,κ) =
∆(γ,κ), if ∆(γ,κ)≥ 0,
−∞, if ∆(γ,κ)< 0.(12)
8 Y. GIT, J. W. HARRIS AND S. C. HARRIS
To prove the tricky lower bound of Theorem 3, which amounts to the
major work of this paper, we will exhibit an explicit two-phase mechanism by
which the branching diffusion can build up at least the required exponential
number of particles near to −γt in space and κ
t in type position by large
times t.
During the first phase, over a large time t the process builds up an ini-
tial excess of approximately exp(∆(α)t) particles with spatial position at
least −αt, as is already known from Theorem 1. In this “ergodic” phase,
“typical” particles found near −αt in space will have drifted with a steady
spatial speed of α whilst their type histories will have behaved roughly like
OU processes with inward drift of µλy for a certain optimal choice λ(α) of
parameter λ.
For the second phase, we will show that the probability any individual
particle has at least one descendant that makes a “rapid ascent” in both
space and type dimensions from initial position (0,0) to final position near
(−βt,κ
t) is approximately exp(−Θ(β,κ)t), where
Θ(β,κ) =
θ(θ− 8r)(a2κ4 +4aθβ2)
,(13)
and the time taken for this “rapid ascent” is an interval [0, τ ]. We show that
this time τ can be chosen such that 2µλτ ∼ log t, and hence the additional
time is asymptotically negligible in comparison with t. Intuitively, we will
see that given an offspring that has successfully made such a difficult “rapid
ascent,” it will most likely have roughly had its type process behaving like an
OU process with an outward drift of µλy and the Brownian motion driving
its spatial motion will have had a drift λ [corresponding to a real time spatial
drift λA(y) that increases in strength as the type position y increases], for
some optimal choice λ(β,κ) of parameter λ. The precise result required will
be formulated rigorously as a large-deviation lower bound in Theorem 7
of Section 5, and is proved using a “spine” change of measure technique
intimately related to the Z+λ martingales.
Combining these two phases and using independence of the particles, we
can see that the number of particles near (−αt,0) at time t that subse-
quently proceed to have at least one descendant near (−(α + β)t, κ
t) is
approximately Poisson with mean
exp({∆(α)−Θ(β,κ)}t).
Optimizing for a fixed overall spatial speed γ, some calculus reveals that
α+β=γ
α,β>0
{∆(α)−Θ(β,κ)}=∆(ᾱ)−Θ(β̄, κ) = ∆(γ,κ),(14)
A TYPED BRANCHING DIFFUSION 9
with optimal parameters
ᾱ= γ
θ+ κ2
and β̄ = γ
θ+ κ2
.(15)
Thus we will be able to demonstrate an explicit two-phase mechanism pro-
ducing the required number of particles, with this outline argument later
guiding our rigorous proof. In addition, it is interesting to note that the
optimal choices for λ over each phase then also coincide at a single value
λ̄= λ(ᾱ) = λ(β̄, κ).
An informative large deviation heuristic for the rapid ascent can also be
found in Section 4, with this section also containing some essential optimal
path calculations. We actually prove the two-phase mechanism for the lower
bound of Theorem 3 in Section 5, although we defer proving the large-
deviation lower bound until Section 7 after presenting the necessary “spine”
background in Section 6.
We prove the upper bound for the space-type growth rate in Section 9,
again making crucial use of martingale results from Section 8. Similarly to
the spatial growth case, we can find an upper bound using the Z+λ mar-
tingales, that is to say Nt(γ; [κ
t,∞)) ≤ exp{(E+λ + λγ − κ2ψ
λ )t}Z
λ (t).
However, as each Z+λ martingale converges to zero, we must show that its
exponential decay rate is (E+λ −E
λ ) before being able to optimize over the
choice of λ to obtain the required upper bound.
Given Theorem 3, and noting symmetries, it becomes straightforward to
retrieve the following:
Corollary 4. For any F ⊂R2, define
Nt(F ) :=
1{(Xu(t)/t,Yu(t)/
t)∈F}.
If B ⊂R2 is any open set and C ⊂R2 is any closed set, then almost surely
under any P x,y
lim inf
logNt(B)≥ sup
(γ,κ)∈B
D(γ,κ),
lim sup
logNt(C)≤ sup
(γ,κ)∈C
D(γ,κ),
with the growth rate D(γ,κ) given at equation (12).
We can also recover the almost sure asymptotic shape of the region occu-
pied by the particles in the branching diffusion.
10 Y. GIT, J. W. HARRIS AND S. C. HARRIS
Corollary 5. Let B ⊂R2 be any open set. Almost surely, under each
P x,y law,
Nt(B)→
0, if S ∩B =∅,
+∞, if S ∩B 6=∅,
where S ⊂R2 is the set given by
S := {(γ,κ) ∈R2|∆(γ,κ)> 0}.
3. Some expectation calculations. This section discusses how the ex-
pected growth rates given in the previous section may be obtained. For this,
we use the “many-to-one” lemma (see, e.g., [8]) and one-particle changes of
measure. In the process we shall start to gain valuable intuition into how
particles within the branching diffusion behave, as well as seeing hints as to
which are the “correct” martingales to use to prove the almost-sure growth
rate results.
For simplicity, we assume throughout this section that the branching dif-
fusion starts with one particle at the origin in both space and type at time
zero, unless otherwise stated. We also introduce a family of single particle
probability measures Pµ,λ with associated expectations Eµ,λ where, under
Pµ,λ, η is an Ornstein–Uhlenbeck process with variance θ and drift µ, and
ξt =B(
0 A(ηs)ds) where B is a Brownian motion with drift λ.
Lemma 6 (Many-to-one). If f :R2 7→R is Borel measurable then
f(Xu(t), Yu(t)) = Eθ/2,0
R(ηs)ds
f(ξt, ηt)
.(16)
Using the many-to-one lemma, and changing measure to alter the drift of
Brownian motion, we see that
f(Xu(t), Yu(t))
= Eθ/2,0
R(ηs)ds
f(ξt, ηt)
= Eθ/2,0
e−λξt exp
R(ηs) +
A(ηs)
× f(ξt, ηt) · eλξt−λ
A(ηs)ds
= Eθ/2,λ
−λξt +
R(ηs) +
A(ηs)
f(ξt, ηt)
A TYPED BRANCHING DIFFUSION 11
To perform a further change of measure on the OU process to get rid of the
time integrals in the exponential of the expectation, we recall that
dPµλ,·
dPθ/2,·
µλ,θ/2
:= exp
ψ−λ η
t −E−λ t+
R(ηs) +
λ2A(ηs)
and then
f(Xu(t), Yu(t))
= Eθ/2,λ(exp(−λξt −ψ−λ η
λ t)f(ξt, ηt) ·M
µλ,θ/2
t )(17)
= Eµλ,λ(exp(−λξt − ψ
λ t)f(ξt, ηt)).
Note that the many-to-one lemma, combined with the branching prop-
erty, immediately suggests how to get “additive” martingales for the branch-
ing diffusion from single particle martingales—for example, taking f(x, y) =
exp{λx+ ψ−λ y2} in equation (17) quickly leads to the martingale Z
We may now proceed to calculate the expected growth rates. However,
for both clarity and brevity we will leave rigorous details to the interested
reader, noting that the intuition we will gain from our rough calculations
will later be invaluable in guiding our rigorous proof of the corresponding
almost-sure growth rates.
3.1. The expected rate of growth along spatial rays. We first give the
outline of some calculations to find the rate of growth in the expected number
of particles near −γt in space at time t.
Using the formula from (17), for λ ∈ (λmin,0) and any ε > 0 we have
1{t−1Xu(t)+γ∈(−ε,ε)}
= Eµλ,λ(e
−λξt−ψ−λ η
1{t−1ξt+γ∈(−ε,ε)})
≤ e(E
+λγ−λε)tEµλ,λ
+ γ ∈ (−ε, ε)
≥ e(E
+λγ+λε)tEµλ,λ
η2t ;
+ γ ∈ (−ε, ε)
where, with some abuse of notation that we shall continue to use throughout
this section, we will abbreviate this to
1{Xu(t)∼−γt} = Eµλ,λ(e
−λξt−ψ−λ η
1{ξt∼−γt})
∼ e(E
+λγ)t
Eµλ,λ(e
t ; ξt ∼−γt)
12 Y. GIT, J. W. HARRIS AND S. C. HARRIS
with the understanding that any subsequent arguments to identify expo-
nential growth rates can readily be made rigorous by using the appropriate
upper and lower bounds, and so on.
Now, considering E−(λ) :=E−λ as a function of λ, we have from (8) that
∆(γ) = infλ∈(λmin,0){E−(λ) + λγ}=E−(λγ) + λγγ, where λγ satisfies
(λγ) =−γ, hence λγ =−γ
(θ− 8r)
θa2 +4aγ2
.(19)
Of course, choosing this optimal λγ value in (18) means that we must have si-
multaneously maximized the expectation Eµλ,λ(exp(−ψ
t ); ξt ∼−γt), and
to confirm that this value is not exponentially decaying in t is now relatively
straightforward. Under Pµλ,λ, η is an Ornstein–Uhlenbeck process with an
invariant measure given by the probability density, φλ, of the normal distri-
bution N(0, θ/(2µλ)); and ξt =B(
0 A(ηs)ds), where B is a BM with drift
λ. Note also that by differentiating
〈(Qθ + (1/2)λ2A+R−E−λ )v
λ , v
λ 〉φ = 0
with respect to λ, using self-adjointness, and observing that φλ ∝ (v−λ )2φ,
we find that
〈λAv−λ , v
〈v−λ , v
A(y)φλ(y)dy.
Then almost surely under Pµλ,λ,
0 A(ηs)ds)
0 A(ηs)ds
0 A(ηs)ds
Aφλ dy =
,(20)
and so when we use the optimal λγ value we get exactly the desired drift,
since ∂E
(λγ) =−γ. Then
Eµλγ ,λγ (e
η2t ; ξt ∼−γt) → lim
Eµλγ ,λγ(e
η2t )
φλγ (y)dy.
In this way, we can obtain the exact rate of exponential growth for the
expectation,
t−1 logE(Nt(γ;R)) = ∆(γ).
The changes of measure used above are actually suggesting a great deal
about the dominant particles that are found in the vicinity of a given ray
in space. An alternative discussion of this expectation result, involving a
dual approach via large deviation theory for occupation densities, can also
be found in [13].
A TYPED BRANCHING DIFFUSION 13
3.2. The expected asymptotic shape. We give a rough outline of calcula-
tions that will yield the correct exponential growth in the expected number
of particles both near −γt in space and κ
t in type at large times t. Using
the formula from (17) and abusing notation throughout in the same way as
Section 3.1, we find that
1{Xu(t)∼−γt;Yu(t)≥κ
= Eµλ,λ(e
−λξt−ψ−λ η
1{ξt∼−γt;ηt≥κ
∼ e(E
+λγ−κ2ψ−
Pµλ,λ(ξt ∼−γt;ηt ≥ κ
Now, from standard bounds on the tail of the normal distribution,
Pµλ,λ(ξt ∼−γt;ηt ≥ κ
= Pµλ,λ(ηt ≥ κ
t)Pµλ,λ(ξt ∼−γt|ηt ≥ κ
t)(21)
∼ e−µλ/θκ2tPµλ,λ(ξt ∼−γt|ηt ≥ κ
and, since ψ−λ + (µλ/θ) = ψ
λ , this yields
1{Xu(t)∼−γt;Yu(t)≥κ
∼ e(E
+λγ−κ2ψ+
Pµλ,λ(ξt ∼−γt|ηt ≥ κ
Recalling that ∆(γ,κ) := infλ∈(λmin,0){E
λ +λγ−κ2ψ
λ }, simple calculus re-
veals this infimum is attained at a λ value of
λ̄(γ,κ) =−γ
θ(θ− 8r)
a2(κ2 + θ)2 + 4aγ2θ
∈ (λmin,0),(23)
and using this optimal value in equation (22) will lead to the upper bound
limsup
t−1 logE
1{Xu(t)∼−γt;Yu(t)≥κ
t} ≤∆(γ,κ).
It is also clear from equation (22) that when minimizing E−λ + λγ − κ2ψ
we simultaneously maximize the probability Pµλ,λ(ξt ∼ −γt |ηt ≥ κ
t). In
particular, to get a matching lower bound, we do not want this probability
to have any exponential decay in time when we choose the optimal parameter
for λ.
In fact, at least up to the exponential decay rate in time, it can be shown
using large-deviations arguments that
Pµλ̄,λ̄
(ξt ∼−γt;ηt ≥ κ
t)∼ exp
14 Y. GIT, J. W. HARRIS AND S. C. HARRIS
Indeed, we immediately gain the required upper bound from (21). For the
lower bound, consider the following heuristics where we break paths into two
sections: normal ergodic behavior over large time period [0, t] followed by a
rapid ascent out to type position κ
t over a much shorter period [t, t+ τ ].
(i) Ergodic behavior. Over a large time t, the occupation density of η
will most likely have settled close to the invariant measure. Hence for large
t, almost surely under Pλ,µλ ,
η2s ds→
(ii) Rapid ascent. Over a large time τ , but where τ = o(t), the probability
that η starts close to the origin and ends near to κ
t, having followed close
to the path y over the entire time period τ , is roughly given by
{ẏ(s) + µλy(s)}2 ds
under the Pλ,µλ law. See, for example [24], Chapter 6, or [4], Chapter 5.6.
After some Euler–Lagrange optimization, the path
y(s) = κ
sinhµλs
sinhµλτ
gives
0 y(s)
2 ds≈ κ2t/(2µλ), with the probability of this path being roughly
exp(−(µλ/θ)κ2t).
Combining these two types of behavior, we can find paths with final positions
ηt+τ ∼ κ
t, ξt+τ ∼ λa
∫ t+τ
η2s ds∼ λa
and, moreover, when substituting the optimal λ value of λ̄(γ,κ) and sim-
plifying, this actually gives ξt+τ ∼ −γt. Further, one of these paths occurs
with a probability of roughly exp(−(µλ̄/θ)κ2t), and note that t+ τ ∼ t since
τ = o(t). Thus we see that to exponential order, the probability Pµλ̄,λ̄
(ξt ∼
−γt;ηt ≥ κ
t) must be at least exp(−(µλ̄/θ)κ2t), as required.
This heuristic argument can be made rigorous to prove, as claimed, that
t−1 logE
1{Xu(t)≤−γt;Yu(t)≥κ
=∆(γ,κ).
If we scale all spatial coordinates by t−1 and all type coordinates by
t)−1 at time t, the expected asymptotic shape can be considered to be the
region S := {(γ,κ) :∆(γ,κ) ≥ 0} where, on average, we have growth in the
numbers of (scaled) particles.
A TYPED BRANCHING DIFFUSION 15
4. Short climb large deviation heuristics. In this section, we give a heuris-
tic calculation that suggests why the probability a single particle manages
to have at least one descendant in the vicinity of (−βt,κ
t ) near time τ
is roughly exp(−Θ(β,κ)t) for very large t, where Θ(β,κ) is given at equa-
tion (13). For these heuristics, we will think of τ as large and fixed, but of
smaller order than t (later on, in our rigorous approach, we will choose τ
proportional to log t). We emphasize that the heuristics in this section are
neither meant to be precise nor made rigorous, yet they will provide invalu-
able intuition, guidance and motivation for our rigorous approach later on.
Of particular importance will be the optimization problem that the heuris-
tics suggest. Indeed, many of the exact calculations in Sections 4.2 and 4.3
will be essential later in the paper.
Suppose we start the branching diffusion with a single particle at (0,0).
First, we wish to know the probability that there is at least one particle at
time τ that has a spatial position near −βt having followed close to the path
x(s) for 0≤ s≤ τ and a type position near κ
t having closely followed the
path y(s) for 0≤ s≤ τ for t arbitrarily large.
We recall from large deviation theory of Ventcel–Freidlin (see [24], Chap-
ter 6, or [4], Chapter 5.6) that the probability a single particle manages to
follow closely both the type path y(s) and the spatial path x(s) for 0≤ s≤ τ
is roughly given by
ẏ(s) +
ds− 1
ẋ(s)2
ay(s)2
when x(0) = 0, x(τ) = −βt, y(0) = 0, y(τ) = κ
t and t is very large. This
probability will typically be very small, but if such paths are followed by
particles in the branching diffusion, we have to also take account of the
large breeding rates that are found far from the type origin.
If we let X(s) represent the numbers of particles in the branching diffusion
that are alive at time s and have traveled “close” to the path (x(u), y(u))
for 0 ≤ u ≤ s, then we can get a rough idea of how X might behave by
considering the following birth–death process.
4.1. A birth–death process. For given fixed paths x(·) and y(·), let M be
a time-dependent birth–death process where at time s particles either give
birth to single offspring with breeding rate λ(s) given by
λ(s) = ρ+ ry(s)2,
or particles die with death rate µ(s) given by
µ(s) =
ẏ(s) +
ẋ(s)2
ay(s)2
16 Y. GIT, J. W. HARRIS AND S. C. HARRIS
(Note that the probability the initial particle of this birth–death process sur-
vives the entire time period [0, τ ] is consistent with the rough large deviation
probability for the branching diffusion at equation (24).)
An important quantity is the effective total death rate up to time t which
is defined by ν(s) :=
0 {µ(w)− λ(w)}dw, so here
ν(s) = J(x, y, s)
ẏ(w) +
ẋ(w)2
ay(w)2
− ry(w)2 − ρ
The distribution for total number of offspring surviving, M(τ), for the
time-dependent birth–death process is well known, for example, see [14].
Then defining
Wτ := e
−ν(τ)
µ(s) eν(s) ds
Uτ := 1− e−ν(τ)W−1τ ,
Vτ := 1−W−1τ ,
we have
P(M(τ) = 0) = Uτ ,
P(M(τ) = n) = (1−Uτ )(1− Vτ )V n−1τ , n= 1,2, . . .
with EM(τ) = e−ν(τ) and E(M(τ)|M(τ)≥ 1) =Wτ .
In our particular case, we have
E(M(τ)) = exp(−J(x, y, τ)).
Define the largest effective total death rate prior to time τ by
L(x, y, τ) := sup
s∈[0,τ ]
J(x, y, s)≥ 0.
If we are in a case when L(x, y, τ) is very large, suggesting a high chance of
extinction, then
P(M(τ)≥ 1) = 1
0 µ(s)e
ν(s) ds
∼Kτ exp(−L(x, y, τ)),(25)
where K−1τ :=
0 µ(s) exp(−{L(x, y, τ) − J(x, y, s)})ds. If there is at least
one particle alive, we would then expect to have
E(M(τ)|M(τ)≥ 1)∼K−1τ exp(L(x, y, τ)− J(x, y, τ)).
Thus, we might guess that the probability any particles in the branching
diffusion manage to make the difficult, rapid ascent along path (x, y) to finish
up near (−βt,κ
t ) can, very roughly, be estimated by exp(−L(x, y, τ)). [To
A TYPED BRANCHING DIFFUSION 17
help see this, try writing x(s) = tf(s) and y(s) =
tg(s), thinking of f, g as
fixed paths and recall that t is very large and τ = o(t), then the role of Kτ
in (25) is insignificant next to exp(−L(x, y, τ)).]
We might then further guess that the chance any particles manage to
stay near position (−βt,κ
t) during a very small interval of time close to τ
should roughly look like
− inf
L(x, y, τ)
where we permit all possible paths x and y satisfying x(0) = 0, x(τ) =−βt
and y(0) = 0, y(τ) = κ
t for the fixed time τ . (We will state and prove a
precise lower bound that corresponds to this guess at Theorem 7.)
4.2. Finding the optimal path and probability. We proceed to calculate
L(x, y, τ)
over paths x and y satisfying x(0) = 0, x(τ) =−βt and y(0) = 0, y(τ) = κ
for the fixed time τ .
We first note that
L(x, y, τ) = inf
s∈[0,τ ]
J(x, y, s)≥ inf
J(x, y, τ)(26)
and we now proceed to calculate infx,y J(x, y, τ).
We can easily optimize over the choice of function x given y, finding that
ẋ(s)∝ ay(s)2 ⇒ x(s) = λa
y(u)2 du
where λ is the constant of proportionality and must satisfy
0 y(s)
,(27)
yielding
ẋ(s)2
ay(s)2
0 y(s)
This is exactly as anticipated since, when following the path y in type space,
the spatial position of a particle is following a Brownian motion with total
amount of variance over period τ given by a
0 y(s)
2 ds. Hence, the proba-
bility that a particle following the path y in type space will also be found
near to βt in space at time τ is roughly
( −β2t2
0 y(s)
18 Y. GIT, J. W. HARRIS AND S. C. HARRIS
Introducing the notation
I(y) :=
ẏ(s) +
− ry(s)2
we are left to find
I(y) +
0 y(s)
= inf
I(y)− 1
y(s)2 ds− λβt
≥ sup
I(y)− 1
y(s)2 ds− λβt
where the first equality is trivially true by maximizing the quadratic in
λ, the introduction of which conveniently removes the awkward integral in
the denominator. Some further Euler–Lagrange optimization now gives the
optimal path as
yλ(s) = κ
sinhµλs
sinhµλτ
(0≤ s≤ τ),(29)
where
θ(θ− 8r− 4aλ2)
and then
I(y)− 1
y(s)2 ds− λβt
= sup
cothµλτ
− λβt
The optimal parameter choice λ̂ (which depends on τ as well as the model
parameters) then satisfies
= κ2t
cothµ
2 sinh2 µ
(s)2 ds.(30)
Then we have shown that
0 yλ̂(s)
≥ inf
I(y) +
0 y(s)
= inf
I(y)− 1
y(s)2 ds− λβt
≥ sup
I(y)− 1
y(s)2 ds− λβt
≥ I(y
(s)2 ds− λ̂βt,
A TYPED BRANCHING DIFFUSION 19
and, in fact, we see that the left- and right-hand sides of the above are equal
by recalling (30). It follows that the preceding supremum and infimum can
be freely interchanged, actually preserving equality at the inequality (28).
Then, with the optimal spatial path
xλ(s) := λa
yλ(u)
2 du=−βt sinh2µλs− 2µλs
sinh2µλτ − 2µλτ
,(31)
and defining x̂ := x
, ŷ := y
, we have
J(x, y, τ) = J(x̂, ŷ, τ)
= t sup
cothµλτ
cothµ
− λ̂β
− ρτ.
Finally, it is easy to check that J(x̂, ŷ, τ) = L(x̂, ŷ, τ), whence
J(x, y, τ)≥ inf
L(x, y, τ),
and, combining with equation (26), we have found that
L(x, y, τ) = inf
J(x, y, τ) = J(x̂, ŷ, τ).
4.3. An important note on the optimal paths. As τ →∞, we have
cothµλτ
↑ sup
{κ2ψ+λ − λβ}= κ
− λ̄ β,
where the optimizing parameters of the supremums also converge with
λ̂→ λ̄=−β
θ(θ− 8r)
a2κ4 + 4aθβ2
κ2 + θ
.(32)
Note the agreement with previous optimal values at equations (23) and (15).
Then letting
Θ(β,κ) := sup
{κ2ψ+λ − λβ}
θ(θ− 8r)(a2κ4 +4aθβ2)
and writing x̄ := xλ̄ and ȳ := yλ̄, we note that for all ε, δ > 0 there exist
τ̃ , µ > 0 such that for all t > 0 and τ > τ̃
− inf
J(x, y, τ)
≥ exp(−J(x̄, ȳ, τ))
= exp
cothµλ̄τ
− λ̄β
≥ exp(−t(Θ(β,κ) + ε)).
20 Y. GIT, J. W. HARRIS AND S. C. HARRIS
Further (when κ > 0), for all s ∈ [τ − µ, τ ],
ȳ(s)≥ (κ− δ)
t, x̄(s)≤−(β − δ)t.
In particular, the paths stay close to the required positions for some fixed
length of time with corresponding probability at least as large as required.
5. Proof of Theorem 3. Lower bound. In this section we will state a pre-
cise short climb probability result and show how to combine it with almost
sure spatial (only) growth rates to prove the lower bound of the growth rate
in Theorem 3. This will make rigorous the two-phase mechanism described
in Section 2 and suggested by the expectation calculations in Section 3.
The first phase requires knowledge of the almost-sure rates of growth of
particles in the spatial dimension only. To this end, we will already make full
use of Theorem 1 throughout this section, deferring its proof until Section 10.
The second phase requires a lower bound for the probability that a single
particle makes a rapid ascent in type-space over the time interval [0, τ ]. This
is the lower bound found in the heuristics of Section 4, but we require some
further notation before the precise result can be stated. Note, throughout
this section, we will only be interested in the optimal parameter value λ= λ̄
as introduced in Section 4.3.
We wish to fix the relationship between sufficiently large t and τ as
θ/(2µλ̄) e
µλ̄τ = κ
t(34)
and so define τ = τ(t) by
τ(t) :=
(2µλ̄)
−1 log(2µλ̄t/θ), for 2µλ̄t > θ,
0, otherwise.
Recall the optimal paths (x̄, ȳ) over s ∈ [0, τ ], where
ȳ(s) = κ
sinhµλ̄s
sinhµλ̄τ
,(36)
x̄(s) = aλ̄
ȳ(w)2 dw=−βt sinh2µλ̄s− 2µλ̄s
sinh2µλ̄τ − 2µλ̄τ
,(37)
with fixed end points ȳ(τ) = κ
t and x̄(τ) =−βt.
For large times t and δ, ε > 0, let
t (u) :=
s∈[0,τ(t)]
|Yu(s)− ȳ(s)|< ε
t; sup
s∈[0,τ(t)]
|Xu(s)− x̄(s)|< δt
.(38)
We will use the notation
u∈Nτ(t)
t (u)(39)
A TYPED BRANCHING DIFFUSION 21
for the event that there exists a particle in the branching diffusion that
makes the short climb. Finally, recalling Θ(β,κ) given at (33), we can now
state the short climb theorem:
Theorem 7. Fix any y1 > y0 > 0, x ∈ R, and let ε0 > 0. Then for any
ε, δ > 0, there exists T > 0 such that for all y ∈ [y0, y1],
t−1 logP x,y(Aε,δt )≥−(Θ(β,κ) + ε0)
for all t > T .
We will prove Theorem 7 using a spine change of measure. This requires
us to introduce the notation for the spine set-up in detail before proceeding,
so this and further technical issues are postponed to Sections 6 and 7.
Remark 8. We note that Theorem 7 is actually a stronger result than
needed to prove Theorem 3 because we identify the specific paths followed
by particles that are near position (βt,κ
t ) at time t+ τ , rather than just
considering the particle’s positions close to time t+ τ .
In combining the two phases, we will have a huge number of independent
trials each with a small probability of success, intuitively giving rise to a
Poisson approximation for a large number of successful particles. In fact,
in our proof of the lower bound of Theorem 3 below, we will actually use
the following result about the behavior of sequences of sums of independent
Bernoulli random variables.
Lemma 9. For each n, define the random variable Bn :=
u∈Fn 1En(u)
where the events {En(u) :u ∈ Fn} are independent. Let pn(u) := P (En(u))
and Sn :=
u∈Fn pn(u) and suppose that, for some ν ∈ (1/2,1),
(Sn)2ν−1
<∞.(40)
Then the sequence of (possibly dependent) random variables {B1,B2, . . .}
has |Bn − Sn|> (Sn)ν for only finitely many n, almost surely.
In particular, for any ε > 0, there exists some (random) N ∈N such that,
with probability one,
> 1− ε for all n >N.(41)
Proof. For ν ∈ (1/2,1), Chebyshev’s inequality yields
P(|Bn − Sn|> (Sn)ν)≤
u∈Fn pn(u)(1− pn(u))
2ν−1 ,
22 Y. GIT, J. W. HARRIS AND S. C. HARRIS
and hence the Borel–Cantelli lemmas, combined with hypothesis (40), imply
|Bn − Sn|> (Sn)ν
for only finitely many n, almost surely. Equation (41) now follows on division
by Sn, and noticing the assumption (40) implies that limn→∞Sn =∞. �
Proof of Theorem 3. Lower bound. Define f−1(t) := t− τ(t), not-
ing that both f(t)/t→ 1 and f−1(t)/t→ 1 as t→∞. Also, for n ∈ N and
µ > 0, define Tn := (n+ 1)µ. We want to estimate the number of particles
that are near the large position (−(α + β)Tn, κ
Tn) during time interval
[Tn−1, Tn]. For this, we will consider particles that travel with a velocity
−α over time period [0, f−1(Tn)] before commencing their rapid ascent of
(relatively short) duration τ(Tn) to be in final position at time Tn. Then
s∈[Tn−1,Tn]
Ns((α+ β − δ)Tn; [(κ− δ)
Tn,∞))
u∈NTn
s∈[Tn−1,Tn]
{Xu(s)≤−(α+β−δ)Tn ;Yu(s)≥(κ−δ)
Tn}}(42)
u∈Fαn
1{N̄β,κn (u)>0}
where
Fαn := {u ∈Nf−1(Tn) :Xu(f
−1(Tn))≤−αTn, Yu(f−1(Tn)) ∈ [y0, y1]}
and, for u ∈ Fαn ,
N̄β,κn (u) :=
v∈NTn
s∈[Tn−1,Tn]
{Xv(s)−Xv(f−1(Tn))≤−(β−δ)Tn ;Yv(s)≥(κ−δ)
Tn}}.
We will now show that the sum at (42) grows as fast as anticipated:
Lemma 10. For any ε > 0, we may choose µ > 0 such that there exists
a random N ∈N where
u∈Fαn
1{N̄β,κn (u)>0}
≥∆(α)−Θ(β,κ)− ε
for all n >N with probability one.
Proof. We will be able to apply Lemma 9 given sufficient information
about the growth of |Fαn | and decay of the probabilities
pβ,κn (u) := P (N̄
n (u)> 0|Ff−1(Tn)),
A TYPED BRANCHING DIFFUSION 23
where u ∈ Fαn ⊂Nf−1(Tn).
It follows easily from Theorem 1, f−1(Tn)/Tn → 1 and the continuity of
∆(α) that
log |Fαn |
≥∆(α)− ε
for all sufficiently large n.
The definition of N̄β,κn (u) and spatial translation invariance implies that,
for each u ∈ Fαn , the rapid ascent probability pβ,κn (u) depends only on the
initial type position Yu(f
−1(Tn)).
For δ,µ > 0, define
t (u) :=
s∈[τ(t)−µ,τ(t)]
{Xu(s)−Xu(0)<−(β − δ)t;Yu(s)≥ (κ− δ)
u∈Nτ(t)
t (u).(43)
Recalling the comments of Section 4.3, there exist ε′, δ′ > 0 and we may
choose µ> 0 sufficiently small, such that
pβ,κn (u) = P
0,Yu(f
−1(Tn))(B
)≥ P 0,Yu(f−1(Tn))(Aε
) =: p̄n(u)
for all u ∈ Fαn whenever n is sufficiently large. Together with Theorem 7 and
since Yu(f
−1(Tn)) ∈ [y0, y1] for u ∈ Fαn , this reveals
log pβ,κn (u)
≥ log p̄n(u)
≥−Θ(β,κ)− ε
for all for u ∈ Fαn and all sufficiently large n, almost surely. Then we may
combine the observations above to obtain
u∈Fαn
pβ,κn (u)≥∆(α)−Θ(β,κ)−
Taking this last line together the assertion of Lemma 9 at equation (41)
gives the result. �
It is now straightforward to combine Lemma 10 with the inequality at
(42) to see that, given ε, δ > 0, there exists µ > 0 and a random time T such
t−1logNt((α+ β − δ)t; [(κ− δ)
t,∞))≥∆(α)−Θ(β,κ)− ε
for all t > T , almost surely. Since ε and δ can be taken arbitrarily small,
using the optimal ᾱ and β̄ according to equations (14)–(15), we find
lim inf
t−1 logNt(γ, [κ
t,∞))≥∆(γ,κ) almost surely,
24 Y. GIT, J. W. HARRIS AND S. C. HARRIS
as required. [It is also interesting to note that λ̄= λᾱ = λ̄(γ,κ) from equa-
tions (19), (23) and (32), so the optimal parameters are in agreement with
those of the expectation calculations in Section 3 and the path large devia-
tions in Section 4.] �
6. The “spine” setup and results. In this section, we describe how to
construct an enriched branching diffusion with an identified “spine” or “back-
bone” particle and discuss how to perform some extremely useful changes
of measure (closely related to the additive martingales) that will essentially
“force” the spine perform the short climb, whilst giving birth at an accel-
erated rate to offspring that behave as if under the original measure. These
spine techniques are at the very heart of our proof of Theorem 7 in Sec-
tion 5. Spine ideas were first seen for branching Brownian motion in [3] and
developed for Galton–Watson processes in [16, 18, 19]. Kyprianou [17] and
Englander and Kyprianou [6], developed the technique for some families of
branching diffusions; and more recently the spine approach has been signif-
icantly improved in [8]. This approach uses several different filtrations on
an enlarged probability space carrying the branching diffusion, and permits
some very useful techniques and results to be developed. For example, “addi-
tive” (many-particle) martingales can be represented as suitable conditional
expectations of “spine” (single-particle) martingales and consequently there
are clear interpretations for any changes of measure and all measures in-
volved in our “spine” setup are probability measures with intuitive construc-
tions. Following Hardy and Harris [8], we will first outline the notation and
then describe the changes of measure. The notation described in this section
is generalized to allow each particle u to have 1 +Au offspring, where each
Au is an independent copy of a random variable with values in {0,1,2, . . .}.
The spine techniques developed in this paper could readily be generalized
to such models.
All probability measures are to be defined on the space T̃ of marked
Galton–Watson trees with spines; before defining precisely what this space
is we need to set up some other notation. We recall the set of Ulam–Harris
labels, Ω, defined by Ω := {∅}∪
n∈N(N)
n, where N := {1,2,3, . . .}. For two
words u, v ∈Ω, uv denotes the concatenated word, where we take u∅=∅u=
u. So Ω contains elements such as “∅412,” which represents “the individual
being the 2nd child of the 1st child of the 4th child of the initial ancestor
∅.” For labels u, v ∈Ω the notation v < u means that v is an ancestor of u,
and |u| denotes the length of u.
We define a Galton–Watson tree to be a set τ ⊂Ω such that:
(i) ∅ ∈ τ , so there is the unique initial ancestor;
(ii) if u, v ∈Ω, then vu ∈ τ ⇒ v ∈ τ , so τ contains all of the ancestors of
its nodes;
A TYPED BRANCHING DIFFUSION 25
(iii) for all u ∈ τ , there exists Au ∈ {0,1,2, . . .} such that for j ∈N, uj ∈ τ
if and only if 1≤ j ≤ 1 +Au.
The set of all such trees is T, and we will use the symbol τ for a particular
tree. As our work concerns branching diffusions we shall often refer to the
labels of τ as particles. Note that for the binary branching mechanism in this
paper, P (Au = 1) ≡ 1; of course, here there is only one τ ∈ T—the binary
tree.
A Galton–Watson tree by itself only records the family structure of the
individuals, so to each individual u ∈ τ we give a mark (Xu, Yu, σu) which
contains the following information:
• σu ∈ [0,∞) is the lifetime of particle u, which also determines the fission
time of the particle as Su :=
v≤u σv . We may also refer to the Su as
death times;
• the function Xu(t) : [Su − σu, Su)→R describes the particle’s spatial mo-
tion in R during its lifetime;
• the function Yu(t) : [Su− σu, Su)→R describes the evolution of the parti-
cle’s type in R during its lifetime.
For clarity we must decide whether or not a particle is in existence at its
death time: our convention will be that a particle dies “infinitesimally be-
fore” its death time—this is why Xu and Yu are defined on [Su−σu, Su) and
not [Su−σu, Su]—so that at time Su the particle u has disappeared and has
been replaced by its two children.
We denote a particular marked tree by (τ,X,Y,σ), or the abbreviation
(τ,M), and the set of all marked Galton–Watson trees by T . For each
(τ,X,Y,σ) ∈ T , the set of particles alive at time t is defined as Nt := {u ∈
τ :Su − σu ≤ t < Su}. For any given marked tree (τ,M) ∈ T we can distin-
guish individual lines of descent from the initial ancestor: ∅, u1, u2, u3, . . . ∈
τ , where ui is a child of ui−1 for all i ∈ {2,3, . . .} and u1 is a child of the
initial individual ∅. We call such a line of descent a spine and denote it by
ξ. In a slight abuse of notation we refer to ξt as the unique node in ξ that is
alive at time t, and also for the position of the particle that makes up the
spine at time t; that is, ξt :=Xu(t), where u ∈ ξ∩Nt. However, although the
interpretation of ξt should always be clear from the context, we introduce
the following notation for use where some ambiguity may still arise:
• nodet((τ,M, ξ)) := u if u ∈ ξ is the node in the spine alive at time t.
It is natural to think of the spine as a single diffusing particle ξt, or, strictly
speaking, the pair (ξt, ηt), where ηt is the type of the spine at time t.
We define nt to be a counting function that tells us which generation of
the spine is currently alive, or equivalently the number of fission times there
have been on the spine:
nt = |nodet(ξ)|.
26 Y. GIT, J. W. HARRIS AND S. C. HARRIS
The collection of all marked trees with a distinguished spine is the space
T̃ on which our probability measures will eventually be defined, but first we
define four filtrations on this space that contain different levels of information
about the branching diffusion.
• Filtration (Ft)t≥0. We define a filtration of T̃ made up of the σ-algebras
Ft := σ((u,Xu, Yu, σu) :Su ≤ t;
(u,Xu(s), Yu(s) : s ∈ [Su − σu, t]) : t ∈ [Su − σu, Su)),
which means that Ft is generated by the information concerning all par-
ticles that have lived and died before time t, and also those that are
still alive at time t. Each of these σ-algebras is a subset of the limit
F∞ := σ(
t≥0Ft).
• Filtration (F̃t)t≥0. We define the filtration (F̃t)t≥0 by augmenting the
filtration Ft with the knowledge of which node is the spine at time t; that
is, (F̃t)t≥0 := σ(Ft,nodet(ξ)) and F̃∞ := σ(
t≥0 F̃t), so that this filtration
knows everything about the branching diffusion and everything about the
spine.
• Filtration (Gt)t≥0. (Gt)t≥0 is a filtration of T̃ defined by Gt := σ(ξs : 0 ≤
s≤ t), and G∞ := σ(
t≥0 Gt). These σ-algebras are generated only by the
spine’s motion and so do not contain the information about which nodes
of the tree τ make up the spine.
• Filtration (G̃t)t≥0. As we did in going from Ft to F̃t we create (G̃t)t≥0
from (Gt)t≥0 by including knowledge of which nodes make up the spine:
(G̃t)t≥0 := σ(Gt,nodet(ξ)) and G̃∞ := σ(
t≥0 G̃t). This means that G̃t also
knows when the fission times on the spine occurred, whereas Gt does not.
Now that we have defined the underlying space and filtrations, we can de-
fine the probability measures of interest. We let the typed branching diffusion
be as described in Section 1.1, with the probability measures {P x,y :x, y ∈R}
on (T̃ ,F∞) representing the law of this typed branching diffusion when ini-
tially started with a single particle at (x, y).
We recall from [18] that, if f is an F̃t-measurable function, we can write
fu1{ξt=u},(44)
where fu is Ft-measurable. Now we can extend P x,y to a measure P̃ x,y on
(T̃ , F̃∞) by choosing the particle that continues the spine uniformly each
time there is a birth on the spine; more precisely, for any f ∈ mF̃t with
representation like (44), we have:
f dP̃ x,y(τ,M, ξ) :=
dP x,y(τ,M).
A TYPED BRANCHING DIFFUSION 27
We construct the F̃t-measurable martingale ζ̃(t) as
ζ̃(t) := v+λ (ηt)e
{R(ηs)+1/2λ2A(ηs)}ds−E+λ t × 2nte−
R(ηs)ds
× eλξt−1/2λ
A(ηs)ds(45)
= v+λ (ηt)2
nteλξt−E
Observe that this is a product of single-particle martingales, details of which
can be found in [17] or [10]. One can think of these as h-transforms of the P̃ -
law of the spine: the first makes η an outward-drifting Ornstein–Uhlenbeck
process with drift parameter µλ; the second increases the breeding rate on
the spine to 2R(·); and the third adds a spatial drift to ξ.
Using the martingale ζ̃(t) we may define a measure Q̃
λ on (T̃ , F̃∞) by
dP̃ x,y
ζ̃(t)
ζ̃(0)
v+λ (y)
v+λ (ηt)2
nteλξt−E
t.(46)
And since ζ̃(t) is a product of h-transforms, under Q̃
λ the process may be
re-constructed path-wise according to the following description:
• starting from spatial position x and type y the spine (ξt, ηt) diffuses spa-
tially as a Brownian motion with infinitesimal variance A(ηt) and infinites-
imal drift λA(ηt);
• the type of the spine, ηt, begins at y and moves in type space as an
outward-drifting Ornstein–Uhlenbeck process with generator
+ µλy
• the spine branches at rate 2R(ηt), producing 2 particles;
• one of these particles is selected uniformly at random;
• the chosen offspring repeats stochastically the behavior of its parent;
• the other offspring particle initiates a P ·,·-BBM from its birth position
and type.
The change of measure (46) projects onto the sub-algebra Ft as a condi-
tional expectation:
dP̃ x,y
v+λ (y)
P̃ x,y(v+λ (ηt)2
nteλξt−E
t|Ft),
and it is a short calculation using the methods of, for example, Hardy and
Harris [10] to show that:
28 Y. GIT, J. W. HARRIS AND S. C. HARRIS
Theorem 11. If we define Q
λ := Q̃
λ |F∞ , then Q
λ is a measure on
F∞ that satisfies
dP x,y
= Ẑ+λ (t) :=
Z+λ (t)
Z+λ (0)
Moreover under Q
λ , the path-wise construction of the branching diffusion
is the same as under Q̃λ.
Although the path-wise construction of the branching diffusion is the same
under Q
λ and Q̃
λ , only the measure Q̃
λ “knows” about the spine. It is
clear, however, that we have Q̃
λ (A) =Q
λ (A) for any A ∈ F∞.
Under the measure Q̃
λ only the behavior of the spine is altered, and
combining this observation with conditioning on the spine’s path and fission-
times gives us a very useful representation for Z+λ (t) under Q̃
λ that we shall
refer to as the spine decomposition:
λ (t)|G̃∞) =
v+λ (ηSu)e
λξSu−E
Su + v+λ (ηt)e
λξt−E+λ t.(47)
Throughout the rest of this article we will refer to the two pieces of this de-
composition as the “sum term” and the “spine term.” This decomposition
is discussed in detail for a wide variety of branching diffusions in [9], but
to derive it we simply note that the contributions to Z+λ (t) from the sub-
trees that branch off the spine have constant Q̃
λ -expectation because they
behave as if under the original measure P , and we know that Z+λ (t) is a P -
martingale. The spine decomposition reduces many calculations about the
behavior of Z+λ (t) under Q̃
λ to one-particle calculations about the spine,
and this observation is exploited in the spine proofs of Lp-bounds for some
families of additive martingales in [9].
7. Proof of Theorem 7. The short climb probability. With the spine
foundations firmly established in Section 6, we may proceed with the proof
of the short climb probability lower bound from Theorem 7.
First, recall definitions (38) and (39), where A
t is the event that there
exists a particle that makes the short climb along optimal path (x̄, ȳ), and
t (ξ) is the event that the spine makes the short climb. Note that ε controls
the proximity to x̄ and δ the proximity to ȳ. Importantly, we will only be
interested in taking λ= λ̄ throughout this section, although we will usually
just write λ for notational simplicity. Also recall throughout that t and τ
are related through (θ/(2µλ)) exp(2µλτ) = κ
A TYPED BRANCHING DIFFUSION 29
Proof of Theorem 7. The key step in the proof of this is the following
use of the spine change of measure: for any function g :R+ →R+ we have
P x,y(A
t ) =Q
Ẑ+λ (τ)
Ẑ+λ (τ)
≥ Q̃x,yλ
Ẑ+λ (τ)
≥ Q̃x,yλ
Ẑ+λ (τ)
; sup
s∈[0,τ ]
Ẑ+λ (s)≤ g(τ)
≥ g(τ)−1Q̃x,yλ
t (ξ); sup
s∈[0,τ ]
Ẑ+λ (s)≤ g(τ)
Essentially we just have to make the “correct” choice for both λ and g in
expression (48), although there will still remain a number of technicalities
to resolve.
The first idea is to ensure the (originally rare) event A
t actually occurs
under the new measure Q̃
λ by making the spine follow close to the required
path (x̄, ȳ); this is achieved by choosing the optimal value λ̄ for λ and choos-
ing τ to be on the natural time scale it would take the spine to reach position
t. In particular, this choice will mean that in the first line of the above set
of inequalities there is no significant loss of mass when replacing the event
t with A
t (ξ). Next, we wish to choose the smallest possible g that will
still leave some positive probability on the last line of the above argument.
Hence, we wish to identify the rate of growth of the martingale Z+λ under
λ , and this will essentially be governed by the contribution from the spine
itself.
With this is mind, and recalling the various properties of the optimal
paths and parameters from Section 4, for ε0 > 0 we define
gε0(τ) := exp
ψ+λ +
e2µλτ − (ψ+λ y
2 + λx)
and recall from (35) that the scaling between t and τ is fixed throughout,
where κ2t= (θ/(2µλ̄))e
2µλ̄τ for large t, hence t+ τ ∼ t. Note that since we
are only considering the optimal value λ= λ̄, we have
e2µλ̄τ = (κ2ψ+
− λ̄β)t=Θ(β,κ).
Then from (48) we have
P x,y(A
t )≥ gε0(τ)−1Q̃
t (ξ); sup
s∈[0,τ ]
Ẑ+λ (s)≤ gε0(τ)
.(49)
Our strategy for the rest of this proof is to show that the Q̃
λ -probability in
(49) is at least some ε′ > 0 for all sufficiently large t, uniformly for y ∈ [y0, y1],
30 Y. GIT, J. W. HARRIS AND S. C. HARRIS
so that the decay rate part of (49) matches the desired rate in the statement
of the theorem.
Conditioning on the spine’s path and birth times, G̃∞, and then making
use of some standard properties of conditional expectation we have
t (ξ); sup
s∈[0,τ ]
Ẑ+λ (s)≤ gε0(τ)
t (ξ); sup
s∈[0,τ ]
Ẑ+λ (s)≤ gε0(τ)
s∈[0,τ ]
Ẑ+λ (s)≤ gε0(τ)
since A
t (ξ) is G̃∞-measurable. We next observe that, conditional on G̃∞,
we can write Ẑ+λ (t) as
Ẑ+λ (t) = e
y2+λx)
λ (t− Su) + f(t)
,(50)
where the Z
λ are independent copies of Z
λ started from a single particle
at (ξSu , ηSu); and f(t) is the contribution to Z
λ (t) from the spine, which,
conditional on G̃∞, is a known function of t. Now if we could show, for
0< ε̃0 < ε0,
s∈[0,τ ]
f̂(s)≤ gε̃0(τ)
and sup
s∈[0,τ ]
(Ẑ+λ (s)− f̂(s))≤
gε0(τ)
where f̂(t) := e−(ψ
y2+λx)f(t), we would have sups∈[0,τ ] Ẑ
λ (s)≤ gε0(τ). Hence,
defining Ẑ+λ (s) := Ẑ
λ (s)− f̂(s), we have
s∈[0,τ ]
Ẑ+λ (s)≤ gε0(τ)
≥ Q̃x,yλ
s∈[0,τ ]
f̂(s)≤ gε̃0(τ)
; sup
s∈[0,τ ]
Ẑ+λ (s)≤
gε0(τ)
1{sups∈[0,τ ] f̂(s)≤gε̃0 (τ)/2}
s∈[0,τ ]
Ẑ+λ (s)≤
gε0(τ)
since, conditional on G̃∞, the supremum of f̂ on [0, τ ] is known.
We see from (50) that, conditional on G̃∞, Ẑ+λ (t) is a submartingale. This
is because the Q̃
λ -conditional expectation of each of the Z
λ in the sum
y2+λx)
λ (t− Su)(51)
A TYPED BRANCHING DIFFUSION 31
is constant, so the expectation of the sum cannot decrease, and in fact this
expectation increases every time there is a birth on the spine. Then by
Doob’s submartingale inequality we have
s∈[0,τ ]
Ẑ+λ (s)≤
gε0(τ)
= 1− Q̃x,yλ
s∈[0,τ ]
Ẑ+λ (s)≥
gε0(τ)
≥ 1− 2
gε0(τ)
λ (Ẑ
λ (τ)|G̃∞).
We must note here that the expectation on the above line is not a priori
finite. However, the expectation of each term in the sum (51) is bounded by
sups∈[0,τ ] f̂(s), which we have control over via an indicator function and so
we do not have to worry about this expectation blowing up.
So we need to show that for all sufficiently large τ and all y ∈ [y0, y1],
(ξ)∩{sups∈[0,τ ] f̂(s)≤gε̃0 (τ)/2}
gε0(τ)
λ (Ẑ
λ (τ)|G̃∞)
> ε′,
and hence also
t (ξ); sup
s∈[0,τ ]
Ẑ+λ (s)≤ gε0(τ)
as required. This will follow by combining both parts of the following result.
Lemma 12. Fix y1 > y0 > 0 and ε0 > ε̃0 > 0.
(i) For all sufficiently small ε, δ > 0, there exists some ε′ > 0 and T̃ > 0
such that for all y ∈ [y0, y1] and all t > T̃ ,
t (ξ); sup
s∈[0,τ ]
f̂(s)≤ gε̃0(τ)
> ε′.
(ii) As τ →∞,
gε0(τ)
λ (Ẑ
λ (τ)|G̃∞); sup
s∈[0,τ ]
f̂(s)≤ gε̃0(τ)
uniformly over y ∈ [y0, y1].
Then we have shown that, for any ε0 > 0, y1 > y0 > 0, and sufficiently
small ε, δ > 0, there exists a T > 0 such that, for all y ∈ [y0, y1] and all t > T ,
t−1 logP x,y(Aε,δt )≥−(Θ(β,κ) + ε0).
32 Y. GIT, J. W. HARRIS AND S. C. HARRIS
Finally, we observe that the probability P x,y(A
t ) is trivially monotone
increasing in both ε and δ, and so it follows that if the result is true for all
sufficiently small ε and δ, it is in fact true for all ε, δ > 0. This completes
the proof of Theorem 7. �
Proof of Lemma 12(i). We will prove Lemma 12(i) in a sequence of
other lemmas, using a convenient coupling for the spine’s type process.
First recall that, under Q̃
λ , ηs solves the SDE
dηs =
θ dBs + µληs ds,
where Bs is a Q̃λ-Brownian motion. Noting that d(e
−µλsηs) = e
θ dBs,
we can construct e−µλsηs as a time-change of a Brownian motion with
e−µλsηs − η0 =
e−µλw dBw =
B̃(1− e−2µλs),
where B̃ is also a Q̃
λ -Brownian motion started at the origin.
In this way, we will construct ηy under P from a Brownian motion By
started at y
2µλ/θ where, for s ∈ [0,∞),
ηy(s) =
eµλsBy(1− e−2µλs).
To construct simultaneously all type processes ηy under the same measure
P, we first construct the process By0 as an independent Brownian motion
started at y0
2µλ/θ. Second, we construct the process B
y1 by running an
independent Brownian motion started at y1
2µλ/θ until it first hits the
path of By0 , at which point we couple the two processes together. Next, for
any other y ∈ (y0, y1), we run an independent Brownian motion By until it
first meets with either the process By0 below or By1 above, at which point
we couple it to the process it first hits.
Finally, we construct all the corresponding spatial processes ξy under P
from a single Brownian motion W by defining
ξy(s) =W
ηy(w)2 dw
ηy(w)2 dw,(52)
where W is started at x and is independent of the By processes.
Constructed in this way, for each y ∈ [y0, y1], the P-law of (ξy, ηy) is the
same as the Q̃
λ -law of (ξ, η).
Fixing µ ∈ (0,1) and K >max{y1,1}, we define the events and stopping
times
Ayε :=
By(s) ∈
,∀s ∈ (1− µ,1]
T0 := inf{t :By0(t) = 0}, TK := inf{t :By1(t) =K},
Ãε,K :=A
ε ∩Ay1ε ∩ {T0 > 1} ∩ {TK > 1}.
A TYPED BRANCHING DIFFUSION 33
Then, clearly P(Ãε,K)> 0 and, on the event Ãε,K , the coupling gives
0< ηy0(s)≤ ηy(s)≤ ηy1(s)≤K
eµλs,
for all s≥ 0 and y ∈ [y0, y1]. Note that our construction also ensures that if
event Ay0ε ∩Ay1ε occurs then so must Ayε for any y ∈ [y0, y1], hence Ayε ⊃ Ãε,K .
Lemma 13. Let ε > 0. On event Ãε,K , there exists a deterministic time
s0 = s0(ε)> 0 such that for all τ > s0,
s∈[0,τ ]
|ηy(s)− ȳ(s)| ≤ ε
for all y ∈ [y0, y1].
Proof. Set s1 =− 12µλ logµ and then, on event Ãε,K , for all τ ≥ s > s1
we have
ηy(s)−
eµλs,
for all y ∈ [y0, y1]. Writing
ȳ(s) =
1− e−2µλs
1− e−2µλτ
eµλs,
we see that there exists s2 = s2(ε)> 0 such that, for τ ≥ s > s2,
ȳ(s)−
eµλs.
Taking s3(ε) = max{s1, s2(ε)} now yields
|ηy(s)− ȳ(s)|< ε
eµλs ≤ ε
t(53)
for all τ ≥ s > s3 and all y ∈ [y0, y1].
Now consider s ∈ [0, s3]. On Ãε,K we have
|ηy(s)− ȳ(s)| ≤
eµλs3(1 +K),
and hence for some s4(ε)> 0 we have |ηy(s)− ȳ(s)| ≤ ε
t for all τ > s4, all
s ∈ [0, s3], and all y ∈ [y0, y1]. Taking s0(ε) =max{s3, s4} yields the result.
34 Y. GIT, J. W. HARRIS AND S. C. HARRIS
Lemma 14. Let δ > 0. Then for all sufficiently small ε, there exists a
deterministic τ0 = τ0(ε, δ)> 0 such that, on Ãε,K , we have
s∈[0,τ ]
ηy(w)2 dw−
ȳ(w)2 dw
< δt(54)
for all τ > τ0 and all y ∈ [y0, y1].
Proof. Given any δ > 0, we first fix an ε > 0 sufficiently small such
that ε(2+ ε
; this yields a corresponding s3 = s3(ε), which is chosen
as at equation (53). Given this s3, we find τ1 = τ1(ε, δ)> 0 such that, for all
τ > τ1,
(K2 + 1)
e2µλw dw <
We now set τ0 = τ0(ε, δ) = max{s3, τ1}. With this choice of ε and τ0, we
proceed to show that the inequality (54) is satisfied. Note that τ0 is deter-
ministic and independent of y.
From equation (53) we see that, on Ãε,K and for s > s3,
ηy(w)2 dw ≥
ηy(w)2 dw+
ȳ(w)− ε
ȳ(w)2 dw−
ȳ(w)2 dw− 2
e2µλw dw
ȳ(w)2 dw−
e2µλw dw− (2ε) κt
ȳ(w)2 dw− δ
for all τ > τ0 and all y ∈ [y0, y1]. Similarly
ηy(w)2 dw ≤
ηy(w)2 dw+
ȳ(w) +
ȳ(w)2 dw+
ȳ(w)2 dw+K2
e2µλw dw+ ε
ȳ(w)2 dw+
A TYPED BRANCHING DIFFUSION 35
for all τ > τ0 and all y ∈ [y0, y1]. Finally, for s ∈ [0, s3], on Ãε,K we have
ηy(w)2 dw−
ȳ(w)2 dw
ηy(w)2 dw+
ȳ(w)2 dw
≤ (K2 + 1)
e2µλw dw < δt
for all τ > τ0 and all y ∈ [y0, y1]. �
Lemma 15. Let δ > 0. Then for all sufficiently small ε > 0, there exists
P-almost everywhere on Ãε,K a random time S0 = S0(δ, ε)<∞ such that
s∈[0,τ ]
ξy(s)− λa
ηy(w)2 dw
< δt,
for all y ∈ [y0, y1] and all τ > S0.
Proof. Given δ > 0, choose any δ′, δ′′ > 0 such that δ′(|β/λ|+ δ′′)< δ.
Recalling the construction of ξy at (52), we see from standard properties of
Brownian motion that there almost surely exists some S1 = S1(δ
′)<∞ such
s∈[0,t]
|W (s)|< δ′ for all t > S1.
s∈[0,τ ]
ηy(w)2 dw
ηy(w)2 dw
for all τ such that a
y(w)2 dw > S1, and by the coupling construction,
on Ãε,K this is true for all y ∈ [y0, y1] if a
y0(w)2 dw > S1. Then there
exists (P-almost everywhere on Ãε,K) a random time S2 = S2(δ
′)<∞, which
depends on By0 and S1, such that a
y(w)2 dw > S1 for all y ∈ [y0, y1] when
τ > S2.
Now by Lemma 14, given δ′′ and a sufficiently small ε, there exists a
deterministic τ0 = τ0(ε, δ
′′)> 0 such that, on Ãε,K ,
ηy(w)2 dw ≤ a
ȳ(s)2 ds+ δ′′t=
+ δ′′
t(56)
for all τ > τ0 and all y ∈ [y0, y1]. Combining the inequalities at (55) and (56),
we now see that, for τ > S0 = S0(ε, δ
′, δ′′) = max{S2, τ0},
s∈[0,τ ]
ξy(s)− λa
ηy(w)2 dw
= sup
s∈[0,τ ]
ηy(w)2 dw
for all y ∈ [y0, y1]. �
36 Y. GIT, J. W. HARRIS AND S. C. HARRIS
On combining Lemmas 14 and 15 and recalling the definition of optimal
path x̄ at (37), we obtain the following:
Lemma 16. Let δ > 0. Then for all sufficiently small ε > 0, there exists
P-almost everywhere on Ãε,K a random time S̃0 = S̃0(δ, ε)<∞ such that
s∈[0,τ ]
|ξy(s)− x̄(s)|< δt,
for all y ∈ [y0, y1], and all τ > S̃0.
We may now draw everything together to finish the proof of Lemma 12(i).
First we observe that since λ < 0, on event A
t (ξ),
s∈[0,τ ]
y2+λx) exp(ψ+λ η
s + λξs −E+λ s)
≤ e−(ψ
y2+λx) exp(ψ+λ (κ+ ε)
2t+ λ(−β − δ)t),
and so, given ε̃0, we can choose first δ and then ε sufficiently small so that
t (ξ)⊂
s∈[0,τ ]
f̂(s)≤ gε̃0(τ)
and, from Lemmas 13 and 16, there exists a random time T̃ = T̃ (δ, ε) <∞
such that on Ãε,K we have
s∈[0,τ ]
|ηy(s)− ȳ(s)|< ε
s∈[0,τ ]
|ξy(s)− x̄(s)|< δt
for all τ > T̃ and all y ∈ [y0, y1]. That is, Ãε,K ∩ {T̃ < τ} ⊂Aε,δt (ξy) for each
y ∈ [y0, y1], with the slight abuse of notation that
s∈[0,τ(t)]
|ηy(s)− ȳ(s)|< ε
t; sup
s∈[0,τ(t)]
|ξy(s)− x̄(s)|< δt
Note also that P(Ãε,K)> ε
′ for some ε′ > 0.
Combining the above, for any y ∈ [y0, y1] we have
t (ξ); sup
s∈[0,τ ]
f̂(s)≤ gε̃0(τ)
t (ξ)) = P(A
≥ P(Ãε,K; T̃ < τ)→ P(Ãε,K)
as τ →∞, as required. �
A TYPED BRANCHING DIFFUSION 37
Proof of Lemma 12(ii). Consider the expectation of the “sum term.”
We have
λ (Ẑ
λ (τ)|G̃∞) = e
y2+λx)
λ (t− Su)
= e−(ψ
y2+λx)
λ (t− Su)|G̃∞)
≤ e−(ψ
y2+λx)nτ max{eψ
η(Su)
2+λξ(Su)−E+λ Su :u < ξτ}
≤ nτ sup
s∈[0,τ ]
f̂(s).
Hence
gε0(τ)
λ (Ẑ
λ (τ)|G̃∞); sup
s∈[0,τ ]
f̂(s)≤ gε̃0(τ)
≤ Q̃x,yλ
gε̃0(τ)
gε0(τ)
; sup
s∈[0,τ ]
f̂(s)≤ gε̃0(τ)
≤ e−(ε0−ε̃0)tQ̃x,yλ (nτ ),
and we can now calculate Q̃
λ (nτ ) = Q̃
λ (Q̃
λ (nτ |G∞)), where G∞ the
σ-algebra generated by the path of the spine (not including the birth times).
Conditional on G∞, nτ is a Poisson random variable with mean given by
0 2(rη
s + ρ)ds, and using Fubini’s theorem we have
2(rη2s + ρ)ds
s)ds+2ρτ
e2µλτ −
− rθτ
+ 2ρτ
2y2κµλ
t+ o(τ).
So the Q̃
λ -expectation of nτ only grows linearly in t. Then since ε0− ε̃0 > 0,
the expression at (57) tends to 0 as t→∞. Moreover, the expectation at (57)
is bounded by the Q̃
λ -expectation, and hence the convergence is uniform
over y ∈ [y0, y1], as claimed. �
8. Martingale results. In this section we recall some existing and prove
some new martingale results that are intermediate steps in the proofs of
38 Y. GIT, J. W. HARRIS AND S. C. HARRIS
Theorem 1 and the upper bound of Theorem 3. We recall from [13] that E−λ
[also written E−(λ)] and ∆(γ) are Legendre conjugates with
∆(γ) = inf
{E−(λ) + λγ}, E−(λ) = sup
{∆(γ)− γλ}.(58)
If, for λmin <λ< 0, we write γλ for the γ value which achieves the supremum
on the right-hand side of equation (58), then the functions λ 7→ γλ from
(−λmin,0) to (0,∞), and γ 7→ λγ from (0,∞) to (−λmin,0) are inverses of
each other and, of course, λγ is the λ value which achieves the infimum on
the left-hand side of equation (58). In addition, we note that
γλ =−
E−(λ) =
θa2λ2
θ− 8r− 4aλ2 ,(59)
that E−(λ) and ∆(γ) are convex functions, and that
c̃(θ) = sup{γ :∆(γ)> 0}= inf{−E−(λ)/λ :λmin <λ< 0}
= inf{c−λ :λmin < λ< 0}= c
λ̃(θ)
where
c−λ :=−E
λ /λ and
λ̃(θ) :=−
2(θ− 8r)(θρ+ 2ρ2 + rθ)
a(θ+4ρ)2
∈ (λmin,0).
A formula for c̃(θ) is given in equation (9). The following fundamental con-
vergence result for the Z−λ martingale was first partly proved in [13], but
also see [9] for a more complete proof using “spine” techniques.
Theorem 17. Suppose λ ∈ (λmin,0].
(i) If λ ∈ (λ̃(θ),0], the martingale Z−λ is uniformly integrable and has
an almost sure strictly positive limit.
(ii) If λ≤ λ̃(θ), then Z−λ (∞) = 0 almost surely.
The following convergence result was proved in [12] using martingales
based on Hermite polynomials.
Theorem 18. Let λ ∈ (λ̃(θ),0] and α< 1/4. For each P x,y starting law
and every continuous bounded function f :R 7→R, we have
f(Yu(t))e
αYu(t)
2+λ(Xu(t)+c
λ (∞),
A TYPED BRANCHING DIFFUSION 39
where
f0 :=
)1/4 ∫
f(y)eαy
y2φ(y)dy(62)
and φ(y) is the standard normal density.
In this paper, we require a corollary to this theorem which specifies more
precisely which particles contribute to the final limit.
Corollary 19. Let λ ∈ (λ̃(θ),0] and α < 1/4. For each P x,y starting
law and every continuous bounded function f :R 7→ R, we have for every
ε > 0
f(Yu(t)) e
αYu(t)
2+λXu(t)−E−λ t
1{|Xu(t)/t+γλ|<ε}−→a.s. f0Z
λ (∞)(63)
where γλ =− ∂∂λE
−(λ) and f0 is given at equation (62).
This last result will enable us to show in Section 10 that the almost sure
growth rate is at least as large as the expected growth rate, D(γ)≥∆(γ). It
is easy to see from Corollary 19 that when Z−λ (∞)> 0, there must exist at
least one particle near to −γλt in space. Further, because of the decay rate
of each term in the sum over particles at equation (63), it will be relatively
straightforward to improve this to get the required exponential numbers of
particles, exp(∆(γ)t), near −γλt for large times [as long as Z−λ (∞)> 0].
The following result concerns the rate at which the martingales Z+λ and
Z−λ converge to zero.
Theorem 20. Let λ ∈ (λmin,0). For every starting law, P x,y,
logZ±λ (t)
→ λ(c±λ − c
λ) a.s.
where c±λ is given at (5), and
c∗λ :=
c̃(θ), if λmin <λ≤ λ̃(θ),
c−λ , if λ̃(θ)≤ λ < 0.
Corollary 21. If λ ∈ (λmin,0), then Z+λ (t)→ 0 P x,y-almost surely.
The rate of convergence of the Z+λ martingale in part (i) of Theorem 20
is crucial in Section 9 to obtain the upper bound on the almost sure growth
rate, D(γ,κ) ≤ ∆(γ,κ). We also comment that if Corollary 19 were true
for all α < ψ+λ , then we could have gained this upper bound at that point.
40 Y. GIT, J. W. HARRIS AND S. C. HARRIS
Although Corollary 19 is only proven for α < 1/4 (where we can utilize
suitable Hermite expansions), we conjecture that it holds for all α <ψ+λ .
Proof of Corollary 19. Let ε > 0 be small, µ := λ−ε, λ,µ ∈ (λ̃(θ),0),
f be a positive, continuous bounded function, α< 1/4 and note that γµ > γλ.
Then we have
f(Yu(t))e
αYu(t)
2+λXu(t)−E−λ t
1{Xu(t)<−γµt}
≤ e(E
µ −E−λ −εγµ)t
f(Yu(t))e
αYu(t)
2+µXu(t)−E−µ t
1{Xu(t)<−γµt}
f(Yu(t))e
αYu(t)
2+µXu(t)−E−µ t
−E−µ +(λ−µ)γµ)t.
Recall that E−(λ) is convex with ∂
E−(λ)≥ 0 and ∂
E−(λ) = γλ, so, from
the Taylor expansion,
E−λ −E
µ + (µ− λ)
E−(λ)
(µ− λ)2
E−(λ) + o((µ− λ)2).
Then taking ε > 0 small enough so that E−λ −E−µ +(λ−µ)γµ > 0, and using
Theorem 18, we find that for any δ > 0
limsup
f(Yu(t))e
αYu(t)
2+λXu(t)−E−λ t
1{Xu(t)<−(γλ+δ)t} = 0.
Similarly, we can show
limsup
f(Yu(t))e
αYu(t)
2+λXu(t)−E−λ t
1{Xu(t)>−(γλ−δ)t} = 0,
and hence the only contribution to the limit comes from the particles near
−γλt in space. Combining this with Theorem 18 we have
λ (∞) = limt→∞
f(Yu(t))e
αYu(t)
2+λXu(t)−E−λ t
= lim
f(Yu(t))e
αYu(t)
2+λXu(t)−E−λ t
1{|Xu(t)/t+γλ |<δ}.
Proof of Theorem 20. We use a useful technique brought to our
attention in [22]. Let p ∈ (0,1) so that, by Jensen’s inequality, Z±λ (t)p is a
supermartingale; then for u, v > 0 we have
(u+ v)p ≤ up + vp,
A TYPED BRANCHING DIFFUSION 41
and hence
Z±λ (t)
Yu(t)
2+λ(Xu(t)+c
Yu(t)
2+pλ(Xu(t)+c
For any ε > 0, Doob’s supermartingale inequality says
s≤w≤s+t
Z±λ (w)
p > εp
λ (s)
≤ ε−p
Yu(s)
2+pλ(Xu(s)+c
and then
s≤w≤s+t
eδwZ±λ (w)> ε
s≤w≤s+t
Z±λ (w)
p > e−pδ(s+t)εp
≤ ε−pepδt
Yu(s)
2+pλ(Xu(s)+c
p(λ(c±
)+δ)s
Now, if we can choose p ∈ (0,1) such that λ(c±λ − c
pλ) + δ < 0 and pψ
ψ+pλ, we must have e
δuZ±λ (u)→ 0 almost surely by using a familiar Borel–
Cantelli argument. [The condition pψ±λ < ψ
pλ guarantees that the expec-
tation in the last line above tends to a finite limiting value, hence stays
bounded over all times s, as can be checked by using formula (17), for ex-
ample.]
For all 0 ≤ p < 1 we find pψ±λ < ψ
pλ. Considering the graph of c
quickly see that, for λ ∈ [λ̃(θ),0), taking p as close to 1 as we like gives the
best rate. For λ ∈ [λmin, λ̃(θ)) we can choose p so that pλ= λ̃(θ), which gives
the best rate.
Recall from Theorem 17 that Z−λ (∞)> 0 when λ ∈ (λ̃(θ),0). Then, so far,
we have proved the following:
Lemma 22. For every starting law, P x,y, and for all ε > 0, if λ ∈ (λmin,0)
e−εt e−λ(c
)tZ±λ (t)→ 0 a.s.
42 Y. GIT, J. W. HARRIS AND S. C. HARRIS
where
c∗λ :=
c̃(θ), if λmin <λ≤ λ̃(θ),
c−λ , if λ̃(θ)≤ λ < 0.
It is clear that this gives the required upper bound of
lim sup
logZ±λ (t)
≤ λ(c±λ − c
Now, for any ε > 0, if λ ∈ (λmin, λ̃(θ)] then
Yu(t)
2+λ(Xu(t)+c̃(θ)t) ≥ eλ(Lt+c̃(θ)t)+εt →∞ a.s.
since we know that Lt := inf{Xu(t) : u ∈ N(t)} satisfies Lt/t→ −c̃(θ) a.s.
Otherwise, with λ∈ (λ̃(θ),0),
Yu(t)
2+λ(Xu(t)+c
t) ≥ eεtZ−λ (t)→∞ a.s.
since here Z−λ (∞)> 0 a.s. Thus, in all cases,
lim inf
logZ±λ (t)
≥ λ(c±λ − c
which completes the proof of Theorem 20. �
9. Proof of Theorem 3. Upper bound. The idea for the upper bound
proof is to overestimate indicator function by exponentials, and then re-
arrange the expressions to form martingale terms.
Simply observe that for λ ∈ (λmin,0),
Nt(γ, [κ
t,∞)) =
1{Xu(t)≤−γt;Yu(t)≥κ
1{Xu(t)≤−γt;Yu(t)2≥κ2t}e
(Yu(t)
2−κ2t)+λ(Xu(t)+γt)
≤ e(E
−κ2ψ+
+λγ)t
Yu(t)
2+λXu(t)−E+λ t
≤ e−λ(c
)tZ+λ (t)e
+λγ−κ2ψ+
where E±λ =−λc
Recall from equations (11) and (32) that E−λ + λγ − κ2ψ
λ has a minimal
value of ∆(γ,κ) achieved when λ= λ̄(γ,κ). Since c̃(θ) is the minimal value
of cλ, Theorem 20 implies that
lim sup
t−1 logZ+λ (t)≤ λ(c
λ − c
λ )(65)
A TYPED BRANCHING DIFFUSION 43
almost surely for all λ ∈ (λmin,0).
In cases where ∆(γ,κ)< 0, we can use the optimal value for λ, Theorem 20
and trivially note that Nt(γ, [κ
t,∞)) is integer valued to deduce that
1{Yu(t)≥κ
t;Xu(t)≤−γt} = 0
eventually, almost surely. Hence, D(γ,κ) =−∞ almost surely if ∆(γ,κ)< 0.
Otherwise we have ∆(γ,κ)≥ 0, which in fact guarantees that γ ∈ (0, c̃(θ)]
and hence λ̄(γ,κ) ∈ [λ̃(θ),0). Then since
lim sup
t−1 logNt(γ, [κ
t,∞))
≤ lim sup
t−1 log(e−λ(c
)tZ+λ (t)) + (E
λ + λγ − κ
2ψ+λ )
we can again make use of Theorem 20 and the minimizing λ value, λ̄(γ,κ),
to get the bound
limsup
t−1 logNt(γ, [κ
t,∞))≤∆(γ,κ) almost surely,
as desired.
Notice that, when ∆(γ,κ) = 0, the right-hand side of the inequality at
(64) will tend to infinity (see Corollary 19). Then, on the boundary, we have
only shown that lim sup t−1 logNt(γ, [κ
t,∞))≤ 0.
10. Proof of Theorem 1. The spatial growth rate. We first bound the
spatial growth rate above. Suppose that C ⊂ R is Borel-measurable with
y2φ(y)dy > 0. Let λ ∈ (λmin,0), then
1{Xu(t)≤−γt;Yu(t)∈C} ≤
1{Yu(t)∈C}e
λ(Xu(t)+γt)
= e(E
+λγ)t
1{Yu(t)∈C} e
λXu(t)−E−λ t
≤ e(E
+λγ)tZ−λ (t).
Recalling equations (8) and (19), we therefore have
1{Xu(t)≤−γt;Yu(t)∈C} ≤ e
∆(γ)tZ−λγ (t).
Now if γ ≥ c̃(θ), corresponding to λγ ∈ (λmin, λ̃(θ)] and having ∆(γ)≤ 0,
we know from Theorem 17 that Z−λγ (∞) = 0 almost surely. Then,
γ > c̃(θ) ⇒
1{Xu(t)≤−γt;Yu(t)∈C} = 0 eventually, a.s.
44 Y. GIT, J. W. HARRIS AND S. C. HARRIS
Otherwise, if γ ∈ (0, c̃(θ)), corresponding to λγ ∈ (λ̃(θ),0) and having ∆(γ)>
0, Theorem 17 tells us that Z−λγ (∞)> 0 almost surely, hence
lim sup
t−1 log
1{Xu(t)≤−γt;Yu(t)∈C} ≤∆(γ).
Now we bound the growth rate from below. Let ε > 0 be small, λ̃(θ) <
λ < 0, and µ = λ − ε. We recall now that E−λ is convex so
≥ 0 and
γµ > γλ. Then
eλXu(t)−E
1{−(γλ+ε)t≤Xu(t)≤−(γλ−ε)t;Yu(t)∈C}
eλ(−(γλ+ε)t)−E
1{−(γλ+ε)t≤Xu(t)≤−(γλ−ε)t;Yu(t)∈C}
= e(−λγλ−E
−λε)t ∑
1{−(γλ+ε)t≤Xu(t)≤−(γλ−ε)t;Yu(t)∈C}
≤ e(−λγλ−E
−λε)t ∑
1{Xu(t)≤−(γλ−ε)t;Yu(t)∈C}.
t−1 log
1{Yu(t)∈C}e
λXu(t)−E−λ t
1{|Xu(t)/t+γλ |<ε}
≤−λγλ−E−λ − λε+ t
−1 log
1{Xu(t)≤−(γλ−ε)t;Yu(t)∈C}.
Letting t→∞, using Corollary 19 and remembering that for λ̃(θ)< λ≤ 0
we have Z−λ (∞)> 0 a.s., we find
0≤−λγλ −E−λ − λε+ lim inft→∞ t
−1 log
1{Xu(t)≤−(γλ−ε)t;Yu(t)∈C}
and as ε > 0 can be arbitrarily small we have
lim inf
t−1 log
1{Xu(t)≤−γλt;Yu(t)∈C} ≥E
λ + λγλ.
Equivalently,
lim inf
t−1 log
1{Xu(t)≤−γt;Yu(t)∈C} ≥E
+ λγγ =∆(γ)
and hence the lim sup and lim inf agree as required.
We note that these proofs will easily adapt to cover a multi-type branching
Brownian motion where the types evolve as a finite state Markov chain,
such as found in [2], where it will also be possible to prove the analogous
A TYPED BRANCHING DIFFUSION 45
convergence theorem required when we have a finite type space by adapting
the proof of Theorem 18 found in [12].
In the standard branching Brownian motion case things are even simpler
to adapt (where, of course, there is no need for any convergence result akin to
Theorem 18). All the information necessary is contained in the martingales
u∈Nt exp(λXu(t)− (λ
2/2 + r)t) studied by Neveu [22] and, as first came
to our attention during discussions with J. Warren, the martingale with
parameter λ can only be capable of “counting” particles near γλt in space
at large times t, so when this martingale is uniformly integrable particles
must perpetually be found with the corresponding speed. Of course, in this
case more precise results, in the spirit of Watanabe [25], also exist.
Acknowledgments. We would like to thank two anonymous referees for
providing extremely helpful and thorough reviews of earlier incarnations
of this manuscript. Their numerous invaluable comments led to a much
improved presentation of this work.
REFERENCES
[1] Athreya, K. B. (2000). Change of measures for Markov chains and the L logL
theorem for branching processes. Bernoulli 6 323–338. MR1748724
[2] Champneys, A., Harris, S. C., Toland, J., Warren, J. andWilliams, D. (1995).
Algebra, analysis and probability for a coupled system of reaction-diffusion equa-
tions. Philos. Trans. Roy. Soc. London 350 69–112. MR1325205
[3] Chauvin, B. and Rouault, A. (1988). KPP equation and supercritical branching
Brownian motion in the subcritical speed area. Application to spatial trees.
Probab. Theory Related Fields 80 299–314. MR0968823
[4] Dembo, A. and Zeitouni, O. (1998). Large Deviations Techniques and Applications,
2nd ed. Springer, New York. MR1619036
[5] Enderle, K. and Hering, H. (1982). Ratio limit theorems for branching Orstein–
Uhlenbeck processes. Stochastic Process. Appl. 13 75–85. MR0662806
[6] Engländer, J. and Kyprianou, A. E. (2004). Local extinction versus local
exponential growth for spatial branching processes. Ann. Probab. 32 78–99.
MR2040776
[7] Geiger, J. (1999). Elementary new proofs of classical limit theorems for Galton–
Watson processes. J. Appl. Probab. 36 301–309. MR1724856
[8] Hardy, R. and Harris, S. C. (2006). A new formulation of the spine approach to
branching diffusions. Available at http://arxiv.org/abs/math.PR/0611054.
[9] Hardy, R. and Harris, S. C. (2006). Spine proofs for Lp-convergence of branching
diffusion martingales. Available at http://arxiv.org/abs/math.PR/0611056.
[10] Hardy, R. and Harris, S. C. (2006). A spine proof of a large-deviations principle
for branching Brownian motion. Stochastic Process. Appl. 116 1992–2013.
[11] Harris, S. C. (1999). Travelling-waves for the FKPP equation via probabilistic ar-
guments. Proc. Roy. Soc. Edinburgh Sect. A 129 503–517. MR1693633
[12] Harris, S. C. (2000). Convergence of a “Gibbs–Boltzmann” random measure for a
typed branching diffusion. Séminaire de Probabilités XXXIV. Lecture Notes in
Math. 1729 239–256. Springer, Berlin. MR1768067
http://www.ams.org/mathscinet-getitem?mr=1748724
http://www.ams.org/mathscinet-getitem?mr=1325205
http://www.ams.org/mathscinet-getitem?mr=0968823
http://www.ams.org/mathscinet-getitem?mr=1619036
http://www.ams.org/mathscinet-getitem?mr=0662806
http://www.ams.org/mathscinet-getitem?mr=2040776
http://www.ams.org/mathscinet-getitem?mr=1724856
http://arxiv.org/abs/math.PR/0611054
http://arxiv.org/abs/math.PR/0611056
http://www.ams.org/mathscinet-getitem?mr=1693633
http://www.ams.org/mathscinet-getitem?mr=1768067
46 Y. GIT, J. W. HARRIS AND S. C. HARRIS
[13] Harris, S. C. and Williams, D. (1996). Large deviations and martingales for a
typed branching diffusion. I. Astérisque 236 133–154. MR1417979
[14] Harris, T. E. (2002). The Theory of Branching Processes. Dover, Mineola, NY.
MR1991122
[15] Itô, K. and McKean, H. P. (1965). Diffusion Processes and Their Sample Paths.
Academic Press, New York. MR0199891
[16] Kurtz, T., Lyons, R., Pemantle, R. and Peres, Y. (1997). A conceptual proof
of the Kesten–Stigum theorem for multi-type branching processes. In Classi-
cal and Modern Branching Processes (Minneapolis, MN, 1994 ) (K. B. Athreya
and P. Jagers, eds.). IMA Vol. Math. Appl. 84 181–185. Springer, New York.
MR1601737
[17] Kyprianou, A. E. (2004). Travelling wave solutions to the K–P–P equation: Alter-
natives to Simon Harris’ probabilistic analysis. Ann. Inst. H. Poincaré Probab.
Statist. 40 53–72. MR2037473
[18] Lyons, R. (1997). A simple path to Biggins’ martingale convergence for branching
random walk. In Classical and Modern Branching Processes (Minneapolis, MN,
1994 ) (K. B. Athreya and P. Jagers, eds.). IMA Vol. Math. Appl. 84 217–221.
Springer, New York. MR1601749
[19] Lyons, R., Pemantle, R. and Peres, Y. (1995). Conceptual proofs of L logL cri-
teria for mean behavior of branching processes. Ann. Probab. 23 1125–1138.
MR1349164
[20] McKean, H. P. (1975). Application of Brownian motion to the equation
of Kolmogorov-Petrovskĭı–Piskunov. Comm. Pure Appl. Math. 28 323–331.
MR0400428
[21] Murray, J. D. (2003). Mathematical Biology. II, 3rd ed. Springer, New York.
MR1952568
[22] Neveu, J. (1988). Multiplicative martingales for spatial branching processes. In Sem-
inar on Stochastic Processes 1987 (E. Çinlar, K. L. Chung and R. K. Getoor,
eds.) 223–242. Birkhäuser, Boston. MR1046418
[23] Olofsson, P. (1998). The x logx condition for general branching processes. J. Appl.
Probab. 35 537–544. MR1659492
[24] Varadhan, S. R. S. (1984). Large Deviations and Applications. SIAM, Philadelphia.
MR0758258
[25] Watanbe, S. (1967). Limit theorem for a class of branching processes. In Markov
Processes and Potential Theory (J. Chover, ed.) 205–232. Wiley, New York.
MR0237007
Y. Git
Statistical Laboratory
Cambridge University
22 Mill Street
Cambridge CB1 2HP
E-mail: Yoav.Git@gmail.com
J. W. Harris
Department of Mathematics
University of Bristol
University Walk
Bristol BS8 1TW
E-mail: john.harris@bristol.ac.uk
S. C. Harris
Department of Mathematical Sciences
University of Bath
Bath BA2 7AY
E-mail: s.c.harris@bath.ac.uk
URL: http://people.bath.ac.uk/massch/
http://www.ams.org/mathscinet-getitem?mr=1417979
http://www.ams.org/mathscinet-getitem?mr=1991122
http://www.ams.org/mathscinet-getitem?mr=0199891
http://www.ams.org/mathscinet-getitem?mr=1601737
http://www.ams.org/mathscinet-getitem?mr=2037473
http://www.ams.org/mathscinet-getitem?mr=1601749
http://www.ams.org/mathscinet-getitem?mr=1349164
http://www.ams.org/mathscinet-getitem?mr=0400428
http://www.ams.org/mathscinet-getitem?mr=1952568
http://www.ams.org/mathscinet-getitem?mr=1046418
http://www.ams.org/mathscinet-getitem?mr=1659492
http://www.ams.org/mathscinet-getitem?mr=0758258
http://www.ams.org/mathscinet-getitem?mr=0237007
mailto:Yoav.Git@gmail.com
mailto:john.harris@bristol.ac.uk
mailto:s.c.harris@bath.ac.uk
http://people.bath.ac.uk/massch/
	Introduction
	The branching model
	Application to reaction–diffusion equations
	Main results
	Martingales
	The asymptotic growth-rate of particles along spatial rays
	The asymptotic shape and growth of the branching diffusion
	Some expectation calculations
	The expected rate of growth along spatial rays
	The expected asymptotic shape
	Short climb large deviation heuristics
	A birth–death process
	Finding the optimal path and probability
	An important note on the optimal paths
	Proof of Theorem 3. Lower bound
	The ``spine'' setup and results
	Proof of Theorem 7. The short climb probability
	Martingale results
	Proof of Theorem 3. Upper bound
	Proof of Theorem 1. The spatial growth rate
	Acknowledgments
	References
	Author's addresses
ABSTRACT
  We study the high temperature phase of a family of typed branching diffusions
initially studied in [Ast\'{e}risque 236 (1996) 133--154] and [Lecture Notes in
Math. 1729 (2000) 239--256 Springer, Berlin]. The primary aim is to establish
some almost-sure limit results for the long-term behavior of this particle
system, namely the speed at which the population of particles colonizes both
space and type dimensions, as well as the rate at which the population grows
within this asymptotic shape. Our approach will include identification of an
explicit two-phase mechanism by which particles can build up in sufficient
numbers with spatial positions near $-\gamma t$ and type positions near $\kappa
\sqrt{t}$ at large times $t$. The proofs involve the application of a variety
of martingale techniques--most importantly a ``spine'' construction involving a
change of measure with an additive martingale. In addition to the model's
intrinsic interest, the methodologies presented contain ideas that will adapt
to other branching settings. We also briefly discuss applications to traveling
wave solutions of an associated reaction--diffusion equation.

<|endoftext|><|startoftext|>
Introduction
	Modeling the freefall
	The collision in CDM
	The collision in MOND
	N-body collision
	Results
	Summary
ABSTRACT
  We consider the orbit of the bullet cluster 1E 0657-56 in both CDM and MOND
using accurate mass models appropriate to each case in order to ascertain the
maximum plausible collision velocity. Impact velocities consistent with the
shock velocity (~ 4700km/s) occur naturally in MOND. CDM can generate collision
velocities of at most ~ 3800km/s, and is only consistent with the data provided
that the shock velocity has been substantially enhanced by hydrodynamical
effects.

<|endoftext|><|startoftext|>
Introduction
In order to avoid switching from multiplicative to additive notation, all groups
will be written multiplicatively.
Kneser’s addition theorem states that if S, T are finite subsets of an abelian
group G then |ST | ≤ |S| + |T | − 2 holds only if ST is periodic (i.e, there
is a non trivial subgroup H such that HST = ST .) Kneser’s Theorem is a
fundamental tool in Additive number Theory. Proofs of this result may be
found in [4, 5, 6, 7, 9].
In all previously known proofs of Kneser’s Theorem, the subgroup H depends
crucially on both sets S and T . With the goal of breaking this double depen-
dence in S and T , Balandraud investigated in recent work [1, 2] the properties
of a combinatorial poset that we now present.
Let S be a finite subset containing 1 of a group G. Following Balandraud, let
us define a cell of S as a finite subset X such that, for all z /∈ X, it holds that
zS 6⊂ XS. This notion is defined in [1, 2] and it is equivalent to the notion of
Université Pierre et Marie Curie, Paris 6, Combinatoire et Optimisation - case 189, 4
place Jussieu, 75252 Paris Cedex 05. yha@ccr.jussieu.fr
Universitat Politècnica de Catalunya, Matemàtica Aplicada IV, Campus Nord - Edif. C3,
C. Jordi Girona, 1-3, 08034 Barcelona, Spain. oserra@mat.upc.es
Université de Bordeaux 1, Institut de Mathématiques de Bordeaux, 351 cours de la
Libération, 33405 Talence. zemor@math.u-bordeaux1.fr
http://arxiv.org/abs/0704.0382v1
nonextendible subset used in [3]. Throughout the paper, by a cell we always
mean a cell of S.
A cell X is called a u-cell if |XS| − |X| = u. A u-cell with minimal cardinality
is called a u-kernel (of S).
Balandraud showed that, for a finite set S in an abelian group G, in the poset
of j–cells containing the unity ordered by inclusion with 1 ≤ j ≤ |S| − 2, the
set of kernels form a chain of subgroups. Moreover, if there exists a u–cell, then
there is a unique u–kernel containing the unit element which is contained in all
u–cells containing the unit element.
One of the consequences of this work is a new proof and the following strength-
ening of Kneser’s Theorem:
Theorem 1 (Balandraud) For any non-empty finite subset S of an abelian
group G, there exists a finite subgroup H of G such that for any finite subset T
of G one of the following conditions hold :
• |TS| ≥ |T |+ |S| − 1
• HTS = TS and |TS| ≤ |HS|+ |HT | − |H|
As far as the authors are aware this is a surprising and strong formulation that
was not observed before and does not follow straightforwardly from the classical
forms of Kneser’s Theorem.
The purpose of the present note is to give a short proof for the nonabelian
case that, in the poset of j–cells that are subgroups ordered by inclusion with
0 ≤ j ≤ |S| − 1, the set of kernels form a chain of subgroups. Moreover, each
u-kernel of this poset is unique and contained in all u–cells of this poset.
From this statement Kneser’s theorem allows one to deduce Balandraud’s re-
sults for the abelian case, and in particular Theorem 1. Kneser’s Theorem has
several equivalent forms. We use the following one; see e.g [4, 7]:
Theorem 2 (Kneser [5]) Let G be an abelian group and X,Y ⊂ G be finite
subsets such that |XY | ≤ |X| + |Y | − 2. Then
|XY | = |HX|+ |HY | − |H|,
where H = stab(XY ) = {x : xXY = XY }.
Our main tool is the following Theorem of Olson[8, Theorem 2]. We give an
equivalent formulation here where we use left–cosets instead of right–cosets.
Theorem 3 (Olson [8]) Let X,Y be finite subsets of a group G, and let H
and K be subgroups such that HX = X, KY = Y and KX 6= X, HY 6= Y .
|X \ Y |+ |Y \X| ≥ |H|+ |K| − 2|H ∩K|.
In particular either |X \ Y | ≥ |H| − |H ∩K| or |Y \X| ≥ |K| − |H ∩K|.
We shall use the following lemma.
Lemma 4 ([1, 2]) Let G be a group and 1 ∈ S ⊂ G be a finite subset. Then
the intersection of two cells M1,M2 of S is a cell of S.
Proof. Let x /∈ M1 ∩M2. There is i with x /∈ Mi. Then xS 6⊂ MiS. Hence
xS 6⊂ (M1 ∩M2)S.
We can now state our main result, namely Theorem 5 below.
2 An application of Olson’s Theorem
Balandraud [1, 2] proved that, in the abelian case, the set of kernels containing
the unit element and ordered by inclusion is a chain of subgroups. In the non
abelian case we can prove only that the set of kernels that are subgroups forms
a chain. The abelian case can then be easily recovered, since Kneser’s Theorem
implies (as we shall see below) that a kernel containing the unit element is a
subgroup.
Theorem 5 Let S be a finite subset containing 1 of a group G. Let M be a
u–kernel of S which is a subgroup. Let N be a subgroup which is a v–cell and
suppose u, v ≤ |S| − 1.
(i) If either N is a v–kernel or u = v then M ⊂ N or N ⊂ M .
(ii) If N is a v–kernel and v ≤ u then M ⊂ N .
Proof. Suppose that M 6⊂ N and N 6⊂ M . Note that, since M is a cell, if
NMS = MS then NM = M , thus N ⊂ M against our assumption. Hence we
may assume NMS 6= MS and similarly MNS 6= NS. By Theorem 3 we have
one of the two following cases.
Case 1: |MS| − |(MS) ∩ (NS)| = |(MS) \ (NS)| ≥ |M | − |M ∩N |. It follows
that |(M ∩N)S| − |M ∩N | ≤ |(MS)∩ (NS)| − |M ∩N | ≤ |MS| − |M |. On the
other hand we have u = |MS| − |M | < |S| ≤ |(M ∩N)S|. Since |MS| − |M | is
a multiple of |M ∩N | we have
u = |MS| − |M | = |(M ∩N)S| − |M ∩N |.
By Lemma 4, M ∩N is a cell. Since M is a u–kernel, we have M ∩N = M, a
contradiction.
Case 2: |NS| − |(NS) ∩ (MS)| = |(NS) \ (MS)| ≥ |N | − |N ∩M |. It follows
that |(N ∩ M)S| − |N ∩ M | ≤ |(NS) ∩ (MS)| − |N ∩ M | ≤ |NS| − |N |. On
the other hand we have |NS| − |N | < |S| ≤ |(N ∩M)S|. Since |NS| − |N | is a
multiple of |N ∩M | we have
|NS| − |N | = |(N ∩M)S| − |N ∩M |. (1)
Assume first u = v. Then u = |MS|−|M | = |NS|−|N | = |(N∩M)S|−|N∩M |.
Since M is a u–kernel, we have M ∩N = M, a contradiction.
Assume that N is a v–kernel. Then (1) implies N ∩M = N, a contradiction.
This proves (i).
Assume now that v ≤ u. Suppose M 6⊂ N . By (i) we have N ⊂ M , which
implies in particular that |MS| − |M | is a multiple of N . Therefore, from
u = |MS| − |M | < |S| ≤ |NS| we have u = |MS| − |M | ≤ |NS| − |N | = v
which gives u = v. But then M 6⊂ N and N ⊂ M imply |N | < |M |, and since
N is now a u–cell, this contradicts M being a u–kernel.
We can now deduce Balandraud’s description for kernels and cells :
Corollary 6 (Balandraud [1, 2]) Let G be an abelian group and S ⊂ G be
a finite subset with 1 ∈ S. Let M be a u–kernel of S containing 1 with 1 ≤ u ≤
|S| − 2. Then,
(i) M is a subgroup.
(ii) Each u-cell is M–periodic.
(iii) Each v–kernel with u < v ≤ |S| − 2 is a proper subgroup of M .
Proof. Let X be a u-cell with u ≤ |S| − 2. By Kneser’s Theorem, the
inequality |XS| − |X| = u ≤ |S| − 2 implies
u = |XS| − |HX| = |HS| − |H|, (2)
where H is the stabilizer of XS. Since X is a cell and HXS = XS, we
have X = HX. Note that, since G is abelian, ({y} ∪ H)S = HS implies
y ∈ Stab(HS) ⊂ Stab(XS), so that y ∈ H. This observation and (2) imply
that H is an u–cell. In particular, by taking X = M , the period K of MS is a
u–cell. Since KMS = MS and M is a u–cell, we have K ⊂ KM ⊂ M . Since
M is a u–kernel we have M = K. This proves (i).
Now let H be the stabilizer of XS, where X is a u–cell. As shown in the
preceding paragraph H is also a u–cell. By Theorem 5 we have M ⊂ H and
thus MH = H. Since X is a cell and HXS = XS, we have X = HX = MHX.
Hence X ⊂ MX ⊂ MHX = X implies X = MX. This proves (ii).
Finally, by (i), a v–kernel N is a subgroup. By Theorem 5 we have N ⊂ M .
From Corollary 6, one can deduce Theorem 1.
Proof of Theorem 1: We may assume without loss of generality that 1 ∈ S.
Case 1: There is no m–cell for any 1 ≤ m ≤ |S| − 2.
• either we have |TS| ≥ |S|+ |T | − 1 for any non-empty finite T , in which
case the theorem clearly holds with H = {1}.
• or there exists some non-empty finite T such that |TS| ≤ |S| + |T | − 2.
Without loss of generality, we may also suppose 1 ∈ T . Now T must be
contained in an m–cell with m ≤ |S| − 2, but since no such cell exists for
1 ≤ m, we have that T itself must be a cell (a 0-cell) i.e. |TS| = |T |. We
therefore have HT = TH = T = TS = HTS where H is the (necessarily
finite) subgroup generated by S. We have just proved that the theorem
holds in this case with H = 〈S〉.
Case 2: There exists an m–cell with 1 ≤ m ≤ |S| − 2. We may therefore
consider the largest integer u ≤ |S| − 2 for which S admits a u–cell. Let H be
the u–kernel containing 1. Note that u ≤ |S|−2 implies that H is different from
{1}. Now let T be any finite non-empty subset such that |TS| − |T | ≤ |S| − 2.
We shall prove that HTS = TS.
By adding elements to T as long as necessary, we can find a cell X that contains
T and such that XS = TS. Note that we then have |XS| − |X| ≤ |TS| − |T | ≤
|S| − 2, so that X is a v–cell for some v ≤ u. By Corollary 6 (ii) we have
TS = XS = MXS = MTS where M is the v-kernel containing 1. By part (i)
of Corollary 6, H is a subgroup of M so that TS = XS = HTS as well.
Finally, |ST | ≤ |HS|+ |HT | − |H| follows from |ST | being a multiple of |H|.
References
[1] E. Balandraud, Une variante de la méthode isopérimetrique de Hamidoune,
appliquée au theoreme de Kneser, Preprint, december 2005.
[2] E. Balandraud, Quelques résultats combinatoires en Théorie Additive des
Nombres, Thèse de Doctorat de l’Université de Bordeaux I, May 2006.
[3] D. Grynkiewicz, A step beyond Kempermann structure Theorem, Preprint
May 2006.
[4] J. H. B. Kemperman, On small sumsets in Abelian groups, Acta Math. 103
(1960), 66–88.
[5] M. Kneser, Summenmengen in lokalkompakten abelesche Gruppen, Math.
Zeit. 66 (1956), 88–110.
[6] H.B. Mann, Addition Theorems, R.E. Krieger, New York, 1976.
[7] M. B. Nathanson, Additive Number Theory. Inverse problems and the ge-
ometry of sumsets, Grad. Texts in Math. 165, Springer, 1996.
[8] J.E. Olson, On the symmetric difference of two sets in a group. European
J. Combin. 7 (1986), no. 1, 43–54.
[9] T. Tao and V.H. Vu, Additive Combinatorics, Cambridge Studies in Ad-
vanced Mathematics 105 (2006), Cambridge University Press.
	Introduction
	An application of Olson's Theorem
ABSTRACT
  A recent result of Balandraud shows that for every subset S of an abelian
group G, there exists a non trivial subgroup H such that |TS| <= |T|+|S|-2
holds only if the stabilizer of TS contains H. Notice that Kneser's Theorem
says only that the stabilizer of TS must be a non-zero subgroup.
  This strong form of Kneser's theorem follows from some nice properties of a
certain poset investigated by Balandraud. We consider an analogous poset for
nonabelian groups and, by using classical tools from Additive Number Theory,
extend some of the above results. In particular we obtain short proofs of
Balandraud's results in the abelian case.

<|endoftext|><|startoftext|>
Microsoft Word - arxiv_prasadtext.doc
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 1 of 21)       
The Exact Boundary Condition to Solve the Schrödinger 
Equation of Many Electron System 
Rajendra Prasad 
Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India 
E-mail: Theochem@gmail.com 
In an attempt to bypass the sign problem in quantum Monte Carlo simulation of 
electronic systems within the framework of fixed node approach, we derive the exclusion 
principle “Two electrons can’t be at the same external isopotential surface 
simultaneously” using the first postulate of quantum mechanics. We propose the exact 
Coulomb-Exchange nodal surface i.e. the exact boundary condition to solve the non-
relativistic Schrödinger equation for the non-degenerate ground state of atoms and 
molecules. This boundary condition was applied to compute the ground state energies of 
N, Ne, Li2, Be2, B2, C2, N2, O2, F2, and H2O systems using diffusion Monte Carlo 
method. The ground state energies thus obtained agree well with the exact estimate of 
non-relativistic energies. 
INTRODUCTION 
An ideal target of a quantum chemist/physicist is to solve the non-relativistic 
Schrödinger equation exactly as it describes much of the world of chemistry. If we can 
solve this equation at a realistic cost, we can make very precise predictions. At present, 
only the full-CI method is available for obtaining the exact wave function within a given 
basis set, but this method is too demanding computationally and therefore not affordable 
even for a small system.  
In recent years increasing attention has been drawn to the random walk approach 
called diffusion Monte Carlo (DMC) method1 2 3 4 for solving Schrödinger equation. The 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 2 of 21)       
attractiveness of DMC method lies in that it can treat many body problems exactly. The 
DMC method is a projection method based on the combination of the imaginary time 
Schrödinger equation, generalized stochastic diffusion process, and Monte Carlo 
integration. The solution, it yields has only statistical error, which can be properly 
estimated and in principle, made as small as desired. Since in the DMC method the wave 
function has to be a population density, therefore, the DMC method can only describe the 
constant sign solution of the Schrödinger equation. This poses a serious problem if one is 
interested in the solution of a many electron system where the wave function is known to 
be antisymmetric (i.e. both positive and negative) with respect to interchange of two 
electrons. This situation is known as fermion sign problem in the quantum Monte Carlo 
literature1-4. The solution of this problem is one of the most outstanding in all of the 
computational physics/chemistry. This problem is often (mis)understood as a technical 
detail that defeating the numerical simulators. To the best of our knowledge no 
methodology is available to handle this problem in a systematic and controlled fashion. 
However, we think that it is essentially a problem of exact boundary, which is not known 
for many electron systems for obtaining well-behaved solutions of non-relativistic 
Schrödinger equation. It is our understanding that the boundary must be derived from the 
link between the formal mathematics and the physics of the real world. 
In this article, we will derive the boundary condition for atomic and molecular 
systems to obtain well-behaved solutions (i.e. bound state solution is single valued, 
continuous, quadratically integrable, and differentiable) of non-relativistic electronic 
Schrödinger equation. To start with, we are dealing with situations in which the ground 
state is non-degenerate only.  
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 3 of 21)       
THE EXACT BOUNDARY CONDITION 
We have the time independent Schrödinger equation: 
ˆ Ψ=Ψ EH                                                                                     …………..(1) 
where Ĥ  is the time independent non-relativistic electronic Hamiltonian operator in the 
Born-Oppenheimer approximation, E0 is the eigenvalue of the full many-electron ground 
state 0Ψ . The Ĥ  is defined in atomic units as follows: 
+−∇−=
electrons
electrons
electrons
2 1)(
1ˆ r                                                 ………..(2) 
where the external potential, ∑=
Nuclei
rV )(
,                                                 ………….(3) 
2∇  is Laplacian, ZI denotes nuclear charge, and  rIi and rij symbolize the electron-nucleus 
and electron-electron distance, respectively. 
Following Hohenberg-Kohn theorem I, a proof only of existence5, the electron 
density )(0 r
ρ  in the ground state 0Ψ is a functional of )(rV
, i.e. 
)(0 r
ρ  = )]([0 rV
ρ .                                                                     ………………………. (4) 
Further, the full many electron ground state 0Ψ  is unique functional of )(0 r
ρ , i.e. 
0Ψ = )]([ 00 r
ρΨ .                                                                                ……………………(5) 
Evidently we can say that 0Ψ  is a functional of )(rV
 i.e.  
0Ψ = )]([0 rV
Ψ .                                                                                    …………………. (6) 
We have a choice to express the exact density: 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 4 of 21)       
)]([0 rV
ρ  = ∑
2 )]([
φ                                                                          ………………..(7) 
Where N denotes number of electrons. The functionals, { }NirVi .....1)],([ =
φ  are exact 
ortho-normal one electron functions of the function )(rV
, which give exact )]([0 rV
(Caution to reader!! At the moment, here is nothing to do with so-called s, p, d, f, ..etc. 
type orbitals . The functionals { }NirVi .....1)],([ =
φ  are entirely different from those 
orbitals obtained from Kohn-Sham6 or similar formalisms.) 
Now we can write the exact N electron ground state wave function as a functional of N 
exact one electron functionals { }NirVi .....1)],([ =
    0Ψ = )]]([)],......,([)],([[
10 NN rVrVrV
φφφΨ                                                   ……(8)  
or 0Ψ = )]]([)],......,([)],([[ 22110 NN rVrVrV
φφφΨ                                                 ……..(9)  
Since each one electron functional in { }NirVi .....1)],([ =
φ  is a function of external 
potential )(rV
, we can also write the exact N electron ground state wave function in 
functional form as follows: 
    0Ψ = )](),......,(),([ 210 NrVrVrV
Ψ                                                …………………...(10) 
Thus the exact N electron non-degenerate ground state wave function is a unique 
functional of external potential experienced by each electron i.e. functional of 
)(),......,(),( 21 NrVrVrV
So far, it is not clear: 
• Whether the wave function is symmetric or antisymmetric with respect to 
interchange of any two electrons. 
• What are the analytical forms of { }NirVi .....1)],([ =
• What is the analytical form of the exact wave function? 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 5 of 21)       
• How to get the exact wave function from the exact density.  
However, we get some idea about the topology of a well behaved ground state non-
degenerate wave function and distribution functions in a given external potential )(rV
In particular: “The probability of n-electrons (where n = 2..N) being found 
simultaneously on the isopotential surface of an external potential )(rV
 is same 
irrespective of positions of the electrons on the surface.” 
Now we proceed to decide the nature (symmetric or antisymmetric with respect to 
interchange of any two electrons) of a well behaved many electron wave function.  
Defining the local energy, EL: 
∑∑ ∑ ∑
electrons
electrons
electrons
Nuclei
11 10
                                     ………….(11) 
The terms IiI rZ and ijr1  in the equation (11) will blow up if 0→Iir and 0→ijr  unless 
so-called cusp conditions are obeyed by 0Ψ . The 0Ψ  is exact and obeys electron nucleus 
(e-N) and electron-electron (e-e) cusp conditions.  
The wave function for a system of N identical particles must be symmetric or 
antisymmetric with regard to interchange of any two of the identical particles, i and j. 
Since the N particles are all identical, we could not have the wave function symmetric 
with regard to some interchanges and antisymmetric with regard to other interchanges. 
Thus the wave function of N identical particles must be either symmetric or 
antisymmetric with regard to every possible interchange of any two particles.  
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 6 of 21)       
 Let us assume 0Ψ is symmetric with regard to interchange of electrons i and j.  
There is a cusp in 0Ψ at rij = 0. This implies that 0Ψ  is not differentiable at rij = 0. 
Therefore, 0Ψ , symmetric with respect to interchange of any two electrons is not a well-
behaved solution. To make 0Ψ  a well-behaved wave function, 0Ψ  must be zero when rij 
= 0 and also it must change sign with respect to the interchange of two electrons, i.e. if 
ji rr
=  then 0Ψ  = 0.  This condition is universal and independent of kind of external 
potential. However, we are interested in a well-behaved solution of a bound state in a 
given external potential )(rV
. From the previous argument, we know that the 
simultaneous probability of finding two electrons is same everywhere at the isopotential 
surface. Therefore, if )( irV
- )( jrV
 = 0 then 0Ψ  = 0.  
Extending to N electron system, we have 
If ( ) 0)()(
=−= Π
rVrVf
 then 0Ψ  = 0. 
We can also express f  as Vandermonde determinant: 
)()(....)()(
)()(....)()(
)()(....)()(
11....11
rVrVrVrV
rVrVrVrV
rVrVrVrV
rVrVf
=−= Π               …(12) 
Consequently we have exclusion principle in the following form: 
“Two electrons can’t be at the same external isopotential surface simultaneously.” 
We see that if we are interested in a well behaved solution of the time 
independent Schrödinger equation, the boundary condition (12) (i.e. antisymmetric wave 
function) is obtained naturally due to singularity in the e-e interaction potential, which 
respects Pauli’s exclusion principle.  If electrons i and j are of opposite spin then we say 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 7 of 21)       
that )( irV
- )( jrV
 = 0 represents Coulomb (nodal) surface. If electrons i and j are of same 
spin then )( irV
- )( jrV
 = 0 represents Coulomb-Exchange nodal surface. All together, 
the ( ) 0)()(
=−= Π
rVrVf
 represents the Coulomb-Exchange nodal surface of N 
electron system. Hereafter, we will call f as ExchangeCoulombf −  nodal surface. However, the 
solution obtained for the Hamiltonian (2) within the boundary ExchangeCoulombf − =0 does not 
tell us about the spin multiplicity of the N electron system. 
Further, we can rewrite the functional f in terms of Hermite polynomials, )]([ rVH k
)]([)]([....)]([)]([
)]([)]([....)]([)]([
)]([)]([....)]([)]([
)]([)]([....)]([)]([
1112111
2122212
1112111
0102010
NNNNNN
rVHrVHrVHrVH
rVHrVHrVHrVH
rVHrVHrVHrVH
rVHrVHrVHrVH
−−−−−
=              …..(13) 
In particular, if we multiply an optimizable one electron functional )]([ rVψ  to the 
equation (13) and we obtain an N electron wave function: 
)]([)]([)]......([)]([)]([ 21 rVfrVrVrVNormrV N
rrrrr
ψψψ=Ψ   ……………………………(14) 
The one-electron density functional corresponds to the wave function (14): 
′′=′ ∑
1111 )]([)]([)]([)]([)]();([
kkk rVHrVHArVrVrVrV
rrrrrr
ψψρ      ……………….(15) 
where Ak is normalization constant of )]([)]([ rVHrV k
The two-electron density functional in terms of one-electron density functional: 
)]();([)]();([)]();([)]();([
)](),();(),([
21122211
rVrVrVrVrVrVrVrV
rVrVrVrV
′′−′′=
rrrrrrrr
……(16) 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 8 of 21)       
Here it appears that we can obtain exact ground state energy by optimizing only a one-
electron functional )]([ rVψ  in the equation (14). 
A very interesting and new physics obtained from the equation (13) is that each 
row in the determinant represents different level (k) of Kamalpur breathing (anharmonic 
quantum breathing) of electron cloud in a given external potential )(rV
 and each level, k 
is occupied by one electron (the elementary particle). 
BYPASSING THE SIGN PROBLEM 
We can bypass the fermion sign problem in the electronic structure diffusion 
Monte Carlo (DMC) method using fixed node approach. Here one assumes a prior 
knowledge of the nodal surface i.e. 
0(R) = 0. Due to tiling property
7 of the exact ground 
state wave function, the Schrödinger equation is solved in the volume embraced by the 
nodal surface, where the wave function has a constant sign and in this way the fermion 
sign problem is bypassed. The exact knowledge of Coulomb Exchange nodal surface 
allows us for an exact stochastic solution of the Schrödinger equation. The restriction in 
the random walk RR ′→  during the electronic structure diffusion Monte Carlo 
simulation is as follows: 
reject
accept
RfRf ExchangeCoulombExchangeCoulomb
)()(                                      ………….(17) 
We have applied the boundary condition (17) for the ground state electronic 
structure diffusion Monte Carlo simulation of N, Ne, Li2, Be2, B2, C2, N2, O2, F2, and 
H2O systems.  
Monte Carlo calculations can be carried out using sets of random points picked 
from any arbitrary probability distribution. The choice of distribution obviously makes a 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 9 of 21)       
difference to the efficiency of the method. If the Monte Carlo calculations are carried out 
using uniform probability distributions, very poor estimates of high-dimensional integrals 
are obtained, which is not a useful method of approximation. These problems are handled 
by introducing the importance sampling approach8 9.  In this approach the sampling 
points are chosen from a trial distribution, which concentrates on points where the trial 
function, ΦT(R) is large.  
 In the present DMC calculations, we have chosen the trial function, ΦT in the 
form: 
ΦT = Φ.F ,                                ….(18) 
where Φ denotes the Hartree Fock (HF) or multi configuration self consistent field 
(MCSCF) wave function and F is a correlation function that depends on inter-particle 
distances. The HF and MCSCF wave functions were obtained using the GAMESS 
package10 and employed Dunning’s cc-VTZ atomic basis set 11. In order to satisfy the 
electron nucleus (e-N) cusp condition, all Gaussian type s basis functions were replaced 
with eight Slater-type s basis functions. The exponents of Slater-type s functions were 
taken from Koga et al. 12 and satisfy the e-N cusp condition. 
We have chosen the Schmidt, Moskowitz, Boys, and Handy (SMBH) correlation 
function FSMBH
13. For the SMBH correlation function, Eqn. (19), we have included terms 
up to 2nd order, where order, s is defined as s = l + m + n. 
( ) 
+−= ∑∑ ∑
µµµµµ
atoms
electrons
iAASMBH rrrrrcF exp     ………….(19) 
where 
                                                                                        ………….(20) 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 10 of 21)       
and r denotes inter-particle distance. Six non-redundant parameters out of the total ten 
were optimized keeping b = 1.0 as follows: 
1) First we obtained optimal parameters by minimizing the energy and variance at 
the variational Monte Carlo (VMC) level. 
2) Using this VMC optimal trial function, the trial function fixed node DMC 
calculation was carried out and walkers were collected after each 2000 steps.  
Further, correlation parameters were reoptimized to minimize the variance with 
~100,000 walkers. Here reference energy was set to the trial function fixed node 
DMC energy.  
These optimized trial functions were used for importance sampling in the DMC 
simulation and a random walk RR ′→  was accepted if 
0)()( >′−− RfRf ExchangeCoulombExchangeCoulomb .  
The DMC calculations were performed using the open source quantum Monte 
Carlo program, ZORI14. Around 10,000 walkers were used for the systems studied. The 
Umrigar et al.15 algorithm was used for DMC walks and Caffarel Assaraf et al.16 
algorithm for population control. We have allowed only one electron walk at a time. The 
DMC calculations were done at several time steps. We report only those energies 
extrapolated to zero time step. 
 We present the ground state DMC energies of N, Ne, Li2, Be2, B2, C2, N2, O2, F2, 
and H2O systems in Table I. The DMC energies obtained using our newly derived 
boundary ExchangeCoulombf −  = 0 are far better than the trial function fixed node DMC 
energies17 and compare well with the experimental counterpart. However, present 
simulations were noisy and unpleasant compared to conventional trial function fixed 
node DMC simulations.  It is worth noting that we have obtained DMC energy even 
lower than the exact value at smaller time steps for the atoms of relatively larger atomic 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 11 of 21)       
radius perhaps due to failure of the distributions to reach the steady state or equilibrium 
distributions in a finite number of steps. This problem can be handled in the Green’s 
function quantum Monte Carlo (GFQMC) method as it takes the advantage of the 
properties of Green’s functions in eliminating time-step entirely in treating the steady 
state equation. The GFQMC is well suited if boundaries are exactly known18. If the trial 
function boundary and the ExchangeCoulombf − = 0 does not coincide and also non-zero values 
of trial function are very much different from the exact solution, which could lead to 
large statistical fluctuations from poor sampling and possibly to an effective non-ergodic 
diffusion process due to the finite projection time in practical calculations. Therefore, we 
are looking for an alternative well behaved trial function whose boundary coincides with 
those of ExchangeCoulombf − .  
CONCLUSION 
This article presents a progress of the author's research in order to get exact 
solution of non-relativistic Schrödinger equation of many electron systems. A conclusion 
of this on going research is that we have derived the exclusion principle “Two electrons 
can’t be at the same external isopotential surface simultaneously” using the first postulate 
of quantum mechanics. We propose the exact Coulomb-Exchange nodal surface i.e. the 
exact boundary to solve the non-relativistic Schrödinger equation for non-degenerate 
ground state of atoms and molecules. Using this newly derived boundary condition, one 
can bypass the fermion sign problem in the electronic structure Quantum Monte Carlo 
simulation and hence the exact ground state energy as well as the exact electron density. 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 12 of 21)       
                                                                                                                                                 
REFERENCES 
1.  Anderson, J. B. A random-walk simulation of the Schrödinger equation: +3H . J. Chem. 
Phys.  63, 1499-1503 (1975) 
2.  Ceperley, D. M. & Mitas, L. in New Methods in Quantum Mechanics, I. Prigogine, S. 
A. Rice, Eds.,(John Wiley and Sons, New York, 1996), Vol. 93. 
3. Hammond, B. L., Lester, Jr, W. A. & Reynolds, P. J. Monte Carlo Methods in Ab 
Initio Quantum Chemistry; World Scientific: Singapore, 1994. 
4.  Ceperley, D. & Alder, B. Quantum Monte Carlo. Science 231, 555-560 (1986) 
5.  Hohenberg,  P. & Kohn, W. Inhomogeneous Electron Gas. Phys. Rev. 136, B864-
B871 (1964) 
6.  Kohn, W. & Sham, L. J. Self-Consistent Equations Including Exchange and 
Correlation Effects. Phys. Rev. 140, A1133-A1138 (1965) 
7.  Cepereley, D. M. Fermion nodes. J. Stat. Phys. 63, 1237-1267 (1991) 
8.  Metropolis, N. A., Rosenbluth, W., Rosenbluth, M. N., Teller, A. H. & Teller, E. 
Equation of State Calculations by Fast Computing Machines. J. Chem. Phys. 21, 1087-
1092 (1953)  
9.  Grimm, R. C. & Storer, R. G. Monte-Carlo solution of Schrödinger's equation.  J. 
Comput. Phys. 7, 134-156 (1971) 
10.  Schmidt, M. W. et al. General atomic and molecular electronic structure system. J 
Comput Chem 14, 1347-1363 (1993)  
11.  Dunning, Jr., T. H. Gaussian basis sets for use in correlated molecular calculations. I. 
The atoms boron through neon and hydrogen.  J. Chem. Phys. 90, 1007-1023 (1989) 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 13 of 21)       
                                                                                                                                                 
12.  Koga, T., Kanayama, K., Watanabe, S. & Thakkar, A. J. Analytical Hartree-Fock 
wave functions subject to cusp and asymptotic constraints: He to Xe, Li+ to Cs+, H- to I-. 
Int. J. Quantum Chem. 71, 491-497 (1999)  
13. Schmidt, K. E. & Moskowitz, J. W. Correlated Monte Carlo wave functions for the 
atoms He through Ne.  J. Chem. Phys. 93, 4172-4178 (1990) 
14. Aspuru-Guzik , A. et al. Zori 1.0: A parallel quantum Monte Carlo electronic 
structure package. J. Comp. Chem. 26, 856-862 (2005) 
15.  Umrigar, C. J., Nightingale, M.P. & Runge, K.J. A diffusion Monte Carlo algorithm 
with very small time-step errors. J. Chem. Phys. 99,2865-2890 (1993) 
16.  Assaraf, R., Caffarel, M. & Khelif, A. Diffusion Monte Carlo methods with a fixed 
number of walkers. Phys. Rev. E. 61, 4566-4575 (2000) 
17.  Filippi, C. & Umrigar, C. J. Multiconfiguration wave functions for quantum Monte 
Carlo calculations of first-row diatomic molecules. J. Chem. Phys. 105, 213-226 (1996) 
18.  Kalos, M. H., Monte Carlo Calculations of the Ground State of Three- and Four-
Body Nuclei.  Phys. Rev. 128, 1791-1795 (1962) 
ACKNOWLEDGEMENTS 
The QMC calculations were carried out at the Lawrence Berkeley National Laboratory, 
Berkeley. The author gratefully acknowledges Professor W. A. Lester for his support 
during the stay at Berkeley. The author is indebted to Professor P. Chandra of Banaras 
Hindu University, Varanasi for his interest and helpful discussion during the preparation 
of the manuscript. Professor S. K. Sengupta of Banaras Hindu University, Varanasi is 
acknowledged for careful reading of the manuscript. 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 14 of 21)       
                                                                                                                                                 
TABLE I. The total ground state energies obtained from fixed-node DMC calculation.  
Atom / 
Molecule 
Bond 
length 
CSF,D ETFN-DMC 
(Ref. 17) 
ECEN-DMC 
(Extrapolated to τ=0) E0 
-, 111 
-54.5753(3) 
-54.5841(5) 
-54.5902(11) 
-54.5892 
Ne  1,1 -128.9216(15) -128.938(1) -128.9375 
Li2 5.051 
-14.9911(1) 
-14.9938(1) 
-14.9955(5) 
-14.9954 
Be2 4.63 
5,16 
-29.3176(4) 
-29.3301(2) 
-29.3378(15) 
-29.33854(5) 
B2 3.005 
6,11 
-49.3778(8) 
-49.3979(6) 
-49.41655(45) 
-49.415(2) 
C2 2.3481 
4,16 
77, 314  
-75.8613(8) 
-75.8901(7) 
-75.9035(9) 
-75.9229(19) 
-75.923(5) 
2.068 
4,17 
-, 552 
-109.487(1) 
-109.505(1) 
-109.520(3) 
-109.5424(15) 
-109.5423 
O2 2.282 
-150.268(1) 
-150.277(1) 
-150.3274(15) 
-150.3268 
F2 2.68 
-199.478(2) 
-199.487(1) 
-199.5289(25) 
-199.5299 
-199.52891(4) 
H2O  
-, 300 
-76.4175(4) 
-76.429(1) 
-76.4376(11) 
-76.438(3) 
-76.4376 
Bond lengths and energies are in atomic units. In the third column, we list the number of 
configuration state functions (CSFs) and number of determinants (D) in the trial function 
(ΦT). ETFN-DMC denotes the DMC energy with ΦT =0. ECEN-DMC denotes the DMC energy 
with ExchangeCoulombf − = 0. E0 denotes the exact, non-relativistic, infinite nuclear mass 
energy. The numbers shown in bracket are error bar. 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 15 of 21)       
                                                                                                                                                 
Supplementary note for reviewers: 
1. The proposed theory is to deal only real interacting many electron systems. To 
start with only non-degenerate ground state of many electron systems are 
considered. Author is neither interested nor intended to deal any kind of non-
interacting systems such as free fermion, free electron gases, or free particles 
because author think that none of the real system belong to either of these classes. 
Author has chosen to construct the boundary condition from the link between the 
formal mathematics and the physics of many electron systems. 
2. Difference between spatial nodes and Coulomb-Exchange nodal surface: 
I hope that people can distinguish spatial nodes and Coulomb Exchange nodes 
and the physics behind the different kind of nodes. Whatever I have discussed in 
this paper is only about Coulomb-Exchange nodal surfaces. There is no analogy 
with a particle in a box node and Coulomb-Exchange nodal surfaces. For 
example: The function f(r1,r2)=(r1-1)(r2-1)(r1-r2)exp(-2 r1-2 r2) is antisymmetric 
with respect to interchange of two electrons. However, the node (r1-1)(r2-1) is 
symmetric with respect to interchange of two electrons and fixed and this node 
can be compared with nodes of particle in a box. The Coulomb-Exchange node 
(r1-r2) is antisymmetric with respect to interchange of two electrons and 
responsible for removal of singularity in e-e interaction potential. The Coulomb-
Exchange nodal surfaces only occur in a system of more than one electron 
systems. Author understands that the Coulomb-Exchange nodal surfaces are 
directly responsible for the existence of real many electron systems. 
3. A consequence of proposed solution of the sign problem is that the ground 
state of Helium atom in the non-relativistic limit has a nodal surface.  
However, it is understood that He ground state wave function is symmetric 
and has no such node. 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 16 of 21)       
                                                                                                                                                 
A consequence of proposed solution for the sign problem is that the ground state 
of the Helium atom in the non-relativistic limit has Coulomb-Exchange nodal 
surface, r1-r2=0. 
An understanding that He atom ground state wave function is symmetric and has 
no such node is an illusion only. This illusion arises due to a practice that the 
QMC people using phi(1)*phi(2)*Jastrow trial function, where phi(r) is 1s orbital. 
The trial function is symmetric with respect to exchange of two electrons. The 
trial function also satisfies electron nucleus cusp condition. We also expect that 
the final solution will satisfy e-e cusp condition. Since the trial function is 
symmetric, people got accurate energy and assumed that the final solution is also 
symmetric and it does not have any node also wave function is non-zero at the 
point of coincidence of two electrons. Where is Coulomb hole? However, it can 
be proven that a symmetric solution is not acceptable. Proofs are as follows:  
“Proof for the existence of Coulomb-Exchange node in He ground 
state exact wave function” 
A.) Let assume  
 ),( 21 rrsym
Ψ  is an exact symmetric wave function. 
 i.e. ),( 21 rrsym
Ψ = ),( 12 rrsym
Since ),( 21 rrsym
Ψ  is exact, it must satisfy the cusp condition at 21 rr
= . Clearly 
there is a cusp at 21 rr
Since there is a cusp at 21 rr
=  in ),( 21 rrsym
Ψ , the second derivative 
2 ),( xrrsym ∂Ψ∂
 is not defined at 21 rr
Therefore ),( 21 rrsym
Ψ  is not a well behaved solution and hence it is not an 
acceptable wave function. Only option left is antisymmetric solution. 
B.) Another proof: 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 17 of 21)       
                                                                                                                                                 
 ),( 21 rr
Ψ  = ∑∑
νµµν φφ )()( 21 rrc
                                                  ………...(S-1) 
 Where { })(rrµφ  forms a real infinite one-electron basis. 
 Since ),( 21 rr
Ψ  is expanded over infinite basis set and hence it is exact. 
 This implies that ),( 21 rr
Ψ satisfies the cusp condition at 21 rr
 ),( 21
Ψ∇ =∑∑
νµµν φφ )()( 21
1 rrc
.                                         ………...(S-2) 
The second derivative ),( 21
Ψ∇  is continuous at 21 rr
=  because each term in 
the expansion is continuous (the rules of continuity for algebraic combinations). 
 This implies that there is no cusp in ),( 21 rr
Ψ  at 21 rr
 BUT ),( 21 rr
Ψ has to satisfy the cusp condition at 21 rr
 This is only possible if ),( 21 rr
Ψ changes the sign at 21 rr
And hence the exact ),( 21 rr
Ψ  has exchange node irrespective of its spin 
multiplicity. 
C.) More illustrative example: 
 Hamiltonian for He atom: 
H +−−∇−∇−=                                                    ………...(S-3) 
 and )2,1()2,1( Ψ=Ψ EH                                                                       ………...(S-4) 
 Let expand  
νµµν φφ )2()1()2,1( c                                                           ………...(S-5) 
Where { })(rµφ  is complete set of eigen functions of the Hamiltonian 
1ˆ 2 −∇−=  with eigenvalue equation )()(ˆ rrh µµµ φεφ = . 
 Rewriting the He Hamiltonian: 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 18 of 21)       
                                                                                                                                                 
H +−−∇−∇−= =
hh ++                            ………...(S-6) 
)2,1(
)2()1()ˆˆ(
)2,1(
)2,1()
)2,1(
)2,1(ˆ
EL +Ψ
νµµν φφ
          ………...(S-7) 
 Since )(rµφ  is an eigen function of ĥ . 
 We can write  
)2,1(
)2()1()(
EL +Ψ
νµνµµν φφεε
                                      ………...(S-8) 
 µννµ εε dE ++=  
 µννµ εε dE −=+                                                                              ………...(S-9) 
)2,1(
)2()1()(
EL +Ψ
νµµνµν φφ
                                   ………...(S-10) 
)2,1(
)2()1(
)2,1(
)2()1(
EL +Ψ
∞ ∞∞ ∞
νµµνµν
νµµν φφφφ
    ………...(S-11) 
)2,1(
)2()1(
)2,1(
)2,1(
EL +Ψ
νµµνµν φφ
                           ………...(S-12) 
)2,1(
)2()1(
EEL +Ψ
νµµνµν φφ
                                      ………...(S-13) 
 If )2,1(Ψ  is exact then 
 EEL =  
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 19 of 21)       
                                                                                                                                                 
 and 
)2,1(
)2()1(
νµµνµν φφ
                                             ………...(S-14) 
Now let assume that the exact solution is symmetric with respect to interchange of 
two electrons. )2,1(Ψ  and )(rµφ  are well behaved and differentiable. From the 
rules of continuity for algebraic combinations, 
the term in equation (S-14), 
)2,1(
)2()1(
νµµνµν φφdc
 is continuous and finite 
and it should not diverge when 012 →r . Therefore, symmetric solution is not 
acceptable. 
However, if )2,1(Ψ  = 0 at r12=0 then 
)2,1(
)2()1(
νµµνµν φφdc
 will also diverge 
and can compensate the divergence in 1/r12 term.   
Thus the only acceptable solution is antisymmetric (with respect to interchange 
of two electrons) solution. 
D.) Another example: 
Almost all QMC people believe (their believe is based on some assumptions and 
approximations) that He atom ground state wave function is symmetric. This is an 
illusion.  This can be understood as follows: 
Let us take trial functions of two electron system: 
( )221
1 xxb
xxg 21
211 ),(
−−=  
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 20 of 21)       
                                                                                                                                                 
( ) eeexx Uxxxxg 21 22212 221),(
( ) eeexx Uxxxxg 21 22213 21),( −
−−=  
If someone claims that He ground state is symmetric, what kind of exact 
symmetric wave functions they are getting finally? The functions g1and g2 are 
symmetric with respect to interchange of two electrons. The functions like g1, g2, 
and g3 can satisfy cusp condition. The functions g1 and g2 are not differentiable at 
x1=x2 and therefore these are not acceptable. The antisymmetric function g3 are 
differentiable at x1=x2. 
However, people have got very accurate ground state energy for He atom using 
HF*Jastrow trial function and they concluded that He atom has no node without 
examining the simultaneous probability of finding two electrons at exactly same 
place. I think, they got good results due to inherent beauty of DMC technique.  
The functions g2 is symmetric and g3 is antisymmetric with respect to interchange 
of two electrons. However, g2*g2 and g3*g3 give the exactly same probability 
distribution i.e. same physics. Functions g2 and g3 vanish when x1=x2. Further, the 
VMC calculation for g2 and g3 will give exactly same energy. Can anyone predict 
that the VMC energies obtained from g2 and g3 represent singlet or triplet state? I 
am sure it is not possible.  
An antisymmetric wave function can satisfy cusp condition as well as it’s 
derivative will be continuous simultaneously at the point of coincidence. Here 
symmetric and antisymmetric wave functions serve same distribution. Why I 
should not prefer antisymmetric wave function for which a boundary condition 
can be imposed easily? 
4. Further, if I assume the argument  “He ground state wave function is 
symmetric and has no such node” is correct. The end will be a nonsense, 
which is as follows: 
____________________________________________________________________________________________________________ 
The Exact Boundary Condition to Solve the Schrödinger Equation of Many Electron System  
By Rajendra Prasad, Village + Post: Kamalpur, District: Chandauli, Uttar Pradesh, PIN: 232106, India E-mail: Theochem@gmail.com 
4/3/2007  (Page 21 of 21)       
                                                                                                                                                 
It is very much common practice in QMC calculation to take Hartree-Fock trial 
function as a product of alpha-beta determinants. For example N atom: 
PSIT(1,2,3,4,5,6,7) = Det(1,2,3,4,5,6,7). 
PSIT(1,2,3,4,5,6,7) = Detα (1,2,3,4,5)*Detβ(6,7). 
PSIT(1,2,3,4,5,6,7) = Detα(1,2,3,4,5)*Detβ(6,7). 
Detα(1,2,3,4,5)*Detβ(6,7) ≠ Detα(1,2,3,4,7)*Detβ(6,5) 
The trial function PSIT is clearly neither symmetric nor antisymmetric with 
respect to interchange of alpha-beta electrons. It is not clear to me what kind of 
final solution (i.e. symmetric or antisymmetric) we will obtain with this trial 
function fixed node DMC? The fact that we can not write PSIexact = Phiα*Phiβ. 
5. It is natural to ask what is node of 3S He atom and why people are getting 
very accurate energy with exchange node r1-r2=0?  
At the moment, I can only say that this is due to artifact of importance sampling 
because people have used HF*Jastrow trial function. The correlation energy for 
He(3S) atom is around 2mH and the overlap of HF trial function with exact wave 
function can be anticipated to be more than 99%. Perhaps due to technical reasons 
final DMC solution may have converged to He(3S). I have seen some recent 
papers on the node of He(3S) system. It is widely claimed that the node r1-r2=0 
belongs to He(3S) system and it is exact. I differ with their argument and I proved 
that the exchange node r1-r2=0 belongs to He ground state. I do not know if 
anyone have performed DMC calculation with a trial function like psit(r1,r2)=(r1-
r2)*exp(-2*r1)*exp(-2*r2) and reported the energy for He(
3S).  Anyway, at present 
I am interested only in the non-degenerate ground state of atoms and molecules. 
Author welcomes further comments, questions, and suggestions if any. 
Theochem@gmail.com
ABSTRACT
  In an attempt to bypass the sign problem in quantum Monte Carlo simulation of
electronic systems within the framework of fixed node approach, we derive the
exclusion principle "Two electrons can't be at the same external isopotential
surface simultaneously" using the first postulate of quantum mechanics. We
propose the exact Coulomb-Exchange nodal surface i.e. the exact boundary
condition to solve the non-relativistic Schrodinger equation for the
non-degenerate ground state of atoms and molecules. This boundary condition was
applied to compute the ground state energies of N, Ne, Li2, Be2, B2, C2, N2,
O2, F2, and H2O systems using diffusion Monte Carlo method. The ground state
energies thus obtained agree well with the exact estimate of non-relativistic
energies.

<|endoftext|><|startoftext|>
Introduction
	Fragmentation of 9Be nuclei
	Fragmentation of 14N nuclei
	Fragmentation of 7Be, and 8B nuclei
	 Conclusions
	References
ABSTRACT
  Recent studies of clustering in light nuclei with an initial energy above 1 A
GeV in nuclear treack emulsion are overviewed. The results of investigations of
the relativistic $^9$Be nuclei fragmentation in emulsion, which entails the
production of He fragments, are presented. It is shown that most precise
angular measurements provided by this technique play a crucial role in the
restoration of the excitation spectrum of the $\alpha$ particle sysytem. In
peripheral interactions $^9$Be nuclei are dissociated practically totally
through the 0$^+$ and 2$^+$ states of the $^8$Be nucleus.
  The results of investigations of the dissociation of a $^{14}$N nucleus of
momentum 2.86 A GeV/c in emulsion are presented as example of more complicated
system. The momentum and correlation characteristics of $\alpha$ particles for
the $^{14}$N$\to3\alpha+X$ channel in the laboratory system and the rest
systems of 3$\alpha$ particles were considered in detail.
  Topology of charged fragments produced in peripheral relativistic
dissociation of radioactive $^8$B, $^7$Be nuclei in emulsion is studied.

<|endoftext|><|startoftext|>
Super-shell stru
tures and pairing in ultra
old trapped Fermi gases
Magnus Ögren
and Henning Heiselberg
Mathemati
al Physi
s, Lund Institute of Te
hnology, P.O. Box 118, SE-22100 Lund, Sweden
University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark
(Dated: April 3, 2007)
We 
al
ulate level densities and pairing gaps for an ultra
old dilute gas of fermioni
 atoms in
harmoni
 traps under the in�uen
e of mean �eld and anharmoni
 quarti
 trap potentials. Super-
shell stru
tures, whi
h were found in Hartree-Fo
k 
al
ulations, are 
al
ulated analyti
ally within
periodi
 orbit theory as well as from WKB 
al
ulations. For attra
tive intera
tions, the underlying
level densities are 
ru
ial for pairing and super-shell stru
tures in gaps are predi
ted.
PACS numbers: 03.75.Ss, 05.30.Fk
Ultra
old atomi
 gases have re
ently been used to 
re-
ate novel quantum many-body systems su
h as strongly
intera
ting high temperature super�uids of fermions,
Bose-Einstein 
ondensates, Mott insulators in opti
al lat-
ti
es, et
. These lab phenomena have a strong over-
lap with 
ondensed matter [1℄, nu
lear [2℄ and neutron
star physi
s [3℄. Finite fermion systems su
h as atoms in
traps, nu
lei, helium and metal 
lusters, semi
ondu
tor
quantum dots, super
ondu
ting grains, et
., have addi-
tional interesting quantum stru
tures su
h as level spe
-
tra, densities and pairing. These will be observable as
temperatures are further lowered in atomi
 trap experi-
ments. The high degree of 
ontrol over physi
al param-
eters, in
luding intera
tion strength and density, makes
the atomi
 traps marvelous model systems for general
quantum phenomena.
The purpose here is to 
al
ulate the level spe
tra,
densities and pairing for zero-temperature Fermi gases
in harmoni
 os
illator (HO) traps with anharmoni
 and
mean �eld perturbations, and to show that novel super-
shell stru
tures appear in both level densities and pairing.
In 
al
ulating level spe
tra by analyti
al periodi
 orbit
theory and WKB as well as numeri
al Hartree-Fo
k, we
also relate these di�erent theoreti
al approa
hes to one
another.
We treat a gas ofN fermioni
 atoms of massm in a HO
potential at zero temperature, intera
ting via a two-body
intera
tion with s-wave s
attering length a. We shall
mainly dis
uss a spheri
ally symmetri
 trap and a dilute
gas (i.e. where the density ρ obeys the 
ondition ρ|a|3 ≪
1) of parti
les with two spin states of equal population.
The Hamiltonian is then given by
mω2r2i + U(ri)
, (1)
We will 
onsider both external anharmoni
 potentials of
the form U = εr4 and parti
le intera
tions: U(ri) =
(2π~2a/m)
j 6=i δ
3(ri−rj). When intera
tions are weak,
the latter 
an be approximated by the mean �eld poten-
U(r) =
2π~2a
ρ(r) . (2)
For a large number of parti
les and U = 0 the Fermi
energy is EF = ñF~ω where nF = ñF − 3/2 ≃ (3N)1/3
is the HO quantum number at the Fermi surfa
e. The
HO shells are highly degenerate with states having an-
gular momenta l = nF , nF −2, ...,mod(nF , 2), due to the
U(3) symmetry of the 3D spheri
ally symmetri
 HO po-
tential. However, intera
tions split this degenera
y. In
the Thomas-Fermi (TF) approximation (see, e.g., [4℄) the
Fermi energy is
2k2F (r)
mω2r2 + U(r) . (3)
The density ρ(r) = k3F (r)/3π
ρ(r) = ρ0
1− r2/R2TF
, (4)
inside the 
loud r ≤ RTF = aosc
2ñF , where ρ0 =
(2ñF )
3/2/3π2a3osc is the 
entral density [5℄. For 
onve-
nien
e we set the os
illator length aosc =
~/mω = 1 in
the following.
Taylor expanding the density and thereby also the
mean �eld of Eq. (2) around the 
enter gives
ρ(r) ≃ ρ0
r2/R2TF +
r4/R4TF + ...
, (5)
the �rst term will simply in
orporate a 
onstant shift
in energies whereas the term quadrati
 in radius renor-
malizes the HO frequen
y as ωeff = ω
1− 6πaρ0/R2TF .
The third term is quarti
 in radius and is therefore also
of the same form as the external potential
U(r) ≃ εr4 , (6)
with ε = (3π~2a/4m)ρ0/R
TF . Both the pure quarti
potential and the mean �eld potential of Eq. (2) are an-
harmoni
 and 
hange the level density by splitting the l
degenera
y of the HO shell nF at the Fermi surfa
e.
We will now 
al
ulate analyti
ally the level spe
tra
from perturbative periodi
 orbit theory for the quarti
potential and subsequently within semi
lassi
al WKB
wavefun
tions for both the quarti
 and the mean �eld
potential of Eq. (2). We will start with repulsive inter-
a
tions where pairing is not present.
http://arxiv.org/abs/0704.0385v1
In periodi
 orbit theory [6℄, the level density 
an be
written (to leading order in ~
) in terms of a perturba-
tive HO tra
e formula [7, 8℄
g(E) =
1 +Re
(−1)k M ei2πkE/~ω
. (7)
For the unperturbed HO (U = 0) the modulation fa
tor is
M = 1. For a quarti
 perturbed potential, as in Eq. (6),
the modulation fa
tor was 
al
ulated in [8℄
e−i2kσ/~−iπ/2 + e−i3kσ/~+iπ/2
, (8)
with σ = επE2/~2ω3, being a small 
lassi
al a
tion. The
two terms arise from the 
hange in a
tions for the 
ir
le
and diameter orbits respe
tively due to the quarti
 po-
tential [8℄. The resulting level density 
an be written in
the fa
torised form [9℄
g(E) =
(~ω)3
(−1)k
. (9)
Here, the �rst term is the average level density, the

osine fa
tor gives the rapid HO shell os
illations (mod-
i�ed by the perturbation) whi
h, however, are slowly
modulated by the sine fa
tor resulting in a beating pat-
tern. Moreover, the non-perturbed HO limit, equivalent
to M = 1 in Eq. (7), is re
overed in the limit of |ε| → 0,
where the U (3) symmetry is restored. The k = 1 term
in Eq. (9) gives the major os
illations in the level density
and is shown in Fig. 1 (a). The beating pattern or super-
shells is 
learly observed. The shell os
illations vanish
when the argument of the sine in Eq. (9) is an integer
S = 1, 2, 3, ... times π, i.e. |ε|E2/2(~ω)3 = S. This gives
the supernode 
ondition
nF = E/~ω =
2S~ω/|ε| . (10)
We now turn to an alternative 
al
ulation of the level
density with WKB. The splitting of the HO shells degen-
erate levels l = nF , nF −2, ...,mod (nF , 2) in the shell nF
by the mean-�eld potential 
an be 
al
ulated perturba-
tively in the dilute limit. An ex
ellent approximation for
the radial HO wave fun
tion with angular momentum l
and (nF − l)/2 radial nodes in the HO shell when nF ≫ 1
is the WKB one [10, 11℄:
RnF l(r) ≃
sin(kl(r)r + θ)
l (r)r
, (11)
between turning points r2± = ñF ±
ñ2F − l(l + 1). Here,
ñF = nF + 3/2 and the WKB wave number kl(r) is
k2l (r) = 2ñF − r2 − l(l+ 1)/r2 . (12)
When nF ≫ 1 the wave fun
tion has many nodes 1 ≪
l ≪ nF and the os
illations in R2nl(r) 
an be averaged
〈sin2(kl(r)r)〉 = 1/2 [10℄. The phase θ is then unim-
portant. The single-parti
le energies for the anharmoni
potential of Eq. (6) are simply
EnF ,l − ñF~ω =
U(r)|RnF l(r)|2r2dr (13)
πkl(r)
3ñ2F − l(l + 1)
.(14)
It is spe
ial for the quarti
 perturbation that the level
energies are linear in l(l+1). The resulting level spa
ing
in
reases as (2l+1) just as the level degenera
y for SO(3)
symmetry. Therefore the level density is 
onstant within
the bandwidth
D ≡ |EnF ,l≡0 − EnF ,l=nF | = εn2F /2 (15)
on energy s
ales larger than 2D/nF but smaller than D.
The level density vanishes between the bandwidths of
two neighbouring n shells and therefore it generally has
a strong os
illatory behavior as shown in Fig. 1 (a). Its
amplitude is largest when D ∼ ~ω/2. However, when
D ≃ ~ω the level density is 
onstant and the os
illa-
tory behavior vanishes. This phenomenon repeats when
D = S~ω sin
e the level spe
tra then overlap S times.
With the bandwidth of Eq. (15) under this 
ondition, we
obtain exa
tly the same supernode 
ondition as for pe-
riodi
 orbit theory, Eq. (10). We 
on
lude that Craig's
perturbative periodi
 orbit theory [7℄ is in exa
t agree-
ment with perturbative WKB for a quarti
ally perturbed
spheri
al symmetri
 HO in three dimensions.
We now turn to the slightly more 
ompli
ated mean
�eld potential of Eq. (2). Its level spe
trum 
an also be

al
ulated from the WKB wave fun
tions of Eq. (11).
Inserting them in Eq. (13), we obtain
EnF ,l − ñF~ω = 2/
F ~ω I . (16)
Here, the integral I is
1− l(l+ 1)/ñ2F
1− x2
where x = (r2−ñF )/
ñ2F − l(l + 1). This integral is I =
π for l ≃ nF and I = 8
2/3 for l = 0. The bandwidth is
therefore
D = 2/
2/3− π
. (18)
Inserting this bandwidth in the supernode 
ondition D =
S~ω gives
2/3− π
F = S . (19)
For example in the 
ase 2πa = 1 the supernodes S =
1, 2, 3, .. should o

ur when nF ∼ 28, 44, 58, et
. The
Hartree-Fo
k (HF) 
al
ulations of the os
illating part of
the total energy, whi
h is proportional to the level den-
sity at the Fermi level [6℄, result in slightly higher su-
pernodes, as in Fig 1 (b). The di�eren
es arise be
ause
the WKB 
al
ulations are perturbative in the intera
tion
strength, whereas in the HF 
al
ulation the MF poten-
tial U in
ludes a large s
attering length whi
h, e.g., leads
to 
orre
tions for the e�e
tive os
illator frequen
y. Also
for the purely quarti
 term the perturbative approa
h
underestimates the exa
t supernodes (see Fig. 3 of [8℄).
For weaker intera
tions 2πa = 0.1, the �rst supernode
S = 1 should o

ur at nF = 130 a

ording to the 
ondi-
tion of Eq. (19), in 
loser agreement with the HF result
of Fig. 1 (
).
For 
omparison, the Taylor expansion of the mean �eld
potential leads to the supernode 
ondition of Eq. (10)
with ε = (3π~2a/4m)ρ0/R
TF . It di�ers from Eq. (19)
by the prefa
tor, whi
h is ∼ 34% smaller. It is a better
approximation to expand e.g. around r = RTF /2
nF /2, where the 
orresponding prefa
tor is only ∼ 8%
smaller, su
h that the supernode in Fig 1 (
) is predi
ted
to nF = 137. Now expanding I of Eq. (17) for small
l ≪ nF , one �nds
I = (8/3)
2− l2/
2n2F , (20)
resulting in the level spe
trum [10℄
EnF ,l − ñ~ω =
− l(l+ 1)
. (21)
This level density is 
onstant at low l as for the potential
in Eq. (14). However, near l ∼ nF the density of lev-
els is slightly smaller as 
an be seen from the bandwidth

orresponding to Eq. (21), whi
h is ∼ 12% larger, for a
given nF , than the bandwidth of Eq. (18). That the level
density is not 
ompletely 
onstant within the bandwidth
has the e�e
t that a small periodi
ity remains even at
the super-shell 
ondition D = S~ω. Therefore the shell
os
illations do not disappear 
ompletely at the supern-
odes, as 
an be seen in Fig. 1 (b,
), whereas for the purely
quarti
 
ase (a) the os
illations disappear 
ompletely at
the supernodes.
Most atomi
 traps are not spheri
al but 
igar shaped
(prolate) with ωz
<∼ω⊥. The unperturbed HO energies
E = nz~ωz + n⊥~ω⊥ will generally lead to a 
onstant
level density for energy s
ales larger than ~ωz but smaller
than ~ω⊥. When the os
illator frequen
y ratio ω⊥/ωz is a
rational number, level degenera
ies and larger os
illations
will o

ur on the s
ale ~ωz. Intera
tions will, however,
smear this level density. In any 
ase, super-shell stru
ture
is not expe
ted as in the spheri
al symmetri
 
ase. In
very oblate traps ωz ≫ ω⊥ the mean �eld potential is
e�e
tively two-dimensional and quadrati
, i.e. it does
not split the HO shells [10, 13℄. Thus we may expe
t
strong os
illations in the level density on the s
ale ~ω⊥,
but again no super-shell stru
ture.
0 20 40 60 80 100
−1000
1000 (a)
0  20 40 60 80 100 120 140
200 (c)
0 20 40 60 80 100
40 (b)
Figure 1: (
olor online) The upper �gure (a) shows the leading
term (k = 1) of the os
illating part of the perturbative level
density of Eq. (9) as a fun
tion of nF = E/~ω, for the 
ase of
an external potential V = VHO+εr
with ε = 2/402 ≈ 0.0013.
The middle and lower �gures (b,
) show the os
illating part
of the total energy a

ording to a numeri
al HF 
al
ulation
[12℄, with intera
tion strength 2πa = 1 and 2πa = 0.1, as a
fun
tion of the HO shell number (~ = ω = 1). This illustrates
qualitatively that a supernode, e.g. at nF = 40, 
an be due
to intera
tion (b) and/or an additional quarti
 term to the
HO potential (a).
Attra
tive intera
tions lead to pairing by an amount
that is exponentially sensitive to the underlying level den-
sity near the Fermi surfa
e [2, 10, 11, 14℄. The level den-
sity is the same for repulsive and attra
tive intera
tions
ex
ept that the levels are reversed when the sign of ε
(Eqs. (9)) and (13)) or a is 
hanged (Eq. (16)). Therefore
we 
an use the level densities and bandwidths 
al
ulated
above for pairing 
al
ulations. Pairing in �nite systems
is des
ribed by the Bogoliubov-de Gennes (BdG) equa-
tions [15℄ and take pla
e between time-reversed states.
As shown in [14℄ these states 
an be approximated by HO
wave fun
tions in dilute HO traps as long as the gap does
not ex
eed the os
illator energy, ∆<∼~ω. Solving BdG for
su
h �nite systems is numeri
ally 
ompli
ated and we
shall therefore apply further simplifying approximations,
namely that the pairing gap ∆nl and the wavefun
tion
overlap matrix elements vary slowly with level l in a shell
n. Both approximations are fair for the trapped atoms
as argued in [11℄ and deviations 
an be understood. As
result we arrive at a mu
h simpli�ed gap equation
∫ ∼2nF
g(E) dE
(E − µ)2 +∆(µ)2
. (22)
Here, the supergap G = 32
2nF |a|~ω/15π2 was 
al-

ulated in [10℄ as the pairing gap when all states in
a shell 
an pair; this is the 
ase for a region of in-
tera
tion strengths and parti
le number where the gap
is large as 
ompared to the level splitting, yet small
0 10 20 30 40 50 60
=(3N)1/3
38 40 42 44
Figure 2: (
olor online) Multi-shell pairing gaps for a HO trap
with an additional quarti
 term in the potential with ε =
2/402, i.e. for the level density of Fig. 1 (a) with supernodes
at nF ≃ 40, 40
2 ≈ 57, etc. The intera
tion strength is a =
−0.05 (top red 
urve), a = −0.03 (middle blue 
urve, with the
inset �gure around the �rst supernode) and a = −0.01 (lower
green 
urve). In the inset plot it is 
learly seen that the lo
al
minima for l ∼ nF and l ∼ 0 before the supernode turns
into lo
al maxima after the supernode, as a 
onsequen
e of
overlapping shells. The dashed (red) line is the multi-shell gap
∆ = G/(1−2G ln(nF )/~ω) for a = −0.05 and the upper/lower
thin solid line (bla
k) are the single mid-/end-shell pairing for
a = −0.01 (see text).

ompared to the shell splitting ~ω. ∆(µ) = ∆nl is
the gap at the Fermi surfa
e. g(E) = n2F /D is
the level density within ea
h bandgap D around ev-
ery shell n = 0, 1, ...,∼2nF but vanishes between
the bandgaps. The gap equation thus redu
es to
1 = (G/D)
∑∼2nF
(E + n~ω − µ)2 +∆2. The

hemi
al potential µ 
an be determined from the level
spe
trum; as we gradually �ll parti
les into the shell nF
at the Fermi surfa
e, µ in
reases from nF~ω to nF~ω+D.
The 
ut-o� n<∼2nF in the sum of the gap equation models
as a �rst approximation the more rigorous regularization
pro
edure des
ribed in Ref. [16℄ that is required for a
delta-fun
tion pseudo-potential.
By solving this simpli�ed gap equation of Eq. (22),
we �nd that it still 
ontains and displays the essential
interplay between the variation in level density and pair-
ing. To illustrate the super-shell stru
ture in pairing,
we take the strongly anharmoni
 trap potential used for
the level spe
tra in Fig. 1 (a), and 
al
ulate the pairing
arising from a weak attra
tive s
attering length a < 0.
For su�
iently weak intera
tions su
h that pairing only
takes pla
e in the shell at the Fermi surfa
e, we obtain
the expe
ted result from the gap equation: ∆ = G when
D ≪ ∆, whereas for D ≫ ∆ we get ∆ = D exp(−D/2G)
midshell (µ = nF~ω + D/2) and ∆ = 2D exp(−D/G)
endshell (µ = nF~ω or µ = nF~ω +D). Pairing is thus
stronger at midshell than at endshell, where there are
fewer states to pair [11℄, and strong shell os
illations fol-
low as shown in Fig. 2. For stronger intera
tions, pairing
also takes pla
e between states in shells around the Fermi
shell and Eq. (22) gives: ∆ = G/ (1− 2G ln (nF ) /~ω) for
small bandwidth [14℄. In Fig. 2 this 
urve is 
ompared
with the �nite bandwidth result, whi
h has strong os
il-
lations ex
ept at the supernodes where the level density
is 
ontinuous. At a supernode D = ~ω and the gap equa-
tion (22) leads to a gap ∆ = 2nF~ω exp(−~ω/2G) [11℄.
In summary, level densities, shell-os
illations and
super-shell stru
tures in anharmoni
 traps 
al
ulated
from numeri
al Hartree-Fo
k and analyti
al periodi
 or-
bit theory as well as WKB were found to mat
h to lead-
ing order. Analogous super-shell stru
tures were found
in pairing from an approximated BdG 
al
ulation. The
mean �eld in atomi
 nu
lei also have a large anharmoni
potential and the HO shells start to overlap (the �rst
supernode) already for heavy nu
lei with nF ∼ 5 − 6.
The interplay of level spe
tra and multishell pairing is,
however, di�
ult to disentangle in nu
lear pairing due to
strong spin-orbit e�e
t and small parti
le number. Ul-
tra
old atomi
 traps, however, provide ideal systems for
observing the ri
h quantum stru
tures su
h as level den-
sities and pairing.
Dis
ussions with Matthias Bra
k on periodi
 orbit the-
ory, Ben Mottelson on (nu
lear) shell theory and pairing,
and proof reading by Joel Corney, are gratefully a
knowl-
edged.
[1℄ J. Bardeen, L. N. Cooper, J. R. S
hrie�er, Phys. Rev.
108, 1175 (1957).
[2℄ A. Bohr and B. R. Mottelson, Nu
lear Stru
ture Vols.
I+II, Benjamin, New York 1969.
[3℄ A. Bohr, B. R. Mottelson, D. Pines, Phys. Rev. 110, 936
(1958).
[4℄ C. J. Pethi
k and H. Smith, Bose-Einstein Condensation
in Dilute Gases, Cambridge Univ. Press 2002.
[5℄ For a �nite number of parti
les the fa
tor ñ = nF + 3/2
in
ludes a 
orre
tion to nF , whi
h has been 
he
ked nu-
meri
ally to improve the TF approximation and slightly

hange the predi
tion of supernodes.
[6℄ M. Bra
k and R. K. Bhaduri, Semi
lassi
al Physi
s, re-
vised edn (Boulder, CO: Westview) (2003).
[7℄ S. C. Creagh, Ann. Phys., NY 248 60 (1996).
[8℄ M. Bra
k et al., J. Phys. A 38, 9941 (2005).
[9℄ M. Ögren, unpublished (2006):
www.magnus.ogren.se/notes/pot/derivationofgpert.pdf
[10℄ H. Heiselberg and B. R. Mottelson, Phys. Rev. Lett. 88,
190401 (2002).
[11℄ H. Heiselberg, Phys. Rev. A 68, 053616 (2003). Note that
the square root of kl was missing in Eq. (6) of this Ref.
as 
ompared to Eq. (11).
[12℄ Y. Yu et al., Phys. Rev A 72, 051602(R) (2005).
[13℄ B. P. van Zyl et al., Phys. Rev. A 67, 023609 (2003).
[14℄ G. M. Bruun and H. Heiselberg, Phys. Rev. A 65, 053407
(2002).
[15℄ P. G. de Gennes, Super
ondu
tivity of Metals and Alloys
(Addison-Wesley, New York, 1989).
[16℄ G. M. Bruun et al., Eur. Phys. J. D9, 433 (1999).
ABSTRACT
  We calculate level densities and pairing gaps for an ultracold dilute gas of
fermionic atoms in harmonic traps under the influence of mean field and
anharmonic quartic trap potentials. Super-shell structures, which were found in
Hartree-Fock calculations, are calculated analytically within periodic orbit
theory as well as from WKB calculations. For attractive interactions, the
underlying level densities are crucial for pairing and super-shell structures
in gaps are predicted.

<|endoftext|><|startoftext|>
Quantum non-local effects with Bose-Einstein condensates
F. Laloë a and W. J. Mullin b
Laboratoire Kastler Brossel, ENS, UPMC, CNRS; 24 rue Lhomond, 75005 Paris, France
Department of Physics, University of Massachusetts, Amherst, Massachusetts 01003 USA
We study theoretically the properties of two Bose-Einstein condensates in different spin states,
represented by a double Fock state. Individual measurements of the spins of the particles are per-
formed in transverse directions, giving access to the relative phase of the condensates. Initially, this
phase is completely undefined, and the first measurements provide random results. But a fixed value
of this phase rapidly emerges under the effect of the successive quantum measurements, giving rise
to a quasi-classical situation where all spins have parallel transverse orientations. If the number of
measurements reaches its maximum (the number of particles), quantum effects show up again, giving
rise to violations of Bell type inequalities. The violation of BCHSH inequalities with an arbitrarily
large number of spins may be comparable (or even equal) to that obtained with two spins.
PACS numbers: 03.65.Ta,03.65.Ud,03.75.Gg,03.75.Mn
The notion of non-locality in quantum mechanics
(QM) takes its roots in a chain of two theorems, the
EPR (Einstein Podolsky Rosen) theorem [1] and its log-
ical continuation, the Bell theorem. The EPR theorem
starts from three assumptions (Einstein realism, locality,
the predictions of quantum mechanics concerning some
perfect correlations are correct) and proves that QM is
incomplete: additional quantities, traditionally named λ,
are necessary to complete the description of physical re-
ality. The Bell theorem [2, 3] then proves that, if λ exists,
the predictions of QM concerning other imperfect corre-
lations cannot always be correct. The ensemble of the
three assumptions: Einstein realism, locality, all predic-
tions of QM are correct, is therefore self-contradictory;
if Einstein realism is valid, QM is non-local. Bohr [4]
rejected Einstein realism because, in his view, the no-
tion of physical reality could not correctly be applied to
microscopic quantum systems, defined independently of
the measurement apparatuses. Indeed, since EPR con-
sider a system of two microscopic particles, which can be
“seen” only with the help of measurement apparatuses,
the notion of their independent physical reality is open
to discussion.
Nevertheless, it has been pointed out recently [5, 6]
that the EPR theorem also applies to macroscopic sys-
tems, namely Bose-Einstein (BE) condensates in two dif-
ferent internal states. The λ introduced by EPR then cor-
responds to the relative phase of the condensates, i.e. to
macroscopic transverse spin orientations, physical quan-
tities at a human scale; it then seems more difficult to
deny the existence of their reality, even in the absence
of measurement devices. This gives even more strength
to the EPR argument and weakens Bohr’s refutation. It
is then natural to ask whether the Bell theorem can be
transposed to this stronger case.
The purpose of this article is to show that it can. We
consider an ensemble of N+ particles in a state defined by
an orbital state u and a spin state +, and N− particles
in the same state with spin orientation −. The whole
system is described quantum mechanically by a double
Fock state, that is, a “double BE condensate”:
| Φ > =
(au,+)
]N+ [
(au,−)
| vac. > (1)
where au,+ and au,− are the destruction operators asso-
ciated with the two populated single-particle states and
|vac. > is the vacuum state. We introduce a sequence
of transverse spin measurements that leads to quantum
predictions violating the so called BCHSH [7, 8] Bell in-
equality. This is reminiscent of the work of Mermin [9],
who finds exponential violations of local realist inequal-
ities with N -particle spin states that are maximally en-
tangled. By contrast, here we consider the simplest way
in which many bosons can be put in two different in-
ternal levels, with a N -particle state containing only the
minimal possible correlations, those due to statistics. We
find violations of inequalities that are the same order of
magnitude as with the usual singlet spin state and may
actually saturate the Cirel’son bound [10].
Double Fock states are experimentally more accessi-
ble and much less sensitive to dissipation and decoher-
ence than maximally entangled states [11]. Considering
a system in a double Fock state, we assume that a se-
ries of rapid spin measurements can be performed and
described by the usual QM postulate of measurement,
without worrying about decoherence between the mea-
surements, thermal effects, etc.
The operators associated with the local density of par-
ticles and spins can be expressed as functions of the
two fields operators Ψ±(r) associated with the two in-
ternal states ± as: n(r) = Ψ†+(r)Ψ+(r) + Ψ
−(r)Ψ−(r),
σz(r) = Ψ
+(r)Ψ+(r)−Ψ
−(r)Ψ−(r), while the spin com-
ponent in the direction of plane xOy making an angle ϕ
with Ox is: σϕ(r) = e
+(r)Ψ−(r)+ e
−(r)Ψ+(r).
Consider now a measurement of this component per-
formed at point r and providing result η = ±1. The
http://arxiv.org/abs/0704.0386v4
corresponding projector is:
Pη=±1(r, ϕ) =
[n(r) + η σϕ(r)] (2)
and, because the measurements are supposed to be per-
formed at different points (ensuring that these projectors
all commute) the probability P(η1, η2, ...ηN ) for a series
of results ηi± 1 for spin measurements at points ri along
directions ϕi can be written as:
< Φ | Pη1(r1, ϕ1)× Pη2(r2, ϕ2)× ....PηN (rN , ϕN ) | Φ >
We now substitute the expression of σϕ(r) into (2) and
(3), exactly as in the calculation of ref. [5], but with one
difference: here we do not assume that the number of
measurements is much smaller than N±, but equal to
its maximum value N = N+ + N−. In the product of
projectors appearing in (3), because all r’s are different,
commutation allows us to push all the field operators to
the right, all their conjugates to the left; one can then
easily see that each Ψ±(r) acting on | Φ > can be re-
placed by u(r) × au,± , and similarly for the Hermitian
conjugates. With our initial state, a non-zero result can
be obtained only if exactly N+ operators au,+ appear in
the term considered, and N− operators au,−; a similar
condition exists for the Hermitian conjugate operators.
To express these conditions, we introduce two additional
variables. As in [5], the first variable λ ensures an equal
number of creation and destruction operators in the in-
ternal states ± through the mathematical identity:
einλ = δn,0 (4)
The second variable Λ expresses in a similar way that the
difference between the number of destruction operators in
states + and − is exactly N+−N−, through the integral:
e−inΛ ei(N+−N−)Λ = δn,N+−N− (5)
The introduction of the corresponding exponentials into
the product of projectors (2) in (3) provides the expres-
sion (c.c. means complex conjugate):
|u(rj)|2
eiΛ + e−iΛ + ηj
ei(λ−ϕj+Λ) + c.c.
where, after integration over λ and Λ, the only surviving
terms are all associated with the same matrix element in
state | Φ > (that of the product of N+ operators a†u,+
and N− operators a
u,− followed by the same sequence
of destruction operators, providing the constant result
N+!N−!). We can thus write the probability as:
P(η1, η2, ...ηN ) ∼
ei(N+−N−)Λ
|u(rj)|2
eiΛ + e−iΛ + ηj
ei(λ−ϕj+Λ) + c.c.
or, by using Λ parity and changing one integration variable (λ′ = λ+ Λ), as:
P(η1, η2, ...ηN ) =
cos [(N+ −N−)Λ]
{cos (Λ) + ηj cos (λ′ − ϕj)} (8)
The normalization coefficient CN is readily obtained by writing that the sum of probabilities of all possible sequences
of η’s is 1 (this step requires discussion; we come back to this point at the end of this article):
cos [(N+ −N−)Λ] [cos (Λ)]N (9)
Finally, we generalize (8) to any number of measurements M < N . A sequence of M measurements can always be
completed by additional N −M measurements, leading to probability (8). We can therefore take the sum of (8) over
all possible results of the additional N −M measurements to obtain the probability for any M as:
P(η1, η2, ...ηM ) =
cos [(N+ −N−)Λ] [cosΛ]N−M
{cos (Λ) + ηj cos (λ′ − ϕj)} (10)
The Λ integral can be replaced by twice the integral between ±π/2 (a change of Λ into π −Λ multiplies the function
by (−1)N+−N−+N−M+M = 1). If M ≪ N , the large power of cosΛ in the first integral concentrates its contribution
around Λ ≃ 0, so that a good approximation is Λ = 0. We then recover the results of refs [5, 6], with a single integral
over λ defining the relative phase of the condensates (Anderson phase), initially completely undetermined, so that
the first spin measurement provides a completely random result. But the phase rapidly emerges under the effect of a
few measurements, and remains constant [12, 13, 14]; it takes a different value for each realization of the experiment,
as if it was revealing the pre-existing value of a classical quantity. Moreover, when cosΛ is replaced by 1, each factor
of the product over j remains positive (or zero), leading to a result similar to that of stochastic local realist theories;
the Bell inequalities can then be obtained. However, when N − M is small or even vanishes, cosΛ can take values
that are smaller than 1 and the factors may become negative, opening the possibility of violations. In a sense, the
additional variable Λ controls the amount of quantum effects in the series of measurements.
We now discuss when these standard QM predictions violate Bell inequalities. We need the value of the quantum
average of the product of results, that is the sum of η1, η2, ...ηM × P(η1, η2, ...ηM ) over all possible values of the η’s,
which according to (10) is given by:
E(ϕ1, ϕ2, ..ϕM ) =
cos [(N+ −N−)Λ] [cosΛ]N−M
Consider a thought experiment where two condensates
in different spin states (two eigenstates of the Oz spin
component) overlap in two remote regions of space A
and B , with two experimentalists Alice and Bob; they
measure the spins of the particles in arbitrary transverse
directions (perpendicular to Oz) at points of space where
the orbital wave functions of the two condensates are
equal. All measurements performed by Alice are made
along a single direction ϕa, which plays here the usual
role of the “setting” a, while all those performed by Bob
are made along angle ϕb. We assume that Alice retains
just the product A of all her measurements, while Bob
retains only the product B of his; A and B are both ±1.
We now assume two possible orientations ϕa and ϕ
for Alice, two possible orientations ϕb and ϕ
b for Bob.
Within deterministic local realism, for each realization of
the experiment, it is possible to define two numbers A,
A′, both equal to ±1, associated with the two possible
products of results η that Alice will observe, depending
of her choice of orientation; the same is obviously true
for Bob, introducing B and B′. Within stochastic local
realism [8, 15], A and A′ are the difference of probabilities
associated with Alice observing +1 or −1, i.e. numbers
that have values between +1 and −1. In both cases, the
following inequalities (BCHSH) are obeyed:
− 2 ≤ AB +AB′ ± (A′B − A′B′) ≤ 2 (12)
In standard quantum mechanics, of course, “unper-
formed experiments have no results” [16], and several of
the numbers appearing in (12) are undefined; only two
of them can be defined after the experiment has been
performed with a given choice of the orientations. Thus,
while one can calculate from (11) the quantum average
value < Q > of the sum of products of results appearing
in (12), there is no special reason why < Q > should be
limited between +2 and −2. Situations where the limit
is exceeded are called “quantum non-local”.
We have seen that the most interesting situations oc-
cur when the cosines do not introduce their peaking effect
around Λ = 0, i.e. when N+ = N− and M has its maxi-
mum value N . Then, for a given N , the only remaining
choice is how the number of measurements is shared be-
tween Na measurements for Alice and Nb for Bob.
Assume first that Na = 1 (Alice makes one measure-
ment) and therefore Nb = N − 1 (Bob makes all the oth-
ers). Since we assume that N+ = N− and M = N , the
Λ integral in (11) disappears, and the λ integral contains
only the product of cos (λ′ − ϕa) by the (N − 1)th power
of cos (λ′ − ϕb), which is straightforward and provides
cos (ϕa − ϕb) times the normalization integral CN . The
quantum average associated with the product AB is thus
merely equal to cos (ϕa − ϕb), exactly as the usual case
of two spins in a singlet state. Then it is well-known that,
when the angles form a “fan” [17] spaced by χ = π/4,
a strong violation of (12) occurs, by a factor
2, sat-
urating the Cirel’son bound [10]. A similar calculation
can be performed when Alice makes 2 measurements and
Bob N − 2, and shows that the quantum average is
now equal to 1
1 + 1
+ (1 − 1
) cos 2 (ϕa − ϕb)
no longer independent of N. If N = 4, the maximum
of < Q > is 2.28 < 2
2, and rises to 2.41 as N → ∞.
An expression for the generalization of the quantum av-
erage to any number P and N − P of measurements by
Alice and Bob, respectively, is (with χ = ϕa − ϕb):
E(χ) =
{P/2}
P !(N − 2k)!
k!(P − 2k)!(N
− k)!
sin2k χ cosP−2k χ
where {P/2} is the integer part of P/2. The maximum of
< Q > can then be found using a numerical Mathematica
routine. Results are shown for several values of P in Fig.
1. The angles maximizing the quantum Bell quantity
always occur in the fan shape, although the basic angle
χ changes with P and N. All of the curves where P is
held fixed have a finite < Q > limit with increasing N ,
and the optimum values of the angles approach constants.
For the curve P = N/2, the limit is 2.32 when N → ∞,
and the fan opening decreases as 1/
10080604020
       P
FIG. 1: The maximum of the quantum average < Q > for
Alice doing P experiments and Bob N − P , as a function of
the total number of particles N . The usual Bell situation is
obtained for N = 2, P = 1. Local realist theories predict an
upper limit of 2; large violations of this limit are obtained,
even with macroscopic systems (N → ∞). If P = 1, the
violation saturates the Cirel’son limit for any N .
We can also study cases where the number of measure-
ments is M < N : if Bob makes all his measurements,
but ignores one or two of them (independently of the or-
der of the measurements), when he correlates his results
with Alice, the BCHSH inequality is never violated. All
measurements have to be taken into account to obtain
violations. Furthermore, if the number of particles in the
two condensates are not equal, no violation occurs either.
Finally, it is possible to consider cases where we gener-
alize the angles considered: experimenter Carole makes
measurements at ϕc and ϕ
c, and David at ϕd and ϕ
We then find that a maximization of < Q > reduces to
the cases already studied, where the new angles collapse
to the previous angles ϕa, · · · , ϕ′b.
For the sake of simplicity, we have not yet discussed
some important issues that underlie our calculations. One
is related to the so called “sample bias loophole” (or
“detection/efficiency loophole”) and to the normalization
condition (9), which assumes that one spin is detected at
each point of measurement. A more detailed study (see
second ref. [5]) should include the integration of each
r in a small detection volume and the possibility that
no particle is detected in it. This is a well-known dif-
ficulty, which already appears in the usual two-photon
experiments [8], where most photons are missed by the
detectors. If this loophole still raises a real experimen-
tal challenge, the difficulty can be resolved in theory by
assuming the presence of additional spin-independent de-
tectors [2, 8], which ensure the detection of one particle
in each detector and create appropriate initial conditions
(see for instance [18] for a description of an experiment
with veto detectors). We postpone this discussion to an-
other article [19]. A second issue deals with the definition
of the local realist quantities A, B, etc. For two conden-
sates, we have a slightly different situation than in the
usual EPR situation: the local realist reasoning leads to
the existence of a well-defined phase λ between the con-
densates [5], not necessarily to deterministic properties
of the individual particles. Fortunately, Bell inequalities
can also be derived within stochastic local realist theories
[3, 8] (see also for instance [9] or appendix I of [15]), and
this difference is not a problem [19].
In conclusion, strong violations of local realism may
occur for large quantum systems, even if the state is a
simple double Fock state with equal populations; within
present experimental techniques, this seems reachable
with N ∼ 10 or 20. We have assumed that the mea-
sured quantity is the product of many microscopic mea-
surements, not their sum, which would be macroscopic; a
product of results remains sensitive to the last measure-
ment, even after a long sequence of others. Curiously, for
very few measurements only the results are quantum, for
many measurements they can be interpreted in terms of
a classical phase, but become again strongly quantum
when the maximum number of measurements is reached,
a sort of revival of quantum-ness of the system.
Laboratoire Kastler Brossel is “UMR 8552 du CNRS,
de l’ENS, et de l’Université Pierre et Marie Curie”.
[1] A. Einstein, B. Podolsky and N. Rosen, Phys. Rev. 47,
777 (1935).
[2] J.S. Bell, Physics 1, 195 (1964), reprinted in [3].
[3] J.S. Bell, “Speakable and unspeakable in quantum me-
chanics”, Cambridge University Press (1987).
[4] N. Bohr, Phys. Rev. 48, 696 (1935).
[5] F. Laloë, Europ. Phys. J. D, 33, 87 (2005); see also
cond-mat/0611043.
[6] W.J. Mullin, R. Krotkov and F. Laloë, Phys. Rev. A74,
023610 (2006).
[7] J.F. Clauser, M.A. Horne, A. Shimony and R.A. Holt,
Phys. Rev. Lett. 23, 880 (1969).
[8] J.F. Clauser and A. Shimony, Rep. on Progress in Phys.
41, 1883 (1978).
[9] N.D. Mermin, Phys. Rev. Lett. 65, 1838 (1990).
[10] B.S. Cirel’son, Letters in math. phys. 4, 93 (1980).
[11] J.A. Dunningham, K. Burnett and S.M. Barnett, Phys.
Rev. Lett. 89, 150401 (2002).
[12] J. Javanainen and Sung Mi Yoo, Phys. Rev. Lett. 76,
161 (1996).
[13] Y. Castin and J. Dalibard, Phys. Rev. A55, 4330 (1997).
[14] I. Cirac, C. Gardiner, M. Naraschewski and P. Zoller,
Phys. Rev. A54, R3714 (1996) and references in [6]
[15] F. Laloë, Am. J. Phys. 69, 655 (2001).
[16] A. Peres, Am. J. Phys. 46, 745 (1978).
[17] The term “fan” refers to the angles arranged as ϕba =
ϕa′b = ϕab′ and ϕa′b′ = 3χ where ϕab ≡ ϕa − ϕb.
[18] J.S. Bell, Comments on at. and mol. phys. 9, 121 (1979);
reprinted in [3].
[19] W.J. Mullin and F. Laloë, to be published
http://arxiv.org/abs/cond-mat/0611043
ABSTRACT
  We study theoretically the properties of two Bose-Einstein condensates in
different spin states, represented by a double Fock state. Individual
measurements of the spins of the particles are performed in transverse
directions, giving access to the relative phase of the condensates. Initially,
this phase is completely undefined, and the first measurements provide random
results. But a fixed value of this phase rapidly emerges under the effect of
the successive quantum measurements, giving rise to a quasi-classical situation
where all spins have parallel transverse orientations. If the number of
measurements reaches its maximum (the number of particles), quantum effects
show up again, giving rise to violations of Bell type inequalities. The
violation of BCHSH inequalities with an arbitrarily large number of spins may
be comparable (or even equal) to that obtained with two spins.

<|endoftext|><|startoftext|>
Introduction
HD141272 is a nearby G8 dwarf with a mass of
0.83+0.07
−0.03M⊙ (Nordström et al. 2004) located in the
constellation Serpens Caput (αJ2000.0 = 15
h 48m 09.4s,
δJ2000.0 = +01
◦ 34′ 18′′). Its proper motion (µα cos δ =
−176.19 ± 1.08mas/yr, µδ = −166.72 ± 1.13mas/yr)
and parallax (π = 46.84± 1.05mas, i.e. 21 pc) are both
well determined by the European astrometry satellite
Hipparcos (Perryman et al. 1997). While Montes et al.
(2001) list HD 141272 as a member of the Local asso-
ciation with an age of ∼ 120Myr (Mart́ın et al. 2001),
Fuhrmann (2004) suggested that this star belongs to
the young Her-Lyr moving group, according to its UV-
velocities. The age of some Her-Lyr members is esti-
mated by Fuhrmann (2004) to approximately 100Myr
(e.g. HR857, HD 82443, HD113449 and HR5829) which
recently reached their main sequence position, while
others seemed to be older than ∼ 200Myr
(Fuhrmann 2004). Also Fuhrmann (2004) argued that
HD141272, with an effective temperature of Teff =
(5270±80)K, an absolute bolometric magnitudeMbol =
(5.54±0.07)mag and metallicity of [Fe/H ] = (−0.08±
0.07) dex appears slightly too bright for its main se-
quence position, indicating that it might be non single
or young.
⋆ Based on observations obtained on La Silla in ESO programs
77.C-0572(A) and Calar Alto project number F06-3.5-016.
⋆⋆ E-mail: eisen@astro.uni-jena.de
On the other hand Gaidos, Henry & Henry (2000)
measured a Fe corrected Li-equivalent width ofW6708 =
3.9 ± 1.9mÅ and a rotational velocity of
v sin i ≈ 4.0 km/s, which might be too small for a
100Myr old star. Furthermore Chen et al. (2005) ob-
served HD141272 using the infrared space telescope
Spitzer and did not find any IR-excess at 24µm and
70µm indicating that HD141272 is not surrounded by
an optically thick disk.
Finally López-Santiago et al. (2006) revised the list
of Her-Lyr members and candidates of Fuhrmann (2004)
and classified HD141272 as an doubtful member, due
to its lithium depletion.
In our program we search for companions to Her-
Lyr members and candidates and first results are pre-
sented here. We found a co-moving companion of
HD 141272 by a combination of archival first epoch im-
ages and recent observations. We present our imaging,
the astrometric data and reduction techniques in sec-
tion 2 and 3, followed by a description of the spectro-
scopic and photometric analysis of the new companion
in section 4. The results are discussed in section 5.
2 Archival first epoch data
Astrometry is an effective method to find companions
of stars, by comparing two images taken with suffi-
ciently long epoch difference. In order to find late-type
stellar and substellar objects, we concentrate our search
c© Year of publication WILEY-VCH Verlag GmbH&Co. KGaA, Weinheim
http://arxiv.org/abs/0704.0387v1
Astron. Nachr. / AN (Year of publication) 1
Fig. 1 POSS-I E image of HD141272 from 17 June
1950. The star is located at αJ2000.0 = 15
h 48m 09.4s,
δJ2000.0 = +01
◦ 34′ 18′′. A faint object is located in the
north of HD141272, which is hardly recognizable due
to the diffraction spikes of the primary star induced by
saturation. With a pixel size of ∼ 10microns the pixel
scale of the plate is ∼ 6.72 arcsec/pixel.
on companions of young stars. Young objects are still
in contraction and are brighter than older objects of
the same mass hence, low mass objects are easier to
detect.
We found HD141272 in three epochs of the Super-
COSMOS-Sky-Survey, namely a POSS-I (Palomar Ob-
servatory Sky Survey) plate from 1950, as well as in
UKST (United Kingdom Schmidt Telescope) infrared
and red observations from 1981 and 1992. On all three
plates we detected by eye inspection a faint object,
located approximately 18 arcsec north of HD141272,
which was not detected by the SuperCOSMOS ma-
chine due to its small angular separation to the much
brighter star and due to its overlap with the diffraction
spike (Fig. 1).
The diffraction spike of HD141272 intersects the
northern object on all three plates hence, the detec-
tion of this object would be inaccurate by means of
most common detection techniques. Nevertheless, we
obtained a position measurement of the companion can-
didate on the POSS-I plate, using the Source Extractor
package (Bertin & Arnouts 1996), included in the Star-
link application GAIA (Gray et al. 2004). The source
extractor uses thresholding and deblending of point-
spread functions hence the method is more accurate
than other detection techniques (e.g. Gaussian fitting)
under the circumstances in Fig. 1. However, an system-
atical error is possible, due to the perturbation of the
primary’s spike. This error is larger in right accession
than in declination and would affect the measurement
of the position angle rather than the separation (see
section 3, Fig. 4), due to the orientation of the system
(Fig. 1 and 3).
Due to its brightness HD141272 saturates the POSS-
I plate. Furthermore the PSF (point spread function)
is contaminated by the stray light of the companion
candidate hence, position measurement via PSF cen-
tering does not work sufficiently. We used the diffrac-
tion spikes of the saturated primary to determine its
position, since they are unaffected by the companion.
We determined the intensity center of a spike taking
∼ 30 measurements for each spike using the data re-
duction and analysis package ESO-MIDAS. The appli-
cation of a linear regression gives the position of the
star as intersection of the two spikes and leads to very
small astrometric uncertainties (∆αH = 0.047 arcsec
and ∆δH = 0.050 arcsec).
In addition to the detection on the POSS-I plate
HD 141272 and its companion-candidate are also de-
tected in 2MASS images from observing epoch 2000.
The 2MASS point source catalog (Cutri et al. 2003)
lists the position of both objects with accurate astro-
metric precision, see Tbl. 1.
Equipped with these data we determined the proper
motion of all stars in a 15 arcmin box around HD141272
which are detected at the POSS-I plate and listed in
the 2MASS point source catalog (see Fig. 2). We de-
rived the proper motion of all stars in the field by
comparing the positions of all detected objects. The
majority of sources only shows small proper motion
following a normal distribution, since these stars are
most probably at high distances. Using the Lilliefors
test for normal distribution we derived the subsample
of stars belonging to the background stars, since their
proper motion follows a normal distribution (non mov-
ing background stars). The standard deviation of the
background stars gives the statistically derived proper
motion error (σp.m., α = 8.8mas/yr, σp.m., δ = 6.8
mas/yr). Objects not belonging to the background stars
are considered as companion candidates, if they are ly-
ing within a 5-σ vicinity of HD141272 (ellipse in Fig. 2).
Other objects are omitted, since these are either false
detections or high-proper-motion stars moving in other
directions.
The proper motion of the nearby star HD141272
is clearly separated from the background stars. The
companion candidate clearly shares the proper motion
of HD 141272 and will be denoted HD141272B, here-
after. Fig. 2 shows with high confidence (∼ 13σ) that
HD141272A and B are co-moving over roughly 50 years.
Due the above discussed astrometric uncertainties of
HD 141272B this analysis gives a first indication of a
new nearby young double star system
www.an-journal.org c© Year of publication WILEY-VCH Verlag GmbH&Co. KGaA, Weinheim
2 T. Eisenbeiss et al.: Low mass visual binaries in the solar neighbourhood: The case of HD141272
−300−250−200−150−100−50050
∆ Ra/Year [mas/yr]
standard error 
search radius 
HD141272 
companion 
candidate 
non moving background stars 
Fig. 2 Proper motion plot of HD141272 (cross) and
its companion candidate (circle) and non moving back-
ground stars (upper left). X- and Y-axis show the
change of the positions (in mas/yr). The plot is based
on POSS-I Schmidt plate (17 June 1950) and 2MASS
catalog data (29 April 2000). Error estimates are taken
as 2-σ errors from the background stars. Data points
lying outside the background stars and outside a 5-σ
vicinity of HD141272 (large ellipse) are omitted, since
these are either false detections or high proper motions
stars moving in other directions. The statistical error
of all data points is shown by the thick error cross in
the lower left. The diagram shows the common proper
motion of HD141272 and its new companion with a
confidence of ∼ 13σ.
Moreover, we used the non-moving background stars
to estimate the positional error of the detections in the
POSS-I plate. The mean of the distribution shows the
systematic error of the POSS-I measurements (∆sys, α =
−4.5mas/yr and ∆sys, δ = −4.9mas/yr as offset to
(0, 0). The whole set of data points in Fig. 2 is shifted
by that offset to correct for calibration errors between
POSS-I and 2MASS data. The standard deviation shows
the statistical measurement error (∆stat = σp.m.) hence,
can be applied as standard detection error. The total
detection error derived for the POSS-I plate is ∆α =
0.29 arcsec and ∆δ = 0.25 arcsec. The additional sys-
tematic error for the companion candidate due to the
diffraction spike of HD141272 is not included in this
error analysis.
Fig. 3 H-band image of HD 141272 and its compan-
ion candidate taken with the near infrared camera Ω-
Cass at the 3.5m telescope of the Calar Alto obser-
vatory in Spain. The separation between HD141272
and its companion candidate is ∼ 17.8 arcsec at a
position angle of ∼ 352.62◦ with a pixel scale of ∼
0.2 arcsec/pixel. Note that HD141272 is slightly satu-
rated.
3 Follow-up observations
In order to get a third epoch on our astrometric re-
sult and to detect or rule out further companions we
observed HD141272 again in April 2006 (Fig. 3). We
carried out H-band as well as narrow-band observations
(1.644µm) with the near infrared camera Ω-Cass, in-
stalled at the Cassegrain focus of the 3.5m telescope of
the Calar Alto observatory in Spain. Ω-Cass is equipped
with a 1024 × 1024 HgTeCd-detector with a pixel scale
of ∼0.2 arcsec per pixel. We always used the short-
est possible detector integration time (0.84 s) to limit
strong saturation effects due to the bright star. For
background subtraction we applied the standard jitter
technique and chose 12 jitter-positions. On each jitter
position 49 integrations (0.84 s) were co-added, yield-
ing a total integration time in the H-band of 8.2min.
All images were flatfielded with a skyflat image taken
during twilight. The whole data reduction (background
subtraction, flatfielding, and shift+add) was carried out
with the ESO data-reduction package Eclipse (Devil-
lard 2001).
We calibrated our Ω-Cass image for relative astrom-
etry, using the well known binary systems HIP 63322
and HIP 82817, which we observed during the same
night and with the same instrumentation as our sci-
ence image. Using the Hipparcos astrometry (Perry-
c© Year of publication WILEY-VCH Verlag GmbH&Co.KGaA, Weinheim www.an-journal.org
Astron. Nachr. / AN (Year of publication) 3
Table 1 Separation and position angle of the co-moving companion HD141272B relative to its primary
HD141272A for all observing epochs. We also show the expected change of separation and position angle in
case that the companion is a non-moving background source, derived with the well known proper and parallactic
motion of the primary.
epoch telescope/ pixel scale band sepobs. sepifback PAobs. PAifback
[dd/mm/yyy] catalogs [arcsec] [arcsec] [arcsec] [◦] [◦]
17/07/1950 POSS-I 1.0 E (6442Å) 17.85±0.31 − 353.6±1.1 −
29/04/2000 2MASS 0.7 JHKS 17.83±0.150 26.92±0.33 352.42±0.48 14.61±0.75
20/04/2006 3.5m CA 0.2 H 17.851±0.041 28.12±0.31 352.62±0.18 16.48±0.68
man et al. 1997) and considering the maximal orbital
motion of the calibration binaries we estimated the
pixel scale (192 ± 0.43mas/pixel) and the orientation
(−1.86±0.18 ◦) of the Ω-Cass images. This yields to the
relative astrometric parameters of the system (Tbl. 1).
For the detection of both objects we used the Gaussian
centroiding technique, implemented in ESO-MIDAS.
Further co-moving companions could be ruled out
around HD141272 within an angular separation of ∼ 5
to 73 arcsec (1500AU of projected separation) with H-
band magnitudes down to 18.3mag (S/N= 3).
HD 141272A and B are separated by ∼ 17.8 arcsec
(Fig. 3), hence the projected separation of the system
is approximately 380AU and its orbital period can be
estimated with Kepler’s third law to be roughly 7000
years (we use 0.83M⊙ for HD 141272A and 0.26M⊙
for B). During 56 years of epoch difference between
the POSS-I and our H-band observation, this yields
maximal orbital motion as large as ∼0.5 arcsec in sep-
aration (edge-on orbit assumed) or ∼3◦ in position an-
gle (face-on orbit assumed). Therefore, we derived the
separation and the position angle of the companion for
all three observing epochs which are summarized in
Tab. 1. These results are also visualized in Fig. 4. Note
that absolute calibrated astrometric data, derived for
the POSS-I image as described in section 2, as well as
catalog data from the 2MASS catalog is used in Fig. 4,
while the third epoch data is based on relative astrom-
etry, hence the uncertainties of that data point are sig-
nificantly smaller.
While the separation between HD141272A and B
did not change during 56 years, we found a slight de-
crease of its position angle. This effect is most likely
due to the perturbation of the companions PSF by the
diffraction spike of the primary (see section 2 and Fig.
1). Nevertheless Fig. 4 ensures the companionship of
HD 141272B, since all data points are lying within the
given error bars of the first epoch.
4 Photometry and Spectroscopy
The infrared colors of both components of the new bi-
nary system HD141272AB are listed in the 2MASS
point source catalog, i.e. accurate J, H, and KS band
2.43 2.44 2.45 2.46
JD−2400000.5
2.43 2.44 2.45 2.46
JD−2400000.5
Fig. 4 Separation (sep) and position angle (PA) for
HD 141272A and B from 1950 to 2006 (three data
points). Upper lines show the changes of the proper-
ties under the assumption HD141272B was a back-
ground star (including parallactic motion of A) while
the straight, opening lines give the range of the bi-
nary movement, considering maximal orbital move-
ment. While the separation stays approximately con-
stant there is a change in the position angle, caused
by the perturbation of the companions PSF due to the
diffraction spikes of the primary.
Table 2 2MASS Photometry of HD141272A and B
Comp. J H KS
[mag] [mag] [mag]
A 5.991±0.021 5.610±0.027 5.501±0.018
B 9.298±0.020 8.725±0.055 8.456±0.023
photometry is available for the primary and its co-
moving companion, which is summarized in Tab. 2. Ad-
ditionally the I-band magnitude of both components
(mI = 8.59 ± 0.02mag for A and mI = 10.572 ±
0.02mag for B) is measured in the second release of the
DENIS database, while the accuracy for HD141272A
is limited due to saturation effects, hence the given er-
ror is probably underestimated.
In order to obtain also unsaturated images of the
primary we observed the binary system with Ω-Cass
www.an-journal.org c© Year of publication WILEY-VCH Verlag GmbH&Co. KGaA, Weinheim
4 T. Eisenbeiss et al.: Low mass visual binaries in the solar neighbourhood: The case of HD141272
in the FeII (1.644µm) narrow-band filter. Thereby, we
used again the 12 point jitter pattern but co-added 15
integrations (4 s) per jitter position, yielding a total in-
tegration time of 12min. The bright primary as well as
its fainter co-moving companion are both well detected
in this narrow-band image and their fluxes did not ex-
ceed the linearity level of the Ω-Cass detector. Hence,
we could use this image to derive the magnitude differ-
ence between the primary star and its companion and
obtained ∆HFeII = 3.166± 0.005mag, fully consistent
with the magnitude difference derived from the 2MASS
data in H-band (∆H = 3.115± 0.061mag.)
Furthermore we acquired a low-resolution optical
spectrum with EMMI at the NTT on La Silla to de-
termine the spectral type of HD141272B and prove its
common distance with HD141272A. The spectrum was
taken in RILD and REMD mode covering a wavelength
of 400-900nm with a resolution of R ≈ 3000 at 600nm.
The data reduction followed the standard procedure for
low-resolution optical spectra: After bias subtraction,
flat fielding and wavelength calibration with a HeAr arc
spectrum we corrected for the instrumental response
and for telluric features using a spectrum of HR5501
taken at the same airmass as HD141272B.
We determined the spectral type by comparing our
spectrum with a standard sequence of M dwarfs in the
same spectral range and with comparable spectral res-
olution (Bochanski et al. 2006), see Fig. 5. The best
fit resulted in a spectral type of M3.25 ± 0.25 which
is consistent with a spectral type of M3.0± 0.5 deter-
mined from the TiO5 spectral index of 0.49 following
Cruz & Reid (2002).
Adopting the latter spectral type as final we derived
a spectrophotometric distance of 24.4±4.2 pc from the
MJ relation given in Cruz & Reid (2002) and the J
magnitude from 2MASS, assuming that the companion
is on the Main sequence. The determined distance is in
excellent agreement with the HIPPARCOS measured
distance of 21.35±0.48 pc for HD 141272A, confirming
their common distance. Hence, we call the companion
HD141272B.
5 Conclusions
With the astrometric data reduction and analysis tech-
niques presented in this work, we could verify the com-
mon proper motion of both components of the binary
system HD141272AB during 56 years of epoch differ-
ence between the first successful observation of this sys-
tem on the POSS-I plates taken in July 1950 and our
H-band imaging obtained with Ω-Cass in April 2006.
Furthermore we obtained an optical spectrum of the
companion and derived its spectral type to range be-
tween M2.5V and M3.5V. The infrared apparent mag-
nitudes of the co-moving companion are fully consis-
tent with a M3 dwarf which is located at the distance
400 500 600 700 800 900
HD 141272 B
Wavelength [nm]
Fig. 5 Relative flux of the spectral sequence from
M1 to M5 (Bochanski et al. 2006) in comparison to the
EMMI spectrum of HD141272B, ranging from 400 to
900 nm. The resolutions are comparable (R ∼ 3000 for
the EMMI spectrum and R ∼ 6000 for the standard
spectra at 600 nm). HD 141272B shows good agree-
ment with an M3 star.
of HD141272A which finally confirms the companion-
ship of this new binary system. The companion is an
addition to the Catalog of Nearby Stars within 25 pc
(Gliese & Jahreiß 1991).
In order to get an estimation of the system age we
compared the infrared photometry of HD 141272A and
B with ≈ 1300 members of the Pleiades cluster which
are listed in the WEBDA database (Mermilliod 1998).
All objects are plotted in a J-K vs. MH color-magnitude
diagram (Fig. 6). The colors of all objects are obtained
from the 2MASS catalog and we derived the absolute
H-Band magnitudes of all comparison stars using their
2MASS H-band photometry and a mean distance mod-
ule of the Pleiades of 5.97mag (WEBDA database).
The expected distance uncertainty of the cluster mem-
bers which results in an uncertainty of their absolute H-
band magnitudes was approximated with the angular
diameter of the Pleiades cluster on the sky, assuming a
similar extension of the cluster also in the radial direc-
tion. The absolute H-band magnitudes of HD141272A
and B are derived with 2MASS photometry and the
Hipparcos parallax of the binary system. Compared to
the Pleiades of the same J-K color HD141272A and
B appear a little fainter, indicating that the system is
already on the ZAMS, which is similar to the results of
earlier works (Gaidos 1998; Wright et al. 2004).
If we assume that both components of the binary
system have already reached the ZAMS we can deter-
mine the mass of the secondary using equation (11)
from Kirkpatrick & McCarthy (1994) with the given
c© Year of publication WILEY-VCH Verlag GmbH&Co.KGaA, Weinheim www.an-journal.org
Astron. Nachr. / AN (Year of publication) 5
−0.2 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8
J−K [mag]
0.75 0.8 0.85 0.9 0.95 1
J−K [mag]
HD 141272 A 
HD 141272 B 
Fig. 6 J-K vs. MH diagram for the Pleiades and
HD141272A and B (rectangles symbolize the error
boxes). The inserted plot shows HD141272B and the
surrounding Pleiades stars drawn to a larger scale. The
main sequence of the cluster can be seen although there
are some outliers due to the mean distance module
(5.97mag for Pleiades) applied. The mean error of the
Pleiades is shown by the error cross in the lower left.
HD 141272A and B appear a little fainter than Pleiades
stars of the same J-K color. This indicates, that the
system already reached the ZAMS.
errors for the constants a and b and the range of the
spectral type. We derived a mass of
M∗ = 0.26
+0.07
−0.06M⊙.
Future work should ascertain the age of the system
and derive more properties of the M dwarf, which en-
larges the list of nearby low mass stars bound in binary
systems.
Acknowledgements. We would like to thank the technical
staff of the ESO NTT at La Silla as well as of the Centro
Astronómico Hispano Alemán (CAHA) at Calar Alto for all
their help and assistance in carrying out the observations.
In addition we would like to thank John Bochanski, An-
drew West, Suzanne Hawley and Kevin Covey for providing
the electronic sequence of M-stars composite spectra.
T.O.B. Schmidt acknowledges support from a Thur-
ingian State Scholarship and from a Scholarship of the Evan-
gelisches Studienwerk e.V. Villigst.
This publication makes use of data products from the
Two Micron All Sky Survey, which is a joint project of
the University of Massachusetts and the Infrared Process-
ing and Analysis Center/California Institute of Technology,
funded by the National Aeronautics and Space Administra-
tion and the National Science Foundation.
We use imaging data from the SuperCOSMOS Sky Sur-
vey, prepared and hosted by the Wide Field Astronomy
Unit, Institute for Astronomy, University of Edinburgh,
which is funded by the UK Particle Physics and Astron-
omy Research Council.
This research has made use of the VizieR catalogue
access tool and the Simbad database, both operated at
the Observatoire Strasbourg, as well as of the WEBDA
database, operated at the Institute for Astronomy of the
University of Vienna.
The DENIS project has been partly funded by the SCI-
ENCE and the HCM plans of the European Commission
under grants CT920791 and CT940627. It is supported by
INSU, MEN and CNRS in France, by the State of Baden-
Württemberg in Germany, by DGICYT in Spain, by CNR
in Italy, by FFwFBWF in Austria, by FAPESP in Brazil,
by OTKA grants F-4239 and F-013990 in Hungary, and by
the ESO C&EE grant A-04-046.
Jean Claude Renault from IAP was the Project man-
ager. Observations were carried out thanks to the contri-
bution of numerous students and young scientists from all
involved institutes, under the supervision of P. Fouqué, sur-
vey astronomer resident in Chile.
References
Bertin, E., Arnouts, S.: 1996, A&AS 117, 393
Bochanski, J.J., West, A.A., Hawley, et al.: 2007, AJ 133,
Chen, C.H., Patten, B.M., Werner, M.W., et al.: 2005,
ApJ 634, 1372
Cruz, K.L., Reid, I.N.: 2002, AJ 123, 2828
Cutri, R.M. Skrutskie, M.F., van Dyk, S., et al.: 2003,
2MASS All Sky Catalog of point sources. (The IRSA
2MASS All-Sky Point Source Catalog, NASA/IPAC In-
frared Science Archive. http://irsa.ipac.caltech.edu)
Devillard, N.: 2001, in ASP Conf. Ser. 238: Astronomical
Data Analysis Software and Systems X, ed. F. R. Harn-
den, Jr., F. A. Primini, & H. E. Payne, 525–+
Fuhrmann, K.: 2004, AN 325,3
Gaidos, E.J.: 1998, PASP 110, 1259
Gaidos, E.J. Henry, G.W., Henry, S.M.: 2000, AJ 120, 1006
Gliese, W., Jahreisß, H.: 1991, Preliminary Version of the
Third Catalogue of Nearby Stars, Tech. rep.
Gray, N., Jenness, T. Allan, A., et al.: 2005, in Astronomical
Society of the Pacific Conference Series, ed. P. Shopbell,
M. Britton, & R. Ebert, 119–+
Kirkpatrick, J.D., Mc Carthy, Jr., D.W.: 1994, AJ 107,333
Lilliefors, H.W.: 1967, Journal of the American Statistical
Association 62, 399
López-Santiago, J., Montes, D., Crespo-Chacón, I. et al.:
2006, ApJ 643, 1160
Mart́ın, E. L., Dahm, S., Pavlenko, Y.: 2001, ASP Conf.
Ser. 245: Astrophysical Ages and Times Scales, 349
Mermilliod, J.-C.: ed. 1998, WEB Acces to the Open Clus-
ter Database
Montes, D., López-Santiago, J., Gálvez, M.C., et al.: 2001,
MNRAS 328, 45
Nordström, B., Mayor, M., Andresen, J., et al.: 2004,
A&A 418, 989
Perryman, M.A.C., Lindgren, L., Kovalevsky, J., et al.:
1997, A&A 323, L49
Wright, J.T., Marcy, G.W., Butler, R.P., Vogt, S.S.: 2004,
ApJS 152, 261
www.an-journal.org c© Year of publication WILEY-VCH Verlag GmbH&Co. KGaA, Weinheim
http://irsa.ipac.caltech.edu
	Introduction
	Archival first epoch data
	Follow-up observations
	Photometry and Spectroscopy
	Conclusions
ABSTRACT
  We search for stellar and substellar companions of young nearby stars to
investigate stellar multiplicity and formation of stellar and substellar
companions. We detect common proper-motion companions of stars via multi-epoch
imaging. Their companionship is finally confirmed with photometry and
spectroscopy. Here we report the discovery of a new co-moving (13 sigma)
stellar companion ~17.8 arcsec (350 AU in projected separation) north of the
nearby star HD141272 (21 pc). With EMMI/NTT optical spectroscopy we determined
the spectral type of the companion to be M3+-0.5V. The derived spectral type as
well as the near infrared photometry of the companion are both fully consistent
with a 0.26+-0.07 Msol dwarf located at the distance of HD141272 (21 pc).
Furthermore the photometry data rules out the pre-main sequence status, since
the system is consistent with the ZAMS of the Pleiades.

<|endoftext|><|startoftext|>
Introduction
The results of solar [1,2,3,4,5,6], atmospheric [7,8], reactor [9,10,11,12] and accelera-
tor [13,14,15] neutrino experiments show that flavour mixing occurs not only in the
hadronic sector, as it has been known for long, but in the leptonic sector as well. The
full understanding of the leptonic mixing matrix constitutes, together with the dis-
crimination of the Dirac/Majorana character of neutrinos and with the measurement
of their absolute mass scale, the main goal of neutrino physics for the next decade.
The experimental results point to two very distinct mass-squared differences, ∆m2
7.9 × 10−5 eV2 and |∆m2
| ≈ 2.4 × 10−3 eV2. On the other hand, only two out of
the four parameters of the three-family leptonic mixing matrix UPMNS [16,17,18,19] are
known: θ12 ≈ 34
◦ and θ23 ≈ 43
◦ [20]. The other two parameters, θ13 and δ, are still
unknown: for the mixing angle θ13 direct searches at reactors [9,10,11] and three-family
global analysis of the experimental data give the upper bound θ13 ≤ 11.5
◦, whereas for
the leptonic CP-violating phase δ we have no information whatsoever (see, however,
Ref. [20]).
The LSND data [21,22,23], on the other hand, would indicate a ν̄µ → ν̄e oscillation with
a third neutrino mass-squared difference: ∆m2
∼ 0.3 − 6 eV2, about two orders of
magnitude larger than ∆m2
. Given the strong hierarchy among the solar, atmospheric
and LSND mass-squared splittings, ∆m2
≪ ∆m2
≪ ∆m2
, it is not possible to
explain all these data with just three massive neutrinos, as it has been shown with
detailed calculations in Ref. [24]. A necessary condition to explain the whole ensemble
of data in terms of neutrino oscillations is therefore the introduction of at least a
fourth light neutrino state. This new light neutrino must be an electroweak singlet [18]
in order to comply with the strong bounds on the Z0 invisible decay width [25,26]. For
this reason, the LSND signal has often been considered as an evidence of the existence
of a sterile neutrino.
In recent years, global analyses of solar, atmospheric, short-baseline [27,28,29,30] exper-
iments and LSND data have been performed to establish whether four-neutrino models
can really reconcile the data and solve the puzzle [31,32,33,34,35,36,37,38]. The point
is that providing a suitable mass-squared difference to each class of experiments is not
enough: it is also necessary to show that the intrinsic structure of the neutrino mixing
matrix is compatible with all the data. This turned out to be very hard to accomplish.
In Ref. [39] it was shown that four-neutrino models were only marginally allowed, with
best fit around ∆m2
≃ 1 eV2 and sin2 2θLSND ≃ 10
−3. Generically speaking, the
global analysis indicated that a single sterile neutrino state was not enough to reconcile
LSND with the other experiments. For this reason, to improve the statistical compat-
ibility between the LSND results and the rest of the oscillation data, models with
two sterile neutrino states have been tested (see, for example, Ref. [40] and references
therein). Although a slightly better global fit was achieved, a strong tension between
the LSND data and the results from atmospheric and short-baseline experiments was
still present.
So far, the LSND signal has not been confirmed by any other experiment [41]. It is
therefore possible that the LSND anomaly arises from some some yet unknown problem
in the data set itself. To close the issue, the MiniBooNE collaboration [42] at FermiLab
has recently performed a search for νµ → νe appearance with a baseline of 540 m and a
mean neutrino energy of about 700 MeV. The primary purpose of this experiment was
to test the evidence for ν̄µ → ν̄e oscillation observed at LSND with a very similar L/E
range. No evidence of the expected signal has been found, hence ruling out once and for
all the four-neutrino interpretation of the LSND anomaly. However, MiniBooNE data
are themselves not conclusive: although no evidence for νµ → νe oscillation has been
reported in the spectrum region compatible with LSND results, an unexplained excess
has been observed for lower energy neutrinos. Furthermore, within a five-neutrino model
this excess can be easily explained, and even reconciled with LSND and all the other
appearance experiments [43]. On the other hand, a post-MiniBooNE global analysis
including also disappearance data show that five-neutrino models suffer from the same
problems as four-neutrino schemes, and in particular they are now only marginally
allowed – a situation very similar to that of four-neutrino models before MiniBooNE
data. Adding a third sterile neutrino 1 does not help [43], and in general global analyses
seem to indicate that sterile neutrinos alone are not enough to reconcile all the data.
Models with sterile neutrinos and exotic physics have been therefore proposed (see, for
example, Ref. [46]).
In summary, the present experimental situation is still confused. It is therefore worth-
wile to understand if, aside of MiniBooNE, new neutrino experiments currently running
or under construction can investigate the existence of sterile neutrinos separated from
the active ones by O(eV2) mass-squared differences. In this paper we explore in detail
the capability of the CNGS beam to perform this search. For definiteness we focus on
the simplest case with only one extra sterile neutrino. Note that this model is perfectly
viable once the LSND result is dropped, as it contains as a limiting case the usual
three-neutrino scenario. Furthermore, it is easily generalizable by adding new sterile
neutrino states, and it can be used as a basis for models with extra “sterile” states
strongly decoupled from active neutrinos (such as in extra-dimensions models with a
right-handed neutrino in the bulk [47]).
The CNGS beam [48] has been built to test the (supposedly) dominant oscillation in at-
mospheric neutrino data, νµ → ντ . In order to make possible τ production through CC
interactions, the mean neutrino energy, 〈Eν〉 = 17 GeV, is much above the atmospheric
oscillation peak for the CERN to Gran Sasso baseline, L = 732 Km. Two detectors are
illuminated by the CNGS beam: OPERA (see Ref. [49] and refs. therein) will start data
taking with the lead-emulsion target in 2007; ICARUS-T600 (see Ref. [50] and refs.
therein) will start operating in 2008. Both detectors have been especially designed to
look for τ ’s produced through νµ → ντ oscillation and to minimize the corresponding
1 A quite interesting scenario is, in our opinion, that in which three right-handed Majorana
neutrinos are added to the three weakly-interacting ones. If the Majorana mass term M is
O(eV), (3+3) light Majorana neutrinos are present at low-energy [44,45].
backgrounds. The expected number of τ events after signal selection in an experiment
such as OPERA (after five years of data taking with nominal CNGS luminosity) is
O(10) events with O(1) background event.
At the CNGS distance and energy, neutrino oscillations mediated by an O(eV2) mass
difference will appear as a constant term in the oscillation probability. In four-neutrino
models, fluctuations induced by this term over the atmospheric νµ → ντ oscillation can
be as large as 100% for specific points of the allowed parameter space. This is due to the
fact that the leading angle for this oscillation is the less constrained one. The νµ → ντ
channel, therefore, is extremely promising as a “sterile neutrino” smoking gun, as it has
been commented elsewhere (see, for example, Refs. [51,52] and refs. therein). To test
the model we will also make use of the νµ → νe channel. Notice that the background to
this signal coming from τ → e decay is modified in four-neutrino models with respect
to standard three-family oscillations. In fact, since νµ → ντ oscillations are depleted by
active-sterile mixing with respect to standard ones, the τ → e background to νµ → νe
oscillations gets depleted, too. A combined analysis of the two channels in four-neutrino
models at the OPERA detector has been performed, taking into account properly all
of the backgrounds. We stress, however, that the same analysis could be performed
at ICARUS, as well. The previous considerations hold for any facility operating well
beyond the kinematical threshold for τ production.
In the specific case of the CNGS beam, the limited flux implies a modest improvement
in the parameter space exclusion, see Sec. 6. An increase in the exposure of such
facilities, however, would permit to improve the present bounds on the parameters of
four-neutrino models and, in particular, to constrain the leading angle in νµ → ντ
oscillations at the level of the other mixing parameters.
The paper is organized as follows. In Sec. 2 we briefly review the main features of
four-neutrino models and we introduce our parametrization of the mixing matrix. In
Sec. 3 we compute the vacuum oscillation probabilities in the atmospheric regime and
we review the present bounds on the active-sterile mixing angles. In Sec. 4 we recall the
most relevant parameters of CNGS. In Sec. 5 we study theoretically the expectations
of the νµ → ντ and νµ → νe channels at the CNGS. In Sec. 6 we present our results
using these channels at the OPERA detector and the CNGS beam. Finally, in Sec. 7
we draw our conclusions.
2 Four neutrino mass schemes
In four-neutrino models, one extra sterile state is added to the three weakly interacting
ones. The relation between the flavor and the mass eigenstates is then described by a
4×4 unitary matrix U , which generalizes the usual 3×3 matrix UPMNS [16,17,18,19]. As
stated in the introduction, in this work we only consider the case when the fourth mass
eigenstate is separated by the other three by an O(eV2) mass-squared gap. There are
six possible four-neutrino schemes, shown in Fig. 1, that can accommodate the results
                 
(3+1) (2+2)
Fig. 1. The two classes of four–neutrino mass spectra, (3+1) and (2+2).
from solar and atmospheric neutrino experiments and contain a third much larger ∆m2.
They can be divided in two classes: (3+1) and (2+2). In the (3+1) schemes, there is
a group of three close-by neutrino masses that is separated from the fourth one by
the larger gap. In (2+2) schemes, there are two pairs of close masses separated by the
large gap. While different schemes within the same class are presently indistinguishable,
schemes belonging to different classes lead to very different phenomenological scenarios.
A characteristic feature of (2+2) schemes is that the extra sterile state cannot be
simultaneously decoupled from both solar and atmospheric oscillations. To understand
why, let us define
i∈ sol
|Usi|
2 and cs =
j ∈ atm
|Usj|
2 (1)
where the sums in i and j run over mass eigenstates involved in solar and atmospheric
neutrino oscillations, respectively. Clearly, the quantities ηs and cs describe the fraction
of sterile neutrino relevant for each class of experiment. Results from atmospheric and
solar neutrino data imply that in both kind of experiments oscillation takes place
mainly between active neutrinos. Specifically, from Fig. 46 of Ref. [20] we get ηs ≤ 0.30
and cs ≤ 0.36 at the 3σ level. However, in (2+2) schemes unitarity implies ηs + cs = 1,
as can be easily understood by looking at Fig. 1. These models are therefore ruled out
at a very high confidence level [53], and in the rest of this work we will not consider
them anymore.
On the other hand, (3+1) schemes are not affected by this problem. Although the
experimental bounds on ηs and cs quoted above still hold, the condition ηs + cs = 1 no
longer applies. For what concerns neutrino oscillations, (3+1) models are essentially
unfalsifiable, since they reduce to the conventional three-neutrino scenario when the
mixing between active and sterile states are small enough.
The mixing matrix U can be conveniently parametrized in terms of six independent
rotation angles θij and three (if neutrinos are Dirac fermions) or six (if neutrinos are
Majorana fermions) phases δi. In oscillation experiments, only the so-called “Dirac
phases” can be measured, the effect of the “Majorana phases” being suppressed by
factors of mν/Eν . The Majorana or Dirac nature of neutrinos can thus be tested only
in ∆L = 2 transitions such as neutrino-less double β-decay [54] or lepton number
violating decays [25]. In the following analysis, with no loss in generality, we will restrict
ourselves to the case of 4 Dirac-type neutrinos only.
A generic rotation in a four-dimensional space can be obtained by performing six dif-
ferent rotations along the Euler axes. Since the ordering of the rotation matrices Rij
(where ij refers to the plane in which the rotation takes place) is arbitrary, plenty of
different parametrizations of the mixing matrix U are allowed. The large parameter
space (6 angles and 3 phases, to be compared with the standard three-family mixing
case of 3 angles and 1 phase) is however reduced to a subspace whenever some of
the mass differences become negligible. If the eigenstates i and j are degenerate, ro-
tations in the ij-plane become unphysical and the corresponding mixing angle should
drop from oscillation probabilities. If the matrix Rij is the rightmost one the angle
θij automatically disappears, since the matrix commutes with the vacuum hamilto-
nian. The parameter space gets therefore reduced to the physical angles and phases.
If a different ordering of the rotation matrices is taken, no angle explicitly disappears
from the oscillation formulas, but the physical parameter space is still smaller than the
original one. In this case, a parameter redefinition is needed to reduce the parameter
space to the observable sector. In Refs. [55,56] it was shown how the one-mass domi-
nance (∆sol → 0 and ∆atm → 0, where ∆ = ∆m
2L/4E [57]) and two-mass dominance
(∆sol → 0) approximations can be implemented in a transparent way (in the sense that
only the physical parameters appear in oscillation probabilities) using a parametriza-
tion in which rotations are performed in the planes corresponding to smallest mass
difference first:
USBL = R14(θ14) R24(θ24) R34(θ34) R23(θ23, δ3) R13(θ13, δ2) R12(θ12, δ1) . (2)
This parametrization was shown to be particularly useful when maximizing oscilla-
tions driven by a O(eV2) mass difference. The analytical expressions for the oscillation
probabilities in the (3+1) model in the one-mass dominance approximation in this
parametrization have been presented in Ref. [51].
In this paper, however, we are interested in a totally different regime: the “atmospheric
regime”, with oscillations driven by the atmospheric mass difference, ∆m2
L/E ∼
π/2. We will then make use of the following parametrization, adopted in Ref. [43]:
Uatm = R34(θ34) R24(θ24) R23(θ23, δ3) R14(θ14) R13(θ13, δ2) R12(θ12, δ1) . (3)
It is convenient to put phases in R12 (so that it automatically drops in the two-mass
dominance regime) and R13 (so that it reduces to the “standard” three-family Dirac
phase when sterile neutrinos are decoupled). The third phase can be put anywhere;
we will place it in R23. Note that in the one-mass dominance regime all the phases
disappear from the oscillation probabilities.
0 2 4 6 8 10 12 14 16 18 20
0 2 4 6 8 10 12 14 16 18 20
Fig. 2. Allowed regions at 90%, 95%, 99% and 3σ CL in the (θ13, θ14) plane (left) and in the
(θ24, θ34) plane (right) from the results of present atmospheric, reactor and LBL neutrino
experiments. The undisplayed parameters θ23 and δ3 are marginalized.
3 Oscillation probabilities and allowed parameter space
Let us first consider νe disappearance at L/E such that ∆sol can be safely neglected
with respect to ∆atm and ∆sbl. We get for this probability in vacuum:
Pee = 1− sin
2 2θ14 sin
∆sblL
sin2 2θ13 sin
∆atmL
, (4)
where cij = cos θij and sij = sin θij . It is clear from Eq. (4) that reactor experiments
such as Bugey and Chooz can put stringent bounds to θ13 and θ14, in this parametriza-
tion. This is depicted in Fig. 2(left), where 90%, 95%, 99% and 3σ CL contours in
the (θ13 − θ14)-plane are shown for ∆sol → 0 and ∆m
= 2.4 × 10−3 eV2. The third
mass difference, ∆m2
, is free to vary above 0.1 eV2. Notice that the νe disappearance
probability does not depend on θ23, θ24 and θ34. It can be clearly seen that the three-
family Chooz bound on θ13 is slightly modulated by θ14. Both angles, however, cannot
be much larger than 10◦. We will therefore expand in these two parameters to deduce
the other relevant oscillation probabilities.
At the CNGS beam atmospheric oscillations are large, solar oscillations can be ne-
glected and O(eV2) oscillations are extremely fast and can be averaged. It is useful to
write down the oscillation probability (in vacuum) at typical atmospheric L/E, in the
approximation ∆sol → 0, ∆sbl → ∞. In this regime:
P (να → νβ) = δαβ − 4ℜ
β3 (δαβ − U
α3Uβ3 − U
α4Uβ4)
δαβUα4U
β4 − |Uα4|
2|Uβ4|
α4Uβ4
sin∆23L ,
where + stands for neutrinos and − for antineutrinos, respectively. Up to second order
in θ13 and θ14 we get for the νµ disappearance oscillation probability:
Pµµ = 1− 2c
(1− c2
)− s2
− 2c3
s23(1− 2s
)s13s14s24 cos(δ2 − δ3)
∆atmL
A “negative” result in an atmospheric L/E νµ disappearance experiment (such as, for
example, K2K), in which νµ oscillations can be very well fitted in terms of three-family
oscillations, will put a stringent bound on the mixing angle θ24. The bound from such
experiments on θ24 can be seen in Fig. 2(right), where 90%, 95%, 99% and 3σ CL
contours in the (θ24 − θ34)-plane are shown for ∆sol → 0 and ∆m
= 2.4× 10−3 eV2.
The third mass difference, ∆m2
, is free to vary above 0.1 eV2. The mixing angles not
shown have been fixed to: θ23 = 45
◦; θ13 = θ14 = 0 (in this hypothesis, Pµµ does not
depend on phases). Notice that the νµ disappearance probability does not depend on
From the figure, we can see that θ24 cannot be much larger than 10
◦, either. We will
consider, therefore, the three mixing angles θ13, θ14 and θ24 being of the same order and
expand in powers of the three. At second order in θ13, θ14 and θ24, we get:
Pµµ = 1− 2s
− 4s2
(1− 2s2
)− s2
∆atmL
. (7)
Since both νe and νµ disappearance do not depend on θ34, we should ask which mea-
surements give the upper bound to this angle that can be observed in Fig. 2(right).
This is indeed a result of indirect searches for νµ → νs conversion in atmospheric exper-
iments, using the different interaction with matter of active and sterile neutrinos. This
can be understood from the (vacuum) νµ → νs oscillation probability at atmospheric
L/E for which, at second order in θ13, θ14 and θ24, we get:
Pµs = 2c
sin2 2θ23(c
+ 2c34 sin 2θ23s34
s24(1− 2s
) cos δ3 + 2s23s13s14 cos δ2
∆atmL
± c34 sin 2θ23s24s34 sin δ3 sin∆atmL .
As it can be seen, the bound on θ34 arises from a measurement of spectral distortion
(i.e., from the “atmospheric” term proportional to sin2∆atmL/2). On the other hand,
bounds on θ13, θ14 and θ24 are mainly drawn by a flux normalization measurement.
As a consequence, the bound on θ34 that we can draw by non-observation of νµ → νs
oscillation in atmospheric experiments is less stringent than those we have shown before.
For this reason, θ34 can be somewhat larger than θ13, θ14 and θ24, but still bounded to
be below 40◦. In the following, we will expand in powers of the four mixing angles
θ13, θ14, θ24 and θ34, that will be considered to be comparably small.
Up to fourth-order in θ13, θ14, θ24 and θ34, the νµ → νe appearance probability in the
atmospheric regime is:
Pµe = 4
[1− s2
] + s23s13s14s24 cos(δ2 − δ3)
∆atmL
± 2s23s13s14s24 sin(δ2 − δ3) sin∆atmL+ 2s
Eventually, the νµ → ντ appearance probability up to fourth-order in θ13, θ14, θ24 and
θ34 in the atmospheric regime is:
Pµτ = 2s
sin2 2θ23[c
− 4 sin 2θ23s13s14[s23s34 cos δ2 + c23s24 cos(δ2 − δ3)]
+ 2 sin 2θ23s24s34c
c34[c
− 2c2
] cos δ3
∆atmL
∓ sin 2θ23s24s34c
c34 sin δ3 sin∆atmL .
As it was shown in Refs. [51,52], the νµ → ντ appearance channel is a good place to
look for sterile neutrinos. This can be understood as follows: consider the νµ → ντ
three-family oscillation probability in the atmospheric regime, up to fourth-order in
P 3νµτ = Pµτ (θi4 = 0) ≃ c
sin2 2θ23 sin
∆atmL
. (11)
The genuine active-sterile neutrino mixing effects are:
∆Pµτ ≡ Pµτ − P
) sin2 2θ23 + 2 sin 2θ23(1− 2s
)s24s34 cos δ3
∆atmL
∓ sin 2θ23s24s34 sin δ3 sin∆atmL+ . . .
that is second-order in small angles θ13, θ14, θ24 and θ34. We would get a similar result
for νµ disappearance, also. On the other hand, computing the corresponding quantity
in the νµ → νe channel, we get:
∆Pµe ≡ Pµe − Pµe(θi4 = 0)
= s23s13s14s24 cos(δ2 − δ3) sin
∆atmL
± 2s23s13s14s24 sin(δ2 − δ3) sin∆atmL+ . . .
that is third-order in the same parameters.
Notice, eventually, that all oscillation probabilities start with an energy-independent
term and are, therefore, non-vanishing for L = 0, a result of our assumption that
∆sbl → ∞.
0 10 20 30 40 50
Eν(GeV)
anti−νµ
anti−νe
Fig. 3. CNGS neutrino fluxes (in arbitrary units) as a function of the neutrino energy. Both
muon and electron neutrino fluxes are illustrated.
4 The CNGS facility
The CNGS is a conventional neutrino beam in which neutrinos are produced by the
decay of secondary pions and kaons, obtained from collisions of 400 GeV protons from
the CERN-SPS onto a graphite target. The resulting neutrinos are aimed to the under-
ground Gran Sasso Laboratory (LNGS), located at 730 Km from CERN. This facility
provided the first neutrinos in August 2006 [49]. Differently from other long baseline
experiments, the neutrinos from CNGS can be exploited to search directly for νµ → ντ
oscillations, since they have a mean energy well beyond the kinematic threshold for τ
production. Moreover, the prompt ντ contamination (mainly fromDs decays) is negligi-
ble. The expected νe contamination is also relatively small compared to the dominant νµ
component, thus allowing for the search of sub-dominant νµ → νe oscillations through
an excess of νe CC events.
The energy spectra of the CNGS neutrino beam are shown (in arbitrary units) in
Fig. 3 [58]. In the present paper we assume the nominal intensity for the CNGS,
corresponding to 4.5× 1019 pot/year.
OPERA has been designed to search for τ appearance through identification of the
ντ CC interaction on an event-by-event basis. In particular, τ ’s are tagged identifying
explicitly their decay kink through high resolution nuclear emulsions interleaved with
lead sheets. For this detector, we can take advantage of the detailed studies of the
νµ → ντ signal (see Ref. [59]) and of the νµ → νe signal (see Ref. [60]).
The total non-oscillated CC event rates for a 1 Kton lead target with a neutrino flux
νµ ν̄µ νe ν̄e
669.0 13.7 5.9 0.3
Table 1
Nominal performance of the CNGS reference beam [58]. The total non-oscillated CC event
rates are calculated assuming 1019 pot and 1 Kton lead target mass.
normalized to 1019 pot are shown in Tab. 1 and are evaluated according to
dφνα(E)
σνα(E) dE , (14)
in which φνα is the flux of the neutrino flavour να and σνα the corresponding cross
section on lead.
5 Appearance channels at the CNGS
5.1 νµ → ντ oscillations
Since the CNGS experiments have been designed to search for νµ → ντ oscillation
in the parameter region indicated by the atmospheric neutrino data, we can take full
advantage of them in order to constrain (and, possibly, study) the four-family parameter
space.
The number of taus from νµ → ντ oscillations is given by the convolution of the νµ
flux dφνµ(E)/dE with the ντ charged-current cross-section on lead, σ
(E), weighted
by the νµ → ντ oscillation probability, Pµτ (E), times the efficiency for the OPERA
detector, εµτ :
Nµτ = A
dφνµ(E)
Pµτ (E)σ
(E) εµτ dE . (15)
A is a normalization factor which takes into account the target mass and the nor-
malization of the νµ flux in physical units. Specializing our analysis for the OPERA
detector, we have considered an overall efficiency εµτ ∼ 13%, [59]. This efficiency takes
into account that OPERA is able to exploit several decay modes of the final state τ ,
using both so-called short and long decays.
The dominant sources of background for the νµ → ντ signal are charm decays and
hadronic reinteractions. Both of them only depend on the total neutrino flux and not
on the oscillation probabilities. The OPERA experiment at the CNGS beam has been
designed precisely to measure this channel, and thus the corresponding backgrounds
are extremely low.
In Tab. 2 we report the expected number of τ events in the OPERA detector, according
to Eq. (15), for different values of θ13, θ14, θ24 and θ34. Input points have been chosen
according to the allowed regions in the parameter space shown in Sec. 3. The other
(θ13; θ14; θ24; θ34) Nτ background (θ13; θ14; θ24; θ34) Nτ background
(5◦; 5◦; 5◦; 20◦) 8.9 1.0 (10◦; 5◦; 5◦; 20◦) 8.5 1.0
(5◦; 5◦; 5◦; 30◦) 6.9 1.0 (10◦; 5◦; 5◦; 30◦) 6.5 1.0
(5◦; 5◦; 10◦; 20◦) 8.3 1.0 (10◦; 5◦; 10◦; 20◦) 7.9 1.0
(5◦; 5◦; 10◦; 30◦) 10.5 1.0 (10◦; 5◦; 10◦; 30◦) 10.3 1.0
3 families 15.1 1.0 3 families 14.4 1.0
Table 2
Event rates and expected background for the νµ → ντ channel in the OPERA detector, for
different values of θ14, θ24 and θ34 in the (3+1) scheme. The other unknown angle, θ13 has
been fixed to: θ13 = 5
◦, 10◦. The CP-violating phases are: δ1 = δ2 = 0; δ3 = 90
◦. As a
reference, the expected value in the case of standard three-family oscillation (i.e., for θi4 = 0)
is shown for maximal CP-violating phase δ. The rates are computed according to Eq. (15).
parameters are: θ12 = 34
◦; θ23 = 45
◦; ∆m2
= 7.9× 10−5 eV2; ∆m2
= 2.4× 10−3 eV2
and ∆m2
= 1 eV2 (all mass differences are taken to be positive). Eventually, phases
have been fixed to: δ1 = δ2 = 0; δ3 = 90
◦. The expected background is also shown. Rates
refer to a flux normalized to 4.5× 1019 pot/year (the nominal intensity of the CNGS),
an active lead target mass of 1.8 Kton and 5 years of data taking. For comparison, we
also report the expected number of events in the usual 3-family scenario.
As it can be seen, in most part of the parameter space we expect a significant depletion
of the signal with respect to standard three-neutrino oscillations. However, the differ-
ence between (3+1) model νµ → ντ oscillations and standard ones is much bigger than
the expected background. A good signal/noise separation can therefore be used to test
the model.
5.2 νµ → νe oscillations
The number of electrons from the νµ → νe oscillation is given by the convolution of
the νµ flux dφνµ(E)/dE with the νe charged-current cross-section on lead, σ
weighted by the νµ → νe oscillation probability, Pµe(E), times the efficiency for the
OPERA detector, εµe(E) [60]:
Nµe = A
dφνµ(E)
Pµe(E)σ
(E) εµe(E) dE , (16)
where A is defined as above. The overall signal efficiency εµe is the convolution of the
kinematic efficiency εkinµe (that ranges from 60% to 80% for neutrino energies between
5 to 20 GeV) and several (nearly factorizable) contributions. Among them, the most
relevant are trigger efficiencies, effects due to fiducial volume cuts, vertex and brick
finding efficiencies and the electron identification capability. They result in a global
constant factor εfactµe ∼ 48%.
(θ13; θ14; θ24; θ34) Ne ν
µ τ → e ν
(5◦; 5◦; 5◦; 20◦) 3.5 19.4 5.3 2.8 0.9
(5◦; 5◦; 5◦; 30◦) 3.5 19.4 5.3 2.1 0.9
(5◦; 5◦; 10◦; 20◦) 2.4 19.4 5.3 2.3 0.9
(5◦; 5◦; 10◦; 30◦) 2.4 19.4 5.3 2.4 0.9
3 families 3.7 19.7 5.3 4.6 0.9
(10◦; 5◦; 5◦; 20◦) 10.6 19.4 5.3 2.7 0.9
(10◦; 5◦; 5◦; 30◦) 10.4 19.4 5.3 2.0 0.9
(10◦; 5◦; 10◦; 20◦) 8.8 19.4 5.3 2.2 0.9
(10◦; 5◦; 10◦; 30◦) 8.6 19.4 5.3 2.4 0.9
3 families 15.1 19.7 5.3 4.8 0.9
Table 3
Event rates and expected background for the νµ → νe channel in the OPERA detector, for
different values of θ14, θ24 and θ34 in the (3+1) scheme. The other unknown angle, θ13, has
been fixed to: θ13 = 5
◦, 10◦. The CP-violating phases are: δ1 = δ2 = 0; δ3 = 90
◦. As a
reference, the expected value in the case of standard three-family oscillation(i.e., for θi4 = 0)
is shown for maximal CP-violating phase δ. The rates are computed according to Eq. (16).
Backgrounds have been computed following Ref. [60].
The dominant sources of background to the νµ → νe signal are, in order of importance:
(1) νe beam contamination;
(2) fake electrons due to π0 decays from νµ NC interactions;
(3) electrons produced through τ decay, where the τ comes from νµ → ντ oscillations;
(4) CC νµ events where the muon is lost and a track mimics an electron.
Backgrounds (1), (2) and (4) depend very little on the oscillation parameters. On the
other hand, the τ → e background depends strongly on the active-sterile mixing angles.
As we have seen in Sec. 5.1, in the allowed region of the parameter space νµ → ντ
oscillations are significantly depleted with respect to the standard three-neutrino ones.
As a consequence, this background gets depleted, too.
In Tab. 3 we report the expected number of electrons in the OPERA detector, according
to Eq. (16), for different values of θ13, θ14, θ24 and θ34. Input points have been chosen
according to the allowed regions in the parameter space shown in Sec. 3. The other
parameters are: θ12 = 34
◦; θ23 = 45
◦; ∆m2
= 7.9× 10−5 eV2; ∆m2
= 2.4× 10−3 eV2
and ∆m2
= 1 eV2 (all mass differences are taken to be positive). Eventually, phases
have been fixed to: δ1 = δ2 = 0; δ3 = 90
◦. Backgrounds have been computed accordingly
to Ref. [60]. Rates refer to a flux normalized to 4.5×1019 pot/year (the nominal intensity
of the CNGS), an active lead target mass of 1.8 Kton and 5 years of data taking. For
comparison, we also report the expected number of events in the usual 3-family scenario.
As it can be seen from Tab. 3, the difference between the (3+1) model and the stan-
dard three-neutrino oscillations are smaller in this channel than in the νµ → ντ one.
Moreover, they linearly depends on θ13, as it is clear from Eq. (13). For θ13 = 5
◦, this
channel will be of no help to test the allowed parameter space of the (3+1) model. On
the other hand, for θ13 saturating the Chooz-Bugey bound, both νµ → ντ and νµ → νe
might cooperate. However, notice that backgrounds to this signal are much larger than
the difference between (3+1) model and standard three-neutrino oscillations for any
value of θ13.
6 Sensitivity to (3 + 1) sterile neutrinos at OPERA
In this section we study the sensitivity to θ13 and to the active-sterile mixing angles
θ14, θ24 and θ34 at the CNGS beam, using both the νµ → ντ and νµ → νe appearance
channels at the OPERA detector. In the rest of this section, the known three-family
subspace angles have been fixed to: θ12 = 34
◦; θ23 = 45
◦. The mass differences have
been fixed to: ∆m2
= 7.9× 10−5 eV2 and ∆m2
= 2.4× 10−3 eV2. The CP-violating
phases δ1 and δ2 have been kept fixed to δ1 = δ2 = 0. On the contrary, the CP-violating
phase δ3 is fixed to two values: δ3 = 0 or 90
◦. Notice that this phase is still present in
the oscillation probabilities even when θ12 and θ13 vanish, see Eq. (10). At atmospheric
L/E, oscillations driven by an O(eV2) mass difference are averaged. We have checked
that our results apply for any value of ∆m2
≥ 0.1 eV2.
In Fig. 4 we show the sensitivity limit at 99% CL in the (θ13, θ14) plane (left) and
in the (θ24, θ34) plane (right) from a null result of the OPERA experiment, assuming
1, 2, 3, 5 and 10 times the nominal intensity of 4.5 × 1019 pot/year. The coloured
regions show the present bounds at 90% and 99% CL. We assume θ23 = 45
◦ and
δ3 = 0
◦ (top) or δ3 = 90
◦ (bottom). The sensitivity is defined as the region for which a
(poissonian) 2 d.o.f.’s χ2 is compatible with a “null result” at the 99% CL. We refer to
“null result” when θ13 and the three active-sterile mixing angles, θ14, θ24 and θ34 vanish
simultaneously. Both νµ → ντ and νµ → νe oscillations have been considered, with the
corresponding backgrounds treated properly as in Sec. 5. An overall systematic error
of 10% has been taken into account.
In the left panels of Fig. 4 we can see that OPERA can improve only a little the bound
on θ13 after 5 years of data taking working at nominal CNGS beam intensity, both
for δ3 = 0 (top panel) or δ3 = 90
◦ (bottom panel). Increasing the nominal intensity,
however, a significant improvement on the bound is achieved for any value of θ14. Notice
that the limit on θ14 is almost unaffected by the OPERA data. This is because for the
νµ → ντ and νµ → νe oscillation probabilities at atmospheric L/E, the θ14-dependence
always arises at third-order in the small parameters θ13, θ14, θ24 and θ34 (see Eqs. (9)
and (10) for the explicit expression in the adopted parametrization, Eq. (3)). On the
contrary, the θ13-, θ24- and θ34-dependences in the same oscillation probabilities are
quadratic in the small parameters. In case of vanishing active-sterile mixing angles,
0 2 4 6 8 10 12 14
 = 45°, δ
 = 0°
0 2 4 6 8 10 12 14
 = 45°, δ
 = 0°
0 2 4 6 8 10 12 14
 = 45°, δ
 = 90°
0 2 4 6 8 10 12 14
 = 45°, δ
 = 90°
× 5× 10
Fig. 4. Sensitivity limit at 99% CL in the (θ13, θ14) plane (left) and in the (θ24, θ34) plane
(right) from a null result of the OPERA experiment, assuming 1, 2, 3, 5 and 10 times the
nominal intensity of 4.5 × 1019 pot/year. The coloured regions show the present bounds at
90% and 99% CL. We assume θ23 = 45
◦ and δ3 = 0
◦ (top) or δ3 = 90
◦ (bottom).
θi4 = 0, see Ref. [60].
In the right panels of Fig. 4 the sensitivity of OPERA to θ24 and θ34 is shown. First
of all, notice that the sensitivity is strongly affected by the intensity of the beam.
No improvement on the existing bounds on these two parameters is achieved after 5
years of data taking at nominal CNGS beam intensity, for any of the considered value
of δ3. Already with a doubled flux intensity, some sensitivity to θ24, θ34 is achievable.
The sensitivity enhancement strongly depends on the value of the CP-violating phase
δ3, however. For δ3 = 0, OPERA can exclude a small part of the 99% CL allowed
region, only. On the other hand, for δ3 = 90
◦ twice the nominal CNGS flux suffices
0 2 4 6 8 10 12 14
 = 45°, δ
 = 0°
0 2 4 6 8 10 12 14
 = 45°, δ
 = 0°
0 2 4 6 8 10 12 14
 = 45°, δ
 = 90°
0 2 4 6 8 10 12 14
 = 45°, δ
 = 90°
Fig. 5. Sensitivity limit at 99% CL in the (θ13, θ14) plane (left) and in the (θ24, θ34) plane
(right) from the combined analysis of present data and a null result of the OPERA experiment,
assuming 1, 2, 3, 5 and 10 times the nominal intensity of 4.5 × 1019 pot/year. The coloured
regions show the present bounds at 90% and 99% CL. We assume θ23 = 45
◦ and δ3 = 0
(top) or δ3 = 90
◦ (bottom).
to put a bound on θ34 ≤ 25
◦ for θ24 ≥ 4
◦ at 99% CL. For maximal CP-violating
δ3, increasing further the CNGS flux can significantly constrain the (θ24, θ34) allowed
parameter space. Notice, eventually, the strong correlation between θ24 and θ34 in the
right panels of Fig. 4. This is an indication that the dominant channel that constrains
these angles is νµ → ντ . As it can be seen in Eq. (10), the two angles always appear in
combination, with an approximate exchange symmetry θ24 ↔ θ34.
The allowed regions at 99% CL in the (θ13, θ14) plane (left) and in the (θ24, θ34) plane
(right) from the combined analysis of present data and a null result of the OPERA
experiment after 5 years of data taking (assuming 1, 2, 3, 5 and 10 times the nominal
CNGS intensity of 4.5 × 1019 pot/year) are eventually shown in Fig. 5. The coloured
regions refer to the present bounds at 90% and 99% CL, for θ23 = 45
◦ and δ3 = 0
◦ (top)
or δ3 = 90
◦ (bottom). As it can be seen, the sensitivity of OPERA strongly benefits
from the complementary information on the neutrino parameters provided by other
experiments. In this case, even with the nominal beam intensity the extension of the
allowed regions is reduced by a moderate but non-negligible amount.
7 Conclusions
The results of atmospheric, solar, accelerator and reactor neutrino experiments show
that flavour mixing occurs not only in the quark sector, as it has been known for
long, but also in the leptonic sector. Experimental data well fit into a three-family
scenario. The existence of new “sterile” neutrino states with masses in the eV range is
not excluded, however, provided that their couplings with active neutrinos are small
enough.
In this paper, we have tried to test the potential of the OPERA experiment at the
CNGS beam to improve the present bounds on the parameters of the so-called four-
neutrino models. The model, in which only one sterile neutrino is added to the three
active ones responsible for solar and atmospheric oscillations, is the minimal extension
of the standard three-family oscillation scenario.
We have determined the presently allowed regions for all active-sterile mixing angles
and studied the OPERA capability to constrain them further using both the νµ → νe
and νµ → ντ channels. We have performed our analysis using the OPERA detector as
a reference. It can be extended including a detailed simulation of the ICARUS detector
at the CNGS beam.
Our conclusions are the following: if the OPERA detector is exposed to the nominal
CNGS beam intensity, a null result can improve a bit the present bound on θ13, but
not those on the active-sterile mixing angles, θ14, θ24 and θ34. If the beam intensity is
increased by a factor 2 or beyond, not only the sensitivity to θ13 increases accordingly,
but a significant sensitivity to θ24 and θ34 is achievable. The (θ24, θ34) sensitivity strongly
depends on the value of the CP-violating phase δ3, however, with stronger sensitivity for
values of δ3 approaching π/2. Only a marginal improvement is achievable on the bound
on θ14, that should be constrained by high-intensitiy νe disappearance experiments.
Notice that our results hold for any value of ∆m2
≥ 0.1 eV2, i.e. in the region of L/E
for which oscillations driven by this mass difference are effectively averaged.
Acknowledgements
We acknowledge E. Fernández-Mart́ınez, P. Hernández, J. López-Pavón, M. Sorel and
P. Strolin for useful discussions and comments. We thank T. Schwetz for pointing out
to us an error in the first version of the paper and for useful comments on it. The
work has been partially supported by the E.U. through the BENE-CARE networking
activity MRTN-CT-2004-506395. A.D. received partial support from CiCYT through
the project FPA2006-05423. M.M. received partial support from CiCYT through the
project FPA2006-01105 and the MCYT through the Ramón y Cajal program. A.D. and
M.M. acknowledge also financial support from the Comunidad Autónoma de Madrid
through the project P-ESP-00346. D.M. would like to thank CERN, where part of this
work has been accomplished.
References
[1] B. T. Cleveland et al., Astrophys. J. 496 (1998) 505.
[2] J. N. Abdurashitov et al. [SAGE Collaboration], Phys. Rev. C 60 (1999) 055801
[arXiv:astro-ph/9907113].
[3] W. Hampel et al. [GALLEX Collaboration], Phys. Lett. B 447 (1999) 127.
[4] S. Fukuda et al. [Super-Kamiokande Collaboration], Phys. Rev. Lett. 86 (2001) 5651
[arXiv:hep-ex/0103032].
[5] Q. R. Ahmad et al. [SNO Collaboration], Phys. Rev. Lett. 87 (2001) 071301
[arXiv:nucl-ex/0106015].
[6] S. N. Ahmed et al. [SNO Collaboration], Phys. Rev. Lett. 92 (2004) 181301
[arXiv:nucl-ex/0309004].
[7] Y. Fukuda et al. [Super-Kamiokande Collaboration], Phys. Rev. Lett. 81 (1998) 1562
[arXiv:hep-ex/9807003].
[8] M. Ambrosio et al. [MACRO Collaboration], Phys. Lett. B 517 (2001) 59
[arXiv:hep-ex/0106049].
[9] M. Apollonio et al. [CHOOZ Collaboration], Phys. Lett. B 466 (1999) 415
[arXiv:hep-ex/9907037].
[10] M. Apollonio et al. [CHOOZ Collaboration], Eur. Phys. J. C 27 (2003) 331
[arXiv:hep-ex/0301017].
[11] F. Boehm et al., Phys. Rev. D 64 (2001) 112001 [arXiv:hep-ex/0107009].
[12] K. Eguchi et al. [KamLAND Collaboration], Phys. Rev. Lett. 90 (2003) 021802
[arXiv:hep-ex/0212021].
http://arxiv.org/abs/astro-ph/9907113
http://arxiv.org/abs/hep-ex/0103032
http://arxiv.org/abs/nucl-ex/0106015
http://arxiv.org/abs/nucl-ex/0309004
http://arxiv.org/abs/hep-ex/9807003
http://arxiv.org/abs/hep-ex/0106049
http://arxiv.org/abs/hep-ex/9907037
http://arxiv.org/abs/hep-ex/0301017
http://arxiv.org/abs/hep-ex/0107009
http://arxiv.org/abs/hep-ex/0212021
[13] M. H. Ahn et al. [K2K Collaboration], Phys. Rev. Lett. 90 (2003) 041801
[arXiv:hep-ex/0212007].
[14] E. Aliu et al. [K2K Collaboration], Phys. Rev. Lett. 94 (2005) 081802
[arXiv:hep-ex/0411038].
[15] D. G. Michael et al. [MINOS Collaboration], Phys. Rev. Lett. 97 (2006) 191801
[arXiv:hep-ex/0607088].
[16] B. Pontecorvo, Sov. Phys. JETP 6 (1957) 429 [Zh. Eksp. Teor. Fiz. 33 (1957) 549].
[17] Z. Maki, M. Nakagawa and S. Sakata, Prog. Theor. Phys. 28 (1962) 870.
[18] B. Pontecorvo, Sov. Phys. JETP 26 (1968) 984 [Zh. Eksp. Teor. Fiz. 53 (1967) 1717].
[19] V. N. Gribov and B. Pontecorvo, Phys. Lett. B 28 (1969) 493.
[20] M. C. Gonzalez-Garcia and M. Maltoni, arXiv:0704.1800 [hep-ph].
[21] C. Athanassopoulos et al. [LSND Collaboration], Phys. Rev. C 54 (1996) 2685
[arXiv:nucl-ex/9605001].
[22] C. Athanassopoulos et al. [LSND Collaboration], Phys. Rev. Lett. 81 (1998) 1774
[arXiv:nucl-ex/9709006].
[23] A. Aguilar et al. [LSND Collaboration], Phys. Rev. D 64 (2001) 112007
[arXiv:hep-ex/0104049].
[24] G. L. Fogli, E. Lisi, A. Marrone and G. Scioscia, arXiv:hep-ph/9906450.
[25] W. M. Yao et al. [Particle Data Group], J. Phys. G 33 (2006) 1.
[26] LEP Collaborations (ALEPH, DELPHI, OPAL, L3) et al., Phys. Rept. 427 (2006) 257
[arXiv:hep-ex/0509008].
[27] J. Kleinfeller [KARMEN Collaboration], Nucl. Phys. Proc. Suppl. 87 (2000) 281.
[28] F. Dydak et al., Phys. Lett. B 134 (1984) 281.
[29] I. E. Stockdale et al., Z. Phys. C 27 (1985) 53.
[30] Y. Declais et al., Nucl. Phys. B 434 (1995) 503.
[31] W. Grimus and T. Schwetz, Eur. Phys. J. C 20 (2001) 1 [arXiv:hep-ph/0102252].
[32] S. M. Bilenky, C. Giunti and W. Grimus, Eur. Phys. J. C 1 (1998) 247
[arXiv:hep-ph/9607372].
[33] N. Okada and O. Yasuda, Int. J. Mod. Phys. A 12 (1997) 3669 [arXiv:hep-ph/9606411].
[34] V. D. Barger, S. Pakvasa, T. J. Weiler and K. Whisnant, Phys. Rev. D 58 (1998) 093016
[arXiv:hep-ph/9806328].
[35] S. M. Bilenky, C. Giunti, W. Grimus and T. Schwetz, Phys. Rev. D 60 (1999) 073007
[arXiv:hep-ph/9903454].
http://arxiv.org/abs/hep-ex/0212007
http://arxiv.org/abs/hep-ex/0411038
http://arxiv.org/abs/hep-ex/0607088
http://arxiv.org/abs/0704.1800
http://arxiv.org/abs/nucl-ex/9605001
http://arxiv.org/abs/nucl-ex/9709006
http://arxiv.org/abs/hep-ex/0104049
http://arxiv.org/abs/hep-ph/9906450
http://arxiv.org/abs/hep-ex/0509008
http://arxiv.org/abs/hep-ph/0102252
http://arxiv.org/abs/hep-ph/9607372
http://arxiv.org/abs/hep-ph/9606411
http://arxiv.org/abs/hep-ph/9806328
http://arxiv.org/abs/hep-ph/9903454
[36] O. L. G. Peres and A. Y. Smirnov, Nucl. Phys. B 599 (2001) 3 [arXiv:hep-ph/0011054].
[37] C. Giunti and M. Laveder, JHEP 0102 (2001) 001 [arXiv:hep-ph/0010009].
[38] M. Maltoni, T. Schwetz, M. A. Tortola and J. W. F. Valle, New J. Phys. 6 (2004) 122
[arXiv:hep-ph/0405172].
[39] M. Maltoni, T. Schwetz, M. A. Tortola and J. W. F. Valle, Nucl. Phys. B 643 (2002)
321 [arXiv:hep-ph/0207157].
[40] M. Sorel, J. M. Conrad and M. Shaevitz, Phys. Rev. D 70 (2004) 073004
[arXiv:hep-ph/0305255].
[41] E. D. Church, K. Eitel, G. B. Mills and M. Steidl, Phys. Rev. D 66 (2002) 013001
[arXiv:hep-ex/0203023].
[42] A. A. Aguilar-Arevalo et al. [The MiniBooNE Collaboration], Phys. Rev. Lett. 98 (2007)
231801 [arXiv:0704.1500 [hep-ex]].
[43] M. Maltoni and T. Schwetz, arXiv:0705.0107 [hep-ph], to appear in PRD.
[44] A. de Gouvea, Phys. Rev. D 72 (2005) 033005 [arXiv:hep-ph/0501039].
[45] A. de Gouvea, J. Jenkins and N. Vasudevan, Phys. Rev. D 75 (2007) 013003
[arXiv:hep-ph/0608147].
[46] S. Palomares-Ruiz, S. Pascoli and T. Schwetz, JHEP 0509 (2005) 048
[arXiv:hep-ph/0505216].
[47] H. Pas, S. Pakvasa and T. J. Weiler, Phys. Rev. D 72 (2005) 095017
[arXiv:hep-ph/0504096].
[48] G. Giacomelli, arXiv:physics/0703247.
[49] R. Acquafredda et al. [OPERA Collaboration], New J. Phys. 8 (2006) 303
[arXiv:hep-ex/0611023].
[50] S. Amerio et al. [ICARUS Collaboration], Nucl. Instrum. Meth. A 527 (2004) 329.
[51] A. Donini and D. Meloni, Eur. Phys. J. C 22 (2001) 179 [arXiv:hep-ph/0105089].
[52] A. Donini, M. Lusignoli and D. Meloni, Nucl. Phys. B 624 (2002) 405
[arXiv:hep-ph/0107231].
[53] M. Maltoni, T. Schwetz, M. A. Tortola and J. W. F. Valle, Phys. Rev. D 67 (2003)
013011 [arXiv:hep-ph/0207227].
[54] S. M. Bilenky, S. Pascoli and S. T. Petcov, Phys. Rev. D 64 (2001) 113003
[arXiv:hep-ph/0104218].
[55] A. Donini, M. B. Gavela, P. Hernandez and S. Rigolin, Nucl. Phys. B 574 (2000) 23
[arXiv:hep-ph/9909254].
http://arxiv.org/abs/hep-ph/0011054
http://arxiv.org/abs/hep-ph/0010009
http://arxiv.org/abs/hep-ph/0405172
http://arxiv.org/abs/hep-ph/0207157
http://arxiv.org/abs/hep-ph/0305255
http://arxiv.org/abs/hep-ex/0203023
http://arxiv.org/abs/0704.1500
http://arxiv.org/abs/0705.0107
http://arxiv.org/abs/hep-ph/0501039
http://arxiv.org/abs/hep-ph/0608147
http://arxiv.org/abs/hep-ph/0505216
http://arxiv.org/abs/hep-ph/0504096
http://arxiv.org/abs/physics/0703247
http://arxiv.org/abs/hep-ex/0611023
http://arxiv.org/abs/hep-ph/0105089
http://arxiv.org/abs/hep-ph/0107231
http://arxiv.org/abs/hep-ph/0207227
http://arxiv.org/abs/hep-ph/0104218
http://arxiv.org/abs/hep-ph/9909254
[56] A. Donini, M. B. Gavela, P. Hernandez and S. Rigolin, Nucl. Instrum. Meth. A 451
(2000) 58 [arXiv:hep-ph/9910516].
[57] A. De Rujula, M. Lusignoli, L. Maiani, S. T. Petcov and R. Petronzio, Nucl. Phys. B
168 (1980) 54.
[58] A.E. Ball et al., “CNGS: Update on secondary beam layout”, SL-Note-2000-063 EA.
[59] P. Migliozzi, Nucl. Phys. Proc. Suppl. 155 (2006) 23.
[60] M. Komatsu, P. Migliozzi and F. Terranova, J. Phys. G 29 (2003) 443
[arXiv:hep-ph/0210043].
http://arxiv.org/abs/hep-ph/9910516
http://arxiv.org/abs/hep-ph/0210043
	Introduction
	Four neutrino mass schemes
	Oscillation probabilities and allowed parameter space
	The CNGS facility
	Appearance channels at the CNGS
	 oscillations
	e oscillations
	Sensitivity to (3+1) sterile neutrinos at OPERA
	Conclusions
	References
ABSTRACT
  We study the potential of the CNGS beam in constraining the parameter space
of a model with one sterile neutrino separated from three active ones by an
$\mathcal{O}(\eVq)$ mass-squared difference, $\Dmq_\Sbl$. We perform our
analysis using the OPERA detector as a reference (our analysis can be upgraded
including a detailed simulation of the ICARUS detector). We point out that the
channel with the largest potential to constrain the sterile neutrino parameter
space at the CNGS beam is $\nu_\mu \to \nu_\tau$. The reason for that is
twofold: first, the active-sterile mixing angle that governs this oscillation
is the less constrained by present experiments; second, this is the signal for
which both OPERA and ICARUS have been designed, and thus benefits from an
extremely low background. In our analysis we also took into account $\nu_\mu
\to \nu_e$ oscillations. We find that the CNGS potential to look for sterile
neutrinos is limited with nominal intensity of the beam, but it is
significantly enhanced with a factor 2 to 10 increase in the neutrino flux.
Data from both channels allow us, in this case, to constrain further the
four-neutrino model parameter space. Our results hold for any value of
$\Dmq_\Sbl \gtrsim 0.1 \eVq$, \textit{i.e.} when oscillations driven by this
mass-squared difference are averaged. We have also checked that the bound on
$\theta_{13}$ that can be put at the CNGS is not affected by the possible
existence of sterile neutrinos.

<|endoftext|><|startoftext|>
For reference, the following erratum corrects the published version of the paper. These errors have been fixed in
this arxiv-version (the article starting on page 2 has the corrected expressions).
Erratum: Evolution of the Carter constant for inspirals into a black hole: Effect of the
black hole quadrupole
[Phys. Rev. D 75, 124007 (2007)]
Éanna É. Flanagan, Tanja Hinderer
In Eqs. (3.16), (3.17), (3.18), (3.24), (3.25) and (3.26) of this paper, the variable r should be replaced everywhere
by the variable r̃, and the variable θ should be replaced everywhere by the variable θ̃. The definitions of r̃ and θ̃ are
given in Eq. (2.11). These replacements do not affect the any of the subsequent results in the paper.
Also, the right hand side of Eq. (B3) is missing a term −4SLzr̃ and Eq. (2.24) is missing a factor of dϕ/dt̃ in front
of Q.
Some terms are missing in Eqs. (3.18), (3.26) and (3.30) - (3.33). The additional terms in Eqs. (3.18) and (3.26)
15r̃7
−75K2 + 2Kr̃(51r̃E + 50) + 8r̃2(r̃E + 1)(3r̃E + 5)
15p2r̃7
25p3(3p− 4r̃) + p2r̃2
11− 51e2
+ 32pr̃3
1− e2
+ 6r̃4
1− e2
respectively. These result in additional fractional corrections to Eq. (3.30) given by
and the full expression replacing the O(Q) terms in Eq. (3.30) is then
〈K̇〉 = −
(1− e2)3/2
cos(2ι)
+O(S), O(S2)− terms.
Equations (3.31), (3.32) and (3.33) contain typos in the O(S) and O(Q) terms, the corrected expressions are given
below. We thank P. Komorowski for pointing this out. Equation (3.31) should be replaced by
〈ṗ〉 = −64
(1− e2)3/2
− S cos(ι)
96p3/2
1064 + 1516e2 + 475e4
149e2
469e2
227e4
cos(2ι)
+ e2 +
[13− cos(2ι)]
, (0.1)
Equation (3.32) should be replaced by
〈ė〉 = −304
e(1− e2)3/2
121e2
Se(1− e2)3/2 cos(ι)
5p11/2
1172 + 932e2 +
1313e4
Q(1− e2)3/2
785e2
− 219e
+ 13e6 +
2195e2
+ 251e4 +
218e6
cos(2ι)
2e(1− e2)3/2
2 + 3e2 +
[13− cos(2ι)] , (0.2)
and the corrected Eq. (3.33) is
〈ι̇〉 = S sin(ι)(1 − e
2)3/2
p11/2
1− e2
S2 sin(2ι)
240p6
8 + 3e2
8 + e2
Q cot(ι)(1 − e2)3/2
312 + 736e2 − 83e4 −
408 + 1268e2 + 599e4
cos(2ι)
. (0.3)
http://arxiv.org/abs/0704.0389v8
Evolution of the Carter constant for inspirals into a black hole: effect of the black hole
quadrupole
Éanna É. Flanagan1,2 and Tanja Hinderer1
Center for Radiophysics and Space Research, Cornell University, Ithaca, NY 14853, USA
Laboratory for Elementary Particle Physics, Cornell University, Ithaca, NY 14853, USA
(Dated: November 4, 2018)
We analyze the effect of gravitational radiation reaction on generic orbits around a body with an
axisymmetric mass quadrupole moment Q to linear order in Q, to the leading post-Newtonian order,
and to linear order in the mass ratio. This system admits three constants of the motion in absence
of radiation reaction: energy, angular momentum along the symmetry axis, and a third constant
analogous to the Carter constant. We compute instantaneous and time-averaged rates of change
of these three constants. For a point particle orbiting a black hole, Ryan [15] has computed the
leading order evolution of the orbit’s Carter constant, which is linear in the spin. Our result, when
combined with an interaction quadratic in the spin (the coupling of the black hole’s spin to its own
radiation reaction field), gives the next to leading order evolution. The effect of the quadrupole, like
that of the linear spin term, is to circularize eccentric orbits and to drive the orbital plane towards
antialignment with the symmetry axis.
In addition we consider a system of two point masses where one body has a single mass multipole
or current multipole of order l. To linear order in the mass ratio, to linear order in the multipole,
and to the leading post-Newtonian order, we show that there does not exist an analog of the Carter
constant for such a system (except for the cases of an l = 1 current moment and an l = 2 mass
moment). Thus, the existence of the Carter constant in Kerr depends on interaction effects between
the different multipoles. With mild additional assumptions, this result falsifies the conjecture that
all vacuum, axisymmetric spacetimes posess a third constant of the motion for geodesic motion.
PACS numbers: 04.25.Nx, 04.30.Db
I. INTRODUCTION AND SUMMARY
The inspiral of stellar mass compact objects with
masses µ in the range µ ∼ 1 − 100M⊙ into massive
black holes with masses M ∼ 105 − 107M⊙ is one of
the most important sources for the future space-based
gravitational wave detector LISA. Observing such events
will provide a variety of information: (i) the masses and
spins of black holes can be measured to high accuracy
(∼ 10−4); which can constrain the black hole’s growth
history [1]; (ii) the observations will give a precise test
of general relativity in the strong field regime and unam-
biguously identify whether the central object is a black
hole [2]; and (iii) the measured event rate will give in-
sight into the complex stellar dynamics in galactic nu-
clei [1]. Analogous inspirals may also be interesting for
the advanced stages of ground-based detectors: it has
been estimated that advanced LIGO could detect up to
∼ 10 − 30 inspirals per year of stellar mass compact
objects into intermediate mass black holes with masses
M ∼ 102 − 104M⊙ in globular clusters [3]. Detect-
ing these inspirals and extracting information from the
datastream will require accurate models of the gravita-
tional waveform as templates for matched filtering. For
computing templates, we therefore need a detailed un-
derstanding of the how radiation reaction influences the
evolution of bound orbits around Kerr black holes [4–7].
There are three dimensionless parameters characteriz-
ing inspirals of bodies into black holes:
• the dimensionless spin parameter a = |S|/M2 of
the black hole, where S is the spin.
• the strength of the interaction potential ǫ2 =
GM/rc2, i.e. the expansion parameter used in post-
Newtonian (PN) theory.
• the mass ratio µ/M .
For LISA data analysis we will need waveforms that are
accurate to all orders in a and ǫ2, and to leading order
in µ/M . However, it is useful to have analytic results in
the regimes a ≪ 1 and/or ǫ2 ≪ 1. Such approximate
results can be useful as a check of numerical schemes
that compute more accurate waveforms, for scoping out
LISA’s data analysis requirements [1, 6], and for assessing
the accuracy of the leading order in µ/M or adiabatic
approximation [8–10]. There is substantial literature on
such approximate analytic results, and in this paper we
will extend some of these results to higher order.
A long standing difficulty in computing the evolution
of generic orbits has been the evolution of the orbit’s
”Carter constant”, a constant of motion which governs
the orbital shape and inclination. A theoretical prescrip-
tion now exists for computing Carter constant evolution
to all orders in ǫ and a in the adiabatic limit µ ≪ M
[9, 11–13], but it has not yet been implemented numer-
ically. In this paper we focus on computing analytically
the evolution of the Carter constant in the regime a≪ 1,
ǫ ≪ 1, µ/M ≪ 1, extending earlier results by Ryan
[14, 15].
We next review existing analytical work on the effects
of multipole moments on inspiral waveforms. For non-
spinning point masses, the phase of the l = 2 piece of
the waveform is known to O(ǫ7) beyond leading order
[16], while spin corrections are not known to such high
order. To study the leading order effects of the central
body’s multipole moments on the inspiral waveform, in
the test mass limit µ ≪ M , one has to correct both the
conservative and dissipative pieces of the forces on the
bodies. For the conservative pieces, it suffices to use the
Newtonian action for a binary with an additional multi-
pole interaction potential. For the dissipative pieces, the
multipole corrections to the fluxes at infinity of the con-
served quantities can simply be added to the known PN
point mass results. The lowest order spin-orbit coupling
effects on the gravitational radiation were first derived by
Kidder [17], then extended by Ryan [14, 15], Gergely [18],
and Will [19]. Recently, the corrections of O(ǫ2) beyond
the leading order to the spin-orbit effects on the fluxes
were derived [20, 21]. Corrections to the waveform due
to the quadrupole - mass monopole interaction were first
considered by Poisson [22], who derived the effect on the
time averaged energy flux for circular equatorial orbits.
Gergely [23] extended this work to generic orbits and
computed the radiative instantaneous and time averaged
rates of change of energy E, magnitude of angular mo-
mentum |L|, and the angle κ = cos−1(S ·L) between the
spin S and orbital angular momentum L. Instead of the
Carter constant, Gergely identified the angular average
of the magnitude of the orbital angular momentum, L̄, as
a constant of motion. The fact that to post-2-Newtonian
(2PN) order there is no time averaged secular evolution of
the spin allowed Gergely to obtain expressions for L̇ and
κ̇ from the quadrupole formula for the evolution of the
total angular momentum J = L+S. In a different paper,
Gergely [18] showed that in addition to the quadrupole,
self-interaction spin effects also contribute at 2PN order,
which was seen previously in the black hole perturbation
calculations of Shibata et al. [24]. Gergely calculated
the effect of this interaction on the instantaneous and
time-averaged fluxes of E and |L| but did not derive the
evolution of the third constant of motion.
In this paper, we will re-examine the effects of the
quadrupole moment of the black hole and of the leading
order spin self interaction. For a black hole, our analysis
will thus contain all effects that are quadratic in spin to
the leading order in ǫ2 and in µ/M . Our work will extend
earlier work by
• Considering generic orbits.
• Using a natural generalization of the Carter-type
constant that can be defined for two point particles
when one of them has a quadrupole. This facilitates
applying our analysis to Kerr inspirals.
• Computing instantaneous as well as time-averaged
fluxes for all three constants of motion: energy
E, z-component of angular momentum Lz, and
Carter-type constant K. For most purposes, only
time-averaged fluxes are needed as only they are
gauge invariant and physically relevant. However,
there is one effect for which the time-averaged
fluxes are insufficient, namely transient resonances
that occur during an inspiral in Kerr in the vicin-
ity of geodesics for which the radial and azimuthal
frequencies are commensurate [10, 25]. The instan-
taneous fluxes derived in this paper will be used in
[10] for studying the effect of these resonances on
the gravitational wave phasing.
We will analyze the effect of gravitational radiation re-
action on orbits around a body with an axisymmetric
mass quadrupole moment Q to leading order in Q, to the
leading post-Newtonian order, and to leading order in the
mass ratio. With these approximations the adiabatic ap-
proximation holds: gravitational radiation reaction takes
place over a timescale much longer than the orbital pe-
riod, so the orbit looks geodesic on short timescales. We
follow Ryan’s method of computation [14]: First, we cal-
culate the orbital motion in the absence of radiation re-
action and the associated constants of motion. Next, we
use the leading order radiation reaction accelerations that
act on the particle (given by the Burke-Thorne formula
[26] augmented by the relevant spin corrections [14]) to
compute the evolution of the constants of motion. In the
adiabatic limit, the time-averaged rates of change of the
constants of motion can be used to infer the secular or-
bital evolution. Our results show that a mass quadrupole
has the same qualitative effect on the evolution as spin: it
tends to circularize eccentric orbits and drive the orbital
plane towards antialignment with the symmetry axis of
the quadrupole.
The relevance of our result to point particles inspi-
ralling into black holes is as follows. The vacuum space-
time geometry around any stationary body is completely
characterized by the body’s mass multipole moments
IL = Ia1,a2...al and current multipole moments SL =
Sa1,a2...al [27]. These moments are defined as coefficients
in a power series expansion of the metric in the body’s
local asymptotic rest frame [28]. For nearly Newtonian
sources, they are given by integrals over the source as
IL ≡ Ia1,...al =
ρx<a1 . . . xal>d
3x, (1.1)
SL ≡ Sa1,...al =
ρxpvqǫpq<a1xa2 . . . xal>d
3x.(1.2)
Here ρ is the mass density and vq is the velocity, and ”<
· · · >” means ”symmetrize and remove all traces”. For
axisymmetric situations, the tensor multipole moments
IL (SL) contain only a single independent component,
conventionally denoted by Il (Sl) [27]. For a Kerr black
hole of mass M and spin S, these moments are given by
Il + iSl =M
l+1(ia)l, (1.3)
where a is the dimensionless spin parameter defined by
a = |S|/M2. Note that Sl = 0 for even l and Il = 0 for
odd l.
Consider now inspirals into an axisymmetric body
which has some arbitrary mass and current multipoles
Il and Sl. Then we can consider effects that are linear in
Il and Sl for each l, effects that are quadratic in the mul-
tipoles proportional to IlIl′ , IlSl′ , SlSl′ , effects that are
cubic, etc. For a general body, all these effects can be sep-
arated using their scalings, but for a black hole, Il ∝ al
for even l and Sl ∝ al for odd l [see Eq.(1.3)], so the ef-
fects cannot be separated. For example, a physical effect
that scales as O(a2) could be an effect that is quadratic
in the spin or linear in the quadrupole; an analysis in
Kerr cannot distinguish these two possibilities. For this
reason, it is useful to analyze spacetimes that are more
general than Kerr, characterized by arbitrary Il and Sl,
as we do in this paper. For recent work on computing
exact metrics characterized by sets of moments Il and Sl,
see Refs. [29, 30] and references therein.
The leading order effect of the black hole’s multipoles
on the inspiral is the O(a) effect computed by Ryan [15].
This O(a) effect depends linearly on the spin S1 and is
independent of the higher multipoles Sl and Il since these
all scale as O(a2) or smaller. In this paper we compute
the O(a2) effect on the inspiral, which includes the lead-
ing order linear effect of the black hole’s quadrupole (lin-
ear in I2 ≡ Q) and the leading order spin self-interaction
(quadratic in S1).
We next discuss how these O(a2) effects scale with the
post-Newtonian expansion parameter ǫ. Consider first
the conservative orbital dynamics. Here it is easy to see
that fractional corrections that are linear in I2 scale as
O(a2ǫ4), while those quadratic in S1 scale as O(a
2ǫ6).
Thus, the two types of terms cleanly separate. We com-
pute only the leading order, O(a2ǫ4), term. For the dissi-
pative contributions to the orbital motion, however, the
scalings are different. There are corrections to the radi-
ation reaction acceleration whose fractional magnitudes
are O(a2ǫ4) from both types of effects linear in I2 and
quadratic in S1. The effects quadratic in S1 are due to
the backscattering of the radiation off the piece of space-
time curvature due to the black hole’s spin. This effect
was first pointed out by Shibata et al. [24], who com-
puted the time-averaged energy flux for circular orbits
and small inclination angles based on a PN expansion of
black hole perturbations. Later, Gergely [18] analyzed
this effect on the instantaneous and time-averaged fluxes
of energy and magnitude of orbital angular momentum
within the PN framework.
The organization of this paper is as follows. In Sec.
II, we study the conservative orbital dynamics of two
point particles when one particle is endowed with an ax-
isymmetric quadrupole, in the weak field regime, and to
leading order in the mass ratio. In Sec. III, we com-
pute the radiation reaction accelerations and the instan-
taneous and time-averaged fluxes. In order to have all
the contributions at O(a2ǫ4) for a black hole, we include
in our computations of radiation reaction acceleration
the interaction that is quadratic in the spin S1. The ap-
plication to black holes in Sec. IV briefly discusses the
qualitative predictions of our results and also compares
with previous results.
The methods used in this paper can be applied only
to the black hole spin (as analyzed by Ryan [14]) and
the black hole quadrupole (as analyzed here). We show
in Sec. V that for the higher order mass and current
multipole moments taken individually, an analog of the
Carter constant cannot be defined to the order of our
approximations. We then show that under mild assump-
tions, this non-existence result can be extended to exact
spacetimes, thus falsifying the conjecture that all vac-
uum axisymmetric spacetimes possess a third constant
of geodesic motion.
II. EFFECT OF AN AXISYMMETRIC MASS
QUADRUPOLE ON THE CONSERVATIVE
ORBITAL DYNAMICS
Consider two point particles m1 and m2 interacting in
Newtonian gravity, where m2 ≪ m1 and where the mass
m1 has a quadrupole moment Qij which is axisymmetric:
Qij =
d3xρ(r)
xixj −
r2δij
(2.1)
ninj −
. (2.2)
For a Kerr black hole of mass M and dimensionless spin
parameter a with spin axis along n, the quadrupole scalar
is Q = −M3a2.
The action describing this system, to leading order in
m2/m1, is
µv2 − µΦ(r)
, (2.3)
where v = ṙ is the velocity, the potential is
Φ(r) = −M
xixjQij , (2.4)
µ is the reduced mass and M the total mass of the bi-
nary, and we are using units with G = c = 1. We work to
linear order in Q, to linear order in m2/m1, and to lead-
ing order in M/r. In this regime, the action (2.3) also
describes the conservative effect of the black hole’s mass
quadrupole on bound test particles in Kerr, as discussed
in the introduction. We shall assume that the quadrupole
Qij is constant in time. In reality, the quadrupole will
evolve due to torques that act to change the orientation
of the central body. An estimate based on treating m1 as
a rigid body in the Newtonian field ofm2 gives the scaling
of the timescale for the quadrupole to evolve compared to
the radiation reaction time as (see Appendix I for details)
Tevol
(2.5)
Here, we have denoted the dimensionless spin and
quadrupole of the body by S̄ and Q̄ respectively, and
the last relation applies for a Kerr black hole. Since
µ/M ≪ 1, the first factor in Eq. (2.5) will be large, and
since 1/a ≥ 1 and for the relativistic regime M/r ∼ 1,
the evolution time is long compared to the radiation re-
action time. Therefore we can neglect the evolution of
the quadrupole at leading order.
This system admits three conserved quantities, the en-
µv2 + µΦ(r), (2.6)
the z-component of angular momentum
Lz = ez · (µr× v), (2.7)
and the Carter-type constant
K = µ2(r× v)2 − 2Qµ
(n · r)2
(n · v)2 −
. (2.8)
(See below for a derivation of this expression for K).
A. Conservative orbital dynamics in a
Boyer-Lindquist-like coordinate system
We next specialize to units where M = 1. We also
define the rescaled conserved quantities by Ẽ = E/µ,
L̃z = Lz/µ, K̃ = K/µ
2, and drop the tildes. These spe-
cializations and definitions have the effect of eliminating
all factors of µ and M from the analysis. In spherical
polar coordinates (r, θ, ϕ) the constants of motion E and
Lz become
(ṙ2 + r2θ̇2 + r2 sin2 θϕ̇2)−
(1− 3 cos2 θ), (2.9)
Lz = r
2 sin2 θϕ̇. (2.10)
In these coordinates, the Hamilton-Jacobi equation is not
separable, so a separation constant K cannot readily be
derived. For this reason we switch to a different coordi-
nate system (r̃, θ̃, ϕ) defined by
r cos θ = r̃ cos θ̃
r sin θ = r̃ sin θ̃
. (2.11)
We also define a new time variable t̃ by
cos(2θ̃)
dt̃. (2.12)
The action (2.3) in terms of the new variables to linear
order in Q is
r̃2 sin2 θ̃
sin2 θ̃
. (2.13)
However, a difficulty is that the action (2.13) does not
give the same dynamics as the original action (2.3). The
reason is that for solutions of the equations of motion for
the action (2.3), the variation of the action vanishes for
paths with fixed endpoints for which the time interval ∆t
is fixed. Similarly, for solutions of the equations of motion
for the action (2.13), the variation of the action vanishes
for paths with fixed endpoints for which the time interval
∆t̃ is fixed. The two sets of varied paths are not the
same, since ∆t 6= ∆t̃ in general. Therefore, solutions of
the Euler-Lagrange equations for the action (2.3) do not
correspond to solutions of the Euler-Lagrange equations
for the action (2.13). However, in the special case of zero-
energy motions, the extra terms in the variation of the
action vanish. Thus, a way around this difficulty is to
modify the original action to be
µv2 − µΦ(r) + E
. (2.14)
This action has the same extrema as the action (2.3),
and for motion with physical energy E, the energy com-
puted with this action is zero. Transforming to the new
variables yields, to linear order in Q:
r̃2 sin2 θ̃
sin2 θ̃
+ E − QE
cos(2θ̃)
. (2.15)
The zero-energy motions for this action coincide with the
zero energy motions for the action (2.14). We use this
action (2.15) as the foundation for the remainder of our
analysis in this section.
The z-component of angular momentum in terms of
the new variables (r̃, θ̃, ϕ, t̃) is
Lz = r̃
2 sin2 θ̃
sin2 θ̃
. (2.16)
We now transform to the Hamiltonian:
p2r̃ −
− E − Q
sin2 θ̃
+QE cos(2θ̃)
(2.17)
and solve the Hamiltonian Jacobi equation. Denoting
the separation constant by K we obtain the following
two equations for the r̃ and θ̃ motions:
= 2E +
, (2.18)
= K − L
sin2 θ̃
−QE cos(2θ̃). (2.19)
Note that the equations of motion (2.18) and (2.19) have
the same structure as the equations of motion for Kerr
geodesic motion. Using Eqs. (2.18), (2.19) and (2.16)
together with the inverse of the transformation (2.11)
to linear order in Q, we obtain the expression for K in
spherical polar coordinates:
K = r4(θ̇2 + sin2 θϕ̇2) +Q(ṙ cos θ − rθ̇ sin θ)2 + Q
(ṙ2 + r2 θ̇2 + r2 sin2 θϕ̇2)− 2Q
cos2 θ. (2.20)
This is equivalent to the formula (2.8) quoted earlier.
B. Effects linear in spin on the conservative orbital
dynamics
To include the linear in spin effects, we repeat Ryan’s
analysis [14, 15] (he only gives the final, time averaged
fluxes; we will also give the instantaneous fluxes). We
can simply add these linear in spin terms to our results
because any terms of order O(SQ) will be higher than
the order a2 to which we are working. The correction to
the action (2.3) due to spin-orbit coupling is
Sspin−orbit =
−2µSn
iǫijkxj ẋk
. (2.21)
We will restrict our analysis to the case when the unit
vectors ni corresponding to the axisymmetric quadrupole
Qij and to the spin Si coincide, as they do in Kerr.
Including the spin-orbit term in the action (2.3) results
in the following modified expressions for Lz and K:
Lz = n · (µr× v)−
[r2 − (n · r)2], (2.22)
K = (r× v)2 − 4S
n · (r× v)− 2Q
(n · r)2
(n · v)2 − 1
. (2.23)
In terms of the Boyer-Lindquist like coordinates, the con-
served quantities with the linear in spin terms included
Lz = r̃
2 sin2 θ̃
sin2 θ̃ −Q sin4 θ̃
(2.24)
K = r4(θ̇2 + sin2 θϕ̇2)− 4Sr sin2 θϕ̇
cos2 θ +Q(ṙ cos θ − rθ̇ sin θ)2 + QM
(ṙ2 + r2θ̇2 + r2 sin2 θϕ̇2). (2.25)
The equations of motion are
= 2E+
− 4SLz
, (2.26)
= K −
sin2 θ̃
−QE cos(2θ̃). (2.27)
III. EFFECTS LINEAR IN QUADRUPOLE AND
QUADRATIC IN SPIN ON THE EVOLUTION OF
THE CONSTANTS OF MOTION
A. Evaluation of the radiation reaction force
The relative acceleration of the two bodies can be writ-
ten as
a = −∇Φ(r) + arr, (3.1)
where arr is the radiation-reaction acceleration. Combin-
ing this with Eqs. (2.6), (2.22) and (2.23) for E, Lz and
K gives the following formulae for the time derivatives of
the conserved quantities:
Ė = v · arr, (3.2)
L̇z = n · (r× arr), (3.3)
K̇ = 2(r× v) · (r× arr)−
n · (r× arr)
+2Q(n · v) (n · arr)−Qv · arr. (3.4)
The standard expression for the leading order radiation
reaction acceleration acting on one of the bodies is [31]:
ajrr = −
jk xk +
ǫjpqS
pk xkxq +
ǫjpqS
pk xkvq
ǫpq[jS
xqvk. (3.5)
Here the superscripts in parentheses indicate the number
of time derivatives and square brackets on the indices
denote antisymmetrization.
The multipole moments Ijk(t) and Sjk(t) in Eq. (3.5)
are the total multipole moments of the spacetime, i.e.
approximately those of the black hole plus those due to
the orbital motion. The expression (3.5) is formulated
in asymptotically Cartesian mass centered (ACMC) co-
ordinates of the system, which are displaced from the
coordinates used in Sec. II by an amount [28]
δr(t) = − µ
r(t). (3.6)
This displacement contributes to the radiation reaction
acceleration in the following ways:
1. The black hole multipole moments Il and Sl, which
are time-independent in the coordinates used in
Sec. II, will be displaced by δr and thus will con-
tribute to the (l + 1)th ACMC radiative multipole
[28].
2. The constants of motion are defined in terms of the
black hole centered coordinates used in Sec. II, so
the acceleration arr we need in Eqs. (3.2) – (3.4)
is the relative acceleration. This requires calculat-
ing the acceleration of both the black hole and the
point mass in the ACMC coordinates using (3.5),
and then subtracting to find arr = a
rr − aMrr [14].
To leading order in µ, the only effect of the acceler-
ation of the black hole is via a backreaction of the
radiation field: the lth black hole moments couple
to the (l+1)th radiative moments, thus producing
an additional contribution to the acceleration.
For our calculations at O(S1ǫ
3), O(I2ǫ
4), O(S21ǫ
4), we
can make the following simplifications:
• quadrupole corrections: The fractional corrections
linear in I2 = Q that scale as O(a
2ǫ4) require only
the effect of I2 on the conservative orbital dynamics
as computed in Sec. IIA and the Burke-Thorne for-
mula for the radiation reaction acceleration [given
by the first term in Eq. (3.5)].
• spin-spin corrections: As discussed in the intro-
duction, the fractional corrections quadratic in S1
to the conservative dynamics scale as O(a2ǫ6) and
are subleading order effects which we neglect. At
O(a2ǫ4), the only effect quadratic in S1 is the
backscattering of the radiation off the spacetime
curvature due to the spin. As discussed in item 1.
above, the black hole’s current dipole Si = S1δi3
(taking the z-axis to be the symmetry axis) will
contribute to the radiative current quadrupole an
amount
ij = −
S1xiδj3. (3.7)
The black hole’s current dipole Si will couple to
the gravitomagnetic radiation field due to Sij as
discussed in item 2. above, and contribute to the
relative acceleration as [14]:
aj spinrr =
S1δi3S
ij . (3.8)
For our purposes of computing terms quadratic in
the spin, we substitute S
ij for Sij in Eq. (3.8).
Evaluating these quadratic in spin terms requires
only the Newtonian conservative dynamics, i.e. the
results of Sec. II and Eqs. (3.2) – (3.4) with the
quadrupole set to zero.
• linear in spin corrections: Contributions to these
effects are from Eq. (3.5) with the current
quadrupole replaced by just the spin contribution
(3.7), and from Eq. (3.8) evaluated using only the
orbital current quadrupole.
With these simplifications, we replace the expression
(3.5) for the radiation reaction acceleration with
ajrr = −
jk xk +
ǫjpqS
(6) spin
pk xkxq
ǫjpqS
(5) spin
pk xkvq +
ǫpq[jS
(5) spin
S1δi3
(5) orbit
ij + S
(5) spin
. (3.9)
To justify these approximations, consider the scaling of
the contribution of black hole’s acceleration to the orbital
dynamics. The mass and current multipoles of the black
hole contribute terms to the Hamiltonian that scale with
∆H ∼ Slǫ2l+3 & Ilǫ2l+2. (3.10)
Since the Newtonian energy scales as ǫ2, the fractional
correction to the orbital dynamics scale as
∆H/E ∼ Slǫ2l+1 & Ilǫ2l. (3.11)
To O(ǫ4), the only radiative multipole moments that con-
tribute to the acceleration (3.5) are the mass quadrupole
I2, the mass octupole I3, and the current quadrupole S2
(cf. [17]). Since we are focusing only on the leading or-
der terms quadratic in spin (these can simply be added
to the known 2PN point particle and 1.5PN linear in spin
results), the only terms in Eq. (3.5) relevant for our pur-
poses are those given in Eq. (3.9). The results from a
computation of the fully relativistic metric perturbation
for black hole inspirals [24] show that quadratic in spin
corrections to the l = 2 piece compared to the flat space
Burke-Thorne formula first appear at O(a2ǫ4), which is
consistent with the above arguments.
B. Instantaneous fluxes
We evaluate the radiation reaction force as follows.
The total mass and current quadrupole moment of the
system are
QTij = Qij + µxixj , (3.12)
STij = S
ij + xiǫjkmxkẋm, (3.13)
where from Eq. (2.11)
r̃ sin θ̃
cosϕ, r̃ sin θ̃
sinϕ,
r̃ cos θ̃
. (3.14)
Only the second term in Eq. (3.12) contributes to the
time derivative of the quadrupole. We differentiate five
times by using
cos(2θ̃)
, (3.15)
to the order we are working as discussed above. Af-
ter each differentiation, we eliminate any occurrences of
dϕ/dt̃ using Eq. (2.24), and we eliminate any occurrences
of the second order time derivatives d2r̃/dt̃2 and d2θ̃/dt̃2
in favor of first order time derivatives using (the time
derivatives of) Eqs. (2.26) and (2.27). For computing the
terms linear and quadratic in S1, we set the quadrupole
Q to zero in all the formulae. We insert the resulting ex-
pression into the formula (3.9) for the self-acceleration,
and then into Eqs. (3.2) – (3.4). We eliminate (dr̃/dt̃)2,
(dθ̃/dt̃)2, and (dϕ/dt̃) in favor of E, Lz, and K using
Eqs. (2.24) – (2.27). In the final expressions for the in-
stantaneous fluxes, we keep only terms that are of O(S),
O(Q) and O(S2) and obtain the following results:
15r̃4
− 40K
272KE
196K2 +
r̃2 − 3668
Kr̃ − 352KEr̃2 + 1024
Er̃3 +
E2r̃4
−49K2 − 169KL2z + r̃
+ 2r̃2
+ 47KE +
− 152
r̃3E − 16r̃4E2
−562K2 +
Kr̃ −
r̃2 +
KEr̃2 −
r̃3E − 160r̃4E2
cos(2θ̃)
sin(2θ̃)
439K − 926
r̃ − 1528
θ ˙̃r
−K2 + 22
Kr̃ − 28
r̃2 +
KEr̃2 − 236
r̃3E − 32
r̃4E2
cos(2θ̃)− r̃3 sin(2θ̃)
θ ˙̃r
−49K2 + 6KL2z + 2r̃
63K − 16
L2z −
+ r̃2
112KE − 48
− 1652
r̃3E − 224
r̃4E2
, (3.16)
L̇z =
144LzE
− 24KLz
−50K2 + 240KL2z +
Kr̃ − 7376
L2z r̃ +
r̃2 + 56KEr̃2 − 1824
EL2z r̃
Er̃3 +
E2r̃4
50K2 − 62
Kr̃ − 316
r̃2 − 56KEr̃2 − 624
Er̃3 − 128
E2r̃4
cos(2θ̃)
−104K + 64r̃ + 64Er̃2
sin(2θ̃) ˙̃r
660Er̃2 + 753r̃− 360L2z − 435K +
1601r̃+ 1512r̃2E − 1185K
cos 2θ̃
174QLz
sin(2θ̃) ˙̃r
2S2Lz
Er̃2 + 16r̃ − 9K
, (3.17)
20r̃ + 18r̃2E − 15K
280K2 − 14008
Kr̃ +
r̃2 +
Er̃3 − 2528
KEr̃2 +
E2r̃4
−45K2 + r̃L2z(83 + 80r̃E)− 115KL2z + 14Kr̃(6 + 5r̃E)
15r̃7
cos(2θ̃)
−2175K2 + 2975Kr̃+ 80r̃2 + 3012KEr̃2 − 112Er̃3 − 168E2r̃4
15r̃4
3075K − 20r̃ − 192Er̃2
sin(2θ̃)
θ ˙̃r
7K − 2L2z
−3K + 16
+K cos(2θ̃)
3K − 16
r̃ − 24
sin(2θ̃)
−4K +
θ ˙̃r. (3.18)
C. Alternative set of constants of the motion
A body in a generic bound orbit in Kerr traces an
open ellipse precessing about the hole’s spin axis. For
stable orbits the motion is confined to a toroidal region
whose shape is determined by E, Lz, K. The motion
can equivalently be characterized by the set of constants
inclination angle ι, eccentricity e, and semi-latus rectum
p defined by Hughes [32]. The constants ι, p and e are
defined by cos ι = Lz/
K, and by r̃± = p/(1± e), where
r̃± are the turning points of the radial motion, and r̃
is the Boyer-Lindquist radial coordinate. This param-
eterization has a simple physical interpretation: in the
Newtonian limit of large p, the orbit of the particle is an
ellipse of eccentricity e and semilatus rectum p on a plane
whose inclination angle to the hole’s equatorial plane is
ι. In the relativistic regime p ∼M , this interpretation of
the constants e, p, and ι is no longer valid because the
orbit is not an ellipse and ι is not the angle at which the
object crosses the equatorial plane (see Ryan [14] for a
discussion).
We adopt here analogous definitions of constants of
motion ι, e and p, namely
cos(ι) = Lz/
K, (3.19)
= r̃±. (3.20)
Here K is the conserved quantity (2.23) or (2.25), and r̃±
are the turning points of the radial motion using the r̃
coordinate defined by Eq. (2.11), given by the vanishing
of the right-hand side of Eq. (2.26).
We now rewrite our results in terms of the new con-
stants of the motion e, p and ι. We can use Eq. (2.26)
together with the equations (3.19) and (3.20) to write E,
Lz and K as functions of p, e and ι. To leading order in
Q and S we obtain
K = p
1− 2S cos ι
3 + e2
1 + e2
) 2Q cos2 ι
3 + e2
, (3.21)
E = − (1− e
2S cos ι
1− e2
1− e2
cos2 ι− 1
, (3.22)
p cos ι
1− S cos ι
(3 + e2)−
1 + e2
) Q cos2 ι
3 + e2
. (3.23)
As discussed in the introduction, the effects quadratic in
S on the conservative dynamics scale as O(a2ǫ6) and thus
are not included in this analysis to O(a2ǫ4).
Inserting these relations into the expressions (3.16)–
(3.18) gives, dropping terms of O(QS), O(Q2) and
O(QS2):
Ė = −
15p2r̃7
75p4 − 100p3r̃ + p2r̃2
11− 51e2
+ 32pr̃3
1− e2
)− 6r̃4
1− e2
4S cos ι
15p7/2r̃9
735p6 − 2751p5r̃ + 10p4r̃2(365− 6e2)− 128pr̃5(1− e2)2 − 48r̃6(e2 − 1)3
64S cos ι
15p3/2r̃6
5p(−23 + 3e2)− 3r̃(−9 + e2 + 8e4)
15p4r̃9
4005p6 − 6499p5r̃ + 2p4r̃2
1577− 1977e2
− 24r̃6
1− e2
)3 − 32p3r̃3
8− 33e2
+ 64pr̃5
1− 2e2 + e4
15p4r̃9
24p2r̃4
5− 27e2 + 22e4
− pr̃3 sin(2θ̃)
6585p2 − 4630pr̃+ 2292r̃2(1 − e2)
θ ˙̃r
15p4r̃9
2p2 cos(2θ̃)
4215p4 − 7495p3r̃ + 4p2r̃2(1151− 951e2)− 1012pr̃3(1− e2) + 300r̃4(1− 2e2 + e4)
15p4r̃9
cos(2ι)
2535p6 − 3307p5r̃ + 12p4r̃2(37− 237e2)− 48r̃6(1− e2)3 + 800p3r̃3(1 + e2) + 128pr̃5(1− 2e2 + e4)
15p2r̃5
cos(2ι)
1 + 2e2 − 3e4
15p2r̃9
84r̃4(1− e2)2(1 + e2)2 + 345p4 − 905p3r̃ − 413pr̃3(1− e2) + 2p2r̃2(446− 201e2)
15p2r̃9
cos(2θ̃)
15p4 − 110p3r̃ + 4p2r̃2(47− 12e2)− 118pr̃3(1− e2) + 24r̃4(1− e2)2(1 + e2)2
15r̃9
cos(2ι)
45p2 − 80pr̃ + 36r̃2(1− e2)
15pr̃6
sin(2θ̃) ˙̃r
15p2 + 10pr̃ − 12r̃2(1− e2)
, (3.24)
L̇z = −
8 cos ι
15p2 − 20pr̃ + 9r̃2(1 − e2)
15p2r̃7
525p4 − 1751p3r̃ + 34p2r̃2(61− 6e2) + 12pr̃3(−69 + 29e2) + 6r̃4(17 + 2e2 − 19e4)
15p2r̃7
375p4 − 93p3r̃ + 468pr̃3(1− e2)− 10p2r̃2(58 + 21e2)− 48r̃4(1 − 2e2 + e4)
cos(2θ̃)
15p2r̃7
450p4 − 922p3r̃ − 60pr̃3(3 + e2)− 9p2r̃2(−83 + 23e2) + 27r̃4(1 + 2e2 − 3e4)
cos(2ι)
13p2 − 8pr̃ + 4r̃2(1− e2)
sin(2θ̃) ˙̃r
− Q cos ι
5p5/2r̃7
615p4 − 753p3r̃ + 15p2r̃2
19− 31e2
+ 20pr̃3
1 + 3e2
+ 9r̃4
1− 6e2 + 5e4
− Q cos ι
5p1/2r̃7
cos(2θ̃)
1185p2 − 1601pr̃+ 756r̃2(1− e2)
− 2Q cos ι
5p5/2r̃7
2 cos(2ι)
45p4 − 18r̃4e2(1− e2)− 45p2r̃2(1 + e2) + 20pr̃3(1 + e2)
− 435p3r̃3 sin(2θ̃) ˙̃θ ˙̃r
2 cos ι
p1/2r̃7
9p2 − 16pr̃ + 36
r̃2(1− e2)
, (3.25)
20pr̃ − 15p2 − 9r̃2(1− e2)
8S cos ι
15p3/2r̃7
525p4 − 1751p3r̃ + 2p2r̃2(1172− 57e2) + 12pr̃3(−99 + 19e2)− 24r̃4(−11 + 4e2 + 7e4)
5p2r̃7
−615p4 + 753p3r̃ + 30p2r̃2(17e2 − 9) + 72r̃4e2(1− e2)− 40pr̃3(1 + 3e2)
5p2r̃7
cos(2ι)
−345p4 + 249p3r̃ − 160pr̃3(1 + e2) + 120p2r̃2(1 + 3e2) + 36r̃4(1 + 2e2 − 3e4)
15p2r̃7
2 cos(2θ̃)
2175p4 − 2975p3r̃ − 56pr̃3(1 − e2) + 2p2r̃2(713− 753e2) + 42r̃4(1 − 2e2 + e4)
15pr̃4
sin(2θ̃)
3075p2 − 20pr̃ + 96r̃2(1 − e2)
−9p2 + 16pr̃ − 36
r̃2(1− e2)
cos(2θ̃) + cos(2ι)
3p2 − 16
pr̃ +
r̃2(1 − e2)
sin(2θ̃) ˙̃r
−2p2 + 7
pr̃ − 4
r̃2(1− e2)
. (3.26)
D. Time averaged fluxes
In this section we will compute the infinite time-
averages 〈Ė〉, 〈L̇z〉 and 〈K̇〉 of the fluxes. These averages
are defined by
〈Ė〉 ≡ lim
∫ T/2
Ė(t)dt. (3.27)
These time-averaged fluxes are sufficient to evolve or-
bits in the adiabatic regime (except for the effect of res-
onances) [12, 25]. In Appendix II, we present two dif-
ferent ways of computing the time averages. The first
approach is based on decoupling the r̃ and θ̃ motion us-
ing the analog of the Mino time parameter for geodesic
motion in Kerr [12]. The second approach uses the ex-
plicit Newtonian parameterization of the orbital motion.
Both averaging methods give the following results:
〈Ė〉 = −32
(1− e2)3/2
e4 − S
cos(ι)
cos(2ι)
cos(2ι)
,(3.28)
〈L̇z〉 = −
(1 − e2)3/2
cos ι
e2 − S
2p3/2 cos ι
+ 7e2 +
cos(2ι)
45 + 148e2 +
cos(2ι)
1 + 3e2 +
, (3.29)
〈K̇〉 = −64
(1 − e2)3/2
e2 − S
2p3/2
+ 37e2 +
cos(ι)
cos(2ι)
cos(2ι)
. (3.30)
Using Eqs. (3.21) and (3.23), we obtain from (3.28) – (3.30) the following time averaged rates of change of the
orbital elements e, p, ι:
〈ṗ〉 = −64
(1− e2)3/2
− S cos(ι)
96p3/2
1064 + 1516e2 + 475e4
149e2
469e2
227e4
cos(2ι)
+ e2 +
[13− cos(2ι)]
, (3.31)
〈ė〉 = −
e(1− e2)3/2
121e2
Se(1− e2)3/2 cos(ι)
5p11/2
1172 + 932e2 +
1313e4
Q(1− e2)3/2
785e2
219e4
+ 13e6 +
2195e2
+ 251e4 +
218e6
cos(2ι)
S2e(1− e2)3/2
2 + 3e2 +
[13− cos(2ι)] , (3.32)
〈ι̇〉 = S sin(ι)(1 − e
2)3/2
p11/2
1− e2
S2 sin(2ι)
240p6
8 + 3e2
8 + e2
Q cot(ι)(1 − e2)3/2
312 + 736e2 − 83e4 −
408 + 1268e2 + 599e4
cos(2ι)
. (3.33)
IV. APPLICATION TO BLACK HOLES
A. Qualitative discussion of results
The above results for the fluxes, Eqs. (3.31), (3.32)
and (3.33) show that the correction terms at O(a2ǫ4)
due to the quadrupole have the same type of effect on the
evolution as the linear spin correction computed by Ryan:
they tend to circularize eccentric orbits and change the
angle ι such as to become antialigned with the symmetry
axis of the quadrupole.
The effects of the terms quadratic in spin are quali-
tatively different. In the expression (3.28) for 〈Ė〉, the
coefficient of cos(2ι) due to the spin self-interaction has
the same sign as the quadrupole term, while the terms
not involving ι have the opposite sign. The terms in-
volving cos(2ι) in Eq. (3.30) for 〈K̇〉 of O(Q) and O(S2)
terms have the same sign, while the terms not involving
ι have the opposite sign. The fractional spin-spin cor-
rection to 〈L̇z〉, Eq. (3.29), has no ι-dependence, and in
expression (3.33) for 〈ι̇〉, the dependence on ι of the two
effects O(Q) and O(S2) is different, too. This is not sur-
prising as the O(Q) effects included here are corrections
to the conservative orbital dynamics, while the effects of
O(S2) that we included are due to radiation reaction.
B. Comparison with previous results
The terms linear in the spin in our results for the time
averaged fluxes, Eqs. (3.28) – (3.33), agree with those
computed by Ryan, Eqs. (14a) – (15c) of [15], and with
those given in Eqs. (2.5) – (2.7) of Ref. [33], when we use
the transformations to the variables used by Ryan given
in Eqs. (2.3) – (2.4) in [33].
Equation (3.28) for the time averaged energy flux
agrees with Eq. (3.10) of Gergely [23] and Eq. (4.15)
of [18] when we use the following transformations:
K = L̄2
Ā2 sin2 κ cos δ − (1− Ā2) cos2 κ
= L̄2
E cos2 κ
(1 + 2L̄2) sin2 κ cos δ
, (4.1)
cos ι = cosκ
E cos2 κ
(1 + 2L̄2) sin2 κ cos δ
, (4.2)
(δ + κ), (4.3)
ξ0 = (ψ0 − ψi) +
, (4.4)
where Ā, L̄, κ, δ, ψ0 and ψi are the quantities used by
Gergely. The first relation here is obtained from the turn-
ing points of the radial motion as follows. We compute
r̃± in terms of E and K and map these expressions back
to r using Eqs. (2.11). The result can then be com-
pared with the turning points in Gergely’s variables, Eq.
(2.19) of [23], using the fact that E is the same in both
cases. Instead of the evolution of the constants of motion
K and Lz, Gergely computes the rates of change of the
magnitude L of the orbital angular momentum and of the
angle κ defined by cosκ = (L · S)/L. Using the trans-
formations (4.1) – (4.4) and the definition of κ we verify
that our Eq. (3.29) agrees with the 〈L̇z〉 computed using
Gergely’s Eqs. (3.23) and (3.35) in [23] and Eq. (4.30)
of [18].
In the limit of the circular equatorial orbits analyzed
by Poisson [22], our Eq. (3.28) agrees with Poisson’s Eq.
(22) when we use the transformations and specializations:
, (4.5)
ι = 0, (4.6)
e2 = 0, (4.7)
cosαA = 1, (4.8)
where v and αA are the variables used by Poisson and the
relation (4.5) is obtained by comparing the expressions
for the constants of motion in the two sets of variables.
The main improvement of our analysis over Gergely’s
is that we express the results in terms of the Carter-type
constant K, which facilitates comparing our results with
other analyses of black hole inspirals. Our computations
also include the spin curvature scattering effects for all
three constants of motion; Gergely [18] only considers
these effects for two of them: the energy and magnitude
of angular momentum, not for the third conserved quan-
tity.
When we expand Eq. (3.28) for small inclination an-
gles and specialize to circular orbits, then after converting
p to the parameter v using Eq. (4.5), we obtain
〈Ė〉 = − 32
11Q− S
= − 32
33− 527
. (4.9)
This result agrees with the terms atO(a2v4) of Eq. (3.13)
of Shibata et al. [24], whose calculations were based on
the fully relativistic expressions. This agreement is a
check that we have taken into account all the contribu-
tions at O(a2ǫ4). The analysis in Ref. [24] could not dis-
tinguish between effects due to the quadrupole and those
due curvature scattering, but we can see from Eq. (4.9)
that those two interactions have the opposite dependence
on ι. Comparing (4.9) with Eq. (3.7) of [24] (which gives
the fluxes into the different modes (l = 2,m, n), where m
and n are the multiples of the ϕ and θ frequencies), we see
that the terms in the (2,±2, 0) and the (2,±1,±1) modes
are entirely due to the quadrupole, while the spin-spin in-
teraction effects are fully contained in the (2,±1, 0) and
(2, 0,±1) modes.
V. NON-EXISTENCE OF A CARTER-TYPE
CONSTANT FOR HIGHER MULTIPOLES
In this section, we show that for a single axisymmetric
multipole interaction, it is not possible to find an ana-
log of the Carter constant (a conserved quantity which
does not correspond to a symmetry of the Lagrangian),
except for the cases of spin (treated by Ryan [15]) and
mass quadrupole moment (treated in this paper). Our
proof is valid only in the approximations in which we
work – expanding to linear order in the mass ratio, to
the leading post-Newtonian order, and to linear order in
the multipole. However we will show below that with
very mild additional smoothness assumptions, our non-
existence result extends to exact geodesic motion in exact
vacuum spacetimes.
We start in Sec. VA by showing that there is no co-
ordinate system in which the Hamilton-Jacobi equation
is separable. Now separability of the Hamilton-Jacobi
equation is a sufficient but not a necessary condition for
the existence of a additional conserved quantity. Hence,
this result does not yield information about the existence
or non-existence of an additional constant. Nevertheless
we find it to be a suggestive result. Our actual derivation
of the non-existence is based on Poisson bracket compu-
tations, and is given in Sec. VB.
A. Separability analysis
Consider a binary of two point masses m1 and m2,
where the mass m1 is endowed with a single axisymmet-
ric current multipole moment Sl or axisymmetric mass
multipole moment Il. In this section, we show that the
Hamilton-Jacobi equation for this motion, to linear order
in the multipoles, to linear order in the mass ratio and to
the leading post-Newtonian order, is separable only for
the cases S1 and I2.
We choose the symmetry axis to be the z-axis and write
the action for a general multipole as
ṙ2 + r2θ̇2 + r2 sin2 θϕ̇2
+ f(r, θ) + g(r, θ)ϕ̇+ E] . (5.1)
For mass moments, g(r, θ) = 0, while for current mo-
ments f(r, θ) = 0. For an axisymmetric multipole of
order l, the functions f and g will be of the form
f(r, θ) =
clIlPl(cos θ)
, g(r, θ) =
dlSl sin θ∂θPl(cos θ)
(5.2)
where Pl(cos θ) are the Legendre polynomials and cl and
dl are constants. We will work to linear order in f and g.
In Eq. (5.1), we have added the energy term needed when
doing a change of time variables, cf. the discussion before
Eq. (2.14) in section III. Since ϕ is a cyclic coordinate,
pϕ = Lz is a constant of motion and the system has
effectively only two degrees of freedom. Note that in the
case of a current moment, there will be correction term
in Lz:
Lz = r
2 sin2 θϕ̇+ g(r, θ). (5.3)
Next, we switch to a different coordinate system
(r̃, θ̃, ϕ) defined by
r = r̃ + α(r̃, θ̃, Lz), (5.4)
θ = θ̃ + β(r̃, θ̃, Lz), (5.5)
where the functions α and β are yet undetermined. We
also define a new time variable t̃ by
1 + γ(r̃, θ̃, Lz)
dt̃. (5.6)
Since we work to linear order in f and g, we can work
to linear order in α, β, and γ. We then compute the
action in the new coordinates and drop the tildes. The
Hamiltonian is given by
p2r(1 + γ − 2α,r) +
(1− 2α
− 2β,θ + γ)
(−α,θ − r2β,r)− E(1 + γ)
2r2 sin2 θ
(1 + γ − 2α
− 2β cot θ)
(1− α
+ γ)− f − gLz
r2 sin2 θ
(5.7)
and the corresponding Hamilton-Jacobi equation is
Ĉ1 +
+ 2V̂ , (5.8)
where we have denoted
Ĉ1 = J(r, θ) [1 + γ − 2α,r] = 1 + γ − 2α,r + j, (5.9)
Ĉ2 = J(r, θ)
1− 2α
− 2β,θ + γ
= 1− 2α
− 2β,θ + γ + j, (5.10)
Ĉ3 = J(r, θ)
−α,θ − r2β,r
= −α,θ − r2β,r, (5.11)
V̂ = J(r, θ)
2r2 sin2 θ
(1 + γ − 2α
− 2β cot θ)
+ γ)− E(1 + γ)
− f − gLz
r2 sin2 θ
2r2 sin2 θ
(1 + γ − 2α
− 2β cot θ + j)
−E(1 + γ + j)− 1
(1− α
+ γ + j)
−f − gLz
r2 sin2 θ
. (5.12)
The unperturbed problem is separable, so make the
perturbed problem separable, we have multiplied the
Hamilton-Jacobi equation by an arbitrary function
J(r, θ), which can be expanded as J(r, θ) = 1 + j(r, θ),
where j(r, θ) is a small perturbation.
To find a solution of the form W =Wr(r)+Wθ(θ), we
first specialize to the case where Ĉ3 = 0:
− Ĉ3 = β,rr2 + α,θ = 0. (5.13)
We differentiate Eq. (5.8) with respect to θ, using Eq.
(5.8) to write (dWr/dr)
2 in terms of (dWθ/dθ)
2 and then
differentiate the result with respect to r to obtain
∂θĈ2
∂θĈ1
2V̂ ∂θĈ1
Ĉ1Ĉ2
. (5.14)
Expanding Eq. (5.14) to linear order in the small quan-
tities then yields the two conditions for the kinetic and
the potential part of the Hamiltonian to be separable:
0 = ∂r∂θ
2α,r −
− 2β,θ
, (5.15)
sin2 θ
2β,r cot
2 θ − 3β,rθ cot θ + β,r csc2 θ
sin2 θ
+ α,rθ
−∂r∂θ
Pl(cos θ) +
dlSlLz
rl sin θ
∂θPl(cos θ)
2α,rθ −
+ 2Er2α,rθ
, (5.16)
where we have used Eq. (5.2) for f and g. Therefore, the
following conditions must be satisfied:
M4(θ)−N(r) =
+ β,θ − 2α,r, (5.17)
M1(θ) = 2β cot
2 θ + β csc2 θ + β,θθ
−3β,θ cot θ, (5.18)
M2(θ) = r
2∂r(r
2β,r), (5.19)
M3(θ) = 2rα,rθ − α,θ +
∂θPl(cos θ)
−SlLz
∂θ(csc θ ∂θPl(cos θ)). (5.20)
Here, the functions M and N are arbitrary integration
constants.
Solving the condition for the kinetic term to be sep-
arable, Eq. (5.17), together with Eq. (5.13) gives the
general solution that goes to zero at large r as
cos(nθ + ν), (5.21)
β = − A
sin(nθ + ν), (5.22)
where A and ν are arbitrary and n is an integer. These
functions must satisfy the conditions (5.18) – (5.20) in
order for the potential term to be separable as well. To
see when this will be the case, we start by considering Eq.
(5.20). Substituting the general ansatz α = a1(r)a2(θ)
shows that a′2 = P
l or a
2 = (cscθ P
′ depending on
whether a mass or a current multipole is present. The
function a1(r) is then determined from
0 = 2ra′1 − a1 +
clIl/r
(l−1)
dlSlLz/r
(5.23)
Hence,
[clIl/(2l)] r
(1−l)
[dlSlLz/(2l+ 1)] r
(5.24)
so that we obtain for mass moments
Pl(cos θ)
, β =
P ′l (cos θ)
(5.25)
and for current moments
dlSlLz
2l+ 1
csc θP ′l (cos θ)
, (5.26)
dlSlLz
(2l+ 1)(l + 1)
(csc θ P ′l (cos θ))
, (5.27)
where we have used the condition (5.13) to solve for β.
Substituting this in Eq. (5.19) determines that l = 2
for mass moments and l+1 = 2 for current moments. For
an l = 2 mass moment, conditions (5.17) and (5.18) are
satisfied as well, with n = 2 and ν = 0. For the case of an
l = 1 current moment, the extra term inH is independent
of θ anyway. But for any other multipole interaction,
the Hamilton-Jacobi equation will not be separable. For
example, for the current octupole Sijk, the last term in
Eq. (5.7) is proportional to S3Lz(5 cos
2 θ − 1)/r5 and
is therefore not separable. From Eq. (5.2) one can see
that, for a general multipole, the functions f or g contain
different powers of cos θ appearing with the same power
of r since the Legendre polynomials can be expanded as
[34]:
Pl(cos θ) =
(−1)n(2l− 2n)!
2ln!(l − n)!(l − 2n)!
(cos θ)l−2n, (5.28)
where N = l/2 for even l and N = (l + 1)/2 for odd l.
It will not be possible to cancel all of these terms with
(5.21) – (5.22) for l > 2.
The case when Ĉ3 is non-vanishing will only be sepa-
rable if all the coefficients are functions of r or of θ only,
and if in addition, the potential also depends only on r or
on θ. Achieving this for our problem will not be possible
because the potential cannot be transformed to the form
required for separability.
B. Derivation of non-existence of additional
constants of the motion
In this subsection, we show using Poisson brackets that
for a single axisymmetric multipole interaction, to linear
order in the multipole and the mass ratio, a first integral
analogous to the Carter constant does not exist, except
for the cases of mass quadrupole and spin.
Suppose that such a constant does exist. We write the
Hamiltonian corresponding to the action (5.1) as H =
H0 + δH and the Carter-type constant as K = K0 +
δK(pr, pθ, Lz, r, θ), where
2r2 sin2 θ
, (5.29)
δH = − clIl
Pl(cos θ)−
dlSlLz
rl+2 sin θ
∂θPl(cos θ),(5.30)
K0 = p
sin2 θ
. (5.31)
Computing the Poisson bracket gives, to linear order in
the perturbations
0 = {H0, δK}+ {δH,K0} (5.32a)
δK + {δH,K0}, (5.32b)
where we have used that {H0,K0} = 0 and the fact that
{H0, δK} = d(δK)/dt. Here, d/dt denotes the total time
derivative along an orbit (r(t), θ(t), pr(t), pθ(t)) of H0 in
phase space. The partial differential equation (5.32a) for
δK thus reduces to a set of ordinary differential equa-
tions that can be integrated along the individual orbits
in phase space.
The unperturbed motion for a bound orbit is in a
plane, so we can switch from spherical to plane polar co-
ordinates (r, ψ). In terms of these coordinates, we have
H0 = p
r/2+p
ψ/2, K0 = p
ψ, and cos θ = sin ι sin(ψ+ψ0),
with cos ι = Lz/
K and the constant ψ0 denoting the
angle between the direction of the periastron and the
intersection between the orbital and equatorial plane.
Then Eq. (5.32) becomes
δK = η(t), (5.33)
η(t) = − 2pψ dlSlLz
sin ι rl+2(t)
∂ψPl(sin ι sin(ψ(t) + ψ0))
cos(ψ(t) + ψ0)
2pψ clIl
rl+1(t)
∂ψPl(sin ι sin(ψ(t) + ψ0)). (5.34)
For unbound orbits, one can always integrate Eq.
(5.33) to determine δK. However, for bound periodic
orbits there is a possible obstruction: the solution for
the conserved quantity K0 + δK will be single valued if
and only if the integral of the source over the closed orbit
vanishes,
∮ Torb
η(t)dt = 0. (5.35)
Here, Torb is the orbital period. In other words, the par-
tial differential equation (5.32) has a solution δK if and
only if the condition (5.35) is satisfied. This is the same
condition as obtained by the Poincare-Mel’nikov-Arnold
method, a technique for showing the non-integrability
and existence of chaos in certain classes of perturbed dy-
namical systems [35].
Thus, it suffices to show that the condition (5.35) is
violated for all multipoles other than the spin and mass
quadrupole. To perform the integral in Eq. (5.35), we use
the parameterization for the unperturbed motion, r =
K/(1+ e cosψ) and dt/dψ = K3/2/(1+ e cosψ)2, so that
the condition for the existence of a conserved quantity
K0 + δK becomes
clIl(1 + e cosψ)
l−1∂ψPl(sin ι sin(ψ + ψ0))−
dlSlLz
K sin ι
(1 + e cosψ)l∂ψ
∂ψPl(sin ι sin(ψ + ψ0))
cos(ψ + ψ0)
(5.36)
In terms of the variable χ = ψ + ψ0 − π/2, Eq. (5.36) can be written as
dχclIl [1 + e(sinψ0 cosχ− cosψ0 sinχ)]l−1
Pl(sin ι cosχ)
dlSlLz
sin ι
[1 + e(sinψ0 cosχ− cosψ0 sinχ)]l
Pl(sin ι cosχ)
. (5.37)
Inserting the expansion (5.28) for Pl(cosχ), taking the derivatives, and using the binomial expansion for the first term
in Eq. (5.37), we get
0 = clIl
Alnjk e
j(sin ι)l−2n(sinψ0)
k(cosψ0)
dχ (sinχ)j−k+1(cosχ)k+l−2n−1
dlSlLz
Blnjk e
j(sin ι)l−2n−1(sinψ0)
k(cosψ0)
dχ (sinχ)j−k+1(cosχ)k+l−2n−2. (5.38)
The coefficients Alnkj and Blnkj are
Alnkj =
(−1)n+k+1(l − 1)!(2l − 2n)!
2ln!(l − 1− j)!k!(j − k)!(l − n)!(l − 2n− 1)!
, Blnkj =
(−1)n+kl!(2l− 2n)!
2ln!(l − j)!k!(j − k)!(l − n)!(l − 2n− 2)!
. (5.39)
The only non-vanishing contribution to the integrals in Eq. (5.38) will come from terms with even powers of both
cosχ and sinχ. These can be evaluated as multiples of the beta function:
0 = clIl
Clnjk e
j(sin ι)l−2n(sinψ0)
k(cosψ0)
j−k δ(j−k+1),even δ(l+k−1),even
dlSlLz
Dlnjk e
j(sin ι)l−2n−1(sinψ0)
k(cosψ0)
j−k δ(j−k+1),even δ(l+k),even. (5.40)
Here, the coefficients are
Clnjk =
2Γ( j
+ 1)Γ(k
− n+ 1)
Alnkj , Dlnjk =
2Γ( j
+ 1)Γ(k
− n− 1
− n+ 3
Blnkj (5.41)
Eq. (5.40) shows that for even l, terms with j =even
(odd) and k =odd (even) give a non-vanishing contribu-
tion for the case of a mass (current) multipole, and hence
K0+δK is not a conserved quantity for the perturbed mo-
tion. Note that terms with j =even and k =odd for even
l occur only for l > 3, so for l = 2 the mass quadrupole
term in Eq. (5.40) vanishes and therefore there exists an
analog of the Carter constant, which is consistent with
our results of Sec. II and our separability analysis. For
odd l, terms with j =odd (even) and k =even (odd) are
finite for Il (Sl). Note that for the case l = 1 of the spin,
the derivatives with respect to χ in Eq. (5.37) evaluate to
zero, so in this case there also exists a Carter-type con-
stant. These results show that for a general multipole
other than I2 and S1, there will not be a Carter-type
constant for such a system.
1. Exact vacuum spacetimes
Our result on the non-existence of a Carter-type con-
stant can be extended, with mild smoothness assump-
tions, to falsify the conjecture that all exact, axisymmet-
ric vacuum spacetimes posess a third constant of the mo-
tion for geodesic motion. Specifically, we fix a multipole
order l, and we assume:
• There exists a one parameter family
(M, gab(λ))
of spacetimes, which is smooth in the parameter λ,
such that λ = 0 is Schwarzschild, and each space-
time gab(λ) is stationary and axisymmetric with
commuting Killing fields ∂/∂t and ∂/∂φ, and such
that all the mass and current multipole moments of
the spacetime vanish except for the one of order l.
On physical grounds, one expects a one parameter
family of metrics with these properties to exist.
• We denote by H(λ) the Hamiltonian on the tan-
gent bundle overM for geodesic motion in the met-
ric gab(λ). By hypothesis, there exists for each λ
a conserved quantity M(λ) which is functionally
independent of the conserved energy and angular
momentum. Our second assumption is that M(λ)
is differentiable in λ at λ = 0. One would expect
this to be true on physical grounds.
• We assume that the conserved quantity M(λ) is
invariant under the symmetries of the system:
L~ξM(λ) = L~ηM(λ) = 0,
where ~ξ and ~η are the natural extensions to the 8
dimensional phase space of the Killing vectors ∂/∂t
and ∂/∂φ. This is a very natural assumption.
These assumptions, when combined with our result of
the previous section, lead to a contradiction, showing
that the conjecture is false under our assumptions.
To prove this, we start by noting that M(0) is a con-
served quantity for geodesic motion in Schwarzschild, so
it must be possible to express it as some function f of
the three independent conserved quantities:
M(0) = f(E,Lz,K0). (5.42)
Here E is the energy, Lz is the angular momentum, and
K0 is the Carter constant. Differentiating the exact re-
lation {H(λ),M(λ)} = 0 and evaluating at λ = 0 gives
{H0,M1} =
{E,H1}+
{Lz, H1}+
{K0, H1},
(5.43)
where H0 = H(0), H1 = H
′(0), and M1 = M
′(0). As
before, we can regard this is a partial differential equa-
tion that determines M1, and a necessary condition for
solutions to exist and be single valued is that the integral
of the right hand side over any closed orbit must vanish:
{E,H1}+
{Lz, H1}+
{K0, H1}
(5.44)
Now strictly speaking, there are no closed orbits in
the eight dimensional phase space. However, the ar-
gument of the previous section applies to orbits which
are closed in the four dimensional space with coordinates
(r, θ, pr, pθ), since by the third assumption above every-
thing is independent of t and φ, and pt and pφ are con-
served. Here (t, r, θ, φ) are Schwarzschild coordinates and
(pt, pr, pθ, pφ) are the corresponding conjugate momenta.
Next, we can pull the partial derivatives ∂f/∂E etc.
outside of the integral. It is then easy to see that the first
two terms vanish, since there do exist a conserved energy
and a conserved z-component of angular momentum for
the perturbed system. Thus, Eq. (5.44) reduces to
{K0, H1} = 0. (5.45)
Since M(0) is functionally independent of E and Lz, the
prefactor ∂f/∂K0 must be nonzero, so we obtain
{K0, H1} = 0. (5.46)
The result (5.46) applies to fully relativistic orbits in
Schwarzschild. We need to take the Newtonian limit of
this result in order to use the result we derived in the
previous section. However, the Newtonian limit is a lit-
tle subtle since Newtonian orbits are closed and generic
relativistic orbits are not closed. We now discuss how the
limit is taken.
The integral (5.46) is taken over any closed orbit in
the four dimensional phase space (r, θ, pr, pθ) which cor-
responds to a geodesic in Schwarzschild. Such orbits are
non generic; they are the orbits for which the ratio be-
tween the radial and angular frequencies ωr and ωθ is a
rational number. We denote by qr and qθ the angle vari-
ables corresponding to the r and θ motions [36]. These
variables evolve with proper time τ according to
qr = qr,0 + ωrτ, (5.47a)
qθ = qθ,0 + ωθτ, (5.47b)
where qr,0 and qθ,0 are the initial values. We denote the
integrand in Eq. (5.46) by
I(qr, qθ, a, ε, ι),
where I is some function, and a, ε and ι are the parame-
ters of the geodesic defined by Hughes [32] (functions of
E, Lz and K0). The result (5.46) can be written as
∫ T/2
dτ I[qr(τ), qθ(τ), a, ε, ι] = 0, (5.48)
where T = T (a, ε, ι) is the period of the r, θ motion.
Since the variables qr and qθ are periodic with period
2π, we can express the function I as a Fourier series
I(qr, qθ, a, ε, ι) =
n,m=−∞
Inm(a, ε, ι)einqr+imqθ . (5.49)
Now combining Eqs. (5.47), (5.48) and (5.49) gives
n,m=−∞
Inm(a, ε, ι)einqr,0+imqθ,0
×Si [(nωr +mωθ)T/2] , (5.50)
where Si(x) = sin(x)/x. Since the initial conditions qr,0
and qθ,0 are arbitrary, it follows that
Inm(a, ε, ι)Si [(nωr +mωθ)T/2] = 0 (5.51)
for all n, m.
Next, for closed orbits the ratio of the frequencies must
be a rational number, so
, (5.52)
where p and q are integers with no factor in common.
These integers depend on a, ε and ι. The period T is
given by 2π/T = qωr = pωθ. The second factor in Eq.
(5.51) now simplifies to
(np+mq)π
, (5.53)
which vanishes if and only if
n = n̄q, m = m̄p, n̄+ m̄ 6= 0, (5.54)
for integers n̄, m̄. It follows that
Inm(a, ε, ι) = 0 (5.55)
for all n, m except for values of n, m which satisfy the
condition (5.54)
Consider now the Newtonian limit, which is the limit
a → ∞ while keeping fixed ε and ι and the mass of the
black hole. We denote by IN(qr , qθ, a, ε, ι) the Newtonian
limit of the function I(qr , qθ, a, ε, ι). The integral (5.48)
in the Newtonian limit is given by the above computation
with p = q = 1, since ωr = ωθ in this limit. This gives
dτIN =
INn,−n(a, ε, ι) ein(qr,0−qθ,0), (5.56)
where INnm are the Fourier components of IN. In the
previous subsection, we showed that this function is non-
zero, which implies that there exists a value k of n for
which IN k,−k 6= 0.
Now as a → ∞, we have ωr/ωθ → 1, and hence from
Eq. (5.52) there exists a critical value ac of a such that
the values of p and q exceed k for all closed orbits with
a > ac. (We are keeping fixed the values of ε and ι). It
follows from Eqs. (5.54) and (5.55) that
Ik,−k(a, ε, ι)
IN k,−k(a, ε, ι)
= 0 (5.57)
for all such values of a. However this contradicts the fact
Ik,−k(a, ε, ι)
IN k,−k(a, ε, ι)
→ 1 (5.58)
as a→ ∞. This completes the proof.
Hence, if the three assumptions listed at the start of
this subsection are satisfied, then the conjecture that all
vacuum, axisymmetric spacetimes possess a third con-
stant of the motion is false.
Finally, it is sometimes claimed in the classical dynam-
ics literature that perturbation theory is not a sufficiently
powerful tool to assess whether the integrability of a sys-
tem is preserved under deformations. An example that
is often quoted is the Toda lattice Hamiltonian [38, 39].
This system is integrable and admits a full set of con-
stants of motion in involution. However, if one approx-
imates the Hamiltonian by Taylor expanding the poten-
tial about the origin to third order, one obtains a sys-
tem which is not integrable. This would seem to indicate
that perturbation theory can indicate a non-integrability,
while the exact system is still integrable.
In fact, the Toda lattice example does not invalidate
the method of proof we use here. If we write the Toda
lattice Hamiltonian as H(q,p), then the situation is that
H(λq,p) is integrable for λ = 1, but it is not integrable
for 0 < λ < 1. Expanding H(λq,p) to third order in λ
gives a non-integrable Hamiltonian. Thus, the perturba-
tive result is not in disagreement with the exact result
for 0 < λ < 1, it only disagrees with the exact result for
λ = 1. In other words, the example shows that pertur-
bation theory can fail to yield the correct result for finite
values of λ, but there is no indication that it fails in ar-
bitrarily small neighborhoods of λ = 0. Our application
is qualitatively different from the Toda lattice example
since we have a one parameter family of Hamiltonians
H(λ) which by assumption are integrable for all values
of λ.
VI. CONCLUSION
We have examined the effect of an axisymmetric
quadrupole moment Q of a central body on test parti-
cle inspirals, to linear order in Q, to the leading post-
Newtonian order, and to linear order in the mass ratio.
Our analysis shows that a natural generalization of the
Carter constant can be defined for the quadrupole inter-
action. We have also analyzed the leading order spin self-
interaction effect due to the scattering of the radiation off
the spacetime curvature due to the spin. Combining the
effects of the quadrupole and the leading order effects
linear and quadratic in the spin, we have obtained ex-
pressions for the instantaneous as well as time-averaged
evolution of the constants of motion for generic orbits un-
der gravitational radiation reaction, complete at O(a2ǫ4).
We have also shown that for a single multipole interaction
other than Q or spin, in our approximations, a Carter-
type constant does not exist. With mild additional as-
sumptions, this result can be extended to exact space-
times and falsifies the conjecture that all axisymmetric
vacuum spacetimes possess a third constant of motion for
geodesic motion.
VII. ACKNOWLEDGMENTS
This research was partially supported by NSF grant
PHY-0457200. We thank Jeandrew Brink for useful cor-
respondence.
Appendix A: Time variation of quadrupole: order of
magnitude estimates
In this appendix, we give an estimate of the timescale
Tevol for the quadrupole to change. The analysis in the
body of this paper is valid only when Tevol ≫ Trr, where
Trr is the radiation reaction time, since we have neglected
the time evolution of the quadrupole. We distinguish be-
tween two cases: (i) when the central body is exactly non-
spinning but has a quadrupole, and (ii) when the central
body has finite spin in addition to the quadrupole.
1. Estimate of the scaling for the nonspinning case
For the purpose of a crude estimate, the relevant in-
teraction is the tidal interaction with energy
QijEij ∼ −
Q̄I cos2 θ, (A1)
where Eij is the tidal field, θ is the angle between the
symmetry axis and the normal to the orbital plane of
m2, and we have written the quadrupole as Q ∼ Q̄I,
where Q̄ is dimensionless and I is the moment of inertia.
For small deviations from equilibrium, the relevant piece
of the Lagrangian is schematically
L ∼ Iψ̇2 + Q̄I m2
ψ2. (A2)
We define the evolution timescale Tevol to be the time
it takes for the angle to change by an amount of order
unity, and since the amplitude of the oscillation scales
roughly as ∼ m2/m1, the evolution time scales as
T−2evol ∼
ω2orbit, (A3)
where ω2orbit = M/r
3. Thus, the ratio of the evolution
timescale compared to the radiation reaction timescale
scales as
Tevol/Trr ∼
. (A4)
2. Estimate of the scaling for the spinning case
When the body is spinning the effect of the tidal cou-
pling is to cause a precession. For the purpose of this
estimate, we calculate the torque on m1 due to the com-
panion’s Newtonian field. The torque N scales as
Ni ∼ ǫimjQmkEjk. (A5)
We assume that the precession is slow, i.e.
ωprec ≪ S̄/m1
, (A6)
where ωprec is the precession frequency and S̄ = S/m
is the dimensionless spin. This gives the approximate
scaling of the precession timescale as (cf. [37])
Tprec/Trr ∼
. (A7)
and the evolution timescale is thus
Tevol/Trr ∼
. (A8)
Because of our assumption (A6) that the precession is
slow, equation (A8) is valid only when
) S̄2
. (A9)
When S̄ is sufficiently small that the condition (A9) is
violated, the relevant timescale is instead given by Eq.
(A3).
3. Application to Kerr inspirals
For Kerr inspirals,
S̄ ∼ a, Q̄ ∼ a2, µ/M ≪ 1 and r ∼M. (A10)
Therefore, the condition (A9) is satisfied, and the pre-
cession time is longer than the radiation reaction time
Tprec/Trr ∼
. (A11)
Note that for Kerr inspirals, since r ∼ M both formulas
(A3) and (A7) give the same scaling.
Moreover, for Kerr inspirals, the amplitude of the pre-
cession will be small, of order the mass ratio µ/M . This is
because of angular momentum conservation: in the rela-
tivistic regime, the orbital angular momentum is a factor
of µ/M smaller than the angular momentum of the black
hole and can therefore not cause a large precession ampli-
tude. Even if the orbital angular momentum at infinity
is large, most of it will be radiated away as outgoing
gravitational waves during the earlier phase of the inspi-
ral. This factor of µ/M is taken into account when we
consider the evolution timescale, which for Kerr inspirals
reduces to
Tevol/Trr ∼
. (A12)
Since 1/a ≥ 1, M/r ∼ 1 and M/µ ≪ 1, the evolution
time is long compared to the radiation reaction time and
we can neglect the time variation of the quadrupole at
leading order.
Appendix B: Computation of time averaged fluxes
1. Averaging method that parallels fully
relativistic averaging
We start by noting that the differential equations
(2.26) and (2.27) governing the r̃ and θ̃ motions decouple
if we define a new time parameter t̂ by
dt̂ =
dt̃. (B1)
This is the analog of the Mino time parameter for
geodesic motion in Kerr [12]. The equations of motion
(2.26)–(2.24) then become
= V̂r̃(r̃), (B2)
V̂r̃(r̃) = 2Er̃
4 + 2r̃3 −Kr̃2 − 4SLzr̃
r̃ − 2L2z
, (B3)
= V̂θ̃(θ̃), (B4)
V̂θ̃(θ̃) = K −
sin2 θ̃
−QE cos 2θ̃, (B5)
= V̂ϕr̃(r̃) + V̂ϕθ̃(θ̃), (B6)
V̂ϕr̃(r̃) =
, V̂ϕθ̃(θ̃) =
sin2 θ̃
. (B7)
The parameters t and t̂ are related by:
= V̂tr̃(r̃) + V̂tθ̃(θ̃) (B8)
V̂tr̃(r̃) = r̃
2, V̂tθ̃(θ̃) =
cos 2θ̃. (B9)
It follows from Eqs. (B2) and (B4) that the functions
r̃(t̂) and θ̃(t̂) are periodic; and we denote their periods
by Λr̃ and Λθ̃. We define the fiducial motion associated
with the constants of motion E, Lz and K to be the
motion with the initial conditions r̃(0) = r̃min and θ̃(0) =
θ̃min, where r̃min and θ̃min are given by the vanishing of
the right-hand sides of Eqs. (B2) and (B4) respectively.
The functions r̂(t̂) and θ̂(t̂) associated with this fiducial
motion are given by
∫ r̂(t̂)
r̃min
V̂r̃(r̃)
= t̂, (B10)
∫ θ̂(t̂)
θ̃min
V̂θ̃(θ̃)
= t̂. (B11)
From Eq. (B8) it follows that
t(t̂) = t0 +
V̂tr̃[r̃(t
′)] + V̂tθ̃[θ̃(t
, (B12)
where t0 = t(0). Next, we define the constant Γ to be
the following average value:
∫ Λr̃
dt′V̂tr̃[r̂(t
′)] +
dt′V̂tθ̃[θ̂(t
′)]. (B13)
Then we can write t(t̂) as a sum of a linear term and
terms that are periodic:
t(t) = t0 + Γt̂+ δt(t̂), (B14)
where δt(t̂) denotes the oscillatory terms in Eq. (B12).
To average a function over the time parameter t̂, it is
convenient to parameterize r̃ and θ̃ in terms of angular
variables as follows. For the average over θ̃ we introduce
the parameter χ by
cos2 θ̂(t̂) = z− cos
2 χ, (B15)
where z− = cos
2 θ̃− with z− being the smaller root of Eq.
(B4):
K + 3QE ±
(K −QE)2 + 4QEL2z
(B16)
and where β = 2QE. Then from the definition (B11)
of θ̂ together with Eq. (B4) and the requirement that χ
increases monotonically with t̂ we obtain
β (z+ − z− cos2 χ). (B17)
Then we can write the average over t̂ of a function Fθ̃(t̂)
which is periodic with period Λθ̃ in terms of χ as
〈Fθ̃〉t̂ =
dt̂Fθ̃(t̂)
Fθ̃[t̂(χ)]
β (z+ − z− cos2 χ)
, (B18)
where
Λθ̃ =
β (z+ − z− cos2 χ)
. (B19)
Similarly, to average a function Fr̃(t̂) that is periodic with
period Λr̃, we introduce a parameter ξ via
1 + e cos ξ
, (B20)
where the parameter ξ varies from 0 to 2π as r̃ goes
through a complete cycle. Then,
= P (ξ), (B21)
P (ξ) ≡
V̂r̃[r̃(ξ)]
pe | sin ξ |
(1 + e cos ξ)
(B22)
The average over t̂ of Fr̃(t̂) can then be computed from
〈Fr̃〉t̂ =
dξ Fr̃/P (ξ)
dξ/P (ξ)
. (B23)
Now, a generic function Fr̃,θ̃[r̃(t̂), θ̃(t̂)] will be biperiodic
in t̂: Fr̃,θ̃[r̃(t̂+Λr̃), θ̃(t̂+Λθ̃)] = Fr̃,θ̃[r̃(t̂), θ̃(t̂)]. Combin-
ing the results (B18) and (B23) we can write its average
as a double integral over χ and ξ as
〈Fr̃,θ̃〉t̂ =
Λθ̃Λr̃
Fr̃,θ̃[r̃(ξ), θ̃(χ)]
β (z+ − z− cos2 χ)P (ξ)
(B24)
To compute the time average of Ė, L̇z, and K̇, we need
to convert the average of a function over t̂ calculated from
(B24) to the average over t. As explained in detail in
[9], in the adiabatic limit we can choose a time interval
∆t which is long compared to the orbital timescale but
short compared to the radiation reaction time. From
Eq. (B12) we have ∆t = Γt̂+ osc.terms. The oscillatory
terms will be bounded and will therefore be negligible in
the adiabatic limit, so we have to a good approximation
〈Ė〉t =
〈Ė V̂t〉t̂, (B25)
where V̂t ≡ V̂tr̃ + V̂tθ̃, cf. Eq. (B8), and similarly for L̇z
and K̇.
The explicit results we obtain using this method are
given in section III, Eqs. (3.28), (3.29), and (3.30).
2. Averaging method using the explicit
parameterization of Newtonian orbits
To perform the time-averaging using this method, we
define a parameter ξ via
1 + e cos ξ
, (B26)
where the parameter ξ varies from 0 to 2π as r̃ goes
through a complete cycle. Note that θ appears in Eqs.
(3.16) – (3.18) only in terms that are linear in Q, so we
can write θ in terms of ξ using the Newtonian relation
x3 = r cos θ = r sin ι sin(ξ + ξ0). (B27)
Here, ξ0 is the angle between the direction of the peri-
helion and the intersection of the orbital and equatorial
plane. Similarly, for the ṙθ̇ terms in Eqs. (3.17) and
(3.26) we can use the Newtonian relations ṙ = e/
p sin ξ
and ξ̇ =
p/r2. From Eqs. (2.27) and (B20) it follows
(1 + e cos ξ)2
−3 + e2 − 2e cos ξ + 2 cos2 ι(8 − e2 + 8e cos ξ + e2 cos 2ξ)
, (B28)
and from Eq. (2.12)
(1 + e cos ξ)
2 sin2 ι sin2(ξ + ξ0)− 1
. (B29)
Using these expressions, we compute the time-averaged fluxes from
〈Ė〉 =
dξ Ė (dt/dt̃) (dt̃/dξ)
dξ (dt/dt̃) (dt̃/dξ)
(B30)
and obtain:
〈Ė〉 = −32
(1− e2)3/2
e4 − S
cos(ι)
cos(2ι)
cos(2ι)
cos(2ξ0) sin
cos(2ξ0) sin
, (B31)
〈L̇z〉 = −
(1− e2)3/2
cos ι
e2 − S
2p3/2 cos ι
+ 7e2 +
cos(2ι)
−3− 45
45 + 148e2 +
cos(2ι)
1 + 3e2 +
e2 cos(2ξ0) sin
, (B32)
〈K̇〉 = −64
(1 − e2)3/2
e2 − S
2p3/2
+ 37e2 +
cos(ι)
cos(2ι)
cos(2ι)
e2 cos(2ξ0) sin
. (B33)
In the adiabatic limit, the terms involving cos(2ξ0) can
be omitted because they average to zero. As explained
by Ryan [15], the radiation reaction timescale for terms
involving ξ0 is much longer than the precession timescale
for most orbits, so the terms involving ξ0 will average
away. This is consistent with our results for the adia-
batic infinite time-averaged fluxes using the Mino time
parameter. The Mino-time averaging method was based
on the assumption that the fundamental frequencies are
incommensurate and the motion fills up the whole torus,
which is equivalent to averaging over ξ0.
[1] L. Barack and C. Cutler, Phys. Rev. D 69, 082005 (2004) [2] K. Glampedakis and S. Babak, Class. Quantum Grav.
23, 4167 (2006)
[3] D. A. Brown, et al. gr-qc/0612060
[4] E. Poisson, Living Rev. Relat. 7, 6 (2004),
http://relativity.livingreviews.org/Articles/lrr-2004-6/index.html
[5] Special Issue: Gravit. Rad. from Binary Black Holes: Ad-
vances in the perturbative approach, Class. Quant. Grav.
22 (2005)
[6] K. Glampedakis, Class. Quantum Grav. 22, S605 (2005)
[7] S. Drasco, Class. Quantum Grav. 23, S769 (2006)
[8] M. Favata and É. É. Flanagan Accuracy of adiabatic
waveforms for eccentric orbits, (in preparation)
[9] S. Drasco, É. É. Flanagan, and S. A. Hughes, Class.
Quantum Grav. 22, S801 (2005)
[10] É. É. Flanagan and T. Hinderer, Two timescale analysis
of extreme mass ratio inspirals in Kerr. II. Numerical
integration through resonances, (in preparation)
[11] N. Sago, et al., Progr. Theor. Phys. 114, 509 (2005)
[12] Y. Mino, Phys. Rev. D 67, 084027 (2003)
[13] S. A. Hughes, S. Drasco, É. É. Flanagan, and J. Franklin,
Phys. Rev. Lett. 94, 221101 (2005)
[14] F. D. Ryan, Phys. Rev. D 52, R3159 (1995)
[15] F. D. Ryan, Phys. Rev. D 53, 3064 (1996)
[16] L. Blanchet, T. Damour, G. Farese, and B. Iyer, Phys.
Rev. D 71, 124004 (2005)
[17] L. Kidder, C. Will, and A. Wiseman, Phys. Rev. D 47,
R4183 (1993)
[18] L. A. Gergely, Phys. Rev. D 61, 024035 (1999)
[19] C. Will, Phys. Rev. D 71, 084027 (2005)
[20] G. Faye, L. Blanchet, and A. Buonanno, Phys. Rev. D
74, 104033 (2006)
[21] L. Blanchet, A. Buonanno, and G. Faye, Phys. Rev. D
74, 104034 (2006)
[22] E. Poisson, Phys. Rev. D 57, 5287 (1998)
[23] L. A. Gergely and Z. Keresztes, Phys. Rev. D 67, 024020
(2003)
[24] M. Shibata, M. Sasaki, H. Tagoshi, and T. Tanaka, Phys.
Rev. D 51, 1646 (1995)
[25] É. É. Flanagan and T. Hinderer, Two timescale analy-
sis of extreme mass ratio inspirals in Kerr. I. General
formalism, (in preparation)
[26] C. Misner, K. Thorne, and J. Wheeler, Gravitation (W.H.
Freeman and Co., San Francisco, 1973)
[27] R. O. Hansen, J. Math. Phys. 15, 46 (1974)
[28] K. S. Thorne, Rev. Mod. Phys 52, 300 (1980)
[29] T. Bäckdahl, gr-qc/0612043 (2006); Class. Quantum
Grav. 22, 3585 (2005)
[30] C. Li and G. Lovelace, gr-qc/0702146 (2007)
[31] L. Blanchet and T. Damour, Phys. Lett. 104A, 82 (1984)
[32] S. A. Hughes, Phys. Rev. D 61, 084004 (2000)
[33] K. Glampedakis, S. Hughes, and D. Kennefick, Phys.
Rev. D 66, 064005 (2002)
[34] G. Arfken, Mathematical Methods for Physicists (Aca-
demic Press, CA, 1985)
[35] V. K. Melnikov, Trans. Moscow Math. Soc. 12, 1-56
(1956)
[36] W. Schmidt, Celestial mechanics in Kerr spacetime
Class. Quantum Grav. 19 (2002) 2743-2764
[37] H. Goldstein, C. Poole, and J. Safko, Classical Mechanics
(Addison-Wesley, 2002)
[38] Ý. Birol and A. Hacinliyan, Phys. Rev. E 52, 4750 (1995)
[39] H. Yoshida, Comm. Math. Phys. 116, 529 (1988)
http://arxiv.org/abs/gr-qc/0612060
http://relativity.livingreviews.org/Articles/lrr-2004-6/index.html
http://arxiv.org/abs/gr-qc/0612043
http://arxiv.org/abs/gr-qc/0702146
ABSTRACT
  We analyze the effect of gravitational radiation reaction on generic orbits
around a body with an axisymmetric mass quadrupole moment Q to linear order in
Q, to the leading post-Newtonian order, and to linear order in the mass ratio.
This system admits three constants of the motion in absence of radiation
reaction: energy, angular momentum, and a third constant analogous to the
Carter constant. We compute instantaneous and time-averaged rates of change of
these three constants. For a point particle orbiting a black hole, Ryan has
computed the leading order evolution of the orbit's Carter constant, which is
linear in the spin. Our result, when combined with an interaction quadratic in
the spin (the coupling of the black hole's spin to its own radiation reaction
field), gives the next to leading order evolution. The effect of the
quadrupole, like that of the linear spin term, is to circularize eccentric
orbits and to drive the orbital plane towards antialignment with the symmetry
axis. In addition we consider a system of two point masses where one body has a
single mass multipole or current multipole. To linear order in the mass ratio,
to linear order in the multipole, and to the leading post-Newtonian order, we
show that there does not exist an analog of the Carter constant for such a
system (except for the cases of spin and mass quadrupole). With mild additional
assumptions, this result falsifies the conjecture that all vacuum, axisymmetric
spacetimes posess a third constant of geodesic motion.

<|endoftext|><|startoftext|>
Introduction to Algorithms.
Diestel R (1997) Graph Theory. New York: Springer.
Eguiluz VM, Chialvo DR, Cecchi GA, Baliki M, Apkarian AV (2005) Scale-free brain functional 
networks. Phys Rev Lett 94:018102.
Erdös P, Rényi A (1960) On the evolution of random graphs. Publ Math Inst Hung Acad Sci 5:17-
Felleman DJ, van Essen DC (1991) Distributed hierarchical processing in the primate cerebral 
cortex. Cereb Cortex 1:1-47.
Girvan M, Newman MEJ (2002) Community Structure in Social and Biological Networks. Proc Natl 
Acad Sci 99:7821-7826.
Hilgetag CC, O'Neill MA, Young MP (1996) Indeterminancy of the visual cortex. Science 271:776-
Hilgetag CC, Burns GAPC, O'Neill MA, Scannell JW, Young MP (2000) Anatomical Connectivity 
Defines the Organization of Clusters of Cortical   Areas in the Macaque Monkey and the Cat. 
Phil Trans R Soc Lond B 355:91-110.
Jeong H, Mason SP, Barabási A-L, Oltvai ZN (2001) Lethality and centrality in protein networks. 
Nature 411:41-42.
Kaiser M, Hilgetag CC (2004) Edge vulnerability in neural and metabolic networks. Biol Cybern 
90:311-317.
Kaiser M, Hilgetag CC (2006) Nonoptimal Component Placement, but Short Processing Paths, due 
to Long-Distance Projections in Neural Systems. PLoS Computational Biology 2:e95.
Kaiser M, Hilgetag CC (2007) Development of multi-cluster cortical networks by time windows for 
spatial growth. Neurocomputing:(in press).
Kötter R (2004) Online Retrieval, Processing, and Visualization of Primate Connectivity Data from 
the CoCoMac Database. Neuroinformatics 2:127-144.
Krubitzer L, Kahn DM (2003) Nature versus Nurture Revisited: An Old Idea with a New Twist. 
Prog Neurobiol 70:33-52.
Martin R, Kaiser M, Andras P, Young MP (2001) Is the Brain a Scale-Free Network? In: Annual 
Conference of the Society for Neuroscience, p Paper 816.814. San Diego, US.
Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (2002) Network Motifs: Simple 
Building Blocks of Complex Networks. Science 298:824-827.
Nicolelis MAL, Yu CH, Baccalá LA (1990) Structural Characterization of the Neural Circuit 
responsible for Control of   the Cardiovascular Functions in High Vertebrates. Comput Biol 
Med 20:379-400.
Petroni F, Panzeri S, Hilgetag CC, Koetter R, Young MP (2001) Simultaneity of Responses in a 
Hierarchical Visual Network. Neuroreport 12:2753-2759.
Preuss TM (2000) What's human about the human brain. In: The New Cognitive Neurosciences 
(Gazzaniga M, ed), pp 1219-1234. Cambridge, MA.
Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási A-L (2002) Hierarchical Organization of 
Modularity in Metabolic Networks. Science 297:1551-1555.
Scannell JW, Blakemore C, Young MP (1995) Analysis of Connectivity in the Cat Cerebral Cortex. 
J Neurosci 15:1463-1483.
Spear PD, Tong L, McCall MA (1988) Functional influence of areas 17, 18 and 19 on lateral 
suprasylvian cortex   in kittens and adult cats: implications for compensation following early   
visual cortex damage. Brain Res 447:79-91.
Sporns O, Zwi JD (2004) The Small World of the Cerebral Cortex. Neuroinformatics 2:145-162.
Sporns O, Tononi G, Edelman GM (2000) Theoretical Neuroanatomy: Relating Anatomical and 
Functional Connectivity in   Graphs and Cortical Connection Matrices. Cereb Cortex 10:127-
Sporns O, Chialvo DR, Kaiser M, Hilgetag CC (2004) Organization, development and function of 
complex brain networks. Trends Cogn Sci 8:418-425.
Stam CJ, Jones BF, Nolte G, Breakspear M, Scheltens P (2007) Small-world networks and 
functional connectivity in Alzheimer's disease. Cereb Cortex 17:92-99.
Stephan KE, Kamper L, Bozkurt A, Burns GA, Young MP, Kotter R (2001) Advanced database 
methodology for the Collation of Connectivity data on the Macaque brain (CoCoMac). 
Philos Trans R Soc Lond B Biol Sci 356:1159-1186.
Strogatz SH (2001) Exploring complex networks. Nature 410:268-276.
Stromswold K (2000) The cognitive neuroscience of language acquisition. In: The New Cognitive 
Neurosciences (Gazzaniga M, ed), pp 909-932. Cambridge, MA.
Stumpf MPH, Wiuf C, May RM (2005) Subnets of Scale-Free Networks are Not Scale-Free: 
Sampling Properties of   Networks. Proc Natl Acad Sci USA 102:4221-4224.
Watts DJ, Strogatz SH (1998) Collective Dynamics of 'small-World' Networks. Nature 393:440-442.
Young MP (1992) Objective Analysis of the Topological Organization of the Primate Cortical   
Visual System. Nature 358:152-155.
Young MP (1993) The organization of neural systems in the primate cerebral cortex. Phil Trans R 
Soc 252:13-18.
Young MP (2000) The architecture of visual cortex and inferential processes in vision. Spatial 
Vision 13:137-146.
Young MP, Scannell JW, Burns GA, Blakemore C (1994) Analysis of connectivity: neural systems 
in the cerebral cortex. Rev Neurosci 5:227-250.
Tables
Table 1. Comparison of brain networks and benchmark networks.
The table shows the average shortest path and the clustering coefficient statistics for the macaque 
and cat brain structure networks, and for the respective benchmark random, rewired, small-world, 
and scale-free networks. For the benchmark networks, the data shows the mean value and the 
standard deviation of 50 generated networks.
Average shortest path Clustering coefficient
Macaque 2.414 0.453
   Random mean 2.093 ± 0.009 0.142 ± 0.004
   Rewired mean 2.118 ± 0.010 0.239 ± 0.009
   Small-world mean 2.439 ± 0.054 0.416 ± 0.022
   Scale-free mean 2.078 ± 0.042 0.564 ± 0.042
Cat 1.961 0.542
   Random mean 1.749 ± 0.002 0.265 ± 0.003
   Rewired mean 1.803 ± 0.006 0.381 ± 0.006
   Small-world mean 1.868 ± 0.017 0.461 ± 0.016
   Scale-free mean 1.768 ± 0.014 0.535 ± 0.029
Table 2. Overview of the most highly-connected regions in the cat and macaque network.
The table shows the total number of connections of the region (degree) as well as the number of 
incoming / afferent (in-degree) and outgoing / efferent (out-degree) connections. The maximal 
possible number of connections would have been 110 connections for the cat and 130 connections 
for the macaque.
Rank Area Total Incoming Outgoing
1 AES 59 30 29
2 Ia 55 29 26
3 7 54 28 26
4 Ig 52 22 30
5 5al 49 30 19
Macaque
Rank Area Total Incoming Outgoing
1 A7B 43 23 20
2 LIP 42 19 23
3 A46 42 23 19
4 FEF 38 19 19
5 TPT 37 18 19
Figures
Figure 1. Examples of random and scale-free networks. Schematic view of network connectivity 
features. (A) Simple scale-free network having highly-connected nodes (hubs) here shown at the 
centre. (B) Simple random network; both networks have the same number of nodes and edges.
0 5 10 15 20 25 30 35 40 45
degree
s Macaque
Random
0 5 10 15 20 25 30 35 40 45 50 55
degree
s Cat
Random
Figure 2. Direct comparison of degree distribution. (A) Histogram of the degree distribution of 
the macaque (gray) compared to the distribution of random networks (binomial distribution given 
the probability p=0.1417 that an edge occurs, black). (B) Histogram of the degree distribution of the 
cat (gray) compared to the distribution of random networks (binomial distribution given the 
probability p=0.2643 that an edge occurs, black). 
rewired scale-free random small-world
Macaque
Figure 3. Similarity of network connectivity. For each type of benchmark network, 1,000 
networks were generated. As the cat network has a larger number of edges, the percentages of 
similar edges are also higher. The similarity with the cortical networks is as good for the scale-free 
networks as for the rewired cortical networks. In contrast, the similarity of random and small-world 
networks is significantly lower. 
Figure 4. Sequential node eliminations in Macaque cortical networks. The fraction of deleted 
nodes (zero for the intact network) is plotted against the average shortest path (ASP) after node 
removals. Nodes were removed in order of connectivity, starting with the most highly connected 
nodes (targeted elimination) or the node order was determined randomly (random elimination). (A)
Cortical network during targeted (dashed) and random (solid line) elimination. In the subsequent 
plots B, C and D, the dashed line shows the average effect of targeted elimination and the thin 
dashed lines the 95% confidence interval for the generated networks. The solid line represents the 
average effect of random elimination. The dashed grey line represents targeted removal in the 
cortical network of A for comparison. (B) Small-world benchmark network. (C) Scale-free 
benchmark network. (D) Random benchmark network. (The complete set of figures for cat and 
macaque with node and edge elimination and the effect on ASP is available in the supplementary
material). 
Figure 5. Sequential node eliminations in cat cortical networks. The fraction of deleted nodes 
(zero for the intact network) is plotted against the average shortest path (ASP) after node removals. 
Nodes were removed in order of connectivity, starting with the most highly connected nodes 
(targeted elimination) or the node order was determined randomly (random elimination). (A)
Cortical network during targeted (dashed) and random (solid line) elimination. Lines in B-C have the 
same meaning as in Fig. 4. (B) Small-world benchmark network. (C) Scale-free benchmark network. 
(D) Random benchmark network. 
A      
cortical scale-free rewired random small-world
Macaque
cortical scale-free rewired random small-world
Macaque
Figure 6. Fraction and value of peak ASP for targeted node elimination. The average values and 
standard deviations are shown for the 50 generated benchmark networks. (A) Fraction of eliminated 
nodes, at which the largest ASP was attained. For the cat cortical network, only the fraction of peak 
ASP for the scale-free network is close to the cat network whereas the fractions of other benchmark 
networks are higher. The same is the case for the macaque cortical network. (B) Peak value of the 
ASP. It is higher for scale-free networks than for cortical networks, in contrast to more similar 
values for the other benchmark networks. 
A     
cortical scale-free rewired random small-world
Macaque
cortical scale-free rewired random small-world
Macaque
     
Figure 7. Fraction and value of peak ASP for targeted connection elimination. The average 
values and standard deviations are shown for the 50 generated benchmark networks. (A) Fraction of 
eliminated connections, at which the largest ASP was attained. For the cat network, scale-free and 
small-world fractions are similar to the cortical value whereas fractions of rewired and random 
networks are significantly higher. For the macaque network, however, all benchmark networks 
except for the small-world network show a similar fraction of peak ASP. (B) Peak values of the 
ASP. The peak value of the cat cortical network can be matched by the random and rewired 
networks, nearly by the scale-free but significantly not by the small-world network. For the 
macaque, all networks except for the scale-free network show significantly different values.
ABSTRACT
  Structure entails function and thus a structural description of the brain
will help to understand its function and may provide insights into many
properties of brain systems, from their robustness and recovery from damage, to
their dynamics and even their evolution. Advances in the analysis of complex
networks provide useful new approaches to understanding structural and
functional properties of brain networks. Structural properties of networks
recently described allow their characterization as small-world, random
(exponential) and scale-free. They complement the set of other properties that
have been explored in the context of brain connectivity, such as topology,
hodology, clustering, and hierarchical organization. Here we apply new network
analysis methods to cortical inter-areal connectivity networks for the cat and
macaque brains. We compare these corticocortical fibre networks to benchmark
rewired, small-world, scale-free and random networks, using two analysis
strategies, in which we measure the effects of the removal of nodes and
connections on the structural properties of the cortical networks. The brain
networks' structural decay is in most respects similar to that of scale-free
networks. The results implicate highly connected hub-nodes and bottleneck
connections as structural basis for some of the conditional robustness of brain
systems. This informs the understanding of the development of brain networks'
connectivity.

<|endoftext|><|startoftext|>
Introduction
	Geometry of supported particles
	Melting and surface melting
	Conclusion
	References
ABSTRACT
  We construct a simple thermodynamic model to describe the melting of a
supported metal nanoparticle with a spherically curved free surface both with
and without surface melting. We use the model to investigate the results of
recent molecular dynamics simulations, which suggest the melting temperature of
a supported metal particle is the same as that of a free spherical particle
with the same surface curvature. Our model shows that this is only the case
when the contact angles of the supported solid and liquid particles are
similar. This is also the case for the temperature at which surface melting
begins.

<|endoftext|><|startoftext|>
Introduction and the model. This paper deals with discrete-time
Markov control processes on a general state space. The one-step cost function
is nonnegative and possibly unbounded. The decision maker is supposed to
be risk-averse with a constant risk coefficient γ > 0. The risk-sensitive aver-
age cost criterion is used as a performance measure. The aim of the work is to
establish the optimality inequality for risk-sensitive dynamic programming
and derive an optimal stationary policy. The result is proved under two
different sets of compactness-continuity assumptions, namely, for Markov
control processes with weakly continuous transition probabilities [Condition
(W)], as well as transition probabilities that are continuous with respect
to setwise convergence [Condition (S)]. A similar problem for risk-neutral
stochastic control models has been examined in [27] using the vanishing dis-
count factor approach. However, it is well known that, for risk-sensitive con-
trol models, an analogous approximation of the average cost via a sequence
of the corresponding discounted models does not work. Instead of this, fol-
lowing [9, 15, 16], we introduce an auxiliary discounted minimax problem.
A variational formula that expresses the mutual relationship between the
relative entropy function and the logarithmic moment-generating function
enables us to connect the discounted minimax model with the original one.
Received March 2006; revised September 2006.
1Supported by MEiN Grant 1 P03A 01030.
AMS 2000 subject classifications. Primary 60J05, 90C39; secondary 60A10.
Key words and phrases. Risk-sensitive control, Borel state space, average cost optimal-
ity inequality.
This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Applied Probability,
2007, Vol. 17, No. 2, 654–675. This reprint differs from the original in pagination
and typographic detail.
http://arxiv.org/abs/0704.0394v1
http://www.imstat.org/aap/
http://dx.doi.org/10.1214/105051606000000790
http://www.imstat.org
http://www.ams.org/msc/
http://www.imstat.org
http://www.imstat.org/aap/
http://dx.doi.org/10.1214/105051606000000790
2 A. JAŚKIEWICZ
Next, assuming that a certain family of functions is bounded [Condition (B)]
and using Fatou’s lemma (for weakly or setwise convergent measures), we
obtain the optimality inequality.
The predecessor of our result is Theorem 4.1 in [16], where the optimality
inequality for the risk-sensitive dynamic programming with a countable state
space was established. Instead of boundedness assumption (B), Hernández-
Hernández and Marcus [16] assume that there exists a stationary policy
which induces a finite average cost that is equal some constant in each
state. On the other hand, it is well known that an optimal risk-sensitive
average cost may depend on the initial state (see Example 1). This behavior
happens if the risk factor is too large. Instead of this restriction on the
risk coefficient, we use Condition (B), which makes the process reach “good
states” sufficiently fast.
There is a rich literature in risk-sensitive control, going back at least to
the seminal works of Howard and Matheson [18] and Jacobson [19], which
covered the finite horizon case. The average cost criterion on the infinite
horizon was studied in [5, 8, 14, 15, 16, 31] for a denumerable state space
and in [10, 11, 20] for a general state space. It is also worth mentioning
that risk-sensitive control finds natural applications in portfolio managment,
where the objective is to maximize the growth rate of the expected utility
of wealth; see [3, 4, 30] and the references cited therein.
The paper is organized as follows. Below a Markov control model with
the long-run average cost criterion as a performance measure is described, as
well as some basic notation is set up. In Section 2 we introduce preliminaries
and present the auxiliary discounted minimax problem, which is, in turn,
solved in Section 3. The main result is established in Section 4. Section 5
contains a discussion of Condition (B), and in the Appendix a variational
formula for the logarithmic moment-generating function is stated.
A discrete-time Markov control process is specified by the following ob-
jects:
(i) The state space X is a standard Borel space (i.e., a nonempty Borel
subset of some Polish space).
(ii) A is a Borel action space.
(iii) K is a nonempty Borel subset of X×A. We assume that, for each
x ∈X , the nonempty x-section
A(x) = {a ∈A : (x,a) ∈K}
of K is compact and represents the set of actions available in state x.
(iv) q is a regular conditional distribution from K to X.
(v) The one-step cost function c is a Borel measurable mapping from K
to [0,+∞].
RISK-SENSITIVE CONTROL 3
Then the history spaces are defined as H0 = X, Hk = (X ×A)
k ×X and
H∞ = (X ×A)
∞. As usual, a policy π = {πk, k = 0,1, . . .} ∈ Π is a sequence
of transition probabilities from Hk to A such that πk(A(xk)|hk) = 1, where
hk = (x0, a0, . . . , xk) ∈Hk. The class of stationary policies is identified with
the class F of measurable functions f from X to A such that f(x) ∈A(x). It
is well known that F is nonempty [6]. By the Ionescu–Tulcea theorem [24],
for each policy π and each initial state x0 = x, a probability measure P
and a stochastic process {(xk, ak)} are defined on H∞ in a canonical way,
where xk and ak describe the state and the decision at stage k, respectively.
By Eπx we denote the expectation operator with respect to the probability
measure Pπx .
Let γ > 0 be a given risk factor. For any initial state x ∈X and policy
π ∈ Π, we define the following risk-sensitive average cost criterion:
J(x,π) = lim sup
logEπx exp
c(xk, ak)
Our aim is to minimize J(x,π) within the class of all policies and find a
policy π∗, for which
J∗(x) := inf
J(x,π) = J(x,π∗).
Throughout the paper the following assumption will be supposed to hold
true even without explicit reference:
∃π̃ ∈ Π J(x, π̃) < +∞.(G)
Remark 1. Throughout the remainder, we assume that the risk factor
γ > 0 is arbitrary and fixed. Therefore, here and subsequently, we shall not
indicate that some quantities depend on γ [e.g., we write J(x,π) instead of
Jγ(x,π), dropping the index γ].
2. Preliminaries. Let Pr(X) be the set of all probability measures on
X. Fix ν ∈ Pr(X). The relative entropy function R(·‖ν) is a mapping from
Pr(X) into R defined as follows:
R(µ‖ν) :=
dµ, µ≪ ν,
+∞, otherwise.
It is well known that R(µ‖ν) is nonnegative for any µ ∈ Pr(X) and R(µ‖ν) =
0 if and only if µ = ν (consult Lemma 1.4.1 in [12]).
We shall consider the following auxiliary minimax problem, associated
with our original Markov control process. The set X is the state space,
4 A. JAŚKIEWICZ
while A and Pr(X) are the action sets for the decision maker and op-
ponent, respectively. The process then operates as follows. In a state xn,
n = 0,1, . . . , the controller chooses an action an ∈ A(xn), while the oppo-
nent selects µn(·)[xn, an] ∈ Pr(X). As a consequence, the controller pays
γc(xn, an)−R(µn‖q(·|xn, an)) to his opponent, and the system moves to the
next state according to the probability distribution µn(·)[xn, an].
We shall deal with the following classes of strategies. It will cause no
confusion if we continue to use the same letters to denote strategies for
the controller. Namely, π stands for a randomized control strategy (policy),
whereas f denotes a stationary strategy. We write Π and F to denote the sets
of corresponding strategies. For the opponent’s class of strategies, we confine
to the stationary one, which is identified with the class P of stochastic kernels
p on X given K.
Let (Ω,F) be the measurable space consisting of the sample space Ω =
(X × A)∞ and its product σ-algebra F . Then for an initial state x ∈ X,
and strategies π and p, there exists a unique probability measure Pπpx and,
again, a stochastic process {(xk, ak)} is defined (Ω,F) in a canonical way,
where xk denotes the state at time k and ak is the action for the controller.
With some abuse of notation, we let hk stand for the history of the process
up to the kth state, that is,
hk = (x0, a0, x1, . . . , ak−1, xk).
The corresponding expectation operator is denoted by Eπpx .
For fixed x ∈ X, π ∈ Π and p ∈ P , we define the following functional
costs:
Vβ(x,π, p) =
βkEπpx [γc(xk, ak) −R(p(·|xk, ak)‖q(·|xk, ak))],(1)
where β ∈ (0,1) is the discount factor, and
j(x,π, p) = lim sup
Eπpx [γc(xk, ak) −R(p(·|xk, ak)‖q(·|xk, ak))].
Note that, since the function R(·‖·) is lower semicontinuous on Pr(X) ×
Pr(X) and p and q are stochastic kernels [i.e., measurable functions of (x,a)],
it follows that the mapping
(x,a) 7→R(p(·|x,a)‖q(·|x,a))
is measurable (Lemma 1.4.3(f) in [12]). Observe that Vβ(x,π, p) and j(x,π, p)
might be undetermined, because c can be unbounded. We thus restrict the
set of admissible strategies for the opponent in the following way.
RISK-SENSITIVE CONTROL 5
Definition 1. Given π = {πk} ∈ Π, we say that p ∈ P is a π-admissible
strategy iff
A(xk)
R(p(·|xk, a)‖q(·|xk, a))πk(da|hk) < +∞,(2)
and moreover, there exists a constant C ≥ 0, possibly depending on π and
p, such that
A(xk)
[γc(xk, a) −R(p(·|xk, a)‖q(·|xk, a))]πk(da|hk) + C ≥ 0,
for all histories of the process hk, k ≥ 0, induced by p and π. We denote
this set by Q(π). [Note that this set is nonempty, since p = q ∈Q(π) for any
π ∈ Π.]
Let us introduce the following notation. For any π ∈ Π, p ∈ Q(π) and
n≥ 1, define
Jn(x,π) = logE
x exp
c(xk, ak)
jn(x,π, p) =
Eπpx [γc(xk, ak) −R(p(·|xk, ak)‖q(·|xk, ak))].
Now we are ready to present the result that was originally proved in
[16] for Markov strategies. However, it still remains valid when arbitrary
strategies for the decision maker are considered. Therefore, for the sake of
clarity, we state the result with its proof.
Proposition 1. Let x∈X and p ∈Q(π). Then:
(a) supp∈Q(π) jn(x,π, p) ≤ Jn(x,π) for each n≥ 1,
(b) lim supn→∞ supp∈Q(π)
jn(x,π, p) ≤ γJ(x,π).
Proof. (a) Let p ∈ Q(π) be any stochastic kernel. For n = 1, we con-
clude
j1(x,π, p) ≤ E
x (γc(x,a0)) ≤ logE
γc(x,a0) = J1(x,π),
where the first inequality holds since the relative entropy is nonnegative, and
the second one is due to Jensen’s inequality. Now assume that the hypothesis
is true for some n≥ 1. Clearly,
jn+1(x,π, p) =
Eπpx [γc(xk, ak) −R(p(·|xk, ak)‖q(·|xk, ak))]
= Eπpx
[γc(xk, ak) −R(p(·|xk, ak)‖q(·|xk, ak))], n≥ 1.
6 A. JAŚKIEWICZ
Denote by π(1) the “1-shifted” strategy, that is,
(·|hk) = πk+1(·|x0, a0, hk), k ≥ 0.
Then, we have
jn+1(x,π, p)
= Eπpx [γc(x,a0) + jn(x1, π
(1), p) −R(p(·|x,a0)‖q(·|x,a0))]
≤ Eπpx (γc(x,a0))
+ Eπpx (E
x {[Jn(x1, π
(1)) −R(p(·|x,a0)‖q(·|x,a0))]|a0})
= Eπx log e
γc(x,a0)
+ Eπpx
Jn(x1, π
(1))p(dx1|x,a0) −R(p(·|x,a0)‖q(·|x,a0))
log eγc(x,a0)π0(da0|x)
eJn(x1,π
(1))q(dx1|x,a0)π0(da0|x)
eγc(x,a0)+
γc(xk,ak)q(dx1|x,a0)π0(da0|x)
≤ log
eγc(x,a0)+
γc(xk,ak)q(dx1|x,a0)π0(da0|x)
= Jn+1(x,π).
Clearly, the first inequality follows from the induction hypothesis. The third
inequality is due to Jensen’s inequality, whilst the second one follows from
Lemma A in the Appendix. Since p ∈Q(π) is arbitrary, we get the desired
conclusion.
Part (b) follows directly from part (a). �
Remark 2. Note that in the proof of Proposition 1 we did not really
have to use the fact that p ∈ Q(π). The only assumption which plays an
essential role is condition (2). Namely, it guarantees that jn(x,π, p) is well
defined for all n≥ 1, x ∈X and π ∈ Π. However, in Definition 1 we restrict
the opponent’s class of strategies to the set Q(π) in order to be able to apply
the Hardy–Littlewood theorem. In actual fact, later on it will be clear that
the set Q(π), where π ∈ Π, is sufficiently large. Namely, the supremum of
certain discounted functional costs over the set Q(π) will not change if we
add new elements to Q(π); see the proofs of Lemmas 1 and 2.
RISK-SENSITIVE CONTROL 7
Let π̃ be as in assumption (G) and let p ∈Q(π̃). Then from the Hardy–
Littlewood theorem (Theorem H.2 in [13]), we get
lim sup
(1 − β)Vβ(x, π̃, p) ≤ lim sup
jn(x, π̃, p)
and from Proposition 1(b),
lim sup
p∈Q(π̃)
jn(x, π̃, p) ≤ γJ(x, π̃).
Combining these two inequalities, we conclude that
lim sup
(1− β)Vβ(x, π̃, p) ≤ γJ(x, π̃) for every p ∈Q(π̃).
This in turn yields
lim sup
(1− β)Vβ(x) ≤ γJ(x, π̃),(4)
where Vβ(x) is the upper value of functional cost (1), that is,
Vβ(x) = inf
p∈Q(π)
Vβ(x,π, p).
Consequently, inequality (4) and assumption (G) together lead to the fol-
lowing:
Vβ(x) < +∞(5)
for each x ∈X and β ∈ (0,1). In addition, Vβ(x) ≥ 0. Now defining
ρ := inf
J(x,π), mβ := inf
Vβ(x)
and observing that
lim sup
(1 − β)mβ ≤ γρ,(6)
one can deduce that there exists a sequence of discount factors {βn} con-
verging to 1 for which
(1− βn)mβn = l,(7)
where l is a certain nonnegative constant.
8 A. JAŚKIEWICZ
3. A solution to the auxiliary discounted minimax problem. The main
thrust of this section is to solve the auxiliary discounted minimax problem
introduced in the previous section. In other words, we look for a discounted
functional equation whose solution is the function Vβ . This is done by an ap-
proximation of the above-mentioned minimax models by ones with bounded
cost functions. These models in turn are solved by a fixed point argument in
Proposition 1. Next, we show in Lemma 1 that the corresponding solutions
equal the upper values of some discounted costs on the infinite horizon. Fi-
nally, the limit passage in Lemma 2 gives the desired discounted functional
equation with the function Vβ as a solution.
We shall need the following two sets of compactness-semicontinuity as-
sumptions, which will be used alternatively.
Condition (S).
(i) The set A(x) is compact.
(ii) For each x ∈X and every Borel set D ⊂X, the function q(D|x, ·) is
continuous on A(x).
(iii) The cost function c(x, ·) is lower semicontinuous for each x ∈X.
Condition (W).
(i) The set A(x) is compact and the set-valued mapping x 7→ A(x) is
upper semicontinuous, that is, {x ∈X : A(x) ∩ B 6= ∅} is closed for every
closed set B in A.
(ii) The transition law q is weakly continuous on K, that is, the function
(x,a) 7→
u(y)q(dy|x,a), (x,a) ∈K,
is continuous function for each bounded continuous function u.
(iii) The cost function c is lower semicontinuous on K.
By Lb(X) and Bb(X), we denote the set of all bounded lower semicontin-
uous and bounded Borel measurable functions on X, respectively. Further,
let N stand for the set of positive integers. Choose N ∈ N and define the
truncated cost function
cN (x,a) = min{N,c(x,a)}.
The following result was proved under Condition (W) for bounded cost
functions by a fixed point argument; see page 72 in [10]. However, a simple
and obvious modification of the proof gives the conclusion under Condition
(S) as well.
RISK-SENSITIVE CONTROL 9
Proposition 2. Under (W) [(S)], for any discount factor β ∈ (0,1) and
a number N ∈N, there exists a unique function wNβ ∈ Lb(X) [w
β ∈Bb(X)]
such that
= min
a∈A(x)
N (x,a)
q(dy|x,a)
for each x ∈X, and
0 ≤ (1 − β)wNβ (x) ≤Nγ.(9)
Moreover, there exists a stationary strategy f0 ∈ F (possibly depending on β
and N) that attains the minimum in (8).
Let β and N be fixed just in the next lemma.
Lemma 1. Assume (W) or (S). Then, it holds
wNβ (x)
= inf
p∈Q(π)
Eπpx β
k[γcN (xk, ak) −R(p(·|xk, ak)‖q(·|xk, ak))]
for any initial state x ∈X.
Proof. Note that (8) can be rewritten in the following equivalent form:
wNβ (x) = min
a∈A(x)
γcN (x,a) + log
q(dy|x,a)
.(11)
Applying Lemma A in the Appendix to (11), we get
wNβ (x)
= min
a∈A(x)
µ∈∆(x,a)
γcN (x,a) −R(µ‖q(·|x,a)) + β
wNβ (y)µ(dy)
∆(x,a) := {µ ∈ Pr(X) :R(µ‖q(·|x,a)) < +∞}, (x,a) ∈K.
Moreover, the measure
µ0(dy)[x,a] =
q(dy|x,a)
q(dy|x,a)
achieves the supremum in (12). Put
p0(dy|x,a) = µ0(dy)[x,a] for each (x,a) ∈K.(13)
10 A. JAŚKIEWICZ
Note that p0 ∈Q(π) for any strategy π ∈ Π. This directly follows from the
definition of R(p0(·|x,a)‖q(·|x,a)) and (9). Simple calculations give the up-
per bound
R(p0(·|x,a)‖q(·|x,a)) ≤ 2
1 − β
for every (x,a) ∈K.
Let p0 be defined as in (13). By (12), we then have
wNβ (x) ≤ γc
N (x,a) −R(p0(·|x,a)‖q(·|x,a)) + β
wNβ (y)p
0(dy|x,a).
By iteration of this inequality n times, it follows
wNβ (x) ≤
βkEπp
x [γc
N (xk, ak) −R(p
0(·|xk, ak)‖q(·|xk, ak))]
+ βn+1Eπp
β (xn+1),
where π is any strategy for the controller. Now, letting n→∞ and making
use of (9), we conclude
wNβ (x) ≤
βkEπp
x [γc
N (xk, ak) −R(p
0(·|xk, ak)‖q(·|xk, ak))].
Since π is arbitrary, we get
wNβ (x) ≤ inf
βkEπp
x [γc
N (xk, ak) −R(p
0(·|xk, ak)‖q(·|xk, ak))]
≤ inf
p∈Q(π)
βkEπpx [γc
N (xk, ak)(14)
−R(p(·|xk, ak)‖q(·|xk, ak))].
Note that inequality (14) is valid because p0 ∈Q(π).
On the other hand, by (12), we can write
wNβ (x) ≥ γc
N (x, f0(x)) −R(p(·|x, f0(x))‖q(·|x, f0(x)))
wNβ (y)p(dy|x, f
0(x)),
with f0 as in Proposition 2 and any p ∈Q(f0). Proceeding along the same
line, we infer
wNβ (x) ≥
x [γc
N (xk, f
0(xk)) −R(p(·|xk, f
0(xk))‖q(·|xk, f
0(xk)))].
RISK-SENSITIVE CONTROL 11
Since p ∈Q(f0) is arbitrary, we easily deduce
wNβ (x) ≥ sup
p∈Q(f0)
x [γc
N (xk, f
0(xk))
−R(p(·|xk, f
0(xk))‖q(·|xk, f
0(xk)))]
≥ inf
p∈Q(π)
βkEπpx [γc
N (xk, ak)
−R(p(·|xk, ak)‖q(·|xk, ak))].
Finally, combining (14) with (15) completes the proof. �
In the remainder of the paper, we shall use the following notation. Let
L(X) denote the set of all lower semicontinuous functions on X, whereas
B(X) stands for the set of all Borel measurable functions on X.
Lemma 2. Let (W) [(S)] hold and β ∈ (0,1). Then, we have the follow-
(a) The function
wβ(x) := lim
wNβ (x)
is finite and nonnegative for each x ∈X. Moreover, wβ ∈L(X) [wβ ∈B(X)].
(b) The functional equation holds
ewβ(x) = min
a∈A(x)
eγc(x,a)
eβwβ(y)q(dy|x,a)
for all x ∈X. Furthermore, there exists a Borel measurable selector fβ ∈ F
of the minima in (16).
(c) For any x ∈X, wβ(x) = Vβ(x).
Proof. Let x ∈X and β ∈ (0,1) be fixed. From (10), it is easily seen
that the sequence {wNβ (x)} is nondecreasing in N. Therefore, wβ(x) =
limN→∞w
β (x) exists and by (9), it is nonnegative. Clearly, under (S),
wβ ∈B(X), whereas, under (W), wβ ∈L(X); see Proposition 10.1 in [26].
In order to prove that wβ(x) is finite for each x ∈X, observe first that,
for any π ∈ Π, p ∈Q(π) and N ∈N,
Vβ(x,π, p) =
βkEπpx [γc(xk, ak) −R(p(·|xk, ak)‖q(·|xk, ak))]
βkEπpx [γc
N (xk, ak) −R(p(·|xk, ak)‖q(·|xk, ak))].
12 A. JAŚKIEWICZ
Moreover, from Lemma 1, we have
Vβ(x) = inf
p∈Q(π)
Vβ(x,π, p)
≥ inf
p∈Q(π)
βkEπpx [γc
N (xk, ak) −R(p(·|xk, ak)‖q(·|xk, ak))]
= wNβ (x).
Hence, letting N →∞, it follows
Vβ(x) ≥ lim
wNβ (x) = wβ(x).(17)
By (5), Vβ(x) is finite for each x ∈X, so is wβ(x). This finishes the proof of
part (a).
In order to prove part (b), note that by (11) and part (a) the limit
a∈A(x)
γcN (x,a) + log
q(dy|x,a)
exists. Since the first and the second term in (18) are nondecreasing and
(W) or (S) holds, then we may interchange the limit with the minimum
(see Proposition 10.1 in [26]). Furthermore, making use of the Lebesgue
monotone convergence theorem, we conclude (16). The existence of a Borel
measurable selector fβ ∈ F follows from the compactness–semicontinuity
assumptions and Proposition D.5 in [17].
We now turn to proving part (c). Again, taking a logarithm on both sides
of (16), it follows
wβ(x) = min
a∈A(x)
γc(x,a) + log
eβwβ(y)q(dy|x,a)
.(19)
Applying Lemma A in the Appendix to (19), we easily obtain
wβ(x)
= min
a∈A(x)
µ∈∆(x,a)
γc(x,a) −R(µ‖q(·|x,a)) + β
wβ(y)µ(dy)
∆(x,a) = {µ ∈ Pr(X) :R(µ‖q(·|x,a)) < +∞}, (x,a) ∈K.
Observe that by (20), for any p ∈Q(fβ), the following holds:
wβ(x) ≥ γc(x, fβ(x)) −R(p(·|x, fβ(x))‖q(·|x, fβ(x)))
wβ(y)p(dy|x, fβ(x)).
RISK-SENSITIVE CONTROL 13
Iterating this inequality n times, we immediately obtain
wβ(x) ≥
x [γc(xk, fβ(xk))
−R(p(·|xk, fβ(xk))‖q(·|xk, fβ(xk)))]
+ βn+1E
x wβ(xn+1)(21)
x [γc(xk, fβ(xk))
−R(p(·|xk, fβ(xk))‖q(·|xk, fβ(xk)))].
Now note that, by Definition 1,
x [γc(xk, fβ(xk)) −R(p(·|xk, fβ(xk))‖q(·|xk, fβ(xk)))] ≥−C,
for some C ≥ 0 and k ≥ 1. Thus, letting n→∞ in (21), it follows
wβ(x) ≥
x [γc(xk, fβ(xk)) −R(p(·|xk, fβ(xk))‖q(·|xk, fβ(xk)))]
= Vβ(x, fβ, p).
Since p ∈Q(fβ) is arbitrary, we see that
wβ(x) ≥ sup
p∈Q(fβ)
Vβ(x, fβ, p) ≥ Vβ(x).(22)
Inequalities (17) and (22) combined conclude the proof of part (c). �
4. A solution to the risk-sensitive control problem. For any x ∈X and
any discount factor β ∈ (0,1), define
hβ(x) := Vβ(x) −mβ
with mβ = infx∈X Vβ(x). Obviously, hβ is nonnegative.
The following boundedness assumption is supposed to hold true. As men-
tioned in the Introduction, we put off discussing it until Section 5:
Condition (B). For any x ∈X , supβ∈(0,1) hβ(x) < +∞.
Remark 3. A similar assumption and its equivalent variants were used
to study the expected average cost criterion for Markov decision processes
in the risk-neutral setting [17, 27, 28]. Roughly speaking, Hernández-Lerma
and Lasserre [17], Schäl [27], and Sennott [28] assume that the family of the
so-called normalized β-discounted cost functions is bounded. This assump-
tion, however, simply holds for ergodic Markov decision processes. More
14 A. JAŚKIEWICZ
precisely, if the n-step transition probabilities converge to the unique in-
variant probability measure geometrically fast, and the cost functions are
bounded (or more generally satisfy a certain growth hypothesis), then the
aforementioned family of functions is pointwise relatively compact [21, 22].
It is worth pointing out that this requirement is crucial to obtain the opti-
mality inequality in the risk-neutral case; see [27, 28]. In Section 5 we provide
an example that illustrates that also in the risk-sensitive case Condition (B)
cannot be weakened.
We shall need the following two versions of Fatou’s lemma for converging
measures.
Lemma 3. Let {µn} be a sequence of probability measures converging to
µ ∈ Pr(X) and let {hn} be a sequence of measurable nonnegative functions
on X. Then,
h(y)µ(dy) ≤ lim inf
hn(y)µn(dy)
in the following cases:
(a) {µn} converges setwise to µ [i.e.,
f(y)dµn(y) →
f(y)dµ(y)∀f ∈
Bb(X)], and h(x) = lim infn→∞ hn(x);
(b) {µn} converges weakly to µ, and h(x) = inf{lim infn→∞ hn(xn) :xn →
x}; moreover, h ∈ L(X).
Proof. Part (a) is due to Royden [25], page 231, whereas part (b) was
proved by Serfozo [29]. For the proof of lower semicontinuity of h, the reader
is referred to Lemma 3.1 in [22]. �
Now we are in a position to state the main result of the paper. This theo-
rem concerns a study of the risk-sensitive average cost optimality inequality,
which is sufficient to establish the existence of an optimal stationary policy.
Theorem 1. Assume (B) and (W) [or (S)]. Then, for each risk factor
γ > 0, there exist a constant l̂ and a nonnegative function h ∈ L(X) [h ∈
B(X)] and f̂ ∈ F such that
h(x) + l̂ ≥ min
a∈A(x)
γc(x,a) + log
eh(y)q(dy|x,a)
= γc(x, f̂(x)) + log
eh(y)q(dy|x, f̂(x))
RISK-SENSITIVE CONTROL 15
for all x ∈X. Moreover,
= inf
J(x,π) = J(x, f̂).
In other words, l̂/γ is the optimal risk-sensitive average cost and f̂ is a
risk-sensitive average cost optimal stationary policy.
Remark 4. (a) There are two papers [16, 27] that can be treated as
predecessors of our work. They both deal with the optimality inequality but
within two different frameworks. The first work [16] establishes the optimal-
ity equation for the risk-sensitive dynamic programming on a denumarable
state space. In the other one, the result is obtained for Markov control pro-
cesses on an uncountable state space for the risk factor γ = 0. From this
point of view, our result is an extention of Theorem 4.1 in [16] to a general
state space and Theorem 3.8 in [27] to the risk-sensitive case. Moreover, the
common feature of the discussed results is that their proofs are based on the
vanishing discount factor approach. Our proof also relies on this method, and
similarly, as in [27] or [21, 22], makes use of the Fatou lemmas for setwise
and weakly convergent measures.
(b) Finally, it is also worth mentioning that there are papers studying the
optimality equation in the risk-sensitive dynamic programming, which is of
the following form:
h(x) + l̂ = min
a∈A(x)
γc(x,a) + log
eh(y)q(dy|x,a)
.(24)
The constant l̂
is (under suitable assumptions) an optimal cost with respect
to the risk-sensitive average cost criterion. Let us mention and discuss a few
representative papers that deal with equation (24). In [8, 15] Markov control
models satisfying a simultaneous Doeblin condition, on a finite and countable
state space, respectively, are considered. The cost functions are supposed to
be bounded and the risk factor must be sufficiently small. Otherwise, as
argued in [8], the optimality equation need not have a solution.
In [10] Di Masi and Stettner extend the result to a general state space
by retaining bounded cost functions and replacing a simultaneous Doeblin
condition with a very strong assumption on transition probabilities. In [11],
however, they replace this assumption by one imposed on the risk coeffi-
cient. Finally, the class of Markov control models that requires neither any
ergodicity conditions nor the smallness of the risk factor was pointed out by
Jaśkiewicz in [20].
Fairly recently Borkar and Meyn [5] considered Markov decision processes
with unbounded cost functions on a denumarable state space. Their result
16 A. JAŚKIEWICZ
assumes the following: the state space is irreducible under all Markov poli-
cies, the costs are norm-like, and there exists a policy that induces a finite
average risk-sensitive cost. Moreover, their proof is based on a multiplicative
ergodic theorem that was studied in more detail in [1].
Proof of Theorem 1. Let {βn} be a sequence of discount factors
converging to 1 for which (7) holds. Defining
l̂ := l = lim
(1 − βn)mβn
and applying (6), we note that
≤ inf
J(x,π)(25)
for any x ∈X. Assume for a while that inequality (23) is satisfied and there
exists f̂ ∈ F as in the statement of Theorem 1. We prove that f̂ is an optimal
policy. From (23), we have
h(x) ≥ γc(x, f̂(x)) − l̂ + log
eh(y)q(dy|x, f̂(x)).
By iteration of this inequality n times, we obtain
h(x) ≥ logEπx exp
γc(xk, f̂(xk)) + h(xn+1)
− (n+ 1)l̂.
Since h is nonnegative, we infer
+ l̂≥
Jn+1(x, f̂)
with Jn+1(x, f̂) defined in (3). Letting n→∞, it follows
≥ J(x, f̂), x ∈X.(26)
Hence, (25) and (26) together imply
= J(x, f̂) = inf
J(x,Π)
for each x ∈X.
We next focus on showing inequality (23). Let n≥ 1 and put hn := hβn ,
fn := fβn. Note that (19) can be rewritten in the following form:
(1 − βn)mβn + hn(x) = min
a∈A(x)
γc(x,a) + log
eβnhn(y)q(dy|x,a)
= γc(x, fn(x)) + log
eβnhn(y)q(dy|x, fn(x)).
RISK-SENSITIVE CONTROL 17
(i) Assume first (S) and define
h(x) = lim inf
hn(x).
Taking the lim inf on both sides of (27), we get
lim inf
((1 − βn)mβn + hn(x))
= l̂ + h(x) = lim inf
a∈A(x)
γc(x,a) + log
eβnhn(y)q(dy|x,a)
Making use of Lemma 3(a) and the measurable selection theorem (see Propo-
sition D.5(a) in [17]), one can prove that there exists f̂ ∈ F such that (23)
holds.
(ii) Now assume (W). Fix x0 ∈X and choose any xn → x0, n→∞. Take
a subsequence {nk} of positive integers such that
lim inf
hn(xn) = lim
hnk(xnk).
Then by (27),
lim inf
((1− βn)mβn + hn(xn))
= l̂ + lim inf
hn(xn) = l̂ + lim
hnk(xnk)
= lim
a∈A(xnk )
γc(xnk , a) + log
eβnkhnk (y)q(dy|xnk , a)
= lim
γc(xnk , fnk(xnk)) + log
eβnkhnk (y)q(dy|xnk , fnk(xnk))
Note that G = {x0} ∪ {xn} is compact in X. From the upper semicontinu-
ity of x 7→A(x), compactness of every A(z) and Berge’s theorem (see [2] or
Theorem 7.4.2 in [23]), it follows that
z∈GA(z) is compact in A. There-
fore, {fnk(xnk)} has a subsequence converging to some a0 ∈A. By (W)(i),
a0 ∈ A(x0), that is, (x0, a0) ∈ K. Without loss of generality, assume that
fnk(xnk) → a0, k →∞. By the lower semicontinuity of the cost function c
and (28), we have
l̂ + lim inf
hn(xn) ≥ γc(x0, a0) + lim
eβnkhnk (y)q(dy|xnk , fnk(xnk)).
This and Lemma 3(b) imply that
l̂ + lim inf
hn(xn) ≥ γc(x0, a0) + log
eh̃(y)q(dy|x0, a0),
where eh̃ is the generalized lim inf of the sequence eh̃k = ehnk . Clearly, h≤ h̃.
By Lemma 3(b), h ∈L(X). Thus,
l̂ + lim inf
hn(xn) ≥ γc(x0, a0) + log
eh(y)q(dy|x0, a0).(29)
18 A. JAŚKIEWICZ
Since xn → x0 was chosen arbitrarily, we infer from (29) that
l̂ + h(x0) ≥ γc(x0, a0) + log
eh(y)q(dy|x0, a0).
The last inequality shows that, for any x ∈X, there exists an ax ∈A(x) such
l̂ + h(x) ≥ γc(x,ax) + log
eh(y)q(dy|x,ax)
≥ min
a∈A(x)
γc(x,a) +
eh(y)(y)q(dy|x,a)
By our compactness–semicontinuity assumptions and Proposition D.5(b) in
[17], there exists some f̂ ∈ F such that (23) holds. �
5. A discussion. This section is devoted to a discussion of Condition (B).
We start with revisiting Example 3.1 in [8].
Example 1. Put X = {0,1}, A = {a}, c(x) := c(x,a) = x and the tran-
sition matrix is as follows:
ρ 1 − ρ
where ρ ∈ (0,1). Recall that the following was proved.
Let us consider three cases for the risk factor γ:
(I) γ <− log(1− ρ),
(II) γ = − log(1− ρ),
(III) γ >− log(1− ρ).
Then if (I) or (II) hold, the optimal risk-sensitive average cost equals 0
and is independent of the initial state. In case (III) we have J∗(0) = 0 and
J∗(1) = 1 +
log(1−ρ)
> 0. In addition, it is interesting to observe that, for
(II) and (III) cases, there does not exist a function h :X 7→ R such that
optimality inequality (23) is satisfied. Indeed, to see this take x = 1 and
consider (III). The optimality inequality is then as follows:
γJ∗(1) + h(1) = γ + log(1 − ρ) + h(1) ≥ γ + log(eh(1)(1− ρ) + eh(0)ρ).
Note that the right-hand side is strictly greater than γ + log(eh(1)(1 − ρ)),
which equals to the left-hand side. Similar calculations for case (II) also
lead to a contradiction. Hence, although an optimal cost is constant, the
optimality inequality need not have a solution.
Now we turn to checking Condition (B). Let Vβ be as in Lemma 2. Clearly,
Vβ = w
β for N ≥ 1 and Vβ(0) = 0. Then, by (8) under (I), we get
Vβ(1) = γ + log[e
βVβ(1)(1 − ρ) + ρ] < γ + log[eVβ(1)(1 − ρ) + ρ].
RISK-SENSITIVE CONTROL 19
Hence,
Vβ(1) < log
eγ(1− ρ)
1− eγ(1− ρ)
∀β ∈ (0,1),
and consequently, supβ∈(0,1) hβ(x) < +∞.
Now let the risk factor γ be as in (III). Then by (8),
Vβ(1) > γ + log(1 − ρ) + βVβ(1),
which in turn implies that
Vβ(1) >
γ + log(1− ρ)
Thus, hβ(1) = Vβ(1) goes to the infinity when β ր 1.
For case (II), we obtain
Vβ(1) = − log(1− ρ) + log[e
βVβ(1)(1− ρ) + ρ]
= βVβ(1) + log
1 + e−βVβ(1)
1 − ρ
If Vβ(1) ր +∞ when β ր 1, then the right-hand side of (31) also goes to the
infinity. On the contrary, assume that supβ∈(0,1) Vβ(1) ≤C for some constant
C > 0. Then,
Vβ(1) ≥
log[1 + e−Cρ/(1 − ρ)]
which leads to a contradiction when β ր 1. In consequence, in case (II) the
family {hβ(1)} does not satisfy Condition (B) either.
Therefore, the following conclusion can be drawn. Condition (B) is nec-
essary to obtain a solution to the optimality inequality.
For a verification of Condition (B), one can use Lemma 4 below. For a
similar result in the risk-neutral, case we refer to [27, 28]. For some η ≥ 0,
define the stopping time
τ = τ(β) := inf{n≥ 0 :Vβ(xn) ≤mβ + η}.
Lemma 4. For η ≥ 0, β ∈ (0,1) and x ∈X,
hβ(x) ≤ η + inf
logEπx exp
γc(xk, ak)
Proof. By Lemma 2(b), (c) and the fact that Vβ(y) ≥ 0, y ∈ X , we
Vβ(x) = min
a∈A(x)
γc(x,a) + log
eβVβ(y)q(dy|x,a)
< γc(x,a) + log
eVβ(y)q(dy|x,a)
20 A. JAŚKIEWICZ
for each x ∈X. Subtracting mβ from both sides in (32), we obtain
Vβ(x) −mβ < γc(x,a) + log
e(Vβ(y)−mβ )q(dy|x,a).
Iteration of this inequality up to the stopping time τ yields
Vβ(x) −mβ < logE
c(xk,ak)+η
= η + logEπx exp
c(xk, ak)
Since π ∈ Π is an arbitrary policy, we easily get the conclusion. �
Note that the fact
Eπx exp
γc(xk, ak)
< +∞(33)
has the following interpretation: before the process will reach “good states,”
the incurred costs at “early stages” should not be too large. Indeed, let us
define a set D as follows. We say that
x ∈D iff Vβ(x) ≤mβ + η
for a certain η ≥ 0. Clearly, D 6= ∅. Denote by τD the first return time of
the process, governed by fβ, to set D. Certainly, if (33) holds with τ := τD,
then Condition (B) is satisfied.
In Example 1 we can take D = {0} and η = 0, since Vβ(0) ≤ 0 + 0. If γ is
as in (I), then (33) holds:
E1 exp
τ0−1∑
γc(xk)
enγ(1− ρ)n−1ρ =
eγ(1− ρ)
1− eγ(1− ρ)
In other cases (33) fails to hold and, in addition, the earlier calculations
show that hβ(1) = +∞.
Summing up, the presented example shows that, without Condition (B)
imposed on the family of functions {hβ(x)}, β ∈ (0,1), a solution to the
optimality inequality need not exist, and moreover, the optimal risk-sensitive
average cost may depend on the initial state. In view of the above discussion,
Condition (B) is designed to prevent the accrual of infinite expected costs.
Namely, the costs incurred at transient states, that may be occupied only
at “early stages,” have an important and definite influence on a long-run
performance measure. Therefore, Condition (B) requires the model to be
sort of communicating insofar as certain sets of “good states” to be reached
sufficiently fast. Then, the optimal risk-sensitive average cost is constant and
the optimality inequality takes place. In addition, it is worth mentioning that
RISK-SENSITIVE CONTROL 21
the ergodicity itself of a Markov process/chain does not help so much as in
the risk-neutral case. In other words, for an ergodic Markov chain, it may
happen that the optimal risk-sensitive average cost depends on the initial
state as in Example 1. Moreover, in this example one can even prove in
a straightforward way that under case (I) [either under Condition (B) or
for sufficiently small risk factors], the optimality equation (24) is satisfied.
Therefore, it would be interesting to know whether Condition (B) (together
with some compactness–continuity assumptions) is sufficient to obtain a
solution to the optimality equation. There is a conjecture that, since in the
risk-neutral case a counterpart of Condition (B) is not sufficient [7], neither
is it in the risk-sensitive setting. But this question is beyond the scope of
the paper and remains open.
APPENDIX
The lemma below establishes a variational formula for the logarithmic
moment-generating function. The reader is referred to Theorem 4.5.1 and
Proposition 1.4.2 in [12] for its proof.
Lemma A. Let X be a Polish space, h a measurable function mapping
on X into R, which is either bounded from below or bounded from above,
and ν a probability measure on X .
(a) Then, we have the variational formula
ehdν = sup
−R(µ‖ν) +
where
∆ = {µ ∈ Pr(X ) :R(µ‖ν) < +∞}.
(b) Let µ0 denote the probability measure on X , which is µ0 ≪ ν and
satisfies
(x) =
eh(x)∫
eh dν
Then, the supremum in the variational formula is attained uniquely at µ0.
Acknowledgments. A part of this research was done while the author
was a Humboldt research fellow and visiting the University of Ulm. The
author gratefully acknowledges support from the Alexander von Humboldt
Foundation.
The second part of this paper was written at the Institute of Mathematics
and Computer Science, Wroc law University of Technology.
The author is greatly indebted to Professor Ulrich Rieder for drawing
her attention to paper [16], suggesting the problem and for several helpful
conversations.
22 A. JAŚKIEWICZ
REFERENCES
[1] Balaji, S. and Meyn, S. P. (2000). Multiplicative ergodicity and large deviations for
an irreducible Markov chains. Stochastic Process. Appl. 90 123–144. MR1787128
[2] Berge, E. (1963). Topological Spaces. MacMillan, New York.
[3] Bielecki, T., Hernández-Hernández, D. and Pliska, S. (1999). Risk-senisitive
control of finite state Markov chains in discrete time, with applications to port-
folio managment. Math. Methods Oper. Res. 50 167–188. MR1732397
[4] Bielecki, T. and Pliska, S. (1999). Risk-senisitive dynamic asset managment. Appl.
Math. Optim. 39 337–360. MR1675114
[5] Borkar, V. S. and Meyn, S. P. (2002). Risk-sensitive optimal control for
Markov decision processes with monotone cost. Math. Oper. Res. 27 192–209.
MR1886226
[6] Brown, L. D. and Purves, R. (1973). Measurable selections of extrema. Ann.
Statist. 1 902–912. MR0432846
[7] Cavazos-Cadena, R. (1991). A counterexample on the optimality equation in
Markov decision chains with the average cost criterion. Systems Control Lett. 16
387–392.
[8] Cavazos-Cadena, R. and Fernández-Gaucherand, E. (1999). Controlled Markov
chains with risk-sensitive criteria: Average cost, optimal equations and optimal
solutions. Math. Methods Oper. Res. 49 299–324. MR1687362
[9] Dai Pra, P., Meneghini, L. and Runggaldier, W. J. (1996). Some connections
between stochastic control and dynamic games. Math. Control Signals Systems
9 303–326. MR1450355
[10] Di Masi, G. B. and Stettner,  L. (2000). Risk-sensitive control of discrete-time
Markov processes with infinite horizon. SIAM J. Control Optim. 38 61–78.
MR1740607
[11] Di Masi, G. B. and Stettner,  L. (2000). Infinite horizon risk sensitive control of
discrete time Markov processes with small risk. Systems Control Lett. 40 15–20.
MR1829070
[12] Dupuis, P. and Ellis, R. S. (1997). A Weak Convergence Approach to the Theory
of Large Deviations. Wiley, New York. MR1431744
[13] Filar, J. and Vrieze, K. (1997). Competitive Markov Decision Processes. Springer,
New York. MR1418636
[14] Fleming, W. H. and Hernández-Hernández, D. (1997). Risk-sensitive control of
finite state machines on an infinite horizon. SIAM J. Control Optim. 35 1790–
1810. MR1466928
[15] Hernández-Hernández, D. and Marcus, S. I. (1996). Risk sensitive control of
Markov processes in countable state space. Systems Control Lett. 29 147–155.
[Corrigendum (1998) Systems Control Lett. 34 105–106.] MR1422212
[16] Hernández-Hernández, D. and Marcus, S. I. (1999). Existence of risk-sensitive
optimal stationary policies for controlled Markov processes. Appl. Math. Optim.
40 273–285. MR1709324
[17] Hernández-Lerma, O. and Lasserre, J. B. (1993). Discrete-Time Markov Control
Process: Basic Optimality Criteria. Springer, New York. MR1363487
[18] Howard, R. A. and Matheson, J. E. (1972). Risk-sensitive Markov decision pro-
cesses. Management Sci. 18 356–369. MR0292497
[19] Jacobson, D. H. (1973). Optimal stochastic linear systems with exponential per-
formance criteria and their relation to deterministic differential games. IEEE
Trans. Automat. Control 18 124–131. MR0441523
http://www.ams.org/mathscinet-getitem?mr=1787128
http://www.ams.org/mathscinet-getitem?mr=1732397
http://www.ams.org/mathscinet-getitem?mr=1675114
http://www.ams.org/mathscinet-getitem?mr=1886226
http://www.ams.org/mathscinet-getitem?mr=0432846
http://www.ams.org/mathscinet-getitem?mr=1687362
http://www.ams.org/mathscinet-getitem?mr=1450355
http://www.ams.org/mathscinet-getitem?mr=1740607
http://www.ams.org/mathscinet-getitem?mr=1829070
http://www.ams.org/mathscinet-getitem?mr=1431744
http://www.ams.org/mathscinet-getitem?mr=1418636
http://www.ams.org/mathscinet-getitem?mr=1466928
http://www.ams.org/mathscinet-getitem?mr=1422212
http://www.ams.org/mathscinet-getitem?mr=1709324
http://www.ams.org/mathscinet-getitem?mr=1363487
http://www.ams.org/mathscinet-getitem?mr=0292497
http://www.ams.org/mathscinet-getitem?mr=0441523
RISK-SENSITIVE CONTROL 23
[20] Jaśkiewicz, A. (2006). A note on risk-sensitive control of invariant models. Technical
Report, Wroc law University of Technology.
[21] Jaśkiewicz, A. and Nowak, A. S. (2006). On the optimality equation for average
cost Markov control processes with Feller transition probabilities. J. Math. Anal.
Appl. 316 495–509. MR2206685
[22] Jaśkiewicz, A. and Nowak, A. S. (2006). Zero-sum ergodic stochastic games with
Feller transition probabilities. SIAM J. Control Optim. 45 773–789. MR2247715
[23] Klein, E. and Thompson, A. C. (1984). Theory of Correspondences. Wiley, New
York. MR0752692
[24] Neveu, J. (1965). Mathematical Foundations of the Calculus of Probability. Holden-
Day, San Francisco, CA. MR0198505
[25] Royden, H. L. (1968). Real Analysis. MacMillan, New York. MR0151555
[26] Schäl, M. (1975). Conditions for optimality in dynamic programming and for the
limit n-stage optimal policies to be optimal. Z. Wahrsch. Verw. Gebiete 32 179–
196. MR0378841
[27] Schäl, M. (1993). Average optimality in dynamic programming with general state
space. Math. Oper. Res. 18 163–172. MR1250112
[28] Sennott, L. I. (1999). Stochastic Dynamic Programming and the Control of Queue-
ing Systems. Wiley, New York. MR1645435
[29] Serfozo, R. (1982). Convergence of Lebesgue integrals with varying measures.
Sankhyã Ser. A 44 380–402. MR0705462
[30] Stettner,  L. (1999). Risk sensitive portfolio optimization. Math. Methods Oper.
Res. 50 463–474. MR1731299
[31] Whittle, P. (1990). Risk-Sensitive Optimal Control. Wiley, Chichester. MR1093001
Institute of Mathematics and Computer Science
Wroc law University of Technology
Wybrzeże Wyspiańskiego 27
PL-50-370 Wroc law
Poland
E-mail: ajaskiew@im.pwr.wroc.pl
http://www.ams.org/mathscinet-getitem?mr=2206685
http://www.ams.org/mathscinet-getitem?mr=2247715
http://www.ams.org/mathscinet-getitem?mr=0752692
http://www.ams.org/mathscinet-getitem?mr=0198505
http://www.ams.org/mathscinet-getitem?mr=0151555
http://www.ams.org/mathscinet-getitem?mr=0378841
http://www.ams.org/mathscinet-getitem?mr=1250112
http://www.ams.org/mathscinet-getitem?mr=1645435
http://www.ams.org/mathscinet-getitem?mr=0705462
http://www.ams.org/mathscinet-getitem?mr=1731299
http://www.ams.org/mathscinet-getitem?mr=1093001
mailto:ajaskiew@im.pwr.wroc.pl
	Introduction and the model
	Preliminaries
	A solution to the auxiliary discounted minimax problem
	A solution to the risk-sensitive control problem
	A discussion
	Appendix
	Acknowledgments
	References
	Author's addresses
ABSTRACT
  This paper deals with discrete-time Markov control processes on a general
state space. A long-run risk-sensitive average cost criterion is used as a
performance measure. The one-step cost function is nonnegative and possibly
unbounded. Using the vanishing discount factor approach, the optimality
inequality and an optimal stationary strategy for the decision maker are
established.

<|endoftext|><|startoftext|>
ZJOU-PHY-TH-07-02
NJNU-TH-07-11
A Study of B0d → J/Ψη(′) Decays in the pQCD Approach
Xin Liua∗, Zhen-Jun Xiaob†, Hui-Sheng Wangc
a. Department of Physics, Zhejiang Ocean University,
Zhoushan, Zhejiang 316000, P.R. China
b. Department of Physics and Institute of Theoretical Physics,
Nanjing Normal University, Nanjing, Jiangsu 210097, P.R. China and
c. Department of Applied Mathematics and Physics,
Anhui University of Technology and Science,
Wuhu, Anhui 241000, P.R. China
(Dated: November 4, 2018)
Abstract
Motivated by the very recent measurement of the branching ratio of B0d → J/ψη decay, we
calculate the branching ratios of Bd
0 → J/ψη and Bd0 → J/Ψη′ decays in the perturbative
QCD (pQCD) approach. The pQCD predictions for the branching ratios of considered decays
are: BR(B0d → J/Ψη) = (1.96
+9.68
−0.65) × 10−6, which is consistent with the first experimental
measurement within errors; while BR(B0d → J/Ψη′) = (1.09
+3.76
−0.25) × 10−6, very similar with
B0d → J/Ψη decay and can be tested by the forthcoming LHC experiments. The measurements
of these decay channels may help us to understand the QCD dynamics in the corresponding
energy scale, especially the reliability of pQCD approach to these kinds of B meson decays.
PACS numbers: 13.25.Hw, 12.38.Bx, 14.40.Nd
∗ liuxin@zjou.edu.cn
† xiaozhenjun@njnu.edu.cn
http://arxiv.org/abs/0704.0395v1
Very recently, the first observation of B0d → J/Ψη decay was reported by Belle Collab-
oration [1], and the branching ratio measured is
BR(B0d → J/Ψη) = (9.5± 1.7(stat)± 0.8(syst))× 10−6, (1)
which is consistent with the currently available theoretical predictions [1, 2, 3].
Up to now, the theoretical calculations for the branching ratios of Bd → J/Ψη(′) decays
were obtained by using the heavy quark factorization approximation in Ref. [2], or from
the measured J/Ψπ0 and J/ΨK0 branching ratios[3, 4, 5] based on the assumption of
the SU(3) flavor symmetry of strong interaction. In this paper, we will calculate the
branching ratios of B0d → J/Ψη and B0d → J/Ψη(′) decays directly by employing the
low energy effective Hamiltonian [6] and the perturbative QCD (pQCD) factorization
approach [7, 8, 9].
The paper is organized as follows: we present the formalism used in the calculation of
B0d → J/ψη(′) decays in Sec. I. In Sec. II, we show the numerical results and compare
them with the measured values. A short summery and some conclusions are also included
in this section.
I. FORMALISM AND PERTURBATIVE CALCULATIONS
The pQCD approach has been developed earlier from the QCD hard-scattering ap-
proach [7], and has been used frequently to calculate various B meson decay channels
[7, 8, 9, 10]. For two body charmless hadronic Bd,s → Mη(′) (here M stands for the
pseudo-scalar or vector light mesons composed of the light quarks u, d, s) decays, the
pQCD predictions generally agree well with the measured values [9, 10, 11].
In Refs. [12, 13], the authors calculated B → D∗sK,D
s and Bs → D(∗)+D(∗)−
decays and found that the pQCD approach works well for such decays. Here we try to
apply the pQCD approach to calculate the B meson decays involving the heavier J/Ψ
meson as one of the two final state mesons.
A. Formulism
In pQCD approach, the decay amplitude of B → J/ΨP (P = η, η(′) here) decay can
bo written conceptually as the convolution,
A(B →M1M2) ∼
d4k1d
4k3 Tr
C(t)ΦB(k1)ΦJ/Ψ(k2)ΦP (k3)H(k1, k2, k3, t)
, (2)
where the term “Tr” denotes the trace over Dirac and color indices. C(t) is the Wilson
coefficient which results from the radiative corrections at short distance. In the above
convolution, C(t) includes the harder dynamics at larger scale thanMB scale and describes
the evolution of local 4-Fermi operators from mW (the W boson mass) down to t ∼
Λ̄MB) scale, where Λ̄ ≡ MB −mb. The function H(k1, k2, k3, t) is the hard part and
can be calculated perturbatively. The function ΦM is the wave function which describes
hadronization of the quark and anti-quark to the mesonM . While the functionH depends
on the process considered, the wave function ΦM is independent of the specific process.
Using the wave functions determined from other well measured processes, one can make
quantitative predictions here.
Using the light-cone coordinates the B meson and the two final state meson momenta
can be written as
(1, 1, 0T ), P2 =
(1, r2, 0T ), P3 =
(0, 1− r2, 0T ), (3)
respectively, where r = MJ/Ψ/MB, and the light meson masses m
η have been ne-
glected. The longitudinal polarization vector of the J/Ψ meson, ǫL, is given by ǫL =
2MJ/Ψ
(1,−r2, 0T ). Putting the light (anti-) quark momenta in B, J/Ψ and η(
′) mesons
as k1, k2, and k3, respectively, we can choose
k1 = (x1P
1 , 0,k1T ), k2 = (x2P
2 , 0,k2T ), k3 = (0, x3P
3 ,k3T ). (4)
Then, for B → J/Ψη decay for example, the integration over k−1 , k−2 , and k+3 in eq.(2)
will lead to
A(B → J/Ψη′) ∼
dx1dx2dx3b1db1b2db2b3db3
C(t)ΦB(x1, b1)ΦJ/Ψ(x2, b2)Φη(x3, b3)H(xi, bi, t)St(xi) e
−S(t)] ,(5)
where bi is the conjugate space coordinate of kiT , and t is the largest energy scale in
functionH(xi, bi, t). The large logarithms ln(mW/t) are included in the Wilson coefficients
C(t). The large double logarithms (ln2 xi) on the longitudinal direction are summed by
the threshold resummation [14], and they lead to St(xi) which smears the end-point
singularities on xi. The last term, e
−S(t), is the Sudakov form factor which suppresses the
soft dynamics effectively [15]. Thus it makes the perturbative calculation of the hard
part H applicable at intermediate scale, i.e., MB scale. We will calculate analytically the
function H(xi, bi, t) for the considered decays in the first order in αs expansion and give
the convoluted amplitudes in next section.
B. The B0d → J/Ψη(
′) Decays
The low energy effective Hamiltonian for decay modes B0d → J/ψη(
′) can be written as
Heff =
[VcbV
cd (C1(µ)O
1(µ) + C2(µ)O
2(µ))] , (6)
with the four-fermion operators
Oc1 = d̄αγ
µ(1− γ5)cβ · c̄βγµ(1− γ5)bα , Oc2 = d̄αγµ(1− γ5)cα · c̄βγµ(1− γ5)bβ (7)
where the Wilson coefficients Ci(µ) (i = 1, 2), we will use the leading order (LO) expres-
sions, although the next-to-leading order (NLO) results already exist in the literature [6].
This is the consistent way to cancel the explicit µ dependence in the theoretical formulae.
For the renormalization group evolution of the Wilson coefficients from higher scale to
lower scale, we use the formulae as given in Ref.[16] directly.
FIG. 1: Typical Feynman diagrams contributing to the Cabibbo- and color- suppressed B0d →
J/Ψη(
′) decays.
As for B meson wavefunction, we make use of the same parameterizations as used
in the studies of different processes [16]. For vector J/ψ meson, in terms of the nota-
tion in Ref. [17], we decompose the nonlocal matrix elements for the longitudinally and
transversely polarized J/ψ mesons into
ΦJ/Ψ(x) =
mJ/ψǫ/ LΨ
L(x) + ǫ/ LP/Ψ
, (8)
Here, ΨL denote for the twist-2 distribution amplitudes, and Ψt for the twist-3 distri-
bution amplitudes. x represents the momentum fraction of the charm quark inside the
charmonium.
The J/ψ meson asymptotic distribution amplitudes read as [18]
ΨL(x) = 9.58
x(1− x)
x(1 − x)
1− 2.8x(1− x)
Ψt(x) = 10.94
(1− 2x)2
x(1− x)
1− 2.8x(1− x)
. (9)
It is easy to see that both the twist-2 and twist-3 DAs vanish at the end points due to
the factor [x(1− x)]0.7.
From the effective Hamiltonian (6), the Feynman diagrams corresponding to the con-
sidered decay are shown in Fig.1. With the meson wave functions and Sudakov factors,
the hard amplitude is given as
Feη = 8πCFm
dx1dx3
b1db1b3db3 φB(x1, b1)
(1− r2)
(1 + x3(1− r2))φAη (x3, b3) + r0(1− 2x3)
·φPη (x3, b3)
(1− 2x3) + r2(1 + 2x3)
φTη (x3, b3)
·αs(t1e) he(x1, x3, b1, b3) exp[−Sab(t1e)]
1− (1− x1)r2φPη (x3, b3)− x1r2φAη (x3, b3)
·αs(t2e)he(x3, x1, b3, b1) exp[−Sab(t2e)]
. (10)
where r0 = m
0/mB; CF = 4/3 is a color factor. The function he, the scales t
e and the
Sudakov factors Sab are displayed in Appendix A.
For the non-factorizable diagrams 1(c) and 1(d), all three meson wave functions are
involved. The integration of b3 can be performed using δ function δ(b3 − b1), leaving only
integration of b1 and b2. For the concerned operators, the corresponding decay amplitude
Meη =
dx1dx2 dx3
b1db1b2db2 φBs(x1, b1)
2rrcφ
J/Ψ(x2, b2)φ
η (x3, b2)− 4rr0rcφtJ/Ψ(x2, b2)φTη (x3, b2)
2 + x3(1− 2r2)
φLJ/Ψ(x2, b2)φ
η (x3, b2)
x3r0 + (x2 − x3)r0r2
φLJ/Ψ(x2, b2)φ
η (x3, b2)
·αs(tf )hf(x1, x2, x3, b1, b2) exp[−Scd(tf )]} . (11)
where rc = mc/mB,mc is the mass for c quark.
For the B0d → J/Ψη′ decay, the Feynman diagrams are obtained by replacing the η
meson in Fig. 1 with the meson η′. The corresponding expressions of decay amplitudes will
be similar with those as given in Eqs.(10-11), since the η and η′ are all light pseudoscalar
mesons and have the similar wave functions. The expressions of B0d → J/Ψη′ decay can
be obtained simply by the following replacements
φAη −→ φAη′ , φPη −→ φPη′ , φTη −→ φTη′ , r0 −→ r′0. (12)
For the η−η′ system, there exist two popular mixing basis: the octet-singlet basis and
the quark-flavor basis [19, 20]. Here we use the quark-flavor basis [19] and define
ηq = (uū+ dd̄)/
2, ηs = ss̄. (13)
The physical states η and η′ are related to ηq and ηs through a single mixing angle φ,
= U(φ)
cosφ − sinφ
sin φ cos φ
. (14)
The three input parameters fq, fs and φ in the quark-flavor basis have been extracted
from various related experiments [19, 20]
fq = (1.07± 0.02)fπ, fs = (1.34± 0.06)fπ, φ = 39.3◦ ± 1.0◦, (15)
where fπ = 130 MeV. In the numerical calculations, we will use these mixing parameters
as inputs. It worth of mentioning that the effects of possible gluonic component of η′
meson will not considered here since it is small in size [10, 21, 22].
For B0d → J/Ψη decay, by combining the contributions from different diagrams, the
total decay amplitude can be written as
M(B0d → J/Ψη) = VcbV ∗cdF1(φ)
FeηfJ/Ψ
+MeηC2
where the relevant mixing parameter is F1(φ) = cos φ/
It should be mentioned that the Wilson coefficients Ci = Ci(t) in Eq. (16) should
be calculated at the appropriate scale t using equations as given in the Appendices of
Ref. [16]. Here the scale t in the Wilson coefficients should be taken as the same scale
appeared in the expressions of decay amplitudes in Eqs. (10) and (11). This is the way
in pQCD approach to eliminate the scale dependence. In order to estimate the effect of
higher order corrections, however, we introduce a scale factor at = 1.0± 0.2 and vary the
scale tmax as described in Appendix A.
Similarly, the decay amplitudes for B0d → J/Ψη′ decay can be obtained easily from
Eq.(16) by the following replacements of F1(φ) → F ′1(φ) = sinφ/
II. NUMERICAL RESULTS AND DISCUSSIONS
In this section, we will calculate the branching ratios for those considered decay modes.
The input parameters and the wave functions to be used are given in Appendix B. In
numerical calculations, central values of input parameters will be used implicitly unless
otherwise stated.
With the complete decay amplitudes, we can obtain the decay width for the considered
decays,
Γ(B0d → J/ψη(
′)) =
(1− r2)
M(B0d → J/ψη(
. (17)
By employing the quark-flavor scheme of η−η′ system and using the mixing parameters
as given in Eq. (15), one finds the branching ratios for the considered two decays with
error bars as follows:
Br( B0d → J/Ψη) =
1.96+0.71−0.50(ωb)
+9.65
−0.39(at)
+0.32
+0.13(a2)
+0.14
−0.13(fJ/Ψ)
× 10−6, (18)
Br( B0d → J/Ψη′) =
1.09+0.32−0.24(ωb)
+3.73
+0.01(at)
+0.28
+0.01(a2)
+0.08
−0.07(fJ/Ψ)
× 10−6, (19)
where the main errors are induced by the uncertainties of ωb = 0.40 ± 0.05 GeV, at =
1.0 ± 0.2, a2 = 0.115 ± 0.115 and fJ/Ψ = 0.405 ± 0.014 GeV , respectively. One can see
that the pQCD predictions are sensitive to the variations of ωb and at.
For B0d → J/Ψη decay, the central value of the pQCD prediction for Br(B0d → J/Ψη)
is a factor of 4 smaller than the measured value as given in Eq. (1) [1]. But the pQCD
prediction is in fact still consistent with Belle’s first measurement if we take the large
theoretical and experimental errors into account. By varying the scale factor at in the
range of at = [0.8, 1.0], for example, the central value of Br(B → J/Ψη) will change in the
range of [0.2, 1.1]×10−5 accordingly. It is not difficult to understand such at dependence.
Since the J/Ψ meson is much heavier than light mesons, and therefore moving not as fast
as those light meson when B meson is decaying. So a small decrease of the scale ti will
lead to a larger Wilson coefficients C1,2(t) and αs(ti), and consequently results in a larger
decay rate.
For B0d → J/Ψη′ decay, only experimental upper limit (at 90% C.L) is available now:
BR(B0 → J/Ψη′) < 6.3 × 10−5 [4, 5]. The pQCD prediction for the branching ratio of
B0d → J/Ψη′ decay is very similar in magnitude with that of B0d → J/Ψη, consistent with
the upper limit and will be tested in the forthcoming LHC experiments.
At the leading order, only the tree Feynman diagrams as shown in Fig. 1 contribute to
B0d → J/Ψη(′) decays. There exists no CP violation in these decays within the standard
model, since there is only one kind of Cabibbo-Kabayashi-Muskawa (CKM) phase involved
in the corresponding decay amplitudes, as can be seen from eq. (16).
In short, we calculated the branching ratios of B0d → J/Ψη and B0d → J/Ψη′ decays at
the leading order by using the pQCD factorization approach. Besides the usual factoriz-
able diagrams, the non-factorizable spectator diagrams are also calculated analytically in
the pQCD approach. By keeping the transverse momentum kT , the end-point singularity
disappears in our calculation.
From our calculations and phenomenological analysis, we found the following results:
• Using the quark-flavor scheme, the pQCD predictions for the branching ratios are
Br(B0d → J/Ψη) =
1.96+9.68−0.65
× 10−6, (20)
Br(B0d → J/Ψη′) =
1.09+3.76−0.25
× 10−6, (21)
where the various errors as specified previously have been added in quadrature.
• The major theoretical errors of the pQCD predictions are induced by the uncertain-
ties of the hard energy scale ti’s and the parameters ωb.
Acknowledgments
X. Liu would like to acknowledge the financial support of The Scientific Research
Start-up Fund of Zhejiang Ocean University under Grant No.21065010706. This work was
partially supported by the National Natural Science Foundation of China under Grant
No.10575052, and by the Specialized Research Fund for the Doctoral Program of Higher
Education (SRFDP) under Grant No. 20050319008.
APPENDIX A: RELATED FUNCTIONS
We show here the function hi’s, coming from the Fourier transformations of the function
H(0),
he(x1, x3, b1, b3) = K0
x1x3(1− r2)mBb1
θ(b1 − b3)K0
x3(1− r2)mBb1
x3(1− r2)mBb3
+ θ(b3 − b1)K0
x3(1− r2)mBb3
x3(1− r2)mBb1
St(x3), (A1)
hf(x1, x2, x3, b1, b2) =
θ(b2 − b1)I0(MB
x1x3(1− r2)b1)K0(MB
x1x3(1− r2)b2)
+ (b1 ↔ b2)
K0(MBF(1)b2), for F
(1) > 0
0 (MB
| b2), for F 2(1) < 0
, (A2)
where J0 is the Bessel function, K0 and I0 are the modified Bessel functions with
K0(−ix) = −(π/2)Y0(x) + i(π/2)J0(x), and F(j)’s are defined by
F 2(1) = (x1 − x2)x3(1− r2) + r2c , (A3)
F 2(2) = (x1 − x2)x3(1− r2) + r2c . (A4)
The threshold resummation form factor St(xi) is adopted from Ref. [17]
St(x) =
21+2cΓ(3/2 + c)√
πΓ(1 + c)
[x(1 − x)]c, (A5)
where the parameter c = 0.3. This function is normalized to unity.
The Sudakov factors used in the text are defined as
Sab(t) = s
x1mB/
2, b1
x3mB/
2, b3
(1− x3)mB/
2, b3
ln(t/Λ)
− ln(b1Λ)
ln(t/Λ)
− ln(b3Λ)
, (A6)
Scd(t) = s
x1mB/
2, b1
x2mB/
2, b2
(1− x2)mB/
2, b2
x3mB/
2, b1
(1− x3)mB/
2, b1
ln(t/Λ)
− ln(b1Λ)
ln(t/Λ)
− ln(b2Λ)
, (A7)
where the function s(q, b) are defined in the Appendix A of Ref. [16]. The scale ti’s in the
above equations are chosen as
t1e = at ·max(
x3(1− r2)MB, 1/b1, 1/b3),
t2e = at ·max(
x1(1− r2)MB, 1/b1, 1/b3),
tf = at ·max(
x1x3(1− r2)MB,
(x1 − x2)x3(1− r2) + r2cMB, 1/b1, 1/b2), (A8)
where at = 1.0±0.2 and r =MJ/Ψ/MB. The scale ti’s are chosen as the maximum energy
scale appearing in each diagram to kill the large logarithmic radiative corrections.
APPENDIX B: INPUT PARAMETERS AND WAVE FUNCTIONS
The masses, decay constants, QCD scale and B0d meson lifetime are
(f=4)
= 250MeV, fπ = 130MeV, fJ/Ψ = 405MeV,
0 = 1.08GeV, MB0d = 5.28MeV, MJ/Ψ = 3.097GeV,
MW = 80.41GeV, τB0
= 1.54× 10−12s. (B1)
For the CKM matrix elements, here we adopt the Wolfenstein parametrization for the
CKM matrix, and take λ = 0.2272, A = 0.818, ρ = 0.221 and η = 0.340 [4].
For the B meson wave function, we adopt the model
φB(x, b) = NBx
2(1− x)2exp
(ωbb)
, (B2)
where ωb is a free parameter and we take ωb = 0.40± 0.05 GeV in numerical calculations,
and NB = 91.745 is the normalization factor for ωb = 0.40 for the B meson.
The wave function for dd̄ components of η(′) meson is given by
Φηdd̄(p, x, ζ) ≡
P/φAηdd̄(x) +m
(x) + ζm
0 (v/n/− v · n)φTηdd̄(x)
, (B3)
where p and x are the momentum and the momentum fraction of ηdd̄ respectively, while
φAηdd̄, φ
and φTηdd̄ represent the axial vector, pseudoscalar and tensor components of the
wave function respectively. We here assume that the wave function of ηdd̄ is same as the
π wave function based on SU(3) flavor symmetry. The parameter ζ is either +1 or −1
depending on the assignment of the momentum fraction x.
The explicit expression of chiral enhancement scale m
0 = m
0 is given by [21]
[m2η cos
2 φ+m2η′ sin
(m2η′ −m2η) cosφsinφ], (B4)
and numerically m
0 = 1.07MeV for mη = 547.5 MeV, mη′ = 957.8 MeV, fq = 1.07fπ,
fs = 1.34fπ and φ = 39.3
For the distribution amplitude φAηq , φ
and φTηq , we utilize the results for π meson
obtained from the light-cone sum rule [23] including twist-3 contributions:
φAηq(x) =
fqx(1 − x)
1 + a
5(1− 2x)2 − 1
21(1− 2x)4 − 14(1− 2x)2 + 1
, (B5)
φPηq(x) =
30η3 −
3(1− 2x)2 − 1
−3η3ω3 −
ρ2ηq −
ρ2ηq(s)a
35(1− 2x)4 − 30(1− 2x)2 + 3
,(B6)
φTηq(x) =
fq(1− 2x)
+ (5η3 −
η3ω3 −
ρ2ηq −
ρ2ηqa
2 )(10x
2 − 10x+ 1)
, (B7)
with the updated Gegenbauer moments [24]
2 = 0.115, a
4 = −0.015, ρηq = 2mq/mqq, η3 = 0.015, ω3 = −3.0. (B8)
[1] B. Aubert et al. (BaBar Collaboration), Phys. Rev. Lett. 98, 131803 (2007).
[2] A. Deandrea et al., Phys. Lett. B 318, 549 (1993).
[3] P.Z. Skands, J. High Energy Phys. 0101 (2001) 008.
[4] W.-M. Yao et al. ( Particle Data Group), J. Phys. G 33, 1 (2006).
[5] Heavy Flavor Averaging Group, E. Barberio et al., hep-ex/0603003; and online update at
http://www.slac.stanford.edu/xorg/hfag.
[6] G. Buchalla, A.J. Buras, and M.E. Lautenbacher, Rev. Mod. Phys. 68, 1125 (1996).
[7] G.P. Lepage and S.J. Brodsky, Phys. Rev. D 22, 2157 (1980).
[8] C.-H. V. Chang and H.N. Li, Phys. Rev. D 55, 5577 (1997); T.-W. Yeh and H.N. Li, Phys.
Rev. D 56, 1615 (1997).
[9] H.N. Li, Prog.Part.& Nucl.Phys. 51, 85 (2003), and reference therein. H.N. Li and H.L. Yu,
Phys. Rev. Lett. 74, 4388 (1995); Phys. Lett. B 353, 301 (1995); Phys. Rev. D 53, 2480
(1996).
[10] X. Liu, H.S. Wang, Z.J. Xiao, L.B. Guo and C.D. Lü, Phys. Rev. D 73, 074002 (2006);
H.S. Wang, X. Liu, Z.J. Xiao, L.B. Guo and C.D. Lü, Nucl. Phys. B 738. 243 (2006);
Z.J. Xiao, X.F. Chen and D.Q. Guo, Eur.Phys.J. C 50 (2007) in press; Z.J. Xiao, D.Q Guo
and X.F. Chen, Phys. Rev. D 75 , 014018 (2007); Z.J. Xiao, X. Liu and H.S. Wang, Phys.
Rev. D 75034017 (2007); Z.J. Xiao, X.F. Chen and D.Q. Guo, hep-ph/0701146.
[11] A. Ali, G. Kramer, Y. Li, C.D. Lü, Y.L. Shen, W. Wang, and Y.M. Wang, hep-ph/0703162.
[12] Y. Li and C.D. Lü, J. Phys. G 29, 2115 (2003); High Energy & Nucl.Phys. 27, 1061 (2003).
[13] Y. Li, C.D. Lü, and Z.J. Xiao, J. Phys. G 31, 273 (2005).
[14] H.N. Li, Phys. Rev. D 66, 094010 (2002).
[15] H.N. Li and B. Tseng, Phys. Rev. D 57, 443, (1998).
[16] C.-D. Lü, K. Ukai and M.Z. Yang, Phys. Rev. D 63, 074009 (2001).
[17] T. Kurimoto, H.N. Li, and A.I. Sanda, Phys. Rev. D 65, 014007 (2002); Phys. Rev. D
67, 054028 (2003).
[18] A.E. Bondar and V.L. Chernyak, Phys. Lett. B 612, 215(2005).
[19] Th. Feldmann, P. Kroll and B. Stech, Phys. Rev. D 58, 114006 (1998).
[20] R. Escribano and J.M. Frere, J. High Energy Phys. 0506 (2005) 029; J. Schechter, A.
Subbaraman and H. Weigel, Phys. Rev. D 48, 339 (1993)
[21] Y.-Y. Charng, T. Kurimoto, H.N. Li, Phys. Rev. D 74, 074024 (2006).
[22] R. Escribano, J. Nadal, hep-ph/0703187.
[23] P. Ball, J. High Energy Phys. 9809, 005 (1998); P. Ball, J. High Energy Phys. 9901, 010
(1999).
[24] P. Ball and R. Zwicky, Phys. Rev. D 71, 014015 (2005); P. Ball, V.M. Braun, and A. Lenz,
J. High Energy Phys. 0605 (2006) 004.
http://arxiv.org/abs/hep-ex/0603003
http://www.slac.stanford.edu/xorg/hfag
http://arxiv.org/abs/hep-ph/0701146
http://arxiv.org/abs/hep-ph/0703162
http://arxiv.org/abs/hep-ph/0703187
	Formalism and Perturbative Calculations
	Formulism
	 The Bd0 J/(') Decays
	Numerical results and Discussions
	Acknowledgments
	Related Functions 
	Input parameters and wave functions
	References
ABSTRACT
  Motivated by the very recent measurement of the branching ratio of ${B_d^0}
\to J/\psi \eta$ decay, we calculate the branching ratios of ${B_d}^0 \to
J/\psi \eta$ and ${B_d}^0 \to J/\Psi \eta'$ decays in the perturbative QCD
(pQCD) approach. The pQCD predictions for the branching ratios of considered
decays are: $BR(B_d^0 \to J/\Psi \eta) = (1.96 ^{+9.68}_{-0.65}) \times
10^{-6}$, which is consistent with the first experimental measurement within
errors; while $BR(B_d^0 \to J/\Psi \eta') = (1.09 ^{+3.76}_{-0.25}) \times
10^{-6}$, very similar with $B_d^0 \to \jpsi \eta$ decay and can be tested by
the forthcoming LHC experiments. The measurements of these decay channels may
help us to understand the QCD dynamics in the corresponding energy scale,
especially the reliability of pQCD approach to these kinds of B meson decays.

<|endoftext|><|startoftext|>
Finite-temperature phase transitions in a two-dimensional boson Hubbard model
Min-Chul Cha1 and Ji-Woo Lee2
Department of Applied Physics, Hanyang University, Ansan 426-791, Korea
Department of Physics, Myongji University, Yongin 449-728, Korea
We study finite-temperature phase transitions in a two-dimensional boson Hubbard model with
zero-point quantum fluctuations via Monte Carlo simulations of quantum rotor model, and construct
the corresponding phase diagram. Compressibility shows a thermally activated gapped behavior in
the insulating regime. Finite-size scaling of the superfluid stiffness clearly shows the nature of the
Kosterlitz-Thouless transition. The transition temperature, Tc, confirms a scaling relation Tc ∝ ρ
with x = 1.0. Some evidences of anomalous quantum behavior at low temperatures are presented.
PACS numbers: 73.43.Nq, 74.25.Dw, 05.30.Jp
Recently quantum phase transitions[1, 2] have drawn a
lot of attention in systems of interacting particles. Typ-
ically strong interactions suppress the itineracy of par-
ticles to induce a strongly correlated insulating phase,
whereas with weak interactions a conducting phase is
stable. The criticality of these zero-temperature phase
transitions can be investigated at low, but finite, tem-
peratures. How quantum fluctuations associated with a
quantum critical point(QCP) have influence on phases at
finite temperatures [3, 4, 5] is a theoretically interesting
and an experimentally relevant question.
At finite temperatures, it is expected that a quantum
phase transition turns into a classical one with the same
order parameter or disappears. Remnant quantum fluc-
tuations near a QCP may bring anomalous properties[3],
which can be captured by scaling relations, and lead
to crossover behaviors as temperature rises. Some pos-
sibilities such as reentrant behaviors due to the inter-
play of quantum and thermal fluctuations have been
proposed[6].
These issues can be clarified by direct investigations of
a generic quantum mechanical model. So far most of the
theoretical investigations heavily rely on the exact solu-
tion of the quantum Ising model, available strictly in one
dimension[4]. Interacting bosonic systems simulated via
Monte Carlo methods, not suffering from negative sign
problems, will be an ideal place to study these problems.
In previous works, a quantum XY model, equivalent to
hard-core bosons at half-filling, showed the Kosterlitz-
Thouless(KT) transition[7] at finite temperature in two
dimensions[8, 9]. In the model with nearest neighbor
repulsion, destruction of the solid order as well as the
superfluidity by thermal fluctuations was observed [10].
However, generic finite-temperature phase diagrams have
not been constructed.
In this work, we investigate the thermally driven phase
transitions of a two-dimensional quantum rotor model,
which is believed to share the same critical properties of
a soft-core generic boson Hubbard model[11], via Monte
Carlo simulations. The results are summarized in the
phase diagram as shown in Fig. 1. Finite-size scaling
properties of the superfluid stiffness confirm that the na-
ture of the classical phase transition associated with the
destruction of superfluidity is consistent with that of the
KT transition, and clearly support the scenario of the
universal jump at the critical point[12]. Finite tempera-
ture, T , sets the size in the temporal direction, leading to
a scaling behavior[4, 11] Tc ∝ ρ0x with x = 1.0, where Tc
is the transition temperature and ρ0 is the superfluid stiff-
ness at zero temperature. The compressibility diverges at
the transition. In the insulating regime at low tempera-
ture, thermally activated behavior of the compressibility
with a finite energy gap is observed. Some anomalous de-
pendence of energy and specific heat on T , possibly due
to quantum fluctuations, are observed for T < 0.25U .
The Hamiltonian of a boson Hubbard model reads
nj(nj − 1)− µ
nj − w
ibj + b
jbi),(1)
where bj(b
j) is the boson annihilation(creation) operator
0 0.01 0.02 0.03 0.04 0.05
µ=0.9
Normal fluid
Mott Insulator
Superfluid
KT transition
Gapped fluid
FIG. 1: (Color online) Phase diagram on the space of hop-
ping strength, t(=
n0(n0 + 1)w), and temperature, T , in
unit of U . The solid line denotes the classical phase transi-
tions, which terminates at a QCP at T = 0. The dotted line
represents crossover between gapped fluid and normal fluid.
http://arxiv.org/abs/0704.0396v1
at the j-th site, and nj is the number operator. U and
w stand for the strengths of the on-site repulsion and of
the nearest neighbor hopping, respectively, and µ is the
chemical potential.
It is convenient to put µ/U + 1/2 = n0 + n̄ with an
integer n0 and −1/2 < n̄ ≤ 1/2 so that n0 represents the
background number of bosons per site and n̄ is a charge
offset. When n̄ = 0, the density of bosons is fixed to
a commensurate filling across the transition. For non-
integer n̄, however, an integer filling in a Mott insulator
shifts to a non-integer filling in a compressible fluid. We
study the phase transition of the latter case in (2+1)-
dimensional L×L×Lτ square lattices, where L denotes
the size in a spatial dimension and Lτ in the temporal
dimension.
Since the phase transition of the model in Eq. (1)
is characterized by the establishment of phase coher-
ence, we may rewrite the Hamiltonian in terms of
the phase angle θj of bosons by replacing bj(b
j) =√
−iθj (
nj + 1e
iθj ) with nj =
. Under the as-
sumption that the nature of the transition is governed
only by the fluctuations of θj , not those of the hop-
ping strength, we replace nj → n0 so that bj(b†j) =√
−iθj (
n0 + 1e
iθj ). Then, the Hamiltonian is re-
duced to a quantum rotor model
nj(nj − 1)− µ
nj − 2t
cos(θi − θj),(2)
where t =
n0(n0 + 1)w. Here we take the number of
bosons nj ≥ 0.
Through a path integral mapping, we can construct
the corresponding classical action[14]
Jτr (J
r − 1)− ǫµJτr − ln IJxr (2ǫt)− ln IJyr (2ǫt)(3)
with the partition function
∇· ~J=0
{ ~Jr}
~J], (4)
where ǫ = β/Lτ is a lattice constant in the imaginary
time axis for an inverse temperature β, ~Jr is an integer
current at site r = (j, τ) with a spatial index j and a tem-
poral index τ , which is conserved at each site as denoted
by ∇ · ~J = 0, and Im(x) is the modified Bessel func-
tion given by the relation eK cos θ =
m=−∞ Im(K)e
In this work, we investigate the properties of the model
in Eq. 3 via Monte Carlo simulations using a recently
proposed worm algorithm [13]. In order to reduce the
systematic errors in discretizing the imaginary time axis,
we need to take ǫ
tU ≪ 1. We take Uǫ = 0.5 - 2 for
t ≪ U and set the energy unit U = 1.
The superfluid stiffness in a finite system is given by[14]
ρL = β
−1L2−d〈W 2x 〉, (5)
0.02 0.03 0.04 0.05
L=128
0 0.5 1
L exp(-b (t
) µ=0.9
=0.0409
b=1.85
0 10 20 30 40 50 60
0.005
0.015
L=128
0 0.1 0.2 0.3 0.4
L exp(-b (β
) µ=0.9
t=0.034
t=0.034
b=3.35
=28.8
FIG. 2: (Color online) Finite-size scaling behaviors of the su-
perfluid stiffness as a function of (a) hopping strength and
(b) temperature. For both cases, data collapsing onto a sin-
gle curve works fine in terms of the scaling parameter L/ξ as
shown in insets, consistent with the nature of the KT transi-
tion and the universal jump at the critical point.
where Wx = L
r and 〈...〉 denotes the averages
over the probabilites determined by the partition func-
tion of Eq. (4), and d is the spatial dimensionality. Sim-
ilarly the compressibility is
κ = βL−d[〈N2〉 − 〈N〉2], (6)
with N = L−1τ
r . The energy expectation is given
〈H〉 = L−1τ 〈
〉 , (7)
and the specific heat is CV = L
−d(∂〈H〉/∂T ).
We consider the case for µ = 0.9 so that n0 = 1 and
n̄ = 0.4. Figure 2 shows the finite-size scaling behav-
ior of the superfluid stiffness as a function of (a) t and
(b) β. Finite-size scaling properties of the transition can
be obtained by plotting the curves in terms of a scal-
ing variable L/ξ, where ξ is the correlation length. Here
we assume an essential singularity[15] ξ ∼ exp(bδ−1/2),
where δ = t − tc (or β − βc) is a tuning parameter and
b is a non-universal scaling factor. In terms of this scal-
ing variable, we obtain high-quality data collapsing onto
a single curve for different sizes, consistent with the na-
ture of the KT transition. The scaling behavior also sup-
ports the scenario of the universal jump of the superfluid
stiffness[12], (π/2)βcρ∞ = 1, at the critical point in the
thermodynamic limit. By extrapolating the single curves
to the critical point, we find that (π/2)βcρ∞ ≈ (a)1.01
and (b)1.06. These numbers are, however, sensitive to
fitting parameters b and tc(βc).
Figure 3a shows the behavior of the compressibility.
The finite-size scaling ansatz of the compressibility is
written in the form
κ = Lz−dX̃κ(L(t− tc)1/ν , β/Lz), (8)
where X̃κ is a dimensionless scaling function and z is the
dynamical critical exponent. For the generic superfluid-
insulator transition(GSIT), z = 2 is expected[11]. The
crossing behavior of the compressibility curves for differ-
ent sizes at the critical point t0c = 0.023±0.001, therefore,
represents the scaling properties near the QCP, where t0c
is the critical hopping strength at zero temperature. For
different values of µ, we have similar results with t0c just
shifted.
We find that the compressibility diverges at the transi-
tion. In the superfluid side, κ ∼ 1/(t− t0c). This strongly
supports that the longe-range density fluctuations drive
the transition. In the insulating side, the compressibility
has an activated form e−∆gap/T with a finite energy gap
∆gap. This dependence is shown in Fig. 3b for different
t, from which we can calculate ∆gap as shown in the in-
set. For small t, we need a large number of Monte Carlo
steps to obtain equilibrium and have bigger error bars in
determination of ∆gap. The gap vanishes around t = t
as expected.
Thus we have a so-called ’V-shaped’ phase diagram
(Fig.1). In the insulating regime, the Mott insulator
exists at T = 0, which turns into an activated gapped
fluid with a finite energy gap at low temperature. It
gradually disappears in a high-temperature normal fluid.
This crossover line can be specified by the condition
∆gap/T ≈ 1. The phase coherence in a superfluid at
T = 0 is destroyed by quantum fluctuations to form a
QCP or by thermal fluctuations at T > 0 to define clas-
sical phase transitions. The phase boundary in Fig. 1 is
obtained by tuning t for given T (black circles) as well
as by tuning β for a given t (red squares). Note that the
phase boundary follows a scaling relation Tc ∝ |t− t0c |zν ,
which implies that β determines the correlation length in
the temporal direction, where ν is the correlation length
0 0.01 0.02 0.03 0.04 0.05
L=12  Lτ=18
L=16  Lτ=32
L=20  Lτ=50
L=24  Lτ=72
L=28  Lτ=98
µ=0.9
0 50 100 150 200
1e-05
0.0001
0.001
0 0.01 0.02
µ=0.9
t=.005
t=.010
t=.015
t=.020
t=.021
t=.022
t=.023(b)
FIG. 3: (Color online) (a) Compressibility of the boson Hub-
bard model shows behavior of the GSIT with z = 2.0, diverg-
ing at the transition. (b) In the insulating regime, we have
thermally activated behaviors, κ ∼ e−∆gap/T , from which
∆gap can be evaluated. Inset: ∆gap as a function of t, van-
ishing at the QCP.
critical exponent. The boundary in Fig. 1 is consistent
with the expectation zν = 1[11] for the GSIT.
It is interesting to check the predicted scaling relation
[4, 11] Tc ∝ ρ0x in this model. Figure 4 shows that
the zero-temperature superfluid stiffness ρ0, denoted by
dotted line, which obtained via extrapolation of values at
T > 0, follows ρ0 ∝ |t− t0c |, implying that x = 1.0. It is
consistent with the hyperscaling argument[11] suggesting
x = z/(d+ z − 2).
We expect that this quantum criticality disappears
as temperature rises, which means quantum fluctuations
possibly leave some tracks in bulk properties at low tem-
peratures. Figure 5 shows the specific heat, CV , and the
energy expectation values, 〈H〉, as a function of T for dif-
ferent t. Sharp rises of CV in the conducting regime or
round up-rises in the insulating regime are followed by
indents, regions indicated by N, which apparently rep-
resent anomalous behavior due to quantum fluctuations
and disappear at high temperatures for T & 0.25. This
feature strongly suggests a crossover in normal fluid from
quantum mechanical to classical regime. Similarly the
curves of 〈H〉 show bumps, indicated by H, only in the
range where quantum critical fluctuations are expected
to have effects.
In summary, we have investigated the phase transi-
tions at finite temperature in a two-dimensional quan-
tum rotor model in which intrinsic zero-point fluctua-
tions are present. Finite-size scaling of the superfluid
stiffness shows an essential singularity of the KT phase
transition and the universal jump at the critical point.
The compressibility diverges at the transition. In the
insulating regime, the compressibility shows a thermally
activated behavior, κ ∼ e−∆gap/T , from which we can
successfully evaluate the gap. This indicates that the in-
sulating behavior at low temperature gradually crosses
over to the behavior of normal fluid as temperature in-
creases. The transition temperature Tc shows a scaling
behavior Tc ∝ |t − t0c |, showing that finite T limits the
length of quantum fluctuations in the temporal direction,
and a hyperscaling relation Tc ∝ ρ0. The behavior of the
specific heat and the energy suggests that, as tempera-
ture rises, quantum critical regime near a QCP crosses
over to classical regime.
MCC would like to thank Gerardo Ortiz for helpful
discussions and the hospitality of Department of Physics,
Indiana University, where parts of this work were carried
out. This work was supported by Korea Research Fund
grant No. R05-2004-000-11004-0.
0.02 0.025 0.03 0.035 0.04 0.045 0.05
β=200
β=400
µ=0.9
FIG. 4: (Color online) Superfluid stiffness for different β. As
β increases, the size dependence becomes smaller. This allows
us to extrapolate the curves to obtain zero-temperature su-
perfluid stiffness, ρ0, in the thermodynamic limit as denoted
by dotted line. It shows that ρ0 ∝ |t− t
c | with t
c ≈ 0.22.
[1] S. Sachdev, Quantum Phase Transitions (Cambridge
University Press, Cambridge, 1999).
[2] S. L. Sondhi, S. M. Girvin, J. P. Carini, and D. Sahar,
Rev. Mod. Phys. 69, 315 (1997).
[3] P. Coleman and A. J. Schofield, Nature (London) 433,
226 (2005).
[4] A. Kopp and S. Chakravarty, Nature Phys. 1, 53 (2005).
[5] S. Sachdev, Phys. Rev. B 55, 142 (1997).
[6] S. Kim and M. Y. Choi, Phys. Rev. B 41, 111 (1990).
[7] J. M. Kosterlitz and D. J. Thouless, J. Phys. C 6, 1181
(1973).
[8] H.-Q. Ding and M. S. Makivić, Phys. Rev. B, 42, R6827
(1990).
[9] K. Harada and N. Kawashima, J. Phys. Soc. Jpn. 67,
2768 (1998); A. W. Sandvik and C. J. Hamer, 60 6588
(1999).
[10] G. Schmid, S. Todo, M. Troyer, and A. Dorneich, Phys.
Rev. Lett., 88, 167208 (2002).
[11] M. P. A. Fisher, P. B. Weichman, G. Grinstein, and D. S.
Fisher, Phys. Rev. B 40, 546 (1989).
[12] D. R. Nelson and J. M. Kosterlitz, Phys. Rev. Lett., 39,
1201 (1977).
[13] F. Alet and E. S. Sørensen, Phys. Rev. E 67, 015701(R)
(2003); Phys. Rev. E 68, 026702 (2003).
[14] M. Wallin, E. S. Sørensen, S. M. Girvin, and A. P. Young,
Phys. Rev. B 49, 12115 (1994).
[15] J. M. Kosterlitz, J. Phys. C 7, 1046 (1974).
0 0.1 0.2 0.3 0.4 0.5
t = 0.005
t = 0.020
t = 0.040
t = 0.050
t = 0.070
t = 0.100
-0.85
t = .005
t = .010
t = .020
t = .025
t = .030
t = .040
0 0.1 0.2 0.3 0.4
t = .040
t = .050
t = .060
t = .070
t = .080
t = .090
t = .010
FIG. 5: (Color online) Specific heat, CV , as a function of T
for different t. Sharp rises in the conducting regime, signa-
ture of the superfluid transition, or round up-rise of CV in the
insulating regime are followed by indents which disappear in
high temperature region, T & 0.25. Insets: The curves of the
energy expectation values, 〈H〉, have bumps at low tempera-
tures possibly due to the effects of quantum fluctuations.
ABSTRACT
  We study finite-temperature phase transitions in a two-dimensional boson
Hubbard model with zero-point quantum fluctuations via Monte Carlo simulations
of quantum rotor model, and construct the corresponding phase diagram.
Compressibility shows a thermally activated gapped behavior in the insulating
regime. Finite-size scaling of the superfluid stiffness clearly shows the
nature of the Kosterlitz-Thouless transition. The transition temperature,
$T_c$, confirms a scaling relation $T_c \propto \rho_0^x$ with $x=1.0$. Some
evidences of anomalous quantum behavior at low temperatures are presented.

<|endoftext|><|startoftext|>
Introduction, the NOON
state fidelity is F1,s = (1−r2)2, and the success probabil-
ity is P1,s = λ/(λ + 1). Choosing squeezing parameters
such that F1,s = F1, we find that P
1 = (4/3)P1,s in the
high fidelity limit. It is thus possible to achieve a higher
success probability using the scheme with two OPOs, but
the price to pay is a more technically involved setup,
and NOON states with two different values of φ are pro-
duced. For N = 2 the present protocol and combination
of two single-photon states on a 50 : 50 beam splitter,
each produced conditionally from a single OPO, lead to
identical fidelities and success probabilities. Finally we
note that the photo detector model underestimates FN
for η > 0 because 1 − (1 − η)n = η
i=0 (1 − η)i < nη
for n = 2, 3, . . . while 1− (1− η)n = nη for n = 0, 1, i.e.,
the ‘wrong’ terms containing more than N photons are
given a too large weight. This is also what we observe in
Fig. 2.
In the limit of small r and for odd values of N the
success probability is given approximately by the simple
expression
〈ψi|((â†−)N − (b̂
−iθ)N )
(âN− − (b̂+eiθ)N )|ψi〉 =
(2N)N
λN (N odd). (23)
Again η/(2N) must be replaced by η/N to obtain PN for
even values of N . The approximation to P3 is shown in
Fig. 3.
IV. NOON STATES FROM
CONTINUOUS-WAVE OPO SOURCES
Our protocol is not limited to pulsed fields, and for
completeness we now briefly consider NOON state gen-
eration from continuously driven OPOs. We assume
N = 3. For continuous-wave fields each of the three
detected trigger beams and the two signal beams are
described by time dependent field operators ĉi(t). The
trigger detections take place in particular modes local-
ized around the three detection times tc1, tc2, and tc3,
and we want to determine the NOON state fidelity of
an output state occupying one temporal mode in each
signal beam. Following the general multimode formal-
ism in Refs. [15, 20], the five relevant modes are spec-
ified by the mode functions fi(t), and the correspond-
ing five single mode operators are ĉi =
fi(t)ĉi(t)dt.
In general, we are free to choose the two output mode
functions at will, and in the present case it is natural to
choose the mode function which gives rise to the largest
three-photon state fidelity when three-photon states are
generated from a single type II continuous-wave OPO.
Since we are mainly interested in the parameter region
where the squeezing is small and the NOON state fidelity
is large, we use the optimal three-photon state mode
function derived for very low beam intensity in [20], i.e.,
f4(t) = f5(t) =
k=1 ck
γ/2 exp(−γ|t− tck|/2), where
0 0.05 0.1 0.15
FIG. 4: NOON state fidelity as a function of ǫ/γ for states
generated from a pair of continuous-wave OPO sources when
conditioning on three simultaneous trigger detection events
tc1 = tc2 = tc3.
0 0.5 1 1.5 2
FIG. 5: Fidelity of NOON states from continuous-wave OPO
sources as a function of separation between trigger detection
events (tc3 − tc1)γ for N = 3, tc3 − tc2 = tc2 − tc1, and ǫ/γ =
0.01.
the coefficients ck are functions of the intervals between
the detection times and γ is the leakage rate of the OPO
output mirror. We furthermore assume that the trigger
mode functions are nonzero only in an infinitesimal time
interval centered at the detection time, which is valid if
the trigger detections take place on a time scale much
shorter than γ−1. Since we consider a low intensity con-
tinuous beam, and since we formally assume that the
trigger detectors only register the light field in infinitesi-
mal time intervals around the detection times, the anni-
hilation operator detector model is perfectly valid in this
case and detector dead time is insignificant.
We may now proceed as above and eliminate all the
irrelevant modes from the analysis by writing down the
Gaussian Wigner function of the five interesting modes.
The only difference is that this time 〈ĉ†i ĉj〉 and 〈ĉiĉj〉 are
expressed in terms of the two time correlation functions
of the OPO output. Also, the operators applied to the
Wigner function to take conditioning into account are
different because the annihilation detector model is used.
The reader is referred to Refs. [15, 20] for details.
The resulting fidelity is shown as a function of ǫ/γ in
Fig. 4, where ǫ is the nonlinear gain in the OPO, and as a
function of the temporal distance between the condition-
ing detection events in Fig. 5. As in the pulsed case the
fidelity decreases when the degree of squeezing increases.
The fidelity also decreases when the temporal distance
between the conditioning detection events increases from
zero, but it is permissible to have a small time interval
between the trigger detection events. We note that the
curves represent a lower limit to the theoretically achiev-
able fidelity since a better fidelity may be obtained for
another choice of output state mode functions.
V. CONCLUSION
In conclusion we have analyzed a method to gener-
ate path entangled NOON states from the output from
two optical parametric oscillators. The method relies
on the joint detection of photons in a number of trig-
ger beams, and we presented a theoretical analysis of the
role of detector efficiency and dead time for the fidelity
of the states obtained and the success probability of the
protocol. Our specific NOON state protocol applies for
general photon numbers of the states, but in practice it
is not realistic to go beyond the case of N = 3, studied
here. This is due to the reduction of the fidelity due to
unwanted contributions from higher number states, when
the OPO output power gets too high, combined with the
severe reduction of the probability to obtain the number
of conditioning detection events needed when the OPO
output power is too low. The N = 3 NOON states, which
can be produced at 90% fidelity at the rate of one state
produced every 10− 100 seconds, seem to be at the limit
of realistic experiments of the proposed kind. Finally, we
also determined the NOON state fidelity for continuous-
wave fields, where the best NOON state occupies a pair
of suitably selected temporal mode functions, and where
we find high fidelities as long as the trigger events occur
within a short time window compared to the lifetime of
the OPO cavity field.
We presented this analysis for the production of op-
tical NOON states, but we note that recent theoreti-
cal proposals and experiments with four wave mixing
of matter waves [26], engineered quadratic interactions
among trapped ions [27], and entanglement between field
and atomic degrees of freedom [28, 29] bring promise for
similar conditional generation of atomic and interspecies
atom-field NOON states.
This work was supported by the European Integrated
project SCALA.
[1] A. N. Boto, P. Kok, D. S. Abrams, S. L. Braunstein, C.
P. Williams, and J. P. Dowling, Phys. Rev. Lett. 85, 2733
(2000).
[2] L. Pezzé and A. Smerzi, quant-ph/0508158.
[3] M. D’Angelo, A. Zavatta, V. Parigi, and M. Bellini, Phys.
Rev. A 74, 052114.
[4] P. Kok, H. Lee, and J. P. Dowling, Phys. Rev. A 65,
052104 (2002).
[5] J. Fiurasek, Phys. Rev. A 65, 053818 (2002).
[6] H. F. Hofmann, Phys. Rev. A 70, 023812 (2004).
[7] P. Walther, J. Pan, M. Aspelmeyer, R. Ursin, S. Gaspa-
roni, and A. Zeilinger, Nature (London) 429, 158 (2004).
[8] H. S. Eisenberg, J. F. Hodelin, G. Khoury, and D.
Bouwmeester, Phys. Rev. Lett. 94, 090502 (2005).
[9] N. M. VanMeter, P. Lougovski, D. B. Uskov, K. Kieling,
J. Eisert, and J. P. Dowling, quant-ph/0612154.
[10] M. W. Mitchell, J. S. Lundeen, and A. M. Steinberg,
Nature (London) 429, 161 (2004).
[11] A. Ourjoumtsev, A. Dantan, R. Tualle-Brouri, and P.
Grangier, Phys. Rev. Lett. 98, 030502 (2007).
[12] M. Dakna, T. Anhut, T. Opatrný, L. Knöll, and D.-G.
Welsch, Phys. Rev. A 55, 3184 (1997).
[13] A. B. URen, C. Silberhorn, J. L. Ball, K. Banaszek, and
I. A. Walmsley, Phys. Rev. A 72, 021802(R) (2005).
[14] A. Ourjoumtsev, R. Tualle-Brouri, J. Laurat, and P.
Grangier, Science 312, 83 (2006).
[15] K. Mølmer, Phys. Rev. A 73, 063804 (2006).
[16] A. Ourjoumtsev, R. Tualle-Brouri, and P. Grangier,
Phys. Rev. Lett. 96, 213601 (2006).
[17] J. S. Neergaard-Nielsen, B. M. Nielsen, C. Hettich, K.
Mølmer, and E. S. Polzik, Phys. Rev. Lett. 97, 083604
(2006).
[18] K. Wakui, H. Takahashi, A. Furusawa, and M. Sasaki,
Opt. Express 15, 3568 (2007).
[19] A. E. B. Nielsen and K. Mølmer, Phys. Rev. A 75, 023806
(2007).
[20] A. E. B. Nielsen and K. Mølmer, Phys. Rev. A 75, 043801
(2007).
[21] F. W. Sun, Z. Y. Ou, and G. C. Guo, Phys. Rev. A 73,
023808 (2006).
[22] P. Hariharan, N. Brown, and B. C. Sanders, J. Mod. Opt.
40, 1573 (1993).
[23] P. P. Rohde and T. C. Ralph, J. Mod. Opt. 53, 1589
(2006).
[24] A. Ekert and P. Knight, Am. J. Phys. 63, 415 (1995).
[25] J. Eisert and M. B. Plenio, Int. J. Quantum Inf. 1, 479
(2003).
[26] G. K. Campbell, J. Mun, M. Boyd, E. W. Streed, W. Ket-
terle, and D. E. Pritchard, Phys. Rev. Lett. 96, 020406
(2006)
[27] D. Porras and J. I. Cirac, Phys. Rev. Lett. 93, 263602
(2004).
[28] B. B. Blinov, D. L. Moehring, L. M. Duan, and C. Mon-
roe, Nature, 428, 153 (2004).
[29] J. Volz, M. Weber, D. Schlenk, W. Rosenfeld, J. Vrana,
K. Saucke, C. Kurtsiefer, and H. Weinfurter, Phys. Rev.
Lett. 96, 030404 (2006).
http://arxiv.org/abs/quant-ph/0508158
http://arxiv.org/abs/quant-ph/0612154
ABSTRACT
  We propose a measurement protocol to generate path-entangled NOON states
conditionally from two pulsed type II optical parametric oscillators. We
calculate the fidelity of the produced states and the success probability of
the protocol. The trigger detectors are assumed to have finite dead time, and
for short pulse trigger fields they are modeled as on/off detectors with finite
efficiency. Continuous-wave operation of the parametric oscillators is also
considered.

<|endoftext|><|startoftext|>
Introduction. Let (Yk)k∈N be a sequence of independent, nonnegative
random variables and let (Sn)n∈N0 ,
S0 := 0, Sn :=
Yk for all n ∈N,
be the associated sequence of partial sums. Regarding the Yk’s as successive
lifetimes and Sn as the time of the nth renewal, we interpret
Nt := sup{n ∈N0 :Sn ≤ t}
as the number of renewals up to and including time t; (Nt)t≥0 is the renewal
process. Standard renewal theory assumes that the Yk’s all have the same
distribution, in which case Nt, appropriately rescaled, is asymptotically nor-
mal as t→∞. For this result, and for renewal theory in general, we refer
the reader to Section XI in [3].
In this note we consider exponentially increasing lifetimes. We show that
in such a case the distribution of Nt does not converge and that asymp-
totic distributional fluctuations appear (Section 2). Such fluctuations occur
frequently in the analysis of algorithms. The renewal theoretic framework
Received January 2006.
AMS 2000 subject classifications. Primary 60K05; secondary 68Q25.
Key words and phrases. Asymptotic distributional behavior, limiting periodicities, re-
newal processes.
This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Applied Probability,
2007, Vol. 17, No. 2, 676–687. This reprint differs from the original in pagination
and typographic detail.
http://arxiv.org/abs/0704.0398v1
http://www.imstat.org/aap/
http://dx.doi.org/10.1214/105051606000000862
http://www.imstat.org
http://www.ams.org/msc/
http://www.imstat.org
http://www.imstat.org/aap/
http://dx.doi.org/10.1214/105051606000000862
2 F. DENNERT AND R. GRÜBEL
provides a probabilistic view of this phenomenon in connection with digital
search trees (Section 3). We also indicate how our approach can be used to
obtain rates of convergence (Section 4).
2. Renewals for increasing lifetimes. We assume that the lifetimes in-
crease exponentially with rate α, where α> 1 is fixed throughout the sequel,
in the sense that
α−kYk →distr Y∞ and α−kEYk →EY∞(1)
for some random variable Y∞ and as k→∞. Here “→distr” denotes conver-
gence in distribution, so that the first part of (1) means that
Ef(α−kYk) =Ef(Y∞)
for all bounded continuous functions f :R→ R. Below we will use the fact
that in order to prove Xn →distr X it is sufficient to show that Ef(Xn)→
Ef(X) holds for all bounded and uniformly continuous functions. For details
and a general treatment of convergence in distribution we refer the reader
to [1]. Of course, only the distribution µ = L(Y∞) of Y∞ matters, so we
will occasionally write α−kYk →distr µ instead. Finally, throughout this note
a condition involving moments is meant to imply that these moments are
finite.
An important role will be played by
S∞ :=
α−kY∞,k,
where (Y∞,k)k∈N0 is a sequence of independent and identically distributed
random variables with L(Y∞,0) = L(Y∞), Y∞ as in (1). From EY∞ <∞ we
obtain ES∞ = α(α − 1)−1EY∞ <∞ and therefore P (S∞ <∞) = 1; more-
over, we then also have that
−kY∞,k converges almost surely and
hence in distribution to S∞ as n→∞. We will also assume that L(Y∞) has
no atoms, that is,
P (Y∞ = y) = 0 for all y ∈R+.(2)
Finally, it is an elementary analytic fact that, for a sequence (xn)n∈N of real
numbers with limit x ∈R,
α−kxn−k =
The following lemma can be regarded as a random version of (3).
Lemma 1. If (1) and (2) are satisfied, then α−nSn →distr S∞ as n→∞,
and P (S∞ = y) = 0 for all y ∈R.
RENEWALS FOR INCREASING LIFETIMES 3
Proof. Suppose that (Uk)k∈N is a sequence of independent random
variables on some probability space (Ω,A, P ), all uniformly distributed on
the unit interval. Let Fk be the distribution function of Yk, F the distribution
function of Y∞. We use a variant of the quantile construction:
Ỹk := F
k (Uk), Ỹ∞,k := F
−1(Uk) for all k ∈N.
We then have L(Ỹ1, . . . , Ỹn) = L(Y1, . . . , Yn) for all n ∈N, which implies
L(α−nSn) =L(α−nS̃n) with S̃n :=
Using α−nS̃n =
k=0 α
−k(α−(n−k)Ỹn−k) we obtain
α−nS̃n −
α−kỸ∞,n−k
α−kE|α−(n−k)Ỹn−k − Ỹ∞,n−k|.(4)
With Y ′k := F
k (U1) and Y
∞ := F
−1(U1) we have
E|α−kỸk − Ỹ∞,k|=E|α−kY ′k − Y ′∞|.(5)
From (1) it follows that α−kY ′k →distr Y ′∞ and Eα−kY ′k → EY ′∞. Because
of Y ′k, Y
∞ ≥ 0 Theorem 5.4 in [1] applies and gives the L1-convergence of
α−kY ′k to Y
∞, that is, E|α−kY ′k − Y ′∞| → 0 as k →∞. Using this together
with (3), (4) and (5) we obtain
α−nS̃n −
α−kỸ∞,n−k
= 0.(6)
Now let f :R→R be bounded and uniformly continuous. We have
|Ef(α−nSn)−Ef(S∞)|=
Ef(α−nS̃n)−Ef
α−kỸ∞,n−k
α−kỸ∞,k
α−kỸ∞,k
f(α−nS̃n)− f
α−kỸ∞,n−k
α−kỸ∞,k
α−kỸ∞,k
For the first integral on the right-hand side we use (6), for the second an
elementary estimate shows that the difference between the arguments of f
converges to 0 in probability. In both cases we now use uniform continuity
4 F. DENNERT AND R. GRÜBEL
when the arguments of f are close to each other and boundedness otherwise.
This leads to
Ef(α−nSn) =Ef(S∞),
which gives the convergence in distribution. The statement about the atoms
of S∞ follows from (2) and the fact that S∞ is equal in distribution to
Y∞ + α
−1S∞ with Y∞ and S∞ independent. �
The above proof is based on classical weak convergence arguments. An
alternative proof can be obtained via the Wasserstein distance
dW (µ, ν) = inf{E|X − Y | :L(X) = µ,L(Y ) = ν},
its known relation to weak convergence and convergence of the first moments,
and the same variant of the quantile construction, which in this context is
known as the comonotone coupling.
We write ⌊x⌋ for the greatest integer less than or equal to x and {x} for
the fractional part of x ∈R.
Theorem 2. Suppose that (1) and (2) are satisfied and let
Qη := L(⌊− logαS∞ + η⌋), 0≤ η ≤ 1.(7)
If (tn)n∈N is a sequence of real numbers with tn →∞ and such that {logα tn}→
η for some η ∈ [0,1], then
Ntn − ⌊logα tn⌋→distr Qη as n→∞.
Proof. We use the abbreviations kn := ⌊logα tn⌋ and ηn := {logα tn}. In
particular, logα tn = kn + ηn. Further, let Z∞ := − logαS∞. By a standard
renewal theoretic argument,
P (Nt = j) = P (Sj ≤ t)−P (Sj+1 ≤ t) for all t≥ 0, j ∈N0,
hence
P (Ntn − kn = j) = P (Skn+j ≤ tn)− P (Skn+j+1 ≤ tn)
= P (− logα(α−kn−jSkn+j) + ηn ≥ j)
−P (− logα(α−kn−j−1Skn+j+1) + ηn ≥ j + 1)
→ P (⌊Z∞ + η⌋= j) as n→∞,
where in the last step Lemma 1 and three general facts about convergence
in distribution were used: First, the continuous mapping theorem, which
implies that − logα(α−mSm)→distr − logαS∞ as m→∞; secondly, the in-
terplay with convergence in probability, see Theorem 4.1 in [1], which yields
RENEWALS FOR INCREASING LIFETIMES 5
− logα(α−nSn)+ ηn →distr − logαS∞+ η as n→∞; finally, that L(S∞) and
therefore also L(− logαS∞+η) assign probability 0 to single points and that
this implies
P (− logα(α−nSn) + ηn ≥ z) = P (− logαS∞ + η ≥ z) for all z ∈R.
A structural consequence of the representation (7) is the →distr-continuity
of η 7→Qη on the open unit interval; at η = 0 this function is right continuous,
at η = 1 it is left continuous. The extreme members are translates of each
other in the sense that Q0({j}) =Q1({j + 1}) for all j ∈ Z.
The total variation distance dTV of probability measures is defined by
dTV(µ, ν) := sup
|µ(B)− ν(B)|,
for µ, ν concentrated on Z this can be written as
dTV(µ, ν) =
|µ({j})− ν({j})|.(8)
For a sequence of probability measures that are concentrated on a fixed
countable set Scheffé’s lemma implies that weak convergence is equivalent
to convergence in total variation distance, hence (7) can be rewritten as
dTV(L(Ntn − ⌊logα tn⌋),Q{log
tn}) = 0.
Because of the continuity of [0,1] ∋ η 7→Qη this in turn leads to a statement
that avoids the use of subsequences,
dTV(L(Nt − ⌊logα t⌋),Q{log
t}) = 0.(9)
In Section 4 we will investigate the rate of convergence in (9) in a particular
case.
3. An application to digital search trees. The nodes of a (rooted, di-
rected) binary tree can be represented by finite strings of 0’s and 1’s if we
interpret 0 as a move to the left and 1 as a move to the right. The length of
the string is the depth (or level) of the node it represents, the root node corre-
sponds to the empty string and has level 0. The sequence (Tn)n∈N associated
with a sequence (xn)n∈N of numbers from the unit interval by the DST (dig-
ital search tree) algorithm is obtained as follows: For T1, we put x1 into the
root node. If x1, . . . , xn have been stored into Tn then the position of xn+1 is
determined by traveling through the tree with the direction given by the bi-
nary expansion of xn+1 until an empty node has been found. This algorithm
and its properties are discussed in the standard texts of the area, for exam-
ple, [8, 10, 11]. As an example we consider the first ten numbers given in [8],
6 F. DENNERT AND R. GRÜBEL
Fig. 1. Binary tree.
Appendix A, (
2, log 2, log 3, log 10). Let xi be the
fractional part of the ith entry, 1≤ i≤ 10; the relevant first four bits of the
respective binary expansions are given by (0110,1011,0011,0010, 0100,0111,
0011,1011,0001,0100). This leads to the binary tree given in Figure 1.
Consider now the sequence (Tn)n∈N0 of random trees that the DST algo-
rithm associates with a sequence (Un)n∈N of independent random variables,
where we assume that the Un’s are uniformly distributed on the unit inter-
val and that T0 is the empty tree. Let Xn(θ) be the depth of the first free
node of Tn along the path determined by a sequence θ ∈ {0,1}N. Such a θ
defines a family of nested intervals of length 2−k, k = 1,2,3, . . . , and it is
easy to see that (Xn(θ))n∈N0 is a Markov chain with X0(θ) ≡ 0 and tran-
sition probabilities pk,k+1 = 1− pk,k = 2−k for all k ∈ N0. Conditioning on
the value of Un+1 we see that the distribution of Xn(θ) is the same as the
distribution of Zn+1, the insertion depth of Un+1. This quantity is known
as “unsuccessful search” in the literature on the analysis of algorithms. [Of
course, this distributional equality does not hold for the joint distributions:
n 7→Xn(θ) is increasing, n 7→ Zn+1 is not.] For example, the next number
in Knuth’s list is x11 = 1/ log 2, the binary expansion of the fractional part
{x11} begins with 011100 and hence x11 would be inserted at level 4 as the
right child of x6.
The Markov chain (Xn(θ))n∈N0 is of the simple birth type and can there-
fore be described by its respective holding times Y1, Y2, Y3, . . . in the states
k = 0,1,2, . . . . These are independent, and Yk has a geometric distribution
with parameter pk−1,k, that is, for all k ∈N,
P (Yk = j) = (1− 2−k+1)j−12−k+1 for all j ∈N.
Here we interpret the case k = 1 as Y1, the holding time in 0, being constant
and equal to 1. As a result of its simple stochastic dynamics, (Xn(θ))n∈N0
is equal to the renewal process N associated with the sequence (Yk)k∈N,
observed at discrete times, that is, (Xn(θ))n∈N0 = (Nn)n∈N0 . It is easy to
see that for this sequence (Yk)k∈N of lifetimes conditions (1) and (2) are
satisfied and that L(Y∞) = Exp(2), with Exp(λ) the exponential distribution
RENEWALS FOR INCREASING LIFETIMES 7
with parameter λ (and mean 1/λ). Hence Theorem 2 can be applied: If the
sequence (n(m))m∈N ⊂ N is such that n(m) →∞ and {log2 n(m)} → η as
m→∞, then
Xn(m)(θ)− ⌊log2 n(m)⌋→distr Qη.(10)
Here Qη , 0≤ η ≤ 1, is the distribution of ⌊− log2 S∞+η⌋, S∞ :=
k=0 2
−kY∞,k
and Y∞,k, k ∈N0, are independent and identically distributed with L(Y∞,1) =
Exp(2). Alternatively, we can write S∞ :=
k=1 Ỹk with Ỹk, k ∈N, again in-
dependent and L(Ỹk) = Exp(2k) for all k ∈N.
The explicit representation of the family of limit distributions on the basis
of the convolution product of the distributions Exp(2k), k ∈N, can be used
to obtain a series expansion for the distribution functions associated with
Qη , 0 ≤ η ≤ 1. For this, we start with a partial fraction expansion: For all
n ∈N and all z ∈C with |ℜ(z)|< 2,
(1− 2−kz)−1 =
an,k(1− 2−kz)−1,(11)
where an,k :=
j=1(1− 2j)−1
j=1 (1− 2−j)−1. Reading (11) as an equality
relating characteristic functions we obtain
Exp(21) ⋆Exp(22) ⋆ · · · ⋆Exp(2n) =
an,kExp(2
k).(12)
Note, however, that the right-hand side in (12) is not the usual mixture of
probability distributions as the coefficients alternate in sign. With
ak := b
(1− 2j)−1, b :=
(1− 2−j)−1,
letting n→∞ in (12) leads to L(S∞) =
k=1 akExp(2
k), so that
Qη((−∞, x]) = P (⌊− log2(S∞) + η⌋ ≤ x)
= P (S∞ > 2
η−1−x)(13)
ak exp(−2k+η−1−x) for all x ∈ Z.
This representation of the limiting distribution functions as an alternating
series has already been obtained by Louchard [9] in the context of digital
search trees and by Flajolet [4] in the context of approximate counting; see
also Section 6.4 in [10] and Section 6.3 in [8] for related results. These authors
use a completely different approach, more analytic in flavor and relying on
combinatorial identities due to Euler.
8 F. DENNERT AND R. GRÜBEL
Our main point here, however, is not a rederivation of (13) but the rep-
resentation of the family {Qη : 0≤ η < 1} in terms of one particular random
variable, which is first shifted by η and then discretized. This representation
can, for example, be used to obtain information about the tail behavior of
the limit distributions. Janson [7] notes that (13) by itself would only give
an exponential rate of decrease for the tail probabilities, he then provides an
analytic argument that improves this to a superexponential rate by show-
ing that the associated Fourier transform is an entire function. Using the
representation S∞ =
k=1 2
−kZk with Zk independent and L(Zk) = Exp(1)
together with the fact that Exp(1) has a density bounded by 1, we get
P (S∞ ≤ 2−j)≤ P (Z1 ≤ 2−j+1)P (Z2 ≤ 2−j+2) · · ·P (Zj−1 ≤ 2−1)
≤ 2−j+12−j+2 · · ·2−1
= 2−j(j−1)/2
for all j ∈ N. Because of Qη([k,∞)) ≤ P (S∞ ≤ 2−k+1) for all k ∈ N, k ≥ 2,
this leads to
Qη([x,∞)) = o(exp(−ρx2)) as x→∞, for all ρ < (log 2)/2.
The fact that a representation by discretization is possible in many situ-
ations where fluctuations were first found by calculation seems to belong to
the folklore of the subject, at least in simple instances such as the asymp-
totic distributional behavior of the maximum of a sample from a geometric
distribution. The geometric case together with some renewal theoretic tech-
niques (for identically distributed lifetimes) was used in [5] to obtain results
of the above type for von Neumann addition. In [2] a discretization represen-
tation occurs on the level of stochastic processes, leading to a probabilistic
approach to fluctuation phenomena in the context of multiplicities of the
maximum in a random sample from a discrete distribution. In a recent pa-
per, Janson [7] studies the effects of discretizing random variables and the
resulting distributional fluctuations and gives a range of interesting exam-
ples. Of course, the explanation for periodicities can be, and indeed often is,
quite different and mechanisms other than discretization may be responsible;
see, for example, [6] and the references given there.
4. Rates of convergence. The renewal theoretic approach can also be
used to obtain rates of convergence. We sketch one of the possibilities, for
a particular choice of distances, and give details for the DST situation from
Section 3. Let, for t > 0, k(t) := ⌊logα t⌋ and η(t) := {logα t}.
The Kolmogorov–Smirnov distance of two probability measures µ and ν
on the real line is defined by
dKS(µ, ν) := sup
|µ((−∞, x])− ν((−∞, x])|.
RENEWALS FOR INCREASING LIFETIMES 9
If X and Y are real random variables, then we abbreviate dKS(L(X),L(Y ))
to dKS(X,Y ); if F and G are the associated distribution functions, then
dKS(X,Y ) = ‖F − G‖∞, where the supremum norm for general bounded
functions f :R → R is given by ‖f‖∞ := supx∈R |f(x)|. The Kolmogorov–
Smirnov distance is obviously invariant under strictly monotone transfor-
mations. For example,
dKS(αX + β,αY + β) = dKS(X,Y ) for all α,β ∈R, α 6= 0,
and for X,Y > 0,
dKS(X,Y ) = dKS(logX, logY ).
With the notation as in the proof of Theorem 2,
|P (Nt − k(t) = j)−P (⌊− logα(S∞) + η(t)⌋= j)|
≤ |P (− logα(α−k(t)−jSk(t)+j) + η(t)≥ j)−P (− logα(S∞) + η(t)≥ j)|
+ |P (− logα(α−k(t)−j−1Sk(t)+j+1) + η(t)≥ j +1)
− P (− logα(S∞) + η(t)≥ j + 1)|.
With the auxiliary quantities
Zt := ⌊− logα(S∞) + η(t)⌋, φ(m) := dKS(α−mSm, S∞)
and the above properties of the Kolmogorov–Smirnov distance this leads to
|P (Nt − k(t) = j)−P (Zt = j)| ≤ φ(k(t) + j) + φ(k(t) + j +1).(14)
It is often possible to obtain an upper bound for negative j, say j ≤−k(t)/2,
directly. In such cases the above elementary renewal theoretic argument leads
to a bound for the ‖ · ‖∞-distance between the probability mass functions of
Nt − k(t) and Zt, for example; note that the latter variable has distribution
Qη(t) where Qη , 0≤ η ≤ 1, is the set of limit distributions along subsequences
that appears in Theorem 2.
The above argument covers the step from (α−mSm)m∈N to (Nt)t≥0. How-
ever, in an application the starting point will usually be the convergence of
the scaled lifetimes in (1), which means that we also need an analogue for
Lemma 1 that gives rates of convergence.
We carry this out in the specific context of digital search trees. The fol-
lowing general bounds will turn out to be useful: If X has density fX and if
P (|Y | ≤ c) = 1, then
dKS(X,X + Y )≤ c‖fX‖∞.(15)
Indeed: For all z ∈R, P (X ≤ z− c)≤ P (X+Y ≤ z)≤ P (X ≤ z+ c), so that
|P (X + Y ≤ z)−P (X ≤ z)|
≤max{P (X ≤ z + c)−P (X ≤ z), P (X ≤ z)−P (X ≤ z − c)},
10 F. DENNERT AND R. GRÜBEL
and, of course, P (X ∈ (a, b]) ≤ (b − a)‖fX‖∞. This bound can easily be
generalized to
dKS(X,X + Y )≤ c‖fX‖∞ +P (|Y |> c) for all c > 0,(16)
where we still assume that X has density fX , but Y may be arbitrary.
Note that X and Y need not be independent in (15) and (16). If they are
independent then it is easy to show that
dKS(X,X + Y )≤ ‖fX‖∞E|Y |.(17)
In (17) boundedness of Y is not needed but the bound obviously makes sense
only if Y has finite first moment. Finally, in connection with density bounds
the interplay with convolution is of interest: We have ‖f ⋆ g‖∞ ≤ ‖f‖∞ for
all probability densities f, g. For example, if a sum of independent random
variables contains a summand with distribution Exp(λ), then the density of
the sum is bounded by λ.
Lemma 3. With (Yk)k∈N and S∞ as in Section 3,
dKS(2
−nSn, S∞) =O(n2
Proof. Let (Zk)k∈N be a sequence of independent random variables, all
exponentially distributed with parameter 1. Then S∞ is equal in distribution
k=1 2
−kZk. We recall that the kth lifetime Yk has a geometric distribu-
tion with parameter 2−k+1. On the basis of (Zk)k∈N we define a sequence
(Ỹk)k∈N by Ỹk := ⌊αkZk⌋+1 for all k ∈N, with
α1 := 0, αk := (− log(1− 2−k+1))−1 for k > 1.
It is easy to check that
(Ỹk)k∈N =distr (Yk)k∈N, 2
2k−1Zk =distr
2−kZk.
Hence, with φ(n) denoting the dKS-distance of 2
−nSn and S∞,
φ(n)≤ φ1(n) + φ2(n) + φ3(n) for all n ∈N,
with φ1, φ2, φ3 defined by
φ1(n) := dKS
Ỹk,2
φ2(n) := dKS
αkZk,2
2k−1Zk
φ3(n) := dKS
2−kZk,
2−kZk
RENEWALS FOR INCREASING LIFETIMES 11
For the random variables in φ1 we have
Vn ≤ 2−n
Ỹk ≤ Vn + n2−n with Vn := 2−n
αkZk.
It is easy to show that the densities of Vn, n ∈N, can be uniformly bounded
for all n by some finite constant C1, hence (15) implies that φ1(n)≤C1n2−n
for all n ∈N.
The elementary bounds
− 1≤−
log(1− x)
for 0< x≤
together with α1 = 0 imply supk∈N |αk − 2k−1|= 1, hence we have
αkZk − 2−n
2k−1Zk
≤ 2−n
The familiar combination of Markov’s inequality and moment generating
functions gives
Zk ≥ (1 + κ)n
=O(2−n)
if κ is chosen large enough, so that we can use (16) with c = c(n) = (1 +
κ)n2−n to obtain that φ2(n)≤C2n2−n for all n ∈N, for some finite constant
For φ3 finally we use (17): For the densities of the finite sums we again
have a finite uniform bound for all n, and
k=n+1
2−kZk
k=n+1
2−kEZk = 2
so that φ3(n) ≤ C32−n for all n ∈ N with some C3 <∞. Putting these to-
gether we arrive at
φ(n)≤Cn2−n for all n ∈N
with some finite constant C. �
In the application under consideration we obtain a rate of convergence
result with respect to the total variation distance, which is stronger than
a result for the supremum norm distance of the corresponding probability
mass functions that we mentioned in connection with (14).
Theorem 4. With (Xn(θ))n∈N and Qη as in Section 3,
dTV(L(Xn(θ)− ⌊log2 n⌋),Q{log2 n}) = o(n
−γ) for all γ < 1.
12 F. DENNERT AND R. GRÜBEL
Proof. We use the abbreviations k(n) := ⌊log2 n⌋ and η(n) := {log2 n}.
Let γ < 1 be given and choose ε > 0 such that ε < 1− γ. Lemma 3 together
with (14) gives
j≥−εk(n)
|P (Nn − k(n) = j)−Qη(n)({j})| ≤C
j≥(1−ε)k(n)
for all n ∈ N with some finite constant C. Our choice of ε implies that the
upper bound has the desired rate o(n−γ).
For the remaining part of the infinite sum in (8) we replace the absolute
difference of the probabilities by their sum, which means that it is now
enough to show that
P (Nn ≤ (1− ε)k(n)) = o(n−γ),(18)
P (− log2(S∞)≤−εk(n) + 1) = o(n−γ).(19)
Here we have used that Qη is the distribution of ⌊− log2(S∞)+ η⌋. It is easy
to show that the moment generating function for S∞ exists in a neighbor-
hood of 0, hence
P (S∞ > x) = o(e
−κx) for all x > 0(20)
with some κ > 0. Straightforward manipulations show that (20) implies (19);
indeed, the probability converges faster to 0 than any negative power of n.
Using once again the relation between the number of renewals and the partial
sums of the lifetimes we further obtain, with m(n, ε) := ⌊(1− ε)k(n)⌋,
P (Nn ≤ (1− ε)k(n)) ≤ P (Sm(n,ε) ≥ n)
= P (2−m(n,ε)Sm(n,ε) ≥ n2−m(n,ε))
≤ dKS(2−m(n,ε)Sm(n,ε), S∞) +P (S∞ ≥ n2−m(n,ε)).
For the Kolmogorov–Smirnov distance we use Lemma 3, for the tail of S∞
the desired rate follows with (20). This gives (18) and hence completes the
proof. �
REFERENCES
[1] Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.
MR0233396
[2] Bruss, F. Th. and Grübel, R. (2003). On the multiplicity of the maximum in a
discrete random sample. Ann. Appl. Probab. 13 1252–1263. MR2023876
[3] Feller, W. (1971). An Introduction to Probability Theory and Its Applications II,
2nd ed. Wiley, New York. MR0270403
[4] Flajolet, Ph. (1985). Approximate counting: A detailed analysis. BIT 25 113–134.
MR0785808
[5] Grübel, R. and Reimers, A. (2001). On the number of iterations required by von
Neumann addition. Theor. Inform. Appl. 35 187–206. MR1862462
http://www.ams.org/mathscinet-getitem?mr=0233396
http://www.ams.org/mathscinet-getitem?mr=2023876
http://www.ams.org/mathscinet-getitem?mr=0270403
http://www.ams.org/mathscinet-getitem?mr=0785808
http://www.ams.org/mathscinet-getitem?mr=1862462
RENEWALS FOR INCREASING LIFETIMES 13
[6] Janson, S. (2004). Functional limit theorems for multitype branching processes and
generalized Pólya urns. Stochastic Process. Appl. 110 177–245. MR2040966
[7] Janson, S. (2006). Rounding of continuous random variables and oscillatory asymp-
totics. Ann. Probab. 34 1807–1826.
[8] Knuth, D. E. (1973). The Art of Computer Programming 3. Sorting and Searching.
Addison–Wesley, Reading, MA. MR0445948
[9] Louchard, G. (1987). Exact and asymptotic distributions in digital binary search
trees. Theor. Inform. Appl. 21 479–496. MR0928772
[10] Mahmoud, H. M. (1992). Evolution of Random Search Trees. Wiley, New York.
MR1140708
[11] Sedgewick, R. and Flajolet, Ph. (1996). An Introduction to the Analysis of Al-
gorithms. Addison–Wesley, Reading, MA.
Institut für Mathematische Stochastik
Universität Hannover
Postfach 60 09
D-30060 Hannover
Germany
E-mail: dennert@stochastik.uni-hannover.de
rgrubel@stochastik.uni-hannover.de
http://www.ams.org/mathscinet-getitem?mr=2040966
http://www.ams.org/mathscinet-getitem?mr=0445948
http://www.ams.org/mathscinet-getitem?mr=0928772
http://www.ams.org/mathscinet-getitem?mr=1140708
mailto:dennert@stochastik.uni-hannover.de
mailto:rgrubel@stochastik.uni-hannover.de
	Introduction
	Renewals for increasing lifetimes
	An application to digital search trees
	Rates of convergence
	References
	Author's addresses
ABSTRACT
  We show that the number of renewals up to time $t$ exhibits distributional
fluctuations as $t\to\infty$ if the underlying lifetimes increase at an
exponential rate in a distributional sense. This provides a probabilistic
explanation for the asymptotics of insertion depth in random trees generated by
a bit-comparison strategy from uniform input; we also obtain a representation
for the resulting family of limit laws along subsequences. Our approach can
also be used to obtain rates of convergence.

<|endoftext|><|startoftext|>
LAPTH-1178/07
Hawking radiation of linear dilaton black holes
G. Clémenta∗, J.C. Fabrisb†and G.T. Marquesa,b‡
aLaboratoire de Physique Théorique LAPTH (CNRS),
B.P.110, F-74941 Annecy-le-Vieux cedex, France
b Departamento de F́ısica, Universidade Federal do Esṕırito Santo,
Vitória, 29060-900, Esṕırito Santo, Brazil
April 3, 2007
Abstract
We compute exactly the semi-classical radiation spectrum for a
class of non-asymptotically flat charged dilaton black holes, the so-
called linear dilaton black holes. In the high frequency regime, the
temperature for these black holes generically agrees with the surface
gravity result. In the special case where the black hole is massless,
we show that, although the surface gravity remains finite, there is
no radiation, in agreement with the fact that massless objects cannot
radiate.
e-mail: gclement@lapp.in2p3.fr
e-mail: fabris@cce.ufes.br
e-mail:gtadaiesky@cce.ufes.br
http://arxiv.org/abs/0704.0399v1
Quantum field theory in curved spacetime predicts new phenomena such
as particle emission by a black hole [1]. This is due to the fact that the vac-
uum for a quantum field near the horizon is different from the observer’s
vacuum at spatial infinity. A distant observer thus receives from a black
hole a steady flux of particles exhibiting, in the high frequency regime, a
black body spectrum with a temperature proportional to the surface grav-
ity [2]. Although Hawking’s original derivation of this black hole evaporation
dealt with realistic collapsing black holes, Unruh [3] showed that the same
results are obtained when the collapse is replaced by appropriate boundary
conditions on the horizon of an eternal black hole. In the semi-classical ap-
proximation, the black hole radiation spectrum may be evaluated by com-
puting the Bogoliubov coefficients relating the two vacua. An equivalent
procedure is to compute the reflection and absorption coefficients of a wave
by the black hole. Usually, the wave equation cannot be solved exactly,
and one must resort to match solutions in an overlap region between the
near-horizon and asymptotic regions [4, 5]. In the special case of the (2+1)-
dimensional BTZ black hole [6], an exact solution of the wave equation is
available, which allows for an exact computation of the radiation spectrum,
leading to the Hawking temperature [7, 8, 9].
In this Letter, we discuss another case of black holes also allowing for
an exact semi-classical computation of their radiation spectrum, that of lin-
ear dilaton black hole solutions to Einstein-Maxwell dilaton (EMD) theory
in four dimensions. Linear dilaton black holes are a special case of the
more general class of non-asymptotically flat black hole solutions to EMD
[10, 11], which we first briefly present. We discuss the evaporation of these
non-asymptotically flat black holes and show that they either collapse to a
naked singularity in a finite time, or evaporate in an infinite time. We then
specialize to linear dilaton black holes, and outline the analytical computa-
tion of their radiation spectrum. For massive black holes, this computation
leads, in the high frequency regime, to the same temperature which is ob-
tained from the surface gravity. However in the case of massless extreme
black holes, we find that, although the surface gravity remains finite, there
is no radiation, in agreement with the fact that a massless object cannot
radiate.
EMD is defined by the following action
R− 2∂µφ∂µφ− e−2αφFµνFµν
, (1)
where Fµν is the electromagnetic field, and φ is the dilatonic field, with cou-
pling constant α. This theory admits static spherically symmetric solutions
representing black holes. Among these black hole solutions there are asymp-
totically flat ones [12, 13] as well as non-asymptotically flat configurations
[10, 11]. In the present work, we are interested in the non-asymptotically
flat black hole solutions
ds2 =
rγ(r − b)
dt2 −
rγ(r − b)
dr2 + r(r − b)dΩ2
, (2)
1 + γ
dr ∧ dt , e2αφ = ν2
. (3)
1− α2
1 + α2
. (4)
The constants b and r0 are related to the mass and to the electric charge of
the black hole through
M = (1− γ)b/4 , Q =
1 + γ
. (5)
The solutions (2),(3) interpolate between the Schwarzschild solution for
γ = −1 (α2 → ∞) and the Bertotti-Robinson solution for γ = +1 (α2 = 0).
For b > 0 the horizon at r = b hides the singularity at r = 0, while in the
extreme black hole case b = 0 the horizon coincides with the singularity.
This is a curious case, with vanishing mass but a finite electric charge. For
−1 < γ < 0 (α2 > 1) the central singularity is timelike and clearly naked
[11]. On the other hand, for 0 ≤ γ < 1 (0 < α2 ≤ 1), the central singularity
is null and marginally trapped [14], so that signals coming from the centre
never reach external observers. Thus in this case, extreme black holes can
be still considered as black holes indeed.
The statistical Hawking temperature of the black holes (2), computed as
usual by dividing the surface gravity by 2π is given by
. (6)
It is finite for all γ if b 6= 0. For b = 0 and −1 < γ < 0 (naked singularity).
the temperature is infinite, while for b = 0 and 0 < γ < 1 (extreme black
hole), the temperature vanishes.
The case b = γ = 0 is intriguing. Although this an extreme black
hole, the situation is different from that of asymptotically flat extreme black
holes. The near-horizon Euclidean extreme Reissner-Nordström geometry
is cylindrical, rather than conical, so that its statistical temperature is ar-
bitrary, contrary to the zero value derived from surface gravity [15]. In the
present case the two-dimensional Euclidean continuation of the metric (2)
with γ = 0 clearly has a conical singularity at r = b for all values of b,
including b = 0, leading for this particular extreme black hole to the finite
temperature TH = 1/4πr0, in agreement with the value (6). However this
result is questionable. A black hole with pointlike horizon and zero mass
clearly cannot radiate, so one should rather expect its temperature to be
zero. We will return to this question presently.
As black holes (2) radiate, they loose mass according to Stefan’s law
= −σAhT 4H , (7)
where σ is Stefan’s constant, and Ah = 4πr
1−γ is the horizon area.
Assuming that only electrically neutral quanta are radiated, (7) implies that
the horizon area decreases according to
(4π)3(1− γ)
−3(1+γ)
1+3γ , (8)
which is solved by
b(t) = r0
t− t0
)−1/3γ
(γ 6= 0) ,
b(t) = r0 exp
t− t0
(γ = 0) , (9)
where c = 3σ/16π3, and t0 is an integration constant. The outcome de-
pends on the sign of γ. For γ < 0, the Hawking temperature increases with
decreasing mass and the black hole collapses to a naked singularity (or evap-
orates away altogether in the Schwarzschild case γ = −1) in a finite time
according to b ∼ (t0 − t)1/3|γ|. On the other hand, for γ ≥ 0, the Hawking
temperature decreases (or is constant for γ = 0) with decreasing mass, and
the black hole evaporates in an infinite time, reaching the extreme black
hole state b = 0 only asymptotically.
We now proceed to a more precise evaluation of the temperature of non-
asymptotically flat black holes from the study of wave scattering in these
spacetimes. The wave equation
∇2φ = 0 (10)
does not generically allow for an exact solution in the spacetimes (2). How-
ever, it can be solved analytically [16] in the case of linear dilaton black
holes with γ = 0 and b 6= 0, with the metric
ds2 =
r − b
dt2 −
r − b
dr2 + r(r − b)dΩ2
, (11)
Considering the harmonic eigenmodes
φ(x) = ψ(r, t)Ylm(θ, ϕ) , ψ(r, t) = R(r)e
−iωt , (12)
we obtain the following radial equation:
r(r − b)∂rR
r − b
− l(l + 1)
R = 0 (13)
(ω̄2 ≡ ω2r20). Putting
, R = yiω̄f , (14)
reduces (13) to the equation
y(1−y)∂2yf+
1+2iω̄−2(1+ iω̄)y
ω̄2− iω̄− λ̄2−1/4
f = 0 , (15)
λ̄2 = ω̄2 − (l + 1/2)2 . (16)
This is a hypergeometric equation
y(1− y)∂2yf +
c− (a+ b+ 1)y
∂yf − abf = 0 , (17)
+ i(ω̄ + λ̄) , b =
+ i(ω̄ − λ̄) , c = 1 + 2iω̄ . (18)
It follows that the general solution of equation (13) is
R = C1
r − b
+ i(ω̄ + λ̄),
+ i(ω̄ − λ̄), 1 + 2iω̄; b− r
r − b
)−iω̄
− i(ω̄ + λ̄), 1
− i(ω̄ − λ̄), 1 − 2iω̄; b− r
.(19)
Putting
r − b
= ex/r0 , (20)
the partial wave near the horizon (x→ −∞) is thus
ψ ≃ C1eiω(x−t) +C2e−iω(x+t) . (21)
To obtain the behavior of the partial wave near spatial infinity, we must
expand the solutions of (15) in hypergeometric functions of argument 1/y.
The relevant transformation is
F (a, b, c; y) =
Γ(c)Γ(b− a)
Γ(b)Γ(c− a)
(−y)−aF (a, a+ 1− c, a+ 1− b; 1/y)
Γ(c)Γ(a − b)
Γ(a)Γ(c − b)
(−y)−bF (b, b+ 1− c, b+ 1− a; 1/y) . (22)
This leads to the asymptotic behavior
)−1/2(
i(λx−ωt) +B2e
−i(λx+ωt)
(λ = λ̄/r0), where the amplitudes of the asymptotic outgoing and ingoing
waves B1 and B2 are related to the amplitudes of the near-horizon outgoing
and ingoing waves C1 and C2 by
B1 = Γ(2iλ̄)
Γ(1 + 2iω̄)
Γ(1/2 + i(ω̄ + λ̄))2
Γ(1− 2iω̄)
Γ(1/2 − i(ω̄ − λ̄))2
B2 = Γ(−2iλ̄)
Γ(1 + 2iω̄)
Γ(1/2 + i(ω̄ − λ̄))2
Γ(1− 2iω̄)
Γ(1/2− i(ω̄ + λ̄))2
. (24)
Hawking radiation can be considered as the inverse process of scattering
by the black hole, with the asymptotic boundary condition B1 = 0 (the
outgoing mode is absent). The coefficient for reflection by the black hole is
then given by
|C1|2
|C2|2
|Γ(1/2 + i(ω̄ + λ̄))2|2
|Γ(1/2 + i(ω̄ − λ̄))2|2
cosh2 π(ω̄ − λ̄)
cosh2 π(ω̄ + λ̄)
. (25)
The resulting radiation spectrum is
= (eω/TH − 1)−1 . (26)
For high frequencies, λ̄ ≃ ω̄ = ω/r0, and we recover from (25) the Hawking
temperature as computed from the surface gravity,
. (27)
The above computation fails in the linear dilaton vacuum case b = 0.
The question of assigning a temperature to such massless black holes might
be evacuated by arguing that they cannot be formed, either through cen-
tral collapse of matter, or (as we have seen above) through evaporation of
massive black holes. Nevertheless, as a matter of principle one should con-
sider the possibility of primordial massless black holes. From the general
temperature law (6) these should have a finite temperature. On the other
hand, being massless they cannot radiate energy away, so their temperature
should vanish.
The question can be settled by solving the massless Klein-Gordon equa-
tion in the metric (11) with b = 0,
ds2 =
dt2 − r0
dr2 − r0rdΩ2 . (28)
This metric can be rewritten as
ds2 = Σ2
dτ2 − dx2 − dΩ2
, (29)
x = ln(r/r0) , τ = t/r0 , Σ = r0e
x/2 , (30)
showing that the linear dilaton vacuum metric is conformal to the product
M2 × S2 of a two-dimensional Minkowski spacetime with the two-sphere.
Performing also the redefinition
φ = Σ−1ψ , (31)
the Klein-Gordon equation (10) is reduced to
∇2φ = Σ−3
∂ττ − ∂xx −∇2Ω +
ψ = 0 , (32)
where ∇2Ω is the Laplacian operator on the two-sphere.
For a given spherical harmonic with orbital quantum number l, the re-
duced Klein-Gordon equation is thus
∇22ψl + (l + 1/2)2ψl = 0 , (33)
with ∇22 the Dalembertian operator on M2. Also, for a given spherical
harmonic the four-dimensional Klein-Gordon norm reduces to theM2 norm:
‖φ‖2 = 1
|g|g0µφ∗
∂µ φ =
dxψ∗l
∂τ ψl . (34)
Thus, the problem of wave propagation in the linear dilaton vacuum reduces
to the propagation of eigenmodes of a free Klein-Gordon field in two dimen-
sions, with effective mass µ = l+1/2. Clearly there is no reflection, so that
the linear dilaton vacuum does not radiate and hence its Hawking temper-
ature vanishes, contrary to the naive surface gravity value (6). A similar
reasoning holds in 2+1 dimensions for the BTZ vacuum [6] (M = L = 0),
which is conformal to M2 × S1.
We have shown that a complete analytical computation of the radia-
tion spectrum is possible for linear dilaton black hole solutions of EMD. For
massive black holes, this leads in the high frequency regime to a Planckian
distribution with a temperature independent of the black hole mass, in ac-
cordance with the surface gravity value. On the other hand, we find that
extreme, massless black holes do not radiate, thereby solving the paradox
presented by apparently hot (if the surface gravity temperature is taken
seriously) yet massless black holes.
Acknowledgements: J.C.F. thanks the LAPTH for the warm hospitality
during the elaboration of this work. He also thanks CNPq (Brazil) for
partial support. J.C.F. and G.T.M. thank the French-Brazilian scientific
cooperation CAPES/COFECUB for partial financial support.
References
[1] N.D. Birrell and P.C.W. Davies, Quantum fields in curved space,
Cambridge University Press, Cambridge (1982).
[2] S.W. Hawking, Commun. Math. Phys. 43 (1975) 199.
[3] W.G. Unruh, Phys. Rev. D14 (1976) 870.
[4] D. Page, Phys. Rev. D13 (1976) 198.
[5] W.G. Unruh, Phys. Rev. D14 (1976) 3251.
[6] M. Bañados, C. Teitelboim and J. Zanelli, Phys. Rev. Lett. 69 (1992)
1849.
[7] K. Ghoroku and A.L. Larsen, Phys. Lett. B328 (1994) 28.
[8] M. Natsuume, N. Sakai and M. Sato, Mod. Phys. Lett. A11 (1996)
1467.
[9] D. Birmingham, I. Sachs and S. Sen, Phys. Lett. B413 (1997) 281.
[10] K.C.K. Chan, J.H. Horne and R.B. Mann, Nucl. Phys. B447 (1995)
[11] G. Clément and C. Leygnac, Phys. Rev. D70 (2004) 084018.
[12] G.W. Gibbons and K. Maeda, Nucl. Phys. B298 (1988) 741.
[13] D. Garfinkle, G.T. Horowitz and A. Strominger, Phys. Rev. D43 (1991)
3140.
[14] S.A. Hayward, Class. Quantum Grav. 17 (2000) 4021.
[15] S.W. Hawking, G.T. Horowitz and S.F. Ross, Phys. Rev. D51 (1995)
4302.
[16] G. Clément, D. Gal’tsov and C. Leygnac, Phys. Rev. D67 (2003)
024012.
ABSTRACT
  We compute exactly the semi-classical radiation spectrum for a class of
non-asymptotically flat charged dilaton black holes, the so-called linear
dilaton black holes. In the high frequency regime, the temperature for these
black holes generically agrees with the surface gravity result. In the special
case where the black hole is massless, we show that, although the surface
gravity remains finite, there is no radiation, in agreement with the fact that
massless objects cannot radiate.

<|endoftext|><|startoftext|>
Introduction and Overview
Bethe’s ansatz [1] for solving a one-dimensional integrable model was and remains a
powerful tool in contemporary theoretical physics: 75 years ago it solved one of the
first models of quantum mechanics, the Heisenberg spin chain [2]; today it provides
exact solutions for the spectra of certain gauge and string theories and thus helps us
understand their duality [3] better. Since the discovery of integrable structures in planar
N = 4 supersymmetric gauge theory [4] and in planar IIB string theory on AdS5×S5 [5]
the tools for computing and comparing the spectra of both models have evolved rapidly.
We now have complete asymptotic Bethe equations [6, 7] which interpolate smoothly
between the perturbative regimes in gauge and string theory and which agree with all
available data.
In this note we will focus on the S-matrix [8] in the excitation picture above a ferro-
magnetic ground state. We start by reviewing the algebraic construction of the S-matrix
in Sec. 2. In Sec. 3 we subsequently show that this S-matrix has indeed a larger symmetry
algebra: a Yangian.
http://arxiv.org/abs/0704.0400v4
2 The Universal Enveloping Algebra U(su(2|2)⋉R2)
In this section the results on the S-matrix of AdS/CFT shall be reviewed from an al-
gebraic point of view. The applicable symmetry is a central extension h of the Lie
superalgebra su(2|2) which we consider first. We continue by presenting the Hopf al-
gebra structure of its universal enveloping algebra and its fundamental representation.
Finally, we comment on the S-matrix and its dressing phase factor.
Lie Superalgebra. The symmetry in the excitation picture for light cone string theory
on AdS5×S5 and for single-trace local operators in N = 4 supersymmetric gauge theory
is given by two copies of the Lie superalgebra [9, 10]
h := su(2|2)⋉ R2 = psu(2|2)⋉ R3. (2.1)
It is a central extension of the standard Lie superalgebras su(2|2) or psu(2|2), see [11].
It is generated by the su(2)× su(2) generators Rab, Lαβ, the supercharges Qαb, Saβ and
the central charges C, P, K. The Lie brackets of the su(2) generators take the standard
[Rab,R
d] = δ
d − δadRcb, [Lαβ ,Lγδ] = δ
Lαδ − δαδ Lγβ,
[Rab,Q
d] = −δadQγb + 12δ
d, [L
d] = +δ
d − 12δ
[Rab,S
δ] = +δ
δ − 12δ
δ, [L
δ] = −δαδ Scβ + 12δ
δ. (2.2)
The Lie brackets of two supercharges yield
{Qαb,Scδ} = δcbLαδ + δαδ Rcb + δcbδαδ C,
{Qαb,Qγd} = εαγεbdP,
{Saβ,Scδ} = εacεβδK. (2.3)
The remaining Lie brackets vanish.
Where appropriate, we shall use the collective symbol JA for the generators. The Lie
brackets then take the standard form
[JA, JB] = fABC J
C . (2.4)
For simplicity of notation, we shall pretend that all generators are bosonic; the general-
isation to fermionic generators by insertion of suitable signs and graded commutators is
straightforward.
Hopf Algebra. Next we consider the universal enveloping algebra U(h) of h. The
construction of the product is standard, and one identifies the Lie brackets (2.4) with
graded commutators. For the coproduct one can introduce a non-trivial braiding [12,13]
∆JA = JA ⊗ 1 + U [A] ⊗ JA (2.5)
∆Rab = R
b ⊗ 1 + 1⊗Rab,
∆Lαβ = L
β ⊗ 1 + 1⊗ Lαβ,
∆Qαb = Q
b ⊗ 1 + U+1 ⊗Qαb,
∆Saβ = S
β ⊗ 1 + U−1 ⊗Saβ,
∆C = C⊗ 1 + 1⊗ C,
∆P = P⊗ 1 + U+2 ⊗P,
∆K = K⊗ 1 + U−2 ⊗ K,
∆U = U ⊗ U .
Table 1: The coproduct of the braided universal enveloping algebra U(h).
with some abelian1 generator U (a priori unrelated to the algebra) and the grading
[R] = [L] = [C] = 0, [Q] = +1, [S] = −1, [P] = +2, [K] = −2. (2.6)
The coproduct is spelt out in Tab. 1 for the individual generators. The above grading
is derived from the Cartan charge of the sl(2) automorphism [11] of the algebra h and
therefore the coproduct is compatible with the algebra relations.
We should define the remaining structures of the Hopf algebra: the antipode S and
the counit ε [12,13]. The antipode is an anti-homomorphism which acts on the generators
S(1) = 1, S(U) = U−1, S(JA) = −U−[A]JA. (2.7)
The counit acts non-trivially only on 1 and U
ε(1) = ε(U) = 1, ε(JA) = 0. (2.8)
Cocommutativity. This coproduct is in general not quasi-cocommutative as can eas-
ily be seen by considering the central charges P, K in Tab. 1. To make it quasi-cocommu-
tative we have to satisfy the constraints [12]
1− U+2
1− U+2
⊗P, K⊗
1− U−2
1− U−2
⊗ K. (2.9)
They are solved by identifying the central charges P, K with the braiding factor U as
follows [13]
P = gα
1− U+2
, K = gα−1
1− U−2
. (2.10)
This leads to the following quadratic constraint
PK− gα−1P− gαK = 0. (2.11)
It was furthermore shown in [14] that the coproduct is quasi-triangular, at least at the
level of central charges, see also [15].
1Curiously, we can include the supersymmetric grading (−1)F in the generator U to manually impose
the correct statistics. This is helpful for an implementation within a computer algebra system. In this
case U would anticommute with fermionic generators.
Fundamental Representation. The algebra h has a four-dimensional representation
[10] which we will call fundamental. The corresponding multiplet has two bosonic states
|φa〉 and two fermionic states |ψα〉. The action of the two sets of su(2) generators has to
be canonical
Rab|φc〉 = δcb |φa〉 − 12δ
b |φc〉,
Lαβ|ψγ〉 = δγβ |ψα〉 − 12δ
β |ψγ〉. (2.12)
The supersymmetry generators must also act in a manifestly su(2)×su(2) covariant way
Qαa|φb〉 = a δba|ψα〉,
Qαa|ψβ〉 = b εαβεab|φb〉,
Saα|φb〉 = c εabεαβ|ψβ〉,
Saα|ψβ〉 = d δβα|φa〉. (2.13)
We can write the four parameters a, b, c, d using the parameters x±, γ and the constants
g, α as
g γ, b =
, c =
, d =
. (2.14)
The parameters x± (together with γ) label the representation and they must obey the
constraint
− x− − 1
. (2.15)
The three central charges C,P,K and U are represented by the values C, P,K and U
which read
1 + 1/x+x−
1− 1/x+x− , P = gα
, K =
, U =
. (2.16)
They furthermore obey the quadratic relation C2−PK = 1
. Note that the corresponding
quadratic combination of central charges C2−PK is singled out by being invariant under
the sl(2) external automorphism.
Fundamental S-Matrix. In [10,14] an S-matrix acting on the tensor product of two
fundamental representations was derived. It was constructed by imposing invariance
under the algebra h
[∆JA,S] = 0. (2.17)
We will not reproduce the result here, it is given in [14]. Note that we have to fix the
parameters ξ = U =
x+/x− in order to make the action of the generators compatible
with the coproduct (2.5).2
2This identification removes all braiding factors from the S-matrix in [14] which will thus satisfy the
standard Yang-Baxter (matrix) equation, see also [10, 16, 17].
This S-matrix has several interesting properties. Firstly, it is not of difference form;
it cannot be written as a function of the difference of some spectral parameters. Sec-
ondly, the S-matrix could be determined uniquely up to one overall function merely by
imposing a Lie-type symmetry (2.17) [10]. This unusual fact is related to an unusual
feature of representation theory of the algebra h: The tensor product of two fundamental
representations is irreducible in almost all cases [14].
Intriguingly this S-matrix is equivalent to Shastry’s R-matrix [18] of the one-dimen-
sional Hubbard model [19]. Furthermore the Bethe equations [10] contain two copies of
the Lieb-Wu equations for the Hubbard model [20]. These observations of [14] estab-
lish a link between an important model of condensed matter physics and string theory
(complementary to the one in [21]).
Finally, let us note that one can derive (asymptotic) Bethe equations from the S-
matrix and thus confirm the conjecture in [6]. So far this step has been performed in
two different ways: by means of the nested coordinate [10] and the algebraic [17] Bethe
ansatz.
Phase Factor. The remaining overall phase factor of the S-matrix clearly cannot be
determined by demanding invariance under h. The phase factor was computed to some
approximation from gauge theory [22] and from string theory [23]. The problem of an
algebraically undetermined phase factor is in fact generic. Usually one imposes a further
crossing symmetry relation to obtain a constraint on it. Indeed the known string phase
factor is consistent with crossing symmetry [24] as was shown in [25]. By substituting a
suitable ansatz [26] for the phase factor into the crossing symmetry relation a conjecture
for the all-orders phase factor at strong coupling was made in [27].
A corresponding all-orders expansion at weak coupling was presented in [7]. The
latter conjecture was obtained by a sort of analytic continuation in the perturbative
order of the series. Let us illustrate this principle by means of a very simple example:
Consider the rational function f(x) = 1/(1−x). It has the following expansions at x = 0
and at x = ∞
n, f(x)
−n (2.18)
with an = 1 and bn = −1. When we consider an and bn as analytic functions of the
index, we can make the observation (“reciprocity”)
an = −b−n. (2.19)
Of course there are various ways in which the two functions +1 and −1 could be related,
but the choice (2.19) appears to work for a surprisingly large class of functions!3 It was
proved in [30] that it does apply for the conjectured expansion of the phase factor. Very
useful integral expressions for the phase have recently appeared in [31]. The analytic
expression of the dressing phase can formally be obtained from the psu(2, 2|4) Bethe
3Among other physical examples, we have identified circular Maldacena-Wilson loops [28] and non-
critical string theory [29] where this reciprocity can be applied. Furthermore, summation by the Euler-
MacLaurin formula (also known as zeta-function regularisation) is consistent with it. I thank Curt
Callan, Marcos Mariño and Tristan McLoughlin for discussions of this principle.
equations [32] (see however appendix D in [33]) in analogy to the covariant approach
of [34, 21, 35]. While this proposal may seem to be encouraging in general, it is at the
same time strange from the Hopf algebra point of view to use an S-matrix which does
not obey the crossing relation [32]. This calls for further investigations.
Several tests of the phase have recently appeared, they are based on four-loop unitary
scattering methods [36], numerical evaluation [37, 38], analytic methods [37, 30, 39] and
on taking a certain highly non-trivial limit [40].
3 The Yangian Y(su(2|2) ⋉ R2)
In the section we investigate Yangian symmetry [41,42] for the above S-matrix. We will
start with a very brief review of Yangian symmetry for generic S-matrices (see [43] for
more extensive reviews), and then we apply the framework to the S-matrix discussed
above.
Yangians and S-Matrices. Typically the symmetries of rational S-matrices are of
Yangian type. The Yangian Y(g) of a Lie algebra g is a deformation of the universal
enveloping algebra of half the affine extension of g.
More plainly, it is generated by the g-generators JA and the Yangian generators ĴA.
Their commutators take the generic form
[JA, JB] = fABC J
[JA, ĴB] = fABC Ĵ
C , (3.1)
and they should obey the Jacobi and Serre relations
J[A, [JB, JC]]
J[A, [JB, ĴC]]
Ĵ[A, [ĴB, JC]]
2fAGD f
F fGHKJ
{DJEJF}. (3.2)
The symbol fABC = gADgBEf
C represents the structure constants f
C with two indices
lowered by means of the inverse of the Cartan-Killing forms gAD and gBE . The brackets
{ } and [ ] at the level of indices imply total symmetrisation and anti-symmetrisation,
respectively. Finally, ~ is a scale parameter whose value plays no physical role. The first
two relations lead to a constraint on the structure constants fABC . The third relation
a deformation of the Serre relation for affine extensions of Lie algebras.
The Yangian is a Hopf algebra and the coproduct takes the standard form
∆JA = JA ⊗ 1 + 1⊗ JA,
∆ĴA = ĴA ⊗ 1 + 1⊗ ĴA + 1
~fABCJ
B ⊗ JC . (3.3)
where fABC = gBDf
C . The antipode S is defined by
S(JA) = −JA, S(ĴA) = −ĴA + 1
~fABCf
D, (3.4)
4For g = su(2) it has to be replaced by a quartic relation.
and the counit ε takes the standard form
ε(1) = 1, ε(JA) = ε(ĴA) = 0. (3.5)
For the study of integrable systems, the evaluation representations of the Yangian
are of special interest. For these the action of the Yangian generators ĴA is proportional
to the Lie generators
ĴA|u〉 = ~uJA|u〉. (3.6)
Here |u〉 is some state of the evaluation module with spectral parameter u. This Yangian
representation is finite-dimensional if the g-representation is. One merely has to ensure
that the Serre relation (3.2) is satisfied. This is indeed not the case for all representations
of all Lie algebras. The power of the Yangian symmetry lies in the fact that tensor
products of evaluation representations are typically irreducible (except for special values
of their spectral parameters). This allows for simple proofs (e.g. for the Yang-Baxter
relation) by representation theory arguments.
Let us finally consider the connection to the S-matrix. The S-matrix is a permutation
operator; it acts by interchanging two modules of the algebra
S : V1 ⊗ V2 → V2 ⊗ V1. (3.7)
In particular, for the tensor product of two evaluation modules one has
S|u1, u2〉 ∼ |u2, u1〉. (3.8)
Invariance of the S-matrix under the Yangian means
[∆JA,S] = [∆ĴA,S] = 0 (3.9)
for all generators JA, ĴA. The existence of such an S-matrix is equivalent to quasi-
cocommutativity of Y(g). Note that only the difference of spectral parameters appears in
the invariance condition: We can write the action of the coproduct of Yangian generators
on the evaluation module |u1, u2〉 as
∆ĴA ≃ (u1 − u2)JA ⊗ 1 + u2∆JA + ~fABCJB ⊗ JC . (3.10)
Here the first equation in (3.9) ensures that the term proportional to u2 drops out from
the second equation. Therefore the S-matrix typically depends on the difference u1 − u2
of spectral parameters only.
Yangians in AdS/CFT. Yangian symmetries for planar AdS/CFT have been inves-
tigated in [44], both for classical string theory and for gauge theory at leading order,
see also [45] Yangian symmetry also persists to higher perturbative orders in both mod-
els [22, 46] and it is likely that it also exists at finite coupling. This Yangian can be
understood as a symmetry of the Hamiltonian on an infinite world sheet or as an expan-
sion of the full monodromy matrix. The Lie symmetry in this picture is psu(2, 2|4) and
the Yangian would be Y(psu(2, 2|4)).
Here we consider a different picture of well-separated excitations on a ferromagnetic
ground state and of their scattering matrix. In this picture the Lie symmetry reduces to
two copies of h and the corresponding Yangian would be Y(h). Our Yangian should arise
as a subalgebra of the full Yangian Y(psu(2, 2|4)) when acting on asymptotic excitation
states.
Hopf Algebra. Let us now consider Y(h). We have already studied the universal
enveloping algebra U(h). All we still need to do is to introduce one generator ĴA for each
JA obeying the relations (3.1,3.2), and define its coproduct, antipode as well as counit.
In (2.5) we have defined a braided coproduct for the universal enveloping algebra.
For consistency with the Serre relations, we also have to apply an analogous braiding to
the standard Yangian coproduct
∆ĴA = ĴA ⊗ 1 + U [A] ⊗ ĴA + ~fABCJBU [C] ⊗ JC . (3.11)
Note that lowering an index requires to use the inverse Cartan-Killing form of the algebra.
In the case of h the Cartan-Killing form is degenerate and we need to extend the algebra
by the sl(2) outer automorphism, see [14]. Effectively, lowering an index leads to an
interchange of the automorphism generators with the central charges. We refrain from
spelling out the Cartan-Killing form or the modified structure constants. Instead we
present the complete set of coproducts of Yangian generators in Tab. 2, where we also
fix the value of ~.
For the sake of completeness we state the antipode5 and the counit
S(ĴA) = −U−[A]ĴA, ε(ĴA) = 0. (3.12)
Cocommutativity. An important question is if this coproduct can be quasi-cocom-
mutative.6 A first step is to consider the central generators Ĉ, P̂, K̂. For that purpose
it is favourable to choose suitable combinations
Ĉ′ = Ĉ+ 1
gα−1P− 1
P̂′ = P̂+ C
P− 2gα
K̂′ = K̂− C
K− 2gα−1
, (3.13)
for whom the coproduct almost trivialises
∆Ĉ′ = Ĉ′ ⊗ 1 + 1⊗ Ĉ′,
∆P̂′ = P̂′ ⊗ 1 + U+2 ⊗ P̂′,
∆K̂′ = K̂′ ⊗ 1 + U−2 ⊗ K̂′. (3.14)
The combination Ĉ′ is already cocommutative, and in order to make the generators P̂′,
K̂′ cocommutative we have to set as above in (2.9,2.10)
P̂′ = iguPP, K̂
′ = iguKK (3.15)
with two universal constants uP and uK. With this choice, Ĉ, P̂, K̂ also become cocom-
mutative because they differ from Ĉ′, P̂′, K̂′ only by central elements.
5Note that fA
= 0 here, so there is no contribution from the Lie generators.
6The braiding factors in (3.11) turn out to be very important for the Yangian. It can easily be
seen that without them the coproduct cannot be quasi-cocommutative. This is in contradistinction
to the universal enveloping algebra where the braided as well as the unbraided coproduct are quasi-
cocommutative.
∆R̂ab = R̂
b ⊗ 1 + 1⊗ R̂ab
Rac ⊗Rcb − 12R
b ⊗Rac
SaγU+1 ⊗Qγb − 12Q
bU−1 ⊗Saγ
δab S
γU+1 ⊗Qγd + 14δ
dU−1 ⊗Sdγ ,
∆L̂αβ = L̂
β ⊗ 1 + 1⊗ L̂αβ
Lαγ ⊗ Lγβ + 12L
β ⊗ Lαγ
QαcU−1 ⊗Scβ + 12S
βU+1 ⊗Qαc
δαβ Q
cU−1 ⊗Scδ − 14δ
δU+1 ⊗Qδc,
∆Q̂αb = Q̂
b ⊗ 1 + U+1 ⊗ Q̂αb
LαγU+1 ⊗Qγb + 12Q
b ⊗ Lαγ
RcbU+1 ⊗Qαc + 12Q
c ⊗Rcb
CU+1 ⊗Qαb + 12Q
b ⊗ C
εαγεbdPU−1 ⊗Sdγ − 12ε
αγεbdS
γU+2 ⊗P,
∆Ŝaβ = Ŝ
β ⊗ 1 + U−1 ⊗ Ŝaβ
RacU−1 ⊗Scβ − 12S
β ⊗Rac
LγβU−1 ⊗Saγ − 12S
γ ⊗ Lγβ
CU−1 ⊗Saβ − 12S
β ⊗ C
εacεβδKU+1 ⊗Qδc + 12ε
acεβδQ
cU−2 ⊗ K,
∆Ĉ = Ĉ⊗ 1 + 1⊗ Ĉ
PU−2 ⊗ K− 1
KU+2 ⊗P,
∆P̂ = P̂⊗ 1 + U+2 ⊗ P̂
− CU+2 ⊗P+P⊗ C,
∆K̂ = K̂⊗ 1 + U−2 ⊗ K̂
+ CU−2 ⊗ K− K⊗ C.
Table 2: The coproduct of the Yangian generators in Y(h).
Fundamental Evaluation Representation. For the fundamental evaluation repre-
sentation we make the ansatz7
ĴA|X 〉 = ig(u+ u0)JA|X 〉. (3.16)
By comparison with (3.13,3.15) we can infer that u has to be related to the parameters
of the fundamental representation by
u = x+ +
= x− +
(x+ + x−)(1 + 1/x+x−) . (3.17)
Furthermore uP and uK in (3.15) have to both coincide with the universal constant
u0 = uP = uK.
As an aside we state the eigenvalue of the quadratic combination
CĈ − 1
PK̂ − 1
KP̂ = 1
ig(u+ u0). (3.18)
Fundamental S-Matrix. Using the coproducts in Tab. 2 we have confirmed that the
S-matrix is also invariant under all of the Yangian generators
[∆ĴA,S] = 0. (3.19)
We have used a computer algebra system to evaluate the action of the Yangian gener-
ators and the S-matrix.9 To show invariance requires heavy use of the identity (2.15).
Superficially it is very surprising to find all these additional symmetries of the S-matrix.
The deeper reason however should be that the coproduct is quasi-cocommutative. We
have thus proved quasi-cocommutativity when acting on fundamental representations.
It is interesting to see that the S-matrix is based on standard evaluation represen-
tations of the Yangian. Nevertheless, it is not a function of the difference of spectral
parameters. This unusual property traces back to the link between the spectral param-
eter u and the h-representation parameters x± in (3.17). The latter is again related to
the braiding in the coproduct (3.11).
As our S-matrix is equivalent [14] to Shastry’s R-matrix, our Yangian is presumably
an extension of the su(2)×su(2) Yangian symmetry of the Hubbard model found in [47].
4 Conclusions and Outlook
In this note we have reviewed the construction of the S-matrix with centrally extended
su(2|2) symmetry that appears in the context of the planar AdS/CFT correspondence
and the one-dimensional Hubbard model. We have furthermore shown that the S-matrix
has an additional Yangian symmetry whose Hopf algebra structure we have presented.
This Yangian is not quite a standard Yangian, but its coproduct needs to be braided in
order to be quasi-cocommutative. This fact is intimately related to the existence of a
7We believe, but we have not verified that this is compatible with the Serre relations (3.2).
8It is conceivable that a further consistency requirement fixes the value of u0, presumably to zero.
9We have also confirmed the invariance of the singlet state found in [10].
triplet of central charges with non-trivial coproduct and leads to the wealth of unusual
features of the S-matrix.
In connection to the Yangian there are many points left to be clarified. Most im-
portantly the representation theory needs to be understood. Which representations of h
lift to evaluation representations of Y(h)? At what values of the spectral parameters do
their tensor products become reducible? This information could be used to prove that
the coproduct is quasi-cocommutative. Also the Yang-Baxter equation for the S-matrix
should follow straightforwardly. It might also give some further understanding of bound
states [48].
Then it would be highly desirable to construct a universal R-matrix for this Yangian
and show that it is quasi-triangular. This would put large parts of the integrable structure
for arbitrary representations of this algebra on solid ground much like for the case of
generic simple Lie algebras.
Some further interesting questions include: Is this Yangian the unique quasi-co-
commutative Hopf algebra based on h? Does the double Yangian [42] exist and what is
its structure? Can the sl(2) automorphism of the algebra be included at the Yangian
level such that the coproduct is quasi-cocommutative? What would the representations
be in this case?
Acknowledgements. I am grateful to C. Callan, D. Erkal, A. Kleinschmidt, P. Ko-
roteev, N. MacKay, M. Mariño, T. McLoughlin, J. Plefka, F. Spill and B. Zwiebel for
interesting discussions.
References
[1] H. Bethe, Z. Phys. 71, 205 (1931).
[2] W. Heisenberg, Z. Phys. 49, 619 (1928).
[3] J. M. Maldacena, Adv. Theor. Math. Phys. 2, 231 (1998), hep-th/9711200.
[4] J. A. Minahan and K. Zarembo, JHEP 0303, 013 (2003), hep-th/0212208.
N. Beisert, C. Kristjansen and M. Staudacher, Nucl. Phys. B664, 131 (2003), hep-th/0303060.
N. Beisert and M. Staudacher, Nucl. Phys. B670, 439 (2003), hep-th/0307042.
[5] G. Mandal, N. V. Suryanarayana and S. R. Wadia, Phys. Lett. B543, 81 (2002), hep-th/0206103.
I. Bena, J. Polchinski and R. Roiban, Phys. Rev. D69, 046002 (2004), hep-th/0305116.
[6] N. Beisert and M. Staudacher, Nucl. Phys. B727, 1 (2005), hep-th/0504190.
[7] N. Beisert, B. Eden and M. Staudacher, J. Stat. Mech. 07, P01021 (2007), hep-th/0610251.
[8] M. Staudacher, JHEP 0505, 054 (2005), hep-th/0412188.
[9] N. Beisert, Phys. Rept. 405, 1 (2004), hep-th/0407277.
[10] N. Beisert, hep-th/0511082.
[11] W. Nahm, Nucl. Phys. B135, 149 (1978).
[12] C. Gómez and R. Hernández, JHEP 0611, 021 (2006), hep-th/0608029.
[13] J. Plefka, F. Spill and A. Torrielli, Phys. Rev. D74, 066008 (2006), hep-th/0608038.
[14] N. Beisert, J. Stat. Mech. 07, P01017 (2007), nlin.SI/0610017.
[15] N. Beisert and P. Koroteev, arxiv:0802.0777.
[16] G. Arutyunov, S. Frolov and M. Zamaklar, JHEP 0704, 002 (2007), hep-th/0612229.
[17] M. J. Martins and C. S. Melo, Nucl. Phys. B785, 246 (2007), hep-th/0703086.
[18] B. S. Shastry, Phys. Rev. Lett. 56, 2453 (1986).
[19] J. Hubbard, Proc. R. Soc. London A276, 238 (1963).
[20] E. H. Lieb and F. Y. Wu, Phys. Rev. Lett. 20, 1445 (1968).
[21] A. Rej, D. Serban and M. Staudacher, JHEP 0603, 018 (2006), hep-th/0512077.
[22] D. Serban and M. Staudacher, JHEP 0406, 001 (2004), hep-th/0401057.
[23] G. Arutyunov, S. Frolov and M. Staudacher, JHEP 0410, 016 (2004), hep-th/0406256.
N. Beisert and A. A. Tseytlin, Phys. Lett. B629, 102 (2005), hep-th/0509084.
R. Hernández and E. López, JHEP 0607, 004 (2006), hep-th/0603204.
N. Gromov and P. Vieira, Nucl. Phys. B789, 175 (2008), hep-th/0703191.
[24] R. A. Janik, Phys. Rev. D73, 086006 (2006), hep-th/0603038.
[25] G. Arutyunov and S. Frolov, Phys. Lett. B639, 378 (2006), hep-th/0604043.
[26] N. Beisert and T. Klose, J. Stat. Mech. 06, P07006 (2006), hep-th/0510124.
[27] N. Beisert, R. Hernández and E. López, JHEP 0611, 070 (2006), hep-th/0609044.
[28] J. K. Erickson, G. W. Semenoff and K. Zarembo, Nucl. Phys. B582, 155 (2000), hep-th/0003055.
[29] D. J. Gross and N. Miljkovic, Phys. Lett. B238, 217 (1990).
[30] A. V. Kotikov and L. N. Lipatov, Nucl. Phys. B769, 217 (2007), hep-th/0611204.
[31] A. V. Belitsky, Phys. Lett. B650, 72 (2007), hep-th/0703058.
N. Dorey, D. M. Hofman and J. Maldacena, Phys. Rev. D76, 025011 (2007), hep-th/0703104.
[32] K. Sakai and Y. Satoh, Phys. Lett. B661, 216 (2008), hep-th/0703177.
[33] A. Rej, M. Staudacher and S. Zieme, J. Stat. Mech. 0708, P08006 (2007), hep-th/0702151v2.
[34] N. Mann and J. Polchinski, Phys. Rev. D72, 086002 (2005), hep-th/0508232.
[35] N. Gromov, V. Kazakov, K. Sakai and P. Vieira, Nucl. Phys. B764, 15 (2007), hep-th/0603043.
[36] Z. Bern, M. Czakon, L. J. Dixon, D. A. Kosower and V. A. Smirnov,
Phys. Rev. D75, 085010 (2007), hep-th/0610248.
[37] M. K. Benna, S. Benvenuti, I. R. Klebanov and A. Scardicchio,
Phys. Rev. Lett. 98, 131603 (2007), hep-th/0611135.
[38] M. Beccaria, G. F. De Angelis and V. Forini, JHEP 0704, 066 (2007), hep-th/0703131.
[39] L. F. Alday, G. Arutyunov, M. K. Benna, B. Eden and I. R. Klebanov, JHEP 0704, 082 (2007),
hep-th/0702028.
I. Kostov, D. Serban and D. Volin, Nucl. Phys. B789, 413 (2008), hep-th/0703031.
[40] J. Maldacena and I. Swanson, Phys. Rev. D76, 026002 (2007), hep-th/0612079.
[41] V. G. Drinfel’d, Sov. Math. Dokl. 32, 254 (1985).
[42] V. G. Drinfel’d, J. Math. Sci. 41, 898 (1988).
[43] D. Bernard, Int. J. Mod. Phys. B7, 3517 (1993), hep-th/9211133.
N. J. MacKay, Int. J. Mod. Phys. A20, 7189 (2005), hep-th/0409183.
[44] L. Dolan, C. R. Nappi and E. Witten, JHEP 0310, 017 (2003), hep-th/0308089.
[45] L. Dolan, C. R. Nappi and E. Witten, hep-th/0401243, in: “Quantum theory and symmetries”,
ed.: P. C. Argyres et al., World Scientific (2004), Singapore.
M. Hatsuda and K. Yoshida, Adv. Theor. Math. Phys. 9, 703 (2005), hep-th/0407044.
L. Dolan and C. R. Nappi, Nucl. Phys. B717, 361 (2005), hep-th/0411020.
[46] A. Agarwal and S. G. Rajeev, Int. J. Mod. Phys. A20, 5453 (2005), hep-th/0409180.
N. Berkovits, JHEP 0503, 041 (2005), hep-th/0411170.
B. I. Zwiebel, J. Phys. A40, 1141 (2007), hep-th/0610283.
N. Beisert and D. Erkal, J. Stat. Mech. 0803, P03001 (2008), arxiv:0711.4813.
[47] D. B. Uglov and V. E. Korepin, Phys. Lett. A190, 238 (1994), hep-th/9310158.
[48] N. Dorey, J. Phys. A39, 13119 (2006), hep-th/0604175.
H.-Y. Chen, N. Dorey and K. Okamura, JHEP 0611, 035 (2006), hep-th/0608047.
ABSTRACT
  We review the algebraic construction of the S-matrix of AdS/CFT. We also
present its symmetry algebra which turns out to be a Yangian of the centrally
extended su(2|2) superalgebra.

<|endoftext|><|startoftext|>
Introduction
Red supergiant stars (RSGs) provide most of the near-IR light emitted by young stellar
populations, such as those in starburst galaxies. As star forming environments tend to
be dusty, rest-frame optical analyses are incomplete (highly obscured populations are
missed) and it is crucial to improve our understanding of spectra at longer wavelengths. In
the past, the near-IR analysis of young stellar populations has often been restricted to the
determination of the average properties of the dominant stars, such as their spectral types
or abundances. The subsequent interpretation of these results in terms of precise stellar
population ages and star formation histories remains an enormous challenge, as it requires
(i) a good understanding of the near-IR spectra of individual RSGs and (ii) adequate
stellar evolution tracks. We have started a programme that aims at providing state of
the art predictions for the emission of RSG-dominated populations and at characterizing
remaining uncertainties. Currently, the project focuses on wavelengths between 0.81 and
2.4µm and spectral resolutions of order λ/δλ = 103.
http://arxiv.org/abs/0704.0401v1
2 A. Lançon et al.
2. Empirical and synthetic spectra of red supergiants
In principle, synthetic stellar spectra are more practical for the prediction of galaxy
spectra than empirical ones, because theory allows us to sample parameter space without
biases. Lançon et al. (2007) show that modern theoretical spectra can reproduce the near-
IR (+optical) emission of giant stars well down to effective temperatures Teff ≃ 3400K,
but that they are not yet satisfactory at lower temperatures and higher luminosities. They
used new Phoenix models to compute spectra at the necessary resolution (0.1 Å before
smoothing), with solar abundances and with the RSG-specific abundances obtained as
the result of internal mixing along stellar evolution tracks; the models include some 109
individual molecular and atomic lines, assume spherical symmetry, and allow dust to form
if conditions are fulfilled. Model limitations include the assumptions of local thermal
equilibrium (LTE) and of hydrostatic equilibrium. A sample of 101 empirical spectra
covering wavelengths between 0.51, 0.81 or 0.90µm and 2.4µm was used for comparison
(Lançon & Wood 2000, Lançon et al. in preparation). The data were acquired with
CASPIR on the 2.3m ANU Telescope at Siding Spring and with SpeX at IRTF, Hawaii.
Below Teff ∼ 3400K, uncertain input line lists are a problem in the models (especially
for molecular bands around 1µm). At high luminosity (luminosity class Ia and Iab), the
main difficulty is to reproduce simultaneously extremely deep CN bands and the relative
strengths of the CO bandheads around 1.7µm and at 2.3µm. RSG-specific abundances
improve fits to the CN bands. Exploratory calculations show that values near 10 km/s for
the “microturbulence” (a 1D-model parameter that hides poorly understood 3D physical
phenomena) may be able to solve both problems (Fig. 1, top left). The calculation of a
new grid has been launched to explore this further. In the mean time, the study shows
that the population synthesis community still has to rely on empirical spectra for RSGs,
and it warns that the lack of satisfactory stellar models implies large uncertainties on
the derived fundamental parameters of the observed stars.
3. Population synthesis using averaged stellar spectra
In order to compute spectra of synthetic populations, we have constructed three se-
quences of average empirical spectra, corresponding to luminosity classes Ia, Iab and
Ib/II. Each subset was sorted into bins according to the estimated Teff , the spectra were
dereddened (an estimate of the reddening is provided by the model fits), and averages
were computed. The sequences shown in Fig. 1 (top right) account for varying micro-
turbulence in a preliminary way, based on the limited number of high microturbulence
models available to us at the time of this writing. We chose to flag any star with an initial
mass above 7M⊙ as a supergiant, which implies that the new spectra affect predictions
up to the age of about 75Myr (Fig. 1, middle left). We note that predictions vary signif-
icantly depending on the adopted evolutionary tracks; different authors predict different
red supergiant lifetimes, and main sequence rotation affects both the surface abundances
and the final red (and blue) supergiant numbers.
4. Star clusters in M82
The synthetic spectra of single stellar populations (SSPs) at solar metallicity are com-
pared with those of young star clusters in starbursts, such as M82-L and M82-F (Smith
& Gallagher 1999). The selected clusters are massive (well above 105M⊙), i.e. stochastic
effects due to an underpopulated RSG-branch are avoided. A few have well determined
optical ages (based on standard non-rotating evolutionary tracks). Figure 1 (middle right
Modelling Red Supergiant Populations 3
 3000 4000 5000
Age = 18 Myr
Av = 1
0.5 1.0 1.5 2.0 2.5
Age=18
Av=3.7
Rv=2.4
chi2=1.58934
1.0 1.5 2.0
Wavelength (micron)
12 CO
[FeII]
1.55 1.60 1.65 1.70
Wavelength (micron)
bad pixels
2.1 2.2 2.3
Figure 1. Top left: Spectrum of an M0Ia RSG compared with models with vmicroturb=2km/s
(top: 4200K, log(g)=-1, AV =4.4) and with vmicroturb=10 km/s (bottom: 4500K, log(g)=0,
AV =4.7; note the improved CN at 1.1µm and CO around 1.6 and 2.3µm). Top right: Param-
eters assigned to the new sequences of average spectra, superimposed on the solar metallicity
tracks of Bressan et al. (1993). Middle left: Comparison of a new SSP spectrum (black) with
the standard predictions of Pegase.2 (differences are largest between 10 and 20Myr). Middle
right: Best near-IR fit to the spectrum of cluster M82-L. The extinction law of Cardelli et al.
(1989) with RV =2.4 allows us to also reproduce the optical spectrum (from Smith & Gallagher
1999). The error spectrum and the χ2 weighting function are shown. Bottom: Zoom-ins of the
H and K windows.
4 A. Lançon et al.
and bottom) shows cluster L, the cluster observed with SpeX with the best signal-to-
noise ratio: an excellent fit is obtained over the whole available range in wavelength.
Such results make the new models valuable tools for purposes such as weak emission line
measurements. The χ2-test restricted to near-IR wavelengths not affected by strong tel-
luric absorption shows that age is formally determined to an accuracy of about ±10Myr.
Because of strong reddening, the optical age of cluster L cannot be determined as well
as that of cluster F: 50-70Myr (Gallagher & Smith 2001, McCrady et al. 2005, Bastian
et al. 2007). For F, our current models provide a near-IR age range of 32 to 46Myr.
This small but nevertheless significant disagreement calls for several comments. (i) Be-
fore accounting for luminosity-dependent microturbulence, we found a near-IR age of 10
to 20Myr; we hope that our next generation of synthetic stellar spectra will significantly
reduce uncertainties originating in uncertain fundamental parameters of stars. (ii) The
spectrum used for optical age-dating and our near-IR spectrum have different slopes in
the region of overlap. This suggests slightly different positions were observed: the ob-
scuration across M82-F is not at all uniform. In addition, a younger cluster located at
very small projected distance might contaminate the near-IR data. (iii) Modified stellar
tracks (e.g. including rotation) might affect optical ages as well as near-IR ones.
5. Conclusions
The spectra of young stellar populations at solar metallicity, observed at R∼ 103, can
now be modelled well from the optical through the near-IR. Nevertheless, ages based on
near-IR spectra remain severely affected by uncertainties. They are due mainly to system-
atic errors, which further work needs to characterize and reduce. Errors are associated on
one hand with the fundamental parameters of red supergiant stars (theoretical spectra,
microturbulence, surface abundances of C, N and O, non-LTE, variability, winds, giant-
supergiant transition), and on the other with evolutionary tracks (convection, opacities,
rotation, binarity, effects of a dense environment). We expect rapid progress in stellar at-
mosphere models to provide us with tools to test stellar tracks further. Complete optical
and near-IR spectra of massive clusters such as those of M82 are useful test cases for the
identification and correction of systematic errors, but even they are not trivial to exploit
(due to inhomogeneous background populations and extinction, mass segregation, etc.).
References
Bastian, N., Konstantopoulos, I., Smith, L.J. & Gallagher, J.S. 2007, MNRAS in press
Cardelli, J.A., Clayton, G.C. & Mathis, J.S. 1989 ApJ 345, 245
Gallagher, J.S. & Smith, L.J. 1999 MNRAS 304, 540
Lançon, A. & Wood, P.R. 2000, A&AS 146, 217
Lançon, A., Hauschildt, P., Ladjal, D. & Mouhcine, M. 2007, A&A in press
McCrady, N., Graham, J.R. & Vacca, W.D. 2005 ApJ 621, 278
Smith, L. J. & Gallagher, J. S. 2001 MNRAS 326, 1027
Discussion
Gustafsson: Do the models with high microturbulence include turbulent pressure in a
consistent way?
Lançon (after discussion with P.H. and H. Lamers): No. But the microturbu-
lent velocities required to reproduce the spectra with 1D models are supersonic, which
suggests that the actual process is not microturbulence... Therefore it is unclear how to
relate this parameter of 1D models to pressure.
	Introduction
	Empirical and synthetic spectra of red supergiants
	Population synthesis using averaged stellar spectra
	Star clusters in M82
	Conclusions
ABSTRACT
  We report on recent progress in the modelling of the near-IR spectra of young
stellar populations, i.e. populations in which red supergiants (RSGs) are
dominant. First, we discuss the determination of fundamental parameters of RSGs
using fits to their near-IR spectra with new PHOENIX model spectra;
RSG-specific surface abundances are accounted for and effects of the
microturbulence parameter are explored. New population synthesis predictions
are then described and, as an example, it is shown that the spectra of young
star clusters in M82 can be reproduced very well from 0.5 to 2.4 micrometers.
We warn of remaining uncertainties in cluster ages.

<|endoftext|><|startoftext|>
Introduction and statement of results
In this paper we study the shape of certain solutions to the following quasilinear
elliptic Neumann problem:
(1.1)
εm∆mu− um−1 + f (u) = 0, u > 0 in Ω,
= 0 on ∂Ω,
where m (2 ≤ m < N) and 0 < ε ≤ 1 are constants and Ω ⊆ RN (N ≥ 3)
is a smooth bounded domain. The operator ∆mu = div(|∇u|m−2 ∇u) is the m-
Laplacian operator, and ν is the unit outer normal to ∂Ω.
Problem (1.1) appears in the study of non-Newtonian fluids, chemotaxis and
biological pattern formation. For example, in the study of non-Newtonian fluids,
the quantity m is a characteristic of the medium: media with m > 2 are called
dilatant fluids, and those with m < 2 are called pseudo-plastics. If m = 2, they
are Newtonian fluids (see [3] and its bibliography for more backgrounds). For
the case m = 2, (1.1) is also known as the stationary equation of the Keller–
Segal system in chemotaxis [14] or the limiting stationary equation of the so-called
Gierer–Meinhardt system in biological pattern formation (see [23]).
First let us recollect some results related to our problem. In a series of remarkable
papers, C.-S. Lin, W.-M. Ni and I. Takagi [14], Ni and Takagi [17], [18] studied the
Neumann problem for certain elliptic equations, including
(1.2)
d∆u− u+ up = 0, u > 0 in Ω,
= 0 on ∂Ω,
where d > 0, p > 1 are constants, and p is subcritical, i.e., p < N+2
. First, Lin,
Ni and Takagi [14] applied the mountain-pass lemma [1] to show the existence of
Key words and phrases. Quasilinear Neumann problem, m-Laplacian operator, least-energy
solution, exponential decay, mean curvature.
http://arxiv.org/abs/0704.0402v1
2 YI LI AND CHUNSHAN ZHAO
a least-energy solution ud to (1.2), by which is meant that ud has the least energy
among all solutions to (1.2) with the energy functional
Id (u) =
|∇u|2 + 1
u2 − 1
defined on W 1,2 (Ω). Hereinafter u+ = max {u, 0} and u− = min {u, 0}. Then in
[17], [18], Ni and Takagi investigated the shape of the least-energy solution ud as
d becomes sufficiently small, and showed that ud has exactly one peak (i.e., local
maximum of ud) at Pd ∈ ∂Ω. Moreover, as d tends to zero, Pd approaches a point
where the mean curvature of ∂Ω achieves its maximum. See [15] for a review in
this field. Also see [16] for the critical case p = N+2
, and [5], [6], [7], [8], [9] for
existence and properties of multiple-peaks solutions to (1.2).
From now on we make some hypotheses on f : R → R, as follows.
(H2) f (t) ≡ 0 for t ≤ 0 and f ∈ C1 (R).
(H3) f(t) = O (t
p) as t→ ∞ with m− 1 < p <
N (m− 1) +m
N −m .
(H4) Let F (t) =
f (s) ds. Then there exists a constant θ ∈
that F (t) ≤ θtf (t) for t > 0.
f (t)
is strictly increasing for t > 0 and f (t) = O
tm−1+δ
as t→ 0+ with
a constant δ > 0.
(H6) Let g (u) =
(m− 1)um−1 − uf ′ (u)
um−1 − f (u)
. Then g (u) is non-increasing on (uc,∞),
where uc is the unique positive solution for f (t) = t
Next we present some preliminary knowledge about least energy solutions of the
following problem:
(1.3)
∆mu− um−1 + f(u) = 0 in RN
u > 0 in RN
As before we define an “energy functional” I:W 1,m(RN ) −→ R associated with
(1.3) by
(1.4) I(ṽ) =
(εm |∇ṽ|m + |ṽ|m)− F (ṽ+)
Next let us give a remark on ground states to the problem 1.3. Here by a
ground state we mean a non-negative nontrivial C1 distribution solution which
tends to zero at ∞. For case m = 2, it is well known that the problem 1.3 has
a unique ground state (up to translations) which is radially symmetric [4]. For
case 2 < m < N uniqueness and radial symmetry of ground states are still open.
But the Steiner symmetrization tells us the least-energy solutions must be radially
symmetric (certainly least-energy solutions are ground states). Our assumptions
guarantee that the uniqueness (up to translations) of radial ground states (see
[20]), which implies the uniqueness of least-energy solutions of the problem (1.3).
Exact exponential decay of radial ground states was given in [11], thus we have the
following proposition about the unique radial least-energy solution to problem 1.3:
Proposition 1.1. Under assumptions (H2)–(H6), there is a unique least energy
solution w(x) for (1.3) satisfying:
LOCATING THE PEAKS OF LEAST-ENERGY SOLUTIONS 3
( i ) w is radial, i.e., w(x) = w(|x|) = w(r) and w ∈ C1(RN ) with
w(0) = maxX∈RN w(x), w
′(0) = 0 and w′(r) < 0, ∀r > 0.
(ii) limr−→∞ w(r)r
m(m−1) e(
m r = C0 > 0 for some constant C0 and
limr−→∞
w′(r)
Remark 1.1. A good example for f (t) which satisfies all hypotheses (H2)–(H6) is
f (t) = tp for m− 1 < p < N (m− 1) +m
Next we define an “energy functional” Jε : W
1,m (Ω) → R associated with (1.1)
(1.5) Jε (v) =
(εm |∇v|m + |v|m)− F (v+)
with F (v+) =
f (s) ds. Then the well-known mountain-pass lemma [1] implies
(1.6) cε = inf
t∈[0,1]
Jε (h (t))
is a positive critical value of Jε, where Γ is the set of all continuous paths joining
the origin and a fixed nonzero element e ∈ W 1,m (Ω) such that e ≥ 0 and Jε (e) ≤ 0.
It turns out cε can also be characterized as follows:
cε = inf
Jε (u)
u ∈ W 1,m (Ω) ; u ≥ 0, u 6≡ 0,
(εm |∇u|m + um) dx =
f (u)u dx
(1.7) cε = inf
M [u] | u ∈ W 1,m (Ω) , u 6≡ 0 and u ≥ 0 in Ω
M [u] = sup
Jε (tu) .
Hence cε is the least positive critical value and a critical point uε of Jε with critical
value cε is called a least-energy solution. Notice also that if we let
c∗ = I(w) =
(|∇w|m + wm) dx−
F (w) dx,
where w is the unique least energy solution of (1.3), then c∗ can also be characterized
(1.8) c∗ = inf
M∗ [v] | v ∈ W 1,m
, v 6≡ 0 and v ≥ 0 in RN
M∗ [v] = sup
I (tv) .
We refer to Lemma 2.1 of [13] for the above characterizations.
Next we consider the following problem:
4 YI LI AND CHUNSHAN ZHAO
v ∈W 1,m
with RN+ =
(x1, · · · , xN ) ∈ RN , xN ≥ 0
and satisfies
(1.9)
∆mv − vm−1 + f (v) = 0, v > 0 in RN+ ,
= 0 on xN = 0.
The solutions of (1.9) can be characterized as critical points of the functional defined
over W 1,m
as follows.
(ṽ) =
|∇ṽ|m + ṽm) dx−
F (ṽ+) dx.
Similarly as above the least positive critical value C∗ corresponding to least energy
solutions of (1.9) can be characterized as
(1.10) C∗ = inf
ṽ∈W 1,m(RN+ ),ṽ≥0,ṽ 6≡0
(tṽ)
and moreover
(1.11) C∗ =
due to the boundary condition in (1.9) and the fact that w is radial and hence
= 0. We also refer to Lemma 2.1 of [13] for the above characterization of C∗.
In Theorem 1.3 of [13], we proved the following theorem.
Theorem 1.1. Under hypotheses (H2)–(H6), let uε be a least-energy solution of
(1.1). Then all local maximum points(if more than one) of uε aggregate to a global
maximum point Pε at a rate of o(ε) and dist(Pε, ∂Ω)/ε→ 0 as ε → 0+, where
dist(·, ·) is the general distance function. Moreover, we have the following upper-
bound estimate for cε as ε→ 0+:
(1.12) cε ≤ εN
c∗ − (N − 1) max
H (P ) γε+ o (ε)
where H (P ) denotes the mean curvature of ∂Ω at P , γ > 0 is a positive constant
given by
(1.13) γ =
N + 1
|w′ (|z|)|m zN dz.
Our goal in this paper is to locate the position on ∂Ω where the global maximum
point Pε of uε in Ω approaches, provided ε is sufficiently small. For the case m = 2,
Ni and Takagi [18] located the peak by linearizing the equation d∆u−u+f (u) = 0
around the ground state w. But this method fails for our problem with m 6= 2 due
to the strong nonlinearity of the m-Laplacian operator ∆mu = div(|∇u|m−2 ∇u).
So we have to use the intrinsic variational method created by Del Pino and Felmer
in [2] to attack it. We also give a complete proof of the exponential decay of
the least-energy solution uε. We remark that our proof is complete and does not
require the non-degeneracy of the unique radial least energy solution w as stated
in Proposition 1.1, and hence it is different from Ni’s and Takagi’s work [17]. Now
our results can be stated as follows:
Theorem 1.2. Under hypotheses (H2)–(H6), let uε be a least-energy solution of
(1.1) and P̃ε ∈ ∂Ω with dist(Pε, P̃ε) = dist(Pε, ∂Ω). Then as ε→ 0+, after passing
to a sequence P̃ε approaches P̄ ∈ ∂Ω with
LOCATING THE PEAKS OF LEAST-ENERGY SOLUTIONS 5
(ii) H
= max
H (P ), where H (P ) denotes the mean curvature of ∂Ω at P
as stated before, and moreover
(iii) the associated critical value cε can be estimated as ε→ 0+ as follows :
(1.14) cε = ε
c∗ − (N − 1)H
γε+ o (ε)
where c∗, γ are as stated in Theorem 1.1.
The organization of this paper is as follows: In Section 2, we will prove some
lemmas which will be used in proving Theorem 1.2. The proof of Theorem 1.2 will
be given in Section 3.
2. Some lemmas and exponential decay of uε
First we prove the following lemma related to exponential decay of the least-
energy solution uε.
Lemma 2.1. Let ε be sufficiently small and that the least-energy solution uε
achieves its global maximum at some point Pε. Then there exist two positive con-
stants c3 and c4 independent of uε or ε such that
(2.1)
uε (x) ≤ c3 exp {−c4 |x− Pε| /ε}
|∇uε(x)| ≤ c3ε−1 exp{−c4|x− Pε|/ε}.
Before beginning to prove this lemma, we give a remark on it.
Remark 2.1. For the case m = 2, under the assumption of non-degeneracy of the
linearized operator ∆− 1 + f ′ (w), where w is the unique ground state of (1.3), Ni
and Takagi [18] showed that uε (x) can be written as
(2.2) uε (x) = w (x) + εφ1 (x) + o (ε)
and φ1 (x) enjoys the exponential-decay property ([18]). Clearly we cannot derive
exponential decay of uε (x) as stated in Lemma (2.1) from (2.2) even though both
w (x) and εφ1 (x) have exponential decay property.
Proof of Lemma 2.1. Since ∂Ω is a smooth compact submanifold of RN , it follows
from the tubular neighborhood theorem [10] that there exists a constant ω (Ω) > 0
which depends only on Ω such that ΩI =
x ∈ Ω, d (x, ∂Ω) < ω (Ω)
is diffeomor-
phic to the inner normal bundle
I = {(x, y) : x ∈ ∂Ω, y ∈ (−ω (Ω) , 0] νx} ,
here νx is the unit outer normal of ∂Ω at x, and the diffeomorphism is defined as
follows: ∀x ∈ ΩI , there exists an unique x̂ ∈ ∂Ω such that d (x, x̂) = d (x, ∂Ω) ,
then Φ∗ : x −→ (x̂,−d (x, x̂) νx̂) . Moreover this diffeomorphism satisfies Φ∗|∂Ω =
Identity. Similarly, let ΩO =
x ∈ RN \ Ω, d(x, ∂Ω) < ω (Ω)
. Then ΩO is diffeo-
morphic to the outer normal bundle
O = {(x, y) : x ∈ ∂Ω, y ∈ [0, ω (Ω)) νx} ,
and the diffeomorphism is given as follows. ∀x ∈ ΩO, there exists an unique
x̄ ∈ ∂Ω such that d(x, x̄) = d (x, ∂Ω) , and then Φ# : x −→ (x̄, d (x, x̄) νx̄)
and Φ#|∂Ω = Identity. Note that (∂Ω)NI is clearly diffeomorphic to (∂Ω)
O via
the following reflection Φ∗ : (∂Ω)
I −→ (∂Ω)
O defined by Φ
∗ ((x, y)) = (x,−y) .
6 YI LI AND CHUNSHAN ZHAO
Therefore, Φ = Φ−1∗ ◦ Φ∗−1 ◦ Φ# : ΩO −→ ΩI is the desired diffeomorphism and
Φ|∂Ω = Identity. Moreover, if we let x = Φ(z) = (Φ1(z), · · · ,ΦN(z)) , z ∈ ΩO,
and z = Ψ(x) = Φ−1(x) = (Ψ1(x), · · · ,ΨN(x)) , x ∈ ΩI , gij =
gij =
(Φ (z)) , we have gij |∂Ω = gij |∂Ω = δij with δij being the Kro-
necker symbol. Denote G =
and A = G− I with I being the N ×N identity
matrix, g(x) = det (gij) and ûε(x) = uε (Φ (x)) for x ∈ ΩO. Then ûε(x) satisfies the
following equations:
εmLûε −
gûm−1ε +
gf (ûε) = 0, ûε > 0 in ΩO
= 0, on ∂Ω,
where
Lûε =
s,l=1
∇ûεG (∇ûε)T
g (∇ûε)G
where Tr means taking the trace of a square matrix.
For 0 < γ̃ ≤ ω(Ω), let ΩOγ̃ =
x ∈ ΩO, d(x, ∂Ω) < γ̃
. We know ‖A‖C0 can be
made arbitrarily small by making γ̃ sufficiently small. Next we define
ūε =
uε(x), x ∈ Ω
ûε(x), x ∈ ΩO,
g̃ij =
δij , x ∈ Ω
gij , x ∈ ΩO,
g̃ij =
δij , x ∈ Ω
gij , x ∈ ΩO,
and Ã(x, ξ) =
Ã1(x, ξ), · · · , ÃN (x, ξ)
for ξ = (ξ1, · · · , ξN ) with
Ãi(x, ξ) =
s,l=1
g̃slξsξl
g̃ijξj
and g̃ = det(g̃ij), B(x, u) =
−um−1 + f(u)
. Then ūε(x) satisfies
(2.3) εm div
Ã(x,∇ūε)
+B(x, ūε) = 0 in Ω
in the weak sense.
LOCATING THE PEAKS OF LEAST-ENERGY SOLUTIONS 7
For any ball Br(x0) ⊂ Ω
ΩO with radius r and center x0 ∈ Ω, let ρ = |x− x0|.
Then for any smooth increasing function φ = φ(ρ) we have
(∇φ)G(∇φ)T
g(∇φ)G
∣∇φ(I +A)(∇φ)T
det(I +A)−1∇φ(I +A)
= |∇φ|m−2 ∇φ+
∣∇φ(I + tA)(∇φ)T
det(I + tA)−1∇φ(I + tA)
= |∇φ|m−2 ∇φ
∣∇φ(I + tA)(∇φ)T
(∇φ)A(∇φ)T
det(I + tA)−1∇φ(I + tA)dt
∣∇φ(I + tA)(∇φ)T
det(I + tA)−1
det(I + tA)−1
∇φ(I + tA)dt
∣∇φ(I + tA)(∇φ)T
det(I + tA)−1(∇φ)Adt.
Therefore
(2.4)
(∇φ)G(∇φ)T
g(∇φ)G
3(N − 1)
φ′ +K
by taking γ̃ sufficiently small, here K > 0 is a constant depending only on Ψ, hence
only on Ω and φ′ =
dφ(ρ)
From now on γ̃ = γ̃(Ω) is fixed such that (i) 3
≤ √g ≤ 5
, (ii) (2.4) holds for
any smooth increasing radial function φ(ρ) and (iii) 3
|ξ|m ≤ Ã(x, ξ) · ξ ≤ 5
|ξ|m for
any ξ = (ξ1, · · · , ξN ). Denote Ωγ̃ = Ω ∪ ΩOγ̃ .
Let Ωε =
(Ω− Pε) and uε(x) = uε(Pε + εx) for x ∈ Ωε. Then uε is a solution
to the following problem:
(2.5)
ε − (uε)m−1 + f(uε) = 0, uε > 0 in Ωε
= 0, on ∂Ωε,
where n is the unit outer normal of ∂Ωε. Similarly, let Ω
Ωγ̃ − Pε
ūε(x) = ūε(Pε + εx) for x ∈ Ωγ̃ε . Since ūε converges to the unique radial least-
energy solution w of (1.3) in C1loc(R
N ) ∩W 1,m(RN ) as ε → 0+ (see the proof of
Theorem 1.2 of [13]) and w satisfies:
(i) w is radial, i.e.,w(x) = w(|x|) = w(r) > 0
(ii) lim
w(r)r
m(m−1) e(
m−1 )
m r = C0 > 0
(see Theorem 1 of [11]) which yields w(r) ≤ κe−µr for a constant κ > 0 and
. First we fix a constant η > 0 such that 1
tm−1 > f(t) for t ∈
(0, η]. From hypothesis (H5) it follows that such an η exists. Then there exist
ε0 > 0 sufficiently small and R0 sufficiently large such that 4κ exp{−µR0} < η and
8 YI LI AND CHUNSHAN ZHAO
‖ūε − w‖C0(BR0(0)∩Ωε) ≤ κ exp{−µR0}, which yields
uε|(∂BR0 (0))∩Ωε ≤ 2κ exp{−µR0}.
Note that
ε − 7
(uε)m−1 = 1
(uε)m−1 − f(uε) > 0 in Ωε \BR0(0),
= 0 on ∂Ωε \BR0(0),
uε ≤ 2κ exp{−µR0} on ∂BR0(0) ∩ Ωε.
Then we have
uε(x) ≤ 2κ exp{−µR0}, for x ∈ Ωε \BR0(0)
due to the strong maximum principle ([22]). We get by scaling back that
uε|Ω\BεR0 (0) ≤ 2κ exp{−µR0}
(2.6)
uε(x) ≤ w
+ κ exp{−µR0} ≤ κ exp{−
}+ κ exp{−µR0}
≤ 2κ exp{−µ|x|
for x ∈ Ω ∩BεR0(0).
From definition of ūε we know
ūε(x) ≤ 2κ exp{−
µ (|x| − 2dist (Pε, ∂Ω))
} ≤ 4κ exp{−µ|x|
} for x ∈ Ωγ̃∩BεR0(0)
for ε ∈ (0, ε0] with ε0 sufficiently small due to the fact dist(Pε, ∂Ω) = o(ε) as
ε→ 0+. Note that
Ωγ̃\BεR0 (0)
ūε ≤ 4κ exp{−µR0}.
Choice of R0 and γ̃ tells us for any 0 < t ≤ 4κ exp{−µR0}
B(x, t) =
−tm−1 + f(t)
tm−1.
∀x0 ∈ Ω \BεR0(0) and Br(x0) ⊂ Ωγ̃ \BεR0(0), define
φ(x) = φ(ρ) = φ(|x− x0|)
Ωγ̃\BεR0 (0)
where λ∗ > 0 is a constant to be determined later. Simple calculations show that
(i) φ′(ρ) > 0 and
3(N−1)
φ′ +K
(m− 1) (λ∗)m
))m−2
tanh(λ∗ρε )
( λ∗ρε )
+ε (λ∗)
))m−1
LOCATING THE PEAKS OF LEAST-ENERGY SOLUTIONS 9
for any 0 < λ∗ ≤ λ̂, where λ̂ > 0 is a small constant depending only on m and
Ω through K. We remark that we have used the fact maxr∈[0,∞)
tanhr
< ∞ for
m ≥ 2. From now on we choose λ∗ = λ̂.
Therefore we have
εm div(Ã(x,∇ūε))−
ūm−1ε ≥ 0 in Br(x0),
εm div(Ã(x,∇φ))−
m−1 ≤ 0 in Br(x0).
Clearly
φ|∂Br(x0) ≥ ūε|∂Br(x0).
Then from the Comparison Theorem (Theorem 10.1 of [19]) it follows that
φ(x) ≥ ūε(x) in Br(x0).
In particular, φ(x0) ≥ ūε(x0). Thus we get
uε(x0) ≤
Ωγ̃\BεR0(0)
exp{−λ∗r
Choosing r = d
x0, ∂
Ωγ̃ \BεR0(0)
we get
uε(x0) ≤ 4κ exp{−µR0 −
} ≤ 2κ exp{− λ̃(εR0 + r)
with λ̃ = min{µ, λ∗}. Note that x0 belongs to one of the following two cases:
(i) d
x0, ∂
Ωγ̃ \BεR0(0)
= d (x0, ∂BεR0(0)) ,
(ii) d
x0, ∂
Ωγ̃ \BεR0(0)
x0, ∂Ω
For case (i) we have d(x0, Pε) ≤ εR0 + r and therefore
(2.7) uε(x0) ≤ 4κ exp{−
λ̃d(x0, Pε)
For case (ii) we have r ≥ γ̃ and thus
(2.8)
uε(x0) ≤ 4κ exp{−λ̃
εR0 + r
} ≤ 4κ exp{− λ̃γ̃
≤ 4κ exp{−λ̃ γ̃
diam(Ω)
· d(x0, Pε)
Combining (2.6), (2.7) and (2.8) together and letting c̃3 = 4κ, c̃4 = min{µ, λ̃, λ̃γ̃diam(Ω)}
yields
(2.9) uε(x) ≤ c̃3 exp{−
c̃4|x− Pε|
Next we show the estimate for |∇uε| holds. First from (2.5) it follows that
(2.10) ∆mu
ε = (uε)
m−1 − f(uε), uε > 0 in Ωε
For x ∈ Ωε and dist(x, ∂Ωε) ≥ 1, consider (2.10) in the unit ball centered at x, i.e.,
B1(x). Then by an C
1,α estimate (see [21], for example) there exists two constants
10 YI LI AND CHUNSHAN ZHAO
C > 0 and α∗ ∈ (0, 1) which are independent of ε such that
(2.11)
‖uε‖C1,α∗ (B 1
(x)) ≤ C
‖uε‖L∞(B1(x)) + ‖ (u
m−1 − f(uε)‖
L∞(B1(x))
≤ c∗3 exp{−c∗4|x− Pε|},
where we have used (2.9) and the fact that uε(x) = uε(Pε + εx) for x ∈ Ωε.
Especially we have
(2.12) |∇uε(x)| ≤ c∗3 exp{−c∗4|x− Pε|},
for x ∈ Ωε and dist(x, ∂Ωε) ≥ 1. For x ∈ Ωε with dist(x, ∂Ωε) < 1. Let x0 ∈ ∂Ωε
be a point such that dist(x, x0) = dist(x, ∂Ωε) and consider ū
ε(x) = ūε(Pε + εx)
in B2(x0), the ball of radius 2 centered at x0, then from (2.3) it follows that ū
satisfies
(2.13) div
Ã(Pε + εx,∇ūε)
+B(Pε + εx, ū
ε) = 0 in B2(x0)
in the weak sense. Then applying an C1,α estimate (see [21], for example) again
yields as above that there exists two constants C > 0 and α∗ ∈ (0, 1) which are
independent of ε such that
‖ûε‖C1,α∗(B1(x0)) ≤ C
‖ûε‖L∞(B2(x0)) + ‖B(Pε + εx, û
L∞(B2(x0))
≤ c∗3 exp{−c∗4|x− Pε|}
by adjusting c∗3 and c
4 if it is necessary. Especially we have
(2.14) |∇uε(x)| ≤ c∗3 exp{−c∗4|x− Pε|},
Thus combining (2.11) and (2.14) together and scaling back we have for x ∈ Ω
|∇uε(x)| ≤ c∗3ε−1 exp{−c∗4
|x− Pε|
Proof of Lemma 2.1 is completed by letting c3 = max{c̃3, c∗3} and c4 = min{c̃4, c∗4}.
Remark 2.2. Our proof of the Lemma 2.1 with necessary minor modifications also
works well for elliptic systems.
Next we present a lemma related to extensions of uε.
Lemma 2.2. There exists a C1-extension ũε of uε which has compact support in
N and satisfies
(ii) ‖ũε‖W 1,m(RN ) ≤ c5 ‖uε‖W 1,m(Ω) and ‖ũε‖C1(RN ) ≤ c5 ‖uε‖C1(Ω̄),
(iii) ũε also has the exponential-decay property as stated in Lemma 2.1, i.e.,
there exists an absolute constant λ ≥ 1 such that
(2.15)
0 ≤ ũε ≤ c3λ exp
|x− Pε|
|∇ũε(x)| ≤ c3λε−1 exp{−
|x− Pε|
LOCATING THE PEAKS OF LEAST-ENERGY SOLUTIONS 11
(iv) there exists a positive constant δ̃ = δ̃ (Ω) such that for any P ∈ ∂Ω,
ũε|B
(P )\Ω is the reflection of uε through ∂Ω.
Proof. Let d̃ = d
∂Ω, ∂Ωγ̃
and 0 ≤ ̺(x) ≤ 1 be a smooth cut-off function such
that ̺(x) ≡ 1 for x ∈ {x ∈ RN , d(x,Ω) ≤ d̃
} and ̺(x) ≡ 0 for x ∈ RN \
Then ũε = ̺ūε satisfies (ii), (iii) and (iv) automatically. The proof of this lemma
is completed. �
Similar to energy density introduced in [2], we define the energy density associ-
ated with (1.1) as follows:
E (w, y′) =
(|∇w|m + wm)− F (w)
(y′, 0) for y′ ∈ RN−1.
Then we have the following lemma.
Lemma 2.3. Let G be a C2 function in a neighborhood of the origin of RN−1.
i,j=1
Gij (0) yiyjE (w, y
′) dy′ = 2∆G (0) γ,
where γ is the constant defined in (1.13), and y′ = (y1, . . . , yN−1), and
Gij (0) =
∂yi∂yj
(0) .
Proof. In Lemma 2.4 of [13], we showed that
(2.16) γ =
(|∇w|m) + wm − F (w)
zN dz.
Next we introduce the polar coordinates
z1 = r sin θN−1 sin θN−2 · · · sin θ2 sin θ1,
z2 = r sin θN−1 sin θN−2 · · · sin θ2 cos θ1,
z3 = r sin θN−1 sin θN−2 · · · cos θ2,
... ,
zN = r cos θN−1,
and notice that
(r, θ1, . . . , θN−1) | r > 0, 0 ≤ θ1 < 2π,
0 ≤ θj < π for j = 2, . . . , N − 2, and 0 ≤ θN−1 <
and that
dz = rN−1 sin θ2 sin
2 θ3 · · · sinN−2 θN−1 dr dθ1 · · · dθN−1.
After elementary computations one obtains
(2.17) γ =
|w′ (r)|m + wm (r)
− F (w (r))
rN dr · ωN−2,
where ωN−2 is the volume of the unit ball in R
N−2. Here we used the fact that w
is radially symmetric.
12 YI LI AND CHUNSHAN ZHAO
Using the radial symmetry of w again, we obtain
i,j=1
Gij (0) yiyjE (w, y
′) dy′(2.18)
Gii (0) y
iE (w, y
′) dy′
Gii (0) ·
N − 1
|y′|2E (w, y′) dy′
= ∆G (0) ·
E (w, r) rN dr · ωN−2,
where E (w, r) = (1/m)
|w′ (r)|m + wm (r)
− F (w (r)) . Comparing (2.17) and
(2.18) yields
i,j=1
Gij (0) yiyjE (w, y
′) dy′ = 2∆G (0) γ.
The proof of Lemma 2.3 is completed. �
3. Proof of Theorem 1.2
With the help of the lemmas in Section 2, now we can give the proof of Theorem
Proof of Theorem 1.2. Since as ε→ 0+, Pε → ∂Ω at the rate of o(ε), it follows that
d(Pε, P̃ε)/ε → 0, where P̃ε ∈ ∂Ω is the closest point on ∂Ω to Pε. then by passing
to a sequence, P̃ε → P̄ ∈ ∂Ω. After an ε-dependent rotation and translation,
we may assume that P̃ε is at the origin and Ω can be described in a fixed cubic
neighborhood V of P̄ as the set
{ (x′, xN ) | xN > ψε (x′) } with x′ = (x1, . . . , xN−1) ,
where ψε is smooth, ψε (0) = 0, ∇ψε (0) = 0. Furthermore, we may assume that
ψε converges locally in the C
2 sense to ψ, a corresponding parametrization at P̄ .
Note that since P̃ε is the origin, so we have Pε/ε → 0 as ε → 0+. Thus we have
ũε(x) = ũε(εx) = ũε
x− Pε
→ w(x) in C1loc
as ε → 0+. From the
characterization of cε = Jε (uε) in Section 1, we have
ε−NJε (uε) ≥ ε−NJε (tuε) = IΩε (tuε)
for all t > 0. Hereinafter
IΩ∗ (v) =
(|∇v|m + |v|m) dx−
F (v) dx.
IΩε (tu
ε) = IΩε (tũ
ε) ≥ I
(tũε) + I(Ωε∩Vε)\RN+
(tũε)− I(RN+∩Vε)\Ωε (tũ
ε)(3.1)
= I + II− III,
LOCATING THE PEAKS OF LEAST-ENERGY SOLUTIONS 13
with Vε =
V. Let us choose t = tε so that IRN
(tũε) maximizes in t. Then from
the definition of C∗ in (1.10), equality (1.11) and Lemma 2.2 it follows that
I = I
(tεũ
ε) ≥ c∗
e−c6/ε
for some constant c6 > 0 independent of ε. Next we give an estimate of tε.
Lemma 3.1. There is a unique tε ∈ (0,∞) such that
tmε (|∇ũε|
+ (ũε)m) dx −
F (tεũ
ε) dx
= sup
tm (|∇ũε|m + (ũε)m) dx−
F (tũε) dx
and moreover
(3.2) tε = 1 + o (1) as ε→ 0+.
Proof. Under assumption (H5), the existence and uniqueness of tε can be proved
similarly to the proof of Lemma 2.1 of [13]. Here we only need show (3.2). Let
(3.3) hε (t) =
(|∇ũε|m + (ũε)m) dx−
F (tũε) dx.
(3.4)
h′ε (t) = t
(|∇ũε|m + (ũε)m) dx−
ũεf (tũε) dx
= tm−1
(|∇w|m + wm) dx−
wf (tw) dx+ o(1),
here we have used the exponential decay of ũε in Lemma 2.2, exponential decay of
w and ũε → w in C1loc
as ε → 0+. Moreover the term o (1) → 0 uniformly
in t on each compact interval as ε → 0+. (3.3) tells us hε(1) = 12c∗ + o(1), which
yields that tε is bounded and away from 0. Also from (3.4) it follows that
(3.5)
h′ε (t) = t
wf (w) dx−
w f (tw) dx+ o (1)
= tm−1
f (w)
− f (tw)
dx+ o (1) .
Therefore at t = tε we have
(3.6)
f (w)
− f (tεw)
(tεw)
dx = o (1) .
Since f(t)/tm−1 is strictly increasing (see (H5)) it follows from (3.6) that tε =
1 + o (1) . The proof of Lemma 3.1 is completed. �
14 YI LI AND CHUNSHAN ZHAO
Proof of Theorem 1.2 continued. Using again the exponential decay of uε in
Lemma 2.1 and the expansion of tε in Lemma 3.1, we obtain
−II = −
(RN−1×{0})∩Vε
dy′(3.7)
(ψε(εy′))
tmε (|∇ũε|
+ (ũε)
)− F (tεũε)
(y′, yN ) dyN
= − (1 + o (1))
(RN−1×{0})∩(Ωε∩Vε)
(ψε(εy′))
(|∇uε|m + (uε)m)− F (uε)
(y′, yN) dyN .
Similarly,
(3.8) III = (1 + o (1))
Vε∩(RN−1×{0})
(ψε(εy
(|∇ũε|m + (ũε)m)− F (ũε)
(y′, yN) dyN .
In above a+ = max{a, 0}, a− = min{a, 0}. Since ψε (0) = 0, ∇ψε (0) = 0 and
ψε converges in the C
2 local sense to ψ, and ũε → w in the C1 local sense in RN
with uniform exponential decay with respect to ε, it follows from the dominated
convergence theorem that
(−II + III)
i,j=1
ψij (0) yiyj
(|∇w|m + wm)− F (w)
(y′, 0) dy′
= ∆ψ (0) γ = (N − 1)H
γ (by Lemma 2.3).
Thus we have
cε ≥ εN
c∗ − (N − 1)H
γε+ o (ε)
But (1.12) in Theorem 1.1 tells us
cε ≤ εN
c∗ − (N − 1) max
H (P ) γε+ o (ε)
Therefore we get
(ii) H
= max
H (P ), which is (ii) of Theorem 1.2,
(iii) cε = ε
c∗ − (N − 1)H
γε+ o (ε)
as ε→ 0+,
which is part (iii) of Theorem 1.2. The proof of Theorem 1.2 is completed. �
Acknowledgement. The authors want to give their thanks to anonymous
referee for some helpful comments.
LOCATING THE PEAKS OF LEAST-ENERGY SOLUTIONS 15
References
[1] A. Ambrosetti and P.H. Rabinowitz, Dual variational methods in critical point theory and
applications, J. Functional Analysis 14 (1973), 349–381. MR 51 #6412
[2] M. Del Pino and P.L. Felmer, Spike-layered solutions of singularly perturbed elliptic problems
in a degenerate setting, Indiana Univ. Math. J. 48 (1999), 883–898. MR 2001b:35027
[3] J.I. Dı́az, Nonlinear Partial Differential Equations and Free Boundaries, Vol. I: Elliptic
Equations, Research Notes in Mathematics, vol. 106, Pitman Advanced Publishing Program,
Boston, 1985. MR 88d:35058
[4] B. Gidas, W.M. Ni, L. Nirenberg, Symmetry of positive solutions of nonlinear elliptic equa-
tions in RN , Advances in Math, Supplementary Studies 7A (1981) 369-402. MR 84a:35083
[5] C. Gui, Multipeak solutions for a semilinear Neumann problem, Duke Math. J. 84 (1996),
739–769. MR 1997i:35052
[6] C. Gui and N. Ghoussoub, Multi-peak solutions for a semilinear Neumann problem involving
the critical Sobolev exponent, Math. Z. 229 (1998), 443–474. MR 2000k:35097
[7] C. Gui and J. Wei, Multiple interior peak solutions for some singularly perturbed Neumann
problems, J. Differential Equations 158 (1999), 1–27. MR 2000g:35035
[8] , On multiple mixed interior and boundary peak solutions for some singularly per-
turbed Neumann problems, Canad. J. Math. 52 (2000), 522–538. MR 2001b:35023
[9] C. Gui, J. Wei, and M. Winter, Multiple boundary peak solutions for some singularly per-
turbed Neumann problems, Ann. Inst. H. Poincaré Anal. Non Linéaire 17 (2000), 47–82.
MR 2001a:35018
[10] M.-M. Hirsch, Differential Topology, Graduate Texts in Mathematics, vol. 33, Springer-
Verlag, New York, 1976. MR 56# 6669
[11] Y. Li and C. Zhao, A note on exponential decay properties of ground states for quasilinear
elliptic equations, Proc. Amer. Math. Soc. 133 (2005), 2005–2012. MR 2006a:35091
[12] , On the structure of solutions to a class of quasilinear elliptic Neumann problems,
J. Differential Equations 212 (2005), 208–233. MR 2006b:35107
[13] , On the shape of least-energy solutions for a class of quasilinear elliptic Neumann
problems, IMA Journal of Applied Mathematics 2007; doi: 10.1093/imamat/hx1032.
[14] C.-S. Lin, W.-M. Ni, and I. Takagi, Large amplitude stationary solutions to a chemotaxis
system, J. Differential Equations 72 (1988), 1–27. MR 89e:35075
[15] W.-M. Ni,Diffusion, cross-diffusion, and their spike-layer steady states, Notices Amer. Math.
Soc. 45 (1998), no. 1, 9–18. MR 99a:35132
[16] W.-M. Ni, X.B. Pan, and I. Takagi, Singular behavior of least-energy solutions of a semilin-
ear Neumann problem involving critical Sobolev exponents, Duke Math. J. 67 (1992), 1–20.
MR 93j:35081
[17] W.-M. Ni and I. Takagi, On the shape of least-energy solutions to a semilinear Neumann
problem, Comm. Pure Appl. Math. 44 (1991), no. 7, 819–851. MR 92i:35052
[18] , Locating the peaks of least-energy solutions to a semilinear Neumann problem, Duke
Math. J. 70 (1993), 247–281. MR 94h:35072
[19] P. Pucci and J. Serrin, The strong maximum principle revisted, J. Differential Equations 196
(2004), no. 1, 1-66. MR 2004k:35033
[20] J. Serrin and M.-X. Tang, Uniqueness of ground states for quasilinear elliptic equations,
Indiana Univ. Math. J. 49 (2000), no. 3, 897–923. MR 2002d:35072
[21] P. Tolksdorf, Regularity for a more general class of quasilinear elliptic equations, J. Differ-
ential Equations 51 (1984), no. 1, 126–150. MR 85g:35047
[22] J. Vazquez, A strong maximum principle for some quasilinear elliptic equations, Appl. Math.
Optim. 12 (1984), no. 3, 191–202. MR 86m:35018
[23] J. Wei, On the boundary spike layer solutions to a singularly perturbed Neumann problem,
J. Differential Equations 134 (1997), 104–133. MR 98e:35076
Department of Mathematics, The University of Iowa, Iowa City, IA 52242
16 YI LI AND CHUNSHAN ZHAO
Department of Mathematics, Hunan Normal University, Changsha, Hunan
E-mail address: yi-li@uiowa.edu
Department of Mathematical Sciences, Georgia Southern University, Statesboro,
GA 30460
E-mail address: czhao@GeorgiaSouthern.edu
	1. Introduction and statement of results
	2. Some lemmas and exponential decay of u0=x"0122
	3. Proof of Theorem ??
	References
ABSTRACT
  In this paper we study the shape of least-energy solutions to a singularly
perturbed quasilinear problem with homogeneous Neumann boundary condition. We
use an intrinsic variation method to show that at limit, the global maximum
point of least-energy solutions goes to a point on the boundary faster than the
linear rate and this point on the boundary approaches to a point where the mean
curvature of the boundary achieves its maximum. We also give a complete proof
of exponential decay of least-energy solutions.

<|endoftext|><|startoftext|>
Microsoft Word - preprint.doc
Preprint version of Nature Photonics 1, 215 (2007) 
Review: Semiconductor Quantum Light Sources  
Andrew J Shields  
Toshiba Research Europe Limited, 260 Cambridge Science Park, Cambridge CB4 0WE, UK 
Abstract 
Lasers and LEDs display a statistical distribution in the number of photons emitted in a given time interval.  New 
applications exploiting the quantum properties of light require sources for which either individual photons, or pairs, are 
generated in a regulated stream.  Here we review recent research on single-photon sources based on the emission of a 
single semiconductor quantum dot.  In just a few years remarkable progress has been made in generating 
indistinguishable single-photons and entangled photon pairs using such structures.  It suggests it may be possible to 
realise compact, robust, LED-like semiconductor devices for quantum light generation.   
Applications of Quantum Photonics 
Applying quantum light states to photonic applications allows functionalities that are not possible using ‘ordinary’ 
classical light.  For example, carrying information with single-photons provides a means to test the secrecy of optical 
communications, which could soon be applied to the problem of sharing digital cryptographic keys.1 2 Although secure 
quantum key distribution systems based on weak laser pulses have already been realised for simple point-to-point links, 
true single-photon sources would improve their performance.3 Furthermore, quantum light sources are important for 
future quantum communication protocols such as quantum teleportation. 4   Here quantum networks sharing 
entanglement could be used to distribute keys over longer distance or through more complex topologies.5   
A natural progression would be to use photons for quantum information processing, as well as communication.  In this 
regard it is relatively straightforward to encode and manipulate quantum information on a photon.   On the other hand, 
single-photons do not interact strongly with one-another, a prerequisite for a simple photon logic gate.  In linear optics 
quantum computing67 (LOQC) this problem is solved using projective measurements to induce an effective interaction 
between the photons.  Here triggered sources of single-photons and entangled pairs are required as both the qubit 
carriers, as well as auxiliary sources to test the successful operation of the gates.  Although the component requirements 
for LOQC are challenging, they have recently been relaxed significantly by new theoretical schemes. 7  Quantum light 
states are also likely to become increasingly important for various types of precision optical measurement.8  
For these applications we would ideally like light sources which generate pure single-photon states “on demand” in 
response to an external trigger signal.  Key performance measures for such a source are the efficiency, defined as the 
fraction of photons collected into the experiment or application per trigger, and the second order correlation function at 
zero delay, see text box.  The latter is essentially a measure of the two-photon rate compared to a classical source with 
random emission times of the same average intensity.  In order to construct applications involving more than one 
photon, it is also important that photons emitted from the source (at different times), as well as those from different 
sources, are otherwise indistinguishable.   
In the absence of a convenient triggered single-photon source, most experiments in quantum optics rely on non-linear 
optical processes for generating quantum light states.  Optically pumping a crystal with a χ(2) non-linearity has a finite 
probability of generating a pair of lower energy photons via parametric down conversion.  This may be used to prepare 
photon pairs with time-bin entanglement,9 entangled polarisations,1011 or alternatively single-photon states ‘heralded’ by 
the second photon in the pair.12  A χ(3) non-linearity in a semiconductor has also been used to generate entangled 
pairs.13  As these non-linear processes occur randomly, there is always a finite probability of generating two pairs that 
increases with pump power.  As double pairs degrade the fidelity of quantum optical gates, the pump laser power must 
be restricted to reduce the rate of double pairs to an acceptable level, which has a detrimental effect upon the efficiency 
of the source.14  This means that although down-conversion sources continue to be highly successful in demonstrating 
few photon quantum optical gates, scaling to large numbers may be problematic. Solutions have been proposed based 
on switching multiple sources,15 or storing photons in a switched fibre loop.16    
Ideally we would like a quantum light source that generates exactly one single-photon, or entangled-pair, per excitation 
trigger pulse.  This may be achieved using the emission of a single quantum system.  After relaxation, a quantum 
system is by definition no longer excited and therefore unable to re-emit.  Photon anti-bunching, the tendency of a 
quantum source to emit photons separated in time, was first demonstrated in the resonance fluorescence of a low density 
vapour of Na atoms,17 and subsequently for a single ion.18   
Quantum dots are often referred to as “artificial atoms”, as their electron motion is quantised in all three spatial 
directions, resulting in a discrete energy level spectrum, like that of an atom.  They provide a quantum system which 
can be grown within robust, monolithic semiconductor devices and can be engineered to have a wide range of desired 
properties.  In the following we review recent progress towards the realisation of a semiconductor technology for 
quantum photonics.  An excellent account of the early work can be found in Ref. 19. Space restrictions limit discussion 
of work on other quantised systems. For this we refer the reader to the comprehensive review in Ref 20. 
Optical Properties of Single Quantum Dots 
Nano-scale quantum dots with good optical properties can be fabricated using a natural growth mode of strained layer 
semiconductors.21  When InAs is deposited on GaAs it initially grows as a strained two-dimensional sheet, but beyond 
some critical thickness, tiny islands like those shown in Fig.1a form in order to minimize the surface strain.  
Overgrowth of the islands leads to the coherent incorporation of InxGa1-xAs dots into the crystal structure of the device, 
as can be seen in the cross-sectional image of Fig.1c.  The most intensively studied are small InAs dots on GaAs 
emitting around 900-950nm at low temperatures, which can be conveniently measured with low noise Si single photon 
detectors.    
A less desirable feature of the self-organising technique is that the dots form at random positions on the growth surface.  
However, recently considerable progress has been made on controlling the dot position (Fig.1b) within the device 
structure by patterning nanometer sized pits on the growth surface.2223   
As InGaAs has a lower energy bandgap than GaAs, the quantum dot forms a potential trap for electrons and holes.  If 
sufficiently small, the dot contains just a few quantised levels in the conduction and valence bands, each of which holds 
two electrons or holes of opposite spin.   Illumination by a picosecond laser pulse excites electrons and holes which 
rapidly relax to the lowest lying energy states either side of the bandgap.  A quantum dot can thus capture two electrons 
and two holes to form the biexciton state, which decays by a radiative cascade, as shown schematically in Fig.2a.  One 
of the trapped electrons recombines with one of the holes and generates a first photon (called the biexciton photon, X2).  
This leaves a single electron-hole pair in the dot (the exciton state), which subsequently also recombines to generate a 
second (exciton, X) photon.  The biexciton and exciton photons have distinct energies, as can be seen in the low 
temperature photoluminescence spectrum of Fig.2a, due to the different Coulomb energies of their initial and final 
states.  Often a number of other weaker lines can also be seen due to recombination of charged excitons which form 
intermittently when the dot captures an excess electron or hole.24  Larger quantum dots, with several confined electron 
and hole levels, have a richer optical signature due to the large number of exciton complexes that can be confined.   
High resolution spectroscopy reveals that the X2 and X transitions of a dot are in fact both doublets with linearly 
polarised components parallel to the [110] and [1-10] axes of the semiconductor crystal, labelled here H and V, 
respectively.2526  The origin of this polarisation is an asymmetry in the electron-hole exchange interaction of the dot 
which produces a splitting of the exciton spin states.  The asymmetry derives from an elongation of the dot along one 
crystal axis and in-built strain in the crystal.  It mixes the exciton eigenstates of a symmetric dot with total z-spin Jz = 
+1 and -1 into symmetric and anti-symmetric combinations, which couple to two H or two V polarised photons, 
respectively, as shown in Fig.2.   
The exciton state of the dot has a typical lifetime of ~1ns, which is due purely to radiative decay.  As this is much 
longer than the duration of the exciting laser pulse, or the lifetime of the photo-excited carrier population in the 
surrounding semiconductor, only one X photon can be emitted per laser pulse.  This can be proven, as first reported27 by 
Peter Michler, Atac Imamoglu and their colleagues in Santa Barbara, by measuring the second order correlation 
function, g(2)(τ) of the exciton photoluminescence,2829 see text box.  In fact each of the exciton complexes of the dot 
generates at most one photon per excitation cycle, which allows single-photon emission from also the biexciton or 
charged exciton transitions.30  
Cross-correlation measurements313233  between the X and X2 photons confirm the time correlation expected for the 
cascade in Fig.2a, ie the X photon follows the X2 one.  Indeed the shape of the cross-correlation function for both CW 
and pulsed excitation can be accurately described with a simple rate equation model and the experimentally measured X 
and X2 decay rates.
 34   
Semiconductor Microcavities 
A major advantage of using self-assembled quantum dots for single-photon generation is that they can be easily 
incorporated into cavities using standard semiconductor growth and processing techniques.  Cavity effects are useful for 
directing the emission from the dot into an experiment or application, as well as for modifying the photon emission 
dynamics. 3536 Purcell37 predicted enhanced spontaneous emission from a source in a cavity when its energy coincides 
with that of the cavity mode, due to the greater density of optical states to emit into.  For an ideal cavity, in which the 
emitter is located at the maximum of the electric field with its dipole aligned with the local electric field, the 
enhancement in decay rate is given by Fp = (3/4π
2) (λ/n)3 Q/V, where Q is the quality factor, a measure of the time a 
photon is trapped in the cavity, and V is the effective mode volume.  Thus high photon collection efficiency, and 
simultaneously fast radiative decay, requires small cavities with highly reflecting mirrors and a high degree of structural 
perfection.  However, without controlling the location of the dot in the cavity, as discussed below, it may be difficult to 
achieve the full enhancement predicted by the Purcell formula.   
Figure 3 shows images of some of the single quantum dot cavity structures that have proven most successful.  Pillar 
microcavities, formed by etching cylindrical pillars into semiconductor Bragg mirrors placed either side of the dot layer, 
have shown large Purcell enhancements and have a highly directional emission profile, thus making good single-photon 
sources.38394041 Purcell factors of around 6 have been measured directly,4041 through the rate of cavity-enhanced radiative 
decay compared to that of a dot without cavity, implying a coupling to the cavity mode of β=Fp/(1+Fp)>85%, if we 
assume the leaky modes are unaffected by the cavity.  However, the experimentally determined photon collection 
efficiency, which is a more pertinent parameter for applications, is typically ~10%, due the fact that not all the cavity 
mode can be coupled into an experiment and scattering of the mode by the rough pillar edges.  We can expect that the 
photon collection efficiency will increase with improvements to the processing technology or new designs of 
microcavity.   
Another means of forming a cavity is to etch a series of holes in a suspended slab of semiconductor, so as to form a 
lateral variation in the refractive index which creates a forbidden energy gap for photonic modes in which light cannot 
propagate.42  Photons can then be trapped in a central irregularity in this structure: usually an unetched portion of the 
slab.  Such photonic bandgap defect cavities have been fabricated in Si with Q values approaching 106.4344  High quality 
active cavities have also been demonstrated in GaAs containing InAs quantum dots. 45464748  A radiative lifetime of 86 ps, 
corresponding to a Purcell factor of Fp~12, has been reported.
47  Very recently a lifetime of 60ps was measured for a 
cavity in the strong coupling regeme.48  
If the Q-value is sufficiently large, the system enters the strong coupling regime where the excitation oscillates 
coherently between an exciton in the dot and a photon in the cavity.  The spectral signature of strong coupling, an anti-
crossing between the dot line and the cavity mode, has been observed for quantum dots in pillar microcavities,49 
photonic bandgap defect cavities,50 microdisks51 and microspheres.52  It has been demonstrated for atom cavities that 
strong coupling allows the deterministic generation of single-photons.5354  Single-photon sources in the strong coupling 
regime can be expected to have very high extraction efficiencies and be time-bandwidth limited.55  Encouragingly 
single-photon emission has been reported recently for a dot in a strongly coupled pillar microcavity. 56      
Another interesting recent development is the ability to locate a single quantum dot within the cavity, as this ensures the 
largest possible coupling and removes background emission, as well as other undesirable effects, due to other dots in 
the cavity.  Above we discussed techniques to control the dot position on the growth surface.  The other way is to 
position the cavity around the dot.  One technique combines micro-photoluminescence spectroscopy to locate the dot 
position, with in-situ laser photolithography to pattern markers on the wafer surface.57  An alternative involves growing 
a vertical stack of dots so that their location can be revealed by scanning the wafer surface, 58 as shown in Fig.3.  
Recently this technique has allowed larger coupling energies for a single dot in a photonic bandgap defect cavity.48   
Photon Indistinguishability 
Cavity effects are important for rendering different photons from the source indistinguishable, which is essential for 
many applications in quantum information.  When two identical photons are incident simultaneously on the opposite 
input ports of a 50/50 beamsplitter, they will always exit via the same output port, 59 as shown schematically in Fig.4a. 
This occurs because of a destructive interference in the probability amplitude of the final state in which one photon exits 
through each output port.  The amplitude of the case where both photons are reflected exactly cancels with that where 
both are transmitted, due to the π/2 phase change upon reflection, provided the two photons are entirely identical.   
Two-photon interference of two single-photons emitted successively from a quantum dot in a weakly-coupled pillar 
microcavity was first reported by the Stanford group.60 Fig. 4b shows a schematic of their experiment.  Notice the 
reduction of the co-incidence count rate measured between detectors in either output port, when the two photons are 
injected simultaneously (Fig.4c).  The dip does not extend completely to zero, indicating that the two photons sometime 
exit the beamsplitter in opposite ports.  The measured reduction in co-incidence rate at zero delay of 69%, implies an 
overlap for the single-photon wavepackets of 0.81, after correcting for the imperfect single-photon visibility of the 
interferometer.  Two-photon interference dips of 66% and 75% have been reported by Bennett et al61  and Vauroutsis et 
al. 62   Similar results have been obtained for a single dot in a photonic bandgap defect cavity.63   
This two-photon interference visibility is limited by the finite coherence time of the photons emitted by the quantum 
dot,64 which renders them distinguishable.  The depth of the dip in Fig.4c depends upon the ratio of radiative decay time 
to the coherence time of the dot, ie R=2τdecay/τcoh. When unity, the coherence time is limited by radiative decay and the 
source will display perfect 2-photon interference.  The most successful approach thus far has been to extend τcoh by 
resonant optical excitation of the dot and reduce τdecay using the Purcell effect in a pillar microcavity, to values R~1.5.
the future higher visibilities may be achieved with a larger Purcell enhancement, using a single dot cavity in the strong-
coupling regime or with electrical gating described in the next section.   
A source of indistinguishable single-photons was used by Fattal et al to generate entanglement between post-selected 
pairs. 65 66  This involves simply rotating the polarisation of one of the photons incident on the final beamsplitter in 
Fig.4a by 90o.  By post-selecting the results where the two photons arrive at the beamsplitter at the same time and where 
there is one photon in each output arm (labelled 1 and 2), the measured pairs should correspond to the Bell state 
ψ− =  1/√2 (¦H1 V2 > - ¦V1 H2 >)            Eq.1 
Note that only if the two photons are indistinguishable and thus the entanglement is only in the photon polarisation, are 
the two terms in Eq1 able to interfere. Analysis of the density matrix published by Fattal et al65 reveals a fidelity of the 
post-selected pairs to the state in Eq.1 of 0.69, beyond the classical limit of 0.5.  This source of entangled pairs has an 
importance difference to that based on the biexciton cascade described below.  Post-selection implies that the photons 
are destroyed when this scheme succeeds.  This is a problem for some quantum information applications such as LOQC, 
but could be usefully applied to quantum key distribution.65   
Single-Photon LEDs 
An early proposal for an electrical single-photon source by Kim et al67  was based upon etching a semiconductor 
heterostructure displaying Coulomb blockade. However, the light emission from this etched structure was too weak to 
allow the second-order correlation function to be studied. Recently encouraging progress has been made towards the 
realisation of a single-photon source based on quantising a lateral electrical injection current.6869  However the most 
successful approach so far has been to integrate self-assembled quantum dots into conventional p-i-n doped junctions. 
In the first report of electrically-driven single-photon emission by Yuan et al,70 the electroluminescence of a single dot 
was isolated by forming a micron-diameter emission aperture in the opaque top contact of the p-i-n diode.  Fig.5a shows 
an improved emission aperture single-photon LED after Bennett et al, 71 which incorporates an optical cavity formed 
between a high reflectivity Bragg mirror and the semiconductor/air interface in the aperture. This structure forms a weak 
cavity, which enhances the measured collection efficiency 10-fold compared to devices without a cavity. 72      
Single-photon pulses are generated by exciting the diode with a train of short voltage pulses. The second order 
correlation function g(2)(τ) of either the X or X2  electroluminescence (Fig.5c) shows the suppression of the zero delay 
peak indicative of single-photon emission.71  The finite rate of multi-photon pulses is due mostly to background 
emission from layers other than the dot, which is also seen for non-resonant optical excitation.  Electrical contacts also 
allow the temporal characteristics of the single-photon source to be tailored. By applying a negative bias to the diode 
between the electrical injection pulses, Bennett et al73  reduced the jitter in the photon emission time <100ps. This 
allowed the repetition rate of the single-photon source to be increased to 1.07GHz (Fig.5d) while retaining good single-
photon emission characteristics (Fig.5e).  Electrical gating could provide a technique for producing time-bandwidth-
limited single-photons from quantum dots.   
Another promising approach is to aperture the current flowing through the device.7475  This is achieved by growing a 
thin AlAs layer within the intrinsic region of the p-i-n junction and later exposing the mesa to wet oxidation in a 
furnace, converting the AlAs layer around the outer edge of the mesa to insulating Aluminium oxide.  By careful 
control of the oxidation time, a µm-diameter conducting aperture can be formed within the insulating ring of AlOx.  
Such structures have the advantage of exciting just a single dot within the structure, thereby reducing the amount of 
background emission.  The oxide annulus also confines the optical mode laterally within the structure, potentially 
allowing high photon extraction efficiency.   
Altering the nanostructure or materials that comprise the quantum dot allows considerable control over the emission 
wavelength and other characteristics.  Most of the experimental work done so far has concentrated on small InAs 
quantum dots emitting around 900-950nm, as these have well understood optical properties and can be detected with 
low noise Si single-photon detectors. On the other hand the shallow confinement potentials of this system means they 
emit only at low temperatures.  At shorter wavelengths optically-pumped single-photon emission has been demonstrated 
at ~350nm using GaN/AlGaN,76 500nm using CdSe/ZnSSe77 and 682nm InP/GaInP78 quantum dot.  The former two 
systems have been shown to operate at 200K.   
It is very important for quantum communications to develop sources at longer wavelengths in the fibre optic 
transmission bands at 1.3 and 1.55µm.  This may be achieved using InAs/GaAs heterostructures by depositing more 
InAs to form larger quantum dots. These larger dots offer deeper confinement potentials than those at 900nm and thus 
often display room temperature emission.79 Optically pumped single-photon emission at telecom wavelengths has been 
achieved using a number of techniques to prepare low densities of longer wavelength dots, including a bimodal growth 
mode in MBE to form low densities of large dots,80 ultra-low growth rate MBE81 and MOCVD.82  Recently, the first 
electrically-driven single-photon source at a telecom wavelength has been demonstrated.83   
Generation of Entangled Photons 
By collecting both the X2 and X photons emitted by the biexciton cascade, a single quantum dot may also be used as a 
source of photon pairs.  Polarisation correlation measurements on these pairs discovered that the two photons were 
classically-correlated with the same linear polarisation.848586  This occurs because the cascade can proceed via one of 
two intermediate exciton spin states, as described above and shown in Fig.2a, one of which couples to two H- and the 
other two V-polarised photons.  The emission is thus a statistical mixture of |HX2HX> and |VX2VX>, although exciton 
spin scattering during the cascade (discussed below) ensures there are also some cross-polarised pairs.   
The spin splitting87,88 of the exciton state of the dot distinguishes the H and V polarised pairs and prevents the emission 
of entangled pairs predicted by Benson et al. 89  If this splitting could be removed, the H and V components would 
interfere in appropriately designed experiments.  The emitted 2-photon state should then be written as a superposition of 
HH and VV, which can be recast in either the diagonal (spanned by D, A) or circular (σ+, σ-) polarisation bases, ie  
Φ+ =  1/√2 (¦HX2 HX > + ¦VX2 VX >)   
    =  1/√2 (¦DX2 DX > + ¦ΑX2 ΑX >)   
                 =  1/√2 (¦σ+X2 σ
X > + ¦σ
X >)  Eq.2. 
Equal weighting of the HH and VV terms assumes the source to be unpolarised, as indicated by experimental 
measurements.  
Eq.2 suggests that, for zero exciton spin splitting, the biexciton cascade generates entangled photon pairs, similar to 
those seen for atoms.90  Entanglement of the X or X2 photons was recently observed experimentally for the first time by 
Stevenson, Young and co-workers,9192   using two different schemes to cancel the exciton spin splitting. An alternative 
approach by Akopian et al, 93 using dots with finite exciton splitting, post-selects photons emitted in a narrow spectral 
band where the two polarisation lines overlap.    
The exciton spin splitting depends on the exciton emission energy, tending to zero for InAs dots emitting close to 1.4eV 
and then inverting for higher emission energy. 94 95 These correspond to shallow quantum dots for which the carrier 
wavefunctions extend into the barrier material reducing the electron-hole exchange.  Zero splitting can be achieved by 
either careful control of the growth conditions to achieve dots emitting close to the desired energy, or by annealing 
samples emitting at lower energy.94  The exciton spin splitting may be continuously tuned by applying a magnetic field 
in the plane of the dot.96 It has been observed that the signatures of entanglement then appear only when the exciton 
splitting is close to zero.91  Other promising schemes to tune the exciton splitting are now emerging, including 
application of strain97 and electric field.9899  
Figure 6a plots polarisation correlations reported by Young et al92 for a dot with zero exciton splitting (by control of the 
growth conditions).  Pairs emitted in the same cascade (ie zero delay) shows a very striking positive correlation (co-
polarisation) measuring in either, rectilinear or diagonal bases and anti-correlation (cross-polarisation) when measuring 
in circular basis.  This is exactly the behaviour expected for the entangled state of Eq.2.  In contrast, a dot with finite 
splitting shows polarisation correlation for the rectilinear basis only, with no correlation for diagonal or circular 
measurements, see Figure 6b.  The strong correlations seen for all three bases in Fig.6a could not be produced by any 
classical light source or mixture of classical sources and is proof that the source generates entangled photons. The 
measured92 two-photon density matrix (Fig.6c) projects onto the expected 1/√2 (¦HX2 HX > + ¦VX2 VX >) state with 
fidelity (ie probability) 0.702 ± 0.022, exceeding the classical limit (0.5) by 9 standard deviations.  
Two processes contribute to the ‘wrongly’ correlated pairs which impair the fidelity of the entangled photon source.  
The first of these is due to background emission from layers in the sample other than the dot.  This background 
emission, which is unpolarised and dilutes the entangled photons from the dot, limited the fidelity observed in the first 
report91 of triggered entangled photon pairs from a quantum dot and has been subsequently reduced with better sample 
design.92  The second mechanism, which is an intrinsic feature of the dot, is exciton spin scattering during the biexciton 
cascade.  It is interesting that this process does not seem to depend strongly upon the exciton spin splitting.  It may be 
reduced by suppressing the scattering using resonant excitation or alternatively using cavity effects to reduce the time 
required for the radiative cascade.    
Outlook 
The past several years have seen remarkable progress in quantum light generation using semiconductor devices.  
However, despite considerable progress many challenges still remain.  The structural integrity of cavities must continue 
to improve, thereby enhancing quality factors.  This, combined with the ability to reliably position single dots within the 
cavity, will further enhance photon collection efficiencies and the Rabi energy in the strong coupling regime.  It is also 
important to realise all the benefits of these cavity effects in more practical electrically-driven sources.  Meanwhile 
bandstructure engineering of the quantum dots will allow a wider range of wavelengths to be accessed for both single 
and entangled photon sources, as well as structures that can operate at higher temperatures.  Techniques for fine tuning 
the characteristics of individual emitters will also be important.   
One of the most interesting aspects of semiconductor quantum optics is that we may be able to use quantum dots not 
only as quantum light emitters, but also as the logic and memory elements which are required in quantum information 
processing.  Although LOQC is scalable theoretically, quantum computing with photons would be much easier with a 
useful single-photon non-linearity.  Such non-linearity may be achieved with a quantum dot in a cavity in the strong 
coupling regime.  Encouragingly strong coupling of a single quantum dot with various type of cavity has already been 
observed in the spectral domain.  Eventually it may even be possible to integrate photon emission, logic, memory and 
detection elements into single semiconductor chips to form a photonic integrated circuit for quantum information 
processing.   
The author would like to thank Mark Stevenson, Robert Young, Anthony Bennett, Martin Ward and Andy Hudson for 
their useful comments during the preparation of the manuscript and the UK DTI “Optical Systems for Digital Age”, 
EPSRC and EC Future and Emerging Technologies programmes for supporting research on quantum light sources.   
TextBox : Photon Correlation Measurements 
The photon statistics of light can be studied via the second order correlation function, g(2)(τ), which describes the 
correlation between the intensity of the light field with that after a delay τ and is given by100   
This function can be measured directly using the Hanbury-Brown and Twiss101 interferometer, comprising a 50/50 
beamsplitter and two single-photon detectors, shown in the figure.  For delays much less than the average time between 
detection events (ie for low intensities), the distribution in the delays between clicks in each of the two detectors is 
proportional to g(2)(τ).   
For a continuous light source with random emission times, such as an ideal laser or LED, g(2)(τ)=1.  It shows there is no 
correlation in the emission time of any two photons from the source.  A source for which g(2)(τ=0)>1 is described as 
'bunched' since there is an enhanced probability of two photons being emitted within a short time interval.  Photons 
emitted by quantum light sources are typically 'anti-bunched', (g(2)(τ=0)<1) and tend to be separated in time.   
In communication and computing systems, we are interested in pulsed light sources, for which the emission occurs at 
times defined by an external clock.  In this case g(2)(τ) consists of a series of peaks separated by a clock period. For an 
ideal single-photon source, the peak at zero time delay is absent, g(2)(τ=0)=0;  as the source cannot produce more than 
one photon per excitation period, clearly the two detectors cannot fire simultaneously.   
The figure shows g(2)(τ) recorded for resonant pulsed optical excitation of the X emission of a single quantum dot in a 
pillar microcavity.  Notice the almost complete absence of the peak at zero delay: the definitive signature of a single-
photon source.  The weak peak seen at τ=0 demonstrates that the rate of two-photon emission is 50 times less than that 
of an ideal laser with the same average intensity.  The bunching behaviour observed for the finite delay peaks is 
explained by intermittent trapping of a charge carrier in the dot.102  This trace was taken for quasi-resonant laser 
excitation of the dot which avoids creating carriers in the surrounding semiconductor.  For higher energy laser 
excitation, the suppression in g(2)(0) is typically reduced indicating occasional 2-photon pulses due to emission from the 
layers surrounding the dot, but can be minimised with careful sample design.   
Figure textbox: (a) Schematic of the set-up used for photon correlation measurements, (b) second order correlation 
function of the exciton emission of a single dot in a pillar microcavity.  
Figure Captions 
Figure 1: Self assembled quantum dots (a) Image of a layer of InAs/GaAs self assembled quantum dots recorded in an 
Atomic Force Microscope (AFM).  Each yellow blob corresponds to a dot with typical lateral diameters of 20-30nm and 
a height of 4-8nm.  (b) AFM image23 of a layer of InAs quantum dots whose locations have been seeded by a matrix of 
nanometer sized pits patterned onto the wafer surface.  Under optimal conditions up to 60% of the etch pits contain a 
single dot (Courtesy of P Atkinson & D A Ritchie, Cambridge). (c) Cross-sectional STM image of an InAs dot inside a 
GaAs device (Courtesy of P. Koenraad, Eindhoven).   
Figure 2: Optical spectrum of a quantum dot. (a) Schematic of the biexciton cascade of a quantum dot.  (b) Typical 
photoluminescence spectrum of a single quantum dot showing sharp line emission due to the biexciton X2 and exciton X 
photon emitted by the cascade.   The inset shows the polarisation splitting of the transitions originating from the spin 
splitting of the exciton level. 
Figure 3: SEM images of semiconductor cavities, including pillar microcavities (a)56 and (b), microdisk (c)51 and 
photonic bandgap defect cavities (d)47, (e) and (f).48  (Structures fabricated at Univ Wuerzburg (a), CNRS-LPN (UPR-
20), Marcoussis (b, c, e), Univ Cambridge (d), UCSB/ETHZ Zurich (f))      
Figure 4: Two Photon Interference.  (a) If the two photons are indistinguishable, the two outcomes resulting in one 
photon in either arm interfere destructively. This results in the two photons always exiting the beamsplitter together. (b) 
Schematic of an experiment using two photons emitted successively from a quantum dot, (c) experimental data showing 
suppression of the co-incidence rate in (b) when the delay between input photons is zero due to two-photon 
interference.60  (Courtesy of Y Yamamoto, Stanford Univ.) 
Figure 5: Electrically driven single-photon emission.  (a) Schematic of a single-photon LED. (b) Electroluminescence 
spectra of the device.  Notice the spectra are dominated by the exciton X and biexciton X2 lines, which have linear and 
quadratic dependence on drive current, respectively.  Other weak lines are due to charged excitons. (c) second order 
correlation function recorded for the exciton (i) and biexciton (ii) emission lines, (d) time-resolved electroluminescence 
from a device operate with a 1.07GHz repetition rate, (e) measured (i) and modelled (ii) second order correlation 
function of the biexciton electroluminescence at 1.07GHz.  (adapted from Refs. 71and  73) 
Figure 6: Generation of entangled photons by a quantum dot. (a) Degree of correlation measured for a dot with exciton 
polarisation splitting S=0 µeV in linear (i), diagonal (ii) and circular (iii) polarisation bases as a function of the delay 
between the X and X2 photons (in units of the repetition cycle).  The correlation is defined as the rate of co-polarised 
pairs minus the rate of cross-polarised pairs divided by the total rate.  Notice that the values at finite delay show no 
correlation, as expected for pairs emitted in different laser excitation cycles.  More interesting are the peaks close to 
zero time delay, corresponding to X and X2 photon emitted from the same cascade.  The presence of strong correlations 
for all three types of measurement for the dot with zero exciton splitting can only be explained if the X and X2 
polarisations are entangled. (b) Degree of correlations measured for the dot in (a) subject to in-plane magnetic field so 
as to produce an exciton polarisation splitting of S=25 µeV. Notice that the correlation in diagonal and circular bases 
have vanished, indicating only classical correlations at finite splitting.  (c) Two-photon density matrix of the device 
emission in (a).  The strong off-diagonal terms appear due to entanglement.  (adapted from Ref 92) 
References 
                                                                 
1 Gisin, N., Ribordy, G., Tittel, W. & Zbinden, H. Quantum cryptography. Rev. Mod Physics 74, 145-195 (2001).   
2 Dusek, M., Lutkenhaus, N. & Hendrych, M. Quantum Cryptography. Progress in Optics 49, Edt. E. Wolf (Elsevier 2006).   
3 Waks, E., Inoue, K., Santori, C., Fattal, D., Vucković, J., Solomono, G. S. & Yamamoto, Y. Secure communication: Quantum 
cryptography with a photon turnstile. Nature 420, 762-762 (2002).   
4 Bouwmeester, D., Pan, J. W., Mattle, K., Eibl, M., Weinfurter, H. & Zeilinger, A. Experimental quantum teleportation. Nature 390, 
575–579 (1997). 
5 Briegel, H.-J., Dür, W., Cirac, J. I. & Zoller, P. Quantum Repeaters: The Role of Imperfect Local Operations in Quantum 
Communication. Phys. Rev. Lett. 81, 5932-5935 (1998). 
6 Knill, E., Laflamme, R. & Milburn, G. J. A scheme for efficient quantum computation with linear optics. Nature 409, 46–52 (2001).   
7 Kok, P., Munro, W. J., Nemoto, K., Ralph, T. C., Dowling, J. P. & Milburn, G. J. Linear optical quantum computing. Quant-
ph/0512071 (2005). 
8 Giovannetti, V., Lloyd, S. & Maccone, L. Quantum-enhanced measurements: Beating the standard quantum limit. Science 306, 
1330-1336 (2004). 
9  Brendel J, Gisin N, Tittel W and Zbinden H, Pulsed Energy-Time Entangled Twin-Photon Source for Quantum 
Communication   Phys. Rev. Lett. 82, 2594 (1999). 
10 Shih, Y. H. & Alley, C. O. New type of Einstein-Podolsky-Rosen-Bohm experiment using pairs of light quanta produced by 
optical parametric down conversion. Phys. Rev. Lett. 61, 2921–2924 (1988) 
11 Ou, Z. Y. & Mandel, L. Violation of Bell’s inequality and classical probability in a two-photon correlation experiment. Phys. Rev. 
Lett. 61, 50–53 (1988). 
12 Fasel, S., Alibart, O., Tanzilli, S., Baldi, P., Beveratos, A., Gisin, N. & Zbinden, H. High quality asynchronous heralded single-
photon source at telecom wavelength.  New J. Phys. 6, 163 (2004). 
13 Edamatsu, K., Oohata, G., Shimizu, R. & Itoh, T. Generation of ultraviolet entangled photons in a semiconductor, Nature 431, 
167–170 (2004). 
14 Scarani V, Riedmatten H de, Marcikic I, Zbinden H, Gisin N, Eur. Phys. J D 32, 129-138 (2005). 
15 Migdall, A., Branning, D. & Casteletto S. Tailoring single-photon and multiphoton probabilities of a single-photon on-demand 
source. Phys. Rev. A 66, 053805 (2002).   
16 Pittman, T.B., Jacobs, B.C. & Franson, J.D. Single-photons on pseudodemand from stored parametric down-conversion. Phys. Rev. 
A 66, 042303 (2002).   
17 Kimble, H. J., Dagenais M. & Mandel L. Photon Antibunching in Resonance Fluorescence. Phys. Rev. Lett. 39, 691-695 (1977). 
18 Diedrich F. & Walther H. Nonclassical radiation of a single stored ion. Phys. Rev. Lett. 58, 203-206 (1987). 
19 Michler P. et al, in Single Quantum Dots (Springer, Berlin 2003), p315.   
20 Lounis B. and Orrit M., Single Photon Sources, Rep. Prog. Phys. 68, 1129 (2005).   
21 Bimberg, D., Grundmann, M. & Ledentsov N. N. Quantum Dot Heterostructures (Wiley, Chichester, 1999).   
22 Song, H. Z., Usuki, T., Hirose, S., Takemoto, K., Nakata, Y., Yokoyama, N. & Sakuma, Y. Site-controlled photoluminescence at 
telecommunication wavelength from InAs/InP quantum dots. Appl. Phys. Lett. 86, 113118 (2005).   
23 Atkinson P. et al, Site control of InAs quantum dots using ex-situ electron-beam lithographic patterning of GaAs substrates, Jpn. J. 
Appl. Phys., 45, 2519-2521 (2006). 
24 Landin, L., Miller, M. S., Pistol, M.-E., Pryor, C. E. & Samuelson, L. Optical studies of individual InAs quantum dots in GaAs: 
Few-particle effects. Science 280, 262-264 (1998).   
25 Gammon, D., Snow, E. S., Shanabrook, B. V., Katzer, D. S. & Park, D. Fine structure splitting in the optical spectra of single 
GaAs quantum dots. Phys. Rev. Lett. 76, 3005 (1996).   
26 Kulakovskii, V. D., Bacher, G., Weigand, R., Kümmell, T., Forchel, A., Borovitskaya, E., Leonardi, K. & Hommel, D. Fine 
structure of biexciton emission in symmetric and asymmetric CdSe/ZnSe single quantum dots. Phys. Rev. Lett. 82, 1780-1783 (1999).   
27 Michler, P., Kiraz, A., Becher, C., Schoenfeld, W. V., Petroff, P. M., Zhang, L., Hu, E. & Imamoglu, A. A quantum dot single-
photon turnstile device. Science 290, 2282-2285 (2000). 
28 Santori, C., Pelton, M., Solomon, G., Dale, Y. & Yamamoto, Y. Triggered single-photons from a quantum dot. Phys. Rev. Lett  86, 
1502-1505 (2001). 
29 Zwiller, V., Blom, H., Jonsson, P., Panev, N., Jeppesen, S., Tsegaye, T., Goobar, E., Pistol, M.-E., Samuelson, L. & Björk, G. 
Single quantum dots emit single-photons at a time: Antibunching experiments. Appl. Phys. Lett. 78, 2476 (2001). 
30 Thompson, R. M., Stevenson, R. M., Shields, A. J., Farrer, I., Lobo, C. J., Ritchie, D. A., Leadbeater, M. L. & Pepper, M. Single-
photon emission from exciton complexes in individual quantum dots. Phys. Rev. B 64, 201302 (2001)..   
31 Moreau, E., Robert, I., Manin, L., Thierry-Mieg, V., Gérard, J. M. & Abram, I, Quantum cascade of photons in semiconductor 
quantum dots. Phys. Rev. Lett. 87, 183601 (2001).   
32 Regelman, D. V., Mizrahi, U., Gershoni, D., Enhrenfreund, E., Schoenfeld, W. V. & Petroff, P. M. Semiconductor quantum dot: A 
quantum light source of multicolor photons with tunable statistics. Phys. Rev. Lett. 87, 257401 (2001). 
33 Kiraz, A., Falth, S., Becher, C., Gayral, B., Schoenfeld, W. V., Petroff, P. M., Zhang, L., Hu, E. & Imamoglu, A. Photon 
correlation spectroscopy of a single quantum dot. Phys. Rev. B 65, 161303 (2002).  
34 Shields, A. J., Stevenson, R. M., Thompson, R., Yuan, Z. & Kardynal, B. Nano-Physics and Bio-Electronics (Elsevier, Amsterdam 
2002).  
35 Vahala, K. J. Optical microcavities. Nature 424, 839–846 (2003). 
36 Barnes, W. L., Björk, G., Gérard, J. M., Jonsson, P., Wasey, J. A. E., Worthing, P. T. & Zwiller, V. Solid-state single-photon 
sources: light collection strategies. Euro. Phys. Journal D 18, 197 (2002). 
37 Purcell E., Phys. Rev. 69, 681 (1946).   
                                                                                                                                                                                                                     
38 Moreau, E., Robert, I., Gérard, J. M., Abram, I., Manin, L. &  Thierry-Mieg, V. Single-mode solid-state single-photon source 
based on isolated quantum dots in pillar microcavities. Appl. Phys. Lett. 79, 2865 (2001) 
39 Pelton, M., Santori, C., Vucković, J., Zhang, B., Solomon, G. S., Plant, J. & Yamamoto, Y. Efficient source of single-photons: A 
single quantum dot in a micropost microcavity. Phys. Rev. Lett. 89 233602 (2002). 
40 Vucković, J., Fattal, D., Santori, C., Solomon, G. S., Yamamoto, Y., Enhanced single-photon emission from a quantum dot in a 
micropost microcavity, Appl. Phys. Lett. 82, 3596 (2003). 
41 Bennett, A. J., Unitt D., Atkinson P., Ritchie D. A. & Shields A. J., High performance single-photon sources from photo-
lithographically defined pillar microcavities. Opt. Express 13, 50 (2005).   
42 Yablonovitch, E., Inhibited spontaneous emission in solid state physics and electronics. Phys. Rev. Lett. 58, 2059-2062 (1987). 
43 Song B-S, Noda S, Asano T and Akahane, Y., Ultra-high-Q photonic double heterostructure nanocavity. Nature Materials 4, 207-
210 (2005). 
44 Notomi M et al, Appl. Phys. Lett. 88, 041112 (2006).   
45 Kress, A., Hofbauer, F., Reinelt, N., Kaniber, M., Krenner, H. J., Meyer, R., Böhm, G. & Finley, J. J. Manipulation of the 
spontaneous emission dynamics of quantum dots in two-dimensional photonic crystals. Phys. Rev. B 71, 241304 (2005). 
46 Englund, D., Fattal, D., Waks, E., Solomon, G., Zhang, B., Nakaoka, T., Arakawa, Y., Yamamoto, Y. & Vucković, J. Controlling 
the spontaneous emission rate of single quantum dots in a two-dimensional photonic crystal. Phys. Rev. Lett. 95, 013904 (2005). 
47 Gevaux, D. G., Bennett, A. J., Stevenson, R. M., Shields, A. J., Atkinson, P., Griffiths, J., Anderson, D., Jones, G. A. C. & Ritchie, 
D. A. Enhancement and suppression of spontaneous emission by temperature tuning InAs quantum dots to photonic crystal cavities. 
Appl. Phys. Lett. 88, 131101 (2006).   
48 Hennessy et al, quant-ph/0610034; to be published in Nature on 22 Feb doi:10.1038/nature05586 
49 Reithmaier, J. P., Sek, G., Löffler, A., Hofmann, C., Kuhn, S., Reitzenstein, S., Keldysh, L. V., Kulakovskii, V. D., Reinecke, T. L. 
& Forchel, A. Strong coupling in a single quantum dot−semiconductor microcavity system. Nature 432, 197-200 (2004). 
50 Yoshie, T., Scherer, A., Hendrickson, J., Khitrova, G., Gibbs, H. M., Rupper, G, Ell, C., Shchekin, O. B. & Deppeet, D. G. 
Vacuum Rabi splitting with a single quantum dot in a photonic crystal nanocavity. Nature 432, 200-203 (2004). 
51 Peter, E., Senellart, P., Martrou, D., Lemaître, A., Hours, J., Gérard, J. M. & Bloch, J. Exciton-photon strong-coupling regime for a 
single quantum dot embedded in a microcavity. Phys. Rev. Lett. 95, 067401 (2005).   
52 LeThomas N., Woggon U., Schöps O., Artemyev M.V., Kazes M., Banin U., Nano Lett. 6, 557 (2006). 
53 Kuhn A., Hennrich M., Rempe G., Determinsitic single photon source for distributed quantum networking, Phys. Rev. Lett. 89, 
067901 (202) 
54 McKeever J., Boca A., Boozer A.D., Miller R., Buck J.R., Kuzmich A., Kimble H.J., Determinsitic generation of single photons 
from one atom trapped in a cavity, Science 303, 1992 (2004). 
55 Cui G and Raymer M.G., Quantum efficiency of single photon sources in cavity-QED strong-coupling regime, Optics Express 13, 
9660 (2005).   
56  Press, D., Goetzinger, S., Reitzenstein, S., Hofmann, C., Loeffler, A., Kamp, M., Forchel, A. & Yamamoto, Y. Photon 
antibunching from a single quantum dot-microcavity system in the strong coupling regime. Quant-ph/0609193 (2006).   
57 Lee K. H. et al, Registration of single quantum dots using cryogenic laser photolithography, Appl. Phys. Lett. 88, 193106 (2006) 
58 Badolato A. et al, Deterministic Coupling of single quantum dots to single nanocavity modes, Science 308, 1158 (2005).   
59 Hong, C. K., Ou, Z. Y. & Mandel, L. Measurement of subpicosecond time intervals between two photons by interference. Phys. 
Rev. Lett. 59, 2044-2046 (1987).  
60 Santori, C., Fattal, D., Vucković, J., Solomon, G. S. & Yamamoto, Y. Indistinguishable photons from a single-photon device. 
Nature 419, 594-597 (2002). 
61 Bennett, A. J., Unitt, D., Atkinson, P., Ritchie, D. A. & Shields, A. J. Influence of exciton dynamics on the interference of two 
photons from a microcavity single-photon source. Opt. Express 13, 7772-7778 (2005).   
62  Varoutsis, S., Laurent, S., Kramper, P., Lemaître, A., Sagnes, I., Robert-Philip, I. & Abram, I. Restoration of photon 
indistinguishability in the emission of a semiconductor quantum dot. Phys. Rev. B 72, 041303 (2005). 
63 Laurent, S., Varoutsis, S., Le Gratier, L., Lemaître, A., Sagnes, I., Raineri, F., Levenson, A., Robert-Philip, I. & Abram, I. 
Indistinguishable single photons from a single quantum dot in two-dimensional photonic crystal cavity. Appl. Phys. Lett. 87, 163107 
(2005). 
64 Kammerer. C., Cassabois, G., Voisin, C., Perrin, M., Delalande, C., Roussignol, Ph. & Gérard, J. M. Interferometric correlation 
spectroscopy in single quantum dots. Appl. Phys. Lett. 81, 2737-2739 (2002).. 
65 Fattal, D., Diamanti, E., Inoue, K. & Yamamoto, Y. Quantum teleportation with a quantum dot single-photon source. Phys. Rev. 
Lett. 92, 037904 (2004). 
66 Fattal, D., Inoue, K., Vucković, J., Santori, C., Solomon, G. S. & Yamamoto, Y. Entanglement formation and violation of Bell’s 
inequality with a semiconductor single-photon source. Phys. Rev. Lett. 92, 037903 (2004). 
67 Imamoglu, A. & Yamamoto, Y. Turnstile device for heralded single photons: Coulomb blockade of electron and hole tunneling in 
quantum confined p-i-n heterojunctions. Phys. Rev. Lett. 72, 210 (1994). 
68 Cecchini, M., De Simoni, G., Piazza, V., Beltram, Berre, H. E. & Ritchie, D. A. Surface acoustic wave-driven planar light-emitting 
device. Appl. Phys. Lett. 85, 3020-3022 (2005). 
69 Gell, J. R. et al, Surface acoustic wave driven luminescence from a lateral p-n junction.  Appl. Phys. Lett. accepted (2006). 
70Yuan, Z., Kardynal, B. E., Stevenson, R. M., Shields, A. J., Lobo, C. J., Cooper, K., Beattie, N. S., Ritchie, D. A. & Pepper, M. 
Electrically driven single-photon source. Science 295, 102–105 (2002). 
71 Bennett, A. J., Unitt, D. C., See, P., Shields, A. J., Atkinson, P, Cooper, K. & Ritchie, D. A. A microcavity single-photon emitting 
diode. Appl. Phys. Lett. 86, 181102 (2005). 
72 Abram, I., Robert, I. & Kuszelewicz, R. Spontaneous emission control in semiconductor microcavities with metallic or Bragg 
mirrors. IEEE. J. Of Quant. Elect. 34, 71-76 (1998) 
                                                                                                                                                                                                                     
73 Bennett, A. J., Unitt, D. C., See, P., Shields, A. J., Atkinson, P., Cooper, K. & Ritchie, D. A. Electrical control of the uncertainty in 
the time of single-photon emission events. Phys Rev B 72, 033316 (2005) 
74 Ellis, D., Bennett, A. J., Shields, A. J., Atkinson, P. & Ritchie, D. A. Electrically addressing a single self-assembled quantum dot. 
Appl. Phys. Lett. 88, 133509 (2006). 
75 Lochman, A., Stock, E., Schulz, O, Hopfer, F., Bimberg, D., Haisler, V. A., Toropov, A. I., Bakarov, A. K. & Kalagin, A. K. 
Electrically driven single quantum dot polarised single photon emitter. Electron. Lett. 42, 774-775 (2006). 
76 Kako, S., Santori, C., Hoshino, K., Götzinger, S., Yamamoto, Y. & Arakawa, Y. A gallium nitride single-photon source operating 
at 200 K. Nature Materials 5, 887-892 (2006). 
77 Sebald, K., Michler, P., Passow, T., Hommel, D., Bacher, G. & Forchel, A. Single-photon emission of CdSe quantum dots at 
temperatures up to 200 K. Appl. Phys. Lett. 81, 2920-2922 (2002).   
78 Aichele, T., Zwiller, V. & Benson O. Visible single-photon generation from semiconductor quantum dots. New J. Phys. 6, 90 
(2004).   
79 Le Ru, E.C., Fack, J. & Murray, R. Temperature and excitation density dependence of the photoluminescence from annealed 
InAs/GaAs quantum dots. Phys. Rev. B 67, 245318 (2003).  
80 Ward, M. B., Karimov, O. Z., Unitt, D. C., Yuan, Z. L., See, P., Gevaux, D. G., Shields, A. J., Atkinson, P. & Ritchie, D. A. On-
demand single-photon source for 1.3 µm telecom fiber. Appl. Phys. Lett. 86, 201111 (2005). 
81 Zinoni, C., Alloing, B., Monat, C., Zwiller, V., Li, L. H., Fiore, A., Lunghi, L., Gerardino, A., de Riedmatten, H., Zbinden, H. & 
Gisin, N. Time-resolved and antibunching experiments on single quantum dots at 1300nm. Appl. Phys. Lett. 88, 131102 (2006). 
82 Miyazawa T., Takemoto, K., Sakuma, Y., Hirose, S., Usuki, T., Yokoyama, N., Miyazawa, T., Takatsu, M. & Arakawa, Y., 
Single-Photon Generation in the 1.55-um Optical-Fiber Band from an InAs/InP Quantum Dot, Jpn. J. Appl. Phys., vol 44, no 20, pp. 
L620-622 (2005) 
83 Ward M B, Farrow T, See P, Yuan Z L, Karimov O Z, Bennett A J, Shields A J, Atkinson P, Cooper K, Ritchie D A,, Electrically 
driven telecommunication wavelength single-photon source  Appl. Phys. Lett. 90, 063512 (2007) 
84 Stevenson, R. M., Thompson, R. M., Shields, A. J., Farrer, I., Kardynal, B. E., Ritchie, D. A. & Pepper, M. Quantum dots as a 
photon source for passive quantum key encoding. Phys. Rev. B 66, 081302 (2002). 
85 Santori, C., Fattal, D., Pelton, M., Solomon, G. S. & Yamamoto, Y. Polarization-correlated photon pairs from a single quantum dot. 
Phys. Rev. B 66, 045308 (2002). 
86 Ulrich, S. M., Strauf, S., Michler, P., Bacher, G. & Forchel, A. Triggered polarization-correlated photon pairs from a single CdSe 
quantum dot. Appl. Phys. Lett. 83, 1848–1850 (2003). 
87 van Kesteren, H. W., Cosman, E. C., van der Poel, W. A. J. A. & Foxon C. T. Fine structure of excitons in type-II GaAs/AlAs 
quantum wells. Phys. Rev. B 41, 5283–5292 (1990) 
88 Blackwood, E., Snelling, M. J., Harley, R. T., Andrews, S. R. & Foxon C. T. B. Exchange interaction of excitons in GaAs 
Heterostructures. Phys. Rev. B 50, 14246–14254 (1994) 
89 Benson, O., Santori, C., Pelton, M. & Yamamoto, Y. Regulated and entangled photons from a single quantum dot. Phys. Rev. Lett. 
84, 2513-2516 (2000). 
90 Aspect, A., Grangier, P. & Roger, G. Experimental tests of realistic local theories via Bell’s theorem. Phys. Rev. Lett. 47, 460-463 
(1981). 
91 Stevenson, R. M., Young, R. J., Atkinson, P., Cooper, K., Ritchie, D. A. & Shields, A. J. A semiconductor source of triggered 
entangled photon pairs. Nature 439, 179 (2006). 
92 Young, R. J., Stevenson, R. M., Atkinson, P., Cooper, K., Ritchie, D. A. & Shields, A. J. Improved fidelity of triggered entangled 
photons from single quantum dots. New J. Phys. 8, 29 (2006).   
93 Akopian, N., Lindner, N. H., Poem, E., Berlatzky, Y., Avron, J. & Gershoni, D. Entangled photon pairs from semiconductor 
quantum dots. Phys. Rev. Lett. 96, 130501 (2006). 
94 Young, R. J., Stevenson, R. M., Shields, A. J., Atkinson, P., Cooper, K., Ritchie, D. A., Groom, K. M., Tartakovskii, A. I. & 
Skolnick, M. S. Inversion of exciton level splitting in quantum dots. Phys. Rev. B 72, 113305 (2005). 
95 Seguin R., Schliwa A., Rodt S., Poetschke K., Pojl U.W. Bimberg D., Size-Dependent Fine-Structure Splitting in Self-Organized 
InAs/GaAs Quantum Dots, Phys. Rev. Lett. 95, 257402 (2005).   
96 Stevenson, R. M., Young, R. J., See, P., Gevaux, D. G., Cooper, K., Atkinson, P., Farrer, I., Ritchie, D. A. & Shields, A. J. 
Magnetic-field-induced reduction of the exciton polarisation splitting in InAs quantum dots. Phys. Rev. B 73, 033306 (2006). 
97 Seidl, S., Kroner, M., Högele, A. & Karrai, K. Effect of uniaxial stress on excitons in a self-assembled quantum dot. Appl. Phys. 
Lett. 88, 203113 (2006). 
98 Geradot, B. D., Seidl, S., Dalgarno, P. A., Warburton, R. J., Granados, D., Garcia, J. M., Kowalik, K., Krebs, O., Karrai, K., 
Badolato, A. & Petroff, P. M. Manipulating exciton fine-structure in quantum dot with a lateral electric field. Cond-mat/0608711 
(2006). 
99 Kowalik, K., Lemaître, A., Laurent, S., Senellart, P., Voisin, P. & Gaj, J. A. Influence of an in-plane electric field on exciton fine 
structure in InAs-GaAs self-assembled quantum dots. Appl. Phys. Lett. 86, 041907 (2005). 
100 Walls, D. F. & Milburn, G. J. Quantum Optics (Springer, Berlin, 1994). 
101 Hanbury Brown, R. & Twiss, R. Q. A New Type of Interferometer for Use in Radio Astronomy. Phil. Mag. 45, 663 (1954). 
102 Santori, C., Fattal, D., Vucković, J, Solomon, G. S., Waks, E. & Yamamoto, Y. Submicrosecond correlations in 
photoluminescence from InAs quantum dots. Phys. Rev. B 69, 205324 (2004). 
500 nm 500 nm
Fig. 1 
1375 1380 1385
Detection polarisation:
Vertical
Horizontal
Photon Energy (meV)
1378.0 1378.5 1380.0 1380.5
(b)(a)
Fig. 2 
ground state
500 nm
500nm
(c)(a) (b)
(e) (f)
Fig. 3 
(a) (b)
Fig. 4
substrate/buffer
n-ohmic
contact
InAs QD
insulator
Al p-ohmic
contact
emission
n+ Bragg mirror
cavity layer
contact
metal
p+ GaAs
Semicon/air 
interface
905 910 915
X X-X
x100 0.11µA
12.0µA 
wavelength (nm)
95.1µ A
X+(b)
-40 -20 0 20 40
 Time (ns)
-40 -20 0 20 40
delay (ns)
(ii) X
(c) (i)
-10 -5 5 10
delay (ns)
(i) calculated
(ii) measured
time (ns)
Fig. 5 
-0.05
(c) Real Part Imaginary Part
Errors 
(magnitude)
S = 0µeV
delay period (/12.5ns)
(a)(i)
(ii) 
(iii)
-15 0 15
S = 25µeV
(b)(i)
(iii)
-15 0 15
Fig. 6 
detector
beamsplitter
detector
device 
emission
-40 -20 0 20 40
delay, τ [ns]
(a) (b)
Fig. textbox
ABSTRACT
  Lasers and LEDs display a statistical distribution in the number of photons
emitted in a given time interval. New applications exploiting the quantum
properties of light require sources for which either individual photons, or
pairs, are generated in a regulated stream. Here we review recent research on
single-photon sources based on the emission of a single semiconductor quantum
dot. In just a few years remarkable progress has been made in generating
indistinguishable single-photons and entangled photon pairs using such
structures. It suggests it may be possible to realise compact, robust, LED-like
semiconductor devices for quantum light generation.

<|endoftext|><|startoftext|>
Introduction
There are three reasons to study FSI in B decays: to predict (or explain)
the pattern of branching ratios, to study strong interactions, and to forsee in
what decays direct CPV will be large. In view of this necessity a model for
FSI in B decays to two light mesons is suggested and explored in the present
paper.
The probabilities of three B → ππ and three B → ρρ decays are measured
now with good accuracy. The C-averaged branching ratios of these decays
are presented in Table 1 [1]. Let us look at the ratio of the charge averaged
Bd decay probabilities to the charged and neutral mesons:
Br(Bd → ρ+ρ−)
Br(Bd → ρ0ρ0)
≈ 20 , Rπ ≡
Br(Bd → π+π−)
Br(Bd → π0π0)
≈ 4 . (1)
∗kaidalov@itep.ru
†vysotsky@itep.ru
http://arxiv.org/abs/0704.0404v1
Table 1
Mode Br(10−6) Mode Br(10−6)
Bd → π+π− 5.2± 0.2 Bd → ρ+ρ− 23.1± 3.3
Bd → π0π0 1.3± 0.2 Bd → ρ0ρ0 1.16± 0.46
Bu → π+π0 5.7± 0.4 Bu → ρ+ρ0 18.2± 3.0
C-averaged branching ratios of B → ππ and B → ρρ decays.
The large difference of Rρ and Rπ is due to the difference of FSI phases
in B → ρρ and B → ππ decays (see below). In Section 2 we will determine
the differences of FSI phases of tree amplitudes which describe B → ρρ and
B → ππ decays into the states with isospins zero and two from the data
presented in Table 1. As a next step we will suggest a mechanism which
produces such phases. Once this mechanism is defined it becomes possible to
calculate FSI phases of decay amplitudes into states with a definite isospin
(not only their differences). A central question is: what intermediate states
produce FSI phases in B-meson decays into two light mesons. In the weak
decay b → uū(dd̄)d in the rest frame of a heavy quark (which is B-meson
rest frame as well) three fast light quarks are produced. Their energies are
of the order of MB/3 and momenta are more or less isotropically oriented.
The energy of the fourth (spectator) quark is of the order of ΛQCD. This four
quark state transforms mainly into multi pi-meson final state with the average
pion multiplicity about 9 (this number follows from the experimentally known
charged particles multiplicity in e+e− annihilation at Ecm = 3GeV multiplied
by 1.5 ∗ 1.5 in order to take neutral pions and third quark jet into account).
The total branching ratio of such decays is about 10−2. However such meson
state does not transform into the state composed from two light mesons
moving into opposite directions with momenta MB/2. What meson state
does transform into two light mesons can be understood from the inverse
reaction of two light meson scattering at the center of mass energy equal
to the mass of B-meson. The produced hadronic state consists of two jets
of particles moving in opposite directions. Each jet should originate from
a quark-antiquark pair produced in the weak decay of b-quark. The square
of invariant mass of a jet which contains spectator quark does not exceed
MBΛQCD and is much smaller than M
B. The energy of this jet is determined
by that of a companion quark and is about MB/2. That is why the square
of invariant mass of the second jet also does not exceed MBΛQCD. So for
B-decays the mass of a hadron cluster which transforms into light meson
in the final state should not exceed 1.5 GeV. Following these arguments in
the calculation of the imaginary parts of the decay amplitudes we will take
into account only two (relatively light) particle intermediate states for which
branching ratios of B-meson are maximal.
In Section 3 we will calculate FSI phases of tree amplitudes describing
B → ππ decays taking into account ρρ, ππ and πa1 intermediate states which
by t(u)-channel exchanges are converted into ππ. We will find that large
probability of B → ρ+ρ− decay explains about half of FSI phases of B → ππ
decays. Relatively small probability of B → π+π− decay prevents generation
of noticeable FSI phase of B → ρρ amplitudes through B → π+π− → ρρ
chain.
We will demonstrate that the strong interaction phase of the penguin
amplitude is opposite to the result of quark loop calculation, which is very
important for the value of a direct CPV asymmetry Cπ+π− ≡ C+− discussed
in Section 4. Predictions for CPV asymmetries C00 and S00 will be presented
in Section 4 as well and the value of the unitarity triangle angle α will be
extracted from the experimental data on CPV asymmetry S+−.
Subject of rare B decays is an object of intensive study nowadays and an
interested reader can find extensive list of references in a recent paper [2].
2 Phenomenology; |δπ0 − δπ2 | and |δ
0 − δ
Let us present B → ππ decay amplitudes in the so-called “t-convention”, in
which the penguin amplitude with the intermediate c-quark multiplied by
ud + VcbV
cd + VtbV
td = 0 is subtracted from the decay amplitudes [3]:
MB̄d→π+π− =
|VubV ∗ud|m2Bfπf+(0)
iδπ2 +
+ e−iγ
iδπ0 +
V ∗tdVtb
eiβPei(δ
+δ̃π0 )
, (2)
MB̄d→π0π0 =
|VubV ∗ud|m2Bfπf+(0)
iδπ2 −
− e−iγ 1√
iδπ0 −
V ∗tdVtb
eiβPei(δ
+δ̃π0 )
, (3)
MB̄u→π−π0 =
|VubV ∗ud|m2Bfπf+(0)
e−iγA2e
, (4)
where Vik are the elements of CKM matrix, γ and β are the unitarity triangle
angles and we factor out the product m2Bfπf+(0) which appears when the
decay amplitudes are calculated in the factorization approximation. A2 and
A0 are the absolute values of the decay amplitudes into the states with I = 2
and 0, generated by operators O1 and O2 (tree amplitudes), while P is the
absolute value of QCD penguin amplitude (generated by operators O3 − O6
of effective nonleptonic Hamiltonian which describes b quark decays into the
states without charm and strange quarks). δπ0 , δ
2 and δ̃
0 are FSI phases of
these three amplitudes, and it is very important for what follows that all of
them are different. It is easy to understand why δπ0 is different from δ
2 : strong
interaction depends on the isospin and is different for I = 0 and I = 2. For
example, there are definitely quark-antiquark resonances with I = 0, while
exotic resonances with I = 2 should be made from at least four quarks and
their existence is questionable. The reason why δπ0 differs from δ̃
0 is more sub-
tle. Let us consider the intermediate state made from two charged ρ-mesons
which contributes to FSI phases: Bd → ρ+ρ− → ππ. ρ+ρ− intermediate state
contribution to FSI phases can be large since Br(Bd → ρ+ρ−) is big. Both
tree and penguin induced amplitudes get FSI phases through this chain. Its
contribution to δπ0 is proportional to
(BrBd → ρ+ρ−)T/(BrBd → π+π−)T ≈
(BrBd → ρ+ρ−)/(BrBd → π+π−) ≈ 2.1, while that to δ̃π0 is proportional to
(BrBd → ρ+ρ−)P/(BrBd → π+π−)P .
How can we determine the penguin contributions to the probabilities
of Bd → ρ+ρ− and Bd → π+π−-decays? The most straightforward way
suggested in literature is to extract them from the probabilities of Bu →
K0∗ρ+ and Bu → K0π+ decays to which tree amplitudes almost do not
contribute [4, 5]1:
Br(Bd → ρ+ρ−)P =
η2 + (1− ρ)2
]2 τBd
Br(K0∗ρ+) ≈ 0.34·10−6 ,
1Contribution of tree amplitudes to these decays comes from the rescattering (Bu →
K+π0)T , K
+π0 → K0π+, and taking into account CKM suppression of the tree am-
plitudes of B → Kπ(K∗ρ) decays relative to the penguin amplitudes we can cautiously
estimate tree contribution as not more than 10% of penguin one .
Br(Bd → π+π−)P =
η2 + (1− ρ)2
]2 τBd
Br(K0π+) ≈ 0.59·10−6 ,
where fρ = 209 MeV and fK∗ = 218 MeV are the vector meson decay
constants, λ = 0.23, η = 0.34 and ρ = 0.20 are the CKM matrix parameters
in Wolfenstein parametrization [6], fK/fπ = 1.2 and the central values of
Br(Bu → K0∗ρ+) = (9.2±1.5)·10−6 and Br(Bu → K0π+) = (23.1±1.0)·10−6
[1] were used. The accuracy of equations (5) and (6) depends on the accuracy
of d ↔ s interchange symmetry (U -spin symmetry) of b → d(s) transition
amplitudes described by QCD penguin, however when the ratio of (5) to (6)
is calculated uncertainty factors partially cancel out and we obtain rather
stable result: instead of being enchanced as in the case of the tree amplitude
intermediate vector mesons contribution into penguin Bd → π+π− amplitude
is suppressed, (δ̃π0 )ρρ ≈ 1/2.8(δπ0 )ρρ. Taking into account that fraction of
longitudinally polarized vector mesons produced in Bu → K0∗ρ+ decays is
about 50% we get additional suppression of (δ̃π0 )ρρ by factor
Finally, phase δπP comes from the imaginary part of the penguin loop
with c-quark propagating in it [8]. In order to calculate δπP let us consider
corresponding quark diagram. The charm penguin contribution is given by
the following expression:
P = −Pc(k2) =
) + i
1− 4m
, (7)
where k is the sum of momenta of two quarks to which gluon radiated from
penguin decays: k = p1 + p2. One of these quarks forms π-meson with the
spectator quark, so neglecting spectator quark momentum in the rest frame
of B-meson we have p1 = (
). The second quark forms another π-meson
with d̄-quark radiated from penguin: p2 = x(
) where 0 < x < 1 is
the fraction of π+ momentum carried by u-quark. Substituting k2 = xm2b
into (7) and integrating it with the asymptotic quark distribution function
in π-meson ϕπ(x) = x(1−x) we obtain the value of δπP which depends on the
ratio 4m2c/m
b . In particular, for mb = 5.3 GeV and mc = 1.9 GeV (which
correspond to the masses of physical states) we obtain δπP ≈ 10o, a small
positive value. A nonperturbative calculation of δπP described in Section 3
demonstrates that the sign of δπP can be negative.
Our next task is to determine the difference of FSI phases δπ0 − δπ2 (the
large value of it is responsible for a relatively small value of Rπ). If we neglect
the penguin contribution, then from (2) - (4) we get the following expression:
cos(δπ0 − δπ2 ) =
B+− − 2B00 + 23
B+− +B00 − 23
, (8)
where Bik’s are the C-averaged branching ratios, while τ0/τ+ ≡ τ(Bd)/τ(Bu) =
0.92. Substituting the central values from Table 1 we get |δπ0 − δπ2 | = 48o.
Penguin contributions to Bik do not interfere with tree ones because α =
π − β − γ is almost equal to π/2. Taking P 2 terms into account with the
help of (6) (subtracting 0.59 and 0.30 from the first and the second lines of
Table 1 numbers describing B → ππ data correspondingly) we get:
|δπ0 − δπ2 | = 37o ± 10o . (9)
The accuracy of this 11o decrease of the absolute value of the phases difference
is determined by the accuracy of (6) and is not high. In recent paper [2] the
global fit of B → ππ and B → πK decay data was made. The tree amplitudes
of B → ππ decays were designated in [2] by T for B → π+π− and by C for
B → π0π0. According to [2] the difference of FSI phases between C and T
equals δC = −58o ± 10o, |C| = 0.37 ± 0.05, |T | = 0.57 ± 0.05 in the units
of 104 eV. The phase shift between the isospin amplitudes is determined by
these quantities:
tan(δ0 − δ2) =
3TC sin(−δC)
2T 2 + TC cos δC − C2
, (10)
and substituting the numbers we obtain:
δ0 − δ2 = 40o ± 7o , (11)
the result very close to (9). However, the same d ↔ s interchange symmetry
was used in [2] when relating B → ππ and B → Kπ decays. Fit [2] was
made in the same “t-convention” which we use (see the statement at the end
of page 3 of the paper [2]: “for simplicity, we will assume ... Ptc = Ptu”),
therefore the obtained results can be directly compared with ours.
Now let us consider B → ρρ decays. According to BABAR and BELLE
results ρ mesons produced in B decays are almost entirely longitudinally
polarized (fL(ρ+ρ−) = 0.98± 0.03[9], fL(ρ+ρ0) = 0.91± 0.4 [10], fL(ρ0ρ0) =
0.86 ± 0.12 [11]). For B decays into the longitudinally polarized ρ-mesons
we can write formulas analogous to (2) - (4) and we can find FSI phases
difference with the help of analog of (8). Substituting the central values of
branching ratios of B → ρρ decays from Table 1 we obtain: |δρ0−δ
2 | = 21o. In
order to subtract the penguin contribution with the help of (5) we should take
into account that in Bu → K0∗ρ+ decays the fraction of the longitudinally
polarized vector mesons equals approximately 50% [12], so we should subtract
0.17 · 10−6 in case of decay to ρ+ρ− and 0.08 · 10−6 for decay into ρ0ρ0. In
this way we obtain:
|δρ0 − δ
2 | = 20o+8
−20o , (12)
and the factor 2 difference between (12) and (9) or (11) is responsible for
the different patterns of B → ρρ and B → ππ decay probabilities. Let us
emphasize that while |δρ0 − δ
2 | being only one standard deviation from zero
can be very small this is not so for |δπ0 − δπ2 |.
3 Calculation of the FSI phases of B → ππ
and B → ρρ decay amplitudes
Among three amplitudes of B → ππ decays (2)–(4) only two are independent.
We will calculate FSI phases of B → π+π0 and B → π+π− amplitudes and
extract from them FSI phases of amplitudes with a definite isospin.
Our task is to take into account the intermediate state contributions
into FSI phases. As it was argued in Introduction we should consider only
two particle intermediate states with positive G-parity to which B-mesons
have relatively large decay probabilities. Alongside with ππ and ρρ there is
only one such state: πa1. So we will consider ρρ intermediate state which
transforms into ππ by π exchange in t-channel, πa1 intermediate state which
transforms into ππ by ρ exchange in t-channel and will take into account the
elastic channel B → ππ → ππ as well. This approach is analogous to the
FSI consideration performed in paper [13]. However in [13] 2 → 2 scattering
amplitudes were considered to be due to elementary particle exchanges in
t-channel. For vector particles exchanges s-channel partial wave amplitudes
behave as sJ−1 ∼ s0 and thus do not decrease with energy (decaying meson
mass). However it is well known that the correct behavior is given by Regge
theory: sαi(0)−1. For ρ-exchange αρ(0) ≈ 1/2 and the amplitude decrease
with energy as 1/
s. This effect is very spectacular for B → DD → ππ
chain with D∗(D∗2) exchange in t-channel: αD∗(0) ≈ −1 and reggeized D∗
meson exchange is damped as s−2 ≈ 10−3 in comparison with elementary
D∗ exchange (see for example [14]). For π-exchange, which gives a dominant
contribution to ρρ → ππ transition (see below), in the small t region the pion
is close to mass shell and its reggeization is not important.
We will use Feynman diagram approach to calculate FSI phases from
the triangle diagram with the low mass intermediate states X and Y (see
Figure 1). Integrating over loop momenta d4k we assume that integrals over
masses of intermediate states X and Y decrease rapidly with increase of these
masses. Then choosing z axis in the direction of momenta of the produced
meson M1 we can transform the integral over k0 and kz into the integral over
the invariant masses of clusters of intermediate particles X and Y
dk0dkz =
dsXdsY (13)
and deform integration contours in such a way that only low mass interme-
diate states contributions are taken into account while the contribution of
heavy states being small is neglected. In this way we get:
M Iππ = M
XY (δπXδπY + iT
XY→ππ) , (14)
where M
XY are the decay matrix elements without FSI interactions and
T J=0XY→ππ is the J = 0 partial wave amplitude of the process XY → ππ (T J =
(SJ − 1)/(2i)) which originates from the integral over d2k⊥.
For real T (14) coincides with the application of the unitarity condition
for the calculation of the imaginary part of M while for the imaginary T the
corrections to the real part of M are generated.
Let us calculate the imaginary parts of B → ππ decay amplitudes which
originate from B → ρρ → ππ chain with the help of unitarity condition 2:
ImM(B → ππ) =
d cos θ
M(ρρ → ππ)M∗(B → ρρ) , (15)
where θ is the angle between ρ and π momenta. For small values of θ or t
π-exchange in t-channel dominates and the calculation of Feynman diagram
2in this section the phases which originate from CKM matrix elements are omitted.
Figure 1: Diagram which describes FSI in the decay of heavy meson MQq
into two light mesons M1 and M2. X and Y are the clusters of particles with
small invariant masses sX , sY ≤ MQΛQCD, k is 4-momentum of a virtual
particle propagating in t-channel.
for ρρ → ππ amplitude with the elementary virtual π-meson exchange can
be trusted, as it was noted above. It was already stressed that ρ-mesons pro-
duced in B-decays are almost entirely longitudinally polarized. That is why
we will take into account only longitudinal polarization for the intermediate
ρ-mesons and amplitudes of B-decays into ππ and ρLρL are simply related
MB+→ρ+ρ0 = −
MB+→π+π0 , MBd→ρ+ρ− = −
MBd→π+π− . (16)
For the amplitude of ρ+ρ0 → π0π+ transition we have:
iM(ρ+ρ0 → π0π+) = −i
g2ρππ
(p1 − k1)2 −m2π
+)(k2ρ
0) , (17)
where p1, k1 and k2 are ρ
+, π0 and π+ momenta. From the width of ρ-meson
we get g2ρππ/16π = 2.85. For the longitudinally polarized ρ-mesons in their
center of mass system we have:
+ = k2ρ
0 = − 1
(t−m2π)(1 +
) +m2ρ
, (18)
where t = (p1 − k1)2. Changing the integration variable in (15) to t with the
help of dt =
(1− 2 m
)d cos θ and introducing formfactor exp(t/µ2) with
3relative negative sign of the amplitudes follows from the expressions for transition
formfactors in the factorization approximation, see for example [15].
the parameter µ2 ∼ 1 GeV 2 we obtain:
ImMB→π+π0 = +
(m2ρ−m
g2ρππdt
16πM2B ∗ 4m2ρ
(t−m2π)(1 +
+ 2m2ρ(1 +
t−m2π
exp(t/µ2)
MB→π+π0 . (19)
For µ2 = 2m2ρ the contributions of the first two terms in square brackets
cancel, while the third term gives:
ImMB→π+π0 = −
g2ρππ
3.1MB→π+π0 , (20)
and from (4) we get:
δπ2 (ρρ) = −4.9o . (21)
Let us note that in the limit MB → ∞ the ratio Br(Bd → ρρ)/Br(Bd → ππ)
grows as M2B, that is why FSI phase δ
2 (ρρ) (and δ
0 (ρρ)) diminishes as 1/MB.
The analogous consideration of ρ+ρ− intermediate state leads to the pos-
itive FSI phase of Bd → π+π− amplitude which is enhanced relatively to
δπ2 (ρρ) according to (16):
δπ+−(ρρ) = +5.7
o , (22)
and for FSI phase of the amplitude with isospin zero in the linear approxi-
mation we get:
δπ0 (ρρ) = δ
+−(ρρ) +
δπ+−(ρρ)− δπ2 (ρρ)
. (23)
We are able to extract the ratio A2/A0 from that of C-averaged Br(Bd →
π+π−), Br(Bd → π0π0) and Br(Bu → π+π0), subtracting penguin contribu-
tion as we did deriving (9):
B̃+− + B̃00
− 1 , (24)
= 0.80± 0.09 , (25)
and, finally:
δπ0 (ρρ) = 15
o , δπ0 (ρρ)− δπ2 (ρρ) = 20o . (26)
In this way we see that B → ρρ → ππ chain generates half of the experi-
mentally observed FSI phase difference of B → ππ tree amplitudes.
It is remarkable that FSI phases generated by B → ππ → ρρ chain are
damped by Br(B → ρ+ρ−, ρ+ρ0)/Br(B → π+π, π+π0) ratios and are a few
degrees:
2(ππ) =
∗ δπ2 (ρρ) = −1.4o , δ
+−(ππ) =
∗ δπ+−(ρρ) = 1.2o ,
(A0/A2)ρρ = 1.1 , δ
0(ππ) = 2.9
o , δ
0(ππ)− δ
2(ππ) ≈ 4o . (27)
Next we will take into account ππ intermediate state. From Regge analy-
sis of ππ elastic scattering we know that good description of the experimental
data is achieved when the exchanges of pomeron, ρ and f trajectories in t-
channel are taken into account [16]. Pomeron exchange dominates in elastic
ππ → ππ scattering at high energies. For αP (0) = 1 the corresponding
amplitude T is purely imaginary and the phases of matrix elements do not
change [3]. However taking into account that pomeron is ”supercritical”,
αP (0) ≈ 1.1, we obtain the phase of the amplitude generated by pomeron
exchange 4 which cancels the phases generated by ρ and f exchanges for
I = 2. For I = 0 the sum of ρ and f exchanges produces the purely imag-
inary amplitude T and the phase of the amplitude M is due to pomeron
”supercriticallity”:
δπ0 (ππ) = 5.0
o , δπ2 (ππ) = 0
o . (28)
In paper [3] the pomeron exchange amplitude was considered as purely
imaginary. As a result though important for branching ratios phase difference
δπ0 (ππ)−δπ2 (ππ) was the same (pomeron contribution being universal cancels
in the difference of phases) it came mainly from δπ2 (ππ) negative value. In
4The amplitude of 2 → 2 process due to supercritical pomeron exchange is T ∼
(s/s0)
αP (t)(1 + exp(−iπαP (t)))/(− sin(παP (t))) = (s/s0)(1+∆)(i + ∆π/2), where in the
last expression t = 0 was substituted and αP (0) = 1 +∆ was used (∆ ≈ 0.1).
this way result for the absolute value of direct CP-asymmetry Cπ+π− was
underestimated, see below.
Finally πa1 intermediate state should be accounted for. Large branching
ratio of Bd → π±a∓1 -decay ( Br(Bd → π±a∓1 ) = (40 ± 4) ∗ 10−6) is partially
compensated by small ρπa1 coupling constant (it is 1/3 of ρππ one). As a
result the contributions of πa1 intermediate state (which transforms into ππ
by ρ-trajectory exchange in t-channel) to FSI phases equal approximately
that part of ππ intermediate state contributions which is due to ρ-trajectory
exchange. Assuming that the sign of the πa1 intermediate state contribution
into phases is the same as that of elastic channel we obtain:
δπ0 (πa1) = 4
o , δπ2 (πa1) = −2o . (29)
Summing the imaginary parts of the amplitudes which follow from (21),
(26), (28) and (29) we finally obtain:
δπ0 = 23
o , δπ2 = −7o , δπ0 − δπ2 = 30o , (30)
and the accuracy of these numbers is not high, at the level of 50%.
The analogous consideration of the real parts of the loop corrections to
B → ππ decay amplitudes leads to the diminishing of the (real) tree am-
plitudes by ≈ 30%, and we can explain the experimentally observed value
δπ0 − δπ2 ≈ 40o in our model while for ρρ final state the analogous difference
is about three times smaller, δ
0 − δ
2 ≈ 15o.
Let us estimate the phase of the penguin amplitude δπP considering charmed
mesons intermediate states: B → D̄D, D̄∗D, D̄D∗, D̄∗D∗ → ππ 5. In Regge
model all these amplitudes are described at high energies by exchanges of
D∗(D∗2)-trajectories. An intercept of these exchange-degenerate trajectories
can be obtained using the method of [17] or from masses of D∗(2007) – 1−
andD∗2(2460) – 2
+ resonances, assuming linearity of these Regge-trajectories.
Both methodes give αD∗(0) = −0.8÷−1 and the slope α′D∗ ≈ 0.5GeV −2.
The amplitude of D+D− → π+π− reaction in the Regge model proposed
in papers [18, 19] can be written in the following form:
TDD̄→ππ(s, t) = −
e−iπα(t)Γ(1− αD∗(t))(s/scd)αD∗(t) , (31)
5These amplitudes are considered as penguin due to the proper combination of CKM
matrix elements.
where Γ(x) is the gamma function.
The t-dependence of Regge-residues is chosen in accord with the dual
models and is tested for light (u,d,s) quarks [18]. According to [19] scd ≈
2.2GeV 2.
Note that the sign of the amplitude is fixed by the unitarity in the t-
channel (close to the D∗-resonance). The constant g20 is determined by the
width of the D∗ → Dπ decay: g20/(16π) = 6.6. Using (14), analog of (15),
(31) and the branching ratio Br(B → DD̄) ≈ 2 · 10−4 [20] we obtain the
imaginary part of P and comparing it with the contribution of P in B →
π+π− decay probability (6) we get δπP ≈ −3.5o6. A smallness of the phase
is due to the low intercept of D∗-trajectory. The sign of δP is negative -
opposite to the positive sign which was obtained in perturbation theory (7).
Since DD̄-decay channel constitutes only ≈ 10% of all two-body charm-
anticharm decays of Bd-meson [20] taking these channels into account we can
easily get
δP ∼ −10o , (32)
which may be very important for the interpretation of the experimental data
on direct CP asymmetry C+− discussed in the next section.
4 CP asymmetries of Bd(B̄d) → ππ decays
The CP asymmetries are given by :
Cππ ≡
1− |λππ|2
1 + |λππ|2
, Sππ ≡
2Im(λππ)
1 + |λππ|2
, λππ ≡ e−2iβ
MB̄→ππ
MB→ππ
, (33)
where ππ is π+π− or π0π0.
From (2) for direct CP asymmetry in Bd(B̄d) → π+π− decays we readily
obtain:
C+− = −
sinα[
2A0 sin(δ0 − δ̃0 − δP ) + A2 sin(δ2 − δ̃0 − δP )]/
cos(δ0 − δ2)−
A0P̃ cosα cos(δ0 − δ̃0 − δP )−
6In integration over cos θ the region θ ≪ 1 dominates. In this region representation
(31) is valid.
− A2P̃√
cosα cos(δ2 − δ̃0 − δP ) + P̃ 2] , (34)
where
V ∗tdVtb
P . (35)
In order to make a numerical estimate we should know the ratios A0/A2
and P/A2. The first one is given by (25) while the second one can be extracted
from the ratio Br(Bu → K0π+)/Br(Bu → π0π+) assuming d ↔ s invariance
of the strong interactions:
Br(Bu → K0π+)
Br(Bu → π0π+)
f 2KP
2|V ∗tsVtb|2
A22|V ∗udVub|2
, (36)
= 0.092(0.009) . (37)
The numerical values of A0 andA2 are given with good accuracy by factor-
ization calculation, while P appears to be 2.5 times larger than factorization
result [3]. In view of this the validity of factor fK in (36) which originates
from factorization calculation of the penguin amplitude is questinable. If
factorization of the penguin amplitudes is not assumed then the ratio fK/fπ
in (36) should be replaced by unity. In this way we get 20% larger value of
P/A2 in (37) and we will take this value of uncertainty as an estimate of the
theoretical accuracy of the determination of P :
= 0.21(0.04) , (38)
Taking into account that unitarity triangle angle α ≈ 90o and angles δ̃0 and
δP are of the order of few degrees from (34) we obtain:
C+− ≈ −0.28[1.1 sin(δ0 − δ̃0 − δP ) + sin(δ2 − δ̃0 − δP )] ≈
≈ −0.56 sin((δ0 + δ2)/2− δ̃0 − δP ) . (39)
In order to determine the lower bound on the value of C+− let us suppose
that δ0 = 37
o, δ2 = 0
o (we keep the difference δ0− δ2 = 37o, as it follows from
the data on B → ππ decay probabilities (9)), and neglect small values of δ̃0
and δP :
C+− > −0.18 . (40)
Concerning experimental number it could well happen that finally it will
be considerably below our bound. In this case the result of nonperturbative
calculation of penguin phase will be confirmed. Substituting in (39) δ0 =
30o, δ2 = −7o and δP from (32) we obtain the following central value:
C+− = −0.21 . (41)
It is instructive to compare the obtained numbers with the value of C+−
which follows from the asymmetry ACP (K
+π−) if d ↔ s symmetry is sup-
posed [21]:
C+− =
ACP (K
Γ(B → K+π−)
Γ(B → π+π−)
sin(β + γ)
sin(γ)
= 1.2(−2)(−0.093± 0.015)19.8
sin 82o
sin 60o
0.87 = −0.24± 0.04 . (42)
Let us note that one factor fπ/fK in the last equation appears from the
matrix element of the tree operator, the second one - from the matrix element
of the penguin operator. If because of nonfactorization of penguin amplitudes
we will omit the factor which appears from the penguin [5], then the numbers
in the right-hand sides of (40, 41) and (42) will become 20% smaller.
The experimental results obtained by Belle [22] and BABAR [23] are
contradictory
CBelle+− = −0.55(0.09) , CBABAR+− = −0.21(0.09), (43)
Belle number being far below (40) and (41).
For direct CP asymmetry in Bd(B̄d) → π0π0 decay from (3) we readily
obtain:
C00 = −
P̃ sinα[A0 sin(δ0 − δ̃0 − δP )−
2A2 sin(δ2 − δ̃0 − δP )]/
A0A2 cos(δ0 − δ2)−
A0P̃ cosα cos(δ0 − δ̃0 − δP ) +
A2P̃ cosα cos(δ2 − δ̃0 − δP ) + P̃ 2] , (44)
C00 ≈ −1.06[0.8 sin(δ0 − δ̃0 − δP )− 1.4 sin(δ2 − δ̃0 − δP )] ≈ −0.6 , (45)
considerably smaller than C+−. This unusually large direct CPV (measured
by |C00|) is intriguing task for future measurements since the present exper-
imental error is too big:
exper
00 = −0.36(0.32) . (46)
Belle and BABAR agree now on the value of another CPV asymmetry
measured in Bd(B̄d) → π+π− decays: Sexper+− = −0.62 ± 0.09 [22, 23]. From
this measurement the value of unitarity triangle angle α can be extracted.
Neglecting the penguin contribution we get:
sin 2αT = S+− , (47)
αT = 109o ± 3o . (48)
Penguin shifts the value of α. The accurate formula looks like:
S+− = [sin 2α(
cos(δ0 − δ2))−
A2P̃√
sinα cos(δ2 − δ̃0 − δP )−
A0P̃ sinα cos(δ0 − δ̃0 − δP )]/
cos(δ0 − δ2)−
A0P̃ cosα cos(δ0 − δ̃0 − δP )−
A2P̃√
cosα cos(δ2 − δ̃0 − δP ) + P̃ 2] , (49)
and since all the phase shifts are not big the values of cosines in (49) are
rather stable relative to their variations. For numerical estimates we take
δ0 = 30
o, δ2 = −7o and neglect δ̃0 and δP . In this way we get:
(α)ππ = 88
o ± 40(exper)± 50(theor) , (50)
where the first error comes from uncertainty in S
exper
+− while the second one
comes from that in the value of penguin amplitude, (38). Relatively large
theoretical uncertainty in the value of P̃ does not prevent to determine α
with good precision.
The relative smallness of penguin contribution to B → ρρ decay am-
plitudes allow us to determine α with better theoretical accuracy from the
experimental measurement of (S+−)ρρ just as it was done in [24]. With the
help of (5) we obtain:
)ρρ = 0.060(0.012) , (51)
where the same 20% uncertainty in extracting penguin amplitude is supposed.
Using the ratio (A0/A2)ρρ determined in (27) from the (49) neglecting strong
phases (which are much smaller than in the case of B → ππ decays) and
taking into account the recent experimental result (S
exper
+− )ρρ = −0.06± 0.18
[1] we obtain:
(α)ρρ = 87
o ± 50(exper)± 10(theor) . (52)
Let us point out that considerably larger theoretical error quoted in [4]
follows from the larger theoretical uncertainty in the value of penguin ampli-
tude assumed in that paper.
Our results for α should be compared with the numbers which follow from
the global fit of unitarity triangle [6, 7]:
αCKMfitter = (99.0+4.0−9.4)
o , αUTfit = (93± 4)o . (53)
We conclude this section with the prediction for the value of CPV asym-
metry S00:
S00 = [sin 2α(
2A0A2
cos(δ0 − δ2)) +
2A2P̃√
sinα cos(δ2 − δ̃0 − δP )−
A0P̃ sinα cos(δ0 − δ̃0 − δP )]/
2A0A2
cos(δ0 − δ2)−
A0P̃ cosα cos(δ0 − δ̃0 − δP ) +
2A2P̃√
cosα cos(δ2 − δ̃0 − δP ) + P̃ 2] = 0.70± 0.15 , (54)
a large asymmetry with the sign opposite to that of S+−.
5 Conclusions
FSI appeared to be very important in B → ππ decays.
The description of these interactions presented in the paper allows to
explain the experimentally observed difference of the ratios of decay proba-
bilities to the neutral and charged modes in B → ππ and B → ρρ decays.
Rather large absolute value of direct CP asymmetry C+− (if confirmed
experimentally) will be a manifestation of the negative sign of penguin FSI
phase in accord with nonperturbative calculation and opposite to perturba-
tive result.
We are grateful to L.V.Akopyan for checking formulas, Jose Ocariz for
recommendation to include the result for angle α which follows from CP
asymmetry (S+−)ρρ and M.B.Voloshin for useful discussion.
This work was supported by Russian Agency of Atomic Energy;
A.K. was partly supported by grants RFBR 06-02-17012, RFBR 06-02-
72041-MNTI, INTAS 05-103-7515 and state contract 02.445.11.7424;
M.V. was partly supported by grants RFBR 05-02-17203 and
NSh-5603.2006.2.
References
[1] HFAG, http://www.slac.stanford-edu/xorg/hfag.
[2] C.-W. Chiang, Y.-F. Zhou, JHEP 0612 (2006) 027.
[3] A.B. Kaidalov, M.I. Vysotsky, hep-ph/0603013, accepted in Yad. Fiz.
[4] M. Beneke, M. Gronau, J. Rohrer, M. Spranger, Phys.Lett. B638
(2006) 68.
[5] M. Gronau, J.L. Rosner, Phys. Lett. B595 (2004) 339.
[6] CKM fitter, http://ckmfitter.in2p3.fr.
[7] UTfit, http://utfit.roma1.infn.it.
[8] M. Bander, D. Silverman and A. Soni, Phys. Rev. Lett. 43, (1979) 242;
G.M. Gérard and W.-S. Hou, Phys. Rev. D43, (1991) 2909.
[9] B. Aubert et al., BABAR Collaboration, hep-ex/0607098 (2006).
[10] B. Aubert et al., BABAR Collaboration, Phys. Rev. Lett. 97 (2006)
261801.
[11] B. Aubert et al., BABAR Collaboration, hep-ex/0607097 (2006).
[12] J. Zhong et al. Belle Collaboration, Phys. Rev. Lett. 95 (2005) 141801;
B. Aubert et al., BABAR Collaboration, Phys. Rev. Lett. 97 (2006)
201801.
[13] H-Y. Cheng, C-K. Chua and A.Soni, Phys. Rev. D71 (2005) 014030.
[14] A.Deandrea et al., Int. J. Mod. Phys. (2006) 4425.
[15] R.Aleksan et al., Phys. Lett. B356 (1995) 95.
[16] K.G.Boreskov, A.A.Grigoryan, A.B.Kaidalov, I.I.Levintov, Yad. Fiz.
27, (1978) 813.
[17] A.B.Kaidalov, Zeit. fur Phys. C12, (1982) 63.
[18] P.E.Volkovitsky, A.B.Kaidalov, Sov.J.Nucl.Phys. 35, (1982) 909.
[19] K.G.Boreskov, A.B.Kaidalov, Sov.J.Nucl.Phys. 37, (1983) 100.
[20] Review of Particle Physics, W.-M. Yao et al., Journal of Physics G 33,
(2006) 1.
[21] R.Fleischer, Phys. Lett. B459, (1999) 306.
[22] H.Ishino, Belle, talk at ICHEP06, Moscow (2006).
[23] B.Aubert et al, BABAR Collaboration, hep-ex/0703016 (2007).
[24] M.I.Vysotsky, Yad. Fiz. 69, (2006) 703.
ABSTRACT
  The final state interactions (FSI) model in which soft rescattering of low
mass intermediate states dominates is suggested. It explains why the strong
interaction phases are large in the $B_d\to\pi\pi$ channel and are considerably
smaller in the $B_d\to\rho\rho$ one. Direct CP asymmetries of $B_d\to\pi\pi$
decays which are determined by FSI phases are considered as well.

<|endoftext|><|startoftext|>
Introduction. Semimartingale reflecting Brownian motions (SRBMs)
living in the closures of domains with piecewise smooth boundaries are of
interest in applied probability because of their role as heavy traffic diffusion
approximations for some stochastic networks. The nonsmoothness of the
boundary for such a domain, combined with discontinuities in the oblique
directions of reflection at intersections of smooth boundary surfaces, present
Received May 2006; revised November 2006.
1Supported in part by NSF Grants DMS-03-05272 and DMS-06-04537.
AMS 2000 subject classifications. 60F17, 60J60, 60K25, 90B15, 93E03.
Key words and phrases. Semimartingale reflecting Brownian motion, piecewise smooth
domain, invariance principle, oscillation inequality, Skorokhod problem, stochastic net-
works.
This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Applied Probability,
2007, Vol. 17, No. 2, 741–779. This reprint differs from the original in pagination
and typographic detail.
http://arxiv.org/abs/0704.0405v1
http://www.imstat.org/aap/
http://dx.doi.org/10.1214/105051606000000899
http://www.imstat.org
http://www.ams.org/msc/
http://www.imstat.org
http://www.imstat.org/aap/
http://dx.doi.org/10.1214/105051606000000899
2 W. KANG AND R. J. WILLIAMS
challenges in the development of a rigorous theory of existence, uniqueness
and approximation for such SRBMs.
When the state space is an orthant and the direction of reflection is con-
stant on each boundary face, a necessary and sufficient condition for weak
existence and uniqueness of an SRBM is known [14]. This condition involves
a so-called completely-S condition on the matrix formed by the reflection
directions. An invariance principle for such SRBMs was established in [15]
and used in [16] to justify heavy traffic diffusion approximations for cer-
tain open multiclass queueing networks. Loosely speaking, the invariance
principle shows that, assuming uniqueness in law for the SRBM, a process
satisfying the definition of the SRBM, except for small perturbations in the
defining conditions, is close in distribution to the SRBM.
For more general domains with piecewise smooth boundaries, some con-
ditions for existence and uniqueness of SRBMs are known. In particular, for
convex polyhedrons with a constant direction of reflection on each boundary
face, necessary and sufficient conditions for weak existence and uniqueness of
SRBMs are known for simple convex polyhedrons (where precisely d faces
meet at each vertex in d-dimensions) and sufficient conditions are known
for nonsimple convex polyhedrons, see [4]. For a bounded domain that can
be represented as a finite intersection of domains, each of which has a C1-
boundary and an associated uniformly Lipschitz continuous reflection vector
field, sufficient conditions for strong existence and uniqueness were provided
by Dupuis and Ishii [6]; in fact, these authors study stochastic differential
equations with reflection which include SRBMs. Despite these existence and
uniqueness results, a general invariance principle for SRBMs living in the
closures of domains with piecewise smooth boundaries has not been proved
to date. (We note that for the special case when the directions of reflection
are normal, that is, perpendicular to the boundary, there are a number of
perturbation results for reflecting Brownian motions. Our emphasis here is
on treating a wide range of oblique reflection directions.)
Motivated by its potential for use in approximating heavily loaded stochas-
tic networks that are more general than open multiclass queueing networks,
in this paper, we formulate and prove an invariance principle for SRBMs
living in the closures of domains with piecewise smooth boundaries with
possibly nonconstant directions of reflection on each of the smooth bound-
ary surfaces. An application of the results of this paper to the analysis of
an internet congestion control model can be found in [13]. An outline of the
current paper is as follows.
The definition of an SRBM and assumptions on the domains and direc-
tions of reflection are given in Sections 2 and 3, respectively. Some sufficient
conditions for these assumptions to hold are provided in Section 3. Section
4 is devoted to proving the main result of this paper, namely, the invari-
ance principle. A key element for our proof of this result is an oscillation
INVARIANCE PRINCIPLE FOR SRBMS 3
inequality for solutions of a perturbed Skorokhod problem; this inequality,
which may be of independent interest, is proved in Section 4.1. In Section 5
we give some applications of the invariance principle. We prove weak ex-
istence of SRBMs under the conditions specified in Section 3. We also use
the invariance principle, in conjunction with known uniqueness results for
SRBMs, to give sufficient conditions for validating approximations involving
(i) SRBMs in convex polyhedrons with a constant reflection vector field on
each face of the polyhedron, and (ii) SRBMs in bounded domains with piece-
wise smooth boundaries and possibly nonconstant reflection vector fields on
the boundary surfaces.
Beyond its possible use in justifying SRBM approximations for stochastic
networks, the invariance principle might be used to justify numerical ap-
proximations to SRBMs. A further possible extension of the results stated
here would involve an invariance principle for stochastic differential equa-
tions with reflection. The oscillation inequality for the perturbed Skorokhod
problem and associated criteria for C-tightness described in Sections 4.1
and 4.2 are likely to be useful for this. We have not developed such an ex-
tension here as that would involve introduction of extra assumptions that
would make the result less relevant for potential applications to stochastic
networks. In particular, the approximating processes would involve stochas-
tic integrals driven by a Brownian motion, whereas in stochastic network
applications, the Brownian motion typically only appears in the limit.
1.1. Notation, terminology and preliminaries. Let N denote the set of all
positive integers, that is, N = {1,2, . . .}, R denote the set of real numbers,
which is also denoted by (−∞,∞), R+ denote the nonnegative half-line,
which is also denoted by [0,∞). For x ∈ R, we write |x| for the absolute
value of x, [x] for the largest integer less than or equal to x, x+ for the
positive part of x. For any positive integer d, we let Rd denote d-dimensional
Euclidean space, where any element in Rd is denoted by a column vector.
Let ‖ · ‖ denote the Euclidean norm on Rd, that is, ‖x‖ = (
i=1 x
1/2 for
x ∈Rd, and 〈·, ·〉 denote the inner product on Rd, that is, 〈x, y〉=
i=1 xiyi,
for x, y ∈ Rd. We note that for any x ∈ Rd, ‖x‖ ≤
i=1 |xi|. Let R
+ denote
the positive orthant in Rd, that is, Rd+ = {x ∈ R
d :xi ≥ 0,1 ≤ i ≤ d}. Let
B(S) denote the Borel σ-algebra on S ⊂ Rd, that is, the collection formed
by intersecting all Borel sets in Rd with S. Let dist(x,S) denote the distance
between x ∈Rd and S ⊂Rd, that is, dist(x,S) = inf{‖x−y‖ :y ∈ S}, with the
convention that dist(x,∅) =∞ for x ∈Rd. Let Ur(S) denote the closed set
{x ∈Rd : dist(x,S)≤ r} for any r > 0 and S ⊂Rd, where if S =∅, Ur(S) =∅
for all r > 0. Let Br(x) denote the closed ball {y ∈R
d :‖y− x‖ ≤ r} for any
x ∈Rd and r > 0. For any set S ⊂Rd, we write S for the closure of S, So for
the interior of S and ∂S = S \So. For a finite set S, |S| denotes the number
4 W. KANG AND R. J. WILLIAMS
of elements in S. For any v ∈Rd, v′ denotes the transpose of v. Inequalities
between vectors in Rd should be interpreted componentwise, that is, if u, v ∈
d, then u ≤ (<)v means that ui ≤ (<)vi for each i ∈ {1, . . . , d}. For any
matrix A, let A′ denote the transpose of A. For any function x :R+ → R
x(t−) denotes the left limit of x at t > 0 whenever x has a left limit at t;
unless explicitly stated otherwise, x(0−)≡ 0, where 0 is the zero vector in Rd.
For any function x :R+ → R
d, we let ∆x(t) = x(t)− x(t−) for t ∈ R+ when
x(t−) exists. We let 0 be the constant deterministic function x :R+ → R
such that x(t) = 0 for all t ∈R+.
A domain in Rd is an open connected subset of Rd. For each continuously
differentiable function f defined on some nonempty domain S ⊂Rd, ∇f(x)
is the gradient of f at x ∈ S. For each x ∈ Rd, a neighborhood Vx of x is a
bounded domain in Rd that contains x. For any nonempty domain S ⊂Rd,
we say that the boundary ∂S of S is C1, if for each x ∈ ∂S there exists a
Euclidean coordinate system Cx for R
d centered at x, an rx > 0, and a once
continuously differentiable function ϕx :R
d−1 →R such that ϕx(0) = 0 and
S ∩Brx(x) = {z = (z1, . . . , zd)
′ in Cx : zd >ϕx(z1, . . . , zd−1)} ∩Brx(x).
Then, for x ∈ ∂S, the inward unit normal to ∂S at z ∈ ∂S ∩Brx(x) is given
in the coordinate system Cx by
n(z) =
(1 + ‖∇ϕx(z1, . . . , zd−1)‖
2)1/2
(−∇ϕx(z1, . . . , zd−1)
where ∇ϕx(z1, . . . , zd−1) = (
, . . . ,
∂zd−1
)′(z1, . . . , zd−1). For any nonempty
convex set S ⊂Rd, we call a vector n ∈Rd \{0} an inward unit normal vector
to S at x ∈ ∂S if ‖n‖= 1 and 〈n, y − x〉 ≥ 0 for all y ∈ S. Note that such a
vector need not be unique.
All stochastic processes used in this paper will be assumed to have paths
that are right continuous with finite left limits (abbreviated henceforth as
r.c.l.l.). A process is called continuous if almost surely its sample paths
are continuous. We denote by D([0,∞),Rd) the space of r.c.l.l. functions
from [0,∞) into Rd and we endow this space with the usual Skorokhod
J1-topology (cf. Chapter 3 of [7]). We denote by C([0,∞),R
d) the space
of continuous functions from [0,∞) into Rd. The Borel σ-algebra on either
D([0,∞),Rd) or C([0,∞),Rd) will be denoted by Md. The abbreviation
u.o.c. will stand for uniformly on compacts and will be used to indicate
that a sequence of functions in D([0,∞),Rd) (or C([0,∞),Rd)) is converg-
ing uniformly on compact time intervals to a limit in D([0,∞),Rd) (or
C([0,∞),Rd)). Consider W 1,W 2, . . . ,W , each of which is a d-dimensional
process (possibly defined on different probability spaces). The sequence
{W n}∞n=1 is said to be tight if the probability measures induced by the
W n on the measurable space (D([0,∞),Rd),Md) form a tight sequence,
INVARIANCE PRINCIPLE FOR SRBMS 5
that is, they form a weakly relatively compact sequence in the space of
probability measures on (D([0,∞),Rd),Md). The notation “W n ⇒W” will
mean that, as n → ∞, the sequence of probability measures induced on
(D([0,∞),Rd),Md) by {W n} converges weakly to the probability measure
induced on the same space by W . We shall describe this in words by saying
that W n converges weakly (or in distribution) to W as n → ∞. The se-
quence of processes {W n}∞n=1 is called C-tight if it is tight, and if each weak
limit point, obtained as a weak limit along a subsequence, almost surely has
sample paths in C([0,∞),Rd). The following proposition provides a useful
criterion for checking C-tightness.
Proposition 1.1. Suppose that, for each n ∈N, W n is a d-dimensional
process defined on the probability space (Ωn,Fn, Pn). The sequence {W n}∞n=1
is C-tight if and only if the following two conditions hold:
(i) For each η > 0 and T ≥ 0, there exists a finite constant Mη,T > 0
such that
lim inf
0≤t≤T
‖W n(t)‖ ≤Mη,T
≥ 1− η.(1)
(ii) For each ε > 0, η > 0 and T > 0, there exists λ ∈ (0, T ) such that
lim sup
Pn{wT (W
n, λ)≥ ε} ≤ η,(2)
where for x ∈D([0,∞),Rd),
wT (x,λ) = sup
u,v∈[t,t+λ]
‖x(u)− x(v)‖ : 0≤ t < t+ λ≤ T
Proof. See Proposition VI.3.26 in [12]. �
A d-dimensional process W is said to be locally of bounded variation if
all sample paths of W are of bounded variation on each finite time interval.
For such a process W , we define V(W ) = {V(W )(t), t≥ 0} such that for each
t≥ 0,
V(W )(t) = ‖W (0)‖
+ sup
‖W (ti)−W (ti−1)‖ : 0 = t0 < t1 < · · ·< tl = t, l≥ 1
A triple (Ω,F ,{Ft, t ≥ 0}) will be called a filtered space if Ω is a set,
F is a σ-algebra of subsets of Ω, and {Ft, t≥ 0} is an increasing family of
sub-σ-algebras of F , that is, a filtration. Henceforth, the filtration {Ft, t≥ 0}
will be simply written as {Ft}. If P is a probability measure on (Ω,F), then
6 W. KANG AND R. J. WILLIAMS
(Ω,F ,{Ft}, P ) is called a filtered probability space. A d-dimensional process
X = {X(t), t ≥ 0} defined on (Ω,F , P ) is called {Ft}-adapted if for each
t ≥ 0, X(t) :Ω→ Rd is measurable when Ω is endowed with the σ-algebra
Given a filtered probability space (Ω,F ,{Ft}, P ), a vector µ ∈R
d, a d× d
symmetric, strictly positive definite matrix Γ, and a probability distribution
ν on (Rd,B(Rd)), an {Ft}-Brownian motion with drift vector µ, covariance
matrix Γ, and initial distribution ν, is a d-dimensional {Ft}-adapted process
defined on (Ω,F ,{Ft}, P ) such that the following hold under P :
(a) X is a d-dimensional Brownian motion whose sample paths are almost
surely continuous and that has initial distribution ν,
(b) {Xi(t)−Xi(0)− µit,Ft, t≥ 0} is a martingale for i= 1, . . . , d, and
(c) {(Xi(t)−Xi(0)−µit)(Xj(t)−Xj(0)−µjt)−Γijt,Ft, t≥ 0} is a mar-
tingale for i, j = 1, . . . , d.
In this definition, the filtration {Ft} may be larger than the one generated
by X ; however, for each t≥ 0, under P , the σ-algebra Ft is independent of
the increments of X from t onward. The latter follows from the martingale
properties of X . If ν = δx, the unit mass at x ∈ R
d, we say that X starts
from x.
2. Definition of an SRBM. Let G=
i∈I Gi be a nonempty domain in
d, where I is a nonempty finite index set and for each i ∈ I , Gi is a
nonempty domain in Rd. For simplicity, we assume that I = {1,2, . . . , I}
and then |I|= I. For each i ∈ I , let γi(·) be a vector valued function defined
from Rd into Rd. Fix µ ∈Rd, Γ a d × d symmetric and strictly positive
definite covariance matrix and ν a probability measure on (G,B(G)), where
B(G) denotes the σ-algebra of Borel subsets of the closure G of G.
Definition 2.1 (Semimartingale reflecting Brownian motion). A semi-
martingale reflecting Brownian motion (abbreviated as SRBM) associated
with the data (G,µ,Γ,{γi, i ∈ I}, ν) is an {Ft}-adapted, d-dimensional pro-
cess W defined on some filtered probability space (Ω,F ,{Ft}, P ) such that:
(i) P -a.s., W (t) =X(t) +
(0,t] γ
i(W (s))dYi(s) for all t≥ 0,
(ii) P -a.s., W has continuous paths and W (t) ∈G for all t≥ 0,
(iii) under P , X is a d-dimensional {Ft}-Brownian motion with drift
vector µ, covariance matrix Γ and initial distribution ν,
(iv) for each i ∈ I , Yi is an {Ft}-adapted, one-dimensional process such
that P -a.s.,
(a) Yi(0) = 0,
(b) Yi is continuous and nondecreasing,
INVARIANCE PRINCIPLE FOR SRBMS 7
(c) Yi(t) =
(0,t] 1{W (s)∈∂Gi∩∂G} dYi(s) for all t≥ 0.
We shall often refer to Y = {Yi, i ∈ I} as the “pushing process” associated
with the SRBM W . When ν = δx, we may alternatively say that W is an
SRBM associated with the data (G,µ,Γ,{γi, i ∈ I}) that starts from x. We
will call (W,X,Y ) satisfying Definition 2.1 an extended SRBM associated
with the data (G,µ,Γ,{γi, i ∈ I}, ν).
Loosely speaking, an SRBM behaves like a Brownian motion in the inte-
rior of the domain G and it is confined to G by instantaneous “reflection” (or
“pushing”) at the boundary, where the allowed directions of “reflection” at
x ∈ ∂G are convex combinations of the vectors γi(x) for i such that x ∈ ∂Gi.
Under the assumptions imposed on G and {γi, i ∈ I} in Sections 3.1 and 3.2
below, at each point on the boundary of G there is an allowed direction of
reflection that can be used there which “points into the interior of G.” We
end this section by introducing a related set-valued function I(·) and show
a key property of it.
Definition 2.2. For each x∈Rd, let I(x) = {i ∈ I :x ∈ ∂Gi}.
The set-valued function I(·) has the following property called upper semi-
continuity on ∂G.
Lemma 2.1. For each x ∈ ∂G, there is an open neighborhood Vx of x in
d such that
I(y)⊂ I(x) for all y ∈ Vx.(4)
Proof. We prove this lemma by contradiction. Suppose that the func-
tion I(·) does not satisfy (4). Then there is a point x ∈ ∂G such that there
is no open neighborhood Vx of x such that I(y)⊂ I(x) for all y ∈ Vx. Since
the index set I is finite, there is an index k ∈ I \ I(x) and a sequence of
points {yn} ⊂R
d such that ‖yn−x‖<
and k ∈ I(yn) for each n≥ 1. Hence
yn ∈ ∂Gk for all n≥ 1. Since ∂Gk is closed and yn → x as n→∞, we con-
clude that x ∈ ∂Gk. This implies that k ∈ I(x), which is a contradiction, as
desired. �
3. Assumptions on the domain G and the reflection vector fields {γi}.
3.1. Assumptions on the domain G. We henceforth assume that the
domain G satisfies assumptions (A1)–(A3) below. In the case when G is
bounded, assumptions (A2)–(A3) follow from assumption (A1) (see Lem-
mas A.1 and A.2 in the Appendix for details). If the domain G is a convex
polyhedron satisfying assumption (A1), then assumptions (A2)–(A3) hold
by Lemma A.3 in the Appendix.
8 W. KANG AND R. J. WILLIAMS
(A1) G is a nonempty domain in Rd with representation
Gi,(5)
where for each i ∈ I , Gi is a nonempty domain, Gi 6=R
d, and the boundary
∂Gi of Gi is C
1. For each i ∈ I , we let ni(·) be the unit normal vector field
on ∂Gi that points into Gi.
(A2) For each ε ∈ (0,1) there exists R(ε) > 0 such that for each i ∈ I ,
x ∈ ∂Gi ∩ ∂G and y ∈G satisfying ‖x− y‖<R(ε), we have
〈ni(x), y − x〉 ≥−ε‖x− y‖.(6)
(A3) The function D : [0,∞)→ [0,∞] defined such that D(0) = 0 and
D(r) = sup
J 6=∅
(∂Gj ∩ ∂G)
Ur(∂Gj ∩ ∂G)
for r > 0, satisfies
D(r)→ 0 as r→ 0.(8)
Remark. Assumption (A2) is reminiscent of the uniform exterior cone
condition (cf. [9], page 195). We say that a region G ⊂ Rd satisfies a uni-
form exterior cone condition if for each x0 ∈ ∂G, there is a truncated closed
right circular cone Vx0 , with nonempty interior and vertex x0, satisfying
Vx0 ∩G= {x0}, and the truncated cones Vx0 are all congruent to some fixed
truncated closed right circular cone V . By comparing assumption (A2) with
the uniform exterior cone condition, we see that assumption (A2) implies
the uniform exterior cone condition. On the other hand, under assumption
(A1), assumption (A2) is implied by a family of uniform exterior cone condi-
tions where for each ε ∈ (0,1), the axis of the truncated closed right circular
cone at x ∈ ∂G is along the vector −ni(x) and all of the truncated closed
right circular cones are congruent to a truncated closed right circular cone
whose height and base radius are R(ε) and R(ε)( 1
− 1)1/2 respectively.
Assumption (A2) holds automatically if G is convex. We also note that as-
sumption (A2) is strictly weaker than the uniform exterior sphere condition.
The definition of the uniform exterior sphere condition is similar to that for
the uniform exterior cone condition where a closed ball with x0 on its bound-
ary takes the place of the truncated closed right circular cone Vx0 . It can
be checked that for the domain G = {(x, y) ∈ R2 :y < |x|α} with α ∈ (1,2),
the uniform exterior sphere condition fails to hold, but assumption (A2)
holds. In fact, at the point (0,0) ∈R2, there is no r > 0 and y ∈R2 such that
Br(y)∩ ∂G= {(0,0)}.
INVARIANCE PRINCIPLE FOR SRBMS 9
Remark. For the definition of D(·) in (A3), we adopt the convention
that the supremum over an empty set is zero and dist(x,∅) =∞. Since ∂Gi∩
∂G 6=∅ for at least one i ∈ I , the function D(·) satisfies limr→∞D(r) =∞.
Furthermore, D(r1)≤D(r2) whenever r1, r2 ∈ [0,∞) and r1 ≤ r2. Assump-
tion (A3) requires that for any nonempty subset J ⊂ I , the intersection of
tubular neighborhoods of the boundaries {∂Gj ∩ ∂G : j ∈ J } given by the
j∈J Ur(∂Gj ∩ ∂G) “converges” to the intersection of the boundaries
given by the set
j∈J (∂Gj ∩ ∂G) as r approaches 0. Property (8) need
not always hold. For example, let G1 = {(x, y) ∈ R
2 :y < e−x
2/2, x ∈ R} and
G2 = {(x, y) ∈ R
2 :y > 0, x ∈ R}. Then ∂G1 ∩ ∂G2 =∅. But for each r > 0,
Ur(∂G1)∩Ur(∂G2) 6=∅. Hence D(r) =∞ for each r > 0.
3.2. Assumptions on the reflection vector fields {γi}. We henceforth as-
sume that there are vector fields {γi(·), i ∈ I} satisfying assumptions (A4)–
(A5) below.
(A4) There is a constant L > 0 such that for each i ∈ I , γi(·) is a uni-
formly Lipschitz continuous function from Rd into Rd with Lipschitz con-
stant L and ‖γi(x)‖= 1 for each x ∈Rd.
(A5) There is a constant a ∈ (0,1), and vector valued functions b(·) =
(b1(·), . . . , bI(·)) and c(·) = (c1(·), . . . , cI(·)) from ∂G into R
+ such that for
each x ∈ ∂G,
i∈I(x) bi(x) = 1,
j∈I(x)
i∈I(x)
bi(x)n
i(x), γj(x)
≥ a,(9)
i∈I(x) ci(x) = 1,
j∈I(x)
i∈I(x)
ci(x)γ
i(x), nj(x)
≥ a.(10)
We note here for future use that by (A4), if we set ρ0 =
, then for
any x, y ∈ Rd satisfying ‖x − y‖ < ρ0, we have ‖γ
i(x) − γi(y)‖ < a/4 for
each i ∈ I . So for each 0< ρ < ρ0/4, by (9)–(10) and the normalization of
b(·), c(·), γi(·), nj(·) for i, j ∈ I , we obtain
j∈I(x)
y∈B4ρ(x)
i∈I(x)
bi(x)n
i(x), γj(y)
≥ a/2(11)
j∈I(x)
y∈B4ρ(x)
i∈I(x)
ci(x)γ
i(y), nj(x)
≥ a/2.(12)
10 W. KANG AND R. J. WILLIAMS
The use of B4ρ(x) here is related to the form in which this is used in Section
Remark. Assumption (A4) is equivalent to (3.4) in [6] whenG is bounded.
Property (10) means that, at each point x ∈ ∂G, there is a convex combi-
nation γ(x) =
i∈I(x) ci(x)γ
i(x) of the vectors {γi(x), i ∈ I(x)} that can
be used there such that γ(x) “points into” G. Property (9) is in a sense a
dual condition to property (10), where the roles of γi and ni are reversed
for i ∈ I(x). This property (9) is used in showing the oscillation inequality
in Theorem 4.1 below. Assumption (A5) is an analogue of Assumption 1.1
in [4]. When G is bounded, (10) is similar to condition (3.6) in [6] (we as-
sume some additional uniformity through the lack of dependence of a on
It is straightforward to see using the triangle inequality that the following
condition (A5)′ implies (A5).
(A5)′ There is a ∈ (0,1) and vector valued functions b, c from ∂G into RI+
such that for each x ∈ ∂G,
i∈I(x) bi(x) = 1, and for each i ∈ I(x),
bi(x)〈n
i(x), γi(x)〉 ≥ a+
j∈I(x)\{i}
bj(x)|〈n
j(x), γi(x)〉|,(13)
i∈I(x) ci(x) = 1, and for each i ∈ I(x),
ci(x)〈γ
i(x), ni(x)〉 ≥ a+
j∈I(x)\{i}
cj(x)|〈γ
j(x), ni(x)〉|.(14)
Condition (A5)′(ii) is similar to condition (3.8) in [6], although here we
assume additional uniformity through the lack of dependence of a on x. As
noted in [6], their condition (3.8) can be phrased in terms of a nonsingu-
lar M-matrix requirement [2]. (This is sometimes also called a generalized
Harrison–Reiman type of condition [10].) Since that nonsingular M-matrix
property is invariant under transpose, and this property for the transpose
corresponds to a local form of (A5)′(i), one might conjecture that there is an
equivalence between the existence of a nonnegative vector valued function b
such that (A5)′(i) holds for each x ∈ ∂G and the existence of a nonnegative
vector valued function c such that (A5)′(ii) holds for each x ∈ ∂G. Indeed
we have the following lemma. We have stated the two (equivalent) condi-
tions (i) and (ii) in specifying (A5)′ to preserve a parallel with (A5) and
since both properties can be useful in proofs. Furthermore, in light of the
following lemma, verifying either condition suffices for both to hold.
INVARIANCE PRINCIPLE FOR SRBMS 11
Lemma 3.1. There is a constant a ∈ (0,1) and a vector valued function
b :∂G → RI+ such that (A5)
′(i) holds for each x ∈ ∂G if and only if there
is a constant a ∈ (0,1) and a vector valued function c :∂G→ RI+ such that
(A5)′(ii) holds for each x ∈ ∂G.
Proof. We just prove the “if” part; the “only if” part can be proved
in a similar manner.
We suppose that there is a constant a ∈ (0,1) and a vector valued function
c :∂G → RI+ such that (A5)
′(ii) holds for each x ∈ ∂G. For fixed x ∈ ∂G,
consider the square matrix A(x) whose diagonal entries are given by the
(positive) elements 〈ni(x), γi(x)〉 for i ∈ I(x) and whose off-diagonal entries
are given by−|〈ni(x), γj(x)〉| for i ∈ I(x), j ∈ I(x), j 6= i. Let E be the square
matrix having the same dimensions as A(x) and whose entries are all equal
to one. By the theory of M-matrices (see [2], Chapter 6, especially condition
(M35)), condition (ii) of (A5)
′ implies that A(x)− a
E is a nonsingular M-
matrix, that is, A(x)− a
E has nonnegative diagonal entries and nonpositive
off-diagonal entries and it can be written in the form s(x)I −B(x) where
B(x) is a matrix with nonnegative entries and s(x)> 0 is a constant that is
strictly larger than the spectral radius of B(x).
Since the nonsingular M-matrix property is invariant under transpose (cf.
(G21) in Chapter 6 of [2]), then A
′(x)− a
E is also a nonsingular M-matrix.
Hence, there is a vector b̃(x) = (b̃i(x) : i ∈ I(x)) with nonnegative entries such
that (A′(x)− a
E)b̃(x)> 0 (cf. (I27) in Chapter 6 of [2]). We can extend b̃(x)
to an I-dimensional vector b(x) and normalize it so that
i∈I(x) bi(x) = 1.
Then (A5)′(i) holds with a
in place of a. �
4. Invariance principle. In this section we state and prove an invariance
principle for an SRBM living in the closure of a domain G with piecewise
smooth boundary and having associated reflection fields {γi, i ∈ I}, where
G, {γi, i ∈ I} satisfy assumptions (A1)–(A5) of Section 3. (These assump-
tions hold throughout this section.) We shall first state a preliminary result
called an oscillation inequality (see Theorem 4.1), then we use it to prove
a tightness result (see Theorem 4.2). Finally, we establish the invariance
principle (see Theorem 4.3).
4.1. Oscillation inequality. The following oscillation inequality is the key
to the proof of the tightness result claimed in Theorem 4.2. In this subsec-
tion, for any 0≤ t1 < t2 <∞ and any integer k ≥ 1, D([t1, t2],R
k) denotes
the set of functions w : [t1, t2]→R
k that are right continuous on [t1, t2) and
have finite left limits on (t1, t2]. For w ∈D([t1, t2],R
Osc(w, [t1, t2]) = sup{‖w(t)−w(s)‖ : t1 ≤ s < t≤ t2},(15)
Osc(w, [t1, t2)) = sup{‖w(t)−w(s)‖ : t1 ≤ s < t < t2}.(16)
12 W. KANG AND R. J. WILLIAMS
Note that we do not explicitly indicate the dependence on k in the notation.
Recall the constants a,L from assumptions (A4)–(A5), the functions R(·)
from assumption (A2) and D(·) from (7). Let ρ0 =
Theorem 4.1 (Oscillation inequality). There exists a nondecreasing func-
tion Π: (0,∞)→ (0,∞] satisfying Π(u)→ 0 as u→ 0, such that Π depends
only on the constants I, a and the function D(·), and such that whenever
0 < ρ <min{
R(a/4)
}, 0 < δ <
, 0 ≤ s < t < ∞, w,x ∈ D([s, t],Rd) and
y ∈D([s, t],RI) satisfy:
(i) w(u) ∈Bρ(x0)∩Uδ(G) for all u ∈ [s, t], for some x0 ∈G,
(ii) w(u) = w(s) + x(u) − x(s) +
(s,u] γ
i(w(v))dyi(v) for all u ∈
[s, t],
(iii) for each i ∈ I ,
(a) yi(s)≥ 0,
(b) yi is nondecreasing and ∆yi(u)≤ δ for all u ∈ (s, t],
(c) yi(u) = yi(s) +
(s,u] 1{w(v)∈Uδ(∂Gi∩∂G)} dyi(v) for all u ∈ [s, t],
(iv) D(Π(Osc(x, [s, t]) + δ))<
then we have that the following hold:
Osc(w, [s, t])≤Π(Osc(x, [s, t]) + δ),(17)
Osc(y, [s, t])≤Π(Osc(x, [s, t]) + δ).(18)
Proof. Let
Π0(u) = u for all u > 0.
Define Πm : (0,∞)→ (0,∞], m= 1, . . . , I, inductively such that
Πm(u) = Πm−1(u) + (I+2)u+
(D(Πm−1(u) + (I+2)u) + 2u).
Here the sum of any element of [0,∞) with ∞ is ∞ and D(∞) is defined
to equal ∞. For each m= 0,1, . . . , I, the function Πm is nondecreasing and
depends only on I, a and D(·). For each m= 1, . . . , I and u > 0, Πm−1(u)≤
Πm(u). By assumption (A3), we conclude (using an induction proof) that
Πm(u)→ 0 as u→ 0, for m= 0,1, . . . , I.
Let Π(·) = ΠI(·).
Fix 0 < ρ < min{
R(a/4)
}, 0 < δ <
, 0 ≤ s < t < ∞. Suppose that
w,x ∈D([s, t],Rd) and y ∈D([s, t],RI) satisfy (i)–(iv) in the statement of
Theorem 4.1. For each nonempty interval [t1, t2]⊂ [s, t], let
I[t1,t2] = {i ∈ I :w(u) ∈ Uδ(∂Gi ∩ ∂G) for some u ∈ [t1, t2]},
INVARIANCE PRINCIPLE FOR SRBMS 13
the indices of the boundary surfaces that w(·) comes close to in the time
interval [t1, t2]. For each 0 ≤ m ≤ I, define Tm = {[t1, t2] ⊂ [s, t] : |I[t1,t2]| ≤
m}. Note that under the partial ordering of set inclusion, Tm increases with
m. To prove the theorem, we will prove by induction that for each 0≤m≤ I
and each interval [t1, t2] ∈ Tm, (17)–(18) hold with [t1, t2] in place of [s, t] and
Πm(·) in place of Π(·). The result for m= I yields the theorem.
Suppose that m= 0. Then T0 = {[t1, t2]⊂ [s, t] : |I[t1,t2]|= 0}. Fix an inter-
val [t1, t2] ∈ T0. Since I[t1,t2] =∅ and (iii)(c) holds, the function y does not
increase on the time interval (t1, t2], that is, yi(t2)− yi(t1) = 0 for all i ∈ I .
Then, for t1 ≤ u < v ≤ t2,
w(v)−w(u) = x(v)− x(u).(19)
So in this case,
Osc(w, [t1, t2]) = Osc(x, [t1, t2])≤Osc(x, [t1, t2]) + δ,(20)
Osc(y, [t1, t2]) = 0≤Osc(x, [t1, t2]) + δ.(21)
Thus, (17)–(18) hold with Π0(·) in place of Π(·) and [t1, t2] in place of [s, t]
for each interval [t1, t2] ∈ T0.
For the induction step, let 1 ≤ m ≤ I and suppose that (17)–(18) hold
with Πm−1(·) in place of Π(·) and [t1, t2] in place of [s, t] for each interval
[t1, t2] ∈ Tm−1.
Now fix [t1, t2] ∈ Tm. If |I[t1,t2]| ≤ m − 1, then [t1, t2] ∈ Tm−1 and so by
the induction assumption we have that (17)–(18) hold with [t1, t2] in place
of [s, t] and Πm−1(·) [and hence Πm(·)] in place of Π(·). Thus, it suffices
to consider [t1, t2] ⊂ [s, t] such that |I[t1,t2]| =m. For i /∈ I[t1,t2], by (iii)(c),
yi(t2)− yi(t1) = 0, and so by (ii), for t1 ≤ u < v ≤ t2, we have
w(v)−w(u) = x(v)− x(u) +
i∈I[t1,t2]
(u,v]
γi(w(r))dyi(r).(22)
Let Πm(u) = Πm−1(u) + (I + 2)u for all u > 0, and η = Osc(x, [t1, t2]) + δ.
For any M ∈ (0,∞] and any nonempty set J ⊂ I , let
FMJ = {z ∈R
d : dist(z, ∂Gi ∩ ∂G)<M for all i ∈ J }.
Note that FMJ = ∅ when there is an i ∈ J such that ∂Gi ∩ ∂G = ∅. Since
Πm(·)≤Πm(·)≤Π(·), D(·) and Π(·) are nondecreasing, and Osc(x, [t1, t2])≤
Osc(x, [s, t]), we have by (iv) that
D(Πm(η))≤D(Πm(η))≤D(Π(η))<
.(23)
Note that this implies Πm(η)<∞ since D(∞) =∞.
We now consider two cases.
14 W. KANG AND R. J. WILLIAMS
Case 1. Suppose that w(r) ∈ F
Πm(η)
I[t1,t2]
for all r ∈ [t1, t2].
Fix u, v such that t1 ≤ u < v ≤ t2. Since we have that
w(v) ∈
j∈I[t1,t2]
Πm(η)
(∂Gj ∩ ∂G),
by the definition of D(·) and (23), there is z ∈
j∈I[t1,t2]
(∂Gj ∩ ∂G) such
‖w(v)− z‖ ≤D(Πm(η))<
.(24)
For each r ∈ [t1, t2], by (i) we have that w(r) ∈ Uδ(G), and so there is z
such that
‖w(r)− zr‖ ≤ 2δ.
Hence by (i) and (24) we have
‖zr − z‖ ≤ ‖zr −w(r)‖+ ‖w(r)− x0‖+ ‖x0 −w(v)‖+ ‖w(v)− z‖
≤ 2δ + ρ+ ρ+ ρ/2< 4ρ < R(a/4)
‖w(r)− z‖ ≤ ‖w(r)− x0‖+ ‖x0 −w(v)‖+ ‖w(v)− z‖
≤ ρ+ ρ+ ρ/2< 4ρ.
By (6) and (25) we have
〈nj(z), z − zr〉 ≤
‖z − zr‖ for each j ∈ I(z) and r ∈ [t1, t2].(27)
Note that I(z) ⊃ I[t1,t2]. Recalling the definition of b(·) from assumption
(A5), on dotting the vector
j∈I(z) bj(z)n
j(z) with both sides of (22) and
rearranging, we obtain
i∈I[t1,t2]
(u,v]
j∈I(z)
bj(z)n
j(z), γi(w(r))
dyi(r)
j∈I(z)
bj(z)〈n
j(z),w(v)−w(u)〉(28)
j∈I(z)
bj(z)〈n
j(z), x(v)− x(u)〉.
So by (11), (22), (24)–(28), and the fact that
j∈I(z) bj(z) = 1, bj(z)≥ 0 for
j ∈ I , we have
i∈I[t1,t2]
(yi(v)− yi(u))
INVARIANCE PRINCIPLE FOR SRBMS 15
i∈I[t1,t2]
(u,v]
j∈I(z)
bj(z)n
j(z), γi(w(r))
dyi(r)
j∈I(z)
bj(z)〈n
j(z),w(v)− z〉+
j∈I(z)
bj(z)〈n
j(z), z − zu〉
j∈I(z)
bj(z)〈n
j(z), zu −w(u)〉 −
j∈I(z)
bj(z)〈n
j(z), x(v)− x(u)〉
≤D(Πm(η)) +
‖z − zu‖+ 2δ + ‖x(v)− x(u)‖
≤D(Πm(η)) + 2δ + ‖x(v)− x(u)‖
(‖z −w(v)‖+ ‖w(v)−w(u)‖+ ‖w(u)− zu‖)
≤D(Πm(η)) + 2δ + ‖x(v)− x(u)‖
D(Πm(η)) + ‖x(v)− x(u)‖+
i∈I[t1,t2]
(yi(v)− yi(u)) + 2δ
{D(Πm(η)) + 2δ + ‖x(v)− x(u)‖}+
i∈I[t1,t2]
(yi(v)− yi(u)).
Hence
i∈I[t1,t2]
(yi(v)− yi(u))≤
{D(Πm(η)) + 2δ + ‖x(v)− x(u)‖}
{D(Πm(η)) + 2η}.
On multiplying through by 4
, we obtain
i∈I[t1,t2]
(yi(v)− yi(u))≤
{D(Πm(η)) + 2η} ≤Πm(η).(29)
Hence, by (29) and the fact that for any x ∈Rd, ‖x‖ ≤
i=1 |xi|, we have
Osc(y, [t1, t2])≤Πm(Osc(x, [t1, t2]) + δ),(30)
and by (22), (29) and the definitions of Πm(·) and Πm(·), we have
Osc(w, [t1, t2])≤Osc(x, [t1, t2]) +
{D(Πm(η)) + 2η}
≤Πm(Osc(x, [t1, t2]) + δ),
as desired.
16 W. KANG AND R. J. WILLIAMS
Case 2. Suppose that there is t3 ∈ [t1, t2] such that w(t3) /∈ F
Πm(η)
I[t1,t2]
Define σ = inf{u ∈ [t1, t2] :w(u) /∈ F
Πm(η)
I[t1,t2]
}. Then σ ≤ t2. For each u ∈
[t1, σ), w(u) ∈ F
Πm(η)
I[t1,t2]
and so by a similar analysis to that for Case 1, we
obtain for each v ∈ [t1, σ),
Osc(w, [t1, v])≤ η+
(D(Πm(η)) + 2η)
Osc(y, [t1, v])≤
(D(Πm(η)) + 2η).
By the right continuity of paths we have w(σ) /∈ F
Πm(η)
I[t1,t2]
. Then there is an
i ∈ I[t1,t2] such that dist(w(σ), ∂Gi ∩ ∂G) ≥ Πm(η), and it follows that w
does not reach Uδ(∂Gi ∩ ∂G) during the interval [σ, t2]. To see this, let
τ = inf{u ∈ [σ, t2] : dist(w(u), ∂Gi ∩ ∂G) ≤ δ} with the convention that the
infimum of an empty set is ∞. If τ ≤ t2, then by the right continuity of
w(·) and since Πm(η)> δ, we have τ > σ and dist(w(τ), ∂Gi∩∂G)≤ δ. Also,
since |I[t1,t2]|=m, we have [σ,u] ∈ Tm−1 for each u ∈ [σ, τ). By the induction
assumption and letting u→ τ , we have ‖w(τ−)−w(σ)‖ ≤Πm−1(η). By (ii),
(iii)(b) and since ‖γi(·)‖= 1, we have
‖∆w(τ)‖ ≤ ‖∆x(τ)‖+
∆yi(τ)≤Osc(x, [t1, t2]) + Iδ ≤ Iη.
Then simple manipulations yield
dist(w(σ), ∂Gi ∩ ∂G)≤ ‖w(σ)−w(τ−)‖+ ‖∆w(τ)‖+ dist(w(τ), ∂Gi ∩ ∂G)
≤Πm−1(η) + Iη+ δ
<Πm(η).
This contradicts the fact that dist(w(σ), ∂Gi∩∂G)≥Πm(η), and so confirms
that w does not reach Uδ(∂Gi ∩ ∂G) in [σ, t2]. Thus we must have [σ, t2] ∈
Tm−1. Hence we have by the induction assumption that
Osc(w, [t1, t2])≤ sup
v∈[t1,σ)
Osc(w, [t1, v]) + ‖∆w(σ)‖+Osc(w, [σ, t2])
≤ η +
(D(Πm(η)) + 2η) + Iη+Πm−1(η)
≤Πm(Osc(x, [t1, t2]) + δ)
INVARIANCE PRINCIPLE FOR SRBMS 17
Osc(y, [t1, t2])≤ sup
v∈[t1,σ)
Osc(y, [t1, v]) + ‖∆y(σ)‖+Osc(y, [σ, t2])
(D(Πm(η)) + 2η) + Iη+Πm−1(η)
≤Πm(Osc(x, [t1, t2]) + δ).
On combining all of the cases above, we have
Osc(w, [t1, t2])≤Πm(Osc(x, [t1, t2]) + δ),(31)
Osc(y, [t1, t2])≤Πm(Osc(x, [t1, t2]) + δ).(32)
This completes the induction step. �
Remark. The proof of the above theorem was inspired by the proof of
Lemma 4.3 of [4]. Because of the condition (i) in Theorem 4.1, the oscilla-
tion inequality given here is localized. Similar, but nonlocalized, oscillation
inequalities were proved in [15] when G = Rd+ and in [3] for a sequence of
convex polyhedrons; in these cases, the direction of reflection was constant
on each boundary face.
4.2. C-tightness result. Throughout this subsection and the next, we
suppose that the following assumption holds in addition to (A1)–(A5).
Assumption 4.1. There is a sequence of strictly positive constants {δn}∞n=1
such that for each positive integer n, there are processes W n, W̃ n,Xn, αn
having paths in D([0,∞),Rd) and processes Y n, Ỹ n, βn having paths in
D([0,∞),RI) defined on some probability space (Ωn,Fn, Pn) such that:
(i) Pn-a.s., W n = W̃ n + αn and W̃ n(t) ∈Uδn(G) for all t≥ 0,
(ii) Pn-a.s., W n(t) =Xn(t)+
(0,t] γ
i,n(W n(s−),W n(s))dY ni (s) for
all t≥ 0, where for each i ∈ I , γi,n :Rd ×Rd → Rd is Borel measurable and
‖γi,n(y,x)‖= 1 for all x, y ∈Rd,
(iii) Y n = Ỹ n+βn, where βn is locally of bounded variation and Pn-a.s.,
for each i ∈ I ,
(a) Ỹ ni (0) = 0,
(b) Ỹ ni is nondecreasing and ∆Ỹ
i (t)≤ δ
n for all t > 0,
(c) Ỹ ni (t) =
(0,t] 1{W̃n(s)∈Uδn (∂Gi∩∂G)}
dỸ ni (s),
(iv) δn → 0 as n → ∞, and for each ε > 0, there is ηε > 0 and nε > 0
such that for each i ∈ I , ‖γi,n(y,x)− γi(x)‖< ε whenever ‖x− y‖< ηε and
n≥ nε,
18 W. KANG AND R. J. WILLIAMS
(v) αn → 0 and V(βn)→ 0 in probability, as n→∞,
(vi) {Xn} is C-tight.
Remark. A simple case in which (iv) above holds is where γi,n(y,x)≡
γi(y). In (v), V(βn) is the total variation process for βn (cf. Section 1.1).
The following theorem will play an important role in the proof of the in-
variance principle. It will be used to show that a sequence of processes sat-
isfying suitably perturbed versions of the defining conditions for an SRBM
[cf. (i)–(vi) above] is C-tight.
Theorem 4.2 (C-tightness). Suppose that Assumption 4.1 holds. Define
Zn = (W n,Xn, Y n) for each n. Then the sequence of processes {Zn}∞n=1 is
C-tight.
Remark. Note that C-tightness of {W n}, {Xn} and {Y n} implies C-
tightness of {Zn} (see Chapter VI, Corollary 3.33 of [12] for details).
Proof of Theorem 4.2. References here to (i)–(vi) are to the condi-
tions in Assumption 4.1.
Simple algebraic manipulations yield Pn-a.s.,
W̃ n(t) = X̃n(t) +
(0,t]
γi,n(W n(s−),W n(s))dỸ ni (s)(33)
= X̃n(t) + Ṽ n(t) +
(0,t]
γi(W̃ n(s))dỸ ni (s),(34)
where
X̃n(t) =Xn(t) +
−αn(t) +
(0,t]
γi,n(W n(s−),W n(s))dβni (s)
Ṽ n(t) =
(0,t]
(γi,n(W n(s−),W n(s))− γi(W n(s)))dỸ ni (s)
(0,t]
(γi(W n(s))− γi(W̃ n(s)))dỸ ni (s).
The hypotheses on αn, the total variation process V(βn) of βn, and the
fact that ‖γi,n(y,x)‖ = 1 for all x, y ∈ Rd and each i ∈ I , imply that the
process
−αn(·) +
(0,·]
γi,n(W n(s−),W n(s))dβni (s)
INVARIANCE PRINCIPLE FOR SRBMS 19
converges to 0 in probability as n→∞. Combining this with the fact that
{Xn}∞n=1 is C-tight, we obtain that {X̃
n}∞n=1 is C-tight.
Recall the positive nondecreasing function Π(·) from Theorem 4.1, and
the constants a, L and functions R(·) and D(·) from assumptions (A1)–(A5)
in Section 3. Recall also that ρ0 =
Fix ρ, ε, η, T such that 0 < ρ <min{
R(a/4)
}, ε > 0, η > 0 and T > 0.
By assumption (A3), there is a constant r1 > 0 such that
D(r)<min
for all r ∈ (0, r1].(37)
Since Π(u) → 0 as u → 0, there are constants 0 < r3 < r2 < min{r1,
such that
Π(r)<
for all r ∈ (0, r3].(38)
By (iv), there are 0< ε̃ <min{
} and n0 > 0 such that for all n≥ n0,
‖y−x‖<2ε̃
‖γi,n(y,x)− γi(x)‖<
.(39)
By (iv)–(vi), and Proposition 1.1, there exist an integer n1 > n0, a con-
stant M̃η,T > 0 and λ̃ ∈ (0, T ), such that for all n≥ n1,
0≤s≤T
‖X̃n(s)‖ ≤ M̃η,T
≥ 1− η/2,(40)
Pn{wT (X̃
n, λ̃)≥ ε̃} ≤ η/4,(41)
0≤s≤T
‖αn(s)‖<
6ILr2
≥ 1− η/4,(42)
δn <min
8(1 + I)
.(43)
To prove C-tightness of {W̃ n} and {Ỹ n} (and hence of {W n}, {Y n}),
by Proposition 1.1, it suffices to show that there exists a constant Nη,T > 0
such that for all n≥ n1,
Pn{wT (W̃
n, λ̃)≥ ε} ≤ η,(44)
Pn{wT (Ỹ
n, λ̃)≥ ε} ≤ η,(45)
0≤s≤T
‖W̃ n(s)‖ ≤Nη,T
≥ 1− η,(46)
0≤s≤T
‖Ỹ n(s)‖ ≤Nη,T
≥ 1− η.(47)
20 W. KANG AND R. J. WILLIAMS
For each n≥ 1, let Fn be a set in Fn such that Pn(Fn) = 1 and on Fn,
properties (iii)(a)–(c) hold, (33)–(36) hold, and W̃ n(t) ∈ Uδn(G) for all t≥ 0.
Fix a t such that 0≤ t < t+ λ̃≤ T . Let
τn = inf{s≥ t :W̃ n(s) ∈ Uδn(∂Gi ∩ ∂G) for some i ∈ I}.(48)
For each n≥ n1, let
wT (X̃
n, λ̃)< ε̃, sup
0≤s≤T
‖αn(s)‖<
6ILr2
0≤s≤T
‖X̃n(s)‖ ≤ M̃η,T
Then by (40)–(42) and the definition of Fn,
P{Hn} ≥ 1− η.(50)
Fix ωn ∈Hn. By the definition of wT (x,λ) in (3), we have that,
r,s∈[t,t+λ̃]
‖X̃n(s,ωn)− X̃n(r,ωn)‖< ε̃.(51)
Now there are two cases to consider for n ≥ n1 and u, v fixed such that
t≤ u < v ≤ t+ λ̃.
Case 1. ωn ∈ {τn > v}. In this case, by (iii)(c), Ỹ n(·, ωn) does not increase
on the interval (u, v], that is, Ỹ ni (v,ω
n)− Ỹ ni (u,ω
n) = 0 for all i ∈ I . Then
by (34) and (36),
W̃ n(v,ωn)− W̃ n(u,ωn) = X̃n(v,ωn)− X̃n(u,ωn).(52)
Hence, by (51),
‖W̃ n(v,ωn)− W̃ n(u,ωn)‖ ≤ sup
r,s∈[t,t+λ̃]
‖X̃n(s,ωn)− X̃n(r,ωn)‖< ε̃ < ε/8,
and we also have
‖Ỹ n(v,ωn)− Ỹ n(u,ωn)‖= 0< ε/2.
Case 2. ωn ∈ {τn ≤ v}. Then there is an i ∈ I such that W̃ n(τn, ωn) ∈
Uδn(∂Gi ∩ ∂G), since the set Uδn(∂Gi ∩ ∂G) is closed and W̃
n(·, ωn) is right
continuous. It follows that there is some x0 ∈ ∂G (which depends on ω
such that W̃ n(τn, ωn) is in the closed ball Bδn(x0)⊂Bρ(x0). To apply the
INVARIANCE PRINCIPLE FOR SRBMS 21
oscillation inequality in Theorem 4.1, we first prove the following:
W̃ n(r,ωn) ∈Bρ(x0) for all r satisfying τ
n ≤ r ≤ v.(53)
For the proof of (53), let
ξn = inf{r ≥ τn :W̃ n(r,ωn) /∈Bρ(x0)} ∧ v.(54)
By the definition of ξn, W̃ n(r,ωn) ∈Bρ(x0) for each r ∈ [τ
n, ξn). In order to
apply the oscillation inequality in Theorem 4.1 on the time interval [τn, ξn),
we show that
D(Π(Osc(X̃n(·, ωn) + Ṽ n(·, ωn), [τn, ξn)) + δn))<
.(55)
For each r ∈ (0, T ], by (i)–(iii) and (33), (49), (43), we have that
‖W n(r−, ωn)−W n(r,ωn)‖
≤ ‖W̃ n(r−, ωn)− W̃ n(r,ωn)‖+ ‖αn(r−, ωn)−αn(r,ωn)‖
≤ ‖∆X̃n(r,ωn)‖+2 sup
0≤s≤T
‖αn(s)‖+ Iδn
≤ ε̃+
< 2ε̃.
Hence by (39), for each r ∈ (0, T ],
‖γi,n(W n(r−, ωn),W n(r,ωn))− γi(W n(r,ωn))‖ ≤
.(56)
By (36), (56), Assumption (A4), (i) and (49), we have that for any s1, s2
such that u≤ s1 < s2 ≤ v,
‖Ṽ n(s2, ω
n)− Ṽ n(s1, ω
(s1,s2]
‖γi,n(W n(r−, ωn),W n(r,ωn))
− γi(W n(r,ωn))‖dỸ ni (r,ω
(s1,s2]
‖γi(W n(r,ωn))− γi(W̃ n(r,ωn))‖dỸ ni (r,ω
(Ỹ ni (s2, ω
n)− Ỹ ni (s1, ω
(s1,s2]
L‖W n(r,ωn)− W̃ n(r,ωn)‖dỸ ni (r,ω
(Ỹ ni (s2, ω
n)− Ỹ ni (s1, ω
6ILr2
(Ỹ ni (s2, ω
n)− Ỹ ni (s1, ω
22 W. KANG AND R. J. WILLIAMS
‖Ỹ n(s2, ω
n)− Ỹ n(s1, ω
σn = inf{s≥ τn :Osc(Ỹ n(·, ωn), [τn, s))> r2}.(58)
Note that Osc(Ỹ n(·, ωn), [τn, s)) as a function of s defined on (τn,∞) is
left continuous with finite right limits and is nondecreasing. By the right
continuity of Ỹ n, we know that
Osc(Ỹ n(·, ωn), [τn, s))→ 0 as s ↓ τn.
Thus, σn > τn, Osc(Ỹ n(·, ωn), [τn, σn))≤ r2 and on {σ
n <∞}, Osc(Ỹ n(·, ωn),
[τn, σn]) ≥ r2. By (57), (51), (43), the choice of ε, and since t≤ τ
n ≤ ξn ≤
v ≤ t+ λ̃, we have
Osc(X̃n(·, ωn) + Ṽ n(·, ωn), [τn, ξn ∧ σn)) + δn
≤Osc(X̃n(·, ωn), [τn, ξn ∧ σn))
+Osc(Ṽ n(·, ωn), [τn, ξn ∧ σn)) + δn
≤Osc(X̃n(·, ωn), [τn, ξn ∧ σn))
Osc(Ỹ n(·, ωn), [τn, ξn ∧ σn)) + δn
≤ ε̃+
r2 + δ
n < r3.
Then by (38) and the monotonicity of D(·), we have
D(Π(Osc(X̃n(·, ωn) + Ṽ n(·, ωn), [τn, ξn ∧ σn)) + δn))
≤D(r2)≤D(r1)<
We claim that
σn ≥ ξn.(61)
To prove (61), we proceed by contradiction and suppose that σn < ξn. Then
by (60), with x= X̃n(·, ωn) + Ṽ n(·, ωn) and δ = δn, condition (iv) of The-
orem 4.1 holds with [s, t] = [τn, σn − 1/m] for all m sufficiently large. By
applying Theorem 4.1 and letting m→∞, we obtain using (34), (38) and
(59) that,
Osc(Ỹ n(·, ωn), [τn, σn))
≤Π(Osc(X̃n(·, ωn) + Ṽ n(·, ωn), [τn, ξn ∧ σn)) + δn)(62)
≤Π(r3)<
INVARIANCE PRINCIPLE FOR SRBMS 23
By (62), (iii)(b) and (43), we obtain that
Osc(Ỹ n(·, ωn), [τn, σn])≤
+ Iδn < r2.
This contradicts the fact that Osc(Ỹ n(·, ωn), [τn, σn]) ≥ r2 on {σ
n < ∞},
and so (61) holds and (55) follows by (60).
By applying Theorem 4.1 on [τn, ξn − 1/m] and then letting m→∞, we
obtain using (61), (59) and (38), that
Osc(W̃ n(·, ωn), [τn, ξn))
≤Π(Osc(X̃n(·, ωn) + Ṽ n(·, ωn), [τn, ξn ∧ σn)) + δn)
and similarly,
Osc(Ỹ n(·, ωn), [τn, ξn))<
.(63)
Then we have
‖W̃ n(ξn−, ωn)− x0‖
≤ ‖W̃ n(ξn−, ωn)− W̃ n(τn, ωn)‖+ ‖W̃ n(τn, ωn)− x0‖
+ δn.
Using hypotheses (ii), (iii)(b), and (33), (51), we obtain
‖W̃ n(ξn, ωn)− W̃ n(ξn−, ωn)‖
≤ ‖X̃n(ξn, ωn)− X̃n(ξn−, ωn)‖
‖γi,n(W n(ξn−, ωn),W n(ξn, ωn))‖
× (Ỹ ni (ξ
n, ωn)− Ỹ ni (ξ
n−, ωn))
≤ ε̃+ Iδn.
Hence
‖W̃ n(ξn, ωn)− x0‖ ≤ ‖W̃
n(ξn−, ωn)− x0‖
+ ‖W̃ n(ξn, ωn)− W̃ n(ξn−, ωn)‖
+ δn + ε̃+ Iδn
≤ ε̃+ (I+1)δn +
< ρ/8 + ρ/8 + ρ/8< ρ/2.
24 W. KANG AND R. J. WILLIAMS
It follows from this that ξn = v and (53) holds, as desired.
Then, by (33), (51), (iii)(b), (iii)(c), (63) and (43), we have
‖W̃ n(v,ωn)− W̃ n(u,ωn)‖
≤ sup
r,s∈[u,v]
‖X̃n(s,ωn)− X̃n(r,ωn)‖+
(Ỹ ni (v,ω
n)− Ỹ ni (u,ω
≤ ε̃+
(Ỹ ni (v,ω
n)− Ỹ ni (u∨ τ
n, ωn))
(Ỹ ni (u ∨ τ
n, ωn)− Ỹ ni (u,ω
n))(64)
≤ ε̃+ IOsc(Ỹ n(·, ωn), [u∨ τn, v)) +
∆Ỹ ni (v,ω
n) + Iδn
≤ ε̃+ I
+ Iδn + Iδn <
‖Ỹ n(v,ωn)− Ỹ n(u,ωn)‖ ≤
(Ỹ ni (v,ω
n)− Ỹ ni (u,ω
(Ỹ ni (v,ω
n)− Ỹ ni (u∨ τ
n, ωn))
(Ỹ ni (u ∨ τ
n, ωn)− Ỹ ni (u,ω
Here we have used the fact that Ỹi does not increase on (u, τ
n ∨ u) and can
jump at most by δn at τn, by the definition of τn and (iii)(c).
On combining the results from Case 1 and Case 2, we obtain that for each
n≥ n1,
u,v∈[t,t+λ̃]
‖W̃ n(v,ωn)− W̃ n(u,ωn)‖ : 0≤ t≤ t+ λ̃≤ T
< ε(66)
u,v∈[t,t+λ̃]
‖Ỹ n(v,ωn)− Ỹ n(u,ωn)‖ : 0≤ t≤ t+ λ̃≤ T
< ε.(67)
Hence since ωn ∈Hn was arbitrary, by (50), we have that (44) and (45) hold
for all n≥ n1.
Next we show that there is a constant Nη,T > 0 such that (46) and (47)
hold for all n≥ n1. By (66)–(67) above, we have that for each n≥ n1, ω
INVARIANCE PRINCIPLE FOR SRBMS 25
Hn, t such that 0≤ t < t+ λ̃≤ T and t≤ u < v ≤ t+ λ̃,
‖W̃ n(v,ωn)− W̃ n(u,ωn)‖< ε(68)
‖Ỹ n(v,ωn)− Ỹ n(u,ωn)‖< ε.(69)
Then, for each 0≤ s≤ T , by (68), (69), (49) and (33), we have
‖W̃ n(s,ωn)‖ ≤ ‖W̃ n(s,ωn)− W̃ n(0, ωn)‖+ ‖W̃ n(0, ωn)‖
[T/λ̃]+1∑
‖W̃ n(iλ̃∧ s,ωn)− W̃ n((i− 1)λ̃ ∧ s,ωn)‖+ ‖X̃n(0, ωn)‖
≤ ([T/λ̃] + 1)ε+ M̃η,T
‖Ỹ n(s,ωn)‖ ≤ ‖Ỹ n(s,ωn)− Ỹ n(0, ωn)‖
[T/λ̃]+1∑
‖Ỹ n(iλ̃ ∧ s,ωn)− Ỹ n((i− 1)λ̃∧ s,ωn)‖
≤ ([T/λ̃] + 1)ε.
Here [T/λ̃] is the greatest integer less than or equal to T/λ̃. Let Nη,T =
([T/λ̃] + 1)ε+ M̃η,T . Then we obtain that for n≥ n1 and ω
n ∈Hn,
0≤s≤T
‖W̃ n(s,ωn)‖ ≤Nη,T(70)
0≤s≤T
‖Ỹ n(s,ωn)‖ ≤Nη,T .(71)
Then by (50), we have that (46) and (47) hold for all n≥ n1.
Finally by applying Proposition 1.1, we have the C-tightness of {W̃ n}
and {Ỹ n}. It then follows that {(W̃ n,Xn, Ỹ n)}∞n=1 is C-tight. Since Z
(W̃ n,Xn, Ỹ n) + (αn,0, βn) where αn,V(βn) → 0 in probability as n→∞,
then {Zn}∞n=1 is also C-tight. �
4.3. Invariance principle for SRBMs. The main theorem of the paper is
the following.
Theorem 4.3 (Invariance principle for SRBMs). Suppose that Assump-
tion 4.1 holds. Define Zn = (W n,Xn, Y n) for each n. Then the sequence of
26 W. KANG AND R. J. WILLIAMS
processes {Zn}∞n=1 is C-tight and any (weak) limit point of this sequence is
of the form Z = (W,X,Y ) where continuous d-dimensional processes W,X
and a continuous I-dimensional process Y are defined on some probability
space (Ω,F , P ) such that conditions (i), (ii) and (iv) of Definition 2.1 hold
with Ft = σ{Z(s) : 0≤ s≤ t}, t≥ 0.
If, in addition, the following conditions (vi)′ and (vii) hold, then any
weak limit point of the sequence {Zn}∞n=1 is an extended SRBM associated
with the data (G,µ,Γ,{γi, i ∈ I}, ν). If furthermore the following condition
(viii) holds, then W n ⇒W as n→∞ where W is an SRBM associated with
(G,µ,Γ,{γi, i ∈ I}, ν).
(vi)′ {Xn} converges in distribution to a d-dimensional Brownian mo-
tion with drift µ, covariance matrix Γ and initial distribution ν.
(vii) For each (weak) limit point Z = (W,X,Y ) of {Zn}∞n=1, {X(t) −
X(0)− µt, Ft, t≥ 0} is a martingale.
(viii) If a process W satisfies the properties in Definition 2.1, the law
of W is unique, that is, the law of an SRBM associated with the data
(G,µ,Γ,{γi, i ∈ I}, ν) is unique.
Remark. We note that (vi)′ implies that (vi) of Assumption 4.1 holds.
Proof of Theorem 4.3. By Theorem 4.2, we have that the sequence
{Zn}∞n=1 is C-tight. Let Z = (W,X,Y ) be a (weak) limit point of {Z
n}∞n=1,
that is, there is a subsequence {nk} of {n} such that Z
nk ⇒Z as k→∞. It
also follows that Z̃nk ≡ (W̃ nk ,Xnk , Ỹ nk)⇒Z as k→∞. By the C-tightness
of {Zn}, we obtain that Z has continuous paths a.s. For the purpose of
verifying that Z satisfies the listed properties in Definition 2.1, one may
invoke the Skorokhod representation theorem to assume, without loss of
generality, that Znk and Z̃nk converge u.o.c. to Z a.s. as k→∞ and V(βnk)
converges u.o.c. to 0 a.s. as k → ∞. With this simplification, it is easily
verified that the properties of {Znk} and {Z̃nk} imply that Z has properties
(ii) and (iv)(a)–(b) of Definition 2.1. For the verification of property (i) of
Definition 2.1, note that for each k, a.s. for each t≥ 0,
W nk(t) =Xnk(t) +
(0,t]
γi,nk(W nk(s−),W nk(s))dβ
i (s)
(0,t]
(γi,nk(W nk(s−),W nk(s))− γi(W nk(s)))dỸ
i (s)
(0,t]
γi(W nk(s))dỸ
i (s).
The sum of the first two terms on the right-hand side of the above equality
converges a.s. to X(t) as k → ∞. The third term on the right-hand side
INVARIANCE PRINCIPLE FOR SRBMS 27
converges a.s. to 0 as k→∞, by property (iv) and the fact that a.s.,
s∈(0,t]
‖W nk(s)−W nk(s−)‖
≤ sup
s∈(0,t]
‖∆Xnk(s)‖+ I sup
s∈(0,t]
‖∆Y nk(s)‖→ 0 as k→∞.
It remains to show that for each i ∈ I and t≥ 0, a.s.,
(0,t]
γi(W nk(s))dỸ
i (s)→
(0,t]
γi(W (s))dYi(s) as k→∞.
This follows directly from Lemma A.4.
For the verification of property (iv)(c) of Definition 2.1, it suffices to show
that for each i ∈ I , m= 1,2, . . . , a.s. for each t≥ 0,
Yi(t) =
(0,t]
fm(W (s))dYi(s),(72)
where {fm}
m=1 is a sequence of real valued continuous functions defined
on Rd such that for each m, the range of fm is [0,1], fm(x) = 1 for x ∈
U1/m(∂Gi ∩ ∂G) and fm(x) = 0 for x /∈ U2/m(∂Gi ∩ ∂G). The existence
of such a sequence of continuous functions {fm}
m=1 can be shown using
Urysohn’s lemma (cf. [8], page 122). Then (72) is a consequence of Lemma
A.4, property (iii) of Ỹ
i and the fact that δ
nk → 0 as k→∞. Indeed, a.s.,
for each t≥ 0,
Yi(t) = lim
i (t) = lim
(0,t]
{W̃nk (s)∈U
nk (∂Gi∩∂G)}
i (s)
= lim
(0,t]
fm(W̃
nk(s))dỸ
i (s)
(0,t]
fm(W (s))dYi(s).
Thus, Z satisfies properties (i), (ii) and (iv) of Definition 2.1 with Ft =
σ{Z(s) : 0≤ s≤ t}, t≥ 0.
Assuming properties (vi)′ and (vii) hold, Z satisfies (iii) of Definition
2.1. Then Z is an extended SRBM associated with the data (G,µ,Γ,{γi, i ∈
I}, ν). If in addition, property (viii) holds, then the law of W is unique. Since
each weak limit W is an SRBM associated with the data (G,µ,Γ,{γi, i ∈
I}, ν) and the law of such an SRBM is unique, then by a standard argument,
W n ⇒W as n→∞ where W is an SRBM associated with (G,µ,Γ,{γi, i ∈
I}, ν). �
Some sufficient conditions for (vii) to hold are given in Proposition 4.2 of
[15] for a simpler setting where G=Rd+. Two of those conditions generalize
to our setting here and can be proved in the same manner as in [15]. For
completeness, we state the ensuing result here.
28 W. KANG AND R. J. WILLIAMS
Proposition 4.1. Suppose that Assumption 4.1 and (vi)′ of Theorem
4.3 hold. If, in addition, one of the following conditions (I)–(II) holds, then
condition (vii) of Theorem 4.3 is satisfied, and any weak limit point of
{Zn}∞n=1 is an extended SRBM associated with (G,µ,Γ,{γ
i, i ∈ I}, ν).
(I) For any triple of d-dimensional {Ft}-adapted processes (W,X,Y )
defined on some filtered probability space (Ω,F ,{Ft}, P ) and satisfying con-
ditions (i), (ii) and (iv) of Definition 2.1 together with the condition that
X, under P , is a d-dimensional Brownian motion with drift vector µ, co-
variance matrix Γ and initial distribution ν, the pair (W,Y ) is adapted to
the filtration generated by X and the P -null sets.
(II) Xn = X̌n + εn1 , Y
n = Y̌ n + εn2 , W
n = W̌ n + εn3 , where ε
1 , ε
2 , ε
3 are
processes converging to 0 in probability as n→∞, and:
(a) {X̌n(t)− X̌n(0)}∞n=1 is uniformly integrable for each t≥ 0,
(b) there is a sequence of constants {µn}∞n=1 in R
d such that
limn→∞µ
n = µ,
(c) for each n, {X̌n(t)− X̌n(0)−µnt, t≥ 0} is a Pn-martingale with
respect to the filtration generated by (W̌ n, X̌n, Y̌ n).
In the rest of this work, we focus on applications of the invariance prin-
ciple and in particular on giving sufficient conditions for property (viii) of
Theorem 4.3 to hold.
5. Applications of the invariance principle. In Section 5.1, we prove weak
existence of SRBMs associated with data (G,µ,Γ,{γi, i ∈ I}, ν) satisfying
(A1)–(A5) of Section 3. This is accomplished by constructing a sequence of
approximations whose weak limit points are SRBMs. The invariance prin-
ciple is used to prove the C-tightness of the approximations and that any
weak limit point is an SRBM. In Sections 5.2 and 5.3, using known results
on uniqueness in law for SRBMs, we illustrate the invariance principle for
certain domains and directions of reflection.
5.1. Weak existence of SRBMs.
Theorem 5.1. Suppose that assumptions (A1)–(A5) of Section 3 hold.
Then there exists an SRBM associated with the data (G,µ,Γ,{γi, i ∈ I}, ν).
Proof. We construct a sequence of approximations to an SRBM and
use the invariance principle to establish weak convergence along a subse-
quence to an SRBM.
In the following we will use R(·) from assumption (A2), L > 0 from as-
sumption (A4), a > 0 from assumption (A5), and ρ0 =
. Fix ε > 0 and
INVARIANCE PRINCIPLE FOR SRBMS 29
0 < ρ < min{
R(a/4)
}. By assumption (A3), there is a constant r1 > 0
such that
D(r)<min
for all r ∈ (0, r1].
Recall the properties of Π(·) from Theorem 4.1. Since Π(u)→ 0 as u→ 0,
there are constants 0< r3 < r2 <min{r1,
} such that
Π(r)<
for all r ∈ (0, r3].
Fix ε̃ and δ such that 0< ε̃ <min{
24ILr2
} and 0< 2δ <min{r3
8(1+I)
We will construct a d-dimensional stochastic processW δ and an I-dimensional
“pushing” process Y δ , such that W δ approximately satisfies the conditions
defining an SRBM for the data (G,µ,Γ,{γi, i ∈ I}, ν) (cf. Assumption 4.1).
The idea for this construction is to use a Brownian motion X with drift
vector µ, covariance matrix Γ and initial distribution ν. Away from ∂G, the
increments of W δ are determined by those of X . For any time t ≥ 0 such
that W δ(t−) ∈ ∂G, we add an instantaneous jump to W δ(t−) to obtain
W δ(t) ∈G. Here W δ(0−) =X(0). The size of the jump is such that W δ(t)
is a strictly positive distance (depending on δ) from the boundary of G.
The jump vector is obtained as a measurable function of W δ(t−). To ensure
the measurability, each point x on ∂G is associated with a nearby point x̄,
chosen in a measurable way from a fixed countable set of points in ∂G. The
jump vector for x is one associated with x̄. We now specify the mapping
x→ x̄ and the associated jump vector more precisely.
By assumption (A5)(ii), for each x ∈ ∂G, there is c(x) ∈RI+ such that
i∈I(x)
ci(x) = 1 and min
j∈I(x)
i∈I(x)
ci(x)γ
i(x), nj(x)
≥ a.(73)
By (73), Lemma 2.1 and the fact that ni(·) is continuous on ∂Gi for each
i ∈ I , we have that for each x ∈ ∂G there is rx ∈ (0, δ) such that for each
y ∈Brx(x)∩ ∂G,
I(y)⊂ I(x)(74)
j∈I(x)
i∈I(x)
ci(x)γ
i(x), nj(y)
.(75)
It follows, using the C1 nature of ∂Gi and the fact that n
i(y) is the inward
unit normal to ∂Gi at y ∈ ∂G for each i ∈ I(y), that (by choosing rx even
30 W. KANG AND R. J. WILLIAMS
smaller if necessary) for each x ∈ ∂G there is m(x)> 0 and rx ∈ (0, δ) such
that for each y ∈Brx(x)∩ ∂G, (74)–(75) hold and
y + λ
i∈I(x)
ci(x)γ
i(x) ∈G for all λ ∈ (0,m(x)).(76)
Let Borx(x) denote the interior of the closed ball Brx(x) for each x ∈ ∂G. The
collection {Borx(x) :x ∈ ∂G} is an open cover of ∂G and it follows that there
is a countable set {xk} such that ∂G⊂
kBrxk (xk) and {xk} ∩BN (0) is a
finite set for each integer N ≥ 1. We can further choose the set {xk} to be
minimal in the sense that for each strict subset C of {xk}, {Brx(x) :x ∈C}
does not cover ∂G. Let Dk = (Brxk (xk) \ (
i=1 Brxi (xi)) ∩ ∂G for each k.
Then Dk 6= ∅ for each k, {Dk} is a partition of ∂G, and for each x ∈ ∂G
there is a unique index i(x) such that x ∈Di(x). For each x ∈R
d, let
x, if x /∈ ∂G,
xi(x), if x ∈ ∂G.
Note that for all x ∈Rd,
‖x− x̄‖< δ.(77)
For each i ∈ I and x∈Rd, let
γi,δ(x) = γi(x̄).(78)
The mapping x → x̄ is Borel measurable on Rd and hence γi,δ is a Borel
measurable function from Rd into Rd.
We construct (W δ, Y δ) as follows. Let X defined on some filtered proba-
bility space (Ω,F ,{Ft}, P ) be a d-dimensional {Ft}-Brownian motion with
drift µ and covariance matrix Γ such that X is continuous surely and X(0)
has distribution ν. Let
τ1 = inf{t≥ 0 :X(t) ∈ ∂G}
W δ(t) =X(t), Y δ(t) = 0 for 0≤ t < τ1.
Note that W δ(τ1−) exists on {τ1 <∞} since X has continuous paths and
in the case that τ1 = 0, W
δ(0−)≡X(0). On {τ1 <∞}, define
Y δi (τ1) =
0, i /∈ I(W δ(τ1−)),
ci(W δ(τ1−))
m(W δ(τ1−))
, i ∈ I(W δ(τ1−)),
W δ(τ1) =X(τ1)
m(W δ(τ1−))
i∈I(W δ(τ1−))
ci(W δ(τ1−))γ
i,δ(W δ(τ1−))
INVARIANCE PRINCIPLE FOR SRBMS 31
So W δ, Y δ have been defined on [0, τ1) and at τ1 on {τ1 <∞}, such that:
(i) W δ(t) = X(t) +
i∈I γ
i,δ(W δ(0−))Y δi (0) +
(0,t]
γi,δ(W δ(s−))dY δi (s) for all t ∈ [0, τ1]∩ [0,∞), where W
δ(0−) =X(0),
(ii) W δ(t) ∈G for t ∈ [0, τ1]∩ [0,∞),
(iii) for i ∈ I ,
(a) Y δi (0)≥ 0,
(b) Y δi is nondecreasing on [0, τ1]∩ [0,∞),
(c) Y δi (t) = Y
i (0) +
(0,t] 1{W δ(s)∈U2δ(∂Gi∩∂G)} dY
i (s) for t ∈ [0, τ1] ∩
[0,∞),
(iv) ‖∆Y δ(t)‖ ≡ ‖Y δ(t)−Y δ(t−)‖ ≤ δ for t ∈ [0, τ1]∩ [0,∞), where Y
δ(0−)≡
Note that (iii)(c) above contains the expression W δ(s) ∈U2δ(∂Gi∩∂G). The
reader may wonder why 2δ appears instead of δ. The reason is that at a jump
time s of Y δi , W
δ(s−) ∈ ∂Gi ∩ ∂G and so
dist(W δ(s), ∂Gi ∩ ∂G)≤ ‖W
δ(s)−W δ(s−)‖+ ‖W δ(s−)−W δ(s−)‖
≤ δ + δ = 2δ.
Proceeding by induction, we assume that for some n ≥ 2, τ1 ≤ · · · ≤ τn−1
have been defined, and W δ, Y δ have been defined on [0, τn−1) and at τn−1
on {τn−1 <∞}, such that (i)–(iv) above hold with τn−1 in place of τ1. Then
we define τn =∞ on {τn−1 =∞}, and on {τn−1 <∞} we define
τn = inf{t≥ τn−1 :W
δ(τn−1) +X(t)−X(τn−1) ∈ ∂G}.
For τn−1 ≤ t < τn, let
Y δ(t) = Y δ(τn−1),
W δ(t) =W δ(τn−1) +X(t)−X(τn−1),
and on {τn <∞}, let
Y δi (τn) =
Y δi (τn−1), i /∈ I(W
δ(τn−)),
Y δi (τn−1) + ci(W
δ(τn−))
m(W δ(τn−))
, i ∈ I(W δ(τn−)),
W δ(τn) =W
δ(τn−)
m(W δ(τn−))
i∈I(W δ(τn−))
ci(W δ(τn−))γ
i,δ(W δ(τn−))
32 W. KANG AND R. J. WILLIAMS
In this way, W δ, Y δ have been defined on [0, τn) and at τn on {τn <∞} such
that (i)–(iv) hold with τn in place of τ1.
By construction {τn}
n=1 is a nondecreasing sequence of stopping times.
Let τ = limn→∞ τn. On {τ =∞}, the construction of (W
δ, Y δ) is complete.
We now show that {τ < ∞} = ∅. In fact, if {τ < ∞} 6= ∅, let ω ∈ {τ <
∞}. The above construction gives (W δ(·, ω), Y δ(·, ω)) on the time interval
[0, τ(ω)). For each t ∈ [0, τ(ω)), we have
W δ(t,ω) =X(t,ω) +
γi,δ(W δ(0−, ω))Y δi (0, ω)
(0,t]
γi,δ(W δ(s−, ω))dY δi (s,ω).
SinceX is continuous on [0,∞), ‖γi,δ(x)‖= 1 for each x ∈Rd and
i∈I Y
i (0,
ω)≤ δ, there are constants λ̃ ∈ (0, τ(ω)) and M̃ > 0 (depending on ω) such
wτ(ω)(X(·, ω) + γ
i,δ(W δ(0−, ω))Y δi (0, ω), λ̃)< ε̃(80)
0≤t≤τ(ω)
∥∥∥∥∥X(·, ω) +
γi,δ(W δ(0−, ω))Y δi (0, ω)
∥∥∥∥∥≤ M̃,(81)
where w
(·, ·) is defined in (3). By the choice of ε̃, δ made at the beginning of
this proof, (77)–(78) and the uniform Lipschitz property of the γi(·), i ∈ I , it
follows that (39) and (43) hold with γi,δ(y) and 2δ in place of γi,n(y,x) and
δn, respectively. Then by similar pathwise analysis to that used in Case 1 and
2 of the proof of Theorem 4.2, with W̃ n =W n =W δ, αn = 0, γi,n(y,x) =
γi,δ(y) for each i ∈ I and x, y ∈ Rd, Xn = X +
i∈I γ
i,δ(W δ(0−))Y δi (0),
Y n = Y δ , Ỹ n = Y δ − Y δ(0), βn = Y δ(0) and δn = 2δ, we obtain that (71)
holds for any T < τ(ω) with ωn = ω, Nη,T = ([τ(ω)/λ̃] + 1)ε+ M̃ . It follows
that supi∈I sups∈[0,τ(ω)) Y
i (s,ω) is finite. By the nondecreasing property of
Y δi (·, ω) on [0, τ(ω)) for each i ∈ I , Y
i (τ(ω)−, ω) exists and is finite for each
i ∈ I . Then by (79) and the continuity of X , we see that W δ(τ(ω)−, ω) exists
and is finite. By the construction of Y δ and the fact that
i∈I(x) ci(x) = 1
for all x∈ ∂G, we have that
Y δi (τ(ω)−, ω) =
m(W δ(τn(ω)−, ω))
∧ δ.(82)
Since τn(ω) ↑ τ(ω) as n → ∞ and W
δ(τ(ω)−, ω) exists, it follows that
{W δ(τn(ω)−, ω)}
n=1 converges to W
δ(τ(ω)−, ω) ∈ ∂G as n → ∞. Conse-
quently, {W δ(τn(ω)−, ω)}
n=1 is a bounded sequence in ∂G and so by the
INVARIANCE PRINCIPLE FOR SRBMS 33
definition of the sets {Dk} which form a partition of ∂G, there is a finite set
C such that
{W δ(τn(ω)−, ω)}
n=1 ⊂
Hence,
m(W δ(τn(ω)−, ω))≤ inf
m(xk)> 0,(83)
and so the right-hand side of (82) is infinite. On the other hand, since
supi∈I sups∈[0,τ(ω)) Y
i (s,ω) is finite, the left-hand side of (82) is finite. This
yields the desired contradiction and so {τ < ∞} = ∅ and we have con-
structed (W δ, Y δ) on [0,∞).
From the construction above, we can see that W δ and Y δ are well-defined
stochastic processes with sample paths in D([0,∞),Rd) and D([0,∞),RI).
They are adapted to the filtration generated by X and satisfy (i)–(iv) above
with [0,∞) in place of [0, τ1].
Consider a sequence of sufficiently small δ’s, denoted by {δn}, such that
δn ↓ 0 as n → ∞. For each δn, let (W δ
, Y δ
) be the pair constructed as
above for the same process X . By the above properties and the fact that for
each i ∈ I and x, y ∈Rd,
‖γi,δ
(y)− γi(x)‖ ≤ ‖γi(ȳ)− γi(x)‖ ≤ L‖ȳ− x‖ ≤ L(δn + ‖y − x‖),
we obtain that Assumption 4.1 holds with W̃ n =W n =W δ
, αn = 0, γi,n(y,x) =
(y) for each i ∈ I and x, y ∈Rd, Xn =X+
i∈I γ
i,δn(W δ
(0−))Y δ
i (0),
Y n = Y δ
, Ỹ n = Y δ
(0), βn = Y δ
(0) and 2δn in place of δn. By invok-
ing the first part of Theorem 4.3, we obtain that {Zδ
}∞n=1 = {(W
δn ,Xδ
)}∞n=1 is C-tight and any weak limit point Z of this sequence satisfies
conditions (i), (ii) and (iv) of Definition 2.1 with Ft = σ{Z(s) : 0 ≤ s ≤ t},
t≥ 0. Note that condition (vi)′ of Theorem 4.3 holds trivially. Furthermore,
= {Xδ
(t)−Xδ
(0)− µt, t≥ 0}= {X(t)−X(0)− µt, t≥ 0} is a mar-
tingale with respect to the filtration generated by X . Since W δ
, Y δ
adapted to this filtration, it follows that M δ
is a martingale with respect
to the filtration generated by W δ
, Y δ
(which in fact is the same as
that generated by X). For each t≥ 0, Xδ
(t)−Xδ
(0) =X(t)−X(0) and
so trivially this forms a uniformly integrable sequence as n varies. It fol-
lows from Proposition 4.1 that condition (vii) of Theorem 4.3 holds. Hence,
any weak limit point of {Zδ
}∞n=1 is an extended SRBM with the data
(G,µ,Γ,{γi, i ∈ I}, ν). �
5.2. SRBMs in convex polyhedrons with constant reflection fields. Exis-
tence and uniqueness in law for SRBMs living in convex polyhedrons with a
constant reflection field on each boundary face has been studied by Dai and
34 W. KANG AND R. J. WILLIAMS
Williams [4]. In this subsection, we state a consequence of our invariance
principle using the results in [4] to establish uniqueness in law. In this case,
G is defined in terms of I (I≥ 1) d-dimensional unit vectors {ni, i ∈ I} and
an I-dimensional vector β = (β1, . . . , βI)
′ such that
G≡ {x ∈Rd : 〈ni, x〉 ≥ βi for all i ∈ I}.(84)
It is assumed that G is nonempty and that the set {(n1, β1), . . . , (n
I, βI)} is
minimal in the sense that no proper subset defines G. For each i ∈ I , let Fi
denote the boundary face: {x ∈G : 〈ni, x〉= βi}. Then, n
i is the inward unit
normal to Fi. A constant vector field γ
i of unit length specifies the direction
of reflection associated with Fi.
Definition 5.2. For each ∅ 6=K⊂ I , define FK =
i∈KFi. Let F∅ =G.
A set K⊂ I is maximal if K 6=∅, FK 6=∅ and FK 6= FK̄ for any K̄ ⊃ K such
that K̄ 6=K.
In [4], Dai and Williams introduced the following assumption.
Assumption 5.1. For each maximal K⊂ I ,
(S.a) there is a positive linear combination n=
i∈K bin
i (bi > 0 ∀i ∈K)
of the {ni, i ∈K} such that 〈n,γi〉> 0 for all i ∈K,
(S.b) there is a positive linear combination γ =
i∈K ciγ
i (ci > 0 ∀i ∈K)
of the {γi, i ∈K} such that 〈ni, γ〉> 0 for all i ∈K.
Remark. For the given G and constant vector fields {γi, i ∈ I}, As-
sumption 5.1 is equivalent to assumption (A5).
Definition 5.3. The convex polyhedron G is simple if for each K⊂ I
such that K 6=∅ and FK 6=∅, exactly |K| distinct faces contain FK.
Remark. The polyhedron G is simple if and only if K is maximal for
every K such that ∅ 6= K ⊂ I and FK 6= ∅. It is shown in [4] that when G
is simple, (S.a) holds for all maximal K⊂ I if and only if (S.b) holds for all
maximal K⊂ I.
Dai and Williams [4] showed that Assumption 5.1 is sufficient for exis-
tence and uniqueness in law of SRBMs living in G with the reflection fields
{γi, i ∈ I} and fixed starting point. [They also showed that condition (S.b)
holding for all maximal K⊂ I is necessary for existence of an SRBM starting
from each point in G. Consequently, when G is simple, Assumption 5.1 is
necessary and sufficient for existence of an SRBM starting from each point
in G.] This yields the following consequence of our invariance principle.
INVARIANCE PRINCIPLE FOR SRBMS 35
Theorem 5.4. Let G be a nonempty domain such that G is a convex
polyhedron of the form (84) (with minimal description), and let {γi, i ∈ I} be
a family of constant vector fields of unit length satisfying Assumption 5.1.
Suppose that Assumption 4.1 and (vi)′, (vii) of Theorem 4.3 hold. Then
W n ⇒W as n→∞ where W is an SRBM associated with (G,µ,Γ,{γi, i ∈
I}, ν).
Proof. Clearly (A1) holds. Assumptions (A2)–(A3) hold by Lemma
A.3. Since for each i ∈ I , γi(·) is a constant vector field of unit length, as-
sumption (A4) holds trivially. Assumption (A5) is implied by Assumption
5.1. Hence by Theorem 4.3, the only thing that we have to check is condition
(viii) of Theorem 4.3, that is, uniqueness in law for SRBMs in convex poly-
hedrons with constant reflection fields of unit length. But this is proved in
Theorem 1.3 of [4] for a fixed starting point in G and follows by a standard
conditioning argument for the initial distribution ν. �
5.3. SRBMs in bounded domains with piecewise smooth boundaries. Dupuis
and Ishii [6] have established sufficient conditions for the existence and path-
wise uniqueness of reflecting diffusions living in the closures of bounded
domains with piecewise smooth boundaries. In this subsection, we state a
consequence of our invariance principle using the results in [6] to establish
uniqueness in law.
Theorem 5.5. Let G be a bounded domain and {γi, i ∈ I} be a family
of reflection fields that satisfy assumptions (A1)–(A4) and (A5)′ in Section
3. We further assume that for each i ∈ I , γi(·) is once continuously differ-
entiable with locally Lipschitz continuous first partial derivatives. Suppose
that Assumption 4.1 and (vi)′, (vii) of Theorem 4.3 hold. Then W n ⇒W
as n→∞ where W is an SRBM associated with (G,µ,Γ,{γi, i ∈ I}, ν).
Remark. We remind the reader that in view of Lemma 3.1, to verify
condition (A5)′, one only needs to show that (i) or (ii) holds for all x ∈ ∂G.
However, as can be seen from the proof below, both forms of the condition
can be useful.
Proof of Theorem 5.5. This theorem follows from Theorem 4.3 and
uniqueness in law for the associated SRBMs. The latter follows by a standard
argument from the pathwise uniqueness established in Corollary 5.2 of [6] for
their Case 2. The conditions required for that case are satisfied in particular
because (A5)′(ii) implies condition (3.8) of [6]. That condition (3.8) readily
implies condition (3.6) of [6]; and, by [5], under the additional smoothness
assumptions imposed on the γi in the statement of our theorem, condition
(3.8) also implies condition (3.7) in [6]. In addition, (A5)′(i) implies that
36 W. KANG AND R. J. WILLIAMS
for each x ∈ ∂G, 〈γi(x), ni(x)〉> 0 for each i ∈ I(x), and furthermore, since
(A5)′ implies (A5), we have by (A5)(i) that the origin does not belong to
the convex hull of the {γi(x) : i ∈ I(x)}. �
APPENDIX: AUXILIARY LEMMAS
Lemma A.1. Suppose that G is bounded. If assumption (A1) holds, then
assumption (A2) holds.
Proof. To see this, suppose G is bounded and assumption (A1) holds.
Fix ε ∈ (0,1). For each i ∈ I and z ∈ ∂Gi ∩ ∂G, by the C
1 property of ∂Gi,
there is a neighborhood Vz of z and a constant R(ε, i, z) > 0 such that for
all x ∈ Vz ∩ ∂Gi ∩ ∂G and y ∈Gi such that ‖x− y‖<R(ε, i, z),
〈ni(x), y − x〉 ≥−ε‖y − x‖.(85)
Assumption (A2) then follows by a standard compactness argument. �
Lemma A.2. Suppose that G is a nonempty bounded domain satisfying
(5), where for each i ∈ I , Gi is a nonempty domain. Then assumption (A3)
holds.
Proof. We prove the lemma by contradiction. Suppose that assumption
(A3) does not hold. Then, since there are only finite many J ⊂ I , J 6=∅,
there is an ε > 0, a nonempty set J ⊂ I , a sequence {rn} ⊂ (0,∞) with rn →
0 as n→∞, a sequence {xn} ⊂R
d such that for each n, xn ∈
j∈J Urn(∂Gj∩
∂G) and dist(xn,
j∈J (∂Gj ∩ ∂G)) > ε. But since G is bounded, {xn} is
bounded and without loss of generality we may assume that xn → x as
n→∞ for some x ∈Rd. It follows that x ∈
j∈J (∂Gj ∩ ∂G), since for each
j ∈ J ,
dist(x,∂Gj ∩ ∂G)≤ ‖xn − x‖+dist(xn, ∂Gj ∩ ∂G)≤ ‖xn − x‖+ rn → 0
as n→∞. This is inconsistent with xn → x and dist(xn,
j∈J (∂Gj ∩∂G))>
Lemma A.3. Suppose (A1) holds where
Gi = {x ∈R
d : 〈ni, x〉> βi} for i ∈ I,(86)
{ni, i ∈ I} is a finite collection of d-dimensional vectors of unit length, and
for I= |I|, β = (β1, . . . , βI)
′ is an I-dimensional vector. (Thus, G is a convex
polyhedron.) Assume that for each i ∈ I , ∂Gi ∩ ∂G 6=∅. Then assumptions
(A2) and (A3) hold.
INVARIANCE PRINCIPLE FOR SRBMS 37
Proof. Assumption (A2) holds automatically since G is convex. In or-
der to show that assumption (A3) holds, we just need to show that for each
J ⊂ I with J 6=∅,
(∂Gj ∩ ∂G)
Ur(∂Gj ∩ ∂G)
→ 0(87)
as r→ 0. Fix J ⊂ I such that J 6=∅. Then
j∈J (∂Gj ∩∂G) is the collection
of all solutions x ∈Rd to the following system of linear inequalities:
〈ni, x〉 ≥ βi for all i ∈ I,
〈−ni, x〉 ≥ −βi for all i ∈ J .
Suppose that
j∈J (∂Gj ∩∂G) 6=∅, that is, (LS) has at least one solution.
By a theorem of Hoffman [11], with supporting lemmas proved by Agmon
[1], there is a constant C > 0 (depending only on {ni, i ∈ I} and not on β)
such that for any x ∈Rd there exists a solution x0 ∈R
d of (LS) with
‖x− x0‖ ≤C
(βi − 〈n
i, x〉)+ +
(−βi − 〈−n
i, x〉)+
.(88)
For r > 0, any x ∈
j∈J Ur(∂Gj ∩ ∂G) satisfies the following:
〈ni, x〉 ≥ βi − r for all i ∈ I,
(r-LS)
〈−ni, x〉 ≥ −βi − r for all i ∈ J .
Then by (88), there is x0 ∈
j∈J (∂Gj ∩ ∂G) such that
(∂Gj ∩ ∂G)
≤ ‖x− x0‖ ≤ 2C|I|r.
It follows that (87) holds when
j∈J (∂Gj ∩ ∂G) 6=∅.
Now suppose that
j∈J (∂Gj ∩∂G) =∅, that is, (LS) has no solution. We
shall use an argument by contradiction to show that
j∈J Ur(∂Gj ∩∂G) =∅
for all r sufficiently small. Suppose that this is not true. Then we have that⋂
j∈J Ur(∂Gj ∩ ∂G) 6= ∅ for all r ∈ (0,∞). As we have seen before, any
j∈J Ur(∂Gj ∩ ∂G) is a solution to (r-LS). We now construct a Cauchy
sequence. Let x1 ∈
j∈J U1/2(∂Gj ∩ ∂G). Then x1 is a solution to (
-LS).
Since ( 1
-LS) has at least one solution, by the theorem of Hoffman [11] (using
the fact that the constant C depends only on {ni, i ∈ I}), we conclude that
there is a solution x2 to (
-LS) such that ‖x1−x2‖ ≤
, where C ′ = 2C|I|.
Continuing in this manner, we can obtain a sequence {xn}
n=1 such that for
each n ≥ 1, ‖xn − xn+1‖ ≤
and xn+1 is a solution of (
-LS). The
38 W. KANG AND R. J. WILLIAMS
sequence {xn}
n=1 is Cauchy. Hence, there is an x
∗ ∈Rd such that xn → x
as n→ ∞, and x∗ is a solution to (LS). This contradicts the supposition
j∈J (∂Gj ∩ ∂G) =∅. Thus we have that
j∈J Ur(∂Gj ∩ ∂G) =∅ for
all r sufficiently small, and for such r,
(∂Gj ∩ ∂G)
Ur(∂Gj ∩ ∂G)
by convention.
Combining the above we see that for each J ⊂ I with J 6=∅, (87) holds
and hence assumption (A3) holds. �
Remark. In fact, under the assumptions of Lemma A.3, there is a con-
stant C > 0 such that D(u) ≤ Cu for each u ≥ 0 and D(·) defined as in
assumption (A3).
Lemma A.4. Given T > 0, functions φ,{φn}∞n=1 in D([0,∞),R
d), and
χ,{χn}∞n=1 in D([0,∞),R), suppose that sup0≤s≤T ‖φ
n(s)− φ(s)‖ → 0 and
sup0≤s≤T |χ
n(s)−χ(s)| → 0 as n→∞. Assume that χn is nondecreasing for
each n. Then for any sequence of real valued continuous functions {fn}∞n=1
defined on Rd such that fn converges uniformly on each compact set to a
continuous function f :Rd →R, we have
(0,t]
fn(φn(s))dχn(s)→
(0,t]
f(φ(s))dχ(s) as n→∞,(89)
uniformly for t ∈ [0, T ].
Proof. By replacing χn(·) and χ(·) by χn(·)− χn(0) and χ(·)− χ(0),
respectively, we may assume that χn(0) = χ(0) = 0. It is straightforward to
see by the uniform convergence of {χn} to χ on [0, T ] that χ inherits the
nondecreasing property of the {χn}.
By the triangle inequality,
0≤t≤T
(0,t]
fn(φn(s))dχn(s)−
(0,t]
f(φ(s))dχ(s)
≤ sup
0≤t≤T
(0,t]
(fn(φn(s))− f(φ(s)))dχn(s)
∣∣∣∣(90)
+ sup
0≤t≤T
(0,t]
f(φ(s))d(χn(s)− χ(s))
∣∣∣∣.
For the first term on the right-hand side of the above inequality, we have
0≤t≤T
(0,t]
(fn(φn(s))− f(φ(s)))dχn(s)
≤ sup
0≤s≤T
|fn(φn(s))− f(φ(s))|χn(T ),
INVARIANCE PRINCIPLE FOR SRBMS 39
where the right-hand side member above tends to zero as n → ∞ by the
uniform convergence of φn to φ on [0, T ] (which implies uniform boundedness
of {φn} on [0, T ]), the uniform convergence of fn to f on compact sets, the
continuity of f , and the convergence of χn(T ) to χ(T ). For the second term,
note that since f(φ(·)) ∈D([0,∞),R), by Theorem 3.5.6, Proposition 3.5.3
and Remark 3.5.4 of [7], there is a sequence of step functions {zk}∞k=1 of the
zk(·) =
zk(tki )1[tk
)(·),(91)
where 1 ≤ lk < ∞, 0 = t
1 < t
2 < · · · < t
< ∞ and sup0≤s≤T |f(φ(s)) −
zk(s)| → 0 as k→∞. Then
0≤t≤T
(0,t]
f(φ(s))d(χn(s)− χ(s))
≤ sup
0≤t≤T
(0,t]
(f(φ(s))− zk(s))d(χn(s)− χ(s))
+ sup
0≤t≤T
(0,t]
zk(s)d(χn(s)− χ(s))
≤ sup
0≤s≤T
|f(φ(s))− zk(s)|(χn(T ) + χ(T ))
+ sup
0≤t≤T
|zk(tki )||(χ
n − χ)((tki+1 ∧ t)−)− (χ
n − χ)((tki ∧ t)−)|.
For fixed k, the last term above can be made as small as we like for all n
sufficiently large since χn → χ uniformly on [0, T ]. The desired result follows.
Remark. The proof of Lemma A.4 is a modification of the proof of the
related Lemma 2.4 in [4]. The difference in assumptions is that in [4] it is
assumed that φn → φ in the J1-topology rather than uniformly on [0, T ],
χn, χ ∈C([0,∞),R+) rather than χ
n, χ ∈D([0,∞),R), and there is a single
function f rather than a sequence {fn}.
REFERENCES
[1] Agmon, S. (1954). The relaxation method for linear inequalities. Canadian J. Math.
6 382–392. MR0062786
[2] Berman, A. and Plemmons, R. J. (1979). Nonnegative Matrices in the Mathematical
Sciences. Academic Press, New York. MR0544666
[3] Dai, J. G. and Dai, W. (1999). A heavy traffic limit theorem for a class of open
queueing networks with finite buffers. Queueing Systems 32 5–40. MR1720547
http://www.ams.org/mathscinet-getitem?mr=0062786
http://www.ams.org/mathscinet-getitem?mr=0544666
http://www.ams.org/mathscinet-getitem?mr=1720547
40 W. KANG AND R. J. WILLIAMS
[4] Dai, J. G. and Williams, R. J. (1995). Existence and uniqueness of semimartingale
reflecting Brownian motion in convex polyhedrons. Theory Probab. Appl. 40 1–
40. MR1346729 [Correctional note (2006) 50 346–347 MR2222685.]
[5] Dupuis, P. and Ishii, H. (1991). On oblique derivative problems for fully nonlinear
second-order elliptic PDE’s on domains with corners. Hokkaido Math J. 20 135–
164. MR1096165
[6] Dupuis, P. and Ishii, H. (1993). SDEs with oblique reflection on nonsmooth do-
mains. Ann. Probab. 21 554–580. MR1207237 [Correction note (2007) 6 pages,
submitted.]
[7] Ethier, S. N. and Kurtz, T. G. (1986). Markov Processes: Characterization and
Convergence. Wiley, New York. MR0838085
[8] Folland, G. B. (1999). Real Analysis: Modern Techniques and Their Applications,
2nd ed. Wiley, New York. MR1681462
[9] Gilbarg, D. and Trudinger, N. S. (1977). Elliptic Partial Differential Equations
of Second Order. Springer, Berlin. MR0473443
[10] Harrison, J. M. and Reiman, M. I. (1981). Reflected Brownian motion on an
orthant. Ann. Probab. 9 302–308. MR0606992
[11] Hoffman, A. J. (1952). On approximate solutions of systems of linear inequalities.
J. of Research of the National Bureau of Standards 49 263–265. MR0051275
[12] Jacod, J. and Shiryaev, A. N. (1987). Limit Theorems for Stochastic Processes.
Springer, New York. MR0959133
[13] Kang, W., Kelly, F. P., Lee, N. H. and Williams, R. J. (2007). State space
collapse and diffusion approximation for a network operating under a fair band-
width sharing policy. Preprint.
[14] Taylor, L. M. and Williams, R. J. (1993). Existence and uniqueness of semi-
martingale reflecting Brownian motions in an orthant. Probab. Theory Related
Fields 96 283–317. MR1231926
[15] Williams, R. J. (1998). An invariance principle for semimartingale reflecting Brow-
nian motions in an orthant. Queueing Systems 30 5–25. MR1663755
[16] Williams, R. J. (1998). Diffusion approximations for open multiclass queueing net-
works: Sufficient conditions involving state space collapse. Queueing Systems
Theory Appl. 30 27–88. MR1663759
Department of Mathematical Sciences
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213
E-mail: weikang@andrew.cmu.edu
Department of Mathematics
University of California at San Diego
9500 Gilman Drive
La Jolla, California 92093
E-mail: williams@math.ucsd.edu
http://www.ams.org/mathscinet-getitem?mr=1346729
http://www.ams.org/mathscinet-getitem?mr=2222685
http://www.ams.org/mathscinet-getitem?mr=1096165
http://www.ams.org/mathscinet-getitem?mr=1207237
http://www.ams.org/mathscinet-getitem?mr=0838085
http://www.ams.org/mathscinet-getitem?mr=1681462
http://www.ams.org/mathscinet-getitem?mr=0473443
http://www.ams.org/mathscinet-getitem?mr=0606992
http://www.ams.org/mathscinet-getitem?mr=0051275
http://www.ams.org/mathscinet-getitem?mr=0959133
http://www.ams.org/mathscinet-getitem?mr=1231926
http://www.ams.org/mathscinet-getitem?mr=1663755
http://www.ams.org/mathscinet-getitem?mr=1663759
mailto:weikang@andrew.cmu.edu
mailto:williams@math.ucsd.edu
	Introduction
	Notation, terminology and preliminaries
	Definition of an SRBM
	Assumptions on the domain G and the reflection vector fields {i}
	Assumptions on the domain G
	Assumptions on the reflection vector fields {i}
	Invariance principle
	Oscillation inequality
	C-tightness result
	Invariance principle for SRBMs
	Applications of the invariance principle
	Weak existence of SRBMs
	SRBMs in convex polyhedrons with constant reflection fields
	SRBMs in bounded domains with piecewise smooth boundaries
	Appendix: Auxiliary lemmas
	References
	Author's addresses
ABSTRACT
  Semimartingale reflecting Brownian motions (SRBMs) living in the closures of
domains with piecewise smooth boundaries are of interest in applied probability
because of their role as heavy traffic approximations for some stochastic
networks. In this paper, assuming certain conditions on the domains and
directions of reflection, a perturbation result, or invariance principle, for
SRBMs is proved. This provides sufficient conditions for a process that
satisfies the definition of an SRBM, except for small random perturbations in
the defining conditions, to be close in distribution to an SRBM. A crucial
ingredient in the proof of this result is an oscillation inequality for
solutions of a perturbed Skorokhod problem. We use the invariance principle to
show weak existence of SRBMs under mild conditions. We also use the invariance
principle, in conjunction with known uniqueness results for SRBMs, to give some
sufficient conditions for validating approximations involving (i) SRBMs in
convex polyhedrons with a constant reflection vector field on each face of the
polyhedron, and (ii) SRBMs in bounded domains with piecewise smooth boundaries
and possibly nonconstant reflection vector fields on the boundary surfaces.

<|endoftext|><|startoftext|>
Finite Drude weight for 1D low temperature conductors
Dariush Heidarian and Sandro Sorella
Istituto Nazionale di Fisica della Materia (INFM)-Democritos, National Simulation Centre,
and Scuola Internazionale Superiore di Studi Avanzati (SISSA), I-34014 Trieste, Italy
We apply well established finite temperature QuantumMonte Carlo techniques to one dimensional
Bose systems with soft and hardcore constraint, as well as to spinless fermion systems. We give
clear and robust numerical evidence that, as expected, no superfluid density for Bosons or Meissner
fraction for fermions. is possible at any non zero temperature in one dimensional interacting Bose
or fermi lattice models, whereas a finite Drude weight is generally observed in gapless systems, in
partial disagreement to previous expectations.
PACS numbers: 74.25.Fy,71.27.+a,71.10.Fd
I. INTRODUCTION
In the last decades there have been a lot of numerical
and theoretical works to understand the role of strong
correlation in lattice model Hamiltonians.1,2,3,4,5,6,7 Re-
cently this issue has acquired an increasing attention and
remarkable importance, due to the recent advances in the
realization of optical lattices. In these experiments ultra-
cold atoms behave as boson particles trapped on particu-
lar lattice sites, whereas the interaction and the hopping
parameters can be tuned continuously. This important
achievement has opened the possibility to verify directly
the crucial role played by the electron correlation in very
important model Hamiltonians defined on a lattice. An
important example is the realization of a Mott insulating
state in a system with strong on site repulsion8,9. More-
over quite recently the possibility to include the Fermi
statistics in optical lattices appears very promising and
interesting.10
In 1D spinless fermion systems are equivalent to inter-
acting Bose systems with hard-core constraint and are
described by the same low energy theory -the Luttinger
liquid theory-. Indeed this theory holds also for soft-
core bosons, as shown in Ref.(7). Therefore, as far as
the transport properties are concerned one should expect
the same behavior both for fermions and bosons. On the
other hand for lattice models, even in absence of disor-
der, the current does not commute with the Hamiltonian,
implying its possible decay at finite temperature due to
the backscattering processes11. In this case the dynam-
ical current-current correlation function also decays in
time, leading to a current Fourier transform without δ
function at zero energy, namely without a finite Drude
weight within the linear response theory.
Until few decades ago the absence of the Drude weight
was the expected behavior of all interacting metals in lat-
tice models or in real solids at finite temperature. How-
ever a quite clear numerical evidence has been reported
in Ref.12 that current should not decay in integrable 1D
models, namely for Hamiltonians that can be solved by
Bethe ansatz techniques in 1D. These models essentially
possess some hidden conservation law, that was conjec-
tured to forbid the current decay process.12,13 Later sev-
eral groups have reproduced this surprising effect14,15,
with a noticeable exception that a finite Drude weight
at finite temperature was found also for non-integrable
models.15 On the other hand, from purely theoretical
grounds this issue is not settled yet: in Ref.11 it was ar-
gued that backscattering processes can be effective also
at finite temperature and in 1D non integrable models,
whereas in Ref.16, it was proposed that also some par-
ticular non integrable model could provide a conserved
current.
In this work we propose that the general behavior of
1D gapless systems is eventually characterized by a finite
Drude weight at finite temperature, and we have found
no exception in the models that we have studied. This
conclusion is based on a careful and systematic numerical
work on fairly generic one dimensional Bose and Fermi
systems, that all show the same behavior, even though
strong finite size effects are observed in the non integrable
cases.
In the following we investigate the behavior of the
Drude weight in 1D systems in the thermodynamic limit
and finite temperature.
Model and Method : We have studied hardcore and
softcore bosons in a 1D lattice with periodic boundary
conditions. The Hamiltonian studied reads,
iai+1 + h.c.) +
ni(ni − 1)
+V nini+1 +Wnini+2 − µni
The sum is over all lattice sites i, a
i/ai is the boson
creation/annihilation operator at site i, henceforth ni is
the particle number at site i andµ is the chemical poten-
tial. t is the hopping amplitude which is set to one, U is
the on-site repulsion, whereas V and W are the nearest
and the next-nearest neighbor interactions, respectively.
For hardcore bosons in the U → ∞ limit the Hamilto-
nian can be mapped onto an S = 1/2 spin system with
Szi = ni − 1/2 and S
i = a
i . In this work we present
our results for the half filled case of hardcore and soft-
core models. Most of our results have been obtained by
Quantum Monte Carlo (QMC), using the stochastic se-
ries expansion (SSE)5,17 with the directed loop update18.
http://arxiv.org/abs/0704.0406v1
Superfluid density ρs (or spin stiffness in the equiva-
lent spin model), is defined as the second derivative of
the free energy with respect to a twist in the boundary
conditions. In order to compute this quantity by QMC,
it is convenient to apply linear response theory, relating
this quantity to the current current response function
Λ(q, iωn) =
dτ exp(iωnτ)〈J(q, τ)J(−q, 0)〉/N , where
J is the current operator and ωn is Matsubara frequency.
Then the following expression for the superfluid density
is obtained:
ρs = 〈−K〉 − Λ(q = 0; iωn = 0) =
〈W 2〉
where 〈K〉 is the average kinetic energy per site, ωn =
2πn/β are the Matsubara frequencies andW is the wind-
ing number. Similarly the Drude weight is obtained with
the same expression but with a different order in the limit
ω → 0 and q → 0, namely15,19,20
D = 〈−K〉 − ReΛ(q = 0, ω → 0). (3)
In SSE one can obtain Λ very accurately in terms of
Matsubara frequencies. Therefore analytic continuation
of the data is required. In order to avoid difficulties of
extrapolation to iωn → 0 at large temperatures, we have
worked at relatively low temperatures (β ≥ 10).
In principle, due to the different order of lim-
its, the Drude weight and the superfluid density may
be different when the following quantity remains fi-
nite in the thermodynamic limit15: D − ρs =
En=Em
β exp(−βEn)|〈ψn|J |ψm〉|
2/L, where, J is the
current operator, while En and |ψn〉 are the n
th eigen-
value and eigenstate of the many body system, respec-
tively.
The current operator can be written as J(q = 0) =
) where H+
al+1 and b is the bond
index, corresponding to the site index l. The ensemble
average of product of two local operatorsHσ1
andHσ2
(τ)Hσ1
(0)〉 =
n,m=0
(τ − β)n(−τm)
〈ψk|H
HmHσ1
|ψk〉 (4)
where τ is the imaginary time, Z is partition function
and the summation over n and m comes from Taylor-
expansion of e(−β+τ)H and e−τH . Following Ref.17 the
relation (4) can be simplified to
(β − τ)ns−m−2τm
(ns − 1)!
(ns −m− 2)!m!
N b1b2,σ1σ2m
where ns is the length of sequence of the local operators
and it changes in each QMC sampling. N b1b2,σ1σ2m is the
number of times that two operatorsHσ1
and Hσ2
appear
in this sequence with distance of m local operators, and
〈...〉W indicates an arithmetic average using configura-
tions with relative weight W . In this work we introduce
an efficient way to sample by SSE the current-current
response function. To this end, we multiply expression
(5) by eiωnτ and integrate over the imaginary time τ , we
obtain:
1F1(m+ 1, ns; 2iπn)N
b1b2σ1σ2
where
1F1(m+ 1, n; z) =
(n− 1)!
(n−m− 2)!m!
dx exp(zx)xm(1− x)n−m−2 (7)
is the confluent hypergeometric function.
Therefore, the current-current correlation acquires
contributions determined by length of operator string
ns. All these contributions are stochastically sampled
in an efficient way, and in each statistical measurement
the correlation function Λ(q = 0, iωn) has the following
estimator:
σ1,σ2=±
1F 1(m+ 1, ns; 2iπn)N
m (8)
where Nσ1σ2m =
b1,b2
N b1b2,σ1σ2m .
Discussion: At zero temperature, for non degenerate
ground state, the Drude weight and the superfluidity are
the same. In a 1D system at any finite temperature ρs is
expected to be zero in the thermodynamic limit, whereas
the Drude weight can be non-zero. For hardcore and soft-
core bosons in a 1D lattice, a systematic size scaling of
the superfluid density ρs clearly shows that this quantity
vanishes in the thermodynamic limit and for any finite
temperature (see figures 1 and 2). Further, we find that,
for a fixed set of parameters and at half filling, all super-
fluidity data versus 1/L collapse to one curve whenever
the x-axis is appropriately scaled with the temperature
T (see figures 1 and 2). This analysis suggests the scal-
ing form ρs(β, L) ≡ ρs(β/L). If one takes the order of
limit T → 0 after L → ∞, superfluidity remains zero
even at zero temperature. Notice that by taking first the
limit T → 0 and then L → ∞ superfluidity has a finite
value for the gapless phase, but this is not a signature of
superfluidity, rather the occurrence of a finite zero tem-
perature Drude weight. Though in 1D is not possible to
have a finite superfluid density at any non zero tempera-
ture, several authors have identified the finite zero tem-
perature Drude weight with the superfluid density for a
superfluid with vanishing critical temperature. We be-
lieve that this identification is a bit confusing and there-
fore we prefer to think about absence of superfluidity and
superconductivity in 1D systems, as commonly reported
in the textbooks.
Fig. 3 shows the current-current correlation versus ωn
in the metallic and insulating phases of an integrable
FIG. 1: (color online) Superfluid stiffness for an integrable (a)
and a non-integrable (b) model versus β/L. The system size
L is ranging from 50 to 1200.
FIG. 2: (color online) Superfluidity of the soft-core bosons
versus scaled system size at half filling, the on-site interaction
is U = 4
model (W = 0, U = ∞). The zero-frequency value is
the superfluid density ρs and the limit ωn → 0 gives the
Drude weight D. For W = 0 at zero temperature, there
exists a critical value Vc/t = 2 below which the Drude
weight is finite. In the first case (a) shown in Fig.(3)
with V/t = 2 the Drude weight has a finite value at any
finite temperature, which is consistent with the previous
works12. In the insulating phase (case b) with V/t = 3,
the superfluid density coincides with the Drude weight
and they both tend to zero as the system size increases.
In a non-integrable model such as hard-core bosons
with nearest and next nearest neighborer interactions
earlier works have suggested zero Drude weight as system
size increases. With SSE we can go to very large system
sizes and low temperatures and check the scaling depen-
dence of the Drude weight. In Fig. 4 we have plotted
current-current correlation versus Matsubara frequency
for different L, and a fixed temperature T = 1/100. As
shown in the same Figure (4) we have also found a finite
Drude weight at finite T in the celebrated Bose-Hubbard
model with softcore constraint and in several other mod-
FIG. 3: (color online) (a) Current-current correlation for an
integrable model in the metallic phase. The zero frequency
data shows superfluidity while the extrapolation to n → 0 is
the Drude weight. D remains finite with increasing L while
ρs vanishes. (b) In the insulating phase D and ρs have the
same value and both tend to zero by increasing L.
FIG. 4: (color online) Response function vs. n for (a) hard-
core bosons with V/t = 1.5, W/t = 1, T/t = 1/100 and (b)
Bose-Hubbard model with softcore constraint and U/t = 2,
µ/t = −0.4, T/t = 1/25. The system sizes ranges from
L = 100 to L = 800.
els (not shown). Although some evidence that few par-
ticular non integrable models could have a finite Drude
weight at finite temperature have been reported before,
here we have found a very convincing evidence that this
behavior should be generic for 1D gapless system regard-
less from their integrability. We have supported this
statement by state of the art numerical calculations ob-
tained for very large system sizes and low temperature so
that all possible extrapolations are perfectly under con-
trol.
In conclusion it turns out that, at low energy, all gap-
less lattice models studied scale to the Luttinger liquid
fixed point where the backscattering is a marginally irrel-
evant coupling and the current is therefore conserved at
the fixed point. This is therefore a peculiar and generic
feature of 1D. Indeed in 2D systems, such as hardcore
bosons with n.n. repulsion in a square and triangular
lattice, we found no difference between ρs and D.
Acknowledgments
We thank M. Troyer for useful discussions. This work
is partially supported by COFIN-2005 and CNR.
1 E. L. Pollock and D. M. Ceperley Phys. Rev. B 36, 8343
(1987).
2 G. G. Batrouni, R. T. Scalettar and G. T. Zimanyi Phys.
Rev. Lett. 65, 1765 (1990), ibidem Phys. Rev. B 46, 9051
(1992).
3 L. I. Plimak, M. K. Olsen, and M. Fleischhlauer Phys. Rev.
A 70, 013611 (2004).
4 S. Wessel, F. Alet, M. Troyer, and G. G. Batrouni, Phys.
Rev. A 70, 053615 (2004).
5 A. W. Sandvik, Phys. Rev. B 56, 11678 (1997).
6 M. P. A. Fisher, P.B. Weichman, G. Grinstein and D. S.
Fisher, Phys. Rev. B 40, 546 (1989).
7 see e.g. M. A. Cazalilla J. Phys. B 37, S1 (2004) and ref-
erences therein.
8 M. Greiner et al. Nature (London) 415, 39 (2002).
9 M. Greiner et al. Nature (London) 426, 537 (2003).
10 see e.g. H. Moritz et al. Phys. Rev. Lett. 94, 210401 (2005)
and references therein.
11 A. Rosch and N. Andrei Phys. Rev. Lett. 85, 1092 (2000).
12 X. Zotos and P. Prelovs̈ek Phys. Rev. B 53, 983 (1996).
13 H. Castella, X. Zotos and P. Prelovs̈ek Phys. Rev. Lett.
74, 972 (1995).
14 D. Poilblanc and et al., Europhys. Lett. 22, 537 (1993).
15 S. Kirchner, H. G. Evertz and W. Hanke Phys. Rev. B 59,
1825 (1999).
16 S. Fujimoto and N. Kawakami, Phys. Rev. Lett. 90, 197202
(2003); ibid S. Fujimoto and N. Kawakami Jour. Phys. A
31, 465 (1998).
17 A. W. Sandvik, J. Phys. A 25, 3667 (1992).
18 O. F. Syljuasen and A. W. Sandvik, Phys. Rev. E 66,
046701 (2002).
19 D. J. Scalapino, S. R. White and S. Zhang Phys. Rev. B
47, 7995 (1993).
20 In principle there is a subtle issue related to the ω → 0
limit, that should be employed for real frequencies. We
assume here that the analytic continuation of the function
Λ(iωn) is possible, as it is obvious on any finite cluster,
and therefore this limit can be obtained by interpolation
of Matsubara frequencies around ω = 0, namely at small
enough temperatures.
ABSTRACT
  We apply well established finite temperature Quantum Monte Carlo techniques
to one dimensional Bose systems with soft and hardcore constraint, as well as
to spinless fermion systems. We give clear and robust numerical evidence that,
as expected, no superfluid density for Bosons or Meissner fraction for
fermions. is possible at {\em any} non zero temperature in one dimensional
interacting Bose or fermi lattice models, whereas a finite Drude weight is
generally observed in gapless systems, in partial disagreement to previous
expectations.

<|endoftext|><|startoftext|>
Introduction.
In Fig. 5a we show the asymmetry obtained with the
NL3 model for both numerical calculations in the TF ap-
proximation, i.e, nucleation and HO expansion methods.
In this case, the agreement is very satisfactory even for
larger q-values, although the small numerical discrepan-
cies is reflected in a ∼ 10 percent difference in the pre-
dicted neutron skin thickness, as can be seen from Table
II. Finally, in Fig. 5b our results for the NLωρ (using
two different values for the ω− ρ coupling constant) and
the TWmodels within the HO numerical prescription are
shown. Again, at low momentum transfers, all curves co-
incide. However, it should be noticed that even for two
different model parametrizations which lead us to identi-
cal neutron skin thicknesses, a measurement of the asym-
metry in a higher q-region with a modest experimental
precision, can distinguish between them. Also, we should
expect that the asymmetry presents more structure in
this high momentum transfer region if we solve the Dirac
equation instead of using the TF approach, once the high
q value region is much more sensitive to the central part
of the neutron distribution, which is known to be flat in
the TF approximation. These differences can be seen in
Fig.4.
 NL3 nucl
 TW nucl
 NL3 HO
 TW HO
r(fm)
FIG. 3: Difference between neutron and proton densities ob-
tained with the Thomas-Fermi approach solved with both nu-
merical prescriptions for the TW model.
VII. DIFFERENT EOS, DIFFERENT NEUTRON
SKINS
For the sake of completeness, at this point, we discuss
some of the differences between the TW, the NLωρ mod-
els and the NL3 parametrization of the NLWM.
From Fig. 1 one can see that the largest possible pres-
sure for a phase coexistence in the TW model is much
lower, and appears at a lower proton fraction than the
NL3 model. This gives rise to a thinner crust within
the TW model, which may imply that the more exotic
pasta shapes will not form [5]. The NLωρ model goes
on a different direction, i.e., the pressure becomes higher
than the one obtained with the NL3 as the Λv coupling
is turned on.
Although the nuclear matter properties fitted to
parametrize the models are quite similar (see Table I),
the way the EoS behaves when extrapolated to higher
or lower densities can vary a lot from a density depen-
dent hadron model to one of the parametrizations of the
NLWM. Moreover, as seen from Table I, although the ef-
fective mass at saturation density is lower with the TW
than with the NL3, it can accommodate hyperons if an
EoS for stellar matter is necessary, contrary to the usual
0.0 0.5 1.0 1.5 2.0
 no structure
 HS-Dirac 
 HS-TF A
q(fm-1)
0.30 0.32 0.34 0.36 0.38 0.40 0.42 0.44 0.46 0.48 0.50
2.5x10-7
5.0x10-7
7.5x10-7
1.0x10-6
1.3x10-6
1.5x10-6
 no structure
 HS-Dirac 
 HS-TF 
q(fm-1)
FIG. 4: Parametrization HS, comparison Thomas-Fermi-HO
versus Dirac-HO
NL3 parametrization [32–34].
TABLE I: Nuclear matter properties.
NL3 NLωρ TW
[25] [36] [24]
Λv = 0.01 Λv = 0.02 Λv = 0.025
B/A (MeV) 16.3 16.3 16.3 16.3 16.3
ρ0 (fm
−3) 0.148 0.148 0.148 0.148 0.153
K (MeV) 271 271 271 271 240
Esym. (MeV) 37.4 34.9 33.1 32.3 32.0
M∗/M 0.60 0.60 0.60 0.60 0.56
L (MeV) 118 88 68 61 55
Ksym (MeV) 100 -46 -53 -34 -124
Another quantity of interest in asymmetric nuclear
matter is the nuclear bulk symmetry energy, shown in
Table I for the saturation point. The differences in the
symmetry energy at densities larger than the nuclear sat-
uration density is still not well established, but has al-
ready been extensively discussed in the literature even
0.0 0.5 1.0 1.5 2.0
q(fm-1)
 NL3 nucl
 NL3 HO
 no structure
0.0 0.5 1.0 1.5 2.0
q(fm-1)
 NL ( v=0.01)
 NL ( v=0.025)
FIG. 5: Asymmetry obtained with a) NL3 with both numer-
ical prescriptions and b)parametrizations NLωρ and TW
for the TW model [7, 8, 16, 32]. Again, for the sake
of completeness we reproduce these results here because
the neutron skin thickness and the neutron star EoS are
related by this quantity [1–4], which is usually defined
as Esym = 12
∂2E/ρ
, with δ = −ρ3/ρ = 1 − 2yp. The
symmetry energy can be analytically rewritten as
Esym =
ρ, (52)
for the TW model and as
Esym =
ρ (53)
with the effective ρ-meson mass defined as [3]
= m2ρ + 2g
for the NLωρ model. In both cases
kFp = kF (1 + δ)
1/3, kFn = kF (1− δ)1/3,
with kF = (1.5π
2ρ)1/3 and ǫF =
k2F +M
∗2. In equa-
tions (52) and (53) the second term dominates at large
densities. It is seen that the non-linear ρ − ω terms
introduce a non-linear density behavior in the symme-
try energy of the NLWM parametrizations such as NL3
and TM1. In TW the non-linear density behavior en-
ters through the density dependent coupling parameters.
These non-linear density behavior is important because
the linear behavior of NL3 and TM1 parametrizations
predicts too high symmetry energy at densities of impor-
tance for neutron star matter which has direct influence
on the proton fraction dependence with density. From
Fig. 6, it is easily seen that the symmetry energy ob-
tained with the TW model behaves in a very different
way, as compared with NL3. In [4] a relation between
the symmetry energy and the nuclear binding energy is
discussed : the harder the EoS, the more the symmetry
(fm  )−3ρ
vΛ   =0.01
vΛ   =0.025
 0  0.05  0.1  0.15  0.2  0.25  0.3  0.35  0.4
FIG. 6: Symmetry energy for the NL3, TW and NLωρmodels.
energy rises with density. The density dependence dis-
cussed in [4] is of the type introduced in [3, 36] through
the inclusion of a σ− ρ and/or ω− ρ couplings and then,
similar with the NLωρ model discussed here. One can
observe that as the strength of the coupling increases,
the symmetry energy gets closer to the TW curve. In
fact, in [8] it was shown that once this kind of coupling
is introduced with a reasonable strength, the symmetry
energy at low densities tends to behave as the TW model.
The symmetry energy can be expanded around the nu-
clear saturation density and reads
Esym(ρ) = Esym(ρ0) +
ρ− ρ0
ρ− ρ0
where L and Ksym are respectively the slope and the
curvature of the nuclear symmetry energy at ρ0 and they
are calculated from
L = 3ρ0
∂Esym(ρ)
|ρ=ρ0 Ksym = 9ρ20
∂2Esym(ρ)
|ρ=ρ0 .
These two quantities can provide important information
on the symmetry energy at both high and low densi-
ties because they characterize the density dependence of
the energy symmetry. In a recent work [49], the authors
found a correlation between the slope of the symmetry
energy and the neutron skin thickness. In their work 21
sets of the non-relativistic Skyrme potential were inves-
tigated and only 4 of them were shown to have L values
consistent with the values extracted from experimental
isospin diffusion data from heavy ion collisions. In fact,
the extracted value was L = 88 ± 25 MeV [50], which
gives a very strong constraint on the density dependence
of the nuclear symmetry energy and consequently on the
EoS as well. A detailed analysis of Table I shows that, if
this constraint is to be taken seriously, neither the NL3
nor the TW model satisfy it. Nevertheless, the NLωρ
slope interpolates beautifully between the NL3 and TW
slope values. Once again it is seen that the increase in
Λv approximates the NL3 model values for the slope and
energy symmetry to the TW values. Moreover, we have
also tried to find a correlation between the θ values shown
in Table II and L values displayed in Table I. We found
that, as far as some numerical imprecision are consid-
ered, larger values of L correspond to larger values of the
neutron skin, as seen in Fig. 7.
Let’s now go back to the problem of solving the dif-
ferential equations within the nucleation numerical pre-
scription. As we need boundary conditions arising from
the liquid-gas phase coexistence in order to solve eqs.
(12-15) for the TW model and eqs. (22-25) for the NLωρ
model, the binodal sections are essential and the spinodal
sections, which separate the regions of stable to unsta-
ble matter are also of interest. If we had displayed the
binodals in a ρp versus ρn plot, as it is done with the
spinodals in Fig 8, we could see that the spinodals sur-
faces lie inside the binodal sections and share the critical
point corresponding to the highest pressure.
In Fig. 8 the spinodals for the three different mod-
els discussed in this work are shown. Once again, some
of these results can also be found in the recent litera-
ture [7, 8], but we include them here to make a direct
link with the binodals. The instability of the ANM sys-
tem is essentially determined by density fluctuations in
the isoscalar channel. Although the spinodals are, by
themselves, not relevant in calculations performed at the
thermodynamical equilibrium, the isospin channel is very
sensitive to the instabilities occurring below the nuclear
saturation density. The spinodal is determined by the
values of pressure, proton fraction and density for which
the determinant of
Fij =
∂ρi∂ρj
, (56)
where F is the free energy density, goes to zero. A de-
tailed analysis of this quantity can be found in [8, 42].
From Fig. 8, it is seen that the instability region in the
ρp/ρn plane, defined by the inner section of the spinodal
curve is larger for the TW than for the NL3 model. The
size of the instability region depends on the derivative
of the chemical potentials with respect to the neutron
and proton densities. At low densities different models
exhibit different behaviors.
The presence of the rearrangement term in the TW
model also plays a decisive role. Even though a rela-
tively large compensation exists between scalar and vec-
tor mesons in the isoscalar channels within the rearrange-
ment term at low densities, the spinodal region is defined
by the derivative of the chemical potential and therefore
of the rearrangement term.
Next we examine the spinodals obtained with differ-
ent coupling strengths for the NLωρ model. As seen in
Fig. 8, there is almost no difference between the different
curves. They all fall around the original NL3 curve but
once again, they tend to the TW curve as the coupling
strength increases. However, contrary to the TW model,
it was shown in [9] that the direction of the instability in
Λ  =0.01
Λ  =0.02
Λ  =0.025v 0.16
 0.18
 0.22
 0.24
 50  60  70  80  90  100  110  120
L (MeV)
FIG. 7: Correlation between the neutron skin θ and the slope
of the symmetry energy L.
NLωρ increases distillation as the density increases, and
the larger the coupling Λv the larger the effect.
Finally, to end this section, let’s make our points clear:
we have used a simple mean field theory approach to ob-
tain the boundary conditions for the equations of motion
of the meson fields in the nucleation prescription. These
boundary conditions depend on the model used and are
intrinsically related with the liquid-gas phase transition
which, in turn, can be well understood by studying the
coexistence surfaces of the corresponding models. On
the other hand, the neutron skin thickness shows a lin-
ear correlation with the slope of the symmetry energy,
as already pointed out in [49] for non-relativistic mod-
els. Based on the different behaviors found with density
dependent hadronic models and the NLWM, an obvious
consequence is the fact that the neutron skin thickness
depends on the choice of the model.
VIII. CONCLUSIONS
We have calculated the 208Pb neutron skin thickness
with two different density dependent hadronic models,
the TW and the NLωρ model, and one of the most used
parametrizations of the NLWM, the NL3. The calcu-
lations were done within the Thomas-Fermi approxima-
tion, which gives quite accurate results for the asymme-
try in the momentum transfer range of interest for the
calculation of neutron skins. In implementing the nu-
merical results two different prescriptions were used: the
first one based on the nucleation process and the second
one based on the harmonic oscillator basis method. We
have seen that when the nucleation method is performed,
the neutron radius is systematically larger, what results
vΛ   =0.01
vΛ   =0.025
ρ (fm   )−3
 0.02
 0.04
 0.06
 0.08
 0  0.02  0.04  0.06  0.08
FIG. 8: Spinodal section in terms of ρp versus ρn for the NL3,
TW and NLωρ models.
in a thicker neutron skin. This is a consequence of the
fact that the surface energy is lower within the nucleation
calculation than within the harmonic oscillator method.
Within the same numerical prescription, the neutron skin
thickness is smaller with the TW model than with the
NL3. As the coupling strength Λv increases in the NLωρ
model, the neutron skin thickness moves from the orig-
inal NL3 towards the TW results. We have also found
that although the neutron skin thickness is model depen-
dent, the asymmetry at low momentum transfers (below
0.5 fm−1) is very similar for all models and all numerical
prescriptions. As q increases, the asymmetry also be-
comes model dependent. The density profiles obtained
from the solution of the Dirac equation exhibits oscil-
lations near the center of the nucleus, behavior which is
not reproduced within the Thomas-Fermi approximation.
This fact shows up in the asymmetry at large momentum
transfers and therefore all the calculations should be re-
produced by solving the Dirac equation. This calculation
is already under investigation.
It is worth mentioning that the neutron skin thickness
has shown to give hints on the equations of state that
are suitable to describe neutron stars. Moreover, in [49]
a correlation between the slope of the symmetry energy
and the neutron skin thickness was found for Skyrme-
type models. We have observed that this correlation was
also present in the density dependent models we have
studied in the present work.
ACKNOWLEDGMENTS
This work was partially supported by CNPq(Brazil),
CAPES(Brazil)/GRICES (Portugal) under project
100/03 and FEDER/FCT (Portugal) under the projects
POCTI/FP/63419/2005 and POCTI/FP/63918/2005.
[1] S. Typel and B.A. Brown, Phys. Rev. C 64, 027302
(2001).
[2] A.W. Steiner, M. Prakash, J.M. Lattimer and P.J. Ellis,
Phys. Rep. 411, 325 (2005).
[3] C.J. Horowitz and J.Piekarewicz, Phys. Rev. Lett. 86,
5647 (2001).
[4] J.Piekarewicz, nucl-th/0607039. Proceedings of the ”In-
ternational Conference on Current Problems in Nuclear
Physics and Atomic Energy” (May 29 - June 3, 2006)
Kyiv, UKRAINE.
[5] F. Duchoin and Haensel, Phys. Lett. B 485, 107 (2000).
[6] Ph. Chomaz, C. Colonna and J. Randrup, Phys. Rep.
389, 263 (2004).
[7] S.S. Avancini, L. Brito, D. P. Menezes and C.
Providência, Phys. Rev. C 70, 015203 (2004).
[8] S.S. Avancini, L. Brito, Ph. Chomaz, D. P. Menezes and
C. Providência, Phys. Rev. C 74, 024317 (2006).
[9] C. Providência, L. Brito, S.S. Avancini, D. P. Menezes
and Ph. Chomaz, Phys. Rev. C 73, 025805 (2006).
[10] L. Brito, C. Providência, A.M.S. Santos, S.S. Avancini,
D. P. Menezes and Ph. Chomaz. Phys. Rev. C (2006),
C 74, 045801 (2006); C. Providência, L. Brito, A.M.S.
Santos, D.P. Menezes and S.S. Avancini, Phys. Rev. C
74, 045802 (2006).
[11] Ph. Chomaz and F. Gulminelli, Phys. Lett. B447, 221
(1999) 221; H. S. Xu, et al, Phys. Rev. Lett. 85, 716
(2000).
[12] C. Ducoin, Ph. Chomaz and F. Gulminelli, Nucl. Phys.
771, 68 (2006).
[13] Ph. Chomaz, Nucl. Phys. A 685, 274c (2001).
[14] E. Chabanat, P. Bonche, P. Haensel, J. Meyer and R.
Schaeffer, Nucl. Phys. A 627, 710 (1997).
[15] C. Providência, D. P. Menezes, L. Brito and Ph. Chomaz,
in preparation.
[16] B.A. Li, C.M. Ko and W. Bauer, Inter. J. Mod. Phys. E
7, 147 (1998).
[17] K. Pomorski, P. Ring, G.A. Lalazissis, A. Baran, Z. Lo-
jewski, B. Nerlo-Pomorska, M. Warda, Nucl. Phys. A
624, 349 (1997).
[18] D. G. Ravenhall, C. J. Pethick, and J. R. Wilson, Phys.
Rev. Lett. 50, 2066 (1983); M. Hashimoto, H. Seki, and
M. Yamada, Prog. Theor. Phys.71, 320 (1984).
[19] H. de Vries, C.W. de Jager and C. de Vries, Atomic and
Nuclear Data Tables 36, 495 (1987).
[20] C.J. Horowitz, S.J. Pollock, P.A. Souder and R. Michaels,
Phys. Rev. C 63, 025501 (2001).
[21] T.W. Donnelly, J. Dubach and I. Sick, Nucl. Phys. A503
589 (1989).
[22] K.A. Aniol et al. (HAPPEX) (2005), nucl-
ex/0506010; ibidem, nucl-ex/0506011; R. Michaels,
P.A. Souder and G.M. Urciuoli (2005), URL
http://hallaweb.jlab.org/parity/prex.
[23] H. Lenske and C. Fuchs, Phys. Lett. B 345, 355 (1995);
C. Fuchs, H. Lenske and H.H. Wolter, Phys. Rev. C 52,
3043 (1995).
[24] S. Typel and H. H. Wolter, Nucl. Phys. A656, 331
(1999).
[25] G. A. Lalazissis, J. König and P. Ring, Phys. Rev. C 55,
540 (1997).
[26] K. Sumiyoshi, H. Kuwabara, H. Toki, Nucl. Phys. A 581,
725 (1995).
[27] C. Fuchs, H. Lenske and H.H. Wolter, Phys. Rev. C 52,
3043 (1995).
[28] H. Lenske and C. Fuchs, Phys. Lett. B 345, 355 (1995).
[29] B. ter Haar and R. Malfliet, Phys. Rep. 149, 207 (1987).
[30] T. Niks̆ić, D. Vretenar, P. Finelli and P. Ring, Phys. Rev.
C 66, 024303 (2002).
[31] B. Serot and J.D. Walecka, Advances in Nuclear Physics
16, Plenum-Press, (1986) 1.
[32] S.S. Avancini and D.P. Menezes, Phys. Rev. C 74,
015201 (2006).
[33] D.P. Menezes and C. Providência, Phys. Rev. C 68,
035804 (2003); Braz. J. Phys. 34, 724 (2004).
[34] A.M.S. Santos and D.P. Menezes, Phys. Rev. C 69,
045803 (2004).
[35] C.J. Horowitz and J.Piekarewicz, Phys. Rev.C 64,
062802 (2001).
[36] J.K. Bunta and S. Gmuca, Phys. Rev. C 68, 054318
(2003).
[37] J.K. Bunta and S. Gmuca, Phys. Rev. C 70, 054309
(2004).
[38] D.P. Menezes and C. Providência, Nucl. Phys. A 650,
283 (1999); D.P. Menezes and C. Providência, Phys. Rev.
C 60, 024313 (1999); D.P. Menezes and C. Providência,
Phys. Rev. C 64, 044306 (2001).
[39] Y.K. Gambhir, P. Ring and A. Thimet, Ann. Phys. 198,
132 (1990).
[40] T. Gaitanos, M. Di Toro, S. Typel, V. Baran, C. Fuchs,
V. Greco and H. H. Wolter, Nucl. Phys. A 732, 24
(2004).
[41] S.S. Avancini, M.E. Bracco, M. Chiapparini and D.P.
Menezes, J. Phys. G 30, 27 (2004); S.S. Avancini, M.E.
Bracco, M. Chiapparini and D.P. Menezes, Phys. Rev. C
67, 024301 (2003).
[42] J. Margueron and P. Chomaz, Phys. Rev. C 67, 041602
(2003).
[43] C.J. Horowitz, Phys. Rev. C57, 3430 (1998).
[44] C.J. Horowitz and B.D. Serot, Nucl. Phys. A 368, 503
(1981).
[45] G.Fricke, C. Bernhardt, K.Heilig, L.A. Schaller, L.
Schellinberg, E.B. Shera,C.W. de Jager, At. Data Nucl.
Data Tables 60 (1995)177.
[46] G. Audi, A.H. Waptra, C. Thibault, Nucl. Phys. A 729,
337 (2003).
[47] A. Krasznahorkay et a., Nucl. Phys. A 731, 224 (2004).
[48] V.E. Starodubsky, N.M. Hintz,Phys. Rev.
C49,2118(1994).
[49] L. Chen, C.M. Ko and B. Li, nucl-th/0610057.
[50] M.B. Tsang et al., Phys. Rev. Lett. 92, 062701 (2004).
TABLE II: 208 Pb properties
model approximation Rn Rp θ B/A σ
(fm) (fm) (fm) MeV Mev/fm2
NL3 TF+nucleation 5.88 5.65 0.24 -7.77 0.76
NL3 TF+HO 5.79 5.57 0.22 -7.79 0.96
NLωρ, Λv = 0.01 TF+HO 5.77 5.57 0.20 -7.73 0.98
NLωρ, Λv = 0.02 TF+HO 5.75 5.57 0.17 -7.65 0.99
NLωρ, Λv = 0.025 TF+HO 5.74 5.58 0.16 -7.63 1.00
TW TF+nucleation 5.71 5.50 0.22 -6.42 1.08
TW TF+HO 5.68 5.52 0.16 -7.46 1.10
HS TF+HO 5.70 5.47 0.24 -6.10 1.37
exp.[45] 5.44
exp. [46] -7.87
exp. [47] 0.12± 0.07
exp. [48] 0.20± 0.04
ABSTRACT
  In the present work we investigate the main differences in the lead neutron
skin thickness, binding energy, surface energy and density profiles obtained
with two different density dependent relativistic hadronic models, within the
Thomas-Fermi approximation. We show that the asymmetry parameter for low
momentum transfer polarized electron scattering is not sensitive to the model
parametrization differences.

<|endoftext|><|startoftext|>
Introduction
Tunneling and over–barrier reflection are the characteristic non–perturbative phenomena in
quantum mechanics. They typically occur with exponentially small probabilities,
P ∝ e−F/~ , (1)
where F is the suppression exponent; still, the above phenomena are indispensable in under-
standing a wide variety of physical situations, from the generation of baryon number asym-
metry in the early Universe [1] to chemical reactions [2] and atom ionization processes [3].
During the last decades extensive investigations of tunneling processes in systems with
many degrees of freedom have been performed [2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. These studies
1levkov@ms2.inr.ac.ru
2panin@ms2.inr.ac.ru
3Sergey.Sibiryakov@cern.ch, sibir@ms2.inr.ac.ru
http://arxiv.org/abs/0704.0409v2
revealed a rich variety of features of multidimensional tunneling which are in striking contrast
to the properties of one–dimensional tunneling and over–barrier reflection. In particular, the
following phenomenon has been observed: the probability of tunneling may depend non-
monotonically on the total energy of the system and exhibit resonance-like peaks. One can
envisage three physically different mechanisms of this phenomenon. The first mechanism,
present already in one-dimensional case, is tunneling via creation of a metastable state.
In this case the tunneling probability at the maximum of the resonance is exponentially
higher than at other energies. On the other hand, the resonance width ∆E is exponentially
suppressed; so, after averaging with an energy distribution of a finite width the effect of the
resonance is washed out in the semiclassical limit ~ → 0. The second possible mechanism of
non-monotonic behavior of P(E) is quantum interference [7, 13] (see also [14]). In this case
the peak value of the tunneling probability is only by a factor of order one higher than the
average value, while the width of the resonance scales as ∆E ∝ ~. Again, the resonances
become indiscernible in the semiclassical limit. In both these cases the resonances can
be attributed to the subleading semiclassical corrections, i.e. non-monotonic behavior of
the pre-exponential factor omitted in Eq. (1). The third possibility is that the suppression
exponent F (E) is non-monotonic. In this case the existence of the “resonances” is the leading
semiclassical effect: the optimal tunneling probability at the maximum of the resonance is
exponentially higher than the probability at other energies. At the same time the resonance
width scales as4 ∆E ∝
~. This last possibility of “optimal tunneling” is definitely of
interest; yet, it did not receive much attention in literature. We are aware of only a few works
mentioning non–monotonic dependence of the suppression exponent on energy [15, 16, 14].
It is worthwhile studying this phenomenon in detail; this can provide a new insight into the
dynamics of multidimensional tunneling.
In this paper we consider the process of over–barrier reflection in a simple model with two
degrees of freedom. Our setup is interesting in two respects. First, the model under study
is essentially non–linear and the variables cannot be separated; still, over–barrier reflections
in this model can be described analytically within the semiclassical framework. Thus, this
model can serve as an analytic laboratory for the study of multidimensional tunneling. Sec-
ond, the suppression exponent F of the reflection process behaves non–monotonically as the
4This follows from the representation
P(E) ∝ exp
F (Eo)
F ′′(Eo)(E − Eo)2
of the tunneling probability in the vicinity of the maximum.
total energy E changes. We demonstrate that the function F (E) possesses a number of
local minima E = Eo, where reflection is optimal. We stress that the process we study is
exponentially preferable at “optimal” energies as compared to other energies.
Our model describes the motion of a quantum particle in the two dimensional harmonic
waveguide (see Refs. [8, 10, 14] for similar models). The Hamiltonian is
w2(x, y) ,
where x, y are the Cartesian coordinates and m is the mass of the particle. The function
U = mω2w2/2 represents the waveguide potential in two dimensions: a particle with small
energy is bound to move along the line w(x, y) ≈ 0. We do not introduce a potential barrier
across the waveguide and consider the case when the line w = 0 stretches all the way from
x → −∞ to x → +∞. We also assume that the function w(x, y) is linear in the initial
asymptotic region,
w(x, y) → y as x→ −∞ .
In the present paper we consider two particular cases of the function w(x, y) describing
waveguides with one and two sharp turns5, see Fig. 1.
The motion of the particle at x → −∞ is a superposition of free translatory motion in
x direction and oscillations of frequency ω along y coordinate; the state of such a particle is
fully characterized by two quantum numbers, the total energy E and y–oscillator excitation
number N . The particle sent into the waveguide from the asymptotic region x→ −∞ with
given E, N may either continue to move towards x → +∞, or reflect back into the region
x→ −∞. We are interested in the probability P(E,N) of reflection.
Let us discuss reflections at the classical level. [Note that the classical counterpart of N
is the energy of transverse oscillations.] Consider first the waveguide with one sharp turn
(Fig. 1a). One observes that the outcome of the classical evolution, i.e. whether or not
the particle reflects from the turn, depends not only on the total energy E, but on other
dynamical quantities as well. In particular, the direction of the momentum of the particle
in the vicinity of the turn (point C on the graph) is important. This means that the entire
dynamics in the waveguide should be taken into account in order to determine the possibility
of classical reflection. This is in sharp contrast with the situation in one–dimensional case,
where reflection from the potential barrier (or transition through it) is ensured by the value
of the conserved energy of the particle.
5The explicit expressions for the waveguide functions w(x, y) will be presented in the subsequent sections.
Figure 1: The equipotential contour U = E for the waveguides with (a) one and (b) two
sharp turns. An example of classical trajectory is shown in the case (b).
Now, consider the waveguide with two turns. The model is characterized by the angles
of the turns and the distance L between them (see Fig. 1b). Suppose the particle starts
moving classically from x → −∞ with N = 0 along the valley w = 0. Then, the transverse
oscillations get excited only after the particle crosses the first turn, point C ′ on the plot, so
that at the time of arrival to the second turn (point C) approximately ωτ/2π oscillations
are made, where τ ∼ L
m/2E is the time of motion between the two turns. The state of
the particle (coordinates and momenta) at which it comes across the second turn depends
periodically on the phase of transverse oscillations ωτ . Hence, one expects that the regime
of motion of the classical particle can change from transmission to reflection and back as the
energy grows (τ decreases); the energies where it happens can be roughly estimated as
mω2L2
2(2πn)2
. (2)
We will see that this is indeed the case for the waveguides with certain angles of the turns.
At some values of E, N the reflection process cannot proceed classically. Then, at the
quantum mechanical level its probability is exponentially suppressed, F (E,N) > 0. It is
natural to call such a process “over–barrier reflection”6. The central quantity to be studied
below is the suppression exponent F (E,N) of this process. The above discussion suggests
that F (E,N), being determined by the entire dynamics in the waveguide, may be a highly
non–trivial function. For the particular case of the waveguide with alternating regimes of
6By this term we want to emphasize that the process is classically forbidden. Recall, however, that there
is no actual potential barrier across the waveguide in our setup.
classical reflections and transmissions F should oscillate: F = 0 at the energies where the
classical reflections are allowed, and F > 0 at the energies where the reflections are classically
forbidden. One can expect that the similar oscillatory behavior of the suppression exponent
persists for other two–turn models as well. Now, instead of reaching zero, F may possess
a number of local positive minima implying that the reflection at the “optimal” energies is
still a tunneling process.
Let us emphasize the difference of the “optimal tunneling” from quantum interference
and resonance phenomena in our two–turn model. The interference of the de Broglie waves
reflected from the two turns can, in principle, lead to oscillations in the reflection proba-
bility P(E). One can estimate the positions of the interference peaks by equating the De
Broglie wavelength of the particle to an integer fraction of the distance between the turns,
2mE ∼ L/n. This yields the energies of the interference peaks,
Eintn ∼
(2πn)2~2
This formula is completely different from Eq. (2) for the peaks due to “optimal tunneling”.
In particular, the distance between the adjacent inteference peaks,
∆Eint ∼
scales proportional to ~. Thus, these peaks should be averaged over in the semiclassical
limit. Besides, the amplitude of the interference peaks is at most of order one and does
not affect the suppression exponent. Indeed, the exponential increase of the scattering
amplitude can arise due to quantum interference only in the presence of a resonant state
with exponentially long life–time. This state should be supported somewhere in between the
turns and should be classically stable. In Sec. 4.2 we show that such states are absent in
our system. One concludes that the peak–like structure of the probability P(E) of “optimal
tunneling” is caused by completely different physical reasons as compared to the case of
resonance scattering in quantum theory.
It is worth noting that the phenomenon of “optimal tunneling” has an important imple-
mentation in field theory. Recently it was argued [17] (see also Ref. [16]) that the probability
of tunneling induced by particle collisions [18, 19] reaches its maximum at a certain “op-
timal” energy and stays constant7 at higher energies. This result, if generic, provides the
7As opposed to the quantum mechanical case, the tunneling probability does not decrease at energies
higher than the “optimal” one. This is due to the possibility, specific to the field theoretical setup, to emit
the excess of energy into a few hard particles, so that tunneling effectively occurs at the “optimal” energy.
answer to the long–standing question [20] about the high–energy behavior of the probability
of collision–induced nonperturbative transitions in field theory. The quantum mechanical
model presented here supports the generic nature of the phenomenon of “optimal tunnel-
ing”; the simplicity of our model enables one to get an intuitive insight into the nature of
this phenomenon.
The paper is organized as follows. In Sec. 2 we review the semiclassical method of complex
trajectories, which is exploited in the rest of the paper. Reflections in the waveguides with
one and two turns are considered in Secs. 3 and 4 respectively. We discuss our results in
Sec. 5. In appendix we analyze the validity of some assumptions made in the main body of
the paper.
2 The semiclassical method
We start by describing the semiclassical method8 of complex trajectories which will be used
in the study of over–barrier reflections. We concentrate on the derivation of the formula
for the suppression exponent F (E,N) (see Refs. [2, 8, 9] for the details of the method and
Ref. [19] for the field theory formulation). In what follows we use the system of units
~ = m = ω = 1 ,
where the Hamiltonian takes the form,
p2x + p
y + w
2(x, y)
. (3)
One starts with the amplitude of reflection into the state with definite coordinates
xf < 0 , yf ,
A = 〈xf , yf |e−iĤ(tf−ti)|E, N〉 . (4)
Here |E, N〉 is the initial state of the particle moving in the asymptotic region xi → −∞
with fixed translatory momentum p0 =
2(E −N) and the oscillator excitation number N .
Semiclassically,
〈xi, yi|E, N〉 = eip0xi cos
′)dy′ + π/4
, (5)
8Note that the method has been confirmed by the explicit comparison with the exact quantum mechanical
results in Refs. [8, 9, 14]; specifically, the recent check [14] deals with the case when the dependence of the
suppression exponent on energy is not monotonic.
where xi, yi denote initial coordinates,
2N − y′2 , (6)
and we omitted the pre-exponential factor which is irrelevant for our purposes. Using Eq. (5),
one rewrites the amplitude (4) as a path integral,
dxidyi
[dx][dy]
xf , yf
xi, yi
eiS+ip0xi cos
′)dy′ + π/4
, (7)
where S is the classical action of the model (3).
In the semiclassical case the integral (7) is dominated by the (generically complex) saddle
point. Note that, as we continue the integrand in Eq. (7) into the plane of complex coor-
dinates, one of the exponents constituting the initial oscillator wave function grows, while
the other becomes negligibly small. Within the validity of our approximation, we omit the
decaying exponent by writing
′)dy′ + π/4
→ exp
′)dy′
, (8)
with the standard choice9 of the branch of the square root in Eq. (6).
One proceeds by finding the saddle point for the integral (7) with the substitution (8).
Extremization with respect to x(t), y(t) leads to the classical equations of motion,
ẍ = −wwx , ÿ = −wwy . (9)
Differentiating with respect to xi ≡ x(ti), yi ≡ y(ti), one obtains,
ẋi = p0 =
2(E −N) , ẏi = py(yi) =
2N − y2i .
The latter equations are equivalent to fixing the total energy E and initial oscillator energy
N of the complex trajectory,
ẋ2i +N , (10a)
ẏ2i + y
. (10b)
9 The correct branch is fixed by drawing a cut between the oscillator turning points y = ±
2N , and
choosing Im py > 0 at y ∈ R, y >
2N , see, e.g., Refs. [21].
Substituting the saddle–point configuration10 into Eq. (7), one obtains the amplitude of the
process with exponential accuracy,
A ∝ eiS+iB(xi, yi) ,
where the term
B(xi, yi) = p0xi +
′)dy′ (11)
is the initial–state contribution. For the inclusive reflection probability one writes,
dxfdyf |A|2 ∝
dxfdyf e
iS−iS∗+iB−iB∗ .
The integral over the final states can also be evaluated by the saddle point technique; ex-
tremization with respect to xf ≡ x(tf ), yf ≡ y(tf) fixes the boundary conditions in the
asymptotic future,
Im ẋf = Im xf = 0 , Im ẏf = Im yf = 0 . (12)
In this way one obtains the expression (1) for the reflection probability, where the suppression
exponent F is given by the value of the functional
F (E, N) = 2 ImS + 2 ImB(xi, yi)
evaluated on the saddle–point configuration — a complex trajectory satisfying the boundary
value problem (9), (10), (12).
The contribution B(xi, yi) of the initial state is simplified after one uses the asymptotic
form of the solution at t→ −∞ (xi → −∞),
x = p0t + x0 , y = ae
−it + āeit . (13)
Equations (10) guarantee that the quantities p0 =
2(E −N) and 2aā = N are real, since
E, N ∈ R. Therefore, one may introduce two real parameters T , θ as follows,
2 Im x0 = −p0T , ā = a∗eT+θ . (14)
One finds for the initial term (11),
2 Im B(xi, yi) = Im
2p0xi − 2Narccos(yi/
2N) + yi
2N − y2i
= −p20T −N(T + θ) + Im(yiẏi) ,
10For simplicity we assume that the saddle–point configuration is unique. Otherwise, one should take the
saddle point corresponding to the weakest exponential suppression.
and thus
F = 2 Im S̃ −ET −Nθ , (15)
where S̃ is the classical action of the system (3) integrated by parts,
S̃ = −1
xẍ+ yÿ + w2(x, y)
. (16)
Let us comment on the physical meaning of the parameters T , θ. Consider two trajectories
which are solutions to the boundary value problem (9), (10), (12) at neighbouring values of
E, N . The differential of the quantity 2 Im S̃ as one deforms one trajectory into the other is
d (2 Im S̃) = d Im(2S + xiẋi + yiẏi) = Im(xidẋi − ẋidxi + yidẏi − ẏidyi) = EdT +Ndθ ,
where in the last equality we used the asymptotic form (13), (14) of the solution. Then,
from Eq. (15) one finds,
dF (E,N) = −TdE − θdN . (17)
Thus, the parameters T and θ are (up to sign) the derivatives of the suppression exponent
with respect to energy E and initial oscillator excitation number N respectively.
Our final remark is that the boundary value problem (9), (10), (12) is invariant with
respect to the trivial time translation symmetry,
t→ t+ δt , δt ∈ R , (18)
which can be fixed in any convenient way.
3 The model with one turn
To warm up, we consider the simplest model, where the waveguide has one sharp turn,
w = y θ(−x+ y tg β) + cos β (x sin β + y cos β) θ(x− y tg β) . (19)
Here θ(x) is the step function. It is convenient to use the rotated coordinate system,
cos β − sin β
sin β cos β
The waveguide function takes the form,
w = η cos β − ξ sin β θ(−ξ) . (20)
Figure 2: The equipotential contour w2(x, y) = 2N for the waveguide (20) and the trajectory
of the critical solution with energy N/ cos2 β.
The equipotential contour w2(ξ, η) = const is shown in Fig. 2. One observes that the motion
of the particle in two regions, ξ < 0 and ξ > 0, decomposes into the translatory motion and
oscillations in the coordinates x, y and ξ, η respectively (see. Eqs. (19) and (20)); the
frequency of η–oscillations in the latter case is cos β.
Due to the presence of the step function, the first derivatives of the potential (20) are
discontinuous11 at ξ = 0. Strictly speaking, the semiclassical method is not applicable in
this situation [21]. Thus, the formula (20) should be regarded as an approximation to some
waveguide function with smooth turn. Generically the width of the smoothened turn is
characterized by a parameter b; the sharp–turn approximation (20) corresponds to b → 0.
An example of smoothening is provided by the following substitution in Eq. (20),
θ(ξ) → θb(ξ) =
1 + e−ξ/b
. (21)
The semiclassical description can be used as long as the de Broglie wavelength of the particle
is small compared to the linear size of the potential12, 1/
E ≪ b. We conclude that the
sharp–turn and semiclassical approximations are valid simultaneously for smooth waveguides
1 ≫ b≫ 1/
E . (22)
11Note that the potential itself is continuous.
12Another semiclassical condition is that the energy is sufficient to excite a lot of oscillator levels, E ≫ 1.
It is satisfied provided Eq. (22) holds.
An important property of the model (20) is invariance of the classical equations of motion
(9) under the rescaling of the coordinates,
x→ Λx , y → Λy . (23)
Using the transformation (23), one may express a solution x(t), y(t) with energy E in terms
of the “normalized” one,
x = x̃
E , y = ỹ
where the solution x̃(t), ỹ(t) has unit energy; its initial oscillator excitation number is
ν = N/E .
The suppression exponent (15) takes the form,
F (E, N) = Efβ(ν) , (24)
where fβ(ν) is the exponent for the “normalized” solution. Substituting the expression (24)
into Eq. (17), one obtains,
fβ(ν) = −T − θν . (25)
We will exploit Eq. (25) in the end of this section. Now, we proceed to finding the “normal-
ized” trajectories.
At certain initial data ν > νcr the particle can reflect from the turn classically, so that
fβ(ν > νcr) = 0 .
Let us find the value of νcr. In the region ξ < 0 the classical solution takes the form,
x(t) = p0t+ x0 , (26a)
y(t) = A0 sin(t + ϕ) . (26b)
Having crossed the line ξ = 0 (line AB in Fig. 2), the classical particle can never return
back into the region ξ < 0. Indeed, in this case it moves at ξ > 0 with constant momentum
pξ > 0. Thus, the particle can reflect classically only if its trajectory touches the line
ξ = 0. The potential of our model has ill–defined derivatives at ξ = 0, and the fate of the
particle moving along the line AB depends on the particular choice of the smoothening of
the potential. In appendix we consider the motion of the classical particle in the case when
nonzero smoothening of width b is switched on. For a class of smoothenings we show that
in the small vicinity (δξ ∼ b) of any trajectory touching the line ξ = 0 there exists some
“smoothened” trajectory, which reflects classically from the turn. Consequently, below we
associate the trajectories touching the line ξ = 0 with the classical reflected solutions.
One notices that the inclination of the trajectory (26) is bounded from above
therefore, the classical trajectory of the particle can touch the line ξ = 0, that is, y/x = ctg β
only at
A0/p0 ≥ ctg β . (27)
From Eqs. (27), (26), (10) one extracts the condition for the particle to reflect classically
from the turn,
ν ≥ νcr = cos2 β . (28)
The critical classical solution at ν = νcr touches the line ξ = 0 at η = 0 (point C in Fig. 2),
where its trajectory
xcr(t) =
2t sin β , (29)
ycr(t) =
2 sin t cos β .
has the largest inclination.
We now turn to the classically forbidden reflections at ν < νcr, which are described by
the boundary value problem (9), (10), (12). One makes the following important observation.
The waveguide function (20) has the form of two analytic functions glued together at ξ = 0.
Hence, the equations of motion (9) can be continued analytically to the complex values of
coordinates in two different ways, starting from the regions ξ < 0 and ξ > 0 respectively. In
this way one obtains two complex solutions, ξ−(t), η−(t) and ξ+(t), η+(t). These solutions
and their first derivatives should be matched at some moment of time t1, ξ(t1) = 0. [Note
that the matching time t1 does not need to be real.] Below we conventionally refer to these
solutions as the ones belonging to the regions ξ < 0 and ξ > 0.
By the same reasoning as above we find that once the particle arrives into the region
ξ > 0, it never reflects back to ξ < 0, unless pξ = 0. So, in the region ξ > 0 one writes,
ξ+(t) = 0 , (30a)
η+(t) =
cos β
sin(t cos β + ϕη) , (30b)
where the “normalization” condition E = 1 has been used explicitly. Due to the conditions
in the asymptotic future, Eqs. (12), the parameter ϕη is real. We use the translational
invariance (18) to set ϕη = 0. Note that we again associate the trajectory going along the
line ξ = 0 with the reflected one.
The physical picture of over–barrier reflection that comes to mind matches with the new
mechanism of multidimensional tunneling proposed recently in Refs. [9, 11]. The process
proceeds in two steps. The first step, which is exponentially suppressed, is formation of the
periodic classical orbit (30) oscillating along the line ξ = 0. This orbit is unstable. At the
second step of the process the unstable orbit decays classically forming a trajectory going
back to x → −∞ at t → +∞. Clearly, the second step does not affect the suppression
exponent of the whole process, and we do not consider it explicitly. In what follows we
concentrate on the determination of the tunneling trajectory describing the first step of the
process.
One should find the solution at ξ < 0 and impose the boundary conditions (10). Note,
however, that the energy of our solution is fixed already. As for the initial oscillator excitation
number ν, it does not change during the evolution in the region ξ < 0. Thus, one may fix it
at the matching time t = t1. One writes,
(ẏ2 + y2)
= cos2 β + sin2 β sin2(t1 cos β) .
This complex equation allows one to express t1 as
sin(t1 cos β) = −i
νcr − ν
sin β
, (31)
where the choice of the sign is dictated by the condition in footnote 9. It is convenient to
introduce notation t1 = iT1, T1 ∈ R.
In order to find the suppression exponent fβ(ν), one needs to evaluate the parameters
T (ν), θ(ν). At ξ < 0 the solution has the form,
x−(t) = p0(t− iT/2) + x′0 , (32a)
y−(t) = ae
−it + a∗eT+θ+it , (32b)
where the definitions (13), (14) have been taken into account explicitly, so that p0, x
0 ∈ R.
One evaluates p0, x
0, a, T , θ by matching the coordinates x±, y± and their first derivatives
νcr0.20.150.10.050
Figure 3: The suppression exponent fβ(ν) for the waveguide (20); β = π/3.
ẋ±, ẏ± at t = iT1; this yields
x′0 = 0 , p0 =
2(1− ν) , a = i
1− ν/ cos2 β
T + θ
cos2 β − ν
sin β
The last two equations, together with Eq.(25), define the function fβ(ν),
fβ(ν) =
cos β
arcsh
νcr − ν
sin β
− ν cos β arcsh
νcr − ν
sin β
(νcr − ν)(1− ν)
this finction is plotted in Fig. 3. One observes that at ν → νcr the quantities T1, T, θ, fβ
tend to zero, and the complex trajectory tends to the classically allowed critical solution, cf.
Eqs. (29),
2 sin β , a→ i√
cos β .
At ν = 0 one has,
fβ(0) = −2 +
cos β
arcth (cos β) . (33)
To summarize, we obtained the suppression exponent for the reflection of a particle in the
simplest waveguide with one sharp turn.
Figure 4: The equipotential contour w2(x, y) = 2N ′ for the waveguide (35) and the trajectory
of the critical solution with energy N ′/ cos2 β > EB. The matching points C, C
′ are shown
by the thick black dots.
4 The model with two turns
4.1 Introducing the system
In the model of the previous section the suppression exponent was proportional to energy
because of the coordinate rescaling symmetry (23). Now, we are going to demonstrate that
small violation of this symmetry results in highly non–trivial graph for F (E).
One introduces a second turn into the waveguide, see Fig. 4. We want to consider this
turn as a small perturbation, so, we assume its angle α to be smaller than β. It is convenient
to introduce two additional coordinate systems, x′, y′ and ξ, η, bound to the central and
rightmost parts of the waveguide respectively. They are related to the original coordinate
system x, y as follows,
cosα sinα
− sinα cosα
cos β − sin β
sin β cos β
x′ − L
Note that the origin of the coordinate system ξ, η is shifted by the distance L. The waveguide
function is
w = θ(−x′)θ(−ξ)y + θ(−ξ)θ(x′)y′ cosα + θ(ξ)η cosα cos β ; (35)
it consists of three pieces glued together continuously at x′ = 0 and ξ = 0 (lines A′B′ and
AB in Fig. 4 respectively). At t→ −∞ the particle comes flying from the asymptotic region
x′ < 0, where w = y. In the intermediate region x′ > 0, ξ < 0 the particle moves in the x′
direction oscillating along the y′ coordinate with the frequency cosα. Finally, in the region
ξ > 0 its motion is free in the coordinates ξ, η; the frequency of η–oscillations is cosα cos β.
The model (35) no longer possesses the symmetry (23): rescaling of coordinates changes
the length L of the central part of the waveguide. In what follows it is convenient to work
in terms of the rescaled dynamical variables,
x̃ = x/L , ỹ = y/L .
In new terms the parameter L disappears from the classical equations of motion, entering
the theory through the overall coefficient L2 in front of the action. The initial–state quantum
numbers are also proportional to L2,
E = L2Ẽ , N = L2Ñ . (36)
Thus, the conditions (22) for the validity of the semiclassical approximation are satisfied in
the limit
L→ ∞ , Ẽ, Ñ = fixed .
The suppression exponent takes the form
F (E,N) = L2F̃ (Ẽ, Ñ) . (37)
To simplify notations, we omit tildes over the rescaled quantities in the rest of this sec-
tion. Rescaling back to the physical units can be easily performed in the final formulae by
implementing Eqs. (36), (37).
4.2 Classical evolution
Let us begin this subsection by demonstrating that there are no stable classical solutions
localized in the region between the turns. This is important for the determination of the
tunneling probability, since such stable solutions could lead to exponential resonances in the
tunneling amplitude. The argument proceeds as follows. Any trajectory which is localized in
the intermediate region should reflect from the line AB infinitely many times. Each reflection
involves touching the unstable orbit living at the line AB. This implies that the trajectory
itself is unstable.
We proceed by determining the region of initial data E, N , which correspond to the
classical reflections. [For brevity we will refer to this region as the “classically allowed
region”, as opposed to the “classically forbidden region” where reflections occur only at the
quantum mechanical level. We stress that these are the regions in the plane of quantum
numbers E, N .] Let us search for the critical classical solutions which correspond to the
smallest initial oscillator number N = Ncr(E) at given energy E. As in the previous section,
one finds that the particle must get stuck at the line13 AB for some time in order to reflect
back. Let us first make an assumption inspired by the study of the one-turn model that the
critical solutions touch the line AB at their maximum inclination point (point C in Fig. 4).
We will see shortly that this is true only at energies above a certain value EB, see Eq. (50).
Still, the analysis based on the above assumption enables one to catch the qualitative features
of the critical line N = Ncr(E). Besides, the analysis is considerably simplified in this case;
we postpone the accurate study until the end of this subsection. Keeping in mind the above
remarks, one writes for the solution in the intermediate region,
x′cr(t) = t
2E sin β + 1 , (38a)
y′cr(t) =
cos β
sin(t cosα) . (38b)
Before entering the intermediate region, the particle crosses the line A′B′ (point C ′ in Fig. 4).
The initial oscillator number N is most conveniently calculated at the moment
t = t0 ≡ −
2E sin β
of crossing. Using the relations (34) one obtains,
ẋcr(t0) =
sin β cosα− cos β sinα cos
cosα√
2E sin β
, (39)
and thus
Ncr(E) = E −
ẋ2cr(t0) = E − E
sin β cosα− cos β sinα cos
cosα√
2E sin β
, E > EB .
 0.05
 0.15
0.90.80.70.60.50.40.30.10
Figure 5: The boundary N = Ncr(E) of the classically allowed region at E > EB for the
waveguide model (35); β = π/3, α = π/30. The region of the classically allowed initial data
lies above this boundary. The empty circles correspond to the energies E = En, where the
curve N = Ncr(E) touches its lower envelope N = E cos
2(β + α).
As an example, we show in Fig. 5 the region of the classically allowed initial data for β = π/3,
α = π/30. One observes that the function Ncr(E) oscillates between two linear envelopes,
E cos2(β + α) and E cos2(β − α); the period of oscillations decreases as E → 0. Moreover,
the curve Ncr(E) has a number of minima at the points E = E
n . This means that the
energies E = Ecrn are optimal for reflection: in the vicinity of any point E = E
n , N =
Ncr(E
n ) reflections become exponentially suppressed independently of whether the energy
gets increased or decreased. This feature is particularly pronounced in the case α+β = π/2,
when the lower envelope coincides with the line N = 0. Then, the classical reflections (i.e.
reflections with the probability of order 1) at N = 0 are possible only in the vicinities of the
points
8π2(n− 1/2)2
This is the case we used in Introduction to illustrate the effect.
The minima E = Ecrn exist at other values of the parameters as well. For instance, let
13We do not consider reflections from the line A′B′. They disappear at larger values of N than reflections
from the line AB if α is small enough.
us find the positions of these minima in the case α ≪ 1. One differentiates Eq. (40) with
respect to energy and obtains,
Ecrn = En
π(n− 1/2)
arcsin
ctg β
2πα(n− 1/2)
+O(α2)
, (41)
where
8π2(n− 1/2)2 sin2 β
are the points where the curve N = Ncr(E) touches its lower envelope. The argument of
arcsine in Eq. (41) should be smaller than one, so, the minima Ecrn exist only at large enough
n ≥ n0 ≡
ctg β
+ 1 , (43)
where [·] stands for the integer part.
Let us make several comments. First, note that n0 ∼ O(1/α), consequently, all the
optimal points Ecrn lie in the region of small energies E ∼ 1/n20 ∼ O(α2). Second, as we
pointed out before, the formula (40) for the function Ncr(E) holds at E > EB. Comparing
the expressions (42), (43) and (50), one observes that En0 > EB if tg β > 1. So, there
does exist a range of energies where the non-monotonic behavior of the function Ncr(E) can
be inferred from the formula (40). In fact, the conclusion about the existence of the local
minima of Ncr(E), as well as the expressions (41), (42), (43) determining their positions,
remain valid also at E < EB. This follows from the rigorous analysis of the boundary of the
classically allowed region to which we turn now. The reader who is more interested in the
tunneling processes may skip this part and proceed directly to subsection 4.3.
Now, we do not appeal to the Ansatz (38). Instead, we start with the general solution
in the intermediate region,
x′ = p′0(t− t0) , (44a)
y′ = A′0 sin [(t− t0) cosα+ ϕ′] . (44b)
It is convenient to parametrize it by the total energy E = p′20 /2 + cos
2 αA′20 /2 and the
“inclination” γ defined by the relation
p′0/A
0 = tg γ cosα .
Expressions (44) take the following form,
2E (t− t0) sin γ , (45a)
cos γ
sin [(t− t0) cosα + ϕ′] . (45b)
The constants t0 and ϕ
′ are fixed by demanding the trajectory (45) to reflect classically from
the second turn, i.e. touch the line ξ = 0 at t = 0,
(x′ − 1) cos β − y′ sin β
= 0 ,
= ctg β .
These conditions imply,
t0 = −
2E sin γ
tg2 β
tg2 γ
− 1 , (46a)
ϕ′ = −
cosα√
2E sin γ
tg2 β
tg2 γ
− 1− arccos
. (46b)
One sees that the classical reflections are possible only at γ ∈ [0; β]; the boundary value
γ = β reproduces the solution (38).
In order to find Ncr(E), one should minimize the value of the incoming oscillator exci-
tation number with respect to γ at fixed E. At t = t0, when the particle crosses the first
turn,
p0 ≡ ẋ(t0) =
2E(cosα sin γ − sinα cos γ cosϕ′) . (47)
Since N = E − p20/2, one can maximize the value of the translatory momentum p0 instead
of minimizing N(γ). Formula (39) represents the value γ = β lying at the boundary of the
accessible γ–domain; this value should be compared to p0(γ) taken at local maxima.
Let us consider the case α ≪ 1. At large enough energies, E ∼ 1, Eq. (47) is dominated
by the first term, which grows with γ, so that the maximum of p0(γ) is indeed achieved at
γ = β. At small energies, however, the second term in Eq. (47) becomes essential because of
the quickly oscillating cosϕ′ multiplier: the frequency of cosϕ′ oscillations grows as E → 0,
and at E ∼ α2, in spite of the small magnitude proportional to sinα, the second term
produces the sequence of local maxima of the function p0(γ).
One expects the parameters of the trajectory at small α not to be very different from the
ones at α = 0 (the latter case was considered in Sec. 3). So, we write,
γ = β − δγ ,
where 0 < δγ ≪ 1. Expanding the expressions (46), (47) and taking into account that
E ∼ α2 one obtains,
ϕ′ = −
2E sin β
(1 + δγ ctg β) , (48a)
2E(sin β − δγ cos β − α cos β cosϕ′) . (48b)
Now, the local maxima of the initial translatory momentum can be obtained explicitly by
differentiating Eqs. (48) with respect to δγ. One finds the sequence of them,
δγn = −tg β +
sin2 β
cos β
2πn− π − arcsin
2E sin2 β
α cos β
. (49)
Only the maxima with δγn > 0 should be taken into account. The local maxima exist when
E ≤ EB ≡
α2 cos2 β
2 sin4 β
. (50)
Substituting Eq. (49) into the expressions (48), one evaluates the values of p0 at the local
maxima,
p0,n(E) =2
2E sin β − 2E sin2 β
2πn− π − arcsin
2E sin2 β
α cos β
2E cos β
1− 2E sin
α2 cos2 β
The graphs Nn(E) = E − p20,n(E)/2 are shown in Fig. 6 for the case β = π/3, α = π/30.
Each graph is plotted for the energy range E > EAn restricted by the condition δγn > 0.
They are presented together with the curve given by the formula (40). By definition, the
critical solution corresponds to the lowest of these graphs. Clearly, for each “local” curve
representing the n-th local minimum of N(γ) there is a range of energies EAn < E < EBn
where it lies lower than the “global” curve (40). This means that the parameter γ of the
critical solution changes discontinuously across the points E = EBn . Correspondingly, the
curve Ncr(E) has a break at these points. On the other hand, the function Ncr(E) is smooth
at the points An as the “local” graphs end up exactly at δγ = 0, where the parameters of
the n-th “local” solution coincide with the ones of the “global” solution.
To summarize, we have observed that the boundary of the classically allowed region is
given by a collection of many branches of classical solutions, each branch being relevant in
its own energy interval. We will see that a similar branch structure is present in the complex
trajectories describing over–barrier reflections in the classically forbidden region of E, N .
4.3 Classically forbidden reflections
In this subsection we demonstrate that the suppression exponent F (E, N) viewed as a
function of energy at fixed N exhibits oscillations deep inside the classically forbidden region
 0.02
 0.04
 0.06
 0  0.05  0.1  0.15  0.2
Figure 6: The graphs Nn(E) corresponding to the local minima of the function N(γ) (dashed
lines) plotted together with the “global” curve, Eq. (40) (solid line); β = π/3, α = π/30.
The critical curve N = Ncr(E) is obtained by taking the minimum among all the graphs.
of initial data. This result comes without surprise if one takes into account the non-monotonic
behavior of the boundary Ncr(E) of the classically allowed region. Indeed, the curve N =
Ncr(E) coincides with the line F (E,N) = 0. One has,
= −∂EF
N=Ncr(E)
so that
(Ecrn , N
n ) = 0 .
We conclude that the points E = Ecrn are the local minima of the function F (E) at fixed
N = N crn . It is natural to expect that such local minima of F (E) exist at other values of
N as well. To illustrate this fact explicitly, we study the complex trajectories, solutions to
Eqs. (9), (10), (12).
Following the tactics of the previous section, we find solutions in three separate regions:
initial region x′ < 0, final region ξ > 0, and the intermediate region x′ > 0, ξ < 0. These
solutions, together with their first derivatives, should be glued at t = t0, when the complex
trajectory crosses the line x′ = 0, and at t = t1, when ξ = 0. Besides, we are looking for the
tunneling solution which ends up oscillating along the line AB, see Fig. 4. As discussed in
Sec. 3 this assumes existence of the second step of the process: classical decay of the unstable
orbit living at ξ = 0; the latter decay is described by a real trajectory14 going to x → −∞
at t→ +∞.
The solution in the final region ξ > 0 is (cf. Eqs. (30)),
ξ+(t) = 0 , (51a)
η+(t) =
cosα cos β
sin(t cosα cos β) , (51b)
where we used the time translation invariance (18) to fix the final oscillator phase ϕη = 0.
In the intermediate region x′ > 0, ξ < 0 one writes,
x′(t) = p′0t + x
0 , (52a)
y′(t) = a′e−it cosα + ā′eit cosα . (52b)
Note that the final solution (51) does not contain free parameters; thus, the matching of x′,
ẋ′, y′, ẏ′ at t = t1 enables one to express all the parameters in Eqs. (52) in terms of one
complex variable t1,
p′0 =
2E sin β cos φ1 , (53a)
x′0 = 1 +
[sin φ1 − φ1 cos φ1] , (53b)
eiφ1/ cos β [sinφ1 + i cos β cosφ1] , (53c)
ā′ =
e−iφ1/ cos β [sinφ1 − i cos β cos φ1] , (53d)
where we introduced φ1 = t1 cosα cos β.
As the energy of the solution has been fixed already, the only remaining initial condition
involves initial oscillator excitation number at x′ < 0, see Eqs. (10). It is convenient
to impose this condition at the matching point t = t0. One recalls the definition of the
matching time t0,
p′0t0 + x
0 = 0 ,
14One wonders why this trajectory does not reflect from the turn A′B′ on its way back. This concern is
removed by the observation that the trajectory produced in the decay of the unstable orbit is not unique:
in appendix we show that the decay can occur at any point of the segment AC giving rise to a whole bunch
of potential decay trajectories. Most of these trajectories pass through the turn A′B′ without reflection.
which, after taking into account the expressions (53a), (53b), leads to the following equation,
cosα√
2E sin β
sinφ1
cos β
− cosφ1∆φ = 0 , (54)
where ∆φ = cosα(t1 − t0). At t = t0 one has,
ẋ(t0) = p
0 cosα− ẏ′(t0) sinα =
2(E −N) ,
and thus √
= ctgα sin β cosφ1 − sin φ1 sin∆φ− cos β cosφ1 cos∆φ . (55)
As before, ν = N/E.
Two complex equations (54), (55) determine the matching times t0, t1, and, consequently,
the complex trajectory. Although these equations cannot be solved explicitly, they can
be simplified in the case α ≪ 1, which we consider from now on. For concreteness, we
study reflections at N = 0. It is important to keep in mind that in the region of interest
E ∼ Ecrn ∼ O(α2); thus, one should regard all the momenta p and oscillator amplitudes a,
ā, as the quantities of order O(α). At the same time, for the distances along the waveguide
one has x ∼ O(1), so that the real parts of time intervals may be parametrically large,
Re t ∼ x/p ∼ O(1/α).
Further on, it will be convenient to work in terms of real variables, so, we represent φ1
and ∆φ as
φ1 = cosα cos β(τ1 + iT1) , ∆φ = cosα(τ + i∆T ) .
Note that τ and ∆T are the real and imaginary parts of the time interval t1 − t0 which the
particle spends in the intermediate region. Now, equation (54) enables one to express
2E sin βch(T1 cos β)
+O(α) , (56)
τ1 = −
τ cos β
cos β
−∆T cth(T1 cos β)
+O(α3) . (57)
Note that τ1 ∼ O(α), τ ∼ O(1/α). Then, the real part of Eq. (55) implies that
ch(T1 cos β) =
sin β
1 + α ctgβ cos τe∆T
+O(α2) . (58)
While deriving this formula we imposed T1 < 0 which follows from the requirement that
in the limit α → 0 equation (31) should be recovered; besides, we assumed e∆T ∼ O(1).
Substituting Eq. (58) into Eq. (56) and the imaginary part of Eq. (55), we obtain the final
set of equations,
2E = α ctgβ cos τe∆T +O(α2) , (59a)
(1 + ∆T )e−∆T = α ctgβτ sin τ +O(α) . (59b)
These two nonlinear equations, still, cannot be solved explicitly. Nevertheless, one can get
a pretty accurate idea about the structure of their solutions.
Before proceeding to the analysis of the above equations, let us derive a convenient
expression for the suppression exponent F0(E) ≡ F (E,N = 0). Note that on general
grounds one expects to obtain an expression of the form,
F0(E) = E(fβ(0) +O(α)) ,
where fβ(0) is given by Eq. (33). We are interested in the O(α) correction in this expression,
so, one must be careful to keep track of the subleading terms during the derivation.
Making use of the equations of motion, one obtains for the incomplete action (16) of the
system,
2 Im S̃ = Im p′0 =
2E sin β Im(cosφ1) .
Substitution of Eqs. (56), (57), (58) into this formula yields
2 Im S̃ = 2E
−1−∆T − α ctg β cos τe∆T
cos2 β
+ 2∆T
+O(α2)
For the parameter T one has (see Eqs. (14)),
T = −
2 Im x0
2 Im(x(t0)− p0t0)
= 2(T1 −∆T ) +
sinα Im y′(t0) , (60)
where in the last equality we used Eqs. (34) and x′(t0) = 0. The quantity Im y
′(t0) is
evaluated by using Eqs. (52b), (53) and (58); one finds,
Im y′(t0) = −
ctg β cos τe∆T +O(α)
Substituting everything into the formula (15), we obtain,
F0(E) = E
fβ(0)− 4α ctg β cos τ ∆T e∆T +O(α2)
. (61)
This expression implies that determination of the O(α) correction to the suppression expo-
nent involves finding τ , ∆T with O(1)–accuracy. This is precisely the level of accuracy of
Eqs. (59). Below we will also need the following formulae, which can be easily obtained by
using T = −F.
and Eq. (60),
= fβ(0) + 2(∆T + 1) +O(α) , (62)
2(∆T + 1 +O(α))
. (63)
Note that, though the suppression exponent differs from that in the one–turn case only by
O(α) correction, its derivative gets modified in the zeroth order in α.
Now, we are ready to analyze Eqs. (59). One begins by solving Eq. (59b) graphically,
see Fig. 7. The important property of this equation is as follows. One notices that the l.h.s.
of Eq. (59b) is always smaller than 1, the maximum being achieved at ∆T = 0. Therefore,
the solutions to this equation are confined to the bands
τ sin τ <
This corresponds to
τ ∈ [0; 2π(n1 − 1) + δτn1 ] or τ ∈ [2πn− π − δτn; 2πn + δτn] , n ≥ n1 (64)
where
δτn = arcsin
2πα(n− 1/2)
+O(α) ,
+ 1 , (65)
with [·] in the last formula standing for the integer part. The forbidden bands, where
τ sin τ > tgβ/α, are marked in Fig. 7 by yellow shading. The property (64) introduces a
topological classification of the solutions τ , ∆T to Eqs. (59). Namely, these solutions fall
into a set of continuous branches: the “local” branches τn(E), ∆Tn(E) living inside the
strips τ ∈ [2πn − π − δτn; 2πn + δτn], n ≥ n1, and the “global” branch τg(E), ∆Tg(E)
inhabiting the very first band τ ∈ [0; 2π(n1 − 1) + δτn1 ]. As follows from the definition of
τ , the topological number n counts the number of y′–oscillations during the evolution in the
intermediate region.
Let us consider the “global” branch. From Eqs. (59) one has,
τg → 2π(n1 − 1) +O(α lnα) , ∆Tg → ln(tg β/α) , E → 0 ,
τg → 0 , ∆Tg → −1 , E → +∞ .
10π 9π 8π 7π 6π  5π4π 3π 2π π 0
g 4 5
10π 9π 8π 7π 6π  5π4π 3π 2π π 0
g 4 5
Figure 7: Curves representing solutions to Eq. (59b); β = π/3, α = π/30.
By inspection of Fig. 7 one can work out the qualitative behavior of the functions τg(E),
∆Tg(E). Alternatively, these functions can be found numerically. They are plotted in Fig. 8
for the case β = π/3, α = π/30 (the curves marked with “g”). One observes that at
high enough energies the function ∆Tg(E) exhibits oscillations around the line ∆T = −1.
According to the formula (63) this means that the function F0(E)/E is non-monotonic, it
attains local minima at the points
E ′n =
8π2(n− 1/2)2
1 + 2αe−1ctgβ +O(α2)
. (66)
Moreover, if
n ≥ n′0 ≡
fβ(0) exp
fβ(0)
+ 1 (67)
there exist Eon = E
n(1 + O(α)), such that ∆T (E
n) = −1 − fβ(0)/2. Then, according to
Eq. (62) the points Eon are the “optimal” energies corresponding to the local minima of the
suppression exponent F0(E).
At low energies the function ∆Tg(E) ceases to oscillate and becomes large and positive.
According to Eq. (62) this means that the suppression exponent F0,g(E) of the “global”
solution becomes negative at low energies15, see Fig. 9. This is a clear signal that the
15It is worth mentioning that Eqs. (59) and the expression (61) for the suppression exponent become
 0  0.1  0.2  0.3
 0  0.1  0.2  0.3
 0  0.1  0.2  0.3
E’4E’5
Figure 8: Several first branches of solutions to Eqs. (59): “global” branch (“g”) and two
“local” branches (“4”, “5”); β = π/3, α = π/30.
 0.05
 0.15
 0.25
 0  0.2  0.4  0.6
 0.02
 0.04
0.120.110.100.09
Figure 9: The suppression exponent F0(E) for the “global” and first “local” (n = 4) branches;
β = π/3, α = π/30. The vicinity of intersection of the graphs is enlarged in the upper right
corner.
“global” solution becomes unphysical at these energies and its contribution to the reflection
probability should be discarded: negative suppression exponent contradicts the unitarity
requirement16, P < 1. One is forced to conclude that at low energies reflection is described
by the “local” solutions. Let us study them in detail.
For the n-th branch one obtains,
τn → 2πn+O(α lnα) , ∆Tn → ln(tg β/α) , E → 0 ,
τn → 2πn− π , ∆Tg → +∞ , E → +∞ .
From Fig. 7 one learns that the n-th solution passes through the points
∆Tn = −1 , τ = 2πn or τ = 2πn− π . (68)
inapplicable at large ∆T : the assumption e∆T ∼ O(1) which was used in the derivation of these equations
gets violated. Nevertheless, by analyzing the full equations (54), (55) one can show that dF0,g/dE = −Tg
is large and positive at E → 0. This is sufficient for concluding that F0,g(E) is negative in the low–energy
domain.
16Another indication that the “global” solution is unphysical at small E is that the function τg(E) is
bounded from above. Indeed, τ is the time interval the particle spends in the intermediate part of the
waveguide, one expects it to tend to infinity as E → 0 for a physically relevant solution.
Thus, each curve ∆Tn(E) has one sharp dip, its minimum is smaller than −1, see Fig. 8. As
in the case with the “global” branch, the points (68) represent the extrema of the functions
F0,n(E)/E; the positions of the local minima are again given by Eq. (66).
Making use of Eq. (61), we find that the suppressions F0,n(E) of the “local” branches are
large and positive at high energies. Hence, these solutions give subdominant contributions
to the reflection probability at such E as compared to the “global” solution. As energy
decreases, F0,n(E) also decreases, then makes one oscillation and drops to negative values at
small E. The latter property means that each “local” branch becomes unphysical at small
enough energies. The suppression exponent of the first “local” branch (corresponding to
n = 4 in the case β = π/3, α = π/30) is presented in Fig. 9.
An alert reader may have already guessed that we have met here the typical Stokes
phenomenon [21]. In fact, the Stokes phenomenon is specific to the situations where some
integral (e.g., the path integral (7) in our case) is evaluated by the saddle–point method.
Essentially, it means the following: as one gradually changes the parameters of the integral
in question, a given saddle point may become non–contributing after the values of these
parameters cross a certain curve drawn in the parameter space, the Stokes line. Since the
result of the computation should be continuous, this phenomenon occurs only for subdomi-
nant saddle points (saddle–point trajectories in our case). Unfortunately, apart from several
heuristic conjectures [21, 12], sometimes rather suggestive [13], there is presently no general
method of dealing with the Stokes phenomenon in the semiclassical calculations. However,
in the situation encountered above it suffices to use the simplest logic lying at the heart of
all other approaches17.
When gathering the final result for the suppression exponent, we follow two guidelines.
First, it is clear that, as energy decreases, each branch becomes unphysical before F0,n(E)
crosses zero. On the other hand, at high energies one should pick up the branch corresponding
to the smallest value of the suppression exponent. Looking at Fig. 9, one notes that the
curves F0,g(E), F0,4(E) have two intersections, A and B. At E > EB one chooses the
“global” branch. In the region EA < E < EB we switch to the first “local” branch, because
in this region F0,4(E) < F0,g(E). Naively, at E = EA one should jump back to the “global”
branch; however, in order to preserve unitarity at small energies, we suppose that somewhere
in between the points B and A the “global” branch becomes non–contributing, so that one
should stay at the “local” branch at E < EA. Similarly, the adjacent “local” branches have
17The simplification in the present case is related to the fact that we concentrate on the dominant semi-
classical contribution, leaving aside the subdominant ones.
two intersections; as the energy decreases, we switch from n-th branch to n + 1-th at the
first intersection, and stay there until the intersection with the n+2-th branch. Overall, one
obtains the graph for the suppression exponent plotted in Fig. 10. The suppression exponent
 0.02
 0.04
 0.06
 0.08
 0  0.05  0.1  0.15  0.2  0.25
Figure 10: The final result for the suppression exponent F0(E) in the region of small energies;
β = π/3, α = π/30. The points where different branches merge are shown with thick black
dots.
oscillates between two linear envelopes, F = E(fβ(0) ± 4e−1α ctg β); oscillations pile up in
the region of low energies. The reflection process is optimal in the vicinities of the minima
of the function F0(E).
5 Discussion
By considering a class of two–dimensional waveguide models, we have demonstrated explicitly
that the probability of over–barrier reflection can be non–monotonic function of energy. The
origin of the effect lies in the classical dynamics: the parameters of the complex trajectory
describing over–barrier reflection change quasi-periodically as the energy gets decreased.
This results in the oscillatory behavior of the suppression exponent. Reflection occurs with
exponentially larger probability in the vicinities of “optimal” energies (local minima of the
suppression exponent) while being highly suppressed in between.
Our results are obtained for a fairly specific class of waveguides, namely, the ones with
very sharp turns. However, the qualitative features observed in this paper should be valid
for quite general waveguide models: a classical particle with high energy feels any large–scale
turn of the waveguide as a sharp one18; if two turns are separated by a long interval of free
motion, one arrives to the model (35). We remark that the phenomenon of optimal tunneling
has been observed also in numerical investigation of a smooth waveguide, see Ref. [14].
The branch structure of solutions observed in the region of small energies is interesting
from the mathematical point of view. We have shown that there exists an infinite sequence of
complex trajectories marked by the topological number n. Each branch produces physically
consistent result for the suppression exponent in some energy interval; outside of this interval
the n-th branch would correspond either to highly suppressed transitions (high energies) or
to violation of unitarity (low energies). We collected the final graph for the suppression
exponent basing on the empirical considerations, which hardly may be acknowledged as
satisfactory. Our study clearly shows that the method of complex trajectories should be
equipped with a convenient rule to pick up the physical trajectory among the discrete set of
solutions to the boundary value problem (9), (10), (12) (in other words, the method to deal
with the Stokes phenomenon). Presently, such a rule is absent.
We note that the described physical phenomenon of optimal tunneling is present inde-
pendently of the way the branches of solutions are glued together. The result at relatively
high energies is given by the “global” branch, which displays a large number of local minima
if n′0 > n1, see Eqs. (67), (65). This is the case for the illustrative example considered
throughout this paper, see Fig. 9.
As a final remark, we point out some open issues. We have calculated the suppression
exponent of reflection using the sharp–turn approximation. It would be instructive to extend
our analysis by finding corrections due to the finite turn widths. The motivation is twofold.
First, the analysis performed in appendix implies existence of a rich variety of distinctive
semiclassical solutions contributing almost equally into the reflection probability. This fea-
ture might be a manifestation of chaos [7] which is present in our system but hidden by the
sharp–turn approximation. [Note that chaos is inherent in a very similar waveguide model
18More precisely, one should compare the width b of the turn to the quantity 2π
, where p0 is the
translatory momentum of the particle and ω stands for the frequency of transverse oscillations; if b≪ 2πp0
one is in the class of models with sharp turns.
with smooth potential, see Ref. [14].] Clearly, the structure of solutions in the vicinities of
the turns is worth further investigation.
Second, it was proposed recently in Refs. [9, 11] that the process of dynamical tunneling
in quantum systems with multiple degrees of freedom (including field theoretical models,
see Refs. [19]) can proceed differently from the ordinary case of one–dimensional tunnel-
ing. Namely, classically unstable state can be created during the process; this state decays
subsequently into the final asymptotic region. The analysis performed in the present paper
naturally conforms with this tunneling mechanism: all our complex trajectories are matched
with the unstable orbit living at the turn. Still, the sharp–turn approximation does not allow
to distinguish between the truly unstable trajectories staying at the turn forever and those
which reflect from the turn in a finite time. To decide whether the tunneling mechanism of
Refs. [9, 11] is indeed realized in our model one needs to go beyond the sharp–turn approx-
imation. Then, the candidate for the “mediator” unstable state is the “excited sphaleron”,
the solution considered in the appendix. Presumably, in our model one can answer analyti-
cally to the question of whether or not the “excited sphaleron” acts as an intermediate state
of the tunneling process. This study is quite beyond the scope of the present paper and we
leave it for future investigations.
Acknowledgments. We are indebted to F.L. Bezrukov and V.A. Rubakov for the en-
couraging interest and helpful suggestions. This work is supported in part by the Russian
Foundation for Basic Research, grant 05-02-17363-a; Grants of the President of Russian
Federation NS-7293.2006.2 (government contract 02.445.11.7370), MK-2563.2006.2 (D.L.),
MK-2205.2005.2 (S.S.); Grants of the Russian Science Support Foundation (D.L. and S.S.);
the personal fellowship of the “Dynasty” foundation (awarded by the Scientific board of
ICFPM) (A.P.) and INTAS grant YS 03-55-2362 (D.L.). D.L. is grateful to Universite Libre
de Bruxelles and EPFL (Lausanne) for hospitality during his visits.
A Classical motion near the turn
In this appendix we analyze the motion of the particle near the sharp turn of the waveguide
(20) at nonzero smoothening of the turn, see, e.g., Eq. (21). We suppose that in the small
vicinity of the turn the function w(ξ, η) can be represented in the form
w(ξ, η) = cos β (η − bv(ξ/b)) , (69)
where v(ψ) does not depend explicitly on b. Moreover, we consider the case when v(ψ) has
a maximum19,
v′(ψ0) = 0 . (70)
Due to the property (70) one immediately obtains the exact periodic solution to the
equations of motion (9), which we call “excited sphaleron” [9],
ξsp = bψ0 , ηsp = Aη sin(t cos β + ϕη) + bv(ψ0) . (71)
We are going to show that this solution is unstable: a small perturbation above it grows
with time and the particle flies away to either end of the waveguide. In particular, there are
solutions that describe the decay of the sphaleron to ξ → −∞ both at t → ±∞. Clearly,
such solutions correspond to reflections from the turn.
In the vicinity of the sphaleron the trajectory of the particle can be represented in the
form,
ξ = bψ(t) , η = ηsp(t) + bρ(t) , (72)
where ψ, ρ ∼ O(1). Writing down the classical equations of motion (9) in the leading order
in b, one obtains,
Aη sin(2s)v
′(ψ) , (73)
+ 4ρ = 4[v(ψ)− v(ψ0)] , (74)
where s = (t cos β +ϕη)/2. It is worth noting that the right hand side of Eqs. (73), (74) are
of different order in b. We will see that due to this difference ρ = 0 in the leading order in b.
Let us first consider the linear perturbations above the excited sphaleron,
ψ = ψ0 + δψ , δψ ≪ 1 .
Equation (73) can be linearized with respect to δψ leading to the Mathieu equation
δψ + 2q sin(2s)δψ = 0 ,
with canonical parameter q = −2v′′0Aη/b > 0. As q ∼ O(1/b) ≫ 1, one can apply the WKB
formula,
A cosW
dW/ds
, (75)
19For the smoothening (21), the properties (69), (70) hold with v(ψ) = ψtgβ
, ψ0 ≈ 1.28.
where |A| ≪ 1, and
sin(2s′) .
Note that we have chosen the solution symmetric with respect to time reflections,
δψ(π/2− s) = δψ(s) . (76)
At s ∈ [0; π/2] the exponent W is real and the particle gets stuck at ψ ≈ ψ0, oscillating
around this point with high frequency dW/ds ∼ O(b−1/2). At s < 0 the solution (75) grows
exponentially, meaning that the particle flies away from the excited sphaleron,
δψ(s < 0) =
A cos(W (0)− π/4)
|dW/ds|
e|W (s)−W (0)| .
In what follows, we choose A cos(W (0)− π/4) < 0, so that δψ < 0 at s < 0. Let us denote
by s1 < 0 the point where δψ becomes formally equal to −1,
A cos(W (0)− π/4)
|dW/ds|
e|W (s1)−W (0)| = −1 .
In what follows we suppose that s1 ∼ O(1), hence, A is exponentially small. Then, in the
vicinity of this point, |s− s1| ≪ 1, one has,
δψ = − exp
−2q sin(2s1)(s1 − s)
= − exp
4v′′0Aη sin(2s1)
(s1 − s)√
. (77)
We notice that δψ evolves from exponentially small values to δψ ∼ O(1) during the charac-
teristic time |s− s1| ∼ O(
When δψ ∼ O(1) the linear approximation breaks down and one has to solve the nonlinear
equation (73). Using s = s1 +O(
b) one writes
Aη sin(2s1)v
′(ψ) . (78)
This equation permits to draw a useful analogy with one–dimensional particle moving in
the effective potential Veff (ψ) = −4b−1Aη sin(2s1)v(ψ) (see Fig. 11). This auxiliary particle
starts in the region near the maximum of the potential at (s − s1)/
b → +∞ with energy
E ≈ Vmax and rolls down toward ψ → −∞ at (s−s1)/
b→ −∞. In this limit v(ψ) → ψ tg β
and the solution takes the form
ψ = C1 + C2(s− s1) + 2b−1Aη sin(2s1) tg β (s− s1)2 .
Vmax 
Figure 11: The effective potential for Eq. (78).
Note that the coefficients C1, C2 here are not independent: they are determined by the
parameter s1 through matching of the solution with Eq. (77) at (s− s1)/
b→ +∞. We do
not need their explicit form, however.
Let us argue that the function ρ remains small during the whole evolution of the particle
in the vicinity of the sphaleron. Indeed, in the linear regime one has δψ ≪ 1 and the r.h.s.
of Eq. (74) is small. So, ρ does not get excited. On the other hand, the nonlinear evolution
of ψ proceeds in a short time interval ∆s = O(
b); so, again, ρ is suppressed by some power
of b.
The trajectory (72) found in the vicinity of the sphaleron should be matched at
1 ≫ |s− s1| ≫
with the free solution in the asymptotic region ξ < 0, see Eqs. (26). It is straightforward to
check that matching can be performed up to the second order in (t− t1), which is consistent
with our approximations. In this way one determines the free asymptotic solution which, up
to corrections of order O(b), coincides with the sinusoid coming from ξ → −∞ at t → −∞
and touching the line ξ = 0 at t = t1.
Now we recall that, by construction, the obtained solution is symmetric with respect to
time reflections,
ξ(s) = ξ(π/2− s) , η(s) = η(π/2− s) .
This means that it satisfies ξ → −∞ at t → ±∞. This solution describes reflection of the
particle from the turn.
The reasoning presented in this appendix puts considerations of the main body of this
paper on the firm ground: we have found the “smoothened” solutions which reflect classically
from the turn, and in the limit b→ 0 coincide with the free solutions of Sec. 3 touching the
line ξ = 0.
It is worth mentioning that, apart from the reflected solution we have found, in the
vicinity of any trajectory touching the line ξ = 0 there exists a rich variety of qualitatively
different motions. First of all, one may successfully search for solutions which are odd with
respect to time reflections (Eq. (76) with minus sign). Such solutions, though close to the
reflected ones at t < 0, describe transmissions of the particle through the sharp turn into the
asymptotic region ξ → +∞. Relaxing the time reflection symmetry, one can find solutions
leaving the vicinity of the turn at any point η < 0, which is different, in general, from the
starting point η = η(s1). Yet another types of solutions are obtained in the case when the
amplitude A of δψ–oscillations at s ∈ [0; π/2] is so small that δψ does not reach the values
of order one during the time period s ∈ [−π/2; 0]. If the particle is still in the vicinity of the
point ψ0 at s = −π/2, it remains for sure in this vicinity at s ∈ [−π; −π/2], because the r.h.s.
of Eq. (73) is positive again. In this way one obtains solutions, which spend two, three, etc.
sphaleron periods at ψ ≈ ψ0 before escaping into the asymptotic regions ψ → ±∞. In the
leading order in b all these solutions correspond to the identical initial state, and (in the case
of classically forbidden transitions) to the same value of the suppression exponent. However,
an accurate study of the dynamics in the vicinity of the the sphaleron is generically required
to obtain the correct value of the suppression exponent in the case b ∼ 1, cf. Ref. [14].
References
[1] V. A. Kuzmin, V. A. Rubakov and M. E. Shaposhnikov, Phys. Lett. B 155, 36 (1985).
[2] W. Miller and T. George, J. Chem. Phys. 56, 5668 (1972); 57, 2458 (1972);
W. H. Miller, Adv. Chem. Phys. 25, 69 (1974).
[3] A. M. Perelomov, V. S. Popov and M. V. Terent’ev, ZHETF 51, 309 (1966);
V. S. Popov, V. Kuznetsov and A. M. Perelomov, ZHETF 53, 331 (1967).
[4] M. Davis and E. Heller, J.Chem.Phys. 75, 246 (1981).
[5] M. Wilkinson, Physica 21D, 341 (1986);
S. Takada and H. Nakamura, J. Chem. Phys. 100, 98 (1994);
S. Takada, P. Walker and M. Wilkinson, Phys. Rev. A 52, 3546 (1995);
S. Takada, J. Chem. Phys. 104, 3742 (1996).
[6] W. Miller, J. Phys. Chem. A 105, 2942 (2001).
[7] O. Bohigas, D. Boose, R.Egydio de Carvalho, V. Marvulle, Nucl. Phys. A560, 197 (1993);
S. Tomsovic, D. Ullmo, Phys. Rev. E 50, 145 (1994);
A. Shudo, K.S. Ikeda, Phys. Rev. Lett. 74, 682 (1994); Physica D 115, 234 (1998).
E. Doron, S.D. Frischat, Phys. Rev. Lett. 75, 3661 (1995); Phys.Rev. E 57, 1421 (1998);
S.C. Creagh, N.D. Whelan, Phys. Rev. Lett. 77, 4975 (1996).
[8] G. F. Bonini, A. G. Cohen, C. Rebbi and V. A. Rubakov, Phys. Rev. D 60, 076004
(1999) [arXiv:hep-ph/9901226].
[9] F. Bezrukov and D. Levkov, J. Exp. Theor. Phys. 98, 820 (2004) [Zh. Eksp. Teor. Fiz.
125, 938 (2004)] [arXiv:quant-ph/0312144]; “ Transmission through a potential bar-
rier in quantum mechanics of multiple degrees of freedom: Complex way to the top”,
arXiv:quant-ph/0301022.
[10] C.S. Drew, S.C. Creagh, R.H. Tew, Phys. Rev. A 72, 062501 (2005).
[11] K. Takahashi, K.S. Ikeda, Europhys. Lett. 71 (2), 193 (2005); Phys. Rev. Lett. 97,
240403 (2006).
[12] S. Adachi, Ann.Phys. 195, 45 (1989).
[13] A. Shudo, K.S. Ikeda, Phys.Rev.Lett. 76, 4151 (1996).
[14] D. G. Levkov, A. G. Panin and S. M. Sibiryakov, “Complex trajectories in chaotic
dynamical tunneling,” arXiv:nlin.cd/0701063.
[15] A. S. Ioselevich and E. I. Rashba, Sov. Phys. JETP 64, 1137 (1986) [Zh. Eksp. Theor.
Fiz. 91, 1917 (1986)].
[16] M. B. Voloshin, Phys. Rev. D 49, 2014 (1994).
[17] D. G. Levkov and S. M. Sibiryakov, Phys. Rev. D 71, 025001 (2005)
[arXiv:hep-th/0410198]; JETP Lett. 81, 53 (2005) [Pisma Zh. Eksp. Teor. Fiz. 81, 60
(2005)] [arXiv:hep-th/0412253].
http://arxiv.org/abs/hep-ph/9901226
http://arxiv.org/abs/quant-ph/0312144
http://arxiv.org/abs/quant-ph/0301022
http://arxiv.org/abs/nlin/0701063
http://arxiv.org/abs/hep-th/0410198
http://arxiv.org/abs/hep-th/0412253
[18] M. P. Mattis, Phys. Rept. 214, 159 (1992);
P. G. Tinyakov, Int. J. Mod. Phys. A 8, 1823 (1993);
V. A. Rubakov and M. E. Shaposhnikov, Usp. Fiz. Nauk 166, 493 (1996) [Phys. Usp.
39, 461 (1996)], [arXiv:hep-ph/9603208].
[19] V. A. Rubakov, D. T. Son and P. G. Tinyakov, Phys. Lett. B 287, 342 (1992);
A. N. Kuznetsov and P. G. Tinyakov, Phys. Lett. B 406, 76 (1997)
[arXiv:hep-ph/9704242];
F. Bezrukov, D. Levkov, C. Rebbi, V. A. Rubakov and P. Tinyakov, Phys. Rev. D 68,
036005 (2003) [arXiv:hep-ph/0304180].
[20] A. Ringwald, Nucl. Phys. B 330, 1 (1990).
O. Espinosa, Nucl. Phys. B 343, 310 (1990).
[21] P. V. Elyutin, V. D. Krivchenkov “Nonrelativistic Quantum Mechanics in problems”,
Moscow, Fizmatlit (2001).
M. V. Berry, K. E. Mount, Rep. Prog. Phys. 35, 315 (1972).
http://arxiv.org/abs/hep-ph/9603208
http://arxiv.org/abs/hep-ph/9704242
http://arxiv.org/abs/hep-ph/0304180
	Introduction
	The semiclassical method
	The model with one turn
	The model with two turns
	Introducing the system
	Classical evolution
	Classically forbidden reflections
	Discussion
	Classical motion near the turn
ABSTRACT
  We present an analytic example of two dimensional quantum mechanical system,
where the exponential suppression of the probability of over-barrier reflection
changes non-monotonically with energy. The suppression is minimal at certain
"optimal" energies where reflection occurs with exponentially larger
probability than at other energies.

<|endoftext|><|startoftext|>
Microsoft Word - APL-Ribbon_junctions_20070328-Text_with_Figures-ARXIV.doc
Molecular circuits based on graphene nano-ribbon junctions 
Zhiping Xu† 
Department of Engineering Mechanics, Tsinghua University, Beijing, 100084, China 
Graphene nano-ribbons junctions based electronic devices are proposed in this Letter. Non-
equilibrium Green’s function calculations show that nano-ribbon junctions tailored from 
single layer graphene with different edge shape and width can act as metal/semiconductor 
junctions and quantum dots can be implemented. In virtue of the possibilities of patterning 
monolayer graphene down to atomic precision, these structures, quite different from the 
previously reported two-dimensional bulk graphene or carbon nanotube devices, are 
expected to be used as the building blocks of the future nano-electronics. 
Keyword: graphene nano-ribbon, electronic transport, metal/semiconductor junction, quantum dot 
                                                 
† Email: xuzhiping@gmail.com 
Nano-electronics, or molecular electronics have been proposed as the alternative to silicon in 
future technical applications1 and have attracted great interests recently. In virtue of their unique 
structures and various functions, these nanostructures possess intriguing electromagnetic, 
mechanical and optical features. Especially, carbon based nanostructure, such as fullerene, graphene 
and carbon nanotubes are the most interesting structures because of their rich variety of excellent 
physical properties. For instance, anomalous quantum hall effects (QHE) and massless Dirac 
electronic behavior have been discovered in the graphene systems2, 3, and these discovery has 
sparked lots of investigations on this unique two-dimensional material. Tailored from monolayer 
graphene, graphene ribbon (GNR) with finite width has been shown to hold unusual electronic 
properties4, depending on their edge shape and width. In more details, ribbons with zigzag edges 
(ZGNRs) possess spin-polarized peculiar edges states and spin-polarized electronic state provides 
half-metallicity under transverse electric field and has great potential in the application as 
spintronics5. In contrast, the armchair edged ribbons (AGNR) can be either metallic or 
semiconducting depending on their width6, AGNR with width Na (named as NaAGNR in the 
conventional nomenclature) has been shown to be metallic only if Na = 3k + 2 and semiconducting 
otherwise, where k is an integer. 
From the experimental point of view, the fascinating feature of the ribbons is that the 
graphene material can be easily patterned using standard micro- or nano-electronics lithography 
methods. Unlike the carbon nanotubes or other low-dimensional nanostructures, the GNRs with 
intricate sub-micrometer structures can now be fabricated7, 8, 9, and it is believed that a combination 
of standard lithographic and chemical methods will help to pattern the graphene with atomic 
precision down to the molecular level. The high mobility μ = 2.7 m2/V.s, large elastic mean free 
path le = 600nm and phase coherence lengths lφ= 1.1 μm observed7 in the epitaxial graphene 
patterned suggest the use of pure GNR structures as the building blocks of the nanoscale confined 
and coherent electronic circuits. To realize the components such as field transistors9 and coulomb 
blockade devices, experimentally controllable metal/semiconductor junctions and quantum dots will 
be essential. As proposed by Chico et al.10, 11, these can be achieved by jointing different carbon 
nanotubes. However, the fabrication and control of the nanostructure of graphene ribbons are much 
more convenient than introducing pentagon-heptagon defects in carbon nanotubes as discussed, 
therefore it is interesting to investigate the possibilities of ribbon junction based nano-circuits. 
To this end, we have proposed several kinds of the GNRs based electronic devices in this 
Letter. We show that, by controlling the tailoring process of GNRs with different edge shape and 
width, the metal/semiconductor junctions and quantum dots can be easily implemented 
experimentally. To validate this, electronic transport calculation using the non-equilibrium Green’s 
function method have been carried out following Landauer’s approach12. The electronic structure of 
the graphene lattice is described using the nearest-neighbour π-orbital tight-binding model and the 
hopping parameter Vppπ = 2.75 eV is used. This simple topological model gives quantitative results 
comparing with the LDA results except for the gap opening at small width as the consequence of 
the length changing of σ bonds6. By solving the Green’s function, the conductance was finally 
calculated as G = G0Tr[ΓLGRΓRGA] and the density of state is expressed as D = –ImTr[GR]/π11, 
where G0 = 2e2/h is the unit quanta of conductance including the spin degeneracy, GR(A) is the 
retarded (advanced) Green’s function of the conductor and ΓL(R) is the spectral density describing 
the coupling between the left (right) lead and the conductor. In our model, the leads are represented 
using semi-infinite graphene ribbons attached to the conductor region, with the same shape and 
width. 
First of all we investigate straight metal/semiconducting junction 11AGNR/10AGNR. The 
structure of the junction is considered by simply patching two different straight ribbons together, 
leave a width mismatching at the interface. The result shown in Fig. 1 indicates a gap Eg = 0.93 eV 
near the Fermi energy and the imperfection at the interface induces a deviation of conductance from 
the step-like curve of the perfect ribbon. However, the van Hove singularities which are the 
characteristics of 1D system remain. 
To examine the detailed electronic structure of the junction, a spatial-resolved localized 
density of states (LDOS) analysis is helpful. We have grouped the atoms into slices according to 
their distance from the interface. Each 4.26 Å long slice (a unit cell of the perfect AGNR) in the 
10AGNR, 11AGNR and interface part contain 20, 22 and 21 atoms respectively. The LDOS 
averaged at different slices are plotted in Fig. 1. From the semiconducting 10AGNR side we find 
the LDOS is distorted near the interface and gap state appears through the contact with metallic 
11AGNR. However at slices far from the interface, at slice 3 for example, the perfect 
semiconduting behavior is mostly recovered. At the scattering interface the van Hove singularities 
have been smoothed and 1D metallic structure gradually emerges as the distance from interface 
increases from the 11AGNR side. The arising of gap state near the interface characterizes the metal-
semiconductor junction and suggests the possibilities of building Schottky devices. 
Furthermore, L-shape GNR junctions with different orientations can be constructed. For 
instance, the LDOS of 8ZGNR/15AGNR junction with a π/6 joint is analysized in Fig. 2. As 
expected, the edge state of the 8ZGNR spreads into the semiconducting 15AGNR side. Because of 
the ZGNR possess spin-polarized structure, so this half-metal/semiconducting junction inspires 
interests in the spin-transport devices. 
Beside of the metal/semiconductor junction, the semiconductor/semiconductor junctions 
have also been investigated and defect states in gap appear at the interface. Moreover in the 
ZGNR/ZGNR junctions, zero-conductance dips13 near Fermi energy have been observed, caused by 
the complete backward scattering. 
The metal-semiconductor junction also suggests quantum dot devices through combing two 
of them together. We now consider the junction 12AGNR/11AGNR/12AGNR. In this structure a 
central metallic ribbon is sandwiched by two semiconductor barriers where quantized states can be 
formed. Our calculation results depicted in Fig. 3 show two sharp DOS peaks inside the gap of 
semiconducting 12AGNR containing 7 unit cells, with energy E1,2 = 0.2025 and -0.2025 eV. As 
seen from the spatial-resolved LDOS at E = 0.2025, the bounded state is localized inside the 
11AGNR region. The structure of the quantum levels can be further tuned by changing the length of 
11AGNR. From our calculation, as it changes from 1 to 8 unit cells, the energy spacing between the 
nearest peaks around Fermi energy, i.e. ΔE = E1-E2, gradually decreases from 0.785 eV to 0.385 eV 
and their DOS becomes higher and sharper. 
We have also observed quantized edge states within the 10AGNR/7ZGNR/10AGNR 
junctions through introducing two π/6 joints. The results are shown in Fig. 3 where we can found 7 
LDOS peaks inside the zero-conductance gap. The quantized states with E = -0.3525, -0.1625, - 
0.05, 0, 0.05, 0.1625 and 0.3525 correspond to different LDOS patterns (see Fig. 4 for E = 0.3525). 
The higher the energy, the more nodes of the bounded standing wave have. The electron wave 
quantized pattern depends on the structure of the central region. 
In conclusion, we have proposed nano-electronic circuits based on graphene nano-ribbon 
junctions. Through tailoring GNRs into junctions of different edge shape and width, we can 
implement metal/semiconductor junctions and quantum dots in principle. In virtue of the possibility 
of molecular level patterning based on lithography and chemical methods, these devices are 
expected to be fabricated easier in comparison with other structures such as the single molecule or 
carbon nanotubes junctions, and are expected to find great applications in the large-scale integrated 
nano-circuits in future. 
The work is supported by the National Science Foundation of China through Grants 
10172051, 10252001, and 10332020 and the Hong Kong Research Grant Council (NSFC/RGC N 
HKU 764/05 and HKU 7012/04P). ZX also thanks Prof. Wenhui Duan, Dr. Tao Zhou and Dr. 
Haiyun Qian from the Department of Physics in Tsinghua University for their help on the 
calculation. 
1N. J. Tao, Nature Nanotechnology 1 173 (2006). 
2K. S. Novoselov, A. K. Geim, S. V. Morozov, D. Jiang, M. I. Katsnelson, I. V. Grigorieva, S. V. 
Dubonos and A. A. Firsov, Nature 438, 197 (2005). 
3Y. Zhang, Y. Tan, H. L. Stormer and P. Kim, Nature 438, 201 (2005). 
4Y. Kobayashi, K. Fukui, T. Enoki, K. Kusakabe and Y. Kaburagi, Phys. Rev. B 71, 193406 (2005). 
5Y. Son, M. L. Cohen and S. G. Louie, Nature 444, 347 (2006). 
6Y. Son, M. L. Cohen and S. G. Louie, Phys. Rev. Lett. 97, 216803 (2006). 
7C. Berger, Z. Song, X. Li, X. Wu, N. Brown, C. Naud, D. Mayou, T. Li, J. Hass, A. N. 
Marchenkov, E. H. Conrad, P. N. First and W. A. de Heer, Science 312, 119 (2006). 
8S. Liu, F. Zhou, A. Jin, H. Yang, Y. Ma, H. Li, C. Gu, L. Lu, B. Jiang, Q. Zheng, S. Wang and L. 
Peng, Acta Physica Sinica 54, 4251 (2005). 
9Z. Chen, Y. Lin, M. Rooks and P. Avouris, http://arvix.org/abs/cond-mat/0701599, (2007). 
10L. Chico, V. H. Crespi, L. X. Benedict, S. G. Louie and M. L. Cohen, Phys. Rev. Lett. 76, 971 
(1996). 
11L. Chico, M. P. Lopez Sancho and M. C. Munoz, Phys. Rev. Lett. 81, 1278 (1998). 
12J. Lu, J. Wu, W. Duan, F. Liu, B. Zhu and B. Gu, Phys. Rev. Lett. 90, 156601 (2003). 
13K. Wakabayashi, Phys. Rev. B 64, 125428 (2001).
FIG. 1. The metal/semiconducting junction 11AGNR/10AGNR: (Top) Conductance and DOS of 
the whole system. (Bottom) LDOS at slices near the interface. Slice n (n = 1, 2 and 3) represents the 
n-th nearest slice to the interface and the vertical scale of DOS is 0.2. 
FIG. 2. Spatial-resolved LDOS in metal/semiconducting junction 8ZGNR/15AGNR, the vertical 
scale of DOS is 0.2. 
FIG. 3. Quantum dot structure based on 12AGNR/11AGNR/12AGNR junction: (Top) Conductance 
and DOS at low bias, where two isolated sharp peaks appear inside the gap; (Bottom) Spatial-
resolved LDOS at E = 0.2025 eV, the grey dot represents the ionic site and the radius of circle 
around it corresponds to the value of LDOS. 
FIG. 4. Quantum dot structure based on 10AGNR/7ZGNR/10AGNR junction: (Top) Conductance 
and DOS; (Bottom) Spatial-resolved LDOS at E = 0.3525 eV. 
Figure 1 
Figure 2 
Figure 3 
Figure 4
ABSTRACT
  Graphene nano-ribbons junctions based electronic devices are proposed in this
Letter. Non-equilibrium Green function calculations show that nano-ribbon
junctions tailored from single layer graphene with different edge shape and
width can act as metal-semiconductor junctions and quantum dots can be
implemented. In virtue of the possibilities of patterning monolayer graphene
down to atomic precision, these structures, quite different from the previously
reported two-dimensional bulk graphene or carbon nanotube devices, are expected
to be used as the building blocks of the future nano-electronics.

<|endoftext|><|startoftext|>
Introduction
Is a finite subgroup H of units in the integral group ring ZG of a finite group
G necessarily isomorphic to a subgroup of G? Of course, torsion coming from the
coefficient ring should be excluded, that is, only finite subgroups H in V(ZG), the
group of units of augmentation one in ZG, will be considered. The question was
raised by Higman in his thesis (1940), where he gave an affirmative answer when G
is metabelian nilpotent or the affine group over a prime field; cf. Sandling (1981).
In the survey of Sandling (1984) it is included as Problem 5.4, and noted that an
affirmative answer for metabelian G was finally given by Roggenkamp (1981); but
see also Cliff, Sehgal and Weiss (1981), and Marciniak and Sehgal (2003) for a more
recent result, giving a generalization based on a theorem of Weiss (1988). These
results are really about certain ‘large’ torsion-free normal subgroups of V(ZG). For
a more complete discussion, see Chapter 4 in Sehgal’s book (1993).
As a sort of converse, one may fix a finite group H and look for groups G for
which H embeds into V(ZG), again hoping for the best, but little is known in this
respect. What is known is that if a cyclic group H of prime power order embeds
into some unit group V(ZG), then H also embeds into G (due to an observation
of Cohn and Livingstone (1965); see also Zassenhaus (1974)), and only recently in
Hertweck (2007b) it was shown that the restriction on the order can be removed
if in addition G is assumed to be solvable. In this spirit, Marciniak, at a satellite
conference of the ICM 2006, asked whether a group G necessarily has a subgroup
isomorphic to Klein’s four group provided this is the case for V(ZG). Kimmerle
immediately observed that this is implied by the Brauer–Suzuki theorem (rendered
in Kimmerle (2006)), see Section 2. Our complementary result is as follows.
Theorem A. Let G be a finite group. Suppose that V(ZG) has a noncyclic abelian
subgroup of order p2, for some odd prime p. Then the same is true for G (i.e.,
Sylow p-subgroups of G are not cyclic).
Date: October 30, 2018.
2000 Mathematics Subject Classification. Primary 16S34, 16U60; Secondary 20C05.
Key words and phrases. integral group ring, torsion unit, partial augmentation.
http://arxiv.org/abs/0704.0412v1
2 MARTIN HERTWECK
It is easy to verify that a finite p-group with no noncyclic abelian subgroup is
either cyclic or a (generalized) quaternion group, see Theorem 4.10 in Gorenstein
(1968). It comes to mind that the theory of cyclic blocks might be used in the
proof, but it is pretty simple and makes only use of a fact about vanishing of
partial augmentations of torsion units, established in Hertweck (2006, 2007a).
We remark that both results (whether p is even or odd) for a solvable group G
are covered by Theorem 5.1 in Dokuchaev and Juriaans (1996).
Note that a group G whose Sylow 2-subgroups are cyclic has a normal 2-comple-
ment, by Burnside’s well known criterion, see Theorem 4.3 in Gorenstein (1968).
We obtain the following corollary.
Corollary 1. Let G be a finite group having cyclic Sylow p-subgroups for some
prime p. Then any finite p-subgroup of V(ZG) is isomorphic to a subgroup of G.
Finally, we remark that, as with other results in this field, the theorem can be
formulated for more general coefficient rings than Z, notably for the semilocalization
of Z at the prime divisors of the order of G. Unfortunately, it is definitely wrong
for p-adic coefficient rings.
2. Kimmerle’s observation
Coming back to the initial question, we mention that in the hope for further
positive results, it is natural to impose restrictions on the prime divisors of the
finite subgroup H , i.e., to consider only π-groups H for some set π of primes (a
singleton {p}, to begin with), as has been done before in work on the stronger
Zassenhaus conjecture (ZC3), cf. Dokuchaev and Juriaans (1996). It is well known
that then, one can assume that Oπ′(G), the largest normal π
′-subgroup of G, is
trivial, for H has an isomorphic image under the natural map ZG → ZG/Oπ′(G),
see the remark after Theorem 2.2 in Dokuchaev and Juriaans (1996).
This derives from the vanishing of certain partial augmentations of the elements
of H . Recall that for a group ring element u =
g∈G agg (all ag in Z), its partial
augmentation with respect to an element x of G, or rather its conjugacy class xG
in G, is the sum
g∈xG ag; we will denote it by εx(u). The result of Cohn and
Livingstone mentioned in the introduction really says that if an element h of H is
of prime power order, then there exists an element x in G of the same order such
that εx(h) 6= 0. Note that εz(u) = az for an element z in the center of G. An
old yet fundamental result from Berman (1955) and Higman (1940) asserts that if
εz(h) 6= 0 for an element h in H and some z in the center of G, then h = z.
Coming to Marciniak’s question, suppose that G has no subgroups isomorphic to
Klein’s four group. For our purpose, we can assume that O2′(G) = 1 and that Sylow
2-subgroups of G are not cyclic. Thus Sylow 2-subgroups of G are (generalized)
quaternion, and by the Brauer–Suzuki theorem, from Brauer and Suzuki (1959), G
contains a unique involution z. For an involution u in V(ZG), the Cohn–Livingstone
result gives εz(u) 6= 0, and therefore u = z by the Berman–Higman result, answering
Marciniak’s question in the affirmative.
Theorem B (Kimmerle). Let G be a finite group. Suppose that V(ZG) has a
subgroup isomorphic to Klein’s four group. Then the same is true for G.
We do not know of a proof avoiding the use of the Brauer–Suzuki theorem.
Suppose that Sylow 2-subgroups of G are quaternion groups. Then the theorem
implies that finite 2-subgroups of V(ZG) are cyclic or quaternion groups. Taking
UNIT GROUPS WITH NO NONCYCLIC ABELIAN FINITE SUBGROUPS 3
into account the structure of the quaternion groups, and the Cohn–Livingstone
result, one obtains the following corollary.
Corollary 2. Let G be a finite group whose Sylow 2-subgroups are quaternion
groups (ordinary or generalized). Then any finite 2-subgroup of V(ZG) is isomor-
phic to a subgroup of G.
3. Proof of Theorem A
The partial augmentations of a torsion unit in V(ZG) encode its character values
in a way establishing a connection to group elements which respects a divisibility
relation between orders. We will make use of a lemma which is an easy consequence
of this fact.
Lemma 3. Let u be a torsion unit in V(ZG) of, say, order n. Let s be a natural
integer coprime to n, so that st ≡ 1 mod n for another natural integer t. Then for
all x in G whose order divide n, we have εx(u
s) = εxt(u).
Proof. Let ζ be a primitive n-th complex root of unity, and let σ be the Galois auto-
morphism ofQ(ζ) sending ζ to ζs. Let x1, . . . , xk be representatives of the conjugacy
classes of G whose elements have order dividing n. Note that then xt1, . . . , x
k is an-
other system of representatives. By Theorem 2.3 in Hertweck (2007a), εx(u) 6= 0 is
possible only for elements x whose order divide n. Thus for any ordinary irreducible
character χ of G, we have
εxi(u
s)χ(xi) = χ(u
s) = χ(u)σ =
εxi(u)χ(xi)
εxi(u)χ(x
i ) =
(u)χ(xi).
Since the character table of G, stripped off from any additional information, is an
invertible matrix, it follows that εxi(u
s) = εxt
(u) for all indices i, which proves the
lemma. �
Corollary 4. Let u be a torsion unit in V(ZG) of, say, order n. Then for any x
in G whose order divides n,
s∈(Z/nZ)×
s∈(Z/nZ)×
εxs(u).
Corollary 5. Suppose that for a prime divisor p of the order of G, all elements
of order p in G are conjugate to a power of some fixed element x. Let u be a
torsion unit in V(ZG) of order p. Then
i=1 u
i and
i=1 x
i have the same
partial augmentations.
Proof. Let k be the number of conjugacy classes of elements of order p in G. By
Corollary 4 and Theorem 2.3 in Hertweck (2007a),
εxi(u) =
yG : y∈〈x〉
εy(u) =
Applying again Theorem 2.3 from Hertweck (2007a), the corollary follows. �
4 MARTIN HERTWECK
We will apply this by means of the following formula relating ranks of an idem-
potent to arithmetical properties of the group.
Corollary 6. Suppose that for a prime divisor p of the order of G, all elements
of order p in G are conjugate to a power of some fixed element x. Suppose further
that V(ZG) contains an elementary abelian subgroup U of order p2. Then for any
ordinary character χ of G,
(1) χ
χ(1) + (p+ 1)
χ(xi)
We now turn to the proof of Theorem A. Suppose that G has a cyclic Sylow
p-subgroup P (p = 2 is allowed). Let x be an element of order p in P , and
set N = NG(〈x〉). Suppose further that V(ZG) contains an elementary abelian
subgroup U of order p2. Let χ be the character of G which is induced from the
principal irreducible character of P . Then the rank in (1) is
(|G : P |+ |N : P |(p2 − 1)).
If χ is a character of G which is induced from a faithful irreducible character of P ,
the rank in (1) is
(|G : P | − |N : P |(p+ 1)).
The difference of these ranks is |N : P |(p2 + p)/p2, which is not an integer. This
contradiction proves the theorem.
In view of Corollaries 1 and 2, one may be tempted to investigate the analogous
problem for groups with dihedral Sylow 2-subgroups. These groups were classified
by Gorenstein and Walter, and listed, for example, on p. 462 in Gorenstein (1968).
To indicate what can be done by now, we end with an example.
Note that the order of a finite subgroup of V(ZG) divides the order of G, see
Lemma 37.3 in Sehgal (1993); a fact which, surprisingly enough from today’s point
of view, is in this generality not recorded in Higman’s thesis.
Example 7. For the alternating group A7, any finite 2-subgroup of V(ZA7) is
isomorphic to a subgroup of A7.
Proof. Sylow 2-subgroups of A7 are dihedral of order 8. Let x be an element of
order 4 in A7. Then x
G and (x2)G are the only conjugacy classes of elements of
order 4 and 2, respectively. There is an (irreducible) character χ of A7 of degree 6
which is afforded by a deleted permutation representation. We have χ(x) = 0 and
χ(x2) = 2.
Let U be a finite 2-subgroup of V(ZA7). If U is of order 2, then U is rationally
conjugate to a subgroup of A7 by Corollary 3.5 in Hertweck (2006). If U is of
order 4, the Luthar–Passi method as described in Hertweck (2007a) is not sufficient
to guarantee rational conjugacy to a subgroup of A7: for a unit u of order 4 in
V(ZA7) one cannot exclude the possibility of having (εx2(u), εx(u)) = (2,−1) when
χ(u) = 4. In this case, also χ(u−1) = 4. Anyway, U is isomorphic to a subgroup of
A7, and the same is true if U is a Klein’s four group.
Suppose that U is abelian of order 8. By the Cohn–Livingstone result, U is not
cyclic. Set e = 1
u∈U u. Since e is an idempotent, χ(u) is a rational integer. If
U is elementary abelian, then χ(e) = 1
(χ(1) + 7χ(x2)) = 20
, which is impossible.
UNIT GROUPS WITH NO NONCYCLIC ABELIAN FINITE SUBGROUPS 5
Thus U contains 3 elements of order 2 and 4 elements of order 4. Trying out all
possibilities shows that again χ(e) is not a rational integer.
It remains to consider the case when U is the quaternion group. Let u be an
element of order 4 in U . Since χ(u2) = χ(x2), the restriction of the character χ
to U is the sum of four linear characters and the one of degree two. But this is
impossible since χ is afforded by a rational representation, while the character of
degree two of the quaternion group comes from the block of the rational quaternion
algebra (whence the name of the group). �
References
Berman, S. D. (1955). On the equation xm = 1 in an integral group ring. Ukrain. Mat.
Ž. 7:253–261.
Brauer, R., Suzuki, M. (1959). On finite groups of even order whose 2-Sylow group is a
quaternion group. Proc. Nat. Acad. Sci. U.S.A. 45:1757–1759.
Cliff, G. H., Sehgal, S. K., Weiss, A. R. (1981). Units of integral group rings of metabel-
ian groups. J. Algebra 73(1):167–185.
Cohn, J. A., Livingstone, D. (1965). On the structure of group algebras. I. Canad. J.
Math. 17:583–593.
Dokuchaev, M. A., Juriaans, S. O. (1996). Finite subgroups in integral group rings. Can-
ad. J. Math. 48(6):1170–1179.
Gorenstein, D. (1968). Finite groups. New York: Harper & Row.
Hertweck, M. (2006). On the torsion units of some integral group rings. Algebra Colloq.
13(2):329–348.
Hertweck, M. (2007a). Partial augmentations and Brauer character values of torsion units
in group rings. Comm. Algebra, to appear (e-print arXiv:math.RA/0612429v2).
Hertweck, M. (2007b). The orders of torsion units in integral group rings of finite solvable
groups. Comm. Algebra, to appear (e-print arXiv:math.RT/0703541).
Higman, G. (1940). Units in group rings. Ph.D. thesis. University of Oxford (Balliol
College).
Kimmerle, W. (2006). Arithmetical properties of finite groups. Talk delivered at the Math
Colloquium of the Vrije Universiteit Brussel.
Marciniak, Z., Sehgal, S. K. (2003). The unit group of 1 + ∆(G)∆(A) is torsion-free. J.
Group Theory 6(2):223–228.
Roggenkamp, K. W. (1981). Units in integral metabelian grouprings. I. Jackson’s unit
theorem revisited. Quart. J. Math. Oxford Ser. (2) 32:209–224.
Sandling, R. (1981). Graham Higman’s thesis “Units in group rings”. In: Integral rep-
resentations and applications (Oberwolfach, 1980). Lecture Notes in Math. Vol. 882.
Berlin: Springer, pp. 93–116.
Sandling, R. (1984). The isomorphism problem for group rings: a survey. In: Orders
and their applications (Oberwolfach, 1984). Lecture Notes in Math. Vol. 1142. Berlin:
Springer, pp. 256–288.
Sehgal, S. K. (1993). Units in integral group rings. Pitman Monographs and Surveys in
Pure and Applied Mathematics Vol. 69. Harlow: Longman Scientific & Technical.
Weiss, A. (1988). Rigidity of p-adic p-torsion. Ann. of Math. (2) 127(2):317–332.
Zassenhaus, H. (1974). On the torsion units of finite group rings. In: Studies in mathe-
matics (in honor of A. Almeida Costa). Lisbon: Instituto de Alta Cultura, pp. 119–126.
Universität Stuttgart, Fachbereich Mathematik, IGT, Pfaffenwaldring 57, 70550
Stuttgart, Germany
E-mail address: hertweck@mathematik.uni-stuttgart.de
arXiv:math.RA/0612429v2
arXiv:math.RT/0703541
	1. Introduction
	2. Kimmerle's observation
	3. Proof of Theorem A
	References
ABSTRACT
  It is shown that in the units of augmentation one of an integral group ring
$\mathbb{Z} G$ of a finite group $G$, a noncyclic subgroup of order $p^{2}$,
for some odd prime $p$, exists only if such a subgroup exists in $G$. The
corresponding statement for $p=2$ holds by the Brauer--Suzuki theorem, as
recently observed by W. Kimmerle.

<|endoftext|><|startoftext|>
Introduction
The experimental announcement for the discovery of the pentaquark Θ+(1540)1)
triggered tremendous amount of theoretical and experimental works on the exotic
states. Although the existence of such exotic states is still not so obvious, the exotics
provide a good opportunity to get the deeper insight of the hadron structures and
their connection to QCD. One of approaches from QCD to exotics is the QCD sum
rule (QSR),2) which relates informations of QCD to the hadronic properties through
the correlator analysis for the interpolating fields of hadrons. The Borel transformed
sum rules with the simplest pole + continuum parametrization are given as (i = 0, 1
correspond to the chiral even and odd part, respectively)
(ope)
i (−Q
2) = λ2i e
−m2/M2 +
ds e−s/M
(ope)
i (s), (1
where the relation ±mλ20 = λ21 holds due to the spinor structure and the relative
sign of the residues λ2i represents with the parity of the resonance state. Using these
sum rules, we can derive the approximated expressions of the mass and residue as a
function of M and sth. To extract properties of the low energy excitations with good
accuracy, we need to treat sum rules in the appropriateM2-region, i.e., Borel window,
where the low energy correlation is large enough compared to the contaminations
from high energy components which have no relations with properties of low-lying
resonances. The setting the Borel window is the most important step in QSR and,
only within this window, we can evaluate the physical quantities.
In the exotic cases, as reported in Ref. 3), it is extremely difficult to find
the appropriate Borel window in contrast to the usual meson and baryon cases.
∗) e-mail address: torujj@ruby.scphys.kyoto-u.ac.jp
typeset using PTPTEX.cls 〈Ver.0.9〉
http://arxiv.org/abs/0704.0413v1
2 Toru Kojo, Daisuke Jido and Arata Hayashigaki
This is because the OPE convergence is very slow and the unwanted high energy
components dominate the spectral integral. In addition, we often encounter the
artificial stability of the physical quantities against M2-variation. This is the case
that physical quantities depend too strongly on the threshold parameter sth and not
on the low energy correlations which we want to extract. To attack these serious
problems common to the exotics, we propose a new approach and apply it to the Θ+,
assuming its quantum number as I = 0, J = 1/2, as one example of the exotics.4)
§2. OPE and favorable set up of the correlators
To find the Borel window, it is necessary to increase low energy informations in
the spectral function efficiently and, at the same time, reduce high energy contami-
nations. For these purposes, we take the following treatments.
Inclusion of the higher dimension terms of OPE is particularly important be-
cause they strongly reflect the low energy dynamics. For example, in the case of the
sum rules for ρ and A1 mesons, the dim.0 and 4 terms are the same due to the chiral
symmetry realized in the high energy, and the splitting of the masses is explained
only after the inclusion of dim.6 terms, 〈q̄q〉2, which appear due to the chiral sym-
metry breaking. From these observations, we perform the OPE calculation up to
dim.15 within factorization hypothesis both for taking into account the low energy
correlations and for the confirmation of good OPE convergence.
As later shown, simple inclusion of the low energy correlations through the higher
dimension terms is found to be not sufficient to find the Borel window because
high energy contaminations are too large in the QSR for the exotics. To reduce
the high energy contaminations, we take the difference between correlators for two
interpolating fields with similar structure but different chirality each other, i.e.,
d4x eiq·x
∣T [P (x)P̄ (0) − t S(x)S̄(0)]
ds e−s/M
Im[ΠP0 (s)− tΠS0 (s)] q̂ + Im[ΠP1 (s)− tΠS1 (s)]
, (2.1)
where Π0, Π1 correspond to the chiral even and odd part respectively, and
P = ǫabcǫdef ǫcfg{uTaCdb}{uTd Cγµγ5de}γµCsTg , (2.2)
S = ǫabcǫdef ǫcfg{uTaCγ5db}{uTd Cγµγ5de}γµCsTg . (2.3)
Here the only difference in these interpolating fields is that the first diquark structures
have the opposite chirality.
Let us first explain in the case of the chiral even part. Since they show the same
behavior in high energy due to the chiral symmetry, after the subtraction of two
correlators (t = 1 case), the irrelevant high energy contributions are expected to be
canceled out in the similar way as the Weinberg sum rules.5) In terms of OPE, this
cancellation corresponds to the cancellation of the lower dimension terms. It is not
a priori evident whether the low energy correlations remain enough even after the
subtraction because the low energy contribution could also cancel out. Our Borel
Exotic Hadron in Pole-dominated QCD Sum Rules 3
analysis, however, reveals that, in the case of t = 1, the large low energy correlation
remains enough even after the subtraction. As a result, we can find the Borel window
in the relatively large M2-region.
On the other hand, for the chiral odd part, the subtraction procedure corre-
sponding to t = 1 case does not lead the cancellation of the high energy components
because chiral odd part is constructed of the chiral symmetry breaking terms. How-
ever, in the case of t = 1, the OPE convergence is found to be very good and then we
can find the Borel window in the small M2-region where high energy contaminations
are suppressed due to the Borel factor e−s/M
in the integral of the spectral function.
§3. Borel analysis for mass and residue
Here we explain our criterion for the Borel window. The lower bound of the
Borel window is given so that the highest-dimensional terms in the truncated OPE
are less than 10% of its whole OPE, while the upper bound is determined by the
region where the absolute value of the pole contribution is larger than the absolute
value of the integrated spectral function in the region s ≥ sth. Note that the 50%
pole contribution in our criterion is extremely large in comparison with any prior
pentaquark sum rules, where the pole contributions are not more than 20%.3)
To recognize the problems in the case of QSR for exotics, let us see Fig.1 for
M2-dependence of the mass in the cases of t = −1, 0, 10 corresponding to PP̄ +SS̄,
PP̄ , SS̄ cases respectively. The threshold value is fixed to typical value
sth = 2.2
GeV. In these cases, we fail to find stabilities of the mass in the M2-region lower
than the upper bound of the Borel window. The stabilities above the upper bound
are simply artifacts which often appear in QSR. Fig.1 shows that typical mass of PP̄
case is much smaller than that of SS̄, and then we can expect that the low energy
correlation of PP̄ is much larger than that of SS̄. This observation leads that even
after the subtraction PP̄ − SS̄ (t = 1 case), the low energy correlation can remain
enough.
Now we see the case of around t = 1. We tune the value of t around t = 1 to
get the widest Borel window. As expected, for even part (t = 0.9), the high energy
SS (t = 10)
PP (t = 0)
PP (t = 0)
SS (t = ‐1)
SS (t = ‐1)
SS (t = 10)
Fig. 1. The behavior of the mass as a function of M2 for t = −1, 0, 10. The left arrows represent
the upper bound of the Borel window. In the smaller M2-region than the upper bound, we can
not find stable region of the mass. The stabilities above the upper bound are simply artifacts
which often appear in QSR.
4 Toru Kojo, Daisuke Jido and Arata Hayashigaki
even odd
(t=1.1)(t=0.9)
Fig. 2. The behavior of the mass as a function of M2. The left (right) arrows represent the upper
(lower) bound of the Borel window. We succeed to find the Borel window and stabilities of
mass.
contaminations are canceled out due to chiral symmetry and we find the wide Borel
window in the relatively large M2-region. On the other hand, for odd part (t = 1.1),
thanks to the good OPE convergence, we also find the wide Borel window in the
small M2-region. The threshold values are taken to make the physical quantities
most stable in the Borel window.
The best stability is achieved with
sth = 2.2 GeV (even) and 2.1 GeV (odd),
giving mΘ+ = 1.64 GeV (even) and 1.72 GeV (odd) respectively. The values of the
residue are also obtained from the chiral even and odd sum rules as λ20 = (3.0±0.1)×
10−9 GeV12 and λ21/mΘ+ = (3.4 ± 0.2) × 10−9 GeV12. It is remarkable that these
numbers are quite similar with the close t. This implies that our analysis investigates
consistently the same state in the two independent sum rules. Note that from the
relative sign of the residues, we assign positive parity to the observed Θ+ state.
In conclusion, we perform the Borel analysis for Θ+ with special setup of the
correlators in order to find the Borel window. Within uncertainties of the condensate
value, independent analyses for the chiral-even and odd sum rules give the consistent
values of the Θ+ mass, 1.68± 0.22 GeV, and the residue. The parity is found to be
positive.
Acknowledgements
We thank Profs. M. Oka, A. Hosaka and S.H. Lee for useful discussions about
QSR for the exotics during the YKIS2006 on ”New Frontiers on QCD” held at
the Yukawa Institute for Theoretical Physics. This work is supported in part by
the Grant for Scientific Research (No.18042001) and by Grant-in-Aid for the 21st
Century COE ”Center for Diversity and Universality in Physics” from the Ministry
of Education, Culture, Sports, Science and Technology (MEXT) of Japan.
References
1) T. Nakano et al., Phys. Rev. Lett. 91 (2003), 012002.
2) M.A. Shifman, A.I. Vainshtein, and V.I. Zakharow, Nucl. Phys. B 147 (1979), 385.
3) R.D. Matheus and S. Narison, Nucl. Phys. Proc. Suppl. 152 (2006), 236.
4) T. Kojo, A. Hayashigaki, and D. Jido, Phys. Rev. C 74 (2006), 045206
5) S. Weinberg, Phys. Rev. Lett. 18 (1967), 507.
	Introduction
	OPE and favorable set up of the correlators
	Borel analysis for mass and residue
ABSTRACT
  We study pentaquark $\Theta^{+} (I=0,J=1/2)$ in the QCD sum rules emphasizing
that we can not extract any properties of the pentaquark outside of the Borel
window. To find the appropriate Borel window, we prepare a favorable set up of
the correlators and carry out the operator product expansion up to dimension 15
within factorization hypothesis. Our procedures reduce the unwanted high energy
contaminations and enhance the low energy correlation. In the Borel window,
independent analyses for the chiral-even and odd sum rules give the consistent
values of the $\Theta^+$ mass, $1.68\pm0.22$ GeV, and the residue. The parity
is found to be {\it positive}.

<|endoftext|><|startoftext|>
Introduction
Left-handed materials [1] have long been considered a theoretical oddity. Since it has been
demonstrated that they could be produced using metamaterials [2], they have attracted
much attention. The basic physics of left-handed materials (LHM) is truly exotic and has
been completely ignored until recently, it renews the physics of lamellar structures to the
extend that a bare slab of LHM exhibits many surprising properties : it can for instance
support unusual guided modes [3,4] or behave as a perfect lens [5]. In this paper, we study
the exotic properties of the different types of leaky waves supported by a left-handed slab.
Given the importance of the left-handed slab for both fundamental and applied works,
there is obviously a need for a clear understanding of these properties.
We particularly show that two types of leaky waves are supported by such a structure
(i) leaky slab waves which are always backward due to negative refraction and (ii) leaky
surface waves which do not exist for a right-handed slab and which can be backward or
forward. The excitation of these modes leads to positive or negative giant lateral shifts,
the latter being rather exotic [6].
http://arxiv.org/abs/0704.0414v3
2 Leaky modes and giant lateral shifts
A leaky mode [6] is a solution of the wave equation which verifies the relation dispersion of
a structure but with a propagative solution above and (or) under the structure. Whereas
a guided mode has a real propagation constant, the propagation constant of a leaky
mode is complex because the energy of the waves leaks out of the structure and the
waves is attenuated. A leaky wave is thus a complex solution of the dispersion relation
and a complex plane analysis is thus particularly relevant for a thorough analysis of its
properties. Let us underline that a leaky mode may be either forward, which is common, or
backward, leading to a propagation constant which has a positive (respectively negative)
imaginary part.
Let us consider a slab characterized by ε2 and µ2 surrounded by right-handed media with
ε1 and µ1 (resp. ε3 and µ3) above (resp. under) the slab as shown figure 1. The values
we have chosen for ε2 and µ2 are arbitrary but realistic [7] so that this structure could be
realized using split-ring resonators and wires.
ε µ3 3
ε µ1 1
ε µ2 2
Figure 1: The LHM slab of thickness h surrounded by right-handed media.
We may assume that ε1 µ1 ≥ ε3 µ3 with no loss of generality.
The relation dispersion of such a structure can be written
r21 r23 = exp(−2iγ2 h) (1)
where γi =
εi µi k
0 − α2, k0 = ωc =
and rij =
κi−κj
κi+κj
with κi =
in TE polarization
(or κi =
in TM polarization). Since ε1 µ1 ≥ ε3 µ3 and we are concerned with leaky
waves, we will only consider values of α such that α <
ε1 µ1 k0, which means that the
solution will always be propagative at least in medium 1.
Let us now consider the reflection coefficient of a plane wave exp(i(αx+γ z−ω t)) coming
from upwards with an angle of incidence θ so that α = n k0 sin θ. Its reflection coefficient
can be written
r23 exp(2iγ2 h)− r21
1− r21 r23 exp(2iγ2 h)
using the above definitions.
It is obvious that when the relation dispersion is verified, then the reflection coefficient
presents a pole. A leaky mode thus corresponds to a pole of the reflection coefficient.
A zero, located on the other side of the real axis, corresponds to each pole. As we will
see in the following, a zone where the phase of r quickly varies lies between a pole and
its corresponding zero. This zone crosses the real axis, so that the presence of a pole is
responsible for a swift variation of the phase on the real axis.
When considering the reflection of a gaussian beam on a structure whose reflection co-
efficient has a modulus equal to one (so that it can be written r = eiφ), the lateral
displacement of the reflected beam’s barycenter along the interface is given by the well
known formula
δ = −dφ
. (3)
This lateral displacement is the sign that a leaky wave has been excited by the incident
beam. The reflected beam then has two components : the part which is reflected by
the first interface of the structure (whose barycenter is not particularly displaced) and
the leaky wave itself [6]. The reflected beam is heavily distorted by the leaky wave and
presents an exponentially decreasing tail so that its barycenter is largely displaced : this
is the so-called giant lateral shift.
This effect is sometimes called a giant Goos-Hänchen effect, but in this case the shift is
due to the excitation of a leaky mode [6] and not, as in the real Goos-Hänchen effect [8,9],
to the total reflection.
3 The left-handed slab
With left-handed materials, though, negative lateral shifts seem to be much more common
[10–14] than once expected [6]. Here we will consider the case of a left-handed slab (i.e.
if ε2 < 0 and µ2 < 0) and explain why the leaky modes supported by such a structure are
usually backward. Our explanations will be supported by a complex plane analysis of the
leaky modes.
Here the expression (2) of the reflection coefficient remains perfectly valid. We will now
distinguish two cases : the case when the solution is propagative in the left-handed medium
and the case when the solution is evanescent in region 2.
3.1 Leaky slab modes
When the field is propagative in the left-handed slab, large negative lateral shifts have
been reported but not interpreted [13]. These shifts are due to the excitation of leaky slab
modes or Perot-Fabry resonances of the slab at non normal incidence. Such leaky modes
have already been studied for a right-handed slab [15] and they can be considered as
constructive interferences of the multiple beams which are produced by reflections on the
interfaces of the slab. In the case of a left-handed slab, since the first beam undergoes a
negative refraction as shown figure 2 these constructive interferences will logically generate
a backward leaky mode. We may thus conclude that the existence of such a backward
leaky mode is linked to negative refraction.
Figure 2: Modulus of the field for a thick left-handed slab with ǫ1 = ǫ3 = µ1 = µ3 = 1,
ǫ2 = −3,µ2 = −1 and h = 60 λ using a gaussian incident beam with a waist of 20 λ and
an incidence angle of θ = 45.
This argument is not a proof, though : unexpected lateral shifts have been reported when
the beams interfere destructively [16]. But if the leaky modes are backward, then the
corresponding solutions of the dispersion relation and the poles of the reflection coefficient
should have a negative imaginary part. This is what is shown figure 3.
Figure 3: The phase of the reflection coefficient in a part of the complex plan [0, n1 k0] +
i[−k0
]. Each black point represents a pole and each white point a zero. The cut line is
clearly visible here. The rapid variation of the phase which is due to each pole is obvious.
Two types of leaky slab waves should be distinguished (i) L2 waves which are leaky in
both the upper and the lower media and (ii) L1 waves which are leaky only in the upper
medium and evanescent in the lower one. The latter correspond to the poles located under
the cut line.
Using complex plane analysis we will now try to show that all the solutions of the disper-
sion relation 1 are located in the lower part of the complex plane, meaning that all the
leaky modes are backward.
When the relation dispersion is satisfied, then the following condition holds :
|r23 r21| = e2 γ
h. (4)
As demonstrated in the annex, |rij > 1 whenever one of the media is left-handed. Since
medium 2 is left-handed then the condition
h > 1 (5)
should be satisfied, which is possible for γ′′2 > 0 and therefore for α
′′ < 0 (see the annex
for details). The fact that rij > 1 is thus the main reason why the poles of r are under
the axis and why the leaky slab modes are backward.
We must underline the fact that our demonstration is valid only for the first Riemann
sheet : our proof cannot exclude that there may be some poles on the other Riemann sheet
above the real axis, corresponding to forward L1 leaky slab waves when ε1 µ1 > ε3 µ3. But
we could not find any.
3.2 Leaky surface modes
Let us now consider the situation in which the field is evanescent in the left- handed
medium. Then γ2 is purely imaginary on the real axis. Since e
2 γ′′
h tends towards infinity
when h → +∞ then relation (4) can be verified only if r23 has a pole (r21 cannot have
one since the field is always propagative in the upper medium). This means that the
structure may support a leaky mode only if the interface between medium 2 and 3 can
support a guided mode. It is now well-known that such an interface actually supports a
surface mode [17,18] which can, depending on media 2 and 3, be backward (resp. forward)
corresponding to a pole under the real axis (resp. above the real axis but on the other
Riemann sheet). The leaky wave always has the same propagation direction as the surface
mode, whatever the thickness of the slab, as shown figure 4. In the case of a forward leaky
wave, only the zero belongs to the first Riemann sheet, just under the real axis. The pole
shown figure 4 belongs to the other Riemann sheet.
 0.05
 0.15
 0.25
 0.35
 1.05  1.1  1.15  1.2  1.25  1.3  1.35  1.4  1.45  1.5
−0.45
−0.35
−0.25
−0.15
−0.05
 1.7  1.75  1.8  1.85  1.9  1.95  2  2.05  2.1  2.15  2.2
Figure 4: Location of the poles in the α
complex plane for different values of h with
ε1 = 9, µ1 = µ3 = ε3 = 1 and (a) ε2 = −0.5 and µ2 = −1.5, showing a forward surface
mode and (b) ε2 = −5 and µ2 = −0.5, showing a backward surface mode.
Figure 5 finally shows the excitation of a backward leaky surface wave by a gaussian
beam. The chosen values of µ2 may be obtained with simple split ring resonators [19] for
instance.
Figure 5: Modulus of the field for a left-handed slab with ǫ1 = 9,ǫ3 = µ1 = µ3 = 1,
ǫ2 = −0.5,µ2 = −1.5 and h = 0.6 λ using a gaussian incident beam with a waist of 20 λ
and an incidence angle of θ = 21.496. The pole corresponding to the leaky mode is located
at αp = (1.0993 + 0.001267i) k0.
4 Fundamental property
Let us a consider a structure with left-handed materials. We will call corresponding
right-handed structure the structure obtained by replacing any left-handed medium by a
medium with opposite permittivity and permeability, without changing the geometrical
parameters.
In this section, we will concentrate on the link between the reflection coefficient of a
left-handed slab and the one of its corresponding right-handed structure.
Let us consider the interface between a right-handed medium labelled i and a left-handed
medium j. The reflection coefficient of such an interface is rij . We will now define
r+ij the reflection coefficient of an interface between medium i and right-handed medium
characterized by |εj| and |muj|. It is not difficult to see, from the expression of rij that
. (6)
This allows to understand why the Goos-Hänchen shift of an interface between a right-
and a left-handed medium is the opposite of the corresponding right-handed structure [11]
since the phases of both structures are opposite on the real axis.
The reflection coefficient r can now be written
e2iγ2 h
1− e2iγ2 h
r+23 e
−2iγ2 h − r+21
1− r+21 r+23 e−2iγ2 h
Since
except when z is on the cut line, then γ(z∗) = γ(z)∗ and hence r+ij(z)
∗) so that
r(z)∗ =
∗) e2iγ2(z
∗)h − r+21(z∗)
1− r+21(z∗) r+23(z∗) e2iγ2(z
, (10)
which can simply be written
r(z)∗ = r+(z∗), (11)
where r+ is the coefficient reflection of the corresponding right-handed slab. Note that
this relation does not hold on the cut line, but that it holds for the two Riemann sheets.
This means that the poles of the left-handed slab and the poles of the corresponding
right-handed slab are symmetrical with respect to the real axis. This means that L2
waves can be excited for the same incidence angle for both structures. This is not the
case for L1 modes : the function r on the real axis is continuous with the lower part of
the first Riemann sheet whatever the situation and the poles which are above the cut line
thus have no effect on the real axis.
As an example, we have computed the field in TE polarization inside and around the slab
when it is illuminated with a gaussian beam for the left-handed slab and its corresponding
right-handed structure. The results are shown figures 6 and 7.
Figure 6: Modulus of the field for a symmetrical slab with ǫ1 = ǫ3 = 9, µ1 = µ3 = 1,
ǫ2 = 1.5, µ2 = 1 and h = 1.3 λ using a gaussian incident beam with a waist of 20 λ and
an incidence angle of θ = 22.78.
Figure 7: Modulus of the field for a symmetrical slab with ǫ1 = ǫ3 = 9, µ1 = µ3 = 1,
ǫ2 = −1.5, µ2 = −1. and h = 1.3 λ using a gaussian incident beam with a waist of 20 λ
and an incidence angle of θ = 22.78. The pole corresponding to the leaky mode is located
at αp = (1.16823− 0.01125i) k0
5 The grounded left-handed slab
The grounded left-handed slab is a much more simple structure for (i) there is no need to
distinguish two different types of leaky slab modes and (ii) the structure can not support
any leaky surface mode. All the leaky modes are then slab modes and are found for
α < n2 k0. The reflection coefficient of the grounded slab is given by (2) with r23 = −1 for
the TE polarization and r23 = 1 for the TM polarization so that the relation dispersion
gives
|r12| = e2 γ
h. (12)
Since |r12| > 1 then all the solutions of the dispersion relation are located in the lower
part of the complex plane so that they are all backward.
It is then easy to show that the relation r+(z)∗ = r(z∗) still holds. As a consequence,
the leaky modes of a grounded left-handed slab and of its corresponding right-handed
structure can be excited for the same angle of incidence of the impinging beam.
6 Conclusion
In this paper, we have thoroughly studied the leaky modes of a left-handed slab for realistic
values of the permittivity and permeability of the left-handed medium [7,19,20] which can
be obtained using structures like split-ring resonators. Our results can be summarized as
follows. Left-handed slab may support two types of leaky modes :
• Leaky slab modes, which are always backward because of the negative refraction
phenomenon. When the transmission is not null, leaky modes of the left-handed
slab and of its corresponding right-handed structure are excited for the same angle
of incidence.
• Leaky surface modes, which may be backward or forward depending on the propa-
gation direction of the surface wave itself.
This work could help to interpret many giant lateral shifts as excitations of exotic leaky
waves [12, 13, 16]. Since the existence of backward slab waves is linked to the property
of negative refraction, and since these leaky waves constitute a signature of a left-handed
slab behavior we think that they could be used to characterize the left-handedness of
metamaterial or photonic crystal structures far better than other methods [21].
Acknowledgments
This work has been supported by the French National Agency for Research (ANR), project
030/POEM. The authors would like to thank Alexandru Cabuz and Kevin Vynck for their
help.
Annex
In this annex, we will clearly define the choice we have made for the definition of the
complex square root and prove that for z on the first Riemann sheet (but not on the cut
line) we have |rij(z)| > 1 when media i and j are not both right-handed.
Since the square root can be continued on the complex plane, r and rij can be continued
as well. We have chosen to take
2 with z = r eiθ and θ ∈]−π, π], as a definition
of the square root. This means that we have placed the cut line on the negative part of
the real axis and if x is a positive real,
−x = i
x. This defines the square root on the
entire complex plane, to which we refer as the first Riemann sheet. When we write that
z is on the second Riemann sheet, it will mean that we have taken the opposite of
defined above.
With this choice, we have (i) ℜ(
z) ≥ 0 (ii)
for z on both sheets but not on
the cut line (iii) if ℑ(z) < 0, ℑ(
z) < 0 and if ℑ(z) > 0, ℑ(
z) > 0 (iv) the function
γ(z) =
ǫ µ k20 − z2 has a cut line on the real axis (on ] − ∞,−n k0] ∪ [n k0,+∞] more
precisely) and the function γ on the real axis is continuous with the part of the complex
plane which is under the cut line : when z passes through the cut line from the first
Riemann sheet (coming from the lower part of the plane) to the second Riemann sheet,
γ(z) is continuous. When a function which can be written using γ(z) presents a pole,
it must be found either (i) for z on the first Riemann sheet and under the real axis (we
will say that the pole itself is on the first Riemann sheet in this case) or (ii) for z on the
second Riemann sheet but above the real axis.
We have
rij =
κi − κj
κi + κj
. (13)
The modulus of rij reads as
|rij|2 =
(κi − κj) (κ∗i − κ∗j )
(κi + κj) (κ
i + κ
|κi|2 + |κj|2 − 2 (κ′i κ′j + κ′′i κ′′j )
|κi|2 + |κj|2 + 2 (κ′i κ′j + κ′′i κ′′j )
, (15)
where κ = κ′ + i κ′′.
Let us define x and y the real and imaginary part of z = x + i y on the first Riemann
sheet. Let us assume that x > 0. We have
n2 k20 − z2 =
n2 k20 − x2 + y2 − 2 i x y. (17)
If y > 0, then x y > 0 and thus ℑ(n2 k20 − z2) < 0 so that finally ℑ(γ) < 0. If y < 0,
then x y < 0 so that ℑ(γ) > 0. Since γ(−z) = γ(z) the result will hold for x < 0 too and
for x = 0, γ(z) is real and positive so that the result obviously holds. So the imaginary
part of γ(z) is positive (resp. negative) when the imaginary part of z is negative (resp.
positive).
For any right-handed medium, κ has the same property than γ. For a left-handed medium,
since κ = γ
or κ = γ
depending on the polarization, the imaginary part of κ has the sign
of ℑ(z). Since i and j are not both right-handed, then κ′′i and κ′′j have not the same sign
and the product κ′′i κ
j is always negative. Since ℜ(
z) > 0 for all z on the first Riemann
sheet then κ′i κ
j is always negative too.
Finally, since κ′i κ
j + κ
j < 0, we have |rij| > 1 for all z except on the real axis. Please
note that rij is not, in the particular case of a left-handed medium, the reflection coefficient
on the interface [22].
REFERENCES
[1] V. Veselago, “The Electrodynamics of substances with simultaneously negative values
of ǫ and µ”, Usp. Fiz. Nauk. 92 517 (1967).
[2] R. Shelby, D. Smith, and S. Shultz, “Experimental Verification of a Negative Index
of Refraction”, Science 292 77 (2001).
[3] I. Shadrivov, A. Sukhorukov, and Y. Kivshar, “Guided modes in negative-refractive-
index waveguides”, Phys. Rev. E 67 057602 (2003).
[4] P. Tichit, A. Moreau, and G. Granet, “Localization of light in a lamellar structure
with left-handed medium : the Light Wheel”, Opt. Expr. 15 14961–14966 (2007).
[5] J. Pendry, D. Schuring, and D. Smith, “Controlling Electromagnetic Fields”, Science
312 1780 (2006).
[6] T. Tamir and H. Bertoni, “Lateral displacement of optical beams at multilayered and
periodic structures”, J. Opt. Soc. Am. 61 1397 (1971).
[7] D. R. Smith, S. Schultz, P. Makos, and C. M. Soukoulis, “Determination of effec-
tive permittivity and permeability of metamaterials from reflection and transmission
coefficients”, Phys. Rev. B 65 195104 (2002).
[8] F. Goos and H. Haenchen, “Ein neuer und fundamentaler Versuch zur Totalreflexion”,
Ann. Phys. 1 333 (1947).
[9] D. Felbacq, A. Moreau, and R. Smaali, “Goos-Haenchen effect in the gaps of photonic
crystals”, Opt. Lett. 28 1633 (2003).
[10] A. Lakhtakia, “On plane wave remittances and Goos-Haenchen shifts of planar slabs
with negative real permittivity and permeability”, Electromagnetics 23 71 (2003).
[11] P. Berman, “Goos-Haenchen shift in negatively refractive media”, Phys. Rev. E 66
067603 (2002).
[12] I. Shadrivov, A. A. Zharov, and Y. Kivshar, “Giant Goos-Haenchen effect at the
reflection from left-handed materials”, Appl. Phys. Lett. 83 2713 (2002).
[13] L. Wang and S. Zhu, “Large negative lateral shifts from the Kretschman-Raether
configuration with left-handed materials”, Appl. Phys. Lett. 87 221102 (2005).
[14] A. Moreau and D. Felbacq, “Comment on ’Large negative lateral shifts from the
Kretschman-Raether configuration with left-handed materials”’, Appl. Phys. Lett.
90 066102 (2007).
[15] F. Pillon, H. Gilles, S. Girard, M. Laroche, R. Kaiser, and A. Gazibegovic, “Goos-
Haenchen and Imbert-Fedorov shifts for leaky guided modes”, J. Opt. Soc. Am. B
22 1290 (2005).
[16] X. Chen and C. Li, “Lateral shift of the transmitted light beam through a left-handed
slab”, Phys. Rev. E 69 066617 (2004).
[17] R. Ruppin, “Surface polaritons of a left-handed medium”, Phys. Lett. A 277 61
(2000).
[18] I. Shadrivov, A. Sukhorukov, Y. Kishvar, A. Zharov, A. Boardman, and P. Egan,
“Non-linear surface waves in left-handed materials”, Phys. Rev. E 69 016617 (2004).
[19] C. Soukoulis, T. Koschny, J. Zhou, M. Kafesak, and E. Economou, “Magnetic re-
sponse of split ring resonators at terahertz frequencies”, Phys. Stat. Sol. B 244
1181–1187 (2007).
[20] S. O’Brien and J. B. Pendry, “Photonic band-gap effect and magnetic activity in
dielectric composites”, J. Phys. : Condens. Matter 14 4035–4044 (2002).
[21] J. Kong, B. Wu, and Y. Zhang, “Lateral displacement of a Gaussian beam reflected
from a grounded slab with negative permittivity and permeability”, Appl. Phys. Lett.
80 2084 (2002).
[22] D. Felbacq and A. Moreau, “Direct evidence of negative refraction at media with
negative ε and µ”, J. Opt. A : Pure and Appl. Opt. 5 L9 (2003).
	Introduction
	Leaky modes and giant lateral shifts
	The left-handed slab
	Leaky slab modes
	Leaky surface modes
	Fundamental property
	The grounded left-handed slab
	Conclusion
ABSTRACT
  Using complex plane analysis we show that left-handed slab may support either
leaky slab waves, which are backward because of negative refraction, or leaky
surface waves, which are backward or forward depending on the propagation
direction of the surface wave itself. Moreover, there is a general connection
between the reflection coefficient of the left-handed slab and the one of the
corresponding right-handed slab (with opposite permittivity and permeability)
so that leaky slab modes are excited for the same angle of incidence of the
impinging beam for both structures. Many negative giant lateral shifts can be
explained by the excitation of these leaky modes.

<|endoftext|><|startoftext|>
untitled
Coulomb blockade of field emission from nanoscale conductors
O. E. Raichev*
Institute of Semiconductor Physics, National Academy of Sciences of Ukraine, Prospekt Nauki 45, 03028, Kiev, Ukraine
�Received 9 February 2006�
Theoretical description of the field emission of electrons from nanoscale objects weakly coupled to the
cathode is presented. It is shown that the field-emission current increases in a steplike fashion due to single-
electron charging which leads to abrupt changes of the effective electric field responsible for the field emission.
A detailed consideration of the current-voltage characteristics is carried out for a nanocluster modeled by a
metallic spherical particle in the close vicinity of the cathode and for a cylindrical silicon nanowire grown on
the cathode surface.
PACS number�s�: 79.70.�q, 73.23.Hk, 73.40.Gk
I. INTRODUCTION
The discrete nature of electric charge reveals itself in the
transport of electrons through small conductors �nanopar-
ticles or other nanoscale objects� weakly coupled to the
source and drain electrodes �current-carrying leads� owing to
the Coulomb blockade effect. Numerous manifestations of
the charge quantization in transport properties, the most fa-
miliar of them are the Coulomb blockade oscillations of the
electric current as a function of the gate voltage and the
Coulomb staircase in the current-voltage characteristics,
have attracted considerable attention in the past years.1 Since
the fundamentals of the transport theory in the Coulomb
blockade regime have been established,2–4 the Coulomb
blockade-based physics has been applied to various issues of
electron transport in mesoscopic systems, and the field of its
applications expands in line with the advances in nanotech-
nology.
Usually, the influence of the Coulomb blockade on the
current in two-terminal devices is considered under assump-
tion that the coupling between the nanoscale object and the
leads is not sensitive to the number of electrons N determin-
ing the object charge eN. This corresponds to the introduc-
tion of ohmic �or nearly ohmic� effective resistances describ-
ing this coupling. Though this assumption often works well,
it can be violated, for example, in nanomechanical
systems,5–7 where charging of the object gives rise to its
displacement towards one of the leads thereby changing its
tunnel coupling to both leads. In this paper we study a situ-
ation when the sensitivity of the tunnel coupling to the num-
ber of electrons does not require a mechanical displacement
and is determined by the nature of tunneling. This implies a
device layout and conditions similar to those used in the
recent experiments on field emission of electrons from me-
tallic nanoclusters,8–10 silicon nanowires11–15 and
nanocones,16,17 and carbon nanotubes �see, for example,
Refs. 11 and 18–26�, when small �nanoscale� objects are
formed on the source electrode �cathode�, the latter is then
negatively biased with respect to the drain electrode �anode�
in vacuum. The current between the electrodes flows owing
to the field emission of electrons from nanoscale objects,
because the electric field F at the tips of the objects is higher
than in the other places of the device. The field-emission
current is described by the Fowler-Nordheim formula27
I = ASF2 exp�− F
�, F = 4�2m
3�e��
W3/2, �1�
where m is the free electron mass, W is the work function of
the emitting material, S is the effective emitting area, and A
is a constant expressed through the work function and Fermi
energy �F of the emitting material
�e�3��F/W
4�2 � ��F + W�
. �2�
The effective field F, which describes the tunnel coupling
between the nanoscale object and the anode, depends on the
object charge, which is induced by the applied voltage V
=V1−V2, where V1 and V2 are the cathode and anode poten-
tials, respectively. Under conditions of Coulomb blockade,
i.e., when the electric connection between the cathode and
the object is weak and the charging energy of the object
considerably exceeds the temperature T, the continuous
variation of the voltage V leads to discrete changes of the
object charge in units of e, and, consequently, to correspond-
ing discrete changes of the field F. Therefore, one may in-
troduce the field FN, which is a function of the discrete num-
ber N and continuous variable V. Next, if the current in the
device is limited by the field emission, the single-electron
tunneling processes become important. This means that, at a
fixed voltage V, the object stays mostly in the states with N
and N−1 electrons, the number N is determined by the volt-
age. In the N-electron state, no electrons can come to the
object from the cathode until an electron leaves the object by
tunneling through the barrier, see Fig. 1�a�. Then the object
appears in the N−1-electron state and returns to the
N-electron state before the next Fowler-Nordheim tunneling
event takes place. The field-emission current in these condi-
tions is given by Eq. �1� with F=FN and can be denoted as
IN. If the bias eV increases, the state with N+1 electrons
becomes more favorable, and the current changes in a step-
like fashion from IN to IN+1. This leads to staircaselike
current-voltage characteristics, which may look similar to the
usual Coulomb staircases.28–30 However, since the sensitivity
of the tunneling to the number of electrons is involved, the
staircaselike current-voltage characteristics can exist under
rather peculiar conditions, when the source-drain bias is or-
ders of magnitude larger than the charging energy.
The rest of the paper is devoted to quantitative studies
based on the physical idea outlined above. In Sec. II we give
the basic equations and calculate the current in the simplest
case of an idealized emitter shown in Fig. 1�b�. In Sec. III we
calculate the current from a nanocluster modeled by a spheri-
cal particle on the metallic cathode surface and from a semi-
conductor wire �nanowhisker� grown perpendicular to the
cathode surface. The discussion and concluding remarks are
given in the last section.
II. GENERAL CONSIDERATION
We consider the case of classical �or metallic� Coulomb
blockade, when the electron energy level separation in the
nanoscale object can be neglected in comparison to both
temperature and charging energy. Since the object is assumed
to be weakly coupled to the cathode, we study the sequential
tunneling process and not the coherent one. It is convenient
to investigate the electron transport by applying the kinetic
equation2 �Master equation� for the distribution function PN
describing the probability for the object to be in the state
with N electrons. Assuming that the electric connection be-
tween the cathode and the object is characterized by the con-
ductance G, this equation is written as
= QN+1 − QN, �3�
where
1 − exp�− �EN/T�
�PN − PN−1 exp�− �EN/T�	
+ PNIN/�e� . �4�
Here �EN= �e
2 /C��N−1/2−C2V /e	 is the difference in Cou-
lomb energies for the objects with N and N−1 electrons, C is
the total capacitance, and C2 is the capacitance of the object
with respect to the anode �the capacitance with respect to the
cathode is given by C1=C−C2�. The first term in expression
�4� has the usual form2 and corresponds to the current be-
tween the object and the cathode. It is written as a difference
of the contributions describing the departure of an electron
from the object in the N-electron state and arrival of an elec-
tron at the object in the N−1-electron state. The second term
corresponds to the field-emission current from the object in
the N-electron state. Since no electrons come to the object
from the anode, this term does not contain a contribution
describing arrival of electrons. In the stationary case, Eq. �3�
is reduced to the form QN=const, where the constant can be
chosen equal to zero. After determining PN from the equation
QN=0 with the use of the normalization condition 
NPN=1,
the total current is given by
PNIN. �5�
Under the condition GT� �e � IN, which means that the object
is in thermal equilibrium with the cathode, the stationary
solution of Eq. �3� is written as PN=Z
−1 exp�−EN /T�, where
EN= �e
2 /2C��N−C2V /e	
2 is the Coulomb energy, and Z
N exp�−EN /T� is the partition function. The current in this
case is determined by the expression
J = Z−1
IN exp�− EN/T� . �6�
Let us apply the solution �6� to the idealized model of
emitter, Fig. 1�b�, when the emission takes place from a
spherical nanoparticle of radius R, placed at a distance d
from the cathode. The distance between the cathode and an-
ode is L. The connection c-p denotes a low-transparent con-
tact �for example, tunnel barrier� between the particle and the
cathode, which does not contribute to the field-emission
properties and electrostatics of the device. Assuming d�R,
we have C=R, C2=Rd /L, and neglect the charge polariza-
tion of the particle because this polarization is small in com-
parison to the total charge eN induced by the applied voltage.
The number of electrons is estimated as N�C2V /e
=RdF0 / �e�, where F0=−V /L is the applied electric field. The
effective field for the nanoparticle with N electrons is FN
= �e �N /R2, and the partial currents IN in these conditions are
given by
IN = AS�eN/R
2�2exp�− FR2/�e�N� , �7�
where the emitting area S, in the idealized model considered
here, can be approximated by the total surface area of the
nanoparticle, S=4�R2. In Fig. 2 we plot the current-voltage
characteristics of the idealized emitter, calculated according
to Eqs. �6� and �7�, where A is given by Eq. �2� with W
=5.1 eV and �F=5.5 eV �taken for Au�, and the geometrical
parameters are chosen as R=5 nm and d=0.5 �m. The char-
acteristics look like staircases with flat regions �plateaus� be-
tween the steps, which are visible even at room temperature.
It is possible to estimate the relative heights of the steps by
calculating the ratio of the currents IN and IN−1 emitted from
the nanoparticle with N and N−1 electrons
� exp� FR2
�e�N�N − 1�
� . �8�
In spite of the fact that the charged nanoparticle typically
contains a large number of electrons, N
100, one can al-
FIG. 1. �a� The mechanism of single-electron tunneling in the
Fowler-Nordheim regime. �b� Schematic representation of the ide-
alized emitter.
ways find a regime when the ratio IN / IN−1 is not small in
comparison to unity. This necessarily implies a weak Fowler-
Nordheim tunneling, when F /F=FR2 / �e �N�1.
In the calculations described above, the applicability of
the Fowler-Nordheim formula requires R�W / �e �F, which is
rewritten as R�e2N /W, or, according to N�RdF0 / �e�, as
�e �F0�W /d, independent of the nanoparticle radius. This
condition is satisfied at high enough applied voltages. If
�e �F0=eV /L�W /d, the approximation of a triangular poten-
tial barrier is not quite good, and one should consider the
tunneling through the barrier described by the potential en-
ergy W−e2N�1/R−1/r� at r�R, where r is the distance
from the center of the spherical nanoparticle; the tunneling
through the potential barrier of this form is described in Ref.
31. Even under the condition �e �F0�W /d, which is satisfied
in the calculations shown in Fig. 2, the relative change of the
current per one step, IN / IN−1−1, appears to be significant,
because the exponent FR2 / �e �N�N−1� in Eq. �8� is estimated
as c�W / �e �F0d�
2, where the dimensionless constant c
=4/3�2me4 /�2W is noticeably larger than unity.
If the current is high enough, the field emission cannot
remain the bottleneck for the electron transfer from the cath-
ode to the anode, and a finite resistance G−1 becomes essen-
tial. The nanoparticle in these conditions is no longer in equi-
librium with the cathode. This means that the distribution PN
is established kinetically, and several states with different
charges coexist at a fixed voltage �see the inset in Fig. 2�. As
a consequence, the Coulomb blockade features are washed
out. This case requires a numerical solution of the equation
QN=0. The corresponding current-voltage characteristics of
the idealized emitter calculated by using the RC time C /G
=100 ps are also shown in Fig. 2. The degradation of the
current steps appears to be stronger with increasing voltage,
because the current increases and the nanoparticle-cathode
link becomes more important. The shape of the steps in this
case resembles the usual Coulomb staircase.
III. MORE COMPLEX EXAMPLES
After demonstrating the possibility of the Coulomb-
blockade staircase of the field emission on a model example,
it is worth to consider more complex cases. Indeed, the
model example discussed above has certain disadvantages.
First of all, it is hardly possible to connect a particle placed
far from the cathode surface by a link �c-p in Fig. 1�b�	
which does not contribute to the electrostatic properties of
the device. Second, the model of uniform charging is insuf-
ficient: the charge polarization of the nanoscale object ap-
pears to be important and should be always taken into ac-
count, see below in this section. Therefore, the model shown
in Fig. 1�b� is suitable only for the purposes of illustration of
the basic physics described by Eqs. �3�–�6�. To have a closer
approach to reality, we point out that the nanoscale objects
investigated in the above-cited experiments on field emission
can be roughly divided into two classes: the objects whose
dimensions in all directions are comparable �nanoclusters or
nanoparticles�, and the objects whose length in the direction
of the applied field is much larger than their transverse size
�nanowires or nanowhiskers�. The following consideration is
carried out for the cases of nanoclusters and nanowires of the
simplest geometries, when the electric fields FN and the ca-
pacitances C and C2 can be determined consistently by solv-
ing corresponding electrostatic problems. The current is cal-
culated according to Eq. �6�, under the assumption that the
objects are in equilibrium with the cathode.
A. Field emission from nanoclusters
Below we consider the field emission from a nanocluster
modeled by a spherical metallic particle of radius R depos-
ited on the flat cathode surface. To provide a finite capaci-
tance C, one should assume a finite separation d-R between
the particle and the metallic cathode plate �for instance, one
can imagine that the particle resides on an oxidized surface�,
see the inset to Fig. 3. Besides, this assumption provides
electrical isolation of the particle from the cathode, which is
FIG. 2. Current from the idealized emitter as a function of the
applied field F0=−V /L for the case of small C /G �nanoparticle in
thermal equilibrium with the cathode, upper curves� and for the
case of C /G=100 ps �lower curves�, at the temperatures T=77 K
�solid� and T=293 K �dashed�. The inset shows the distribution
function PN at F0=5	10
5 V/cm for the second case.
FIG. 3. Charge density per unit length in z direction for a spheri-
cal metallic nanocluster placed at the distance 0.1 R from the me-
tallic cathode. Here 
0=F0R /2. The inset shows the geometry of
the problem and the directions of the field emission �arrows�.
a necessary condition for the Coulomb blockade. The elec-
trostatics of the plane-sphere system is known, and the field
and charge distributions in this case can be found in the form
of rapidly converging infinite series arising from the poten-
tials of image point charges and point dipoles.32 Such a con-
sideration allows one to present the distribution of the elec-
trostatic potential energy near the particle in the approximate
U�r,�� � W + e��F0R�1 + 
�cos � − 1�	
− ��eN − C2V	/C�
r − R
, �9�
where r and � are the radial and azimuthal coordinates of the
spherical coordinate system with the origin at the center of
the particle, and �, 
, and � are the dimensionless constants
of the order of unity, which are to be determined from nu-
merical calculations. Such calculations also give us the ca-
pacitances C and C2.
33 Note that if the charge quantization is
neglected �so that N=C2V /e when the particle is in equilib-
rium with the cathode�, � is identified with the field enhance-
ment factor conventionally used in the physics of field emis-
sion. The expression �9� provides an excellent description of
the electrostatic potential at r−R�R /2 and at small �. It
allows one to take into account deviations of the potential
energy from the linear form W− �e �F�r−R� and, therefore, to
find corrections to the Fowler-Nordheim tunneling exponent.
Neglecting such corrections in the prefactor, we obtain the
following expression for the partial currents
IN = ASFN
2 exp�− F
�� �e�FNR
��x� =
� x2�x − 1��2 − arctan �x − 1� − x� , �10�
where A is given by Eq. �2�, the dimensionless function ��x�
describes the corrections to the tunneling exponent, and the
effective emitting area S=2�R2�FN
2 /
F�F0�
�2�R2�FN /
F� is reduced due to the angular dependence of
the radial field described by Eq. �9�. The field FN is given by
FN = �F0 + �
�e�N − C̃2F0
, �11�
where the quantity C̃2=C2L does not depend on the distance
L between the cathode and anode. Note that, since we always
assume that L is much larger than any dimension of the
nanoscale object, the capacitance C2 is always proportional
to 1 /L, and it is more convenient to replace C2 �V� by C̃2F0.
This substitution also allows us to represent the Coulomb
energy standing in Eq. �6� as
�N − C̃2F0/�e�	
2. �12�
Further calculations are done for the separation d−R
=0.1R, when C=2.16R, C̃2=1.74R
2, �=4.32, �=1.22, and

=0.66. Figure 3 shows the distribution of negative charges
on the surface of the spherical particle staying in equilibrium
with the cathode for this case ��e �N= C̃2F0 is assumed�. The
distribution of the radial field F�z� at the surface of the par-
ticle is given by the same dependence, F�z� /F0=
�z� /
The field-emission current from the nanocluster described
above has been calculated according to Eqs. �6� and �10�–
�12� at R=5 nm. The results of the calculations shown in Fig.
4 demonstrate the staircaselike behavior caused by the Cou-
lomb blockade. However, in contrast to the staircases shown
in Fig. 2, the current continues to increase between the steps.
This occurs because of electrostatic polarization of the nano-
particle. According to Eq. �11�, when the particle charge is
constant, the increase in the applied field F0 leads to an in-
crease in the effective field FN because the factor �
−�C̃2 /CR is positive. For the chosen particle radius, the
steps of the current are clearly visible at liquid nitrogen tem-
perature but poorly visible at room temperature. Neverthe-
less, the Coulomb blockade features at room temperature be-
come quite distinct in the plots of the derivative of the
current, as shown in the inset to Fig. 4.
B. Field emission from nanowires
Let us consider the field emission from a small semicon-
ductor wire modelled by a cylinder of radius R and length d,
which ends with a hemispherical tip of the same radius, see
the inset to Fig. 5. The cathode substrate upon which the
wire is grown is assumed to be a metal �or a heavily doped
semiconductor� so that one can use the method of image
charges instead of solving the electrostatic problem in the
whole space. The electric isolation of the wire from the cath-
ode in this case takes place in a natural way, because a
Schottky barrier is formed between the wire and the metallic
cathode �in the case of semiconducting cathode there can be
a heterobarrier or an interband p-n barrier�. In other words,
FIG. 4. Current from the spherical nanocluster of radius R
=5 nm as a function of the applied field F0=−V /L at T=4.2 K
�solid� and 77 K �dashed�. The inset shows the derivative of the
current at T=293 K.
the wire region adjacent to the cathode becomes depleted of
electrons and positively charged because of the presence
of donors �we assume that the wire is uniformly doped
with bulk donor density nD�. When a bias eV is applied
between the cathode and anode, the wire acquires a con-
siderable negative charge because of tunneling or thermi-
onic emission of electrons from the cathode through the
barrier. When the field emission from the wire of nan-
oscale radius becomes essential, the density of induced
negative charges per unit length of the wire appears to be
much larger than the equilibrium charge density 
=�R2 �e �nD even if nD is of the order of 10
18 cm−3. For
this reason, one can use the “metallic” approximation
assuming that the charges in the wire are placed mostly on its
surface. This means that the electron density distribution
n�� ,z�, which depends on the radial coordinate � of the cy-
lindrical coordinate system connected with the wire, is given
by n�� ,z�= �2�� �e � �−1���−R�
�z�+nD for z�d and n�� ,z�
= �2�� �e � �−1���−�R2− �z−d�2	
�z�+nD for d�z�d+R,
where 
�z� is the density of negative charges on the surface
per unit length. Since this approximation is based on the
assumption that the screening length is small in comparison
with the wire radius, it works better for wider wires. For
silicon wires, whose field-emission properties are currently
the subject of investigations,11–15 the metallic approximation
remains suitable even for the radius of several nanometers,
because, owing to the large effective masses and six-valley
degeneracy, the density of electron states in n-Si appears to
be high enough to provide the Thomas-Fermi screening
length less than one nanometer for Fermi energies �F
�0.01 eV. The metallic approximation, of course, fails to
describe the region of the wire in the close vicinity of the
cathode, where the depletion occurs. Nevertheless, since this
region is a small part of the whole wire, see the charge dis-
tribution in Fig. 5, its presence cannot considerably modify
the parameters calculated as described below.
According to the discussion given here, we search for the
charge distribution 
�z� satisfying the integral equation
U�z� = U0 − �e�F0z + �
dz�K�z,z��
�z�� , �13�
where U�z� is the potential energy counted from the Fermi
level in the cathode material, U0 is the barrier height, and
K�z ,z�� is the potential of interaction between the electrons
in the points z and z� of the wire surface in the presence of
the cathode plate, see the Appendix. Equation �13� is accom-
panied with additional requirements: U�z�=0 at z�z0 and
�z�=−
D at z�z0, where z0 is the depletion edge coordi-
nate, which is to be found self-consistently. The first of these
requirements corresponds to a full screening of the bare po-
tential U0− �e �F0z by the induced charges of the wire, while
the second one models the presence of the positive charges in
the depletion region z�z0. Once the distribution 
�z� is
found, the total charge of the wire, −�0
d+Rdz
�z�, as well as
the distribution of electric field around the wire, can be cal-
culated. To find the capacitance C and describe modification
of the effective field under single-electron charging, one may
calculate the variation of the total charge and the field at the
tip �at z=d+R� with respect to a small variation of U0. Equa-
tion �13� is solved numerically by using the method of itera-
tions. The dependence of the effective field FN on F0 and N
can be represented in the form similar to Eq. �11�
FN = ��F0�F0 + �
�e��N + B� − C̃2F0
C�F0�R
, �14�
while the Coulomb energy is written as
2C�F0�
�N + B − C̃2F0/�e�	
2. �15�
These equations take into account a finite �though weak�
dependence of the capacitance C and field enhancement fac-
tor � on the applied field F0. The dependence of the param-
eters C̃2 and � on F0 appears to be much weaker and can be
neglected. The positive dimensionless constant B reflects the
fact that the average number of induced charges is smaller
than C̃2F0 / �e�. These features appear because the system un-
der consideration is not entirely metallic and contains a
depletion region whose length changes with F0.
The numerical calculations leading to the results pre-
sented below are done for U0=0.7 eV, which approximately
corresponds to the Schottky barrier height for n-Si in contact
with Al.34 The chosen donor density is nD=2	10
18 cm−3.
The parameters standing in Eqs. �14� and �15�, however, are
not sensitive to nD, except for the capacitance C, which
changes within 10% when nD varies from 10
18 cm−3 to 2
	1018 cm−3. Figure 5 shows the charge density distribu-
tion for the wire of radius R=5 nm and length d=0.1 �m
at F0=10
6 and 2	106 V/cm. The charge density shows
a nearly linear growth through the main part of the wire
and a sharp enhancement at the hemispherical tip from
which the field emission occurs. The dependence of the
field enhancement factor and capacitance on the applied
electric field is shown in Fig. 6, and the other calcu-
lated parameters are C̃2=2.44 dR, �=0.414, and B=12.14.
FIG. 5. Charge density per unit length for the cylindrical wire
whose geometry is shown in the inset �see parameters in the text�.
The plots of the field-emission current calculated with the
use of the parameters listed here are given in Fig. 7. The
calculations are done according to Eqs. �6�, �14�, and �15�,
and the Fowler-Nordheim formula for the partial current,
IN=ASFN
2 exp�−F /FN�. Since the calculated radial electric
field in the region of the tip weakly depends on z �in contrast
to the case of the nanocluster studied above� and sharply
decreases in the region of transition to the cylindrical part of
the wire, the effective emitting area S is estimated by the
total area of the hemispherical tip, S=2�R2. The work func-
tion is taken for silicon, W=4.2 eV. Next, the Fermi energy
standing in the expression for A, see Eq. �2�, is estimated
from the equation �F��e �FinrTF, where Fin��F0 /� is the
field inside the semiconductor near the end of the tip, rTF is
the Thomas-Fermi screening length, and � is the dielectric
constant of the semiconductor. Such an estimate, carried out
for n-Si, leads to �F�0.1 eV at F0�2	10
6 V/cm. The pic-
ture of Coulomb staircase shown in Fig. 7 is basically the
same as that in Fig. 4. Again, the increase of the current with
the applied field is determined by the increase of the effec-
tive field �14� due to both single-electron charging �steps�
and charge polarization under a constant charge �regions be-
tween the steps�. The main difference is that the interval of
the applied field needed for addition of one electron to the
wire is considerably reduced, owing to the larger capacitance
C2, and appears to be of about 1.2 V/�m �further reduction
of this interval takes place with the increase of the wire
length, see below�. Next, since the capacitance C increases
considerably in comparison to the case of nanocluster of the
same radius, the Coulomb blockade features at room tem-
perature are poorly visible even in the derivative plot, see the
inset. Nevertheless, these features remain pronounced at T
=77 K.
With the increase of the wire length d, the parameters
entering Eqs. �14� and �15� are modified as shown in Fig. 8.
The field enhancement factor and the capacitances increase
nearly in a linear way, while the parameter �, which charac-
terizes relative contribution of charging to the effective field,
slightly decreases �for comparison, the idealized emitter con-
sidered in the previous section is described by the parameters
�=d /R, �=1, C=R, and C̃2=dR, where d is the distance
from the cathode to the emitting sphere�. The increase of the
total capacitance C makes it difficult to observe the Coulomb
staircase in long wires. For example, at d=1 �m one should
have temperatures considerably lower than 77 K. The inter-
val of the applied field corresponding to the addition of one
electron is inversely proportional to C̃2. This interval de-
creases very fast with the increase of d and becomes equal to
2.5	102 V/cm at d=1 �m.
FIG. 6. Field dependence of the enhancement factor and capaci-
tance for the cylindrical wire with R=5 nm and d=0.1 �m.
FIG. 7. Current from the cylindrical wire of radius R=5 nm and
length 0.1 �m as a function of the applied field F0=−V /L at T
=4.2 K �solid� and 77 K �dashed�. The inset shows the derivative of
the current at T=293 K.
FIG. 8. Dependence of the parameters �, �, C, and C̃2 on the
length of the wire for R=5 nm and F0=5	10
5 V/cm.
IV. CONCLUSIONS
The key point of the presented theoretical study is the
possibility of noticeable modification of the effective electric
field causing the field emission from a nanoscale conductor
by addition of just one electron to this conductor. Formally,
this modification is described by introducing the effective
field FN, which determines the partial current IN, and by
evaluating the dependence of this field on the bias applied
between the cathode and anode, see Eqs. �11� and �14�. As a
result of this effect, the current-voltage characteristics of the
field emission show steps in the Coulomb blockade regime.
In other words, the steplike current-voltage characteristics
related to single-electron charging �Coulomb staircases� may
exist even under the conditions of field-emission experi-
ments, when the applied bias is orders of magnitude larger
than the charging energy. The steps on the current-voltage
characteristics can be visible at 77 K in the case of field
emission from nanoclusters and nanowires of 10 nm diam-
eter and submicron length. In the regions between the steps,
where the total charge of the nanoscale object is constant, the
current increases with the increase of the applied bias owing
to charge polarization.
The staircases described in this work are similar to the
usual Coulomb staircases obtained in the transport through
small metallic islands28–30 or quantum dots �see Ref. 35 for
review� with strong asymmetry in the barriers. In both cases,
each step of the current is associated with addition of an
electron to the nanoscale object, and the applied source-drain
voltage drops mostly across the low-transparent barrier �the
barrier between the object and the drain�. Therefore, the pe-
riodicity of the steps in both cases is determined by the
object-drain capacitance C2. However, the steps in the sec-
ond case are formed due to shifts of effective �N-dependent�
electrochemical potential of the object with respect to elec-
trochemical potentials of the source and drain. For this rea-
son, the usual Coulomb staircase shows well-defined steps
when C2 is greater than the object-source capacitance C1,
while in the opposite situation, C1�C2, the steps are sup-
pressed and the current-voltage characteristic approaches to a
linear dependence.29,30 In contrast, in the case described in
this work the steps are formed due to changes in the prob-
ability of Fowler-Nordheim tunneling from the object to the
drain �anode�. That is why the steps are clearly visible under
the condition C1�C2, which is imposed by the field-
emission layout considered in this paper. To summarize, the
sensitivity of the field emission to the number of electrons in
the nanoscale object makes it possible to obtain the Coulomb
staircases under the conditions when such staircases cannot
be observed in the transport through small metallic islands or
quantum dots.
The quantitative consideration has been applied here to
some simple models of the nanoscale objects, whose electro-
static properties necessary for description of the field en-
hancement and charging have been determined consistently.
Consequently, the number of geometrical parameters charac-
terizing the objects has been minimized. For example, the
nanowire has been characterized only by its length d and
radius R. In reality, the geometrical structure of objects is
more complicated. For example, their tips may contain sharp
regions which provide a more efficient field emission. In
fact, high field-emission currents from nanoscale objects are
typically observed at the applied fields of the order of
105 V/cm, which requires the field enhancement factors
much larger than those calculated in this paper. On the other
hand, the presence of sharp tips cannot strongly modify the
capacitances of the objects. The general picture of the single-
electron tunneling under the field-emission regime also re-
mains valid. For possible application to experiments, the
field enhancement due to charging can be described by equa-
tions of the kind of Eqs. �11� and �14�, where � and � should
be considered as parameters to be determined experimen-
tally.
At the present time, there is no experimental evidence of
the Coulomb staircase phenomenon under the field emission.
Though the current-voltage characteristics sometimes show
steplike features, see, for example, Ref. 11, these features are
not regular and, most probably, should be attributed to insta-
bilities of the emission process and burning out of the emit-
ting material. There are numerous reasons which make ob-
servation of the phenomena considered in this paper difficult.
First of all, in most cases the nanoscale objects on the cath-
ode surface form dense arrays. This means that the field
emission takes place from a macroscopic number of objects
which are electrostatically coupled. The charging and field-
emission properties appear to be considerably different36
from those of individual objects. The Coulomb blockade
phenomena in this case should be dramatically suppressed by
the size dispersion of the objects and by the effects of mutual
screening. Investigation of field emission from individual ob-
jects is possible in the cases of metallic nanoclusters8–10 and
carbon nanotubes.26 However, there exists the problem of
electric isolation of these objects from the cathode, which is
one of the necessary conditions for Coulomb blockade. No
special attempts to achieve such an isolation in the field-
emission experiments have been undertaken so far, except
for the nanomechanical system investigated in Ref. 7, where
the electron emission from an isolated Au island to a
submicron-sized electrode has been observed. Most of the
experiments on field emission are carried out at room tem-
perature, though existing experimental techniques also allow
measurements at liquid nitrogen temperature. This means
that the Coulomb blockade phenomena can be observed only
for small-sized objects whose capacitances are low enough
�see the results of Sec. III�. Besides, the interval of the ap-
plied field corresponding to addition of one electron strongly
decreases in the case of emission from long nanowires,
which requires high resolution with respect to field. In sum-
mary, a search for the Coulomb blockade features in the
field-emission current would require a special planning of
experiment. The author hopes that the presented theoretical
study will stimulate experimental investigations in this direc-
tion.
ACKNOWLEDGMENTS
The author is grateful to A. I. Klimovskaya for stimulat-
ing discussions.
APPENDIX: KERNEL OF EQUATION (13)
If z�d and z��d, K�z ,z��=K0�z ,z��−K0�−z ,z��, where
K0�z,z�� = �
��z − z��2 + 2R2�1 − cos ��
. �A1�
If z�d and z��d, K�z ,z��=K0�z ,z��−K0�−z ,z��, where
K0�z,z�� = �
��d − z�2 + 2R�d − z�cos �� + 2R2�1 − sin �� cos ��
. �A2�
If z�d and z��d, K�z ,z��=K0�z ,z��−K0�z ,−z��, where
K0�z,z�� = �
��d − z��2 + 2R�d − z��cos � + 2R2�1 − sin � cos ��
. �A3�
Finally, if z�d and z��d,
K�z,z�� = �
� �e��2R2�1 − cos � cos �� − sin � sin �� cos ����
�4d2 + 4dR�cos � + cos ��� + 2R2�1 + cos � cos �� − sin � sin �� cos ��� . �A4�
In Eqs. �A2�–�A4�, cos �= �z−d� /R and cos ��= �z�−d� /R, so � and �� are the azimuthal angles. The integrals are taken over
the polar angle �. The function K�z ,z�� is also representable in the form of full elliptic integrals.
*Electronic address: raichev@isp.kiev.ua
1 Single Charge Tunneling, edited by H. Grabert and M. H. De-
voret, NATO ASI Series B 294 �Plenum Press, New York, 1992�.
2 I. O. Kulik and R. I. Shekhter, Zh. Eksp. Teor. Fiz. 68, 623
�1975� �Sov. Phys. JETP 41, 308 �1975�	.
3 D. V. Averin and K. K. Likharev, J. Low Temp. Phys. 62, 345
�1986�; and in Mesoscopic Phenomena in Solids, edited by B. L.
Altshuler, P. A. Lee, and R. A. Webb �Elsevier, Amsterdam,
1991�.
4 C. W. J. Beenakker, Phys. Rev. B 44, 1646 �1991�.
5 L. Y. Gorelik, A. Isacsson, M. V. Voinova, B. Kasemo, R. I.
Shekhter, and M. Jonson, Phys. Rev. Lett. 80, 4526 �1998�.
6 A. Erbe, C. Weiss, W. Zwerger, and R. H. Blick, Phys. Rev. Lett.
87, 096106 �2001�.
7 D. V. Scheible, C. Weiss, J. P. Kotthaus, and R. H. Blick, Phys.
Rev. Lett. 93, 186801 �2004�.
8 M. E. Lin, R. P. Andres, and R. Reifenberger, Phys. Rev. Lett. 67,
477 �1991�.
9 M. E. Lin, R. Reifenberger, and R. P. Andres, Phys. Rev. B 46,
15490 �1992�.
10 M. E. Lin, R. Reifenberger, A. Ramachandra, and R. P. Andres,
Phys. Rev. B 46, 15498 �1992�.
11 C. S. Chang, S. Chattopadhyay, L. C. Chen, K. H. Chen, C. W.
Chen, Y. F. Chen, R. Collazo, and Z. Sitar, Phys. Rev. B 68,
125322 �2003�.
12 S. Johnson, A. Markwitz, M. Rudolphi, H. Baumann, S. P. Oei,
K. B. K. Teo, and W. I. Milne, Appl. Phys. Lett. 85, 3277
�2004�.
13 N. N. Kulkarni, J. Bae, C.-K. Shih, S. K. Stanley, S. S. Coffee,
and J. G. Ekerdt, Appl. Phys. Lett. 87, 213115 �2005�.
14 J. C. She, K. Zhao, S. Z. Deng, J. Chen, and N. S. Xu, Appl.
Phys. Lett. 87, 052105 �2005�.
15 J. C. She, S. Z. Deng, N. S. Xu, R. H. Yao, and J. Chen, Appl.
Phys. Lett. 88, 013112 �2006�.
16 Q. Wang, J. J. Li, Y. J. Ma, Z. L. Wang, P. Xu, C. Y. Shi, B. G.
Quan, S. L. Yue, and C. Z. Gu, Nanotechnology 16, 2919
�2005�.
17 Y. L. Chueh, L. J. Chou, S. L. Cheng, J. H. He, W. W. Wu, and L.
J. Chen, Appl. Phys. Lett. 86, 133112 �2005�.
18 A. G. Rinzler, J. H. Hafner, P. Nikolaev, L. Lou, S. G. Kim, D.
Tomanek, P. Nordlander, D. T. Colbert, and R. E. Smalley, Sci-
ence 269, 1550 �1995�.
19 P. G. Collins and A. Zettl, Appl. Phys. Lett. 69, 1969 �1996�.
20 Q. H. Wang, A. A. Setlur, J. M. Lauerhaas, J. Y. Dai, E. W. Seelig,
and R. P. H. Chang, Appl. Phys. Lett. 72, 2912 �1998�.
21 S. Fan, M. G. Chapline, N. R. Franklin, T. W. Tombler, A. M.
Cassell, and H. Dai, Science 283, 512 �1999�.
22 R. H. Baughman, A. A. Zakhidov, and W. A. de Heer, Science
297, 787 �2002�.
23 S. H. Jo, Y. Tu, Z. P. Huang, D. L. Carnahan, J. Y. Huang, D. Z.
Wang, and Z. F. Ren, Appl. Phys. Lett. 84, 413 �2004�.
24 M. Mauger, V. T. Binh, A. Levesque, and D. Guillot, Appl. Phys.
Lett. 85, 305 �2004�.
25 N. de Jonge, M. Allioux, M. Doytcheva, M. Kaiser, K. B. K. Teo,
R. G. Lacerda, and W. I. Milne, Appl. Phys. Lett. 85, 1607
�2004�.
26 Z. Xu, X. D. Bai, E. G. Wang, and Z. L. Wang, Appl. Phys. Lett.
87, 163106 �2005�.
27 R. H. Fowler and L. W. Nordheim, Proc. R. Soc. London, Ser. A
119, 173 �1928�.
28 J. B. Barner and S. T. Ruggiero, Phys. Rev. Lett. 59, 807 �1987�.
29 K. Mullen, E. Ben-Jacob, R. C. Jaklevic, and Z. Schuss, Phys.
Rev. B 37, 98 �1988�.
30 R. Wilkins, E. Ben-Jacob, and R. C. Jaklevic, Phys. Rev. Lett. 63,
801 �1989�.
31 L. D. Landau and E. M. Lifshitz, Quantum Mechanics �Perga-
mon, Oxford, 1977�.
32 W. R. Smythe, Static and Dynamic Electricity �McGraw-Hill,
New York, 1968�.
33 If d=R, the parameters are obtained analytically: �=7��3� /2
�4.21, 
=93��5� /56��3�−3/4, �=�2 /8, and C2= ��
2 /6�R2 /L,
but C is equal to infinity �C diverges in a logarithmic way as the
separation d-R goes to zero�.
34 Metal-Semiconductor Interfaces, edited by A. Hiraki �IOS Press,
Amsterdam, 1995�.
35 L. P. Kouwenhoven, C. M. Marcus, P. L. McEuen, S. Tarucha, R.
M. Westervelt, and N. S. Wingreen, Electron Transport in Quan-
tum Dots, in Proceedings of the Advanced Study Institute on
Mesoscopic Electron Transport, edited by L. L. Sohn, L. P. Kou-
wenhoven, and G. Schön �Kluwer, 1997�.
36 T. A. Sedrakyan, E. G. Mishchenko, and M. E. Raikh, cond-mat/
0504042 �unpublished�.
ABSTRACT
  Theoretical description of the field emission of electrons from nanoscale
objects weakly coupled to the cathode is presented. It is shown that the field-
emission current increases in a step-like fashion due to single-electron
charging which leads to abrupt changes of the effective electric field
responsible for the field emission. A detailed consideration of the
current-voltage characteristics is carried out for a nanocluster modelled by a
metallic spherical particle in the close vicinity of the cathode and for a
cylindrical silicon nanowire grown on the cathode surface.

<|endoftext|><|startoftext|>
Origamis with non congruence Veech groups
Gabriela Schmithüsen
In this article we give an introduction to origamis (often also called square-tiled
surfaces) and their Veech groups. As main theorem we prove that in each genus
there exist origamis, whose Veech groups are non congruence subgroups of SL2(Z).
The basic idea of an origami is to obtain a topological surface from a few combina-
torial data by gluing finitely many Euclidean unit squares according to specified
rules. These surfaces come with a natural translation structure. One assigns in
general to a translation surface a subgroup of GL2(R) called the Veech group. In
the case of surfaces defined by origamis, the Veech groups are finite index sub-
groups of SL2(Z). These groups are the objects we study in this article.
One motivation to be interested in Veech groups is their relation to Teichmüller
disks and Teichmüller curves, see e.g. the article [H 06] of F. Herrlich in the
same volume: A translation surface of genus g defines in a geometric way an
embedding of the upper half plane into the Teichmüller space Tg of closed Rie-
mann surfaces of genus g. The image is called Teichmüller disk. Its projection to
the moduli space Mg is sometimes a complex algebraic curve, called Teichmüller
curve. More precisely this happens, if and only if the Veech group is a lattice in
SL2(R). In this case the algebraic curve can be determined from the Veech group
up to birationality.
It is hard to determine the Veech group for a general translation surface. How-
ever, if the translation surface comes from an origami there is a special approach
to this problem. It is based on the idea of describing origamis by finite index sub-
groups of F2, the free group in two generators. This leads to a characterization of
origami Veech groups as the images in SL2(Z) of certain subgroups of Aut(F2),
the automorphism group of F2.
Using this approach we will calculate Veech groups of two origamis explicitly.
They turn out to be non congruence groups. Starting from these examples we
obtain infinite sequences of origamis all of whose Veech groups are non congruence
groups. This leads to the following theorem.
Theorem 1. Each moduli space Mg (g ≥ 2) contains an origami curve whose
Veech group is a non congruence group.
In Section 1 we introduce origamis and present different equivalent ways to de-
scribe them. In Section 2 we give a glance on the mathematical context. We
describe, how an origami defines a family of translation surfaces and explain
roughly , how one obtains a Teichmüller curve in moduli space starting from
http://arxiv.org/abs/0704.0416v1
an origami. We introduce Veech groups and shortly point out their relation to
Teichmüller curves. In Section 3 we turn to Veech groups of origamis and present
a characterization of them in terms of automorphisms of the free group F2 in two
generators. We use this characterization to calculate two examples explicitly.
Finally, in Section 4 we show that these two examples produce Veech groups that
are non congruence groups and give a method to construct out of them infinite
sequences of Veech groups that are again non congruence groups.
The first part (Section 1 - Section 3) of this article is meant to give a handy in-
troduction to origamis and an overview on some of our results about their Veech
groups. In the second part we state and prove Theorem 1 based on the results in
the PhD thesis [S 05] of the author.
For a broader introduction and overview on origamis and Teichmüller curves
as well as for references to the larger context, we refer the the reader e.g. to
[HeSc 06], [S 04] and [S 05].
Acknowledgments: I would like to thank Frank Herrlich for his support in
respect of the content and for his proof reading, Stefan Kühnlein for helpful dis-
cussions and suggestions especially on non congruence groups and the organizers
of the conference for giving me the opportunity to contribute to these proceedings.
This work was partially supported by a fellowship within the Postdoc-Programme
of the German Academic Exchange Service (DAAD).
1 ORIGAMIS 3
1 Origamis
There are several ways to define origamis. We start with the somehow playful
description that we have learned from [Lo 05], where also the name origami was
introduced: An origami is obtained by gluing the edges of finitely many copies
Q1, . . . , Qd of the Euclidean square Q via translations according to the following
rules:
• Each left edge shall be identified to a right edge and vice versa.
• Similarly, each upper edge shall be identified to a lower one.
• The arising closed surface X shall be connected.
We only study what is called oriented origamis in [Lo 05] and call them just
origamis.
Example 1.1.
a) The simplest example is the origami that is made from only one square.
There is precisely one possibility to glue its edges according to the rules.
One obtains a torus E. We call this origami the trivial origami O0.
Figure 1: The trivial origami. Opposite edges are glued.
Observe that the four vertices of the square are all identified and become
one point on the closed surface E. We call this point ∞.
b) We now consider an origami made from four squares, see Figure 2. Some
identifications of the edges are already done in the picture. For all other
edges those having same labels are glued. The origami is called L(2, 3) for
obvious reasons.
2 3 4
a b c
Figure 2: The origami L(2, 3). Opposite edges are glued.
1 ORIGAMIS 4
Observe that in this case the vertices labeled with • and the vertices labeled
with ◦ are respectively identified and become two points on the closed surface
X. By calculating the Euler characteristic one obtains, that the genus of
the surface X is 2.
c) Finally, we consider an example with five squares, see Figure 3. Here,
edges with same labels are identified. For the unlabeled edges, those which
are opposite to each other are glued. We call the origami D.
1 2 3
• • •
Figure 3: The origami D. Edges with the same label and unlabeled edges
that are opposite are glued.
In this case, we obtain the three identification classes ◦, ⋆ and • for the
vertices. The genus of the closed surface X is again 2.
Origamis as coverings of a torus
Observe, that the trivial origami O0 from Example 1.1 a) is universal in the
following sense: If X is the closed surface that arises from an arbitrary origami
O and E the torus that arises from O0, then we have a natural map X → E by
mapping each of the unit squares of the origami O that form the surface X to
the one unit square of O0 that forms the torus E. This map is a covering that is
unramified except over the one point ∞ ∈ E. Conversely, given a closed surface
X together with such a covering p : X → E, we obtain a decomposition of X
into squares by cutting X along the preimages of the edges of the one square of
O0 that forms E. This motivates the following definition of origamis.
Definition 1.2. An origami O of genus g and degree d is a covering p : X → E
of degree d from a closed, oriented (topological) surface X of genus g to the torus
E that is ramified over at most one marked point ∞ ∈ E.
Remark that we have fixed here one torus E and one point ∞ ∈ E. In particular
we may furthermore fix a point M 6= ∞ on E and a set of standard generators
of the fundamental group π1(E,M) that do not pass through ∞. That way we
obtain a fixed isomorphism
∗) ∼= F2, (1)
1 ORIGAMIS 5
where E∗ = E−∞ and F2 = F2(x, y) is the free group in two generators x and y.
Describing E by gluing the edges of the unit square via translations, we choose
M to be the midpoint of the unit square and the standard generators to be the
horizontal and the vertical simply closed curve through M , see Figure 4.
Figure 4: Generators of π1(E
Example 1.3. In Example 1.1, in a) the covering is the identity id : E → E.
In b) we have a covering p : X → E of degree 4 that is ramified in the two points
labeled by • and ◦. Recall that the genus of X is 2.
In c) we have a covering p : X → E of degree 5 ramified in the two points labeled
by • and ⋆. Observe that though the point on X labeled by ◦ is a preimage of ∞,
the covering is not ramified in this point. The genus of X is again 2.
Definition 1.4. We say that two origamis O1 = (p1 : X1 → E) and O2 =
(p2 : X2 → E) are equivalent, if there is a homeomorphism ϕ : X1 → X2 with
p1 = p2 ◦ ϕ.
Description by a pair of permutations
An origami O = p : (X → E) of degree d defines (up to conjugation in Sd)
• a homomorphism m : F2 = F2(x, y) → Sd or equivalently
• a pair of permutations (σa, σb) in Sd
as follows:
Let M1, . . . , Md be the preimages of the point M (defined as above) under p.
Furthermore, let
m : π1(E
∗,M) → Sym(M1, . . . ,Md)
be the monodromy map defined by p, i.e. for the closed path c ∈ π1(E
∗,M) the
point Mi is mapped to Mj by m(c) if and only if the lift of the curve c to X via
p, that starts in Mi, ends in Mj.
Choosing an isomorphism Sym(M1, . . . ,Md) ∼= Sd and using the isomorphism
∗) ∼= F2 fixed in (1) makes m into a homomorphism from F2 to Sd. We set
σa = m(x) and σb = m(y).
Observe that this homomorphism depends on the chosen isomorphism to Sd and
on the choice of the origami in its equivalence class only up to conjugation in Sd.
Therefore we consider two homomorphisms m1 : F2 → Sd and m2 : F2 → Sd to
be equivalent, if they are conjugated by an element in Sd. Similarly we call two
pairs (σa, σb) and (σ
b) in Sd equivalent, if they are simultaneously conjugated,
i.e. there is some s ∈ Sd such that σa = sσ
−1 and σb = sσ
1 ORIGAMIS 6
Example 1.5. In Example 1.1 we obtain for the origami L(2, 3) in b) the mon-
odromy homomorphism
m : F2 → S4, x 7→ (2 3 4) and y 7→ (2 1),
and thus σa = (2 3 4) and σb = (2 1).
For the origami D in c) we similarly obtain the permutations
σa = (1 2 3) and σb = (1 4 5)(2 3).
Description as finite index subgroups of F2
Origamis can be equivalently described as finite index subgroups of F2, the free
group in two generators, as stated in the following remark. The characterization
of the Veech groups of origamis is mainly based on this observation.
Remark 1.6. We have a one-to-one correspondence:
origamis up to equivalence ↔ finite index subgroups of F2 up to conjugacy.
More precisely, this correspondence is given as follows:
Let O = (p : X → E) be an origami. Define E∗ = E − {∞} and X∗ =
X − p−1(∞). Thus we may restrict p to the unramified covering p : X∗ → E∗.
This defines an embedding of the corresponding fundamental groups:
U = π1(X
∗) ⊆ π1(E
∗) ∼= F2
Again we use the fixed isomorphism in (1), see also Figure 4. Changing the
origami in its equivalence class leads to a conjugation of U with an element in
F2. The index of the subgroup of F2 is the degree d of the covering p.
Conversely, given a finite index subgroup U of F2 we retrieve the origami in
the following way: Let v : Ẽ∗ → E∗ be a universal covering of E∗. By the
theorem of the universal covering, π1(E
∗) is isomorphic to Deck(Ẽ∗/E∗), the
group of deck transformations of Ẽ∗/E∗. Furthermore, the finite index subgroup
U of Deck(Ẽ∗/E∗) corresponds to an unramified covering p : X∗ → E∗ of finite
degree. This can be extended to a covering X → E, where X is a closed surface.
Example 1.7. In Example 1.1, we obtain the following subgroups of F2:
In a), X∗ is the once punctured torus itself and U = F2.
In b), X∗ is a genus 2 surface with 2 punctures. Thus U = π1(X
∗) is a free group
of rank 5. Keeping in mind that we use the identification π1(E
∗) ∼= F2 = F2(x, y)
described in Figure 4, one can read off from the picture in Figure 2 that
U = < x3, xyx−1, x2yx−2, yxy−1, y2 >
In c), X∗ is a genus 2 surface with three punctures. Thus U is a free group of
rank 6. More precisely, we read off the picture in Figure 3, that
U = < x3, xyx−2, x2yx−1, yxy−1, y2xy−2, y3 >
2 TRANSLATION STRUCTURES AND VEECH GROUPS 7
Description as a finite graph
Finitely, sometimes it is convenient to describe an origami O = (p : X → E)
as a finite, oriented labeled graph: Namely, let U be the finite index subgroup
of F2 (unique up to conjugation) that corresponds to O as described in the last
paragraph. Then we represent the origami by the Cayley-Graph of U ⊆ F2:
The vertices of the graph are the coset representatives. They are labeled with a
representative of the coset. The edges are labeled with x and y. For each vertex
(with label w ∈ F2) there is an x-edge from it to the vertex that belongs to the
coset of wx. And similarly there is a y-edge to the vertex that belongs to the
coset wy.
Example 1.8. The following figure shows the Cayley-graph for the origami L(2, 3)
from Example 1.1:
?>=<89:;ȳ
GFED@ABCīd
x // ?>=<89:;x̄ x //
�� GFED@ABC
Figure 5: Graph for O = L(2, 3).
2 Translation structures and Veech groups
Translation structures
Recall that an atlas on a surface is called translation atlas, if all transition maps
are translations. An origamiO = (p : X → E) naturally defines an SL2(R)-family
of translation structures µA (A ∈ SL2(R)) on X
∗ = X − p−1(∞) as follows:
• As first step, observe that each A ∈ SL2(R) naturally defines a translation
structure ηA on the torus E itself by identifying it with C/ΛA, where
and ΛA is the lattice <
> in C (2)
• Then define the translation structure µA on X
∗ by lifting ηA via p, i.e.
µA = p
Using the first description of an origami that we gave by gluing squares, we obtain
the translation structure µI (where I is the identity matrix), if we identify the
squares with the Euclidean unit square in C. We obtain µA for a general matrix
2 TRANSLATION STRUCTURES AND VEECH GROUPS 8
A ∈ SL2(R) from this by identifying the squares with the parallelogram spanned
by the two vectors
Thus the SL2(R)-variations of the translation structure µI can be thought of as
affine shearing of the unit squares, see Figure 6.
Figure 6: Sheared translation structure for the origami L(2, 3).
From an origami to a Teichmüller curve in the moduli space
By the SL2(R)-family of translation structures, the origami O = (p : X → E)
defines a specific complex algebraic curve called Teichmüller curve in the moduli
space Mg of closed Riemann surfaces of genus g. We state this construction here
only briefly as motivation and refer e.g. to the overview article [HeSc 06] for a
detailed description and links to references. A particular nice configuration of
such Teichmüller curves is described in [H 06] in this volume.
The Teichmüller curve in Mg is obtained from the origami in the following way:
• The translation structure µA described in the previous paragraph is in par-
ticular a complex structure on the surface X∗ which can be extended to the
closed surface X . The Riemann surface (X, µA) together with the identity
map id : X → X as marking then defines a point in the Teichmüller space
Tg. Thus we obtain the map: ι̂ : SL2(R) → Tg, A 7→ [(X, µA), id].
• If A ∈ SO2(R), then the affine map z 7→ A ·z is holomorphic. Thus the map
ι̂ factors through SO2(R). Furthermore using that SL2(R) modulo SO2(R)
is isomorphic to the upper half plane H, one obtains a map
ι : H ∼= SO2(R)\SL2(R) → Tg
In fact, this map is an embedding that is in the same time holomorphic and
isometric. A map with this property is called Teichmüller embedding and
its image ∆ in Teichmüller space is called a Teichmüller disk or a geodesic
disk.
2 TRANSLATION STRUCTURES AND VEECH GROUPS 9
• Finally, one may compose the map ι with the projection to the moduli space
Mg. The image of ∆ in Mg is a complex algebraic curve. A curve in Mg
that arises like this as the image of a Teichmüller disk is called Teichmüller
curve.
Note: More generally, one obtains a Teichmüller disk ∆ in a similar way starting
from an arbitrary translation surface (or a bit more general: from a flat surface).
However, the image of such a disk ∆ in moduli space is not always a complex
algebraic curve; in fact its Zariski closure tends to be of higher dimension. It
is an interesting question how to decide whether a translation surface leads to a
Teichmüller curve. One possible answer to this is given by the Veech group which
we introduce in the following paragraph.
Veech groups
Let X∗ be a connected surface and µ a translation structure on it. One assigns to
it a subgroup of GL2(R) called Veech group as described in the following: We con-
sider the group Aff+(X∗, µ) of all orientation preserving affine diffeomorphisms,
i.e. orientation preserving diffeomorphisms that are locally affine maps of the
plane C, see Figure 7. Here – and throughout the whole article – we identify C
with R2 by the map z 7→ (Re(z) , Im(z))t. Thus an affine diffeomorphism f can
be written in terms of local charts as
f : z = (Re(z), Im(z))t 7→ A · (Re(z), Im(z))t+ z0 with A ∈ GL2(R) and z0 ∈ C.
Observe that A does not depend on the chart, since µ is a translation structure.
Thus one obtains a well defined map
D : Aff+(X∗, µ) → GL2(R), f 7→ A
called Derivative map.
Definition 2.1. The Veech group Γ(X∗, µ) of the translation surface (X∗, µ) is
the image of the derivative map D:
Γ(X∗, µ) = D(Aff+(X∗, µ))
z 7→ Az + z0
Figure 7: An affine diffeomorphism of a translation surface
3 VEECH GROUPS OF ORIGAMIS 10
Example 2.2. Let (X∗, µ) be C/ΛI with the natural translation structure. Here
I is the identity matrix and ΛI is the corresponding lattice as defined in (2).
An affine diffeomorphisms of C/ΛI lifts to an affine diffeomorphism of C respect-
ing the lattice. Conversely, each such diffeomorphism descends to C/ΛI. Thus,
we have in this case
Γ(X∗, µ) = SL2(Z).
Veech groups and Teichmüller curves
As indicated in the paragraph about Teichmüller curves, the Veech group “knows”
whether a translation surface defines a Teichmüller curve in moduli space or not.
More precisely, one has the following statement:
Fact: Let X be a surface of genus g and X∗ = X−{P1, . . . , Pn} for finitely many
points P1, . . . , Pn on X . Furthermore let µ be a translation structure on X
Then (X∗, µ) defines a Teichmüller curve C if and only if the Veech group Γ(X∗, µ)
is a lattice in SL2(R). In this case, the curve C is (antiholomorphic) birational
to H/Γ(X∗, µ).
We describe the relation to Teichmüller curves here just as motivation and in
order to give a glance at the general frame. We have therefore resumed theorems
contributed by several authors condensed in what is here called “fact”. A good
access to it can be found e.g. in [EG 97] or [Z 06]. A broader overview on
Veech groups of translation surfaces is given e.g. in [HuSc 01] and in [Le 02].
Teichmüller disks, Teichmüller curves and Veech groups have intensively been
studied by numerous authors, starting from Thurston [T 88] and Veech himself
[V 89]. We refer to [S 04] and [HeSc 06] for more comprehensive overviews on
references.
3 Veech groups of origamis
Let O = p : (X → E) be an origami. We have seen in Section 2 that O defines an
SL2(R)-family of translation structures µA (A ∈ SL2(R)) on X
∗ = X − p−1(∞).
The corresponding Veech groups are not very different. In fact, they are all
conjugated to each other. More precisely, we have:
Γ(X∗, µA) = AΓ(X
∗, µI)A
Thus, we may restrict to the case where A = I which justifies the following
definition.
Definition 3.1. The Veech group Γ(O) of the origamiO is defined to be Γ(X∗, µI).
From Example 2.2 it follows that the Veech group of the trivial origami O0 (de-
fined in Example 1.1) is SL2(Z). For a general origami one can show that Γ(O)
3 VEECH GROUPS OF ORIGAMIS 11
is a finite index subgroup of SL2(Z). In fact, also the converse is true as it was
shown by Gutkin and Judge in [GJ 00]: A Veech group is a finite index subgroup
of SL2(Z) if and only if it comes from an origami.
From this it follows in particular by the Fact presented in Section 2 on page 10
that an origami always defines a Teichmüller curve in the moduli space.
Characterization of origami Veech groups
Recall from Section 1 that an origami O corresponds (up to equivalence) to a
finite index subgroup U of F2 = F2(x, y), the free group in two generators (up to
conjugation). This description enables us to give a characterization of its Veech
group entirely in terms of F2 and its automorphisms.
For this we need the following two ingredients:
• Let β̂ : Aut(F2) → Out(F2) ∼= GL2(Z) be the natural projection. The
fact that we only consider orientation preserving diffeomorphisms applies
to only taking automorphisms of Aut(F2) that are mapped to elements in
SL2(Z). We denote Aut
+(F2) = β̂
−1(SL2(Z)) and restrict to the map
β̂ : Aut+(F2) → SL2(Z).
• Let Stab(U) = {γ ∈ Aut+(F2)|γ(U) = U}
Using these ingredients, it was shown in [S 04] that Veech groups of origamis can
be described as stated in the following theorem.
Theorem 2 (Proposition 1 in [S 04]). For the Veech group Γ(O) of the origami
O holds:
Γ(O) = β̂(Stab(U))
Let us make two comments on this description:
One consequence is, that one obtains an algorithm that can calculate the Veech
group of an arbitrary origami explicitly. This algorithm is described in detail in
[S 04].
As an other consequence, we have now a characterization of all origami Veech
groups as stated in the following corollary.
Corollary 3.2. A finite index subgroup of SL2(Z) occurs as origami Veech group
if and only if it is the image of the stabilizing group Stab(U) ⊆ Aut+(F2) for
some finite index subgroup U in F2.
Thus the question, which finite index subgroups of SL2(Z) are Veech groups be-
comes roughly speaking the same as the question which subgroups of Aut+(F2)
are such stabilizing groups. So far, there is no general answer known.
3 VEECH GROUPS OF ORIGAMIS 12
In [S 05] it was shown that many congruence subgroups of SL2(Z) are Veech
groups. Recall that a congruence group of level n is a subgroup of SL2(Z) that is
the full preimage of some subgroup of SL2(Z/nZ) under the natural homomor-
phism SL2(Z) → SL2(Z/nZ) and n shall be minimal with this property. For prime
level congruence groups the following statement is shown in [S 05, Theorem 4]
Theorem 3. Let p be prime. All congruence groups Γ of level p are Veech groups
except possibly p ∈ {2, 3, 5, 7, 11} and Γ has index p in SL2(Z).
This result is generalized to a statement for arbitrary n in [S 05, Theorem 5]
Presenting the Veech group Γ and the quotient H/Γ for an origami
As mentioned above, using Theorem 2 the Veech group of an origami can be
calculated explicitly. The Veech groups are described as subgroups of SL2(Z)
by generators and coset representatives. We use for the notation that SL2(Z) is
generated by S and T , with
and T =
Recall furthermore from the discussion on Veech groups and Teichmüller curves
in Section 2 on page 10 that for a Veech group Γ we are in particular interested
in the quotient H/Γ, since this quotient is birational to the corresponding Te-
ichmüller curve. Here Γ acts as Fuchsian group on the upper half plane H, which
is endowed with the Poincaré metric.
Since an origami Veech group Γ is a finite index subgroup of SL2(Z), the quotient
H/Γ comes with a natural triangulation. More precisely, we choose the funda-
mental domain for the action of SL2(Z) on H that is the geodesic pseudo-triangle
∆ with vertices P = −1
i, Q = 1
i and P∞ = ∞.
Figure 8: Fundamental domain of SL2(Z).
The surface H/SL2(Z) is obtained by identifying the vertical edges P∞ and Q∞
via T and the edge PQ with itself (with fixed point i) via S.
For an arbitrary subgroup Γ of SL2(Z) of finite index we obtain a fundamental
domain as a union of translates of the triangle ∆: for each coset A we take the
3 VEECH GROUPS OF ORIGAMIS 13
triangle A(∆), where A is a representative of the coset. The identification of
the edges is given by the elements in Γ. Gluing the edges gives the quotient
surface H/Γ, filling in the cusps leads to a closed Riemann surface endowed with
a triangulation. We draw stylized pictures of the fundamental domains that
indicate the triangles (see Figure 9 and 10). The triangles are labeled with a
coset representative, edges that are identified are labeled with the same letter
and vertices that are identified with the same number. Vertices that come from
cusps (i.e. points at ∞) are marked with •.
In particular, one can read off from these stylized pictures the genus and the
number of cusps of the quotient surface H/Γ.
Two examples: the origami L(2,3) and the origami D
The origami L(2,3):
In [S 04, Example 3.5] the Veech group is calculated as follows:
Γ(L(2, 3)) = <
More precisely, one obtains the generators presented as products of S and T as
well as a list of coset representatives.
• List of generators:
= T 3,
= TST 2ST−1T−1,
= TSTST−1S,
= T 2STST−1S−1T−2,
• List of representatives:
I, T, S, T 2, TS, ST, T 2S, TST, T 2ST
Hence, Γ(L(2, 3)) is a subgroup of index 9 in SL2(Z).
The stylized picture of the quotient H/Γ(L(2, 3)) is determined in [S 04, Example
3.6] and is shown here in Figure 9.
3 VEECH GROUPS OF ORIGAMIS 14
TTSTT
Figure 9: Fundamental domain of Γ(L(2, 3)).
From this one can read off that the genus of the quotient H/Γ(L(2, 3)) is 0 and
that it has 3 cusps, namely the vertices labeled by 1,4 and 5. It follows in par-
ticular that the corresponding Teichmüller curve has genus 0.
The origami D:
The Veech group of the origami D is calculated in [S 05, Section 7.3.2]. It has
index 24 in SL2(Z) and the following generators:
= −I, A1 =
= T 3,
= ST 6S−1, A3 =
−7 16
= (T 2S)T 4(T 2S)−1.
= (TS)T 4(TS)−1, A5 =
−20 11
= (TST 2S)T 5(TST 2S)−1,
−18 −5
= (ST 3S)T 2(ST 3S)−1,
The following is a system of cosets representatives:
I , T , S , T 2 , TS , ST , T 2S , TST , ST 2 , STS , T 2ST , TST 2 ,
ST 5 , ST 3 , T 2S , TST 3 , TST 2S , ST 4 , ST 3S , TST 2ST−1 ,
TST 2ST−2 , TST 2ST−3 ; TST 2ST−4 , ST 3ST
4 VEECH GROUPS THAT ARE NON CONGRUENCE GROUPS 15
The corresponding origami curve C(D) has genus 0. It is shown with its natu-
ral triangulation in Figure 10. It has six cusps, namely C1, C2, C3, C4, C5 and C6.
TT TTS
TTSTT
STTSTT
TSTTS
TSTTST−1
3ST ST
Figure 10: The origami curve to D.
4 Veech groups that are non congruence groups
Theorem 3 implies that there are many congruence groups which are Veech
groups. How about non congruence groups? In this section we will see that
the Veech groups for the two examples, the origami L(2, 3) and the origami D,
studied in the last paragraph are both non congruence groups. Furthermore,
we give a construction that produces for both of them an infinite sequence of
origamis whose Veech group is a non congruence group. We use this in order to
prove our main theorem.
4 VEECH GROUPS THAT ARE NON CONGRUENCE GROUPS 16
An other generalization of the example L(2, 3) was given by Hubert and Lelièvre
in [HL 05], where they show for certain “L-shaped” origamis or square-tiled sur-
faces, how they are called there, that their Veech groups are non congruence
groups. These surfaces are all of genus 2, hence it follows that there are infinitely
many origamis of genus 2 whose Veech group is a non congruence group.
Recall that a group is a congruence group, whose level is a divisor of n, if and
only if it contains the principal congruence group
Γ(n) = {
mod n} = kernel(proj : SL2(Z) → SL2(Z/nZ))
In [S 04, Proposition 3.8] it was shown using a proof of Stefan Kühnlein that the
Veech group of L(2, 3) is a non congruence group. The basic tool for this is the
general level that is defined for any subgroup Γ of SL2(Z) as follows: For each
cusp we define its amplitude to be the smallest natural number n such that there
is an element of Γ conjugated in SL2(Z) to the matrix
which fixes the cusp. Observe that this is equal to the number of triangles around
the vertex that represents the cusp in our stylized picture of the quotient surface
(see Figures 9 and 10). The general level of Γ is the least common multiple of the
amplitudes of all its cusps. A theorem of Wohlfahrt [W 64, Theorem 2] states
that the level and the general level of a congruence group coincide.
The amplitude of the three cusps of H/Γ(L(2, 3)) labeled with 1, 4 and 5 in Fig-
ure 9 is 3, 2 and 4 respectively. Hence, the general level of Γ(L(2, 3)) is 12. Then
it is shown in the proof that Γ(L(2, 3)) does not contain Γ(12) which gives the
contradiction.
The same method can be used in order to show that Γ(D) is a non congruence
group. We here carry out the proof for it. Observe from Figure 10 that the six
cusps C1, . . . , C6 have the amplitude 3, 6, 4, 4, 5 and 2, respectively. Thus the
general level is 60.
Proposition 4.1. The Veech group Γ(D) is a non congruence group.
Proof. Suppose that Γ = Γ(D) is a congruence group. Since the general level of
Γ is 60, we have by the theorem of Wohlfahrt mentioned above, that Γ(60) is a
subgroup of Γ.
We will use the following facts, which can be checked e.g. in Figure 10:
∈ Γ, A6 =
−18 −5
∈ Γ and T =
4 VEECH GROUPS THAT ARE NON CONGRUENCE GROUPS 17
In order to verify this in Figure 10, use that
A1 = T
3 and A6 = S
−1T 2S−1T−1S−1TS−1T−3S−1.
We will find an element in Γ whose projection to SL2(Z/60Z) is equal to that of
T , which gives us the desired contradiction.
Recall that
SL2(Z/60Z) ∼= SL2(Z/4Z)× SL2(Z/3Z)× SL2(Z/5Z).
We identify in the following these two groups. Furthermore we denote by p4, p3,
p5 and p60 the projection from SL2(Z) to SL2(Z/4Z), SL2(Z/3Z), SL2(Z/5Z) and
SL2(Z/60Z), respectively. Then p60 = p4 × p3 × p5.
We have
p60(A1) = (
) and
p60(A6) = (
The order of p4(A1) in SL2(Z/4Z) is 4, the order of p3(A1) in SL2(Z/3Z) is 1
and the order of p5(A1) in SL2(Z/5Z) is 5. We also say: The order of p60(A1) is
(4, 1, 5). Since 7 ≡ 3 mod 4 and 7 ≡ 2 mod 5 we have
p60(A
1) = (
) (4)
Furthermore:
p60(A
6) = (
and with the same notation as above p60(A
6) has the order (1, 3, 5). Thus
p60(A
6 ) = (
). (5)
From (4) and (5) it follows that
p60(A
6 · A
1) = (
) = p60(
) = p60(T )
But A206 · A
1 ∈ Γ and T /∈ Γ, thus Γ(60) = ker(p60) cannot be contained in Γ.
Therefore, Γ cannot be a congruence group of level 60. Contradiction!
4 VEECH GROUPS THAT ARE NON CONGRUENCE GROUPS 18
Sequences of origamis with non congruence Veech groups
Starting from the origamis L(2, 3) and D we will define respectively a sequence
On, such that for each n ∈ N the Veech group Γ(On) again is a non congruence
group. The basic idea is to “copy and paste”: we will cut the origami along a
segment, take n copies of it and glue them along the cuts.
In Figure 11 we show the origami On for L(2, 3):
1 3 4
5 7 8
. . . 4n-7 4n-5 4n-4
4n-3 4n-1 4n
Figure 11: n copies of L(2, 3). Opposite edges are glued.
Using the description of an origami by a pair of permutations from Section 1, On
is given as:
σa = (1 3 4 5 7 8 9 11 12 . . . 4n−3 4n−1 4n), σb = (1 2)(5 6) . . . (4n−3 4n−2).
Observe that the genus of On is n + 1 and it has 2n cusps: n of order 3 (all n
marked by • in Figure 11), and n of order 1 (all n marked by ◦ in Figure 11).
Finally, we want to present the origami On by the finite index subgroup Hn =
∗) of F2, that corresponds to On by Remark 1.6.
Recall from Example 1.7 that for O1 = L(2, 3), we obtain the free group of rank 5:
U = H1 =< g1 = x
3, g2 = xyx
−1, g3 = x
2yx−2, g4 = yxy
−1, g5 = y
2 >= F5.
The group Hn is obtained as as follows:
Hn = < g
1 , g
1 gj g
1 ∈ F5 | i ∈ {0, . . . , n− 1} and j ∈ {2, . . . , 5} >
In Figure 12, we show the origami Dn:
1 2 3
6 7 8
. . . . . . . . .
5n-4 5n-3 5n-2
b1 a2
b2 an
• • •
* • • •
• • •
Figure 12: n copies of D. Edges with the same label or
unlabeled opposite edges are glued.
4 VEECH GROUPS THAT ARE NON CONGRUENCE GROUPS 19
The pair of permutations describing Dn is:
σa = (1 2 3 6 7 8 . . . 5n− 4 5n− 3 5n− 2),
σb = (1 4 5)(6 9 10) . . . (5n− 4 5n− 1 5n)(2 3)(7 8) . . . (5n− 3 5n− 2)
The genus of Dn is 2n and it has n+ 2 cusps: 2 of order 2n (marked as • and ⋆)
and n of order 1 (all n marked by ◦).
Again, we present On by the corresponding finite index subgroup Hn of F2. We
have from Example 1.7 that U = H1 = F6, the free group of rank 6:
U =< g′1 = x
3, g′2 = xyx
−2, g′3 = x
2yx−1, g′4 = yxy
−1, g′5 = y
2xy−2, g′6 = y
3 > = F6
And similarly as above, we obtain:
Hn = < g
1 , g
1 ∈ F6 | i ∈ {0, . . . , n− 1} and j ∈ {2, . . . , 6} >
We will see in the following that for both sequences all Veech groups Γ(On) are
non congruence groups. More precisely, we will show:
Proposition 4.2. For both sequences On the following holds:
• Γ(On) ⊆ Γ(O1), which is for both sequences a non congruence group.
• More generally one has:
n divides m ⇒ Γ(Om) ⊆ Γ(On).
• Different origamis in one sequence have different Veech groups, i.e.:
Γ(On) 6= Γ(Om) for n 6= m.
To prove this, let us detect that we are in the following more general setting.
Setting A:
• Let U be a finite index subgroup of F2. Then U is a free group of rank k
for some k ≥ 2, i.e.
U = < g1, . . . , gk > = Fk
• Let α : Fk → Z be the projection w 7→ ♯g1w
where ♯g1w is the number of g1 in the word w = w(g1, . . . , gk) with g
counted as −1.
• Let Hn be the kernel of pn ◦ α, where pn : Z → Z/nZ is the natural
projection, i.e.
Hn = < g
1 , g
1 gj g
1 ∈ Fk | i ∈ {0, . . . , n− 1} and j ∈ {2, . . . , k} > .
4 VEECH GROUPS THAT ARE NON CONGRUENCE GROUPS 20
• Finally, let H0 be the kernel of α, i.e.:
Hn = << g2, . . . , gk >>U ,
is the normal subgroup in U generated by g2, . . . , gk.
Observe that we are in this setting with
U = π1(X
∗) =< x3, xyx−1, x2yx−2, yxy−1, y2 > for the origami L(2, 3) and
U = π1(X
∗) =< x3, xyx−2, x2yx−1, yxy−1, y2xy−2, y3 > for the origami D.
In order to prove the properties in Proposition 4.2, we will need that U fulfills
the following a bit technical condition:
Property B: Let U =< g1, . . . , gk > (k ≥ 2) be as above a finite index
subgroup of F2 of rank k and {wi}i∈I a system of coset representatives with
w1 = id. Suppose that U has the following property:
∀ j ∈ I − {1} : wj << g2, . . . , gk >>U w
j 6⊆ U.
One can check by hand that for both origamis, L(2, 3) and D, this property is
fulfilled. In this setting we obtain the following conclusions.
Proposition 4.3. Let n ∈ N ∪ {0}. Let U be a finite index subgroup of F2
fulfilling property B. With the notations from Setting A, we have:
a) The normalizer of Hn in F2 is equal to U : NormF2(Hn) = U
b) Stab
(Hn) ⊆ StabAut+(F2)
c) Recall that U = Fk, the free group in k generators.
Let βn : Aut(Fk) → GLk(Z/nZ) be the natural projection.
Then Stab
(Hn) is equal to
β−1n ({A = (ai,j)1≤i,j≤k ∈ GLk(Z/nZ)| a1,2 = . . . = a1,k = 0}) ∩ G.
Here we use the notation Z/(0Z) = Z thus β0 is the natural projection
Aut(Fk) → GLk(Z).
Proof.
By definition Hn is normal in U , i.e. U ⊆ NormF2(Hn).
Let now w be an element of F2\U . Hence, w = wj ·u for some j ∈ I−{1}, u ∈ U .
By Property B, there exists some h0 ∈ << g2, . . . , gk >>U = H0, such that
wjh0w
j 6∈ U . Therefore we have w(u
−1h0u)w
−1 6∈ U . But u−1h0u ∈ H0 ⊆ Hn,
since H0 is normal in U . This shows that w 6∈ NormF2(Hn).
4 VEECH GROUPS THAT ARE NON CONGRUENCE GROUPS 21
This follows from a), since for a subgroup H of F2 in general holds:
(H) ⊆ Stab
(NormF2(H)), see e.g. [S 06, Remark 3.1].
Define M = {A = (ai,j)1≤i,j≤k ∈ GLk(Z/nZ)| a1,2 = . . . = a1,k = 0}.
Let γ ∈ G. We have to show that γ(Hn) = Hn if and only if βn(γ) ∈ M .
Let furthermore pkn : Fk → (Z/nZ)
k be the natural projection.
Consider the following commutative diagram:
Hn = p
n(Hn) ⊆ (Z/nZ)
βn(γ) // (Z/nZ)k ⊇ pkn(Hn) = Hn
Since pkn is surjective and Hn is the full preimage of Hn = p
n(Hn), it follows that
γ(Hn) = Hn if and only if βn(γ)(Hn) = Hn.
Observe finally that:
Hn = {(0, x2, . . . , xk) ∈ (Z/nZ)
k} and
StabGLk(Z/nZ)(Hn) = { A = (ai,j)1≤i,j≤k ∈ GLk(Z/nZ)|
(y1, . . . , yk) = A · (0, x2, . . . , xk) ⇒ y1 = 0 }
= {A = (ai,j) ∈ GLk(Z/nZ)| a1,2 = . . . = a1,k = 0}
Theorem 2 suggests the following notation.
Definition 4.4. Let U be a subgroup of F2.
With β̂ : Aut+(F2) → SL2(Z) as in Theorem 2, we define
Γ(U) = β̂(Stab
and call Γ(U) the Veech group of U .
We now obtain from Proposition 4.3 the following conclusions.
Corollary 4.5. Suppose that we are in the same situation as in Proposition 4.3,
in particular that U is a finite index subgroup of F2 fulfilling property B. Then
we have for all n ∈ N:
a) Stab
(H0) ⊆ StabAut+(F2)
(Hn) and Γ(H0) ⊆ Γ(Hn).
4 VEECH GROUPS THAT ARE NON CONGRUENCE GROUPS 22
b) If m ∈ N with n|m, then:
(Hm) ⊆ StabAut+(F2)
(Hn) and Γ(Hm) ⊆ Γ(Hn).
(H0) =
(Hn) and Γ(H0) =
Γ(Hn)
Proof.
a) and b):
Let γ ∈ G. By Proposition 4.3 we have that
∀n ∈ N : γ ∈ Stab
(Hn) ⇔ βn(γ) = A = (ai,j)
with a1,2 ≡ . . . ≡ a1,k ≡ 0 mod n
and γ ∈ Stab
(H0) ⇔ β0(γ) = A = (ai,j)
with a1,2 = . . . = a1,k = 0.
Thus we have for all n ∈ N and for all m ∈ N with n|m, that
(H0) ⊆ StabAut+(F2)(Hm) ⊆ StabAut+(F2)(Hn).
We have in particular by the definition of the Veech group of a subgroup of F2:
Γ(H0) ⊆ Γ(Hm) ⊆ Γ(Hn).
⊆ follows from a). ⊇ follows from Remark [S 06, Remark 3.1].
We now return to the language of origamis: Let O be an origami, U the corre-
sponding subgroup of F2. Define for U the subgroups Hn (n ∈ N) as in Setting
A and let On be the origamis corresponding to the groups Hn.
By Corollary 4.5 and Theorem 2 we obtain immediately the following result.
Proposition 4.6. If U has the Property B, then
∀n ∈ N : Γ(On) ⊆ Γ(O) and ∀n,m ∈ N : n|m ⇒ Γ(Om) ⊆ Γ(On).
In particular, if Γ(O) is a non congruence group, each Γ(On) is a non congruence
group. Thus in this case, we obtain infinitely many origamis whose Veech group
is a non congruence group.
In order to conclude Proposition 4.2, it is now just left to prove the last item. But
this follows , since we have (see [S 05]) for both sequences On, the one coming
from the origami L(2, 3) and the one coming from the origami D, that
∈ Γ(On) ⇔ 3n divides s. (6)
4 VEECH GROUPS THAT ARE NON CONGRUENCE GROUPS 23
This finishes the proof of Proposition 4.2.
Furthermore, Theorem 1 follows from Proposition 4.2.
Remark: From Corollary 4.5 and (6) it follows that Γ(H0) has infinite index in
SL2(Z). Furthermore it is non trivial, since it contains
for L(2, 3) respectively B3 =
for D.
REFERENCES 24
References
[EG 97] C.J. Earle, F.P. Gardiner: Teichmüller disks and Veech’s F -structures.
American Mathematical Society. Contemporary Mathematics 201, 1997 (p.
165–189).
[GJ 00] E. Gutkin, C. Judge: Affine mappings of translation surfaces. Duke
Mathematical Journal 103 No. 2, 2000 (p. 191–212).
[HeSc 06] F. Herrlich, G. Schmithüsen: On the boundary of Teichmüller disks
in Teichmüller and in Schottky space. To appear in Handbook of Teichmüller
theory. Ed. A. Papadopoulos, European Mathematical Society, 2006.
[H 06] F. Herrlich: A comb of origami curves in M3. Proceedings of Symposium
on Transformation Groups, Yokohama, November 2006.
[HL 05] P. Hubert, S. Lelièvre: Noncongruence subgroups in H(2). International
Mathematics Research Notices 2005, No.1 , 2005 (p. 47–64).
[HuSc 01] P. Hubert, T. Schmidt: Invariants of translation surfaces. Annales de
l’Institut Fourier 51 No. 2, 2001 (p. 461–495).
[Le 02] S. Lelièvre: Veech surfaces associated with rational billiards. Preprint,
2002. arXiv:math.GT/0205249.
[Lo 05] P. Lochak: On arithmetic curves in the moduli space of curves. J. Inst.
Math. Jussieu 4, No. 3, 2005 (p. 443–508).
[S 04] G. Schmithüsen: An algorithm for finding the Veech group of an origami.
Experimental Mathematics 13 No. 4, 2004 (p. 459–472).
[S 05] G. Schmithüsen: Veech Groups of Origamis. Dissertation (PhD thesis),
Karlsruhe 2005. Elektronisches Volltextarchiv EVA Universität Karlsruhe.
http://www.ubka.uni-karlsruhe.de/eva/
[S 06] Schmithüsen,G.: Examples for Veech groups of origamis. In: The Geome-
try of Riemann Surfaces and Abelian Varieties. Contemp. Math. 397, 2006 (p.
193–206).
[T 88] W. Thurston: On the geometry and dynamics of diffeomorphisms of sur-
faces. Bulletin (New Series) of the American Mathematical Society 19 No. 2,
1988 (p. 417–431).
[V 89] W.A. Veech: Teichmüller curves in moduli space, Eisenstein series and
an application to triangular billiards. Inventiones Mathematicae 97 No.3 1989,
(p. 553–583).
REFERENCES 25
[W 64] K. Wohlfahrt: An Extension of F. Klein’s Level Concept. Illinois Journal
of Mathematics 8, 1964 (p. 529–535).
[Z 06] A. Zorich: Flat Surfaces in collection ”Frontiers in Number Theory,
Physics and Geometry. Volume 1: On random matrices, zeta functions and
dynamical systems”, Ed. P. Cartier, B. Julia, P. Moussa, P. Vanhove, Springer-
Verlag, 2006 (p. 439–586).
	Origamis
	Translation structures and Veech groups
	Veech groups of origamis
	Veech groups that are non congruence groups
ABSTRACT
  As main result we show that for each g > 1 there is some translation surface
of genus g whose Veech group is a non congruence subgroup of SL(2,Z). We use
origamis/square-tiled surfaces to produce our examples. The article is divided
into two parts: In the first part we introduce translation surfaces, origamis,
Veech groups and Teichmueller curves and show for two origamis in genus 2 that
their Veech groups are non congruence groups; in the second part we provide a
technique that produces sequences of origamis whose Veech groups are
decreasing. This is used to prove the main result.

<|endoftext|><|startoftext|>
Introduction
In Ruelle’s book [R69] on statistical mechanics, in section 3.2 concerning one
species of classical particles in Rν , you can read:
1 PROPOSITION. If the pair potential Φ can be written in the form
Φ = Φ1 + Φ2 (1)
where Φ1 is positive, and Φ2 is a real continuous function of positive type, then
Φ is stable.
1Bernhard.Baumgartner@univie.ac.at
http://arxiv.org/abs/0704.0417v1
TDS BB April 3, 2007 2
“Positive” is meant here and throughout this paper as nowhere negative,
“stable” means
∃E0 ∈ R such that ∀N, ∀{x1...xN} ⊂ Rν : U(x1 · · ·xN) ≧ N · E0, (2)
where U(x1 · · ·xN ) =
i 6=j
Φ(xj − xi). (3)
This proposition is accompanied by the
2 FOOTNOTE. It seems to be an open problem to construct a stable potential
which is not of the form (1).
We solve this problem in dimension 1, considering particles either in Z or in
R, giving a detailed proof. In dimension 2 the problem can also be solved, but
we give only a sketch of the ideas. 1
To make it simple, we consider only pair potentials which are bounded con-
tinuous functions and state the stability property as
3 DEFINITION. A bounded continuous real valued function V on Rν is stable,
E(ρ) :=
ρ(x)V (x− y)ρ(y)dνx dνy ≥ 0 (4)
for every positive finite measure ρ(x)dνx on Rν. A bounded real valued function
V on Zν is stable, if
E(ρ) :=
ρ(~m)V (~m− ~n)ρ(~n) ≥ 0 (5)
for every positive bounded function ρ(~m) on Zν .
The stability property used in Ruelle’s Theorem is an immediate consequence.
With ρ =
i=1 δ(xi − xj) put into equation (4) one gets
U(x1 · · ·xN ) = E(ρ)−N · V (0) ≥ −N · V (0).
The main result of our considerations is stated as
4 THEOREM. Each of the following functions is a stable pair potential, but
not a sum of a positive and a real valued positive definite function.
1. The function V : Z → R, defined as
V (0) = V (2) = V (−2) = 1, V (1) = V (−1) = −1, (6)
V (n) = 0 ∀n with |n| ≥ 3,
2. The function W : R → R, defined as
W (x) =
V (n)f(n− x+ y)f(y)dy, (7)
with f a positive continuous function (−1
) → R and V as defined in (6).
1 Construction in higher dimensions is still an open problem.
TDS BB April 3, 2007 3
2 Properties of the interaction potentials
Proof. Of part (1) of 4 Theorem.
Denote the distribution of particles on the chain by the “density” ρ, a function
Z → Z+. The interaction energy U becomes smaller, when the system is cut into
non-interacting pieces: If ρ(n) ≥ ρ(n+1) divide the chain, cutting between n+1
and n+ 2. Moving the pieces apart, one looses the energy
2[ρ(n)− ρ(n + 1)]ρ(n+ 2) + 2ρ(n+ 1)ρ(n + 3) ≥ 0.
The symmetric procedure of cutting between n − 2 and n − 1 lowers the energy
if ρ(n− 1) ≤ ρ(n).
Now there remains a set of pieces of no more than three lattice points, with
densities like
0 ≤ ρ(n− 1) ≤ ρ(n) ≥ ρ(n + 1) ≥ 0.
Including the “self-energies” N · V (0) one gets for each piece, centered around n,
E = ρ(n−1)2+ρ(n)2+ρ(n+1)2+2[ρ(n−1)ρ(n+1)−ρ(n−1)ρ(n)−ρ(n)ρ(n+1)]
= [ρ(n− 1)− ρ(n) + ρ(n+ 1)]2 ≥ 0.
Proving the stability of V .
If V were the sum of a positive and a positive definite function, it would give
V (n)µ(n) ≥ 0, (8)
for each µ being both positive and positive definite. Now consider
µ(5ν) = 1, µ(5ν ± 1) =
, µ(5ν ± 2) = 0, (9)
which is obviously positive. Positive definiteness is seen by using Bochner’s the-
orem [RN55] and calculating the Fourier-Transform, with α ∈ (−π,+π]:
µ̂(α) =
µ(n)e−inα
5 δ(α) +
),+δ(α+
> 0. (10)
But it does not give a positive value in (8):
V (n)µ(n) = 2−
5 < 0.
TDS BB April 3, 2007 4
The appearance of the numbers 5 and
5 may seem mysterious. Demystifying
is the next section, where we present the “origin” of these V and µ.
In this section we develop further use of these functions in R and in R2.
Proof. Of part (2) of 4 Theorem.
For N particles at x1 . . . xN consider the measure
ρ(x) =
δ(x− xj). (11)
Adding the self-energies N ·W (0), we study
ρ(x)W (x− y)ρ(y) dxdy
V (n)
ρf (x+ n)ρf (x) dx, (12)
with ρf(x) :=
f(x − y)ρ(y) dy. Splitting the integral in (12) into pieces of
intervals with unit length and defining ρf,x(m) = ρf(x+m) gives
V (n)ρf (x+m+ n)ρf(x+m)
ρf,x(p)V (p−m)ρf,x(m) ≥ 0,
by part (1) of the theorem. So the potential W is stable.
Now consider the distribution
µD(x) =
µ(m)δ(x−m), (13)
using the sequence µ defined in (9). This distribution is positive and positive
definite, as can be seen at its Fourier transform, which is (up to a factor) the
same as in (10), now with µ̂D(α+2π) = µ̂D(α) periodically extended to all α ∈ R.
This µD is used to show that the potential is not a sum of positive and positive
definite functions:
W (x)µD(x)dx
V (n)
∫ + 1
dx δ(x−m)f(n− x+ y)f(y)
V (n)µ(n) ·
f 2(y)dy < 0. (14)
In the last step the final support of f is essential.
TDS BB April 3, 2007 5
Construction of a stable pair potential in R2 being a function of the particle
distances only may be done in the following way:
• Use W (x) defined in (7), now with an f supported on (−1
), convolute it
twice with the distribution
h(x) =
e−ǫ|n|δ(x− 5n) :
W1(x) =
h(x− y)W (y − z)h(z)dy dz.
• Take the mean value (times 2π) of all rotated versions: Wr(~x) = 1rW1(|~x|).
• Smoothen out Wr with a positive continuous function g(r) with support on
[0, 1
W2(~x) =
g(|~x− ~y|)
W1(|~y − ~z|)
|~y − ~z|
g(|~z|)d2y d2z.
That the stability is not destroyed by the double convolution with h follows
from a consideration as it is used in the equation (12). Written in a formal way:
〈ρ|W1 |ρ〉 = 〈ρ| h ∗W ∗ h− |ρ〉 = 〈ρ ∗ h|W |ρ ∗ h〉.
Considering only smooth densities ρ(~x) one may take W1(x1)δ(x2) as a stable
distribution in R2:
〈ρ|W1 · δ |ρ〉dim=2 =
〈ρy|W1 |ρy〉dim=1 d y ≥ 0.
Now rotating the axes and taking the mean value does not destroy the stability.
Once more a double convolution is done, now with g in order to get W2 as a
bounded continuous potential acting in R2.
〈ρ|W2 |ρ〉 = 〈ρ| g ∗W ∗ g− |ρ〉 = 〈ρ ∗ g|W |ρ ∗ g〉 ≥ 0.
Smoothing by convolution with g enables to consider again sets of particles rep-
resented by delta-functions in ρ.
To disprove the possibility of splitting W2 into a sum of a positive and a
positive definite function one may use the µD of equ. (13) embedded into R
µD(x, y) = µD(x)δ(y).
Due to the smoothing of Wr by g and due to its decrease given by the decrease
of h, the integral
W2µD is finite:
W2µD(x) dx = W2(0) + 2W2(1)µ(1) + 2 ·
W2(|5ν + n|)µ(n)
TDS BB April 3, 2007 6
The bounded support of f and g is needed here as it was in equ. (14). The
exponential decrease implies
W2(|5ν + n|) = const. · e−5ǫ ν
V (n) ·
1 +O(
The “const.” factor involves the integrals over f 2 and g2, the error term O( 1
gives the difference between e−5ǫ ν/5ν and e−ǫ (5ν+n)/(5ν + n). The summations
over ν and n give
≈ 2 · const. ·
V (n)µ(n) · log(1/ǫ) +O
e−5ǫ ν
The first part is negative and increases without limit when ǫ → 0, while the other
term remains finite. So W2 with small ǫ can not be a sum of positive and positive
definite functions.
3 Mathematical background
Only in applying Proposition 1 in statistical mechanics the Thermodynamic Limit
is considered, not yet in the investigations of “stability”. Moreover, in the re-
formulation in 3 Definition there is no mentioning of “particles”. What is used
of properties of space are: A distance relation between points and an invariant
measure. This allows for a more general version of the definition, concerning
functions on groups. We keep the notation we used above: x and y are elements
of the group, their “group product” is x+ y, the “inverse” of x is −x.
5 DEFINITION. Consider a bounded continuous real valued function V on a
locally compact abelian group G which has the Haar measure dx. V is stable, if
〈ρ|V |ρ〉 :=
ρ(x)V (x− y)ρ(y)dx dy ≥ 0 (15)
for every finite positive Borel measure ρ(x)dx.
Stable functions can be added, multiplied by positive numbers, and limits
may be formed. So they form a closed convex cone, which we call STB. This
cone STB contains POS, the cone of positive functions, also PDF, the cone of
positive definite functions and sums thereof.
STB ⊃ POS + PDF (16)
An investigation of the relations between these cones may proceed via investi-
gation of the dual cones (see [V64, R62, G03]). The dual cones are subsets of
V ′, the space of finite Borel measures µ(x)dx, which is the dual space to V, the
TDS BB April 3, 2007 7
Banach space of bounded continuous functions. The dual cone to POS is POS′,
the set of finite positive Borel measures, dual to PDF is PDF′, the set of finite
positive definite Borel measures. The cone STB′ is given as the closure of the
cone of convex combinations of “correlation measures”
µ(x) =
ρ(x)ρ(y + x)dy, (17)
i.e. convolutions of finite positive Borel measures ρ(x)dx with their reflected ver-
sion ρ(−x)dx. These correlation measures are both positive and positive definite:
STB′ ⊂ POS′ ∩ PDF′ (18)
Now the question of equality or inequality in this relation is related to the
central problem which is our concern in this investigation, the question of equality
or inequality in (16). If the closed cone POS′∩PDF′ contains an element µ which
is not in the closed cone STB′, then, by definition of “dual cone”, there exists
an element V ∈ STB such that
V µ < 0, incompatible with a decompostion
V = f + g, f ∈ POS, g ∈ PDF.
For the groups Z2, Z3, Z4 there is equality in the equations (16) and (18), but
not for Z5.
6 PROPOSITION. The intersection of POS′ ∩ PDF′ with the plane
{(µ(−2) . . . µ(2))|µ(0) = 1} is completely characterized by its extremal points
(0, 0, 1, 0, 0), (0, γ, 1, γ, 0), (γ, 0, 1, 0, γ), (1, 1, 1, 1, 1), with γ = (
5 − 1)/2 =
1/(2| cos 4π/5|).
Proof. By using Bochner’s theorem and analyzing the Fourier transform
µ̂(k) =
µ(n)e−2π k n/5. (19)
On the other hand there is a bound for STB′ which cuts off a triangular subset
of this convex quadrangle:
7 LEMMA. Each element of STB′ obeys the inequality
µ(1) ≤
µ(n)/4. (20)
Proof. STB′ is defined by its extremal rays, formed as correlation measures of
positive densities.
µ ∈ STB′, µ extremal ⇔ ∃ρ ≥ 0, µ(n) =
ρ(m)ρ(m + n).
TDS BB April 3, 2007 8
Assume, w.l.o.g., that ρ(−1) ≥ ρ(−2). then
µ(1) = [ρ(−1) + ρ(1)] · [ρ(−2) + ρ(0) + ρ(2)]− [ρ(−1)− ρ(−2)] ρ(2)− ρ(−2)ρ(1)
− x)(s
+ x) ≤ s
Here s =
m ρ(m), x = [ρ(−2) + ρ(0) + ρ(2)− ρ(−1)− ρ(1)] /2.
Observe
n µ(n) = s
Remark: Also µ(2) obeys this inequality and µ(−1) = µ(1), µ(−2) = µ(2).
Closer inspection reveals moreover two rounded edges of STB′.
Now the extremal point with µ(n) as in equation (9) with ν = 0 is outside
this boundary. And V (n) as in equation (6) is an element of STB, but outside of
POS+PDF.
4 Conclusion
For pair potentials which are bounded continuous functions the property of being
“stable” can be reformulated without mention of particles. In this way it can be
studied for abstract abelian groups. At the heart of the present investigation is
the observation of a function V in Z5 which is stable, but indecomposable into a
sum of positive and positive definite functions. This function V can also be used
on Z. With some smoothing it can be used on R, and in damped periodically
extended, rotationally symmetrized and again smoothed form on R2. Of course
it is possible find sets of other examples nearby. So V (−1) = V (1) in Theorem 4
could be a little bit higher than −1. Only at −(
5+1)/4 ≈ −0.8 does it become
decomposable.
The construction of a rotationally invariant example for dimension two is not
so simple. A nicer one, or one for higher dimension, is not yet known.
References
[R69] D. Ruelle: Statistical Mechanics: Rigorous Results (W. A. Benjamin, inc.,
New York) 1969.
[RN55] F. Riesz and B. Sz. Nagy: Functional Analysis (Ungar, New York) 1955
[V64] Frederick A. Valentine: Convex sets (McGraw-Hill, NY (McGraw-Hill se-
ries in higher mathematics)) 1964
[R62] W. Rudin: Fourier Analysis on Groups (Interscience, New York) 1962
[G03] “Convex cones and their faces” Chapter 3 in: H. Glöckner: Positive Def-
inite Functions on Infinite-Dimensional Convex Cones; Memoirs AMS,
166, Number 789, 2003
	Introduction
	Properties of the interaction potentials
	Mathematical background
	Conclusion
ABSTRACT
  Thermodynamic stable interaction pair potentials which are not of the form
``positive function + real continuous function of positive type'' are presented
in dimension one. Construction of such a potential in dimension two is
sketched. These constructions use only elementary calculations. The
mathematical background is discussed separately.

<|endoftext|><|startoftext|>
Entanglement entropy at infinite randomness fixed points in higher dimensions
Yu-Cheng Lin1, Ferenc Iglói2,3 and Heiko Rieger1
Theoretische Physik, Universität des Saarlandes, 66041 Saarbrücken, Germany
Research Institute for Solid State Physics and Optics, H-1525 Budapest, Hungary
Institute of Theoretical Physics, Szeged University, H-6720 Szeged, Hungary
(Dated: November 3, 2018)
The entanglement entropy of the two-dimensional random transverse Ising model is studied with
a numerical implementation of the strong disorder renormalization group. The asymptotic behavior
of the entropy per surface area diverges at, and only at, the quantum phase transition that is
governed by an infinite randomness fixed point. Here we identify a double-logarithmic multiplicative
correction to the area law for the entanglement entropy. This contrasts with the pure area law valid
at the infinite randomness fixed point in the diluted transverse Ising model in higher dimensions.
PACS numbers: Valid PACS appear here
Extensive studies have been devoted recently to under-
stand ground state entanglement in quantum many-body
systems [1]. In particular, the behavior of various entan-
glement measures at/near quantum phase transitions has
been of special interest. One of the widely used entan-
glement measures is the von Neumann entropy, which
quantifies entanglement of a pure quantum state in a bi-
partite system. Critical ground states in one dimension
(1D) are known to have entanglement entropy that di-
verges logarithmically in the subsystem size with a uni-
versal coefficient determined by the central charge of the
associated conformal field theory [2]. Away from the crit-
ical point, the entanglement entropy saturates to a finite
value, which is related to the finite correlation length.
In higher dimensions, the scaling behavior of the entan-
glement entropy is far less clear. A standard expectation
is that non-critical entanglement entropy scales as the
area of the boundary between the subsystems, known as
the ”area law” [3, 4]. This area-relationship is known to
be violated for gapless fermionic systems [5] in which a
logarithmic multiplicative correction is found. One might
suspect that whether the area law holds or not depends
on whether the correlation length is finite or diverges.
However, it has turned out that the situation is more
complex: numerical findings [7] and a recent analytical
study [8] have shown that the area law holds even for
critical bosonic systems, despite a divergent correlation
length. This indicates that the length scale associated
with entanglement may differ from the correlation length.
Another ongoing research activity for entanglement in
higher spatial dimensions is to understand topological
contributions to the entanglement entropy [9].
The nature of quantum phase transitions with
quenched randomness is in many systems quite different
from the pure case. For instance, in a class of systems
the critical behavior is governed by a so-called infinite-
randomness fixed point (IRFP), at which the energy scale
ǫ and the length scale L are related as: ln ǫ ∼ Lψ with
0 < ψ < 1. In these systems the off-critical regions
are also gapless and the excitation energies in these so-
called Griffiths phases scale as ǫ ∼ L−z with a nonuni-
versal dynamical exponent z <∞. Even so, certain ran-
dom critical points in 1D are shown to have logarith-
mic divergences of entanglement entropy with universal
coefficients, as in the pure case; these include infinite-
randomness fixed points in the random-singlet universal-
ity class [12, 13, 14, 15, 16] and a class of aperiodic singlet
phases [17].
In this paper we consider the random quantum Ising
model in two dimensions (2D), and examine the disorder-
averaged entanglement entropy. The critical behavior of
this system is governed by an IRFP [10, 11] implying that
the disorder strength grows without limit as the system is
coarse grained in the renormalization group (RG) sense.
In our study, the ground state of the system and the
entanglement entropy are numerically calculated using a
strong-disorder RG method [18, 19], which yields asymp-
totically exact results at an IRFP. To our knowledge this
is the first study of entanglement in higher dimensional
interacting quantum systems with disorder.
The random transverse Ising model is defined by the
Hamiltonian
H = −
〈i,j〉
i . (1)
Here the {σαi } are spin-1/2 Pauli matrices at site i of an
L × L square lattice with periodic boundary conditions.
The nearest neighbor bonds Jij(≥ 0) are independent
random variables, while the transverse fields hi(≥ 0) are
random or constant. For a given realization of random-
ness we consider a square block A of linear size ℓ, and
calculate the entanglement between A and the rest of
the system B, which is quantified by the von Neumann
entropy of the reduced density matrix for either subsys-
tems:
S = −Tr(ρA log2 ρA) = −Tr(ρB log2 ρB). (2)
The basic idea of the strong disorder RG (SDRG) ap-
proach is as follows [18, 19]: The ground state of the sys-
tem is calculated by successively eliminating the largest
http://arxiv.org/abs/0704.0418v2
FIG. 1: (color online). An example of typical ground state
in the random quantum Ising model (a) in 1D, and (b) in 2D;
it contains a collection of spin clusters of various sizes, which
are formed and decimated during the RG. The entanglement
of a block (shaded area) is give by the number of decimated
clusters (indicated by red loops) that connect the block with
the rest of the system.
local terms in the Hamiltonian and by generating a new
effective Hamiltonian in the frame of the perturbation
theory. If the strongest bond is Jij , the two spins at i and
j are combined into a ferromagnetic cluster with an effec-
tive transverse field h̃(ij) =
. If, on the other hand,
the largest term is the field hi, the spin at i is decimated
and an effective bond is generated between its neighbor-
ing sites, say j and k, with strength J̃jk =
JijJik
. After
decimating all degrees of freedom, we obtain the ground
state of the system, consisting of a collection of indepen-
dent ferromagnetic clusters of various sizes; each cluster
of n spins is frozen in an entangled state of the form:
(| ↑↑ · · · ↑︸ ︷︷ ︸
n times
〉+ | ↓↓ · · · ↓︸ ︷︷ ︸
n times
〉). (3)
In this representation, the entanglement entropy of a
block is given by the number of clusters that connect
sites inside to sites outside the block [Fig. 1]. We note
that correlations between remote sites also contribute to
the entropy due to long-range effective bonds generated
under renormalization.
In 1D the RG calculation can be carried out analyti-
cally and the disorder-averaged entropy Sℓ of a segment
of length ℓ has been obtained as Sℓ =
log2 ℓ [12].
In higher dimensions the RG method can only be imple-
mented numerically. The major complication in this case
is that the model is not self-dual and thus the location of
the critical point is not exactly known. To locate the crit-
-1.5 -1 -0.5
-1.5 -1 -0.5 0
-0.15 -0.1 -0.05 0
L = 16
L = 32
L = 64
-1.5 -1 -0.5
-1.5 -1 -0.5 0
(a) (b)
(d) (e)
=1.175 h
=1.175
=1.175
=1.18 h0=1.17
PSfrag replacements
ln ˜h
ln ˜h
ln ˜J
ln ˜J
ln ˜h
/L0.55
FIG. 2: (color online). The distribution of the last deci-
mated effective log-fields lneh∞, and the distribution of the
last decimated effective log-bonds ln eJ∞ in the RG calcula-
tions. At h0 = 1.175, the distributions, shown in (a) and (b),
get broader with increasing system sizes, indicating the RG
flow towards infinite randomness, i.e. the system is critical.
A scaling plot of the data in (a) using energy-length scaling
lneh∞ ∼ L
ψ with ψ = 0.55 is presented in (c). The solid line
is just a guide to the eye. The subfigures (d) and (e) show
the log-field distribution at h0 = 1.18 and the log-bond dis-
tribution at h0 = 1.17, respectively; the distributions show
a power-law decaying tail in the low energy region, which is
clear evidence that the system is in the Griffiths phases.
ical point, we can make use of the fact that the excitation
energy of the system has the scaling behavior ln ǫ ∼ Lψ
at criticality, while it follows ǫ ∼ L−z in the off-critical
regions. In the numerical implementation of the SDRG
method, the low-energy excitations of a given sample can
be identified with the effective transverse field h̃∞ of the
last decimated spin cluster, or with the effective coupling
J̃∞ of the last decimated cluster-pair.
In our implementation we set for convenience the
transverse fields to be a constant h0 and the random
bond variables were taken from a rectangular distribu-
tion centered at J = 1 with a width ∆ = 0.5. The critical
point was approached by varying the single control pa-
rameter h0. Although this initial disorder appears to be
 = 1.170
 = 1.175
 = 1.180
 = 1.185
 = 1.190
L = 16
L = 24
L = 32
L = 40
L = 64
 = 1.175
PSfrag replacements
ln ℓℓ
FIG. 3: (color online). Left panel: The disorder averaged
block entropy per surface unit Sℓ/ℓ vs. the linear size of the
block ℓ for a system size L = 64 for various values of h0. We
observe that the entropy for ℓ = L/2 reaches its maximum
at the critical point hc = 1.175 (cf. Fig 2). Right panel:
The block entropy per surface area vs. ln ℓ on a log-scale for
different system sizes L at the critical point. The data show
a straight line (guided by the dashed line), corresponding to
the scaling obeying the area law with a double-logarithmic
correction, as given in Eq. (4).
weak, the renormalized field and bond distributions be-
come extremely broad even on a logarithmic scale [Fig. 2]
at the critical point h0 = hc = 1.175. This indicates the
RG flow towards infinite randomness. Slightly away from
the critical point, both in the disordered Griffiths-phase
with h0 = 1.18 and in the ordered Griffiths-phase with
h0 = 1.17, the distributions have a finite width and obey
quantum-Griffiths scaling h∞ ∼ L−z. At the critical
point one has IRFP scaling lnh∞ ∼ Lψ and we estimate
the scaling exponent as ψ = 0.55, quite close to the value
ψ = 0.5 for the 1D case [18].
Now we consider the entanglement entropy near the in-
finite randomness critical point. To obtain the disorder-
averaged entanglement entropy Sℓ of a square block of
size ℓ, we averaged the entropies over blocks in different
positions of the whole system for a given disorder real-
ization and then averaged over a few thousand samples.
In Fig. 3 we show the entropy per surface unit Sℓ/ℓ = sℓ
for different values of h0. This average entropy density
is found to be saturated outside the critical point, which
corresponds to the area law. At the critical point sℓ in-
creases monotonously with ℓ, and the numerical data are
consistent with a log-log dependence:
Sℓ ∼ ℓ log2 log2 ℓ (4)
as illustrated in Fig. 3. In this way we have identified an
alternative route to locate the infinite randomness criti-
cal point: it is given by the field h0 for which the average
block entropy at ℓ = L/2 is maximal. Indeed the nu-
merical results in Fig. 3 predict the same value of hc as
obtained from the scaling of the gaps. We note that the
same quantity, the position of the maxima of the average
entropy, can be used for the random quantum Ising chain
to locate finite-size transition points [21].
The log-log size dependence of the average entropy in
Eq.(4) at criticality is completely new; it differs from the
scaling behavior observed in 2D pure systems, like the
area law, Sℓ ∼ ℓ, for critical bosonic systems [7, 8], or
a logarithmic multiplicative correction to the area law,
Sℓ ∼ ℓ log2 ℓ, as found in free fermions [5, 6, 7, 8]. This
double-logarithmic correction can be understood via a
SDRG argument: In the 1D case a characteristic length
scale r at a given RG step is identified with the aver-
age length of the effective bonds, i.e. the average size
of the effective clusters. At the scale r(< ℓ) the frac-
tion of the total number of spins, nr, that have not been
decimated is given by nr ∼ 1/r [18]; these active (i.e.,
undecimated) spins have a finite probability to form a
cluster across the boundary of the block (a segment ℓ in
the 1D case) and thus to give contributions to the en-
tanglement entropy. Repeating the renormalization until
the scale r ∼ ℓ, the contributions to the entropy are
summed up: Sℓ ∼
dr nr ∼ ln ℓ, leading to the log-
arithmic dependence of the 1D model [12]. For the 2D
case with the same type of RG transformation with a
length scale r < ℓ, the fraction of active spins in the
renormalized surface layer of the block is nr ∼ ℓ/r. Here
we have to consider the situation in which some of these
active surface spins would form clusters within the sur-
face layer and thus contribute zero entanglement entropy;
the number of the active spins that are already engaged
in clusters on the surface at RG scale r is proportional
to ln r, as known from the 1D case, and only O(1) of
the active surface spins would form clusters connecting
the block with the rest of the system. Consequently,
the entropy contribution in 2D can be estimated as:
dr nr/ ln r ∼ ℓ ln ln ℓ, i.e. a double-logarithmic ℓ-
dependence, as reflected by the numerical data in Fig. 3.
Based on the SDRG argument described above, the
double-logarithmic correction to the area law appears to
be applicable for a broad class of critical points in 2D with
infinite randomness. For instance, the critical points of
quantum Ising spin glasses are believed to belong to the
same universality class as ferromagnets since the frustra-
tion becomes irrelevant under RG transformation, and
the same type of cluster formations as observed in our
numerics for the ferromagnet is expected to be generated
during the action of the RG. The entanglement entropy
at the IRFP is completely determined by the cluster ge-
ometries occurring during the SDRG.
Another type of IRFP in higher dimensions occurs
in the bond-diluted quantum Ising ferromagnet: The
Hamiltonian is again given by (1), but now Jij = 0 with
probability p and Jij = J > 0 with probability 1− p. At
percolation threshold p = pc there is a quantum critical
line along small nonzero transverse fields, which is con-
p  = 0.49
 = 0.50
p  = 0.52
1 10 10010
L = 128
L = 256
L = 512
 = 0.5
PSfrag replacements
FIG. 4: (color online). The entropy per surface area
Sℓ/ℓ = sℓ vs. ℓ near the percolation threshold pc = 0.5 for
the 2D bond-diluted Ising model at small transverse fields for
L = 512. The curves converge to finite values for ℓ → ∞,
corresponding to the area law. The inset shows sℓ − s∞ as
a function of ℓ. s∞ is estimated from sL/2 at L = 512. The
dashed line corresponds to ℓ−1.
trolled by the classical percolation fixed point, and the
energy scaling across this transition line obeys ln ǫ ∼ Lψ,
implying an IRFP [20]. The ground state of the system
is given by a set of ordered clusters in the same geom-
etry as in the classical percolation model – only nearest
neighboring sites are combined into a cluster. In this
cluster structure, the block entropy, determined by the
number of the clusters connecting the block and the rest
of the system, is bounded by the area of the block, i.e.
Sℓ ∼ ℓd−1 with d being the dimensionality of the sys-
tem. To examine this, we determined the entanglement
entropy by analyzing the cluster geometry of the bond-
diluted transverse Ising model. Fig. 4 shows our results
for the square lattice, which follow a pure area-law with
an additive constant: Sℓ = aℓ+ b+O(1/ℓ).
To summarize, we have found that the entanglement
properties at quantum phase transitions of disordered
systems in dimensions larger than one can behave quite
differently. Generalizing our arguments for the 2D case,
we expect for the random bond transverse Ising systems
a multiplicative d-fold logarithmic correction to the area
law in d dimensions at the critical point, whereas for
diluted Ising model at small transverse fields the area
law will hold in any dimension d > 1 at the percolation
threshold. Although both critical points are described
by infinite randomness fixed points, the structure of the
strongly coupled clusters in both cases is fundamentally
different, reflecting the different degrees of quantum me-
chanical entanglement in the ground state of the two sys-
tems. This behavior appears to be in contrast to one-
dimensional systems governed by IRFPs [12].
Other disordered quantum systems in higher dimen-
sions might also display interesting entanglement prop-
erties: For instance, the numerical SDRG has also been
applied to higher dimensional random Heisenberg anti-
ferromagnets which do not display an IRFP [22]. The
ground states involve both singlet spins and clusters with
larger moments; therefore, we expect the correction to
the area law to be weaker than a multiplicative loga-
rithm and different from the valence bond entanglement
entropy in the Néel Phase [23].
Useful discussions with Cécile Monthus are gratefully
acknowledged. This work has been supported by the Na-
tional Office of Research and Technology under Grant
No. ASEP1111, by a German-Hungarian exchange pro-
gram (DAAD-MÖB), by the Hungarian National Re-
search Fund under grant No OTKA TO48721, K62588,
MO45596.
[1] For a review see: L. Amico et al., quant-ph/0703044.
[2] P. Calabrese and J. Cardy, J. Stat. Mech. Theor. Exp.
2004, P06002 (2004).
[3] M. Srednicki, Phys. Rev. Lett. 71, 666 (1993).
[4] M.B. Plenio et al., Phys. Rev. Lett. 94, 060503 (2005);
M. Cramer et al., Phys. Rev. A 73, 012309 (2006);
M. Cramer and J. Eisert, New J. Phys 8 71 (2006).
[5] M.M. Wolf, Phys. Rev. Lett. 96, 010404 (2006); D. Gioev
and I. Klich, Phys. Rev. Lett. 96, 100503 (2006).
[6] W. Li et al, Phys. Rev. B 74, 073103 (2006).
[7] T. Barthel, M.-C. Chung, and U. Schollwöck, Phys. Rev.
A 74, 022329 (2006).
[8] M. Cramer, J. Eisert, and M.B. Plenio, Phys. Rev. Lett.
98, 220603 (2007).
[9] A. Kitaev and J. Preskill, Phys. Rev. Lett. 96, 110404
(2006); M. Levin and X.-G. Wen, Phys. Rev. Lett. 96,
110405 (2006); E. Fradkin and J.E. Moore, Phys. Rev.
Lett. 97, 050404 (2006).
[10] C. Pich et al., Phys. Rev. Lett. 81, 5916 (1998).
[11] O. Motrunich et al., Phys. Rev. B 61, 1160 (2000); Y.-
C. Lin et al., Prog. Theor. Phys. Suppl. 138, 479 (2000).
[12] G. Refael and J.E. Moore, Phys. Rev. Lett. 93, 260602
(2004).
[13] G. Refael and J.E. Moore, Phys. Rev. B 76, 024419
(2007).
[14] R. Santachiara, J. Stat. Mech. Theor. Exp. 2006, L06002
(2006).
[15] N.E. Bonesteel and K. Yang, cond-mat/0612503.
[16] A. Saguia et al., Phys. Rev. A 75, 052329 (2007).
[17] F. Iglói, R. Juhász, and Z. Zimborás, Europhys. Lett. 79,
37001 (2007); R. Juhász and Z. Zimborás, J. Stat. Mech.
Theor. Exp. 2007, P04004 (2007).
[18] D.S. Fisher, Phys. Rev. B 50, 3799 (1994). D.S. Fisher,
Phys. Rev. B 51, 6411 (1995).
[19] F. Iglói and C. Monthus, Phys. Rep. 412, 277 (2005).
[20] T. Senthil and S. Sachdev, Phys. Rev. Lett. 77, 5292
(1996).
[21] F. Iglói, Y.-C. Lin, H. Rieger, and C. Monthus, Phys.
Rev. B 76, 064421 (2007).
[22] Y.-C. Lin et al, Phys. Rev. B 68, 024424 (2003); Y.-
C. Lin et al, Phys. Rev. B 74, 024427 (2006).
[23] F. Alet et al, cond-mat/0703027.
http://arxiv.org/abs/quant-ph/0703044
http://arxiv.org/abs/cond-mat/0612503
http://arxiv.org/abs/cond-mat/0703027
ABSTRACT
  The entanglement entropy of the two-dimensional random transverse Ising model
is studied with a numerical implementation of the strong disorder
renormalization group. The asymptotic behavior of the entropy per surface area
diverges at, and only at, the quantum phase transition that is governed by an
infinite randomness fixed point. Here we identify a double-logarithmic
multiplicative correction to the area law for the entanglement entropy. This
contrasts with the pure area law valid at the infinite randomness fixed point
in the diluted transverse Ising model in higher dimensions.

<|endoftext|><|startoftext|>
Ultrasound Attenuation of Superfluid 3He in Aerogel
H.C. Choi, N. Masuhara, B.H. Moon, P. Bhupathi, M.W. Meisel, and Y. Lee∗
Microkelvin Laboratory, Department of Physics, University of Florida, Gainesville, FL 32611-8440, USA
N. Mulders
Department of Physics and Astronomy, University of Delaware, Newark, DE 19716, USA
S. Higashitani, M. Miura, and K. Nagai
Faculty of IAS, Hiroshima University, Kagamiyama 1-7-1, Higashi-Hiroshima 739-8521, Japan
(Dated: October 27, 2018)
We have performed longitudinal ultrasound (9.5 MHz) attenuation measurements in the B-phase
of superfluid 3He in 98% porosity aerogel down to the zero temperature limit for a wide range of
pressures at zero magnetic field. The absolute attenuation was determined by direct transmission
of sound pulses. Compared to the bulk fluid, our results revealed a drastically different behavior in
attenuation, which is consistent with theoretical accounts with gapless excitations and a collision
drag effect.
Liquid 3He has attracted intense interest for many
decades in the field of low temperature physics [1]. In
its normal state, liquid 3He has served as a paradigm
for a Fermi liquid whose nature transcends 3He physics.
The superfluid phases of 3He exhibit exotic and intrigu-
ing features associated with the broken symmetries in the
condensate, having an unconventional structure of the or-
der parameter with spin triplet p-wave pairing. Liquid
3He is arguably the most well-understood system mainly
because of its extreme intrinsic pureness at low temper-
atures. Therefore, it has provided important insights
in understanding other unconventional superconductors
such as the high temperature superconductors, the heavy
fermion superconductors, and in particular the more re-
cently discovered Sr2RuO4, which is also thought to have
the p-wave symmetry [2]. However, the same virtue has
hampered the effort in pursuing answers to an important
overarching question: how does the nature of a quantum
condensate (spin triplet p-wave superfluid in this case)
respond to increasing impurity or disorder?
Observation of superfluid transitions in liquid 3He im-
pregnated in high porosity aerogel in 1995 [3, 4] opened
a novel path to introducing static disorder in liquid 3He.
Aerogel possesses a unique structure, whose topology is
at the antipode of widely studied porous media such as
Vycor glass and metallic sinters. Due to its open struc-
ture, there are no well-defined pores in aerogel and conse-
quently, the liquid is in the proximity to the bulk. Ninety
eight percent porosity aerogel, which has been used in
most of the studies including this work, offers a corre-
lated network of strand-like aggregates of SiO2 molecules
whose structure can be characterized by the geometrical
mean free path (ℓ ≃ 100 - 200 nm), the diameter of strand
(r ≈ 3 nm), and the average inter-strand distance (d ≃ 25
- 40 nm). The coherence length of pure superfluid 3He,
ξ0, which varies from 20 nm (34 bar) to 80 nm (0 bar),
is at least an order of magnitude larger than the strand
diameter but is comparable to ℓ and d. As a result, the
scattering off the aerogel strand would have a significant
influence on the superfluid. It is now well established
that the superfluid transition temperature is significantly
depressed from that of the bulk, and the effect of pair-
breaking is progressively magnified at lower pressures,
leading to the possibility of a quantum phase transition
at Pc ≈ 6 bars [5]. To date, three distinct superfluid
phases have been experimentally identified, namely the
A-like, B-like, and A1-like phases [4, 6, 7, 8, 9]. The B-like
phase and the A1-like phase in aerogel show striking sim-
ilarity to their counterparts in the bulk superfluid [9, 10].
Detailed NMR studies [7, 8, 10] suggest that the aerogel
B-phase has the same order parameter structure as the
bulk B-phase. The aerogel A1-phase only appears in the
presence of magnetic field as is the case in the bulk [9].
However, the aerogel A-phase exhibits quite a different
behavior from the bulk A-phase (e.g. in NMR frequency
shift and superfluid density), although the overwhelming
experimental evidence suggests that it is an equal spin
pairing state. Various interpretations or novel proposi-
tions on the possible order parameter structure have been
suggested for this phase [11, 12, 13].
Nuclear magnetic resonance and ultrasound spec-
troscopy have been used in concert to investigate the mi-
croscopic structure of the superfluid phases [1, 14]. These
two experimental methods encompass complementary in-
formation on the orbital (ultrasound) and spin (NMR)
structure of the Cooper pairs. Rich spectra of order
parameter collective modes in bulk superfluids, which
are the fingerprints of specific broken symmetries in the
system, have been mapped by ultrasound spectroscopic
techniques [14]. In 2000, Nomura et al. [15] performed ul-
trasound attenuation measurements on 98% aerogel us-
ing a 16.5 MHz cw acoustic impedance technique. Their
work was limited to a single pressure at 16 bars and down
to 0.6 mK. Although their technique was not adequate in
determining absolute attenuation, they managed to ex-
tract the absolute sound attenuation after making auxil-
http://arxiv.org/abs/0704.0419v1
Time (µs)
FIG. 1: Acoustic response from the receiver vs. time at 34
bars for select temperatures ranging from 0.3 mK to 2.5 mK.
The aerogel superfluid transition is marked by a small arrow.
iary assumptions. A Bayreuth group [16] performed ab-
solute sound attenuation measurements in aerogel (97%
porosity) using a direct sound transmission technique at
10 MHz. They experienced poor transducer response,
and observed self-heating and no depression in the aero-
gel superfluid transition. We conducted high frequency
sound transmission experiments in 98% porosity aero-
gel, covering the whole phase diagram of the superfluid
phases in aerogel, from 8 to 34 bars and from the transi-
tion temperatures to as low as 200 µK.
In this experiment, two matched LiNbO3 longitudi-
nal sound transducers with the fundamental resonance at
9.5 MHz were used as a transmitter and a receiver. The
6.3 mm diameter transducers were separated by a Ma-
cor spacer maintaining a 3.05 (± 0.02) mm sound path
between the transducers where the aerogel sample was
grown in situ. This scheme ensures the best contact be-
tween the transducer surface and the aerogel, which is
crucial for clean sound transmission at the boundaries.
A 1 µs pulse was generated by the transmitter and de-
tected by the receiver. Temperature was determined by
a melting pressure thermometer (MPT) for T ≥ 1 mK
and a Pt NMR thermometer for T ≤ 1 mK which was
calibrated against the MPT. No non-linear response or
self-heating was observed at the excitation level used in
this work. All the data presented here, except for 8 bars,
were taken while warming with a typical warming rate
of 3 µK/min. A detailed description on the experimen-
tal cell and experimental techniques can be found else-
where [17, 18].
The temporal responses of the receiver taken at 34 bars
are shown in Fig. 1 for select temperatures ranging from
0.3 to 2.5 mK. The primary response, which starts to rise
around 8 µs, shows a rather broad response due to ring-
ing of the high Q transducer (Q ∼ 103). The step-like
structure of the receiver signal is caused by the slight mis-
match in the spectra of the transducers [18]. Below the
aerogel superfluid transition (marked around 2.1 mK by
an arrow in Fig. 1) the primary response starts to grow
and the trailing echoes emerge from the background, as
the sound attenuation decreases in the superfluid. No
change in the receiver signal was observed at the bulk
superfluid transition. The multiple echoes follow a bona
fide exponential decay in time. Absolute sound attenua-
tion was obtained in the following manner [19]. First, the
relative attenuation at each temperature was calculated
using the area under the primary response curve by inte-
grating the signal from the rising edge to a fixed point in
time (23 µs point). The absolute attenuation at 0.4 mK
and 29 bars, obtained using the primary signal and the
echoes, was used as a reference point in converting the
relative attenuation into the absolute attenuation. Due
to a drastic mismatch in the acoustic impedance at the
the transducer-aerogel/3He boundary, the signal absorb-
tion at the surface of transducers was ignored [19]. The
possible background contributions to attenuation from
the quasi-particle scattering off the cavity wall [20] and
the non-parallel alignment of the two transducers are es-
timated to be negligible.
The absolute attenuations on warming for several pres-
sures are plotted as a function of temperature in Fig. 2(a).
The superfluid transition is marked by the smooth drop
in attenuation. Our aerogel superfluid transition tem-
peratures are in excellent agreement with the previously
reported values for all pressures [5, 21]. At 9.5 MHz in
the bulk B-phase, a strong attenuation peak appears right
below the superfluid transition. This peak is the result of
the combined contributions from pair-breaking and cou-
pling to the order parameter collective modes. Above the
polycritical pressure, the B to A transition on warming
is registered as a sharp step in attenuation. In aero-
gel, none of these features exist. However, we did ob-
serve a sharp step in attenuation on cooling for P > 14
bars, which implies the existence of the supercooled A-
phase [19]. We were able to identify a rather smooth B
to A transition on warming for 29 and 34 bars within ≈
150 µK below the superfluid transition. This observation
is consistent with the previous results obtained using a
transverse acoustic impedance technique [13]. Therefore,
most of the attenuation data presented here are in the
aerogel B-phase. In the bulk B-phase with a clean gap,
the attenuation follows α ∝ e−∆(T )/kBT below the atten-
uation peak, practically reaching zero attenuation below
T/Tc ≈ 0.6, due to thermally activated quasi-particles,
where ∆(T ) is the temperature dependent gap and kB
is the Boltzmann constant. In contrast, the attenuation
in aerogel decreases rather slowly with temperature and
remains high even at T/Tc ≈ 0.2. Furthermore, a pe-
culiar shoulder feature appears at T/Tc ≈ 0.6 for higher
pressures. This feature weakens gradually and eventually
disappears at lower pressures, Fig. 2(a).
Sound propagation for higher harmonics up to 96 MHz
was measured for several temperatures and pressures,
but no evidence of sound propagation was found above
30 MHz even at 0.3 mK, where the lowest attenuation
is expected. Below about 10 mK, the scattering process
is dominated by the temperature independent impurity
scattering off the aerogel, and at 9.5 MHz, ωτi ∼ 0.1 for
all pressures where τi = ℓ/vf (see below for ℓ). There-
fore, the sound mode should remain in the hydrodynamic
limit. This claim is bolstered by the observation of the
strong frequency dependence in attenuation and the ab-
sence of a temperature dependence in the normal fluid
attenuation [15]. The coupling between the normal com-
ponent of the superfluid 3He and the mass of the elas-
tic aerogel modifies the conventional two-fluid hydrody-
namic equations [22, 23]. This consideration leads to two
(slow and fast) longitudinal sound modes with different
sound speeds, cs = ca
ρsρa/ρ, and cf = c1
1+ρaρs/ρnρ
1+ρa/ρn
Here, cf(s) represents the speed of the fast (slow) mode,
ρn(s) is the normal fluid (superfluid) density (ρ = ρn+ρs),
ρa is the aerogel density, c1 is the speed of hydrodynamic
sound in 3He, and finally ca is the sound speed of the bare
aerogel. From the time of flight measurements, we found
the sound speed in aerogel consistently lower (by ≈ 20%)
than c1 for all pressures studied and in good agreement
with the values obtained using the expression above [24].
Detailed analysis of sound velocity for various pressures
will be presented in a separate publication.
Low mass density and the compliant nature of aero-
gel necessitate the consideration of effective momentum
transfer upon quasi-particle scattering off the aerogel,
which generates dragged motion of aerogel. Ichikawa et
al. [25] incorporated the collision drag effect in calculat-
ing the dispersion relation in the normal fluid. Their
model offered a successful explanation for the experimen-
tal results of the Northwestern group [15]. Recently, Hi-
gashitani et al. [26, 27] extended this model to study the
longitudinal sound (fast mode) propagation in superfluid
3He/aerogel within the framework of the two-fluid model.
The drag effect can be described phenomenologically by
a frictional force, ~Fd =
(~vn −~va), introducing an addi-
tional relaxation time τf , where ~vn(a) is the normal fluid
component (aerogel) velocity. This effect is of particu-
lar importance when ωτi < 1, and the total attenuation
(Eq. (130) of ref. [27]) is
ω2/2cf
1 + ρaρs/ρnρ
ρ2aτf/ρρn
1 + ρa/ρn
4η/3ρc21
1 + ρaρs/ρnρ
), (1)
where η is the shear viscosity of liquid 3He. The first
term (αf ) arises from the frictional damping caused by
the aerogel motion relative to the normal fluid compo-
nent, and the second term (αv) from the conventional
hydrodynamic sound damping associated with the viscos-
ity. This expression allows us to extract ℓ in this system
from our absolute attenuation at the transition temper-
ature, αc. The inset of Fig. 3 shows our results of αc for
0.0 0.2 0.4 0.6 0.8 1.0
0.0 0.5 1.0 1.5 2.0
 10 bar
 14 bar
 20 bar
 25 bar
 33 bar
 34 bar
T / T
 10 bar
 12 bar
 14 bar
 21 bar
 25 bar
 34 bar
T (mK)
FIG. 2: (a) Absolute attenuation for various pressures vs.
temperature (color in on-line version). Thin solid lines are
the results of a quadratic fit to the low temperature part
(T/Tc <∼ 0.4) of the data at each pressure. (b) Normalized
sound attenuation vs. normalized temperature. The results
of theoretical calculation (solid lines, color in on-line version)
are plotted along with the experimental results at 34 bars for
comparison.
various pressures. The solid lines are the result of calcu-
lation using Eq. (1) for three different mean free paths,
ℓ = 100, 120, and 140 nm. As can be seen, ℓ = 120 nm
produces an excellent fit to our data for the whole pres-
sure range, which is in good agreement with the val-
ues obtained from the thermal conductivity (90 nm) [28]
and spin diffusion (130 nm) [29] measurements. With
the knowledge of the mean free path, one can calculate
the full temperature dependence of sound attenuation in
the superfluid phase. The results of the calculation (in
the unitary limit) following the prescription described in
ref. [27] are displayed in Fig. 2(b) along with the experi-
mental results at 34 bars. The calculation reproduces all
the important features observed in our measurements. In
particular, the conspicuous shoulder structure appearing
near T/Tc ≈ 0.6 at 33 bars softens at lower pressures and
is completely absorbed in an almost linear temperature
dependence below 20 bars. This behavior is the charac-
teristic of αf [27]. A fast decrease in ρn right below Tc
produces the bump in αf , and αf → 0 as T → 0. On
the other hand, αv decreases monotonically and reaches
a finite value due to non-zero ρn and the impurity states
0 5 10 15 20 25 30 35
0 10 20 30
P (bar)
 100 nm
 120 nm
 140 nm
P (bar)
FIG. 3: Normalized zero temperature attenuation vs. pres-
sure. The dashed line is a guide for eye. Inset: Pressure
dependence of sound attenuation at Tc. The solid lines (color
on-line) are the results of theoretical fit for ℓ = 100, 120, and
140 nm (see text).
induced inside the gap as T → 0. The quantitative agree-
ment between the theory and experiment, however, is not
yet satisfactory. The calculation utilizes the isotropic ho-
mogeneous scattering model (IHSM) [30], which tends to
overestimate ∆(T ) and ρs compared to the experimen-
tally determined values [3, 23]. As shown in ref. [31],
the inhomogeneity gives rise to the reduction of the aver-
age value of the order parameter and consequently yields
larger η and ρn, which in turn increases α0 but decreases
the frictional contribution. It is also expected that the
non s-wave scattering components make non-trivial con-
tributions to the viscous and frictional relaxation times
in a direction that improves the quantitative agreement.
Theoretical calculations based on the IHSM [27, 32]
predict that the impurity states would completely fill the
gap, leading to a gapless superfluid when τiTc < 1 for the
B-phase in the unitary limit. We estimate 0.3 < τiTc < 1
for 10 < P < 34 bars with ℓ = 120 nm. The normalized
zero temperature attenuation (α0/αc) obtained by ex-
trapolating the low temperature part of the attenuation
(solid lines in Fig. 2(a)) is plotted in Fig. 3, where α0/αc
increases as the sample pressure is reduced and seems
to approach unity near Pc ≈ 6 bars. Since the viscosity
ratio is directly related to the density of states at zero
energy through η(0)/η(Tc) = n(0)
z, z = {2,4} for the
{Born, unitary} limit where n(0) is the normalized den-
sity of states at zero energy [27], the finite α0/αc is strong
evidence of a finite n(0). The gapless behavior has been
experimentally suggested by recent thermal conductivity
(for P ≤ 10 bars) [28] and heat capacity (for 11 ≤ P ≤ 29
bars) [33] measurements. The pressure dependence of
α0/αc is in qualitative agreement with the combined re-
sults of Fisher et al. and Choi et al. Although all of
these experimental techniques (including ours) are lim-
ited to probe the impurity states near the Fermi level,
the behavior is consistent with the theoretical predictions
with gapless excitations. Unlike the thermodynamic and
transport measurements, the high frequency ultrasound
measurement has a potential to unveil a larger portion
of the impurity states profile from the frequency depen-
dence.
We acknowledge support from an Alfred P. Sloan Re-
search Fellowship (YL), NSF grants DMR-0239483 (YL),
DMR-0305371 (MWM), and a Grant-in-Aid for Scientific
Research on Priority Areas (No. 17071009) from MEXT
of Japan (SH and KN). We would like to thank J.-H.
Park for his technical assistance, and Jim Sauls, Peter
Wölfle, and Bill Halperin for useful discussions.
∗ yoonslee@phys.ufl.edu
[1] D. Vollhardt and P. Wölfle, The Superfluid Phases of
Helium Three, (Taylor and Francis, London, 1990).
[2] A.P. Mackenzie and Y. Maeno, Rev. Mod. Phys. 75, 657
(2003).
[3] J.V. Porto and J.M. Parpia, Phys. Rev. Lett. 74, 4667
(1995).
[4] D. T. Sprague et al., Phys. Rev. Lett. 75, 661 (1995).
[5] K. Matsumoto et al., Phys. Rev. Lett. 79, 253 (1997).
[6] D.T. Sprague et al., Phys. Rev. Lett. 77, 4568 (1996).
[7] H. Alles et al., Phys. Rev. Lett. 83, 1367 (1999).
[8] B.I. Barker et al., Phys. Rev. Lett. 85, 2140 (2000).
[9] H.C. Choi et al ., Phys. Rev. Lett. 93, 145302 (2004).
[10] V.V. Dmitriev et al., JETP Lett. 76, 312 (2002); V.V.
Dmitriev et al., Physica B 329, 324 (2003).
[11] G.E. Volovik, JETP Lett. 63, 301 (1996).
[12] I.A. Fomin, J. Low Temp. Phys. 134, 769 (2004).
[13] C. L. Vicente et al., Phys. Rev. B 72, 094519 (2005).
[14] W.P. Halperin and E. Varoquaux, in Helium Three, ed.
by W.P. Halperin and L.P. Pitaevski (Elsevier, Amster-
dam, 1990).
[15] R. Nomura et al., Phys. Rev. Lett. 85, 4325 (2000).
[16] L. Hristakos, Ph.D. thesis, University of Bayreuth
(2001).
[17] H.C. Choi et al., will appear in J. Low Temp. Phys.
[18] H.C. Choi, Ph.D. thesis, University of Florida (2007).
[19] Y. Lee et al., will appear in J. Low Temp. Phys.
[20] G. Eska et al., Phys. Rev. B 27, 5534 (1983).
[21] G. Gervais et al., Phys. Rev. B 66, 054528 (2002).
[22] M.J. McKenna, T. Slawecki, and J.D. Maynard, Phys.
Rev. Lett. 66, 1878 (1991).
[23] A. Golov et al. Phys. Rev. Lett. 82, 3492 (1999).
[24] For example, c = 350 (± 10) m/s at 34 bars from our
measurement, and cf = 370 m/s.
[25] T. Ichikawa et al., J. Phys. Soc. Jpn. 70, 3483 (2001).
[26] M. Miura et al., J. Low Temp. Phys. 134, 843 (2004).
[27] S. Higashitani et al., Phys. Rev. B 71, 134508 (2005).
[28] S. N. Fisher et al., Phys. Rev. Lett. 91, 105303 (2003).
[29] J.A. Sauls et al., Phys. Rev. B 72, 024507 (2005).
[30] E. V. Thuneberg et al., Phys. Rev. Lett. 80, 2861 (1998).
[31] R. Hänninen and E.V. Thuneberg, Phys. Rev. B 67,
214507 (2003).
[32] P. Sharma and J.A. Sauls, Physica B 329-333, 313
(2003).
[33] H. Choi et al., Phys. Rev. Lett. 93, 145301 (2004).
mailto:yoonslee@phys.ufl.edu
ABSTRACT
  We have performed longitudinal ultrasound (9.5 MHz) attenuation measurements
in the B-phase of superfluid $^3$He in 98% porosity aerogel down to the zero
temperature limit for a wide range of pressures at zero magnetic field. The
absolute attenuation was determined by direct transmission of sound pulses.
Compared to the bulk fluid, our results revealed a drastically different
behavior in attenuation, which is consistent with theoretical accounts with
gapless excitations and a collision drag effect.

<|endoftext|><|startoftext|>
The Hourglass—Consequences of Pure
Hamiltonian Evolution of a Radiating System
Donald McCartor
ABSTRACT
Hourglass is the name given here to a formal isolated quantum system that can
radiate. Starting from a time when it defines the system it represents clearly
and no radiation is present, it is given straightforward Hamiltonian evolution.
The question of what significance hourglasses have is raised, and this question
is proposed to be more consequential than the measurement problem.
1 Hourglasses
2 Physics without true histories
3 But histories are sometimes good
4 Phlogiston and oxygen
5 A closer look at quantum engineering
6 Conclusion
But I want to know the particular go of it
– the plea of James Clerk Maxwell as a young child
concerning, among many things, the bell-wires that
ring the bells that summon servants. [Mahon]
1 Hourglasses
Suppose that theory develops in such a way that quantum fields can be handled
like nonrelativistic quantum mechanics. Then if we are interested in something,
perhaps gooseberry bushes, we can model one as we would conceive it to be at
some instant and then follow its development through time. And not only the
atoms and molecules would be modeled, but also the radiation.
This is a scheme for the imagination. The gooseberry bush, though not
isolated, would grow within a suitable environment that would be an isolated
system, complete in itself. We do learn well from isolated systems, both real
ones in the laboratory and those envisaged in our theoretical musings.
We will provide the bush with air, earth, and water. And there can be
life-giving sunlight shining on it. As for the light that had been reflected or
emitted from the bush before the present time, we will leave that out. Such
light goes off and away, so it could only matter as information about what the
http://arxiv.org/abs/0704.0420v1
bush had been doing. We will take the bush just as it is now.
The gooseberry bush is then developed forward in time. Lagrange or
Hamilton would have recognized what we are doing, for we are doing physics
the classical way. We have an initial condition and we are finding out what
will happen next.
As we move toward the future, light shoots out from the bush, as we
expect. But, disconcertingly, the bush starts to lose definition. Its parts lose
their precise places. Within a few weeks it is a scarcely recognizable mess.
Let’s go back in time, then. This is terrifyingly worse. The bush has been the
subject of a vast conspiracy. Light has been streaming in on it from the entire
universe. The bush swallows it up. Then at the present time this suddenly all
stops. Time symmetry of the Hamiltonian makes it happen like that.
This is the hourglass. It is really more like a cone, with the light streaming
in before the set-up time forming one nappe and the light streaming out after
it the other. But hourglass is a more colorful name.
What to do? We will try to bring quantum mechanics to the rescue.
We will make what is conventionally called a measurement, but cautiously. A
place is chosen well outside the gooseberry bush, and a time chosen that is
later than when we set the state of the bush up. A check is made of whether
there are at this place and time any photons coming from the direction of
the bush. By doing things this way, we won’t disturb the bush at all, and
we don’t care if we disturb the escaping light. We get from this, of course, a
probability distribution over various possibilities for photons at this place and
time. Encouraged by this small success, we choose another place and time and
do the same. And this is what is nice: the two measurements are compatible.
Thus we get correlations between them, too. Emboldened by this opportunity,
we do millions of them, which all formally combine into a single measurement
with a single set of possible results. Each possible result of the single, combined
measurement is a combination of results of all the individual measurements
of light made at the various times and places. Thus each combined result
constitutes a kind of movie of the gooseberry bush.
What will the most probable of these results be like? This is the problem
of the hourglass. To begin with, however, it may be that there is no hourglass.
The deepest quantum theory might not provide a system with a state and
its evolution. Or if it does, it could still be objected that the Hamiltonian
evolution should not have been allowed to run on unchecked. There should
have been many quantum jumps. By leaving them out, quantum mechanics
has been misused, and what results is no matter.
But Lagrange and Hamilton and would have been best pleased if these
objections did not hold. And surely we would then hope to see in each of the
most probable results something like a movie of a bush producing gooseberries:
physics working right. The bushes in these movies would look much alike at
the start but then gradually differ, as chance has it. We would learn something
about how gooseberry bushes grow gooseberries!
Certainly Lagrange and Hamilton would have thought the problem of the
hourglass a leading one, if they had known of quantum mechanics. Indeed,
every physicist might like to take a stab at guessing its solution, just to orient
themselves in their science. Does the hourglass fail, and if so, where and why?
Or if it does produce movies true to our world, but not from a developing
quantum state that might be the true history of a gooseberry bush, rather
from a “history” that does at one time represent a gooseberry bush well, but
soon is unlike anything that ever did exist, then how can this be?
2 Physics without true histories
Here is what I think about it. But before we go into that, see if you don’t agree
that the hourglass question has gravity, and this regardless of the ideas that I
or anyone might have for its answer.
Now my guess is that quantum mechanics will give us movies of ripening
gooseberries, produced by hourglasses through the means described or some-
thing rather like that. And I think that to understand hourglasses, not to
solve the measurement problem, is the central question for the understanding
of quantum mechanics.
For the measurement problem begs a question, which makes it futile. It
assumes that we learn from physics simply because physics describes well those
things that exist. Like this example from classical physics. There exist in a
gas a multitude of zipping molecules. At any given moment, each particle has
its particular position and momentum, and over time this forms their history.
Physics has told us what a gas is—precisely what exists there. This is what
lets us learn about gases. Undoubtedly this is how Boltzmann saw it.
But when we look at the statistical mechanics he produced, and even
more at that of Gibbs, a person will acquire deep qualms about this view-
point. Boltzmann’s analysis of the collision of molecules seems like straightfor-
ward common sense. He is looking at what they are likely to do. But when
Loschmidt’s reversibility objection is brought forward, the lucidity vanishes.
Gibbs’s more abstract statistical mechanics made the problem even starker.
Gibbs found beautiful mathematical form in Boltzmann’s (and Maxwell’s)
work, which he generalized. He held that thermodynamic systems should be
represented as being in states that have the form of certain probability dis-
tributions over classical states. Gibbs could not well understand what these
probabilities were about, but he saw that his theory was good nevertheless. To
keep this lack of clear comprehension from poisoning work with the theory, he
devised a work-around. The axioms of probability theory are reflected in the
axioms of finite set theory. One can effectively solve problems of probability
by thinking about finite sets. So Gibbs suggested that we simply think about
these probabilities in terms of sets. The word he used was ensembles.
Gibbs described his intent in these words: “The application of this prin-
ciple is not limited to cases in which there is a formal and explicit reference to
an ensemble of systems. Yet the conception of such an ensemble may serve to
give precision to notions of probability. It is in fact customary in the discus-
sion of probabilities to describe anything which is imperfectly known as some-
thing taken at random from a great number of things which are completely
described.”[Gibbs]
But physicists have never been able to accept gracefully that they don’t
understand the elements of their science. So they have been moved to think
that they do understand Gibbs’s probabilities somehow, and this has led to
two missteps.
One has been to regard the probabilities in Gibbs’s theory as being the
result of our ignorance of the detailed state of the system we are considering.
But when a probability distribution is useful, this is a very great step up in
order from chaos. Ignorance cannot create order. If water always boils at the
same temperature, it is not our fault. Rather than being so explained, for it
is not, Gibbs’s theory shows that there is something deeply wrong with classi-
cal mechanics. Classical statistical mechanics is not really a form of classical
mechanics. It is quantum mechanics being born.
The following words of Gibbs seem to show that Gibbs himself took the
view just scotched. “The states of the bodies which we handle are certainly not
known to us exactly. What we know about a body can generally be described
most accurately and most simply by saying that it is one taken at random from
a great number (ensemble) of bodies which are completely described.”[Gibbs]
The impression that I get, though, is that Gibbs is cautiously hedging. He is
not saying plainly, as he might have, that a body we handle will be in some
completely described state, so that if we describe it with an ensemble, the
probabilities in the ensemble simply represent our partial ignorance about that
state. He does say plainly that his method seems to work.
The other misstep has come about because quantum theory is a mirror
of Gibbs’s statistical mechanics in the sense that it is based on what are prob-
abilities in form (in other words, sets of non-negative real numbers that add
up to one) and we don’t know what they mean in general. It is true that we
can make good sense of them as real probabilities in various special cases. For
instance, when quantum mechanics is applied to the Stern-Gerlach experiment,
to see the detector react is like seeing a coin tossed. But in the general case no
such kind of experience is directly implied by these probability forms. There
are, for example, canonical distributions in quantum mechanics too, and we
don’t ever expect to see a detector pick a pure state out of a hot cup of coffee.
We then sometimes think about these formal probabilities in terms of
ensembles, just as Gibbs did, and for the same reason. Where the formal
probabilities are highest and the members of the ensemble most numerous,
there the greatest significance will lie, whatever it may be. This is fine. But
quite often physicists say that ensembles (that is to say, Gibbs’s work-around)
provide the means to understand quantum theory. This is clearly wrong.
But to get back to the measurement problem. As you well know, but
for explicitness I will say it anyway, to see a problem in measurement is to
suppose that quantum mechanics can describe the equipment in the lab as it
exists at the start of an experiment, but when the representation is continued,
the equipment becomes entangled with the microscopic systems it is examining
and gets smeared. Then quantum mechanics has stopped describing what we
know exists in the lab and needs to be corrected so that it will continue to
describe what exists.
But it isn’t so that quantum mechanics, if it is to show us some pre-
dictability in nature, must provide us directly with histories of the existence of
things, as by a developing wave packet. As evidence, I offer the hourglass.
3 But histories are sometimes good
If physics does not work simply because it describes what exists, and if, rather,
the way of the hourglass is right, then a corollary is that how we learn about
nature necessarily becomes more indirect. We are given such information as
radiation provides about something, not directly told what exists there. And
for the purpose of inferring useful rules of nature’s behavior, what we deal with
are imagined situations that we think typical of what we want to learn about,
not faithful descriptions of actual things. No real radiating system is like an
hourglass, except momentarily near the hourglass’s neck.
But quantum engineering may temper the truth of that judgment just a
bit. For there is also an engineering use of quantum mechanics where, somewhat
as classical mechanics does it, for a time we can use a wave packet to represent
the development of an actual situation we are dealing with. But this is rather
more special, for we must take care to set things up so that this will work. The
vacuum must be excellent, etc. Isolation is important.
A simple example of quantum engineering is an ion that alternately blinks
for a spell and remains dark for a spell while sitting in an ion trap that is irradi-
ated by lasers. You can picture the ion well enough by thinking of Schrödinger
evolution of a wave packet with occasional quantum jumps interspersed. You
might then be tempted to think that everything can be handled effectively in
the same way, at least in principle. We have just not been clever enough to
find New York City’s wave packet and its measurement collapses.
This is trouble. The worst of it is that you will be led to ignore hourglasses
and what they imply, since clearly hourglasses cannot represent the history
of things in the same manner that you have advantageously represented the
history of the blinking ion.
On the other hand, imagine that decades ago physicists had taken hour-
glasses to their hearts, as well I think they might have. Then they could have
been tempted to look upon representing an ion in a trap by Schrödinger evolu-
tion of a wave packet with quantum jumps as ‘following the wrong philosophy’
(by trying to represent the actual histories of things with wave packets), and
might have disdained to do so. There is a lesson here. Don’t take your philo-
sophical ideas too seriously, we’re not good enough for that.
I believe, though, that from hourglasses you would be able to infer that
Schrödinger evolution with jumps is a simple and effective (not perfect) way
to regard a blinking ion in a trap. The hourglasses would then be in this sense
the more fundamental theory.
4 Phlogiston and oxygen
But what is a quantum jump? Here is where I think the community of physi-
cists has been careless in the use of words, perhaps mixed with real misun-
derstanding. Two principles of quantum physics have been formulated. The
first principle (promoted by Dirac and von Neumann) is that when a measure-
ment is made on a system, an immediately following measurement will give the
same result. Therefore, right after any measurement the system must be in the
eigenstate corresponding to the value found.
The second principle is that if the probabilities of the possible results of
all the measurements that may be made on a system are defined, then there
will be a (unique) quantum state that the system may be said to be in that will
yield these probabilities. Add to this that sometimes two measurements may
be made on a system without interfering with each other. Then when one of the
two measurements has a certain result this will define a conditional probability
for any result of the other measurement (simply divide the probability that
both results occur by the probability that this result of the first measurement
occurs). According to the second principle, then, there will be a quantum state
that yields these probabilities (for the possible results of any measurement that
may be made without interfering with, or suffering interference from, a given
measurement that has had a certain result).
Please notice that the argument above assumes that the set of all the
measurements compatible with a given measurement effectively constitutes ‘all
the measurements that may be made on a system’ as needed by the second
principle.
Now consider a system A in the state α. It is composed of two subsystems,
B and C, in reduced states β and γ respectively. A measurement is made on
subsystem B and it has a result. By the first principle, there is a quantum
state β′ that will yield the probabilities of the possible results of any immedi-
ately following measurement that might be made on subsystem B. And by the
second principle, there is a quantum state γ′ that will yield the probabilities of
the possible results of any compatible measurement made on subsystem C.
For the supplanting in one’s considerations of β by β′ there is the historical
name ‘collapse of the wave packet’. For the supplanting in one’s considerations
of α by γ′ most physicists use the same phrase (or any of its several synonyms).
It would easier to think about these things if different names were used for the
two. ‘Collapse of the wave packet’ might be retained for the first and, say,
‘conditioning of the wave packet’ adopted for the second.
This is all the more important because the first principle is an out and
out mistake by Dirac and von Neumann, whereas the second is an inalienable
part of quantum mechanics. To those two mathematically minded, and so
logically minded, people, the dignity of quantum mechanics required that there
be measurements, so that quantum mechanics might be real physics. And
since quantum mechanics did not say that a system had to have, before the
measurement, the value found in the measurement, the dignity of measurement
required that it at least have that value afterward, or what sort of measurement
was this anyway?
Tacked on to this was the fact that so distressed Schrödinger: wave pack-
ets spread interminably. If a developing wave packet were to represent the
history of a system, which they assumed to be necessary, then the spreading
had to be checked, and an occasional quantum jump such as their measurement
theory presupposed might do that.
And experiment lent some support. Above all, if an electron went splat
somewhere on a screen, which they regarded as a measurement by the experi-
menter of the electron’s position, then conservation of charge suggested strongly
that the electron could be found subsequently thereabout. This was the origin
of the phrase ‘collapse of the wave packet’. Too, the famous Stern-Gerlach
experiment allows a following measurement of spin, which will give the same
result as the first if the first measurement’s detection has been delicate enough.
But the idea of a quickly following measurement is just not well-defined
in general. And there are cases where the principle must prove false under any
reasonable definition of a following measurement. For example, a particle might
lose most of its energy in those collisions that measured its energy. Or if the
momentum of a charged particle were measured by the curvature of its path in a
magnetic field, the particle might end up going in the wrong direction, although
this is, to be sure, correctable. Those events called “measurements” are what
they are, and if they fall short of truly being measurements of properties, so
be it!
If the first principle is an error, then that leaves us with only one principle,
the second, and people might then be inclined to continue to use the traditional
phrase ‘collapse of the wave packet’, but now meaning the replacements the
second principle defines. This would result in the transfer, in the course of
history, of the meaning of the phrase from the first principle to the second. I
think that this would have the same unhappy effect as if Lavoisier, not wishing
to burden the world with a neologism, had instead given to the word phlogiston
a new sense.
5 A closer look at quantum engineering
The second principle has a very different flavor from the first. For it leads to
conditional probabilities, and these lend themselves to imaginative thinking. In
this mind-set you are free to take up points of view according to what you wish
to learn. The first principle, however, leads to probabilities that are thought
to be the properties of real events, such as an actual toss of a coin. You are
now in a reality mind-set. That probability is as much a part of the coin toss
as is the silver of the coin, and you must deal with it. You have no choice.
But I don’t mean to say that this is an absolute difference between the two
principles. Rather, they tend to lead us into these respective modes of thought,
and vice versa. Bearing this in mind, let us look at the hourglass and quantum
engineering.
First consider the hourglass that represents a gooseberry bush. By choos-
ing one among the more probable of the results of the course of observation
of light, we will select what is in effect a likely movie of such a bush. We can
look at the movie, and the marvelous algorithms of our brains will construct
an idea of a gooseberry bush and follow it through its history. We have gotten
something good out of this, and we have made no use of the conditional proba-
bilities offered by the second principle at all. However, if we are not limited to
one movie then we can use conditional probabilities as they are normally used,
to explore various interesting possibilities while taking into account how likely
they are when we are supplied with certain information.
Notice that we have been thinking imaginatively. No one would suppose
that we have directly grasped the reality of a gooseberry bush in our garden in
this way, particularly because real gooseberry bushes do not start to exist at a
special time.
Now consider quantum engineering. By means of careful construction
of the equipment a clearly defined situation can be set up where the power
of wave packets to give understanding will be enhanced. On the other hand,
here there can be significant entanglement. The power of our minds to achieve
understanding through their everyday methods will be set at nought.
Then for quantum engineering, a history formed by wave packet develop-
ment with occasional saltations may be a quite good route to understanding.
We would take up this idea of what exists simply because it is good enough to
help us with the job at hand. And for this case, where we find it fit to think
that we are dealing with an actual system that is an evolving wave packet,
and with saltations that we regard as actual events, but in a way so different
from that intended by Dirac and von Neumann, then perhaps a third term,
say, ‘change of the wave packet’, would be appropriate for the saltations.
These changes of the wave packet would differ from collapses of the wave
packet because, although they would be thought of as real events just as col-
lapses have been, they would be derived from conditioning of wave packets, in
the following manner. When a system sends out radiation (or anything else)
that will not return, in one way you can consider the system of interest to be
the whole, including the radiation, and in another way you can consider it to
be the reduced system that does not include the radiation. Upon observation
of the radiation you will derive from the result and from the wave packet of
the whole system a wave packet for the system less the radiation, and this we
have called conditioning of the wave packet. But if before the observation your
interest had been focussed on the system less the radiation, and thus on its
reduced wave packet, then you will have gone from one wave packet to another
wave packet for the system less the radiation. And since you are reckoning
these wave packets as being portions of the system’s history, this looks like a
quantum jump. This is what is meant by a change of the wave packet. There
is no need to define any such change of the wave packet precisely, of course.
No more is there need to suppose that it can be defined with precision.
6 Conclusion
If hourglasses cannot be true histories, how can it be that we can learn from
them? What lets them tell us how gooseberry bushes grow, when they are only
momentarily like a gooseberry bush? I haven’t said a word about this yet.
First of all, there is an assumption hidden behind this puzzlement of ours.
The assumption is that we have no reason to be perplexed that we can learn
from things that can be true histories. For if it did not seem so perfectly natural
to us that we learn from true histories, then it would not appear unnatural to
learn from what clearly cannot be a true history. But I think this assumption
of ours is thoughtless, and I will try to explain why.
We make judgments about when we are better informed and when less so.
The ideas we hold true when thought to be better informed are compared with
those that we held when not so well informed. In this way, through the device
of taking the ideas that we presently have most confidence in as trustworthy,
we try to gather how successfully our ideas tend to stack up against reality. It
is not quite so simple, however, since we know from sad experience that the
ideas we now trust may fail us. But we have the conviction, or hope, that if
such happens we can land on our feet again. We will search for still better
ideas until we find something that works.
We are apt to give to this situation a logical cast. Namely, by positing
that there is a best of all possible ideas in whose direction we are headed. This
posit can be helpful. It can give us greater confidence in our search for better
ideas. If we guess that this best idea will have a certain form, and we guess
well, it can guide our search. But there is no necessity for this posit; all we
really know is what was said above.
Another thing we like to do is to find where things are and when. Our
vision, touch, and hearing do this automatically all the time, and we often give
them some conscious help, say by turning the head. When we are a teenager
it is likely to occur to us that there must be a best of all possible such ideas, a
complete map of where everything is, and has been, and perhaps will be too.
A further thought may cross one’s mind. Maybe this is all that our world is.
For instance, if one person likes another, this should show up in that person’s
actions, which the map will completely define. Maybe the liking simply is those
actions.
Now I will propose some physics, the red dust theory. According to this
theory the world is made up of an exceedingly large number of very fine specks
of a scarlet dust. Because of its ruddiness, the dust is extremely beautiful, if
only we could see it, but we will not be concerned with that. The red dust
theory differs from most physics in that the flight of the particles does not have
to satisfy a differential equation, it is merely continuous.
The interpretation of the theory is quite simple. Where we find things
there will be a crowd of these specks, and where we find vacancy they will be
much sparser. But can our world be as this theory says? Surely it can. There
will be among its solutions one that maps the entire history of our universe with
extraordinary precision. The collisions of galaxies, the evolution of whales, the
experiments in laboratories, all will be there and rightly shown.
Now you may think that the red dust theory is hopelessly bad physics
and should be ignored. It may be hopelessly bad, but it should not be ignored.
It is a benchmark. If another physics theory is proposed, is it better than the
red dust theory, and if so, just why? This is especially pertinent if the other
theory intends, as does the red dust theory, to give a precise description of all
that exists. Bohmian quantum mechanics is an example.
But what I intend to put up against the benchmark is classical mechanics.
Everyone will agree that classical mechanics is far better than the red dust
theory. You can do things with classical mechanics; you can’t do anything
with the red dust theory. For instance, you can pull a pendulum to the side
and let it go. It will swing. Classical mechanics can give you the history of that
swing ahead of time. The red dust theory has so many solutions compatible
with the way things are at the start that it won’t tell you anything useful about
how things will go.
Our experience with classical mechanics is that it is practical, but why
is this so? The most natural idea is that the world must at bottom be clas-
sical mechanical. Since we understood the pendulum by assigning a classical
mechanical state to it and evolving the state, there must then be an evolving
classical mechanical state that the whole world is in, and that would explain
why classical mechanics is so useful.
When we look at the history of our universe, however, and particularly at
the evolution of life over billions of years, and when we consider the resources
that it is likely that classical mechanics has to offer in its solutions, it doesn’t
really seem possible that there is any classical mechanical history that would
match our universe’s history, no matter how exquisitely the initial conditions
are chosen. For the more detailed structures of the classical representation
must in time dissolve into lasting chaos, and I would think rather quickly.
Still, this does depend on a point I don’t actually know the answer to.
For in order to make the universe behave as you wish, that is to say, give a
good account of continents rifting and hummingbirds feeding, it might be that
to obtain each additional second of the desired history it is always sufficient
to correctly calculate another, say, thousand decimal places for the positions
and momenta of the molecules in the initial state. Or to the contrary, the first
thousand decimal places might give you one second, the next thousand only a
further half second, then a fourth of a second, and so on.
Yet even if I am wrong in this, we would just go from Scylla to Charybdis.
For in that case classical mechanics must be like the red dust theory, where,
from our point of view, anything is possible, or too close to anything. In either
case the classical solution set would imply no structure such as we experience
in life. No sculpted dunes, no ants carting morsels, no shower of hail would
pop out of it. Nor can one imagine any reason why the solution set would show
a preference for depicting creatures learning classical mechanics, or if so doing
benefiting by it. In short, there is a total disconnect between the fact that
classical mechanics is useful and the hypothesis that the universe as a whole is
a classical mechanical system.
That leaves us with an unsolved mystery: why does classical mechanics
work for us? And classical mechanics is the archetype of the kind of physics
where we learn from what can be true histories of things.
To my mind, the hourglass with observation of its emitted light is deeply
conservative physics. It makes quantum mechanics as seamless a continuation
of the physics of the previous centuries as is at all possible. This is because
of the mathematical form of the hourglass, which is a continuous development
from initial conditions, as well as the form of the observations, which impinge
as little as can be. And when this leads to our being given movies rather than
direct histories, then I am surprised (and amused) by this, but accept it for the
sake of the qualities mentioned, which I consider to be virtues that promise.
Nature is teaching us another lesson.
Bohr’s old quantum theory was based on quantum jumps, and I think
this was a wonderful piece of exploration in the dark. When Heisenberg’s new
quantum mechanics came along, quantum jumps were kept. The jumps would
allow direct histories to be retained as the foundation of our physics, though
at the expense of the continuous Hamiltonian evolution of the wave packets
(and at the expense of clear definition, for no one has ever been able to specify
just when and where and what the quantum jumps are). Like Schrödinger,
I am jarred by this. If we are given the choice of preserving philosophical
principle or mathematical form, I think we should prefer mathematical form.
Isn’t this what Copernicus did?
A final thought: If learning from the movies provided by hourglasses is
how we do physics, then to know why quantum mechanics works would be to
know why all the inferences we might make from the movies will fit together
with sufficient coherence. But to know this would require that we know all the
things we might ever think of. It’s hopeless. Though we might nibble at the
problem, by showing that the hourglasses have some needed characteristics. So
I think the hourglasses will leave us with an essentially unfathomable mystery.
References
Gibbs, J. Willard [1981]: Elementary Principles in Statistical Mechanics , Wood-
bridge, CT: Ox Bow Press, p. 17 and p. 163.
Mahon, Basil [2003]: The Man Who Changed Everything, Chichester, UK:
John Wiley & Sons Ltd.
The hourglasses suggest that von Neumann’s measurement theory should be
recast for imaginative use rather than for the description of actual situations.
This gives one extra freedom in setting it up, and it can then work more
effectively. An outline is here:
McCartor, Donald [2004]: ‘Quantum Thought Experiments Can Define Na-
ture’, Concepts of Physics, Vol. I, no. 1, pp. 105–150 and quant-ph
0702192.
donaldamccartor@earthlink.net
ABSTRACT
  Hourglass is the name given here to a formal isolated quantum system that can
radiate. Starting from a time when it defines the system it represents clearly
and no radiation is present, it is given straightforward Hamiltonian evolution.
The question of what significance hourglasses have is raised, and this question
is proposed to be more consequential than the measurement problem.

<|endoftext|><|startoftext|>
Serb. Astron. J. } 174 (2007), 73 - 76
Preliminary report
THE Σ − D RELATION FOR PLANETARY NEBULAE:
PRELIMINARY ANALYSIS
D. Urošević1, B. Vukotić2, B. Arbutina1,2 and D. Ilić1
1Department of Astronomy, Faculty of Mathematics, University of Belgrade
Studentski trg 16, 11000 Belgrade, Serbia
2Astronomical Observatory, Volgina 7, 11160 Belgrade 74, Serbia
(Received: February 22, 2007; Accepted: March 30, 2007)
SUMMARY: An analysis of the relation between radio surface brightness and
diameter, so-called Σ−D relation, for planetary nebulae (PNe) is presented: i) the
theoretical Σ − D relation for the evolution of bremsstrahlung surface brightness
is derived; ii) contrary to the results obtained earlier for the Galactic supernova
remnant (SNR) samples, our results show that the updated sample of Galactic
PNe does not severely suffer from volume selection effect - Malmquist bias (same
as for the extragalactic SNR samples) and; iii) we conclude that the empirical
Σ −D relation for PNe derived in this paper is not useful for valid determination
of distances for all observed PNe with unknown distances.
Key words. planetary nebulae: general – Radio continuum: ISM – Methods: ana-
lytical – Methods: statistical
1. INTRODUCTION
The relation between radio surface bright-
nesses and diameters of supernova remnants (SNRs),
the so-called Σ−D relation, has been subject of the
extensive discussions in the last more than fourty
years. Due to improvements of the observational
techniques (radio-interferometers), the several hun-
dreds planetary nebulae (PNe) were resolved in the
last two decades at radio frequencies, but the Σ−D
relation for PNe was not discussed until now. By
using radio data, some statistical methods were es-
tablished in order to determine distances to PNe.
The main method was related to the correlation be-
tween radius of PNe and brightness temperature –
R − Tb relation (Van de Steene and Zijlstra 1995,
Zhang 1995, Phillips 2002). The different samples
of Galactic PNe with known distances were defined
in these papers. All the obtained empirical R − Tb
relations were used for determination of distances to
PNe for which the independent distances (in order of
R− Tb dependence) were not obtained earlier.
The samples of Galactic PNe are better for
statistical analysis than the samples of Galactic
SNRs. The selection effects should be smaller in
the case of PN samples. However, the selection ef-
fects surely influence the Galactic PN samples and
the statistical determination of distances to Galactic
PNe has to be highly uncertain.
The main objectives of this paper are the fol-
lowing:
i) to derive a simple form of the theoretical
Σ−D relation for PNe by analyzing the evolution of
radio bremsstrahlung surface brightness,
ii) to discuss whether the updated sample of
radio PNe is affected by the selection effects, and,
iii) to check whether the Σ−D relation is valid
for determination of distances to PNe.
http://arxiv.org/abs/0704.0421v3
D. UROŠEVIĆ et al.
2. ANALYSIS AND RESULTS
2.1. Theoretical Σ−D relation for PNe
The thermal bremsstrahlung mechanism is re-
sponsible for radiation of HII regions at radio wave-
lengths. The bremsstrahlung volume emissivity εν of
a PN can be shown to be (Rohlfs and Wilson 1996):
εν [ergs s
−1 cm−3 Hz−1] ∝ n2T−1/2, (1)
where n is the volume density and T is the thermo-
dynamic temperature of interstellar medium (ISM).
The surface brightness can be expressed as:
Σν ∝ ενD, (2)
where D is the diameter of PN. Combining Eqs. (1)
and (2), we obtain:
Σν ∝ n
D. (3)
Our next step is to express dependance of n and T
on D. For a constant velocity mass flow the den-
sity distribution is ̺ = Ṁ
4πr2v
, i.e. n ∝ D−x, where
x = 2. Moreover, for the isothermal envelope with
a power-law electron density distribution there is re-
lationship between the shape of the density distri-
bution and the power-law index of the radio contin-
uum spectra (see Gruenwald and Aleman 2007, and
references therein). Supposing that n ∝ D−2 and
T=const. (HII regions are approximately isothermal
at T ∼ 104 K), we obtain the simplest form of the
theoretical Σ−D relation for PNe:
Σν ∝ D
−3. (4)
This is a standard power-law form of the Σ − D
relation which can be written in general form as
Σ = AD−β , that is the same as in the case of SNRs.
It is possible that x in density distribution is
slightly higher, x & 2, and that the temperature is
not strictly constant throughout the nebula. We can
expect to see temperature gradients in PNe arising
from radiation hardening. More energetic photons
will travel further and when they are absorbed by the
PN they will impart greater kinetic energy to the ions
thereby producing a higher temperature. Using the
numerical model results given by Evans and Dopita
(1985), we calculate the dependence between logTe
and logD and find the low slope (≈ 0.1). Therefore,
this only slightly changes the slope of the theoretical
Σ−D relation. The value β = 3 is then a theoretical
lower limit, and the Σ − D relation could only be
steeper, as one can see from Eq. (3).
2.2. The empirical Σ−D relation for PNe
The most important prerequisite for deriving
a proper empirical Σ − D relation is defining of a
representative sample of PNe. The distances to the
calibrators have to be determined by accurate meth-
ods, e.g. trigonometric or spectroscopic parallaxes of
central stars in PNe, or by a method that uses the
expansion of nebulae. On the other hand, all sam-
ples suffer from the severe selection effects that arise
from limitation in sensitivity and resolution, but the
most severe selection effect for the Galactic samples
of PNe is Malmquist bias; i.e. intrinsically bright
PNe are favored because they are sampled from a
larger spatial volume compared to any given flux lim-
ited survey. The result is a bias against low surface
brightness nebulae such as highly evolved old PNe.
In this paper we use the updated sample of PNe at
the distances less than 0.7 kpc collected by Phillips
(2002). The influence of Malmquist bias in this sam-
ple is limited because of the limitation in distances to
PNe. In addition, we assume that the distances are
accurately determined for this sample of relatively
close PNe. The empirical Σ − D relation at 5 GHz
for 44 calibrators with distances less than 0.7 kpc
(Phillips 2002) has the form:
Σ56Hz = 2.33
+0.88
−0.64 · 10
−2.07±0.19
. (5)
The parameters A and β are calculated by least-
squares fitting procedure with correlation coefficient
−0.86. The corresponding Σν −D diagram is shown
in Fig. 1.
 0.01  0.1  1  10
D [pc]
Fig. 1. The Σ−D diagram at 5 GHz for 44 Galac-
tic PNe with distances less than 0.7 kpc.
The form of Eq. (5) is very close to the so-
called trivial Σ−D form with β = 2 (for details see
Arbutina et al. 2004). The additional test in order to
estimate the validity of Eq. (5) pertains to the possi-
ble dependence between the luminosity and diameter
of PNe. The Lν−D diagram is shown in Fig. 2. The
scatter in Lν − D plane shows that the correlation
between Lν and D is poor (correlation coefficient = -
0.06) and therefore the physical dependence between
L and D could not be confirmed by this statistical
procedure.
THE Σ−D RELATION FOR PLANETARY NEBULAE
 0.01  0.1  1  10
D [pc]
Fig. 2. The L −D plot at 5 GHz for 44 Galactic
PNe with distances less than 0.7 kpc.
3. DISCUSSION
The theoretical Σν −D relation (Eq. (4)) for
PNe, derived in this paper, describes a trend of de-
creasing radio surface brightness with increasing di-
ameter of an object. The radiation mechanism used
in this simple derivation is thermal bremsstrahlung.
This is the basic process of production of the radio
radiation in HII regions. The theoretically derived
slope (β = 3) is steeper than the slope from the em-
pirical relation given by Eq. (5). This discrepancy
can be explained by the low quality of the sample of
Galactic PNe or by the assumptions used in deriva-
tion of theoretical relation. Due to small variation
in power-law density distribution with x & 2 (Gru-
enwald and Aleman 2007, and references therein)
and approximately constant temperature of expand-
ing envelope of PNe, theoretical slope can be slightly
steeper than in Eq. (4). Therefore, we conclude
that the theoretical relation has the correct form,
but our empirical relation is under influence of bi-
ases that could make the slope shallower. On the
other hand, there are some attempts to show that
evolution of PNe are not linear in log-log scales (e.g.
Phillips 2004). These different dependences cannot
be derived from the thermal bremsstrahlung radia-
tion formula (Eq. (1)).
A very interesting feature regarding the em-
pirical relation for Galactic PNe (Eq. (5)) is that
the slope is approximately equal to the slope of triv-
ial Σ − D relation. Therefore, we conclude that
Malmquist bias is not so severe as in cases of Galac-
tic SNR samples. This slope (β ≈ 2) was obtained
for the extragalactic samples of SNRs (except M82
sample) where Malmquist bias is small, because all
the SNRs are at the approximately same distance
(see Urošević 2002, Urošević et al. 2005).
The large scatter in Lν − D plane (Fig. 2)
suggests that the slope in Eq. (5) does not have
real and valid physical interpretation. It is a kind
of luminosity-diameter scattering artefact which pro-
duces the trivial Σ ∝ D−2 form. Therefore, the rela-
tion defined by Eq. (5) is not precise enough for de-
termination of valid distances to Galactic PNe. This
is due to the different biases: the limitations in sen-
sitivity and resolution of radio surveys, the source
confusion, Malmquist bias (in mild form), mixture
of different types of PNe in the same sample, and
insufficient precision in determining the distances to
the 44 calibrators.
4. SUMMARY
The main results of this paper may be sum-
marized as follows:
i) The theoretical Σν −D relation for the radio
evolution of thermal bremsstrahlung surface
brightness of PNe in form of Σν ∝ D
−3 is de-
rived.
ii) Our results show that the updated sample of
Galactic PNe does not severely suffer from vol-
ume selection effect - Malmquist bias (same
as in cases of the extragalactic SNR samples).
This is opposite to results obtained earlier for
the Galactic SNR samples.
iii) Due to analysis of the Lν −D dependence, we
conclude that the Σν −D relation for Galactic
PNe is not useful for reliable determination of
distances for all observed PNe with unknown
distances.
The above observation leads to the more gen-
eral comment that PNe may have very different ini-
tial conditions leading to independent evolutionary
paths. These paths could follow the same theoreti-
cal Σ−D curve but with varying intercepts, leading
to the scatter such as the one found in this paper.
Acknowledgements – The authors would like to thank
the referee Prof. Nebojsa Duric for valuable com-
ments which have improved this paper. This research
has been supported by the Ministry of Science and
Environmental Protection of the Republic of Serbia
(Projects: No 146002, No 146003, No 146012, No
146016).
D. UROŠEVIĆ et al.
REFERENCES
Arbutina, B., Urošević, D., Stanković, M. and Tešić,
Lj.: 2004, Mon. Not. R. Astron. Soc., 350,
Evans, I.N. and Dopita, M.A.: 1985, Astrophys. J.
Suppl. Series, 58, 125
Gruenwald, R. and Aleman, A.: 2007, Astron. As-
trophys., 461, 1019.
Phillips, J.P.: 2002, Astrophys. J. Suppl. Series,
139, 199.
Phillips, J.P.: 2004, Mon. Not. R. Astron. Soc.,
353, 589.
Rohlfs, K. and Wilson, T.L.: 1996, Tools of Radio
Astronomy, Springer
Urošević, D.: 2002, Serb. Astron. J., 165, 27
Urošević, D., Pannuti, T. G., Duric, N., Theodorou,
A.: 2005, Astron. Astrophys., 435, 437.
Van de Steene, G.C. and Zijlstra, A.A.: 1995, As-
tron. Astrophys., 293, 541.
Zhang, C.Y.: 1995, Astrophys. J. Suppl. Series, 98,
Σ − D RELACIJA ZA PLANETARNE
MAGLINE: PRELIMINARNA ANALIZA
D. Urošević1, B. Vukotić2, B. Arbutina1,2 and D. Ilić1
1Department of Astronomy, Faculty of Mathematics, University of Belgrade
Studentski trg 16, 11000 Belgrade, Serbia
2Astronomical Observatory, Volgina 7, 11160 Belgrade 74, Serbia
UDK 524.37–77–54
Prethodno saopxteǌe
Prikazana je analiza tzv. Σ − D re-
lacije izme�u povrxinskog sjaja na radio-
frekvencijama i dijametra planetarnih
maglina (PM): i) izvedena je teorijska Σ −D
relacija za evoluciju povrxinskog sjaja
stvorenog zakoqnim zraqeǌem; ii) suprotno
rezultatima dobijenim ranije za uzorke
saqiǌene od Galaktiqkih ostataka super-
novih, naxi rezultati pokazuju da najnovije
formirani uzorak Galaktiqkih PM ne trpi
veliki uticaj zbog zapreminskog selekcionog
efekta, tzv. Malmkvistovog selekcionog
efekta (isto vaжi za vangalaktiqke uzorake
ostataka supernovih); i iii) zakǉuqujemo da
empirijska Σ − D relacija za PM izvedena
u ovom radu nije upotrebǉiva za pouzdana
odre�ivaǌa daǉina do svih posmatranih PM
sa nepoznatim daǉinama.
ABSTRACT
  An analysis of the relation between radio surface brightness and diameter,
so-called Sigma-D relation, for planetary nebulae (PNe) is presented: i) the
theoretical Sigma-D relation for the evolution of bremsstrahlung surface
brightness is derived; ii) contrary to the results obtained earlier for the
Galactic supernova remnant (SNR) samples, our results show that the updated
sample of Galactic PNe does not severely suffer from volume selection effect -
Malmquist bias (same as for the extragalactic SNR samples) and; iii) we
conclude that the empirical Sigma-D relation for PNe derived in this paper is
not useful for valid determination of distances for all observed PNe with
unknown distances.

<|endoftext|><|startoftext|>
Polarization conversion in a silica microsphere
Pablo Bianucci, Chris Fietz, John W. Robertson, Gennady Shvets, and Chih-Kang Shih∗
Physics Department, The University of Texas at Austin, Austin, Texas 78712
(Dated: May 22nd, 2007)
Abstract
We experimentally demonstrate controlled polarization-selective phenomena in a whispering
gallery mode resonator. We observed efficient (≈ 75%) polarization conversion of light in a silica
microsphere coupled to a tapered optical fiber with proper optimization of the polarization of the
propagating light. A simple model treating the microsphere as a ring resonator provides a good fit
to the observed behavior.
In the past few years, microresonators
have received a lot of attention1. Whis-
pering gallery mode (WGM) resonators2,
such as microspheres,3 microtoroids4 and
microrings5 have been the object of inten-
sive research, both in their fundamental prop-
erties (such as quality factors, non-linear
effects6,7 and coupling to quantum systems8
among many) and applications that include
lasers9,10, chemical11 and biological12 sens-
ing and photonic devices13. Microsphere res-
onators, particularly when coupled to a ta-
pered optical fiber14,15, are very useful to
characterize these properties and test new
ideas due to their high Q-factors and ease of
fabrication.
Recent reports have shown a further step,
taking into account the difference between
modes with different polarizations in micro-
spheres. In particular, changes in the output
polarization after coupling into the resonator
have been observed16 and transverse electric
(TE) and transverse magnetic (TM) modes
have been discriminated17.
Polarization conversion has been observed
in microrings5 and explained as a resonant
enhancement of polarization coupling caused
by waveguide bending. However, the mode
structure of microspheres makes it possible
to completely decouple the polarizations and
still obtain conversion. In this article, we re-
port on the observation of efficient, controlled
polarization conversion by using a silica mi-
crosphere resonator coupled to a tapered op-
tical fiber. We demonstrate that highly ef-
ficient polarization conversion (75% for our
particular case, higher for better optimized
conditions) is enabled by a specific orienta-
tion between the incoming light polarization
and fiber-resonator displacement. Specifi-
http://arxiv.org/abs/0704.0422v2
cally, for a horizontally stacked, strongly cou-
pled, fiber and resonator combination, a 45◦
incident polarization results in the largest
conversion. The conversion results in a
strong dip of the transmitted light with the
original polarization and a strong spike in the
orthogonally polarized transmission.
We fabricated the tapered fiber using the
“flame brush” technique18. This technique
involves mechanically stretching the optical
fiber while scanning a flame (oxy-hydrogen
in our case) over the region to be tapered.
Due to constraints in the maximum pulling
length, the fiber tapers are not completely
adiabatic, but typical losses are never larger
than 50%. SEM studies of the tapers reveal
a characteristic diameter close to 1 µm. The
microsphere was fabricated using a CO2 laser
to stretch and melt an optical fiber tip19. In
this way it is easy to obtain spheres with di-
ameters ranging from 10 µm to 200 µm. For
this particular experiment the sphere diame-
ter was measured using an optical microscope
to be 52 µm (corresponding to an estimated
free spectral range of 1.2 THz).
We mounted the microsphere on a piezo-
electric scanner which allowed us to finely
position the sphere over a range of a few
micrometers, and the stretched fiber taper
on a piezoelectric stick-slip walker permitting
both coarse and fine positioning of the fiber
taper next to the sphere. Both sphere and
PD PC2 PC1
Tapered
Fiber
Laser PR FC
Microsphere
10 um
FIG. 1: Experimental setup schematic. PR is
a polarization rotator, FC a fiber coupler, PC1
and 2 are fiber polarization controllers, P a po-
larizer and PD is an amplified photodiode. Inset:
Image of a sphere near a tapered fiber.
taper were then situated inside a compact,
closed chamber. We used an external cav-
ity tunable diode laser purchased from New
Focus as the excitation source, centered at
a wavelength near 927.85 nm. The polariza-
tion rotator set the polarization of the laser
which was then coupled into the optical fiber
using a free-space coupler. A polarizer and
an amplified photodiode at the fiber output
were used to analyze the transmitted light.
Space constraints in the chamber and lim-
itations on the arrangement of the optical
fiber caused bending of the fiber in differ-
ent locations and subsequent scrambling of
the input polarization. As a way to compen-
sate for these changes in the polarization, we
used two polarization controllers. The first
one preceded the fiber taper, compensating
for polarization changes up to the position of
the microsphere. The second controller was
placed after the fiber taper to ensure the lin-
earity of the output polarization. Figure 1
shows a schematic of this experimental setup.
We used the following procedure to mea-
sure the degree of polarization conversion.
First, the incoming polarization was selected
by using the polarization rotator. Then we
adjusted the first polarization controller to
ensure the polarization at the fiber taper was
linear and matched to one set of modes (“x-
polarized”). The next step was to uncouple
the taper from the sphere and make sure the
output polarization was linear (we achieved
this by turning the detection polarizer to its
position for minimum transmission and then
minimizing this transmission further with the
second polarization controller). This orienta-
tion of the detection polarizer is the one we
call “orthogonal”. Rotating the polarizer 90
degrees (the “parallel” orientation) resulted
in maximum transmission, with a contrast
of about 95%, confirming the linear polar-
ization of the output light. Finally, we posi-
tioned the sphere and the tapered fiber try-
ing to optimize the coupling, while measur-
ing transmission spectra for both orientations
of the detection polarizer. We repeated the
procedure for two other incoming polariza-
tions: one matched to the other set of sphere
modes (“y-polarized”) and another at 45◦ be-
tween the x- and y- polarization axis (“xy-
polarized”).
Figure 2 shows the resulting transmission
-30 -25 -20 -15 -10 -5 0 5 10 15 20 25 30
Frequency shift (GHz)
Input Detection
polariz. polariz.
FIG. 2: Transmission spectra for different in-
put polarizations. The resonant frequencies cor-
respond to modes with l ≈ 496. The x- and
y-polarizations are orthogonal and correspond
to the polarization eigenmodes of the resonator.
The xy-polarization is oriented at 45 degrees
from both x and y. The dark traces correspond
to the detection polarizer parallel to the input
polarization and the light traces correspond to a
crossed detection polarization.
spectra for the different configurations. The
cases for both the x- and y- polarized light
show the same behavior: a set of transmission
dips whenever the laser frequency hit a whis-
pering gallery resonance when the detection
polarization is parallel and no signal when
it is perpendicular. The xy-polarized case is
more interesting: the parallel detection po-
larization shows dips for both sets of modes,
while the orthogonal one shows transmission
peaks at the whispering gallery resonances.
At the highest peak, more than 70% of the
incident light had its polarization converted.
Most of the observed polarization conver-
sion can be understood by using a simple ring
resonator model for the whispering gallery
modes. In this model, the transmission of
polarized light through the resonator is given
by14,20
τ(φ) =
r − aeiφ
1− raeiφ
, (1)
where r is the field coupling coefficient be-
tween the resonator and the waveguide, a is
the attenuation due to the resonator intrinsic
losses and φ = 2π(ν−ν0)tRT is the phase shift
imposed by the resonator (ν and ν0 are the in-
coming light frequency and the resonant fre-
quency respectively, while tRT is the round-
trip time in the resonator). The model is
scalar, but we can include the polarization by
simply assuming that modes with orthogonal
polarizations are independent and neglecting
cross-polarized couplings (using an analysis
similar to that by Little and Chu21). In this
way we obtain the same expression, with pos-
sibly different parameters, for the transmis-
sion of both polarizations. In our particu-
lar case of whispering gallery modes in mi-
crospheres, we can safely assume that modes
with different polarizations are not degener-
ate, so one of the polarizations will be unaf-
fected by the presence of a resonance. This
differs from the case of microrings5, where the
conversion depends on coupling between TE
and TM modes.
The essence of the effect lies in the differ-
ent resonator response for each polarization.
For a strongly coupled fiber and microsphere,
|τ | ≈ 1, but the phase shift ψ = arg(τ)
is changed by ∆ψ = π as the frequency is
sweeped across the resonance. Because the
orthogonal polarization is transmitted unal-
tered, the transmitted polarization rotates by
as much as 90◦ for the initial xy-polarization.
When the fiber and the resonator are horizon-
tally stacked, the effect is maximized when
the incident polarization is at 45◦ degrees
with respect to the horizontal plane.
Conversion efficiencies of up to 25% can
be achieved if one of the polarizations is crit-
ically coupled to the ring, i.e. is completely
absorbed in+the resonator. Achieving higher
efficiencies requires increasing the resonator-
waveguide coupling to obtain a significant po-
larization dependent phase shift which will
change the final polarization state into one
closer to the desired one.
We can look in more detail at the data
by concentrating into a pair of modes show-
ing good conversion, now accounting for laser
frequency drift between scans using a Fabry-
Perot interferometer as a reference. This de-
tailed spectrum can be seen in in Fig. 3. The
resonance on the right side of Fig. 3, near a
shift of 31 GHz, shows a polarization conver-
sion of about 60%. The left-side resonance
shows a conversion near 75%. The higher ef-
ficiency is due to the leftmost mode being
26 27 28 29 30 31
Frequency shift (GHz)
30.5 31
26 26.5 27 27.5 28
FIG. 3: Detailed view of two modes showing
polarization conversion. The dashed lines are
fits using equations of the form of equation 1.
The fit parameters for the leftmost features are
a = 0.99997, r = 0.99977. The corresponding
ones for the rightmost feature are a = 0.99999,
r = 0.99993.
more strongly coupled (displaying a broader
feature) to the tapered fiber than the right-
most one. Consistent with theoretical predic-
tions, in both cases one of the polarizations
is over-coupled to the ring. The lack of a
shift in the center frequency of the features
also indicates that each pair of peak and dip
corresponds to a single resonant mode.
This phenomenon could be useful for po-
larization control in photonic devices, such
as narrowband polarization-dependent filter-
ing or switching, as shown in Fig. 4 or even
arbitrary polarization manipulation.
We have observed efficient polarization
conversion on a microsphere resonator cou-
PBSPBS
a) b)
FIG. 4: Schematic of a resonator working as
wavelength-selective polarization switch. a)
Two signals with different wavelengths+ (green
and blue) and orthogonal polarizations pass un-
changed through the waveguide and the un-
coupled resonator. A polarization beamsplitter
then routes the signals to different paths. b)
The polarization of the resonant signal (blue) is
converted by the coupled resonator, and both
signals are sent through the same path. The
resonator-waveguide coupling can be changed in
different ways, including mechanical or optical22
means.
pled to a tapered optical fiber and used a
simple theoretical model to understand the
phenomenon. The model does not involve di-
rect coupling of the orthogonal polarizations,
but rather a polarization-selective phase shift
induced by the resonator. This effect should
be common to all whispering gallery mode
resonators and could be useful for polariza-
tion control in photonic devices.
Acknowledgments
This work was supported by NSF-NIRT
(DMR-0210383), the Texas Advanced Tech-
nology program, and the W.M. Keck Foun-
dation. G.S. and C.F. acknowledge support
from ARO MURI grant no. W911NF-04-01-
0203.
∗ Electronic address: shih@physics.utexas.edu
1 K. J. Vahala, Nature (London) 424, 839
(2003).
2 A. B. Matsko and V. S. Ilchenko, IEEE J. Sel.
Top. Quantum Electron. 12, 3 (2006).
3 M. L. Gorodetsky, A. A. Savchenkov, and
V. S. Ilchenko, Opt. Lett. 21, 453 (1996).
4 D. K. Armani, T. J. Kippenberg, S. M.
Spillane, and K. J. Vahala, Nature (London)
421, 925 (2003).
5 A. Melloni, F. Morichetti, and M. Martinelli,
Opt. Lett. 29, 2785 (2004).
6 A. E. Fomin, M. L. Gorodetsky, I. S. Gru-
dinin, and V. S. Ilchenko, J. Opt. Soc. Am.
B 22, 459 (2005).
7 T. Carmon, H. Rokhsari, L. Yang, T. Kip-
penberg, and K. J. Vahala, Phys. Rev. Lett.
94, 223902 (2005).
8 Y.-S. Park, A. K. Cook, and H. Wang, Nano.
Lett. 6, 2075 (2006).
9 M. Cai and K. Vahala, Opt. Lett. 26, 884
(2001).
10 S. I. Shopova, G. Farca, A. T. Rosenberger,
W. M. Wickramanayake, and N. A. Kotov,
Appl. Phys. Lett. 85, 6101 (2004).
11 A. M. Armani and K. J. Vahala, Opt. Lett.
31, 1896 (2006).
12 S. Arnold, M. Khoshsima, I. Teraoka,
S. Holler, and F. Vollmer, Opt. Lett. 28,
272 (2003).
13 F. Michelotti, A. Driessen, and M. Bertolotti,
eds., Microresonators as building blocks for
VLSI photonics, vol. 709 of AIP Conference
Proceedings (American Institute of Physics,
Melville, New York, 2003).
14 M. Cai, O. Painter, and K. J. Vahala, Phys.
Rev. Lett. 85, 74 (2000).
15 J. C. Knight, G. Cheung, F. Jacques, and
T. A. Birks, Opt. Lett. 22, 1129 (1997).
16 G. Guan and F. Vollmer, Appl. Phys. Lett.
86, 121115 (2005).
17 H. Konishi, H. Fujiwara, S. Takeuchi, and
K. Sasaki, Appl. Phys. Lett. 89, 121107
(2006).
18 T. A. Birks and Y. W. Li, J. Lightwave Tech-
nol. 10, 432 (1992).
19 D. S. Weiss, V. Sandoghar, J. Hare,
V. Lefèvre-Seguin, J.-M. Raimond, and
S. Haroche, Opt. Lett. 20, 1835 (1995).
20 D. D. Smith, H. Chang, and K. A. Fuller, J.
mailto:shih@physics.utexas.edu
Opt. Soc. Am. B 20, 1967 (2003).
21 B. E. Little and S. T. Chu, IEEE Photon.
Technol. Lett. 12, 401 (2000).
22 V. R. Almeida, C. A. Barrios, R. R.
Panepucci, and M. Lipson, Nature (London)
431, 1081 (2004).
ABSTRACT
  We experimentally demonstrate controlled polarization-selective phenomena in
a whispering gallery mode resonator. We observed efficient ($\approx 75 %$)
polarization conversion of light in a silica microsphere coupled to a tapered
optical fiber with proper optimization of the polarization of the propagating
light. A simple model treating the microsphere as a ring resonator provides a
good fit to the observed behavior.

<|endoftext|><|startoftext|>
Limits on the WIMP-nucleon interactions with CsI(Tl) crystal detectors
H.S. Lee,1 H.C. Bhang,1 J.H. Choi,1 H. Dao,7 I.S. Hahn,4 M.J. Hwang,5 S.W. Jung,2 W.G. Kang,3 D.W.
Kim,1 H.J. Kim,2 S.C. Kim,1 S.K. Kim,1, ∗ Y.D. Kim,3 J.W. Kwak,1, † Y.J. Kwon,5 J. Lee,1, ‡ J.H. Lee,1 J.I.
Lee,3 M.J. Lee,1 S.J. Lee,1 J. Li,7 X. Li,7 Y.J. Li,7 S.S. Myung,1 S. Ryu,1 J.H. So,2 Q. Yue,7 and J.J. Zhu7
(KIMS Collaboration)
DMRC and Department of Physics and Astronomy, Seoul National University, Seoul, Korea
Department of Physics, Kyungpook National University, Daegu, Korea
Department of Physics, Sejong University, Seoul, Korea
Department of Science Education, Ewha Womans University, Seoul, Korea
Department of Physics, Younsei University, Seoul, Korea
Department of Engineering Physics, Tsinghua Universuty, Beijing, China
Department of Engineering Physics, Tsinghua University, Beijing, China
(Dated: November 4, 2018)
The Korea Invisible Mass Search (KIMS) experiment presents new limits on the WIMP-nucleon
cross section using data from an exposure of 3409 kg·d taken with low-background CsI(Tl) crystals
at Yangyang Underground Laboratory. The most stringent limit on the spin-dependent interaction
for a pure proton case is obtained. The DAMA signal region for both spin-independent and spin-
dependent interactions for the WIMP masses greater than 20 GeV/c2 is excluded by the single
experiment with crystal scintillators.
PACS numbers: 95.35.+d, 14.80.Ly
The existence of dark matter has been widely sup-
ported by many astronomical observations on vari-
ous scales [1][2][3]. Weakly interacting massive parti-
cles (WIMPs) are a good candidate for dark matter well
motivated by cosmology and supersymmetric models [4].
The Korea Invisible Mass Search (KIMS) experiment has
developed low-background CsI(Tl) crystals to detect the
signals from the elastic scattering of WIMP off the nu-
cleus [5][6][7]. Both 133Cs and 127I are sensitive to the
spin-independent (SI) and spin-dependent (SD) interac-
tions of WIMPs. Recently, the role of CsI in the direct
search for SD WIMP for pure proton coupling has been
pointed out [8]. It is worth noting that 127I is the dom-
inant target for the SI interactions in the DAMA exper-
iment. The pulse shape discrimination (PSD) technique
allows us to statistically separate nuclear recoil (NR) sig-
nals of WIMP interactions from the electron recoil (ER)
signals due to the gamma ray background [9][10].
The KIMS experiment is located at the Yangyang Un-
deground Laboratory (Y2L) at a depth of 700 m under an
earth overburden. Details of the KIMS experiment and
the first limit with 237 kg·d exposure data can be found
in the previous publication [11]. Four low-background
CsI(Tl) crystals are installed in the Y2L and operated
at a temperature of T = 0◦C. Throughout the exposure
period, the temperature of the detector was kept sta-
ble to within ±0.1◦C. Green-enhanced photomultiplier
tubes (PMTs) are mounted at both ends of each crystal.
The signals from the PMTs are amplified and recorded
by a 500 MHz FADC. Each event is recorded for a pe-
riod of 32 µs. Both PMTs on each crystal must have at
least two photoelectrons within a 2 µs window to form an
event trigger. We obtained 3409 kg·d WIMP search data
TABLE I: Crystals used in this analysis and amount of data
for each crystal
Crystal mass (kg) data (kg·days)
S0501A 8.7 1147
S0501B 8.7 1030
B0510A 8.7 616
B0510B 8.7 616
Total 34.8 3409
with four crystals, as shown in Table I. The energy is cali-
brated using 59.5 keV gamma rays from an 241Am source.
For calibration of the mean time, a variable used for the
PSD, NR events are obtained with small crystals ( 3 cm ×
3 cm × 3 cm ) using an Am-Be neutron source. Compton
scattering events taken with the WIMP search crystals
using the 137Cs source are used to determine the mean
time distribution of the gamma background. Compton
scattering events are also taken with the small crystals
to verify that the mean time ditributions for both the test
crystals and the WIMP search crystals are the same. In
order to understand the nature of the PMT background,
a dominant background at low energies, acrylic boxes are
mounted on the same PMTs used for the crystals. The
data obtained using this setup is used to develop the cuts
for the rejection of PMT background.
Since the decay time of the scintillation light in the
CsI(Tl) crystal is rather long, photoelectrons are well
separated at low energies and thereby enabling recon-
struction of each photoelectron. The time distribution of
photoelectrons in an event is fitted to a double exponen-
http://arxiv.org/abs/0704.0423v2
 sec)µMean Time (
FIG. 1: (color online). MT distribution of NR events (open
squares), ER events (open circles) and WIMP search
data (filled triangles) of S0501A crystal in the 5-6 keV range.
Fitted PDF functions are overlayed. χ2/DOF =0.8 and 1.3
with DOF=38 and 35 for NR and ER events respectively.
tial function given by
f(t) =
−(t− t0)
−(t− t0)
where τf and τs are decay time constants of fast and
slow components, respectively, R is ratio between two
components, and t0 is the time of the first photoelectron
in the event. The mean time (MT ) of each event is then
calculated using these quantities as
t · f(t)dt/
f(t)dt.
With this method, an improvement in PSD is achieved
over the previous analysis where we used a simple math-
ematical mean [11]. In order to reject the PMT back-
ground, we applied cuts to the fit variable, τf . The ratio
between the maximum log likelihood value of the dou-
ble exponential fit and that of the single exponential fit
is also used to reject the PMT background, since PMT
background events tend to be shaped as single exponen-
tial decay. To reject the background that originates from
the radioactivity of the PMT, the asymmetry between
the signals from two PMTs is applied. Finally events in
which signals are recorded in more than one crystal are
rejected. The event selection efficiency was estimated
by applying the same analysis cuts to the neutron and
gamma calibration samples. The efficiency depends on
the measured energy and ranges from 30% at 3 keV to
60% above 5 keV.
The estimation of the NR event rate is performed in
each 1 keV bin from 3 to 11 keV for each crystal. TheMT
distributions of NR events and ER events are compared
with the WIMP search data in Fig. 1 for the 5-6 keV
energy range. The probability density functions (PDF)
for the ER and NR events are obtained by fitting these
distributions. An unbinned maximum likelihood fit is
Electron Equivalent Energy (keV)
3 4 5 6 7 8 9 10 11
FIG. 2: (color online). Extracted NR event rates of the
S0501A (open circles), S0501B (filled circles), B0510A (filled
squares), and B05010B (filled triangles) crystals and only sta-
tistical errors (1σ) are shown. The points are shifted with
respect to each other on the x-axis to avoid overlapping.
performed with the log(MT ) distribution of the WIMP
search data using the likelihood function,
× exp{−(NNR,i +NER,i)}
[NNR,iPDFNR,i(xk) +NER,iPDFER,i(xk)],
where the index i denotes the i-th energy bin; n =
NNR,i +NER,i is the total number of events; NNR,i and
NER,i are the numbers of NR and ER events, respec-
tively; PDFNR,i and PDFER,i are PDFs of NR and ER
events, respectively; and xk = log(MT ) for each event.
The NR event rates obtained for each bin and for each
crystal after efficiency correction are shown in Fig. 2.
The extracted NR event rates are consistent with a null
observation of the WIMP signal.
In order to obtain the expected measured energy spec-
trum of a WIMP signal including instrumental effects,
a Monte Carlo (MC) simulation with GEANT4 [12] is
used. A recoil energy spectrum is generated for each
WIMP mass with the differential cross section, form fac-
tor, and quenching factor, as described in Ref. [13]. The
spin-dependent form factor for 133Cs calculated by Toiva-
nen [14] is used, while for 127I, Ressell and Dean’s cal-
culation [15] is used. The photons generated with the
fitted decay function described above are propagated to
the PMT and digitized in the same manner as in the ex-
periment. Subsequently, the photoelectrons within given
time windows are counted to check the trigger condition
and to calculate energy. In this manner, the trigger ef-
ficieny and energy resolution is accounted for in the ex-
pected energy spectrum. The trigger efficiency is found
to be higher than 99% above 3 keV. The simulation is
verified with the energy spectrum obtained using 59.5
keV gamma rays from 241Am. The peak position and
TABLE II: Spin expectation values for 133Cs and 127I
Isotope J < Sp > < Sn > Reference
133Cs 7/2 -0.370 0.003 [16]
127I 5/2 0.309 0.075 [15]
width of the distribution are very well reproduced for
each crystal as described in Ref [11].
The total WIMP rate, R, for each WIMP mass is ob-
tained by fitting the measured energy spectrum to the
simulated one. The 90% confidence level (CL) limit on
R is calculated by the Feldman-Cousins’s approach in
the case of Gaussian with a boundary at the origin [17]
and then converted to the WIMP-nucleus cross section,
σW−A. Subsequently, the limits on WIMP-nucleon cross
section is obtained from Ref. [13][18] as follows:
σW−n = σW−A
where µn,A are the reduced masses of the WIMP-nucleon
and WIMP-target nucleus of mass number A. CA/Cn =
A2 for SI interactions and CA/Cn = 4/3{ap < Sp >
+an < Sn >}2(J + 1)/J for SD interactions. Here ap,
an are WIMP-proton and WIMP-neutron SD couplings
respectively. The spin expectation values used for this
analysis are shown in Table II. Following the “model-
independent” framework [18], we report the allowed re-
gion in two cases for SD interaction: one for an = 0,
and the other for ap = 0. We express the WIMP-nucleon
cross section as follows:
σSIW−n = σW−A
σSDW−n,p = σW−A
µ2n,p
(J + 1)
< Sn,p >2
where we indicate pure proton (p, an = 0) and pure
neutron (n, ap = 0) coupling for SD interaction. We also
present the allowed region in the ap − an plane with the
following relation [18]:
where GF is the Fermi coupling constant.
The uncertainty in the MT distribution results in the
uncertainty of the NR event rate. The limited statistics
of the calibration data and different crystals used for the
neutron calibration and WIMP search data are the ma-
jor sources of this uncertainty. The former is investigated
by varying the fitted parameters in PDF function within
errors. The lattter is estimated by changing the mean of
MT by the difference between the crystals. The system-
atic uncertainties from these two souces are combined in
quadrature resulting in 20-30% of statistical uncertain-
ties depending on the energy bins. In addition, there
WIMP Mass (GeV)
210 310 410
DAMA region
FIG. 3: (color online). Exclusion plot for the SD interaction
in the case of pure proton coupling (an = 0) at the 90%
confidence level
WIMP Mass (GeV)
210 310 410
DAMA region
FIG. 4: (color online). Exclusion plot for for the SD interac-
tion in the case of pure neutron coupling (ap = 0) at the 90%
confidence level
are uncertainties in the MC estimation of the expected
event rates due to the uncertainties in the quenching fac-
tors and the difference of energy resolution between the
MC simulation and the data. The systematic error from
the MC simulation is estimated to be 13.3% of the limits.
These systematic errors are combined with the statistical
error in quadrature in the presented results.
The limits on the SD interactions are shown in Fig. 3
and 4 in the cases of pure proton coupling and pure neu-
tron coupling, respectively. We also show the results ob-
tained from CDMS [19], NAIAD [20], SIMPLE [21], and
-6 -4 -2 0 2 4 6
FIG. 5: (color online). Allowed region (90% confidence level)
in ap − an plane by KIMS data (inside the solid line contour)
for 50 GeV WIMP mass. Results of CDMS [19](dotted line)
and NAIAD [20](dot-dashed line) are also shown.
WIMP Mass (GeV)
210 310 410
FIG. 6: (color online). Exclusion plot for the SI interactions
at the 90% confidence level.
PICASSO [22]. The DAMA signal region is taken from
Ref [23]. Our limit provides the lowest bound on the
SD interactions in the case of pure proton coupling for a
WIMP mass greater than 30 GeV/c2. The allowed region
in the ap − an plane for the WIMP mass of 50 GeV/c2 is
also shown in Fig. 5 together with the limits from CDMS
and NAIAD. The limit for the SI interactions is shown
in Fig. 6 together with the results of CDMS [24], EDEL-
WEISS [25], CRESST [26], ZEPLIN I [27], and the 3σ
signal region of DAMA (1-4) [28]. Although there are
several experiments that reject the DAMA signal region,
this is the first time that it is ruled out by a crystal de-
tector containing 127I, which is the dominant nucleus for
the SI interactions in the NaI(Tl) crystal.
In summary, we report new limits on the WIMP-
nucleon cross section with CsI(Tl) crystal detectors using
3409 kg·d exposure data. The DAMA signal regions for
both SI and SD interactions are excluded for the WIMP
masses higher than 20 GeV/c2 by the single experiment.
The most stringent limit on the SD interaction in the
case of purely WIMP-proton coupling is obtained.
The authors thank Dr. J. Toivanen and M. Korte-
lainen for the calculation of the SD form factor as well
as for the useful discussions. This work is supported by
the Creative Research Initiative Program of the Korea
Science and Engineering Foundation. We are grateful to
the Korea Middland Power Co. Ltd. and the staff mem-
bers of the YangYang Pumped Storage Power Plant for
providing us the underground laboratory space.
∗ skkim@hep1.snu.ac.kr
† Current address: National Cancer Center, Ilsan, Korea
‡ Current address: Department of Physics, Ewha Womans
University, Seoul, Korea
[1] K. G. Begeman, A. H. Broeils, and R. H. Sanders, Mon.
Not. Roy. Astron. Soc. 249, 523 (1991).
[2] D. N. Spergel et al., Astrophys. J. Suppl. 148, 175 (2003).
[3] M. Tegmark et al., Phys. Rev. D 69, 103501 (2004).
[4] G. Jungman, M. Kamionkowski, and K. Griest, Phys.
Rep. 267, 195 (1996).
[5] H. S. Lee et al., Nucl. Instr. Meth. A 571, 644 (2007).
[6] Y. D. Kim et al., J. Korean. Phys. Soc. 40, 520 (2002).
[7] Y. D. Kim et al., Nucl. Instr. Meth. A 552, 456 (2005).
[8] T. A. Girard and F. Giuliani, Phys. Rev. D 75, 043512
(2007).
[9] H. J. Kim et al., Nucl. Instr. Meth. A 457, 471 (2001).
[10] H. Park et al., Nucl. Instr. Meth. A 491, 460 (2002).
[11] H. S. Lee et al., Phys. Lett. B 633, 201 (2006).
[12] S. Agostinelli et al., Nucl. Instr. Meth. A 506, 250 (2003).
[13] J. D. Lewin and P. F. Smith, Astropart. Phys. 6, 87
(1996).
[14] J. Toivanen and M. Kortelainen (2006), private commu-
nication.
[15] M. T. Ressell and D. J. Dean, Phys. Rev. C 56, 535
(1997).
[16] F. Iachello, L.M.Krauss, and G. Maino, Phys. Lett. B
254, 220 (1991).
[17] G. J. Feldman and R. D. Cousins, Phys. Rev. D 57, 3873
(1998).
[18] D. R. Tovey et al., Phys. Lett. B 488, 17 (2000).
[19] D. S. Akerib et al., Phys. Rev. D 73, 011102 (2006).
[20] G. J. Alner et al., Phys. Lett. B 624, 186 (2005).
[21] T. A. Girard et al., Phys. Lett. B 621, 233 (2005).
[22] M. Barnabe-Heider et al., Phys. Lett. B 624, 186 (2005).
[23] C. Savage, P. Gondolo, and K. Freese, Phys. Rev. D 70,
123513 (2004).
[24] D. S. Akerib et al., Phys. Rev. Lett. 96, 011302 (2006).
[25] V. Sanglard et al., Phys. Rev. D 71, 122002 (2005).
[26] G. Angloher et al., Astropart. Phys. 23, 325 (2005).
[27] G. J. Alner et al., Astropart. Phys. 23, 444 (2005).
[28] R. Bernabei et al., Phys. Lett. B 480, 23 (2000); R. Bern-
abei et al., Riv. Nuovo. Cim. 26, 1 (2003).
mailto:skkim@hep1.snu.ac.kr
ABSTRACT
  The Korea Invisible Mass Search(KIMS) experiment presents new limits on
WIMP-nucleon cross section using the data from an exposure of 3409 kgd taken
with low background CsI(Tl) crystals at Yangyang underground laboratory. The
most stringent limit on the spin dependent interaction for pure proton case is
obtained. The DAMA signal region for both spin independent and spin dependent
interactions for the WIMP mass higher than 20 GeV/c^2are excluded by the single
experiment with crystal scintillators.

<|endoftext|><|startoftext|>
Stopping effects in U+U collisions with a beam energy of 520 MeV/nucleon
Xiao-Feng Luo,1, ∗ Xin Dong,1 Ming Shao,1 Ke-Jun Wu,2 Cheng Li,1
Hong-Fang Chen,1 and Hu-Shan Xu3
University of Science and Technology of China, Hefei, Anhui 230026, China
Institute of Particle Physics, Hua-Zhong Normal University, Wuhan, Hubei 430079, China
Institute of Modern Physics, Chinese Academy of Sciences, LanZhou, Gansu 730000, China
(Dated: November 4, 2018)
A Relativistic Transport Model (ART1.0) is applied to simulate the stopping effects in tip-tip
and body-body U+U collisions, at a beam kinetic energy of 520 MeV/nucleon. Our simulation
results have demonstrated that both central collisions of the two extreme orientations can achieve
full stopping, and also form a bulk of hot, dense nuclear matter with a sufficiently large volume
and long duration, due to the largely deformed uranium nuclei. The nucleon sideward flow in
the tip-tip collisions is nearly 3 times larger than that in body-body ones at normalized impact
parameter b/bmax < 0.5, and that the body-body central collisions have a largest negative nucleon
elliptic flow v2 = −12% in contrast to zero in tip-tip ones. Thus the extreme circumstance and the
novel experimental observables in tip-tip and body-body collisions can provide a good condition and
sensitive probe to study the nuclear EoS, respectively. The Cooling Storage Ring (CSR) External
Target Facility (ETF) to be built at Lanzhou, China, delivering the uranium beam up to 520
MeV/nucleon is expected to make significant contribution to explore the nuclear equation of state
(EoS).
PACS numbers: 24.10.Lx,25.75.Ld,25.75.Nq,24.85.+p
I. INTRODUCTION
In recent years, the ultra-relativistic high energy heavy
ion collisions performed at SPS/CERN and RHIC/BNL
sNN ∼ 10 − 200 GeV) focus on high temperature
and low baryon density region in nuclear matter phase
diagram [1] to search a new form of matter with par-
tonic degree of freedom-the quark-gluon plasma (QGP)
[2, 3, 4, 5]. However, no dramatic changes of experimen-
tal observables, such as jet-quenching, elliptic flow and
strangeness enhancement, have been observed yet [6]. On
the other hand, the heavy ion collisions performed at the
BEVALAC/LBNL and SIS/GSI [7, 8] in last two decades
were used to produce hot and compressed nuclear mat-
ter to learn more about the nuclear equation of state
(EoS) [13, 14] at high baryon density and low temper-
ature region of the phase diagram. Although we have
made great efforts to study the nuclear EoS, theoreti-
cally and experimentally, a solid conclusion can hardly
be made. Then, it is still worthwhile to systematically
study on the collision dynamics as well as the EoS observ-
ables. Recently, for more understanding of the nuclear
matter phase diagram and EoS at high net-baryon den-
sity region, it is proposed to collide uranium on uranium
target at External Target Facility (ETF) of Cooling Stor-
age Ring (CSR) at Lanzhou, China with a beam kinetic
energy of 520 MeV/nucleon. [10].
Uranium is the largest deformed stable nucleus, and
has approximately an ellipsoid shape with the long and
short semi-axis given by Rl = R0(1 + 2δ/3) and Rs =
∗contact author: science@mail.ustc.edu.cn
FIG. 1: (Color online) (a) body-body collisions (b) tip-tip
collisions
R0(1 − δ/3), respectively, where R0 = 7 fm is the effec-
tive spherical radius and δ = 0.27 is the deformation pa-
rameter [9]. Consequently, one has Rl/Rs = 1.3. In our
simulation, we consider two extreme orientations: the so-
called tip-tip and body-body patterns with the long and
short axes of two nuclei are aligned to the beam direc-
tion, respectively [12], see Fig. 1 for illustration. The
two types of orientations can be identified in random ori-
entations of U+U collisions by making proper cutoffs in
experimental data, such as the particle multiplicities, el-
liptic flow and so on [10, 11]. With the two extreme
collision orientations, some novel stopping effects which
are believed responsible for some significant experimental
observables, such as particle production, collective mo-
tion as well as attainable central densities, can be ob-
tained. Due to the large deformation of the uranium
nuclei [11, 12] , it is expected that the tip-tip collisions
can form a higher densities nuclear matter with longer
duration than in body-body or the spherical nuclei colli-
sions, which is considered to be a powerful tool to study
http://arxiv.org/abs/0704.0424v2
the nuclear matter phase transition at high baryon den-
sity [12], and the body-body central collisions may reveal
a largest out-of-plane elliptic flow (negative v2) at high
densities, which can be a sensitive probe to extract the
early EoS of the hot, dense nuclear matter [12, 17]. The
novel experimental observables can be effectively utilized
to study the possible nuclear matter phase transition and
the nuclear EoS [12, 13, 14, 15, 16, 17, 18, 19, 20]. For
comparing with tip-tip and body-body collisions, a type
of gedanken ”sphere-sphere” collisions without deforma-
tions of uranium nuclei are also included in the simula-
tion.
The ART1.0 model [21, 22] derived from Boltzmann-
Uehling-Uhlenbeck (BUU) model [23] has a better treat-
ment of mean field and Pauli-Blocking effects [23] than
cascade models [24]. The fragments production mech-
anism and partonic degree of freedom are not present
in the ART1.0 model. A soft EoS with compressibility
coefficient K = 200 MeV is used throughout the simu-
lation and the beam kinetic energy of uranium nuclei is
set to 520 MeV/nucleon if not specifically indicated. In
the next section, we discuss about the stopping power
ratio and selection of impact parameter b. In Sec. 3,
the evolution of baryon and energy densities as well as
thermalization of central collision systems are studied. In
Sec. 4, some experimental observables, such as nucleon
sideward flow and elliptic flow are also investigated. We
summarize our results in Sec. 5.
II. STOPPING POWER OF TIP-TIP AND
BODY-BODY COLLISIONS
Large stopping power can lead to remarkable pressure
gradient in the compressed dense matter. It is generally
also considered to be responsible for transverse collec-
tive motion [25], the maximum attainable baryon and
energy densities as well as thermalization of collision sys-
tems. Thus, the study of the stopping power in U+U
collisions may provide important information for under-
standing the nuclear EoS and collision dynamics.
A. Selection of impact parameter
The nuclear stopping power and geometric effects in
U+U collisions rely strongly on the impact parameter
b. Considering the conceptual design of the CSR-ETF
detector [10], two methods are invoked here to estimate
the impact parameter. The first one is the multiplicity
of forward neutrons with polar angle θ < 20o in the lab
frame which can be covered by a forward neutron wall.
The other method is to make use of the parameter Erat
[26], which is the ratio of the total transverse kinetic en-
ergy to the total longitudinal one. The particles are also
required to be within θ < 20o in the lab frame, while
the two qualities are calculated within the center of mass
0 0.2 0.4 0.6 0.8 1
tip-tip
body-body
0 0.2 0.4 0.6 0.8 10
maxb/b
FIG. 2: Upper: Forward neutron multiplicity and Lower:
Erat, as a function of normalized impact parameter b/bmax in
both tip-tip and body-body collisions.
system (c.m.s.).
Erat =
Ezi (1)
The normalized impact parameter b/bmax is used to rep-
resent centralities of tip-tip and body-body collisions and
the bmax of the two cases are quite different from each
other. As shown in Fig. 2, with either method, obvious
linear dependence of the normalized impact parameter
are demonstrated in both tip-tip and body-body near
central collisions. Then, the two methods can be com-
bined to determine the impact parameter to identify the
most central collision events in both tip-tip and body-
body collisions.
B. Stopping power ratio definition and evolution
It is difficult to obtain a universally accepted estimate
of the nuclear stopping power in heavy ion collisions due
to a proliferation of definitions of the concept [27]. The
stopping power ratio R [28] is employed to measure the
degree of stopping and defined as:
|Ptj |/
|Pzj | (2)
, the total nucleon transverse momentum |Ptj | divided by
the total absolute value of nucleon longitudinal momen-
tum |Pzj | in the c.m.s.. The ratio is wildly used to de-
scribe the degree of thermalization and nuclear stopping
by low and intermediate energies heavy ion collisions.
It’s a multi-particle observable on an event-by-event ba-
sis, which for an isotropic distribution is unity.
Fig. 3 shows the time and normalized impact param-
eter dependence of the stopping ratio R for three con-
ditions: tip-tip, body-body and sphere-sphere collisions.
0 10 20 30 40
=0max(a)b/b
tip-tip
body-body
sphere-sphere
0.2 0.4 0.6 0.8 1
(b)Minibias
tip-tip
body-body
sphere-sphere
t(fm/c) maxb/b
FIG. 3: (Color online) (a)The time evolution of the stopping
ratio R in tip-tip, body-body and sphere-sphere central colli-
sions, and (b) the stopping ratio R as a function of b/bmax in
minimum biased collisions.
When the ratio R reaches the value of 1, full stopping of
the collision system is considered to be achieved, and the
momenta is also isotropy, which are not sufficient but nec-
essary for thermal equilibrium of collision systems [28].
For R > 1, it can be explained by preponderance of mo-
mentum flow perpendicular to the beam direction [29]. It
is shown that all of the three conditions can achieve full
stopping when the stopping ratio R=1, the correspond-
ing time for body-body and tip-tip central collisions are
about 15 fm/c and 25 fm/c, respectively. Larger stopping
ratio and faster evolution to full stopping are observed
for body-body central collisions than tip-tip and sphere-
sphere ones at the early stage, which may indicate a more
violent colliding process for body-body central collisions
due to the sizable initial transverse overlap region. Al-
though the stopping ratio of tip-tip central collisions is
lowest than the other two cases at the early time, it raises
sharply later and even exceeding one. So, it means that
longer reaction and passage time can be obtained in tip-
tip central collisions than body-body and sphere-sphere
ones, which may indicate the nucleons in tip-tip colli-
sions can undergo more binary collisions to reach higher
transverse momentum.
In Fig. 3(b), the R of the three conditions are gradu-
ally decrease with the increase of the normalized impact
parameter. When b/bmax < 0.5, the ratio is always larger
for tip-tip collisions than the other two cases, while for
b/bmax > 0.5 all of the three conditions almost have the
same stopping power ratio.
III. BARYON, ENERGY DENSITY AND
THERMAL EQUILIBRIUM
Considering the discrepancy of stopping power be-
tween tip-tip and body-body collisions, it is interesting
4 (a)Baryon Density 
tip-tip
body-body
Au-Au
0 10 20 30 40 50
(b)Energy Density
tip-tip
body-body
Au-Au
t(fm/c)
FIG. 4: The evolution of (a) baryon and (b) energy densities
in tip-tip, body-body and Au+Au central collisions.
to study further about the baryon and energy densities
evolution in both cases. As the full stopping and de-
formation effects in U+U collisions, it is believed higher
local baryon and energy densities system with long du-
ration can be created, which is considered to be a signif-
icant condition to study the nuclear EoS at high bayonic
density region.
A. The evolution of baryon and energy densities
The evolution of baryon and energy densities in the
central zone of tip-tip and body-body as well as Au+Au
central collisions are illustrated in Fig. 4.
In Fig. 4, it is observed the maximum attainable
baryon and energy densities for both tip-tip and body-
body central collisions are about 3.2 ρ0 and 0.8 GeV/fm
respectively, while the Au+Au one are about 2.6 ρ0 and
0.6 GeV/fm3. Both the baryon and energy densities in
U+U collisions are higher than the Au+Au one. Once a
baryon density threshold of ρ > 2.5 ρ0 is required, the
corresponding duration in tip-tip central collisions ∼ 20
fm/c (from ∼ 8 fm/c to ∼ 28 fm/c) is longer than ∼
10 fm/c ( from ∼ 8 fm/c to ∼ 18 fm/c ) of body-body
one, which is as predicted. But the peak densities have
no significant discrepancy between the two cases unlike
those at the energy region of the Alternating Gradient
Synchrotron (AGS) [12], which may be attribute to the
full stopping at the CSR energy.
B. Thermalization of the U+U collision systems
As mentioned before, the stopping ratio R = 1 is a
necessary but not sufficient condition for thermal equi-
librium of the collision system. In order to approach a
thermal equilibrium, a long duration of reaction is needed
for nucleons to undergo sufficient binary collisions. As
0 10 20 30 40 50
tip-tip
body-body
Au-Au
0 10 20 30 40 50
tip-tip
body-body
Au-Au
t(fm/c)
FIG. 5: The evolution of (a) volume with high density (ρ >
2.5ρ0) in tip-tip, body-body and Au+Au central collisions,
and (b) the scaled mean kinetic energy 2
< Ek >, within a
sphere of radius 2fm around the system mass center.
shown in Fig. 4(a), obvious long duration has been ob-
tained in both tip-tip and body-body central collisions.
It is therefore possible thermal equilibrium at the time
of freeze-out can be achieved.
The Fig. 5(a) is the evolution of the volume with the
high baryon density(ρ > 2.5 ρ0) for tip-tip, body-body
and Au+Au central collisions, respectively. Both tip-
tip and body-body central collisions have larger volumes
than Au+Au one at the same beam kinetic energy 520
MeV/nucleon. Although the maximum volume attain-
able for body-body central collisions(∼ 220 fm3) is about
two times larger than tip-tip one(∼ 120 fm3), the peak
volume of tip-tip central collisions lasts a much longer
time of ∼ 10 fm/c (from ∼ 15 fm/c to ∼ 25 fm/c) and
much more stable than body-body one. To estimate the
temperature at the freeze-out time, the scaled mean ki-
netic energy of all hadrons in a sphere of radius 2fm
around the system mass center is calculated as 2
< Ek >
[22], which is utilized to reflect the thermalization tem-
perature T of the collision system approximately. As
illustrated in Fig. 5 (b), both tip-tip and body-body
central collisions show a flat region about 75 MeV and
the corresponding time range are about 10 fm/c to 28
fm/c and 10 fm/c to 18 fm/c, respectively. Considering
the time range of the flat region in Fig. 5 (b) associat-
ing with the corresponding range in Fig. 5 (a) and also
looking back to Fig. 4, we obtain a large volume of hot,
dense nuclear matter in both tip-tip and body-body cen-
tral collisions. Consequently, the extreme circumstance
of sufficiently high temperature and density for a signif-
icant large volume and long duration [12, 22] has been
formed in tip-tip and body-body central collisions, which
can provide a good opportunity to study the nuclear EoS
as well as particles in medium properties, especially for
tip-tip case.
The time of freeze out should be cautiously determined
(a)tip-tip π+∆+*N
0 10 20 30 40 50 60
80 (b)Body-Body π+∆+*N
t(fm/c)
FIG. 6: Evolution of the multiplicity of the free pion, N∗ +
∆, N∗ +∆+ π in (a)tip-tip ,and (b)body-body central colli-
sions.
for estimating the thermalization temperature of colli-
sion system. In Fig. 6 the multiplicity evolution of free
pion which are not bounded in baryon resonances and
pion still bounded inside the excited baryon resonances
(∆, N∗) (unborn pion) are displayed. At the Lanzhou
CSR energy region (520 MeV/nucleon), the production
and destruction of the ∆ resonances are mainly through
NN ⇋ N∆ and ∆ → Nπ reactions in which the ∆ decay
rate is always higher than that of the formation of this
resonance and the production of pion is predominated by
the decay of the ∆ resonances (∆ → Nπ) [30]. The total
multiplicity of pion, ∆ and N∗ approaches a saturated
level after a period of evolution, indicating the freeze-
out time about t=28 fm/c and t=18 fm/c for tip-tip
and body-body central collisions, respectively. The larger
maximum attainable total multiplicity of pion, ∆ andN∗
and freeze out earlier indicates a existent of faster evo-
lution and more violently reaction process for the body-
body central collisions than tip-tip case consisting with
the discussing before.
The corresponding temperature about 75 MeV at
freeze-out time can be extracted from the Fig. 5 (b), for
both tip-tip and body-body central collisions. To further
confirm this estimation, both the energy spectrum of the
nucleon and negative-charged pion are studied within the
polar angle range of 900±100 in the c.m.s.. The thermo-
dynamic model [31] predicts that the energy spectra will
be represented by a temperature T which characterizes a
Maxwell-Boltzmann gas
PEdEdΩ
= const× e−Ekin/T (3)
, where P and E are the particle momentum and total
energy in the c.m.s.. Both the energy spectra and the
Boltzmann fit results are shown in Fig. 7. The inverse
slope (e.g. temperature T ) of the nucleons in tip-tip and
body-body central collisions are about 73 MeV and 70
0 0.2 0.4 0.6 0.8
310  
tip-tip
body-body
(a)Nucleon
0 0.2 0.4 0.6 0.8
-π(b)
(GeV)kinE
FIG. 7: (a) Nucleon, and (b) negative-charged pion energy
spectrum at 900 ± 100 in the c.m.s. together with a Maxwell-
Boltzmann fit for both tip-tip and body-body central colli-
sions. The nucleon fit temperature for tip-tip and body-body
are about 73 MeV and 70 MeV, respectively and that of pion
are about 56 MeV and 52 MeV, respectively.
MeV, respectively, which are in good agreement with the
temperature extracted from the Fig. 5(b) at the freeze
out time. The spectra of negative-charged pion show a
different lower temperature than that of nucleon which
may be explained by considering an equilibrated N and
∆ system at thermal freeze out and taking into account
the kinematics of ∆ decay [32]. The nucleon temperature
closely reflects the freeze-out temperature of tip-tip and
body-body central collisions.
In conclusion, thermalization (or near thermalization)
of the collision system corresponding a freeze-out tem-
perature about 75 MeV is likely to be achieved in both
tip-tip and body-body central collisions. However, it’s
also possible that the collision system is still in a non-
equilibrium transport process on its path towards kinetic
equilibration [30].
IV. THE COLLECTIVE FLOW OF U+U
COLLISIONS
Stopping of nuclei in heavy ion collision can lead
to pressure gradient along different directions, result-
ing in collective motion as spectators bounce-off [34]
and participants squeeze-out effects [35]. Since last
two decades, at Bevalac/LBNL and SIS/GSI energies
the so-called ”collective flow” analysis has been estab-
lished [15, 34, 35, 36, 37] to study the collective mo-
tion of the products in heavy ion collisions. The collec-
tive flow resulting from bounce-off and squeeze-out ef-
fects, which can be explained well by the hydrodynamics
model [34, 38], and also be in good agreement with the
experimental data has been observed [39, 40]. Because of
the large deformation of the uranium nuclei, a novel col-
-1 -0.5 0 0.5 1
Soft:tip-tip
Soft:body-body
Cascade:tip-tip
Cascade:body-body
(a)b/b
-1 -0.5 0 0.5 1
(b)b/b
FIG. 8: The mean transverse momentum per nucleon pro-
jected into the reaction plane, < px/A >, as a function
of c.m.s. normalized rapidity is illustrated for tip-tip and
body-body collisions. With normalized impact parameter
cutoff:(a)b/bmax <= 0.5 (b)b/bmax > 0.5.
lective motion is expected [12], to be used to extract the
medium properties and nuclear matter EoS information.
[15, 16, 17, 18, 19, 20].
To perform flow analysis, it is necessary to construct a
imaginary reaction plane defined by direction of the beam
(z) and the impact parameter vector b [43, 45, 46]. In our
simulation, the x− z plane is just defined as the reaction
plane with the beam direction along z positive direction
and the impact parameter vector b along x positive direc-
tion. In last two decades, there are mainly two methods
to study the collective flow at the low and intermediate
energies. One is the sphericity method [28, 34, 41, 42]
which yields the flow angle relative to the beam axis of
the major axis of the best-fit kinetic energy ellipsoid, and
the other is to employ the mean transverse momentum
per nucleon projected into the reaction plane, < px/A >,
to perform nucleon sideward flow analysis [43, 44] which
reflects the spectator bounce-off effects in the reaction
plane. In recent years, it is usual to use an anisotropic
transverse flow analysis method. With a Fourier expan-
sion [47, 48] of the particle azimuthal angle φ distribu-
tion with respect to the reaction plane, different har-
monic coefficients can be extracted, among which the
first harmonic coefficient v1, called directed flow (simi-
lar to sideward flow) and the second harmonic coefficient
v2, called elliptic flow are mostly interested. The ellip-
tic flow reflects the anisotropy of emission particles in
the plane perpendicular to the reaction plane while the
directed flow describes the anisotropy in reaction plane.
The Fourier expansion can be expressed as
∼ 1 +
2vncos(nφ) (4)
Fig. 8 shows nucleon sideward flow, < px/A >, for
both tip-tip and body-body minimum biased collisions
(a)Nucleon Flow Parameter
tip-tip
body-body
Au-Au(500MeV/A)
0 0.2 0.4 0.6 0.8 1
10 2(b)Nucleon v
tip-tip
body-body
Au-Au(500MeV/A)
maxb/b
FIG. 9: (a)The nucleon flow parameter F and (b)the c.m.s.
mid-rapidity ( −0.5 < y0 < 0.5 ), nucleon elliptic flow v2 of
three collision conditions as a function of normalized impact
parameter b/bmax with soft EoS.
as a function of normalized rapidity, y(0) = Ycm/ycm, in
which Ycm represents the particle rapidity in c.m.s. and
ycm is the rapidity of the system mass center. To ex-
tract the nuclear EoS information and also demonstrate
the discrepancies of the nucleon sideward flow in tip-tip
and body-body collisions, the cascade events [49], which
neglect the mean field and pauli blocking effects are em-
ployed here to compare with the soft EoS case. In Fig.
8(a),(b), with a soft EoS, it is noted that either tip-tip
or body-body collisions show a spectator bounce-off ef-
fect revealing an obvious ”S” shape [15, 49] at the mid-
rapidity region of −0.5 < y0 < 0.5, while the cascade one
appear a almost vanishing nucleon sideward flow. It can
be understand by the nucleon sideward flow is related
to the mean field, which is mainly responsible for the
pressure gradient of the stopping nuclei, while the mean
field has a strong dependence of the nuclear EoS. There-
fore, the nucleon sideward flow is thought to be a good
indirect probe to extract the nuclear EoS information,
especially tip-tip case for its largely remarkable sideward
flow. A cutoff on normalized impact parameter is also
applied to explore the impact parameter dependence of
nucleon sideward flow. As shown in Fig. 8(b), when
b/bmax > 0.5 the curves of soft EoS and cascade are al-
most superposed with each other, while for b/bmax < 0.5
large discrepancy is observed. The situation is quite sim-
ilar to Fig. 3 (b), almost the same stopping power for
b/bmax > 0.5 and large discrepancy for b/bmax < 0.5 in
tip-tip and body-body minimum biased collisions, which
means there exists a correlation between nuclear stopping
power and sideward flow [33].
The normalized impact parameter dependence of the
collective flow of nucleon is further studied, by analyzing
the ”flow parameter” F [49] and also elliptic flow v2 for
both tip-tip and body-body as well as Au+Au minimum
biased collisions. The flow parameter F is a customarily
used quality to describe the nucleon sideward flow quan-
titatively defined as
d < px/A >
dy(0)
y(0)=0
the slope of the mean transverse momentum per nucleon
projected into the reaction plane at y(0) = 0.
In Fig. 9(a), with b/bmax > 0.5, the nucleon flow pa-
rameter F of tip-tip and body-body collisions are with
similar value. This similarity, along with the almost same
stopping ratioR in Fig. 3(b), indicates a existence of sim-
ilar pressure gradient effects on nucleon sideward flow in
the two collision orientations. While for b/bmax < 0.5,
the flow parameter F of tip-tip collisions is nearly 3 times
larger than that of body-body case. Even the sideward
flow of Au+Au collisions is larger than the body-body
one. It is further confirmed the tip-tip nucleon sideward
flow is a more sensitive probe to extract the information
of nuclear EoS than that of body-body one. The promi-
nence high of the nucleon sideward flow in tip-tip colli-
sions may be resulted from the stronger pressure gradient
between the participants and spectators in the reaction
plane than body-body one, due to the largely deformed
nuclei.
The normalized impact parameter dependent of nu-
cleon elliptic flow v2 at the mid-rapidity region ( −0.5 <
y0 < 0.5 ) is displayed in Fig. 9(b). A significant neg-
ative elliptic flow v2 at this energy region is consistent
with the excitation function of the elliptic flow studied
before [50]. An largest negative v2 about −12% in body-
body central collisions is observed which reflects the large
geometric and squeeze-out effects in the collisions. While
for tip-tip and Au+Au ones the maximum negative v2 are
obtained at mid-centrality. Since both high baryon, en-
ergy densities and large elliptic flow effects, which reflects
an early EoS of the hot dense compression nuclear matter
[17], are available in body-body central collisions. Thus
the body-body nucleon elliptic flow can also be taken as
a sensitive probe of nuclear EoS. The novel behaviors
of nucleon collective flow in tip-tip and body-body col-
lisions are mainly attributed to the large deformation of
the uranium nuclei.
V. SUMMARY
In summary, the CSR-ETF at Lanzhou provide a good
opportunity to systematically study the nuclear EoS at
the high net-baryon density region of nuclear matter
phase diagram. Due to the novel stopping effects in
largely deformed U+U collisions, the simulation based on
ART1.0 demonstrates that full stopping can be achieved
and also a bulk of hot, high densities nuclear matter with
large volume and long duration have been formed in both
tip-tip and body-body collisions. Large nucleon sideward
flow in tip-tip collisions and the significant negative nu-
cleon elliptic flow in body-body central collisions can pro-
vide a sensitive probe to extract nuclear EoS information.
Thus the extreme circumstance and the novel collective
flow in both tip-tip and body-body collisions can provide
a good condition and sensitive probe to study the nu-
clear EoS, respectively. More experimental observables
of U+U collision dynamics should be further studied, due
to the geometric effects.
VI. ACKNOWLEDGEMENT
This work is supported by National Natural Sci-
ence Foundation of China (10575101,10675111) and the
CAS/SAFEA International Partnership Program for
Creative Research Teams under the grant number of
CXTD-J2005-1. We wish to thank Bao-an Li, Feng Liu,
Qun Wang, Zhi-Gang Xiao and Nu Xu for their valuable
comments and suggestions.
[1] M. A. Stephanov, Int. J. Mod. Phys. A20,4387 (2005);
[2] C. Lourenco et al, Nuclear Physics A698,13-22 (2002);
[3] N. Xu et al, Nucl. Phys. A751,109-126 (2005)
[4] J. Adams et al, Nucl. Phys. A757,102-183 (2005);
[5] K. Adcox et al, Nucl. Phys. A757,184-283 (2005);
[6] P. Jacobs, X. N. Wang, Prog. Part. Nucl. Phys. 54, 433-
534(2005)
[7] E. K. Hyde, Phys. Scr. 10 30-35 (1974) ;
[8] C. Höhne, Nucl. Phys. A749,141c-149c (2005);
[9] A. Bohr and B. Mottelson, Nuclear Structure 2,133
(1975);
[10] Z. G. Xiao, talk presented at QM2006,ShangHai;
[11] E. V. Shuryak, Phys. Rev. C 61,034905 (2000);
[12] Bao-An Li, Phys. Rev. C 61,021903(R) (2000);
[13] P. Danielewicz, nucl-th/0512009;
[14] P. Danielewicz et al, Science 298,1592-1596 (2002);
[15] K. G. R. Doss et al, Phys. Rev. Lett. 57,302 (1986);
[16] J. J. Molitoris et al, Nucl. Phys. A447,13c (1985);
[17] P. Danielewicz et al, Phy. Rev. Lett. 81,2438 (1998);
[18] J. Hofmann et al, Phys. Rev. Lett. 36,88 (1976);
[19] H. Stöocker and W. Greiner, Phys. Rep. 137,277 (1986);
[20] H. Stöocker et al, Z. Phys. A 290,297 (1979);
[21] Bao-An Li and C. M. Ko, Phys. Rev. C 52 ,2037 (1995);
[22] Bao-An Li et al, Inter. Jour. Mod. Phys. E10,267 (2001);
[23] G. F. Bertsch and S. D. Gupta, Phys.Rep. 160,189
(1988);
[24] J. Cugnon, Phys. Rev. C 22,1885 (1980)
[25] A. Andronic et al, Eur. Phys. J. A30,1 (2006);
[26] B. Hong et al, Phys. Rev. C 66,034901 (2002);
[27] S. P. Sorensen et al, CONF-9109221-3:
DE92009006 (1991);
[28] H. Ströbele et al, Phys. Rev. C 27,1349 (1983);
[29] R. E. Renfordt and D. Schall et al, Phys. Rev. Lett.
53,763 (1984);
[30] Bao-An Li and W. Bauer, Phys. Rev. C 44,450 (1991);
[31] J. Gosset et al, Phys. Rev.C 16,629-657 (1977);
[32] R. Brockmann et al, Phys. Rev. Lett. 53,2012 (1984);
[33] W. Reisdorf et al, Phys. Rev. Lett. 92,232301 (2004);
[34] H. A. Gustafsson et al, Phys. Rev. Lett. 52,1590 (1984);
[35] H. H. Gutbrod et al, Phys. Rev. C 42,640 (1990);
[36] R. E. Renfordt et al, Phys. Rev. Lett. 53,763 (1984);
[37] H. Stöcker, J. A. Maruhn, W. Greiner, Phys. Rev. Lett.
44,725 (1980);
[38] H. Stöcker et al, Phys. Rev. Lett. 47,1807 (1981);
[39] W. Scheid et al, Phys. Rev. Lett. 32,741 (1974);
[40] G. Buchwald et al, Phys. Rev. Lett. 52,1594 (1984);
[41] J. Cugnon et al, Phys. Lett. B 109,167 (1982);
[42] M. Gyulassy et al, Phys. Lett. B 110,185 (1982);
[43] P. Danielewicz and and G. Odyniec, Phys. Lett. B
157,146 (1985);
[44] H. A. Gustafsson et al, Z. Phys. A321,389 (1983);
[45] J. Y. Ollitrault, Phys. Rev. D 48,1132 (1993);
[46] R. J. M. Snellings et al, STAR Note 388 (1999) (arXiv:
nucl-ex/9904003);
[47] A. M. Poskanzer and S. A. Voloshin, Phys. Rev. C
58,1671 (1998);
[48] S. A. Voloshin and Y. Zhang, Z. Phys. C 70,665-672
(1994);
[49] F. Rami et al, Nucl. Phys. A646,367-384 (1999);
[50] J. Y. Ollitrault, Nucl. Phys. A638,195-206 (1998);
http://arxiv.org/abs/nucl-th/0512009
http://arxiv.org/abs/nucl-ex/9904003
ABSTRACT
  A Relativistic Transport Model (ART1.0) is applied to simulate the stopping
effects in tip-tip and body-body U+U collisions, at a beam kinetic energy of
520 MeV/nucleon. Our simulation results have demonstrated that both central
collisions of the two extreme orientations can achieve full stopping, and also
form a bulk of hot, dense nuclear matter with a sufficiently large volume and
long duration, due to the largely deformed uranium nuclei. The nucleon sideward
flow in the tip-tip collisions is nearly 3 times larger than that in body-body
ones at normalized impact parameter $b/b_{max}<0.5$, and that the body-body
central collisions have a largest negative nucleon elliptic flow $v_{2}=-12%$
in contrast to zero in tip-tip ones. Thus the extreme circumstance and the
novel experimental observables in tip-tip and body-body collisions can provide
a good condition and sensitive probe to study the nuclear EoS, respectively.
The Cooling Storage Ring (CSR) External Target Facility (ETF) to be built at
Lanzhou, China, delivering the uranium beam up to 520 MeV/nucleon is expected
to make significant contribution to explore the nuclear equation of state
(EoS).

<|endoftext|><|startoftext|>
gleim200802
QED for fields obeying a square root operator equation 
Tobias Gleim 
Instead of using local field equations – like the Dirac equation for spin-1/2 and the Klein-
Gordon equation for spin-0 particles – one could try to use non-local field equations in order 
to describe scattering processes. The latter equations can be obtained by means of the 
relativistic energy together with the correspondence principle, resulting in equations with a 
square root operator. By coupling them to an electromagnetic field and expanding the square 
root (and taking into account terms of quadratic order in the electromagnetic coupling 
constant e), it is possible to calculate scattering matrix elements within the framework of 
quantum electrodynamics, e.g. like those for Compton scattering or for the scattering of two 
identical particles. This will be done here for the scalar case. These results are then compared 
with the corresponding ones based on the Klein-Gordon equation. A proposal of how to 
transfer these reflections to the spin-1/2 case is also presented. 
Free scalar particles are usually described by means of the well-known Klein-Gordon equation (see 
e.g. [4,5,6,10]): 
( ) ( ) 0,ˆ 222 =++∂ txmpt rr φ ,      (1) 
where we have used the momentum operator in configuration space ∇−=
ip̂  (and set the velocity of 
light as well as Planck’s constant h  to one). (1) can be regarded as an iteration of the following square 
root operator equation (see e.g. [1,2,4,5]): 
( ) ( )txpmtxi t ,ˆ, 22
 +=∂ φφ .     (2) 
Introducing an electromagnetic field with a 4-vector potential ( ) ( )AAA r,0=µ  and applying minimal 
coupling (with coupling constant e), i.e. replacing µµ xii ∂∂=∂  by ( )xeAxi µµ −∂∂ , the Klein-
Gordon equation yields (see e.g. [6]) 
( ) ( ) ( ) ( )( ) ( ) ( )[ ] ( )xxAxAexAxAiexmpt φφ µµµµµµ 2222 ˆ +∂+∂−=++∂ r ,  (3) 
where we have used the 4-vector notation ( ) ( ) ( )xtxxx rr ,,0 ==µ  and Einstein’s summation convention. 
Here, the coupling terms on the right hand side of (3) could easily be separated from the term with the 
free particle Hamiltonian on the left hand side of (3). This is unfortunately no longer possible, if one 
couples the non-local equation (2) to the electromagnetic field: 
( ) ( )( ) ( ) ( ) ( )xxeAxxAepmxi t φφφ 022 ˆ +
,    (4) 
because the vector potential A
 appears under the square root. But in a perturbation analysis of 
scattering processes, this property is useful, since such an analysis is based on the assumption that the 
coupling terms make only small contributions to the free particle solution due to the small value of the 
coupling constant e. By rewriting the Hamiltonian in (4), 
( )( ) ( ) ( )22220220 ˆˆˆˆˆ AepAeApepmeAxAepmeAH rrrrrrrr +⋅−⋅−+++=−++=′ ,  (5) 
one can split off a factor with the free Hamiltonian 
ˆ pmH +=′ ,       (6) 
which yields 
( )( ) ( )2122122220 ˆˆˆˆ1ˆ pmpmAepAeApeeAH rrrrrrr +++⋅−⋅−++=′ − .   (7) 
With the above-mentioned assumption, it is now very tempting to expand the first square root factor. 
A very similar approach has already been proposed by [3]. We would like to restrict ourselves to a 
series expansion of the kind  
...ˆˆ1ˆ1 2
1 +−+≈+ yyy       (8) 
containing only constant, linear and quadratic terms, where ŷ  denotes 
( )( ) 12222 ˆˆˆˆ −++⋅−⋅−= pmAepAeApey rrrrrr .     (9) 
Hamiltonian (7) is therefore approximated by 
ˆˆˆˆ HHHH ′+′+′≈′        (10) 
with 
( )( ) 
++⋅+⋅−=′
− 022
ˆˆˆˆ ApmpAApeH
rrrrr
,     (11) 
( ) ( )( ) ( )( ) 
+⋅+⋅+⋅+⋅−+=′
22122
ˆˆˆˆˆˆˆˆ pmpAAppmpAAppmAeH
rrrrrrrrrrrr
,  (12) 
where we have reordered the terms of expansion (9), retaining only terms to (and including the) 
quadratic order in e and recollected powers of ( )22 p̂m r+ . What we have won by (10) is a separation 
of the free Hamiltonian (6) from the coupling terms in (5) that approximately result in the sum of 1Ĥ ′  
and 2Ĥ ′  (i.e. (11) and (12) respectively). For we are only interested in corrections to the free 
Hamiltonian anyway, this approximation might not hurt very much. But however, this separation 
seems not to be a true one, because of the multiple factors of powers of ( )22 p̂m r+  in (11) and (12). 
That is, we need an interpretation of these operators. To this end, it is useful to know that for the free 
square root operator equation (2), an integral representation can be given (see  [1,2]): 
( ) ( ) ( ) ( )( )txtxxxxdtxi t ,:,, 3
rrrrr
φφφ Ω=′′−Ω′=∂ ∫ ,     (13) 
where Ω  denotes an energy distribution 
′−⋅−=′−Ω xxpip e
rrrrr
     (14) 
with 
22 pmpp
r +== ωω .     (15) 
(13) results from the fact, that one would expect to obtain the following momentum space 
representation of (2): 
( ) ( )tptpi pt ,
φωφ =∂  
with ( )tp,~ rφ  denoting the Fourier-transformed ( )tx,rφ . If the operator 22 p̂m r+  corresponds to 
( )∫ ′−Ω′ xxxd
rr3 , the operator ( ) 2122 ˆ −+ pm r  must correspond to ( )∫ ′−Ω′ − xxxd rr13  with 
′−⋅−−− =′−Ω xxpip e
rrrrr 1
,  (16) 
because 
( ) ( ) ( ) ( ) ( )xxxxxxxdxxxxxd ′−=′−′′Ω′′−Ω′′=′−′′Ω′′−Ω′′ −− ∫∫
rrrrrrrrrr 31313 δ   (17) 
with the Dirac distribution 
′−⋅−=′− xxpie
rrrrr
δ .      (18) 
(17) should be an integral representation of the “symbolic equation” 
( ) ( ) ( ) ( ) 1ˆˆˆˆ 21212121 22222222 =++=++ −− pmpmpmpm rrrr . 
Accordingly, terms with the nth power of ( )2122 p̂m r+ , 
( )222 ˆ npm r+ ,       (19) 
correspond to integrals over “the nth power of Ω “: 
′−⋅−=′−Ω xxpinp
rrrrr
.     (20) 
By replacing the operators of type (19) by integrals over “powers of Ω ” as given in (20), 1Ĥ ′  and 2Ĥ ′  
(see (11) and (12), respectively) can now be given a configuration space representation, too. 
With these preparations, we can now address to the quantisation of the scalar field with the aim to be 
able to calculate scattering matrix elements. 
Quantisation of the scalar field and the description of scattering processes 
Starting with Hamiltonian (10), it is now possible to describe scattering processes within the 
framework of quantum field theory. For free scalar particles, a quantum field theoretic ansatz is 
described e.g. in [2] and [4], using (2) and (13), respectively, as equations for a field operator ( )xφ . 
The latter one can be formulated with the help of creation and annihilation operators +pa rˆ  and pa rˆ , 
respectively: 
( ) p
xip ae
φ ,      (21) 
where as usual xptxpxp p
⋅−==⋅ ωµµ  with the 4-vector ( )pp p r,ω=  and the subsequent 
definitions are postulated: 
,00ˆ =pa r       (22) 
0ˆ0 =+pa r ,       (23) 
[ ] ( )ppaa pp ′−=+′ rrrr 3ˆ,ˆ δ ,      (24 a) 
[ ] 0ˆ,ˆ =′pp aa rr , [ ] 0ˆ,ˆ =+′+ pp aa rr      (24 b) 
with the vacuum state 0 . Since we are interested in a quantum theory for bosons, [ ]••,  in (24) 
must be a commutator (for fermions we would use here an anti-commutator instead, cf. e.g. [4]). 
Equations (21) to (24) are identical to those that one would postulate within a non-relativistic quantum 
field theory for bosons. 
For the density of a Hamiltonian, we make the usual ansatz (see e.g. [7]): 
( ) ( )xHxH φφ ′= + ˆˆ       (25) 
which one can retrieve from a density of a Lagrangian (see [2]): 
( ) ( ) ( ) φφφφφφφφ ++++ Ω−Ω−∂−∂=
iL .    (25 a) 
Substituting (10) into (25), we get 
ˆˆˆˆ HHHH ++≈       (26) 
with  
( ) ( )xHxH φφ 00 ˆˆ ′= + ,      (27) 
( ) ( )xHxH φφ 11 ˆˆ ′= + ,      (28) 
( ) ( )xHxH φφ 22 ˆˆ ′= + .      (29) 
(25) is (among other things) motivated by the fact that 
ppp aapdHxd rr ˆˆ
∫∫ = ω      (30) 
reproduces the relativistic analogue of the free non-relativistic Hamiltonian: 
pp aa
pd rr
∫ .      (31) 
(28) together with (29) are the densities of the Hamiltonian to (and including the) quadratic order in e. 
(21) to (24) are valid for free particles, but can also be used for interacting ones, if Dirac’s 
representation is used instead of the so far applied Heisenberg representation. Then, with (28) and (29) 
combined to a Hamiltonian density 
ˆˆˆ HHH I +=       (32) 
for the interaction of scalar bosons with photons, we can now start to calculate scattering matrix 
elements. To this purpose, we need the serial expansion of the S-operator (see e.g. [7]) to the order of 
2e : 
( ) ( ) ...ˆˆ1ˆ 21 +++= SSS        (33) 
with 
( ) ( )( )∫−= xHTxdiS Iˆˆ 41 ,      (34) 
( ) ( ) ( ) ( )( )∫ ∫−= 2124142212 ˆˆˆ xHxHTxdxdiS II ,   (35) 
where we have introduced a time ordering operator 
( ) ( )( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )( )1212010221020121 ˆˆˆˆˆˆˆˆ xHxHTxHxHxxxHxHxxxHxHT IIIIIIII =−+−= θθ   (36) 
with 
tθ .      (37) 
( )1Ŝ  does not only contribute to the expansion (33) with terms of order e, but also to order 2e . 
Therefore, we can split off ( )1Ŝ  into a term 1Ŝ  containing only terms in e,  
( ) ( ) ( ) ( )( )( )( ) ( ) ( ) ( )( )∫ +−+ +Ω⋅+⋅−−= xxAxxpxAxApxTxdieS φφφφ 012141 ˆˆˆ r
  (38) 
with 
( )( ) ( ) ( )∫ −Ω=Ω −− txxxxdx ,111131 rrr φφ      (39) 
and a part containing only terms in 2e : 
( ) ( ) ( )( )( )(
( ) ( ) ( )( ) ( ) ( ) ( )( )( )( ) )∫
Ω⋅+⋅−Ω⋅+⋅−
21212221
111118
,ˆ,,ˆˆˆ
txptxAtxApxxxdpxAxApx
xxAxTxdieS
rrrrrrrrrrrrr
(40) 
where the momentum operator 1p̂
 contains a gradient acting on 1x
 and 2p̂
 acting on 2x
. In (38) and 
(40), we have already substituted (32) into (34) and replaced powers of ( )2122 p̂m r+  by integrals over 
“powers of Ω ” (see (20)) in (11). Thus we can rewrite (34) as 
1 ˆˆˆ SSS += .       (41) 
The time ordering operator appearing in ( )1Ŝ  can be left out, because it contains only one time. In ( )2Ŝ  
we only want to retain terms of order 2e , therefore IĤ  can be approximated by 1Ĥ : 
( ) ( ) ( ) ( ) ( )( )( )( ) ( ) ( ) ( )(
( ) ( ) ( )( )( )( ) ( ) ( ) ( )( )( )( )
( ) ( ) ( ) ( ) ( ) ( ))22021101
222221
111114
111112
xxAxxxAx
xpxAxApxxpxAxApx
xxAxxpxAxApxTxdxdieS
Ω⋅+⋅Ω⋅+⋅+
Ω⋅+⋅−−= ∫ ∫
rrrrrrrr
. (42) 
Here, the time ordering operator must be taken into account, because there are two times: 1t  and 2t . 
We have already quantised the field of scalar bosons, but must do now the corresponding with the 
electromagnetic field. Since the scalar field in (38), (40) and (42) couple in a different way to the 
vector  potential A
 than to the scalar potential 0A , the choice of a Coulomb gauge seems to be 
appropriate: 
0=⋅∇ A
.       (43) 
Then the field equations take on the form 
( ) jAA tt r
=∇∂+∇−∂ 022 ,     (44 a) 
ρ−=∇ 02 A
       (44 b) 
with charge and current densities ρ  and j
, respectively. From (44 b) we can see that in this 
gauge the scalar potential is just a c-number: 
( ) ( )∫ ′−
xdtxA rr
,      (45) 
whereas the vector potential A
 becomes an operator when being quantised. For a free field, A
 can be 
chosen like (see e.g. [4,7,10]) 
( ) ( )λε
λλ ,ˆˆ~22
kecec
xA xik
rr∫ ∑
⋅+⋅− +=     (46) 
with the usual photon frequency 
   2~ kk
and with creation and annihilation operators λkc
rˆ  and +λkc
rˆ , respectively, for photons: 
[ ] ( )kkcc
rr −′= ′
3ˆ,ˆ δδ λλλλ , [ ] [ ] 0ˆ,ˆˆ,ˆ == ++ ′′′′ λλλλ kkkk cccc rrrr ,   (47) 
and 0A  would even vanish. The polarisation vectors ( )λε ,krr  fulfil the relation (see e.g. [4, 7]): 
( ) ( )
δλελε
.      (48) 
Due to the Coulomb gauge condition (43), 
( ) 0, =⋅ λε kk rrr          (49) 
is valid, too. 
In the following sections, we are going to calculate ( )1Ŝ  and ( )2Ŝ  by substituting the field operators 
( )xφ  and ( )xA
 from (21) and (46) as well as the distributions (20). Since we are considering 
electromagnetic interactions between (charged) spin-0 bosons, we have to take 0A  in (45) into 
account, too. Therefore, we first have to find out what the density of charge in (45) will be in this case. 
This can be done by coupling the Lagrangian density for free spin-0 bosons (25 a) to an 
electromagnetic field with the aid of the minimal coupling scheme µµµ eAii −∂→∂ . Then in that 
Lagragian density an extra term φφ +0eA  (the only one with an 0A ) emerges. If one subsequently 
regards the sum of that spin-0 and the electromagnetic Lagrangian, it is possible to obtain from it the 
electromagnetic field equations by means of the Euler-Lagrange equations. The former equations then 
contain a charge density φφ +e . That is why we can set ρ  in (45) to φφ + . 
With these results, one can then continue to calculate scattering matrix elements. 
The first term we calculate is 
( ) ( ) ( )( )( )( )∫ −+ Ω⋅+⋅= xpxAxApxxdI φφ 141 ˆˆ:ˆ r
     (50) 
appearing in ( )1Ŝ  (see (38)). To this end, it is useful to recognise that by means of (16) we get 
( )( )
⋅−−− =Ω p
φ .     (51) 
With (51) and integration by parts, (50) yields: 
( ) ( ) ( ) ( )( )++ −−++−+⋅= ∫ ∑ λλ
ωπ kk
ckppckppaa
pdpdkd
I rrrr
ˆˆˆˆ,
, (52) 
where 
( ) ( ) ( )kppkpp kpp
±−±−=±− 21
δωωωδδ     (53) 
and, of course, the a operators commute with the c operators. In 12Ŝ  (40), the first integral can be 
expressed in a similar way: 
( ) ( )( )( )
( ) ( )
( ) ( )(
( ) ( ) )
22112211
22112211
~2~22
ccppkkccppkk
ccppkkccppkk
pdpdkdkd
xxAxxdI
−+−−+−++−
+−+−+−++
   (54) 
After a quite lengthy but straightforward calculation, the second integral in (40) yields 
( ) ( ) ( )( ) ( )
( ) ( )( )( )( ) =Ω⋅+⋅
⋅−Ω⋅+⋅=
212122
111112
,ˆ,,ˆ
txptxAtxAp
xxpxAxApxxdxdI
rrrrrrr
rrrrrr
                                           (55) 
( ) ( )
( ) ( ) ( ) ( )( )
( ) ( ) ( ) ( )( )
−+−−+−++−
+−+−+−++−
11221122
11221122
~2~22
21222
ccppkkccppkk
ccppkkccppkk
ppkaa
pdpdkdkd
Finally, we want to determine the second term in ( )2Ŝ  (see (42)) which is not just like the product of 
two operators 1̂I  due to the time ordering operator as defined in (36). Unfortunately, we cannot use 
the famous Wick theorem, because the scalar field operator (21) contains only contributions to 
positive energy solutions: it does not consist of a sum of both positive and negative energy solutions as 
it would be the case for the field operator of the Klein-Gordon equation. Due to the symmetry of the 
time ordering operator (36) in its arguments, we may conclude 
( ) ( )( ) ( ) ( ) ( )2102012414212414 ˆˆ2ˆˆ xHxHxxxdxdxHxHTxdxd IIII ∫∫∫∫ −= θ .  (56) 
With this property, the calculation of the second term in (42) can be simplified a bit: 
( ) ( ) ( )( )( )( )(
( ) ( ) ( )( )( )( ) =Ω⋅+⋅
⋅Ω⋅+⋅=
22222
111112
xpxAxApx
xpxAxApxTxdxdI
     (57 a) 
( ) ( ) ( ) ( )
( ( ) ( )
( ) ( )( ) ( )( )
( ) ( )
( ) ( )( ) ( )( )
( ) ( )
( ) ( )( ) ( )( )
( ) ( )
( ) ( )( ) ( )( )
−−−−−−−
⋅−−′−′−′
++−−−−−−
⋅+−′−′−′
+−−−+−−−
⋅−−′+′−′
++−−+−−−
⋅+−′+′−′
21212
21212
21212
21212
21212
~exp~exp
~exp~exp
~exp~exp
~exp~exp
,,ˆˆˆˆ
~2~22
tititt
kppkppcc
tititt
kppkppcc
tititt
kppkppcc
tititt
kppkppcc
kaaaa
pdpdpdpdkdkd
kppkpp
kppkpp
kppkpp
kppkpp
ωωωωωωθ
ωωωωωωθ
ωωωωωωθ
ωωωωωωθ
rrrrrr
rrrrrr
rrrrrr
rrrrrr
The two integrals over the θ  function can be performed by means of the introduction of the two 
variables 
21: tt −=τ , 
    21: ttT +=         (58) 
with the Jacobian  
( ) 2
A linear combination of these variables  
21 tbtaTBA +=+τ       (59 a) 
can be expressed by means of 
( )baA −=
1 ,         (59 b) 
             ( )baB +=
(57) contains four terms of the subsequent type that can be simplified with the help of (58) and (59): 
( ) ( ) ( )
( ) ( )( ) ( )( )
( ) ( ) ( )( )τωωτθ
τωωτθ
21221
2122122
2211212
=+−−−
titixpett
   (60 a) 
The function θ  can be expressed by an integral in the complex plane (see e.g. [6,8]), 
( ) ∫
0      (61) 
with an ε  approaching zero. Substituting this into (60 a), we get: 
( ) ( ) ( )( ) ( )21
21221
τωωτθ
ωωδ +
=−−+ ∫ i
id i .  (60 b) 
With this result, (57) becomes: 
( ) ( ) ( ) ( )
( ( ) ( ) ( )
( ) ( ) ( )
( ) ( ) ( )
( ) ( ) ( )
⋅−−′−′−′−−+−−
⋅+−′−′−′+−+−−
⋅−−′+′−′−−++−
⋅+−′+′−′+−++−
′′′′′
δδωωωωωωδ
δδωωωωωωδ
δδωωωωωωδ
δδωωωωωωδ
kppkppcc
kppkppcc
kppkppcc
kppkppcc
kaaaa
pdpdpdpdkdkd
kppkppkk
kppkppkk
kppkppkk
kppkppkk
1~~ˆˆ
1~~ˆˆ
1~~ˆˆ
1~~ˆˆ
,,ˆˆˆˆ
~2~22
rrrrrr
rrrrrr
rrrrrr
rrrrrr
 (57 b) 
So far, we have only considered terms of the scattering matrix containing the electromagnetic vector 
potential A
. Now, we have to address to those terms containing the scalar potential 0A  too. (38) does 
not only contain (50), but also a Coulomb potential term: 
+= φφ 045ˆ AxdI .       (62 a) 
By substituting ρ  in (45) by φφ + , we obtain the following equation for (62 a), if we take into account 
that the Fourier transformed Coulomb potential looks like 
:       (63) 
43333
aaaakpkpkdkdpdpdI pkkp rr
−−′+′′′−= +′
.  (62 b) 
(42) contains two terms in 0A . The first term consists of a combination of (50) and (62 a), but taken at 
different times and therefore joined via the time ordering operator. That is why we have to use (60) as 
well as (45) and (63) again: 
( ) ( ) ( )( )( )( ) ( ) ( ) ( )( )
( ) ( ) ( )
( ) ( )[
( ) ( )
( ) ( )
( ) ( )
−−+−−+−−+
−−+−−+−−+
+−+−−++−+
+−+−−++−
−−′+′
=Ω⋅+⋅=
ωωωωωωωδδ
εωωωω
ωωωωωωωδδ
ωωωωωωωδδ
εωωωω
ωωωωωωωδδ
11121
11121
ˆˆˆˆˆˆ~
ˆˆˆˆˆˆ
ˆˆˆˆˆˆ~
ˆˆˆˆˆˆ
111112
kppkkp
kpppkpk
kpkkpp
kpppkpk
kppkkp
kpppkpk
kpkkpp
kpppkpk
caaaaa
caaaaa
caaaaa
caaaaa
ppkkdkdpdpdpdpdkd
xxAxxpxAxApxTxdxdI
rrrrrr
rrrrrr
rrrrrr
rrrrrr
(64) 
The second term of (42) containing a scalar potential is even quadratic in 0A : 
( ) ( ) ( ) ( ) ( ) ( )( )∫∫ ++= 2202110124147ˆ xxAxxxAxTxdxdI φφφφ .    (65 a) 
(65 a) contains a factor of two integrands of the kind of (62 a), but taken at two different times. Thus 
the time ordering operator must be taken into account. With the same substitutions as in (62) and (64), 
we obtain the following result: 
( ) ( ) ( )
( ) ( ) εωωωω
δδωωωωωωωωδ
ikkkk
kkppkkpp
aaaaaaaapdkdkdpdpdkdkdpdiI
kpkpkpkp
pkkppkkp
+−−+−′−′
⋅−′+−′−′+−′−−++−−+
⋅′′′′=
11112222
22221111
ˆˆˆˆˆˆˆˆ
rrrrrrrr
rrrrrrrr
(65 b) 
The results (52), (54), (55), (57), (62), (64) and (65) substituted into (38) to (42) now enable us to 
evaluate scattering matrix elements for scattering processes to (and including) the order 2e . As two 
examples, we turn first to the scalar analogue of Compton scattering in order to address then to the 
scattering of two identical scalar bosons. These two scattering processes can be compared easily with 
the corresponding results of the well-known scalar QED dealing with the Klein-Gordon equation (3). 
Compton scattering 
For Compton scattering, we need one scalar boson and one photon each in the input and output 
channel. This means, we have to evaluate the element 
0ˆˆˆˆˆ0:ˆ ++′′′= µµ hqqh caSacS
rrrr ,      (66) 
with the Ŝ -operator (33). Firstly, we realise that the terms based on 1̂I  (see (52)) as well as 6Î  (see 
(64)) must vanish, because the subsequent two elements in the photon operators become zero: 
00ˆ00ˆˆˆ0 == ′′
′′ µλµµλµ δδ hhkhkh cccc
rrrrrr ,         (67) 
00ˆˆˆ0 =++′′ µλµ hkh ccc
rrr . 
There, we have used the commutation relations (47), the properties of creation and annihilation 
operators corresponding to those of (22) and (23) as well as abbreviated the delta functional by 
( )hk
rr −= 3δδ .      (68) 
The terms with 2Î  and 3Î  need 
qppqqppq aaaa rrrrrrrr 2121 0ˆˆˆˆ0 δδ ′
′ = ,    (69) 
whereas a term 
qppppqqppppq aaaaaa rrrrrrrrrrrr 12121212 0ˆˆˆˆˆˆ0 δδδ ′′′
′′ =  
belongs to 4Î . Here we have used again (22) to (24). 
For 2Î , 3Î  and 4Î  the following equations are necessary, too: 
  00ˆˆˆˆ0 =+′′′′ µλλµ hkkh cccc
rrrr ,                      
00ˆˆˆˆ0 =+++ ′′′′ µλλµ hkkh cccc
,       (70) 
µλλµµλλµ δδδδ ′′′′
′′′′ = khkhhkkh cccc
rrrrrrrr 0ˆˆˆˆ0 , 
λµλµµλλµ δδδδ ′′′′
′′′′ = khkhhkkh cccc
rrrrrrrr 0ˆˆˆˆ0 , 
where we have commutated the creation operators successively to the left and the annihilation 
operators to the right. 
Furthermore, we see that the term based on 5Î  from (62) becomes zero, due to 
00ˆˆˆˆˆˆ0 =++′
′′ qpkkpq aaaaaa
rrrrrr .       (71) 
A similar result holds true for the term based on 7Î  from (65). 
00ˆˆˆˆˆˆˆˆˆˆ0
22221111
′′ qpkkppkkpq aaaaaaaaaa
rrrrrrrrrr .     (72) 
Now, we can determine (66) by means of (33): 
( )( ) ( ) ( )762144134122215121 ˆˆˆˆˆˆˆ1ˆ IIIIiIiieIIieS +−+−−++−−≈− .   (73) 
For Compton scattering, 1̂I , 2Î , 3Î , 4Î , 5Î , 6Î  and 7Î  can be evaluated explicitly: 
0ˆˆˆˆ 7651 ==== IIII ,       (74) 
( ) ( ) ( )µεµε
δ ′′⋅
′−′−+
,     (75) 
( ) ( ) ( ) ( )
( ) ( ) ( ) ( )
′−⋅′′+′−⋅
++⋅′−+⋅
′−′−+
hqhhhqh
hqhhhqh
hqqhh
rrrrrrrrr
rrrrrrrrr
ωωωωπ
   (76) 
( ) ( ) ( ) ( )
( ) ( ) ( ) ( )
+⋅′+′⋅′′
′−⋅′′−′⋅′−′−+
′−′′−′
ωωωωπ
hqhhqh
hqhhqhhqhq
hqhqhq
hqhqhqqhh
rrrrrr
rrrrrr
rrrrrrrr
rrrrrrrr
2,2,1
2,2,1
   (77) 
The two terms 2Î  and 4Î  in (73) resemble the three terms in the corresponding formula of 
Compton scattering for scalar bosons, but this time being based on the Klein-Gordon equation (3) (see 
e.g. [6,8]): 
( ) ( )
( ) ( )
( ) ( )
( ) ( )
( ) ( )]µεµε
ωωωωπ
′−⋅′′
′+′⋅′′
⋅′−′−+
22~2~22
   (78) 
where ε  is the 4-dimensional generalisation of the three dimensional polarisation vector ε
 used so 
far. In (78), the first two terms correspond to 4Î . We realise that these terms in (78) look very much 
like those in (77), but also that especially the propagators of both theories are a bit different. (75) 
resembles the third term in (78). On the other hand, 3Î  in (76) could also be regarded as the 
analogue of the first two terms in (78) – at least after having used the Coulomb gauge condition (49) in 
(76).  
The first two terms in (78) vanish, if we choose the incoming scalar boson to be at rest, 
( ) ( )0,:,0 rr mqqq == ,      (79) 
and want to have transversally polarised photons in this laboratory system, 
( ) ( ) 0,, =⋅′′=⋅ qhqh µεµε ,     (80) 
and use the Lorentz gauge condition  
( ) ( ) 0,, =′⋅′′=⋅ hhhh µεµε .     (81) 
Accordingly, 3Î  and 4Î  in (73) vanish too, if we adopt (79) again and use the analogue of (80), 
( ) ( ) 0,, =⋅′′=⋅ qhqh rrrrrr µεµε ,     (82) 
as well as apply the Coulomb gauge condition (49). 
That is, under these conditions in both versions of scalar QED, only one term remains (i.e. (75) in (73) 
and, accordingly, the third term in (78)) which is a relativistically generalised version of the matrix 
element from which the well-known Thomson scattering cross section can be evaluated.  
Even though the propagators in both theories are rather different, for this example of scattering 
process, the results do not seem to differ very much from each other in the laboratory system chosen 
above. Therefore the question arises, whether this is also the case for further scattering processes. To 
this end, we are going to investigate what happens, if two identical scalar bosons interact with each 
other. 
Scattering of two identical scalar bosons 
If we want to calculate matrix elements of the S-operator (33) for the scattering of two identical scalar 
bosons, we need two scalar bosons in the input channel and two in the output channel: 
0ˆˆˆˆ0:ˆ
′′= qqqq aaSaaS rrrr .      (83) 
We can reuse (73), but have to evaluate 1̂I , 2Î , 3Î , 4Î , 5Î , 6Î  and 7Î  again. Firstly, 
we recognise that due to (52) and the analogue of (22) and (23) for the photon operators 
01̂ =I ,        (84) 
is valid. Furthermore, we need 
21212211
0ˆˆ0 λλλλ δδ kkkk cc
rrrr =+       (85) 
for calculating 2Î  with the help of (54), whereas all the other photon operator terms therein vanish. 
The same holds true for 3Î  with (55) and 4Î  with (57). 
As far as the scalar boson operators are concerned, for the evaluation of 2Î  and 3Î  
00ˆˆˆˆˆˆ0
121212
=+++′′ qqppqq aaaaaa rrrrrr . 
Hence, these two terms, 
0ˆ2 =I ,          (86) 
    0ˆ3 =I , 
are zero, too. For the evaluation of the non-vanishing term 4Î , the following result is useful: 
2122121121221211
212212112122121112121212
0ˆˆˆˆˆˆˆ0
pqpqpqpqpqpqpqpq
pqpqpqpqpqpqpqpqqqppppqq aaaaaaaa
′′′′′′′′
′′′′′′′′
rrrrrrrrrrrrrrrr
rrrrrrrrrrrrrrrrrrrrrrrr
δδδδδδδδ
δδδδδδδδ
. (87) 
With (85), (87) and the invariance of (48) under the transformation kk
−→ , we get 
( ) ( ) ( ) ( )
( ) ( ) ( ) ( )
21212112
11111122
21212121
11111111
Ainterms
qqqqqqqq
qqqqqqqqqqqqie
qqqqqqqq
qqqqqqqq
′+⋅′−′−⋅′+
 ′+⋅′−′−⋅′+′−′−+−
′−′′−′
′−′′−′
εωωωεωωω
εωωωεωωω
rrrrrrrrrr
rrrrrrrrrr
 (88 a) 
Here, we have omitted the terms 5Î , 6Î  and 7Î  containing terms in the scalar potential 
which are regarded later.  
(88 a) can be compared directly with the corresponding result of the Klein-Gordon equation (3) (see 
e.g. [9]): 
( ) ( )
( ) ( )
( ) 
′+⋅′+
′+⋅′+′−′−+
qqqqqqqqie
qqqq ωωωωπ
 , (89) 
if we use (48) and leave out the term ε  approaching zero in (88) : 
( ) ( )
( ) ( )
( ) ( )( ) ( )
( ) ( )
( ) ( )( ) ( )
( ) ( )
21212112
11111122
Ainterms
qqqqqqqq
qqqqqqqq
qqqqqqqqie
′+⋅′−′−⋅′+
′+⋅′−′−⋅′+
′+⋅′+
′+⋅′+′−′−+−
rrvrvrrr
rrvrvrrr
rrrrrrvr
(88 b) 
The first two terms in (88 b) correspond to (89) which contains the photon propagator in the Feynman 
gauge. But since we use a Coulomb gauge, only the space-like components of the linear 4-momenta 
appear in the numerators of the first two terms in (88 b). Moreover, in the second line of (88 b) two 
additional terms are present which can be reformulated by means of the delta distribution in (88 b): 
( )( )
( ) ( )
( )( )
( ) ( )221221
′−′−′
− .     (90) 
The structure of (88 b) is the same as that of (89). We have two terms: in the second term, the 
momenta of the two scalar bosons in the output channel have been exchanged compared to the first 
term. 
Now we address to the terms in the scalar potential 0A  in (88). The Coulomb term 5Î  contains a 
term 
pqkqqpqkkqpqqpqk
pqkqqkqpkqpqqkqpqqpkkpqq
aaaaaaaa
′′′′′′′′
′′′′′′′′
rrrrrrrrrrrrrrrr
rrrrrrrrrrrrrrrrrrrrrrrr
21122112
211221121212
0ˆˆˆˆˆˆˆ0
δδδδδδδδ
δδδδδδδδ
Therefore, 5Î  yields 
( ) ( ) 
−−′+′=
qqqqI rrrrδ
.    (91) 
The term 6Î  vanishes, since the annihilation operators for photons in (64) act directly on the 
vacuum states.  
On the other hand, 7Î  becomes a rather lengthy because of 
( )( )
( )( )
( )( )
( )( )11
12112121122222
21121121122222
121121211222
121121122122
122222111112
0ˆˆˆˆˆˆˆˆˆˆˆ0
aaaaaaaaaaaa
kqpqkppkqpqkqp
kkkqpqppqpqkqp
kqpqkppkqkqp
kqpqkkqkppqp
qqpkkppkkpqq
′↔′−+
′↔′−−
′↔′−−
′↔′−=
′′′′′′
′′′′′′
′′′′′′
′′′′′′
rrrrrrrrrrrrrr
rrrrrrrrrrrrrr
rrrrrrrrrrrr
rrrrrrrrrrrr
rrrrrrrrrrrr
δδδδδδδ
δδδδδδδ
δδδδδδ
δδδδδδ
where ( )11 kp ′↔′
 denotes the same term as the immediately preceding one, but with exchanged 
momenta p′
 and 1k ′
, respectively.  Thus, 7Î  gives 
−−′+′=
−′−−′−+′−+′−
22222222
12112221
qpqpqpqpqpqpqpqp
pqqpqq
rrrrrrrrrrrrrrrr
rrrrrr
ωωωωωωωω
ωωωωπ
.  (92) 
Hence in place of the time-like components (i.e. the energy terms) of the linear 4-momenta in the 
numerators of (89) derived from the Klein-Gordon equation, several terms arise: (90), (91) and (92). 
But (91) and (the non relativistic limit of) (92) would also have appeared, if we had started from the 
non-relativistic Schrödinger equation. Thus, these terms state the fact that the equation used for 
obtaining (88) (together with (90), (91) and (92)) is Schrödinger equation like.  
Conclusions and outlook 
For scalar bosons, we could see that it is possible to describe scattering processes by means of a square 
root operator equation being coupled to an electromagnetic field. We achieved this by splitting off a 
factor in the shape of the free square root operator from the equation and by a series expansion of a 
remaining square root factor containing terms in the electromagnetic vector potential and powers of 
the (inverse) free square root operator. The latter ones could be given an integral representation.  
Having quantised the fields involved, we could evaluate the scattering matrix elements for Compton 
scattering and for the scattering of two identical bosons (to – and including – the quadratic order of  e) 
which, on the one hand, resembled the results derived with the help of the corresponding Klein-
Gordon equation and, on the other hand, the results one would have obtained with a non-relativistic 
Schrödinger equation.  
Of course, now several questions arise, e.g.: 
• Can we formulate Feynman rules at all for our non-local scalar QED? 
• Do divergent terms appear and, if yes, can a renormalisation procedure be found? 
• Can the results of this non-local QED be confirmed (or refuted) by experiments? 
To the first question: if one tries to formulate Feynman rules, one must be aware that in each step of 
the approximation procedure, we have to expand the first square root factor (in the second term of) (7) 
to the desired order in e. Thus, the Hamiltonian we are using within that procedure must be adapted in 
each order of e of the approximation. Therefore we conclude that even if it were possible to formulate 
Feynman rules, they would be much more complicated than those for the Klein-Gordon theory. This is 
the price we have to pay for non-locality. But without Feynman rules, it is not very easy to analyse the 
renormalisability of that non-local scalar QED either.  
The answer to the last question listed above is negative due to a lack of elementary spinless bosons in 
nature. But that question would be sensible, if we had a corresponding non-local theory for spin-1/2 
particles. We could even make a guess, how this theory would look like: for free spin-1/2 particles, the 
wave functions in (2) should be 2-spinors. It would also be possible to couple this equation to an 
electromagnetic field, but the usual minimal coupling scheme does not work. Instead we would have 
to postulate an equation: 
ˆ ′=∂ ,       (93) 
with the Hamilton operator 
( ) ( ) 022 ˆˆ eAEiBeAepmH +⋅−−+=′ rmrrrr
σ ,    (94) 
which contains an additional term with the Pauli matrices σ
 and the magnetic field B
 as well as the 
electric field E
 under the square root. For this equation, it has already been shown that it can 
reproduce the gyromagnetic factor of 2 for the electron as well as, when being applied to a hydrogen 
atom, that it can reproduce correct binding energies of the electron at least to (and including) the 
quadratic order in 2e  (see [12]). We can apply the same approximation procedure to this equation as 
we have presented here for the scalar case. But of course, additional difficulties emerge: the 
Hamiltonian corresponding to the one shown in (25) should be Hermitian. And at least for the free 
spin-1/2 case, that Hamiltonian should be relativistically invariant. This means, that for the free case, 
(25) should be a Lorentz scalar with respect to the spin. To this end, we have to replace φ  and +φ  
therein by combinations of a mixture of left and right handed 2-spinors Lφ  and Rφ , respectively, and 
their Hermitian conjugates, because RLφφ
+  and LRφφ
+  are Lorentz scalars. Therefore, (25) could be 
replaced either by 
RLLR HHH φφφφ ±
++ ′+′= ˆˆˆ
       (95) 
or by 
RLLR HHH φφφφ mˆˆˆ 2
1 ′+′= +±
+       (96) 
so that the Hamiltonians (95) and (96) become Hermitian, because of the property 
±′=′ HH ˆˆ m .       (97) 
The results of a QED based on (95) or (96) could then be compared with the ones based on a 
corresponding Dirac equation.  
The author does not know, whether such an approach has already been performed for the spin-1/2 case 
or for the here presented spinless case. He does not know either, if an application for it can be found, 
where e.g. non-local properties are indispensable. But it seems to him that from the technical point of 
view, square root operator equations coupled to an electromagnetic field like (4) (or maybe even like 
(93) together with (94)) can be handled within the framework of a quantum field theory. Such 
equations were given up quite early in the history of quantum mechanics for several good reasons, e.g. 
due to their lack of relativistic invariance (see [13]), their non-local character accompanied by the 
difficulty of finding an appropriate mathematical interpretation and description, respectively (see e.g. 
[4,5,6]). While the first reason mentioned remains still valid, from today’s perspective, those non-local 
properties might not be refused as vehemently as in the past (e.g. when one looks out for 
approximations of the so called Bethe-Salpeter equation [11]). The author hopes to have shown, that at 
least answers to the question of possible descriptions of such non-local square root operator equations 
can be found. 
References 
[1] E. Trübenbacher, Z. Naturforschung 44a, 801-810 (1989). 
[2] C. Lämmerzahl, J. Math. Phys. 34 (9), 3918-3932 (1993). 
[3] T.L. Gill and W.W. Zachary, J. Phys. A: Math.Gen. 38 2479-2496 (2005). 
[4] W. R. Theis, Grundzüge der Quantentheorie (Teubner, Stuttgart, 1985). 
[5] A. Messiah, Quantenmechanik 2 (Walter de Gruyter, Berlin, 1990). 
[6] J.D. Bjorken and S.D. Drell, Relativistische Quantenmechanik (Bibliographisches Institut, 
Mannheim, 1966). 
[7] W. Greiner and J. Reinhardt, Feldquantisierung (Verlag Harri Deutsch, Thun/Frankfurt a.M., 
1993). 
[8] W. Greiner and J. Reinhardt, Quantenelektrodynamik (Verlag Harri Deutsch, Thun/Frankfurt a.M., 
1984. 
[9] W. Greiner and A. Schäfer, Quantenchromodynamik (Verlag Harri Deutsch, Thun/Frankfurt a.M., 
1989. 
[10] L. H. Ryder, Quantum Field Theory (Cambridge University Press, Cambridge, 1985). 
[11] W. Lucha  and F.F. Schöberl, Int. J. Mod. Phys. A 14 2309 (1999). 
[12] T. Gleim, quant-ph/0601211; quant-ph/0602047. 
[13] J. Sucher, J. Math. Phys. 4 (1), 17-23 (1963).
ABSTRACT
  Instead of using local field equations - like the Dirac equation for spin-1/2
and the Klein-Gordon equation for spin-0 particles - one could try to use
non-local field equations in order to describe scattering processes. The latter
equations can be obtained by means of the relativistic energy together with the
correspondence principle, resulting in equations with a square root operator.
By coupling them to an electromagnetic field and expanding the square root (and
taking into account terms of quadratic order in the electromagnetic coupling
constant e), it is possible to calculate scattering matrix elements within the
framework of quantum electrodynamics, e.g. like those for Compton scattering or
for the scattering of two identical particles. This will be done here for the
scalar case. These results are then compared with the corresponding ones based
on the Klein-Gordon equation. A proposal of how to transfer these reflections
to the spin-1/2 case is also presented.

<|endoftext|><|startoftext|>
Draft version November 4, 2018
Preprint typeset using LATEX style emulateapj v. 08/22/09
FEEDBACK FROM FIRST RADIATION SOURCES: H− PHOTODISSOCIATION
Leonid Chuzhoy
, Michael Kuhlen
and Paul R. Shapiro
Draft version November 4, 2018
ABSTRACT
During the epoch of reionization, the formation of radiation sources is accompanied by the growth
of a H− photodissociating flux. We estimate the impact of this flux on the formation of molecular
hydrogen and cooling in the first galaxies, assuming different types of radiation sources (e.g. Pop
II and Pop III stars, miniquasars). We find that H− photodissociation reduces the formation of H2
molecules by a factor of Fs ∼ 1 + 10
3ksxf
esc δ
−1, where x is the mean ionized fraction in the IGM,
fesc is the fraction of ionizing photons that escape from their progenitor halos, δ is the local gas
overdensity and ks is an order unity constant which depends on the type of radiation source. By the
time a significant fraction of the universe becomes ionized, H− photodissociation may significantly
reduce the H2 abundance and, with it, the primordial star formation rate, delaying the progress of
reionization.
Subject headings: cosmology: theory – early universe – galaxies: formation – galaxies: high redshift
1. INTRODUCTION
The first stars in the ΛCDM universe are believed
to have formed inside dark-matter-dominated minihalos
filled with mostly neutral, metal-free gas of virial tem-
perature Tvir < 10
4 K, when H2 molecules formed in
sufficient abundance to cool the gas radiatively to ∼ 102
K. If, as currently thought, these stars were massive, hot,
and luminous, they may have contributed significantly to
the reionization of the universe, which CMB polarization
observations by WMAP indicate was highly ionized by
z ∼ 10 (Spergel et al. 2006). The release of ionizing UV
radiation by minihalos and other sources (e.g. stars in
more massive halos, with Tvir > 10
4 K, or miniquasars),
required to explain reionization, must have been accom-
panied by radiation release at energies below the H Ly-
man limit, as well, however. This may, in turn, have lim-
ited the H2 abundance inside minihalos and their ability
to form stars, thereby limiting their contribution to cos-
mic reionization.
In the absence of dust and at densities below the three-
body formation regime (n . 1010 cm−3), the most im-
portant reaction for the production of H2 is
H− +H → H2 + e
−, (1)
(e.g., Shapiro & Kang 1987 and refs. therein) with re-
action rate k− = 1.3× 10
−9 cm3 s−1 (Schmetekopf et al.
1967). Once formed, H2 can be destroyed by collisions
with other species
H2 +H
+ → H+2 +H, (2)
H2 +H → H+H+H, (3)
H2 + e
− → H+H+ e−, (4)
or by photodissociation via Lyman-Werner band photon
absorption
H2 + γ → H+H. (5)
1 McDonald Observatory and Department of Astronomy, The
University of Texas at Austin, RLM 16.206, Austin, TX 78712,
USA; chuzhoy@astro.as.utexas.edu
2 Institute for Advanced Study, Princeton, NJ, 08540, USA;
mqk@ias.edu
The latter process becomes dominant once a substan-
tial UV background is built up between 912 and 1110
Å, providing a feedback mechanism against the forma-
tion of new radiation sources (e.g. Haiman et al. 1997;
Haiman et al. 2000; Ciardi et al. 2000; Machacek et al.
2001; Mesinger et al. 2006).
In this paper we explore the impact of another feedback
mechanism, the photodissociation of H−,
H− + γ → H+ e−. (6)
The cross-section for photodissociation of H− is well fit-
ted by (Wishart 1979)
σ−(ǫ) = 2.1× 10
−16 (ǫ− 0.75)
ǫ3.11
cm2, (7)
where ǫ is the photon energy in eV. The cross section is
zero below a threshold of ǫ < 0.755eV, the binding en-
ergy of the second electron. In the absence of the UV
background, the primary mode of H− destruction is the
formation of H2 (Eq. [1]),
3 so introducing the H− pho-
todissociating flux reduces the H2 formation rate by a
factor
Fs = 1 +
, (8)
where ζ− =
nγ(ǫ)σ−(ǫ)cdǫ is the photodissociation rate
per H− ion, nH is the hydrogen atom number density
and nγ(ǫ) is the number density of photons with energy
ǫ.4 Hence the importance of this mechanism depends pri-
marily on the local density ratio of H− photodissociating
photons and hydrogen atoms.
3 When gas fractional ionization is high (x & 0.01) mutual neu-
tralization with H+ can provide another efficient channel for H−
destruction. However, typically the fractional ionization of mini-
halos is much lower.
4 This approximation for Fs breaks down when its value exceeds
∼ 50, since for such UV intensities H+ +H → H
+ γ reaction
becomes a dominant channel of H2 production (assuming reaction
rates given by Shapiro & Kang (1987)). Note also that k
is still
uncertain to within a factor of a few (see Glover et al. 2006), and
this uncertainty carries over to Fs when Fs ≫ 1.
http://arxiv.org/abs/0704.0426v2
The impact of H− photodissociation differs from that
of H2 by two fundamental characteristics. First, the time
required for H− abundance to approach equilibrium is
very short (typically less than 10000 years), while for
H2 the equilibration time can exceed the Hubble time.
Therefore, when gas is exposed to a transient UV flux,
produced by nearby Pop III stars, for example, H− pho-
todissociation can generally be ignored, as it does not
affect the subsequent thermal and chemical evolution.
Secondly, photons that make up the H2 photodissociat-
ing background are destroyed after a few percent of the
Hubble time, as they redshift into one of the Lyman se-
ries resonances, and must be replenished continuously.
By contrast, photons that constitute the H− photodis-
sociating background are very rarely destroyed, which
allows them to accumulate over time. Consequently the
importance of H− photodissociation increases over time,
and as we show in this paper, by the time a significant
(∼ 10 %) fraction of the Universe is ionized, H− pho-
todissociation may result in a drastic reduction of the
molecular hydrogen abundance. This in turn may lead
to a reduced star formation rate and delay the progress
of reionization.
Recently, Glover (2007) considered the suppression of
H2 formation due to the photodissociation of H
− and
H+2 . Whereas Glover (2007) focused on the local feed-
back around and inside HII regions created by Pop III
stars, we treat the problem globally and also consider
long range effects due to the much lower optical depth of
the universe below the Lyman limit.
The paper is organized as following. In §2 and 3, we
estimate the intensity of H− photodissociating flux pro-
duced by UV and X-ray sources, respectively. In §4, we
discuss the implication of our results for gas cooling in
minihalos.
2. H− PHOTODISSOCIATING BACKGROUND - UV
SOURCES
2.1. Recombination products
Since the first radiation sources are expected to form
within overdense gas clouds, only the escaping fraction
of their ionizing photons, fesc, was available for ioniza-
tion of the diffuse IGM. The rest was absorbed within
the host halos and, via the process of radiative recombi-
nation, converted into lower energy UV photons. Since
the universe during that epoch is transparent to most
non-ionizing UV photons, 5 almost all of them add to
the H− photodissociation background.
Neglecting recombinations in the diffuse IGM, the
mean ionization is x = Nibfesc, where Nib is the to-
tal number of ionizing photons per baryon produced up
to this point. Inside halos, the recombination time is
quite short, and so the number of ionizations taking place
there, Nib(1− fesc) = x(1− fesc)/fesc, is almost equal to
5 An exception occurs for photons whose frequency is close to
one of the high (n > 2) Lyman resonances, which, following their
absorption by hydrogen atoms, are further split into two or more
lower energy photons. For Lyα photons, the optical depth is also
very high, but in their case the absorption in almost all cases is
followed by reemission, with the destruction probability being ex-
tremely low (e.g. Furlanetto & Pritchard 2006). Also, at the very
early stage of reionization (x ≪ 1) the presence of H2 molecules
makes the universe opaque in the Lyman-Werner range. However,
since their initial abundance (∼ 10−6) is already very low, the
number of photons they destroy is negligible.
the number of electron recombinations to n ≥ 2 states
(i.e., recombinations which do not result in emission of
additional ionizing photons), Nrec. Therefore the average
H− photodissociating rate is given by
ζ− = Nrecnbarc〈σ−〉 = xnbarc〈σ−〉
1− fesc
, (9)
where nbar is the mean baryon density and 〈σ−〉 is the av-
erage cross-section per recombination photon times the
average number of photons per recombination, 〈σ−〉 =
(jǫ/αrecnenpǫ)σ−(ǫ)dǫ. Note that since emissivity, jǫ,
is proportional to nenp, 〈σ−〉 is in fact independent of
ne and np. Using Osterbrock’s (1989, Sec. 4.3) calcula-
tion of the recombination spectrum, (jǫ/αrecnenp), and
assuming that the temperature of the recombining gas is
close to 104 K, we find 〈σ−〉 = 3.4× 10
−17cm2.
By combining equations (8) and (9), we can estimate
the importance of the H− photodissociation due to re-
combination radiation. Assuming that most of the re-
combinations occurred recently, we find that, the recom-
bination radiation alone will suppress the H2 formation
rate by
Fs = 1 + 800δ
(1− fesc), (10)
where δ = 1.08nH/nbar is the local overdensity. Here we
have neglected recombinations in the diffuse intergalactic
medium (IGM) and the associated H− dissociating pho-
tons from these recombinations, but these would only
further increase Fs.
Cosmological redshift can affect the photodissociation
rate by shifting the spectrum to longer wavelengths.
Initially this leads to an increase in 〈σ−〉 due to the
ǫ−3/2 dependence of the cross-section for ǫ ≫ 0.755
eV. Eventually, as more and more of the spectrum is
shifted below the threshold, the cosmological redshift
begins to decrease the dissociation rate. For recombi-
nation photons this redshift effect is small, and the tran-
sition to 〈σ−〉-depression occurs at a redshift factor of
(1 + zi)/(1 + z) ≈ 2.5, see Figure 1.
2.2. Direct emission
Unlike ionizing photons, whose intensity is heavily at-
tenuated both in stellar atmospheres and in their host
galaxies, most of the photons with frequencies below the
Lyman limit escape freely into the IGM. From then on,
photons with frequency below Lyβ undergo no evolution
apart from cosmological redshift. By contrast, within a
small fraction of the Hubble time, most photons with
frequency between Lyβ and the Lyman limit are split by
cascade into two or more photons after being redshifted
into one of the hydrogen resonances. Most of the cas-
cade products, which include lines such as Lyα, Hα, and
Hβ, as well as a continuum spectrum produced by the
two photon transition 2s → 1s, are above the 0.755 eV
threshold for H− photodissociation.
The relative importance of these directly emitted H−
dissociating photons depends on the nature of the UV
sources. Figure 2 shows the increase of the H− dissocia-
tion rate due to inclusion of direct emission from metal-
poor Pop III stars, which we calculated using the stellar
atmosphere models of Schaerer (2002). Predictably, for
1 1.5 2 2.5 3 3.5 4
)/(1+z)
Fig. 1.— Redshift evolution of the average H− photodissociation
cross-section of the UV photons produced by recombination (dot-
ted line), excitations by non-thermal electrons (dashed line) and
massive Pop III stars (solid line).
4 6 8 10 12 14
Fig. 2.— The ratio between the total H− photodissociation rate
and the photodissociation by recombination products alone for star
with different effective surface temperature.
very massive Pop III stars, with surface temperatures
∼ 105 K, adding the stellar continuum below the Lyman
limit to the recombination spectrum increases the pho-
todissociation rate by only ∼ 10%. If, on the other hand,
most of the early ionizing flux was produced by stars
with masses below 10M⊙, whose continuum emission is
stronger at lower frequencies, then the total H− disso-
ciation rate would be tripled at least. Likewise, direct
emission may be important if most of the UV photons
were produced by miniquasars. For example, assuming
that their spectrum can be approximated by a power
law, Lν ∝ ν
−1.7, with a cutoff below 0.75 eV, adding the
directly emitted photons to the recombination products
increases the total photodissociation rate by a factor of
3. H− PHOTODISSOCIATING BACKGROUND - X-RAY
SOURCES
It has been suggested that X-ray photons could con-
tribute a large fraction of the energy emitted by the
first radiation sources (e.g. Ricotti & Ostriker 2004).
By increasing the number of free electrons, X-rays can
boost the production of H−, and thus of H2, provid-
ing a positive feedback to the formation of new sources
(Haiman et al. 2000; Kuhlen & Madau 2005). This ef-
fect, however, would be at least partially offset by an in-
crease of the H− photodissociating background, caused
by conversion of X-rays into UV photons.
The absorption of an X-ray photon is followed by re-
lease of a non-thermal electron, which then loses some of
its energy by inelastic collisions with atoms before it can
thermalize its energy by elastic scattering with ions and
other electrons. When the gas ionization fraction is low
(x . 0.05), the photoelectron splits most of its energy
evenly between collisional ionizations and excitations of
hydrogen atoms (Shull & van Steenberg 1985). Using
electron-hydrogen excitation cross-sections (Grafe et al.
2001; Stone et al. 2002), we find that around ∼ 5/6 of
the excitations are to the 2p level, which are followed by
emission of a Lyα photon. Most of the remaining excita-
tions are to the 3p level, which decays via emission of one
Hα photon and a subsequent two-photon decay from the
2s level. The Lyα, Hα and two-photon continuum each
produce roughly equal contributions to H− photodisso-
ciation. Per ionization, the average intensity-weighted
cross-section for these photons is 〈σ−〉 = 1.6×10
−17cm2.
Due to the low number of UV photons produced during
this phase, the formation of H2 is not strongly affected
Fs = 1 + 4δ
. (11)
After the ionized fraction climbs above x ∼ 0.05, most
of the energy of the non-thermal electrons is converted
to heat. However, simultaneously with the growth of the
ionized fraction, the temperature of the gas rises, and as
it crosses 104 K, the collisions between thermal electrons
and atoms begin to dissipate the energy added by X-
rays, mainly via emission of Lyα photons. Neglecting
gas clumping, we find that the number of emitted Lyα
photons per hydrogen atom is
Nα = 4.6× 10
−8 cm3 s−1
x(1 − x)e−1.18×10
5/T nH dt.
Assuming for simplicity that x and T are constants, we
can rewrite the equation (12) as
Nα = 11.2
(τe,X
(1− x)e−1.18×10
, (13)
where τe,X =
xnσT dt is the Thompson optical depth
from the epoch of partial ionization by X-rays. If X-
ray preionization contributes at least half of the τe ∼ 0.1
measured by WMAP (i.e. τe,X ≈ 0.05), hydrogen atomic
de-excitations in the diffuse IGM may produce & 30 Lyα
photons per baryon.
The suppression of H2 formation due to H
− photodis-
sociation by Lyα photons is
Fs ≈ 1 + 100Nαδ
−1. (14)
Since the energy of Lyα photons (10.2 eV) is far above
the H− photodissociation threshold (0.75 eV), the pho-
todissociation rate grows roughly as (1+zi)
1.5/(1+z)1.5,
where zi is the redshift at which the photon was emitted.
In the case of an extended period of partial ionization, Fs
may be increased by a factor of a few, possibly exceeding
104δ−1.
Since, when the IGM temperature rises above 104 K,
the formation of new minihalos is suppressed, the impact
of H− photodissociating flux produced by X-ray conver-
sion is relevant only for minihalos which have formed
some time ago or for halos with Tvir > 10
4 K, which also
rely on H2 cooling to form stars.
4. DISCUSSION
As shown by our calculations, H− photodissociation
reduces the formation of H2 molecules by a factor of
Fs ∼ 1 + 10
3ksxf
esc δ
−1, (15)
where ks is a constant of order a few, whose value de-
pends on the type of radiation source and the growth
history of the radiation background. Thus, by the time a
significant fraction (& 0.1) of the universe becomes ion-
ized, H− photodissociation can significantly reduce the
H2 formation rate in regions with overdensities of up to
a few thousands, i.e. in the interior regions of miniha-
los. The equilibrium abundance of molecular hydrogen
during this stage would be determined by the balance be-
tween its formation and destruction rates (Eqs. [1] and
nH2 =
k− nH nH−
, (16)
where kLW is the H2 destruction rate by the Lyman-
Werner photons. Thus a reduction of H− abundance by
a factor Fs translates into the same reduction of the H2
abundance and, in minihalos, a comparable increase of
the cooling time.
Indirectly, H− photodissociation may affect the cool-
ing in the central regions of minihalos even during the
early stages of reionization. The maximum density that
gas can reach in the core region of a minihalo is lim-
ited by the amount of entropy it is able to radiate away
during collapse. The lower density gas prevalent dur-
ing the early collapse phase would be susceptible to H−
dissociation from even a relatively low intensity H− dis-
sociating flux, and the resulting lowered H2 abundance
would limit its ability to radiate away entropy via H2
cooling. Furthermore, the density and H2 abundance at
the center depend on the conditions in the low density
outer regions, through their contributions to both the to-
tal pressure and the self-shielding ability of the halo. We
plan to investigate these effects further with numerical
radiation-hydrodynamic simulations in the future.
LC thanks the McDonald Observatory for the W.J.
McDonald Fellowship. MK gratefully acknowledges sup-
port from the Institute for Advanced Study. This work
was partially supported by NASA Astrophysical Theory
Program grants NAG5-10825 and NNG04G177G to P.
R. S.
REFERENCES
Ciardi, B., Ferrara, A., & Abel, T. 2000, ApJ, 533, 594
Furlanetto, S. R., & Pritchard, J. R. 2006, MNRAS, 372, 1093
Grafe, A., Sweeney, C. J., & Shyn, T. W. 2001, Phys. Rev. A, 63,
052715
Glover, S. C. O. 2007, MNRAS, in press, astro-ph/0703716
Glover, S. C. O., Savin, D. W., & Jappsen, A. -K. 2006, ApJ, 640,
Haiman, Z., Abel, T., & Rees, M. J. 2000, ApJ, 534, 11
Haiman, Z., Rees, M. J., & Loeb, A. 1997, ApJ, 484, 985
Kuhlen, M., & Madau, P. 2005, MNRAS, 363, 1069
Machacek, M. E., Bryan, G. L., & Abel, T. 2001, ApJ, 548, 509
Mesinger, A., Bryan, G.L., & Haiman, Z. 2006, ApJ, 648, 835
Osterbrock, D. E. 1989, “Astrophysics of gaseous nebulae and
active galactic nuclei”, University Science Books, 1989, 422 p.
Ricotti, M., & Ostriker, J. 2004, MNRAS, 352, 547
Schaerer, D. 2002, A&A, 382, 28
Schmetekopf, A. L., Fehsenfeld, F. C., & Ferguson, E. 1967, ApJ,
148, L155
Shapiro, P. R., & Kang, H. 1987, ApJ, 318, 32
Shull, J. M., & van Steenberg, M. E. 1985, ApJ, 298, 268
Spergel, D.N. et al. 2006, astro-ph/0603449
Stone, P. M., Kim, Y.K., & Desclaux, J.P. 2002, J. Res. Natl.
Inst. Stand. Technol. 107, 327
Wishart, A. W. 1979, MNRAS, 187, 59
http://arxiv.org/abs/astro-ph/0703716
http://arxiv.org/abs/astro-ph/0603449
ABSTRACT
  During the epoch of reionization, the formation of radiation sources is
accompanied by the growth of a H- photodissociating flux. We estimate the
impact of this flux on the formation of molecular hydrogen and cooling in the
first galaxies, assuming different types of radiation sources (e.g. Pop II and
Pop III stars, miniquasars). We find that H- photodissociation reduces the
formation of H2 molecules by a factor of ~1+1000k_s*x/(f_esc*delta), where x is
the mean ionized fraction in the IGM, f_esc is the fraction of ionizing photons
that escape from their progenitor halos, delta is the local gas overdensity and
k_s is an order unity constant which depends on the type of radiation source.
By the time a significant fraction of the universe becomes ionized, H-
photodissociation may significantly reduce the H2 abundance and, with it, the
primordial star formation rate, delaying the progress of reionization.

<|endoftext|><|startoftext|>
arXiv:0704.0427v1  [cond-mat.dis-nn]  3 Apr 2007
The 3D ±J Ising model at the ferromagnetic transition line
Martin Hasenbusch,1 Francesco Parisen Toldin,2 Andrea Pelissetto,3 and Ettore Vicari1
1 Dipartimento di Fisica dell’Università di Pisa and INFN, Pisa, Italy.
2 Scuola Normale Superiore and INFN, Pisa, Italy.
3Dipartimento di Fisica dell’Università di Roma “La Sapienza” and INFN, Roma, Italy.
(Dated: November 1, 2018)
Abstract
We study the critical behavior of the three-dimensional ±J Ising model [with a random-exchange
probability P (Jxy) = pδ(Jxy−J)+(1−p)δ(Jxy+J)] at the transition line between the paramagnetic
and ferromagnetic phase, which extends from p = 1 to a multicritical (Nishimori) point at p =
pN ≈ 0.767. By a finite-size scaling analysis of Monte Carlo simulations at various values of p
in the region pN < p < 1, we provide strong numerical evidence that the critical behavior along
the ferromagnetic transition line belongs to the same universality class as the three-dimensional
randomly-dilute Ising model. We obtain the results ν = 0.682(3) and η = 0.036(2) for the critical
exponents, which are consistent with the estimates ν = 0.683(2) and η = 0.036(1) at the transition
of randomly-dilute Ising models.
PACS numbers: 75.10.Nr, 75.40.Cx, 75.40.Mg, 64.60.Fr
http://arxiv.org/abs/0704.0427v1
I. INTRODUCTION
The ±J Ising model has played an important role in the study of the effects of quenched
random disorder and frustration on Ising systems. It is defined by the lattice Hamiltonian
H±J = −
Jxyσxσy, (1)
where σx = ±1, the sum is over the nearest-neighbor sites of a simple cubic lattice, and the
exchange interactions Jxy are uncorrelated quenched random variables, taking values ±J
with probability distribution
P (Jxy) = pδ(Jxy − J) + (1− p)δ(Jxy + J). (2)
For p = 1 we recover the standard Ising model, while for p = 1/2 we obtain the usual
bimodal Ising spin-glass model.
The phase diagram of the three-dimensional (3D) ±J Ising model is sketched in Fig. 1.
The high-temperature phase is paramagnetic for any p. The low-temperature phase depends
on the value of p: it is ferromagnetic for small values of 1 − p, while it is a spin-glass
phase with vanishing magnetization for sufficiently large values of 1 − p. The different
phases are separated by transition lines, which meet at a multicritical point N located
along the so-called Nishimori line.1,2,3 The spin-glass transition has been mostly studied at
the symmetric point p = 1/2, see, e.g., Refs. 3,4 and references therein. The spin-glass
transition line extends up to the Nishimori multicritical point,2 located at5,6,7,8 pN ≈ 0.767.
For larger values of p, the transition is ferromagnetic, up to p = 1 where one recovers the pure
Ising model, and therefore a transition in the Ising universality class. At the ferromagnetic
transition line, for pN < p < 1, the critical behavior is expected to belong to a different
universality class.
An interesting hypothesis, which has already been put forward in Refs. 3,9, is that the
ferromagnetic transition of the ±J Ising model belongs to the 3D randomly-dilute Ising
(RDIs) universality class (see, e.g., Refs. 10,11 for reviews on randomly-dilute spin mod-
els). A representative of the RDIs universality class is the randomly site-dilute Ising model
(RSIM) defined by the lattice Hamiltonian
Hd = −J
ρx ρy σxσy, (3)
glass
ferro
1 − p
FIG. 1: Sketch of the phase diagram of the 3D ±J Ising model in the T -p plane.
where ρx are uncorrelated quenched random variables, which are equal to 0, 1 with proba-
bility
P (ρx) = pδ(ρx − 1) + (1− p)δ(ρx). (4)
For p < 1 and above the percolation threshold of the spins (pperc ≈ 0.3116081(13) on a cubic
lattice12), the RSIM undergoes a continuous phase transition between a disordered and a
ferromagnetic phase, whose nature is independent of p. This transition is definitely different
from the usual Ising transition: for instance, the correlation-length critical exponent13,14,15,16
ν = 0.683(2) differs from the Ising value17,18 ν = 0.63012(16). The RDIs universality class is
expected to describe the ferromagnetic transition in generic diluted ferromagnetic systems.
For instance, it has been verified that also the randomly bond-diluted Ising model (RBIM)
belongs to the RDIs universality class.13,19 These results do not necessarily imply that also
the ±J Ising model has an RDIs ferromagnetic transition line. Indeed, while the RSIM (3)
has only ferromagnetic exchange interactions, the ±J Ising model is frustrated for any value
of p < 1. Therefore, the ferromagnetic transition in the ±J Ising model belongs to the RDIs
universality class only if frustration is irrelevant, a fact that is not obvious and should be
carefully investigated.
Reference 9 investigated the issue by means of a Monte Carlo (MC) renormalization-
group (RG) study, claiming that the ±J Ising model belongs to the same RDIs universality
class as the RSIM and the RBIM. It should be noted however that the quoted estimate for
the correlation-length exponent at the ferromagnetic transition, ν = 0.658(9), is close to
but not fully consistent with the RDIs value ν = 0.683(2).13 Another numerical MC work20
investigated the nonequilibrium relaxation dynamics of the ±J Ising model and showed an
apparent nonuniversal dynamical critical behavior along the ferromagnetic transition line.
These results are not conclusive and further investigation is called for to clarify this issue.
In this paper we focus on the transition line of the 3D ±J Ising model between the
paramagnetic and the ferromagnetic phase. We investigate the critical behavior by means
of MC simulations at various values of p in the region pN < p < 1. Our finite-size scaling
(FSS) analysis provides a strong evidence that the critical behavior of the 3D ±J Ising
along the ferromagnetic line belongs to the 3D RDIs universality class. For example, we
obtain ν = 0.682(3) and η = 0.036(2), which are in good agreement with the presently most
accurate estimates13 ν = 0.683(2) and η = 0.036(1) for the 3D RDIs universality class.
The paper is organized as follows. In Sec. II we summarize some FSS results which are
needed for the analysis of the MC data, and describe our strategy to check whether the
transition belongs to the RDIs universality class. In Sec. III we describe the MC simula-
tions. In Sec. IV we report the results of the FSS analysis. Finally, in Sec. V we draw our
conclusions. In App. A we report the definitions of the quantities we compute.
II. STRATEGY OF THE FINITE-SIZE SCALING ANALYSIS
In this work we check whether the ferromagnetic transition line in the 3D ±J Ising models
belongs to the RDIs universality class. For this purpose, we present a FSS analysis of MC
data for various values of p in the region 1 > p > pN ≈ 0.767. We follow closely Ref. 13,
which studied the ferromagnetic transition line in the 3D RSIM and RBIM and provided
strong numerical evidence that these transitions belong to the same RDIs universality class.
We refer to Ref. 13 for notations (a short summary is reported in App. A) and a detailed
discussion of FSS in these disordered systems.
According to the RG, in the case of periodic boundary conditions and for L → ∞, where
L is the lattice size, a generic RG invariant quantity R at the critical temperature 1/βc
behaves as
R(L, β = βc) = R
1 + c11L
−ω + c12L
−2ω + · · ·+ c21L
−ω2 + · · ·
, (5)
where R∗ is the universal infinite-volume limit and ω and ω2 are the leading and next-
to-leading correction-to-scaling exponents. In RDIs systems scaling corrections play an
important role,16,21 since ω is quite small. Indeed we have ω = 0.33(3) and ω2 = 0.82(8)
in the 3D RDIs universality class.13 These slowly-decaying scaling corrections make the
accurate determination of the universal asymptotic behavior quite difficult.
Instead of computing the various quantities at fixed Hamiltonian parameters, we keep
a RG invariant quantity R fixed at a given value Rf .
22 This means that, for each L, we
determine the pseudocritical inverse temperature βf(L) such that
R(β = βf(L), L) = Rf . (6)
All interesting thermodynamic quantities are then computed at β = βf (L). The pseudocrit-
ical inverse temperature βf(L) converges to βc as L → ∞. The value Rf can be specified
at will, as long as Rf is taken between the high- and low-temperature fixed-point values
of R. The choice Rf = R
∗ (where R∗ is the critical-point value) improves the conver-
gence of βf to βc for L → ∞; indeed βf − βc = O(L
−1/ν) for generic values of Rf , while
βf −βc = O(L
−1/ν−ω) for Rf = R
∗. This FSS method has already been applied to the study
of the critical behavior of N -vector spin models,22,23 and of randomly-dilute Ising models.13
As in Ref. 13, we perform a FSS analysis at fixed Rξ ≡ ξ/L = 0.5943, which is very close to
the fixed-point value R∗ξ = 0.5944(7) of Rξ at βc. Given any RG invariant quantity R, such as
the quartic cumulants U4 and U22, we consider its value at fixed Rξ, i.e., R̄(L) = R(L, βf(L)).
For L → ∞, R̄(L) behaves as R(L, βc):
R̄(L) = R̄∗
1 + b11L
−ω + b12L
−2ω + · · ·+ b21L
−ω2 + · · ·
, (7)
where the coefficients bij depend on the Hamiltonian. The derivative R̄
′ with respect to β
of a generic RG invariant quantity R behaves as
R̄′(L) = aL1/ν
1 + a11L
−ω + a12L
−2ω + · · ·+ a21L
−ω2 + · · ·
. (8)
Finally, the FSS of the magnetic susceptibility χ is given by13
χ̄(L) ≡ χ(L, β = βf (L)) = eL
1 + e11L
−ω + e12L
−2ω + · · ·+ e21L
−ω2 + · · ·
+ eb (9)
where eb represents the background contribution.
A standard RG analysis, see, e.g., Ref. 13, shows that the amplitudes of the O(L−kω)
scaling corrections are proportional to uk3 (with a universal coefficient), where u3 is the
leading irrelevant scaling field with RG dimension y3 = −ω. Hamiltonians such that u3 = 0—
we call them improved Hamiltonians—have a faster approach to the universal asymptotic
behavior, because the O(L−kω) scaling corrections vanish: b1k = a1k = e1k = 0 in Eqs. (7),
(8), and (9). In this case the leading scaling corrections are proportional to u4L
−ω2 , where u4
is the next-to-leading irrelevant scaling field and y4 = −ω2 is its RG dimension. In Ref. 13
it was shown that the RSIM for p = p∗ = 0.800(5) and the RBIM for p = p∗ = 0.56(2)
are improved. Since scaling fields are analytic functions of the Hamiltonian parameters, u3
must be proportional to p − p∗ close to p = p∗, i.e. u3 ≈ c3(p − p
∗). Therefore, since the
coefficients b1k, a1k, and e1k that appear in Eqs. (7), (8), and (9) are proportional to u
3, we
b1k, a1k, e1k ∼ (p− p
∗)k. (10)
Beside the quantities defined in App. A, we also consider observables—in analogy with
the previous terminology, we call them improved quantities—characterized by the fact that
the leading scaling correction proportional to L−ω (approximately) vanishes in any model
belonging to the RDIs universality class.13 We consider the combination of quartic cumulants
Ūim = Ū4 + 1.3Ū22, (11)
and improved estimators of the critical exponent ν defined as
R′ξ,im ≡ R̄
d , U
4,im ≡ Ū
d (12)
(Ūd is defined in App. A). In Ref. 13 we showed that, if the transition belongs to the RDIs
universality class, the leading scaling correction proportional to L−ω of these improved ob-
servables is suppressed. More precisely, we showed that the universal ratio of the amplitudes
of the leading scaling correction in Ūim and Ū4 satisfies
|b11,Ūim/b11,Ū4 | .
, (13)
while the one for the quantities R′ξ,im and R̄
ξ is bounded by
|a11,R′
/a11,R̄′
. (14)
The remaining scaling corrections are of order L−2ω and L−ω2. These improved observables
are particular useful to check whether the transition in a given system belongs to the 3D
RDIs universality class.
10 100
FIG. 2: Exponential autocorrelation time τ of the magnetic susceptibility for a mixture of
Metropolis and cluster updates as discussed in the text, at p = 0.87. The dotted line shows
the result of a fit to τ = cLz: this fit gives z ≈ 1.6.
To summarize: in order to check whether the ferromagnetic transition of the 3D ±J
Ising model belongs to the RDIs universality class, we perform a FSS analysis at fixed
Rξ = 0.5943, and check if the results for the critical exponents and other universal quantities
are consistent with those obtained for the RDIs universality class, which is characterized
by13 critical exponents ν = 0.683(2) and η = 0.036(1), by the leading and next-to-leading
scaling-correction exponents ω = 0.33(3) and ω2 = 0.82(8) and by the universal infinite-
volume values of the quartic cumulants Ū∗22 = 0.148(1), Ū
im = 1.840(4), and Ū
d = 1.500(1).
Notice that the fact that we fix Rξ = 0.5943 does not introduce any bias in our FSS analysis.
III. MONTE CARLO SIMULATIONS
We performed MC simulations of Hamiltonian (1) with J = 1 for p =
0.94, 0.90, 0.883, 0.87, 0.83, 0.80, close to the critical temperature on cubic lattices of size
L3 with periodic boundary conditions, for a large range of lattice sizes: from L = 8 to
L = 80 for p = 0.883, 0.87, to L = 64 for p = 0.94, 0.90, 0.83, and to L = 48 for p = 0.80. We
chose values of p not too close to p = 1: indeed, as p → 1 we expect crossover effects due to
the presence of the Ising transition for p = 1 and, therefore, that the asymptotic behavior
sets in only for large values of L. We return to this point later.
We used a Metropolis algorithm and multispin coding.24 In the simulation nbit systems
evolve in parallel, where nbit = 32 or nbit = 64 depending on the computer that is used.
For each of these nbit systems we use a different set of couplings Jxy. This allows us to
perform 64 parallel simulations on a 64-bit machine, and therefore to gain a large factor
in the efficiency of the MC simulations. We used high-quality random-number generators,
such as the RANLUX25 or the twister26 generators.27 Using the twister random-number
generator, we need about 1.2 × 10−9 seconds for one Metropolis update of a single spin on
an Opteron processor running at 2 GHz. Our simulations took approximately 3 CPU years
on an Opteron (2 GHz) processor.
It is worth mentioning that cluster algorithms, such as the Swendsen-Wang cluster28
and the Wolff single-cluster29 algorithm, show significant slowing down in the ±J Ising
model. At the earlier stage of this work we performed some simulations of the ±J Ising
model at p = 0.87 using the algorithm used in Ref. 13 to simulate the RSIM and the
RBIM. There we used a combination of Metropolis, Swendsen-Wang cluster,28 and Wolff
single-cluster29 updates. More precisely, each updating step consisted of 1 Swendsen-Wang
update, 1 Metropolis update, and L single-cluster updates. In all cases the exponential
autocorrelation times τ of the magnetic susceptibility was small: τ . 1 in units of the
above updating step, even for the largest lattice sizes considered, i.e. L = 192. In the ±J
Ising model at p = 0.87 autocorrelation times are much larger. In Fig. 2 we plot estimates
of τ as obtained from the magnetic susceptibility. They show a clear evidence of critical
slowing down: τ ∼ Lz with z ≈ 1.6. Such a value of z should be compared with the
dynamic exponent of Swendsen-Wang and Wolff cluster algorithms in the RSIM, which is
much smaller:30 z . 0.5. These results show that cluster algorithms behave differently in
the ±J Ising model, likely due to frustration. They suggest that frustration is relevant for
the cluster dynamics.
Taking also into account the computer time required by the cluster algorithms, we then
turned to a multispin Metropolis algorithm. This turns out to be much more effective at the
lattice sizes considered, although it has a larger dynamic exponent z & 2, see, e.g., Ref. 30
and references therein. We also mention that the autocorrelation time significantly increases
with decreasing p (keeping L fixed). For example, for L = 48 it increases by approximately
a factor of 10 from p = 0.90 to p = 0.80. This represents a major limitation to perform
simulations for large lattices close to the multicritical point.
For each lattice size we considered Ns disorder samples, with Ns decreasing with in-
creasing L, from Ns & 10
6 for L = 8 to Ns & 2 × 10
4 for the largest lattices. For each
disorder sample, we collected a few hundred independent measurements at equilibrium. The
averages over disorder are affected by a bias due to the finite number of measures at fixed
disorder.13,31 A bias correction is required whenever one considers the disorder average of
combinations of thermal averages. We used the formulas reported in App. B of Ref. 13.
Errors were computed from the sample-to-sample fluctuations and were determined by using
the jackknife method.27
Our FSS analysis is performed at fixed Rξ ≡ ξ/L. In order to determine expectation
values at fixed Rξ, one needs the values of the observables as a function of β in some
neighborhood of the inverse temperature βrun used in the simulation. In Ref. 13 we used the
reweighting method for this purpose. This requires that the observables and, in particular,
the values of the energy are stored at each measurement. For the huge statistics like those we
have for the smaller values of L, this becomes unpractical. Therefore, we used here a second-
order Taylor expansion, determining O(β, L) from O(βrun, L)+aO(β−βrun)+ bO(β−βrun)
The coefficients aO and bO are obtained from appropriate expectation values as in Ref. 23.
Since their computation involves disorder averages of products of thermal averages, we have
implemented in all cases an exact bias correction, using the formulas of Ref. 13. Derivatives
with respect to β are then obtained as O′(β, L) = aO + 2bO(β − βrun). Of course, this
method requires |βrun − βf | to be sufficiently small. We have carefully checked the results
by performing, for each L and p, runs at different values of β.
The MC estimates of the quantities introduced in Sec. II and in App. A at fixed Rξ ≡
ξ/L = 0.5943 are available on request.
IV. FINITE-SIZE SCALING ANALYSIS
In this section we present the results of our FSS analysis of the MC data at fixed Rξ =
0.5943.
0.0 0.1 0.2 0.3 0.4 0.5
-0.33
p=0.90
p=0.883
p=0.87
p=0.83
p=0.80
FIG. 3: MC estimates of Ū22 versus L
−ω with ω = 0.33 for different values of p. The dotted lines
show results of fits to c0 + c1L
−ε1 + c2L
−ε2 , fixing c0 = 0.148, ε1 = 0.33, and ε2 = 0.82. In the
RDIs universality class Ū∗22 = 0.148(1).
A. Renormalization-group invariant quantities
In Fig. 3 we show the MC estimates of Ū22 versus L
−ω with ω = 0.33(3), which is the
leading scaling exponent of the RDIs universality class. The data vary significantly with p
and L. This p and L dependence is always consistent with the existence of the expected
next-to-leading scaling corrections, i.e. with a behavior of the form
Ū22 = Ū
22 + c1L
−ε1 + c2L
−ε2 , (15)
where Ū∗22, ε1 and ε2 are fixed to the RDIs values:
13 Ū∗22 = 0.148, ε1 = 0.33 and ε2 =
0.66, 0.82. The fits corresponding to ε2 = 0.82 are shown in Fig. 3. Note that in most of
the cases it is crucial to include a next-to-leading correction. Only for p = 0.90 the data are
well fitted by taking only the leading scaling correction.
An unbiased estimate of ω can be obtained from the difference of data at different values
of p, i.e. by considering
Ū22(p1;L)− Ū22(p2;L) ≈ cL
−ω. (16)
Linear fits of the logarithm of these differences give results in reasonable agreement with
0.00 0.05 0.10 0.15 0.20
-0.82
p=0.90
p=0.883
p=0.87
p=0.83
p=0.80
FIG. 4: MC estimates of Ūim versus L
−0.82. The filled square on the vertical axis corresponds to
the RDIs estimate13 Ū∗im = 1.840(4).
the RDIs estimate ω = 0.33(3), especially when only data corresponding to L ≥ Lmin = 24
are used. For Lmin = 24 [Lmin = 32], we obtain ω = 0.27(2) [ω = 0.27(3)] from the data at
p1 = 0.83 and p2 = 0.90, ω = 0.19(5) [ω = 0.31(9)] from those at p1 = 0.883 and p2 = 0.90,
and ω = 0.29(3) [ω = 0.25(4)] from the results at p1 = 0.83 and p2 = 0.883. We also fitted
the difference Ū22− 0.148 at p = 0.90 to cL
−ε (for this value of p next-to-leading corrections
are apparently very small, see Fig. 3). We obtain ω = 0.35(3) [ω = 0.39(6)] for Lmin = 24
[Lmin = 32].
The results of the above-reported fits of Ū22 show that the leading scaling corrections
proportional to L−ω vanish for p ≈ 0.883. Note that, close to p∗, the relevant next-to-
leading scaling corrections should be those proportional to L−ω2 with ω2 ≈ 0.82. Indeed,
according to Eq. (10), the coefficient of those proportional to L−2ω is of order (p − p∗)2,
i.e. b12 ≈ b̄12(p − p
∗). Therefore, b12 is small if b̄12 = O(1) (we checked this numerically).
This applies to the FSS at p = 0.87 and 0.90, where the L−2ω corrections can be neglected,
although in these two cases we cannot neglect the leading L−ω correction whose coefficient is
proportional to p−p∗. An analysis of the leading scaling corrections at p = 0.87, 0.883, 0.90,
assuming the RDIs values Ū∗22 = 0.148(1) and ω = 0.33(3) (we perform combined fits to (15)
10 20 30 40 50
p=0.900, ε=0.82
p=0.900, ε=0.66
p=0.883, ε=0.82
p=0.870, ε=0.82
p=0.870, ε=0.66
p=0.830, ε=0.82
p=0.830, ε=0.66
p=0.800, ε=0.66
FIG. 5: Estimates of Ū∗im as obtained by fits to Ū
im+ cL
−ε, versus the minimum lattice size Lmin
allowed in the fits. Some data are slightly shifted along the x-axis to make them visible. The
dotted lines correspond to the RDIs estimate13 Ū∗im = 1.840(4).
with ε1 = ω) gives the estimate
p∗ = 0.883(3), (17)
which is approximately in the middle of the ferromagnetic line, i.e. 1 − p∗ ≈ (1 − pN)/2.
We performed a similar analysis for Ūd, obtaining a consistent estimate of p
∗. Thus, the ±J
Ising model for p = 0.883 is approximately improved. Therefore, at p = 0.883, fits of the
data assuming O(L−ω2) leading scaling corrections should provide reliable results.
As discussed in Sec. II, a useful quantity to perform stringent checks of universality within
the RDIs universality class is the combination Ūim of quartic cumulants reported in Eq. (11).
For this quantity the scaling corrections proportional to L−ω are small, cf. Eq. (13), and
thus the dominant corrections should behave as L−2ω, with 2ω ≈ 0.66. As already discussed,
for values of p close to p∗, such as p = 0.87, 0.883, 0.90, also the L−2ω term is expected to be
small and thus the dominant corrections should scale as L−ω2 with ω2 ≈ 0.82. In Fig. 4 we
show the MC results for Ūim for various values of p. Fig. 5 shows results of fits to
Ū∗ + cL−ε, (18)
0 20 40 60 80
 L, L
ε=0.82
ε=0.74
ε=0.90
FIG. 6: Estimates of Ū∗d as obtained by fits of Ūd at p = 0.883, to Ū
d + cL
−ε. Some data are
slightly shifted along the x-axis to make them visible. The dotted lines correspond to the RDIs
estimate13 Ū∗d = 1.500(1).
with ε = 0.66, 0.82. We obtain Ū∗im = 1.840(3)[3], 1.842(2)[1], 1.845(2)[3] respectively for
p = 0.90, 0.883, 0.87, fixing ε = ω2 = 0.82(8) (the error in brackets is related to the uncer-
tainty of ω2) and using data with L ≥ 32; moreover we obtain Ū
im = 1.847(3)[2], 1.840(7)[1]
respectively for p = 0.83, 0.80, fixing ε = 0.66(6) and using data with L ≥ 24. For all values
of p the results are in good agreement with RDIs estimate13 Ū∗im = 1.840(4). They provide
strong support to a RDIs critical behavior along the ferromagnetic line.
A further stringent check of universality comes from the analysis of the data Ūd at
p = 0.883, because the data of Ūd are very precise due to a cancellation of the statisti-
cal fluctuations.13 Since the model is improved, the L−kω scaling corrections are negligible
and the large-L behavior is approached with corrections of order L−ω2 , ω2 = 0.82(8). We
thus fit the data to Eq. (18) with ε = 0.74, 0.82, 0.90. In Fig. 6 we show the results. We
obtain Ū∗d = 1.5001(1)[15], 1.5004(2)[9], 1.5003(10)[6] (the error in brackets is related to the
uncertainty of ω2 = 0.82(8)) for Lmin = 12, 24, 48 respectively. Moreover, by fitting the data
to Ū∗d +c1L
−ε1+c2L
−ε2 with ε1 = 0.33 and ε2 = 0.82, we obtain Ū
d = 1.5006(7), 1.500(3) for
Lmin = 12, 24 respectively. These results are in perfect agreement with the RDIs estimate
10 15 20 25 30
0.665
0.670
0.675
0.680
0.685
0.690
R’ξ,im , ε=0.82
R’ξ,im , ε=0.66
 , ε=0.82
R’ξ , ε=0.82
R’ξ , ε=0.33
 , ε=0.82
p=0.883
FIG. 7: Estimates of the critical exponent ν, as obtained by fits of R̄′ξ, Ū
ξ,im, and U
4,im at
p = 0.883. Lmin is the minimum lattice size allowed in the fits. Some data are slightly shifted along
the x-axis to make them visible. The dotted lines correspond to the estimate ν = 0.682(3).
Ū∗d = 1.500(1). Such an agreement is also confirmed by the analysis of the data of Ū22, for
example a fit to Eq. (18) with ε = ω2 = 0.82(8) gives Ū
22 = 0.1486(8)[3] for Lmin = 32, to
be compared with the RDIs estimate13 Ū∗22 = 0.148(1).
We have not shown results for values of p too close to 1, for p > 0.90 say, because they
are affected by crossover effects due to presence of the Ising transition for p = 1, as it also
occurs in randomly dilute Ising models.16,21,32 For instance, for p = 0.94 the data are not
compatible with a behavior of the form (15) with Ū∗22 fixed to the RDIs value. Our data
that correspond to lattice sizes L ≤ 64 apparently converge to a smaller value, consistently
with the expected crossover from pure to random behavior (in pure systems Ū∗22 = 0). The
same quantitative differences are observed in the RSIM and in the RBIM close to the Ising
transition. This suggests that in FSS analyses up to L ≈ 100 the asymptotic RDIs behavior
can only be observed for p . 0.94.
B. Critical exponents
The correlation-length exponent ν can be estimated by fitting the derivative of Rξ and U4
to the expression (8). Accurate estimates are only obtained for improved Hamiltonians. For
generic models, as shown in Ref. 13, good estimates are only obtained by using improved
estimators, such as those reported in Eq. (12).
We analyze the data at p = 0.883, which is a very good approximation of the improved
value p∗ = 0.883(3). In Fig. 7 we report several results for the critical exponent ν, obtained
by analyzing R̄′ξ, Ū
4, and their improved versions R
ξ,im and U
4,im. We show results of fits of
their logarithms to
lnL+ a+ bL−ε, (19)
fixing ε to several values. Since the Hamiltonian is approximately improved, scaling cor-
rections are expected to decrease as L−ω2 with ω2 = 0.82(8). Since p = 0.883 is only
approximately equal to p∗, one may be worried of the residual leading scaling corrections
that are small but do not vanish exactly. Improved estimators should provide the most
reliable results since the leading scaling corrections are additionally suppressed.
As can be seen in Fig. 7, the results obtained by using R̄′ξ and Ū
4 and ε = 0.82 are
perfectly consistent with those obtained from their improved versions. This confirms that
the Hamiltonian is improved. Fits of R̄′ξ to (19) with ε = 0.33 do not provide stable results.
The results approach the values obtained in the other fits only when increasing the minimum
size Lmin allowed in the fit. This is expected, since the L
−ω corrections should be negligible
with respect to the L−ω2 ones. In conclusion, our final estimate of the correlation-length
exponent is
ν = 0.682(3), (20)
which includes all results (with their errors) of the fits of R̄′ξ, Ū
ξ,im, U
4,im to Eq. (19)
with ε = 0.82(8) and Lmin = 16, 24. Estimate (20) is in perfect agreement with the most
precise RDIs estimate ν = 0.683(2).
Estimate (20) is also confirmed by the analysis of the data at the other values of p. Fig. 8
shows results obtained by fitting the logarithm of R′ξ,im to the function (19) for other values
of p. They are definitely consistent with the result obtained at p = 0.883. Results for
p = 0.80 are not shown because the available data are not sufficient to get reliable results.
10 15 20 25 30
p=0.900, R’ξ,im, ε=0.82
p=0.900, R’ξ,im, ε=0.66
p=0.900, U’
, ε=0.82
p=0.870, R’ξ,im, ε=0.82
p=0.870, R’ξ,im, ε=0.66
p=0.870, U’
, ε=0.82
p=0.830, R’ξ,im, ε=0.66
p=0.830, U’
, ε=0.66
FIG. 8: Results from ν obtained by fitting R′ξ,im to aL
1/ν(1+bL−ε). Some data are slightly shifted
along the x-axis to make them visible. The dotted lines correspond to the estimate obtained at
p = 0.883, i.e. ν = 0.682(3).
In order to estimate the critical exponent η, we analyze the FSS of the magnetic suscep-
tibility χ̄, cf. Eq. (9). We fit it to aL2−η + b (where b represents a constant background
term), to aL2−η(1 + cL−ε), and to aL2−η(1 + cL−ε) + b (more precisely, we fit ln χ̄ to the
logarithm of the previous expressions). The results at p = 0.883 are shown in Fig. 9, versus
the minimum size Lmin allowed in the fits. We obtain the estimate
η = 0.036(2), (21)
which includes all results obtained for Lmin & 16. This estimate agrees with the most precise
RDIs estimate η = 0.036(1). Fig. 10 shows results for the other values of p. Again, they are
in good agreement.
C. The critical temperature
The critical temperature can be estimated by extrapolating the estimates of βf at Rξ =
0.5943, cf. Eq. (6). Since we have chosen Rξ = 0.5943 ≈ R
ξ = 0.5944(7),
13 we expect
10 15 20 25 30
+cost
ε=0.82
ε free
+cost & ε=0.82
p=0.883
FIG. 9: Estimates of the critical exponent η, obtained by fitting ln χ̄ at p = 0.883, to a + (2 −
η) lnL+bLη−2 (denoted by +cost), a+(2−η) ln L+a1L
−ε, and to a+(2−η) ln L+a1L
−ε+bLη−2.
Some data are slightly shifted along the x-axis to make them visible. The dotted lines correspond
to the final estimate η = 0.036(2).
in general that βf − βc = O(L
−1/ν−ω). For p = 0.883, since the model is approximately
improved, the leading scaling corrections are related to the next-to-leading exponent ω2.
Thus, in this case βf −βc = O(L
−1/ν−ω2). This behavior is nicely observed in Fig. 11, which
shows βf(L) at p = 0.883 vs L
−1/ν−ω2 with 1/ν + ω2 ≈ 2.28. A fit to βc + aL
−1/ν−ω2 gives
βc = 0.300611(1).
For the other values of p we expect βf − βc = O(L
−1/ν−ω) with 1/ν + ω ≈ 1.79. Linear
fits of βf (L) (for L ≥ Lmin with Lmin sufficiently large to give an acceptable χ
2) give the
estimates βc = 0.25544(2) for p = 0.94, βc = 0.285285(5) for p = 0.90, βc = 0.313748(1) for
p = 0.87, βc = 0.365459(5) for p = 0.83, βc = 0.42501(3) for p = 0.80. We finally recall
that18 βc = 0.22165452(8) for p = 1 (the standard Ising model), and that
7 βc = 0.5967(11)
at the multicritical Nishimori point at pN = 0.7673(3). In Fig. 12 we plot the available
estimates of the critical temperature Tc ≡ 1/βc in the region 1 ≥ p ≥ pN .
The estimates of Tc shown in Fig. 12 hint at a smooth linear behavior for small values of
w ≡ 1− p, close to the Ising point at w = 0. This can be explained by some considerations
10 15 20 25 30 35
p=0.900, +cost
p=0.900, ε free
p=0.870, +cost
p=0.870, ε free
p=0.830, +cost
p=0.830, ε free
FIG. 10: Estimates of the critical exponent η, obtained by fitting χ̄ for various values of p. See the
caption of Fig. 9 for an explanation of the fits. Some data are slightly shifted along the x-axis to
make them visible. The dotted lines correspond to the result η = 0.036(2) obtained at p = 0.883.
on the multicritical behavior around the Ising point at w = 0. The Ising critical behavior
at w = 0 is unstable against the RG perturbation induced by quenched disorder at w > 0,33
which leads to the RDIs critical behavior. Indeed such a perturbation has a positive RG
dimension yw at the Ising fixed point:
10,34 yw = αIs/νIs = 2/νIs − 3 where αIs and νIs are the
Ising specific-heat and correlation-length critical exponents, and therefore17 yw = 0.1740(8).
Thus, in the absence of an external magnetic field, beside the scaling field ut related to the
temperature, there is another relevant scaling field uw associated with the quenched disorder
parameter w ≡ 1− p. General RG scaling arguments21,35 show that the singular part of the
free energy for w → 0 behaves as
Fsing ∼ u
2−αIs
t F (X), X = uwu
t , (22)
where φ = ywνIs = αIs = 0.1096(5) is the crossover exponent, and F (X) is a crossover
scaling function which is universal (apart from normalizations). The scaling fields ut and
uw depend on the parameters of the model. In general, we expect
ut = t+ kw, (23)
0.000 0.002 0.004 0.006
-1/ν−ω2
0.3006
0.3007
FIG. 11: Estimates of βf (L) at p = 0.883 versus L
−(1/ν+ω2) for 1/ν + ω2 ≈ 2.28. The dashed line
corresponds to a linear fit of the data for L ≥ 12.
where t ≡ T/TIs − 1, TIs is the critical temperature of the Ising model, and k is a constant.
No such mixing between t and w occurs in uw, since uw vanishes for w = 0. Hence, we can
take uw = w. The system has a critical transition for w > 0 at Tc(w). Since the singular
part of the free energy close to a critical point behaves as (T −Tc)
2−α (α = −0.049(6) is the
specific-heat exponent of the RDIs universality class), we must have F (Xc) = 0, where Xc
is the value of X obtained by setting T = Tc(w) (see, e.g., Ref. 36 and references therein).
Hence, we obtain
w [Tc(w)/TIs − 1 + kw]
= Xc, (24)
and therefore
Tc(w)/TIs − 1 = (w/Xc)
1/φ − kw + · · · , (25)
where the dots indicate higher-order terms. This expression provides the w dependence of
the critical temperature for w small. Note that the nonanalytic term in Eq. (25) is suppressed
with respect to the analytic ones, because 1/φ ≈ 9.1. Thus, Tc(w) ≈ TIs(1 − kw + O(w
Since Tc(w) < TIs, we can also infer that k > 0. From the results for Tc(w) we estimate
k ≈ 2.2 for the ±J Ising model.
0.0 0.1 0.2
Ising
multicritical point
FIG. 12: The critical temperature Tc ≡ 1/βc vs 1− p.
V. CONCLUSIONS
In this paper we have studied the critical behavior of the 3D ±J Ising model at the
transition line between the paramagnetic and the ferromagnetic phase, which extends from
p = 1 to a multicritical (Nishimori) point at p = pN ≈ 0.767. We presented a FSS analysis of
MC simulations at various values of p in the region pN < p < 1. The results for the critical
exponents and other universal quantities are consistent with those of the RDIs universality
class. For example, we obtained ν = 0.682(3) and η = 0.036(2), which are in good agreement
with the presently most accurate estimates13 ν = 0.683(2) and η = 0.036(1) for the 3D RDIs
universality class. Therefore, our FSS analysis provides a strong evidence that the critical
behavior of the 3D ±J Ising along the ferromagnetic line belongs to the 3D RDIs universality
class.
We also note that the random-exchange interaction in the ±J Ising model gives rise
to frustration, while the RDIs universality class describes transitions in generic diluted
Ising systems with ferromagnetic exchange interactions. This implies that frustration is
irrelevant at the ferromagnetic transition line of the 3D ±J Ising model. Moreover, the
observed scaling corrections are consistent with the RDIs leading and next-to-leading scaling
correction exponents ω = 0.33(3) and ω2 = 0.82(8). This indicates that frustration does not
introduce new irrelevant perturbations at the RDIs fixed point with RG dimension yf & −1.
APPENDIX A: NOTATIONS
We define the two-point correlation function
G(x) ≡ 〈σ0 σx〉, (A1)
where the overline indicates the quenched average over the Jxy probability distribution.
Then, we define the corresponding susceptibility χ ≡
xG(x) and the correlation length ξ
G̃(0)− G̃(qmin)
q̂2minG̃(qmin)
, (A2)
where qmin ≡ (2π/L, 0, 0), q̂ ≡ 2 sin q/2, and G̃(q) is the Fourier transform of G(x). We also
consider quantities that are invariant under RG transformations in the critical limit. Beside
the ratio
Rξ ≡ ξ/L, (A3)
we consider the quartic cumulants U4, U22 and Ud defined by
, (A4)
U22 ≡
µ22 − µ2
Ud ≡ U4 − U22,
where
µk ≡ 〈 (
k〉 . (A5)
We also define corresponding quantities Ū4, Ū22, and Ūd at fixed Rξ = 0.5943. Finally, we
consider the derivative R′ξ of Rξ, and U
4 of U4, with respect to β ≡ 1/T , which allow one to
determine the critical exponent ν.
1 H. Nishimori, Prog. Theor. Phys. 66, 1169 (1981).
2 P. Le Doussal and A.B. Harris, Phys. Rev. Lett. 61, 625 (1988).
3 N. Kawashima and H. Rieger, in Frustrated Spin Systems, edited by H.T. Diep (World Scientific,
Singapore, 2004); cond-mat/0312432.
4 H. Katzgraber, M. Körner, and A.P. Young, Phys. Rev. B 73, 224432 (2006).
5 Y. Ozeki and H. Nishimori, J. Phys. Soc. Japan 56, 3265 (1987).
6 R.R.P. Singh, Phys. Rev. Lett. 67, 899 (1991).
7 Y. Ozeki and N. Ito, J. Phys. A 31, 5451 (1998).
8 Refs. 5,6,7 report the estimates pN = 0.767(2), pN = 0.7656(20), and pN = 0.7673(3) respec-
tively.
9 K. Hukushima, J. Phys. Soc. Japan 69, 631 (2000).
10 A. Pelissetto and E. Vicari, Phys. Rept. 368, 549 (2002).
11 R. Folk, Yu. Holovatch, and T. Yavors’kii, Uspekhi Fiz. Nauk 173, 175 (2003) [Phys. Usp. 46,
175 (2003)].
12 H. G. Ballesteros, L. A. Fernández, V. Mart́ın-Mayor, A. Muñoz Sudupe, G. Parisi, and
J. J. Ruiz-Lorenzo, J. Phys. A 32, 1 (1999).
13 M. Hasenbusch, F. Parisen Toldin, A. Pelissetto, and E. Vicari, J. Stat. Mech.: Theory Exp.
P02016 (2007).
14 P. Calabrese, V. Mart́ın-Mayor, A. Pelissetto, and E. Vicari, Phys. Rev. E 68, 036136 (2003).
15 A. Pelissetto and E. Vicari, Phys. Rev. B 62, 6393 (2000).
16 H.G. Ballesteros, L.A. Fernández, V. Mart́ın-Mayor, A. Muñoz Sudupe, G. Parisi, and J.J.
Ruiz-Lorenzo, Phys. Rev. B 58, 2740 (1998).
17 M. Campostrini, A. Pelissetto, P. Rossi, and E. Vicari, Phys. Rev. E 65, 066127 (2002).
18 Y. Deng and H.W.J. Blöte, Phys. Rev. E 68, 036125 (2003).
19 W. Janke, in Proceedings of the XXIII International Symposium on Lattice Field Theory,
Dublin, July 2005, POS(LAT2005)018
20 N. Ito, Y. Ozeki, and H. Kitatani, J. Phys. Soc. Jpn. 68, 803 (1999).
21 P. Calabrese, P. Parruccini, A. Pelissetto, and E. Vicari, Phys. Rev. E 69, 036120 (2004).
22 M. Hasenbusch, J. Phys. A 32, 4851 (1999).
23 M. Campostrini, M. Hasenbusch, A. Pelissetto, and E. Vicari, Phys. Rev. B 74, 144506 (2006);
Phys. Rev. B 63, 214503 (2001).
24 See, e.g., S. Wansleben, J.B. Zabolitzky, and C. Kalle, J. Stat. Phys. 37, 271 (1984); G. Bhanot,
D. Duke, and R. Salvador, Phys. Rev. B 33, 7841 (1986).
25 M. Lüscher, Comput. Phys. Commun. 79, 100 (1994).
26 The SIMD-oriented fast Marsenne twister random number generator has been introduced by
M. Matsumoto and M. Saito. Details can be found in M. Saito, Master Thesis (2007) and at
http://www.math.sci.hiroshima-u.ac.jp/∼m-mat/MT/emt.html.
27 In order to make the use of these expensive (in terms of CPU-time) generators affordable,
we employed the same sequence of random numbers for the update of all nbit systems (for
the initialization of the configurations at the beginning of the simulation we used independent
random numbers for each of the systems). This may give rise to a statistical correlation among
the nbit systems. This effect is probably small and we have not detected it. Anyway, in order to
ensure a correct estimate of the statistical error, all nbit systems that use the same sequence of
random numbers have been put in the same bin in our jackknife analysis.
28 R.H. Swendsen and J-S. Wang, Phys. Rev. Lett. 58, 86 (1987).
29 U. Wolff, Phys. Rev. Lett. 62, 361 (1989).
30 D. Ivaneyko, J. Ilnytskyi, B. Berche, and Yu. Holovatch, Physica A 370, 163 (2006).
31 H.G. Ballesteros, L.A. Fernández, V. Mart́ın-Mayor, A. Muñoz Sudupe, G. Parisi, and J.J.
Ruiz-Lorenzo, Nucl. Phys. B 512, 681 (1998).
32 The crossover exponent from pure Ising to RDIs critical behavior is the Ising specific-heat
exponent34 αIs, see also Sec. IVC. This implies that the crossover scaling variable in the FSS at
Tc is given by the combination X = cwL
αIs/νIs , where w = 1 − p, αIs/νIs = 0.1740(8), and c is
a normalization constant. When w → 0, strong crossover effects are expected for X . 1, which
corresponds to L . (cw)−5.75. The RDIs asymptotic critical behavior is observed for X ≫ 1.
33 A. B. Harris, J. Phys. C 7, 1671 (1974).
34 A. Aharony, in Phase Transitions and Critical Phenomena, edited by C. Domb and J. Lebowitz
(Academic Press, New York, 1976), Vol. 6, p. 357.
35 I.D. Lawrie and S. Sarbach, in Phase Transitions and Critical Phenomena, Vol. 9, edited by C.
Domb and J. Lebowitz (Academic, London, 1984).
36 A. Pelissetto and E. Vicari, cond-mat/0702273.
ABSTRACT
  We study the critical behavior of the three-dimensional $\pm J$ Ising model
[with a random-exchange probability $P(J_{xy}) = p \delta(J_{xy} - J) + (1-p)
\delta(J_{xy} + J)$] at the transition line between the paramagnetic and
ferromagnetic phase, which extends from $p=1$ to a multicritical (Nishimori)
point at $p=p_N\approx 0.767$. By a finite-size scaling analysis of Monte Carlo
simulations at various values of $p$ in the region $p_N<p<1$, we provide strong
numerical evidence that the critical behavior along the ferromagnetic
transition line belongs to the same universality class as the three-dimensional
randomly-dilute Ising model. We obtain the results $\nu=0.682(3)$ and
$\eta=0.036(2)$ for the critical exponents, which are consistent with the
estimates $\nu=0.683(2)$ and $\eta=0.036(1)$ at the transition of
randomly-dilute Ising models.

<|endoftext|><|startoftext|>
Introduction
Protostars are young stellar objects that are still in the process of accreting the bulk of
their material. Class 0 sources have been proposed as the evolutionary precursors to the class
I protostars (André, Ward-Thompson, & Barsony 1993; André & Montmerle 1994). Work
by Bontemps et al. (1996) and Saraceno et al. (1996) suggested a direct evolutionary se-
quence from the class 0 stage to the class I stage. However, Jayawardhana, Hartmann, & Calvet
(2001) have argued that class 0 protostars are located preferentially in higher density regions
and that class I young stellar objects are located preferentially in lower density regions and,
thus, may be at a comparable evolutionary states. To further complicate the distinction be-
tween class 0 and class I objects, some class I objects, which are viewed at high inclination,
may appear “class 0-like” because of the high optical depth associated with viewing the disk
(nearly) edge-on (Masunaga & Inutsuka 2000).
An example of this confusion may be seen in the binary system L1448N(A,B), where one
component of the system has the spectral energy distribution (SED) of a class 0 protostar
while the other has an SED of a class I protostar (Ciardi et al. 2003; O’Linger et al. 2006).
This spread in apparent evolutionary status is also seen in larger clusters. In Perseus,
Rebull et al. (2007) found that sources within clusters exhibited SEDs for a wide range
in circumstellar environments, suggesting class 0 to class II protostars within the same
aggregate. Either viewing geometry plays a significant role in our interpretation for each
of the SEDs, the aggregates span a significant age spread, or if the sources are coeval, the
disks/envelopes evolve faster than anticipated.
To help address these issues, it would be beneficial to study a set of young stellar objects,
located within the same environment, but isolated and free from the influence of other active
star formation. The Bok Globule CB54, known to be a site of active star formation, may
provide such an environment of isolated aggregate star formation.
CB54 (LBN 1042; Clemens & Barvainis 1988) is a ∼ 100 M⊙ Bok globule associated
with the Vela OB1 molecular cloud (d≈1500 pc; Launhardt & Henning 1997; Launhardt, Ward-Thompson, & Henning
1997). Bok globules are small (10−100 M⊙; Clemens, Yun, & Heyer 1991), isolated molecular
clouds, most of which have been identified via opaque patches in optical images (Clemens & Barvainis
1988; Bourke, Hyland, & Robinson 1995). Globules have been found to be sites of star for-
mation (e.g., Yun & Clemens 1990, 1994a,b; Alves & Yun 1995; Moreira & Yun 1995), of
both single low-mass stars and multiple or binary stars (e.g., Yun 1996).
Active star formation in CB54 was first identified by the association of a dense core with
the IRAS point source PSC 07020-1618, which is located at the center of a collimated molec-
ular outflow (Yun & Clemens 1994b). Near-infrared imaging revealed that CB54 actually
– 3 –
contains two bright near-infrared sources (CB54YC1-I, CB54YC1-II) and diffuse nebulosity
of shocked H2 emission connecting the two sources (see Figure 1 and Yun 1996; Khanzadyan
2003). The positions of CB54YC1-I and -II are offset from the position of the IRAS point
source (see Figure 1), although the IRAS beam size and PSC positional errors (20′′ × 4′′) do
make it difficult to associate any one source with IRAS 07020-1618.
Based upon its near-infrared colors (J−K = 5.29 mag, H−K = 2.58 mag), CB54YC1-
II was classified as a class I young stellar object (Yun 1996). There is an MSX point source
(G228.9946-04.6200), detected only in Band-A (8.3 µm), within a few arcsecs of CB54YC1-II
(see Figure 1). CB54YC1-I (J −K = 4.34 mag, H −K = 1.63 mag) was also classified as a
class I protostar (Yun 1996), but despite its similar near-infrared brightness to CB54YC1-II,
it was not detected by MSX. Unlike CB54YC1-II, the near-infrared colors of CB54YC1-I
could be explained with a highly extinguished (AV ∼ 20 mag) “bare” photosphere. VLA ob-
servations detected a 3.6 cm and 6 cm source within a few arcseconds of CB54YC1-I (see Fig-
ure 1), possibly indicating a stellar wind or accretion shock (Yun et al. 1996; Moreira et al.
1997).
The position of the IRAS point source is spatially coincident with a dense core revealed
in sub-mm, mm, and molecular line mapping; the position of which is offset from the positions
of CB54YC1-I and CB54YC1-II but coincident with the IRAS point source (see Figure 1
and e.g., Wang et al. 1995; Zhou, Evans, & Wang 1996; Henning et al. 2001). Molecular
line observations also indicated the presence of gravitational collapse in the core of CB54
(Wang et al. 1995; Afonso, Yun, & Clemens 1998). Water maser emission, almost exclusively
associated with class 0 protostars in regions of low-mass star formation (e.g., Furuya et al.
2001), was also discovered in CB54 (de Gregorio-Monsalvo et al. 2006; Gómez et al. 2006).
All of this suggests that the star formation in CB54 may be more substantial than revealed
by the near-infrared imaging alone.
We have observed the mid-infrared emission from the Bok globule CB54 at high spatial
resolution (∼ 0.′′5) to clarify the evolutionary status of CB54YC1-I and CB54YC1-II and
to search for additional protostars embedded in the globule core. Our work confirms the
protostellar status of CB54YC1-II, but indicates that CB54YC1-I may be a more evolved
young stellar object or a background giant star. In addition, we have discovered three new
mid-infrared sources which are spatially coincident with the dense core and may be class 0
protostars.
– 4 –
2. Observations and Data Reduction
Mid-infrared imaging observations of CB54 were made on 2004 February 01 (UT) using
the Thermal Region Camera and Spectrograph (T-ReCS; Telesco et al. 1998) on the Gemini
South 8 m telescope. T-ReCS utilizes a 320×240 pixel Si:As blocked impurity band detector,
with a spatial scale of 0.′′089 pixel−1 and a field of view of 28.′′8×21.′′6. The observations were
centered on the J2000 coordinates (α, δ) = (α = 07h04m21s, δ = −16◦23′19′′). Imaging was
obtained in three filters (N, Si-11.7 and Qa-18.3). The on-sky alignment of the T-ReCS field-
of-view was chosen to cover the entire near-infrared nebulosity, the two known near-infrared
sources, and the peak of the sub-millimeter (850 µm) core (see Figure 1).
A standard off-chip 15′′ north-south chop-nod sequence was employed with total on-
source integration times of 300 s per image. Three exposures in the Qa-18.3 filter were
acquired for a total on-source integration time of 900 s. Flux calibration was obtained
from imaging of the standard star HD 32887 (see the Gemini webpage for a compilation
of mid-infrared standard stars and flux densities.)1. The weather quality was listed as the
50th-percentile, and the seeing at 11.7 µm was ≈ 0.′′4. A summary of the filters, frame times,
total integration time per filter, and associated air masses is given in Table 1.
The data were reduced with custom-written IDL routines for the T-ReCS data format.
Four mid-infrared sources were detected by our observations, with no evidence of extended or
diffuse mid-infrared emission (see Figure 2). Standard aperture photometry was performed
using a 1′′ aperture radius. Detection limits were tested by inserting fake sources into the
images and performing aperture photometry. A summary of the photometry (including
estimated 1σ upper limits) and relative positional offsets with respect to CB54YC1-II is
given in Table 2.
3. Discussion
The near-infrared source CB54YC1-II is the brightest mid-infrared source and is de-
tected in each of the mid-infrared filters. The other near-infrared source CB54YC1-I was
not detected in any of the mid-infrared imaging. In addition, three new mid-infrared sources
have been detected (MIR-a, MIR-b, and MIR-c). MIR-a and MIR-b were detected in each
of three mid-infrared filters, while MIR-c was detected only at 18.3 µm. In the following
sections, we evaluate the properties of these sources and discuss the possible star formation
history of the globule.
1 http://www.gemini.edu/sciops/instruments/mir/MIRPhotStandards.html
http://www.gemini.edu/sciops/instruments/mir/MIRPhotStandards.html
– 5 –
3.1. CB54YC1-II
CB54YC1-II, originally classified as a candidate class I protostar (Yun 1996), has a
2.2 − 10.3 µm spectral index [α = −d log(νFν)/d log(ν)] of α = 0.34 ± 0.01, which is
consistent with the spectral index expected for a class I/flat spectrum young stellar object.
The SED of CB54YC1-II in Figure 3. For comparison, the SED for a confirmed class I
young stellar object (IRAS 04195+2251; Eisner, J. et al. 2006), with a similar spectral index
α(2.2−10.6 µm) ≈ 0.3, has been scaled to the SED of CB54YC1-II (see Figure 3), exhibiting
good agreement between the SEDs.
If the mid-infrared emission is primarily the result of gravitational infall, (e.g., Ciardi et al.
2003), the mid-infrared luminosity provides a means of estimating the central protostellar
mass. Integrating the SED from 1 − 20 µm, we estimate the mid-infrared luminosity to be
Lmidir ≈ 8 ± 2 L⊙, for an assumed distance of 1500 pc. We estimate a central source
mass from the relation L = (GṀM∗)/R∗, where Ṁ is the mass infall rate, R∗ is the
source size, and M∗ is the source mass (Shu, Adams, & Lizano 1987). Using a standard
R∗ = 3 R⊙ protostellar radius (Stahler, Shu, & Taam 1980) and typical mass accretion rates
of Ṁ = 10−5 − 10−6 M⊙ yr
−1 (Kenyon, Calvet, & Hartmann 1993), we estimate the central
protostellar mass for CB54YC1-II to be M∗ = 0.08− 0.8 M⊙.
A class II pre-main sequence star located behind a wall of extinction could also explain
the observed SED for CB54YC1-II. In Figure 3, a median SED for T Tauri stars (TTS)
(D’Alessio et al. 1999) has been scaled and convolved with an extinction model (R=3.1
assumed, Mathis 1990). In order to match both the mid-infrared flux densities and the
slope of the near-infrared, the TTS SED must be extinguished by AV ≈ 25 magnitudes, and
indeed, the position of CB54YC1-II in a JHK color-color (J −H = 2.71 mag, H −K = 2.58
mag) diagram is consistent with a heavily extinguished TTS (see Figure 4 in Haisch et al.
2000).
The average volume density of the CB54 envelope (i.e., not including the central con-
densation which is offset from the near-infrared sources), as derived from sub-mm (450 &
850 µm) imaging, is 〈nH〉 ≈ 5 × 10
4 cm−3 (Henning et al. 2001). At 1500 pc, the projected
linear radius of the CB54 envelope is r ≈ 22500 AU (≈ 15′′). If we assume the globule is
spherical, we can derive a peak column density of NH ≈ 4×10
22 cm−2, which corresponds to
a visual extinction of AV ≈ 20 mag. The extinction estimatation does not take into account
specific structure within the cloud including any dense envelope which may immediately
surround the source, but does indicate the above derived extinction levels for CB54YC1-II
are possible. Without a more complete SED or spectroscopy, especially at 3 − 8 µm, it is
difficult to distinguish between the class I and class II models for CB54YC1-II
– 6 –
3.2. CB54YC1-I
CB54YC1-I was not detected in any of the three mid-infrared images, calling into ques-
tion the original class I protostellar classification. Scaling the SED for the class I protostar
(IRAS 04295+2251) to the JHK SED of CB54YC1-I, the predicted mid-infrared flux densi-
ties for CB54YC1-I are Fν ∼ 80− 100 mJy at 11.7 µm and Fν ∼ 150 mJy at 18.3 µm. The
predicted emission is ∼ 10σ above the detection limits (see Figure 4.
The JHK slope for CB54YC1-I is too steep to match the near-infrared SED for a class II
(TTS) pre-main sequence star. However, if TTS SED is modified with a screen of foreground
extinction, the near-infrared SED for CB54YC1-I can be reproduced with a class II pre-
main sequence model. In Figure 4, as was done for CB54YC1-II, the median TTS SED
(D’Alessio et al. 1999) has been scaled and convolved with an extinction model and fitted to
the JHK data for CB54YC1-I. For a best-fit extinction of AV = 15− 17 mag, the TTS SED
can reproduce the near-infrared data. However, the model predicts mid-infrared densities
(Fν ∼ 20−40 mJy at 11.7 µm and Fν ∼ 50−70 mJy at 18.3 µm). The predicted mid-infrared
emission for the TTS SED is ∼ 7σ above the detection limits.
It is possible that CB54YC-I is a more evolved star embedded in the globule or simply
a star background to the globule. To explore these possibilities, we have fitted the JHK
photometry with a blackbody function modified by a line-of-sight extinction curve: Sν =
ΩBν(T ) exp (−Aν/1.086), where Bν(T ) is the Planck function, Aν is the frequency-dependent
extinction, and Ω is the solid angle. At each extinction value in the range from AV = 0− 30
mag (∆AV = 0.1 mag), a range of temperatures (T = 500−50000 K in steps of 100 K) were
tested.
For a given extinction value, there is a unique blackbody temperature for which the chi-
square is a minimum, but there is no global minimum representing a best fit the JHK data.
The average temperature uncertainty for a given extinction value is ∼ 500 K. In Figure 5, the
resulting reduced chi-squares and temperatures for each of the trial extinctions are plotted.
The fitting uncertainty in the temperature for a given extinction value is approximately
10%. The reduced chi-square curve is relatively flat between 0 ≤ AV ≤ 26 mag. Beyond
AV = 26 mag, the reduced chi-square climbs above χ
≈ 1 and begins to diverge rapidly.
The best fit temperature at this boundary is T ≈ 30000 K. Because of the rapid change in
the chi-square beyond this point, we regard this as the upper bound for the extinction and
source temperature of CB54YC1-I.
The lower bound to the extinction and temperature is constrained only by the detection
limits of the mid-infrared observations. The combination of temperature and extinction must
be such that CB54YC1-I is not detected in all three mid-infrared filters (N, Si-11.7, & Qa-
– 7 –
18.3). For example, at zero extinction (AV = 0 mag), the best fit temperature is T ≈ 1100
K and the reduced chi-square is χ2
< 1, but the predicted mid-infrared flux densities violate
the non-detections at both N and 11.7 µm (see Figure 4).
The blackbody plus extinction models only predict mid-infrared flux densities below
the detection limits in all three mid-infrared filters if AV & 23 mag. As an example, the
predicted N-band flux densities (the most sensitive of the three observations) are plotted as
a function of visual extinction in Figure 5 (bottom), showing that the predicted flux density
drops below the detection limit at AV & 23 mag. We, therefore, regard AV ≈ 23 mag and
the corresponding temperature (T ≈ 6000 K) as the lower bound for CB54YC1-I. The lower
(AV = 23 mag, T = 6000 K) and upper (AV = 26 mag, T = 30000 K) bounds for the
blackbody model fits to CB54YC1-I are shown in Figures 4.
If the extinction is AV = 23 mag, the temperature (T ≈ 6000 K) suggests that the
CB54YC1-I could be an early-G or late-F star. A main sequence dwarf of this spectral type
has an absolute K magnitude ofMK ≈ 2.7 mag. With a measured K magnitude ofK = 11.76
(Yun 1996), the implied distance is only ∼ 250 pc, much too close to be extinguished by
CB54. If, however, CB54YC1-I is a G or F giant star, the star would be approximately 4.0
magnitudes brighter and located at a distance of ≈ 1500 pc, sufficiently distant to place
CB54YC1-I behind the globule.
At the upper limit to the model fitting, (AV ≈ 26 mag, T ≈ 30000 K), the temperature
corresponds to a B0 star. With an estimated absolute magnitude of MK = −3. CB54YC1-I
would be at a distance of 2500 pc which is far enough to be behind the globule. The rarity
and short lifespan (10 MYr) of B0 stars and the required chance alignment with the globule
seem to make this a remote possibility.
If, instead, the extinction is near the middle of the extinction range (AV = 24 − 25
mag), the best fit temperature (T ≈ 10000 − 15000 K) implies that CB54YC1-I may be a
young A or B star. With an absolute K magnitude of MK ≈ −1.5− 0, this yields a distance
of only 1200− 1500 pc placing CB54YC1-I at a distance consistent with being an embedded
young star.
An alternative explanation for CB54YC1-I is that the source is an embedded protostar
viewed at an extremely high inclination angle, and the near-infrared detections are not of
the central protostar, but of light scattered by the accretion disk into our line of sight.
Unfortunately, near-infrared photometry alone can not distinguish these models.
– 8 –
3.3. Mid-Infrared Sources
Three new sources have been detected by our mid-infrared observations. These sources
have no near-infrared counterparts. The mid-infrared sources lie just beyond the edge of
the shocked H2 emission and do not correspond to any of the knots or condensations visible
in the near-infrared diffuse emission (see Figure 1 and Yun 1996; Khanzadyan 2003). The
mid-infrared sources, however, are located within the boundaries of the dense core in CB54
and clustered near the position of the IRAS point source (see Figure 1). The two brightest
sources (MIR-a, MIR-b) were detected in all three filters, but MIR-c was detected only at
18.3 µm. A summary of the photometry is given in Table 2, and the SEDs for these sources
are presented in Figure 6.
To characterize and understand the relative temperatures of the mid-infrared sources, a
single temperature blackbody was fit to both MIR-a and MIR-b. The blackbody fits did not
include the broad-band N (10.3 µm) flux density which are potentially contaminated with
an unknown amount of amorphous silicate, and were fit to the narrow-band 11.7 µm and
18.3 µm flux densities. The best-fit blackbody (Figure 6) temperatures for the two sources
are quite similar (TA = 110 ± 10 K and TB = 100 ± 10 K) and are near what is expected
for the bolometric temperatures of class 0 protostars (André, Ward-Thompson, & Barsony
2000). If the N-band photoometry is included in the fits, the resulting temperatures increase
by 10− 20 K.
Blackbody fits to the sub-mm and mm emission from the dense core yields a much
colder envelope temperature of 25 K (Launhardt, Ward-Thompson, & Henning 1997). The
summed flux densities predicted by the mid-infrared blackbody fits is not sufficient to explain
the 100 µm flux density for IRAS PSC 07020-1618 (Fν ≈ 100 Jy), indicating that these class
0 protostars are harbored within the cold, dense envelope.
If we assume that the mid-infrared emission is optically thin and little emission is con-
tributed from the cold envelope, we can estimate the protostellar core mass associated with
mid-infrared emission via
(16/3)πaρD2
QνBν(Td)
Fν (1)
where Fν is the observed flux density at frequency ν, Qν is the grain emissivity at frequency
ν, a is the grain radius, ρ is the grain mass density, D is the distance to CB54, and Bν
is the Planck function at dust temperature Td. Assuming a = 0.5 µm, ρ = 1 g cm
Qν = 0.1(λ/µm)
−α, and α = 0.45 (e.g., Muthumariappan, Kwok, & Volk 2006), we estimate
dust masses for MIR-a and MIR-b of Ma ≈ 0.014 M⊙ and Mb ≈ 0.043 M⊙, respectively.
For an average gas-to-dust mass ratio of 100, the central mid-infrared cores have masses of
Ma ≈ 1.4 M⊙ and Mb ≈ 4.3 M⊙.
– 9 –
MIR-c was detected only at 18.3 µm, but if use the 11.7µm uppper limit to restrict
the blackbody fitting, we find an upper limit to the temperature of 120 K. (see Figure 6).
Coupled with the 18.3 µm flux density, we estimate a limit to the total mass (gas+dust) of
Mc & 0.2 M⊙.
4. Conclusions
We have obtained high angular resolution 10 − 18 µm imaging of CB54, a ∼ 100 M⊙
Bok globule known to harbor a dense core and two near-infrared sources previously classified
as class I young stellar objects. We have detected only one (CB54YC1-II) of the two near-
infrared sources, confirming its protostellar evolutionary status. Based upon the mid-infrared
luminosity, we estimate that the central protostellar mass for CB54YC1-II is M∗ = 0.08 −
0.8M⊙, depending on the mass transfer rate. The SED is also consistent with a more evolved
T Tauri star behind a screen of extinction. Without a more complete SED, it is not possible
to distinguish between these models.
The other near-infrared source (CB54YC1-I) should have been detected if it were a class
I protostar similar to that CB54YC1-II. We find that the near-infrared SED is consistent
with the SED for a more evolved star extinguished by the globule itself. CB54YC1-I may be
a background F- or G-giant or may be an embedded young A- or B-star. An alternative ex-
planation for CB54YC1-I is that the source is an embedded protostar viewed at an extremely
high inclination angle, and the near-infrared detections are not of the central protostar, but
of light scattered by the accretion disk into our line of sight. High spatial resolution near-
infrared polarimetry and/or mid-infrared spectroscopy could be used to ascertain the status
CB54YC1-I. If CB54YC1-I is an embedded, young A or B star, its mass may be on the order
of ∼ 2− 5 M⊙.
Additionally, we have discovered three new mid-infrared sources (MIR-a, MIR-b, and
MIR-c) which are spatially coincident with both the position of the associated IRAS point
source and the center of the dense core in CB54. These sources are characterized with a
100 K blackbody, consistent the expected bolometric temperature of a class 0 protostar.
Based upon the mid-infrared emission, we have estimated the masses for these sources to be
∼ 4 M⊙, ∼ 1.5 M⊙, and ∼ 0.2 M⊙.
If CB54YC1-I is indeed an embedded A or B star, it is interesting to speculate that
CB54YC1-I may have formed first and induced star formation further in the cloud, through
the interaction of its outflow/winds with remainder of the globule. The total mass estimated
for the sources within CB54 is about 10− 15 M⊙ or about 10− 15% of the total cloud mass.
– 10 –
Such a sequential process of star formation occurring in the Bok globule CB54 may be similar
to what is observed in other globules (Huard, Weintraub, & Sandell 2000; Codella et al.
2006). Spectroscopy and a more complete SED in the mid-infrared and far-infrared is needed
to disentangle the possible spectral types and evolutionary states for the sources in CB54.
These observations were carried out during payback time to C. Telesco for develop-
ment of T-ReCS. The authors would like to thank Charlie Telesco, Chris Packham, and
Margaret Moerchen for collecting these data. Portions of this work were supported by the
California Institute of Technology under contract with the National Aeronautics and Space
Administration. C. G. M. acknowledges support from a University of Florida Graduate Mi-
nority Fellowship, a SEAGEP Fellowship, and NSF grants AST97-3367 and AST 02-02976.
C. G. M. would like to thank Eric McKenzie and Ana Matkovic for comments and sugges-
tions. Based on observations obtained at the Gemini Observatory, which is operated by
the Association of Universities for Research in Astronomy, Inc., under a cooperative agree-
ment with the NSF on behalf of the Gemini partnership: the National Science Foundation
(United States), the Particle Physics and Astronomy Research Council (United Kingdom),
the National Research Council (Canada), CONICYT (Chile), the Australian Research Coun-
cil (Australia), CNPq (Brazil) and CONICET (Argentina). This research has made use of
the NASA/IPAC Infrared Science Archive, which is operated by the Jet Propulsion Labora-
tory, California Institute of Technology, under contract with the National Aeronautics and
Space Administration.
REFERENCES
Afonso, J. M., Yun, J. L., & Clemens, D. P. 1998, AJ, 115, 1111
Alves, J. F. & Yun, J. L. 1995, ApJ, 438, L107
André, P. & Montmerle, Th. 1994, ApJ, 420, 837
André, P., Ward-Thompson, D., & Barsony, M. 1993, A&A, ApJ, 420, 837
Andre, P., Ward-Thompson, D., & Barsony, M. 2000, Protostars and Planets IV, 59
Bontemps, S., André, P., Terebey, S., & Cabrit, S. 1996, A&A, 311, 858
Bourke, T. L., Hyland, A. R., & Robinson, G. 1995, MNRAS, 276, 1052
Ciardi, D. R., Telesco, C. M., Williams, J. P., Fisher, R. S., Packham, C., Piña, R., &
Radomski, J. 2003, ApJ, 585, 392
– 11 –
Ciardi, D. R., Telesco, C. M., Packham, C., Gómez Martin, C., Radomski, J. T., De Buizer,
J. M., Phillips, C. J., & Harker, D. E. 2005, ApJ, 629, 897
Clemens, D. P. & Barvainis, R. 1988, ApJS, 68, 257
Clemens, D. P., Yun, J., & Heyer, M. 1991, ApJS, 75, 877
Codella, C., Brand, J., Massi, F., Wouterloot, J. G. A., & Davis, G. R. 2006, A&A, 457, 891
D’Alessio, P., Calvet, N., Hartmann, L., Lizano, S., & Cantó, J. 1999, ApJ, 527, 893
de Gregorio-Monsalvo, I., Gómez, J. F., Suárez, O., Kuiper, T. B. H., Anglada, G., Patel,
N. A., & Torrelles, J. M. 2006, AJ, 132, 2584
Eisner, J. et al. 2006, ApJ635, 396
Furuya, R. S., Kitamura, Y., Wootten, H. A., Claussen, M. J., & Kawabe, R. 2001, ApJ,
559, L143
Gómez, J. F., de Gregorio-Monsalvo, I., Suárez, O., & Kuiper, T. B. H. 2006, AJ, 132, 1322
Haisch, K. E., Jr., Lada, E. A., & Lada, C. J. 2000, AJ, 120, 1396
Henning, Th., Wolf, S., Launhardt, R. & Waters, R. 2003, ApJ, 561, 871
Huard, T. L., Weintraub, D. A., & Sandell, G. 2000, A&A, 362, 635
Jayawardhana, R., Hartmann, L. & Calvet, N. 2001, ApJ, 548, 310
Kenyon, S. J., Calvet, N., & Hartmann, L. 1993, ApJ, 414, 676
Khanzadyan, T. 2003, Ph.D. Thesis
Launhardt, R. & Henning, Th. 1997, A&A, 326, 329
Launhardt, R., Ward-Thompson, D., & Henning, Th. 1997, MNRAS, 288, L45
Masunaga, H. & Inutsuka, S. 2000, ApJ, 531, 350
Mathis, J. S. 1990, ARA&A, 28, 37
Moreira, M. C. & Yun, J. L. 1995, ApJ, 454, 850
Moreira, M. C., Yun, J. L., Vázquez, R. & Torrelles, J. M. 1996, AJ, 113, 1371
Muthumariappan, C., Kwok, S., & Volk, K. 2006, ApJ, 640, 353
– 12 –
O’Linger, J. C., Cole, D. M., Ressler, M. E., & Wolf-Chase, G. 2006, AJ, 131, 2601
Rebull, L. et al. 2007 in press
Saraceno, P., André P., Ceccarelli, C., Griffin, M., & Molinari, S. 1996, A&A, 309, 827
Shu, F. H., Adams, F. C., & Lizano, S. 1987, ARA&A, 25, 23
Stahler, S., Shu, F. H., & Taam, R. E. 1980, ApJ, 242, 226
Telesco, C. M., Piña, R. K., Hanna, K. T., Julian, J. A., Hon, D. B., & Kisko, T. M. 1998,
SPIE, 3354, 534
Wang, Y., Evans, N. J., II, Zhou, S., & Clemens, D. P. 1995, ApJ, 454, 217
Yun, J. L. 1996, AJ, 111, 930
Yun, J. L. & Clemens, D. P. 1990, ApJ, 365, L73
Yun, J. L. & Clemens, D. P. 1994a, AJ, 108, 612
Yun, J. L. & Clemens, D. P. 1994b, ApJS, 92, 145
Yun J. L., Moreira M. C., Torrelles J. M., Afonso J. M., & Santos N. C., 1996, AJ, 111, 841
Zhou, S., Evans, N. J. II, & Wang, Y. 1996, ApJ, 466, 296
This preprint was prepared with the AAS LATEX macros v5.2.
– 13 –
Table 1. Summary of Observations
λc ∆λ Frame Time On-Source
Filter (µm) (µm) (ms) (seconds) Air Mass
N 10.36 5.27 25.8 304 1.11
Si-11.66 11.7 1.13 25.8 304 1.84
Qa-18.30 18.3 1.51 25.8 912 1.18-1.34
Table 2. Source Positions and Flux Densities
J H K MSX-A N Si-11.7 Qa-18.3
1.22 µm 1.65 µm 2.18 µm 8.28 µm 10.36 µm 11.66 µm 18.3 µm
Source (∆α, ∆δ)a (mJy) (mJy) (mJy) (mJy) (mJy) (mJy) (mJy) References
YCII (0.0, 0.0) 0.71± 0.07 5.56± 0.56 37.3± 3.7 278± 19 302± 5 217± 6 431± 11 1,2
YCI (−10.0,+8.3) 0.59± 0.06 4.63± 0.46 12.9± 1.3 · · · < 4 < 6 < 10 1,2
MIR-a (−15.8,+1.7) · · · · · · · · · · · · 16± 4 14± 6 219± 11 2
MIR-b (−13.6,−1.8) · · · · · · · · · · · · 12± 4 14± 6 322± 11 2
MIR-c (−12.6,−3.8) · · · · · · · · · · · · < 4 < 6 70± 12 2
aPositional offsets in arcsec from the source CB54YC1-II (α = 07h04m21.7s, δ = −16◦23′19′′ (J2000)).
References. — 1. Yun (1996); 2. This Work
– 15 –
Fig. 1.— 2MASS KS image of CB 54. The dashed line delineates the area imaged with
T-ReCS. The near-infrared sources CB54YC1-I and CB54YC1-II are annotated, and the
positions of the detected mid-infrared sources are marked with the filled white circles (see
Figure 2). The image (0,0) position is centered on CB54YC1-II. The circle (dotted line)
is centered on the peak of 850 µm core; the size of the circle represents the approximate
size of the 850 µm core (Henning et al. 2001). The cross marks the position of the IRAS
source PSC 07020-1618. The open diamond marks the position of the 8 µm MSX source
G228.9946-04.6200, and the open square marks the position for the cm source discovered
with the VLA (Yun et al. 1996). The image pixel scale is 1′′ pix−1.
– 16 –
Fig. 2.— T-ReCS N-band, 11.7µm and 18.3µm images and contour plots of CB54. The
(0,0) point of each image is centered on CB54YC1-II. The images have been stretched by
an inverse hyperbolic sine to enhance the contrast. The detected mid-infrared sources are
annotated. The position of CB54YC1-I (not detected by the mid-infrared observations)
is marked in each image, and the position of MIR-c (detected only at 18.3µm) is marked
in the N-band and 11.7µm images. The N-band contour levels are [0.03, 0.05, 0.15, 1.5, 3.5]
mJy pix−1; the 11.7 µm contour levels are [0.05, 0.10, 0.15, 0.2, 0.4, 0.8, 1.6] mJy pix−1; the
18.3 µm contour levels are [0.4, 0.8, 1.6, 3.2] mJy pix−1. The pixel scale for each image is
0.′′089 pixel−1.
– 17 –
Fig. 3.— The SED of CB54YC1-II (solid points). The horizontal error bars represent the
bandwidth of the filters. The dashed line represents the SED for the class I protostar IRAS
04295+2251. The dotted line represents a median TTS SED convolved with AV = 25
magnitudes of extinction.
– 18 –
Fig. 4.— The SED of CB54YC1-I (solid points). Upper limits for the mid-infrared obser-
vations are represented by the downarrows. The long-dashed line represents the SED for
the class I protostar IRAS 04295+2251. The dotted line represents a median TTS SED
convolved with AV = 16 magnitudes of extinction. The solid line is a 1100 K blackbody;
the dash-dot line is a 6000 K blackbody convolved with AV = 23 magnitudes of extinction,
and the short-dashed line is a 30000 K blackbody convolved with AV = 26 magnitudes of
extinction.
– 19 –
Fig. 5.— Reduced chi-squares (top), best-fit temperatures (middle), and predicted N-
band flux densities (bottom), are plotted as a function of visual extinction for the black-
body+extinction models fitted to the JHK flux densities of CB54YC1-I. The horizontal
dashed line in the chi-square plot marks the sharp knee in the chi-square curve at AV = 26
mag (χ2
= 1.2). The horizontal dashed line in the flux density plot marks the 1σ detection
limit (4 mJy) of the N-band observations. The vertical dashed lines delineate the extinction
range of 23 ≤ AV ≤ 26 mag (see text for details).
– 20 –
Fig. 6.— The SEDs of the mid-infrared sources MIR-a (solid squares), MIR-b (solid circles),
and MIR-c (open circles). The 110 K and 100 K best-fit blackbody curves are shown for MIR-
a (dashed) and MIR-b (dot-dash). For MIR-c, the dotted line represents a 120 K blackbody
which is fit to the 18.3µm flux density and the 11.7µm upper limit. The horizontal error
bars represent the bandwidth of the filters.
	Introduction 
	Observations and Data Reduction
	Discussion 
	CB54YC1-II 
	CB54YC1-I 
	Mid-Infrared Sources 
	Conclusions
ABSTRACT
  We present mid-infrared (10.4 \micron, 11.7 \micron, and 18.3 \micron)
imaging intended to locate and characterize the suspected protostellar
components within the Bok globule CB54. We detect and confirm the protostellar
status for the near-infrared source CB54YC1-II. The mid-infrared luminosity for
CB54YC1-II was found to be $L_{midir} \approx 8 L_\sun$, and we estimate a
central source mass of $M_* \approx 0.8 M_\sun$ (for a mass accretion rate of
${\dot M}=10^{-6} M_\sun yr^{-1}$). CB54 harbors another near-infrared source
(CB54YC1-I), which was not detected by our observations. The non-detection is
consistent with CB54YC1-I being a highly extinguished embedded young A or B
star or a background G or F giant. An alternative explanation for CB54YC1-I is
that the source is an embedded protostar viewed at an extremely high
inclination angle, and the near-infrared detections are not of the central
protostar, but of light scattered by the accretion disk into our line of sight.
In addition, we have discovered three new mid-infrared sources, which are
spatially coincident with the previously known dense core in CB54. The source
temperatures ($\sim100$K) and association of the mid-infrared sources with the
dense core suggests that these mid-infrared objects may be embedded class 0
protostars.

<|endoftext|><|startoftext|>
Introduction to Systems Biology (Humana Press, 2007).
( http://arxiv.org/ftp/q-bio/papers/0512/0512007.pdf )
http://arxiv.org/ftp/q-bio/papers/0512/0512007.pdf
[22] Ao P, 2007, Orders of magnitude change in phenotype rate caused by mutation, Cellular
Oncology 29: 67 - 69
	References
ABSTRACT
  Is it possible to understand cancer? Or more specifically, is it possible to
understand cancer from genetic side? There already many answers in literature.
The most optimistic one has claimed that it is mission-possible. Duesberg and
his colleagues reviewed the impressive amount of research results on cancer
accumulated over 100 years. It confirms the a general opinion that considering
all available experimental results and clinical observations there is no cancer
theory without major difficulties, including the prevailing gene-based cancer
theories. They have then listed 9 "absolute discrepancies" for such cancer
theory. In this letter the quantitative evidence against one of their major
reasons for dismissing mutation cancer theory, by both in vivo experiment and a
first principle computation, is explicitly pointed out.

<|endoftext|><|startoftext|>
Introduction
Let (M, σ) be a smooth compact and connected symplectic manifold of dimension 2n and let T be a
torus which acts effectively on (M, σ) by means of symplectomorphisms. If the action of T on (M, σ)
is moreover Hamiltonian, then dimT ≤ n, and the image of the momentum mapping µT : M → t
∗ is a
convex polytope ∆ in the dual space t∗ of t, where t denotes the Lie algebra of T . In the maximal case when
dimT = n, (M, σ) is called a Delzant space.
Delzant [3, (*) on p. 323] proved that in this case the polytope ∆ is very special, a so-called Delzant
polytope, of which we recall the definition in Section 2. Furthermore Delzant [3, Th. 2.1] proved that two
Delzant spaces are T -equivariantly symplectomorphic if and only if their momentum mappings have the
same image up to a translation by an element of t∗. Thirdly Delzant [3, pp. 328, 329] proved that for every
Delzant polytope ∆ there exists a Delzant space such that µT (M) = ∆. This Delzant space is obtained as
the reduced phase space for a linear Hamiltonian action of a torus N on a symplectic vector space E, at a
value λN of the momentum mapping of the Hamiltonian N -action, where E, N and λN are determined by
the Delzant polytope.
Finally Delzant [3, Sec. 5] observed that the Delzant polytope gives rise to a fan (= éventail in French),
and that the Delzant space with Delzant polytope ∆ is T -equivariantly diffeomorphic to the toric variety
Mtoric defined by the fan. HereMtoric is a complex n-dimensional complex analytic manifold, and the action
of the real torus T on Mtoric has an extension to a complex analytic action on Mtoric of the complexification
∗Research stimulated by a KNAW professorship
†Partly supported by a Rackham Predoctoral fellowship
http://arxiv.org/abs/0704.0430v1
TC of T . In our description in Section 5 of the toric variety M
toric we do not use fans. The information, for
each vertex v of ∆, which codimension one faces of ∆ contain v, already suffices to define Mtoric.
In this note we show that the construction of the Delzant space M as a reduced phase space leads, for
every vertex v of the Delzant polytope, to a natural coordinatization ϕv of a T -invariant open cell Mv in
M , where Mv contains the unique fixed point mv in M of the T -action such that µT (mv) = v. We give an
explicit construction of the inverse ψv of ϕv, which is a maximal diffeomorphism in the sense of Remark
3.10. The construction of ψv originated in an attempt to extend the equivariant symplectic ball embeddings
from (B2nr , σ0) ⊂ (C
n, σ0) into the Delzant space (M, σ) in Pelayo [10] by maximal equivariant symplec-
tomorphisms from open neighborhoods of the origin in Cn into the Delzant space (M, σ). If v and w are
two different vertices, then the coordinate transformation ϕw ◦ ϕv
−1 is given by the explicit formulas (4.3),
(4.4).
Let Σ be the set of all strata of the orbit type stratification of M for the T -action. Then the domain of
definition Mv of ϕv is equal to the union of all S ∈ Σ such that the fixed point mv belongs to the closure
of S in M , see Corollary 5.5. The strata S ∈ Σ are also the orbits in the toric variety Mtoric ≃ M for the
action of the complexification TC of the real torus T , and the domain of definition Mv of ϕv is equal to
the domain of definition of a natural complex analytic TC-equivariant coordinatization Φv of a TC-invariant
open cell. The diffeomorphism Φv ◦ ϕv
−1, which sends the reduced phase space coordinates to the toric
variety coordinates, maps Uv := ϕv(Mv) diffeomorphically onto a complex vector space, and is given by
the explicit formulas (5.9).
In the toric variety coordinates the complex structure is the standard one and the coordinate transfor-
mations are relatively simple Laurent monomial transformations, whereas the symplectic form is generally
given by quite complicated algebraic functions. On the other hand, in the reduced phase space coordinates
the symplectic form is the standard one, but the coordinate transformations, and also the complex structure,
have a more complicated appearance.
Let F denote the set of all d codimension one faces of ∆ and, for every vertex v of ∆, let Fv denote the
set of all f ∈ F such that v ∈ f . Note that #(Fv) = n for every vertex v of ∆. For any sets A and B, let
AB denote the set of all A-valued functions on B. If A is a field and the set B is finite, then AB is a #(B)-
-dimensional vector space over A. One of the technical points in this paper is the efficient organization of
proofs and formulas made possible by viewing the Delzant space as a reduction of the vector space CF , and
letting, for each vertex v, the coordinatizations ϕv and Φv take their values in C
Fv . This leads to a natural
projection ρv : C
F → CFv obtained by the restriction of functions on F to Fv ⊂ F . For each vertex v the
complex vector space CFv is isomorphic to Cn, but the isomorphism depends on an enumeration of Fv , the
introduction of which would lead to an unnecessary complication of the combinatorics. Similarly our torus
T is isomorphic to Rn/Zn, but the isomorphism depends on the choice of a Z-basis of the integral lattice
tZ in the Lie algebra t of T . As for each vertex v a different Z-basis of tZ appears, we also avoid such a
choice, keeping T in its abstract form. We hope and trust that this will not lead to confusion with our main
references Delzant [3], Audin [2] and Guillemin [7] about Delzant spaces, where CF , each CFv , and T is
denoted as Cd, Cn, and Rn/Zn, respectively.
The organization of this manuscript is as follows. In Section 2 we review the definition of the reduced
phase Delzant space, and introduce the notations which will be convenient for our purposes. In Section 3 we
define the reduced phase space coordinatizations. In Section 4 we give explicit formulas for the coordinate
transformations and describe the reduced phase space Delzant space as obtained by gluing together bounded
open subsets of n-dimensional complex vector spaces with these coordinate transformations as the gluing
maps. In Section 5 we review the definition of the toric variety defined by the Delzant polytope, prove that
the natural mapping from the reduced phase space to the toric variety is a diffeomorphism, and compare
the coordinatizations of Section 3 with the natural coordinatizations of the toric variety. In Section 6 we
present these computations for the two simplest classes of examples, the complex projective spaces and the
Hirzebruch surfaces.
2 The reduced phase space
Let T be an n-dimensional torus, a compact, connected, commutative n-dimensional real Lie group, with
Lie algebra t. It follows that the exponential mapping exp : t → T is a surjective homomorphism from
the additive Lie group t onto T . Furthermore, tZ := ker(exp) is a discrete subgroup of (t, +) such that the
exponential mapping induces an isomorphism from t/tZ onto T , which we also denote by exp. Note that tZ
is defined in terms of the group T rather than only the Lie algebra t, but the notation tZ has the advantage
over the more precise notation TZ that it reminds us of the fact it is a subgroup of the additive group t.
Because t/tZ is compact, tZ has a Z-basis which at the same time is an R-basis of t, and each Z-basis
of tZ is an R-basis of t. Using coordinates with respect to an ordered Z-basis of tZ, we obtain a linear
isomorphism from t onto Rn which maps tZ onto Z
n, and therefore induces an isomorphism from T onto
n/Zn. For this reason, tZ is called the integral lattice in t. However, because we do not have a preferred
Z-basis of tZ, we do not write T = R
n/Zn.
Let ∆ be an n-dimensional convex polytope in t∗. We denote by F and V the set of all codimension one
faces and vertices of ∆, respectively. Note that, as a face is defined as the set of points of the closed convex
set on which a given linear functional attains its minimum, see Rockafellar [11, p.162], every face of ∆ is
compact. For every v ∈ V , we write
Fv = {f ∈ F | v ∈ f}.
∆ is called a Delzant polytope if it has the following properties, see Guillemin [7, p. 8].
i) For each f ∈ F there is an Xf ∈ t and λf ∈ R such that the hyperplane which contains f is equal to
the set of all ξ ∈ t∗ such that 〈Xf , ξ〉 + λf = 0, and ∆ is contained in the set of all ξ ∈ t
∗ such that
〈Xf , ξ〉+ λf ≥ 0. The vector Xf and constant λf are made unique by requiring that they are not an
integral multiple of another such vector and constant, respectively.
ii) For every v ∈ V , the Xf with f ∈ Fv form a Z-basis of the integral lattice tZ in t.
It follows that
∆ = {ξ ∈ t∗ | 〈Xf , ξ〉+ λf ≥ 0 for every f ∈ F}. (2.1)
Also, #(Fv) = n for every v ∈ V , which already makes the polytope ∆ quite special. In the sequel we
assume that ∆ is a given Delzant polytope in t∗.
For any z ∈ CF and f ∈ F we write z(f) = zf , which we view as the coordinate of the vector z with
the index f . Let π be the real linear map from RF to t defined by
π(t) :=
tf Xf , t ∈ R
F . (2.2)
Because, for any vertex v, the Xf with f ∈ Fv form a Z-basis of tZ which is also an R-basis of t, we have
π(ZF ) = tZ and π(R
F ) = t. It follows that π induces a surjective homomorphism of Lie groups π′ from
the torus RF/ZF = (R/Z)F onto t/tZ, and we have the corresponding surjective homomorphism exp ◦π
from RF /ZF onto T .
Write n := ker π, a linear subspace of RF , and N = ker(exp ◦π′), a compact commutative subgroup
of the torus RF /ZF . Actually, N is connected, see Lemma 3.1 below, and therefore isomorphic to n/nZ,
where nZ := n ∩ Z
F is the integral lattice in n of the torus N . 1
On the complex vector space CF of all complex-valued functions on F we have the action of the torus
F/ZF , where t ∈ RF/ZF maps z ∈ CF to the element t · z ∈ CF defined by
(t · z)f = e
2π i tf zf , f ∈ F.
The infinitesimal action of Y ∈ RF =Lie(RF /ZF ) is given by
(Y · z)f = 2π i Yf zf ,
which is a Hamiltonian vector field defined by the function
z 7→ 〈Y, µ(z)〉 =
Yf |zf |
2/2 =
Yf (xf
2 + yf
2)/2, (2.3)
and with respect to the symplectic form
σ := (i /4π)
dzf ∧dzf = (1/2π)
dxf ∧dyf , (2.4)
if zf = xf +i yf , with xf , yf ∈ R. Here the factor 1/2π is introduced in order to avoid an integral lattice
(2π Z)F instead of our ZF .
Because the right hand side of (2.3) depends linearly on Y , we can view µ(z) as an element of (RF )∗ ≃
F , with the coordinates
µ(z)f = |zf |
2/2 = (xf
2 + yf
2)/2, f ∈ F. (2.5)
In other words, the action of RF/ZF on CF is Hamiltonian, with respect to the symplectic form σ and with
momentum mapping µ : CF → (Lie(RF /ZF ))∗ given by (2.3), or equivalently (2.5).
It follows that the subtorus N of RF /ZF acts on CF in a Hamiltonian fashion, with momentum mapping
µN := ι
◦ µ : CF → n∗, (2.6)
where ιn : n → R
F denotes the identity viewed as a linear mapping from n ⊂ RF to RF , and its transposed
: (RF )∗ → n∗ is the map which assigns to each linear form on RF its restriction to n.
Write λN = ι
(λ), where λ denotes the element of (RF )∗ ≃ RF with the coordinates λf , f ∈ F .
It follows from Guillemin [7, Th. 1.6 and Th. 1.4] that λN is a regular value of µN , hence the level set
Z := µN
−1({λN}) of µN for the level λN is a smooth submanifold of C
F , and that the action of N on Z
is proper and free. As a consequence the N -orbit space M = M∆ := Z/N is a smooth 2n-dimensional
manifold such that the projection p : Z → M exhibits Z as a principal N -bundle over M . Moreover, there
is a unique symplectic form σM on M such that p
∗σM = ιZ
∗σ, where ιZ is the identity viewed as a smooth
mapping from Z to CF .
Remark 2.1 Guillemin [7] used the momentum mapping µN − λN instead of µN , such that the reduction
is taken at the zero level of his momentum mapping. We follow Audin [2, Ch. VI, Sec. 3.1] in that we use
the momentum mapping µN for the N -action, which does not depend on λ, and do the reduction at the level
λN . ⊘
1We did not find a proof of the connectedness of N in [3], [2], or [7].
The symplectic manifold (M, σM ) is the Marsden-Weinstein reduction of the symplectic manifold
(CF , σ) for the Hamiltonian N -action at the level λN of the momentum mapping, as defined in Abraham
and Marsden [1, Sec. 4.3]. On the N -orbit space M , we still have the action of the torus (RF /ZF )/N ≃ T ,
with momentum mapping µT :M → t
∗ determined by
π∗ ◦ µT ◦ p = (µ− λ)|Z . (2.7)
The torus T acts effectively on M and µT (M) = ∆, see Guillemin [7, Th. 1.7]. Actually, all these
properties of the reduction will also follow in a simple way from our description in Section 3 of Z in term
of the coordinates zf , f ∈ F .
The symplectic manifold M∆ together with this Hamiltonian T -action is called the Delzant space de-
fined by ∆, see Guillemin, [7, p. 13]. This proves the existence part [3, pp. 328, 329] of Delzant’s theory.
3 The reduced phase space coordinatizations.
For any v ∈ V , let ιv := ρv
∗ : (RFv)∗ → (RF )∗ denote the transposed of the restriction projection
ρv : R
F → RFv . If in the usual way we identify (RFv)∗ and (RF )∗ with RFv and RF , respectively, then
ιv : R
Fv → RF is the embedding defined by ιv(x)f = xf if f ∈ Fv and ιv(x)f ′ = 0 if f
′ ∈ F , f ′ /∈ Fv.
Because ιv maps Z
Fv into ZF and ιv(R
Fv) ∩ ZF = ιv(Z
Fv), it induces an embedding of the n-dimensional
torus RFv/ZFv into RF/ZF , which we also denote by ιv.
Lemma 3.1 With these notations, RF , ZF , and RF/ZF are the direct sum of n and ιv(RFv), n ∩ Zn and
Fv), and N and ιv(R
Fv/ZFv), respectively.
It follows that N is connected, a torus, with integral lattice equal to n ∩ ZF . It also follows that π ◦ ιv
is an isomorphism from the torus RFv/ZFv onto the torus T .
Proof Let t ∈ RF . Because the Xf , f ∈ Fv, form an R-basis of t, there exists a unique t
v ∈ RFv , such
π(t) =
(tv)f Xf = π(ιv(t
that is, t− ιv(t
v) ∈ n. Moreover, because the Xf , f ∈ Fv , also form a Z-basis of tZ, we have that t
v ∈ ZFv ,
and therefore t− ιv(t
v) ∈ n ∩ ZF , if t ∈ ZF . �
Lemma 3.2 We have z ∈ Z if and only if µ(z) − λ ∈ π∗(t∗). More explicitly, if and only if there exists a
ξ ∈ t∗ such that
|zf |
2/2− λf = 〈Xf , ξ〉 for every f ∈ F. (3.1)
When z ∈ Z , the ξ in (3.1) is uniquely determined.
Furthermore, Z = (µ− λ)−1(π∗(∆)), (µ− λ)(Z) = π∗(∆), and Z is a compact subset of CF .
Proof The kernel of ι∗
is equal to the space of all linear forms on RF which vanish on n := kerπ, and
therefore ker ι∗
is equal to the image of π∗ : t∗ → (RF )∗. Because π is surjective, π∗ is injective, which
proves the uniqueness of ξ.
It follows from (3.1) that 〈Xf , ξ〉 + λf ≥ 0 for every f ∈ F , and therefore ξ ∈ ∆ in view of (2.1).
Conversely, if ξ ∈ ∆, then there exists for every f ∈ F a complex number zf such that |zf |
2/2 = 〈Xf , ξ〉+
λf , which means that z ∈ Z and (µ− λ)(z) = π
∗(ξ). The set π∗(∆) is compact because ∆ is compact and
π∗ is continuous. Because the mapping µ− λ is proper, it follows that Z = (µ− λ)−1(π∗(∆)) is compact.
Let v ∈ V . The Xf , f ∈ Fv , form an R-basis of t, and therefore there exists for each z ∈ C
Fv a unique
ξ = µv(z) ∈ t
∗ such that (3.1) holds for every f ∈ Fv . That is, the mapping µv : C
Fv → t∗ is defined by
the equations
|zf |
2/2 − λf = 〈Xf , µv(z)〉, z ∈ C
Fv , f ∈ Fv. (3.2)
In other words, µv is defined by the formula
ρv ◦ π
∗ ◦ µv = ρv ◦ (µ− λ) ◦ ιv, (3.3)
where ρv denotes the restriction projection from R
F onto RFv .
Lemma 3.3 If we let T act on CFv via RFv/ZFv by means of (t, z) 7→ (π ◦ ιv)−1(t) ·z, then µv : CFv → t∗
is a momentum mapping for this Hamiltonian action of T on CFv , with µv(0) = v. Here the symplectic
form on CFv is equal to
σ := (i /4π)
dzf ∧dzf = (1/2π)
dxf ∧dyf , (3.4)
that is, (2.4) with F replaced by Fv .
Let ρv denote the restriction projection from C
F onto CFv , and let Uv be the interior of the subset ρv(Z)
of CFv . Write
∆v := ∆ \
f ′∈F\Fv
f ′. (3.5)
Then ρv(Z) = µv
−1(∆), µv(ρv(Z)) = ∆, Uv = µv
−1(∆v), and µv(Uv) = ∆v. In particular ρv(Z) is a
compact subset of CFv , and Uv is a bounded and connected open neighborhood of 0 in C
Proof The first statement follows from (3.3), the fact that ρv◦µ◦ιv is a momentum mapping for the standard
Fv/ZFv action on CFv , and the fact that a momentum mapping for a Hamiltonian action plus a constant is a
momentum mapping for the same Hamiltonian action. It follows in view of (3.2) that 〈Xf , µv(0)〉+λf = 0
for every f ∈ Fv , hence µv(0) = v in view of i) in the definition of a Delzant polytope, and the fact that {v}
is the intersection of all the f ∈ Fv .
It follows from (3.2), Lemma 3.2, that z ∈ Z if and only if
|zf |
2/2 = 〈Xf , µv(ρv(z))〉 + λf for every f ∈ F, (3.6)
where we note that these equations are satisfied by definition for the f ∈ Fv. Therefore, if z ∈ Z , then (3.6)
and (2.1) imply that µv(ρ(z)) ∈ ∆. Conversely, if ξ ∈ ∆, then it follows from Lemma 3.2 that there exists
z ∈ Z such that π∗(ξ) = µ(z)− λ, of which the restriction to Fv yields ξ = µv(ρ(z)).
If ξ ∈ ∆v, z
v ∈ CFv , µv(z
v) = ξ, then 〈Xf ′ , µv(z
v)〉 + λf ′ > 0 for every f
′ ∈ F \ Fv, which will
remain valid if we replace zv by z̃v in a sufficiently small neighborhood of zv in CFv . It follows that we can
find z̃ ∈ CF such that ρv(z̃) = z̃
v and (3.6) holds with z replaced by z̃. That is, z̃ ∈ Z , and we have proved
that zv ∈ Uv.
Let conversely z ∈ Uv ⊂ C
Fv . We have in view of (3.2) that
|zf |
2/2 = 〈Xf , µv(z)− µv(0)〉 = 〈Xf , µv(z)− v〉
for every f ∈ Fv. Therefore µv(z) − v is multiplied by c
2 if we replace z by c z, c > 0. Because z is in
the interior of ρv(Z), we have c z
v ∈ ρv(Z), hence µv(c z) ∈ ∆ for c > 1, c sufficiently close to 1. On the
other hand, if ξ belongs to a face of ∆ which is not adjacent to v, then v + τ (ξ − v) /∈ ∆ for any τ > 1. It
follows that µv(z) does not belong to any f
′ ∈ F \ Fv, that is, µv(z) ∈ ∆v. �
The equation (3.6) can be written in the form |zf | = rf (µv(ρv(z))), where, for each f ∈ F , the function
rf : ∆ → R≥0 is defined by
rf (ξ) := (2(〈Xf , ξ〉+ λf ))
1/2, f ∈ F, ξ ∈ ∆. (3.7)
We now view the equations (3.6) for z ∈ Z as equations for the coordinates zf ′ , f
′ ∈ F \ Fv, with the
zf , f ∈ Fv as parameters, where the latter constitute the vector z
v = ρv(z). If z
v ∈ Uv, then for each
f ′ ∈ F \ Fv the coordinate zf ′ lies on the circle about the origin with strictly positive radius rf ′(µv(z
Because Lemma 3.1 implies that the homomorphism which assigns to each element of N its projection to
F\Fv/ZF\Fv is an isomorphism, and the latter torus is the group of the coordinatewise rotations of the zf ′ ,
f ′ ∈ F \ Fv, this leads to the following conclusions.
Proposition 3.4 Let v be a vertex of ∆. The open subset Zv := ρv−1(Uv) ∩ Z of Z is a connected smooth
submanfold of CF of real dimension 2n + (d − n), where d = #(F ) and d − n = dimN . The action
of the torus N on Zv is free, and the projection ρv : Zv → Uv exhibits Zv as a principal N -bundle over
Uv. It follows that we have a reduced phase space Mv := Zv/N , which is a connected smooth symplectic
2n-dimensional manifold, which carries an effective Hamiltonian T -action with momentum mapping as in
(2.7), with Z replaced by Zv.
There is a unique global section sv : Uv → Zv of ρv : Zv → Uv such that sv(z)f ′ ∈ R>0 for every
z ∈ Uv and f
′ ∈ F \ Fv. Actually, sv(z)f ′ = rf ′(µv(z)) when z ∈ Uv and f
′ ∈ F \ Fv , and therefore the
section sv is smooth. If pv : Zv → Mv = Zv/N denotes the canonical projection, then ψv := pv ◦ sv is a
T -equivariant symplectomorphism from Uv onto Mv, where T acts on Uv via R
Fv/ZFv , as in Lemma 3.3.
Remark 3.5 When z belongs to the closure ρv(Z) = Uv of Uv in CFv , see Lemma 3.3, we can define
sv(z) ∈ C
F by sv(z)f = zf when f ∈ Fv and sv(z)f ′ = rf ′(µv(z)) when f
′ ∈ F \ Fv . This defines
a continuous extension sv : Uv → C
F of the mapping sv : Uv → Z . Therefore sv(Uv) ⊂ Z , and
ψv := p ◦ sv : Uv →M is a continuous extension of the diffeomorphism ψv : Uv →Mv .
The continuous mapping ψv : Uv → M is surjective, but the restriction of it to the boundary ∂Uv :=
Uv \Uv of Uv in C
Fv is not injective. If zv ∈ ∂Uv, then the set G of all f
′ ∈ F \ Fv such that sv(z
v)f ′ = 0,
or equivalently µT (ψv(z
v)) ∈ f ′, is not empty. The fiber of ψv over ψv(z
v) is equal to the set of all tv · zv,
where the tv ∈ RFv/ZFv are of the form
tvf = −
(v)fg tg, f ∈ Fv,
where tg ∈ R/Z. It follows that each fiber is an orbit of some subtorus of R
Fv/ZFv acting on CFv . ⊘
Recall the definition (3.5) of the open subset ∆v of the Delzant polytope ∆. Because the union over all
vertices v of the ∆v is equal to ∆, we have the following corollary.
Corollary 3.6 The sets Zv, v ∈ V , form a covering of Z . As a consequence, Z is a smooth submanifold
of CF of real dimension n + d. The action of the torus N on Z is free, and we have a reduced phase
space M := Z/N , which is a compact and connected smooth 2n-dimensional symplectic manifold, which
carries an effective Hamiltonian T -action with momentum mapping µT :M → T as in (2.7). The sets Mv,
v ∈ V , form an open covering of M and the ϕv := (ψv)
−1 : Mv → Uv form an atlas of T -equivariant
symplectic coordinatizations of the Hamiltonian T -space M . For each v ∈ V , we have Mv = µT
−1(∆v),
and µT |Mv = µv ◦ ϕv.
For a characterization of Mv in terms of the orbit type stratification in M for the T -action, see Corollary
5.5, which also implies that Mv is an open cell in M .
Corollary 3.7 For every f ∈ F the set µT−1(f) is a real codimension two smooth compact connected
smooth symplectic submanifold of M .
For each v ∈ V , the set Mv is dense in M , and the diffeomorphism ψv : Uv → Mv is maximal among
all diffeomorphisms from open subsets of CFv onto open subsets of M .
Proof If f ∈ F , then for each v ∈ V we have that
−1(f) = {z ∈ Uv | zf = 0} (3.8)
if v ∈ f , that is, f ∈ ∆v. This follows from (3.2) and i) in the description of ∆ in the beginning of Section
2. On the other hand, µv
−1(f) = ∅ if f /∈ ∆v. Because µT
−1(f) ∩Mv = ψv(µv
−1(f)), and the Mv,
v ∈ V , form an open covering of M , this proves the first statement. The second statement follows from the
first one, because the complement of Mv in M is equal to the union of the sets µT
−1(f ′) with f ′ ∈ F \ Fv.
Remark 3.8 It follows from the proof of Corollary 3.7, that µT−1(f) is a connected component of the
fixed point set in M of the of the circle subgroup exp(RXf ) of T .
Actually, µT
−1(f) is a Delzant space for the action of the (n− 1)-dimensional torus
T/exp(RXf ),
with Delzant polytope P ⊂ (t/(RXf ))
∗ such that the image of P in t∗ under the embedding (t/(RXf ))
∗ is equal to a translate of f .
In a similar way, if g is a k-dimensional face of ∆, then µT
−1(g) is a 2k-dimensional Delzant space for
the quotient of T by the subtorus of T which acts trivially on µT
−1(g). ⊘
Remark 3.9 Let, for each f ∈ F , cf ∈H
2(M, Z) ⊂H2(M, R) denote the Poincaré dual of the codimen-
sion two Delzant subspace µT
−1(f) of M , see Remark 3.8. Then, with our normalization of the symplectic
form (2.4), the de Rham cohomology class [σM ] of the symplectic form σM of the Delzant space is equal to
[σM ] =
λf cf , (3.9)
see Guillemin [8, Thm. 6.3]. In particular [σM ] ∈H
2(M, Z), and therefore [σM ] is equal to the Chern class
of a complex line bundle over M , if all the coefficients λf , f ∈ F , are integers.
If ∆ is a simplex, when M is isomorphic to the n-dimensional complex projective space, then the
−1(f), f ∈ F , are complex projective hyperplanes, see Subsection 6.1, which are all homologous to
each other. It follows that in this case [σM ] = γ c, where c is the Poincaré dual of a complex projective
hyperplane and γ is equal to the sum of all the coefficients λf , f ∈ F . ⊘
Remark 3.10 Let ι : T → Rn/Zn be an isomorphism of tori, which allows us to let t ∈ T act on Cn via
n/Zn by means of
(t · z)j = e
2π i ι(t)j zj, 1 ≤ j ≤ n.
Let U be a connected T -invariant open neighborhood of 0 in Cn, provided with the symplectic form (2.4)
with F replaced by {1, . . . , n}. Let ψ : U → M be a T -equivariant symplectomorphism from U onto an
open subset ψ(U) of M . Because 0 is the unique fixed point for the T -action in U , and the fixed points
for the T -action in M are the pre-images under µT of the vertices of ∆, there is a unique v ∈ V such
that µT (ψ(0)) = v. Let Iv : C
Fv → Cn denote the complex linear extension of the tangent map of the
torus isomorphism ι ◦ (π ◦ ιv). In terms of the notation of Lemma 3.3 and Proposition 3.4, we have that
U ⊂ Iv(Uv) and ψv = ψ ◦ Iv on Iv
−1(U), which leads to an identification of ψ with the restriction of ψv to
the connected open subset Iv
−1(U) of Uv, via the isomorphism Iv
The ψ’s, with U equal to a ball in Cn centered at the origin, are the equivariant symplectic ball embed-
dings in Pelayo [10], and the second statement in Corollary 3.7 shows that the diffeomorphisms ψv are the
maximal extensions of these equivariant symplectic ball embeddings. ⊘
4 The coordinate transformations
Let v, w ∈ V . Then
Uv, w := ϕv(Mv ∩Mw) = Uv ∩ ψv
−1 ◦ ψw(Uw)
= {zv ∈ Uv | (z
v)f 6= 0 for every f ∈ Fv \ Fw}. (4.1)
In this section we will give an explicit formula for the coordinate transformations
ϕw ◦ ϕv
−1 = ψw
−1 ◦ ψv : Uv, w → Uw, v,
which then leads to a description of the Delzant space M as obtained by gluing together the subsets Uv with
the coordinate transformations as the gluing maps.
Let f ∈ F . Because the Xg , g ∈ Fw, form a Z-basis of tZ, and Xf ∈ tZ, there exist unique integers
f , g ∈ Fw, such that
f Xg. (4.2)
Note that if f ∈ Fw, then (w)
f = 1 when g = f and (w)
f = 0 otherwise. For the following lemma recall
that rg is defined by expression (3.7).
Lemma 4.1 Let v, w ∈ V , zv ∈ Uv, w. Then zw := ϕw ◦ ϕv−1(zv) ∈ Uw ⊂ CFw is given by
zwg =
(zvf )
f∈Fv\Fw
|zvf |
f (4.3)
if g ∈ Fw ∩ Fv, and
zwg =
(zvf )
f rg(µv(z
f∈Fv\Fw
|zvf |
f (4.4)
if g ∈ Fw \ Fv.
Proof The element zw ∈ Uw is determined by the condition that sw(zw) belongs to the N -orbit of sv(zv).
That is,
w)f = e
i tf sv(z
v)f for every f ∈ F
for some t ∈ RF such that ∑
tf Xf = 0. (4.5)
It follows from (4.5), (4.2) and the linear independence of the Xg, g ∈ Fw, that t ∈ n if and only if
tg = −
f∈F\Fw
tf for every g ∈ Fw. (4.6)
Note that µv(z
v) = µT (m) = µw(z
w), where m = ψv(z
v) = ψw(z
w). It follows from the definition of
the sections sv and sw, see Proposition 3.4, that
i) sv(z
v)f = z
f and sw(z
w)f = z
f if f ∈ Fv ∩ Fw,
ii) sv(z
v)f = z
f and sw(z
w)f = rf (µw(z
w)) = rf (µv(z
v)) if f ∈ Fv \ Fw,
iii) sv(z
v)f = rf (µv(z
v)) and sw(z
w)f = z
f if f ∈ Fw \ Fv, and
iv) sv(z
v)f = rf (µv(z
v)) = rf (µw(z
w)) = sw(z
w)f if f ∈ F \ (Fv ∪ Fw).
It follows from ii) and iv) that tf = −arg z
f and tf = 0 modulo 2π if f ∈ Fv \Fw and f ∈ F \ (Fv ∪Fw),
respectively. Then (4.6) implies that, modulo 2π,
f∈Fv\Fw
f arg z
f for every g ∈ Fw.
It now follows from i) and iii) that if g ∈ Fw, then z
g = sw(z
w)g = e
i tg sv(z
v)g is equal to
ei tg zvg =
(zvf )
f∈Fv\Fw
|zvf |
if g ∈ Fv, and equal to
ei tg |zvg | =
(zvf )
f |zvg |/
f∈Fv\Fw
|zvf |
if g /∈ Fv , respectively. Here we have used that if g ∈ Fw, then (w)
= 1 if f = g and (w)g
= 0 if f ∈ Fw,
f 6= g. Because |zvg | = rg(µv(z
v)) if g /∈ Fv, see (3.6) and (3.7), this completes the proof of the lemma. �
Remark 4.2 Note that zv ∈ Uv, w means that zv ∈ Uv and zvf 6= 0 if f ∈ Fv \ Fw. Furthermore, z
v ∈ Uv
implies that if g /∈ Fv , then µv(z
v) /∈ g, and therefore rg is smooth on a neighborhood of µv(z
v). Finally,
note that if g ∈ Fw and f ∈ Fv ∩Fw, then (w)
∈ {0, 1}, and therefore each of the factors in the right hand
sides of (4.3) and (4.4) is smooth on Uv, w. ⊘
Remark 4.3 In (4.3) and (4.4) only the integers (w)gf appear with f ∈ Fv and g ∈ Fw. Let (w v) denote the
matrix (w)gf , where f ∈ Fv and g ∈ Fw. Then (w v) is invertible, with inverse equal to the integral matrix
(v w). These integral matrices also satisfy the cocycle condition that (w v) (v u) = (w u), if u, v, w ∈ V .
These properties follow from the fact that (4.2) shows that (w v) is the matrix which maps the Z-basis Xg,
g ∈ Fw, onto the Z-basis Xf , f ∈ Fv, of tZ. It is no surprise that these base changes enter in the formulas
which relate the models in the vector spaces CFv for the different choices of v ∈ V . ⊘
Corollary 4.4 Let, for each v ∈ V , the mapping µv : CFv → t∗ be defined by (3.2), which is a momentum
mapping for a Hamiltonian T -action via RFv/ZFv on the symplectic vector space CFv as in Lemma 3.3.
Define Uv := µv
−1(∆v). If also w ∈ V , define Uv, w as the right hand side of (4.1), and, if z
v ∈ Uv, w,
define ϕw, v(z
v) := zw, where zw ∈ CFw is given by (4.3) and (4.4).
Then ϕw, v is a T -equivariant symplectomorphism from Uv, w onto Uw, v such that µw = µv ◦ ϕv, w
on Uw, v. The ϕw, v satisfy the cocycle condition ϕw, v ◦ ϕv, u = ϕw, u where the left hand side is defined.
Glueing together the Hamiltonian T -spaces Uv, v ∈ V , with the momentum maps µv, by means of the
gluing maps ϕw, v, v, w ∈ V , we obtain a compact connected smooth symplectic manifold M̃ with an
effective Hamiltonian T -action with a common momentum map µ̃ : M̃ → T such that µ̃(M̃ ) = ∆. In other
words, M̃ is a Delzant space for the Delzant polytope ∆.
The Delzant space M̃ is obviously isomorphic to the Delzant spaceM = µ−1({λ})/N introduced in Section
2, and actually the isomorphism is used in the proof that M̃ is a Delzant space for the Delzant polytope ∆.
The only purpose of Corollary 4.4 is to exhibit the Delzant space as obtained from gluing together the Uv,
v ∈ V , by means of the gluing maps ϕv, w, v, w ∈ V .
5 The toric variety
Let T := {z ∈ C | |z| = 1} denote the unit circle in the complex plane. The mapping t 7→ u where
uf = e
2π i tf for every f ∈ F is an isomorphism from the torus RF/ZF onto TF , where TF acts on CF by
means of coordinatewise multiplication and RF/ZF acted on CF via the isomorphism from RF/ZF onto
F . The complexification TC of the compact Lie group T is the multiplicative group C
× of all nonzero
complex numbers, and the complexification of TF is equal to TF
:= (TC)
F = (C×)F , which also acts on
F by means of coordinatewise multiplication.
The complexification NC of N is the subgroup exp(nC) of U
, where nC := n⊕ i n ⊂ C
F denotes the
complexification of n, viewed as a complex linear subspace, a complex Lie subalgebra, of the Lie algebra
F of TF
. In view of (4.6), we have, for every v ∈ V , that NC is equal to the set of all t ∈ T
such that
f∈F\Fv
f , g ∈ Fv. (5.1)
This implies that NC is a closed subgroup of T
isomorphic to T
, and therefore NC is a reductive
complex algebraic group.
If we define
v := {z ∈ C
F | zf 6= 0 for every f ∈ F \ Fv}, (5.2)
then it follows from (5.1) that the action of NC on C
v is free and proper. It follows that the action of NC on
v (5.3)
is free and proper, and therefore the NC-orbit space
Mtoric := CF∆/NC (5.4)
has a unique structure of a complex analytic manifold of complex dimension n such that the canonical pro-
jection from CF∆ onto M
toric exhibits CF∆ as a principal NC-bundle over M
toric. On Mtoric we still have the
complex analytic action of the complex Lie group group TF
/NC, which is isomorphic to the complexifi-
cation TC of our real torus T induced by the projection π. The complex analytic manifold M
toric together
with the complex analytic action of TC on it is the toric variety defined by the polytope ∆ in the title of this
section.
If v ∈ V and z ∈ CFv , then it follows from (5.1) that there is a unique t ∈ NC such that tf = zf for
every f ∈ F \ Fv, or in other words, z = t · ζ , where ζ ∈ C
F is such that ζf = 1 for every f ∈ F \ Fv.
Let Sv : C
Fv → CFv be defined by Sv(z
v)f = z
f when f ∈ F and Sv(z
v) = 1 when f ∈ F \ Fv, as in
Audin [2, p. 159]. If Pv : C
v → C
v /NC denotes the canonical projection from C
v onto the open subset
Mtoricv := C
v /NC of M
toric, then Ψv := Pv ◦ Sv is a complex analytic diffeomorphism from C
Fv onto
Mtoricv . It is TC-equivariant if we let TC act on C
Fv via TFv
as in Lemma 3.3. We use the diffeomorphism
Φv := Ψv
−1 from Mtoricv onto C
Fv as a coordinatization of the open subset Mtoricv of M
toric.
If v, w ∈ V , then
Utoricv, w := Φv(M
toric
toric
w ) = C
Fv ∩Ψv
−1 ◦Ψw(C
= {zv ∈ CFv | (zv)f 6= 0 for every f ∈ Fv \ Fw}. (5.5)
Moreover, with a similar argument as for Lemma 4.1, actually much simpler, we have that for every zv ∈
Utoricv, w the element z
w := Φw ◦Φv
−1(zv) ∈ CFw is given by
zwg =
(zvf )
f , g ∈ Fw. (5.6)
In this way the coordinate transformation Φw◦Φv
−1 is a Laurent monomial mapping, much simpler than the
coordinate transformation (4.3), (4.4). It follows that the toric variety Mtoric can be alternatively described
as obtained by gluing the n-dimensional complex vector spaces CFv , v ∈ V , together, with the maps (5.6)
as the gluing maps. This is the kind of toric varieties as introduced by Demazure [4, Sec. 4].
For later use we mention the following observation of Danilov [5, Th. 9.1], which is also of interest in
itself.
Lemma 5.1 Mtoric is simply connected.
Proof Let w ∈ V . It follows from (5.5), for all v ∈ V , that the complement of Mtoricw in Mtoric is equal to
the union of finitely closed complex analytic submanifolds of complex codimension one, whereas Mtoricw is
contractible because it is diffeomorphic to the complex vector space CFw . Because complex codimension
one is real codimension two, any loop in Mtoric with base point in Mtoricw can be slightly deformed to such
a loop which avoids the complement of Mtoricw in Mtoric, that is, which is contained in M
toric
w , after which it
can be contracted within Mtoricw to the base point in M
toric
w . �
Recall the definition in Section 2 of the reduced phase space M = Z/N .
Theorem 5.2 The identity mapping from Z into CF∆, followed by the canonical projection P from C
Mtoric = CF∆/NC, induces a T -equivariant diffeomorphism ̟ from M = Z/N onto M
toric. It follows that
each NC-orbit in C
∆ intersects Z in an N -orbit in Z .
Proof Because N is a closed Lie subgroup of NC, we have that the mapping P : Z → CF∆/NC induces a
mapping ̟ : Z/N → CF∆/NC, which moreover is smooth.
If v ∈ V , z ∈ Zv, then it follows from (5.1) that the tf , f ∈ F \ Fv , of an element t ∈ NC can take
arbitrary values, and therefore the |zf |, f ∈ F \ Fv can be moved arbitrarily by means of infinitesimal
NC-actions. Because Z is defined by prescribing the |zf |, f ∈ F \ Fv, as a smooth function of the zf ,
f ∈ Fv, and the Zv, v ∈ V , form an open covering of Z , this shows that at each point of Z the NC-orbit is
transversal to Z , which implies that ̟ is a submersion.
It follows that ̟(M) is an open subset of Mtoric. Because M is compact and ̟ is continuous, ̟(M)
is compact, and therefore a closed subset of Mtoric. Because Mtoric is connected, the conclusion is that
̟(M) =Mtoric, that is, ̟ is surjective.
Because ̟ is a surjective submersion, dimRM = 2n =dimRM
toric, and M is connected, we conclude
that ̟ is a covering map. Because Mtoric is simply connected, see Lemma 5.1, we conclude that ̟ is
injective, that is, ̟ is a diffeomorphism. �
Remark 5.3 Theorem 5.2 is the last statement in Delzant [3], with no further details of the proof. Audin
[2, Prop. 3.1.1] gave a proof using gradient flows, whereas the injectivity has been proved in [7, Sec. A1.2]
using the principle that the gradient of a strictly convex function defines an injective mapping. ⊘
Note that in the definition of the toric variety Mtoric, the real numbers λf , f ∈ F , did not enter, whereas
these numbers certainly enter in the definition of M , the symplectic form on M , and the diffeomorphism
̟ :M →Mtoric. Therefore the symplectic form σtoricλ := (̟
−1)∗(σ) on Mtoric on Mtoric will depend on the
choice of λ ∈ RF . On the symplectic manifold (Mtoric, σtoricλ ), the action of the maximal compact subgroup
T of TC is Hamiltonian, with momentum mapping equal to
µtoricλ := µ ◦̟
−1 :Mtoric → t∗, (5.7)
where µtoricλ (M
toric) = ∆, where we note that ∆ in (2.1) depends on λ.
In the following lemma we compare the reduced phase space coordinatizations with the toric variety
coordinatizations.
Lemma 5.4 Let v ∈ V . Then Mtoricv = ̟(Mv), and
θv := Ψv
−1 ◦̟ ◦ ψv (5.8)
is a TFv-equivariant diffeomorphism from Uv onto C
For each zv ∈ Uv, the element ζ
v := θv(z
v) is given in terms of zv by
ζvf = z
f ′∈F\Fv
rf ′(µv(z
f ′ , f ∈ Fv , (5.9)
where the functions rf ′ : ∆ → R≥0 are given by (3.7). We have
µv(zv) = µT (ψv(z
v)) = µtoricλ (Ψv(ζ
v)), (5.10)
and zv = θv
−1(ζv) is given in terms of ζ
zvf = ζ
f ′∈F\Fv
rf ′(ξ)
f ′ , f ∈ Fv, (5.11)
where ξ is the element of ∆ equal to the right hand side of (5.10).
Proof It follows from Lemma 3.3 and the paragraph preceding Proposition 3.4 that if zv ∈ ρv(Z), then
zv ∈ Uv if and only if z
f ′ 6= 0 for every f
′ ∈ F \ Fv. That is, the set Zv in Proposition 3.4 is equal to
Z ∩ CFv . It therefore follows from Theorem 5.2 that each NC-orbit in the NC-invariant subset C
v of C
intersects the N -invariant subset Zv of Z in an N -orbit in Zv, that is,
Mtoricv = Pv(C
v ) = ̟(pv(Zv)) = ̟(Mv).
If zv ∈ Uv, then Proposition 3.4 implies that sv(z
v)f = z
f for every f ∈ Fv and
v)f ′ = rf ′(µv(z
v)), f ′ ∈ F \ Fv.
If we define t ∈ TF
tf ′ = rf ′(µv(z
v))−1, f ′ ∈ F \ Fv,
f ′∈F\Fv
rf ′(µv(z
f ′ , f ∈ Fv ,
then (t · sv(z
v))t′ = 1 for every t
′ ∈ F \ Fv and, for every f ∈ Fv, ζ
f := (t · sv)f is equal to the right hand
side of (5.9). That is, t · sv(z
v) = Sv(ζ
v), see the definition of Sv in the paragraph preceding (5.5). On the
other hand, it follows from (5.1) that t ∈ NC, and therefore
Ψv(ζv) = Pv(t · sv(z
v)) = Pv(sv(z
v)) = ̟ ◦ pv(sv(z
v)) = ̟ ◦ ψv(z
that is, ζv = Ψv
−1 ◦̟ ◦ ψv(z
v). �
Corollary 5.5 Let s be the relative interior of a face of ∆. Then µT−1(s) is equal to a stratum S of the
orbit type stratification in M of the T -action, and also equal to the preimage under ̟ : M → Mtoric of a
TC-orbit in M
toric. If s = {v} for a vertex v, then µT
−1(s) = {mv} for the unique fixed point mv in M for
the T -action such that µT (mv) = v.
The mapping s 7→ µT
−1(s) is a bijection from the set Σ∆ of all relative interiors of faces of ∆ onto the
set Σ of all strata of the orbit type stratificiation in M for the action of T . If s, s′ ∈ Σ∆ then s is contained
in the closure of s′ in ∆ if and only if µT
−1(s) is contained in the closure of µT
−1(s′) in M .
The domain of definition Mv of ϕv in M is equal to the union of the S ∈ Σ such that mv belongs
to the closure of S in M . The domain of definition Mtoricv = ̟(Mv) of Φv is equal to the union of the
corresponding strata of the T -action in Mtoric, each of which is a TC-orbit in M
toric. Mv and M
toric
v are
open cells in M and Mtoric, respectively.
Proof There exists a vertex v of ∆ such that v belongs to the closure of s in t∗, which implies that s is
disjoint from all f ′ ∈ F \ Fv. Let Fv, s denote the set of all f ∈ Fv such that s ⊂ f , where Fv, s = ∅ if and
only if s is the interior of ∆. For any subset G of Fv, let C
G denote the set of all z ∈ C
Fv such that zf = 0
if f ∈ G and zf 6= 0 if f ∈ Fv \G. It follows from µv = µT ◦ ψv and (3.8) that ψ
v (µT
−1(s)) is equal to
Uv ∩ C
G with G = Fv, s. The diffeomorphism θv maps this set onto the set C
G with G = Fv, s. Because
the sets of the form CFvG with G ⊂ Fv are the strata of the orbit type stratification of the T
Fv -action on CFv ,
and also equal to the (TC)
Fv -orbits in CFv , the first statement of the corollary follows.
The second statement follows from µv
−1({v}) = {0} and the fact that 0 is the unique fixed point of the
Fv-action in Uv.
If s ∈ Σ∆ and v ∈ V , then mv belongs to the closure of µT
−1(s) if and only if s is not contained in any
f ′ ∈ F \ Fv. This proves the characterization of the domain of definition Mv := Zv/N = µT
−1(∆v) of
ϕv. The last statement follows from the fact that Φv is a diffeomorphism from M
toric
v onto the vector space
Fv , and ̟ is a diffeomorphism from Mv onto M
toric
v . �
Remark 5.6 If v, w ∈ V , then
ϕw ◦ ϕv
−1 = ψw
−1 ◦ ψv = θw
−1 ◦ (Ψw
−1 ◦Ψv) ◦ θv = θw
−1 ◦ (Φw ◦Φv
−1) ◦ θv.
Using the formula (5.6) for Φw ◦ Φv
−1, this can be used in order to obtain the formulas (4.3), (4.4) as a
consequence of (5.9). In the proof, it is used that ξ := µv(z
v) = µw(z
w), |zvf | = rf (ξ) if f ∈ Fv \ Fw, and
= (w)
if f ′ ∈ F \ Fv and g ∈ Fw. ⊘
In the following corollary we describe the symplectic form σtoricλ on the toric variety M
toric in the toric
variety coordinates.
Corollary 5.7 For each v ∈ V , the symplectic form (Φv
−1)∗(σtoricλ ) on C
Fv is equal to (θv
−1)∗(σv), where
σv is the standard symplectic form on C
Fv given by (3.4).
Because rf ′(µv(z
v))2 is an inhomogeneous linear function of the quantities |zvf |
2, it follows from (5.9) that
the equations which determine the |zvf |
2 in terms of the quantities |ζvf |
2 are n polyomial equations for the n
unknowns |zfv |2, f ∈ Fv , where the coefficients of the polyomials are inhomogeneous linear functions of the
|ζvf |, f ∈ Fv. In this sense the |z
2, f ∈ Fv, are algebraic functions of the |ζ
2, f ∈ Fv, and substituting
these in (5.9) we obtain that the diffeomorphism θv
−1 from CFv onto Uv is an algebraic mapping. If ∆ is a
simplex, when Mtoric is the n-dimensional complex projective space, we have an explicit formula for θv
see Subsection 6.1. However, already in the case that ∆ is a planar quadrangle, when Mtoric is a complex
two-dimensional Hirzebruch surface, we do not have an explicit formula for θv
−1. See Subsection 6.2.
Summarizing, we can say that in the toric variety coordinates the complex structure is the standard
one and the coordinate transformations are the relatively simple Laurent monomial transformations (5.6).
However, in the toric variety coordinates the λ-dependent symplectic form in general is given by quite
complicated algebraic functions. On the other hand, in the reduced phase space coordinates the symplectic
form is the standard one, but the coordinate transformations (4.3), (4.4) are more complicated. Also the
complex structure in the reduced phase space coordinates, which depends on λ, is given by more complicated
formulas.
Remark 5.8 It is a challenge to compare the formula in Corollary 5.7 for the symplectic form in toric
variety coordinates with Guillemin’s formula in [7, Th. 3.5 on p. 141] and [8, (1.3)]. Note that in the
latter the pullback by means of the momentum mapping appears of a function on the interior of ∆, where in
general we do not have a really explicit formula for the momentum mapping in toric variety coordinates. ⊘
6 Examples
6.1 The complex projective space
Let ∆ be an n-dimensional simplex in t∗. A little bit of puzzling shows that there is a Z-basis ei, 1 ≤ i ≤ n,
of the integral lattice tZ in t, such that, with the notation
e0 = −
ei, (6.1)
the Xf , f ∈ F , are the ei, 0 ≤ i ≤ n. That is, in the sequel we write F = {0, 1, . . . , n}. The Delzant
simplex (2.1) is determined by the inequalities 〈ei, ξ〉+ λi ≥ 0, 0 ≤ i ≤ n, which has a non-empty interior
if and only if
λi > 0. (6.2)
In the sequel we take for v the vertex determined by the equations 〈ei, ξ〉+ λi = 0 for all 1 ≤ i ≤ n, where
Fv = {1, . . . , n}. If we write ξi = 〈ei, ξ〉, 1 ≤ i ≤ n, when ξ ∈ t
∗, then (3.2) yields that
v)i = |zi|
2/2− λi, 1 ≤ i ≤ n.
It follows from (3.7) that
r0(ξ) = (2(−
ξi + λ0))
and therefore (5.9) yields that
ζvi = z
i (2γ − ‖z
v‖2)−1/2, 1 ≤ i ≤ n, (6.3)
where we have written
‖zv‖2 =
|zvi |
Note that Uv is the open ball in C
n with center at the origin and radius equal to (2c)1/2.
The equations (6.3) imply that
‖ζv‖2 = ‖zv‖
2/(2γ − ‖zv‖2),
hence
‖zv‖2 = 2γ ‖ζv‖2/(1 + ‖ζv‖2).
Therefore the mapping θv−1 : ζv 7→ zv is given by the explicit formulas
zvi = ζ
i (2γ/(1 + ‖ζ
v‖2))1/2, 1 ≤ i ≤ n. (6.4)
It can be verified that the symplectic form (θv
−1)∗(σv), where
σv = (1/2π)
dxvi ∧dy
is the standard symplectic form in (3.4), is equal to γ times the Fubini-Study form in Griffiths and Harris
[6, p. 30, 31]. In view of Remark 3.9 this agrees with the fact that the de Rham cohomology class of the
Fubini-Study form is Poincaré dual to the homology class of a complex projective hyperplane in the complex
projective space, see Griffiths and Harris [6, p. 122].
6.2 The Hirzebruch surface
Let n = 2 and let ∆ be a quadrangle in the t∗ plane. A little bit of puzzling shows that there is an m ∈ Z≥0
and a Z-basis e1, e2 of the integral lattice tZ in t, such that the Xf , f ∈ F , are the ei, 1 ≤ i ≤ 4, with
e3 = −e1 +me2, and e4 = −e2. We recognize the toric variety M
toric as the Hirzebruch surface Σm, see
Hirzebruch [9].
The Delzant polytope (2.1) is determined by the inequalities 〈ei, ξ〉 + λi ≥ 0, 1 ≤ i ≤ 4, which is a
quadrangle if and only if
γ± := λ1 + λ3 ±mλ4 > 0, (6.5)
which inequalities imply that λ2 + λ4 = γ+ + γ− > 0.
In the sequel we take for v the vertex determined by the equations 〈ei, ξ〉+ λi = 0 for i = 1, 2, where
Fv = {1, 2}. If we write ξi = 〈ei, ξ〉, 1 ≤ i ≤ 2, when ξ ∈ t
∗, then (3.2) yields that
v)i = |zi|
2/2 − λi, 1 ≤ i ≤ 2.
It follows from (3.7) that
r3(ξ) = (2(−ξ1 +mξ2 + λ3))
1/2, r4(ξ) = (2(−ξ2 + λ4))
and therefore (5.9) yields that
ζv1 = z
1 (2γ−|z
2 +m |zv2 |
2)−1/2, (6.6)
ζv2 = z
2 (2γ−|z
2 +m |zv2 |
2)m/2 (2(γ+ + γ−)− |z
2)−1/2. (6.7)
If we write ti = |z
2 and τi = |ζ
2, then this leads to the equations
τ1 = t1/(2γ− − t1 +mt2),
τ2 = t2 (2γ− + t1 +mt2)
m/(2(γ+ + γ−)− t2)
for t1, t2. If we solve t1 from the first equation,
t1 = (2γ− +mt2) τ1/(1 + τ1),
and substitute this into the second equation, then this leads to the polynomial equation
(1 + τ1)
m τ2 (2(γ+ + γ−)− t2) = t2 (2γ− +mt2)
m (6.8)
of degree m + 1 for t2. If we substract the left hand side from the right hand side then the derivative with
respect to t2 is strictly positive, and one readily obtains that for every τ1, τ2 ∈ R≥0 there is a unique solution
t2 ∈ R≥0, confirming the first statement in Lemma 5.4.
On the other hand, if we work over C, and view both the parameter ε := (1 + τ1)
m τ2 and the unknown
t2 as elements of the complex projective line P
1, then the equation (6.8) defines a complex algebraic curve
C in the (t2, ε)-plane P
1 × P1, where the restriction to C of the projection to the first variable t2 is a
complex analytic diffeomorphism from C onto P1, as on C we have that ε is a complex analytic function
of t2. In particular C is irreducible. The restriction to C of the projection to the second variable ε is an
(m+ 1)-fold branched covering. Over ε = 0 and over ε = ∞ we have that m of the m+ 1 branches come
together, whereas there are two more branch points on the ε-line over which only two of the branches come
together. The fact that C is irreducible implies that the part of C over the complement of the branch points is
connected, and therefore the analytic continuation of any solution t2 of (6.8), as a complex analytic function
of ε in the complement of the branch points, will reach each other branch if ε runs over a suitable loop. In
other words, the solution t2 is an algebraic function of ε of degree m+ 1, and no branch of a solution is of
lower degree. This holds in particular for our solutions t2 ∈ R≥0 for ε ∈ R≥0.
References
[1] R. Abraham and J.E. Marsden: Foundations of Mechanics. Benjamin/Cummings Publ. Co., London,
etc., 1978.
[2] M. Audin: The Topology of Torus Actions on Symplectic Manifolds. Birkhäuser, Basel, Boston, Berlin,
1991.
[3] T. Delzant: Hamiltoniens périodiques et images convexes de l’application moment. Bull. Soc. Math.
France 116 (1988) 315–339.
[4] M. Demazure: Sous-groupes algébriques de rang maximum du groupe de Cremona. Ann. scient. Éc.
Norm. Sup. 3 (1970) 507–588.
[5] V.I. Danilov: The geometry of toric varieties. Russ. Math. Surveys 33:2 (1978) 97–154, translated from
Uspekhi Mat. Nauk SSSR 33:2 (1978) 85–134.
[6] P. Griffiths and J. Harris: Principles of Algebraic Geometry. J. Wiley & Sons, Inc., New York, etc.,
1978.
[7] V. Guillemin: Moment Maps and Combinatorial Invariants of Hamiltonian Tn-spaces. Birkhäuser,
Boston, etc., 1994.
[8] V. Guillemin: Kaehler structures on toric varieties. J. Differential Geometry 40 (1994) 285–309.
[9] F. Hirzebruch: Über eine Klasse von einfach-zusammenhn̈genden komplexen Mannigfaltigkeiten.
Mathematische Annalen 124 (1951) 77–86.
[10] A. Pelayo: Topology of spaces of equivariant symplectic embeddings. Proc. Amer. Math. Soc. 135
(2007) 277–288.
[11] R.T. Rockafellar: Convex Analysis. Princeton University Press, princeton, N.J., 1970.
J.J. Duistermaat
Mathematisch Instituut, Universiteit Utrecht
P.O. Box 80 010, 3508 TA Utrecht, The Netherlands
e-mail: duis@math.uu.nl
A. Pelayo
Department of Mathematics, University of Michigan
2074 East Hall, 530 Church Street, Ann Arbor, MI 48109–1043, USA
e-mail: apelayo@umich.edu
	Introduction
	The reduced phase space
	The reduced phase space coordinatizations.
	The coordinate transformations
	The toric variety
	Examples
	The complex projective space
	The Hirzebruch surface
ABSTRACT
  In this note we describe the natural coordinatizations of a Delzant space
defined as a reduced phase space (symplectic geometry view-point) and give
explicit formulas for the coordinate transformations. For each fixed point of
the torus action on the Delzant polytope, we have a maximal coordinatization of
an open cell in the Delzant space which contains the fixed point. This cell is
equal to the domain of definition of one of the natural coordinatizations of
the Delzant space as a toric variety (complex algebraic geometry view-point),
and we give an explicit formula for the toric variety coordinates in terms of
the reduced phase space coordinates. We use considerations in the maximal
coordinate neighborhoods to give simple proofs of some of the basic facts about
the Delzant space, as a reduced phase space, and as a toric variety. These can
be viewed as a first application of the coordinatizations, and serve to make
the presentation more self-contained.

<|endoftext|><|startoftext|>
FRAGMENTATION OF GENERAL RELATIVISTIC QUASI-TOROIDAL POLYTROPES
Burkhard Zink,1, 2 Nikolaos Stergioulas,3 Ian Hawke,4 Christian D. Ott,5 Erik Schnetter,1, 6 and Ewald Müller7
1Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA
2Horace Hearne Jr. Institute for Theoretical Physics, Louisiana State University, Baton Rouge, LA 70803, USA
3Department of Physics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
4School of Mathematics, University of Southampton, Southampton SO17 1BJ, UK
5Department of Astronomy and Steward Observatory, The University of Arizona, Tucson, AZ, USA
6Max-Planck-Institut für Gravitationsphysik, Albert-Einstein-Institut, 14476 Golm, Germany
7Max-Planck-Institut für Astrophysik, Karl-Schwarzschild-Str. 1, 85741 Garching bei München, Germany
How do black holes form from relativistic stars? This ques-
tion is of great fundamental and practical importance in grav-
itational physics and general relativistic astrophysics. On the
fundamental level, black holes are genuinely relativistic ob-
jects, and thus the study of their production involves ques-
tions about horizon dynamics, global structure of spacetimes,
and the nature of the singularities predicted as a consequence
of the occurrence of trapped surfaces. On the level of as-
trophysical applications, systems involving black holes are
possible engines for highly energetic phenomena like AGNs
or gamma-ray bursts, and also likely a comparatively strong
source of gravitational radiation.
The most simple model of black hole formation from, say,
cold neutron stars, is a fluid in spherically symmetric poly-
tropic equilibrium moving on a sequence of increasing mass
due to accretion [1]. This assumes that (i) the stellar structure
and dynamics are represented reasonably by the ideal fluid
equation of state and the polytropic stratification, (ii) accre-
tion processes are slow compared to the dynamical timescales
of the star, and (iii) rotation is negligible. Our focus has been
to study the effects of relaxing the third assumption.
In spherical symmetry, the sequence of equilibrium poly-
tropes has a maximum in the mass function M(ρ
), where ρ
denotes the central rest-mass density of the polytrope. This
maximum is connected to a change in the stability of the fun-
damental mode of oscillation [1], and thus collapse sets in
via a dynamical instability to radial deformations. During the
subsequent evolution, a trapped tube forms at the center which
traverses the stellar material entirely [2].
How much of this behaviour is preserved when rotation is
taken into account? Rotation is known to change the equi-
librium structure of the star, and, in consequence, its modes
of oscillation and set of unstable perturbations. The collapse
might also lead to the formation of a massive disk around the
new-born black hole, and finally only systems without spher-
ical symmetry can be a source of gravitational radiation.
Numerical simulations have been used to study the collapse
and black hole formation of general relativistic rotating poly-
tropic stars [3]. For the uniformly and moderately differen-
tially rotating models investigated in those studies, the dynam-
ical process is described by the instability of a quasi-radial
mode and subsequent collapse of the star up to the formation
of an accreting Kerr black hole at the star’s center.
Will strong differential rotation modify this picture? Even
before our study, there was evidence that this should be the
case. (i) Strong differential rotation can deform the high-
FIG. 1: Development of the fragmentation instability in a model of a
strongly differentially rotating supermassive star. The darker shades
of grey indicate higher density. The closed white line in the last plot
is a trapped surface.
density regions of a star into a toroidal shape, thus chang-
ing the equilibrium structure considerably. (ii) It admits stars
of high normalized rotational energy T/|W | [1] which are
stable to axisymmetric perturbations. (iii) It admits non-
axisymmetric instabilities, for example by the occurrence of
corotation points[4], at low values of T/|W |[5]. (iv) A bar-
mode instability of the type found in Maclaurin spheroids[6]
would likely express itself by the formation of two orbiting
fragments if the initial high-density region has toroidal shape.
This last property has motivated us to ask this question:
Can a bar deformation transform a strongly differentially ro-
tating star into a binary black hole merger with a massive
accretion disk? If so, this process might occur in supermas-
sive stars if the timescales associated with angular momentum
transport are too large to enforce uniform rotation.
We have investigated black hole formation in strongly dif-
ferentially rotating, quasi-toroidal models of supermassive
stars [7, 8], and found that a non-axisymmetric instability can
lead to the off-center formation of a trapped surface (see fig-
ure). An extensive parameter space study of this fragmen-
tation instability [8] reveals that many quasi-toroidal stars of
this kind are dynamically unstable in this manner, even for low
values of T/|W |, and we have found evidence that the coro-
tation mechanism observed by Watts et al. [4] might be active
in these models. Since, on a sequence of increasing T/|W |,
one of the low order m = 1 modes becomes dynamically un-
http://arxiv.org/abs/0704.0431v1
stable before m = 2 and higher order modes, one would not
expect a binary black hole system to form in many situations
(although this may depend on the rotation law and details of
the pre-collapse evolution as well). Rather, the off-center pro-
duction of a single black hole with a massive accretion disk
appears more likely.
Since the normalized angular momentum J/M2 of the ini-
tial model is greater than unity, there is another interesting
consequence of this formation process: the resulting black
hole, unless it is ejected from its shell, may very well be
rapidly rotating, spun up by accretion of the material remain-
ing outside the initial location of the trapped surface. Investi-
gating the late time behaviour of this accretion process, esti-
mating possible kick velocities of the resulting black hole, and
finding the mass of the final accretion disk is, however, beyond
our present-day capabilities and subject of future study.
[1] S. Shapiro and S. Teukolsky, Black Holes, White Dwarfs and
Neutron Stars (Wiley 1983).
[2] S. Shapiro and S. Teukolsky, Astrophys. J. 235, 199 (1980).
[3] M. Shibata, T. Baumgarte and S. Shapiro, Phys. Rev. D 61,
044012 (2000). L. Baiotti, I. Hawke, P. Montero, F. Löffler, L.
Rezzolla, N. Stergioulas, J. A. Font and E. Seidel, Phys. Rev. D
71, 024035 (2005), and references therein.
[4] A. Watts, N. Andersson and D. Jones, Astrophys. J. L37 (2005).
[5] J. Centrella, K. New, L. Lowe and J. Brown, Astrophys. J. 550
(2001).
[6] S. Chandrasekhar, Ellipsoidal Figures of Equilibrium (Yale UP
1969).
[7] B. Zink, N. Stergioulas, I. Hawke, C. D. Ott, E. Schnetter and E.
Müller, Phys. Rev. Letters 96, 161101 (2006).
[8] B. Zink, N. Stergioulas, I. Hawke, C. D. Ott, E. Schnetter and E.
Müller, astro-ph/0611601 (2006).
http://arxiv.org/abs/astro-ph/0611601
ABSTRACT
  We investigate the role of rotational instabilities in the context of black
hole formation in relativistic stars. In addition to the standard scenario - an
axially symmetric dynamical instability forming a horizon at the star's center
- the recently found low-$T/|W|$ instabilities are shown to lead to
fragmentation and off-center horizon formation in differentially rotating
stars. This process might be an alternative pathway to produce SMBHs from
supermassive stars with inefficient angular momentum transport.

<|endoftext|><|startoftext|>
Introduction
These notes, based on the paper [8] by Huebschmann and Stasheff, were pre-
pared for a series of talks at Illinois State University with the intention of ap-
plying Homological Perturbation Theory (HPT) to the construction of derived
brackets [11, 16], and eventually writing Part II of the paper [1].
Derived brackets are obtained by deforming the initial bracket via a deriva-
tion of the bracket. In [3] it was demonstrated that such deformations corre-
spond to solutions of the Maurer-Cartan equation, and the role of an “almost
contraction” was noted. This technique (see also [9]) is very similar to the itera-
tive procedure of [8] for finding the most general solution of the Maurer-Cartan
equation, i.e. the deformation of a given structure in a prescribed direction.
The present article, besides providing additional details of the condensed
article [8], forms a theoretical background for understanding and generalizing
the current techniques that give rise to derived brackets. The generalization,
which will be the subject matter of [2], will be achieved by using Stasheff and
Huebschmann’s universal solution. A second application of the universal solu-
tion will be in deformation quantization and will help us find the coefficients of
star products in a combinatorial manner, rather than as a byproduct of string
theory which underlies the original solution given by Kontsevich [10].
HPT is often used to replace given chain complexes by homotopic, smaller,
and more readily computable chain complexes (to explore “small” or “minimal”
models). This method may prove to be more efficient than “spectral sequences”
in computing (co)homology. One useful tool in HPT is
Lemma 1 (Basic Perturbation Lemma (BPL)). Given a contraction of N onto
M and a perturbation ∂ of dN , under suitable conditions there exists a pertur-
bation d∂ of dM such that H(M,dM + d∂) = H(N, dN + ∂).
The main question is: under what conditions does the BPL allow the preser-
vation of the data structures (DGA’s, DG coalgebras, DGLA’s etc.)? (We will
use the self-explanatory abbreviations such as DG for “differential graded”,
http://arxiv.org/abs/0704.0432v1
DGA for “differential graded (not necessarily associative) algebra”, and DGLA
for “differential graded Lie algebra”.)
Another prominent idea is that of a “(universal) twisting cochain” as a so-
lution of the “master equation”:
Proposition 1. Given a contraction of N onto M and a twisting cochain N →
A (A some DGA), there exists a unique twisting cochain M → A that factors
through the given one and which can be constructed inductively.
The explicit formulas are reminiscent of the Kuranishi map [13] (p.17), and
the relationship will be investigated elsewhere.
Note: we will assume that the ground ring is a field F of characteristic zero.
We will denote the end of an example with the symbol ♦ and the end of a proof
by �.
2 Perturbations of (co)differentials
2.1 Derivations of the tensor algebra
For any vector space V over F we have the isomorphismDer(TV ) ∼= Hom(V, TV )
where TV denotes the (augmented) tensor algebra on V . Namely, every linear
map f from V into TV extends uniquely into a derivation of the algebra TV
via the formula
f̂(v1 ⊗ · · · vn) =
v1 ⊗ · · · ⊗ f(vi)⊗ · · · vn.
Equivalently, every derivation of TV is determined by its restriction to V .
2.2 Coderivations of the tensor coalgebra
Similarly, we have the isomorphism Coder(T cV ) ∼= Hom(T cV, V ) where T cV is
the (coaugmented) coassociative tensor coalgebra of V , with counit η : T cV → F
(projection onto F ), and comultiplication
∆(v1 ⊗ · · · ⊗ vn) =
(v1 ⊗ · · · ⊗ vi)⊗ (vi+1 ⊗ · · · ⊗ vn).
Every linear map f = f1 + f2 + · · ·+ fn + · · · : T
cV → V (where fi : V
⊗i → V )
factors uniquely through a coderivation f̂ of T cV defined via the formula
f̂(v1 ⊗ · · · ⊗ vn) =
v1 ⊗ · · · ⊗ f1(vi)⊗ · · · vn
v1 ⊗ · · · ⊗ f2(vi ⊗ vi+1)⊗ · · · vn
+fn(v1 ⊗ · · · ⊗ vn).
That is, each coderivation on T cV is determined by itself followed by the projec-
tion onto V . Recall that the condition for f̂ to be a coderivation can be written
as ∆f̂ = (1⊗ f̂ + f̂ ⊗ 1)∆.
2.3 Coderivations of the symmetric coalgebra
Let us consider the cofree cocommutative counital coassociative algebra ST cV
on the vector space V as a subspace of T cV . The symmetric group Σn acts on
the left on V ⊗n via σ(v1 ⊗ · · · ⊗ vn) = vσ−1(1) ⊗ · · · ⊗ vσ−1(n). Then
ST cV =
(V ⊗n)Σn
is the space of invariants of this action. The action is compatible with the
coproduct on ST cV , so ST cV is a subcoalgebra of T cV which is cocommutative.
Note that ST (V ) is not a subalgebra with respect to the tensor multiplication
in T (V ); the product has to be symmetrized so that it projects back onto this
subspace (reminiscent of what T. Voronov does with derived brackets). The
projection (symmetrization) map P : T cV → ST cV is given by
P (v1 ⊗ · · · ⊗ vn) =
σ(v1 ⊗ · · · ⊗ vn).
This is not a coalgebra map, but is a retraction of the canonical inclusion
ST cV →֒ T cV . Now a coderivation D : T cV → T cV induces a coderivation
DS : ST
cV → ST cV by the composition ST cV →֒ T cV
→ T cV
→ ST cV .
In particular, a coderivation of T cV induces one of ΛcV = ST c[sV ] where V is
thought of as living in degree zero: we introduce the graded symmetric coalgebra
below. Once again, coderivations of ST cV are determined by their projections
onto ST c1V ; a map f = f1 + f2 + · · · : ST
c(V ) → V determines a coderivation
f̂ as in the tensor coalgebra case.
In the remaining part of this survey, we choose to identify ST cV with the
abstract symmetric coalgebra ScV under the isomorphism
v1 · · · vn 7→ P (v1 ⊗ · · · ⊗ vn).
The coproduct in ScV is given by
∆(v1 · · · vn) =
σ∈Σi,n−i
vσ(1) · · · vσ(i) ⊗ vσ(i+1) · · · vσ(n).
2.4 DGLA’s and perturbations of the codifferential
Definition 1. For any chain complex (X, d), and odd ∂, with (d + ∂)2 =
0, we say that ∂ is a perturbation of the differential d. We call d + ∂ the
perturbed differential. Equivalently, we have [d, ∂] + ∂∂ = 0 in End(X). If ∂ is
also compatible with an existing coalgebra structure on X , we say that it is a
coalgebra perturbation.
Let (g, d) be a graded chain complex (d lowers degrees) with a bracket [ , ]
that is skew-symmetric (not necessarily Leibniz or a chain map). Consider the
differential graded symmetric coalgebra Sc[sg], the differential d being induced
by that on g. Also let ∂ be the coderivation on Sc[sg] of degree −1 induced by
the bracket.
Proposition 2. The bracket [ , ] turns (g, d) into a DGLA if and only if ∂ is
a coalgebra perturbation of d. Also, any DGLA structure on g is determined by
the coalgebra perturbation induced from the bracket.
When g is an ordinary (degree-zero) Lie algebra over a field, Sc[sg] =
Λcg with differential ∂ corresponding to the bracket is the ordinary Koszul
or Chevalley-Eilenberg complex computing the homology of g with coefficients
in the field.
2.5 Strongly homotopy Lie algebras
Definition 2. Let (g, d) be a chain complex and let d also denote the codiffer-
ential in Sc[sg] induced by d. A strongly homotopy Lie (sh-Lie, or L∞) structure
on g is a perturbation ∂ = ∂2 + · · · + ∂n + · · · of d, i.e. an odd coderivation
satisfying [d, ∂] + ∂∂ = 0 and ∂η = 0 (recall that η is the counit) so that the
sum d+ ∂ endows Sc[sg] with a new coaugmented DG coalgebra structure.
The corresponding mega-map ℓ2 + · · · + ℓn + · · · from S
c(sg) to g extends
the differential ℓ1 = d : sg → g, and the lower identities satisfied by
ℓ = ℓ1 + ℓ2 + · · ·+ ℓn + · · ·
read as follows:
ℓ21 = 0
ℓ1(ℓ2(a, b))± ℓ2(ℓ1(a), b)± ℓ2(ℓ1(b), a) = 0
ℓ1(ℓ3(a, b, c))± ℓ3(ℓ1(a), b, c)± ℓ3(ℓ1(b), a, c)± ℓ3(ℓ1(c), a, b)
±ℓ2(ℓ2(a, b), c)± ℓ2(ℓ2(a, c), b)± ℓ2(ℓ2(b, c), a) = 0.
An sh-Lie morphism between two sh-Lie (or DGL) algebras (g, d + · · · ) and
(g′, d′ + · · · ) is a collection of chain maps Fn : S
n[sg] → S
′], satisfying
∆′F (u) = (F ⊗ F )(∆u). Then F is uniquely determined by its projection onto
sg′, that is, we may assume Fn : S
n[sg] → sg
Definition 3. A quasi-isomorphism F between sh-Lie algebras g, g′ is an sh-Lie
morphism such that F1 : sg → sg
′ induces an isomorphism between H(g, d) and
H(g′, d′).
Remark 1. Quasi-isomorphisms between DGLA’s are especially important in
deformation theory. Such a map gives a one-to-one correspondence between
moduli spaces of solutions to MC equations in ~g[[~]] and ~g′[[~]] (see [5]):
given a quasi-isomorphism F : Sc[sg] → sg′, we define F̃ : ~g[[~]] → ~g′[[~]] by
F̃ (r) =
Fn(r, . . . , r)
(also see [6]).
2.6 The Hochschild chain complex and DGA’s
Let (A, µ) be a unital associative algebra (possibly graded), and T c[sA] denote
the tensor coalgebra on the suspension of A. We recall that
Coder(T c[sA]) ∼= Hom(T
c[sA], A).
In particular, the associative bilinear multiplication µ ∈ Hom(T c[sA], A) corre-
sponds to a square-zero coderivation ∂ : T c[sA] → T c[sA] defined by
∂(a1 ⊗ · · · ⊗ an)
(−1)i+1(a1 ⊗ · · · ⊗ µ(ai ⊗ ai+1)⊗ · · · ⊗ an)
+(−1)n+1(µ(an ⊗ a1)⊗ a2 ⊗ · · · ⊗ an−1).
The condition that ∂ is a codifferential is equivalent to the associativity con-
dition m ◦ m = 0 where ◦ is the Gerstenhaber composition on multilinear
maps (a right pre-Lie map). The complex (Hom(T c[sA], A), ∂) is known as
the Hochschild chain complex.
Now let (A, µ, d) be a DGA. Then d+µ ∈ Hom(T c[sA], A) corresponds to a
perturbed codifferential d+ ∂ satisfying (d+ ∂)2 = 0, which is equivalent to the
identities d2 = 0 and [d, ∂] + ∂∂ = 0. The latter can also be split into [d, ∂] = 0
and ∂∂ = 0.
Proposition 3. The multiplication µ turns (A, d) into a DGA if and only if ∂
is a coalgebra perturbation of d. Also, any DGA structure on A is determined
by the coalgebra perturbation induced from µ.
2.7 Strongly homotopy associative algebras
Definition 4. Let (A, d) be a chain complex and let d also denote the codifferen-
tial in T c[sA] induced by d. A strongly homotopy associative (or A∞) structure
on A is a perturbation ∂ = ∂2 + · · · + ∂n + · · · of d, i.e. an odd coderivation
satisfying [d, ∂] + ∂∂ = 0 and ∂η = 0 so that the sum d+ ∂ endows T c[sA] with
a new coaugmented DG coalgebra structure.
The corresponding mega-map m2+ · · ·+mn+ · · · from T
c[sA] to A extends
the differential m1 = d : sA → A, and the lower identities satisfied by
m = m1 +m2 + · · ·+mn + · · ·
read as follows:
m21 = 0
m1(m2(a, b))±m2(m1(a), b)±m2(m1(b), a) = 0
m1(m3(a, b, c))±m3(m1(a), b, c)±m3(a,m1(b), c)±m3(a, b,m1(c)) = 0.
The mega-identity is m ◦ m = 0, sometimes written in the braces notation
{m}{m} = 0.
3 Master equation
If (A, d) is a differential graded associative algebra (DGA), then the equation
dτ = ττ (1)
is called the Master Equation (ME) (or Maurer-Cartan equation (MCE), etc.).
Similarly if (g, d) is a DGLA, then the equation
[τ, τ ] (2)
is also called the Master Equation. Sometimes the sign convention is
dτ + ττ = 0 (3)
[τ, τ ] = 0. (4)
Clearly any solution of such an equation must be an odd element of the algebra.
Moreover, in case A is the graded universal enveloping algebra of g, or g is the
Lie algebra obtained from A by the usual bracket, then solutions of the DGLA
master equation are also solutions of the DGA master equation.
Remark 2. If τ is a solution of Eq. (2) or Eq. (4) in a DGLA g, then the
odd derivation dτ = d− adτ or dτ = d+ adτ respectively defines a new DGLA
structure on g with respect to the old bracket.
Example 1. If g is a DGLA or L∞ algebra, then the corresponding coal-
gebra perturbation ∂ in Coder(Sc(sg)) is a solution of the ME in the DGA
End(Sc(sg)), where the differential is D = add. ⋄
Example 2. Gauge Theory: Let ξ be a principal bundle with structure group
G and Lie algebra g. There is a graded Lie algebra structure on the ad(ξ)-valued
de Rham forms induced by g. Given a connection A and an ad(ξ)-valued 1-form
η, the sum A+ η is again a connection, and its curvature is
FA+η = FA + dAη +
[η, η].
In particular, FA = FA+η if and only if
dAη +
[η, η] = 0
(the Maurer-Cartan equation). Here dA is the covariant derivative of the con-
nection A. When A is a flat connection (zero curvature) then there exists a
DGLA structure on the ad(ξ)-valued differential forms (d2A = 0) and FA+η is
also flat iff the MCE is satisfied (then the covariant derivative for A+ η is dτ ).
4 Twisting cochain
The notion of a twisting cochain generalizes that of a connection in differential
geometry.
4.1 Differential on Hom
If (C, dC) and (A, dA) are chain complexes, the following differential D makes
Hom(C,A) into a chain complex: Dφ = dAφ± φdC .
4.2 Cup product and cup bracket
Proposition 4. For any differential graded coalgebra C and a differential graded
associative algebra (DGA) A, the chain complex (Hom(C,A), D) becomes a
DGA via the cup (convolution) product a ⌣ b defined by the composition
−→ C ⊗ C
−→ A⊗A
−→ A.
The coaugmentation and augmentation maps η and ǫ on C and A respectively
define an augmentation map on (Hom(C,A), D).
Proposition 5. For any differential graded coalgebra C and a DGLA g, the
chain complex (Hom(C, g), D) becomes a DGLA via the cup bracket [a, b] defined
by the composition
−→ C ⊗ C
−→ g ⊗ g
[ , ]
−→ g.
Example 3. If g is a Lie algebra, then the cup bracket on Hom(Sc[sg], g) is
defined as above. For example, if τ and κ are maps Sc1[g] → g, then [τ, κ] may
be nonzero only on Sc2[sg]. In this case, we compute
∆(xy) = 1⊗ xy + x⊗ y + y ⊗ x+ xy ⊗ 1 (x, y ∈ sg),
[τ, κ](xy) = [τ(x), κ(y)] + [τ(y), κ(x)]. (5)
Example 4. The Hochschild complex of an associative algebra (A, µ) (where
µ2 = 0): Let
C•(A) = Hom(T cA,A) = Hom(
A⊗n, A),
with differential D = adµ ∈ Der(C
•(A)). The cup product x ⌣ y = {µ}{x, y}
is the composition
−→ T cA⊗ T cA
−→ A⊗A
−→ A;
if x is an n-linear map and y is an m-linear map, then
(x ⌣ y)(a1 ⊗ · · · ⊗ an+m) = x(a1 ⊗ · · · ⊗ an) · y(an+1 ⊗ · · · ⊗ an+m).
Remark 3. The differential D above is an inner derivation and not derived
from differentials on A and T cA. Still, it is a derivation of the cup product.
4.3 Twisting cochain
Definition 5. Given a coaugmented DG coalgebra C and an augmented DGA
A, a twisting cochain is a homogeneous morphism t : C → A of degree −1 such
that ǫτ = 0 and τη = 0 , and which satisfies Dt = t ⌣ t.
In other words, a twisting cochain is a solution of the master equation on
Hom(C,A) with the usual differential D induced from those of C, A and the
product is the cup product.
Definition 6. Given a DG cocommutative coalgebra C and a DGLA g, a Lie
algebra twisting cochain t : C → g is a homogeneous map of degree −1 whose
composition with the coaugmentation map is zero, and which satisfies
[t, t] (6)
([ , ] being the cup bracket).
Recall that a DGLA structure on a graded chain complex (g, d) is given by
a perturbation ∂ of the corresponding codifferential on Sc[sg]. Moreover, the
piece d∂ + ∂d = 0 of (d+ ∂)2 = 0 says that the bracket is a chain map and the
piece ∂2 = 0 says that the bracket satisfies the Jacobi identity. Let us denote
the symmetric coalgebra with the codifferential ∂ by Sc
[ , ]
[sg]. Quillen’s notation
C[g] for the same DG coalgebra reminds us that this is the Koszul or Chevalley-
Eilenberg complex that computes the homology of g without any regard for the
additional differential d on g.
Example 5. For any DGLA g, its universal Lie algebra twisting cochain
τg : S
[ , ][sg] → g
is given by
τg(sx) = x for x ∈ g
τg(y) = 0 for y ∈ S
k[sg], k 6= 1.
That is, an element with tensor degree one goes to its desuspension and ev-
erything else goes to zero. Clearly, the composition τgη is zero, as τg = 0 on
constants. Next, we show that τg satisfies the equation (6), but we note that in
this construction the differential on g itself is taken to be zero. On the left-hand
side, we have
Dτg(x1 ∧ · · · ∧ xn) = τg∂(x1 ∧ · · · ∧ xn)
which is zero if n 6= 2 and is equal to [x1, x2] if n = 2. Meanwhile, on the
right-hand side, we have
[τg, τg](x1 ∧ · · · ∧ xn)
which is zero if n 6= 2 and is equal to
{[τg(x1), τg(x2)]± [τg(x2), τg(x1)]} = [x1, x2]
if n = 2. ⋄
Remark 4. The universal property of the universal LA twisting cochain is that
every Lie algebra twisting cochain factors through this one: that is, whenever
C is a coalgebra and τ : C → g is a twisting cochain, then τg ◦ c(τ) = τ where
c(τ) : C → Sc
[ , ]
[sg] is the unique coalgebra map induced by τ .
Using HPT, we will construct formal solutions
τ ∈ Hom(ScD[sH(g)], g)
of the master equation. Once we make the choice of a contraction, we will obtain
explicit inductive formulas for D and τ .
5 Homological perturbation theory (HPT)
“HPT is concerned with transferring various kinds of algebraic structure through
a homotopy equivalence”. Also: “HPT is a set of techniques for the transference
of structures from one object to another up to homotopy” (Real [14]).
5.1 Contraction
Definition 7. Let (M,dM ) and (N, dN ) be chain complexes, π : N → M
and ∇ : M → N be chain maps, and h ∈ End(N) be a morphism (possibly
preserving some extra structure) of degree 1. Then a contraction
N, h) (7)
of N onto M is a collection of the above data satisfying
π∇ = idM
D(h) = addN (h) = ∇π − idN
πh = 0, h∇ = 0, hh = 0.
Another way to describe this structure is to say that M is a strong deformation
retract (SDR) of N (also called Eilenberg-Zilber data). The properties on the
last line are referred to as the annihilation properties or side conditions. Note
that the first line makes π surjective (projection) and ∇, injective (inclusion).
The map h is also known as the homotopy operator between ∇π and idN :
∇π − idN = D(h) = dNh+ hdN
(D = addN is the induced differential on Hom(N,N)).
Often filtered contractions are considered.
Remark 5. Lambe and Stasheff [12] noticed that the side conditions on h are
not restrictive: if πh = 0 and h∇ = 0 are not satisfied, then we can replace
h by h′ = D(h)hD(h). Now if h2 = 0 is not satisfied either, we replace h′ by
h′′ = h′dNh
′, which finally gives us an operator h′′ satisfying the side conditions.
Lemma 2. Given a contraction (7), we have a (not necessarily direct) sum
N = Im(∇) + Im(h) + Im(dN ).
Proof. Each x ∈ N can be written as
x = ∇π(x) − hdN (x) − dNh(x). (8)
Lemma 3. [14] Given a contraction (7), we have
Im(∇) + Im(h) = Im(∇)⊕ Im(h) = Ker(h).
Proof. We have
Im(∇) ⊂ Ker(h) and Im(h) ⊂ Ker(h)
since h∇ = 0 and h2 = 0. Conversely, by (8), each x ∈ Ker(h) can be written
x = ∇π(x) − hdN (x) ∈ Im(∇) + Im(h).
That the sum is direct can be seen as follows: let x ∈ Im(∇) ∩ Im(h). Then we
have x = ∇(y) = h(z) for some y ∈ M and z ∈ N . Rewriting the decomposition
of x in Ker(h), we obtain
x = ∇π(x) − hdN (x)
= ∇πh(z)− hdN∇(y)
= 0− hdN∇(y) (πh = 0)
= h∇dM (y) (∇ chain map)
= 0 (h∇ = 0).
Corollary 1. For any contraction (7), we have
H(N, h) ∼= Im(∇) ∼= M.
Lemma 4. Given any contraction (7), we have
Im(h) + Im(dN ) = Im(h)⊕ Im(dN ).
Moreover, if dM ≡ 0, then
Im(h)⊕ Im(dN ) = Ker(π).
Proof. Say x ∈ Im(h) ∩ Im(dN ). Then x = h(y) = dN (z) for some y, z ∈ N ,
so that by (8) we obtain
x = ∇πh(y) − hdNdN (z)− dNhh(y) = 0.
It is always true that Im(h) ⊂ Ker(π) as πh = 0. If dM ≡ 0, we further have
the result
πdN (x) = −dMπx = 0,
so that altogether
Im(h)⊕ Im(dN ) ⊂ Ker(π).
Conversely, for x ∈ Ker(π), we see that (even without the condition dM = 0)
Ker(π) ⊂ Im(h) + Im(dN )
since
x = −hdN (x)− dNh(x) ∀x ∈ Ker(π)
due to (8). �
Lemma 5. For any contraction (7) with dM = 0 we have
Im(∇) + Im(dN ) = Im(∇) ⊕ Im(dN ) = Ker(dN ).
Proof. The given sum is direct: let x ∈ Im(∇) ∩ Im(dN ). Then x = ∇(y) =
dN (z) and by (8)
x = ∇πdN (z)− hdNdN (z)− dNh∇(y) = ∇πdN (z) = −∇dMπ(z) = 0.
Clearly, we have Im(dN ) ⊂ Ker(dN ). Also Im(∇) ⊂ Ker(dN ), because
dN∇(x) = −∇dM (x) = 0.
Conversely, by (8),
Ker(dN ) ⊂ Im(∇) + Im(dN )
since we can write
x = ∇π(x) − dNh(x)
(no condition on dM ) for x ∈ Ker(dN ). �
Corollary 2. For any contraction (7) with dM = 0 we have
H(N, dN ) ∼= Im(∇) ∼= M.
Proposition 6. For any contraction (7) with dM = 0 we have
N = Im(∇)⊕ Im(h)⊕ Im(dN )
where
Im(∇)⊕ Im(h) = Ker(h)
Im(h)⊕ Im(dN ) = Ker(π)
Im(∇)⊕ Im(dN ) = Ker(dN ),
2. Im(h)
→ Im(dN ) is an isomorphism with inverse Im(dN )
→ Im(h), and
3. Im(∇)
→ M is an isomorphism with inverse M
→ Im(∇); we also have
Im(∇) ∼= M = Im(π) ∼= H(N, dN ) ∼= H(N, h).
Remark 6. This is a Hodge-type decomposition reminiscent of the case of a
compact orientable Riemannian manifold M without boundary. If
∗ : Ωr(M) → Ωdim(M)−r
is the “Hodge star operator” (an isomorphism) and
d : Ωr−1(M) → Ωr(M)
is the de Rham differential, then we define a “partial inverse” d† (the adjoint
exterior derivative operator) to −d by d† = ± ∗ d∗. The commutator of d and
d† is called the “Laplace-Beltrami operator”: ∆ = dd† + d†d. Then there exists
a unique decomposition of the algebra of de Rham forms as follows:
Ωr(M) = Harmr(M)⊕ d(Ωr−1(M))⊕ d†(Ωr+1(M)),
where the “harmonic forms” are given by Harmr(M) = Ker(∆). In the case of
our general contraction with dM = 0, the operators h and dN replace d and d
respectively. What do we know about ∆ here? We have
∆ = D(h) = hdN + dNh = (h+ dN )
2 = ∇π − idN .
The kernel of this operator is equal to ∇(M), as we have
(∇π − idN )(x) = 0 ⇔ ∇π(x) = x ⇔ x ∈ ∇(M),
or Ker(∆) = Im(∇). So is there an analog of the Hodge star operator? If we
define an isomorphism ∗ = h+ dN + Id∇(M) (where the last operator is zero on
the remaining direct summands), then we have ∗−1 = −h− dN + Id∇(M), and
−1 = (h+dN+Id∇(M))dN (−h−dN+Id∇(M)) = −hdNh = −hIdIm(dN ) = h.
Remark 7. The operator d† is more like the BV operator than the (even)
Laplacian, which is not square-zero. Another similar case is Q (BRST operator)
and b0 (anti-ghost operator), for which we have Qb0 + b0Q = L0 (the degree
operator which is zero on the cohomology).
Proof. We only need to prove (2) and part of (3). First, we want to show that
−dNh = idIm(dN ) But then for x = dN (y), we have
− dNh(x) = [−dNh]dN (y)
= [hdN −∇π + idN ]dN (y)
= −∇[πdN ](y) + dN (y)
= −∇[−dMπ]dN (y) + x
Similarly, we would like to have −hdN = idIm(h). If x = h(y), then
− hdN (x) = [−hdN ]h(y)
= [dNh−∇π + idN ]h(y)
= −∇πh(y) + h(y)
Finally, we have π∇ = idM and ∇π∇ = ∇idM = ∇, which shows the isomor-
phism between ∇(M) and π(N). �
Example 6. Let (g, d) be a chain complex. Assume that the underlying ring
is a field. Then there exists a contraction
(H(g)
g, h) (9)
of chain complexes, where the differential on H(g) is zero: we can write g as a
linear sum
g = G⊕Ker(d) = G⊕ Im(d)⊕H(g)
by choosing arbitrary representatives of the homology classes etc.; let us show
the decomposition of an element x of g by
x = xG + xIm(d) + xH(g).
Then π is the projection of g onto H(g) and ∇ is the inclusion map of H(g) into
g. Note that as vector spaces G and Im(d) are isomorphic via d: let x, y ∈ G.
dx = dy ⇒ d(x − y) = 0 ⇒ x− y ∈ Ker(d) ∩G = {0}
and d : G → Im(d) is one-to-one as well as onto. We define h to be the inverse
of −d on Im(d) and zero on the rest of g. The linear map h is square-zero and
increases degree by one. Moreover,
(dh+ hd)(x)
= (dh+ hd)(xG + xIm(d) + xH(g))
= dh(xIm(d)) + hd(xG)
= −xIm(d) − xG
(∇π − idg)(xG + xIm(d) + xH(g))
= xH(g) − (xG + xIm(d) + xH(g))
= −xIm(d) − xG.
In comparison with the last corollary, we have
(N, dN ) = (g, d)
(M,dM ) = (H(g, d), 0)
Im(∇) = H(g, d)
Im(h) = G
Im(dN ) = Im(d).
Example 7. (The Tensor Trick) Any contraction (7) of chain complexes in-
duces a filtered contraction
(T c[M ]
T c[N ], T ch)
of coaugmented differential graded coalgebras. Here is how: the projection
PN : T
c[N ] → N
followed by the surjective chain map π : N → M gives us a linear map
π ◦ PN : T
c[N ] → M
(π ◦ PN )(x1 ⊗ · · · ⊗ xk) =
π(x1) if k = 1
0 otherwise.
which can then be made into a unique coalgebra map
T cπ : T c[N ] → T c[M ]
with the usual formula
T cπ(x1 ⊗ · · · ⊗ xk) =
x1 ⊗ · · · ⊗ π(xi)⊗ · · · ⊗ xk.
Next, the morphisms T cπ and T c∇ pass to the corresponding morphisms on
the coalgebras Sc[N ] and Sc[M ] respectively, and Sch is obtained from T ch by
symmetrization, to yield a contraction
(Sc[M ]
Sc[N ], Sch).
In particular, the contraction (9) induces
(Sc[sH(g)]
Sc[sg], Sch), (10)
which is a filtered contraction of coaugmented DG coalgebras. (Warning: Scπ
and Sc∇ are morphisms of coalgebras but Sch is not a coalgebra morphism,
although it is somewhat compatible with the coalgebra structure, being a ho-
motopy of coalgebra maps. One has to be careful when defining a homotopy of
cocommutative coalgebras.) ⋄
5.2 The first main theorem.
Assume that ∂ is the codifferential corresponding to an sh-Lie algebra struc-
ture on (g, d). Since the corresponding multilinear map on g has other compo-
nents than the binary bracket, we will denote the symmetric coalgebra on sg
with codifferential ∂ by Sc∂ [sg] and not by S
[ , ][sg]. Given two sh-Lie algebras
(g1, ∂1) and (g1, ∂2), an sh-morphism or sh-Lie map from g1 to g2 is a morphism
Sc∂1 [sg1] → S
[sg2] of DG coalgebras.
Theorem 1. Given a DGLA g and a contraction of chain complexes such as
(9), the data determine
(i) a differential D on Sc[sH(g)] (a coalgebra perturbation of the zero differen-
tial) turning the latter into a coaugmented DG coalgebra, hence endowing H(g)
with an sh-Lie algebra structure,
(ii) a Lie algebra twisting cochain
τ : ScD[sH(g)] → g
with adjoint τ̄ , written
τ̄ = (Sc∇)∂ : S
D[sH(g)] → C[g],
that induces an isomorphism on the homology, and
(iii) an extension of (Sc∇)∂ to a new contraction
(ScD[sH(g)]
(Scπ)∂
(Sc∇)∂
Sc∂ [sg], (S
cd)∂)
of filtered chain complexes (not necessarily of coalgebras).
Notes on Notation. While the induced bracket on H(g) is a strict graded
Lie bracket, the differential D may involve meaningful terms of higher order.
Let us introduce a table for the notation used in [8] for different types of chain
complexes and the corresponding symmetric coalgebras.
Chain complex Bracket(s) Sym. coalgebra Coderivation Property
(g, d) [ , ] (Sc[sg], d) ∂
graded generic DG coalgebra; coderivation
chain complex bilinear bracket induced diff. d induced by [ , ]
(g, d) [ , ]
Sc[ , ][sg], d
= C[g] ∂ (d+ ∂)
DGLA Lie bracket on g; generalized Koszul or coderivation
d derivation of it Chevalley-Eilenberg induced by [ , ]
complex
(g, 0) [ , ] (Sc[sg], 0) = Λ•(g) ∂ ∂2 = 0
Lie algebra Lie bracket Koszul complex codifferential
for homology induced by [ , ]
(g, d) ℓ2, ℓ3, . . . (S
∂ [sg], d) ∂ (d+ ∂)
2 = 0
L∞ algebra; higher brackets codiff. induced by
d = ℓ1 ℓ2, ℓ3, . . .
(H(g), 0) [ , ] (ScD[sH(g)], 0) D D
2 = 0
homology of induced Lie codiff. defined
DGLA (g, d) bracket on H(g) in the proof
with given
contraction
Sketch of Proof. We obtain the differential D and the twisting cochain τ on
Sc[sH(g)] as infinite series by induction: for b ≥ 1, write Scb for the homogeneous
degree-b component of Sc[sH(g)]. Then D, τ for b ≥ 2 are given by
τ = τ1 + τ2 + · · · , τ1 = ∇τH(g), τ
j : Scj → g, j ≥ 1, (11)
h([τ1, τb−1] + · · ·+ [τb−1, τ1])
D = D1 +D2 + · · · (12)
where Db−1 is the coderivation of Sc[sH(g)] determined by
τH(g)D
b−1 =
π([τ1, τb−1] + · · ·+ [τb−1, τ1]) : Scb → H(g).
That is, the coderivation followed by projection onto the degree-one subspace
sH(g) of Sc[sH(g)] is given by the above formula. In the notation of Subsec-
tion 2.3, we have fb = τH(g)D
b−1 and D = f̂ . For example (dropping the symbol
s for elements of sH(g)), we have τ1(x) = x ∈ H(g) ⊂ g, and τ2(xy) = h[x, y]
by (5). Let us also compute two terms of D:
τH(g)D
1(xy) = π[x, y] ∈ H(g),
τH(g)D
2(xyz) =
π( [x, h[y, z]] + [h[x, y], z] ),
etc. We can see why τ is a LA twisting cochain: since τ satisfies
τ = h
[τ, τ ]
we obtain
−dτ =
[τ, τ ]
in case of the particular SDR we constructed, and the last equation is the mas-
ter equation (the differential on H(g) being zero). The sums (11) and (12) are
infinite, but when either one is applied to a specific element in some subspace
of finite filtration degree, only finitely many terms will be nonzero. (The sum-
mand D1 is the ordinary Cartan-Chevalley-Eilenberg differential for the classi-
fying coalgebra of the graded Lie algebra H(g).) The proof that D is indeed
a coalgebra differential and τ is a twisting cochain “will be given elsewhere”.
A “spectral sequence argument” shows that τ̄ induces an isomorphism on the
homology.�
Remark 8. If ∇H(g) happens to be a Lie subalgebra of g, then [τ1, τ1] will
have values in H(g) and τ2 = (1/2)h[τ1, τ1], as well as the remaining τ j , will
be zero. Similarly, we will have D = D1.
Corollary 3. Under the hypotheses of Theorem 1,
τ : ScD[sH(g)] → g, (13)
viewed as an element of degree −1 of the DGLA Hom(ScD[sH(g)], g), satisfies
the master equation (2).
The twisting cochain (13) is our most general solution of the master equation.
The other solutions of the master equation can be derived from it.
6 Corollaries and the second main theorem
6.1 Other corollaries of Theorem 1.
Corollary 4. Under the hypotheses of Theorem 1, suppose in addition that
there is a differential D̃ on Sc[sH(g)] turning the latter into a coaugmented DG
coalgebra in such a way that (Scπ)∂ = D̃(Scπ). Then D = D̃ and (Scπ)∂ may
be taken to be Scπ. In particular, when (Scπ)∂ is zero, then the differential D
on Sc[sH(g)] is necessarily zero, that is, the new contraction in Theorem 1 has
the form
(Sc[sH(g)]
(Sc∇)∂
Sc∂ [sg], (S
ch)∂).
For example, this is the case when the composite
g ⊗ g
[ , ]
→ H(g)
is zero.
Corollary 5. Under the hypotheses of Theorem 1, suppose in addition that
there is a differential D̃ on Sc[sH(g)] turning the latter into a coaugmented DG
coalgebra in such a way that ∂(Sc∇) = (Sc∇)D̃. Then D = D̃ and (Sc∇)∂ =
S∇. In particular, when ∂(Sc∇) is zero, then the differential D on Sc[sH(g)]
is necessarily zero, that is, the new contraction in Theorem 1 has the form
(Sc[sH(g)]
(Scπ)∂
Sc∂ [sg], (S
ch)∂).
For example, this is the case when the composite
H(g)⊗H(g) →
→ g ⊗ g
[ , ]
is zero.
6.2 The second main theorem
Theorem 2. Given a DGLA g, a DGL subalgebra m of g, and a contraction
(H(g)
of chain complexes so that the composite
[ , ]
→ H(g)
is zero, then the induced bracket on H(g) is zero, that is, H(g) is abelian as a
graded Lie algebra, and the data determine a solution τ ∈ Hom(Sc[sH(g)], g) of
the master equation (2) in such a way that the following hold:
(i) The composite πτ coincides with the universal twisting cochain Sc[sH(g)] →
H(g) for the abelian Lie algebra H(g), and
(ii) the values of τ lie in m.
7 Differential Gerstenhaber and BV algebras
7.1 Differential Gerstenhaber algebras
Definition 8. A Gerstenhaber (or G-) algebra consists of
• A graded commutative and associative algebra (A, µ) (µ suppressed), and
• A graded Lie bracket (the Gerstenhaber or G− bracket) [ , ] : A⊗ A → A
of degree −1, such that
• For each homogeneous element a ∈ A, bracketing with a is a derivation of
the Lie bracket of degree |a| − 1.
That is, we want ad(a) to commute with the bracket for all a ∈ A.
Definition 9. A differential G-algebra is a Gerstenhaber algebra (A, [ , ]) with
a differential d of degree +1 on A which is a derivation of the multiplication on
We want [d, µ] = 0 and [d, d] = 0 in Gerstenhaber’s composition bracket
notation.
Definition 10. A differential G-algebra is called strict if the differential d is a
derivation of the G-bracket as well.
We want d to commute with the bracket.
Let (A, [ , ], d) be a strict differential G-algebra. We will for the moment
ignore all the extra structure on A except for the G-bracket and the differential.
As such, A is a DGLA, and we will change the notation to g to emphasize that.
We will use the grading
g1 = A
0, g0 = A
1, g−1 = A
2, . . . , g−n = A
n+1, . . .
so that the graded bracket and the differential on g are now “ordinary”: namely,
[ , ] : gj ⊗ gk → gj+k, d : gj → gj−1.
Consider a contraction of g onto H(g) as in (9). Let ∂ denote the operator
(“perturbation of d”) on Sc∂ [sg] corresponding to the Lie bracket on g. By the
Main Theorem (Theorem 1), we can transfer it to the symmetric coalgebra of
the homology: there exists a new contraction
(ScD[sH(g)]
(Scπ)∂
(Sc∇)∂
Sc∂ [sg], (S
cd)∂)
of not only filtered chain complexes but of filtered differential graded coalgebras.
The twisting cochain τ of (13) of Corollary 1 is now an element of
Hom(ScD[sH(g)], s
of degree −2, satisfying the master equation
[τ, τ ],
where D is the Hom-differential and the graded cup bracket on the right-hand
side refers to the one induced by the graded coalgebra structure on Sc[sH(g)]
and the graded Lie algebra structure on g.
7.2 Differential BV algebras
Definition 11. Let (A, [ , ]) be a G-algebra with an additional operator ∆ on
A of degree −1. If ∆ satisfies the condition
[a, b] = (−1)|a|
∆(ab)− (∆a)b − (−1)|a|a(∆b)
then it is said to be a generator of the G-algebra. In this case, (A,∆) is called
a weak Batalin-Vilkovisky (BV-) algebra. If, moreover, ∆ is exact (i.e. ∆2 = 0),
then (A,∆) is simply called a Batalin-Vilkovisky (BV-) algebra. Koszul has
shown that ∆ behaves as a derivation for the G-bracket:
∆[x, y] = [∆x, y]− (−1)|x|[x,∆y] ∀x, y ∈ A.
With respect to the original graded commutative and associative product on A,
we can only say that ∆ is a second order differential operator, or Φ3∆(a, b) = 0,
where Φr∆ are r-linear operators used to define higher order differential opera-
tors.
Now let us denote the bracket on a (weak) BV-algebra (A,∆) by [ , ].
Definition 12. If d is a differential of degree +1 that endows (A, [ , ]) with a
differential G-algebra structure, such that ∆d+d∆ = 0, then the triple (A,∆, d)
is called a (weak) differential BV-algebra.
Proposition 7. For any weak differential BV-algebra (A,∆, d), the differential
d behaves as a derivation of the G-bracket [ , ]:
d[x, y] = [dx, y]− (−1)|x|[x, dy].
That is, (A, [ , ], d) is a differential G-algebra.
Note: (A, [ , ],∆) is not a differential G-algebra unless ∆ is exact.
Example 8. [7] Let V be a Z2-graded finite dimensional vector space and sV
be its suspended-graded dual. If {xi} is a basis for V consisting of homogeneous
elements, then the dual basis {x∗i } has the property that xi and x
i always have
opposite parities. Then the algebra C[[x1, . . . , xn, x
1, . . . , x
n]] of formal power
series has the following BV operator (Laplacian):
Since the underlying algebra is graded commutative, the composition of two
derivations is a second order differential operator by any definition. Moreover
∆2 = 0, which makes a BV-algebra out of this data. ♦
7.3 Formality
7.3.1 Formality of differential graded P -algebras
Recall that our ground ring is a field of characteristic zero. Let P be a differential
graded operad and (A, d) be an algebra over P . We often want to know to
which extent the cohomology of a space reflects the underlying topological or
geometrical properties of that space.
Definition 13. The P -algebra A is called formal if there exists a strongly
homotopy P -algebra map (H(A), 0)
→ (A, d) which induces an isomorphism in
homology.
7.3.2 Examples
Example 9. (Commutative DG associative algebras.) A smooth mani-
fold M is called formal if the commutative associative DG algebra of de Rham
forms onM is formal in the sense of the above Definition. Examples are compact
Kähler manifolds, Lie groups, and complete intersections. Poisson manifolds
(proof by Sharygin and Talalaev). ⋄
Example 10. (DGLA’s.) The Hochschild complex for the algebra A =
C∞(M) of smooth functions on a Poisson manifold M (Kontsevich). ⋄
7.4 Differential BV algebras and formality
Definition 14. We will say a differential BV-algebra (A,∆, d) (A=g as a Lie
algebra) satisfies the statement of the Kählerian Formality Lemma (or the ∂∂̄
Lemma) if the maps
Ker(∆), d |Ker(∆)
→֒ (g, d),
Ker(∆), d |Ker(∆)
−→ H(g,∆)
are isomorphisms on the homology, where H(g,∆) is endowed with the zero
differential.
Remark 9. If the statement of the K.F.L. is satisfied, then proj can be extended
to a contraction
(H(g, d), 0)
Ker(∆), d |Ker(∆)
Since we have
H (Ker(∆), d) ∼= H(g, d)
H (Ker(∆), d) ∼= H(g,∆),
now we can have a contraction of Ker(∆) onto H(Ker(∆)) = H(g) as we did
with g and H(g).
Theorem 3. Let (A,∆, d) be a differential BV-algebra satisfying the statement
of the Kählerian Formality Lemma and extend the projection proj to a contrac-
(H(g, d), 0)
where
Ker(∆), d |Ker(∆)
and π =proj. Then H(g) is abelian as a graded Lie algebra and the data deter-
mine a solution
τ ∈ Hom(Sc[sH(g)], g)
of the master equation dτ =
[τ, τ ] in such a way that the following hold:
• The values of τ lie in m, that is, the composite
∆ ◦ τ : Sc[sH(g)] → g
is zero;
• The composite πτ coincides with the universal twisting cochain for the
abelian graded Lie algebra H(g); so that
• For k ≥ 2, the values of the component τk of τ on S
k[sH(g)] lie in Im(∆).
Proof. Follows from Theorem 2. �
Let us add the following condition to the ones in Theorem 3: suppose that
A consists of a single copy of the ground ring F (necessarily generated by the
unit 1 of A) and that ∆(1) = 0. Then 1 generates a central copy of the ground
ring in g (the ground ring commutes with all elements of A), and we may write
g = F ⊕ g̃
as a direct sum of differential graded Lie algebras. Here g̃ is the uniquely
determined complement of F . Why unique? We have g1 = A0 = F and we may
g−n =
An+1,
where g̃ will be closed under the (degree-zero) Lie bracket:
[ , ] : gj ⊗ gk → gj+k,
with j + k ≤ 0 if j, k ≤ 0.
Corollary 6. Assume that the hypotheses of Theorem 3, the abovementioned
conditions (A0 = F , ∆(1) = 0), and the condition H1(g) 6= 0 hold. Then the
contraction of the Theorem can be chosen in such a way that
Im(τk) ⊂ g̃ for k ≥ 2.
The statements of the main theorems have an interpretation in the context
of deformation theory, as explained below.
8 Deformation theory
Given a DGLA g, the construction of the universal solution τ from Theorems
2 and 3 relies on a chosen contraction. This provides a formal solution of the
master equation (MCE), with a perturbed differential D on Sc[sHg] in the
direction of (starting with) the Lie bracket induced on homology, endowing the
former with a dg-coalgebra structure and a twisting cochain:
τ : ScD[sHg] → g.
The moduli space interpretation of the set of solutions is along the lines of
Schlesinger-Stasheff [15]. Since our focus is on the construction of solutions of
the MCE, the reader is referred to the original text [8]. Additional details in
terms of deformation functors, tangent cohomology, and the Kuranishi functor
can be found in [13]. The relation between the latter functor and the construc-
tion of a twisting cochain corresponding to a contraction will be investigated
elsewhere.
References
[1] F. Akman and L.M. Ionescu, Higher derived brackets and deformation the-
ory I; arXiv:math.QA/0504541.
[2] F. Akman and L.M. Ionescu, Higher derived brackets and deformation the-
ory II, in preparation.
[3] F. Akman, L.M. Ionescu, and P. Sissokho, On deformation theory and
graph homology, J. Algebra 310 (2007), 730-741; arXiv:math.QA/0507077.
[4] S. Barannikov and M. Kontsevich, Frobenius manifolds and formality of
Lie algebras of polyvector fields, Internat. Math. Res. Notices 4 (1998),
201-215; arXiv:alg-geom/9710032.
[5] P. Chen, On the formality theorem for the DGLA of Drinfeld;
arXiv:math.QA/0601055.
[6] V. Dolgushev, A formality theorem for Hochschild chains;
arXiv:math.QA/ 0402248.
[7] D. Fiorenza, An introduction to the Batalin-Vilkovisky formalism; arXiv:
math.QA/0402057.
[8] J. Huebschmann and J. Stasheff, Formal solution of the master equa-
tion via HPT and deformation theory, Forum Math. 14 (2002), 847-868;
arXiv:math.AG/9906036.
[9] L.M. Ionescu, A combinatorial approach to coefficients in deformation
quantization; arXiv:math.QA/0404389.
[10] M. Kontsevich, Deformation quantization of Poisson manifolds I;
arXiv:q-alg/9709040.
[11] Y. Kosmann-Schwarzbach, Derived brackets, Lett. Math. Phys. 69 (2004),
61-87.
[12] L. Lambe and J.D. Stasheff, Applications of perturbation theory to iterated
fibrations, Manuscripta Math. 58 (1987), 363-376.
[13] M. Manetti, Deformation theory via differential graded Lie algebras;
arXiv:math.AG/0507284.
[14] P. Real, Homological perturbation theory and associativity, Homology, Ho-
motopy, and Applications 2 (2000), 51-88.
[15] M. Schlessinger and J. Stasheff, Deformation theory and rational homotopy
type, Pub. Math. Sci. IHES (1998).
[16] T. Voronov, Higher derived brackets and homotopy algebras, J. Pure
Appl. Algebra 202 (2005), 133-153.
http://arxiv.org/abs/math/0504541
http://arxiv.org/abs/math/0507077
http://arxiv.org/abs/alg-geom/9710032
http://arxiv.org/abs/math/0601055
http://arxiv.org/abs/math/0402248
http://arxiv.org/abs/math/0402057
http://arxiv.org/abs/math/9906036
http://arxiv.org/abs/math/0404389
http://arxiv.org/abs/q-alg/9709040
http://arxiv.org/abs/math/0507284
	Introduction
	Perturbations of (co)differentials
	Derivations of the tensor algebra
	Coderivations of the tensor coalgebra
	Coderivations of the symmetric coalgebra
	DGLA's and perturbations of the codifferential
	Strongly homotopy Lie algebras
	The Hochschild chain complex and DGA's
	Strongly homotopy associative algebras
	Master equation
	Twisting cochain
	Differential on Hom
	Cup product and cup bracket
	Twisting cochain
	Homological perturbation theory (HPT)
	Contraction
	The first main theorem.
	Corollaries and the second main theorem
	Other corollaries of Theorem ??.
	The second main theorem
	Differential Gerstenhaber and BV algebras
	Differential Gerstenhaber algebras
	Differential BV algebras
	Formality
	Formality of differential graded P-algebras
	Examples
	Differential BV algebras and formality
	Deformation theory
ABSTRACT
  These notes, based on the paper "Formal Solution of the Master Equation via
HPT and Deformation Theory" by Huebschmann and Stasheff, were prepared for a
series of talks at Illinois State University with the intention of applying
Homological Perturbation Theory to the derived bracket constructions of
Kosmann-Schwarzbach and T. Voronov, and eventually writing Part II of the paper
"Higher Derived Brackets and Deformation Theory I" by the present authors.

<|endoftext|><|startoftext|>
Introduction
A general framework for variational formulations of physical theories was presented in [1]. Appli-
cations to statics and dynamics of mechanical systems appear in [2, 3]. In this paper we present a
variational formulation of electrodynamics based on that framework. Our work is related to the gen-
eral formulation of linear field theories in a symplectic framework contained in [4] and to the earlier
formulations of electrodynamics contained in [5, 6]. Some of the results contained in this paper were
announced in [7].
We are presenting a variational formulation of electrodynamics in an intrinsic, frame independent
fashion in the affine Minkowski space-time using de Rham odd and even differential forms ([8, 9, 10])
which permit the rigorous formulations of electrodynamics and the description of the transformation
properties of electromagnetic fields relative to reflections (see [5, 6]). Observable quantities like the
charge contained in a compact volume or the flux of the electromagnetic field through a surface require
the integration of differential forms. The use of odd quantities is not just a matter of elegance. Even
if the space-time is assumed to be orientable, the alternative approach of using standard differential
forms and the Hodge star operator require the choice of a specific orientation, i. e. the addition of an
extrinsic structure. Other recent treatments of classical electrodynamics using odd and even differential
forms can be found in [11, 12, 13].
Our construction is an example of application of the general variational framework [1]. Similar
constructions are needed for the variational formulation of a general field theory. The linearity of
classical electrodynamics and the choice of formulating it on the affine Minkowski space makes our
presentation simpler. Relying on a variational principle more complete than the Hamilton principle
our formulation leads to field equations with external sources. This variational principle also permits
http://arxiv.org/abs/0704.0433v3
the derivation of the constitutive relations which are usually postulated separately since the variations
normally considered are not general enough to derive them from the variational principle.
We interpret a domain in space-time as an odd de Rham 4-current. This permits a treatment of
different types of boundary problems in an unified way. As an example we obtain a smooth transition to
the infinitesimal version by using a current with a one point support. De Rham currents are essentially
objects dual to differential forms.
The present paper is organized in the following way. In the first part we provide the geometric
structures needed for the rigorous formulation of electrodynamics. Part of this material is based on [5]
and is briefly reported here for the sake of completeness. We recall the Cartan calculus for odd and
even differential forms and their integration theory over odd and even de Rham currents.
The second part contains the main results. We start with the construction of a suitable space of
fields for electrodynamics (not a differential manifold) and a construction of tangent and cotangent
vectors. A convenient representation of these objects is introduced. The definition of the space of
fields is inspired by a similar construction suited for the statics of continuous media which is contained
in the final section of [1]. In Section 3 we formulate a variational principle for electrodynamics similar
to the virtual action principle of analytical mechanics with external forces and boundary terms and
derive the field equations which include the constitutive relations in addition to Maxwell’s equations.
The boundary problem in a finite domain is treated in Section 4. Section 5 contains the Lagrangian
formulation of electrodynamics. The Legendre transformation and the Hamiltonian formulation of
electrodynamics in Section 6 conclude the paper.
A. Preliminaries
Here we provide the geometric structures needed for the rigorous formulation of electrodynamics in
Part B. The material in Sections 1, 2, and 4 is based on [5] and is briefly recalled here for the sake of
completeness. Nevertheless, we add a more explicit presentation of some useful details.
1.Orientations of vector spaces and vector subspaces.
Let V be a vector space of dimension m 6= 0. We denote by F(V ) the space of linear iso-
morphisms from V to Rm called frames. It is known that F(V ) is a homogeneous space with re-
spect to the natural group action of the general linear group GL(m,R) in F(V ). Let GL+(m,R)
and GL−(m,R) be the two connected components of the group GL(m,R). The set of orientations
O(V ) = F(V )
GL+(m,R) has two elements. This set is a homogeneous space for the quotient group
H(m,R) = GL(m,R)
GL+(m,R). The sets E = GL+(m,R) and P = GL−(m,R) are the elements of
the quotient group H(m,R) which is the group of permutations of the two elements of O(V ). There is
an ordered base (e1, e2, . . . , em) of V associated with each frame ξ in a obvious way.
Let W ⊂ V be a subspace of a vector space V . The subspace has the set O(W ) of orientations called
inner orientations of W . Orientations of the quotient space V
W are called outer orientations of W .
An outer orientation o′′ of W can be determined by specifying an inner orientation o of W together
with an orientation o′ of V . Let (e1, . . . , en) be the base of W associated with a frame ξ ∈ o. This base
can be completed to a base (e′1, . . . , e
m) of V with (e
1, . . . , e
n) = (e1, . . . , en). The extended base can
be chosen to be associated with a frame ξ′ ∈ o′. Let π:V → V
W be the canonical projection. The
sequence
(e′′1 , . . . , e
m−n) = (π(e
n+1), . . . , π(e
m)) (1)
is a base of V
W . It determines an orientation o′′ of V
W . Hence an outer orientation of W . The
outer orientation o′′ of W constructed from o ∈ O(W ) and o′ ∈ O(V ) is the same as the orientation
constructed from Po and Po′.
We have introduced inner orientation of subspaces of dimension different from zero and outer orien-
tation of subspaces of codimension different from zero. Integration theory of differential forms requires
the possibility of assigning inner orientations to the subspace W = {0} ⊂ V and outer orientations to
the subspace W = V . Two possible orientations are assigned to the subspace W = {0} ⊂ V one of
which is distinguished. The distinguished orientation is denoted by (+) and the other orientation is
denoted by (−). In agreement with the conventions established for orientation the outer orientations
of the subspace W = {0} ⊂ V are the orientations of V . An outer orientation of W = {0} can be
specified in terms of an inner orientation and an orientation of V . If the inner orientation is (+) and the
orientation of V is o, then o is the outer orientation of W . The orientation Po is the outer orientation
derived from (−) and o. The subspace W = V has a distinguished outer orientation defined as an
orientation of V
2.Multicovectors and multivectors.
A q - covector in a vector space V is a mapping a :×qV ×O(V ) → R. This mapping is q - linear and
totally antisymmetric in its vector arguments. A q - covector a is said to be even, if
a(v1, v2, . . . , vq, Po) = a(v1, v2, . . . , vq, o). (2)
It is said to be odd, if
a(v1, v2, . . . , vq, Po) = −a(v1, v2, . . . , vq, o). (3)
The vector space of even q - covectors will be denoted by ∧qeV
∗ and the space of odd q - covectors will
be denoted by ∧qoV
∗. We will use the symbol ∧qpV
∗ to denote either of the spaces in constructions valid
for both parities. The index p with the two possible values e and o will be used on other occasions.
For the definition of exterior product of (even and odd) multicovectors we refer to [5]. Here we just
recall that if two multicovectors a and a′ are even or both are odd, the product a∧a′ is even. In other
cases the product is odd. The exterior product is commutative in the graded sense and associative.
Let {eκ}κ=1,... ,m be a base of V and let {e
κ}κ=1,... ,m be the dual base. Each element e
κ defines an
even covector
eκe :V × O(V ) → R: (v, o) 7→ 〈e
κ, v〉. (4)
We choose an orientation o of V and introduce odd 0-covector eo defined by eo(o) = −eo(Po) = 1 and
the even covector ee defined by ee(o) = ee(Po) = 1.
We come now to multivectors. We denote by K(×qV × O(V )) the vector space of formal linear
combinations of sequences (v1, v2, . . . , vq, o) ∈ ×
qV ×O(V ). In the space K(×qV ×O(V )) we introduce
subspaces
A pq (V ) =
i=1 λi(v
2, . . . , v
i) ∈ K(×qV × O(V ));
i=1 λia(v
2, . . . , v
i) = 0 for each a ∈ ∧qpV
. (5)
Subsequently we define quotient spaces ∧qpV = K(×
qV × O(V ))
A pq (V ). Elements of spaces ∧
eV and
∧qoV are called even q - vectors and odd q - vectors respectively. A multivector is said to be simple if
it is represented by a single element of the space V q × O(V ) interpreted as a subspace of K(V q ×
O(V )). Evaluation of q - covectors on sequences (v1, v2, . . . , vq, o) ∈ ×
qV × O(V ) extends to linear
combinations and their equivalence classes. If w is a q - vector represented by the linear combination∑n
i=1 λi(v
2, . . . , v
i) and a is a q - covector of the same parity as w, then
〈a, w〉 =
i=1 λia(v
2, . . . , v
i) (6)
is the evaluation of a on w. We have constructed pairings 〈 , 〉:∧qpV
∗×∧qpV → R. The exterior product
of multivectors can be easily defined using representatives. The parity of the exterior product is odd
if the parity of one of the factors is odd. It is even otherwise. The exterior product is commutative in
the graded sense and associative.
The left interior multiplications are the operations :∧qpV × ∧
V ∗ → ∧
V ∗, defined for q 6 q′
by 〈w a,w′〉 = 〈a, w ∧w′〉. The parity pp′ which appears in this definition is constructed by assigning
the numerical values +1 and −1 to e and p respectively. The parity of the multivector w′ must
match the parity of the multicovector w a. The right interior multiplications are the operations
:∧qpV×∧
∗ → ∧
pp′ V, defined for q > q
′ by 〈a′, w a〉 = 〈a′∧a, w〉. The parity of the multicovector
a′ in this definition must match the parity of the multivector w a.
3.The Weyl isomorphism and a useful formula.
The space ∧mo V
∗ is one-dimensional. This makes it possible to define the tensor product ∧qeV ⊗∧
as the set of equivalence classes of pairs (w, e) ∈ ∧qeV ×∧
∗. Pairs (w, e) and (w′, e′) are equivalent
if there is a number λ such that w′ = λw and e = λe′ or w = λw′ and e′ = λe. The equivalence class
of a pair (w, e) will be denoted by w ⊗ e. A tensor a ∈ ∧qeV ⊗ ∧
∗ will always be presented as a
product w ⊗ e. The set ∧qeV ⊗ ∧
∗ is a vector space (see [5]).
Proposition 1 ([5]). The linear mapping
Weq:∧
eV ⊗ ∧
∗ → ∧m−qo V
∗:w ⊗ e 7→ w e (7)
is an isomorphism.
The mapping Weq is called the Weyl isomorphism.
We will show now that the values of any bilinear mapping b:∧qeV
∗ × ∧q
∗ → ∧mo V
∗ can be
expressed using the exterior product. The following three technical lemmas will be used.
Lemma 1. Let w ∈ ∧qeV . Then
w (eo ∧ e
e ∧ . . . ∧ e
e ) =
ν1<...<νq
(νi−i)〈eν1e ∧ . . . ∧ e
e , w〉eo ∧ e
e ∧ . . . ∧ e
e (8)
where νq+1 < . . . < νm and (νq+1, . . . , νm) denotes the complementary (m− q)-tuple of (ν1, . . . , νq).
Proof: It is enough to prove the claim for a simple q-vector. If w = w1 ∧ . . . ∧ wq, then
(w1 ∧ . . . ∧ wq) (eo ∧ e
e ∧ . . . ∧ e
(−1)ν1−1〈eν1e , w1〉wq
. . .
eo ∧ e
e ∧ . . . ∧ ê
e ∧ . . . ∧ e
= . . . =
ν1 6=...6=νq
(νi−i) sgn(ν1, . . . , νq)〈e
e , w1〉 · · · 〈e
e , wq〉
eo ∧ e
1 ∧ . . . ∧ êν1e ∧ . . . ∧ ê
e ∧ . . . ∧ e
ν1<...<νq
σ∈S(q)
(νi−i) sgnσ〈e
νσ(1)
e , w1〉 · · · 〈e
νσ(q)
e , wq〉
eo ∧ e
1 ∧ . . . ∧ êν1e ∧ . . . ∧ ê
e ∧ . . . ∧ e
ν1<...<νq
σ∈S(q)
(νi−i) sgnσ〈e
νσ(1)
e , w1〉 · · · 〈e
νσ(q)
e , wq〉eo ∧ e
νq+1 ∧ . . . ∧ eνme
ν1<...<νq
(νi−i) det(〈eνre , ws〉)eo ∧ e
νq+1 ∧ . . . ∧ eνme . (9)
In the second line of this sequence of equalities we used the well known identity
v (a1 ∧ . . . ∧ aq) =
(−1)ν−1〈aν , v〉a1 ∧ . . . ∧ âν ∧ . . . ∧ aq (10)
where v is a vector and a1, . . . , aq are covectors. In the third line we applied repeatedly this identity
and we noted that when ν1 < . . . < νq each missing covector e
e in the expression eo∧e
1∧. . .∧êν1e ∧. . .∧
e ∧ . . .∧ e
e occupied the (νi− i+ 1)-th place in the exterior product, otherwise if νi−l−1 < νi < νi−l,
then it occupied the (νi − i − l + 1)-th place, hence we get the factor (−1)
(νi−i) sgn(ν1, . . . , νq)
where the symbol sgn(ν1, . . . , νq) denotes the sign of the permutation (ν1, . . . , νq).
Lemma 2. If a ∈ ∧qeV
∗ and w ⊗ e ∈ ∧qeV ⊗ ∧
∗, then
〈a, w〉e = a ∧ Weq(w ⊗ e). (11)
Proof: It is enough to prove the claim for e = eo ∧ e
e ∧ . . . ∧ e
e . By the Lemma 1 we get
a ∧ Weq(w ⊗ e)
= a ∧ (w (eo ∧ e
e ∧ . . . ∧ e
ν1<...<νq
(νi−i)〈w, eν1e ∧ . . . ∧ e
e 〉a ∧ eo ∧ e
e ∧ . . . ∧ e
ν1<...<νq
(νi−i)w
ν1...νq
µ1<...<µq
aµ1...µqee ∧ e
e ∧ . . . ∧ e
e ∧ eo ∧ e
e ∧ . . . ∧ e
e . (12)
The right-hand side of (12) reduces to
ν1<...<νq
(νi−i)wν1...νqaν1...νqeo ∧ e
e ∧ . . . ∧ e
e , (13)
since νq+1 < . . . < νm and (νq+1, . . . , νm) is the complementary (m− q)-tuple of (ν1, . . . , νq). Finally
we note that moving each νi (for i = 1, . . . , q) to the νi-th place in (12) requires νi − i transpositions,
since ν1 < . . . < νq. Then each of the remaining νi, i.e. those with i = q + 1, . . . ,m, will necessarily
be at the νi-th place. Hence we obtain
a ∧ Weq(w ⊗ e) =
ν1<...<νq
aν1...νqw
ν1...νqeo ∧ e
e ∧ . . . ∧ e
e . (14)
We denote by Hom(∧mo V
∗| ∧qe V
∗) the space of linear mappings from ∧qeV
∗ to ∧mo V
∗ and by
iq: Hom(∧
∗| ∧qe V
∗) → ∧qeV ⊗ ∧
∗ (15)
the isomorphism characterized by
〈iq(l), a
′ ⊗ u〉 = 〈l(a′), u〉 (16)
for each l ∈ Hom(∧mo V
∗| ∧qe V
∗), a′ ∈ ∧qeV
∗ and u ∈ ∧mo V . The pairing
〈 , 〉: (∧qeV ⊗ ∧
∗) × (∧qeV
∗ ⊗ ∧mo V ) → R: (w ⊗ e, a⊗ u) 7→ 〈a, w〉〈e, u〉 (17)
is used.
Lemma 3. If l ∈ Hom(∧mo V
∗| ∧qe V
∗), then
l(a) = a ∧ Weq(iq(l)), (18)
for each a ∈ ∧qeV
Proof: Let iq(l) = wl ⊗ e with wl ∈ ∧
eV . Using Lemma 2 we obtain
a ∧ Weq(i(l)) = a ∧ Weq(wl ⊗ e) = 〈a, wl〉e. (19)
We will show now that l(a) = 〈a, wl〉e. Indeed if we denote by u ∈ ∧
o V the dual basis of e then we
have l(a) = 〈l(a), u〉e and
〈l(a), u〉 = 〈i(l), a⊗ u〉 = 〈wl ⊗ e, a⊗ u〉 = 〈a, wl〉〈e, u〉 = 〈a, wl〉. (20)
We can now finally show that a useful expression can be obtained for the values of any bilinear
mapping
b:∧qeV
∗ × ∧q
∗ → ∧mo V
∗. (21)
We associate with b the linear mappings
b:∧qeV
∗ → ∧q
e V ⊗ ∧
∗: a 7→ iq′(b(a, ·)) (22)
∗ → ∧qeV ⊗ ∧
∗: a′ 7→ iq(b(·, a
′)). (23)
Proposition 2. If b as in (21) is a bilinear mapping and b, b are the associated linear mappings (22)
and (23), then
b(a, a′) = a′ ∧ Weq′(b(a)) = a ∧ Weq(b(a
′)). (24)
for each (a, a′) ∈ ∧qeV
∗ × ∧q
Proof: Applying Lemma 3 to l = b(a, ·) we get
b(a, a′) = b(a, ·)(a′) = a′ ∧ Weq′(iq′(b(a, ·))) = a
′ ∧ Weq′(b(a)). (25)
The other equality is obtained in the same way by applying Lemma 3 to l = b(·, a′).
4. Integration of differential forms. Chains and currents.
Let M be an affine space modelled on a vector space V . A differential q-form on M is a differentiable
function A:M ××qV × O(V ) → R depending on a point, q vectors and an orientation. It is q-linear
and totally antisymmetric in its vector arguments. A differential form A is said to be even or odd if
for each point in x ∈ M it defines a even or odd multicovector, respectively. We note that a zero-form
on M is a differentiable function f :M ×O(V ) → R. The vector space of even differential q-forms will
be denoted by Φqe(M) and space of odd differential q-forms will be denoted by Φ
o(M). The symbol
Φqp(M) will be used to denote either of the two spaces when the distinction is of no importance.
For the definition of the exterior product of two differential forms and for that of exterior differential
of a differential form, we refer the reader to [5]. Here we just recall that if both forms A and A′ are
even or both are odd, the product A ∧ A′ is even. In other cases the product is odd. The parity of
the differential dA of a q-form A is the same as the parity of the original form A. A q-form A can
be interpreted as a q-covector field Ã:M → ∧qpV
∗. The exterior product and the exterior differential
are extended to this alternative interpretation of forms. The left and right interior multiplications of
even and odd multivector fields with even and odd multicovector fields are defined point by point in
an obvious manner.
A cell of dimension q or a q - cell in M is a pair (χ, o), where χ is a differentiable mapping χ:Rq → M
and o is an orientation of V . For q = 0, R0 is the vector space {0} with a single element 0. Hence a
zero-cell in M is a pair of a point x ∈ M and an orientation of V . The integral of a q - form A on a
cell (χ, o) is the Riemann integral
(χ,o)
· · ·
A (χ(s1, . . . , sq),D1χ(s1, . . . , sq), . . . ,Dqχ(s1, . . . , sq), o) ds1 · · · dsq. (26)
The integral of a 0 - form f on a zero-cell (x, o) is the value
(x,o)
f = f(x, o). For each q we introduce
the space K(Xq(M)) of formal linear combinations of q - cells. The formal linear combinations turn
into real linear combinations if cells are identified with elements of K(Xq(M)). Integration of forms is
extended to linear combinations by linearity. Subspaces N pq (M) ⊂ K(Xq(M)) are defined as the sets
q (M) =
C ∈ K(Xq(M));
A = 0 for each A ∈ Φqp(M)
. (27)
Elements of the quotient spaces Ξpq(M) = K(Xq(M))
N pq (M) are called even chains or odd chains of
dimension q. We extend the sequence of even and odd chains to negative dimension q by defining the
spaces Ξpq(M) = {0} for each q < 0. A chain is said to be simple if it has a single cell as one of its
representatives. Integrals of q - forms on q - chains are well defined. The integral of a q - form A on the
class C of C ∈ K(Xq(M)) is the integral of A on C.
The boundary operator ∂ assigns to a chain C ∈ Ξ pq (M) its boundary ∂C ∈ Ξ
q−1(M). The boundary
of a simple chain represented by a q - cell (χ, o) is the chain represented by the combination
i=1(−1)
i−1((χ(i,1), o) − (χ(i,0), o)), (28)
where the (q−1) - cells (χ(i,1), o) and (χ(i,0), o) are defined by
(i,∗):Rq−1 → M : (s1, . . . , ŝi, . . . , sq) 7→ χ(s1, . . . , si−1, ∗, si+1 . . . , sq) (29)
with ∗ replaced by 1 and 0, respectively. The cells introduced in (29) represent the faces of the simple
chain. The construction of the boundary is extended to generic chains by linearity. The boundary of
a boundary is the zero chain. It is known that Stokes theorem holds for chains and forms of the same
parity.
An even or odd de Rham current of dimension q on a manifold M is a linear function c: Φqp(M) → R.
We will use the symbol
A to denote the value c(A). The spaces of forms are given certain topologies
and the continuity of the function is required. Chains will be treated as currents. They form a dense
subspace in the space of currents. We will consider only very simple examples of currents other than
chains. Topological considerations are of little importance for these examples. The boundary of a
current is defined by assuming that Stokes theorem holds for currents. Thus if c is a current of
dimension q, then the boundary of c is the mapping
∂c: Φq−1p (M) → R:A 7→
dA. (30)
In addition to chains the odd de Rham current most frequently used is the Dirac current wδ(x) of
dimension m derived from an odd m - vector w and a point x ∈ M . If A is an odd m - form, then
wδ(x)
A = 〈Ã(x), w〉. (31)
B. Electrodynamics
1.The space of fields.
In this section we will construct a suitable space of fields for electrodynamics which has the same
role of the space of motions in mechanics (see [3]). The space of fields is not a differentiable manifolds.
Nevertheless, we will introduce in this space enough structure to permit the construction of tangent
and cotangent vectors which are the objects needed in in the formulation of the variational principle.
A class of admissible functions defined on the space of fields has also to be specified. This class needs
to be large enough to include the action, which for electrodynamics is generated (in a sense that will be
specified) from a quadratic Lagrangian density, due to the linearity of the theory. Thus, only functions
generated from quadratic mappings will be used in our variational formulation of electrodynamics. A
larger class of admissible functions would be needed to deal with more general, non linear field theories,
and the price of increased technical difficulties should be paid.
Let M be the affine Minkowski space-time of special relativity with the 4-dimensional model space
V and the non degenerate metric tensor g : V → V ∗ of signature (1, 3). The space of odd 4-currents
with compact supports in M will be denoted by CM . Differential forms will always be presented as
covector fields.
We consider the set X(Φ1e(M);CM) of pairs (A, c), where c is an odd current of dimension 4 in M
with a compact support Sup(c) and A is a local even 1 - form
A:U → ∧1eV
∗ (32)
defined in an open set U ⊂ M containing the support of c. The 1 - form A will represent the electro-
magnetic potential.
A mapping
κ:M × ∧1eV
∗ × ∧2eV
∗ → ∧4oV
∗ (33)
is said to be quadratic if for each x ∈ M there exists a symmetric bilinear mapping
∗ × ∧2eV
∗ × ∧2eV
∗) → ∧4oV ∗ (34)
such that the mappings κx = κ(x, ·, ·) and δ
2κx satisfy the equality
κx(a, f) =
κx((a, f)(a, f)), (35)
for each (a, f) ∈ ∧1eV
∗ × ∧2eV
∗. We are using the standard definition of quadratic mappings in terms
of polarizations. The polarization of the mapping κx is the mapping δ
2κx and the equation (35) is the
standard relation between a quadratic mapping defined on a vector space and its polarization. We will
use the set of quadratic mappings (33) to introduce an equivalence relation in the set X(Φ1e(M);CM).
In the next section they will be also used to define the admissible functions defined on the space of
fields.
Pairs (A, c) and (A′, c′) are equivalent if
κ ◦ (x,A′, dA′) =
κ ◦ (x,A, dA) (36)
for each quadratic mapping κ:M × ∧1eV
∗ × ∧2eV
∗ → ∧4oV
∗. The symbol x is used to indicate the
identity mapping of M and also a point of M .
If µ is an arbitrary odd 4-form on M and κ is set to be the mapping
κ:M × ∧1eV
∗ × ∧2eV
∗ → ∧4oV
∗: (x, a, f) 7→ µ(x), (37)
then the equivalence condition (36) reduces to
µ (38)
and implies that c′ = c.
Equivalence classes of elements of X(Φ1e(M);CM) will be called fields. Our fields are similar to
those used by Freed in [14]. The space of fields will be denoted by Q(Φ1e(M);CM) or simply Q. The
equivalence class of (A, c) will be denoted by q(A, c) or simply q. There is a natural projection
ε:Q(Φ1e(M);CM) → CM : q(A, c) 7→ c (39)
from the space of fields to the space CM of currents in M . It is not difficult to check that each fibre
ε−1(c) of the projection ε is a vector space which will be denoted by the symbol Q(Φ1e(M); c) or Qc.
Indeed, the sum of two pairs (A, c) and (A′, c) with A:U → ∧1eV
∗ and A′:U ′ → ∧1eV
∗ is the pair
(A + A′, c) defined in U ∩ U ′ ⊃ Sup(c). Descending to the quotient with respect to the equivalence
relation defined by (36) easily gives the sum in Qc. Therefore, the space of fields is the disjoint union
of vector spaces Q =
2.Functions, vertical tangent vectors and covectors in the space of fields.
With each quadratic mapping κ:M × ∧1eV
∗ × ∧2eV
∗ → ∧4oV
∗ we associate the function
k:Q(Φ1e(M);CM) → R: q(A, c) 7→
κ ◦ (x,A, dA). (40)
The above mentioned admissible functions defined on the space of fields are those constructed in this
way. They will be called differentiable. The space of such functions will be denoted by K(Φ1e(M);CM).
We note that from the definition of the space of fields Q = Q(Φ1e(M);CM) (as a quotient space with
respect to the equivalence condition (36)) follows easily that the differentiable functions separate points
of Q, i.e. if k(q′) = k(q) for each k ∈ K(Φ1e(M);CM), then q
′ = q. Indeed if k(q(A′, c′)) = k(q(A, c)) for
each k ∈ K(Φ1e(M);CM), then equation (36) holds for each quadratic mapping κ:M×∧
∗×∧2eV
∗. It follows that c′ = c and q(A′, c) = q(A, c).
The tangent space to a vector space Qc (i.e. to a fibre of ε) coincides with Qc.
The vertical tangent bundle to the vector bundle Q(Φ1e(M);CM) is the space
VQ = Q ×
(ε,ε)
Q = {(q, δq) ∈ Q×Q; ε(q) = ε(δq)} (41)
where q = q(A, c) and δq = q(δA, c) and A:U → ∧1eV
∗, δA:U → ∧1eV
∗. The tangent projection is the
canonical projection
τQ:VQ → Q: (q, δq) 7→ q. (42)
There is no obvious choice of the bundle dual to VQ.
We will use the fibre derivatives of functions k ∈ K(Φ1e(M);CM) as models of covectors. The
derivative Dk of a function k:Q → R is defined for each current c separately. It is evaluated on a pair
of vectors q = q(A, c) ∈ Qc and δq = q(δA, c) ∈ Qc. The result is the expression
Dk (q(A, c), q(δA, c)) =
k (q(A + sδA, c)) (0)
κ ◦ (x,A + sδA, F + sδF )
, (43)
where F = dA and δF = dδA. The symbol dk(q(A, c)) will be used to denote the covector characterized
by the pairing
〈dk (q (A, c)) , q(δA, c)〉 = Dk (q(A, c), q(δA, c)) (44)
with vectors δq = q(δA, c) ∈ Qc.
So we need to calculate the expression
κ ◦ (x,A + sδA, F + sδF )
(x) = Dκx(A(x), F (x), δA(x) ⊕ δF (x)), (45)
for each x ∈ U . We are using the following known definition. If V1, V2,W are vector spaces and
m:V1 × V2 → W is a smooth mapping, then its derivative is the mapping
Dm:V1 × V2 × (V1 ⊕ V2) → W : (a, f, δa⊕ δf) 7→
(m(a + sδa, f + sδf)) (0). (46)
We define for each x ∈ U the bilinear mappings
∗ × ∧1eV
∗ → ∧4oV
∗: (a, a′) 7→ δ2κx((a, 0), (a
, 0)), (47)
∗ × ∧2eV
∗ → ∧4oV
∗: (a, f) 7→ δ2κx((a, 0), (0, f)) (48)
∗ × ∧2eV
∗ → ∧4oV
∗: (f, f ′) 7→ δ2κx((0, f), (0, f
′)), (49)
obtaining the equality
κx(a, f) =
λx(a, a) + µx(a, f) +
νx(f, f), (50)
for each (a, f) ∈ ∧1eV
∗ × ∧2eV
∗. The equality (50) in terms of the bilinear mappings (47), (48), and
(49) will be useful to calculate the expression (45).
We have the following lemmas.
Lemma 4. If κx:∧
∗ × ∧2eV
∗ → ∧4oV
∗ is a quadratic mapping and λx, µx, νx are the mappings
defined above, then
Dκx(a, f, δa⊕ δf) =
Dλx(a, a, δa⊕ δa) + Dµx(a, f, δa⊕ δf) +
Dνx(f, f, δf ⊕ δf), (51)
for each (a, f, δa⊕ δf) ∈ ∧1eV
∗ × ∧2eV
∗ × ∧1eV
∗ ⊕ ∧2eV
Proof: We have the equality
λx ◦ ∆ ◦ pr1 + µx +
νx ◦ ∆
′ ◦ pr2, (52)
where
pr1:∧
∗ × ∧2eV
∗ → ∧1eV
∗: (a, f) 7→ a, (53)
∆:∧1eV
∗ → ∧1eV
∗ × ∧1eV
∗: a 7→ (a, a), (54)
pr2:∧
∗ × ∧2eV
∗ → ∧2eV
∗: (a, f) 7→ f, (55)
∆′:∧2eV
∗ → ∧2eV
∗ × ∧2eV
∗: a 7→ (a, a). (56)
The claim follows from the equality (52) by noting that
Dκx(a, f, δa⊕ δf) =
Dλx ◦ (∆,D∆) ◦ (pr1,Dpr1)(a, f, δa⊕ δf) + Dµx(a, f, δa⊕ δf)
Dνx ◦ (∆
,D∆′) ◦ (pr2,Dpr2)(a, f, δa⊕ δf)
Dλx ◦ (∆,D∆)(a, δa) + Dµx(a, f, δa⊕ δf)
Dνx ◦ (∆
′,D∆′)(f, δf)
Dλx(a, a, δa⊕ δa) + Dµx(a, f, δa⊕ δf) +
Dνx(f, f, δf ⊕ δf). (57)
Lemma 5. If V1, V2,W are vector spaces and
b:V1 × V2 → W (58)
is a bilinear mapping, then
Db(a, f, δa⊕ δf) = b(a, δf) + b(δa, f), (59)
for each (a, f, δa⊕ δf) ∈ V1 × V2 × (V1 ⊕ V2).
Proof: From the definition (46) of the derivative it follows that
Db(a, f, δa⊕ δf) =
(b(a, f + sδf) + sb(δa, f + sδf)) (0)
b(a, f) + sb(a, δf) + sb(δa, f) + s2b(δa, δf)
= b(a, δf) + b(δa, f). (60)
In view of Lemmas 4 and 5 the integrand in right hand side of the equality (43) reduces to
Dκx(A(x), F (x), δA(x) ⊕ δF (x)) =
Dλx(A(x), A(x), δA(x) ⊕ δA(x))
+ Dµx(A(x), F (x), δA(x) ⊕ δF (x))
Dνx(F (x), F (x), δF (x) ⊕ δF (x))
(λx(A(x), δA(x)) + λx(δA(x), A(x)))
+ µx(A(x), δF (x)) + µx(δA(x), F (x))
(νx(F (x), δF (x)) + νx(δF (x), F (x))) , (61)
Dκx(A(x), F (x), δA(x) ⊕ δF (x)) = λx(A(x), δA(x)) + µx(A(x), δF (x))
+ µx(δA(x), F (x)) + νx(F (x), δF (x)), (62)
since λx and νx are symmetric. By using the Proposition 2 contained in Section 6, this expression
reduces to
Dκx(A(x), F (x), δA(x) ⊕ δF (x)) =δA(x) ∧ We1(λx(A(x)) + δF (x) ∧ We2(µx(A(x))
+ δA(x) ∧ We1(µx(F (x)) + δF (x) ∧ We2(νx(F (x))
=δA(x) ∧
We1(λx(A(x)) + µx(F (x)))
+ δF (x) ∧ (We2(µx(A(x)) + νx(F (x)))) . (63)
where the linear mappings λx and λx are obtained from the bilinear mapping λx as prescribed in
Section 6, i.e.,
∗ → ∧1eV ⊗ ∧
∗: a 7→ i1(λx(a, ·)), (64)
∗ → ∧1eV ⊗ ∧
∗: a 7→ i1(λx(·, a)), (65)
for each x ∈ U . The mappings µx, µx are obtained from µx, and νx, νx are obtained from νx in the
same way.
We define from λx, λx the mappings
λ:M × ∧1eV
∗ → ∧1eV ⊗ ∧
∗: (x, a) 7→ λx(a), (66)
λ:M × ∧1eV
∗ → ∧1eV ⊗ ∧
∗: (x, a) 7→ λx(a). (67)
The mappings µ, µ, ν, ν are defined respectively from µx, µx, νx, νx in the same way.
By using the equalities (47) and (63) which hold for each x ∈ U we obtain the equality
κ ◦ (x,A + sδA, F + sδF )
We1 ◦
λ ◦ (x,A) + µ ◦ (x, F )
+ (We2 ◦ (µ ◦ (x,A) + ν ◦ (x, F ))) ∧ δF. (68)
We have also used the graded commutativity of the exterior product.
Hence,
Dk (q(A, c), q(δA, c)) =
κ ◦ (x,A + sδA, F + sδF )
We1 ◦
λ ◦ (x,A) + µ ◦ (x, F )
(We2 ◦ (µ ◦ (x,A) + ν ◦ (x, F ))) ∧ δF. (69)
Since F = dA and δF = dδA, this equality can be converted to the form
Dk (q(A, c), q(δA, c)) = −
We1 ◦
λ ◦ (x,A) + µ ◦ (x, dA)
+ d (We2 ◦ (µ ◦ (x,A) + ν ◦ (x, dA)))) ∧ δA
d ((We2 ◦ (µ ◦ (x,A) + ν ◦ dA)) ∧ δA) . (70)
We note that the first integral contains the exterior product of δA and an odd 3-form, while the second
integral contains the exterior differential of the product of δA and an odd 2-form.
The obtained expression suggests the representation of covectors as equivalence classes of elements
of the sets X(Φ2o(M) × Φ
o(M);CM) introduced below.
An element of X(Φ2o(M)×Φ
o(M);CM) is a triple (G, J, c) of a local odd differential 2-form G:U →
∗, a local odd differential 3-form J :U → ∧3oV
∗ and a current c with support contained in U . The
odd 2-form G will be interpreted as the electromagnetic induction, while the odd 3-form J will be
interpreted as the current. A covector p is an equivalence class p(G, J, c) of (G, J, c) ∈ X(Φ2o(M) ×
Φ3o(M);CM). The equivalence relations in X(Φ
o(M) × Φ
o(M);CM) are based on the expression
J ∧ δA−
d (G ∧ δA)
. (71)
Elements (G, J, c) and (G′, J ′, c′) are equivalent if c = c′ and
J ′ ∧ δA−
d (G′ ∧ δA)
J ∧ δA−
d (G ∧ δA)
, (72)
for each (δA, c) ∈ Qc.
This equivalence relation is related to the equivalence relation (36). They can be used in the
construction of dual objects. Indeed a (vertical) tangent vector might also (more generally) be defined
as an equivalence class of pairs (δA, c) with respect to an equivalence relation similar to (36). A
convenient representation of such relation is a relation similar to (72), but with (G, J, c) fixed, defining
the equivalence between two pairs (δA, c) and (δA′, c). Such a construction is not needed in our case
since each Qc is a vector space, so that its tangent space is canonically identified with Qc itself.
The vector space Πc of covectors associated with the current c will be used as the dual space of the
space Qc thought of as the space of vertical tangent vectors with the pairing
〈〈〈〈〈p, δq〉〉〉〉〉
J ∧ δA−
d (G ∧ δA)
, (73)
where δq = q(δA, c) ∈ Qc and p = p(G, J, c) ∈ Πc. The space of all covectors is the union
Πc. (74)
There is a natural projection
ε′: Π → CM : p(G, J, c) 7→ c. (75)
from the space of fields to the space CM of currents in M . The phase space is the space
Ph = Q ×
(ε,ε′)
Π = {(q, p) ∈ Q× Π; ε(q) = ε′(a)}. (76)
The symbol Phc will denote the set Qc × Πc ⊂ Ph.
3.A virtual action principle for electrodynamics.
In this section a variational principle for electrodynamics similar to the virtual action principle of
analytical mechanics (see [3]) will be formulated. The construction is an example of similar construc-
tions needed for the variational formulation of a general field theory. The linearity of the theory and
the choice of formulating it on the affine Minkowski space makes our presentation simpler though
containing the core of the general framework (which can be found in [1]).
The action is the differentiable function
W :Q(Φ1e(M);CM) → R: q(A, c) 7→
L ◦ (A, dA) (77)
derived from the quadratic Lagrangian density
L:∧1eV
∗ × ∧2eV
∗ → ∧4oV
∗: (a, f) 7→ −
〈f,∧2eg
−1(f)〉
|g|. (78)
We are denoting by
|g| the odd 4-covector defined by the Minkowski metric (see [5]). Later we
will use the same symbol to denote the constant 4-covector field constructed from it.
The 1-form A is called the potential, the 2-form F = dA is called the electromagnetic field.
We note that the Lagrangian is a quadratic mapping which depends only on its second argument f
and thus λ = µ = 0 and ν does not depend on x in the formula (50).
A phase ph = (q(A, c), p(G, J, c)) satisfies the virtual action principle if the equality
〈dW (q), δq〉 − 〈〈〈〈〈p, δq〉〉〉〉〉
= 0 (79)
holds for each virtual displacement δq = q(δA, c) ∈ Qc. For each current c the dynamics associated
with the current c is the set Dc ⊂ Phc of phases which satisfy the virtual action principle. The
dynamics is the subset
Dc (80)
of the phase space Ph.
A phase space trajectory is a triple of local differential forms
(A,G, J):U → ∧1eV
∗ × ∧2oV
∗ × ∧3oV
. (81)
The dynamics of a system can also be represented as a set DDDD of phase space trajectories (A,G, J) such
that for each current c with support included in U the phase ph = (q(A, c), p(G, J, c)) is in Dc.
The equation (79) is too abstract to be used directly. A more concrete expression is given in the
following proposition.
Proposition 3. A phase ph = (q(A, c), p(G, J, c)) satisfies the virtual action principle if and only if
the equality
−1 ◦ dA
∧ δA− d
−1 ◦ dA
J ∧ δA−
d (G ∧ δA)
is satisfied for each virtual displacement q(δA, c).
Proof: By applying the formula (70) to the action W we obtain that its variation is
〈dW (q), δq〉 =
(−d(We2 ◦ ν ◦ dA) ∧ δA + d ((We2 ◦ ν ◦ dA) ∧ δA)) , (83)
since λ = µ = 0. Moreover
νx(f, f) = −
〈f,∧2eg
−1(f)〉
|g|, (84)
from which it follows after some calculation that
ν(x, f) = −
∧2e g
−1(f) ⊗
|g|, (85)
for each f ∈ ∧2eV
∗. Hence,
We2 ◦ ν ◦ (x, dA) = −
−1 ◦ dA
and the variation of the action reduces to
〈dW (q), δq〉 =
−1 ◦ dA
∧ δA− d
−1 ◦ dA
. (87)
Recalling that, on the other hand
〈〈〈〈〈p, δq〉〉〉〉〉
J ∧ δA−
d (G ∧ δA)
, (88)
the claim follows.
A phase space trajectory belongs to the dynamics DDDD, if and only if it satisfies the virtual action
principle for each current c with support included in its domain of definition. There is a characterization
of the dynamics of phase space trajectories in terms of differential equations. This is shown in the
following propositions.
Theorem 4. A phase space trajectory (A,G, J) belongs to the dynamics DDDD if and only if it satisfies
the Euler-Lagrange equation
−1 ◦ dA
J (89)
and the constitutive relation
−1 ◦ dA
|g|. (90)
Proof: If a phase space trajectory (A,G, J) satisfies the Euler-Lagrange equation and the constitutive
relation, then by substituting the expressions (89) and (90) of J and G in terms of the electromagnetic
field F in the virtual action principle (82) it follows that (A,G, J) belongs to the dynamics DDDD.
The inverse implication will be proved in the next section.
The constitutive relation (90) produced by our variational principle corresponds to the momentum-
velocity relation of analytical mechanics.
Proposition 5. A phase space trajectory (A,G, J) satisfies the Euler-Lagrange equation and the
constitutive relation if and only if it satisfies the Maxwell’s equations
J (91)
and the constitutive relation
−1 ◦ F
|g|, (92)
with F = dA.
Proof: The constitutive relation (91) is satisfied if and only if the equation (90) is satisfied for
F = dA.
If a phase space trajectory (A,G, J) satisfies the Maxwell’s equations and the constitutive relation,
then by substituting the expression (92) of the electromagnetic induction in the equation (91) we see
that the Euler-Lagrange equation is satisfied, since F = dA.
Conversely if the Euler-Lagrange equation and the constitutive relation are satisfied, then, again by
substitution, we obtain that the Maxwell’s equations (91) are satisfied.
We remark that since the virtual action principle (77) is more complete than the Hamilton principle,
our formulation leads to the Maxwell’s equations (91) with external sources. The proposed variational
principle also permits the derivation of the constitutive relations (92) which are usually postulated
separately since the variations normally considered are not general enough to derive them from the
variational principle.
4.The Dynamics in a compact domain.
Let the current c consist in integrating an odd 4-form on a compact domain K ⊂ M with smooth
boundary ∂K. Field configurations, tangent vectors and covectors are equivalence classes of equivalence
relations based on the expressions ∫
κ ◦ (x,A, dA) (93)
and ∫
J ∧ δA−
d (G ∧ δA)
J ∧ δA−
G ∧ δA. (94)
It follows that a field q = q(A,K) is represented by the restriction
A|K:K → ∧1eV
∗ (95)
of the potential A to the the domain K. A tangent vector δq = q(δA,K) is represented by the
restriction
(δA)|K:K → ∧1eV
∗ (96)
of the variation δA to the domain K. A covector p = p(G, J,K) is represented by the pair of the
restriction
G|∂K: ∂K → ∧2oV
∗ (97)
of the electromagnetic induction G to the boundary ∂K of the domain K and the restriction
K → ∧3oV
∗ (98)
of the current J to the interior
K of the domain K.
A phase is a pair (q, p) of a field q = q(A,K) and a covector p = p(G, J,K).
The action is
W :QK → R: q(A,K) 7→
L ◦ (A, dA). (99)
The virtual action principle is the equality
〈dW (q), δq〉 − 〈〈〈〈〈p, δq〉〉〉〉〉
= 0 (100)
and the dynamics in the domain K is the set DK ⊂ Ph of phases satisfying the virtual action principle.
Proposition 6. A phase ph = (q(A,K), p(G, J,K)), defined in a compact domain K, belongs to the
dynamics DK if and only if the Euler-Lagrange equation
−1 ◦ dA
K (101)
and the constitutive relation
G|∂K =
−1 ◦ dA
|∂K (102)
are satisfied.
Proof: If q = q(A,K), p = p(G, J,K) and δq = q(δA,K), then
〈dW (q), δq〉 =
−1 ◦ dA
−1 ◦ dA
−1 ◦ dA
−1 ◦ dA
∧ δA. (103)
On the other hand
〈〈〈〈〈p, δq〉〉〉〉〉
J ∧ δA−
G ∧ δA. (104)
Thus the virtual action principle assumes the form
−1 ◦ dA
∧ δA−
−1 ◦ dA
J ∧ δA−
G ∧ δA. (105)
By using variations with (δA)|∂K = 0 we derive the Euler-Lagrange equation
−1 ◦ dA
K. (106)
Assuming that this equation is satisfied and using arbitrary variations, the constitutive relation
G|∂K =
−1 ◦ dA
|∂K (107)
follows.
The following Proposition completes the proof of the Theorem 4.
Proposition 7. If a phase space trajectory
(A,G, J):U → ∧1eV
∗ × ∧2oV
∗ × ∧3oV
. (108)
belongs to the dynamics DDDD, then the Euler-Lagrange equation
−1 ◦ dA
J (109)
and the constitutive relation
−1 ◦ dA
|g| (110)
are satisfied in U .
Proof: If (A,G, J) is a phase space trajectory, defined in the open set U ⊂ M , which belongs to the
dynamics DDDD, then the equation
−1 ◦ dA
K (111)
and the boundary relation
G|∂K =
−1 ◦ dA
|∂K (112)
are satisfied for each compact domain K ⊂ U . It follows that equations (109) and (110) are satisfied
in every x ∈ U .
5.The Lagrangian formulation of electrodynamics.
The Lagrangian formulation of dynamics is the infinitesimal limit of the formulation in a compact
domain with the domain shrinking to a point. A formal method which greatly simplifies the passage
to the infinitesimal limit is to replace the compact domain — which is used exclusively as domain
of integration — with the current c = δ(x)w, where δ(x) is the Dirac delta function in x ∈ M and
w ∈ ∧4oV is an odd 4-vector, with w 6= 0. The construction of infinitesimal fields, tangent vectors and
covectors is based on the expressions
δ(x)w
κ ◦ (x,A, dA) = 〈κ (x,A(x), dA(x)) , w〉 (113)
δ(x)w
J ∧ δA−
d (G ∧ δA)
J(x) ∧ δA(x) −
d(G ∧ δA)(x), w
. (114)
Since w 6= 0 and ∧4oV is one-dimensional, it follows from the first expression that a field q = q(A, c) is
represented by the pair
(A(x), F (x)) ∈ ∧1eV
∗ × ∧2eV
∗ (115)
of an even 1-covector A(x) and an even 2-covector F (x).
The second expression reduces to
dG(x) −
∧ δA(x) + G(x) ∧ δF (x), w
, (116)
since dδA(x) = δF (x).
It follows that a tangent vector δq = q(δA, c) is represented by the pair
(δA(x), δF (x)) ∈ ∧1eV
∗ × ∧2eV
∗ (117)
and a covector p = p(G, J, c) is represented by the pair
G(x), dG(x) −
∈ ∧2oV
∗ × ∧3oV
∗. (118)
The pairing
〈〈〈〈〈p, δq〉〉〉〉〉
J ∧ δA−
d (G ∧ δA)
(119)
assumes the form
〈〈〈〈〈p, δq〉〉〉〉〉
dG(x) −
∧ δA(x) + G(x) ∧ δF (x), w
. (120)
We have constructed the space of infinitesimal fields Qδ = ∧
∗×∧2eV
∗ and the space of infinitesimal
covectors Πδ = ∧
∗ × ∧3oV
∗. Hence, the infinitesimal phase space is
Phδ = Qδ × Πδ = ∧
∗ × ∧2eV
∗ × ∧2oV
∗ × ∧3oV
∗. (121)
The infinitesimal action is
W (q(A, δ(x)w)) = 〈L(A(x), F (x)), w〉 . (122)
The infinitesimal dynamics is the set
(q, δq) ∈ Phδ; ∀δq∈Qδ 〈dW (q), δq〉 = 〈〈〈〈〈p, δq〉〉〉〉〉
, (123)
It is easy to verify that the infinitesimal dynamics Dδ admits also the following more explicit expression
(a, f, g, h) ∈ Phδ; ∀
(δa,δf)∈∧1eV
∗×∧2eV ∗
DL(a, f, δa, δf) = −
(h ∧ δa + g ∧ δf)
. (124)
The infinitesimal dynamics Dδ is characterized by the following Proposition.
Proposition 8. An infinitesimal phase ph = (q(A, δ(x)w), p(G, J, δ(x)w)), with w 6= 0, belongs to
the infinitesimal dynamics Dδ if and only if the equations
G(x) =
−1(F (x))
|g| (125)
dG(x) =
J(x) (126)
are satisfied.
Proof: If q = q(A, δ(x)w)), p = p(G, J, δ(x)w), and δq = q(δA, δ(x)w), with w 6= 0, then the variation
of the action (87), which can also be expressed in the form
〈dW (q), δq〉 = −
−1 ◦ F
∧ δF, (127)
reduces to
〈dW (q), δq〉 = −
−1(F (x))
∧ δF (x), w
. (128)
Thus the virtual action principle
〈dW (q), δq〉 = 〈〈〈〈〈p, δq〉〉〉〉〉
(129)
has the explicit form
dG(x) −
∧ δA(x) +
G(x) −
−1(F (x))
∧ δF (x), w
= 0. (130)
Therefore if ph satisfies the equations (125) and (126), then (130) is satisfied and ph ∈ Dδ.
Conversely, if ph ∈ Dδ, then it satisfies the virtual action principle, which implies
dG(x) −
∧ δA(x) +
G(x) −
−1(F (x))
∧ δF (x) = 0, (131)
since w 6= 0 and ∧4oV is one-dimensional. The equations (125) and (126) follow, since δA(x) and δF (x)
are independent.
In a preceding section we showed that the dynamics of phase space trajectories can also be charac-
terized by the Maxwell’s equations and the constitutive relation. This fact can now be proved directly.
Indeed, if a phase space trajectory (A,G, J) belongs to the dynamics DDDD and x is a point in the
domain of definition of the trajectory, then the virtual action principle is satisfied for every infinitesimal
current δ(x)w. Thus the Maxwell’s equations and the constitutive relation are satisfied in x, thanks
to the previous proposition. It follows that they are satisfied in the whole domain of definition of the
trajectory.
To prove the inverse implication, we observe that the virtual action principle (82) can also be
expressed in the form
−1 ◦ F
|g| ∧ δF =
J ∧ δA−
d (G ∧ δA)
(132)
−1 ◦ F
∧ δF +
= 0, (133)
for each virtual displacement q(δA, c). Thus if a phase space trajectory (A,G, J) satisfies the Maxwell’s
equations, then it satisfies the virtual action principle for every current c with support contained in
the domain of definition of the trajectory and hence it belongs to DDDD.
6.The Hamiltonian formulation of electrodynamics.
We associate with the Lagrangian density
L:∧1eV
∗ × ∧2eV
∗ → ∧4oV
∗: (a, f) 7→ −
f,∧2eg
−1(f)
|g|, (134)
the energy density
E:∧1eV
∗ × ∧2oV
∗ × ∧2eV
∗ → ∧4oV
∗ (135)
defined by
E(a, g, f) = −
g ∧ f − L(a, f)
g ∧ f +
f,∧2eg
−1(f)
g ∧ f +
−1(f)
2g − ∧2eg
−1(f)
∧ f (136)
and treat this mapping as a family
E(a, g, ·):∧2eV
∗ → ∧4oV
∗ (137)
of mappings on the fibres of the projection
prP :∧
∗ × ∧2oV
∗ × ∧2eV
∗ → ∧1eV
∗ × ∧2oV
∗ (138)
onto the field-momentum space P = ∧1eV
∗ × ∧2oV
∗. The set
Cr(E, prP ) =
(a, g, λ) ∈ ∧1eV
∗ × ∧2oV
∗ × ∧2eV
δλ∈∧2
DE(a, g, λ, 0, 0, δλ) = 0
(139)
is the critical set of the family. The equality
DE(a, g, λ, δa, δg, δλ) = −
(δg ∧ λ + g ∧ δλ) − DL(a, λ, δa, δλ)
(δg ∧ λ + g ∧ δλ) +
−1(λ)
δg ∧ λ +
−1(λ)
(140)
implies
DE(a, g, λ, 0, 0, δλ) = −
−1(λ)
∧ δλ. (141)
Thus we obtain the expression
Cr(E, prP ) =
(a, g, λ) ∈ ∧1eV
∗ × ∧2oV
∗ × ∧2eV
∗; g = ∧2eg
−1(λ)
. (142)
The critical set is the graph of the Legendre mapping
Λ:∧1eV
∗ × ∧2eV
∗ → ∧2oV
∗: (a, λ) 7→ ∧2eg
−1(λ)
|g|. (143)
For each a ∈ ∧1eV
∗, the mapping Λ(a, ·) is invertible. Its inverse is the mapping
Λ(a, ·)−1:∧2oV
∗ → ∧2eV
∗: g 7→ ∧2eg
|g−1| g
, (144)
where
|g−1| denotes the odd 4-vector characterized by 〈
|g−1|〉 = 1. It follows that the critical
set is the image of the section
σ:∧1eV
∗ × ∧2oV
∗ → ∧1eV
∗ × ∧2oV
∗ × ∧2eV
∗: (a, g) 7→
a, g,∧2eg
|g−1| g
(145)
of the projection prP . The Hamiltonian density is the mapping
H = E ◦ σ:∧1eV
∗ × ∧2oV
∗ → ∧4oV
∗, (146)
defined by the formula
H(a, g) = −
|g−1| g
, (147)
for each (a, g) ∈ ∧1eV
∗×∧2oV
∗. The passage from the Lagrangian density L to the Hamiltonian density
H is the Legendre transformation of electrodynamics.
We show that the energy density can be used to generate the infinitesimal dynamics Dδ. We consider
the set
(a, f, g, r) ∈ Phδ; ∃
V ∗ ∀(δa,δg,δλ)∈∧1
V ∗×∧2
V ∗×∧2
DE(a, g, λ, δa, δg, δλ) =
(r ∧ δa− f ∧ δg)
. (148)
This set is obtained by projecting the set
D̃E =
(a, f, g, r, λ) ∈ Phδ × ∧
(δa,δg,δλ)∈∧1eV
∗×∧2oV ∗×∧2eV ∗
DE(a, g, λ, δa, δg, δλ) =
(r ∧ δa− f ∧ δg)
(149)
onto the phase space Phδ = ∧
∗ × ∧2eV
∗ × ∧2oV
∗ × ∧3oV
It follows from (140) that λ = f and that the set D̃E reduces to
D̃E =
(a, f, g, r, λ) ∈ Phδ × ∧
∗; λ = f,
(δa,δg,δf)∈∧1eV
∗×∧2oV ∗×∧2eV ∗
DL(a, f, δa, δf) =
(r ∧ δa + g ∧ δf)
. (150)
It projects onto the infinitesimal dynamics
(a, f, g, r) ∈ Phδ; ∀
(δa,δf)∈∧1
V ∗×∧2
DL(a, f, δa, δf) = −
(r ∧ δa + g ∧ δf)
. (151)
Hence, DE = Dδ.
It is clear from the definition of the set D̃E that this set is included in the set
(a, f, g, r, λ) ∈ Phδ × ∧
∗; (a, g, λ) ∈ Cr(E, prP )
. (152)
The use of the mapping σ results in
D̃E =
(a, f, g, r, λ) ∈ Phδ × ∧
∗; λ = f,
(δa,δg)∈∧1
V ∗×∧2
DH(a, g, δa, δg) =
(r ∧ δa− f ∧ δg)
. (153)
The Hamiltonian description of the dynamics
(a, f, g, r) ∈ Phδ; ∀
(δa,δg)∈∧1
V ∗×∧2
DH(a, g, δa, δg) =
(r ∧ δa− f ∧ δg)
(a, f, g, r) ∈ Phδ; ∧
|g−1| g
= f , r = 0
. (154)
is obtained by projecting onto the phase space Phδ.
Proposition 9. An infinitesimal phase ph = (q(A, δ(x)w), p(G, J, δ(x)w)), with w 6= 0, belongs to
the infinitesimal dynamics Dδ if and only if the equations
F (x) = ∧2eg ◦
|g−1| G(x)
(155)
dG(x) =
J(x) (156)
are satisfied.
Proof: We recall that an infinitesimal phase ph = (q(A, δ(x)w), p(G, J, δ(x)w)), with w 6= 0, is
represented by (
A(x), F (x), dG(x) −
J(x), G(x)
(157)
and use the Hamiltonian description (154) of infinitesimal dynamics Dδ. The claim easily follows.
The resulting equations for the phase space trajectories (A,G, J)
F = ∧2eg ◦
|g−1| G
(158)
J (159)
are called Hamilton’s equations. The equations (159) are again Maxwell’s equations and the equation
(158) is the inverse of constitutive relation (92).
7.References
[1] W. M. Tulczyjew, The origin of variational principles, in the volume Classical and quantum inte-
grability (Warsaw, 2001), 41–75, Banach Center Publ., 59, Polish Acad. Sci., Warsaw, 2003.
[2] G. Marmo, W. M. Tulczyjew, P. Urbański, Dynamics of autonomous systems with external forces,
Acta Physica Polonica B, 33 (2002), 1181–1240.
[3] A. De Nicola, W. M. Tulczyjew, A variational formulation of analytical mechanics in an affine
space, Rep. Math. Phys., 58 (2006), 335–350.
[4] W. M. Tulczyjew, A symplectic framework of linear field theories, Annali di Matematica pura ed
applicata, 130 (1982), 177–195.
[5] G. Marmo, E. Parasecoli, W. M. Tulczyjew, Space-time orientations and Maxwell’s equations, Rep.
Math. Phys. 56 (2005), 209–248.
[6] G. Marmo, W. M. Tulczyjew, Time reflection and the dynamics of particles and antiparticles, Rep.
Math. Phys. 58 (2006), 147–164.
[7] A. De Nicola, W. M. Tulczyjew, A Note on a Variational Formulation of Electrodynamics, Proc. XV
Int. Workshop on Geom. and Phys., Tenerife (Spain), 2006, Publ. de la RSME 11 (2007), 316–323.
[8] J. A. Schouten, Ricci calculus, Springer, Berlin, 1955.
[9] J. A. Schouten, Tensor Analysis for Physicists, Oxford University Press, London, 1951.
[10] G. de Rham, Variétés Differentiables, Hermann, Paris, 1955.
[11] T. Frankel, The Geometry of Physics: An Introduction, Cambridge University Press, Cambridge,
1997.
[12] F. W. Hehl and Yu. N. Obukhov, Foundations of Classical Electrodynamics, Charge, Flux, and
Metric, Birkhäuser, Boston, MA, 2003.
[13] I. V. Lindell, Differential Forms in Electromagnetics, IEEE Press, Piscataway, NJ, and Wiley-
Interscience, 2004.
[14] D. Freed, Classical field theory and Supersimmetry, IAS Park City Mathematics Series Vol. 11,
IAS, Princeton, 2001.
ABSTRACT
  We present a variational formulation of electrodynamics using de Rham even
and odd differential forms. Our formulation relies on a variational principle
more complete than the Hamilton principle and thus leads to field equations
with external sources and permits the derivation of the constitutive relations.
We interpret a domain in space-time as an odd de Rham 4-current. This permits a
treatment of different types of boundary problems in an unified way. In
particular we obtain a smooth transition to the infinitesimal version by using
a current with a one point support.

<|endoftext|><|startoftext|>
RSC Communication Template (Version 2.1)
Protein and ionic surfactants - promoters and inhibitors of contact line 
pinning 
Viatcheslav V. Berejnova
We report a new effect of surfactants in pinning a drop contact 
line, specifically that lysozyme promotes while lauryl sulfate 
inhibits pinning. We explain the pinning disparity assuming 
difference in wetting: the protein-laden drop wets a "clean" 
surface and the surfactant-laden drop wets an auto-precursored 
surface.  
 To date, the effect of surfactants on a drop's wetting and 
spreading has been well established 1-12. It was observed that low 
molecular weight surfactants extend spreading and decrease the 
contact line stability 13, 14. However, the effect of surfactants and 
proteins on pinning a drop contact line has not yet been thoroughly 
appreciated despite being a primary issue for several important 
methods in life science. Optimization of the pinning conditions 
could benefit the crystallization of globular and membrane proteins 
15, 16, formulation of the pesticide sprays for protecting the plants 1, 
17, and the influence of the pulmonary surfactants on physiological 
wettability of alveoli in the lungs 18.  
 In this letter, we first study the quasi-static pinning of the protein 
and surfactant drops pinned by siliconized glass slides. We 
demonstrate that lysozyme (Lys) increases the hysteresis effect and 
stabilizes the drop contact line, enhancing the size of the 
completely pinned drops by a factor of 4-5 (compared to water). 
Conversely, the sodium dodecyl sulfate (SDS) reduces the apparent 
contact angles, decreasing the size of the completely pinned drops 
by a similar factor. We found the protein pinning to be similar to 
pinning induced by the geometrical and chemical corrugations of 
the contact line 15.  
 In our experiments, DI water was purified by NANOpure II 
(Barnstead, Boston, MA). Lys protein, 6-x times crystallized hen 
egg white, was purchased from Seikagaku America (Mr~14 kD, 
lot: LF1121, Falmouth, MA). Lys was dissolved in a 50 mM 
sodium acetate buffer with pH = 5. pH of the SDS (0.29 kD, 99%, 
Sigma) solutions was 9. All solutions were filtered.  
 Flat, siliconized 22 mm glass slides HR3-231 were purchased 
from Hampton Research (Laguna Niguel, CA). On a new 
siliconized slide, a water drop with a volume of ~20 μL formed a 
reproducible contact angle of ~(92±1)o. The siliconized material of 
the HR3-231 slides is similar to the organosilane-composed 
solution AquaSil (Hampton Research). We inspected the surface 
topography for some of the HR3-231 slides using a contact mode 
AFM (DI MultiMode III, Santa Barbara, CA) with NSC 1215 tips 
from MikroMasch. We found the manufacturer's coating to be 
homogeneous. We treated some hydrophobic slides chemically 15 
in order to obtain the highly hydrophilic (contact angle <2°) 
circular area. Both the edge and hydrophilic-hydrophobic gradient 
provide a high threshold of pinning 15 that was taken as a reference 
in our experiments.  
Fig.1 Dimensionless diagram of the stability of inclined drops.  The 
symbols: ■, □, and ∆ correspond to Lys (5 mg/mL), buffer (Bf), and SDS 
(33.3 mg/mL) solutions, respectively. Insert (a) is a geometrical sketch of 
an inclined drop.  Insert (b) subtracts Vc from the raw-data diagrams 
(sin(α), V). The images 1, 2, and 3 represent three pinning zones: 1 - an 
absolutely stable drop; 2 - a drop is stable up to certain inclination < 90o, 
depending on V; and 3 - a drop is unstable and moves continuously.  
 Each drop was dispensed manually onto initially horizontal glass 
slides, which were then slowly rotated in 2-4° steps (Fig. 1a). For 
the Lys drops the relaxation time between rotations was ~1 min to 
allow the transient disturbances of the drop to dissipate. For the 
SDS drops the initial spreading was similar to that described earlier 
7-10, 12 and the relaxation time interval was different depending on 
concentration. We did not observe the autophobic 14 and 
Marangoni-induced 13 contact line displacements. The apparent 
advancing θa and receding θr contact angles were measured for 
some drops at quasi-static equilibrium using a horizontal 
goniometer 19. Dispensed drop volumes were accurate to 0.1-0.5%, 
and tilt and contact angle measurements were accurate to 1-2°. The 
surface tension γ for the Lys solutions was measured using the 
pendant drop counting method 15. 
 We characterize pinning by measuring the critical tilt α 
corresponding to continuous motion of an inclined drop of volume 
V 19. We observed that the scaling law sin(α)~V-2/3 for large α may 
fit the (V, α) data for the different surfactant concentration C over a 
broad range of the supercriticality ratio V/Vc (Fig.1). Vc is a result 
of fitting, depends on C, and denotes the critical volume 
corresponding to a vertically (sin(α)=1) pinned drop (Fig.1b). The 
function (C, Vc) characterizes the surfactant pinning (Fig.2).   
a IESVic, University of Victoria, Victoria, BC V8W 3P6, Canada  
E-mail: berejnov@gmail.com 
 Two different regimes of pinning can be seen in Fig.2 depending 
on the chemical nature of the used surfactants. For very low 
concentrations close to point A, pinning of the Lys and SDS-laden 
drops behaves similarly. In the moderate and high concentration 
regions, adding Lys (triangles) and SDS (squares) increases and 
decreases the critical volumes of the pinned drops, respectively. 
The white and black circles represent the Lys drops initially 
dispensed on the circularly treated hydrophilic patterns with 
different radii 
15 having edge profiles presented in Fig.2c. These 
patterns do not affect pinning of the SDS-laden drop. Both the 
similarities between the natural and pattern-induced pinning for the 
Lys drops and the differences in pinning between Lys and SDS are 
very remarkable.  
 Summarized below are our results corresponding to the different 
pinning regimes presented in Fig.2. First, pinning is associated with 
the appearance of a trace of liquid film behind the drop 20-22 that can 
be treated as a sign of high adhesion. Verifying this, we mark 
intervals on the concentration axis in Fig.3a and Fig.3b where the 
trace behind the drop body appears. The buffer and pure water do 
not exhibit any traces on the hydrophobic slide (Fig.4a), but the 
treated slides show unstable traces covering the treated area 
(Fig.4c). The Lys and SDS-laden drops show similar traces (Fig.4b 
and Fig.4d). However, high pinning was observed only for the Lys 
drops (Fig3a and Fig3b). The SDS decreases pinning in the trace-
region. Thus, the presence of the trace does not correlate with the 
strength of pinning for our choice of surfactants. 
 Second, an equilibrium of an inclined drop yields the formula 
ρVcgsin(α)~DγΔ(cosθ)r,a 
19, 20, 23, where ρ, g, and γ are the density 
of liquid, gravity, and the surface tension, respectively, and the 
contact angle difference Δ(cosθ)r,a equals cosθr-cosθa. Thus, the 
critical volume of an inclined drop (or pinning) is proportional to 
the wetted diameter D.  Roughly, we can treat D as the diameter of 
an initially dispensed drop onto a horizontal substrate. This 
approximation holds in small drops when deformation of the drop 
surface describing by the Bond number Bo=ρgD2/γ (d1 for our 
case) is not too large. For the concentration range presented in 
Fig.3a we did not find a noticeable variation in D for the naturally 
dispensed Lys drops for either inclination or concentration. For the 
treated slides presented in Fig.2, increasing D qualitatively agrees 
with the above formula for low deviations of D in the Lys drops. 
The patterns with larger radii (open circle) provide higher drop 
stability (larger pinning). The diameters of the SDS drops on 
hydrophobic slides grow monotonically with concentration 7. 
However, unlike the Lys, the SDS decreases the drop pinning with 
respect to the concentration. Therefore, the D variation does not 
affect pinning for the SDS case, but it does for the Lys.  
Fig.3 Combined diagrams of critical volumes, contact angles, their 
difference, and surface tension versus the component concentrations. 
Black circles are γ, (a): Lys and (b): SDS 24. ∆, ▲, □, and ■ denote Vc 
and Δ(cosθ)r,a for the Lys and SDS drops, respectively. Dotted triangles 
(Lys) and squares (SDS) depict the contact angles: white - maximal θa 
and black - minimal θr.  The zones of a liquid film trace appearing behind 
the quasi-statically displaced drop are marked on (a) and (b). Lines are 
not fits. 
Fig.2 Diagram of critical volumes versus concentration for vertically 
pinned drops. The inserts (a) and (b) represent drops on the untreated ∆ - 
Lys and □ - SDS, and treated ○ - Lys 8.0-diam-mm and ● - Lys 7.0-diam-
mm glass slides, respectively. The point A depicts Vc for water and 
buffer. (c) is an AFM image of the etched-2 versus unetched-1 areas for 
the treated slide (bar is 1μ). Dashed lines are not fits. 
 Third, we found that high pinning corresponds to high θa and 
Δ(cosθ)r,a, (Fig.3a and Fig.3c). For high concentration regions these 
parameters are related because the liquid-trace minimizes θr (Fig.3c 
dotted black triangles and squares). Keeping θa large is the only 
way to keep the difference Δ(cosθ)r,a also large. Adding Lys to the 
drops does not affect the initially high values of θa, and, 
consequently, Δ(cosθ)r,a (Fig.3a and Fig.3c). Conversely, adding 
the SDS decreases θa, controlling the overall low value of Δ(cosθ)r,a 
adequately, and provides the low pinning effect. 
 Forth, the overall effect of concentration is noticeably different 
between Lys and SDS. Fig.3a demonstrates that pinning is tied to 
the Lys amount in the drop. Starting at 10-4 mM, the Lys-pinning 
increases as Δ(cosθ)r,a increases, reaches its maximum 
corresponding to the minimum of θr (appearance of the liquid-
2    
trace), and finally saturates. Next, the high concentration region 
pinning decreases as γ(C) decreases (dγ/dC<0). The SDS-pinning, 
however, is much less sensitive to concentration. For the SDS 
drops, V
c is close to the water drop, is stable up to 10mM, and 
decreases as θr decreases. Additional probing of the Lys and SDS 
solutions with the plain glass slides (HR3-227, Hampton Research) 
does not show a qualitatively different behavior of pinning. For 
plain glass, the Vc(C) curves generally follow the data presented in 
Fig.2, corresponding to the hydrophobic untreated slides for both 
the Lys and SDS surfactants.  
 We see that the appearance of a trace indicating the high 
adhesion between solution and glass surface does not provide high 
pinning unambiguously: Lys increases and SDS decreases pinning. 
Neither does extending the drop's wetted diameter correlate with a 
pinning increase: for Lys, buffer, and water, a variation in D due to 
patterning increases pinning while it does not for SDS. Pinning and 
contact angle difference are correlated as well: an increase in 
Δ(cosθ)r,a corresponds to a pinning increase and decrease for the 
Lys and SDS solutions, respectively. 
 We attribute such a dramatic difference in pinning to the 
different gas-solid surfaces near the contact line. We hypothesize 
that the Lys contact line advances the "clean" 10 solid surface, 
whereas the SDS contact line advances the pre-cursored zone 
existing in close proximity to the contact line. We expect that 
mobility of the SDS molecules through a liquid interface may 
affect the advancing zone. Indeed, such mobility of the small 
amphiphilic molecules is well documented 2-4, 7-10, 12. Though the 
mobility of proteins is not available, we argue it to be negligible 
compared to SDS. The globular proteins are weak surfactants with 
large, less amphiphilic, and therefore much less interfacially mobile 
molecules 25-27 and, in addition, they irreversibly adhere to the 
glass. We assume that the proteins adhere to the solid from the drop 
interior and provide high heterogeneity for the three-phase contact 
that results in the high pinning threshold similar to the corrugated 
case 15 (Fig.2, C>10-2mM). In contrast, SDS penetrates the drop-
solid-air interface, increases affinity between the solid and the drop, 
and thus decreases the pinning threshold.  
Fig.4 Different regimes of displacement; (a): 40 μL drop of 50mM 
acetate buffer at pH=5 (water acts similarly); (b): 40 μL drop of 5mg/mL 
Lys; (c): 40 μl drop of water (bar is 5.5 mm); (d): side view of the 20 μL 
SDS 100 mg/mL drop (bar is 3.4 mm). Numbers are inclination in 
degrees; the slides (a), (b), and (d) are untreated, (c) is treated with D=5.5 
 The author thanks N. S. Husseini for proofreading the 
manuscript.  
References 
(1) Furmidge, C. G., J. Colloid Sci., 1962, 17, 309-&.     
(2) Eriksson, J.; Tiberg, F.; Zhmud, B., Langmuir, 2001, 17, 7274-    
7279.     
(3) Kumar, N.; Varanasi, K.; Tilton, R. D.; Garoff, S., Langmuir, 
2003, 19, 5366-5373.     
(4) Luokkala, B. B.; Garoff, S.; Tilton, R. D.; Suter, R. M., Langmuir, 
2001, 17, 5917-5923.     
(5) Mohammadi, R.; Wassink, J.; Amirfazli, A., Langmuir, 2004, 20, 
9657-9662.     
(6) Qu, D.; Suter, R.; Garoff, S., Langmuir, 2002, 18, 1649-1654.     
(7) Starov, V. M.; Kosvintsev, S. R.; Velarde, M. G., J. Colloid 
Interface Sci., 2000, 227, 185-190.     
(8) Stoebe, T.; Lin, Z. X.; Hill, R. M.; Ward, M. D.; Davis, H. T., 
Langmuir, 1996, 12, 337-344.     
(9) Stoebe, T.; Lin, Z. X.; Hill, R. M.; Ward, M. D.; Davis, H. T., 
Langmuir, 1997, 13, 7282-7286.     
(10) Stoebe, T.; Hill, R. M.; Ward, M. D.; Davis, H. T., Langmuir, 
1997, 13, 7276-7281.     
(11) Summ, B. D.; Soboleva, O. A.; Dolzhikova, V. D., Colloid J., 
1998, 60, 598-605.     
(12) von Bahr, M.; Tiberg, F.; Zhmud, B. V., Langmuir, 1999, 15, 
7069-7075.     
(13) Cachile, M.; Cazabat, A. M., Langmuir, 1999, 15, 1515-1521.     
(14) Frank, B.; Garoff, S., Langmuir, 1995, 11, 87-93.     
(15) Berejnov, V.; Thorne, R. E., Acta Cryst. B, 2005, 61, 1563-1567.     
(16) McPherson, A., Crystallization of Biological Macromolecules. 
Cold Spring Harbor Laboratory Press: New York, 1999. 
(17) Watanabe, T.; Yamaguchi, I., Pesticide Sci., 1992, 34, 273-279.     
(18) Hills, B. A., J. Appl. Phys., 1983, 54, 420-426.     
(19) Berejnov, V.; Thorne, R. E., Phys. Rev. E, 2007, submitted.    
http://lanl.arxiv.org/abs/physics/0609208 
(20) Frenkel, Y. I., ZETF, 1948, 18, 659-667.    
http://lanl.arxiv.org/abs/physics/0503051 
(21) Roura, P.; Fort, J., Phys. Rev. E, 2001, 64, 011601.     
(22) Tuck, E. O.; Schwartz, L. W., J. Fluid Mech., 1991, 223, 313-324.     
(23) Macdougall, G.; Ockrent, C., Proc. Roy. Soc. London Ser. A, 
1942, 180, 0151-0173.     
(24) Persson, C. M.; Jonsson, A. P.; Bergstrom, M.; Eriksson, J. C., J. 
Colloid Interface Sci., 2003, 267, 151-154.     
(25) Mobius, D. M., R., Proteins at Liquid Interfaces. Elsevier: 
Amsterdam, 1998. 
(26) Vogler, E. A., Langmuir, 1992, 8, 2013-2020.     
(27) Krishnan, A.; Liu, Y. H.; Cha, P.; Allara, D.; Vogler, E. A., J. 
Biomed. Mater. Res., Part A, 2005, 75A, 445-457.     
ABSTRACT
  We report a new effect of surfactants in pinning a drop contact line,
specifically that lysozyme promotes while lauryl sulfate inhibits pinning. We
explain the pinning disparity assuming difference in wetting: the protein-laden
drop wets a "clean" surface and the surfactant-laden drop wets an
auto-precursored surface.

<|endoftext|><|startoftext|>
Introduction
Algebraically special spacetimes play an essential role in the field of exact solutions of
Einstein’s equations and many known exact solutions in four dimensions are indeed
algebraically special [1]. Recently a generalization of the Petrov classification to higher
dimensions was developed in [2, 3] and it turned out that many higher-dimensional
solutions of Einstein’s equations are algebraically special as well (see e.g. [4]), in fact
‡ Now at: Departament de F́ısica Fonamental, Universitat de Barcelona, Diagonal 647, E-08028
Barcelona, Spain
http://arxiv.org/abs/0704.0435v2
Type D Einstein spacetimes in higher dimensions 2
so far there is only one known solution identified [5] as algebraically general - the static
charged black ring [6].
There is, however, one important difference between four dimensional and n > 4
dimensional cases - the Goldberg-Sachs theorem does not hold in higher dimensions.
Recall that for n = 4 the Goldberg-Sachs theorem implies that principal null
directions of an algebraically special vacuum spacetime are necessarily geodetic and
shearfree. It was stressed already in [7, 8] that the Goldberg-Sachs theorem cannot
be straightforwardly extended to higher dimensions. Namely in [7] it was pointed
out that principal null directions (or Weyl aligned null directions - WANDs [2]) of the
n = 5 Myers-Perry black holes [9] are shearing though the spacetime is of type D. In [8]
it was shown that in fact all vacuum, n > 4, type N and III expanding spacetimes are
shearing. In [10] it was also shown that for n > 4, n odd, all geodetic WANDs with
non-vanishing twist are again shearing.
In this paper we study various properties of algebraically special vacuum
spacetimes, such as geodeticity of multiple WANDs (not guaranteed in higher
dimensions - another “violation” of the Goldberg-Sachs theorem) and relationships
between optical matrices Sij and Aij and the Weyl tensor. Before approaching these
problems, we study in the first part of the paper (sections 3 and 4) constraints on
Weyl types of the spacetime following from various assumptions on the geometry.
In section 3 we show that in arbitrary dimension (i.e., hereafter, n ≥ 4) the only
Weyl types compatible with static spacetimes (and expanding stationary spacetimes
with appropriate reflection symmetry) are types G, Ii, D and O.
In section 4 we study direct or warped product spacetimes. It turns out that
warped spacetimes with a one-dimensional Lorentzian factor are again of types G, Ii,
D and O and that warped spacetimes with a two-dimensional Lorentzian factor are
necessarily of type D or O. This also implies that spherically symmetric spacetimes
are of type D or O.
It follows that type D spacetimes play an important role as the simplest non-trivial
case compatible with the above mentioned assumptions. Therefore, in the second part
of the paper (sections 5 and 6) we focus on studying properties of type D Einstein
spacetimes (i.e., vacuum with an arbitrary cosmological constant), dropping, however,
the assumptions that the spacetime is static, stationary or warped.
In section 5 we study type D spacetimes in arbitrary dimension and analyze
geodeticity of WANDs. It turns out that in a “generic” case in vacuum the multiple
WANDs are geodetic. Let us also point out that negative boost weight Weyl
components do not enter relevant equations and thus the same results also hold for
multiple WANDs in type II Einstein spacetimes. Surprisingly, it also turns out that
explicit examples of special vacuum type D spacetimes not belonging to our “generic”
class and admitting non-geodetic multiple WANDs can easily be constructed. Such
examples for arbitrary dimension n ≥ 7 are given in section 5.4. This shows that
there exist even more striking “violations” of the Goldberg-Sachs theorem in higher
dimensions than the examples with non-zero shear discussed above. In section 5 we
also study various properties of shearfree type D vacuum spacetimes.
Perhaps not surprisingly, the situation in five dimensions is considerably simpler
than for n > 5. In fact it turns out that for n = 5 the Weyl tensor of type D is fully
determined by a 3 × 3 real matrix Φij . At the same time, five dimensional gravity
is already an interesting arena where qualitatively new phenomena appear. We thus
devote section 6 to five dimensional vacuum type D spacetimes. We study relationships
between the Weyl tensor represented by Φij and optical matrices Sij and Aij . One
Type D Einstein spacetimes in higher dimensions 3
of the results is that for “generic” spacetimes with non-twisting WANDs (Aij = 0)
the antisymmetric part of Φij , Φ
ij , vanishes and the symmetric part Φ
ij is aligned
with Sij (in the sense that the matrices Φ
ij and Sij can be diagonalized together).
Similarly, in the “generic” case the condition ΦAij = 0 implies vanishing of Aij . Again,
there exist particular cases for which the “generic” proof does not hold, see section
6 for details. In this section a simple explicit example of a five-dimensional vacuum
type D spacetime, the Myers-Perry metric, is also presented and Sij , Aij , Φ
ij , and
ΦAij are explicitly given.
Finally in section 7 we concisely summarize main results and in the Appendix
we briefly study geometric optics of type D Kerr-NUT-AdS metrics in arbitrary
dimension.
2. Preliminaries
Let us first briefly summarize our notation, further details can be found in [8].
In an n-dimensional spacetime let us introduce a frame of n real vectors m(a)
(a, b . . . = 0, . . . , n − 1): two null vectors m(0) = m(1) = n, m(1) = m(0) = ℓ
and n−2 orthonormal spacelike vectors m(i) = m(i) (i, j . . . = 2, . . . , n−1) satisfying
ℓaℓa = n
ana = ℓ
am(i)a = n
am(i)a = 0, ℓ
ana = 1, m
(i)am(j)a = δij . (1)
The metric now reads
gab = 2ℓ(anb) + δijm
b . (2)
We will use the following decomposition of the covariant derivative of the vector ℓ and
the covariant derivative in the direction of ℓ
ℓa;b = Lcdm
b , D ≡ ℓ
a∇a. (3)
Note that ℓ is geodetic iff Li0 = 0 and for an affine parameterization also L10 = 0.
We will often use the symmetric and antisymmetric parts of Lij , Sij ≡ L(ij) (its
trace S ≡ Sii), Aij ≡ L[ij]. In case of geodetic ℓ, the trace of Sij represents
expansion θ ≡ 1
n−2S, the tracefree part of Sij is shear σij ≡ Sij − θδij and the
antisymmetric matrix Aij is twist.§ Optical scalars can be expressed in terms of ℓ
(when Li0 = 0 = L10)
σ2 ≡ σijσji = ℓ(a;b)ℓ(a;b) − 1n−2
, θ = 1
;a, ω
2 ≡ AijAij = ℓ[a;b]ℓa;b. (4)
The decomposition of the Weyl tensor in the frame (1) in full generality is given
by [8]
Cabcd = 4C0i0j n{am
d} + 8C010i n{aℓbncm
d} + 4C0ijk n{am
+ 4C0101 n{aℓbncℓd} + 4C01ij n{aℓbm
d} + 8C0i1j n{am
+ Cijklm
d} + 8C101i ℓ{anbℓcm
+ 4C1ijk ℓ{am
d} + 4C1i1j ℓ{am
where the operation { } is defined as w{axbyczd} ≡ 12 (w[axb]y[czd] + w[cxd]y[azb]).
§ For the sake of brevity, throughout the paper we shall refer to the corresponding quantities for non-
geodetic congruences as “expansion”, “shear”, and “twist” (in inverted commas), bearing in mind
that in that case expressions (4) do not hold.
Type D Einstein spacetimes in higher dimensions 4
In the second part of this paper we will focus on type D spacetimes, possessing
(in an adapted frame) only boost order zero components (see [8]) C0101, C01ij , C0i1j ,
Cijkl . For simplicity let us define the (n− 2)× (n− 2) real matrix
Φij ≡ C0i1j , (5)
with ΦSij , Φ
ij , and Φ ≡ Φii being the symmetric and antisymmetric parts of Φij and
its trace, respectively. Let us observe that for static spacetimes and for a large class of
warped geometries one has ΦAij = 0 (see section 4). Note also that the above mentioned
boost order zero components of the Weyl tensor are not completely independent. In
fact from the symmetries and the tracelessness of the Weyl tensor (cf. eqs. (7) and
(9) in [8]) it follows that
C01ij = 2C0[i|1|j] = 2Φ
ij , C0(i|1|j) = Φ
ij = − 12Cikjk , C0101 = −
Cijij = Φ. (6)
The type D Weyl tensor is thus completely determined by
m(m−1)
independent
components of ΦAij and
m2(m2−1)
independent components of Cijkl , where n = m−2.‖
3. Static and stationary spacetimes
3.1. Static spacetimes
Algebraically special spacetimes in higher dimensions are characterized by the
existence of preferred null directions - Weyl aligned null directions (WANDs). A
necessary and sufficient condition for a null vector ℓ being WAND in arbitrary
dimension is [3, 11]
ℓbℓcℓ[eCa]bc[dℓf ] = 0, (7)
where Cabcd is the Weyl tensor. Let us now assume that a spacetime of interest is
algebraically special and thus the equation (7) possesses a null solution ℓ = (ℓt, ℓA),
A = 1 . . . n − 1 (note that necessarily ℓt 6= 0 and at least one of the remaining
components is also non-zero).
For static spacetimes the metric does not depend on the direction of time and
consequently the form of the metric and of the Weyl tensor remains unchanged under
the transformation t̃ = −t. Therefore, in these new coordinates equation (7) has the
same form as in the original coordinates and admits a second solution ñ = (ℓt, ℓA).
In the original coordinates n = (−ℓt, ℓA). Thus for static spacetimes the existence of
a WAND ℓ implies the existence of a distinct WAND n which in fact has the same
order of alignment. The only Weyl types compatible with this property are types G,
Ii and D (or, trivially, O, i.e. conformally flat spacetimes). Therefore
Proposition 1 All static spacetimes in arbitrary dimension are of Weyl types G, Ii
or D, unless conformally flat.
In fact explicit examples of static spacetimes of these Weyl types are known -
charged static black ring (type G - [5]), vacuum static black ring (type Ii - [11]), the
Schwarzschild-Tangherlini black hole (type D - [8]) and the Einstein universe R×Sn−1
(type O - cf. the results summarized in section 4). Cf. also the static examples given
in [4].
‖ In the standard n = 4 (i.e., m = 2) case these are essentially the imaginary and real part of
Ψ2. More specifically, with the conventions of [1], one has Φ
Φδij with Φ = −2Re(Ψ2),
= Φ23 = −Im(Ψ2) as the only essential component of Φ
, while the Cijks reduce to the only
non-trivial component C2323 = −Φ.
Type D Einstein spacetimes in higher dimensions 5
Note that in four dimensions there is no type G and type I is equivalent to type
Ii [2, 3]. Thus for n = 4 only types I, D and O are compatible with static spacetimes.
This was discussed already in [12] in the case of static, n = 4, vacuum spacetimes
(see also additional comments in [13] and in section 6.2 of [1]).
3.2. Stationary spacetimes
One can use the same arguments as above for stationary spacetimes with the metric
remaining unchanged under reflection symmetry involving time and some other
coordinates. E.g. in Boyer-Lindquist coordinates the Kerr metric remains unchanged
under t̃ = −t, φ̃ = −φ and n = 5 Myers-Perry under t̃ = −t, φ̃ = −φ, ψ̃ = −ψ or,
for general dimension, Myers-Perry under t̃ = −t, φ̃i = −φi. Note, however, that
in contrast to the static case, in some special stationary cases one could in principle
get from the original WAND ℓ a “new” WAND n = −ℓ which represents the same
null direction. In order to deal with these special cases we note that the “divergence
scalar” (or, loosely speaking, “expansion”, since it does coincide with the standard
expansion scalar in the case of geodetic, affinely parameterized null directions) of
both WANDs n and ℓ related by reflection symmetry is the same (as well as all the
other optical scalars and the geodeticity parameters - this also applies to the static
case), i.e. ℓa;a = n
;a while the “expansion” of −ℓ is equal to −ℓa;a. Therefore for all
“expanding” spacetimes n 6= −ℓ. Thus
Proposition 2 In arbitrary dimension, all stationary spacetimes with non-vanishing
divergence scalar (“expansion”) and invariant under appropriate reflection symmetry
are of Weyl types G, Ii or D, unless conformally flat.
Note also that it is shown in [14] that Kerr-Schild spacetimes with the assumption
R00 = 0 are of type II (or more special) in arbitrary dimension with the Kerr-Schild
vector being the multiple WAND. Therefore all Kerr-Schild spacetimes that are either
static or belong to the above mentioned class of stationary spacetimes are necessarily
of type D. In particular, the Myers-Perry metric in arbitrary dimension is thus of
type D.¶
In addition to the rotating Myers-Perry black holes for n ≥ 4, of type D, we can
mention a number of physically relevant solutions as explicit examples of spacetimes
subject to Proposition 2.+ First, rotating vacuum black rings [17], of type Ii [11]. To
our knowledge, no stationary (non-static) type G solution has been so far explicitly
identified. It is, however, plausible to expect that a rotating charged black ring (so
far unknown in the standard Einstein-Maxwell theory) will be of type G as its static
counterparts. Further interesting examples fulfilling our assumptions are expanding
¶ This was already known in the case n = 5 [4, 8]. Furthermore, it has been demonstrated recently
in [15] by explicit computation of the full curvature tensor that the family [16] of higher dimensional
rotating black holes with a cosmological constant and NUT parameter is of type D for any n. We
observe in addition that, using the connection 1-forms given in [15], it is also straightforward to
show (see the Appendix) that the mutiple WANDs (which are related by reflection symmetry) of all
such solutions are twisting, expanding and shearing (except that the shear vanishes for n = 4). The
fact that the WANDs found in [15] are complex is only due to the analytical continuation trick used
in [16] to cast the line element in a nicely symmetric form - the WANDs of the associated “physical”
spacetimes are thus real after Wick-rotating back one of the coordinates.
+ It is straightforward to verify the “reflexion symmetry” of the metric we mention in this context.
The “expansion” condition, instead, has not been verified explicitly in all cases. However, it is
plausible that these spacetimes are indeed “expanding” since they contain as special limits or subcases
solutions with expansion, e.g. Myers-Perry black holes (cf. section 6.4, [8] and the preceding footnote).
Type D Einstein spacetimes in higher dimensions 6
stationary axisymmetric spacetimes with n − 2 commuting Killing vector fields [18],
which contain, apart from the (n = 5) black holes/rings mentioned above, also e.g.
the recently obtained “black saturn” [19], doubly spinning black rings [20] and black
di-rings [21]. In any dimension also rotating uniform black strings/branes satisfy the
assumptions of Proposition 2 (see section 4), and so does the ansatz recently used
in [22] for the numerical construction of corresponding n = 6 non-uniform solutions.
Other examples are all the stationary solutions discussed in [4] and various black ring
solutions reviewed in [23].
3.3. Remarks and “limitations” of the results
First, it is worth observing that we have not used any field equations for the
gravitational field in the considerations presented above and the results are thus purely
geometrical.
Note that one can not relax the assumption ℓa;a 6= 0 in the case of stationary
spacetimes. For example, the special pp -wave metric ds2 = gijdx
idxj−2dudv−2Hdu2
such that H,u = 0 (note that it is always H,v = 0 by the definition of pp -waves) and
∂u · ∂u = −2H < 0 represents stationary spacetimes (cf., e.g., [24] for the n = 4
vacuum case) that are invariant under reflection symmetry (ũ = −u, ṽ = −v) and yet
of type N [25]. In fact, the geodetic multiple WAND ℓ = ∂v is non-expanding (and
n = −ℓ is not a new WAND).
Furthermore, if we assume a null Killing vector field k instead of a timelike one we
are led to different conclusions. Namely, it is easy to show that k must be geodetic,
shearfree and non-expanding, which for Rabk
akb = 0 implies that k is a twistfree
WAND [10]. We thus end up with a subfamily of the Kundt class, for which (under
the alignment requirement Rabk
a ∝ kb, obeyed e.g. in vacuum) the algebraic type is
II or more special [10] (cf. section 24.4 of [1] for n = 4). In particular, a similar
argument applies locally at Killing horizons, where the type must thus be again II or
more special (provided Rabk
a ∝ kb).∗ This is in agreement with the result of [26] for
generic isolated horizons. As an explicit example, vacuum black rings (which are of
type Ii in the stationary region) become locally of type II on the horizon [11].
Finally, spacelike Killing vectors do not impose any constraint on the algebraic
type of the Weyl tensor, in general, and all types are in fact possible. For example
charged static black rings are of type G, vacuum black rings of type Ii, vacuum black
holes of type D, and they all admit at least one spacelike Killing vector; Kundt
spacetimes can be constructed that admit axial symmetry with all types II, D, III
and N being possible (see, e.g., [1] for n = 4).
4. Direct/warped product spacetimes
In this section we show that the algebraic types discussed above also characterize
certain classes of direct/warped product geometries of physical relevance. In addition
we discuss some optical properties of these spacetimes.
∗ The proof is a bit more tricky in this case since the Killing vector is null only at the horizon.
Still, one can adapt techniques used in [26, 27] for related investigations. Note that the horizon of
higher dimensional stationary black holes is indeed a Killing horizon (at least in the non-degenerate
case) [27].
Type D Einstein spacetimes in higher dimensions 7
4.1. Weyl tensor
Let us consider two (pseudo-)Riemannian spaces (M1, g(1)) and (M2, g(2)) of dimension
n1 and n2 (n1, n2 ≥ 1 and n1 + n2 ≥ 4), parameterized by coordinates xA
(A,B = 0, . . . , n1 − 1) and xI (I, J = n1, . . . , n1 + n2 − 1), respectively. Using
adapted coordinates xµ (µ, ν = 0, . . . , n1 + n2 − 1) constructed from the coordinates
xA of M1 and x
I of M2, we define the direct product (M, g) to be the product
manifold M = M1 ×M2, of dimension n = n1 + n2, equipped with the metric tensor
g(xµ) = g(1)(x
A) ⊕ g(2)(xI) defined (locally) by gAB = g(1)AB, gIJ = g(2)IJ , gAI = 0.
For the sake of definiteness, we shall assume hereafter that (M1, g1) is Lorentzian and
(M2, g2) is Riemannian.
In general, any geometric quantity which can be split like the product metric
(i.e., with no mixed components and with the A[I] components depending only on
the xA[xI ] coordinates) is called a “product object” (or “decomposable”). Various
interesting geometrical properties then follow [28] and, in particular, the Riemann
and Ricci tensors and the Ricci scalar are all decomposable. As a consequence, a
product space is an Einstein space iff each factor is an Einstein space and their Ricci
scalars satisfy R(1)/n1 = R(2)/n2 [28].
Using the above coordinates it follows from the standard definition that the mixed
components of the Weyl tensor are given by
CABCI = CABIJ = CAIJK = 0, (8)
CAIBJ = −
g(1)ABR(2)IJ + g(2)IJR(1)AB
R(1) +R(2)
(n− 1)(n− 2)
g(1)ABg(2)IJ , (9)
where R(1)AB [R(2)IJ ] is the Ricci tensor of (M1, g1) [(M2, g2)]. For the non-mixed
components one has to distinguish the special cases n1 = 1, 2 (and the “symmetric”
cases n2 = 1, 2, which we omit for brevity). If n1 = 1 there are of course no non-mixed
components CABCD since now the x
A span a one-dimensional space. If n1 = 2 there
is only one independent component, i.e. C0101 (notice that here, exceptionally, 0 and
1 are not frame indices but refer to the coordinates x0 and x1 in the factor space M1).
For n1 ≥ 3,
CABCD = C(1)ABCD +
(n− 2)(n1 − 2)
g(1)A[CR(1)D]B − g(1)B[CR(1)D]A
(n− 1)(n− 2)
R(2) −R(1)
n2(n2 + 2n1 − 3)
(n1 − 1)(n1 − 2)
g(1)A[Cg(1)D]B (n1 ≥ 3), (10)
where C(1)ABCD is the Weyl tensor of (M1, g1), whereas the remaining non-mixed
components are given for any n1 ≥ 1 by
CIJKL = C(2)IJKL +
(n− 2)(n2 − 2)
g(2)I[KR(2)L]J − g(2)J[KR(2)L]I
(n− 1)(n− 2)
R(1) −R(2)
n1(n1 + 2n2 − 3)
(n2 − 1)(n2 − 2)
g(2)I[Kg(2)L]J (n2 ≥ 3), (11)
where C(2)IJKL is the Weyl tensor of (M2, g2). It is thus obvious that the Weyl tensor
is not decomposable, in general. It turns out that the Weyl tensor is decomposable
iff both product spaces are Einstein spaces and n2(n2 − 1)R(1) + n1(n1 − 1)R(2) = 0
(the latter condition is identically satisfied whenever n1 = 1 or n2 = 1, while for
n1 = 2 [n2 = 2] it implies that (M1, g1) [(M2, g2)] must be of constant curvature).
When the Weyl tensor is decomposable the only non-vanishing components take the
simple form CABCD = C(1)ABCD, CIJKL = C(2)IJKL. Therefore, in particular, the
Type D Einstein spacetimes in higher dimensions 8
product space is conformally flat iff both product spaces are of constant curvature and
n2(n2 − 1)R(1) + n1(n1 − 1)R(2) = 0.
Determining the possible algebraic types of the Weyl tensor requires considering
various possible choices for the dimension n1 of the Lorentzian factor.
If n1 = 1, the full metric can always be cast in the special static form ds
−dt2 + gIJdxIdxJ . Recalling the result of section 3, the Weyl tensor can thus only
be of type G, Ii, D or O. In particular, one can show that C0i1j = C0j1i, so that for
direct product spacetimes with n1 = 1 one has Φ
ij = 0 identically.
If n1 ≥ 2, it is convenient to adapt the null frame (1) to the natural product
structure, so that gab = 2ℓ(anb) + δÂB̂m
b + δÎĴm
b (where Â, B̂ =
2, . . . , n1 − 1, Î , Ĵ = n1, . . . , n − 1 are now frame indices, and the frame vectors do
not have mixed coordinate components, e.g. ℓI = 0 = nI etc.). From (10) and (11)
it thus follows that CABCD and CIJKL do not give rise to mixed frame components,
and from (9) that CAIBJ does not give rise to non-mixed frame components. Hence
the only non-vanishing mixed components are (ordered by boost weight)
C0Î0Ĵ = −
R(1)00δÎĴ , C0ÎÂĴ = −
R(1)0ÂδÎĴ ,
0Î1Ĵ
= − 1
(2)ÎĴ
+R(1)01δÎĴ
R(1) +R(2)
(n− 1)(n− 2)
ÂÎB̂Ĵ
= − 1
R(2)ÎĴδÂB̂ +R(1)ÂB̂δÎĴ
R(1) +R(2)
(n− 1)(n− 2)
, (12)
C1ÎÂĴ = −
n− 2R(1)1ÂδÎĴ , C1Î1Ĵ = −
n− 2R(1)11δÎĴ .
The non-mixed frame components are given for n1 = 2 by
C0101 = −
2(n2 + 1)
(n2 − 1)R(1) +
2R(2)
(n1 = 2), (13)
and for n1 ≥ 3 by
0Â0B̂
(1)0Â0B̂
(n− 2)(n1 − 2)
R(1)00δÂB̂,
010Â
(1)010Â
(n− 2)(n1 − 2)
(1)0Â
C0ÂB̂Ĉ = C(1)0ÂB̂Ĉ −
(n− 2)(n1 − 2)
R(1)0[ĈδB̂]Â,
C0101 = C(1)0101 −
(n− 2)(n1 − 2)
R(1)01
(n− 1)(n− 2)
R(2) −R(1)
n2(n2 + 2n1 − 3)
(n1 − 1)(n1 − 2)
01ÂB̂
(1)01ÂB̂
(n1 ≥ 3), (14)
C0Â1B̂ = C(1)0Â1B̂ +
(n− 2)(n1 − 2)
R(1)ÂB̂ +R(1)01δÂB̂
(n− 1)(n− 2)
R(2) −R(1)
n2(n2 + 2n1 − 3)
(n1 − 1)(n1 − 2)
ÂB̂ĈD̂
= C(1)ÂB̂ĈD̂ +
(n− 2)(n1 − 2)
R(1)B̂[D̂δĈ]Â −R(1)Â[D̂δĈ]B̂
Type D Einstein spacetimes in higher dimensions 9
(n− 1)(n− 2)
R(2) −R(1)
n2(n2 + 2n1 − 3)
(n1 − 1)(n1 − 2)
B̂[D̂
Ĉ]Â
Î ĴK̂L̂
(2)ÎĴK̂L̂
(n− 2)(n2 − 2)
Î[K̂
(2)L̂]Ĵ
Ĵ[K̂
(2)L̂]Î
(n− 1)(n− 2)
R(1) −R(2)
n1(n1 + 2n2 − 3)
(n2 − 1)(n2 − 2)
Î[K̂
L̂]Ĵ
1ÂB̂Ĉ
(1)1ÂB̂Ĉ
− 2n2
(n− 2)(n1 − 2)
(1)1[Ĉ
B̂]Â
101Â
(1)101Â
(n− 2)(n1 − 2)
(1)1Â
1Â1B̂
(1)1Â1B̂
(n− 2)(n1 − 2)
R(1)11δÂB̂.
(The expression for C
ÎĴK̂L̂
holds only when n2 ≥ 3, while for n2 = 2 one gets only
one component C2323 similar to (13).)
For n1 = 2 the Weyl tensor of (M1, g1) of course vanishes, and in addition we
have R(1)00 = 0 = R(1)11 identically (any 2-space satisfies 2R(1)AB = R(1)g(1)AB).
Therefore among the above components (12) and (13) only the boost weight zero
components C0Î1Ĵ and C0101 survive, so that the corresponding spacetime can be
only of type D (or conformally flat), and ℓ and n, as chosen above, are multiple
WANDs. Note also that Φij reduces to ΦÎĴ = C0Î1Ĵ = C0Ĵ1Î in this case, therefore
ΦAij = 0. As an example, the higher dimensional electric Bertotti-Robinson solutions
fall in this class, cf., e.g, [29, 30].
For n1 = 3, again the Weyl tensor of (M1, g1) vanishes. With the additional
assumption that (M1, g1) is Einstein, we get R(1)00 = R(1)11 = R(1)0Â = R(1)1Â = 0
(here Â = 2 only), and as above the Weyl tensor is of type D with ΦAij = 0.
Similarly, for any n1 > 3, if (M1, g1) is an Einstein space the only non-zero mixed
Weyl components (12) will have boost weight zero, and the non-mixed components
(14) simplify considerably. As a particular consequence, if (M1, g1) is an Einstein
space of type D, (M, g) will also be of type D (but now ΦAij 6= 0, in general) - this
is the case, for example, of uniform black strings/branes (either static or rotating,
see also the discussion concluding this section). If (M1, g1) is of constant curvature,
(M, g) will be of type D with ΦAij = 0 (or O) - this includes the higher dimensional
magnetic Bertotti-Robinson solutions [29]. One can consider other special cases using
similar simple arguments.
A spacetime conformal to a direct product spacetime is called a warped product
spacetime if the conformal factor depends only on one of the two coordinate sets xA,
xI (see e.g. [1]). Obviously, the algebraic type of two conformal spaces is the same.♯
Some of the results presented above can thus be straightforwardly generalized to the
more general case of warped products. For example,
Proposition 3 In arbitrary dimension, a warped spacetime with a one-dimensional
Lorentzian (timelike) factor can be only of type G, Ii, D (with Φ
ij = 0) or O.
This case includes, in particular, the conclusion of section 3 for static spacetimes.
As warped non-static/non-stationary examples we can mention the de Sitter universe
(in global coordinates) and FRW cosmologies. For n = 4 Proposition 3 reduces to a
result of [32].
♯ This is true also for doubly warped product spacetimes discussed in [31], so that Propositions 3
and 4 hold also in that case.
Type D Einstein spacetimes in higher dimensions 10
Furthermore,
Proposition 4 In arbitrary dimension, a warped spacetime with a two-dimensional
Lorentzian factor can be only of type D (with ΦAij = 0) or O.
Cf. again [32] for n = 4. Notice that in this case the line element can
always be cast in one of the two (conformally related) forms ds2 = 2A(u, v)dudv +
f(u, v)hIJ(x)dx
IdxJ or ds2 = 2f̃(x)A(u, v)dudv + gIJ(x)dx
IdxJ (so that multiple
WANDs are given by ∂u and ∂v), which include a number of known spacetimes. For
example, the first possibility includes all spherically symmetric spacetimes, hence as
a special case of Proposition 4 we have
Proposition 5 In arbitrary dimension, a spherically symmetric spacetime is of type
D (with ΦAij = 0) or O.
For n = 4 this has been known for a long time (see e.g. [33] and sections 15.2,
15.3 of [1]), and in this case ΦAij = 0 means that Ψ2 is real (see the footnote on p. 4).
For n > 4 this result has been proven in [34] in the static case.
Other properties of decomposable Weyl tensors were discussed in [2].
4.2. “Factorized” geodetic null vector fields
Let us define an n-dimensional spacetime (M, g) as the warped product of an n1-
dimensional Lorentzian space (M1, g(1)) and an n2-dimensional Riemannian space
(M2, g(2)), with n = n1+n2 as in the preceding subsection. Hereafter we shall assume
n1 ≥ 2. Using the adapted coordinates defined above, the metric can take one of the
following two forms
ds2 = gABdx
AdxB + f(xA)hIJdx
IdxJ , (15)
ds2 = f̃(xI)hABdx
AdxB + gIJdx
IdxJ , (16)
where gAB, hAB = g(1)AB depend only on the x
A coordinates and gIJ , hIJ = g(2)IJ
only on the xI coordinates.
Given a null vector ℓ(1) = ℓ
(1)∂A ofM1, this can be “lifted” to define a null vector
ℓ of M with covariant components ℓA = ℓ(1)A (functions of the x
A only) and ℓI = 0.
From equations (15), (16) it follows that if ℓ(1) is geodetic (and affinely parameterized)
in M1 then ℓ is automatically geodetic (and affinely parameterized) in M . We can
thus “compare” the optical scalars of ℓ(1) in M1 with those of ℓ in M . For the warped
metric (15), with the definitions (4) one finds
σ2 = σ2(1) +
(n1 − 2)n2
n1 + n2 − 2
θ(1) −
(ln f),Aℓ
n1 + n2 − 2
(n1 − 2)θ(1) +
(ln f),Aℓ
, (17)
ω2 = ω2(1),
where σ2
, θ(1) and ω
are the optical scalars of ℓ(1) in (M1, g(1)). For the warped
metric (16) one has
σ2 = f̃−2
σ2(1) +
(n1 − 2)n2
n1 + n2 − 2
θ2(1)
n1 − 2
n1 + n2 − 2
f̃−1θ(1), (18)
ω2 = f̃−2ω2(1).
Type D Einstein spacetimes in higher dimensions 11
The special case of direct products is recovered for f, f̃ = const. (which can be
rescaled to 1), in which case the shear of the full spacetime originates in the shear
and expansion of the Lorentzian factor (while expansion and twist are essentially the
same as in (M1, g(1))).
Note that for n1 = 2 the definitions (4) for σ
and θ(1) become formally singular
because of the normalization, but for a Lorentzian 2-space (e.g., ds2 = 2A(u, v)dudv
with the geodetic null vector ℓ = A−1∂v) one has ℓ(a;b)ℓ
(a;b) = ℓa;a = ℓ[a;b]ℓ
a;b = 0, so
that we can essentially take σ2
= θ(1) = ω(1) = 0 and formulae (17), (18) still hold.
The results of this section can be applied to several known solutions. For example,
static [rotating] black strings and branes (i.e, direct products of Schwarzschild [Kerr]
cross a flat space) are type D vacuum spacetimes with two shearing, expanding,
twistfree [twisting] multiple WANDs. As such, they clearly “violate” the Golberg-
Sachs theorem. In addition, spherically symmetric solutions in any dimensions (which
necessarily take the metric form (15) with n1 = 2) are type D spacetimes with two
shearfree, expanding, twistfree multiple WANDs (independently of any specific field
equations; in the “exceptional case” (ln f),Aℓ
A = 0 the vector ℓ is non-expanding, e.g.
for Bertotti-Robinson/Nariai geometries, or for null generators of horizons).
5. Type D Einstein spacetimes in higher dimensions
From the results of the previous sections it follows that type D spacetimes
are the simplest non-trivial examples of static/stationary (“expanding” and with
an appropriate reflection)/warped spacetimes. Therefore we will focus on type
D spacetimes in general (without assuming staticity etc.). Recall that the
quantities/symbols used below (e.g. Φij , Lij , D) are defined in section 2.
5.1. Algebraic conditions following from the Bianchi equations
Various contractions of Bianchi identities
Rabcd;e +Rabde;c +Rabec;d = 0 (19)
lead to a set of first-order PDEs for frame components of the Riemann tensor given in
Appendix B of [8]. In the following we shall concentrate on Einstein spaces (defined
by Rab =
gab), for which the same set of equations holds unchanged also for
components of the Weyl tensor. In case of algebraically special spacetimes, some
of these differential equations reduce to algebraical equations due to the vanishing of
some components of the Weyl tensor. Here we derive algebraic conditions following
from the Bianchi equations for type D Einstein spacetimes. These conditions will be
employed in subsequent sections.
In particular, by contracting (19) with m(i), ℓ, m(j), m(k) and ℓ (equation (B.8)
in [8]) and assuming to have a type D Einstein space we get the first algebraic condition
ΦijLk − ΦikLj + 2ΦAkjLi − CisjkLs = 0, (20)
where we denoted Li0 by Li. We will also denote LiLi by L.
The second algebraic equation follows from equation (B.15, [8])
0 = 2
ΦAjkLim +Φ
mjLik +Φ
kmLij +ΦijAmk +ΦikAjm +ΦimAkj
+ CisjkLsm + CismjLsk + CiskmLsj (21)
Type D Einstein spacetimes in higher dimensions 12
and contraction of k with i leads to
0 = SΦAmj +ΦAjm − (ΦSmi +ΦAmi)Sij + (ΦSji +ΦAji)Sim
+ 2(ΦAimAij − ΦAijAim) + 12CismjAsi. (22)
By contracting m with j in equation (B.12) from [8] we get
2DΦSik = 4Φ
ijAkj +ΦkjLij +ΦjiLjk − ΦkiS − ΦLik − 2ΦSisLsk
− 2ΦSsk
M i0 −2ΦSis
Mk0 +CijksLsj , (23)
where we employed Ciskj
M j0 +Cijks
M j0= 0 (
M j0 +
M s0 = 0, cf. [8]).
The symmetric part of equation (B.5, [8]) and equation (B.3) (that is equivalent
to the antisymmetric part of (B.5)) give, respectively,
2DΦSik = −2ΦSik+(−2Φis+Φsi)Lsk+(−2Φks+Φsk)Lsi−2ΦSsk
M i0 −2ΦSis
Mk0, (24)
2DΦAik = −2ΦAik+(−2Φis+Φsi)Lsk−(−2Φks+Φsk)Lsi−2ΦAsk
M i0 +2Φ
Mk0 .(25)
By subtracting (24) from (23) we finally obtain the third algebraic equation
0 = −ΦkiS +ΦLki +ΦkjLij + 4ΦAijAkj + (2Φkj − Φjk)Lji + 2ΦAijLjk + CijksLsj .(26)
Its antisymmetric part is, thanks to CikjmAmj = 2CijksAsj , equal to equation (22)
and its symmetric part reads
0 = −SΦSik +ΦSik +ΦSijSjk +ΦSkjSij + 3(ΦAijSjk +ΦAkjSji) + CijksSsj . (27)
Equations (20), (22) and (27) will be extensively used in the following sections.
In passing, let us observe here in what sense the n = 4 case is unique. Recalling
the footnote on p. 4, from (20) we get Li = 0 (geodetic property) unless Φij = 0
(trivial case of zero Weyl tensor); equation (22) is identically satisfied (noting that
necessarily ΦAij ∝ Aij when n = 4); equation (27) implies Sij ∝ δij (vanishing shear)
again unless Φij = 0. Thus for n = 4 we correctly recover the standard Goldberg-
Sachs result (here restricted to type D spacetimes) that multiple WANDs (PNDs) are
geodetic and shearfree in vacuum (and Einstein) spaces [1]. The situation in higher
dimensions, which is qualitatively different from the n = 4 case, is studied in the
following sections.
5.2. WANDs in “generic” vacuum type D and II spacetimes in arbitrary dimension
are geodetic
In this section we study equation (20) in order to determine under which circumstances
the multiple WAND ℓ is geodetic.
By contracting i with k in (20) and using (6) we get
3ΦAij − ΦSij
Li = ΦLj (28)
and after multiplying (28) by Lj we obtain
ΦSijLiLj = −ΦL. (29)
By multiplying (20) by LiLj and using (29) we get
3ΦAikLi +Φ
ikLi +ΦLk
= 0. (30)
Thus either L = 0 or
−3ΦAij − ΦSij
Li = ΦLj . (31)
Type D Einstein spacetimes in higher dimensions 13
By adding and subtracting (28) and (31) we get
ΦSijLi = −ΦLj, ΦAijLi = 0. (32)
Finally multiplying (20) by Li and using (32) we get
LΦAij = 0. (33)
This implies that for a type D vacuum spacetime with non-vanishing ΦAij in arbitrary
dimension corresponding WANDs are geodetic.
In the case with vanishing ΦAij , let us choose a frame in which Φ
ij is diagonal
ΦSij = diag{p(2), p(3), . . . , p(n−1)}. Then from the first equation (32) it follows
(p(i) +Φ)Li = 0, (34)
where (from now on) we do not sum over indices in brackets. If p(i) 6= −Φ, ∀i, then
Li = 0, ∀i, i.e. ℓ is geodetic.
Note that so far we have employed only equation (20), which corresponds to
equation (B.8) in [8] and which does not contain Weyl tensor components with
negative boost order. Consequently, the same conclusions hold also for type II Einstein
spacetimes.
Proposition 6 In arbitrary dimension, multiple WANDs of type II and D Einstein
spacetimes are geodetic if at least one of the following conditions is satisfied:
i) ΦAij is non-vanishing;
ii) for all eigenvalues of ΦSij: p(i) 6= −Φ.
Note that the above argument can not be extended to more special algebraic
classes of spacetimes since it relies on the fact that some Weyl components with boost
weight zero are non-vanishing. However, it was already shown in [8] that multiple
WANDs in type N and III vacuum spacetimes are geodetic (in that case with no
need of extra assumptions). Therefore we can conclude that under most “generic”
conditions multiple WANDs are geodetic. Note, however, that certain special type-
D vacuum solutions with ΦAij = 0 and p(i) = −Φ (for some i) admit non-geodetic
multiple WANDs. Explicit example of such spacetime is given in section 5.4.
5.3. Vacuum type D spacetimes with a “shearfree” WAND
The algebraic equations (22) and (27) are quite complicated in general dimension and
thus here we will limit ourselves to the “shearfree” case. This is of interest since
it includes, for instance, the Robinson-Trautman solutions containing static black
holes [35].
With the “shearfree” condition
Sij =
n−2δij , (35)
equation (27) leads for S 6= 0 to
ΦSij =
n−2δij (S 6= 0), (36)
whereas it is identically satisfied for S = 0. In the rest of this subsection we thus
consider only the “expanding” case S 6= 0. For ΦSij in the form (36) with Φ 6= 0 the
condition ii) of Proposition 6 is satisfied and thus the WAND ℓ is geodetic.
Proposition 7 In arbitrary dimension, multiple “shearfree” and “expanding” WAND
in a type D Einstein spacetime is geodetic whenever Φij 6= 0.
Type D Einstein spacetimes in higher dimensions 14
Note that Φij has to be non-zero for type D spacetimes in four and five dimensions.
Thus all such shearfree WANDs are geodetic.†† On the other hand, spacetimes with
Φij = 0 are not necessarily conformally flat for n > 5 (Cijkl can be non-vanishing, and
in that case equation (20) reduces to CisjkLs = 0) and in fact in section 5.4 we will
present an example of such type D vacuum spacetime with a non-geodetic “shearfree”
multiple WAND.
Furthermore, using (35) and (36), equation (22) reads
0 = n−4
n−2SΦ
ij +ΦAji + 2(Φ
kiAkj +Φ
jkAki) +
CkmijAmk. (37)
As mentioned above this is identically satisfied for n = 4. For n > 4 (and S 6= 0), if one
assumes Aij = 0 it gives Φ
ij = 0, while assuming Φ
ij = 0 leads to CkmijAmk = 2ΦAij .
On the other hand, from equation (25) with (35) and (36) we see that ΦAij = 0 implies
Aij = 0, unless Φ = 0 (in which case the full Φij would be zero). We can thus
summarize these results in
Proposition 8 For a multiple “shearfree” and “expanding” WAND in a type D
Einstein spacetime in n > 4 dimensions the following implications hold
(i) Aij = 0 ⇒ ΦAij = 0.
(ii) ΦAij = 0, Φ
ij 6= 0 ⇒ Aij = 0.
(iii) ΦAij = 0, Φ
ij = 0 ⇒ CkmijAmk = 0.
Note that for an arbitrary odd-dimensional spacetime with a geodetic and
shearfree WAND one has Aij = 0 [10] and thus in the expanding case, θ 6= 0, by (i)
ΦAij also necessarily vanish. Note also that the assumptions of (i) (i.e., σij = 0 = Aij ,
θ 6= 0) uniquely identify the Robinson-Trautman spacetimes (which are of type D for
n > 4) in any dimensions and indeed ΦAij = 0 for the correspondingWeyl tensor [35]. In
general ΦSij =
n−2δij 6= 0 for Robinson-Trautman solutions [35] and by Proposition 7
the multiple WANDs are thus geodetic, however, in the next subsection we present a
very special Robinson-Trautman solution with vanishing ΦSij and with a non-geodetic
WAND.
5.4. An example of type D vacuum spacetimes with a non-geodetic WAND
The conclusions in the preceding subsections about the geodetic character of multiple
WANDs can not be (in contrast to the n = 4 case) extended to the most general
case. In fact, here we point out that a special subclass of the Robinson-Trautman
solutions [35] in n ≥ 7 dimensions represents type D vacuum spacetimes (with a
possible cosmological constant) for which one of the multiple WANDs is non-geodetic.
Namely, let us consider the vacuum family [35, 36]
ds2 = r2hijdx
idxj − 2dudr − 2Hdu2,
2H = K − 2r(lnP ),u −
(n− 2)(n− 1)
r2 (K = 0,±1), (38)
where P 2 = (det hij)
1/(2−n) and hij represents an arbitrary (n − 2)-dimensional
Einstein space (i, j = 2 . . . , n − 1 are, exceptionally, coordinate indices in this
subsection). Using a suitable frame based on the null vectors
ℓ = ∂r, n = −∂u +H∂r, (39)
†† In fact, for n = 4 from the Goldberg-Sachs theorem we already knew that all multiple WANDs are
automatically shearfree and geodetic.
Type D Einstein spacetimes in higher dimensions 15
the only non-vanishing components of the Weyl tensor have boost weight zero and are
given by [35]
Cijkl = r
2(Rijkl − 2Khi[khl]j), (40)
where Rijkl is the Riemann tensor associated to hij . This implies that the
spacetime (38) is of type D, with Φij = 0, and that both ℓ and n are multiple
WANDs. Now, the vector ℓ is geodetic, shearfree and twistfree by construction [35].
Next, one can easily show that
∇nn = −H,rn+H,idxi, (41)
where, by (38), H,i = −r(lnP ),ui. Therefore n is geodetic if and only if (lnP ),ui =
0 ⇔ P = p1(u)p2(x2, x3, . . .). For a general (non-factorized) function P the multiple
WAND n is thus non-geodetic (one can also easily check that it “shearfree”, “twistfree”
and “expanding”). A simple explicit example of such spacetimes is obtained by
extending to any n ≥ 7 the n = 7 dimensional solution discussed in [36], i.e. by
taking in eq. (38)
K = −1, P = f(u, z)−1/2
ρn−5(det ηαβ)
]1/(2−n)
hijdx
idxj = f(u, z)
dz2 + V (ρ)dτ2 +
V (ρ)
dρ2 + ρ2ηαβdx
, (42)
f(u, z) =
4b(u)e2z/l
l2[e2z/l − b(u)]2
, V (ρ) =
where z ≡ x2, τ ≡ x3, ρ ≡ x4, ηαβ = ηαβ(x5, x6, . . .) is the metric of an (n − 5)-
dimensional unit sphere (α, β = 5, . . . , n− 1), µ and l are constants and b(u) > 0 is an
arbitrary function. The multiple WAND n is non-geodetic as long as db/du 6= 0. Note
that there is not contradiction with the results of the previous subsections precisely
because Φij = 0 here.
6. Type D vacuum spacetimes in five dimensions
Let us now study the five-dimensional case. Note that the algebraic relation (6)
between −2ΦSij and Cijkl is equivalent to the relation between the Ricci and the
Riemann tensor of a m − 2 dimensional space. Therefore in five dimensions Cijkl is
equivalent to ΦSij and thus a type D Weyl tensor in five dimesions is fully determined
by Φij . In fact, for n = 5 it is possible to solve the second constraint from (6) for
Cijkl :
Cijkl
(n=5)
jk − δikΦSjl − δjlΦSik + δjkΦSil
− Φ (δilδjk − δikδjl) . (43)
Thus in the five dimensional case the algebraic equations we consider, (20), (21), (22),
(27), can be expressed in terms of Φij , Li, and Lij . Plugging (43) into (20), recalling
equation (32) and contracting with Lk one finds the equation
LΦSij + 2ΦLiLj − ΦLδij = 0. (44)
For n = 5 equation (21) takes the form
0 = ΦAjkLim + (Φ
im + 3Φ
im)Akj +Φ
mjLik + (Φ
ik + 3Φ
ik)Ajm +Φ
kmLij
+(ΦAij + 3Φ
ij)Amk + δij(Φ
msLsk − ΦSksLsm) + δik(ΦSjsLsm − ΦSmsLsj)
+δim(Φ
ksLsj − ΦSjsLsk) + Φ[δijAkm + δikAmj + δimAjk]. (45)
Type D Einstein spacetimes in higher dimensions 16
Equation (22) reduces to
0 = ΦAmjS + 2ΦAjm +Φ
ji(Sim + 2Aim) + Φ
im(Sij + 2Aij)
+ΦSji(Sim − 2Aim) + ΦSmi(−Sij + 2Aij), (46)
and equation (27) has the form
3[(ΦSij +Φ
ij)Sjk + (Φ
kj +Φ
kj)Sji − SΦSki] = δik(2ΦSjsSjs − ΦS). (47)
In the following sections we study (non-)geodecity of multiple WANDs (section 6.1),
spacetimes admitting non-twisting WANDs Aij = 0 (section 6.2) and spacetimes with
ΦAij = 0 (section 6.3).
6.1. Geodeticity of multiple WANDs
It is interesting to return now to equation (20), which is related to the (non-)geodetic
character of multiple WANDs and in five dimensions implies (44). Since we already
know from Proposition 6 that WANDs are necessarily geodetic when ΦAij 6= 0, let us
focus here on the case ΦAij = 0. If Φ = 0 we see that either L = 0 or Φ
ij = 0, the latter
case being now a conformally flat spacetime. Therefore an n = 5 type D Einstein
spacetime requires (ΦAij = 0 and) Φ 6= 0 in order to admit a non-geodetic multiple
WAND. In this case it follows from (44) that there exists an eigenframe of ΦSij such
ΦSij = Φdiag(1, 1,−1), L2 = L3 = 0, (48)
so that L4 6= 0 is responsible for the WAND ℓ being non-geodetic. Such spacetime
is necessarily shearing since the “canonical” form of ΦSij given in equation (48) is
not compatible with that of equation (36). It would be interesting to find such five
dimensional vacuum type D spacetime with a non-geodetic WAND or prove that such
spacetime does not exist.
To summarize,
Proposition 9 In five dimensions, the only type D spacetimes with non-geodetic
multiple WAND ℓ are those satisfying ΦAik = 0 and Φ
ik 6= 0, ΦSik = diag{Φ, Φ, −Φ}.
6.2. “Non-twisting” case - Aij = 0
In the non-twisting case Aij = 0, equation (46) reduce to
ΦjiSim − ΦmiSij +ΦAmjS = 0. (49)
Now we can, without loss of generality, choose a frame in which the symmetric
matrix Sij is diagonal
Sij = diag(s(2), s(3), s(4)). (50)
Then equations (49) and (47) take the form (recall that we do not sum over indices
in brackets)
ΦSik(s(k) − s(i)) + ΦAik(s(k) + s(i) − S) = 0,
ΦSik(s(k) + s(i) − S) + ΦAik(s(k) − s(i)) = 13δik(2Φ
jsSjs − ΦS). (51)
Now let us study components of the two above equations for i 6= k. By summing
the two above equations we get
(2s(k) − S)(ΦSik +ΦAik) = 0 (i 6= k). (52)
Type D Einstein spacetimes in higher dimensions 17
In the “generic” case with 2s(i) 6= S ∀i, this implies
ΦAik = 0 = Φ
ik for i 6= k. (53)
Consequently, ΦSij is also diagonal and from equation (51)
ΦSij = diag(p(2), p(3), p(4)), p(i) =
2ΦSjsSjs − ΦS
3(2s(i) − S)
. (54)
Using (54), it is straightforward to express (two of) the p(i) in terms of the s(i) solving
the linear relations (which are not all independent):
(s(2) − s(3) − s(4))p(2) = (−s(2) + s(3) − s(4))p(3), (55)
(s(2) − s(3) − s(4))p(2) = (−s(2) − s(3) + s(4))p(4), (56)
(−s(2) + s(3) − s(4))p(3) = (−s(2) − s(3) + s(4))p(4). (57)
Proposition 10 In five dimensions, in the “generic” (2s(i) 6= S ∀i) non-twisting
(Aij = 0) type D spacetime, Φ
ij also vanishes and Φ
ij can be diagonalized together
with Sij.
Note that special cases with 2s(i) = S for some i have to be treated separately:
1) If one of s(i) = S/2, e.g. s(4) = S/2, and the others differ from S/2, 0 then only
ΦS44 6= 0, all other component of ΦSij = 0 and ΦAij = 0.
2) If e.g. s(2) = s(3) = S/2, s(4) = 0 then Φ
24 = Φ
34 = Φ
44 = Φ
24 = Φ
34 = 0, the
other components (ΦS22, Φ
33, Φ
23, Φ
23) are arbitrary.
6.3. Case ΦAij = 0
For ΦAij = 0 equations (46), (25) and (47) take the form
2(ΦSmiAij − ΦSjiAim +ΦAjm) + ΦSjiSim − ΦSmiSij = 0, (58)
− ΦSimAij +ΦSjiAim + 2ΦAjm +ΦSjiSim − ΦSmiSij = 0, (59)
3(ΦSijSjk +Φ
kjSji − SΦSki) = δik(2ΦSjlSjl − ΦS). (60)
In previous section 6.2 it was efficient to choose a frame in which Sij was
diagonal, however, now it is more efficient to choose a frame in which ΦSij is diagonal,
ΦSij = diag{p(2), p(3), p(4)}. Then we obtain from (58)–(60) the following set of
equations
(2p(m) + 2p(j) − 2Φ)Amj + Smj(p(j) − p(m)) = 0, (61)
(−p(m) − p(j) − 2Φ)Amj + Smj(p(j) − p(m)) = 0, (62)
3(p(i) + p(k))Sik = δik(3Sp(i) + 2Φ
jlSjl − ΦS). (63)
In the “generic” case p(i) + p(k) 6= 0, ∀i, k, from equation (63)
Sik = diag{s(2), s(3), s(4)}, s(i) =
2ΦSjlSjl − ΦS
6p(i)
. (64)
From (64) we get the relations (which can be solved to fix two of the si, if desired):
s(2)p(3)(p(2) + p(4)) = s(3)p(2)(p(3) + p(4)), (65)
s(2)p(4)(p(2) + p(3)) = s(4)p(2)(p(3) + p(4)), (66)
s(3)p(4)(p(2) + p(3)) = s(4)p(3)(p(2) + p(4)). (67)
Type D Einstein spacetimes in higher dimensions 18
Subtracting (61) and (62) we obtain (p(m) + p(j))Amj = 0 and thus in the “generic”
case p(m) + p(j) 6= 0, ∀m, j,
Amj = 0. (68)
Proposition 11 In five dimensions, the multiple WAND ℓ in a “generic” (p(i) +
p(j) 6= 0, ∀i, j) type D spacetime with ΦAik = 0 and ΦSik 6= 0, is geodetic and non-
twisting (Aij = 0) and Φ
ik and Sij can be diagonalized together.
There are some special cases to be treated:
- Case a) one p(i) = 0 and Φ 6= 0: without loss of generality we choose p(2) = 0, then
from (61)–(63) 2ΦSjlSjl − ΦS = 0, Sij = diag{0, S/2, S/2}, Amj = 0.
- Case b) only one p(i) 6= 0: without loss of generality we choose p(4) 6= 0,
p(2) = p(3) = 0 then from (61)–(63) 2Φ
jlSjl − ΦS = 0, s(2) + s(3) = s(4) = S/2
and S23 is arbirary, Aij vanishes.
- Case c) only one pair satisfies p(m)+ p(j) = 0, p(j) 6= 0: without loss of generality we
choose p(3) + p(4) = 0, i.e. p(2) = Φ, then the diagonal components of Sij still satisfy
(64), from (61)–(63) S34 is arbitrary and
(p(m) + p(j))Amj = 0, 2ΦAmj = (p(j) − p(m))Smj (69)
and thus if Φ 6= 0, A34 = −
S34. If Φ = 0, then S34 = 0 and Sij is diagonal and
A23 is arbitrary.
- Case d) two pairs satisfy p(m) + p(j) = 0: without loss of generality we choose
p(2) = p(3) = −p(4) = Φ. From (64) it follows that the diagonal components of Sij ,
s(2) and s(3), vanish and s(4) is arbitrary. Equation (63) implies that S24 and S34 are
arbitrary and from equation (69) we get A23 = 0, A24 = −S24, A34 = −S34. This
case is the non-geodetic case (48) from section 6.1.
6.4. An example - Myers-Perry black hole
As an illustrative example we give Sij , Aij , Φ
ij and Φ
ij for the five-dimensional
Myers-Perry black hole [9]
ds2 =
dx2 + ρ2dθ2 − dt2 + (x+ a2) sin2 θdφ2 + (x+ b2) cos2 θdψ2
(dt+ a sin2 θdφ+ b cos2 θdψ)2,
where
ρ2 = x+ a2 cos2 θ + b2 sin2 θ, ∆ = (x + a2)(x+ b2)− r02x.
Two (multiple, geodetic) WANDs (related by reflection symmetry) are given by [7]
(x+ a2)(x+ b2)
x+ a2
x+ b2
x∂x, (70)
n = α
(x + a2)(x+ b2)
x+ a2
x+ b2
, (71)
where we chose α = −∆/2ρ2x in order to satisfy the normalization condition ℓ ·n = 1.
Type D Einstein spacetimes in higher dimensions 19
As a basis of spacelike vectors we choose three eigenvectors of Sij
(2) =
(3) =
(−ab∂t + b∂φ + a∂ψ) , (72)
(4) =
(a2 − b2) sin θ cos θ∂t − a tan−1 θ∂φ + b tan θ∂ψ
with χ =
a2 cos2 θ + b2 sin2 θ. In this frame
Sij =
, Aij =
0 0 −1
0 0 0
1 0 0
 , (73)
ΦSij =
ρ2−2x
0 −1 0
0 0 ρ
, ΦAij =
0 0 1
0 0 0
−1 0 0
 .(74)
Notice that in the static (Schwarzschild) limit (a = 0 = b so that ρ2 = x) one has
Sij = δij/
x and σij = 0 = Aij , and indeed for Φij we recover the form discussed in
subsection 5.3 in the shearfree expanding case and in subsection 6.2 in the “generic”
non-twisting case (with p(2) = p(3) = p(4)).
7. Discussion
Let us finally outline main results presented in the paper.
In the first part of the paper (Sections 3 and 4) we study constraints on Weyl
types of a spacetime following from various assumptions on geometry. It turns out
that:
- Static spacetimes are of types G, Ii, D or conformally flat (Proposition 1).
- “Expanding” stationary spacetimes with appropriate reflection symmetry belong to
these types as well (Proposition 2).
- Warped spacetimes with one-dimensional Lorentzian factor are again of types G, Ii,
D and O (Proposition 3).
- Warped spacetimes with two-dimensional Lorentzian factor are necessarily of types D
or O (Proposition 4), in particular this also applies to spherically symmetric spacetimes
(Proposition 5).
These results may have useful practical applications in determining the algebraic
type of specific spacetimes (or at least in ruling out some types) just by “inspecting”
the given metric and without performing any calculations. This is particularly
important in higher dimensions, where it is more difficult to determine the algebraic
class of a given metric.
In the second part of the paper (sections 5 and 6) we study properties of type
D vacuum spacetimes in general (without assuming that the spacetime is static,
stationary or warped). In five dimensions a type D Weyl tensor is determined by a 3×3
matrix Φij with symmetric and antisymmetric parts being Φ
ij and Φ
ij , respectively.
Type D Einstein spacetimes in higher dimensions 20
In general in the non-twisting case Φ
is symmetric while in the twisting case
antisymmetric part ΦAij appears. In higher dimensions n > 5 the (n−2)×(n−2) matrix
Φij does not contain complete information about the Weyl tensor, but it still plays an
important role. The matrix Φij can also be used for further classification of type D or
II spacetimes, for example according to possible degeneracy of eigendirections of Φij .
Special classes are also cases with Φij being symmetric or vanishing (such examples
for n ≥ 7 are given in section 5.4) etc.
First we focused on the geodeticity of multiple WANDs in type D vacuum space-
times (these are always geodetic for n = 4). It was shown that:
- The multiple WAND in a vacuum spacetime is geodetic in the “generic” case, i.e. if
ΦAij 6= 0 or if all eigenvalues of ΦSij are distinct from minus the trace of Φij (Proposition
- It is also geodetic in the type D, shearfree case whenever Φij 6= 0 (Proposition 7).
- However, explicit examples of vacuum type D spacetimes with non-geodetic multiple
WAND in n ≥ 7 dimensions are given in section 5.4. This provides us with the first
examples of spacetimes “violating” the geodetic part of the Goldberg-Sachs theorem.
- In five dimensions multiple WANDs are also geodetic when ΦAij = 0 and Φ
ij 6= 0 has
a “generic” form (Proposition 11), special cases are discussed in section 6.3.
Properties of the matrix Φij , as well as the expansion and twist matrices Sij and
Aij have been also studied:
- For warped spacetimes with a one/two-dimensional Lorentzian factor (thus also for
static spacetimes) the antisymmetric part of Φij , Φ
ij , vanishes.
- In vacuum type D spacetimes admitting a shearfree expanding WAND, ΦSij is pro-
portional to δij and if Aij = 0 (this always holds in odd dimensions [10]) then Φ
ij = 0
and in the case with ΦSij 6= 0 also vice versa (Proposition 8).
- In five dimensions in a “generic” Einstein type D non-twisting spacetime, ΦAij van-
ishes and eigendirections of Φij coincide with those of Sij (Proposition 10).
- In five dimensions in a “generic” vacuum type D spacetime with symmetric Φij , the
multiple WAND ℓ is non-twisting and eigendirections of Φij and Sij coincide (Propo-
sition 11).
These results provide interesting connections between geometric properties of
principal null congruences and Weyl curvature. Hopefully, they can be also used for
constructing exact type D solutions with particular properties.
Acknowledgments
V.P. and A.P. acknowledge support from research plan No AV0Z10190503 and research
grant KJB100190702.
Appendix A. Optics of WANDs in Kerr-NUT-AdS spacetimes in
arbitrary dimension
As discussed in Sec. 3.3, the assumption about non-zero “expansion” in Proposition
2 is essential. In this appendix we study optical properties of WANDs in Kerr-
NUT-AdS spacetimes in arbitrary dimension [16] and show that the “expansion” in
these cases is always non-vanishing. These metrics are thus subject to Proposition
Type D Einstein spacetimes in higher dimensions 21
2. Indeed, it has been already shown in [15] that these spacetimes are of type
D. In addition, since the expansion is non-zero, we can expect that possible (still
stationary) generalizations of these spacetimes (such as charged black holes) with
appropriate reflection symmetry are of types G, Ii or D (see also footnotes on page
5). This appendix also extends our example of five-dimensional Myers-Perry given
in Section 6.4 to the case with NUT parameters and cosmological constant and to
arbitrary dimension. Note, however, that now we use convenient but physically less
“transparent” coordinates (x1, . . . , xm, ψ0, . . . ψm−1) in even dimensions n = 2m and
(x1, . . . , xm, ψ0, . . . ψm) in odd dimensions n = 2m + 1, introduced in [16]. In our
calculations, we employ results obtained in [15].
The metric of [16] for even and odd dimensions is, respectively,
n = 2m:
ds2 =
A(k)µ dψk
, (A.1)
n = 2m+ 1:
ds2 =
A(k)µ dψk
A(k)dψk
.(A.2)
The functions Qµ, A
µ , A
(k) and S̃ depend only on the coordinates (x1, . . . , xm) and
their explicit expressions are given in [15, 16].
Appendix A.1. Even dimensions, n = 2m
An orthonormal frame of 1-forms {e(A)} = {e(µ), e(m+µ)} with A = 1, 2, . . .2m,
µ = 1, 2, . . . ,m,
(µ) =
, e(m+µ) =
A(k)µ dψk
(A.3)
was intoduced in [15]. Denoting the duals of these forms with lower indices, let us
here also define a null frame of vectors ℓ, n, m(i) by
(e(m) + ie(2m)), n = −i
(e(m) − ie(2m)), (A.4)
with m(i) (i = 2 . . . n − 1) corresponding to e(µ), e(m+µ) (µ = 1 . . .m − 1 from now
on). One can show [15] that the null vectors ℓ, n are multiple WANDs of the type D
metric (A.1) and that they are geodetic (and affinely parametrized). Both WANDs
are complex in the coordinates used above, but note that they become in fact real
in “physical” coordinates since the metric (A.1) was obtained from a real Lorentzian
metric by a Wick rotation with xm = ir in [16] and Qm < 0 in the outer stationary
region, where ∂/∂r is spacelike. Thus
Qm = i
|Qm|, so that reintroducing r, both
vectors ie(2m) and e(m) become real (e
(r) = iδαm
Let us now express the matrix Lij (defined in section 2) in terms of Ricci rotation
coefficients, which can be easily obtained from the connection 1-forms given in [15]
Lij = ℓa;bm
(j) =
(e(m)a;b+ie(2m)a;b)m
(j) = −
(γmij+iγ
ij), (A.5)
Type D Einstein spacetimes in higher dimensions 22
γmµµ = γ
m+µ m+µ = −
x2m − x2µ
, (A.6)
γ2mm+µ µ = − γ2mµ m+µ = −
x2m − x2µ
, (A.7)
and with remaining Ricci rotation coefficients entering (A.5) being zero. Then
Sij =
r2+x2
0 δµν
r2+x2µ
 , Aij =
0 −δµν xµr2+x2
r2+x2
, (A.8)
where terms proportional to δµν symbolically represent a (m− 1)× (m− 1) diagonal
block. Note that Sij ∝ δij (that is, the shear is zero) iff n = 4. From this form of Sij
it follows that shear is non-zero for arbitrary even dimension n > 4 and expansion
r2 + x2µ
(A.9)
is non-zero in arbitrary even dimension n ≥ 4. Note indeed that the WANDs ℓ and
n are related by reflection symmetry, in agreement with the discussion in section 3.
The twist is also obviously non-zero for any n ≥ 4. Recall [16] finally that for n = 4
the metric (A.1) represents a subclass of the Plebański-Demiański family of type D
spacetimes with two expanding, twisting and non-shearing principal null directions [1].
Appendix A.2. Odd dimensions, n = 2m+ 1
In odd dimensions, in addition to (A.3) we define
(2m+1) =
A(k)dψk
. (A.10)
Then the null frame consists of ℓ, n given in (A.4), m(i) (i = 2 . . . n−1) corresponding
to e(µ), e(m+µ) (µ = 1 . . .m− 1), and e(2m+1). Again, the null vectors ℓ and n are
geodetic multiple WANDs of the type D metric (A.2) [15].
Now together with (A.7) we have
γm2m+1 2m+1 = −
, (A.11)
and thus
Sij =
r2+x2
0 δµν
r2+x2µ
0 0 1
, Aij =
0 −δµν xµr2+x2
r2+x2
0 0 0
.(A.12)
Shear, expansion and twist are thus non-zero for arbitrary odd dimension n > 4.
Type D Einstein spacetimes in higher dimensions 23
References
[1] H. Stephani, D. Kramer, M. MacCallum, C. Hoenselaers, and E. Herlt. Exact Solutions of
Einstein’s Field Equations. Cambridge University Press, Cambridge, second edition, 2003.
[2] A. Coley, R. Milson, V. Pravda, and A. Pravdová. Classification of the Weyl tensor in higher
dimensions. Class. Quantum Grav., 21:L35–L41, 2004.
[3] R. Milson, A. Coley, V. Pravda, and A. Pravdová. Alignment and algebraically special tensors
in Lorentzian geometry. Int. J. Geom. Meth. Mod. Phys., 2:41–61, 2005.
[4] A. Coley and N. Pelavas. Algebraic classification of higher dimensional spacetimes. Gen. Rel.
Grav., 38:445–461, 2006.
[5] M. Ortaggio and V. Pravda. Black rings with a small electric charge: gyromagnetic ratios and
algebraic alignment. JHEP, 12:054, 2006 [gr-qc/0609049]
[6] D. Ida and Y. Uchida. Stationary Einstein-Maxwell fields in arbitrary dimensions. Phys. Rev.
D, 68:104014, 2003.
[7] V. P. Frolov and D. Stojković. Particle and light motion in a space-time of a five-dimensional
rotating black hole. Phys. Rev. D, 68:064011, 2003.
[8] V. Pravda, A. Pravdová, A. Coley, and R. Milson. Bianchi identities in higher dimensions.
Class. Quantum Grav., 21:2873–2897, 2004.
[9] R. C. Myers and M. J. Perry. Black holes in higher dimensional space-times. Ann. Phys. (N.Y.),
172:304–347, 1986.
[10] M. Ortaggio, V. Pravda, and A. Pravdová. Ricci identities in higher dimensions. Class.
Quantum Grav., 24:1657–1664, 2007.
[11] V. Pravda and A. Pravdová. WANDs of the black ring. Gen. Rel. Grav., 37:1277–1287, 2005.
[12] A. Z. Petrov. Einstein Spaces. Pergamon Press, Oxford, translation of the 1961 Russian edition,
1969.
[13] W. Kinnersley. Type D vacuum metrics. J. Math. Phys., 10:1195–1203, 1969.
[14] V. Pravda, A. Pravdová, and M. Ortaggio, in preparation.
[15] N. Hamamoto, T. Houri, T. Oota, and Y. Yasui. Kerr-NUT-de Sitter curvature in all dimensions.
J. Phys. A, 40:F177–F184, 2007.
[16] W. Chen, H. Lü, and C. N. Pope. General Kerr-NUT-AdS metrics in all dimensions. Class.
Quantum Grav., 23:5323–5340, 2006.
[17] R. Emparan and H. S. Reall. A rotating black ring solution in five dimensions. Phys. Rev.
Lett., 88:101101, 2002.
[18] T. Harmark. Stationary and axisymmetric solutions of higher-dimensional general relativity.
Phys. Rev. D, 70:124002, 2004.
[19] H. Elvang and P. Figueras. Black saturn, hep-th/0701035.
[20] A. A. Pomeransky and R. A. Sen’kov. Black ring with two angular momenta, hep-th/0612005.
[21] H. Iguchi and T. Mishima. Black diring and infinite nonuniqueness. Phys. Rev. D, 75:064018,
2007.
[22] B. Kleihaus, J. Kunz, and E. Radu. Rotating nonuniform black string solutions,
hep-th/0702053.
[23] R. Emparan and H. S. Reall. Black rings. Class. Quantum Grav., 23:R169–R197, 2006.
[24] J. Podolský and K. Veselý. New examples of sandwich gravitational waves and their impulsive
limit. Czech. J. Phys., 48:871–878, 1998.
[25] A. Coley, R. Milson, N. Pelavas, V. Pravda, A. Pravdová, and R. Zalaletdinov. Generalizations
of pp –wave spacetimes in higher dimensions. Phys. Rev. D, 67:104020, 2003.
[26] J. Lewandowski and T. Pawlowski. Quasi-local rotating black holes in higher dimension:
geometry. Class. Quantum Grav., 22:1573–1598, 2005.
[27] S. Hollands, A. Ishibashi, and R. M. Wald. A higher dimensional stationary rotating black hole
must be axisymmetric. Commun. Math. Phys., 271:699–722, 2007.
[28] F. A. Ficken. The Riemannian and affine differential geometry of product-spaces. Ann. Math.,
40:892–913, 1939.
[29] P. G. O. Freund and M. A. Rubin. Dynamics of dimensional reduction. Phys. Lett. B, 97:233–
235, 1980.
[30] V. Cardoso, Ó. J. C. Dias, and J. P. S. Lemos. Nariai, Bertotti-Robinson and anti-Nariai
solutions in higher dimensions. Phys. Rev. D, 70:024002, 2004.
[31] M. P. M. Ramos and E. G. L. R. Vaz. Double warped spacetimes. J. Math. Phys., 44:4839–4865,
2003.
[32] J. Carot and J. da Costa. On the geometry of warped spacetimes. Class. Quantum Grav.,
10:461–482, 1993.
[33] J. Plebański and J. Stachel. Einstein tensor and spherical symmetry. J. Math. Phys., 9:269–283,
http://arxiv.org/abs/gr-qc/0609049
http://arxiv.org/abs/hep-th/0701035
http://arxiv.org/abs/hep-th/0612005
http://arxiv.org/abs/hep-th/0702053
Type D Einstein spacetimes in higher dimensions 24
1979.
[34] G. T. Horowitz and S. F. Ross. Properties of naked black holes. Phys. Rev. D, 57:1098–1107,
1998.
[35] J. Podolský and M. Ortaggio. Robinson-Trautman spacetimes in higher dimensions. Class.
Quantum Grav., 23:5785–5797, 2006.
[36] M. Ortaggio. Higher dimensional spacetimes with a geodesic, shearfree, twistfree and expanding
null congruence, gr-qc/0701036.
http://arxiv.org/abs/gr-qc/0701036
	Introduction
	Preliminaries
	Static and stationary spacetimes 
	Static spacetimes
	Stationary spacetimes 
	Remarks and ``limitations'' of the results
	Direct/warped product spacetimes 
	Weyl tensor
	``Factorized'' geodetic null vector fields
	Type D Einstein spacetimes in higher dimensions
	Algebraic conditions following from the Bianchi equations
	WANDs in ``generic'' vacuum type D and II spacetimes in arbitrary dimension are geodetic
	Vacuum type D spacetimes with a ``shearfree'' WAND
	An example of type D vacuum spacetimes with a non-geodetic WAND 
	Type D vacuum spacetimes in five dimensions
	Geodeticity of multiple WANDs
	``Non-twisting'' case - Aij=0
	Case ijA=0
	An example - Myers-Perry black hole
	Discussion
	Optics of WANDs in Kerr-NUT-AdS spacetimes in arbitrary dimension
	Even dimensions, n=2 m 
	Odd dimensions, n=2 m + 1
ABSTRACT
  We show that all static spacetimes in higher dimensions are of Weyl types G,
I_i, D or O. This applies also to stationary spacetimes if additional
conditions are fulfilled, as for most known black hole/ring solutions. (The
conclusions change when the Killing generator becomes null, such as at Killing
horizons.) Next we demonstrate that the same Weyl types characterize warped
product spacetimes with a one-dimensional Lorentzian (timelike) factor, whereas
warped spacetimes with a two-dimensional Lorentzian factor are restricted to
the types D or O. By exploring the Bianchi identities, we then analyze the
simplest non-trivial case from the above classes - type D vacuum spacetimes,
possibly with a cosmological constant, dropping, however, the assumptions that
the spacetime is static, stationary or warped. It is shown that for ``generic''
type D vacuum spacetimes the corresponding principal null directions are
geodetic in any dimension (this applies also to type II spacetimes). For n>=5,
however, there may exist particular cases of type D spacetimes which admit
non-geodetic multiple principal null directions and we present such examples in
any n>=7. Further studies are restricted to five dimensions, where the type D
Weyl tensor is described by a 3x3 matrix \Phi_{ij}. In the case with
``twistfree'' (A_{ij}=0) principal null geodesics we show that in a ``generic''
case \Phi_{ij} is symmetric and eigenvectors of \Phi_{ij} coincide with those
of the expansion matrix S_{ij}, providing us with three preferred spacelike
directions of the spacetime. Similar results are also obtained when relaxing
the twistfree condition and assuming instead that \Phi_{ij} is symmetric. The
n=5 Myers-Perry black hole and Kerr-NUT-AdS metrics in arbitrary dimension are
briefly studied as specific examples of type D vacuum spacetime.

<|endoftext|><|startoftext|>
Introduction
Almost all the elementary fermions have spin-1
, which can be naturally described by
spinors, so today spinors and spinor representations play a more and more important
role in mathematical and theoretical physics. Noticing the limitations of the linear field
equation, many physicists such as H. Weyl, W. Heisenberg, once proposed the nonlinear
spinor equations[1, 2, 3, 4, 5, 6] to construct a unified field theory for elementary
particles. However they have not gotten many definite results due to the mathematical
difficulties. The rigorous solutions for some simple dark nonlinear spinor models were
∗email: yqgu@luody.com.cn
†email: dqli@fudan.edu.cn
http://arxiv.org/abs/0704.0436v2
obtained in [7, 8, 9, 10], and we found it can provide negative pressure to guarantee a
singularity-free and accelerating expanding universe[11, 12].
The theoretical proof about the existence of solitons was investigated in [13, 14,
15, 16, 17, 18]. The symmetries and many beautiful conditional exact solutions of the
nonlinear spinor, vector and scalar differential equations are collected in [19]. However
lots of these exact solutions seem to be non-physical.
The spinor with its own electromagnetic potential was studied in [20, 21, 22, 23, 24],
it was disclosed that the nonlinear spinor equations have particle like solution with
anomalous magneton, and imply the exact classical mechanics and quantum mechanics
for many-body[25, 26, 27].
In this paper we derive a simplified form of the eigen equation with general meaning
for nonlinear Dirac equation, and then give a scheme to solve the solution. Denote the
Minkowski metric by ηµν = diag[1,−1,−1,−1], Pauli matrices by
~σ = (σj) =
. (1.1)
Define 4× 4 Hermitian matrices as follows
, γ =
, β =
0 −iI
. (1.2)
In this paper, we adopt the Hermitian matrices (1.2) instead of Dirac matrices γµ,
because this form is more convenient for calculation.
For the system of a nonlinear spinor field φ in the 4−d potential Aµ, the Lagrangian
describing the motion is generally given by
L = φ+[αµ(~i∂µ − eAµ)− µcγ]φ+ F (γ̌, β̌), (1.3)
where µ > 0 is a constant mass, F are the nonlinear coupling potential, which is usually
the even polynomial of γ̌, β̌, and γ̌, β̌ are the quadratic scalars of φ defined by
γ̌ = φ+γφ, β̌ = φ+βφ. (1.4)
We can check γ̌ is a true-scalar, but β̌ a pseudo-scalar.
The variation of (1.3) with respect to φ+ gives the dynamic equation
αµ(~i∂µ − eAµ)φ = (µcγ − Fγγ − Fββ)φ, (1.5)
where Fγ =
, Fβ =
. In the Hamiltonian form we have
~i∂tφ = Ĥφ, Ĥ ≡ ~α · (−~i∇− e ~A) + eA0 + (µc− Fγ)γ − Fββ. (1.6)
In this paper we denote ~A = (A1, A2, A3) to be the spatial part of a contravariant
vector Aµ.
It is easy to check that the current conservation law holds ∂µq
µ = 0, so we can take
the normalizing condition as follows
|φ|2d3x = 1. (1.7)
The 4− d potential produced by spinor φ itself takes the following form
µ = eα̌µ = eφ+αµφ. (1.8)
2 Simplification of the Equation
Consider a spinor keeping motionless in an external magnetic field ~B = (0, 0, B). since
the scale of the elementary particle is very small, we take external field B as a constant.
By ∇× ~Aext = ~B, we have the general solution for external vector potential
~Aext =
B(−y, x, 0) +∇Φ, (2.1)
where Φ(~x) is any given smooth function.
In the spherical coordinate system (r, θ, ϕ), we have
~σ · ∇ = σr∂r + (σθ∂θ + σϕ∂ϕ), (2.2)
where (σr, σθ, σϕ) is given by
cos θ sin θe−ϕi
sin θeϕi − cos θ
− sin θ cos θe−ϕi
cos θeϕi sin θ
r sin θ
0 −ie−ϕi
ieϕi 0
. (2.3)
Let Ĵ be the angular momentum operator for the spinor field
Ĵ = ~r × (−~i∇) +
~~S, ~S = diag(~σ, ~σ), (2.4)
then any eigenfunction of Ĵ3 = −~i∂ϕ +
~S3 takes the following form
φ = (u1, u2e
,−iv1,−iv2e
ϕi)T exp
(2.5)
with (κ = 0,±1,±2, · · · ), where uk, vk(k = 1, 2) are functions of r, θ but independent
on ϕ and t. In this paper the index T stands for transposed matrix.
For any spin-1
particle, it has a pole axis. If we set the pole axis as coordinate
x3 = z, then Ĵ3 is commutative with the nonlinear Hamilton operator (1.6) by a U(1)
gauge transformation for spinor as eΦiφ, which removes the uncertain function from
external vector potential (2.1), thereby we have
~Aext =
B(−y, x, 0) =
Br sin θ(− sinϕ, cosϕ, 0). (2.6)
For the above symmetric form of vector potential (2.6), substituting (2.5) into (1.6) we
can check that all functions uk, vk can take real number. For the vector potential pro-
duced by φ itself, we will find below it also takes the form of (2.6). This simplification
of φ may be the essence of the gauge symmetry.
In what follows, we set ~ = c = 1 as units for convenience. Making variable
transformation
u = u1(r, θ) + u2(r, θ)i, v = v1(r, θ) + v2(r, θ)i, (2.7)
then we have {
α̌0 = |u|
2 + |v|2, α̌ = (ūv − uv̄)i,
γ̌ = |u|2 − |v|2, β̌ = ūv + uv̄,
(2.8)
with ~̌α = α̌(− sinϕ, cosϕ, 0). Substituting (2.7) and (2.8) into (1.3) we get the La-
grangian of the eigen states as follows
L = Re
ū(∂r +
∂θ)v̄ − v̄(∂r +
∂θ)ū
r sin θ
(κ+ 1
)(ūv − uv̄)
+(m− eA0)(|u|
2 + |v|2)− ie(ūv − uv̄)A− µ(|u|2 − |v|2) + F,
(2.9)
where
(~r × ~A) · ez = (cosϕAy − sinϕAx) = A(r, sin θ) (2.10)
including both external and inner vector potential. By variation with respect to ū, v̄,
we get an elegant equation with double-helix structure
eθi(∂r +
∂θ)ū =
r sin θ
[(κ+ 1
)u− 1
ū] + (µ+m− eA0 − Fγ)v + Fβu+ ieAu,
eθi(∂r +
∂θ)v̄ =
r sin θ
[(κ + 1
)v − 1
v̄ ] + (µ−m+ eA0 − Fγ)u− Fβv + ieAv.
(2.11)
By (2.11) we find that, κ = 0 corresponds to spin 1
and κ = −1 corresponds to spin
. Generally the coordinates r and θ can not be separable for nonlinear equation.
The energy functional for (2.11) is given by
E = 2π
r2 sin θdrdθ
ū(∂r +
∂θ)v̄ − v̄(∂r +
∂θ)ū
r sin θ
(κ+ 1
) + eA
i(ūv − uv̄) + eA0(|u|
2 + |v|2) + µ(|u|2 − |v|2)
(2.12)
The eigen solution of (2.11) is just the extreme point of E under the constraint of
normalizing condition
(|u|2 + |v|2)r2 sin θdrdθ = 1. (2.13)
(2.13) is also the quantizing condition of the energy spectrum[9, 10].
3 A Scheme for Solving Solution
For general potential Aµ and F , the analytic solution of (2.11) u and v can not be
solved. However they can be conveniently expressed by Fourier series of θ, and the
equations of the radial functions can be derived by variation principle. For any given
integer N ≥ 0, define 2N + 1 vectors
Γ(θ) = (e−2Nθi, e−2(N−1)θi, · · · , e2(N−1)θi, e2Nθi), (3.1)
U(r) = (U−N(r), U−(N−1)(r), · · · , U(N−1)(r), UN(r))
, (3.2)
V (r) = (V−N(r), V−(N−1)(r), · · · , V(N−1)(r), VN(r))
T . (3.3)
The eigen solution of (2.11) with even parity must take the form
u = Γ · U =
2nθi, v = Γ̄ · V eθi =
(−2n+1)θi, (3.4)
and the eigen solution with odd parity takes
u = Γ · Ueθi =
(2n+1)θi, v = Γ̄ · V =
−2nθi. (3.5)
In what follows we only consider (3.4). For (3.5) we have similar results.
For the cases κ 6= 0 and κ 6= −1, i.e. for the cases with nonzero magnetic quantum
number, the solution must have consistent conditions at θ = 0, π as
u(r, 0) = u(r, π) =
Un(r) ≡ 0, v(r, 0) = v(r, π) =
Vn(r) ≡ 0. (3.6)
For this case, (3.4) minus (3.6) we get
2nθi − 1), v =
−2nθi − 1)eθi. (3.7)
By the form (3.7) we have
e2nθi − 1
2i sin θ
= (1 + e2θi + e4θi + · · ·+ e2(n−1)θi)eθi, (3.8)
which removes the singularity of (2.11) at θ = 0, π.
For the spin 1
state, i.e. for the case κ = 0, we can check from (2.11) that the
solution U, V are all real functions. But for the spin −1
state, i.e. for the case κ = −1,
the solution U, V are all pure imaginary functions. In what follows we only consider
the real case. For the covariant quadratic forms (2.8), by (3.3) we have
α̌0 = U
TPU + V T P̄ V, α̌ = UT (Q+ −Q)V i,
γ̌ = UTPU − V T P̄V, β̌ = UT (Q+ +Q)V,
(3.9)
where P = Γ+Γ and Q = ΓTΓeθi are (2N +1)× (2N +1) matrices with components as
Pm,n = exp[2(n−m)θi], (−N ≤ n,m ≤ N),
Qm,n = exp[(2(n+m) + 1)θi].
(3.10)
By (3.9) and (3.10) we have
α̌0 =
n,m=−N
(UnUm + VnVm)(e
−2(n−m)θi + e2(n−m)θi), (3.11)
n,m=−N
(UnUm − VnVm)(e
−2(n−m)θi + e2(n−m)θi), (3.12)
n,m=−N
UnVm(e
−2(n−m)θi − e2(n−m−1)θi)eθii, (3.13)
n,m=−N
UnVm(e
−2(n−m)θi + e2(n−m−1)θi)eθi. (3.14)
The dynamic equation of the potential A is given by
−∆A = eα̌ = e[a0(r) sin θ + a1(r) sin 3θ + · · · ],
= e[a0(r)(e
−2θi − 1) + a1(r)(e
−4θi − e2θi) + · · · ]eθii. (3.15)
Substituting (3.4), (3.7), (3.8) and (3.11)∼(3.15) into (2.11) and directly comparing
the coefficients of all e2nθi, we can easily get a truncated ordinary differential equation
of U(r), V (r). However the convergence this cut-off equation to the original solution
needs proof. A more credible method to get the efficient equation of U(r), V (r) is
via variational principle. The variational equation can be obtained by the following
procedure. Define operators
T̂u =
ΓT (θ)e−θi sin θdθ, T̂v =
Γ+(θ) sin θdθ, (3.16)
then T̂u left multiplying the first equation of (2.11) and T̂v left multiplying the second
give the variational equation of U(r), V (r). The coefficient matrix of U ′(r) and V ′(r)
is the same positive definite symmetric matrix with components
Mn,m =
Pn,m sin θdθ =
1− 4(n−m)2
. (3.17)
The other coefficient matrices can also be similarly obtained.
The convergence of expansion (3.4) and the consistent condition (3.6) seem to
have closely relation with the structure of nonlinear potential F (γ̌, β̌), which reflects
the important properties of the elementary particles such as the exclusion princi-
ple. The above procedure is valid for extensive models. The neat and elegant re-
sults are profoundly rooted in the quaternionic structure of the physical variables and
spacetime[28, 29, 30], so the 3 + 1 dimensional Universe is a miraculous masterpiece
with unique feature.
References
[1] D. Iuanenko, Sov. Phys. 13, 141-149 (1938)
[2] H. Weyl, Phys. Rev. 77, 699-701 (1950)
[3] W. Heisenberg, Physica 19, 897-908 (1953)
[4] K. Johnson, Phys. Lett. 78B, 259-262 (1978)
[5] P. Mathieu, Phys. Rev. D29, 2879-2883 (1984)
[6] A. F. Ranada, Classical nonlinear Dirac field models of extended particles,
In:Quantum theory, groups, fields and particles, edited by A.O.Barut, Amster-
dam, Reidel, 1983
[7] R. Finkelsten, et al, Phys. Rev. 83(2), 326-332(1951)
[8] M. Soler, Phys. Rev. D1(10), 2766-2767(1970)
[9] Y. Q. Gu, Some Properties of the Spinor Soliton, Adv in Appl. Clif. Alg. 8(1),
17-29(1998), http://www.clifford-algebras.org/v8/81/gu81.pdf
[10] Y. Q. Gu, Characteristic Functions and Typical Values of the Nonlinear Dark
Spinor, arXiv:hep-th/0611210
[11] Y. Q. Gu, A Cosmological Model with Dark Spinor Source, arXiv:gr-qc/0610147
[12] Y. Q. Gu, Accelerating Expansion of the Universe with Nonlinear Spinors,
arXiv:gr-qc/0612176
[13] T. Cazenave, L. Vazquez, Comm. Math. Phys.105, 35-47 (1986)
[14] F. Merle, J. Diff. Eq. 74, 50-68 (1988)
[15] M. Balabane, et al, Comm. Math. Phys. 119, 153-176 (1988)
[16] M. Balabane, et al, Comm. Math. Phys. 133, 53-74 (1990)
[17] M. J. Esteban, E. Sere, C. R. Acad. Sci. Pavis, t. 319, Serie I, 1213-1218 (1994)
[18] M. J. Esteban, E. Sere, Comm. Math. Phys. 171, 323-350(1995)
[19] Wilhelm Fushchych, Renat Zhdanov, SYMMETRIES AND EXACT SOLUTIONS
OF NONLINEAR DIRAC EQUATIONS, Kyiv Mathematical Ukraina Publisher,
Ukraine (1997), arXiv:math-ph/0609052
[20] A. Garrett Lisi, A Solution of the Maxwell-Dirac Equations in 3+1 Dimensions,
arXiv:hep-th/9410244
[21] M. Wakano, Prog. Theor. Phys. 35(6), 1117-1141(1996)
[22] Y. Q. Gu, Spinor Soliton with Electromagnetic Field, Adv in Appl. Clif. Alg.
V8(2), 271-282(1998), http://www.clifford-algebras.org/v8/82/gu82.pdf
[23] A. O. Barut and J. Kraus, Found. Physics 13, 189 (1983)
[24] A. O. Barut and J. F. Van Huele, Phys. Rev. A 32, 3187(1985)
[25] Y. Q. Gu, The Electromagnetic Potential Among Nonrelativistic Electrons, Adv
in Appl. Clif. Alg. V9(1), 55-60(1999),
http://www.clifford-algebras.org/v9/91/gu91.pdf
[26] Y. Q. Gu, New Approach to N-body Relativistic Quantum Mechanics,
arXiv:hep-th/0610153
[27] Y. Q. Gu, Mass Energy Relation of the Nonlinear Spinor, arXiv:hep-th/0701030
[28] Y. Q. Gu, A Canonical Form For Relativistic Dynamic Equation, Adv in Appl.
Clif. Alg. V7(1), 13-24(1997),
http://www.clifford-algebras.org/v7/v71/GU71.pdf, arXiv:hep-th/0610189
[29] Y. Q. Gu, Green Functions of Relativistic Field Equations, arXiv:hep-th/0612214
[30] A. Gsponer, J.-P. Hurni, quaternions in mathematical physics
(1): Alphabetical bibliography, arXiv:math-ph/0510059;
(2): Analytical bibliography, arXiv:math-ph/0511092.
ABSTRACT
  How to effectively solve the eigen solutions of the nonlinear spinor field
equation coupling with some other interaction fields is important to understand
the behavior of the elementary particles. In this paper, we derive a simplified
form of the eigen equation of the nonlinear spinor, and then propose a scheme
to solve their numerical solutions. This simplified equation has elegant and
neat structure, which is more convenient for both theoretical analysis and
numerical computation.

<|endoftext|><|startoftext|>
Introduction
	Experimental Method
	Selection of Ds Candidates
	Signal Reconstruction
	The Expected MM2 Spectrum
	MM2 Spectra in Data
	Background Evaluations
	Leptonic Branching Fractions
	Checks of the Method
	The Decay Constant
	Conclusions
	Acknowledgments
	References
ABSTRACT
  We examine e+e- --> Ds- Ds*+ and Ds*- Ds+ interactions at 4170 MeV using the
CLEO-c detector in order to measure the decay constant fDs+. We use the Ds+ -->
l+ nu channel, where the l+ designates either a mu+ or a tau+, when the tau+
--> pi+ nu. Analyzing both modes independently, we determine B(Ds+ --> mu+ nu)
= (0.594 +- 0.066 +- 0.031)%, and B(Ds+ --> tau+ nu) = (8.0 +- 1.3 +- 0.4)%. We
also analyze them simultaneously to find an effective value of B{eff}(Ds+ -->
mu+ nu) = (0.638 +- 0.059 +- 0.033)% and extract fDs = (274 +- 13 +- 7) MeV.
Combining with our previous determination of B(D+ -> mu+ nu), we also find the
ratio fDs/fD+ = 1.23 +- 0.11 +- 0.04. We compare to current theoretical
estimates. Finally, we find B(Ds+ --> e+ nu) < 1.3 x10^{-4} at 90% confidence
level.

<|endoftext|><|startoftext|>
RUNHETC-04-2007
Broadening the Higgs Boson with Right-Handed Neutrinos
and a Higher Dimension Operator at the Electroweak Scale
Michael L. Graesser
1Department of Physics and NHETC, Rutgers University, Piscataway, NJ 08540
Abstract
The existence of certain TeV suppressed higher-dimension operators may open up
new decay channels for the Higgs boson to decay into lighter right-handed neutrinos.
These channels may dominate over all other channels if the Higgs boson is light. For a
Higgs boson mass larger than 2mW the new decays are subdominant yet still of interest.
The right-handed neutrinos have macroscopic decay lengths and decay mostly into final
states containing leptons and quarks. A distinguishing collider signature of this scenario
is a pair of displaced vertices violating lepton number. A general operator analysis is
performed using the minimal flavor violation hypothesis to illustrate that these novel
decay processes can occur while remaining consistent with experimental constraints
on lepton number violating processes. In this context the question of whether these
new decay modes dominate is found to depend crucially on the approximate flavor
symmetries of the right-handed neutrinos.
http://arxiv.org/abs/0704.0438v2
1 Motivation
Neutrino interactions with other Standard Model particles are well-described, forming a
cornerstone of the Standard Model itself. But the origin of their masses remain unknown.
If their masses are generated by a local quantum field theory, then other degrees of freedom
must exist. These particles or “right-handed neutrinos” are either the missing Dirac part-
ners of the neutrinos, or are much heavier than the O(eV) mass scale of the neutrinos and
create a “see-saw” mechanism. Which of these scenarios is realized in Nature is dependent
on the unknown scale of the Majorana mass parameters of the right-handed neutrinos.
The existence of right-handed neutrinos may have other physical consequences, de-
pending on the size of their Majorana masses. Right-handed neutrinos with Majorana
masses violate overall lepton number, which may have consequences for the origin of the
observed baryon asymmetry. Leptogenesis can occur from the out of equilibrium decay of a
right-handed neutrino with mass larger than the TeV scale [1]. Interestingly, right-handed
neutrinos with masses below the electroweak scale may also lead to baryogenesis [2].
But if right-handed neutrinos exist, where did their mass come from? The Majorana
mass parameters are not protected by the gauge invariance of the Standard Model, so an
understanding of the origin of their mass scale requires additional physics. The see-saw
mechanism with order unity Yukawa couplings prefers a large scale, of order 1013−14 GeV.
But in this case a new, intermediate scale must be postulated in addition to the four mass
scales already observed in Nature. On the other hand, such a large scale might occur
naturally within the context of a Grand Unified Theory.
Here I explore the consequences of assuming that the Majorana neutrino mass scale is
generated at the electroweak scale 1. To then obtain the correct mass scale for the left-
handed neutrinos from the “see-saw” mechanism, the neutrino Yukawa couplings must be
tiny, but not unreasonably small, since they would be comparable to the electron Yukawa
coupling. It might be natural for Majorana masses much lighter than the Planck or Grand
Unified scales to occur in specific Randall-Sundrum type models [7] or their CFT dual
descriptions by the AdS/CFT correpsondance [8]. But as the intent of this paper is to
be as model-independent as possible, I will instead assume that it is possible to engineer
electroweak scale Majorana masses and use effective field theory to describe the low-energy
theory of the Higgs boson and the right-handed and left-handed (electroweak) neutrinos.
I will return to question of model-building in the concluding section and provide a few
additional comments.
With the assumption of a common dynamics generating both the Higgs and right-
handed neutrino mass scales, one may then expect strong interactions between these par-
ticles, in the form of higher dimension operators. However since generic flavor-violating
higher dimension operators involving Standard Model fields and suppressed only by the
TeV are excluded, I will use throughout the minimal flavor violation hypothesis [9, 10, 11] in
1For previous work on the phenomenology of electroweak scale right-handed neutrinos, see [3, 4, 5, 6].
None of these authors consider the effects of TeV-scale suppressed higher dimension operators.
order to suppress these operators. The purpose of this paper is to show that the existence
of operators involving the Higgs boson and the right-handed neutrinos can significantly
modify the phenomenology of the Higgs boson by opening a new channel for it to decay
into right-handed neutrinos. I show that the right-handed neutrinos are long-lived and
generically have macroscopic decay lengths. For reasonable values of parameters their de-
cay lengths are anywhere from fractions of a millimeter to tens of metres or longer if one
of the left-handed neutrinos is extremely light or massless. As they decay predominantly
into a quark pair and a charged lepton, a signature for this scenario at a collider would
be the observation of two highly displaced vertices, each producing particles of this type.
Further, by studying these decays all the CP -preserving parameters of the right-handed
and left-handed neutrinos interactions could be measured, at least in principle.
A number of scenarios for new physics at the electroweak scale predict long-lived parti-
cles with striking collider features. Displaced vertices due to long-lived neutral particles or
kinks appearing in charged tracks are predicted to occur in models of low energy gauge me-
diation [12]. More recently models with a hidden sector super-Yang Mills coupled weakly
through a Z ′ or by mass mixing with the Higgs boson can produce dramatic signatures with
displaced jets or leptons and events with high multiplicity [13]. A distinguishing feature of
the Higgs boson decay described here is the presence of two displaced vertices where the
particles produced at each secondary vertex violate overall lepton number.
That new light states or operators at the electroweak scale can drastically modify Higgs
boson physics has also been recently emphasized. Larger neutrino couplings occur in a
model with nearly degenerate right-handed neutrino masses and vanishing tree-level active
neutrino masses, that are then generated radiatively at one-loop [3]. Decays of the Higgs
boson into a right-handed and left-handed neutrino may then dominate over decays to
bottom quarks if the right-handed neutrinos are heavy enough. Models of supersymmetry
having pseudoscalars lighter than the neutral Higgs scalar may have exotic decay processes
for the Higgs boson that can significantly affect limits and searches [14]. Supersymmetry
without R-parity can have striking new signatures of the Higgs boson [15]. Two common
features between that reference and the work presented here is that the Higgs boson decays
into a 6-body final state and may be discovered through displaced vertices, although the
signatures differ.
Interesting phenomena can also occur without supersymmetry. Adding to the Standard
Model higher dimension operators involving only Standard Model fields can modify the
Higgs boson production cross-section and branching fractions [16]. Such an effect can
occur in models with additional colored scalars coupled to top quarks [17].
The outline of the paper is the following. Section 2 discusses the new decay of the Higgs
boson into right-handed neutrinos. Section 3 then discusses various naturalness issues
that arise in connection with the relevant higher dimension operator. Section 4 discusses
predictions for the coefficients of the new operator within the framework of minimal flavor
violation [9, 10, 11]. It is found that the predicted size of the higher dimension operators
depends crucially on the approximate flavor symmetries of the right-handed neutrinos. How
this affects the branching ratio for the Higgs boson to decay into right-handed neutrinos
is then discussed. Section 5 computes the lifetime of the right-handed neutrinos assuming
minimal flavor violation and discusses its dependence on neutrino mass parameters and
mixing angles. I conclude in Section 6 with some comments on model-building issues and
summarize results.
2 Higgs Boson Decay
The renormalizable Lagrangian describing interactions between the Higgs doubletH (1,2)−1/2,
the lepton SU(2)W doublets Li (1,2)−1/2, and three right-handed neutrinos NI (1,1)0 is
given by
mRNN + λνH̃NL+ λlHLe
c (1)
where flavor indices have been suppressed and H̃ ≡ iτ2H∗ where H has a vacuum ex-
pectation value (vev) 〈H〉 = v/
2 and v ≃ 247 GeV. Two-component notation is used
throughout this note. We can choose a basis where the Majorana mass matrix mR is di-
agonal and real with elements MI . In general they will be non-universal. It will also be
convenient to define the 3× 3 Dirac neutrino mass mD ≡ λνv/
2. The standard see-saw
mechanism introduces mass mixing between the right-handed and left-handed neutrinos
which leads to the active neutrino mass matrix
λTν m
R λνv
2 = mTDm
R mD . (2)
This is diagonalized by the PMNS matrix UPMNS [18] to obtain the physical masses mI
of the active neutrinos. At leading order in the Dirac masses the mass mixing between the
left-handed neutrinos νI and right-handed neutrinos NJ is given by
VIJ = [m
R ]IJ = [m
D]IJM
J (3)
and are important for the phenomenology of the right-handed neutrinos. For generic Dirac
and Majorana neutrino masses no simple relation exists between the physical masses, left-
right mixing angles and the PMNS matrix. An estimate for the neutrino couplings is
fI ≃ 7× 10−7
0.5eV
30GeV
. (4)
where λν = URfUL has been expressed in terms of two unitary matrices UL/R and a
diagonal matrix f with elements fI . In general UL 6= UPMNS. Similarly, an approximate
relation for the left-right mixing angles is
VIJ ≃
[UPMNS ]JI = 4× 10−6
0.5eV
30GeV
[UPMNS]JI (5)
which is valid for approximately universal right-handed neutrino masses MI ≃ M and
UR ≃ 1. I note that these formulae for the masses and mixing angles are exact in the
limit of universal Majorana masses and no CP violation in the Dirac masses [11]. For
these fiducial values of the parameters no limits exist from the neutrinoless double β de-
cay experiments or collider searches [5] because the mixing angles are too tiny. No limits
from cosmology exist either since the right-handed neutrinos decay before big bang nucle-
osynthesis if MI∼>O(GeV), which will be assumed throughout (see Section 5 for the decay
length of the right-handed neutrinos).
If a right-handed neutrino is lighter than the Higgs boson, MI < mh, where mh is the
mass of the Higgs boson, then in principle there may be new decay channels
h → NI +X (6)
where X may be a Standard Model particle or another right-handed neutrino (in the latter
case MI +MJ < mh). For instance, from the neutrino coupling one has h → NIνL. This
decay is irrelevant, however, for practical purposes since the rate is too small.
But if it is assumed that at the TeV scale there are new dynamics responsible for
generating both the Higgs boson mass and the right-handed neutrino masses, then higher-
dimension operators involving the two particles should exist and be suppressed by the TeV
scale. These can be a source of new and relevant decay processes. Consider then
δLeff =
O(5)i +
O(6)i + · · ·+ h.c. (7)
where Λ ≃ O(TeV). Only dimension 5 operators are considered here, with dimension
6 operators discussed elsewhere [19]. The central dot ‘·’ denotes a contraction of flavor
indices.
At dimension 5 there are several operators involving right-handed neutrinos. However it
is shown below that constraints from the observed scale of the left-handed neutrino masses
implies that only one of them can be relevant. It is
O(5)1 = H
†HNN (8)
where the flavor dependence is suppressed. The important point is that this operator is
not necessarily suppressed by any small Yukawa couplings. After electroweak symmetry
breaking the only effect of this operator at tree-level is to shift the masses of the right-
handed neutrinos. Constraints on this operator are therefore weak (see below).
This operator, however, can have a significant effect on the Higgs boson. For if
MI +MJ < mh , (9)
the decay
h → NINJ (10)
can occur. For instance, if only a single flavor is lighter than the Higgs boson, the decay
rate is
Γ(h → NINI) =
2β2I + (Imc
where only half the phase space has been integrated over, c
1 /Λ is the coefficient of (8),
and βI ≡ (1− 4M2I /m2h)1/2 is the velocity of the right-handed neutrino.
The dependence of the decay rate on β may be understood from the following comments.
The uninterested reader may skip this paragraph, since this particular dependence is only
briefly referred to later in the next paragraph, and is not particularly crucial to any other
discussion. Imagine a scattering experiment producing the two Majorana fermions only
through an on-shell Higgs boson in the s-channel. The cross-section for this process is
related to the decay rate into this channel, and in particular their dependence on the final
state phase space are identical. Conservation of angular momentum, and when appropriate,
conservation of CP in the scattering process then fixes the dependence of Γ on phase
space. For example, note that the phase of c
1 is physical and cannot be rotated away.
When Imc
1 = 0 the operator (8) conserves CP and the decay rate has the β
3 dependence
typical for fermions. This dependence follows from the usual argument applied to Majorana
fermions: a pair of Majorana fermions has an intrinsic CP parity of −1 [20], so conservation
of CP and total angular momentum in the scattering process implies that the partial wave
amplitude for the two fermions must be a relative p-wave state. If the phase of c
non-vanishing, then CP is broken and the partial wave amplitude can have both p-wave
and s-wave states while still conserving total angular momentum. The latter amplitude
leads to only a βI phase space suppression.
There is a large region of parameter space where this decay rate is larger than the
rate for the Higgs boson to decay into bottom quarks, and, if kinematically allowed, not
significantly smaller than the rate for the Higgs boson to decay into electroweak gauge
bosons. For example, with Im(c
1 ) = 0 and no sum over I,
Γ(h → NINI)
Γ(h → bb)
β3I (12)
This ratio is larger than 1 for Λ∼<12|c
I TeV . If all three right-handed neutrinos are
lighter than the Higgs boson, then the total rate into these channels is larger than the
rate into bottom quarks for Λ∼<20|c
I TeV. If Im(c
1 ) 6= 0 the operator violates CP
and the region of parameter space where decays to right-handed neutrinos dominate over
decays to bottom quarks becomes larger, simply because now the decay rate has less of a
phase space suppression, as described above. The reason for the sensitivity to large values
of Λ is because the bottom Yukawa coupling is small. For
mh > 2mW (13)
the Higgs boson can decay into a pair of W bosons with a large rate and if kinematically
allowed, into a pair of Z gauge bosons with a branching ratio of approximately 1/3. One
finds that with Im(c
1 ) = 0 and no sum over I,
Γ(h → NINI)
Γ(h → WW )
f(βW )
where f(βW ) = 3/4 − β2W /2 + 3β4W /4 [21] and βW is the velocity of the W boson. Still,
the decay of the Higgs boson into right-handed neutrinos is not insignificant. For example,
with Λ ≃ 2 TeV, c(5)1 = 1 and βI ≃ 1, the branching ratio for a Higgs boson of mass 300
GeV to decay into a single right-handed neutrino flavor of mass 30 GeV is approximately
5%. Whether the decays of the Higgs boson into right-handed neutrinos are visible or not
depends on the lifetime of the right-handed neutrino. That issue is discussed in Section 5.
It is now shown that all the other operators at d = 5 involving right-handed neutrinos
and Higgs bosons are irrelevant for the decay of the Higgs boson. Aside from (8), there is
only one more linearly independent operator involving the Higgs boson and a neutrino,
O(5)2 = LH̃LH̃ . (15)
After electroweak symmetry breaking this operator contributes to the left-handed neutrino
masses, so its coefficient must be tiny, c
2/Λ∼<O(mνL) . Consequently, the decay of the
Higgs boson into active neutrinos from this operator is irrelevant. In Section 4 it is seen
that under the minimal flavor violation hypothesis this operator is naturally suppressed
to easily satisfy the condition above. It is then consistent to assume that the dominant
contribution to the active neutrino masses comes from mass mixing with the right-handed
neutrinos.
Other operators involving the Higgs boson exist at dimension 5, but all of them can be
reduced to (15) and dimension 4 operators by using the equations of motion. For instance,
O(5)3 ≡ −i(∂
µN)σµLH̃ → mRNLH̃ + (H̃L)λTν (LH̃) , (16)
where the equations of motion were used in the last step. As a result, this operator does
not introduce any new dynamics. Still, its coefficients must be tiny enough to not generate
too large of a neutrino mass. In particular, enough suppression occurs if its coefficients
are less than or comparable to the neutrino couplings. Under the minimal flavor violation
hypothesis it is seen that these coefficients are naturally suppressed to this level.
Even if the operators O(5)2 and O
3 are not present at tree-level, they will be generated
at the loop-level through operator mixing with O(5)1 . This is because the overall lepton
number symmetry U(1)LN is broken with both the neutrino couplings and O
1 present.
However, such mixing will always involve the neutrino couplings and be small enough to
not generate too large of a neutrino mass. To understand this, it is useful to introduce
a different lepton number under which the right-handed neutrinos are neutral and both
the charged leptons and left-handed neutrinos are charged. Thus the neutrino couplings
and the operators O(5)2 and O
3 violate this symmetry, but the operator O
1 preserves
it. In the limit that λν → 0 this lepton number symmetry is perturbatively exact, so
inserting O(5)1 into loops can only generate O
2 and O
3 with coefficients proportional to
the neutrino couplings. Further, O(5)2 violates this symmetry by two units, so in generating
it from loops of Standard Model particles and insertions of O(5)1 it will be proportional
to at least two powers of the neutrino couplings. Likewise, in generating O(5)3 from such
loops its coefficient is always proportional to at least one power of the neutrino coupling.
In particular, O(5)2 is generated directly at two-loops, with c
2 ∝ λTν λνc
1 . It is also
generated indirectly at one-loop, since O(5)3 is generated at one-loop, with c
3 ∝ c
1 λν .
These operator mixings lead to corrections to the neutrino masses that are suppressed by
loop factors and at least one power of mR/Λ compared to the tree-level result.
As a result, no significant constraint can be applied to the operator O(5)1 .2 Instead the
challenge is to explain why the coefficients of O(5)2 and O
3 in the effective theory are small
to begin with. The preceding arguments show why it is technically natural for them to
be small, even if O(5)1 is present. The minimal flavor violation hypothesis discussed below
does provide a technically consistent framework in which this occurs.
3 Naturalness
The operator
H†HNN (17)
violates chirality, so it contributes to the mass of the right-handed neutrino at both tree
and loop level. At tree level
δmR = c
= 60c
GeV . (18)
There is also a one-loop diagram with an insertion of this operator. It has a quadratic
divergence such that
δmR ≃ 2c
. (19)
Similarly, at one-loop
δm2h ≃
1 mR]Λ . (20)
2This statement assumes c(5)∼<O(16π
2) and that the loop momentum cutoff Λloop ≃ Λ. Constraints
might conceivably occur for very light right-handed neutrino masses, but that possibility is not explored
here since MI∼>O(GeV) is assumed throughout in order that the right-handed neutrinos decay before big
bang nucleosynthesis.
1 ∼ O(1) then a right-handed neutrino with mass MI ≃ 30 GeV requires O(1) tuning
for TeV∼< Λ∼< 10 TeV, and mh ≃ 100 GeV is technically natural unless Λ∼> 10 TeV or mR
is much larger than the range (MI∼<150 GeV ) considered here.
Clearly, if Λ ∼> O(10 TeV) then a symmetry would be required to protect the right-
handed neutrino and Higgs boson masses. One such example is supersymmetry. Then
this operator can be generalized to involve both Higgs superfields and would appear in the
superpotential. It would then be technically natural for the Higgs boson and right-handed
neutrino masses to be protected, even for large values of Λ. As discussed previously, for
such large values of Λ decays of the Higgs boson into right-handed neutrinos may still be
of phenomenological interest.
4 Minimal Flavor Violation
The higher dimension operators involving right-handed neutrinos and Standard Model
leptons previously discussed can a priori have an arbitrary flavor structure and size. But
as is well-known, higher dimension operators in the lepton and quark sector suppressed
by only Λ ≃ TeV −10 TeV are grossly excluded by a host of searches for flavor changing
neutral currents and overall lepton number violating decays.
A predictive framework for the flavor structure of these operators is provided by the
minimal flavor violation hypothesis [9, 10, 11]. This hypothesis postulates a flavor symme-
try assumed to be broken by a minimal set of non-dynamical fields, whose vevs determine
the renormalizable Yukawa couplings and masses that violate the flavor symmetry. Since a
minimal field content is assumed, the flavor violation in higher dimension operators is com-
pletely determined by the now irreducible flavor violation appearing in the right-handed
neutrino masses and the neutrino, charged lepton and quark Yukawa couplings. Without
the assumption of a minimal field content breaking the flavor symmetries, unacceptably
large flavor violating four fermion operators occur. In practice, the flavor properties of a
higher dimension operator is determined by inserting and contracting appropriate powers
and combinations of Yukawa couplings to make the operator formally invariant under the
flavor group. Limits on operators in the quark sector are 5− 10 TeV [10], but weak in the
lepton sector unless the neutrinos couplings are not much less than order unity [11][22].
It is important to determine what this principle implies for the size and flavor structure
of the operator
1 )IJH
†HNINJ . (21)
It is seen below that the size of its coefficients depends critically on the choice of the flavor
group for the right-handed neutrinos. This has important physical consequences which are
then discussed.
In addition one would like to determine whether the operators O(5)2 and O
3 are suf-
ficiently suppressed such that their contribution to the neutrinos masses is always sub-
dominant. In Section 2 it was argued that if these operators are initially absent, radiative
corrections involving O(5)1 and the neutrino couplings will never generate large coefficients
(in the sense used above) for these operators. However, a separate argument is needed to
explain why they are initially small to begin with. It is seen below that this is always the
case assuming minimal flavor violation.
To determine the flavor structure of the higher dimension operators using the minimal
flavor violation hypothesis, the transformation properties of the particles and couplings are
first defined. The flavor symmetry in the lepton sector is taken to be
GN × SU(3)L × SU(3)ec × U(1) (22)
where U(1) is the usual overall lepton number acting on the Standard Model leptons. With
right-handed neutrinos present there is an ambiguity over what flavor group to choose for
the right-handed neutrinos, and what charge to assign them under the U(1). In fact, since
there is always an overall lepton number symmetry unless both the Majorana masses and
the neutrino couplings are non-vanishing, there is a maximum of two such U(1) symmetries.
Two possibilities are considered for the flavor group of the right-handed neutrinos:
GN = SU(3)× U(1)′ or SO(3) . (23)
The former choice corresponds to the maximal flavor group, whereas the latter is chosen to
allow for a large coupling for the operator (21), shown below. The fields transform under
the flavor group SU(3)× SU(3)L × SU(3)ec × U(1)′ × U(1) as
N → (3,1,1)(1,0) (24)
L → (1,3,1)(−1,1) (25)
ec → (1,1,3)(1,−1) , (26)
Thus U(1)′ is a lepton number acting on the right-handed neutrinos and Standard Model
leptons and is broken only by the Majorana masses. U(1) is a lepton number acting only
on the Standard Model leptons and is only broken by the neutrino couplings. Then the
masses and Yukawa couplings of the theory are promoted to spurions transforming under
the flavor symmetry. Their representations are chosen in order that the Lagrangian is
formally invariant under the flavor group. Again for GN = SU(3) × U(1)′,
λν → (3,3,1)(0,−1) (27)
λl → (1,3,3)(0,0) (28)
mR → (6,1,1)(−2,0) . (29)
For GN = SO(3) there are several differences. First, the 3’s of SU(3) simply become 3’s of
SO(3). Next, the U(1) charge assignments remain but there is no U(1)′ symmetry. Finally,
a minimal field content is assumed throughout, implying that for GN = SO(3) mR ∼ 6 is
real.
With these charge assignments a spurion analysis can now be done to estimate the size
of the coefficents of the dimension 5 operators introduced in Section 2.
For either choice of GN one finds the following. An operator that violates the U(1)
lepton number by n units is suppressed by n factors of the tiny neutrino couplings. In
particular, the dangerous dimension 5 operators O(5)2 and O
3 are seen to appear with
two and one neutrino couplings, which is enough to suppress their contributions to the
neutrino masses. If GN = SO(3) such operators can also be made invariant under SO(3)
by appropriate contractions. If however GN = SU(3)×U(1)′, then additional suppressions
occur in order to construct GN invariants. For example, the coefficients of the dimension
5 operators O(5)2 and O
3 are at leading order λ
Rλν/Λ and λνm
R/Λ respectively and
are sufficiently small.
It is now seen that the flavor structure of the operator (8) depends on the choice of the
flavor group GN . One finds
GN = SU(3)× U(1)′ : c
1 ∼ a1
mRTr[m
+ · · ·
GN = SO(3) : c
1 ∼ 1+ d1
mR ·mR
+ · · ·+ e1λνλ†ν + · · · (30)
where · · · denotes higher powers in mR and λνλ†ν . Comparing the expressions in (30), the
only important difference between the two is that 1 is invariant under SO(3), but not
under SU(3) or U(1)′. As we shall see shortly, this is a key difference that has important
consequences for the decay rate of the Higgs boson into right-handed neutrinos.
Next the physical consequences of the choice of flavor group are determined. First note
that if we neglect the λνλ
ν ∝ mL contribution to c
1 , then for either choice of flavor group
the right-handed neutrino masses mR and couplings c
1 are simultaneously diagonalizable.
For GN = SO(3) this follows from the assumption that mR ∼ 6 is a real representation.
As a result, the couplings c
1 are flavor-diagonal in the right-handed neutrino mass basis.
If GN = SO(3) the couplings c
1 are flavor-diagonal, universal at leading order, and
not suppressed by any Yukawa couplings. It follows that
Br(h → NINI)
Br(h → NJNJ)
≃ 1 (31)
up to small flavor-diagonal corrections of order mR/Λ from the next-to-leading-order terms
in the couplings c
1 . βI is the velocity of NI and its appearance in the above ratio is simply
from phase space. It is worth stressing that even if the right-handed neutrino masses are
non-universal, the branching ratios of the Higgs boson into the right-handed neutrinos are
approximately universal and equal to 1/3 up to phase space corrections. The calculations
from Section 2 of the Higgs boson decay rate into right-handed neutrinos do not need to be
rescaled by any small coupling, and the conclusion that these decay channels dominate over
h → bb for Λ up to 20 TeV still holds. Theoretically though, the challenge is to understand
why MI ≪ Λ.
Similarly, if GN = SU(3) the couplings are flavor-diagonal and suppressed by at least
a factor of mR/Λ but not by any Yukawa couplings. This suppression has two effects.
First, it eliminates the naturalness constraints discussed in Section 3. The other is that it
suppresses the decay rate of h → NINI by a predictable amount. In particular
Γ(h → NINI) =
I (32)
where I have set a1 = 1, and
Br(h → NINI)
Br(h → NJNJ)
up to flavor-diagonal corrections of order mR/Λ. In this case, the Higgs boson decays
preferentially to the right-handed neutrino that is the heaviest. Still, even with this sup-
pression these decays dominate over h → bb up to Λ ≃ 1 TeV if three flavors of right-handed
neutrinos of mass MI ≃ O(50GeV) are lighter than the Higgs boson. For larger values of
Λ these decays have a subdominant branching fraction. They are still interesting though,
because they have a rich collider phenomenology and may still be an important channel
in which to search for the Higgs boson. This scenario might be more natural theoretically,
since an approximate SU(3) symmetry is protecting the mass of the fermions.
5 Right-handed Neutrino Decays
I have discussed how the presence of a new operator at the TeV scale can introduce new
decay modes of the Higgs boson into lighter right-handed neutrinos, and described the
circumstances under which these new processes may be the dominant decay mode of the
Higgs boson. In the previous section we have seen that whether that actually occurs or
not depends critically on a few assumptions. In particular, on whether the Higgs boson
is light, on the scale of the new operator, and key assumptions about the identity of the
broken flavor symmetry of the right-handed neutrinos.
Whether the decays of the Higgs boson into right-handed neutrinos are visible or not
depends on the lifetime of the right-handed neutrinos. It is seen below that in the mini-
mal flavor violation hypothesis their decays modes are determined by their renormalizable
couplings to the electroweak neutrinos and leptons, rather than through higher-dimension
operators.
The dominant decay of a right-handed neutrinos is due to the gauge interactions with
the electroweak gauge bosons it acquires through mass mixing with the left-handed neu-
trinos. At leading order a right-handed neutrino NJ acquires couplings to WlI and ZνI
which are identical to those of a left-handed neutrino, except that they are suppressed by
the mixing angles
VIJ = [m
D]IJM
J . (34)
If the right-handed neutrino is heavier than the electroweak gauge bosons but lighter
than the Higgs boson, it can decay as NJ → W+l−I and NJ → ZνI . Since it is a Majorana
particle, decays to charge conjugated final states also occur. The rate for these decays is
proportional to |VIJ |2M3J .
If a right-handed neutrino is lighter than the electroweak gauge bosons, it decays
through an off-shell gauge boson to a three-body final state. Its lifetime can be obtained by
comparing it to the leptonic decay of the τ lepton, but after correcting for some additional
differences described below. The total decay rate is 3
Γtotal(NI)
Γ(τ → µνµντ )
= 2× 9 (cW + 0.40cZ )
. (35)
The corrections are the following. The factor of “9” counts the number of decays available
to the right-handed neutrino through charged current exchange, assuming it to be heavier
than roughly few-10 GeV. The factor of “0.40” counts the neutral current contribution. It
represents about 30% of the branching ratio, with the remaining 70% of the decays through
the charged current. The factor of “2” is because the right-handed neutrino is a Majorana
particle, so it can decay to both particle and anti-particle, e.g. W ∗l− and W ∗l+, or Z∗ν
and Z∗ν. Another correction is due to the finite momentum transfer in the electroweak
gauge boson propagators. This effect is described by the factors cW and cW where
cG(xG, yG) = 2
dzz2(3− 2z)
(1− (1− z)xG)2 + yG
where xG = M
G, yG = Γ
G, cG(0, 0) = 1 and each propagator has been ap-
proximated by the relativistic Breit-Wigner form. The non-vanishing momentum transfer
enhances the decay rate by approximately 10% for mR masses around 30GeV and by ap-
proximately 50% for masses around 50 GeV. This effect primarily affects the overall rate
and is less important to the individual branching ratios.
The formula (36) is also valid when the right-handed neutrino is more massive than
the electroweak gauge bosons such that the previously mentioned on-shell decays occur. In
that case (35) gives the inclusive decay rate of a right-handed neutrino into any electroweak
gauge boson and a charged lepton or a left-handed neutrino. In this case the correction
from the momentum transfer is obviously important to include! It enhances the decay rate
by approximately a factor of 40 for masses around 100 GeV, but eventually scales as M−2I
for a large enough mass.
3An ≈ 2 error in an earlier version has been corrected.
An effect not included in the decay rate formula above is the quantum interference that
occurs in the same flavor l+l−ν or ννν final states. Its largest significance is in affecting the
branching ratio of these specific, subdominant decay channels and is presented elsewhere
[19]. Using cττ = 87µm [23] and BR(→ µνµντ ) = 0.174 [23], (35) gives the following decay
length for NI ,
cτI = 0.90m
cW + 0.40cZ
30 GeV
(120 keV)2
. (37)
Care must be used in interpreting this formula, since the Dirac and Majorana masses
are not completely independent because they must combine together to give the observed
values of the active neutrino masses.
This expression is both model-independent and model-dependent. Up to this point no
assumptions have been made about the elements of the Dirac mass matrix or the right-
handed neutrino masses, so the result above is completely general. Yet the actual value
of the decay length clearly depends on the flavor structure of the Dirac mass matrix. In
particular, the matrix elements [mDm
D]II/MI are not the same as the active neutrino
mass masses. This is fortunate, since it presents an opportunity to measure a different set
of neutrino parameters from those measured in neutrino oscillations.
The masses MI describe 3 real parameters, and a priori the Dirac matrix mD describes
18 real parameters. However, 3 of the phases in mD can be removed by individual lepton
number phase rotations on the left-handed neutrinos and charged leptons, leaving 15 pa-
rameters which I can think of as 6 mixing angles, 3 real Yukawa couplings and 6 phases.
Including the three right-handed neutrino masses gives 18 parameters in total. Five con-
straints on combinations of these 18 parameters already exist from neutrino oscillation
experiments. In principle all of these parameters could be measured through detailed stud-
ies of right-handed neutrino decays, since amplitudes for individual decays are proportional
to the Dirac neutrino matrix elements. However, at tree-level these observables depend
only on |[mD]IJ | and are therefore insensitive to the 6 phases. So by studying tree-level
processes only the 3 right-handed neutrino masses, 3 Yukawa couplings, and 6 mixing
angles could be measured in principle.
In particular, the dominant decay is h → NINI → qqqqlJ lK which contains no miss-
ing energy. Since the secondary events are highly displaced, there should be no confusion
about which jets to combine with which charged leptons. In principle a measurement of
the mass of the right-handed neutrino and the Higgs boson is possible by combining the in-
variant momentum in each event. A subsequent measurement of a right-handed neutrino’s
lifetime from the spatial distribution of its decays measures [mDm
D]II . More information
is acquired by measuring the nine branching ratios BR(NI → qq′lJ) ∝ |[mD]IJ |2. Such
measurements provide 6 additional independent constraints. In total, 12 independent con-
straints on the 18 parameters could in principle be obtained from studying right-handed
neutrino decays at tree-level.
To say anything more precise about the decay length would require a model of the
neutrino couplings and right-handed neutrino mass parameters. Specific predictions could
be done within the context of such a model. Of interest would be the branching ratios and
the mean and relative decay lengths of the three right-handed neutrinos.
The factor [mDm
D]II/MI appearing in the decay length is not the active neutrino mass
obtained by diagonalizing mTDm
R mD, but it is close. If I approximate [mDm
D]II/MI ≃
mI , then
cτI ≃ 0.90m
30 GeV
0.48 eV
cW + 0.4cZ
A few comments are in order. First, the decay lengths are macroscopic, since by inspection
they range from O(100µm) to O(10m) for a range of parameters, and for these values are
therefore visible at colliders. Next, the decay length is evidently extremely sensitive to MI .
Larger values of MI have shorter decays lengths. For instance, if MI = 100 GeV (which
requiresmh > 200 GeV) andmI = 0.5 eV then cτI ≃ 0.2mm. Finally, if the active neutrino
masses are hierarchical, then one would expect M4I cτI to be hierarchical as well, since this
quantity is approximately proportional to m−1L . One or two right-handed neutrinos may
therefore escape the detector if the masses of the lightest two active neutrinos are small
enough.
I have described decays of the right-handed neutrinos caused by its couplings to elec-
troweak gauge bosons acquired through mass mixing with the left-handed neutrinos. How-
ever, additional decay channels occur through exchange of an off-shell Higgs boson, higher
dimension operators or loop effects generated from its gauge couplings. It turns out that
these processes are subdominant, but may be of interest in searching for the Higgs bo-
son. Exchange of an off-shell Higgs boson causes a decay NI → νJbb which is suppressed
compared to the charged and neutral current decays by the tiny bottom Yukawa coupling.
Similarly, the dimension 5 operator (8) with generic flavor couplings allows for the decay
NI → NJh∗ → NJbb for NJ lighter than NI 4. However, using the minimal flavor vio-
lation hypothesis it was shown in Section 4 that the couplings of that higher dimension
operator are diagonal in the same basis as the right-handed neutrino mass basis, up to
flavor-violating corrections that are at best O(λ2ν) (see (30)). As result, this decay is highly
suppressed. At dimension 5 there is one more operator that I have not yet introduced
which is the magnetic moment operator
·NσρσNBρσ (39)
and it involves only two right-handed neutrinos. It causes a heavier right-handed neutrino
to decay into a lighter one, NI → NJ+γ/Z for I 6= J . To estimate the size of this operator,
first note that its coefficient must be anti-symmetric in flavor. Then in the context of
minimal flavor violation with GR = SO(3), the leading order term is c
4 ≃ [λνλ
ν ]AS where
4The author thanks Scott Thomas for this observation.
“AS” denotes ‘anti-symmetric part’. This vanishes unless the neutrino couplings violate
CP . In that case the amplitude for this decay is of order (λν)
2. If GR = SU(3)×U(1)′ the
leading order term cannot be [mR]AS(Tr[mRm
q)n/Λn+q, since they vanish in the right-
handed neutrino mass basis. The next order involves λνλ
ν and some number of mR’s,
but there does not appear to be any invariant term. Thus for either choice of GR the
amplitude for NI decays from this operator are O(λ
ν) or smaller, which is much tinier
than the amplitudes for the other right-handed neutrino decays already discussed which
are of order λν . Subdominant decays N → ν + γ can occur from dimension 6 operators
and also at also one-loop from electroweak interactions, but in both cases the branching
ratio is tiny [19].
6 Discussion
In order for these new decays to occur at all requires that the right-handed neutrinos are
lighter than the Higgs boson. But from a model building perspective, one may wonder
why the right-handed neutrinos are not heavier than the scale Λ. A scenario in which
the right-handed neutrinos are composite would naturally explain why these fermions are
comparable or lighter than the compositeness scale Λ, assumed to be O(TeV). Since their
interactions with the Higgs boson through the dimension 5 operator (8) are not small, the
Higgs boson would be composite as well (but presumed to be light).
These new decay channels of the Higgs boson will be the dominant decay modes if
the right-handed neutrinos are also lighter than the electroweak gauge bosons, and if the
coefficient of the higher dimension operator (8) is not too small. As discussed in Section
4, in the minimal flavor violation framework the predicted size of this operator depends
on the choice of approximate flavor symmetries of the right-handed neutrinos. It may be
O(1) or O(mR/Λ).
In the former situation the new decays dominate over Higgs boson decays to bottom
quarks for scales Λ∼<10 − 20 TeV, although only scales Λ ≃ 1 − 10 TeV are technically
natural. This case however presents a challenge to model building, since the operator (8)
breaks the chirality of the right-handed neutrinos. Although it may be technically natural
for the right-handed neutrinos to be much lighter than the scale Λ (see Section 3), one
might expect that any theory which generates a large coefficient for this operator to also
generate Majorana masses mR ∼ O(Λ).
In the case where the coefficient of (8) is O(mR/Λ) the new decays can still dominate
over decays to bottom quarks provided that the scale Λ ≃ O(1 TeV). For larger values of
Λ these decays are subdominant but have sizable branching fractions up to Λ ≃ O(10TeV).
This situation might be more amendable to model building. For here an approximate
SU(3) symmetry is protecting the mass of the right-handed neutrinos.
In either case though one needs to understand why the right-handed neutrinos are para-
metrically lighter than Λ. It would be extremely interesting to find non-QCD-type theories
of strong dynamics where fermions with masses parametrically lighter than the scale of
strong dynamics occur. Or using the AdS/CFT correspondence [8], to find a Randall-
Sundrum type model [7] that engineers this outcome. The attitude adopted here has been
to assume that such an accident or feature can occur and to explore the consequences.
Assuming that these theoretical concerns can be naturally addresseed, the Higgs boson
physics is quite rich. To summarize, in the new process the Higgs boson decays through a
cascade into a six-body or four-body final state depending on the masses of the right-handed
neutrinos. First, it promptly decays into a pair of right-handed neutrinos, which have a
macroscopic decay length anywhere from O(100µm − 10m) depending on the parameters
of the Majorana and Dirac neutrino masses. If one or two active neutrinos are very light,
then the decay lengths could be larger. Decays occurring in the detector appear as a pair of
displaced vertices. For most of the time each secondary vertex produces a quark pair and a
charged lepton, dramatically violating lepton number. For a smaller fraction of the time a
secondary vertex produces a pair of charged leptons or a pair of quarks, each accompanied
with missing energy. From studying these decays one learns more about neutrinos and
the Higgs boson, even if these channels should not form the dominant decay mode of the
Higgs boson. The experimental constraints on this scenario from existing colliders and its
discovery potential at the LHC will be described elsewhere [19] [24].
Acknowledgments
The author thanks Matt Stassler and Scott Thomas for discussions. This work is supported
by the U.S. Department of Energy under contract No. DE-FG03-92ER40689.
References
[1] M. Fukugita and T. Yanagida, Phys. Lett. B 174, 45 (1986).
[2] T. Asaka and M. Shaposhnikov, Phys. Lett. B 620, 17 (2005) [arXiv:hep-ph/0505013];
M. Shaposhnikov, J. Phys. Conf. Ser. 39, 9 (2006).
[3] A. Pilaftsis, Z. Phys. C 55, 275 (1992) [arXiv:hep-ph/9901206].
[4] K. Belotsky, D. Fargion, M. Khlopov, R. Konoplich and K. Shibaev, Phys. Rev. D 68,
054027 (2003) [arXiv:hep-ph/0210153].
[5] T. Han and B. Zhang, Phys. Rev. Lett. 97, 171804 (2006) [arXiv:hep-ph/0604064].
[6] P. Q. Hung, arXiv:hep-ph/0612004.
[7] L. Randall and R. Sundrum, Phys. Rev. Lett. 83, 3370 (1999) [arXiv:hep-ph/9905221].
http://arxiv.org/abs/hep-ph/0505013
http://arxiv.org/abs/hep-ph/9901206
http://arxiv.org/abs/hep-ph/0210153
http://arxiv.org/abs/hep-ph/0604064
http://arxiv.org/abs/hep-ph/0612004
http://arxiv.org/abs/hep-ph/9905221
[8] J. M. Maldacena, Adv. Theor. Math. Phys. 2, 231 (1998) [Int. J. Theor. Phys. 38,
1113 (1999)] [arXiv:hep-th/9711200]; S. S. Gubser, I. R. Klebanov and A. M. Polyakov,
Phys. Lett. B 428, 105 (1998) [arXiv:hep-th/9802109]; E. Witten, Adv. Theor. Math.
Phys. 2, 253 (1998) [arXiv:hep-th/9802150].
[9] R. S. Chivukula and H. Georgi, Phys. Lett. B 188, 99 (1987);
[10] G. D’Ambrosio, G. F. Giudice, G. Isidori and A. Strumia, Nucl. Phys. B 645, 155
(2002) [arXiv:hep-ph/0207036].
[11] V. Cirigliano, B. Grinstein, G. Isidori and M. B. Wise, Nucl. Phys. B 728, 121 (2005)
[arXiv:hep-ph/0507001].
[12] S. Dimopoulos, M. Dine, S. Raby and S. D. Thomas, Phys. Rev. Lett. 76, 3494
(1996) [arXiv:hep-ph/9601367]; S. Dimopoulos, M. Dine, S. Raby, S. D. Thomas
and J. D. Wells, Nucl. Phys. Proc. Suppl. 52A, 38 (1997) [arXiv:hep-ph/9607450];
S. Dimopoulos, S. D. Thomas and J. D. Wells, Nucl. Phys. B 488, 39 (1997)
[arXiv:hep-ph/9609434].
[13] M. J. Strassler and K. M. Zurek, arXiv:hep-ph/0604261; M. J. Strassler and
K. M. Zurek, arXiv:hep-ph/0605193; M. J. Strassler, arXiv:hep-ph/0607160.
[14] R. Dermisek and J. F. Gunion, Phys. Rev. Lett. 95, 041801 (2005)
[arXiv:hep-ph/0502105]; R. Dermisek and J. F. Gunion, Phys. Rev. D 73, 111701
(2006) [arXiv:hep-ph/0510322]; S. Chang, P. J. Fox and N. Weiner, JHEP
0608, 068 (2006) [arXiv:hep-ph/0511250]; S. Chang, P. J. Fox and N. Weiner,
[arXiv:hep-ph/0608310].
[15] L. M. Carpenter, D. E. Kaplan and E. J. Rhee, arXiv:hep-ph/0607204.
[16] A. V. Manohar and M. B.Wise, Phys. Lett. B 636, 107 (2006) [arXiv:hep-ph/0601212].
[17] A. V. Manohar and M. B. Wise, Phys. Rev. D 74, 035009 (2006)
[arXiv:hep-ph/0606172].
[18] B. Pontecorvo, Sov. Phys. JETP 6, 429 (1957) [Zh. Eksp. Teor. Fiz. 33, 549 (1957)].
ibid. Sov. Phys. JETP 7, 172 (1958) [Zh. Eksp. Teor. Fiz. 34, 247 (1957)]; Z. Maki,
M. Nakagawa and S. Sakata, Prog. Theor. Phys. 28, 870 (1962).
[19] M. L. Graesser, arXiv:0705.2190 [hep-ph].
[20] S. Weinberg, “The Quantum theory of fields. Vol. 1: Foundations,” Cambridge, UK:
Univ. Pr. (1995) 609 p
[21] J. F. Gunion, H. E. Haber, G. L. Kane and S. Dawson, The Higgs Hunter’s Guide;
ibid., [arXiv:hep-ph/9302272].
http://arxiv.org/abs/hep-th/9711200
http://arxiv.org/abs/hep-th/9802109
http://arxiv.org/abs/hep-th/9802150
http://arxiv.org/abs/hep-ph/0207036
http://arxiv.org/abs/hep-ph/0507001
http://arxiv.org/abs/hep-ph/9601367
http://arxiv.org/abs/hep-ph/9607450
http://arxiv.org/abs/hep-ph/9609434
http://arxiv.org/abs/hep-ph/0604261
http://arxiv.org/abs/hep-ph/0605193
http://arxiv.org/abs/hep-ph/0607160
http://arxiv.org/abs/hep-ph/0502105
http://arxiv.org/abs/hep-ph/0510322
http://arxiv.org/abs/hep-ph/0511250
http://arxiv.org/abs/hep-ph/0608310
http://arxiv.org/abs/hep-ph/0607204
http://arxiv.org/abs/hep-ph/0601212
http://arxiv.org/abs/hep-ph/0606172
http://arxiv.org/abs/0705.2190
http://arxiv.org/abs/hep-ph/9302272
[22] V. Cirigliano and B. Grinstein, Nucl. Phys. B 752, 18 (2006) [arXiv:hep-ph/0601111].
[23] W. M. Yao et al. [Particle Data Group], J. Phys. G 33, 1 (2006).
[24] Michael L. Graesser, Amit Lath and Jessie Shelton, work in progress.
http://arxiv.org/abs/hep-ph/0601111
	Motivation
	Higgs Boson Decay
	Naturalness
	Minimal Flavor Violation
	Right-handed Neutrino Decays
	Discussion
ABSTRACT
  The existence of certain TeV suppressed higher-dimension operators may open
up new decay channels for the Higgs boson to decay into lighter right-handed
neutrinos. These channels may dominate over all other channels if the Higgs
boson is light. For a Higgs boson mass larger than $2 m_W$ the new decays are
subdominant yet still of interest. The right-handed neutrinos have macroscopic
decay lengths and decay mostly into final states containing leptons and quarks.
A distinguishing collider signature of this scenario is a pair of displaced
vertices violating lepton number. A general operator analysis is performed
using the minimal flavor violation hypothesis to illustrate that these novel
decay processes can occur while remaining consistent with experimental
constraints on lepton number violating processes. In this context the question
of whether these new decay modes dominate is found to depend crucially on the
approximate flavor symmetries of the right-handed neutrinos.

<|endoftext|><|startoftext|>
Coulomb blockade of anyons
Dmitri V. Averin and James A. Nesteroff
Department of Physics and Astronomy, Stony Brook University, SUNY, Stony Brook, NY 11794-3800
(Dated: August 23, 2021)
Coulomb interaction turns anyonic quasiparticles of a primary quantum Hall liquid with filling
factor ν = 1/(2m + 1) into hard-core anyons. We have developed a model of coherent transport of
such quasiparticles in systems of multiple antidots by extending the Wigner-Jordan description of
1D abelian anyons to tunneling problems. We show that the anyonic exchange statistics manifests
itself in tunneling conductance even in the absence of quasiparticle exchanges. In particular, it can
be seen as a non-vanishing resonant peak associated with quasiparticle tunneling through a line of
three antidots.
PACS numbers: 73.43.-f, 05.30.Pr, 71.10.Pm, 03.67.Lx
Quasiparticles of two-dimensional (2D) electron liq-
uids in the regime of the Fractional Quantum Hall effect
(FQHE) have unusual properties of fractional charge [1]
and fractional exchange statistics [2, 3]. The fractional
quasiparticle charge was observed in experiments on an-
tidot tunneling [4] and shot-noise measurements [5, 6].
The situation with fractional statistics is so far less cer-
tain even in the case of the abelian statistics, which is the
subject of this work. Although the recent experiments [7]
demonstrating unusual flux periodicity of conductance
of a quasiparticle interferometer can be interpreted as a
manifestation of the fractional statistics [8, 9], this inter-
pretation is not universally accepted [10, 11]. There is a
number of theoretical proposals (see, e.g., [12, 13]) sug-
gesting tunnel structures where the statistics manifests
itself through noise properties. Partly due to complex-
ity of noise measurements, such experiments have not
been performed successfully up to now. In this work,
we show that coherent quasiparticle dynamics in multi-
antidot structures should provide clear signatures of the
exchange statistics in dc transport. Most notably, in tun-
neling through a line of three antidots, fractional statis-
tics leads to a non-vanishing peak of the tunnel conduc-
tance which would vanish for integer statistics.
These effects rely on the ability of quantum anti-
dots to localize individual quasiparticles of the QH liq-
FIG. 1: Tunneling of anyonic quasiparticles between oppo-
site edges of an FQHE liquid through quasi-1D triple-antidot
systems: (a) loop, (b) open interval. Quasiparticles tunnel
between the edges and the antidots with rates Γ1,2. The an-
tidots are coupled coherently by tunnel amplitudes ∆.
uids [4, 14, 15]. The resulting transport phenomena in
antidots are very similar to those associated with the
Coulomb blockade [16] in tunneling of individual elec-
trons in dots. For instance, similarly to a quantum dot
[19], the linear conductance of one antidot shows periodic
oscillations with each period corresponding to the addi-
tion of one quasiparticle [4, 14, 15, 17, 18]. Recently, we
have developed a theory of such Coulomb-blockade-type
tunneling for a double-antidot system [20], where quasi-
particle exchange statistics does not affect the transport.
The goal of this work is to extend this theory to antidot
structures where the statistics does affect the conduc-
tance. The two simplest structures with this property
consist of three antidots and have quasi-1D geometries
with either periodic or open boundary conditions (Fig. 1).
A technical issue that needed to be resolved to calculate
the tunnel conductance is that the anyonic field opera-
tors defined through the Wigner-Jordan transformation
[21, 22, 23, 24], are not fully sufficient in the situations of
tunneling. As we show below, to obtain correct matrix
elements for anyon tunneling, one needs to keep track
of the appropriate boundary conditions of the wavefunc-
tions which are not accounted for in the field operators.
Specifically, we consider the antidots coupled by tun-
neling among themselves and to two opposite edges of
the quantum Hall liquid (Fig. 1). The edges play the role
of the quasiparticle reservoirs with the transport voltage
V applied between them. We assume that the antidot-
edge coupling is weak and can be treated as a pertur-
bation. Quasiparticle transport through the antidots is
governed then by the kinetic equation similar to that for
Coulomb-blockade transport through quantum dots with
a discrete energy spectrum [25]. Coherent quasiparticle
dynamics requires that the relaxation rate Γd created by
direct Coulomb antidot-edge coupling is weak. This con-
dition should be satisfied if the edge-state confinement
is sufficiently strong [20]. The requirement on the con-
finement is less stringent in the case of the antidot line
(Fig. 1b), in which antidot quasiparticles move along the
edge, suppressing the antidot-edge coupling at low fre-
quencies. We also assume that all quasiparticle energies
http://arxiv.org/abs/0704.0439v2
on the antidots, tunnel amplitudes ∆, temperature T ,
Coulomb interaction energies U between quasiparticles
on different antidots, are much smaller than the energy
gap ∆∗ for excitations on each antidot. This condition
ensures that the state of each antidot is characterized
completely by the occupation number n of its relevant
quantized energy level. In any given range of the back-
gate voltage or magnetic field (which produces the over-
all shift of the antidot energies - see, e.g., [4, 14, 15]),
there can be at most one quasiparticle on each antidot,
n = 0, 1. This “hard-core” property of the quasipar-
ticles means that they behave as fermions in terms of
their occupation factors, despite the anyonic exchange
statistics. All these assumptions can be summarized as:
Γd,Γj ≪ ∆, U, T ≪ ∆∗.
Under these conditions, the antidot tunneling is domi-
nated by the antidot energies. The quasi-1D geometry of
the antidot systems we consider makes it possible to in-
troduce the quasiparticle “coordinate” x numbering suc-
cessive antidots; e.g., x = −1, 0, 1 for systems in Fig. 1.
The quasiparticle Hamiltonian can be the written as
[ǫxnx − (∆xξ†x+1ξx + h.c.)] +
Ux,ynxny , (1)
where ǫx are the energies of the relevant localized states
on the antidots (taken relative to the common chemical
potential of the edges at V = 0), ∆x is the tunnel cou-
pling between them, Ux,y is the quasiparticle Coulomb
repulsion, and nx ≡ ξ†xξx. The quasiparticle operators
ξ†x, ξx in (1) can be viewed as the Klein factors left in the
standard operators for the edge-state quasiparticles when
all the edge magneto-plasmon modes are suppressed by
the gap ∆∗. Characteristics of such Klein factors depend
on the geometry of a specific tunneling problem; non-
trivial examples can be found in [12, 13, 26, 27]. In the
Hamiltonian (1), ξx describe the hard-core anyons with
exchange statistics πν. Wigner-Jordan transformation
expresses them through the Fermi operators cx [21]:
ξx = e
iπ(ν−1)
nzcx , ξyξx = ξxξye
iπνsgn(x−y), (2)
with similar relations for ξ†.
Anyonic statistics creates an effective interaction be-
tween the quasiparticles which can be understood as the
Aharonov-Bohm (AB) interaction between a flux tube
“attached” to one of the particles and the charge carried
by another. In general, this interaction can be masked by
the direct Coulomb interaction Ux,y. In the antidot loop
(Fig. 1a), however, Ux,y is constant, Ux,y = U , and the
interaction term in (1) reduces to Un(n− 1)/2, with n =
x nx – the total number of the quasiparticles on the an-
tidots. In this case, the Coulomb interaction contributes
to the energy separation between the group of states with
different n, but does not affect the level structure for
given n. The hard-core property of quasiparticles limits n
to the interval [0, 3]. For n = 0 and n = 3, the system has
the “empty” and “completely filled” state with respective
energies E0 = 0 , E3 =
x ǫx+3U . The spectrum E1k of
the three n = 1 states |1k〉 =
x φk(x)ξ
x|0〉, is obtained
as usual from (1). In the uniform case ǫx = ǫ, ∆x = ∆,
with an external AB phase ϕ, one has φk(x) = e
ikx/L1/2
E1k = ǫ−∆cos k , k = (2πm+ ϕ)/L , (3)
where m = 0, 1, 2, and the loop length is L = 3.
Anyonic statistics can be seen in the n = 2 states,
|2l〉 = (1/
xy ψl(x, y)ξ
x|0〉. The fermion-anyon
relation (2) suggests that the stationary two-quasiparticle
wavefunctions should coincide up to the exchange phase
with that for free fermions:
ψl(x, y) =
eiπ(1−ν)sgn(x−y)/2√
φq(x) φq(y)
φp(x) φp(y)
. (4)
Here φs are the single-particle eigenstates of the Hamilto-
nian (1). (The states (4) are numbered with the index l of
the third “unoccupied” eigenstate of (1) complementary
to the two occupied ones q, p.) The boundary conditions
for the φs are affected by the exchange phase in Eq. (4).
To find them, we temporarily assume for clarity that co-
ordinates x, y are continuous and lie in the interval [0, L].
Subsequent discretization does not change anything sub-
stantive in this discussion. The 1D hard-core particles
are impenetrable and can be exchanged only by moving
one of them, say x, around the loop from x = y + 0 to
x = y − 0 (Fig. 2a). Since the loop is imbedded in the
underlying 2D system, such an exchange means that the
wavefunction acquires the phase factor eiπν , in which the
sign of ν is fixed by the properties of the 2D system, e.g.
the direction of magnetic field in the case of FQHE liq-
uid. Next, if the second particle is moved similarly, from
y = x+ 0 to y = x− 0, the wavefunction changes in the
same way, for a total factor ei2πν . Equation (4) shows
that only one of these changes can agree with the 1D form
of the exchange phase. As a result, the wavefunction (4)
satisfies different boundary conditions in x and y:
ψl(L, y) = ψl(0, y)e
iϕ, ψl(x, L) = ψl(x, 0)e
i(ϕ+2πν). (5)
Conditions (5) on the wavefunction (4) mean that the
single-particle functions φ in (4) satisfy the boundary
condition that correspond to the effective AB phase
ϕ′ = ϕ+ π− πν, i.e. the addition of an extra quasiparti-
cle to the loop changed the AB phase by π − πν, where
−πν comes from the exchange statistics and π from the
hard-core condition. This gives the energies of the two-
quasiparticle states (4) as U + E1q + E1p, where, if the
loop is uniform, the single-particle energies are given by
Eq. (3) with ϕ → ϕ′. In this case,
k E1k = 0, and
the energies E2l of the two-quasiparticle states can be
written as:
E2l = 2ǫ+ U −∆cos l , l = (2πm′ + ϕ− πν)/3 , (6)
(a) (b)
FIG. 2: Exchanges of hard-core anyons on a 1D loop: (a) real
exchanges by transfer along the loop embedded in a 2D sys-
tem; (b) formal exchanges describing the assumed boundary
conditions (5) of the wavefunction.
where m′ = 0, 1, 2.
One of the consequences of this discussion is that the
sign of ν in the 1D exchange phases of Eqs. (2) and (4)
can be chosen arbitrarily for a given fixed sign of the
2D exchange phase. Reversing this sign only exchanges
the character of the boundary conditions (5) between x
and y. This fact has simple interpretation. Although
the 1D hard-core anyons can not be exchanged directly,
formally, coordinates x and y in Eq. (4) are independent
and one needs to define how they move past each other
at the point x = y. Depending on whether the x-particle
moves around y from below or (as in Fig. 2b) from above,
its trajectory does or does not encircle the flux carried
by the y particle, and the boundary condition for x is
or is not affected by the statistical phase. The choice
made for x immediately implies the opposite choice for
y (Fig. 2b), accounting for different boundary conditions
(5). This interpretation shows that in calculation of any
matrix elements, the participating wavefunctions should
be taken to have the same boundary conditions. While
this requirement is natural for processes with the same
number of anyons, it is less evident for tunneling that
changes the number of anyons in the system. Indeed, the
most basic, tunnel-Hamiltonian, description of tunneling
into the point z of the system leads to the states
ξ†z |1k〉 = (1/
ψk(x, y)ξ
x|0〉 , (7)
ψk(x, y) = [φk(x)δy,z − eiπ(1−ν)sgn(x−y)δx,zφk(y)]/
One can see that Eq. (7) automatically implies specific
choice of the boundary conditions which corresponds to
the tunneling anyon not being encircled by anyons al-
ready in the system. This means that in the calculation
of the tunnel matrix elements with the states (4), one
should always pair the coordinate of the tunneling anyon
with the discontinuous one in (5). Then, the tunnel ma-
trix elements are obtained as
〈2l|ξ†z|1k〉 =
ψ∗l (x, z)φk(x) . (8)
For instance, in the case of uniform loop with states (3)
and (6), we get up to an irrelevant phase factor
〈2l|ξ†z|1k〉 = (2/3) cos[(k − l)/2] . (9)
Specific anyonic interaction between quasiparticles can
be seen in the fact that the matrix elements (9) do not
vanish for any pair of indices k, l. In the fermionic case
ν = 1, one of the elements (9) always vanishes for any
given k, since the two-particle state after tunneling neces-
sarily has one particle in the original single-particle state.
By contrast, the tunneling anyon can shift existing par-
ticle out of its state.
The matrix elements involving empty or fully occu-
pied states coincide with those for fermions. Taken to-
gether with Eqs. (8) and (9) for transitions between the
partially filled states, they determine the rates Γj(E) =
γjfν(E)|〈ξ†z〉|2 of tunneling between the jth edge and the
antidots, where γj is the overall magnitude of the tun-
neling rate, and
fν(E) = (2πT/ωc)
ν−1|Γ(ν/2+iE/2πT )|2e−E/2T /2πΓ(ν)
is its energy dependence associated with the Luttinger-
liquid correlations in the edges [28]. Here Γ(z) is the
gamma-function and ωc is the cut-off energy of the edge
excitations. The rates Γj(E) can be used in the stan-
dard kinetic equation to calculate the conductance of the
antidot system [20]. Anyonic statistics of quasiparticles
affects the position and amplitude of the conductance
peaks through the shift of the energy levels by quasiparti-
cle tunneling (described, e.g., by Eq. (6)) and through the
kinetic effects caused by the anyonic features in the ma-
trix elements (8). In the case of the antidot loop (Fig. 1a),
however, effects of statistics are masked by the external
AB flux ϕ through the loop. Since the area of practical
antidots is much larger than the internal area of the loop,
ϕ is essentially random and can not be controlled by ex-
ternal magnetic field on the relevant scale of one period
of conductance oscillations. Below, we present the results
for conductance for the similar case of a line of antidots
(Fig. 1b), the conductance of which is insensitive to the
AB flux, and shows effects of fractional statistics in the
tunneling matrix elements.
As before, the quasiparticle Hamiltonian is given by
Eq. (1). In this geometry, the interaction energy U1 ≡
U1,0 = U0,−1 between the nearest-neighbor antidots is in
general different from the interaction U2 ≡ U1,−1 between
the quasiparticles at the ends. The localization energies
on the antidots can be written as ǫj = ǫ+xδ+2λ|x|. We
consider first the unbiased line, δ = 0. At low tempera-
tures, T ≪ ∆, U , only the ground states of n quasipar-
ticles with energies En participate in transport: E0 = 0,
E1 = ǫ + λ − ω, E2 = 2ǫ + 3λ − ω̄ + (Ua + Ub)/2, and
E3 = 3ǫ+ 2Ua + Ub + 4λ, where ω = (∆
2 + λ
2)1/2
and ω̄ is given by the same expression with λ replaced by
λ̄ = λ − (U1 − U2)/2. In this regime, the linear conduc-
tanceG consists of three peaks, with each peak associated
with addition of one more quasiparticle to the antidots,
(eν)2
γ1 + γ2
anfν(En+1 − En)
1 + exp[−(En+1 − En)/T ]
, (10)
where an ≡ |〈n + 1|ξ†0|n〉|2. The amplitudes a0, a2 are
effectively single-particle, and thus, independent of the
−10 −5 0 5
 = 2.0∆
 = 1.5∆
T = 0.3∆
−10 −5 0 5
−2 0 2
FIG. 3: Linear conductance G of the antidot line in a ν = 1/3
FQHE liquid (Fig. 1b) as a function of the common anti-
dot energy ǫ relative to the edges. In contrast to electrons
(ν = 1, left inset), tunneling of quasiparticles with fractional
exchange statistics produces non-vanishing conductance peak
associated with transition between the ground states of one
and two quasiparticles. The maximum of this peak is shown in
the right inset (ν = 1/3 – solid, ν = 1 – dashed line) as a func-
tion of the bias δ. The curves are plotted for ∆1 = ∆2, λ = 0,
γ1 = γ2; conductance is normalized to G0 = (eν)
2Γ1(0)/∆1.
exchange statistics: a0 = (ω + λ)/2ω, and a2 = (ω̄ −
λ̄)/2ω̄. By contrast, the amplitude a1 of the transition
from one to two quasiparticles is multi-particle, and is
found from Eqs. (4) and (8) to be strongly statistics-
dependent,
(ω + λ)ω(ω̄ − λ̄)ω̄
cos2(πν/2) . (11)
In particular, a1 vanishes for electron tunneling (ν = 1),
but is non-vanishing in the case of fractional statistics,
e.g., for ν = 1/3, when cos2(πν/2) = 3/4. This is illus-
trated in Fig. 3 which shows the conductance G obtained
by direct solution of the full kinetic equation for tunnel-
ing through the antidots. Qualitatively, the vanishing
amplitude a1 for electrons can be understood as a re-
sult of destructive interference between the two terms in
the two-particle wavefunction which correspond to dif-
ferent ordering of the added/existing electron on the an-
tidot line. The opposite signs of these two terms lead
to vanishing overlap with the single-particle state in the
tunnel matrix element. Fractional statistics of quasipar-
ticles makes this destructive interference incomplete. Fi-
nite bias δ 6= 0 along the line suppresses this interference
making the effect of the statistics smaller. One can still
distinguish the fractional statistics by looking at the de-
pendence of the amplitude of the middle peak of conduc-
tance on the bias δ shown in the right inset in Fig. 3.
In conclusion, we have developed a model of coherent
transport of anyonic quasiparticles in systems of mul-
tiple antidots. In antidot loops, addition of individual
quasiparticles shifts the quasiparticle energy spectrum by
adding statistical flux to the loop. In the case without
loops, energy levels are insensitive to quasiparticle statis-
tics, but the statistics still manifests itself in the quasi-
particle tunneling rates and hence dc tunnel conductance
of the antidot system.
The authors would like to thank F.E. Camino, V.J.
Goldman, J.K. Jain, V.E. Korepin, Yu.V. Nazarov, O.I.
Patu, V.V. Ponomarenko, and J.J.M. Verbaarschot for
discussions. This work was supported in part by NSF
grant # DMR-0325551 and by ARO grant # DAAD19-
03-1-0126.
[1] R.B. Laughlin, Phys. Rev. Lett. 50, 1395 (1983); Rev.
Mod. Phys. 71, 863 (1999).
[2] B.I. Halperin, Phys. Rev. Lett. 52, 1583 (1984).
[3] D. Arovas, J.R. Schrieffer, and F. Wilczek, Phys. Rev.
Lett. 53, 722 (1984).
[4] V.J. Goldman and B. Su, Science 267, 1010 (1995); V.J.
Goldman et al., Phys. Rev. B 64, 085319 (2001).
[5] L. Saminadayar et al., Phys. Rev. Lett. 79, 2526 (1997).
[6] R. de-Picciotto et al., Nature 389, 162 (1997).
[7] F.E. Camino, W. Zhou, and V.J. Goldman, Phys. Rev.
Lett. 95, 246802 (2005); Phys. Rev. B 72, 075342 (2005).
[8] E.-A. Kim, Phys. Rev. Lett. 97, 216404 (2006).
[9] V.J.Goldman, Phys. Rev. B 75, 045334 (2007).
[10] J.K. Jain and C. Shi, Phys. Rev. Lett. 96, 136802 (2006).
[11] B. Rosenow and B.I. Halperin, Phys. Rev. Lett. 98,
106801 (2007).
[12] I. Safi, P. Devillard, and T. Martin, Phys. Rev. Lett. 86,
4628 (2001).
[13] C.L. Kane, Phys. Rev. Lett. 90, 226802 (2003).
[14] I.J. Maasilta and V.J. Goldman, Phys. Rev. B 57, R4273
(1998).
[15] M. Kataoka et al., Phys. Rev. Lett. 83, 160 (1999).
[16] D.V. Averin and K.K. Likharev, in: “Mesoscopic Phe-
nomena in Solids” (eds. B.L. Altshuler, P.A. Lee, and
R.B. Webb) p. 173 (Elsevier, Amsterdam, 1991).
[17] M.R. Geller, D. Loss, and G. Kirczenow, Phys. Rev. Lett.
77, 5110 (1996).
[18] A. Braggio et al., Phys. Rev. B 74, 041304R (2006).
[19] L.P. Kouwenhoven et al. in: “Mesoscopic Electron Trans-
port” (eds. L.L. Sohn, L.P. Kouwenhoven, and G. Schön)
p. 105 (Kluwer, Dordrecht, 1997).
[20] D.V. Averin and J.A. Nestroff, Physica E (2007);
cond-mat/0702614.
[21] E. Fradkin, “Field theories of condensed matter systems”
(Addison-Wesley, 1991), Ch. 7.
[22] J.-X. Zhu and Z.D. Wang, Phys. Rev. A 53, 600 (1996).
[23] M.T. Batchelor, X.-W. Guan, and N. Oelkers, Phys. Rev.
Lett. 96, 210402 (2006).
[24] M. D. Girardeau, Phys. Rev. Lett. 97, 100402 (2006).
[25] D.V. Averin and A.N. Korotkov, Sov. Phys. JETP 70,
937 (1990); C.W.J. Beenakker, Phys. Rev. B 44, 1646
(1991); D.V. Averin, A.N. Korotkov, and K.K. Likharev,
Phys. Rev. B 44, 6199 (1991).
[26] V.V. Ponomarenko and D.V. Averin, JETP Lett. 74, 87
(2001); Phys. Rev. B 70, 195316 (2004); Phys. Rev. B
71, 241308(R) (2005).
[27] K.T. Law, D.E. Feldman, and Y. Gefen, Phys. Rev. B
74, 045319 (2006).
[28] C. de C. Chamon and X.G. Wen, Phys. Rev. Lett. 70,
2605 (1993).
http://arxiv.org/abs/cond-mat/0702614
ABSTRACT
  Coulomb interaction turns anyonic quasiparticles of a primary quantum Hall
liquid with filling factor $\nu =1/(2m+1)$ into hard-core anyons. We have
developed a model of coherent transport of such quasiparticles in systems of
multiple antidots by extending the Wigner-Jordan description of 1D abelian
anyons to tunneling problems. We show that the anyonic exchange statistics
manifests itself in tunneling conductance even in the {\em absence} of
quasiparticle exchanges. In particular, it can be seen as a non-vanishing
resonant peak associated with quasiparticle tunneling through a line of three
antidots.

<|endoftext|><|startoftext|>
Dynamics of a quantum phase transition in a ferromagnetic Bose-Einstein condensate
Bogdan Damski and Wojciech H. Zurek
Theory Division, Los Alamos National Laboratory, MS-B213, Los Alamos, NM 87545, USA
We discuss dynamics of a slow quantum phase transition in a spin-1 Bose-Einstein condensate.
We determine analytically the scaling properties of the system magnetization and verify them with
numerical simulations in a one dimensional model.
Studies of phase transitions have traditionally focused
on equilibrium scalings of various properties near the crit-
ical point. Dynamics of the phase transition presents new
challenges and there is a strong motivation for analyzing
it. Nonequilibrium phase transitions may play a role in
the evolution of the early Universe [1]. Their analogues
can be studied in the condensed matter experiments. The
latter observation led to development of the theory based
on the universality of critical behavior [2], which in turn
resulted in a series of beautiful experiments [3]. The re-
cent progress in the cold atom experiments allows for
time dependent realizations of different models under-
going a quantum phase transition (QPT) [4, 5]. These
experimental developments are only a proverbial tip of
the iceberg, but they call for an in-depth theoretical un-
derstanding of the QPT dynamics.
A QPT is a fundamental change in ground state (GS)
of the system as a result of small variations of an exter-
nal parameter, e.g., a magnetic field [6]. It takes place
ideally at zero absolute temperature, which is in strik-
ing contrast to thermodynamical phase transitions. The
most complete description of the QPT dynamics has been
obtained so far in spin models [7, 8] that are exactly solv-
able. In these systems the gap in the excitation spectrum
goes to zero at the critical point, which precludes the adi-
abatic evolution across the phase boundary. It leads to
creation of excitations whose density and scaling with a
quench rate follow from a quantum version [7, 9] of the
Kibble-Zurek (KZ) theory [1, 2].
We study dynamics of a ferromagnetic condensate of
spin-1 particles [10]. For simplicity, we consider 1D ho-
mogeneous (untrapped) clouds: atoms in a box as in the
experiment [11] with spinless bosons. We adopt the pa-
rameters for our 1D model such that the length and time
scales are comparable to experimental ones [12]. Assum-
ing that the system is placed in a magnetic field B aligned
in the z direction, one gets the following dimensionless
mean-field energy functional [12]
E[Ψ] =
+Q〈Ψ|F 2z |Ψ〉
〈Ψ|Fα|Ψ〉2 (1)
where ΨT = (ψ1, ψ0, ψ−1) describes the m = 0,±1 con-
densate components,
m |ψm|2 = 1, and Fx,y,z are
spin-1 matrices [13]. The first term in (1) is the kinetic
energy, the second and the fourth term describe spin-
independent and spin-dependent atom interactions re-
spectively, the third term is a quadratic Zeeman shift
coming from atom interactions with a magnetic field.
For 87Rb atoms considered here c1 < 0, which results
in an interesting phase diagram due to the competition
between the last two terms in (1). Restricting analysis
to zero longitudinal magnetization case, and introducing
q = Q/(n|c1|), n = Ψ†Ψ
one finds a polar phase for q > 2, described by ΨTP ∼
(0, 1, 0), and the broken-symmetry phase where
ΨTB ∼ (
4− 2qeiχ1 ,
8 + 4qei(χ1+χ−1)/2,
4− 2qeiχ−1)
for 0 ≤ q < 2. The freedom of choosing the χ±1 results
in rotational symmetry of the transverse magnetization
on the (x, y) plane. The transition between these phases
can be driven by the change of the magnetic field B im-
posed on the atom cloud, q ∼ Q ∼ B2 [14], which was
experimentally done in [5].
The dynamics of a QPT depends on the rate of quench
driving the system across the phase boundary. For very
fast “impulse” transition, the system has no time to ad-
just to the changes of the Hamiltonian and arrives in a re-
gion where a new phase is expected with the “old” wave-
function untouched during the evolution. Slow transi-
tions are different: the system has time to “probe” vari-
ous broken symmetry “vacua” in the neighborhood of the
critical point where it gets excited. We are interested in
evolutions that are fast enough to produce macroscopic
excitations of the system, but slow enough to reflect scal-
ings of the critical region. By comparing analytical find-
ings to numerical simulations for experimentally relevant
parameters we provide the first complete description of
QPT dynamics in a ferromagnetic condensate.
Fast transitions were realized in the Berkeley experi-
ment [5]. The 3D numerical simulations closely following
this experiment were reported in [14]. Analytical studies
of the evolution after “impulse” quench were presented in
[15, 16]. The paper of Lamacraft [15] also discusses dy-
namics of non instantaneous transitions in 2D spinor con-
densates focusing on analytical predictions on the growth
of the transverse magnetization correlation functions.
We start with a qualitative discussion. Consider-
ing small perturbations around the GS of the broken-
symmetry phase one finds three Bogolubov modes as in
http://arxiv.org/abs/0704.0440v2
[13] where quantum fluctuations are studied. In the long
wavelength limit (important for slow transitions) there
is only one nonzero eigenvalue mode: the gapped mode
having eigenenergy ∆ ∼
4− q2. Suppose now that we
drive the system from polar to broken-symmetry phase.
The system reaction time to Hamiltonian changes in the
broken-symmetry phase is given by the inverse of the gap:
τ ∼ 1
[7, 9]. For example, when τ is small enough the
evolution becomes adiabatic so the system adjusts fast
to parameter changes. Right after entering the broken-
symmetry phase, the reaction time is large with respect
to the transition time, ∆/ d∆
, and so the system under-
goes the “impulse” evolution where its state is “frozen”.
The gapped mode starts to be populated around the in-
stant t̂ after entering the broken-symmetry phase: the
system leaves the “impulse” regime to catch up with in-
stantaneous GS solution. This happens when the two
time scales become comparable: 1/∆(t̂) ∼ ∆/ d∆
|t=t̂. We
consider here transitions driven by
q(t) = 2− t/τQ, (2)
where τQ is the quench time inversely proportional to the
speed of driving the system through the phase transition.
For slow transitions of interest here, τQ ≫ 1, we obtain
t̂ ∼ τ1/3Q . (3)
In the following we analyze dynamics induced by a
linear decrease of q(t) (2). The evolution starts from
t < 0, i.e., in the polar phase, and ends at t = 2τQ (q =
0). Such q(t) dependence is achieved by ramping down
the magnetic field as ∼
2− t/τQ. The initial state
is chosen as a slightly perturbed GS in the polar phase,
ΨT ∼ (δψ1, 1/
L+δψ0, δψ−1), where |δψm| ≪ 1/
L are
random. We generate the real and imaginary part of δΨm
at different grid points with the probability distribution
p(x) = exp(−x2/2σ2)/
2πσ. We take σ = 10−4 to start
evolution closely to the polar phase GS.
To find the full numerical solution within the mean-
field approximation, we integrate three coupled nonlin-
ear Schrödinger equations for the ψm condensates that
can be easily obtained by the variation of (1). During
evolution we look at the magnetization of the sample
fα = 〈Ψ|Fα|Ψ〉, α = x, y, z.
The transverse magnetization. A total transverse (to the
magnetic field in the z direction) magnetization reads
MT (t) =
dz[f2x(z, t) + f
y (z, t)] =
dz mT , (4)
and is experimentally measurable. It disappears in the
GS of the polar phase and equals (1 − q2/4)/L in the
broken-symmetry GS. Its typical evolution is depicted in
Fig. 1. We see there that nothing happens in the po-
lar phase. The system starts nontrivial evolution in the
0 0.5 1 1.5 2 2.5 3
q(t)=2-t/τ
1.5 1.75 2
0.4 1 4 16 64
FIG. 1: (color) Main plot: numerical solution (black solid
line) vs. static prediction (red dashed line). The arrow depicts
direction of evolution. Inset (a): the same as in the main plot
plus a numerically obtained solution of the linearized problem
(green divergent line). Inset (b): numerical data vs. fit to
τQ ≥ 10 data only (see text for details). In the main plot
τQ = 10 (see [12] for units).
FIG. 2: The vectors represent (fx(z), fy(z)) × 103. Plot (a):
snapshot at q(t = 2.81) = 1.72, i.e., at the first peak in MTL
(see Fig. 1). Plot (b): snapshot by the end of time evolution:
q(t = 20) = 0. The results come from the same numerical
simulation as in Fig. 1 (see [12] for units).
broken-symmetry phase at a distance t̂/τQ after the criti-
cal point was passed. The magnetization grows fast from
that point until it exceeds the static prediction and starts
oscillations with the amplitude decreasing in time. We
consider slow transitions. Therefore, by the end of time
evolution, when q = 0, the system is in the slightly per-
turbed ferromagnetic GS: globallyMTL ≈ 1 (Fig. 1) and
locally L2mT (z) ≈ 1 (Fig. 3). We can now ask: Does the
scaling (3) hold? To find out we define arbitrarily t̂ as the
instant when MTL intersects 1%. A fit to numerics for
τQ ≥ 10 yields ln t̂ = (0.056±0.01)+(0.332±0.002) lnτQ
which confirms prediction (3). This fit is presented in Fig.
1a, where the gradual departure of the numerical data for
τQ < 10 from t̂ ∼ τ1/3Q indicates that τQ ≫ 1 or 37ms has
to be taken for the observation of 1/3 exponent: quench
has to be slow enough to reflect the critical dynamics.
60 65 70 75
FIG. 3: (color) Magnetization of the system at t = 2τQ
(q = 0). The dashed lines facilitate observation of extrema
coincidences. Results come from the same simulation as in
Figs. 1, 2; see [12] for units.
In the GS configuration of the broken-symmetry phase
the vector (fx, fy) can have arbitrary orientation, so in
the dynamical problem considered here it is interesting to
find out how is this rotational symmetry broken. When
unstable evolution starts, spatial correlations in magne-
tization appear (Fig. 2a). In the subsequent evolution
these correlations evolve such that the correlation length
increases: see Fig. 2b obtained by the end of time evolu-
tion. This is a generic picture though the details depend
on the quench time τQ and initial state of the system.
This behavior suggests creation of spin textures [17, 18].
In our case, topological textures are spin configurations
where the magnetization direction varies in space so that
the kinetic energy term in (1) is not minimized, but mag-
netization magnitude follows closely a GS result. Such
structures appear in 1D when the first homotopy group
of the vacuum manifold M is nontrivial, which happens
here: π1(M) = Z [19]. These textures are characterized
by the winding number, 1
Arg(fx + ify), which
is not conserved. Indeed, it reads +1 in Fig. 2a, while
by the end of that evolution (Fig. 2b) it equals 0.
Are different stages of this evolution experimentally
observable? Let’s look at τQ = 10 case presented in Figs.
1-3. The evolution from the phase boundary to the first
peak in magnetizationMT (the q = 0 point) takes 2.81×
37ms ∼= 104ms (2τQ = 740ms). Both these time scales
are well within the reach of the experiment [5].
The longitudinal magnetization. Initially, fz(z) ≈ 0 so
dzfz ≈ 0. The conservation of the latter allows
only for creation of a network of magnetic domains (non-
topological structures with fixed fz sign) having opposite
polarizations. The domains appear by the time when the
system enters unstable evolution and the maxima of |fz|
tend to move towards the minima of mT (Fig. 3). More
quantitatively, we performed Nr evolutions starting from
different initial conditions, but fixed σ. As in the exper-
iment [5], we average over these runs to wash out shot-
to-shot fluctuations. In Fig. 4 we plot the mean domain
size: ξ =
i ξz(i)/Nr, where i = 1, ..., Nr and ξz(i) is
the mean domain size in the i-th run. As shown in Fig.
4a, for t . t̂ we observe ξ ≈ f(t/τ1/3Q ) as for MT (t). The
domains are formed on a time scale of ∼ t̂. A simple anal-
ysis based on KZ theory [1, 2] suggests that their charac-
teristic post-transition size, ξ̂, should be roughly given by
∫ ∼t̂
dtvs(t), where vs(t) is a sound velocity. There are two
sound modes in the broken-symmetry phase that prop-
agate both spin and density fluctuations [13]: the faster
(slower) one has velocity ∼ √c0 (∼
q|c1|). Putting any
of these as vs into the integral, and assuming τQ ≫ 1 for
the slower mode, we get
ξ̂ ∼ τ1/3Q . (5)
This result correctly predicts the scaling property of
the size of post-transition “defects” as is evident from
the overlap of different curves in Fig. 4, which shows
up for τQ ≥ 25 or 0.9s. Quantitatively, we define ξ̂
as the value of ξ averaged over q ∈ [1/2, 1] to wash
out post-transition fluctuations. A fit got us ln ξ̂ =
(−0.38 ± 0.03) + (0.30 ± 0.01) ln τQ, in good agreement
with (5). The fit was done to τQ ≥ 30 data and is pre-
sented in Fig. 4b which illustrates that smaller τQ data
gradually departs from 1/3 scaling law.
Now we focus on the analytical calculations provid-
ing predictions about early stages of time-evolution.
We assume that the wave-function stays close to
the polar phase GS, ΨT = (δψ1(t), 1/
δψ0(t), δψ−1(t)) exp(−iµt), where the chemical potential
is µ = c0/L, |δψm| ≪ 1/
L, and
dz(δΨ0 + δΨ
0) ≡ 0
to keep
dzΨ†Ψ = 1 + O(δΨ2). Linearizing the cou-
pled nonlinear-Schrödinger equations that describe the
system we get fχ = ReGχ, where χ = x, y, Gx =√
2(δΨ1 + δΨ−1)/
L, Gy = i
2(δΨ1 − δΨ−1)/
L, and
Gχ = −
qGχ −
(Gχ +G
where α = 2|c1|/L. To solve this equation we go to
momentum space, aχ(k) =
dzfχ exp(ikz) and bχ(k) =
dzImGχ exp(ikz), getting
0 k2 + αq
2α− k2 − αq 0
. (6)
Diagonalizing the matrix from Eq. (6) we see that there
is instability for k2/α < 2 − q as in the Bogolubov spec-
trum of this model [13]. Thus, the system is stable in
the polar phase, and so small initial perturbations do
not grow during the evolution towards broken-symmetry
phase. The instability for q < 2 is responsible for the
magnetization jump in Fig. 1 and the subsequent break-
down of the linear approach (Fig. 1b).
To solve Eq. (6) with q(t) given by (2) we derive the
equation for d2aχ(t)/dt
2, keep leading order terms in the
slow transition (τQ ≫ 1) and long-wavelength (k2/α ≪
2) limits, and get that
aχ(k, t) = αkχAi(s)+βkχBi(s),
, (7)
0 0.5 1 1.5 2
q(t)=2-t/τ
0.75 1.5
4 16 64
FIG. 4: (color) Dynamics of magnetic domains in fz. Black
line (τQ = 30), red line (τQ = 50), green line (τQ = 80). The
arrow show direction of evolution on the main plot. Inset (a):
early stages of ξ(t) evolution. Inset (b): dependence of the
typical post-transition domain size, ξ̂, on quench time. The
fit was done to τQ ≥ 30 data (see text for details). The figure
shows results averaged over Nr = 44 runs; see [12] for units.
where κ = (α2/2)1/3, αkχ and βkχ are constants given
by initial conditions, while Ai and Bi are Airy func-
tions. From (7) we see that the instability arises from
unbounded increase of the Bi(s) function happening for
s > 0, i.e., k2/α < 2−q(t), which is a dynamical manifes-
tation of the static result for unstable modes. This solu-
tion works till t ∼ t̂ ∼ τ1/3Q when a significant increase of
fχ invalidates the linearized theory: this calculation rig-
orously derives scaling (3). Additionally, the solution (7)
can be reliably used as long as τQ ≫ 1 or 37ms, which
is also supported by numerics (Fig. 1a). The quench
time scale in the experiment [5] is much smaller than
this bound. Finally, these results hold for any initial
state spread over the k modes.
The (re)scalings t/τ
Q and ξ̂ ∼ t̂ ∼ τ
Q derived above
in a 1D system were also found by different means in a 2D
spinor condensate [15]. A trivial extension of our mean-
field analytical calculations to 2D and 3D systems shows
that they hold for any number of spatial dimensions.
To summarize, we have developed a theory of the dy-
namics of symmetry-breaking in the quantum phase tran-
sition inspired by the experiment [5], but for the range of
quench rates that are sufficiently slow so that the critical
scalings can determine phase transition dynamics. This
regime should be accessible by a “slower” version of the
quench [5]. Our analysis points to a Kibble-Zurek-like
scenario, where the state of the system departs from the
old symmetric vacuum with a delay ∼ t̂ after the critical
point was crossed. This sets up an initial post-transition
state with a characteristic length scale ξ̂ (5). This scale
should determine the initial density of topological fea-
tures. In our 1D simulations textures appear, but we
predict that in real 3D experiments other topological de-
fects are created (as they were in [5]), and the distance
between them should be initially ∼ ξ̂. Such topological
defects are more stable than textures so measurement of
their density should be possible and would be a good test
of the theory we have presented.
[1] T.W.B. Kibble, J. Phys. A 9, 1387 (1976); Phys. Rep.
67, 183 (1980).
[2] W.H. Zurek, Nature (London) 317, 505 (1985); Acta
Phys. Pol. B 24, 1301 (1993); Phys. Rep. 276, 177 (1996).
[3] I. Chuang et al., Science 251, 1336 (1991); M.J. Bowick
et al., ibid. 263, 943 (1994); C. Bauerle et al., Nature
(London) 382, 332 (1996); V.M.H. Ruutu et al., ibid.
382, 334 (1996); S. Ducci et al., ibid. 83, 5210 (1999); A.
Maniv, E. Polturak, and G. Koren, Phys. Rev. Lett. 91,
197001 (2003); R. Monaco et al., ibid. 96, 180604 (2006).
[4] M. Lewenstein et al., Adv. Phys. 56, 243 (2007).
[5] L.E. Sadler, J.M. Higbie, S.R. Leslie, M. Vengalattore,
and D. M. Stamper-Kurn, et al., Nature (London) 443,
312 (2006).
[6] S. Sachdev, Quantum Phase Transitions (Cambridge
University Press, Cambridge UK, 2001).
[7] W.H. Zurek, U. Dorner, and P. Zoller, Phys. Rev. Lett.
95, 105701 (2005).
[8] J. Dziarmaga, Phys. Rev. Lett. 95, 245701 (2005); R.W.
Cherng and L.S. Levitov, Phys. Rev. A 73, 043614
(2006).
[9] B. Damski, Phys. Rev. Lett. 95, 035701 (2005). B.
Damski and W.H. Zurek, Phys. Rev. A 73, 063405
(2006).
[10] T.-L. Ho, Phys. Rev. Lett. 81, 742 (1998); T. Ohmi and
K. Machida, J. Phys. Soc. Jpn. 67, 1822 (1998).
[11] T.P. Meyrath et al., Phys. Rev. A 71, 041604(R) (2005).
[12] The experiment [5] is done in the harmonic potential
mω2z(λ
2 + λ2yy
2 + z2)/2, where m is the 87Rb mass,
ωz = 2π×4.3Hz, λx = 13, and λy = 81.4. We use the os-
cillator units through the paper for time (1/ωz = 37ms),
length (
~/mωz = 5.1µm), and energy (~ωz), and as-
sume that the system stays in the harmonic oscillator
ground states in x and y directions. Then, after skip-
ping constant terms, the 3D energy functional [10] re-
duces to dimensionless (1) with the additional Ψ†Ψz2/2
term, and ci = 2Nαi
mωz/~
λxλy/3 (N = 2× 106 is
the atom number, α0 = 16nm, and α1 = −α0/216.1).
To approximate such a system by a box we assume that
the total density of atoms in the harmonic trap center
is the same as in the box of size L. Neglecting in the
Thomas-Fermi limit the first and the last term of (1) we
get L = 2(
c02/3)
2/3 ≈ 78, i.e., 78× 5.1µm ≈ 0.4mm.
[13] K. Murata, H. Saito, and M. Ueda, Phys. Rev. A 75,
013607 (2007).
[14] H. Saito, Y. Kawaguchi, and M. Ueda, Phys. Rev. A 75,
013621 (2007).
[15] A. Lamacraft, Phys. Rev. Lett. 98, 160404 (2007).
[16] M. Uhlmann, R. Schützhold, and U.R. Fischer,
cond-mat/0612664.
[17] G.J. Stephens, Phys. Rev. D 61, 085002 (2000).
[18] A. Vilenkin and E.P.S. Shellard, Cosmic strings and
other topological defects (Cambridge University Press,
Cambridge, 1994).
[19] H. Mäkelä, Y. Zhang, and K.-A. Suominen, J. Phys. A
36, 8555 (2003).
http://arxiv.org/abs/cond-mat/0612664
ABSTRACT
  We discuss dynamics of a slow quantum phase transition in a spin-1
Bose-Einstein condensate. We determine analytically the scaling properties of
the system magnetization and verify them with numerical simulations in a one
dimensional model.

<|endoftext|><|startoftext|>
Introduction
Precise radial velocity (RV) measurements are a well established technique in detecting
extrasolar planets around non-active stars, like solar-type stars with similar masses and ages
to our Sun (see e.g., Butler et al. 2006). This technique has been also applied in the late
1980’s for planet searches around cool evolved stars (Cochran & Hatzes 1989). However, the
number of extrasolar planets around such non solar-type stars is still very small compared to
planets around solar-like stars. The situation for young stars is similar, where practically no
convincing case is known so far. Planet detections around young and active stars are indeed
much more difficult than those around evolved and quiet solar-like stars.
Many young stars possess high levels of stellar activity and are also known as fast
rotators. Spectroscopically this is indicated by strong line broadening and the presence of
emission line features, in particular Hα (λ6536 Å), Ca II H (λ3967 Å) and K (λ3934 Å).
Within the same spectral class the stellar activity of young stars is considerably higher than
for older stars. The rotational velocity of F-, G- and K-type young stars can be as high as
a few hundreds km/s which can be observed by strong line broadening. This makes precise
RV measurements very difficult. Intrinsic stellar activity, like non-radial pulsations and
rotational modulation, manifests itself in RV variation. In order to distinguish the sources of
RV variation in active stars, the stellar spectra have to be investigated carefully, for instance,
via the bisector analysis (e.g., Hatzes 1996) and stellar activity indicators, like Ca II H &
K emission lines and variation in Hα line, to avoid a misinterpretation of the observed RV
variation. This kind of analysis is indispensable for planet searches around active young
stars.
The search for young planetary systems by the RV technique is indeed limited to young
stars which do not show a high activity level. Such a high stellar activity affects the accuracy
of the RVmethod, like in stars with high rotational velocity (v sin i > 20 km/s). Nevertheless,
in comparison to other young planet search methods, like the direct imaging techniques, the
RV method is more sensitive to planetary companions with closer orbits, i.e., less than 10 AU
to the parent stars. A further advantage compared to direct imaging is, that the RV method
is not strongly limited by distance. It can be applied to planet searches in nearby young
moving groups (30–70 pc) and star-forming regions at > 100 pc (e.g., the Taurus-Auriga
region at 140 pc), for which direct imaging methods are not possible.
This work reports the discovery of a planetary companion around the nearby young
star HD 70573. Our RV measurements of HD 70573 show a periodic variation on a time
scale which is much longer than the stellar rotational period. This excludes rotational
modulation as the source of RV variation. We will show that the bisector technique allows us
to distinguish intrinsic stellar activity (non-radial pulsations or stellar rotational modulation
– 3 –
due to starspots) from variability due to companions. By measuring the bisector velocity
spans we detected rotational modulation in other young stars of our sample (Setiawan et
al., in preparation). The planet detection around HD 70573 is concluded by the lack of the
correlation between the observed RVs and stellar activity indicators (Sect. 4).
2. HD 70573: A nearby young star
HD 70573 was identified by Jeffries (1995) as a Lithium rich star. He predicted an
age of this star to be substantially younger than 300 Myrs. In a study of young stellar
kinematic groups by Montes et al. (2001a), HD 70573 has been classified as a member of
the Local Association (Pleiades moving group) with an age range between 20 and 150 Myrs.
Later, Lopéz-Santiago et al. (2006) classified HD 70573 as a member of the Hercules-Lyra
association, a group of stars comoving in space towards the constellation of Hercules. This
moving group has an estimated age of ∼200 Myrs. By comparing the equivalent width of Li
λ6708 Å versus the spectral type diagram (Fig. 2 in Montes et al. 2001b), we derived an
age within the Pleiades age regime (78–125 Myrs).
The stellar parameters of HD 70573 are compiled in Table 1. We measured the equiv-
alent widths (EW) of neutral and ionized lines as described in Gray (1992). By comparing
our EW measurements with the EWs of standard stars adopted from Cayrel de Strobel
(2001) and by using the relation between EWs and temperature we derive the spectral type
of G1-1.5V for HD 70573. The stellar parameters Teff , [Fe/H], log g have been calculated by
using the TGV model (Takeda et al. 2002), which computes the stellar parameters from the
EW of FeI and Fe II.
The absolute visual magnitude has been calculated from the visual brightnessmV = 8.70
mag and the distance d = 45.7 pc (Lopéz-Santiago et al. 2006). Henry et al. (2005) has
measured photometric variations of HD 70573 and found a period of 3.296 days, which cor-
responds to the rotational period of the star. We measured the projected rotational velocity
v sin i from the spectral lines by using a cross-correlation method (Benz & Mayor 1981) with
the instrumental calibration from Setiawan et al. (2004). Our measured value (see Table 1)
is slightly higher than the value published by Henry et al. (1995), who derived v sin i = 11
km/s.
– 4 –
Fig. 1.— RV measurements of HD 70573. We observed a long-period RV variation of 852
days and short-period variation of few days (see text).
3. Observations and results
We are carrying out a RV survey of a sample of young stars with FEROS at the
2.2 m MPG/ESO telescope located at ESO La Silla Observatory, Chile. The spectrograph
has a resolution of R = 48 000 and a wavelength coverage of 3600–9200 Å (Kaufer & Pasquini
1998).
The data reduction has been performed by using the online pipeline, which produces
39 orders of one-dimensional spectra. The RVs have been measured with the simultaneous
calibration mode of FEROS and a cross-correlation technique (Baranne et al. 1996). During
the period of three years we obtained a long-term stability of FEROS that is about 10 m/s.
RV measurements of HD 70573 are shown in Fig. 1. We observed a long-term RV varia-
tion with a period of 852±12 days, which is much longer than the period of the photometric
– 5 –
Fig. 2.— Lomb-Scargle Periodogram of the RV variation of HD 70573
variability. The semi-amplitude of the RV variation is 149±16 m/s. A Lomb-Scargle peri-
odogram (Scargle 1982) of the RVs show the highest peak in the power, which corresponds
to the long-period RV variation. On a smaller time scale of several days we also detected
short-term RV variations. In the Lomb-Scargle periodogram we also found a lower peak in
the power, which corresponds to a period of ∼ 2.6 days. This is comparable to the period
in the photometric variation detected by Henry et al. (1995). The False Alarm Probability
(FAP) of the peaks are 1.1 × 10−3 for the long-period RV variation and 3.5 × 10−2 for the
short-period one. Additional RV measurements, taken with interval of few hours in several
consecutive days, may increase the power in the frequency region that corresponds to the
period of ∼3 days.
– 6 –
Fig. 3.— Bisector velocity span vs. RV of HD 70573. The figure shows no correlation
between both quantities. This favors the presence of a low-mass companion rather than
stellar activity as the source of RV variation.
4. Testing the stellar activity
As detected in many surveys, young stars show high stellar activity, characterized by
strong X-ray, Hα, Ca II H and K emission. In addition, they are also known as fast rotators.
For example, large surveys of young stars in star-forming regions such as NGC 2264 (Lamm
et al. 2004) show that the objects are often fast rotators with periods between 0.2 and 15
days. Stellar magnetic activity manifests itself by starspots and granulation, as observed in
the Sun. Pulsations have also been observed in young stars (e.g., Marconi et al. 2000).
To measure the stellar activity of HD 70573 we investigated the variation of the Ca II
K emission line (λ3934 Å ) and Hα. We did not use of the Ca II H (λ3967 Å) to avoid the
blend which can be caused by the Hǫ line of the Balmer series. Similar to the method used
– 7 –
by Santos et al. (2000), we computed an activity index by measuring the intensity of the
Ca II K relative to the intensities of 2 Å windows located in the blue and red part of the
spectra, which are close to the Ca II K region and do not have strong absorption features.
Our measurements do not show any long period variation which might be correlated with
the RV variation. The relative rms of the S-index variation is 4.5% of the mean value.
In addition, we also measured the equivalent width (EW) variation of the Hα line and Teff
variation by using the line-ratio technique (e.g., Catalano et al. 2002) to search for the stellar
activity. The EW measurements of the Hα line give a value of 961±45 mÅ. The rms of 45
mÅ corresponds to 4.7% variation in the EW, that is similar to the variation observed in the
Ca II K emission line. We observed a short-term Teff variation with a peak-to-peak value
of ∼220 K and a period of few days, which is close to the stellar rotational period. This
result means an approximately 4% variation in Teff (Table 1) and thus in good agreement
with other stellar activity indicators. However, we did not find any long-term periodicity.
The equivalent width variation of the Hα line also does not show any long period variation.
The stellar activity will leave imprints on the spectral line profile. Another possibility to
characterize the stellar activity in the spectra is by using the bisector or the bisector velocity
span (Hatzes 1996), which measures the asymmetry of the spectral line profile. Equivalently,
the bisector velocity span method can be applied to the cross-correlation function used for
the RV computation (Queloz et al. 2001). A correlation between bisector velocity spans and
RVs should be expected, if the activity is responsible for the RV variation. In contrast to
non-active solar-like stars, the bisector velocity spans of active stars are not constant. The
scatter in the velocity spans may provide information about the activity level of the star.
In HD 70573 we found no correlation between the bisector velocity spans and RVs
(Fig. 3). Thus, based on the results of our analysis of the Ca II K emission lines, Hα,
temperature variation and bisector velocity spans as stellar activity indicators we conclude
that the observed long-period RV variation of HD 70573 is most likely due to the presence
of a low-mass (substellar) companion.
5. Discussion
We computed an orbital solution for the RV data of HD 70573 by using a standard
Keplerian fit with χ2 minimization. The orbital parameters are listed in Table 2. HD 70573 b
is probably the youngest extrasolar planet detected so far with the RV technique (Fig. 4).
Planet discoveries around young stars provide important constraints for theories of
planet formation. An example is the migration process of planets occurring in the gas-rich
– 8 –
Fig. 4.— A histogram of the ages of exoplanets as of November 2006. HD 70573 b is the
youngest planet detected so far by the RV method.
phases of protoplanetary disks. The detection of young planets will also allow us to study
the relation between extrasolar planets and the structure of debris disks (Moro-Mart́ın et
al. 2006). Since HD 70573 is part of the young star sample of the SPITZER/FEPS legacy
program (Meyer et al. 2004), the detection of a planetary companion around this star is of
great interest for the study of the relation between debris disks and planets. With a spectral
type of G1-1.5V and an age of only 3–6 % of the age of the Sun, the planetary system around
HD 70573 could resemble the young Solar system.
More planet discoveries around young stars will certainly improve our understanding
of planetary systems in their early evolutionary stages. Since planet searches around young
stars via the RV method are restricted to the visual wavelength region and are strongly af-
fected by stellar activity, other detection techniques like, e.g., NIR direct imaging or astrom-
etry, are gaining importance and will most likely soon deliver more discoveries. Astrometric
– 9 –
measurements with a precision level of few tens of µas, for example, will be able to detect
the astrometric signal of the planet around HD 70573, which is ∼0.23 mas.
Finally, with the detection of a planetary companion around the young star HD 70573
we have shown, that the RV technique is still potentially profitable for the planet search
programs.
We thank the La Silla Observatory team for the assistance during the observations at
the 2.2 m MPG/ESO telescope.
Facilities: FEROS, 2.2 m MPG/ESO.
REFERENCES
Baranne, A., Queloz, D., Mayor, M., et al. 1996, A&AS, 119, 373
Benz, W. & Mayor, M. 1981, A&A, 93, 235
Butler, R. P., Wright, J. T., Marcy, G. W., et al. 2006, ApJ, 646, 505
Catalano, S., Biazzo, K., Frasca, A. et al. 2002, A&A, 394, 1009
Cayrel de Strobel, G., Soubiran, C., & Ralite, N. 2001, A&A, 373, 159
Cochran, W. D. & Hatzes, A. P. 1989, BAAS, 21, 114
Gray, D. F., 1992, The Observation and Analysis of Stellar Photosphere, Cambridge Uni-
versity Press
Hatzes, A. P. 1996, PASP, 108, 839
Henry, G. W., Fekel, F. C., & Hall, D. S. 1995, AJ, 110, 2926
Jeffries, R. D. 1995, MNRAS, 273, 559
Lamm, M., Mundt, R., Bailer-Jones, C. et al. 2004, A&A, 417, 557
Lopéz-Santiago, J., Montes, D., Crespo-Chacon, L. & Fernandez-Figueroa, M.J. 2006, ApJ,
643, 1160
Kaufer, A., & Pasquini, L. 1998, The Messenger, 95
Marconi, M., Ripepi, V., Alcalá, J.M. et al. 2000, A&A, 355, L35
– 10 –
Meyer, M. R., Hillenbrandt, L. A., Bachman, D. E. et al. 2004, ApJS, 154, 422
Montes D., Lopéz-Santiago, J., Galvez, M. C. et al. 2001, MNRAS, 328, 45
Montes D., Lopéz-Santiago, J., Fernández-Figueroa, M. J. et al. 2001, A&A, 379, 976
Moro-Mart́ın, A., Carpenter, J. M., et al. 2006, astro-ph/0612242
Queloz, D., Henry, G. W., Sivan, J. P., et al. 2001, A&A, 379, 279
Santos, N.C., Major, M., Naef, D. et al., 2000, A&A, 361, 265
Scargle, J. D. 1982, ApJ, 263, 835
Setiawan J., Pasquini, L., da Silva, L. et al., 2004, A&A, 421, 241
Takeda, Y., Ohkubo, M. & Sadakane, K. 2002, PASJ, 54, 451
This preprint was prepared with the AAS LATEX macros v5.2.
http://arxiv.org/abs/astro-ph/0612242
– 11 –
Table 1: Stellar parameters of HD 70573.
Spectral type G1-1.5V
MV 0.4 mag
distance 45.7 pc
m 1.0 ±0.1 M⊙
Teff 5737 ±70 K
[Fe/H ] -0.18 ±0.2 [Fe/H]⊙
log g 4.59 ±0.1
EW (Li) 156 ±20 mÅ
Age 78–125 Myrs
v sin i 14.7 ±1.0 km/s
Prot 3.296 days
Table 2: Orbital parameters of HD 70573 b
P 851.8 ± 11.6 days
K1 148.5 ± 16.5 m/s
e 0.4 ± 0.1
ω 269.6 ± 14.3 deg
JD0 − 2450000 2106.54 ± 25.72 days
reduced χ2 1.34
O − C 18.7 m/s
m1 1.0 ± 0.1 M⊙
m2sini 6.1 ± 0.4 MJup
a 1.76 ± 0.05 AU
	Introduction
	HD 70573: A nearby young star
	Observations and results
	Testing the stellar activity
	Discussion
ABSTRACT
  We report evidence for a planetary companion around the nearby young star HD
70573. The star is a G type dwarf located at a distance of 46 pc with age
estimation between 20 and 300 Myrs. We carried out spectroscopic observations
of this star with FEROS at the 2.2 m MPG/ESO telescope at La Silla. Our
spectroscopic analysis yields a spectral type of G1-1.5V and an age of about
100 Myrs. Variations in stellar radial velocity of HD 70573 have been monitored
since December 2003 until January 2007. The velocity accuracy of FEROS within
this period is about 10 m/s. HD 70573 shows a radial velocity variation with a
period of 852 +/- 12 days and a semi-amplitude of 149 +/- 6 m/s. The period of
this variation is significantly longer than its rotational period, which is 3.3
days. Based on the analysis of the Ca II K emission line, Halpha and Teff
variation as stellar activity indicators as well as the lack of a correlation
between the bisector velocity span and the radial velocity, we can exclude the
rotational modulation and non-radial pulsations as the source of the
long-period radial velocity variation. Thus, the presence of a low-mass
companion around the star provides the best explanation for the observed radial
velocity variation. Assuming a primary mass m1=1.0 +/- 0.1 Msun for the host
star, we calculated a minimum mass of the companion m2sini of 6.1 Mjup, which
lies in the planetary mass regime, and an orbital semi-major axis of 1.76 AU.
The orbit of the planet has an eccentricity of e=0.4. The planet discovery
around the young star HD 70573 gives an important input for the study of debris
disks around young stars and their relation to the presence of planets.

<|endoftext|><|startoftext|>
Introduction
	Quantum Description of X waves
	Generation and Detection of quantum X Waves
	Conclusions
	Acknowledgments
	References
ABSTRACT
  We show that two distinct quantum states of the electromagnetic field can be
associated to a classical vector X wave or a propagation-invariant solution of
Maxwell equations. The difference between the two states is of pure quantum
mechanical origin since they are internally entangled and disentangled,
respectively and can be generated by different linear or nonlinear processes.
Detection and generation of Schr\"odinger-cat states comprising two entangled
X-waves and their possible applications are discussed.

<|endoftext|><|startoftext|>
Introduction
A precise knowledge of the neutron-neutron scattering length ann is, e.g., important for an understand-
ing of the effects of charge symmetry breaking in nucleon–nucleon forces [1]. The scattering length
ann characterizes scattering at low energies. It is related to the on–shell
1S0 scattering amplitude f
fon(pr) =
pr cot δ(pr)− ipr
−a−1nn + 12 rnnp2r +O(p4r)− i pr
, (1)
where pr is the relative momentum between the two neutrons, δ(pr) the scattering phase shift in the
1S0 partial wave and rnn is the effective range. At low energies the terms of order p
r can be neglected
to very high accuracy. Obviously, a direct determination of ann in a scattering experiment is extremely
difficult due to the absence of a free neutron target. For this reason, the value for ann is to be obtained
from analyses of reactions where there are three particles in the final state, e.g. π−d → γnn [2, 3, 4]
or nd → pnn [5, 6, 7]. There is some spread in the results for ann obtained by the various groups. In
particular, two independent analyses of the reaction nd → pnn give significantly different values for
ann, namely ann = −16.1± 0.4 fm [6] and ann = −18.7± 0.6 fm [7], whereas the latest value obtained
from the reaction π−d → γnn is ann = −18.5 ± 0.3 fm [4]. At the same time, for the proton-proton
scattering length, which is directly accessible, a very recent analysis reports app = −17.3 ± 0.4 fm
[8] after correcting for electromagnetic effects. This means that even the sign of ∆a = app − ann is
http://arxiv.org/abs/0704.0443v1
not fixed.1 It should be mentioned, however, that state of the art calculations for the binding energy
difference of tritium and 3He suggest that ∆a > 0 [9, 10].
In the present work we discuss the possibility to determine ann from differential cross sections in
the reaction γd → π+nn. Specifically, we show that one can extract the value of ann reliably by fitting
the shape of a properly chosen momentum spectrum. In this case the main source of inaccuracies,
caused by uncertainties in the single–nucleon photoproduction multipole E0+, is largely suppressed.
Furthermore there is a suppression of the quasi-free pion production at specific angles. We show that
at these angular configurations the extraction of ann can be done with minimal theoretical uncertainty.
Our investigation is based on the recent work of Ref. [11] in which the transition operator for
the reaction γd → π+nn was calculated up to order χ5/2 in chiral perturbation theory (ChPT) with
χ = mπ/MN ≃ 1/7, where mπ (MN ) is the pion (nucleon) mass. Half-integer powers of χ in the
expansion arise from the unitarity (two– and three–body) cuts (see also [12]). The results of Ref. [11]
for the total cross section are in very good agreement with the experimental data. The only input
parameter that entered the calculation was the leading single–nucleon photoproduction multipole E0+,
which was fixed from a fourth-order one-loop calculation of Bernard et al. [13]. The uncertainty in
E0+ is the main theoretical error in the calculation presented in Ref. [11]. Besides this transition
operator, in the present study we use nucleon–nucleon (NN) wave functions constructed likewise
in the framework of ChPT, namely those of the NNLO interaction of Ref. [14]. This allows us to
estimate the theoretical uncertainty which arises from variations in the wave functions. In fact, as
soon as we include consistently all terms up to order χ5/2, we expect the ambiguities due to different
wave functions not to be larger than a χ3 correction, for only at this order the leading counter term
which absorbs these effects enters. This expectation is indeed quantitatively confirmed in the concrete
calculations.
Since we work within chiral perturbation theory we can estimate the effect of higher orders in
terms of established expansion parameters together with the standard assumption that additional
short ranged operators, that enter at higher orders, behave in accordance with the power counting
(the so-called naturalness assumption). This method was also applied in Refs. [15, 16], where the
reaction π−d → γnn was investigated as a tool to extract ann. However, to know the effect of higher
orders for sure, one has to calculate them. Therefore, to derive a reliable uncertainty estimate for the
extraction of ann from the γd reaction, we use our leading order calculation as baseline result and
estimate the theoretical uncertainty from the effects of the higher orders that we calculated completely.
Based on this, we find a theoretical uncertainty δann . 0.1 fm. We therefore argue that the reaction
γd → π+nn appears to be a good tool for the extraction of ann.
To end this section, we remark that in Ref. [17] a method was proposed to extract scattering
lengths from γd induced meson production. However, this approach should not be used here, since
the momentum transfer is not sufficiently large to use this method and with our explicit calculation
of the transition operator we can reach a significantly higher accuracy.
2 ChPT calculation for γd → π+nn
The diagrams that contribute to the reaction γd → π+nn are shown on Fig. 1. The kinematical
variables are defined in Fig. 2.
Before going into the details some comments are necessary regarding the relevant scales of the
problem. In the near threshold regime of interest here (excess energies of at most 20 MeV above the
1Note, that, in contrast to app, ann is not corrected for electromagnetic effects. However, since those are only of the
order of 0.3 fm [1] they are not relevant for the sign of ∆a. But they ought to be taken into account for determining
charge symmetry breaking effects quantitatively.
χ , χ , χ χ , χ0 1 2 2 5/2
(d2)(b2)
(b1) (c1) (d1)
Figure 1: Diagrams for γd → π+nn. Shown are one–body terms (diagram (a) and (b) ), as well as the
corresponding rescattering contribution (c)—all without and with final state interaction. Diagrams
(d) shows the class of diagrams with intermediate NN interaction. Solid, wavy, and dashed lines
denote nucleons, photons and pions, in order. Filled squares and ellipses stand for the various vertices
(see Ref. [11] for the details), the hatched area shows the deuteron wave function and the filled circle
denotes the nn scattering amplitude. Crossed terms (where the external lines are interchanged) are
not shown explicitly.
pion production threshold) the outgoing pion momenta are small compared even to the pion mass.
Thus, in addition to the conventional expansion parameters of ChPT mπ/Λχ and qγ/Λχ, where Λχ
denotes the chiral symmetry breaking scale of order of (and often identified with) the nucleon mass,
and qγ denotes the photon momentum in the center–of–mass system which is of order of the pion
mass, we can also regard kπ/mπ as small, where kπ denotes the momentum of the outgoing pion. In
what follows we will perform an expansion in two parameters, namely
χm = mπ/MN and χQ = kπ/mπ .
Obviously, the value of the second parameter depends on the excess energy Q. The energy regime
of interest to us corresponds to excess energies up to 20 MeV. The maximum value of χQ, χ
2Q/mπ, at the highest energy considered is thus about 1/2. Since this is numerically close to
we use the following assignment for the expansion parameter:
χ ∼ χm ∼ χ2Q . (2)
The tree level γp → π+n vertex, as it appears in diagrams (a1) and (a2) in Fig 1 (the vertex is labeled
as filled square), contributes at leading order (order χ0), and orders χ1 and χ2, depending on the
one–body operator used. Note that the loop diagrams with πN rescattering (see diagrams (b), (c)
and (d) in Fig. 1) contribute at order χ2m as well as at χ
mχQ, χ
m and at χ
Q. The origin of the
non–integer power of χ are the two–body (πN) and three–body (πNN) singularities. Thus, all terms
up to χ5/2 are explicitly taken into account in our calculation of the transition operator.
As already emphasized, we employ wave functions evaluated in the same framework in order to
have a fully consistent calculation. In our work, we use the N2LO wave functions corresponding to
the chiral NN forces introduced in Ref. [18] and based on the spectral function regularization (SFR)
scheme [19]. At this order, the NN force receives contributions from one-pion exchange, two-pion
p−q 2
Figure 2: Kinematical variables for γd → π+nn. The relative neutron–neutron momentum is defined
as ~pr =
(~p1 − ~p2).
exchange at the subleading order as well as from all possible short-range contact interactions with up
to two derivatives. In addition, the dominant isospin-breaking correction due to the charged-to-neutral
pion mass difference in the one-pion exchange potential together with the two leading isospin-breaking
S-wave contact interactions were taken into account [18]. The two corresponding low-energy constants
were adjusted to reproduce the scattering lengths ann and app. The SFR cutoff Λ̃ is varied in the range
500 . . . 700 MeV. It was argued in Ref. [19] that such a choice for Λ̃ provides a natural separation of
the long- and short-range parts of the nuclear force and allows to improve the convergence of the chiral
expansion [19]. The cutoff Λ in the Lippmann-Schwinger equation is varied in the range 450 . . . 600
MeV. For an extensive discussion on the choice of Λ and Λ̃ the reader is referred to [14, 18].
3 Differential cross sections: relevant features
In this section we outline the features of the differential cross section for unpolarized particles that are
important for our considerations. For later convenience let us consider the function F proportional to
the square of the matrix element as well as the five–fold differential cross section
F (pr, θr, φr, θπ, φπ) = C pr kπ(pr)|M(pr, θr, φr, θπ, φπ)|2 ∝
d5σ(pr, θr, φr, θπ, φπ)
dΩ~prdΩ~kπ
, (3)
where ~pr (~kπ) stands for the relative momentum of the two final neutrons (momentum of the final
pion) in the center–of–mass frame, θr, φr (θπ, φπ) for the corresponding polar and azimuthal angles,
respectively, and |M|2 for the squared and averaged amplitude. In Eq. (3) C is an irrelevant dimen-
sionful constant. In what follows we will consider only shapes of cross sections and therefore the value
of C is not important for our considerations. The value of kπ at given pr and excess energy Q is fixed
by energy conservation:
, (4)
hence we write kπ(pr) in Eq. (3).
In the following we choose the momentum ~qγ of the initial photon to be along the z–axis. Then
the cross sections at a certain excess energy Q depend on four variables, namely the magnitude of
the relative momentum of the two final neutrons pr, the polar angles of the vectors ~pr and ~kπ, and
the difference between the azimuthal angles of those two momenta. Unpolarized cross sections are
invariant under rotations around the beam axis, which makes the dependence on the missing angle
trivial.
Typical differential cross sections F are shown in Fig. 3 as a function of pr at some fixed set of
angles {φr, θπ, φπ} and Q = 5 MeV for two different values of θr. One can see from this figure that
for the differential cross section F of Eq. (3) there are two characteristic regions:
1. The region of quasi-free production (QF) at large pr, which corresponds to the dominance of those
diagrams of Fig. 1 that do not contain the NN interaction in the final or intermediate states.
In the Appendix we give explicit expressions for the diagram a1 – the most significant diagram
of this type. At large pr the pion momentum kπ is small (see Eq. (4)) and the arguments of the
deuteron wave function in Eqs. (A.1) and (A.2) may become small for particular combinations
of ± ~pr and ~qγ/2. This feature gives rise to a peak in the differential cross section at large pr.
2. The region with prominence of the strong nn final–state interaction (FSI) at small pr (in fact,
we would have the strongest final state interaction at zero relative momentum, however the cross
section goes to zero at pr = 0 due to the phase space, therefore we see a peak shape).
One can see from Fig. 3 that the FSI peak depends on the value of θr only marginally, whereas the
quasi-free peak shows significant dependence on this angle. In particular, the quasi-free production
is largely suppressed at θr = 90
◦ — at this angle the arguments of the wave functions in both terms
in the r.h.s. of Eqs. (A.1) and (A.2) are large. It can also be seen from Fig. 3 (right panel) that
the effect of higher orders is more important for the quasi-free production amplitude — the influence
of higher-order effects on the FSI production is quite small. Another interesting observation is that
the contributions of higher orders change the relative height of the two peaks – the FSI peak goes up
whereas the QF peak goes down when we proceed from the LO calculation to the order χ5/2. In order
to suppress the distortions of the spectrum due to higher orders in the chiral expansion, which is the
condition for an extraction of ann with small theoretical uncertainty, configurations should be chosen
where θr = 90
We now briefly discuss the dependence of the cross section on the remaining angles θπ, φπ (we always
may choose φr to be zero). The dependence on θπ is illustrated in Fig. 4. One can see from this figure
that the dependence on θπ is significant for both the quasi–free as well as the FSI peak. This can be
easily understood from the explicit expressions for the matrix elements given in the Appendix keeping
in mind that already at Q = 5 MeV the maximal value of kπ is about mπ/3 while qγ ≈ mπ. Thus,
the momentum transfer to the nucleon pair, |~qγ −~kπ|, varies in the range 2mπ/3 to 4mπ/3 depending
on θπ. Since the S-wave deuteron wave function is large only for very small arguments, the influence
of the direction of ~kπ is significant. In addition, from Fig. 4 it follows that a variation of θπ not only
changes the magnitude but also the shape of the cross section, even in the FSI region. This has to be
taken into account in the analysis of any experiment.
In contrast to the polar angles, the dependence of F on φπ is negligible for all configurations (there
is no dependence at all for θr = 0
◦ and at θr = 90
◦, only the anyway small QF contribution changes
by just 5 %).
4 Extraction of ann and estimate of the theoretical uncertainty
In this section we discuss how to extract the scattering length from future data on γd → π+nn as well
as the resulting theoretical uncertainty. Our focus is especially the latter point. As in the previous
section we will only discuss results at excess energy Q = 5 MeV. However, the analysis can be repeated
analogously at any excess energy within the range of applicability of the formalism, i.e. Q ≤ 20 MeV.
We are interested in extracting the value of ann, which, in turn, is a low-energy characteristic of
neutron-neutron scattering and manifests itself in the momentum dependence of the cross section at
small values of the momentum pr. The influence of the value of ann on the cross section is illustrated in
0 10 20 30 40 50 60 70
 [MeV]
0 10 20 30 40 50 60 70
 [MeV]
.] FSI
QF 0 degrees
QF 90 degrees
Figure 3: Left panel: Differential cross section. The solid line corresponds to the configuration
when the quasi–free peak is suppressed (θr = 90
◦), whereas the dashed line corresponds to one of
the configurations when the quasi–free production amplitude is maximal (θr = 0
◦). The values of
the remaining angles are θπ = 135
◦, φr = φπ = 0
◦; they are the same for both curves. Right panel:
Differential cross section — relative strength of QF and FSI peaks. Here the dashed curves correspond
to the calculation at LO, the solid ones to the calculation at χ5/2. Curves denoted by ”FSI” (”QF”)
are obtained by retaining only those diagrams of Fig. 1 that contain (do not contain) the final or
the intermediate nucleon–nucleon interaction. The labels ”0 degrees” and ”90 degrees” denote the
corresponding values of θr for the ”QF” curves whereas the ”FSI” curves are almost insensitive to this
angle. The values or the remaining angles are as on the left panel of this figure. The overall scale is
arbitrary in both panels but the relative normalization is the same for all curves.
Fig. 5, where the cross sections are shown for three different values of ann, namely −18, −19, −20 fm.
For each value there are two curves, the dashed one corresponds to θr = 0
◦, and the solid one to
θr = 90
◦. One can see from Fig. 5 that the influence of different values of ann is significant in the FSI
peak and marginal in the quasi–free peak, as one would have expected.
In the previous section we have shown (see right panel in Fig. 3) that the relative height of the quasi-
free and the FSI peak changes if the effects of higher orders are included in the cross sections. Therefore
those angular configurations are to be preferred, where the quasi-free production is suppressed.
The central point of this study is to demonstrate that there is a large sensitivity of the momentum
spectra to the scattering length and that this scattering length can be extracted with a small and
controlled theoretical uncertainty. As outlined in the Introduction, we can estimate this uncertainty
reliably, because the effect of the higher orders up to χ5/2 are calculated completely. In order to
demonstrate the effect of those higher orders on the shape of the momentum distribution, in Fig. 6
we show as the light band the spread in the results for the calculation from LO to χ5/2. The results
also include higher partial waves for the pion as well as the final nn system.
There is some sensitivity to the behavior of the deuteron wave function at short distances. For the
reaction π−d → γnn this sensitivity was identified as the largest effect at N3LO in Ref. [15] 2. Guided
by that observation we include in the uncertainty estimate also the spread in the results due to the
use of different wave functions. In order to remove the effect of the change in normalization when,
e.g., changing the chiral order, all curves are normalized at pnorm = 30 MeV in Fig. 6. In the same
2Within the framework of ChPT with a consistent power counting scheme, the quantitative impact of the wave-
function dependence is governed by the order at which a counter term appears that can absorb this model dependence.
The corresponding counter term for the γd as well as the πd reaction arises at N3LO.
0 10 20 30 40 50 60 70
 [MeV]
0 10 20 30 40 50 60 70
 [MeV]
Figure 4: Dependence of the differential cross section on θπ. The left panel corresponds to the
suppressed quasi–free amplitude (θr = 90
◦), the right panel to the maximal quasi–free amplitude
(θr = 0
◦). Solid, dashed, dotted, and dash-dotted lines correspond to θπ = 0
◦, 45◦, 90◦, 135◦
respectively. The values of the remaining angles are φr = φπ = 0
◦. The overall scale is arbitrary in
both panels but the relative normalization is the same for all curves.
Figure (with the same normalization) we also show the change in the shape that comes from different
values of the scattering length: the dark band is generated by a variation of ann by ±1 fm around
the central value of −18.9 fm. Clearly, the theoretical uncertainty is negligibly small compared to the
signal of interest.
One way to quantify the theoretical uncertainty is through the use of the function S, defined as
S(ann,Φ) =
F (pr|a(0)nn ,Φ(0))−N(ann,Φ) F (pr|ann,Φ)
w(pr) , (5)
where pmax =
MNQ is the maximum value of pr, F (pr|ann,Φ) is proportional to the five-fold
differential cross section as defined in Eq. (3). In the latter we refrained from showing the angular
dependence in favor of the parametric dependence of the cross section on the nn scattering length ann
as well as the multi–index Φ, which symbolizes the dependence of the cross section on the chosen chiral
order and the wave functions used, as outlined above. The weight function w(pr) was introduced to
allow us to suppress particular regions of momenta in the analysis — the role of w(pr) will be discussed
in detail below. For simplicity we may assume that S is dimensionless; all dimensions can be absorbed
into the constant C defined in Eq. (3).
The value a
nn denotes the central value of the scattering length (−18.9 fm) for which we perform
the estimate of the theoretical uncertainty3 whereas Φ(0) corresponds to the baseline type of calcu-
lation, namely leading order with chiral wave functions as specified in the Appendix. The relative
normalization N(ann,Φ) is fixed by demanding that S gets minimized for any given pair of parameters
ann,Φ (∂S/∂N = 0). This gives
N(ann,Φ) =
dpr F (pr|a
nn ,Φ
(0))F (pr|ann,Φ)w(pr)
dpr F 2(pr|ann,Φ)w(pr)
. (6)
3Note that the theoretical uncertainty practically does not change when the central value of the scattering length
varies in the relevant interval ±1 fm.
0 10 20 30 40 50 60 70
 [MeV]
-20 fm
-19 fm
-18 fm
Figure 5: The effect of varying the value of ann on the differential cross section. The solid and dashed
lines correspond to the same angular configurations as in Fig. 3, left panel. The different values of
ann are shown on the figure. The overall scale is arbitrary but the relative normalization is the same
for all curves.
Obviously S is the continuum version of the standard χ2 sum, i.e. it characterizes the mean-square
deviation from the baseline cross section F (pr|a
nn ,Φ
(0)). In this way we determine the theoretical
uncertainty in full analogy to the standard method of data analysis.
In order to quantify the theoretical uncertainty we may define Φmax as that chiral order and choice
of wave function, where S(a(0)nn ,Φmax) gets maximal:
S(a(0)nn ,Φmax) = max
S(a(0)nn ,Φ)
. (7)
Therefore S(a(0)nn ,Φmax) provides an integral measure of the theoretical uncertainty of the differential
cross section. Demanding that the effect of a change in the scattering length by the amount ∆ann
matches that by the inclusion of higher orders etc., we can identify ∆ann as an uncertainty in the
scattering length. Expressed in terms of S, we may define ∆ann via
S(a(0)nn +∆ann,Φ(0)) = S(a(0)nn ,Φmax) . (8)
This relation is illustrated in Fig. 7. The dashed horizontal line corresponds to S(a(0)nn ,Φmax), where
we use w(pr) = 1. The dashed parabolic line shows the corresponding S(a
nn + ∆ann,Φ
(0)) as a
function of ∆ann. The calculation is performed for θr = 90
◦, and θπ = 0
◦. The crossing point of the
curves corresponds to ∆ann = 0.16 fm, which can be identified as the theoretical uncertainty for the
extraction of the scattering length.
In the previous section we showed that the signal region is located at momenta lower than 30 MeV.
On the other hand, the theoretical uncertainty of the differential cross section is largest for large values
of pr due to the onset of the quasi–free contribution. In view of these two facts it seems reasonable
to use such weight functions w(pr) that suppress the contribution of large momenta. For instance, we
may use w(pr) = Θ(p
cut−pr) for the weight function. If we choose, e.g., pcut = 30 MeV the theoretical
uncertainty of the extraction of the scattering length reduces to 0.07 fm, as is demonstrated by the
solid lines in Fig. 7. This figure nicely illustrates that the parabolic curve that represents the signal
changes only very little when a restriction to small values of pr is applied. At the same time this
procedure significantly reduces the value of the uncertainty S(a(0)nn ,Φmax).
0 10 20 30 40 50 60 70
 [MeV]
Figure 6: The light (white) band is the error band, and dark (blue) band correspond to ±1 fm shift
in the scattering length from the central value −18.9 fm.
The observation that the dependence of S(a(0)nn +∆ann,Φ(0)) on ∆ann is very well approximated by
a parabola allows for a more systematic study of the pcut dependence of the theoretical uncertainty.
We therefore define
α(pcut) =
S(a(0)nn +∆ann,Φ(0)|pcut)
(∆ann)2
, (9)
where the explicit pcut dependence is introduced into the function S through the weight function w as
explained above. The dashed and the solid parabola in Fig. 7 can then be written as α(pcut) (∆ann)
with α(pmax) = 41 fm
−2 and α(30 MeV) = 33 fm−2. In the left panel of Fig. 8 we show α(pcut) as
the solid line. In the same panel the dashed line represents the measure of the theoretical uncertainty
given by S(a(0)nn ,Φmax|pcut), multiplied by a factor of 40. This figure makes more quantitative the
statement made above: for very small values of pcut we cut into the signal region and therefore α
shows a very rapid variation. However, as soon as pcut is larger than 30 MeV it goes to a plateau (in
the figure indicated by the arrow). On the other hand, the theoretical uncertainty is monotonously
growing once pcut is larger than 30 MeV. From this figure we deduce that the ideal value for pcut
is between 25 and 40 MeV. This translates into a theoretical uncertainty between 0.05 and 0.1 fm,
as illustrated in the right panel of the same figure. The value of θπ also has some impact on the
theoretical uncertainty, however, in its whole parameter range the estimated uncertainty stays below
0.1 fm for pcut = 30 MeV.
Clearly, also the experimental data, once they exist, should be analyzed using a procedure analogous
to the one given above. This means that the scattering length is to be extracted from a χ2 fit of the
theoretical curves to the data. In this work we used the calculation at LO as baseline result and the
results at higher orders to estimate the theoretical uncertainty. Consequently, we propose to use the
momentum spectrum calculated at LO in the fitting procedure of the experiment. The corresponding
analytical expressions are given in the Appendix. The only parameter to be adjusted besides the
scattering length is the overall normalization. In this fitting procedure only those data points should
be included that are below a given pcut, in order to keep the theoretical uncertainty small.
-0.2 -0.1 0 0.1 0.2
 [fm]
Figure 7: The functions S(a(0)nn ,Φmax) and S(a
nn + ∆ann,Φ
(0)) are shown by the horizontal and
parabolic curves, respectively. The solid curves are obtained by adding the weight factor in Eq. (5)
that cuts all momenta above 30 MeV in distinction from the dashed ones. The calculation is performed
for the scattering length a
nn = −18.9 fm, θr = 90◦, and θπ = 0◦. The value of ∆ann corresponding
to the crossing point of the horizontal and parabolic curves determines the theoretical uncertainty of
the calculation.
5 Discussion and conclusions
In the previous section it was shown that for the angular configurations that suppress the quasi-free
production the inclusion of higher order effects (NLO, N2LO, and χ5/2) as well as the use of different
wave functions leads only to a minor change in the momentum dependence of the five-fold differential
cross sections.
Based on this observation we propose to use the momentum spectrum calculated at LO for the
extraction of the neutron–neutron scattering length from the data. This procedure has the advantage
that the corresponding matrix elements can be given in an analytic form (see Appendix) that could
be used directly in the Monte Carlo codes for the experiment analysis. In this way the non–trivial
dependence of the spectra on θπ, discussed above, can be easily controlled. The scattering length can
then be extracted by a two parameter fit to the data where, simultaneously to a variation in ann, the
normalization constant needs to be adjusted.
Note that the leading order calculation basically agrees to the expression given in Ref. [20] long
ago. However, a systematic and reliable study of the theoretical uncertainties of the extraction was
possible only within our full calculation up to order χ5/2 in ChPT. In this way we could show that
the reaction γd → π+nn is very well suited for a determination of the nn scattering length. The
theoretical uncertainty of order 0.1 fm for the extracted scattering length, estimated in this paper, is
of the same order as that claimed for π−d → γnn [16] and nd → pnn [4, 7].
We discussed in detail the theoretical uncertainty for a fixed excess energy of Q = 5 MeV only,
however, it should be clear that the procedure can be easily repeated for any energy within the range of
applicability of our approach (Q ≤ 20 MeV). For example, we checked that the theoretical uncertainty
stays below 0.1 fm also at Q = 10 MeV. Note that the number of events in the signal region scales
roughly with
Q, the phase space available for the pion. It remains to be seen which energy is the
best for the corresponding experiment.
We showed that for a proper choice of both kinematics and weight function w, the theoretical
0 10 20 30 40 50 60 70
 [MeV]
0 10 20 30 40 50 60 70
 [MeV]
Figure 8: Left panel: Comparison of the pcut dependence of functions S(a(0)nn ,Φmax|pcut) (dashed curve)
and α(pcut) (solid curve). The calculation is performed for the scattering length a
nn = −18.9 fm,
θr = 90
◦, and θπ = 0
◦. Right panel: The corresponding theoretical uncertainty ∆ann as a function of
pcut.
uncertainty for the extraction of the neutron–neutron scattering length from γd → π+nn can be as
low as 0.1 fm. It should be stressed, however, that this error was evaluated most conservatively – we
use our LO calculation as baseline result and estimate the theoretical uncertainty from the effects of
the higher orders that we calculated completely. This error can be significantly reduced by further
studies. For example, if we include in the uncertainty estimate only the spread in the results due to
the use of different wave functions, which is identified as the largest effect at N3LO for the reaction
π−d → γnn [15], the theoretical uncertainty of the extracted scattering length reduces by one order
of magnitude. This indicates that the theoretical uncertainty is indeed under control. However to
put this N3LO estimation on more solid ground a complete calculation should be performed to this
order. Most of the operators that are relevant at this order are the same as those of π−d → γnn, given
explicitly in Ref. [21]. One counter term enters, which can be fixed from other processes [16], e.g.,
from nd scattering [22], the reaction NN → NNπ [23], or from weak decays [16]. Once this is done
we may use our calculation to order χ5/2 as baseline result and estimate the theoretical uncertainty
from the then available N3LO calculation.
Although we have identified the angles θr = 90
◦ as the preferred kinematics, also other configura-
tions could be studied in order to control the systematics. However, then the spectra calculated at
χ5/2 should be used in the analysis.
Acknowledgments
We thank A. Bernstein for useful discussions and interest in this work. We also thank D. R. Phillips
and A. G̊ardestig for helpful discussions. This research is part of the EU Integrated Infrastructure
Initiative Hadron Physics Project under contract number RII3-CT-2004-506078, and was supported
also by the DFG-RFBR grant no. 05-02-04012 (436 RUS 113/820/0-1(R)) and the DFG SFB/TR 16
”Subnuclear Structure of Matter”. A. K. and V. B. acknowledge the support of the Federal Program
of the Russian Ministry of Industry, Science, and Technology No 02.434.11.7091. E. E. acknowledges
the support of the Helmholtz Association (contract no. VH-NG-222).
A Leading amplitudes
In this appendix we give explicit expressions for the amplitudes that appear at leading order in the
calculation for γd → π+nn. As outlined in the main text these expressions can be used directly in
the analysis of the data, once available. In addition, they should also proof useful for the design of
the experiment. Note, as outlined in the text, only near θr = 90
◦ the leading order calculation gives
a sufficiently accurate representation of the spectra. At all other angles one should use the complete
calculation.
At leading order only diagrams a1 and a2 of Fig. 1 contribute. Since only the momentum dependence
of the amplitudes is relevant for the experimental analysis we drop an overall factor compared to
Ref. [11]. The corresponding amplitudes read:
M sa1 =
u(~pr − ~kπ/2 + ~qγ/2) + u(−~pr − ~kπ/2 + ~qγ/2)
(A.1)
M ta1 =
u(~pr − ~kπ/2 + ~qγ/2)− u(−~pr − ~kπ/2 + ~qγ/2)
(A.2)
Ma2 = 8π
fon(pr)
g(pr)
(2π)3
u(~p− ~kπ/2 + ~qγ/2) g(p)
p2 − p2r − i0
fon(pr)
iqπγ g(pr)
p2r + β
αi − ipr + iqπγ
αi − ipr − iqπγ
αi + βj − iqπγ
αi + βj + iqπγ
(A.3)
where u(~p) denotes the S–wave part of the deuteron wave function in momentum space. We checked
by explicit calculations that the inclusion of the deuteron D-wave changes only the absolute scale of
the differential cross sections but not its momentum dependence. Thus, the D-wave contribution is
not taken into account in the parameterization. The quantity qπγ is defined as qπγ = |~kπ − ~qγ|/2. The
labels s and t stand for spin singlet and triplet final two-nucleon states, respectively — we do not
write out the corresponding spin structures. We take into account only the 1S0 partial wave in the
final state interaction. For a discussion of the effect of nn P–waves see Ref. [11].
To derive the expression forMa2 we used the fact that the neutron–neutron scattering amplitude can
be represented to high accuracy in separable form [11, 24]. The neutron–neutron scattering amplitude,
f(p, k;E), can be written in half off–shell kinematics as
f(p, k; k2/MN ) =
2π2MNg(p)g(k)
g2(q)
q2−k2−i0
= fon(k)
, (A.4)
where the corresponding on–shell amplitude fon(k) can then be expressed in terms of the scattering
phase–shifts through
fon(k) = f(k, k; k2/MN ) =
k cot δ(k) − ik
For small momenta one can use the effective range expansion for k cot δ = −1/ann + rnnk2/2+O(k4),
in agreement with Eq. (1). Here ann is the parameter to be fitted to the data and rnn = 2.76 fm. We
checked that changing the value of rnn within the bounds allowed (±0.1 fm [1]) leads to negligible
effects on the extraction of the scattering length. In this way we expressed the matrix element explicitly
in terms of the scattering length. We checked that the ratio g(p)/g(k) in Eq. (A.4) does not change
when we vary the scattering length within acceptable range bounds.
In order to evaluate the convolution of the deuteron wave function with the nn final state interaction
analytically, we needed to employ the following parameterizations for the 1S0 nn form factor g(p) (see
Eq. (A.4)) and the S-wave deuteron wave function
g(p) =
p2 + β2i
; u(p) =
p2 + α2i
; (A.5)
1S0 form factor S-wave deuteron w.f.
βi [MeV] Di [MeV] αi [MeV] Ci [MeV
1 164.53278 31.101228 45.334919 43.543212
2 246.85751 -1310.3056 242.66091 -35.643003
3 329.18224 9455.9603 439.98691 419.25214
4 411.50697 -9666.0268 637.31291 -1833.4708
5 493.83170 -55571.615 834.63891 -3710.8173
6 576.15643 64600.071 1031.9649 24903.150
7 658.48116 149128.85 1229.2909 -31673.576
8 740.80589 -84844.967 1426.6169 26476.636
9 823.13062 -295594.17 1623.9429 -118733.48
10 905.45536 -30332.710 1821.2689 259759.15
11 987.78009 560829.89 2018.5949 -223816.07
12 1070.1048 -307006.25 2215.9209 −
i=1Ci
Table 1: Parameters of the 1S0 form factor and the S-wave deuteron wave function for the separable
representation of the N2LO chiral NN potential.
where the parameters corresponding to the ChPT calculation at N2LOwith cut offs {Λ, Λ̃}= {550 MeV,
600 MeV} (see Ref [18] for details) are listed in Table 1. Note that the coefficients in the parameter-
ization of the wave function have to fulfill the relation
Ci = 0 in order to ensure the regularity of
the deuteron wave function at the origin in coordinate space [25].
The squared and averaged amplitude to be used in the expression for the differential cross section,
defined in Eq. (3) is
|M(pr, θr, φr, θπ, φπ)fit|2 = |M sa1 +Ma2|
∣M ta1
. (A.6)
In a fit to data two parameters are to be adjusted, namely the overall normalization C of Eq. (3)
and the object of desire, ann.
References
[1] G. Miller, B. Nefkens and I. Šlaus. Phys. Rep. 194, 1 (1990).
[2] B. Gabioud et al., Nucl. Phys. A 420, 496 (1984).
[3] O. Schori et al., Phys. Rev. C 35, 2252 (1987).
[4] C. R. Howell et al., Phys. Lett. B 444, 252 (1998).
[5] D. E. Gonzáles Trotter et al., Phys. Rev. Lett. 83, 3788 (1999).
[6] V. Huhn et al., Phys. Rev. Lett. 85, 1190 (2000); Phys. Rev. C 63, 014003 (2000).
[7] D. E. Gonzáles Trotter et al., Phys. Rev. C 73, 034001 (2006).
[8] R. B. Wiringa, V. G. J. Stoks and R. Schiavilla, Phys. Rev. C 51, 38 (1995) [arXiv:
nucl-th/9408016].
http://arxiv.org/abs/nucl-th/9408016
[9] R. Machleidt and H. Müther, Phys. Rev. C 63, 034005 (2001) [arXiv:nucl-th/0011057].
[10] A. Nogga et al., Phys. Rev. C 67, 034004 (2003) [arXiv:nucl-th/0202037].
[11] V. Lensky, V. Baru, J. Haidenbauer, C. Hanhart, A. Kudryavtsev and U.-G. Meißner,
Eur. Phys. J. A 26, 107 (2005) [arXiv: nucl-th/0505039].
[12] V. Baru, C. Hanhart, A. E. Kudryavtsev and U.-G. Meißner, Phys. Lett. B 589, 118 (2004)
[arXiv: nucl-th/0402027].
[13] V. Bernard, N. Kaiser and U.-G. Meißner, Phys. Lett. B 383, 116 (1996) [arXiv:hep-ph/9603278].
[14] E. Epelbaum, Prog. Part. Nucl. Phys. 57, 654 (2006) [arXiv: nucl-th/0505032].
[15] A. G̊ardestig and D. R. Phillips, Phys. Rev. C 73, 014002 (2006) [arXiv: nucl-th/0501049].
[16] A. G̊ardestig and D. R. Phillips, Phys. Rev. Lett. 96 (2006) 232301 [arXiv:nucl-th/0603045].
[17] A. Gasparyan, J. Haidenbauer, C. Hanhart and K. Miyagawa, arXiv:nucl-th/0701090;
Eur. Phys. J. A in print.
[18] E. Epelbaum, W. Glöckle and U.-G. Meißner, Nucl. Phys. A 747, 362 (2005) [arXiv:
nucl-th/0405048]
[19] E. Epelbaum, W. Glöckle and U.-G. Meißner, Eur. Phys. J. A 19, 125 (2004) 125
[arXiv:nucl-th/0304037].
[20] J. M. Laget, Phys. Rep. 69 (1981) 1.
[21] A. G̊ardestig, Phys. Rev. C 74, 017001 (2006) [arXiv:nucl-th/0604035].
[22] E. Epelbaum, A. Nogga, W. Glöckle, H. Kamada, U.-G. Meißner and H. Witala, Phys. Rev. C
66, 064001 (2002) [arXiv:nucl-th/0208023].
[23] C. Hanhart, U. van Kolck and G. A. Miller, Phys. Rev. Lett. 85 (2000) 2905
[arXiv:nucl-th/0004033].
[24] J. Haidenbauer and W. Plessas, Phys. Rev. C 30, 1822 (1984).
[25] M. Lacombe, B. Loiseau, R. Vinh Mau, J. Cote, P. Pires and R. de Tourreil, Phys. Lett. B 101,
139 (1981).
http://arxiv.org/abs/nucl-th/0011057
http://arxiv.org/abs/nucl-th/0202037
http://arxiv.org/abs/nucl-th/0505039
http://arxiv.org/abs/nucl-th/0402027
http://arxiv.org/abs/hep-ph/9603278
http://arxiv.org/abs/nucl-th/0505032
http://arxiv.org/abs/nucl-th/0501049
http://arxiv.org/abs/nucl-th/0603045
http://arxiv.org/abs/nucl-th/0701090
http://arxiv.org/abs/nucl-th/0405048
http://arxiv.org/abs/nucl-th/0304037
http://arxiv.org/abs/nucl-th/0604035
http://arxiv.org/abs/nucl-th/0208023
http://arxiv.org/abs/nucl-th/0004033
	Introduction
	ChPT calculation for d+nn
	Differential cross sections: relevant features
	Extraction of ann and estimate of the theoretical uncertainty
	Discussion and conclusions
	Leading amplitudes
ABSTRACT
  We discuss the possibility to extract the neutron-neutron scattering length
a_{nn} from experimental spectra on the reaction gamma d --> pi^+ nn. The
transition operator is calculated to high accuracy from chiral perturbation
theory. We argue that for properly chosen kinematics, the theoretical
uncertainty of the method can be as low as 0.1 fm.

<|endoftext|><|startoftext|>
Multiple Unfoldings of Orbifold Singularities:
Engineering Geometric Analogies to Unification
Jacob L. Bourjaily∗
Joseph Henry Laboratories, Princeton University, Princeton, NJ 08544
(Dated: 2nd April 2007)
Katz and Vafa [1] showed how charged matter can arise geometrically by the deformation of ADE-
type orbifold singularities in type IIa, M-theory, and F-theory compactifications. In this paper we use
those same basic ingredients, used there to geometrically engineer specific matter representations,
here to deform the compactification manifold itself in a way which naturally compliments many
features of unified model building. We realize this idea explicitly by deforming a manifold engineered
to give rise to an SU5 grand unified model into a one giving rise to the Standard Model. In this
framework, the relative local positions of the singularities giving rise to Standard Model fields are
specified in terms of the values of a small number of complex structure moduli which deform the
original manifold, greatly reducing the arbitrariness of their relative positions.
I. INTRODUCTION
One of the ways in which a gauge theory with
massless charged matter can arise in type IIa, M-
theory, and F-theory is known as geometrical engi-
neering. In this framework, gauge theory at low en-
ergy arises from co-dimension four singular surfaces
in the compactification manifold [2] and charged
matter arises as isolated points (curves in F-theory)
on these surfaces over which the singularity is en-
hanced. Katz and Vafa [1] constructed explicit ex-
amples of local geometry which would give rise to dif-
ferent representations of various gauge groups. Their
work was presented explicitly in the language of type
IIa or F-theory, but the general results have been
shown to apply much more broadly to M-theory as
well [3, 4, 5, 6, 7].
The picture of matter and gauge theory arising
from pure geometry via singular structures has been
used very fruitfully in much of the progress of M-
theory phenomenology. In [8] Witten engineered
an interesting phenomenological model in M-theory
which could possibly solve the Higgs doublet-triplet
splitting problem; this model was explored in great
detail together with Friedmann in [9]. There, the ex-
plicit topology of the ADE-singular surface and the
relative locations of all the isolated conical singulari-
ties was motivated by phenomenology—the descrip-
tion of the geometry of the singularities themselves
was taken for granted.
Unlike model building with D-branes, for exam-
ple, geometrical engineering as it has been under-
stood provides little information about the number,
type, and relative locations of the many different sin-
gularities needed for any phenomenological model.
This information must either come a posteriori from
phenomenological success or via duality to a con-
crete string model. But recent successes in M-theory
model building (for example, [10, 11]) motivate a
new look at how to describe the relative structure
∗Electronic address: jbourjai@princeton.edu
of singularities—at least locally—within the frame-
work of geometrical engineering itself.
In this paper, we reduce the apparent arbitrari-
ness of the number and relative positions of the sin-
gularities required by low-energy phenomenology by
showing how they can be obtained from deforming
a smaller number of singularities in a more unified
model. In section II we review the ingredients of
geometrical engineering as described in [1]. The ba-
sic framework is then interpreted in a novel way in
section III to relate manifolds with matter singular-
ities to those with more or less symmetry. The idea
is used explicitly to deform an SU5 grand unified
model into the Standard model.
To be clear, as in [1] our results apply only strictly
to N = 2 models from type IIa compactifications
or N = 1 models from F-theory compactifications1;
but we suspect that this framework has an M-theory
analogue in the spirit of [7].
II. GEOMETRICAL ENGINEERING
In the framework of geometrical engineering the
compactification manifold is described as a fibration
of (singular) K3 surfaces over a base space of appro-
priate dimension. The collection of point-like (co-
dimension four) singularities of the K3 fibres would
then be a co-dimension four surface in the compact-
ified manifold, giving rise to gauge theory of type
corresponding to the singularities on each K3 fibre.
Table I lists polynomials in C3 whose solutions can
be (locally) taken to be the fibres for each corre-
sponding gauge group.
One of the strengths of this framework is its gen-
erality: the local geometry is specified in terms of
1 We essentially describe non-compact Calabi-Yau three-
folds which are K3-fibrations over C1. If the C1-base is
fibred over CP1 as an O(−2) bundle, for example, then
the total space will be a Calabi-Yau four-fold upon which
F-theory can be compactified, giving rise to an N = 1
theory.
http://arxiv.org/abs/0704.0444v1
mailto:jbourjai@princeton.edu
TABLE I: Hypersurfaces in C3 giving rise to the desired
orbifold singularities.
Gauge group Polynomial
SUn (≡ An−1) xy = z
SO2n (≡ Dn) x
2 + y2z = zn−1
2 = y3 + z4
2 + y3 = yz3
2 + y3 = z5
the K3 fibres, so that the description applies equally
well to compactifications in type IIa, M-theory, and
F-theory—the difference being the dimension of the
space over which the surfaces in Table I are fibred.
To obtain massless charged matter, however, addi-
tional structure is necessary. Specifically, at isolated
points (in type IIa or M-theory) on the co-dimension
four singular surface, the type of singularity of the
K3 fibres must be enhanced by one rank. Mathe-
matically, this requires that one can describe how
the various polynomials in Table I can be deformed
into each other; and the possible two-dimensional
deformations have been classified [12].
For example, to describe the embedding of a mass-
less 5 of SU5 in type IIa, you would need to start
with aK3-fibred Calabi-Yau where each of the fibres
are of the type giving rise to SU5 gauge theory. From
Table I we see that these four-dimensional fibres are
locally the set of solutions to the equation
xy = z5, (1)
in C3. Now, to obtain matter in the 5 representa-
tion, there would need to be an isolated point some-
where on the two-dimensional base space where the
fibre is enhanced to SU6, [1]. A description of the
local geometry can be given by
xy = (z + 5t)(z − t)5, (2)
where t is a complex coordinate on the base over
which the K3’s are fibred. Notice that when t = 0
the equation describes precisely the fibre which
would have given rise to SU6 gauge theory if it were
fibred over the entire base manifold. However, be-
cause it is the fibre only over the origin in the com-
plex t-plane, there is no SU6 gauge theory. Equation
(2) is said to describe the ‘resolution’ SU6 → SU5,
which is found to give rise to SU5 gauge theory at
low energy with a single massless 5 at t = 0. This
and many other explicit examples of such resolutions
and the matter representations obtained are given in
One subtlety which makes the description above
not automatically apply to M-theory constructions,
however, is that in equation (2) the complex pa-
rameter t is two-dimensional: taken as a coor-
dinate over which the K3 surfaces are fibred, it
gives rise to a six-dimensional compactification man-
ifold. In M-theory, co-dimension four singularities
are three-dimensional and chiral matter would live
at isolated points on these three dimensional orb-
ifold singularities. So in M-theory the resolution
SU6 → SU5 would need a three-dimensional defor-
mation. Morally, the structure is identical to that
described in equation (2), but the parameter t must
be upgraded to describe three-dimensional deforma-
tions. This can be done in terms of hyper-Kähler
quotients. We suspect that all the resolutions de-
scribed explicitly for type IIa here and in [1] can be
upgraded to three-dimensional deformations needed
in M-theory, and in many cases these generalizations
have already been given [4, 5, 7].
III. ENGINEERING GEOMETRIC
ANALOGIES TO UNIFICATION
The main result of this paper is that distinct con-
ical singularities on a surface with some gauge sym-
metry can be deformed into each other in ways anal-
ogous to unification; and conversely, that a descrip-
tion of a single matter field in a unified theory can
be ‘unfolded’ into distinct matter fields in a theory
of lower gauge symmetry. Because the tools used to
perform these unification-like deformations are pre-
cisely the same as those used to describe the singu-
larities themselves, some care must be taken to avoid
unnecessary confusion.
We will start by reinterpreting the tools used
above to engineer charged matter, and then we will
use both interpretations simultaneously to construct
explicit examples of the geometric analogue to uni-
fied model building.
Consider again the resolution SU6 → SU5 de-
scribed by
xy = (z + 5s)(z − s)5, (3)
where we have replaced t 7→ s from equation (2)
to make a interpretative distinction that will soon
become clear. We propose to momentarily discuss
pure gauge theory and ignore any description of mat-
ter. With this in mind, take a fixed (real) two-
dimensional neighborhood over which every point is
fibred by the solutions to equation (3) for any fixed
value of s. Because the fibres are the same every-
where on the manifold, there is no matter: for any
s the geometry would give rise to pure gauge theory
at low energy. For s 6= 0 solutions to equation (3)
are SU5 fibres and so the compactification manifold
would give rise to pure SU5. However, when s = 0
the fibres are all SU6 and so the low-energy theory
would be pure SU6. Therefore s is a ‘global’ param-
eter which deforms the gauge content of the theory
such that for arbitrary values of s 6= 0 the theory is
pure SU5 and for s = 0 it is pure SU6. That this
deformation is ‘smooth’ is apparent at least when
s 6= 0.
An obvious question to ask is how this framework
applies when conical singularities are present. We
(1, 2)(3, 1)
SU6 SU5
SU3×SU3SU4×SU2
FIG. 1: The plane describing the deformation of a
theory with a single of SU into one of SU SU
gauge theory with one ( and one ( as a
function of as described by equation (4). For a fixed
value of , the base space over which solutions to (4) are
fibred are indicated by the black line. Notice that the
relative positions of the two isolated (conical) singulari
ties are fixed by
will show that when the ADE-surface singularity
changes because of some complex structure modu-
lus such as above, the conical singularities giving
rise to charged matter (often) behave as one would
expect from unified model building intuition. This
is best demonstrated with explicit examples.
Suppose that the singular 3 surfaces are fibred
over a two-dimensional base space with local com-
plex coordinate . And say the four-dimensional fi-
bre over the point is given by the solutions to
xy = ( + 5 )( + 3 (4)
for a given value of , which is now to be interpreted
as a complex structure modulus deforming the entire
local geometry near = 0. When = 0 the geometry
is of course identical to our previous description of
SU SU and so the theory would be SU with a
single massless located at = 0.
Consider now to be fixed at some non-zero value.
The gauge theory is then SU SU : for generic
values of , the fibres given by equation (4) have
two singular points, at + 3 = 0 and
= 0, and so the union of these
points over the base manifold coordinatized by will
be two distinct, two-dimensional singular surfaces:
one giving rise to SU and the other SU . These
surfaces become coincident as a single SU surface
when = 0.
Along the complex -plane, there are two iso-
lated points over which the singularities are en-
hanced: at s/2 the fibre is visibly SU SU
and at s/3 the fibre is SU SU . There-
fore the theory has two two charged, massless fields,
in the ( and ( representations of
SU SU at s/2 and s/3, respec-
tively. Figure 1 indicates the singularity structure
as a function of
Notice how this description parallels unified model
building: the = 0 theory of one of SU de-
forms smoothly into one with (
of SU SU
Similarly, we may ask how a 10 of SU would de-
form into distinct singularities supporting Standard
Model matter fields. The fibre structure giving rise
to a massless 10 of SU is given as follows. Let
be a local coordinate on the base space over which
fibres are given by solutions to
+ 2yt 10 ; (5)
at = 0, equation (5) describes an SO10 fibre, while
for = 0 the fibres are SU —although in this case
the result is harder to read off. This resolution,
SO10 SU , gives rise to a 10 of SU [1].
Following the same idea as before, the deformation
of this singularity into SU SU is given by
+ 2 + ( + ( (6)
where is again interpreted as a complex structure
modulus deforming the geometry near the singular-
ity. Notice as before that = 0 describes an SU the-
ory with one massless 10 located at = 0. However,
for = 0 there are again two orbifold singularities
corresponding to SU SU gauge theory. At
three distinct points on the complex plane the rank
of the fibre is enhanced: = 0, and
give rise to matter in the ( , ( , and
representations of SU SU , respec-
tively. The structure of the deformation achieved by
varying is shown in Figure 2.
Again, our intuition from unified model building
is realized naturally in this framework.
FIG. 1: The t-s plane describing the deformation of a
theory with a single 5 of SU5 into one of SU3 × SU2 × U1
gauge theory with one (3,1)1/3 and one (1, 2)−1/2 as a
function of s as described by equation (4). For a fixed
value of s, the base space over which solutions to (4) are
fibred are indicated by the black line. Notice that the
relative positions of the two isolated (conical) singulari-
ties are fixed by s.
will show that when the ADE-surface singularity
changes because of some complex structure modu-
lus such as s above, the conical singularities giving
rise to charged matter (often) behave as one would
expect from unified model building intuition. This
is best demonstrated with explicit examples.
Suppose that the singular K3 surfaces are fibred
over a two-dimensional base space with local com-
plex coordinate t. And say the four-dimensional fi-
bre over the point t is given by the solutions to
xy = (z + 5t)(z − t+ 3s)2(z − t− 2s)3, (4)
for a given value of s, which is now to be interpreted
as a complex structure modulus deforming the entire
local geometry near t = 0. When s = 0 the geometry
is of course identical to our previous description of
SU6 → SU5 and so the theory would be SU5 with a
single massless 5 located at t = 0.
Consider now s to be fixed at some non-zero value.
The gauge theory is then SU3×SU2×U1: for generic
values of t, the fibres given by equation (4) have
two singular points, at x = y = z − t+ 3s = 0 and
x = y = z − t− 2s = 0, and so the union of these
points over the base manifold coordinatized by t will
be two distinct, two-dimensional singular surfaces:
one giving rise to SU2 and the other SU3. These
surfaces become coincident as a single SU5 surface
when s = 0.
Along the complex t-plane, there are two iso-
lated points over which the singularities are en-
hanced: at t = s/2 the fibre is visibly SU3 × SU3,
and at t = −s/3 the fibre is SU4 × SU2. There-
fore the theory has two two charged, massless fields,
in the (1,2)
−1/2 and (3,1)1/3 representations of
SU3 × SU2 × U1 at t = s/2 and t = −s/3, respec-
tively. Figure 1 indicates the singularity structure
as a function of s.
Notice how this description parallels unified model
building: the s = 0 theory of one 5 of SU5 de-
forms smoothly into one with (3,1)1/3 ⊕ (1,2)−1/2
of SU3 × SU2 × U1.
Similarly, we may ask how a 10 of SU5 would de-
form into distinct singularities supporting Standard
Model matter fields. The fibre structure giving rise
to a massless 10 of SU5 is given as follows. Let t
be a local coordinate on the base space over which
fibres are given by solutions to
x2 + y2z + 2yt5 =
z + t2
− t10
; (5)
at t = 0, equation (5) describes an SO10 fibre, while
for t 6= 0 the fibres are SU5—although in this case
the result is harder to read off. This resolution,
SO10 → SU5, gives rise to a 10 of SU5 [1].
Following the same idea as before, the deformation
of this singularity into SU3 × SU2 is given by
x2 + y2z + 2y(t+ s)3(t− s)2 =
z + (t− s)2
z + (t+ s)2
− (t− s)4(t+ s)6
, (6)
where s is again interpreted as a complex structure
modulus deforming the geometry near the singular-
ity. Notice as before that s = 0 describes an SU5 the-
ory with one massless 10 located at t = 0. However,
for s 6= 0 there are again two orbifold singularities
corresponding to SU3 × SU2 × U1 gauge theory. At
three distinct points on the complex t plane the rank
of the fibre is enhanced: t = −s, t = 0, and t = s
give rise to matter in the (3,1)
−2/3, (3,2)1/6, and
(1,1)1 representations of SU3 × SU2 × U1, respec-
tively. The structure of the deformation achieved by
varying s is shown in Figure 2.
Again, our intuition from unified model building
is realized naturally in this framework.
(3, 2)
(3, 1) (1, 1)
SO10 SU5
SU3×SU2×SU2SU4×SU2
FIG. 2: The plane describing the deformation of a
theory with a single 10 of SU into one of SU SU
gauge theory, with matter content (
, as a function of as described by equation (6).
For a fixed value of , the base space over which solutions
to (6) are fibred are indicated by the black line. Notice
that the relative positions of the three isolated (conical)
singularities are fixed by
IV. DISCUSSION
One of the primary reasons why geometrical engi-
neering had not been more widely used phenomeno-
logically is because the number, type, and relative lo-
cations of the singularities giving rise to various mat-
ter fields were explicitly ad hoc: the inherent local
framework prevented relationships between distinct
singularities from being discussed. In this paper, we
have shown a framework in which these questions
can be addressed concretely, systematically reduc-
ing the arbitrariness of these models.
Of course, the local nature of geometrical engi-
neering is still inherent in this framework, and con-
tinues to prevent us from addressing questions about
the global structure such as stability, quantum grav-
ity, and the quantization of seemingly continuous
parameters like . However, in the spirit of [13], we
think that local engineering is a good step toward re-
alistic string phenomenology, and may perhaps offer
new insights.
In this paper we explicitly illustrated the geomet-
ric unfolding of the matter content of an SU grand
unified model into the Standard Model. But the pro-
cedure can easily be generalized. It is not difficult to
see how this will work for a more unified theory. For
example, one can envision how an entire family could
unfold out of a single SO10 resolution (which
starts as a 16 of SO10), or how all three families of
the Standard Mode could be unfolded out of a sin-
gle SO10 SU or SU resolution.
However, these examples require more sophisticated
tools of analysis, and so we have chosen to describe
these in a forthcoming work.
V. ACKNOWLEDGEMENTS
This work originated from discussions with Mal-
colm Perry whose insights drove this work forward in
its earliest steps. The author also appreciates helpful
discussions, comments, and suggestions from Her-
man Verlinde, Sergei Gukov, Gordon Kane, Edward
Witten, Paul Langacker, Bobby Acharya, Dmitry
Malyshev, Matthew Buican, Piyush Kumar, and
Konstantin Bobkov.
This research was supported in part by the Michi-
gan Center for Theoretical Physics and a Gradu-
ate Research Fellowship from the National Science
Foundation.
[1] S. Katz and C. Vafa, “Matter from Geometry,”
Nucl. Phys., vol. B497, pp. 146–154, 1997, hep-
th/9606086.
[2] A. Klemm, W. Lerche, and P. Mayr, “K3 Fibrations
and Heterotic Type II String Duality,” Phys. Lett.
vol. B357, pp. 313–322, 1995, hep-th/9506112.
[3] M. Atiyah and E. Witten, “M-theory Dynamics on
a Manifold of Holonomy,” Adv. Theor. Math.
Phys., vol. 6, pp. 1–106, 2003, hep-th/0107177.
[4] E. Witten, “Anomaly Cancellation on Mani-
folds,” 2001, hep-th/0108165.
[5] B. Acharya and E. Witten, “Chiral Fermions from
Manifolds of Holonomy,” 2001, hep-th/0109152.
[6] B. S. Acharya and S. Gukov, “M-theory and Singu-
larities of Exceptional Holonomy Manifolds,” Phys.
Rept., vol. 392, pp. 121–189, 2004, hep-th/0409191.
[7] P. Berglund and A. Brandhuber, “Matter from
Manifolds,” Nucl. Phys., vol. B641, pp. 351–375,
2002, hep-th/0205184.
[8] E. Witten, “Deconstruction, Holonomy, and
Doublet-Triplet Splitting,” 2001, hep-ph/0201018.
[9] T. Friedmann and E. Witten, “Unification Scale,
Proton Decay, and Manifolds of G(2) Holonomy,”
Adv. Theor. Math. Phys., vol. 7, pp. 577–617, 2003,
hep-th/0211269.
[10] B. Acharya, K. Bobkov, G. Kane, P. Kumar, and
D. Vaman, “An M Theory Solution to the Hierarchy
Problem,” Phys. Rev. Lett., vol. 97, p. 191601, 2006,
hep-th/0606262.
[11] B. S. Acharya, K. Bobkov, G. L. Kane, P. Kumar,
and J. Shao, “Explaining the Electroweak Scale
and Stabilizing Moduli in M-Theory,” 2007, hep-
th/0701034.
[12] S. Katz and D. Morrison, “Gorenstein Threefold
Singularities with Small Resolutions via Invariant
Theory for Weyl Groups,” J. Algebraic Geometry
vol. 1, pp. 449–530, 1992.
[13] H. Verlinde and M. Wijnholt, “Building the Stan-
dard Model on a D3-Brane,” JHEP, vol. 01, p. 106,
2007, hep-th/0508089.
FIG. 2: The t-s plane describing the deformation of a
theory with a single 10 of SU5 into one of SU3×SU2×U1
gauge theory, with matter content (3,1)
−2/3⊕(3, 2)1/6⊕
(1,1)1, as a function of s as described by equation (6).
For a fixed value of s, the base space over which solutions
to (6) are fibred are indicated by the black line. Notice
that the relative positions of the three isolated (conical)
singularities are fixed by s.
IV. DISCUSSION
One of the primary reasons why geometrical engi-
neering had not been more widely used phenomeno-
logically is because the number, type, and relative lo-
cations of the singularities giving rise to various mat-
ter fields were explicitly ad hoc: the inherent local
framework prevented relationships between distinct
singularities from being discussed. In this paper, we
have shown a framework in which these questions
can be addressed concretely, systematically reduc-
ing the arbitrariness of these models.
Of course, the local nature of geometrical engi-
neering is still inherent in this framework, and con-
tinues to prevent us from addressing questions about
the global structure such as stability, quantum grav-
ity, and the quantization of seemingly continuous
parameters like s. However, in the spirit of [13], we
think that local engineering is a good step toward re-
alistic string phenomenology, and may perhaps offer
new insights.
In this paper we explicitly illustrated the geomet-
ric unfolding of the matter content of an SU5 grand
unified model into the Standard Model. But the pro-
cedure can easily be generalized. It is not difficult to
see how this will work for a more unified theory. For
example, one can envision how an entire family could
unfold out of a single E6 → SO10 resolution (which
starts as a 16 of SO10), or how all three families of
the Standard Mode could be unfolded out of a sin-
gle E8 → SO10×SU3 or E8 → E6 ×SU2 resolution.
However, these examples require more sophisticated
tools of analysis, and so we have chosen to describe
these in a forthcoming work.
V. ACKNOWLEDGEMENTS
This work originated from discussions with Mal-
colm Perry whose insights drove this work forward in
its earliest steps. The author also appreciates helpful
discussions, comments, and suggestions from Her-
man Verlinde, Sergei Gukov, Gordon Kane, Edward
Witten, Paul Langacker, Bobby Acharya, Dmitry
Malyshev, Matthew Buican, Piyush Kumar, and
Konstantin Bobkov.
This research was supported in part by the Michi-
gan Center for Theoretical Physics and a Gradu-
ate Research Fellowship from the National Science
Foundation.
[1] S. Katz and C. Vafa, “Matter from Geome-
try,” Nucl. Phys., vol. B497, pp. 146–154, 1997,
hep-th/9606086.
[2] A. Klemm, W. Lerche, and P. Mayr, “K3 Fibrations
and Heterotic Type II String Duality,” Phys. Lett.,
vol. B357, pp. 313–322, 1995, hep-th/9506112.
[3] M. Atiyah and E. Witten, “M-theory Dynamics on
a Manifold of G2 Holonomy,” Adv. Theor. Math.
Phys., vol. 6, pp. 1–106, 2003, hep-th/0107177.
[4] E. Witten, “Anomaly Cancellation on G2 Mani-
folds,” 2001, hep-th/0108165.
[5] B. Acharya and E. Witten, “Chiral Fermions from
Manifolds of G2 Holonomy,” 2001, hep-th/0109152.
[6] B. S. Acharya and S. Gukov, “M-theory and Singu-
larities of Exceptional Holonomy Manifolds,” Phys.
Rept., vol. 392, pp. 121–189, 2004, hep-th/0409191.
[7] P. Berglund and A. Brandhuber, “Matter from G2
Manifolds,” Nucl. Phys., vol. B641, pp. 351–375,
2002, hep-th/0205184.
[8] E. Witten, “Deconstruction, G2 Holonomy, and
Doublet-Triplet Splitting,” 2001, hep-ph/0201018.
[9] T. Friedmann and E. Witten, “Unification Scale,
Proton Decay, and Manifolds of G(2) Holonomy,”
Adv. Theor. Math. Phys., vol. 7, pp. 577–617, 2003,
hep-th/0211269.
[10] B. Acharya, K. Bobkov, G. Kane, P. Kumar, and
D. Vaman, “An M Theory Solution to the Hierarchy
Problem,” Phys. Rev. Lett., vol. 97, p. 191601, 2006,
hep-th/0606262.
[11] B. S. Acharya, K. Bobkov, G. L. Kane, P. Ku-
mar, and J. Shao, “Explaining the Electroweak
Scale and Stabilizing Moduli in M-Theory,” 2007,
hep-th/0701034.
[12] S. Katz and D. Morrison, “Gorenstein Threefold
Singularities with Small Resolutions via Invariant
Theory for Weyl Groups,” J. Algebraic Geometry,
vol. 1, pp. 449–530, 1992.
[13] H. Verlinde and M. Wijnholt, “Building the Stan-
dard Model on a D3-Brane,” JHEP, vol. 01, p. 106,
2007, hep-th/0508089.
http://arxiv.org/abs/hep-th/9606086
http://arxiv.org/abs/hep-th/9506112
http://arxiv.org/abs/hep-th/0107177
http://arxiv.org/abs/hep-th/0108165
http://arxiv.org/abs/hep-th/0109152
http://arxiv.org/abs/hep-th/0409191
http://arxiv.org/abs/hep-th/0205184
http://arxiv.org/abs/hep-ph/0201018
http://arxiv.org/abs/hep-th/0211269
http://arxiv.org/abs/hep-th/0606262
http://arxiv.org/abs/hep-th/0701034
http://arxiv.org/abs/hep-th/0508089
ABSTRACT
  Katz and Vafa showed how charged matter can arise geometrically by the
deformation of ADE-type orbifold singularities in type IIa, M-theory, and
F-theory compactifications. In this paper we use those same basic ingredients,
used there to geometrically engineer specific matter representations, here to
deform the compactification manifold itself in a way which naturally
compliments many features of unified model building. We realize this idea
explicitly by deforming a manifold engineered to give rise to an $SU_5$ grand
unified model into a one giving rise to the Standard Model. In this framework,
the relative local positions of the singularities giving rise to Standard Model
fields are specified in terms of the values of a small number of complex
structure moduli which deform the original manifold, greatly reducing the
arbitrariness of their relative positions.

<|endoftext|><|startoftext|>
Geometrically Engineering the Standard Model:
Locally Unfolding Three Families out of E8
Jacob L. Bourjaily∗
Joseph Henry Laboratories, Princeton University, Princeton, NJ 08544
(Dated: 3rd April 2007)
This paper extends and builds upon the results of [1], in which we described how to use the tools
of geometrical engineering to deform geometrically-engineered grand unified models into ones with
lower symmetry. This top-down unfolding has the advantage that the relative positions of singular-
ities giving rise to the many ‘low energy’ matter fields are related by only a few parameters which
deform the geometry of the unified model. And because the relative positions of singularities are
necessary to compute the superpotential, for example, this is a framework in which the arbitrariness
of geometrically engineered models can be greatly reduced.
In [1], this picture was made concrete for the case of deforming the representations of an SU5
model into their Standard Model content. In this paper we continue that discussion to show how
a geometrically engineered 16 of SO10 can be unfolded into the Standard Model, and how the
three families of the Standard Model uniquely emerge from the unfolding of a single, isolated E8
singularity.
I. INTRODUCTION
In [2], Katz and Vafa showed how to geometrically
engineer matter representations in terms of the local
singularity structure of type IIa, M-theory, and F-
theory compactifications. In that framework matter
and gauge theory both have purely geometrical ori-
gins: SUn, SO2n and En gauge theories arise from
the existence of co-dimension four singular curves of
certain types in the compactification manifold [3];
and massless matter representations arise from iso-
lated points (in type IIa or M-theory) or curves (in
F-theory) along the singular surface over which the
type of singularity is enhanced by one rank.
Despite the extraordinary generality of this frame-
work, it has not been widely used phenomenologi-
cally. This is largely because the description of the
isolated enhancements of singularities giving rise to
various matter representations is inherently local:
although the geometry near any particular enhance-
ment could be described concretely, the framework
had nothing to say about numbers, types, and rela-
tive locations of different matter fields. This global
data was either to be determined by duality to a
concrete, global string theory model1, or suggested
via the a posteriori success of a given set of relative
positions (as in e.g. [5, 6]).
Another way to relate the number and relative
positions of (enhanced singularities giving rise to)
matter fields was given in [1]: in that paper, we
described for example how a local description of the
geometry giving rise to a massless 5 of SU5 could
be smoothly deformed into a local description of a
(3,1) and a (1,2) of SU3×SU2 which live at distinct
∗Electronic address: jbourjai@princeton.edu
1 Geometrically engineered models in M-theory are dual to
intersecting brane models in type II, (see e.g. [4]).
points—related by a single deformation parameter.
A cartoon of what was described in that paper is
shown in Figure 1.
In this paper we describe pedagogically how to
extend that idea to engineer analogies to SO10 and
E6 × SU2 grand unified models
2. Although in [1]
we were able to analyze explicit unfoldings of SO10
and SU6 singularities sufficiently well by sight, this
will not be possible for our present examples. All
of the examples in this paper involve the unfolding
of isolated En singularities; and although algebraic
descriptions of these are known and classified [7], it
would be unnecessarily cumbersome and unenlight-
ening to analyze them explicitly as we did in [1].
Therefore, in section II we describe a much more
powerful and elegant language in which to study
these resolutions.
In section III we describe in detail how the un-
folding of a 16 of SO10 into the Standard Model is
derived in the language of section II. This is achieved
in two stages: in the first stage, we unfold the 16 into
10⊕5⊕1 of SU5; we then unfold the resulting SU5
model into a single ‘family’ of the Standard Model.
At the end, all the relative positions of the singu-
larities of the family are set by the non-zero values
of two complex structure moduli, thereby greatly re-
ducing the arbitrariness of their relative positions.
The next most obvious example would be a de-
scription of how a 27 of E6 geometrically unfolds
into the Standard Model. However, there are two
reasons to leave this example to the reader: first, it is
a most natural extension of the results of section III;
secondly, it is a consequence of the E6 × SU2 grand
unified model which we describe in section IV. Al-
2 As described in section IV, the resolution E8 → E6 × SU2
naturally starts as a theory with three 27’s of E6 related
by an SU2 family symmetry.
http://arxiv.org/abs/0704.0445v2
mailto:jbourjai@princeton.edu
FIG. 1: A cartoon of the geometric deformation of a 5 and 10 of SU5 into the Standard Model as described in [1].
The surface over which the singularities are enhanced is coordinatized by a complex parameter t and the geometry
is deformed by changing the value of a complex parameter s. The relative locations of the ‘resolved’ singularities are
given in terms of s.
though not given as an example in [2], it is not hard
to see3 that a single isolated E8 singularity at the
intersection of a co-dimension four surfaces of types
E6 and SU2 gives rise to matter in the representa-
tion (27,2)⊕ (27,1) ⊕ (1,2). It is easy to see how
this would unfold into the matter content of three
families—one coming from each of the 27’s. That
three families emerge from E8 is a general conse-
quence of group theory and can be understood from
the fact that E6 ×SU3 is a maximal subgroup of E8
into which the adjoint of E8 partially branches into
an SU3 triplet of 27’s.
As in the preceding paper [1], this work is pre-
sented concretely in the language of Calabi-Yau
compactifications of type IIa string theory, which
can also be naturally extended to F-theory mod-
els. Here, we engineer the explicit local geometry
of (non-compact) Calabi-Yau three-folds which are
K3-fibrations over C1. If type IIa string theory is
compactified on this three-fold, a four-dimensional
N = 2 theory with various massless hypermulti-
plets will result. But if, for example, the C1 base of
this three-fold were fibred as an O(−2) bundle over
1, the resulting total space would be a Calabi-Yau
four-fold4 upon which F-theory would compactify to
an N = 1 theory with chiral multiplets. However,
because the manifold over which the singular K3’s
are fibred in M-theory is a real, three-dimensional
space, our fibrations over C1 do not have a direct
application to M-theory.
It would of course be desirable to have a similar
description of geometric unfolding explicitly in the
language of G2-manifolds so that this picture could
be realized concretely in M-theory as well. This is
particularly important in light of the recent advances
in M-theory phenomenology (e.g. [8, 9]). By exten-
3 This is described in section II.
4 This is just one example of the ways in which these Calabi-
Yau three-folds could be fibred over CP1 to result in a
Calabi-Yau four-fold.
sion of the work of Berglund and Brandhuber in [10],
such a generalization should be relatively straight-
forward, but we will not attempt to do this here.
II. RESOLVING En-TYPE SINGULARITIES
Recall that a gauge theory in type IIa string the-
ory can arise from compactification to six dimen-
sions over a singular K3 surface (similar statements
apply to M-theory and F-theory) [3]. The complex
structures of the singular compactification manifolds
giving rise to SUn(≡ An−1), SO2n(≡ Dn), and En
gauge theory are given in Table I—where the sur-
faces are labelled conveniently by the name of the
resulting gauge theory5.
We can generalize this discussion by considering
a complex, one-dimensional space B over which a
smooth family of singular K3 surfaces are fibred. If
almost everywhere over B the K3-fibres have singu-
larities of a single type, then compactification of type
IIa string theory over the total space will give rise
Gauge group Polynomial
SUn (≡ An−1) xy = z
SO2n (≡ Dn) x
2 + y2z = zn−1
2 = y3 + z4
2 + y3 = 16yz3
2 + y3 = z5
TABLE I: Hypersurfaces in C3 giving rise to the desired
orbifold singularities.
5 It is of curious historical interest that the equations listed
in Table I were first identified by Fleix Klein in 1884 [11].
The reader may also be amused that the full resolutions of
these surfaces were almost completely classified—up to a
few computational errors—by Bramble in 1918 [12].
to gauge theory in four-dimensions of the type corre-
sponding to the typical fibre. Massless charged mat-
ter will arise if over isolated points in B the type of
fibre is enhanced by one rank. The geometry about
a single such isolated point where the singularity is
enhanced was described in detail by Katz and Vafa
in [2].
The representation of matter living at these ‘more-
singular’ points was also given in [2]: suppose that
G ⊃ H ×U1 and that the rank of H is one less than
G; then, if there is an isolated G-type singularity
over a surface of H-type singularities, the resulting
massless representation is given by those parts of the
decomposition of the adjoint of G into H×U1 which
are charged under the U1.
Because the question of how to (smoothly) deform
the surfaces of Table I into ones of lower rank has in-
trinsic mathematical interest, it is not too surprising
that all possible two-dimensional deformations have
been classified. Our discussion below will make use
of the notation and results presented in [7].
In our present work, we are interested in defor-
mations of En singularities into ones of lower rank.
Unlike SUn singularities, the resolutions of which are
easy enough to read off by sight, the algebraic com-
plexity of En singularities is formidable. To appre-
ciate what is meant by this, consider the resolution
of E7. From Table I we know that an E7 singularity
is locally isomorphic to the surface x2 + y3 = 16yz3
in C3. Its full resolution in terms of the seven defor-
mation parameters ~t = (t1, t2, . . . , t7) is given by
− x2 − y3 + 16yz3 + ǫ2(~t)y
2z + ǫ6(~t)y
2 + ǫ8(~t)yz + ǫ10(~t)z
2 + ǫ12(~t)y + ǫ14(~t)z + ǫ18(~t) = 0, (1)
where the ǫn(~t) are n
th order symmetric polynomi-
als in the components of ~t which are tabulated over
several pages of the appendix of [7].
A näıve way to determine the type of singularity
found by resolving E7 “in the direction ~t” would be
to expand equation (1) completely using the explicit
functions ǫn(~t), find each of its singular points, and
expand locally about each until an isomorphism with
a singularity of lower rank in Table I was clear. This
is the way, for example, that [2] demonstrated that
the resolution of E7 in the direction (0, 0, 0, 0, 0, t, 0)
gives rise to E6 for t 6= 0. All of the results in this
paper could be verified in this way. Luckily, however,
Katz and Morrison described a much more powerful
and direct way to analyze the deformations of En
singularities [7].
We would like a pragmatic answer to the follow-
ing question: what is the type of fibre found by re-
solving an En singularity in the direction ~t? That
there is an easy answer to this question makes our
work much simpler. Although an adequate treat-
ment would take us well beyond the scope of our
present discussion, the answer given in [7] is at least
very easy to make use of6: for each of the equations
in Table II satisfied by the components of ~t, the sin-
gularity has the corresponding root. Given the list of
roots, it is then a straight-forward exercise to con-
struct the Dynkin diagram corresponding to the sin-
gularity7.
6 Of course, this answer does depend on the parameterization
used. As stated before, we are working with the conventions
of [7].
7 Misusing the notation of [7] in a way applicable only to
In an admittedly bad notation, we consider each of
the n deformation parameters ti(t) to be functions
of t, the local coordinate on the base space B. A
(non-Abelian) gauge theory will be present if there
are roots implied by Table II which are preserved
for generic values of t. And charged massless matter
will exist if at isolated points {t∗} an additional root
is added—or, in terms of Dynkin diagrams, if an
additional node is added. At each isolated point we
can therefore identify the resolution G → H and
thereby determine the resulting representation.
Equation Root
ti − tj = 0 =⇒ ei − ej
ti + tj + tk = 0 =⇒ e0 − ei − ej − ek
tij = 0 =⇒ 2e0 −
2ti1 +
tij = 0 =⇒ 3e0 − 2ei1 −
TABLE II: The roots of the singularity resulting from the
resolution of En in the direction~t. This is a reproduction
of Table 4 of Ref. [7].
SUn, SO2n and En, one can think of the vectors ei as an or-
thonormal basis in Minkowski space which is equipped with
a mostly-plus metric. Then roots are vectors in this space
of norm +2. Each (positive) root gives rise to a node in the
resulting Dynkin diagram, and two nodes are connected by
a line if their inner product is −1 and disconnected if they
are orthogonal.
III. GEOMETRIC ANALOGUE OF SO10
GRAND UNIFICATION
A. The Description of a 16 of SO10
A necessary starting point to describe the unfold-
ing of a 16 of SO10 into the Standard Model is a
description of the initial geometry as was done in [2].
We will briefly review that construction in the lan-
guage described above before we unfold it, first into
an SU5 model, and later all the way into SU3×SU2.
Let t be a local complex coordinate on the space
B over which is fibred the resolution of E6 parame-
terized by ~t = (t, t, t, t, t,−2t). To be clear, for each
value of t, the vector~t describes an explicit surface in
3 given in reference [7] analogous to that of equa-
tion (1) above.
Considering the rules of Table II, we see that for
an arbitrary value of t 6= 0 the root lattice of the
fibre is
(e0−e1−e2−e6)
(e1 − e2) (e2 − e3) (e3 − e4) (e4 − e5)
where we have displayed the roots suggestively so as
to reproduce the SO10 Dynkin diagram. At t = 0,
however, E6 is restored. So we have an isolated E6
fibre over the point t = 0, while for any t 6= 0 the
fibre is SO10. This gives rise to SO10 gauge theory
with a single massless 16 located at the origin in the
t-plane.
B. Unfolding the 16 of SO10 into SU5
We would like to unfold the manifold described
above into one with SU5 gauge theory. It is not
hard to guess in what ‘directions’ ~t we may deform
the the geometry so that the fibre over a generic
point is SU5. Let a denote a parameter independent
of t which adjusts the whole geometry over the re-
gion which is coordinatized by t. Then let the fibre
over t be given by the resolution of E6 in the direc-
tion (t, t, t, t, t+ a,−2t− a). Obviously when a = 0
the situation is the same as above and results in a
single massless 16 of SO10. However, when a 6= 0
the situation is different: for generic values of t it is
easy to see that the simple roots are
(e0−e1−e5−e6) (e1−e2) (e2−e3) (e3−e4)
which means that the generic fibre over t is just
SU5—and so the resulting gauge theory is SU5.
To find what matter representations exist, we
must determine over which locations t the rank of
the fibre is enhanced. This means we are seeking
special values of t (determined by a) at which an ad-
ditional equation in Table II is satisfied. For each of
these points, we can draw the resulting Dynkin dia-
TABLE III: The locations on the complex t-plane over
which the singularity of the fibre is enhanced, and the
representations of SU5 × U1 that result.
Location Fibre
Representation
of SU5 × U1
3t + 2a = 0 SU5 × SU2 1−5
3t+ a = 0 SO10 10−1
t = 0 SU6 53
gram to determine the fibre over that point, thereby
determining the representation which arises there.
It is not hard to exhaustively find all these ‘more
singular’ points. They are give in Table III. Notice
that we have included the U1-charge assignments
that result; these are normalized as in the appendix
of [13].
C. Unfolding a 16 of SO10 into the Standard
Model
To complete our task and unfold the 16 of SO10
all the way to the Standard Model, we must deform
the fibres by another ‘global’ parameter, which we
will denote b. It is not hard to guess a direction over
which the generic fibre will be SU3 × SU2: try for
example (t, t, t, t + b, t + a,−2t− a − b). Again, we
notice that for a general location t and generic fixed
values a, b 6= 0, the singularity has the root structure
(e1 − e2) (e2 − e3)
(e0 − e4 − e5 − e6)
, (4)
which is visibly SU3 × SU2.
Like above, it is a straight-forward exercise to de-
termine all the locations over which the singularity
is enhanced, and the resulting representation which
arises. These points including their resulting rep-
resentations (with U1-charges as normalized in [13])
are listed in Table IV. The entire unfolding is repro-
duced graphically in Figure 2.
Location Fibre Representationof SU3×SU2×U1 Name
3t + 2a+ b = 0 SU3 × SU2 × SU2 (1,1)0 ν
3t + a+ 2b = 0 SU3 × SU2 × SU2 (1,1)6 e
3t+ a+ b = 0 SU5 (3,2)1 Q
3t + a = 0 SU4 × SU2 (3,1)−4 u
3t+ b = 0 SU4 × SU2 (3,1)2 d
t = 0 SU3 × SU3 (1,2)−3 L
TABLE IV: The locations on the complex t-plane over
which the singularity of the fibre is enhanced and the
representations of SU3 × SU2 × U1 that result.
FIG. 2: An illustration of the resolution of a geometrically engineered 16 of SO10 into the Standard Model as a
function of two complex structure moduli a and b as described in section III. The coordinate along the base space is t
and runs vertically in the diagram. When a = b = 0, along the left-hand side, there is just one isolated E6 singularity
at t = 0. When b = 0 but a is allowed to vary, this single singularity splits into three, and any a =constant slice will
have three isolated singularities in the complex t-plane as shown above. Moving rightward in the diagram, at the
dashed line a is held fixed and b is allowed to grow, causing the three enhancements of SU5 to break apart into six
total isolated singularities over SU3 ×SU2, which is shown on the right-hand-side. Also shown are the (appropriately
normalized) U1 charges of fields obtained via this multiple unfolding.
IV. GEOMETRIC ANALOGUE OF E6 × SU2
GRAND UNIFICATION
After having completed the unfolding of a 16 of
SO10 into the Standard Model, it is natural to ask if
this idea can be extended to relate all the singulari-
ties of the Standard Model as perhaps the unfolding
of a single isolated singularity of higher-rank. The
answer is in fact yes—and there is a sense in which
precisely three families arise if the notion of ‘geomet-
ric unification’ is saturated.
Because a 16 of SO10 arises from the resolution
E6 → SO10, it can only be unfolded out an ex-
ceptional singularity. Clearly the highest level of
unification one can achieve along this line would
be to start with a resolution E8 → H where H is
a rank-seven subgroup of E8 which contains SO10.
The possible ‘top-level’ gauge groups are then E7,
E6 × SU2, and SO10 × SU3. We choose to study
E8 → E6 × SU2 as our example because it will nat-
urally include a description of the unfolding of 27 of
E6 into the Standard Model, which is interesting in
its own right, and because it follows quite directly
from our work in section III.
The initial geometry which we will deform into the
Standard Model is given as follows. Let t be a com-
plex coordinate on the base space B over which is fi-
bred the resolution (t, t, 0, 0, 0, 0, 0, 0) of E8. Clearly,
when t = 0 we recover E8; when t 6= 0 we see that
the roots of the fibre are
(e0−e3−e4−e5)
(e3 − e4) (e4 − e5) (e5 − e6) (e6 − e7) (e7 − e8)
(e1 − e2)
, (5)
which is visibly E6 × SU2. Following the general
rule to determine the representation resulting from
a given resolution [2], we find that at t = 0 lives
massless matter charged in the (27,2)1⊕(27,1)−2⊕
(1,2)3 representation of E6 × SU2 × U1ϕ .
To avoid pedantic redundancy, in Figure 3 we have
summarized in great detail the entire unfolding into
SU3×SU2×U1Y ×U1χ×U1Ψ×U1ζ×U1ϕ . An outline
of the steps involved in deriving this unfolding is
given presently.
First, the unfolding of the E6 × SU2 gauge the-
ory into E6 gauge theory is obtained by defining
the fibre over t to be given by the resolution of E8
in the direction (t + a, t − a, 0, 0, 0, 0, 0, 0) for some
a 6= 0. This clearly kills the SU2 node of the fibre
in equation (5). There are five locations at which
the singularity is enhanced by one rank, giving rise
to three 27’s and two singlets as shown in the left-
most section of Figure 3.
FIG. 3: An illustration of the resolution of a single isolated E8 into the Standard Model in terms of four deformation
parameters a, b, c, d. Along the left hand side, for a = b = c = d = 0, the generic E6 × SU2 fibre is enhanced to E8
at t = 0. Moving from left to right, a, b, c, d are sequentially allowed to grow to some non-zero value—and between
dashed lines all but one of the moduli are held fixed. Solid lines indicate the locations of enhanced singularities
relative to the plane for as functions of a, b, c, d. The complete list of isolated singularities, their locations, and charge
assignments are given on the right hand side of the diagram.
The rest of the unfolding is a natural appli-
cation of the work in section III. Let us now
set the fibre over t to be given by the resolution
(t+ a, t− a, b, b, b, b, b,−2b) of E8 for arbitrary com-
plex deformation parameters a, b 6= 0. From section
III we see immediately that the generic fibre is SO10.
A thorough scanning for possible solutions to equa-
tions in Table II shows that there are 11 isolated
points on the complex t-plane over which the singu-
larity is enhanced. These correspond to the ‘break-
ing’ of each 27 of E6 into 16 ⊕ 10 ⊕ 1 of SO10,
while the singlets remain singlets. This is seen in
the second vertical strip (from the left) in Figure 3.
Again, following our discussion above, it is easy
to guess possibilities for the next two resolution di-
rections. First, we set the fibre over t to be given by
(t+ a, t− a, b, b, b, b, b+ c,−2b− c) which will result
SU5 gauge theory with matter content correspond-
ing to the ‘canonical’ decomposition of three 27’s of
E6 with two singlets. And finally, the full resolution
of the E6×SU2 grand unified model into SU3×SU2
can be given by letting the fibre over t be given by
the (t+a, t−a, t, t, t, t+d, t+c,−2b−c−d) resolution
of E8 for (generic) arbitrary fixed complex structure
moduli a, b, c, d 6= 0.
V. IMPLICATIONS
Let us clarify what we have done. For a given set
of fixed, nonzero complex structure moduli, the reso-
lution given above describes the explicit, local geom-
etry of a non-compact Calabi-Yau three-fold, which
is a K3-fibration over C1. If type IIa string theory
is compactified on this three-fold, the resulting four-
dimensional theory will have SU3 × SU2 gauge the-
ory with hypermultiplets at isolated points as given
in Figure 3 which reproduce the spectrum of three
families of the Standard Model with an extended
Higgs sector and some exotics. Alternatively, if one
takes this (non-compact) Calabi-Yau three-fold and
fibres it over CP1 as described in section I so that
the total space is Calabi-Yau, then F-theory on this
space will give rise to N = 1 supersymmetry with
SU3×SU2 gauge theory and chiral multiplets in the
representations given in Figure 3. And although it
does not follow directly from our construction above,
considering the close similarities between two- and
three-dimensional resolutions of the singularK3 sur-
faces we have every reason to suspect an analogous
geometry can be engineered for M-theory in terms
of hyper-Kähler quotients by extension of the results
in [10, 14, 15]. We are currently working on building
this geometry in M-theory, and we expect to report
on this work soon.
Given these four complex structure moduli, all
the relative positions of the 35 disparate singulari-
ties giving rise to all three families of the (extended)
Standard Model are then known8. Beyond the usual
three families of the Standard Model, the manifold
also gives rise to two Higgs doublets for each fam-
ily, six Higgs colour triplets, three right-handed neu-
trinos and five other Standard Model singlets. We
should point out that this matter content (and their
U1-charge assignments) is a consequence of group
theory and algebraic geometry alone—it is simply
what is found when unfolding E8 all the way to the
Standard Model.
And given the relative positions and local ge-
ometry of the singularities together with the U1-
structure, one can in principle compute the full su-
perpotential coming from instantons wrapping dif-
ferent singularities. Because these are fixed by the
values of the complex structure moduli, there is a
(complex) four-dimensional landscape9 of different,
explicit SU3 × SU2 embeddings at the compactifi-
cation scale. Although this large landscape may
appear to have too much freedom, we remind the
reader that in the traditional understanding of ge-
ometrical engineering there would be hundreds of
parameters describing the (independent) relative lo-
cations of each of the isolated singularities.
There are a few things to notice about the form of
the superpotential that will emerge. First, because
of the U1-charge assignments, each term in the su-
perpotential must combine exactly one term arising
from each of the 27’s. This greatly limits the form
of the superpotential. And in particular, it implies
that neither mass nor flavour eigenstates will arise
from any single 27—that is, the ‘families’ in the col-
loquial sense are necessarily linear combinations of
fields resulting from different 27’s.
Also notice that in general the terms in the su-
perpotential will be proportional to e−
dVol where
dVol is the volume form of some cycle wrapping sin-
gularities (the details of which depends on whether
we are talking about type IIa, M-theory, or F-theory
realizations), and are in principle calculable in terms
of the deformation moldui. And because these coef-
ficients are exponentially related to the volumes of
cycles, we expect the high-scale Lagrangian will be
generically hierarchical. This structure could be im-
portant for solving problems in phenomenology—for
example the µ problem in the Higgs potential, the
Higgs doublet-triplet splitting problem, or avoiding
8 A subtlety, however, is that because our language has been
explicitly that of N = 2 theory from type IIa, we are unable
to distinguish the 5 from the 5 in the splitting of the 10’s
of SO10. In Figure 3, a consistent choice was made—and
although we do not justify this claim here, it is the choice
that will be correct for the M-theory generalization of this
work.
9 That it is continuous is a consequence of the fact that we are
engineering non-compact Calabi-Yaus. If one matched this
local geometry to a compact global structure, the landscape
would of course be discrete.
proton decay.
We are in the process of studying the phenomenol-
ogy of models on this landscape. At first glance,
the U1-structure combined with high-scale hierar-
chies could possibly be complex enough to be able to
avoid some of the typical problems of E6-like grand
unified models. We should point out that if there
were no high-scale hierarchies, however, then the al-
lowed terms in the superpotential would generically
give rise to low-energy lepton and baryon number vi-
olation, similar to any ‘generic’ E6 model—i.e. one
which includes all types of terms allowed by the E6-
mandated U1-structure [16]. We could always im-
pose additional symmetries and add fields by hand
to solve these problems, but this would not be very
compelling. However, if viable models already ex-
ist in the landscape which do not require additional
fields or symmetries, these would be compelling even
if we do not yet understand how they are selected.
One of the most important phenomenological
questions about these models is the fate of the
additional U1 symmetries. Although we suspect
that one can determine which of the U1 symmetries
are dynamical below the compactification scale by
studying the normalizability of their corresponding
vector multiplets, we do not presently have have
a complete understanding of this situation. Of
course, if any additional U1’s survive to low en-
ergy they could have very interesting—or damning—
phenomenological consequences.
VI. DISCUSSION
An important point to bear in mind when consid-
ering geometrically engineered models is that there
generically exist10 moduli which can deform the ge-
ometry into one which gives rise to a theory with
less gauge symmetry. For example, if you are given a
geometrically-engineered SO10 grand unified model,
then our results show explicitly that the model can
be locally deformed into an SU5 model, and this
can be deformed further into the Standard Model;
the original SO10 theory is seen to be a single
point in a (complex) two-dimensional landscape of
SU3×SU2 theories. And because larger symmetries
always lie in lower dimensional surfaces of moduli
space, it is very relevant to ask what physics pre-
vents this unfolding from taking place. Indeed, this
question applies to the Standard Model as well—
our analysis could easily go further to unfold away
SU3×SU2. We are not presently able to answer why
this does not happen11; although this observation
10 There could be global obstructions which prevent such a
deformation from taking place. But these are invisible to
the non-compact, local constructions considered here.
11 Although, perhaps the unfolding of SU2 may provide an al-
ternative to tuning in the usual Higgs sector [17]. It would
be interesting to understand in greater detail the relation-
suggests that perhaps theories with less symmetry,
like SU3 × SU2, could be much more natural than
grand unified theories.
More generally, it is not presently understood
what physics controls the values of the geometric
moduli which deform the manifold—the parameters
which deform the E8 → E6×SU2 complex structure,
for example. We do not yet have a general mecha-
nism which would fix these parameters; we simply
observe that any non-zero values of the moduli will
give rise to a geometrically engineered manifold with
SU3×SU2 gauge theory ‘peppered’ with all the nec-
essary singularities of the three families of the Stan-
dard Model together with the usual E6-like exotics.
And importantly, for any point in the complex four-
dimensional ‘landscape,’ the relative locations of all
the relevant singularities are known—and hence in
principle so is the superpotential.
This relationship between moduli-fixing and
gauge symmetry breaking could be a novel feature
of geometrically-unfolded models. It may allow one
to apply the results in [8], for example, to single out
theories on the landscape. However, a prerequisite
to this type of analysis would be an identification
of which moduli should be identified with the ones
which deform the geometry as described here.
Although the motivation in this paper and in
[1] appears to be a top-down realization of grand
unification, there is a sense in which we are re-
ally engineering from the bottom-up. Specifically,
because the local geometry we have described is
non-compact, the resulting theory is decoupled from
quantum gravity, and the parameters along the land-
scape of deformations are continuous. This is not
unlike the situation in [18]. But what we lose in
global constraints we perhaps gain by concrete local
structure. Not only do we have a framework which
naturally predicts three families with a rather de-
tailed phenomenological structure, but we have done
so in a way that preserves all the information about
the local geometry. And because this framework re-
alizes the ‘physics from pure geometry’ paradigm in
a potentially powerful way, it could prove impor-
tant to concrete phenomenological constructions in
M-theory, for example.
Of course we envision these local geometries to
be embedded within compact Calabi-Yau manifolds.
It is an assumption of the framework that the pre-
cise global topology of the compactification mani-
fold can be ignored at least as a good first approx-
imation. One may ask the extent to which these
constructions can be glued into compact manifolds.
Concretely: under what circumstances can a non-
compact Calabi-Yau three-fold which is a fibration of
K3 surfaces with asymptotically uniform ADE-type
ship between unfolding and the Higgs mechanism.
singularities be compactified? This is an important
question for mathematicians, the answer to which
would likely lead to important physical insight—e.g.
quantization of the moduli space of deformations.
A possible objection to this framework is that
our constructions appear to depend on several seem-
ingly arbitrary choices (the specific chain from E8 to
the Standard Model, which roots were eliminated at
each step, etc.). However, it is likely that the particle
content, for example, which results is completely in-
dependent of these choices. Furthermore, we suspect
that different realizations of the unfolding merely re-
sult in different parameterizations of the landscape,
and do not reflect true additional arbitrariness. But
this is still an area that deserves attention.
Lastly, because in this picture the Standard Model
is seen to unfold at the compactification scale, one
may ask what has become of gauge coupling unifi-
cation. Because the gauge coupling constants are
functions of the volumes of their corresponding co-
dimension four singular surfaces12 which depend on
the deformation moduli, the traditional meaning of
grand unification is more subtle here—as is typical
in string phenomenology. For example, although we
chose to unfold the Standard Model sequentially as
a series of less unified models, there is no reason to
suspect that that order has any physical importance.
Surely, if as we parameterized the unfolding in sec-
tion IV, setting d → 0 (or c → 0) would result in an
SU5 grand unified theory; but setting a → 0 instead
would result in a restoration of family symmetry.
The four complex structure moduli tune different
types of unification separately—and should simulta-
neously be at play in the question of gauge coupling
unification.
It is interesting to note, however, that if one were
to simultaneously scale the values of all the moduli
to be very small, the spectrum would be more and
more unified: the relative distances between singu-
larities shrink, unifying the coefficients in the su-
perpotential; and the volumes of the co-dimension
four singularities (if realized in a compact manifold)
would approach one another, resulting in a unifica-
tion of their gauge couplings. What this may mean
phenomenologically remains to be understood.
In this paper we have described a local, purely ge-
ometric framework in which gauge symmetry ‘break-
ing’ can be re-cast as a problem of moduli fixing—
and in which the same moduli which describe this
geometric ‘unfolding’ also determine the physics of
massless matter. And although we still do not un-
derstand the mechanisms by which these moduli are
fixed, the landscape of possibilities is already enor-
mously reduced: what would have been the hun-
dreds of parameters describing the relative positions
on the compactification manifold of the Standard
Model’s three families worth of matter fields, we
specify them all in terms of only four complex struc-
ture moduli which describe the unfolding of an iso-
lated E8 singularity. And the fact that three families
emerges is group-theoretic and not added by hand.
VII. ACKNOWLEDGEMENTS
It is a pleasure to thank helpful discussions with
and insightful comments of Herman Verlinde, Sergei
Gukov, Gordy Kane, Paul Langacker, Edward Wit-
ten, Cumrun Vafa, Brent Nelson, Malcolm Perry,
Dmitry Malyshev, Matthew Buican, Piyush Kumar,
and Konstantin Bobkov.
This work was funded in part by a Graduate Re-
search Fellowship from the National Science Foun-
dation.
12 Of course, this can only be discussed concretely when the
compact manifold is known.
[1] J. L. Bourjaily, “Multiple Unfoldings of Orbifold
Singularities: Engineering Geometric Analogies to
Unification,” arXiv:0704.0444.
[2] S. Katz and C. Vafa, “Matter from Geome-
try,” Nucl. Phys., vol. B497, pp. 146–154, 1997,
hep-th/9606086.
[3] A. Klemm, W. Lerche, and P. Mayr, “K3 Fibrations
and Heterotic Type II String Duality,” Phys. Lett.,
vol. B357, pp. 313–322, 1995, hep-th/9506112.
[4] B. S. Acharya and S. Gukov, “M-theory and Singu-
larities of Exceptional Holonomy Manifolds,” Phys.
Rept., vol. 392, pp. 121–189, 2004, hep-th/0409191.
[5] E. Witten, “Deconstruction, G2 Holonomy, and
Doublet-Triplet Splitting,” 2001, hep-ph/0201018.
[6] T. Friedmann and E. Witten, “Unification Scale,
Proton Decay, and Manifolds of G(2) Holonomy,”
Adv. Theor. Math. Phys., vol. 7, pp. 577–617, 2003,
hep-th/0211269.
[7] S. Katz and D. Morrison, “Gorenstein Threefold
Singularities with Small Resolutions via Invariant
Theory for Weyl Groups,” J. Algebraic Geometry,
vol. 1, pp. 449–530, 1992.
[8] B. Acharya, K. Bobkov, G. Kane, P. Kumar, and
D. Vaman, “An M Theory Solution to the Hierarchy
Problem,” Phys. Rev. Lett., vol. 97, p. 191601, 2006,
hep-th/0606262.
[9] B. S. Acharya, K. Bobkov, G. L. Kane, P. Ku-
mar, and J. Shao, “Explaining the Electroweak
Scale and Stabilizing Moduli in M-Theory,” 2007,
hep-th/0701034.
[10] P. Berglund and A. Brandhuber, “Matter from G2
Manifolds,” Nucl. Phys., vol. B641, pp. 351–375,
2002, hep-th/0205184.
[11] F. Klein, Vorlesungen über das Ikosaeder und die
Auflösung der Gleichungen vom funften Grade.
Leipzig: Teubner, 1884.
[12] C. C. Bramble, “A Collineation Group Isomorphic
with the Group of Double Tangents of the Plane
http://arxiv.org/abs/0704.0444
http://arxiv.org/abs/hep-th/9606086
http://arxiv.org/abs/hep-th/9506112
http://arxiv.org/abs/hep-th/0409191
http://arxiv.org/abs/hep-ph/0201018
http://arxiv.org/abs/hep-th/0211269
http://arxiv.org/abs/hep-th/0606262
http://arxiv.org/abs/hep-th/0701034
http://arxiv.org/abs/hep-th/0205184
Quartic,” Amer. J. Math., vol. 40, pp. 351–365,
1918.
[13] R. Slansky, “Group Theory for Unified Model Build-
ing,” Phys. Rept., vol. 79, pp. 1–128, 1981.
[14] M. Atiyah and E. Witten, “M-theory Dynamics on
a Manifold of G2 Holonomy,” Adv. Theor. Math.
Phys., vol. 6, pp. 1–106, 2003, hep-th/0107177.
[15] B. Acharya and E. Witten, “Chiral Fermions from
Manifolds of G2 Holonomy,” 2001, hep-th/0109152.
[16] J. L. Hewett and T. G. Rizzo, “Low-Energy Phe-
nomenology of Superstring Inspired E(6) Models,”
Phys. Rept., vol. 183, p. 193, 1989.
[17] This was pointed out by Paul Langacker in a private
correspondance.
[18] H. Verlinde and M. Wijnholt, “Building the Stan-
dard Model on a D3-Brane,” JHEP, vol. 01, p. 106,
2007, hep-th/0508089.
http://arxiv.org/abs/hep-th/0107177
http://arxiv.org/abs/hep-th/0109152
http://arxiv.org/abs/hep-th/0508089
ABSTRACT
  This paper extends and builds upon the results of an earlier paper, in which
we described how to use the tools of geometrical engineering to deform
geometrically-engineered grand unified models into ones with lower symmetry.
This top-down unfolding has the advantage that the relative positions of
singularities giving rise to the many `low energy' matter fields are related by
only a few parameters which deform the geometry of the unified model. And
because the relative positions of singularities are necessary to compute the
superpotential, for example, this is a framework in which the arbitrariness of
geometrically engineered models can be greatly reduced.
  In our earlier paper, this picture was made concrete for the case of
deforming the representations of an SU(5) model into their Standard Model
content. In this paper we continue that discussion to show how a geometrically
engineered 16 of SO(10) can be unfolded into the Standard Model, and how the
three families of the Standard Model uniquely emerge from the unfolding of a
single, isolated E8 singularity.

<|endoftext|><|startoftext|>
Introduction
The classification of smooth, complex surfaces S of general type with small birational in-
variants is quite a natural problem in the framework of algebraic geometry. For instance, one
may want to understand the case where the Euler characteristic χ(OS) is 1, that is, when the
geometric genus pg(S) is equal to the irregularity q(S). All surfaces of general type with these
invariants satisfy pg ≤ 4. In addition, if pg = q = 4 then the self-intersection K
S of the canon-
ical class of S is equal to 8 and S is the product of two genus 2 curves, whereas if pg = q = 3
then K2S = 6 or 8 and both cases are completely described ([CCML98], [HP02], [Pir02]). On
the other hand, surfaces of general type with pg = q = 0, 1, 2 are still far from being classified.
We refer the reader to the survey paper [BaCaPi06] for a recent account on this topic and a
comprehensive list of references.
A natural way of producing interesting examples of algebraic surfaces is to construct them as
quotients of known ones by the action of a finite group. For instance Godeaux constructed in
[Go31] the first example of surface of general type with vanishing geometric genus taking the
quotient of a general quintic surface of P3 by a free action of Z5. In line with this Beauville
proposed in [Be96, p. 118] the construction of a surface of general type with pg = q = 0, K
S = 8
as the quotient of a product of two curves C and F by the free action of a finite group G whose
order is related to the genera g(C) and g(F ) by the equality |G| = (g(C)−1)(g(F )−1). Gener-
alizing Beauville’s example we say that a surface S is isogenous to a product if S = (C ×F )/G,
for C and F smooth curves and G a finite group acting freely on C ×F . A systematic study of
these surfaces has been carried out in [Ca00]. They are of general type if and only if both g(C)
and g(F ) are greater than or equal to 2 and in this case S admits a unique minimal realization
where they are as small as possible. From now on, we tacitly assume that such a realization is
chosen, so that the genera of the curves and the group G are invariants of S. The action of G
can be seen to respect the product structure on C × F . This means that such actions fall in
two cases: the mixed one, where there exists some element in G exchanging the two factors (in
this situation C and F must be isomorphic) and the unmixed one, where G acts faithfully on
both C and F and diagonally on their product.
After [Be96], examples of surfaces isogenous to a product with pg = q = 0 appeared in [Par03]
and [BaCa03], and their complete classification was obtained in [BaCaGr06].
The next natural step is therefore the analysis of the case pg = q = 1. Surfaces of general type
with these invariants are the irregular ones with the lowest geometric genus and for this reason
it would be important to provide their complete description. So far, this has been obtained only
in the cases K2S = 2, 3 ([Ca81], [CaCi91], [CaCi93], [Pol05], [CaPi06]).
The goal of the present paper is to give the full list of surfaces with pg = q = 1 that are isoge-
nous to a product. Our work has to be seen as the sequel to the article [Pol07], which describe
Date: November 4, 2018.
2000 Mathematics Subject Classification. 14J29 (primary), 14L30, 14Q99, 20F05.
Key words and phrases. Surfaces of general type, isotrivial fibrations, actions of finite groups.
http://arxiv.org/abs/0704.0446v2
all unmixed cases with G abelian and some unmixed examples with G nonabelian. Apart from
the complete list of the genera and groups occurring, our paper contains the first examples of
surfaces of mixed type with q = 1. The mixed cases turn out to be much less frequent than
the unmixed ones and, as when pg = q = 0, they occur for only one value of the order of G.
However, in contrast with what happens when pg = q = 0, the mixed cases do not correspond
to the maximum value of |G| but appear for a rather small order, namely |G| = 16.
Our classification procedure involves arguments from both geometry and computational group
theory. We will give here a brief account on how the result is achieved.
If S is any surface isogenous to a product and satisfying pg = q then |G|, g(C), g(F ) are related
as in Beauville’s example and we have K2S = 8. Besides, if pg = q = 1 such surfaces are neces-
sarily minimal and of general type (Lemma 2.1).
If S = (C×F )/G is of unmixed type then the two projections πC : C×F −→ C, πF : C×F −→ F
induce two morphisms α : S −→ C/G, β : S −→ F/G, whose smooth fibres are isomorphic to F
and C, respectively. Moreover, the geometry of S is encoded in the geometry of the two cover-
ings h : C −→ C/G, f : F −→ F/G and the invariants of S impose strong restrictions on g(C),
g(F ) and |G|. Indeed we have 1 = q(S) = g(C/G) + g(F/G) so we may assume that E := C/G
is an elliptic curve and F/G ∼= P1. Then α : S −→ E is the Albanese morphism of S and the
genus galb of the general Albanese fibre equals g(F ). It is proven in [Pol07, Proposition 2.3] that
3 ≤ g(F ) ≤ 5; in particular this allows us to control |G|. The covers f and h are determined by
two suitable systems of generators for G, that we call V and W, respectively. Besides, in order
to obtain a free action of G on C×F and a quotient S with the desired invariants, V and W are
subject to strict conditions of combinatorial nature (Proposition 2.2). The geometry imposes
also strong restrictions on the possible W and the genus of C, so the existence of V and W and
the compatibility conditions can be verified through a computer search. It is worth mentioning
that the classification of finite groups of automorphisms acting on curves of genus lesser than or
equal to 5 could have also been retrieved from the existing literature ([Br90], [Ki03], [KuKi90],
[KuKu90]).
If S = (C × C)/G is of mixed type then the index two subgroup G◦ of G corresponding to
transformations that do not exchange the coordinates in C ×C acts faithfully on C. The quo-
tient E = C/G◦ is isomorphic to the Albanese variety of S and galb = g(C) (Proposition 2.5).
Moreover g(C) may only be 5, 7 or 9, hence |G| is at most 64 (Proposition 2.10). The cover
h : C −→ E is determined by a suitable system of generators V for G◦ and since the action of
G on C × C is required to be free, combinatorial restrictions involving the elements of V and
those of G \ G◦ have to be imposed (Proposition 2.6). Our classification is obtained by first
listing those groups G◦ for which V exists and then by looking at the admissible extensions G
of G◦. We find that the only possibility occurring is for g(C) = 5 so that |G| is necessarily 16
(Propositions 4.1, 4.2, 4.3).
In the last part of the paper we examine the structure of the subset of the moduli space
corresponding to surfaces isogenous to a product with pg = q = 1. It can be explicitly described
by calculating the number of orbits of the direct product of certain mapping class groups with
Aut(G) acting on the set (of pairs) of systems of generators (Proposition 5.1). In particular it
is possible to determine the number of irreducible connected components and their respective
dimensions, see the forthcoming article [Pe08].
Our computations were carried out by using the computer algebra program GAP4, whose data-
base includes all groups of order less than 2000, with the exception of 1024 (see [GAP4]). For
the reader’s convenience we included the scripts in the Appendix.
Now let us state the main result of this paper.
Main Theorem. Let S = (C × F )/G be a surface with pg = q = 1, isogenous to a product of
curves. Then S is minimal of general type and the occurrences for g(F ), g(C), G, the dimension
D of the moduli space and the number N of its connected components are precisely those in the
table below.
IdSmall
g(F ) = galb g(C) G Group(G) Type D N
3 3 (Z2)
2 G(4, 2) unmixed (∗) 5 1
3 5 (Z2)
3 G(8, 5) unmixed (∗) 4 1
3 5 Z2 × Z4 G(8, 2) unmixed (∗) 3 2
3 9 Z2 × Z8 G(16, 5) unmixed (∗) 2 1
3 5 D4 G(8, 3) unmixed 3 1
3 7 D6 G(12, 4) unmixed (∗∗) 3 1
3 9 Z2 ×D4 G(16, 11) unmixed 3 1
3 13 D2,12,5 G(24, 5) unmixed 2 1
3 13 Z2 ×A4 G(24, 13) unmixed 2 1
3 13 S4 G(24, 12) unmixed 2 1
3 17 Z2 ⋉ (Z2 × Z8) G(32, 9) unmixed 2 1
3 25 Z2 × S4 G(48, 48) unmixed 2 1
4 3 S3 G(6, 1) unmixed (∗∗) 4 1
4 5 D6 G(12, 4) unmixed 3 1
4 7 Z3 × S3 G(18, 3) unmixed 2 2
4 7 Z3 × S3 G(18, 3) unmixed 1 1
4 9 S4 G(24, 12) unmixed (∗∗) 2 1
4 13 S3 × S3 G(36, 10) unmixed 1 1
4 13 Z6 × S3 G(36, 12) unmixed 1 1
4 13 Z4 ⋉ (Z3)
2 G(36, 9) unmixed 1 2
4 21 A5 G(60, 5) unmixed (∗∗) 1 1
4 25 Z3 × S4 G(72, 42) unmixed 1 1
4 41 S5 G(120, 34) unmixed 1 1
5 3 D4 G(8, 3) unmixed (∗∗) 4 1
5 4 A4 G(12, 3) unmixed (∗∗) 2 2
5 5 Z4 ⋉ (Z2)
2 G(16, 3) unmixed 2 3
5 7 Z2 ×A4 G(24, 13) unmixed 2 2
5 7 Z2 ×A4 G(24, 13) unmixed 1 1
5 9 Z8 ⋉ (Z2)
2 G(32, 5) unmixed 1 1
5 9 Z2 ⋉D2,8,5 G(32, 7) unmixed 1 1
5 9 Z4 ⋉ (Z4 × Z2) G(32, 2) unmixed 1 1
5 9 Z4 ⋉ (Z2)
3 G(32, 6) unmixed 1 1
5 13 (Z2)
2 ×A4 G(48, 49) unmixed 1 1
5 17 Z4 ⋉ (Z2)
4 G(64, 32) unmixed 1 2
5 21 Z5 ⋉ (Z2)
4 G(80, 49) unmixed 1 2
5 5 D2,8,3 G(16, 8) mixed 2 1
5 5 D2,8,5 G(16, 6) mixed 2 3
5 5 Z4 ⋉ (Z2)
2 G(16, 3) mixed 2 1
Here IdSmallGroup(G) denotes the label of the group G in the GAP4 database of small groups.
The calculation of N is due to Penegini and Rollenske, see [Pe08], except for the cases marked
with (∗), which were already studied in [Pol07]. The cases marked with (∗∗) also appeared in
[Pol07], but the computation of N was missing.
This work is organized as follows.
In Section 1 we collect the basic facts about surfaces isogenous to a product, following the
treatment given by Catanese in [Ca00] and we fix the algebraic setup.
In Section 2 we apply the structure theorems of Catanese to the case pg = q = 1 and this leads
to Propositions 2.2 and 2.6, that provide the translation of our classification problem from ge-
ometry to algebra. All these results are used in Sections 3 and 4, which are the core of the
paper and give the complete lists of the occurring groups and genera in the unmixed and mixed
cases, respectively.
Finally, Section 5 is devoted to the description of the moduli spaces.
Notations and conventions. All varieties, morphisms, etc. in this article are defined over
C. By “surface” we mean a projective, non-singular surface S, and for such a surface KS
denotes the canonical class, pg(S) = h
0(S, KS) is the geometric genus, q(S) = h
1(S, KS) is the
irregularity and χ(OS) = 1 − q(S) + pg(S) is the Euler characteristic. Throughout the paper
we use the following notation for groups:
• Zn: cyclic group of order n.
• Dp,q,r = Zp ⋉Zq = 〈x, y | x
p = yq = 1, xyx−1 = yr〉: split metacyclic group of order pq.
The group D2,n,−1 is the dihedral group of order 2n and it will be denoted by Dn.
• Sn, An: symmetric, alternating group on n symbols.
• If x, y ∈ G, their commutator is defined as [x, y] = xyx−1y−1.
• If x ∈ G we denote by Intx the inner automorphism of G defined as Intx(g) = xgx
• IdSmallGroup(G) indicates the label of the group G in the GAP4 database of small
groups. For instance IdSmallGroup(D4) = G(8, 3) and this means that D4 is the third
in the list of groups of order 8.
Acknowledgements. The authors wish to thank M. Penegini and S. Rollenske for giving
them a preliminary version of [Pe08] and for kindly allowing them to include their results in
the Main Theorem. Moreover they are indebted with the referee for several valuable comments
and suggestions to improve this article.
1. Basic on surfaces isogenous to a product
In this section we collect for the reader’s convenience some basic results on groups acting on
curves and surfaces isogenous to a product, referring to [Ca00] for further details.
Definition 1.1. A complex surface S of general type is said to be isogenous to a product if there
exist two smooth curves C, F and a finite group G acting freely on C×F so that S = (C×F )/G.
There are two cases: the unmixed one, where G acts diagonally, and the mixed one, where
there exist elements of G exchanging the two factors (and then C, F are isomorphic).
In both cases, since the action of G on C × F is free, we have
K2S =
K2C×F
8(g(C) − 1)(g(F ) − 1)
χ(OS) =
χ(OC×F )
(g(C)− 1)(g(F ) − 1)
hence K2S = 8χ(OS).
Let C, F be curves of genus ≥ 2. Then the inclusion Aut(C × F ) ⊃ Aut(C) × Aut(F ) is an
equality if C and F are not isomorphic, whereas Aut(C × C) = Z2 ⋉ (Aut(C) × Aut(C)), the
Z2 being generated by the involution exchanging the two coordinates. If S = (C × F )/G is
a surface isogenous to a product, we will always consider its unique minimal realization. This
means that
• in the unmixed case, we have G ⊂ Aut(C) and G ⊂ Aut(F ) (i.e. G acts faithfully on
both C and F );
• in the mixed case, where C ∼= F , we haveG◦ ⊂ Aut(C), for G◦ := G∩(Aut(C)×Aut(C)).
(See [Ca00, Corollary 3.9 and Remark 3.10]).
Definition 1.2. Let G be a finite group and let g′ ≥ 0, and mr ≥ mr−1 ≥ . . . ≥ m1 ≥ 2 be
integers. A generating vector for G of type (g′ | m1, . . . ,mr) is a (2g
′ + r)-ple of elements
V = {g1, . . . , gr; h1, . . . , h2g′}
such that: the set V generates G; |gi| = mi and g1g2 · · · grΠ
i=1[hi, hi+g′ ] = 1. If such a V exists,
then G is said to be (g′ | m1, . . . ,mr)-generated.
For convenience we make abbreviations such as (4 | 23, 32) for (4 | 2, 2, 2, 3, 3) when we write
down the type of the generating vector V.
By Riemann’s existence theorem a finite group G acts as a group of automorphisms of some
compact Riemann surface X of genus g with quotient a Riemann surface Y of genus g′ if and
only if there exist integers mr ≥ mr−1 ≥ . . . ≥ m1 ≥ 2 such that G is (g
′ | m1, . . . ,mr)-
generated and g, g′, |G| and the mi are related by the Riemann-Hurwitz formula. Moreover,
if V = {g1, . . . , gr; h1, . . . , h2g′} is a generating vector for G, the subgroups 〈gi〉 and their
conjugates are precisely the nontrivial stabilizers of the G-action ([Br90, Section 2], [Bre00,
Chapter 3], [H71]). The description of surfaces isogenous to a product can be therefore reduced
to finding suitable generating vectors. Requiring that S has given invariants pg and q imposes
numerical restrictions on the order of the group G and the genus of the curves C and F . Our
goal is to classify all surfaces with pg = q = 1 isogenous to a product. The aim of the next
section is to translate this classification problem from geometry to algebra.
2. The case pg = q = 1. Building data
Lemma 2.1. Let S = (C × F )/G be a surface isogenous to a product with pg = q = 1. Then
(i) K2S = 8.
(ii) |G| = (g(C)− 1)(g(F ) − 1).
(iii) S is a minimal surface of general type.
Proof. Claims (i) and (ii) follow from (1). Now let us consider (iii). Since C×F is minimal and
the cover C×F −→ S is étale, S is minimal as well. Moreover (ii) implies either g(C) = g(F ) = 0
or g(C) ≥ 2, g(F ) ≥ 2. The first case is impossible otherwise S = P1 × P1 and pg = q = 0; thus
the second case occurs, hence S is of general type. �
2.1. Unmixed case. If S = (C ×F )/G is a surface with pg = q = 1, isogenous to an unmixed
product, then g(C) ≥ 3, g(F ) ≥ 3 and up to exchanging F and C one may assume F/G ∼= P1
and C/G ∼= E, where E is an elliptic curve. Moreover α : S −→ C/G is the Albanese morphism
of S and galb = g(F ), see [Pol07, Proposition 2.2]. This leads to
Proposition 2.2. ([Pol07, Proposition 3.1]) Let G be a finite group which is both (0 | m1, . . . ,mr)
and (1 | n1, . . . , ns)-generated, with generating vectors V = {g1, . . . , gr} and W = {ℓ1, . . . , ℓs; h1, h2},
respectively. Let g(F ), g(C) be the positive integers defined by the Riemann-Hurwitz relations
2g(F ) − 2 = |G|
− 2 +
2g(C)− 2 = |G|
Assume moreover that g(C) ≥ 3, g(F ) ≥ 3, |G| = (g(C) − 1)(g(F ) − 1) and
〈σgiσ
〈σℓjσ
 = {1G}.
Then there is a free, diagonal action of G on C × F such that the quotient S = (C × F )/G is
a minimal surface of general type with pg = q = 1, K
S = 8. Conversely, every surface with
pg = q = 1, isogenous to an unmixed product, arises in this way.
Here, condition (U) ensures that the G-action on C × F is free.
Set m := (m1, . . . ,mr) and n := (n1, . . . , ns); if S = (C × F )/G is a surface with pg = q = 1
which is constructed by using the recipe in Proposition 2.2, it will be called an unmixed surface
of type (G, m, n).
Proposition 2.3. ([Pol07, Proposition 2.3]) Let S = (C ×F )/G be an unmixed surface of type
(G, m, n). Then there are exactly the following possibilities:
(a) g(F ) = 3, n = (22)
(b) g(F ) = 4, n = (3)
(c) g(F ) = 5, n = (2).
The following lemma gives a restriction on m instead.
Lemma 2.4. Let S = (C × F )/G be an unmixed surface of type (G, m, n). Then every mi
divides
(g(F )−1)
Proof. Since 〈gi〉 is a stabilizer for the G-action on F and since G acts freely on (C × F ),
the subgroup 〈gi〉 ∼= Zmi acts freely on C. By Riemann-Hurwitz formula applied to the cover
C −→ C/〈gi〉 we have g(C)− 1 = mi(g(C/〈gi〉)− 1). Thus mi divides g(C)− 1 =
(g(F )−1)
2.2. Mixed case.
Proposition 2.5. Let S = (C × C)/G be a surface with pg = q = 1 isogenous to a mixed
product. Then E := C/G◦ is an elliptic curve isomorphic to the Albanese variety of S.
Proof. We have (see [Ca00, Proposition 3.15])
C = H0(Ω1S) = (H
0(Ω1C)⊕H
0(Ω1C))
G = (H0(Ω1C)
G◦ ⊕H0(Ω1C)
G◦)G/G
= (H0(Ω1E)⊕H
0(Ω1E))
G/G◦ .
Since S is of mixed type, the quotient Z2 = G/G
◦ exchanges the last two summands, whence
h0(Ω1E) = 1. Thus E is an elliptic curve and there is a commutative diagram
(3) C × C
E × E
K E(2)
showing that the Albanese morphism α of S factors through the Abel-Jacobi map α̂ of the
double symmetric product E(2) of E. �
By Lemma 2.1 we have |G| = (g(C)− 1)2. In this case [Ca00, Proposition 3.16] becomes
Proposition 2.6. Assume that G◦ is a (1 | n1, . . . , ns)-generated finite group with generating
vector V = {ℓ1, . . . , ℓs; h1, h2} and that there is a nonsplit extension
(4) 1 −→ G◦ −→ G −→ Z2 −→ 1
which gives an involution [ϕ] in Out(G◦). Let g(C) ∈ N be defined by the Riemann-Hurwitz
relation 2g(C)−2 = |G◦|
. Assume, in addition, that |G| = (g(C)−1)2 and that
(M1) for all g ∈ G \G◦ we have
{ℓ1, . . . , ℓs} ∩ {gℓ1g
−1, . . . , gℓsg
−1} = ∅;
(M2) for all g ∈ G \G◦ we have
g2 /∈
〈σℓjσ
Then there is a free, mixed action of G on C × C such that the quotient S = (C × C)/G is a
minimal surface of general type with pg = q = 1, K
S = 8.
Conversely, every surface S with pg = q = 1, isogenous to a mixed product, arises in this way.
Here, conditions (M1) and (M2) ensure that the G-action on C × C is free.
Remark 2.7. The surface S is not covered by elliptic curves because it is of general type
(Lemma 2.1), so the map C −→ C/G◦ = E is ramified. Therefore condition (M1) implies that
G is not abelian.
Remark 2.8. The exact sequence (4) is non split if and only if the number of elements of order
2 in G equals the number of elements of order 2 in G◦.
Proposition 2.9. Let S = (C × C)/G be a surface with pg = q = 1, isogenous to a mixed
product. Then galb = g(C).
Proof. Let us look at diagram (3). The Abel-Jacobi map α̂ gives to E(2) the structure of a
1-bundle over E ([CaCi93]); let f be the generic fibre of this bundle and F ∗ := ρ∗ε∗(f). If
Falb is the generic Albanese fibre of S we have Falb = π(F
∗). Let n = (n1, . . . , ns) be such
that G◦ is (1 |n1, . . . ns)-generated and 2g(C)− 2 = |G
. The (G◦ ×G◦)-cover
ρ is branched exactly along the union of s “horizontal” copies of E and s “vertical” copies of
E; moreover for each i there are one horizontal copy and one vertical copy whose branching
number is ni. Since ε
∗(f) is an elliptic curve that intersects all these copies of E transversally
in one point, by Riemann-Hurwitz formula applied to F ∗ −→ ε∗(f) we obtain
2g(F ∗)− 2 = |G◦|2 ·
On the other hand the G-cover π is étale, so we have
2g(Falb)− 2 =
(2g(F ∗)− 2) = |G◦|
= 2g(C)− 2,
whence galb = g(C). �
If S = (C × C)/G is a surface with pg = q = 1 which is constructed by using the recipe of
Proposition 2.6, it will be called a mixed surface of type (G, n). The analogue of Proposition
2.3 in the mixed case is
Proposition 2.10. Let S = (C × C)/G be a mixed surface of type (G, n). Then there are at
most the following possibilities:
• g(C) = 5, n = (22), |G| = 16;
• g(C) = 7, n = (3), |G| = 36;
• g(C) = 9, n = (2), |G| = 64.
Proof. By Proposition 2.6 we have 2g(C) − 2 = |G◦|
and |G◦| = 1
(g(C) − 1)2,
so g(C) must be odd and we obtain 4 = (g(C)− 1)
. Therefore 4 ≥ 1
(g(C)− 1)
and the only possibilities are g(C) = 3, 5, 7, 9.
The case g(C) = 3 is ruled out because G cannot be abelian by Remark 2.7.
If g(C) = 5 then
= 1, so n = (22) and |G| = 16.
If g(C) = 7 then
, so n = (3) and |G| = 36.
If g(C) = 9 then
, so n = (2) and |G| = 64. �
We will see in Section 2.10 that only the case g(C) = 5 actually occurs.
3. The unmixed case
The classification of surfaces of general type with pg = q = 1 isogenous to an unmixed prod-
uct is carried out in [Pol07] when the group G is abelian. Therefore in this section we assume
that G is nonabelian.
Following [BaCaGr06, Section 1.2], for an r-ple m = (m1, . . . ,mr) ∈ N
r we set
Θ(m) := −2 +
, α(m) :=
If S is an unmixed surface of type (G, m, n) then we necessarily have 2 ≤ m1 ≤ . . . ≤ mr and
Θ(m) > 0. Besides, by Proposition 2.2 we have α(m) =
g(F )−1
= g(C)− 1 ∈ N and by Lemma
2.4 each integer mi divides α(m). Then we get
Proposition 3.1. Let S = (C × F )/G be a surface with pg = q = 1 isogenous to an unmixed
product of type (G, m, n). Then the possibilities for m and α(m), written in the format mα(m),
lie in the set T below:
(2, 3, 7)84 , (2, 3, 8)48, (2, 4, 5)40, (2, 3, 9)36 , (2, 3, 10)30 , (2, 3, 12)24 ,
(2, 4, 6)24 , (3
2, 4)24, (2, 5
2)20, (2, 3, 18)18 , (2, 4, 8)16, (3
2, 5)15,
(2, 4, 12)12 , (2, 6
2)12, (3
2, 6)12, (3, 4
2)12, (2, 5, 10)10 , (3
2, 9)9,
(2, 82)8, (4
3)8, (3, 6
2)6, (5
3)5, (2
3, 3)12, (2
3, 4)8,
(23, 6)6, (2
2, 32)6, (2
2, 42)4, (3
4)3, (2
5)4, (2
Proof. This follows combining [BaCaGr06, Proposition 1.4] with Lemma 2.4. �
By abuse of notation, we write m ∈ T instead of mα(m) ∈ T.
Now we analyze the three cases in Proposition 2.3 separately, according to the value of g(F ).
Note that if g(F ) = 3, 4, 5 then |Aut(F )| ≤ 168, 120, 192, respectively ([Bre00, p. 91]).
Proposition 3.2. If g(F ) = 3 we have precisely the following possibilities.
IdSmall
G Group(G) m
D4 G(8, 3) (2
2, 42)
D6 G(12, 4) (2
3, 6)
Z2 ×D4 G(16, 11) (2
3, 4)
D2,12,5 G(24, 5) (2, 4, 12)
Z2 ×A4 G(24, 13) (2, 6
S4 G(24, 12) (3, 4
Z2 ⋉ (Z2 × Z8) G(32, 9) (2, 4, 8)
Z2 × S4 G(48, 48) (2, 4, 6)
Proof. Since n = (22) it follows that G is (1 | 22)-generated and by the second relation in (2)
we have |G| = 2(g(C) − 1). So we must describe all unmixed surfaces of type (G,m,n) with
m ∈ T, n = (22) and |G| = 2α(m). By a computer search through the r-tuples in Proposition
3.1 we can therefore list all possibilities, proving our statement. See the GAP4 script 1 in the
Appendix to see how this procedure applies to an explicit example.
Proposition 3.3. If g(F ) = 4 we have precisely the following possibilities.
IdSmall
G Group(G) m
S3 G(6, 1) (2
D6 G(12, 4) (2
Z3 × S3 G(18, 3) (2
2, 32)
Z3 × S3 G(18, 3) (3, 6
S4 G(24, 12) (2
3, 4)
S3 × S3 G(36, 10) (2, 6
Z6 × S3 G(36, 12) (2, 6
Z4 ⋉ (Z3)
2 G(36, 9) (3, 42)
A5 G(60, 5) (2, 5
Z3 × S4 G(72, 42) (2, 3, 12)
S5 G(120, 34) (2, 4, 5)
Proof. Since n = (3) it follows that G is (1 | 3)-generated and by the second relation in (2)
we have |G| = 3(g(C) − 1). Therefore our statement can be proven searching by computer
calculation all unmixed surfaces of type (G,m,n) with m ∈ T, n = (3), |G| = 3α(m) and
α(m) ≤ 40. �
Proposition 3.4. If g(F ) = 5 we have precisely the following possibilities.
IdSmall
G Group(G) m
D4 G(8, 3) (2
A4 G(12, 3) (3
Z4 ⋉ (Z2)
2 G(16, 3) (22, 42)
Z2 ×A4 G(24, 13) (2
2, 32)
Z2 ×A4 G(24, 13) (3, 6
Z8 ⋉ (Z2)
2 G(32, 5) (2, 82)
Z2 ⋉D2,8,5 G(32, 7) (2, 8
Z4 ⋉ (Z4 × Z2) G(32, 2) (4
Z4 ⋉ (Z2)
3 G(32, 6) (43)
2 ×A4 G(48, 49) (2, 6
Z4 ⋉ (Z2)
4 G(64, 32) (2, 4, 8)
Z5 ⋉ (Z2)
4 G(80, 49) (2, 52)
Proof. Since n = (2), it follows that G is (1 | 2)-generated and by the second relation in (2)
we have |G| = 4(g(C) − 1). Therefore our statement can be proven searching by computer
calculation all unmixed surfaces of type (G,m,n) with m ∈ T, n = (2), |G| = 4α(m) and
α(m) ≤ 48. �
4. The mixed case
In this section we use Proposition 2.6 in order to classify the surfaces with pg = q = 1
isogenous to a mixed product. By Proposition 2.10 we have g(C) = 5, 7 or 9. Let us consider
the three cases separately.
4.1. The case g(C) = 5, |G| = 16.
Proposition 4.1. If g(C) = 5, |G| = 16 we have precisely the following possibilities.
IdSmall IdSmall
G◦ Group(G◦) G Group(G)
D4 G(8, 3) D2,8,3 G(16, 8)
Z2 × Z4 G(8, 2) D2,8,5 G(16, 6)
3 G(8, 5) Z4 ⋉ (Z2)
2 G(16, 3)
Proof. In this case n = (22), so our first task is to find all nonsplit sequences of type (4) for
which G◦ is a (1 | 22)-generated group of order 8. The three abelian groups of order 8 and D4
are (1 | 22)-generated whereas the quaternion group Q8 is not.
Since Z8 has only one element ℓ of order 2, condition (M1) in Proposition 2.6 cannot be satisfied
for any choice of V. By Remark 2.7 we are left to analyze the possible embeddings of Z2 × Z4,
D4 and (Z2)
3 in nonabelian groups of order 16. The groups Z2 × Z4, D4 and (Z2)
3 have 3, 5
and 7 elements of order 2, respectively. Therefore if n2 denotes the number of elements of order
2 in G, by Remark 2.8 we must consider only those groups G of order 16 with n2 ∈ {3, 5, 7}.
The nonabelian groups of order 16 with n2 = 3 are D2,8,5, Z2 × Q8 and D4,4,−1 and they all
contain a copy of Z2 × Z4. The only nonabelian group of order 16 with n2 = 5 is D2,8,3 and
it contains a subgroup isomorphic to D4. The nonabelian groups of order 16 with n2 = 7 are
Z4 ⋉ (Z2)
2 = G(16, 3) and Z2 ⋉ Q8, and only the former contains a subgroup isomorphic to
3 (cfr. [Wi05]).
Summarizing, we are left with the following cases:
D4 D2,8,3
Z2 × Z4 D2,8,5
Z2 × Z4 Z2 ×Q8
Z2 × Z4 D4,4,−1
Z4 ⋉ (Z2)
Let us analyze them separately.
• G◦ = D4, G = D2,8,3 = 〈x, y | x
2 = y8 = 1, xyx−1 = y3 〉
We consider the subgroup G◦ := 〈x, y2〉 ∼= D4. Set ℓ1 = ℓ2 = x and h1 = h2 = y
2. Condition
(M1) holds because CG(x) = 〈x, y
4〉 ⊂ G◦. Condition (M2) is satisfied because the conjugacy
class of x in G◦ is contained in the coset x〈y2〉 while for every g ∈ yG◦ we have g2 ∈ 〈y〉.
Therefore this case occurs by Proposition 2.6.
• G◦ = Z2 × Z4, G = D2,8,5 = 〈x, y | x
2 = y8 = 1, xyx−1 = y5〉
We consider the subgroup G◦ := 〈x, y2〉 ∼= Z2 × Z4. Set ℓ1 = ℓ2 = x and h1 = h2 = y
Conditions (M1) and (M2) are verified as in the previous case, so this possibility occurs.
• G◦ = Z2 × Z4, G = Z2 ×Q8 and G
◦ = Z2 × Z4, G = D4,4,−1.
All elements of order 2 in G are central so condition (M1) cannot be satisfied and these cases
do not occur.
• G◦ = (Z2)
3, G = Z4 ⋉ (Z2)
2 = 〈x, y, z | x4 = y2 = z2 = 1, xyx−1 = yz, [x, z] = [y, z] = 1〉
We consider the subgroup G◦ := 〈y, z, x2〉 ∼= (Z2)
3. Set ℓ1 = ℓ2 = y and h1 = z, h2 = x
Condition (M1) holds because G◦ is abelian and [x, y] 6= 1. Condition (M2) is satisfied because
if g ∈ xG◦ then g2 ∈ 〈z, x2〉. Therefore this case occurs. �
4.2. The case g(C) = 7, |G| = 36.
Proposition 4.2. The case g(C) = 7, |G| = 36 does not occur.
Proof. In this case n = (3), so G◦ is a group of order 18 which is (1 | 3)-generated. There are
five groups of order 18 up to isomorphism. By computer search or direct calculation we see that
the only one which is (1 | 3)-generated is Z3 × S3 = G(18, 3). Thus G would fit into a short
exact sequence
(5) 1 −→ Z3 × S3 −→ G −→ Z2 −→ 1.
A computer search shows that the only groups of order 36 containing a subgroup isomorphic to
Z3 × S3 are G(36, 10) = S3 × S3 and G(36, 12) = Z6 × S3 (see GAP4 script 2 in the Appendix).
They contain 15 and 7 elements of order 2, respectively. On the other hand Z3 × S3 contains 3
elements of order 2, so by Remark 2.8 all possible extensions of the form (5) are split and this
case cannot occur. �
4.3. The case g(C) = 9, |G| = 64.
Proposition 4.3. The case g(C) = 9, |G| = 64 does not occur.
The proof will be the consequence of the results below. First notice that, since n = (2), the
group G◦ must be (1 | 2)-generated.
Computational Fact 4.4. There exist precisely 8 groups of order 32 which are (1 | 2)-
generated, namely G(32, t) for t ∈ {2, 4, 5, 6, 7, 8, 12, 17}. The number n2 of their elements
of order 2 is given in the table below:
t 2 4 5 6 7 8 12 17
n2(G(32, t)) 7 3 7 11 11 3 3 3
Proof. Slightly modifying the first part of GAP4 script 1 in the Appendix we easily find that
the groups of order 32 which are (1 | 2)-generated are exactly those in the statement. The
number of elements of order 2 in each case are found by a quick computer search: see again the
Appendix, GAP4 script 3. �
Computational Fact 4.5. Let t ∈ {2, 4, 5, 6, 7, 8, 12, 17}. A nonsplit extension of the form
(6) 1 −→ G(32, t) −→ G(64, s) −→ Z2 −→ 1
exists if and only if the pair (t, s) is one of the following:
(2, 9), (2, 57), (2, 59), (2, 63), (2, 64), (2, 68), (2, 70), (2, 72), (2, 76), (2, 79), (2, 81), (2, 82),
(4, 11), (4, 28), (4, 122), (4, 127), (4, 172), (4, 182),
(5, 5), (5, 9), (5, 112), (5, 113), (5, 114), (5, 132), (5, 164), (5, 165), (5, 166),
(6, 33), (6, 35),
(7, 33),
(8, 37),
(12, 7), (12, 13), (12, 14), (12, 15), (12, 16), (12, 126), (12, 127), (12, 143), (12, 156),
(12, 158), (12, 160),
(17, 28), (17, 43), (17, 45), (17, 46).
Proof. Assume t = 2. Using the GAP4 script 4 in the Appendix we find that the groups of order 64
containing a subgroup isomorphic toG(32, 2) areG(64, s) for s ∈ {8, 9, 56, 57, 58, 59, 61, 62, 63,
64, 66, 67, 68, 69, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82}. By Remark 2.8 and Computa-
tional Fact 4.4, in order to detect all the groups G(64, s) fitting in some nonsplit extension of
type (6) with t = 2, it is sufficient to select from the previous list the groups containing exactly
n2 = 7 elements of order 2. This can be done with the GAP4 script 5 in the Appendix, proving
the claim in the case t = 2. The proof for the other values of t may be carried out exactly in
the same way. �
Let us denote by [G, G]2 and [G
◦, G◦]2 the subsets of elements of order 2 in [G, G] and
[G◦, G◦], respectively.
Lemma 4.6. Assume g(C) = 9 and that one of the following situations occur:
• [G,G]2 ⊆ Z(G);
• there exists some element y ∈ G \G◦ commuting with all elements in [G◦, G◦]2.
Then given any generating vector V = {ℓ1; h1, h2} of type (1 | 2) for G
◦, condition (M1) in
Proposition 2.6 cannot be satisfied.
Proof. Since ℓ1 ∈ [G
◦, G◦]2 ⊆ [G, G]2, in any of the above situations CG(ℓ1) is not contained in
G◦, so (M1) cannot hold. �
Computational Fact 4.7. Let G = G(64, s) be one of the groups appearing in the list of
Computational Fact 4.5. Then [G,G]2 is not contained in Z(G) if and only if s = 5, 33, 35, 37.
Proof. See the GAP4 script 6 in the Appendix. �
Computational facts 4.5, 4.7 and Lemma 4.6 imply that we only need to analyze the following
pairs (G◦, G):
G(32, 5) G(64, 5)
G(32, 6) G(64, 33)
G(32, 7) G(64, 33)
G(32, 6) G(64, 35)
G(32, 8) G(64, 37)
Proposition 4.8. The case G◦ = G(32, 5) does not occur.
Proof. A presentation for the group G◦ is
G◦ = 〈x, y, z | x8 = y2 = z2 = 1, [y, z] = [x, z] = 1, [x, y] = z〉.
Its derived subgroup contains exactly one element of order 2, namely z. It follows that if
{ℓ1; h1, h2} is any generating vector of type (1 | 2) for G
◦, then ℓ1 = z. Since [G
◦, G◦] is
characteristic in G◦, condition (M1) cannot be satisfied for any embedding of G◦ into G. �
By using the two instructions P:=PresentationViaCosetTable(G) and TzPrintRelators(P)
and setting in the output
x := f1, y := f2, z := f3, w := f4, v := f5, u := f6
one obtains the following presentations for G(64, 33), G(64, 35) and G(64, 37).
G(64, 33) = 〈x, y, z, w, v, u | z2 = w2 = v2 = u2 = 1, x2 = w, y2 = u,
[x, zy] = z, [x, vz] = v, [x, vu] = u,
[y, z] = [y, v] = [z, v] = [w, v] = [x, u] = 1〉
G(64, 35) = 〈x, y, z, w, v, u | w2 = v2 = u2 = 1, z2 = y2 = u, x2 = w,
[y, z] = [z, w] = u, [x, yz] = z, [x, z] = uv,
[y, v] = [z, v] = [w, v] = [x, u] = 1〉
G(64, 37) = 〈x, y, z, w, v, u | v2 = u2 = 1, w2 = z2 = y2 = u, x2 = w,
[y, z] = [z, w] = u, [x, yz] = z, [x, z] = uv,
[y, v] = [z, v] = [w, v] = 1〉.
Computational Fact 4.9. Referring to presentations (7), (8) and (9), we have the following
facts.
• The group G(64, 33) contains exactly one subgroup N1 isomorphic to G(32, 6) and one
subgroup N2 isomorphic to G(32, 7), namely
N1 := 〈x, z, w, v, u〉, N2 := 〈xy, z, w, v, u〉.
• The group G(64, 35) contains exactly two subgroups N3, N4 isomorphic to G(32, 6),
namely
N3 := 〈x, z, w, v, u〉, N4 := 〈xy, z, w, v, u〉.
• The group G(64, 37) contains exactly two subgroups N5, N6 isomorphic to G(32, 8),
namely
N5 := 〈x, z, w, v, u〉, N6 := 〈xy, z, w, v, u〉.
In addition, for every i ∈ {1, . . . , 6} we have
(a) [Ni, Ni] = 〈v, u〉 ∼= Z2 × Z2.
(b) y /∈ Ni and y commutes with all elements in [Ni, Ni].
Proof. See the GAP4 script 7 in the Appendix. �
Proposition 4.10. The cases G◦ = G(32, 6), G(32, 7), G(32, 8) do not occur.
Proof. By Lemma 4.6 and Computational Fact 4.9 it follows that, given any nonsplit extension
of type (6) with G◦ as above, condition (M1) in Proposition 2.6 cannot be satisfied. �
Summing up, we finally obtain
Proof of Proposition 4.3. It follows from Propositions 4.8 and 4.10.
5. Moduli spaces
Let Ma,b be the moduli space of smooth minimal surfaces of general type with χ(OS) =
a, K2S = b; by an important result of Gieseker, Ma,b is a quasiprojective variety for all a, b ∈ N
(see [Gie77]). Obviously, our surfaces are contained in M1,8 and we want to describe their locus
there. We denote by M(G,m,n) the moduli space of unmixed surfaces of type (G,m,n) and by
M(G,n) the moduli space of mixed surfaces of type (G,n). We know that n = (22), (3) or (2) in
the unmixed case, whereas n = (22) in the mixed one. By a general result of Catanese ([Ca00]),
both M(G,m,n) and M(G,n) consist of finitely many irreducible connected components of
M1,8, all of the same dimension. More precisely, we have
dim M(G,m,n) = r + s− 3, dim M(G,n) = s.
Consider the mapping class groups in genus zero and one:
Mod0,[r] := 〈σ1, . . . , σr | σiσi+1σi = σi+1σiσi+1,
σiσj = σjσi if |i− j| ≥ 2,
σr−1σr−2 · · · σ
1 · · · σr−2σr−1 = 1〉,
Mod1,1 := 〈tα, tβ , tγ | tαtβtα = tβtαtβ, (tαtβ)
3 = 1〉,
Mod1,[2] := 〈tα, tβ , tγ , ρ | tαtβtα = tβtαtβ, tαtγtα = tγtαtγ ,
tβtγ = tγtβ, (tαtβtγ)
4 = 1,
tαρ = ρtα, tβρ = ρtβ, tγρ = ρtγ〉.
One can prove that
Mod0,[r] : = π0 Diff
+(P1 − {p1, . . . , pr}),
Mod1,1 : = π0 Diff
+(Σ1 − {p}),
Mod1,[2] : = π0 Diff
+(Σ1 − {p, q}),
where Σ1 is the torus S
1×S1 ([Schn03], [CattMu04]). This implies that we can define actions of
these groups on the set of generating vectors for G of type (0 | m1, . . . ,mr), (1 | n) and (1 | n
respectively.
If V := {g1, . . . , gr} is of type (0 | m1, . . . ,mr) then the action is given by
gi −→ gi+1
gi+1 −→ g
i+1gigi+1
gj −→ gj if j 6= i, i+ 1.
If W := {ℓ1; h1, h2} is of type (1 | n) then
ℓ1 −→ ℓ1
h1 −→ h1
h2 −→ h2h1
ℓ1 −→ ℓ1
h1 −→ h1h
h2 −→ h2.
If W := {ℓ1, ℓ2; h1, h2} is of type (1 | n
2) then
ℓ1 −→ ℓ1
ℓ2 −→ ℓ2
h1 −→ h1
h2 −→ h2h1
ℓ1 −→ ℓ1
ℓ2 −→ ℓ2
h1 −→ h1h
h2 −→ h2
ℓ1 −→ ℓ1
ℓ2 −→ h1h
1 ℓ2h1h2h
h1 −→ h
2 ℓ1h1
h2 −→ h2
ℓ1 −→ h
1 ℓ2h1h2
ℓ2 −→ h
2 ℓ1h2h1
h1 −→ h
h2 −→ h
These are called Hurwitz moves and the induced equivalence relation on generating vectors is
said Hurwitz equivalence (see [BaCa03], [BaCaGr06], [Pol07]).
Now let B(G, m, n) be the set of pairs of generating vectors (V, W) such that the assumptions
in Proposition 2.2 are satisfied; then we denote by R the equivalence relation on B(G, m, n)
generated by Hurwitz moves on V, Hurwitz moves on W and the simultaneous action of Aut(G)
on V and W. Similarly, let B(G, n) be the set of generating vectors V such that the assump-
tions of Proposition 2.6 are satisfied; then we denote by R the equivalence relation on B(G, n)
generated by the Hurwitz moves and the action of Aut(G) on V.
Proposition 5.1. The number of irreducible components in M(G, m, n) equals the number
of R-classes in B(G, m, n). Analogously, the number of irreducible components in M(G, n)
equals the number of R-classes in B(G, n).
Proof. We can repeat exactly the same argument used in [BaCaGr06, Propositions 5.2 and 5.5];
we must just replace, where it is necessary, the mapping class group of P1 with the mapping
class group of the elliptic curve E. �
Proposition 5.1 in principle allows us to compute the number of connected components of the
moduli space in each case. In practice, this task may be too hard to be achieved by hand, but it
is not out of reach if one uses the computer. Recently, M. Penegini and S. Rollenske developed a
GAP4 script that solves this problem in a rather short time. We put the result of their calculations
in the Main Theorem (see Introduction), referring the reader to the forthcoming paper [Pe08]
for further details.
6. Appendix
In this Appendix we include, for the reader’s convenience, some of the GAP4 scripts that we
have used in our computations; all the others are similar and can be easily obtained modifying
the ones below.
Let us show how the procedure in the proof of Proposition 3.2 applies to an explicit exam-
ple, namely mα(m) = (2, 4, 12)12 . First we find all the nonabelian groups of order 24 that are
(0 | 2, 4, 12)-generated. This is done using GAP4 as below; the output tells us that there is only
one such a group, namely G = G(24, 5).
gap> # -------------- SCRIPT 1 ------------------
gap> s:=NumberSmallGroups(24);; set:=[1..s];
[1..15]
gap> for t in set do
> c:=0; G:= SmallGroup(24,t);
> Ab:=IsAbelian(G);
> for g1 in G do
> for g2 in G do
> g3:=(g1*g2)^-1;
> H:= Subgroup(G, [g1,g2]);
> if Order(g1)=2 and Order(g2)=4 and Order(g3)=12 and
> Order(H)=Order(G) and
> Ab=false then
> c:=c+1; fi;
> if Order(g1)=2 and Order(g2)=4 and Order(g3)=12 and
> Order(H)=Order(G) and
> Ab=false and c=1 then
> Print(IdSmallGroup(G)," ");
> fi; od; od; od; Print("\n");
[24,5]
By using the two instructions P:=PresentationViaCosetTable(G) and TzPrintRelators(P)
we see that G has the presentation 〈x, y | x2 = y12 = 1, xyx−1 = y5〉, hence it is isomorphic to
the metacyclic group D2,12,5.
In order to speed up further computations, we define the sets G2, G4 given by the elements of
G having order 2 and 4, respectively.
gap> G:=SmallGroup(24,5);;
gap> G2:=[];; G4:=[];;
gap> for g in G do
> if Order(g)=2 then Add(G2,g); fi;
> if Order(g)=4 then Add(G4,g); fi; od;
Then we check whether G is actually (1 | 22)-generated; if not, it should be excluded.
gap> c:=0;;
gap> for l2 in G2 do
> for h1 in G do
> for h2 in G do
> l1:=(l2*h1*h2*h1^-1*h2^-1)^-1;
> K:=Subgroup(G, [l2, h1, h2]);
> if Order(l1)=2 and Order(K)=Order(G) then
> Print(IdSmallGroup(G), " is (1 | 2,2)-generated", "\n"); c:=1; fi;
> if c=1 then break; fi; od;
> if c=1 then break; fi; od;
> if c=1 then break; fi; od;
[24,5] is (1 | 2,2)-generated
We finish the proof by checking whether the surface S actually exists; the procedure is to look
for a pair (V, W) of generating vectors for G satisfying the assumptions of Proposition 2.2.
gap> c:=0;;
gap> for g1 in G2 do
> for g2 in G4 do
> g3:=(g1*g2)^-1;
> H:=Subgroup(G, [g1, g2]);
> for l2 in G2 do
> for h1 in G do
> for h2 in G do
> l1:=(l2*h1*h2*h1^-1*h2^-1)^-1;
> K:=Subgroup(G, [l2, h1, h2]);
> Boole1:=l1 in ConjugacyClass(G, g1);
> Boole2:=l1 in ConjugacyClass(G, g2^2);
> Boole3:=l1 in ConjugacyClass(G, g3^6);
> Boole4:=l2 in ConjugacyClass(G, g1);
> Boole5:=l2 in ConjugacyClass(G, g2^2);
> Boole6:=l2 in ConjugacyClass(G, g3^6);
> if Order(g3)=12 and Order(l1)=2 and
> Order(H)=Order(G) and Order(K)=Order(G) and
> Boole1=false and Boole2=false and Boole3=false and
> Boole4=false and Boole5=false and Boole6=false then
> Print("The surface exists "); c:=1; fi;
> if c=1 then break; fi; od;
> if c=1 then break; fi; od;
> if c=1 then break; fi; od;
> if c=1 then break; fi; od;
> if c=1 then break; fi; od; Print("\n");
The surface exists
The script above can be easily modified in order to obtain the list of all admissible pairs (V, W);
for instance, one of such pairs is given by
g1 = x, g2 = xy
−1, g3 = y
ℓ1 = xy
2, ℓ2 = xy
2, h1 = y, h2 = y.
Finally, here are the GAP4 scripts used in Section 4.
gap> # -------------- SCRIPT 2 ------------------
gap> s:=NumberSmallGroups(36);; set:=[1..s];
[1..14]
gap> for t in set do
> c:=0; G:=SmallGroup(36,t);
> N:=NormalSubgroups(G);
> for G0 in N do
> if IdSmallGroup(G0)=[18,3] then
> c:=c+1; fi;
> if IdSmallGroup(G0)=[18,3] and c=1 then
> Print(IdSmallGroup(G), " ");
> fi; od; od; Print("\n");
[36,10] [36,12]
gap> # -------------- SCRIPT 3 ------------------
gap> set:=[2,4,5,6,7,8,12,17];;
gap> for t in set do
> n2:=0;
> G0:=SmallGroup(32,t);
> for g in G0 do
> if Order(g)=2 then
> n2:=n2+1; fi; od;
> Print(IdSmallGroup(G0), " "); Print(n2, " ");
> od; Print("\n");
[32,2] 7 [32,4] 3 [32,5] 7 [32,6] 11 [32,7] 11
[32,8] 3 [32,12] 3 [32,17] 3
gap> # -------------- SCRIPT 4 ------------------
gap> s:=NumberSmallGroups(64);; set:=[1..s];
[1..267]
gap> for t in set do
> c:=0; G:=SmallGroup(64,t);
> N:=NormalSubgroups(G);
> for G0 in N do
> if IdSmallGroup(G0)=[32,2] then
> c:=c+1; fi;
> if IdSmallGroup(G0)=[32,2] and c=1 then
> Print(IdSmallGroup(G), " ");
> fi; od; od; Print("\n");
[64,8] [64,9] [64,56] [64,57] [64,58] [64,59] [64,61] [64,62] [64,63] [64,64]
[64,66] [64,67] [64,68] [64,69] [64,70] [64,72] [64,73] [64,74] [64,75] [64,76]
[64,77] [64,78] [64,79] [64,80] [64,81] [64,82]
gap> # -------------- SCRIPT 5 ------------------
gap> set:=[8,9,56,57,58,59,61,62,63,64,66,67,68,69,70,
>72,73,74,75,76,77,78,79,80,81,82];;
gap> for t in set do
> n2:=0; G:=SmallGroup(64,t);
> for g in G do
> if Order(g)=2 then n2:=n2+1;
> fi; od;
> if n2=7 then
> Print(IdSmallGroup(G), " ");
> fi; od; Print("\n");
[64,9] [64,57] [64,59] [64,63] [64,64] [64,68] [64,70]
[64,72] [64,76] [64,79] [64,81] [64,82]
gap> # -------------- SCRIPT 6 ------------------
gap> set:=[5,7,9,11,13,,14,15,16,28,33,35,37,43,45,46,
>57,59,63,64,68,70,72,76,79,81,82,112,113,114, 122,126,
>127,132,143,156,158,160,164,165,166,172,182];;
gap> for t in set do
> c:=0; G:=SmallGroup(64,t);
> D:=DerivedSubgroup(G);
> for d in D do
> B:=d in Center(G);
> if Order(d)=2 and B=false then
> c:=c+1; fi;
> if Order(d)=2 and B=false and c=1 then
> Print(IdSmallGroup(G), " ");
> fi; od; od; Print("\n");
[64,5] [64,33] [64,35] [64,37]
gap> # -------------- SCRIPT 7 ------------------
gap> s:=[33, 35, 37];; I:=[1, 2, 3];;
gap> r:=[ [[32,6], [32,7]], [[32,6]], [[32,8]] ];;
> for i in I do
> G:=SmallGroup(64, s[i]); Print(IdSmallGroup(G), "\n");
> for N in NormalSubgroups(G) do
> if IdSmallGroup(N) in r[i] then
> Print(N, "="); Print(IdSmallGroup(N), " ");
> Print(DerivedSubgroup(N), "\n");
> fi; od; Print("\n"); od;
[64,33]
Group( [ f1*f2, f3, f4, f5, f6 ] )=[32,7] Group( [ f5, f6 ] )
Group( [ f1, f3, f4, f5, f6 ] )=[32,6] Group( [ f5, f6 ] )
[64,35]
Group( [ f1*f2, f3, f4, f5, f6 ] )=[32,6] Group( [ f5, f6 ] )
Group( [ f1, f3, f4, f5, f6 ] )=[32,6] Group( [ f5, f6 ] )
[64,37]
Group( [ f1*f2, f3, f4, f5, f6 ] )=[32,8] Group( [ f5, f6 ] )
Group( [ f1, f3, f4, f5, f6 ] )=[32,8] Group( [ f5, f6 ] )
References
[BaCa03] I. Bauer, F. Catanese: Some new surfaces with pg = q = 0, Proceedings of the Fano Conference
(Torino, 2002).
[BaCaGr06] I. Bauer, F. Catanese, F. Grunewald: The classification of surfaces with pg = q = 0 isogenous to
a product of curves, e-print math.AG/0610267 (2006) to appear in Pure Appl. Math Q., volume in
honour of F. Bogomolov’s 60-th birthday.
[BaCaPi06] I. Bauer, F. Catanese, R. Pignatelli: Complex surfaces of general type: some recent progress, (2006),
to appear in Global methods in complex geometry, 1–58 Springer-Verlag.
[Be96] A. Beauville: Complex algebraic surfaces, Cambridge University Press 1996.
[Bre00] T. Breuer: Characters and Automorphism groups of Compact Riemann Surfaces, Cambridge University
Press 2000.
[Br90] S. A. Broughton: Classifying finite group actions on surfaces of low genus, J. Pure Appl. Algebra 69
(1990), 233-270.
[Ca81] F. Catanese: On a class of surfaces of general type, in Algebraic Surfaces, CIME, Liguori (1981), 269-284.
[Ca00] F. Catanese: Fibred surfaces, varieties isogenous to a product and related moduli spaces, American J.
of Math. 122 (2000), 1-44.
[CaCi91] F. Catanese and C. Ciliberto: Surfaces with pg = q = 1, Sympos. Math. XXXII (1991), 49-79.
[CaCi93] F. Catanese and C. Ciliberto: Symmetric product of elliptic curves and surfaces of general type with
pg = q = 1, J. Algebraic Geom. 2 (1993), 389-411.
[CaPi06] F. Catanese, R. Pignatelli: Fibrations of low genus I, Ann. Sci. École Norm. Sup. (4)39 (2006), 1011-
1049.
[CCML98] F. Catanese, C. Ciliberto and M. M. Lopes: Of the classification of irregular surfaces of general type
with non birational bicanonical map, Trans. Amer. Math. Soc. 350 (1998), 275-308.
[CattMu04] A. Cattabriga, M. Mulazzani: (1,1)-knots via the mapping class group of the twice punctured torus,
Adv. Geom. 4 (2004), 263-277.
[GAP4] The GAP Group, GAP – Groups, Algorithms, and Programming, Version 4.4 ; 2006,
http : //www.gap− system.org.
[Gie77] D. Gieseker: Global moduli for surfaces of general type, Invent. Math 43 (1977), 233-282.
[Go31] L. Godeaux: Sur une surface algébrique de genre zero et bigenre deux, Atti Accad. Naz. Lincei 14
(1931), 479-481.
[H71] W. J. Harvey: On the branch loci in Teichmüller space, Trans. Amer. Mat. Soc. 153 (1971), 387-399.
[HP02] C. Hacon, R. Pardini: Surfaces with pg = q = 3, Trans. Amer. Math. Soc. 354 no. 7 (2002), 2631-2638.
[Ki03] H. Kimura: Classification of automorphism groups, up to topological equivalence, of compact Riemann
surfaces of genus 4, J. Algebra 264 (2003), 26-54.
[KuKi90] A. Kuribayashi, H. Kimura: Automorphism groups of compact Riemann surfaces of genus five, J.
Algebra 134 (1990), no. 1, 80–103.
[KuKu90] I. Kuribayashi, A. Kuribayashi: Automorphism groups of compact Riemann surfaces of genera three
and four, J. Pure Appl. Algebra 65 (1990), no. 3, 277–292.
[Par03] R. Pardini: The classification of double planes of general type with K2S = 8 and pg = 0, J. Algebra 259
(2003) no. 3, 95-118.
[Pe08] M. Penegini: Surfaces with pg = q = 2 isogenous to a product of curves: a computational approach.
With an appendix of S. Rollenske. Work in progress
[Pir02] G. P. Pirola: Surfaces with pg = q = 3, Manuscripta Math. 108 no. 2 (2002), 163-170.
[Pol05] F. Polizzi: On surfaces of general type with pg = q = 1, K
S = 3, Collect. Math. 56, no. 2 (2005),
181-234.
[Pol07] F. Polizzi: On surfaces of general type with pg = q = 1 isogenous to a product of curves, e-print
math.AG/0601063, to appear in Comm. Algebra.
[Schn03] L. Schneps: Special loci in moduli spaces of curves. Galois groups and fundamental groups, 217–275,
Math. Sci. Res. Inst. Publ. 41, Cambridge Univ. Press, Cambridge, 2003.
[Wi05] M. Wild: The groups of order sixteen made easy, Amer. Math. Monthly 112, Number 1 (2005), 20-31.
Dipartimento di Matematica Pura ed Applicata, Università di Padova, Via Trieste 63, 35121
Padova, Italy.
E-mail address: carnoval@math.unipd.it
Dipartimento di Matematica, Università della Calabria, Via Pietro Bucci, 87036 Arcavacata di
Rende (CS), Italy.
E-mail address: polizzi@mat.unical.it
	0. Introduction
	1. Basic on surfaces isogenous to a product
	2. The case pg=q=1. Building data
	2.1. Unmixed case
	2.2. Mixed case
	3. The unmixed case
	4. The mixed case
	4.1. The case g(C)=5,  |G|=16
	4.2. The case g(C)=7,  |G|=36
	4.3. The case g(C)=9,  |G|=64
	5. Moduli spaces
	6. Appendix
	References
ABSTRACT
  A projective surface S is said to be isogenous to a product if there exist
two smooth curves C, F and a finite group G acting freely on C \times F so that
S=(C \times F)/G. In this paper we classify all surfaces with p_g=q=1 which are
isogenous to a product.

<|endoftext|><|startoftext|>
Manipulating the rotational properties of a two-component Bose gas
J. Christensson1, S. Bargi1, K. Kärkkäinen1, Y. Yu1, G. M. Kavoulakis1, M. Manninen2, and S. M. Reimann1
Mathematical Physics, Lund Institute of Technology, P.O. Box 118, SE-22100 Lund, Sweden
Nanoscience Center, Department of Physics, FIN-40014 University of Jyväskylä, Finland
(Dated: October 25, 2018)
A rotating, two-component Bose-Einstein condensate is shown to exhibit vortices of multiple
quantization, which are possible due to the interatomic interactions between the two species. Also,
persistent currents are absent in this system. Finally, the order parameter has a very simple structure
for a range of angular momenta.
PACS numbers: 05.30.Jp, 03.75.Lm, 67.40.-w
When a superfluid is set into rotation, it demonstrates
many fascinating phenomena, such as quantized vortex
states and persistent flow [1]. The studies of rotational
properties of superfluids originated some decades ago,
mostly in connection with liquid Helium, nuclei, and neu-
tron stars. More recently, similar properties have also
been studied extensively in cold gases of trapped atoms.
Quantum gases of atoms provide an ideal system for
studying multi-component superfluids. At first sight, the
rotational properties of a multi-component gas may look
like a trivial generalization of the case of a single compo-
nent. However, as long as the different components inter-
act and exchange angular momentum, the extra degrees
of freedom associated with the motion of each species is
not at all a trivial effect. On the contrary, this coupled
system may demonstrate some very different phenomena,
see, e.g., Refs. [2, 3, 4]. Several experimental and theo-
retical studies have been performed on this problem, see,
e.g., Refs. [5, 6, 7, 8, 9, 10, 11], as well as the review
article of Ref. [12].
In this Letter, the rotational properties of a super-
fluid that consists of two distinguishable components are
examined. Three new and surprising conclusions result
from our study:
Firstly, under appropriate conditions, one may achieve
vortex states of multiple quantization. It is important
to note that these states result from the interaction be-
tween the different species, and not from the functional
form of the external confinement. It is well known from
older studies of single-component gases, that any external
potential that increases more rapidly than quadratically
gives rise to vortex states of multiple quantization, for
sufficiently weak interactions [13]; on the contrary, in a
harmonic potential, the vortex states are always singly-
quantized. In the present study, vortex states of multiple
quantization result purely because of the interaction be-
tween the different components, even in a harmonic ex-
ternal potential. Therefore, our study may serve as an
alternative way to achieve such states [14].
Secondly, our simulations indicate that multi-
component gases do not support persistent currents, in
agreement with older studies of homogeneous superflu-
ids [3]. Essentially, the energy barrier that separates the
(metastable) state with circulation/flow from the non-
rotating state, is absent in this case, as the numerical
results, as well as the intuitive arguments presented be-
low, suggest.
Finally, we investigate the structure of the lowest state
of the gas, in the range of the total angular momentum L
between zero and Nmin = min(NA, NB), where NA and
NB are the populations of the two species labelled as A
and B. In this range of L, only the single-particle states
with m = 0 and m = 1 are macroscopically occupied, as
derived in Ref. [15] within the approximation of the low-
est Landau level of weak interactions. Remarkably, our
numerical simulations within the mean-field approxima-
tion, which go well beyond the limit of weak interactions,
show that this result is more general.
For simplicity we assume equal masses for the atoms
of the two components, MA = MB = M . Also, we
model the elastic collisions between the atoms by a con-
tact potential, with equal scattering lengths for colli-
sions between the same species and different species,
aAA = aBB = aAB = a (except in Fig. 4). Our results
are not sensitive to the above equality and hold even if
aAA ≈ aBB ≈ aAB, as in Rubidium, for example. For the
atom populations we assume NA 6= NB, but NA/NB <∼ 1
(without loss of generality). The trapping potential is
assumed to be harmonic, Vext(r) = M(ω
2ρ2 + ω2zz
2)/2.
Our Hamiltonian is thus
NA+NB
+ Vext(ri) +
NA+NB
i6=j=1
δ(ri − rj),(1)
where U0 = 4πh̄
2a/M . We consider rotation around the
z axis, and also assume that h̄ωz ≫ h̄ω, and h̄ωz ≫ n0U0,
where n0 is the typical atom density. With these assump-
tions, our problem becomes effectively two-dimensional,
as the atoms reside in the lowest harmonic oscillator state
along the axis of rotation. Thus, there are only two quan-
tum numbers that characterize the motion of the atoms,
the number of radial nodes n, and the quantum number
m associated with the angular momentum. The corre-
sponding eigenstates of the harmonic oscillator in two
dimensions are labelled as Φn,m.
Within the mean-field approximation, the energy of
the gas in the rest frame is
i=A,B
h̄2∇2
+ Vext(r)
http://arxiv.org/abs/0704.0447v1
(|ΨA|
4 + |ΨB|
4 + 2|ΨA|
2|ΨB|
2) d3r, (2)
where ΨA and ΨB are the order parameters of the two
components. By considering variations in Ψ∗A and Ψ
we get the two coupled Gross-Pitaevskii-like equations,
h̄2∇2
+ Vext + U0|ΨB|
ΨA + U0|ΨA|
2ΨA = µAΨA,
h̄2∇2
+ Vext + U0|ΨA|
ΨB + U0|ΨB|
2ΨB = µBΨB,
where µA and µB is the chemical potential of each com-
ponent. We use the method of relaxation [16] to minimize
the energy of Eq. (2) in the rotating frame, E′ = E−LΩ,
where Ω is its angular velocity.
For the diagonalization of the many-body Hamilto-
nian, we further assume weak interactions, n0U0 ≪ h̄ω,
and work within the subspace of the states of the lowest
Landau level, with n = 0. This condition is not neces-
sary, however it allows us to consider a relatively larger
number of atoms and higher values of the angular mo-
mentum. We consider all the Fock states which are eigen-
states of the number operators N̂A, N̂B of each species,
and of the operator of the total angular momentum L̂,
and diagonalize the resulting matrix.
Combination of the mean-field approximation and of
numerical diagonalization of the many-body Hamiltonian
allows us to examine both limits of weak as well as strong
interactions. For obvious reasons we use the diagonal-
ization in the limit of weak interactions, and the mean-
field approximation (mostly) in the limit of strong in-
teractions. The interaction energy is measured in units
of v0 = U0
|Φ0,0(x, y)|
4|φ0(z)|
4 d3r = (2/π)1/2h̄ωa/az,
where φ0(z) is the lowest state of the oscillator potential
along the z axis, and az = (h̄/Mωz)
1/2 is the oscilla-
tor length along this axis. For convenience we introduce
the dimensionless constant γ = Nv0/h̄ω =
2/πNa/az,
with N = NA + NB being the total number of atoms,
which measures the strength of the interaction.
We first study the limit of weak coupling, γ ≪ 1, and
use numerical diagonalization. Considering NA = 4 and
NB = 16 atoms, we use the conditional probability dis-
tributions to plot the density of the two components,
for L = 4, 16, 28, and 32, as shown in Fig. 1. When
L = 4 = NA, and L = 16 = NB, the component whose
population is equal to L forms a vortex state at the center
of the trap, while the other component does not rotate,
residing in the core of the vortex. This is a so-called
“coreless” vortex state [5, 6, 17]. As L increases beyond
L = NB = 16, a second vortex enters component B, and
for L = 2NB = 32, this merges with the other vortex to
form a doubly-quantized vortex state. For this value of
L = 32, the smaller component A does not carry any an-
gular momentum (apart from corrections of order 1/N).
The fact that this is indeed a doubly-quantized vortex
state, is confirmed by the occupancy of the single-particle
FIG. 1: The conditional probability distribution of the two
components, with NA = 4 (higher row), and NB = 16 (lower
row). Each graph extends between −2.4a0 and 2.4a0 in
both directions. The reference point is located at (x, y) =
(1.25a0, 0) in the lower graphs (B component). The angular
momentum L increases from left to right, L = 4(= NA), 16(=
NB), 28, and 32(= 2NB).
states. By increasing NA, NB, and L = 2NB proportion-
ally, we observe that the occupancy of the single-particle
state withm = 2 of componentB approaches unity, while
the occupancy of all the other states are at most of order
1/NB. The same happens for the single-particle state
with m = 0 of the non-rotating component A.
A similar situation emerges for the case of stronger cou-
pling, γ = 50, where we have minimized the mean-field
energy of Eq. (2) in the rotating frame (in the absence of
rotation the two clouds do not phase separate). For ex-
ample, we get convergent solutions, shown in Fig. 2, for
NB/NA = 2.777, and (i): LA = NA, LB = 0, for Ω/ω =
0.35 (top left), (ii) LA = 0, LB = NB, for Ω/ω = 0.45
(top middle), (iii) LA = 0.755NA, LB = 1.171NB, for
Ω/ω = 0.555 (top right), (iv) LA = 0, LB = 2NB, for
Ω/ω = 0.60 (bottom left), (v) LA = 0.876NA, LB =
2.057NB, for Ω/ω = 0.69 (bottom middle), and (vii)
LA = 0, LB = 3NB, for Ω/ω = 0.73 (bottom right).
Here, LA and LB is the angular momentum of each com-
ponent, with L = LA + LB. Again, when L = 2NB, and
L = 3NB the phase plots show clearly a doubly-quantized
and a triply-quantized vortex state in component B, and
a non-rotating cloud in component A.
The picture that appears from these calculations is
intriguing: as Ω increases, a multiply-quantized vortex
state of multiplicity κ splits into κ singly-quantized ones,
and on the same time, one more singly-quantized vortex
state enters the cloud from infinity. Eventually all these
vortices merge into a multiply-quantized one of multiplic-
ity equal to κ + 1. Figure 2 shows the above results for
various values of Ω.
The mechanical stability of states which involve the
gradual entry of the vortices from the periphery of the
cloud is novel. This behavior is absent in one-component
systems, in both harmonic, as well as anharmonic trap-
ping potentials. In one component gases, only vortex
phases of given rotational symmetry are mechanically
stable [18, 19]. In the present problem, the mechanical
stability of states with no rotational symmetry (shown
in Fig. 2) is a consequence of the non-negative curva-
ture of the dispersion relation (i.e., of the total energy)
FIG. 2: The density (higher graphs of each panel) and the
phase (lower graphs of each panel) of the order parameters
ΨA (left graphs of each panel) and ΨB (right graphs of each
panel), with NB/NA = 2.777 and a coupling γ = 50. Each
graph extends between −4.41a0 and 4.41a0 in both directions.
The values of the angular momentum per atom and of Ω in
each panel are given in the text.
E(L). This observation also connects with the (absence
of) metastable, persistent currents (i.e., the second main
result of our study), which we present below.
In Ref. [15] we have given a simple argument for the
presence of vortex states of multiple quantization within
the mean-field approximation. At least when the ra-
tio between NA and NB is of order unity (but NA 6=
NB), there are self-consistent solutions of Eqs. (3) of vor-
tex states of multiple quantization. Within these solu-
tions, the smaller component (say component A), does
not rotate, providing an “effective” external potential
Veff,B(r) = Vext(r) + U0nA(r) for the other one (compo-
nent B), which is anharmonic close to the center of the
trap. This effectively anharmonic potential is responsible
for the multiple quantization of the vortex states. There-
fore, we conclude that for a relatively small population
imbalance, the “coreless vortices” are vortices of multiple
quantization.
The second aspect of our study is the absence of
metastable currents (in the laboratory frame, for Ω = 0).
A convenient and physically-transparent way to think
about persistent currents is that they correspond to
metastable minima in the dispersion relation E = E(L)
[20]. A non-negative curvature of E(L) for all values of
L implies the absence of metastability. For all the cou-
plings we have examined, both within the numerical diag-
onalization, as well as the mean-field approximation, we
have found a non-negative second derivative of the dis-
persion relation. Figure 3 shows LA/NA, LB/NB, and
L/N versus Ω, for γ = 50. These curves are calculated
by minimizing the energy E(L) in the rotating frame for
a fixed Ω, and plotting the angular momentum per par-
ticle of the corresponding state for the given rotational
frequency.
Again, our argument for the effective anharmonic po-
tential is consistent with this positive curvature. Let us
 0.3  0.4  0.5  0.6  0.7
 0.3  0.4  0.5  0.6  0.7
FIG. 3: Higher graph: The angular momentum LA/NA
(crosses) and LB/NB (dots), as function of Ω. Lower graph:
L/N as function of Ω. All curves result from the minimiza-
tion of the energy in the rotating frame, within the mean-field
approximation, for γ = 50.
consider for simplicity aAA = aBB = 0, and aAB 6= 0.
Then, the problem of solving Eqs. (3) becomes essen-
tially a (coupled) eigenvalue problem. If E0,m are the
(lowest) eigenvalues of the effective (anharmonic) po-
tential felt by the rotating component for a given an-
gular momentum mh̄, then ∂2E0,m/∂m
2 is always posi-
tive. For example, if one considers a weakly-anharmonic
effective potential, Veff(ρ) = Mω
2ρ2[1 + λ(ρ/a0)
2k]/2,
where k = 1, 2, . . . is a positive integer, a0 = (h̄/Mω)
is the oscillator length, and 0 < λ ≪ 1 is a small di-
mensionless constant, according to perturbation theory,
E0,m = h̄ω|m|+ λ(|m|+ 1) . . . (|m|+ k)/2, which clearly
has a positive curvature.
One may gain some physical insight into the absence
of persistent currents by understanding the difference be-
tween a gas with one and two components. In the case
of a single component, for sufficiently strong (and repul-
sive) interactions, an energy barrier that separates the
state with circulation from the vortex-free state may de-
velop. In the simplest model where the atoms rotate in
a toroidal trap, in order for them to get rid of the circu-
lation, they have to form a node in their density, which
costs interaction energy, and this creates the energy bar-
rier [20, 21]. On the other hand, in the presence of a
second component, this node may be filled with atoms of
the other species, and therefore the system may get rid
of the circulation with no energy expense. This physical
picture is also supported by the density plots in Figs. 2
and 4. For example, in the case of coreless vortices, the
core of the vortex is filled by the other (non-rotating)
component [12]. More generally, the density minima of
FIG. 4: The density (higher graphs) and the phase (lower
graphs) of the order parameters ΨA (left graphs of each panel)
and ΨB (right graphs of each panel), with NB/NA = 2.777.
Here Ω/ω = 0.6 and γ = 50. In the left panel LA = 0,
and LB/NB = 2. In the right panel, the scattering length
aBB is twice as large as in the left panel, aBB = 2a. In this
case, LA/NA = 0.05, and LB/NB = 1.936. All graphs extend
between −4.41a0 and 4.41a0.
the one component coincide, roughly speaking, with the
density maxima of the other component, resulting in a
total density ntot = |ΨA|
2 + |ΨB|
2 which does not have
any local minima or nodes.
Our third result is based on the mean-field approxima-
tion. For 0 ≤ L ≤ Nmin, where Nmin = min(NA, NB),
the only components of the order parameters ΨA and ΨB
are the single-particle states with m = 0 and m = 1, i.e.,
cn,0Φn,0 + cn,1Φn,1,
dn,0Φn,0 + dn,1Φn,1, (4)
where cn,0, cn,1, dn,0 and dn,1 are functions of L and of
the coupling. The numerical simulations that we perform
within a range of couplings γ ≤ 50 that extend well be-
yond the lowest-Landau level approximation, reveal this
very simple structure for the lowest state of both com-
ponents. Also, the corresponding dispersion relation is
numerically very close to a parabola, as in the case of
weak interactions [15]. Again, one may attribute these
facts to the effective potential that arises from the inter-
action between the two species [15].
In the studies that have examined a single-component
gas in an external anharmonic potential, it has been
shown that as the strength of the interaction increases,
there is a phase transition from the phase of multiple
quantization to the phase of single quantization [19]. In
the present case the situation is more complex, since the
effective anharmonic potential is generated by the in-
teraction between the two species as a result of a self-
consistent solution. Still, a similar phase transition takes
place here, as, for example, one keeps the scattering
lengths aAA and aAB fixed, and increases aBB that cor-
responds to the rotating component. Figure 4 shows the
density and the phase of both species, for aAA = aBB =
aAB = a (left panel), and aBB = 2aAA = 2aAB = 2a
(right panel). Component B undergoes a phase transi-
tion from a doubly-quantized vortex state to two singly-
quantized vortices.
To conclude, mixtures of bosons demonstrate numer-
ous novel superfluid properties and provide a model sys-
tem for studying them. Here we have given a flavor of
the richness of this problem. Many of the results pre-
sented in our study are worth investigating further, as,
for example, one changes the ratio of the populations, the
coupling constant between the same and different species,
or the masses.
We acknowledge financial support from the Euro-
pean Community project ULTRA-1D (NMP4-CT-2003-
505457), the Swedish Research Council, the Swedish
Foundation for Strategic Research, and the NordForsk
Nordic Network on “Low-dimensional physics”.
[1] A. J. Leggett, Rev. Mod. Phys. 71, S318 (1999).
[2] N. D. Mermin and T.-L. Ho, Phys. Rev. Lett. 36, 594
(1976).
[3] P. Bhattacharyya, T.-L. Ho, and N. D. Mermin, Phys.
Rev. Lett. 39, 1290 (1977).
[4] T.-L. Ho, Phys. Rev. Lett. 49, 1837 (1982).
[5] M. R. Matthews, B. P. Anderson, P. C. Haljan, D. S.
Hall, C. E. Wieman, and E. A. Cornell, Phys. Rev. Lett.
83, 2498 (1999).
[6] A. E. Leanhardt, Y. Shin, D. Kielpinski, D. E. Pritchard,
and W. Ketterle, Phys. Rev. Lett. 90, 140403 (2003).
[7] V. Schweikhard, I. Coddington, P. Engels, S. Tung, and
E. A. Cornell, Phys. Rev. Lett. 93, 210403 (2004).
[8] J. E. Williams and M. J. Holland, Nature (London) 401,
568 (1999).
[9] E. J. Mueller and T.-L. Ho, Phys. Rev. Lett. 88, 180403
(2002).
[10] K. Kasamatsu, M. Tsubota, and M. Ueda Phys. Rev.
Lett. 91, 150406 (2003).
[11] K. Kasamatsu, M. Tsubota, and M. Ueda Phys. Rev. A
71, 043611 (2005).
[12] K. Kasamatsu, M. Tsubota, and M. Ueda, Int. J. of Mod.
Phys. B, 19, 1835 (2005).
[13] See, e.g., A. L. Fetter, Phys. Rev. A 64, 063608 (2001); E.
Lundh, Phys. Rev. A 65, 043604 (2002), and references
therein.
[14] V. Bretin, S. Stock, Y. Seurin, and J. Dalibard, Phys.
Rev. Lett. 92, 050403 (2004).
[15] S. Bargi, J. Christensson, G. M. Kavoulakis, and S. M.
Reimann, Phys. Rev. Lett. 98, 130403 (2007).
[16] See, e.g., S. A. Chin and E. Krotscheck, Phys. Rev. E 72
036705 (2005).
[17] R. Blaauwgeers, V. B. Eltsov, M. Krusius, J. J. Ruohio,
R. Schanen, and G. E.Volovik, Nature (London) 404, 471
(2000).
[18] D. A. Butts, and D. S. Rokhsar, Nature (London) 397,
327 (1999).
[19] See, e.g., G. M. Kavoulakis and G. Baym, New Journal
of Phys. 5, 51.1 (2003), and references therein.
[20] A. J. Leggett, Rev. Mod. Phys. 73, 307 (2001).
[21] D. Rokhsar, e-print cond-mat/9709212.
http://arxiv.org/abs/cond-mat/9709212
ABSTRACT
  A rotating, two-component Bose-Einstein condensate is shown to exhibit
vortices of multiple quantization, which are possible due to the interatomic
interactions between the two species. Also, persistent currents are absent in
this system. Finally, the order parameter has a very simple structure for a
range of angular momenta.

<|endoftext|><|startoftext|>
Introduction
The past decade has seen the discovery of over one hundred extra-solar planets. Most of
these planets (see, e.g. Marcy et al. 2005) are gas giants of Jupiter mass or greater, in orbits
close to their host stars.1 Such planets are often called ‘hot Jupiters.’ Although detection
methods (primarily radial velocity measurements) are strongly biased towards detecting such
planets, a sufficient number have been discovered to require detailed explanation.
The present orbits of the ‘hot Jupiters’ lie so close to their host stars that in situ forma-
tion is almost impossible. The problem is two-fold. Firstly, the expected disc temperature
is high, which suppresses gravitational capture of gas by a gas giant core. Secondly, there
1For the most up to date information, refer to http://exoplanet.eu/ and http://exoplanets.org/
http://arxiv.org/abs/0704.0448v1
http://exoplanet.eu/
http://exoplanets.org/
– 2 –
simply isn’t much material at such small orbital radii. We are therefore led to the conclusion
that the ‘hot Jupiters’ formed further from the star, and migrated to their present orbits.
In this paper, we present a study of giant planet migration in power law discs.
A planet in a circumstellar disc exerts a torque on the disc material at resonances
(Goldreich and Tremaine 1978, 1979, 1980). By Newton’s Third Law, there is an opposite
torque on the planet, which causes it to migrate. Planets of mass comparable to Jupiter (the
exact threshold depends on the disc conditions, see below) exert torques comparable to the
viscous torque in the disc. They can therefore open a gap in the disc. If the gap is perfectly
clean, then the planet will act as a ‘relay station’ between the inner and outer discs. We then
expect the planet’s orbital evolution to follow the viscous evolution of the disc – a process
known as Type II migration (Ward and Hahn 2000). We shall compare our computational
results to the predictions of Type II theory.
We summarise the theory of Type II migration in Section 2. We describe our numerical
method in Section 3, and demonstrate the formation of a gap in Section 4. Section 5 presents
our main results. In Section 6 we discuss the implications of our findings, before presenting
our conclusions in Section 7.
2. Type II Migration
In this section, we shall briefly review the theory of Type II migration. For a more
thorough analysis of the theory of planet–disc interactions, the reader should turn to one of
the many published reviews (e.g. Lin and Papaloizou 1993; Ward and Hahn 2000; Lin et al.
2000).
Planet–disc interactions are dominated by resonances. A planet embedded in a cir-
cumstellar disc excites waves at its Lindblad resonances (LR). These waves carry angular
momentum, and hence exert a torque on the disc material. The strength of the torque from
the Lindblad resonances is proportional to TLR ∝ Σq
2, where Σ is the local disc surface
density, and q is the planet-star mass ratio. These torques act to push material away from
the planet. At the same time, the disc gas is expected to have an intrinsic viscosity, ν (al-
though the precise origin and exact behaviour of such a viscosity are still much debated),
which leads to a viscous torque Tν ∝ Σν. Since the Lindblad torques scale with q2, we can
expect that as the planet accretes material, the Lindblad torques will eventually dominate
the viscous torque in the disc. Balancing the two torques leads to the so-called gap opening
criterion:
q > 40R−1 (1)
– 3 –
where R = r2Ω/ν is the Reynolds number of the disc (Bryden et al. 1999; Nelson et al.
2000). There is a similar requirement that the planet’s Hill sphere exceed the local scale
height of the disc, namely that
q > 3
where h is the disc scale height. See Lin and Papaloizou (1993) for a further discussion.
The tidal condition of Equation 2 leads to the gap width being at least twice the local scale
height. If this condition is not satisfied, then the edge of the gap will be Rayleigh unstable.
Consideration of the torque condition leads to the expectation that the gap will lie between
the m = 2 Lindblad resonances of the planet (that is, between r = 0.6 and r = 1.3). For
Jovian mass planets in circumstellar discs, these gap sizes are similar.
Once the planet has opened a gap, it is assumed to isolate the inner and outer discs
from each other. However, each part of the disc will still be undergoing viscous evolution.
According to Type II migration theory, the planet will be locked to the viscous evolution of
the disc (Ward and Hahn 2000). Setting the migration rate of the planet equal to the radial
drift velocity of the gas in a thin accretion disc (see e.g. Pringle 1981), we find:
ȧ = −
so long as the disc is sufficiently massive. Note that Equation 3 is independent of the disc
surface density. If the disc mass is too low, then the torques (also proportional to Σ above)
will not be able to force the planet to migrate at this rate. Syer and Clarke (1995) and
Ivanov et al. (1999) have examined this limit.
Ida and Lin (2004) suggest an alternative prescription for determining the Type II mi-
gration rate. They balance the angular momentum change of the planet, 1
MpΩpaȧ with the
maximum viscous couple in the disc J̇ = 3
r2. Since we are using power law discs
here, J̇ will not have a maximum, so we use the nominal value at the planet’s orbit. This
leads to the prediction
ȧ = −3
Since this equation was obtained by balancing angular momentum, we might expect it to be
valid for a full range of disc masses.
Surprisingly, giant planet migration does not seem to have been subjected to a system-
atic test (this is in sharp contrast to the theory of Type I migration, which applies to low mass
planets). Some curiosities in the behaviour of migrating giant planets have been seen (e.g.
Schäfer et al. 2004), but these have not been explored in detail. The work of Nelson et al.
– 4 –
(2000) provides the most complete set of runs to date, but the physical parameters were not
varied in a regular fashion.
In this paper, we shall present a series of numerical experiments, following giant planets
migrating in a variety of accretion discs. We will then compare our results to equations 3
and 4. Equation 3 makes particularly strong predictions about the expected migration rates
- namely that the migration rate depends solely on the disc viscosity. This is the first time
such a test has been performed.
3. Numerical Set up
We use the Fargo code of Masset (2000a,b) to perform our calculations. Fargo is a
simple 2d polar mesh code dedicated to disc planet interactions. It is based upon a standard,
Zeus-like (Stone and Norman 1992) hydrodynamic solver, but owes its name to the Fargo
algorithm upon which the azimuthal advection is based. This algorithm avoids the restrictive
timestep typically imposed by the rapidly rotating inner regions of the disc, by permitting
each annulus of cells to rotate at its local Keplerian velocity and stitching the results together
again at the end of the timestep. The use of the Fargo algorithm typically lifts the timestep
by an order of magnitude, and therefore speeds up the calculation accordingly. The mesh
centre lies at the central star, so indirect terms coming from the planets and the disc are
included in the potential calculation. We make use of an non-reflecting inner boundary, to
prevent reflected waves from interfering with the calculations. The pitch angle of the wake
is evaluated in the WKB approximation. The inner ring of active cells is then copied to the
ghost cells, with an azimuthal shift appropriate to the pitch angle. Material which flows off
the inner boundary is not added to the star (nor does the planet itself accrete). At the outer
boundary, mass was added, to compensate for the viscous evolution of the system.
We use units normalised such that G = M
+Mp = 1, while the planet’s initial orbital
radius is set at a = 1. References to times in terms of ‘orbits’ should be understood to
mean “orbital times at the planet’s initial radius.” The grid extends between r = 0.4 and
r = 2.5. Scaled to Jupiter’s orbit, this grid roughly covers the area between the asteroid belt
and Saturn’s orbit. We assume a constant aspect ratio disc, with h/r = 0.05. We set the
mass ratio q = Mp/M∗ to be 10
−3, approximately equal to the Jupiter-Sun value. In our
parameter space search, we varied the disc surface density profile and viscosity. The surface
density was initially a power law:
Σ(r) = Σ0
and we take r0 = 1 (the planet’s initial orbital radius). Four values for δ are considered:
– 5 –
0, 0.5, 1 and 1.5. These represent a reasonable range of alternatives, from a theoretically
simple constant surface density disc to the canonical Minimum Mass Solar Nebula. We set
Σ0 through
qdisc =
which provides a quick estimate of the disc’s mass within the planet’s orbit. This estimate
is accurate for δ = 0 discs, but is an underestimate for larger δ values. We take four values
for q
: 5×10−4, 10−3, 2×10−3, and 3×10−3. The total disc mass lies between 1.9×10−3
(for q
= 5× 10−4 and δ = 1.5) and 0.018 (for q
= 3× 10−3 and δ = 0) in units of the
stellar mass. By way of comparison, the Minimum Mass Solar Nebula (MMSN) requires at
least 5MJ of gas in the vicinity of Jupiter’s orbit.
2 Thus, our lower mass discs are somewhat
sub-Minimum.
The viscosity is taken to be uniform, and has values ν = 10−4 and 10−5 in our units.
With a uniform viscosity, δ = 0.5 yields a disc with an initially stationary surface density pro-
file (cf equation 2.10 of Pringle 1981). These viscosities may be related to the α prescription
for viscosity, ν = αcsh using
ν = α
Ωr2 (7)
This implies that α varies with radius. With our aspect ratio, a viscosity of ν = 10−5 gives
α ≈ 4 × 10−3 at the planet’s initial orbital radius. Note that for the highest viscosity, the
gap opening criterion of Equation 1 is not satisfied. The tidal condition of Equation 2 is
always satisfied in our numerical experiments.
The gravitational effect of the planet on the disc is smoothed at 0.6 of the disc thickness
at the planet’s orbital radius:
φ = −
r2 + ǫ2
where ǫ = 0.6h. There are two motivations for this, the first being the desire to avoid having
a singularity wandering around the grid. The second is physical. The 2d approximation
becomes poor close to the planet, where the vertical distribution of material becomes im-
portant. The actual distance of material from the planet ceases to be the well approximated
by the in-plane distance, which would lead to the gravitational effect being over-estimated.
Accordingly, we soften the potential over distances comparable to the disc scale height. How-
ever, this softening length is still substantially smaller than the expected gap size and the
2We calculate this by comparing Jupiter’s metal content to that of the Sun, and assuming that both
condensed from the same gas cloud. Scaled to the Solar System, our grid roughly covers the region between
the asteroid belt and Saturn
– 6 –
planet’s Hill sphere. When calculating the torque the disc exerts on the planet, material from
within the Hill sphere is subject to an exponential cut off, for similar reasons. At the start
of each run, the planet is introduced gradually (over about one orbit), and is not initially
permitted to migrate. This is done to minimise the effect of transients caused by the sudden
appearance of a planet in a smooth disc. We considered release times of approximately one
orbit, and 100 & 1000 orbits. The computational grid is covered by 128 radial and 384
azimuthal cells (all uniformly spaced).
So far as possible, this setup mirrors that used in the comparison project presented by
de Val-Borro et al. (2006). In that comparison, the Fargo code was seen to give similar
results to other codes used to study the disc–planet interaction problem.
4. Development of the Gap
Since we introduce the planet into an initially unperturbed disc, there is a period of
rapid evolution, as the planet clears a gap. In this section, we shall discuss the development
of this gap.
In Figure 1, we trace the evolution of the gap in a ν = 10−5 disc. The initial surface
density profile had δ = 0.5 (cf equation 5). There are no surprises in this plot, when
compared with the many other numerical calculations of gap formation. We see that the
gap is mostly cleared in the first 100 orbits (note that the y-axis is logarithmic). The gap
lies roughly between the m = 2 Lindblad resonances (located at r = 0.6 and r = 1.3). For a
q = 10−3 planet, this distance is also comparable to the co-rotation region. This plot draws
attention to the fact that the planet never completely clears the gap. Even after 1000 orbits,
the surface density in the gap is around 3% of its initial value. The gap edge is covered by
roughly ten grid cells.
If we increase the viscosity to ν = 10−4, the gap becomes far less pronounced. We show
the development of the gap in this case in Figure 2. Again, most of the depletion occurs
during the first 100 orbits, but the total depletion is far less. The density has only dropped
to around 50% of its initial value. This is not unexpected - according to Equation 1, a
Jupiter mass planet should not be able to open a gap in such a viscous disc. Of course, the
condition of Equation 2 is satisfied.
Figures 1 and 2 both show that most of the gap clearing occurs withing the first 100
orbits. We shall therefore use this as our canonical release time below. However, we shall
show the effect of varying the release time as well.
– 7 –
Gaps in protoplanetary discs are known to vary smoothly with q and ν (see, e.g.
astro-ph/0608020), so what is taken to be a gap is somewhat arbitrary. We shall con-
tinue with both viscosity values, but we must bear in mind that in the high viscosity case
the gap is quite shallow.
5. Results
We will now present the results of our numerical experiments of a Jupiter mass planet
migrating in power law discs, grouped by viscosity. Such planets are conventionally assumed
to undergo Type II migration. We have two predictions for the migration rate of giant
planets in Equations 3 and 4. We shall compare our results to these predictions.
5.1. High Viscosity
The higher viscosity runs had ν = 10−4. This viscosity means that for a Jupiter mass
planet, the viscous gap opening criterion q > 40R−1 of Equation 1 is not quite satisfied.
However, the tidal condition of Equation 2 is fulfilled.
We shall discuss the results from the runs where the planet was released after 100
orbits. In Appendix A, we demonstrate that the release time is not significant. The orbital
evolution of these planets is plotted in Figure 3. We cut the y-axis at 0.6, since at that point
the m = 2 ILR of the planet encounters the edge of the grid. The migration rate of the
planet therefore becomes unreliable. We can see that the migration rate is a strong function
(or equivalently, Σ0). This is in direct contradiction to the prediction of Equation 3.
Notice also how the migration rate varies with a. Equation 3 predicts that ȧ ∝ a−1, which
we do not see. We see that the migration rate generally falls with a, which is consistent
with equation 4. Figure 3 shows that the reduction of ȧ with a falls as δ increases (that
is, there is a pronounced curve in the migrations for δ = 0, whereas those for δ = 1.5 are
almost straight lines). This is broadly consistent with equation 4, were the migration rate is
proportional to ȧ ∝ aΣ ≡ a1−δ. Complicating this is the viscous evolution of the disc itself,
which is probably the reason why ȧ is not strictly proportional to a1−δ (which would predict
accelerating migration for the δ = 1.5 case).
We have seen that the migration rate of a Jupiter mass planet in these discs is strongly
affected by the disc surface density. However, since the gap is not particularly deep, whether
Type II behaviour should be expected is debatable.
– 8 –
5.2. Medium Viscosity
Here, we examine the results from runs with ν = 10−5. Again, we shall examine the
case where the planet was released after 100 orbits first.
In Figure 4, we show the orbital migration for planets embedded in a variety of discs.
We see that planets embedded in higher surface density discs (parameterised by q
Equation 6) consistently undergo faster migration. The migration rate is roughly propor-
tional to the disc surface density. Note also the variation of ȧ with a. It is similar to that
seen for figure 3 above. Eccentricities again remained low.
5.3. Summary of results
In this section, we have presented a series of giant planet migration runs. Such planets
have generally been thought to undergo Type II migration. One formulation of the theory
predicts that the migration rate depends solely on the disc viscosity (Equation 3). However,
we have found that the migration rates vary systematically with disc surface density. Higher
disc surface densities give faster migration, which is consistent with Equation 4. There is a
weaker variation with disc viscosity, which is inconsistent with both predictions.
In Appendix A, we show that our conclusions are not affected by varying the time at
which the planet was released to migrate. We demonstrate that the grid resolution does not
affect our results in Appendix B.
6. Discussion
In Section 5, we presented a series of runs designed to study how giant planets migrate.
This migration of such planets is thought to be controlled by the viscous torque within
the disc. Two different rates have been suggested. In the first (Equation 3), the planet is
locked to the viscous evolution of the disc, and the migration rate depends solely on the disc
viscosity. The second (Equation 4) computes an angular momentum balance between the
planet and disc. In this theory, the migration rate also depends on the disc surface density
and planet mass.
We have seen that the migration rate we obtain varies strongly with disc surface density,
indicating that Equation 3 in not appropriate. Although Equation 4 is more promising,
we do not recover the same variation with viscosity. The higher viscosity runs underwent
more rapid migration, but the difference in migration rates was not an order of magnitude.
– 9 –
However, this result is not as robust as the variation with surface density, since the high
viscosity runs did not satify both gap opening criteria. Although Figure 2 shows that the
high viscosity runs are definitely in the non-linear regime, the gap itself was not especially
clean.
We have shown (Appendix A) that our results are not simply ‘turn-on’ transients. The
migration rates are not significantly affected by the time at which the planet is released
from a fixed orbit. Our resolution tests (Appendix B) demonstrate that our results are not
significantly affected by a doubling of the grid resolution.
Our neglect of material within the Hill sphere when calculating the torque is a point
of concern. D’Angelo et al. (2003) noted that most of the torque in their calculations came
from within the Hill sphere.3 However, the theory of Type II migration takes no account of
this material either. It is a simple 2d theory, which assumes that the planet is merely acting
as a ‘relay station’ for the disc’s viscous torques. If the flow structure within the Hill sphere
is of critical importance, then we should not expect giant planet migration to be as simple
as Section 2 suggests.
The accretion behaviour of the planet could also affect migration. This is directly linked
to the previous point about flow within the Hill sphere. Kley (1999) showed that even in the
presence of a gap, a planet could continue to accrete material from the disc.4 Similar results
were reported by Lubow et al. (1999) and Kley et al. (2001). In this paper, we did not allow
the planet to accrete, and this caused material to build up around the planet. Since we
attentuated the torque from within the Hill sphere, this would not have affected our results
directly. However, if accretion were allowed, then the planet could gain an appreciable
amount of mass. This would both alter the gap structure, and make it more difficult for the
disc to move the planet (due to the planet’s increased inertia). Related to this issue is the
recent finding of Lubow and D’Angelo (2006) that the accretion rate through the gap could
be over 10% of the viscous accretion rate in the main disc, despite the drop in gas density.
Finally, there is the matter of viscosity. In our numerical experiments, we used a physi-
cal viscosity in the Navier-Stokes equations. In reality, the ‘viscosity’ in protoplanetary discs
probably originates fromMHD turbulence, and calculations have shown (Winters et al. 2003)
that the gap structure obtained in an MHD calculation differs from that in a purely hydro-
dynamic one. In particular, the gaps tend to be wider and shallower. If material in the
3However, they did not perform a parameter space search like we have done here
4Note that this finding in itself implicitly contradicted the usual assumption that the planet isolates the
inner and outer discs
– 10 –
corotation region is important to determining the migration rate, then this alteration in
gap structure will cause further changes to the migration rates. Nelson (2005) has already
demonstrated that a magnetic turbulence strongly affects the migration of a low mass planet.
Although our computations do not include MHD turbulence, the theory of Type II migration
neglects it too, so this cannot be the reason for the differences we have observed.
When might we expect migration to proceed according to equation 3? We believe this
might be possible for a planet of moderate mass, in a cold, very low viscosity disc, which
is more massive than the planet. Our reasoning is as follows: equation 3 is based on the
assumption that the planet completely isolates the inner and outer discs. This is easiest
to achieve in a very low viscosity disc (cf equation 1), which is also cold (cf equation 2).
We also require the disc to be more massive than the planet, to ensure sufficient angular
momentum reserves are available. The gap will lie roughly between the m = 2 Lindblad
resonances, and we would want these particular resonances to be responsible for most of the
gap clearing (i.e. the m = 2 resonances themselves must dominate the disc’s viscous torque).
This is because we would need disc material to be kept well away from the corotation region
of the planet. Masset and Papaloizou (2003) showed that corotation torques can give rise to
extremely rapid migration - known as ‘runaway’ or Type III migration. A planet less massive
than Jupiter will have its corotation region inside its m = 2 Lindblad resonances. However,
Masset and Papaloizou found that such planets tended to undergo runaway migration. This
reinforces the need for the disc itself to have a very low viscosity, so that the gap is as clean
as possible.
6.1. Origin of the Torque
We shall now discuss the radial origin of the torque.
In figure 5, we show the torque profiles, Tz(r) acting on the planet after 100, 500 and
1000 orbits. The disc viscosity was ν = 10−5, and the surface density profile was initially flat
(δ = 0). The planet was held on a fixed orbit for the entire calculation. In computing the
torques, the same exponential cut off used with Fargo was applied. We see that most of
the torque is generated within the range 0.8 < r < 1.2, and that the torques from the inner
and outer discs have opposite signs. At later times, the torque peaks on either side of the
planet lessen. This is due to the gap emptying further. The outer peak (which is pushing
the planet inwards) also broadens, while the inner peak does not. This ultimately ensures
that the planet migrates inwards.
Figure 6 shows how the torque felt by the planet scales with q
∝ Σ0. These curves
– 11 –
are plotted for a δ = 0, ν = 10−5 disc after 1000 orbits. As we might expect from section 5,
we see that the strength of the torques is directly proportional to the value of q
. This
leads to the migration rate varying strongly with the surface density of the disc.
Finally, in figure 7, we show the effect of varying the initial disc power law, δ, on the
torque profiles. Again, the torque profiles are for a ν = 10−5 disc, and are plotted after 1000
orbits of the (fixed) planet. We see that the torques are very similar, regardless of the initial
δ value, indicating that the perturbations induced by the planet are not dependent on the
background structure of the disc. Again, this is expected from section 5.
Figures 5, 6 and 7 all show that the torque generation peaks at radii of r = 0.9 and r =
1.1 (roughly 1.5 Hill radii from the planet). Comparing to figure 1, we see that these locations
lie deep within the gap, which is interesting for a number of reasons. The peaks are close to
the cutoff radius generally applied to obtain the numerical factor in equation 1, namely the
planet’s Hill sphere. They are also well within the corotation region, raising the possibility
that corotation torques are affecting the orbital evolution of the planet. Unfortunately, at
this resolution, the Hill radius is only covered by four or five grid cells, and the torque peak
only lies seven grid cells from the planet itself. The smoothing lengths are also comparable
to these distances.
Our resolution tests (Appendix B) show that our resolution is adequate for the smooth-
ing lengths used. However, with so much torque being generated close to the planet, it is
likely that the smoothing is significantly affecting the torque. Reduction of the smoothing
lengths is obviously desirable, but unfortunately not possible in a two dimensional calcula-
tion. As noted in section 3, we must smooth the planet’s gravity at about the local scale
height of the disc in order to make a 2d calculation valid. In Appendix C, we show the
effect of reducing the exclusion radius for the calculation of the migration torque. The effect
appears to be minimal, but these results should be treated with some caution, due to the
issue noted above. With the structure of the flow close to the planet so obviously critical to
determining the migration rate, a full 3d calculation would be required to determine a robust
migration rate. However, we would be most surprised if such calculations undermined our
main conclusion that migration rates of massive planets are proportional to the disc mass.
7. Conclusion
In this paper, we have performed a systematic test of giant (Jupiter mass) planet migra-
tion. The migration rates we obtained varied strongly with the initial disc surface density,
and less strongly with the disc viscosity. We have shown that the simplest theory of Type
– 12 –
II migration, where the planet is locked to the viscous evolution of the disc (Equation 3), is
incorrect. An alternative formulation, based on an angular momentum balance (Equation 4),
looks more promising. However, we have not tested this second theory fully. We verified
that our results were not simply ‘turn-on’ transients, or purely the effect of low resolution.
Neither doubling the grid resolution, nor allowing the planet to clear its gap for 1000 orbits
affected our central finding.
Separate confirmation of our results, using a different code would be highly desir-
able. Although we have no reason to believe that Fargo is misbehaving, the work of
de Val-Borro et al. (2006) underlines how codes can give varying results, even for the ‘same’
physical scenario (it is for this reason that ‘simulations’ should properly be referred to as
‘numerical experiments’). The issues of accretion and gravitational softening (both of the
planet’s effect on the disc, and the torque exerted on the planet) also merit closer consider-
ation. Indeed, if the gap shape and flow through the gap are critical for migration, one is
led to wonder if 2d calculations are sufficient. Two dimensional calculations have generally
been thought adequate for Jovian mass planets because the gap would keep material away
from the planet (where the 3d nature of the flow will become evident). If the flow within
the Hill sphere is important, then 2d calculations cease to be convincing.
A. Effect of Varying Release Time
In this Appendix, we shall demonstrate that varying the time at which the planet is
released does not affect our central conclusion. We start by showing that changing the
release time only affects the migration rates slightly. We then show that, since the effect is
consistent for all discs used, the conclusion that the planet migration rate varies with disc
surface density is robust.
In Figure 8, we show the effect of varying the release time on a planet in a high viscosity
(ν = 10−4) disc with q
= 0.002 and δ = 0.5. The effect is fairly minimal, and this plot
is typical. The explanation for this lies in Figure 2, which shows that only a minimal gap is
formed. Indeed, one can debate whether the surface density depression is a gap, since the
usual criterion (Equation 1) is not satisfied.
Figure 9 shows the effect of release time on the orbital migration of a planet in the
ν = 10−5 case. The particular disc used in this comparison had q
= 0.002 and δ = 0,
but the behaviour was generically the same for all cases. We see that the time at which
the planet is released does have an effect on the migration rate. However, the effect is not
especially dramatic.
– 13 –
We demonstrate that the release time does not affect our central conclusions in Figure 10.
This duplicates Figure 4, but with the release time increased to 1000 orbits (note that the x
axis has the zero point shifted, to improve the use of space). Although the exact migration
rates undoubtedly change, the main conclusion that these planets are not undergoing Type
II migration is unaffected. The migration rates continue to be affected by the disc surface
density.
B. The Effect of Resolution
In this appendix, we shall study the effect of increasing the grid resolution on our results.
We re-ran four of our numerical experiments, but with the grid resolution doubled. We
picked the set of four runs with ν = 10−5, and δ = 0.5. In Figure 11 we plot the results,
compared to the top right panel of Figure 4. In this plot, we can see that doubling the grid
resolution has a minimal effect on the migration rates obtained.
From this, we see that our conclusions are not simply an artifact of low resolution. Of
course, the smoothing we have used could be hiding some effects, but we would not expect
decreasing the smoothing length to change the planet migration to Type II behaviour. The
smoothing length, at 0.6h is already rather smaller than the gap width, so shrinking it further
is unlikely to enable the planet to make the gap cleaner. Furthermore, this smoothing length
is already as small as we can realistically make it. The flow close to the planet will really
have a 3d structure, not resolved in these calculations. By forcing all material to lie in the
disc plane (as required by a 2d calculation), we effectively make it closer to the planet –
significantly so within the gap. By softening the potential over a distance comparable to the
scale height, we approximate the true 3d strength of the planet’s gravity.
C. The Effect of Hill Sphere Exclusion
By excluding material within the planet’s Hill sphere when computing the migration
torque, we potentially reduced the migration torque substantially. Although this has a sound
physical motivation (cf section 3), it does potentially affect the migration rate of the planet.
Figure 12 shows that this has negligible effect on our results. This shows the migration
of eight planets, embedded in ν = 10−5, δ = 0 discs. The four standard q
values were
considered, and the planets were held for 1000 orbits before being released to migrate. The
only difference between each pair of curves is whether the exclusion radius for the torque
calculation was a full Hill radius, for the solid lines, or half the Hill radius, for the dotted
– 14 –
lines. The solid lines in figure 12 are equivalent to the δ = 0 (top left) panel of figure 4,
up to the difference in release time. In every case, the migration of each pair of planets is
almost identical.
In figure 13, we show the effect of the exclusion radius reduction on the torque profiles.
This figure should be directly compared to figure 5. We can see that the torque close to
the planet is increased, particularly for the first curve, plotted after 100 orbits. There is
also a stronger peak close to the planet for all the curves. Otherwise, the torque profiles are
remarkably similar to figure 5. This is as expected, given the results shown in figure 12.
The author acknowledges support from NSF grants AST-0406799, AST-0098442, AST-
0406823, and NASA grants ATP04-0000-0016 and NNG04GM12G (issued through the Ori-
gins of Solar Systems Program). I would like thank Frederic Masset for use of the Fargo
code. I am also very grateful to Eric Blackman and Alice Quillen, for reading early drafts
of this manuscript. Some of the computations presented here used the resources of HPC2N,
Ume̊a
REFERENCES
Bryden, G., Chen, X., Lin, D. N. C., Nelson, R. P., and Papaloizou, J. C. B.: 1999, ApJ
514, 344
D’Angelo, G., Kley, W., and Henning, T.: 2003, ApJ 586, 540
de Val-Borro, M., Edgar, R. G., Artymowicz, P., Ciecielag, P., Cresswell, P., D’Angelo, G.,
Delgado-Donate, E. J., Dirksen, G., Fromang, S., Gawryszczak, A., Klahr, H., Kley,
W., Lyra, W., Masset, F., Mellema, G., Nelson, R. P., Paardekooper, S.-J., Peplinski,
A., Pierens, A., Plewa, T., Rice, K., Schäfer, C., and Speith, R.: 2006, MNRAS 370,
Goldreich, P. and Tremaine, S.: 1978, ApJ 222, 850
Goldreich, P. and Tremaine, S.: 1979, ApJ 233, 857
Goldreich, P. and Tremaine, S.: 1980, ApJ 241, 425
Ida, S. and Lin, D. N. C.: 2004, ApJ 604, 388
Ivanov, P. B., Papaloizou, J. C. B., and Polnarev, A. G.: 1999, MNRAS 307, 79
Kley, W.: 1999, MNRAS 303, 696
– 15 –
Kley, W., D’Angelo, G., and Henning, T.: 2001, ApJ 547, 457
Lin, D. N. C. and Papaloizou, J. C. B.: 1993, in E. H. Levy and J. I. Lunine (eds.), Protostars
and Planets III, pp 749–835
Lin, D. N. C., Papaloizou, J. C. B., Terquem, C., Bryden, G., and Ida, S.: 2000, Protostars
and Planets IV p. 1111
Lubow, S. H. and D’Angelo, G.: 2006, ApJ 641, 526
Lubow, S. H., Seibert, M., and Artymowicz, P.: 1999, ApJ 526, 1001
Marcy, G., Butler, R. P., Fischer, D., Vogt, S., Wright, J. T., Tinney, C. G., and Jones,
H. R. A.: 2005, Progress of Theoretical Physics Supplement 158, 24
Masset, F.: 2000a, A&AS 141, 165
Masset, F. S.: 2000b, in ASP Conf. Ser. 219: Disks, Planetesimals, and Planets, p. 75
Masset, F. S. and Papaloizou, J. C. B.: 2003, ApJ 588, 494
Nelson, R. P.: 2005, A&A 443, 1067
Nelson, R. P., Papaloizou, J. C. B., Masset, F., and Kley, W.: 2000, MNRAS 318, 18
Pringle, J. E.: 1981, ARA&A 19, 137
Schäfer, C., Speith, R., Hipp, M., and Kley, W.: 2004, A&A 418, 325
Stone, J. M. and Norman, M. L.: 1992, ApJS 80, 753
Syer, D. and Clarke, C. J.: 1995, MNRAS 277, 758
Ward, W. R. and Hahn, J. M.: 2000, Protostars and Planets IV p. 1135
Winters, W. F., Balbus, S. A., and Hawley, J. F.: 2003, ApJ 589, 543
This preprint was prepared with the AAS LATEX macros v5.2.
– 16 –
Fig. 1.— Development of the gap for a Jupiter-mass planet on a fixed circular orbit, em-
bedded in a ν = 10−5 disc, with δ = 0.5 (cf equation 5). The azimuthally averaged surface
density profile is show after 10, 100 and 1000 orbits. Note that the y-axis is logarithmic
– 17 –
Fig. 2.— Development of the gap for a Jupiter-mass planet on a fixed circular orbit, em-
bedded in a ν = 10−4 disc, with δ = 0.5 (cf equation 5). The azimuthally averaged surface
density profile is show after 10, 100 and 1000 orbits. For ease of comparison, the y-axis is
identical to that in Figure 1
– 18 –
Fig. 3.— Planetary migration in the ν = 10−4 case for disc power laws of δ = 0 (top left), 0.5
(top right), 1.0 (bottom left) and 1.5 (bottom right). The planets were released to migrate
after 100 orbits. The lines are marked by the value of q
(see Equation 6)
– 19 –
Fig. 4.— Planetary migration in the ν = 10−5 case for disc power laws of δ = 0 (top left), 0.5
(top right), 1.0 (bottom left) and 1.5 (bottom right). The planets were released to migrate
after 100 orbits. The lines are marked by the value of q
(see Equation 6)
– 20 –
Fig. 5.— Radial torque profile for a planet in a ν = 10−5 disc as a function of time. The
disc initially had a flat (δ = 0) surface density profile, and the planet was on a fixed orbit
for the entire time
– 21 –
Fig. 6.— Radial torque profiles for a planet in ν = 10−5 discs of differing surface density
after 1000 orbits. The disc had a δ = 0 initial surface density profile, and each line is marked
with its q
value. The planet’s orbit was fixed
– 22 –
Fig. 7.— Radial torque profiles for a planet in ν = 10−5 discs of differing δ, plotted after
1000 orbits. All discs had the same q
value, and the planet was held on a fixed orbit
– 23 –
Fig. 8.— The effect of release time on the ν = 10−4 case. The orbital evolution of a planet
in a sample disc is shown for three different release times
– 24 –
Fig. 9.— The effect of release time on the ν = 10−5 case. The orbital evolution of a planet
in a sample disc is shown for three different release times
– 25 –
Fig. 10.— Planetary migration in the ν = 10−5 case for planets released after 1000 orbits.
The disc power laws are δ = 0 (top left), 0.5 (top right), 1.0 (bottom left) and 1.5 (bottom
right). The lines are marked by the value of q
(see Equation 6). This plot should be
compared to Figure 4
– 26 –
Fig. 11.— Resolution test for a ν = 10−5 disc with δ = 0.5, where the planet is released
after 100 orbits. Four different q
values are compared (refer to the top right panel of
Figure 4) at two grid resolutions
– 27 –
Fig. 12.— Effect of reducing the exclusion radius to half the Hill radius when computing the
migration torque on the planet. This figure is based on the δ = 0 (top left) panel of Figure 4,
except that the planets were released after 1000 orbits. The standard four q
migrations
are plotted. The solid lines correspond to those in Figure 4 (up to the difference in release
time), while the dotted lines trace planets where the torque exclusion was only half the Hill
radius
– 28 –
Fig. 13.— Effect on the torque profiles of reducing exclusion radius to half the Hill radius.
This figure should be compared to figure 5
	Introduction
	Type II Migration
	Numerical Set up
	Development of the Gap
	Results
	High Viscosity
	Medium Viscosity
	Summary of results
	Discussion
	Origin of the Torque
	Conclusion
	Effect of Varying Release Time
	The Effect of Resolution
	The Effect of Hill Sphere Exclusion
ABSTRACT
  Many extra-solar planets discovered over the past decade are gas giants in
tight orbits around their host stars. Due to the difficulties of forming these
`hot Jupiters' in situ, they are generally assumed to have migrated to their
present orbits through interactions with their nascent discs. In this paper, we
present a systematic study of giant planet migration in power law discs. We
find that the planetary migration rate is proportional to the disc surface
density. This is inconsistent with the assumption that the migration rate is
simply the viscous drift speed of the disc. However, this result can be
obtained by balancing the angular momentum of the planet with the viscous
torque in the disc. We have verified that this result is not affected by
adjusting the resolution of the grid, the smoothing length used, or the time at
which the planet is released to migrate.

<|endoftext|><|startoftext|>
Introduction 1
2 Calabi-Yau Threefolds 5
2.1 The Calabi-Yau Threefold X . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 The Intermediate Quotient X . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3 Toric Geometry and Mirror Symmetry 7
3.1 Toric Varieties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2 The Batyrev-Borisov Construction . . . . . . . . . . . . . . . . . . . . . 10
3.3 Toric Intersection Ring . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.4 Mori Cone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.5 B-Model Prepotential . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4 Quotienting the B-Model 20
4.1 The Quotient by G1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.2 The Quotient by G2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.3 B-Model on X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.4 Instanton Numbers of X . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.5 B-Model on X
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.6 Instanton Numbers of X∗ . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.7 Instanton Numbers Assuming The Self-Mirror Property . . . . . . . . . 33
5 The Self-Mirror Property 34
6 Factorization vs. The (3,1,0,0,0) Curve 38
7 Towards a Closed Formula 41
8 Conclusion 43
A Triangulation of ∇̄∗ and ∇∗ 44
B The Flop of X∗ 50
Bibliography 53
1 Introduction
Counting world sheet instantons (that is, holomorphic curves) on a Calabi-Yau three-
fold has had a large number of applications in mathematics and physics, ever since it
was essentially solved by mirror symmetry several years ago [2]. The purpose of this
paper is to take into account an important subtlety that does not appear in very sim-
ple Calabi-Yau manifolds like hypersurfaces in smooth toric varieties. This subtlety
is the appearance of torsion curve classes. That is, the homology1 group
= Z3 ⊕ Z3 ⊕ Z3 (1)
contains the torsion2 subgroup Z3⊕Z3. Here, the manifold of interest X is a quotient
of one of Schoen’s Calabi-Yau manifolds [3, 4] by a freely acting symmetry group.
There are already a few known examples of such Calabi-Yau manifolds with torsion
curves [5, 6, 7, 8, 9], but the proper instanton counting has never been done before.
The prime motivation for studying these curves is that one would like to compute
the superpotential for the vector bundle moduli [10, 11, 12, 13, 14, 15, 16] in a heterotic
MSSM [17, 18, 19, 20, 21, 22, 23, 24, 25]. Our main result will be that there exist
smooth rigid rational curves in X that are alone in their homology class. This proves
that, in general, no cancellation between contributions to the superpotential W from
instantons in the same homology class can occur.
Therefore we would like to count rational curves on X . In physical terms, we need
to find the instanton correction F
X,0 to the genus zero prepotential of the (A-model)
topological string on X . This is usually written as a (convergent) power series in
h11 variables qa = e
2πita . Each summand is the contribution of an instanton, and
the (integer) coefficients are the multiplicities of instantons in each homology class.
According to [26, 27, 1] the novel feature of the 3-torsion curves on X is that for each
3-torsion generator we need an additional variable bj such that b
j = 1. The Fourier
series of the prepotential on X becomes
X,0(p, q, r, b1, b2) =
n1,n2,n3∈Z
m1,m2∈Z3
n(n1,n2,n3,m1,m2) Li3
pn1qn2rn3bm11 b
, (2)
where n(n1,n2,n3,m1,m2) is the instanton number in the curve class (n1, n2, n3, m1, m2).
For the purpose of computing the prepotential, we can either use directly the A-
model or start with the B-model and apply mirror symmetry. The A-model calculation
was carried out in the companion paper [1], entitled Part A. The results were:
• A set of powerful techniques to compute the torsion subgroups in the integral
homology and cohomology groups of X . They are spectral sequences starting
with the so-called group (co)homology of the group action on the universal cover
1In the following, Z3
= Z/3Z always denotes the integers mod 3. Similarly, we write (Z3)
⊕nZ3 = Z3 ⊕ · · · ⊕ Z3 for the Abelian group generated by n generators of order 3.
2Not to be confused with the torsion tensor of a connection.
• A closed formula for the genus zero prepotential
X,0(p, q, r, b1, b2) =
i,j=0
pbi1b
P (q)4P (r)4+O(p2) =
i,j=0
Li3(pb
2)+· · · (3)
to linear order in p, extending the one computed in [28] for the universal cover X̃ .
Here, if p(k) is the number of partitions of k ∈ Z≥, then P (q) is the generating
function for partitions,
P (q)
p(i)qi =
ln q)
. (4)
• Expanding eq. (3) as an instanton series we find that the number of rational
curves of degree (1, 0, 0, m1, m2) is:
n(1,0,0,m1,m2) = 1, ∀ m1, m2 ∈ Z3. (5)
Furthermore, these curves have normal bundle OP1(−1)⊕OP1(−1). Hence, there
are indeed 9 smooth rigid rational curves which are alone in their homology class.
Alternatively, one can start with the B-model topological string and apply mirror
symmetry, which is what we will do in this paper, entitled Part B. This will allow
us to obtain the higher order terms in p. The order in p up to which one wants to
compute the instanton numbers is only limited by computer power. We will again
find a closed formula at every order in p, however, this time by guessing it from the
instanton calculation, and hence only up to the order given by this limitation. The
way to arrive at this result is as follows:
• The universal cover X̃ admits a simple realization as a complete intersection in
a toric variety. In this situation, mirror symmetry boils down to an algorithm to
compute instanton numbers. Unfortunately, there are many non-toric divisors
which cannot be treated this way. It turns out that, after descending to X ,
precisely the torsion information is lost. In this approach one can only compute
X,0(q1, q2, q3, 1, 1).
• As a pleasant surprise we find strong evidence that the manifold X is self-mirror.
In particular, we attempt to compute the instanton numbers on the mirror X∗
by descending from the covering space X̃∗. The embedding of X̃∗ into a toric
variety is such that all 19 divisors are toric. In principle, this allows for a
complete analysis including the full Z3 ⊕Z3 torsion information, but this is too
demanding in view of current computer power.
• Although the full quotient X = X̃/(Z3 × Z3) is not toric, it turns out that a
certain partial quotient X = X̃/Z3 can be realized as a complete intersection
in a toric variety. That way, one only has to deal with h11(X) = 7 parameters,
which is manageable on a desktop computer. Assuming the self-mirror property,
we work with the mirror X
, for which again all divisors are toric, and we can
compute the expansion of F
X,0(p, q, r, 1, b2) to any desired degree. A symmetry
argument then allows one to recover the b1 dependence as well. Finally, we can
extract the instanton numbers n(n1,n2,n3,m1,m2) including the torsion information.
• As can be seen from the A-model result eq. (3), we observe that the prepotential
X,0 at order p factors into
i,j=0 b
2 times a function of p, q, r only. This
means that the instanton number does not depend on the torsion part of its
homology class. We will explain the underlying reason for this factorization
and show that it breaks down at order p3. This fits nicely with the B-model
computation at order p3, where the instanton numbers do depend on the torsion
part.
• Another consequence of the self-mirror property is that X is a non-toric example
for the conjecture of [6]. By this conjecture, certain torsion subgroups of the
integral homology groups are exchanged under mirror symmetry.
An easily readable overview and a discussion of the physical consequences of our
findings for superpotentials and moduli stabilization of heterotic models was pre-
sented in [27]. The present Part B is self-contained and can be read independently of
Part A [1]. All necessary results from Part A are reproduced in this part.
As a guide through this paper, we start in Section 2 with a brief overview of the
topology of the various spaces involved as determined in Part A [1]. This is followed by
a review of the Batyrev-Borisov construction of mirror pairs of complete intersections
in toric varieties in Section 3. We illustrate this construction by means of the covering
spaces X̃ and X̃∗ of our example. The review includes the techniques to compute the
B-model prepotential and the mirror map. These are applied in Section 4 to the partial
quotients X and X
yielding the main results stated above. This assumes that X as
well as X are self-mirror, and evidence for this property is recapitulated in Section 5.
Moreover, we show how the torsion subgroups are exchanged. Section 6 contains an
explanation for the breakdown of the factorization alluded to above. Putting all the
information together we try to guess a closed form for the prepotential in Section 7.
Finally, we present our conclusions in Section 8. In the course of this work we will
notice that a certain flop of X is very natural from the toric point of view, and we
will present it in Appendix B.
2 Calabi-Yau Threefolds
2.1 The Calabi-Yau Threefold X
The Calabi-Yau manifold X of interest is constructed as a free G
= Z3 × Z3 quo-
tient of its universal covering space X̃ . The latter is one of Schoen’s Calabi-Yau
threefolds [3]. It is simply connected and hence easier to study. Among its various
descriptions are the fiber product of two dP9 surfaces, a resolution of a certain T
orbifold [29], or a complete intersection in a toric variety. In the present Part B,
we will mostly use the latter viewpoint. The simplest way is to introduce the toric
ambient variety P2×P1×P2 with homogeneous coordinates
[x0 : x1 : x2], [t0 : t1], [y0 : y1 : y2]
∈ P2×P1×P2 . (6)
The embedded Calabi-Yau threefold X̃ is then obtained as the complete intersection
of a degree (0, 1, 3) and a degree (3, 1, 0) hypersurface in P2×P1×P2. We restrict
the coefficients of their defining equations Fi = 0 to a particular set of three complex
parameters λ1, λ2, λ3, such that the polynomials Fi read
x30 + x
1 + x
x0x1x2
= F1 (7a)
λ1t0 + t1
y30 + y
1 + y
λ2t0 + λ3t1
y0y1y2
= F2. (7b)
For the special complex structure parametrized by λ1, λ2, λ3 the complete intersection
is invariant under the G = Z3 × Z3 action generated by (ζ
[x0 : x1 : x2] 7→ [x0 : ζx1 : ζ
[t0 : t1] 7→ [t0 : t1] (no action)
[y0 : y1 : y2] 7→ [y0 : ζy1 : ζ
[x0 : x1 : x2] 7→ [x1 : x2 : x0]
[t0 : t1] 7→ [t0 : t1] (no action)
[y0 : y1 : y2] 7→ [y1 : y2 : y0]
One can show that the fixed points of this group action in P2 ×P1 ×P2 do not satisfy
eqns. (7a) and (7b), hence the action on X̃ is free.
2.2 The Intermediate Quotient X
The partial quotient
G1 (9)
will be of particular interest in this paper because this quotient is generated by phase
symmetries, see eq. (8a), and hence is toric. In particular, we will need a basis of
Kähler classes. As usual, we will not distinguish degree-2 cohomology and degree-4
homology classes but identify them via Poincaré duality. Part A [1] ?? shows that3
= H2(X̃,Z)G1 ⊕ Z3 = spanZ
φ, τ1, υ1, ψ1, τ2, υ2, ψ2
⊕ Z3. (10)
Hence, by abuse of notation, we can identify the free generators on X with the G1-
invariant generators on X̃ , see Part A eq. (??), via the pull back by the quotient map.
The triple intersection numbers on X = X̃/Z3 are one-third of the corresponding
intersection numbers on X̃ listed in Part A eq. (??). Hence, the intersection numbers
on X are
φτ1τ2 = 3 φτ1υ2 = 3 φτ1ψ2 = 6 φυ1τ2 = 3 φυ1υ2 = 3
φυ1ψ2 = 6 φψ1τ2 = 6 φψ1υ2 = 6 φψ1ψ2 = 12 τ
1 τ2 = 1
τ 21υ2 = 1 τ
1ψ2 = 2 τ1υ1τ2 = 3 τ1υ1υ2 = 3 τ1υ1ψ2 = 6
τ1ψ1τ2 = 3 τ1ψ1υ2 = 3 τ1ψ1ψ2 = 6 τ1τ
2 = 1 τ1τ2υ2 = 3
τ1τ2ψ2 = 3 τ1υ
2 = 3 τ1υ2ψ2 = 6 τ1ψ
2 = 6 υ
1τ2 = 3
υ21υ2 = 3 υ
1ψ2 = 6 υ1ψ1τ2 = 6 υ1ψ1υ2 = 6 υ1ψ1ψ2 = 12
2 = 1 υ1τ2υ2 = 3 υ1τ2ψ2 = 3 υ1υ
2 = 3 υ1υ2ψ2 = 6
2 = 6 ψ
1τ2 = 6 ψ
1υ2 = 6 ψ
1ψ2 = 12 ψ1τ
2 = 2
ψ1τ2υ2 = 6 ψ1τ2ψ2 = 6 ψ1υ
2 = 6 ψ1υ2ψ2 = 12 ψ1ψ
2 = 12.
Clearly, G2 acts on the partial quotient X . From Part A eq. (??) it follows that,
of the 7 non-torsion divisors above, only 3 are G2-invariant. This invariant part is
particularly manageable and will be important in the following. We find
= span
φ, τ1, τ2
with products 3τ 2i = τiφ. In particular, the triple intersection numbers on X are
τ 21 τ2 = 1, τ1φτ2 = 3, τ1τ
2 = 1, (13)
and 0 otherwise. Finally, the second Chern class of X is c2(X) = 12(τ
1 + τ
2 ). There-
fore,
· τ1 = 12, c2
· φ = 0, c2
· τ2 = 12. (14)
2.3 Variables
As we discussed in Part A ??, the instanton-generated superpotential should be
thought of as a series with one variable for each generator in H2. In particular, we
3The torsion in H2 are just the Wilson lines, that is, first Chern classes of flat line bundles. They
will play no role in the following. The torsion curves in H2, on the other hand, are the focus of this
paper.
Calabi-Yau
threefold
) Free
generators
Torsion
generators
X̃ Z19
p0, q0, . . . , q8, r0, . . . , r8
X = X̃/G1 Z
7 ⊕ Z3
P,Q1, Q2, Q3, R1, R2, R3
X = X̃/G Z3 ⊕ Z3 ⊕ Z3
p, q, r
b1, b2
Table 1: The different Calabi-Yau threefolds, curve classes, and variables used
to expand the prepotential.
will be interested in the Calabi-Yau threefolds X̃ , X , and X . For these, the degree-2
integral homology and the variables used (see Part A [1] for precise definitions) are
in summarized Table 1. Pushing down the curves by the respective quotients lets us
express the prepotential on the quotient in terms of the prepotential on the covering
space. We found in Part A that
P,Q1, Q2, Q3, R1, R2, R3, b1)
PQ51Q
3 b1, Q
2 , Q
3b1, Q3, 1, b1, Q1Q
2 , R
1, R3, 1, b
1, R1R
p, q, r, b1, b2) =
p, q, b2, b2, r, b
2, b1
. (16)
3 Toric Geometry and Mirror Symmetry
In this section we review mirror symmetry and the construction of the B-model for
the mirror of the covering space X̃ . Since X̃ is a complete intersection in a toric
variety, we can use the standard constructions. Because we expect the model to
be self-mirror, we will analyze the B-model for X̃ and its mirror X̃∗. The toric
geometry for X̃ is much simpler4 than for X̃∗, but contains less information. In this
section we will start with the simpler model in order to review the Batyrev-Borisov
construction for the mirror of a complete intersection in a toric variety. Then we will
apply this construction to the more complicated model, now without going into too
many details. We will see that, on the simpler side, not all parameters are toric and
no torsion is visible. However, on the more complicated mirror side, all parameters
are toric which will allow us, in principle, to perform the B-model computation of
4Meaning that X̃ is a complete intersection in the very simple toric variety P2 ×P1 ×P2, whereas
X̃∗ is embedded in a complicated toric ambient variety.
the complete prepotential. As X̃ ∼= X̃∗ is expected to be self-mirror, this determines
the complete prepotential F
eX∗,0
as well. In practice, however, the analysis is
computationally too involved.
Fortunately, the space X = X̃/G1 and its mirror will turn out to be both tractable
with toric methods and sufficiently informative. This quotient will be the subject
of Section 4. Finally, this is also the starting point for arguing in Section 5 that the
self-mirror property persists at the level of instanton corrections.
Recall that, in Subsection 2.1 we defined our Calabi-Yau manifold as the complete
intersection
F1 = 0, F2 = 0
⊂ P2×P1×P2 (17)
with the two polynomials F1, F2 as in eqns. (7a) and (7b), respectively. In order to
construct the mirror manifold following Batyrev and Borisov, we need to reformulate
this definition in terms of toric geometry. We review here some essential ingredients
of toric geometry, for details we refer to [30, 31] and references therein. We will give
the abstract definitions and concepts step by step, and at each step illustrate them
with the example X̃ and its mirror manifold X̃∗.
3.1 Toric Varieties
Given a lattice N of dimension d, a toric variety VΣ is defined in terms of a fan Σ
which is a collection of rational polyhedral5 cones σ ⊂ N such that it contains all
faces and intersections of its elements. VΣ is compact if the support of Σ covers all
of the real extension NR of the lattice N . The resulting d-dimensional variety VΣ is
smooth if all cones are simplicial and if all maximal cones are generated by a lattice
basis.
Let Σ(1) denote the set of one-dimensional cones (rays) with primitive generators ρi,
i = 1, . . . , n. The simplest description of VΣ introduces n homogeneous coordinates zi
corresponding to the generators ρi of the rays in Σ
(1). These homogeneous coordinates
are then subjected to weighted projective identifications
z1 : · · · : zn
1 z1 : · · · : λ
a = 1, . . . , h (18)
for any nonzero complex number λ ∈ C×, where the integer n-vectors q
i are genera-
tors of the linear relations
i ρi = 0 among the primitive lattice vectors
6 ρi. In or-
der to obtain a well-behaved quotient, we must exclude an exceptional set Z(Σ) ⊂ Cn
5Here, the tip of the cone is always the origin of N . A cone is rational if it is spanned by rays
which pass through lattice points (other than the origin), that is, have rational slopes. A cone is
polyhedral if it is the cone over an (d − 1)-dimensional polytope. In other words, curved faces are
not allowed.
6We will use the same symbol ρ∗ to denote the generators in Σ
(1) and the corresponding primitive
lattice vectors in N .
that is defined in terms of the fan, as will be explained below. Hence, the quotient is
Cn − Z(Σ)
, (19)
where Γ ≃ N/ span{ρi} is a finite abelian group. There are h = n−d independent C
identifications, therefore the complex dimension of VΣ equals the rank d of the lattice
N . The identifications by Γ are only non-trivial if the ρi do not span the lattice N .
Refinements of the lattice N with fixed ρi can hence be used to construct quotients
of toric varieties VΣ by discrete phase symmetries such as Z3. Such quotients will be
discussed in Section 4. Note that the rays ρi are in 1–to–1 correspondence with the
(C×)-invariant divisors Di on VΣ, which are defined as
zi = 0
⊂ VΣ. (20)
Conversely, the homogeneous coordinate zi is a section of the line bundle O(Di).
For example, consider the simplest compact toric variety, the projective space Pd.
Its fan Σ = Σ(∆) is generated by the n = d+ 1 vectors
ρ1 = e1, ρ2 = e2, . . . , ρn−1 = ed, ρn = −
ei (21)
of a d-dimensional simplex ∆. They satisfy a single linear relation,
i=1 ρi = 0.
Therefore qi = 1 for all i, and the homogeneous coordinates in eq. (18) are the usual
homogeneous coordinates on Pd.
For products of toric varieties we simply extend the relations for any single factor
by zeros and take the union of them. Hence, the fan of the polyhedron ∆∗ describing
the 5-dimensional toric variety P2×P1×P2 in eq. (17) is generated by the n = 5+3 = 8
vectors
ρ1 = e1, ρ2 = e2, ρ3 = −e1 − e2, ρ4 = e3,
ρ5 = −e3, ρ6 = e4, ρ7 = e5, ρ8 = −e4 − e5
satisfying the linear relations
ρi = 0. (23)
Except for the origin, there are no other lattice points in the interior of ∆∗. The
corresponding homogeneous coordinates will be denoted by
z1 = x0, z2 = x1, z3 = x2,
z4 = t0, z5 = t1,
z6 = y0, z7 = y1, z8 = y2.
In more general situations, given a polytope ∆∗ ⊂ N we will denote the resulting toric
variety by P∆∗ = VΣ(∆∗).
3.2 The Batyrev-Borisov Construction
Batyrev showed that a generic section ofK−1
, the anticanonical bundle of P∆∗, defines
a Calabi-Yau hypersurface if ∆∗ is reflexive, which means, by definition, that ∆∗ and
its dual
x ∈MR
∣∣∣ (x, y) ≥ −1 ∀y ∈ ∆∗
are both lattice polytopes. Here, M = Hom(N,Z) is the lattice dual to N and MR is
its real extension. Mirror symmetry corresponds to the exchange of ∆ and ∆∗ [32].
The generalization of this construction to complete intersections of codimension r > 1
is due to Batyrev and Borisov [33, 34]. For that purpose, they introduced the notion
of a nef partition. Consider a dual pair of d-dimensional reflexive polytopes ∆ ⊂
∗ ⊂ NR. In that context, a partition E = E1 ∪ · · · ∪ Er of the set of vertices
of ∆∗ into disjoint subsets E1, . . . , Er is called a nef-partition if there exist r integral
upper convex Σ(∆∗)-piecewise linear support functions φl : NR → R, l = 1, . . . , r such
φl(ρ) =
1 if ρ ∈ El,
0 otherwise.
Each φl corresponds to a divisor
D0,l =
Dρ (27)
on P∆∗ , and their intersection
Y = D0,1 ∩ · · · ∩D0,r (28)
defines a family Y of Calabi-Yau complete intersections of codimension r. Moreover,
each φl corresponds to a lattice polyhedron ∆l defined as
x ∈MR
∣∣∣ (x, y) ≥ −φl(y) ∀y ∈ NR
. (29)
The lattice points m ∈ ∆l correspond to monomials
〈m,ρi〉
i ∈ Γ (P∆∗ ,O(D0,l)) . (30)
One can show that the sum of the functions φl is equal to the support function of K
and, therefore, the corresponding Minkowski sum is ∆1 + · · · + ∆r = ∆. Moreover,
the knowledge of the decomposition E = E1 ∪ · · · ∪Er is equivalent to that of the set
of supporting polyhedra Π(∆) = {∆1, . . . ,∆r}, and therefore this data is often also
called a nef partition.
It can be shown that given a nef partition Π(∆) the polytopes7
{0} ∪ El
⊂ NR (31)
define again a nef partition Π∗(∇) = {∇1, . . . ,∇r} such that the Minkowski sum
∇ = ∇1 + · · · + ∇r is a reflexive polytope. This is the combinatorial manifestation
of mirror symmetry in terms of dual pairs of nef partitions of ∆∗ and ∇∗, which we
summarize in the diagram
∆ = ∆1 + . . .+∆r ∆
∇1, . . . ,∇r
Mirror
Symmetry
uujjj
MR NR
∆1, . . . ,∆r
(∆l,∇l′) ≥ −δl l′ ∇ = ∇1 + . . .+∇r
. (32)
In the horizontal direction, we have the duality between the lattices M and N and
mirror symmetry goes from the upper right to the lower left. The other diagonal has
also a meaning in terms of mirror symmetry as we will explain below. The complete
intersections Y ⊂ P∆∗ and Y
∗ ⊂ P∇∗ associated to the dual nef partitions are then
mirror Calabi-Yau varieties.
Let us now apply the Batyrev-Borisov construction to the complete intersection
eq. (17), hence r = 2. There exist several nef-partitions of ∆∗. The one which has
the correct degrees (3, 1, 0) and (0, 1, 3) is, up to exchange of t0 and t1, E1 = {ρi |i =
1, . . . , 4} and E2 = {ρi |i = 5, . . . , 8}. Adding the origin and taking the convex hull
yields the polytopes
ρ1, . . . , ρ4, 0
, ∇2 =
ρ5, . . . , ρ8, 0
, (33)
where the ρi are defined in eq. (22). The two divisors cutting out the Calabi-Yau
threefold are, according to eq. (27),
D0,1 =
Di, D0,2 =
Di ⇒ X̃ = D0,1 ∩D0,2 ⊂ P∆∗ (34)
Note that, while ∆∗ has no further lattice points, its dual ∆ has 18 vertices and 300
lattice points. Using the computer package PALP [35], we determine the associated
polytopes ∆1 and ∆2 of the global sections of O(D0,1) and O(D0,2), respectively. In
an appropriate lattice basis there is, up to symmetry, a unique nef partition consisting
ν1, . . . , ν6, 0
, ∆2 =
ν7, . . . , ν12, 0
, (35)
7The brackets
· · ·
denote the convex hull.
where
ν1 = 2e1 − e2, ν2 = −e1 + 2e2, ν3 = −e1 − e2,
ν4 = 2e1 − e2 − e3, ν5 = −e1 + 2e2 − e3, ν6 = −e1 − e2 − e3,
ν7 = 2e4 − e5, ν8 = −e4 + 2e5, ν9 = −e4 − e5,
ν10 = e3 + 2e4 − e5, ν11 = e3 − e4 + 2e5, ν12 = e3 − e4 − e5.
Among these 12 vectors there are the 7 independent linear relations
3ν3 + ν4 + ν5 − 2ν6 = 0, 3ν9 + ν10 + ν11 − 2ν12 = 0,
ν1 − ν3 − ν4 + ν6 = 0, −ν1 + ν2 + ν4 − ν5 = 0,
ν7 − ν9 − ν10 + ν12 = 0, −ν7 + ν8 + ν10 − ν11 = 0,
−ν2 + ν5 − ν8 + ν11 = 0.
The convex hull ∇∗ = 〈∆1,∆2〉 yields the fan Σ(∇
∗) and, consequently, the toric
variety P∇∗ . Let D
i , i = 1, . . . , 12 be the divisors associated to the vertices νi. Then,
by eq. (27), the nef partition eq. (35) defines the divisors
D∗0,1 =
D∗i , D
0,2 =
D∗i , ⇒ X̃
∗ = D∗0,1 ∩D
0,2 ⊂ P∇∗ (38)
cutting out the mirror complete intersection X̃∗. In contrast to ∆∗, the polytope ∇∗
contains extra integral points. We find that it contains, in addition to the origin and
the vertices in eq. (36), the 26 points
ν13 =
(ν4 + ν5 + ν6) = −e3, ν12+6k+i+j =
(ν3k+i + 2ν3k+j),
ν14 =
(ν10 + ν11 + ν12) = e3, ν15+6k+i+j =
(ν3k+j + 2ν3k+i)
∀ k ∈ {0, . . . , 3}, (i, j) ∈
(1, 2), (1, 3), (2, 3)
For completeness, note that the dual polytope ∇ has 15 vertices and 24 lattice points.
Running PALP to compute the Hodge numbers using the formula of [36], we obtain
= h1,2
= h1,1
= h1,2
= 19, (40)
in agreement with Part A [1], eq. (??).
So far, we have mainly focused on the information contained in the reflexive poly-
topes ∆∗ and ∇∗ and ignored their duals. We have already mentioned that in the
reflexive case a generic section of K−1
defines a Calabi-Yau manifold, and that such
sections are provided by the lattice points of ∆. In other words, ∆ and ∇ are the
Newton polytopes of Y and Y ∗, respectively. That is, the complete intersection Y
(Y ∗) is defined by r polynomial equations, and the exponents of the monomials in
each equation are the lattice points in ∆ (∇). More precisely, the Minkowski sum for,
say, ∆ = ∆1 + · · ·+∆r defines r homogeneous polynomials
Fl(z) =
∇l′∩N
〈m,ρi〉+δl l′
i , l = 1, . . . , r (41)
with coefficients al,m ∈ C. The simultaneous vanishing of F1, . . . , Fr then defines
the complete intersection Calabi-Yau manifold Y ⊂ P∆∗ . Exchanging ∆l and ∇l′ in
eq. (41) yields the equations F ∗l defining the mirror manifold Y
∗. It is in this sense
that the map from the upper left to the lower right in eq. (32) is also a manifestation
of mirror symmetry. Since we will not need the actual polynomials for X̃ and X̃∗, we
refrain from writing them explicitly. Instead, we refer the reader to Section 4, where
we determine the equations in a simpler situation.
3.3 Toric Intersection Ring
Up to now we have only considered one of the ingredients in the fan Σ, namely, the
generators ρ ∈ Σ(1) which defined the C× action in eq. (19). The second ingredient
is the exceptional set Z(Σ). It corresponds to fixed loci of a continuous subgroup of
for which the quotient eq. (19) is not well defined. Therefore, these loci have
to be removed. In terms of the homogeneous coordinates zi, this happens precisely
when a subset {zi |i ∈ I}, I ⊆ {1, . . . , n}, of the coordinates vanishes simultaneously
such that there is no cone σ ∈ Σ containing all of the ρi ⊆ σ, i ∈ I. Hence, the set
Z(Σ) is the union of the sets ZI = {[z1 : · · · : zn] |zi = 0 ∀i ∈ I}. Minimal index
sets I with this property are called primitive collections [37]. In order to determine
the index sets I we need a coherent8 triangulation T = T (∆∗) of the polytope ∆∗
for which all simplices contain the origin. Different triangulations will yield different
exceptional sets and, hence, different toric varieties. However, for simplicity, we will
mostly suppress the choice of a triangulation in the notation. In the case of complete
intersections, only those triangulations of ∆∗ are compatible with a given nef partition
that can be lifted to a triangulation of the corresponding Gorenstein cone, see [38].
The polytope defining projective space Pd admits a unique triangulation with
the required properties, and this triangulation consists of n = d + 1 simplices. The
only primitive collection is I = {1, . . . , n}. This is well-known from the definition
of projective space, where we have to remove the origin z1 = · · · = zd+1 = 0 from
Cd+1. Similarly, the polyhedron ∆∗ for the ambient space P∆∗ of X̃ admits a unique
triangulation, and the primitive collections are those of its factors, that is,
I1 = {1, 2, 3}, I2 = {4, 5}, I3 = {6, 7, 8}. (42)
8Coherent triangulations, sometimes also called regular triangulations, satisfy some technical
property that is equivalent to the associated toric quotient being Kähler.
The mirror polyhedron∇∗, on the other hand, admits a huge number of triangulations.
We will discuss particularly interesting triangulations of the mirror polyhedron at the
end of Appendix A.
The primitive collections determine the cohomology ring of toric varieties and,
together with the nef partition, complete intersections. Recall that if the collection
ρi1 , . . . , ρik of rays is not contained in at least one cone, then the corresponding ho-
mogeneous coordinates zil are not allowed to vanish simultaneously. Therefore, the
corresponding divisors Dil have no common intersection. Hence, we obtain non-linear
relations RI = Di1 · . . . ·Dik = 0 in the intersection ring. It can be shown that all such
relations are generated by the primitive collections I = {i1, . . . , ik} defined above.
The ideal generated by these RI is called Stanley-Reisner ideal
ISR =
RI , I primitive collection
⊂ Z[D1, . . . , Dn], (43)
and Z[D1, . . . , Dn]/ISR is the Stanley-Reisner ring. The intersection ring of a non-
singular compact toric variety PΣ is [39]
= Z [D1, . . . , Dn]
(m, ρi)Di
. (44)
In other words, the intersection ring can be obtained from the Stanley-Reisner ring by
adding the linear relations
(m, ρi)Di = 0, where it is sufficient to take a set of basis
vectors for m ∈ M . In particular, the intersection number of the divisors spanning a
maximal-dimensional simplicial cone σ = span
R≥{ρi1 , . . . , ρid} is
Di1 · . . . ·Did =
Vol(σ)
, (45)
where Vol(σ) is the lattice-volume, that is, the geometric volume divided by the
volume 1
of a basic simplex. For practical purposes it is sufficient to compute one
of these volumes, the remaining intersections can be obtained using the linear and
non-linear relations.
Having found the intersection ring of the ambient toric variety, we now turn to
the complete intersection Y ⊂ P∆∗ . The toric part of its even-degree intersection ring
is [40]
Hevtoric
= Q [D1, . . . , Dn]
IY , (46)
where IY is the ideal quotient
(m, ρi)Di
D0,l. (47)
Note that it can happen that some of the Di appear as generators of IY . This means
that they can be set to zero in the intersection ring. Geometrically, this means that
these divisors do not intersect a generic complete intersection Y . While the inter-
section ring depends on the triangulation T (∆∗) through the primitive collections
defining the Stanley-Reisner ideal, we conjecture that the divisors Di not intersecting
Y are independent of the choice of triangulation. This conjecture is proven for r = 1
and supported by a large amount of empirical evidence for r > 1. We conclude that
the dimension dimH2toric(Y ) is in general smaller than h
1,1(Y ) for the following two
reasons: Only h = n − d = dimH2(P∆∗,Z) divisors are realized in the ambient toric
variety P∆∗ , and some of them may not descend to the complete intersection Y . Using
the adjunction formula we can compute the the Chern classes of Y by expanding
c(Y ) =
(1 +Di)
(1 +D0,l)
. (48)
The intersection ring together with the second Chern class determine the diffeomor-
phism type of a simply-connected Calabi-Yau manifold [41]. If we consider the coho-
mology with integral coefficients there can be torsion and, in fact, this is what this
paper is all about. Unfortunately, a combinatorial formula in terms of the fan Σ(∆)
for the torsion in the integral cohomology of a toric Calabi-Yau manifold is only known
in the hypersurface case [6].
We now illustrate these concepts in the example of the complete intersection X̃ ⊂
P∆∗ = P
2 × P1 × P2 and its mirror manifold X̃∗. In eq. (42) we already determined
the primitive collections, hence the corresponding Stanley-Reisner ideal is
ISR =
D1D2D3, D4D5, D6D7D8
. (49)
The linear equivalences are D1 = D2, D1 = D3, D4 = D5, D6 = D7, D6 = D8 and,
hence, we can choose K1 = D4, K2 = D1, K3 = D6 as a basis for H
2(P∆∗). In terms
of this basis, we obtain D0,1 = K1+3K2 and D0,2 = K1+3K3, see eq. (27). Therefore,
the ideal I
in eq. (34) is
2K2 −K2
2K3, K1K2 − 3K2
2, K1K3 − 3K3
2, K1
2, K2
3, K3
. (50)
Next, we define the restriction of the Ki to X̃ to be the divisors
J̃i = Ki · X̃ = Ki(K1 + 3K2)(K1 + 3K3). (51)
We need to compute one of the intersection numbers directly from the volume of a
cone, say, J̃1J̃2J̃3 = K1K2K3(K1 + 3K2)(K1 + 3K3) = 9K1K
3 , where we made use
of the relations in I
. Using eq. (45), this intersection can be evaluated to be
3 = 9D1D2D4D6D7 = 9/Vol
〈ρ1, ρ2, ρ4, ρ6, ρ7〉
= 9/Vol
〈e1, e2, e3, e4, e5〉
Then, again using eq. (50), we see that the only non-vanishing intersection numbers
and the second Chern class are
J̃22 J̃3 = 3, J̃1J̃2J̃3 = 9, J̃2J̃
3 = 3,
· J̃1 = 0, c2
· J̃2 = 36, c2
· J̃3 = 36.
Note that only h
toric(X̃) = 3 of the h
1,1(X̃) = 19 parameters are realized torically.
Comparing the triple intersection numbers with eq. (13), it is clear that these 3 toric
divisors are precisely the G-invariant divisors on X̃ .
A similar, though much more complicated, calculation can be done for X̃∗ ⊂ P∇∗ .
Using the results of Appendix A one can show that, among the points in eq. (39), the
14 divisors D∗13, D
14, D
12+6k+i+j , D
15+6k+i+j, k = 0, 2 appear as generators of eq. (47)
and, therefore, do not intersect X̃∗. Subtracting from the remaining 24 divisors in
eqns. (36) and (39) the remaining 5 linear relations in eq. (37), we find that all
toric(X̃
∗) = h1,1(X̃∗) = 19 moduli are realized torically.
3.4 Mori Cone
As we have just seen, the cohomology classes Di span H
2(PΣ,R) = H
1,1(PΣ). The
Kähler classes of a smooth projective toric variety PΣ form an open cone in H
1,1(PΣ)
called the Kähler cone K(PΣ). This cone has a combinatorial description in terms of
the fan Σ, which we now review.
First, define a support function to be a continuous function ψ : NR → R given on
each cone σ ∈ Σ by an mσ ∈MR via
ψ(ρ) = (mσ, ρ) ∀ρ ∈ σ ⊂ NR. (54)
A support function determines a divisor D =
ψ(ρi)Di. We say that D is convex if
ψ is a convex function on NR. The convex classes form a non-empty strongly convex
polyhedral cone in H1,1(PΣ) whose interior is the Kähler cone K(PΣ). Such a support
function is strictly convex if and only if
ψ(ρi1 + · · ·+ ρik) > ψ(ρi1) + · · ·+ ψ(ρik) (55)
for every primitive collection I = {i1, . . . , ik} [40]. The dual of the Kähler cone
K(PΣ) is called the Mori cone or the cone of numerically effective curves NE(PΣ).
Its generators can be described by vectors l(a) of the corresponding linear relations∑
i ρi = 0. Each face of the Kähler cone K(PΣ) is dual to an edge of NE(PΣ).
These edges are generated by curves c(a), and the entries of the vector l(a) are
= c(a) ·Di. (56)
A practical algorithm to find the generators for l(a) in terms of the triangulation T (∆∗)
is described in [42]. Of course, we are not interested in the ambient space but in a
complete intersection Y ⊂ P∆∗ . The restriction of a Kähler class on the ambient space
yields a Kähler class on Y , but not every Kähler class on Y arises that way. We define
the toric part of the Kähler cone on Y as the restriction [43]
K(Y )toric = K(PΣ)
⊂ K(Y ). (57)
In the simplicial case, we can always take the basis Ji of H
toric(Y,Q) to be edges of the
Kähler cone. The dual of the toric Kähler cone of Y is the (toric) Mori cone NE(Y )toric.
This is sufficient for mirror symmetry purposes, however, it can be larger than the
actual cone of effective curves. Once the generators l(a) of NE(P∆∗) are determined,
we need to add the information about the nef partition. For this purpose, we define
= −D0,m · c
(a) m = 1, . . . , r. (58)
Finally, it is customary to write the generators of the Mori cone NE(Y )toric as
l(a) =
0,1 , . . . , l
0,r ; l
1 , . . . , l
, (59)
which are, by abuse of notation, again denoted by l(a). The knowledge of the (toric)
Mori cone is important for several reasons. It defines the local coordinates on the
complex structure moduli space of the mirror Y ∗ near the point of maximal unipotent
monodromy. Moreover, the generators enter the coefficients of the fundamental period
which is a solution of the Picard-Fuchs equations as we will review in Subsection 3.5.
For example, using the unique primitive collections in eq. (42), the Mori cone for
P∆∗ is generated
l(1) =(0, 0, 0, 1, 1, 0, 0, 0)
l(2) =(1, 1, 1, 0, 0, 0, 0, 0)
l(3) =(0, 0, 0, 0, 0, 1, 1, 1).
Recalling the nef partition D0,1 = D1 + · · ·+D4, D0,2 = D5 + · · ·+D8, we prepend
(−D0,1 · c
(a),−D0,2 · c
(a)) = (−3, 0), (−1,−1), (0,−3), a = 1, 2, 3, to obtain the gener-
ators
l(1) =(−1,−1; 0, 0, 0, 1, 1, 0, 0, 0)
l(2) =(−3, 0; 1, 1, 1, 0, 0, 0, 0, 0)
l(3) =( 0,−3; 0, 0, 0, 0, 0, 1, 1, 1)
of the Mori cone NE(X̃)toric. Due to the large number of toric moduli, the calculation
for the Mori cone NE(P∇∗) of the ambient toric variety of the mirror X̃
∗ is much more
complex.
9We sort the Mori cone generators such that the first one corresponds to the P1 of the ambient
space, and the second and third generator are the hyperplane sections of the two P2. In other words,
we have J̃a · c
(b) = δb
. This is the basis of curves that we used for the A-model computation.
3.5 B-Model Prepotential
Mirror symmetry identifies the quantum corrected Kähler moduli space of Y with
the classical complex structure moduli space of Y ∗, see the excellent treatise in [43]
for details. The deformations of the complex structure of Y ∗ are encoded in the
periods ̟ =
Ω and the latter can be computed from the equations F ∗l that cut
out Y ∗ ⊂ P∇∗ . Given the Mori cone eq. (59) and the classical intersections numbers
κabc = Ja · Jb · Jc we follow [44, 45, 38, 43] to write down a local expansion of the
periods, convergent near the large complex structure point, which is characterized
by its maximal unipotent monodromy. In the following, we will review just the bare
essentials.
The coefficients ai in the polynomial constraints F
l of the complete intersection Y
see eq. (41), define the complex structure of Y ∗. A particular set of local coordinates
ua on the complex structure moduli space on Y
∗ is defined by
i b = 1, . . . , h (62)
where h
toric(Y ) and am,0 is the coefficient in (41) corresponding to the origin in
∇l. In these coordinates, the point of maximal unipotent monodromy is at ub = 0.
We define the cohomology-valued period
̟(u, J) =
{na≥=0}
0,mJa
a=1 l
0,mna
a=1 l
una+Jaa . (63)
where (x)n = Γ(x + n)/Γ(x) is the Pochhammer symbol. Note that the choice of
triangulation is implicit in the generators l(a) of the Mori cone. Expanding ̟(u, J)
by cohomology degree yields
̟(u, J) = ̟(0)(u) +
̟(1)a (u)Ja +
̟(2)a (u)κabcJbJc −̟
(3)(u) dVol, (64)
where dVol is the volume form. The coefficients in eq. (64) are the fundamental period
̟(0)(u), that is, the unique solution to the Picard-Fuchs equations holomorphic at
ua = 0, and
̟(1)a (u) = ∂Ja̟(u, J)|J=0, ̟
a (u) =
κabc∂Jb∂Jc̟(u, J)|J=0,
̟(3)(u) = −
κabc∂Ja∂Jb∂Jc̟(u, J)|J=0.
These coefficients coincide with the basis of solutions of the Picard- Fuchs equations
obtained from the Frobenius method in [46, 31]. The B-model prepotential FBY ∗,0 is
Y ∗,0(u) =
2̟(0)(u)2
̟(0)(u)̟(3)(u) +
̟(1)a (u)̟
a (u)
. (66)
At the large complex structure point the mirror map defines natural flat coordinates
on the Kähler moduli space of the original manifold Y , which are
i (u)
̟0(u)
, i = 1, . . . , h. (67)
We also define qj = e
2πitj = uj + O(u
2). One way to obtain the prepotential is to
compute its third derivatives
C∗abc = DaDbDcF
Y ∗,0 =
Ω ∧ ∂a∂b∂cΩ, (68)
and apply the Picard-Fuchs operators. This leads to linear differential equations,
which determine C∗abc up to a common constant, see again [46, 43] for details. The
quantum corrected three point function Cijk(q) on Y follows from C
abc(u) using the
inverse mirror map eq. (67) u = u(t), and one obtains
Cijk(q) =
̟(0)(u(q))2
C∗abc(u(q)). (69)
In practice, we use the formula
Cijk(q) = ∂ti∂tj
k (u(q))
̟(0)(u(q))
. (70)
Integrating three times with respect to ti yields the prepotential F
Y ∗,0(t) up to a
polynomial of degree three in ti which can be determined partially by the topological
data of Y .
Mirror symmetry then ensures that the B-model prepotential, eq. (66), is equal to
the A-model prepotential. That is,
FY,0(q) = F
Y ∗,0(u(q)). (71)
This allows us to compute the instanton numbers nd. For the case of interest,
X̃ ∈ P∆∗ = P
2 × P1 × P2, (72)
we refer to [28] where this program been carried out in detail. The same calculation
can in principle be done on the mirror X∗, but the large number of toric moduli
again makes it highly extensive. Instead, we refer to the next section where a suitable
quotient of X̃∗ will be treated in detail for which the computations are reasonably
simple.
4 Quotienting the B-Model
In this section we consider the quotientX = X̃/G in terms of toric geometry and study
the mirror of X in this context. In order to achieve this, we first analyze the partial
quotient X = X̃/G1. Using the techniques introduced in Section 3, we construct the
mirror X
. Using their toric realization, we perform the B-model computation for the
non-perturbative prepotentials F
and F
, respectively. Finally, we explain how
one can implement the quotient by G2 on both sides in order to obtain X and X
4.1 The Quotient by G1
We start with a review of the general discussion of free quotients of complete intersec-
tions in toric geometry in [31]. Consider a fan Σ ⊂ NR and pick a lattice refinement
N̄ such that Γ = N̄/N is a finite abelian group. Such a lattice refinement consists of
a finite sequence of lattice refinements of the form N → N +wpZ which are described
by a vector wp =
αpiρi with αpi ∈ Z. The group Γ is then isomorphic to
Zkp .
Let Σ̄ be the fan obtained from Σ by relating everything to the lattice N̄ . In this
context, we make some additional identifications in the toric quotient eq. (19) [47].
One finds that VΣ̄ = VΣ/Γ is the quotient of VΣ by the finite abelian group Γ. Its
action on the homogeneous coordinates is by multiplication by phases
z1 : · · · : zn
ξα1z1 : · · · : ξ
, ξ = e
k , (73)
for every cyclic subgroup of order k. We will denote such group actions by Zk :
(α1, . . . , αn). If VΣ is a compact toric variety, then the quotient VΣ̄ is never free [39].
However, a hypersurface or complete intersection in VΣ need not intersect the set
of fixed points, and in that case we get a smooth quotient manifold with nontrivial
fundamental group.
We now apply this to P∆∗ = P
2 × P1 × P2 defined in eq. (22). The first step in
performing the quotient of P∆∗ by G1 thus amounts to a refinement N̄ = wZ+N of
the lattice N with index |G1| = 3. From the definition eq. (8a) of the action of G1 on
P∆∗ and eq. (24) we read off that the refinement is by a vector
ρ2 + 2ρ3 + ρ7 + 2ρ8
+ Z5. (74)
The resulting polytope ∆̄∗ admits the same nef partition as ∆∗ in eq. (33),
∇̄1 = 〈ρ̄1, . . . , ρ̄4, 0〉, ∇̄2 = 〈ρ̄5, . . . , ρ̄8, 0〉. (75)
where we express the generators ρ̄ in terms of ρ as
ρ̄i = ρi, i = 1, . . . , 6 ,
ρ̄7 = ρ7 + e1 + 2e2 + e4 + 2e5, ρ̄8 = ρ8 − e1 − 2e2 − e4 − 2e5.
It is easy to check that the ρ̄i satisfy the same linear relations eq. (23) as the ρi,
and that w = 1
(ρ̄1 − ρ̄2 + ρ̄6 − ρ̄7) = −e2 − e5. The ρ̄i together with w therefore
indeed generate the lattice N̄ . Note that, while all 8 non-zero lattice points of ∆̄∗
are vertices, the dual polytope ∆̄ has 18 vertices and 102 points. Using PALP [35]
again, we compute the lattice points of the polytope ∇̄∗ = 〈∆̄1, ∆̄2〉 ⊂MR, which will
describe the ambient space of the mirror X
of X . We find
∆̄1 = 〈ν̄1, . . . , ν̄6, 0〉, ∆̄2 = 〈ν̄7, . . . , ν̄12, 0〉, (77)
where we express the vertices ν̄i in terms of the vertices νi of ∇
ν̄3k+1 = ν3k+1, ν̄3k+2 = ν3k+2 − e5, ν̄3k+3 = ν3k+3 + e5, k = 0, . . . , 3. (78)
Again, it is easy to check that the ν̄i satisfy the same linear relations eq. (37) as the
νi. It turns out that the lattice points of ∇̄
∗ generate a sublattice M̄ of index 3 in M ,
and the lattice refinement is generated by
ν̄1 + 2ν̄2 + 2ν̄7 + ν̄8
= e2 + e4 − e5. (79)
Among the points of ∇∗ listed in eq. (39) only ν13 and ν14 are also lattice points of
the sublattice M̄ . In fact, we have ν̄13 = ν13 and ν̄14 = ν14. Hence, ∇̄
∗ has 12 vertices
and 15 lattice points; its dual ∇̄ = ∇̄1 + ∇̄2 has 42 lattice points among which 15 are
vertices10.
Once we have the polytopes ∆̄∗ and ∇̄∗, we can construct X and X
as complete
intersections entirely analogous to X̃ and X̃∗, see Section 3. That is, using eq. (27),
we define
X = D̄0,1 ∩ D̄0,2, X
= D̄∗0,1 ∩ D̄
0,2 (81)
in terms of the nef partitions eq. (75) and (77), respectively. Here, D̄i and D̄
i denote
the divisors associated to the generators ρ̄i and ν̄i, respectively. The absence of fixed
points of the G1 action on the complete intersection X̃ is guaranteed by the fact
that the resulting polytope ∆̄∗ ⊂ N̄R has no additional lattice points [31]. Hence,
X = X̃/G1 has a non-trivial fundamental group π1(X) = Z3. Surprisingly, it turns
out that the mirror X
is a free quotient as well. To see this recall that, as noticed
above, the lattice points of ∇̄∗ generate a sublattice M̄ of index 3 inM . Furthermore,
∇̄∗ also has no additional lattice points with respect to ∇∗. Therefore, there is a
10Note that all of our polytopes differ from the non-free Z3 × Z3 quotient of ∆
∗ defined in [28],
Proposition 7.1. In the notation of [31] their quotient is
∇∗ 6= P
1 1 1 0 0 0 0 0
0 0 0 1 1 1 0 0
0 0 0 0 0 0 1 1
Z3 : 0 1 2 0 0 0 0 0
Z3 : 0 0 0 0 1 2 0 0
and has 21 points and 8 vertices in the lattice N .
group G∗1 ≃ Z3 acting torically on P∇∗ . On the homogeneous coordinates this action
g∗1 :
z1 : · · · : z12
ζz1 : ζ
2z2 : z3 : · · · : z6 : ζ
2z7 : ζz8 : z9 : · · · : z12
. (82)
Hence, X
= X̃∗/G∗1 also has a non-trivial fundamental group π1(X
) = Z3. Note
that this never happens for hypersurfaces in toric varieties [6]. Having the toric
representation of X and X
, we can now compute their Hodge numbers. It turns out
= h1,2
= h1,1
= h1,2
= 7, (83)
in agreement with Part A [1], eq. (??).
4.2 The Quotient by G2
We now turn to the G2 action, which does not act torically. Hence, we cannot, in
principle, find a toric variety containing X = X/G2 as we did for the G1 quotient
above. However, at least we have to ensure that X and X
are G2-symmetric. This
can be achieved via suitable symmetries in the toric data.
The easy part of the toric data for X is the polytope ∆̄∗. The G2 action on the
ambient space permutes the homogeneous coordinates, see eq. (8b). In terms of toric
geometry, this means that it permutes the corresponding points of the polytope. That
is11,
g2 : ρ̄i 7→ ρ̄1+(i mod 3) ∀i ∈ {1, 2, 3},
g2 : ρ̄4 7→ ρ̄4, ρ̄5 7→ ρ̄5,
g2 : ρ̄5+i 7→ ρ̄6+(i mod 3) ∀i ∈ {1, 2, 3}.
It induces a mirror group actionG∗2 onX
which is geometrical, rather than a quantum
symmetry as discussed in [48]. The action of G∗2 is obviously the dual group action
on the dual lattice M , which again must be a symmetry of the relevant polytope ∇̄∗.
We find that
g∗2 : ν̄3k+i 7→ ν̄3k+1+(i mod 3) ∀k = 0, . . . , 3, i ∈ {1, 2, 3}. (85)
As a check on the mirror group action, note that the matrix of scalar products, see
eq. (87) below, is invariant. That is,
g2(ρ̄l), g
2(ν̄l′)
ρ̄l, ν̄l′
∀ l, l′. (86)
By abuse of notation, we denote the corresponding cyclic permutation of homogeneous
coordinates by g∗2 as well. Using this action, we define the mirror of X to be X
11We define the modulus operation such that (i mod 3) ∈ {0, 1, 2}.
/G∗2. This idea has already been used for the construction of mirrors of orbifolds
of the quintic [49] soon after the discovery of the first mirror construction by Greene
and Plesser.
Following eq. (41), the equations for the Calabi-Yau complete intersections X and
are defined by evaluating the matrix of scalar products 〈ρ̄i, ν̄j〉+ δl l′ , which are
〈 , 〉+ δl l′ ν̄1 ν̄2 ν̄3 ν̄4 ν̄5 ν̄6 ν̄13 ν̄7 ν̄8 ν̄9 ν̄10 ν̄11 ν̄12 ν̄14
ρ̄1 3 0 0 3 0 0 1 0 0 0 0 0 0 0
ρ̄2 0 3 0 0 3 0 1 0 0 0 0 0 0 0
ρ̄3 0 0 3 0 0 3 1 0 0 0 0 0 0 0
ρ̄4 1 1 1 0 0 0 0 0 0 0 1 1 1 1
ρ̄5 0 0 0 1 1 1 1 1 1 1 0 0 0 0
ρ̄6 0 0 0 0 0 0 0 3 0 0 3 0 0 1
ρ̄7 0 0 0 0 0 0 0 0 3 0 0 3 0 1
ρ̄8 0 0 0 0 0 0 0 0 0 3 0 0 3 1
The equations of X can now be read off from the columns of eq. (87), and one finds
F1 = (λ5t0 + λ6t1)(x
0 + x
1 + x
2) + (λ7t0 + λ8t1)x0x1x2, (88a)
F2 = (λ1t0 + λ4t1)(y
0 + y
1 + y
2) + (λ2t0 + λ3t1)y0y1y2, (88b)
where the G2-symmetry has been imposed. Note that the last monomial in each
equation corresponds to the vector 0 ∈ ∆̄l, l = 1, 2. Two of the eight coefficients λm
can be fixed by normalizing the equations, say λ4 = λ5 = 1, and three correspond to
the symmetries of P1, that is, SL(2) transformations of [t0 : t1]. Hence, we can, for
example, set λ6 = λ7 = λ8 = 0. This leaves us with 3 complex structure deformations
λ1, λ2, and λ3, see eqns. (7a) and (7b).
The equations defining X∗ correspond to the rows of eq. (87), that is,
F ∗1 = a1(z
4 + z
5 + z
6)z13 + (a2z10z11z12z14 + a3z4z5z6z13)z1z2z3, (89a)
F ∗2 = a4(z
10 + z
11 + z
12)z14 + (a5z4z5z6z13 + a6z10z11z12z14)z7z8z9, (89b)
where, again, invariance under G∗2 has been imposed and the last monomial of each
equation comes from the lattice point 0 ∈ ∇̄l, , l = 1, 2. Both equations are homo-
geneous with respect to all seven scaling degrees that follow from the linear relations
eq. (37). Among the twelve scalings of the coordinates zi, six are compatible with the
cyclic permutations g∗2, see eq. (85). Subtracting the three G2 symmetric indepen-
dent scalings among the relations eq. (37), there remains one torus action that acts
effectively on the parameters plus two normalizations of the equations. As expected,
the six parameters am of the equations of X
∗ thus become the 3 complex structure
moduli.
So far, we only considered the polytopes ∆̄∗ and ∇̄∗. However, this is only part of
the toric data defining the manifolds X and X
, respectively. In addition, we need the
triangulations and the corresponding exceptional sets. A change in the triangulation
corresponds to a flop of the toric variety. The very real danger is that not all, and
perhaps none, of the flopped Calabi-Yau manifolds are G2-symmetric. For X ⊂ P∆̄∗
this turns out to be unproblematic, but for X
⊂ P∇̄∗ we will find a condition for the
choice of a triangulation.
4.3 B-Model on X
We now return to the discussion of the triangulations and the intersection ring of X .
The analogous, but technically much more involved discussion of X
will be presented
in Subsection 4.5.
For X everything is straightforward since the G1-quotient did not introduce ad-
ditional lattice points in the associated polytope ∆̄∗. Therefore, just like for the
polytope ∆∗ of the covering space X̃ , there exists a unique triangulation. In particu-
lar the primitive collections, the Stanley-Reisner ideal, and the ideal IX are identical
to the ones in eqns. (42), (49), and (50) since they are derived from the same triangu-
lation. Moreover, one can easily see that this triangulation is G2-invariant and, hence,
X is G2 symmetric.
The only change is in the normalization of the intersection ring in eq. (52), since
the total volume has to be divided by 3 = |G1|. This can also be seen in eq. (76),
where the volume of the cone is now 3 instead of 1. Hence, on X the intersection ring
and the second Chern class are
J̄22 J̄3 = 1, J̄1J̄2J̄3 = 3, J̄2J̄
3 = 1,
· J̄1 = 0, c2
· J̄2 = 12, c2
· J̄3 = 12.
Comparing these intersection numbers with eq. (13), it is clear that the toric divisors
should be identified with the G1-invariant divisors on X as
J̄1 = φ, J̄2 = τ1, J̄3 = τ2. (91)
The curves spanning the Mori cone on the cover turn out to be G1-invariant as well.
Therefore, the Mori cones NE(P∆̄∗) and NE(X)toric are identical to those in eqns. (60)
and (61), respectively.
Following the steps given in Section 3 we now want to compute the B-model pre-
potential FB
, plug in the mirror map, and obtain the prepotential on X
(P,Q1, Q2, Q3, R1, R2, R3, b1). (92)
We immediately realize the following two caveats:
• We do not know how to incorporate the torsion curves H2(X,Z)tors = Z3 into
the toric mirror symmetry calculation.
• Of the 7 Kähler classes on X , only 3 are toric.
This means that only 3 out of the 7 + 1 variables in the prepotential are accessible,
and the remaining ones are set to one. Looking at the intersection numbers eq. (90),
it is clear that the 3 divisors are precisely the G2-invariant divisors on X , see eq. (13).
Therefore, these 3 variables must be those that map to the variables p, q, and r on
X . By comparing with eq. (16), we see that the corresponding variables on X are P ,
Q1, and R1. Hence, we actually only compute
(P,Q1, 1, 1, R1, 1, 1, 1) =
n1,n2,n3
nX(n1,n2,n3) Li3
P n1Qn21 R
. (93)
In effect, this means that the resulting instanton numbers are not just the instantons
in a single integral homology class, but the instanton numbers in a whole set of integral
homology classes. The instanton numbers sum over all curve classes that cannot be
distinguished by P,Q1, R1 ∈ Hom
H2(X,Z),C
. Up to total degree 4 and the
symmetry
nX(n1,n2,n3) = n
(n1,n3,n2)
, (94)
the resulting instanton numbers are
nX(1,0,0) = 27 n
(1,0,1) = 108 n
(1,0,2) = 378 n
(1,0,3) = 1080
nX(1,1,1) = 432 n
(1,1,2) = 1512 n
(2,0,1) = −54 n
(2,0,2) = −756
nX(2,1,1) = 864 n
(3,0,1) = 9.
4.4 Instanton Numbers of X
Knowing the prepotential on X , we now want to divide out the free G2 action and
arrive at the prepotential on X . Since we do not know the complete expansion but
only eq. (93), we have to set b1 = b2 = 1 in the descent equation (16). This yields
p, q, r, 1, 1) =
p, q, 1, 1, r, 1, 1, 1
n1,n2,n3
nX(n1,n2,n3) Li3
pn1qn2rn3
Up to the symmetry nX
(n1,n2,n3)
(n1,n3,n2)
, the non-vanishing instanton numbers for
X up to total degree 5 are
nX(1,0,0) = 9 n
(1,0,1) = 36 n
(1,0,2) = 126 n
(1,0,3) = 360
nX(1,0,4) = 945 n
(1,1,1) = 144 n
(1,1,2) = 504 n
(1,1,3) = 1440
nX(1,2,2) = 1764 n
(2,0,1) = −18 n
(2,0,2) = −252 n
(2,0,3) = −1728
nX(2,1,1) = 288 n
(2,1,2) = 3960 n
(3,0,1) = 3 n
(3,0,2) = 252
nX(3,1,1) = 756,
Unfortunately, this direct calculation misses the torsion information and only yields
the expansion F
X,0(p, q, r, 1, 1). The b1 dependence was lost because the toric methods
do not yield this part, and the b2 dependence was lost because the relevant divisor on
X was not toric. Comparing with the full expansion of the prepotential
p, q, r, b1, b2) =
n1,n2,n3
m1,m2
nX(n1,n2,n3,m1,m2) Li3
pn1qn2rn3bm11 b
, (98)
see Part A eq. (??), this means we only obtain the sum of the instanton numbers over
all torsion classes
nX(n1,n2,n3) =
m1,m2=0
nX(n1,n2,n3,m1,m2). (99)
Clearly, this destroys the torsion information, that is, the instanton numbers nX
(n1,n2,n3)
do not depend on the torsion part of the integral homology. For comparison purposes,
we list the instanton numbers nX(n1,n2,n3) for 0 ≤ n1, n2, n3 ≤ 5 in Table 2.
4.5 B-Model on X
We now study the mirror X
, which sits in a more complicated ambient toric variety.
Consequently, the analysis is more involved. The big advantage, however, will turn
out to be that all h11(X
) = 7 Kähler moduli are toric, which will enable us to obtain
the full instanton expansion.
Since the polytope ∇̄∗ in eq. (78) is not simplicial, we have to specify a resolution of
the singularities, that is, a triangulation T (∇̄∗). Moreover, not any triangulation will
do, but we have to make sure that it is compatible with the action of the permutation
group G∗2. While a tedious technicality, the existence of such a resolution has to
be shown in order to establish the existence of a geometrical mirror family of X .
In particular, we show in Appendix A that there is no projective resolution of the
ambient space among the 720 coherent star triangulations of ∇̄∗ that respects the
permutation symmetry eq. (85). In other words, if one demands G∗2 symmetry then
the ambient toric variety cannot be chosen to be Kähler, but only a complex manifold.
Clearly, in that case there is no Kähler cone and the usual toric mirror symmetry
algorithm does not work. What comes to the rescue is that there are two classes of
non-symmetric projective resolutions for which the symmetry-violating exceptional
sets do not intersect X
. Hence the complete intersection is G2-symmetric, even
though the ambient space is not.
We conclude that the extended Kähler moduli space of X
contains two sym-
metric phases. We will denote these two classes of triangulations by T± = T±(∇̄
see Appendix A. In fact, the two phases are topologically distinct, and only the tri-
angulation T+ describes the threefold X
that we are interested in. In Appendix B,
we will investigate the other triangulation T− which describes a flop of X
(0,n2,n3)
(3,n2,n3)
0 1 2 3 4 5
0 0 0 0 0 0 0
1 0 0 0 0 0 0
2 0 0 0 0 0 0
3 0 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
0 1 2 3 4 5
0 0 3 252 4158 40173 287415
1 3 756 15390 164280 1259685 7763364
2 252 15390 426708 5427684 46537092 310465062
3 4158 164280 5427684 73971360 657552966 4487097816
4 40173 1259685 46537092 657552966 5948103483 41016575313
5 287415 7763364 310465062 4487097816 41016575313 284581389204
nX(1,n2,n3) n
(4,n2,n3)
0 1 2 3 4 5
0 9 36 126 360 945 2268
1 36 144 504 1440 3780 9072
2 126 504 1764 5040 13230 31752
3 360 1440 5040 14400 37800 90720
4 945 3780 13230 37800 99225 238140
5 2268 9072 31752 90720 238140 571536
0 1 2 3 4 5
0 0 0 −144 −6048 −107280 −1235520
1 0 −306 −12348 −207000 −2273400 −19066500
2 −144 −12348 348480 14609520 235219680 2505155400
3 −6048 −207000 14609520 520226784 8245864800 87989812560
4 −107280 −2273400 235219680 8245864800 131759049600 1417949658000
5 −1235520 −19066500 2505155400 87989812560 1417949658000 15365394415800
(2,n2,n3)
(5,n2,n3)
0 1 2 3 4 5
0 0 −18 −252 −1728 −9000 −38808
1 −18 288 3960 27648 143748 620928
2 −252 3960 54432 380160 1976472 8537760
3 −1728 27648 380160 2654208 13799808 59609088
4 −9000 143748 1976472 13799808 71748000 309920688
5 −38808 620928 8537760 59609088 309920688 1338720768
0 1 2 3 4 5
0 0 0 45 5670 189990 3508920
1 0 36 13140 474840 8793648 111499020
2 45 13140 1112886 38961252 777759975 10723515300
3 5670 474840 38961252 1952428464 47357606430 732897531720
4 189990 8793648 777759975 47357606430 1237373786439 19911043749420
5 3508920 111499020 10723515300 732897531720 19911043749420 327006066948660
Table 2: Summed instanton numbers nX
(n1,n2,n3)
m1,m2
(n1,n2,n3,m1,m2)
(hence not distinguishing torsion) com-
puted by mirror symmetry. The table contains all non-vanishing instanton numbers for 0 ≤ n1, n2, n3 ≤ 6.
Following Subsection 3.3, given the triangulation T+, we can determine the prim-
itive collections. This immediately yields the Stanley-Reisner ideal
ISR =
D̄1D̄13, D̄2D̄4, D̄2D̄13, D̄3D̄4, D̄3D̄5, D̄3D̄13, D̄4D̄10, D̄4D̄11, D̄4D̄12,
D4D̄14, D̄5D̄10, D̄5D̄11, D̄5D̄12, D̄5D̄14, D̄6D̄10, D̄6D̄11, D̄6D̄12,
D̄6D̄14, D̄13D̄10, D̄13D̄11, D̄13D̄12, D̄13D̄14, D̄7D̄14, D̄8D̄10, D̄8D̄12,
D̄8D̄14, D̄9D10, D̄9D̄14, D̄1D̄2D̄3, D̄1D̄2D̄6, D1D̄5D̄6, D̄4D̄5D̄6,
D̄7D̄8D̄9, D̄7D̄9D̄11, D̄7D̄11D̄12, D̄10D̄11D̄12
(100)
where we dropped the superscript ∗ on D̄ for ease of notation. From this, in turn, we
obtain the generators l̄
+ of the Mori cone NE(P∇̄∗):
+ =( 0, 0, 0, 0, 0, 0, 1, 0, 0,−1, 0, 0, 0, 1)
+ =( 1, 0, 0,−1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0)
+ =(−1, 1, 0, 1,−1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
+ =( 0,−1, 1, 0, 1,−1, 0, 0, 0, 0, 0, 0, 0, 0)
+ =( 0, 0,−1, 0, 0, 1, 0,−1, 0, 0, 1, 0, 0, 0)
+ =( 0, 0, 0, 0, 0, 0,−1, 0, 1, 1, 0,−1, 0, 0)
+ =( 0, 0, 0, 0, 0, 0, 0, 1,−1, 0,−1, 1, 0, 0)
+ =( 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0,−3, 0)
+ =( 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0,−3).
(101)
A dual basis for the generators of the Kähler cone K(P∇̄∗) is
K̄1 = D̄13 + 2 D̄1 − D̄2 − D̄3 + D̄9 + D̄7 + D̄8 + 3 D̄4,
K̄2 = 3 D̄1 + D̄13 + 3 D̄4,
K̄3 = D̄13 + 2 D̄1 + 3 D̄4,
K̄4 = D̄13 + 2 D̄1 − D̄2 + 3 D̄4,
K̄5 = D̄13 + 2 D̄1 − D̄2 − D̄3 + 3 D̄4,
K̄6 = D̄13 + 2 D̄1 − D̄2 − D̄3 + D̄9 + D̄8 + 3 D̄4,
K̄7 = D̄8 + D̄13 + 2 D̄1 − D̄2 − D̄3 + 3 D̄4,
K̄8 = D̄4 + D̄1,
K̄9 = D̄10 + D̄7.
(102)
The Calabi-Yau complete intersection X
is then defined by X
= K̄1K̄2. It turns
out that the divisors D̄13, D̄14 do not intersect X
. Therefore, all
toric
= h1,1
= 7 (103)
Kähler moduli are realized torically. Since there are two divisors that do not intersect,
finding the Mori cone is somewhat subtle. First, we have to restrict the lattice of linear
relations to the sublattice orthogonal to these two directions. For the generators of
the toric Mori cone NE(X
)toric, this means that l̄
+ → 3l̄
+ + l̄
+ , l̄
+ → 3l̄
+ + l̄
and that we drop l̄
+ , l̄
+ as well as the entries corresponding to intersections with
D̄13, D̄14. In addition, we prepend the intersection numbers with D̄0,1 and D̄0,2. This
yields
+ =(−3, 0; 0, 0, 0, 0, 0, 0, 3, 0, 0,−2, 1, 1)
+ =( 0,−3; 3, 0, 0,−2, 1, 1, 0, 0, 0, 0, 0, 0)
+ =( 0, 0;−1, 1, 0, 1,−1, 0, 0, 0, 0, 0, 0, 0)
+ =( 0, 0; 0,−1, 1, 0, 1,−1, 0, 0, 0, 0, 0, 0)
+ =( 0, 0; 0, 0,−1, 0, 0, 1, 0,−1, 0, 0, 1, 0)
+ =( 0, 0; 0, 0, 0, 0, 0, 0,−1, 0, 1, 1, 0,−1)
+ =( 0, 0; 0, 0, 0, 0, 0, 0, 0, 1,−1, 0,−1, 1).
(104)
The dual basis of divisors is
J̄∗1 =
K̄21K̄2, J̄
K̄1K̄
2 , J̄
5 = K̄1K̄2K̄5,
J̄∗3 = K̄1K̄2K̄3, J̄
4 = K̄1K̄2K̄4,
J̄∗6 = K̄1K̄2K̄6, J̄
7 = K̄1K̄2K̄7.
(105)
We now try to identify this basis J̄∗1 , . . . , J̄
7 of divisors onX
with the basis {φ, τ1, υ1, ψ1, τ2, υ2, ψ2}
of divisors on X in eq. (10). It turns out that there is more than one way to identify
the bases if one only wants to preserve the triple intersection numbers. To obtain a
unique answer, we also need to identify the actions by G∗2 and G2 as well. First, the
G∗2 action on H
2(P∇̄∗ ,Z) is defined by eq. (85). Using the linear equivalence relations
2D̄1 − D̄2 − D̄3 + 2D̄4 − D̄5 − D̄6 = 0
−D̄1 + 2D̄2 − D̄3 − D̄4 + 2D̄5 − D̄6 = 0
2D̄7 − D̄8 − D̄9 + 2D̄10 − D̄11 − D̄12 = 0
−D̄2 + D̄3 − D̄5 + D̄6 − D̄8 + D̄9 − D̄11 + D̄12 = 0
−D̄4 − D̄5 − D̄6 + D̄10 + D̄11 + D̄12 − D̄13 + D̄14 = 0
(106)
and the definition eq. (105), one can compute the induced group action on H2(X
We find



1 0 0 0 0 0 0
0 1 0 0 0 0 0
0 3 −1 1 0 0 0
0 3 −1 0 1 0 0
0 0 0 0 1 0 0
3 0 0 0 1 0 −1
0 0 0 0 1 1 −1



. (107)
Second, recall that the G2 action on the divisors of X



1 0 0 0 0 0 0
0 1 0 0 0 0 0
1 3 0 −1 0 0 0
0 3 1 −1 0 0 0
0 0 0 0 1 0 0
1 0 0 0 3 0 −1
0 0 0 0 3 1 −1



, (108)
see Part A eq. (??).
The essentially unique12 identification of divisors on X
and X then turns out to
J̄∗1 = τ1, J̄
2 = τ2, J̄
3 = ψ2, J̄
4 = υ2,
J̄∗5 = φ, J̄
6 = 3τ1 + υ1 − ψ1 = g2(ψ1), J̄
7 = υ1.
(109)
Note that we are identifying divisors on X
with divisors on X in eq.(109), something
that one would usually not do. However, in view of the anticipated self-mirror prop-
erty, X
∗ ∼= X, this is a sensible thing to try to attempt. And, indeed, the identification
above is an isomorphism of the intersection rings.
Regardless of this identification, we now continue to apply mirror symmetry. First,
the second Chern class is
· J̄∗1 = 12, c2
· J̄∗5 = 0, c2
· J̄∗2 = 12,
· J̄∗3 = c2
· J̄∗6 = 24, c2
· J̄∗4 = c2
· J̄∗7 = 12.
(110)
Using this information, we now compute the B-model prepotential
(q1, . . . , q7) = 3q5 + 3q4q5 +
q25 + 3q5q7 + 3q7q5q6 + 3q4q5q7 +
+ 3q3q4q5 +
7 + 3q3q4q5q7 +
+ 3q4q3q2q5 + 3q7q5q4q6 + 3q1q5q6q7 +
q45 +O(q
(111)
Finally, we insert the mirror map and obtain the A-model prepotential on X
. Since
we already identified the bases J̄∗1 , . . . , J̄
7 with the divisors on X , we will use the
same names (but with an added ∗ superscript) for the Fourier-transformed variables
to expand the prepotential. With this notation, we obtain
(P ∗, Q∗1, Q
3, 1) = 3P
∗ + 3
P ∗2 + 1
P ∗3 + 3
P ∗4 + 3
+ 3P ∗Q∗2 +
P ∗2Q∗22 + 3P
∗Q∗2Q
3 + 3P
∗R∗2 +
P ∗2R∗22 + 3P
∗R∗2Q
+ 3P ∗R∗2Q
3 + 3P
∗R∗2R
3 + 3P
∗R∗2R
2 + 3P
∗R∗2R
+ 3P ∗Q∗1R
3 + 3P
∗Q∗1R
2 + 9P
∗Q∗1R
3 + 3P
∗Q∗2Q
+ 3P ∗Q∗2Q
2 + 9P
∗Q∗2Q
total degree ≥ 6
(112)
12Up to the ĝ2 and g
2 ĝ2 symmetry.
see also Part A eq. (??). The instanton numbers on X
are the expansion coefficients
(P ∗, Q∗1, Q
3, 1)
n1,...,n7
(n1,n2,n3,n4,n5,n6,n7)
P ∗n1Q∗n21 Q
. (113)
We see that we almost get the complete instanton expansion eq. (15), we only miss
the expansion in the b∗1 variable which is not computed by the toric mirror symmetry
algorithm. Up to total degree 5, the instanton numbers are
(1,0,0,0,0,0,0) =3 n
(1,0,0,0,0,1,0) =3 n
(1,0,0,0,0,1,1) =3 n
(1,0,1,0,0,0,0) =3
(1,0,1,0,0,1,0) =3 n
(1,0,1,0,0,1,1) =3 n
(1,0,1,1,0,0,0) =3 n
(1,0,1,1,0,1,0) =3
(1,0,1,1,0,1,1) =3 n
(1,1,0,0,0,1,2) =9 n
(1,0,1,2,1,0,0) =9 n
(1,0,1,1,1,1,0) =3
(1,1,1,0,0,1,1) =3 n
(1,1,0,0,0,1,1) =3 n
(1,0,1,1,1,0,0) =3.
(114)
Finally, let us take a look at the G∗2 action, see eq. (85). Of the 7 generators of the
toric Mori cone, eq. (104), only the 3 generators l̄
+ , l̄
+ and l̄
+ are invariant. Not
surprisingly, the dual G∗2-invariant divisors
J̄∗5 = φ, J̄
1 = τ1, J̄
2 = τ2 (115)
were identified with the G2-invariant divisors on X in eq. (109). Therefore, only 3
Kähler parameters survive to the quotient X∗ = X
/G∗2, and we have
= h1,2
= h1,1
= h1,2
= 3. (116)
4.6 Instanton Numbers of X∗
Now that we have the expression eq. (113) for the prepotential on X
, we can again
apply a suitable variable substitution
P ∗, Q∗1, Q
p∗, q∗, r∗, b∗1, b
(117)
and obtain the prepotential on the quotient X∗ = X
/G∗2. The correct way to replace
the variables is determined by the group action on the homology and cohomology as
we explained in Part A. Having computed the G∗2-action in eq. (107), we determine
the descent equation for the prepotential to be13
p∗, q∗, r∗, b∗1, b
|G∗2|
p∗, q∗, b∗2, b
∗, b∗22 , b
2 , b
. (118)
13Interestingly, eq. (118) turns out to be exactly analogous to eq. (16), even though the identifica-
tion of divisors on X
and X is not just a relabeling of divisors.
Using the series expansion of the prepotential for b∗1 = 1 on X
from Subsection 4.5,
we now find that
X∗,0(p
∗, q∗, r∗, 1, b∗2)
Li3(p
2 ) + 4 Li3(p
2 ) + 4 Li3(p
+ 14 Li3(p
∗q∗2b
2 ) + 16 Li3(p
∗q∗r∗b
2 ) + 14 Li3(p
∗r∗2b
+ 40 Li3(p
∗q∗3b
2 ) + 56 Li3(p
∗q∗2rb
2 ) + 56 Li3(p
∗q∗r2b
+ 40 Li3(p
∗r∗3b
2 ) + 105 Li3(p
∗q∗4b
2 ) + 160 Li3(p
∗q∗3r∗b
− 2 Li3(p
∗2q∗b
2 )− 2 Li3(p
∗2r∗b
− 28 Li3(p
∗2q∗2b
2 ) + 32 Li3(p
∗2q∗r∗b
2 )− 28 Li3(p
∗2r∗2b
+ 3Li3(p
∗3q∗) + 3 Li3(p
∗3r∗)
total p∗, q∗, r∗-degree ≥ 5
(119)
The corresponding instanton numbers
X∗,0(p
∗, q∗, r∗, 1, b∗2) =
n1,n2,n3,m2
(n1,n2,n3,m2)
p∗n1q∗n2r∗n3b∗m22
(120)
are listed in Table 3. For comparison purposes, we list the summed instanton numbers
(n1, n2, n3) n
(n1,n2,n3,0)
(n1,n2,n3,1)
(n1,n2,n3,2)
(n1,n2,n3)
(1, 0, 0) 3 3 3 9
(1, 0, 1) 12 12 12 36
(1, 0, 2) 42 42 42 126
(1, 0, 3) 120 120 120 360
(1, 1, 1) 48 48 48 144
(1, 1, 2) 168 168 168 504
(2, 0, 1) −6 −6 −6 −18
(2, 0, 2) −84 −84 −84 −252
(2, 1, 1) 96 96 96 288
(3, 0, 1) 3 0 0 3
Table 3: Instanton numbers nX
(n1,n2,n3,m2)
computed by toric mirror symmetry.
They are invariant under the exchange n2 ↔ n3, so we only display
them for n2 ≤ n3.
on X as well, see eq. (99). One observes that the sum over the more refined instanton
numbers on X∗ equals the summed instanton number on X , another clue towards X
being self-mirror.
4.7 Instanton Numbers Assuming The Self-Mirror Property
So far, we have alluded to X being possibly self-mirror, but not actually made use
of this property. Now we are going to assume the self-mirror property and, hence,
obtain the prepotential on X as
X,0(p, q, r, b1, b2) = F
X∗,0(p, q, r, b1, b2). (121)
Note that at linear and quadratic order in p we can actually recover the b1, b2 expansion
from the summed instanton numbers in Subsection 4.4 and the factorization which
we will prove in Section 6.
In contrast, for the prepotential terms at order p3 we have to use the X∗ pre-
potential to obtain the b2 expansion from eq. (119). Since this is based on a toric
computation on X
, we do not directly obtain the b1 expansion. However, note that
the fact that g1 acted torically, eq. (8a), and g2 non-torically, eq. (8b), is just a conse-
quence of the choice of coordinate system on P2×P1×P2. By a suitable coordinate
choice, we could have made any one of the four Z3 subgroups of G = Z3 × Z3 act
torically. Therefore, any combination of b1, b2 other than 1 = b
2 has to occur in the
same way in the complete series expansion of the prepotential. We conclude that the
prepotential can only depend on b1 and b2 through the combinations
i,j=0
2. (122)
This observation lets us recover the full b1, b2 expansion of the prepotential. To
summarize, we obtain
X∗,0(p, q, r, b1, b2)
i,j=0
Li3(pb
2) + 4 Li3(pqb
2) + 4 Li3(prb
+ 14 Li3(pq
2bi1b
2) + 16 Li3(pqrb
2) + 14 Li3(pr
2bi1b
+ 40 Li3(pq
3bi1b
2) + 56 Li3(pq
2rbi1b
2) + 56 Li3(pqr
2bi1b
+ 40 Li3(pr
3bi1b
2) + 105 Li3(pq
4bi1b
2) + 160 Li3(pq
3rbi1b
+ 196 Li3(pq
2r2bi1b
2) + 160 Li3(pqr
3bi1b
2) + 105 Li3(pr
4bi1b
− 2 Li3(p
2qbi1b
2)− 2 Li3(p
2rbi1b
2)− 28 Li3(p
2q2bi1b
+ 32 Li3(p
2qrbi1b
2)− 28 Li3(p
2r2bi1b
2)− 192 Li3(p
2q3bi1b
+ 440 Li3(p
2q2rbi1b
2) + 440 Li3(p
2qr2bi1b
2)− 192 Li3(p
2r3bi1b
+ 3Li3(p
3q) + 3 Li3(p
+ 9 Li3(p
3q2) + 27
i,j=0
Li3(p
3q2bi1b
+ 9 Li3(p
3q2) + 27
i,j=0
Li3(p
3q2bi1b
+ 27 Li3(p
3qr) + 81
i,j=0
Li3(p
3qrbi1b
total p, q, r-degree ≥ 6
(123)
Obtaining all of these terms required a computation of FB
in eq. (111) up to total
degree 23 in the 7 variables, which is close to the limit of what can be done with
current desktop computers. We list the instanton numbers in Table 4. Observe that
the instanton numbers sometimes do depend on the torsion part of their homology
class.
5 The Self-Mirror Property
When one speaks of a Calabi-Yau manifold Y being self-mirror, one has to indicate
which level of invariants one is referring to. In particular, one might think of four
types of invariants that are natural from the point of view of string theory. The
weakest level is just the Euler number. In general, exchanging complex structure and
Kähler moduli changes the sign of χ(Y ) = 2h11(Y )− 2h21(Y ). Therefore, a necessary
condition for Y and its mirror Y ∗ to be equal is obviously that
χ(Y ) = −χ(Y ∗) = 0. (124)
This level of invariants, however, is much too crude and therefore insufficient. A much
stronger level is based on the fact that the cohomology groups of even degree come with
(1,n2,n3,0,0)
(1,n2,n3,m1,m2)
, (m1, m2) 6= (0, 0)
0 1 2 3 4
0 1 4 14 40 105
1 4 16 56 160
2 14 56 196
3 40 160
4 105
0 1 2 3 4
0 1 4 14 40 105
1 4 16 56 160
2 14 56 196
3 40 160
4 105
(2,n2,n3,0,0)
(2,n2,n3,m1,m2)
, (m1, m2) 6= (0, 0)
0 1 2 3
0 0 −2 −28 −192
1 −2 32 440
2 −28 440
3 −192
0 1 2 3
0 0 −2 −28 −192
1 −2 32 440
2 −28 440
3 −192
(3,n2,n3,0,0)
(3,n2,n3,m1,m2)
, (m1, m2) 6= (0, 0)
0 1 2
0 0 3 36
1 3 108
0 1 2
0 0 0 27
1 0 81
Table 4: Instanton numbers nX
(n1,n2,n3,m1,m2)
computed by mirror symmetry.
The table contains all non-vanishing instanton numbers for n1+n2+
n3 ≤ 5. The entries marked in bold depend non-trivially on the tor-
sion part of their respective homology class.
an integral lattice structure and form a ring, and therefore have a product. Because
of Poincaré duality, that is, H2(Y ) = H4(Y )∨, it is sufficient to look at H2(Y ). There
is a product H2(Y )×H2(Y ) → H2(Y ) whose structure constants κijk are the triple
intersection numbers. These intersection numbers are finer invariants than just the
dimensions of the cohomology groups, and a self-mirror Calabi-Yau threefold should
satisfy
κijk(Y ) = κijk(Y
∗). (125)
For simply connected threefolds with torsion-free homology a theorem of Wall [41]
states that the cohomology groups with the intersection product κijk(Y ) together
with the second Chern class c2(Y ) determine the diffeomorphism type of Y .
If, however, Y and Y ∗ have non-trivial fundamental groups then we cannot con-
clude that easily that they are diffeomorphic. But the non-trivial fundamental group
is often reflected in torsion in homology (for example if π1(Y ) is Abelian). In that
case, the conjecture of [6] says that for any Calabi-Yau threefold Z
. (126)
Therefore, a self-mirror manifold Y = Y ∗ is expected to satisfy
. (127)
Of the many spaces Y satisfying eq. (124) there are only a few which also satisfy
eq. (125).
So far we only considered classical topology, but we know that the ring H2(Y )
experiences quantum corrections when going far away from the large volume limit.
At small volume the intersection numbers are replaced by the three-point functions
Cijk(q) of (topological) conformal field theory in eq. (69). In the large volume limit
q goes to zero and the Cijk(q) go to κijk, as expected. The Cijk(q) are characterized
by the genus zero instanton numbers n
= nd. In mathematical terms, these are
resummations of the Gromow-Witten invariants of Y and characterize the symplectic
structure of Y . This level of invariants is even stronger than the cohomology ring,
since there are examples of diffeomorphic manifolds which have different Calabi-Yau
structures, i.e. different n
[50, 51, 31]. Therefore, a self-mirror Calabi-Yau threefold
Y must satisfy
(Y ) = n
(Y ∗). (128)
One can go even further and couple the topological conformal field theory to topolog-
ical gravity and define higher genus instanton numbers n
, where now
d (Y ) = n
∗), g > 0 (129)
has to hold. These invariants are very difficult to compute, however see [52, 53]
for recent progress. We do not know whether they contain more information about
the symplectic structure than the genus zero invariants. In other words, there are
presently no examples known whose n
agree for g = 0 but differ for g > 0.
Now, one can start with any Y and use some method to construct the mirror
Y ∗. Among these are the Greene-Plesser construction in conformal field theory, or its
geometric generalizations by Batyrev and Borisov for complete intersections in toric
varieties. Then, to show that Y is self-mirror one proceeds to compute the various
invariants. The simplest condition, eq. (124), can directly be checked in terms of the
toric data. This concretely means that one starts with a mirror pair Y and Y ∗ satis-
fying eq. (124) and checks whether eqns. (125), (127), (128), and (129) are satisfied.
In fact, in Section 4 we collected a large amount of evidence in favor of the claim that
X and its Batyrev-Borisov mirror threefold X∗ are the same. Indeed, eqns. (40), (83)
and (116) show that X̃ , X , and X satisfy by construction the constraint eq. (124)
on the Euler number. More interestingly, by the identifications found in eqns. (109)
and (115) we observed that the condition on the intersection ring, eq. (125), is sat-
isfied for X and X , respectively. Next, eq. (97) and Table 3 show that X also fulfils
the requirement eq. (128) on the genus zero instanton numbers. It would be very
interesting to see whether also the condition eq. (129) for higher genus curves can be
Finally, we consider the torsion in cohomology. In Part A ?? we have shown that
≃ Z3 ⊕ Z3, (130)
as we expect from a self-mirror threefold. Moreover, we can actually compute the
fundamental group of the Batyrev-Borisov mirror independently. For that, first notice
that the quotient X
= X̃∗/G∗1 is fixed-point free, see Subsection 4.2. The mirror
permutation G∗2 onX
acts freely as well. Therefore, bothX andX∗ are free quotients
by a group isomorphic to Z3 ⊕ Z3, thus their fundamental groups are
≃ Z3 ⊕ Z3. (131)
Moreover, on can easily show that on a proper14 Calabi-Yau threefold Z one has
H2(Z,Z)tors = π1(Z)ab, the Abelianization of the fundamental group. Hence, we see
H3(X,Z)tors ≃ Z3 ⊕ Z3 ≃ H
2(X∗,Z)tors (132)
and the first of eq. (126) is true. This provides the first evidence for the conjecture
of [6] in a context other than toric hypersurfaces.
Another point of view is that there is a geometrical or rather combinatorial reason
for the self-mirror property in this case. From eqns. (36) and (39) one can easily
see that the lattice points νi, ν6+i, ν13, ν14, i = 1, . . . , 3, span a sub-polytope of ∇
satisfying the same linear relations as all the lattice points ρi of ∆
∗ in eq. (23). Hence,
14A proper Calabi-Yau threefold has holonomy group the full SU(3). In particular, this implies
that the fundamental group is finite.
this sub-polytope is isomorphic to ∆∗. The same is true for the polytopes ∇̄∗ and
∆̄∗. The toric variety P∇̄∗ which is the ambient space of X
can therefore be regarded
as a blow-up of a quotient of P∆̄∗ , the ambient space of X . Actually, this blow-up
makes all 7 divisors of X
toric. Similarly, P∇∗ can be regarded as a blow-up of a
quotient of P∆∗ . As shown in Subsection 3.3 this entails that all 19 Kähler moduli of
X̃∗ are realized torically. Note that it is possible that the mirror polytopes ∆∗ and
∇∗ are actually isomorphic. In fact, for toric hypersurfaces there are 41, 710 self-dual
polytopes [54]. The novel feature in our case is that non-isomorphic polytopes lead
to self-mirror complete intersections, consistent with the nef partitions.
6 Factorization vs. The (3,1,0,0,0) Curve
One interesting observation is that the prepotential F
X,0 at order p, see eq. (123)
in this paper and eq. (??) in Part A [1], factors into
i,j=0 b
2 times a function of
p, q, r only. This means that the instanton number for any pseudo-section (curve
contributing at order p) does not depend on the torsion part of its homology class. In
other words, for any pseudo-section there are 8 other pseudo-sections with the same
class in H2(X,Z)free and together filling up all of H2(X,Z)tors = Z3 ⊕Z3. In contrast,
this factorization does not hold at order p3. For example,
X,0(p, q, r, b1, b2) = · · ·+ 3p
b1 + b
1 + b2 + b1b2 + b
1b2 + b
2 + b1b
2 + b
+ · · · .
(133)
The purpose of this subsection is to understand this behavior.
First, the factorization of the prepotential at any order of p not divisible by 3
follows from an extra symmetry that we have not utilized so far. The covering space
X̃ is, in addition to eqns. (8a) and (8b), also invariant under another Ĝ = Z3 × Z3
action generated by (ζ
ĝ1 :
[x0 : x1 : x2] 7→ [x0 : ζx1 : ζ
[t0 : t1] 7→ [t0 : t1] (no action)
[y0 : y1 : y2] 7→ [y0 : y1 : y2] (no action)
(134a)
ĝ2 :
[x0 : x1 : x2] 7→ [x1 : x2 : x0]
[t0 : t1] 7→ [t0 : t1] (no action)
[y0 : y1 : y2] 7→ [y0 : y1 : y2] (no action)
(134b)
This symmetry has fixed points and, therefore, cannot be used if one is looking for
a smooth quotient of X̃ . However, it commutes with G and hence descends to a
Ĝ = Z3 × Z3 symmetry of X (with fixed points). Clearly, the instanton sum must
observe this additional geometric symmetry. To make use of this symmetry, we have
to express its action on the variables in F
X,0(p, q, r, b1, b2). We can do so by first noting
that the basic 81 curves
s1×s2 ⊂ X̃, s1 ∈MW (B1), s2 ∈MW (B2) (135)
are really one orbit under G × Ĝ. Recall that, after dividing out G, these curves
became the 9 sections in MW (X) = Z3 ⊕ Z3, see Part A ??. We now observe
that MW (X) = {sij} is one Ĝ-orbit; since each of these sections contributes pb
i, j = 0, . . . , 2 the induced Ĝ action on the prepotential must be
ĝ1 : F
X,0(p, q, r, b1, b2) 7→ F
X,0(b1p, q, r, b1, b2),
ĝ2 : F
X,0(p, q, r, b1, b2) 7→ F
X,0(b2p, q, r, b1, b2).
(136)
Clearly, the prepotential must be invariant under the ĝ1, ĝ2 action. While imposing
no constraint on the p3n terms in the prepotential, all other powers of p must appear
in the combination
i,j=0
, n 6≡ 0 mod 3. (137)
This proves the factorization observed at the beginning of this subsection.
Second, we would like to understand the p3q terms in eq. (133). These are the
curves in the homology classes15
(3, 1, 0, ∗, ∗) ∈ Z3 ⊕ Z3 ⊕ Z3 = H2
. (138)
We will show that the rational curves in this class come in a single family, that is, the
moduli space of genus 0 curves on X in these homology classes
X, (3, 1, 0, ∗, ∗)
(139)
is connected. In particular, all such curves have the same homology class (3, 1, 0, 0, 0)
and only contribute to p3q in the prepotential eq. (133). As discussed in Part A ??,
any such map CX : P
1 → X factors
CX //
C eX ��
��������
. (140)
15Recall that the exponent of p is the degree along the base P1. This is why we pick a basis in
H2(X,Z)free such that a curve in (n1, n2, n3,m1,m2) contributes at order p
n1qn2rn3bm11 b
2 in the
prepotential.
The map C
can be written in terms of homogeneous coordinates as a function
: P1[z0:z1] 7→ P
[x0:x1:x2]
×P1[t0:t1] ×P
[y0:y1:y2]
(141)
satisfying the equations (7a) and (7b) defining X̃ ,
F1 ◦ C eX
[z0 : z1]
= 0 = F2 ◦ C eX
[z0 : z1]
∀[z0 : z1] ∈ P
1 . (142)
The curve CX ends up in the homology class (3, 1, 0, ∗, ∗) if and only if the defining
equation (141) is of degree (3, 1, 0) in P2×P1×P2. Hence, eq. (141) is defined by
complex constants αij, βij , γi (up to rescaling) such that
xi = αi0 z0 + αi1 z1 i = 0, 1, 2
ti = βi0 z
0 + βi1 z
0z1 + βi2 z0z
1 + βi3 z
1 i = 0, 1
yi = γi i = 0, 1, 2.
(143)
These constants have to be picked such that the resulting curve lies on the complete
intersection X̃, that is, they have to satisfy eq. (142). Inserting eq. (143), we find
that F1 ◦ C eX
[z0 : z1]
is a homogeneous degree 6 polynomial in [z0 : z1]. Since
the coefficients of zk0z
1 must vanish individually, this yields 7 constraints for the
parameters αij , βij . What makes this system of constraint equations tractable is the
fact that they are all linear in βij ,
F1 ◦ C eX = 0 ⇔

A1 0 0 0 A5 0 0 0
A2 A1 0 0 A6 A5 0 0
A3 A2 A1 0 A7 A6 A5 0
A4 A3 A2 A1 A8 A7 A6 A5
0 A4 A3 A2 0 A8 A7 A6
0 0 A4 A3 0 0 A8 A7
0 0 0 A4 0 0 0 A8



= 0 (144)
where
= α300 + α
10 + α
20 A5
= α00α10α20
= 3α01α
00 + 3α11α
10 + 3α21α
20 + α
20 A6
= (α01α10 + α00α11)α20 + α00α10α21
= 3α201α00 + 3α
11α10 + 3α
21α20 A7
= α01α11α20 + (α01α10 + α00α11)α21
= α301 + α
11 + α
21 A8
= α01α11α21.
(145)
Thinking of this as 7 linear equations for the 8 parameters βij, there is always a non-
zero solution. The solution is generically unique up to an overall factor, and turns
into an Pn for special values of the αij . Moreover, the parameter space of the αij is
connected (essentially, the moduli space of lines in P2). Since we just identified the
parameter space of the (αij, βij) as a blow-up thereof, it is therefore connected as well.
It remains to satisfy F2 ◦ C eX = 0. One can easily see that the only way is to pick
the γi to be simultaneous solutions of
γ30 + γ
1 + γ
2 = 0 = γ1γ2γ3. (146)
Since two cubics intersect in 9 points, there are 9 such solutions, permuted by G.
Therefore, the parameter space of (αij , βij, γi) has 9 connected components, permuted
by the G-action. The moduli space of curves CX on X is the G-quotient of the moduli
space of curves C
on X̃, and therefore has only a single connected component. By
continuity, every curve CX in this connected family has the same homology class,
explaining the piece of the prepotential given in eq. (133).
7 Towards a Closed Formula
Putting all the information together we found out about the prepotential on X , one
can try to divine a closed form for the prepotential. We guess that the order pn terms
have the closed form
X,0(p, q, r, b1, b2)
i,j∈Z3
P (q)4P (r)4
M2n−2(q, r) (147)
if n is not a multiple of 3 and, slightly weaker, that
X,0(p, q, r, 1, 1)
P (q)4P (r)4
M2n−2(q, r) (148)
if n is a multiple of 3. Here,
• P (q) is the usual generating function of partitions eq. (4).
• The M2n−2 are polynomials in the Eisenstein series E2(q), E4(q), E6(q) and
E2(r), E4(r), E6(r), starting with
M−2(q, r) = 0
M0(q, r) = 1
M2(q, r) = E2(q)E2(r)
M4(q, r) =
E4(q)E4(r) +
E4(q)E2(r)
2 + E2(q)
2E4(r)
E2(q)
2E2(r)
M6(q, r) =
E6(q)E6(r) +
E6(q)E4(r)E2(r) + E4(q)E2(q)E6(r)
E6(q)E2(r)
3 + E2(q)
3E6(r)
E4(q)E2(q)E4(r)E2(r)
E2(q)
3E4(r)E2(r) + E4(q)E2(q)E2(r)
E2(q)
3E2(r)
M8(q, r) =
E6(q)E2(q)E6(r)E2(r) +
E4(q)E4(q)E4(r)E4(r)
E6(q)E2(q)E4(r)E4(r) + E4(q)E4(q)E6(r)E2(r)
E6(q)E2(q)E4(r)E2(r)
2 + E4(q)E2(q)
2E6(r)E2(r)
E6(q)E2(q)E2(r)
4 + E2(q)
4E6(r)E2(r)
+ 137
E4(q)E4(q)E4(r)E2(r)
2 + E4(q)E2(q)
2E4(r)E4(r)
E4(q)E4(q)E2(r)
4 + E2(q)
4E4(r)E4(r)
E4(q)E2(q)
2E4(r)E2(r)
2 + 121
E2(q)
4E2(r)
E4(q)E2(q)
2E2(r)
4 + E2(q)
4E4(r)E2(r)
(149)
They are symmetric under the exchange q ↔ r and of weight 2n in q and r
separately. But, for example, M4 above does not factor into a function of q and
a function of r. So theM2n−2 are not the products of the polynomials appearing
in the dP9 prepotential. However, by setting q = 0 or r = 0 one recovers the
corresponding polynomials in the dP9 prepotential [55].
• The E2i are the usual Eisenstein series
E2(q) = 1− 24q − 72q
2 − 96q3 − 168q4 − 144q5 − 288q6 +O(q7)
E4(q) = 1 + 240q + 2160q
2 + 6720q3 + 17520q4 + 30240q5 +O(q6)
E6(q) = 1− 504q − 16632q
2 − 122976q3 − 532728q4 +O(q5).
(150)
Note that the naive Taylor series coefficients of the prepotential are fractional, but
when expanding in terms of Li3’s (which account for the multicover contributions)
one finds integral instanton numbers.
These expressions for the prepotential agree with all instanton numbers computed
in this paper. Unfortunately, we have not been able to guess a closed formula that
includes the b1 and b2 dependence of the prepotential F
X,0(p, q, r, b1, b2)|pn if n is
divisible by 3. We expect that these involve extra functions beyond the Eisenstein
series.
8 Conclusion
In the initial paper Part A [1], we analyzed the topology of the Calabi-Yau manifold
of interest and found that
= Z3 ⊕ Z3 ⊕ Z3. (151)
Although the presence of torsion curve classes complicates the counting of rational
curves, we managed to derive the A-model prepotential to linear order in p.
The goal of this paper is to go beyond the results of Part A using mirror symmetry.
By carefully adapting methods designed for complete intersections in toric varieties, we
can apply mirror symmetry to compute the instanton numbers onX , even thoughX is
not toric. Using thatX is self-mirror, we completely solve this problem and are able to
calculate the complete A-model prepotential to any desired precision (and for arbitrary
degrees in p), limited only by computer power. Carrying out this computation, we
find the first examples of instanton numbers that do depend on the torsion part of
their integral homology class, see Table 4 on Page 35.
Since the self-mirror property of X is important, we investigate it in detail. In
doing so, we go far beyond just checking that the Hodge numbers are self-mirror.
In particular, we find that the intersection rings are identical and that torsion in
homology obeys the conjectured mirror relation [6]. Finally, going beyond classical
geometry, we independently calculate certain instanton numbers onX and its Batyrev-
Borisov mirror X∗. Again, we find that X and X∗ are indistinguishable, providing
strong evidence for X being self-mirror. Both of these results extend those found in
Part A [1].
Using these results, we are able to guess certain closed expressions for the pre-
potential of X in terms of modular forms. In certain limits it specializes to the dP9
prepotential of [55]. There it is known that the coefficients in p of the dP9 prepoten-
tial satisfy a recursion relation. Moreover, there is a gap condition, that is, a certain
number of subsequent terms in a series expansion is absent. This condition provides
sufficient data to determine the integration constants for the recursion and allows to
determine the prepotential completely, even at higher genus. We expect a similar
story to be valid for the prepotential of X .
Acknowledgments
The authors would like to thank Albrecht Klemm, Tony Pantev, and Masa-Hiko Saito
for valuable discussions. We also thank Johanna Knapp for providing a Singular [56]
code to compute the intersection ring of Calabi-Yau manifolds in toric varieties. This
research was supported in part by the Department of Physics and the Math/Physics
Research Group at the University of Pennsylvania under cooperative research agree-
ment DE-FG02-95ER40893 with the U. S. Department of Energy and an NSF Focused
Research Grant DMS0139799 for “The Geometry of Superstrings”, in part by the
Austrian Research Funds FWF grant number P18679-N16, in part by the European
Union RTN contract MRTN-CT-2004-005104, in part by the Italian Ministry of Uni-
versity (MIUR) under the contract PRIN 2005-023102 “Superstringhe, brane e inter-
azioni fondamentali”, and in part by the Marie Curie Grant MERG-2004-006374.E. S.
thanks the Math/Physics Research group at the University of Pennsylvania for kind
hospitality.
A Triangulation of ∇̄∗ and ∇∗
In principle the coherent triangulations of the fan over ∇̄∗ can be computed with
TOPCOM by finding the 720 star triangulations in the total of 230, 832 coherent tri-
angulations of ∇̄∗. The discussion of the symmetry properties is greatly facilitated,
however, by an explicit understanding of their structure. We will work out the trian-
gulations by first triangulating the facets and then checking the compatibility of their
maximal intersections and the coherence of the resulting star triangulations.
We start with a couple of useful definitions. A circuit is a minimal collection of n
affinely dependent points p1, . . . , pn,
λ1p1 + . . . λnpn = 0 with λ1 + . . .+ λn = 0, λi 6= 0, (152)
any proper subset of which is affinely independent. The coefficient vector λn hence has
nonzero entries and is unique up to a prefactor. We indicate the unique separation into
points with positive and negative coefficients with the notation 〈pi1 . . . pis|pis+1 . . . pin〉.
Each circuit admits two different triangulations, which are obtained by dropping one
of the points with positive coefficients and one of the remaining points, respectively.
We indicate this with a hat over the relevant subset. The two resulting triangulations
̂pi1 . . . pis |pis+1 . . . pin
pi1 . . . pis| ̂pis+1 . . . pin
(153)
hence consist of s and n− s simplices, respectively. If the first point is in the convex
hull of the others, that is, s = 1, then only one of the triangulations is maximal (all
points are vertices of at least one simplex).
Furthermore, we introduce the notation:
ai = ν̄i, bi = ν̄3+i, ci = ν̄6+i, di = ν̄12+i, i = 1, 2, 3,
e = ν̄13, f = ν̄14.
(154)
Among these 14 vectors in eq. (154) there are 9 independent linear relations, see
eq. (37),
a1 + a2 + a3 = 0, c1 + c2 + c3 = 0,
e+ f = 0, bi = ai + e, dl = cl + f,
(155)
which imply others like ai + bj = aj + bi and ai + cl = bi + dl or e =
(b1 + b2 + b3)
and f = 1
(d1 + d2 + d3).
Lemma 1. ∇̄∗ has 15 facets, 6 of which are simplicial:
aiajbibjclcmdldm] i<j
aiajd1d2d3
b1b2b3clcm
. (156)
The nine non-simplicial facets form an orbit under the permutation symmetries
Zab3 × Z
3 generated by gab : (
( ai+1
and gcd :
( cl+1
. According to the
linear relations eq. (155) the eight points on each non-simplicial facet form quadratic
circuits ai+ bj = bi+aj , ai+ cl = bi+dl, and cl+dm = cm+dl, which we call mixed if
they contain vertices of both elements of the nef partition 〈aicl|bidl〉, and pure circuits
〈aibj |biaj〉, 〈cldm|cmdl〉 otherwise.
The coherent triangulations of the facets [aiajbibjclcmdldm] are most easily ob-
tained from their Gale transform
1 −1 −1 1 0 0 0 0
1 0 −1 0 1 0 −1 0
0 0 0 0 1 −1 −1 1
 , (157)
which is the coefficient matrix of the basis ai−aj −bi+bj = 0, ai−bi+cl−dl = 0 and
cl− cm−dl+dm = 0 of linear relations. The coherent triangulations are in one-to-one
correspondence to chambers that are seperated by the facets of the cones generated
by all linear bases µ = {v1, v2, v3} with vi selected among the 8 column vectors of the
Gale transform [57, 58]. In the present case the cones over the faces of the parallel-
epiped in Figure 1 are subdivided into 24 chambers, which are indicated by dashed
lines. The triangulations, which we can label by the facet containing and the edge
adjoining the chamber, are obtained as the sets of complements of those bases µ that
span a cone containing the respective chamber.
Hence, each non-simplicial facet has 24 coherent triangulations, which can be
characterized by the triangulations of its 2 pure and of its 4 mixed circuits: Calling
the triangulation 〈âicl|bidl〉 positive and the triangulation 〈aicl|b̂idl〉 negative, and
arranging the cyclic permutations gab and gcd in the horizontal and vertical direction,
respectively, we can assign one of 16 different types ± ±± ± to each triangulation, where
Figure 1: Secondary fan of the non-simplicial facets. Chambers are indicated
by dashed lines.
the signs indicate the induced triangulations of the mixed circuits. The constraints
that reduce the a priori 32 = 26 combinations to 24 all derive from the following rules:
ai aj
bi bj
aicl|b̂idl
âjcl|bjdl
aibj |âjbi
âicl|bidl
ajcl|b̂jdl
âibj |ajbi
〉 (158)
i.e. a triangular prism can be triangulated in 6 different ways, which correlates the a
priori 8 combinations of the triangulations of the 3 squares (with analogous constraints
for the two “horizontal” prisms [aibiclcmdldm] contained in the facet [aiajbibjclcmdldm]).
Putting the pieces together we obtain
Lemma 2. The 24 triangulations of the non-simplicial facets can be assorted as fol-
lows:
• For + ++ + ,
− − the pure circuits are unconstrained, yielding 2 · 2
2 = 8 triangu-
lations.
• For + +− − ,
+ + the pure ab-circuit is unconstrained; with the transposed types
+ − ,
− + this accounts for another 8 triangulations.
• The final 8 triangulations come from the 8 types with an odd number of positive
signs, for which the triangulation of the pure circuits is unique.
• The two types + −− + and
+ − cannot occur because of contradictory implications
for the triangulations of the pure circuits.
The secondary fan and the induced triangulations for the codimension-two faces at
which the non-simplical facets intersect can be obtained from Figure 1 by projection
along the dropped vertices. The secondary fan of the prism of eq. (158), for example,
which is shown in Figure 2, is obtained from Figure 1 by projection along the diagonal
Taibj
〈aicl|bidl〉s
Taicl
〈ajcl|bjdl〉
Tajcl
〈ajbi|aibj〉
Tajbi
〈bidl|aicl〉 s
Tbidl
��〈ajcl|bjdl〉
bi Tbjdl
〈ajbi|aibj〉
Figure 2: Secondary fan of the codimension two face [aiajbibjcldl].
〈cmdm〉. The wall crossings between the six cones in Figure 2 are labeled by the circuits
whose flops relate the adjoining triangulations [57].
For the construction of the complete star triangulation we now observe that the
non-simplicial intersections of the 9 non-simplicial facets [aiajbibjclcmdldm] are given
by the 18 triangular prisms [aiajbibjcldl] and [aibiclcmdldm]. If we interpret the former
as vertices and the latter as links then the resulting compatibility conditions corre-
spond to a graph with the topology of a torus. The vertices of this graph are decorated
by signs as shown in Table 5 and connected by horizontal and vertical links. The re-
1 · 26 9 · 22 18 · 22 6 · 24 36 · 20 36 · 21 9 · 22
Table 5: The 824 = 2 (64 + 36 + 72 + 96 + 36 + 72 + 36) star triangulations of
∇∗, including the 720 = 2 (36 + 36+ 72+ 72+ 36+ 72+ 36) coherent
triangulations.
striction on the compatible signs is due to the absence of the inconsistent types + −− +
and − ++ − as subgraphs on the torus. The multiplicities µ · 2
n come from the number
n of unconstrained pure circuits and from the order µ of the effective part of the
symmetry group generated by transposition and permutations of lines and columns.
We thus find a total of 824 triangulations. The cyclic permutation symmetry that
we want to keep on the Calabi-Yau manifold X
amounts to a diagonal shift, i.e.
its induced action on the graph is generated by gabgcd. We are hence left with the
types
, and the shift symmetry furthermore aligns the triangulations of
the pure circuits and thus reduced the multiplicities from 26 to 22, yielding a total of
8 triangulations for which P∇̄∗ is G
2 symmetric.
The resulting triangulations of the facet [a2a3b2b3c2c3d2d3] are
triangulation of [a2a3b2b3c2c3d2d3]
〈â2b3|a3b2〉, 〈ĉ2d3|c3d2〉 {[a3b2b3d2d3], [b2b3c3d2d3], [a2a3b2d2d3], [b2b3c2c3d2]}
〈â2b3|a3b2〉, 〈c2d3|ĉ3d2〉 {[a3b2b3d2d3], [b2b3c2d2d3], [a2a3b2d2d3], [b2b3c2c3d3]}
〈a2b3|â3b2〉, 〈ĉ2d3|c3d2〉 {[a2b2b3d2d3], [b2b3c3d2d3], [a2a3b3d2d3], [b2b3c2c3d2]}
〈a2b3|â3b2〉, 〈c2d3|ĉ3d2〉 {[a2b2b3d2d3], [b2b3c2d2d3], [a2a3b3d2d3], [b2b3c2c3d3]}
(159)
triangulation of [a2a3b2b3c2c3d2d3]
〈a2b3|â3b2〉, 〈c2d3|ĉ3d2〉 {[a2a3b3c2c3], [a2a3c2c3d3], [a2b2b3c2c3], [a2a3c2d2d3]}
〈a2b3|â3b2〉, 〈ĉ2d3|c3d2〉 {[a2a3b3c2c3], [a2a3c2c3d2], [a2b3b2c2c3], [a2a3c3d2d3]}
〈â2b3|a3b2〉, 〈c2d3|ĉ3d2〉 {[a2a3b2c2c3], [a2a3c2c3d3], [a3b2b3c2c3], [a2a3c2d2d3]}
〈â2b3|a3b2〉, 〈ĉ2d3|c3d2〉 {[a2a3b2c2c3], [a2a3c2c3d2], [a3b2b3c2c3], [a2a3c3d2d3]}
(160)
It can be checked that the triangulations listed in eqns. (159) and (160) come from the
chambers contained in the cones over [a2a3c2c3] and [b2b3d2d3], respectively. For the
first of these triangulations we consider the chamber adjoining the edge [a2c2], which
is contained in the span of the four bases µ1 = {a2c2c3}, µ2 = {a2a3c2}, µ3 = {b3c2c3}
and µ4 = {a2a3d3}, whose complements are [a3b2b3d2d3], [b2b3c3d2d3], [a2a3b2d2d3] and
[b2b3c2c3d2] in agreement with the first triangulation in eq. (159).
Unfortunately, coherent triangulations of the facets that induce the same trian-
gulations on their common (maximal) intersections do not automatically combine to
coherent star triangulations of the polytope, and indeed only 720 of the 824 triangu-
lations in Table 5 turn out to be coherent. The non-coherent ones are easily isolated
by observing that coherent triangulations (via their height functions) induce coherent
triangulations of the prisms [a1a2a3b1b2b2] and [c1c2c3d1d2d3], which eliminates the
triangulations for which Zab3 or Z
3 is not broken by the triangulation of the pure
circuits. For the triangulation types
this reduces the multiplicity from
82 to 62. The only other affected types are the ones in the middle column of Table 5,
which have unbroken horizontal symmetry and for which the multiplicity is reduced
from 12 · 8 to 12 · 6. This poses a problem for the eight Z3-symmetric triangulations,
which are all non-coherent. Coherence of the remaining 720 triangulations can be
established by checking that their Mori cones are all strictly convex [59].
What comes to our rescue is that, even if all projective ambient spaces break the
diagonal Z3 permutation symmetry, it may be preserved on X
if the obstructing ex-
ceptional sets do not overlap with the complete intersection. In the present case these
are the blow-ups of the singularities coming from the pure circuits, i.e. codimension
two sets of the form ai · bj or cl · dm, where we use, for simplicity, the symbol of the
vertex ν̄j for the corresponding divisor Dj . Recall from eq. (77) that X
is given by
the product D̄∗0,1 · D̄
0,2 of the divisors
D̄∗0,1 = a1 + a2+ a3 + b1 + b2 + b3 + e, D̄
0,2 = c1 + c2 + c3+ c1+ d2+ d3+ f (161)
defined by the nef partition. Taking into account the five linear equivalences, we
observe that
a1 + b1 = a2 + b2 = a3 + b3, c1 + d1 = c2 + d2 = c3 + d3,
b1 + b2 + b3 + e = d1 + d2 + d3 + f,
(162)
for divisor classes in the intersection ring. We first show that e and f do not intersect
: In any maximal triangulation e and f belong only to the simplices
b̂1b2b3cmcle
d̂1d2d3amalf
, (163)
respectively, so that
e·ai = e·dl = e·f = 0 ⇒ e·D̄
0,1 = e·(b1+b2+b3+e) = e·(d1+d2+d3+f) = 0 (164)
and similarly f · D̄∗0,2 = 0. Putting everything together, we conclude that
a1 · b2 · D̄
0,1 = a1 · b2 · 3(a3 + b3) = 0 (165)
because none of the facets, and hence no triangle in any of the triangulations contains
{a1, b2, a3} or {a1, b2, b3} as a subset. Similarly cl · dm · D̄
0,2 = 0 in the intersection
ring for l 6= m. Consequently, all exceptional sets arising from triangulations of
pure circuits do not intersect X
and hence do not obstruct the cyclic permutation
symmetry G∗2. We will denote any of the remaining 36 coherent triangulations of type
by T+ = T+(∇̄
∗) and T− = T−(∇̄
∗), respectively.
The polytope ∇∗ of the mirror X̃∗ of the universal cover has 39 lattice points, with
the same 12 vertices as ∇̄∗ but living on the finer lattice M̄ . The 24 additional lattice
points, see eq. (39), are
aij =
(ai + 2aj), bij =
(bi + 2bj), (166)
cij =
(ci + 2cj), dij =
(di + 2dj), (167)
where i 6= j. These additional points are all located on edges of ∇∗. It is natural
to consider triangulations that are refinements of the ones that we just discussed.
Observing that the additional points turn all simplices in eqns. (159), (160) and (163)
into pyramids over a tetrahedron with interior points on opposite edges it is easy to see
that the maximal triangulations are unique and multiply the number 54 = 9 · 4+6 · 3
of triangles in the original triangulations by a factor of 9. The resulting triangulations
have been used to show that the divisors corresponding to the vertices aij and cij do
not intersect X̃∗.
B The Flop of X∗
In Subsection 4.5 we have taken into account only one of the triangulations T+(∇̄
We can repeat the same calculation with one of the triangulations T−. We denote
the resulting Calabi-Yau manifold by X
−. Skipping the details, we find that the
generators of the Mori cone NE(X
−) can be expressed in terms of those of NE(X
in eq. (104) as
− = l̄
+ + 3
+ + l̄
+ + l̄
+ + l̄
− = l̄
+ + 3
+ + l̄
+ + l̄
− = l̄
+ + l̄
− = − l̄
− = − l̄
+ − l̄
+ − l̄
+ − l̄
+ − l̄
− = l̄
− = l̄
(168)
One can also express the dual basis of divisors J̄ ′i on the flop in terms of the dual basis
J̄∗i on X , see eq. (105). We find
J̄ ′1 = J̄
1 , J̄
2 = J̄
2 , J̄
5 = 3J̄
1 + 3J̄
2 − J̄
J̄ ′3 = 3J̄
1 + J̄
3 − J̄
5 , J̄
4 = 3J̄
1 + J̄
3 − J̄
J̄ ′6 = 3J̄
2 − J̄
5 + J̄
6 , J̄
7 = 3J̄
2 − J̄
5 + J̄
(169)
The intersection ring is
J̄ ′21 J̄
2 = 1, J̄
3 = 2, J̄
4 = 1, J̄
5 = 3, J̄
6 = 3,
J̄ ′21 J̄
7 = 3, J̄
2 = 1, J̄
3 = 3, J̄
4 = 3, J̄
5 = 3,
J̄ ′1J̄
6 = 3, J̄
7 = 3, J̄
3 = 6, J̄
4 = 6, J̄
5 = 9,
J̄ ′1J̄
6 = 9, J̄
7 = 9, J̄
4 = 3, J̄
5 = 9, J̄
6 = 9,
J̄ ′1J̄
7 = 9, J̄
5 = 9, J̄
6 = 9, J̄
7 = 9, J̄
6 = 9,
J̄ ′1J̄
7 = 9, J̄
7 = 9, J̄
3 = 3, J̄
4 = 3, J̄
5 = 3,
J̄ ′22 J̄
6 = 2, J̄
7 = 1, J̄
3 = 9, J̄
4 = 9, J̄
5 = 9,
J̄ ′2J̄
6 = 9, J̄
7 = 9, J̄
4 = 9, J̄
5 = 9, J̄
6 = 9,
J̄ ′2J̄
7 = 9, J̄
5 = 9, J̄
6 = 9, J̄
7 = 9, J̄
6 = 6,
J̄ ′2J̄
7 = 6, J̄
7 = 3, J̄
3 = 18, J̄
4 = 18, J̄
5 = 27,
J̄ ′23 J̄
6 = 27, J̄
7 = 27, J̄
4 = 18, J̄
5 = 27, J̄
6 = 27,
J̄ ′3J̄
7 = 27, J̄
5 = 27, J̄
6 = 27, J̄
7 = 27, J̄
6 = 27,
J̄ ′3J̄
7 = 27, J̄
7 = 27, J̄
4 = 9, J̄
5 = 27, J̄
6 = 27,
J̄ ′24 J̄
7 = 27, J̄
5 = 27, J̄
6 = 27, J̄
7 = 27, J̄
6 = 27,
J̄ ′4J̄
7 = 27, J̄
7 = 27, J̄
5 = 27, J̄
6 = 27, J̄
7 = 27,
J̄5J̄
6 = 27, J̄
7 = 27, J̄5J̄
7 = 27, J̄
6 = 18, J̄
7 = 18,
J̄ ′6J̄
7 = 18, J̄
7 = 9.
(170)
The second Chern class is
· J̄ ′1 = 12, c2
· J̄ ′5 = 18, c2
· J̄ ′2 = 12,
· J̄ ′3 = c2
· J̄ ′6 = 24, c2
· J̄ ′4 = c2
· J̄ ′7 = 30.
(171)
We observe that both the intersection ring and the second Chern class cannot be
brought into (11) and (110) by a linear transformation with integer coefficients, re-
spectively. Hence, the second phase really is topologically distinct.
We denote the Fourier-transformed variables in the B-model prepotential (71) by
q′i, i = 1, . . . , 7. With this notation, we obtain
(q′1, . . . , q
7) = 3 q
1 + 3 q
2 + 3 q
+ 3 q′3q
5 + 3 q
+ 3 q′3q
5 + 3 q
6 + 3 q
12333
12333
+ 3 q′1q
+ 3 q′2q
− 6 q′1q
5 − 6 q
+ 3 q′3q
6 + 3 q
7 +O(q
(172)
The instanton numbers on X
− are the expansion coefficients in
(q′1, . . . , q
7, 1) =
n1,...,n7
(n1,n2,n3,n4,n5,n6,n7)
. (173)
Up to degree 4, they read
1,0,0,0,0,0,0 = 3 n
0,1,0,0,0,0,0 = 3 n
0,0,0,0,1,0,0 = 3 n
2,0,0,0,0,0,0 = −6
0,2,0,0,0,0,0 = −6 n
3,0,0,0,0,0,0 = 27 n
0,3,0,0,0,0,0 = 27 n
4,0,0,0,0,0,0 = −192
0,4,0,0,0,0,0 = −192 n
0,0,1,0,1,0,0 = 3 n
0,0,0,0,1,1,0 = 3 n
0,0,1,1,1,0,0 = 3
0,0,1,0,1,1,0 = 3 n
0,0,0,0,1,1,1 = 3 n
1,0,0,3,0,0,0 = 3 n
0,1,0,0,0,0,3 = 3
1,0,1,1,1,0,0 = −6 n
0,1,0,0,1,1,1 = −6 n
0,0,1,1,1,1,0 = 3 n
0,0,1,0,1,1,1 = 3
(174)
It is easy to check that the symmetry G∗2 acts without fixed points on X
− so that there
are two phases of the quotient X∗, too, with h1,1(X∗) = h1,2(X∗) = 3 and fundamental
group π1(X
∗) = Z3×Z3. They correspond to the two classes of triangulations T+ and
T−. The first phase was studied in detail in Subsection 4.5 and 4.6.
We denote the Calabi-Yau manifold in the second phase by X∗− = X
2. From
the linear equivalence relations eq. (106) and the definition eq. (169) we can compute
the induced group action on H2(X
−,Z) and find

J̄ ′1
J̄ ′2
J̄ ′3
J̄ ′4
J̄ ′5
J̄ ′6
J̄ ′7


1 0 0 0 0 0 0
0 1 0 0 0 0 0
3 0 0 −1 1 0 0
3 0 1 −1 0 0 0
0 0 0 0 1 0 0
0 3 0 0 1 0 −1
0 3 0 0 0 1 −1


J̄ ′1
J̄ ′2
J̄ ′3
J̄ ′4
J̄ ′5
J̄ ′6
J̄ ′7

. (175)
In terms of the three invariant divisors J ′1 = J̄
2 = J̄
3 = J̄
2 the intersection ring
and the second Chern class of X∗− then are
J ′22 J
3 = 1, J
2 = 3, J
3 = 1, J
3 = 3,
J ′21 J
2 = 9, J
3 = 3, J
3 = 9, J
1 = 27,
1 = 18, c2 J
2 = 12, c2 J
3 = 12.
(176)
Again, we observe that there is no linear basis transformation with integer coefficients
that brings both the intersection ring and the second Chern class into (90). Hence,
also the phase X∗− is topologically distinct from X
(n1, n2, n3) n
(n1,n2,n3,0)
(n1,n2,n3,1)
(n1,n2,n3,2)
(n1,n2,n3)
(1, 0, 0) 3 0 0 3
(2, 0, 0) −6 0 0 −6
(3, 0, 0) 18 0 0 18
(0, 1, 0) 3 3 3 9
(1, 1, 0) −6 −6 −6 −18
(1, 1, 1) 12 12 12 36
(2, 1, 0) 15 15 15 45
(1, 2, 0) 12 12 12 36
Table 6: Instanton numbers n
(n1,n2,n3,m2)
computed by toric mirror symmetry.
They are invariant under the exchange n2 ↔ n3, so we only display
them for n2 ≤ n3.
To give a geometrical interpretation of what happens, we look at the induced action
of G∗2 on the toric Mori cone NE(X
−). Only the generators l̄
− , l̄
− and l̄
− in eq. (168)
are invariant. This is exactly as in the first phase. Denoting the invariant generators
± , l̄
± , l̄
± by l
± , l
± , l
± , respectively, we observe that phase T− is obtained from
the phase T+ as a flop by the curve corresponding to the generator l
− = −l
+ , l
− = l
+ + 3l
+ , l
− = l
+ + 3l
+ . (177)
If we use the realization of X in terms of the fiber product of two dP9 surfaces, the
above result means that the base P1 of X has been flopped.
Furthermore, having computed the G∗2-action in eq. (175), we determine the de-
scent equation for the prepotential to be
p′, q′, r′, b′1, b
|G∗2|
p′, q′, b′2, b
′, b′2
2, b′2
2, b′1
. (178)
The corresponding instanton numbers
′, q′, r′, 1, b′2) =
n1,n2,n3,m2
(n1,n2,n3,m2)
p′n1q′n2r′n3b′2
(179)
are listed in Table 6.
Bibliography
[1] V. Braun, M. Kreuzer, B. A. Ovrut, and E. Scheidegger, “Worldsheet
instantons and torsion curves. Part A: Direct computation,” hep-th/0703182.
(document), 1, 1, 1, 2.2, 2.3, 3.2, 4.1, 6, 8, 8
http://arXiv.org/abs/hep-th/0703182
[2] P. Candelas, X. C. De La Ossa, P. S. Green, and L. Parkes, “A pair of
Calabi-Yau manifolds as an exactly soluble superconformal theory,” Nucl. Phys.
B359 (1991) 21–74. 1
[3] C. Schoen, “On fiber products of rational elliptic surfaces with section,” Math.
Z. 197 (1988), no. 2, 177–199. 1, 2.1
[4] V. Braun, B. A. Ovrut, T. Pantev, and R. Reinbacher, “Elliptic Calabi-Yau
threefolds with Z(3) x Z(3) Wilson lines,” JHEP 12 (2004) 062,
hep-th/0410055. 1
[5] P. S. Aspinwall, D. R. Morrison, and M. Gross, “Stable singularities in string
theory,” Commun. Math. Phys. 178 (1996) 115–134, hep-th/9503208. 1
[6] V. Batyrev and M. Kreuzer, “Integral Cohomology and Mirror Symmetry for
Calabi-Yau 3-folds,” math.AG/0505432. 1, 1, 3.3, 4.1, 5, 5, 8
[7] M. Gross and S. Pavanelli, “A Calabi-Yau threefold with Brauer group
(Z/8Z)2,” math.AG/0512182. 1
[8] S. Ferrara, J. A. Harvey, A. Strominger, and C. Vafa, “Second quantized mirror
symmetry,” Phys. Lett. B361 (1995) 59–65, hep-th/9505162. 1
[9] P. S. Aspinwall, “An N=2 Dual Pair and a Phase Transition,” Nucl. Phys.
B460 (1996) 57–76, hep-th/9510142. 1
[10] E. Lima, B. A. Ovrut, and J. Park, “Five-brane superpotentials in heterotic
M-theory,” Nucl. Phys. B626 (2002) 113–164, hep-th/0102046. 1
[11] E. Lima, B. A. Ovrut, J. Park, and R. Reinbacher, “Non-perturbative
superpotential from membrane instantons in heterotic M-theory,” Nucl. Phys.
B614 (2001) 117–170, hep-th/0101049. 1
[12] E. I. Buchbinder, R. Donagi, and B. A. Ovrut, “Superpotentials for vector
bundle moduli,” Nucl. Phys. B653 (2003) 400–420, hep-th/0205190. 1
[13] E. I. Buchbinder, R. Donagi, and B. A. Ovrut, “Vector bundle moduli
superpotentials in heterotic superstrings and M-theory,” JHEP 07 (2002) 066,
hep-th/0206203. 1
[14] E. Buchbinder and B. A. Ovrut, “Vector bundle moduli,” Russ. Phys. J. 45
(2002) 662–669. 1
[15] E. I. Buchbinder and B. A. Ovrut, “Vacuum stability in heterotic M-theory,”
Phys. Rev. D69 (2004) 086010, hep-th/0310112. 1
http://arXiv.org/abs/hep-th/0410055
http://arXiv.org/abs/hep-th/9503208
http://arXiv.org/abs/math.AG/0505432
http://arXiv.org/abs/math.AG/0512182
http://arXiv.org/abs/hep-th/9505162
http://arXiv.org/abs/hep-th/9510142
http://arXiv.org/abs/hep-th/0102046
http://arXiv.org/abs/hep-th/0101049
http://arXiv.org/abs/hep-th/0205190
http://arXiv.org/abs/hep-th/0206203
http://arXiv.org/abs/hep-th/0310112
[16] E. Buchbinder, R. Donagi, and B. A. Ovrut, “Vector bundle moduli and small
instanton transitions,” JHEP 06 (2002) 054, hep-th/0202084. 1
[17] V. Braun, Y.-H. He, B. A. Ovrut, and T. Pantev, “The exact MSSM spectrum
from string theory,” JHEP 05 (2006) 043, hep-th/0512177. 1
[18] V. Braun, Y.-H. He, B. A. Ovrut, and T. Pantev, “A heterotic standard
model,” Phys. Lett. B618 (2005) 252–258, hep-th/0501070. 1
[19] V. Braun, Y.-H. He, B. A. Ovrut, and T. Pantev, “A standard model from the
E(8) x E(8) heterotic superstring,” JHEP 06 (2005) 039, hep-th/0502155. 1
[20] V. Braun, Y.-H. He, B. A. Ovrut, and T. Pantev, “Vector bundle extensions,
sheaf cohomology, and the heterotic standard model,” Adv. Theor. Math. Phys.
10 (2006) 4, hep-th/0505041. 1
[21] V. Braun, Y.-H. He, B. A. Ovrut, and T. Pantev, “Heterotic standard model
moduli,” JHEP 01 (2006) 025, hep-th/0509051. 1
[22] V. Braun, Y.-H. He, B. A. Ovrut, and T. Pantev, “Moduli dependent mu-terms
in a heterotic standard model,” hep-th/0510142. 1
[23] V. Braun, Y.-H. He, and B. A. Ovrut, “Yukawa couplings in heterotic standard
models,” JHEP 04 (2006) 019, hep-th/0601204. 1
[24] V. Braun, Y.-H. He, and B. A. Ovrut, “Stability of the minimal heterotic
standard model bundle,” JHEP 06 (2006) 032, hep-th/0602073. 1
[25] V. Braun and B. A. Ovrut, “Stabilizing moduli with a positive cosmological
constant in heterotic M-theory,” JHEP 07 (2006) 035, hep-th/0603088. 1
[26] P. S. Aspinwall and D. R. Morrison, “Chiral rings do not suffice: N=(2,2)
theories with nonzero fundamental group,” Phys. Lett. B334 (1994) 79–86,
hep-th/9406032. 1
[27] V. Braun, M. Kreuzer, B. A. Ovrut, and E. Scheidegger, “Worldsheet
instantons, torsion curves, and non-perturbative superpotentials,”
hep-th/0703134. 1, 1
[28] S. Hosono, M. Saito, and J. Stienstra, “On the mirror symmetry conjecture for
Schoen’s Calabi-Yau 3-folds,” alg-geom/9709027. Given at Taniguchi
Symposium on Integrable Systems and Algebraic Geometry, Kyoto, Japan, 7-11
Jul 1997. 1, 3.5, 10
[29] D. Lust, S. Reffert, E. Scheidegger, and S. Stieberger, “Resolved toroidal
orbifolds and their orientifolds,” hep-th/0609014. 2.1
http://arXiv.org/abs/hep-th/0202084
http://arXiv.org/abs/hep-th/0512177
http://arXiv.org/abs/hep-th/0501070
http://arXiv.org/abs/hep-th/0502155
http://arXiv.org/abs/hep-th/0505041
http://arXiv.org/abs/hep-th/0509051
http://arXiv.org/abs/hep-th/0510142
http://arXiv.org/abs/hep-th/0601204
http://arXiv.org/abs/hep-th/0602073
http://arXiv.org/abs/hep-th/0603088
http://arXiv.org/abs/hep-th/9406032
http://arXiv.org/abs/hep-th/0703134
http://arXiv.org/abs/alg-geom/9709027
http://arXiv.org/abs/hep-th/0609014
[30] M. Kreuzer, “Toric geometry and Calabi-Yau compactifications,”
hep-th/0612307. 3
[31] A. Klemm, M. Kreuzer, E. Riegler, and E. Scheidegger, “Topological string
amplitudes, complete intersection Calabi-Yau spaces and threshold corrections,”
JHEP 05 (2005) 023, hep-th/0410018. 3, 3.5, 4.1, 4.1, 10, 5
[32] V. V. Batyrev, “Dual polyhedra and mirror symmetry for Calabi-Yau
hypersurfaces in toric varieties,” J. Alg. Geom. 3 (1994) 493–545. 3.2
[33] L. A. Borisov, “Towards the Mirror Symmetry for Calabi-Yau Complete
Intersections in Gorenstein Toric Fano Varieties,” arXiv:alg-geom/9310001.
[34] V. V. Batyrev and L. A. Borisov, “On Calabi-Yau complete intersections in
toric varieties,” in Higher-dimensional complex varieties (Trento, 1994),
pp. 39–65. de Gruyter, Berlin, 1996. alg-geom/9412017. 3.2
[35] M. Kreuzer and H. Skarke, “PALP: A Package for analyzing lattice polytopes
with applications to toric geometry,” Comput. Phys. Commun. 157 (2004)
87–106, math.na/0204356. 3.2, 4.1
[36] V. V. Batyrev and L. A. Borisov, “Mirror duality and string-theoretic Hodge
numbers,” Invent. Math. 126 (1996) 183, alg-geom/9509009. 3.2
[37] V. V. Batyrev, “On the classification of smooth projective toric varieties,”
Tohoku Math. J., II. Ser. 43 (1991) 569 – 585. 3.3
[38] J. Stienstra, “Resonant Hypergeometric Systems and Mirror Symmetry,” in
Integrable systems and algebraic geometry, M. H. S. et al., ed., pp. 412 – 452.
World Scientific, Singapore, 1998. alg-geom/9711002. 3.3, 3.5
[39] V. Danilov, “Geometry of Toric Varieties,” Russ. Math. Surv. 33 (1978) 97 –
154. 3.3, 4.1
[40] V. V. Batyrev, “Quantum Cohomology Rings of Toric Manifolds,” Astérisque
(1993), no. 218, 9–34, alg-geom/9310004. 3.3, 3.4
[41] C. T. C. Wall, “Classification Problems in Differential Topology. V: On Certain
6- manifolds,” Invent. Math. 1 (1966) 355 – 374. Corrigendum. Ibid. 2, 306
(1967). 3.3, 5
[42] P. Berglund, S. H. Katz, and A. Klemm, “Mirror symmetry and the moduli
space for generic hypersurfaces in toric varieties,” Nucl. Phys. B456 (1995)
153–204, hep-th/9506091. 3.4
http://arXiv.org/abs/hep-th/0612307
http://arXiv.org/abs/hep-th/0410018
http://arXiv.org/abs/arXiv:alg-geom/9310001
http://arXiv.org/abs/alg-geom/9412017
http://arXiv.org/abs/math.na/0204356
http://arXiv.org/abs/alg-geom/9509009
http://arXiv.org/abs/alg-geom/9711002
http://arXiv.org/abs/alg-geom/9310004
http://arXiv.org/abs/hep-th/9506091
[43] D. A. Cox and S. Katz, Mirror symmetry and algebraic geometry, vol. 68 of
Mathematical Surveys and Monographs. American Mathematical Society,
Providence, RI, 1999. 3.4, 3.5, 3.5
[44] A. B. Givental, “Equivariant Gromov-Witten invariants,” Internat. Math. Res.
Notices (1996), no. 13, 613–663, alg-geom/9603021. 3.5
[45] A. B. Givental, “A mirror theorem for toric complete intersections,” in
Topological field theory, primitive forms and related topics (Kyoto, 1996),
vol. 160 of Progr. Math., pp. 141–175. Birkhäuser Boston, Boston, MA, 1998.
alg-geom/9701016. 3.5
[46] S. Hosono, A. Klemm, S. Theisen, and S.-T. Yau, “Mirror symmetry, mirror
map and applications to complete intersection Calabi-Yau spaces,” Nucl. Phys.
B433 (1995) 501–554, hep-th/9406055. 3.5, 3.5
[47] D. A. Cox, “The Homogeneous Coordinate Ring of a Toric Variety,” J. Algebr.
Geom. 4 (1995) 17 – 50, alg-geom/9210008. 4.1
[48] M. Kreuzer, “The Mirror map for invertible LG models,” Phys. Lett. B328
(1994) 312–318, hep-th/9402114. 4.2
[49] P. S. Aspinwall, C. A. Lutken, and G. G. Ross, “Construction and couplings of
mirror manifolds,” Phys. Lett. B241 (1990) 373–380. 4.2
[50] M. Gross, “The deformation space of Calabi-Yau n-folds with canonical
singularities can be obstructed,” in Mirror symmetry, II, vol. 1 of AMS/IP
Stud. Adv. Math., pp. 401–411. Amer. Math. Soc., Providence, RI, 1997. 5
[51] Y. Ruan, “Topological sigma model and Donaldson-type invariants in Gromov
theory,” Duke Math. J. 83 (1996), no. 2, 461–500. 5
[52] M.-x. Huang, A. Klemm, and S. Quackenbush, “Topological string theory on
compact Calabi-Yau: Modularity and boundary conditions,” hep-th/0612125.
[53] T. W. Grimm, A. Klemm, M. Marino, and M. Weiss, “Direct Integration of the
Topological String,” hep-th/0702187. 5
[54] M. Kreuzer and H. Skarke, “Complete classification of reflexive polyhedra in
four dimensions,” Adv. Theor. Math. Phys. 4 (2002) 1209–1230,
hep-th/0002240. 5
[55] S. Hosono, M. H. Saito, and A. Takahashi, “Holomorphic anomaly equation and
BPS state counting of rational elliptic surface,” Adv. Theor. Math. Phys. 3
(1999) 177–208, hep-th/9901151. 7, 8
http://arXiv.org/abs/alg-geom/9603021
http://arXiv.org/abs/alg-geom/9701016
http://arXiv.org/abs/hep-th/9406055
http://arXiv.org/abs/alg-geom/9210008
http://arXiv.org/abs/hep-th/9402114
http://arXiv.org/abs/hep-th/0612125
http://arXiv.org/abs/hep-th/0702187
http://arXiv.org/abs/hep-th/0002240
http://arXiv.org/abs/hep-th/9901151
[56] G.-M. Greuel, G. Pfister, and H. Schönemann, “Singular 3.0,” a computer
algebra system for polynomial computations, Centre for Computer Algebra,
University of Kaiserslautern, 2005. http://www.singular.uni-kl.de. 8
[57] L. J. Billera, P. Filliman, and B. Sturmfels, “Constructions and complexity of
secondary polytopes,” Adv. Math. 83 (1990), no. 2, 155–179. A, A
[58] I. M. Gel′fand, M. M. Kapranov, and A. V. Zelevinsky, Discriminants,
resultants, and multidimensional determinants. Mathematics: Theory &
Applications. Birkhäuser Boston Inc., Boston, MA, 1994. A
[59] J. A. Wísniewski, “Toric Mori theory and Fano manifolds,” Séminaires &
Congrès 6 (2002) 249. A
	Introduction
	Calabi-Yau Threefolds
	The Calabi-Yau Threefold X
	The Intermediate Quotient Xbar
	Variables
	Toric Geometry and Mirror Symmetry
	Toric Varieties
	The Batyrev-Borisov Construction
	Toric Intersection Ring
	Mori Cone
	B-Model Prepotential
	Quotienting the B-Model
	The Quotient by G1
	The Quotient by G2
	B-Model on Xbar
	Instanton Numbers of X
	B-Model on the Mirror of Xbar
	Instanton Numbers of the Mirror of X
	Instanton Numbers Assuming The Self-Mirror Property
	The Self-Mirror Property
	Factorization vs. The (3,1,0,0,0) Curve
	Towards a Closed Formula
	Conclusion
	Triangulations
	The Flop of X*
	Bibliography
ABSTRACT
  We apply mirror symmetry to the problem of counting holomorphic rational
curves in a Calabi-Yau threefold X with Z_3 x Z_3 Wilson lines. As we found in
Part A [hep-th/0703182], the integral homology group H_2(X,Z)=Z^3 + Z_3 + Z_3
contains torsion curves. Using the B-model on the mirror of X as well as its
covering spaces, we compute the instanton numbers. We observe that X is
self-mirror even at the quantum level. Using the self-mirror property, we
derive the complete prepotential on X, going beyond the results of Part A. In
particular, this yields the first example where the instanton number depends on
the torsion part of its homology class. Another consequence is that the
threefold X provides a non-toric example for the conjectured exchange of
torsion subgroups in mirror manifolds.

<|endoftext|><|startoftext|>
Introduction - Theories that are able to capture both the
Mott [1] and the Anderson [2] mechanisms for electron lo-
calization have remained elusive despite many years of ef-
fort. An attractive approach to this difficult problem has
recently been proposed by combining the dynamical mean-
field theory (DMFT) [3] of the Mott transition, and the
typical medium theory (TMT) [4] of Anderson localiza-
tion. This new formulation of the Mott-Anderson problem
has been explored in recent work by Vollhardt and collabo-
rators [5] using numerical renormalization-group methods,
but the precise mechanism for the critical behavior of this
model remains to be elucidated. Here we examine the mech-
anism for disorder screening within this theory, which ex-
plains several aspects of the results found in Ref. [5]
Within TMT-DMFT, a lattice problem is mapped onto
an ensemble of single-impurities problems, which are em-
bedded in a self-consistently determined bath. Recent work
of Ref. [6] examined the behavior of a collection of single-
impurity models in the situation where the bath seen by the
impurities was chosen to mimic the approach to the Mott-
Anderson transition. In this work, the impurity quasipar-
ticle weight Zi was shown to present a scaling behavior as
a function of the on-site energy εi and the distance t to
the transition. These findings, however, are not sufficient
to address the disorder screening behavior of the model,
which requires the description of the renormalized disor-
∗ Corresponding author. Tel: +55 31 3499-5671 fax: +55 31 3499-
Email address: aguiar@fisica.ufmg.br (M. C. O. Aguiar).
-0.5 -0.2 0.0 0.2 0.5
t = 0.10
t = 0.08
t = 0.06
t = 0.04
t = 0.02
t = 0.01
t = 0.004
t = 0.001
Fig. 1. Renormalized energy εR as a function of the on-site energy ε
for a collection of single-impurity problems close to the Mott-An-
derson transition (t → 0). The parameters used were U = 1.75 and
W = 2.8.
der potential. In this paper, we demonstrate that a scaling
procedure similar to that presented in Ref. [6] can also be
carried on for the renormalized energy εiR.
Renormalization of the disorder potential - We consider a
collection of Anderson impurity models [6] with on-site re-
pulsion U , on-site energies εi, and the total spectral weight
t of the cavity field. Without loss of generality [6], we con-
sider a featureless model bath with non-vanishing spectral
weight for −t/2 < ω < t/2 and zero otherwise. Our goal
is to describe the statistics of the renormalized site ener-
gies as the metal-insulator transition is approached, corre-
sponding to t → 0 within the TMT-DMFT scheme.
Preprint submitted to Elsevier 1 November 2018
http://arxiv.org/abs/0704.0450v1
δε’=|ε/W-0.3125|
ln(t*) = 0.58 + 2 ln(δε’)
ln(t*) = 3.4 + 2 ln(δε’)
Fig. 2. Scaled renormalized energy εR/t
1/2 as a function of t/t∗(δε)
showing that the results for different (and positive) ε can be collapsed
onto a single scaling function with two branches. Different symbols
correspond to different ε; the upper (bottom) branch contains results
for ε > U/2 (ε < U/2). The inset shows the scaling parameter t∗
as a function of |ε/W − 0.3125| for the upper (squares) and bottom
(circles) branches. The parameters used were U = 1.75 and W = 2.8.
The impurity models were solved at zero temperature us-
ing the SB4 method [7,6], which provides a parametrization
of the low-energy (quasiparticle) part of the local Green’s
function, given by
Gi(ωn) =
iωn − ε
R − Zi∆(ωn)
. (1)
Here Zi is the local quasiparticle weight and ε
R is the renor-
malized site energy. The details of the calculations mirror
those of Ref. [6].
The results for the renormalized energy εiR as a func-
tion of −W/2 < εi < W/2, in the vicinity of the Mott-
Anderson transition (t → 0), are shown in Fig. 1. As in
Ref. [6], we find two-fluid behavior, where sites with |εi| <
U/2 turn into local magnetic moments, corresponding to
“Kondo pinning” [8] εiR → 0. For the remaining sites, ε
εi + U/2 or ε
R → εi − U/2, as they become, respectively,
doubly occupied (those with εi < −U/2) or empty (those
with εi > U/2). We should emphasize that such two-fluid
behavior thus emerges only for sufficiently strong disorder,
such that U < W . Otherwise all sites turn into local mag-
netic moments, and the Mott transition for moderate dis-
order retains a character similar to that found within the
standard DMFT approach [8].
Scaling analysis - These results can alternatively be pre-
sented in a scaling form, as shown in Fig. 2. Here, we
show that it is possible to collapse the family of curves
εR(t, δε)/t
0.5, where δε = (εi − ε
∗) /ε∗ and ε∗ = U/2,
onto a single universal scaling function εR(t, δε)/t
0.5 =
f [t/t∗(δε)] with two branches, one for εi < ε
∗ and other
for εi > ε
∗. In agreement with Ref. [6] (inset of Fig. 2),
the crossover scale t∗(δε) ∼ |δε|φ, with exponent φ = 2. In
the limit t → 0, we find that the branch corresponding to
εi < ε
∗ has the asymptotic form f(x) ∼ x3/2 (here x =
t/t∗(δε)), corresponding to εR(t) ∼ t
2. Similarly, for εi >
ε∗, f(x) ∼ x−1/2 corresponding to εR(t) ∼ constant. For
0.0 0.5 1.0 1.5 2.0 2.5
clean value
Fig. 3. Arithmetic and geometric density-of-states (ADOS and
TDOS, respectively) at the Fermi level as a function of U , for
W = 1.5, when the TMT-DMFT self-consistent loop is performed.
x ≫ 1 the two branches merge, viz. f(x) ∼ A±B±x−0.5.
Disorder screening - Within TMT-DMFT, the Anderson
localization effects are manifested by the reduction of the
typical density of states (TDOS), since the (algebraic) aver-
age (ADOS) remains finite even in an Anderson insulator.
When disorder is strongly screened due to the correlation
effects, the two quantities should not differ much, as illus-
trated by the results of Fig. 3. Here we present the results of
the fully self-consistent solution, as the Mott-like transition
is approached by increasing U for W = 1.5. Close to the
transition, both averages approach the clean limit (dashed
line), indicating a strong screening effect. These results are
consistent with those found in the numerical renormaliza-
tion group solution of the TMT-DMFT equation of Ref. [5].
As discussed above, strong disorder screening is expected
near the Mott-like transition (U > W ), which indeed cor-
responds to the mechanism responsible for the results in
Fig. 3.When the transition is approached at strong disorder
(U < W ) (not shown), strong screening effects are found
only for a fractions of the sites (i.e of the volume of the
sample), indicating different critical behavior at the Mott-
Anderson transition. The details of the critical behavior in
this case will be discussed elsewhere.
Acknowledgements - This work was partially supported
by NSF grants DMR-0312495 (M.C.O.A.), DMR-0234215
and DMR-0542026 (V.D.) and DMR-0096462 (G.K.).
References
[1] N.F. Mott, Metal-insulator Transitions (Taylor and Francis,
London, 1974).
[2] P.W. Anderson, Phys. Rev. 109 (1958) 1498.
[3] A. Georges et al., Rev. Mod. Phys. 68 (1996) 13.
[4] V. Dobrosavljević et al., Europhys. Lett. 62 (2003) 76.
[5] K. Byczuk, W. Hofstetter, and D. Vollhardt, Phys. Rev. Lett. 94
(2005) 056404.
[6] M.C.O. Aguiar et al., Phys. Rev. B 73 (2006) 115117.
[7] G. Kotliar and A.E. Ruckenstein, Phys. Rev. Lett. 57 (1986)
1362.
[8] D. Tanasković et al., Phys. Rev. Lett. 91 (2003) 066603.
	References
ABSTRACT
  Correlation-driven screening of disorder is studied within the typical-medium
dynamical mean-field theory (TMT-DMFT) of the Mott-Anderson transition. In the
strongly correlated regime, the site energies epsilon_R^i characterizing the
effective disorder potential are strongly renormalized due to the phenomenon of
Kondo pinning. This effect produces very strong screening when the interaction
U is stronger then disorder W, but applies only to a fraction of the sites in
the opposite limit (U<W).

<|endoftext|><|startoftext|>
Electromigrated nanoscale gaps for surface-enhanced Raman
spectroscopy
Daniel R. Ward1, Nathaniel K. Grady2, Carly S. Levin3, Naomi J.
Halas3,4, Yanpeng Wu2, Peter Nordlander1,4, Douglas Natelson1,4
1Department of Physics and Astronomy,
2Applied Physics Graduate Program, 3Department of Chemistry,
4Department of Electrical and Computer Engineering,
and the Rice Quantum Institute, Rice University,
6100 Main St., Houston, TX 77005, USA
(Dated: October 23, 2018)
Abstract
Single-molecule detection with chemical specificity is a powerful and much desired tool for bi-
ology, chemistry, physics, and sensing technologies. Surface-enhanced spectroscopies enable single
molecule studies, yet reliable substrates of adequate sensitivity are in short supply. We present
a simple, scaleable substrate for surface-enhanced Raman spectroscopy (SERS) incorporating
nanometer-scale electromigrated gaps between extended electrodes. Molecules in the nanogap
active regions exhibit hallmarks of very high Raman sensitivity, including blinking and spectral
diffusion. Electrodynamic simulations show plasmonic focusing, giving electromagnetic enhance-
ments approaching those needed for single-molecule SERS.
http://arxiv.org/abs/0704.0451v1
Multifunctional sensors with single-molecule sensitivity are greatly desired for a vari-
ety of sensing applications, from biochemical analysis to explosives detection. Chemi-
cal and electromagnetic interactions between molecules and metal substrates are used in
surface-enhanced spectroscopies[1] to approach single molecule sensitivity. Electromagnetic
enhancement in nanostructured conductors results when incident light excites local elec-
tronic modes, producing large electric fields in a nanoscale region, known as a “hot spot”,
that greatly exceed the strength of the incident field. Hot spots can lead to particularly
large enhancements of Raman scattering, since the Raman scattering rate is proportional to
|E(ω)|2|E(ω′)|2 at the location of the molecule, where E(ω) is the electric field component
at the frequency of the incident radiation, and E(ω′) is the component at the scattered
frequency.
It has been an ongoing challenge to design and fabricate a substrate for systematic SERS
at the single molecule level. Single-molecule SERS sensitivity was first clearly demon-
strated using random aggregates of colloidal nanoparticles[2, 3, 4, 5]. Numerous other
metal substrate configurations have been used for SERS, including chemically engineered
nanoparticles[6, 7, 8], nanostructures defined by bottom-up patterning[9, 10], and those
made by traditional lithographic approaches[11]. In the most sensitive substrate geome-
tries, incident light excites adjacent subwavelength nanoparticles or nanostructures, result-
ing in large field enhancements within the interparticle gap[12, 13]. Fractal aggregates of
nanoparticles[14] can further increase field enhancements by focusing plasmon energy from
larger length scales down to particular nanometer-scale hotspots[15]. However, precise and
reproducible formation of such assemblies in predetermined locations has been extremely
challenging. An alternative approach is tip-enhanced Raman spectroscopy (TERS), in which
the incident light excites an interelectrode plasmon resonance localized between a sharp,
metal scanned probe tip and an underlying metal substrate. Recent progress has been made
in single-molecule TERS detection[16, 17, 18]. A similar approach was recently attempted
using a mechanical break junction[19]. While useful for surface imaging, TERS requires
feedback to control the tip-surface gap, and is not scalable or readily integrated with other
sensing modalities.
We demonstrate a scaleable and highly reliable method for producing planar extended
electrodes with nanoscale spacings that exhibit very large SERS signals, with each electrode
pair having one well-defined hot spot. Confocal scanning Raman microscopy demonstrates
the localization of the enhanced Raman emission. The SERS response is consistent with
a very small number of molecules in the hotspot, showing blinking and spectral diffusion
of Raman lines. Sensitivity is sufficiently high that SERS from physisorbed atmospheric
contaminants may be detected after minutes of exposure to ambient conditions. The Raman
enhancement for para-mercaptoaniline (pMA) is estimated from experimental data to exceed
108. Finite-difference time-domain (FDTD) modeling of realistic structures reveals a rich
collection of interelectrode plasmon modes that can readily lead to SERS enhancements as
large as 5× 1010 over a broad range of illumination wavelengths. These structures hold the
promise of integration of single-molecule SERS with electronic transport measurements, as
well as other near-field optical devices.
Our structures are fabricated on a Si wafer topped by 200 nm of thermal oxide. Electron
beam lithography is used to pattern “multibowtie” structures as shown in Fig. 1A. The
multibowties consist of two larger pads connected by multiple constrictions, as shown. The
constriction widths are 80-100 nm, readily within the reach of modern photolithography.
After evaporation of 1 nm Ti and 15 nm Au followed by liftoff in acetone, the electrode sets
are cleaned of organic residue by exposure to O2 plasma for 1 minute. The multibowties are
placed in a vacuum probe station, and electromigration[20] is used to form nanometer-scale
gaps in the constrictions in parallel, as shown in Fig. 1B. Electromigration is a nonthermal
process whereby momentum transfer from current-carrying electrons is transferred to the
lattice, rearranging the atomic positions. Electromigration has been studied thoroughly[21,
22, 23, 24] as a means of producing electrodes for studies of single-molecule conduction.
We have performed manual and automated electromigration at room temperature, with
identical results. The number of parallel constrictions in a single multibowtie is limited
by the output current capacity of our electromigration voltage source. A post-migration
resistance of ∼ 10 kΩ for the structure in Fig. 1A appears optimal.
Post-migration high resolution scanning electron microscopy (SEM) shows interelectrode
gaps ranging from too small to resolve to several nanometers. There are no detectable
nanoparticles in the gap region or along the electrode edges. Based on electromigration of
283 multibowties (1981 individual constrictions), 77% of multibowties have final resistances
less than 100 kΩ, and 43% have final resistances less than 25 kΩ. We believe that this yield,
already high, can be improved significantly with better process control, particularly of the
lithography and liftoff.
The optical properties of the resulting nanogaps are characterized using a WITec
CRM 200 scanning confocal Raman microscope in reflection mode, with normal illumination
from a 785 nm diode laser. Using a 100× objective, the resulting diffraction-limited spot
is measured to be gaussian with a full-width at half-maximum of 575 nm. Fig. 1C shows a
spatial map of the integrated emission from the 520 cm−1 Raman line of the Si substrate.
The Au electrodes are clearly resolved. Polarization of the incident radiation is horizontal
in this figure. Rayleigh scattered light from these structures shows significant changes upon
polarization rotation, while SERS response is approximately independent of polarization.
Freshly cleaned nanogaps show no Stokes-shifted Raman emission out to 3000 cm−1.
However, in 65% of clean nanogaps examined, a broad continuum background (see Support-
ing Information) is seen, decaying roughly linearly in wavenumber out to 1000 cm−1 before
falling below detectability. This background is spatially localized to a diffraction-limited re-
gion around the interelectrode gap and is entirely absent in unmigrated junctions. The origin
of this continuum, similar to that seen in other strongly enhancing SERS substrates[5], is
likely inelastic electronic effects in the gold electrodes[25]. In samples coated with molecules,
this background correlates strongly with visibility of SERS. No junctions without this back-
ground displayed SERS signals. Like the SERS signal, this background is approximately
polarization independent. Temporal fluctuations of this background in clean junctions are
minimal, strongly implying that fluctuations of the electrode geometry are not responsible
for SERS blinking.
The SERS enhancement of the junctions has been tested using various molecules. The
bulk of testing utilized pMA, which self-assembles onto the Au electrodes via standard thiol
chemistry. Particular modes of interest are carbon ring modes at 1077 cm−1 and 1590 cm−1.
Fig. 1D shows a map of the Raman emission from the 1077 cm−1 line on the same junction
as Fig. 1B,C, after self-assembly of pMA. This emission is strongly localized to the position
of the nanogap. No Raman signal is detectable either on the metal films or at the edges of
the metal electrodes. Fig. 1E shows the spatial localization of the continuum background
mentioned above.
Fig. 2 shows a more detailed examination of the SERS response of the gap region of a
typical junction after self-assembly of pMA. Fig. 2A,B are time series of the SERS response,
with known pMAmodes indicated. The modes visible are similar to those seen in other SERS
measurements of pMA on lithographically fabricated Au structures[11]. Each spectrum was
acquired with 1 s integration time, with the objective positioned over the center of the
nanogap hotspot. Temporal fluctuations of both the Raman intensity (“blinking”) and
Raman shift (spectral diffusion), generally regarded as hallmarks of few- or single-molecule
SERS sensitivity[26], are readily apparent. Fig. 2C shows a comparison of the wandering
of the 1077 cm−1 pMA line with that of the 520 cm−1 Raman line of the underlying Si
substrate over the same time interval. This demonstrates that the spectral diffusion is due
to changes in the molecular environment, rather than a variation in spectrometer response.
Fig. 2D shows the variation in the peak amplitudes over that same time interval.
This blinking and spectral diffusion are seen routinely in these junctions. We have ob-
served such Raman response from several molecules, including self-assembled films of pMA,
para-mercaptobenzoic acid (pMBA), a Co-containing transition metal complex[28], and spin-
coated poly(3-hexylthiophene) (P3HT). These molecules all have distinct Raman responses
and show blinking and wandering in the junction hotspots.
Another indicator of very large enhancement factors in these structures is sensitivity to
exogenous, physisorbed contamination. Carbon contamination has been discussed[29, 30, 31]
in the context of both SERS and TERS. This substrate is sensitive enough to examine such
contaminants (see Supporting Information). While clean junctions with no deliberately
attached molecules initially show only the continuum background, gap-localized SERS sig-
natures in the sp2 carbon region between 1000 cm−1 and 1600 cm−1 are readily detected
after exposure to ambient lab conditions for tens of minutes. Nanojunctions that have been
coated with a self-assembled monolayer (SAM) (for example, pMA) do not show this carbon
signature, even after hours of ambient exposure. Presumably this has to do with the ex-
tremely localized field enhancement in these structures, with the SAM sterically preventing
physisorbed contaminants from entering the region of enhanced near field.
Recently arrived contaminant SERS signatures abruptly disappear within tens of seconds
at high incident powers (1.8 mW), presumably due to desorption. SERS from covalently
bound molecules is considerably more robust, degrading slowly at high powers, and persisting
indefinitely for incident powers below 700 µW. SEM examination of the nanogaps shows
no optically induced damage after exposure to intensities that would significantly degrade
nanoparticles[32]. The large extended pads likely aid in the thermal sinking of the nanogap
region to the substrate.
Estimating enhancement factors rigorously is notoriously difficult, particularly when the
hotspot size is not known. Although SERS enhancement volume measurements are possible
using molecular rulers[33], this is not feasible with such small nanogaps. Junctions made
directly on Raman-active substrates (Si with no oxide; GaAs) show no clearly detectable
enhancement of substrate modes in the gap region, suggesting that the electromagnetic
enhancement is strongly confined to the thickness of the metal film electrodes. Figure 3
shows a comparison between a typical pMA SERS spectrum acquired on a junction with
a 600 s integration time at 700 µW incident power, and a spectrum acquired over one of
that device’s Au pads, for the same settings. The pad spectrum shows no detectable pMA
features and is dark current limited.
We use FDTD calculations to understand the strong SERS response in these structures
and roughly estimate enhancement factors. It is important to note that the finite grid size
(2 nm) required for practical computation times restricts the quantitative accuracy of these
calculations. However, the main results regarding spatial mode structure (allowing assess-
ment of the localization of the hotspot) and energy dependence are robust to these concerns,
and the calculated electric field enhancement is an underestimate[34]. Fig. 4 shows a cal-
culated extinction spectrum and map of |E|4 in the vicinity of the junction. Calculational
details and additional plots are presented in the Supporting Information. These calculations
predict that there should be large SERS enhancements across a broad bandwidth of excit-
ing wavelengths because of the complicated mode structure possible in the interelectrode
gap. Nanometer-scale asperities from the electromigration process break the interelectrode
symmetry of the structure. The result is that optical excitations at a variety of polariza-
tions can excite many interelectrode modes besides the simple dipolar plasmon commonly
considered. For extended electrodes, a continuous band of plasmon resonances coupling to
wavelengths from 500-1000 nm is expected[35]. This broken symmetry also leads to much
less dependence of the calculated enhancement on polarization direction, as seen experi-
mentally. The calculations confirm that the electromagnetic enhancement is confined in
the normal direction to the film thickness. Laterally, the field enhancement is confined to
a region comparable to the radius of curvature of the asperity. For gaps and asperities in
the range of 2 nm, purely electromagnetic enhancements can exceed 1011, approaching that
sufficient for single-molecule sensitivity.
Using the data from the device in Fig. 3, we estimate the total enhancement in that
device. To be conservative, we assume a hotspot effective radius of 2.5 nm with dense
packing of pMA, giving N ≈ 100 molecules. Blinking and wandering suggest that the true
N value is much closer to one. The integrated Raman signal over a gaussian fit to the
1077 cm−1 Raman line is 2.0 counts/sec/molecule when the incident power is 700 µW. For
our apparatus the count rate from imaging a bulk crystal at the same equivalent power
(see Supporting Information) is 4.2× 10−9 counts/sec/molecule, so that we estimate a total
enhancement of at least 5× 108.
We have demonstrated a SERS substrate capable of extremely high sensitivity for trace
chemical detection. Unlike previous substrates, these nanojunctions may be mass fabricated
in controlled positions with high yield using a combination of standard lithography and
electromigration. The resulting hotspot geometry is predicted to allow large SERS enhance-
ments over a broad band of illuminating wavelengths. Other nonlinear optical effects should
be observable in these structures as well. The extended electrode geometry and underly-
ing gate electrode are ideal for integration with other sensing modalities such as electronic
transport. Tuning molecule/electrode charge transfer via the gate electrode may also enable
the direct examination of the fundamental nature of chemical enhancement in SERS.
DW acknowledges support from the NSF-funded Integrative Graduate Research and Ed-
ucational Training (IGERT) program in Nanophotonics. NH, PN, and DN acknowledge
support from Robert A. Welch Foundation grants C-1220, C-1222, and C-1636, respectively.
DN also acknowledges the National Science Foundation, the David and Lucille Packard
Foundation, the Sloan Foundation, and the Research Corporation. C.S.L. was supported by
a training fellowship from the Keck Center Nanobiology Training Program of the Gulf Coast
Consortia, NIH 1 T90 DK070121-01. YP and NKG are supported by US Army Research
Office grant W911NF-04-1-0203.
Supporting Information Available: Detailed examination of continuum background
and adsorption of exogenous contaminants; extended discussion of FDTD calculations; more
detailed discussion of SERS enhancement calculation. This material is available free of
charge via the Internet at http://pubs.acs.org.
http://pubs.acs.org
FIG. 1: (A) Full multibowtie structure, with seven nanoconstrictions. (B) Close-up of an individual
constriction after electromigration. Note that the resulting nanoscale gap (<∼ 5 nm at closest
separation, as inferred from closer images) is toward the right edge of the indicated red square. (C)
Map of Si Raman peak (integrated from 500-550 cm−1) in device from (B), with red corresponding
to high total counts. The attenuation of the Si Raman line by the Au electrodes is clear. (D) Map of
pMA SERS signal for this device based on one carbon ring mode (integrated from 1050-1110 cm−1).
(E) Map of integrated low energy background (50-300 cm−1) for this device.
FIG. 2: (A) Waterfall plot (1 s integration steps) of SERS spectrum at a single nanogap that had
been soaked in 1 mM pMA in ethanol. Identified pMA peaks include the ring modes at 1077 cm−1
and 1590 cm−1, as well as an 1145 cm−1 δCH mode with b2 symmetry, an 1190 cm
−1 mode
identified as δCH of a1 symmetry, a mode near 1380 cm
−1 identified as δCH+νCC of b2 symmetry,
and a mode near 1425 cm−1 identified as νCC+δCH of b2 symmetry. Mode assignments are based
on Ref. [27]. (B) Close-up of (A) to show correlated wandering and blinking of 1077 cm−1 and
1145 cm−1 modes. (C) Comparison of 1145 cm−1 mode position (blue) with the Si Raman peak
(red), which shows no such wandering. The jitter in the Si peak position is 1 pixel in the detector.
(D) Comparison of 1145 cm−1 peak height (green, found by a gaussian fit to the peak) fluctuations
with those of the Si peak (blue).
Figure 3:
Comparison of pMA at hotspot (blue,left axis) and 5um over on Au 
pad (green,right axis).  Prominent features include the 1077,1590 a1 
symmetry mode peaks, and the less strong 1180 (1160 in moerner
paper) b2 symmetry mode peak.  Additionally the 520 Si peak is 
visible.  The Si peak is significantly reduced in the Au film spectra 
because the film is only partially transparent at 785 nm.  B2 
symmetry modes appear weaker because of blinking on and off 
which reduces the measured intensity when averaged over long time 
periods.  There are no pMA features in the Au pad spectrum dark 
current noise limited.  Estimated enhancement factor is ~1e6.
0 500 1000 1500 2000
0 500 1000 1500 2000
Raman Shift (cm-1)
FIG. 3: Blue curve (left scale): pMA SERS spectrum at hotspot center of one nanojunction densely
covered by pMA, integrated for 10 minutes at incident power of 700 µW. Green curve (right scale):
integrated signal under same conditions on middle of Au pad on the same nanojunction. The
feature near 960 cm−1 is from the Si substrate. No Raman features are detectable on either the
Au pads or their edges under these conditions.
FIG. 4: (A) FDTD-calculated extinction spectrum from the model electrode configuration shown in
(B). (B) Mock-up electrode tips capped with nanoscale hemispherical asperities, with |E|4 plotted
for the 937 nm resonance of (A). Constriction transverse width at narrowest point is 100 nm. Gap
size without asperities is 8 nm. Asperity on left (right) electrode has radius of 6 nm (4 nm). Au
film thickness is 15 nm, SiO2 underlayer thickness is 50 nm. Radiation is normally incident, with
polarization oriented horizontally. Grid size for FDTD calculation is 2 nm. (C) Close-up of central
region of (B), showing extremely localized enhancement at asperities. (D) Cross-section indicated
in (C), showing that enhancement in this configuration does not penetrate significantly into the
substrate. Predicted maximum electromagnetic Raman enhancement in this mode exceeds 108.
[1] Moskovits, M. Rev. Mod. Phys. 1985, 57, 783-826.
[2] Kneipp, K.; Wang, Y.; Kneipp, H.; Perelman, L. T.; Itzkan, I.; Dasari, R. R.; Feld, M. S.
Phys. Rev. Lett. 1997, 78, 1667-1670.
[3] Nie, S.; Emory, S. R. Science 1997, 275, 1102-1106.
[4] Xu, H.; Bjerneld, E. J.; Käll, M.; Börjesson, L. Phys. Rev. Lett. 1999, 83 4357-4360.
[5] Michaels, A. M.; Jiang, J.; Brus, L. J. Phys. Chem. B 2000, 104, 11965-11971.
[6] Jackson, J. B.; Halas, N. J. Proc. Nat. Acad. Sci. U.S.A. 2004, 101, 17930-17935.
[7] Wang, H.; Levin, C. S.; Halas, N. J. J. Am. Chem. Soc. 2005, 127, 14992-14993.
[8] Oldenburg, S. J.; Westcott, S. L.; Averitt, R. D.; Halas, N. J. J. Chem. Phys. 1999, 111,
4729-4735.
[9] Haynes, C. L.; Van Duyne, R. P. J. Phys. Chem. B 2001, 105, 5599-5611.
[10] Qin, L.; Zou, S.; Xue, C.; Atkinson, A.; Schatz, G. C.; Mirkin, C. A., Proc. Nat. Acad. Sci.
U.S.A. 2006, 103, 13300-13303.
[11] Fromm, D. P.; Sundaramurthy, A.; Kinkhabwala, A.; Schuck, P. J.; Kino, G. S.; Moerner,
W. E. J. Chem. Phys. 2006, 124, 061101.
[12] Hallock, A. J.; Redmond, P. L.; Brus, L. E. Proc. Nat. Acad. Sci. U.S.A. 2005, 102, 1280-1284.
[13] Nordlander, P.; Oubre, C.; Prodan, E.; Li, K.; Stockman, M. I. Nano Lett. 2004, 4, 899-903.
[14] Wang, Z.; Pan, S.; Krauss, T. D.; Du, H.; Rothberg, L. J. Proc. Nat. Acad. Sci. U.S.A. 2003,
100, 8638-8643.
[15] Li, K.; Stockman, M. I., Bergman, D. J. Phys. Rev. Lett. 2003, 91, 227402.
[16] Domke, K. F.; Zhang, D.; Pettinger, B. J. Am. Chem. Soc. 2006, 128, 14721-14727.
[17] Neascu, C. C.; Dreyer, J.; Behr, N.; Raschke, M. B. Phys. Rev. B 2006, 73, 193406.
[18] Zhang, W.; Yeo, B. S.; Schmid, T.; Zenobi, R. J. Phys. Chem. C 2007, 111, 1733-1738.
[19] Tian, J.-H.; Liu, B.; Li, X.; Yang, Z.-L.; Ren, B.; Wu, S.-T.; Tao, N.; Tian, Z.-Q. J. Am.
Chem. Soc. 2006, 128, 14748-14749.
[20] Park, H.; Lim, A. K. L.; Alivisatos, A. P.; Park, J.; McEuen, P. L. Appl. Phys. Lett. 1999,
75, 301-303.
[21] Strachan, D. R.; Smith, D. E.; Johnston, D. E.; Park, T. H.; Therien, M. J.; Bonnell, D. A.,
Johnson, A. T. Appl. Phys. Lett. 2005, 86, 043109.
[22] Trouwborst, M. L.; van der Molen, S. J.; van Wees, B. J. J. Appl. Phys. 2006, 99, 114316.
[23] Strachan, D. R.; Smith, D. E.; Fischbein, M. D.; Johnston, D. E.; Guiton, B. S.; Drndic, M.;
Bonnell, D. A., Johnson, A. T. Nano Lett. 2006, 6, 441-444.
[24] Taychatanapat, T.; Bolotin, K. I.; Kuemmeth, F.; Ralph, D. C. Nano Lett. 2007, 7, 652-656.
[25] Beversluis, M. R.; Bouhelier, A.; Novotny, L. Phys. Rev. B 2003, 68, 115433.
[26] Wang, Z.; Rothberg, L. J. J. Phys. Chem. B 2005, 109, 3387-3391.
[27] Osawa, M.; Matsuda, N.; Yoshii, K.; Uchida, I. J. Phys. Chem. 1994, 98, 12702-12707.
[28] Ciszek, J. W.; Keane, Z. K.; Cheng, L.; Stewart, M. P.; Yu, L. H.; Natelson, D.; Tour, J. M.
J. Am. Chem. Soc. 2006, 128, 3179-3189.
[29] Kudelski, A.; Pettinger, B. Chem. Phys. Lett. 2000, 321, 356-362.
[30] Otto, A. J. Raman Spectrosc. 2002, 33, 593-598.
[31] Richards, D.; Milner, R. G.; Huang, F.; Festy, F. J. Raman Spectrosc. 2003, 34, 663-667.
[32] Schuck, P. J.; Fromm, D. P.; Sundaramurthy, A.; Kino, G. S.; Moerner, W. E. Phys. Rev.
Lett. 2005, 94, 017402.
[33] Lal, S.; Grady, N. K.; Goodrich, G. P.; Halas, N. J. Nano Lett. 2006, 6, 2338-2343.
[34] Oubre, C.; Nordlander, P. J. Phys. Chem. B 2005, 109, 10042-10051.
[35] Nordlander, P.; Le, F. Appl. Phys. B 2006, 84, 35-41.
Supporting Information: Electromigrated nanoscale gaps for surface-enhanced
Raman spectroscopy
I. CONTINUUM BACKGROUND
The strong continuum background observed at the nanogaps is shown in Fig. S1A.
The continuum slopes down linearly in intensity from 0 cm−1 to almost 1500 cm−1. This
continuum, seen only in the presence of the nanogap, is compared with the Au film and
Si substrate spectra taken using the same microscope configuration. The 300 cm−1 and
520 cm−1 peaks are from the Si substrate. The spectrum shown in Fig. S1A also shows
a small peak at 1345 cm−1 which is indicative of absorbates from the air settling at the
nanogap. The continuum is localized to the nanogap as seen in Fig. S1B, where a spatial
plot has been made by integrating the SERS spectrum from 600 cm−1 to 800 cm−1 at each
point. This wavenumber range was chosen to avoid any of the Si substrate Raman active
modes. Additionally a comparison of Fig. S1B with the spatial plot of the Si 520 cm−1
peak, Fig. S1C, shows that the continuum background of the nanogap is indeed located at
the center of the bowtie, as expected. Although blinking of SERS at the nanogap has been
observed for pMA and pMBA, the continuum itself does not blink in the absence of molecules.
This is clear from the data in Fig S1D showing the time evolution of the SERS spectrum at
the clean nanogap. No fluctuations are observed, and the continuum background remains
constant.
FIG. S5: (A) Raman spectra at hotspot (blue) of a clean bowtie, Au pad(green), and over Si
substrate(red). The continuum is very strong at the hotspot and shows linear slope from 0 to ∼
1500 cm−1. Also visible are the 300 cm−1 peak and 520 cm−1 peaks of the Si substrate and a weak
peak at 1345 indicating the onset of atmospheric contamination after approximately 15 minutes of
air exposure. Curves have been offset by 150 counts/s (green) and 200 counts/s (red) for clarity.
(B) Spatial plot of integrated signal over 600-800 cm−1 showing the localization of continuum to
the center of the device when compared to the Si plot (C). Yellow is strong signal; blue is no signal.
(C) Spatial plot of integrated Si signal over 500-540 cm−1. Red indicates strong Si signal the blue
areas show where the Au pads are. (D) Time spectra of clean bowtie. The intensity is reported
in CCD counts/second. No blinking of the continuum is observed. The lowest wavenumbers are
reported to have zero counts/second due to the low pass filter used to block out the laser.
II. DEPENDENCE ON INCIDENT POLARIZATION
The SERS signal from the nanogap does not have significant polarization dependence, as
shown in Fig S2A. The two spectra are from the same nanogap with the polarization at 0 and
90 degrees to the gap. Although slightly different due to positioning and actual time variation
of the spectrum, the two spectra do not show any strong differences in the intensities of the
pMA signal or the continuum. The nanogap does exhibit a strong polarization dependence
for Rayleigh scattering, as shown in Fig S2B-S2E. Figures S2B and S2D show a spatial map
of the Rayleigh scattering for the polarization across the gap (B) and parallel with the gap
FIG. S6: (A) Raman spectra at hotspot of bowtie with pMA assembled on surface. The blue
spectrum is with polarization at 0 deg. (direction shown in (C)). The green spectrum has been
shifted 50 counts/s for clarity and is at the same hotspot but with the polarization rotated 90
degrees relative to the substrate (direction shown in (D)). (B) Spatial plot of integrated signal
over -40 to 40 cm−1 showing the Rayleigh scattering from the center of the device. Red is high
intensity blue is low intensity. The large pads are at the left/right. Polarization direction indicated
in (C). (C) Spatial plot of integrated Si signal over 500-540 cm−1. Red indicates Si, blue is Au
pads. Polarization direction is indicated by the arrow. (D) Spatial plot of integrated signal over
-40 to 40 cm−1 showing the Rayleigh scattering from the center of the device for the sample rotated
90 deg. relative to (B),(C). Red is high intensity blue is low intensity. Polarization direction is
indicated in (E). There is a local maximum in the Rayleigh scattering at the center of the gap.
(E) Spatial plot of integrated Si signal over 500-540 cm−1. Red indicates Si, blue is Au pads.
Polarization direction is indicated by the arrow.
III. DEPENDENCE ON SOLUTION CONCENTRATION
We have examined SERS spectra for varying supernatant solution concentration during
the assembly procedure. Ideally, successive dilutions of the solution should vary surface cov-
erage of the assembled molecules. While molecular coverage on the edges of polycrystalline
Au films is not readily assessed, we observe reproducible qualitative trends as concentration
is reduced. For pMBA molecules assembled from solutions in nanopure water, we have var-
ied concentrations from 1 mM down to 100 pM. The fraction of junctions showing SERS
distinct from carbon contamination remains roughly constant down to concentrations as low
as 1 µM. For our volumes and electrode areas, this is still expected to correspond to a dense
coverage of 1 molecule per 0.19 nm2. At concentrations below 1 µM, SERS spectra change
significantly, while remaining distinct from those of carbon contamination: blinking occurs
more frequently; modes of b2 symmetry rather than a1 symmetry appear more frequently;
and the molecular peaks can be more than 100× larger than the high coverage case for the
same integration times. These observations are qualitatively consistent with the molecules
exploring different surface orientations at low coverages, and charge transfer/chemical en-
hancement varying with orientation. However, the actual coverage at the edges remains
unknown.
The concentration of the solution used for assembling molecules on the nanogap surface
strongly influences the form of the observed Raman spectrum as well as the rate and intensity
of the mode blinking. Raman spectra of pMBA were taken by soaking samples in 2 mL of
different concentrations of pMBA. Although for all of these concentrations there are enough
molecules in solution to form a monolayer over the bowtie surface, significant differences in
the spectra were observed. Fig. S3A shows a representative Raman spectrum for pMBA
at the nanogap for 1mM concentrations. The two carbon ring modes at 1077 cm−1 and
1590 cm−1 are clearly present along with a third peak at 1463 cm−1. The time spectra for
this nanogap in Fig S3B. The 1077 cm−1 and 1590 cm−1 peaks are relatively stable and always
present while other modes, such as the 1463 cm−1 mode, blink on and off for a few seconds
at a time. As the concentration is decreased to 1 µM, the pMBA signal tends to be stronger
with more intense blinking. Additionally the 1077 cm−1 mode is observed to disappear while
the 1590 cm−1 mode remains. Additional modes begin to become more visible such as the
1265 cm−1 and 1480 cm−1 modes seen in Fig S3C. At even lower concentrations such as 1
nM, the pMBA signal is again more intense with even more blinking as seen in Fig. S3F
(which has been plotted with intensity on a log scale). The 1077 cm−1 mode is again unseen
while the 1590 cm−1 mode begins to fluctuate in intensity even more. The blinking becomes
much more intense with the intensity of the signal periodically reaching close to ten times
the maximum intensity observed for pMBA at 1 µM. We suggest that the increased blinking
and larger amplitude signals are a result of the molecules not being as tightly packed on the
surface in the 1 µM and 1 nM cases as in the 1 mM case. As a result of looser packing,
the molecules are free to explore more surface conformations, including those with more and
different charge transfer with the Au surface.
We point out that these pMBA spectra are distinct from those seen in physisorbed carbon
contamination on initally clean junctions. These data persist at high incident powers and do
not show “arrival” phenomena as described in the subsequent section. Furthermore, they are
unlikely to originate from photodecomposition of pMBA, since the illumination conditions
are identical for all coverages.
FIG. S7: (A) Raman from pMBA at 1 mM concentration taken at t = 10 s. (B) Corresponding
time spectrum for 1 mM. (C) Raman from pMBA at 1 µM concentration taken at t = 251 s. (D)
Corresponding time spectrum for 1 µM. (E) Raman from pMBA at 1 nM concentration taken at
t = 24.5 s (F) Corresponding time spectrum for 1 nm, plotted on log intensity scale.
IV. DETECTION OF ADSORBED CONTAMINANTS
Due to the large enhancements possible with the nanogaps, contamination from airborne
absorbates occurs readily in the absence of assembled molecules on the nanogap surface. We
have observed the absorption of contaminants onto the surface of clean nanogaps in as little
as 10 minutes. Collecting Raman spectra every 4 seconds, we can observe the appearance
of contaminants on the surface as seen in Fig. S4A and S4B. It is difficult to identify the
contaminants, as the spectra observed have large variations, although carbon ring modes
are often observed in conjunction with other modes. Furthermore the Raman signal from
contaminants often blinks very strongly, with periods of no or weak signal followed by several
seconds of intense blinking, as seen in Fig S4C. The changes in intensity can be more than a
factor of 100. Again we suggest that the strong blinking is a result of the weak attachment
of the contaminants to the nanogap surface, allowing them to move considerably and explore
many interactions with the Au surface. As previously mentioned, these contamination spec-
tra are not observed when molecules of interest have been preassembled deliberately on the
electrode surface. The likely explanation for this is that the self-assembled later sterically
prevents contaminants from arriving at the nanogap region of maximum field enhancement.
FIG. S8: (A) Raman spectra for clean bowtie (blue) and clean bowtie after a few minutes exposed
to the air (green). This change in the Raman spectrum is indicative of contamination for surface
absorbed molecules from the air. (B) Raman spectra for a clean bowtie showing the onset of a
contaminant signal at 900 cm−1 as time progresses. (C) Waterfall plot showing the extremely strong
blinking observed for adsorbed contamination. The fluctuations are much larger than the those
observed for dense coverage of pMA, pMBA, or P3HT. Notice the scale relative to the 520 cm−1
Si peak seen at t = 340 s.
V. FDTD CALCULATIONS
The optical properties of the bowtie structure were calculated using the Finite-Difference
Time-Domain method (FDTD) using a Drude dielectric function with parameters fitted
to the experimental data for gold. This fit provides an accurate description of the optical
properties of gold for wavelengths larger than 500 nm [S1]. These calculations do not account
for reduced carrier mean free path due to surface scattering in the metal film, nor do they
include interelectrode tunneling. However, such effects are unlikely to change the results
significantly.
The bowtie is modeled as a two finite triangular structures as illustrated in Fig. 4A
of the manuscript. Our computational method requires the nanostructures to be modeled
to be of finite extent. The plasmon modes of a finite system are standing modes with
frequencies determined by the size of the sample and the number of nodes of the surface
charge distribution associated with the plasmon. For an extended system such as the bowties
manufactured in this study, the plasmon resonances can be characterized as traveling surface
waves with a continuous distribution of wavevectors.
A series of calculations of bowties with increasing length reveals that the optical spectrum
is characterized by increasingly densely spaced plasmon resonances in the wavelength regime
500-1000 nm and a low energy finite-size induced split-off state involving plasmons localized
on the outer surfaces of the bowtie. For a large bowtie, we expect the plasmon resonances
in the 500-1000 nm wavelength interval to form a continuous band [S2].
The electric field enhancements across the bowtie junction for the plasmon modes within
this band are relatively similar with large and uniform enhancements in the range of 50-150.
The magnitudes of the field enhancements were found to increase with increasing size of the
bowtie structure. For instance, the maximum field enhancement factor was found to be 115
for a 200 nm bowtie (Each half of the bowtie is modeled as a truncated triangle 200 nm
long.) and 175 for a 400 nm bowtie. Our use of a finite gridsize also underestimates the
electric field enhancements[S3]. Thus our calculated electric field enhancements are likely
to significantly underestimate the actual electric field enhancements in the experimentally
manufactured bowties.
For a perfectly symmetric bowtie, significant field enhancements are only induced for
incident light polarized across the junction. If the mirror symmetry is broken, for instance
by making one of the structures thicker or triangular, large field enhancements are induced
for all polarizations of incident light.
To investigate the effects of nanoasperities, FDTD calculations were performed for a
bowtie with two semi-spherical protrusions in the junction as shown in Fig. 4 of the main
text, and Figs. S5-S7 of the Supporting Online Material. As expected, the presence of these
protrusions does not influence the optical spectrum. However, the local field enhancements
around the protrusions become very large, typically three or four times higher than for the
corresponding structure without the defect. The physical mechanism for this increase is
an antenna effect caused by the coupling of plasmons localized on the protrusion with the
extended plasmons on the remaining bowtie structure [S4].
FIG. S9: Maps of FDTD-calculated |E| for the 1535 nm mode indicated in the main manuscript’s
Fig. 4A. Color scale is logarithmic in |E|/|Einc|. Illumination direction is normal incidence, with
electric field polarization oriented horizontally in (A)-(C). Maximum field enhancements are shown.
(A) Overall view. (B) Close-up of interelectrode gap showing asperities. (C) Side-view of section
indicated in (B) in red. (D) Side view of section indicated in (B) in blue.
FIG. S10: Maps of FDTD-calculated |E| for the 937 nm mode indicated in the main manuscript’s
Fig. 4A. Color scale is logarithmic in |E|/|Einc|. Illumination direction is normal incidence, with
electric field polarization oriented horizontally in (A)-(C). Maximum field enhancements are shown.
(A) Overall view. (B) Close-up of interelectrode gap showing asperities. (C) Side-view of section
indicated in (B) in red. (D) Side view of section indicated in (B) in blue.
FIG. S11: Maps of FDTD-calculated |E| for the 746 nm mode indicated in the main manuscript’s
Fig. 4A. Color scale is logarithmic in |E|/|Einc|. Illumination direction is normal incidence, with
electric field polarization oriented horizontally in (A)-(C). Maximum field enhancements are shown.
(A) Overall view. (B) Close-up of interelectrode gap showing asperities. (C) Side-view of section
indicated in (B) in red. (D) Side view of section indicated in (B) in blue.
VI. ENHANCEMENT ESTIMATE
To estimate an enhancement based on the data of Fig. 3 in the main text, it was necessary
to understand the effective count rate per molecule of Raman scattering from bulk pMA in
our measurement setup. This requires knowing the effective volume probed by the WITec
system when the laser is focused on a bulk pMA crystal.
The full-width-half-maximum (FWHM) of the laser spot size was found to be 575 nm.
This was determined by measuring the count rate of the Rayleigh scattering peak (at zero
wavenumbers) as a function of position as the beam was scanned over the edge of a Au film
on a Si substrate. Averaging 16 such scans, the Rayleigh intensity was fit to the form of an
integrated gaussian to determine the FWHM of the gaussian beam. The 575 nm figure is
likely an overestimate due to systematic noise in the flat regions of the fit.
For a gaussian beam with intensity of the form ∝ e−
2σ2 , the FWHM = 2
2 ln 2σ. The
effective radius of an equivalent cylindrical beam is 2σ, or 346 nm in this case. The effective
confocal depth [S5] was determined by measuring the 520 cm−1 Si Raman peak as a function
of vertical displacement of a blank substrate. The effective depth profile was determined
by numerical integration of the Si data using matlab. The effective volume probed by the
beam is 1.92× 10−12 cm3. From the bulk properties of pMA, this corresponds to 1.09× 1010
molecules.
The count rate for the bulk pMA 1077 cm−1 line, corrected by the ratio of (Si SERS
rate/Si bulk rate) to accomodate for the difference in laser powers, is 46 counts/s, compared
with 203 counts/s for the SERS data of Fig. 3. This leads to the enhancement estimate
quoted in the main text of 5× 108.
[S1] Oubre, C.; Nordlander, P. J. Phys. Chem. B 108, 108, 17740-17747.
[S2] Nordlander, P.; Le, F. Appl. Phys. B 2006, 84, 35-41.
[S3] Oubre, C.; Nordlander, P. J. Phys. Chem. B 2005, 109, 10042-10051.
[S4] Hao, F.; Nehl, C. L.; Hafner, J. H.; Nordlander, P. Nano Lett. 2007, 7,
10.1021/nl062969c.
[S5] Cai, W.B.; Ren, B.; Li, X. Q.; Shi, C. X.; Liu, F. M.; Cai, X. W.; Tian, Z. Q. Surf. Sci.
1998, 406, 9-22.
	References
	Continuum background
	Dependence on incident polarization
	Dependence on solution concentration
	Detection of adsorbed contaminants
	FDTD calculations
	Enhancement estimate
ABSTRACT
  Single-molecule detection with chemical specificity is a powerful and much
desired tool for biology, chemistry, physics, and sensing technologies.
Surface-enhanced spectroscopies enable single molecule studies, yet reliable
substrates of adequate sensitivity are in short supply. We present a simple,
scaleable substrate for surface-enhanced Raman spectroscopy (SERS)
incorporating nanometer-scale electromigrated gaps between extended electrodes.
Molecules in the nanogap active regions exhibit hallmarks of very high Raman
sensitivity, including blinking and spectral diffusion. Electrodynamic
simulations show plasmonic focusing, giving electromagnetic enhancements
approaching those needed for single-molecule SERS.

<|endoftext|><|startoftext|>
Introduction
Cygnus X-1 is the first dynamically-determined black hole system (Webster & Murdin
1972; Bolton 1972). It is in a binary system with a massive O9.7 Iab supergiant, and the
orbital period was determined optically to be 5.6 days. Cyg X-1 is thus intrinsically different
from the majority of known black hole candidates (BHCs) whose companion stars are much
1Email: chang40@physics.purdue.edu, cui@physics.purdue.edu
http://arxiv.org/abs/0704.0452v1
– 2 –
less massive (see review by McClintock & Remillard 2006). Curiously, those that have a
high-mass companion (including Cyg X-1, LMC X-1 and LMC X-3) are all persistent X-
ray sources, while those that have a low-mass companion are exclusively transient sources.
Perhaps, stellar wind from the companion star plays a significant role in this regard (Cui,
Chen, & Zhang 1997).
Unlike transient BHCs, in which mass accretion is mediated by the companion star
overfilling its Roche-lobe, Cyg X-1 is thought to be a wind-fed system. In this case, however,
the wind is thought to be highly focused toward the black hole, because the companion star
is nearly filling its Roche lobe (Gies & Bolton 1986). The observed orbital modulation of
the X-ray flux (Wen et al. 1999; Brocksopp et al. 1999a) has provided tentative evidence for
wind accretion, because it is probably caused by varying amount of absorption through the
wind (Wen et al. 1999). On the other hand, it is generally believed that an accretion disk
is also present, based on the presence of an ultra-soft component, as well as Fe Kα emission
line in the X-ray spectrum (e.g., Ebisawa et al. 1996; Cui et al. 1998).
Cyg X-1 is probably still the most studied BHC. It is a fixture in the target list for
all major X-ray missions. Much has been learned from modeling the X-ray continuum of
the source, as well as from studying its X-ray variability. A recent development is the
availability of high-resolution X-ray data, which may shed further light on the accretion
process and the environment within the binary system. Cyg X-1 has been observed on many
occasions with the High Energy Transmission Grating Spectrometer (HETGS) on board the
Chandra X-ray Observatory and the Reflection Grating Spectrometer on board the XMM-
Newton Observatory. The high-resolution spectra have revealed the presence of numerous
absorption lines that are associated with highly ionized material (Marshall et al. 2001; Schulz
et al. 2002; Feng et al. 2003; Miller et al. 2005).
In this work, we report the detection of absorption lines, some of which have been seen
previously but are much stronger here, and, more surprisingly, the discovery of dramatic
variability of the lines, based on data from our HETGS/Chandra observation of Cyg X-1
during its 2001 state transition. The fluxes of some of the lines varied by nearly two orders
of magnitude over the duration of the observation, while the overall flux of the source varied
only mildly.
2. Observations and Data Reduction
Cyg X-1 made a rare transition between the low-hard state and the high-soft state, as
seen by the All-Sky Monitor (ASM) on the Rossi X-ray Timing Explorer (RXTE), about
– 3 –
five years after a similar episode in 1996 (Cui et al. 1997a and 1997b). Figure 1 shows the
ASM light curve that covers the entire period. In this case, the flux of the source stayed
high for ∼400 days, about twice as long as in 1996. Otherwise, the two episodes are very
similar, including the flux levels of the two states, prominent X-ray flares in both states, and
rapid transitions.
Triggered by the ASM alert, the source was observed from 2001 October 28 16:13:52
to October 29 00:33:52 (UT) with the HETGS on Chandra (ObsID #3407). The HETGS
consists of two gratings: Medium Energy Grating (MEG) and High Energy Grating (HEG).
After passing the gratings, the photons are recorded and read out with the spectroscopic
array of the Advanced CCD Imaging Spectrometer (ACIS). To avoid photon pile-up in
the dispersed events, we chose to run the ACIS in continuous-clocking (CC) mode. We also
applied a spatial window around the aim point to accept every 10th event in the zeroth order,
to prevent telemetry saturation yet still have a handle on the position of the zeroth-order
image for accurate wavelength calibration.
The Chandra data were reduced and analyzed with the standard CIAO analysis package
(version 3.2). Following the CIAO 3.2 Science Threads1, we prepared and filtered the data,
produced the Level 2 event file from the Level 1 data products, constructed the light curves,
and made the spectra and the corresponding response matrices and auxilliary response files
(ARFs). We did have to work around a problem related to the use of maskfile for the CC-
mode data, when we were making the ARFs. The solution is now included in the CIAO
3.3 Science Threads. To verify wavelength calibrations, we compared the plus and minus
orders and found good agreement on the position of prominent lines. We did not subtract
background from either the light curves or the spectra, because it is not obvious how to select
background events from the CC-mode data. For a bright source like Cyg X-1, however, we
expect the effects to be entirely negligible.
In coordination with the Chandra observation, we also observed Cyg X-1 with the Pro-
portional Counter Array (PCA) and High Energy X-ray Timing Experiment (HEXTE) de-
tectors on board RXTE. The PCA and HEXTE covers roughly the energy ranges of 2–60
keV and 15-250 keV, respectively. In this work, we made use of the combined broad spec-
tral coverage of the two detectors to constrain the X-ray continuum more reliably. The
RXTE observation was carried out in six short segments, with exposure times of 1–3 ks.
The data were reduced with the standard HEASOFT package (version 5.2), along with the
associated calibration files and background models. We followed the usual procedures2 in
1See http://asc.harvard.edu/ciao3.2/threads/index.html
2See http://heasarc.gsfc.nasa.gov/docs/xte/recipes/cook book.html
http://asc.harvard.edu/ciao3.2/threads/index.html
http://heasarc.gsfc.nasa.gov/docs/xte/recipes/cook_book.html
– 4 –
preparing, filtering, and reducing the data, as well as in deriving light curves and spectra
from the standard-mode data. While the HEXTE background was directly measured from
off-source observations, the PCA background was estimated based on the background model
appropriate for bright sources.
3. Analysis and Results
3.1. Light Curves
Figure 2 shows the Chandra light curve of Cyg X-1 that is made from the MEG first-
order data (with plus and minus one orders coadded). For comparison, we have over-plotted
the average count rate of PCU #2 for each of the RXTE exposures. The agreement is quite
good. Besides the rapid flares and other short-term variability that are expected in Cyg X-1,
the source also varies significantly on a timescale of hours. The MEG count rate rises from
a baseline level of roughly 170 counts s−1 to a peak of roughly 220 counts s−1 within 3–4
hours and quickly goes back down to the baseline level. We somewhat arbitrarily divided
the light curve into two time periods (which are labeled as Period I and Period II in Fig. 2)
for subsequent analyses.
3.2. High-Resolution Spectroscopy
We analyzed and modeled the HETGS spectra in ISIS (version 1.2.9) (Houck & Denicola
2000)3. For this work, we focused only on the first-order spectra, because of the relatively
poor signal-to-noise ratio of higher-order spectra. For both the MEG and HEG, we first co-
added the plus and minus orders to produce an overall first-order spectrum, again following
the appropriate CIAO 3.2 science threads. The resolution of the raw data is about 0.01 Å
and 0.005 Å for the MEG and HEG, respectively, which represents a factor of 2 over-sampling
of the instrumental resolution. We applied no further binning of the data. Each spectrum
was broken into 3-Å segments for subsequent analyses. Each segment was fitted locally with
a model that consists of a multi-color disk component and a power law for the continuum
and negative or positive Gaussian functions for narrow absorption or emission features, with
the interstellar absorption taken into account. Such a continuum model is typical of BHCs.
However, the best-fit continuum differs among the segments or between the MEG and HEG,
presumably due to remaining calibration uncertainties. Since we are only interested in using
3See also http://space.mit.edu/CXC/ISIS/
http://space.mit.edu/CXC/ISIS/
– 5 –
the Chandra data to study lines, we think that the adopted procedure is justified. We use
the RXTE data to more reliably constrain the continuum.
One thing that one notices right away is the presence of many absorption lines in the
high-resolution spectra. For this work, we consider a feature real if it is present both in
the MEG and HEG data, with a significance of above 4σ. Figure 3 shows a portion of the
MEG first-order spectrum for Period I, to highlight the lines detected. No absorption lines
were found at wavelengths above 15 Å. We should point out that we also see the emission-
like features at 6.74 Å and 7.96 Å, which most likely instrumental artifacts associated with
calibration uncertainties around the Si K and Al K edges (Miller et al. 2005 and references
therein). We have identified the absorption lines with highly ionized species of Ne, Na, Mg,
Al, Si, S, and Fe, based mostly on the atomic data in Verner et al. (1996) and Behar et al.
(2002), but also in ATOMDB4 1.3.1 for additional transitions.
The line identification process involves three steps: (1) a transition is considered a
candidate if the theoretical wavelength is within 0.03 Å of the measured value; (2) in cases
where multiple candidates exist based on (1), the most probably one (based both on the
oscillator strength of the transition and the relative abundance of the element) is chosen;
and (3) consistency check is made to avoid the identification of a line when more probable
transitions of the same ion are not seen. The last step is critical. For instance, we initially
associated the lines at 10.051 Å and 11.029 Å with Fe XVIII 2s22p5–2s22p46d and Fe XVII
2s22p6–2s2p64p, respectively, because, assuming solar abundances, they are expected to be
stronger than Na XI 1s–2p and Na X 1s2–1s1p, which were also viable candidates from
Step 1. However, we did not detect other more probable transitions associated with Fe
XVIII and Fe XVII, which made the identifications highly unlikely. We had to conclude
that Fe is under-abundant by a factor of 2–3, at minimum, so that we could associate the
lines with Na instead. If so, the line at 7.480 Å would more likely be associated with Mg XI
1s2–1s4p, as opposed to Fe XXIII 2s2–2s5p, which we initially identified. The problem with
this is that we saw no hint of Mg XI 1s2–1s3p, which is more probable. Therefore, we had
to conclude similarly that Mg is also under-abundant by at least a similar amount (but not
by too much, as constrained by other Mg lines).
Table 1 show all of the absorption lines that we detected and identified. The flux and
equivalent width (EW) of each line shown were derived from the best-fit Gaussian for the
line, as well as the local continuum around the line for the latter. Note that in a few cases
our identifications are different from those in the literature. They include: Fe XXII at 8.718
Å and Fe XXI at 9.476 Å (cf. Marshall et al. 2001; Miller et al. 2005); Fe XX at 10.12 Å,
4See http://cxc.harvard.edu/atomdb
http://cxc.harvard.edu/atomdb
– 6 –
Na X at 11.0027 Å, and Fe XX at 12.82 Å (cf. Marshall et al. 2001); and Fe XXI 11.975 Å
(cf. Miller et al. 2005). Only about half of the absorption lines that we detected have been
seen previously in Cyg X-1 (Marshall et al. 2001; Schulz et al. 2002; Feng et al. 2003; Miller
et al. 2005). Many of these lines are much stronger in our case. On the other hand, we
detected nearly all of the reported absorption lines. The exceptions include: S XVI at 4.72
Å (Marshall et al. 2001; Feng et al. 2003), which is detectable only at the 3σ level in our
case (and is thus not included in the table), Fe XIX at 14.53 Å (Miller et al. 2005), which is
actually detectable at about the 4σ level here (but just misses our threshold), and Fe XIX
at 14.97 Å and Fe XVIII at 16.01 Å (Miller et al. 2005), whose presence is not apparent in
our data (< 3σ). Miller et al. (2005) also reported a line at 7.85 Å (Mg XI) but only at
the 3σ level. The line can also be seen in our data at the 4σ level (but also just misses our
threshold). Therefore, we already see some indication that the absorption lines in Cyg X-1
may be variable from a comparison of our results with those published.
Still, it is surprising that almost all of the absorption lines become undetectable in
Period II. Figure 4 shows the MEG first-order spectrum for this time period, which can be
directly compared to results in Fig. 3. The only exception is the line at 14.608 Å, which is
detected with a significance of 5σ in Period II. We derived 99% upper limits on the flux and
EW of each line seen in Period I but not in Period II. The results are also summarized in
Table 1, for direct comparison. As an example, we examine the Ne X line at 12.1339 Å in
the two periods. The integrated flux of the line is about 5× 10−3 photons cm2 s−1 in Period
I, while its 99% upper limit for Period II is only 0.06×10−3 photons cm2 s−1. Therefore, the
line weakened by nearly two orders of magnitude in flux over a timescale of merely several
hours. This is the first time that such dramatic variability of the lines has been observed in
any BHC. While this is the most extreme case, other lines also show large variability (see
Table 1).
To quantify the column density of each ion required to account for the corresponding
absorption line detected and its variability, we carried out curve-of-growth analysis, following
Kotani et al. (2000). The atomic data used in the analysis were again taken from Verner
et al. (1996), Behar et al. (2002), and also ATOMDB 1.3.1 in some cases. The analysis
assumes that the width of the lines is due entirely to thermal Doppler broadening. For
resolved lines, we derived the characteristic temperature from the measured widths. For
unresolved lines, on the other hand, we adopted a temperature that would lead to a line
width equal to the resolution of the MEG (0.023 Å at FWHM). In these cases, therefore, the
derived column density only represents a lower limit. The results are shown in Table 1. This
explains why, e.g., the density of Ne X derived from the 12.144-Å line (which is unresolved)
is significantly lower than that from the 10.245-Å line. Note, however, that the latter is
much lower that that from the 9.727-Å line. We think that the inconsistency arises from the
– 7 –
fact that the 9.727-Å line is likely a mixture of the Ne line and the Fe XIX line at 9.700 Å.
Similar inconsistency is also apparent in a few other cases (see Table 1), which may originate
similarly in line blending. It is also worth noting that most lines that we have analyzed fall
on the linear part of the curve of growth. All resolved lines at wavelengths λ > 11.7 Å are
saturated; so is the unresolved line at 14.203 Å. To show the degree of variability, we also
derived a 99% upper limit on the column density of each of the ions for Period II (assuming
the same characteristic temperatures).
3.3. X-ray Continuum
We now use the RXTE data to constrain the X-ray continuum of Cyg X-1. Both the
PCA and HEXTE data were used. For the PCA, we used only data from the first xenon layer
of each PCU, which is most accurately calibrated. Consequently, the PCA spectral coverage
is limited to roughly 2.5–30 keV. We relied on the HEXTE data to extend spectral coverage
to higher energies. The PCA consists of five detector units, known as PCUs. Not all PCUs
were always operating. For simplicity, we used only data from PCU #0 and PCU #2, which
stayed on throughout the observation, in the subsequent modeling. We chose to derive a
spectrum for each PCU separately, as well as for each of the two HEXTE clusters. Residual
calibration errors were accounted for by adding 1% systematic uncertainty to the data. We
also rebinned the HEXTE spectra to a signal-to-noise ratio of at least 3 in each bin. The
individual spectra were then jointly fitted with the same model that includes a multi-color
disk component, a broken power-law that rolls over exponentially beyond a characteristic
energy, and a Gaussian, taking into account the interstellar absorption. We also multiply
the model by a normalization factor that is fixed at unity for PCU #2 but is allowed to
vary for other detectors, in order to account for any uncalibrated difference in the overall
throughput among the detectors. The model fits the data well for all six segments in the
sense that the reduced χ2 is near unity (with 169 degrees of freedom).
The spectral shape of Cyg X-1 varied little from one observation to the other. The best-
fit photon indices are ∼2.1 and ∼1.7 below and above ∼10 keV. The roll-over energy stays at
20–21 keV and the e-folding energy roughly at 120–130 keV. Neither the disk component nor
the absorption column density is well constrained by the data, due to the lack of sensitivity
(and, to some extent, large calibration uncertainty) at low energies. The results can be
compared directly with those of Cui et al. (1997a), who applied the same empirical model
to the RXTE spectra of Cyg X-1 during the 1996 transition. It is quite apparent that the
spectra here are significantly harder, implying that the source was certainly not yet in the
true high-soft state (see Cui et al. 1997a and 1997b for discussions on the “settling period”).
– 8 –
From the long-term ASM monitoring data, we can see that Cyg X-1 was brighter during
our observation than during any of the earlier Chandra observations (Marshall et al. 2001;
Schulz et al. 2002; Miller et al. 2005), but not as bright as during a later observation (Feng
et al. 2003), when the high-soft state appears to have been reached.
3.4. Photoionization Modeling
To shed light on the physical properties of the absorber, we carried out a photo-ionization
calculation with XSTAR version 2.1 kn35. The underlying assumption is that the absorber
is photo-ionized by the X-ray radiation from the vicinity of the black hole. The input
parameters include the 0.5-10 keV luminosity (Lx = 3.11× 10
37 erg s−1, for a distance of 2.5
kpc), power-law photon index (2.1), both of which are based on results from modeling one of
the RXTE spectra with an assumed NH value of 5.5×10
21 cm−2 (Ebisawa et al. 1996; Cui et
al. 1998). We should note that it is in general risky to extrapolating the assumed power-law
spectrum to lower energies, because it could severely over-estimate the flux there. For the
lines of interest here, however, only ionizing photons with energies > 1 keV contribute and
the spectrum of those photons is described fairly well by a power law, because the effective
temperature of the disk component is expected to very low (e.g., Ebisawa et al. 1996; Cui
et al. 1998).
One of the outputs of the calculation is abundances of the ions of interest, as a function
of the ionization parameter, ξ = Lx/nr
2, where n is the density of the absorber and r is the
distance of the absorber to the source of ionizing photons. Using these abundance curves,
we could, in principle, constrain ξ to a range that is consistent with the ratio of the densities
of any two ions of an element. The challenge in practice is, as already mentioned, that many
of the lines are likely a blend of multiple transitions (of comparable probabilities), which
makes it difficult to reliably determine the densities. Nevertheless, we made an attempt
at deriving such constraints with the resolved, non-mixed lines. Figure 5 summarizes the
results. The intervals do not all overlap, which implies that no single value of ξ could account
for all the data. This is supported by the fact that we have detected all the lines that are
expected for ionization parameters in the range of roughly 102.5–104.5. If one assumes that
the “absorbers” are thin shells along the line of sight and that they all have the same density,
e.g., n = 1011 cm−3 (see, e.g., Wen et al. 1999), one would have 1011 cm . r . 1012 cm.
Compared with the estimated the distance between the compact object and the companion
star (∼ 1.4 × 1012 cm; LaSala et al. 1998), this would put the “absorbers” within the
5See http://heasarc.gsfc.nasa.gov/docs/software/xstar/xstar.html
http://heasarc.gsfc.nasa.gov/docs/software/xstar/xstar.html
– 9 –
binary system. One should, however, take the results with caution, because of, e.g., gross
over-simplification regarding the geometry of the “absorbers”.
4. Discussion
The observed dramatic variability of the absorption lines might be caused by a change
in the degree of ionization in the wind. Since the overall X-ray luminosity varied only mildly,
we speculate that it probably arises from a sudden change in the density of the wind. There
is evidence that such a change could occur during a state transition or during flares (Gies et
al. 2003). If the moderate decrease in the ionizing flux is accompanied by a more dramatic
reduction in the density of the wind from Period I to Period II, the ionization parameter
might increase sufficiently to cause a total ionization of the wind in Period II and thus the
disappearance (or significant weakening) of the lines. It is worth noting that in Period I
lighter elements seen are all H- or He-like but Fe is in an intermediate ionization state (as
indicated by the absence of H- or He-like ions), suggesting a high but not extreme degree of
ionization in the period. Conversely, a dramatic increase in the wind density could achieve
the same effect. Numerical simulations of similar wind-accretion systems (e.g., Blondin,
Stevens, & Kallman 1991) have revealed not only a significant jump in the column density
at late orbital phases (& 0.6) that is associated with tidal streams but also large variability of
the absorbing column. It is conceivable that Period II might coincide with a sudden increase
in the column density. Since we found no apparent absorption lines that correspond to a
lower degree of ionization in Period II, however, such lines must be outside the spectral range
covered with our data, in order for the scenario to be viable. A quantitative assessment of
these scenarios is beyond the scope of this work.
Many of the absorption lines detected by Miller et al. (2005) are much stronger during
Period I of our observation (see Table 1). Using only the lines that are detected with a
significance above 5σ, we looked for a systematic red- or blue- shift of the lines, following
up on the reported redshift of the lines by Marshall et al. (2001) based on data taken in
the low-hard state. The results are summarized in Figure 6 (in the left panel). In this
case, although the lines are still systematically redshifted on average, there is not an obvious
single-velocity solution. Interestingly, if we limit the results only to those lines that were
used by Marshall et al. (2001), as shown in the right panel of Figure 6, we would arrive at
an average velocity that is very close to what Marshall et al. reported, although the scatter
of data points is much larger in our case. On the other hand, our observation spans binary
orbital phases from 0.85 to 0.92, according to the most updated ephemeris (Brocksopp et al.
1999b), while Marshall et al.’s covers a phase range of 0.83–0.86. If the redshift of the lines
– 10 –
is related to the focused-wind scenario advocated by Miller et al. (2005), we ought to see a
larger (by about 30%) redshift. Given the large uncertainties, as well as the possibility that
the wind geometry might be different for different states, it is difficult to draw any definitive
conclusions.
Feng et al. (2003) reported the detection of a number of absorption lines of asymmetric
profile, when Cyg X-1 was in the high-soft state, which they interpreted as evidence for
inflows. The same lines are also present in our data during Period I and are, in fact, much
stronger (except for S XVI, as noted in § 3.2). Figure 7 shows an expanded view of the
Si and Mg lines, which are the strongest in the group. As is apparent from the figure, the
line profile can be fitted fairly well by a Gaussian function in all cases. Therefore, the lines
show no apparent asymmetry here. This also seems to be the case for the S and Fe lines,
although the statistics of the data are not as good. Taken together, our results and Feng et
al.’s imply that that the phenomenon is either unique to the high-soft state (in which Feng
et al. made the observation) or is intermittent in nature. We should also note that Feng et
al’s observation was carried out around the superior conjunction (i.e., zero orbital phase),
where absorption due to the wind is expected to be the strongest (e.g., Wen et al. 1999).
It is not clear, however, how such additonal absorption would lead to an asymmetry in the
profile of lines.
No emission lines are apparent in our data. Evidence for weak emission lines has been
presented (Schulz et al. 2002; Miller et al. 2005) but the significance is marginal in all cases.
On the other hand, several absorption edges are easily detected in our data (see Figs. 3 and
4), as first reported and studied in detail by Schulz et al. (2002). The edges can almost
certainly be attributed to the interstellar absorption.
We thank Harvey Tananbaum for approving this DDT observation, Norbert Schulz and
Herman Marshall for helpful discussion on the pros and cons of various observing configura-
tions, John Houck for help with the use of ISIS and Tim Kallman for help on using XSTAR,
and David Huenemoerder for looking into issues related to the Chandra data products. We
acknowledge the use of the curve-of-growth analysis script that Taro Kotani has made pub-
licly available. We also thank the anonymous referee for a number of useful comments that
led to significant improvement of the manuscript. Support for this work was provided in
part by NASA through the Chandra Award DD1-2011X issued by the Chandra X-ray Ob-
servatory Center, which is operated by the Smithsonian Astrophysical Observatory for and
on behalf of NASA under contract NAS8-03060, and through the LTSA grant NAG5-9998.
– 11 –
REFERENCES
Behar, E., & Netzer, H., 2002, ApJ, 570, 165
Blondin, J. M., Stevens, I. R., & Kallman, T. R. 1991, ApJ, 371, 684
Bolton, C. T., 1972, Nature, 235, 271
Brocksopp, C., et al. 1999a, MNRAS, 309, 1063
Brocksopp, C., Tarasov, A. E., Lyuty, V. M., & Roche, P., 1999b, A&A, 343, 861
Cui, W., Heindl, W. A., Rothschild, R. E., Zhang, S. N., Jahoda, K., & Focke, W. 1997a,
ApJ, 474, L57
Cui, W., Zhang, S. N., Focke, W., & Swank, J. 1997b, ApJ, 484, 383
Cui, W., Chen, W., & Zhang, S. N. 1997, in 1997 Pacific Rim Conference on Stellar Astro-
physics, eds. K. L. Chan, et al., PASP Conf. Ser, vol 138, p. 75
Cui, W., Ebisawa, K., Dotani, T., & Kubota, A. 1998, ApJ, 493, L75
Ebisawa, K., et al. 1996, ApJ, 467, 419
Feng, Y. X., Tennant, A. F., & Zhang, S. N., 2003, ApJ, 597, 1017
Gies, D. R., & Bolton, C. T., 1986, ApJ, 304, 389
Gies, D. R., et al. 2003, ApJ, 583, 424
Houck, J. C., & Denicola, L. A. 2000, in Astronomical Data Analysis Software and Systems
IX, eds. N. Manset, C. Veillet, and D. Crabtree, ASP Conf. Proc., Vol. 216, p.591
Kotani, T., Ebisawa, K., Dotani, T., Inoue, H., Nagase, F., Tanaka, Y., & Ueda, Y., 2000,
ApJ, 539, 413
LaSala, J., Charles, P. A., Smith, R. A. D., Balucinska-Church, M., & Church, M. J. 1998,
MNRAS, 301, 285
Marshall, H. L., Schulz, N. S., Fang, T,. Cui, W., Canizares, C. R., Miller, J. M., &
Lewin, W. H. G., 2001, in X-Ray Emmission from Accretion onto Black Holes, eds.
T. Yaqoob, & J. H. Krolik, 45 620, 398
McClintock, J. E., & Remillard, R. A. 2006, in Compact Stellar X-ray Sources, Eds. W. H. G. Lewin
and M. van der Klis (Cambridge Univ. Press), p157
Miller, J. M., et al. 2005, ApJ, 620, 398
Schulz, N. S., Cui, W., Canizares, C. R., Marshall, H. L., Lee, J. C., Miller, J. M., &
Lewin, W. H. G., 2002, ApJ, 565, 1141
Verner, D. A., Verner, E. M. & Ferland, G. J., 1996, At. Data Nucl. Data Tables, 64, 1
– 12 –
Webster, B. L., & Murdin, P., 1972, Nature, 235, 37
Wen, L., Cui, W., Levine, A. M., & Bradt, H. V. 1999, ApJ, 525, 968
This preprint was prepared with the AAS LATEX macros v5.2.
– 13 –
Table 1: Detected Absorption Lines
Theoretical Measured Shift Flux(I) Flux(II) EW(I) EW(II) Nz(I) Nz(II)
Ion and Transition (Å) (Å) (km s−1) (10−3ph cm−2s−1) (mÅ) (mÅ) (10−16cm−2) (10−16cm−2)
S XV 1s2-1s2p 5.039b 5.041(3) 120±180 1.8(3) <0.9 3.8(7) <1.5 2.5(5) ≤1.0
Si XIV 1s-2p 6.1822a 6.189(1) 330±50 3.0(2) <0.6 6.9(4) <1.0 5.4(3) ≤0.7
Si XIII 1s2-1s2p 6.648b 6.657(2) 410±90 2.6(2) <1.7 6.1(5) <2.9 2.4(2) ≤1.1
Mg XII 1s-3p 7.1062a 7.119(3) 540±130 1.2(2) <0.9 2.6(5) <1.4 8(2) ≤4.0
Al XIII 1s-2p 7.1727a 7.191(
) 760
1.3(2) <1.7 3.0(5) <2.7 1.6(3)d ≤1.5
Fe XXIII 1s22s2p-1s22s6d 7.2646c 7.268(5) 140±210 1.4(3) <2.0 3.1(6) <3.3 29(6) ≤31
Fe XXIII 1s22s2-1s22s5p 7.4722a 7.480(
) 310
0.9(2) <1.4 1.9(5) <2.3 5(1) ≤6.5
Fe XXIV 1s22s-1s24p 7.9893a 8.004(5) 550±190 2.2(3) <1.3 4.7(7) <2.2 9(1) ≤4.0
Fe XXII 1s22s22p-1s22s25d 8.0904c 8.096(3) 210±110 1.0(2) <0.6 2.1(5) <1.0 8(2)d ≤3.6
Fe XXII 1s22s22p-1s22s25d 8.1684c 8.166(3) -90±110 1.0(2) <1.1 2.3(5) <1.9 9(2)d ≤7.2
Fe XXIII 1s22s2-1s22s4p 8.3029a 8.319(2) 580±70 3.0(3) <0.6 6.6(6) <1.0 6.4(6) ≤0.9
Mg XII 1s-2p 8.4210a 8.428(1) 250±40 5.0(3) <0.4 10.8(6) <0.7 4.7(3) ≤0.3
Fe XXI 1s22s22p2-1s22s22p5d 8.573a 8.577(5) 140±170 1.3(3) <0.7 2.8(7) <1.2 6(2) ≤2.6
Fe XXII 1s22s22p-1s22s2p1/24p3/2 8.718
c 8.735(
) 580
2.2(3) <0.5 4.7(
) <0.9 7(1) ≤1.3
Fe XXI 1s22s22p2-1s22s2p1/22p3/24p3/2 8.8254
c 8.823(
) -80
1.4(3) <0.9 3.0(
) <1.4 21(
) ≤9.4
Fe XXII 1s22s22p-1s22s24d 8.98a 8.978(
) -70
1.8(3) <0.9 3.9(6) <1.6 4.6(8)d ≤1.8
Fe XXII 1s22s22p-1s22s24d 9.07a 9.083(
) 430
) <0.6 4.0(
) <1.0 5.2
Mg XI 1s2-1s2p 9.170b 9.192(
) 720
6.2(4) − 13.5(9) − 2.9(2) −
Fe XXI 1s22s22p2-1s22s22p4d 9.356a 9.378(5) 700±160 1.9(4) <2.8 4.3
<4.8 9(2) ≤10
Fe XXI 1s22s22p2-1s22s22p4d 9.476a 9.478(1) 60±30 3.8(3) <0.8 8.6(
) <1.5 6.6(
)d ≤1.0
Fe XIX 1s22s22p4-1s22s22p3(2D)5d 9.68a 9.700(4) 620±120 4.4(
) <1.8 9(1) <3.1 30(3) ≤9.8
Ne X 1s-4p 9.7082a 9.727(
) 580
) <0.7 5.3(
) <1.2 24(4) ≤5.0
Fe XX 1s22s22p3-1s22s22p2(3P)4d 9.991a 10.000(1) 270±30 3.7(4) <0.7 8.3(8) <1.3 5.6(6)d ≤0.8
Na XI 1s-2p 10.0250a 10.051(2) 780±60 5.6(
) <2.0 13(1) <3.5 4.0(3) ≤1.0
Fe XX 1s22s22p3-1s22s22p2(3P)4d 10.12a 10.127(3) 210±90 1.7(4) <0.3 3.8(9) <0.6 2.3(6)d ≤0.4
Ne X 1s-3p 10.2389a 10.245(
) 180
) <0.5 6(1) <0.9 9(2) ≤1.2
Fe XXIV 1s22s-1s23p 10.619a 10.631(
) 340
9.1(6) <4.6 22(2) <8.9 10(1) ≤3.7
Fe XXIV 1s22s-1s23p 10.663a 10.674(3) 310±80 5.6(6) <3.2 14(2) <6.2 12(2) ≤5.0
Fe XIX 1s22s22p4-1s22s22p3(4S)4d 10.816c 10.818(5) 60±140 7.0(9) <3.0 18(2) <6.0 14(2) ≤4.3
Fe XXIII 1s22s2-1s22s3p 10.981a 10.990(1) 230±30 5.5(5) <2.7 14(1) <5.6 2.4(2)d ≤0.8
Na X 1s2-1s2p 11.0027a 11.029(2) 720±50 6.3(
) <4.4 17(2) <9.2 2.7(4) ≤1.3
Fe XVIII 1s22s22p5-1s22s22p4(1D)4d 11.326c 11.33(1) 100±260 8(1) <5.1 23(
) <11 24(
) ≤11
Fe XXII 1s22s22p-1s22s2p(3P0)3p 11.44a 11.431(1) -240±30 7.3(6) <2.5 21(2) <5.7 8(1)d ≤1.6
Fe XXII 1s22s22p-1s22s2p(3P0)3p 11.51a 11.500(3) -260±80 4.3(
) <0.6 12(2) <1.4 22(
) ≤2.3
Fe XXII 1s22s22p-1s22s23d 11.77a 11.781(
) 280
12.6(9) <3.8 39(3) <9.3 7.5
Fe XXI 1s22s22p2-1s22s2p23p 11.975c 11.982(2) 180±50 9.1(9) − 29(3) − 17(
Ne X 1s-2p 12.1339a 12.144(
) 250
4.7(7) <0.06 16(2) <0.2 3.9(
)d ≤0.04
Fe XXI 1s22s22p2-1s22s22p3d 12.259a 12.247(
) -290
) <5.4 14(3) <14 10(3)d ≤10
Fe XXI 1s22s22p2-1s22s22p3d 12.285a 12.304(2) 460±50 22(1) <6.8 75(5) <18 12(2) ≤1.6
Fe XXI 1s22s22p2-1s22s22p3d 12.422c 12.438(2) 390±50 3.9(8) <2.9 14(3) <8.1 3.0(
)d ≤1.6
Fe XX 1s22s22p3-1s22s2p33p 12.576c 12.583(
) 170
7(1) <3.6 26(4) <10 19(
) ≤5.2
Fe XX 1s22s22p3-1s22s22p2(3P)3d 12.82a 12.844(2) 560±50 26(2) <6.5 100(6) <20 34
Fe XX 1s22s22p3-1s22s22p1/22p3/23d 12.912
c 12.914(3) 50±70 12(2) <4.7 47(6) <14 33
Fe XX 1s22s22p3-1s22s22p1/22p3/23d 12.965
c 12.953(
) -280
12(1) − 48(6) − 56
Ne IX 1s2-1s2p 13.448b 13.448(
9(2) <8.5 38(9) <30 6(
) ≤4.1
Fe XIX 1s22s22p4-1s22s22p1/22p
3d 13.479c 13.482(3) 70±70 5(1) <3.3 20(
) <12 1.0(
)d ≤0.5
Fe XIX 1s22s22p4-1s22s22p3(2D)3d 13.518c 13.523(3) 110±70 16(2) − 70(
) − 24
Fe XVIII 1s22s22p5-1s22s22p4(1D)3d 14.203a 14.220(3) 360±40 6(1) <4.0 32(7) <18 4(
)d ≤1.5
Fe XIX 1s22s22p4-1s22s22p3(2P)3s 14.60a
14.608(5)e 100±100 19(3) 20(4) 81±13 73±14 200
−60Fe XVIII 1s22s22p5-1s22s22p4(3P)3d 14.610a
Notes. — Results for Periods I and II are both shown for comparison. The errors in parentheses indicate uncertainty in the last digit of the
measurement; 1σ errors are shown. Negative flux or EW upper limits (indicating emission) are not shown.
a Verner et al. (1996); b Behar et al. (2002); c ATOMDB 1.3.3; d Unresolved; e The two transitions are equally probable. The average wavelength
was used to derive the Doppler-shift of the line.
– 14 –
Fig. 1.— Daily-averaged ASM Light Curve of Cyg X-1 during the 2001 state transition. The
vertical line indicates the time of the Chandra observation.
– 15 –
Fig. 2.— X-ray Light Curves of Cyg X-1. The solid curve shows data from the MEG first
order, while the horizontal bars show the average count rates from PCU #2. The error bars
are negligible in both cases. The dashed line defines the two time periods for subsequent
analyses (see text).
– 16 –
Fig. 3.— MEG first-order spectrum of Cyg X-1 for Period I. No binning was applied. The
presence of absorption lines are apparent. The identifications of the lines are shown. Note
that the emission-like features at 6.74 Å and 7.96 Å are likely instrumental (see text). The
solid line shows the best-fit to the (local) continuum.
– 17 –
Fig. 4.— As in Fig. 3, but for for Period II. Note the absence of nearly all the absorption
lines seen in Fig. 3.
– 18 –
2.5 3 3.5 4
Ionization Parameter (log ξ)
Fe XX/Fe XXII
Fe XX/Fe XXIV
Fe XX/Fe XIX
Fe XX/Fe XXI
Fe XX/ Fe XXIII
Si XIV/Si XIII
Ma XII/Mg XI
Ne X/Ne IX
Fig. 5.— Allowed ranges of the ionization parameter, each of which is inferred from the ratio
of the average densities of two ions of the same element.
4 6 8 10 12 14
Wavelength (Å)
4 6 8 10 12 14
Wavelength (Å)
Fig. 6.— Inferred Doppler shift of the selected absorption lines, (left) all the lines with a
significance above 5σ and (right) only the lines that were used by Marshall et al. (2001).
The dotted line shows the average Doppler velocity in both cases.
– 19 –
Fig. 7.— Profiles of the selected absorption lines. In each case, the dot-dashed histogram
shows a fit to the profile with a Gaussian function.
	Introduction
	Observations and Data Reduction
	Analysis and Results
	Light Curves
	High-Resolution Spectroscopy
	X-ray Continuum
	Photoionization Modeling
	Discussion
ABSTRACT
  We report results from a 30 ks observation of Cygnus X-1 with the High Energy
Transmission Grating Spectrometer (HETGS) on board the {\em Chandra X-ray
Observatory}. Numerous absorption lines were detected in the HETGS spectrum.
The lines are associated with highly ionized Ne, Na, Mg, Al, Si, S, and Fe,
some of which have been seen in earlier HETGS observations. Surprisingly,
however, we discovered dramatic variability of the lines over the duration of
the present observation. For instance, the flux of the Ne X line at 12.14 \AA\
was about $5 \times 10^{-3}$ photons cm$^{-2}$ s$^{-1}$ in the early part of
the observation but became subsequently undetectable, with a 99% upper limit of
$0.06 \times 10^{-3}$ photons cm$^{-2}$ s$^{-1}$ on the flux of the line. This
implies that the line weakened by nearly two orders of magnitude on a timescale
of hours. The overall X-ray flux of the source did also vary during the
observation but only by 20--30%. For Cyg X-1, the absorption lines are
generally attributed to the absorption of X-rays by ionized stellar wind in the
binary system. Therefore, they may provide valuable diagnostics on the physical
condition of the wind. We discuss the implications of the results.

<|endoftext|><|startoftext|>
Antiferromagnetism-superconductivity competition in electron-doped cuprates
triggered by oxygen reduction
P. Richard,1, ∗ M. Neupane,1 Y.-M. Xu,1 P. Fournier,2 S. Li,3 Pengcheng Dai,3, 4 Z. Wang,1 and H. Ding1
Department of Physics, Boston College, Chestnut Hill, MA 02467
Département de physique, Université de Sherbrooke, Sherbrooke, Québec, Canada, J1K 2R1
Department of Physics and astronomy, The University of Tennessee, Knoxville, TN 37996
Neutron Scattering Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831
(Dated: October 25, 2018)
We have performed a systematic angle-resolved photoemission study of as-grown and oxygen-
reduced Pr2−xCexCuO4 and Pr1−xLaCexCuO4 electron-doped cuprates. In contrast to the common
belief, neither the band filling nor the band parameters are significantly affected by the oxygen
reduction process. Instead, we show that the main electronic role of the reduction process is to
remove an anisotropic leading edge gap around the Fermi surface. While the nodal leading edge gap
is induced by long-range antiferomagnetic order, the origin of the antinodal one remains unclear.
PACS numbers: 74.72.Jt, 74.25.Jb, 74.62.Dh, 79.60.-i
Even though most of the work has been focused on
hole-doped cuprates, the understanding of their electron-
doped counterparts is essential for obtaining a univer-
sal picture of high-Tc superconductivity. To achieve this
goal, it is necessary to first solve the main mystery that
holds since the discovery of the T’-structure electron-
doped cuprates RE2−xCexCuO4 and RE1−xLaCexCuO4
(RE = Pr, Nd, Sm, Eu): why is superconductivity in
these compounds achieved only when a tiny amount
of oxygen (∼ 1%) is removed from the as-grown (AG)
samples following a post-annealing process (reduction)
[1, 2, 3, 4, 5]? In fact, the AG samples, even with suf-
ficient electron-doping by adding Ce, are antiferromag-
netic (AF) insulators at low temperature. Far from be-
ing a simple materials issue, the understanding of the
microscopic origin of the reduction process that triggers
superconductivity may shed light on other related ques-
tions in high-Tc superconductivity.
Long considered as the microscopic explanation of
the reduction process, the removal of extraneous oxygen
atoms located above Cu (apical oxygen) has been ruled
out by recent Raman and crystal-field infrared transmis-
sion studies [6, 7]. Indeed, these studies revealed two
main defects appearing with the oxygen reduction, which
have been tentatively assigned to out-of-plane and in-
plane oxygen vacancies, the latter being the only one
observed at optimal doping. In parallel, a (RE,Ce)2O3
impurity phase epitaxial to the CuO2 planes appears in
reduced superconducting samples but disappears in re-
oxygenated nonsuperconducting samples [8, 9, 10, 11].
Based on this phenomenon, it has been proposed recently
that the Cu excess released during the formation of the
(RE,Ce)2O3 impurity phase fills Cu vacancies and makes
the remaining structure more stoichiometric [11]. Which
of these structural defects has the most significant impact
on the electronic properties is still under intense debate
∗Electronic address: richarpi@bc.edu
and calls for a better characterization of the electronic
band structure before and after the reduction process.
In contrast to the widespread belief that the reduction
process in the electron-doped cuprates can be consid-
ered as an independent degree of freedom for carrier dop-
ing, a recent systematic study of the Hall coefficient in
Pr2−xCexCuO4 thin films with various oxygen contents
showed that the carrier mobility rather than their con-
centration is modified by the reduction process [12]. In
particular, the annealing process leads to the delocaliza-
tion of holelike carriers, most likely due to the suppres-
sion of the long-range AF order [10, 13]. Understanding
how the reduction process can tune the competition be-
tween the AF and superconducting states and modify the
electronic band structure is thus crucial.
In this letter, we present the first systematic angle-
resolved photoemission spectroscopy (ARPES) study of
the impact of the reduction process in the electron-doped
cuprates. We show that neither the electronic filling nor
the band structure parameters are significantly changed
by the reduction process. Instead, the reduction process
suppresses long-range AF order and fills up a leading edge
gap (LEG) which has two components. While the nodal
LEG is of AF origin, the nature of the antinodal LEG
remains unclear.
High-quality Pr1.85Ce0.15CuO4 single crystals have
been grown by the flux technique. Some of the non-
superconducting AG samples have been annealed in an
argon environment at temperatures between 850 and 925
oC for a typical period of 5 days encapsulated in a poly-
crystalline matrix [14] and are referred in the text as
reduced samples. The reduced samples exhibit super-
conducting transitions around 24 K. Using the floating
zone technique, high-quality Pr0.88LaCe0.12CuO4 single
crystals have also been grown and have been annealed
as described in Ref. [11]. The samples have been stud-
ied by ARPES using the PGM and U1-NIM beamlines
of the Synchrotron Radiation Center (Stoughton, WI)
with photon energies of 73.5 and 22 eV. The data have
been recorded at 40 K using a Scienta SES-2002 ana-
http://arxiv.org/abs/0704.0453v1
mailto:richarpi@bc.edu
lyzer with a 30 meV energy resolution. The samples
have been cleaved in situ in a vacuum better than 10−10
Torr. Although this letter focuses on the data obtained
on Pr2−xCexCuO4, similar results have been obtained for
the Pr1−xLaCexCuO4 samples.
In order to transform AG samples into superconduc-
tors, the reduction process must affect the electronic
structure and especially the excitations near the Fermi
energy (EF). Fig. 1 compares the constant energy in-
tensity plots (CEIPs) of the reduced (a,b) and AG (d,e)
samples in momentum space, as obtained by ARPES.
Bright spots indicate regions with large photoemission
intensity. The CEIPs centered at -100 meV with 20 meV
energy integration window (b,e) are quite similar, and
one can easily distinguish the X(±π,±π)-centered hole-
like pockets. This contrasts with the CEIPs centered at
EF (a,d). Contrary to the reduced sample (Fig. 1a),
the intensity at EF is strongly suppressed in the AG
sample (Fig. 1d). Nevertheless, underlying Fermi sur-
face (FS) contours can be extracted in both cases and
the results are reproduced in Figs. 1c and f for the re-
duced and AG samples, respectively. Surprisingly, the
data extracted for the reduced and the AG samples can
be fitted, within uncertainties, with the same band pa-
rameters µ = 0.05 eV, t = −1.1 eV and t′ = 0.32
eV, using the simple effective tight-binding (TB) model
E − µ = t/2[cos(kx) + cos(ky)] + t
′ cos(kx) cos(ky).
According to Luttinger theorem, the introduction of
extra negative carriers following the reduction would lead
to smaller X-centered holelike pockets and thus to at least
an increase of µ, which is not observed. Actually, the un-
derlying FS contours coincide with a doping of x ≈ 0.15
in both cases. This is a strong evidence that the re-
duction process modifies neither the band filling nor the
shape of the band dispersion sufficiently to induce the
dramatic changes observed in the transport properties
[12]. Instead, the CEIPs at EF indicate that the anneal-
ing process removes a LEG that is present at EF in the
AG samples. This assertion is confirmed by Fig. 1h,
which compares the electron distribution curves (EDCs)
of AG and reduced samples at different k-locations given
in Fig. 1g. All the AG sample spectra have their leading
edge shifted towards higher binding energies as compared
with the corresponding reduced sample spectra and thus
have much weaker intensities at EF.
Now one asks the question: how can the reduction pro-
cess suppress the LEG observed in the AG samples? We
first checked that the samples were not charged by in-
creasing the photon flux, which had no influence on the
leading edge shift in our experiment. The most likely
candidate to explain this mystery is the AF ordering,
which exists in AG samples. It is known that AF is sup-
pressed in the reduced samples [10, 13]. This effect is
clearly observable by ARPES. We plotted in Figs. 2a
and 2b the electronic dispersion as measured along the
lines indicated in Fig. 2c for the reduced (dashed) and
AG (solid) samples, respectively. In addition to the main
band branches, indicated with dashed arrows, the spec-
FIG. 1: (Color online) a-b and d-e) CEIPs (20 meV integra-
tion) of the ARPES data obtained at 40 K using a 73.5 eV
photons with a A||Γ-X polarization. c and f) Underlying FS
contours associated with the reduced and AG samples. The
experimental data are represented by circles while the points
indicated by diamonds have been obtained by symmetry oper-
ations. The data have been fitted by an effective tight-binding
model (see the text). h) Comparison of the EDCs of reduced
(solid) and AG (dashed) at the locations given in panel g.
trum of the AG sample shows features that are not ob-
served in the reduced one. As indicated directly by solid
arrows on Fig. 2, these features correspond to the AF-
induced folding (AIF) band. Such features, observed in
the first as well as in the second Brillouin zones (BZs), are
responsible for the M(-π,0)-centered electron FS pockets
emphasized in Fig. 2d, which shows the AG CEIP at -50
meV obtained with 22 eV photons.
In the presence of an AIF band, the hybridization of
the main and AIF bands opens a gap at locations coin-
ciding with the magnetic BZ boundary [15]. The energy
position of the center of the gap between the upper and
lower hybridized bands depends on the k-location along
that boundary, defined by M(0,π) and equivalent points.
Hence, along the (0,π)→(π,0) direction, it occurs below
EF between the M points and the hot spots, which are
defined as the k-locations where the intersection occurs
exactly at EF. On the other hand, between the hot spots,
the intersection occurs always above EF and the upper
hybridized band can never cross EF and therefore cannot
be observed by low temperature ARPES, whereas the
top of the lower hybridized band is pushed down. When
the AF gap is large enough, the small holelike FS pocket
around (π/2,π/2) is gapped out.
In order to check this scenario, we investigated the
band dispersion of reduced and AG samples along the
nodal (Γ-X) direction. The results are given in Fig. 3.
Figs. 3a and b show the EDCs of the reduced and AG
samples, respectively. Besides the clear leading edge shift
FIG. 2: (Color online) Comparison of the reduced (a) and AG
(b) spectra obtained along the lines given in c. MB (dashed
arrows) and AIF (solid arrows) indicate the main and AIF
band structures, respectively. The number next to MB or
AFI indicates the zone in which the band is detected. d) CEIP
at -50 meV (30 meV integration) of a Pr1.85Ce0.15CuO4 AG
sample obtained at 22 eV. The suppression along the vertical
Γ-M direction is due to ARPES selection rules.
observed for the AG sample as compared to the reduced
one, the EDC maxima of the AG sample exhibit a bend-
ing back characteristic of hybridization: from the top to
the bottom of Fig. 3b, the EDC maxima first move closer
to EF, and then move away, with a decrease of intensity.
A contrast in the shape of the momentum distribution
curves (MDCs) of the reduced and AG samples, which are
given respectively in Figs. 3c and d, is also observed. The
asymmetric shape of the AGMDCs suggests the presence
of a band folded along the AF boundary (vertical dashed
line). This effect is clearly seen on the corresponding
second momentum-derivative intensity (SMDI) plots dis-
played in Figs. 3e and f for the reduced and AG samples,
respectively. The position of the MDC peaks corresponds
to the minimum in the SMDI plots (bright spots). In con-
trast to the situation shown in Fig. 3e, Fig. 3f exhibits
an additional band whose dispersion is the reflection of
the main band with respect to the AF boundary. Using
the position of the MDCs, we extracted the main band
dispersion and reported it on Fig. 3f, along with its re-
flection across the AF boundary. We also reported on
Fig. 3f the position of the EDC maxima associated with
the AG sample. These maxima, which coincide with the
renormalized dispersion band, support the hybridization
scenario and indicate that the portion of the FS around
(π/2,π/2) is suppressed in the AG samples.
Fig. 4a, which compares the k-dependence of the lead-
ing edge shift for the AG and reduced samples, provides
additional evidence that an AF hybridization gap is sup-
pressed after the reduction process. While no clear lead-
ing edge shift is observable for the reduced sample, an
anisotropic LEG is observed for the AG sample. Hence,
FIG. 3: (Color online) Comparison of the nodal dispersion
(second zone) between reduced (top panels) and AG samples
(bottom panels). a-b) EDCs from -0.87π/a (bottom) to -
0.62π/a (top). c-d) MDCs between - 80 meV (bottom) and 0
meV (top). e-f) SMDI plots. The vertical dashed line corre-
sponds to the AF boundary, while the solid lines and the dots
correspond to the unhybridized main and AF bands, and to
the hybridized band, respectively.
a maximum is observed around the hot spot, as expected
from the hybridization scenario [15]. In order to illus-
trate further the AF scenario, we plotted in Figs. 4b-e
simulations of the nodal dispersion obtained using the fit
parameters given above, with various AF gap sizes. We
introduced a broadening to mimic realistic results and re-
moved the Fermi function for sake of clarity. For a large
gap, the band folding is clearly observable and the lower
hybridized band never crosses EF. As a consequence, a
large leading edge shift is recorded. This leading edge
shift decreases as the gap becomes smaller, and it dis-
appears when the lower band crosses EF, as illustrated
in 4d. Our simulations indicate that a LEG of 20 meV
along the nodal direction can be produced by a 60 meV
AF LEG at the hot spot with a proper linewidth, in
agreement with our observation in Fig. 4a.
While the experimental results for the nodal region can
easily be described by an AF hybridization gap, such a
model alone fails to explain the LEG observed for the
antinodal region, where the main and AIF bands inter-
sect below EF. In particular, the original band position
at M is ∼ 300 meV below EF, thus a small LEG (∼ 20
meV) cannot be produced by a simple AF hybridization
effect. Nevertheless, AF order may still be responsible
for the antinodal LEG. It has been predicted that the
whole FS of the Nd1.85Ce0.15CuO4 superconducting sam-
ples, including the antinodal region, is pseudogapped due
to paramagnons in the semiclassical regime [16]. Even
though this is in apparent contradiction with our exper-
imental data, which indicate no antinodal LEG for the
reduced samples, this idea may be valid for the AG sam-
ples, for which the AF correlations are much stronger.
FIG. 4: (Color online) a) k-dependece of the leading edge
shift. Lines are guides for the eye. b-e) Simulations of the
band dispersion in the presence of an AF gap. We used the
TB parameters defined in the text and introduced a band
broadening in order to mimic real data.
We now turn to a critical question: how can a small
amount of extra oxygen atoms induce AF order in the
AG samples? Literature provides two opposite scenarios
involving CuO2 plane defects and the competition be-
tween the AF and superconducting states. It has been
suggested that in-plane oxygen vacancies in the reduced
samples can suppress the AF order, affect the band pa-
rameters and induce superconductivity [6, 7]. However,
the present study indicates that the band parameters
are not modified significantly by the reduction process.
Moreover, the antinodal LEG in this scenario would be
more likely an indirect consequence of magnetic fluctu-
ations such as paramagnons [16]. The lack of theoreti-
cal study on the subject leaves open the possibility that
charge disorder induced by oxygen vacancies can suppress
the AF order and promote superconductivity.
An opposite scenario is based on a recent neutron study
suggesting a deficiency of Cu in AG Pr1−xLaCexCuO4
samples that is healed after the reduction process through
the formation of a (RE,Ce)2O3 impurity phase [11]. It
has been argued that, in hole-doped cuprates, a Cu va-
cancy, like a nonmagnetic Zn impurity, would suppress
local superconducting phase coherence and at the same
time induce a staggered paramagnetic S=1/2 local mo-
ment extending over a few unit cells [17]. Such impurity
or Cu vacancy induced local magnetism has been recently
observed by in-plane 17O NMR in the superconducting
state of the optimally hole-doped YBa2Cu3O7 with dilute
Zn impurities [18]. If the amount of Cu vacancies in the
AG samples, although small, is sufficient, it is possible to
establish AF long-range order by quantum percolation of
the AF regions. In addition, the strong scattering of the
Cu vacancies in CuO2 planes of AG samples may produce
a localization gap (or Coulomb gap) that could explain
the observed antinodal LEG. We note that the residual
resistivity (∼ 500 µΩcm) at the superconductor-insulator
transition, found in re-oxygenated Pr1.83Ce0.17CuO4 thin
films [12], corresponds to the two-dimensional (2D) re-
sistance ρ2D0 ≈ 8.3 kΩ/� per CuO2, close to the uni-
versal 2D value h/4e2 ≃ 6.5kΩ/� [19, 20, 21]. Simi-
lar results were also found in the Zn-substituted hole-
doped cuprates [22]. Moreover, the 2%-4% of Zn sub-
stitution needed to suppress completely superconductiv-
ity in La2−xSrxCu1−zZnzO4 [22] is similar to the value
of 1.2% to 2.3% Cu vacancies estimated in the AG and
non-superconducting Pr0.88LaCe0.12CuO4 samples [11].
We caution that there are other possible explanations
to account for the antinodal LEG, such as a charge den-
sity wave induced by the nesting of two sides of the M-
centered electron pockets. However, it would then be
hard to explain why both the long-range AF and this
charge density wave are suppressed in the reduced sam-
ples. The unexpected presence of the antinodal LEG calls
for further theoretical and experimental studies.
We are indebted to A.-M. S. Tremblay and D. Sénéchal
for useful discussions. We acknowledge support from
NSF DMR-0353108, DOE DEFG02-99ER45747 and DE-
FG02-05ER46202. This work is based upon research con-
ducted at the Synchrotron Radiation Center supported
by NSF DMR-0537588. P.F. acknowledges the support
of NSERC (Canada), FQRNT (Québec), CFI and CIAR.
[1] E. Moran et al., Physica C 160, 30 (1989).
[2] J. S. Kim and D. R. Gaskell, Physica C 209, 381 (1993).
[3] E. Wang et al., Phys. Rev. B 41, 6582 (1990).
[4] E. Takayama-Muromachi et al., Physica C 159, 634
(1989).
[5] K. Susuki et al., Physica C 166, 357 (1990).
[6] G. Riou et al., Phys. Rev. B 69, 024511 (2004).
[7] P. Richard et al., Phys. Rev. B 70, 064513 (2004).
[8] K. Kurahashi et al., J. Phys. Soc. Jpn 71, 910 (2002).
[9] M. Matsuura et al., Phys. Rev. B 68, 144503 (2003).
[10] Pengcheng Dai et al., Phys. Rev. B 71, 100502 (2005).
[11] H. J. Kang et al., Nature Materials 6, 224 (2007).
[12] J. Gauthier et al. , Phys. Rev. B 75, 024424 (2007).
[13] P. Richard et al., Phys. Rev. B 72, 184514 (2005).
[14] M. Brinkmann et al., Physica C 269, 76 (1996).
[15] H. Matsui et al., Phys. Rev. Lett. 94, 047005 (2005).
[16] D. K. Sunko and S. Barǐsić, Phys. Rev. B 75, 060506
(2007).
[17] Z. Wang and P. A. Lee, Phys. Rev. Lett. 89, 217002
(2002).
[18] S. Ouazi et al., Phys. Rev. Lett. 96, 127005 (2006).
[19] T. Pang, Phys. Rev. Lett. 62, 2176 (1989).
[20] M. P. A. Fisher et al., Phys. Rev. Lett. 64, 587 (1990).
[21] V. J. Emery and S. A. Kivelson, Phys. Rev. Lett. 74,
3253 (1995).
[22] Y. Fukuzumi et al., Phys. Rev. Lett. 76, 684 (1996).
ABSTRACT
  We have performed a systematic angle-resolved photoemission study of as-grown
and oxygen-reduced Pr$_{2-x}$Ce$_x$CuO$_4$ and Pr$_{1-x}$LaCe$_{x}$CuO$_4$
electron-doped cuprates. In contrast to the common belief, neither the band
filling nor the band parameters are significantly affected by the oxygen
reduction process. Instead, we show that the main electronic role of the
reduction process is to remove an anisotropic leading edge gap around the Fermi
surface. While the nodal leading edge gap is induced by long-range
antiferomagnetic order, the origin of the antinodal one remains unclear.

<|endoftext|><|startoftext|>
Microsoft Word - Bennett_space_microlensing.doc
An Extrasolar Planet Census with a Space-based 
Microlensing Survey 
D.P. Bennett
, J. Anderson
, J.-P. Beaulieu
, I. Bond
, E. Cheng
, K. Cook
, S. Friedman
, B.S. 
Gaudi
, A. Gould
, J. Jenkins
, R. Kimble
, D. Lin
, M. Rich
, K. Sahu
, D. Tenerelli
, A. 
Udalski
, and P. Yock
University of Notre Dame, Notre Dame, IN, USA 
 Rice University, Houston, TX, USA 
 Institut d’Astrophysique, Paris, France 
 Massey University, Auckland, New Zealand 
 Conceptual Analytics, LLC, Glen Dale, MD, USA 
 Lawrence Livermore National Laboratory, USA 
 Space Telescope Science Institute, Baltimore, MD, USA 
 Ohio State University, Columbus, OH, USA 
 SETI Institute, Mountain View, CA, USA 
 NASA/Goddard Space Flight Center, Greenbelt, MD, USA 
 University of California, Santa Cruz, CA, USA 
 University of California, Los Angeles, CA, USA 
 Lockheed Martin Space Systems Co., Sunnyvale, CA, USA 
 Warsaw University, Warsaw, Poland 
University of Auckland, Auckland, New Zealand 
ABSTRACT 
A space-based gravitational microlensing exoplanet survey will provide a statistical census of 
exoplanets with masses  0.1M  and orbital separations ranging from 0.5AU to . This includes 
analogs to all the Solar System’s planets except for Mercury, as well as most types of planets 
predicted by planet formation theories. Such a survey will provide results on the frequency of 
planets around all types of stars except those with short lifetimes. Close-in planets with 
separations < 0.5 AU are invisible to a space-based microlensing survey, but these can be found 
by Kepler. Other methods, including ground-based microlensing, cannot approach the 
comprehensive statistics on the mass and semi-major axis distribution of extrasolar planets that a 
space-based microlensing survey will provide. The terrestrial planet sensitivity of a ground-based 
microlensing survey is limited to the vicinity of the Einstein radius at 2-3 AU, and space-based 
imaging is needed to identify and determine the mass of the planetary host stars for the vast 
majority of planets discovered by microlensing. Thus, a space-based microlensing survey is 
likely to be the only way to gain a comprehensive understanding of the nature of planetary 
systems, which is needed to understand planet formation and habitability. The proposed 
Microlensing Planet Finder (MPF) mission is an example of a space-based microlensing survey 
that can accomplish these objectives with proven technology and a cost that fits comfortably 
under the NASA Discovery Program cost cap. 
1. Basics of the Gravitational Microlensing Method 
The physical basis of microlensing is the gravitational attraction of light rays by a star or 
planet. As illustrated in Fig. 1, if a “lens star” passes close to the line of sight to a more distant 
source star, the gravitational field of the lens star will deflect the light rays from the source star. 
The gravitational bending effect of the lens star “splits”, distorts, and magnifies the images of the 
source star. For Galactic microlensing, the image separation is  4 mas, so the observer sees a 
microlensing event as a transient brightening of the source as the lens star’s proper motion 
moves it across the line of sight. 
Gravitational microlensing events are characterized by the Einstein ring radius, 
= 2.0 AU
(1 kpc)
where ML is the lens star mass, and DL 
and DS are the distances to the lens and 
source, respectively. This is the radius of 
the ring image that is seen with perfect 
alignment between the lens and source 
stars. The lensing magnification is 
determined by the alignment of the lens 
and source stars measured in units of RE, 
so even low-mass lenses can give rise to 
high magnification microlensing events. 
The duration of a microlensing event is 
given by the Einstein ring crossing time, 
Fig. 1: The geometry of a microlensing planet search
towards the Galactic bulge. Main sequence stars in
the bulge are monitored for magnification due to
gravitational lensing by foreground stars and planets
in the Galactic disk and bulge. 
Fig. 2: MPF is sensitive to planets above the 
purple curve in the mass vs. semi-major axis 
plane. The gold, green and cyan regions 
indicates the sensitivities of radial velocity 
surveys, SIM and Kepler, respectively. The 
location of our Solar System’s planets and many 
extrasolar planets are indicated, with ground-
based microlensing discoveries in red. 
which is typically 1-3 months for stellar lenses and a few days or less for a planet. 
Planets are detected via light curve deviations that differ from the normal stellar lens light 
curves (Mao & Paczynski 1991). Usually, the signal occurs when one of the two images due to 
lensing by the host star passes close to the location of the planet, as indicated in Fig. 1 (Gould & 
Loeb 1992), but planets can also be detected at very high magnification where the gravitational 
field of the planet destroys the symmetry of the Einstein ring (Griest & Safizadeh 1998). 
2. Capabilities of the Microlensing Method 
Planets down to one tenth of an Earth mass can be detected. The probability of a 
detectable planetary signal and its duration both scale as RE ~M p
, but given the optimum 
alignment, planetary signals from low-mass 
planets can be quite strong. The limiting 
mass for the microlensing method occurs 
when the planetary Einstein radius becomes 
smaller than the projected radius of the 
source star (Bennett & Rhie 1996). The 
~5.5 M  planet detected by Beaulieu et al. 
(2006) is near this limit for a giant source 
star, but most microlensing events have G 
or K-dwarf source stars with radii that are 
at least 10 times smaller than this. So, the 
sensitivity of the microlensing method 
extends down to < 0.1M , as the results of a 
detailed simulation of the MPF mission 
(Bennett & Rhie 2002) show in Fig. 2. 
Microlensing is sensitive to a wide 
range of planet-star separations and host 
star types. The host stars for planets 
detected by microlensing are a random 
sample of stars that happen to pass close to 
the line-of-sight to the source stars in the 
Galactic bulge, so all common types of 
stars are surveyed, including G, K, and M-
dwarfs, as well as white dwarfs and brown 
dwarfs. Microlensing is most sensitive to planets at a separation of ~RE (usually 2-3 AU) due to 
the strong stellar lens magnification at this separation, but the sensitivity extends to arbitrarily 
large separations. It is only planets well inside RE that are missed because the stellar lens images 
that would be distorted by these inner planets have very low magnifications and a very small 
contribution to the total brightness. These features can be seen in Fig. 2, which compares the 
sensitivity of the MPF mission with expectations for other planned and current programs. Other 
ongoing and planned programs can detect, at most, analogs of two of the Solar System’s planets, 
while a space-based microlensing survey can detect seven—all but Mercury. The only method 
with comparable sensitivity to MPF is the Kepler space-based transit survey, which complements 
the microlensing method with sensitivity at semi-major axes, a  1. The sensitivities of MPF and 
Kepler overlap at separations of ~1 AU, which corresponds to the habitable zone for G and K 
stars.  
The red crosses in Fig. 2 indicate the two gas 
giant (Bond et al. 2004; Udalski et al. 2005) and 
two ~10M  “super-earth” planets in orbits of ~3 
AU discovered by ground-based microlensing 
(Beaulieu et al. 2006; Gould et al. 2006). A 
preliminary analysis suggests that about one 
third of all stars are likely to have a super-earth 
at 1.5-4AU whereas radial velocity surveys find 
that only about 3% of stars have gas giants in 
this region (Butler et al. 2006). 
 Microlensing light curves yield 
unambiguous planet parameters. For the great 
majority of events, the basic planet parameters 
(planet:star mass ratio, planet-star separation) 
can be “read off” the planetary deviation
(Gould 
& Loeb 1992; Bennett & Rhie 1996; 
Wambsganss 1997). Possible ambiguities in the 
interpretation of planetary microlensing events 
have been studied in detail (Gaudi & Gould 
1997; Gaudi 1998), and these can be resolved 
with good quality, continuous light curves that 
will be routinely acquired with a space-based 
microlensing survey. A space-based survey will 
also detect most of the planetary host stars, 
which generally allows the host star mass, 
approximate spectral type, and the planetary 
mass and separation to be determined (Bennett 
et al. 2007). The distance to the planetary 
system is determined when the host star is 
identified, so a space-based microlensing survey 
will also measure how the properties of exoplanet systems change as a function of distance from 
the Galactic Center. There is usually some redundancy in the measurements that determine the 
properties of the host stars, and so the determination is robust to complicating factors, such as a 
binary companion to the background source star. 
Detailed simulations indicate a large number of planet detections. Bennett & Rhie (2002) 
and Gaudi (unpublished) have independently simulated space-based microlensing surveys. These 
simulations included variations in the assumed mission capabilities that allow us to explore how 
changes in the mission design will affect the scientific output, and they form the basis of our 
predictions in Figs. 2-4. In order to predict the number of planets that will be detected by a 
space-based microlensing survey, we must make assumptions regarding the frequency of 
exoplanets. Figs. 3 and 4 show the expected number of planets that MPF would detect at orbital 
separations 1-2.5 AU and  (i.e. free-floating planets) assuming one such planet per star. The 
range 1-2.5 AU is presented because this is just outside the range of Kepler and inside the region 
of highest sensitivity for ground-based microlensing surveys. It also corresponds to the outer part 
of the habitable zone for G and K stars and contains the “snow-line” for part of the history of 
lower mass stars (Kennedy et al. 2006). Free-floating planets are expected to be a common by-
Fig. 3: The expected number of MPF planet 
discoveries as a function of the planet mass if 
every star has a single planet at a separation 
of 1.0-2.5 AU. 
Fig. 4: The expected number of MPF free-
floating planet discoveries. 
product of most planet formation scenarios (Levison 
et al. 1998; Goldreich et al. 2004), and only a space-
based microlensing survey can detect free-floating 
planets of  1M . 
3. A Space-based Microlensing Survey Is Needed 
Microlensing relies upon the high density of 
source and lens stars towards the Galactic bulge to 
generate the stellar alignments that are needed to 
generate microlensing events, but this high star 
density also means that the bulge main sequence 
source stars are not generally resolved in ground-
based images, as Fig. 5 demonstrates. This means 
that the precise photometry needed to detect planets 
of  1M  is not possible from the ground unless the magnification due to the stellar lens is 
moderately high. This, in turn, implies that ground-based microlensing is only sensitive to 
terrestrial planets located close to the Einstein ring (at ~2-3 AU). The full sensitivity to terrestrial 
planets in all orbits from 0.5 AU to  comes only from a space-based survey. 
 Planetary host star detection from space yields precise star and planet parameters. For 
all but a small fraction of planetary microlensing events, space-based imaging is needed to detect 
the planetary host stars, and the detection of the host stars allows the star and planet masses and 
separation in physical units to be determined. This can be accomplished with HST observations 
for a small number of planetary microlensing events (Bennett et al. 2006), but space-based 
survey data will be needed for the detection of host stars for hundreds or thousands of planetary 
microlensing events. Fig. 6 shows the distribution of planetary host star masses and the predicted 
uncertainties in the masses and separation of the planets and their host stars (Bennett et al. 2007) 
from simulations of the MPF mission. The host stars with masses determined to better than 20% 
are indicated by the red histogram in Fig. 6(a), and these are primarily the host stars that can be 
detected in MPF images. 
Ground-based microlensing surveys also suffer significant losses in data coverage and quality 
due to poor weather and seeing. As a result, a significant fraction of the planetary deviations seen 
Fig. 6: (a) The simulated distribution of stellar masses for stars with detected terrestrial planets. 
The red histogram indicates the subset of this distribution for which the masses can be 
determined to better than 20%. (b) The distribution of uncertainties in the projected star-planet 
separation. (c) The distribution of uncertainties in the star and planet masses. 
Fig. 5: A comparison between an image 
of the same star field in the Galactic bulge 
from CTIO in 1” seeing and a simulated 
MPF frame (based on an HST image). 
The indicated star is a microlensed main 
sequence source star. 
(a) (b) (c) 
in a ground-based microlensing survey will have poorly constrained planet parameters due to 
poor light curve coverage (Peale 2003). 
4. A Space-Based Microlensing Survey Constrains Planet Formation Theories 
Rapid advancement in exoplanet research is driven by both extensive observational searches 
around mature stars as well as the construction of planet formation and evolution models. 
Perhaps the most surprising discovery so far is the great diversity in the planets' dynamical 
properties, but these results are largely confined to planets that are unusually massive or reside in 
very close orbits. The core accretion theory suggests most planets are much less massive than gas 
giants and that the critical region for understanding planet formation is the “snow-line”, located 
in the region (1.5-4 AU) of greatest microlensing sensitivity (Ida & Lin 2005; Kennedy et al. 
2006). Early results from ground-based microlensing searches (Beaulieu et al. 2006; Gould et al. 
2006) appear to confirm these expectations. A space-based microlensing survey would extend 
the current sensitivity of the microlensing method down to masses of ~0.1M  over a large range 
(0.5AU- ) in separation, and in combination with Kepler, such a mission provides sensitivity to 
sub-Earth mass planets at all separations. The semi-major axis region probed by space-
microlensing provides a cleaner test of planet formation theories than the close-in planets 
detected by other methods, because planets discovered at > 0.5 AU are more likely to have 
formed in situ than the close-in planets. The sensitivity region for space-microlensing includes 
the outer habitable zone for G and K stars through the “snow-line” and beyond, and the lower 
sensitivity limit reaches the regime of planetary embryos at ~0.1M . It may be that such planets 
are much more common than planets of 1M  because their type-1 migration time is much longer.  
Space-microlensing tests core accretion. The space-microlensing census of low-mass 
planets should also provide direct evidence of features of the proto-planetary disk predicted by 
the core accretion theory. There are several physical processes that control the development of 
planetary embryos and planets in the proto-planetary disk. In the inner disk, the size of planetary 
embryos is controlled by the isolation mass, and the isolation mass is expected to jump by an 
order of magnitude across the “snow-line” because of the increased surface density of solids in 
the disk. But the number of gas giant and super-earth planets is also expected to increase beyond 
the snow line, while the planetary growth time increases. This means that it is more likely for the 
growth of outer planets to be terminated via gravitational scattering of planetesimals or the proto-
planets themselves. Scattering would also result in the removal of lower mass planets into very 
distant orbits or even out of the gravitational influence of the host stars altogether, but space-
based microlensing can still detect planets in these locations. The frequency of planets of 
different masses and separations that a space-based microlensing survey provides will yield a 
unique insight into the planetary formation process and will allow us to determine the importance 
of these processes. 
The habitability of a planet depends on its formation history. The suitability of a planet 
for life depends on a number of factors, such as the average surface temperature, which 
determines if the planet resides in the habitable zone. However, there are many other factors that 
also may be important, such as the presence of sufficient water and other volatile compounds 
necessary for life (Raymond et al. 2004; Lissauer 2007). Thus, a reasonable understanding of 
planet formation is an important foundation for the search for nearby habitable planets and life. 
5. Overview of the Microlensing Planet Finder Mission  
Key requirements of the MPF mission are summarized in table 1. MPF continuously observes 
four 0.65 sq. deg. fields in the central Galactic Bulge using an inclined geostationary orbit to 
provide a continuous view of the Galactic Bulge 
fields and a continuous downlink. MPF will use a 
dedicated ground station co-located with other 
NASA facilities at White Sands, NM. Spacecraft 
commanding and on-board processing are 
minimized because of the simple observation plan and orbit design. 
MPF system. MPF uses a 1.1m Three-Mirror Anastigmat (TMA) telescope feeding a 145 
Mpixel HgCdTe focal plane residing on a standard spacecraft bus as shown in Fig. 7. The MPF 
design leverages existing hardware and design concepts, many of which are already 
demonstrated on-orbit and/or flight qualified. The spacecraft bus is a near-identical copy of that 
used for Spitzer and has demonstrated performance that meets MPF requirements. The telescope 
system leverages Ikonos and NextView commercial Earth-observing telescope designs that 
provide extensive diffraction-limited images. The focal plane design taps proven technologies 
developed for JWST. All elements are at TRL 6 or better. The focal plane design uses common 
non-destructive readout CMOS multiplexers for two detector technologies that cover the visible 
and the near-IR. The MPF focal plane can track up to 35 guide stars in the field, providing a 
built-in fine guidance capability. The focal plane gains additional advantages from using the 
Teledyne SIDECAR™ application specific integrated circuit (ASIC) that condenses all the 
control and readout electronics into a “system-on-a-chip” implementation. This approach 
dramatically simplifies the support electronics while minimizing wire-count challenges. 
Cost and Schedule. The total cost for the MPF mission is (FY06) $390M including 30% 
contingency during development. The team that developed the costs included NASA GSFC, 
Lockheed Martin, ITT, STScI, Teledyne and University of Notre Dame. 
6. Discussion and Summary 
A space-based microlensing survey provides a census of extrasolar planets that is complete (in 
a statistical sense) down to 0.1M  at orbital separations  0.5 AU, and when combined with the 
results of the Kepler mission a space-based microlensing survey will give a comprehensive 
Property Value Units  
Launch Vehicle 7920-9.5 Delta II 
Orbit Inclined GEO 
28.7 
degrees 
Mission Lifetime 4.0 years 
Telescope Aperture 1.1 meters (diam.) 
Field of View 0.95x0.68 degrees 
Spatial Resolution 0.240 arcsec/pixel 
Pointing Stability 0.048 arcsec 
Focal Plane Format 145 Megapixels 
Spectral Range 600 – 1700 nm in 3 bands 
Quantum Efficiency > 75% 
> 55% 
900-1400 nm 
700-1600 nm 
Dark Current < 1 e-/pixel/sec 
Readout Noise < 30 e-/read 
Photometric 
Accuracy 
1 or better % at J=20.5. 
Data Rate 50.1 Mbits/sec 
Table 1: Key MPF Mission Requirements 
Fig. 7: MPF On-Orbit Configuration 
picture of all types of extrasolar planets with masses down to well below an Earth mass. This 
complete coverage of planets at all separations can be used to calibrate the poorly understood 
theory of planetary migration. This fundamental exoplanet census data is needed to gain a 
comprehensive understanding of processes of planet formation and migration, and this 
understanding of planet formation is an important ingredient for the understanding of the 
requirements for habitable planets and the development of life on extrasolar planets. 
A subset of the science goals can be accomplished with an enhanced ground-based 
microlensing program (Gould et al. 2007), which would be sensitive to Earth-mass planets in the 
vicinity of the “snow-line”. But such a survey would have its sensitivity to Earth-like planets 
limited to a narrow range of semi-major axes, so it would not provide the complete picture of the 
frequency of exoplanets down to 0.1M  that a space-based microlensing survey would provide. 
Furthermore, a ground-based survey would not be able to detect the planetary host stars for most 
of the events, and so it will not provide the systematic data on the variation of exoplanet 
properties as a function of host star type that a space-based survey will provide. 
The basic requirements for a space-based microlensing survey are a 1-m class wide field-of-
view space telescope that can image the central Galactic bulge in the near-IR or optical almost 
continuously for periods of at least several months at a time. This can be accomplished as a 
NASA Discovery mission, as the example of the MPF mission shows, but there are a number of 
other proposed missions with similar requirements, such as a number of JDEM concept missions 
or a stare-mode astrometry mission (Johnston et al. 2007) that makes use of the same type of 
detectors as the MPF mission. Such an astrometry mission could complement the statistical 
planetary results from microlensing survey with data on nearby planets. Thus, a space-based 
microlensing survey could be accomplished with a standalone Discovery class mission or a joint 
mission with another project. As Fig. 2 shows, there is no other planned mission that can 
duplicate the science return of a space-based microlensing survey, and our knowledge of 
exoplanets and their formation will remain incomplete until such a mission is flown. 
References 
Basri, G., Borucki, W.J., & Koch, D. 2005, 
New Ast. Rev., 49, 478 
Beaulieu, J.-P. et al. 2006, Nature, 439, 437 
Bennett, D.P. et al. 2006, ApJL, 647, L171 
Bennett, D.P., Anderson, J., & Gaudi, B.S. 
2007, ApJ, in press (astro-ph/0611448) 
Bennett, D.P., & Rhie, S.H. 1996, ApJ, 472, 
Bennett, D.P., & Rhie, S.H. 2002, ApJ, 574, 
Bond, I. A. et al., 2004, ApJL, 606, L155 
Butler, R.P., et al. 2006, ApJ, 646, 50 
Gaudi, B.S. & Gould, A. 1997, ApJ, 486, 85 
Gaudi, B.S., 1998, ApJ, 506, 533 
Gould, A. et al. 2007, ExoPTF white paper 
Gould, A. & Loeb, A. 1992, ApJ, 396, 104 
Goldreich, P., Lithwick, Y., & Sari, R. 2004, 
ApJ, 614, 497 
Griest, K., & Safizadeh, N. 1998, ApJ, 500, 37 
Han, C. 2006, ApJ, 644, 1232 
Johnston, K. et al. 2007, ExoPTF white paper 
Kennedy, G.M., Kenyon, S.J., & Bromley, 
B.C. 2006, ApJL, 650, L139 
Levison, H.F., Lissauer, J.J., & Duncan, M.J., 
1998, AJ, 116, 1998 
Lissauer, J.J. 2007, ApJL, in press (astro-
ph/0703576) 
Mao, S. & Paczynski, B., 1991, ApJ, 374, L37 
Peale, S J. 2003, AJ, 126, 1595 
Raymond, S.N. Quinn, T., & Lunine, J.I. 2004, 
Icarus, 168, 1.  
Udalski, A. et al. 2005, ApJL, 628, L109 
Wambsganss, J. 1997, MNRAS, 284, 172
ABSTRACT
  A space-based gravitational microlensing exoplanet survey will provide a
statistical census of exoplanets with masses down to 0.1 Earth-masses and
orbital separations ranging from 0.5AU to infinity. This includes analogs to
all the Solar System's planets except for Mercury, as well as most types of
planets predicted by planet formation theories. Such a survey will provide
results on the frequency of planets around all types of stars except those with
short lifetimes. Close-in planets with separations < 0.5 AU are invisible to a
space-based microlensing survey, but these can be found by Kepler. Other
methods, including ground-based microlensing, cannot approach the comprehensive
statistics on the mass and semi-major axis distribution of extrasolar planets
that a space-based microlensing survey will provide. The terrestrial planet
sensitivity of a ground-based microlensing survey is limited to the vicinity of
the Einstein radius at 2-3 AU, and space-based imaging is needed to identify
and determine the mass of the planetary host stars for the vast majority of
planets discovered by microlensing. Thus, a space-based microlensing survey is
likely to be the only way to gain a comprehensive understanding of the nature
of planetary systems, which is needed to understand planet formation and
habitability. The proposed Microlensing Planet Finder (MPF) mission is an
example of a space-based microlensing survey that can accomplish these
objectives with proven technology and a cost that fits comfortably under the
NASA Discovery Program cost cap.

<|endoftext|><|startoftext|>
Draft version October 29, 2018
Preprint typeset using LATEX style emulateapj v. 08/22/09
USCO1606-1935: AN UNUSUALLY WIDE LOW-MASS TRIPLE SYSTEM?
Adam L. Kraus (alk@astro.caltech.edu), Lynne A. Hillenbrand (lah@astro.caltech.edu)
California Institute of Technology, Department of Astrophysics, MC 105-24, Pasadena, CA 91125
Draft version October 29, 2018
ABSTRACT
We present photometric, astrometric, and spectroscopic observations of USco160611.9-193532 AB,
a candidate ultrawide (∼1600 AU), low-mass (Mtot ∼0.4 M⊙) multiple system in the nearby OB
association Upper Scorpius. We conclude that both components are young, comoving members of
the association; we also present high-resolution observations which show that the primary is itself a
close binary system. If the Aab and B components are gravitationally bound, the system would fall
into the small class of young multiple systems which have unusually wide separations as compared to
field systems of similar mass. However, we demonstrate that physical association can not be assumed
purely on probabilistic grounds for any individual candidate system in this separation range. Analysis
of the association’s two-point correlation function shows that there is a significant probability (25%)
that at least one pair of low-mass association members will be separated in projection by .15′′, so
analysis of the wide binary population in Upper Sco will require a systematic search for all wide
systems; the detection of another such pair would represent an excess at the 98% confidence level.
Subject headings: stars:binaries:general; stars:low-mass,brown dwarfs;stars:pre-main se-
quence;stars:individual([PBB2002] USco160611.9-193532)
1. INTRODUCTION
The frequency and properties of multiple star systems
are important diagnostics for placing constraints on star
formation processes. This has prompted numerous at-
tempts to characterize the properties of nearby binary
systems in the field. These surveys (e.g. Duquennoy &
Mayor 1991; Fischer & Marcy 1992; Close et al. 2003;
Bouy et al. 2003; Burgasser et al. 2003) have found
that binary frequencies and properties are very strongly
dependent on mass. Solar-mass stars have high binary
frequencies (&60%) and maximum separations of up to
∼104 AU. By contrast, M dwarfs have moderately high
binary frequencies (30-40%) and few binary companions
with separations of more than ∼500 AU, while brown
dwarfs have low binary frequencies (∼15%) and few com-
panions with separations >20 AU.
The mass-dependent decline in the maximum observed
binary separation has been described by Reid et al.
(2001) and Burgasser et al. (2003) with an empirical
function which is exponential at high masses (amax ∝
103.3Mtot) and quadratic at low masses (amax ∝M
tot).
The mechanism that produces the mass dependence is
currently unknown; N-body simulations show that the
empirical limit is not a result of dynamical evolution in
the field (e.g. Burgasser et al. 2003; Weinberg et al.
1987) since the rate of disruptive stellar encounters is far
too low. This suggests that the limit must be set early in
stellar lifetimes, either as a result of the binary formation
process or during early dynamical evolution in relatively
crowded natal environments. Surveys of nearby young
stellar associations have identified several unusually wide
systems (Chauvin et al. 2004; Caballero et al. 2006; ;
Jayawardhana & Ivanov 2006; Luhman et al. 2006, 2007;
Close et al. 2007), but not in sufficient numbers to study
their properties in a statistically meaningful manner.
We have addressed this problem by using archival
2MASS data to systematically search for candidate wide
binary systems among all of the known members of three
nearby young associations (Upper Sco, Taurus-Auriga,
and Chamaeleon-I; Kraus & Hillenbrand 2007). Our re-
sults broadly agree with the standard paradigm; there
is a significant deficit of wide systems among very low-
mass stars and brown dwarfs as compared to their more
massive brethren. However, we did identify a small num-
ber of candidate wide systems. One of these candidates
is [PBB2002] USco160611.9-193532 (hereafter USco1606-
1935), a wide (10.87′′; 1600 AU) pair of stars with similar
fluxes and colors. The brighter member of the pair was
spectroscopically confirmed by Preibisch et al. (2002) to
be a young M5 star. The fainter member fell just below
the flux limit of their survey.
In this paper, we describe our photometric, as-
trometric, and spectroscopic followup observations for
USco1606-1935 and evaluate the probability that the sys-
tem is an unusually wide, low-mass binary. In Section 2,
we describe our observations and data analysis methods.
In Section 3, we use these results to establish that both
members of the pair are young and co-moving, and that
the primary is itself a close binary. Finally, in Section
4 we address the possibility that the pair is not bound,
but a chance alignment of young stars, by analyzing the
clustering of pre-main-sequence stars in Upper Sco.
2. OBSERVATIONS AND DATA ANALYSIS
Most binary surveys, including our discovery survey,
identify companions based on their proximity to the pri-
mary star and argue for physical association based on
the (usually very low) probability that an unbound star
would have been observed in chance alignment. How-
ever, the probability of contamination is much higher
for very wide systems like USco1606-1935, so we de-
cided to pursue additional information in order to con-
firm its multiplicity and further characterize its system
components. In this section, we describe our followup
efforts: a search of publicly available databases to ob-
http://arxiv.org/abs/0704.0455v1
tain additional photometry and astrometry, acquisition
of intermediate-resolution spectra to measure the sec-
ondary spectral type and test for signatures of youth,
and acquisition of high-resolution images to determine if
either component is itself a tighter binary and to test for
common proper motion.
2.1. Archival Data
We identified USco1606-1935 AB as a candidate bi-
nary system using archival data from 2MASS (Skrut-
skie et al. 2006). The binary components are bright
and clearly resolved, so we were able to retrieve ad-
ditional photometry and astrometry from several other
wide-field imaging surveys. We collated results for the
binary components themselves and for nearby field stars
from 2MASS, the Deep Near Infrared Survey (DENIS;
Epchtein et al. 1999), United States Naval Observatory
B1.0 survey (USNO-B; Monet et al. 2003), and the Su-
perCOSMOS Sky Survey (SSS; Hambly et al. 2001).
The DENIS and 2MASS source catalogues are based on
wide-field imaging surveys conducted in the optical/NIR
(IJK and JHK, respectively) using infrared array de-
tectors, while the USNO-B and SSS source catalogues
are based on independent digitizations of photographic
plates from the First Palomar Observatory Sky Survey
and the ESO Southern-Sky Survey.
2.1.1. Photometry
After evaluating the data, we decided to base our anal-
ysis on the JHK magnitudes measured by 2MASS and
the photographic I magnitude of USNO-B (hereafter de-
noted I2, following the nomenclature of the USNO-B
catalog, to distinguish it from Cousins IC). We chose
these observations because their accuracy can be directly
tested using the independent IJK magnitudes measured
by DENIS; this comparison shows that the fluxes are con-
sistent within the uncertainties. We do not directly use
the DENIS observations because they are not as deep as
the other surveys. We adopted the photometric uncer-
tainties suggested in each survey’s technical reference.
2.1.2. Astrometry
As we describe in Section 3.3, there appear to be large
systematic differences in the astrometry reported by the
USNO-B and SSS source catalogs. These surveys rep-
resent digitizations of the same photographic plates, so
these systematic discrepancies suggest that at least one
survey introduces systematic biases in the digitization
and calibration process. Given the uncertainty in which
measurements to trust, we have chosen to disregard all
available photographic astrometry and only use results
from 2MASS and DENIS.
Our discovery survey already measured 2MASS rel-
ative astrometry for each filter directly from the pro-
cessed atlas images, so we have adopted those values.
We extracted DENIS astrometry from the source catalog,
which contains the average positions for all three filters.
Both surveys quote astrometric uncertainties of 70-100
mas for stars in the brightness range of our targets, but
that value includes a significant systematic term result-
ing from the transformation to an all-sky reference frame.
We have conducted tests with standard binary systems
of known separation which suggest that relative astrom-
etry on angular scales of <1′ is accurate to ∼40 mas,
so we adopt this value as the astrometric uncertainty for
each survey.
2.2. Optical Spectroscopy
We obtained an intermediate-resolution spectrum of
USco1606-1935 B with the Double Spectrograph (Oke &
Gunn 1982) on the Hale 5m telescope at Palomar Obser-
vatory. The spectrum presented here was obtained with
the red channel using a 316 l/mm grating and a 2.0′′ slit,
yielding a spectral resolution of R ∼1250 over a wave-
length range of 6400-8800 angstroms. Wavelength cali-
bration was achieved by observing a standard lamp after
the science target, and flux normalization was achieved
by observation of the spectrophotometric standard star
Feige 34 (Massey et al. 1988). The spectrum was pro-
cessed using standard IRAF1 tasks.
Our field and young spectral type standards were
drawn from membership surveys of Upper Sco and Tau-
rus by Slesnick et al. (2006a, 2006b) which used identical
instrument settings for the spectroscopic confirmation of
photometrically selected candidate members.
2.3. High-Resolution Imaging
We observed USco1606-1935 A and B on February 7,
2006 (JD=2453773) using laser guide star adaptive op-
tics (LGSAO; Wizinowich et al. 2006) on the Keck-II
telescope with NIRC2 (K. Matthews, in prep), a high
spatial resolution near-infrared camera. The seeing was
average to poor (&1′′) for most of the observing run, but
the system delivered nearly diffraction-limited correction
in K ′ (60 mas FWHM) during the period of these ob-
servations. The system performance was above average
given the low elevation (34 degrees; 1.8 airmasses), most
likely due to the proximity and brightness of the tip-tilt
reference star (R = 14.2, d = 14′′).
Images were obtained using the K ′ filter in both the
narrow and wide camera modes. The pixel scales in
these modes are 9.942 mas pix−1 (FOV=10.18′′) and
39.686 mas pix−1 (FOV=40.64′′). All wide-camera ob-
servations were centered on the close Aab binary. The A
and B components were too wide to fit reasonably into
a single narrow-camera exposure, so we took separate
exposure sequences centered on each. We obtained four
wide-camera exposures of the AB system, seven narrow-
camera exposures of A, and four narrow-camera expo-
sures of B; the total integration times for each image set
are 80s, 175s, and 100s, respectively. Each set was pro-
duced with a 3-point box dither pattern that omitted the
bottom-left position due to higher read-noise for the de-
tector in that quadrant. Single exposures were also taken
at the central position.
Our science targets are relatively bright, so all obser-
vations were taken in correlated double-sampling mode,
for which the array read noise is 38 electrons/read. The
read noise is the dominant noise term for identifying
faint sources, yielding 10σ detection limits of K ∼ 19.2
for the wide camera observations, K ∼ 18.8 for the
narrow-camera observations centered on component A,
1 IRAF is distributed by the National Optical Astronomy Ob-
servatories, which are operated by the Association of Universities
for Research in Astronomy, Inc., under cooperative agreement with
the National Science Foundation.
and K ∼ 18.3 for the narrow-camera observations cen-
tered on component B; the detection limits for B are
slightly shallower due to the shorter total integration
time. The data were flat-fielded and dark- and bias-
subtracted using standard IRAF procedures. The images
were distortion-corrected using new high-order distortion
solutions (P. Cameron, in prep) that deliver a significant
performance increase as compared to the solutions pre-
sented in the NIRC2 pre-ship manual2; the typical resid-
uals are ∼4 mas in wide camera mode and ∼0.6 mas in
narrow camera mode. We adopt these systematic limits
as the uncertainty in astrometry for bright objects; all
faint objects (K ∼16-18) have larger uncertainties (∼10
mas) due to photon statistics.
We measured PSF-fitting photometry and astrome-
try for our sources using the IRAF package DAOPHOT
(Stetson 1987), and specifically with the ALLSTAR rou-
tine. We analyzed each frame separately in order to esti-
mate the uncertainty in individual measurements and to
allow for the potential rejection of frames with inferior
AO correction; our final results represent the mean value
for all observations in a filter.
In the wide-camera observations, we produced a tem-
plate PSF based on the B component and the field star
F1 (see Section 3.1 and Figure 1), both of which appear
to be single sources. In the narrow-camera observations
centered on A or B, the science target was the only bright
object detected in our observations, so there was not a
separate source from which to adopt a template PSF. We
could have adopted a template PSF from another set of
observations, but the AO correction usually varies sig-
nificantly between targets since it is very sensitive to the
seeing, elevation, laser return, and tip-tilt separation and
brightness. We found that no other target in our survey
provided a good PSF match.
We addressed this issue for the Aab binary pair by
developing a procedure to reconstruct the single-source
PSF directly from the observations of the binary system.
Our algorithm begins with a preliminary estimate of the
single-source PSF, then iteratively fits both components
of the binary system with the estimated PSF and uses
the synthetic PSF to subtract the best-fit estimate of the
secondary flux. This residual image (which is dominated
by the primary flux distribution) is then used to fit an
improved estimate of the single-source PSF.
DAOPHOT characterizes an empirical PSF in terms
of an analytical function and a lookup table of residuals,
so we first iterated the procedure using a purely ana-
lytical function until it converged, then added a lookup
table to the estimated PSF and iterated until its con-
tents also converged. Observations of single stars sug-
gested that the penny2 function (a gaussian core with
lorentzian wings) would provide the best analytic fit, so
we chose it as our analytic function. Four iterations of
the fitting process were required for the analytic func-
tion to converge and 3 iterations were required for the
lookup table to converge. Our algorithm does not work
for the B component because it appears to be single, so
we adopted the average synthetic single-source PSF from
analysis of the Aab system to perform PSF fitting and
verify that it is single.
We calibrated our photometry using 2MASS K mag-
2 http://www2.keck.hawaii.edu/realpublic/inst/nirc2/
nitudes for the A and B components and the nearby field
star F1 (Section 3). The 2MASS observations were con-
ducted using the Ks filter rather than K
′, but the the-
oretical isochrones computed by Kim et al. (2005) for
the Ks and K
′ systems differ by .0.01 magnitudes for
objects in this color range; this is much smaller than
other uncertainties in the calibration. Carpenter (2001)
found typical zero point shifts of .0.03 magnitudes be-
tween 2MASS Ks and several standard K bandpasses,
all of which are more distinctly different from Ks than
K ′, which also demonstrates that the zero point shift
between Ks and K
′ should be negligible.
The calibration process could introduce systematic un-
certainties if any of the three calibration sources are vari-
able, but based on the small deviation in the individual
calibration offsets for each source (0.03 mag), variability
does not appear to be a significant factor. We tested the
calibration using DENIS K magnitudes and found that
the two methods agree to within 0.01 mag, albeit with a
higher standard deviation (0.12 mag) for DENIS.
3. RESULTS
3.1. Images
In Figure 1, we show a NIRC2 wide-camera image of
the field surrounding USco1606-1935. The A and B com-
ponents are labeled, as are 6 apparent field stars (named
F1 through F6) which we use as astrometric comparison
stars. We found counterparts for the first three field stars
in existing survey catalogues: F1 was detected by all four
sky surveys, F2 was detected by DENIS, USNO-B, and
SSS, and F3 was detected only by USNO-B and SSS.
In Figure 2, we show individual contour plots drawn
from NIRC2 narrow-camera images of the A and B
components. These high-resolution images show that
USco1606-1935 A is itself composed of two sources; we
designate these two components Aa and Ab. We do not
possess any direct diagnostic information to determine if
Aa and Ab are physically associated, but there are only
two other bright sources in the field of view. If the source
count is representative of the surface density of bright
(K < 15) sources along the line of sight, the probability
of finding an unbound bright source within <100 mas of
the A component is only ∼ 10−5. Thus, we consider Aa
and Ab to comprise a physically bound binary system.
3.2. Photometry
Photometric data are generally sufficient to reject most
nonmember interlopers because association members fol-
low a bright, well-defined cluster sequence in color-
magnitude diagrams and most field stars will fall be-
low or bluer than the association sequence. In Table
2, we summarize the observed and archival photometry
for each source in the NIRC2 wide-camera images. In
Figure 3, we show three color-magnitude diagrams (K
versus J − K, H − K, and I2 − K) for our observed
sources and for all spectroscopically-confirmed members
of Upper Sco (as summarized in Kraus & Hillenbrand
2007).
The colors and magnitudes for USco1606-1935 B are
consistent with the known members of Upper Sco, which
supports the assertion that it is an association member.
B is located marginally above and redward of the mean
cluster sequence in the (K,J −K) and (K,H −K) dia-
grams; if this result is genuine and not a consequence of
Fig. 1.— The field surrounding USco1606-1935. The A and B components are labeled, as are 6 apparent field stars. The separation
between the Aa and Ab components is too small to be apparent in this image.
Fig. 2.— Contour plots showing our LGSAO observations of USco1606-1935. The first panel shows an original exposure for the Aab
pair, the second and third panels show Aa and Ab after subtracting best-fit values for the other component, and the last panel shows an
original exposure for B. The contours are drawn at 5% to 95% of the peak pixel values.
Fig. 3.— Color-magnitude diagrams showing all spectroscopically-confirmed members of Upper Sco (black crosses), the A and B binary
components (red), and the other six objects detected in our LGSAO images (blue). The NIR CMDs (top) demonstrate that F1 lies
significantly below the association sequence, and therefore is an unrelated field star. The optical-NIR CMD (bottom) supports this
identification and demonstrates that F2 and F3 are also field stars that lie below the association sequence. We measure formal upper limits
only for stars F4-F6, but marginal R band detections in the POSS plates suggest that F4 and F6 are also field stars. Typical uncertainties
are plotted on the left edge of each plot.
TABLE 1
Coordinates and Photometry
Name RAa DECa KLGS
b K2MASS
b Hb Jb I2b
A 16 06 11.99 -19 35 33.1 11.04 11.02 11.35 12.01 14.1
Aa - - 11.71 - - - -
Ab - - 11.88 - - - -
B 16 06 11.44 -19 35 40.5 11.74 11.78 12.32 13.00 14.9
F1 16 06 12.09 -19 35 18.3 11.51 11.50 11.62 12.27 13.5
F2 16 06 12.90 -19 35 36.1 16.32 - - - 17.8
F3 16 06 13.23 -19 35 23.7 16.66 - - - 18.7
F4 16 06 11.75 -19 35 32.0 17.43 - - - -
F5 16 06 12.40 -19 35 40.3 17.28 - - - -
F6 16 06 12.94 -19 35 44.6 16.97 - - - -
Note. — Photometry is drawn from our observations (KLGS), 2MASS (JHK2MASS),
and the USNO-B1.0 catalogue (I2).
a Coordinates are derived from the 2MASS position for USco1606-1935 A and the rel-
ative separations we measure using LGSAO. The absolute uncertainty in the 2MASS
position with respect to the International Coordinate Reference System (ICRS) is .0.1′′.
b Photometric uncertainties are ∼0.03 mag for LGSAO and 2MASS photometry and
∼0.25 mag for USNO-B1.0 photometry.
TABLE 2
Relative Astrometry
LGSAO K 2MASS K 2MASS H 2MASS J DENIS IJK
(JD=2453773) (JD=2451297) (JD=2451297) (JD=2451297) (JD=2451332)
∆α ∆δ ∆α ∆δ ∆α ∆δ ∆α ∆δ ∆α ∆δ
Aa -0.0132 -0.0149 - - - - - - - -
Ab +0.0201 +0.0266 - - - - - - - -
B -7.825 -7.460 -7.757 -7.455 -7.749 -7.395 -7.834 -7.382 -7.865 -7.448
F1 +1.453 +14.844 +1.401 +14.762 +1.446 +14.732 +1.479 +14.735 +1.418 +14.728
F2 +12.839 -3.017 - - - - - - -a -a
F3 +17.571 +9.370 - - - - - - - -
F4 -3.438 +1.056 - - - - - - - -
F5 +5.805 -7.224 - - - - - - - -
F6 +13.385 -11.540 - - - - - - - -
Note. — The zero-point for all coordinate offsets is the photocenter of the unresolved Aab system. The relative
astrometric uncertainties for 2MASS and DENIS results are ∼40 mas; uncertainties for the LGSAO results are ∼5
mas for bright objects and ∼10 mas for faint objects.
a F2 was marginally detected in i by DENIS, but the astrometry is not sufficiently precise to be useful in calculating
its proper motion.
the photometric uncertainties, it could be a consequence
of differential reddening, aK band excess associated with
a hot disk, or the presence of an unresolved tight binary
companion. However, B does not appear to be as red
in DENIS data (J −K = 0.98), which suggests that the
2MASS result may not be genuine.
The three sources for which we have colors (F1, F2,
and F3) all sit below the Upper Sco member sequence in
the (K,I2 − K) color-magnitude diagram. Some USco
members also fall marginally blueward of the associa-
tion sequence in (K,I2−K); we can find no correlation
with location, multiplicity, or other systematic factors,
so this feature may be a result of intrinsic variability be-
tween the epochs of K and I2. This result suggests that
the (K,I2−K) CMD is not sufficient for ruling out the
membership of F1. However, F1 also sits at the extreme
blueward edge of the association sequence in (K,J −K)
and is clearly distinct from the association sequence in
(K,H − K). We therefore judge that all three sources
are unassociated field star interlopers.
We do not possess sufficient information to determine
whether these three stars are field dwarfs in the Milky
Way disk or background giants in the Milky Way bulge;
the unknown nature of these sources could complicate fu-
ture efforts to calculate absolute proper motions because
comparison to nonmoving background giants is the best
way to establish a nonmoving astrometric frame of refer-
ence. As we will show in Section 3.3, F1 possesses a small
total proper motion (<10 mas yr−1), so it may be a dis-
tant background star. Its 2MASS colors (J −H = 0.65,
H −K = 0.12) place it on the giant sequence in a color-
color diagram, but reddened early-type stars with spec-
tral type <M0 can also reproduce these colors.
We are unable to measure colors for the stars F4, F5,
and F6 because they were detected only in our LGSAO
observations. However, visual inspection of the digitized
POSS plates via Aladdin (Bonnarel et al. 2000) found
possible R band counterparts to F4 and F6 that were
not identified by USNO-B. If these detections are genuine
and these two sources fall near the USNO-B survey limit
(R ∼ 20 − 21), their colors (R − K ∼ 3 − 4 or I2 −
K ∼ 2−3) are too blue to be consistent with association
membership.
3.3. Astrometry
Fig. 4.— Relative separations from the A component to the B component (left) and the field star F1 (right) for our LGSAO data and
archival 2MASS/DENIS data. The blue circles denote LGSAO data, the red circles denote 2MASS data for each filter (J , H, and K), and
the green circles denote the average DENIS values for all three filters (IJK). The black line shows the expected relative astrometry as a
function of time for a stationary object, and the predicted archival astrometry values for the non-moving (background) case are shown on
these curves with red asterisks. The results for component B are consistent with common proper motion; the results for F1 are inconsistent
with common proper motion and suggest that the total proper motion is small, denoting a probable background star.
Fig. 5.— The spectrum of USco1606-1935 B (red) as compared
to a set of standard stars drawn from the field and from the young
Taurus and Upper Sco associations. The overall continuum shape
is best fit by a field standard with spectral type M5; the spec-
trum around the Na doublet at 8189 angstroms is better fit by an
intermediate-age (5 Myr) M5 than a young (1-2 Myr) or field M5,
suggesting that the B component is also intermediate-aged.
The standard method for confirming physical associa-
tion of candidate binary companions is to test for com-
mon proper motion. This test is not as useful for young
stars in associations because other (gravitationally un-
bound) association members have similar proper motions
to within .2-3 mas yr−1. However, proper motion anal-
ysis can still be used to eliminate nearby late-type field
stars and background giants that coincidentally fall along
the association color-magnitude sequence but possess dis-
tinct kinematics.
In Table 2, we summarize the relative astrometry for
the three system components and for the field stars F1-F6
as measured with our LGSAO observations and archival
data from 2MASS and DENIS. All offsets are given with
respect to the photocenter of the unresolved Aab sys-
tem; Aa and Ab have similar fluxes and do not appear
to be variable in any of these measurements (Section 2.3),
so this zero point should be consistent between different
epochs. We evaluated the possibility of including astro-
metric data from older photographic surveys like USNO-
B and SSS, but rejected this idea after finding that the
two surveys reported very large (up to 1′′) differences
in the separation of the A-B system from digitization
of the same photographic plates. We calculated relative
proper motions in each dimension by averaging the four
first-epoch values (2MASS and DENIS; Table 2), then
comparing the result to our second-epoch observation ob-
tained with LGSAO. We did not attempt a least-squares
fit because the 2MASS values are coeval and the DENIS
results were measured only 35 days after the 2MASS re-
sults.
In Figure 4, we plot the relative astrometry between A
and B and between A and F1 as measured by 2MASS,
DENIS, and our LGSAO survey. We also show the ex-
pected relative motion curve if B or F1 are nonmoving
background stars and A moves with the mean proper mo-
tion and parallax of Upper Sco, (µα,µδ)=(-9.3,-20.2) mas
yr−1 and π=7 mas (de Zeeuw et al. 1999; Kraus & Hil-
lenbrand 2007). The total relative motion of B over the
6.8 year observation interval is (+24±25,-40±25) mas;
the corresponding relative proper motion is (+3.5±3.7,-
5.9±3.7) mas yr−1, which is consistent with comovement
to within <2σ. This result is inconsistent with the hy-
pothesis that B is a nonmoving background star at the
8σ level.
The relative motion of F1 is (+17±25,+105±25) mas
or (+2.5±3.7,+15.4±3.7)mas yr−1, which is inconsistent
with comovement at the 4σ level. The absolute proper
motion of F1, assuming A moves with the mean proper
motion of Upper Sco, is (-7±4,-5±4) mas yr−1, which is
consistent with nonmovement to within <2σ. The impli-
cation is that F1 is probably a distant background star,
either a giant or a reddened early-type star.
3.4. Spectroscopy
The least ambiguous method for identifying young
stars is to observe spectroscopic signatures of youth like
lithium or various gravity-sensitive features. Spectro-
scopic confirmation is not strictly necessary in the case of
USco1606-1935 since we confirmed common proper mo-
tion for the A-B system, but a spectral type is also useful
in constraining the physical properties of the secondary,
so we decided to obtain an optical spectrum.
In the top panel of Figure 5, we plot our spectrum
for B in comparison to three standard field dwarfs with
spectral types of M4V-M6V. We qualitatively find that
the standard star which produces the best fit is GJ 866
(M5V). The M4V and M6V standards do not adequately
fit either the overall continuum shape or the depths of
the TiO features at 8000 and 8500 angstroms, so the
corresponding uncertainty in the spectral type is .0.5
subclasses.
In the bottom panel of Figure 5, we plot a restricted
range of the spectrum (8170-8210 angstroms) centered
on the Na-8189 absorption doublet. The depth of the
doublet is sensitive to surface gravity (e.g. Slesnick et
al. 2006a, 2006b); high-gravity dwarfs possess very deep
absorption lines, while low-gravity giants show almost
no absorption. We also plot standard stars of identi-
cal spectral type (M5) spanning a range of ages. The
depth of the B component’s Na 8189 doublet appears to
be consistent with the depth for a member of USco (5
Myr), deeper than that of a Taurus member (1-2 Myr),
and shallower than that of a field star, which confirms
that the B component is a pre-main sequence member of
Upper Sco.
We have quantified our analysis by calculating the
spectral indices TiO-7140, TiO-8465, and Na-8189,
which measure the depth of key temperature- and
gravity-sensitive features (Slesnick et al. 2006a). We
find that T iO7140 = 2.28, T iO8465 = 1.23, and Na8189 =
0.92; all three indices are consistent with our assessment
that B is a young M5 star which has not yet contracted
to the zero-age main sequence.
3.5. Stellar and Binary Properties
In Table 3, we list the inferred stellar and binary
properties for the Aa-Ab and A-B systems, which we
estimate using the methods described in Kraus & Hil-
lenbrand (2007). This procedure calculates component
masses by combining the 5 Myr isochrone of Baraffe et
al. (1998) and the M dwarf temperature scale of Luhman
TABLE 3
Binary Properties
Property Aa-Ab A-B
Measured
Sep (mas) 53.2±1.0 10874±5
PA (deg) 38.7±1.0 226.45±0.03
∆K (mag) 0.17±0.05 0.70±0.05
aproj (AU) 7.7±1.2 1600±200
Inferred
q 0.88±0.05 0.53±0.08
SpTPrim M5±0.5 M5+M5.2(±0.5)
SpTSec M5.2±0.5 M5±0.5
MPrim 0.14±0.02 0.26±0.04
MSec 0.12±0.02 0.14±0.02
Note. — The center of mass for the Aa-Ab
pair is unknown, so we calculate all A-B separa-
tions with respect to the K band photocenter.
et al. (2003) to directly convert observed spectral types
to masses. Relative properties (mass ratios q and relative
spectral types) are calculated by combining the Baraffe
isochrones and Luhman temperature scale with the em-
pirical NIR colors of Bessell & Brett (1998) and the K-
band bolometric corrections of Leggett et al. (1998) to
estimate q and ∆SpT from the observed flux ratio ∆K.
We have adopted the previously-measured spectral
type for A (M5; Preibisch et al. 2002) as the type for
component Aa, but the inferred spectral type for Ab is
only 0.2 subclasses later, so this assumption should be ro-
bust to within the uncertainties (∼0.5 subclasses). The
projected spatial separations are calculated for the mean
distance of Upper Sco, 145±2 pc (de Zeeuw et al. 1999).
If the total radial depth of Upper Sco is equal to its angu-
lar extent (∼15o or ∼40 pc), then the unknown depth of
USco1606-1935 within Upper Sco implies an uncertainty
in the projected spatial separation of ±15%. The sys-
tematic uncertainty due to the uncertainty in the mean
distance of Upper Sco is negligible (.2%).
4. IS USCO1606-1935 AB A BINARY SYSTEM?
The unambiguous identification of pre-main sequence
binaries is complicated by the difficulty of distinguishing
gravitationally bound binary pairs from coeval, comov-
ing association members which are aligned in projection.
Most traditional methods used to confirm field binary
companions do not work in the case of young binaries in
clusters and associations because all association members
share common distances and kinematics (to within cur-
rent observational uncertainties), so the only remaining
option is to assess the probability of chance alignment.
We address this challenge by quantifying the clustering
of PMS stars via calculation of the two-point correlation
function (TPCF) across a wide range of angular scales
(1′′ to >1 degree). This type of analysis has been at-
tempted in the past (e.g. Gomez et al. 1993 for Taurus;
Simon 1997 for Ophiuchus, Taurus, and the Trapezium),
but these studies were conducted using samples that were
significantly incomplete relative to today.
The TPCF, w(θ), is defined as the number of excess
pairs of objects with a given separation θ over the ex-
pected number for a random distribution (Peebles 1980).
The TPCF is linearly proportional to the surface density
of companions per star, Σ(θ) = (N∗/A)[1 +w(θ)], where
A is the survey area and N∗ is the total number of stars.
Fig. 6.— The surface density of companions as a function of separation for young stars and brown dwarfs in Upper Sco. Red symbols
denote results from our wide-binary survey using 2MASS (Kraus & Hillenbrand 2007) and blue symbols denote data for all spectroscopically-
confirmed members in two fields surveyed by Preibisch et al. (2002). The data appear to be well-fit by two power laws (dashed lines) which
most likely correspond to gravitationally bound binaries and unbound clusters of stars that have not yet completely dispersed from their
formation environments. The data points which were used to fit these power laws are denoted with circles; other points are denoted with
crosses.
However, it is often easier to evaluate the TPCF via a
Monte Carlo-based definition, w(θ) = Np(θ)/Nr(θ) − 1,
where Np(θ) is the number of pairs in the survey area
with separations in a bin centered on θ and Nr(θ) is the
expected number of pairs for a random distribution of ob-
jects over the same area (Hewett 1982). The advantage
of this method is that it does not require edge correc-
tions, unlike direct measurement of Σ(θ). We adopted
this method due to its ease of implementation, but we
report our subsequent results in terms of Σ(θ) since it is
a more intuitive quantity.
The current census of Upper Sco members across the
full association is very incomplete, so we implemented
our analysis for intermediate and large separations (θ >
6.4′′) using only members located in two heavily-studied
fields originally observed by Preibisch et al. (2001, 2002;
the 2df-East and 2df-West fields). The census of mem-
bers in these fields may not be complete, but we ex-
pect that it is the least incomplete. The census of com-
panions at smaller separations (1.5-6.4′′) has been uni-
formly studied for all spectroscopically-confirmed mem-
bers (Kraus & Hillenbrand 2007), so we have maximized
the sample size in this separation range by considering
the immediate area around all known members, not just
those within the Preibisch fields. Our survey was only
complete for mass ratios q >0.25, so we do not include
companions with mass ratios q < 0.25.
These choices might lead to systematic biases if the
Preibisch fields are still significantly incomplete or if
the frequency and properties of binary systems show
intra-association variations, but any such incompleteness
would probably change the result by no more than a fac-
tor of 2-3. As we will subsequently show, Σ(θ) varies by 4
orders of magnitude across the full range of θ. The well-
established mass dependence of multiplicity should not
affect our results since the mass function for the Preibisch
fields is similar to that seen for the rest of the association.
In Figure 6, we plot Σ(θ) for Upper Sco, spanning
the separation range −3.5 < log(θ) < 0.25 (1.14′′ to
1.78 deg ). We have fit this relation with two power
laws, one which dominates at small separations (.15-
30′′) and one at larger separations. We interpret the
two segments, following Simon (1997), to be the result
of gravitationally-bound binarity and gravitationally un-
bound intra-association clustering, respectively. We fit
the binary power law to the three lowest-separation bins
(log(θ) < −2.75) because this is the separation range
over which we possess uniform multiplicity data. The
cluster power law was fit to the six highest-separation
bins (log(θ) > −1.25) because those bins have the small-
est uncertainties. Bins corresponding to intermediate
separations seem to follow the two power laws.
We found that the slope of the cluster power law (-
0.14±0.02) is very close to zero, which implies that there
is very little clustering on scales of .1 deg. This re-
sult is not unexpected for intermediate-age associations
like Upper Sco; given the typical intra-association veloc-
ity dispersion (∼1 km s−1) and the age (5 Myr), most
association members have dispersed ∼5 pc (2 deg) rel-
ative to their formation point, averaging out structure
on smaller spatial scales. Simon (1997) found that the
slopes for Taurus, Ophiuchus, and the ONC are steeper,
suggesting that more structure is present on these small
scales at young ages (∼1-2 Myr). The slope of the binary
power law (-3.03±0.24) is much steeper than the clus-
ter regime. The separation range represented is much
larger than the peak of the binary separation distribu-
tion (∼30 AU for field solar-mass stars; Duquennoy &
Mayor 1991), so the steep negative slope corresponds to
the large-separation tail of the separation distribution
function. The two power laws seem to cross at separa-
tions of ∼15-30′′ (aproj ∼ 2500− 5000 AU), though this
result depends on the sample completeness in the binary
and cluster regimes. We interpret this to be the maxi-
mum separation range at which binaries can be identified.
If we extrapolate the cluster power law into the separa-
tion regime of the binary power law, we find that the ex-
pected surface density of unbound coincidentally-aligned
companions is ∼60 deg−2. Given this surface density,
there should be ∼1 chance alignment within 15′′ among
the 366 spectroscopically confirmed members of Upper
Sco. Among the 173 known late-type stars and brown
dwarfs (SpT≥M4) for which this separation range is un-
usually wide, the expected number of chance alignments
with any other member is 0.5. If the mass function of
known members is similar to the total mass function,
approximately half (∼0.25 chance alignments) are ex-
pected to occur with another low-mass member. There-
fore, we expect ∼0.25 chance alignments which might be
mistaken for a low-mass binary pair.
The probability that one or more such chance align-
ments actually exists for a known low-mass USco member
is 25% (based on Poisson statistics), which suggests that
the nature of a single candidate wide pair like USco1606-
1935 AB can not be unambiguously determined. If any
more pairs can be confirmed, then they would represent a
statistically significant excess. The corresponding prob-
ability of finding 2 chance alignments of low-mass mem-
bers is only 2%. As we have described in our survey
of wide multiplicity with 2MASS (Kraus & Hillenbrand
2007), we have identified at least three additional can-
didate ultrawide systems in Upper Sco, so spectroscopic
and astrometric followup of these candidate systems is a
high priority.
5. SUMMARY
We have presented photometric, astrometric, and spec-
troscopic observations of USco1606-1935, a candidate ul-
trawide (∼1600 AU), low-mass (Mtot ∼0.4 M⊙) hierar-
chical triple system in the nearby OB association Upper
Scorpius. We conclude that the ultrawide B component
is a young, comoving member of the association, and
show that the primary is itself a close binary system.
If the Aab and B components are gravitationally
bound, the system would join the growing class of young
multiple systems which have unusually wide separations
as compared to field systems of similar mass. However,
we demonstrate that binarity can not be assumed purely
on probabilistic grounds. Analysis of the association’s
two-point correlation function shows that there is a sig-
nificant probability (25%) that at least one pair of low-
mass association members will be separated by .15′′, so
analysis of the wide binary population requires a system-
atic search for all wide binaries. The detection of another
pair of low-mass members within 15′′ would represent
an excess at the 98% confidence level. In principle, bina-
rity could also be demonstrated by measuring common
proper motion with precision higher than the internal
velocity scatter of the association; given the astromet-
ric precision currently attainable with LGSAO data (.1
mas), the test could be feasible within .5 years.
The authors thank C. Slesnick for providing guidance
in the analysis of young stellar spectra, P. Cameron for
sharing his NIRC2 astrometric calibration results prior
to publication, and the anonymous referee for returning
a helpful and very prompt review. The authors also wish
to thank the observatory staff, and particularly the Keck
LGSAO team, for their tireless efforts in commissioning
this valuable addition to the observatory. Finally, we
recognize and acknowledge the very significant cultural
role and reverence that the summit of Mauna Kea has
always had within the indigenous Hawaiian community.
We are most fortunate to have the opportunity to con-
duct observations from this mountain.
This work makes use of data products from the Two
Micron All-Sky Survey, which is a joint project of the
University of Massachusetts and the Infrared Process-
ing and Analysis Center/California Institute of Tech-
nology, funded by the National Aeronautics and Space
Administration and the National Science Foundation.
This work also makes use of data products from the
DENIS project, which has been partly funded by the
SCIENCE and the HCM plans of the European Com-
mission under grants CT920791 and CT940627. It is
supported by INSU, MEN and CNRS in France, by
the State of Baden-Wrttemberg in Germany, by DG-
ICYT in Spain, by CNR in Italy, by FFwFBWF in
Austria, by FAPESP in Brazil, by OTKA grants F-
4239 and F-013990 in Hungary, and by the ESO C&EE
grant A-04-046. Finally, our research has made use of
the USNOFS Image and Catalogue Archive operated by
the United States Naval Observatory, Flagstaff Station
(http://www.nofs.navy.mil/data/fchpix/).
REFERENCES
Baraffe, I., Chabrier, G., Allard, F., & Hauschildt, P. 1998, A&A,
337, 403
Bessell, M. & Brett, J. 1988, PASP, 100, 1134
Bonnarel, F. et al. 2000, A&AS, 143, 33
Bouy, H., Brandner, W., Martin, E., Delfosse, X., Allard, F., &
Basri, G. 2003, AJ, 126, 1526
Burgasser, A. et al. 2003, ApJ, 125, 850
Caballero, J., Martin, E., Dobbie, P., & Barrado y Navascues, D.
2006, A&A, 460, 635
Carpenter, J. 2001, AJ, 121, 2851
Chauvin, G., Lagrange, A., Dumas, C., Zuckerman, B., Mouillet,
D., Song, I., Beuzit, J., & Lowrance, P. 2004, A&A, 425, 29
Close, L., Siegler, N., Freed, M., & Biller, B. 2003, ApJ, 587, 407
Close, L. et al. 2007, ApJ, in press
de Zeeuw, P., Hoogerwerf, R., de Bruijne, J., Brown, A., &
Blaauw, A. 1999, AJ, 117, 354
Duquennoy, A. & Mayor, M. 1991, A&A, 248, 485
Epchtein, N. et al. 1999, A&A, 349, 236
Fischer, D. & Marcy, G. 1993, ApJ, 396, 178
Gomez, M., Hartmann, L., Kenyon, S., & Hewett, R. 1993, AJ,
105, 1927
http://www.nofs.navy.mil/data/fchpix/
Hambly, N. et al. 2001, MNRAS, 326, 1279
Hewett, P. 1982, MNRAS, 201, 867
Jayawardhana, R. & Ivanov, V. 2006, Science, 313, 1279
Kim, S., Figer, D., Lee, M., & Oh, S. 2005, PASP, 117, 445
Kraus, A. & Hillenbrand, L. 2007, ApJ, in press
(astro-ph/0702545)
Leggett, S., Allard, F., & Hauschildt, P. 1998, ApJ, 509, 836
Luhman, K., Stauffer, J., Muench, Al, Rieke, G., Lada, E.,
Bouvier, J., & Lada, C. 2003, ApJ, 593, 1093
Luhman, K., Whitney, B., Meade, M., Babler, B., Indebetouw,
R., Bracker, S., & Churchwell, E. 2006, ApJ, 649, 1180
Luhman, K., Allers, K., Jaffe, D., Cushing, M., Williams, K.,
Slesnick, C., & Vacca, W. 2007, ApJ, in press
Massey, P., Strobel, K., Barnes, J., & Anderson, E. 1988, ApJ,
328, 315
Monet, D. et al. 2003, AJ, 125, 984
Oke, B. & Gunn, J. 1982, PASP, 94, 586
Peebles, J. 1980, The Large Scale Structure of the Universe
(Princeton: Princeton Univ. Press)
Preibisch, T., Guenther, E., & Zinnecker, H. 2001, AJ, 121, 1040
Preibisch, T., Brown, A., Bridges, T., Guenther, E., & Zinnecker,
H. 2002, AJ, 124, 404
Reid, I. & Gizis, J. 1997, AJ, 113, 2246
Reid, I., Gizis, J., Kirkpatrick, J., & Koerner, D. 2001, AJ, 121,
Simon, M. 1997, ApJ, 482, 81
Skrutskie, M. et al. 2006, AJ, 131, 1163
Slesnick, C., Carpenter, J., & Hillenbrand, L. 2006a, AJ, 131, 3016
Slesnick, C., Carpenter, J., Hillenbrand, L., & Mamajek, E.
2006b, AJ, 132, 2665
Stetson, P. 1987, PASP, 99, 191
Weinberg, M., Shapiro, S., & Wasserman, I. 1987, ApJ, 312, 367
Wizinowich, P. et al. 2006, PASP, 118, 297
http://arxiv.org/abs/astro-ph/0702545
ABSTRACT
  We present photometric, astrometric, and spectroscopic observations of
USco160611.9-193532 AB, a candidate ultrawide (~1600 AU), low-mass (M_tot~0.4
M_sun) multiple system in the nearby OB association Upper Scorpius. We conclude
that both components are young, comoving members of the association; we also
present high-resolution observations which show that the primary is itself a
close binary system. If the Aab and B components are gravitationally bound, the
system would fall into the small class of young multiple systems which have
unusually wide separations as compared to field systems of similar mass.
However, we demonstrate that physical association can not be assumed purely on
probabilistic grounds for any individual candidate system in this separation
range. Analysis of the association's two-point correlation function shows that
there is a significant probability (25%) that at least one pair of low-mass
association members will be separated in projection by <15", so analysis of the
wide binary population in Upper Sco will require a systematic search for all
wide systems; the detection of another such pair would represent an excess at
the 98% confidence level.

<|endoftext|><|startoftext|>
Introduction
In this contribution we study the quantum propagation of particles in a cosmological
background. We are particularly interested in understanding the dissipative phenomena
related to the time dependence of the metric. To this end, we analyze the propagator
of a massive particle which interacts with a massless radiation field in an expanding
universe. Several points must be considered.
First, we are dealing with an interacting field theory in a curved spacetime. In
this situation, the asymptotic in and out vacua generally do not coincide. Being
interested in expectation values, rather than in in-out matrix elements, we adopt the
Keldysh-Schwinger formalism, or Closed Time Path (CTP) method [1–3] in curved
spacetime [4–7].
Second, as it is well known, in a curved spacetime there is no single definition for
the vacuum nor for the concept of particle. We face this issue by working within the
adiabatic approximation [8]: the massive particles will have their Compton wavelengths
much smaller than the typical curvature radius of the universe (in our case, the Hubble
radius).
Third, as explained in [9], in theories such as QED or perturbative quantum gravity,
dissipative effects appear only at two loops, because the one-loop diagrams which could
lead to dissipation vanish on the mass shell. Here, in order to keep the calculations
simple, we have chosen a simple, yet physically meaningful, model which exhibits
http://arxiv.org/abs/0704.0456v1
Particle propagation in cosmological backgrounds 2
dissipation at one loop. We expect the behaviour of QED or perturbative quantum
gravity to be similar at two loops.
In this contribution we compute the retarded self-energy of the lightest field in a
massive doublet which propagates in a thermal bath of massless particles in an expanding
universe, and from it we extract the decay rate. Notice that in the Minkowski vacuum
the excitations of the lighter field are stable, hence the decay rate is zero. We concentrate
on the physical insights and summarize the main results. A more detailed account will
be given in separate publications [10, 11]
The contribution is organized as follows. In section 2 we introduce the model and
motivate the use of the adiabatic approximation for the massive fields. In section 3
we present the results for the imaginary part of the self-energy and the decay rate. In
section 4 we study the time evolution of the interacting propagator. Finally, in section
5 we summarize the main points of the contribution and discuss its relevance to the
trans-Planckian question. We use a system of natural units with ~ = c = 1, and the
metric has the signature (−,+,+,+).
2. The model
We consider spatially isotropic and homogeneous Friedmann-Lemâıtre-Robertson-
Walker models with flat spatial sections:
ds2 = −dt2 + a2(t)dx2. (1)
The particle model is the following: two massive fields φm, and φM , interacting with a
massless field, χ, via a trilinear coupling. The total action is S = Sm + SM + Sχ + Sint,
where each term is given by
dt d3x a3(t)
(∂tφm)
a2(t)
(∂xφm)
2 −m2φ2m
, (2a)
dt d3x a3(t)
(∂tφM)
a2(t)
(∂xφM)
2 −M2φ2M
, (2b)
dt d3x a3(t)
(∂tχ)
a2(t)
(∂xχ)
2 − ξR(t)χ2
, (2c)
Sint = gM
dt d3x a3(t)φmφMχ, (2d)
with R(t) being the Ricci scalar. We assume that the massless field is conformally
coupled to gravity, so that ξ = 1/6. It is useful to work with rescaled massive fields
defined by φ̄(t,x) := [−g(t,x)]1/4φ(t,x) = a3/2(t)φ(t,x).
We consider the two massive fields having large masses but with a small mass
difference ∆m := M − m ≪ M . As shown in [12], the model can be interpreted as
a field-theory description of a relativistic two-level atom (of mass m and energy gap
∆m) interacting with a scalar radiation field χ. The radiation field χ is assumed to be
at some conformal temperature θ (which can eventually be zero). The corresponding
physical temperature, as well as the Hubble rate H(t) := ȧ(t)/a(t), are chosen to be
Particle propagation in cosmological backgrounds 3
much smaller than the masses of the fields. These restrictions ensure that the number of
massive particles is strictly conserved. The non-trivial dynamics concerns the transitions
between the two massive fields accompanied by emission and absorption of massless
quanta.
In a curved spacetimes it is not a trivial task to compute even the free field vacuum
propagators. For massless conformally coupled fields there is a natural vacuum state,
the conformal vacuum. Propagators in this vacuum, when expressed in conformal time,
essentially correspond to the flat spacetime propagators. For the massive fields, rather
than attempting to find the exact free propagator, we will exploit the fact that their
Compton wavelengths is much smaller than the Hubble length H−1. In this regime, the
adiabatic (WKB) approximation is valid and explicit expressions for the free propagators
can be computed [8, 10] —see for instance (19).
3. The self-energy and decay rates
In this section we consider the interacting retarded Green function
GR(t, t
′;p) := θ(t− t′)〈[φ̂mp(t), φ̂mp(t
′)]〉 (3)
within the adiabatic approximation. It is related to the retarded self-energy ΣR via [12]
GR(t, t
′;p) = G
R (t, t
ds ds′
−g(s)
−g(s′)G
R (t, s;p)ΣR(s, s
′;p)GR(s
′, t′;p) (4)
where G
R (t, t
′;p) is the free retarded propagator. In terms of the rescaled fields,
φ̄(t;p) = a3/2φ(t;p), the above relation becomes
ḠR(t, t
′;p) = Ḡ
R (t, t
′;p)− i
ds ds′ Ḡ
R (t, s;p)Σ̄R(s, s
′;p)ḠR(s
′, t′;p). (5)
We assume that the massive fields are in the adiabatic vacuum, and that the
massless field χ is in a thermal state, characterized by a fixed conformal temperature
θ. We will compute the imaginary part of the one-loop self energy to order g2 in
the adiabatic approximation, evaluated at the mass shell. It will be evaluated in a a
frequency representation around the average time coordinate T = (t1+t2)/2, by Fourier-
transforming with respect to the difference coordinate ∆ = t1 − t2, which amounts to a
local frequency representation (it is further analyzed in next section). As for the spatial
part, we work in the momentum representation to exploit conservation of the conformal
momentum.
3.1. Linear approximation to the scale factor
As a first step, we approximate the evolution of the scale factor by a linear expansion:
a(t) ≈ a(T )[1 +H(T )(t− T )]. (6)
This approximation for the scale factor is appropriate when considering physical
temperatures which are much larger than the expansion rate (but still much smaller than
Particle propagation in cosmological backgrounds 4
the fields masses). On the mass shell, thermal corrections will dominate over curvature
corrections, since the thermal energy scale is much larger than the curvature energy
scale. Therefore, we expect the on-shell self-energy to be governed by the thermal bath
at the instantaneous physical temperature at each moment of the expansion, θ/a(T ).
The explicit calculation [10] confirms that the imaginary part of the on-shell self-
energy is given by that of a thermal bath in Minkowski at a physical temperature θ/a(T ).
In the limit in which the atoms are at rest this result is [10, 12]
Im Σ̄R(m, T ; 0) = −
M∆m nθ/a(T )(∆m), (7)
where nθ/a(T )(∆m) is the Bose-Einstein function:
nθ/a(T )(∆m) :=
e∆m a(T )/θ −1
. (8)
As in Minkowski spacetime, the self-energy corresponds to a decay rate,
Γ = −
Im Σ̄R(m, T ; 0) =
∆m nθ/a(T )(∆m), (9)
which amounts to the probability per unit time for the lightest state to absorb a massless
particle from the thermal bath.
3.2. Beyond linear order: vacuum effects
When the expansion rate of the universe is of the order of the temperature or larger,
vacuum effects become relevant. Energy conservation does not hold for energy scales of
the order of the expansion rate, and therefore we expect new channels for the particle
decay which will contribute to the imaginary part of the self-energy.
In order to study the vacuum effects we need to choose a explicit model for the
evolution of the scale factor. For instance, in the case of de Sitter,
a(t) = a(T ) eH(t−T ), (10)
the vacuum contribution to the imaginary part of the retarded self-energy given by [11]
Im Σ̄R(m, T ; 0) = −
M∆m nH/(2π)(∆m), (11)
which coicides with the self-energy in a Minkowski thermal bath at a temperature
H/(2π). The result is not unexpected since the effective de Sitter temperature [13]
is recovered. The corresponding decay rate
Γ = −
Im Σ̄R(m, T ; 0) =
∆m nH/(2π)(∆m), (12)
amounts for the probability per unit time for the lightest field to emit a massless particle.
Energy conservation forbids this process in Minkowski spacetime, but this restriction
does not apply in an expanding universe.
Particle propagation in cosmological backgrounds 5
4. Retarded propagator and self-energy in cosmology
In expanding universes the propagators are no longer time-translation invariant. We
can nevertheless always express the propagator in a frequency representation,
ḠR(ω, T ;p) :=
d∆ eiω∆ ḠR(T +∆/2, T −∆/2;p) . (13)
For short time differences as compared to the inverse expansion rate, i.e., |t−t′| ≪ H−1,
(5) can be diagonalized:
ḠR(ω, T ;p) =
[−iḠ(0)(ω, T ;p)]−1 + Σ̄R(ω, T ;p)
. (14)
Fourier-transforming again we get the short-time behavior:
ḠR(t, t
′;p) =
Rp(T )
sin [Rp(T )(t− t
′)] e−Γp(T )(t−t
′)/2 θ(t− t′). (15)
(T ) := E2
(T ) + Re Σ̄R(Ep, T ;p) := m
a2(T )
+ Re Σ̄R(Ep, T ;p) (16)
Γp(T ) := −
Rp(T )
Im Σ̄R(Ep, T ;p). (17)
Therefore one recovers the usual interpretation, in which the real part of the self-energy
corresponds to the energy shift, and in which the imaginary part corresponds to the
decay rate. Notice that both quantities depend in general on time.
One may also be interested in considering large time lapses, and in this case the
frequency representation of the propagator around the average time does not make sense.
Lifting the short-time requirement, and only imposing the adiabatic approximation, the
following expression for the evolution of the retarded propagator is found [10]:
ḠR(t1, t2;p) =
Rp(t1)Rp(t2)
dt′ Rp(t
dt′ Γk(t
θ(t1 − t2). (18)
Notice that the long-time evolution of the propagator can be expressed in terms of
integrals of quantities evaluated in the local frequency representation. Two time scales
are clearely separated: the interaction timescale, in which the interaction process take
place and in which the self-energy is evaluated, and the evolution timescale, which
can be much longer and during which the propagators deviate significantly from the
corresponding Minkowski expression. Equation (18) can be derived in a very similar
way as the well-known adiabatic approximation for the free retarded propagator [8]:
R (t1, t2;p) =
Ep(t1)Ep(t2)
dt′ Ep(t
θ(t1 − t2). (19)
Particle propagation in cosmological backgrounds 6
5. Summary and discussion
The goal of this contribution is to analyze the quantum effects in the propagation of
interacting fields in a cosmological background. This issue may play an important role in
justifying the non-trivial dispersion relations which have been used when addressing the
trans-Planckian question in the context of black holes [14–17] and cosmology [18–21].
Interactions could indeed significantly modify the field propagation when approaching
the event horizon of a black hole [22–25] or at primordial stages of inflation [9].
In our model, the masses of the fields were assumed to be much larger than the
expansion rate of the universe. This was a key assumption, because it allowed to
introduce the adiabatic (WKB) approximation, which not only makes the problem
solvable, but also allows having a well-defined particle concept even in absence of
asymptotic regimes. Within this approximation, the time-evolution of the interacting
propagators can be computed from the integral of the retarded self-energy, evaluated
on-shell in a frequency representation around the mid time.
The imaginary part of the self-energy determines the decay of the retarded
propagator, and hence it is an expression of the dissipative properties. For temperatures
higher than the expansion parameter the decay of the propagator is determined by the
local temperature at each moment of expansion. For lower temperatures, the decay of
the propagator is driven by the expansion rate of the universe. This second contribution,
which is present even in the vacuum, can be interpreted as being a consequence of the
absence of energy conservation at those energy scales comparable to the expansion rate.
The decay rate, derived from the imaginary part of the self-energy, has a secular
character. Even small decay rates could thus give an important effect when integrated
over large periods of time. The exact significance of the generically dissipative properties
of the propagator will be further analyzed elsewhere [11].
Acknowledgments
I am very grateful with Renaud Parentani and Enric Verdaguer for a critical reading
of the manuscript. This work is partially supported by the Research Projects MEC
FPA2004-04582-C02-02 and DURSI 2005SGR-00082.
References
[1] J. S. Schwinger. J. Math. Phys., 2:407, 1961.
[2] L. V. Keldysh. Zh. Eksp. Teor. Fiz, 47:1515, 1965. [Sov. Phys. JEPT 20:1018, 1965].
[3] K.-C. Chou, Z.-B. Su, B.-L. Hao, and L. Yu. Phys. Rept., 118:1–131, 1985.
[4] R. D. Jordan. Phys. Rev. D, 33:444–454, 1986.
[5] E. Calzetta and B. L. Hu. Phys. Rev. D, 35:495–509, 1987.
[6] A. Campos and E. Verdaguer. Phys. Rev. D, 49:1861–1880, 1994.
[7] S. Weinberg. Phys. Rev. D, 72:043514, 2005.
[8] N. D. Birrell and P. C. W. Davies. Quantum fields in curved space. Cambridge University Press,
Cambridge, England, 1982.
Particle propagation in cosmological backgrounds 7
[9] D. Arteaga, R. Parentani, and E. Verdaguer. Phys. Rev. D, 70:044019, 2004.
[10] D. Arteaga, R. Parentani, and E. Verdaguer. To appear in Int. J. Theor. Phys.
[11] D. Arteaga, R. Parentani, and E. Verdaguer. In preparation.
[12] D. Arteaga, R. Parentani, and E. Verdaguer. Int. J. Theor. Phys., 44:1665–1689, 2005.
[13] E. Mottola, Phys. Rev. D, 31:754–766, 1985.
[14] W. G. Unruh. Phys. Rev. Lett., 46:1351–1353, 1981.
[15] T. Jacobson. Phys. Rev. D, 44:1731–1739, 1991.
[16] W. G. Unruh. Phys. Rev. D, 51:2827–2838, 1995.
[17] R. Balbinot, A. Fabbri, S. Fagnocchi, and R. Parentani. Riv. Nuovo Cim. 28:1–55, 2005
(gr-qc/0601079).
[18] J. Martin and R. Brandenberger. Phys. Rev. D, 63:123501, 2001.
[19] J. Martin and R. Brandenberger. Phys. Rev. D, 68:063513, 2003.
[20] J. C. Niemeyer. Phys. Rev. D, 63:123502, 2001.
[21] J. C. Niemeyer and R. Parentani. Phys. Rev. D, 64:101301, 2001.
[22] C. Barrabès, V. Frolov, and R. Parentani. Phys. Rev. D, 62:044020, 2000.
[23] R. Parentani. Phys. Rev. D, 63:041503, 2001.
[24] R. Parentani. Int. J. Theor. Phys., 41:2175–2200, 2002.
[25] R. Parentani. Int. J. Mod. Phys., A17:2721–2726, 2002.
http://arxiv.org/abs/gr-qc/0601079
	Introduction
	The model
	The self-energy and decay rates
	Linear approximation to the scale factor
	Beyond linear order: vacuum effects
	Retarded propagator and self-energy in cosmology
	Summary and discussion
ABSTRACT
  We study the quantum propagation of particles in cosmological backgrounds, by
considering a doublet of massive scalar fields propagating in an expanding
universe, possibly filled with radiation. We focus on the dissipative effects
related to the expansion rate. At first order, we recover the expected result
that the decay rate is determined by the local temperature. Beyond linear
order, the decay rate has an additional contribution governed by the expansion
parameter. This latter contribution is present even for stable particles in the
vacuum. Finally, we analyze the long time behaviour of the propagator and
briefly discuss applications to the trans-Planckian question.

<|endoftext|><|startoftext|>
Introduction
	Equations of Motion
	Transmission line-SQUID-mechanical oscillator Hamiltonian
	Open system Heisenberg equations of motion
	Observables and `in' states
	Solving the equations of motion
	Linear response approximation
	Semiclassical approximation
	Complete solution to detector signal response and noise
	Quantum bound on noise
	Results
	Analytical approximations
	Displacement sensitivity
	Force sensitivity
	Back reaction cooling
	Concluding remarks
	Acknowledgements
	References
ABSTRACT
  We provide a quantum analysis of a DC SQUID mechanical displacement detector
within the sub-critical Josephson current regime. A segment of the SQUID loop
forms the mechanical resonator and motion of the latter is transduced
inductively through changes in the flux threading the loop. Expressions are
derived for the detector signal response and noise, which are used to evaluate
the position and force detection sensitivity. We also investigate cooling of
the mechanical resonator due to back reaction noise from the detector.

<|endoftext|><|startoftext|>
Introduction
The Galactic magnetic field is now recognized
as a fundamental component of the interstellar
medium, and plays a critical role in the formation
and evolution of structures in the Milky Way. An
important prediction in models of the large-scale
magnetic field, in both the Milky Way and in other
galaxies, is the existence of magnetic field reversals
(regions of magnetic shear across which the field
changes direction by roughly 180◦). Determining
the number and location of these magnetic rever-
sals is essential to understanding Galactic evolu-
tion (Shukurov 2005). While the majority of the
recent studies suggest several large-scale reversals
in the Galaxy along its radius, it is interesting to
John.Dickey@utas.edu.au
http://arxiv.org/abs/0704.0458v1
note that most other galaxies exhibit either one re-
versal or none (Beck 2007). Is our Galaxy unique
this way, or is it simply a difference in observing
methods?
The large-scale Galactic magnetic field is con-
centrated in the disk, and is most often studied via
observations of rotation measure (RM), the mea-
surable consequence of Faraday rotation. For a
source that emits linearly polarised radiation at
an angle φ◦, the received radiation will have a po-
larisation angle at a wavelength, λ [m] , given by:
φ = φ◦ + λ
2 0.812
neB · dl = φ◦ + λ
2 RM (1)
where ne [cm
−3] is the electron density andB [µG]
is the magnetic field along the propagation path
dl [pc]. The RM integral is over the path from the
source to the observer.
For more than four decades, the RMs of both
pulsars and extragalactic radio sources (EGS)
have been used to probe the Galactic magnetic
field. This has led to a series of clear conclu-
sions. First, in the local spiral arm, the field
is unquestionably directed clockwise (CW), as
viewed from the North Galactic pole (Manchester
1972, 1974; Heiles 1996a). The field in the first
quadrant (Q1; 0◦ ≤ ℓ ≤ 90◦; see Figure 1) of
the Sagittarius-Carina spiral arm is reversed rel-
ative to the local arm (Thomson & Nelson 1980;
Simard-Normandin & Kronberg 1980; Lyne & Smith
1989), implying a large-scale field reversal be-
tween the local arm and the Q1 component of the
Sagittarius-Carina arm. Some evidence suggests
that this reversed field extends into the fourth-
quadrant (Q4; 270◦ ≤ ℓ ≤ 360◦) component of the
Sagittarius-Carina arm (eg. Rand & Lyne 1994;
Han et al. 1999; Frick et al. 2001).
At larger distances, the existence of other large-
scale reversals in the Galaxy remains unclear.
Ideally, reconstruction of the Galactic magnetic
field should utilize information from both pulsars
and extragalactic sources. However, until recently
there have been very few EGS RMs available at
low-latitude that might be used to study the field
in the disk. Studies that have utilized pulsar RMs
alone are constrained by the comparatively sparse
sampling of pulsars on the sky, which can make it
difficult to map the field in complicated regions.
Three recent pulsar RM studies are as follows.
Using pulsar RMs, Weisberg et al. (2004) inves-
tigated the existence and location of reversals in
Q1, concluding that a reversal occurs between each
arm so that the magnetic fields in adjacent arms
are oppositely directed. While the evidence pre-
sented by Weisberg et al. (2004) for a reversal
between the local arm and the Sagittarius-Carina
arm in Q1 is indisputable, their evidence for addi-
tional reversals is based on limited data, and they
acknowledge that the evidence for a reversal into
the Q4-component of the Sagittarius-Carina arm
is not well-defined.
Vallée (2005) investigated azimuthal field con-
figurations in the Galaxy using pulsar RMs aver-
aged in concentric rings. He concluded that best-
fit for this model was an overall clockwise mag-
netic field with a 2 kpc wide counter-clockwise
(CCW) ring, located between 4 and 6 kpc from
the Galactic center (note that a non-standard
Solar-circle radius of 7.2 kpc was assumed in this
study). In this model, the Galactic field has two
large-scale reversals, and the Q1 component of
the Sagittarius-Carina arm is CCW, while the Q4
component is CW.
Using 223 new pulsar RM observations primar-
ily in the fourth quadrant, in conjunction with pre-
vious pulsar RMs, Han et al. (2006) concluded
there is a reversal at every arm-interarm boundary,
so that the fields in the arms are directed CCW,
and the interarm regions are directed CW. A po-
tential inconsistency with this model is that the
majority of the pulsars distributed along the Q4
component of the Sagittarius-Carina spiral arm
have what Han et al. (2006) describe as ‘unex-
pectedly positive’ RMs, which they suggest is the
influence of HII regions along the lines-of-sight to
the affected pulsars (see also Mitra et al. 2003).
Recent surveys of the Galactic plane at high
resolution and at multiple wavelengths have as-
sisted greatly in the study of the Galactic mag-
netic field by addressing the previous paucity of
low-latitude EGS RM data (eg. Brown & Taylor
2001). One such survey is the Southern Galac-
tic Plane Survey (SGPS; McClure-Griffiths et al.
2005; Haverkorn et al. 2006b). The SGPS images
low latitudes in the third and fourth quadrants of
the Galaxy, complementing the Canadian Galactic
Plane Survey in the northern hemisphere (CGPS;
Taylor et al. 2003). Rotation measures calculated
from these data were used by Haverkorn et al.
(2006a) to explore the structure of the small-scale
field. Here, we present the tabulation of these
RMs and use them to examine the structure of
the large-scale field.
2. Rotation Measure Calculations
The initial set of observations for the SGPS
(ie. Phase I) spans an area of 253◦ ≤ ℓ ≤
358◦ and |b| ≤ 1.5◦. The observations were
done with the Australia Telescope Compact Ar-
ray (ATCA) in New South Wales, Australia. For
details about the SGPS observations and polari-
metric data reduction, see Gaensler et al. (2001)
and Haverkorn et al. (2006b).
RMs were calculated using twelve separate 8
MHz bands centered on frequencies between 1336
MHz and 1432 MHz. The proximity of the 8
MHz bands allows for unambiguous RM calcula-
tions using the algorithm designed by Brown et al.
(2003b) for the CGPS, with appropriate modifica-
tions for the ATCA. Specifically, an RM calcula-
tion was considered reliable if the source was suf-
ficiently polarized (>0.3%), had sufficient signal
to noise (>2) across at least 3 pixels, was Fara-
day thin (Vallée 1980), and had a consistent value
across the source.
Of the 215 polarised source candidates identi-
fied, 148 sources had RMs that successfully passed
the screening tests discussed above. These new
data are given in Table 1. Three of our 148 sources
had a RM determined by Gaensler et al. (2001),
as part of the SGPS test region (325.5◦ ≤ ℓ ≤
332.5◦,−0.5 ≤ b ≤ 3.5◦). The previously deter-
mined values of these sources are within the er-
rors of the new values quoted in Table 1. These
sources are indicated by footnote marker d. Two
additional test region RM sources fall within the
latitude range of the SGPS but they lie within the
noise perimeter of our data (see Gaensler et al.
2001) and were not observable. There is one other
previously observed RM from an EGS, reported
by Broten et al. (1988), that falls within our field
at ℓ = 307.1◦, b = 1.2◦. This source is resolved
in the SGPS and exhibits a large gradient across
the source. Consequently, it failed the screening,
though its calculated RM (+183± 23 rad m−2) is
in agreement with the previously determined value
(+185± 1 rad m−2).
3. Observations of the Galactic Magnetic
Field
Figure 2 shows the RMs of EGS and pulsars in
the SGPS region. Most of the pulsars (99 of 120)
are from Han et al. (2006). The remaining are
from Taylor et al. (2000) and Han et al. (1999).
The most striking feature of Figure 2 is the change
in sign of RM from predominantly positive at low
longitudes to predominantly negative at high lon-
gitudes for both the pulsars and EGS at ℓ ∼ 304◦,
though it is more prominent for the EGS. A change
in sign of the overall trend in RM can only come
from a change of direction in the dominant line-
of-sight component of the magnetic field. The fact
that the RM sign change is abrupt indicates that
this directional change is the result of a physi-
cal field reversal, rather than a viewing angle ef-
fect such as that observed towards ℓ ∼ 180◦ (eg.
Brown & Taylor 2001). Furthermore, the abrupt-
ness indicates a thin current sheet and an associ-
ated large gradient in the magnetic field (Heiles
1996b).
By averaging the EGS RMs in ℓ across the sky
to reduce small-scale variations (for example, due
to the small-scale field or intrinsic effects from the
EGS themselves), we can obtain more information
about the large-scale field. The top panel in Fig-
ure 3 shows the RM data from the SGPS, both
raw and binned plotted as a function of Galac-
tic longitude. The middle panel shows these data
smoothed. In all three presentations of the EGS
data, an oscillating pattern of RM with longitude
is visible. The transition from positive to nega-
tive RMs remains at ℓ ∼ 304◦, indicated by the
solid vertical line. In the bottom panel of Figure
3, we plot the individual pulsar RMs as a function
of Galactic longitude. These data were not aver-
aged in ℓ because of the variation and uncertainty
in the pulsar distances. In spite of this, there are
features seen here similar to the oscillatory pattern
observed in the EGS data.
The dashed and dotted lines in the middle panel
of Figure 3 are the approximate longitudes of |RM|
maxima and minima respectively for the SGPS
data. In Figure 1, we show these lines as lines-of-
sight overlaid on a top-down view of the Galaxy
where the grey scale is the Galactic electron den-
sity model of Cordes & Lazio (2002), hereafter
CL02. Interestingly, the blue dashed lines (|RM|
maxima) tend to have the longest continuous frac-
tion of their length along a spiral arm or through
the central annulus put in the CL02 to correspond
to the molecular ring. This is consistent with the
expectation that EGS RMs should be dominated
by the spiral arms as a result of the higher electron
densities in these regions (Han et al. 2006). The
strong positive RMs in the longitude range around
ℓ ∼ 292◦, seen in both the EGS and pulsars, sug-
gest the magnetic field in the Q4-component of the
Sagittarius-Carina arm is directed CW. The same
conclusion was reached by Caswell et al. (2004)
using data from a distant supernova remnant.
This CW field presents a simple explanation for
the positive pulsar RMs identified by Han et al.
(2006) as part of a ‘Carina anomaly’.
Similarly, the strong negative RMs in the lon-
gitude range around ℓ ∼ 312◦ suggest the mag-
netic field in the Q4-component of the Scutum–
Crux arm is directed CCW. Therefore, a magnetic
field reversal must reside between the Sagittarius–
Carina arm and the Scutum–Crux arm in Q4. Fur-
thermore, the strong evidence for a field reversal
between the local and Sagittarius–Carina arms in
Q1 suggests the reversal must slice through the
Sagittarius–Carina arm if this reversal is contin-
uous between Q1 and Q4, such as in the model
proposed by Vallée (2005). The subsequently im-
plied lack of alignment of the magnetic field with
the Sagittarius-Carina arm is consistent with ear-
lier observations of at least a 5◦ offset in pitch an-
gle between the orientation of the local field and
that of the local spiral arm (Beck 2007).
4. Modeling the Large-Scale Magnetic
Field
As a separate approach to qualitative obser-
vational analysis of the data, we can fit global
magnetic field models to the RM data (with as-
sumptions regarding the electron distribution in
the Galaxy), to explore the structure of the uni-
form field and the location or existence of mag-
netic field reversals. Here we use the CL02 electron
density model and the technique of Brown (2002)
which uses linear inversion theory (Menke 1984)
to obtain a least-squares fit to the data. With this
method, the boundaries of magnetic field regions
are fixed a priori but the strength and direction of
the field within these regions are not. The model is
fit to the observed RMs to derive the strength and
direction within each region. This contrasts previ-
ous analyses which also employed separate models
for the electron density and magnetic field, but
where the direction of the field was an input to
the model (e.g. Indrani & Despande 1998).
We present here a simple model where we con-
sider nine magnetic field regions, eight of which
are either arm or interarm regions, and the ninth
corresponds to the molecular ring. This model
does not include the differing pitch angle between
the magnetic and spiral arms discussed in section
3, but is instead designed to be directly compa-
rable with the results of Weisberg et al. (2004)
and Han et al. (2006). The region boundaries are
delineated by the green spirals in Figure 4. The
constraints for the model field within each arm or
interarm region are: 1) a log-spiral with a pitch
angle of 11.5◦ as shown; 2) a field strength that
is inversely proportional to Galactic radius (eg.
Heiles 1996b; Beck 2001; Brown et al. 2003a);
3) a zero vertical component; 4) a coherent di-
rection within each region. We set the magnetic
field within the molecular ring to have the same
constraints as in the arms, except that the field is
assumed to be purely azimuthal. At Galactic radii
less than 3 kpc or greater than 20 kpc, or at more
than 1 kpc from the mid-plane, the field is set to
zero.
We constrain the model using RM data from
individual sources only within the SGPS region
(120 pulsars, 148 SGPS EGS, and 1 EGS from
Broten et. al 1988; see sections 2 and 3).1 For
modeling purposes, we assume that the inter-
galactic contribution to the EGS RMs is neg-
ligible (Simard-Normandin & Kronberg 1980;
Gaensler et al. 2005), so that the EGS may be
considered to reside at Galactocentric radii of 20
kpc. Most of the published pulsar distances used
are based on the CL02 model, and are therefore
consistent with this model. As a consequence of
the limited sky coverage of the RM data used here,
we confine our analysis of the result to within the
SGPS longitudes.
The best-fit output for this model is a CW field
everywhere except for a CCW field in the Scutum-
Crux arm and in the molecular ring, as shown in
1Global models using data from all quadrants is beyond the
scope of this paper, but will be presented in a future paper.
Figure 4. Figure 5 shows a plot of both the mea-
sured and modeled RMs for the individual SGPS
EGS data (top panel), these data averaged and
smoothed as in the middle panel of Figure 3 (mid-
dle panel), and the measured and modeled RMs
for individual pulsars (bottom panel).
This model is able to closely reproduce the RM
structure seen in the smoothed SGPS EGS data
(recall that the fit is to the individual EGS and
pulsar data) . However, there are two (relatively
small) discrepancies between the model and the
observed data towards the outer Galaxy. The first
is in the vicinity around ℓ ∼ 275 deg, where the
measured data are more negative than the mod-
eled data as also seen in the top panel Figure 5. In
Figure 2, there is a contained region of small, neg-
ative RM between 270◦ < l < 283◦ at b > 0◦, re-
sembling a magnetic bubble like that discussed by
Clegg et al. (1992) and Brown & Taylor (2001).
RMs at these longitudes nominally should be dom-
inated by the field in the local arm. Thus, if the
negative RMs seen here were to be attributed to
the large-scale field, this would imply that the
field is directed counter-clockwise in the local arm.
This is contrary to the many studies that show
the field is clockwise in the local arm, as discussed
in section 1. Interestingly, this region was previ-
ously identified in the Parkes 2.4-GHz survey as
containing a polarized feature of unknown origin
(Duncan et al. 1997). Since the lines-of-sight in
the outer Galaxy are considerably smaller through
the interstellar medium, compared to lines-of-sight
through the inner Galaxy, it is likely that this lo-
calized feature is dominating the RM for these
EGS. As a result, averaging the negative RMs
through this localized feature with the otherwise
positive RMs of the local arm creates the effect of
RM ∼ 0 at l ∼ 275◦.
The other discrepancy is around ℓ ∼ 265◦
where the measured RMs are more positive than
the modeled RMs. Lines-of-sight at this longi-
tude are again through the outer Galaxy where
localized features including the Gum nebula
(Chanot & Sivan 1983) could dominate the RM.
As seen in the top panel of Figure 3 there are two
EGS sources at ℓ = 263◦ that have RMs roughly
double that of neighboring sources which biases
the average RM near this longitude. Interestingly,
there is also a localized peak in the pulsar data
near this longitude, as seen in the bottom panel
of Figure 3. The model tries to fit both the nega-
tive region around ℓ ∼ 275◦ and the more positive
region around ℓ ∼ 265◦, with the result being a
compromise between the two.
Although the individual pulsar data are noisier
than the averaged EGS data, the trends of the
pulsar data are also reproduced by this model.
In particular, the model supports our observa-
tional conclusion that the Q4-component of the
Sagittarius-Carina arm is directed CW, while the
Q4-component of the Scutum-Crux arm is directed
Finally, we note that the direction of the Norma
arm field is not well constrained by this model or
by the data. Regardless of whether the field in the
Norma arm is oriented CW or CCW, the results
from this model contrast the previous suggestions
of reversals with every arm (Weisberg et al. 2004)
or at every arm-interarm boundary (Han et al.
2006).
5. Summary
We present the rotation measures for 148 ex-
tragalactic sources found in the southern Galactic
plane survey. The oscillations of rotation mea-
sure with longitude revealed by these sources,
and as also seen in pulsar RM data, highlight
the dominating effect of the spiral arms on rota-
tion measure. Both empirically and with a di-
rect fit to measurements, the new data show con-
clusively that the field is directed clockwise in
the fourth-quadrant component of the Sagittarius-
Carina arm, and that a field reversal exists be-
tween the Sagittarius-Carina arm and the Scutum-
Crux arms in the fourth quadrant.
A definitive measurement of the number of
large-scale magnetic reversals in the Galaxy can
only emerge from an analysis that includes pulsar
and EGS RM data at all Galactic longitudes, and
which considers a wide range of distinct field con-
figurations. In addition, the technique presented
here is constrained to geometries imposed by the
CL02 model. With these caveats, the results from
our study of southern RMs indicate that far fewer
magnetic reversals are needed to explain the data
than other recent studies have suggested.
We thank the anonymous referee for the insight-
ful comments that have improved this manuscript.
The Australia Telescope is funded by the Com-
monwealth of Australia for operation as a National
Facility managed by CSIRO. This work was facili-
tated in part by an associateship grant to JCB by
the Alberta Ingenuity Fund. MH acknowledges
support from the National Radio Astronomy Ob-
servatory, which is operated by Associated Uni-
versities, Inc., under cooperative agreement with
the NSF. BMG and MH acknowledge the support
of the NSF through grant AST-0307358.
REFERENCES
Beck, R. 2001, Space Sci. Rev., 99, 243
Beck, R. 2007 in Polarisation 2005, F. Boulanger
& M. A. Miville-Deschenes (eds.), EAS Publi-
cation Series, 19 (astro-ph/0603531)
Brown, J. C. 2002, PhD Thesis, University of Cal-
Brown, J. C., Taylor, A. R. 2001, ApJ, 563, L31
Brown, J. C., Taylor, A. R., Wielebinski, R., &
Mueller, P. 2003, ApJ, 592, L29
Brown, J. C., Taylor, A. R., & Jackel, B. J. 2003b,
ApJS, 145, 213
Broten, N. W., MacLeod, J. M., & Vallée, J. P.
1988, ApSS, 141, 303
Caswell, J. L., McClure-Griffiths, & N. M., Che-
ung, M. C. M. 2004, MNRAS, 352, 1405
Clegg, A. W., Cordes, J. M., Simonetti, J. M., &
Kulkarni, S. R. 1992, ApJ, 386, 143
Cordes, J. M., & Lazio, T. J. W. 2002, preprint
(astro-ph/0207156) CL02
Chanot, A. & Sivan, J. P. 1983, A&A, 121, 19
Duncan, A. R., Haynes, R. F., Jones, K. L., &
Stewart, R. T. 1997, MNRAS, 291, 279
Frick, P., Stepanov, R., Shukurov, A., & Sokoloff,
D. 2001, MNRAS, 325, 649
Gaensler, B. M., Dickey, J. M., McClure-Griffiths,
N. M., Green, A. J., Wieringa, M. H., &
Haynes, R. F. 2001, ApJ, 549, 959
Gaensler, B. M., Haverkorn, M., Staveley-Smith,
L., Dickey, J. M., McClure-Griffiths, N. M.,
Dickel, J. R. &Wolleben, M. 2005, Science, 307,
Han, J. L., Manchester, R. N., & Qiao, G. J. 1999,
MNRAS, 306, 371
Han, J. L., Manchester, R. N., Lyne, A. G., Qiao,
G. J., & van Straten, W., 2006, ApJ, 642, 868
Haverkorn, M., Gaensler, B. M., Brown, J. C.,
Bizunok, N. S., McClure-Griffiths, N. M.,
Dickey, J. M., & Green, A. J. 2006a, ApJ, L33
Haverkorn, M., Gaensler, B. M., McClure-
Griffiths, N. M., Dickey, J. M., & Green, A. J.
2006b, ApJS, 167, 230
Heiles, C. 1996a, ApJ, 462, 316
Heiles, C. 1996b, in Polarimetry of the Interstellar
Medium, W. Roberge & D. Whittet (eds), 97
(ASPCS), 457
Indrani, C. & Deshpande, A. A. 1998, New As-
tronomy, 4, 33
Lyne, A. G. & Smith, F. G. 1989, MNRAS, 237,
Manchester, R. N. 1972, ApJ, 172, 43
Manchester, R. N. 1974, ApJ, 188, 637
McClure-Griffiths, N. M., Dickey, J. M., Gaensler,
B. M., Green, A. J., Haverkorn, M., & Strasser,
S. 2005, ApJS, 158, 178
Menke, W. 1984, Geophysical Data Analysis: Dis-
crete Inverse Theory (Academic Press, inc.)
Mitra, D., Wielebinski, R., Kramer, M., & Jessner,
A. 2003, 403, 585
Rand, R. J. & Lyne, A.G. 1994, MNRAS, 268, 497
Shukurov, A. 2005 in Cosmic Magnetic Fields, R.
Wielebinski & R. Beck (eds.), Springer LNP
664, 113
Simard-Normandin, M. & Kronberg, P.P. 1980,
ApJ, 242, 74
Taylor, A. R., Gibson,S. J., Peracaula, M. et al.
2003, AJ, 125, 3145
http://arxiv.org/abs/astro-ph/0603531
http://arxiv.org/abs/astro-ph/0207156
Taylor, J. H., Manchester, R. N., & Lyne, A. G.
2000, VizieR Online Data Catalog, 7189
Thomson, R. C., Nelson & A. H. 1980, MNRAS,
191, 863
Vallée, J. P. 1980, A&A, 86, 251
Vallée, J. P. 2005, ApJ, 619, 297
Weisberg, J. M., Cordes, J. M., Kuan, B., Devine,
K. E., Green, J. T., & Backer, D. C. 2004,
ApJS, 150, 317
This 2-column preprint was prepared with the AAS LATEX
macros v5.2.
Table 1
Rotation Measures of the SGPS
ℓ a b a α δ I b m c RM
(◦) (◦) (J2000) (J2000) (mJy) (%) (rad m−2)
253.30 0.89 08 19 46.2 -34 42 05 179 5.8 −8 ± 12
253.52 0.83 08 20 08.4 -34 55 20 21 6.4 −50 ± 26
253.68 -0.60 08 14 47.2 -35 51 08 72 3.5 −349 ± 27
254.16 -0.34 08 17 10.3 -36 06 01 64 5.5 −338 ± 19
254.60 -0.87 08 16 12.7 -36 46 05 62 4.3 −15 ± 24
254.81 0.93 08 24 08.9 -35 55 21 87 13.7 +84 ± 13
254.95 0.63 08 23 20.7 -36 11 56 41 4.9 +8 ± 32
255.16 0.24 08 22 21.9 -36 36 03 92 5.3 −35 ± 16
255.27 0.16 08 22 20.2 -36 43 53 84 3.4 +26 ± 23
255.36 -0.26 08 20 52.1 -37 02 48 136 9.0 +97 ± 10
255.36 0.50 08 23 59.5 -36 37 00 94 8.0 −115 ± 13
256.14 0.25 08 25 11.9 -37 24 00 92 5.3 +44 ± 17
256.64 -0.22 08 24 43.7 -38 04 40 104 4.7 +172 ± 15
257.47 0.54 08 30 20.4 -38 18 27 46 9.5 +23 ± 20
257.71 -0.66 08 26 02.2 -39 12 13 117 4.0 +144 ± 20
257.92 0.65 08 32 07.7 -38 36 33 383 2.3 +76 ± 14
258.52 1.02 08 35 30.1 -38 52 23 118 2.7 +196 ± 26
258.77 0.08 08 32 23.4 -39 37 38 131 3.4 +221 ± 16
259.05 -0.72 08 29 49.9 -40 19 35 123 5.7 +175 ± 12
259.05 -0.75 08 29 45.1 -40 20 52 103 10.0 +150 ± 12
259.77 1.22 08 40 15.2 -39 44 39 33 5.8 +250 ± 29
260.41 -0.43 08 35 21.8 -41 14 50 101 3.0 +221 ± 18
260.52 -0.55 08 35 11.4 -41 24 44 25 5.8 +247 ± 31
260.69 -0.23 08 37 05.4 -41 21 08 224 3.2 +204 ± 12
263.20 1.07 08 50 56.2 -42 30 54 165 5.6 +739 ± 14
263.22 1.08 08 51 02.6 -42 31 32 164 4.4 +826 ± 19
263.50 0.17 08 48 12.7 -43 19 14 139 0.9 +260 ± 28
264.24 0.88 08 53 48.3 -43 25 59 112 11.0 +406 ± 9
265.69 0.85 08 58 54.5 -44 33 41 106 1.8 +70 ± 28
266.14 1.08 09 01 33.6 -44 44 49 66 3.2 +211 ± 29
266.27 0.66 09 00 15.2 -45 07 17 95 3.4 +396 ± 18
267.03 0.04 09 00 27.1 -46 06 15 58 3.8 +298 ± 21
267.17 0.47 09 02 50.7 -45 55 36 598 0.3 +323 ± 28
268.62 0.58 09 08 56.5 -46 55 53 231 2.1 +256 ± 15
269.05 0.17 09 08 54.5 -47 31 15 118 3.6 +456 ± 16
269.55 0.45 09 12 09.7 -47 41 35 156 1.9 +137 ± 18
270.56 -0.85 09 10 32.9 -49 19 14 58 5.5 −152 ± 23
270.91 0.93 09 19 52.0 -48 20 10 57 6.2 −149 ± 22
271.22 -0.35 09 15 37.0 -49 27 04 66 2.5 +215 ± 29
271.30 -0.06 09 17 12.3 -49 18 17 83 5.0 +136 ± 16
Table 1—Continued
ℓ a b a α δ I b m c RM
(◦) (◦) (J2000) (J2000) (mJy) (%) (rad m−2)
271.52 -1.01 09 13 54.3 -50 07 34 120 1.7 +75 ± 28
271.70 -0.38 09 17 30.7 -49 49 16 116 3.5 −93 ± 18
272.36 0.62 09 24 45.8 -49 34 44 21 5.6 −296 ± 28
273.46 0.68 09 29 58.1 -50 17 41 18 13.6 −106 ± 19
273.57 1.28 09 33 00.8 -49 55 47 229 1.9 +20 ± 25
274.77 0.25 09 34 14.3 -51 30 18 51 3.9 −76 ± 22
275.02 0.82 09 37 51.7 -51 15 07 26 8.4 −101 ± 25
275.48 -0.68 09 33 32.0 -52 40 18 282 2.8 +248 ± 13
275.56 1.02 09 41 17.5 -51 27 26 281 6.3 +16 ± 7
275.56 -0.20 09 36 03.4 -52 22 03 46 2.9 +112 ± 30
275.83 0.16 09 38 55.3 -52 16 41 170 9.3 −107 ± 7
275.86 0.94 09 42 25.0 -51 42 48 29 11.0 0 ± 21
276.46 0.89 09 45 09.7 -52 08 29 153 2.5 −17 ± 20
277.44 0.86 09 49 58.5 -52 47 04 10 12.1 −71 ± 28
277.78 -0.73 09 44 52.1 -54 13 45 127 2.0 −36 ± 23
277.78 -0.81 09 44 32.3 -54 17 32 690 7.3 −77 ± 3
278.04 0.75 09 52 37.5 -53 15 08 635 7.8 −104 ± 3
278.37 0.15 09 51 47.9 -53 55 46 18 13.7 −4 ± 24
278.43 0.53 09 53 42.8 -53 40 12 42 4.4 +20 ± 27
278.47 -0.30 09 50 22.2 -54 20 21 22 6.6 −116 ± 24
279.04 -0.88 09 50 55.5 -55 09 04 242 2.4 +239 ± 17
279.09 0.89 09 58 46.2 -53 47 21 47 5.0 −120 ± 25
279.15 -0.63 09 52 36.5 -55 01 08 36 11.2 +341 ± 24
279.15 -0.65 09 52 32.6 -55 02 37 44 6.2 +329 ± 20
279.33 0.80 09 59 41.3 -54 00 11 35 7.8 −157 ± 18
279.80 1.21 10 03 53.7 -53 57 24 25 9.7 −145 ± 25
280.53 0.81 10 06 18.9 -54 42 57 37 5.4 −10 ± 23
280.62 -0.14 10 02 53.8 -55 32 09 134 3.7 −112 ± 15
282.07 -0.78 10 08 36.2 -56 54 09 723 0.9 +862 ± 16
282.46 0.24 10 15 09.3 -56 17 28 109 2.0 +256 ± 23
284.30 0.81 10 28 39.3 -56 48 09 80 2.4 −547 ± 25
285.15 0.96 10 34 33.2 -57 06 17 268 0.9 +168 ± 31
285.60 0.62 10 36 12.7 -57 37 38 44 3.3 +368 ± 35
286.04 -1.05 10 32 43.2 -59 17 54 2826 0.4 +809 ± 14
286.89 0.59 10 44 37.5 -58 16 43 58 3.0 +324 ± 28
288.27 -0.70 10 49 34.1 -60 03 21 255 0.6 +491 ± 25
290.81 0.74 11 12 45.4 -59 47 27 1858 0.4 +419 ± 23
292.90 -0.02 11 26 31.6 -61 13 37 52 6.6 +349 ± 28
293.39 0.73 11 32 19.4 -60 40 22 67 5.0 +121 ± 20
293.73 0.63 11 34 43.2 -60 52 06 30 3.4 +116 ± 23
Table 1—Continued
ℓ a b a α δ I b m c RM
(◦) (◦) (J2000) (J2000) (mJy) (%) (rad m−2)
294.29 -0.90 11 35 31.6 -62 29 13 32 4.2 +449 ± 26
294.38 -0.75 11 36 37.4 -62 22 07 265 0.7 +470 ± 24
295.17 0.01 11 44 55.1 -61 51 26 57 3.5 +363 ± 22
295.23 -1.05 11 43 02.3 -62 53 56 114 2.1 −207 ± 22
295.29 -1.23 11 43 05.7 -63 04 55 226 1.6 −43 ± 29
296.18 -0.59 11 52 05.2 -62 40 59 118 3.7 +752 ± 14
296.90 0.14 11 59 31.2 -62 07 15 452 1.6 +1113 ± 11
297.67 0.77 12 06 52.0 -61 38 54 283 2.6 +570 ± 11
299.42 -0.23 12 20 29.8 -62 53 37 55 4.0 +535 ± 30
299.51 -1.10 12 20 24.8 -63 46 10 192 2.9 +315 ± 21
300.25 -0.01 12 27 57.6 -62 45 42 109 4.1 +123 ± 14
300.47 -0.99 12 29 06.5 -63 45 08 112 4.9 +412 ± 18
300.65 -0.41 12 31 12.4 -63 11 40 809 0.9 +358 ± 19
301.14 -0.09 12 35 39.6 -62 54 30 375 1.3 +350 ± 18
301.70 0.25 12 40 45.4 -62 36 01 21 5.2 +296 ± 33
302.60 -1.17 12 48 26.2 -64 02 26 162 1.7 +159 ± 25
303.30 0.51 12 54 36.3 -62 21 25 149 3.5 −370 ± 14
304.53 1.00 13 05 00.1 -61 49 31 32 6.1 +40 ± 28
305.62 -1.16 13 15 52.5 -63 54 09 44 10.5 −61 ± 28
306.87 0.02 13 25 47.5 -62 35 15 215 1.0 −197 ± 27
306.92 -0.70 13 27 00.7 -63 17 46 275 2.0 +52 ± 16
307.20 -0.84 13 29 39.3 -63 23 47 41 8.1 +382 ± 18
308.64 -0.62 13 41 56.1 -62 55 38 40 2.8 −133 ± 27
308.73 0.07 13 41 33.5 -62 14 06 92 1.4 −661 ± 29
308.93 0.40 13 42 41.5 -61 52 38 152 2.5 −752 ± 17
309.06 0.84 13 43 00.9 -61 24 52 670 2.3 −504 ± 10
310.20 -1.04 13 56 09.7 -62 59 30 46 8.5 −584 ± 19
312.37 -0.04 14 11 37.1 -61 25 54 129 2.3 −438 ± 28
313.96 -0.76 14 26 14.9 -61 34 58 88 2.3 −480 ± 22
313.99 0.94 14 21 36.9 -59 59 01 507 1.8 −828 ± 17
314.02 1.01 14 21 40.3 -59 54 20 757 1.7 −579 ± 20
314.50 0.30 14 27 14.8 -60 24 07 86 5.3 −738 ± 19
314.82 0.89 14 27 57.3 -59 43 58 92 3.7 −507 ± 25
316.64 1.15 14 40 15.0 -58 47 30 64 4.6 −525 ± 27
317.54 -0.57 14 52 27.0 -59 58 11 152 2.1 +395 ± 27
318.53 0.30 14 56 18.1 -58 44 29 326 1.0 +53 ± 29
319.34 1.08 14 58 58.6 -57 40 31 1101 0.7 +241 ± 21
319.39 0.74 15 00 28.7 -57 57 04 85 1.4 +279 ± 30
320.83 0.88 15 09 19.7 -57 07 19 454 0.8 −8 ± 20
321.48 1.02 15 12 54.7 -56 40 20 933 2.0 −243 ± 8
Table 1—Continued
ℓ a b a α δ I b m c RM
(◦) (◦) (J2000) (J2000) (mJy) (%) (rad m−2)
321.58 -0.76 15 20 29.3 -58 07 41 1831 1.1 −138 ± 9
322.05 -0.95 15 24 17.5 -58 01 49 78 5.4 −397 ± 17
323.15 -0.52 15 29 19.5 -57 03 49 47 2.9 +83 ± 25
324.77 0.61 15 34 10.5 -55 12 30 161 3.9 −66 ± 14
325.81 1.08 15 38 05.1 -54 13 08 608 0.6 −15 ± 31 d
325.83 -0.30 15 43 57.7 -55 18 26 247 1.8 +356 ± 18
326.69 -1.16 15 52 28.2 -55 27 04 95 9.3 −142 ± 22
327.31 0.88 15 47 01.4 -53 28 29 256 1.2 −189 ± 26 d
328.36 -0.41 15 58 00.3 -53 48 30 493 0.3 −721 ± 35
329.48 0.22 16 00 57.1 -52 36 04 599 0.7 −100 ± 25
330.12 -1.08 16 09 49.5 -53 08 33 218 3.0 −931 ± 25
332.14 1.03 16 10 10.8 -50 13 21 151 2.8 −754 ± 22 d
333.72 -0.27 16 22 55.5 -50 03 31 108 2.5 +204 ± 29
335.32 0.60 16 26 00.4 -48 18 36 352 2.9 −138 ± 11
337.06 0.85 16 32 01.7 -46 52 39 100 2.5 −739 ± 32
339.65 -0.24 16 46 44.6 -45 40 00 222 2.2 −398 ± 19
342.16 -0.74 16 57 52.7 -44 02 49 264 2.2 +127 ± 15
342.62 -0.45 16 58 15.4 -43 30 19 50 3.9 −913 ± 30
343.29 0.60 16 56 01.8 -42 19 58 102 4.7 −1035 ± 16
345.22 0.68 17 02 02.9 -40 45 50 39 12.6 +183 ± 21
347.40 -1.04 17 16 09.8 -40 02 24 72 4.0 −524 ± 36
349.65 -0.36 17 19 58.3 -37 48 15 133 3.3 +110 ± 20
350.52 -0.73 17 23 58.9 -37 18 18 127 2.2 +277 ± 32
351.31 -0.53 17 25 22.9 -36 32 16 200 3.3 −247 ± 15
351.82 0.17 17 23 56.8 -35 43 34 118 6.1 +134 ± 16
352.13 1.15 17 20 51.7 -34 54 49 116 6.5 +76 ± 30
355.43 -0.81 17 37 32.3 -33 14 40 76 4.2 +601 ± 21
356.57 0.87 17 33 45.9 -31 22 37 116 1.6 +985 ± 30
aThe identified location is the peak of the gaussian fit to the source in
polarised intensity. All sources were either unresolved or partially resolved.
b I is the peak-pixel Stokes I value of the interferometric data.
c m is the fractional polarisation (linear polarised intensity divided by
Stokes I) averaged over the FWHM pixels used for RM calculation.
dA RM for this source was calculated by Gaensler et al. (2001) using
a simpler approach than the more rigorous method used here. These new
values should replace the previously determined values.
Norma
Scutum-Crux
Sagittarius-
Carina
Perseus
local
molecular
l=253o
l=358o
Fig. 1.— View of the Galaxy from above. The grey scale is the CL02 electron density model, with labels
indicating the spiral arms, and the asterisk indicating the location of the Sun. Quadrants 1, 2, 3, and 4 are
labeled with Q1, Q2, Q3 and Q4, respectively. The bounding longitudes of the SGPS (Phase I) are indicated
by black lines and are labeled. The circles represent the smoothed-averaged SGPS RMs as shown in the
middle panel of Figure 3. Filled (open) circles indicate positive (negative) RM, with the size of the circles
linearly proportional to |RM|, truncated between 59 and 592 rad m−2 (the |RM|max from the middle panel of
Figure 3) so that sources with |RM| < 59 rad m−2 are set to 59 rad m−2. The blue dashed (dotted) lines are
also from the middle panel of Figure 3, and indicate the approximate longitudes of |RM| maxima (minima)
in the SGPS RM data. The solid blue line is the longitude where the RMs transition from primarily positive
to primarily negative (ℓ ∼ 304◦).
      
350 330 310 304 290 270 250
PULSARS
Galactic Longitude
Fig. 2.— Rotation measures for sources in the SGPS. Top panel: extragalactic RM sources; Bottom panel:
pulsar RM sources. Grey filled symbols indicate positive rotation measure, black open symbols indicate
negative rotation measures; sizes of symbols are linearly proportional to the magnitude of RM truncated
between 100-600 rad m−2, so that sources with |RM| < 100 rad m−2 are set to 100 rad m−2, and those with
|RM| > 600 rad m−2 are set to 600 rad m−2. In the top panel, the square at ℓ = 307.1◦, b= 1.2◦ represents
the EGS RM previously observed by Broten et al. (1988), while the squares at ℓ ∼ 330◦ represent the EGS
RMs of the SGPS test region. The circles in the top panel represent the SGPS EGS as given in Table 1.
The vertical line at ℓ = 304◦ is the approximate longitude where the EGS RMs change sign. The dashed
boxes indicate boundaries of the SGPS region.
             
-1200
   SGPS EGS
(raw, binned)
             
-1200
SGPS EGS
(smoothed)
360 350 340 330 320 310  300 290 280 270 260 250
-1200
PULSARS
  (raw) 
Galactic Longitude
Fig. 3.— RM versus Galactic longitude for RM sources in the SGPS region. Top panel: circles represent the
individual SGPS EGS sources (errors are smaller than the symbol size), while open red diamonds represent
data averaged into into independent longitude bins containing 9 sources (the end bin at 255◦ contains 13
sources). Where symbol size permits, the error bars are the standard deviation in longitude and RM for each
bin. Middle panel: purple diamonds represent boxcar-averaged SGPS EGS data over 9 degrees in longitude
with a stepsize of 3 degrees. In contrast to the binned data in the top panel, the error bars are the standard
error of the mean. The solid line marks the approximate longitude of transition from predominantly positive
RMs to negative RMs (l ∼ 304◦) Dotted lines (dashed lines) mark approximate longitudes of minimum
(maximum) |RM| in SGPS data. Bottom panel: squares represent the individual pulsars with known RM in
the SGPS region (errors are smaller than the symbol sizes).
Fig. 4.— A simple model of the magnetic field in the SGPS region, constrained using individual RMs of
149 EGS and 120 pulsars. Colored regions indicate the total strength for the model fit of magnetic field
corresponding to the regions identified by the green lines as discussed in the text. Red (blue) shading indicates
a clockwise (counter-clockwise) field direction as viewed from the Galactic north pole. The direction of the
field within the arms is also indicated by arrows. The open-head arrow on the Norma arm indicates the field
direction in this arm is not well-constrained in this model configuration with the data used. The bounding
longitudes of the SGPS (Phase I) are indicated by black lines and are labeled. The grey scale is the CL02
electron density model with labels indicating the spiral arms.
            
-1200
SGPS EGS
(raw)
Model
            
-1200
SGPS EGS
(smoothed)
Model
360 350 340 330 320 310 300 290 280 270 260 250
-1200
Pulsars (raw)
Model
Galactic Longitude
Fig. 5.— Comparison of modeled RMs (green symbols) and observed RMs. The modeled RMs are calculated
using the CL02 electron density model and the magnetic field model shown in Figure 4. The format of this
figure follows that of Figure 3, where the individual SGPS EGS are shown in the top panel, smoothed-
averaged SGPS EGS data are shown in the middle panel, and the individual pulsars are presented in the
bottom panel.
	Introduction
	Rotation Measure Calculations
	Observations of the Galactic Magnetic Field
	Modeling the Large-Scale Magnetic Field
	Summary
ABSTRACT
  We present new Faraday rotation measures (RMs) for 148 extragalactic radio
sources behind the southern Galactic plane (253o < l < 356o, |b| < 1.5o), and
use these data in combination with published data to probe the large-scale
structure of the Milky Way's magnetic field. We show that the magnitudes of
these RMs oscillate with longitude in a manner that correlates with the
locations of the Galactic spiral arms. The observed pattern in RMs requries the
presence of at least one large-scale magnetic reversal in the fourth Galactic
quadrant, located between the Sagittarius- Carina and Scutum-Crux spiral arms.
To quantitatively compare our measurements to other recent studies, we consider
all available extragalactic and pulsar RMs in the region we have surveyed, and
jointly fit these data to simple models in which the large-scale field follows
the spiral arms. In the best-fitting model, the magnetic field in the fourth
Galactic quadrant is directed clockwise in the Sagittarius-Carina spiral arm
(as viewed from the North Galactic pole), but is oriented counter- clockwise in
the Scutum-Crux arm. This contrasts with recent analyses of pulsar RMs alone,
in which the fourth-quadrant field was presumed to be directed
counter-clockwise in the Sagittarius- Carina arm. Also in contrast to recent
pulsar RM studies, our joint modeling of pulsar and extragalactic RMs
demonstrates that large numbers of large-scale magnetic field reversals are not
required to account for observations.

<|endoftext|><|startoftext|>
Introduction
Let G be an abelian group and let A,S be finite subsets of G with 0 /∈ S. Shepherdson’s
generalization of the Cauchy-Davenport Theorem states that |A ∪ (A + S)| ≥ |A| + |S| if
A ∪ (A+ S) contains no subgroup generated by some element of S.
As an application Shepherdson [14] proved that there are s1, · · · , sk ∈ S such that k ≤
⌉ and
1≤i≤k si = 0, if G is finite. The paper of Shepherdson includes thanks to Heilbronn
for suggesting this application together with a mention that Chowla obtained some related
zero-sum results.
Let D = (V,E) be a loopless finite digraph with minimal outdgree at least 1. It is well
known that D contains a directed cycle. The smallest cardinality of such a cycle is called the
girth of D and will be denoted by g(D). In 1970 Behzad, Chartrand and Wall [1] conjectured
that |V | ≥ r(g(D) − 1) + 1, if d+(x) = d−(x) = r for all x ∈ V . In 1978, Caccetta and
Häggkvist [3] made the stronger conjecture :
|V | ≥ min(d+x : x ∈ V )(g(D) − 1) + 1.
These conjectures are still largely open, even for the special case g(D) = 4. The reader
may find references and results about this question in [2].
These conjectures were proved by the author for vertex-symmetric digraphs [6]. This
result applied to Cayley graphs shows the validity of Shepherdson’s zero-sum result for all
finite groups. Unfortunately we were not aware at that moment of Shepherdson’s result. Our
Université Pierre et Marie Curie, Paris. yha@ccr.jussieu.fr
http://arxiv.org/abs/0704.0459v1
proof [6] is based on the properties of atoms of a finite digraph and Menger’s Theorem. A
description of Cayley graphs on finite Abelian groups such that |V | = r(g(D)− 1) + 1, where
r is the outdegree was obtained by the authors of [9] using Kemperman critical pair Theory
[12]. A new proof of the Caccetta and Häggkvist conjecture for vertex-symmetric digraphs
based on an additive result of Kemperman [11] and the representation of vertex symmetric
digraphs as coset graphs is given in [10].
More recently Seymour proposed the following conjecture [13]:
Let D be a loopless digraph and let r ≥ 1 be an integer. Then there is a vertex a such
|Γ(a) ∪ Γ2(a) ∪ · · · ∪ Γg−2(a)| ≥ r(g − 2),
where g = g(D).
The case g = 4 of this conjecture is mentioned in [2]. Seymour’s Conjecture implies
the conjecture of Behzad, Chartrand and Wall. Seymour’s Conjecture also implies that D
contains a directed cycle C with |C| ≤ ⌈
|V |−1
⌉ + 1. Notice that the Caccetta-Häggkvist
Conjecture states that D contains a directed cycle C with |C| ≤ ⌈
We shall allow infinite relations. The classical strong connectivity of digraphs needs to be
modified in this case in order to have a good lower bound of the size of the image of a set.
Also the presence of loops will simplify the presentation of the connectivity method. Since
this convention is unusual in this part of Graph Theory, we shall work with relations. Our
terminology will be developed in the next section.
Seymour’s conjecture may be formulated as follows :
Conjecture 1 [13] Let Γ = (V,E) be a finite reflexive relation and let j be an integer. Then
there is an x ∈ V such that one of the following conditions holds.
• |Γj(x)| ≥ 1 + j(|Γ(x)| − 1).
• Γ−1(x) ∩ Γj(x) 6= {x}.
Our main result is the following one:
Let Γ = (V,E) be a point-symmetric reflexive relation and let v ∈ V such that |Γ(v)| is
finite. Let j ≥ 1 be such that Γj(v) ∩ Γ−(v) = {v}. Then
|Γj(v)| ≥ |Γj−1(v)|+ |Γ(v)| − 1.
This result implies the validity of the above conjectures for vertex-symmetric graphs.
2 Terminology
Let V be a set. The diagonal of V is by definition ∆V = {(x, x) : x ∈ V }. Let E ⊂ V × V .
The ordered pair Γ = (V,E) will be called a relation . The relation Γ is said to be reflexive
if ∆V ⊂ E.
Let a ∈ V and let A ⊂ V . The image of a is by definition
Γ(a) = {x : (a, x) ∈ E}.
The image of A is by definition
Γ(A) =
Γ(x).
The cardinality of the image of x will be called the degree of x and will be denoted by d(x).
The relation Γ will be called regular with degree r if the elements of V have the same degree
r. We shall say that Γ is locally finite if d(x) is finite for all x. The reverse relation of Γ is
by definition Γ− = (V,E−), where E− = {(x, y)
(y, x) ∈ E}. The restriction of Γ = (V,E)
to a subset W ⊂ V is defined as the relation Γ[W ] = (W,E ∩ (W ×W )).
Let Φ = (W,F ) be a relation. A function f : V −→ W will be called a homomorphism if
for all x, y ∈ V such that (x, y) ∈ E, we have (f(x), f(y)) ∈ F .
The relation Γ will be called point-symmetric if for all x, y ∈ V , there is an automorphism
f such that y = f(x). Clearly a point-symmetric relation is regular.
We identify graphs and their relations. A loopless finite relation will be called a digraph.
The reader may replace everywhere the term ”relation” by ”graph”. In this case we mention
some differences between our terminology (which follows closely the standard notations of
Set Theory) and the notations used in some text books of Graph Theory. We point out that
our graphs are usually called directed graphs without multiple arcs or digraphs. Notice that
the notion Γ(a) used here and in Set Theory is written Γ+(a) in some text books in Graph
Theory. Also our notion of degree is called outdegree. We made the choice of Set Theory
terminology since some parts of this paper could have some interest in Group Theory and
Number Theory.
We shall use the composition Γ1◦· · ·◦Γk of relations Γ1, · · · ,Γk on V . If all these relations
are equal to Γ, we shall write
Γ1 ◦ · · · ◦ Γk = Γ
We shall write Γ0 for the identity relation IV = (V,∆V ). Also we shall write Γ
−j instead
of (Γ−)j .
3 Connectivity
Let Γ = (V,E) be a relation. For X ⊂ V , we shall write
∂Γ(X) = Γ(X) \X.
When the context is clear the reference to Γ will be omitted.
Let Γ = (V,E) be a locally finite reflexive relation. The connectivity of Γ is by definition
κ(Γ) = |V | − 1, if E = V × V. Otherwise
κ(Γ) = min{|∂(X)| : 1 ≤ |X| < ∞ and Γ(X) 6= V }. (1)
A subset X achieving the minimum in (1) is called a fragment of Γ. A fragment with
minimum cardinality is called an atom. The cardinality of an atom of Γ will be denoted by
a(Γ). It is not true that distinct atoms are always disjoint. But the author proved in [4] that,
if V is finite, then distinct atoms of Γ are disjoint, or distinct atoms of Γ− are disjoint. In
[7], it was observed that the same methods imply that distinct atoms of Γ are disjoint if V
is infinite. One may find in [8] unified proofs and some applications to Group Theory and
Additive Number Theory.
As a consequence of this result we could obtain :
Proposition 2 [4, 5, 7, 8] Let Γ = (V,E) be a locally-finite point-symmetric relation with
E 6= V × V . Suppose that V is infinite or that a(Γ) ≤ a(Γ−). Let A be an atom of Γ. Then
the subrelation Γ[A] induced on A is a point-symmetric relation. Moreover |A| ≤ κ(Γ).
4 Iterated image size
Lemma 3 Let Γ = (V,E) be a point-symmetric relation. Then for all i, Γi is point-symmetric.
Proof. Clearly any automorphism of Γ is an automorphism of Γi.
Theorem 4 Let Γ = (V,E) be a point-symmetric reflexive locally finite relation and let v ∈ V.
Let j ≥ 1 be an integer such that Γj(v) ∩ Γ−(v) = {v}. Then
|Γj(v)| ≥ |Γj−1(v)|+ |Γ(v)| − 1.
Proof.
Set V0 =
0≤i Γ
i(v). Clearly Γj(v) ⊂ V0. So we may assume that Γ = Γ[V0] and V = V0.
In the finite case this means that we restrict ourselves to the connected component con-
taining v.
We shall assume j > 1, since the result is obvious for j = 1.
With this hypothesis, clearly we have κ(Γ) ≥ 1.
Clearly
E 6= V × V.
Set κ = κ(Γ). Let A be an atom of Γ containing v. The proof is by induction on |Γ(v)|.
Put r = |Γ(v)|. Assume first
κ = r − 1.
Observe that Γj(v) 6= V . Then by the definition of κ, we have
|Γj(v) \ Γj−1(v)| = |∂(Γj−1(v))| ≥ κ = r − 1.
The result holds in this case. So we may assume
κ ≤ r − 2, (2)
and hence r ≥ 3. Then |A| ≥ 2, since otherwise κ = |∂(A)| = r − 1.
Case 1. V is infinite or a(Γ) ≤ a(Γ−).
By Proposition 2, Γ[A] is point-symmetric (and hence regular) and
|A| ≤ κ. (3)
Put r0 = |Γ(v) ∩A|.
Put X = Γj−1(v) and Y = A ∪X. By the induction hypothesis, we have
|∂(X) ∩A| = |Γj(v) ∩A| − |Γj−1(v) ∩A| ≥ r0 − 1. (4)
Let us prove that
∂(Y ) ⊂ (∂(X) \ A) ∪ (∂(A) \ Γ(v)). (5)
Since j ≥ 2, we have Γ(v) ⊂ X and hence Γ(v) ∩ ∂(X) = ∅. Then (5) clearly holds.
It follows that
∂(Y ) \ ∂(X) ⊂ ∂(A) \ Γ(v),
and hence we have
|∂(Y ) \ ∂(X)| ≤ |∂(A)| − |∂(A) ∩ Γ(v)|
= κ− |Γ(v)| + |A ∩ Γ(v)|
= κ+ r0 − r.
Hence
|∂(Y ) \ ∂(X)| ≤ κ+ r0 − r. (6)
Let us show that Γ(Y ) 6= V. This holds obviously if V is infinite. So we may assume V
finite. In this case we have |Γ−(v)| = |Γ(v)|.
Clearly we have
Γ(Y ) = Γ(X) ∪ (A \ Γ(v)) ∪ (∂(Y ) \ ∂(X)).
It follows using ( 3) and (6) that
|Γ(Y )| ≤ |Γ(X)|+ |A \ Γ(v)|+ |∂(Y ) \ ∂(X)|
≤ |V \ (Γ−(v) \ {v})| + |A| − r0 + κ+ r0 − r
= |V |+ |A|+ κ− 2r + 1
≤ |V |+ 2κ− 2r + 1 ≤ |V | − 3.
By the definition of κ, we have |∂(Y )| ≥ κ.
By (4) and (6),
|∂(X)| = |∂(X) ∩A|+ |∂(Y ) ∩ ∂(X)|
≥ r0 − 1 + |∂(Y )| − |∂(Y ) \ ∂(X)|
≥ r0 − 1 + κ− (κ+ r0 − r) = r − 1,
and the result is proved since
∂(X) = Γj(v) \ Γj−1(v).
Case 2. V is finite and a(Γ) > a(Γ−).
The argument used in Case 1, shows that |Γ−j(v) \ Γ−(j−1)(v)| ≥ r − 1.
By Lemma 3, Γj is point-symmetric. Since V is finite, Γj and its reverse Γ−j have the
same degree. Therefore observing that these relations are reflexive
r − 1 ≤ |Γ−j(v)| − |Γ−(j−1)(v)|
= |Γj(v)| − |Γ(j−1)(v)|.
The next result shows the validity of the conjecture of Seymour mentioned in the intro-
duction in the case of relations with a symmetric group of automorphisms.
Corollary 5 Let Γ = (V,E) be a point-symmetric reflexive relation with degree r and let
v ∈ V . Let j ≥ 1 be an integer such that Γj(v) ∩ Γ−(v) = {v} then
|Γj(v)| ≥ 1 + (r − 1)j.
Proof. The proof follows by induction using Theorem 4
Corollary 6 [6] Let Γ = (V,E) be a point-symmetric digraph with degree r ≥ 1 and put
g = g(Γ). Then |V | ≥ 1 + r(g − 1).
Proof. Set Φ = (V,E ∪ ∆V ). Let v ∈ V . Clearly we have Φ
g−2(v) ∩ Φ−(v) = {v}. By
Corollary 5, |V | − r = |V \ (Φ−(a) \ {a})| ≥ |Φg−2(v)| ≥ 1 + (g − 2)r.
This result, proved in [6], shows the validity of the Caccetta-Häggkvist Conjecture for
point-symmetric graphs. But the proof obtained here is much easier.
Corollary 7 [6] Let G be a group of order n and let S ⊂ G\{1} with cardinality = s. There
are elements s1, s2, · · · , sk ∈ S such that k ≤ ⌈
⌉ and
1≤i≤k
si = 1.
The proof follows by applying Corollary 6 to the Cayley graph defined by S on G. In
particular the theorem of Shepherdson mentioned in the introduction holds for all finite
groups.
References
[1] M. Behzad, G. Chartrand and C.E. Wall, On minimal regular digraphs with given girth,
Fund. Math. 69 (1970), 227-231.
[2] J. A. Bondy, Counting subgraphs: a new approach to the Caccetta-Häggkvist conjecture.
Graphs and combinatorics (Marseille, 1995). Discrete Math. 165/166 (1997), 71-80.
[3] L. Caccetta and R. Häggkvist, On minimal digraphs with given girth, Proceedings of
the Ninth Southeastern Conference on Combinatorics, Graph Theory, and Computing
(Florida Atlantic Univ., Boca Raton, Fla., 1978), (Winnipeg, Man.), Congress. Numer.,
XXI, Utilitas Math. (1978), 181-187.
[4] Y.O. Hamidoune, Sur les atomes d’un graphe orienté, C.R. Acad. Sc. Paris A 284 (1977),
1253-1256.
[5] Y.O. Hamidoune, Quelques problèmes de connexité dans les graphes orientés, J. Comb.
Theory B 30 (1981), 1-10.
[6] Y.O. Hamidoune, An application of connectivity theory in graphs to factorizations of
elements in groups, Europ. J of Combinatorics 2 (1981), 349-355.
[7] Y.O. Hamidoune, Sur les atomes d’un graphe de Cayley infini, Discrete Math., 73 (1989),
297-300.
[8] Y.O. Hamidoune, On small subset product in a group. Structure Theory of set-addition,
Astérisque. no. 258(1999), xiv-xv, 281-308.
[9] Y.O. Hamidoune, A. Lladó and O. Serra, Vosperian and superconnected Abelian Cayley
digraphs, Graphs and Combinatorics 7(1991), 143-152.
[10] Melvyn B. Nathanson, The Caccetta-Häggkvist conjecture and Additive Number Theory,
http://arxiv.org/archive/math: eprint arXiv:math/0603469.
[11] J.H.B. Kempermann, On complexes in a semigroup, Nederl. Akad. Wetensch. Proc. Ser.
A. 59= Indag. Math. 18(1956), 247-254.
[12] J. H. B. Kemperman, On small sumsets in a Abelian group, Acta Math. 103 (1960),
63-88.
[13] P. Seymour, Oral communication.
[14] J. C. Shepherdson, On the addition of elements of a sequence, J. London Math Soc.
22(1947), 85-88.
http://arxiv.org/archive/math:
http://arxiv.org/abs/math/0603469
	Introduction
	Terminology 
	Connectivity
	Iterated image size
ABSTRACT
  Let $\Gamma =(V,E)$ be a point-symmetric reflexive relation and let $v\in V$
such that
  $|\Gamma (v)|$ is finite (and hence $|\Gamma (x)|$ is finite for all $x$, by
the transitive action of the group of automorphisms). Let $j\in \N$ be an
integer such that $\Gamma ^j(v)\cap \Gamma ^{-}(v)=\{v\}$. Our main result
states that
  $$ |\Gamma ^{j} (v)|\ge | \Gamma ^{j-1} (v)| + |\Gamma (v)|-1.$$
  As an application we have $ |\Gamma ^{j} (v)| \ge 1+(|\Gamma (v)|-1)j.$ The
last result confirms a recent conjecture of Seymour in the case of
vertex-symmetric graphs. Also it gives a short proof for the validity of the
Caccetta-H\"aggkvist conjecture for vertex-symmetric graphs and generalizes an
additive result of Shepherdson.

<|endoftext|><|startoftext|>
Draft version October 3, 2018
Preprint typeset using LATEX style emulateapj v. 08/22/09
THE KILODEGREE EXTREMELY LITTLE TELESCOPE (KELT): A SMALL ROBOTIC TELESCOPE FOR
LARGE-AREA SYNOPTIC SURVEYS
Joshua Pepper
, Richard W. Pogge
, D. L. DePoy
, J. L. Marshall
, K. Z. Stanek
, Amelia M. Stutz
, Shawn
Poindexter
, Robert Siverd
, Thomas P. O’Brien
, Mark Trueblood
, & Patricia Trueblood
Draft version October 3, 2018
ABSTRACT
The Kilodegree Extremely Little Telescope (KELT) project is a survey for planetary transits of
bright stars. It consists of a small-aperture, wide-field automated telescope located at Winer Obser-
vatory near Sonoita, Arizona. The telescope surveys a set of 26◦×26◦ fields, together covering about
25% of the Northern sky, targeting stars in the range of 8 < V < 10 mag, searching for transits by
close-in Jupiters. This paper describes the system hardware and software and discusses the quality
of the observations. We show that KELT is able to achieve the necessary photometric precision to
detect planetary transits around solar-type main sequence stars.
Subject headings: Astronomical Instrumentation
1. INTRODUCTION
The scientific value of planetary transits of bright
stars is well known – for a comprehensive review
see Charbonneau et al. (2007). These transits pro-
vide the opportunity to study the internal structure
of planets (Guillot 2005), their atmospheric compo-
sition (Charbonneau et al. 2002), spin-orbit alignment
(Gaudi & Winn 2007), and the presence of rings or
moons (Barnes & Fortnoy 2004). Radial-velocity (RV)
surveys have searched the brightest stars in the sky for
planets and are probing increasingly fainter stars. Even
with significant multiplexing, however, RV surveys are
not able to search large numbers of stars fainter than
V ∼ 8 mag. To find planets around fainter stars, transit
surveys are more suitable, and a number of such surveys
are underway. These surveys typically have wide fields
of view and small apertures to simultaneously monitor
tens or hundreds of thousands of stars. These surveys
have so far discovered six planets transiting bright stars
(Alonso et al. 2004; Bakos et al. 2007; Cameron et al.
2007; McCullough et al. 2006; O’Donovan et al. 2006).
In order to discover more such scientifically valuable sys-
tems, we have begun a survey to discover transiting plan-
The Kilodegree Extremely Little Telescope (KELT)
is designed to meet the objectives described in
Pepper, Gould, & DePoy (2003) for a wide field, small-
aperture survey for planetary transits of bright stars.
That paper derives a model for the ability of a given
transit survey to detect close-in giant planets (i.e. “Hot
Jupiters”), and determines an optimal survey configura-
tion for targeting 8 < V < 10 mag main-sequence stars.
Based on the model of Pepper, Gould, & DePoy (2003),
we expect to detect roughly four transiting planets with
the KELT survey.
The KELT system has two different observing config-
Electronic address: pepper@astronomy.ohio-state.edu
1 The Ohio State University Department of Astronomy, 4055
McPherson Lab, 140 West 18th Avenue, Columbus, OH 43210-
2 Department of Astronomy, University of Arizona, 933 N.
Cherry Avenue, Tucson, AZ 85721-0065
3 Winer Observatory, P. O. Box 797, Sonoita, AZ 85637-0797
urations. The primary configuration is a “Survey Mode”
designed for wide area coverage, either in large strips or
an all-sky survey, with the goal of covering broad sec-
tions of the sky with a large field of view, at a cadence
of a few minutes on a nightly basis throughout most of
the observing year. This mode implements the primary
scientific driver of KELT and gives this telescope a wider
field of view and targets a brighter magnitude range than
other transit surveys of its kind. The second configura-
tion is a “Campaign Mode” that uses a smaller field of
view and is designed to conduct short duration inten-
sive observing campaigns on specific fields. In Campaign
Mode we have undertaken a 74-day campaign towards
the Praesepe open cluster.
In this paper, we describe the instrumentation, deploy-
ment, and operations of KELT (§2); we characterize the
performance of the different components and the overall
system in the field (§3); we show how precise our pho-
tometry is (§4); and we provide examples of lightcurves
for variable stars and transit-like events (§5).
2. KELT SYSTEM OVERVIEW
2.1. Instrument
KELT consists of an optical assembly (CCD detector,
medium-format camera lens, and filter) mounted on a
robotic telescope mount. A dedicated computer is used
to control the telescope, camera, observation scheduling,
and data archiving system tasks. One goal in assembling
KELT was to use as many off-the-shelf components and
software packages as possible to speed the development.
The KELT detector is an Apogee Instruments4 AP16E
thermoelectrically cooled CCD camera. This camera
uses the Kodak KAF-16801E front-side illuminated CCD
with 4096 × 4096 9µm pixels (36.88 × 36.88mm detec-
tor area) and has peak quantum efficiency of ∼65% at
600nm. The AP16E uses a PCI card and cable to control
the camera and thermoelectric cooler (TEC). According
to the camera specifications and confirmed by labora-
tory testing, the camera is operated at a conversion gain
of 3.6 electrons/ADU and delivers a measured readout
4 http://www.ccd.com
http://arxiv.org/abs/0704.0460v2
mailto:pepper@astronomy.ohio-state.edu
noise of ∼15 e−. The device is read out at 14-bit reso-
lution at 1.3MHz, which gives a full-frame readout time
of ∼30 seconds. The CCD specifications claim a full-well
depth of ∼100,000e−, but the 14-bit ADC saturates at
16383 ADU (∼59,000 e−). The TEC can cool the device
to ∼30◦C below the ambient air temperature. Nominal
dark current is 0.1 − 0.2 e− pixel−1 sec−1 at an operat-
ing temperature of −10◦C (typical for 20◦C ambient air
temperature).
We use two different lenses with KELT. For the wide-
angle survey mode, we use a Mamiya 645 80mm f/1.9
medium-format manual-focus lens with a 42mm aper-
ture. This lens provides a roughly 23′′ pix−1 image scale
and a 26◦×26◦ field of view. To provide a narrow-angle
campaign mode, we use a Mamiya 645 200mm f/2.8
APO manual-focus telephoto lens with a 71mm aper-
ture. This provides a roughly 9.′′5 pix−1 image scale and
effective 10.◦8×10.◦8 field of view. Both lenses have some
vignetting at the corners, and the image quality declines
toward the outer part of the field, so the effective field of
view is circular (see §2.5 and §3.3 for details).
To reject the mostly-blue background sky without
greatly diminishing the sensitivity to stars (which are
mostly redder than the night sky), we use a Kodak
Wratten #8 red-pass filter with a 50% transmission
point at ∼490nm (the filter looks yellow to the eye),
mounted in front of the KELT lens. The calculated re-
sponse function of the KELT CCD and filter is shown
in Figure 1. The transmission function for the Wrat-
ten #8 filter is taken from the Kodak Photographic
Filters Handbook (Kodak 1998), and the quantum effi-
ciency curve for the Kodak KAF-16801E CCD was pro-
vided by the Eastman Kodak Company. The effect of
atmospheric transmission on this bandpass is estimated
for 1.2 airmasses at the altitude of Winer Observatory
(1515m) using the Palomar monochromatic extinction
coefficients (Hayes & Latham 1975), which are for an al-
titude of 1700m. We did not estimate in detail the atmo-
spheric water vapor or O2 extinction terms, as these are
not important for our application. The effective wave-
length of the combined Filter+CCD response function
(excluding atmospheric effects) is 691nm, with an effec-
tive width of 318nm, computed following the definition
of Schneider, Gunn, & Hoessel (1983). This results in
an effective bandpass that is equivalent to a very broad
R-band filter.
The optical assembly (camera+lens+filter) is mounted
on a Paramount ME Robotic Telescope Mount manufac-
tured by Software Bisque5. The Paramount is a research-
grade German Equatorial Mount designed specifically for
robotic operation with integrated telescope and cam-
era control. According to manufacturer’s ratings the
periodic tracking error of the mount before correction
is ±5′′. That is smaller than the large pixels of the
KELT camera and therefore does not affect our obser-
vations. The mount can carry an instrument payload
of up to 75 kg, more than sufficient for the KELT cam-
era, which weighs approximately 9 kg. The Paramount
is installed on a stock 91 cm high steel pier using a stock
base adapter plate. The Paramount provides us with
a robust, complete mounting solution for our telescope.
The optical assembly is mated to the Paramount using
5 http://www.bisque.com
a custom mounting bracket that mounts directly on the
Paramount’s Versa-Plate mounting surface.
The CCD camera and mount are controlled by a PC
computer located at the observing site that runs the Win-
dows XP Professional operating system and the Bisque
Observatory Software Suite from Software Bisque. There
are three main applications that we use for KELT:
TheSky to operate the Paramount ME (pointing and
tracking); CCDSoft CCD camera control software to op-
erate the AP16E camera; and Orchestrate scripting and
automation software to integrate the operation of TheSky
and CCDSoft. Orchestrate provides a simple script-
ing interface that lets us control all aspects of a night’s
observing with a single command script. This software
lets us prepare an entire night’s observing schedule and
upload it to the KELT control computer during the af-
ternoon. A scheduled task on the computer starts up
the system at sunset and loads the observing script into
Orchestrate, after which the system runs unattended
for the entire night, weather permitting.
The observing site provides AC power and Internet
connectivity. To ensure clean AC power for the KELT
telescope and control computer at the observing site
(§2.2) we use a 1500VA Powerware 9125 Uninterrupt-
able Power Supply (UPS). This filters the line power
and protects the system against surges or brief power
outages. The control computer is connected to the
Internet through the observing site’s gateway, and its
internal clock is synchronized with the network time
servers at the Kitt Peak National Observatory using the
Dimension 4 network time protocol (NTP) client6. This
ensures sufficiently accurate timing for telescope point-
ing and time-series photometry. Since we observe at few-
minute cadences, we can tolerate few-second timing pre-
cision, which is easily within the typical performance of
Dimension 4 on the available T1 connection.
2.2. Observatory Site
The KELT telescope has been installed at the Irvin
M. Winer Memorial Mobile Observatory7 near Sonoita,
Arizona. Located at N 31◦39′53′′, W 110◦36′03′′, ap-
proximately 50 miles southeast of Tucson at an eleva-
tion of 4970 feet (1515meters), Winer Observatory has a
dedicated observing building with a 25×50-foot (7.6×15-
meter) roll-off enclosure and provides site and mainte-
nance services. Winer currently hosts four robotic tele-
scopes, including KELT. The KELT telescope pier is
bolted to the concrete floor, and its control cables are
run into a nearby telescope control room that houses the
control computer and UPS. Winer also provides Internet
access (currently via a dedicated DS/T1 line, but early
in the project the site used a slower ISDN link), that al-
lows us to remotely login to the control computer via a
secure gateway.
The weather conditions at Winer Observatory are
roughly as good as comparable sites in southern Arizona;
about 60% of all observing time is usable, with half of
that time being measurably photometric. Since our PSFs
are between 2 and 3 pixels, and the pixel scales are 9.′′5
and 23′′ (for the 200mm and 80mm lenses, respectively),
6 v5.0 from Thinking Man Software
(www.thinkman.com/dimension4/)
7 http://www.winer.org
atmospheric seeing variations (which are on the scale of
arcseconds) are not a factor in our observations.
2.3. Observing Operations
Observations with KELT are carried out each
non-clouded night using command scripts for pre-
programmed, robotic operation; we do not undertake
any remote real-time operations. The nightly observ-
ing scripts are created at Ohio State University (OSU)
using a script-generation program written in Perl, and
then uploaded to Winer Observatory where they are used
by Orchestrate to direct the telescope to observe the
specified fields for each night. Each night has a differ-
ent script, and we generally upload scripts in 3-4 week
batches during the main survey season, or more fre-
quently during pointed-target campaigns.
The suite of software programs we use to control the
telescope mount and camera works well, but has several
limitations. Most importantly, the Orchestrate script-
ing package does not provide the built-in ability to pro-
gram control loops or conditional branching, which is
why we use a Perl program to create the Orchestrate
scripts we upload to the control computer.
The scripts start the telescope each night based on the
local clock time. On nights when the weather is judged
to be good enough for observing, the observatory control
computer automatically opens the roof of the observatory
at nautical (12◦) twilight and the telescope begins obser-
vations on schedule. If the weather is not suitable for
observing, on-site personnel abort the command script.
If the weather appears good at first but degrades dur-
ing the night, the observatory computer closes the roof
and the personnel abort the script. All data acquired are
archived automatically at the end of the night.
When the script is loaded into Orchestrate, it first
waits until one hour before astronomical (18◦) twilight.
At that point the telescope takes five dark images and five
bias images, with the exposure times for the dark frames
the same as the exposure times of the observations for
the night. Once these calibration data are taken, the
telescope goes back to sleep until astronomical twilight,
at which point it slews to the first target field and begins
the nightly observing sequence. Unless the weather turns
bad, prompting the roof to close, observations continue
until astronomical dawn. At this time the telescope is
slewed to its stowed position, and five dark images and
five bias images are acquired, ending observing for the
night. (See §2.5 for a discussion of KELT flatfields.)
We have so far used KELT in two distinct operating
modes: campaign mode, using the 200mm lens, and sur-
vey mode, using the 80mm lens. In campaign mode, the
telescope intensively observes a single target field for an
entire night. In survey mode, the telescope observes a
number of fields that are equally spaced around the sky
at 2h intervals of Right Ascension centered on Declina-
tion +31◦39′ (the latitude of Winer).
In campaign mode, the observing script instructs the
telescope to wait until the target field is above 2 air-
masses, and to then observe the field continuously until it
sinks below 2 airmasses or astronomical dawn, whichever
happens first. In survey mode, the telescope begins ob-
serving after astronomical twilight, tiling between two
fields at a time. New fields are observed as they rise
above 1.4 airmasses. Provisions for Moon avoidance are
built into the scripts to prevent observations of any sur-
vey field when it is within 45◦ (two field widths from field
center) of the Moon.
An operational complication arises because the KELT
mount is a German Equatorial design. This means that
fields observed East of the Meridian are rotated by 180◦
relative to fields observed West of the Meridian. The
practical effect is that we must separately reduce pho-
tometry for fields taken in East and West orientations,
especially when we use difference-imaging photometry.
To avoid the complication of creating two separate data
pipelines to reduce data taken in survey mode, we in-
stead observe only fields in the Eastern part of the sky.
We lose some observation time due to periods when the
Moon is within 45◦ of all fields in the East above 1.4
airmasses, leading to downtime when no field is available
that meets the observing criteria. In those situations, the
scripts instruct the telescope to pause observing until the
next field becomes available. Overall, the loss to Moon
avoidance reduces the total amount of data acquired by
∼10%.
2.4. Data Handling and Archiving
Data acquired by the camera are immediately writ-
ten to a hard drive in the control computer, logged, and
then copied to one of two 250GB external hard drives
attached to the control computer by a USB interface. At
the end of the night, all new images are automatically du-
plicated onto the other external hard drive. During long
winter nights, the telescope can take as many as 500-600
images per night, depending on the exposure time, fill-
ing the drives every two weeks. Normally, however, bad
weather and downtime due to Moon avoidance reduce
the actual observing rate, so it typically takes 3-4 weeks
for both storage drives to reach capacity.
Data quality is monitored daily using automated
scripts running on the Windows observing computer at
Winer. At the end of the night, a Perl script selects three
images from the beginning, middle, and end of the night
and uploads them to a computer at OSU. These sample
images are analyzed for basic statistics: mean, median,
and modal sky value and the mean FWHM of stars mea-
sured across the images, and then visually inspected to
ensure that the camera and mount are operating cor-
rectly.
When the external drives approach full capacity, they
are disconnected from the computer. One of the drives
is a hardened drive made by Olixir Technologies (their
Mobile DataVault) that serves as the transport drive.
This drive is shipped via FedEx to the OSU Astronomy
Department in Columbus, Ohio, in a cushioned trans-
port box, as bandwidth limitations at Winer Observa-
tory preclude online data transfer. The second drive is a
conventional external USB drive without special mobile
packaging made by Maxtor Corporation which serves as
the backup drive and which is stored at Winer Obser-
vatory. Until the transport drive arrives at OSU and
its data have been successfully copied and verified, the
backup drive at Winer is stored and left idle. In the
event a transport drive arrives damaged, data from the
backup drive will be copied to another transport drive
and shipped. In the meantime, the removed drives are
replaced with two others of each type and operations
continue. At any given time, we have four external disk
drives in use: two drives in operation, one in transit,
and one stored as a backup. For transferring hundreds
of gigabytes of images every few weeks during the prime
observing season, this procedure has proven to be very
reliable and efficient. To date, out of dozens of drive ship-
ments, we have lost only two drives to damage in transit,
with no loss of data (both arrived damaged at Winer af-
ter their data were retrieved and copied at OSU).
When a transport drive arrives at OSU, the drive is
connected to our main data storage computer and all of
the images are copied and run though a series of data
quality checks. The image files are renamed, replacing
the cumbersome default file name created by the CCDSoft
application with a name indicating the field observed,
the UTC date, and an image number. The images are
then analyzed to measure image quality (modal sky and
mean FWHM). If the modal sky value is above 800 ADU
(due to moonlight, cloud cover, or other ambient light
sources), the image is discarded. The cutoff at 800 ADU
was determined using two months of representative data
showing that images compromised by excessive light con-
tamination consistently had sky values above that level
and were unsuitable for photometry. The cutoff is high,
with many poor images well below the cutoff, but we
choose to be conservative about eliminating images early
in the reduction process. The images that pass the initial
filter on sky values, and the trimmed sections of the bad
images, are stored on a multi-terabyte RAID storage ar-
ray at OSU, providing data redundancy and fast access
for data reduction and analysis.
2.5. Data Reduction
Here we describe the data reduction process in brief,
for the purposes of evaluating the performance of the
KELT camera. Detailed descriptions of the reduction
process will be included in an upcoming paper on the
scientific results of KELT observing campaigns (J. Pep-
per, in preparation).
The data reduction pipeline consists of three steps.
First, we process the images by subtracting dark frames
and dividing by a flatfield. Second, we identify all the
stars in the field and determine their instrumental mag-
nitudes. Third, we obtain the photometry on all images
using difference image analysis.
Dark images are created for each night by median-
combining 10 dark images – five from the beginning and
end of each night. In early testing we determined that we
can treat our dark frames as combined dark+bias. We
take bias frames separately to monitor their stability, but
do not incorporate them into the reduction process be-
cause the bias has been extremely stable. In cases where
dark frames were not taken or there were problems with
the images, we use good dark frames from nights brack-
eting the observations to create a substitute dark frame.
We confirmed that using dark images from nearby nights
did not significantly affect the statistics of the subtracted
images – our dark images are quite stable from night to
night.
The KELT system is challenging to accurately flatfield.
For the 200mm lens there is a combined decrease in flux
of ∼18% between the center of the image and the edges,
and a decrease of up to ∼26% between the center and
the corners. For the 80mm lens, the decrease is ∼23%
from the center to the edge, and ∼35% from the center
to the corners. Because of the large KELT field of view
(10.◦8 and 26◦ for the 200mm and 80mm lenses, respec-
tively), twilight flats are not useful for flatfielding since
the twilight sky is not uniform on those scales. Dome
flats using a diffuse screen produced reasonable results
with high signal-to-noise ratio. The flats are sufficiently
repeatable that we do not need to regularly take dome
flats. For relative photometry, the dome flats work ade-
quately, and we are able to absolutely flatfield our images
to ∼ 5% accuracy.
Once images have been dark-subtracted and flatfielded,
we can then create catalogs of images and measure the
brightnesses of stars on the images. This photometric
analysis is done in two basic steps. The first is to cre-
ate a high-quality reference image for a field by com-
bining a few dozen of the best images and then use the
DAOPHOT software package (Stetson 1987) to identify
all of the stars in the field down to a faint magnitude
limit and measure their approximate instrumental mag-
nitudes. See §4.1 for the details on how the instrumental
magnitudes are calibrated to standard photometry.
Once a template and DAOPHOT star catalog with
baseline instrumental magnitudes have been created, the
second step is to process the images with the ISIS image
subtraction package (Alard & Lupton 1998; Alard 2000).
Our reduction process is similar to that of Hartman et al.
(2004). The ISIS package first spatially registers all of the
images to align them with the reference image. The refer-
ence image is convolved with a kernel for each image and
subtracted, creating a difference image. The flux for each
star identified on the reference image is then measured
on each subtracted image using PSF-fitting photometry.
Image subtraction has been shown in limited tests to be
equal to or better than other photometric methods for
the purposes of transit searches (Bakos 2006). In section
§4 below, we provide additional information about the
reduction process to obtain relative photometry.
3. INSTRUMENT PERFORMANCE
In this section we quantify the performance of the
KELT system by assessing in turn the telescope mount,
the astrometric quality (geometric image quality), the
image quality (position-dependent PSF), and photomet-
ric sensitivity.
3.1. Telescope Performance
Since the telescope was installed at Winer in October
2004, the hardware has performed up to specifications.
There have been no significant problems with the mount
or the control software. The pointing has not been per-
fect: our fields are so large that minor pointing errors do
not significantly affect our scientific results, but during
normal operations the typical intra-night drift is ∼25′
in Declination and ∼9′ in Right Ascension. We believe
the drift is due to a slight non-perpendicularity between
the orientation of the camera and the axis of the mount.
While the magnitude of the drift seems large, it repre-
sents a movement of ∼65 pixels, less than 2% of the size
of the field. It does not cause stars to move across large
portions of the detector, and therefore does not lead to
significant changes in the PSFs of individual stars. Since
our reduction method utilizes image subtraction, we do
lose the ability to take good photometry at the edges of
a field. However, because of PSF distortions and other
effects (see §3.3), we already have degraded sensitivity in
those regions. Therefore the loss of coverage and sensi-
tivity due to drift is quite small. Future alignment of the
telescope will attempt to reduce or eliminate the drift.
3.2. Astrometric Performance
Measurements of the positions of stars from the Tycho-
2 Catalog (Høg et al. 2000) are used to determine the
conversion between pixel coordinates and celestial coor-
dinates on the KELT images. We use the Astrometrix8
package to compute polynomial astrometric solutions
for our images following the procedure described by
Calabretta & Greisen (2002). To avoid stars that are sat-
urated on the KELT images, we consider Tycho-2 stars
with magnitudes 9.0 ≤ VTyc ≤ 10.0. From these we select
up to 1000 stars per image. A first attempt to compute a
global astrometric solution for the entire 4096×4096 im-
age produced large residuals for most of the outer parts
of the detector, with discrepancies between the predicted
and actual positions of Tycho-2 stars of many tens of pix-
els. Since our primary goal is to convert pixel coordinates
into celestial coordinates on a star-by-star basis, a global
solution is not required. We instead divide the image
into 25 subimages on a 5×5 grid and perform a separate
astrometric solution for each subimage. A third-order
polynomial astrometric fit is computed for each subim-
age using Astrometrix. The individual subimage fits
give much better results, with offsets between predicted
and measured positions of catalog stars at the subpixel
level except at the extreme corners of the field. The
subframes overlap by a few tens of pixels, and fits to
stars common to adjacent subimages are consistent at
the arcsecond level. For the 200mm lens, the typical
RMS residuals are ±0.′′8, or < 10% of the average pixel
size of ∼9.′′5. The 80mm lens has slightly worse RMS
residuals, ±5′′or ∼20% of a pixel size of 22′′, but still
well within tolerances for our purposes.
Having a good astrometric fit to the images permits
a quantitative assessment of the geometric performance
of the KELT optics; specifically, variation in pixel size
and shape across the field. Since this is a commercial
lens with proprietary optical designs, there is no way to
determine these propoerties a priori. This analysis will
therefore be useful for anyone contemplating using ssim-
ilar systems, and has implicatons for the potential use
of theis setup - for example, such a camera/lens com-
bination cannot be used to construct large-scale image
mosaics. Furthermore, while sky subtraction deals with
this effect, flatfielding does not.
For the 200mm lens, the effective pixel size decreases
by about 1% from center-to-edge from 9.′′537 near the
center to 9.′′450 at the edges of the field (∼9.′′40 at the
corners). Contours of constant effective pixel scale are
circular and centered on the intersection of the optical
axis of the 200mm camera lens and the CCD detector,
as shown in Figure 2. The square CCD pixels do not
perfectly project onto squares on the sky, but slowly dis-
tort systematically away from the center, showing the
characteristic signature of ∼0.5% pincushion distortion,
expected for the manufacturer’s typical claims for their
telephoto lenses. As Figure 3 shows, for the 80mm lens
the effective pixel size decreases by about 3.5% from
8 http://www.na.astro.it/$\sim$radovich/wifix.htm
center-to-edge, from 23.′′19 near the center to 22.′′40 at
the edges of the field (and ∼21.′′8 at the corners). This
effect is larger than the one seen with the 200mm lens,
consistent with ∼2% pincushion distortion in this lens,
typical of short focal-length wider-angle lenses. Contours
of constant effective pixel scale are also circular and cen-
tered on the CCD detector.
The effect of the optical distortion is that pixels project
onto smaller effective areas on the sky moving radially
outward from the center of the CCD, making the sky
appear non-flat (center-to-edge) at the ∼1.5% level for
the 200mm lens and at the ∼6% level for the 80mm lens.
There are two effects that act together to decrease the
background sky level per pixel as you go radially outward
from the center of the image: the decreasing pixel scale,
and hence decreasing pixel area on the sky, with radius,
and increasing vignetting with radius.
3.3. Image Quality
As expected for such an optically fast system, the
image PSF varies systematically as a function of posi-
tion. The small physical pixel size (9µm) implies that
the lens optics dominate the PSF, and we are insensitive
to changes in atmospheric seeing. For both lenses, the
systematic patterns in both the image full-width at half
maximum (FWHM) and more refined measures of image
quality (i.e. the 80% encircled energy diameter D80) can
be used to quantitatively assess the position-dependent
image quality.
For the 200mm lens, typical image PSFs have FWHM
of ∼1.8–2.9pixels, and 80% encircled energy diameters
of D80 =4.7-9pixels. The 80mm lens has a similar range
of FWHM for stellar image PSFs, and the 80% encircled
energy diameters range from D80 =6-10pixels. There
are significant changes in the detailed PSF shape across
each image from the center to the extreme edges of the
detector. Figure 4 shows representative stellar PSFs for
a 5×5 grid across the CCD for the 200mm lens. The
80mm lens shows more pronounced distortions, as shown
in Figure 5. Therefore, the 80mm lens has a roughly
24◦ diameter effective field of view with reasonably good
images and little vignetting, whereas the 200mm lens
works well over most of the CCD detector except at the
extreme corners.
Figures 6 and 7 show maps of the image FWHM as a
function of position for the 200mm and 80mm lenses,
respectively, derived from measurements of unsaturated,
bright field stars in representative images. The most ob-
vious feature in both is the strong vertical trend in in-
creasing FWHM, with nearly no differences horizontally.
Because this is seen with both lenses, which are of very
different design, we believe this is because the CCD is
tilted relative to the optical axis. Because the lens de-
signs are proprietary, we do not know precisely how much
the detector is tilted, nor the origin of the tilt at present.
This effect could be due to how the detector is mounted
inside the camera, or to the camera/lens mounting plate.
This apparent field tilt also affects the maps of the 80%
encircled-energy diameter D80, shown in Figures 8 and
The 80mm lens has very stable imaging performance
over time. The FWHM and D80 maps derived for im-
ages of the same field over a 11-month period show no
significant changes. We have, however, periodically ad-
http://www.na.astro.it/$\sim $radovich/wifix.htm
justed the focus when working with the telescope, which
my cause some discontinuities in the data for the survey
images. We will explore such effects in upcoming papers.
Unfortunately, the PSF is not stable across the image
over time for the 200mm lens. While intranight vari-
ations in the FWHM maps are quite small, there are
significant changes from night to night that have no ap-
parent correlation with hour angle, CCD temperature, or
any other physical or environmental parameter for which
we have measurements. The effect of the changes we see
is for the region of best FWHM (the base of the trough
seen in the FWHM map in Figure 6) to move vertically
on the CCD by many hundreds of pixels. We do not
yet know the cause. The main effect is to complicate
the difference-imaging reductions of the data. We will
discuss these complications and their mitigation in the
subsequent paper describing our results for the 200mm
lens Praesepe cluster observing campaign.
3.4. Photometric Sensitivity
Given the nature of the KELT bandpass (see Figure 1),
we calibrate our instrumental magnitudes to the R band.
We do so by rescaling our instrumental magnitudes by a
constant, such that
RK ≡ −2.5 log(ADU/sec) +RK,0 (1)
where the instrumental ADU/sec is measured using aper-
ture photometry with IRAF, RK is defined as an approx-
imate KELT R magnitude, and RK,0 is the zero-point.
We find that the RK magnitudes are within a few tenths
of a magnitude of standard R band photometry, with the
uncertainty dominated by the color term. Since we do
not have V −I colors for all our stars, we quote observed
magnitudes in RK , which can be considered equivalent
to Johnson R, modulo a color term defined by
V = RK + CV I(V − I) (2)
where CV I is the (V − I) color coefficient, and V/I are
in the Johnson/Cousins system. Since we do not have
previously measured R magnitudes of large numbers of
stars in our fields in our magnitude range, we relate RK
to known magnitudes by matching stars from our ob-
servations to the Hipparcos catalog, selecting only stars
with measured V and I colors in Hipparcos. We take the
mean instrumental magnitude from a set of high-quality
images, and match the known magnitudes to the mean
instrumental magnitudes, using Equations 1 and 2.
For the cluster observations with the 200mm lens, we
select 22 calibration stars, and measure their instrumen-
tal magnitudes on 76 high-quality images, resulting in a
magnitude zero-point of RK,0 = 16.38 ± 0.06 mag, and
CV I = 0.55± 0.2. For the survey observations with the
80mm lens, we use 59 stars on 77 images, resulting in a
magnitude zero-point of RK,0 = 15.15 ± 0.07 mag, and
CV I = 0.5 ± 0.3. Since the Hipparcos stars we use to
calibrate our data have (V − I) colors mostly between 0
and 1, we expect our calibrations to be less accurate for
redder stars. These measured zero-point uncertainties
suggest that the flatfield corrections are good to within
∼ 5% in absolute accuracy.
Tying together the full calibration process, we find that
a fiducial R = 10 mag star at the field center with (V −
I) = 0 has a flux of 356 counts per second with the
200mm lens, and 115 counts per second with the 80mm
lens. Scaling those numbers by the different aperture
sizes of the lenses (71mm aperture for the 200mm lens
and 42mm aperture for the 80mm lens), we find that the
200mm lens is about 8% more efficient than the 80mm
lens.
4. RELATIVE PHOTOMETRY
KELT was designed primarily for precision time-series
relative photometry (see Everett & Howell (2001) for
background). The crucial test for our instrument is the
ability to obtain long-term lightcurves with low noise and
minimal systematics. A simple test of KELT’s photo-
metric performance is to examine the root-mean-squared
(RMS) of the magnitudes of an ensemble of lightcurves as
a function of magnitude. We apply the ISIS image sub-
traction package to samples of our data to obtain relative
photometry, and measure the statistics of the resulting
lightcurves. Our criteria for the ability to detect plan-
etary transits is the presence of substantial numbers of
stars for which the RMS of the lightcurves are below the
2% and 1% levels.
4.1. Difference Imaging Performance
The instrumental magnitudes for the KELT lightcurves
are produced through a combination of ISIS and
DAOPHOT photometry. This process involves some
careful conversion between DAOPHOT and ISIS flux
measurements – see Appendix B of Hartman et al. (2004)
for the details of the conversion. First, we create a ref-
erence image by combining a number of high-quality im-
ages. DAOPHOT measures the instrumental magnitude
of the stars on the reference image mi(ref) based on PSF
fitting photometry, with the magnitude of each star i cal-
culated from the flux by mi(ref) ≡ 25− 2.5 log[fi(ref)] +
Cap, where fi is the flux measured by DAOPHOT and
Cap is an aperture correction to ensure that mi(ref) =
25 − 2.5 log(ci), where ci is the counts per second from
the star in ADU. ISIS then creates an ensemble of sub-
tracted images for the whole data set using the refer-
ence. To derive the full light curve, ISIS fits a PSF
for each star on each subtracted image j, to obtain a
flux fi(j). The DAOPHOT-reported instrumental mag-
nitudes for the reference images serve as the magnitude
baseline for the conversion of ISIS fluxes to magnitudes,
where the magnitude of the ith star on the jth image is
mi(j) = mi(ref)− 2.5 log[1− fi(j)/fi(ref)].
We then calculate the RMS variation of all the detected
stars in both the Praesepe data set and for a sample of
the survey data. Because of the night-to-night varia-
tions in the position of the best image quality on the
detector with the 200mm lens described at the end of
§3.3, we calculate the RMS for the Praesepe data from
a single night of observations, to better demonstrate the
intrinsic instrumental performance. In Figure 10 we plot
the distribution of RMS versus RK magnitude for 67,674
stars on 32 images with 60-second exposures from one
night. With this lens and exposure time, we obtain pho-
tometry of stars in the magnitude range RK = 8 − 16
mag. For stars brighter than about RK = 9.5mag,
systematics begin to dominate the light curves, mostly
due to saturation. Out of the 67,674 stars, 4,281 have
RMS< 0.02mag, and 1,369 have RMS< 0.01mag.
We perform the same analysis for one of the regu-
lar survey fields observed with the 80mm lens using
239 observations over 8 nights with 150-second expo-
sures. We obtain photometry on 49,376 stars in the range
RK = 6 − 14 mag, and plot the data in Figure 11. We
find 14,333 stars with RMS< 0.02mag, and 3,822 stars
with RMS< 0.01mag, with systematics dominating for
stars brighter than RK = 7.5mag.
Overall, the RMS performance is mostly as expected
from Poisson statistics except at the bright end. The best
precision just reaches the theoretical noise limit, with a
spread of RMS values above that limit due to real-world
effects. For the brightest stars in our data we see a floor
in which the RMS no longer decreases as the stars get
brighter, and instead becomes roughly constant at RMS
= 0.004 magnitudes. The RMS floor is indicative of a
fixed pattern noise component, caused by intrapixel sen-
sitivity on the CCD. The Kodak KAF-E series CCDs
are 2-phase front-side illuminated devices in which the
second poly-gate electrode on each pixel is a transparent
gate made of Indium-Tin-Oxide (ITO) to boost the over-
all quantum efficiency of the device (Meisenzahl et al.
2000). Over much of the wavelength regime of interest for
KELT, the ITO material is ∼2 times more transparent
than the regular silicon oxide material used on the first
poly-gate. The result is significant pixel substructure
in which the quantum efficiency varies stepwise across
each 9µm pixel, which introduces a component of fixed-
pattern noise that produces the observed RMS floor. We
note that more recent models of commercial CCD cam-
eras with the Kodak KAF-16801 detector are using a
newer version of this device that incorporate a front-
surface microlens array that mitigates the intrapixel step
in transmission, but persons contemplating similar sys-
tems to our own should be aware of the issue and take it
into account.
We do not expect to obtain high-precision photometry
for the very brightest stars in our data, but the RMS floor
is well below the 1% level and it should not significantly
affect our ability to detect transits. We plot noise models
in Figures 10 and 11, which include photon noise and
sky noise, along with the RMS floor. In the future we
will choose a lens focus that makes slightly larger FWHM
images to minimize these effects.
5. REPRESENTATIVE RESULTS
The RMS plots shown in Figures 10 and 11 demon-
strate that our telescope can obtain precision relative
photometry, with large numbers of stars measured at the
1% level. However, simply looking at the RMS informa-
tion does not prove that the data set can yield the con-
sistent quality with low systematics necessary for a tran-
sit search. To illustrate that we can fulfill that require-
ment, we show in Figure 12 the lightcurves of three sam-
ple variable stars we have discovered using the 200mm
lens while observing the field of Praesepe. Even with the
night-to-night variations discussed in §3.3, we are able
to achieve the consistent data quality that transit detec-
tion requires, and are able to clearly see features in the
phased lightcurves of a few percent or less.
An even better example of our ability to detect tran-
sits can be seen in Figure 13. Here we show two objects
we detected in our data from the Praesepe field which
exhibit transit-like dips in their lightcurves. These ef-
fects can be clearly seen at the level of a few percent.
These objects are not planetary: follow-up spectroscopy
indicates that the top object is an F star with a tran-
siting M dwarf companion, and the bottom object is a
grazing eclipsing binary (Latham 2007). However, they
demonstrate that we can confidently detect transit-like
behavior at the 1%-2% level with our telescope. A full
catalog of the variable stars and transit candidates from
the KELT observations of the Praesepe field will appear
in a forthcoming paper (J. Pepper, in preparation).
6. SUMMARY AND DISCUSSION
The KELT project has been acquiring data since Oc-
tober 2004. We observed the field of the Praesepe open
cluster with the 200mm lens for two months in early
2005, and have spent the rest of the time using the 80mm
lens for a survey of 13 fields around the Northern sky.
The KELT telescope, used in “Survey Mode” with the
80mm lens, reflects the design specifications called for
in the theoretical paper Pepper, Gould, & DePoy (2003).
The performance of the telescope as described in this pa-
per provides a real-world evaluation of the potential for
this telescope to detect transiting planets. In addition
to the lightcurves and RMS plots that show the tele-
scope’s abilities, we find that the total number of pho-
tons acquired from a fiducial V = 10 mag star over an
entire observing run is well above the number assumed
in Pepper, Gould, & DePoy (2003) with the parameter
gamma0.
This paper has described the KELT instrumentation
and performance, with both the 200mm lens used for
observing clusters and the 80mm lens used for conduct-
ing the all-sky survey. It is the widest-field instrument
that is currently being employed to search for transit-
ing planets, and we observe brighter stars than other
wide field surveys. The performance metrics demon-
strate that it is capable of detecting signals at the ∼1%
level needed to detect Jupiter size planets transiting
solar-type stars. Further refinements of our reduction
process promise to expand our sensitivity to transits,
such as applying detrending algorithms of the sort devel-
oped by Tamuz, Mazeh, & Zuker (2005). We also plan
to conduct a full analysis of red noise (i.e. temporally
correlated systematic noise, see Pont, Zuker, & Queloz
(2006) for a full description) for all KELT data.
To date, we have detected over 100 previously unknown
variable stars in our observations towards Praesepe, and
we have identified several lightcurves with transit-like be-
havior, of which we show two in Figure 13. Future pa-
pers will report on the success in discovering variable
stars and searching for planets in Praesepe, along with
the full transit search for the all-sky survey.
We would like to thank the many people who have
helped with this research, including Scott Gaudi, An-
drew Gould, Christopher Burke, and Jerry Mason. We
would also like to thank Apogee Instruments and Soft-
ware Bisque for supplying the camera and mount for the
telescope and for excellent service for the hardware. This
work was supported by the National Aeronautics and
Space Administration under Grant No. NNG04GO70G
issued through the Origins of Solar Systems program.
REFERENCES
Alard, C. & Lupton, R.H. 1998, ApJ, 503, 325
Alard, C. 2000, A&A, 144, 363
Alonso, R., et al. 2004, ApJ, 613, L153
Bakos, G. A. 2006, private communication
Bakos, G. A. 2006, ApJ, 656, 552
Barnes, J. W. & Fortnoy, J. J. 2004, ApJ, 616, 1193
Calabretta, M.R., & Greisen, E.W. 2002, å, 395, 1077
Cameron, A. C., et al. 2007, MNRAS, 375, 951
Charbonneau, D., Brown, T. M., Noyes, R. W., & Gilliland, R. L.
2002, ApJ, 568, 377
Charbonneau, D., Brown, T. M., Burrows, A., & Laughlin, G.
2007, in Protostars & Planets V, ed. B. Reipurth, D. Jewitt, &
K. Keil (Tucson: University of Arizona Press), in press
(astro-ph/0603376)
Everett, M. E. & Howell, S. B. 2001, PASP, 113, 1428
Gaudi B. S. & Winn, J. N. 2007, ApJ, 655, 550
Guillot, T. 2005, Annual Review of Earth and Planetary Sciences,
33, 493
Hartman, J. D., Bakos, G., Stanek, K. Z., & Noyes, R. W. 2004,
AJ, 128, 1761
Hayes, D. S. & Latham, D. W. 1975, ApJ, 197, 593
Høg, E., et al. 2000, A&A, 355, L27
Kodak Photographic Filters Handbook, Publication No. B-3
Eastman Kodak Company, Rochester, New York (1998)
Latham, D. 2007, private communication
McCullough, P. R., et al. 2006, ApJ, 648, 1228
Meisenzahl, E., Chang, W., DesJardin, W., Kosman, S.,
Shepherd, J., Stevens, E., & Wong, K. 2000, Proc. SPIE, 3965,
O’Donovan, F. T., et al. 2006, ApJ, 651L, 61
Pepper, J., Gould, A., & Depoy, D. L., 2003, Acta Astron, 53, 213
Pont, F., Zuker, S., & Queloz, D. 2006, MNRAS, 373, 231
Schneider, D. P., Gunn, J. E., & Hoessel, J.G. 1983, ApJ, 264, 337
Stetson, P. B. 1987, PASP, 99, 191
Tamuz, O., Mazeh, T., & Zuker, S. 2005, MNRAS, 356, 1466
Udalski, A., Szymanski, M. K., Kubiak, M., Pietrzynski, G.,
Soszynski, I., Zebrun, K., Szewczyk, O., & Wyrzykowski, L.
2004, Acta, 54, 313
http://arxiv.org/abs/astro-ph/0603376
0.1.2.3.4.5.6
Response
Fig. 1.— Calculated response function of the KELT CCD camera and Kodak #8 Wratten filter. The dashed curve is the response function
including atmospheric transmission at Winer Observatory for 1.2 airmasses. This response function does not include the transmission of
the camera lenses.
−10 0 10
X (mm)
9.525
9.500
9.475
9.537"/pix
Fig. 2.— Effective pixel scale in arcseconds pixel−1 for the KELT 200mm camera. Contours show curves of constant effective pixel scale.
The cross (+) marks the optical center of the field, where the pixel scale is 9.′′537 pix−1.
−10 0 10
X (mm)
21.68"/pix
Fig. 3.— Effective pixel scale in arcseconds pixel−1 for the KELT 80mm camera. Contours show curves of constant effective pixel scale.
The cross (+) marks the optical center of the field, where the pixel scale is 21.′′68 pix−1.
−14,14 −7,14  0,14 7,14 14,14
−14,7 −7,7  0,7 7,7 14,7
−14,0 −7,0  0,0mm 7,0 14,0
−14,−7 −7,−7  0,−7 7,−7 14,−7
−14,−14 −7,−14  0,−14 7,−14 14,−14
Fig. 4.— Representative stellar PSFs from the 200mm telephoto lens, shown as intensity contours of bright, unsaturated stars taken
with a 5×5 grid pattern on the CCD; the position of the center of each box, relative to the center of the image, is indicated in mm. Each
box is 15 pixels (135mm) on a side. The scale-bar in the center panel indicates 50µm on the detector (5.5 pixels). Contours show levels of
(5,10,15,20,25,30,40,50,60,70,80,90,100)% of the peak intensity.
−14,14 −7,14  0,14 7,14 14,14
−14,7 −7,7  0,7 7,7 14,7
−14,0 −7,0  0,0mm 7,0 14,0
−14,−7 −7,−7  0,−7 7,−7 14,−7
−14,−14 −7,−14  0,−14 7,−14 14,−14
Fig. 5.— Representative stellar PSFs from the 80mm wide-angle lens, displayed in the same format as in Figure 4. The images are more
severely distorted at the extreme edges of the field than with the 200mm lens.
−10 0 10
X (mm)
Fig. 6.— Contours of constant FWHM for stellar images in a representative 200mm lens KELT image, with FWHM given in pixels.
Contours are based on a smooth polynomial surface fit to measurements of ∼1200 bright, unsaturated stars distributed across the image.
Contour spacing is every 0.1 pixels, with particular contour level values in pixels as indicated.
−10 0 10
X (mm)
Fig. 7.— Contours of constant FWHM for stellar images in a representative 80mm lens KELT image, with FWHM given in pixels.
Format is the same as in Figure 6. Contours are based on measurements of ∼2100 bright, unsaturated stars.
−10 0 10
X (mm)
D80=4.7pix
Fig. 8.— Contours of constant 80% Encircled-Energy Diameter (D80) in pixels for stellar images in a representative 200mm lens KELT
image. Contours are based on a smooth polynomial surface fit to measurements of ∼1200 bright, unsaturated stars distributed across the
image.
−10 0 10
X (mm)
D80=6.12 pix
Fig. 9.— Contours of constant 80% Encircled-Energy Diameter (D80) in pixels for stellar images in a representative 80mm lens KELT
image. Contours are based on measurements of ∼2100 bright, unsaturated stars.
Fig. 10.— RMS plot for one night of data from the 200mm lens. Data are shown for 67,674 stars. The dashed lines show the limits for 1%
and 2% RMS. The solid line represents a noise model including photon noise and sky noise, along with an RMS floor of 0.004 magnitudes.
Fig. 11.— RMS plot for eight nights of data from the 80mm lens. Data are shown for 49,376 stars. The dashed lines show the limits
for 1% and 2% RMS. The solid line represents a noise model including photon noise and sky noise, along with an RMS floor of 0.004
magnitudes.
Fig. 12.— Three variable stars discovered with the 200mm lens in the field of the Praesepe open cluster.
Fig. 13.— Two transit candidates discovered with the 200mm lens in the field of the Praesepe open cluster. The lower panel of each
plot shows the data binned in 10-minute bins. Follow-up spectroscopy indicates that object (a) is an F star with a transiting K dwarf
companion, and object (b) is a grazing eclipsing binary.
ABSTRACT
  The Kilodegree Extremely Little Telescope (KELT) project is a survey for
planetary transits of bright stars. It consists of a small-aperture, wide-field
automated telescope located at Winer Observatory near Sonoita, Arizona. The
telescope surveys a set of 26 x 26 degree fields, together covering about 25%
of the Northern sky, targeting stars in the range of 8<V<10 mag, searching for
transits by close-in Jupiters. This paper describes the system hardware and
software and discusses the quality of the observations. We show that KELT is
able to achieve the necessary photometric precision to detect planetary
transits around solar-type main sequence stars.

<|endoftext|><|startoftext|>
Entanglement increase from local interactions with not-completely-positive maps
Thomas F. Jordan∗
Physics Department, University of Minnesota, Duluth, Minnesota 55812
Anil Shaji†
The University of New Mexico, Department of Physics and Astronomy, 800 Yale Blvd. NE, Albuquerque, New Mexico 87131
E. C. G. Sudarshan‡
The University of Texas at Austin, Center for Statistical Mechanics, 1 University Station C1609, Austin Texas 78712
Simple examples are constructed that show the entanglement of two qubits being both increased and decreased
by interactions on just one of them. One of the two qubits interacts with a third qubit, a control, that is never
entangled or correlated with either of the two entangled qubits and is never entangled, but becomes correlated,
with the system of those two qubits. The two entangled qubits do not interact, but their state can change from
maximally entangled to separable or from separable to maximally entangled. Similar changes for the two qubits
are made with a swap operation between one of the qubits and a control; then there are compensating changes of
entanglement that involve the control. When the entanglement increases, the map that describes the change of
the state of the two entangled qubits is not completely positive. Combination of two independent interactions that
individually give exponential decay of the entanglement can cause the entanglement to not decay exponentially
but, instead, go to zero at a finite time.
Keywords: Entanglement, Quantum information
I. INTRODUCTION
We construct simple examples here that show the entan-
glement of two qubits being both increased and decreased by
interactions on just one of them. In our first and basic step,
taken in Sec. II, we have one of the two qubits interact with a
third qubit, a control, that is never entangled or correlated with
either of the two entangled qubits and is never entangled, but
becomes correlated, with the system of those two qubits. In
Sec. III, we do this for each of the two entangled qubits, and
consider the combination of the two interactions, with sepa-
rate control qubits that are not correlated and do not interact
with each other. The two entangled qubits do not interact, but
their state can change from maximally entangled to separable
or from separable to maximally entangled. Similar changes
for the two qubits are made with a swap operation between
one of the qubits and a control; then there are compensating
changes of entanglement that involve the control. This is de-
scribed in Sec. II.A.
Whenever the entanglement increases, and in some cases
where the entanglement decreases, the map that describes the
change of the state of the two entangled qubits is not com-
pletely positive and does not apply to all states of two qubits.
It all depends on whether there are correlations with the con-
trols at the beginning of the interval for which the dynamics is
considered. The maps are described in Sec. IV and discussed
in Sec. V. The completely positive maps that decrease the en-
tanglement have already been described [1].
When the interaction of each qubit with its control by itself
∗email: tjordan@d.umn.edu
†email: shaji@unm.edu
‡email: sudarshan@physics.utexas.edu
gives exponential decay of the entanglement, the combination
of the two interactions gives exponential decay at the rate that
is the sum of the rates for the individual interactions, when the
two interactions are made the same way. Making them differ-
ently can cause the entanglement to not decay at that rate or
at any single rate. Instead, the entanglement goes to zero at a
finite time; the state becomes separable and remains separable
at later times. This is described in Sec. III.A. Similar behavior
has been observed in more physically interesting and mathe-
matically complicated models [2, 3, 4].
These examples are built on the same framework, but to a
very different design, from those we made for Lorentz trans-
formations that entangle spins [5]. There the momenta that
played the roles of controls were purposely correlated. Here
the controls are kept independent. The framework makes the
operations transparent by describing the qubit states with den-
sity matrices written in terms of Pauli matrices, so you can
see the Pauli matrices being rotated by the interactions. States
are shown to be separable by writing out the density matrices
explicitly as sums of products for pure states. For each in-
teraction here, the map that makes the change of the density
matrix for the entangled qubits is described by a simple rule
that particular Pauli matrices in the density matrix are multi-
plied by a number; equivalently, the map of the state of the
entangled qubits is described by a rule that particular mean
values are multiplied by a number.
Our examples show that statements like “entanglement
should not increase under local operations and classical com-
munication” [6, 7] are not generally true outside the set of
local operations considered in the original proof [6]. In our
examples, each control qubit interacts with only one of the
two entangled qubits. In this sense, the quantum operations
are local. Correlation with a control at the beginning of the
interval for which the dynamics is considered can give local
operations that increase entanglement.
http://arxiv.org/abs/0704.0461v2
mailto:tjordan@d.umn.edu
mailto:shaji@unm.edu
mailto:sudarshan@physics.utexas.edu
II. ONE INTERACTION
We consider the entanglement of two qubits, A and B. We
use Pauli matrices Σ1,Σ2,Σ3 for qubit A, and Pauli matrices
Ξ1,Ξ2,Ξ3 for qubit B. We let qubit A interact with a third
qubit, which we call C. We think of C as a control. By inter-
acting with qubit A, it will control the entanglement of qubits
A and B. We work with states represented by orthonormal
vectors |α〉 and |β〉 for C. We consider a state of the three
qubits represented by a density matrix
Π = ρ⊗ 1
11C (2.1)
with ρ the density matrix for the state of qubits A and B.
We follow common physics practice and write a product of
operators for separate systems, for example a product of Pauli
matrices Σ andΞ for qubitsA andB, simply as ΣΞ, not Σ⊗Ξ.
Occasionally we insert a ⊗ for emphasis or clarity. We write
11A, 11B , 11C , but we do not put labels A and B on the Σj and
Ξk. The single statement that the Σj are for qubit A and the
Ξk are for qubit B eliminates the need for continual use of
both A and B lalels and ⊗ signs.
Suppose ρ is one of the density matrices
(11 ± Σ1Ξ1 ± Σ2Ξ2 − Σ3Ξ3). (2.2)
Both ρ+ and ρ− represent maximally entangled pure states
for the two qubits. They are Bell states. The state of zero total
spin is represented by ρ− and the state obtained from that by
rotating one of the spins by π around the z axis is represented
by ρ+.
For a rotation W , let DA(W ) be the 2 × 2 unitary rotation
matrix made from the Σj so that
DA(W )
ΣDA(W ) = W (Σ) (2.3)
where W (Σ) is simply the vector Σ rotated by W . Let W (φ)
be the rotation by φ around the z axis, and let DA(φ) be
DA(W (φ)).
We consider an interaction between qubits A and C de-
scribed by the unitary transformation
U = DA(φ)|α〉〈α| +DA(−φ)|β〉〈β| (2.4)
or, in Hamiltonian form,
U = e−iφH (2.5)
H = Σ3
(|α〉〈α| − |β〉〈β|). (2.6)
This changes the density matrix ρ for qubits A and B to
ρ′ = TrC
(U ⊗ 11B)Π(U ⊗ 11B)†
DA(φ)ρDA(φ)
DA(−φ)ρDA(−φ)†. (2.7)
For ρ± this gives
ρ′± =
[11 ± (Σ1 cosφ+Σ2 sinφ)Ξ1
± (−Σ1 sinφ+Σ2 cosφ)Ξ2 − Σ3Ξ3]
[11 ± (Σ1 cosφ− Σ2 sinφ)Ξ1
± (Σ1 sinφ+Σ2 cosφ)Ξ2 − Σ3Ξ3]
[11 ± (Σ1Ξ1 +Σ2Ξ2) cosφ− Σ3Ξ3]
= ρ± cos
2(φ/2) + ρ∓ sin
2(φ/2). (2.8)
A. From maximally entangled to separable and back
We focus first on the case where φ is π/2. Then both ρ+
and ρ− are changed to
[11 − Σ3Ξ3]
(11 − Σ3)
(11 + Ξ3)
(11 +Σ3)
(11 − Ξ3). (2.9)
The density matrix ρ for a maximally entangled state is
changed to the density matrix ρ′ for a separable state that is
a mixture of just two products of pure states. The inverse of
the unitary dynamics of qubits A and R takes ρ′ back to ρ; it
changes a separable state to a maximally entangled state.
The dynamics continuing forward also changes this separa-
ble state to a maximally entangled state. As φ goes from π/2
to π, the density matrix ρ′± changes from that of Eq. (2.9) to
ρ′± = ρ∓. (2.10)
There can be revivals of entanglement between two qubits
when there is no interaction between them, as well as when
[8] there is.
Changes in the state of qubits A and B from maximally
entangled to separable and back to maximally entangled can
also be made very simply with a swap of states[9] between A
and C. This can be done with a unitary operator U ⊗ 11B with
U a unitary operator for qubits A and C that acts on a basis
of product state vectors simply by interchanging the states of
A and C. There is interaction between qubits A and C only;
qubit B is not involved.
Applied to an initial state described by Eqs. (2.1) and (2.2),
where qubits A and B are maximally entangled, this swap op-
eration gives a separable state for A and B. Applied a second
time, it restores the initial state where A and B are maximally
entangled. For qubits A and B, this is similar to what hap-
pens when φ goes from 0 to π/2 to π. For the three qubits,
it is different. The swap operation does not change the com-
plete inventory of entanglements for the three qubits. It just
moves the entanglements around. In particular, C becomes
maximally entangled with B. We will see, in Secs. II.C and
D, that the interaction described by Eqs. (2.4), (2.5) and (2.6)
does change the complete inventory of entanglements for the
three qubits. When the state of qubits A and B changes from
maximally entangled to separable and back to maximally en-
tangled, there are no compensating changes of other two-part
entanglements. In particular, qubit C never becomes entan-
gled with anything.
B. Concurrence
The change of entanglement is smaller when φ does not
change by π/2. ¿From Eq. (2.8) we have
ρ′± =
[11±(Σ1Ξ1+Σ2Ξ2) cosφ+(Σ1Ξ1)(Σ2Ξ2)], (2.11)
after rewriting the last term. This shows that for both ρ′
ρ′− the eigenvalues are
(1 + cosφ),
(1 − cosφ), 0, 0 (2.12)
because Σ1Ξ1 and Σ2Ξ2 each have eigenvalues 1 and −1 and
together they make a complete set of commuting operators:
their four different pairs of eigenvalues label a basis of eigen-
vectors for the space of states for the two qubits. The Wooters
concurrence [10] is a measure of the entanglement in a state
of two qubits. It is defined by
C(ρ) ≡ max
(2.13)
where ρ is the density matrix that represents the state and
λ1, λ2, λ3, λ4 are the eigenvalues, in decreasing order, of
ρ Σ2Ξ2 ρ
⋆ Σ2Ξ2, with ρ
⋆ the complex conjugate that is ob-
tained by changing Σ2 and Ξ2 to −Σ2 and −Ξ2. From
Eq. (2.11) we have
ρ′± Σ2Ξ2 (ρ
⋆ Σ2Ξ2 = ρ
⋆ (Σ2Ξ2)
2=(ρ′±)
2 (2.14)
so for ρ′± the
λi are the eigenvalues of ρ
± and the concur-
rence is
C(ρ′±) = | cosφ|. (2.15)
We can consider the change of entanglement as φ changes
through any interval. When | cosφ| decreases, the entangle-
ment decreases. When | cosφ| increases, the entanglement
increases.
C. Two-part entanglements
The only two-part entanglements are when qubit A is in
one part and qubit B is in the other. There is entanglement
between qubitA and the subsystem of two qubits B and C and
between qubit B and the subsystem of two qubits A and C, as
well as between qubits A and B. There is never entanglement
between the state of qubit C and the state of the subsystem of
two qubits A and B. The density matrix
(U ⊗ 11B)Π(U ⊗ 11B)†=
DA(φ)ρ±DA(φ)
†|α〉〈α|
DA(−φ)ρ±DA(−φ)†|β〉〈β|
(2.16)
is always a mixture of two products of pure states. The re-
duced density matrix for the subsystem of qubits A and C,
obtained by taking the trace over the states of qubit B, is just
11A ⊗ 11C/4, and the reduced density matrix for qubits B and
C, obtained by taking the trace over the states of qubit A,
is 11B ⊗ 11C/4. There is never entanglement or correlation
between qubits A and C or between qubits B and C. The
reduced density matrices for the individual single qubits are
just 11A/2, 11B/2, and 11C/2. The only subsystem density ma-
trix that carries any information is the density matrix ρ for
the qubits A and B, which is changed by the interaction with
qubit C. The entropy of the subsystem of qubits A and B can
increase or decrease, but there is no change of entropy for any
other subsystem or for the entire system of three qubits.
D. Three-part entanglement
There is three-part entanglement. The state represented by
the density matrix (2.16) is called biseparable because it is
separable as the state of a system of two parts, with C one
part and the subsystem of two qubits A and B the other part.
It is not separable as the state of a system of three parts A, B,
and C. The density matrix (2.16) is not a mixture of products
of density matrices for pure states of the individual qubits A,
B, and C. If it were, its partial trace over the states of C, the
reduced density matrix that represents the state of the subsys-
tem of the two qubitsA andB, would be a mixture of products
for pure states of A and B. That happens only when cosφ is 0.
In that case, we can see that the density matrix (2.16) is not a
mixture of products for pure states of the individual qubits A,
B, and C because its partial transpose obtained by changing
Ξ2 to −Ξ2 is not a positive matrix.
In the classification of three-part entanglement for qubits,
biseparable states are between separable states and states that
involve W or GHZ entanglement [11, 12, 13]. Let Π1,
Π2, Π3 be Pauli matrices for the qubit C such that |α〉〈α|
is (1/2)(11 +Π3) and |β〉〈β| is (1/2)(11 −Π3). Bounds from
Mermin witness operators say that for separable or bisepara-
ble states
−2 ≤ 〈ΣjΞjΠj−ΣjΞkΠk−ΣkΞjΠk−ΣkΞkΠj〉 ≤ 2 (2.17)
for j, k = 1, 2, 3 and j 6= k; a mean value outside these
bounds is a mark of W or GHZ entanglement [14]. In our
examples, these mean values are always 0. A mean value
〈|GHZ〉〈GHZ|〉 larger than 3/4 for the projection operator
onto the GHZ state,
|GHZ〉 = 1√
|0〉|0〉|0〉+ 1√
|1〉|1〉|1〉, (2.18)
is a mark of GHZ entanglement; it can not be larger than 3/4
for a W state [13]. A mean value 〈|GHZ〉〈GHZ|〉 larger than
1/2 is a mark of a W state; it can not be larger than 1/2 for
a biseparable state [13]. In our examples, 〈|GHZ〉〈GHZ|〉
is always 0. A mean value 〈|W 〉〈W |〉 larger than 2/3 for the
projection operator onto the W state,
|W 〉 = 1√
|1〉|0〉|0〉+ 1√
|0〉|1〉|0〉+ 1√
|0〉|0〉|1〉, (2.19)
is a mark of W entanglement; it can not be larger than 2/3 for
a biseparable state [13]. In our examples,
〈|W 〉〈W |〉 = 1
(1 ± cosφ). (2.20)
This mean value does not involve either entanglement or cor-
relation of the qubitC; it would be the same if both |α〉〈α| and
|β〉〈β| in the density matrix (2.16) were replaced by (1/2)C ,
the completely mixed density matrix for C.
For any φ, the density matrices (2.16) for the two cases +
and − are changed into each other by the local unitary trans-
formation that changes the Pauli matrices for one of the qubits
A or B by rotating its spin by π around the z axis. As a func-
tion of φ, the mean value 〈|W 〉〈W |〉 changes in opposite di-
rections for the + and − cases. So will any mean value for the
states described by the density matrices (2.16), if it changes at
For the states described by the density matrices (2.16), the
only nonzero mean values that involve the qubit C are
〈Σ1Ξ2Π3〉 = ∓ sinφ
〈Σ2Ξ1Π3〉 = ± sinφ. (2.21)
These would be the same if they were calculated with only the
|α〉〈α| part or only the |β〉〈β| part of the density matrix (2.16).
In fact, they are the same as 〈Σ1Ξ2〉〈Π3〉 and 〈Σ2Ξ1〉〈Π3〉
calculated for one of those parts. Their values do not require
either entanglement or correlation of C.
III. TWO INTERACTIONS
If a control were coupled similarly to qubit B as well, then
cosφ would be replaced by cosφA cosφB in the next to last
line of Eq. (2.8) and in Eqs.(2.11) and (2.15). If the coupling
of qubit B is made with a rotation around the x axis instead
of the z axis, then the next to last line of Eq. (2.8) becomes
ρ′± =
[11 ± Σ1Ξ1 cosφA ± Σ2Ξ2 cosφA cosφB
−Σ3Ξ3 cosφB ]. (3.1)
Rewriting the last term and looking at eigenvalues in terms of
Σ1Ξ1 and Σ2Ξ2 as before yields the concurrence
C(ρ′±)=
max[0, | cosφA|+| cosφA cosφB|+| cosφB|−1].
(3.2)
When cosφA is 1, these Eqs.(3.1) and (3.2) describe the result
obtained when there is only the interaction of qubit B made
with a rotation around the x axis. If neither cosφA nor cosφB
is 1, the concurrence becomes zero, the state separable, be-
fore cosφA or cosφB is zero. The interactions of qubits A
and B with their controls change maximally entangled states
to separable states. The inverses change separable states to
maximally entangled states. In the following subsection, we
describe the density matrices that show explicitly that the sep-
arable states are mixtures of products of pure states.
A. Exponential decay
To describe exponential decay of entanglement we let
cosφA = e
−ΓAt, cosφB = e
−ΓBt (3.3)
by letting each interaction be modulated by a time-dependent
Hamiltonian H(t) that is related to the Hamiltonian H of
Eqs.(2.4) and (2.5) by
H(t) = H
= HΓcotφ, (3.4)
where φ and Γ are φA and ΓA or φB and ΓB . The same result
could be produced in different ways. The interactions could be
with large reservoirs instead of qubit controls [2, 3, 4]. Each
qubit A or B could interact with a stream of reservoir qubits
[15]. Here we are interested in the way the entanglement is
changed by the combination of the two interactions. That de-
pends only on the changes in the density matrix ρ for qubits A
and B, not on the nature of the controls and the interactions.
Maps that make the changes in ρ will be described in the next
section.
If there is only the interaction of qubit A with qubit C, the
concurrence is e−ΓAt. If there is only interaction of qubit B
with its control, the concurrence is e−ΓBt. If there are both
and both are made with rotations around the z axis, the con-
currence is e−ΓAte−ΓBt. If there are both and the interaction
of qubit B with its control is made with a rotation around the
x axis, the concurrence is
C(ρ′±)=
max[0, e−ΓAt+e−ΓAte−ΓBt+e−ΓBt−1]. (3.5)
This concurrence (3.5) is zero when
e−ΓAt + e−ΓAte−ΓBt + e−ΓBt = 1. (3.6)
Then the state is separable; it is a mixture of six products of
pure states: from Eqs.(3.1) and (3.3)
ρ′± =
e−ΓAt
(11 +Σ1)
(11 ± Ξ1)
e−ΓAt
(11 − Σ1)
(11 ∓ Ξ1)
e−ΓAte−ΓBt
(11 +Σ2)
(11 ± Ξ2)
e−ΓAte−ΓBt
(11 − Σ2)
(11 ∓ Ξ2)
e−ΓBt
(11 +Σ3)
(11 − Ξ3)
e−ΓBt
(11 − Σ3)
(11 + Ξ3). (3.7)
The state remains separable at later times; when the sum of
the exponential decay factors is less than 1, the density ma-
trix is a mixture in which just a multiple of the density ma-
trix 1/4 for the completely mixed state is added to the terms
of Eq. (3.7). This change of maximally entangled states to
separable states can be described without reference to expo-
nential decay by continuing to use cosφA and cosφB instead
of e−ΓAt and e−ΓBt. Similar behavior involving exponential
decay has been observed in more physically interesting and
mathematically complicated models [2, 3, 4].
IV. MAPS
The maps that make the changes in the density matrix ρ for
qubits A and B could be described in different ways using
various matrix forms. That is not needed here. Writing ρ in
terms of Pauli matrices provides a very simple way to describe
the maps. For any density matrix
〈Σj〉Σj +
〈Ξk〉Ξk +
j,k=1
〈ΣjΞk〉ΣjΞk
(4.1)
for qubits A and B, the result of the interaction of qubit A
with qubit C, described by Eq. (2.7), is that in ρ, in both the
Σj and ΣjΞk terms,
Σ1 −→ Σ1 cosφA, Σ2 −→ Σ2 cosφA; (4.2)
the result of the interaction of qubit B with its control is that
Ξ1 −→ Ξ1 cosφB, Ξ2 −→ Ξ2 cosφB (4.3)
if the interaction is made with a rotation around the z axis;
and the result of the interaction of qubit B with its control is
that in ρ
Ξ2 −→ Ξ2 cosφB, Ξ3 −→ Ξ3 cosφB (4.4)
if the interaction is made with a rotation around the x axis.
The terms with sin φ cancel out because there is an equal mix-
ture of parts with φ and parts with −φ.
The changes in the state of qubits A and B can be described
equivalently by maps of mean values that describe the state:
the result of the interaction of qubit A with qubit C, described
by Eq. (2.7), is that
〈Σ1〉 −→ 〈Σ1〉 cosφA
〈Σ2〉 −→ 〈Σ2〉 cosφA
〈Σ1Ξk〉 −→ 〈Σ1Ξk〉 cosφA
〈Σ2Ξk〉 −→ 〈Σ2Ξk〉 cosφA (4.5)
for k = 1, 2, 3; the result of the interaction of qubit B with its
control is that
〈Ξ1〉 −→ 〈Ξ1〉 cosφB
〈Ξ2〉 −→ 〈Ξ2〉 cosφB
〈ΣjΞ1〉 −→ 〈ΣjΞ1〉 cosφB
〈ΣjΞ2〉 −→ 〈ΣjΞ2〉 cosφB (4.6)
for j = 1, 2, 3 if the interaction is made with a rotation around
the z axis; and the result of the interaction of qubit B with its
control is that
〈Ξ2〉 −→ 〈Ξ2〉 cosφB
〈Ξ3〉 −→ 〈Ξ3〉 cosφB
〈ΣjΞ2〉 −→ 〈ΣjΞ2〉 cosφB
〈ΣjΞ3〉 −→ 〈ΣjΞ3〉 cosφB (4.7)
for j = 1, 2, 3 if the interaction is made with a rotation around
the x axis.
When φA and φB change over intervals from initial values
φAi and φBi to final values φAf and φBf , the cosφA and
cosφB factors in the maps are replaced by cosφAf/ cosφAi
and cosφBf/ cosφBi. If either of these factors is larger than
1, the map is not completely positive and does not apply to all
density matrices ρ for qubitsA andB. This happens whenever
the entanglement increases. It also happens in cases where the
concurrence (3.2) decreases, when one of cosφA and cosφB
increases and the other decreases and there is more decrease
than increase. The completely positive maps that decrease the
entanglement have already been described [1].
V. RECONCILIATION
Entanglement being increased by local interactions may
seem surprising from perspectives framed by experience in
common situations where it is impossible. Entanglement is
not increased by a completely positive map of the state of two
qubits produced by an interaction on one of them. The in-
teraction will produce a completely positive map if it is with
a control whose state is initially not correlated with the state
of the two qubits, as in Eq. (2.1). In our examples, that hap-
pens only when the initial value of φ is 0 or a multiple of π.
Otherwise, the state of the control is correlated with the state
of the two qubits as in Eq. (2.16). When a subsystem is ini-
tially correlated with the rest of a larger system that is being
changed by unitary Hamiltonian dynamics, the map that de-
scribes the change of the state of the subsystem generally is
not completely positive and applies to a limited set of subsys-
tem states [16, 17]. We see this in our examples whenever the
entanglement increases and in some cases when the entangle-
ment decreases.
The map depends on both the dynamics and the initial cor-
relations. It describes the effect of both in one step. Com-
pletely positive maps are what you get in the simplest set-up
where you bring a system and control together in independent
states and consider the effect of the dynamics that begins then.
Dynamics over intervals where the maps are not completely
positive can be expected to play roles in more complex set-
tings. We should not let expectations for completely positive
maps prevent us from seeing things that can happen.
Our perspective is enlarged when we look beyond the map
and include the dynamics. We can see the dynamics and the
initial preparation as two related but separate steps. We can
consider the effect of the dynamics, whatever the preparation
may be.
Local interactions that increase entanglement are com-
pletely outside a perspective that is limited to pure states. An
interaction on one of the qubits can not change the entangle-
ment at all if the state of the two qubits remains pure [18].
The entanglement of a pure state of two qubits depends only
on the spectrum of the reduced density matrices that describe
the states of the individual qubits, which is the same for the
two qubits. If that could be changed by an interaction on one
of the qubits, there could be a signal faster than light. In our
examples, the state of the two qubits is pure only when it is
maximally entangled. In our examples, the spectrum of the
density matrices for the individual qubits never changes, and
gives no measure of the entanglement.
Acknowledgments
We are grateful to a referee for very helpful suggestions,
including the comparison with a swap operation. Anil Shaji
acknowledges the support of the US Office of Naval Research
through Contract No. N00014-03-1-0426.
[1] M. Ziman and V. Buzek, Phys. Rev. A 72, 052325 (2005).
[2] T. Yu and J. H. Eberly, Phys. Rev. Lett. 97, 140403 (2006).
[3] X.-T. Liang, Phys. Lett. A 349, 98 (2006).
[4] T. Yu and J. H. Eberly, arXiv:quant-ph/0703083 (2007).
[5] T. F. Jordan, A. Shaji, and E. C. G. Sudarshan, Phys. Rev. A 75,
022101 (2007).
[6] C. H. Bennett, D. P. DiVincenzo, J. A. Smolin, and W. K. Woot-
ters, Phys. Rev. A 54, 3824 (1996).
[7] R. Horodecki, P. Horodecki, M. Horodecki, and K. Horodecki,
arXiv:quant-ph/0702225 (2007).
[8] K. Zyczkowski, P. Horodecki, M. Horodecki, and
R. Horodecki, Phys. Rev. A 65, 012101 (2001).
[9] M. Zukowski, A. Zeilinger, M. A. Horne, and A. K. Ekert, Phys.
Rev. Lett. 71, 4287 (1993).
[10] W. K. Wootters, Phys. Rev. Lett. 80, 2245 (1998).
[11] W. Dür, J. I. Cirac, and R. Tarrach, Phys. Rev. Lett. 83, 3562
(1999).
[12] W. Dür and J. I. Cirac, Phys. Rev. A 61, 042314 (2000).
[13] A. Acı́n, D. Bruß, M. Lewenstein, and A. Sanpera, Phys. Rev.
Lett. 87, 040401 (2001).
[14] G. Toth, O. Guhne, M. Seevinck, and J. Uffink, Phys. Revs A
72, 014101 (2005).
[15] E. C. G. Sudarshan, Chaos, Solitons and Fractals 16, 369
(2003).
[16] T. F. Jordan, A. Shaji, and E. C. G. Sudarshan, Phys. Rev. A.
70, 52110 (2004).
[17] T. F. Jordan, A. Shaji, and E. C. G. Sudarshan, Phys. Rev. A.
73, 12106 (2006).
[18] C. H. Bennett, H. J. Bernstein, S. Popescu, and B. Schumacher,
Phys. Rev. A 53, 2046 (1996).
ABSTRACT
  Simple examples are constructed that show the entanglement of two qubits
being both increased and decreased by interactions on just one of them. One of
the two qubits interacts with a third qubit, a control, that is never entangled
or correlated with either of the two entangled qubits and is never entangled,
but becomes correlated, with the system of those two qubits. The two entangled
qubits do not interact, but their state can change from maximally entangled to
separable or from separable to maximally entangled. Similar changes for the two
qubits are made with a swap operation between one of the qubits and a control;
then there are compensating changes of entanglement that involve the control.
When the entanglement increases, the map that describes the change of the state
of the two entangled qubits is not completely positive. Combination of two
independent interactions that individually give exponential decay of the
entanglement can cause the entanglement to not decay exponentially but,
instead, go to zero at a finite time.

<|endoftext|><|startoftext|>
Microsoft Word - sun.doc
Extended Solar Emission – an Analysis of the EGRET Data 
Elena Orlando, Dirk Petry and Andrew Strong 
Max-Planck-Institut für extraterrestrische Physik, Postfach 1312, D-85741 Garching, Germany 
Abstract. The Sun was recently predicted to be an extended source of gamma-ray emission, produced by inverse-
Compton scattering of cosmic-ray electrons with the solar radiation. The emission was predicted to contribute to the 
diffuse extragalactic background even at large angular distances from the Sun. While this emission is expected to be 
readily detectable in future by GLAST, the situation for available EGRET data is more challenging. We present a detailed 
study of the EGRET database, using a time dependent analysis, accounting for the effect of the emission from 3C 279, the 
moon, and other sources, which interfere with the solar signal. The technique has been tested on the moon signal, with 
results consistent with previous work. We find clear evidence for emission from the Sun and its vicinity. The 
observations are compared with our model for the extended emission.  
Keywords: Cosmic rays, gamma-ray emission, Sun, EGRET. 
PACS: 95.85, 96.50, 96.60  
THE EXTENDED SOLAR EMISSION MODEL 
The heliosphere has been studied as an extended source of gamma-ray emission, produced by inverse-Compton 
scattering of cosmic-ray electrons with the solar photon field [1,2]. For this analysis our model [1] has been 
improved using the modulated electron spectrum at all distances following [2] instead of the measured local electron 
spectrum, and using the anisotropic inverse-Compton scattering formalism [3].   
ANALYSIS OF THE EGRET DATA 
We analyzed the EGRET data using the code developed for the moving target Earth [4] and adding necessary 
features (solar and lunar ephemerides, occultation, background point source trace calculations). The diffuse 
background was reduced by excluding the Galactic plane. Otherwise all available exposure within mission phase 1-3 
was used. When the Sun passed by other gamma-ray sources (moon, 3C 279 and several quasars), these sources 
were included in the analysis.  Details will be given in [5]. 
We fitted the data in the Sun-centred system using a multi-parameter likelihood fitting technique, leaving as free 
parameters the solar extended inverse-Compton flux from the model, the solar disk flux from pion decay [7], a 
uniform background, and the flux of 3C279, the dominant background point source.  The moon flux was determined 
from moon-centred fits and the 3EG source fluxes were fixed at their catalogue values. All components were 
convolved with the energy-dependent EGRET PSF.  The region used for fitting is a circle of radius 10º centred on 
the Sun.  Since the interesting parameters are solar disk source and extended emission, the likelihood is maximized 
over the other components.  In order to verify our method, we checked that the fluxes of the Crab Nebula, 3C 279, 
and in particular the moon [6] were reproduced.  
Results 
The log-likelihood ratio for E >100 MeV is displayed in Fig.1 as a function of solar disk flux and extended flux, 
compared with the model prediction of solar inverse-Compton flux for modulation parameter 500 MV at 1 AU. The 
solar emission is detected with 5.3σ significance. There is evidence for the extension of the emission at a level of 
2.7σ; the maximum log L indicates an extended component with a flux compatible with the IC model. The total flux 
from the Sun is more than expected for the disk source [7], so this is clear evidence for the IC emission even without 
the proof of extension. We find that the measured extended flux is fully consistent with the model. Figure 2 shows 
the fitted model counts of the main components and the total including uniform background. 
In future work we will perform a detailed spectral analysis, refine the analysis of systematic errors, and study 
different models for the modulated electron spectrum. This is important for future missions such as GLAST and for 
studying solar modulation. 
FIGURE 1.  Log Likelihood above 100 MeV as function of  the solar disk flux and extended solar flux, relative to point at (0,0). 
The level of our predicted IC model flux and the predicted disk flux [7] are shown.   
FIGURE 2.  Fitted model counts of the main components centered on the Sun. From left to right: Sun disk, Sun IC, moon,  
3C 279, and the total predicted counts including uniform background.  The colors show the counts/pixel, for 0.5°µ 0.5° pixels.  
REFERENCES 
1. E. Orlando and A. W. Strong, Ap&SS in press, astro-ph/0607563 (2006). 
2. I. V. Moskalenko et al., ApJL 652 (2006) L65-L68. 
3. I. V. Moskalenko and A. W. Strong, ApJ, 528, 357 (2000). 
4. D. Petry, AIP Conf. Proc., 745, 709 (2005). 
5. D. Petry, E. Orlando and A.W. Strong, in prep. (2007). 
6.  D.J. Thompson et al., Journal of Geophys. Res. 102 (A7), 14735 (1997). 
7. D. Seckel et al., ApJ 382, 652 (1991). 
ABSTRACT
  The Sun was recently predicted to be an extended source of gamma-ray
emission, produced by inverse-Compton scattering of cosmic-ray electrons with
the solar radiation. The emission was predicted to contribute to the diffuse
extragalactic background even at large angular distances from the Sun. While
this emission is expected to be readily detectable in future by GLAST, the
situation for available EGRET data is more challenging. We present a detailed
study of the EGRET database, using a time dependent analysis, accounting for
the effect of the emission from 3C 279, the moon, and other sources, which
interfere with the solar signal. The technique has been tested on the moon
signal, with results consistent with previous work. We find clear evidence for
emission from the Sun and its vicinity. The observations are compared with our
model for the extended emission.

<|endoftext|><|startoftext|>
Draft version October 23, 2018
Preprint typeset using LATEX style emulateapj v. 08/13/06
MASS AND TEMPERATURE OF THE TWA 7 DEBRIS DISK
Brenda C. Matthews
Herzberg Institute of Astrophysics, National Research Council of Canada, 5071 West Saanich Road, Victoria, BC, V9E 2E7, Canada
Paul G. Kalas
Department of Astronomy, University of California, Berkeley, CA, 94720-3411, U.S.A.
Mark C. Wyatt
Institute of Astronomy, University of Cambridge, Madingley Road, Cambridge, CB3 0HA, U.K.
Draft version October 23, 2018
ABSTRACT
We present photometric detections of dust emission at 850 and 450 µm around the pre-main sequence
M1 dwarf TWA 7 using the SCUBA camera on the James Clerk Maxwell Telescope. These data
confirm the presence of a cold dust disk around TWA 7, a member of the TW Hydrae Association.
Based on the 850 µm flux, we estimate the mass of the disk to be 18 Mlunar (0.2 M⊕) assuming a mass
opacity of 1.7 cm2 g−1 with a temperature of 45 K. This makes the TWA 7 disk (d = 55 pc) an order
of magnitude more massive than the disk reported around AU Microscopii (GL 803), the closest (9.9
pc) debris disk detected around an M dwarf. This is consistent with TWA 7 being slightly younger
than AU Mic. We find that the mid-IR and submillimeter data require the disk to be comprised of
dust at a range of temperatures. A model in which the dust is at a single radius from the star, with
a range of temperatures according to grain size, is as effective at fitting the emission spectrum as a
model in which the dust is of uniform size, but has a range of temperatures according to distance. We
discuss this disk in the context of known disks in the TW Hydrae Association and around low-mass
stars; a comparison of masses of disks in the TWA reveals no trend in mass or evolutionary state
(gas-rich vs. debris) as a function of spectral type.
Subject headings: stars: circumstellar matter, pre-main sequence — stars: individual (TWA 7) —
submillimeter
1. INTRODUCTION
TWA 7 (2MASS J10423011-3340162, TWA 7A) is
a weak-line T Tauri star identified as part of the
TW Hydrae Association (TWA, Kastner et al. 1997) by
Webb et al. (1999) based on proper motion studies in
conjunction with youth indicators such as high lithium
abundance, X-ray activity and evidence of strong chro-
mospheric activity. Disk systems were inferred around
four of the 18 association members (Zuckerman & Song
2004) from measurements of IR excess with the In-
frared Astronomical Satellite (IRAS). TW Hydra itself
(a K7 pre-main sequence star) hosts the nearest proto-
stellar disk to the Sun. Another accreting disk is ob-
served around one member of the triple system Hen 3-
600 (Muzerolle et al. 2000), and two debris disk systems
have been detected, around the A0 star HR 4796A (Jura
1991; Schneider et al. 1999) and one of two spectroscopic
binary components of the quadruple system HD 98800
(Jayawardhana et al. 1999; Gehrz et al. 1999), which is
a K5 dwarf. TWA 7 was not detected by IRAS.
Based on the width of the Li 6707 Ȧ line,
Neuhäuser et al. (2000) deduced that TWA 7 is a
pre-main sequence star. TWA 7 was not detected
by Hipparcos, but its membership in the TWA sets
its distance to be 55 ± 16 pc (Neuhäuser et al. 2000;
Weinberger et al. 2004; Low et al. 2005). Its spectral
Electronic address: brenda.matthews@nrc-cnrc.gc.ca
Electronic address: kalas@astron.berkeley.edu
Electronic address: wyatt@ast.cam.ac.uk
type is M1 based on LRIS spectra (Webb et al. 1999).
Based on existing photometry, evolutionary tracks and
isochrone fitting, Neuhäuser et al. (2000) derived an
age of 1-6 Myr (i.e., roughly coeval with other TWA
stars) and a mass of 0.55 ± 0.15 M⊙. The age of
the association is generally taken to be ∼ 8 − 10
Myr (Stauffer, Hartmann & Barrado y Navascues 1995;
Zuckerman & Song 2004). This is the age when planet
formation is thought to be ongoing and when disk dissi-
pation is occurring. Thus, the TWA is the ideal cluster in
which to observe the transitions from pre-main sequence
stars with proto-planetary (gas-rich) disks to main se-
quence stars with debris (gas-poor) disks.
TWA 7 has been observed for infrared excess emission
several times. The presence of a disk around TWA 7 was
first noted in submillimeter observations byWebb (2000).
They measured a flux of 15.5± 2.4 mJy at 850 µm. Nei-
ther Jayawardhana et al. (1999) nor Weinberger et al.
(2004) detected any excess associated with TWA 7 in
the mid-infrared. However, Low et al. (2005) report de-
tections of IR excess at 24 and 70 µm toward TWA 7 with
the Spitzer Space Telescope. Based on these data and ex-
isting shorter wavelength data on the stellar photosphere,
Low et al. (2005) derive a disk temperature of 80 K and
a lower limit to the mass of 2.4 × 1023 g (3.3 × 10−3
Mlunar) for the disk. This is a lower limit because the 70
µm data are not sensitive to colder material in the outer
disk and dust grains exceeding a few hundred micron in
size. A search for substellar companions using the NIC-
http://arxiv.org/abs/0704.0463v1
mailto:brenda.matthews@nrc-cnrc.gc.ca
mailto:kalas@astron.berkeley.edu
mailto:wyatt@ast.cam.ac.uk
2 Matthews et al.
MOS chronograph on HST did not reveal any evidence
of the disk (Lowrance et al. 2005); a nearby point source
is identified as a background object.
The study of debris disks around members of an as-
sociation permits the study of the evolution of disks as
a function of spectral type alone, since the disks likely
formed coevally and with similar compositions. It is also
possible to judge whether the presence and evolution of
disks around multiple stellar systems is comparable to
those around single stars. The discovery of disks around
low-mass stars is relatively recent (Greaves et al. 1998;
Liu et al. 2004; Kalas et al. 2004). The low radiation
field of late-K and M dwarfs means that the disks are
faint in scattered light compared to disks around more
massive stars, and hence they were less frequently tar-
geted by scattered light searches for disks. However,
scattered light imaging is often a follow-up technique
used after an infrared excess has been discovered (e.g.,
AU Mic, Kalas et al. 2004). In fact, the low radiation
fields may favor long-lived and slowly evolving disks,
since some disk material may be unchanged from the
original proto-planetary disks (e.g., Graham et al. 2007).
Spitzer has detected evidence of disks around a few K
stars (Chen et al. 2005; Bryden et al. 2006; Uzpen et al.
2005; Gorlova et al. 2004; Beichman et al. 2005), but
Beichman et al. (2006) note that in a sample of 61 K1 to
M6 stars, no excess emission is detected at 70 µm. This is
well below the expected detection rate if the disk fraction
is at all comparable to the ∼ 15% observed around solar-
type stars. The discovery of disks around K- and M-type
dwarfs may be difficult at far-infrared wavelengths be-
cause material at similar radii to disks around early-type
stars will be cool and will not radiate sufficiently. How-
ever, submillimeter observing sensitivities and the dust
temperature conspire to allow these objects to be dis-
covered at submillimeter wavelengths (Zuckerman 2001).
Submillimeter observations are sensitive to colder, larger
grains, which are more likely to be optically thin than the
warmer far-infrared emitting dust (Hildebrand 1988).
We report here detections of submillimeter excess emis-
sion at 450 and 850 µm around TWA 7 using the James
Clerk Maxwell Telescope (JCMT). In § 2, we summarize
our observations. In § 3, we present our results. We
discuss the relevance of these data to the TWA and the
population of disks around low-mass stars in § 4; our
results are summarized in § 5.
2. OBSERVATIONS AND DATA REDUCTION
Observations were made in 2004 October 19 using the
photometry mode on the Submillimeter Common User
Bolometer Array (SCUBA) on the JCMT (Holland et al.
1999). The pointing center of the observation was α =
10h42m30.s3, δ = −33◦40′16.′′9 (J2000). The on-source in-
tegration time was 1.6 hours. Flux calibration was done
using Mars, yielding flux conversion factors (FCFs) of
289.2 ± 1.4 Jy beam−1 volt−1 at 850 µm and 367.5 +/-
15 Jy beam−1 volt−1 at 450 µm. The absolute flux cali-
bration is accurate to ∼ 20− 30%. Pointing was checked
on the source 1034 − 293. The weather was excellent
during the observations, with a CSO tau measurment at
225 GHz of ∼ 0.04. The extinction was corrected using
skydips to measure the tau at 850 and 450 µm. The
mean tau value was 0.15 at 850 µm in four skydips and
0.59 at 450 µm in three skydips. The extinction values
derived from the skydips were consistent with the values
extrapolated from the CSO tau values according to the
relations of Archibald et al. (2002).
The data were reduced using the Starlink SURF pack-
age (Jenness & Lightfoot 1998). After flatfielding and
extinction correction, we flagged noisy bolometers rig-
orously; eleven (of 37) bolometers were removed in the
long wavelength array and 26 (of 91) were removed from
the short wavelength array. The photometry data were
then clipped at the 5 σ level to remove extreme values.
Short timescale variations in the sky background were
then removed using the mean of all bolometers except
the central one in each array. The average and variance
were then taken of each individual integration for the
central bolometers of the long and short wavelength ar-
rays after a clip of 3 σ was applied. In the case of the
450 µm data, the signal-to-noise ratio was improved by
applying a subsequent 2σ clip to the remaining data.
3. RESULTS
We have detected emission toward TWA 7 at both 850
µm and 450 µm. The fluxes measured are 9.7 ± 1.6 mJy
(6.1σ) and 23.0 ± 7.2 mJy (3.2σ), respectively. Errors
are statistical, and do not include the typical flux uncer-
tainty of∼ 20−30% for submillimeter single-dish calibra-
tion. Utilizing the measured fluxes of the star (Table 1)
and recently published Spitzer data (Low et al. 2005), we
construct the spectral energy distribution (SED) for this
source (Figure 1). The submillimeter data clearly repre-
sent an excess of emission over the photospheric emission
from TWA 7, as detected at 70 µm by Low et al. (2005).
Fig. 1.— The spectral energy distribution of TWA 7. The optical
and near-infrared data (diamonds) are modeled with a NextGen
model (Hauschildt et al. 1999) scaled to the mass and luminosity
of TWA 7 (Low et al. 2005). The star has a temperature of ∼
3500 K (best fit to stellar photometry, Low et al. 2005; Webb et al.
1999). Triangles mark detections from the Spitzer Space Telescope
and submillimeter detections from the JCMT. These fluxes show
clear excess when compared to the stellar photosphere. The flux
values are reported in Table 1. Fits for two single temperature
blackbodies (parameters are described in Table 2) to the TWA 7
data show that no single temperature fits all four measurements
of excess emission. An 80 K blackbody, Model 1, (dotted line) fits
the 24 and 70 µm data (Low et al. 2005), but underestimates the
submillimeter fluxes. The 70, 450 and 850 µm data are well fit
by a 45 K blackbody, Model 2, (dashed line), but a disk this cold
cannot account for the observed 24 µm excess.
The flux we measure at 850 µm is only about 70%
that measured by Webb (2000). Taking into account
the typical 20 − 30% uncertainty in absolute flux cali-
bration between epochs, the fluxes become 15± 3.8 mJy
(2000) and 9.7 ± 2.5 mJy (2004), which are consistent
Properties of the TWA 7 Disk 3
TABLE 1
Fluxes of TWA 7
Wavelength Magnitude Flux Reference
[µm] [mJy]
0.44 (B) 12.55 39.4 HST Guide Star Catalog (Lasker et al. 1996)
0.44 (B) 12.3 49.7 USNO-A2.0 (Monet et al. 1998)
0.54 (V) 11.06 142.4 reported in Low et al. (2005)
0.64 (R) 11.2 97.4 USNO-A2.0 (Monet et al. 1998)
1.25 (J) 7.78 1259.5 Webb et al. (1999)
1.65 (H) 7.13 1476.3 Webb et al. (1999)
2.16 (Ks) 6.90 1159.1 2MASS PSC
2.18 (K) 6.89 1148.8 Webb et al. (1999)
12 – 70.4 ± 8.6 Weinberger et al. (2004)
24 – 30.2 ± 3.0 Low et al. (2005)
70 – 85 ± 17 Low et al. (2005)
450 – 23 ± 7.2 this work
850 – 9.7 ± 1.6 this work
Note. — Conversion from magnitudes to Janskys has been done using zero points from
Allen’s Astrophysical Quantities (Cox 1999). The 12 µm estimate by Weinberger et al.
(2004) agrees with that of Jayawardhana et al. (1999) to within 1 σ and is consistent with
the photospheric flux expected from TWA 7.
within the errors. The instability of flux calibration in
the submillimeter is well known, and may be assuaged
somewhat by the ability to flux calibrate more often with
SCUBA-2, the next generation submillimeter camera on
the JCMT. In the interests of consistency with the only
450 µm detection in this work, we adopt the 850 µm flux
from the later epoch and do not attempt to combine the
two datasets.
3.1. Submillimeter Excess and Temperature
Low et al. (2005) derive a disk temperature of 80 K for
TWA 7 based on its infrared excess values at 24 and 70
µm. The temperature of the star is derived to be 3500
K, which is consistent with the log(Teff) of 3.56 reported
by Neuhäuser et al. (2000) based on the stellar SED.
Figure 1 shows the flux density distribution toward
TWA 7. The stellar fluxes and fit to the stellar pho-
tosphere are taken from Low et al. (2005) based on
NextGen models by Hauschildt et al. (1999) using a grid
of Kurucz (1979). The submillimeter excesses are evi-
dent when compared with the stellar spectra. Based on
their detection of TWA 7 in the mid-infrared, Low et al.
(2005) determine that the dust orbiting TWA 7 exists at
radii ≥ 7 AU from the star and has a temperature of 80
K. We have constructed several models for dust emission
around a 0.55M⊙ star (see Table 2 for their parameters).
We present two single temperature models in Figure 1,
neither of which is capable of fitting the measured ex-
cess fluxes at all four wavelengths. Model 1 (dotted line
in Fig. 1) is that of Low et al. (2005): a blackbody of
80 K, which fits the 24 and 70 µm data, but cannot re-
produce the fluxes at longer wavelengths. Model 2 is a
colder black body at a temperature of 45 K (dashed line
of Fig. 1), which fits the three longest wavelength fluxes,
but cannot reproduce the 24 µm flux. Thus we conclude
that the dust in this disk cannot be at a single tempera-
ture; rather there are a range of temperatures responsible
for the observed emission.
Two models that are able to fit the observed emission
spectrum are shown in Figure 2. In both models the
dust has a range of temperatures, however there are two
different physical motivations behind the origin of this
range.
Fig. 2.— The measured SED for TWA 7. Symbols and fit to the
stellar spectrum is as for Fig. 1. Two multi-temperature models are
fit to the excess emission arising from the disk. These are described
in detail in the text, and the parameters are summarized in Table
2. In Model 3 (dotted line), different temperatures arise due to
grains of different sizes located at a common radius from the star;
while Model 4 (dashed line) contains grains distributed at a range
of distances from the star.
In Model 3 (dotted line of Fig. 2), the dust is assumed
to all lie at the same distance from the star, r0. It
is assumed to have a range of sizes with a power law
distribution defined by the relation n(D) ∝ D(2−3q) for
grains of size D. This power law is assumed to be trun-
cated below dust of size Dmin = 0.9 µm, the size for
which β = Frad/Fgra = 0.5 for compact grains. The
dust is assumed to be composed of compact spherical
grains of a mixture of organic refractories and silicates
(Li & Greenberg 1997), and interaction with stellar radi-
ation determines the temperature that dust of different
sizes attains. The same model was used in Wyatt & Dent
(2002) to model the emission from the Fomalhaut disk,
where more detail can be found on the modelling method.
The emission spectrum could be fitted by dust at a ra-
dius r0 = 100 AU with a size distribution described by
q = 1.78. The temperatures of dust in this model range
from 21 K for the largest grains (> 100 µm) to 65 K for
the smallest grains (0.9 µm).
The size distribution used in Model 3 is close to that
expected in a collisional cascade wherein dust is replen-
ished by collisions between larger grains, since this re-
4 Matthews et al.
TABLE 2
Dust Model Parameters
Model Plotted as Temperature Dmin Dmax q Radius Mass (M⊕)
1 dotted Fig. 1 80 K – – – – 0.025
2 dashed Fig. 1 45 K – – – – 0.2
3 dotted Fig. 2 21 - 65 K 0.9 µm 1 m 1.78 100 AU 6.0
4 dashed Fig. 2 > 38 K – – – ≤ 35 AU 0.2
sults in a size distribution with q ∼ 1.83 and would be
truncated below the size of dust for which radiation pres-
sure would place the dust on hyperbolic orbits as soon
as they are created. The effect of radiation forces on
small grains is quantified by the parameter β = Frad/Fgra
(which is not to be confused with the index of dust emis-
sivity of grains that moderates the Rayleigh-Jeans tail
of the SED of cold dust), and it is dust with β > 0.5
which is unbound. However, due to the low luminosity
of M stars, it is not clear whether radiation pressure is
sufficient to remove dust grains from the disk system.
Figure 3 shows β as a function of dust grain size for
dust around TWA 7 (M∗ = 0.55 M⊙, L∗ = 0.31 L⊙).
While β is larger than 0.5 for compact (< 0.9 µm) grains,
this condition is only met for a narrow region of the
size distribution. Furthermore, if the dust grains are
porous, as around AU Mic (Graham et al. 2007), then
no grains will have β > 0.5. An alternative origin for
the small grain cut-off could be stellar wind forces, since
these provide a pressure force similar to radiation pres-
sure (Augereau & Beust 2006), and it is known that stel-
lar wind forces can be significantly stronger than radia-
tion forces for M stars (Plavchan, Jura & Lipscy 2005).
While the smallest grain size in the distribution may dif-
fer from our value of 0.9 µm, this does not affect the
ability of the model to fit the observed emission spec-
trum with suitable modifications to the slope in the size
distribution and radius of the dust belt.
In Model 4 (dashed line of Fig. 2), the dust grains
are assumed to lie at a range of distances from the star,
but they are all assumed to emit like black bodies, and
so have temperatures T = 278.3 L0.25
r. The spatial
distribution of the grains was taken from the model of
Wyatt (2005b) in which dust is created in a planetesimal
belt at r0, and then migrates inward due to Poynting-
Robertson (P-R) drag, but with some fraction of the
grains removed by mutual collisions on the way. In the
model the dust ends up with a spatial distribution which
can be described by the parameter η0, such that the sur-
face density falls off ∝ 1/(1 + 4 η0 (1 −
r/r0)). The
emission spectrum could be fitted using the parameters
r0 = 30 AU and η0 = 10, resulting in dust with tem-
peratures upwards of 38 K. However, the density of the
planetesimal belt required to scale the resulting emission
spectrum showed that removal by collisions dominates
over the P-R drag force in such a way that η0 should
be closer to 1400 in this disk. Thus, if this model is to
have a true physical motivation, then we need to invoke a
drag force which is ∼ 140 times stronger than P-R drag.
Such a force could come from the stellar wind, which
causes a drag force similar to P-R drag and which can
be incorporated into the model by reducing η0 by a fac-
tor 1 + (dMwind/dt) c
2/L∗ (Jura 2004). Thus to achieve
η0 = 10 in this way we would require the stellar wind to
be ∼ 140 times stronger than that of the Sun. The high
X-ray luminosity of TWA 7 (LX = 9.2±1.0×1029 erg/s,
Stelzer & Neuhauser 2000) may be indicative of a strong
stellar wind, since measured mass loss rates have been
found to increase with X-ray flux (Wood et al. 2005).
However the correlation found by (Wood et al. 2005, see
their Fig. 3) breaks down at X-ray fluxes an order of mag-
nitude lower than that of TWA 7 (for which FX ∼ 8×106
erg cm−2 s−1, and so it is not possible to use this flux to
estimate the mass loss rate with any certainty, and we
simply note that the mass loss rate required to achieve
η0 = 10 is not incompatible with observations of mass
loss rates of other stars. A stellar wind drag force has
also been invoked to explain structure in the AU Mic disk
(Strubbe & Chiang 2006; Augereau & Beust 2006).
We note that we are not claiming that the radius, r,
size distribution, q, or parameter η0 have been well con-
strained by these fits. The SED does indicate that the
disk around TWA 7 contains grains at a range of temper-
atures. The fits of Figure 2 illustrate two ways in which
multiple temperatures in the disk may arise from phys-
ically motivated models: the dust could have a range of
sizes, or it could be distributed over a range of distances.
Other models may also fit the data, including those in
which dust has a range of distances and sizes, and those
in which the dust originates in not one but multiple dust
zones, as has been inferred for AU Mic (Fitzgerald et al.,
in prep.).
Fig. 3.— Ratio of the radiation force acting on dust grains of
different size in the TWA 7 disk to the force of stellar gravity, β =
Frad/Fgra. Two different models are shown: compact grains (p =
0) and porous grains (p = 0.9). Dust with β > 0.5 is unbound from
the star as soon as it is released from a planetesimal. This figure
demonstrates that radiation pressure is not a significant mechanism
to remove dust grains from the TWA 7 system and is particularly
ineffective if the grains are highly porous.
The derived fits of Figure 2 do show differences in the
mid-IR spectrum. While this suggests that knowledge
of the mid-IR spectrum would enable us to distinguish
between the two models, it is worth pointing out that
Properties of the TWA 7 Disk 5
the exact shape of this spectrum is very sensitive to the
size distributions of small grains (in Model 3) and to the
radial distribution of grains interior to the planetesimal
belt (in Model 4). Thus it is possible that different as-
sumptions about these distributions could be made to
provide a fit to the same observed spectrum with both
models. The only way to definitively break the degen-
eracy is through imaging of the thermal emission from
the mid-IR to the submillimeter. The single radius disk
must look the same at all wavelengths, while a radially
distributed disk would look larger at longer wavelengths,
as more of the cold dust further from the star is detected.
3.2. Mass of the Disk
Low et al. (2005) estimate the minimum mass in dust
of the TWA 7 disk to be 0.0033 Mlunar, under the as-
sumption that the dust grain size is 2.8 µm. In deter-
mining the mass of the disk from submillimeter measure-
ments, a key parameter is the temperature of the dust.
Where possible, we discuss the derived mass for each of
the models discussed above. For submillimeter fluxes, we
can estimate the mass of the disk directly for an assumed
temperature and opacity from the relation:
Mdisk =
κν Bν(Td)
where Bν(Td) is the Planck function at the dust temper-
ature, Td, and κν is the absorption coefficient of the dust.
The derived mass is a strongly dependent function of the
value of κν . For debris disk studies, a value of 1.7 cm
g−1 is appropriate at 850 µm (Dent et al. 2000). This is
at the upper end of the values derived by Pollack et al.
(1994).
We can estimate the mass in Model 1 by using the
submillimeter flux predicted from the Rayleigh-Jeans tail
of the model (2.4 mJy) as well as the 80 K temperature
and standard opacity. This gives a mass of 0.025 M⊕
(∼ 2 Mlunar). This can be interpreted as the mass of hot
dust, with the proviso that the submillimeter observation
shows that there is more mass in colder dust as well.
Model 2 fits the submillimeter and 70 µm emission well.
Under the estimate of 45 K for the dust temperature of
the disk, the standard dust opacity and the measured 850
µm flux, the TWA 7 disk contains 0.2M⊕ of material (18
Mlunar). Based on the uncertainty in the flux (random
and systematic), the uncertainty on the mass estimates
are ∼ 30%. The mass of the TWA 7 disk is an order
of magnitude greater than the mass of 0.011 M⊕ for the
disk detected around AU Mic (Liu et al. 2004) that is
also derived based on a single temperature fit to far-IR
and submillimeter data with κν = 1.7 cm
2 g−1.
Model 3 contains dust grains at different temperatures
based on their size. In this case, the size distribution of
dust is well defined (i.e., with known scaling), the total
mass of dust is given by Mtot = 11 D
max, where Mtot is
the mass in Earth masses and Dmax is in meters. While
it is impossible to observationally knowDmax (since large
planetesimals are effectively invisible), in our model 95%
of the 850 µm flux comes from grains < 0.4 m. This
implies a mass of 6 M⊕, significantly higher than that
of Model 2. In the TWA 7 system, Dmax could be even
larger (or smaller) than this, so this discrepency carries
little definitive weight. The size distribution is relatively
shallow (i.e., there are lots of large grains) which explains
why the mass is much larger than in Model 2.
To derive a mass for Model 4 we assume an opacity
of 1.7 cm2 g−1 and determine the mass of dust in the
model required to reproduce the observed 850 µm flux,
given that the dust has a range of temperatures. The
derived mass is almost exactly the same as that at of
Model 2 at 0.2M⊕. This is to be expected because the
submillimeter flux in both models is dominated by the
coldest dust at 38-45 K, and very little additional mass
is required in Model 4 to explain the mid-IR emission (as
illustrated by the low mass in Model 1).
Given the inherent assumptions and unknowns in each
of these models, we adopt the results of the highly sim-
ple Model 2 as our most robust estimate of the mass
of the dust disk in TWA 7. It depends on the tem-
perature of the grains producing the observed 850 µm
flux, which are well fit by the cold dust model of Fig-
ure 1. The mass is also highly dependent on the value
of κν , for which we have adopted a value in line with
other disk modeling work (Dent et al. 2000), and so can
be compared with the disk masses derived by other au-
thors (e.g., Wyatt, Dent & Greaves 2003; Liu et al. 2004;
Najita & Williams 2005).
4. DISCUSSION
4.1. Disks in the TW Hydrae Association
Recent MIPS data from the Spitzer Space Telescope
show that most stars in the TWA show no evidence of
circumstellar dust out to 70 µm (Low et al. 2005). These
results confirm the assertion of Weinberger et al. (2004)
and Greaves & Wyatt (2003) that there is a bimodal dis-
tribution in the TWA: either stars have strong excesses
associated with warm dust emission, or they have very
weak emission consistent with very cold disks. The ex-
ceptions are the proto-planetary disks around TW Hya
and Hen 3-600, which both show evidence of warm and
cold disk components (Wilner et al. 2000; Zuckerman
2001).
Lower limits on warm dust emission were set for all
TWA members (except TW Hya) by Weinberger et al.
(2004). The absence of dust near the star is often taken
as a signature of a centrally depleted debris, rather than
an accreting, proto-planetary disk. There are a few
main sequence stars for which warm dust is present (i.e.,
HD 69830 and η Corvi), but this may be a transient
phenomenon (Wyatt et al. 2007). Based on their data,
Weinberger et al. (2004) deduced that the non-detections
implied an absence of material in terrestrial planet region
and that, except for TW Hya, there were no long-lived
disks in the association which could still form planets.
An outstanding question is whether the stars without
warm disks have potential disk material locked up in
undetected planets, which would imply that dusty sys-
tems represent failed planetary systems, or whether non-
detections represent disk systems in which the dust is
too cold to be detected in the mid-IR. Only large-scale
searches for cold dust around a statistically significant
number of stars can clarify whether a sizable population
of disks exist which are too cold to detect at mid-IR
wavelengths. Such a survey is planned using the new
SCUBA-2 camera at the JCMT (Matthews et al. 2007).
The disks now known in the TWA each have very dif-
ferent properties. The disk around TW Hya (K7) still
6 Matthews et al.
TABLE 3
Fluxes and Masses of Disks in the TWA
Star Sp. Type Gas? Optical Depth Flux λ Mass Temp Reference
[mJy] [Mlunar] [K]
HR 4796A A0 no thin 19.1 ± 3.4 850 µm 19 99 Sheret et al. (2004)
HD 98800 K5 no thick 111.1 ± 0.01 800 µma 28 150 Prato et al. (2001)
TW Hya K7 yes thick 8± 1 7 mm 8× 105 – Wilner et al. (2000), SED fit
TWA 7 M1 no thin 9.7± 1.6 850 µm 18 45 this work
TWA 13 M1 no thin 27.6 ± 5.9 70 µm > 0.0019 65 Low et al. (2005)
Hen 3-600 M3 yes thick ∼ 65 850 µm 304b 20c Zuckerman (2001)
Note. — a κ850µm adjusted by 1/λ.
b mass estimate based on conservative value of κ850µm = 1.7 cm
2 g−1; c temperature
constraint on cold dust component only.
contains molecular gas (Kastner et al. 1997), meaning
it is proto-planetary or in a transition from a proto-
planetary to a debris disk. It is a broad, face-on disk
which extends to 135 AU with evidence of a dip in
flux at 85 AU (Krist et al. 2000; Wilner et al. 2000;
Trilling et al. 2001; Weinberger et al. 2002; Qi et al.
2004). Hen 3-600 also shows evidence of hosting an
accreting, gas-rich disk (Muzerolle et al. 2000). The
HR 4796A (TWA 11) disk contains a narrow dust ring
at 70 AU from the star (Jayawardhana et al. 1998;
Koerner et al. 1998; Schneider et al. 1999; Telesco et al.
2000) and no detectable gas in emission line stud-
ies (Greaves, Mannings & Holland 2000) or more recent
searches for absorption due to circumstellar gas along the
line of sight (Chen & Kamp 2004), a technique which is
highly dependent on the temperature profile of the gas
because the disk is not edge-on. In both the Hen 3-600
and HD 98800 systems, the disk orbits only one mem-
ber of the binary (Jayawardhana et al. 1999; Gehrz et al.
1999) with evidence for cooler dust in circumbinary or-
bits (Zuckerman 2001). In fact, HD 98800 is a quadruple
system, so the dust is in a circumbinary orbit around a
spectroscopic binary.
The discussion of the TWA disk population can be il-
luminated by the results of recent Spitzer study of the
Upper Sco Association (with an estimated age of 3 − 5
Myr) by Carpenter et al. (2006). Their MIPS obser-
vation found optically thin disks around A-type stars,
no disks around solar-like stars, and optically thick, ac-
creting disks around K- and M-type dwarfs, suggest-
ing that disk evolution proceeds more quickly around
higher mass stars. A similar result was found for the
H and χ Persei double cluster at 13 Myr (Currie et al.
2007). The sample of stars in Upper Sco was 204
stars with 31 detections at 8 µm, well distributed across
spectral types. In the TWA, we have disk detections
around only a handful of members, and the associa-
tion is much smaller, with only 18 members. How-
ever, both the optically-thick, gas-rich disks in the TWA
are hosted by K and M stars, whereas the only A star
with a disk hosts an optically-thin debris disk. We
note that HD 98800 is noted to be gas-poor, but with
an optically-thick dust disk (Weintraub, Kastner & Bary
2000; Zuckerman & Becklin 1993).
The masses and temperatures of the disks are com-
pared for the TWA members with disks in Table 3. All
six disks are detected and their SEDs modeled in the
recent paper by Low et al. (2005). Where possible, the
masses in Table 3 are derived from submillimeter fluxes
or fits to SEDs, rather than infrared values which pro-
vide lower limits only. This is only an issue for TWA
13 which has not yet been detected at submillimeter or
millimeter wavelengths. Scaling κ850µm = 1.7 cm
2 g−1
to 70 µm implies a lower mass limit of 0.1 Mlunar based
on cold grains.
Of the debris disks, the most massive is the disk around
the HD 98800 system, but its mass is comparable to that
of the disk around the earlier star HR 4796A. Based on
submillimeter masses, the TWA 7 disk is roughly compa-
rable to that of HR 4796A. It is clear that we cannot yet
identify a systematic trend in mass or evolutionary phase
with spectral type in the TWA. We note that, given the
presence of massive disks around components of the mul-
tiple systems HD 98800 and HR 4796, their dust disk life-
times do not appear to be any shorter than that around
a single star.
4.2. Disks Around Late-type Stars
Table 4 shows the compilation of the few known debris
disks around late-type (K and M) stars. Although a de-
bris disk has been historically claimed around HD 233517
(Sylvester et al. 1989), Jura et al. (2006) conclude that
it is a giant, not a main sequence, star, and so we do
not include it. Similarly, we exclude HD 23362, although
it has a measured excess, because Kalas et al. (2002) at-
tribute the emission to a uniform surrounding dust cloud,
not a debris disk, around a K2III star at 187 pc dis-
tance. All fluxes are measured at 850 µm with SCUBA,
except where noted. For one of these stars, only an up-
per limit is observed in the 850 µm flux. Of the eight
solid detections, two (HD 92945, TWA 13) are at a sin-
gle wavelength in the mid-infrared, and two (GJ 182 and
GJ 842.2) are detected only in the submillimeter. For the
four remaining disks (HD 53143, ǫ Eri, AU Mic and TWA
7), there is a trend of increasing mass with later spectral
type, but it must be noted that the mass dependence
could also be attributed to the known trend of declining
dust masses around older stars (Rhee et al. 2006) in the
cases of HD 53143 and ǫ Eri since they are significantly
older than AUMic and TWA 7. The youngest star, TWA
7, has the most massive disk. We are obviously also bi-
ased toward more massive disks at larger distances. AU
Mic’s disk (9.9 pc) could not have been detected in the
observation which detected TWA 7’s disk at 55 pc.
Spitzer has made great progress in the last year de-
tecting infrared excess from main sequence stars (see the
review by Werner et al. 2006). However, not many of
these have been around late-type stars, and for those
Properties of the TWA 7 Disk 7
TABLE 4
Masses and Mass Limits of Debris Disks around Late-type Main Sequence Stars
Star Spectral Distance Age Flux λ Mass Temp Mass Reference
Type [pc] [Myr] [mJy] [Mlunar] [K]
HD 69830a K0 12.6 600-2000 < 7 850 µm < 0.24 100 this work
HD 92945 K1 22 20-150 271 70 µm > 0.002 40 Chen et al. (2005)
HD 53143 K1 18 1000 82.0± 1.1 30-34 µm > 6.5× 10−6 120± 60 Chen et al. (2006)
– optical > 0.0096 60b Kalas et al. (2006)
ǫ Eri K2 3 730± 200 40 ± 1.5 850 µm 0.1 85 Sheret et al. (2004)
GJ 842.2 M0.5 20.9 200 25 ± 4.6 850 µm 28± 5 13 Lestrade et al. (2006)
GJ 182c M0.5 26.7 100
4.8± 1.2 850 µm > 2.1 40 + 150d Liu et al. (2004)
AU Micc M1 9.9 10 14.4± 1.8 850 µm 0.89 40 Liu et al. (2004)
TWA 7c M1 55 8 9.7± 1.6 850 µm 18 45 this work
TWA 13c M1 55 8 27.6± 5.9 70 µm > 0.0019 65 Low et al. (2005)
Note. — a We have used the adopted temperature of 100 K but revised downward the estimated flux and mass based on
re-analysis of data originally presented in Sheret et al. (2004). Wyatt et al. (2007) suggest that the warm dust around this
source must be transient. b Temperature from Zuckerman & Song (2004). c Ages derived from association membership. d
two-component fit.
detections which have been made (Gorlova et al. 2004;
Beichman et al. 2005; Uzpen et al. 2005; Bryden et al.
2006; Smith et al. 2006), no estimates of mass in the disk
exist. The detections are typically toward field stars or
very distant targets in the galactic plane, although one
(P922) is a member of the cluster M47 (Gorlova et al.
2004). We do not list these candidates in Table 4. One
exception is the excess around HD 92945 (K1V), for
which Chen et al. (2005) measure a minimum disk mass
of 2× 10−3 Mlunar.
As discussed in § 3.1, the degeneracies in the models
are best broken with thermal imaging. Of the disks com-
piled in Table 4, only ǫ Eri (3.3 pc) has been well resolved
at submillimeter wavelengths. The distance of the TWA
makes imaging of 100 AU (∼ 2′′) scale disks impossi-
ble with single dish telescopes in the submillimeter. ǫ
Eri would be exceedingly difficult to map if it were even
three times more distant. We will have to rely on arrays
with higher sensitivity to obtain maps like that of ǫ Eri
around most low mass stars unless many more are dis-
covered within 10 pc. In the long-term, mid-IR images
will be possible with the planned MIRI instrument on the
James Webb Space Telescope. In the short-term, submil-
limeter imaging will be possible with the Atacama Large
Millimeter-submillimeter Array (ALMA). Far-IR obser-
vations will be possible with the Herschel Space Obser-
vatory, although imaging of 100 AU disks will only be
possible for stars within 10 pc.
5. SUMMARY
We have detected submillimeter excess emission aris-
ing from the dust disk around TWA 7 at 450 and 850
µm using SCUBA on the JCMT. Based on our photom-
etry and recent data from Spitzer, we derive a disk mass
of 0.2 M⊕ (18 Mlunar) for a temperature of 45 K. This
model effectively fits the 70, 450 and 850 µm data with
a blackbody. To fit these data and the 24 µm flux re-
quires dust at a range of temperatures, and we show that
this could arise from dust at one radius with a range of
sizes, or from dust of one size at a range of distances from
the star. Based on the SED alone, it is not possible to
determine which physical model is dominant.
While the multiple system HD 98800 appears to har-
bour the most massive debris disk in the TWA, disks of
relatively comparable masses are observed around the A0
star HR 4796A and the M1 star TWA 7. Therefore, the
formation of debris disks does not appear to be solely a
function of the mass of the parent star. A comparison of
masses of disks in the TWA reveals no trend in mass or
evolutionary state (gas-rich, proto-planetary vs. debris)
as a function of spectral type, although the detection of
proto-planetary disks around the latest stars is consis-
tent with the results of Carpenter et al. (2006) toward
the Upper Sco Association and Currie et al. (2007) in
the double cluster H and χ Persei.
Kalas et al. (2006) came to the same conclusion with
regard to other debris disks. They surmise that nur-
ture could explain the presence or absence of disks at
later epochs. If the environment dynamically heats the
disk such that the large planets fail to form, then dust
remains for a longer timescale. The dynamically quiet
systems then may quickly form planets, leaving no disk
to be observed at later epochs, although there is as yet
no evidence for any correlation between stars with debris
and/or planets (Moro-Martin et al. 2006).
The authors acknowledge our anonymous referee for
an insightful and constructive report. As well, the au-
thors thank B. Zuckerman for providing us with the pre-
vious 850 µm flux measurement from the thesis of R.
Webb, and P. Smith for providing the stellar fit to TWA
7 of Low et al. (2005) to maximize consistency with their
analysis. We also acknowledge useful conversations with
P. Hauschildt and R. Gray regarding the spectra of M
dwarfs. We thank our telescope operator E. Lundin and
the staff at the JCMT for their support. B.C.M ac-
knowledges support of the National Research Council of
Canada through a Plaskett Fellowship. P.K. acknowl-
edges support from GO-10228 provided by STScI under
NASA contract NAS5-26555. M.C.W. acknowledges sup-
port of the Royal Society.
8 Matthews et al.
REFERENCES
Archibald, E.N. et al. 2002, MNRAS, 336, 1
Augereau, J.-C., & Beust, H. 2006, A&A, 455, 987
Beichman, C.A., et al. 2005, ApJ, 622, 1160
Beichman, C.A., et al. 2006, ApJ, 652, 1674
Bryden, G., et al. 2006, ApJ, 636, 1098
Carpenter, J., et al. 2006, ApJ, 651, 49
Chen, C.H., et al. 2006, ApJS, 166, 351
Chen, C.H., et al. 2005, ApJ, 634, 1372
Chen, C.H., & Kamp, I. 2004, ApJ, 602, 985
Cox, A. 1999, Allen’s Astrophysical Quantities, (AIP Press: New
York).
Currie, T., et al. 2007, ApJ, in press, astro-ph/0701441
de la Reza, R., Torres, C.A.O., Quast, G., Castilho, B.V., Vieira,
G.L. 1989, 343, 61
Dent, W.R., Walker, H.J., Holland, W.S., & Greaves, J.S. 2000,
MNRAS, 314, 702
Gehrz, R., et al. 1999, ApJ, 512, 55
Gorlova, N., et al. 2004, ApJS, 154, 448
Graham, J.R., Kalas, P., & Matthews, B.C. 2007, ApJ, 654, 595
Greaves, J.S., et al. 1998, ApJ, 506, 133
Greaves, J.S., Mannings, V., Holland, W.S. 2000, Icarus, 143, 155
Greaves, J.S., & Wyatt, M.C. 2003, MNRAS, 345, 1212
Hauschildt, P.H., Allard, F., & Baron, E. 1999, ApJ, 512, 377
Hildebrand, R. 1983, QJRAS, 24, 267
Holland, W.S., et al. 1999, MNRAS, 303, 659
Jayawardhana, R., et al. 1999, ApJ, 521, L129
Jayawardhana, R., et al. 1998, ApJ, 503, 79
Jenness, T., & Lightfoot, J.F., 1998, ASPC, 145, 216
Jura, M. 1991, ApJ, 383, 79
Jura, M. 2004, ApJ, 603, 729
Jura, M., et al. 2006, ApJ, 637, 45
Kalas, P. et al. 2002, ApJ, 567, 999
Kalas, P., Graham, J.R,, Clampin, M.C., & Fitzgerald, M.P. 2006,
ApJ, 637, 57
Kalas, P., Liu, M., & Matthews, B. 2004, Science, 303, 1990
Kastner, J.H., Zuckerman, B., Weintraub, D.A., & Forveille, T.
1997, Science, 277, 67
Koerner, D.W., Ressler, M.E., Werner, M.W., & Backman, D.E.
1998, ApJ, 503, 83
Krist, J.E., et al. 2000, ApJ, 538, 793
Kurucz, R.L. 1979, ApJS, 40, 1
Lasker, B.M., Russel, J.N., & Jenkner, H. 1996, The Guide Star
Catalog Version 1.2
Lestrade, X. et al. 2006, A&A, 460, 733
Li, A., & Greenberg, J.M. 1997, A&A, 323, 566
Liu, M., Matthews, B., Williams, J., & Kalas, P. 2004, ApJ, 608,
Low, F.J., et al. 2005, ApJ, 631, 1170
Lowrance, P.J., et al. 2005, AJ, 130, 1845
Matthews, B.C., et al. 2007, submitted to PASP
Monet, D. G., et al. 1998, USNO-A2.0 Catalog (Flagstaff: USNO)
Moro-Martin, A., et al. 2006, astro-ph/0612242
Muzerolle, J., Calvet, N., Brinceño, C., Hartmann, L., &
Hillenbrand, L. 2000, ApJ, 535, 47
Najita, J. & Williams, J.P. 2005, ApJ, 635, 625
Neuhäuser, et al. 2000, A&A, 354, L9
Plavchan, P., Jura, M., & Lipscy, S.J. 2005, ApJ, 631, 1161
Pollack, J.B., Hollenbach, D., Beckwith, S., Simonelli, D.P., Roush,
T., & Fong, W. 1994, ApJ, 421, 615
Prato, L., et al. 2001, ApJ, 549, 590
Qi, C., et al. 2004, ApJ, 616, 11
Rhee, J.H., Song, I., Zuckerman, B., & McElwain, M. 2006,
accepted to the Astrophysical Journal
Schneider et al. 1999, ApJ, 513, 127
Sheret, I., Dent, W.R.F., & Wyatt, M.C. 2004, MNRAS, 348, 1282
Smith, P.S., et al. 2006, ApJ, 644, L125
Stauffer, J.R., Hartmann, L.W., & Barrado y Navascues, D. 1995,
ApJ, 454, 910
Stelzer, B., & Neuhäuser, R. 2000, A&A, 361, 581
Strubbe, L.E., & Chiang, E.I. 2006, ApJ, 648, 652
Sylvester, R.J., Dunkin, S.K., & Barlow, M.J. 1989, MNRAS, 327,
Telesco, C.M., et al. 2000, ApJ, 530, 329
Trilling et al. 2001, ApJ, 552, L151
Uzpen, B., et al. 2005, ApJ, 629, 512
Webb, R. 2000, UCLA PhD Thesis
Webb, R. et al. 1999, ApJ, 512, L63
Weinberger, A.J., Becklin, E.E., Zuckerman, B., & Song, I. 2004,
AJ, 127, 2246
Weinberger, A.J., et al. 2002, ApJ, 566, 409
Weintraub, D.A., Kastner, J.H., & Bary, J.S. 2000, ApJ, 541, 767
Weintraub, D.A., Sandell, G., & Duncan, W.D. 1989, ApJ, 340,
Werner, M., et al. 2006, ARA&A, 44, 269
Wilner, D., Ho, P., Kastner, & Rodŕıguez 2000, ApJ, 534, 101
Wood, B.E., et al. 2005, ApJ, 628, L143
Wyatt, M.C. et al. 2007, ApJ, 658, 569
Wyatt, M.C. 2005a, IAU Colloquium, 197, 383
Wyatt, M.C. 2005b, A&A, 433, 1007
Wyatt, M.C., & Dent, W.R.F. 2002, MNRAS, 334, 589
Wyatt, M.C., Dent, W.R.F., & Greaves, J.S. 2003, MNRAS, 342,
Zuckerman, B. 2001, ARA&A, 39, 549
Zuckerman, B., & Becklin, E.E. 1993, ApJ, 406, L25
Zuckerman, B., & Song, I. 2004, ARA&A, 42, 685
http://arxiv.org/abs/astro-ph/0701441
http://arxiv.org/abs/astro-ph/0612242
ABSTRACT
  We present photometric detections of dust emission at 850 and 450 micron
around the pre-main sequence M1 dwarf TWA 7 using the SCUBA camera on the James
Clerk Maxwell Telescope. These data confirm the presence of a cold dust disk
around TWA 7, a member of the TW Hydrae Association. Based on the 850 micron
flux, we estimate the mass of the disk to be 18 lunar masses (0.2 Earth masses)
assuming a mass opacity of 1.7 cm^2/g with a temperature of 45 K. This makes
the TWA 7 disk (d=55 pc) an order of magnitude more massive than the disk
reported around AU Microscopii (GL 803), the closest (9.9 pc) debris disk
detected around an M dwarf. This is consistent with TWA 7 being slightly
younger than AU Mic. We find that the mid-IR and submillimeter data require the
disk to be comprised of dust at a range of temperatures. A model in which the
dust is at a single radius from the star, with a range of temperatures
according to grain size, is as effective at fitting the emission spectrum as a
model in which the dust is of uniform size, but has a range of temperatures
according to distance. We discuss this disk in the context of known disks in
the TW Hydrae Association and around low-mass stars; a comparison of masses of
disks in the TWA reveals no trend in mass or evolutionary state (gas-rich vs.
debris) as a function of spectral type.

<|endoftext|><|startoftext|>
Introduction
Simulated annealing (SA) is used in a wide variety of biomolecular
calculations. Crystallographic refinement protocols[2] and standard
NMR structure calculations[3, 4, 5] both rely on SA to optimize a
“target function,” constructed so that the global minimum of the tar-
get function corresponds to the native structure. Molecular dynamics
calculations often begin by cooling a configuration from a high tem-
perature ensemble to a lower temperature, at which the simulation is
to be performed.
In this paper, we consider a different use for SA calculations. Since
a set of structures that is generated by a series of SA trajectories is a
nonequilibrium sample, they may not be used to calculate equilibrium
averages. However, Neal demonstrated a simple procedure, called “an-
nealed importance sampling” (AIS) that allows the nonequilibrium
sample to be reweighted into an equilibrium one[1]. AIS is closely
connected with the Jarzynski relation[6]. To our knowledge, the al-
gorithm has only appeared once in the chemical physics literature[7],
where it was used (along with sophisticated Monte Carlo techniques)
to sample a one-dimensional potential. Here, we demonstrate an ap-
plication of the AIS algorithm to generate an equilibrium sample of
an implicitly solvated peptide, and discuss other uses for AIS which
may of interest to the molecular simulation community.
The basic idea which underlies SA is also the motivation for other
temperature based sampling methods, notably J-walking[8], simulated
tempering[9, 10] and replica exchange/parallel tempering[11, 12]. By
coupling a simulation to a high temperature reservoir, it is hoped that
the low temperature simulation may explore the configuration space
more thoroughly. This is achieved by thermally activated crossing
of energetic barriers, which are large compared to the thermal en-
ergy scale of the lower temperature simulation, but are crossed more
frequently at higher temperature. Simulated and parallel tempering
differ in the way that the different temperature simulations are cou-
pled. Simulated tempering heats and then cools the system, in a way
that maintains an equilibrium distribution. Parallel tempering couples
simulations run in parallel at different temperatures by occasionally
swapping configuartions between temperatures, again in such a way
that canonical sampling is maintained.
AIS offers yet another approach to utilizing a high temperature
ensemble for equilibrium sampling at a lower temperature. A sam-
ple of a high temperature ensemble is annealed to a lower temper-
ature, by alternating constant temperature simulation with steps in
which the tempertaure is jumped to a lower value. Each annealed
structure is assigned a weight, which depends on the trajectory that
was traced during the annealing process. Equilibrium averages over
the lower temperature ensemble may then be calculated by a simple
weighted average. Furthermore, the distribution of trajectory weights
contains useful information about the statistics of the annealed sam-
ple. Roughly, a schedule which quenches high temperature structures
very rapidly to low temperature will result in a sample dominated by
a few high weight structures, resulting in poor statistics. This con-
nection between the distribution of weights and the extent to which
the schedule is not adiabatic ought to be of interest to anyone who
uses SA protocols—whether for equilibrium sampling or for structure
calculation.
We have used the AIS method to generate 298 K equilibrium en-
sembles of the dileucine peptide, by annealing structures from a 500
K distribution with several different cooling schedules. For the most
efficient schedule used, we found a modest gain (about a factor of 3)
over constant temperature simulation. This result is consistent with
earlier observations on the expected efficiency of temperature-based
sampling methods[13].
2 Theory
Consider a standard simulated annealing (SA) trajectory, in which a
protein is slowly cooled from a conformation x at a (high) temperature
TM . The cooling is achieved by alternating constant temperature
dynamics with “temperature jumps,” during which the temperature
is lowered instantaneously. Usually, the system is cooled to a low
temperature, since the aim of standard SA calculations is to find the
global minimum on the energy landscape. But we can imagine instead
ending the run at T0 = 300 K—in fact, we can think of many such
runs, all ending at 300 K. We then have an ensemble of conformations,
though clearly not distributed canonically at T0. We would like to
know if there is a way to reweight this distribution, so that it can be
used to compute equilibrium averages at T0. The affirmative answer
is provided by the annealed importance sampling (AIS) method.
To make the discussion more concrete, consider many independent
annealing trajectories xj(t) which at time tM−1 have just been cooled
from inverse temperature βM to βM−1. As usual, each temperature
defines a distribution of conformations: πi(x) ∝ exp[−βiU(x)]. Imme-
diately after tM−1, before the system is allowed to relax to πM−1(x),
we can compute the equilibrium average of an arbitrary quantity A
over πM−1(x) by using the weight w(x) = πM−1(x)/πM (x):
〈A〉M−1ZM−1 =
dxA(x)πM−1(x) =
dxA(x)πM (x)w(x), (1)
where 〈A〉i denotes an average over πi, and Zi =
dxπi(x). In other
words, we may reweight the distribution πM(x) to calculate averages
over πM−1(x), by multiplying by the ratio of Boltzmann factors.
Generalizing the argument toM temperature steps is straightforward[1],
by forming the product of weights for successive cooling steps:
wj ≡ w(xj(t0)) =
πi−1(xj(ti−1))
πi(xj(ti−1))
. (2)
Equation 2 gives the weight for trajectory j, cooled at successive times
tM−1, tM−2,... through inverse temperatures βM , βM−1,... to reach
conformation xj(t0). At each temperature, reweighting ensures that
averages may be calculated for the appropriate canonical distribution,
even though the system has not yet relaxed.
The AIS idea is easily turned into an algorithm for producing a
canonical distribution from serially generated annealing trajectories:
(i) Generate a sample of the distribution πM(x), by a
sufficiently long simulation at TM .
(ii) Pull a conformation from πM (x) at random and anneal
down to β0, yielding conformation x1(t0). Keep track of the
weight w(x1(t0)) for this trajectory by Eq. 2.
(iii) Repeat steps (iii) and (iv) N times, yielding congiura-
tions xj and weights w(xj) ≡ wj for j = 1, 1, ..., N .
Equilibrium averages at temperature T0 are then calculated by a
weighted average:
〈A〉0 =
j=1wjAj∑N
The cooling schedule is defined by the number and spacing of the
temperature steps, as well as the duration of the constant temperature
simulation at each step. As available resources necessarily limit the
CPU time spent on each annealing trajectory, careful consideration of
the schedule is in order. Clearly, a schedule in which high temperature
configurations are quenched in one step to low temperature amounts
to a single-step reweighting procedure[14]. We may expect that such a
schedule would be quite ineffective for large temperature jumps, since
very few configurations in the high temperature distribution have ap-
preciable weight in the low temperature distribution. By introducing
intermediate steps, the system is allowed to relax locally, bridging the
high and low temperature distributions in a way that echoes replica
exchange protocols[11, 12], simulated tempering[9, 10], and the multi-
ple histogram method[15]. However, the “top-down” structure of the
algorithm most closely resembles J-walking[8, 16].
3 Results
The dileucine peptide (ACE-[Leu]2-NME) is good choice for the vali-
dation of new algorithms, as it is small enough (50 atoms, including
nonpolar hydrogens) that exhaustive sampling by standard simulation
methods is possible, yet more akin to protein systems than a one- or
two-dimensional “toy” model.
The high temperature ensemble was generated by 300 nsec of
Langevin dynamics at TM = 500 K, as implemented in Tinker v.
4.2.2[17], with a timestep of 1.0 fsec, and a friction constant of 91
psec−1, and solvation was treated by the GB/SA method[18]. Frames
were written every psec, resulting in a sample of 3× 104 frames in the
high temperature sample.
The 500 K sample was annealed down to 298 K using 4 different
schedules, consisting of a total of 3, 5, 9, and 17 temperatures, includ-
ing the endpoints. In each case, the temperatures were distributed
geometrically. Following each temperature jump, the velocities were
reinitialized by sampling randomly from the Maxwell-Boltzmann dis-
tribution, and then allowed to relax at constant temperature for a
time tR = 0.5 psec (except where noted) with the protocol described
above. A total of N = 1.6× 104 annealing trajectories were generated
for each schedule. The control of the integration routine to effect the
annealing, as well as the calculation of the trajectory weights, were
implemented in a Perl script.
Figure 1 shows that the 298 K distribution of energy is recovered
by the AIS procedure. It is noteworthy that the 500 K distribution
(corresponding to the high T sample) overlaps very little with the 298
K distribution, and yet the 298 K distribution is reproduced well for
the two slowest schedules. Equally interesting is how poorly the algo-
rithm performs when the structures are cooled too rapidly, especially
on the low E side of the distribution, where there is no overlap with
the high T distribution. We conclude that the schedules with 3 or 5
T -steps quench the structures too rapidly, resulting in many of the tra-
jectories becoming “stuck” in high-energy states that are metastable
at 298 K.
This last observation may be quantified by asking, “How many
of the annealed structures contribute appreciable weight to averages
calculated with Eq. 3?” To address this question, for each schedule we
estimated the number of configurations n which contribute appreciable
weight to the averages:
≡ fN, (4)
where wmax is the largest weight observed (see Table 1). If this number
is near 1, then a small number of trajectories dominate the average—
see Eq. 3 —and poor results should be expected. The effective fraction
of the annealing trajectories which generate “useful” or “successful”
structures is denoteed by f .
A more complete picture is provided by the full distribution of
the (logarithm of) trajectory weights (Fig. 2). For each schedule, the
weights which contribute the most to the T = 298 K sample are to
the right, at large values of w. The trend is clear—as slower cooling
is effected, the distribution narrows and shifts to the right. It has
been shown that the accuracy of averages computed from this type of
protocol is roughly related to the variance of the (adjusted) weights[1].
(The adjusted weight is the weight divided by the average weight.)
This “rule of thumb” is borne out by the data in Fig. 2 and Table
1—as the cooling slows down the distribution of weights narrows, and
the number of trajectories contributing to the equilibrium averages
increases. This type of analysis may serve as a means of distinguishing
between annealing schedules to decide on a cooling schedule which is
slow enough to yield reasonable estimates of equilibrium averages. It
is also essential for optimizing an AIS protocol for sampling efficiency,
as discussed in the next few paragraphs.
How much better than standard simulation (if at all) is equilib-
rium sampling by AIS? In order to make a direct comparison between
AIS and constant temperature simulation, we need to compare the
CPU time invested per statistically independent configuration in each
protocol. For the constant temperature simulation, this time may be
estimated in several ways[19, 20], and is essentially the time needed
for the simulation to “forget” where it has been. Following the con-
vention for correlation times, we call this time τi = τ(Ti), where i
labels the temperature: M for the high T distribution, and 0 for the
low T distribution. For the system studied here, τM = 0.8 nsec and
τ0 = 3.0 nsec, as estimated from timseries of the α → β backbone
dihedral transition[13].
The total cost to generate a structure in an AIS simulation is the
sum of the costs of generating a structure in the high T distribution
plus that for the annealing phase. Of course, not every annealing
trajectory contributes to thermodynamic averages(Eq. 3). What then
is the total cost tcost of a “successful” annealed structure? The first
part is from high temperature sampling—i.e., τM . The second part is
the cost of all the annealing trajectories, divided by the number which
contribute to equilibrium averages. The time tanneal is the time spent
annealing each structure:
tanneal = tR(M − 2) (5)
Recall that tR is the duration of the constant temperature relaxation
steps, and there is no relaxation phase at the highest and lowest tem-
peratures.
The total cost tcost is then the sum of τM and tanneal:
tc = τM + tanneal/f. (6)
The efficiency of an AIS protocol may then be computed by taking
the ratio R ≡ τ0/tcost (see Table 1), which gives the factor by which
an AIS protocol is more or less efficient than constant temperature
simulation. The data in Table 1 show that the best schedule used
here offer a modest speedup over constant temperature simulation, of a
factor of about 3. These findings are in agreement with an analysis we
have published of another temperature-based sampling protocol[13].
We note that an optimized AIS protocol would require tuningN based
on (perhaps preliminary) estimates of f .
It is instructive to compare the AIS results to simple reweighting—
i.e., AIS with no intermediate temperature steps or relaxation. In
this case, no computer time is spent annealing, and the efficiency
gain is simply τ0/τM = 3.75. The fraction f is of course reduced
compared to any AIS protocol—when reweighting our 500 K dileucine
trajectory to 298 K distribution, f = 1.3 × 10−4—but this has no
impact on the efficiency, provided a sufficient number of snapshots are
available for reweighting. However, it is clear that f will be greatly
reduced in systems which undergo a folding transition upon lowering
the temperature. This is simply a reflection of the fact that there is
negligible overlap between the folded and unfolded distributions. In
such cases, a useful reweighting protocol would require the generation
of astronomical numbers of structures in the TM distribution, and
annealing is advised.
4 Conclusion
We have demonstrated the application of Neal’s annealed importance
sampling (AIS) algorithm for equilibrium sampling of the dileucine
peptide. AIS allows the calculation of equilibrium averages from a
nonequilibrium sample of strutures that results from a simulated an-
nealing protocol. To our knowledge, AIS has not previously been
applied to a molecular system. While the method, as näıvely im-
plemented here, represents only a modest improvement over constant
temperature simulation, it is interesting for several reasons beyond
equilibrium sampling.
First, in applications where simulated annealing is already in widespread
use (most notably, NMR structure calculations[3, 4, 5]), the path
weights may be used to calculate (perhaps noisy) equilibrium aver-
ages, and perhaps ultimately Boltzmann-distributed ensembles. The
path weights also contain information that can be used to discriminate
between different schedules, which may provide a way to optimize the
schedule, based on the analysis of tann, the cost of annealing to “good”
structures.
Second, it may be possible to improve considerably on the efficiency
of the method by implementing a more sophisticated version, which
uses a resampling procedure to prune the low weight paths at each
cooling step. (For a detailed discussion of resampling methods, see
the book by Liu[21].) In this approach, we first cool some number N
of structures from the high temperature (TM ) ensemble, yielding N
weighted structures at TM−1. We then resample N times from this
TM−1 ensemble, according to the cumulative distribution function of
the weights, pruning the low weight paths without biasing the sample.
This type of approach was recently applied successfully to sampling
near native protein configurations of a discretized and coarse-grained
model[22]. Nevertheless, we emphasize that the ultimate efficiency of
any AIS protocol limited by the intrinsic sampling rate of the highest
temperature, which may be modest; see Ref. ??.
Finally, the AIS procedure could be naturally combined with “an-
nealing” in the parameters of the Hamiltonian. Such a hybrid of AIS
and Hamiltonian switching might be used, for example, to transform
an NMR target function into a molecular mechanics potential function,
over the course of a structure calculation. The result of such a cal-
culation would be an equilibrium ensemble of structures, distributed
according to the molecular mechanics potential. Such ensembles would
find wide application, for instance in docking or homology modeling.
Acknowledgements The authors thank Gordon Rule for several
enlightening discussions about NMR methodology. D. Z. thanks Chris
Jarzynski for alerting him to Neal’s work on AIS. This research was
supported by the NSF (MCB-0643456), the NIH (GM076569), and
the Department of Computational Biology, University of Pittsburgh.
References
[1] Radford M. Neal. Annealed importance sampling. Stat. and
Comp., 11:125–139, 2001.
[2] Axel T. Brünger, Paul D. Adams, G. Marius Clore, Warren L.
DeLano, Piet Gros, Ralf W. Grosse-Kuntsleve, Jian-Sheng Jiang,
John Kuszewski, Michael Nilges, Navraj S. Pannu, Randy J.
Read, Luke M. Rice, Thomas Simonson, and Gregory L. War-
ren. Crystallograhy and NMR system: a new software suite for
macromolecular structure determination. Acta. Cryst., D54:905–
921, 1998.
[3] C.D. Schwieters, J.J. Kuszewski, N. Tjandra, and G.M. Clore.
The Xplor-NIH NMR molecular structure determination package.
J. Magn. Res., 160:66–74, 2003.
[4] P. Güntert, W. Braun, and K. Wüthrich. Torsion angle dynamics
for NMR structure calculation with the new program DYANA.
J. Mol. Biol., 273:283–298, 1997.
[5] Axel T. Brünger, Paul D. Adams, and Luke M. Rice. New ap-
plications of simulated annealing in X-ray crystallography and
solution NMR. Structure, 15:325–336, 1997.
[6] C. Jarzynski. Nonequilibrium equality for free energy differences.
Phys. Rev. Lett., 78:2690–2693, 1997.
[7] S. Brown and T. Head-Gordon. Cool walking: A new Markov
chain Monte Carlo method. J. Comp. Chem., 24:68–76, 2002.
[8] D. D. Frantz, D. L. Freeman, and J. D. Doll. Reducing quasi-
ergodic behavior in Monte Carlo simulations by J-walking: appli-
cations to atomic clusters. J. Chem. Phys., 93:2768–2783, 1990.
[9] A. P. Lyubartsev, A. A. Martsinovski, S. V. Shevkunov, and P. N.
Vorontsov-Velyaminov. New approach to Monte Carlo calculation
of the free energy: Method of expanded ensembles. J. Chem.
Phys., 96:1776–1783, 1992.
[10] E. Marinari and G. Parisi. Europhys. Lett., 19:451–458, 1992.
[11] Charles J. Geyer. Markov chain Monte Carlo maximum likeli-
hood. In E. M. Keramidas, editor, Proceedings of the 23rd sympo-
sium on the interface, Computing science and statistics. Interface
foundation of North America, 1991.
[12] David J. Earl and Michael W. Deem. Parallel tempering: theory,
applications, and new perspectives. Phys. Chem. Chem. Phys.,
23:3910–3916, 2005.
[13] Daniel M. Zuckerman and Edward Lyman. A second look at
canonical sampling of biomolecules using replica exchange simu-
lation. J. Chem. Th. and Comp., 4:1200–1202, 2006.
[14] Alan M. Ferrenberg and Robert H. Swendsen. New Monte
Carlo technique for studying phase transitions. Phys. Rev. Lett.,
61:2635–2638, 1988.
[15] S. Kumar, D. Bouzida, R. H. Swendsen, P. A. Kollman, and
J. M. Rosenberg. The weighted histogram analysis method for
free energy calculations in biomolecules. i. the method. J. Com-
put. Chem., 13:1011–1021, 1992.
[16] Alexander Matro, David L. Freeman, and Robert Q. Topper.
Computational study of the structures and thermodynamic prop-
erties of ammonium chloride clusters using a parallel jump-
walking approach. J. Chem. Phys., 104, 1996.
[17] http://dasher.wustl.edu/tinker/.
[18] W. C. Still, A. Tempczyk, and R. C. Hawley. Semianalytical
treatment of solvation for molecular mechanics and dynamics. J.
Am. Chem. Soc., 112:6127–6129, 1990.
[19] A. M. Ferrenberg, D. P. Landau, and K. Binder. Statistical and
systematic errors in monte carlo sampling. J. Stat. Phys., 63:867–
882, 1991.
[20] Edward Lyman and Daniel M. Zuckerman. On the convergence
of biomolecular simulations by evaluation of the effective sample
size. preprint: http://xxx.lanl.gov/abs/q-bio.QM/0607037.
[21] Jun S. Liu. Monte Carlo strategies in scientific computing.
Springer, New York, 2001.
[22] Jinfeng Zhang, Ming Lin, Rong Chen, Jie Liang, and Jun S. Liu.
Monte Carlo sampling of near-native structures of proteins with
applications. PROTEINS, 66:61–68, 2007.
T-steps Annealing time Successful Fractional Net cost Efficiency
tanneal structures success rate gain
M = (M − 2)tR n f ≡ n/N tcost (nsec) R
3† 0.5 psec 7.1 4.4× 10−4 1.94 1.5
5† 1.5 psec 43.7 2.7× 10−3 1.36 2.2
17† 1.5 psec 137.6 8.6× 10−3 0.97 3.1
33 1.5 psec 46.2 2.9× 10−3 1.32 2.3
9† 3.5 psec 163.2 1.0× 10−2 1.15 2.6
17 7.5 psec 205.3 1.3× 10−2 1.38 2.2
17 15.0 psec 237.2 1.5× 10−2 1.80 1.7
17 30.0 psec 353.8 2.2× 10−2 2.16 1.4
Table 1: Comparison of the efficiency of AIS between several cooling sched-
ules. n is given by Eq. 4, tcost is given by Eq. 6. The efficiency gain is the total
simulation time invested in each successful annealed structure tcost divided
by the time needed to generate an indepenendent structure by constant tem-
perature simulation. The † indicates schedules for which data are presented
in Figs. 1 and 2.
Figure Legends
Figure 1.
Distribution of energies, from standard, constant temperature simu-
lation and AIS. The dashed line is the T = 500 K distribution that
was used for the high T ensemble. The other data compare a 300
nsec, T = 298 K constant temperature simulation to 298 K ensembles
generated by the AIS algorithm with different cooling schedules. The
schedules are discussed in Table 1.
Figure 2.
Distribution of the logarithm of trajectory weights for the four cooling
schedules used in Fig. 1 and discussed in Table 1.
Figure 1:
Figure 2:
	Introduction
	Theory
	Results
	Conclusion
ABSTRACT
  Annealed importance sampling is a means to assign equilibrium weights to a
nonequilibrium sample that was generated by a simulated annealing protocol. The
weights may then be used to calculate equilibrium averages, and also serve as
an ``adiabatic signature'' of the chosen cooling schedule. In this paper we
demonstrate the method on the 50-atom dileucine peptide, showing that
equilibrium distributions are attained for manageable cooling schedules. For
this system, as naively implemented here, the method is modestly more efficient
than constant temperature simulation. However, the method is worth considering
whenever any simulated heating or cooling is performed (as is often done at the
beginning of a simulation project, or during an NMR structure calculation), as
it is simple to implement and requires minimal additional CPU expense.
Furthermore, the naive implementation presented here can be improved.

<|endoftext|><|startoftext|>
Introduction
	Stationary State
	H–theorem and the Associated Entropy
	H–Theorem
	Boundness from Below
	Some Particular Cases
	Conclusion
	References
	Appendix
ABSTRACT
  A recently introduced nonlinear Fokker-Planck equation, derived directly from
a master equation, comes out as a very general tool to describe
phenomenologically systems presenting complex behavior, like anomalous
diffusion, in the presence of external forces. Such an equation is
characterized by a nonlinear diffusion term that may present, in general, two
distinct powers of the probability distribution. Herein, we calculate the
stationary-state distributions of this equation in some special cases, and
introduce associated classes of generalized entropies in order to satisfy the
H-theorem. Within this approach, the parameters associated with the transition
rates of the original master-equation are related to such generalized
entropies, and are shown to obey some restrictions. Some particular cases are
discussed.

<|endoftext|><|startoftext|>
Conduction electron spin-lattice relaxation time in the MgB2 superconductor
F. Simon,∗ F. Murányi,† T. Fehér, A. Jánossy
Budapest University of Technology and Economics, Institute of Physics
and Condensed Matter Research Group of the Hungarian Academy of
Sciences, H-1521 Budapest, PO BOX 91, Hungary
L. Forró
IPMC/SB Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne-EPFL, Switzerland
C. Petrovic,‡ S.L. Bud’ko, P.C. Canfield
Ames Laboratory, U.S. Department of Energy and Department of Physics
and Astronomy, Iowa State University, Ames, Iowa 50011, USA
(Dated: July 19, 2021)
The spin-lattice relaxation time, T1, of conduction electrons is measured as a function of tem-
perature and magnetic field in MgB2. The method is based on the detection of the z component
of the conduction electron magnetization under electron spin resonance conditions with amplitude
modulated microwave excitation. Measurement of T1 below Tc at 0.32 T allows to disentangle con-
tributions from the two Fermi surfaces of MgB2 as this field restores normal state on the Fermi
surface part with π symmetry only.
INTRODUCTION
The conduction electron spin-lattice relaxation time in
metals, T1, is the characteristic time for the return to
thermal equilibrium of a spin system driven out of equi-
librium by e.g. a microwave field at electron-spin reso-
nance (ESR) or a spin-polarized current. The applica-
bility of metals in “spintronics” devices in which infor-
mation is processed using electron spins [1] depends on a
sufficiently long spin life-time. In pure metals T1 is lim-
ited by the Elliott mechanism [2, 3], i.e. the scattering
of conduction electrons by the random spin-orbit poten-
tial of non-magnetic impurities or phonons. In supercon-
ductors, the Elliott mechanism becomes ineffective and
a long T1 is predicted well below Tc [3]. Here we report
the direct measurement of the spin-lattice relaxation time
of conduction electrons in MgB2 in the superconducting
state. The motivation to study the magnetic field and
temperature dependence of T1 is two-fold: i) to test the
predicted lengthening of T1 to temperatures well below
Tc, ii) to measure the contributions to T1 from different
Fermi surface sheets and to compare with the correspond-
ing momentum life-times, τ .
The lengthening of T1 has been observed in a re-
stricted temperature range below Tc in the fulleride
superconductor, K3C60 by measuring the conduction
electron-spin resonance (CESR) line-width, ∆H [4]. This
method assumes 1/T1 = 1/T2 = γe∆H , where γe/2π =
28.0 GHz/T is the electron gyromagnetic ratio, and 1/T2
is the spin-spin or transversal relaxation rate. It is
limited to cases where the homogeneous broadening of
the CESR line due to a finite spin lifetime outweighs
∆Hinhom, the line broadening from inhomogeneities of
the magnetic field. In a superconducting powder sam-
ple, the CESR line is inhomogeneously broadened below
the irreversibility line due to the distribution of vorte-
ces, which is temperature and magnetic field dependent.
This prevents the measurement of T1 from the line-width
and calls for a method to directly measure T1. Electron
spin echo techniques, which usually enable the measure-
ment of T1, are not available for the required nanosec-
ond time resolution range. The magnetic resonance tech-
nique, termed longitudinally detected (LOD) ESR [5, 6]
used in this work allows to measure T1’s as short as a
few ns. The method is based on the measurement of the
electron spin magnetization along the magnetic field, Mz,
using modulated microwave excitation. Mz recovers to
its equilibrium value with the T1 time-constant, thus the
method allows the direct measurement of T1 independent
of magnetic field inhomogeneities.
MgB2 has a high superconducting transition tempera-
ture of Tc = 39 K [7] and its unusual physical properties
[8, 9, 10, 11] are attributed [12, 13] to its disconnected,
weakly interacting Fermi surface (FS) parts. The Fermi
surface sheets derived from B-B bonds with π and σ char-
acters (π and σ FS) have smaller and higher electron-
phonon couplings and superconductor gaps, respectively,
and contribute roughly equally to the density of states
(DOS). The strange band structure leads to unique ther-
modynamic properties: magnetic fields of about 0.3-0.4
T restore the π FS well below Tc for all field orienta-
tions in polycrystalline samples but the material remains
superconducting to much higher fields. This is character-
ized by a small and nearly isotropic upper critical field,
c2 ∼ 0.3 − 0.4 T [10, 14] and a strongly anisotropic
one, Hσ
c2 = 2 − 16 T, [10, 15, 16] related to the π and σ
Fermi surface sheets, respectively. Our measurements at
low fields and low temperatures determine T1 from the π
FS alone, while high field and high temperature experi-
ments measure T1 averaged over the whole FS. We find
http://arxiv.org/abs/0704.0466v1
that spin relaxation in high purity MgB2 is temperature
independent in the high field normal state between 3 K
and 50 K, indicating that it arises from non magnetic im-
purities. Spin relaxation times for electrons on the π and
σ Fermi surface sheets are widely different but are not
proportional to the corresponding momentum relaxation
times.
EXPERIMENTAL
The same MgB2 samples were used as in a previous
study [17]. Thorough grinding, particle size selection
and mixing with SnO2, an ESR silent oxide, produced
a fine powder with well separated small metallic par-
ticles. The nearly symmetric appearance of the CESR
signal [18] proves that penetration of microwaves is ho-
mogeneous and that the particles are smaller than the
microwave penetration depth of ∼ 1µm. SQUID mag-
netometry showed that grinding and particle selection
do not affect the superconducting properties. The parti-
cles are not single crystals but rather aggregates of small
sized single crystals. Continuous wave (cw) and longitu-
dinally detected ESR experiments were performed in a
home-built apparatus [6] at 9.1 and 35.4 GHz microwave
frequencies, corresponding to 0.32 and 1.27 T resonance
magnetic fields. The 9.1 GHz apparatus is based on a
loop-gap resonator with a low quality factor (Q ∼ 200)
and the 35.4 GHz instrument does not employ a mi-
crowave cavity at all. The cw-ESR was detected using an
audio frequency magnetic field modulation. Line-widths
are determined by Lorentzian fits to the cw-ESR data.
For the LOD-ESR, the microwaves are amplitude modu-
lated with f = Ω/2π of typically 10 MHz and the result-
ing varying Mz component of the sample magnetization
is detected with a coil which is parallel to the external
magnetic field and is part of a resonant circuit that is
tuned to f and is matched to 50 Ohms. cw-ESR at 420
GHz (centered at 15.0 T) was performed at EPFL using a
quasi-optical microwave bridge with no resonant cavities.
RESULTS
Relaxation in the normal state
The low temperature behavior of the spin-lattice re-
laxation time in MgB2 in the normal state can be mea-
sured using cw-ESR from the homogeneous line-width,
∆Hhom, using 1/T1 = γe∆Hhom at high fields, H > Hc2
that suppresses superconductivity. The maximum upper
critical field is Hc2,max ∼ 16 T for particles with field
in the (a, b) crystallographic plane in the polycrystalline
sample at zero temperature [19]. We did not observe any
effects of superconductivity on the CESR, at 15 T it is
suppressed in the full sample above a temperature of a
0 10 20 30 40 50 60
T (K)
FIG. 1: CESR line-width of MgB2 as a function of tempera-
ture for the 15 T CESR measurement (�). Open circles (©)
show the homogeneous line-width (∆Hhom) after correcting
for the field dependent broadening as explained in the text.
Representative error bars are shown at the lowest tempera-
ture.
0 2 4 6 8 10 12 14 16
Magnetic field (T)
FIG. 2: ESR line-width of MgB2 as a function of magnetic
field measured at 40 K (�). Solid curve is a linear fit to the
data with parameters given in the text.
few K. Fig. 1 shows that the temperature dependence of
the CESR line-width at 15 T is small below 40 K.
The CESR line-width is magnetic field dependent as
shown in Fig. 2 at 40 K: it is linear as function
of magnetic field with ∆H = ∆H0 + b ∗ H , where
∆H0 = 0.90(1) mT is the residual line-width and b =
0.057(1) mT/T. The residual homogeneous line-width
corresponds to T1 = 6.3 ns at 40 K. The linear relation
can be used to correct the 15 T CESR line-width data
to obtain the homogeneous contribution, ∆HHom(T ) =
∆H(15 T, T )− 15T · b as the magnetic field dependence
is expected to be temperature independent. We show
the homogeneous line-width in Fig. 1. We find that it is
0 10 20 30 40 50 60
32 34 36 38
T (K)
T (K)
FIG. 3: Inhomogeneous CESR line broadening in MgB2 below
Tc at 0.32 T. Full and open symbols show the CESR line-width
for up and down magnetic field sweeps, respectively. Inset
shows the data near Tc. Note the line narrowing between Tc
and Tirr and the field sweep direction dependent line-widths
below the irreversibility temperature.
temperature independent within experimental precision
between 3 and 50 K. This means that the spin-lattice re-
laxation time flattens to a residual value that is given by
non-magnetic impurities.
Relaxation in the superconducting state
In type II superconductors, CESR arises from thermal
excitations and from normal state vortex cores. The in-
homogeneity of the magnetic field in the vortex lattice
or glass states does not broaden the CESR line. The lo-
cal magnetic field inhomogeneity is averaged since within
the spin life-time itinerant electrons travel long distances
compared to the inter-vortex distance [4]. This is in con-
trast to the NMR case where the line-shape is affected:
the nuclei are fixed to the crystal and nuclei inside and
outside the vortex cores experience different local fields
[20]. In other words, a superconducting a single crystal
sample would display a narrow conduction electron ESR
line if there were no irreversible effects. However, the
CESR line is inhomogeneously broadened below the irre-
versibility line for a superconducting powder sample: the
vortex distribution depends on a number of factors such
as the thermal and magnetic field history, grain size and,
for an anisotropic superconductor such as MgB2, on the
crystal orientation with respect to the magnetic field also.
The resulting inhomogeneous broadening of the CESR
line gives 1/γ∆Hinhom = T
2 ≪ T1,2, and T1 cannot be
measured from the line-width. In Fig. 3 we show this
effect: above Tc MgB2 has a relaxationally broadened
line-width of ∆H = 0.9 mT. Between Tc and the irre-
versibility temperature at the given field, Tirr, the CESR
remains homogeneous and narrows with the lengthening
of T1. However, below Tirr it broadens abruptly and the
line-width depends on the direction of the magnetic field
sweep: for up sweep it is broader than for down sweeps
due to the irreversibility of vortex insertion and removal.
To enable a direct measurement of the T1 spin lattice
relaxation time, one has to resort to time resolved experi-
ments. Conventional spin-echo ESR methods are limited
to T1’s larger than a few 100 ns. To measure T1’s of
a few nanoseconds, the so-called longitudinally detected
ESR was invented in the 1960’s by Hervé and Pescia [21]
and improved by several groups [22, 23]. The method
is based on the deep amplitude modulation of the mi-
crowave excitation with an angular frequency, Ω ∼ 1/T1.
When the sample is irradiated with the amplitude mod-
ulated microwaves at ESR resonance, the component of
the magnetization along the static magnetic field, Mz,
decays from the equilibrium value, M0, with a time con-
stant T1. Mz relaxes back to M0 with a T1 relaxation
time when the microwaves are turned off. The oscillat-
ing Mz is detected using a coil which is part of a resonant
rf circuit. The phase sensitive detection of the oscillat-
ing voltage using lock-in detection allows the measure-
ment of T1 using ΩT1 = v/u [5, 21], where u and v are
the amplitudes of the in- and out-of-phase components
of the oscillating magnetization after corrections for in-
strument related phase shifts. The principal limitation
of the LOD-ESR technique is its 3-4 orders of magnitude
lower sensitivity compared to conventional cw-ESR. The
LOD-ESR method and the experimental apparatus are
detailed in Refs. [5, 6].
To prove that the LOD-ESR signal of the itinerant
electrons is detected in the superconducting phase, we
compare in Fig. 4 the LOD-ESR signal with that mea-
sured with conventional continuous-wave CESR (referred
to as CESR in the following) of MgB2 in the normal and
superconducting states. The CESR signal is the deriva-
tive of the absorption due to magnetic field modulation
used for lock-in detection. This signal was previously
identified as the ESR of conduction electrons in MgB2 in
the superconducting and normal states [15, 17, 24] and
its characteristics have been discussed in detail [15, 17].
Above Tc at 40 K, the CESR line is relaxationally broad-
ened. Below Tc, it is inhomogeneously broadened and is
diamagnetically shifted, i.e. to higher resonance fields.
The irreversible effects also contribute to a non-linear
baseline known as the non-resonant microwave absorp-
tion [25]. The intensity of the CESR signal decreases
below Tc as we discussed previously [17], due to the van-
ishing of normal state electrons.
The LOD-ESR signal shows the same characteristics
as the CESR below Tc: it is broadened, shifted to higher
fields and its intensity decreases. The values for the
temperature dependent diamagnetic shifts and broad-
ening and the relative intensity change agree for the
two kinds of measurements within experimental precision
(not shown). This unambiguously proves that the LOD-
0.30 0.32 0.340.30 0.32 0.34
ESR LOD-ESR
Magnetic field (T)
15 Kb)
40 Ka)
FIG. 4: ESR (a-b) and LOD-ESR (c-d) spectra of MgB2 at
9.1 GHz (0.32 T). a) and c) at 40 K in the normal state,
and b) and d) in the superconducting state at 15 K. Solid
and dashed curves are the in- and out-of-phase LOD signals,
respectively and are offset for clarity. Vertical solid lines in-
dicate the resonance field above Tc. Note the diamagnetic
shift and broadening for for both kinds of spectra below Tc.
Also note the rotated phase of the in-phase and out-of phase
channels upon cooling.
ESR signal originates from the conduction electrons.
The change of the relaxation time T1 is visible in the
LOD-ESR spectra in Fig. 4 as a change in the relative
intensities of the in- and out-of-phase signals. At 40 K
v/u = 0.47 and at 15 K v/u = 0.95, which together with
Ω/2π = 11.4 MHz gives 6.3 and 13.3 ns relaxation times,
respectively. In Fig. 5, we show the T1 data inferred from
the LOD-ESR spectra at 0.32 and 1.27 T as a function
of the reduced temperature T/Tc.
DISCUSSION
The observed lengthening of T1 below Tc ( Fig. 5) is
expected from theory for non-magnetic scattering cen-
ters and low magnetic fields where the susceptibility is
dominated by excitations over the superconducting gap.
On the other hand, the field independence between 0.32
and 1.27 T of T1 below Tc is surprising. The lengthen-
ing of T1 below Tc in zero magnetic field for an isotropic,
type I superconductor was calculated in the framework
of weak-coupled BCS theory by Yafet [3]. He concluded
that T1 lengthens as a result of the freezing of normal
state excitations. However, no theory exists for a type II
superconductor in finite fields with Hc2 anistotropy such
as MgB2, thus here the T1 data are analyzed phenomeno-
0.0 0.5 1.0 1.5 2.0
FIG. 5: Spin-lattice relaxation time as a function of the re-
duced temperature in MgB2 at 0.32 (�) and 1.27 T (©) mag-
netic fields. Representative error bars are shown for some of
the data. Dashed curve shows T1 corresponding to ∆HHom
in the 15 T measurement such as in Fig. 1 with the reduced
temperature normalized to 39 K.
logically in the framework of the two-band/gap model of
MgB2.
In the following, we deduce the residual (low tem-
perature), impurity related spin scattering contributions
of the σ and π Fermi surface sheets. The DOS is
distributed almost equally on the FS sheets of MgB2:
Nπ/(Nπ + Nσ) = 0.56 [13], where Nπ and Nσ are the
DOS of the two types of FS sheets. A magnetic field
of ∼ 0.3 − 0.4 T closes the gap on the π FS sheets but
leaves the gap on the σ sheet almost intact. [10, 17].
This suggests that well below Tc, our experiment at 0.32
T measures exclusively the relaxation of electrons on the
fully closed π FS sheets. Since T1 at 0.32 T increases
slowly with temperature between 10 and 20 K, we ex-
trapolate T1π ≈ T1(10 K, 0.32 T) = 20(2) ns for the π
In order to separate the contribution of the σ FS to
the relaxation rate in the normal state, 1/T1n, we assume
that inter-band relaxation is negligible and 1/T1n is equal
to the average of the spin-lattice relaxation rates on the
two FS’s weighted by the corresponding DOS:
Nπ/T1π +Nσ/T1σ
Nπ +Nσ
Here T1σ is the spin-lattice relaxation time on the σ FS.
The 15 T measurement shows that 1/T1n changes little
with temperature between 3 K and 40 K. Thus we find
T1σ = 3.4(5) ns for the contribution of the σ FS sheets
using T1n = T1(Tc) = 6.3 ns, T1π = 20(2) ns and Eq. 1.
For normal metals with a simple Fermi surface, the so-
called Elliott relation [2, 3, 26, 27] holds, which states
that for a given type of disorder (e.g. phonons or dislo-
cations) T1 is proportional to the momentum relaxation
time, τ . The proportionality constant depends on the
spin orbit splitting of the conduction electron bands and
has been estimated in a number of metals from the shift
of the CESR from the free electron value. Metals with
complicated Fermi surfaces i.e. with great variations of
the electron-phonon coupling on the different FS parts
are known to deviate from the Elliott relation [28] and
calculation of T1 requires to take into account the de-
tails of the band-structure [29, 30, 31]. Examples include
polyvalent elemental metals such as Mg or Al. Clearly, a
calculation of T1 is required for MgB2, which takes into
account its band structure peculiarities. Comparison of
spin scattering and momentum scattering times of the
two types of Fermi surfaces is instructive. The relative
values of τ for the two FS parts, τπ and τσ, and for in-
terband scattering were estimated by Mazin et al. [32].
A very small impurity interband scattering and τπ < τσ,
i.e. a larger π intraband scattering relative to σ intra-
band scattering was required to explain the rather small
depression of Tc in materials with widely different con-
ductivities. De Haas-van Alphen [33] and magnetore-
sistance [34] measurements of high purity samples yield
τπ < τσ also. Such a behaviour could rise fromMg vacan-
cies, which perturb more electrons of the π band relative
to the σ band. However, our spin scattering data do not
support this. In contrast to momentum scattering, spin
scattering is stronger on the σ FS: T1π : T1σ = 6 : 1
in high purity samples and low temperatures. The rela-
tive values of T1 and τ for the two FS do not necessarily
follow the same trend, spin relaxation times at low tem-
peratures depend on spin orbit relaxation on impurities
while momentum relaxation is due to potential scatter-
ing. However, a defect center such as a Mg vacancy with a
strong modification of the electron-phonon coupling and
an atomic number strongly differing from that of the reg-
ular atoms constituting the compound would greatly af-
fect T1 compared to τ . In the two gap model Mg defects
are expected to shorten T1π more than T1σ and thus are
unlikely to be the dominant scatterers.
A final note concerns the validity of the above analysis
of T1’s in the framework of the two-band/gapmodel. The
field independence of the lowest temperature T1 for 0.32
and 1.27 T is unexpected within this model. The spin
susceptibility increases strongly between these fields and
more normal states are restored at 1.27 T than expected
from the closing of the gap on the π FS sheets alone [17].
Based on this, one would expect to observe additional
spin scattering from the restored σ FS parts, which is
clearly not the case. This also indicates that a theoret-
ical study, which takes into account the peculiarities of
MgB2 is required to explain the anomalous spin-lattice
relaxation times.
In conclusion, we presented the measurement of the
spin-lattice relaxation time, T1, of conduction electrons
as a function of temperature and magnetic field in the
MgB2 superconductor. We use a novel method based on
the detection of the z component of the conduction elec-
tron magnetization during electron spin resonance con-
ditions with amplitude modulated microwave excitation.
Lengthening of T1 below Tc is observed irrespective of the
significant CESR line broadening due to irreversible dia-
magnetism in the polycrystalline sample. The field inde-
pendence of T1 for 0.32 T and 1.27 T allows to measure
the separate contributions to T1 from the two distinct
types of the Fermi surface.
ACKNOWLEDGEMENTS
The authors are grateful to Richárd Gaál for the de-
velopment of the ESR instrument at the EPFL. F.S.
and F.M. acknowledge the Zoltán Magyary postdoc-
toral programme, the Bolyai fellowship of the Hungarian
Academy of Sciences and the Alexander von Humboldt
Foundation for support, respectively. Work supported
by the Hungarian State Grants (OTKA) No. TS049881,
F61733, PF63954 and NK60984 and by the Swiss NSF
and its NCCR ”MaNEP”. Ames Laboratory is operated
for the U.S. Department of Energy by Iowa State Uni-
versity under Contract No. W-7405-Eng-82.
∗ Corresponding author: simon@esr.phy.bme.hu
† Present address: Leibniz Institute for Solid State and
Materials Research Dresden, PF 270116 D-01171 Dres-
den, Germany
‡ Present address: Condensed Matter Physics and Ma-
terials Science Department, Brookhaven National Labo-
ratory, Upton, New York 11973-5000, USA
[1] I. Žutić, J. Fabian, and S. D. Sarma, Rev. Mod. Phys.
76, 323 (2004).
[2] R. J. Elliott, Physical Review 96, 266279 (1954).
[3] Y. Yafet, Phys. Lett. A 98, 287 (1983).
[4] N. M. Nemes, J. E. Fischer, G. Baumgartner, L. Forró,
T. Fehér, G. Oszlányi, F. Simon, and A. Jánossy, Phys.
Rev. B 61, 7118 (2000).
[5] F. Murányi, F. Simon, F. Fülöp, and A. Jánossy, J.
Magn. Res. 167, 221 (2004).
[6] F. Simon and F. Murányi, J. Magn. Res. 173, 288 (2005).
[7] J. Nagamatsu, N. Nakagawa, T. Muranaka, Y. Zenitani,
and J. Akimitsu, Nature 410, 63 (2001).
[8] F. Bouquet, R. A. Fisher, N. E. Phillips, D. G. Hinks,
and J. D. Jorgensen, Physical Review Letters 87, 047001
(2001).
[9] P. Szabó, P. Samuely, J. Kacmarćık, T. Klein, J. Marcus,
D. Fruchart, S. Miraglia, C. Marcenat, and A. G. M.
Jansen, Phys. Rev. Lett. 87, 137005 (2001).
[10] F. Bouquet, Y. Wang, I. Sheikin, T. Plackowski,
A. Junod, S. Lee, and S. Tajima, Physical Review Letters
89, 257001 (2002).
[11] S. Tsuda, T. Yokoya, Y. Takano, H. Kito, A. Matsushita,
F. Yin, J. Itoh, H. Harima, and S. Shin, Physical Review
Letters 91, 127001 (2003).
[12] J. Kortus, I. I. Mazin, K. D. Belashchenko, V. P.
Antropov, and L. L. Boyer, Phys. Rev. Lett. 86, 4656
(2001).
[13] H. J. Choi, D. Roundy, H. Sun, M. L. Cohen, and S. G.
Louie, Nature 418, 758 (2002).
[14] M. R. Eskildsen, M. Kugler, S. Tanaka, J. Jun, S. M.
Kazakov, J. Karpinski, and R. Fischer, Phys. Rev. Lett.
89, 187003 (2002).
[15] F. Simon, A. Jánossy, T. Fehér, F. Murányi, S. Garaj,
L. Forró, C. Petrovic, S. L. Bud’ko, G. Lapertot, V. Ko-
gan, et al., Phys. Rev. Lett. 87, 047002 (2001).
[16] M. Angst, R. Puzniak, A. Wisniewski, J. Jun, S. M.
Kazakov, J. Karpinski, J. Roos, and H. Keller, Phys.
Rev. Lett. 88, 167004 (2002).
[17] F. Simon, A. Jánossy, T. Fehér, F. Murányi, S. Garaj,
L. Forró, C. Petrovic, S. L. Bud’ko, R. A. Ribeiro, and
P. C. Canfield, Phys. Rev. B 72, 012511 (2005).
[18] F. J. Dyson, Physical Review 98, 349 (1955).
[19] D. K. Finnemore, J. E. Ostenson, S. L. Bud’ko, G. Laper-
tot, and P. C. Canfield, Phys. Rev. Lett. 86, 2420 (2001).
[20] P. Pincus, Phys. Lett. 13, 21 (1964).
[21] J. Hervé and J. Pescia, C. R. Acad. Sci.(Paris) 251, 665
(1960).
[22] V. A. Atsarkin, V. V. Demidov, and G. A. Vasneva, Phys-
ical Review B 52, 1290 (1995).
[23] J. Granwehr, J. Forrer, and A. Schweiger, J. Magn. Res.
151, 78 (2001).
[24] R. R. Urbano, P. G. Pagliuso, C. Rettori, Y. Kopelevich,
N. O. Moreno, and J. L. Sarrao, Phys. Rev. Lett. 89,
087602 (2002).
[25] K. W. Blazey, K. A. Muller, J. G. Bednorz, W. Berlinger,
G. Amoretti, E. Buluggiu, A. Vera, and F. C. Matacotta,
Phys. Rev. B 36, 7241 (1987).
[26] Y. Yafet, in Solid State Physics, edited by F. Seitz and
D. Turnbull (1963), vol. 14, p. 1.
[27] F. Beuneu and P. Monod, Phys. Rev. B 18, 2422 (1977).
[28] P. Monod and F. Beuneu, Phys. Rev. B 19, 911 (1979).
[29] R. H. Silsbee and F. Beuneu, Phys. Rev. B 27, 2682
(1983).
[30] J. Fabian and S. Das Sarma, Phys. Rev. Lett. 81, 5624
(1998).
[31] J. Fabian and S. Das Sarma, Phys. Rev. Lett. 83, 1211
(1999).
[32] I. I. Mazin, O. K. Andersen, O. Jepsen, O. V. Dol-
gov, J. Kortus, A. A. Golubov, A. B. Kuzmenko, and
D. van der Marel, Phys. Rev. Lett. 89, 107002 (2002).
[33] E. A. Yelland, J. C. Cooper, A. Carrington, N. E. Hussey,
P. J. Meeson, S. Lee, A. Yamammoto, and S. Tajima,
Phys. Rev. Lett. 88, 217002 (2002).
[34] I. Pallecchi, V. Ferrando, E. G. D’Agliano, D. Marre,
M. Monni, M. Putti, C. Tarantini, F. Gatti, H. U. Ae-
bersold, E. Lehmann, et al., Phys. Rev. B 72, 184512
(2005).
ABSTRACT
  The spin-lattice relaxation time, $T_{1}$, of conduction electrons is
measured as a function of temperature and magnetic field in MgB$_2$. The method
is based on the detection of the $z$ component of the conduction electron
magnetization under electron spin resonance conditions with amplitude modulated
microwave excitation. Measurement of $T_{1}$ below $T_c$ at 0.32 T allows to
disentangle contributions from the two Fermi surfaces of MgB$_{2}$ as this
field restores normal state on the Fermi surface part with $\pi$ symmetry only.

<|endoftext|><|startoftext|>
Introduction
The X-ray continuum of black hole candidates (BHCs) is roughly composed of two
main elements (see review by Liang 1998), an ultra-soft component that is thought to be
1Email: pszotag@physics.purdue.edu,cui@physics.purdue.edu
http://arxiv.org/abs/0704.0467v3
– 2 –
associated with emission from the accretion disk, and a hard component that is thought to
be produced by inverse Compton scattering of soft photons by energetic electrons that can
be either thermal or non-thermal in origin. Modeling the disk component could, in principle,
allow one to determine the radius of the inner edge of the accretion disk in a BHC (review
by Tanaka & Lewin 1995, and references therein). This has been tried and the results have
provided evidence that the accretion disk extends all the way in to the last stable orbit under
certain circumstances (Tanaka & Lewin 1995). Motivated by this observation, Zhang et al.
(1997) suggested that modeling the X-ray continuum of a BHC could lead to a measurement
of the spin of the black hole, if the mass of the black hole can be independently derived.
In retrospect, we now know that the accretion disk reaches the last stable orbit probably
only in the high-soft state (e.g., Narayan 1996) 1, so the proposed technique may only be
applicable to data taken in such a state. Since the X-ray spectrum of BHCs is dominated by
the disk component in the high-soft state, the determination of the disk parameters based
on spectral modeling should, in principle, be quite accurate, even if one neglects the hard
component whose physical origin is less well understood, particularly in the high-soft state.
However, there are still serious issues associated with the exercise.
First, the local spectrum of the X-ray emitting portion of the accretion disk is not
a blackbody, because the opacity is dominated by electron scattering. Saturated Comp-
tonization leads to a “diluted” blackbody spectrum, whose color temperature is given by
Tcol = fcolTeff , where fcol is the color correction factor and Teff is the effective tempera-
ture (Ebisuzaki et al. 1984). Much effort has gone into finding the values of fcol that are
appropriate for BHCs (Shimura & Takahara 1995; Merloni et al. 2000; Davis et al. 2006).
The situation is still uncertain, but it is clear that fcol depends on a number of important
physical parameters, such as mass accretion rate, which can vary even for a given source. It
is, therefore, not possible to know what value to use a priori. Cui et al. (2002) proposed
an observational approach to derive fcol from the data (see also Shrader & Titarchuk 1999).
Although the technique showed some promise with limited data, it needs to be tested further.
Second, there is observational evidence (Zhang et al. 2000) that the surface layer of the
accretion disk in BHCs might deviate from the standard α-disk structure (Shakura & Sunyaev
1 It has recently been argued, based on hard-state observations of BHCs (e.g., Miller et al. 2006), that
the disk also reaches the last stable orbit in the low-hard state. We must, however, caution against drawing
strong conclusions on the properties of the disk from modeling a hard-state spectrum, because it would
require a reliable extraction of the weak disk component from the dominating hard component whose precise
origin (e.g., the geometry of the emitting region and the nature of seed photons) is still being debated (see
Cui et al. 2002 for an in-depth discussion). This is why we chose to focus on the soft-state observations in
this work.
– 3 –
1973). Such an effect is expected from X-ray heating of the disk by a central hard X-ray
source (e.g., Nayakshin & Melia 1997; Mistra et al. 1998), but it is not clear why the effect
is still significant even for the high-soft state, in which hard X-ray production is expected
to be quite weak. The presence of such a “warm” layer would add further complication in
modeling the observed X-ray spectrum (Zhang et al. 2000), because Compton scattering in
the layer can further modify the spectrum.
Third, some of the widely-used disk models (e.g., the multi-color disk; Mitsuda et al.
1984) do not take into account general relativistic effects that can affect the formation of
the X-ray spectrum. Attempts have been made to incorporate the effects empirically in the
analysis by introducing a number of correction factors (Zhang et al. 1997). Recently, two
new disk models have been developed that account for the general relativistic effects (Li
et al. 2005; Davis & Hubeny 2006). The models also consider spectral hardening due to
scattering, with one treating fcol as a free parameter (Li et al. 2005) and the other carrying
out radiative transfer in the disk (Davis & Hubeny 2006). The models have been applied to
observations of a number of BHCs (Shafee et al. 2006; Davis et al. 2006; McClintock et al.
2006; Middleton et al. 2006).
In this work, we examined some of the issues and also assessed the viability of the state-
of-the-art disk models, making use of data of much improved quality that have recently
become available. Specifically, we analyzed two XMM-Newton observations of GX 339−4
and attempted to fit the observed X-ray spectra with different models. With its large effective
area and good sensitivity at low energies (< 1 keV), XMM-Newton offers distinct advantages
over other X-ray observatories for our purposes. The low-energy sensitivity is often not
appreciated as much as it should be; it is critical to reliable modeling of the disk spectrum,
because the effective temperature of the disk is typically . 1 keV for BHCs.
2. Data
2.1. XMM-Newton Observations
We analyzed data from two archival XMM-Newton observations (ObsIDs 0093562701
and 0148220201) of GX 339−4 during its 2002–2003 outburst. The first observation was
taken near the peak of the outburst (on 2002 August 24), judging from the ASM/RXTE
light curve 2, while the second one was taken at the tail end of the episode (on 2003 March
8). GX 339−4 was observed for about 61 and 20 ks during the two observations, respectively.
2See http://heasarc.gsfc.nasa.gov/xte weather
http://heasarc.gsfc.nasa.gov/xte_weather
– 4 –
Since we are mainly interested in the X-ray continuum here, we focused on the EPIC data.
The pn/EPIC detector was operated in the burst mode, with the thin optical blocking filter,
during the first observation, and the MOS/EPIC detectors were not used. In the second
observation, the pn and MOS detectors were both run in the timing mode with the medium
blocking filter. Even with the timing mode, the MOS data still suffer from severe photon
pile-up, due to the high count rate. In contrast, the pile-up effects are minimal in the pn
data. This work is, therefore, based on the pn data.
The data were reduced with the standard SAS package (version 7.0.0). We followed
the procedures described in the XMM-Newton data analysis cookbook 3 in preparing and
filtering the data, making light curves, extracting spectra, and generating the corresponding
arf and rmf files for subsequent spectral modeling. We did need to turn off bad-pixel search
in processing the first observation because of a bug in the searching routine for the burst
mode. The effects should be negligible because the source was very bright then. The events
of interest were extracted from a rectangular region, with RAWX 32–40 RAWY 3–179 and
RAWX 34–42 RAWY 3–199 for the 2002 and 2003 observations, respectively. Filtering
expressions “FLAG = 0” and “PATTERN ≤ 4” were applied to select good single and
double events.
Because the source was bright during both observations, a significant number of source
events are present even near the edge of the CCD chip, which makes it impossible to cleanly
extract background events. This should only affect the high-energy end of the spectrum
(where the background counts may become comparable or exceed the source counts). Our
choice of the central 9 columns of the chip was made to minimize the effect on the shape of
the spectrum. However, it led to an underestimation of the overall normalization, which is
also important here. To determine the normalization more accurately, we also made spectra
with events from the whole chip. The difference amounts to roughly 8%. For spectral
modeling, we added a 1% systematic error to the data and grouped the channels so that
each bin contains at least 500 counts.
2.2. RXTE Observations
To complement the soft-band coverage of XMM-Newton, we obtained simultaneous
RXTE data from the public archive. GX 339−4 was observed with RXTE for about 4
and 16 ks, respectively, during the two XMM-Newton observing periods. The data were
3See http://wave.xray.mpe.mpg.de/xmm/cookbook.
http://wave.xray.mpe.mpg.de/xmm/cookbook
– 5 –
reduced with FTOOLS 5.2. We followed the standard steps 4 in preparing and filtering
the data, deriving PCA and HEXTE spectra from data taken in the standard modes, and
generating the corresponding response files for spectral modeling.
A PCA or HEXTE spectrum consists of separate spectra from the individual detector
units that were in operation. In deriving the PCA spectra, we only used data from the first
xenon layer of each detector unit (which is best calibrated) and combined spectra from all
the live detectors into one, to maximize the signal-to-noise ratio (S/N). To estimate the PCA
background, we used the background model for bright sources (pca bkgd cmbrightvle eMv20030330.mdl).
As for the HEXTE data, we extracted a spectrum for each of the two clusters separately.
For spectral modeling, we rebinned the HEXTE spectra so that each bin contains at least
5000 counts. We also added a 1% systematic error to both the PCA and HEXTE spectra.
3. Results
We carried out spectral modeling in XSPEC (Arnaud 1996). The spectral bands of
interest are 0.5–10 keV (pn/EPIC), 3–25 keV (PCA), and > 15 keV (HEXTE). The spectra
are always jointly fitted with a common model, except for a normalization factor (fixed
at unity for the pn data) that was introduced to account for any residual difference in
the calibration of the throughput of the detectors. Strictly speaking, however, the XMM-
Newton and RXTE coverages are not always simultaneous, due to the difference not only in
the observing time but also in the orbits of the two satellites. To justify joint modeling, we
broke each of the XMM-Newton observations into 8 segments and extracted a spectrum for
each segment. We compared the individual spectra and observed no apparent variation in
the shape of the spectrum in either case.
We experimented with several models for the ultra-soft and hard components of the
spectrum. The former is often modeled with a non-relativistic, multi-temperature blackbody
model (“diskbb” in XSPEC; Mitsuda et al. 1984). For this work, we instead used the two
relativistic disk models (“kerrbb” in XSPEC, Li et al. 2005; and “bhspec”, Davis & Hubeny
2006). To test the procedure of deriving the color correction factor from the data, as proposed
by Cui et al. (2002), we also modeled the disk component with saturated Compton scattering
(“comptt” in XSPEC, in a disk geometry; Titarchuk 1994). In all cases, the hard component
of the spectrum was modeled with unsaturated Compton scattering (also “comptt” but in
a spherical geometry). Interstellar absorption was taken into account (with “phabs” in
XSPEC).
4see http://heasarc.gsfc.nasa.gov/docs/xte/recipes/cook book.html
http://heasarc.gsfc.nasa.gov/docs/xte/recipes/cook_book.html
– 6 –
The best and only formally acceptable fit to the continuum was obtained with comptt+comptt.
In this case, the residuals reveal the presence of discrete features, which include absorption
edges at 863 eV and 880 eV for the 2002 and 2003 observations, respectively, and emission
lines at 569 eV and 562 eV. We suspect that the edges are calibration artifacts, since we
were not able to associate them with any elements. On the other hand, the emission features
could be real, with the former being associated with O VIII and the latter with O VII (corre-
sponding to transitions at rest-frame energies 569 eV and 561 eV, respectively), which would
imply a plasma temperature of 0.1–0.2 keV. The lines are unresolved and are quite weak,
with equivalent widths of 26 and 21 eV for the 2002 and 2003 observations, respectively.
We will not discuss the discrete spectral features any further, since the main focus here is
on the X-ray continuum. The 2002 data also show the presence of an emission feature at
2.2 keV, which is likely an artifact caused by calibration uncertainty around the M-edge of
gold (in the mirror coating). However, the feature is not apparent in the 2003 data, which
is a bit puzzling, because the statistics are comparable in the two cases. We consulted with
the XMM-Newton Helpdesk about it, and were told that it had probably been corrected for
by the calibration in the timing mode, but not so well in the burst mode. After accounting
for the discrete spectral features (with “edge” and “gaussian” in XSPEC), we still saw, in
the residuals, genuine inconsistency between the pn/EPIC and PCA data at low energies,
which could be related to known PCA calibration uncertainties around the L-edge of xenon.
For this work, we resolved the issue simply by excluding the PCA data below 9 keV in the
joint fits.
For the 2003 data, the continuum fit also shows significant structures in the residuals
roughly in the range of 5–8 keV, which might be similar to those reported by Miller et al.
(2004) based on an XMM-Newton observation taken several months earlier. They are most
likely associated with the Kα emission of the iron and its associated absorption edge. The
excess appears broad and asymmetric in shape, as illustrated in Figure 1. Therefore, we
modeled it as a gravitationally redshifted disk line (“laor” in XSPEC; Laor 1991). Also, we
included a smeared edge (“smedge” in XSPEC) in the fit. The results are: ELaor = 6.48
+0.07
−0.09
keV, i = 51◦ +2
−1, q = 5.2± 0.2, and Rin = 1.76
+0.10
−0.06 Rg (where Rg is the gravitational radius)
for the line; Eedge = 8.5 ± 0.1 keV, W = 2.7
−0.4 keV, and τ = 0.59
+0.07
−0.05 for the edge. Note
that we fixed Rout at 400 Rg in the “laor” model. The obtained value for the inclination
angle (i) is consistent with those estimated for the system (e.g., Zdziarski et al. 2004). If
this interpretation is correct, the results would require a very high value (a∗ & 0.97) for
the black hole spin (cf. Miller et al. 2004). However, no such broad line (nor the edge) is
apparent in the 2002 data. Adding the line (as a Gaussian component) to the model, we
found that the data could accommodate it, but its equivalent width would be merely 14+12
eV, compared to 485+217
−130 eV based on the 2003 data.
– 7 –
Figure 2 shows the observed X-ray spectra of GX 339−4, along with the best-fit models
and the associated residuals. The parameters of the continuum fits are summarized in
Table 1. The source was clearly in the high-soft state during the 2002 observation, with the
disk contributing about 96% of the 0.5–10 keV flux. The spectrum became harder during the
2003 observation, but the disk still contributed about 80% of the 0.5–10 keV flux. Following
Cui et al. (2002), we attempted to derive the color correction factor from the continuum fits.
Briefly, to account for the effects of scattering in a Shakura-Sunyaev disk (Shakura & Sunyaev
1973), one should, strictly speaking, start with a multitemperature blackbody spectrum
for the seed photons. However, comptt assumes a Wien spectrum for the seed photons.
Fitting the peak of diskbb with a Wien distribution leads to Tdiskbb = 2.7TWien. Based on
spectral modeling with comptt, therefore, we can approximate the color correction factor as
fcol = Te/2.7T0 (Cui et al. 2002; see also Zhang 2005). For the 2002 and 2003 observations,
respectively, we have fcol = 1.48
+0.09
−0.08 and 1.35
+0.01
−0.01, which seem quite reasonable. This lends
support to the viability of the observational approach in deriving fcol.
We then replaced the saturated Compton component with a multicolor disk model,
but failed to obtain any formally acceptable fits to the observed X-ray continua with either
“kerrbb” or “bhspec”. In this case, we fixed the inclination angle at the value from relativistic
line modeling (51◦), the mass of the black hole at 10 M⊙, and the distance at 8 kpc (Zdziarski
et al. 2004). With “kerrbb”, we also adopted the default settings for torque-free inner
boundary condition, returning radiation, and limb darkening, and fixed the normalization
at unity and the color correction factors at the values that we derived. The best-fit models
are shown in Figure 3. Neither one is formally acceptable, with χ2/dof = 2634/1203 and
2010/1079 for the 2002 and 2003 observations, respectively. The residuals show significant
structures in both cases. Taken at its face value, the black hole spin would be about 0.7,
after correcting for the loss of flux due to the use of the central nine columns of the pn
chip (see § 2.1). The situation is hardly improved when the inclination angle and the color
correction factor are allowed to vary.
Figure 4 shows the best-fit models with “bhspec”. Again, significant features are no-
ticeable in the residuals. The χ2 values of the fits are χ2/dof = 2246/1203 and 2505/1079 for
the 2002 and 2003 observations, respectively. As already mentioned, in this model spectral
hardening (due to electron scattering) is taken into account in modeling the disk atmosphere.
Again, taken at its face value, the black hole spin is about 0.5. Relaxing the inclination angle
does not improve the fits.
– 8 –
4. Discussion
The importance of accurately modeling the accretion disk X-ray continuum of BHCs
goes beyond gaining insights into radiative processes associated with accretion flows. It also
lies in the exciting prospect of deriving the spin of black holes from such spectral modeling.
The technique is one of many that have been proposed for BHCs (Laor 1991; Bromley et al.
1997; Zhang at al. 1997; Nowak et al. 1997; Cui et al. 1998; Stella et al. 1999; Wagoner
et al. 2001; Abramowicz & Kluzniak 2001). Although varying degrees of success have been
achieved, it is fair to say that the techniques all have serious issues in their applications
to the data. Further investigation, both theoretical and observational, is thus needed to
examine the issues.
We have demonstrated in this work that the high quality of the data is starting to
demand a proper treatment of electron scattering in radiative transfer through the accre-
tion disk around a stellar-mass black hole. Some of the effects that were not appreciated
previously in fitting low S/N data are now becoming apparent. At present, this demanding
situation fundamentally limits our ability to reliably derive the physical parameters of the
accretion disk or the black hole in an X-ray binary, based on modeling the disk X-ray contin-
uum. There are also observational issues that add additional uncertainties to the exercise.
For instance, many key parameters (e.g., black hole mass, inclination angle, and distance)
that characterize a source are often poorly determined but are needed to determine, e.g.,
the black hole spin. This is entirely independent of the quality of X-ray data. Also, perhaps
less appreciated are the significant uncertainties in the absolute and cross calibrations of the
detectors on different X-ray satellites. This issue is relevant, because the determination of
the spin of a black hole in an X-ray binary depends critically on the overall normalization of
the X-ray continuum. This is the reason why one must be very careful in comparing results
based on data from different satellites.
We have shown that neither of the two state-of-the-art disk models is capable of satis-
factorily fitting the observed ultra-soft component of the spectra of GX 339−4. While this
is perhaps not totally surprising for “kerrbb”, since it does not actually carry out radia-
tive transfer calculations, it is for “bhspec”. These models have recently been applied to
data to derive the spin of black holes in a number of systems, so our finding is somewhat
disappointing. If we take the best-fit parameters at their face values, the models would
suggest that GX 339−4 contains a moderately rotating black hole (with a∗ ∼ 0.5–0.6). On
the other hand, if we attribute the asymmetry in the profile of the observed Fe Kα line to
gravitational redshift, we would conclude that the source contains a rapidly rotating black
hole (with a∗ ≈ 0.96). We should note, however, that the apparent inconsistency can be
easily reconciled when we take into account the large uncertainties associated with, e.g.,
– 9 –
black hole mass, inclination angle, and distance. For example, if we adopt 13.5 M⊙ for the
black hole mass, 51◦ for the inclination, and 7.5 kpc for the distance, the “kerrbb’ model
yields a∗ ≈ 0.93 and 0.96 when fitting the 2002 and 2003 data, respectively.
We were able to fit the ultrasoft component quite satisfactorily with a simple saturated
Compton scattering model. The results allowed us to test a procedure that was previously
suggested by Cui et al. (2002) to empirically derive the color correction factor from the same
X-ray data. The values obtained are very close to the theoretical expectation (e.g., Shimura
& Takahara 1995), which is also often adopted in spectral modeling. Therefore, our results
have provided further support for this observational approach. Although the use of a single
color correction factor ignores possible radial dependence of spectral hardening in the disk,
it does not seem unreasonable given that the X-ray emission from the disk originates from
a relatively narrow region (closest to the black hole).
5. Conclusions
Based on our joint spectral analysis of two simultaneous XMM-Newton/RXTE obser-
vations of GX 339-4, we can draw following conclusions:
• The empirical procedure to derive the color correction factor (fcol) observationally,
as proposed by Cui et al. (2002), yields reasonable results. If confirmed by further
investigations, this would eliminate a major (theoretical) uncertainty in deriving the
parameters of the disk from modeling the X-ray continuum.
• The observed X-ray continuum of GX 339-4 in the high-soft state, which is dominated
by emission from the optically-thick accretion disk, cannot be satisfactorily fitted by
any existing disk model. Therefore, one should excise caution in assessing quantitative
results from such spectral modeling.
We wish to thank Shuangnan Zhang for suggesting the derivation of the spectral hard-
ening factor from modeling the disk X-ray continuum and for subsequently collaborating on
the subject. This work is a follow-up to much of the initial discussions. We also thank Lev
Titarchuk for candid discussions on the theoretical aspects of the subject. This research
has made use of data obtained through the High Energy Astrophysics Science Archive Re-
search Center Online Service, provided by the NASA/Goddard Space Flight Center. It was
supported in part by NASA through the LTSA grant NAG5-9998. We also gratefully ac-
knowledge financial support from the Purdue Research Foundation and from a Grodzins
Summer Research Award from the Department of Physics at Purdue University (to G.P.).
Table 1. Best X-ray Continuum Fitsa
comptt comptt
Obs NH kT0 kTe τ K kT0 kTe τ K χ
2/dof
1021 cm−2 keV keV keV keV
2002 4.5 (+1− 2) 0.20 (1) 0.793 (+3− 4) 13.4 (2) 25 (+2− 1) 1.7 (+2 − 1) 46
1.8 (+1− 2) 1.7
× 10−3 978/1201
2003 4.75 (1) 0.170 (1) 0.618 (1) 10.07 (2) 7.58 (2) 1.11 (1) 183 (2) 0.38 (2) 3.21 (3) × 10−3 920/1076
aThe numbers in parentheses indicate uncertainty in the last digit. For asymmetric errors, both the lower and upper bounds are shown, again for the
last digit. The errors shown represent 90% confidence intervals for single parameter estimation.
– 11 –
REFERENCES
Abramowicz, M. A., & Kluzniak, W. 2001, A&A, 374, L19
Arnaud, K. A. 1996, in ASP Conf. Ser. 101, Astronomical Data Analysis Software and
Systems V, ed. G. Jacoby & J. Barnes (San Francisco: ASP), 17
Bromley, B. C., Chen, K., & Miller, W. A. 1997, ApJ, 475, 57
Cui, W., Feng, Y. X., Zhang, S. N., Bautz, M. W., Garmire, G. P., & Schulz, N. S. 2002,
ApJ, 576, 357
Cui, W., Zhang, S. N., & Chen, W. 1998, ApJ, 492, L53
Davis, S. W., Done, C. & Blaes, O. M. 2006, ApJ, 647, 525
Davis, S. W., & Hubeny, I. 2006, ApJS, 164, 530
Ebisuzaki, T., Sugimoto, D., & Hanawa, T. 1984, PASJ, 36, 551
Laor, A., 1991, ApJ, 376, 90
Li, L.-X., Zimmerman, E. R., Narayan, R., & McClintock, J. E. 2005, ApJS, 157, 335
McClintock, J. E., Shafee, R., Narayan, R., Remillard, R. A., Davis, S. W., & Li, L. 2006,
ApJ, 652, 518
Merloni, A., Fabian, A. C., & Ross, R. R. 2000, MNRAS, 313, 193
Middleton, M., Done, C., Gierlinski, M., & Davis, S. W. 2006, MNRAS, 373, 1004
Miller, J. M., et al. 2004, ApJ, 606, L131
Miller, J. M., et al. 2006, ApJ, 653, 525
Misra, R., Chitnis, V. R., & Melia, F. 1998, ApJ, 495, 407
Mitsuda, K., et al. 1984, PASJ, 36, 741
Narayan, R. 1996, ApJ, 462, 136
Nayakshin, S., & Melia, F. 1997, ApJ, 490, L13
Nowak, M. A., Wagoner, R. V., Begelman, M. C., & Lehr, D. E. 1997, ApJ, 477, L91
Shafee, R., McClintock, J. E., Narayan, R., Davis, S. W., Li, L.-X., & Remillard, R. A. 2006,
ApJ, 636, L113
Shakura, N. I., & Sunyaev, R. A., 1973, A& A, 24, 337
Shimura, T., & Takahara, F., 1995, ApJ, 445, 780
Shrader, C., & Titarchuk, L., 1999, ApJ, 521, L91
Stella, L., Vietri, M., & Morsink, S. M. 1999, ApJ, 524, L63
– 12 –
Tanaka, Y., & Lewin, W. H. G. 1995, in X-ray Binaries, ed. W. H. G. Lewin, J. Van Paradijs,
& E. P. J. van den Heuvel (Cambridge: Cambridge Univ. Press), 126
Titarchuk, L., 1994, ApJ, 434, 570
Wagoner, R. V., Silbergleit, A. S., & Ortega-Rodŕıguez, M. 2001, ApJ, 559, L25
Zdziarski, A. A., et al. 2004, MNRAS, 351, 791
Zhang, S. N., Cui, W., & Chen, W. 1997, ApJ, 482, L155
Zhang, S. N., et al. 2000, Science, 287, 1239
Zhang, X. L. 2005, Ph.D. thesis, Univ. Alabama, Huntsville
This preprint was prepared with the AAS LATEX macros v5.2.
– 13 –
Energy (keV)
Fig. 1.— Broad line detected in the 2003 X-ray spectrum. Shown are the residuals after the
“laor” component is removed from the best-fit model (see text).
– 14 –
1 10 100
Energy (keV)
Energy (keV)
Fig. 2.— Observed X-ray spectra of GX 339−4 from the 2002 (left) and 2003 (right) obser-
vations. The best-fit models are shown in solid histograms. The bottom panels show the
respective residuals of the fits.
1 10 100
Energy (keV)
Energy (keV)
Fig. 3.— Same as Fig. 2 but the disk emission was modeled with “kerrbb”.
– 15 –
Fig. 4.— Same as Fig. 2 but the disk emission was modeled with “bhspec”.
	Introduction
	Data
	XMM-Newton Observations
	RXTE Observations
	Results
	Discussion
	Conclusions
ABSTRACT
  We critically examine issues associated with determining the fundamental
properties of the black hole and the surrounding accretion disk in an X-ray
binary based on modeling the disk X-ray continuum of the source. We base our
work mainly on two XMM-Newton observations of GX 339-4, because they provided
high-quality data at low energies (below 1 keV) which are critical for reliably
modeling the spectrum of the accretion disk. A key issue examined is the
determination of the so-called "color correction factor", which is often
empirically introduced to account for the deviation of the local disk spectrum
from a blackbody (due to electron scattering). This factor cannot be
pre-determined theoretically because it may vary with, e.g., mass accretion
rate, among a number of important factors. We follow up on an earlier
suggestion to estimate the color correction observationally by modeling the
disk spectrum with saturated Compton scattering. We show that the spectra can
be fitted well and the approach yields reasonable values for the color
correction factor. For comparison, we have also attempted to fit the spectra
with other models. We show that even the high-soft-state continuum (which is
dominated by the disk emission) cannot be satisfactorily fitted by
state-of-the-art disk models. We discuss the implication of the results.

<|endoftext|><|startoftext|>
Introduction
Let G = (V1, V2, E) be an undirected bipartite graph. A biclique subgraph in G
is a complete bipartite subgraph of G and maximum edge biclique (MEB) is the
problem of finding a biclique subgraph with the most number of edges. MEB is
a well-known problem and received much attention in recent years because of
its wide range of applications in areas including machine learning [14], manage-
ment science [16] and bioinformatics, where it is found particularly relevant in
the formulation of numerous biclustering problems for biological data analysis
[5,2,18,19,17], and we refer readers to the survey by Madeira and Oliveira [13]
for a fairly extensive discussion on this. Maximum edge biclique is shown to be
NP-hard by Peeters [15] via a reduction from 3SAT. Its approximability status,
on the other hand, remains an open question despite considerable efforts [7,8,12]
1. In particular, Feige and Kogan [8] conjectured that maximum edge biclique
1 Note it might be easy to confuse the MEB problem with the Bipartite Clique problem
discussed by Khot in [12]. Bipartite Clique, which also known as Balanced Complete
http://arxiv.org/abs/0704.0468v2
2 Jinsong Tan
is hard to approximate within a factor of nǫ for some ǫ > 0. In this paper, we
consider a weighted formulation of this problem defined as follows
Definition 1. S-Maximum Weighted Edge Biclique (S-MWEB)
Instance: A complete bipartite graph G = (V1, V2, E) (throughout the paper, let
η = max{|V1|, |V2|} and n = |V1|+ |V2|), a weight function wG : E → S, where
S is a set consisting of both positive and negative integers.
Question: Find a biclique subgraph of G where the sum of weights on edges is
maximized.
A few comments are in order. First note it is not a lose of generality but a
technical convenience to require the graph be complete, one can always think of
an incomplete bipartite graph as complete where non-edges are assigned weight
0. Also note we require that both positive and negative weights be in S at the
same time because otherwise S-MWEB becomes a trivial problem.
Our study of S-MWEB is motivated by the problem of finding statistically
significant biclusters in microarray data analysis in the SAMBA model [18]
and the Minimum Description Length with Holes (MDLH) problem [3,4,10];
detailed discussion of the two problems can be found in Sect. 4. Our main
technical contribution of this paper is to show that if S satisfies the condition
|minS
| ∈ Ω(ηδ−1/2) ∩O(η1/2−δ), where δ > 0 is any arbitrarily small constant,
then no polynomial time algorithm can approximate S-MWEB within a factor
of nǫ for some ǫ > 0 unless RP = NP. This result enables us to answer open
questions regarding the hardness of the SAMBA model and the MDLH prob-
lem. Since maximum edge biclique can be characterized as a special case of
S-MWEB with S = {−η, 1}, the nǫ-inapproximability result also provides inter-
esting insights into the conjectured nǫ-inapproximability [8] of maximum edge
biclique.
The rest of the paper is organized in three sections. In Sect. 2, we present
the main technical result by proving the aforementioned inapproximability of S-
MWEB. We give applications of this by answering hardness questions regarding
two applied problems in Sect. 3. We conclude this work by raising a few open
problems in the last section.
2 Approximating S-Maximum Edge Biclique is Hard
We start this section by giving two lemmas about CLIQUE, which will be used
in establishing inapproximability for the biclique problems we consider later.
Lemma 1 is a recent result by Zuckerman [20], obtained by a derandomization
of results of H̊astad [11]; Lemma 2 follows immediately from Lemma 1.
Lemma 1. ([20]) It is NP-hard to approximate CLIQUE within a factor of
n1−ǫ, for any ǫ > 0.
Bipartite Subgraph [8], aims to maximize the number of vertices of a balanced sub-
graph whereas MEB aims to maximize the total weights on edges in a (not necessarily
balanced) subgraph.
Inapproximability of Maximum Weighted Edge Biclique and Its Applications 3
Lemma 2. For any constant ǫ > 0, no polynomial time algorithm can approx-
imate CLIQUE within a factor of n1−ǫ with probability at least 1
poly(n)
unless
RP = NP.
2.1 A Technical Lemma
We first describe the construction of a structure called {γ, {α, β}}-Product,
which will be used in the proof of our main technical lemma.
Definition 2. ({γ, {α, β}}-Product)
Input: An instance of S-MWEB on complete bipartite graph G = V1×V2, where
γ ∈ S and α < γ < β; an integer N .
Output: Complete bipartite graph GN = V N1 × V N2 constructed as follows: V N1
and V N2 are N duplicates of V1 and V2, respectively. For each edge (i, j) ∈ GN ,
let (φ(i), φ(j)) be the corresponding edge in G. If wG(φ(i), φ(j)) = γ, assign
weight α or β to (i, j) independently and identically at random with expectation
being γ, denote the weight by random variable X. If wG(φ(i), φ(j)) 6= γ, then
keep the weight unchanged. Call the weight function constructed this way w(·).
For any subgraph H of GN , denote by wγ(H) (resp., w−γ(H)) the total
weight of H contributed by former-γ-edges (resp., other edges). Clearly, w(H) =
wγ(H) + w−γ(H).
With a graph product constructed in this randomized fashion, we have the fol-
lowing lemma.
Lemma 3. Given an S-MWEB instance G = (V1, V2, E) where γ ∈ S, and a
number δ ∈ (0, 1
]; let η = max (|V1|, |V2|), N = η
δ(3−2δ)+3
δ(1+2δ) , GN = (V N1 , V
2 , E)
be the {γ, {α, β}}-product of G and S ′ = (S ∪ {α, β})− {γ}. If
1. |β − α| = O((Nη) 12−δ); and
2. there is a polynomial time algorithm that approximates the S ′-MWEB
instance within a factor of λ, where λ is some arbitrary function in the size of
the S ′-MWEB instance
then there exists a polynomial time algorithm that approximates the S-MWEB
instance within a factor of λ, with probability at least 1
poly(n)
Proof. For notational convenience, we denote η
−δ by f(η) throughout the proof.
Define random variable Y = X − γ, clearly E[Y ] = 0. Suppose there is a poly-
nomial time algorithm A that approximates S ′-MWEB within a factor of λ, we
can then run A on GN , the output biclique G∗B corresponds to N
2 bicliques in
G (not necessarily all distinct). Let G∗A be the most weighted among these N
subgraphs of G, in the rest of the proof we show that with high probability, G∗A
is a λ-approximation of S-MWEB on G.
Denote by E1 the event that G
B does not imply a λ-approximation on G.
Let H be the set of subgraphs of GN that do not imply a λ-approximation on G,
4 Jinsong Tan
clearly, |H| ≤ 4Nη. Let H ′ be an arbitrary element in H, we have the following
inequalities
Pr {E1} ≤ Pr
at least one element in H is a λ-approximation of GN
≤ 4Nη · Pr
H ′ is a λ-approximation of GN
= 4Nη · Pr{E2}
where E2 is the event that H
′ is a λ-approximation of GN .
Let the weight of an optimal solution U1×U2 of G be K, denote by UN1 ×UN2
the correspondingN2-duplication in GN . Let x1 and x2 be the number of former-
γ-edges in H ′ and UN1 × UN2 , respectively. Suppose E2 happens, then we must
w−γ(H
′) + x1γ ≤ N2(Kλ − 1)
w−γ(H
′) + wγ(H
′) ≥ 1
(w−γ(U
1 × UN2 ) + wγ(UN1 × UN2 ))
where the first inequality follows from the fact that we only consider integer
weights. Since w−γ(U
1 × UN2 ) = N2K − x2γ, it implies
(wγ(H
′)− x1γ)−
(wγ(U
1 × UN2 )− x2γ) ≥ N2
so we have the following statement on probability
Pr{E2} ≤ Pr
(wγ(H
′)− x1γ)− 1λ(wγ(U
1 × UN2 )− x2γ) ≥ N2
Let z1 (resp., z2 and z3) be the number of edges in E(H
′) − E(UN1 × UN2 )
( resp., E(UN1 × UN2 ) − E(H ′) and E(UN1 × UN2 ) ∩ E(H ′) ) transformed from
former-γ-edges in G. We have
(wγ(H
′)− x1γ)− 1λ(wγ(U
1 × UN2 )− x2γ) ≥ N2
i=1 Yi −
j=1 Yj +
k=1 Yk ≥ N2
i=1 Yi +
j=1 (−Yj) +
k=1 Yk ≥ N2
i=1 Yi ≥
j=1 (−Yj) ≥
k=1 Yk ≥
i=1 Yi ≥
j=1 (−Yj) ≥
k=1 Yk ≥
i∈{1,2,3}
3zi(c1f(Nη))
(Hoeffding bound)
≤ 3 · exp
−c2 · N
η3−2δ
(zi ≤ η2N2)
where c1, c2 are constants (c2 > 0). Now if we set N = η
+θ for some θ, we
Pr {E1} ≤ 4Nη · Pr {E2} ≤ 3 · exp
ln 4 · η
(1+2δ)
+θ − c2 · η(1+2δ)θ
For this probability to be bounded by 1
as η is large enough, we need to have
+θ < (1+2δ)θ. Solving this inequality gives θ > 2
δ(1+2δ)
. Therefore, for any
δ ∈ (0, 1
], by setting N = η
δ(3−2δ)+3
δ(1+2δ) , we have Pr{E1}, i.e. the probability that
Inapproximability of Maximum Weighted Edge Biclique and Its Applications 5
the solution returned by A does not imply a λ-approximation of G, is bounded
from above by 1
once input size is large enough. This gives a polynomial time
algorithm that approximates S-MWEB within a factor of λ with probability at
least 1
This lemma immediately leads to the following corollary.
Corollary 1. Following the construction in Lemma 3, if S ′-MWEB can be ap-
proximated within a factor of nǫ
, for some ǫ′ > 0, then there exists a polyno-
mial time algorithm that approximates S-MWEB within a factor of nǫ, where
ǫ = (1 +
δ(3−2δ)+3
δ(1+2δ)
)ǫ′, with probability at least 1
poly(n)
Proof. Let |G| and |GN | be the number of nodes in the S-MWEB and S ′-MWEB
problem, respectively. Since λ = |GN |ǫ′ ≤ |G|(1+
δ(3−2δ)+3
δ(1+2δ)
, our claim follows
from Lemma 3. ⊓⊔
2.2 {−1, 0, 1}-MWEB
In this section, we prove inapproximability of {−1, 0, 1}-MWEB by giving a
reduction from CLIQUE; in subsequence sections, we prove inapproximability
results for more general S-MWEB by constructing randomized reduction from
{−1, 0, 1}-MWEB.
Lemma 4. The decision version of the {−1, 0, 1}-MWEB problem is NP-complete.
Proof. We prove this by describing a reduction from CLIQUE. Given a CLIQUE
instance G = (V,E), construct G′ = (V ′, E′) such that V ′ = V1∪V2 where V1, V2
are duplicates of V in that there exist bijections φ1 : V1 → V and φ2 : V2 → V .
E′ = E1 ∪ E2 ∪E3
E1 = {(u, v) | u ∈ V1, v ∈ V2 and (φ1(u), φ2(v)) ∈ E}
E2 = {(u, v) | u ∈ V1, v ∈ V2, φ1(u) 6= φ2(v) and (φ1(u), φ2(v)) /∈ E}
E3 = {(u, v) | u ∈ V1, v ∈ V2, and φ1(u) = φ2(v)}
Clearly, G′ is a biclique. Now assign weight 0 to edges in E1, −1 to edges in
E2 and 1 to edges in E3. We then claim that there is a clique of size k in G if
and only if there is a biclique of total edge weight k in G′.
First consider the case where there is a clique of size k in G, let U be the set
of vertices of the clique, then taking the subgraph induced by φ−11 (U)× φ
2 (U)
in G′ gives us a biclique of total weight k.
Now suppose that there is a biclique U1×U2 of total weight k in G′. Without
loss of generality, assume U1 and U2 correspond to the same subset of vertices in
2 Note we are slightly abusing notation here by always representing the size of a given
problem under discussion by n. Here n refers to the size of S ′-MWEB (resp. S-
MWEB) when we are talking about approximation factor nǫ
(resp. nǫ). We adopt
the same convention in the sequel.
6 Jinsong Tan
V because if (φ1(U1)−φ2(U2))∪ (φ2(U2)−φ1(U1)) is not empty, then removing
(U1 −U2)∪ (U2 −U1) will never decrease the total weight of the solution. Given
φ1(U1) = φ2(U2), we argue that there is no edge of weight −1 in biclique U1×U2;
suppose otherwise there exists a weight −1 edge (i1, j2) (i1 ∈ U1, and j2 ∈ U2),
then the corresponding edge (j1, i2) (j1 ∈ U1, and i2 ∈ U2) must be of weight
−1 too and removing i1, i2 from the solution biclique will increase total weight
by at least 1 because among all edges incident to i1 and i2, (i1, i2) is of weight 1,
(i1, j2) and (i2, j1) are of weight −1 and the rest are of weights either 0 or −1.
Therefore, we have shown that if there is a solution U1 × U2 of weight k in
G′, U1 and U2 correspond to the same set of vertices U ∈ V and U is a clique of
size k. It is clear that the reduction can be performed in polynomial time and
the problem is NP, and thus NP-complete. ⊓⊔
Given Lemma 1, the following corollary follows immediately from the above
reduction.
Theorem 1. For any constant ǫ > 0, no polynomial time algorithm can approx-
imate problem {−1, 0, 1}-MWEB within a factor of n1−ǫ unless P = NP.
Proof. It is obvious that the reduction given in the proof of Lemma 4 preserves
inapproximability exactly, and given that CLIQUE is hard to approximate within
a factor of n1−ǫ unless P = NP, the theorem follows. ⊓⊔
Theorem 2. For any constant ǫ > 0, no polynomial time algorithm can approx-
imate {−1, 0, 1}-MWEB within a factor of n1−ǫ with probability at least 1
poly(n)
unless RP = NP.
Proof. If there exists such a randomized algorithm for {−1, 0, 1}-MWEB, com-
bining it with the reduction given in Lemma 4, we obtain an RP algorithm for
CLIQUE. This is impossible unless RP = NP. ⊓⊔
2.3 {−1, 1}-MWEB
Lemma 5. If there exists a polynomial time algorithm that approximates {−1, 1}-
MWEB within a factor of nǫ, then there exists a polynomial time algorithm that
approximates {−1, 0, 1}-MWEB within a factor of n5ǫ with probability at least
poly(n)
Proof. We prove this by constructing a {γ, {α, β}}-Product from {−1, 0, 1}-
MWEB to {−1, 1}-MWEB by setting γ = 0, α = −1 and β = 1. Since δ = 1
according to Corollary 1, it is sufficient to set N = η4 so that the probability of
obtaining a n5ǫ-approximation for {−1, 0, 1}-MWEB is at least 1
poly(n)
Theorem 3. For any constant ǫ > 0, no polynomial time algorithm can approx-
imate {−1, 1}-MWEB within a factor of n 15−ǫ with probability at least 1
poly(n)
unless RP = NP.
Proof. This follows directly from Theorem 2 and Lemma 5. ⊓⊔
Inapproximability of Maximum Weighted Edge Biclique and Its Applications 7
2.4 {−η
, 1}-MWEB and {−ηδ−
2 , 1}-MWEB
In this section, we consider the generalized cases of the S-MWEB problem.
Theorem 4. For any δ ∈ (0, 1
], there exists some constant ǫ such that no poly-
nomial time algorithm can approximate {−η 12−δ, 1}-MWEB within a factor of
nǫ with probability at least 1
poly(n)
unless RP = NP. The same statement holds
for {−ηδ− 12 , 1}-MWEB.
Proof. We prove this by first construct a {γ, {α, β}}-Product from {−1, 1}-
MWEB to {−η 12−δ, 1}-MWEB by setting γ = −1, α = −(Nη) 12−δ and β = 1. By
Corollary 1, we know that for any δ ∈ (0, 1
], if there exists a polynomial time al-
gorithm that approximates {−η 12−δ, 1}-MWEB within a factor of nǫ, then there
exists a polynomial time algorithm that approximates {−1, 1}-MWEB within a
factor of n
δ(3−2δ)+3
δ(1+2δ)
with probability at least 1
poly(n)
. So invoking the hardness
result in Theorem 3 gives the desired hardness result for {−η 12−δ, 1}-MWEB.
The same conclusion applies to {−1, η 12−δ}-MWEB by setting γ = 1, α = −1
and β = (Nη)
−δ. Since η is a constant for any given graph, we can simply divide
each weight in {−1, η 12−δ} by η 12−δ. ⊓⊔
Theorem 4 leads to the following general statement.
Theorem 5. For any small constant δ ∈ (0, 1
], if
∣ ∈ Ω(ηδ−1/2)∩O(η1/2−δ),
then there exists some constant ǫ such that no polynomial time algorithm can ap-
proximate S-MWEB within a factor of nǫ with probability at least 1
poly(n)
unless
RP = NP.
3 Two Applications
In this section, we describe two applications of the results establish in Sect. 3 by
proving hardness and inapproximability of problems found in practice.
3.1 SAMBA Model is Hard
Microarray technology has been the latest technological breakthrough in biolog-
ical and biomedical research; in many applications, a key step in analyzing gene
expression data obtained through microarray is the identification of a bicluster
satisfying certain properties and with largest area (see the survey [13] for a fairly
extensive discussion on this).
In particular, Tanay et. al. [18] considered the Statistical-Algorithmic Method
for Bicluster Analysis (SAMBA) model. In their formulation, a complete bipar-
tite graph is given where one side corresponds to genes and the other size cor-
responds to conditions. An edges (u, v) is assigned a real weight which could be
either positive or negative, depending on the expression level of gene u in condi-
tion v, in a way such that heavy subgraphs corresponds to statistically significant
8 Jinsong Tan
biclusters. Two weight-assigning schemes are considered in their paper. In the
first, or simple statistical model, a tight upper-bound on the probability of an
observed biclusters in computed; in the second, or refined statistical model, the
weights are assigned in a way such that a maximum weight biclique subgraph
corresponds to a maximum likelihood bicluster.
The Simple SAMBA Statistical Model: LetH = (V ′1 , V
2 , E
′) be a subgraph
of G = (V1, V2, E), E′ = {V ′1 × V ′2} − E′ and p =
|V1||V2|
. The simple statistical
model assumes that edges occur independently and identically at random with
probability p. Denote by BT (k, p, n) the probability of observing k or more
successes in n binomial trials, the probability of observing a graph at least as
dense as H is thus p(H) = BT (|E′|, p, |V ′1 ||V ′2 |). This model assumes p < 12 and
|V ′1 ||V ′2 | ≪ |V1||V2|, therefore p(H) is upper bounded by
p∗(H) = 2|V
1 ||V
2 |p|E
′|(1− p)|V
1 ||V
2 |−|E
The goal of this model is thus to find a subgraph H with the smallest p∗(H).
This is equivalent to maximizing
− log p∗(H) = |E′|(−1− log p) + (|V ′1 ||V ′2 | − |E′|)(−1− log (1− p))
which is essentially solving a S-MWEB problem that assigns either positive
weight (−1 − log p) or negative weight (−1 − log (1 − p)) to an edge (u, v), de-
pending on whether gene u express or not in condition v, respectively. The
summation of edge weights over H is defined as the statistical significance of H .
Since 1
≤ p < 1
, asymptotically we have
−1−log (1−p)
−1−log p
∈ Ω( 1
log η
) ∩ O(1).
Invoking Theorem 5 gives the following.
Theorem 6. For the Simple SAMBA Statistical model, there exists some ǫ > 0
such that no polynomial time algorithm, possibly randomized, can find a bicluster
whose statistical significance is within a factor of nǫ of optimal unless RP = NP.
The Refined SAMBA Statistical Model: In the refined model, each edge
(u, v) is assumed to take an independent Bernoulli trial with parameter pu,v,
therefore p(H) = (
(u,v)∈E′ pu,v)(
(u,v)∈E′(1 − pu,v)) is the probability of ob-
serving a subgraph H . Since p(H) generally decreases as the size of H increases,
Tanay et al. aims to find a bicluster with the largest (normalized) likelihood ra-
tio L(H) =
(u,v)∈E′ pc)(
(u,v)∈E′(1− pc))
, where pc > max(u,v)∈E pu,v is a
constant probability and chosen with biologically sound assumptions. Note this
is equivalent to maximizing the log-likelihood ratio
logL(H) =
(u,v)∈E′
(u,v)∈E′
1− pc
1− pu,v
With this formulation, each edge is assigned weight either log pc
> 0 or
log 1−pc
1−pu,v
< 0 and finding the most statistically significant bicluster is equiva-
lent to solving S-MWEB with S = {log 1−pc
1−pu,v
, log pc
}. Since pc is a constant
Inapproximability of Maximum Weighted Edge Biclique and Its Applications 9
and 1
≤ pu,v < pc, we have log (1−pc)−log (1−pu,v)log pc−log pu,v ∈ Ω(
log η
) ∩ O(1). Invoking
Theorem 5 gives the following.
Theorem 7. For the Refined SAMBA Statistical model, there exists some ǫ > 0
such that no polynomial time algorithm, possibly randomized, can find a bicluster
whose log-likelihood is within a factor of nǫ of optimal unless RP = NP.
3.2 Minimum Description Length with Holes (MDLH) is Hard
Bu et. al [4] considered the Minimum Description Length with Holes problem
(defined in the following); the 2-dimensional case is claimed NP-hard in this
paper and the proof is referred to [3]. However, the proof given in [3] suffers
from an error in its reduction3, thus whether MDLH is NP-complete remains
unsettled. In this section, by employing the results established in the previous
sections, we show that no polynomial time algorithm exists for MDLH, under
the slightly weaker (than P 6= NP) but widely believed assumption RP 6= NP.
We first briefly describe the Minimum Description Length summarization
with Holes problem; for a detailed discussion of the subject, we refer the readers
to [3,4].
Suppose one is given a k-dimensional binary matrix M , where each entry is
of value either 1, which is of interest, or of value 0, which is not of interest. Be-
sides, there are also k hierarchies (trees) associated with each dimension, namely
T1, T2, ..., Tk, each of height l1, l2, ..., lk respectively. Define level l = maxi(li).
For each Ti, there is a bijection between its leafs and the ’hyperplanes’ in the
ith dimension (e.g. in a 2-dimensional matrix, these hyperplanes corresponds to
rows and columns). A region is a tuple (x1, x2, ..., xk), where xi is a leaf node
or an internal node in hierarchy Ti. Region (x1, x2, ..., xk) is said to cover cell
(c1, c2, ..., ck) if ci is a descendant of xi, for all 1 ≤ i ≤ k. A k-dimensional l-level
MDLH summary is defined as two sets S and H , where 1) S is a set of regions
covering all the 1-entries in M ; and 2) H is the set of 0-entries covered (unde-
sirably) by S and to be excluded from the summary. The length of a summary
is defined as |S|+ |H |, and the MDLH problem asks the question if there exists
a MDLH summary of length at most K, for a given K > 0.
In an effort to establish hardness of MDLH, we first define the following
problem, which serves as an intermediate problem bridging {−1, 1}-MWEB and
MDLH.
Definition 3. (Problem P)
Instance: A complete bipartite graph G = (V1, V2, E) where each edge takes on
a value in {−1, 1}, and a positive integer k.
Question: Does there exist an induced subgraph (a biclique U1 × U2) whose
total weight of edges is ω, such that |U1|+ |U2|+ ω ≥ k.
Lemma 6. No polynomial time algorithm exists for Problem P unless RP = NP.
3 In Lemma 3.2.1 of [3], the reduction from CLIQUE to CEW is incorrect.
10 Jinsong Tan
Proof. We prove this by constructing a reduction from {−1, 1}-MWEB to Prob-
lem P as follows: for the given input biclique G = (V1, V2, E), make N duplicates
of V1 and N duplicates of V2, where N = (|V1| + |V2|)2. Connect each copy of
V1 to each copy of V2 in a way that is identical to the input biclique, we then
claim that there is a size k solution to {−1, 1}-MWEB if and only if there is a
size N2k solution to Problem P .
If there is a size k solution to {−1, 1}-MWEB, then it is straightforward that
there is a solution to Problem P of size at leastN2k. For the reverse direction, we
show that if no solution to {−1, 1}-MWEB is of size at least k, then the maximum
solution to Problem P is strictly less than N2k. Note a solution UN1 × UN2 to
Problem P consists of at most N2 (not necessarily all distinct) solutions to
{−1, 1}-MWEB, and each of them can contribute at most (k − 1) in weight to
UN1 ×UN2 , so the total weight gained from edges is at most N2(k− 1). And note
the total weight gained from vertices is at most N(|V1|+ |V2|) = N
N , therefore
the weight is upper bounded by N
N +N2(k − 1) < N2k and this completes
the proof.
As a conclusion, we have a polynomial time reduction from {−1, 1}-MWEB
to Problem P . Since no polynomial time algorithm exists for {−1, 1}-MWEB
unless RP = NP, the same holds for Problem P . ⊓⊔
Theorem 8. No polynomial time algorithm exists for MDLH summarization,
even in the 2-dimension 2-level case, unless RP = NP.
Proof. We prove this by showing that Problem P is a complementary problem
of 2-dimensional 2-level MDLH.
Let the input 2DmatrixM be of size n1×n2, with a tree of height 2 associated
with each dimension. Without loss of generality, we only consider the ’sparse’
case where the number of 1-entries is less than the number of 0-entries by at
least 2 so that the optimal solution will never contain the whole matrix as one
of its regions. Let S be the set of regions in a solution. Let R and C be the set
of rows and columns not included in S. Let Z be the set of all zero entries in M .
Let z be the total number of zero entries in the R × C ’leftover’ matrix and let
w be the total number of 1-entries in it. MDLH tries to minimize the following:
(n1 − |R|) + (n2 − |C|) + (|Z| − z) + w = (n1 + n2 + |Z|)− (|R|+ |C|+ z − w)
Since (n1 + n2 + |Z|) is a fixed quantity for any given input matrix, the 2-
dimensional 2-level MDLH problem is equivalent to maximizing (|R|+|C|+z−w),
which is precisely the definition of Problem P .
Therefore, 2-dimensional 2-level MDLH is a complementary problem to Prob-
lem P and by Lemma 6 we conclude that no polynomial time algorithm exists
for 2-dimensional 2-level MDLH unless RP = NP. ⊓⊔
4 Concluding Remarks
Maximum weighted edge biclique and its variants have received much atten-
tion in recently years because of it wide range of applications in various fields
Inapproximability of Maximum Weighted Edge Biclique and Its Applications 11
including machine learning, database, and particularly bioinformatics and com-
putational biology, where many computational problems for the analysis of mi-
croarray data are closely related. To tackle these applied problems, various kinds
of heuristics are proposed and experimented and it is not known whether these
algorithms give provable approximations. In this work, we answer this question
by showing that it is highly unlikely (under the assumption RP 6= NP) that good
polynomial time approximation algorithm exists for maximum weighted edge
biclique for a wide range of choices of weight; and we further give specific appli-
cations of this result to two applied problems. We conclude our work by listing
a few open questions.
1. We have shown that {Θ(−ηδ), 1}-MWEB is nǫ-inapproximable for δ ∈
); also it is easy to see that (i) the problem is in P when δ ≤ −1, where
the entire input graph is the optimal solution; (ii) for any δ ≥ 1, the problem is
equivalent to MEB, which is conjectured to be nǫ-inapproximable [8]. Therefore
it is natural to ask what is the approximability of the {−nδ, 1}-MWEB problem
when δ ∈ (−1,− 1
] and δ ∈ [ 1
, 1]. In particular, can this be answered by a better
analysis of Lemma 3?
2. We are especially interested in {−1, 1}-MWEB, which is closely related
to the formulations of many natural problems [1,3,4,18]. We have shown that
no polynomial time algorithm exists for this problem unless RP = NP, and we
believe this problem is NP-complete, however a proof has eluded us so far.
References
1. N. Bansal, A. Blum, and S. Chawla. Correlation clustering, Machine Learning,
56:89-113, 2004.
2. A. Ben-Dor, B. Chor, R. Karp, and Z. Yakhini. Discovering local structure in
gene expression data: The Order-Preserving Submatrix Problem. In Proceedings of
RECOMB’02, 49-57, 2002.
3. S. Bu. The summarization of hierarchical data with exceptions. Master The-
sis, Department of Computer Science, University of British Columbia, 2004.
http://www.cs.ubc.ca/grads/resources/thesis/Nov04/Shaofeng Bu.pdf
4. S. Bu, L. V. S. Lakshmanan, R. T. Ng. MDL Summarization with Holes. In Pro-
ceedings of VLDB’05, 433-444, 2005.
5. Y. Cheng, and G. Church. Biclustering of expression data. In Proceedings of
ISMB’00, 93-103. AAAI Press, 2000.
6. M. Dawande, P. Keskinocak, J. M. Swaminathan, and S. Tayur. On Bipartite and
multipartite clique problems. Journal of Algorithms, 41(2):388-403, 2001.
7. U. Feige. Relations between average case complexity and approximation complex-
ity. In Proceedings of STOC’02, 534-543, 2002.
8. U. Feige and S. Kogan. Hardness of approximation of the Balanced Complete
Bipartite Subgraph problem. Technical Report MCS04-04, The Weizmann Institute
of Science, 2004.
9. M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the
Theory of NP-completeness. Freeman, San Francisco, 1979.
10. P. Fontana, S. Guha and J. Tan. Recursive MDL Summarization and Approxima-
tion Algorithms. Preprint, 2007.
http://www.cs.ubc.ca/grads/resources/thesis/Nov04/Shaofeng_Bu.pdf
12 Jinsong Tan
11. J. H̊astad. Clique is hard to approximate within n1−ǫ. Acta Mathematica, 182:105-
142, 1999.
12. S. Khot. Ruling out PTAS for Graph Min-Bisection, Densest Subgraph and Bipar-
tite Clique. In Proceedings of FOCS’04, 136-145, 2004.
13. S. C. Madeira, and A. L. Oliveira. Biclustering algorithms for biological data anal-
ysis: a survey. IEEE/ACM Transactions on Computational Biology and Bioinfor-
matics, 1:24-45, 2004.
14. N. Mishra, D. Ron, and R. Swaminathan. On finding large conjunctive clusters. In
Proceedings of COLT’03, 448-462, 2003.
15. R. Peeters. The maximum edge biclique problem is NP-complete. Discrete Applied
Mathematics, 131:651-654, 2003.
16. J. M. Swaminathan and S. Tayur. Managing Broader Product Lines Through De-
layed Differentiation Using Vanilla Boxes. Management Science, 44:161-172, 1998.
17. J. Tan, K. Chua, L. Zhang, and S. Zhu. Algorithmic and Complexity Issues of
Three Clustering Methods in Microarray Data Analysis Algorithmica, 48(2): 203-
219, 2007.
18. A. Tanay, R. Sharan, and R. Shamir. Discovering statistically significant biclusters
in gene expression data. Bioinformatics, 18, Supplement 1:136-144, 2002.
19. L. Zhang, and S. Zhu. A New Clustering Method for Microarray Data Analysis.
In Proceedings of CSB’02, 268-275, 2002.
20. D. Zuckerman. Linear Degree Extractors and the Inapproximability of Max Clique
and Chromatic Number. In Proceedings of STOC’06, 681-690, 2006.
	Inapproximability of Maximum Weighted Edge Biclique and Its Applications
	Jinsong Tan
ABSTRACT
  Given a bipartite graph $G = (V_1,V_2,E)$ where edges take on {\it both}
positive and negative weights from set $\mathcal{S}$, the {\it maximum weighted
edge biclique} problem, or $\mathcal{S}$-MWEB for short, asks to find a
bipartite subgraph whose sum of edge weights is maximized. This problem has
various applications in bioinformatics, machine learning and databases and its
(in)approximability remains open. In this paper, we show that for a wide range
of choices of $\mathcal{S}$, specifically when $| \frac{\min\mathcal{S}} {\max
\mathcal{S}} | \in \Omega(\eta^{\delta-1/2}) \cap O(\eta^{1/2-\delta})$ (where
$\eta = \max\{|V_1|, |V_2|\}$, and $\delta \in (0,1/2]$), no polynomial time
algorithm can approximate $\mathcal{S}$-MWEB within a factor of $n^{\epsilon}$
for some $\epsilon > 0$ unless $\mathsf{RP = NP}$. This hardness result gives
justification of the heuristic approaches adopted for various applied problems
in the aforementioned areas, and indicates that good approximation algorithms
are unlikely to exist. Specifically, we give two applications by showing that:
1) finding statistically significant biclusters in the SAMBA model, proposed in
\cite{Tan02} for the analysis of microarray data, is
$n^{\epsilon}$-inapproximable; and 2) no polynomial time algorithm exists for
the Minimum Description Length with Holes problem \cite{Bu05} unless
$\mathsf{RP=NP}$.

<|endoftext|><|startoftext|>
Introduction.
We start by showing a one-to-one correspondence between arrangements of d lines in P2,
and lines in Pd−2. This correspondence is a particular case of a more general one between
arrangements of d sections on ruled surfaces, which generalize line arrangements (see Remark
2.1), and certain curves in Pd−2. This was developed by the author in [21]. In there, we
consider arrangements as single curves via moduli spaces of pointed stable rational curves.
One of the main ingredients is the description of these moduli spaces given by Kapranov in
[14] and [15]. In the present paper, we only treat the case of line arrangements, giving a proof
of this correspondence by means of quite elementary geometry. An important consequence
of seeing an arrangement as a single curve is, as it turns out, to answer questions about
realization of certain line arrangements. That is the second part of this paper. Through this
correspondence, we are able to classify the so-called (3, q)-nets over C for all q ≤ 6, and the
Quaternion nets. This classification shows various new properties for 3-nets, and opens the
question about realization of Latin squares in P2
(they give the combinatorial data which
defines a 3-net). The following is an outline of the paper.
In Section 2, we prove a one-to-one correspondence between arrangements of d lines, and
lines in Pd−2. An arrangement of d lines is a set
A = {L1, L2, . . . , Ld}
of d labeled lines in P2 such that
i=1 Li = ∅. As it was pointed out above, our general
correspondence is between arrangements of sections and curves. For this reason, instead
of considering line arrangements, we consider pairs (A, P ) where A is an arrangement of d
lines, and P ∈ P2 is a point outside of
i=1 Li. In Proposition 2.1, we prove a one-to-one
correspondence between these pairs, up to projective equivalence, and lines in Pd−2 outside
of a fixed hyperplane arrangement Hd. We also sketch a second proof of it (see Remark
2.1) to hint the more general correspondence mentioned before (see [21] for details).
Under this correspondence, for a fixed arrangement A, different choices of P may produce
different lines in Pd−2. Thus, if one fixes the combinatorial type of A and wants to study
http://arxiv.org/abs/0704.0469v4
its moduli space, the presence of this artificial point P introduces more parameters than
needed. In practice, even the question of realization of A becomes hard with this extra point
P . To eliminate this difficulty, we take P in A and consider the new pair (A′, P ), where the
lines in A′ are the lines in A not containing P (with a certain labelling). By taking P as
a point lying on several lines of A, one greatly simplifies computations to prove or disprove
its realization over some field, and to find a moduli space for its combinatorial type.
In Sections 3 and 4, we use our method to study a particular type of line arrangements
which are called nets. There is a large body of literature about them (cf. [1], [3], [5], [6],
[17], [22], [19], [20], [9]). Nowadays, they are of interest to topologists who study resonance
varieties of complex line arrangements (see [17], [22], [4], [9]). In general, they can be thought
as the geometric structures of finite quasigroups, which in turn are intimately related with
Latin squares [6]. We define (p, q)-nets in Section 3. We exemplify our correspondence
by computing the Hesse arrangement, which is the only (4, 3)-net over C, and by showing
that (4, 4)-nets do not exist in characteristic different than 2 (they do in characteristic 2,
see Example 3.3). In [22], Yuzvinsky proved that (p, q)-nets over C are only possible for
p = 3, 4, 5 (not true in positive characteristic, where any p is possible, see Example 3.3).
Examples of (5, q)-nets were unknown, and for (4, q)-nets the only example was the Hesse
arrangement. In [19], Stipins proved that (5, q)-nets do not exist over C, leaving open the
case p = 4. It is believed that the only (4, q)-net is the Hesse arrangement. In Section 4, we
present a classification for (3, q)-nets over C with q ≤ 6.
It is known that a q×q Latin square provides the combinatorial data which defines a (3, q)-
net (see [6], [17], [16], [19]). Until very recently, the only known (3, q)-nets corresponded
to Latin squares coming from multiplication tables of certain abelian groups. Yuzvinsky
conjectured in [22] that this should always be the case. In [19], there was given a three
dimensional family of (3, 5)-nets not coming from a group. For the case q = 6, we have
twelve possible cases associated to the twelve main classes of 6×6 Latin squares. In Section
4, we show that only nine of them are realizable in P2 over C. These nine cases present new
properties for 3-nets: we have four three dimensional and five two dimensional families, some
of them define nets strictly over C, for others we have nets over R or even over Q, etc. After
that, we construct a three dimensional family of (3, 8)-nets associated to the Quaternion
group, which has members defined over Q. The new cases corresponding to the symmetric
and Quaternion groups show that there are (3, q)-nets associated to non-abelian groups (see
[22, Conj. 6.1]). Out of this, a natural question is: find a combinatorial characterization of
the main classes of Latin squares (see Remark 3.1) which realize (3, q)-nets in P2
We denote the projective space of dimension n by Pn, and a point in it by [x1 : . . . :
xn+1] = [xi]
i=1 . If P1, . . . , Pr are r distinct points in P
n, then 〈P1, . . . , Pr〉 is the projective
linear space spanned by them. The points P1, . . . , Pn+2 in P
n are said to be in general
position if no n + 1 of them lie in a hyperplane.
Acknowledgments: I am grateful to Dave Anderson, my thesis advisor Igor Dolgachev,
Sean Keel, Finn Knudsen, Janis Stipins, and Jenia Tevelev for valuable discussions. I would
also like to acknowledge the referee for helping me to improve the exposition of this paper.
2. Arrangements of d lines in P2, and lines in Pd−2.
Definition 2.1. Let d ≥ 3 be an integer. An arrangement of d lines A is a set of d labeled
lines {L1, . . . , Ld} in P2 such that
i=1 Li = ∅.
When the labelling is not relevant, we will consider A as the plane curve
i=1 Li. We
introduce ordered pairs (A, P ), where A is an arrangement of d lines in P2, and P is a point
in P2 \A. If (A, P ) and (A′, P ′) are two such pairs, we say that they are isomorphic if there
exists an automorphism T of P2 such that T (Li) = L
i for every i, and T (P ) = P
′. Let Ld
be the set of isomorphism classes of pairs (A, P ). For example, clearly L3 is a set with only
one element, represented by the class of the pair
{{x = 0}, {y = 0}, {z = 0}}, [1 : 1 : 1]
On the other hand, let us fix d points in Pd−2 in general position. We precisely take
P1 = [1 : 0 : . . . : 0], P2 = [0 : 1 : 0 : . . . : 0], . . . , Pd−1 = [0 : . . . : 0 : 1], Pd = [1 : . . . : 1].
Consider the projective linear spaces
Λi1,...,ir = 〈Pj : j /∈ {i1, . . . , ir}〉,
where 1 ≤ r ≤ d − 1 and i1, . . . , ir are distinct numbers, and let Hd be the union of all
the hyperplanes Λi,j. Hence, Λi,j = {[x1 : . . . : xd−1] ∈ Pd−2 : xi = xj} for i, j 6= d,
Λi,d = {[x1 : . . . : xd−1] ∈ Pd−2 : xi = 0}, and
Hd = {[x1 : . . . : xd−1] ∈ Pd−2 : x1x2 · · ·xd−1
(xj − xi) = 0}.
The proof of the following proposition is inspired by a particular case of the so-called
Gelfand-MacPherson correspondence [15, Chap. 2].
Proposition 2.1. There is a one-to-one correspondence between Ld and the set of lines in
Pd−2 not contained in Hd.
Proof. Let us fix a pair (A, P ), where A is defined by the linear polynomials
Li(x, y, z) = ai,1x+ ai,2y + ai,3z, 1 ≤ i ≤ d.
Consider the embedding ι(A,P ) : P
2 →֒ Pd−1 given by
[x : y : z] 7→
[L1(x, y, z)
L1(P )
: . . . :
Ld(x, y, z)
Ld(P )
Then, ι(A,P )(P
2) is a projective plane, ι(A,P )(P ) = [1 : . . . : 1], and ι(A,P )(Li) = ι(A,P )(P
{yi = 0} for every i ∈ {1, 2, . . . , d}. We now consider the projection
̺ : Pd−1 \ [1 : . . . : 1] → Pd−2, [y1 : y2 : . . . : yd] 7→ [y1 − yd : y2 − yd : . . . : yd−1 − yd].
In this way, if Σi,j = {[y1 : y2 : . . . : yd] : yi = yj}, we see that ̺(Σi,j) = Λi,j. Therefore,
we have that ̺
ι(A,P )(P
is a line in Pd−2 not contained in Hd. To show the one-to-one
correspondence, we need to prove that (A, P ) 7→ ̺
ι(A,P )(P
gives a well-defined bijection
between Ld and the set of lines in P
d−2 not contained in Hd. Clearly we have a bijection
between projective planes in Pd−1 passing through [1 : . . . : 1] and not contained in
Σi,j ,
and the set of lines in Pd−2 not contained in Hd.
Let T : P2 → P2 be an automorphism of P2. Let B =
be the 3× 3 invertible matrix
corresponding to T−1. Consider the pair (A′, P ′) defined by A′ = {L′i = T (Li)}di=1 and
P ′ = T (P ). Then, the equations defining the lines L′i are
j=1 ai,jbj,1
j=1 ai,jbj,2
j=1 ai,jbj,3
z = 0.
Hence, we obtain that ι(A,P ) = ι(A′,P ′) ◦ T , and so our map (A, P ) 7→ ̺
ι(A,P )(P
is well-
defined on Ld.
It is clearly surjective, so we only need injectivity. Let ι(A,P ) and ι(A′,P ′) be the corre-
sponding maps for the pairs (A, P ) and (A′, P ′) such that ι(A,P )(P2) = ι(A′,P ′)(P2). Let
T = ι−1
(A′,P ′) ◦ ι(A,P ) : P2 → P2. Then, T is an automorphism of P2 such that T (Li) = L′i
for every i and T (P ) = P ′. Hence they are isomorphic, and so we have the one-to-one
correspondence. �
Remark 2.1. The following is a sketch of how this one-to-one correspondence works for
arrangements of sections on geometrically ruled surfaces (cf. [?, p. 369]) via the moduli
spaces M0,d+1 (cf. [14]). We will do it only for line arrangements, and over C. For the
general case see [21].
Let us fix a pair (A, P ) as before, and let BlP (P2) be the blow-up of P2 at the point P [?,
p. 386]. Then, we have an induced genus zero fibration BlP (P
2) → P1. The pull-back of A in
BlP (P
2) is an arrangement of d labeled sections, each of which belongs to the fix class E+F .
Here E is the exceptional divisor of the blow-up, and F is any fiber. Conversely, given an
arrangement of d sections A in BlP (P2) with members in the fix class E+F , we blow-down
the exceptional divisor E to obtain a pair (A, P ) in P2. Isomorphic pairs (A, P ) correspond
to isomorphic arrangements of sections (via automorphisms of the fibration BlP (P
2) → P1).
Now the correspondence. We have fixed the pair (A, P ), and the fibration BlP (P2) → P1
as given above. Consider the genus zero fibration f : R → P1, where R is the blow-up at
all the singular points of A in BlP (P2) except nodes. Then, f is a family of (d+ 1)-marked
stable curves of genus zero. The markings are given by the labeled lines of A, which are
now d labeled sections of f , and the (−1)-curve coming from the exceptional divisor E in
BlP (P
2). Therefore, since M0,d+1 is a fine moduli space, we have the following commutative
diagram coming from its universal family.
// M 0,d+2
// M 0,d+1
Let B′ be the image of g in M 0,d+1. It is a projective curve, since f has singular and
non-singular fibers, and so f is not isotrivial. Let us now consider the Kapranov map
ψd+1 : M 0,d+1 → Pd−2 [15, p. 81], and let B = ψd+1(B′). Because of the geometry of the
fibers of f and the Kapranov’s construction, one can prove (see [21]) that B intersects all the
hyperplanes Λi,j transversally. Say B intersects Λi,j. This means that the lines Li and Lj of
A intersect in P2. But since they are lines, they can only intersect at one point. Therefore,
we must have B.Λi,j = 1, and so deg(B) = 1, that is, B is a line in P
d−2. Observe that this
line is outside of Hd.
In particular, B′ is a smooth rational curve. It is not hard to see the converse, this is,
how to obtain a pair (A, P ) from a line in Pd−2 outside of Hd (see [21]). Moreover, one can
check that the pair we obtain is unique up to isomorphism of pairs. In this way, to prove
the one-to-one correspondence, we have to show that the map g is an inclusion.
Assume deg(g) > 1. Notice that g is totally ramified at the points corresponding to
singular fibers of f , since again they come from intersections of lines in P2, and so all the
singular fibers have distinct points as images in B′. Let sing(f) be the set of points in P1
corresponding to singular fibers of f . Then, since
i=1 Li = ∅, we have | sing(f)| ≥ 3 (at
least we have a triangle in A). Now, by the Riemann-Hurwitz formula, we have
−2 = deg(g)(−2) + (deg(g)− 1)| sing(f)|+ ǫ
where ǫ ≥ 0 stands for the contribution from ramification of f not in sing(f). But we
re-write the equation as 0 = (deg(g) − 1)(| sing(f)| − 2) + ǫ, and since deg(g) > 1 and
| sing(f)| ≥ 3, this is a contradiction. Therefore, deg(g) = 1 and we have proved the one-
to-one correspondence. Again, we refer to [21, Ch. 2 and 3] for the general one-to-one
correspondence involving arrangements of sections on geometrically ruled surfaces.
In this way, for each pair (A, P ) ∈ Ld, we denote its corresponding line in Pd−2 by L(A, P ).
We now want to describe more precisely how this one-to-one correspondence relates them.
Definition 2.2. Let K be any field. The pair (A, P ) is said to be defined over K if the
coefficients of the equations defining the lines in A, and the coordinates of P are in K.
Hence, for arbitrary fields K, Proposition 2.1 gives a one-to-one correspondence between
pairs (A, P ) defined over K, and lines L(A, P ) in Pd−2 defined over K.
Definition 2.3. Let 1 < k < d be an integer. A point in P2 is said to be a k-point of A if
it belongs to exactly k lines of A. If these lines are {Li1 , Li2 , . . . , Lik}, we denote this point
by [[i1, i2, . . . , ik]]. The number of k-points of A is denoted by tk.
Remark 2.2. The complexity of an arrangement relies on its k-points. There are more
constraints for the existence of an arrangement, over some field, than the plane restriction:
any two lines intersect at one point. Combinatorially there are possible line arrangements,
with assigned k-points, which may not be realizable in P2 over C (we will return to this in the
next sections, for the particular case of nets). For instance, we have the Fano arrangement
(formed by seven lines with seven 3-points) which is not realizable in P2 over fields of
characteristic 6= 2. A rather trivial restriction, which is purely combinatorial, is that the
numbers tk must satisfy
tk; this is the only linear relation they satisfy for
a fix d. In [13], Hirzebruch proved the following inequality for an arrangement of d lines in
the complex projective plane having td = td−1 = 0,
t3 ≥ d+
(k − 4)tk.
This is a non-trivial relation among the numbers tk, which comes from the Miyaoka-Yau
inequality for complex algebraic surfaces (see [21] for more about this type of restrictions).
This inequality is clearly not true in positive characteristic.
Let us fix a pair (A, P ), and its line L(A, P ) in Pd−2. Let λ be a line in P2 passing through
P . Notice that λ corresponds to a point in L(A, P ). Let K(λ) be the set of k-points of A
in λ, for all 1 < k < d; it might be empty or consist of several points. We write
K(λ) = {[[i1, i2, . . . , ik1 ]], [[j1, j2, . . . , jk2]], . . .}.
Example 2.1. In Figure 1, we have the complete quadrilateral A, formed by the set of
lines {L1, . . . , L6}, and a point P outside of A. Through P we have all the λ lines. In
the figure, we have named two such lines: λ and λ′. Thus, K(λ) = {[[3, 6]], [[1, 4]]} and
K(λ′) = {[[1, 2, 3]]}.
[[1,4]][[3,6]]
[[1,2,3]]
Figure 1. Some K(λ) sets for the pair (Complete quadrilateral, P ).
The set K(λ) imposes the following constraints for the the point [x1 : x2 : . . . : xd−1] in
L(A, P ) corresponding to λ. For each k-point [[i1, i2, . . . , ik]] in K(λ), we have:
• If for some j, ij = d, then xil = 0 for all il 6= d.
• Otherwise, xi1 = xi2 = . . . = xik 6= 0.
For [[i1, . . . , ik1]], [[j1, . . . , jk2]] in K(λ), we have that xia 6= xjb, otherwise [[i1, . . . , ik1 ]] and
[[j1, . . . , jk2]] would not be distinct points in λ. We will work out various examples when we
compute nets in the next sections.
Assume we know the combinatorial data of (A, P ), but we do not know whether is realiz-
able in P2 over some field K. Then, this realization question is equivalent to the realization
question of L(A, P ) over K. If we are only interested in the line arrangement A, the point
P introduces unnecessary dimensions which makes the realization question harder. Instead,
we consider the new pair (A′, P ′), where P ′ ∈ A and the lines of A′ are the lines in A
not containing P ′ in a certain order. Now, the line L(A′, P ′) corresponding to (A′, P ′) is
in Pd
′−2, and d′ < d. So we have less dimensions to work with, and L(A′, P ′) completely
represents our arrangement A, by keeping track of P ′.
If we take P ′ as a k-point with k large, the previous observation will be important to
simplify computations to prove or disprove the realization of A. In addition, we find a
moduli space for the combinatorial type of A, forgetting the artificial point P . Again, by
combinatorial type we mean the data given by some of the intersection of its lines.
In the next two sections, we will compute some special configurations by means of the
line L(A, P ). We make the following choices to write down equations for the lines in A:
• The point P will be always [0 : 0 : 1].
• The arrangement A will be formed by {L1, . . . , Ld}, where Li are the lines of A,
and also their linear polynomials Li(x, y, z) = (aix + biy + z) for every i 6= d, and
Ld = (z).
With these assumptions, it is easy to check that the corresponding line L(A, P ) in Pd−2
is [ait + biu]
i=1 , where [t : u] ∈ P1.
3. (p, q)-nets in P2.
We now introduce a specific type of line arrangements in P2 which are called nets. Our
main references are [6], [17], [22], [16], [19], and [20]. We begin with the definition of a net
taken from [19].
Definition 3.1. Let p ≥ 3 be an integer. A p-net in P2 is a (p + 1)-tuple (A1, ...,Ap,X ),
where each Ai is a nonempty finite set of lines of P2 and X is a finite set of points of P2,
satisfying the following conditions:
(1) The Ai are pairwise disjoint.
(2) The intersection point of any line in Ai with any line in Aj belongs to X for i 6= j.
(3) Through every point in X there passes exactly one line of each Ai.
One can prove that |Ai| = |Aj| for every i, j and |X | = |A1|2 (see [19], [22]). Let us
denote |Aj| by q, this is the degree of the net. Thus, if we use classical notation (see for
example [7] or [10]), a p-net of degree q is a (q2p, pqq) configuration. Following [19] and [22],
we denote a p-net of degree q by (p, q)-net. We label the lines of Ai by {Lq(i−1)+j}qj=1 for
all i, and define the arrangement A = {L1, L2, ..., Lpq}. We assume q ≥ 2 to get rid of the
trivial arrangement, which is actually not considered in Definition 2.1.
Assume for now that we work over an algebraically closed field K. A (p, q)-net A =
(A1, ...,Ap,X ) defines a unique pencil of curves P(A) of degree q as follows. Take any two
sets of lines Ai and Aj. Consider Ai and Aj as the equations which define them, i.e., the
multiplication of its lines. Then, the pencil is defined as
P(A) = {uAi + tAj : [u : t] ∈ P1}.
This is well-defined. Take Ak with k 6= i, j, and a point Q in Ak \ X . Then, there exists
[u : t] ∈ P1 such that uAi(Q) + tAj(Q) = 0. We write B = uAi + tAj, which is a curve
of degree q containing X ∪ {Q}. If Ak and B do not have common factors, we have, by
Bezout’s Theorem, that Ak belongs to P(A) (Ak is B times a non-zero constant). This
proves the independence of the choice of i, j to define P(A). If Ak and B have a non-trivial
common factor, then it has to be formed by the multiplication of 0 < q1 < q lines in Ak.
In this way, this common factor C contains exactly qq1 points of X . Therefore, if B = CF
and Ak = CG, the set {F = 0} ∩ {G = 0} has at least q(q − q1) > (q − q1)2 points,
deg(F ) = deg(G) = q − q1, and gcd(F,G) = 1. This is impossible by Bezout’s Theorem.
In addition, if the characteristic of K is zero, the general member of this pencil is smooth
[?, p. 272], i.e., outside of finitely many points in P1, uAi + tAj is a smooth plane curve.
Hence, after we blow up the q2 points in X we obtain a fibration of curves of genus (q−1)(q−2)
with at least p completely reducible fibers. This fibration leads to the following restriction
on nets defined over C, due to Yuzvinsky [22] (see [18] for the higher dimensional analogue).
The proof is a simple topological argument which uses the topological Euler characteristic
of the fibration.
Proposition 3.1. For an arbitrary (p, q)-net in P2 defined over C, the only possible values
for (p, q) are: (p = 3, q ≥ 2), (p = 4, q ≥ 3) and (p = 5, q ≥ 6).
The combinatorial data which defines (p, q)-nets can be expressed using Latin squares. A
Latin square is a q × q table filled with q different symbols (in our case numbers from 1 to
q) in such a way that each symbol occurs exactly once in each row and exactly once in each
column. They are the multiplication tables of finite quasigroups. Let A = (A1, ...,Ap,X )
be a (p, q)-net. The q2 p-points in X are determined by (p− 2) q × q Latin squares which
form an orthogonal set, as explained for example in [19] or [16].
Although we have defined nets as arrangements of lines already in P2, we will first “think
combinatorially” about the (p, q)-net through this orthogonal set of (p− 2) Latin squares,
and then we will attempt to prove or disprove its realization on P2 over some field. This is
the strategy from now on.
Example 3.1. In this example we use our correspondence to reprove the existence of the
famous Hesse arrangement. This (4, 3)-net has nice applications in algebraic geometry (see
for example [13, 2, 21]). Let us denote this net by A = A1 ∪ A2 ∪ A3 ∪ A4, with Ai =
{L3i−2, L3i−1, L3i}. By relabelling the lines of A, we may assume that the combinatorial
data is given by the following set of orthogonal Latin squares.
1 2 3
2 3 1
3 1 2
1 2 3
3 1 2
2 3 1
These Latin squares give the intersections of A3 and A4 respectively with A1 (columns)
and A2 (rows) (see [16] or [19]). For example, the left one tell us that L2, L6 and L7 (values)
have a common point of incidence. The right one says L2, L6 and L12 have also non-empty
intersection. Hence, [[2, 6, 7, 12]] ∈ X . In this way, we find X , which is completely described
in the following tables.
L1 L2 L3
L4 L7 L8 L9
L5 L8 L9 L7
L6 L9 L7 L8
L1 L2 L3
L4 L10 L11 L12
L5 L12 L10 L11
L6 L11 L12 L10
We now consider the new arrangement of lines A′ = A \ {L3, L4, L9, L12} together with
the point P = [[3, 4, 9, 12]]. We rename the twelve lines in the following way: A′ = {L′1 =
L1, L
2 = L2, L
3 = L5, L
4 = L6, L
5 = L7, L
6 = L8, L
7 = L10, L
8 = L11} and the lines passing
through P , α = L3, β = L4, γ = L9, and δ = L12. By our correspondence, we have a line
L(A′, P ) in P6 for the pair (A′, P ), and it passes through these distinguished four points α,
β, γ, and δ (we abuse the notation, these lines correspond to points in L(A′, P )). Then,
K(α) = {[[4, 6, 7]], [[3, 5, 8]]}, K(β) = {[[2, 6, 8]], [[1, 5, 7]]}, K(γ) = {[[2, 3, 7]], [[1, 4, 8]]}, and
K(δ) = {[[2, 4, 5]], [[1, 3, 6]]}. Hence, we write:
α = [a1 : a2 : 0 : 1 : 0 : 1 : 1], β = [1 : 0 : a3 : a4 : 1 : 0 : 1]
γ = [0 : 1 : 1 : 0 : a5 : a6 : 1], δ = [1 : a7 : 1 : a7 : a7 : 1 : a8]
for some numbers ai (with extra restrictions), and we take L(A′, P ) : αt+ βu, [t : u] ∈ P1.
For some [t : u], we have the equation αt+ βu = γ, and from this we obtain:
w − 1
1− w a4 =
w − 1 a5 = 1− w a6 = w,
where w is a parameter.
For another pair [t : u], we have αt + βu = δ, and so w2 − w + 1 = 0, a7 = 1w , and
. Therefore, our field of definition needs to have roots for the equation w2−w+1.
For instance, over C, we take w = e
3 , and then L(A′, P ) is:
[w − 1 : 1 : 0 : w : 0 : w : w]t+ [w − 1 : 0 : −1 : w : w − 1 : 0 : w − 1]u.
According to our choices at the end of Section 2, we write down the lines of A as:
{L1 = ((w−1)x+y+z), L2 = (x+z), L3 = (y)} {L4 = (x), L5 = (−y+z), L6 = (wx+wy+z)}
{L7 = (y+w2z), L8 = (wx+z), L9 = (wx+y)} {L10 = (x+wy+
z), L11 = (z), L12 = (x−wy)}.
Notice that the lines in A corresponding to α, β, γ, and δ are ux− ty = 0, where [t : u] is
the corresponding point in P1 for each of them, as points in L(A′, P ).
Example 3.2. In this example, we show that there are no (4, 4)-nets over fields of char-
acteristic 6= 2. This fact has independently been shown in [8] over C. We start supposing
their existence, let A = {Ai}4i=1 be such a net. Again, by relabelling the lines of A, we may
assume that the orthogonal set of Latin squares is:
1 2 3 4
2 1 4 3
3 4 1 2
4 3 2 1
1 2 3 4
3 4 1 2
4 3 2 1
2 1 4 3
We consider (A′, P ) defined by the arrangement of twelve lines A′ = A\{L4, L5, L12, L16},
and the point P = [[4, 5, 12, 16]]. The lines of A′ are L′1 = L1, L′2 = L2, L′3 = L3, L′4 = L6,
L′5 = L7, L
6 = L8, L
7 = L9, L
8 = L10, L
9 = L11, L
10 = L13, L
11 = L14, and L
12 = L15. The
special lines are α = L4, β = L5, γ = L12, and δ = L16. Hence, we have that
α = [a1 : a2 : a3 : 1 : a4 : 0 : 0 : a4 : 1 : a4 : 1] β = [1 : b4 : 0 : b1 : b2 : b3 : 1 : b4 : 0 : 1 : b4]
γ = [1 : 0 : c4 : c4 : 0 : 1 : c1 : c2 : c3 : c4 : 1] δ = [d1 : 1 : d4 : 1 : d1 : d4, 1 : d4 : d1 : d2 : d3]
as points in L(A′, P ), which we write as αt+ βu, [t : u] ∈ P1. Let c1 = a, c2 = b and c3 = c.
Since γ ∈ L(A′, P ), we have:
a+ b+ c− 1
b+ c− 1
c4 = a+ b+ c− 1
a + b− 1
1− b− c
Also, since δ ∈ L(A′, P ), have ad4 = 1 and ad1 + b = 1, plus the following equations:
(1) : d1(1 − c) = 1 − b − c, (2) : d1(1 − b)(c − 1) + d1c(1 − c) = (1 − b)c, (3) : (1 − b)(1 +
d4(b + c − 1)) = d4c, and (4) : c2 = (1 − b)(b + c − 1) among others. These equations are
enough to produce a contradiction. By isolating d1 in (1), replacing it in (2), and using (4),
we get c3 = (1− b)3 which requires a 3rd primitive root of 1. Say w is such, so b = 1− wc.
Then, by (3), we get w2(1 + 2c) = w − 1. Since the characteristic of our field is not 2, we
have c = 1
, and so b = 0, and a = 1. This gives a1 = 0, which is a contradiction, because
it would imply that L1 ∩ L4 ∩ L8 ∩ L9 ∩ L15 6= ∅. See next example for the char. 2 case.
Example 3.3. Positive characteristic gives more freedom for the realization of nets com-
pared to Proposition 3.1, and the previous examples. Let q be a prime number, and let K
be a field with m = qn elements. In P2
, we have m2 + m + 1 points with coordinates in
K, and there are m2 + m + 1 lines such that through each of these points passes exactly
m + 1 of these lines, and each of these lines contains exactly m + 1 of these points [12,
p. 65]. By eliminating one of these lines, we obtain a (m + 1, m)-net. Each of the m + 1
members of this net has m lines intersecting at one point, and so tm = m + 1, tm+1 = m
and tk = 0 otherwise. Hence, in positive characteristic, there are p-nets for all p ≥ 3. If we
want a (4, 4)-net, one takes q = 2 (necessary by Example 3.2) and n = 2, and considers the
corresponding (5, 4)-net. We now eliminate one of its members to obtain a (4, 4)-net.
In [20], Stipins proves that there are no 5-nets over C (see [23] for a generalization of
his result). His proof does not use the combinatorics given by Latin squares. We will see
that this issue matters for the realization of (3, 6)-nets, and so, it would be interesting to
know if Latin squares are relevant or not for the possible realization of 4-nets over C. It is
believed that, except for the Hesse arrangement, (4, q)-nets do not exist over C. In this way,
by Proposition 3.1, the only cases left over C would be 3-nets. In [22], it is proved that for
every finite subgroup H of a smooth elliptic curve, there exists a 3-net over C corresponding
to the Latin square of the multiplication table of H . In the same paper, the author proves
that there are no (3, 8)-nets associated to the group Z/2Z ⊕ Z/2Z ⊕ Z/2Z. In [19], it can
be found a classification of (3, q)-nets for q ≤ 5. In the next section we classify (3, q)-nets
for q ≤ 6, and the (3, 8)-nets associated to the Quaternion group.
Remark 3.1. (Main classes of Latin squares) As we explained before, a q × q Latin
square gives the set X for a (3, q)-net A = {A1,A2,A3}. What if we are interested only in
the realization of A in P2 as a curve, i.e., without labelling lines? Then, we divide the set
of all q × q Latin squares into the so-called main classes (see [6] or [16]).
For a given q× q Latin square M corresponding to A, by rearranging rows, columns and
symbols ofM , we obtain a new labelling for the lines in each Ai. If we writeM in its orthog-
onal array representation, i.e. M = {(r, c, s) : r = row number, c = column number, s =
symbol number}, we can perform six operations on M , each of them a permutation of
(r, c, s) which translates into relabelling the members {A1,A2,A3}, and so we obtain the
same curve in P2. We can partition the set of all q × q Latin squares in main classes (also
called Species) which means: if M,N belong to the same class, then we can obtain N by
applying a finite number of the above operations to M . In what follows, we will choose one
member from each class. The following table shows the number of main classes for small q.
q 1 2 3 4 5 6 7 8 9 10
# main classes 1 1 1 2 2 12 147 283 657 19 270 853 541 34 817 397 894 749 939
4. Classification of (3, q)-nets for q ≤ 6, and the Quaternion nets.
In order to do this classification, we use again the trick of eliminating some lines passing
through a k-point P , and considering the new pair (A′, P ). We work with (3, q)-nets, thus
P is taken as a 3-point in X (and so, we eliminate three lines from A). If the (3, q)-net is
given by A = {A1,A2,A3} such that Ai = {Lq(i−1)+j}qj=1, then the new pair (A′, P ) will
be given by A′ = {L′1 = L2, L′2 = L3, . . . , L′q−1 = Lq, L′q = Lq+2, L′q+1 = Lq+3, . . . , L′2q−2 =
L2q, L
2q−1 = L2q+2, L
2q = L2q+3, . . . , L
3q−3 = L3q}, P = L1 ∩ Lq+1 ∩ L2q+1, and α = L1,
β = Lq+1, γ = L2q+1. The corresponding line L(A′, P ) is αt+ βu, [t, u] ∈ P1.
We obtain X from a given Latin square. Then, we fix a point P in X , so the locus of the
line L(A′, P ) is actually the moduli space of the (3, q)-nets with combinatorial data defined
by that Latin square (or better its main class). We give in each case equations for the lines
of the nets depending on parameters coming from L(A′, P ).
(3, 2)-nets.
Here we have one main class given by the multiplication table of Z/2Z: 1 2
. According
to our set up, (A′, P ) is formed by an arrangement A′ of three lines and P = [[1, 3, 5]] ∈ X .
The line L(A′, P ) is actually the whole P1. This tells us that there is only one (3, 2)-net,
up to projective equivalence. The special points are α = [1 : 0], β = [0 : 1], and γ = [1 : 1].
This (3, 2)-net is represented by the singular members of the pencil λz(x−y)+µy(z−x) = 0
on P2, and it is called complete quadrilateral (see Figure 1).
(3, 3)-nets.
Again, there is one main class given by the multiplication table of Z/3Z.
1 2 3
3 1 2
2 3 1
For (A′, P ) we have an arrangement of six linesA′ and P = [[1, 4, 7]] ∈ X , the line L(A′, P )
is in P4. The special points can be taken as α = [a1 : a2 : 1 : 0 : 1], β = [1 : 0 : b1 : b2 : 1], and
γ = [1 : c1 : c1 : 1 : c2]. Then, for some [t : u] ∈ P1, we have αt + βu = γ. Thus, if a2 = a,
b2 = b and c1 = c, we have that α =
a(b−1)
: a : 1 : 0 : 1
and β =
1 : 0 :
bc(a−1)
: b : 1
. The
rest of the points in X ′ (again, although A′ is not a net, we think of X ′ as the set of 3-points
in A′ coming from X ) [[1, 3, 6]] and [[2, 4, 5]] give the same restriction (a − 1)(b − 1) = 1,
i.e., a = b
b−1 . Therefore, the line L(A
′, P ) has two parameters of freedom, and it is given by
b−1 : 1 : 0 : 1
t + [1 : 0 : c : b : 1]u where c, b are numbers with some restrictions (for
example, c, b 6= 0 or 1). Hence, we find that this family of (3, 3)-nets can be represented by:
L1 = (y), L2 = (
x + y + z), L3 = (
b−1x+ z), L4 = (x), L5 = (x+ cy + z), L6 = (by + z),
L7 = (x+ c(1− b)y), L8 = (x+ y + z), and L9 = (z).
(3, 4)-nets.
Here we have two main classes. We represent them by the following Latin squares.
1 2 3 4
2 3 4 1
3 4 1 2
4 1 2 3
1 2 3 4
2 1 4 3
3 4 1 2
4 3 2 1
They correspond to Z/4Z and Z/2Z ⊕ Z/2Z respectively. We first deal with M1. Then,
we have α = [a1 : a2 : a3 : 1 : a4 : 0 : 1 : a4], β = [1 : b1 : 0 : b2 : b3 : b4 : 1 : b1]
and γ = [1 : c1 : c2 : c2 : c1 : 1 : c3 : c4]. Let a3 = a, b4 = b and c2 = c. By imposing
γ to L(A′, P ), one can find a1 = (−1+b)abc , a2 =
(−b1+c1b)a
, a4 =
(−b1+c4b)a
, b2 =
−(c−c2a)b
b3 = b1 − c4b + c1b, and c3 = 1b +
. When we impose L(A′, P ) to pass through [[1, 5, 9]],
[[2, 4, 9]] and [[1, 4, 8]], we obtain equations to solve for c4, c1, and b1 respectively. After that,
the restrictions [[2, 6, 7]], [[3, 5, 7]], and [[3, 6, 8]] are trivially satisfied. The line L(A′, P ) is
parametrized by (a, b, c) in a open set of A3, and it is given by: a1 =
a(b−1)
, a2 =
abc+ab−a−bc ,
a3 = a, a4 =
a2(b−1)
abc+ab−a−bc , b1 =
b2(a−1)c
abc+ab−a−bc , b2 =
bc(a−1)
, b3 =
abc+ab−a−bc and b4 = b.
Similarly, for M2 we have α = [a1 : a2 : a3 : 1 : a4 : 0 : 1 : a4], β = [1 : b1 : 0 : b2 : b3 : b4 :
1 : b1], and γ = [1 : c1 : c2 : 1 : c1 : c2 : c3 : c4]. Of course, the only change with respect to the
previous case is γ. By doing similar computations, we have that L(A′, P ) is parametrized
by (a, b, c) in a open set of A3, and it is given by: a1 =
(b−c)a
, a2 =
abc+ab−bc−ac , a3 = a,
a2(b−c)
abc+ab−bc−ac , b1 =
b2(a−c)
abc+ab−bc−ac , b2 =
b(a−c)
, b3 =
abc+ab−bc−ac , and b4 = b (see [19, p. 11]
for more information about this net).
Hence, the lines for the corresponding (3, 4)-nets for Mr can be represented by: L1 = (y),
L2 = (a1x + y + z), L3 = (a2x + b1y + z), L4 = (a3x + z), L5 = (x), L6 = (x + b2y + z),
L7 = (a4x+b3y+z), L8 = (b4y+z), L9 = (ax−bc2−ry), L10 = (x+y+z), L11 = (a4x+b1y+z),
and L12 = (z). For example, if we evaluate the equations for the cyclic type M1 at a =
b = 1−i
, and c = −i (where i =
−1), we obtain the well-known net: A1 = {y, (1+i)x+2y+
2z, (1+ i)x+y+2z, (1+ i)x+2z}, A2 = {x, 2x+(1− i)y+2z, x+(1− i)y+2z, (1− i)y+2z}
and A3 = {x + y, x + y + z, x + y + 2z, z}. This net is projectively equivalent of the one
given by the plane curve (x4 − y4)(y4 − z4)(x4 − z4) = 0, known as CEVA(4) [7, p. 435].
(3, 5)-nets.
We have two main classes, and we represent them by the following Latin squares.
1 2 3 4 5
2 3 4 5 1
3 4 5 1 2
4 5 1 2 3
5 1 2 3 4
1 2 3 4 5
2 1 4 5 3
3 5 1 2 4
4 3 5 1 2
5 4 2 3 1
The Latin square M1 corresponds to Z/5Z. As before, for M1 and M2 we have that
α = [a1 : a2 : a3 : a4 : 1 : a5 : a6 : 0 : 1 : a5 : a6] and β = [1 : b1 : b2 : 0 : b3 : b4 : b5 :
b6 : 1 : b1 : b2], but for M1, γ = [1 : c1 : c2 : c3 : c3 : c2 : c1 : 1 : c4 : c5 : c6], and for M2,
γ = [1 : c1 : c2 : c3 : 1 : c1 : c2 : c3 : c4 : c5 : c6].
In the case ofM1, after we impose γ to L(A′, P ), we use the conditions [[2, 3, 5]], [[4, 8, 11]],
[[2, 6, 12]], [[2, 8, 9]], and [[3, 8, 10]] to solve for b2, c6, c5, b1, and c2 respectively. After that
we have four parameters left: a4 = a, b6 = b, c3 = c, and c1 = d, and we get the following
constrain for them:
b2(a−1)(d−c)(c−ad)+b(−d2a+dc+2d2a2−2da2c−da+ca−dc2+c2da)+ad(ca−da+1−c) = 0.
Hence, the (3, 5)-nets for M1 are parametrized by an open set of the hypersurface in A
defined by this equation. The values for the variables are:
a(b− 1)
ab(d− 1)
a− ba + bc a3 =
a(d− db+ bc)
c2(a− 1)b a4 = a
a2(d− 1)(d− db+ bc)
(a− ba + bc)(a− 1)cd a6 =
ad(b− 1)
b(da− adb+ bc)
a− ba + bc b2 =
d− db+ bc
bc(a− 1)
(da− adb+ bc)(d− db+ bc)a
(a− ba + bc)(a− 1)cd b5 = d b6 = b.
In the case of M2, we obtain a three dimensional moduli space of (3, 5)-nets as well. It is
parametrized by (a, b, c) in an open set of A3 such that a4 = a, b6 = b and c1 = c, and:
a2(1− b)
b(ab− a− b) a2 = c a3 =
(−a2 + a2b+ cba− ab− cb)b
(ab+ cb− a− b)(ab− a− b) a4 = a
(a2 − a2b− cba + ab+ cb)a
(ab− a− b)2 a6 =
c(b− 1)a
−a + ab+ cb− b
cb2(1− a)
a(ab− a− b) b2 =
(a− ab+ b− c)b2
(−a + ab+ cb− b)(ab− a− b)
b2(1− a)
a(ab− a− b) b4 =
ab(ab − a− b+ c)
(ab− a− b)2 b5 =
cb(a+ b− ab)
a(ab− a+ bc− b) b6 = b.
To obtain the lines for the nets corresponding to Mr, we just evaluate: L1 = (y), L2 =
(a1x + y + z), L3 = (a2x+ b1y + z), L4 = (a3x+ b2y + z), L5 = (a4x+ z), L6 = (x), L7 =
(x+b3y+z), L8 = (a5x+b4y+z), L9 = (a6x+b5y+z), L10 = (b6y+z), L11 = (ax−bc2−ry),
L12 = (x + y + z), L13 = (a5x + b1y + z), L14 = (a6x + b2y + z), and L15 = (z). These
two 3 dimensional families of (3, 5)-nets appear in [19]. We notice that both families of
(3, 5)-nets have members defined over Q. For the case M1, we can make b
2 disappear from
the equation by declaring c = ad (the relations a = 1 and d = c are not allowed). Then,
b = 2da−1−da
2da−2da2−1+a−d2a+d2a2 , and it can be checked that for suitable a, d ∈ Z the conditions for
being (3, 5)-net are satisfied.
(3, 6)-nets.
We have twelve main classes of Latin squares to check. The following is a list showing
one member of each class. It was taken from [6, pp. 129-137].
1 2 3 4 5 6
2 3 4 5 6 1
3 4 5 6 1 2
4 5 6 1 2 3
5 6 1 2 3 4
6 1 2 3 4 5
1 2 3 4 5 6
2 1 5 6 3 4
3 6 1 5 4 2
4 5 6 1 2 3
5 4 2 3 6 1
6 3 4 2 1 5
1 2 3 4 5 6
2 3 1 5 6 4
3 1 2 6 4 5
4 6 5 2 1 3
5 4 6 3 2 1
6 5 4 1 3 2
1 2 3 4 5 6
2 1 4 3 6 5
3 4 5 6 1 2
4 3 6 5 2 1
5 6 1 2 4 3
6 5 2 1 3 4
1 2 3 4 5 6
2 1 4 3 6 5
3 4 5 6 1 2
4 3 6 5 2 1
5 6 2 1 4 3
6 5 1 2 3 4
1 2 3 4 5 6
2 1 4 5 6 3
3 6 2 1 4 5
4 5 6 2 3 1
5 3 1 6 2 4
6 4 5 3 1 2
1 2 3 4 5 6
2 1 4 3 6 5
3 5 1 6 4 2
4 6 5 1 2 3
5 3 6 2 1 4
6 4 2 5 3 1
1 2 3 4 5 6
2 1 6 5 3 4
3 6 1 2 4 5
4 5 2 1 6 3
5 3 4 6 1 2
6 4 5 3 2 1
1 2 3 4 5 6
2 3 1 6 4 5
3 1 2 5 6 4
4 6 5 1 2 3
5 4 6 2 3 1
6 5 4 3 1 2
M10 =
1 2 3 4 5 6
2 1 6 5 4 3
3 5 1 2 6 4
4 6 2 1 3 5
5 3 4 6 2 1
6 4 5 3 1 2
M11 =
1 2 3 4 5 6
2 1 4 5 6 3
3 4 2 6 1 5
4 5 6 2 3 1
5 6 1 3 2 4
6 3 5 1 4 2
M12 =
1 2 3 4 5 6
2 1 5 6 4 3
3 5 4 2 6 1
4 6 2 3 1 5
5 4 6 1 3 2
6 3 1 5 2 4
The Latin squares M1 and M2 correspond to the multiplication table of the groups Z/6Z
and S3, respectively. The following is the set up for the analysis of (3, 6)-nets. We first fix one
Latin square M from the list above. Let A = {A1,A2,A3} be the corresponding (possible)
(3, 6)-net, where A1 = {L1, . . . , L6}, A2 = {L7, . . . , L12},, and A3 = {L13, . . . , L18}. As
before, we consider a new arrangement A′ together with a point P such that A′ = A \
{L1, L7, L13}, and P = [[1, 7, 13]] ∈ X . We label the lines of A′ from 1 to 15 following the
order of A, i.e., L′1 = L2, . . . , L′5 = L6, L′6 = L8, etc, eliminating L1, L7, and L13. Let
L(A′, P ) be the line in P13 for (A′, P ). The special lines (or points of L(A′, P )) α = L1,
β = L7, and γ = L13 are as α = [a1 : a2 : a3 : a4 : a5 : 1 : a6 : a7 : a8 : 0 : 1 : a6 : a7 : a8],
β = [1 : b1 : b2 : b3 : 0 : b4 : b5 : b6 : b7 : b8 : 1 : b1 : b2 : b3], and γ = γ(c1, c2, ..., c8) depending
on M . Since there is [t, u] ∈ P1 satisfying αt + βu = γ, we can and do write a1, a2, a3, a4,
a6, a7, a8, b4, b5, b6, and c5 with respect to the rest of the variables.
After that, we start imposing the points in X ′ which translates, as before, into 2 × 2
determinants equal to zero. At this stage we have 20 equations given by these determinants,
and 12 variables. We choose appropriately from them to isolate variables so that they appear
with exponent 1. In the way of solving these equations, we prove or disprove realization for
A. When the (3, 6)-net exists, i.e. A is realizable in P2 over some field, the equations for its
lines can be taken as: L1 = (y), L2 = (a1x+y+z), L3 = (a2x+b1y+z), L4 = (a3x+b2y+z),
L5 = (a4x + b3y + z), L6 = (a5x + z), L7 = (x), L8 = (x + b4y + z), L9 = (a6x + b5y + z),
L10 = (a7x+b6y+z), L11 = (a8x+b7y+z), L12 = (b8y+z), L13 = (ux−ty), L14 = (x+y+z),
L15 = (a6x + b1y + z), L16 = (a7x + b2y + z), L17 = (a8x + b3y + z), and L18 = (z), where
[t, u] satisfies αt+ βu = γ.
Now we apply this procedure case by case. We first give the result, after that we indicate
the order we solve the equations coming from the points in X ′, and then we give a moduli
parametrization whenever the net exits. For simplicity, we work always in characteristic
zero. We often omit the final expressions for the variables, although they can be given
explicitly.
M1: (Z/6Z) This gives a three dimensional moduli space. We have that some of these
nets can be defined over R. We solve the determinants in the following order: [[4, 6, 15]]
solve for c3, [[5, 10, 14]] solve for c8, [[1, 9, 15]] solve for c1, [[5, 9, 13]] solve for c7, [[3, 10, 12]]
solve for c6, [[2, 10, 11]] solve for b3, [[3, 9, 11]] solve for c2, and [[2, 8, 15]] solve for b2. If
a5 = a, b1 = d, b8 = b, and c4 = c, then they must satisfy:
c2(−1+ a)b4(a2− a2b+ cab+ ab− 2a+ ca− bc)− b2c(2c2b2+5cab+4a2b2c− 4ca2b− 2b2a3−
a2c+3a2−2a3−5a2b+ca3+2a2b2−bc2a+4ba3−4ac2b2−3ab2c−a3b2c+c2ba2+2a2b2c2)d+
(bc+ a− ab)(a2b2c2 + c2b2 − 2ac2b2 + a2b2c− ab2c+ 2cab− ca2b+ a2b2 − 2a2b+ a2)d2 = 0.
So, the moduli space for these nets is an open set of this hypersurface.
M2: (S3) This gives a three dimensional moduli space parametrized by an open set of
A3. It does not contains (3, 6)-nets defined over R. The reason is that we need the square
root of −1 to define the nets. Moreover, all of them have extra 3-points, apart from the
ones coming from X . The order we take is: [[5, 10, 14]] solve for c8, [[2, 6, 15]] solve for c1,
[[1, 10, 13]] solve for c7, [[1, 9, 12]] solve for c6, [[2, 10, 11]] solve for b1, [[5, 6, 12]] solve for c3,
[[1, 8, 15]] solve for b2, [[1, 7, 14]] solve for b8, and [[2, 8, 14]] solve for c2. If i =
−1, a5 = a,
b3 = e and c4 = c, then the expressions for the variables are:
3)(2c+e−i
2aec−ac−ce−ice
3+ae+iae
3+ica
(−1+i
3)(ae−iae
3−2ce+2ac)a
2(2ae−2ce+2aec+ac+ica
a4 = a a5 = a a6 =
(−1+i
3)(ae−iae
3−2ce+2ac)a
2(2aec−ac−ce−ice
3+ae+iae
3+ica
3)(e−i
3e+2c)a2
2(2ae−2ce+2aec+ac+ica
(−1+i
3)e2(a−c)
2aec−ac−ce−ice
3+ae+iae
3+ica
3)(−ce+ae+ac)e
2ae−2ce+2aec+ac+ica
b3 = e b4 =
(−1+i
3)(a−c)e
3)(−ce+ae+ac)e
2aec−ac−ce−ice
3+ae+iae
3+ica
2ae−2ce+2aec+ac+ica
For instance, if we plug in a = c+ic
3−2c and e =
c(1+i
2(c−1) , we get a one dimensional family
of arrangements of 18 lines with t2 = 18, t3 = 39, t4 = 3, tk = 0 otherwise.
M3: This gives a three dimensional moduli space which does not contains (3, 6)-nets
defined over R. The reason again is that we need to have the square root of −1 to realize
the nets. The order we solve is: [[5, 10, 11]] solve for b8, [[1, 9, 15]] solve for c8, [[5, 9, 12]]
solve for c6, [[3, 6, 15]] solve for c1, [[1, 10, 13]] solve for c7, [[4, 9, 11]] solve for b3, [[1, 6, 12]]
solve for b1, and [[3, 10, 12]] solve for b2. If a5 = a, c3 = d, c2 = e, and c4 = c, then they
must satisfy:
(e2a2 + e2 − e2a− 2a2de− de+ d2 + 3dea+ d2a2 − 2d2a) + (−ea− e+ ad− d)c+ c2 = 0
and so its moduli space is an open set of this hypersurface. Moreover, by solving for c, we
have that: c = 1
(ea+ e− ad+ d±
−3(a− 1)(d− e)). But, we cannot have a = 1 or d = e,
and so this shows that the square root of −1 is necessary.
M4: This case is not possible over C. To get the contradiction, we take: [[5, 10, 13]] solve
for c7, [[3, 7, 15]] solve for c6, [[2, 8, 15]] solve for b2, [[4, 6, 15]] solve for c3, [[5, 6, 14]] solve for
a5, [[1, 9, 15]] solve for c8, [[1, 10, 14]] solve for c1, [[3, 8, 14]] solve for c2, [[2, 10, 11]] solve for
c4, and [[2, 6, 13]] solve for b1. At this stage, we obtain several possibilities from the equation
given by [[2, 6, 13]], none of them possible (for example, a2 = a6).
M5: This case is not possible over C. By solving [[5, 10, 13]] for c7, and then [[3, 7, 15]] for
c6, we obtain a6 = a7 which is a contradiction.
M6: This gives a two dimensional moduli space, and so this parameter space are not
always three dimensional (see [19, p. 14]). Some of these nets can be defined over R.
The order we take is: [[5, 10, 11]] solve for a5, [[1, 9, 15]] solve for c8, [[3, 7, 15]] solve for c6,
[[2, 6, 15]] solve for b1, [[5, 6, 13]] solve for c7, [[4, 9, 11]] solve for b3, [[2, 9, 13]] solve for c1,
[[1, 10, 12]] solve for c3, and [[3, 9, 12]] solve for b2. If b8 = b, c2 = d, and c4 = c, then they
must satisfy:
bc(1−c)(bc−c−b)+(bc3+b2−5bc2+3bc−2b2c+b2c2−c3+2c2)d+(−b+2bc−2c+c2)d2 = 0.
Thus, its moduli space is an open set of this hypersurface.
M7: This gives a two dimensional moduli space parametrized by an open set of A
These nets can be defined over Q. The order we solve is the following: [[5, 6, 13]] solve for
c7, [[3, 6, 15]] solve for b2, [[1, 9, 15]] solve for c8, [[5, 9, 12]] solve for c6, [[1, 10, 14]] solve for
b3, [[3, 9, 11]] solve for b8, [[4, 8, 11]] solve for c3, [[4, 10, 13]] solve for c2, [[4, 7, 15]] solve for
b1, and [[5, 7, 11]] solve for c1. If a5 = a and c4 = c, then we have:
(c2−4c+2ac+4−2a)a
c(a−2)(c−2) a2 =
(c−1)(c−2)(a−2)a
a2c2+a2−2a2c−2c2a+5ac−2a+c2−2c a3 =
ac(a+c−2)
−c2−ac+c2a+2c−2a+a2
(a−2)(a−ac+c−2)a
−c2+c2a−3ac+2c+a2c+2a−a2 a5 = a a6 =
(a+c−2)(−a+ac−c+2)a
a2c2+a2−2a2c−2c2a+5ac−2a+c2−2c
(a−2)a2(c−1)
c2+ac−c2a−2c+2a−a2 a8 =
a(c2−4c+2ac+4−2a)
−c2+c2a−3ac+2c+a2c+2a−a2
(a−1)(a−2)(c−2)2c
(a+c−2)(a2c2+a2−2a2c−2c2a+5ac−2a+c2−2c) b2 =
(c−a)(a−2)(c−2)
−c2−ac+c2a+2c−2a+a2
(a−2)2(c−1)a(c−2)
(a+c−2)(c2−c2a+3ac−2c−a2c−2a+a2) b4 =
(c−a)(a−2)(c−2)
ac(a+c−2) b5 =
(c−1)(c−2)(a−2)a
a2c2+a2−2a2c−2c2a+5ac−2a+c2−2c
c(a−2)(c−2)a(a−1)
(c2+ac−c2a−2c+2a−a2)(a+c−2) b7 =
c(a−2)(c−2)
−c2+c2a−3ac+2c+a2c+2a−a2 b8 =
(a−2)(c−2)
2−a−c
M8: This also gives a two dimensional moduli space. Some of these nets can be defined
over R. The order we solve is the following: [[2, 6, 15]] solve for b1, [[1, 10, 13]] solve for c7,
[[1, 7, 15]] solve for c6, [[5, 7, 14]] solve for c8, [[4, 10, 11]] solve for c3, [[5, 6, 13]] solve for b2,
[[2, 10, 14]] solve for b3, [[5, 9, 11]] solve for a5, and [[3, 7, 11]] solve for c1. If b8 = b, c4 = c,
and c2 = e, then they have to satisfy:
c2(c−b)(4c2−6cb−b3+3b2)+c(cb−2c+b)(6c2−9cb−b3+4b2)e+(bc−b+c)(cb−2c+b)2e2 = 0.
Thus, its moduli space is an open set of this hypersurface.
M9: This gives a three dimensional moduli space. Some of them can be defined over R.
The order we solve is the following: [[5, 10, 11]] solve for a5, [[1, 10, 14]] solve for c8, [[4, 7, 15]]
solve for c6, [[4, 9, 12]] solve for b3, [[1, 8, 15]] solve for c7, [[5, 8, 12]] solve for c2, [[5, 6, 14]]
solve for c1, and [[3, 6, 15]] solve for b2. If b1 = e, b8 = b, c4 = c, and c3 = d, then they have
to satisfy:
(b2c2 + c2 + bc− b2c− 2bc2) + (−2c + 2bc+ ce− bec + e2b− eb)d+ (−e + 1)d2 = 0.
Thus, its moduli space is an open set of this hypersurface.
M10: This gives a two dimensional moduli space. Some of these nets can be defined
over R. The order we solve is the following: [[5, 10, 11]] solve for a5, [[1, 7, 15]] solve for b1,
[[1, 10, 12]] solve for c6, [[3, 6, 15]] solve for b2, [[5, 6, 13]] solve for c7, [[5, 7, 14]] solve for c8,
[[4, 8, 15]] solve for b3, [[3, 7, 11]] solve for c3, and [[2, 8, 11]] solve for c2. If b8 = b, c4 = c,
and c1 = e, then they have to satisfy:
ce(c− 2e) + (2ce− c− e)(e− c)b+ c(1− e)(e− c)b2 = 0.
Thus, its moduli space is an open set of this hypersurface.
M11: This also gives a two dimensional moduli space. Some of these nets can be defined
over R. The order we solve is the following: [[5, 10, 11]] solve for a5, [[1, 9, 15]] solve for c8,
[[3, 8, 11]] solve for c7, [[3, 7, 15]] solve for c6, [[4, 6, 15]] solve for b3, [[2, 8, 15]] solve for b1,
[[4, 9, 11]] solve for c2, [[5, 7, 14]] solve for c3, and [[1, 8, 14]] solve for c1. An extra property
for this nets is that c7 has to be zero, and so L13, L16, and L18 have always a common point
of incidence. If b2 = e, b8 = b, and c4 = c, then they must satisfy:
c(b− 1)(bc− b− c) + (b2c− 2bc + c− b2 + 2b)e− e2 = 0.
Thus, its moduli space is an open set of this hypersurface.
M12: This case is not possible over C. To achieve contradiction, we take: [[2, 9, 15]] solve
for c8, [[5, 10, 13]] solve for c7, [[3, 6, 15]] solve for b2, [[1, 8, 15]] solve for c3, [[1, 10, 12]] solve
for c6, [[5, 9, 11]] solve for c2, and [[5, 6, 12]] solve for b1. Then, the equation induced by
[[1, 9, 13]] gives six possibilities, none of them is possible.
(3, 8)-nets corresponding to the Quaternion group.
We now compute the (3, 8)-nets corresponding to the multiplication table of the Quater-
nion group.
1 2 3 4 5 6 7 8
2 1 6 7 8 3 4 5
3 6 2 5 7 1 8 4
4 7 8 2 3 5 1 6
5 8 7 6 2 4 3 1
6 3 1 8 4 2 5 7
7 4 5 1 6 8 2 3
8 5 4 3 1 7 6 2
In this case, we have a three dimensional moduli space for them, given by an open set
of A3. Also, these (3, 8)-nets can be defined over Q (so we can even draw them). This
example shows again that non-abelian groups can also realize nets over C. The set up is
similar to what we did before. In this case, A′ = A \ {L1, L9, L17} and P = [[1, 9, 17]]. Our
distinguished points on L(A′, P ) ⊆ P19 are: α = [a1 : a2 : a3 : a4 : a5 : a6 : a7 : 1 : a8 : a9 :
a10 : a11 : a12 : 0 : 1 : a8 : a9 : a10 : a11 : a12], β = [1 : b1 : b2 : b3 : b4 : b5 : 0 : b6 : b7 : b8 : b9 :
b10 : b11 : b12 : 1 : b1 : b2 : b3 : b4 : b5], and γ = [1 : c1 : c2 : c3 : c4 : c5 : c6 : 1 : c4 : c5 : c6 :
c1 : c2 : c3 : c7 : c8 : c9 : c10 : c11 : c12]. Let [t : u] ∈ P1 such that αt + βu = γ. We isolate
first a1, a2, a3, a4, a5, a6, a8, a9, a10, a11, b6, b7, b8, b9, b10, b11, and c7 with respect to the
other variables. The following is the order we solve (some of) the 2× 2 determinants given
by the 3-points in X ′: [[1, 11, 21]] solve for c10, [[2, 10, 21]] solve for c9, [[3, 12, 21]] solve for
c11, [[4, 8, 21]] solve for b3, [[5, 13, 21]] solve for c8, [[7, 14, 15]] solve for b12, [[5, 14, 20]] solve
for c4, [[2, 14, 17]] solve for c1, [[4, 9, 20]] solve for c2, [[6, 13, 15]] solve for c5, [[3, 8, 20]] solve
for b5, [[3, 10, 15]] solve for b4, [[3, 9, 18]] solve for c6, and [[3, 11, 19]] solve for b1. Then, if we
write a7 = a, b2 = e, and c3 = d, the expressions for all the variables are:
ad−a−d
a−2 a2 =
2e2d−2ed+ed2−e2d2+(−2ed2+e2d2+2e+6ed−3e2d−4)a+(−4ed−e+4+ed2+e2d)a2
(ae−2a−2e+2)(ade−ed+d−a−da)
e(ade−ed+d−a−da)
ae−2a−2e+2 a4 = d
4d+2e2d−6ed+e2d2−ed2+(−2e2d2+2ed2−8d−e2d+10ed−2e)a+(4d+e+e2d2−ed2−4ed)a2
(ae−2a−2e+2)(a+d−ad−de)
(a+d−ad−de)(ae+dae−4a−2e−ed+4)
(ae−2a−2e+2)(ade−a−da−2ed+2+d) a7 = a a8 =
(a+d−da−2)e
ae−2a−2e+2
2e2d−2ed+ed2−e2d2+(−2ed2+e2d2+2e+6ed−3e2d−4)a+(−4ed−e+4+ed2+e2d)a2
(a+d−ad−2)(ae−2a−2e+2)
a10 = a+ d− ad a11 = ade+ae−4a−2e−ed+4ae−2a−2e+2
a12 =
2e2d+4d−6ed+e2d2−ed2+(−2e2d2+2ed2−8d−e2d+10ed−2e)a+(4d+e+e2d2−ed2−4ed)a2
(ade−a−da−2ed+d+2)(ae−2a−2e+2)
−2e+ed+ae−2a
−ed+dae+d−a−da b2 = e b3 = 2 b4 =
ae−2a−2e−ed+4
a+d−ad−de b5 =
ade+ae−4a−2e−ed+4
ade−a−da−2ed+d+2 b6 =
e(ad−d+2−a)
ad+de−a−d b8 = −
−2e+ed+ae−2a
a+d−ad−2 b9 = 2− a b10 =
ae+dae−4a−ed−2e+4
ade−ed+d−a−da
b11 =
ae−2a−2e+4−ed
ade−a−da−2ed+d+2 b12 =
a−1 ,
with [t : u] = [2− a : d(a− 1)] ∈ P1.
Since b3 = 2, these (3, 8)-nets are not possible in characteristic 2. The lines for these (3, 8)-
nets can be written as: L1 = (y), L2 = (a1x+y+z), L3 = (a2x+b1y+z), L4 = (a3x+b2y+z),
L5 = (a4x+ b3y + z), L6 = (a5x+ b4y + z), L7 = (a6x+ b5y + z), L8 = (a7x+ z), L9 = (x),
L10 = (x + b6y + z), L11 = (a8x + b7y + z), L12 = (a9x + b8y + z), L13 = (a10x + b9y + z),
L14 = (a11x + b10y + z), L15 = (a12x + b11y + z), L16 = (b12y + z), L17 = (ux − ty),
L18 = (x + y + z), L19 = (a8x + b1y + z), L20 = (a9x + b2y + z), L21 = (a10x + b3y + z),
L22 = (a11x+ b4y + z), L23 = (a12x+ b5y + z), and L24 = (z).
A natural question, which we leave open, is the following:
Question 4.1. Is there a combinatorial characterization of the main classes of q × q Latin
squares which realize (3, q)-nets in P2
References
1. J. Aczel. Quasigroups, nets, and nomograms, Adv. in Math. 1 (1965) 383-450.
2. M. Artebani and I. Dolgachev. The Hesse pencil of plane cubic curves, arXiv:math.AG/0611590, to
appear on L’Enseignement Mathématique.
3. A. Barlotti and K. Strambach. The geometry of binary systems, Adv. in Math. 49 (1983) 1-105.
4. M. A. Marco Buzuñariz. Resonance varieties, admissible line combinatorics and combinatorial pencils,
arXiv:math/0505435.
5. O. Chein, H. O. Pflugfelder and J. D. H. Smith. Quasigroups and loops: theory and applications, Sigma
Series in Pure Mathematics 8, Heldermann Verlag, Berlin, 1990.
6. J. Dénes and A. D. Keedwell. Latin squares and their applications, Academic Press, 1974.
7. I. Dolgachev. Abstract configurations in algebraic geometry, The Fano Conference, Univ. Torino, Turin
(2004) 423-462.
8. C. Dunn, M. S. Miller, M. Wakefield and S. Zwicknagl. Equivalence classes of Latin squares and nets
in CP2, arXiv:math/0703142v4.
9. M. Falk and S. Yuzvinsky. Multinets, resonance varieties, and pencils of plane curves, Compos. Math.
143, no. 4, (2007) 1069-1088.
10. B. Grünbaum. Configurations of points and lines, The Coxeter legacy, Amer. Math. Soc., Providence
RI, 2006, 179-225.
11. R. Hartshorne. Algebraic geometry, Graduate Text in Mathematics v.52, Springer, 1977.
12. J. W. P. Hirschfeld. Projective geometries over finite fields, The Clarendon Press Oxford University
Press, New York, 1979, Oxford Mathematical Monographs.
13. F. Hirzebruch. Arrangements of lines and algebraic surfaces, Arithmetic and geometry, Vol. II, Progr.
Math. 36, Birkhäuser, Boston, Mass., 1983, 113-140.
14. M. M. Kapranov. Veronese curves and Grothendieck-Knudsen moduli space M0,n , J. Algebraic Geom.
2 (1993) 239-262.
15. M. M. Kapranov. Chow quotients of Grassmannians I, Adv. Soviet Math. 16, part 2, A.M.S., (1993)
29-110.
16. Y. Kawahara. The non-vanishing cohomology of Orlik-Solomon algebras, Tokyo J. Math. 30(2007), no.1
223–238.
17. A. Libgober and S. Yuzvinsky. Cohomology of the Orlik-Solomon algebras and local systems, Compositio
Math. 121, no. 3, (2000) 337-361.
18. J. V. Pereira and S. Yuzvinsky. Completely reducible hypersurfaces in a pencil, arXiv:math/0701312v2.
19. J. Stipins. Old and new examples of k-nets in P2, arXiv:math.AG/0701046.
20. J. Stipins. On finite k-nets in the complex projective plane, Ph.D. Thesis, University of Michigan (2007).
21. G. Urzúa. Arrangements of curves and algebraic surfaces, Ph.D. Thesis, University of Michigan (2008).
22. S. Yuzvinsky. Realization of finite abelian groups by nets in P2, Compos. Math. 140, no. 6, (2004)
1614–1624.
23. S. Yuzvinsky. A new bound on the number of special fibers in a pencil of curves, arXiv:0801.1521v2.
Department of Mathematics and Statistics, University of Massachusetts at Amherst, USA.
E-mail address : urzua@math.umass.edu
http://arxiv.org/abs/math/0611590
http://arxiv.org/abs/math/0505435
http://arxiv.org/abs/math/0703142
http://arxiv.org/abs/math/0701312
http://arxiv.org/abs/math/0701046
http://arxiv.org/abs/0801.1521
	1. Introduction.
	2. Arrangements of d lines in ¶2, and lines in ¶d-2.
	3. (p,q)-nets in ¶2.
	4. Classification of (3,q)-nets for q6, and the Quaternion nets.
	References
ABSTRACT
  We show a one-to-one correspondence between arrangements of d lines in the
projective plane, and lines in P^{d-2}. We apply this correspondence to
classify (3,q)-nets over the complex numbers for all q<=6. When q=6, we have
twelve possible combinatorial cases, but we prove that only nine of them are
realizable. This new case shows several new properties for 3-nets: different
dimensions for moduli, strict realization over certain fields, etc. We also
construct a three dimensional family of (3,8)-nets corresponding to the
Quaternion group.

<|endoftext|><|startoftext|>
Introduction
	Experiment description
	From PSF to structure function
	Overview
	Conditions of the experiment
	VISIR data
	Shack-Hartmann data
	Results
	Conclusions
	Models of turbulence
	Correction for the missing flux
	Effect of the tip-tilt servo on the SF
ABSTRACT
  We probe turbulence structure from centimetric to metric scales by
simultaneous imagery at mid-infrared and visible wavelengths at the VLT
telescope and show that it departs significantly from the commonly used
Kolmogorov model. The data can be fitted by the von Karman model with an outer
scale of the order of 30 m and we see clear signs of the phase structure
function saturation across the 8-m VLT aperture. The image quality improves in
the infrared faster than the standard lambda^{-1/5} scaling and may be
diffraction-limited at 30-m apertures even without adaptive optics at
wavelengths longer than 8 micron.

<|endoftext|><|startoftext|>
Density dependence of the symmetry energy and the nuclear equation of state: A
Dynamical and Statistical model perspective
D.V. Shetty, S.J. Yennello, and G.A. Souliotis
Cyclotron Institute, Texas A&M University, College Station, TX 77843, USA
(Dated: October 25, 2018)
The density dependence of the symmetry energy in the equation of state of isospin asymmetric
nuclear matter is of significant importance for studying the structure of systems as diverse as the
neutron-rich nuclei and the neutron stars. A number of reactions using the dynamical and the
statistical models of multifragmentation, and the experimental isoscaling observable, is studied to
extract information on the density dependence of the symmetry energy. It is observed that the
dynamical and the statistical model calculations give consistent results assuming the sequential
decay effect in dynamical model to be small. A comparison with several other independent studies
is also made to obtain important constraint on the form of the density dependence of the symmetry
energy. The comparison rules out an extremely “ stiff ” and “ soft ” form of the density dependence
of the symmetry energy with important implications for astrophysical and nuclear physics studies.
PACS numbers: 21.30.Fe, 25.70.-z, 25.70.Lm, 25.70.Mn, 25.70.Pq
I. INTRODUCTION
The fundamental goal of nuclear physics is to under-
stand the basic building blocks of nature - neutrons and
protons - and the nature of interaction that binds them
together into nuclear matter. Studying the nature of
matter and the strength of nuclear interaction is key to
understanding some of the fundamental problems such
as, How are elements formed? How do stars explode into
supernova? What kind of matter exists inside a neu-
tron star? How are neutrons compressed inside a neutron
star to density trillions of times greater than on earth ?
What determines the density-pressure relation, the so-
called equation of state?
The key ingredient for constructing the nuclear equa-
tion of state is the basic nucleon-nucleon interaction. Un-
til now our understanding of the nucleon-nucleon inter-
action has come from studying nuclear matter that is
symmetric in isospin (neutron-to-proton ratio, N/Z ≈ 1)
and matter found near normal nuclear density (ρo ≈ 0.16
fm−3). It is not known how far this understanding re-
mains valid as one goes away from the normal nuclear
density and symmetric nuclear matter. Various interac-
tions used in “ ab initio ” microscopic calculations pre-
dict different forms of the nuclear equation of state above
and below the normal nuclear matter density, and away
from the symmetric nuclear matter [1, 2, 3, 4, 5, 6]. As a
result, the symmetry energy, which is the difference in en-
ergy between the pure neutron matter and the symmetric
nuclear matter, shows very different behavior above and
below normal nuclear density [6] (see Fig. 1).
In general, two different forms of the density depen-
dence of the symmetry energy have been predicted. One,
where the symmetry energy increases monotonically with
increasing density (“ stiff ” dependence) and the other,
where the symmetry energy increases initially up to nor-
mal nuclear density and then decreases at higher densities
(“ soft ” dependence). Constraining the form of the den-
sity dependence of the symmetry energy is important not
0 0.5 1
ρ / ρ0
0 1 2 3
ρ / ρ0
SkLya
var AV
+δv+3-BF
DD-TW
DD-ρδ
FIG. 1: (Color online) Symmetry energy as a function of den-
sity predicted by microscopic “ ab initio ” calculations. The
left panel shows the low-density region, while the right panel
displays the high-density range. The figure is taken from Ref.
only for a better understanding of the nucleon-nucleon
interaction, and hence its extrapolation to the structure
of neutron-rich nuclei [7, 8, 9, 10], but also for deter-
mining the structure of compact stellar objects such as
neutron stars [11, 12, 13, 14, 15, 16, 17]. For example, a
“ stiff ” form of the density dependence of the symmetry
energy is predicted to lead to a large neutron skin thick-
ness compared to a “ soft ” dependence [8, 10, 18, 19, 20].
Similarly, a “ stiff ” dependence of the symmetry energy
can result in rapid cooling of a neutron star, and a larger
neutron star radius, compared to a “ soft ” density depen-
dence of the symmetry energy [20, 21, 22]. The nuclear
Equation Of State (EOS) is therefore a fundamental en-
tity that determines the properties of systems as diverse
as atomic nuclei and neutron stars, and the knowledge of
which is of significant importance [16, 17, 23].
Experimentally, the best possible means of studying
the nuclear equation of state at sub-normal nuclear den-
http://arxiv.org/abs/0704.0471v1
sity is through intermediate-energy heavy-ion reactions
[26, 27]. In this kind of reaction, an excited nucleus
(the composite of the projectile and the target nucleus)
expands to a sub-nuclear density and disintegrates into
various light and heavy fragments in a process called
multifragmentation. By studying the isotopic yield dis-
tribution of these fragments one can extract important
information about the symmetry energy and its density
dependence. Current studies on the nuclear equation of
state are limited to beams consisting of stable nuclei. It
is hoped that in the future radioactive beam facilities
such as, FAIR (GSI) [24], SPIRAL2 (GANIL) and FRIB
(USA) [25] will provide tremendous opportunities for ex-
ploring the nuclear EOS in regions never before studied
(i.e., extreme isospin and away from normal nuclear den-
sity).
In this work, we have made an attempt to study
the density dependence of the symmetry energy using
two different theoretical approaches for studying multi-
fragmentation, namely the dynamical and the statistical
model approaches of multifragmentation. In section II,
the isoscaling technique used to study the density de-
pendence of the symmetry energy, and their different in-
terpretations in terms of statistical and dynamical ap-
proach, are presented. In section III and IV, a brief
description of the experiment and the experimental re-
sults are presented. The dynamical and the statistical
approaches used to interpret the experimental results are
presented in section V. A comparsion between the two
approaches with other independent studies is presented
in section VI. Finally, a discussion and summary, and
conclusions are presented in section VII and VIII, re-
spectively.
II. SYMMETRY ENERGY AND THE ISOTOPIC
YIELD DISTRIBUTION
It has been shown from experimental measurements
that the ratio of the fragment yields, R21(N ,Z), taken
from two different multifragmentation reactions, 1 and
2, obeys an exponential dependence on the neutron num-
ber (N) and the proton number (Z) of the fragments; an
observation known as isoscaling [28, 29, 30]. The depen-
dence is characterized by the relation,
R21(N,Z) = Y2(N,Z)/Y1(N,Z) = Ce
(αN+βZ) (1)
Where, Y2 and Y1 are the fragment yields from the
neutron-rich and the neutron-deficient systems, respec-
tively. C is an overall normalization factor, and α and β
are the parameters characterizing the isoscaling behavior.
Isoscaling is also theoretically predicted by the dynam-
ical [31, 32, 33, 34, 35] and statistical [36, 37, 38, 39]
models of multifragmentation. In these models, the dif-
ference in the chemical potential of systems with different
neutron-to-proton ration (N/Z) is directly related to the
isoscaling parameter α. The isoscaling parameter α, is
related to the symmetry energy Csym, through the rela-
tion,
4Csym
where, Z1, A1 and Z2, A2 are the charge and the mass
numbers from the two systems and T is the temperature.
This relation provides a simple and straight-forward con-
nection between the symmetry energy and the fragment
isotopic yield distribution.
It must be mentioned that although the above equation
derived from the statistical and the dynamical models of
multifragmentation appears similar in form, the physical
meaning of the terms involved in this equation differ for
each model.
1) In statistical models, the Z/A in Eq. (2) corresponds
to the charge-to-mass ratio of the initial equilibrated frag-
menting system. Whereas, in dynamical models, it cor-
responds to the charge-to-mass ratio of the liquid phase
at a certain time (≈ 300 fm/c) during the dynamical evo-
lution of the colliding systems.
2) The interpretation of the symmetry energy Csym, in
dynamical and statistical models also differs significantly.
The dynamical models relate the symmetry energy in the
above equation to that of the fragmenting source. The
statistical models, on the other hand, relate Csym to that
of the fragments formed at freeze-out.
These conceptual differences between the statistical
and the dynamical models are due to the radically differ-
ent approaches taken in the interpretation of the multi-
fragmentation process. The different interpretation has
also lead to conflicting results from the use of Eq. 2, due
to the different sequential decay effects predicted for the
primary fragments by each model.
The isoscaling parameter α, in Eq. 2 corresponds to
the hot primary fragments which undergo sequential de-
cay into cold secondary fragments. These secondary frag-
ments are the ones that are eventually detected in exper-
iments. The experimentally determined isoscaling pa-
rameter must therefore be corrected for the sequential
decay effect before comparing it to the theoretical mod-
els. It has been observed that while statistical model
calculations show no significant change in the isoscaling
parameter after sequential decay [40], dynamical models
give contrasting results; with some showing no significant
changes [41], while others showing a change of as much
as 50% [42].
In this work, we adopt both theoretical approaches
with their respective interpretations to study the den-
sity dependence of the symmetry energy. In particular,
we use the Antisymmetrized Molecular Dynamics (AMD)
model [31, 43] and the Statistical Multifragmentation
Model (SMM) [36] for this study. A comparison between
the two can provide useful insight into the physical mean-
ing of the above equation in the two models.
III. EXPERIMENT
A. Experimental Setup
The experiments were carried out at the Cyclotron
Institute of Texas A&M University (TAMU) using the
K500 Superconducting Cyclotron and the National Su-
perconducting Cyclotron Laboratory (NSCL) at Michi-
gan State University (MSU). Targets of 58Fe (2.3
mg/cm2) and 58Ni (1.75 mg/cm2) were bombarded with
beams of 40Ar and 40Ca at 33 and 45 MeV/nucleon for
the TAMU measurements [44], and targets of 58Fe (∼ 5
mg/cm2) and 58Ni (∼ 5 mg/cm2) were bombarded with
beams of 40Ar and 40Ca at 25 and 53 MeV/nucleon for
the NSCL measurements [45]. The various combinations
of target and projectile nuclei allowed for a range of N/Z
(neutron-to-proton ratio) (1.04 − 1.23) of the system to
be studied, while keeping the total mass constant (A =
98). In a separate experiment at TAMU, beams of 58Ni
and 58Fe at 30, 40, and 47 MeV/nucleon were also bom-
barded on self-supporting 58Ni and 58Fe targets.
The beams in the TAMU measurements were fully
stripped by allowing them to pass through a thin alu-
minum foil before being hit at the center of the target
inside the TAMU 4π neutron ball [46]. Light charged
particles (Z ≤ 2) and intermediate mass fragments (Z >
2) were detected using six discrete telescopes placed in-
side the scattering chamber of the neutron ball at angles
of 10◦, 44◦, 72◦, 100◦, 128◦ and 148◦. Each telescope con-
sisted of a gas ionization chamber (IC) followed by a pair
of silicon detectors (Si-Si) and a CsI scintillator detector,
providing three distinct detector pairs (IC-Si, Si-Si, and
Si-CsI) for fragment identification. The ionization cham-
ber was of axial field design and was operated with CF4
gas at a pressure of 50 Torr. The gaseous medium was 6
cm thick with a typical threshold of ∼ 0.5 MeV/nucleon
for intermediate mass fragments. The silicon detectors
had an active area of 5 cm × 5 cm and were each sub-
divided into four quadrants. The first and second silicon
detectors in the stack were 0.14 mm and 1 mm thick,
respectively. The dynamical energy range of the silicon
pair was ∼ 16 - 50 MeV for 4He and ∼ 90 - 270 MeV
for 12C. The CsI scintillator crystals that followed the
silicon detector pair were 2.54 cm in thickness and were
read out by photodiodes. Good elemental (Z) identifi-
cation was achieved for fragments that punched through
the IC detector and stopped in the first silicon detector.
Fragments measured in the Si-Si detector pair also had
good isotopic separation. Fragments that stopped in CsI
detectors showed isotopic resolution up to Z = 7. The
trigger for the data acquisition was generated by requir-
ing a valid hit in one of the silicon detectors.
The calibration of the IC-Si detectors were carried out
using the standard alpha sources and by operating the
IC at various gas pressures. The Si-Si detectors were cal-
ibrated by measuring the energy deposited by the alpha
particles in the thin silicon and the punch-through ener-
gies of different isotopes in the thick silicon. The Si-CsI
detectors were calibrated by selecting points along the
different light charged isotopes and determining the en-
ergy deposited in the CsI crystal from the energy loss in
the calibrated Si detector.
The setup for the NSCL experiment consisted of 13
silicon detector telescopes placed inside the MSU 4π Ar-
ray. Four of which were placed at 14◦, each of which
consisted of a 100 µm thick and a 1 mm thick silicon
surface-barrier detector followed by a 20 cm thick plastic
scintillator. Five telescopes were placed at 40◦, in front
of the most forward detectors in the main ball of the 4π
Array. They each consisted of a 100 µm surface-barrier
detector followed by a 5 mm lithium drifted silicon de-
tector. More details can be found in Ref. [45]. Good
isotopic resolution was obtained as in TAMU measure-
ments.
B. Event Characterization
The event characterization of the NSCL data was
accomplished by detection of nearly all the coincident
charged particles by the MSU 4π Array. Data were ac-
quired using two different triggers; the bulk of which
was obtained with the requirement of a valid event in
one of the silicon telescopes. Additional data were taken
with a minimum bias 4π Array trigger for normalization
of the event characterization. The impact parameter of
the event was determined by the mid-rapidity charge de-
tected in the 4π Array as discussed in Ref. [47]. The
effectiveness of the centrality cuts was tested by compar-
ing the multiplicity of events from a minimum bias trigger
with the multiplicity distribution when a valid fragment
was detected at 40◦ [48]. The minimum bias trigger had
a peak multiplicity of charged particles of one, whereas
with the requirement of a fragment at 40◦, the peak of
the multiplicity distribution increased to five.
The event characterization for the TAMU data was
accomplished by using the 4π neutron ball that sur-
rounded the detector assembly. The neutron ball con-
sisted of eleven scintillator tanks segmented in its me-
dian plane and surrounding the vacuum chamber. The
upper and the lower tank were 1.5 m diameter hemi-
spheres. Nine wedge-shaped detectors were sandwiched
between the hemispheres. All the wedges subtended 40◦
in the horizontal plane. The neutron ball was filled with
a pseudocumene-based liquid scintillator mixed with 0.3
% (b.w.) of Gd salt (Gd 2-ethyl hexanoate). Scintilla-
tions from thermal neutrons captured by Gd were de-
tected by twenty 5-inch phototubes : five in each hemi-
sphere, one on each of the identical 40◦ wedges and two
on the forward edges. The efficiency with which the neu-
trons could be detected is about 83%, as measured with
a 252Cf source.
The detected neutrons were used to differentiate be-
tween the central and peripheral collisions. To un-
derstand the effectiveness of neutron multiplicity as
a centrality trigger, simulations were carried out us-
ing a hybrid BUU-GEMINI calculations at various im-
pact parameters for the 40Ca + 58Fe reaction at 33
MeV/nucleon. The simulated neutron multiplicity distri-
bution was compared with the experimentally measured
distribution. The multiplicity of neutron for the impact
parameter b = 0 collisions was found to be higher than
the b = 5 collision. By gating on the 10% highest neu-
tron multiplicity events, one could clearly discriminate
against the peripheral events.
To determine the contributions from noncentral im-
pact parameter collisions, neutrons emitted in coinci-
dence with fragments at 44◦ and 152◦ were calculated at
b = 0 fm and b = 5 fm. The number of events were ad-
justed for geometrical cross sectional differences. A ratio
was made between the number of events with a neutron
multiplicity of at least six, calculated at b = 0 fm, and the
number of events with the same neutron multiplicity at b
= 5 fm. The ratios were observed to be 19.0 and 11.1 at
44◦ and 1.3 and 2.2 at 152◦ for 33 and 45 MeV/nucleon
respectively. At intermediate angles, high neutron multi-
plicities were observed to be outside the region in which
b = 5 fm contributes significantly. At backward angles
the collisions at b = 5 fm made a larger contribution to
the neutron multiplicity.
In addition to the neutron multiplicity distribution, the
charge distribution of the fragments was also used to in-
vestigate the contributions from central and mid-impact
parameter collisions. The b = 5 collisions produced es-
sentially no fragments with charge greater than three in
the 44◦ telescope.
In an earlier work [44], some analysis of the fragment
kinetic energy and charge distributions were presented.
It was shown that at a laboratory angle of 44◦ the ki-
netic energy and the charge distributions are well repro-
duced by the statistical model calculation. Using a mov-
ing source analysis of the fragment energy spectra, it was
also shown that the fragments emitted at backward an-
gles originate from a target-like source, while those emit-
ted at 44◦ originate primarily from a composite source.
In this work, we will concentrate exclusively on data from
the laboratory angle of 44◦, which corresponds to the cen-
ter of mass angle ≈ 90◦, to study the symmetry energy
and the isoscaling properties of the fragments produced.
The choice of this angle enables one to select events which
are predominantly central and undergo bulk multifrag-
mentation. The contributions to the intermediate mass
fragments from the projectile-like and target-like sources
can therefore be assumed to be minimal.
IV. EXPERIMENTAL RESULTS
A. Fragment isotopic yield distribution
The experimentally measured relative isotopic yield
distributions for the Lithium (left) and Carbon (right)
elements, in 58Ni + 58Ni (star symbols), 58Ni + 58Fe
(square symbols), 58Fe + 58Ni (circle symbols) and 58Fe
FIG. 2: Relative yield distribution of the fragments for the
Lithium (left) and Carbon (right) isotopes in 58Ni + 58Ni
(stars and solid lines), 58Fe + 58Ni (circles and dashed lines),
58Ni + 58Fe (squares and dashed lines), and 58Fe + 58Fe (tri-
angles and dotted lines) reactions at various beam energies.
+ 58Fe (triangle symbols) reactions, are shown in Fig. 2
for beam energies of 30, 40 and 47 MeV/nucleon. Sim-
ilarly, the isotopic yield distributions for Lithium (left),
Berillium (center) and Carbon (right) elements, in 40Ca
+ 58Ni (star symbols), 40Ar + 58Ni (circle symbols) and
40Ar + 58Fe (square symbols) reactions, are shown in fig-
ure 3 for beam energies of 25, 33 and 45 MeV/nucleon.
The isotope distribution for each element in Fig. 3
shows higher fragment yield for the neutron rich iso-
topes in 40Ar + 58Fe reaction (squares), which has the
largest neutron-to-proton ratio (N/Z), in comparison to
the 40Ca + 58Ni reaction (stars), which has the small-
est neutron-to-proton ratio. The yields for the reaction,
40Ar + 58Fe (circles), which has an intermediate value of
the neutron-to-proton ratio, are in between those of the
other two reactions. A similar feature is also observed for
the 58Ni + 58Fe, 58Fe + 58Ni and 58Fe + 58Fe reactions
shown in Fig. 2. The fragment yield distributions there-
fore show the isospin dependence of the composite system
on the fragments produced in the multifragmentation re-
action. One also observes that the relative difference in
the yield distribution between the three reactions in each
figure decreases with increasing beam energy. This is due
to the secondary de-excitation of the primary fragments,
a process that becomes important for systems with in-
creasing neutron-to-proton ratio and excitation energy.
FIG. 3: Relative yield distribution of the fragments for
Lithium (left), Berillium (center) and Carbon (right) isotopes
in 40Ca + 58Ni (stars and solid lines), 40Ar + 58Ni (circles and
dashed lines), and 40Ar + 58Fe (squares and dotted lines) re-
actions at various beam energies.
B. Isotopic and Isotonic scaling
As discussed in section II, the ratio of isotope
yields in two different systems, 1 and 2, R21(N,Z) =
Y2(N,Z)/Y1(N,Z), follows an exponential dependence
on the neutron number (N) and the proton number (Z)
of the isotopes in relation known as isoscaling.
In Fig. 4, we show the isotopic yield ratio as a func-
tion of neutron number N , for Ar + Fe, Ar + Ni and
Ca + Ni systems at beam energies of 25, 33, 45 and
53 MeV/nucleon. The left column shows the ratio for
the 40Ar + 58Fe and 40Ca + 58Ni pair of reaction and
the right column shows the ratio for the 40Ar + 58Ni and
40Ca + 58Ni pair of reaction. One observes that the ratio
for each element shows linear behavior in the logarithmic
plot and aligns with the neighboring element quite well.
This feature is observed for all the beam energies and
both pairs of reactions studied. One also observes that
the alignment of the data points varies with beam ener-
gies as well as the pairs of reaction. To have a quantita-
tive estimate of this variation, the ratio for each element
(Z) was simultaneously fit using an exponential relation
(shown by the solid lines) to obtain the slope parame-
ter α. The values of the parameters are shown at the
top of each panel in the figure. The value of the slope
FIG. 4: Experimental isotopic yield ratios of the fragments as
a function of neutron number N , for various beam energies.
The left column correspond to 40Ar + 58Fe and 40Ca + 58Ni
pair of reactions. The right column correspond to 40Ar + 58Ni
and 40Ca + 58Ni pair of reactions. The different symbols
correspond to Z = 3 (circles), Z = 4 (open stars), Z = 5
(triangles), Z = 6 (squares) and Z = 7 (filled stars) elements.
The lines are the exponential fits to the data as explained in
the text.
parameter α is larger for the 40Ar + 58Fe and 40Ca +
58Ni reactions, which has a larger difference in the N/Z
of the systems in the pair, compared to the 40Ar + 58Ni
and 40Ca + 58Ni reactions, which has a smaller difference
in the corresponding N/Z. The α value furthermore de-
creases with increasing beam energy. A similar feature
is also observed in Fe + Fe, Fe + Ni, Ni + Fe and Ni
+ Ni systems. Fig. 5 shows the isotope yield ratios and
the isotone yield ratios for the Fe + Fe and Ni + Ni re-
actions for the 30 MeV/nucleon beam energy. A relative
comparison of how the isoscaling parameter α, evolves as
a function of beam energy and the isospin of the system
is shown in Fig. 6. The figure clearly shows that the α
value decreases with beam energy from 25 MeV/nucleon
to 53 MeV/nucleon. In addition, there is also a clear
drop in the α values with the decrease of the N/Z values
of the system.
FIG. 5: Experimental isotope yield ratios (top) and isotone
yield ratios (bottom) from 58Fe + 58Fe and 58Ni + 58Ni re-
actions as a function of N and Z for 30 MeV/nucleon beam
energy. The solid lines are fit to the data as discussed in the
text.
FIG. 6: Experimental isoscaling parameter α, as a function
of the beam energy. The solid circles are for the 40Ar + 58Fe
and 40Ca + 58Ni reactions. The open triangles are for 58Fe +
58Fe and 58Ni + 58Ni reactions. The solid stars are for 40Ar
+ 58Ni and 40Ca + 58Ni reactions. The open squares are for
58Fe + 58Ni and 58Ni + 58Ni reactions.
V. THEORETICAL MODEL COMPARISON
A. Dynamical AMD model
The Antisymmetrized Molecular Dynamics (AMD)
[31, 43] is a microscopic model that simulates the time
evolution of a nuclear collision. The colliding system in
this model is represented in terms of a fully antisym-
metrized product of Gaussian wave packets. During the
evolution, the wave packet centroids move according to
the deterministic equation of motion. The followed state
of the simulation branches stochastically and successively
into a huge number of reaction channels. The interactions
are parameterized in terms of an effective force acting be-
tween nucleons and the nucleon-nucleon collision cross-
sections. The advantage of using a dynamical model to
study the nuclear equation of state is that it allows one
to understand the functional form of the density depen-
dence of the symmetry energy at a very fundamental level
(i.e., from the basic nucleon-nucleon interaction).
Recently [31], the fragment yields from heavy ion col-
lisions simulated within the Antisymmetrized Molecular
Dynamics (AMD) calculation were reported to follow a
scaling behavior of the type shown in Eq. 1. A linear re-
lation between the isoscaling parameter α and the differ-
ence in the isospin asymmetry (Z/A)2 of the fragments
(as given in Eq. 2), with appreciably different slopes,
was predicted for two different forms of the density de-
pendence of the symmetry energy; a “ stiff ” dependence
(obtained from Gogny-AS interaction) and a “ soft ” de-
pendence (obtained from Gogny interaction).
In this section, we compare the experimentally de-
termined isoscaling parameter with the predictions of
the AMD model calculation. The isospin asymmetry
of the fragments for the present systems was estimated
at t = 300 fm/c of the dynamical evolution using the
AMD calculation. The values for the fragment asym-
metry (Z/A)2, were obtained by interpolating between
those calculated for the 40Ca + 40Ca, 48Ca + 48Ca and
60Ca + 60Ca systems by Ono et al. [31]. These systems
are symmetric and nearly similar in charge and mass as
studied in the present work. Fig. 7 shows the AMD cal-
culation of the fragment asymmetry, (Z/A)2 at t = 300
fm/c, as a function of initial asymmetry at time t = 0
fm/c, for two different choices of the nucleon-nucleon in-
teraction, Gogny and Gogny-AS. The asymmetry values
for the 40Ca + 40Ca, 48Ca + 48Ca and 60Ca + 60Ca sys-
tems of Ref. [31] are shown by solid and hollow square
symbols for the Gogny and Gogny-AS interaction, re-
spectively. The lines are the linear fits to the calcula-
tions. The interpolated values for the present systems
are shown by the solid circles and triangles, and hollow
circles and triangles for the Gogny and Gogny-AS inter-
action, respectively.
We note that the AMD calculations carried out in Ref.
[31] and shown in Fig. 7 correspond to the beam energy
of 35 MeV/nucleon. The interpolated values of the asym-
metries for the present systems obtained from Fig. 7 are
FIG. 7: AMD calculations of the fragment asymmetry
(Z/A)2, at t = 300 fm/c for the Gogny (solid line and solid
squares) and Gogny-AS (dotted line and hollow squares) in-
teractions at 35 MeV/nucleon. The calculations are taken
from Ref. [31] for the systems shown by the square symbols.
The lines are linear fit to the square symbols. The other sym-
bols are the interpolated values for the systems studied in this
work.
therefore for the beam energy of 35 MeV/nucleon. In or-
der to compare the experimentally determined isoscaling
parameter to that of the calculations, we therefore make
use of the experimental isoscaling parameter for the beam
energy of 35 MeV/nucleon using Fig. 6.
Fig. 8 shows a comparison between the experimentally
observed α and those from the AMD model calculations
plotted as a function of the difference in the fragment
asymmetry for the beam energy of 35 MeV/nucleon. The
solid and the dotted lines are the AMD model predictions
for the “ soft ” (Gogny) and the “ stiff ” (Gogny-AS)
form of the density dependence of the symmetry energy,
respectively. The solid and the hollow symbols (squares,
stars, triangles and circles) are the results of the present
study for the two different values of the fragment asym-
metry, assuming Gogny and Gogny-AS interactions, re-
spectively. Also shown in the figure are the scaling pa-
rameters (asterisks, crosses, diamond and inverted tri-
angle) taken from various other works in the literature
[36, 49]. It is observed that the experimentally deter-
mined α parameter increases linearly with increasing dif-
ference in the fragment asymmetry of the two systems
as predicted by the AMD calculation. Also, the data
points are in closer agreement with those predicted by
the Gogny-AS interaction (dotted line) than those from
the usual Gogny force (solid line).
In the above comparison between the data and the cal-
FIG. 8: Isoscaling parameter α, as a function of the differ-
ence in fragment asymmetry for 35 MeV/nucleon. The solid
and the dotted lines are the AMD calculations for the Gogny
and Gogny-AS interactions, respectively [31]. The solid and
the hollow squares, stars, triangles and circles are from the
present work as described in the text. The other symbols cor-
responds to data taken from [49] (asterisks) and [36] (crosses,
diamonds, inverted triangles).
culation, the corrections for the isoscaling parameter α
due to the sequential de-excitation of the fragments are
not taken into account. The slightly lower values of the
isoscaling parameters (symbols) from the present mea-
surements with respect to the Gogny-AS values (dotted
line) could be due to the small secondary de-excitation
effect of the fragments not accounted for in this compar-
ison. Recently, it has been reported by Ono et al. [42],
that the sequential decay effect in the dynamical calcu-
lations can affect the α value by as much as 50 %, and
the ability to distinguish between the “ stiff ” and the “
soft ” form of the density dependence of the symmetry
energy diminishes significantly. The calculations by Ono
et al., were carried out for the above studied systems
using the AMD model. However, dynamical calculation
carried out by Tian et al. [41], using Isospin Quantum
Molecular Dynamic (IQMD) model shows no significant
difference between the primary and the secondary α. The
sequential decay effect from the IQMD calculation was
also carried out for the same systems and beam energy
as studied by Ono et al. [42] using the AMD model. The
contrasting results between the two dynamical calcula-
tions for the same systems and energy currently present
significant amount of uncertainty in reliably estimating
the effect of sequential decay using dynamical models.
One reason for this could be the large discrepancy that
exists in the determination of the primary fragment exci-
tation energy from the current dynamical models. It has
been shown using another dynamical model (stochastic
mean field calculation) (see Liu et al. [33]), that it re-
quires a significantly lower value of the primary fragment
excitation energy (by as much as 50%), to be able to
reproduce the experimentally observed fragment isotope
distribution.
In the above comparison between the data and the cal-
culation, we have therefore assumed the effect of the se-
quential decay to be negligible. A correction of about 10
- 15 %, as determined and well established from various
statistical model studies [28], results in a slight increase
in the α values bringing them even closer to the dotted
line. The observed agreement of the experimental data
with the Gogny-AS type of interaction therefore appears
to suggest a “ stiff ” form of the density dependence of
the symmetry energy.
Figure 9 shows various forms of the density dependence
of the symmetry energy in isospin asymmetric nuclear
matter used by Chen et al. [50], and those used in the
present dynamical model analysis. The dot-dashed, dot-
ted and the dashed curve corresponds to the momentum
dependent Gogny interactions used by Chen et al., to ex-
plain the NSCL-MSU isospin diffusion data. Assuming
that the density dependence of the symmetry energy can
be parametrized as,
Csym(ρ) = C
(MeV ) (3)
where Cosym, is the value of the symmetry energy at sat-
uration density and γ is the parameter that characterizes
the stiffness of the symmetry energy, the above depen-
dences used by Chen et al. can be written as, Esym ≈
31.6 (ρ/ρ◦)
γ , where, γ = 1.6, 1.05 and 0.69, respectively.
The solid curve and the solid point in Fig. 9 correspond
to those from the Gogny and Gogny-AS interactions used
to study the isoscaling data in the present work.
By parameterizing the density dependence of the sym-
metry energy that explains the present isoscaling data,
one obtains, Csym(ρ) ≈ 31.6 (ρ/ρ◦)
γ , where γ = 0.69,
from the dynamical model analysis.
B. Statistical Multifragmentation Model
The Statistical Multifragmentation Model (SMM) [51,
52] is based on the assumption of statistical equilibrium
at a low density freeze-out stage. All breakup channels
composed of nucleons and excited fragments are taken
into account and considered as partitions. During each
partition the conservation of mass, charge, energy, mo-
mentum and angular momentum is taken into account,
and the partitions are sampled uniformly in the phase
space according to their statistical weights using Monte
Carlo sampling. The Coulomb interaction between the
fragments is treated in the Wigner-Seitz approximation.
Light fragments with mass number A ≤ 4 are considered
FIG. 9: Different forms of the density dependence of the nu-
clear symmetry energy used in the dynamical analysis of the
present measurements on isoscaling data and the isospin dif-
fusion measurements of NSCL-MSU [50]. The curves are as
described in the text.
as elementary particles with only translational degrees
of freedom (“nuclear gas”). Fragments with A > 4 are
treated as heated nuclear liquid drops, and their individ-
ual free energies FA,Z are parametrized as a sum of the
volume, surface, Coulomb and symmetry energy.
For the present study we make use of the SMM ver-
sion adopted by Botvina et al. [36]. In this version,
the secondary de-excitation of large fragments with A
> 16 is described by Weisskopf-type evaporation and
Bohr-Wheeler-type fission models [51, 53]. The decay
of smaller fragments is treated with the Fermi-breakup
model. All ground and nucleon-stable excited states of
light fragments are taken into account and the popula-
tion probabilities of these states are calculated according
to the available phase space [53]. The sequential decay
effect on the isoscaling parameter in this version of SMM
has been established to be small and in good agreement
with other versions of the statistical models.
Unlike dynamical calculations, the form of the density
dependence of the symmetry energy is not known a priori,
but has to be deduced from the systematic correlations
between the isoscaling parameter, temperature, symme-
try energy and the density of the multifragmenting sys-
tem. To build this correlation, we make use of the frag-
ment yield distributions measured in 58Ni, 58Fe + 58Ni,
58Fe reactions at 30, 40 and 47 MeV/nucleon to study
the isoscaling parameter α, as a function of the excita-
tion energy of the fragmenting source. The parameter α
was obtained from the ratio’s of the isotopic yields for two
different pairs of reactions, 58Fe + 58Ni and 58Ni + 58Ni,
and 58Fe + 58Fe and 58Ni + 58Ni as discussed in section
4.2. The excitation energy of the source for each beam
energy was determined by simulating the initial stage of
the collision dynamics using the Boltzmann-Nordheim-
Vlasov (BNV) model calculation [54]. The results were
obtained at a time around 40 - 50 fm/c after the projec-
tile had fused with the target nuclei and the quadrupole
moment of the nucleon coordinates (used for identifica-
tion of the deformation of the system) approached zero.
These excitation energies were also compared with those
obtained from the systematic calorimetric measurements
(see Ref. [55]) for systems with mass (A ∼ 100), and sim-
ilar to those studied in the present work, and are in good
agreement. Fig. 10 (a) shows the experimental isoscaling
parameter α, as a function of the excitation energy for
Fe + Fe and Ni + Ni, and Fe + Ni and Ni + Ni pairs
of reactions. A systematic decrease in the absolute val-
ues of the isoscaling parameter with increasing excitation
energy is observed for both pairs. The α parameters for
the 58Fe + 58Fe and 58Ni + 58Ni are about twice as large
compared to those for the 58Fe + 58Ni and 58Ni + 58Ni
pair of reactions.
The experimental isoscaling parameter was compared
with the predictions of the Statistical Multifragmentation
Model (SMM) [51, 56] calculations to study their depen-
dence on the excitation energy and the isospin content.
The initial parameters such as, the mass, charge and ex-
citation energy of the fragmenting source for the calcula-
tion was obtained from the BNV calculations as discussed
above. The possible uncertainties in the source param-
eters due to the loss of nucleons during pre-equilibrium
emission was accounted for by carrying out the calcula-
tions for smaller source sizes. The break-up density in
the calculation was taken to be multiplicity-dependent
and was varied from approximately 1/2 to 1/3 the sat-
uration density. This was achieved by varying the free
volume with the excitation energy as shown in Ref. [51].
The form of the dependence was adopted from the work
of Bondorf et al. [57, 58], (and shown by the solid curve in
Fig. 10 (d)). It is known that the multiplicity-dependent
break-up density, which corresponds to a fixed interfrag-
ment spacing and constant pressure at break-up, leads
to a pronounced plateau in the caloric curve [57, 58]. A
constant break-up density would lead to a steeper tem-
perature versus excitation energy dependence.
The symmetry energy in the calculation was varied un-
til a reasonable agreement between the calculated and the
measured α was obtained. Fig. 10 (a) shows the com-
parison between the SMM calculated and the measured
α for both pairs of systems. The dashed curves corre-
spond to the calculation for the primary fragments and
the solid curves to the secondary fragments. The width
in the curve is the measure of the uncertainty in the in-
puts to the SMM calculation. It is observed that, within
the given uncertainties, the decrease in the α values with
increasing excitation energy and decreasing isospin dif-
ference ∆(Z/A)2, of the systems is well reproduced by
4 6 8 10
0 2 4 6 8 10
E* (MeV/nucleon)
4 6 8 10
0 2 4 6 8 10
FIG. 10: (Color online) Isoscaling parameter α, temperature,
symmetry energy and density as a function of excitation en-
ergy for the Fe + Fe and Ni + Ni (inverted triangles), and
Fe + Ni and Ni + Ni (solid circles) reactions at 30, 40 and
47 MeV/nucleon. a) Experimental isoscaling parameter as
a function of excitation energy. The solid and the dashed
curves are the SMM calculations as discussed in the text. b)
Temperature as a function of excitation energy. The solid
stars correspond to the measured values and are taken from
Ref. [55]. The solid and the dashed curve corresponds to the
Fermi-gas relation. The dotted curve corresponds to the one
obtained from Eq. 4. c) Symmetry energy as a function of
excitation energy. d) Density as a function of excitation en-
ergy. The solid stars correspond to those from Ref. [69]. The
open triangles are those from Ref. [70]. The solid curve is
from Ref. [57].
the SMM calculation. One also notes that the effect of
sequential decay on the isoscaling parameter is small as
observed in several other studies [40, 59] using statistical
models.
We show in Fig. 10 (b), the temperature as a func-
tion of excitation energy (caloric curve) obtained from
the above SMM calculation that uses the excitation en-
ergy dependence of the break-up density to explain the
observed isoscaling parameters. These are shown by the
solid and inverted triangle symbols. Also shown in the
figure are the experimentally measured caloric curve data
compiled by Natowitz et al. [55], from various measure-
ments for this mass range. The data from these mea-
surements are shown collectively by solid star symbols
and no distinction is made among them. The Fermi-gas
model predictions with inverse level density parameter
Ko = 10 (solid and dashed curve), is also shown. It is
evident from the figure that the temperatures obtained
from the SMM calculations are in good agreement with
the overall trend of the caloric curve. Somewhat lower
value for the temperature is observed when the break-up
density of the system is kept constant at 1/3 the nor-
mal nuclear density. By allowing the break-up density
to evolve with the excitation energy, a near plateau that
agrees with the experimentally measured caloric curves
is obtained. This assures that the input parameters used
in the SMM calculation for comparing with the data are
reasonable.
The symmetry energies obtained from the statistical
model comparison of the experimental isoscaling param-
eter α, are as shown in Fig. 10 (c). A steady decrease in
the symmetry energy with increasing excitation energy is
observed for both pairs of systems. Such a decrease has
also been observed in several other studies [60, 61, 62, 63].
We have also estimated the effect of the symmetry en-
ergy evolving during the sequential de-excitation of the
primary fragments [60, 64]. These are reflected in the
large error bars shown in Fig. 10 (c).
The phase diagram of the multifragmenting system is
two dimensional and hence the excitation energy depen-
dence of the temperature (the caloric curve) must take
into account the density dependence too. Often this de-
pendence is neglected while studying the caloric curve.
In the following, we attempt to extract the density of the
fragmenting system as a function of excitation energy. It
has been shown by Sobotka et al. [65], that the plateau
in the caloric curve could be a consequence of the ther-
mal expansion of the system at higher excitation energy
and decreasing density. By assuming that the decrease
in the breakup density, as taken in the present statisti-
cal multifragmentation calculation, can be approximated
by the expanding Fermi gas model, and furthermore the
temperature in Eq. 2 and the temperature in the Fermi-
gas relation are related, one can extract the density as a
function of excitation energy using the relation
Ko(ρ/ρo)2/3E∗ (4)
0 0.5 1 1.5
FIG. 11: (Color online) Symmetry energy as a function of
density for the Fe + Fe and Ni + Ni pair of reaction (inverted
triangles), and Fe + Ni and Ni + Ni pair of reactions (solid
circles) for the 30, 40 and 47 MeV/nucleon. The solid curve is
the dependence obtained form the dynamical model analysis
as explained in the text.
In the above expression, the momentum and the fre-
quency dependent factors in the effective mass ratio are
taken to be one as expected at high excitation energies
and low densities studied in this work [66, 67, 68].
The resulting densities for the two pairs of systems are
shown in Fig. 10 (d) by the solid circles and inverted
triangles. For comparison, we also show the break-up
densities obtained from the analysis of the apparent level
density parameters required to fit the measured caloric
curve by Natowitz et al. [69], and those obtained by Viola
et al. [70] from the Coulomb barrier systematics that are
required to fit the measured intermediate mass fragment
kinetic energy spectra. One observes that the present re-
sults obtained by requiring to fit the measured isoscaling
parameters and the caloric curve are in good agreement
with those obtained by Natowitz et al. The figure also
shows the fixed freeze-out density of 1/3 (dashed line)
and 1/6 (dotted line) of the saturation density assumed in
various statistical model comparisons. The caloric curve
obtained using the above densities and excitation ener-
gies (shown by solid stars, circles and the triangles) with
Ko = 10 in Eq. 4, is shown by the dotted curve in Fig.
10 (b). The small discrepancy between the dotted curve
and the data (solid stars) below 4 MeV/nucleon is due
to the approximate nature of Eq. 4 being used.
It is therefore evident from figure 10 (a), (b), (c) and
(d) that the decrease in the experimental isoscaling pa-
rameter α, symmetry energy, break-up density, and the
flattening of the temperature with increasing excitation
energy are all correlated. One can thus conclude that the
expansion of the system during the multifragmentation
process leads to a decrease in the isoscaling parameter,
decrease in the symmetry energy and density, and the
flattening of the caloric curve.
TABLE I: Parameterized form of the density dependence of the symmetry energy obtained from various independent studies.
Reference Parametrization Studies
Fuchs et al. [85] 32.9(ρ/ρo)
0.59 Relativistic Dirac-Brueckner calculation
Heiselberg et al. [82] 32.0(ρ/ρo)
0.60 Variational calculation
Danielewicz et al. [81] 31(33)(ρ/ρo)
0.55(0.79) BE, skin, isospin analog states
Tsang et al. [79] 12.125(ρ/ρo)
2 Isospin diffusion
Chen et al. [50] 31.6(ρ/ρo)
1.05 Isospin diffusion
Li et al. [80] 31.6(ρ/ρo)
0.69 Isospin diffusion
Piekarewicz et al. [77, 78] 32.7(ρ/ρo)
0.64 Giant resonances
Shetty et al. [73, 74, 75] 31.6(ρ/ρo)
0.69 Isotopic distribution
Famiano et al. [87] 32.0(ρ/ρo)
0.55 neutron-proton emission ratio
Tsang et al. [28] 23.4(ρ/ρo)
0.6 Isotopic distribution
From the above correlation between the symmetry en-
ergy as a function of excitation energy, and the density
as a function of excitation energy, we obtain the sym-
metry energy as a function of density. This is shown by
the inverted triangles and solid circles in Fig. 11 for the
Fe + Fe and Ni + Ni, and the Fe + Ni and Ni + Ni
pair of reactions. The temperature in the present work
remains nearly constant for the range of excitation ener-
gies studied, the observed decrease in the symmetry en-
ergy with increasing excitation energy is therefore a con-
sequence of decreasing density. This is also supported
by microscopic calculations which shows an extremely
slow evolution of the symmetry energy with temperature
[71, 72]. The evolution is practically negligible for the
temperature range studied in this work. The solid curve
in Fig. 11 corresponds to the dependence Csym(ρ) =
31.6 (ρ/ρ◦)
0.69 MeV, obtained from the dynamical Anti-
symmetrized Molecular Dynamic (AMD) calculation, as
discussed in the previous section. It is thus observed that
the dynamical and statistical models lead to similar den-
sity dependence of the symmetry energy.
VI. COMPARISON WITH OTHER
INDEPENDENT STUDIES
In the following, we compare the form of the den-
sity dependence of the symmetry energy obtained from
the present experimentally measured isoscaling param-
eter using the statistical and the dynamical multifrag-
mentation models with several other recent independent
studies. Fig. 12 shows this comparison. The green solid
curve corresponds to the one obtained from Gogny-AS
interaction in dynamical AMD model that explains the
present results [73, 74], assuming the sequential decay
effect to be small. The inverted triangle and the circle
symbols also correspond to the present measurements ob-
tained by comparing with the statistical multifragmenta-
tion model [75]. The red dashed curve corresponds to the
one obtained recently from an accurately calibrated rela-
tivistic mean field interaction, used for describing the Gi-
ant Monopole Resonance (GMR) in 90Zr and 208Pb, and
the IVGDR in 208Pb by Piekarewicz et al. [76, 77, 78].
The pink dot-dashed curve correspond to the one used
to explain the isospin diffusion results of NSCL-MSU us-
ing the isospin dependent Boltzmann-Uehling-Uhlenbeck
(IBUU) model by Tsang et al. [79]. The blue dot-dashed
curve also corresponds to the one used for explaining the
isospin diffusion data of NSCL-MSU by Chen et al. [50],
but with the momentum dependence of the interaction
included in the IBUU calculation. This dependence has
been further modified to include the isospin dependence
of the in-medium nucleon-nucleon cross-section by Li et
al. [80], and is in good agreement with the present study.
The shaded region in the figure corresponds to those ob-
tained by constraining the binding energy, neutron skin
thickness and isospin analogue state in finite nuclei us-
ing the mass formula of Danielewicz [81]. The yellow
solid curve correspond to the parameterization adopted
by Heiselberg et al. [82] in their studies on neutron stars.
By fitting earlier predictions of the variational calcula-
tions by Akmal et al. [83, 84], where the many-body and
special relativistic corrections are progressively incorpo-
rated, Heiselberg and Hjorth-Jensen obtained a value of
Cosym = 32 MeV and γ = 0.6, similar to those obtained
from the present measurements. A similar result is also
obtained from the relativistic Dirac-Brueckner calcula-
tion, with Cosym = 32.9 MeV and γ = 0.59 [85]. The
Dirac-Brueckner is an “ab-initio” calculation based on
nucleon-nucleon interaction with Bonn A type potential
instead of the AV18 potential used in the variational cal-
culation of Ref. [84]. The density dependence of the sym-
metry energy has also been studied in the framework of
expanding emitting source (EES) model by Tsang et al.
[28], where a power law dependence of the form Csym(ρ)
= 23.4(ρ/ρo)
γ , with γ = 0.6 was obtained. This depen-
dence (shown by the black dotted curve) is significantly
softer than other dependences shown in the figure. The
solid square point in the figure correspond to the value
of symmetry energy obtained by fitting the experimental
differential cross-section data in a charge exchange reac-
tion using the isospin dependent CDM3Y6 interaction of
FIG. 12: (Color online) Comparison between the results on the density dependence of the symmetry energy obtained from
various different studies. The various curves and the symbols are described in the text.
the optical potential by Khoa et al. [86].
An alternate observable, the double neutron/proton
ratio of nucleons taken from two reaction systems using
four isotopes of the same element, has recently been pro-
posed as a probe to study the density dependence of the
symmetry energy [87]. This observable is expected to be
more robust than the isoscaling observable. It was shown
recently [87] that the experimentally determined double-
ratio for the 124Sn + 124Sn reaction to that for the 112Sn
+ 112Sn reaction, results in a dependence with γ = 0.5
(shown by black dashed curve in Fig. 12), when com-
pared to the predictions of the IBUU transport model
calculations. This observation is in close agreement with
other studies discussed above. However, this dependence
has been obtained by using the momentum independent
calculation of Ref. [88]. A more recent calculation [89]
using a BUU transport model that includes momentum
dependent interaction show significantly lower values for
the double neutron/proton ratio of free nucleons com-
pared to the one reported by Famiano et al.
The parameterized forms of the density dependence of
the symmetry energy obtained from all the above men-
tioned studies are as given in Table I. The close agree-
ment between various independent studies show that a
constraint on the density dependence of the symmetry
energy, given as Csym(ρ) = C
sym(ρ/ρo)
γ , where Cosym ≈
31 - 33 MeV and γ ≈ 0.55 - 0.69 can be obtained.
VII. DISCUSSION
We make the following observations from the above
comparison between the statistical and the dynamical
model analysis :
1) Assuming a negligibly small sequential decay effect,
the form of the density dependence of the symmetry en-
ergy obtained from the dynamical model analysis is in
good agreement with the one obtained from the statistical
model analysis: As mentioned earlier, the sequential de-
cay effect among various dynamical model calculations is
still a subject of debate. The statistical models however
consistently show small sequential decay effect. If the se-
quential decay in both the dynamical and the statistical
model is determined by the excitation energy, charge (Z)
and mass (A) of the fragments, and not by the process
that leads to these fragments, the de-excitation of the
fragments must lead to a same amount of change in the
isoscaling parameter (either a large change or no change
at all). It is therefore unrealistic to assume that the se-
quential decay effect is different in the dynamical and the
statistical model calculations. One comparison by Hudan
et al. [90], show good agreement between the experimen-
tally determined primary fragment excitation energy and
those calculated using the statistical multifragmentation
model (see table II of Ref. [90]). Furthermore, if dy-
namical and statistical models are merely two different
ways of interpreting the same multifragmentation process
(i.e., one simulating the entire process from the forma-
tion to the breakup stage, and the other simulating only
the later breakup stage), the isoscaling parameter from
both interpretation must lead to consistent results. It
is well known, and as discussed in section II, that both
interpretations predict isoscaling in multifragmentation.
As discussed in section V A, the apparent disagreement
between the sequential decay effect in statistical and dy-
namical models, could be due to the large discrepancy
that exists in the determination of the primary fragment
excitation energy from current dynamical models.
It has been argued [91] that the effect of sequential
decay on the isoscaling parameter α, in statistical multi-
fragmentation model depends not only on the excitation
energy but also on the value of the symmetry energy. The
fragments in their primary stage are usually hot and the
properties of hot nuclei (i.e., their binding energy and
mass) differ from those of cold nuclei. If hot fragments
in the freeze-out configuration have smaller symmetry
energy, their mass at the beginning of the sequential de-
excitation will be different and this effect should be taken
into account. At smaller values of the symmetry energy
the sequential decay effect can be large. In order to study
the effect of symmetry energy evolution on isoscaling pa-
rameter during sequential decay, we have adopted in this
work a phenomenological approach of Buyukcizmeci et
al. [64]. Fig. 13 and 14 shows the primary and the sec-
ondary isoscaling parameter as a function of symmetry
energy calculated from the statistical multifragmentation
model (SMM) for the Ar + Ni and Ca + Ni, and Ar + Fe
and Ca + Ni pair of reactions. The various panels from
top to bottom correspond to different system excitation
energies. Fig. 13 shows the result of the calculations
where the symmetry energy is kept fixed, and Fig. 14
shows the result for the calculations where the symme-
try energy is varied during the de-excitation process. The
dashed lines in the figure correspond to the primary frag-
ments (Eq. 2) and the solid lines to the secondary frag-
ments. It is observed that there is no significant change
in the primary and the secondary alpha.
2) The result of the statistical model analysis is in good
agreement with other independent studies : A comparison
between the density dependence of the symmetry energy
obtained from the statistical model analysis (for which
the sequential decay effect is known to be small) and
other independent studies shows excellent agreement.
3) The isoscaling parameter probes the property of in-
finite nuclear matter : The symmetry energy obtained
from dynamical model analysis (shown by the solid curve
in Fig. 11) relates to the volume part of the symmetry en-
ergy as in infinite nuclear matter, whereas, the symmetry
energy obtained from the statistical model analysis (solid
circles and inverted triangles in Fig. 11) relates to the
FIG. 13: SMM calculated isoscaling parameter α as a func-
tion of symmetry energy for various excitation energies. The
open circles joined by dotted lines correspond to the primary
fragments and the open stars joined by solid lines to the sec-
ondary fragments. The left column shows the calculation for
40Ar + 58Ni and 40Ca + 58Ni pair, and the right column for
the 40Ar + 58Fe and 40Ca + 58Ni pair.
fragment that is finite and has surface contribution. The
similarity between the two can probably be understood
in terms of the weakening of the surface symmetry free
energy when the fragments are being formed. During
the density fluctuation in uniform low density matter,
the fragments are not completely isolated and continue
to interact with each other, resulting in a decrease in
the surface contribution as predicted by Ono et al. [43].
It must be mentioned that by fitting the nuclear masses
with mass formula, a volume contribution to the symme-
try energy of about 31 - 33 MeV and surface contribution
of about 11 - 13 MeV was obtained by Danielewicz [92]
for nuclei at normal density. Using the constraint ob-
tained for the volume part of the symmetry energy from
the present study, and following the expression for the
symmetry energy of finite nuclei by Danielewicz, we write
the general expression for the density dependence of the
symmetry energy as,
SA(ρ) =
α(ρ/ρ◦)
1 + [α(ρ/ρ◦)γ/βA1/3]
where, α ≡ Cosym = 31 - 33 MeV, γ = 0.55 - 0.69 and
α/β = 2.6 - 3.0. The quantities α and β are the volume
and the surface symmetry energy at normal nuclear den-
sity. The above equation reduces to Eq. 3 for infinite
FIG. 14: Same as in fig. 13, but with the modified secondary
de-excitation with evolving symmetry energy.
nuclear matter in the limit of A → ∞, and to the sym-
metry energy of finite nuclei for ρ = ρ◦. The ratio of the
volume symmetry energy to the surface symmetry energy
(α/β) is closely related to the neutron skin thickness. De-
pending upon how the nuclear surface and the Coulomb
contribution is treated, two different correlations between
the volume and the surface symmetry energy have been
predicted [17] from fits to nuclear masses. Experimental
masses and neutron skin thickness measurements for nu-
clei with N/Z > 1 should provide tighter constraint on
the surface-volume correlation.
4) The density dependence of the symmetry energy ob-
tained using the statistical model approach is consistent
with other experimentally determined observables : In the
past, attempts have been made to study the density de-
pendence of the symmetry energy by looking at specific
observables and comparing them with the predictions of
the dynamical models. Such an approach attempts to ex-
plain the observable of interest without trying to simul-
taneously explain other properties, such as, the temper-
ature, density and excitation of the fragmenting system.
This has lead to a variety of different dependences with-
out any clue to what density is being probed. It might not
be straightforward to distinguish different realistic EOS
interactions using dynamical models, due to the large
uncertainties that currently exist in the sequential decay
effects for these models. But the idea of extracting infor-
mation on the symmetry energy from the point of view
of the basic nucleon-nucleon interaction is a very pow-
erful approach. On the other hand, the determination
of the density dependence of the symmetry energy from
statistical model analysis by simultaneously explaining
the isoscaling parameter, caloric curve and the density
as a function of excitation energy is a reverse approach.
This approach attempts to explain the experimental ob-
servables without any prior knowledge of the governing
interaction and arrives at a dependence which can then
be compared with those predicted from the basic inter-
actions.
5) Symmetry energy determined from the multifrag-
mentation study is lower than that of normal nuclei
: Theoretical many-body calculations [1, 93, 94, 95]
and those from the empirical liquid-drop mass formula
[96, 97] predict symmetry energy near normal nuclear
density and temperature to be around 30 - 32 MeV. As-
suming a negligible evolution of the symmetry energy as
a function of temperature, as shown in Ref. [71, 72], the
present statistical model analysis yields symmetry energy
of the order of 18 - 20 MeV at half the normal nuclear
density.
6) The above constraint on the density dependence of
the symmetry energy has important implications for as-
trophysical and nuclear physics studies :
a) Neutron skin thickness: It has been shown [98] that
an empirical fit to a large number of mean-field calcu-
lations yield neutron skin thickness for 208Pb nucleus,
Rn - Rp ≃ (0.22γ + 0.06) fm, where γ is the exponent
that determines the stiffness of the density dependence
of the symmetry energy. From the above comparison, as-
suming only those density dependences of the symmetry
energy which have symmetry energy at normal nuclear
density 31 - 33 MeV, one obtains a neutron skin thick-
ness of 0.18 - 0.21 fm. An accurate determination of
the neutron skin thickness from the parity-violating elec-
tron scattering measurement [18] will provide a unique
observational constraint on the thickness of the neutron
skin of a heavy nucleus. The above constraint also leads
to symmetric matter nuclear compressibility of K ∼ 230
b) Neutron star mass and radius : The constraint also
predicts a limiting neutron star mass of Mmax = 1.72
solar mass and a radius, R = 11 - 13 km for the “ canon-
ical ” neutron star. Recent observations of pulsar-white
dwarf binaries at the Arecibo observatory suggest a pul-
sar mass for PSRJ0751+1807 of M = 2.1+0.4
−0.5 solar mass
at a 95% confidence level [99].
c) Neutron star cooling : Furthermore, it predicts a
direct URCA cooling for neutron stars above 1.4 times
the solar mass. If such is the case, then the enhanced
cooling of a M = 1.4 solar mass neutron star may provide
strong evidence in favor of exotic matter in the core of a
neutron star.
These results have important implications for nuclear
astrophysics and future experiments probing the prop-
erties of nuclei using beams of neutron-rich nuclei. The
above constraint was obtained by studying the low den-
sity behavior of nuclear matter. Measurements at den-
sities above normal nuclear matter should further con-
straint the form of the symmetry energy. Such measure-
ments should yield consistent results when extrapolated
to low densities.
VIII. SUMMARY AND CONCLUSIONS
In summary, a number of reactions were studied to in-
vestigate the density dependence of the symmetry energy
in the equation of state of isospin asymmetric nuclear
matter. The results were analyzed within the framework
of the dynamical and the statistical models of multifrag-
mentation. It is observed that a dependence of the form
Csym(ρ) = 31.6 (ρ/ρo)
0.69 MeV, agrees reasonably with
the experimental data indicating that a “ stiff ” form of
the symmetry energy provides a better description of the
nuclear matter EOS at sub-nuclear densities. A compar-
ison with several other independent studies shows that
a constraint on the density dependence of the symmetry
energy given as Csym(ρ) = C
sym(ρ/ρo)
γ , where Cosym ≈
31 - 33 MeV and γ ≈ 0.55 - 0.69, can thereby be obtained.
The present observation has important implications for
astrophysics, as well as, nuclear physics studies to be car-
ried out at future radioactive beam facilities worldwide.
IX. ACKNOWLEDGMENTS
This work was supported in part by the Robert A.
Welch Foundation through grant No. A-1266, and the
Department of Energy through grant No. DE-FG03-
93ER40773. We also thank A. Botvina for the SMM
code, and the Catania group for the BNV code.
[1] A.E.L. Dieperink, Y. Dewulf, D. Van Neck, M. Waro-
quier, and V. Rodin, Phys. Rev. C 68, 064307 (2003).
[2] R.B. Wiringa, V. Fiks and A. Fabrocini, Phys. Rev. C
38, 1010 (1988).
[3] C.H. Lee, T.T.S. Kuo, G.Q. Li and G.E. Brown, Phys.
Rev. C 57, 3488 (1998).
[4] B. Liu, V. Greco, V. Baran, M. Colonna, and M. DiToro,
Phys. Rev. C 65, 045201 (2002).
[5] N. Kaiser, S. Fritsch and W. Weise, Nucl. Phys. A 697,
255 (2002).
[6] C. Fuchs and H.H. Wolter, Eur. Phys. J. A 30, 5 (2006).
[7] B.A. Brown, Phys. Rev. Lett 85, 5296 (2000).
[8] C.J. Horowitz and J. Piekarewicz, Phys. Rev. Lett. 86,
5647 (2001).
[9] R.J. Furnstahl, Nucl. Phys. A 706, 85 (2002).
[10] K. Oyamatsu, I. Tanihata, Y. Sugahara, K. Sumiyoshi,
and H. Toki, Nucl. Phys. A 634, 3 (1998).
[11] J. Lattimer, C. Pethick, M. Prakash and P. Hansel, Phys.
Rev. Lett. 66, 2701 (1991).
[12] C. Lee, Phys. Rep. 275, 255 (1996).
[13] C.J. Pethick and D.G. Ravenhall, Annu. Rev. Nucl. Part.
Sci. 45, 429 (1995).
[14] J.M. Lattimer and M. Prakash, Phys. Rep. 333, 121
(2000).
[15] W.R. Hix, O.E.B. Messer, A. Mezzacappa, M. Lieben-
dorfer, J. Sampaio, K. Langanke, D.J. Dean, and G.
Martinez-Pinedo, Phys. Rev. Lett. 91, 201102 (2003).
[16] J.M. Lattimer and M. Prakash, Science, 304, 536 (2004).
[17] A.W. Steiner, M. Prakash, J.M. Lattimer and P.J. Ellis,
Phys. Rep. 411, 325 (2005).
[18] C.J. Horowitz, S.J. Pollock, P.A. Souder and R. Michaels,
Phys. Rev. C 63, 025501 (2001).
[19] C.J. Horowitz and J. Piekarewicz, Phys. Rev. C 66,
55803 (2002).
[20] J.R. Stone, J.C. Miller, R. Koncewicz, P.D. Stevenson
and M.R. Strayer, Phys. Rev. C 68, 034324 (2003).
[21] J. Lattimer et al., Astrophys. J. 425, 802 (1994).
[22] P. Slane, D.J. Helfand and S.S. Murray, Astrophys. J. L
571, 45 (2002).
[23] P. Danielewicz, R. Lacey and W.G. Lynch, Science, 298,
1592 (2002).
[24] GSI Conceptual Design Report,
http://www.gsi.de/GSI-Future.
[25] RIA homepage, http://www.orau.org/ria.
[26] B.A. Li, C.M. Ko and W. Bauer, Int. J. Mod. Phys. E 7,
147 (1998).
[27] V. Baran, M. Colonna, V. Greco and M. DiToro, Phys.
Rep. 410, 335 (2005).
[28] M.B. Tsang, W.A. Friedman, C.K. Gelbke, W.G. Lynch,
G. Verde, and H.S. Xu, Phys. Rev. Lett. 86, 5023 (2001).
[29] H.S. Xu et al., Phys. Rev. Lett. 85, 716 (2000).
[30] D.V. Shetty, S.J. Yennello, E. Martin, A. Keksis, G.A.
Souliotis, Phys. Rev. C 68, 021602 (2003).
[31] A. Ono, P. Danielewicz, W.A. Friedman, W.G. Lynch
and M.B. Tsang, Phys. Rev. C 68, 051601 (2003).
[32] Q. Li, Z. Li, H. Stocker, Phys. Rev. C 73, 051601 (2006).
[33] T.X. Liu et al., Phys. Rev. C 69, 014603 (2004).
[34] M. Colonna and F. Matera, Phys. Rev. C 71, 064605
(2005).
[35] C.O. Dorso, C.R. Escudero, M. Ison, and J.A. Lopez,
Phys. Rev. C 73, 044601 (2006).
[36] A.S. Botvina, O.V. Lozhkin and W. Trautmann, Phys.
Rev. C 65, 044610 (2002).
[37] Al.H. Raduta, Ad.R. Raduta, Phys. Rev. C 65, 054610
(2002).
[38] W.A. Friedman, Phys. Rev. C 42, 667 (1990).
[39] D.H.E. Gross, Phys. Rep. 279, 119 (1997).
[40] M.B. Tsang et al., Phys. Rev. C 64, 054615 (2001).
[41] W.D. Tian et al., arXiv: nucl-th/0601079 (2006).
[42] A. Ono, P. Danielewicz, W.A. Friedman, W.G. Lynch
and M.B. Tsang, arXiv: nucl-ex/0507018 (2005).
[43] A. Ono, P. Danielewicz, W.A. Friedman et al., Phys. Rev.
C 70, 041604 (2004).
[44] H. Johnston et al., Phys. Rev. C 56, 1972 (1997).
[45] S.J. Yennello et al., Phys. Lett. B 321, 15 (1994).
[46] R.P. Schmitt et al., Nucl. Instrum. Methods Phys. Res.
A 354, 487 (1995).
[47] C.A. Ogilvie, D.A. Cebra, J. Clayton, S. Howden, J.
Karn, A. Vander Molen, G.D. Westfall, W.K. Wilson,
and J.S. Winfield, Phys. Rev. C 40, 654 (1989).
[48] S.J. Yennello et al., Proc. of the 10th Winter Workshop
on Nuclear Dynamics, Snowbird, UT, edited byW. Bauer
http://www.gsi.de/GSI-Future
http://www.orau.org/ria
http://arxiv.org/abs/nucl-th/0601079
http://arxiv.org/abs/nucl-ex/0507018
(1994).
[49] E. Geraci et al., Nucl. Phys. A 732, 173 (2004).
[50] L.W. Chen, C.M. Ko and B.A. Li, Phys. Rev. Lett. 94,
032701 (2005).
[51] J.P. Bondorf et al., Phys. Rep. 257, 133 (1995).
[52] A.S. Botvina et al., Nucl. Phys. A 584, 737 (1995).
[53] A.S. Botvina, A.S. Iljinov, I.N. Mishustin, J.P. Bondorf,
R. Donangelo, and K. Sneppen, Nucl. Phys. A 475, 663
(1987).
[54] V. Baran, M. Colonna, M. Di Toro, V. Greco, M.
Zielinska-Pfabe and H.H. Wolter, Nucl. Phys. A 703, 603
(2002).
[55] J.B. Natowitz, R. Wada, K. Hagel, T. Keutgen, M. Mur-
ray, A. Makeev, L. Qin, P. Smith, and C. Hamilton, Phys.
Rev. C 65, 034618 (2002); and references therein.
[56] A.S. Botvina and I.N. Mishustin, Phys. Rev. C 63,
061601 (2001).
[57] J.P. Bondorf, R. Donangelo, I.N. Mishustin and H.
Schultz, Nucl. Phys. A 444, 460 (1985).
[58] J.P. Bondorf, A.S. Botvina and I.N. Mishustin, Phys.
Rev. C 58, 27 (1998).
[59] W.P. Tan et al., Phys. Rev. C 64, 051901 (2001).
[60] J. Iglio, D.V. Shetty, S.J. Yennello, G.A. Souliotis, M.
Jandel, A. Keksis, S. Soisson, B. Stein and S. Wuenschel,
Phys. Rev. C 74, 024605 (2006).
[61] D.V. Shetty, A.S. Botvina, S.J. Yennello, A. Keksis, E.
Martin and G.A. Souliotis, Phys. Rev. C 71, 024602
(2005).
[62] A. LeFevre et al., Phys. Rev. Lett. 94, 162701 (2005).
[63] D. Henzlova, A.S. Botvina, K.H. Schmidt, V. Henzl, P.
Napolitani, and M.V. Ricciardi, arXiv: nucl-ex/0507003
(2005).
[64] N. Buyukcizmeci, R. Ogul, and A.S. Botvina, Eur. Phys.
J. 25, (2005) 57.
[65] L.G. Sobotka, R.J. Charity, J. Toke and W.U. Schroder,
Phys. Rev. Lett. 93, 132702 (2004).
[66] R. Hasse and P. Schuck, Phys. Lett. B 179, 313 (1986).
[67] S. Shlomo and J.B. Natowitz, Phys. Lett. B 252, 187
(1990).
[68] S. Shlomo and J.B. Natowitz, Phys. Rev. C 44, 2878
(1991).
[69] J.B. Natowitz, K. Hagel, Y. Ma, M. Murray, L. Qin, S.
Shlomo, R. Wada, and J. Wang, Phys. Rev. C 66, 031601
(2002).
[70] V.E. Viola, K. Kwiatkowski, J.B. Natowitz, and S.J. Yen-
nello, Phys. Rev. Lett. 93, 132701 (2004).
[71] Isospin Physics in Heavy Ion Collisions at Intermediate
Energies, edited by B.-A. Li and W. Schroder (Nova Sci-
ence, New York, 2001).
[72] B.A. Li and L.W. Chen, Phys. Rev. C 74, 034610 (2006).
[73] D.V. Shetty, S.J. Yennello, A.S. Botvina, G.A. Souliotis,
M. Jandel, E. Bell, A. Keksis, S. Soisson, B. Stein, and
J. Iglio , Phys. Rev. C 70, 011601 (2004).
[74] D.V. Shetty, S.J. Yennello, and G.A. Souliotis, Phys.
Rev. C 75, 034602 (2007); arXiv: nucl-ex/0505011
(2005).
[75] D.V. Shetty, S.J. Yennello, G.A. Souliotis, A.L. Keksis,
S.N. Soisson, B.C. Stein, and S. Wuenschel, Phys. Rev.
C (2007) (Submitted); arXiV: nucl-ex/0606032 (2006).
[76] B.G. Todd-Rutel and J. Piekarewicz, Phys. Rev. Lett 95,
122501 (2005).
[77] J. Piekarewicz (Private Communication).
[78] J. Piekarewicz, Proc. of the International Conference on
Current Problems in Nuclear Physics and Atomic En-
ergy, Kyiv, Ukraine, (May 29 - June 3, 2006).
[79] M.B. Tsang et al., Phys. Rev. Lett. 92, 062701 (2004).
[80] B.A. Li and L.W. Chen, Phys. Rev. C 72, 064611 (2005).
[81] P. Danielewicz, arXiv : nucl-th/0411115 (2004).
[82] H. Heiselberg and M. Hjorth-Jensen, Phys. Rep. 328, 237
(2000).
[83] A. Akmal and V.R. Pandharipande, Phys. Rev. C 56,
2261 (1997).
[84] A. Akmal, V.R. Pandharipande and D.G. Ravenhall,
Phys. Rev. C 58, 1804 (1998).
[85] C. Fuchs (Private Communication); E.N.E. van Dalen, C.
Fuchs, and A. Faessler, Nucl. Phys. A 744, 227 (2004).
[86] D.T. Khoa and H.S. Than, Phys. Rev. C 71, 044601
(2005).
[87] M.A. Famiano et al., Phys. Rev. Lett. 97, 052701 (2006).
[88] B.A. Li, C. Ko, and Z. Ren, Phys. Rev. Lett. 78, 1644
(1997).
[89] B.A. Li, L.W. Chen, G.C. Yong, and W. Zuo, Phys. Lett.
B 634, 378 (2006).
[90] S. Hudan et al., Phys. Rev. C 67, 064613 (2003).
[91] M. Colonna and M.B. Tsang, Eur. Phys. J. A 30, 165
(2006).
[92] P. Danielewicz, Nucl. Phys. A727, 233 (2003).
[93] W. Zuo, I. Bombaci, and U. Lombardo, Phys. Rev. C 60,
024605 (1999).
[94] M. Brack, C. Guet, and H.B. Hakansson, Phys. Rep. 123,
276 (1985).
[95] J.M. Pearson and R.C. Nayak, Nucl. Phys. A 668, 163
(2000).
[96] W.D. Myers and W.J. Swiatecki, Nucl. Phys. 81, 1
(1966).
[97] K. Pomorski and J. Dudek, Phys. Rev. C 67, 044316
(2003).
[98] C.J. Horowitz, Eur. Phys. J. A 30, 303 (2006).
[99] D.J. Nice et al., Astrophys. J. 634, 1242 (2005).
http://arxiv.org/abs/nucl-ex/0507003
http://arxiv.org/abs/nucl-ex/0505011
http://arxiv.org/abs/nucl-ex/0606032
http://arxiv.org/abs/nucl-th/0411115
ABSTRACT
  The density dependence of the symmetry energy in the equation of state of
isospin asymmetric nuclear matter is of significant importance for studying the
structure of systems as diverse as the neutron-rich nuclei and the neutron
stars. A number of reactions using the dynamical and the statistical models of
multifragmentation, and the experimental isoscaling observable, is studied to
extract information on the density dependence of the symmetry energy. It is
observed that the dynamical and the statistical model calculations give
consistent results assuming the sequential decay effect in dynamical model to
be small. A comparison with several other independent studies is also made to
obtain important constraint on the form of the density dependence of the
symmetry energy. The comparison rules out an extremely " stiff " and " soft "
form of the density dependence of the symmetry energy with important
implications for astrophysical and nuclear physics studies.

<|endoftext|><|startoftext|>
entropic.tex
Competitive nucleation and the Ostwald rule
in a generalized Potts model with multiple metastable phases
David P. Sanders,∗ Hernán Larralde, and François Leyvraz
Instituto de Ciencias Fı́sicas, Universidad Nacional Autónoma de México, Apartado postal 48-3, 62551 Cuernavaca, Morelos, Mexico
(Dated: November 4, 2018)
We introduce a simple nearest-neighbor spin model with multiple metastable phases, the number and decay
pathways of which are explicitly controlled by the parameters of the system. With this model we can construct,
for example, a system which evolves through an arbitrarily long succession of metastable phases. We also
construct systems in which different phases may nucleate competitively from a single initial phase. For such a
system, we present a general method to extract from numerical simulations the individual nucleation rates of the
nucleating phases. The results show that the Ostwald rule, which predicts which phase will nucleate, must be
modified probabilistically when the new phases are almost equally stable. Finally, we show that the nucleation
rate of a phase depends, among other things, on the number of other phases accessible from it.
PACS numbers: 64.60.My, 64.60.Qb, 05.10.Ln, 05.50.+q
Metastability is a ubiquitous phenomenon in nature.
Broadly speaking, it occurs when a system is “trapped” in
a phase different from equilibrium. This non-equilibrium
phase, the metastable state, can last for extremely long times.
Thus, it is not surprising that metastable states play a cru-
cial role in many physical processes and are at the center of
much current research. For example, recently an intermedi-
ate metastable phase was shown to provide an easier pathway
for the growth of crystal nuclei from fluids (nucleation), with
implications for the crystallization of proteins [1, 2]. Proteins
themselves are known to get stuck in misfolded metastable
structures [3], preventing them from reaching their equilib-
rium configuration. The phenomenology observed in these
and in many other systems can be thought of as arising from a
complicated energy landscape, with several local “metastable
minima” where the trapping occurs [4]. The extreme situation
is that of glasses, in which the energy landscape can have ex-
temely many local minima hindering relaxation of the system
to a thermodynamically stable crystal [5].
The above systems present at least several metastable states.
These states and the transitions between them usually arise
from the microscopic interactions in a complicated way.
When this is the case, the study of phenomena such as compe-
tition between nucleating phases and specific nucleation path-
ways may be obscured. In view of this, in this work we
present a simple spin model with nearest-neighbour interac-
tions, where the number of metastable phases and the decay
pathways between them can be explicitly specified by varying
the model parameters. It thus serves as a test-bed for theoreti-
cal results relating to systems with multiple metastable phases
[6, 7, 8], just as the kinetic Ising model, a special case of our
model, has been central in the study of systems with a sin-
gle metastable phase [9]. As discussed below, the model also
describes the adsorption of multiple chemical species onto a
surface, an interesting physical problem in its own right.
After presenting the model, as an illustration of a possible
application, we construct a system with arbitrarily long suc-
cessions of metastable states. We then focus on competition
between phases nucleating from a single initial metastable
phase. An important question in this context is to understand
which phases nucleate under which conditions. The Ostwald
rule states that the nucleating phase is the one with the small-
est free energy barrier from the initial phase: see Ref. [10]
and references therein. Previous results have supported this
prediction [11].
We show that in general the Ostwald rule must be modified
probabilistically when the new phases are of similar stabil-
ity, using an argument based on individual nucleation rates of
each phase. We give a method by which these rates can be
measured in simulations or experiments, and show that there
is a parameter regime in which any of the new phases may
nucleate—only the nucleation probability of each phase can
be established, with the outcome in any given run being un-
predictable. We finally show that the nucleation probability of
a phase depends on the phases accessible from it.
Model details:- Our model is based on the Potts model, in
which each spin has one of q states [12] and each phase has a
majority of spins in one state; the Ising model corresponds to
q = 2. The relative stability of each phase is controlled by ex-
ternal fields, and the interplay of these fields with interactions
between different spin states allows us to obtain any desired
transition pathways between phases.
Viewing the fields as chemical potentials, we can recast the
model as a multi-component lattice gas which describes ad-
sorption on a lattice substrate (e.g., a crystal plane) of mul-
tiple chemical species with lateral interactions [13]. Much
experimental work has been done on the thermodynamics of
such systems, but little on the kinetics—see [14] and refer-
ences therein; nonetheless, our results should be testable in
that context. A more complicated system where the kinetics
has been characterized is a colloid–polymer system [15, 16],
where possible pathways were found from considerations of
the free energy landscape [17]. Our approach is complemen-
tary in that specific pathways result from microscopic interac-
tions.
We work on an L×L square lattice with N := L2 spins and
periodic boundary conditions, although the results are qualiti-
tavely unaffected by lattice type. Each lattice site i has a spin
http://arxiv.org/abs/0704.0472v3
σi taking values in {1, . . . ,q}, and the energy of a configura-
tion ß is given by the Hamiltonian
H(ß) :=− ∑
〈i, j〉
Jσi,σ j −
hαMα . (1)
Here, Mα := ∑i δσi,α is the magnetisation (=number of spins)
of the spin type α; δα ,γ = 1 if α = γ , and 0 otherwise. The
first term is a sum over nearest-neighbor pairs of spins of a
symmetric interaction energy Jα ,γ = Jγ,α , and the second de-
scribes the effect of external fields hα acting on spin type α .
We set the diagonal elements Jα ,α of the interaction ma-
trix to unity (Jα ,α := 1 for all α), so that in the absence of
non-diagonal interactions and fields, the model reduces to the
standard Potts model [12]. This has q symmetrical phases
coexisting below a critical temperature Tc = 1/(ln(1+
each phase has a majority of spins in one of the q spin states.
Including fields breaks the symmetry between phases. If
hα = hγ , then the α and γ phases coexist, with a first-order
phase transition between them, for T < Tc; this is at the ori-
gin of metastability. Weak non-diagonal interactions do not
qualititavely affect this coexistence.
To evolve the system we choose discrete-time Metropolis
dynamics [18]: at each time step, a spin and its new value
are chosen at random, the increment ∆H of the Hamiltonian
(1) for this change is calculated, and the update is accepted
with probability min{1,exp(−β ∆H)}, where β := 1/T is the
inverse temperature. This gives a Markov chain on the space
of all possible configurations.
This Markov chain has a unique equilibrium distribution,
concentrated on the phase(s) with the largest hα . The other
phases are metastable, that is, when started in such a phase α ,
the system stays there for some time, before a transition to a
more stable phase γ is nucleated by the appearance of a crit-
ical droplet of the γ phase. At sufficiently low temperatures,
the relative stability is determined by hγ > hα . The reverse
transition is exponentially unlikely.
In the standard Potts model, the equilibrium phase (almost)
always nucleates. To obtain non-trivial transition pathways,
nucleation of other phases must be promoted. This we achieve
using non-diagonal interactions between distinct spin types
α 6= γ: setting Jα ,γ > 0 favors nucleation of γ droplets inside
the α phase by lowering the surface tension between α and γ
regions, and hence decreasing the droplet free energy of for-
mation (nucleation barrier), whereas formation of γ droplets
in the α phase is suppressed if Jα ,γ < 0.
We can now construct models whose phases obey arbitrary
metastable transition graphs. These are directed graphs with
the restriction that no loops returning to a previously visited
phase are allowed. Each vertex corresponds to one phase, la-
beled by its dominant spin state, and each arrow to a desired
transition: α → γ means that phase γ can nucleate directly
from phase α . Fig. 1 shows example transition graphs.
To construct a model corresponding to a given transition
graph, we proceed as follows. The number of spin types, q, is
the number of vertices in the graph. To each spin type α we
2 3 4
FIG. 1: Metastable transition graphs: (a) kinetic Ising model; (b) suc-
cession of 3 phases; (c) single metastable phase decaying to two com-
peting phases; (d) as in (c), but such that one phase can decay further;
(e) three competing phases.
0 10000 20000 30000
time (Monte Carlo steps per site)
1 2 3 4 5
FIG. 2: (Color online) Time dependence of magnetisation per site
Mα/N of each phase α , in a single run of the model with transition
graph 1 → 2 → 3 → 4 → 5 (a succession of phases). Parameters are
L = 50, β = 1.25, hα = 0.1(α −1), K1 = 0.1, and K2 = 1.0. Labels
denote the dominant phase. Configuration snapshots depict post-
critical nuclei of each new phase embedded in the previous phase.
These grow to fill the system, producing the next phase in sequence.
assign a field hα , with hγ > hα if γ is below α in the graph.
The off-diagonal interactions are given by Jα ,γ := K1 > 0 (at-
tractive) if α → γ , and Jα ,γ :=−K2 < 0 (repulsive) otherwise.
K2 must be large enough to inhibit immediate formation of
non-adjacent phases with large fields.
As an illustration, we construct a model exhibiting a lin-
ear succession of metastable phases with transition graph
1 → 2 → ··· → q. We impose fields 0 = h1 < h2 < · · · < hq
and attractive interactions Jα ,α±1 := K1 > 0 between neigh-
bouring states, and set all other non-diagonal interactions to
−K2 < 0. With suitable, moderately robust, parameters, we
observe the desired behavior, shown for q = 5 in Fig. 2.
A three-phase succession was previously observed in a ki-
netic Blume–Capel model [19, 20], corresponding to a special
case of our model with q = 3 [21]. The physical reason for the
observed transitions is, however, much more transparent with
the Hamiltonian in the form (1), with its intuitive interpreta-
tion in terms of attractive and repulsive interactions.
Competitive nucleation:- We now turn to the decay of one
metastable phase into two competing phases (Fig. 1(c)). Sear
studied competitive heterogeneous nucleation (occurring on
impurities) in the 3-state standard Potts model [11]. In con-
trast, all behaviors discussed in this work are endogenous: ob-
0 0.1 0.2 0.3 0.4 0.5
(h3 −h2)/h2
0 50 100 150
β = 1.25
β = 1.40
L = 150
L = 200
h3 = 0.105
h3 = 0.11
FIG. 3: (Color online) Probability p2(h3) that phase 2 nucleates be-
fore phase 3 as a function of ∆h/h2, with h2 = 0.1, K1 = 0.1 and
K2 = 1, and β = 1.25, L = 50 unless otherwise noted. Up to n = 104
trials were used for each data point; statistical errors are of the or-
der of the symbol size. Inset: system-size dependence of p2 for two
values of h3 with fixed parameter values.
served transitions are not caused by external influences, but
rather arise spontaneously from within the system itself.
We fix 0 = h1 < h2 ≤ h3, J1,2 = J1,3 > 0 and J2,3 < 0. Let
∆h := h3 − h2 be the field difference between the new phases,
2 and 3. When ∆h = 0, these phases are symmetrical, each
nucleating half of the time, while for ∆h > 0, we expect the
1–3 free energy barrier to be lower than the 1–2 one, so that
according to the Ostwald rule, only phase 3 should nucleate.
To test this, we perform n simulations starting from phase 1
for each ∆h, in n2 of which phase 2 nucleates before phase 3.
The ratio n2/n is then an estimate of the probability p2(∆h)
that phase 2 nucleates first. For efficiency, we use a rejection-
free version of the Metropolis method [21, 22].
Fig. 3 plots p2 as a function of a non-dimensionalised ∆h.
For ∆h sufficiently close to 0, phase 2 can still nucleate first,
contrary to the simple Ostwald rule. The probability that it
does so rapidly decreases for larger ∆h, until a point beyond
which phase 3 effectively always nucleates.
To explain these results in a general context, we assume, as
in classical nucleation theory [23], that there are well-defined
nucleation rates λi(L) of phases i = 2,3, giving the number of
critical nuclei which form per unit time in a system of size L.
The nucleation rates per site are µi(L) := λi(L)/N.
A nucleation rate is the inverse of a mean nucleation time,
which can be measured in experiments or simulations by aver-
aging over many nucleation events in independent runs. In the
case of competitive nucleation, however, we can only measure
the mean time τ for the first phase to nucleate, after which
this phase invades the entire system. The rate of this first
nucleation is λ2 + λ3, since the total number of nucleation
events per unit time is the sum of those for each type, so that
τ = 1/(λ2 + λ3). For convenience, in simulations τ is taken
to be the time until the new phase occupies half the system.
Under the same assumptions, the probability that phase 2
0 0.1 0.2 0.3 0.4
(h3 −h2)/h2
λ2 direct
λ2 forward-flux
λ3 direct
λ3 forward-flux
-0.2 0 0.2 0.4 0.6
(h2 −h3)/h3
with phase 4
without phase 4
FIG. 4: (Color online) (a) Comparison of nucleation rates λi(h3) of
phases i = 2,3 from phase 1 in the model of Fig. 1(c) with h2 fixed,
calculated directly from simulations and using forward-flux sam-
pling. (b) Nucleation probability p2(h2) for h3 = 0.1 and h4 = 0.2
varying h2, with and without phase 4, in the model of Fig. 1(d).
Dashed lines show positions of equal nucleation probability and
equal field of the two phases. In both subfigures parameters are as in
Fig. 3, with L = 50 and β = 1.25.
nucleates first is p2 = Prob{T2 ≤ T3}, where Ti, the time for
phase i to nucleate, is an exponentially distributed random
variable with mean 1/λi. This gives p2 = λ2/(λ2+λ3) = λ2τ .
Individual nucleation rates of the two phases, which can-
not be obtained directly, can now be calculated as: λ2 =
p2/τ and λ3 = (1− p2)/τ . This generalizes to P compet-
ing phases, where the measurable quantities are the mean nu-
cleation time τ = 1/∑Pi=1 λi and the nucleation probabilities
pi = λi/∑ j λ j = µi/∑ j µ j of each phase i. The nucleation
rate of phase i is then λi = pi/τ .
Figure 4(a) shows λ2 and λ3 calculated in this way. To con-
firm the validity of such calculations, we use the “forward-flux
sampling” method [24, 25], which directly calculates the tran-
sition rate between two phases in a stochastic system. This
has previously been used to study nucleation rates in the Ising
model [26, 27]. In our case, the possibility of escape to an
additional new phase must be taken into account [21]. Fig-
ure 4(a) shows that the results indeed coincide with those of
the direct method, within statistical errors. We remark that we
are unaware of any analytical prediction giving the observed
variation of nucleation rates.
The above considerations used “in reverse” confirm that
the Ostwald rule must in general be modified when the new
phases have similar stabilities, as follows. Consider phases
which are equally stable for given parameter values. We ex-
pect nucleation barriers, and hence nucleation rates, to vary
continuously with the parameters, so that the nucleation prob-
abilities pi also vary continuously. Hence there is a region,
where the phases have similar stabilities, in which all nucle-
ation probabilities are non-zero—only the probability of each
phase nucleating is well-defined, with the outcome in any
given run being stochastic, as in Fig. 3. The definite prediction
given by the Ostwald rule is thus invalid in this region.
To see how our results depend on system size L, we note
FIG. 5: (Color online) Configuration snapshots of the multidroplet
regime in the model with transition graph Fig. 1(e), with L = 200,
β = 1.25, and h1 = 0, h2 = h3 = h4 = 0.15. Nucleation of droplets
of three equally stable phases from a single metastable phase is fol-
lowed by droplet growth and then domain coarsening. The “coating”
of the initial phase visible between domains in the final snapshot is
due to repulsive interactions between the new phases.
that in a broad region of L, the per-site nucleation rates µi,
and hence also the pi, are independent of L, as confirmed by
the plateaus in the inset of Fig. 3. This is valid when the nu-
cleation process is mediated by growth of a single droplet [9].
Note that this regime may be of relevance for macroscopically
large systems [9].
Above a certain system size, however, droplets of differ-
ent phases may nucleate before any dominates the system
(the ‘multidroplet’ regime) [9]. This results in coarsening,
as shown in Fig. 5 for three competing phases of equal sta-
bility. Even if a phase-α droplet nucleates first, droplets of
a more-stable phase may then nucleate and grow to dominate
the system before the α phase can do so, thus reducing pα , as
seen in Fig. 3.
Finally, a generic non-symmetric case can be obtained by
adding a new phase, 4, and a decay path, 3→ 4, as in Fig. 1(d).
Even when h2 = h3, phase 3 now nucleates more often, as
shown by the horizontally displaced nucleation probability
curve in Fig. 4(b): the presence of phase 4 reduces p2 by
roughly half. This is due to an entropic effect: there are more
3-dominated critical droplets than 2-dominated ones, since
spins of type 4 also appear in the former droplets, resulting
in nucleation of a binary mixture [23]; a similar effect is vis-
ible in Fig. 1. There is thus a lower free energy barrier to
form phase 3, and yet nucleation of both phases 2 and 3 is still
observed, again at odds with the simple Ostwald rule.
In summary, we have introduced a generalized Potts model
which can easily be tuned to have any given number of
metastable phases and arbitrary transitions between them. We
have shown generally that individual nucleation rates of com-
petitively nucleating phases can be calculated from exper-
imentally measurable quantities, and that the Ostwald rule
must be modified when the nucleating phases have compa-
rable stabilities. In future work [21], we will study the model
in detail and compare its properties with theoretical results [8]
on systems with multiple metastable phases.
DPS thanks A. Huerta for useful discussions and the Uni-
versidad Nacional Autónoma de México for financial sup-
port. The financial support of DGAPA-UNAM project PA-
PIIT IN112307 is also acknowledged.
∗ Electronic address: dsanders@fis.unam.mx
[1] J. F. Lutsko and G. Nicolis, Phys. Rev. Lett. 96, 046102 (2006).
[2] P. R. ten Wolde and D. Frenkel, Science 277, 1975 (1997).
[3] S. Takada and P. G. Wolynes, Phys. Rev. E 55, 4562 (1997).
[4] D. Wales, Energy Landscapes (Cambridge University Press,
Cambridge, 2003).
[5] G. Biroli and J. Kurchan, Phys. Rev. E 64, 016101 (2001).
[6] A. Bovier, M. Eckhoff, V. Gayrard, and M. Klein, Comm. Math.
Phys. 228, 219 (2002).
[7] B. Gaveau and L. S. Schulman, Phys. Rev. E 73, 036124 (2006).
[8] H. Larralde, F. Leyvraz, and D. P. Sanders, J. Stat. Mech. 2006,
P08013 (2006).
[9] P. A. Rikvold, H. Tomita, S. Miyashita, and S. W. Sides, Phys.
Rev. E 49, 5080 (1994).
[10] P. R. ten Wolde and D. Frenkel, Phys. Chem. Chem. Phys. 1,
2191 (1999).
[11] R. P. Sear, J. Phys. Cond. Matt. 17, 3997 (2005).
[12] F. Y. Wu, Rev. Mod. Phys. 54, 235 (1982).
[13] P. Rikvold, J. Collins, G. Hansen, and J. Gunton, Surf. Sci. 203
(1988).
[14] S. Manzi, W. Mas, R. Belardinelli, and V. Pereyra, Langmuir
20, 499 (2004).
[15] W. C. K. Poon, F. Renth, R. M. L. Evans, D. J. Fairhurst, M. E.
Cates, and P. N. Pusey, Phys. Rev. Lett. 83, 1239 (1999).
[16] F. Renth, W. C. K. Poon, and R. M. L. Evans, Phys. Rev. E 64,
031402 (2001).
[17] R. M. L. Evans, W. C. K. Poon, and F. Renth, Phys. Rev. E 64,
031403 (2001).
[18] M. E. J. Newman and G. T. Barkema, Monte Carlo Methods in
Statistical Physics (Oxford University Press, New York, 1999).
[19] E. N. M. Cirillo and E. Olivieri, J. Stat. Phys. 83, 473 (1996).
[20] T. Fiig, B. M. Gorman, P. A. Rikvold, and M. A. Novotny, Phys.
Rev. E 50, 1930 (1994).
[21] D. P. Sanders, H. Larralde, and F. Leyvraz, (in preparation).
[22] M. A. Novotny, Phys. Rev. Lett. 74, 1 (1995).
[23] P. G. Debenedetti, Metastable Liquids (Princeton University
Press, Princeton, 1996).
[24] R. J. Allen, P. B. Warren, and P. R. ten Wolde, Phys. Rev. Lett.
94, 018104 (2005).
[25] R. J. Allen, D. Frenkel, and P. R. ten Wolde, J. Chem. Phys.
124, 024102 (2006).
[26] R. P. Sear, J. Phys. Chem. B 110, 4985 (2006).
[27] A. J. Page and R. P. Sear, Phys. Rev. Lett. 97, 065701 (2006).
mailto:dsanders@fis.unam.mx
ABSTRACT
  We introduce a simple nearest-neighbor spin model with multiple metastable
phases, the number and decay pathways of which are explicitly controlled by the
parameters of the system. With this model we can construct, for example, a
system which evolves through an arbitrarily long succession of metastable
phases. We also construct systems in which different phases may nucleate
competitively from a single initial phase. For such a system, we present a
general method to extract from numerical simulations the individual nucleation
rates of the nucleating phases. The results show that the Ostwald rule, which
predicts which phase will nucleate, must be modified probabilistically when the
new phases are almost equally stable. Finally, we show that the nucleation rate
of a phase depends, among other things, on the number of other phases
accessible from it.

<|endoftext|><|startoftext|>
Introduction
In the mid 1960’s a direct method for the problems of the calculus of vari-
ations, which permits one to obtain absolute extremizers directly, without
using variational methods, was introduced by Leitmann (Ref. 9). Since then,
this direct method has been extended and applied to a variety of problems
(see e.g. Refs. 1, 3, 4, 10). A different but related direct approach to prob-
lems of optimal control, based on the variational symmetries of the problem
(cf. Refs. 7, 8), was recently introduced by Silva and Torres (Ref. 11). The
emphasis in Ref. 11 has been on showing the differences and similarities be-
tween the proposed method and that suggested by Leitmann. In order to
illustrate the relation between these two methods, only examples capable of
1Associate Professor in the Department of Mathematics, University of Aveiro, Aveiro,
Portugal.
2Professor in the Graduate School, College of Engineering, University of California,
Berkeley, California.
http://arxiv.org/abs/0704.0473v1
treatment by both methods were presented in Ref. 11. In this note, we dis-
cuss some differences between the method of Carlson and Leitmann (C/L)
and Silva and Torres (S/T). In particular, we show how one succeeds when
the other does not.
2 The Invariant Transformation Method of S/T3
Let us consider the problem of optimal control in Lagrange form: minimize
an integral
I [x(·), u(·)] =
L (t, x(t), u(t)) dt (1)
subject to a control system
ẋ(t) = ϕ (t, x(t), u(t)) a.e. on [a, b] , (2)
together with appropriate boundary conditions and constraint on the values
of the control variables:
x(a) = xa , x(b) = xb , u(t) ∈ Ω . (3)
The Lagrangian L(·, ·, ·) is a real function assumed to be continuously dif-
ferentiable in [a, b] × Rn × Rm; t ∈ R is the independent variable; x(·) :
[a, b] → Rn the vector of state variables; u(·) : [a, b] → Ω ⊆ Rm the
vector of controls, assumed to be a piecewise continuous function; and
ϕ(·, ·, ·) : [a, b] × Rn × Rm → Rn the derivative function, assumed to be
a continuously differentiable vector function. When Ω is an open set (it
may be all Euclidean space Ω = Rm), problem (1)-(3) can be studied using
the classical techniques of the Calculus of Variations. Optimal Control The-
ory includes the classical Calculus of Variations and generalizes the theory
by dealing with the cases where Ω is not an open set.
The application of the invariant transformation method (Ref. 11) de-
pends on the existence of a sufficiently rich family of invariance transfor-
mations (variational symmetries). The reader interested on the study of
variational symmetries is referred to Refs. 8, 12, 13 and references therein.
Definition 2.1 Let hs be a s-parameter family of C1 mappings satisfying:
hs(·, ·, ·) : [a, b] ×Rn × Ω −→ R× Rn × Rm ,
hs(t, x, u) = (ts(t, x, u), xs(t, x, u), us(t, x, u)) ,
h0(t, x, u) = (t, x, u) , ∀(t, x, u) ∈ [a, b] ×Rn × Ω .
3Throughout this Note, the notation conforms to that used in the references.
If there exists a function Φs(t, x, u) ∈ C1 ([a, b],Rn,Ω;R) such that
L ◦ hs(t, x(t), u(t))
ts (t, x(t), u(t)) = L (t, x(t), u(t)) +
Φs (t, x(t), u(t))
xs (t, x(t), u(t)) = ϕ ◦ hs (t, x(t), u(t))
ts (t, x(t), u(t)) (5)
for all admissible pairs (x(·), u(·)), then (1)–(2) is said to be invariant un-
der the transformations hs(t, x, u) up to Φs(t, x, u); and the transformations
hs(t, x, u) are said to be a variational symmetry of (1)–(2).
The method proposed in Ref. 11 is based on a very simple idea. Given
an optimal control problem, one begins by determining its invariance trans-
formations according to Definition 2.1. With respect to this, the tools devel-
oped in Refs. 7, 8 are useful. Applying the parameter-invariance transfor-
mations, we embed our problem into a parameter-family of optimal control
problems. Given the invariance properties, if we are able to solve one of the
problems of this family, we also get the solution to our original problem (or
to any other problem of the same family) from the invariant transformations.
In section 4 we give an example which shows that the Invariant Transforma-
tion Method (Ref. 11) is more general than the earlier C/L transformation
method in the case of optimal control problems.
3 The Direct Solution Method of C/L
Since this method is fully discussed in readily available references, e.g.
Refs. 1, 3, 4, 9, 10, many in this journal, we shall only recall that the C/L
transformation based method is applicable to problems in the Calculus of
Variations format: minimize an integral
I [x(·)] =
F (t, x(t), ẋ(t)) dt (6)
with given end conditions
x(a) = xa , x(b) = xb . (7)
If one wishes to solve an optimal control problem (1)-(3), the “elimination”
of u(t) in favor of a function of t, x(t), ẋ(t) must be possible. As illustrated
in section 4, this may fail even if the Implicit Function Theorem is satisfied.
Both the S/T and the C/L methods are predicated on posing a problem
“equivalent” to the original problems (1)-(3) and (6)-(7), respectively. Thus,
these methods are useful only if the solution of the “equivalent” problem is
directly obtainable, i.e., by inspection. There is, at present, no result assur-
ing that this can be done in general for the S/T method. However, for the
C/L method, at least in the scalar x case, it has been shown in Ref. 5 and
generalized to open-loop differential games in Ref. 2, that the “equivalent”
problem always has a minimizing solution obtained by inspection. The con-
ditions sufficient for this result are convexity of integrand F (t, x(t), ẋ(t))
with respect to ẋ(t), and existence of a so-called “field of extremals”. In-
deed, no matter what the integrand of the original problem is, provided the
conditions above are met, the absolute minimizer of the equivalent problem
is always a constant.
4 Example 1
The advantage of the invariant transformation method when compared with
the earlier transformation method is that one can apply it directly to control
systems whereas the method of C/L requires that the control u(t) can be
expressed as a sufficiently smooth function of t, x(t), ẋ(t), e.g. such that
the integrand be continuous in x(t) and ẋ(t). Here we use the invariant
transformation method of S/T to solve a simple optimal control problem
that is not covered by the classical theory of the Calculus of Variations and
which can not be solved by the previous transformation method.
Consider the global minimum problem
I[u1(·), u2(·)] =
u1(t)
2 + u2(t)
dt −→ min
ẋ1(t) = exp(u1(t)) + u1(t) + u2(t) ,
ẋ2(t) = u2(t) ,
x1(0) = 0 , x1(1) = 2 , x2(0) = 0 , x2(1) = 1 ,
u1(t) , u2(t) ∈ Ω = [−1, 1] .
We apply the procedure introduced in Ref. 11 and briefly described in sec-
tion 2. First we notice that problem (8) is variationally invariant according
to Definition 2.1 under the one-parameter transformations4
= x1 + st , x
= x2 + st , u
= u2 + s (t
s = t and us
= u1) . (9)
To prove this, we need to show that both the functional integral I[·] and the
control system stay invariant under the s-parameter transformations (9).
This is easily seen by direct calculations. We begin by showing (4):
Is[us
(·), us
(·)] =
+ (us
u1(t)
2 + (u2(t) + s)
u1(t)
2 + u2(t)
s2t+ 2sx2(t)
= I[u1(·), u2(·)] + s
2 + 2s .
We remark that Φs (t, x2) = s
2t+2sx2 and that I
s[·] and I[·] have the same
minimizers: adding a constant s2+2s to the functional I[·] does not change
the minimizer of I[·]. It remains to prove (5):
(t)) =
(x1(t) + st) = ẋ1(t) + s = exp(u1(t)) + u1(t) + u2(t) + s
= exp(us
(t)) + us
(t) + us
(t) ,
(t)) =
(x2(t) + st) = ẋ2(t) + s = u2(t) + s
(t) .
Conditions (10) and (11) prove that problem (8) is invariant under the
s-parameter transformations (9) up to the gauge term Φs = s2t + 2sx2.
Using the invariance transformations (9), we generalize problem (8) to a
s-parameter family of problems, s ∈ R, which include the original problem
4A computer algebra package that can be used to find the invariance trans-
formations (see Refs. 7, 8) is available from the Maple Application Center at
http://www.maplesoft.com/applications/app center view.aspx?AID=1983
for s = 0:
Is[us
(·), us
(·)] =
(t))2 + (us
(t))2dt −→ min
(t) = exp(us
(t)) + us
(t) + us
(t) ,
ẋ2(t) = u
(t) ,
(0) = 0 , xs
(1) = 2 + s , xs
(0) = 0 , xs
(1) = 1 + s ,
(t) ∈ [−1, 1] , us
(t) ∈ [−1 + s, 1 + s] .
It is clear that Is ≥ 0 and that Is = 0 if us
(t) = us
(t) ≡ 0. The con-
trol equation, the boundary conditions and the constraints on the values of
the controls, imply that us
(t) = us
(t) ≡ 0 is admissible only if s = −1:
xs=−1
(t) = t, xs=−1
(t) ≡ 0. Hence, for s = −1 the global minimum to Is[·]
is 0 and the minimizing trajectory is given by
(t) ≡ 0 , ũs
(t) ≡ 0 , x̃s
(t) = t , x̃s
(t) ≡ 0 .
Since for any s one has by (10) that I[u1(·), u2(·)] = I
(·), us
(·)]−s2−2s,
we conclude that the global minimum for problem I[u1(·), u2(·)] is 1. Thus,
using the inverse functions of the variational symmetries (9),
u1(t) = u
(t) , u2(t) = u
(t)−s , x1(t) = x
(t)−st , x2(t) = x
(t)−st ,
the absolute minimizer is
ũ1(t) = 0 , ũ2(t) = 1 , x̃1(t) = 2t , x̃2(t) = t .
This problem cannot be solved by the C/L method.
While the existence of a function h(·) such that u1(t) = h (ẋ1(t)− ẋ2(t))
is assured by the satisfaction of the Implicit Function Theorem, this is not
useful in “eliminating” the control in favor of t, x(t), ẋ(t) which is a re-
quirement of the C/L method. This is so because h(·) is a solution of a
transcendental equation. In addition, since the controls are bounded, even
if “elimination” of the control were possible, this would lead to differential
constraints of the form briefly discussed in Ref. 6.
5 Example 2
Consider the problem of attaining the absolute minimum of integral
I[x(·)] =
ẋ2(t) + tẋ(t)
dt (12)
with prescribed end conditions
x(a) = xa , x(b) = xb . (13)
This is a problem in the Calculus of Variations format for which the C/L or
S/T transformation-based method applies. These methods can be applied ab
initio to obtain the solution. However, here we shall employ the constructive
sufficiency condition embodied in Theorem 7 of Ref. 5 towards that end.
The Euler-Lagrange equation is
ẍ(t) = −
so that
x(t) = −
t2 + c1t+ c2 (15)
is the extremal with the constants of integration given by end conditions
(13), say ci = c
, i = 1, 2, and the extremal satisfying (15) is
x∗(t) = −
t2 + c∗
t+ c∗
which, being the solution of necessary condition (14), is a candidate for the
absolute minimizer of (12)-(13).
Now we can readily show that the conditions of Theorem 7 of Ref. 5 are
met. Consider the one-parameter family of extremals
ξ(t, β) = −
t2 + c∗
t+ c∗
+ β .
First of all, there exist a β∗, namely β∗ = 0, such that
ξ(t, β∗) = x∗(t) .
Secondly,
∂ξ(t, β)
6= 0 ,
and finally, the integrand of (12) is convex in ẋ(t).
Thus, the extremal (16) is indeed the absolute minimizer of (12)-(13).
Of course, this is precisely the solution obtained by employing the C/L
method. Indeed, the more general method inherent in Theorem 7 of Ref. 5
was derived using the C/L transformation-based method.
6 Conclusion
We have contrasted two transformation-based methods for obtaining abso-
lutely extremizing solutions for two classes of problems. One, dubbed the
Carlson/Leitmann method, is applicable to problems in the format of the
Calculus of Variations. The other, due to Silva and Torres, is applicable to
problems of Optimal Control.
We have shown that it is not always possible to convert an Optimal
Control problem into a Calculus of Variations one. Hence, the C/L method
may fail to apply when the S/T succeeds. On the other hand, a classical
constructive sufficiency condition, readily derived by the C/L method, ren-
ders absolute extremizers for specific problems of the Calculus of Variations
more directly and succinctly than the C/L and S/T methods.
References
1. CARLSON, D. A., An Observation on Two Methods of Obtaining So-
lutions to Variational Problems, J. Optim. Theory Appl., Vol. 114,
pp. 345–361, 2002.
2. CARLSON, D. A., Fields of Extremals and Sufficient Conditions for a
Class of Variational Games, Proceed. of 12th International Symposium
on Dynamic Games and Applications, Sophia Antipolis, France, 2006,
to appear.
3. CARLSON, D. A., and LEITMANN, G., Coordinate Transformation
Method for the Extremization of Multiple Integrals, J. Optim. Theory
Appl., Vol. 127, pp. 523–533, 2005.
4. CARLSON, D. A., and LEITMANN, G., A Direct Method for Open-
Loop Dynamic Games for Affine Control Systems. Dynamic games: the-
ory and applications, 37–55, GERAD 25th Anniv. Ser., 10, Springer,
New York, 2005.
5. CARLSON, D. A., and LEITMANN, G., Fields of Extremals and Suffi-
cient Conditions for the Simplest Problem of the Calculus of Variations,
J. Global Optim., 2007, to appear.
6. CARTIGNY, P., and DEISSENBERG, C., An Extension of Leitmann’s
Direct Method to Inequality Constraints, Int. Game Theory Rev., Vol. 6,
pp. 15–20, 2004.
7. GOUVEIA, P. D. F., and TORRES, D. F. M., Automatic Computation
of Conservation Laws in the Calculus of Variations and Optimal Con-
trol, Computational Methods in Applied Mathematics, Vol. 5, pp. 387–
409, 2005.
8. GOUVEIA, P. D. F., TORRES, D. F. M., and ROCHA, E. A. M.,
Symbolic Computation of Variational Symmetries in Optimal Control,
Control and Cybernetics, Vol. 35, pp. 831–849, 2006.
9. LEITMANN, G., A Note on Absolute Extrema of Certain Integrals,
International Journal of Nonlinear Mechanics, Vol. 2, pp. 55–59, 1967.
10. LEITMANN, G., Some Extensions to a Direct Optimization Method, J.
Optim. Theory Appl., Vol. 111, pp. 1–6, 2001.
11. SILVA, C. J., and TORRES, D. F. M., Absolute Extrema of Invariant
Optimal Control Problems, Commun. Appl. Anal., Vol. 10, pp. 503–516,
2006.
12. TORRES, D. F. M., Quasi-Invariant Optimal Control Problems, Port.
Math. (N.S.), Vol. 61, pp. 97–114, 2004.
13. TORRES, D. F. M., A Noether Theorem on Unimprovable Conserva-
tion Laws for Vector-Valued Optimization Problems in Control Theory,
Georgian Mathematical Journal, Vol. 13, pp. 173–182, 2006.
	Introduction
	The Invariant Transformation Method of S/T3
	The Direct Solution Method of C/L
	Example 1
	Example 2
	Conclusion
ABSTRACT
  In this note we contrast two transformation-based methods to deduce absolute
extrema and the corresponding extremizers. Unlike variation-based methods, the
transformation-based ones of Carlson and Leitmann and the recent one of Silva
and Torres are direct in that they permit obtaining solutions by inspection.

<|endoftext|><|startoftext|>
Geometric phase of an atom inside an adiabatic radio frequency potential
P. Zhang1 and L. You1, 2
School of Physics, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
Center for Advanced Study, Tsinghua University, Beijing 100084, People’s Republic of China
(Dated: November 4, 2018)
We investigate the geometric phase of an atom inside an adiabatic radio frequency (rf) potential
created from a static magnetic field (B-field) and a time dependent rf field. The spatial motion
of the atomic center of mass is shown to give rise to a geometric phase, or Berry’s phase, to the
adiabatically evolving atomic hyperfine spin along the local B-field. This phase is found to depend
on both the static B-field along the semi-classical trajectory of the atomic center of mass and an
“effective magnetic field” of the total B-field, including the oscillating rf field. Specific calculations
are provided for several recent atom interferometry experiments and proposals utilizing adiabatic rf
potentials.
PACS numbers: 03.65.Vf, 39.20.+q, 03.75.-b, 39.25.+k
I. INTRODUCTION
Magnetic trapping is an important enabling technol-
ogy for the active research field of neutral atomic quan-
tum gases. A variety of trap potentials can be developed
using magnetic (B-) fields with different spatial distri-
butions and time variations. For instance, the widely
used quadrupole trap and the Ioffe-Pritchard trap [1] are
usually created with static B-fields, while the time aver-
aged orbiting potential (TOP) [2] and time orbiting ring
trap (TORT) [3, 4] are created using oscillating B-fields
with frequencies larger than the effective trap frequen-
cies. Atom chips [5] have brought further developments
to magnetic trap technology, as they can provide larger
B-fields and gradients at reduced power-consumptions
or electric currents using micro-fabricated coils. Today,
magnetic trapping is a versatile tool used in many labo-
ratories around the world for controlling atomic spatial
motion in regions of different scales and geometric shapes,
e.g., 3D or 2D traps, double well traps, and storage ring
traps [1, 2, 3, 4, 5, 6, 7].
Recently, magnetic traps based on adiabatic microwave
[8, 9] and adiabatic radio frequency (rf) potentials
(ARFP) [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26] have attracted considerable attention.
An ARFP is typically created with the combination of
a static B-field and an rf field. The idea for an ARFP
has been around for some time [10, 11, 12], and experi-
mental demonstrations recently have been carried out for
confining both thermal [13] and Bose condensed atoms
[14, 15, 16, 17]. Further development of improved ARFP
with atom chip technology likely will assist in practi-
cal applications of atom interferometry. For instance,
a double well potential was constructed recently using
low order multi-poles capable of atomic beam splitting
while maintaining tight spatial confinement [21]. Sev-
eral interesting recent proposals outline the construc-
tion of small storage rings with radii of the order 1µm
[18, 19, 20, 21, 22], which could become useful if imple-
mented for atom Sagnac interferometry [27] setups on
atom chips.
When a neutral atom is confined in a magnetic poten-
tial, its hyperfine spin is assumed to follow adiabatically
the spatial variation of the B-field direction during its
spatial translational motion. As a result of this adia-
batic approximation, the center of mass motion for the
atom experiences an induced gauge field [28], giving rise
to a geometric phase (or Berry’s phase) to the atomic
internal spin state [29, 30]. The effect of this geometric
phase is widely known, and is first addressed carefully
in a meaningful way for atomic quantum gases by an
explicit calculation of the resulting geometric phase in a
static or a time averaged magnetic trap in Ref. [31]. Sev-
eral important consequences are predicted to occur for a
magnetically trapped atomic condensate in a quadrupole
trap, a Ioffe-Pritchard trap [31, 32], or a TORT based
storage ring [33]. To our knowledge, this geometric phase
effect has not been investigated in any detail for an atom
inside an ARFP.
In a recent paper, we show that this geometric phase
causes an effective Aharonov-Bohm-type [34] phase shift
in a magnetic storage ring based atom interferometer [33].
In addition, our studies imply that the spatial fluctuation
of the geometric phase can lead to a reduction of the
visibility of the interference contrast. In view of this, we
decided to carry out this study as reported here for the
atomic geometric phase in an ARFP in order to shed light
on the proposed high precision atom Sagnac interference
experiment [27].
Analytical derivations for this study at some places
become rather tedious and complicated. We therefore
first will summarize our major results here for readers
who may not be interested in the intricate details. We
find that the geometric phase in an ARFP generally takes
a more complicated form in comparison to the case of a
static trap or a time averaged trap. In an ARFP, this
phase factor is found to be determined by the trajectory
of the time independent component of the trap field as
well as an “effective B-field” that depends on the total
B-field. In contrast to the earlier result found for a static
trap or a time averaged trap [31, 33], the final result turns
out to be not expressible as a functional of the trajectory
http://arxiv.org/abs/0704.0476v1
for the direction of the total B-field in the parameter
space.
This paper is organized as follows. In sec. II, we gener-
alize the semi-classical approach as outlined in Ref. [21]
for the operating principle of an ARFP to a form more
convenient for discussing the geometric phase. Section
III parallels that of sec. II by reformulating a full quan-
tum theory for discussing the geometric phase inside an
ARFP [22]. The explicit expression for the geometric
phase inside an ARFP is given in a readily adaptable
form for specifical calculations. In sec. IV, we discuss the
effect of the geometric phase in several types of ARFP
recently proposed for atomic splitters and storage rings
[18, 19, 20, 21, 22] and beam splitters [14, 21]. Finally,
concluding remarks are given in sec. V.
II. A SEMI-CLASSICAL APPROACH
In this section, we provide a semi-classical formula-
tion for calculating the atomic geometric phase inside an
ARFP. The semi-classical working principle for an ARFP
is described in Ref. [21], although only for the special
case when the atomic center of mass is assumed at a fixed
location. In order to calculate the geometric phase, our
formulation allows for the explicit consideration of atomic
center of mass motion classically. In our approach, the
geometric phase is obtained naturally, and the validity
conditions for both the adiabatic and the rotating wave
approximations are clearly shown for an ARFP.
Inside an ARFP [21], the total B-field ~B(~r, t) is the
sum of a static field component ~Bs(~r) and an oscillatory
rf field ~Bo(~r, t), which conveniently is expressed as
~Bo(~r, t) = ~B
rf (~r, t) cos(ωt) +
rf (~r, t) cos(ωt+ η). (1)
where ~r is the spatial position vector of the atom, ω is
frequency of the rf field, and η is a relative phase factor.
In this section, we will assume that the atomic spatial
motion is pre-determined, i.e., ~r(t) is given (as a slowly
varying function of time t). For weak B-fields, the system
Hamiltonian is simply the linear Zemman interaction
H(t) = gFµB ~F · ~B[~r(t), t], (2)
where gF is the corresponding Lande g-factor and µB
denotes the Bohr magneton. ~ = 1 is assumed.
For a static or a time averaged magnetic trap, the
Hamiltonian (2) varies slowly over time scales of the Lar-
mor precession of the atomic spin in the total B-field.
During the effectively slow trapped motion, the atomic
hyperfine spin is assumed to be fixed at the instanta-
neous eigenstate of the Hamiltonian (2). The geometric
phase then can be calculated straightforwardly from the
variation of the B-field direction in the parameter space
[31, 33].
In an ARFP, the situation is more complicated. Al-
though the variation of ~Bs[~r(t)] remains much slower
than the Larmor precession, the rf frequency ω usually
is assumed to be nearly resonant with the precession fre-
quency. Thus, the Hamiltonian (2) contains both fast
and slow time varying components, making the direct
calculation of the geometric phase a more involved task.
In the following, we will proceed step by step, clarifying
the various approximations adopted along the way.
To understand the working principle for an ARFP, we
first decompose the Hamiltonian H(t) (2) into the fol-
lowing form
H(t) = Hs[~r(t)] +H+[~r(t)]e
−iωt +H−[~r(t)]e
iωt, (3)
where Hs and H± are all slow varying functions of time
and are given by
Hs[~r(t)] = gFµB ~F · ~Bs[~r(t)],
H+[~r(t)] =
gFµB ~F ·
rf [~r(t)] + e
−iη ~B
rf [~r(t)]
H−[~r(t)] = H
+[~r(t)]. (4)
Hs is diagonal in the spin angular momentum ba-
sis defined along the local direction of the static B-
field ~Bs[~r(t)]. The eigenstate takes the familiar form
|mF [~r(t)]〉s, quantized along the direction of ~Bs[~r(t)],
with the eigenvalue mF |Bs[~r(t)]| for ~Bs[~r(t)] · ~F and
mF ∈ [−F, F ], in analogy with the usual case of the z-
quantized representation result of Fz |mF 〉z = mF |mF 〉z.
Next we introduce a unitary transformation
U(t) =
mF=−F
|mF 〉z s〈mF [~r(t)]|eimF κωt, (5)
with κ = sign(gF ) for the rotating wave approxima-
tion. The quantum state in the interaction picture
|Ψ(t)〉I = U(t)|Ψ(t)〉 defined by U(t) is governed by the
Schroedinger equation i∂t|Ψ(t)〉I = HI(t)|Ψ(t)〉I , with
the Hamiltonian in the interaction picture given by
HI(t) = UHU
† + i(∂tU)U
mκ∆[~r(t)]|m〉z z〈m| − i
m,n=−F
|m〉z s〈m[~r(t)]|
|n[~r(t)]〉s z〈n|ei(m−n)κωt
m=−F+1
h(+)m [~r(t)]|m〉z z〈m− 1|+ h(−)m [~r(t)]|m〉z z〈m− 1|e2iκωt + h.c.
hm[~r(t)]|m〉z z〈m|eiκωt + h.c.
, (6)
where the time dependent parameters are defined as
∆[~r(t)] = µB|gF ~B[~r(t)]| − ω,
h(±)m [~r(t)] = s〈m[~r(t)]|H±[~r(t)]|(m− 1)[~r(t)]〉s, and(7)
hm[~r(t)] = s〈m[~r(t)]|H±[~r(t)]|m[~r(t)]〉s.
The above result is obtained easily if we note that
the matrix element s〈m[~r(t)|H±(t)|m′[~r(t)〉s is non-zero
only when m − m′ = 0,±1. So far, we have always as-
sumed that |m[~r(t)]〉s is a single valued function of the
atomic position ~r. A careful examination shows that the
eigenstate |m[~r(t)]〉s cannot be determined uniquely be-
cause of the presence of the U(1) gauge freedom for se-
lecting a local phase factor exp{iφ[~r(t)]}, which conse-
quently affects the resulting expressions for h
m (t) and
s〈m[~r(t)]|d/dt|m′[~r(t)]〉s.
The rotating wave approximation neglects of the
oscillating terms proportional to eimωt (m 6= 0) in the
Hamiltonian HI (6). The error for this approximation
is estimated easily from a time dependent perturbation
calculation. The sufficient condition for its validity
requires that all factors such as
dt′hm(t
′) exp[iκωt′],
′)ξm,m−1(t
′) exp[iκ(2ω + ∆)t′], and
dt′〈m[~r(t′)]|d/dt′|n[~r(t′)]〉ξmn(t′) exp[i(m − n)κ(ω +
∆)t′] are negligible, where
ξmn(t) = exp
dt′ s〈m[~r(t′)]|
|m[~r(t′)]〉s
dt′ s〈n[~r(t′)]|
|n[~r(t′)]〉s
. (8)
Thus, the gauge independent factors h
m ξm,m−1,
〈ms|d/dt|ns〉ξmn , hm, and ∆ should all vary slowly with
time and with the modulus of their amplitudes much less
than ω.
The effective Hamiltonian in the interaction picture
under the rotating wave approximation then becomes
eff (t) = µBgF
~F · ~Beff [~r(t)]
|m〉z s〈m[~r(t)]|
|m[~r(t)]〉s z〈m|, (9)
where the first term resembles a coupling between the
atomic spin and an “effective B-field” ~Beff(~r), whose
components in real space are given by
Beffx (~r) = Re
2 s〈m(~r)|H±(~r)|(m− 1)(~r)〉s
(F +m)(F −m− 1)
Beffy (~r) = −Im
2 s〈m(~r)|H±(~r)|(m− 1)(~r)〉s
(F +m)(F −m− 1)
, and
Beffz (~r) =
~Bs(~r)
µB|gF |
. (10)
Clearly, the x- and y-components of the effective field
~Beff(~r) depend on the explicit form of the eigenstate
|m(~r)〉s. In fact, it easily can be seen that different
choices of the local phase factor for the |m(~r)〉s actually
lead to different values of ~Beff(~r) related to each other
through ~r-dependent rotations in the x-y plane.
In practice, the eigenstate |n(~r)〉s and the effective field
~B eff can sometimes be constructed more simply, as in
Ref. [21]. For any spatial position ~r, we first choose a
rotation R[m̂(~r), χ(~r)] along the axis m̂(~r) with an angle
χ(~r) that satisfies R[~n(~r), χ(~r)] ~Bs(~r) = | ~Bs(~r)|êz. It is
then easy to show that the eigenstate |n[~r(t)]〉s can be
chosen as
|n(~r)〉s = exp
i ~F · m̂(~r)χ(~r)
|n〉z. (11)
Unfortunately, the choice for R is not unique in a given
static field ~Bs(~r), an analogous result to the U(1) gauge
freedom for the the egienstate |n(~r)〉s. Corresponding
to the choice (11) given above for |n(~r)〉s, the unitary
transformation U defined in (5) would become
U(t) = exp(−iFzωt) · exp
−i ~F · m̂(~r)χ(~r)
, (12)
and the transverse components of the “effective B-field”
given by B effx,y(~r) = Bx,y(~r)/2 [21] with
~B(~r) = R[m̂(~r), χ(~r)] ~B
rf (~r)
+R[êz,−κη]R[m̂(~r), χ(~r)] ~B(b)rf (~r). (13)
In earlier discussions of an ARFP [21, 22], the atomic
internal state is assumed uniformly to remain adiabati-
cally in a certain eigenstate of the first term of H
eff (t).
To fully appreciate this adiabatic approximation and to
calculate the geometric phase, we expand |Ψ(t)〉I into
the instantaneous eigenstate basis |n[~r(t)]〉eff quantized
along the direction of the effective B-field ~Beff accord-
ing to |Ψ(t)〉I =
n Cn(t)|n[~r(t)]〉eff . The first term of
eff (t) is simply the effective Zemman interaction be-
tween the atomic hyperfine spin and the effective B-field.
The corresponding Schroedinger equation for the Hamil-
tonian H
eff (t) of (9) then becomes
Cn(t) = [ǫ
I (t) + νnn(t)]Cn(t) +
m 6=n
νnm(t)Cm(t),
I (t) = nµBgF | ~B
eff(t)|,
νpq(t) = −i
eff〈p[~r(t)]|l〉z
s〈l[~r(t)]|
|l[~r(t)]〉s z〈l|q[~r(t)]〉eff
−i eff〈p[~r(t)]|
|q[~r(t)]〉eff . (14)
Under the adiabatic approximation, the atomic inter-
nal state remains in a given eigenstate |n[~r(t)]〉eff with
transitions to states |m[~r(t)]〉eff (m 6= n) being negligi-
bly small. Thus, the transition probability, as estimated
from the first order perturbation theory,
dt′νnm(t
dt′′[ǫ
(t′′)+νnn(t
′′)−ǫm
(t′′)−νmm(t
should be much less than one. As before, we find the
sufficient condition for the validity of the adiabatic ap-
proximation is given by
|νmn(t)|
|ǫmI (t′)− ǫ
≪ 1, (15)
provided that νmn(t) exp[i
[νnn(t
′′) − νmm(t′′)]dt′],
which is independent of the local phase factor for
|n[~r(t)]〉eff and |n[~r(t)]〉s, remains a slowly varying func-
tion of time.
A straight forward calculation from the effective
Hamiltonian (9) then gives the general expression for the
geometric phase in an ARFP
γn(t) =
νnn(t
′)dt′
| eff〈n[~r(t′)|l〉z|
s〈l[~r(t′)]|
|l[~r(t′)]〉sdt′
+γ(I)n (t), (16)
and γ
n (t) = −i
dt′ eff〈n[~r(t′)|d/dt′|n[~r(t′)〉eff . During
the adiabatic motion in a given internal state, the time
evolution of the coefficient Cn(t) takes the form
Cn(t) = Cn(0)e
(t′)dt′e−iγn(t). (17)
Equation (16) is the central result of this work. The
geometric phase of an atom inside an ARFP is shown to
contain two parts. The second part, γ
n (t), is clearly
due to the interaction term µBgF ~F · ~Beff in H(I)eff (9),
with its value determined by the trajectory of the direc-
tion for the “effective B-field” ~Beff . The first part arises
from the second term of H
eff (9). It is determined by
the trajectories of both the static field ~Bs and the ef-
fective B-field ~Beff . The expression for γn in an ARFP
is complicated because the internal quantum state in an
ARFP is assumed to be adiabatically kept in an eigen-
state of µBgF ~F · ~Beff , rather than an eigenstate of the
total interaction Hamiltonian H
eff (t).
In section IV, we will perform explicit calculations for
several examples of ARFP proposed for various applica-
tions: e.g., as atomic storage rings or atomic beam split-
ters. Most often we find that only the first part of Eq.
(16) contributes a non-zero value to the geometric phase.
Before proceeding to the next section for a quantal
treatment of the geometric phase, we find the time evo-
lution of the atomic spin state in the Schroedinger picture
|Ψ(t)〉 =
Cl(0) z〈m|l[~r(t)]〉eff ×
(t′)dt′e−iγl(t)e−imωt|m[~r(t)]〉s, (18)
obtained directly from |Ψ(t)〉 = U †(t)|Ψ(t)〉I after the
applications of the rotating wave and adiabatic approxi-
mations. When the atom is prepared initially in a specific
adiabatic state |n[~r(t)]〉eff of the interaction picture, we
arrive at the simple case of Cl(0) = δln.
III. A QUANTUM MECHANICAL
TREATMENT
In the previous section, we provided the result for the
geometric phase γn(t) in an ARFP based on a semi-
classical approach, where the atomic center of mass mo-
tion is described classically. A clear physical picture ex-
ists in this case for the appearance of the geometric phase
in a certain parameter space. The validity conditions for
the rotating wave and the adiabatic approximations as
obtained above are all formulated in terms of gauge inde-
pendent forms. However, if the influence of the geomet-
ric phase on the atomic spatial motion is to be included,
e.g., as in the Aharonov-Bohm-type, phase shift, inter-
ference arrangement in an atomic Sagnac interferometer
discussed earlier [33], we would need an improved de-
scription where both the atomic spin and its center of
mass motion are treated quantum mechanically.
In a full quantum treatment of the atomic motion, the
quantum state of an atom can be expressed as |Φ(t)〉 =
φl(~r, t)|l〉z, where φl(~r, t) is the atomic spatial wave
function for the internal state |l〉z of Fz. The state |Φ(t)〉
then satisfies the Schroedinger equation governed by the
Hamiltonian
+ gFµB ~F · ~B(~r, t), (19)
with ~P being the kinetic momentum and M the atomic
mass.
The rotating wave and adiabatic approximations can
be introduced now by defining the interaction picture
with the unitary transformation
U(t) =
|m〉z eff〈m(~r)|
|n〉z s〈n(~r)|einκωt
. (20)
The state in the interaction picture |Φ(t)〉I = U(t)|Φ(t)〉
now is governed by the Schroedinger equation with the
Hamiltonian Heff = UHU†. Under the rotating wave
and adiabatic approximations, we neglect transitions be-
tween states |m〉z and |n〉z (m 6= n) as well as the rapidly
oscillating terms. We then obtain
Heff ≈
|n〉z z〈n|Heff |n〉z z〈n|
ad |n〉z z〈n|, (21)
where the adiabatic Hamiltonian H
ad for the n-th adia-
batic branch is defined as
~P − ~An
I (~r), (22)
with the effective gauge potential
~An(~r) = −i
| eff〈n(~r)|l〉z|2 s〈l(~r)|∇|l(~r)〉s
−i eff〈n(~r)|∇|n(~r)〉eff . (23)
In this form, it is well known that the geometric phase
γn can be expressed as the integral of the gauge poten-
tial ~An along the spatial trajectory for the atomic center
of mass in an ARFP, i.e., one would expect generally
that γn =
~An · d~r. Similar to the result of the semi-
classical approach, the gauge potential ~An(~r) can be ex-
pressed as the sum of two parts. The first part in Eq.
(23) is the weighted sum of the atomic gauge potential
−i s〈l(~r)|∇|l(~r)〉s from the static field ~Bs, while the sec-
ond term is the atomic gauge potential from the “effective
B-field” ~Beff .
A full quantum treatment for atomic motion in an
ARFP has been attempted earlier [22]. In fact, many of
our formulations are identical to the results of Ref. [22].
For instance, it is easy to show that the unitary trans-
formations US, UR, and UF in [22] are related directly
to ours as U
S = U . The only difference concerns
the gauge potential ~An that was neglected in Ref. [22].
Thus, they did not give the expression for the gauge po-
tential, and the result for the geometric phase was not
obtained either [22]. Our study shows that the neglect of
the adiabatic gauge potential potentially can give rise to
a final result, dependent on the choice of the local phase
factors for the internal eigenstate.
IV. GEOMETRIC PHASES IN ARFP BASED
APPLICATIONS
In the above two sections, we obtain the expression for
the atomic geometric phase in an ARFP. This section is
devoted to the calculations of the geometric phases for
several proposed applications of ARFP, such as storage
rings or beam splitters for neutral atoms [18, 19, 20, 21,
Before presenting our results for the more specific
cases, we provide some general discussions of the geo-
metric phases in several ARFP based storage rings. As
was pointed out earlier, the geometric phase γn is given
by the line integral of the gauge potential ~An along the
trajectory for the atomic center of mass motion. For a
closed path in the storage ring at a fixed ρ = ρc and
z = zc, this can be further reduced to
γn = q
A(φ)n (ρ, φ, z)ρdφ, (24)
where the integer q is the winding number of the path
and A
n is the component of ~An along the azimuthal
direction êφ of the familiar cylindrical coordinate system
(ρ, φ, z). Without loss of generality, we take q = 1 in this
paper. For the storage rings proposed in Refs. [18, 19,
20, 21, 22], the gauge potentials A
n (ρ, φ, z) are actually
independent of the angle φ. Therefore, the geometric
phase is simply given by
γ(c)n = 2πρcA
n (ρc, zc), (25)
given out in explicit forms for different storage ring
schemes [18, 19, 20, 21, 22].
In reality, because of thermal motion or when the
atomic transverse motional state is considered, the center
of mass for an atom can deviate from (ρc, zc) even for a
closed trajectory. This uncertainty in the exact shape of
the closed trajectory gives rise to a fluctuating geometric
phase and is usually difficult to study. Assuming a simple
closed path at fixed ρ and z, we have found previously
that the subsequently fluctuations could decrease the vis-
ibility of the interference pattern [33]. Quantum mechan-
ically, such destructive interference can be explained as
resulting from entanglement between the freedoms for φ
and (ρ, z) because of the dependence of the gauge po-
tential Aφn on ρ and z. Therefore, it is important to
investigate this dependence near the trap center.
For simplicity, our discussions below will focus on the
closed loops where ρ and z are φ-independent constants.
In this case, the geometric phase can be expressed as
γn(ρ, z) = 2πρA
n (ρ, z). We will show numerically the
distributions for γn(ρ, z) obtained this way near the cen-
tral region of (ρc, zc). If needed, a more rigorous ap-
proach can be developed to investigate the fluctuations
of the resulting geometric phase from the gauge potential
n (ρ, z).
A. The storage ring proposals of Refs. [18, 19, 20]
This subsection is devoted to a detailed calculation of
the geometric phases for the ARFP storage ring propos-
als of Refs. [18, 19, 20]. We will derive the analyti-
cal expressions for the azimuthal component A
n of the
gauge potential that arises in both cases from cylindri-
cally symmetric static B-field and rf fields. Because of
the cylindrical symmetry, the angle βs(ρ, z) between the
local static B-field and the z-axis is required to be ana-
lytical in the region near the storage ring. Therefore, the
eigenstate |n(~r)〉s of ~F · ~Bs can be chosen as
|n(~r)〉s = exp{−i[~F · êφβs(ρ, z) + nφ]}|n〉z . (26)
Consequently, ~B eff(~r) is also cylindrically symmetric,
which leads to the eigenstate |n(~r)〉eff of ~F · ~B eff as
|n(~r)〉eff = exp{−i[~F · n̂ eff⊥ (~r)βeff(ρ, z) + nφ]}|n〉z, (27)
with the unit vector n̂ eff⊥ (~r) in the x-y plane orthogonal to
~B eff(~r) and β eff(ρ, z) denoting the angle between ~B
eff(~r)
and the z-axis. We note that the unit vector field n̂ eff⊥ (~r)
also possesses cylindrical symmetry, i.e., remains invari-
ant under rotation around the z-axis. The expressions of
(26) and (27) allow us to obtain the simple expression of
the gauge potential
A(φ)n (ρ, z) = −
cosβeff(ρ, z) cosβs(ρ, z), (28)
after straightforward calculations.
In the scheme of Ref. [18], the static B-field is a “ring-
shaped quadrupole field” that vanishes along a circle of
a radius ρ0 in the x-y plane. Near ρ = ρ0, the B-field is
given approximately by
~Bs(~r) = B
′(ρ− ρ0)êρ −B′zêz, (29)
like a quadrupole field, while the rf-field takes a compli-
cated form
~Bo(~r, t) =
cos(ωt) +
cos(ωt+ ϕ)
sin(ωt) +
sin(ωt+ ϕ)
êz, (30)
with constants a and b independent of ~r.
From the expression of (26) for the eigenstate |n(~r)〉s,
the “effective B-field” ~B eff becomes
~B eff(~r) = B′[
(ρ− ρ0)2 + z2 − r0]êz
cos(θ + ϕ) +
cos θ
sin(θ + ϕ)− a√
sin θ
êφ, (31)
FIG. 1: (Color online) A cross-sectional view for the storage
ring of Ref. [18]. The static field is zero in the ring at the
fixed radius ρ0. The addition of rf-fields creates an ARFP
centered at a ring through (ρc, zc). The distance from the
trap center to the ring with radius ρ0 in the plane z = 0 is r0.
where r0 and θ are given by
|µBgFB′|
cos θ(ρ, z) =
ρ− ρ0
(ρ− ρ0)2 + z2
sin θ(ρ, z) =
(ρ− ρ0)2 + z2
. (32)
In an ARFP, as discussed here, the trap center
at (ρc, zc) is determined by minimizing both the z-
component and the transverse component of ~B eff . With-
out loss of generality, we will assume a, b > 0. Then,
(ρc, zc) is found to satisfy
θ(ρc, zc) = −ϕ/2,
(ρc − ρ0)2 + z2c = r0, (33)
i.e., the trap center lies on the surface of the “resonance
toroid” at ρ = ρ0 with a radius r0 as shown in Fig. 1.
The relative angle of the trap center with respect to the
center of the toroid cross-section is given by −ϕ/2. On
this “resonance toroid,” the rf-field is resonant with the
static field, i.e., B effz vanishes. As a result, the “effective
B-field” lies again in the x-y plane on the “resonance
toroid,” which gives cosβeff(ρc, zc) = 0 and leads to the
result A
n (ρc, zc) = γn = 0 as shown in the trap center
for the storage ring considered before in Ref. [18].
From the expression (29) of the static field and the def-
inition of the angle θ(ρ, z), we find a simple relationship
βs(ρ, z) = π/2 + θ(ρ, z), with which the gauge potential
n (ρ, φ) in (28) can be further simplified as
A(φ)n (ρ, z) =
cosβeff(ρ, z) sin θ(ρ, z)
cosβeff(ρ, z) sin θ(ρc, zc), (34)
near the trap center. Thus, the spatial fluctuation of
the gauge potential A
n (ρ, z) in the region around the
trap center is closely related to the angle θ(ρc, zc) of the
trap center, or the parameter ϕ of the oscillating field
~Bo. When ϕ = 0, the atom is trapped in the region
with θ ≈ 0 or π, where the fluctuation of A(φ)n (ρ, z) is
suppressed significantly due to the small value of sin θ.
On the other hand, if the angle ϕ is set to π with the
trap center located in the region with θ ≈ ±π/2, the
fluctuation of the gauge potential becomes amplified.
In Fig. 2, we illustrate numerical results for the dis-
tribution of the geometric phase γ1(ρ, z) = 2πρA
1 (ρ, z)
in the region near the trap center at ϕ = 0, π/2, π. We
see clearly decreased fluctuations of γ1 when the absolute
value of sin θ(ρc, zc) = − sin(ϕ/2) is decreased.
Next we turn to the storage ring of Ref. [19] con-
structed from a quadrupole static B-field ~Bs(~r) =
B′(x, y,−2z) and an ~r-independent rf field ~Bo =
Brf cos(ωt)êz along the z direction. The resulting ARFP
provides a 2D ring shaped trap in the x-y plane. In ad-
dition, a 1D optical potential along the z direction is em-
ployed to confine atoms in the transverse plane at z = 0
[19]. The “effective B-field” takes the form
~B eff(~r) = B′(ρ− ρ0)êz −
Brf êρ, (35)
in the plane at z = 0, with ρ0 = ω/|µBgFB′|. Because
the strength of ~B eff is near minimum at the ring ρ = ρ0,
the trap center for this storage ring is located at ρc =
ρ0 and zc = 0. At the trap center, the “effective B-
field” is along the direction of êρ. Thus, according to Eq.
(28), the geometric phase γ
n at the trap center again
vanishes.
In Fig. 3, we show the distribution of the geometric
phase γ1 in the region near the trap center for Brf =
0.05|B′|ρ0 and Brf = 0.15|B′|ρ0. We see that the fluctu-
ation is relatively small when the strength of the rf-field
is large. This can be explained by Eq. (28), which shows
that A
n is proportional to cosβeff and can be approx-
imated as 2B effz /Brf near the trap center. When Brf is
large, the gauge potential becomes a relatively slow vary-
ing function of ρ and z. In this case, the presence of a
1D optical potential allows for the possibility of tuning
the trap center position to a nonzero value of z, with the
storage ring remaining in the x-y plane. Then cosβs is
assumed to a nonzero value, leading to increased fluctu-
ations for the geometric phase.
Finally, we discuss the geometric phase in the “time
averaged” ARFP storage ring proposed in Ref. [20]. Un-
like previously considered ARFP based storage rings, the
time dependence now exists in both the “static B-field”
and the frequency of the rf field given by
~Bs(~r, t) = B
′ρêρ − 2B′zêz +Bm sin(ωmt)êz,
−0.1 
0    
0.1  −0.1 
0    
0.1  
−0.1 
0    
0.1  −0.1 
0    
0.1  
 (ρ−ρc)/ρ0 
−0.1 
0    
0.1  −0.1 
0    
0.1  
FIG. 2: (Color online) The distribution of the geometric phase
γ1 near the trap center (ρc, zc) of the storage ring proposed
in Ref. [18] at (a) ϕ = 0, (b) ϕ = π/2, and (c) ϕ = π, clearly
displaying the sin(ϕ/2) dependence.
~Bo(t) = Brf sin[ω(t)t]êz,
ω(t) = ω0
1 + (Bm/B′ρ0)2 sin
2(ωmt) . (36)
The frequency ωm is assumed to be much smaller than ω0
but much larger than the trap frequency. The radius ρ0
is now defined as ρ0 = ω0/|µBgFB′|, and the “effective
−0.05
0.05 −0.05
0.05 
−0.05
0    
0.05 −0.05
0.05 
FIG. 3: (Color online) The geometric phase γ1 for the storage
ring of Ref. [19] with (a) Brf = 0.15B
′ρ0 and (b) Brf =
0.05B′ρ0.
B-field” takes the form
~B eff(~r, t) = ∆(~r, t)êz −
|2 ~Bs(~r, t)|
Brf êφ . (37)
The operating principle for the time averaged storage
ring of Ref. [20] is similar to the well-known TOP [2]
and TORT traps [3, 4]. The effective trap potential ex-
perienced by the atom is proportional to the time aver-
aged value of the “effective B-field”
∫ 2π/ωm
| ~B eff(~r, t)|dt.
When Brf and Bm are much smaller than B
′ρ0, the
center of the storage ring is located approximately at
ρc = ρ0, zc = 0. Using the earlier result [33], we find that
in the time averaged storage ring, the effective gauge po-
tential Ã
n (ρ, z) is reduced simply to the time averaged
instantaneous gauge potential
Ã(φ)n (ρ, z) =
∫ 2π/ωm
A(φ)n (ρ, z, t)dt, (38)
with A
n (ρ, z, t) given in (28). The geometric phase then
is given approximately by γn(ρ, z) = 2πÃ
n (ρ, z). In this
case, we find that the geometric phase always vanishes at
−0.05 
−0.05
−0.2 
0 (ρ−ρ0)/ρ0
−0.05
−0.05
 z/ρ0 
FIG. 4: (Color online) The geometric phase γ1 for the storage
ring of Ref. [20] at (a) Brf = 0.3B
′ρ0 and (b) Brf = 0.1B
Bm = 0.05B
the trap center (ρc, zc). Figure 4 illustrates the distribu-
tion of the geometric phase in the region near the trap
center for two different values of the rf-field amplitude
Brf . Similar to the storage ring of Ref. [19], the fluctua-
tion of the geometric phase is suppressed in this case for
large Brf .
B. The storage ring proposals of Refs. [21, 22]
Next we consider the ARFP based storage ring pro-
posed in Refs. [21, 22]. In this case, the static B-field
is that of a Ioffe-Pritchard trap on an atom chip. In the
Cartesian coordinate (x, y, z), it takes the form
~Bs = B
′xêx −B′yêy +B′Lêz, (39)
where B′ is the B-field gradient and the bias field along
the z-direction is denoted as B′L. The amplitudes ~B
and ~B
rf (z) of the rf field are
rf = [Brf(z)/
2]êx and
rf = [Brf(z)/
2]êy with
Brf(z) = B
rf +B
′′z2. (40)
In the schemes of Ref. [21, 22] considered earlier, the
phase η of the rf field is assumed to be κπ/2. The x-
and y-components of the “effective B-field” ~B eff(~r) then
become
B effx (~r) =
Brf(z)
(1 + cosβs(ρ, z)),
B effy (~r) = 0, (41)
according to Eq. (10). Then the strength of the “effective
B-field” ~B eff has its minimum along a circle with a non-
zero radius ρc, provided a positive detuning ∆ exists at
the origin (0, 0, 0) [21, 22]. The “effective B-field” ~B eff is
easily shown to lie in the x-z plane along the trap bottom
mapped out by the atomic center of mass motion. This
gives rise to a vanishing γ
F . With a proper choice for
the local phase of |n(~r)〉eff , the gauge potential A(φ)n takes
the form
A(φ)n (ρ, z) =
cosβeff(ρ, z) (1− cosβs(ρ, z)) . (42)
Figure 5 displays the geometric phase along a closed
path for a spin-1 atom as a function of ρc for the ARFP
storage ring proposed in Refs. [21, 22]. The parameter λ
is defined as
∆[~r = 0]
|gFµBB(0)rf |
. (43)
To assure the validity of the rotating wave approx-
imation, we find that the maximal values of ∆[~r =
0]/(|gF |µB) and B(0)rf /
2 must be restricted to the re-
gion of λ ∈ [0, 0.15].
As shown in Fig. 1, the geometric phase remains much
smaller than 2π in this situation. This fact can be appre-
ciated easily if we look at the distribution of the “effec-
tive B-field” ~B eff . According to Eq. (41), the component
B effx has a nonzero minimal value Brf/(2
2), while |Beffz |
can become arbitrarily small, although not necessarily
zero in general. Therefore, at the trap center where | ~B eff |
is a minimum, the value of cosβeff = B
z /| ~B eff | can be-
come very small, leading to small geometric phases. Yet,
despite the relatively small geometric phase found here,
our result remains important because it could represent
a systematic error if not properly included in a Sagnac
interference experiment.
In Fig. 6, we show the spatial distribution of the ge-
ometric phase γ1 around the trap center with λ = 1/3
and λ = 3. The fluctuation is found to be relatively
small when λ is small or when the rf-field amplitude Brf
is large.
Although not discussed in Refs. [21, 22], a ring shaped
trap also can be realized if we take η = −κπ/2. The
“effective B-field” ~B eff still lies in the x-y plane
~B effx (~r) = −
Brf(z)
cos(2φ)(1 − cosβs(ρ, z)),
~B effy (~r) =
Brf(z)
sin(2φ)(1 − cosβs(ρ, z)), (44)
0  0.1 0.2 0.3 0.4 0.5 0.6
−0.1 
−0.08
−0.06
−0.04
−0.02
0    
λ=1/3 
FIG. 5: (Color online) The geometric phase γ1 is plotted
against the radius ρc for the ARFP storage ring of Ref.
[21, 22] with η = κπ/2 at λ = 3, λ = 1, and λ = 1/3.
To assure the validity of the rotating wave approximation, in
the solid lines, the maximal value of ∆[~r = 0]/|gFµBB′ρ0|
or Brf/(
2B′ρ0) are restricted to be smaller than 0.15. The
extending dashed line is beyond the rotating wave approxi-
mation for λ = 1/3 and Brf/(
2B′ρ0) ∈ [0.15, 0.3].
clearly giving rise to a non-zero solid angle with respect to
a closed path along the storage ring. Therefore, the term
(eff)
F is non-zero in this case. We choose the eigenstates
|n(~r)〉s and |n(~r)〉eff as
|n(~r)〉s = exp[−i ~F · n̂s(~r)βs(ρ, z)]|n〉z,
|n(~r)〉eff = exp[−i ~F · n̂ eff⊥ (~r)βeff(ρ, z)]|n〉z . (45)
with the unit vector n̂s⊥(~r) in the x-y plane orthogonal
to ~Bs(~r). In this case, the gauge potential A
n can be
expressed as
A(φ)n (ρ, z) =
cosβeff(ρ, z)[1 + cosβs(ρ, z)]. (46)
In Figure 7, we show the fluctuation of the geometric
phase γ1 for a closed path with a new parameter
λ′ = 6
∆[~r = 0]
|gFµBB(0)rf |
, (47)
equal to 3 and 1/3. The fluctuation for γ1 is found to
be much larger than the case of η = κπ/2, which can be
explained by the transverse components B effx,y of the “ef-
fective B-field.” Because cosβs is always close to unity.
In the case of η = −κπ/2, B effx,y can take only small pos-
itive values. Therefore, at the minimum of the ARFP
trap ρ = ρ0 of | ~B eff |, both B effz and B effx,y have to be
close to zero. In this case the value for cosβeff becomes
a rapidly changing function of ρ in the region near ρc.
Our above calculations have obtained analytical ex-
pressions of the geometric phases in an ARFP based stor-
age ring for η = ±κπ/2. We have further investigated the
−0.05
0.05 
−0.05
0.05 
−0.06
−0.02
−0.05
0.05 
−0.05
0.05 
FIG. 6: (Color online) The spatial distribution of the geo-
metric phase γ1 for the storage ring of Refs. [21, 22] at (a)
λ = 1/3 and (b) λ = 3. η = κπ/2. B
rf = 0.08B
′L and
B′′ = 10−12B′/L are assumed.
fluctuations of the geometric phase for the two cases of
η = ±κπ/2. It seems one benefits from implementing
a Sagnac interferometer in the discussed ARFP storage
ring with η = κπ/2 and operating at a relatively large λ.
Before proceeding onto the concluding section, we will
discuss the geometric phase in an ARFP based beam
splitter created via a double potential [14, 21]. In such an
implementation, the static field ~Bs is created from a Ioffe-
Pritchard trap, while the oscillating rf field components
are ~B
rf = Brf [z]êx and
rf = 0. By spatially tuning
the amplitude of Brf from zero to a significant value,
in the x-y plane, an ARFP can be tuned from a single
well centered near the origin to a double well with two
minimal points at the point with nonzero radius ρ0 and
φ = 0, π. Therefore, a Y-shaped atom beam splitter
can be accomplished when the Brf [z] initially is increased
along the z-axis to a large value, and then decreased to
zero. In such an arrangement, the atom beam moving
along the z direction can be separated into two beams
that move along the z-axis at φ = 0, π for a while, and
then can be recombined again into a single beam.
−0.05
0.05 
−0.05
0.05 
−0.05
0.05 
−0.05
0    
0.05 
FIG. 7: (Color online) The spatial distribution of the geo-
metric phase γ1 for the storage ring of Refs. [21, 22] at (a)
λ′ = 3 and (b)λ′ = 1/3. η = −κπ/2. B(0)rf = 0.08B
′L and
B′′ = 10−12B′/L are assumed.
In the atom interferometer considered above, both the
static field ~Bs and the “effective B-field” ~B
eff are limited
to the x-z plane. Therefore, for motion along the closed
path of the trap bottom, the solid angle enclosed by the
trajectory of ~B eff is zero. Thus, the geometric phase in
(16) can be expressed as
γn(t) = −i
| eff〈n(~r)|l〉z|2 s〈l(~r)|∇|l(~r)〉s · ~vdt′. (48)
We can show that the product
eff〈n(~r)|l〉z|2 s〈l(~r)|∇|l(~r)〉s is a function of ρc and
is independent of z. Thus, the geometric phase can be
expressed as an integral of this function with respect to
ρc, from zero to a large value and then back to zero.
Therefore, the value of the geometric phase would be
zero in the end.
V. CONCLUSION
In this study, we develop theoretical formalisms for
the calculation of the atomic geometric phase inside an
ARFP. We show that, due to the complexity of the
ARFP, the geometric phase depends on the spatial vari-
ation of both the static field and an “effective B-field”
~B eff . We provide general expressions for the geometric
phase and the corresponding adiabatic gauge potential in
Eqs. (16) and (23), respectively.
To shed light on actual applications of the atomic ge-
ometric phase, we investigate the distribution of atomic
geometric phases for several proposed or ongoing exper-
iments with ARFP based storage rings and atom beam
splitters. We prove rigorously that the geometric phase in
the center of the storage rings proposed in Refs. [18, 19]
is always zero. In addition, we find that in the storage
ring of Ref. [18], the spatial fluctuation of the geometric
phase sensitively depends on the position of the trap cen-
ter on the “resonance toroid.” In the proposals of Refs.
[19, 20, 21, 22], the fluctuation for the geometric phase
becomes significantly suppressed when the amplitude Brf
of the rf-field is large. In the proposals of [21, 22], the
fluctuations of the geometric phase also is suppressed if
the angle η is set to be κ2π. In the beam splitter real-
ized with the double well potential ARFP [14, 21], the
geometric phase is shown to be zero.
Our work helps to clarify the working principle of trap-
ping neutral atoms in an ARFP and the validity condi-
tions for the various approximations involved. We hope
our results will shine new light on the proposed inertial
sensing experiments based on trapped atoms in ARFP.
Acknowledgments
We thank Dr. T. Uzer and Dr. B. Sun for helpful dis-
cussions. This work is supported by NASA, NSF, CNSF,
and the 863 and 973 programs of the MOST of China.
[1] D. E. Pritchard, Phys. Rev. Lett. 51, 1336 (1983).
[2] W. Petrich, M. H. Anderson, J. R. Ensher, and E. A.
Cornell, Phys. Rev. Lett. 74, 3352 (1995).
[3] A.S. Arnold and E. Riis, J. Mod. Opt. 49, 959
(2002); C. S. Garvie, E. Riis, and A. S. Arnold,
Laser Spectroscopy XVI, edited by P. Hannaford et al.
(World Scientific, Singapore, 2004), p. 178, see also
〈www.photonics.phys.strath.ac.uk〉.
[4] S. Gupta, K.W. Murch, K. L. Moore, T. P. Purdy, and D.
M. Stamper-Kurn, Phys. Rev. Lett. 95, 143201 (2005);
K.W. Murch, K. L. Moore, S. Gupta, and D.M. Stamper-
Kurn, Phys. Rev. Lett. 96, 013202 (2005).
[5] R. Folman, P. Krüger, J. Schmiedmayer, J. Denschlag,
and C. Henkel, Advances in Atomic, Molecular, and Op-
tical Physics, vol. 48, 263 (2002).
[6] J. A. Sauer, M. D. Barrett, and M. S. Chapman, Phys.
Rev. Lett. 87, 270401 (2001).
[7] A.S. Arnold, C.S. Garvie, and E. Riis, Phys. Rev. A 73,
041606(R) (2006).
[8] C. C. Agosta, I. F. Silvera, H. T. C. Stoof, and B. J.
Verhaar, Phys. Rev. Lett. 62, 2361 (1989).
[9] Z. Zhao, I. F. Silvera, and M. Reynolds, Jour. Low. Temp.
Phys. 89, 703 (1992).
[10] A. J. Moerdijk, B. J. Verhaar, and T. M. Nagtegaal,
Phys. Rev. A 53, 4343 (1996).
[11] H. Zhang, P. Zhang, X. Xu, J. Han, and Y. Wang, Chin.
Phys. Lett. 22, 83 (2001).
[12] O. Zobay and B. M. Garraway, Phys. Rev. Lett. 86, 1195
(2001); Phys. Rev. A 69, 023605 (2004).
[13] Y. Colombe, E. Knyazchyan, O. Morizot, B. Mercier, V.
Lorent, and H. Perrin, Europhys. Lett. 67, 593 (2004).
[14] S. Hofferberth, I. Lesanovsky, B. Fischer, J. Verdu, and
J. Schmiedmayer, Nature Physics 2, 710 (2006).
[15] T. Schumm, S. Hofferberth, L. M. Andersson, S. Wilder-
muth, S. Groth, I. Bar-Joseph, J. Schmiedmayer, and P.
Krüger, Nature Physics 1, 57 (2005).
[16] G.-B. Jo, Y. Shin, S. Will, T. A. Pasquini, M. Saba,
W. Ketterle, D. E. Pritchard, M. Vengalattore, and M.
Prentiss, Phys. Rev. Lett. 98, 030407 (2007).
[17] M. White, H. Gao, M. Pasienski, and B. DeMarco, Phys.
Rev. A 74, 023616 (2006).
[18] T. Fernholz, C. R. Gerritsma, P. Krüger, and R. J. C.
Spreeuw, arXiv:physics/0512017.
[19] O. Morizot, Y. Colombe, V. Lorent, and H. Perrin, arXiv:
physics/0512015.
[20] I. Lesanovsky and W. von Klitzing, arXiv:
cond-mat/0612213.
[21] I. Lesanovsky, T. Schumm, S. Hofferberth, L. M. Ander-
sson, P. Krüger, and J. Schmiedmayer, Phys. Rev. A 73,
033619 (2006).
[22] I. Lesanovsky, S. Hofferberth, J. Schmiedmayer, and P.
Schmelcher, Phys. Rev. A 74, 033619 (2006).
[23] C. L. G. Alzar, H. Perrin, H. B. M. Garraway, and V.
Lorent, arXiv: physics/0608088.
[24] X. Li, H. Zhang, M. Ke, B. Yan, and Y. Wang, arXiv:
physics/0607034.
[25] S. Hofferberth, B. Fishcher, T. Schumm, J. Schmied-
mayer, and I. Lesanovsky, arXiv: quan-ph/0611240.
[26] Ph.W. Courteille, B. Deh, J. Fortágh, A. Günther, S.
Kraft, C. Marzok, S. Slama, and C. Zimmermann, J.
Phys. B 39, 1055 (2006).
[27] M. G. Sagnac, C. R. Hebd. Seances Acad. Sci. 157, 708
(1913).
[28] C. A. Mead and D. G. Truhlar, J. Chem. Phys. 70, 2284
(1979); C. A. Mead, Phys. Rev. Lett. 59, 161 (1987); C.
P. Sun and M. L. Ge, Phys. Rev. D 41, 1349 (1990).
[29] M. V. Berry, Proc. R. Soc. Lond. A 392, 45 (1984).
[30] J. Schmiedmayer, M. S. Chapman, C. R. Ekstrom, T. D.
Hammond, D. K. Kokorowski, A. Lenef, R. A. Ruben-
stein, E. T. Smith, and D. E. Pritchard, p. 72, Atom
interferometry, edited by P. Berman, (Academic Press,
N.Y. 1997).
[31] T. Ho and V. B. Shenoy, Phys. Rev. Lett. 77, 2595
(1996).
http://arxiv.org/abs/physics/0512017
http://arxiv.org/abs/physics/0512015
http://arxiv.org/abs/cond-mat/0612213
http://arxiv.org/abs/physics/0608088
http://arxiv.org/abs/physics/0607034
http://arxiv.org/abs/quan-ph/0611240
[32] P. Zhang, H. H. Jen, C. P. Sun, and L. You, Phys. Rev.
Lett. 98, 030403 (2007).
[33] P. Zhang and L. You, Phys. Rev. A 74, 062110 (2006).
[34] Y. Aharonov and D. Bohm, Phys. Rev. 115, 485 (1959).
ABSTRACT
  We investigate the geometric phase of an atom inside an adiabatic radio
frequency (rf) potential created from a static magnetic field (B-field) and a
time dependent rf field. The spatial motion of the atomic center of mass is
shown to give rise to a geometric phase, or Berry's phase, to the adiabatically
evolving atomic hyperfine spin along the local B-field. This phase is found to
depend on both the static B-field along the semi-classical trajectory of the
atomic center of mass and an ``effective magnetic field'' of the total B-field,
including the oscillating rf field. Specific calculations are provided for
several recent atom interferometry experiments and proposals utilizing
adiabatic rf potentials.

<|endoftext|><|startoftext|>
Introduction
Numerical simulations have shown that the very first stars invariably formed in isolation
and were much more massive than the sun, due mainly to the inability of primordial gas
http://arxiv.org/abs/0704.0477v1
– 2 –
to efficiently cool at low temperatures (Abel et al. 2002; Bromm et al. 2002; Yoshida et al.
2006). Tumlinson et al. (2004) have suggested that the Pop III IMF was not dominated by
very massive stars (M > 140 M⊙), but instead by stars with M = 8–40 M⊙. Even this IMF,
though, is still remarkably distinct from that observed for the local universe, which peaks at
less than one solar mass (Miller & Scalo 1979; Kroupa 2002; Chabrier 2003).
The deaths of the first stars produced and distributed copious amounts of metals into
their surroundings, through either core-collapse (M ∼> 10 M⊙) or pair-instability (M ∼> 140
M⊙) supernovae (Heger & Woosley 2002). These metals provide additional avenues for ra-
diative cooling of the ambient gas, through fine-structure and molecular transitions, as well
as continuum emission from dust formed from the supernova ejecta, permitting the gas that
will form the next generation of stars to reach temperatures lower than what is possible for
metal-free gas. Fragmentation of collapsing gas will continue so long as the gas can keep
decreasing in temperature as the density increases (Larson 2005), or until the gas becomes
optically thick to its own emission (Low & Lynden-Bell 1976). The minimum fragment mass
is determined by the local Jeans mass,
MJ ≃ 700 M⊙(T/200K)
3/2(n/104cm−3)−1/2(µ/2)−2, (1)
where T, n, and µ are the temperature, number density, and mean molecular weight, at the
halt of fragmentation (Larson 2005). For metal-free gas, a minimum temperature of ∼ 200
K is reached at n ≃ 104 cm−3 when H2 cooling becomes inefficient, yielding a Jeans mass,
MJ ≃ 10
3 M⊙ (Abel et al. 2002; Bromm et al. 2002). At some certain chemical abundance,
it is conjectured that metals provide sufficient cooling, so that the temperature of the gas
continues to decrease as the density increases past the stalling point for metal-free gas,
allowing the collapsing gas-cloud to undergo fragmentation and form smaller and smaller
clumps. The enrichment of gas to some critical metallicity, Zcr, will trigger the formation of
the first low-mass (Pop II) stars in the universe, as the gas can cool to lower temperatures at
higher metallicity, in general. The value of Zcr can be estimated by calculating the metallicity
required to produce a cooling rate equal to the rate of adiabatic compression heating at a
given temperature and density. This has been carried out for individual alpha elements, such
as C and O, by Bromm & Loeb (2003), and C, O, Si, Fe, as well as solar abundance patterns
by Santoro & Shull (2006), yielding roughly, 10−3.5 Z⊙ ∼< Zcr ∼< 10
−3 Z⊙.
Aside from the minimum clump mass, however, not much more can be said about the
spectrum of clump masses produced during fragmentation. Omukai et al. (2005) use one-
zone models with very sophisticated chemical networks to follow the evolution of temperature
and density in the center of a collapsing gas cloud, for a range of metallicities. The predic-
tions of fragmentation from this work, though, are based solely on statistical arguments of
elongation in prestellar cores and do not capture the complex processes of interaction and ac-
– 3 –
cretion associated with the formation of multiple stars (Bate et al. 2003). Tsuribe & Omukai
(2006) simulate the high density (n ≥ 1010 cm−3) evolution of extremely low-metallicity gas
(Z < 10−4 Z⊙), but the conclusions of this work are limited by the fact that the simulations
are initialized at an extremely late phase in the evolution of the prestellar core. The nu-
merical simulations by Bromm et al. (2001), which use cosmological initial conditions, show
fragmentation in gas with Z = 10−3 Z⊙, but a mass resolution of 100 M⊙ prevents this study
from saying anything conclusive about the formation of sub-stellar mass objects.
In this paper, we present the results of three-dimensional hydrodynamic simulations of
metal-enriched star-formation. These simulations are similar in nature to those of Bromm et al.
(2001), but with vastly improved numerical methods and updated physics. We describe the
setup of our simulations in §2, with the results in §3 and a discussion of the consequences of
this work in §4.
2. Simulation Setup
We perform a series of four simulations, with constant metallicities, Z = 0 (metal-free),
10−4 Z⊙, 10
−3 Z⊙, and 10
−2 Z⊙, using the Eulerian adaptive mesh refinement hydrodynamics/N-
body code, Enzo (Bryan & Norman 1997; O’Shea et al. 2004). The metallicity is held con-
stant throughout each simulation in order to isolate the role of heavy element concentration
in altering the dynamics of collapse compared to the identical metal-free case. In reality,
metals will be injected over time into star forming gas by Pop III supernova blast waves,
and the mixing of those metals with the gas will not be completely uniform. Here we fo-
cus on an idealized approximation in order to capture the essential physics of collapse and
fragmentation.
Each simulation begins at z = 99, in a cube, 300 h−1 kpc comoving per side, in a ΛCDM
universe, with the following cosmological parameters: ΩM = 0.3, ΩΛ = 0.7, ΩB = 0.04, and
Hubble constant, h = 0.7, in units of 100 km s−1 Mpc−1. We initialize all the simulations
identically, with a power spectrum of density fluctuations given by Eisenstein & Hu (1999),
with σ8 = 0.9 and n = 1. The computation box consists of a top grid, with 128
3 cells, and
three static subgrids, refining by a factor of 2 each. This gives the central refined region,
which is 1/64 the total computational volume, an effective top grid resolution of 10243 cells.
The grid is centered on the location of a ∼ 5 × 105 M⊙ dark matter halo that is observed
to form at z ∼ 18 in a prior dark-matter-only simulation, as is done similarly in Abel et al.
(2002); O’Shea et al. (2005). Refinement occurs during the simulations whenever the gas,
or dark matter, density is greater than the mean density by a factor of 4, or 8, respectively.
We also require that the local Jeans length be resolved by at least 16 grid cells at all times
– 4 –
in order to avoid artificial fragmentation as prescribed by Truelove et al. (1997).
To include the radiative cooling processes from the heavy elements, we use the method
described in Smith, Sigurdsson, & Abel (2007), in preparation. The nonequilibrium abun-
dances and cooling rates of H, H+, H−, He, He+, He++, H2, H
2 , and e
− are calculated
internally, as in Abel et al. (2002); Anninos et al. (1997). Meanwhile, the metal cooling
rates are interpolated from large grids of values, precomputed with the photoionization soft-
ware, CLOUDY (Ferland et al. 1998). We ignore the cooling from dust and focus only on
the contribution of gas-phase metals in the optically-thin limit. Unlike other studies of the
formation of the first metal-enriched structures, we do not assume the presence of an ionizing
UV background. In our model, the singular pop III star that was associated with the dark
matter halo in which our stars form has already died in a supernova. We also assume any
other Pop III stars are too distant to affect the local star-forming region and that QSOs have
yet to form. We use the coronoal equilibrium command when constructing the cooling
data in CLOUDY to simulate a gas where all ionization is collisional. The metal cooling
data was created using the Linux cluster, Lion-xo, run by the High Performance Computing
Group at The Pennsylvania State University. As a consequence of our choice to ignore any
external radiation, we do not observe the fine-structure emission of [C ii] (157.74 µm) that
was reported by Santoro & Shull (2006) to be important. Instead, cooling from C comes in
the form of fine-structure lines of [C i] (369.7 µm, 609.2 µm). The cooling from [C i] in our
study dominates in the same range of densities and temperatures as the cooling from [C ii]
in Santoro & Shull (2006). We observe the contributions of the other coolants studied by
Santoro & Shull (2006), [O i], [Si ii], and [Fe ii], to be in agreement with their work. In
addition, we find that emission from [S i] (25.19 µm) dominates the cooling from metals at
n ∼ 107 cm−3 and T ∼ 1–3 × 103 K. The absence of UV radiation in our simulations also
allows H2 to form, differentiating this study from Bromm et al. (2001). This allows for a
more direct comparison between the metal-free and metal-enriched cases.
The simulations are run until one or more dense cores form at the center of the dark
matter halo and a maximum refinement level of 28 is reached for the first time, giving us
a dynamic range of greater than 1010. Only the simulation with Z = 10−2 Z⊙ reached 28
levels of refinement. The three other simulations were stopped after reaching 27 refinement
levels, since their central densities were already higher than the simulation with Z = 10−2
Z⊙. Table 1 summarizes the final state of each simulation, where zcol is the collapse redshift,
lmax is the highest level of refinement, nmax is the maximum gas density within the box, and
∆tcol is the time difference to collapse from the metal-free simulation.
– 5 –
3. Results
As can be seen in Table 1, the runs with higher metallicities reach the runaway collapse
phase faster. The relationship between metallicity relative to solar and ∆tcol is well fit by
a power-law with index, n ≃ 0.22. Gas-clouds with more metals are able to radiate away
their thermal energy more quickly, and thus, collapse faster. An inverse relation between
metallicity and the number of grids and grid-cells exists because the low-density, background
gas evolves at roughly the same rate in all simulations, yet has more time, in the runs with
lower metallicities, with which to collapse to higher density, requiring additional refinement.
Our simulations, shown in Figure 1, display a qualitative transition in behavior between
metallicities of 10−4 Z⊙ and 10
−3 Z⊙. In the runs with the highest metallicities (Figure 1C
and 1D), the central core is extremely asymmetric, and multiple density maxima are clearly
visible. All four runs display similar large-scale density profiles (Figure 2A). Radiative cooling
from H2 becomes extremely inefficient below T ∼ 200 K, creating the effective temperature
floor, visible in Figure 2B for the metal-free case (Abel et al. 2002; Bromm et al. 2002). At n
≃ 104 cm−3, the rotational levels of H2 are populated according to LTE, reducing the cooling
efficiency and causing the temperature to increase (Abel et al. 2002; Bromm et al. 2002). In
the isothermal collapse model of Shu (1977), the accretion rate is proportional to the cube
of the sound speed. The increase in temperature leads to an increase in the accretion rate,
causing the density, and thus, the enclosed mass (Figure 2C), to be slightly higher inside the
central ∼ 0.1 pc in the metal-free case. A similar situation occurs further within for the Z
= 10−4 Z⊙ and, later, the 10
−3 Z⊙ cases, as the metal cooling is overwhelmed by adiabatic
compression heating and the temperature begins to rise with density. The presence of metals
at the level of 10−4 Z⊙ enhances the cooling enough to lower the gas temperature to ∼ 75
K. Metallicities greater than 10−3 Z⊙ provide sufficient cooling to bring the gas down to
the temperature of the cosmic microwave background, where TCMB ≃ 2.7 K (1 + z). The
gas temperatures are in general agreement with the calculations of Omukai et al. (2005)
that include a CMB spectrum at z = 20. Fragmentation requires that the cooling time be
less than the dynamical time. Figure 2D shows that this criterion is essentially never met
in the zero metallicity case, and only marginally in the Z = 10−4 Z⊙ case. However, the
fragmentation criterion is more than satisfied in the Z = 10−3 Z⊙ and 10
−2 Z⊙ cases over a
wide mass-range.
In order to locate fragments within our simulations, we employ an algorithm, based
on Williams et al. (1994), that works by identifying isolated density countours. Before we
search for clumps, we smooth the density field by assigning each grid-cell the mass-weighted
mean density of the group of cells including itself and its neighbors within one cell-width.
This serves to eliminate small density perturbations that would be misidentified as clumps by
– 6 –
the code. In order to directly compare the fragmentation from each simulation, we limit the
search for clumps to the 1 M⊙ of gas surrounding the cell with the highest density. On larger
scales, all of the runs display a filamentary structure that is qualitatively similar. No other
region in any of the simulation boxes has collapsed to densities comparable to those found
within the region where the clump search is performed. The results are shown in Figure 3.
A single clump exists in the metal-free and 10−4 Z⊙ simulations, containing 99.7% of the
total mass within the region of interest. In the simulation with Z = 10−3 Z⊙, 91% of the
mass is shared between two clumps with 0.52 M⊙ and 0.39 M⊙. In the same simulation, we
also find two smaller clumps 0.06 M⊙ and 0.02 M⊙. Finally, in the Z = 10
−2 Z⊙ simulation,
we see two clumps with 0.79 M⊙ and 0.21 M⊙.
4. Discussion
We have shown, through three-dimensional hydrodynamic simulations, that fragmen-
tation occurs in collapsing gas with metallicities, Z ≥ 10−3 Z⊙. Our results indicate that
star-formation occurs in exactly the same manner at metallicity, Z = 10−4 Z⊙, as it does at
zero metallicity. The similarities between the simulations with metallicities, Z = 10−3 Z⊙
and 10−2 Z⊙, suggest that the transition to low-mass star-formation is complete by 10
−3 Z⊙,
implying that the entire transition occurs over only one order of magnitude in metal abun-
dance. More simulations, bracketing the metallicity range, 10−4 to 10−3 Z⊙, will test how
abrupt the transition truly is. We will also explore the effect of non-solar abundances on the
low metallicity IMF. It has been recently argued that dust cooling at high densities (n ≥ 1013
cm−3) can induce fragmentation for metallicities as low as 10−6 Z⊙ Schneider et al. (2006).
In light of the work by Frebel et al. (2007), who note the absence of stars with Dtrans <
-3.5, where Dtrans is a measure of the combined logarithmic abundance of C and O, it seems
unlikely that Zcr is this low. While the fragmentation mode discussed in Schneider et al.
(2006), and also Omukai et al. (2005), may truly exist, it is possible that metal yields from
Pop III supernovae overshoot this metallicity, for realistic mixing scenarios, leaving almost
no star-forming regions with such a low concentration of heavy elements. Similar to our
results, Omukai et al. (2005) note that only high-mass fragments are produced when Z =
10−4 Z⊙. If Pop III supernovae are able to immediately enrich the local universe to Z = 10
Z⊙, the high-density dust cooling fragmentation mode would be skipped altogether, and the
high-mass stars that formed via the mode observed at 10−4 Z⊙ would leave no record in the
search for low-metallicity stars in the local universe.
We have limited the search for fragments to the dense 1 M⊙ core at the center of each
simulation. Within this region, it is unlikely that any more fragments will form in any of
– 7 –
the simulations. In all of the cases presented, the cooling has begun to be overwhelmed
by compression heating such that the central temperature is now increasing with increasing
density, which was indicated by Larson (2005) to be the end of hierarchical fragmentation.
Fragmentation may continue in the surrounding lower density gas in the cases of Z = 10−3
Z⊙ and 10
−2 Z⊙. The final stellar masses of these objects will also be affected interaction
and accretion that will occur in later stages of evolution. In the two lowest metallicity cases,
the gas immediately surrounding the central core evolves slowly enough that it will not have
sufficient time to reach high densities before the UV radiation from the central, massive star
dissociates all of the H2. As was shown by Bromm et al. (2001), clouds with metallicities, Z
≤ 10−4 Z⊙ are unable to collapse without the aid of H2 cooling.
In the two simulations in which significant fragmentation is observed, Z = 10−3 and
10−2 Z⊙, the gas is able to cool rapidly to the temperature of the CMB. Wise & Abel (2005)
predict that the rate of Pop III supernovae peaks at a redshift, z ∼ 20, and then drops
off sharply, implying that metal production from Pop III stars is effectively finished at this
point. In this epoch, the characteristic mass-scale for metal-enriched star formation will be
regulated by the CMB, as is predicted in Bromm & Loeb (2003). Thus, the first Pop II stars
will be considerably more massive, on average, than stars observed today, as was suggested
by Larson (1998). Observations of low-mass prestellar cores in the local universe reveal them
to have temperatures of about 8.5 K (Evans 1999), implying that the IMF may not become
completely ’normal’ until z < 3 when the CMB fell below this temperature.
We thank Tom Abel, Greg Bryan, Mike Norman, Brian O’Shea, and Matt Turk for
useful discussions. BDS also thanks Michael Kuhlen for providing an update to some useful
analysis tools. We are also very grateful for insightful comments from an anonymous referee.
This work was made possible by Hubble Space Telescope Theory Grant HST-AR-10978.01,
and an allocation from the San Diego Supercomputing Center.
REFERENCES
Abel, T., Bryan, G. L., & Norman, M. L. 2002, Science, 295, 93
Anninos, P., Zhang, Y., Abel, T., & Norman, M. L. 1997, New Astronomy, 2, 209
Bate, M. R., Bonnell, I. A., & Bromm, V. 2003, MNRAS, 339, 577
Bromm, V., Coppi, P. S., & Larson, R. B. 2002, ApJ, 564, 23
Bromm, V., Ferrara, A., Coppi, P. S., & Larson, R. B. 2001, MNRAS, 328, 969
– 8 –
Bromm, V. & Loeb, A. 2003, Nature, 425, 812
Bryan, G. & Norman, M. L. 1997, in Workshop on Structured Adaptive Mech Refinement
Grid Methods, ed. N. Chrisochoides, IMA Volumes in Mathematics No. 117 (Springer-
Verlag)
Chabrier, G. 2003, PASP, 115, 763
Eisenstein, D. J. & Hu, W. 1999, ApJ, 511, 5
Evans, II, N. J. 1999, ARA&A, 37, 311
Ferland, G. J., Korista, K. T., Verner, D. A., Ferguson, J. W., Kingdon, J. B., & Verner,
E. M. 1998, PASP, 110, 761
Frebel, A., Johnson, J. L., & Bromm, V. 2007, ArXiv Astrophysics e-prints
Heger, A. & Woosley, S. E. 2002, ApJ, 567, 532
Kroupa, P. 2002, Science, 295, 82
Larson, R. B. 1998, MNRAS, 301, 569
—. 2005, MNRAS, 359, 211
Low, C. & Lynden-Bell, D. 1976, MNRAS, 176, 367
Miller, G. E. & Scalo, J. M. 1979, ApJS, 41, 513
Omukai, K., Tsuribe, T., Schneider, R., & Ferrara, A. 2005, ApJ, 626, 627
O’Shea, B. W., Abel, T., Whalen, D., & Norman, M. L. 2005, ApJ, 628, L5
O’Shea, B. W., G., B., Bordner, J., Norman, M. L., Abel, T., Harknes, R., & Kritsuk, A.
2004, in Lecture Notes in Computational Science and Engineering, Vol. 41, Adaptive
Mesh Refinement - Theory and Applications, ed. T. Plewa, T. Linde, & V. G. Weirs
Santoro, F. & Shull, J. M. 2006, ApJ, 643, 26
Schneider, R., Omukai, K., Inoue, A. K., & Ferrara, A. 2006, MNRAS, 369, 1437
Shu, F. H. 1977, ApJ, 214, 488
Truelove, J. K., Klein, R. I., McKee, C. F., Holliman, II, J. H., Howell, L. H., & Greenough,
J. A. 1997, ApJ, 489, L179+
– 9 –
Tsuribe, T. & Omukai, K. 2006, ApJ, 642, L61
Tumlinson, J., Venkatesan, A., & Shull, J. M. 2004, ApJ, 612, 602
Williams, J. P., de Geus, E. J., & Blitz, L. 1994, ApJ, 428, 693
Wise, J. H. & Abel, T. 2005, ApJ, 629, 615
Yoshida, N., Omukai, K., Hernquist, L., & Abel, T. 2006, ApJ, 652, 6
This preprint was prepared with the AAS LATEX macros v5.2.
– 10 –
Table 1
Simulation Final States
Z (Z⊙) zcol lmax Grids Cells nmax (cm
−3) ∆tcol (Myr)
0 18.231519 27 8469 4.82 ×107 4.11 ×1013 -
10−4 18.838816 27 8060 4.64 ×107 3.90 ×1013 9.19
10−3 19.336557 27 7911 4.56 ×107 1.65 ×1013 16.21
10−2 20.032518 28 7521 4.42 ×107 1.50 ×1013 25.33
– 11 –
Fig. 1.— Slices through gas density for the final output of simulations with Z = 0 (A), 10−4
Z⊙ (B), 10
−3 Z⊙ (C), and 10
−2 Z⊙ (D). Each slice intersects the grid-cell with the highest
gas density and has a width of 2 × 10−8 of the computation box, corresponding to a proper
size of ∼4 × 10−4 pc (84 AU). The color-bar at bottom ascends logarithmically, from left to
right, spanning exactly four orders of magnitude in density.
– 12 –
Fig. 2.— Radially averaged, mass-weighted quantities for the final output each simulation:
Z = 0 (red), 10−4 Z⊙ (green), 10
−3 Z⊙ (blue), and 10
−2 Z⊙ (purple). A: Number density vs.
radius. B: Temperature vs. enclosed mass. C: Enclosed gas mass vs. radius. D: Ratio of
crossing time to cooling time vs. enclosed mass. The classical criterion for fragmentation is
met when the ratio of the crossing time to the cooling time is greater than 1.
– 13 –
Fig. 3.— Masses of clumps found within the final output of each simulation. The location
on the x and y axes corresponds to the log of the clump mass and the metallicity of the
simulation. Colors are the same as in Figure 2. The radii of the circles are proportional to
the masses of the clumps they represent. A factor of 10 in mass is equivalent to a factor of
2 in radius. The search for clumps is limited to the 1 M⊙ surrounding the grid cell with the
highest gas density. Only clumps with at least 1000 cells are plotted.
	Introduction
	Simulation Setup
	Results
	Discussion
ABSTRACT
  We observe a sharp transition from a singular, high-mass mode of star
formation, to a low-mass dominated mode, in numerical simulations, at a
metallicity of 10^-3 Zsolar. We incorporate a new method for including the
radiative cooling from metals into adaptive mesh-refinement hydrodynamic
simulations. Our results illustrate how metals, produced by the first stars,
led to a transition from the high-mass star formation mode of Pop III stars, to
the low-mass mode that dominates today. We ran hydrodynamic simulations with
cosmological initial conditions in the standard LambdaCDM model, with
metallicities, from zero to 10^-2 Zsolar, beginnning at redshift, z = 99. The
simulations were run until a dense core forms at the center of a 5 x 10^5
Msolar dark matter halo, at z ~ 18. Analysis of the central 1 Msolar core
reveals that the two simulations with the lowest metallicities, Z = 0 and 10^-4
Zsolar, contain one clump with 99% of the mass, while the two with
metallicities, Z = 10^-3 and 10^-2 Zsolar, each contain two clumps that share
most of the mass. The Z = 10^-3 Zsolar simulation also produced two low-mass
proto-stellar objects with masses between 10^-2 and 10^-1 Msolar. Gas with Z >=
10^-3 Zsolar is able to cool to the temperature of the CMB, which sets a lower
limit to the minimum fragmentation mass. This suggests that the second
generation stars produced a spectrum of lower mass stars, but were still more
massive on average than stars formed in the local universe.

<|endoftext|><|startoftext|>
Accepted by the Astrophysical Journal, March 7, 2007
Preprint typeset using LATEX style emulateapj v. 03/07/07
SUPER–STAR CLUSTER VELOCITY DISPERSIONS AND VIRIAL MASSES IN THE M82 NUCLEAR
STARBURST1
Nate McCrady
and James R. Graham
Department of Astronomy, University of California, Berkeley, CA 94720-3411
Accepted by the Astrophysical Journal, March 7, 2007
ABSTRACT
We use high-resolution near-infrared spectroscopy from Keck Observatory to measure the stellar
velocity dispersions of 19 super star clusters (SSCs) in the nuclear starburst of M82. The clusters
have ages on the order of 10 Myr, which is many times longer than the crossing times implied by their
velocity dispersions and radii. We therefore apply the Virial Theorem to derive the kinematic mass
for 15 of the SSCs. The SSCs have masses of 2× 105 to 4× 106 M⊙ , with a total population mass of
1.4 × 107 M⊙ . Comparison of the loci of the young M82 SSCs and old Milky Way globular clusters
in a plot of radius versus velocity dispersion suggests that the SSCs are a population of potential
globular clusters. We present the mass function for the SSCs, and find a power law fit with an index
of γ = −1.91±0.06. This result is nearly identical to the mass function of young SSCs in the Antennae
galaxies.
Subject headings: galaxies: individual (M82) — galaxies: starburst — galaxies: star clusters — in-
frared: galaxies
1. INTRODUCTION
1.1. Starburst Galaxies and Super Star Clusters
Short-duration episodes of intense star formation
known as “starbursts” are responsible for a significant
portion of star formation activity in the present-day Uni-
verse. Heckman (1998) estimates that the four most lu-
minous circumnuclear starbursts (M82, NGC 253, M83
and NGC 4945) account for 25 percent of the high-mass
(> 8 M⊙ ) star formation within 10 Mpc. The star-
burst phenomenon is the present-day manifestation of
the dominant mode of star formation in the early Uni-
verse (Leitherer 2001). At z = 0, high-mass stars form
predominantly in dense clusters and OB associations
(Miller & Scalo 1978). Massive stellar clusters in nearby
starburst galaxies thus provide a laboratory for studying
intense star formation and related feedback processes,
as well as physical conditions analogous to high-redshift
star formation.
Star formation in starbursts is resolved into young,
dense, massive “super–star clusters” (SSCs) that rep-
resent a substantial fraction of new stars formed
in a burst event (Meurer et al. 1995; Zepf et al.
1999). Hubble Space Telescope (HST) observations with
WFPC/WFPC2 in visible light (e.g., O’Connell et al.
1994; Whitmore & Schweizer 1995) have resolved SSCs
in the nearest starburst galaxies, and SSCs appear ubiq-
uitous in mergers and interacting galaxies. Of the
roughly 30 gas-rich mergers observed by HST, all have
young, massive, compact clusters (Whitmore 2001, and
refs.). A spectacular example is the “Antennae” galax-
1 Based on observations made at the W.M. Keck Observatory,
which is operated as a scientific partnership among the California
Institute of Technology, the University of California and the Na-
tional Aeronautics and Space Administration. The Observatory
was made possible by the generous financial support of the W.M.
Keck Foundation.
2 Now at Department of Physics and Astronomy, UCLA, Los
Angeles, CA 90095-1547
3 nate@astro.ucla.edu
ies, NGC 4038/4039, with more than 103 optically-visible
SSCs (Whitmore et al. 1999) and many other clusters
deeply embedded in dust (Gilbert et al. 2000).
The derived masses, radii and ages of SSCs suggest
that they are young globular clusters, and the brightest
and most massive may evolve into a population of glob-
ular clusters similar to that of the Milky Way. The ques-
tion of whether SSCs are in fact the progenitors of glob-
ular clusters depends critically on their masses and their
content of low-mass stars in particular (Meurer et al.
1995). If the stellar initial mass function (IMF) within
the clusters is biased toward high-mass stars (or “top
heavy”), for example by suppression of low mass star
formation, the clusters may not survive the disruptive
nature of mass loss resulting from both stellar evolu-
tion and dynamical processes (e.g. Chernoff & Weinberg
1990; Takahashi & Portegies Zwart 2000).
Local analogues of SSCs — massive, dense young clus-
ters in the Galaxy and Large Magellanic Cloud — con-
tain substantial populations of low-mass stars. The cen-
tral ionizing star cluster in NGC 3603, the most massive
H II region in the Galaxy, has 2000 M⊙ in OB stars alone.
Brandl et al. (1999) find that the mass spectrum of the
cluster is “well populated” down to 0.1 M⊙ . The clus-
ter R136 at the center of the 30 Doradus nebula in the
Large Magellanic Cloud is considered the closest example
of a starburst region (Brandl et al. 1996). In addition to
∼ 103 OB stars, some with masses in excess of 100 M⊙ ,
Sirianni et al. (2000) find a “substantial population” of
low-mass stars down to masses of 0.6 M⊙ in R136. With
a cluster mass of ∼ 2 × 104 M⊙ (Walborn et al. 2002),
R136 is at the lower end of the SSC mass range.
1.2. Mass Estimates
Most mass estimates to date for SSCs beyond the Lo-
cal Group are “photometric masses.” This technique in-
volves measuring the luminosity and broadband color of
a cluster, and comparing the results to the predictions
of spectral synthesis models. Examples include optical
http://arxiv.org/abs/0704.0478v1
2 McCRADY & GRAHAM
studies of clusters in the Antennae galaxies (Whitmore
2000, and refs.) and the nuclear cluster in IC 4449
(Böker et al. 2001). The resulting values are highly de-
pendent on the assumed IMF, age estimates and theo-
retical stellar evolution models. Consequently, the pho-
tometric method provides only very weak constraints on
the IMF in a cluster. The question of how SSCs evolve
or whether they can survive to become globular clusters
cannot be directly addressed with confidence by photom-
etry alone.
Recently, however, several studies have obtained kine-
matic masses for SSCs in starbursts directly from obser-
vations of stellar velocity dispersions. Ho & Filippenko
(1996b) use high-resolution optical spectra to measure
the velocity dispersion of an SSC in the nearby amor-
phous galaxy NGC 1705. They derive a cluster mass of
(8.2± 2.1)× 104 M⊙ . The dwarf starburst galaxy NGC
1569 contains two prominent SSCs. Ho & Filippenko
(1996a) used optical spectroscopy to determine the ve-
locity dispersion and derive a mass of (2 − 6) × 105
M⊙ for SSC-A. Gilbert (2002) uses near-IR spectroscopy
to identify two separate velocity components along the
line of sight at the position of SSC-A, and finds masses
of 3.0× 105 M⊙ for cluster A1 and 3.4× 10
5 M⊙ for clus-
ter A2. Gilbert also finds a velocity dispersion for SSC-B
and derives a mass of 1.8×105 M⊙ . Mengel et al. (2002)
use high-resolution optical and near-IR spectroscopy to
measure the velocity dispersions of six of the brightest
clusters in the merging Antennae galaxies. They derive
mass estimates ranging from 6.5× 105 to 4.7× 106 M⊙ .
In addition to these clusters in starburst galaxies, the
kinematic approach has been applied to several young,
massive clusters in more quiescent galaxies. Böker et al.
(1999) uses near-IR spectra to derive a mass of 6.6× 106
M⊙ for the nuclear star cluster in the giant spiral IC 342.
Larsen et al. (2004) derives virial masses for two SSCs
in the dwarf irregular NGC 4214 and two SSCs in NGC
4449 based on optical spectra, and one young SSC in
the nearby spiral NGC 6946 based on near-IR spectra.
Clusters in the Larsen study have masses between 2×105
and 1.8× 106 M⊙ , typical of SSCs.
Mass estimates based on measured velocity dispersions
are nearly independent of theoretical models, relying on
simple application of the virial theorem. Armed with
the virial mass, one can derive the light-to-mass ratio of
an SSC, which may be compared to predictions of pop-
ulation synthesis models to constrain the cluster’s IMF.
Critical to understanding the IMF is detection of low-
mass stars, the light of which is swamped by high lumi-
nosity supergiants. Measurement of the kinematic mass
provides the only means of detecting and quantifying the
low-mass stellar population of a cluster based on its in-
tegrated light.
1.3. Messier 82
M82 (NGC 3034) provides a useful laboratory. As
one of the nearest starburst galaxies at 3.6 Mpc
(Freedman et al. 1994), M82 presents an obvious reso-
lution advantage: star-forming regions can be studied on
small spatial scales (1′′ = 17.5 pc) and individual SSCs
are resolved by HST. The galaxy’s high inclination of
81◦ (Achtermann & Lacy 1995) and prevalent dust lead
to large, patchy extinction; infrared observations are re-
quired to overcome this obstacle in characterizing the
SSC population.
The light of red supergiant stars (RSGs) dominates
the near-IR continuum throughout the galaxy’s star-
burst core. Satyapal et al. (1997) interpret the compact
sources along the plane of M82 as young star clusters,
the space density of which increase towards the nucleus.
The smooth component of the near-IR emission is itself
likely the integrated contribution from unresolved clus-
ters of RSGs (Förster Schreiber 1998). Based on mid-IR
observations, Lipscy & Plavchan (2004) find that at least
20 percent of the star formation in M82 is occurring in
SSCs. HST/NICMOS images of the region (Figure 1)
show many luminous SSCs within ∼ 300 pc of the nu-
cleus.
The nuclear starburst is “active” in the sense that
the typical age for the starburst clusters is ∼ 107 years
(Satyapal et al. 1997). Evolutionary synthesis models by
Förster Schreiber (1998) suggest the nuclear starburst
(i.e., the central 450 pc) consists of two distinct, short
duration events with ages of about 5 Myr and 10 Myr.
The most intense star formation took place parallel to
the plane of the galaxy with a peak near the nucleus.
O’Connell et al. (1995) image M82 in the V and I bands
with the high-resolution Planetary Camera aboard HST,
identifying over 100 SSCs within a few hundred parsecs
of the nucleus. de Grijs et al. (2001) images a region in
the disk of M82, 1 kpc from the nuclear starburst, with
WFPC2 and NICMOS. They identify 113 SSC candi-
dates which were part of a starburst episode ∼ 600 Myr
ago (a “fossil starburst”), with little star formation in the
past 300 Myr. The clusters in the fossil starburst have
masses of 104−6 M⊙ . Smith & Gallagher (2001, SG01)
estimate an age of 60±20 Myr for the SSC ‘M82-F’, inter-
mediate between the ongoing nuclear burst and the fossil
burst farther out in the disk. M82-F lies ∼ 500 pc west
of the nucleus of M82. McCrady et al. (2005) measures
a mass of 5.6× 105 M⊙ for the cluster based on near-IR
observations, and found evidence for mass segregation.
Early ground-based studies of M82 found evidence of
an abnormal IMF. Rieke et al. (1993) uses population
synthesis models to constrain the IMF based on the near-
IR observations of McLeod et al. (1993). They conclude
that the large K-band luminosity of the M82 starburst
relative to its dynamical mass requires an IMF that is
significantly deficient in low-mass stars (M < 3 M⊙ ).
Doane & Mathews (1993) examine the supernova rate,
molecular gas mass and total dynamic mass and con-
clude that an IMF producing stars of mass > 3 M⊙ easily
matches observations, whereas a power-law IMF (e.g.,
Salpeter 1955) would require an unreasonably small mass
of stars in the region prior to the onset of the burst. In
contrast to these global studies, Satyapal et al. (1997)
use 1′′-resolution near-IR images to identify pointlike
sources and find that at this scale starburst models can
match observations without invoking a high-mass-biased
IMF. High spatial-resolution studies are necessary to in-
vestigate star formation in the cluster-rich M82 star-
burst. In a pilot study for this article, McCrady et al.
(2003) measure the kinematic mass of two clusters using
near-IR spectra and imaging. Based on the light-to-mass
ratios of the clusters, they find that one (MGG-11) ap-
pears to have a top-heavy IMF, whereas the other (MGG-
9) appears consistent with a normal IMF. Measurement
of the SSC mass independent of assumptions regarding
SUPER–STAR CLUSTERS IN M82 3
the L/M ratio are required to further constrain the stel-
lar IMF in the clusters.
1.4. Overview
In this article, we measure the virial masses of the su-
per star cluster population of the inner ∼ 500 pc of the
M82 starburst, extending the work of McCrady et al.
(2003). Our aim is to examine star formation in the
starburst on the scale of individual super star clusters,
regions only a few parsecs in extent, with an eye towards
placing constraints on the IMF of individual SSCs.
We use high-spectral-resolution near-IR spectroscopy
from the W.M. Keck Observatory to measure the stellar
velocity dispersions and dominant stellar spectral type
of the SSCs. We then apply the virial theorem to derive
their masses. Clusters for which the age may be deter-
mined facilitate investigation of the IMF. In § 2 we de-
scribe the kinematic mass, the method we use to measure
the velocity dispersion of stars in a cluster, and discuss
related systematic effects. In §3 we discuss the NIRSPEC
observations, data reduction and spectral extraction. In
§ 4 we measure cluster velocity dispersions and derive the
kinematic masses of the clusters, and present the cluster
mass function for the nuclear starburst region.
2. APPROACH
The mass of a gravitationally-bound star cluster may
be determined by application of the virial theorem
(Spitzer 1987). Specifically, the virial mass is a function
of two observable quantities:
M = 10
where rhp is the half-light radius in projection, σr is
the one-dimensional line-of-sight velocity dispersion and
G is Newton’s gravitational constant. Half-light radii
for the M82 clusters were measured in McCrady et al.
(2003) based on HST/NICMOS images. We assume the
light profile of the cluster traces the mass distribution,
and thus use the measured half-light radius as a proxy
for the half-mass radius. In the case of mass segrega-
tion, however, this assumption breaks down for near-
IR light, and the resulting mass represents a lower-limit
(McCrady et al. 2005). The M82 nuclear SSCs are la-
belled in a NIRSPEC slit-viewing camera (SCAM) mo-
saic shown in Figure 2. HST/NICMOSH-band (F160W)
images of the clusters are shown in Figure 3.
To measure σr for the clusters, we obtain high-spectral-
resolution near-IR integrated light spectra and perform a
cross-correlation analysis with template supergiant stars.
The near-IR spectrum of a young SSC (i.e., ages < 100
Myr) is dominated by the light of cool, evolved super-
giant stars. These highly luminous stars have a large
number of molecular and atomic features in the H band.
The integrated-light spectrum of an SSC resembles the
spectrum of a red supergiant star, the features of which
have been “washed out” by the velocity dispersion of
stars in the cluster (Figure 4). Our cross correlation
method, described in detail in McCrady et al. (2003),
returns both the velocity dispersion of the cluster rel-
ative to a particular template supergiant and a measure
of the similarity of the cluster and supergiant as quan-
tified by the peak value of the cross correlation function
(CCF). We have prepared an atlas of 19 high-resolution
(R ∼ 22, 000) template star spectra in the H band, rang-
ing from spectral types G2 through M5 in luminosity
class I (Kirian et al. 2006). Results of the cross correla-
tion analysis are presented in §4.
Determination of the velocity dispersion by cross cor-
relation analysis is subject to several potential sources
of systematic error. A detailed analysis is presented in
McCrady (2005). We present an overview in the follow-
ing paragraphs.
One potential difficulty is metallicity differences be-
tween the Galactic supergiants used as templates and the
supergiants producing the cluster light. McLeod et al.
(1993, and refs.) cited evidence from emission-line stud-
ies of various elements and concluded that the present-
day ISM in M82 has solar or slightly higher metallicity.
Förster Schreiber et al. (2001) determined that near-IR
stellar absorption features observed in the starburst core
are consistent with the light from solar-metallicity red
supergiants (RSG). Origlia et al. (2004) performed abun-
dance analysis on the nuclear starburst region using spec-
tral synthesis models for near-IR absorption and X-ray
emission. They found an iron abundance roughly half
of the solar value, but enhancement of α-elements to so-
lar or slightly higher levels. This pattern is consistent
with enrichment by recursive bursts of Type II super-
novae. The template supergiants used in our analysis
also have roughly solar metallicity, and thus we expect
that metallicity effects are unlikely to significantly bias
the measured cluster velocity dispersions.
Cross-correlation with a mismatched template spec-
trum can introduce systematic bias to the velocity dis-
persion determination. Tests with supergiant spectra
broadened with a Gaussian to simulate the effect of a
cluster velocity distribution indicate that the cross cor-
relation analysis correctly identifies the “best fit” based
upon the peak value of the CCF. Increasing the level of
added noise decreases the CCF peak, but does not lead to
misidentification of the best template. We have elected
to cross-correlate the cluster spectra with spectra of sin-
gle RSG stars because the light of a young coeval cluster
should be dominated by the light of the most massive
stars. At an age of ∼ 107 years, this would be the light
of evolved massive stars, i.e., the RSG stars. Mixing a
composite spectrum (with the inclusion of intermediate
mass stars) would generate a better match to the overall
spectrum (particularly the depth of absorption features
— see below). But for our purposes, it is more important
to be able to isolate the width of the lines resulting from
the velocity dispersion of cluster stars.
In addition to the dominant light of the RSG stars, we
expect the cluster spectra to contain a substantial con-
tribution from intermediate mass stars still on the main
sequence. In the H band, the spectra of A and late-B
stars (with masses ∼ 2–6 M⊙ ) are essentially featureless,
with the exception of the prominent (and wide) hydro-
gen Brackett series absorption lines. The prevalent neb-
ular emission in the disk of M82 requires us to avoid the
wavelength ranges of these hydrogen lines in our analy-
sis. Over the rest of the spectral range in our analysis,
an admixture of intermediate mass star spectra would
only change the slope of the spectrum (as the near-IR
spectra of these stars are essentially thermal). One step
in our analysis is the removal of any continuum slope,
4 McCRADY & GRAHAM
as the cross-correlation technique is used to measure the
velocity dispersion, information which is contained in the
width of the absorption lines, not in the depth of the lines
or in the continuum. As such, addition of intermediate
mass star spectra would not affect the measured velocity
dispersions or virial masses.
Filtering of the spectra in Fourier space is a source
of systematic error. Extracted spectra (§3) are cross-
correlated with the spectra of template supergiant stars.
The spectra are baseline-subtracted, apodized and both
high- and low-pass filtered in Fourier space. At the high-
frequency end, the cross correlation result is affected by
random noise; high amplitude noise residuals from sky
emission line subtraction are particularly noxious, as the
effects are unpredictable and merely serve to increase
uncertainty. Low frequencies contain information about
spectrum-wide residual variations after baseline subtrac-
tion. One likely source of such a variation is the presence
of light from intermediate-mass main sequence stars in
the integrated cluster light. Very broad hydrogen absorp-
tion features typical of the otherwise largely featureless
H-band spectra of A0V stars (Meyer et al. 1998), for ex-
ample, would not be removed by the low-order baseline
subtraction. Information pertaining to the velocity dis-
persion of the cluster resides in the frequencies between
the extremes.
The frequency filtering applied to the NIRSPEC data
leads to a systematic error of 0− 3 km s−1 , which varies
between echelle orders. The offsetting correction applied
to the results respresents a correction of no more than 20
percent, generally less. Noise in the input spectrum leads
to uncertainty in the correction factor in the range of
0.1−0.5 km s−1 , setting the lower bound on the precision
of the velocity dispersion measurements.
3. OBSERVATIONS
3.1. NIRSPEC Spectra
The spectra used to determine the internal velocity dis-
persions of the SSCs and template stars were taken with
the facility near-infrared echelle spectrometer NIRSPEC
(McLean et al. 1998) on the 10-m Keck II telescope on
Mauna Kea, Hawaii. We used NIRSPEC in the echelle
mode, which yields spectral resolution of R ∼ 22, 000.
The integrated light of super star clusters aged 5 to
80 Myr is dominated by evolved supergiant and bright
giant stars (Gilbert 2002). The near-IR spectrum of cool
evolved stars is replete with atomic and molecular ab-
sorption features — no “continuum” in the traditional
sense (i.e., a Planck thermal spectrum) is evident. Both
the H and K bands offer a large number of features that
the cross correlation analysis effectively averages over in
determination of the mean feature width. There is per-
haps an advantage to the H band, in that warm cir-
cumstellar dust may veil features in the K band. The
NIRSPEC detector experiences significant “persistence”
from exposure to large flux, for example bright sky OH
emission lines or arc lamp lines. Operationally, this dis-
courages changing of filters during an observing night as
the persistent after-images of sky emission lines from a
different filter add significantly to the noise in an echel-
legram. In this analysis, we have opted to observe the
clusters at the shorter wavelength only.
The NIRSPEC-5 (N5) order-sorting filter covers the
wavelength range 1.51–1.75 µm, corresponding approxi-
mately to Johnson H . The N5 echelle data fall in seven
echelle orders, ranging from 44 through 50. All obser-
vations were taken with the echelle and cross-dispersion
gratings set at their blaze angles. This position max-
imizes signal-to-noise for a given exposure time. This
advantage is mitigated by the fact that more than a sin-
gle position is required to cover the free spectral range
at 1.6 µm, and portions of the H band are not observed.
Spectra used in this work were obtained over four ob-
serving seasons, from February 2002 through January
2005. Table 1 presents a summary list of spectroscopic
observations. NIRSPEC observations of evolved stars
used as template spectra are discussed in Kirian et al.
(2006) and McCrady et al. (2003). The minimum air-
mass of M82 (declination +69◦40′) from Mauna Kea is
1.56, and efforts were made to observe at an airmass of
no larger than 2.0 when possible. The slit used has a
width of 0.432′′ (3 pixels) and length of 24′′. Use of the
long slit improves background subtraction, and often al-
lows multiple clusters to be observed simultaneously. Slit
positions were chosen to include multiple objects where
possible to increase observing efficiency. Certain pairs of
targets are closely-separated and only resolvable in good
seeing.
Each individual cluster spectrum has an integration
time of 600 seconds. Bright OH sky emission lines begin
to saturate in longer exposures, increasing the difficulty
of sky subtraction. Total integration time on a cluster is
increased by repeating the observations.
3.2. Reduction and Extraction
The spectra were dark-subtracted, flat-fielded and cor-
rected for cosmic rays and bad pixels. The curved echelle
orders were then rectified onto an orthogonal slit-position
versus wavelength grid based on a wavelength solution
from sky (OH) emission lines. Each pixel in the grid has
a width of δλ = 0.019 nm. We sky-subtracted by fitting
third-order polynomials to the 2D spectra column-by-
column.
The NIRSPEC echelle turret is jostled when the cryo-
genic image rotator undergoes large slews, leading to
stochastic shifts of the wavelength scale of up to sev-
eral pixels. Doppler shift information is lost due to such
shifts. Typically, the absolute wavelength scale is estab-
lished using telluric OH emission. Throughout each data
acquisition cycle the image rotator was either turned off,
or only slow, small amplitude tracking motions were ex-
ecuted. In either case the wavelength solution is stable
to better than a few hundredths of a pixel. Thus the ve-
locity broadening reported here is intrinsic to the source
and is not an instrumental effect.
The cluster spectra were extracted using Gaus-
sian weighting functions matched to the wavelength-
integrated profile of each cluster. To correct for atmo-
spheric absorption in the cluster spectra, we observed a
hot main sequence star at a similar airmass. This cali-
bration star spectrum is divided by a spline function fit
to remove photospheric absorption features (particularly
Brackett series and helium lines) and continuum slope.
The resulting atmospheric absorption spectrum is then
divided into the cluster spectra.
The adopted sky-subtraction method generally com-
pletely removes sky emission lines. However, a high
level of noise is left behind at the position of bright OH
SUPER–STAR CLUSTERS IN M82 5
lines, particularly un- or barely-resolved doublets. These
“noise spikes” must be removed to avoid introduction
of systematic bias in the cross-correlation results. We
smooth the spectrum with a broad (∼ 40 km s−1 ) step
function, and subtract the original spectrum to obtain a
residuals array. Data points greater than 5× the rms are
replaced by the smoothed pixel. The step-function width
and clipping level were chosen to limit replaced pixels to
only those affected by strong sky emission lines. Cer-
tain atmospheric OH emission lines were incompletely
removed in the sky subtraction process. In these cases,
we replaced the pixels affected by residual sky emission
with the median value of the ∼ 5 pixels on either side of
the contaminated range. The fraction of pixels replaced
in a given echelle order typically amounts to a few per-
cent. Tests on an OH emission-free echelle order indicate
that the cross-correlation result is unaffected within the
stated uncertainties.
An atlas of the spectra for 19 SSCs is presented in Fig-
ures 5 and 6. We have included here only the spectra
for echelle orders 46 and 47; plots of all echelle orders
for each cluster are available in McCrady (2005). Each
cluster spectrum represents the summation of multiple
observations (Table 1). The total is normalized by di-
viding by the median value for that echelle order, such
that the spectrum is centered about unity. The resulting
scale is relative flux, which allows direct comparison of
spectra of different clusters. The spectra are offset by
an arbitrary integer amount for presentation of multi-
ple clusters on a single set of axes, and labeled towards
the right (long-wavelength) side. The clusters within a
given echelle order are arranged in the atlas in order of
increasing velocity dispersion.
The signal-to-noise ratio (S/N) of the extracted spec-
tra varies between clusters. The clusters vary in lumi-
nosity over a range of four apparent magnitudes in H
band (McCrady et al. 2003), which is a factor of ∼ 40.
The luminosity differences carry through to the single-
integration S/N as all clusters were observed for 600 sec-
onds per integration. Light losses due to variable seeing
and inefficiencies in fine guiding add additional varia-
tion to the S/N for each cluster. Although faint clusters
were observed more often, not all clusters were observed
to the same S/N as a result of observing constraints.
Differences in the total S/N are evident in the atlas of
cluster spectra. For example, faint SSC-k was observed
only twice and has total S/N ∼ 9 per pixel based on
the CCD equation (Howell 2000, p. 54), which accounts
for Poisson statistics, background, dark current and read
noise. Bright SSC-1c was also observed just twice, but
has total S/N ∼ 28 per pixel. Repeated observations
(13 times) of SSC-r brought the total S/N up to ∼ 37
per pixel; a single observation of faint SSC-r results in
S/N comparable to a single observation of SSC k. These
examples provide a sense of the S/N range of the obser-
vations.
The spectra presented in the atlas have been shifted
to rest wavelength. Across the top of each plot, we have
identified the positions of certain prominent spectral fea-
tures. The H-band spectra of the clusters resemble the
spectra of supergiant stars, and have a large number of
iron and OH absorption lines. Rovibrational ∆v = 3
bandheads of carbon monoxide are recognizable in or-
ders 45 through 49. As seen in the spectra of supergiant
stars (Kirian et al. 2006; Meyer et al. 1998), the strength
of the CO bandheads and OH lines increase at cooler
effective temperatures. Lines from other miscellaneous
metals (Mn, Ti, Si, Ca, C, Ni) and molecules (CN) are
also indicated. At the bottom of each atlas plot is a rep-
resentative sky emission spectrum for that echelle order,
arbitrarily scaled. The brightest OH emission lines leave
a footprint of increased noise in the extracted cluster
spectrum.
The disk of M82 has substantial diffuse emission. The
nuclear starburst displays mottled near-IR continuum
emission (Figure 1). The cross-correlation analysis based
upon the spectra of the clusters is more sensitive to line
emission. Lynds & Sandage (1963) first noted the bipo-
lar, filamentary network of Hα emission extending more
than 1 kpc from the galactic center. Paschen α im-
ages show patchy recombination emission throughout the
nuclear starburst (Alonso-Herrero et al. 2003). K-band
spectra show emission lines of hydrogen (Br γ), He I and
H2. Prominent H-band emission lines are [Fe II] at 1.644
µm and the Br 6 line of hydrogen at 1.7367 µm. In sev-
eral of the slit positions used, nebular emission varies
along the slit and removal is difficult. Remnants of the
lines are apparent in the spectra of certain clusters (e.g.,
SSC-1c in Figure 6).
3.3. Objects Observed
The 20 objects of Table 3 of McCrady et al. (2003)
and the 19 objects of Table 2 do not constitute equal
sets. The intersection of the two tables contains the 15
SSCs for which we have herein derived a mass (see Table
2). The union of the two tables contains 24 objects.
Moreover, Figure 2 identifies 26 objects. An accounting
of the objects observed in this project is as follows.
Five clusters for which we measured photometry and
half-light radius had significant problems with their
echelle spectroscopy. The practical single-exposure time
limit of 600 seconds is set by the saturation point of
atmospheric OH emission lines. Clusters fainter than
[F160W] ∼ 15 mag have S/N < 2 in 600 seconds of typ-
ical seeing. In poor seeing, these faint clusters are often
too difficult to identify for positioning the spectrograph
slit. SSC-1b is a frustrating case, as it is a bright clus-
ter with good S/N in a 600-s exposure. In poor seeing,
however, the light from SSC-1b is blended with the light
of the nearby clusters SSC-1a and SSC-1c. Observations
on 2003 Feb 6 were not used for SSC-1b or SSC-1c be-
cause of inadequate seeing. On the night of 2005 Jan 24,
the seeing was sufficient to resolve SSC-1c, however the
NIRSPEC detector was contaminated by persistence due
to sky emission from use in low-resolution mode by an
unaffiliated observing team earlier in the night. While
we were able to extract SSC-1c, SSC-1b was significantly
contaminated and had to be rejected.
Six other objects identified in Figure 2 have no derived
virial mass. SSC-z is clearly a super star cluster, but
lies just north of the edge of the HST/NICMOS field
(McCrady et al. 2003). In the absence of a resolved im-
age, we have no measurement of the half-light radius.
Object “y” also lies off the edge of the NICMOS field,
just south of SSC-L. Spectra of object “y” are inconclu-
sive due to low S/N in any case. Object “10” is un-
resolved by the NICMOS image, and we therefore have
no measured half-light radius. Interestingly, object “10”
6 McCRADY & GRAHAM
is coincident with a point source in Paschen α images,
suggesting it may be a compact H II emission region sur-
rounding one or several massive stars. SSC-j and SSC-a
are not well fit by empirical King functions, and the half-
light radii of these clusters are undetermined. SSC-h is
likewise not well fit by the empirical King model, as it is
clearly a collection of sources in NIC2 images (Figure 3).
4. ANALYSIS
4.1. Cross Correlation Results
Each cluster was observed multiple times, with seven
echelle orders in the N5 filter per observation. A single
observation of the spectrum in a particular echelle order
is treated as one “experiment” for the cluster. Each of
the experiments for a cluster is cross correlated with the
spectrum from the corresponding spectral order of each
of the template evolved stars. The result of this analy-
sis is an ensemble of cross correlation functions (CCFs)
for each cluster/template star pair. The peak amplitude
of the CCF measures the similarity of the cluster to the
template spectral type (Table 2). For each cluster, we
have identified the template supergiant spectrum which
provides the best match. In most cases, several template
stars match the cluster approximately equally well. The
spectra of template stars with higher surface tempera-
tures (e.g., G-type stars) proved to be poor matches for
the cluster spectra. The CCFs for each cluster and its
best-match template are presented in McCrady (2005).
The velocity dispersions based upon cross correlation
results for the best-match template star are listed in Ta-
ble 2. The quoted uncertainties reflect the formal error
based upon the standard deviation of the mean for the
ensemble of experiments. Systematic errors are treated
in McCrady (2005, see also §2); the stated uncertainties
do not include any allowance for the applied correction
of systematic offsets. In the course of the analysis, each
cluster spectrum is cross correlated with each of the tem-
plate spectra. We find that the velocity dispersions in-
dicated by the best match template are consistent with
velocity dispersions indicated by other templates of sim-
ilar effective temperature to within the uncertainties.
4.2. Derived Virial Masses
Armed with measurements of the cluster half-light
radii and velocity dispersions, we are ready to derive
the virial masses. Table 2 lists the derived mass for 15
SSCs in M82. Most of the SSCs have masses between
2 × 105 and 106 M⊙ ; clusters SSC-L, SSC-7 and SSC-9
haveM > 106 M⊙ . The median uncertainty on the virial
mass measurements is 16 percent. A significant portion
of the error budget is the 8 percent uncertainty in the
adopted distance to M82.
Figure 7 plots the half-light radii versus the velocity
dispersions for the SSCs. We have plotted the locus of
points for certain masses as dashed lines. These lines il-
lustrate that the uncertainty in the velocity dispersion,
σr, has a greater impact on the uncertainty in mass than
does the uncertainty in the half-light radius, rhp. This is
to be expected, as the virial mass (Eq. 1) is proportional
to the square of σr. We have mitigated this effect by
measuring σr to a precision sufficient to balance the er-
ror budget roughly evenly between uncertainties on the
velocity dispersions and the halflight radii.
Application of the virial theorem to determine the
mass of the clusters is based on the assumption that
the clusters are at present bound (self-gravitating) en-
tities. This assumption is supported by a comparison of
the relevant timescales: the crossing time and the age
of the clusters. The crossing time is the typical time re-
quired for a star to cross the cluster, where tcr ≈ rhp/σr
(Binney & Tremaine 1987, p. 190). The SSCs in Table
2 have crossing times in the range of 4 × 104 to 3 × 105
years, while their ages are on the order of ∼ 107 years
(Satyapal et al. 1997). Thus, member stars have made
tens of crossings of the clusters. After just a few cross-
ing times, the stars of a cluster are well mixed (King
1981) and the virial theorem is well satisfied (Aarseth
1974). The assumption that the M82 clusters are cur-
rently gravitationally bound is therefore valid.
To provide context and a sense of scale for the derived
cluster masses, we turn to virial mass measurements of
SSCs and globular clusters in our own and other galax-
ies from the literature. Pryor & Meylan (1993) use ve-
locity dispersions and King-Michie model fits to derive
virial masses of 56 Galactic globular clusters. Their ve-
locity dispersions range from 1 − 19 km s−1 (σ = 6.8
km s−1 ) and masses of 104 to 4 × 106 M⊙ (M = 5.6 ×
105 M⊙ ). The Milky Way globular clusters are plot-
ted in Figure 7 for comparison with the M82 clusters.
(Pryor & Meylan (1993) provide σr and M , from which
we estimated rhp using Eq. 1.) In total, the Milky
Way has about 180 globular clusters (Ashman & Zepf
1998, p. 31). If we assume an average cluster mass of
1.9 × 105 M⊙ (Mandushev et al. 1991), the total mass
of the Galactic population is ∼ 3.4 × 107 M⊙ . The
M82 SSCs in Table 2 have a total mass of ∼ 1.4 × 107
M⊙ , comparable to the aggregate mass of the much older
Galactic globular clusters. (However, we do not mean to
imply that all of the M82 SSCs will remain bound for 12
Gyr; see §5.)
The old globular clusters of the Milky Way are spread
widely in the σ − rh space of Figure 7, but in general
the locus of points is below (lower velocity dispersion
and to the right (larger radius) of the locus of points for
the young M82 SSCs. What can we infer about the two
populations from this plot? It is interesting to consider
the time evolution of a cluster in this parameter space.
Over time, mass loss from individual stars in the course
of their evolution will cause a cluster to lose mass. As
detailed in the Appendix, adiabatic mass loss by a viri-
alized cluster progresses such that the product σr is con-
served. An isolated cluster, evolving through adiabatic
mass loss, would gradually move down and to the right in
Figure 7 as indicated by the plotted vector, crossing the
“isobaric” lines. Over the span of 15 Gyr, a cluster with
a Kroupa IMF would lose around half of its initial mass
(i.e., its mass at 10 Myr) as a result of stellar evolution
(McCrady et al. 2003). Such adiabatic evolution of the
young M82 SSCs over a Hubble Time would place the
clusters in the same region as the bulk of the old globu-
lar clusters in Figure 7. We discuss the implications of
this plot further in §5.
Additional context for our results comes from observed
cluster systems in other galaxies. Dubath & Grillmair
(1997) measured the masses of nine globular clusters in
M31. They find velocity dispersions of 7−27 km s−1 (σ =
SUPER–STAR CLUSTERS IN M82 7
14 km s−1 ), rhp = 2 − 5 pc (rhp = 3.6 pc) and masses
of 4.3− 82× 105 M⊙ (M = 2.3× 10
6 M⊙ ). Larsen et al.
(2002) measure virial masses for four globular clusters in
M33. They find velocity dispersions of 4.4− 6.5 km s−1 ,
rhp = 2 − 8 pc and masses of 1.4 − 6.2 × 10
5 M⊙ . In
general, the old globular clusters in these neighboring
galaxies are larger with lower velocity dispersions. This
pattern is consistent with the notion that the M82 SSCs
are a population of young globular clusters, as adiabatic
mass loss due to stellar evolution would cause the clus-
ters to expand over time (see Appendix). A more direct
comparison is provided by young SSCs in other galaxies.
Mengel et al. (2002) examines five young (age ∼ 8 Myr)
clusters in the merging Antennae galaxies. They find ve-
locity dispersions of 9 − 21 km s−1 , rhp of 3.6 − 4.0 pc,
and masses of 6.4− 47× 105 M⊙ .
As discussed in McCrady et al. (2005), SSC-F shows
evidence of mass segregation. While cluster-wide dynam-
ical mass segregation in these young clusters is unlikely, it
is possible that the most massive stars in a cluster have
either rapidly sunk toward the core or simply formed
nearer the core. In either case, the red supergiant ve-
locity dispersions we measure in the near-IR would be
smaller than the cluster mean and the masses we derive
would represent lower limits. Gasdynamical modeling
by Boily et al. (2005) indicates that inward migration of
massive stars may cause the dimensionless geometric pa-
rameter η in the virial mass formula (Equation 1) to in-
crease by a factor of around two over a few ×107 yr.
In our analysis, we have explicitly assumed that η = 10
as derived in (McCrady et al. 2003). If mass segrega-
tion has in fact led to η > 10 in these M82 clusters, the
masses we derive here would be underestimated by the
corresponding factor.
4.3. Cluster Mass Function
Armed with the virial masses of the SSCs, we can inves-
tigate the cluster mass function for the M82 nuclear star-
burst. The standard manner for making a cluster mass
function is to prepare a histogram of the masses over log-
arithmically spaced bins and fit a power law or lognor-
mal distribution to the slope. Rosolowsky (2005) demon-
strates the shortcomings of this method, citing specifi-
cally the dependence of the power law index to the choice
of bin size and spacing in cases involving a small number
of data points. We choose instead to evaluate the cumu-
lative mass function for the 15 M82 SSCs. Figure 8 plots
the cumulative mass function as N(M ′ > M), which is
the total number of clusters with mass greater than the
reference mass M . In the common case of a power law,
the slope of a standard mass function is dN/dM ∝ Mγ .
For the cumulative mass function, the integration adds
one to the exponent, such that N(M ′ > M) ∝ Mγ+1
(Rosolowsky 2005).
The mass function for the M82 SSCs is well fit by a
power law of index γ = −1.91±0.05. The uncertainty on
the power law index is based on a Monte Carlo simulation
of our mass data. We resampled the cluster masses by
adding normal noise according to the uncertainties on the
mass measurements, then fit for the index of the resulting
cumulative mass function. The distribution of power law
indices was Gaussian with a standard deviation of our
quoted uncertainty. The power law index of γ ∼ −2
indicates that the stellar mass of the cluster population
is divided rather equally between the high-mass clusters
and low-mass clusters. Of the aggregate mass in our
sample of 15 SSCs, roughly 60 percent of the mass is
contained in the three most massive clusters (SSCs L, 9
and 7).
Estimation of the completeness limit for our SSC mass
function is somewhat difficult. Our mass measurements
are based on measured cluster velocity dispersions. For
us to measure a velocity dispersion, the cluster must be
observable in the near-IR and be sufficiently bright and
spatially resolved for us to obtain a usable spectrum.
Limiting factors include the intrinsic mass of the cluster,
the light-to-mass ratio for the cluster, confusion with ad-
jacent clusters or background emission, and line-of-sight
extinction. With the exception of SSC 1b, which suffers
from source confusion, we are confident we have obtained
the spectra of all clusters brighter than apparent mag-
nitude H ∼ 13.8. At the distance of M82, this corre-
sponds to a cluster luminosity of ∼ 1.8 × 105 L⊙ . To
convert to a mass estimate, we can estimate the light-
to-mass ratio for the clusters. For clusters in the age
range of 7–13 Myr, suitable for the M82 nuclear clusters
(McCrady et al. 2003), and a Kroupa (2001) field star
mass function, Starburst99 models predict L/M ∼ 1 in
units of L⊙ /M⊙ . The highly variable extinction in the
dusty, inclined disk of the galaxy could be hiding ad-
ditional bright clusters, on the far side of the disk for
example. Assuming a typical extinction correction for
the clusters of ∼ 0.5 mag in H band, we estimate that
our mass function is largely complete for clusters more
massive than ∼ 3× 105 M⊙ . To characterize the poten-
tial effects of incompleteness, we added fake clusters of
mass (3 ± 0.5)× 105M⊙ to the Monte Carlo simulation.
Each additional undetected cluster with mass near this
completeness limit would decrease the power law index
by ∆γ ∼ −0.02 (i.e., the power law would become more
steeply negative).
5. DISCUSSION
In §4.2, we note that the evolution of an individual M82
SSC via adiabatic mass loss due to stellar evolution over
a Hubble time would reposition the SSC in σ−rh param-
eter space. Such a repositioning would leave any of the
SSCs in our sample in a region consistent with the posi-
tion of the old globular clusters in the Milky Way (Figure
7). In a sense, this represents a necessary but not suffi-
cient condition for the hypothesis that these young SSCs
represent the progenitors of globular clusters. If, instead,
our analysis showed that stellar evolution would move the
clusters to a point in σ− rh parameter space that would
be inconsistent with the position of old globular clus-
ters, it would represent strong evidence that these SSCs
could not be progenitors of globular clusters. In fact,
adiabatically evolving any individual M82 SSC from our
sample for 15 Gyr would leave it solidly within the re-
gion of σ − rh parameter space occupied by old globular
clusters.
But we hasten to note that this result is insufficient
evidence that these SSCs are destined to become a
population of old globular clusters. The M82 nuclear
clusters are young, with ages on the order of 10 Myr
(Förster Schreiber 1998; McCrady 2005). There is grow-
ing evidence in the literature that a significant portion of
8 McCRADY & GRAHAM
young clusters are disrupted on a timescale of approxi-
mately 10 Myr from birth. Observations of massive clus-
ters in the Antennae galaxies (Fall et al. 2005) and M51
(Bastian et al. 2005) and lower mass open clusters in the
solar neighborhood (Lada & Lada 2003) find an excess
of clusters with ages ∼ 10 Myr relative to what would
be expected based on an assumption of constant cluster
formation rate (Bastian & Gieles 2006). The naive in-
terpretation of these findings is that there was a burst
of star cluster formation in the past 10 Myr in each of
these galaxies. But as noted by Fall et al. (2005), the
age distribution of star clusters represents the combined
histories of star cluster formation and disruption within
a galaxy. The relative wealth of clusters aged 10 Myr in
these various nearby galaxies suggests that we are fortu-
nate to be observing them at a special time in their star
formation histories, whereby we fall afoul of the cosmo-
logical principle.
An attractive alternative is that a high percentage of
clusters are disrupted within approximately 10 Myr of
formation, i.e., the e-folding survival time for a popula-
tion of clusters is about 10 Myr (Mengel et al. 2005).
This hypothesis goes by the morbid name of “infant
mortality.” The energy and momentum output of mas-
sive stars via stellar winds and supernovae could remove
residual natal gas from a young massive cluster. If the
gas removal were impulsive (i.e., occured over less than
a crossing time), the cluster could become gravitation-
ally unbound and begin expanding freely. Whether or
not a cluster survives this phase depends largely on the
star formation efficiency in the formation of the cluster
from natal gas (see references in Bastian & Gieles 2006).
Fall et al. (2005) posit that because a cluster with more
mass has both more gas to remove and more massive
stars to provide the energy, the fraction of clusters dis-
rupted may be roughly independent of mass. They find
this conjecture to be consistent with their observation
that the shape of the cluster mass function in the Anten-
nae galaxies is nearly independent of age.
Zhang & Fall (1999) investigated the mass function
of young star clusters in the Antennae galaxies (NGC
4038/9) based on photometric mass estimates, and found
a power law mass function of γ = −2 over the range
104 ≤ M/M⊙ ≤ 10
6, a result confirmed by Fall et al.
(2005). Mengel et al. (2005) found potential evidence
for a turnover or change in slope of the mass function for
the Antennae clusters, but cautions that the random and
systematic uncertainties discourage overinterpretation of
this result. They note that determination of the cluster
mass function from photometric masses requires age de-
terminations for individual clusters, which is particularly
delicate work around ages of ∼ 107 yr when the cluster
luminosity varies greatly with age.
Our virial mass measurements obviate determination
of the ages of individual clusters and assumptions re-
garding the form and cutoff masses of the stellar IMF.
The M82 nuclear clusters in our sample are also young,
and follow a power law mass distribution very similar to
the Antennae clusters. A power law mass distribution
for young SSCs stands in contrast to the lognormal mass
distribution for old globular clusters (Harris 1991), which
imply a preferred mass scale at the peak of ∼ 2×105M⊙
for Milky Way globular clusters. Several processes, oper-
ating on different timescales, have the ability to disrupt
star clusters. As discussed above, infant mortality ap-
pears to disrupt clusters independently of mass, thereby
preserving the shape of the initial cluster mass function.
On longer timescales (∼ 108−109 yrs), the strongly mass
dependent processes of two-body relaxation and exter-
nal perturbations (such as gravitational shocks and dy-
namical friction) can disrupt the clusters (Bastian et al.
2005). Analytical models by Fall & Zhang (2001) find
that the initial form of the high-mass end of the cluster
mass function is preserved over time. Two-body relax-
ation decreases the masses of clusters linearly over time,
flattening the mass function at low masses but little af-
fecting the shape at high masses. By 12 Gyr, the mass
function develops a peak at a mass of about 2× 105M⊙ .
Thus, an initial power law distribution of cluster masses
will develop into a distribution resembling the lognormal
mass function of old globular clusters over 12 Gyr, with
disruption erasing all information regarding the initial
shape of the mass function at the low-mass end.
In a galactic environment, of course, the clusters are
not isolated, and are additionally subject to external
forces such as galactic tides and dynamical friction. The
inner 1 kpc of a galaxy, with its strong tidal fields, is
a particularly dangerous place for star clusters. But as
noted by Fall & Zhang (2001), their models “support the
suggestion that at least some of the star clusters formed
in merging and interacting galaxies can be regarded as
young globular clusters.” As shown in Figure 7, the adia-
batic evolution of a cluster through stellar evolution and
mass loss over a similar time frame will tend to move the
M82 SSCs into the σ − rh parameter space occupied by
old Galactic globular clusters. While we cannot predict
the fate of any individual M82 SSC, our results suggest
that any cluster which should happen to survive for a
Hubble time could resemble the old globular clusters seen
in the Milky Way today.
6. SUMMARY
In this paper, we investigate the SSC population of the
inner ∼ 500 pc of the M82 starburst. The nuclear star-
burst in M82 contains roughly two dozen SSCs that are
prominent in the near-IR. Based on high spectral reso-
lution near-IR spectra, we measure line-of-sight velocity
dispersions for 19 SSCs in the nuclear starburst. We find
dispersions in the range of 7 − 35 km s−1 , comparable
with values for older globular clusters. We apply the
virial theorem to the measured velocity dispersions and
halflight radii to derive the masses of 15 of the SSCs. The
SSC masses lie in the range of 2.5 × 105 M⊙ to 4 × 10
M⊙ , placing them at the high end of the mass distribu-
tion function for old Galactic globular clusters. The total
mass of the 15 measured SSCs is 1.4× 107 M⊙ , which is
of the same order of magnitude as the total mass of the
globular cluster system of the Milky Way. Evolution of
the clusters via gradual mass loss from stellar evolution
would move them into the realm of σ − rh parameter
space occupied by old Milky Way globular clusters. The
cumulative mass function of the clusters follows a power
law with an index of γ = −1.91 ± 0.06. This is very
similar to the mass distribution of young SSCs in the
Antennae galaxies, and lends credence to the suspicion
that SSCs are potential future globular clusters.
We would like to thank the staff of the Keck Ob-
SUPER–STAR CLUSTERS IN M82 9
servatory for their assistance in our observations. We
also thank the anonymous referee for helpful com-
ments regarding implications of this work. NM thanks
John Johnson for invaluable data wrangling advice and
W. D. Vacca, L. Blitz and S. E. Boggs for helpful com-
ments. The authors wish to recognize and acknowledge
the very significant cultural role and reverence that the
summit of Mauna Kea has always had within the in-
digenous Hawaiian community. We are most fortunate
to have the opportunity to conduct observations from
this mountain. This material is based upon work sup-
ported by the National Science Foundation under Grant
No. 0502649, with additional support from NSF Grant
AST–0205999. Any opinions, findings, and conclusions
or recommendations expressed in this material are those
of the authors and do not necessarily reflect the views of
the National Science Foundation.
APPENDIX
APPENDIX
The consequences of mass loss from a virialized star cluster depend upon the rate of loss. We consider specifically
two cases: (1) rapid mass loss, where a star cluster has a sufficiently long relaxation time that v and R are unable to
readjust during the ejection of some quantity of mass, and (2) slow mass loss, where v and R for the cluster continually
readjust to maintain equilibrium.
In the case of rapid mass loss, Hills (1980) derives an expression for the relation between the initial mass and the
amount of mass ejected:
R = R0
M0 −∆M
M0 − 2∆M
. (A1)
Evidently, as ∆M tends to M0/2 the cluster radius tends to infinity and the cluster becomes unbound.
In the case of gradual mass loss, the cluster constantly adjusts to maintain equilibrium. The total energy at any
instant, whether the cluster is in equilibrium or not, is
Mv2 − η
where v is the 3-d rms velocity and η is a non-dimensional form factor. The corresponding change in total energy of
the system is
v2 − 2η
. (A3)
After δm is lost, the cluster must readjust to the new equilibrium. In the new equilibrium configuration the Virial
theorem, E = −T = Ω/2 (Binney & Tremaine 1987, p. 211), can be invoked to express the partial derivative:
, (A4)
which can be integrated from the initial mass, M0, and energy E0,
dE/E = 3
dM/M (A5)
to yield E/E0 = (M/M0)
3. Substituting again from the Virial theorem, which shows that E ∝ Ω ∝ M2/R, we have
have an expression for the radius as a function of mass,
R = R0
. (A6)
which is the adiabatic invariant from Hills (1980), where M ≡ M0 −∆M .
In the equilibrium states both before and after loss of ∆M , the Virial theorem applies, and E = −T (where
T = Mv2/2 is the kinetic energy). Thus:
−Mv2/2
. (A7)
Simplifying and combining terms,
which has the physical solution (M/M0) = (v/v0). If we compare this last result to Equation (A6), we find
and therefore for adiabatic mass loss, vR = v0R0 = constant.
REFERENCES
Aarseth, S. J. 1974, A&A, 35, 237 Achtermann, J. M. & Lacy, J. H. 1995, ApJ, 439, 163
10 McCRADY & GRAHAM
Alonso-Herrero, A., Rieke, G. H., Rieke, M. J., & Kelly, D. M.
2003, AJ, 125, 1210
Ashman, K. M. & Zepf, S. E. 1998, Globular Cluster Systems (New
York: Cambridge University Press)
Böker, T., van der Marel, R. P., Mazzuca, L., Rix, H., Rudnick,
G., Ho, L. C., & Shields, J. C. 2001, AJ, 121, 1473
Böker, T., van der Marel, R. P., & Vacca, W. D. 1999, AJ, 118,
Bastian, N. & Gieles, M. 2006, ArXiv Astrophysics e-prints, astro-
ph/0609669
Bastian, N., Gieles, M., Lamers, H. J. G. L. M., Scheepmaker,
R. A., & de Grijs, R. 2005, A&A, 431, 905
Binney, J. & Tremaine, S. 1987, Galactic Dynamics (Princeton,
NJ: Princeton University Press)
Boily, C. M., Lançon, A., Deiters, S., & Heggie, D. C. 2005, ApJ,
620, L27
Brandl, B., Brandner, W., Eisenhauer, F., Moffat, A. F. J., Palla,
F., & Zinnecker, H. 1999, A&A, 352, L69
Brandl, B., Sams, B. J., Bertoldi, F., Eckart, A., Genzel, R.,
Drapatz, S., Hofmann, R., Loewe, M., & Quirrenbach, A. 1996,
ApJ, 466, 254
Chernoff, D. F. & Weinberg, M. D. 1990, ApJ, 351, 121
de Grijs, R., O’Connell, R. W., & Gallagher, J. S. 2001, AJ, 121,
Doane, J. S. & Mathews, W. G. 1993, ApJ, 419, 573
Dubath, P. & Grillmair, C. J. 1997, A&A, 321, 379
Fall, S. M., Chandar, R., & Whitmore, B. C. 2005, ApJ, 631, L133
Fall, S. M. & Zhang, Q. 2001, ApJ, 561, 751
Förster Schreiber, N. M. 1998, PhD thesis, Lugwig-Maximilians-
Universität München
Förster Schreiber, N. M., Genzel, R., Lutz, D., Kunze, D., &
Sternberg, A. 2001, ApJ, 552, 544
Freedman, W. L., Hughes, S. M., Madore, B. F., Mould, J. R., Lee,
M. G., Stetson, P., Kennicutt, R. C., Turner, A., Ferrarese, L.,
Ford, H., Graham, J. A., Hill, R., Hoessel, J. G., Huchra, J., &
Illingworth, G. D. 1994, ApJ, 427, 628
Gilbert, A. M. 2002, PhD thesis, Univ. of California, Berkeley
Gilbert, A. M., Graham, J. R., McLean, I. S., Becklin, E. E., Figer,
D. F., Larkin, J. E., Levenson, N. A., Teplitz, H. I., & Wilcox,
M. K. 2000, ApJ, 533, L57
Harris, W. E. 1991, ARA&A, 29, 543
Heckman, T. M. 1998, in ASP Conf. Ser. 148: Origins, ed. C. E.
Woodward, J. M. Shull, & H. A. Thronson (San Francisco:
Astronomical Society of the Pacific), 127
Hills, J. G. 1980, ApJ, 235, 986
Ho, L. C. & Filippenko, A. V. 1996a, ApJ, 466, L83
—. 1996b, ApJ, 472, 600
Howell, S. B. 2000, Handbook of CCD Astronomy (Cambridge:
Cambridge University Press)
King, I. R. 1981, QJRAS, 22, 227
Kirian, R., McCrady, N., & Graham, J. R. 2006, ApJS, in press
Kroupa, P. 2001, MNRAS, 322, 231
Lada, C. J. & Lada, E. A. 2003, ARA&A, 41, 57
Larsen, S. S., Brodie, J. P., & Hunter, D. A. 2004, AJ, 128, 2295
Larsen, S. S., Brodie, J. P., Sarajedini, A., & Huchra, J. P. 2002,
AJ, 124, 2615
Leitherer, C. 2001, in ASP Conf. Ser. 245: Astrophysical Ages
and Times Scales, ed. N. M. T. von Hippel & C. Simpson (San
Francisco: Astronomical Society of the Pacific), 390
Lipscy, S. J. & Plavchan, P. 2004, ApJ, 603, 82
Lynds, C. R. & Sandage, A. R. 1963, ApJ, 137, 1005
Mandushev, G., Staneva, A., & Spasova, N. 1991, A&A, 252, 94
McCrady, N. 2005, PhD thesis, Univ. of California, Berkeley
McCrady, N., Gilbert, A. M., & Graham, J. R. 2003, ApJ, 596, 240
McCrady, N., Graham, J. R., & Vacca, W. D. 2005, ApJ, 621, 278
McLean, I. S., Becklin, E. E., Bendiksen, O., Brims, G., Canfield,
J., Figer, D. F., Graham, J. R., Hare, J., Lacayanga, F., Larkin,
J. E., Larson, S. B., Levenson, N., Magnone, N., Teplitz, H., &
Wong, W. 1998, in Proc. SPIE Vol. 3354, Infrared Astronomical
Instrumentation, ed. A. M. Fowler (Bellingham: SPIE), 566
McLeod, K. K., Rieke, G. H., Rieke, M. J., & Kelly, D. M. 1993,
ApJ, 412, 111
Mengel, S., Lehnert, M. D., Thatte, N., & Genzel, R. 2002, A&A,
383, 137
—. 2005, A&A, 443, 41
Meurer, G. R., Heckman, T. M., Leitherer, C., Kinney, A., Robert,
C., & Garnett, D. R. 1995, AJ, 110, 2665
Meyer, M. R., Edwards, S., Hinkle, K. H., & Strom, S. E. 1998,
ApJ, 508, 397
Miller, G. E. & Scalo, J. M. 1978, PASP, 90, 506
O’Connell, R. W., Gallagher, J. S., & Hunter, D. A. 1994, ApJ,
433, 65
O’Connell, R. W., Gallagher, J. S., Hunter, D. A., & Colley, W. N.
1995, ApJ, 446, L1
Origlia, L., Ranalli, P., Comastri, A., & Maiolino, R. 2004, ApJ,
606, 862
Pryor, C. & Meylan, G. 1993, in ASP Conf. Ser. 50: Structure
and Dynamics of Globular Clusters, ed. S. G. Djorgovski &
G. Meylan (San Francisco: Astronomical Society of the Pacific),
Rieke, G. H., Loken, K., Rieke, M. J., & Tamblyn, P. 1993, ApJ,
412, 99
Rosolowsky, E. 2005, PASP, 117, 1403
Salpeter, E. E. 1955, ApJ, 121, 161
Satyapal, S., Watson, D. M., Pipher, J. L., Forrest, W. J.,
Greenhouse, M. A., Smith, H. A., Fischer, J., &Woodward, C. E.
1997, ApJ, 483, 148
Sirianni, M., Nota, A., Leitherer, C., De Marchi, G., & Clampin,
M. 2000, ApJ, 533, 203
Smith, L. J. & Gallagher, J. S. 2001, MNRAS, 326, 1027
Spitzer, L. 1987, Dynamical Evolution of Globular Clusters
(Princeton, NJ: Princeton University Press)
Takahashi, K. & Portegies Zwart, S. F. 2000, ApJ, 535, 759
Walborn, N. R., Máız-Apellániz, J., & Barbá, R. H. 2002, AJ, 124,
Whitmore, B. 2000, in ASP Conf. Ser. 197: Dynamics of Galaxies:
from the Early Universe to the Present, ed. G. A. M. F. Combes
& V. Charmandaris (San Francisco: Astronomical Society of the
Pacific), 315
Whitmore, B. C. 2001, in Starburst Galaxies: Near and Far, ed.
L. Tacconi & D. Lutz (Berlin: Springer), 106
Whitmore, B. C. & Schweizer, F. 1995, AJ, 109, 960
Whitmore, B. C., Zhang, Q., Leitherer, C., Fall, S. M., Schweizer,
F., & Miller, B. W. 1999, AJ, 118, 1551
Zepf, S. E., Ashman, K. M., English, J., Freeman, K. C., &
Sharples, R. M. 1999, AJ, 118, 752
Zhang, Q. & Fall, S. M. 1999, ApJ, 527, L81
SUPER–STAR CLUSTERS IN M82 11
TABLE 1
NIRSPEC N5 Observations
Date Objects texp Airmass Seeing Atm Star Remarks
(UT) (min) (sec z) (′′) (SpT)
2002 Feb 23 3, 9, 11 40 1.6 0.5 HD 74604 (B8V) (1)
2003 Jan 19 F, L 70 1.9 0.8 HD 173087 (B5V)
2003 Feb 6 1b, 1c, r 60 1.8 0.7 HD 74604 (B8V)
2003 Feb 6 1a, 3, m 50 1.7 0.7 HD 82327 (B9V) (2)
2003 Feb 6 6, 7 60 1.6 0.7 HD 82327 (B9V)
2003 Feb 6 8, 10, c 120 1.6 0.6 HD 82327 (B9V)
2003 Feb 6 s, t 60 1.9 0.6 HD 146926 (B8V)
2003 Feb 7 a, b 90 2.0 1.5 HD 146926 (B8V) (3)
2003 Dec 5 6, 7 20 1.6 0.5 HD 63586 (A0V)
2003 Dec 5 3, 6, h 40 1.6 0.5 HD 63586 (A0V) (4)
2004 Feb 8 a, y 60 1.9 0.8 HD 146926 (B8V)
2004 Feb 8 r, z 30 2.2 1.0 HD 146926 (B8V)
2004 Feb 9 r, z 40 1.6 1.3+ HD 82327 (B9V) (3)
2004 Feb 9 a, y 30 1.7 1.3+ HD 82327 (B9V) (3)
2005 Jan 24 1a, 1b 20 1.6 0.6 HD 82327 (B9V)
2005 Jan 24 1a, 1c, q 20 1.5 0.6 HD 82327 (B9V)
2005 Jan 24 j, k, q 30 1.5 0.6 HD 82327 (B9V)
Note. — Seeing values are estimates. Remarks. — (1) Only 30 min on object 3.
(2) Only 40 min on object 1a. (3) Seeing very poor & variable. (4) Only 20 min on
object 3.
TABLE 2
Cross-Correlation Results
Best Fit Mean CCF Dispersion Mass∗ tcr
Object Template Peak Value (σ, km s−1 (105 M⊙ ) (10
5 yr)
SSC L HR2289 0.70± 0.05 34.7± 0.4 40.± 6. 0.41± 0.05
SSC F HR2289 0.69± 0.1 12.4± 0.3 5.5± 0.8 1.2± 0.1
SSC a HD237008 0.53± 0.1 10.9± 0.4
SSC 11 HD237008 0.78± 0.07 12.1± 0.4 3.9± 0.6 0.9± 0.1
SSC 9 HD237008 0.70± 0.07 19.8± 0.5 23.± 4. 1.3± 0.2
SSC 8 HD14469 0.52± 0.1 10.5± 0.5 4.0± 0.7 1.5± 0.2
SSC 7 HD237008 0.53± 0.1 18.6± 1. 22.± 4. 1.4± 0.2
SSC 6 HD237008 0.76± 0.09 9.2± 0.3 2.7± 0.4 1.5± 0.2
SSC h HD237008 0.62± 0.07 33.2± 1.0
SSC j HD237008 0.55± 0.10 9.0± 0.8
SSC k HD237008 0.51± 0.1 9.± 1. 5.7± 2. 3.1± 0.5
SSC m HD14469 0.38± 0.1 15.2± 0.8 7.3± 1. 0.9± 0.1
SSC q HD237008 0.53± 0.1 7.9± 0.6 2.8± 0.6 2.5± 0.3
SSC 3 HD237008 0.76± 0.1 8.7± 0.3 2.7± 0.4 1.8± 0.2
SSC 1a HD237008 0.72± 0.07 13.4± 0.4 8.6± 1. 1.5± 0.2
SSC 1c HD237008 0.71± 0.08 12.2± 0.4 5.2± 0.8 1.2± 0.2
SSC r HD237008 0.65± 0.1 8.6± 0.3 3.0± 0.5 2.0± 0.2
SSC t HD237008 0.46± 0.1 7.9± 0.9 2.5± 0.7 2.2± 0.4
SSC z HD237008 0.69± 0.1 9.9± 0.3
∗ Error in mass includes errors in distance to M82, half-light radius and ve-
locity dispersion.
† Crossing time, described in §4.2.
12 McCRADY & GRAHAM
Fig. 1.— Color mosaic of HST ACS/WFC and NICMOS images of the nuclear region in M82. ACS F814W, NICMOS F160W and
NICMOS F222M images are mapped to blue, green and red, respectively. The image is ∼ 25′′ × 65′′ (0.4 × 1.1 kpc) with north up and
east to the left. About two dozen super star clusters are evident, many of which are spatially coincident with and reddened by the band
of variable extinction running from upper left to lower right in the image.
SUPER–STAR CLUSTERS IN M82 13
Fig. 2.— Mosaic of H-band (N5) NIRSPEC SCAM images of the nucleus of M82. Candidate SSCs are labeled for reference. Coordinates
are J2000. Inset image is from HST/NICMOS.
14 McCRADY & GRAHAM
Fig. 3.— HST/NICMOS F160W images of each cluster. Each image is 2.5′′ × 2.5′′, and the position angle of the y-axes is 349.4◦ (i.e.,
North is 10.6◦ left of straight up). The images are log-scaled, as the cores are substantially brighter than the halos.
SUPER–STAR CLUSTERS IN M82 15
Fig. 4.— Comparison of the spectra of SSC-11 and several cool supergiants in echelle order 49. The cluster spectrum displays the same
features as the supergiants, but appears washed out due to the velocity dispersion of its constituent stars. The supergiant stars are plotted
in a temperature sequence, with the hottest star at the top.
16 McCRADY & GRAHAM
Fig. 5.— Atlas of SSC spectra for echelle orders 47 & 46.
SUPER–STAR CLUSTERS IN M82 17
Fig. 6.— Atlas of SSC spectra for echelle orders 47 & 46, continued.
18 McCRADY & GRAHAM
Fig. 7.— Projected halflight radius (rhp) versus velocity dispersion (σr) for M82 SSCs (circles). Dashed lines indicate the locus of points
for cluster mass as labeled. Error bars on the halflight radius do not include the uncertainty on the distance to M82. Galactic globular
clusters (squares) from Pryor & Meylan (1993) are plotted for comparison. The vector indicates time evolution of a cluster due to adiabatic
loss of half its mass (see Appendix).
SUPER–STAR CLUSTERS IN M82 19
Fig. 8.— Cumulative mass function for the M82 SSCs. The dashed line indicates a power law fit where N(M ′ > M) ∝ Mγ+1. The best
fit has a slope of γ = −1.91± 0.06. The estimated completeness point for cluster mass is marked ’C’ (see text). The fitted power law does
not reflect any correction for completeness.
ABSTRACT
  We use high-resolution near-infrared spectroscopy from Keck Observatory to
measure the stellar velocity dispersions of 19 super star clusters (SSCs) in
the nuclear starburst of M82. The clusters have ages on the order of 10 Myr,
which is many times longer than the crossing times implied by their velocity
dispersions and radii. We therefore apply the Virial Theorem to derive the
kinematic mass for 15 of the SSCs. The SSCs have masses of 2 x 10^5 to 4 x 10^6
solar masses, with a total population mass of 1.4 x 10^7 solar masses.
Comparison of the loci of the young M82 SSCs and old Milky Way globular
clusters in a plot of radius versus velocity dispersion suggests that the SSCs
are a population of potential globular clusters. We present the mass function
for the SSCs, and find a power law fit with an index of gamma = -1.91 +/- 0.06.
This result is nearly identical to the mass function of young SSCs in the
Antennae galaxies.

<|endoftext|><|startoftext|>
Introduction
For a proper scheme p : X → k over a perfect field, the Picard scheme PicX representing the
functor T 7→ H0(Tet, R
1p∗Gm) exists, and its connected component Pic
X is separated and of
finite type [Mu64, II 15]. By Chevalley’s structure theorem [Chev60], the reduced connected
component Pic
0,red
X is an extension of an abelian variety AX by a linear algebraic group LX :
(1) 0 → LX → Pic
0,red
X → AX → 0.
The commutative, smooth affine group scheme LX is the direct product of a torus TX and a
unipotent group UX . The following theorem completely characterizes TX :
Theorem 1. If X is proper over a perfect field, then the cocharactermodule Homk̄(Gm, TX) of
the maximal torus of PicX is isomorphic to H
et(X̄,Z) as a Galois-module.
To analyze the unipotent part, we let Pic(X[t])[1] be the typical part, i.e. the subgroup of elements
x of Pic(X[t]) such that the map X[t] → X[t], t 7→ nt sends x to nx.
Theorem 2. Let X be proper over a perfect field. Then Pic(X[t])[1] is isomorphic to the group
of morphisms of schemes f : Ga → UX satisfying f(nx) = nf(x) for every n ∈ Z. In particular,
Homk(Ga, UX) ⊆ Pic(X[t])[1], and this is an equality in characteristic 0.
To get another description of UX , we assume that X is reduced (the map on the Picard scheme
induced by the map Xred → X is well understood by the work of Oort [Oort62]). The semi-
normalization X+ → X is the largest scheme between X and its normalization which is strongly
universally homeomorphic to X in the sense that the map X+ → X induces an isomorphism on
all residue fields. A Theorem of Traverso [Tra70] implies that Pic(X[t])[1], hence UX , vanishes if
X is reduced and seminormal. We use this to show
Theorem 3. Let X be reduced and proper over a perfect field.
a) We have a short exact sequence
(2) 0 → KX → Pic
0,red
X → Pic
0,red
and inclusions of unipotent group schemes
UX ⊆ KX ⊆ p∗(Gm,X+/Gm,X)
with quotients finite p-primary group schemes.
2010 Mathematics Subject Classification. 14K30.
Key words and phrases. Picard scheme, torus, unipotent subgroup, semi-normalization, etale cohomology.
Supported in part by NSF grant No.0556263.
http://arxiv.org/abs/0704.0479v4
2 THOMAS GEISSER
b) The group scheme p∗(Gm,X+/Gm,X) represents the functor
T 7→ {OX×T -line bundles L ⊆ OX+×T which are invertible in OX+×T }.
Notation: For a field k, we denote by k̄ its algebraic closure, and for a scheme X over k we let
X̄ = X ×k k̄. Unless specified otherwise, all extension and homomorphism groups are considered
on the fpqc site. 1
Acknowledgements:
This (original) paper was written while the author was visiting T. Saito at the University of Tokyo,
whom we thank for his hospitality. We are indebted to G. Faltings for pointing out a mistake
in a previous version, and the referee, whose comments helped to improve the exposition and to
give more concise proofs. O. Gabber pointed out mistakes in the original version and suggested
improvements.
2. The torus
Proposition 4. If p : X → k is reduced, geometrically connected, and proper over a perfect field,
then Gm,k → p∗Gm,X is an isomorphism. Moreover, if f : X
′ → X is a universal homeomorphism
and X ′ is reduced as well, then f induces an isomorphism p∗Gm,X ∼= p∗Gm,X′ .
Proof. Since any scheme T over k is flat, we have by flat base change Rjq∗OXT = H
j(X,OX )⊗kOT ,
where q : XT → T is the projection. In particular,
p∗Gm,X(T ) := Γ(X × T,OX×T )
× = (Γ(X,OX )⊗ Γ(T,OT ))
and it suffices to show that Γ(X,OX) ∼= Γ(X
′,OX′) ∼= k. Since Γ(X̄,OX̄)
Gal(k̄/k) = Γ(X,OX ), we
can assume that k is algebraically closed and that X is connected, in which case the statement
follows because X and X ′ are reduced, proper, connected, and have a k-rational point. ✷
Lemma 5. For any scheme X we have isomorphisms
H1et(X,Z)
fl(X,Z)
∼= Ext
X(Gm,X ,Gm,X).
Proof. The first isomorphism is [Mi80, III Rem. 3.11(b)]. To prove the second isomorphism, we
note that HomX(Gm,X ,Gm,X) ∼= ZX by [SGA3, VIII Cor. 1.5], and that Ext
X(Gm,X ,Gm,X) is
isomorphic to the group of extensions of group schemes [Oort66, Cor. 17.5], which vanishes by
[SGA7, VIII Prop. 3.3.1].3 Hence we obtain the isomorphism from the spectral sequence [Mi80,
III Thm.1.22]
2 = H
fl(X, Ext
X(Gm,X ,GmX)) ⇒ Ext
X (Gm,X ,Gm,X).
Proof. (Theorem 1) Since the maps defined below are natural, we can assume that k is alge-
braically closed and X is connected. We can also assume that X is reduced, because H1et(X,Z)
H1et(X
red,Z), and the map PicX → PicXred has unipotent kernel and cokernel [Oort62, Cor. page
9]. It suffices to calculate Homk(Gm,k,PicX), because there are no homomorphisms from Gm to
commutative group schemes other than tori [Oort66, p. 81]. By Yoneda’s Lemma, the latter group
1In [Gei09] we used the étale topology
2This replaces [Gei09, Prop. 9 a)] which is incorrect as stated because the induction step in the proof does not
preserve the hypothesis on reducedness.
3This was claimed without proof in [Gei09].
THE AFFINE PART OF THE PICARD SCHEME (CORRECTED). 3
is isomorphic to the group of homomorphisms of sheaves on the fpqc site Homk(Gm,k, R
1p∗Gm,X).
The Leray spectral sequence
(3) E
2 = Ext
k(Gm,k, R
tp∗Gm,X) ⇒ Ext
X (Gm,X ,Gm,X).
gives an exact sequence
0 → Ext1k(Gm,k, p∗GmX) → Ext
X(Gm,X ,Gm,X) → Homk(Gm,k, R
1p∗Gm,X)
−→ Ext2k(Gm,k, p∗Gm,X).
By Proposition 4 the left term agrees with Ext1k(Gm,k,Gm,k), and this vanishes by [Oort66, Cor.
17.5]. Thus it suffices to show that δX is the zero map
4. Choose a closed point of X and let i :
Z → X be the corresponding closed subscheme. Since p∗ ◦ i∗ = id we have R
sp∗i∗ = R
s(p◦ i)∗ = 0
for s > 0. Hence we obtain a diagram
Homk(Gm,k, R
1p∗Gm,X) −−−−→ Homk(Gm,k, R
1p∗i∗Gm,Z) = 0
Ext2k(Gm,k, p∗Gm,X)
−−−−→ Ext2k(Gm,k,Gm).
By Proposition 4, the lower horizontal map is an isomorphism. ✷
Remark. The example in [Gei06, Prop. 8.2] shows that the map H iet(X̄,Z) → Ext
(Gm,Gm) is
not an isomorphism for i ≥ 2. One can ask if it is an isomorphism if one replaces H iet(X̄,Z) by
the eh-cohomology group H ieh(X̄,Z) of [Gei06].
Example. If X is the node over an algebraically closed field, then H1et(X,Z)
∼= Z, and TX ∼= Gm.
Let X be a node with non-rational tangent slopes at the singular point. Base changing to the
algebraic closure, one sees that H1et(X̄,Z)
∼= Z, with Galois group acting as multiplication by −1,
hence TX is an anisotropic torus.
Using the theorem, we are able to recover the torsion of TX , AX and the diagonalizable part of
NSX := PicX /Pic
0,red
X in terms of etale cohomology:
Corollary 6. Let X be proper over a perfect field k. Then we have canonical isomorphisms
H1et(X̄,Z)⊗Q/Z
∼= colimHomk̄(µm, TX);
Div(torH
et(X̄,Z))
∼= colimHomk̄(µm, AX);
et(X̄,Z)/Div
∼= colimHomk̄(µm, NSX).
Proof. Taking the colimit of the isomorphismH1et(X̄,Z/m)
∼= Homk̄(µm,PicX) of [Mi80, Prop.4.16]
or [Ray70, §6.2], we obtain H1et(X̄,Q/Z)
∼= colimHomk̄(µm,PicX). Since Ext
(Gm, TX) = 0, The-
orem 1 implies that Homk̄(µm, TX)
∼= Homk̄(Gm, TX)/m
∼= H1et(X̄,Z)/m. Consider the commu-
tative diagram:
colimHomk̄(µm, TX) H
et(X̄,Z)⊗Q/Z
colimHomk̄(µm,Pic
0,red
X ) −−−−→ colimHomk̄(µm,PicX) −−−−→ colimHomk̄(µm, NSX)
colimHomk̄(µm, AX)
−−−−→ torH
et(X̄,Z) −−−−→ coker f.
The middle column is the short exact coefficient sequence. The left column and middle row are
short exact because Ext1
(µm, TX) = Ext
(µm,Pic
0,red
X ) = 0 by [Oort66, Cor. 17.5, II 14.2]. A dia-
gram chase shows that f is injective, and the right vertical map is an isomorphism. The Corollary
4The remainder of the proof is a simplification suggested by O. Gabber.
4 THOMAS GEISSER
follows because colimHomk̄(µm, AX) is divisible and colimHomk̄(µm, NSX) is finite. ✷
The above result should be compared to [Gei10, Prop.6.2], where we show that, for every proper
scheme over an algebraically closed field, the higher Chow group of zero-cycles CH0(X, 1,Z/m)
is the Pontrjagin dual of H1et(X,Z/m). This implies a short exact sequence
0 → torA
X(k) → CH0(X, 1,Q/Z) → χ(TX)⊗Q/Z → 0,
for AtX the dual abelian variety of AX , and χ(TX) the character module of TX . However, in this
case the contribution from the torus and from the abelian variety are not compatible with the
coefficient sequence
0 → CH0(X, 1) ⊗Q/Z → CH0(X, 1,Q/Z) → torCH0(X) → 0
as in Corollary 6.
Looking at tangent spaces, the previous Corollary gives a dimension formula:
Corollary 7. Let l be a prime different from char k. Then
dimk H
1(X,OX ) = dimUX + dimk Lie(NS
X) + rankH
et(X,Z) +
corankl H
et(X̄,Ql/Zl).
3. The unipotent part
Let N Pic(X) := ker
Pic(X[t])
−→ Pic(X)
. Since t 7→ 0t induces x 7→ 0x on the typical part,
Pic(X[t])[1] is a subgroup of N Pic(X). In [Wei91], Weibel shows that for every scheme there is a
direct sum decomposition
Pic(X[t, t−1]) ∼= Pic(X) ⊕N Pic(X) ⊕N Pic(X)⊕H
et(X,Z).
Proof. (Theorem 2). We show first that N Pic(X) = ker
1) → UX(k)
. Since there are
no non-trivial morphisms of schemes from A1k to an abelian variety, a torus, an infinitesimal
group, or a discrete group, we see that the kernel of UX(A
k) → UX(k) agrees with the kernel of
PicX(A
k) → PicX(k). Let p : X → k and p
′ : X × A1k → A
k be the structure morphisms. Then
the Leray spectral sequence gives a commutative diagram
0 −−−−→ H1et(A
Gm) −−−−→ Pic(X × A
k) −−−−→ PicX(A
k) −−−−→ H
0 −−−−→ H1et(k, p∗Gm) −−−−→ Pic(X) −−−−→ PicX(k) −−−−→ H
et(k, p∗Gm),
and it suffices to show that the outer vertical maps are isomorphisms. Let X
−→ L → k be the
Stein factorization of p, such that OL ∼= g∗OX and L is the spectrum of an Artinian k-algebra.
Since A1k → k is flat, p
OX×A1
= OA1
⊗kp∗OX , and X×A
−→ A1L −→ A
k is the Stein factorization
of p′. We obtain
H iet(A
Gm) ∼= H
Gm) ∼= H
L,Gm),
and H iet(k, p∗Gm)
∼= H iet(L,Gm). Hence the terms on the left vanish because Pic(L) = Pic(A
0. To show that H2et(A
L,Gm) → H
et(L,Gm) is an isomorphism, we can assume that L is a local
Artinian k-algebra with (perfect) residue field k′. By [Mi80, III Rem.3.11] we are reduced to
showing that H2et(A
k′ ,Gm) → H
′,Gm) is an isomorphism, and this can be found in [Mi80, IV
Ex.2.20].
Given an element x of N Pic(X), the condition x ∈ Pic(X[t])[1] implies that the corresponding f ∈
HomSch(A
1, UX) satisfies f(nx) = nf(x) for all n. If k has characteristic 0, then UX ∼= G
a for some
THE AFFINE PART OF THE PICARD SCHEME (CORRECTED). 5
r, and the map f : Ga → UX corresponds to a morphism of Hopf algebras f
∗ : k[x1, · · · , xr] → k[t].
If f∗(xi) =
j ajt
j, then
aj(nt)
j = nf∗(xi) = f
∗(nxi) = n
only if nj = n for all n, hence j = 1. ✷
Example. If k has characteristic p, then t 7→ t2p−1 induces a map Ga → Ga which is compatible
with multiplication by n, but not a homomorphism of group schemes.
Corollary 8. We have UX = 0 if and only if N Pic(X) = 0.
Proof. This follows from N Pic(X) = ker
1) → UX(k)
, because any unipotent, connected,
smooth affine group is an affine space as a scheme, hence admits a non-trivial morphism from A1
which sends 0 to 0 if it is non-trivial. ✷
The kernel and cokernel of PicX → PicXred has been described in [Oort62], hence we will from
now assume that X is reduced. If X+ is the semi-normalization of X, then the map OX → OX+
is an injection of sheaves on the same topological space. For X+ reduced and semi-normal,
N Pic(X+) = 0 by Traverso’s theorem [Tra70] together with [Wei91, Thm. 4.7]. Hence the
Corollary implies that UX+ = 0, and that
UX = ker(Pic
0,red
X → Pic
0,red
(For curves, this recovers [BLR90, Prop.9.2/10].) Indeed, by Corollary 6, the map Pic
0,red
0,red
induces an isomorphism on the torus and abelian variety part, because it induces an
isomorphism on etale cohomology.
Proof. (Theorem 3) a) We have isomorphims H iet(X,Z)
∼= H iet(X
+,Z), which combined with
Corollary 6 shows that the canonical map Pic
0,red
X → Pic
0,red
induces an isomorphism on the torus
components, and is an isogeny with kernel a unipotent group scheme PX on the abelian variety
parts. 5 Hence the map is surjective and the kernel KX is an extension of PX by UX . Applying
the Proposition to the exact sequence of etale sheaves
0 → p∗Gm,X → p∗Gm,X+ → p∗(Gm,X+/Gm,X) → PicX → PicX+
on Spec k, we obtain the diagram with exact columns
0 −−−−→ KX −−−−→ Pic
0,red
X −−−−→ Pic
0,red
−−−−→ 0
0 −−−−→ p∗(Gm,X+/Gm,X) −−−−→ PicX −−−−→ PicX+
0 −−−−→ coker u −−−−→ NSX −−−−→ NSX+ .
Since v is injective, so is u. The Neron-Severi group schemes are extensions of finitely gener-
ated étale group schemes by a finite connected group scheme. The isomorphism H2et(X,µm)
H2et(X
′, µm) implies that PicX(k̄)/m → PicX+(k̄)/m is injective for any m prime to p, and since
0,red
X (k̄) and Pic
0,red
(k̄) are m-divisible, the same holds for NSX(k̄)/m → NSX+(k̄)/m, and
consequently for NSX [
] → NSX+ [
]. Thus coker u is contained in the extension of the p-primary
torsion subgroup NSX{p} by the finite connected group scheme NS
X . Finally, the isomorphism
5O.Gabber [Gab20] showed that, conversely, any finite unipotent commutative group scheme can appear as PX .
6 THOMAS GEISSER
Homk̄(µm,PicX)
∼= H1et(X̄,Z/m) from [Mi80, III Prop. 4.16] together with the isomorphism
H iet(X,Z)
∼= H iet(X
+,Z) and the result on KX shows that the three right maps in the diagram
induce isomorphisms on Homk̄(µm,−) for all m, hence the three groups on the left are unipotent.
b) Recall that q : XT → T , and consider the diagram
0 −−−−→ H1et(T, q∗Gm,X×T ) −−−−→ Pic(X × T ) −−−−→ PicX/k(T ) −−−−→ H
et(T, q∗Gm,X×T )
0 −−−−→ H1et(T, q∗Gm,X+×T ) −−−−→ Pic(X
+ × T ) −−−−→ PicX+/k(T ) −−−−→ H
et(T, q∗Gm,X+×T ).
Since q∗Ga,X×T = H
0(X,OX )⊗OT = H
0(X+,OX+)⊗OT = q∗Ga,X+×T is an isomorphism as in
Proposition 4a), the outer maps are isomorphisms, and it suffices to calculate ker r. Let Y = X×T
and Y ′ = X+ × T , and consider the tautological map
f : {OY -line bundles L ⊆ OY ′ which are invertible in OY ′} → Pic(Y ).
It suffices to show the following statements:
a) The image of f is contained in ker
Pic(Y ) → Pic(Y ′)
b) f surjects onto ker
Pic(Y ) → Pic(Y ′)
c) f is injective.
a) We claim that the map L ⊗OY OY ′ → OY ′ ⊗OY OY ′
−→ OY ′ is an isomorphism. We can check
this on an affine covering, and in this case it is proved in [RS93, Lemma 2.2(4)].
b) Let L ∈ Pic(Y ) with L⊗OY OY ′
∼= OY ′ . Since L is flat, we get an injection L = L⊗OY OY →
L⊗OY OY ′
∼= OY ′ . We claim that the inverse of L in OY ′ is the sheaf associated to the presheaf
U 7→ {x ∈ OY ′(U)|xL(U) ⊆ OY (U)} ⊆ OY ′(U). This can be checked on an affine covering, and
then it is [RS93, Lemma 2.2(2)].
c) Let L and L′ be subsheaves of OY ′ which are invertible in OY ′ and isomorphic as abstract
invertible sheaves. Multiplying with the inverse of L′ inside OY ′ , it suffices to show that if L is
a subsheaf of OY ′ , and f : OY → L an isomorphism, then L = OY ⊆ OY ′ . But f(1) is a global
unit of OY ′(Y ), and by Proposition 4a), OY (Y )
× = OY ′(Y )
×. Hence L = f(1)−1L = OY . ✷
References
[BLR90] S. Bosch, W. Lutkebohmert, M. Raynaud, Neron models, Ergebnisse der Mathematik und ihrer Grenzge-
biete (3), 21. Springer-Verlag.
[Chev60] C. Chevalley, Une demonstration d’un theoreme sur les groupes algebriques, J. Math. Pures Appl. (9) 39
(1960), 307–317.
[SGA3] M. Demazure, A. Grothendieck, eds. (1970). Séminaire de Géométrie Algébrique du Bois Marie - 1962-64
- Schémas en groupes - (SGA 3) - vol. 2, Lecture notes in mathematics 152. Springer-Verlag.
[Gab20] O. Gabber, Letter to the author Nov. 2020.
[Gei06] T. Geisser, Arithmetic cohomology over finite fields and special values of ζ-functions, Duke Math. J. 133
(2006), no. 1, 27–57.
[Gei09] T. Geisser, The affine part of the Picard scheme, Comp. Math. 145 (2009) 415–422.
[Gei10] T. Geisser, Duality via cycle complexes, Ann. of Math. (2) 172 (2010), 1095–1126.
[Gr62] A. Grothendieck, Technique de descente et theoremes d’existence en geometrie algebrique. VI. Les schemas
de Picard. Proprietes generales, Seminaire Bourbaki, 1961/62, no. 236.
[SGA7] A. Grothendieck, Séminaire de Géométrie Algébrique du Bois Marie - 1967-69 - Groupes de monodromie
en géométrie algébrique - (SGA 7) - vol. 1. Lecture Notes in Mathematics 288. Springer-Verlag.
[Mi80] J. S. Milne, Etale cohomology, Princeton Math. Series 33.
[Mu64] J. P. Murre, On contravariant functors from the category of pre-schemes over a field into the category of
abelian groups (with an application to the Picard functor), Inst. Hautes Etudes Sci. Publ. Math. No. 23
(1964) 5–43.
THE AFFINE PART OF THE PICARD SCHEME (CORRECTED). 7
[Oort62] F. Oort, Sur le schema de Picard, Bull. Soc. Math. France 90 (1962) 1–14.
[Oort66] F. Oort, Commutative group schemes, Lecture Notes in Mathematics 15, Springer-Verlag, Berlin-New York
(1966).
[Ray70] M. Raynaud, Specialisation du foncteur de Picard, Inst. Hautes Etudes Sci. Publ. Math. No. 38 (1970)
27–76.
[RS93] L. Roberts, B. Singh, Subintegrality, invertible modules and the Picard group, Compositio Math. 85 (1993),
no. 3, 249–279.
[Tra70] C. Traverso, Seminormality and Picard group, Ann. Scuola Norm. Sup. Pisa (3) 24 (1970), 585–595.
[Wei91] C. Weibel, Pic is a contracted functor, Invent. Math. 103 (1991), no. 2, 351–377.
Dep. of Math., Rikkyo University, Japan
	1. Introduction
	2. The torus
	3. The unipotent part
	References
ABSTRACT
  We describe the maximal torus and maximal unipotent subgroup of the Picard
variety of a proper scheme over a perfect field.

<|endoftext|><|startoftext|>
The few scales of nuclei and nuclear matter
A. Delfino a, T. Frederico b, V. S. Timóteo c, and Lauro Tomio d
aInstituto de F́ısica, Universidade Federal Fluminense, 24210-900 Niterói, RJ,
Brasil
bDepartamento de F́ısica, Instituto Tecnológico de Aeronáutica, CTA, 12228-900,
São José dos Campos, Brasil
cCentro Superior de Educação Tecnológica, Universidade Estadual de Campinas,
13484-370, Limeira, SP, Brasil
dInstituto de F́ısica Teórica, Universidade Estadual Paulista, 01405-900, São
Paulo, Brasil
Abstract
The well known correlations of low-energy three and four-nucleon observables with a
typical three-nucleon scale (e.g. the Tjon line) is extended to light nuclei and nuclear
matter. Evidence for the scaling between light nuclei binding energies and the tri-
ton one are pointed out. We show that the saturation energy and density of nuclear
matter are correlated to the triton binding. From the available systematic nuclear
matter calculations, we verify the existence of bands representing these correlations.
PACS 21.45.+v, 21.65.+f, 21.30.Fe
Key words: Scaling, nonrelativistic few-body systems, nonrelativistic nuclear
matter
Two-nucleon interactions are typically constructed to fit scattering data and
deuteron properties. When such interactions are used to calculate three-nucleon
observables, the results exhibit some discrepancies [1]. Basically, they are ex-
plained as originated from different strengths of the two-nucleon tensor force
and short-range repulsions, provided that all realistic two-nucleon interactions
have the correct one-pion exchange tail. In four-nucleon bound state (4He )
calculations the discrepancies still remain. But, at least, they are correlated,
as seen in the binding energies of 4He (Bα) and triton (Bt), which lie on a
very narrow band [2], obtained when the short-range repulsion of the nucleon-
nucleon interaction is varied while two-nucleon informations (deuteron and
scattering) are kept fixed. This correlation is known as Tjon line [2]. Bα and
Bt follows an almost straight line in the range of about 1-2 MeV of variation
Preprint submitted to Elsevier 4 November 2018
http://arxiv.org/abs/0704.0481v1
of the triton binding energy around the experimental value. As the long-range
two-nucleon scales we have the deuteron binding energy (Bd) and the singlet
virtual-state energy (Bv).
Two-body short-ranged interactions, supporting very low two-body binding
energy and/or large scattering lengths, when used to calculate three-body sys-
tems, approach what we call the universal Thomas-Efimov limit [3]. By trying
to find the range r0 of the two-nucleon force, Thomas [4] showed that when
r0 → 0, while the two-body binding energy B2 is kept fixed, the three-body
binding energy goes to infinity (Thomas collapse). Much latter, Efimov [5]
showed that, in the limit B2 = 0(r0 6= 0) the number of three-body bound
states is infinite with an accumulation point at the common two- and three-
body threshold. Note that both the Thomas and Efimov effects are claimed to
be model independent, since they are due to a dynamically generated effective
three-body potential acting at distances outside the range of the two-body
potential. These apparently different effects are related to the same scaling
mechanism, as shown in Ref. [3]. In other words, the Thomas effect appears
when r0 is much smaller than the size of the two-body system (which is of the
order of the scattering length |a|), while the Efimov effect arises for |a| >> r0.
Therefore, what matters for both effects is the same condition: |a| >> r0 or
the ratio |a|/r0 >> 1. In terms of the two-body energies this is translated
m|B2|/h̄
r0 << 1 (m the boson mass). One would expect that the
Thomas-Efimov effect is manifested in weakly-bound quantum few-body sys-
tems which are much larger in size than the corresponding two-body effective
range. Notice that a zero binding energy for a free two-body system is not
known in nature. But, nowadays it was shown that, for trapped ultracold
gases of certain atomic species, it is possible to adjust the two-body scattering
length at very large values, using Feshbach resonance techniques, by tuning the
external magnetic field [6]. In this case, it is expected that the Thomas-Efimov
effect can be manifested [7].
The deuteron and triton may be viewed as low energy systems with large size
scales in which the range of the potential is smaller than the corresponding
healing distances of the wave functions, leading the nucleons to have a high
probability to be outside of the interaction range. Then, the low-energy prop-
erties of these systems can be studied with models that minimally includes the
physics of the Thomas-Efimov effect, as in the case of a few-body model with
renormalized pairwise s-wave zero-range force [8]. This approach shows that
all the low-energy properties of the three-body system are well defined in the
model, once one three-body scale and the two-body low-energy observables are
given. As a consequence, correlations between two three-body s−wave observ-
ables are expected to appear in model calculations with short-ranged interac-
tions. Along this line, some previous works (see [9] and references therein) have
studied weakly-bound halo states in exotic nuclei as well as possible Efimov
states for He-He-Alkali molecules.
The scaling of three-nucleon observables with the triton binding energy corre-
sponds to universal behaviors found when a three-body scale is varied. For ex-
ample, the Phillips plot [10] of the neutron-deuteron doublet scattering length,
as a function of the triton energy is nowadays one of the universal scalings
found in the three-nucleon system [11,12]. In general, it is observed for nu-
clear and molecular weakly bound three-body systems [13,14,15] the scaling
of observables with the three-body binding energy.
The Thomas-collapse of the three-body energy in systems of maximum wave-
function symmetry implies the existence of a three nucleon scale (identified
with the triton binding energy) governing the short-range behavior of the
wave-function. Four nucleons can also form a state of maximum symmetry,
allowing in principle the collapse of such configuration, independently of the
three-nucleon collapse [8]. This is under discussion and it is suggested in [16]
that the four-body scale is not independent of the three-body one. However,
in their work it was introduced a three-body force to stabilize the shallowest
three-body state, against the variation of the cut-off. The three-body interac-
tion can be attractive or repulsive and their conclusion lies on the repulsive
sector. We note that the attractive part indicates a possible independent be-
havior of the four body ground-state energy from the three-body one. Certainly
this point merits further discussions and so far, we think, it is still open the
possibility of a four-body scale. Anyway, as the nucleon-nucleon interaction
is strongly repulsive at short range and therefore the probability of four nu-
cleons to be simultaneously in a volume ∼ r30 is quite small, presumably the
four-nucleon scale itself has much less opportunity to be evidenced in realistic
nuclear models. Indeed, this is indicated by the existence of the Tjon line. Due
to that, as we will see later, the four-nucleon binding energy is eliminated in
favor of the triton binding energy. In this respect it is worthwhile to note that
Platter et al. extended the effective field theory framework applied to four-
bosons [16] to calculate the 4He binding energy by controlling the triton energy
through a repulsive effective three-nucleon force. Within their approach [17]
the Tjon line is reproduced.
In a nuclear scenario dominated by an interaction with a range smaller than
the nucleon-nucleon scattering lengths, and considering the triton and 4He
nuclear sizes yet larger than the force range, the picture of nuclei would be of
a many-body system with the wave-function being an eigenfunction of the free
Hamiltonian almost everywhere. The Pauli principle allows only up to four nu-
cleons at the same position, forbidding certain particular configurations with
overlap of more particles. If more than four particles are allowed to overlap,
it would imply that the asymptotic information from the interaction of the
cluster would go beyond of those already fixed by the low-energy observables
of two, three and four nucleons. By some unknown reason the parameters
of Quantum Chromodynamics are close to this limit. It was conjectured in
Ref. [18] that a small change in the light quark masses away from their physi-
cal values could put the deuteron and the singlet virtual state at zero binding
energy, and therefore the above idealized picture of the nuclear systems could
not be far from reality. It is quite amazing thinking that nuclear wave func-
tions could heal much beyond the interaction range. Therefore, the details of
the long wavelength structure of nuclei are given by the free Hamiltonian and
by few-nucleon scales, which determine the wave function at short distances.
The universal behavior of the scaling functions are due to that.
If one wonders about the neutron matter within a non-relativistic quantum
framework, in the limit of a zero-range force, we could say that the only scale
in this case is the neutron-neutron scattering length. Therefore, the binding
energy of neutron droplets will be strongly correlated to that quantity, which
is the only physical scale in this situation allowed by the Pauli principle. This
discussion has been performed in the context of three neutron systems [19].
Moreover, it was concluded in Ref. [20] that stable tetra-neutron droplets
would imply a major change in the neutron-neutron scattering length.
Another example of the dominance of only two-body scale appears in three-
boson systems in two dimensions, where the Thomas-Efimov effect is ab-
sent [3]. In this case, only two-body low energy scales are enough to define the
many-body properties in the limit of a zero-range interaction. The low-energy
properties of a many-body system of spin-zero particles in two dimensions
will be sensitive only to the two-boson binding energy. Even in the case where
bosons are trapped, since the essential singularity of the point-like configura-
tion is not affected by the confining force as the harmonic one.
For the sake of generality, we start with the observables Bd, Bv, Bt and Bα
as the scales determining the asymptotic properties of nuclei [9]. Then, in the
limit of a zero-range interaction, we write the binding energy of a nucleus with
mass number A and isospin projection Iz, considering isospin breaking effects,
B(A, Iz) = A Bt B (βv, βd, βα, A, Iz) , (1)
where βa = Ba/Bt with a = v, d and α.
According to the Tjon line, βα remains approximately constant for a variety
of two-nucleon potentials and the parametrization of the numerical results,
given in MeV, for several two-nucleon potentials is
Bα = 4.72 (Bt − 2.48) , (2)
which for B
t = 8.48 MeV gives B
α = 28.32 MeV. Using (2) in (1),
R(A, Iz) = B(A, Iz)/A = Bt R (Bt, A, Iz) , (3)
where in the scaling function R(A, Iz) the values of Bd and Bv are fixed to
the experimental values. The dependence of Bα with Bt for realistic nucleon-
nucleon potentials is given by Eq. (2).
Equation (3) generalizes the concept of the Tjon line to nuclei. Recent calcu-
lations using the AV18 nucleon-nucleon potential plus three-body forces [21]
show that there is a systematic improvement of the binding energy results
for He, Li, Be and B isotopes simultaneously with the triton binding energy,
when models are tuned to fit Bt. It is important to note that these AV18
calculations have at least two three-body parameters that are fitted to Bt and
nuclear matter saturation properties. Consequently, one could argue that such
calculations cannot provide evidence for one-parameter correlation. The fit-
ting to nuclear matter calculation presumably is not that important for light
nuclei in view of the dominance of the triton binding (or three-body correla-
tions) in the four-nucleon bound state as given by the Tjon line. Therefore,
it is reasonable to think that three-body correlations are quite important for
light-nuclei, since that even for the alpha particle where the nucleons are in a
very compact configuration this occurs. In our opinion, the fitting of nuclear
matter saturation properties has more to do with the approximations done
in nuclear matter calculations. The three-body potential should be somewhat
tuned, which is, probably, not so important for light nuclei once the triton
binding attains its physical value.
For nuclear matter properties calculation using a variety of two-nucleon poten-
tials, in which the tensor strength was varied but the deuteron binding energy
was kept fixed, it was shown that these interactions cannot quantitatively ac-
count for nuclear saturation [22,23]. Coester et al. [22] observed that, in an
energy versus density plot, the saturation points of nuclear matter obtained
by employing different realistic potentials are located along a band (Coester
band). Also, in a relativistic framework it was observed such strong correla-
tion [24]. The displayed nuclear matter binding energy (BA/A ≡ B(A,0)/A)
versus saturation density [ ρo = (2/3)k
2, with kF the Fermi momentum]
results are within a narrow band [22]. The observation given in Ref. [22] have
been studied by many other authors that have used nuclear matter binding
energies and saturation densities from different two-nucleon interactions. The
main argumentation, as also in the case of three-nucleon calculations, is that
this effect comes from different strengths of the two-nucleon tensor force and
short range repulsion, which changes the triton binding energy, while keeping
fixed the low-energy two-body scales.
Basically, nuclear matter saturates due to the composed repulsive and attrac-
tive short-range two-nucleon potential. Since, it may also be seen as a typical
low-energy problem, it is natural to question whether any connection exists
30 40 50 60 70
 [ MeV ]
Fig. 1. Infinite nuclear matter binding energy as a function of EF extracted from
Ref. [25] (solid circles and squares). The squares includes the single particle contri-
bution in the continuum. The full triangle is given by the empirical values.
between the proper few-body scales, Bd , Bv and Bt with those of the many-
body problem, like the BA/A and the Fermi energy EF = h̄
2k2F/(2mN) . For
light nuclei there is strong evidences of scaling between Bd , Bv and Bt as
expressed by Eq. (3).
Here we are arguing that the scales of nuclear matter, BA/A and EF , are
determined by Bd , Bv and Bt . Therefore, we suppose that going to the
infinite isospin symmetrical nuclear matter, A → ∞ and Iz = 0, the limit
B (βv, βd, βα, A, Iz = 0)
=Bt G (βv, βd, βα) , (4)
is well defined and expresses the correlation between the binding energy of the
nucleon in nuclear matter with the few-nucleon scales. The Fermi energy
EF = Bt EF (βv, βd, βα) , (5)
will be correlated as well to the few-nucleon binding energies.
The aim of this work is to study the possible correlation of the nuclear matter
binding energy per nucleon with Bd , Bv and Bt , in order to improve our
understanding of the general and important scaling of observables. Our inves-
tigation is based on Eqs. (4) and (5), motivated by our previous discussion
that leads to Eq. (1) and in the several works that recognize the role played by
the low-energy few-body scales [8,9,13,12,14,15], in defining the observables of
few-nucleon systems.
In the present framework, the universal scaling functions connect the proper
-9 -8.5 -8 -7.5 -7 -6.5 -6
 [MeV]
Fig. 2. BA/A as a function of Bt extracted from Ref. [25] (solid circles and squares).
The squares includes the single particle contribution in the continuum. The full
triangle is given by the empirical values.
scales of the few-body system with those of the many-body system, as given by
Eqs. (4) and (5). Different potentials, which describe the deuteron and the two-
nucleon scattering properties give different values of Bt , Bα BA/A and EF .
As we have done in deriving Eq. (3) for a class of changes in the short-range
part of the nuclear force that keeps the deuteron and low energy scattering
properties unchanged, and taking into account that for these variations of
the potential the 4He and triton binding energies are strongly correlated as
given by the Tjon line, one can rewrite Eqs. (4) and (5) in order to get a one
parameter scaling:
=Bt G (Bt) (6)
for fixed Bd and Bv, where the only true dependence in the class of potential
variations is dominated by Bt. The analogous expression for the Fermi energy
EF = Bt EF (Bt) , (7)
where EF scale with Bt.
In the perspective of the one parameter functions of Eqs. (6) and (7), it is
clear that one could express Et as a function of EF and immediately get
= C(EF ), (8)
-9 -8.5 -8 -7.5 -7 -6.5 -6
 [MeV]
Fig. 3. EF as a function of Bt extracted from Ref. [25] (solid circles and squares).
The squares includes the single particle contribution in the continuum. The full
triangle is given by the empirical values.
the correlation implied by the Coester band.
In order to enlighten our discussion we bring a variety of nuclear matter bind-
ing energies BA/A at the corresponding saturation density, represented by
the Fermi momenta kF , calculated from different two-nucleon potentials. In
Fig. 1, we present the well known Coester band in which the results for BA/A
and EF are showed. The two distinct bands represent the nuclear matter cal-
culations with and without the single-particle continuum contributions. The
empirical values are BA/A = 16 MeV and EF = 37.8 MeV from [27].
The correlation of BA/A with Bt expressed by (6) is plotted in Fig. 2, for
the same set of results given in Fig. 1. The two-nucleon potentials present
different values for the triton binding energy Bt , while the two-nucleon low-
energy observables are fixed. We observe that the scaling function EtG(Et)
is quite linear in the interval of about 2 MeV including the triton binding
energy. We note that the ratio BA/A depends strongly on Bt. We understand
this fact as a reminiscent manifestation of the three-body scale in the nuclear
matter results obtained with only two-body correlations. In Fig. 3, we show the
correlation between the Fermi energy and the triton binding energy. In general,
we observe that the increase in the three-body scale leads to the increase of the
Fermi energy, which is reasonable in view of the scaling function (7). However,
we observe that the empirical values disagrees with the general trend of the
correlation, a problem that could already be anticipated by looking at the
Coester band in Fig. 1.
The inclusion of three-body correlations, which carries the dynamics that sta-
bilizes the Thomas collapse, presumably has a repulsive effect diminishing the
saturation density and somewhat the nuclear matter binding. Due the short-
range repulsion of the nucleon-nucleon interaction, the nuclear matter tends
to saturate at large densities if only two-body correlations are considered and
the empirical binding is achieved. The dynamically generated three-body sta-
bilization mechanism carried only through the three-body correlations should
appear in the nucleon-nucleon interaction range, however such repulsive con-
tribution is absent if only two-body correlations are considered in the evalu-
ation of nuclear matter properties. Therefore, we suspect that the inclusion
of three-body correlations in nuclear matter calculations will bring the corre-
lation curve of the BA/A with Bt in Fig. 2 toward the empirical values; and
also in Fig. 3, where the saturation densities would be possibly found at lower
values for a given triton binding energy.
Recent sophisticated many-body calculations [26], where the triton binding en-
ergy and nuclear matter saturation density were adjusted through two three-
body parameters make subtle the clear appreciation of the real role of the
three-body correlations, once the fit of these realistic forces mixes differences
that come from the interaction and from the many-body approximations in nu-
clear matter calculations. It is beyond our work the discussion of the quite in-
volved many-body approximations needed to perform such calculations. Nev-
ertheless, our conjecture is obviously based on qualitative arguments, which
implies that just one three-body scale parameter is relevant for a systematic
description of light nuclei and nuclear matter.
The one parameter dependence in Eqs. (6) and (7) suggests to plot the dimen-
sionless quantity BA/(ABt) as a function of the ratio EF/Bt, which should look
as an almost linear correlation. We display in Fig. 4 the values for BA/(A Bt)
versus EF/Bt . As we could anticipate, the results show a clear linear cor-
relation. We are tempted to say that, if the correlation is extrapolated and
assuming that the binding and saturation densities somewhat decreases when
three-body correlations are considered, it looks to be possible that the empir-
ical values would be consistent with the correlation band.
In summary, we suggest for the first time a possible scaling of nuclei asymp-
totic properties, in particular the nuclear binding energies with the triton
binding energy, substantiated by recent realistic calculations of light nuclei.
This observation generalizes to the many-nucleon context the correlations be-
tween observables found in the three and four-nucleon systems. Beyond that,
we found that the original correlation between the nuclear matter binding en-
ergy per nucleon with the Fermi momentum described by the Coester band
can now be seen as robustly represented by the scaling of nuclear matter prop-
erties with the triton binding energy. The values of Bt carry different aspects
of the used two- and three-nucleon potentials. To verify the extension of our
conjecture, we propose that one could control the strength of the three-body
force without a many-body parameter in order produce different triton binding
4 5 6 7 8 9
Fig. 4. Infinite nuclear matter binding energy as a function of EF , both in units
of the triton binding energy. The calculation results are extracted from Ref. [25]
(solid circles and squares). The squares includes the single particle contribution in
the continuum. The full triangle represents the empirical values.
energies and nuclear matter properties. We emphasizing that the nuclear mat-
ter results should be obtained within the same approximations for all models.
In this way, we expect that the plotted results in figures 2 and 3 should lie in
a very narrow band.
Our discussion may turn in a simple way to systematize results of possible
forthcoming realistic calculations for many-nucleon systems. Consistent with
our conclusions, it was argued in Ref. [18] that QCD implies that nuclear
physics is close to the Thomas-Efimov limit. In this reference, it was conjec-
tured that a small change in the light quark masses away from their physical
values could be enough to move the deuteron and the singlet virtual state to
zero binding energies. Therefore, nuclear physics could be dominated by long-
range universal effective forces making the scalings not only a subtlety but
also an evident reality in ab-initio non-relativistic nuclear model calculations.
Acknowledgments. This work was partially supported by Fundação de Amparo
à Pesquisa do Estado de São Paulo and Conselho Nacional de Desenvolvimento
Cient́ıfico e Tecnológico. V. S. T. would like to thank FAEPEX/UNICAMP
for partial financial support.
References
[1] W. Glöckle, H. Witala, D. Huber, H. Kamada, J. Golak, Phys. Rep. 274 (1996)
[2] J.A. Tjon, Phys. Lett. B 56 (1975) 217; R.E. Perne and H. Kroeger, Phys. Rev.
C 20 (1979) 340; J.A. Tjon, Nucl. Phys. A 353 (1981) 470.
[3] S.K. Adhikari, A. Delfino, T. Frederico, I.D. Goldman and L. Tomio, Phys.
Rev. A 37 (1988) 3666; S.K. Adhikari, A. Delfino, T. Frederico and L. Tomio,
Phys. Rev. A 47 (1993) 1093. See also A. Delfino and E.F. Redish in the
“Book of Contributions, XIII International Conference on Few-Body Problems
in Physics”, pg. 304, Edited by I.R. Afnan and R.T. Cahill (1992).
[4] L.H. Thomas Phys. Rev. 47 (1935) 903.
[5] V. Efimov, Phys. Lett. B 33 (1970) 563; Nucl. Phys. A 362 (1981) 45.
[6] J.L. Roberts, N.R. Claussen, S.L. Cornish, and C.E. Wieman, Phys. Rev. Lett.
85 (2000) 728.
[7] D. Blume and C.H. Greene, Phys. Rev. A 66 (2002) 013601; S. Jonsell, H.
Heiselberg and C.J. Pethick, Phys. Rev. Lett. 89 (2002) 250401.
[8] S.K. Adhikari, T. Frederico, and I.D. Goldman, Phys. Rev. Lett. 74 (1995) 487;
S.K. Adhikari and T. Frederico, Phys. Rev. Lett. 74 (1995) 4572; T. Frederico,
A. Delfino, and L. Tomio, Phys. Lett. B 481 (2000) 143.
[9] T. Frederico, L. Tomio, A. Delfino, and A.E.A. Amorim, Phys. Rev. A 60 (1999)
R9; A. Delfino, T. Frederico, and L. Tomio, Few-Body Systems 28 (2000) 259;
A. Delfino, T. Frederico, and L. Tomio, J. Chem. Phys. 113 (2000) 7874.
[10] A.C. Phillips, Nucl. Phys. A 107 (1968) 109.
[11] T. Frederico, I.D. Goldman, Phys. Rev. C 36 (1987) 1661; T. Frederico, I.D.
Goldman, S.K. Adhikari, Phys. Rev. C 37 (1988) 949.
[12] P.F. Bedaque, U. van Kolck, Ann. Rev. Nucl. Part. Sci. 52 (2002) 339.
[13] E. Nielsen, D.V. Fedorov, A.S. Jensen, E. Garrido, Phys. Rep. 347 (2001) 374.
[14] A.S. Jensen, K. Riisager, D.V. Fedorov, E. Garrido, Rev. Mod. Phys. 76 (2004)
[15] E. Braaten, H.-W. Hammer, cond-mat/0410417.
[16] L. Platter, H.-W. Hammer, U.-G. Meissner, Phys. Rev. A 70 (2004) 052101.
[17] L. Platter, H.-W. Hammer, U.-G. Meissner, Phys. Lett. B 607 (2005) 254.
[18] E. Braaten and H.-W. Hammer, Phys. Rev. Lett. 91 (2003) 102002.
[19] A. Delfino, and T. Frederico, Phys. Rev. C 53 (1996) 62.
[20] S.C. Pieper, Phys. Rev. Lett. 90 (2003) 252501.
[21] S.C. Pieper and R.B. Wiringa, Ann. Rev. Nucl. Part. Sci. 51 (2001) 53; R.B.
Wiringa, S.C. Pieper, Phys. Rev. Lett. 89 (2002) 182501; S.C. Pieper, K. Varga,
R.B. Wiringa, Phys. Rev. C 66 (2002) 044310.
[22] F. Coester, S. Cohen, B.D. Day, and C.M. Vincent, Phys. Rev. C 1 (1970) 769.
[23] B.D. Day, Rev. Mod. Phys. 50 (1978) 495; B.D. Day, Phys. Rev. Lett. 47 (1981)
http://arxiv.org/abs/cond-mat/0410417
[24] A. Delfino, M. Malheiro, V.S. Timóteo, J.S. Sá Martins, Braz. J. Phys. 35
(2005) 190.
[25] R. Machleidt, Adv. Nucl. Phys. 19 (1989) 189.
[26] A. Akmal, V. R. Pandharipande, Phys. Rev. C 56 (1997) 2261; J. Morales Jr.,
V. R. Pandharipande, D. G. Ravenhall, Phys. Rev. C 66 (2002) 054308.
[27] R. J.Furnstahl, nucl-th/0504043.
http://arxiv.org/abs/nucl-th/0504043
	References
ABSTRACT
  The well-known correlations of low-energy three and four-nucleon observables
with a typical three-nucleon scale (e.g., the Tjon line) is extended to light
nuclei and nuclear matter. Evidence for the scaling between light nuclei
binding energies and the triton one are pointed out. We argue that the
saturation energy and density of nuclear matter are correlated to the triton
binding energy. The available systematic nuclear matter calculations indicate a
possible band structure representing these correlations.

<|endoftext|><|startoftext|>
Implementation of holonomic quantum computation through engineering and
manipulating environment
Zhang-qi Yin, Fu-li Li,∗ and Peng Peng
Department of Applied Physics, Xi’an Jiaotong University, Xi’an 710049, China
We consider an atom-field coupled system, in which two pairs of four-level atoms are respectively
driven by laser fields and trapped in two distant cavities that are connected by an optical fiber. First,
we show that an effective squeezing reservoir can be engineered under appropriate conditions. Then,
we show that a two-qubit geometric CPHASE gate between the atoms in the two cavities can be
implemented through adiabatically manipulating the engineered reservoir along a closed loop. This
scheme that combines engineering environment with decoherence-free space and geometric phase
quantum computation together has the remarkable feature: a CPHASE gate with arbitrary phase
shift is implemented by simply changing the strength and relative phase of the driving fields.
PACS numbers: 03.67Lx, 03.65.Vf, 03.65.Yz, 42.50.Dv
I. INTRODUCTION
Quantum computation, attracting much current inter-
est since Shor’s algorithm [1] was proposed, depends on
two key factors: quantum entanglement and precision
control of quantum systems. Unfortunately, quantum
systems are inevitably coupled to their environment so
that entanglement is too fragile to be retained. This
makes the realization of quantum computation extremely
difficult in the real world. In order to overcome this dif-
ficulty, one proposed the decoherence-free space concept
[2, 3]. It is found that when qubits involved in quantum
computation collectively interact with a same environ-
ment there exists a “protected” subspace in the entire
Hilbert space, in which the qubits are immune to the
decoherence effects induced by the environment. This
subspace is called decoherence-free space (DFS). To per-
form quantum computation in a DFS, one has to design
the specific Hamiltonian containing controlling parame-
ters, which eigenspace is spanned by DFS states and the
state-unitary manipulation related to quantum compu-
tation goal is implemented by changing the controlling
parameters [4].
As well known, instantaneous eigenstates of a quantum
system with the time-dependent Hamiltonian may ac-
quire a geometric phase when the time-dependent param-
eters adiabatically undergo a closed loop in the parameter
space [5]. The phase depends only on the swept solid an-
gle by the parameter vector in the parameter space. This
feature can be utilized to implement geometric quantum
computation (GQC) which is resilient to stochastic con-
trol errors [6, 7, 8]. On combining the DFS approach
with the GQC scheme, one may build quantum gates
which may be immune to both the environment-induced
decoherence effects and the control-led errors [9]. In the
scheme, quantum logical bits are represented by degener-
ate eigenstates of the parameterized Hamiltonian. These
∗Email: flli@mail.xjtu.edu.cn
states have the features: they belong to DFS, and unitar-
ily evolve in time and acquire a geometric phase when the
controlling parameters adiabatically vary and undergo a
closed loop.
In the recent paper [10], Carollo and coworkers showed
that a cascade three-level atom interacting with a broad-
band squeezed vacuum bosonic bath can be prepared in a
state which is decoupled to the environment. This state
depends on the reservoir parameters such as squeezing
degree and phase angle. As the squeezing parameters
smoothly vary, the atomic state can unitarily evolve in
time and always be in the manifold of the DFS. More-
over, after a cyclic evolution of the squeezing parameters,
the state acquires a geometric phase. This investigation
has been generalized to cases where both quantum sys-
tems and manipulated reservoir under consideration are
not restricted to cascade three-level atoms and squeezed
vacuum [11]. These results strongly inspire us that in-
stead of engineering Hamiltonian one may implement the
decoherence-free GQC by engineering and manipulating
reservoir.
In this paper, we propose a scheme in which the
quantum-reservoir engineering [12, 13, 14] is combined
with DFS and Berry phase together to realize a two-
qubit CPHASE gate [15]. We show that atomic states
can unitarily evolve in time in a DFS if the change rate
of reservoir parameters is much smaller than the charac-
teristic relaxation time of an atom-reservoir coupled sys-
tem. Moreover, we find that as the reservoir parameters
adiabatically change in time along an appropriate closed
loop, the atomic state in the DFS acquires a Berry phase
and a CPHASE gate with arbitrary phase shift can be
realized. To our knowledge, it is the first proposal for the
realization of quantum gates by engineering and steering
the environment.
This paper is organized as follows. In Sec. II, we in-
troduce a cavity-atom coupling model in which two pairs
of four-level atoms are respectively trapped in two dis-
tant cavities that are connected by an optical fiber. In
the model, each of pairs of the atoms are simultaneously
driven by laser fields and coupled to the local cavity
http://arxiv.org/abs/0704.0482v3
mailto:flli@mail.xjtu.edu.cn
modes through the double Raman transition configura-
tion. Under large detuning and bad cavity limits, we
investigate to engineer an effective broadband squeezing
reservoir for the atoms. In Sec. III, we analyze how to re-
alize controlling gates between the atoms trapped in the
two cavities by steering the squeezing reservoir. Section
IV contains conclusions of our investigations.
laserlaser
fibre
FIG. 1. Atom-field coupling scheme.
|gjn〉
|rjn〉
|sjn〉
δjn|ejn〉
FIG. 2. Atomic level configuration for atom j in cavity
II. ENGINEERING A SQUEEZING
ENVIRONMENT AND GENERATING A
DECOHERENCE-FREE SUBSPACE
Our scheme is shown in Fig.1. A pair of four-level
atoms are trapped in each of two distant cavities, respec-
tively, which are connected through an optical fiber. In
the short fiber limit [16, 17, 18], only one fiber mode b
is excited and coupled to cavity modes a1 and a2 with
strength ν [19]. We assume that the cavity modes and
the fiber mode have the same frequency ω. The level
scheme of atoms is shown in Fig.2. Atom j in cavity n
is labeled by the index jn with j, n = 1, 2. The distance
between the atoms in the same cavity is assumed to be
large enough that there is no direct interaction between
the atoms. The levels |gjn〉 and |ejn〉 of atom j in cavity
n, with j, n = 1, 2 are stable with a long life time. The
energy of the level |gjn〉 is taken to be zero as the energy
reference point. The lower lying level |ejn〉, and upper
levels |rjn〉 and |sjn〉 have the energy δjn, and ωrjn and
ωsjn, respectively, in the unit with ~ = 1. Transitions
|gjn〉 ↔ |sjn〉 and |ejn〉 ↔ |rjn〉 are driven by laser fields
of frequencies ωLsjn and ω
jn with Rabi frequencies Ω
and Ωrjn and relative phase ϕ, respectively. Transitions
|gjn〉 ↔ |rjn〉 and |ejn〉 ↔ |sjn〉 are coupled to the cav-
ity mode an with the strengths g
jn and g
jn, respectively.
Here, we set ∆rjn = ω
jn − ω = ωrjn − ω
jn − δjn, and
∆sjn = ω
jn − ω − δjn = ωsjn − ω
Under the Markovian approximation, the master equa-
tion of the density matrix for the whole system under
consideration can be written as [14]
ρ̇T = −i[H, ρT ] + Lcav1ρT + Lcav2ρT + LfiberρT , (1)
where H = H0 +Hd +Hac +Hcf with
j,n=1
ωrjn|rjn〉〈rjn|+ ωsjn|sjn〉〈sjn|+ δjn|ejn〉〈ejn|)
a†nan + b
j,n=1
(Ωsjn
t|sjn〉〈gjn|
e−i(ω
t+ϕ)|rjn〉〈ejn|+H.c.
Hac =
j,n=1
(grjn|rjn〉〈gjn|an + gsjn|sjn〉〈ejn|an +H.c.),
Hcf =ν
1 + a
2) + H.c.
Here, H0 is the free energy of atoms and cavity fields,
Hd is the interaction energy between the atoms and laser
fields, Hac is the interaction energy between the atoms
and the cavity fields, and Hcf describes the interaction
between the cavity modes and the fiber mode. The last
three terms in (1) describe the relaxation processes of
the cavity and fibre modes in the usual vacuum reservoir,
taking the forms
LcavnρT =κn(2anρTa
n − a†nanρT − ρT a†nan),
LfiberρT =κf (2bρT b
† − b†bρT − ρT b†b),
where κn is the leakage rate of photons from cavity n,
and κf is the decay rate of the fiber mode.
Let’s introduce collective basis: |a〉n = (|g1n〉|e2n〉 −
|e1n〉|g2n〉)/
2, | − 1〉n = |g1n〉|g2n〉, |0〉n = (|g1n〉|e2n〉+
|e1n〉|g2n〉)/
2, |1〉n = |e1n〉|e2n〉. The states |a〉n and
| − 1〉n are taken as a qubit n for quantum computation.
In the large detuning limit, adiabatically eliminating the
excited states and setting
= βrn and
= βsn, from (2), we obtain the effective
interaction Hamiltonian
Heff =
iϕS+n + β
nSn) +H.c.
+Hcf , (4)
where S+n = |0〉nn〈−1|+|1〉nn〈0|. In the derivation of (4),
we have assumed the resonant condition
〈a†nan〉 +
〈a†nan〉+ δ′jn. In order to satisfy the
condition with the flexible choice of Ωrjn, Ω
jn, ∆
jn and
∆sjn, we have introduced additional ac-Stark shifts δ
jn to
states |gjn〉, which can be generated by using a laser field
to couple the level |gjn〉 to an ancillary level.
We now introduce three normal modes c and c± with
frequencies ω and ω±
2ν by use of the unitary transfor-
mation a1 =
(c+ + c− +
2c), a2 =
(c+ + c− −
b = 1√
(c+ − c−) [17, 18]. In the limit ν ≫ |βrj |, |βsj |,
neglecting the far off-resonant modes c± and setting
1 = −β
2 = β
p with p = r, s, we can approximately
write the effective Hamiltonian (4) as
Heff = (β
reiϕS+ + βsS)c+H.c., (5)
where S+ = S+1 + S
Since the modes c± are nearly not excited and decou-
pled with the resonant mode c, the fiber mode b is mostly
in the vacuum state, therefore, LfiberρT can be neglected,
and Lcav1ρT + Lcav2ρT can be approximated as
LcavρT = κ(2cρT c
† − c†cρT − ρT c†c), (6)
where κ = (κ1 + κ2)/2.
In the bad cavity limit, κ≫ β, adiabatically eliminat-
ing the mode c [12, 14], from Eq. (1) with the replace-
ment of the Hamiltonian (2) and the relaxation terms (3)
by the effective Hamiltonian (5) and the relaxation term
(6), respectively, we can obtain the master equation for
the density matrix of the atoms
ρ̇ = −Γ
(R+Rρ+ ρR+R− 2RρR+), (7)
where ρ = Trf (ρT ), R = S cosh r + e
iϕS† sinh r, r =
cosh−1(βr/
βr2 − βs2) and Γ = 2(βr2−βs2)/κ. Eq. (7)
describes the collective interaction of two cascade three-
level atoms with the effective squeezed vacuum reservoir
[10]. The parameters βr, βs and ϕ are easily changed and
controlled at will by varying the strength and phase of the
driving lasers [8]. We will show that a geometric phase
gate can be realized through changing these parameters.
The DFS of the atomic system is spanned by the
states which satisfy the equation R(r, ϕ)|ψDF(r, ϕ)〉 = 0
[10]. In terms of basis states |e1〉 = |a〉1|a〉2, |e2〉 =
|a〉1| − 1〉2, |e3〉 = | − 1〉1|a〉2, |e4〉 = |a〉1|0〉2, |e5〉 =
|0〉1|a〉2, |e6〉 = |a〉1|1〉2, |e7〉 = |1〉1|a〉2, |e8〉 =
|1〉1|1〉2, |e9〉 = 1√2 (|1〉1|0〉2 + |0〉1|1〉2), |e10〉 = | −
1〉1| − 1〉2, |e11〉 = 1√2 (|0〉1| − 1〉2 + | − 1〉1|0〉2), |e12〉 =
(|1〉1|−1〉2+ |−1〉1|1〉2)+ 2√6 |0〉1|0〉2), the DFS states
can be written as
|ψDF(r, ϕ)〉1 =|e1〉,
|ψDF(r, ϕ)〉j =
cosh r√
cosh 2r
|ej〉 − eiϕ
sinh r√
cosh 2r
|ej+4〉, j = 2, 3,
|ψDF(r, ϕ)〉4 =
e2iϕ(tanh r)2|e8〉 −
eiϕ tanh r|e12〉+ |e10〉
(tanh r)4 + 2
(tanh r)2 + 1
Let’s introduce a unitary transformation O(r, ϕ) by
|φi〉 =
i=1Oij(r, ϕ)|ej〉, where |φi〉 = |ψDF〉i for i =
1, 2, 3, 4. For the transformed density matrix ρ̄ = O†ρO,
we have
= i[G, ρ̄] +O†
O, (9)
where G(r, ϕ) = iO† dO
= iO†[ṙ dO
+ ϕ̇dO
]. To solve
Eq. (9) in the DFS, let’s define the time-independent
projector Π(0) = O†Π(r, ϕ)O =
†|φi〉〈φi|O =
j=1 |ej〉〈ej | + |e10〉〈e10| onto the DFS. From (9), we
obtain the equation of motion for ρ̄DF = Π(0)ρ̄Π(0)
dρ̄DF
=i[GDF, ρ̄DF] + iΠ(0)GΠ⊥(0)ρ̄Π(0)
− iΠ(0)ρ̄Π⊥(0)GΠ(0) + Π(0)O†
OΠ(0),
where Π⊥(0) = 1− Π(0) and GDF = Π(0)GΠ(0). In the
limit of ṙ, ϕ̇≪ Γ, the last three terms in Eq. (10) can be
neglected [11]. In this way, Eq. (10) is reduced to
dρ̄DF
= i[GDF, ρ̄DF]. (11)
Therefore, in the frame dragged adiabatically by the
reservoir, the state of the atoms in the DFS unitarily
evolves in time.
III. REALIZING CONTROLLING PHASE
GATES THROUGH MANIPULATING THE
SQUEEZING ENVIRONMENT
In this section, we investigate how to realize a
CPHASE gate through manipulating the engineered
reservoir. Suppose that at the initial time the laser
field driving the transition |g〉 ↔ |s〉 is switched off
but the laser field driving the transition |r〉 ↔ |e〉 is
switched on and the atoms are in the DFS state |Ψ(0)〉a =
(|a〉1|a〉2 + |a〉1| − 1〉2 + | − 1〉1|a〉2 + | − 1〉1| − 1〉2) =
j=1 |ψDF(0, 0)〉j/2. To generate a geometric phase for
the atomic state, we smoothly change the parameters of
the engineered reservoir along a closed loop, which is di-
vided into the following three steps: (1) From time 0 to
T1, hold on ϕ = 0, and adiabatically increase the parame-
ter r from 0 to r0; (2) From time T1 to T2, hold on r = r0,
and adiabatically change the phase ϕ from 0 to ϕ0; (3)
From time T2 to T3, hold on ϕ = ϕ0, and adiabatically
decrease r from r0 to 0. When the cyclic evolution ends,
the atomic state becomes
|Ψ(T3)〉a =
(|e1〉+ eiχ1 |e2〉+ eiχ1 |e3〉+ eiχ12 |e10〉),
where geometric phases χ1 = −ν1ϕ0, χ12 = −ν12ϕ0 with
sinh2 r0
sinh2 r0+cosh
, ν12 =
2 tanh4 r0+
tanh2 r0
tanh4 r0+
tanh2 r0+1
. By per-
forming local transformations U1 = e
−iχ1 |−1〉11〈−1| and
U2 = e
−iχ1 | − 1〉22〈−1|, the state (12) can be written as
|Ψ′(T3)〉a = U1U2|Ψ(T3)〉a = 12 (|a〉1|a〉2 + |a〉1| − 1〉2 +
| − 1〉1|a〉2 + ei∆| − 1〉1| − 1〉2), where ∆ = χ12 − 2χ1 =
(2ν1 − ν12)ϕ0. Thus, the CPHASE gate with the phase
shift ∆ is realized. If both the atoms in cavity 1 and
the atoms in cavity 2 “see” different environments, |χ12|
must be equal to |2χ1| and ∆ = 0. Therefore, the phase
shift ∆ results from the collective coupling of the atoms
in both cavities with the same engineered environment.
If r0 = atanh(
4/3− 1) ≃ 0.4157, |ν12| = |ν1|. Under
this condition with ϕ0 = π/ν1, the state of the atoms at
the time T3 is |Ψ′′(T3)〉a = − 12 (−|a〉1|a〉2 + |a〉1| − 1〉2 +
| − 1〉1|a〉2 + | − 1〉1| − 1〉2). In this case, the Controlled-
Z gate between the two qubits is realized without local
transformations.
0 0.2 0.4 0.6 0.8
0.9965
0.997
0.9975
0.998
0.9985
0.999
0.9995
T=100/Γ
T=200/Γ
T=400/Γ
FIG. 3. Fidelity Fr of the atomic state.
0 0.2 0.4 0.6 0.8
0.965
0.975
0.985
0.995
T = 200/Γ
T = 400/Γ
T = 1000/Γ
FIG. 4. Fidelity Fp of the atomic state.
The above results depend on the adiabatical approxi-
mation. To check the adiabatical condition, we numer-
ically simulate the following two examples. In the first
example, we suppose that at the initial time the atoms
are in the state |Ψ1〉a = (|a〉1|a〉2+ |ψDF(0, 0)〉2)/
2 and
the laser field driving the transition |e〉 ↔ |r〉 are turned
on. Then, by slowly switching the laser field driving the
transition |g〉 ↔ |s〉, we increase the parameter r from 0
to r0 according to the linear function r(t) = r0t/T . In the
adiabatical limit (T ≫ Γ−1), the atomic state becomes
|Ψ′1〉a = (|a〉1|a〉2 + |ψDF(r0, 0)〉2)/
2 at the time T . On
the other hand, in the Hilbert space spanned by the ba-
sis states {|ei〉} for i = 1, 2, · · · , 12 , we can numerically
solve Eq. (7) and obtain the density matrix ρ1(T ) of the
atoms. Let’s define Fr = a〈Ψ′1|ρ1(T )|Ψ′1〉a as the fidelity
for this process. As shown in Fig. 3, if T > 100/Γ, Fr is
always bigger than 0.997 if r ∈ (0, 0.8), corresponding to
the almost perfect evolution.
In the second example, we suppose that the atoms are
initially in the state |Ψ2〉a = (|a〉1|a〉2+ |ψDF(r, 0)〉4)/
and all the driving fields are turned on to hold the pa-
rameters r = r0 and ϕ = 0. By adiabatically changing
the phase ϕ from 0 to 2π at the rate ϕ̇ = 2π/T , the
atomic state at the time T becomes |Ψ′2〉a = (|a〉1|a〉2 +
eiχ12 |ψDF(r, 2π)〉4)/
2. Let’s define the fidelity for this
example as Fp = a〈Ψ′2|ρ(T )|Ψ′〉a, where ρ(T ) is the nu-
merical solution of Eq. (7). As shown in Fig. 4, Fp
increases as T increases but decreases as the parameter
r0 increases. If T > 1000/Γ, Fp is larger than 0.992 for
0 < r0 < 0.8. From these two examples, we find that to
fulfill the adiabatical condition the time used in the step
2 should be much longer than in the steps 1 and 3.
A controlled-Z gate has been numerically simulated
by directly solving Eq. (7) with r0 = 0.5, and ϕ0 =
π/|2ν1 − ν12|. In the simulation, we set ṙ = r0/T1 in the
steps 1 and 3, and ϕ̇ = ϕ0/(T2 − T1) in the step 2 with
T1 = 0.05T3 and T2 − T1 = 0.90T3. If T3 > 1100/Γ, we
find that the fidelity F = a〈Ψ(T3)|ρ(T3)|Ψ(T3)〉a is larger
than 0.95. For an almost perfect controlled-Z gate with
F > 0.99, we find that T3 must be longer than 6000/Γ.
Now let’s briefly discuss the effects of the atomic spon-
taneous emission, the fiber mode decay and cavity photon
leakage. For simplicity but without the loss of generality,
we suppose that atomic spontaneous emission rates of the
excited levels are equal to γ. In the large detunig limit,
the characteristic spontaneous emission rate of the atoms
is γeff = γ(Ω
2/2∆2) [14, 16] and the effective decay rate
of the fiber mode is κeff = κfΩ
2g2/(4∆2ν2). If κf ≤ γ
and g2 ≪ ν2, κeff can be much smaller than γeff . Under
this condition, the present scheme is feasible if Γ ≫ γeff .
In the current cavity quantum dynamic (CQED) ex-
periment, the parameters (g, κ, γ) = (2000, 10, 10) MHz
could be available [20]. If setting Ω/(2∆) = 1√
× 10−3,
we have Γ ≃ 4 × 104γeff . The condition is held. In
the present scheme, the large cavity decay rate is re-
quired to ensure that the cavity modes are in a broad-
band squeezed vacuum reservoir and then the atoms al-
ways ”see” the broadband squeezed vacuum reservoir
during the dynamic evolution. For an arbitrary small but
nonzero value of the squeezing degree of the reservoir, a
CPHASE gate with arbitrary high fidelity can always be
realized in the represent scheme. The cavity decay does
not directly affect the fidelity of the realized CPHASE
gates. However, the larger the decay rate is, the longer
the operation time of the CPHASE gates is. Thus, we
have the condition κ >> β, γeff for realizing the reliable
CPHASE gates. Based on the parameters quoted above,
this condition can be well satisfied. With the parameters
of the current CQED experiment, we find that the op-
eration time of the controlled-Z gate, with fidelity larger
than 0.95, is about 2.8 ms. It is much shorter than both
1/γeff and the single-atom trapping time in cavity [21].
On the other hand, the present scheme needs a strong
coupling between the cavity and the fiber. This could be
realized at the current experiment [22]. Therefore, the
requirement for the realization of the present scheme can
be satisfied with the current technology.
IV. CONCLUSIONS
We propose a cavity-atom coupled scheme for the real-
ization of quantum controlling gates, in which each of two
pairs of four-level atoms in two distant cavities connected
by a short optical fibre are simultaneously driven by laser
fields and coupled to the local cavity modes through the
double Raman transition configuration. We show that
an effective squeezing reservoir coupled to the multilevel
atoms can be engineered under appropriate driving con-
dition and bad cavity limit. We find that in the scheme
a CPHASE gate with arbitrary phase shift can be im-
plemented through adiabatically changing the strength
and phase of driving fields along a closed loop. It is also
noticed that the larger the effective coupling strength be-
tween the environment and the atoms is, the more reli-
able the realized CPHASE gate is.
Acknowledgments
We thank Yun-feng Xiao and Wen-ping He for valuable
discussions and suggestions. This work was supported
by the Natural Science Foundation of China (Grant Nos.
10674106, 60778021 and 05-06-01).
[1] P. W. Shor, in Proceeings, 35th Annual Symposium on
Foundations of Computer Science, edited by S. Gold-
wasser (IEEE Press, Los Almitos, CA, 1994), p. 124.
[2] L.-M. Duan and G.-C. Guo, Phys. Rev. Lett. 79, 1953
(1997).
[3] P. Zanardi and M. Rasetti, Phys. Rev. Lett. 79, 3306
(1997).
[4] D. A. Lidar and et al., Phys. Rev. Lett. 81, 2594 (1998).
[5] M. V. Berry, Proc. R. Soc. A 392, 45 (1984).
[6] P. Zanardi and M. Rasetti, Phys. Lett. A 264, 94 (1999).
[7] S.-L. Zhu and P. Zanardi, Phys. Rev. A 72, 020301
(2005).
[8] L.-M. Duan and et al., Science 292, 1695 (2001).
[9] L.-A. Wu and et al., Phys. Rev. Lett. 95, 130501 (2005).
[10] Angelo Carollo et al., Phys. Rev. Lett. 96, 150403 (2006).
[11] Angelo Carollo et al., Phys. Rev. Lett. 96, 020403 (2006).
[12] J. I. Cirac, Phys. Rev. A 46, 4354 (1992).
[13] C. J. Myatt and et al., Nature (London) 403, 269 (2000).
[14] S. G. Clark and A. S. Parkins, Phys. Rev. Lett. 90,
047905 (2003).
[15] S. Lloyd, Phys. Rev. Lett. 75, 346 (1995).
[16] T. Pellizzari, Phys. Rev. Lett. 79, 5242 (1997).
[17] A. Serafini and et al., Phys. Rev. Lett. 96, 010503 (2006).
[18] Zhang-qi Yin and Fu-li Li, Phys. Rev. A 75, 012324
(2007).
[19] S. J. van Enk and et al., Phys. Rev. A 59, 2659 (1999).
[20] S. M. Spillane et al., Phys. Rev. A 71, 013817 (2005).
[21] Stefan Nuβmann and et al., Nature Phys. 1, 122 (2005).
[22] S. M. Spillane et al., Phys. Rev. Lett. 91, 043902 (2003).
ABSTRACT
  We consider an atom-field coupled system, in which two pairs of four-level
atoms are respectively driven by laser fields and trapped in two distant
cavities that are connected by an optical fiber. First, we show that an
effective squeezing reservoir can be engineered under appropriate conditions.
Then, we show that a two-qubit geometric CPHASE gate between the atoms in the
two cavities can be implemented through adiabatically manipulating the
engineered reservoir along a closed loop. This scheme that combines engineering
environment with decoherence-free space and geometric phase quantum computation
together has the remarkable feature: a CPHASE gate with arbitrary phase shift
is implemented by simply changing the strength and relative phase of the
driving fields.

<|endoftext|><|startoftext|>
Introduction
	Experimental Setup and Data Analysis
	Experimental results
	Acknowledgments
	References
	Tables
	Figure Captions
ABSTRACT
  We measured the correlation of the times between successive flaps of a flag
for a variety of wind speeds and found no evidence of low dimensional chaotic
behavior in the return maps of these times. We instead observed what is best
modeled as random times determined by an exponential distribution. This study
was done as an undergraduate experiment and illustrates the differences between
low dimensional chaotic and possibly higher dimensional chaotic systems.

<|endoftext|><|startoftext|>
arXiv:0704.0486v1  [astro-ph]  4 Apr 2007
Accepted for Publication in ApJ Letters, April 2007
Preprint typeset using LATEX style emulateapj v. 08/13/06
KINEMATIC DECOUPLING OF GLOBULAR CLUSTERS
WITH EXTENDED HORIZONTAL-BRANCH
Young-Wook Lee
, Hansung B. Gim
, & Dana I. Casetti-Dinescu
Accepted for Publication in ApJ Letters, April 2007
ABSTRACT
About 25% of the Milky Way globular clusters (GCs) exhibit unusually extended color distribution
of stars in the core helium-burning horizontal-branch (HB) phase. This phenomenon is now best
understood as due to the presence of helium enhanced second generation subpopulations, which has
raised a possibility that these peculiar GCs might have a unique origin. Here we show that these
GCs with extended HB are clearly distinct from other normal GCs in kinematics and mass. The
GCs with extended HB are more massive than normal GCs, and are dominated by random motion
with no correlation between kinematics and metallicity. Surprisingly, however, when they are excluded,
most normal GCs in the inner halo show clear signs of dissipational collapse that apparently led to the
formation of the disk. Normal GCs in the outer halo share their kinematic properties with the extended
HB GCs, which is consistent with the accretion origin. Our result further suggests heterogeneous
origins of GCs, and we anticipate this to be a starting point for more detailed investigations of Milky
Way formation, including early mergers, collapse, and later accretion.
Subject headings: Galaxy: formation – globular clusters: general – stars: horizontal-branch
1. INTRODUCTION
The discovery of multiple stellar populations in the
most massive GC ωCen (Lee et al. 1999), together
with the fact that the second most massive GC M54
is a core of the disrupting Sagittarius dwarf galaxy
(Layden & Sarajedini 2000), have strengthen the view
that some of the massive GCs might be remaining cores
of disrupted nucleated dwarf galaxies (Freeman 1993).
Among their several peculiar characteristics, both ωCen
and M54 have extended horizontal-branch (EHB), with
extremely hot horizontal-branch (HB) stars well sep-
arated from redder HB (Lee et al. 1999; Rosenberg,
Recio-Blanco, & Garcia-Marin 2004). High pre-
cision Hubble Space T elescope (HST ) photometry
(Bedin et al. 2004) has discovered that ωCen also has
a curious double main-sequence (MS). Recent studies
have shown that both these peculiar colour-magnitude
diagram (CMD) characteristics are best understood as
due to the presence of helium enhanced second gen-
eration subpopulations (Norris 2004; Lee et al. 2005;
Piotto et al. 2005; D’Antona et al. 2005). Further-
more, the prediction of the models (Lee et al. 2005;
D’Antona et al. 2005) that most of the GCs with EHB
would have double or broadened MSs are now confirmed
by HST/ACS (Advanced Camera for Survey) pho-
tometry (Piotto et al. 2007). This ensures that EHBs
are strong signature of the presence of multiple popula-
tions in GCs. A significant fraction (∼30%) of the helium
enriched subpopulation observed in these peculiar GCs
is also best explained if the second generation stars were
formed from enriched gas trapped in the deep gravita-
tional potential well while these GCs were cores of the
ancient dwarf galaxies (Bekki & Norris 2006). Despite
the lack of apparently wide spread in iron-peak elements
1 Center for Space Astrophysics & Department of Astronomy,
Yonsei University, Seoul 120-749, Korea (ywlee@csa.yonsei.ac.kr)
2 Department of Astronomy, Yale University, New Haven, CT
06520, USA
in most of these GCs, all of these recent developments
suggest that GCs with EHB are probably not genuine
GCs, but might have a unique origin in the formation
history of the Galaxy.
In order to test this working hypothesis further, we
have carefully surveyed 114 GCs with reasonably good
CMDs, and found that 28 (25%) of them have EHB
(Lee et al., in preparation). Their NGC numbers are:
2419, 2808, 5139, 5986, 6093, 6205, 6266, 6273, 6388,
6441, 6656, 6715, 6752, 7078, and 7089 for the GCs with
strongly extended HB; and 1851, 1904, 4833, 5824, 5904,
6229, 6402, 6522, 6626, 6681, 6712, 6723, and 6864 for
the GCs with moderately extended HB, including those
with bimodal HB distributions. We will collectively call
all of them as “EHB GCs”. Our selection of EHB GCs
were based on the reddening independent criteria on
CMD in B&V passbands (∆VHB > 3.5 for strongly ex-
tended HB; either 3.0 < ∆VHB < 3.5 or ∆(B − V )HB
> 0.78 with clear bimodal colour distribution for mod-
erately extended HB). But, since their appearances on
CMDs are distinct enough from GCs with normal HB
(Piotto et al. 2002), our selection agrees well with the re-
sult based on smaller sample and other measures of HB
temperature extension (e.g., Recio-Blanco et al. 2006).
We have then investigated their properties compared to
other normal GCs.
2. LUMINOSITY FUNCTION AND KINEMATICS
First of all, from the luminosity function (Fig. 1), we
found that EHB GCs are among the brightest GCs of
the Milky Way, including 11 out of 12 brightest GCs
(see also Recio-Blanco et al. 2006). It is surprising to
see that not a single EHB GC is fainter than MV = -
7. Careful inspection of all CMDs confirms that this is
not due to the smaller number of HB stars in fainter
GCs. Because of significant fraction (18 - 51%) of the
helium enriched bluer subpopulation observed in EHB
GCs, its presence on the HB would be reliably detected
(> 5 - 10 stars) even in a cluster of MV = -6 or -5 if it
http://arxiv.org/abs/0704.0486v1
2 Lee, Gim, & Casetti-Dinescu
-2 -4 -6 -8 -10 -12
40 EHB
Normal
Fig. 1.— The histogram of MV for 114 Milky Way GCs (data
from Harris 1996). Blue and red are GCs with strongly and mod-
erately extended HBs, respectively. EHB GCs are clearly brighter
(more massive) than normal GCs.
existed. This result is perhaps already suggesting that
EHB GCs might have a peculiar origin, as their inferred
current stellar mass, which might represent only a small
fraction of their original mass, is comparable with that
of low-luminosity dwarf galaxies in the Local Group.
Motivated by this, we have investigated the kinemat-
ics of EHB GCs, in order to see whether their kinematic
properties are also distinct from other normal GCs. Fol-
lowing previous investigation (Zinn 1993), we have first
divided GCs into three subgroups (Fig. 2) based on the
HB morphology and metallicity diagram (Lee, Demar-
que, & Zinn 1994). Metal-poor ([Fe/H] < -0.8) GCs
in the “Old halo (OH)” group have bluer HB morphol-
ogy at a given metallicity, and those in the “Younger
halo (YH)” group have redder HB morphology at fixed
metallicity. The OH group, in the mean, is probably
older than the YH group by ∼1 Gyr (Rey et al. 2001;
Salaris & Weiss 2002). The metal-rich ([Fe/H] > -0.8)
GCs are further classified as “disk/bulge (D/B)” group.
EHB GCs belong in all three subgroups, although the
majority of them are in the OH group. Note also that
most (94%) GCs in YH group are in the outer halo
(galactocentric distance, Rgc > 8 kpc), while the ma-
jority (80%) of GCs in OH and D/B groups are in the
inner halo (Rgc < 8 kpc).
The result of the kinematic analysis based on
the constant-rotational-velocity solutions (Zinn 1993;
Frenk & White 1980) and the updated database of Har-
ris (Harris 1996) is presented in Table 1. When all the
GCs are considered, we are basically confirming the con-
clusion of the previous work (Zinn 1993). YH group is
dominated by random motion with no sign of significant
rotation (Vrot), while OH group shows some prograde
rotation and a smaller line-of-sight velocity dispersion
(σlos). D/B group is mostly supported by rotation with
a relatively small σlos. We find, however, EHB GCs, both
belonging to YH and OH groups, are dominated by ran-
dom motion and show no signs of rotation. Consequently,
when they are excluded from the sample, normal GCs in
OH group show increased rotation (from 1.5 to 1.8σ from
zero Vrot) and higher value of Vrot/σlos. The same trend
is also observed in the normal GCs in D/B group, but
with much larger uncertainty. When only comparably
bright (MV < -6) GCs are considered, the differences
become significantly larger (2.5σ from zero Vrot). The
TABLE 1
KINEMATICS OF GLOBULAR CLUSTERS BASED ON
RADIAL VELOCITY ALONE*
Group N Vrot σlos Vrot/σlos
All GCs
All Halo 71 25±27 124±10 0.20±0.22
YH 25 -18±66 153±22 -0.12±0.43
OH 46 40±27 104±11 0.38±0.26
D/B 14 168±28 65±12 2.57±0.65
EHB GCs
All EHB 24 10±32 93±13 0.11±0.34
OH 18 4±35 91±15 0.05±0.38
Normal
All Halo 48 32±39 137±14 0.24±0.29
YH 20 -42±80 162±26 -0.26±0.49
OH 28 70±39 111±15 0.63±0.36
D/B 13 188±22 48±9 3.94±0.89
Normal (MV < -6)
All Halo 39 37±44 139±16 0.26±0.32
YH 17 -69±81 160±27 -0.44±0.51
OH 22 105±42 103±15 1.02±0.43
D/B 11 195±27 52±11 3.76±0.95
*For Rgc < 40 kpc and excluding GCs with (cosψ) > 0.2
above analysis, based only on the radial velocity data,
provides good reason to suspect that EHB GCs are kine-
matically decoupled from other normal GCs, especially
in OH group. Below, we investigate this in more detail
using the measurements of full spatial motions and or-
bital parameters now available for 49 GCs in our sample
(Dinescu et al. 2003).
In Figure 3, we have plotted kinematic parameters ob-
tained from full spatial motions as a function of metallic-
-1.0 -0.5 0.0 0.5 1.0
EHB OH+D/B
EHB YH
Normal OH+D/B
Normal YH
Red Blue
HB Type
Fig. 2.— The subdivision of GCs in the HB morphology ver-
sus metallicity diagram. The filled circles are GCs either in the
“old halo (OH)” or metal-rich ([Fe/H] > -0.8) “disk/bulge (D/B)”
groups, while open circles are those in the “younger halo (YH)”
group. The EHB GCs belong in all three subgroups, but most of
them are in OH group. The updated database (Lee et al. 1994)
consisting 95 GCs in Rgc < 40 kpc zone are compared with model
HB isochrones (Rey et al. 2001) in solid lines, the upper being
older by 1.1 Gyr. Short dashed line has the same age as the upper
solid line, but is for EHB GCs with 15 % of helium enhanced (Y
= 0.33) subpopulation (Lee et al. 2005).
Kinematic Decoupling of EHB GCs 3
      
-2000
-1500
-1000
EHB OH
EHB YH
Normal OH+D/B (P)
Normal OH+D/B (R)
Normal YH
47Tuc
      
200 M54
      
-2.5 -2.0 -1.5 -1.0 -0.5  
47Tuc
[Fe/H]
0.0 -2.5 -2.0 -1.5 -1.0 -0.5  
[Fe/H]
0.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0
-1000
47Tuc
[Fe/H]
Fig. 3.— The relationship between kinematics derived from full spatial motions and metallicity. From (a) to (f), total orbital energy
(Etot), maximum distance perpendicular to the Galactic plane (Zmax), velocity component perpendicular to the plane (W ), rotational
velocity (Θ), orbital eccentricity, and the angular momentum component associated with Θ (LZ) are plotted as a function of metallicity
(Zinn 1993), respectively. Red filled circles are normal GCs with prograde rotation in OH and D/B groups. All of them are in the inner
halo. Red open circles are normal GCs with retrograde rotation in OH and D/B groups. Only 3 of them are in the inner halo.
ity, which shows more directly the systematic differences
between EHB and normal GCs. In all panels of Fig-
ure 3, EHB GCs have diversity in kinematics, and show
no correlations with metallicity (correlation coefficient,
r, of -0.02 to -0.31 with high p-values of 0.25 to 0.94).
Normal YH GCs, mostly in the outer halo, show kine-
matically hot signatures (high Etot, Zmax, & eccentricity,
and large velocity dispersion) (Mackey & Gilmore 2004).
To our surprise, however, when EHB GCs are excluded,
most normal GCs with prograde rotation in OH and D/B
groups (red filled circles) show clear signs of dissipational
collapse. Etot, Zmax, W velocity, and perhaps orbital ec-
centricity are all decreasing with increasing metallicity,
among which the ‘chevron’ shape of W velocity distri-
bution is most impressive. Rotational velocity, however,
is increasing with metallicity, and LZ appears to be con-
served. The orbital properties of NGC 6528 are known to
be highly affected by the potential of the bar because of
its proximity (Dinescu et al. 2003). Thus, excluding this
one deviant point in panels (d) and (e), we obtain strong
correlations for Etot, Zmax, |W |, Θ, and eccentricity (r
= -0.58, -0.71, -0.95, 0.89, and -0.77, respectively) with
small p-values (0.05, 0.01, 3.1× 10−6, 0.0002, and 0.005,
respectively). In other words, the correlations are highly
significant at the level of 95%, 99%, 99.999%, 99.98%,
and 99.5%, respectively. As expected, however, corre-
lation is low (r = 0.17) for LZ with a high p-value of
0.59.
All of these trends observed for normal GCs with pro-
grade rotation in OH and D/B groups are fully consistent
with the model first envisioned by Eggen, Lynden-Bell,
& Sandage (1962), where metal enrichment went on as
dissipational collapse continued. Although these results
are based on relatively modest numbers of GCs with full
spatial motion information, their coherent behaviours in
all panels of Figure 3, together with statistically signifi-
cant correlations, confirm that we are detecting real sig-
natures. Also, these results are consistent with the kine-
matics solution obtained from radial velocity alone (Ta-
ble 1), which is based on a larger sample of GCs. We
argue, therefore, (1) EHB GCs in our sample are indeed
kinematically decoupled from most of the normal GCs in
OH group, and (2) when EHB GCs are excluded, we are
detecting clearer signatures of dissipational collapse in
the inner halo, which apparently led to the formation of
the Galactic disk (Zinn 1993; Mackey & Gilmore 2004).
The kinematics of EHB GCs, which are not following
the dissipational collapse, are more consistent with what
one would expect among the relicts of primeval star-
forming subsystems that first formed the nucleus (EHB
GCs with low Etot and Zmax) and halo (EHB GCs with
high Etot and Zmax) of the Galaxy through both dissipa-
tional and dissipationless mergers, as has been predicted
by recent ΛCDM simulations for “high-σ peaks” (e.g.,
Diemand, Madau, & Moore 2005; Moore et al. 2006).
As described above, a significant fraction of the helium
enriched subpopulation also favours building block origin
of EHB GCs. Normal YH GCs in the outer halo share
4 Lee, Gim, & Casetti-Dinescu
their kinematic properties with the outlying EHB GCs,
which is consistent with the view (Searle & Zinn 1978)
that they were originally formed in the outskirts of iso-
lated building blocks and later accreted to the outer halo
of the Galaxy when their parent dwarf galaxies, like the
Sagittarius, were merging with the Milky Way. The GCs
with EHB also tend to show more extended Na-O and
Mg-Al anticorrelations (Gratton 2007). Therefore, the
suggested connection between some of these GCs with
strong chemical inhomogenity and orbital parameters
(Carretta 2006) might be due to the diversity of kine-
matics among EHB GCs.
According to the present picture, most of the normal
GCs with retrograde rotation in OH and D/B groups
could have also originated from the subsystems with ret-
rograde rotation. Interestingly, their relatively confined
distributions both in the angular momentum phase space
(Helmi et al. 1999) and velocity space are not inconsis-
tent with the possibility that some or most of them were
former members of parent dwarf galaxies hosting two
EHB GCs, ωCen and/or NGC 6723 (Lee et al., in prepa-
ration). Their distribution in velocity space is also well
consistent with the model prediction of the tidal debris
from ωCen’s parent dwarf system (Mizutani, Chiba, &
Sakamoto 2003), which was presumably formed in the
outer halo and accreted to the inner halo. Note that a
similar minor merging of subsystem with the thin disk
(Quinn, Hernquist, & Fullagar 1993) could have also
changed some of the original kinematic properties of two
disk GCs in Figure 3 (47Tuc and M71).
3. DISCUSSION
The clear differences in kinematics and mass be-
tween GCs with and without EHB are strong evidence
that they have different origins. Our results suggest
present-day Galactic GCs are most likely an ensemble
of heterogeneous objects originated from three distinct
phases of the Milky Way formation: (1) remaining cores
or central star clusters of building blocks that first assem-
bled to form the nucleus and halo of the proto-Galaxy
(Bromm & Clarke 2002; Santos 2003; Bekki 2005;
Kravtsov & Gnedin 2005; Moore et al. 2006), (2) gen-
uine GCs formed in the dissipational collapse of a
transient gas-rich inner halo system that eventually
formed the Galactic disk (Eggen et al. 1962), and (3)
genuine GCs formed in the outskirts of outlying building
blocks that later accreted to the outer halo of the Milky
Way (Searle & Zinn 1978). In this picture, relicts of
first building blocks that formed the flattened nucleus
(Kravtsov & Gnedin 2005; Moore et al. 2006) are now
observed as relatively metal-poor EHB GCs (e.g., NGC
6266, 6522, and 6626) having low Etot and Zmax near the
centre. Formation of the slowly-rotating gas-rich inner
halo system that later collapsed in phase (2) is still most
unclear, but it is attractive to speculate that leftover gas
from “rare peaks” (building blocks hosting EHB GCs)
in the inner halo and gas from continuously falling “less
rare peaks” (Moore et al. 2006) led to the formation of
this structure, perhaps with the aids of some heating
feedbacks (e.g., Schawinski et al. 2006) soon followed by
cooling. Several lines of further study will certainly help
to shed more light into the picture briefly sketched here.
For example, search for the tidal streams that might be
associated with EHB GCs, dark matter search in the
outlying EHB GCs where preferential disruption of dark
matter halo (Saitoh et al. 2006) might be less severe,
kinematics analyses of extragalactic GC systems along
with the ultraviolet survey for EHB GC candidates,
together with more detailed high resolution ΛCDM
simulations.
We thank R. Zinn, R. Larson, and P. Demarque for
helpful discussions, C. Chung for his assistance in HB
isochrone construction, and H.-Y. Lee for her assistance
in CMD compilation. Support for this work was provided
by the Creative Research Initiatives Program of the Ko-
rean Ministry of Science & Technology and KOSEF, for
which we are grateful.
REFERENCES
Bedin, L. R., Piotto, G., Anderson, J., Cassisi, S., King, I. R.,
Momany, Y., & Carraro, G. 2004, ApJ, 605, L125
Bekki, K. 2005, ApJ, 626, L93
Bekki, K. & Norris, J. E. 2006, ApJ, 637, L109
Bromm, V. & Clarke, C. J. 2002, ApJ, 566, L1
Carretta, E. 2006, AJ, 131, 1766
D’Antona, F., Bellazzini, M., Caloi, V., Pecci, F. F., Galleti, S., &
Rood, R. T. 2005, ApJ, 631, 868
Diemand, J., Madau, P., & Moore, B. 2005, MNRAS, 364, 367
Dinescu, D. I., Girard, T. M., van Altena, W. F., & Lopez, C. E.
2003, AJ, 125, 1373
Eggen, O. J., Lynden-Bell, D., & Sandage, A. R. 1962, ApJ, 136,
Freeman, K. C. 1993, in ASP Conf. Ser. 48, The Globular Clusters-
Galaxy Connection, ed. G. H. Smith & J. P. Brodie (San
Francisco: ASP), 608
Frenk, C. S. & White, S. D. M. 1980, MNRAS, 193, 295
Gratton, R. 2007, in ASP Conf. Ser., From the Stars to Galaxies,
ed. A. Vallenari & R. Tantalo, (Venice: ASP), in press
Harris, W. E. 1996, AJ, 112, 1487
Helmi, A., White, S. D. M., de Zeeuw, P. T., & Zhao, H. 1999,
Nature, 402, 53
Kravtsov, A. V. & Gnedin, O. Y. 2005, ApJ, 623, 650
Layden, A. C. & Sarajedini, A. 2000, AJ, 119, 1760
Lee, Y.-W., Demarque, P., & Zinn, R. 1994, ApJ, 423, 248
Lee, Y.-W., Joo, J.-M., Sohn, Y.-J., Rey, S.-C., Lee, H.-C., &
Walker, A. R. 1999, Nature, 402, 55
Lee, Y.-W. et al. 2005, ApJ, 621, L57
Mackey, A. D. & Gilmore, G. F. 2004, MNRAS, 355, 504
Mizutani, A., Chiba, M., & Sakamoto, T. 2003, ApJ, 589, L89
Moore, B., Diemand, J., Madau, P., Zemp, M., Stadel, J. 2006,
MNRAS, 368, 563
Norris, J. E. 2004, ApJ, 612, L25
Piotto, G. et al. 2002, A&A, 391, 945
Piotto, G. et al. 2005, ApJ, 621, 777
Piotto, G. et al. 2007, preprint (astro-ph/0703767)
Quinn, P., Hernquist, L, & Fullagar, D. 1993, ApJ, 430, 74
Recio-Blanco, A., Aparico, A., Piotto, G., de Angeli, F., &
Djorgovski, S. J. 2006, A&A, 452, 875
Rey, S.-C., Yoon, S. J., Lee, Y.-W., Chaboyer, B., & Sarajedini, A.
2001, AJ, 122, 3219
Rosenberg, A., Recio-Blanco, A., & Garcia-Marin, M. 2004, ApJ,
603, 135
Saitoh, T. R., Koda, J., Okamoto, T., Wada, K., & Habe, A. 2006,
ApJ, 640, 22
Salaris, M. & Weiss, A. 2002, A&A, 388, 492
Santos, M. R. 2003, in ESO Astrophysics Symposia, Extragalactic
Globular Cluster Systems, ed. M. Kissler-Patig (Garching:
Springer-Verlag), 348
Schawinski, K. et al., 2006, Nature, 442, 888
Searle, L & Zinn, R. 1978, ApJ, 225, 357
Zinn, R. 1993, in ASP Conf. Ser. 48, The Globular Clusters-Galaxy
Connection, ed. G. H. Smith & J. P. Brodie (San Francisco:
ASP), 38
ABSTRACT
  About 25% of the Milky Way globular clusters (GCs) exhibit unusually extended
color distribution of stars in the horizontal-branch (HB) phase. This
phenomenon is now best understood as due to the presence of helium enhanced
second generation subpopulations, which has raised a possibility that these
peculiar GCs might have a unique origin. Here we show that these GCs with
extended HB are clearly distinct from other normal GCs in kinematics and mass.
The GCs with extended HB are more massive than normal GCs, and are dominated by
random motion with no correlation between kinematics and metallicity.
Surprisingly, however, when they are excluded, most normal GCs in the inner
halo show clear signs of dissipational collapse that apparently led to the
formation of the disk. Normal GCs in the outer halo share their kinematic
properties with the extended HB GCs, which is consistent with the accretion
origin. Our result further suggests heterogeneous origins of GCs, and we
anticipate this to be a starting point for more detailed investigations of
Milky Way formation, including early mergers, collapse, and later accretion.

<|endoftext|><|startoftext|>
Microsoft Word - stancil_text+figs_prDR.doc
 Long Distance Signaling Using Axion-like Particles 
Daniel D. Stancil, Department of Electrical and Computer Engineering 
Carnegie Mellon University, Pittsburgh, PA 15213 
Abstract 
The possible existence of axion-like particles could lead to a new type of long distance 
communication. In this work, basic antenna concepts are defined and a Friis-like equation 
is derived to facilitate long-distance link calculations. An example calculation is 
presented showing that communication over distances of 1000 km or more may be 
possible for 3.5am <  meV and 
8 15 10  GeVag γγ
− −> × .  
PACS: 14.80.Mz, 84.40.Ba, 84.40.Ua, 42.79.Sz 
The axion has been proposed as a solution to the strong-CP problem [1], and is also a 
candidate for the galactic dark matter [2]. Interest in axions has increased recently owing 
to the report by the PVLAS collaboration of optical rotation induced by a magnetic field 
in a vacuum [3], since the creation of axions or other similar particles was one possible 
explanation for this rotation. Subsequently a number of groups began experimental 
searches using axion generation and detection schemes that would more definitively point 
to these new particles as the explanation [4]. The PVLAS result was surprising because it 
suggested coupling between the axion and electromagnetic fields that was much larger 
than thought possible based on solar axion observations by the CAST collaboration [5]. 
Although mechanisms have been proposed to reconcile the reports [6], the PVLAS 
collaboration recently retracted the results [7] and an independent group has reported a 
negative result from a photon regeneration experiment that excludes the PVLAS result 
[8]. 
There does not now appear to be any experimental evidence of a coupling 
strength inconsistent with CAST observations. However, since recent work has suggested 
mechanisms whereby such strong coupling may be possible, I believe it remains 
interesting to consider the implications of stronger-than-expected axion-photon coupling.  
In particular, I would like to call attention to the observation that a new type of long-
distance signaling and communication may be possible. It may be possible to construct a 
communication system that cannot be blocked—even communicating directly through 
the diameter of the earth. This would make reliable worldwide signaling possible without 
the use of either satellites or the ionosphere, and would enable communication to 
locations previously inaccessible, such as submarines at the bottom of the sea and mines 
deep beneath the earth. The signal would also be very difficult to intercept since the axion 
beam would be essentially as narrow as a laser beam used to create it, and most of the 
path would be underground. With advances in power and sensitivity, it may also be 
possible to use axion signaling in space communications. For example, using such a 
system, communication with points on the far side of the moon may be possible without 
the use of lunar satellites. 
Communication systems using neutrinos have also been proposed [9], and would 
have many of the same characteristics as the proposed axion system. However, the 
generation and detection of neutrinos requires massive particle accelerators and 
scintillation detectors [10]. Also, full deflection over 4π steradians would not be 
practical, though limited beam steering could be achieved using a magnetic field to 
deflect the precursor pion beam. Finally, it would be difficult to consider modulation 
techniques more sophisticated than simple amplitude modulation. In contrast, using axion 
mass and coupling values not yet experimentally explored, it appears that worldwide 
communication would be possible with a fully steerable axion system about the size of a 
medium-size telescope. Further, since the signals at the input and output would be 
electromagnetic waves, any existing modulation technology could be used. 
However, as with neutrinos, the lack of strong interactions with matter presents 
challenges with respect to the generation and detection of axions. Sikivie proposed an 
experimental approach for detecting axions via their coupling to the electromagnetic field 
[11]. The coupling was obtained by considering the Lagrangian density 
1 1 1 1
4 4 2 2a a
L F F g aF F a a m aμν μν μμν γγ μν μ= − − + ∂ ∂ − , (1) 
where F A Aμν μ ν ν μ= ∂ − ∂  is the electromagnetic field tensor, ( ),A V Aα = −  where V is the 
electric scalar potential and A  is the magnetic vector potential, a  is the axion field, ag γγ  
is the coupling constant between the electromagnetic and axion fields, and the 
electromagnetic dual tensor is given by 
F F γδαβ αβγδε= . In these equations we have 
taken 1h c= = .  
As an example, consider the coupling between plane waves propagating along the 
z direction caused by a strong static magnetic field parallel to the polarization of the 
incident electromagnetic wave. If the time dependence is exp( )i tω− , where ω is the 
frequency of the incident linearly polarized electromagnetic wave, the equations of 
motion obtained from (1) reduce to the coupled equations 
2 2 2
0 02 2,         .     a a a
m a i g B A A i g B a
z zγγ γγ
ω ω ω ω
+ − = − + =
 (2) 
Thus the static magnetic field B0 couples the photon and axion fields. An apparatus for 
the generation and detection of axions based on this coupling is shown in Figure 1(a). 
This is sometimes referred to as an “invisible light through walls” experiment [4,12,13]. 
An electromagnetic wave with amplitude (0)A  enters a region of magnetic field of 
strength BOT extending over a distance LT. If the coupling is sufficiently weak that the 
change in A  over the transmit conversion region and the change in a over the receive 
conversion region are negligible, then the conversion loss through the system will be [12] 
 0 / in R TP P p p= , (3) 
where inP  is the optical input power, 0P  is the optical output power, Tp , Rp are the 
probabilities of photon-axion (and axion-photon) conversion in the transmitter and 
receiver, respectively, and the conversion probability is [11,12,14] 
1 sin / 2
2 / 2aa
p g B L
k qLγγ
ω ⎡ ⎤
= ⎢ ⎥
. (4) 
Here aq k kγ= −  indicates the phase mismatch between the photon and axion fields.  
The efficiency of generating axions and regenerating photons can be greatly 
increased by adding electromagnetic resonators, as shown in Figure 1(b) [4,15]. In this 
figure a laser is used to generate the incident photons, and mirrors are used to cause the 
light to pass through the magnetic field multiple times, increasing the conversion 
probability by the factor 2 /TF π , where FT is the finesse of the resonator in the axion 
generator (transmitter). This factor can be interpreted as the effective number of photon 
passes in the resonator. Axions are also emitted in the backward direction owing to the 
counter-propagating light in the resonator, resulting in half the particles traveling in an 
unwanted direction. Consequently, the probability of conversion in a given direction is 
increased by the factor /TF π . As also shown in Figure 1(b), a resonator on the photon 
regenerator (receiver) likewise increases the axion-photon conversion probability [15]. 
Since regenerated photons will be emitted in both directions, detectors are placed on both 
ends of the receiving optical resonator, enhancing the photon regeneration probability by 
the factor 2 /RF π , where FR is the finesse of the receiving resonator. (If the power from a 
single end is collected, the factor would be /RF π , as with the transmitter.) Finally, to 
turn this into a communication system, we add appropriate electromagnetic wave 
modulators and detectors as shown in Figure 1(b). 
The conversion loss equation (3) is valid when the transmitter and receiver are 
sufficiently close together that beam diffraction can be neglected, and when the 
transmitter and receiver have equal cross-sectional areas. For signaling over long 
distances, neither assumption will be valid in general. To treat the long-distance case, we 
first calculate the radiated axion field, then calculate the regenerated photons resulting 
when the radiated field reaches the receiver. 
The general solution to Eq. (2) is given by [12] 
 3 0( ) ( ) ( )4
aik r r
a r i g d r A r B r
r rγγ
′ ′ ′=
′−∫ i . (5) 
If the observation point r  is very far away from all points in the source volume V, then 
we obtain the far-field approximation for the axion field 
 3 0( ) ( ) ( )4
a r i g d r e A r B r
′−′ ′ ′≈ ∫ i i . (6) 
Consider the case where the source 0( ) ( )A r B r′ ′i  is only nonzero inside a cylinder of 
radius R and length L as shown in Figure 2. Further, we assume that within this cylinder 
0 0ˆ TB xB=  and 0ˆ / exp( )TA x F A ik zγπ= , where 0A is the amplitude of the incident 
electromagnetic wave, and the cylinder is contained in a resonant cavity with finesse TF . 
Using Eq. (6), the far-field potential is found to be 
sin / 2
2 / 2
aik r az T a TT
a T T T
a Taz T
k k L J k RFe
a r i g A B L s
r k Rk k L
⎡ ⎤ ⎡ ⎤−⎣ ⎦≈ ⎢ ⎥
− ⎢ ⎥⎣ ⎦
, (7) 
where sin ,   cosa a az ak k k kρ θ θ= = , and
T Ts Rπ=  is the cross-sectional area of the source 
region at the transmitter. This expression has its maximum when 0θ = , for which 
,  0az a ak k k ρ= = . The factor containing the Bessel function in (7) has the limit 
( )10lim / 1/ 2a a T a Tk J k R k Rρ ρ ρ→ ⎡ ⎤ =⎣ ⎦ . Consequently, the axion field on axis is 
sin / 2
4 / 2
aik r
a T T T
a r i g A B L s
r qLγγ
≈ . (8) 
The time averaged transmitted power density is 
( ) ( )
a a T T in
S r k a r s p P
= = , (9) 
where (0)in TP S sγ= , and ( )
(0) 0
S k Aγ γω= . If this axion power density is 
incident upon a photon regenerator at distance r that is perfectly aligned with the 
transmitter, then the received power is 
 0 (2 / ) ( )R R a RP F p S r sπ= . (10) 
Substituting Eq. (9) for the power flux ( )aS r  gives 
in aR T
R R T T
P kF F
P p s p s
rπ π π π
= . (11) 
This expression can be understood in terms of antenna theory for electromagnetic waves. 
In this context, we refer to the apparatus consisting of the resonator and the structure 
creating the magnetic field as an axion antenna. In analogy with conventional antenna 
theory, we define the directivity as 
/(4 )
= , (12) 
where Prad is the total power radiated by the transmitting antenna. To obtain the total 
power radiated, we could integrate the power flux (9) over a sphere enclosing the 
antenna. However, it is easier to do the calculation in the near field using the axion field 
at the aperture of the transmitting antenna. Using ( ) ( / ) (0)a T T TS L F p Sγπ= , we have 
 2 ( ) (2 / )rad a T T T T inP S L s F p Pπ= = . (13) 
Substituting (13) for the total radiated power and (9) for the power flux, the directivity 
simplifies to 
 2 2(2 / ) (4 / )( / 2)T a T a TD s sπ λ π λ= = . (14) 
The relation between the directivity and physical area (14) is ½ that found in 
conventional antenna theory, or equivalently, the area appears to be half the physical 
area. This results from the bi-directional radiation properties of the resonator. Defining an 
efficiency as /rad inP Pη = , we also define the antenna gain as 
 2(2 / ) (2 / )T T T T T a TG D F p sη π π λ= = , (15) 
where (2 / )T T TF pη π= . 
Next suppose that at some distant location this transmitted field is incident upon a 
receive antenna with length LR, radius RR, and finesse FR. From (10) and assuming the 
photons emitted from both ends of the receive antenna are collected, the total power 
collected will be 0 , ( )e R aP s S r= , where we have defined the effective area of the 
receiving antenna as 
 , ( / ) ,e R R c R Rs s n F pπ=  (16) 
R Rs Rπ=   is the physical cross-sectional area, and cn  is the number of ends from which 
photons are collected (i.e., 1, 2cn = ). 
As with electromagnetic antennas, the ratio of effective area to gain is found to be 
independent of the details of the antenna, other than whether or not photons are collected 
from both ends of the antenna when used to receive: 
 2, ,/ / /(4 ).e R R e T T c as G s G n λ π= =  (17) 
If photons were collected only from one end of the receive antenna ( 1cn = ), then Eq. (17) 
would be identical to conventional antenna theory.  
With these definitions, the expression for the received power (11) can be 
interpreted as a Friis-like equation: 
 ( ) ( )( )2 2 20 , , ,/ / 4 /( ) /in c T R a e T c T a e RP P n G G r s n s rλ π λ= = . (18) 
Here ,c Tn  is the value used to compute ,e Ts  according to Eq. (16). The ratio , ,/e T c Ts n  is 
independent of the choice of ,c Tn , as it should be since the number of detectors that might 
be used on receive is independent of the transmit properties of the antenna.  
It is also useful to note that if the magnetic field is uniform (i.e., wigglers, or 
quasi-phase matching, are not used [12]), then there is an optimum length for the 
conversion region of an antenna. This occurs when ( )sin / 2 1qL = , or qL π= . The 
optimum length is found to be 
 2( / )opt aL mγλ ω= . (19) 
In obtaining this expression, we have used 2 /(2 )aq m ω≈ , which is valid for am ω .  
The diffraction-limited power pattern of the radiated axion field is determined by 
the aperture size in wavelengths through the Bessel function term in (7): 
 ( ) ( ) 214 sin /( sin )d a aP J k R k Rθ θ θ= ⎡ ⎤⎣ ⎦ . (20) 
The diffraction beam width between first nulls is determined by the first zero of the Airy 
disc, and for small angles is given by the well-known expression 
 1.22 /dFWFN TRγθ λ≈ . (21) 
Similarly, the diffraction beam width at half maximum is determined by the roots of 
( ) 1/ 2dP θ = , or 
 (1.616 / ) / 0.514 /dFWHM a T TR Rγθ π λ λ= ≈ . (22) 
In contrast, the conversion-limited power pattern is given by 
sin cos / 2
2 cos / 2
k k L
k k L
⎡ ⎤⎡ ⎤−⎣ ⎦⎢ ⎥=
, (23) 
and depends on both the length in wavelengths and the velocity mismatch. For the 
optimum length L given by (19), the conversion beam widths are approximately given by 
 2( / ),            1.06( / ) c cFWFN a FWHM am mθ ω θ ω≈ ≈ . (24) 
For quantum-limited detection, the channel capacity is [16] 
 2log (1 / )d RC Nν η ν= Δ + Δ , (25) 
were dη  is the quantum efficiency of the optical detectors, νΔ  is the bandwidth of the 
resonator /(2 )c LFνΔ =  [17], and c is the velocity of light. Equation (25) can be 
combined with (11) or (18) to find the axion parameters that would permit a particular 
channel capacity at a given distance for a particular experimental apparatus. 
As an example, consider transmitters and receivers with 1064 nm (1.17 eV)γλ = , 
10 WinP = , 
20.01 ms = , 0 10 TB = , 3L =  m, 
53.1 10F = × , 0.5dη = , and a minimum 
information capacity of 1 bps. Figure 3 shows the inverse coupling strength curves 
( ) 1/a aM m g γγ=  for distances of 1000 km and the diameter of the earth. Also shown is 
the curve for communication between the earth and the far side of the moon using the 
“4+4” experimental apparatus proposed in [15]. The shaded regions are excluded by the 
results from the BFRT collaboration [13] and Robilliard et al. [8]. The dots represent 
extensions of the curves if quasi-phase matching (QPM) is used by periodically reversing 
the magnetic field [12]. The minimum period of the reversal is taken to be twice the beam 
diameter for the 3 m system example, and 28.6 m for the 4+4 system. From the figure, 
communication over distances in excess of 1000 km should be possible for 3.5am <  meV 
and 72 10  GeVM < ×  ( 8 15 10  GeVag γγ
− −> × ). Of particular interest, we note that the 
range 6 72 10  GeV 2 10  GeVM× < < ×  has not yet been excluded by photon regeneration 
experiments. 
For an area of 0.01 m2, the radius is 0.01/ 0.0564R π= =  m. The half-power 
diffraction beam width for the example antennas is therefore 
6 49.7 10 rad 5.56 10 deg 2dFWHMθ
− −= × ≈ × ≈  arc-sec, while the half-power conversion beam 
width for 0.7am ≈ meV (optimum for L=3 m) is 
4 26.3 10 rad 3.63 10 deg 131cFWHMθ
− −= × ≈ × ≈  arc-sec. Consequently the total beam width is 
determined by the diffraction beam width in this example. While pointing with an 
accuracy of 2 arc-sec would be challenging, it is somewhat less stringent than that 
required for optical deep space communications [18]. The size of such axion transmitters 
and receivers would be roughly that of a medium size professional telescope. It is 
interesting to note that the size of the beam at a distance of the earth’s diameter is about 
124 m. Consequently coordinates obtained with differential GPS at both sites should 
enable the computation of pointing directions to sufficient accuracy. 
Note that the receive resonator must be tuned to the same frequency as the 
transmit resonator to within a fraction of the resonator line width which is 161 Hz in this 
example. This will present a significant challenge, especially since the resonators are 
remote from one another. A possible approach would be to use atomic clocks to stabilize 
both the transmit frequency and a local reference frequency at the receiver. The receive 
resonator would then be locked to the reference to get as close as possible to the correct 
frequency, then slowly tuned until the signal is located. A feedback loop could then be 
closed to lock the receive resonator to the signal. 
In summary, for 3.5am <  meV and 
72 10  GeVM < ×  ( 8 15 10  GeVag γγ
− −> × ), it 
may be possible to realize a new type of wireless signaling that cannot be blocked or 
shielded. An example calculation shows that communication between points located 
diametrically opposite on the earth should be possible. This could enable world-wide 
communication without the use of satellites or the ionosphere. However, with present 
knowledge, the signaling will be limited to low data rates, perhaps on the order of a few 
bits per second for terrestrial links. This estimate assumes 3 m long 
generation/regeneration regions to allow fully steerable instruments. Though not easily 
steerable, the apparatus in the “4+4” experiment proposed by Sikivie et al., [15] may 
enable communication to the far side of the moon for 0.3am <  meV and 
66 10  GeVM < × . 
I would like to acknowledge helpful discussions with Jim Lesh, Rich Holman, 
Jeff Peterson, Pierre Sikivie, and David Tanner during the development of these ideas.  
Electronic address: stancil@cmu.edu 
References 
1. R. D. Peccei  and H. R. Quinn, Phys. Rev. Lett., 38, 1440, (1977); S. Weinberg, 
Phys. Rev. Lett., 40, 223 (1978); F. Wilczek, Phys. Rev. Lett., 40, 279 (1978). 
2. R. Bradley, et al., Rev. Mod. Phys. 75, 777 (2003). 
3. E. Zavattini, et al. (PVLAS Collaboration), Phys. Rev. Lett., 96, 110406 (2006).  
4. See, for example, R. Rabadan, A. Ringwald, and Kris Sigurdson, Phys. Rev. Lett., 
96, 110407 (2006); and J. Jaeckel, E. Masso, J. Redondo, A. Ringwald, and F. 
Takahashi, arXiv:hep-ph/0605313. 
5. S. Andriamonje et al. (CAST collaboration), J. Cosmol. Astropart. Phys. 04, 010 
(2007). 
6. See, for example, R.N. Mohapatra and S. Nasri, Phys. Rev. Lett., 98, 050402 
(2007). 
7. E. Zavattini et al., arXiv:0706.3419 [hep-ex]. 
8. C. Robilliard, et al., arXiv:0707.1296v3 [hep-ex]. 
9. See, for example, J. W. Eerkens, US Patent # 4,205,268, May 27, 1980; J. M. 
Pasachoff and M. L. Kutner, Cosmic Search Vol. 1 No. 3 p. 2 (1979), 
http://www.bigear.org/CSMO/PDF/CS03/cs03p02.pdf ; H. Xie, J. Gao, C. Liu, 
and Q. Zhai, 2006 IET Int. Conf. on Wireless Mobile and Multimedia Networks 
Proc., Hangzhou, China, 6-9 Nov., (2006).  
10. D. G. Michael, et al., Phys. Rev. Lett., 97, 191801 (2006). 
11. P. Sikivie, Phys. Rev. Lett., 51, 1415 (1983); Phys. Rev. D, 32, 2988 (1985). 
12. K. Van Bibber, N.R. Dagdeviren, S.E. Koonin, A.K. Kerman, and H.N. Nelson, 
Phys. Rev. Lett., 59, 759 (1987). 
13. R. Cameron, et al., Phys. Rev. D, 47 3707 (1993). 
14. G. Raffelt and L. Stodolsky, Phys. Rev. D, 37, 1237 (1988). 
15. P. Sikivie, D.B. Turner, and Karl van Bibber, Phys. Rev. Lett. 98, 172002 (2007). 
16. C-C. Chen, in Deep Space Optical Communications, edited by H. Hemmati (John 
Wiley & Sons, Hoboken, NJ, 2006) p. 92. 
17. A. Yariv and P. Yeh, Photonics: Optical Electronics in Modern Communications, 
6th Edition (Oxford Univ. Press, New York, 2007), pp. 171. 
18. C-C. Chen, op cit., p. 106. 
Figure Captions 
Figure 1. (a) Basic system for the generation and detection of axions. (b) An axion communication 
system, with the axion generator and photon regenerator located remote from one another.   
Figure 2. A source with uniform amplitude over a cylindrical region. 
Figure 3. (Color Online) The ranges of inverse coupling parameter M and pseudoscalar mass ma that 
would permit communication at information bandwidths of at least 1 bps using the example system, 
and the 4+4 system proposed in [15]. The shaded regions are ruled out by [8] and [13]. 
LT LR
Mod Laser 
Data In  
Data Out  
 B0T   
Optical 
Detectors 
   B0T      B0R 
Photon 
input 
Photon 
output 
Axions penetrate barrier 
opaque to photons 
LT LR
Photons 
Axions 
Cavity mirrors  
(b)  
Figure 1, Stancil, Phys. Rev. D. 
Figure 2, Stancil, Phys. Rev. D. 
-LT /2 LT /2 
ma (eV)
earth diameter
1000 km
moon, 4+4
Robilliard, et al.
QPM extensions
Figure 3, Stancil, Phys. Rev. D.
ABSTRACT
  The possible existence of axion-like particles could lead to a new type of
long distance communication. In this work, basic antenna concepts are defined
and a Friis-like equation is derived to facilitate long-distance link
calculations. An example calculation is presented showing that communication
over distances of 1000 km or more may be possible for $m_{a}< 3.5$ meV and
$g_{a\gamma \gamma} > 5 \times 10^{- 8} {\text{GeV}}^{- 1}$.

<|endoftext|><|startoftext|>
Introduction 
  In April 2001, we put forward the REESSE1 public-key encryption scheme [1]. In September 2003, we 
proposed the REESSE1 public-key cryptosystem which is an extension of the first version, and includes 
both encryption and signature [2]. In May 2005, it was argued that the lever function ℓ(.) is necessary and 
sufficient for the security of the REESSE1 encryption [3]. In [3], the continued fraction method of analyzing 
the key transforms Cx ≡ Ax W and Cx ≡ Ax W 
(x) (% M) with x ∈ [1, n] and ℓ (x) ∈ Ω was mentioned earlier 
than in any other publications. In November 2006, an abbreviation of the REESSE1+ cryptosystem was 
submitted to eprint.iacr.org [4]. 
  As is pointed out in [4], the set Ω = {5δ, …, (n + 4)δ | δ ≥ 1} is not unique, and other Ω may be 
selected ― Ω = {n + 1, …, n + n} with ℓ (i) + ℓ (j) ≠ ℓ (k) ∀ i, j, k ∈ [1, n] for example. Clearly, Ω is a 
security dominant parameter, and just like p and q in the RSA cryptosystem. 
  In May 2005, [5] pointed out that the REESSE1 signature scheme was insecure, which is right. 
  In July 2005, [6] thought unreasoningly that the REESSE1 encryption scheme was insecure, which is 
wrong, and rebutted thoroughly by us in [7]. Moreover, [7] illuminated definitely that the idea of the 
continued fraction analysis of REESSE1 did not originate from [6] (naturally also not from [8]), ant the 
idea firstly formally appeared in our 2004 application for a national fund project [7]. What needs to be 
pointed out further is that the authors of [8] are the related reviewers of our 2004 application. 
In December 2006, [8] thought unreasoningly again that the REESSE1+ public-key cryptosystem is not 
secure at all, which connotes any private key in REESSE1+ can be extracted by [8]. It is of flubdub and gulf. 
                                                        
* Received 05 April 2007, and revised 19 December 2009. 
Refuting the Pseudo Attack on the REESSE1+ Cryptosystem                           http://arxiv.org/pdf/0704.0492 
The ancients said ‘stop an advancing army with troops, and stop onrushing water with earth’. 
In what follows, the function f in [8] is namely the function ℓ in [4], namely f (i), f (j), f (k) in [8] are 
equivalent to ℓ (i), ℓ (j), ℓ (k), unless otherwise specified, the sign  represents ‘M – 1’, the sign % does 
‘modulo’, and unattached (x) does x-th expression. 
  In short, there exist 6 grave faults in [8]: 
   The converse proposition of fact 1.1 does not hold. 
  Clearly, fact 1.1 implies that if f (i) + f (j) = f (k), then Z / M – p u / q u < 1 / (2 qu
2) with L / Ak = p u / q u. 
We will prove by a counterexample that the former is only sufficient, but not necessary, namely if Z / M 
– p u / q u < 1 / (2 q u
2), then f (i) + f (j) = f (k) do not necessarily hold, and also namely Z / M – p u / q u < 1 / (2 q u
for f (i) + f (j) = f (k) is only necessary, but not sufficient. 
   Fact 1.2, 1.3 and 4 do not always hold. 
  Even if they hold, fact 1.2, 1.3 and 4 each are insufficient for f (i) + f (j) = f (k), and further, property 1, 2, 
3, 4 and 5 are invalid. Note fact 4 is essentially equivalent to each of fact 1.2 and 1.3. 
   The converse proposition of fact 2.2 does not hold, namely table 1 is insufficient for f (i) + f (j) = f (k). 
   Both algorithm 1 based on fact 4 and algorithm 2 based on table 1 are disordered & wrong logically. 
   To achieve so-called “breaking”, the example in [8] was woven elaborately, and table 2 was falsified, 
namely its authors intendedly mutilated the two tuple data to cause indeterminacy. 
   The inverse T –1 %  does not exist, and Q –1 %  not necessarily exist. 
Additionally, the case of Ω = {5 + δ, …, (n + 4) + δ | δ ≥ n – 4} with f (i) + f (j) ≠ f (k) ∀ i, j, k ∈ [1, n] is 
not analyzed at all. 
  Therefore, the cryptanalysis of the REESSE1+ cryptosystem by [8] is a type of pseudo-attack and 
balderdash leading to which the most radical reason is that the authors of [8] are not aware of the 
indeterminacy of the lever function ℓ (.) namely f (.), as is mentioned in [4]: 
   if the order of W is d < , then there is W f (x) ≡ W f (x) + d (% M), and when f (i) + f (j) = f (k), we see that 
f (i) + d + f (j) + d ≠ f (k) + d; 
   when f (i) + f (j) ≠ f (k), there exist Ci ≡ Ai′ W ′
), Cj ≡ Aj′ W ′
(j), and Ck ≡ Ak′ W ′
) (% M) such that f ′(i) 
+ f ′ (j) ≡ f ′ (k) (% ) with Ak′ ≤ ṕ , where ṕ is the maximal prime allowed. 
Another vital reason is that [8] always regarded necessary conditions for f (i) + f (j) = f (k) as sufficient 
and necessary conditions, and [8] did not consider the whole space of private keys or public keys. 
2  Theorem 1 vs the REESSE1+ Cryptosystem 
2.1  Condition at Theorem 1 in [8] Dissatisfies Necessity 
  Theorem 1 in [8] is retailed as follows: 
  Theorem 1: Let α be a real number, and let r / s be a rational with gcd(r, s) = 1 and |α – r / s| < 1 / (2s2). 
Then r / s is a convergent of the continued fraction expansion of α.                                
  Here, |α – r / s| represents the absolute value of (α – r / s). 
  The proof of theorem 1 is referred to [9]. 
  The condition |α – r / s| < 1 / (2s2) is only sufficient for r / s to be a convergent of the continued fraction of 
Refuting the Pseudo Attack on the REESSE1+ Cryptosystem                           http://arxiv.org/pdf/0704.0492 
α, but not necessary. Namely if r / s is a convergent of the continued fraction of α, |α – r / s| < 1 / (2s2) does 
not necessarily hold. 
  An example is taken. 
  Example 1. 
  Let r / s = 2 / 13, and then 1 / 2s2 = 1 / (2 × 132) = 0.002958579882. 
  Let α = 2039 / 13001, and then 
2039 / 13001 – 2 / 13 = 0.002987935839 > 0.002958579882 = 1 / (2 × 132). 
  On the other hand, the continued fraction of 2039 / 13001 is 1 / (6 + (1 / (2 + 1 / (1 + … 1 / 3)))). 
  Thus, 2 / 13 is a convergent of the continued fraction of 2039 / 13001, which illustrates |α – r / s| < 1 / (2s2) 
is not necessary for r / s to be a convergent of the continued fraction of α. 
2.2  Ak Will Emerge But Is Undecidable If f (i) + f (j) = f (k) 
Assume that ṕ is the maximum prime in the cryptosystem, {A1, …, An} is a coprime sequence with 0 < 
∀Ax ≤ ṕ, M > ∏ 
x = 1 Ax is a prime, and Cx ≡ Ax W
(x) (% M) for x = 1, …, n is a public key [4], where n ≥ 6, 
and f (x) ∈ Ω = {5δ, …, (n + 4)δ | δ = 1} = {5, …, n + 4}. 
  Assume f (k) = f (i) + f (j) with i ≠ k, j ≠ k , and i, j, k ∈ [1, …, n]. Let 
Z ≡ Ci Cj Ck
–1 (% M). 
  Then 
Z ≡ Ai Aj (Ak) 
–1 (% M) 
Z (Ak) ≡ Ai Aj (% M) 
Z (Ak) – L M = Ai Aj, 
where L is a positive integer. 
  Dividing the either side of the above equation by (M Ak) yields 
Z / M – L / Ak = Ai Aj / (M Ak).                       (1) 
  Due to M > ∏ n   x = 1 Ax and every Ax ≥ 2, we have 
Z / M – L / Ak < 1 / (2 
2).                       (1′) 
  Obviously, when n > 2 + 1, (1′) may have a variant, namely 
Z / M – L / Ak < 1 / (2 Ak
2).                         (1″) 
  In terms of theorem 1, L / Ak is a convergent of the continued fraction of Z / M. 
Let p0 / q0, p1 / q1, …, pt / qt be the convergent sequence of continued fraction of Z / M, and  
L / Ak = p u / q u. 
  Note that if pu / qu satisfies (1″), then pu + 1 / qu + 1, pu + 2 / qu + 2, …, pt / qt also likely satisfies (1″). 
Therefore, there likely exist multiple values of L / Ak by (1″), and Ak is undetermined. 
However, if we do not know in advance whether f (i) + f (j) = f (k), then even if Z / M – p u / q u < 1 / (2 q u
we can not decide f (i) + f (j) = f (k). Namely Z / M – p u / q u < 1 / (2 q u
2) is only necessary for f (i) + f (j) = f (k), 
but not sufficient, which will be discussed further in what follows. 
3  Conditions at Fact 1.1 and 4 Each Are Insufficient for f (i) + f (j) = f (k) 
Because fact 4 is essentially equivalent to each of fact 1.2 and 1.3, if the condition at fact 4 is 
Refuting the Pseudo Attack on the REESSE1+ Cryptosystem                           http://arxiv.org/pdf/0704.0492 
insufficient for f (i) + f (j) = f (k), the conditions at fact 1.2 and 1.3 each are also insufficient. 
The condition qu+1 > qu ∆ = qu (M / (2∏ 
x=n-2 prime 〈x〉))
2 at fact 4 connotes (1″) at fact 1.1 because (1″) 
is the precondition of qu+1 > qu ∆ which is the dominant basis of alg.1
 [8]. 
3.1  Converse Proposition of Fact 1.1 does not Hold and (1″) Is Insufficient for f (i) + f (j) = f (k) 
  Fact 1.1 in [8] is retailed as follows: 
  Fact 1.1 [8]: If f (i) + f (j) = f (k), there exists a qu such that qu = Ak in {p0 / q0, p1 / q1, …, pt / qt}, the 
convergent sequence of continued fraction expansion of Z / M with Z ≡ Ci Cj Ck
–1 % M.                
  Due to f (i) + f (j) = f (k), Z / M = L / Ak + Ai Aj / (M Ak), M > ∏ 
x = 1 Ax and Ax ≥ 2, we have 
Z / M – L / Ak = Ai Aj / (M Ak) < Ai Aj / (Ak ∏ 
x = 1 Ax). 
Further, 
Z / M – L / Ak < 1 / (2 Ak
2).                         (1″) 
  Let Z / M = [0; a1, a2, …, at] is the continued fraction expansion of Z / M. 
  By theorem 1, ∃ u ∈ [1, t] makes Z / M – p u / q u < 1 / (2 q u
  Let L / Ak = pu / qu, where 
pu / qu = a0 + 1 / (a1 + 1 / (a2 + … + 1 / (au – 1 + 1 / au))).                   (2) 
  Notice that it is possible that ∃ h > 0 makes Z / M – p u + h / q u + h < 1 / (2 q u + h
2), and moreover not fact 1.1 
but its converse is the inner logical base of alg.1 in [8]. 
  Through a counterexample, we will prove that the converse proposition of fact 1.1 does not hold, that is, 
the condition Z / M – p u / q u < 1 / (2 q u
2) is insufficient for f (i) + f (j) = f (k). 
  Example 2. 
  For convenience in computing, let n = 6, {Ax} = {11, 10, 3, 7, 17, 13}, δ = 1, and M = 510931. 
  Arbitrarily select W = 17797, f(1) = 9, f(2) = 6, f(3) = 10, f(4) = 5, f(5) = 7, and f(6) = 8. 
  From Cx ≡ Ax W
 f(x) (% M), we obtain 
  {Cx} = {113101, 79182, 175066, 433093, 501150, 389033}, 
and its inverse sequence 
  {Cx
–1} = {266775, 236469, 435654, 149312, 434038, 425203}. 
  Randomly select i = 1, j = 3, and k = 5. In this case, f(5) = 7 ≠ f(1) + f(3) = 9 + 10. Compute  
Z ≡ C1 C3 C5
                                       ≡ 113101 × 175066 × 434038  
                                       ≡ 186640 (% 510931). 
  Presume that W in C1 C3 is just neutralized by W
 –1 in C5
–1, then  
186640 ≡ A1 A3 A5
–1 (% 510931). 
  According to (1),  
186640 / 510931 – L / A5 = A1 A3 / (510931 A5). 
  By the Euclidean algorithm, a1, a2, a3, … are found out, and thus the continued fraction of 
186640 / 510931 = 1 / (2 + 1 / (1 + 1 / (2 + 1 / (1 + 1 / (4 + … + 1 / 3))))). 
  Heuristically let 
p 4 / q 4 = L / A5 = 1 / (2 + 1 / (1 + 1 / (2 + 1 / 1))) = 4 / 11, 
Refuting the Pseudo Attack on the REESSE1+ Cryptosystem                           http://arxiv.org/pdf/0704.0492 
which indicates that probably A5 = 11. On this occasion, there is 
186640 / 510931 – 4 / 11 = 0.0016575801 
                                               < 1 / (2 A5
2) = 1 / (2 × 112) = 0.0041322314. 
  The above expression satisfies (1″), namely the condition at theorem 1, and thereby A5 = 11 less than the 
maximum in {Ax} is deduced, which is in direct contradiction to factual A5 = 17. 
  So the condition Z / M – p u / q u < 1 / (2 q u
2) is not sufficient for f (i) + f (j) = f (k), namely the converse 
proposition of fact 1.1 does not hold. 
3.2  Each of Fact 1.2, 1.3 and 4 does not Hold 
  Fact 1.2 in [8] is retailed as follows: 
  Fact 1.2 [8]: There is sharp increase from qu to qu+1 since qu+1 ≥ (Ak M / (2Ai Aj))
1/ 2. 
  The derivation of fact 1.2 in [8] is retailed as follows: 
  Let L / Ak be the u-th convergent, i.e., qu = Ak and pu = L, i.e., pu / qu = L / Ak. Then we know that 
|Z / M – pu+1 / qu+1| < Ai Aj / (Ak M) = 1 / (2 ((Ak M / (2Ai Aj))
2)2).              (2′) 
  According to theorem 1 and convergence of sequence {p0 / q0, p1 / q1, …, pt / qt}, we obtain that 
qu+1 ≥ (Ak M / (2Ai Aj))
2 = Ak (M / (2Ai Aj Ak))
2.                    (3) 
                                                                                     
  Is the above derivation right? See the following analysis. 
  Clearly, by the definition of a finite continued fraction, (2′) holds. In addition, in terms of [9], pu+1 are 
qu+1 are coprime, and there is qu+1 ≥ Ak = qu, which is a judgment foundation. 
If f (i) + f (j) = f (k), then there is |Z / M – pu / qu| < 1 / (2 qu
2) with L / Ak = pu / qu. Furthermore, through 
practical observations, in most cases, there is also 
|Z / M – pu+1 / qu+1| < 1 / (2 qu+1
2).                         (3′) 
  According to (2′) and (3′), we have either 
|Z / M – pu+1 / qu+1| < 1 / (2 qu+1
2) < 1 / (2 ((Ak M / (2Ai Aj))
2)2),             (3″) 
|Z / M – pu+1 / qu+1| < 1 / (2 ((Ak M / (2Ai Aj))
2)2) < 1 / (2 qu+1
2).             (3′′′) 
  If (3″) holds, there exists qu+1 ≥ Ak (M / (2Ai Aj Ak))
2, which also indicates qu+1 ≥ Ak = qu. 
  If (3′′′) holds, there exists Ak (M / (2Ai Aj Ak))
2 ≥ qu+1. Notice that in this case, qu+1 ≥ Ak = qu is still 
possible. 
  Therefore, qu+1 ≥ Ak (M / (2Ai Aj Ak))
2, namely fact 1.2 does not necessarily hold, which indicates that 
there is a logic error during the derivation of (3) in [8]. 
  Moreover, from (2′) and (3′) we can judge that when n is large enough ― 80 for example, the 
probability that (3′′′) holds is greater than one that (3″) holds. 
  Now, we review fact 1.3 in [8]. It is retailed as follows: 
  Fact 1.3 [8]: Due to fact 1.2, there is also a sharp increase from au to au+1, since qv+1 = av+1 qv + qv –1 for v 
= 1, 3, …, t. Here av s are items of Z / M determined by (2).                                       
  Obviously, because fact 1.2 does not hold, fact 1.3 does not also hold. 
  Further, because fact 1.2, namely (3) does not hold, naturally, fact 4 in [8] does not also hold, that is, qu+1 
Refuting the Pseudo Attack on the REESSE1+ Cryptosystem                           http://arxiv.org/pdf/0704.0492 
> qu (M / (2∏ 
x=n-2 prime 〈x〉))
2 is not always valid. 
  Observe an example once more. 
In example 3, suppose that the bit-length of a plaintext block is 8, and two bits of a block correspond to 
three items of a coprime sequence {Ax}, which means that the encryption algorithm is optimized through a 
compact binary sequence. In practice, we do just so. 
Apparently, the length of {Ax} is 3 × (8 / 2) = 12. 
Example 3. 
  Let {Ax} = {{23, 11, 17}, {41, 29, 26}, {15, 19, 37}, {31, 7, 43}}, and  
M = 2022169 > 31 × 37 × 41 × 43 = 2022161. 
  Randomly select W = 1507351, f (1) = 6, f (2) = 14, f (3) = 9, f (4) = 11, f (5) = 12, f (6) = 10, f (7) = 8, f (8) 
= 16, f (9) = 5, f (10) = 13, f (11) = 15, and f (12) = 7. 
  From Cx ≡ Ax W
(x) (% M), we obtain {Cx} = 
{{572402, 1930240, 374715}, {25128, 265158, 350520}, 
{1674837, 1231458, 1448214}, {110225, 1198155, 757620}}, 
and {C6
–1, C7
–1} = {93176, 1591882}. Let 
Z ≡ (C4 C12) (C6
–1)  
                                    ≡ (25128 × 757620) (93176 × 1591882) 
                                ≡ 776394 × 1123251  
                                    ≡ 689616 (% 2022169). 
  Then, 689616 / 2022169 – L / (A6 A7) = (A4 A12) / (2022169 A6 A7). 
  Further, the continued fraction of 689616 / 2022169 is  
1 / (2 +1 / (1 + 1/ (13 + 1/ (1 + (1/ (3 + 1/ (2 + 1/ (2 + 1/ (2 + 1/ (97 + 4 / 9)))))))))). 
  Heuristically let 
L / (A6 A7) = 1 / (2 +1 / (1 + 1/ (13 + 1/ (1 + (1/ (3 + 1 / 2))))) 
                           = 133 / 390, 
which indicates that probably A6 A7 = 390. Because the discriminant 
689616 / 2022169 – 133 / 390 = 2.235477262e-6 
                                                < 1 / (2 × 3902) = 3.287310979e-6 
satisfies the condition at theorem 1 in [8], A6 A7 = 390 is deduced out.  
  The integer 390 may be factorized into the pairs (2, 195), (3, 130), (5, 78), (6, 65), (10, 39), (13, 30), or 
(15, 26), where the elements of (10, 39), (13, 30), and (15, 26) are less than maximal number in {Ax}. Thus, 
true (A6, A7) = (26, 15) is included in 6 potential cases. Here, au = 2 and also au + 1 = 2, and there is no sharp 
increase from au to au+1. 
  Additionally, this example also illustrates that when one attempts to infer the suitable factors of the 
product Ak 1 Ak 2 by f (i) + f (j) = f (k1) + f (k2) with every f (x) ∈ Ω = {n + 1, …, 2n}, indeterminacy is 
increased remarkably. 
3.3  Condition at Fact 4 Is Insufficient for f (i) + f (j) = f (k) 
In [6], the attackers attempted to seek Ak dominantly by the converse proposition of fact 1.1, and 
Refuting the Pseudo Attack on the REESSE1+ Cryptosystem                           http://arxiv.org/pdf/0704.0492 
however, disturbing values of Ak are too many to determine the original value of Ak. Therefore, in [8], the 
attackers attempted to diminish indeterminacy of Ak through fact 4 which connotes fact 1.1, and is 
equivalent to each of fact 1.2 and 1.3. 
To say the least, even if fact 4 is valid sometimes, we can prove by a counterexample that the condition 
at fact 4 is insufficient for f (i) + f (j) = f (k). 
  Example 4. 
  Still let n = 6, {Ax} = {11, 10, 3, 7, 17, 13}, and M = 510931 > 11 × 10 × 3 × 7 × 17 × 13 = 510510. 
  Arbitrarily select W = 17797, f(1) = 9, f(2) = 6, f(3) = 10, f(4) = 5, f(5) = 7, and f(6) = 8. 
  From Cx ≡ Ax W
 f(x) (% M), we obtain {Cx} = {113101, 79182, 175066, 433093, 501150, 389033}, 
and its inverse sequence {Cx
–1} = {266775, 236469, 435654, 149312, 434038, 425203}. 
  Randomly select i = 1, j = 3, and k = 6. In this case, f(6) = 8 ≠ f(1) + f(3) = 9 + 10. Compute  
Z ≡ C1 C3 C6
                                       ≡ 113101 × 175066 × 425203  
                                       ≡ 425865 (% 510931). 
  Presume that W in C1 C3 is just neutralized by W
 –1 in C6
–1, then  
425865 ≡ A1 A3 A6
–1 (% 510931). 
  According to alg.1 in [8],  
425865 / 510931 – L / A6 = A1 A3 / (510931 A6). 
  Compute the continued fraction of186640 / 510931 being  
1 / (1 + 1 / (5 + 1 / (159 + 1 / 535))). 
  Heuristically let 
L / A6 = 1 / (1 + 1 / 5) = 5 / 6, 
which indicates that probably A6 = 6. Further, can verify that 
425865 / 510931 – 5 / 6 = 0.000174518 
                                               < 1 / (2 × 62) = 0.0138889 
satisfies the condition at theorem 1 in [8]. 
  Let u = 2, and qu = Ak = A6 = 6.  
  Then pu+1/ qu+1 = p3 / q3 = 1 / (1 + 1 / (5 + 1 / 159)) = 796 / 955, and 
Ak (M / (2Ai Aj Ak))
2 = 6 (510931 / (2 × 11 × 3 × 6))1/ 2  
= 6 × 35.9197  
= 215.5186. 
  In addition, evidently prime〈1〉 = 2, prime〈2〉 = 3, prime〈3〉 = 5, prime〈4〉 = 7, prime〈5〉 = 11, prime〈6〉 = 
13, prime〈7〉 = 17, and prime〈8〉 = 19 which are according to [8]. 
  Then, by fact 4 in [8], m = 7, and ∆ = (M / (2∏ m   x=n -2 prime〈x〉))
2 = (15)1 / 2 = 3.8729. 
  Thus, qu+1 = 955 > Ak (M / (2Ai Aj Ak))
2 = 216 satisfies fact 1.2 namely (3), au+1 = 159 > au = 5 satisfies 
fact 1.3, and qu+1 = 955 > qu ∆ ≈ 24 satisfies fact 4 and alg.1. 
  By the condition at fact 4, A6 = 6 < max A = 221 is deduced, namely alg.1 will output {1, 3, 6, 6}. 
However, it is in direct contradiction to true A6 = 13, which show the condition at fact 4 is not sufficient for 
Refuting the Pseudo Attack on the REESSE1+ Cryptosystem                           http://arxiv.org/pdf/0704.0492 
f (i) + f (j) = f (k), and every Ax will likely be evaluated to at least two eligible values (see example 5). 
Because the condition at fact 4 is insufficient for f (i) + f (j) = f (k), property 1, 2, 3, 4 and 5 are invalid. 
Further, the run result of alg.1 regarding an arbitrary public key {C1, …, Cn} as an input will contain 
enormous disturbing data as n ≥ 80, and it is infeasible that alg.2 find out the original coprime sequence 
{Ax} in polynomial time (see example 5), which manifests that alg.1 and 2 are invalid. 
4  Example in [8] Is Woven Elaborately and Data at Table 2 Is Falsified 
4.1  Example in [8] Illustrates Nothing about Breaking 
  It is easily understood that according to fact 1.1 and 4, the authors of [8] can weave an example 
consistent with alg.1 and 2 since 
   the lever function value {f (1), …, f (n)} may be known in advance; 
   the coprime sequence {A1, …, An} may be selected elaborately in advance; 
   the condition Z / M – L / Ak < 1 / (2 Ak
2) at fact 1.1 is necessary for f (i) + f (j) = f (k); 
   the condition qu+1 > qu (M / (2∏ 
x=n-2 prime 〈x〉))
2 at fact 4 is necessary for f (i) + f (j) = f (k) sometime. 
  However, as is indicated in the above rebutment, a consistent example does not illustrates that a related 
{Ax} can be extracted accurately from an arbitrary public key {Cx} when {f (x)} and {Ax} are unknown in 
advance. The authors of [8] at most broke “their own REESSE1+”, which diverted themselves, but not our 
REESSE1+ with choice parameters. It is well understood that even though a cryptosystem is RSA or ECC, 
its parameter is must also selected; otherwise the cryptosystem is insecure. 
  The example in [8] is neither readable nor verifiable in short time, and the proportion of n to log2 M is 
not also proper, which contravenes the optimization principle for the modulus M in the REESSE1+ 
cryptosystem. An obvious truth is that if M is too large, the length of a public key will increase rapidly. 
Therefore, M should be as small as possible while at least meets M > ∏ n   x = 1 Ax meantime. Selection of the 
sequence {Ax} in [8] also contravenes the optimization principle. 
  The intent for [8] to select such a large M that n is out of proportion to log2 M seems to want to increase 
the necessity of the conditions at fact 1.1 and 4 for f (i) + f (j) = f (k). However, it can not increase the 
sufficiency of the conditions. 
4.2  Data at Table 2 Is Falsified for a Compatible Effect 
  In above paragraphs, we illustrate that the condition Z / M – L / Ak < 1 / (2 Ak
2) at fact 1.1, namely  (1″) 
is insufficient for f (i) + f (j) = f (k). Property I will make us better understand it. 
  Property I: Let Cx ≡ Ax W
(x) (% M), where every x ∈ [1, n], Ax ≤ ṕ, f (x) ∈ {5, …, n + 4}, M > ∏ 
x = 1 Ax 
is a prime. Then, ∀ i, j, k ∈ [1, n], even if f(i) + f(j) ≠ f(k),  
  1) there always exist 
Ci ≡ A′i W ′ 
f ′ (i), Cj ≡ A′j W ′ 
f ′ (j), and Ck ≡ A′k W ′ 
f ′ (k) (% M) 
such that f ′ (i) + f ′ (j) ≡ f ′ (k) (% ) with A′k ≤ ṕ.  
  2) Ci, Cj, Ck make (1″) hold with A′k ≤ ṕ in all probability. 
  Proof:  
Refuting the Pseudo Attack on the REESSE1+ Cryptosystem                           http://arxiv.org/pdf/0704.0492 
  Let Οd be an oracle for a discrete logarithm. 
  Suppose that W ′ ∈ [1, ] is a generator of ( * M , ·). 
  In terms of group theories, ∀ A′k ∈ {2, …, ṕ}, the equation  
Ck ≡ A′k W ′ 
(k) (% M) 
has a solution. f ′(k) may be taken through Οd. 
  ∀ f ′(i) ∈ [1, ], and let f ′(j) ≡ f ′(k) – f ′(i) (% ).  
  Then, from Ci ≡ A′i W ′
(i) and Cj ≡ A′j W ′
(j) (% M), we can obtain many distinct pairs (A′i, A′j), where A′i, 
A′j ∈ (1, M), and f ′(i) + f ′(j) ≡ f ′(k) (% ). 
  2) 
  Let 
Z ≡ Ci Cj Ck
 ≡ A′i A′j W ′ 
(i) + f ′
 (A′k W ′ 
(k))–1 (% M) 
with f ′(i) + f ′(j) ≡ f ′(k) (% ) but f(i) + f(j) ≠ f(k). 
  Further, there is A′i A′j ≡ Ci Cj Ck
 A′k (% M). 
  It is easily seen from the above equations the values of W′ and f ′(k) do not influence the value of A′i A′j. 
  If A′k ∈ [2, ṕ] changes, A′i A′j also changes. Thus, ∀ i, j, k ∈ [1, n], the number of value of A′i A′j is ṕ – 1. 
  Let M = 2 q ṕ 2 A′k, where q is a rational number. 
  According to (1), 
Z / M – L / A′k = A′i A′j / (M A′k) = A′i A′j / (2q ṕ
  When A′i A′j ≤ q ṕ
 2, there is  
Z / M – L / A′k ≤ q ṕ
 2 / (2 q ṕ 2 A′k
2) = 1 / (2 A′k
which satisfies (1″). 
  Assume that the value of A′i A′j distributes uniformly on (1, M). Then, the probability that A′i A′j makes (1″) 
hold is 
P∀ i,  j,  k ∈ [1, n] = (q ṕ
 2 / (2 q ṕ 2)) (1 / 2 + … + 1 / ṕ)) 
                                  ≥ (1 / 2)(2 (ṕ – 1) / (ṕ + 2)) 
                                  = 1 – 3 / (ṕ + 2). 
  It is seen that the probability is very large.                                                 
  According to property I.2, for a certain Ck ∈ {C1, …, Cn} and ∀ Ci, Cj ∈ {C1, …, Cn}, Ak will have 
roughly n2 values by (1″) namely the condition at fact 1.1, including the repeated, and considering the 
symmetry, almost every value has at least one counterpart. 
Of course, if the condition at fact 4, namely qu+1 > qu ∆ which connotes (1″) is used as a constraint, the 
number of values of Ak = qu will decrease. Example 4 already shows that even though f(i) + f(j) ≠ f(k), an 
eligible Ak can still be found. 
Notice that when i, j, k all fix on, it is fully possible that L / Ak has multiple satisfactory values, which 
implies multiple convergents of the continued fraction of Z / M likely meet (1″) and even qu+1 > qu ∆ . 
To clarify the matter thoroughly, we program by alg.1 in MS Visual C++, make an executable file, 
repeat the experiment regarding the public key at the example in [8] as input, and obtain the following 
output which is classified the same as in [8]: 
Refuting the Pseudo Attack on the REESSE1+ Cryptosystem                           http://arxiv.org/pdf/0704.0492 
Ak Tuples (i, j, k) 
A1 = 9 (9, 9, 1) 
A2 = 253 (7, 5, 2), (9, 6, 2), (5, 7, 2), (6, 9, 2) 
A3 = 16127 (10, 7, 3), (7, 10, 3) 
A4 = 3 (8, 3, 4), (3, 8, 4) 
A4 = 205 (9, 3, 4), (6, 5, 4), (5, 6, 4), (7, 7, 4), (3, 9, 4) 
A4 = 152391460756 (8, 7, 4), (7, 8, 4) 
A6 = 53022327 (4, 3, 6), (3, 4, 6) 
A6 = 318461273008612 (4, 3, 6), (3, 4, 6) 
A6 = 4471789987666990 (5, 3, 6), (3, 5, 6) 
A6 = 1572955621791218 (5, 5, 6) 
A8 = 2809 (5, 5, 8), (9, 7, 8), (7, 9, 8) 
A10 = 49 (9, 5, 10), (5, 9, 10) 
A10 = 1894 (9, 6, 10), (6, 9, 10) 
A10 = 6957 (9, 7, 10), (7, 9, 10) 
Table I: Output of the program by alg. 1 given the public key at the example in [8] 
Obviously, table 2 in [8] misrepresented A3 = 16127 as A4 = 16127, and A6 = 53022327 as A10 = 
53022327. What gets worse is that table 2 mutilated the two tuple data (4, 3, 6, 318461273008612) and (3, 
4, 6, 318461273008612), which is a type of data falsification. These two tuple data illustrate that for fixed i, 
j, k, the L / Ak may have several satisfactory values, namely the several convergents of the continued 
fraction of Z / M meet fact 4 meantime, which reflects the insufficiency of the condition qu+1 > qu ∆ further, 
increases the indeterminacy of Ak greatly, and weakens the reliability of alg.1 in [8] greatly. 
4.3  Example in [8] Is Woven Elaborately and Alg.2 in [8] Is Invalid 
In the above, it is mentioned that at most the authors of [8] broke “their own REESSE1+”, because the 
example in [8] is woven elaborately, and the parameters {Ax} and {f (x)} are selected deliberately. 
If we use another set of parameters for producing a public key as the input of the program by alg.1, the 
output result will contains so many disturbing data that the original sequence {A1, …, An} can not be 
distinguished in polynomial time. 
Example 5. 
Let n = 10, {Ax} = {437, 221, 77, 43, 37, 29, 41, 31, 15, 2}, and  
M = 13082761331670077 > ∏ n   x = 1 Ax = 13082761331670030. 
  Arbitrarily select W = 944516391, f(1) = 11, f(2) = 14, f(3) = 13, f(4) = 8, f(5) = 10, f(6) = 5, f(7) = 9, f(8) 
= 7, f(9) = 12, f(10) = 6. 
  According to Cx ≡ Ax W
 f(x) (% M), we obtain {Cx} = {3534250731208421, 12235924019299910, 
8726060645493642, 10110020851673707, 2328792308267710, 8425476748983036, 6187583147203887, 
10200412235916586, 9359330740489342, 5977236088006743}. 
Refuting the Pseudo Attack on the REESSE1+ Cryptosystem                           http://arxiv.org/pdf/0704.0492 
Input the public key {Cx} into the program by alg.1, and obtain ∆ = 506, max A = 58642670, and the 
following tuples greater than 100: 
Ak Tuples (i, j, k) 
A1 = 187125 (1, 1, 1) 
A1 = 121089 (2, 1, 1), (1, 2, 1) 
A1 = 77 (5, 3, 1), (3, 5, 1) 
A1 = 23 (8, 6, 1), (6, 8, 1), (10, 10, 1) 
A1 = 437 (10, 6, 1), (6, 10, 1) 
A2 = 1251 (1, 1, 2) 
A2 = 187125 (2, 1, 2), (1, 2, 2) 
A2 = 121089 (2, 2, 2) 
A2 = 17 (8, 4, 2), (6, 5, 2), (5, 6, 2), (10, 7, 2), (4, 8, 2), (7, 10, 2) 
A2 = 221 (10, 4, 2), (7, 6, 2), (6, 7, 2), (8, 8, 2), (4, 10, 2) 
A2 = 77 (9, 8, 2), (8, 9, 2) 
A2 = 4204 (10, 10, 2) 
A3 = 187125 (3, 1, 3), (1, 3, 3) 
A3 = 12 (7, 1, 3), (1, 7, 3) 
A3 = 121089 (3, 2, 3), (2, 3, 3) 
A3 = 77 (6, 4, 3), (4, 6, 3), (10, 8, 3), (8, 10, 3) 
A3 = 11 (10, 4, 3), (7, 6, 3), (6, 7, 3), (8, 8, 3), (4, 10, 3) 
A3 = 2113 (8, 7, 3), (7, 8, 3) 
A3 = 769 (9, 8, 3), (8, 9, 3) 
A4 = 187125 (4, 1, 4), (1, 4, 4) 
A4 = 121089 (4, 2, 4), (2, 4, 4) 
A4 = 76 (10, 6, 4), (6, 10, 4) 
A4 = 56 (10, 9, 4), (9, 10, 4) 
A5 = 187125 (5, 1, 5), (1, 5, 5) 
A5 = 630269 (6, 1, 5), (1, 6, 5) 
A5 = 121089 (5, 2, 5), (2, 5, 5) 
A5 = 41 (8, 2, 5), (2, 8, 5) 
A5 = 97 (4, 3, 5), (3, 4, 5) 
A5 = 37 (6, 6, 5), (10, 6, 5), (6, 10, 5) 
A6 = 187125 (6, 1, 6), (1, 6, 6) 
A6 = 121089 (6, 2, 6), (2, 6, 6) 
A7 = 187125 (7, 1, 7), (1, 7, 7) 
A7 = 121089 (7, 2, 7), (2, 7, 7) 
A7 = 3 (9, 3, 7), (3, 9, 7) 
Refuting the Pseudo Attack on the REESSE1+ Cryptosystem                           http://arxiv.org/pdf/0704.0492 
A8 = 187125 (8, 1, 8), (1, 8, 8) 
A8 = 34945619 (6, 2, 8), (2, 6, 8) 
A8 = 121089 (8, 2, 8), (2, 8, 8) 
A9 = 187125 (9, 1, 9), (1, 9, 9) 
A9 = 121089 (9, 2, 9), (2, 9, 9) 
A9 = 5 (6, 4, 9), (4, 6, 9), (10, 8, 9), (8, 10, 9) 
A9 = 15 (8, 6, 9), (6, 8, 9), (10, 10, 9) 
A10 = 259970 (4, 1, 10), (1, 4, 10) 
A10 = 187125 (10, 1, 10), (1, 10, 10) 
A10 = 121089 (10, 2, 10), (2, 10, 10) 
A10 = 7629 (8, 3, 10), (3, 8, 10) 
Table II: Output of the program by alg. 1 given the public key at example 5 
From table II, we observe that 
Ak from 5 tuples is A2 = 221 or A3 = 11, 
Ak from 4 tuples is A3 = 77 or A9 = 5, 
Ak from 3 tuples is A1 = 23, A5 = 37, or A9 = 15, 
Ak from 2 tuples is A1 = 77, A2 = 77, A3 = 12, A4 = 56, A5 = 41, or A7 = 3 etc, and 
Ak from 1 tuples is A1 = 187125, A2 = 1251, A2 = 121089, or A2 = 4204. 
Among these Ak′s, there exist at least 2
 n – 5 compatible combinations. 
For instance, arbitrarily select compatible A3 = 11, A9 = 5, A1 = 23, A5 = 41, and A2 = 1251, and find out 
f(3) = 14, f(9) = 13, f(1) = 12, f(5) = 11, and f(2) = 10 by Table 1 in [8]. 
Again for instance, arbitrarily select compatible A3 = 11, A9 = 5, A5 = 37, A7 = 3, and A1 = 187125, and 
find out f(3) = 14, f(9) = 13, f(5) = 12, f(7) = 11, and f(1) = 10 by Table 1 in [8]. 
Therefore, if keep Ω = {5, .., n + 4} unvaried, we may select fit {Ax} and W so as to make the time 
complexity of the continued fraction attack by qu+1 > qu ∆ and table 1 get to at least O(2 
n), which elucidates 
that the example woven elaborately in [8] has no practical meaning, and alg.2 in [8] is invalid. 
However, we had best select fit Ω while let {Ax} and W random so as to avoid attack by (1′) (see sect.5.1). 
4.4  Distribution of Tuples Relating Ak does not Follow Table 1 in [8] 
In addition, from table II we also observe that A2 = 17 involves 6 tuples, and A5 = 37 involves 3 tuples 
(but in fact, 6 tuples is impossible, and f(5) = 10), which indicates that the distribution of tuples relating Ak 
does not follow table 1 in [8]. Besides, considering A3 = 11 from 5 tuples, A9 = 5 from 4 tuples etc, we see 
that table 1 is insufficient for f (i) + f (j) = f (k), that is, the converse proposition of fact 2.2 does not hold. 
5  Why Is Cx ≡ Ax W f(x) (% M) Changed to Cx ≡ (Ax W f(x))δ (% M) in REESSE1+ v2.1 
5.1  Lever Set Ω Needs to Be Complicated When Cx ≡ Ax W
 f(x) (% M) 
In REESSE1, Cx ≡ Ax W
 f(x) (% M) with f (x) ∈ Ω = {5, …, n + 4}. 
Refuting the Pseudo Attack on the REESSE1+ Cryptosystem                           http://arxiv.org/pdf/0704.0492 
In REESSE1+, Cx ≡ Ax W
 f(x) (% M) with f (x) ∈ Ω = {5δ, …, (n + 4)δ | δ ≥ 1}, {5 + δ, …, (n + 4) + δ | δ 
≥ n − 4}, {5, 7, …, 19, 53, 55, …} etc. 
If let W′ = W δ (% M), we see that {5δ, …, (n + 4)δ | δ ≥ 1} is substantially the same as {5, …, n + 4}. 
Although [8] by Z / M – L / Ak < 1 / (2 Ak
2) and qu+1 > qu ∆ can not break REESSE1+ with Cx ≡ Ax W
 f(x) 
(% M) and Ω = {5δ, …, (n + 4)δ | δ ≥ 1}, attack by Z / M – L / Ak < 1 / (2 
2), namely (1′) will filter out 
the most of disturbing data as n is large, which makes REESSE1+ be faced with danger. Therefore, in 
REESSE1+ with Cx ≡ Ax W
 f(x) (% M), Ω needs to be complicated, namely had best select Ω = {5, 7, …, 19, 
53, 55, …} which is an odd set of 2n elements such that  ∀ e1, e2 ∈ Ω, e1 ≠ e2,  ∀ e1, e2, e3 ∈ Ω, e1 + e2 
≠ e3,  ∀ e1, e2, e3, e4 ∈ Ω, e1 + e2 + e3 ≠ e4. 
5.2  Key Transform Cx ≡ Ax W
 f(x) (% M) Needs to Be Strengthened When Still Ω = {5, …, n + 4} 
In REESSE1+ with Cx ≡ Ax W
 f(x) (% M) and f (x) ∈ Ω = {5, 7, …, 19, 53, 55, …}, because the elements 
of Ω are relatively large, decryption speed will decrease greatly. 
To keep Ω = {5, …, n + 4} unvaried, the key transform should be strengthened, so in REESSE1+ v2.1, 
we let Cx ≡ (Ax W
 f(x))δ (% M). In this way, REESSE1+ v2.1 is not only secure but also swift. 
6  Attack on the Signature Is an Eisegesis 
6.1  T –1 %  does not Exist and Q –1 %  not Necessarily Exist 
  Section 4 of the original [8] deduces U ≡ ((Q / H) 1 / S  (GW) –1δ δ (δ + 1) – 1 / S) Q T (% M), which is right. 
  However, (GW) –1δ δ (δ + 1) – 1 / S ≡ ((Q / H) – S – 1  – 1) U (Q T) – 1 (% M) further given in [8] is wrong because 
T –1 %  with T |  does not exist, and neither does Q –1 %  exist when gcd(Q, ) > 1. In the signature 
algorithm, it is easy to let gcd(Q, ) > 1. 
  Denote x = (GW) –1δ δ (δ + 1) – 1 / S (% M). 
  Then, the trivial solution to x Q T ≡ U ((Q / H) 1 / S) – Q T (% M) does not exist when gcd (T,  / T) > 1. 
  Due to stipulating T ≥ 2 n in the key generation algorithm, the time complexity of finding out a random 
solution to x Q T ≡ U ((Q / H) 1 / S) – Q T is at least max (O(2 n – 1), O(M / (Q T))) through the probabilistic 
algorithm [10]. 
  If a solution to x Q T ≡ U ((Q / H) 1 / S) – Q T is found through the discrete logarithm method, the probability 
that the solution is just equal to (GW) –1δ δ (δ + 1) – 1 / S (% M) is at most 1 / 2 n. 
  If denote x = ((GW) –1δ δ (δ + 1) – 1 / S) T (% M), then x Q ≡ U ((Q / H) 1 / S) – Q T (% M). 
  When gcd (Q, ) > 5 and M / Q > 2 n, seeking a solution to x Q ≡ U ((Q / H) 1 / S) – Q T is also at least the 
discrete logarithm problem. 
6.2  Forging Attack in [8] May Be Easily Avoided through Turning D | (δ Q – W) to D | (δ Q – WH) 
  In REESSE1+ [4], we definitely pointed out that Q ≠ Q1, where Q is produced currently, and Q1 is any of 
signature foreparts produced ever before. Of course, Q ≠ Q1 implied that the linear combination of Q1 with 
Q2 should be excluded from signature foreparts. However, such exclusion is infeasible in polynomial time. 
  Therefore, in practical applications, it is suggested as a shortcut that users move the parameter H in Q ≡ 
(R G0) 
 Hδ (% M) into D | (δ Q – W), and make D | (δ Q – W) become D | (δ Q – WH). In this wise, the 
Refuting the Pseudo Attack on the REESSE1+ Cryptosystem                           http://arxiv.org/pdf/0704.0492 
forgery attack in [8] is easily avoided, namely Q′ can not be forged out at least in polynomial time. 
  Notice that correspondingly, the λ S in the signature algorithm and the discriminant in the verification 
algorithm should also be adjusted. 
7  Conclusion 
  The above rebuttal shows that each or the combination of (1″), qu+1 > qu ∆, and table 1 is not sufficient 
for f (i) + f (j) = f (k), there exist logic errors in the deduction of (3), and alg.1 based on fact 4 and alg.2 
based on table 1 are not valid. Additional, the signature forgery attack in [8] is easily avoided. Hence, the 
conclusion of [8] that REESSE1+ is not secure at all (which connotes that [8] can extract a related private 
key from any public key in REESSE1+) is completely incorrect, as long as Ω is fitly selected, REESSE1+ 
with Cx ≡ Ax W
 f(x) (% M) is secure, and the private key attack in [8] like [6] is a pseudo attack.. 
The authors of [8] attempt to convince people or credulous one of their opinion through an example 
woven elaborately, and their purpose is to want to suffocate REESSE1+, suppress us, and elevate 
themselves. Especially, [8] like [6] does not list the origin of idea of the continued fraction analysis of 
REESSE1, and falsifies the data at table 2, which violates scientific research ethics and honestness. 
We welcome unmalicious, co-promotive, and normal academic criticism which is utterly necessary. 
References 
[1] Shenghui Su, The REESSE1 Public-key Encryption Algorithms, Int. C1: H04L 9/14, ZL01110163.6, Chinese Patent, Apr. 
2001. 
[2] Shenghui Su, The REESSE1 Public-key Cryptosystem, Computer Engineering & Science, Chinese, v25(5), 2003, 
pp.13-16. 
[3] Shenghui Su, Yixian Yang and Bingru Yang, The Necessity and Sufficiency Analysis of the Lever Function in the 
REESSE1 Encryption Scheme, Acta Electronica Sinica, Chinese, v34(10), 2006, pp.1892-1895. (Received May 13, 2005) 
[4] Shenghui Su and Shuwang Lü, The REESSE1+ Public-key Cryptosystem, http://eprint.iacr.org/2006/420.pdf. 
[5] Shengli Liu, Fangguo Zhang and Kefei Chen, Cryptanalysis of REESSE1 Digital Signature Algorithm, CCICS 2005, 
Xi’an, China, May 2005. 
[6] Shengli Liu, Fangguo Zhang and Kefei Chen, Cryptanalysis of REESSE1 Public Encryption Cryptosystem, Information 
Security, Chinese, n7, 2005, pp.121-124. 
[7] Shenghui Su, Refuting the Pseudo Attack on the REESSE1 Public-key Algorithms for Encryption, Computer Engineering 
and Applications, Chinese, v42(20), 2006, pp.129-133. 
[8] Shengli Liu and Fangguo Zhang, Cryptanalysis of REESSE1+ Public Key Cryptosystem, http://eprint.iacr.org/2006/ 
480.pdf, Dec. 22, 2006. 
[9] Kenneth H. Rosen, Elementary Number Theory and Its Applications (5th ed.), Boston: Addison-Wesley, 2005, ch. 12. 
[10] Henri Cohen, A Course in Computational Algebraic Number Theory, Berlin: Springer-Verlag, 2000, ch. 1, 3. 
Remark 
The first version of this paper was sent to the authors of [8] via email on Mar. 6, 2007, and the draft of this 
revised version was sent to the authors of [8] via email between Oct. 23 and Nov. 12, 2009 repeatedly. 
The authors of [8] revised section 5 of [8] on Mar. 12, 2007 after they read this paper and the eprint.iacr.org′s 
demand that [8] should be withdrawn or modified, but the modification avoided the heavy and chose the light.
ABSTRACT
  We illustrate through example 1 and 2 that the condition at theorem 1 in [8]
dissatisfies necessity, and the converse proposition of fact 1.1 in [8] does
not hold, namely the condition Z/M - L/Ak < 1/(2 Ak^2) is not sufficient for
f(i) + f(j) = f(k). Illuminate through an analysis and ex.3 that there is a
logic error during deduction of fact 1.2, which causes each of fact 1.2, 1.3, 4
to be invalid. Demonstrate through ex.4 and 5 that each or the combination of
qu+1 > qu * D at fact 4 and table 1 at fact 2.2 is not sufficient for f(i) +
f(j) = f(k), property 1, 2, 3, 4, 5 each are invalid, and alg.1 based on fact 4
and alg.2 based on table 1 are disordered and wrong logically. Further,
manifest through a repeated experiment and ex.5 that the data at table 2 is
falsified, and the example in [8] is woven elaborately. We explain why Cx = Ax
* W^f(x) (% M) is changed to Cx = (Ax * W^f(x))^d (% M) in REESSE1+ v2.1. To
the signature fraud, we point out that [8] misunderstands the existence of T^-1
and Q^-1 % (M-1), and forging of Q can be easily avoided through moving H.
Therefore, the conclusion of [8] that REESSE1+ is not secure at all (which
connotes that [8] can extract a related private key from any public key in
REESSE1+) is fully incorrect, and as long as the parameter Omega is fitly
selected, REESSE1+ with Cx = Ax * W^f(x) (% M) is secure.

<|endoftext|><|startoftext|>
Phase structure of a surface model on dynamically triangulated spheres with elastic
skeletons
Hiroshi Koibuchi∗
Department of Mechanical and Systems Engineering,
Ibaraki National College of Technology, Nakane 866 Hitachinaka, Ibaraki 312-8508, Japan
(Dated: August 12, 2021)
We find three distinct phases; a tubular phase, a planar phase, and the spherical phase, in a
triangulated fluid surface model. It is also found that these phases are separated by discontinu-
ous transitions. The fluid surface model is investigated within the framework of the conventional
curvature model by using the canonical Monte Carlo simulations with dynamical triangulations.
The mechanical strength of the surface is given only by skeletons, and no two-dimensional bending
energy is assumed in the Hamiltonian. The skeletons are composed of elastic linear-chains and rigid
junctions and form a compartmentalized structure on the surface, and for this reason the vertices
of triangles can diffuse freely only inside the compartments. As a consequence, an inhomogeneous
structure is introduced in the model; the surface strength inside the compartments is different from
the surface strength on the compartments. However, the rotational symmetry is not influenced by
the elastic skeletons; there is no specific direction on the surface. In addition to the three phases
mentioned above, a collapsed phase is expected to exist in the low bending rigidity regime that was
not studied here. The inhomogeneous structure and the fluidity of vertices are considered to be the
origin of such variety of phases.
PACS numbers: 64.60.-i, 68.60.-p, 87.16.Dg
I. INTRODUCTION
A crumpling of surfaces has been investigated on the
basis of the singularity analysis, and progress has been
recently made on understanding the crumpling phenom-
ena; the universal structure on the crumpled thin sheets
was found in the formations of singularity of ridges and
cones [1, 2]. A similar transition to this phenomena was
also found experimentally between the smooth state and
the crumpled state in an artificial membrane, which is
partly polymerized [3].
Studies have also been focused on the transition in the
surface model of Helfrich, Polyakov and Kleinert (HPK)
[4, 5, 6] from the viewpoint of statistical mechanics [7,
8, 9, 10, 11, 12, 13]. The bending rigidity is known to
be stiffened by the thermal fluctuation of the surface,
and this was confirmed in the statistical mechanics of
membranes [14, 15, 16, 17, 18]. Numerical studies were
made to understand the transition in triangulated surface
models [19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]. The
transition was reported as first-order in recent numerical
studies [31, 32].
On the other hand, the concern with inhomogeneous
surfaces has been growing over the past decade [33, 34]. A
homogeneous artificial membrane that is coated by elas-
tic skeletons is also considered to be an inhomogeneous
membrane. Some of the mechanical properties of such
membranes were revealed experimentally [35]. The hop
diffusion of membrane protein or lipids was observed, and
as a consequence the compartment of cytoskeletons was
∗Electronic address: koibuchi@mech.ibaraki-ct.ac.jp
confirmed to be in biological membranes [36]. It is also
well known that the microtubule, which is an element
of the cytoskeleton, gives a mechanical strength to the
surface of the biological membranes.
However, the surface collapsing phenomena and the
surface fluctuation phenomena are almost unknown in
such inhomogeneous models for membranes. Therefore,
it is worthwhile to study an inhomogeneous fluid sur-
face model within the framework of the conventional sur-
face model of HPK. We note that the inhomogeneity in
our model corresponds to the cytoskeletons in biologi-
cal membranes as stated above. The fluidity realized by
dynamical triangulations in the inhomogeneous model,
as well as the fluidity in the homogeneous surface mod-
els, corresponds to the lateral diffusion of lipids in mem-
branes.
In this paper we study a compartmentalized surface
model by Monte Carlo (MC) simulations. The Hamil-
tonian of the model includes no two-dimensional bend-
ing energy but a one-dimensional bending energy. The
model is defined on dynamically triangulated surfaces,
where the free diffusion of vertices is confined inside the
compartments. The mechanical strength of the surface is
given only by the compartment boundary, which is com-
posed of one-dimensional elastic chains and rigid junc-
tions. Because the collapsed phase is expected to appear
at sufficiently small bending rigidity b[kT ]→0(b 6=0), we
concentrate on the phase structure at relatively large b
in this paper. Consequently, information on the phase
boundary at b→0 remains unanswered.
We recently reported numerical results of three types
of surface models [37, 38], which are similar to the model
in this paper. Then, we should comment on the similar-
ity/difference between the model in this paper and the
http://arxiv.org/abs/0704.0493v1
mailto:koibuchi@mech.ibaraki-ct.ac.jp
models in [37, 38]. Firstly, the lattice structure of the
model in this paper is very similar to that of the first
model in [37] and that of the model in [38], and is iden-
tical to that of the second model in [37]. Secondly, the
lattice in this paper and that of the first model in [37]
are the dynamically triangulated one, while the lattice of
the second model in [37] and that in [38] are the fixed-
connectivity one. Thirdly, the Hamiltonian is different
from the one in the first model in [37]. The Hamiltonian
of the model of this paper includes only one-dimensional
bending energy, which is defined on the compartment
boundary, while the Hamiltonian of the first in [37] in-
cludes only a two-dimensional bending energy, which
is defined all over the surface, and no one-dimensional
bending energy is given to the compartment boundary.
Therefore, the model in this paper is different from the
three models in [37, 38].
Our results obtained in this paper show that the model
undergoes a first-order transition between the smooth
phase and the crumpled phase. Moreover, the smooth
phase can be divided into the spherical phase and the
planar phase, and the crumpled phase can also be divided
into the tubular phase and the collapsed phase, which is
expected to appear at sufficiently small b because no self-
avoiding property [39, 40, 41] is assumed in the model.
It must be emphasized that such variety of phases can
be seen neither in the conventional surface models nor in
the compartmentalized models such as those in [37, 38].
One remarkable result is the appearance of planar sur-
faces. The echinocytic shapes of erythrocytes were ex-
tensively studied, and they are currently known to be
described by many models such as the area difference
bilayer model [34]. The shape of membranes is also sen-
sitive to the flow fields [42]. Our model in this paper indi-
cates that one possible origin of such planar shape comes
from the inhomogeneity due to the cytoskeltal structure
and the fluidity of lateral diffusion of vertices.
II. MODEL
Figure 1(a) shows a triangulated surface of size
(N,NS , NJ , L) = (2322, 600, 42, 6), where N is the total
number of vertices including the junctions, NS is the total
number of vertices on the chains, NJ is the total number
of junctions, and L is the length of chains between the
two nearest-neighbor junctions. It should be noted again
that NJ is included in N ; junctions are counted in the
total number of vertices. The junctions are assumed as
rigid plates; twelve of them are pentagon and all the oth-
ers are hexagon. The junction size in Fig.1(a) is drawn
many times larger than that of the lattices for the sim-
ulations; and it will be discussed in the last part of this
section. Thick lines on the surface in Fig.1(a) denote the
chains, which are terminated at the junctions.
The construction of the lattices is as follows: Let us
start with the icosahedron. Every edge of the icosahe-
dron is divided into ℓ pieces of uniform length, and then
(a) (N,NS , NJ , L)=
(2322, 600, 42, 6)
(b) A rigid junction
with the chains
θ(ij)θ(ij)
FIG. 1: (Color online) (a) Starting configuration of surfaces
of size (N,NS, NJ , L)=(2322, 600, 42, 6), and (b) angles θ(ij)
in the bending energy S2 of Eq.(4). Thick lines in (a) denote
the compartment boundary composed of the linear chains and
the rigid junctions of the hexagonal and the pentagonal plates,
whose size is drawn many times larger than that of the lattices
for the simulations.
we have a triangulated surface of size N0 = 10ℓ
2+2 (=
the total number of vertices on the surface). The com-
partmentalized structures are constructed by dividing ℓ
further into m pieces (m=1, 2, · · · ). Thus, we have the
chains of uniform length L= (ℓ/m)−2 when m divides
ℓ. The reason for the subtraction −2 is because of the
junctions at the two end points of the chain. Because
the compartmentalized structure is a sublattice, the to-
tal number of junctions NJ is given by NJ = 10m
The total number of bonds in the sublattice is 3NJ−6,
and each bond contains L−1 vertices, then NS is given
by NS =(3NJ−6)(L−1), which can be written as NS =
30m(ℓ−3m). The hexagonal (pentagonal) rigid junctions
are composed of 7 (6) vertices, then NJ−12 hexagonal
rigid junctions and 12 pentagonal rigid junctions reduce
the total number of vertices N0 by (NJ−12)×6 and 12×5.
Therefore, we have N=N0−6NJ+12, which can also be
written as N=10ℓ2−60m2+2. The thermodynamic limit
of our model is defined by N→∞, NS→∞, and NJ →∞
under the condition that L is finite. We have the thermo-
dynamic limit at ℓ→∞ and m→∞. The lattice of size
(N,NS , NJ , L)=(2322, 600, 42, 6) in Fig.1(a) is given by
two independent integers (ℓ,m)=(16, 2).
The surfaces can be characterized by the length L. In
this paper, we assume three values for L such that
L = 6, L = 8, L = 11. (1)
The value of L has a one to one correspondence with the
total number of vertices n in a compartment; in fact, the
values of L in Eq.(1) correspond to n= 21, n= 36, and
n=66, respectively [37]. We note that the effective phys-
ical meaning of increasing (decreasing) L can be consid-
ered as the increasing (decreasing) temperature. In fact,
the surface fluctuation mainly comes from the thermal
fluctuation of vertices inside the compartments. Because
no bending energy is assumed inside the compartments,
the fluctuation of vertices becomes large not only in the
in-plane directions (free diffusion) but also in the direc-
tion perpendicular to the surface. Thus, we consider that
the fluctuations are expected to grow with increasing n,
i.e., increasing L.
We use the surfaces of size (N,NS , NJ) listed in Ta-
ble I. Three different sizes (N,NS , NJ) are assumed for
each L. The corresponding integers (ℓ,m) are as follows:
(16, 2), (24, 3), and (32, 4) for the L=6 surfaces, (10, 1),
(20, 2), and (30, 3) for the L = 8 surfaces, and (13, 1),
(26, 2), and (39, 3) for the L=11 surfaces.
TABLE I: The surface size assumed in the simulations. Three
sizes (N,NS , NJ ) are assumed for each L.
L (N,NS , NJ ) (N,NS , NJ ) (N,NS , NJ )
6 (2322,600,42) (5222,1350,92) (9282,2400,162)
8 (942,210,12) (3762,840,42) (8462,1890,92)
11 (1632,300,12) (6522,1200,42) (14672,2700,92)
The model is defined by the partition function
dXi exp [−S(X, T )] , (2)
S(X, T ) = S1 + bS2,
where S1 is the Gaussian bond potential, which is de-
fined all over the surface, and S2 is the one-dimensional
bending energy, which is defined on the compartment
boundary and will be given below. The parameter b is
the bending rigidity. The integration symbol
in Eq.(2)
denotes that the center of mass of the surface is fixed.
denotes the sum over all possible triangulations T ,
which are performed by the bond flip technique keeping
the compartments unflipped. The bond flip procedure
will be given in the following section.
The integration measure
i=1 dXi is given by the
product
dXi =
qαj(i)
(α = 3/2, 0), (3)
where N ′ (=N−NJ) is the total number of vertices ex-
cluding the junctions,
i=1 dXiq
i denotes the integra-
tion over the 3D translational degrees of freedom (DOF)
of the vertices i, and
i=1 dXi
j(i) q
j(i) denotes those
of the 3D translational DOF and the 3D rotational DOF
of the junctions i. The co-ordination number qi is the
total number of bonds meeting at the vertex i, and qj(i)
is the total number of bonds meeting at the corner j(i)
of the junction i.
The parameter α was chosen to be α=3/2 in [43, 44],
while α = 0 in many previous simulations on dynami-
cally triangulated surfaces in the literatures. It is easy to
understand that large positive α suppresses the configu-
rations with large coordination number. Therefore, it is
interesting to see the dependence of the phase structure
on α.
We chose both α = 3/2 and α = 0 for the weight
qαi [43, 44], and see whether the phase structure of the
model depends on α or not. If the parameter is chosen
to α=3/2, then the coordination number qi serves as a
weight of the integration dXi, while α=0 gives the uni-
form weight. The weight
i=1 q
i can also be written
i=1 q
i = exp(α
log qi), and therefore,
i=1 q
is considered to be the co-ordination dependent term
log qi in the Hamiltonian; −α
log qi changes
its value only on dynamically triangulated surfaces.
The Gaussian term S1 and the bending energy term
S2 are defined by
(Xi −Xj)
, S2 =
1− cos θ(ij)
, (4)
where
(ij) in S1 is the sum over bonds (ij) connecting
the vertices i and j, and
(ij) in S2 is also the sum over
bonds (ij). θ(ij) in S2 is the angle between the bonds
i and j, which include virtual bonds. The virtual bonds
denote the lines between the center and the corners of the
junction; the hexagonal (pentagonal) junction contains
six (five) virtual bonds.
Figure 1(b) is a junction and the chains linked to the
junction on a fluctuating surface. Triangles are elimi-
nated from the figure. One θ(ij) shown at a corner of
the junction is defined by using a virtual bond and a real
bond in a chain, and the other θ(ij) shown at a vertex is
defined by real bonds on the same chain.
The size of the junctions can be characterized by the
edge length R, which is fixed to
R = 0.1 (edge length of the junctions). (5)
The value R = 0.1 is relatively smaller than the mean
bond length 0.707, which corresponds to the relation
S1/N = 1.5 satisfied in the equilibrium configuration of
surfaces without the rigid junctions. As we will see later,
the relation S1/N =1.5 is slightly violated in the model
of this paper because of the rigid junctions.
Here we comment on the unit of physical quantities.
Let a be the length scale of the model, then the unit
of physical quantity that has the length unit can be ex-
pressed by a; the unit of S1 is [a
2]. The surface tension
coefficient λ in λS1+bS2 has the unit [kT/a
2] and as-
sumed to be λ=1[kT/a2], and the bending rigidity b has
the unit of [kT ] as described above.
Note that the bending rigidity b in the Hamiltonian is
a microscopic quantity from the view point of statistical
mechanical model, and therefore b is not always identical
to the macroscopic bending rigidity of real physical mem-
branes. However, the microscopic value b of real mem-
branes can effectively be varied with the temperature,
because b has the unit of kT . Therefore, it is possible to
consider that the phase structure described in terms of b
in the surface model corresponds to the phase structure
described in terms of T in real physical membranes. The
length scale a in the model is also a microscopic quantity
and, we consider that a is sufficiently smaller than the
membrane size.
III. MONTE CARLO TECHNIQUE
A sequence of random numbers called Mersenne
Twister [45] is used in the canonical MC simulations.
The Metropolis technique is applied to update X and T ,
where the variable X denotes the position of the ver-
tices and that of the junctions. The vertex position
X is shifted so that X ′ = X + δX , where δX is ran-
domly chosen in a small sphere. The new position X ′ is
accepted with the probability Min[1, exp(−∆S)], where
∆S = S(new)−S(old). The position X of a hexagonal
(or pentagonal) junction, which is not a point but a rigid
plate, is also integrated out by performing 3D random
translations and 3D random rotations.
Thus, the variable X is updated by a random N ′ (=
N−NJ) shifts of vertices, a random NJ translations of
junctions, and a randomNJ rotations of junctions. These
updates are denoted by (N ′, NJ , NJ) updates of X . The
N ′ shifts of X can be divided into NS shift of the vertices
on the linear chains and N ′−NS shifts of all the other
vertices, which are those inside the compartments.
The radius of the small sphere for δX is fixed at the
beginning of the MC simulations in order to maintain
about 50% acceptance rate. The vertices on the linear
chains carry the bending energy S2 in Eq.(4), while all the
other vertices inside the compartments does not. There-
fore, the acceptance rate is independently controlled in
the two-groups of vertices. The radius for the random
translation of the junctions and that for the random rota-
tion are also independently chosen so that the acceptance
rates are both about 50%.
The summation over T in Z of Eq.(2) is performed by
using the standard bond flip technique [22, 23]. The flip
is accepted with the probability Min[1, exp(−∆S)]. The
acceptance rate for the bond flip is not under control and
is about 75%, which is almost independent of b.
The bonds are labeled with sequential numbers. The
total number of bonds is denoted by N ′B, which excludes
the number of bonds on the linear chains because the
bonds on the linear chains remain unflipped.
The bond flip is performed as follows: Firstly, the odd-
numbered bonds are sequentially chosen to be flipped for
the N ′B/2 updates of T , and after that, the (N,NJ , NJ)
updates of X are performed. Secondly, the remaining
even-numbered bonds are chosen to be flipped for the
N ′B/2 updates of T , and after that, the (N,NJ , NJ) up-
dates ofX are performed. Thus, the (N,NJ , NJ) updates
of X and the N ′B/2 updates of T are consecutively per-
formed, and these make one MCS (Monte Carlo Sweep).
We introduce the lower bound 1× 10−8 to the area of
triangles. No lower bound is imposed on the bond length.
IV. RESULTS OF SIMULATION
A. α = 3/2
As mentioned in Section II, we assume the value of
α in Eq.(3) as α = 3/2 and α = 0. In this subsection,
we present the results obtained under α= 3/2 by using
snapshots and figures, and in the next subsection we will
show some of the results under α=0.
The thermalization MCS is 1× 107 in almost all cases.
However, more than 1 × 108 thermalization MCS were
done close to the transition point in such cases that the
surface is trapped in one phase at first and then changes
its phase to a more stable one under a given condition.
The total number of MCS for the production of samples
is 0.8 × 108 ∼ 1.3 × 108. At the transition point, about
2 × 108 MCS was performed after the thermalization in
some cases.
(a) b= 21.2 (b) b= 21.8 (c) b= 22
(d) The section (e) The section (f) The section
FIG. 2: (Color online) The snapshots of surfaces of size
(N,NS , NJ , L) = (8462, 1890, 92, 8) obtained at (a) b = 21.2
(tubular phase), (b) b = 21.8 (planar phase), and (c) b = 22
(spherical phase), and (d),(e),(f) are the surface sections of
(a),(b),(c), respectively. α=3/2.
We show snapshots of the (N,NS , NJ , L) =
(8462, 1890, 92, 8) surface in Figs.2(a)–2(c). They were
obtained at (a) b = 21.2, (b) b = 21.8, and (c) b = 22,
which respectively corresponds to the tubular phase, the
planar phase, and the spherical phase. The snapshot of
Fig.2(b) at b=21.8 was the final configuration produced
after 2× 108 MCS including 1× 108 thermalizaion MCS;
the planar surface was stable after the thermalization
MCS. The surface sections are shown in Figs.2(d)–2(f);
the sections in Figs.2(d) and 2(e) were obtained by slicing
the surfaces perpendicular to the vertical axis, and the
section in Figs.2(f) was obtained by slicing the surface
perpendicular to a horizontal axis. All of the snapshots
were drawn in the same scale. The axis of the tubular
surface Fig.2(a) as well as the axis perpendicular to the
planar surface Fig.2(b) is spontaneously chosen.
The planar phase is stable only on the L=8 surfaces,
while it seems unstable on the L=6 surfaces and on the
L=11 surfaces. Even if the planar phase once appears on
the surfaces of L=6 and L=11 of size at least N≤9282
and N ≤ 14672, respectively, it eventually collapses into
the tubular phase. Therefore, we find that no planar
phase can be seen on the L=6 and the L=11 surfaces;
the tubular phase and the spherical phase are connected
by a discontinuous transition on those surfaces. Thus,
we understand that the planar phase appears depending
on the size of the compartments. We should note that
the planar surface may bend and fluctuate in the limit of
N→∞, and the tubular surface may also bend and wind
in the same limit.
20 21 22 23
S2/NS'
N=8462
α=1.5
tubular
planar
spherical
9 9.5 10
S2/NS'
N=9282
α=1.5
tubular spherical
46 48 50 52
spherical
S2/NS'
N=6522
α=1.5
tubular
FIG. 3: The one-dimensional bending energy S2/N
S against
b obtained on the surfaces of (a) L = 6, (b) L = 8, and (c)
L=11. N ′S(=NS+6NJ −12) is the total number of vertices
where S2 is defined.
Figures 3(a),3(b), and 3(c) show the bending energy
S of Eq.(4) against b, which were obtained on the
surfaces of L = 6, L = 8, and L = 11, respectively.
N ′S(=NS+6NJ−12) is the total number of vertices where
S2 is defined. 6NJ −12 is the total number of corners
of the junctions, which include 12-pentagons. The solid
lines on the data were drawn to guide the eyes. Dashed
lines drawn vertically denote the phase boundary be-
tween the tubular and the spherical phases, the bound-
ary between the tubular and the planar phases, and the
boundary between the planar and the spherical phases.
The discontinuous change of S2/N
S between the tubu-
lar phase and the spherical (or the planar) phase is very
clear in the figures and considered to be a sign of the
first-order transition.
In order to see the difference between S2/N
S in those
three phases, we plot in Figs.4(a),4(b), and 4(c) the varia-
tion of S2/N
S against MCS obtained at b=21.2, b=21.4,
and b = 21.8 on the (N,NS , NJ , L) = (8462, 1890, 92, 8)
surface. The thermalization MCS were not discarded;
they were included only in those variations. S2/N
b=21.2 in Fig.4(a) shows a jump from the spherical phase
to the planar phase and a jump from the planar phase to
the tubular phase; the corresponding MCS at the jumps
were indicated with the dashed vertical lines. We also
find in Fig.4(b) a jump from the spherical phase to the
planar phase. A jump is also seen in S2/N
S at b=21.8
0 0.5 1
planar
N=8462
L=8b=21.4
spherical
[x108]0 0.5 1
N=8462
b=21.2
tubularplanarspherical
[x108] 0 1 2
N=8462
b=21.8
spherical
planar
0.05 0.06
b=21.2
(d) 0.05 0.06
b=21.4
(e) 0.05 0.06
b=21.8
FIG. 4: The variation of S2/N
S against MCS, which were
obtained on the (N,NS , NJ , L)=(8462, 1890, 92, 8) surface at
(a) b=21.2, (b) b=21.4, and (c) b=21.8. The dashed lines
denote the MCS where the jumps occurred. The correspond-
ing normalized histogram h(S2) obtained at (d) b=21.2, (e)
b = 21.4, and (f) b = 21.8. The parameter α was fixed to
α=3/2.
in Fig.4(c) from the spherical phase to the planar phase.
The value of b=21.2 corresponds to the tubular phase,
whereas b = 21.4 and b= 21.8 correspond to the planar
phase, because the final states are considered to be stable
states. The surfaces at b=21.2 and b=21.8 can be seen
in the snapshots in Figs.3(a) and 3(b).
The distribution of S2/N
S are shown as the normal-
ized histograms h(S2) in Figs.4(d)–4(f), which respec-
tively correspond to the variations in Figs.4(a)–4(c). We
see that h(S2) in Fig.4(d) has three peaks; two of them
are almost overlapping and the other one is distinctly
separated from the previous two. Those three peaks in
h(S2) correspond to the spherical phase, planar phase,
and the tubular phase. Two almost overlapping peaks
can also be seen in h(S2) in Figs.4(e) and 4(f), and they
are corresponding to the spherical phase and the planar
phase. We remark that the surfaces hardly change not
only from the tubular phase to the smooth (= spherical
or planar) phase but also from the planar phase to the
spherical phase on the L = 8 and L = 11 surfaces. For
this reason, we find in Figs.4(a)–4(c) no jump-back from
a higher S2 state (such as the tubular state) to a lower
S2 state (such as the planar state).
The two-dimensional bending energy is defined by
(1− ni · nj) , (6)
where ni is the unit normal vector of the triangle i, and
ni ·nj is defined on the common bond (ij) of the triangles
i and j. S3 is not included in the Hamiltonian and is
defined even on the edges of the rigid junctions. Figures
5(a)–5(c) show S3/NB against b obtained on the surfaces
of L=6, L=8, and L=11, where NB is the total number
of bonds including the edges of the junctions. The jump
of S3/NB in Fig.5(b) is clearly seen between the tubular
20 21 22 23
S3/NB
N=8462
α=1.5
tubular
planar
spherical
9 9.5 10
S3/NB
N=9282
α=1.5
tubular spherical
46 48 50 52
spherical
S3/NB
N=6522
α=1.5
tubular
FIG. 5: The two-dimensional bending energy S3/NB against
b obtained on the surfaces of (a) L = 6, (b) L = 8, and (c)
L=11. NB is the total number of bonds where S3 is defined.
phase and the planar phase. On the contrary, S3/NB
in the planar phase in Fig.5(b), as well as S2/N
S in the
planar phase in Fig.3(b), is not so clearly distinguishable
from that in the spherical phase.
20 21 22 23
1.512
1.514
1.516
N=8462
α=1.5
tubular
planar
spherical
9 9.5 10
1.514
1.516
1.518
N=9282
α=1.5
sphericaltubular
46 48 50 52
1.508
1.512
N=6522
α=1.5
tubular spherical
FIG. 6: The Gaussian bond potential S1/N against b obtained
on the surfaces of (a) L=6, (b) L=8, and (c) L=11.
It is expected that the Gaussian bond potential S1/N
is influenced by the phase transitions. The potential
S1/N should be S1/N ≃ 3/2, which is satisfied in the
model without the rigid junctions because of the scale
invariant property of the partition function in that case.
However, the junction size R in Eq.(5) is finite in the
model of this paper, and therefore S1/N can slightly de-
viate from 3/2.
Figures 6(a)–6(c) show S1/N against b obtained on the
surfaces of (a) L=6, (b) L=8, and (c) L=11. Discontin-
uous changes in S1/N shown in the figures are consistent
with the discontinuous transitions of the model, although
the changes are very small compared to the value of S1/N
itself. We find also the expected deviation of S1/N from
3/2 in the figures.
Figures 7(a)–7(c) show the mean square sizeX2, which
is defined by
Xi − X̄
, X̄ =
Xi, (7)
where X̄ is the center of mass of the surface. We see
that the phase transition is not reflected in X2 on the
L=6 surfaces in Fig.7(a), and the transition is also not
reflected in X2 on the L= 8 surfaces in Fig.7(b) at the
transition point between the planar phase and the spher-
ical phase. To the contrary, X2 discontinuously changes
20 21 22 23
N=942
α=1.5
N=8462
N=3762
tubular spherical
planar
9 9.5 10
α=1.5
N=2322
N=9282
N=5222
tubular spherical
46 48 50 52
N=6522
α=1.5
N=1632
N=14672
tubular spherical
FIG. 7: The mean square size X2 against b obtained on the
surfaces of (a) L=6, (b) L=8, and (c) L=11.
in Fig.7(b) at the transition point between the tubular
phase and the planar phase and also at the transition
point in Fig.7(c). All of these behaviors ofX2 at the tran-
sition points are consistent with those of S2/N
S , S3/NB,
and S1/N .
B. α = 0
In this subsection, we present some of the results ob-
tained under α=0.
(a) b= 20.9 (b) b= 21.4 (c) b= 21.8
(d) The section (e) The section (f) The section
FIG. 8: (Color online) The snapshots of surfaces of size
(N,NS , NJ , L) = (8462, 1890, 92, 8) obtained at (a) b = 20.9
(tubular phase), (b) b=21.4 (planar phase), and (c) b=21.8
(spherical phase), and (d),(e),(f) are the surface sections of
(a),(b),(c), respectively. α=0.
Snapshots of surfaces of α=0 are shown in Figs.8(a),
8(b), 8(c), which respectively correspond to the tubu-
lar phase (b = 20.9), the planar phase (b = 21.4), and
the spherical phase (b = 21.8). The surface size is
(N,NS , NJ , L)= (8462, 1890, 92, 8), which is identical to
that in Fig.2. The snapshot in Fig.8(b) at b = 21.4 is
the final configuration produced after 1.9× 108 MCS in-
cluding 1 × 107 thermalizaion MCS; the planar surface
was stable throughout the simulation. Thus, we find
that three distinct phases are seen also in the surfaces
of L = 8, and that the planar phase is unstable on the
surfaces of L= 6 and L= 11 under the condition α= 0.
Therefore, we consider that the phase structure of the
model is independent of whether α=3/2 or α=0.
20 21 22 23
S2/NS'
N=8462
tubular
planar
spherical
9 9.5 10
S2/NS'
N=9282
tubular spherical
46 48 50 52
spherical
S2/NS'
N=14672
tubular
FIG. 9: The one-dimensional bending energy S2/N
S against b
obtained on the surface of (a) L=6, (b) L=8, and (c) L=11.
N ′S(=NS+6NJ−12) is the total number of vertices where S2
is defined.
The one-dimensional bending energy S2/N
S obtained
under α=0 is shown in Figs.9(a)–9(c). A discontinuous
change can be seen in S2/N
S not only in Fig.9(b) at the
phase boundary between the tubular phase and the pla-
nar phase but also in Fig.9(c) at the phase boundary be-
tween the tubular phase and the spherical phase. A jump
of S2/N
S in Fig.9(b) at the transition point between the
planar phase and the spherical phase is very small, and
hence is hardly seen just the same as in Fig.3(b) under
α = 3/2 in the previous subsection. Thus, we find no
difference between S2/N
S of α=0 and that of α=3/2.
20 21 22 23
N=942
N=8462
N=3762
tubular spherical
planar
9 9.5 10
N=2322
N=9282
N=5222
tubular spherical
46 48 50 52
N=14672
tubular spherical
FIG. 10: The mean square size X2 against b obtained on the
surfaces of (a) L=6, (b) L=8, and (c) L=11.
The mean square size X2 are shown in Figs.10(a)–
10(c). A jump is also seen in X2 on the L = 8 and
L=11 surfaces in Figs.10(b) and 10(c), and it is hardly
seen on the L=6 surfaces of size up to (N,NS , NJ , L)=
(9282, 2400, 162, 6). These results are identical to those
observed in Figs.7(a)–7(c) under α=3/2.
Finally, we comment on the planar phase appeared
only on the L = 8 surface. The thermal fluctuation of
vertices inside the compartments disorders the surface
against the bending energy of the compartment bound-
ary. Therefore, the strength to disorder the surface in-
creases (decreases) with increasing (decreasing) L if N
remains fixed, as stated in Section II. On the other
hand, the mechanical strength of the surface increases
(decreases) with decreasing (increasing) L, because the
total number of junctions increases (decreases) with de-
creasing (increasing) L. Therefore, the strength to or-
der the surface increases (decreases) with decreasing (in-
creasing) L. Then, we expect that the surface is ordered
(disordered) at sufficiently small (large) L at given in-
termediate value of b. Moreover, it seems possible that
two competitive forces to order/disorder the surface are
balanced with each other at intermediate values of L and
consequently, some new phase appears depending on b
at those L. Note also that the possibility of the appear-
ance of planar phase is not completely eliminated on the
surfaces of L=6 and L=11 of sufficiently large size.
V. SUMMARY AND CONCLUSION
We have shown that a dynamically triangulated spher-
ical surface has three distinct phases; the tubular phase,
the planar phase, and the spherical phase, and that they
are separated by discontinuous transitions. The first-
order nature was very clear from the discontinuity in the
bending energies S2 and S3 not only at the transition
point between the tubular phase and the planar phase
but also at the transition point between the tubular phase
and the spherical phase. We know that the model has
the collapsed phase at sufficiently small b, since the self-
avoiding property is not assumed at least. Therefore, we
expect that the model has four different phases including
the collapsed phase, although the order of the transition
between the collapsed phase and the tubular phase is un-
known.
The mechanical strength of the surface is given only
by elastic linear-chains with rigid junctions. The
triangulated surfaces are characterized by the size
(N,NS , NJ , L), where N is the total number of vertices
including the junctions, NS is the total number of vertices
on the chains, NJ is the total number of junctions, and L
is the length of chains between the two nearest-neighbor
junctions on the starting configurations. These four pa-
rameters are not totally independent, because these are
given by two independent integers (ℓ,m), where m di-
vides ℓ. In fact, N =10ℓ2−60m2+2, NS =30m(ℓ−3m),
NJ =10m
2+2, and L=(ℓ/m)−2.
We assumed three different values for L such that L=
6, L=8, and L=11 in the simulations. The edge length
R of the rigid junction was fixed to be R = 0.1. The
parameter α, which represents a weight for the three-
dimensional integrations of the partition function, was
assumed as α=3/2 and α=0.
It is remarkable that the model has the planar phase,
which is stable only on the surfaces with a specific struc-
ture. In fact, the planar phase can be seen on the surfaces
of L=8, and it is unstable on the L=6 and L=11 sur-
faces. The planar phase appears in a narrow region on
the b-axis between the tubular phase and the spherical
phase, and it is distinguishable from the spherical phase
because a small but finite discontinuity can be seen in
the bending energies S2/N
S and S3/NB. The gap of the
bending energy S2 at the transition point is very small,
i.e., S2 in the planar phase is almost identical to that
in the spherical phase; however, the double peak struc-
ture was clearly seen in the histogram of S2, which is
included in the Hamiltonian. From this, we confirmed
that the transition between the planar phase and the
spherical phase is of first order. Our model in this pa-
per indicates that one possible origin of planar shape of
spherical membranes comes from the inhomogeneity due
to the cytoskeltal structure and the fluidity of lateral dif-
fusion of vertices.
We have confirmed that the results obtained at α=3/2
in Eq.(3) remain unchanged when α = 0. The phase
structure of the fluid surface model in this paper is in-
dependent of the choice of α at least for α = 3/2 and
α= 0. Large scale simulations should be performed. It
remains to be studied how large (ℓ,m) are sufficient for
the thermodynamic limit of the model.
Acknowledgments
This work is supported in part by a Grant-in-Aid for
Scientific Research from Japan Society for the Promotion
of Science.
[1] E. Cerda and L. Mahadevan, Phys. Rev. Lett. 80, (1998)
2358.
[2] R. da Silveira, S. Chaieb and L. Mahadevan, Science,
287, (2000) 1468.
[3] Sahraoui Chaieb, Vinay K. Natrajan, and Ahmed Abd
El-rahman, Phys. Rev. Lett. 96, 078101(1 - 4) (2006).
[4] W. Helfrich, Z. Naturforsch, 28c (1973) 693.
[5] A.M. Polyakov, Nucl. Phys. B 268 (1986) 406.
[6] H. Kleinert, Phys. Lett. 174B (1986) 335.
[7] D. Nelson, in Statistical Mechanics of Membranes and
Surfaces, Second Edition, edited by D. Nelson, T.Piran,
and S.Weinberg, (World Scientific, 2004), p.1.
[8] F. David, in Two dimensional quantum gravity and ran-
dom surfaces, Vol.8, edited by D. Nelson, T. Piran, and
S. Weinberg, (World Scientific, Singapore, 1989), p.81.
[9] D. Nelson, in Statistical Mechanics of Membranes and
Surfaces, Second Edition, edited by D. Nelson, T.Piran,
and S.Weinberg, (World Scientific, 2004), p.149.
[10] K. Wiese, in: C.Domb, J.Lebowitz (Eds.), Phase Transi-
tions and Critical Phenomena, Vol. 19, Academic Press,
London, 2000, p.253.
[11] M. Bowick and A. Travesset, Phys. Rep. 344 (2001) 255.
[12] G. Gompper and M. Schick, Self-assembling amphiphilic
systems, In Phase Transitions and Critical Phenomena
16, C. Domb and J.L. Lebowitz, Eds. (Academic Press,
1994) p.1.
[13] J.F. Wheater, J. Phys. A Math. Gen. 27 (1994) 3323.
[14] L. Peliti and S. Leibler, Phys. Rev. Lett. 54 (15) (1985)
1690.
[15] F. David and E. Guitter, Europhys. Lett, 5 (8) (1988)
[16] M. Paczuski, M. Kardar, and D. R. Nelson, Phys. Rev.
Lett. 60 (1988) 2638.
[17] M.E.S. Borelli, H. Kleinert, and Adriaan M.J. Schakel,
Phys. Lett. A 267 (2000) 201.
[18] M.E.S. Borelli and H. Kleinert, Phys. Rev. B 63 (2001)
205414.
[19] Y. Kantor and D.R. Nelson, Phys. Rev. A 36 (1987) 4020.
[20] J.F. Wheater, Nucl. Phys. B 458 (1996) 671
[21] M. Bowick, S. Catterall, M. Falcioni, G. Thorleifsson,
and K. Anagnostopoulos, J. Phys. I France 6 (1996) 1321;
M. Bowick, S. Catterall, M. Falcioni, G. Thorleifsson,
and K. Anagnostopoulos, Nucl. Phys. Proc. Suppl. 47
(1996) 838;
M. Bowick, S. Catterall, M. Falcioni, G. Thorleifsson,
and K. Anagnostopoulos, Nucl. Phys. Proc. Suppl. 53
(1997) 746.
[22] A.Baumgartner and J.S.Ho, Phys. Rev. A, 41, (1990)
5747 .
[23] S.M. Catterall, Phys. Lett. 220B, 253 (1989).
[24] S.M. Catterall, J.B. Kogut, and R.L. Renken, Nucl. Phys.
Proc. Suppl. B 99A (1991) 1.
[25] J. Ambjorn, A. Irback, J. Jurkiewicz, and B. Petersson,
Nucl. Phys. B 393 (1993) 571.
[26] K. Anagnostopoulos, M. Bowick, P. Gottington, M. Fal-
cioni, L. Han, G. Harris, and E. Marinari, Phys. Lett.
317B (1993) 102.
[27] M. Bowick, P. Coddington, L. Han, G. Harris, and E.
Marinari, Nucl. Phys. Proc. Suppl. 30 (1993) 795;
M. Bowick, P. Coddington, L. Han, G. Harris, and E.
Marinari, Nucl. Phys. B 394 (1993) 791.
[28] H. Koibuchi, Phys. Lett. A 300 (2002) 582;
H. Koibuchi, N. Kusano, A. Nidaira, K. Suzuki, and
M.Yamada, Phys. Lett. A 319 (2003) 44;
H. Koibuchi, N. Kusano, A. Nidaira, and K. Suzuki,
Phys. Lett. A 332 (2004) 141.
[29] H. Koibuchi, Eur. Phys. J. B 45 (2005) 377; Eur. Phys.
J. B, 52 (2006) 265.
[30] H. Koibuchi, A. Nidaira, T. Morita, and K. Suzuki, Phys.
Rev. E 68 (2003) 011804;
H. Koibuchi, Z. Sasaki, and K. Shinohara, Phys. Rev. E
70, (2004) 066144.
[31] J-P. Kownacki and H. T. Diep, Phys. Rev. E 66 (2002)
066105.
[32] H. Koibuchi, N. Kusano, A. Nidaira, K. Suzuki, and M.
Yamada, Phys. Rev. E 69 (2004) 066139;
H. Koibuchi and T. Kuwahata, Phys. Rev. E 72 (2005)
026124;
I. Endo and H. Koibuchi, Nucl. Phys. B 732 [FS] (2006)
[33] Ling Miao, Udo Seifert, Michael Wortis, and Hans-
Gunther Dobereiner Phys. Rev. E 49, 5389 - 5407 (1994).
[34] Marija Jari, Udo Seifert, Wolfgang Wintz, and Michael
Wortis, Phys. Rev. E 52 (1995) 6623 - 6634.
[35] E.Helfer, S.Harlepp, L.Bourdieu, J.Robert,
F.C.MacKintosh, and D. Chatenay, Phys. Rev. Lett. 87
(2001) 088103.
[36] K. Murase, T. Fujiwara, Y. Umehara, K. Suzuki, R.
Iino, H. Yamashita, M. Saito, H. Murakoshi, K. Rito-
hie, and A. Kusumi, Ultrafine Membrane Compartments
for Molecular Diffusion as Revealed by Single Molecule
Techniques, Biol. J. 86 (2004) 4075 - 4093 .
[37] H.Koibuchi, submitted; the first model and the second
model are identical to that in cond-mat/0607225 and that
in cond-mat/0607508, respectively.
[38] H.Koibuchi, J. Stat. Phys. in press, cond-mat/0607225.
[39] G. Grest, J. Phys. I (France) 1 (1991) 1695.
[40] M. Bowick and A. Travesset, Eur. Phys. J. E 5 (2001)
[41] M. Bowick, A. Cacciuto, G. Thorleifsson, and A. Traves-
set, Phys. Rev. Lett. 87 (2001) 148103.
[42] H. Noguchi and G. Gompper, Phys. Rev. Lett. 93,
258102 (2004).
[43] F. David, Nucl. Phys. B 257 [FS14], 543 (1985).
[44] D.V. Boulatov, V.A. Kazakov, I.K. Kostov and A.A.
Migdal, Nucl. Phys. B 275 [FS17], 641 (1986).
[45] M. Matsumoto and T. Nishimura, ”Mersenne Twister:
A 623-dimensionally equidistributed uniform pseudoran-
dom number generator”, ACM Trans. on Modeling and
Computer Simulation Vol. 8, No. 1, January (1998) pp.3-
http://arxiv.org/abs/cond-mat/0607225
http://arxiv.org/abs/cond-mat/0607508
http://arxiv.org/abs/cond-mat/0607225
ABSTRACT
  We find three distinct phases; a tubular phase, a planar phase, and the
spherical phase, in a triangulated fluid surface model. It is also found that
these phases are separated by discontinuous transitions. The fluid surface
model is investigated within the framework of the conventional curvature model
by using the canonical Monte Carlo simulations with dynamical triangulations.
The mechanical strength of the surface is given only by skeletons, and no
two-dimensional bending energy is assumed in the Hamiltonian. The skeletons are
composed of elastic linear-chains and rigid junctions and form a
compartmentalized structure on the surface, and for this reason the vertices of
triangles can diffuse freely only inside the compartments. As a consequence, an
inhomogeneous structure is introduced in the model; the surface strength inside
the compartments is different from the surface strength on the compartments.
However, the rotational symmetry is not influenced by the elastic skeletons;
there is no specific direction on the surface. In addition to the three phases
mentioned above, a collapsed phase is expected to exist in the low bending
rigidity regime that was not studied here. The inhomogeneous structure and the
fluidity of vertices are considered to be the origin of such variety of phases.

<|endoftext|><|startoftext|>
Introduction
A deeper understanding of the structure of Hilbert spaces of finite dimensions is of utmost
importance for quantum information theory. Recently, we made an important step in this
respect by demonstrating that the commutation algebra of the generalized Pauli operators on
the 2N -dimensional Hilbert spaces is embodied in the geometry of the symplectic polar space of
rank N and order two [1, 2, 3]. The case of two-qubit operator space, N = 2, was scrutinized
in very detail [1, 3] by explicitly demonstrating, in different ways, the correspondence between
various subsets of the generalized Pauli operators/matrices and the fundamental subgeometries
of the associated rank-two polar space – the (unique) generalized quadrangle of order two. In
this paper we will reveal another interesting geometry hidden behind the Pauli operators of
two-qubits, namely that of the Veldkamp space defined on this generalized quadrangle.
2 Finite generalized quadrangles and Veldkamp spaces
In this section we will briefly highlight the basics of the theory of finite generalized quadran-
gles [4] and introduce the concept of the Veldkamp space of a point-line incidence geometry [5]
to be employed in what follows.
http://arxiv.org/abs/0704.0495v3
mailto:msaniga@astro.sk
http://www.ta3.sk/~msaniga/
mailto:michel.planat@femto-st.fr
mailto:pracna@jh-inst.cas.cz
mailto:havlicek@geometrie.tuwien.ac.at
http://www.emis.de/journals/SIGMA/2007/075/
2 M. Saniga, M. Planat, P. Pracna and H. Havlicek
A finite generalized quadrangle of order (s, t), usually denoted GQ(s, t), is an incidence struc-
ture S = (P,B, I), where P and B are disjoint (non-empty) sets of objects, called respectively
points and lines, and where I is a symmetric point-line incidence relation satisfying the following
axioms [4]: (i) each point is incident with 1+ t lines (t ≥ 1) and two distinct points are incident
with at most one line; (ii) each line is incident with 1 + s points (s ≥ 1) and two distinct lines
are incident with at most one point; and (iii) if x is a point and L is a line not incident with x,
then there exists a unique pair (y,M) ∈ P ×B for which xIMIyIL; from these axioms it readily
follows that |P | = (s + 1)(st + 1) and |B| = (t + 1)(st + 1). It is obvious that there exists a
point-line duality with respect to which each of the axioms is self-dual. Interchanging points
and lines in S thus yields a generalized quadrangle SD of order (t, s), called the dual of S. If
s = t, S is said to have order s. The generalized quadrangle of order (s, 1) is called a grid and
that of order (1, t) a dual grid. A generalized quadrangle with both s > 1 and t > 1 is called
thick.
Given two points x and y of S one writes x ∼ y and says that x and y are collinear if there
exists a line L of S incident with both. For any x ∈ P denote x⊥ = {y ∈ P |y ∼ x} and note that
x ∈ x⊥; obviously, x⊥ = 1 + s+ st. Given an arbitrary subset A of P , the perp(-set) of A, A⊥,
is defined as A⊥ =
{x⊥|x ∈ A} and A⊥⊥ := (A⊥)⊥. A triple of pairwise non-collinear points
of S is called a triad; given any triad T , a point of T⊥ is called its center and we say that T is
acentric, centric or unicentric according as |T⊥| is, respectively, zero, non-zero or one. An ovoid
of a generalized quadrangle S is a set of points of S such that each line of S is incident with
exactly one point of the set; hence, each ovoid contains st+ 1 points.
The concept of crucial importance is a geometric hyperplane H of a point-line geometry
Γ(P,B), which is a proper subset of P such that each line of Γ meets H in one or all points [6].
For Γ = GQ(s, t), it is well known that H is one of the following three kinds: (i) the perp-set of
a point x, x⊥; (ii) a (full) subquadrangle of order (s, t′), t′ < t; and (iii) an ovoid.
Finally, we need to introduce the notion of the Veldkamp space of a point-line incidence
geometry Γ(P,B), V(Γ) [5]. V(Γ) is the space in which (i) a point is a geometric hyperplane of Γ
and (ii) a line is the collection H1H2 of all geometric hyperplanes H of Γ such that H1
H = H2
H or H = Hi (i = 1, 2), where H1 and H2 are distinct points of V(Γ).
Γ = S, from the preceding paragraph we learn that the points of V(S) are, in general, of three
different types.
3 The smallest thick GQ and its Veldkamp space
The smallest thick GQ is obviously the one with s = t = 2, dubbed the “doily.” This quadrangle
has a number of interesting representations of which we mention the most important two [4].
One, frequently denoted as W3(2) or simply W (2), is in terms of the points of PG(3, 2) (i.e.,
the three-dimensional projective space over the Galois field with two elements) together with
the totally isotropic lines with respect to a symplectic polarity. The other, usually denoted as
Q(4, 2), is in terms of points and lines of a parabolic quadric in PG(4, 2). By abuse of notation,
any GQ isomorphic to W (2) will also be denoted by this symbol. From the preceding section
we readily get that W (2) is endowed with 15 points/lines, each line contains three points and,
dually, each point is on three lines; moreover, it is a self-dual object, i.e., isomorphic to its dual.
W (2) features all the three kinds of hyperplanes, of the following cardinalities [5]: 15 perp-sets,
x⊥, seven points each; 10 grids (of order (2, 1)), nine points each; and six ovoids, five points
each – as depicted in Fig. 1. The quadrangle exhibits two distinct kinds of triads, viz. unicentric
and tricentric. A point of W (2) is the center of four distinct unicentric triads (Fig. 2, left); hence,
1It is important to mention here that the definition of Veldkamp space given by Shult in [7] is more restrictive
than that of Buekenhout and Cohen [5] adopted in this paper.
The Veldkamp Space of Two-Qubits 3
Figure 1. The three kinds of geometric hyperplanes of W (2). The points of the quadrangle are repre-
sented by small circles and its lines are illustrated by the straight segments as well as by the segments
of circles; note that not every intersection of two segments counts for a point of the quadrangle. The
upper panel shows the points’ perp-sets (yellow bullets), the middle panel grids (red bullets) and the
bottom panel ovoids (blue bullets); the use of different colouring will become clear later. Each picture –
except that in the bottom right-hand corner – stands for five different hyperplanes, the four other being
obtained from it by its successive rotations through 72 degrees around the center of the pentagon.
Figure 2. Left: – The four distinct unicentric triads (grey bullets) and their common center (black
bullet); note that the triads intersect pairwise in a single point and their union covers fully the center’s
perp-set. Right: – A grid (red bullets) and its complement as a disjoint union of two complementary
tricentric triads (black and grey bullets); the two triads are also seen to comprise a dual grid (of order
(1, 2)).
4 M. Saniga, M. Planat, P. Pracna and H. Havlicek
Figure 3. The five different kinds of the lines of V(W (2)), each being uniquely determined by the
properties of its core-set (black bullets). Note that the “yellow” hyperplanes (i.e., perp-sets) occur
in each type, and yellow is also the colour of two homogeneous (i.e., endowed with only one kind of
a hyperplane) types (2nd and 3rd row). It is also worth mentioning that the cardinality of core-sets is
an odd number not exceeding five. The three hyperplanes of any line are always in such relation to each
other that their union comprises all the points of W (2).
The Veldkamp Space of Two-Qubits 5
the number of such triads is 4 × 15 = 60. Tricentric triads always come in “complementary”
pairs, one representing the centers of the other, and each such pair is the complement of a grid
of W (2) (Fig. 2, right); hence, the number of such triads is 2 × 10 = 20. A unicentric triad is
always a subset of an ovoid, which is never the case for a tricentric triad; the latter, in graph-
combinatorial terms, representing a complete bipartite graph on six vertices. Now, we have
enough background information at hand to reveal the structure of the Veldkamp space of our
“doily”, V(W (2)).2
From the definition given in Section 2, we easily see that V(W (2)) consists of 31 points of
which 15 are represented/generated by single-point perp-sets, 10 by grids and six by ovoids.
The lines of V(W (2)) feature three points each and are of five distinct types, as illustrated in
Fig. 3. These types differ from each other in the cardinality and structure of “core-sets”, i.e.,
the sets of points of W (2) shared by all the three hyperplanes forming a given line. As it is
obvious from Fig. 3, the lines of the first three types (the first three rows of the figure) have the
core-sets of the same cardinality, three, differing from each other only in the structure of these
sets as being unicentric triads, tricentric triads and triples of collinear points, respectively. The
lines of the fourth type have as core-sets pentads of points, each being a quadruple of points
collinear with a given point of W (2), whereas core-sets of the last type’s lines feature just a single
point. A much more interesting issue is the composition of the lines. Just a brief look at Fig. 3
reveals that geometric hyperplanes of only one kind, namely perp-sets, are present on each line
of V(W (2)); grids and ovoids occur only on two kinds of the lines. We also see that the purely
homogeneous types are those whose core-sets feature collinear triples and tricentric triads, the
most heterogeneous type – the one exhibiting all the three kinds of hyperplanes – being that
characterized by unicentric triads. We also notice that there are no lines comprising solely grids
and/or solely ovoids, nor the lines featuring only grids and ovoids, which seems to be connected
with the fact that the cardinality of a core-set is an odd number. From the properties of W (2)
and its triads as discussed above it readily follows that the number of the lines of type one to
five is 60, 20, 15, 45 and 15, respectively, totalling 155. All these observations and facts are
gathered in Table 1. We conclude this section with the observation that V(W (2)) has the same
number of points (31) and lines (155) as PG(4, 2), the four-dimensional projective space over
the Galois field of two elements [8]; this is not a coincidence, as the two spaces are, in fact,
isomorphic to each other [5].
4 Pauli operators of two-qubits in light of V(W (2))
As discovered in [1] (see also [3]), the fifteen generalized Pauli operators/matrices associated
with the Hilbert space of two-qubits (see, e.g., [9]) can be put into a one-to-one correspondence
with the fifteen points of the generalized quadrangle W (2) in such a way that their commutation
algebra is completely and uniquely reproduced by the geometry of W (2) in which the concept
commuting/non-commuting translates into that of collinear/non-collinear. Given this mapping,
it was possible to ascribe a definitive geometrical meaning to sets of three pairwise commuting
generalized Pauli operators in terms of lines of W (2) and to other three kinds of distinguished
subsets of the operators having their counterparts in geometric hyperplanes of W (2) as shown
in Table 2 (see [1, 3] for more details). Yet, V(W (2)) puts this bijection in a different light,
in which other three subsets of the Pauli operators come into play, namely those represented
by the two types of a triad and by the specific pentads occurring as the core-sets of the lines
of V(W (2)) (Table 1). As already mentioned, the role of tricentric triads of the operators has
2As this paper is primarily aimed at physicists rather than mathematicians, in what follows we opt for an
elementary and self-contained exposition of the Veldkamp space of W (2); this explanation is based only upon
some very simple properties of W (2) readily to be grasped from its depiction as “the doily”, and does not
presuppose/require any further background from the reader.
6 M. Saniga, M. Planat, P. Pracna and H. Havlicek
Table 1. A succinct summary of the properties of the five different types of the lines of V(W (2)) in
terms of the core-sets and the types of geometric hyperplanes featured by a generic line of a given type.
The last column gives the total number of lines per the corresponding type.
Type of Core-Set Perp-Sets Grids Ovoids #
Single Point 1 0 2 15
Collinear Triple 3 0 0 15
Unicentric Triad 1 1 1 60
Tricentric Triad 3 0 0 20
Pentad 1 2 0 45
Table 2. Three kinds of the distinguished subsets of the generalized Pauli operators of two-qubits (PO)
viewed as the geometric hyperplanes in the generalized quadrangle of order two (GQ) [1, 3].
PO set of five mutually set of six operators nine operators of
non-commuting operators commuting with a given one a Mermin’s square
GQ ovoid perp-set\{reference point} grid
been recognized in disguise of complete bipartite graphs on six vertices [3]. A true novelty here
is obviously unicentric triads and pentads of the generalized Pauli operators as these are all
intimately connected with single-point perp-sets; given a point of W (2) (i.e., a generalized Pauli
operator of two-qubits), its perp-set fully encompasses four unicentric triads (Fig. 2, left) and
three pentads (Fig. 3, 4th row) of points/operators. This feature has also a very interesting
aspect in connection with the conjecture relating the existence of mutually unbiased bases and
finite projective planes raised in [10], because with each point x of W (2) there is associated
a projective plane of order two (the Fano plane) whose points are the elements of x⊥ and whose
lines are the spans {u, v}⊥⊥, where u, v ∈ x⊥ with u 6= v [4].
Identifying the Pauli operators of a two-qubit system with the points of the generalized
quadrangle of order two led to the discovery of three distinguished subsets of the operators in
terms of geometric hyperplanes of the quadrangle. Here we go one level higher, and identifying
these subsets with the points of the associated Veldkamp space leads to recognition of nother
remarkable subsets of the Pauli operators, viz. unicentric triads and pentads. It is really
intriguing to see that these are the core-sets of the two kinds of lines that both feature grids
alias Mermin squares. As it is well known, Mermin squares, which reveal certain important
aspects of the entanglement of the system, play a crucial role in the proof of the Kochen–Specker
theorem in dimension four and our approach gives a novel geometrical meaning to this [3, 11].
At the Veldkamp space level it turns out of particular importance to study relations between
eigenvectors of the above-mentioned unicentric triads and pentad of operators in order to reveal
finer, hitherto unnoticed traits of the structure of Mermin squares. These seem to be intimately
connected with the existence of outer automorphisms of the symmetric group on six letters,
which is the full group of automorphisms of our quadrangle; as this group is the only symmetric
group possessing (non-trivial) outer automorphisms, this implies that two-qubits have a rather
special footing among multiple qubit systems. All these aspects deserve special attention and
will therefore be dealt with in a separate paper.
Concerning three-qubits, our preliminary study indicates that the corresponding finite geo-
metry differs fundamentally from that of W (2) in the sense that it contains multi-lines, i.e., two
or more lines passing through two distinct points [12]. As we do not have a full picture at hand
yet, we cannot see if it admits hyperplanes and so lends itself to constructing the corresponding
Veldkamp space. If the latter does exist, it is likely to differ substantially from that of two-
qubits, which would imply the expected difference between entanglement properties of the two
The Veldkamp Space of Two-Qubits 7
kinds of systems; if not, this will only further strengthen the above-mentioned uniqueness of
two-qubits.
5 Conclusion
By employing the concept of the Veldkamp space of the generalized quadrangle of order two,
we were able to recognize other, on top of those examined in [1, 2, 3], distinguished subsets of
generalized Pauli operators of two-level quantum systems, namely unicentric triads and pentads
of them. It may well be that these two kinds of subsets of the two-qubit Pauli operators hold an
important key for getting deeper insights into the nature of finite geometries underlying multiple
higher-level quantum systems [12, 13], in particular when the dimension of Hilbert space is not
a power of a prime [14].
Acknowledgements
This work was partially supported by the Science and Technology Assistance Agency under
the contract # APVT–51–012704, the VEGA grant agency projects # 2/6070/26 and # 7012
(all from Slovak Republic), the trans-national ECO-NET project # 12651NJ “Geometries Over
Finite Rings and the Properties of Mutually Unbiased Bases” (France), the CNRS–SAV Project
# 20246 “Projective and Related Geometries for Quantum Information” (France/Slovakia) and
by the 〈Action Austria–Slovakia〉 project # 58s2 “Finite Geometries Behind Hilbert Spaces”.
References
[1] Saniga M., Planat M., Pracna P., Projective ring line encompassing two-qubits, Theor. Math. Phys., to
appear, quant-ph/0611063.
[2] Saniga M., Planat M., Multiple qubits and symplectic polar spaces of order two, Adv. Studies Theor. Phys.
1 (2007) 1–4, quant-ph/0612179.
[3] Planat M., Saniga M., On the Pauli graph of N-qudits, quant-ph/0701211.
[4] Payne S.E., Thas J.A., Finite generalized quadrangles, Pitman, Boston – London – Melbourne, 1984.
[5] Buekenhout F., Cohen A.M., Diagram geometry (Chapter 10.2), Springer, New York, to appear; preprints
of separate chapters can be found at http://www.win.tue.nl/~amc/buek/.
[6] Ronan M.A., Embeddings and hyperplanes of discrete geometries, European J. Combin. 8 (1987), 179–185.
[7] Shult E., On Veldkamp lines, Bull. Belg. Math. Soc. 4 (1997), 299–316.
[8] Hirschfeld J.W.P., Thas J.A., General Galois geometries, Oxford University Press, Oxford, 1991.
[9] Lawrence J., Brukner Č., Zeilinger A., Mutually unbiased binary observable sets on N qubits, Phys. Rev. A
65 (2002), 032320, 5 pages, quant-ph/0104012.
[10] Saniga M., Planat M., Rosu H., Mutually unbiased bases and finite projective planes, J. Opt. B: Quantum
Semiclass. Opt. 6 (2004), L19–L20, math-ph/0403057.
[11] Saniga M., Planat M., Minarovjech M., Projective line over the finite quotient ring GF (2)[x]/(x3 − x) and
quantum entanglement: the Mermin “magic” square/pentagram, Theor. Math. Phys. 151 (2007), 625–631,
quant-ph/0603206.
[12] Planat M., Saniga M., Pauli graph and finite projective lines/geometries, Optics and Optoelectronics, to
appear, quant-ph/0703154.
[13] Thas K., Pauli operators of N-qubit Hilbert spaces and the Saniga–Planat conjecture, Chaos Solitons Frac-
tals, to appear.
[14] Thas K., The geometry of generalized Pauli operators of N-qudit Hilbert space, Quantum Information and
Computation, submitted.
http://arxiv.org/abs/quant-ph/0611063
http://arxiv.org/abs/quant-ph/0612179
http://arxiv.org/abs/quant-ph/0701211
http://www.win.tue.nl/~amc/buek/
http://arxiv.org/abs/quant-ph/0104012
http://arxiv.org/abs/math-ph/0403057
http://arxiv.org/abs/quant-ph/0603206
http://arxiv.org/abs/quant-ph/0703154
	Introduction
	Finite generalized quadrangles and Veldkamp spaces
	The smallest thick GQ and its Veldkamp space
	Pauli operators of two-qubits in light of V(W(2))
	Conclusion
	References
ABSTRACT
  Given a remarkable representation of the generalized Pauli operators of
two-qubits in terms of the points of the generalized quadrangle of order two,
W(2), it is shown that specific subsets of these operators can also be
associated with the points and lines of the four-dimensional projective space
over the Galois field with two elements - the so-called Veldkamp space of W(2).
An intriguing novelty is the recognition of (uni- and tri-centric) triads and
specific pentads of the Pauli operators in addition to the "classical" subsets
answering to geometric hyperplanes of W(2).

<|endoftext|><|startoftext|>
Fusion process studied with preequilibrium giant dipole resonance in time-dependent
Hartree-Fock theory
C. Simenel1,2, Ph. Chomaz2 and G. de France2
1 DSM/DAPNIA/SPhN, CEA/SACLAY, F-91191 Gif-sur-Yvette Cedex, France and
2 Grand Accélérateur National d’Ions Lourds (GANIL), CEA/DSM-CNRS/IN2P3,
Bvd Henri Becquerel, BP 55027,F-14076 CAEN Cedex 5, France
(Dated: November 1, 2018)
The equilibration of macroscopic degrees of freedom during the fusion of heavy nuclei, like the
charge and the shape, are studied in the Time-Dependent Hartree-Fock theory. The preequilibrium
Giant Dipole Resonance (GDR) is used to probe the fusion path. It is shown that such isovector
collective state is excited in N/Z asymmetric fusion and to a less extent in mass asymmetric systems.
The characteristics of this GDR are governed by the structure of the fused system in its preequi-
librium phase, like its deformation, rotation and vibration. In particular, we show that a lowering
of the preequilibrium GDR energy is expected as compared to the statistical one. Revisiting ex-
perimental data, we extract an evidence of this lowering for the first time. We also quantify the
fusion-evaporation enhancement due to γ-ray emission from the preequilibrium GDR. This cooling
mechanism along the fusion path may be suitable to synthesize in the future super heavy elements
using radioactive beams with strong N/Z asymmetries in the entrance channel.
PACS numbers: 24.30.Cz, 21.60.Jz, 25.70.Gh, 25.70.Jj
I. INTRODUCTION
The fusion of two nuclei occurs at small impact pa-
rameters when the overlap between their wave functions
is big enough to allow the strong interaction to overcome
the Coulomb repulsion. Heavy-ion fusion reactions have
numerous applications, like the study of high spin states
in yrast and super-deformed bands [1] or the formation of
Heavy and Super Heavy Elements (SHE) [2]. Induced by
beams of unstable nuclei, this mechanism will also allow
to produce very exotic species and allow for the study of
isospin equilibration in the fused system.
The fusion process can be schematically divided in
three steps: (i) an approach phase during which each nu-
cleus feels only the Coulomb field of its partner and which
ends up when the nuclear interaction starts to dominate,
(ii) a rapid equilibration of the energy and the angular
momentum transfered from the relative motion to the
internal degrees of freedom, leading to the formation of
a Compound Nucleus (CN) and (iii) a statistical decay
of the CN. Lots of theoretical and experimental efforts
[3] are made to understand step (i). These studies focus
on an energy range located around the fusion barrier. At
these energies the fusion is controlled by quantum tunnel-
ing which is strongly influenced by the couplings between
the internal degrees of freedom and the relative motion of
the two colliding partners. Although the cooling mecha-
nisms involved in (iii) are well known and consist mainly
in light particle and γ-ray emission in competition with
fission for heavy systems, the initial conditions of the
statistical decay depend on the equilibration process (ii)
which is still subject to many debates nowadays. Indeed,
step (ii) is characterized by an equilibration of several
degrees of freedom like the shape [4] or the charge [5]
which can be accompanied by the emission of preequi-
librium particles. Such emission decreases the excitation
energy and the angular momentum. The latter quanti-
ties are crucial and must be determined precisely because
they have a major influence on the CN survival probabil-
ity and therefore on the synthesis of very exotic systems
such as the SHE.
In this paper we study the equilibration of the charges
in fused systems, its interplay with other macroscopic de-
grees of freedom like the shape and the rotation, and its
implications on the statistical decay. To probe theoret-
ically and experimentally this way to fusion, we use the
preequilibrium isovector Giant Dipole Resonance (GDR)
[6, 7, 8, 9, 10]. Giant Resonances are interpreted as the
first quantum of collective vibrations involving protons
and neutrons fluids. The Giant Monopole Resonance
can be described as a breathing mode, an alternation
of compression and dilatation of the whole nucleus. The
GDR corresponds to a collective oscillation of the pro-
tons against the neutrons. The Giant Quadrupole Res-
onance consists in a nuclear shape oscillation between
prolate and oblate deformations. Many other resonances
have been discovered [11, 12]. In particular Giant Res-
onances have been observed in hot nuclei formed by fu-
sion [13, 14]. This demonstrates the survival of ordered
vibrations in very excited systems, which are known to
be chaotic, even if some Giant Resonance characteristics
like the width are affected by the temperature [15, 16].
Moreover, the strong couplings between various collec-
tive modes which occur for Giant Resonances built on
the ground state [17, 18] are still present in fusion reac-
tions [10, 19]. It might therefore be possible to use the
Giant Resonances properties to probe the nuclear struc-
ture of the composite system on its way to fusion.
The choice of the preequilibrium GDR, that is, a GDR
excited in step (ii) before the formation of a fully equi-
librated CN, is motivated by the fact that its properties
strongly depend on the structure of the state on which
http://arxiv.org/abs/0704.0496v3
it is built, for instance the deformation [5]. The idea is
to form a CN with two N/Z asymmetric reactants. Such
a reaction may lead to the excitation of a dipole mode
because of the presence of a net dipole moment in the en-
trance channel. This dipole oscillation should occur be-
fore the charges are fully equilibrated, that is, during the
preequilibrium phase in which the system keeps a mem-
ory of the entrance channel [5, 6, 7, 8, 9, 10, 20, 21]. In
addition, for such N/Z asymmetric reactions, an enhance-
ment of the fast GDR γ-ray emission is expected as com-
pared to the ”slower” statistical γ-ray yield [7, 8, 9, 10].
This is of particular interest since the properties of these
GDR γ-rays characterize the dinuclear system which pre-
cedes the hot equilibrated CN. The first experimental in-
dications on the existence of such new phenomenon have
been reported in [22, 23, 24, 25, 26] for fusion reactions
and in [26, 27, 28, 29, 30, 31, 32, 33] in the case of deep
inelastic collisions.
The paper is organized as follows: In Sec. II we
study the properties of the preequilibrium GDR using the
Time-Dependent Hartree-Fock (TDHF) formalism. In
Sec. III we show how an N/Z asymmetric entrance chan-
nel may increase the fusion-evaporation cross-sections.
Finally, we conclude in section IV.
II. TDHF STUDY OF THE PREEQUILIBRIUM
GIANT DIPOLE RESONANCE
At the early time of the fusion reaction, the system
keeps the memory of the entrance channel. We call this
stage of the collision the preequilibrium phase which ends
when all the degrees of freedom are equilibrated in the
compound system and when the statistical decay starts.
One of these degrees of freedom is the isospin, which
measures the asymmetry between protons and neutrons.
When the two nuclei have different N/Z ratios, the pro-
ton and neutron centers of mass of the total system do
not coincide. As shown in [6, 21], there is a non zero force
between the two kind of nucleons which tends to restore
the initial isospin asymmetry. In such a case, an oscil-
lation of protons against neutrons on the way to fusion
might occur, that is, the so-called preequilibrium GDR
[5, 6, 7, 8, 9, 10, 20].
In fusion reactions the shape of the system changes
drastically during the preequilibrium phase. Studies of
the dynamics in the fusion reaction mechanism requires
sophisticated calculations to extract the preequilibrium
GDR characteristics (energy, width...) and in turn, on
the way to fusion. To achieve this goal, we choose to
use, as in the pioneer work of Bonche and Ngô on charge
equilibration [5], the TDHF approach because it is a fully
microscopic theory which takes into account the quantal
nature of the single particle dynamics. Moreover in the
present study we will restrict ourself to the observation
of one-body observables (e.g. the density ρ(r)) which
are supposed to be well described by such a mean field
approach. However it is clear that an important challenge
is to develop methods going beyond mean field which is
beyond the scope of this paper.
In this section we present quantum calculations
on preequilibrium giant collective vibrations using the
TDHF theory. We shall start with a brief description of
the TDHF theory in Sec. II A. Then we examine the role
of various relevant symmetries in the entrance channel,
namely the N/Z and mass symmetries (Sec. II B-IID).
Finally, in Sec. II E we shall compare our results with
the experimental data obtained by Flibotte et al. [22].
A. TDHF approach
In the TDHF approach [34, 35, 36, 37, 38, 39, 40], each
single particle wave function is propagated in the mean
field generated by the ensemble of particles. The mean
field approximation does not take into account the dissi-
pation due to two-body interactions [41, 42, 43, 44]. How-
ever TDHF takes care of one-body mechanisms such as
Landau spreading and evaporation damping [45]. Quan-
tum effects induced by the single particle dynamics like
shell effects or modification of the moment of inertia [46]
are accounted for properly.
The main advantage of TDHF is its fully microscopic
treatment of the N-body dynamics with the same effec-
tive interaction as the one used for the calculation of the
Hartree-Fock (HF) ground sates of the collision partners.
The consistency of the method for the structure of nuclei
and the nuclear reactions increases its prediction power
and its availability to study the interplay between exotic
structures and reaction mechanisms.
Moreover the TDHF equation is strongly non linear
which is of great importance for reactions around the bar-
rier because it includes couplings between relative motion
and internal degrees of freedom of the collision partners.
Also TDHF provides a good description of collective mo-
tion and can even exhibit couplings between collective
modes [17]. In fact the TDHF theory is optimized for
the prediction of expectation values of one-body observ-
ables and gives their exact evolution in the extreme case
where the residual interaction vanishes. However, the
TDHF prediction of multipole moments in nuclear colli-
sion, for instance, may differ from the correct evolution
because of the omission of the residual interaction. An
improvement of the description would be given by the
inclusion of the effect of the residual interaction on the
dynamics, which would increase considerably the compu-
tational time and is beyond the scope of this paper.
The TDHF theory describes the evolution of the
one-body density matrix ρ(t) of matrix elements
〈rsq|ρ̂|r′s′q′〉 =
ϕ∗i (r
′s′q′)ϕi(rsq), where ϕi(rsq) =
〈rsq|i〉 denotes the component with a spin s and isospin
q of the occupied single particle wave-function ϕi. This
evolution is determined by a non linear Liouville-von
Neumann equation,
ρ− [h(ρ), ρ] = 0 (1)
where h(ρ) is the matrix associated to the self consistent
mean-field Hamiltonian. We have used the code built by
P. Bonche and coworkers [47] with an effective Skyrme
interaction [48] and SLy4d parameters [47]. In its actual
version, TDHF does not account for pairing interactions.
B. N/Z asymmetric reactions
As far as the dipole motion in the preequilibrium phase
is concerned, it is obvious that the main relevant asym-
metry responsible for such a motion is a difference in the
charge-to-mass ratio between the collision partners [6].
The associated experimental signature is an enhancement
of the γ-ray emission in the GDR energy region of the
compound system [22, 23, 24, 25, 26] which is attributed
to a dipole oscillation. Several informations about the
fusion path can be extracted from such a dipole oscilla-
tion and its corresponding γ-ray spectrum. For numerical
tractability we start our study of the fusion process with
a light system: 12Be+28S→40Ca. We first deduce the
γ-ray spectrum from the dipole motion. Then we study
the effects of the deformation of the compound system,
and of the impact parameter on this motion.
1. The preequilibrium GDR γ-ray spectrum
We first consider a central collision at an energy of
1 MeV/nucleon in the center of mass. The expectation
value of the dipole moment Q̂D is defined by
QD = 〈Q̂D〉 =
(Xp −Xn) (2)
whereXp =
〈x̂p〉
andXn =
〈x̂n〉
are the positions
of the proton and neutron centers of mass respectively.
The expectation value of the conjugated dipole moment
P̂D is then associated to the relative velocity between
protons and neutrons, and is defined by the relation
PD = 〈P̂D〉 =
(Pp − Pn) (3)
where Pp =
〈p̂p〉 and Pn =
〈p̂n〉 are the to-
tal proton and neutron moments respectively. These
definitions ensure the canonical commutation relation
Q̂D, P̂D
= i~.
The time evolutions of QD and PD are plotted in
Figs. 1-c and 1-b respectively. The trajectories in both
the (QD,t) and (PD,t) planes exhibit oscillations which
we attribute to the preequilibrium GDR. We also note
that PD(t) oscillates in phase quadrature with QD(t) and
that those oscillations are damped due to the one-body
dissipation. Consequently, the plot of PD as a function
of QD shown in Fig. 1-a is a spiral. The GDR period
extracted from these plots is around 107 fm/c, which
corresponds to an energy of ∼ 11.6 MeV.
FIG. 1: Time evolution of the expectation value of the dipole
moment, QD, and its conjugated moment, PD, in the reaction
12Be+28S→40Ca at an energy of 1 MeV/nucleon in the center
of mass and at zero impact parameter.
During the collision and before the equilibrium is
reached, a fast rearrangement of charges occurs within
the composite system [5], generating the γ-ray emission.
We extract the preequilibrium GDR γ-ray spectrum from
the Fourier transform of the acceleration of the charges
[9, 49]
(Eγ) =
|I(Eγ)|2
where α is the fine structure constant and
I(Eγ) =
The spectrum obtained from Eq. 4 is plotted in Fig. 2
(solid line). In order to have a spectrum without spu-
rious peaks coming from the finite integration time,
we multiply the quantity d
by a gaussian function
[50]. In addition, this function plays a role
of a filter in the time domain. This filter prevents the sig-
nal to be affected by the interaction between the nucleus
and the emitted nucleons which have been reflected on
the box [51]. We choose τ = 320 fm/c in our calculations.
This ensures the fact that the spectra are free of spuri-
ous effects coming from the echo. However this procedure
adds a width Γ ∼ ~
∼ 0.6 MeV. This is a drawback if
one is concerned with detailed spectroscopy. However, in
this paper, we are only interested by the gross properties
of the preequilibrium GDR in order to study the fusion
mechanisms. As we can see in Fig. 2, the preequilibrium
GDR energy is E
GDR = 11.64 MeV, which corresponds
FIG. 2: preequilibrium GDR γ-ray spectrum calculated in
the reaction 12Be+28S → 40Ca (solid line) at an energy of 1
MeV/nucleon in the center of mass and γ-ray spectrum of a
GDR built on the ground state of 40Ca (dotted line).
to the previous value deduced from the GDR oscillation
period.
The energy of the preequilibrium GDR is much lower
than the one of the GDR built on the spherical ground
state of the 40Ca. This situation will be now explored
into more details.
2. Deformation effect
To better characterize the preequilibrium GDR, it is
necessary to compare it with the usual GDR built upon
the CN ground state [20]. This GDR is generated by
applying an isovector dipole boost with a velocity kD on
the 40Ca HF ground state |ψ(t)〉 = exp
−ikDQ̂D
|HF 〉
yielding an oscillation of QD(t) and PD(t) in phase
quadrature as we can see in Fig. 3. The period of
the oscillation is around 80 fm/c which is lower than
in the fusion case and corresponds to a higher energy
(EGDR = 15.5 MeV) as it is shown in the associated
GDR γ-ray spectrum in Fig. 2 (dotted line). The lower
energy obtained for the fusing system reveals a strong
prolate deformation [5, 9, 10, 20]. The two mechanisms
(fusion reaction and dipole boost) are expected to gen-
erate a GDR with quite different dynamical properties.
This can be seen in the density plot projected in the re-
action plane shown in Fig. 4, which shows that in the
case of a fusion reaction, the CN relaxes its initial pro-
late elongation along the collision axis with a time which
is larger than the typical dipole oscillation period of the
FIG. 3: GDR built upon the HF ground state in 40Ca and
excited by an isovector dipole boost: evolution of the expec-
tation value of the associated dipole moment, QD, and its
conjugated moment, PD, as a function of time.
FIG. 4: Density plots projected on the reaction plane for dif-
ferent times in the case of the fusion reaction. Lines represent
isodensities.
Deformation effects can be studied all along the fusion
path [4, 20]. The quadrupole deformation parameter ǫ
is defined by a scaling of the axis from a spherical to a
deformed shape along the x-axis
Rx = R0(1 + α)
Ryz = R0(1− ǫ) (5)
where α is defined by the conservation of the volume of
the nucleus RxRyRz = R
0, which leads to
(2− ǫ)ǫ
(1 − ǫ)2
. (6)
If one neglects high order terms in ǫ, we get the usual
value α ≃ 2ǫ.
The deformation parameter is related to the expecta-
tion values of the monopole and quadrupole moments Q̂0
FIG. 5: Time evolution of the deformation, ǫ, in 40Ca
formed in the 12Be+28S fusion reaction at an energy of 1
MeV/nucleon in the center of mass. The time axis origin is
chosen when the maximum of the fusion barrier is reached.
The average preequilibrium deformation ǫp obtained from the
GDR energy (see Eq. 13) is represented by a dashed line.
and Q̂2 which are expressed by
Q0 = 〈Q̂0〉 =
dr ρ(r)r2 (7)
Q2 = 〈Q̂2〉 =
dr ρ(r)r2
. (8)
We can write Q2 as a function of Q0
Q2 = −
Q0 + 3
dr ρ(r)x2. (9)
Eqs. 5 and 7 lead to
dr ρ(r)x2 = (1 + α)2
Q0. (10)
Using Eqs. 6, 9, 10 and ǫ < 1, we get
ǫ(t) = 1−
2Q2(t)√
5Q0(t)
which, at first order in ǫ, becomes
ǫ(t) =
Q2(t)
5Q0(t)
. (12)
In Ref. [10] we used Eq. 12 to characterize the average
deformation. In Fig. 5 we present the time evolution of
the deformation, ǫ(t), obtained from the more general
FIG. 6: a) Energy of the preequilibrium GDR obtained from
the first oscillation of the dipole moment and b) the defor-
mation parameter, ǫ, obtained from Eq. 13 (dashed line) and
from Eq. 11 (solid line), as a function of the center of mass
energy.
expression of ε given in Eq. 11. We consider a 40Ca
formed in the 12Be+28S fusion reaction at an energy of 1
MeV/nucleon in the center of mass. The important point
here is that the deformation does not relax and strongly
affects the frequency of the oscillations. A lower energy is
expected for the longitudinal collective motion E
GDR in
the fused system as compared to the one simulated in a
spherical 40Ca [5, 7, 8, 9, 10, 20]. Following a macroscopic
model for the dipole oscillation, we expect the energy of
the GDR to evolve with the deformation along the x-axis
(collision axis) as
= (1 − ǫp)2 (13)
where ǫp is the average deformation during the preequi-
librium stage. The frequency of the GDR along the defor-
mation axis fulfills this relation with ǫp ≃ 0.13 in excel-
lent agreement with the observed deformation in Fig. 5.
We have also investigated the effect of the center of
mass energy ECM on the preequilibrium GDR energy
and on the deformation parameter (see Fig. 6). The GDR
energy exhibits small variations (less than 1 MeV) with
the center of mass energy (Fig. 6-a). For ECM < 40
MeV, the increase of E
with ECM is attributed to
the formation of a dinuclear system with a slow neck
dynamics at low energy [9]. The presence of the neck is
in fact expected to slow down the charge equilibration
process, and then to increase the GDR period.
For ECM > 40 MeV, Fig. 6-a a decrease of E
when ECM increases. As illustrated in Fig. 6-b, this is
associated to a larger quadrupole deformation when the
collision is more violent. Consequently, the higher the
center of mass energy, the more prolately deformed the
CN. In Fig. 6-b, the deformation is estimated from Eq. 13
(dashed line) and from Eq. 11 (solid line) at the first max-
imum after one oscillation of ǫ(t) (e.g. at t ∼ 225 fm/c
in the case of ECM = 1 MeV/u as we can see in Fig. 5).
We also observe in this energy domain a good agreement
between the deformations calculated with both methods.
This lowering of the GDR energy due to deformation is
not specific to nuclear physics. Indeed, an energy split-
ting of the isovector dipole mode has been observed in
fissioning atomic clusters due to a strong prolate defor-
mation of the fission phase [52]. In such systems, the use
of LASERs with the ”pulse and probe” technique is ex-
pected to give access to the deformation and also to the
fission time [53].
3. non central collisions
To better mimic the situation of a fusion reaction, we
extended our calculations to non-zero impact parameters.
In fact, a non central collision may excite collective ro-
tational states in the deformed preequilibrated CN. This
rotation may be coupled to the preequilibrium GDR [20].
In particular, the interplay of dipole vibration and defor-
mation can be affected by the rotation. In addition to
the center of mass coordinates with x along the beam
axis and y perpendicular to the reaction plane, we de-
fine a new coordinate system x′, y′, z′, where x′ is the
deformation axis, and y = y′ is the rotation axis (see
Fig. 7). In the head-on collision example studied previ-
ously, those two frames are the same. For symmetry rea-
sons, the dipole oscillation cannot occur along the z = z′
and y = y′ axis.
For non-central collisions, the oscillation is only for-
bidden along the y = y′ axis [5, 20]. In this case the
amplitude of the oscillation along x′ slightly decreases
with the impact parameter. This decrease becomes sig-
nificant at rather large impact parameters as we can see
in Fig. 8 where we have plotted the amplitude of the first
oscillation of the dipole moment along x′ (solid line) as
a function of the impact parameter. This decrease is ac-
companied by an oscillation of the dipole moment along
the z′ axis with a smaller amplitude which increases with
the impact parameter b. Both amplitudes are of the same
order when b ∼ 5 fm.
FIG. 7: Description of the two frames used in non central
collisions.
FIG. 8: Amplitude of the first oscillation of the dipole moment
along x′ (solid line) and along z′ (dashed line) as a function
of the impact parameter, b, in the 12Be+28S fusion reaction
at an energy of 1 MeV/nucleon in the center of mass.
The oscillation along the z′ axis results from a weak
symmetry breaking due to the rotation of the system
[10]. In order to demonstrate this, let us start with the
time-dependent Schrödinger equation in the laboratory
frame R: i~|ψ̇〉 = Ĥ |ψ〉. In the rotating frame R′, the
expression of the wave function is |ψ′〉 = R̂(α)|ψ〉 where
R̂(α) = e−iα(t)Ĵy is a rotation matrix, Ĵy is the generator
of the rotations around y and α(t) is the angle between
the two frames (see Fig. 7). We express the Schrödinger
FIG. 9: Time evolution of the total dipole moment for the
8Be+32S→40Ca reaction at an energy of 1 MeV/nucleon in
the center of mass. At time t = 0 fm/c, the distance between
the centers of mass of the nuclei is 92.8 fm. The arrow indi-
cates the time when the fusion barrier is reached. The dashed
line gives the result of the adiabatic model (cf. Eq. 15).
equation as −~α̇ĴyR̂−1|ψ′〉+i~R̂−1|ψ̇′〉 = ĤR̂−1|ψ′〉 and
we get [10]
i~|ψ̇′〉 =
R̂ĤR̂−1 + ~α̇Ĵy
|ψ′〉. (14)
Eq. 14 is the Schrödinger equation expressed in the
rotating frame R′ of the CN and Ĥ ′ = R̂ĤR̂−1+~α̇Ĵy is
the Hamiltonian expressed in this frame. The last term
induces a motion along the z′ axis from a dipole vibration
along x′. It is quantified by the dipole moment along z′
which is plotted as a dashed line in Fig. 8. This is a
clear manifestation of couplings between rotational and
vibrational motions in nuclei.
In this subsection we have shown that an N/Z asymme-
try in the entrance channel generates a dipole oscillation
during the preequilibrium phase of a fusion reaction. In
the next one we will see that, due to polarization effects,
such a motion also occurs in N/Z symmetric systems al-
though with a smaller amplitude.
C. N/Z symmetric reactions
We now examine the situation of a central collision in-
volving two N = Z nuclei using the example of 8Be+32S
at ECM = 1 MeV/nucleon (
8Be is bound with a strong
prolate deformation in Hartree-Fock calculations with
the SLy4d force). As we can see in Fig. 9, the amplitude
FIG. 10: Schematic representation of the isovector polariza-
tion due to Coulomb repulsion between protons that occurs
before fusion. The protons are represented by a solid line and
the neutrons by a dotted line. Xi is the position of the center
of mass of the nucleus i.
of the dipole oscillations is significantly reduced as com-
pared to the N/Z asymmetric case (cf. Fig. 1-c). In this
latter system (12Be+28S), the dipole oscillations are gen-
erated by the N/Z asymmetry, whereas in the 8Be+32S
reaction, they are only due to the mass asymmetry of
the two collision partners. Indeed, a mass asymmetry
induces a difference in the isovector polarization in the
collision partners. This polarization is due to Coulomb
repulsion between protons of the colliding nuclei before
the fusion starts [5].
To show it, let us use an adiabatic approach in which
we consider that the polarization of a nucleus at a dis-
tance X = X2 −X1 between the centers of mass is gen-
erated by the Coulomb field of its collision partner. Xi
is the position of the center of mass of the nucleus i.
The distance between the proton and neutron centers of
mass in nucleus i is supposed to be small as compared
to X (see Fig. 10). The equality between the external
Coulomb field and the restoring force between protons
and neutrons leads to a dipole moment in the nucleus i
QDi(t) ≃ (−1)i
NiZiZje
AiEGDR
imX(t)
where i 6= j = 1 (for 32S) or 2 (for 8Be). The GDR
energy is calculated in each collision partner from the
dipole response frequency following a small amplitude
dipole boost. We get EGDR = 23.0 MeV for
32S and
EGDRx = 17.2 MeV for
8Be along its deformation axis
which is chosen to be aligned with the collision axis. The
dipole moment in the total system becomes
QD(t) =
N1Z2 −N2Z1
X(t) +QD1(t) +QD2(t). (15)
The first term of the right hand side of Eq. 15 is usually
dominant for a N/Z asymmetric reaction [8]. However, it
vanishes for a N/Z symmetric one. In this case, one is left
with the sum of the dipole moments of the partners. This
simple adiabatic model (dashed line in Fig. 9) gives the
good trend of the total dipole moment up to the vicinity
of the contact point.
After the fusion starts, the dipole moment increases
and oscillates in the preequilibrium system. The adia-
batic model is too simple to describe this phenomenon.
In fact, due to the polarization, the nuclear interaction
acts first on neutrons and then is expected to modify
strongly the dipole moment at the initial stage of the
fusion [5].
The consequence of this polarization in a mass asym-
metric system is a dipole oscillation which can be in-
terpreted, as previously, in term of an excitation of a
preequilibrium GDR. However, the GDR excitation is
very small as compared to the N/Z asymmetric case. Of
course, for a mass and N/Z symmetric reaction no pree-
quilibrium GDR are allowed for symmetry reason [10].
As we will see in the next section, the special case of
an N/Z asymmetric and mass symmetric system exhibits
some interesting behaviors as far as the collective motions
are concerned.
D. mass asymmetry and isoscalar vibrations
In this subsection we study the couplings between the
isovector dipole motion and isoscalar vibrations in the
preequilibrium phase and their dependence on the mass
asymmetry in the entrance channel. The dipole motion
can be coupled to isoscalar vibrations through the non
linearity of the TDHF equation [10, 17, 19]. The presence
of such isoscalar vibrations in the preequilibrium system
depends on the structure of the colliding partners and
on their mass asymmetry. For instance a mass symmet-
ric system has a stronger quadrupole deformation at the
touching point than a mass asymmetric one. In such a
system a quadrupole vibration might appear.
Let us start this study with the time evolution of the
instantaneous dipole period [10] which is very sensitive to
couplings with isoscalar vibrations. We define this period
as being twice the time to describe half a revolution in the
spiral diagram representing the evolution of the system
in the (PD, QD) space. The resulting evolution is plotted
in Fig. 11 for two N/Z asymmetric central collisions:
• the mass asymmetric 12Be+28S reaction at ECM =
1 MeV/nucleon.
• the mass symmetric 20O + 20Mg reaction at
ECM = 1.6 MeV/nucleon.
The center of mass energy has been chosen to obtain
the same ECM/VB ratio for both reactions (VB is the
Coulomb barrier).
The mean values of the GDR period obtained for the
two reactions are different. For the mass symmetric re-
action, this value is ≃ 170 fm/c, whereas in the mass
asymmetric case it is ≃ 105 fm/c, in good agreement
with the one obtained from Fig. 1 (107 fm/c). This dif-
ference is attributed to a larger deformation of the CN
FIG. 11: Time evolution of the GDR period for 20O+20Mg
at 1.6 MeV/nucleon (solid line) and for 12Be+28S at
1 MeV/nucleon (dashed line). Both energies are in the center
of mass.
in the mass symmetric case which, in average, is ǫ ∼ 0.2
(from Eq. 12), as compared to the mass asymmetric sys-
tem (ǫ ∼ 0.13). Note that it is not appropriate to use
Eq. 13, to calculate the deformation from the observed
GDR energy frequency for 20O+20Mg since it is valid
only for small deformations.
The dipole moment time evolution for those two re-
actions (Figs. 1 and 12), shows that unlike 20O+20Mg,
the oscillations in the 12Be+28S system are dominated
by a single energy. This is consistent with the evolution
of the GDR period in Fig. 11 which is rather constant
in the mass asymmetric case whereas it exhibits strong
oscillations in the mass symmetric one. This anharmonic-
ity can also be seen in the GDR γ-ray spectrum of the
20O+20Mg reaction plotted in Fig. 13. Indeed, one can
clearly identify two peaks in this spectrum at 7.7 MeV
and 10.8 MeV.
To better understand what is the origin of the dif-
ferences between the two systems, we have calculated
the evolutions of the monopole Q0 and quadrupole Q2
moments defined by Eqs. 7 and 8 respectively. Those
evolutions are plotted in Fig. 14-a for 20O+20Mg. We
first note that Q2 is always positive, that is, the com-
pound system keeps a prolate deformation. In addition,
Q0 and Q2 exhibit strong oscillations with the same pe-
riod ∼ 165 fm/c. Therefore, we conclude that they have
the same origin which is interpreted as a vibration of
the density around a prolate shape [10]. This mode is
only excited in the mass symmetric channel: the evolu-
tions of Q0(t) and Q2(t) for the mass asymmetric reac-
tion (12Be+28S) at 1 MeV/nucleon in the center of mass
FIG. 12: Evolution of the expectation value of the dipole mo-
ment, QD, and its conjugated moment, PD, in the reactions
20O+20Mg→40Ca at an energy of 1.6 MeV/nucleon in the
center of mass.
FIG. 13: GDR γ-ray spectrum calculated in the
20O+20Mg→40Ca reaction at an energy of 1.6 MeV/nucleon
in the center of mass.
(thick lines in Fig. 14-b) do not show any significant oscil-
lation of these moments. Evolutions of Q0(t) and Q2(t)
at 1.6 MeV/nucleon in the center of mass are also plot-
ted (thin lines in Fig. 14-b). They do not exhibit any
significant oscillation neither. Therefore, the vibrations
observed in Fig. 14-a are not attributed to a difference
in the collision energy but to the mass asymmetry in the
entrance channel.
The monopole and quadrupole oscillations modify the
properties of the dipole mode in a time dependent way
[8, 20]. Let us consider a harmonic oscillator for the
dipole motion with a time dependent rigidity constant.
This is a way to simulate the non linearities of TDHF.
Indeed, the observed oscillation of the density modifies
the restoring force between protons and neutrons. This is
due to the fact that the density enters in the mean field
potential of the TDHF equation (Eq. 1). This restor-
ing force is lower along the deformation axis of a pro-
lately deformed nucleus than in the perpendicular axis.
Thus, variations of the density profile in the TDHF equa-
tion can be modeled by a corresponding variation of the
rigidity constant k(t). In such a model, the evolution
of the dipole moment is given by the differential equa-
tion Q̈D(t) + (k(t)/µ)QD(t) = 0 where µ =
the reduced mass of the system. We note ω0 the av-
erage pulsation related to the rigidity constant given by
k(t)/µ = ω20(1 + η cosωt), where ω is the pulsation of
the density oscillation deduced from Fig. 14-a and η is a
dimensionless constant which quantifies the coupling be-
tween the GDR and the other collective mode associated
to the density vibration. We thus have
Q̈D(t) + ω
0 [1 + η cosωt]QD(t) = 0. (16)
This equation is the so called Mathieu’s equation [10].
It is interesting to show how we can get this equation
from a more microscopic equation like the TDHF one
(Eq. 1) in a one dimensional framework. Following the
way of ref. [54], the Wigner transform of Eq. 1 for a local
self consistent potential V is
V f (17)
where f(x, p, t) =
ds exp(−ip.s/~) ρ(x+ s
, x− s
, t) is
the Wigner transform of the density matrix ρ(x1, x2, t) =
〈x1|ρ̂(t)|x2〉. The upper indices on the derivative opera-
tors in Eq. 17 stand for the function on which the oper-
ator acts. We have of course f = fp + fn where fp and
fn are the Wigner transforms of the proton and neutron
density matrices respectively.
We now apply the Wigner Function Moment (WFM)
method to get a closed system of dynamical equations for
the dipole and its conjugated moments. We calculate the
integrals on the phase space of Eq. 17 with the weights
xτ on the one hand, and pτ on the other hand (τ=1 for
protons and −1 for neutrons). The distance D between
proton and neutron centers of mass can be written as
dx dp x (fp − fn) and we get
(fp − fn)
dx dp x sin
V (fp − fn)
where the time dependence has been omitted for simplic-
ity. The right hand side term is the integral of multiple
p-derivatives of f so it vanishes because fp, fn and all
their p−derivatives vanish for |p| → ∞. With P being
the relative momentum between protons and neutrons
dp dx p (fp − fn) we get
. (18)
We now calculate the integral of Eq. 17 with the
weight pτ . Noting the matter density n(x, t) =
dp f(x, p, t) and the kinetic energy density A(x, t) =
dp p2 f(x, p, t) we have
(Ap −An) = −
(np − nn) .
Using A = 0 for |x| → ∞ we have
Ṗ = −
(np − nn) . (19)
Eqs. 18 and 19 are the system of dynamical equations
of motion we were looking for. It is important to stress
that this system of equations is obtained without approx-
imation for a local potential. To go further, we need an
explicit form of the potential. If we consider for instance
a harmonic oscillator V = kx2/2, we obtain the dipole
moment evolution equation: mD̈ = −k D with the solu-
tion D = D0 cosω0 t, where ω0 =
If a breathing mode occurs at a pulsation ω, then the
density n(x, t) oscillates with the pulsation ω: n(x, t) =
n0(x) [1 + λ(x) cosωt]. Since the potential is self consis-
tent, it also presents oscillations which are a function of
cosωt: V (x, t) ≡ V (x, cosωt). We assume for this poten-
tial the separable form V (x, t) = V0(x) (1 + F [cosωt]),
where V0(x) is the potential when no breathing mode
is excited. Using a harmonic picture for V0, that is,
V0(x) =
2, we get from Eqs. 18 and 19 the equa-
tion for the dipole moment QD =
Q̈D(t) + ω
0 (1 + F [cosωt])QD(t) = 0. (20)
We finally see that the Mathieu’s equation (Eq. 16) ap-
pears to be an approximation of Eq. 20 where only the
linear part of the function F(ξ) ≃ ηξ is conserved.
We have solved the Mathieu’s equation numerically
with a set of parameters suitable for our problem. The
pulsation of the density oscillation is extracted from
Fig. 14-a and we get ω ≃ 7.5 MeV/~. For the pulsation
of the GDR we choose the main peak at ωGDR ≃ 7.7
MeV/~ (see Fig. 12). It is related to the pulsation ω0
by the relation ω0 = rωGDR. The constants r and η are
tuned to reproduce approximatively the TDHF results
period. The parameter r is expected to be close to 1 but
not exactly 1 because of the presence of the oscillating
term which may slightly change the mean value of the
dipole pulsation. The solution of the Mathieu’s equation
oscillates with a time-dependent period which reproduces
the TDHF case quite well with r ≃ 1.1 and η ≃ 0.5 (see
Fig. 15).
FIG. 14: Evolution with time of the monopole (Q0, solid line)
and quadrupole (Q2, dashed line) moments in the reactions
20O+20Mg→40Ca at 1.6 MeV/nucleon (a) and for 12Be+28S
at 1 MeV/nucleon (thick lines) and 1.6 MeV/nucleon (thin
lines) (b). Both energies are in the center of mass.
FIG. 15: Time evolution of the GDR period calculated for the
reaction 20O+20Mg→ 40Ca at an energy of 1.6 MeV/nucleon
in the center of mass (solid line) and its modelization by the
Mathieu’s equation (dashed line).
FIG. 16: Evolution of the expectation value of the dipole
moment, QD, and its conjugated moment, PD, in the case of
the N/Z asymmetric reaction 40Ca+100Mo at a center of mass
energy of 0.83 MeV/nucleon.
In a recent paper [19], following the formalism devel-
oped in a study of non linear vibrations [17], we related η
to a matrix element of the residual interaction coupling
collective states.
As a consequence, the excitation of collective modes
such as the quadrupole and monopole vibrations is cou-
pled to the preequilibrium GDR. Such vibrations occur
only in the mass symmetric reaction we studied. The ef-
fects of this coupling are a reduction of the GDR energy
(estimated around 10 per cent in this case) and an addi-
tional spreading of the resonance line shape due to the
modulation of the dipole frequency.
E. comparison with experiments
As a test case, we have performed TDHF calcula-
tions of the reactions studied by Flibotte et al. [22].
In this paper, two systems have been investigated: an
N/Z asymmetric one (40Ca+100Mo) and an N/Z quasi-
symmetric one (36S+104Pd) at a center of mass energy
of 0.83 MeV/nucleon. These systems have been cho-
sen because they lead to the same composite system
(140Sm). The corresponding dipole evolutions obtained
from TDHF are plotted in Fig. 16 for the N/Z asymmet-
ric reaction and in Fig. 17 for the N/Z quasi-symmetric
one. A dipole oscillation is observed in both reactions
but with a stronger amplitude in the N/Z asymmetric
The preequilibrium GDR γ-ray spectra for those reac-
tions are calculated using Eq. 4 and plotted in Fig. 18-a.
FIG. 17: Evolution of the expectation value of the dipole
moment, QD, and its conjugated moment, PD, in the case of
the N/Z quasi-symmetric reaction 36S+104Pd at a center of
mass energy of 0.83 MeV/nucleon.
The area under the peak associated to the N/Z asymmet-
ric reaction (solid line) is considerably larger than the one
under the N/Z quasi-symmetric one (dashed line).
To estimate the importance of the preequilibrium γ-
ray emission with respect to the statistical decay and its
role on the fusion process, we have calculated the spec-
trum associated to the first chance statistical γ-ray decay.
It is obtained from the γ-ray emission probability in all
directions per energy unit assuming an equilibrated CN
[9, 55, 56]. Its expression is
E4γ e
E2γ − E2GDR
where m is the nucleon mass, ΓGDR and EGDR are the
width and the energy of the statistical GDR respectively,
and T is the temperature of the equilibrated CN. At first
order, the energy of the GDR does not depend on the
temperature [14]. We use the values EGDR = 15 MeV
and ΓGDR = 7 MeV. Following the same method as the
one employed in Ref. [9], we approximate the CN width
ΓCN with the total neutron width
ΓCN ≃ Γn =
2mr20A
T 2e−
T (22)
where Bn = 8.5 MeV is the neutron binding energy and
r0 = 1.2 fm. The temperature T is calculated from the
equation
where a ≃ 1/10 MeV−1 is the level density parameter
and E∗ = 71 MeV is the excitation energy. The resulting
spectrum is plotted in Fig. 18-a (dotted line).
We note that the N/Z asymmetric preequilibrium spec-
trum is comparable in intensity to the first step statistical
one. This fact has already been pointed out by Baran et
al. [9] who got a similar spectrum for the N/Z asymmet-
ric reaction with a semiclassical approach.
Another important conclusion which can be drawn
from Fig. 18-a is the lowering of the GDR γ-ray energy
for the non statistical part as compared to the statistical
one which is attributed to the deformation of the nucleus
(see sec. II B 2). This phenomenon is also reported by
Baran et al. [9]. In fact we get from Fig. 18-a a position
of the peak of about 7.5 MeV for the preequilibrium GDR
while Baran et al. obtained ∼9 MeV. On the experimen-
tal side, the γ-ray spectra are dominated by a statistical
background decreasing exponentially. In addition to this
background, the GDR creates a bump located around
the GDR energy (Fig. 1 of Ref. [22]). To get rid of the
statistical background, the authors of [22] linearized the
γ-ray spectra by dividing them by a theoretical statistical
background. The resulting spectra are plotted in Fig. 2
of Ref. [22]. This procedure is used by the authors to
determine the preequilibrium to statistical ratio for the
GDR component. However it cannot be used to deter-
mine the positions in energy of the peaks because the
division by an exponential background induces a shift in
energy which is different for both contributions (statisti-
cal and preequilibrium) if they are not centered around
the same energy, as expected from Fig. 18-a.
We modified the procedure as follows. First, we assume
that no preequilibrium γ-ray is emitted in the N/Z quasi-
symmetric reaction 36S+104Pd. We then subtract the
total γ-ray spectrum associated to the quasi-symmetric
reaction from the N/Z asymmetric one. These two spec-
tra are plotted in Fig. 1 of Ref. [22]. The result of this
subtraction is the preequilibrium component of the GDR
in the reaction 40Ca+100Mo, and is plotted in Fig. 18-b.
The error bars are both statistical and systematic due to
the graphical extraction of the data. Below 5 MeV the
systematic error is to high to get relevant data. Focusing
on the energy position of the preequilibrium component,
we note a good agreement between TDHF predictions
and experimental data.
To conclude, we extracted from existing data, for the
first time, an experimental observation of the lowering
of the preequilibrium GDR predicted by our TDHF cal-
culations. This analysis shows that the preequilibrium
GDR is, indeed, a powerful experimental tool to study
the fusion path. Another application of N/Z asymmetric
fusion reactions is proposed in the next section.
FIG. 18: a) preequilibrium GDR γ-ray spectrum calcu-
lated in the reactions 40Ca+100Mo (solid line) and 36S+104Pd
(dashed line). The dotted line represents the first chance sta-
tistical γ-ray decay spectrum. b) Experimental data resulting
from the subtraction of the γ-ray spectra obtained by Flibotte
et al. [22] in the reactions 40Ca+100Mo and 36S+104Pd.
III. FUSION/EVAPORATION CROSS
SECTIONS OF HEAVY NUCLEI
As mentioned in [9], the emission of a preequilibrium
GDR γ-ray decreases the excitation energy hence the ini-
tial temperature of the nucleus reaching the statistical
phase. The emission of preequilibrium particles, which
can be controlled in our example by the N/Z asymmetry,
is thus a new interesting cooling mechanism for the for-
mation of Heavy and Super Heavy Elements. For such
nuclei, the statistical fission considerably dominates the
neutron emission and the survival probability of the CN
becomes very small.
SHE must be populated at low excitation energy.
Firstly, because the smaller the excitation energy, the
smaller the fission probability. Secondly, because the
shell corrections decrease with excitation energy [57].
These corrections are responsible for the stability of the
transfermiums nuclei (Z > 100) in their ground state.
The quantum stabilization decreases quite rapidly with
excitation energy until the fission barrier vanishes. Those
two reasons are strong motivations to study the cooling
mechanisms involved in the preequilibrium phase of the
CN formation.
In the following, we expose one cooling mechanism re-
sponsible for the predicted enhancement of the survival
probability in the case of a N/Z asymmetric reaction. As
an illustration, we treat only the γ-emission part of the
preequilibrium GDR decay. Although it may play an im-
portant role, we do not treat the preequilibrium neutron
emission for two reasons:
• Only the direct neutron decay of giant resonances
can be assessed in TDHF. Then, we would be able
to describe only a small part of this neutron emis-
sion, the other parts being the sequential and sta-
tistical decays. Missing the sequential decay would
be a strong limitation of the description.
• We would need not only the number of emitted neu-
trons, but also their energy. Consequently, huge
spatial grid would have to be used in order to per-
form a spatial Fourier transform of the single par-
ticle wave functions, which is out of range of three
dimensional TDHF codes because of computational
limitations.
Let us define PE∗
init.
(E∗) the survival probability at an
excitation energy E∗ of a CN which started its statistical
decay at the energy E∗init. We also note P
surv and P
the final survival probabilities of the CN formed by N/Z
symmetric and asymmetric reactions respectively.
Fig. 19-a illustrates schematically the evolution of the
survival probability (x-axis) when the excitation energy
decreases (y-axis) in a case of an N/Z symmetry in the
entrance channel. In this case, no γ-ray emission is ex-
pected in the preequilibrium phase and the initial ex-
citation energy is always maximum E∗init = E
0 , where
E∗0 = Q + Ecm is the excitation energy when no pree-
quilibrium particles are emitted, Ecm is the center of
mass energy and Q = (M1 +M2 −MCN )c2. During the
statistical decay, the excitation energy decreases mainly
through neutron emission, but at the same time the
survival probability of the compound nucleus decreases
too. For instance, when the excitation energy reaches
E∗1 = E
0−EGDR, the survival probability P1 = PE∗0 (E
at this energy might be small. At the end of the decay,
when the excitation energy is zero, the survival probabil-
ity becomes PSsurv = PE∗0 (0) = P1PE
Fig. 19-b shows the same for an N/Z asymmetric re-
action. In this last case, the nucleus can emit a pree-
quilibrium GDR γ-ray with a probability Pγ . The nu-
clei which emit such a γ-ray begin the statistical decay
at a lower energy Einit = E
1 , whereas those which did
not emit a γ-ray still starts their decay at Einit = E
The probability for the latter case is 1 − Pγ . The sur-
vival probability at the end of the decay then reads
PAsurv = [(1− Pγ)P1 + Pγ ]PE∗1 (0). The ratio of the sur-
FIG. 19: Schematic representation of the CN population dur-
ing the statistical decay in the case of an N/Z symmetric
collision (a) and an N/Z asymmetric reaction (b).
vival probabilities between the N/Z symmetric and asym-
metric cases is
PAsurv
PSsurv
= 1 +
(1− P1) . (24)
We now use a simple model to get an estimate of this
quantity. It is clear that, to get a quantitative predictions
of survival probabilities, the studied mechanism has to
be included in more elaborated statistical models, which
is beyond the scope of this paper. The probability Pγ
can be calculated by integrating Eq. 4 over the energy
range. This can be done for example with a TDHF cal-
culation or using the classical electrodynamic formulae
from Ref. [49]. Following these formulae, we approximate
the probability to emit a preequilibrium GDR γ-ray per
interval of energy by
2e2QD(0)
3π(~c)3
E21 +
(E − E1)2 + ΓGDR
(E + E1)2 +
where E1 =
− ΓGDR2
is the “shifted” energy of
the damped harmonic motion and ΓGDR is the damping
width of the preequilibrium GDR. The initial value of the
dipole moment, QD(0), can be estimated from Eq. 15 at
the touching point and neglecting the polarization of the
collision partners [8]. We get
QD(0) ≃
R1 +R2
(Z1N2 − Z2N1)
where Ri is the radius of nucleus i.
To determine PE∗0 (E
1 ), we need to solve a system of
six equations: Eqs. 22, 23 and
= −Γn(t)
(Bn + T (t)) (25)
= −Γf(t)
P (t) (26)
Γf (t) =
~ω0ωs
Bf (t)
T (t) (27)
Bf (t) ≡ Bf [E∗(t)] = Bf (0)e−
Ed (28)
Eq. 25 gives the evolution of the excitation energy, assum-
ing as in [9] that the CN width can be identified to the
neutron width. This implies that we neglect the statisti-
cal gamma emission. This choice is justified by the fact
that the statistical neutron emission is much more prob-
able than the gamma emission in the excitation energy
domain of interest where the fission dominates, which is
above the neutron emission threshold Bn. Eq. 26 gives
the evolution of the survival probability against fission
P . Eq. 27 gives the evolution of the fission width. The
parameters ω0 and ωS are the oscillator frequencies of the
two parabolas approximating the potential V (x) in the
first minimum and at the saddle point respectively. The
variable x is related to the distance between the mass
centers of the nascent fission fragments (see [58]) and
β = 5× 1021s−1 is the reduced friction. Eq. 28 gives the
evolution of the fission barrier Bf . For SHE, this barrier
has only a quantum nature and vanishes at high excita-
tion energy. Ed ≃ 20 MeV is the shell damping energy
[58]. We consider that a CN with an excitation energy
between E∗1 and E
0 decays only by fission or neutron
emission.
We take here the example of the reac-
tion 124Xe+141Xe→265Hs∗ at the fusion barrier
(Ecm = Bfus), that is, an excitation energy E
0 = 54
MeV. With an energy and a width of the GDR of 13
MeV and 4 MeV respectively, the preequilibrium γ-ray
emission probability is Pγ ≃ 0.05. For the statistical
decay we take Bf [E
∗ = 0] ≃ 8.5 MeV, Bn = 6.5 MeV
and ω0 ≃ ωS ≃ 1 MeV/~. We also get a survival
probability PE∗0 (E
1 ) ≃ 0.01 which is small as compared
to Pγ . Following Eq. 24, the enhancement of the total
survival probability due solely to the N/Z asymmetry in
the entrance channel becomes PAsurv/P
surv ∼ 6.
To conclude, we see that such an effect may be useful
for the formation of Heavy and Super Heavy Elements.
Indeed, based on our conclusions, very asymmetric N/Z
collisions induced by radioactive ion beams that are com-
ing online in several laboratories, should allow the syn-
thesis SHE with a larger cross sections than are obtain-
able with beams of stable isotopes.
IV. CONCLUSION
In this paper we have performed TDHF calculations to
study in some details the properties of the preequilibrium
GDR that can be excited before the formation of a fully
equilibrated CN. We have shown that this probe can be
used to better understand the early stage of the fusion
path, and more precisely the charge equilibration. We
have clarified the role of the N/Z and/or mass asymme-
tries on the GDR excitation. The energy of the preequi-
librium GDR is expected to decrease with excitation en-
ergy, an effect attributed to a strong prolate shape asso-
ciated to the fused system. We presented the first experi-
mental indication of this shift in energy. The calculations
for an N/Z asymmetric collisions at non zero impact pa-
rameters have been performed and revealed couplings be-
tween the dipole oscillations and the CN rotation. Other
couplings between vibrational modes for mass symmetric
reactions have also been studied.
Finally we suggest that the use of N/Z asymmetric fu-
sion reactions is a good choice to synthesize Heavy and
Super Heavy Elements. In that case, the preequilibrium
GDR γ-ray emission cooling mechanism might be well
suited to reach the statistical phase with a low excita-
tion energy yielding a larger survival probability against
fission. The availability of radioactive beams with large
N/Z asymmetry and sufficient intensities for these kind of
studies will be extremely useful to check experimentally
our predictions in the near future.
Acknowledgments
This paper is dedicated to the memory of P. Bonche,
the author of the TDHF code we used. We are grateful
to M. Di Toro for a useful reading of the manuscript. We
also thank V. Baran, M. Colonna, D. Lacroix, D. Boilley
and J. P. Wieleczko for several fruitful discussions, and
P. Schuck for providing a pertinent reference.
[1] R.V.F. Janssens and T.L. Khoo, Ann. Rev. Nucl. Part.
Sci. 41, 321 (1991).
[2] S. Hofmann, Rep. Prog. Phys. 61 , 639 (1998).
[3] M. Dasgupta, D. J. Hinde, N. Rowley and A. M. Ste-
fanini, Ann. Rev. Nucl. Part. Sci. 48, 401 (1998).
[4] P. Bonche and B. Grammaticos, Phys. Lett. B 95, 198
(1980).
[5] P. Bonche and N. Ngô, Phys. Lett. B 105, 17 (1981).
[6] Ph. Chomaz, M. Di Toro and A. Smerzi, Nucl. Phys. A
563, 509 (1993).
[7] V. Baran, M. Colonna, M. Di Toro, A. Guarnera and A.
Smerzi, Nucl. Phys. A 600, 111 (1996).
[8] V. Baran, M. Cabibbo, M. Colonna, M. Di Toro and N.
Tsoneva, Nucl. Phys. A 679, 373 (2001).
[9] V. Baran, D. M. Brink, M. Colonna and M. Di Toro,
Phys. Rev. Lett. 87, 182501 (2001).
[10] C. Simenel, Ph. Chomaz and G. de France, Phys. Rev.
Lett. 86, 2971 (2001).
[11] A. Van der Woude, Prog. in Part. and Nucl. Phys. 18,
217 (1987).
[12] M. N. Harakeh and A. van der Woude, Giant Resonances:
Fundamental High-Frequency Modes of Nuclear Excita-
tion, Oxford Science publications (2001).
[13] J. O. Newton et al., Phys. Rev. Lett. 46, 1383 (1981).
[14] J.J. Gaardhoje, Ann. Rev. Nucl. Part. Sci. 42, 483
(1992).
[15] A. Bracco et al., Phys. Rev. Lett. 62, 2080 (1989).
[16] Ph. Chomaz, Phys. Lett. B 347, 1 (1995).
[17] C. Simenel and Ph. Chomaz, Phys. Rev. C 68, 024302
(2003).
[18] M. Fallot, Ph. Chomaz, M. V. Andrès, F. Catara, E. G.
Lanza and J. A. Scarpaci, Nucl. Phys. A 729, 699 (2003).
[19] Ph. Chomaz and C. Simenel, Nucl. Phys. A 731, 188c
(2004).
[20] E. Suraud, M. Pi and P. Schuck, Nucl. Phys. A 492, 294
(1989).
[21] C. H. Dasso, H. Sofia and A. Vitturi, Eur. Phys. J. A 12,
279 (2001).
[22] S. Flibotte et al., Phys. Rev. Lett. 77, 1448 (1996).
[23] M. Cinausero et. al, Nuov. Cim. A 111, 613 (1998).
[24] D. Pierroutsakou et al., Eur. Phys. J. A 17, 71 (2003).
[25] D. Pierroutsakou et al., Phys. Rev. C 71, 054605 (2005).
[26] F. Amorini et al., Phys. Rev. C 69, 014608 (2004).
[27] L. Campajola et al., Z. Phys. A 352, 421 (1995).
[28] F. Amorini et al., Phys. Rev. C 58, 987 (1998).
[29] M. Sandoli et al., Eur. Phys. J. A 6, 275 (1999).
[30] M. Papa et al., Eur. Phys. J. A 4, 69 (1999).
[31] D. Pierroutsakou et al., Eur. Phys. J. A 16, 423 (2003).
[32] M. Papa et al., Phys. Rev. C 68, 034606 (2003).
[33] M. Papa et al., Phys. Rev. C 72, 064608 (2005).
[34] D. R. Hartree, Proc. Cambridge Philos. Soc. 24, 89
(1928).
[35] V. A. Fock, Z. Phys. 61, 126 (1930).
[36] P.A.M. Dirac, Proc. Camb. Phil. Soc. 26, 376 (1930).
[37] D. Vautherin and D. M. Brink, Phys. Rev. C 5, 626
(1972).
[38] Y.M. Engel, D.M. Brink and K. Goeke, S.J. Krieger and
D. Vautherin Nuc. Phys. A 249, 215 (1975).
[39] P. Bonche, S. Koonin and J. W. Negele, Phys. Rev. C
13, 1226 (1976).
[40] J. W. Negele, Rev. Mod. Phys. 54, 913 (1982).
[41] M. Gong, M. Thoyama and J. Randrup, Z. Phys. A 335,
331 (1990).
[42] C. Y. Wong and H. H. K. Tang, Phys. Rev. Lett. 40,
1070 (1978).
[43] D. Lacroix, S. Ayik and Ph. Chomaz, Prog. Part. Nucl.
Phys. 52, 497 (2004).
[44] O. Juillet and Ph. Chomaz, Phys. Rev. Lett. 88, 142503
(2002).
[45] Ph. Chomaz, N. V. Giai, and S. Ayik, Phys. Lett. B 189,
375 (1987).
[46] C. Simenel, Ph. Chomaz and G. de France, Phys. Rev.
Lett. 93, 102701 (2004).
[47] K.-H. Kim, T. Otsuka and P. Bonche, J. Phys. G 23,
1267 (1997).
[48] T. Skyrme, Phil. Mag. 1, 1043 (1956).
[49] J. D. Jackson, Classical Electrodynamics (Wiley, New
York, 1962), Eq. (15.1).
[50] J. A. Maruhn, P. G. Reinhard, P. D. Stevenson, J. R.
Stone and M. R. Strayer, Phys. Rev. C 71, 064328 (2005).
[51] P.-G. Reinhard et al., Phys. Rev. E 73, 036709 (2006).
[52] F. Calvayrac, S. El-Gammal, C. Kohl, P.-G. Reinhard
and E. Suraud, Nuov. Cim. A 110, 1175 (1997).
[53] P. M.Dinh, P. G. Reinhard and E. Suraud, J. Phys. B
38, 1637M (2005).
[54] E. B. Balbutsev and P. Schuck, Nucl. Phys. A 652, 221
(1999).
[55] D. M. Brink, Nucl. Phys. A 482, 3c (1988).
[56] K.A. Snover, Annu. Rev. Nucl. Part. Sci. 36, 545 (1986).
[57] A. V. Ignatyuk, K. K. Istekov et G. N. Smirenkin, Sov.
J. Nucl. Phys. 30, 626 (1979).
[58] Y. Aritomo, T. Wada, M. Ohta and Y. Abe, Phys. Rev.
C 59, 796 (1999).
ABSTRACT
  The equilibration of macroscopic degrees of freedom during the fusion of
heavy nuclei, like the charge and the shape, are studied in the Time-Dependent
Hartree-Fock theory. The pre-equilibrium Giant Dipole Resonance (GDR) is used
to probe the fusion path. It is shown that such isovector collective state is
excited in N/Z asymmetric fusion and to a less extent in mass asymmetric
systems. The characteristics of this GDR are governed by the structure of the
fused system in its preequilibrium phase, like its deformation, rotation and
vibration. In particular, we show that a lowering of the pre-equilibrium GDR
energy is expected as compared to the statistical one. Revisiting experimental
data, we extract an evidence of this lowering for the first time. We also
quantify the fusion-evaporation enhancement due to gamma-ray emission from the
pre-equilibrium GDR. This cooling mechanism along the fusion path may be
suitable to synthesize in the future super heavy elements using radioactive
beams with strong N/Z asymmetries in the entrance channel.

<|endoftext|><|startoftext|>
Introduction.
In this paper, we describe a new, systematic and explicit way of approximat-
ing solutions of mixed hyperbolic systems with constant coefficients satisfy-
ing a Uniform Lopatinski Condition via different Penalization approaches.
In applied Mathematics like, for instance, in the study of fluids dynamics,
the method of penalization is used to treat boundary conditions in the case
of complex geometries. By replacing the boundary condition by a singular
perturbation of the PDE extended to a larger domain, this method allows
the construction of an approximate, often more easily computable, solution.
We consider mixed boundary value problems for hyperbolic systems:
Aj∂j ,
on {xd ≥ 0}, with boundary conditions on {xd = 0}. The n× n real valued
matrices Aj are assumed constant. Of course, we assume the coefficients to
be constant as a first approach, aiming to generalize the results obtained here
in future works. We assume that the boundary {xd = 0} is noncharacteristic,
which means that detAd 6= 0. We denote by
y := (x1, . . . , xd−1) and x := xd. The problem writes:
(1.1)
Hu = f, {x > 0},
Γu|x=0 = Γg,
u|t<0 = 0 ,
where the unknown u(t, x) ∈ Rn, Γ : Rn → Rp is linear and such that
rg Γ = p; which implies that Γ can be viewed as a p×n real valued constant
matrix. Let us fix T > 0 once and for all for this paper. Let Ω+
denotes
the set [0, T ] × Rd+ and ΥT denote the set [0, T ] × R
d−1. f is a function in
Hk(Ω+
), g is a function in Hk(ΥT ), where k ≥ 3 or k = ∞, such that:
f |t<0 = 0 and g|t<0 = 0. We make moreover the following Hyperbolicity
assumption on H :
Assumption 1.1. For all (η, ξ) ∈ Rd−1 × R− {0}, the eigenvalues of
ηjAj + ξAd
are real, semi-simple and of constant multiplicity.
Let us introduce now the frequency variable ζ := (γ, τ, η), where iτ + γ,
with γ ≥ 0, and τ ∈ R stands for the frequency variable dual to t and
η = (η1, . . . , ηd−1) where ηj ∈ R is the frequency variable dual to xj. We
note:
A(ζ) := − (Ad)
(iτ + γ)Id+
iηjAj
Denote by M a N ×N, complex valued, matrix; E−(M)[resp E+(M)] is the
linear subspace generated by the generalized eigenvectors associated to the
eigenvalues of M with negative [resp positive] real part. If F and G denote
two linear subspaces of CN such that dimF+dimG = N, det(F,G) denotes
the determinant obtained by taking orthonormal bases in each space. Up to
the sign, the result is independent of the choice of the bases. We shall now
explicit the Uniform Lopatinski Condition assumption:
Assumption 1.2. (H,Γ) satisfies the Uniform Lopatinski Condition i.e for
all ζ such that γ > 0, there holds:
(1.2) |det(E−(A), ker Γ)| ≥ C > 0.
The mixed hyperbolic system (1.1) has a unique solution in
Hk(Ω+
), and, since H is hyperbolic with constant multiplicity, for all γ
positive, the eigenvalues of A stay away from the imaginary axis. More-
over, as emphasized for instance by Chazarain and Piriou in [3] and Mé-
tivier in [8], there is a continuous extension of the linear subspace E−(A)
to {γ = 0, (τ, η) 6= 0
} that we will denote by Ẽ−(A). Ẽ+(A) extends as
well continuously to {γ = 0, (τ, η) 6= 0
} and we will denote Ẽ+(A) this
extension. Moreover, there holds:
Ẽ−(A)
Ẽ+(A) = C
We can refer the reader to [3], [6], [7], or [8] for detailed estimates concern-
ing mixed hyperbolic problems satisfying a Uniform Lopatinski Condition.
Moreover, we can refer to [10] for the proof of the continuous extension of the
linear subspaces mentioned above in the hyperbolic-parabolic framework.
Remark 1.3. As a consequence of the uniform Lopatinski condition, there
holds, for all ζ 6= 0 :
rg Γ = p = dim Ẽ−(A(ζ)).
1.1 A Kreiss Symmetrizer Approach.
We will now describe a penalization method involving a Kreiss Symmetrizer
and a matrix constructed by Rauch in [12], in the construction of our singular
perturbation. Note well that we have some freedom in both the choice of the
Kreiss Symmetrizer and of Rauch’s matrix. Let us denote respectively by û,
f̂ , and ĝ the tangential Fourier-Laplace transform of u, f, and g. Since the
Uniform Lopatinski Condition is holding for the mixed hyperbolic system
(1.1), there is, see [9] a Kreiss symmetrizer S for the problem:
(1.3)
∂xû = Aû+ f̂ , {x > 0},
Γû|x=0 = Γĝ,
That is to say there exists a matrix S(ζ), homogeneous of order zero in ζ,
C∞ in R+ × Rd − {0
} and there are λ > 0, δ > 0 and C1 such that:
• S is hermitian symmetric.
• ℜ (SA) ≥ λId.
• S ≥ δId− C1Γ
An algebraic result proved by Rauch in [12] can be reformulated as follow,
and a proof is recalled in section 2.2:
Lemma 1.4. There is a hermitian symmetric, uniformly definite positive,
N ×N matrix B such that:
ker Γ = E+((S)
−1B).
Moreover B depends smoothly of ζ.
Remark 1.5. This result is proved by constructing explicit matrices satisfy-
ing the desired properties. Thus, it is not merely an existence result and we
can use the explicitly known matrix B in our construction of a penalization
operator.
Let us denote by R := B
2 and SR := R
−1SR−1. We will denote by P−
the projector on E−(SR) parallel to E+(SR) and by P
+ the projector on
E+(SR) parallel to E−(SR); P
− and P+ denoting the associated Fourier
multiplier. We recall that, denoting by F the tangential Fourier transform,
the Fourier multiplier P−(∂t, ∂y, γ) [resp P
+(∂t, ∂y, γ)] is then defined, for all
w ∈ Hk(Rd+1), and γ > 0, by:
−(∂t, ∂y, γ)w
= P−(ζ)F(w),
[resp
+(∂t, ∂y, γ)w
= P+(ζ)F(w)],
in the future we will rather write:
±(∂t, ∂y, γ)w
= P±(ζ)F(w).
We fix, once and for all, γ > 0 big enough. Let us consider then the solution
uε of the well-posed Cauchy problem on the whole space (1.4):
(1.4)
Huε +
Muε1x<0 = f1x>0 +
θ1x<0, {x ∈ R},
uε|t<0 = 0,
where
M := −eγtAdS
−1RP−Re−γt,
θ := −eγtAdS
−1RP−Γg̃,
and S(∂t, ∂y) [resp R(∂t, ∂y)] denotes the Fourier multiplier associated to
S(ζ) [resp R(ζ)]. Let us define g̃ by:
g̃ := e−x
In what follows, ĝ will denote the Fourier-Laplace transform of g̃. Let us
denote by
ũ := u−1x<0 + u1x≥0 = u
1x≤0 + u1x>0.
u denotes the solution of (1.1), and thus belongs to Hk(Ω+
). u− is a function
belonging to Hk(Ω−
) and such that u−|x=0 = u|x=0. More precisely, u
− can
be computed by: eγtF−1
R−1(v̂− + P−Γĝ)
, where v̂− is the solution of the
problem:
SR∂xv̂
− − P+SRARv̂
− = P+SRARP
−Γĝ, {x < 0},
v̂−|x=0 = P
+Rû|x=0,
and û denotes the Fourier-Laplace transform of the solution u of (1.1).
Theorem 1.6. For all k ≥ 3, if f ∈ Hk(Ω+
) and g ∈ Hk(ΥT ), then there
holds:
‖uε − u−‖
Hk−3(Ω
+ ‖uε − u‖
Hk−3(Ω
= O(ε),
where uε denotes the solution of the Cauchy problem (1.4) and u denotes
the solution of the mixed hyperbolic problem (1.1). If g = 0 then:
‖uε − u−‖
+ ‖uε − u‖
= O(ε).
Of course, since uε is defined for all {x ∈ R}, its limit as ε → 0+, ũ is
can be viewed as an ”extension” of u on the fictive domain {x < 0}. The
”extension” resulting from our method of penalization gives a continuous ũ
across {x = 0}, while the method used in [2] gave simply: ũ|x<0 = 0. We
have the following Corollaries:
Corollary 1.7. Assume for example that f ∈ H∞(Ω+
) and
g ∈ H∞(ΥT ) then
‖uε − u‖
Hs(Ω+
= O(ε); ∀s > 0.
Corollary 1.8. If f belongs to L2(Ω+
) and g = 0 then:
‖uε − ũ‖L2(ΩT ) = 0.
One of the interest of this first approach lies in the rate of convergence
of uε towards u. Indeed, in general, a boundary layer will form near the
boundary in this kind of singular perturbation problem. For example in the
paper by Bardos and Rauch [2], as confirmed by Droniou [4], a boundary
layer forms. It is also the case in [11], as analyzed in our Appendix. There
are also boundary layers phenomena in the parabolic context: see the ap-
proach proposed by Angot, Bruneau and Fabrie [1] for instance. However,
surprisingly, and like in the penalization method proposed by Fornet and
Guès in [5], our method allows the convergence to occur without formation
of any boundary layer on the boundary. As a result, this leads to the kind
of sharp stability estimate given in Theorem 1.6. These results concern the
case where f and g are sufficiently regular. The reason is that we construct
an approximate solution. In the case of g only in L2(ΥT ), such a simple
treatment does not work. However, let δ > 0 be given. If we approximate
f and g by smooth functions fν ∈ H
) and gν ∈ H
∞(ΥT ) such that
‖f − fν‖L2(Ω+
< δ and ‖g − gν‖L2(ΥT ) < δ, by the uniform Lopatinski
condition, we get:
‖uν − u‖L2(Ω+
< Cν,
where uν is the solution of the mixed hyperbolic problem (1.1) with data fν
and gν . We can now apply Corollary 1.8 to uν , and obtain by penalization
a sequence uεν in L
2(ΩT ) such that: limε→0+ u
ν = uν in L
). Finally, by
choosing, ε sufficiently small, we get ‖u − uεν‖L2(Ω+
< 2Cδ. By choosing ε
and ν as functions of δ, and noting u(δ) = u
, we have:
(1.5) lim
‖u(δ) − u‖L2(Ω+
1.2 A second Approach.
In the first approach we have just introduced, it is necessary to compute
a Kreiss’s Symmetrizer and a Rauch’s matrix. In view of future numerical
applications, we will now introduce another method preventing the compu-
tation of these matrices. The price to pay is that we need the preliminary
computation of v, which is by definition the solution of the Cauchy problem
on the free space:
(1.6)
Hv = f, (t, y, x) ∈ ΩT ,
v|t<0 = 0 ∀(y, x) ∈ R
Let us denote P−(ζ) the spectral projector on Ẽ−(A(ζ)) parallel to
Ẽ+(A(ζ)), andP
+(ζ) the spectral projector on Ẽ+(A(ζ)) parallel to Ẽ−(A(ζ)).
Let us introduce P±(∂t, ∂y, γ), the Fourier multiplier associated to P
±(ζ).
Let us denote by Π the projector on Ẽ−(A(ζ)) parallel to KerΓ, which has
a sense because of the Uniform Lopatinski Condition and denote Π the
associated Fourier multiplier. We define then h̃ by:
h̃ := e−x
−(e−γtv|x=0) +Πe
−γt(g − v|x=0)
where g denotes the function involved in the boundary condition of the
mixed hyperbolic problem (1.1). Now, let us consider the following singularly
perturbed Cauchy problem on the whole space:
(1.7)
Huε +
−e−γtuε1x<0 = f1x>0 +
γth̃1x<0,
uε|t<0 = 0 .
Let us denote by
ũ := u−1x<0 + u1x≥0 = u
1x≤0 + u1x>0.
u denotes the solution of (1.1) thus belonging toHk(Ω+
) and u− is a function
belonging to Hk(Ω−
) and such that u−|x=0 = u|x=0. More precisely, u
− can
be computed by: eγtF−1(F(h̃)+v̂−), where v̂− is the solution of the problem:
+v̂−)−A(P+v̂−) = 0, {x < 0},
+v̂−|x=0 = P
+û|x=0.
and û denotes the Fourier-Laplace transform of the solution u of (1.1). The
problem (1.7) is well-posed and, for all ε > 0, there exists a unique
uε ∈ Hk(ΩT ) solution. We will fix γ adequately big beforehand. We observe
then the following result:
Theorem 1.9. For all k ≥ 3, if f ∈ Hk(Ω+
) and g ∈ Hk(ΥT ), then there
holds:
‖uε − u−‖
Hk−3(Ω
+ ‖uε − u‖
Hk−3(Ω
= O(ε),
where uε denotes the solution of the Cauchy problem (1.7) and u denotes
the solution of the mixed hyperbolic problem (1.1).
The singular perturbation involved in the definition of uε does not de-
pend either of Kreiss’s Symmetrizer or Rauch’s matrix. As a result, for
this method of penalization far less computations are necessary in order to
obtain our singular perturbation. Note well that the proof of the energy
estimates in Theorem 1.9 is completely different from the proof of the en-
ergy estimates in Theorem 1.6. Indeed, for our first approach our singularly
perturbed problem was treated as a Cauchy problem, contrary to our second
approach where it was interpreted as a transmission problem.
Corollary 1.10. Assume for example that f ∈ H∞(Ω+
) and
g ∈ H∞(ΥT ) then
‖uε − u‖
= O(ε); ∀s > 0.
Of course, we see that the same problem of regularity arises in Theorem
1.9 and Theorem 1.6. However, by a simple density argument, we can also
prove here the exact analogous of (1.5).
Remark 1.11. In the case where f = 0, then the solution v of (1.6) is
v = 0 and thus, the perturbed cauchy problem (1.7) rewrites:
Huε +
−e−γtuε1x<0 =
γte−x
Πe−γtg
1x<0, {x ∈ R},
uε|t<0 = 0 .
2 Underlying approach leading to the proof of
Theorem 1.6.
2.1 Some preliminaries.
Since the Uniform Lopatinski Condition holds, there is S, homogeneous of
order zero in ζ, and such that there are λ > 0, δ > 0 and C1 and there holds:
• S is hermitian symmetric.
• ℜ (SA) ≥ λId.
• S ≥ δId− C1Γ
S is then called a Kreiss Symmetrizer for the problem:
(2.1)
∂xû = Aû+ f̂ , {x > 0},
Γû|x=0 = Γĝ,
where f̂ and ĝ denotes respectively the Fourier-Laplace transforms of f and
g̃; and û denotes the Fourier-Laplace transform of the solution u of the well-
posed mixed hyperbolic problem (1.1). û is also solution, for all fixed ζ 6= 0
of the following equation:
(2.2)
S∂xû = SAû+ S(Ad)
−1f̂ , {x > 0},
Γû|x=0 = Γĝ,
Remark 2.1. Following our current assumptions, Γ is independent of ζ 6= 0,
however, more general boundary conditions, of the form:
Γ(ζ)û|x=0 = Γ(ζ)ĝ,
can be treated. It would imply taking as boundary condition for (1.1):
Γγu|x=0 = Γγg,
with for γ big enough,
Γγ := Γ(∂t, ∂y)e
where, Γ(∂t, ∂y) denotes the Fourier multiplier associated to Γ(ζ), that is to
say is defined by:
F(Γ(∂t, ∂y)u) = Γ(ζ)F(u).
Referring for example to [3] and [7], Kreiss has proved that the exis-
tence of a Kreiss symmetrizer for the symbolic equation is sufficient to prove
the well-posedness of the associated pseudodifferential equation (here (1.1)).
Indeed, multiplying by û and integrating by parts the equation:
S∂xû = SAû+ S(Ad)
leads to the desired a priori estimates. For all ζ 6= 0, S(ζ) is hermitian
symmetric and definite positive on ker Γ. Let us sum up the properties crucial
in the proof of the well-posedness of our problem:
Proposition 2.2. For all ζ = (τ, γ, η) such that τ2 + γ2 +
j=1 η
j = 1,
there holds:
• S(ζ) is hermitian symmetric.
• ℜ (SA) (ζ) := 1
(SA+ (SA)∗)(ζ) is positive definite.
• −S(ζ) is definite negative on ker Γ and ker Γ is of same dimension as
the number of negative eigenvalues in −S(ζ).
Note that, by homogeneity of S, it is equivalent for the properties in Propo-
sition 2.2 to hold for |ζ| = 1 or for |ζ| > 0. As a consequence of the first
point and third point of Proposition 2.2, the Lemma 1.4 applies and gives a
matrix B such that: ker Γ = E+(S
−1B). In the sequel, such a matrix B is
fixed once for all.
The following chapter contains a proof of Lemma 1.4 assorted of a detailed
construction of B.
2.2 Detailed proof of Lemma 1.4: Construction of the ma-
trices B solving Lemma 1.4.
As we will emphasize in next chapter, Lemma 1.4 is a crucial feature in our
first method of Penalization. The aim of this chapter is to give a more com-
plete proof rather than simply recalling Rauch’s result and, in the process,
to precise how the matrices B solving Lemma 1.4 are constructed. For all
ζ 6= 0, S(ζ) is hermitian symmetric, uniformly definite positive on Ẽ+(A(ζ)),
and uniformly definite negative on Ẽ−(A(ζ)); as a consequence, S(ζ) keeps
exactly p positive eigenvalues and N − p negative eigenvalues for all ζ 6= 0.
Basically, knowing that S is uniformly definite positive on ker Γ; we search
to express ker Γ in a way involving S. Consider q ∈ ker Γ, since, for all ζ 6= 0,
E−(S(ζ))
E+(S(ζ)) = C
N , we can split q in:
q := q+ + q−
with q+ ∈ E+(S(ζ)) and q
− ∈ E−(S(ζ)).
Since dim kerΓ = dimE+(S(ζ)) = p, these two linear subspaces are in bijec-
tion. Let us give the two main ideas behind this proof: one idea is to detail
the bijection between q ∈ kerΓ and q+ ∈ E+(S(ζ)) as it satisfies some con-
straints, the other is to come down to the model case where the eigenvalues
of S are either 1 or −1. Let us denote:
S̃−1 =
−IdN−p 0
0 Idp
In a first step, we will prove the following result:
Proposition 2.3. If we assume that V is a linear subspace of CN of dimen-
sion p, and that there is C > 0 such that, for all q ∈ V, there holds:
〈S̃−1q, q〉 ≥ C〈q, q〉,
then the two following equivalent properties hold:
• There is a hermitian symmetric, positive definite matrix R̃, such that:
[q ∈ V] ⇔
q ∈ E+(R̃S̃R̃)
which is equivalent to:
V = E+(R̃
• There is a hermitian symmetric, positive matrix R̃, such that:
[q ∈ V] ⇔
R̃q ∈ E+(R̃S̃
−1R̃)
which is equivalent to:
V = E+(S̃
−1R̃2).
Moreover, we can link the two properties by taking:
R̃2 = S̃R̃
Proof. In this proof, we will show how to construct some matrices R̃
satisfying the required properties. There is a (N − p)× p matrix ℵ of rank
N − p such that ‖ℵ‖ ≤ 1 and:
V = {q ∈ CN , q− = ℵq+},
where q+[resp q−] denotes the projector on E+((S̃)
−1) [resp E−((S̃)
−1)]parallel
to E−((S̃)
−1) [resp E+((S̃)
−1)]. Indeed, dimV = p = dimE+((S̃)
−1), and
N = E−((S̃)
E+((S̃)
−1). Moreover, there ic C > 0 such that, for all
q ∈ V, there holds:
〈(S̃)−1q, q〉 = −〈q−, q−〉+ 〈q+, q+〉 ≥ C〈q, q〉.
and thus
|q+|2 − |ℵq+|2 ≥ C|q|2,
which implies that ‖ℵ‖ < 1. We will show now that, for R̃ constructed as
follow:
IdN−p −ℵ
−ℵ∗ Idp
there holds:
[q ∈ V] ⇔
R̃q ∈ E+(R̃S̃
−1R̃)
First,we see that the constructed R̃ is trivially hermitian symmetric and
positive definite since ‖ℵ‖ < 1. First, we have:
R̃S̃−1R̃ =
−IdN−p +NN
0 Idp −N
R̃q =
q− − ℵq+
−ℵ∗q− + q+
Thus, since ‖ℵ‖ < 1, there holds:
R̃q ∈ E+(R̃S̃
−1R̃)
q− − ℵq+ = 0
⇔ [q ∈ V] .
We will now prove that we have:
(R̃)−1E+(R̃S̃
−1R̃) = E+(S̃
−1R̃2).
Since R̃S̃−1R̃ is hermitian symmetric, the linear subspace E+(R̃S̃
−1R̃) is
generated by the eigenvectors of R̃S̃−1R̃ associated to positive eigenvalues.
A basis of (R̃)−1E+(R̃S̃
−1R̃) is thus given by ((R̃)−1vj)j where vj denotes
an eigenvector of R̃S̃−1R̃ associated to a positive eigenvalue λj . We have:
R̃S̃−1R̃vj = λjvj.
Let us denote wj = (R̃)
−1vj, we have then:
R̃S̃−1R̃2wj = λjR̃wj ⇔ S̃
−1R̃2wj = λjwj .
As a result, wj is an eigenvector of S
−1R̃2 associated to the eigenvalue λj
hence we obtain that:
(R̃)−1E+(R̃S̃
−1R̃) = E+(S̃
−1R̃2).
We can also prove, the same way, that:
R̃E+(R̃S̃R̃) = E+(R̃
Now, taking
R̃2 = S̃R̃
we can check that:
E+(S̃
−1R̃2) = E+(R̃
which concludes the proof. ✷
Lemma (??) is a Corollary of the following Proposition:
Proposition 2.4. If S−1 denotes a smooth in ζ 6= 0, matrix-valued function
in the space of hermitian symmetric matrices with p positive eigenvalues and
N −p negative eigenvalues and ker Γ denotes a linear subspace of dimension
p and there is C > 0 such that, for all q ∈ ker Γ, there holds:
〈S−1q, q〉 ≥ C〈q, q〉,
then the two following equivalent properties hold:
• There is a smooth in ζ 6= 0, matrix-valued function R, in the space of
hermitian symmetric, positive matrices such that:
[q ∈ KerΓ] ⇔
∀ζ 6= 0, R−1(ζ)q ∈ E+(R(ζ)S(ζ)R(ζ))
which is equivalent to:
∀ζ 6= 0, KerΓ = E+(R
2(ζ)S(ζ)).
• There is a smooth in ζ 6= 0, matrix-valued function R, in the space of
hermitian symmetric, positive matrices such that:
[q ∈ KerΓ] ⇔
∀ζ 6= 0, R(ζ)q ∈ E+(R(ζ)S
−1(ζ)R(ζ))
which is equivalent to:
∀ζ 6= 0, KerΓ = E+(S
−1(ζ)R2(ζ)).
Moreover, for all ζ 6= 0, these two properties can be linked by taking:
(R(ζ))2 = S(ζ)(R(ζ))2S(ζ).
Proof. We will show here that Proposition 2.4 can be deduced from Propo-
sition 2.3. For all ζ 6= 0, S(ζ) is a hermitian symmetric matrix, moreover S
depends smoothly of ζ. As a consequence S−1 is also a hermitian symmetric
matrix depending smoothly of ζ, and as such, there is a nonsingular matrix
V such that:
S̃−1 = V ∗
Let us denote Λ the diagonalized version of S−1 with eigenvalues sorted by
increasing order, then there is Z depending smoothly of ζ such that, for all
ζ 6= 0, we have:
Z∗(ζ) = Z−1(ζ),
Λ(ζ) = Z∗(ζ)
(ζ)Z(ζ).
As a consequence, V depends smoothly of ζ since, for all ζ 6= 0:
V (ζ) = (Λ(ζ))−
2Z(ζ),
where Λ is the diagonal matrix obtained by taking the absolute value of each
eigenvalue of Λ. For the sake of simplicity, let us omit the dependence in ζ.
Now, for all q ∈ V −1 ker Γ, there is C > 0, such that:
〈S̃−1q, q〉 = 〈V ∗S−1V q, q〉 = 〈S−1(V q), (V q)〉 ≥ C〈(V q), (V q)〉.
Moreover V is nonsingular, thus there is C ′ > 0, such that, for all
q ∈ V −1 ker Γ, there holds:
〈S̃−1q, q〉 ≥ C ′〈q, q〉.
Moreover dimV −1 ker Γ = p, using Proposition 2.3, for all fixed ζ 6= 0, there
is a hermitian symmetric, positive definite matrix R̃(ζ), such that:
V −1(ζ) ker Γ = E+((R̃(ζ))
2S̃(ζ)) = R̃(ζ)E+(R̃(ζ)S̃(ζ)R̃)(ζ).
We will now prove that we can construct R̃ depending smoothly of ζ. First
there is a (N − p) × p matrix ℵ of rank N − p, depending smoothly of ζ,
such that fore all ζ 6= 0 ‖ℵ(ζ)‖ ≤ 1 and:
V −1(ζ) ker Γ = {q ∈ CN , q− = ℵ(ζ)q+},
where q+[resp q−] denotes the projector on E+((S̃)
−1) [resp E−((S̃)
−1)]parallel
to E−((S̃)
−1) [resp E+((S̃)
−1)]. R̃ is given, for all ζ 6= 0, by:
R̃(ζ) =
S̃−1(ζ)R̃2(ζ)S̃−1(ζ),
with R̃ given by:
R̃(ζ) =
IdN−p −ℵ(ζ)
−ℵ∗(ζ) Idp
Since S̃−1 = V ∗
V, there holds: S̃ = V ∗SV, and, as a consequence:
(V R̃)−1 ker Γ = E+(R̃V
∗SV R̃).
As R̃V ∗SV R̃ is hermitian symmetric, a basis of the linear subspace E+(R̃V
∗SV R̃)
is given by the eigenvectors of R̃V ∗SV R̃ associated to positive eigenvalues.
This leads us to consider vj = (V R̃)
−1uj satisfying:
R̃V ∗SV R̃vj = λjvj.
We have:
R̃V ∗SV R̃(V R̃)−1uj = λj(V R̃)
−1uj.
hence:
(V R̃)R̃V ∗Suj = λjuj .
Since (V R̃)R̃V ∗ = (R̃V ∗)∗(R̃V ∗) is hermitian symmetric and positive defi-
nite, we can then define its square root. We define R by:
(R̃V ∗)∗(R̃V ∗).
Since both R̃ and V depends smoothly of ζ, so does R. Moreover, there
holds:
R2Suj = λjuj,
which gives:
ker Γ = V R̃E+(R̃V
∗SV R̃) = E+(R
We have thus proved there is a smooth in ζ 6= 0, matrix-valued function R,
in the space of hermitian symmetric, positive matrices such that:
[q ∈ KerΓ] ⇔
∀ζ 6= 0, R−1(ζ)q ∈ E+(R(ζ)S(ζ)R(ζ))
which is equivalent to:
∀ζ 6= 0, KerΓ = E+(R
2(ζ)S(ζ)).
Now consider R defined, for all ζ 6= 0, by:
R(ζ) =
S(ζ)(R(ζ))2S(ζ),
R(ζ) =
(R̃(ζ)V ∗(ζ)S(ζ))∗(R̃(ζ)V ∗(ζ)S(ζ)).
ζ 7→ R(ζ) is smooth and, for all ζ, R(ζ) is a hermitian symmetric, positive
definite matrix. Moreover, there holds:
[q ∈ KerΓ] ⇔
∀ζ 6= 0, R(ζ)q ∈ E+(R(ζ)S
−1(ζ)R(ζ))
which is equivalent to:
∀ζ 6= 0, KerΓ = E+(S
−1(ζ)R2(ζ)).
Let us detail the computation of R(ζ).
R(ζ) =
S(ζ)V (ζ)R̃
(ζ)V ∗(ζ)S(ζ).
Moreover
(ζ) = S̃−1(ζ)R̃2(ζ)S̃−1(ζ),
we have thus:
R(ζ) =
R̃(ζ)S̃−1(ζ)V ∗(ζ)S(ζ)
R̃(ζ)S̃−1(ζ)V ∗(ζ)S(ζ)
which gives:
B(ζ) =
R̃(ζ)S̃−1(ζ)V ∗(ζ)S(ζ)
R̃(ζ)S̃−1(ζ)V ∗(ζ)S(ζ)
We recall that R̃ is given, for all ζ 6= 0, by:
R̃(ζ) =
IdN−p −ℵ(ζ)
−ℵ∗(ζ) Idp
and that for all ζ 6= 0, V (ζ) is given by:
V (ζ) = (Λ(ζ))−
2Z(ζ),
where
Λ(ζ) = Z∗(ζ)
(ζ)Z(ζ)
with Λ is a diagonal matrix with real coefficients: (λ1, . . . , λN ), and Λ de-
notes the diagonal matrix with diagonal coefficients (|λ1|, . . . , |λN |).
Remark 2.5. In the construction of B the only freedom we have resides in
the choice of ℵ.
2.3 A change of dependent variables.
Let us denote by R := B
2 and v̂ := Rû. v̂ is hence solution of (2.3):
(2.3)
R−1SR−1∂xv̂ = R
−1SAR−1v̂ +R−1S(Ad)
−1f̂ , {x > 0},
ΓR−1v̂|x=0 = Γĝ,
We will adopt the following notations: SR := R
−1SR−1, AR := RAR
and ΓR := ΓR
−1. We first observe that:
ker ΓR = R ker Γ = RE+((S)
−1R2).
but S−1
= RS−1R thus
ker ΓR = RE+(R
−1SRR) = E+(SR).
This is where Lemma 1.4 is used in a crucial manner. Let us denote by
− the projector on E−(SR) parallel to E+(SR) and by by P
+ the projector
on E+(SR) parallel to E−(SR); P
− and P+ denoting the associated Fourier
multiplier. Since SR is hermitian symmetric, P
− is in fact the orthogonal
projector on E−(SR). The problem (2.3) can then be written:
SR∂xv̂ = SRARv̂ +R
−1S(Ad)
−1f̂ , {x > 0},
−v̂|x=0 = P
−Γĝ,
This problem is well-posed because, as a direct Corollary of Proposition 2.2,
we have:
Proposition 2.6. For all ζ such that τ2 + γ2 + |η|2 = 1, there holds:
• SR(ζ) is hermitian symmetric.
• ℜ (SRAR) (ζ) is positive definite.
• −SR(ζ) is definite negative on ker ΓR and the dimension of ker ΓR is
the same as the number of negative eigenvalues of −SR(ζ).
Proof. For the sake of simplicity, let us omit the dependence in ζ in our
notations.
• SR := R
−1SR−1, and both S and R are hermitian thus SR is hermi-
tian.
• SRAR = R
−1SAR−1, thus for all q ∈ CN , there holds:
2〈ℜ(SRAR)q, q〉 = 〈SRARq, q〉+〈q, SRARq〉 = 〈R
−1SAR−1q, q〉+〈q,R−1SAR−1q〉,
since R−1 is hermitian, we have then:
= 〈SAR−1q,R−1q〉+ 〈R−1q, SAR−1q〉 = 2〈ℜ(SA)R−1q,R−1q〉.
Since ℜ(SA) is positive definite and R is invertible, ℜ (SRAR) is thus
positive definite.
• By construction of R, it satisfies ker ΓR = E+(SR), with SR hermitian.
As a consequence −SR is definite negative on ker ΓR and the dimension
of ker ΓR is the same as the number of negative eigenvalues of −SR.
✷ Let us mention that, since R and S remains uniformly bounded in
ζ 6= 0, f̂ and R−1S(Ad)
−1f̂ belongs to the same space. In a same spirit as
[5], this suggests the following singular perturbation of (2.3):
SR∂xv̂
−v̂ε1x<0 = SRARv̂
−Γĝ1x<0 +R
−1S(Ad)
−1f̂ , {x ∈ R},
This is equivalent to perturb (2.2) as follow:
S∂xû
RP−Rûε1x<0 = SAû
RP−Γĝ1x<0 + S(Ad)
−1f̂ , {x ∈ R},
Finally, this induces the following perturbation for (1.1):
(2.4)
Huε +
Muε1x<0 = f1x>0 +
θ1x<0, {x ∈ R},
uε|t<0 = 0,
where
M := −eγtAdS
−1RP−Re−γt,
θ = −eγtAdS
−1RP−Γg̃,
and S(∂t, ∂y) [resp R(∂t, ∂y)] denotes the Fourier multiplier associated to
S(ζ) [resp R(ζ)].
3 Proof of Theorem 1.6.
First, we construct an approximate solution of equation (2.4) (which is also
equation (1.4)), then prove suitable energy estimates that ensures uε and its
approximate solution both converges towards the same limit as
ε → 0+.
3.1 Construction of the approximate solution.
uε is the solution of the well-posed Cauchy problem:
Huε +
Muε1x<0 = f1x>0 +
θ1x<0, {x ∈ R},
uε|t<0 = 0.
uε is moreover the solution of the well-posed Cauchy problem:
Huε +
Muε1x<0 = SA
f1x>0 +
θ1x<0, {x ∈ R},
uε|t<0 = 0.
The associated equation after tangential Fourier-Laplace transform writes :
S∂xû
RP−Rûε1x<0 − SAû
ε = −
RP−Γĝ1x<0 + S(Ad)
−1f̂1x>0, {x ∈ R}.
or alternatively:
ûε = R−1v̂ε
SR∂xv̂
−v̂ε1x<0 = SRARv̂
−Γĝ1x<0 +R
−1S(Ad)
−1f̂ , {x ∈ R},
We will use the following formulation as a transmission problem in our con-
struction of an approximate solution:
SR∂xv̂
ε+ = SRARv̂
ε+ +R−1S(Ad)
−1f̂ , {x > 0},
SR∂xv̂
−v̂ε− = SRARv̂
−Γĝ, {x < 0},
v̂ε+|x=0+ = v̂
ε−|x=0− .
For Ω an open regular subset of Rd+1, and ρ ∈ N, let us introduce the
weighted spaces H
γ (Ω) defined by:
H̺γ (Ω) = {̟ ∈ e
γtL2(Ω), ‖̟‖H̺γ (Ω) < ∞};
where
γ (Ω)
α,|α|≤̺
γρ−|α|‖e−γt∂α̟‖2
L2(Ω).
We will construct an approximate solution uεapp of u
ε. uεapp will be con-
structed as follow:
uεapp = u
app1x>0 + u
app1x<0,
where uε±app is an approximate solution of u
ε± satisfying the following ansatz:
uε±app =
U±j (ζ, x)ε
where the profiles U±j belong toH
), where Ω±
stands for [0, T ]×Rd±.
Denote
v̂εapp = RF(e
−γtuεapp) := v̂
app1x>0 + v̂
app1x<0.
v̂ε±app is then an approximate solution of v
ε± and is of the form:
v̂ε±app =
j (ζ, x)ε
where
= RF(e−γtU±
and conversely
U±j = e
γtF−1
R−1V ±j
The profiles U±j can be constructed inductively at any order. Let us show
how the first profiles are constructed: Identifying the terms in ε−1 gives:
−V −0 = P
−Γĝ.
Hence, P+V −0 remains to be computed in order to obtain the profile
0 = e
γtF−1
Identifying the terms in ε0 gives then that V +0 is solution of the well-posed
problem:
(3.1)
SR∂xV
0 = SRARV
−1S(Ad)
−1f̂ , {x > 0},
−V +0 |x=0 = P
−Γĝ.
The associated profile
U+0 = e
γtF−1
R−1V +0
belongs then toHkγ (Ω
).Moreover, the problem (3.1) is Kreiss-Symmetrizable
and thus the trace of the profile U+0 , see [3] for instance, satisfies:
U+0 |x=0 ∈ H
γ (ΥT ).
Since V +0 has just be computed, V
0 |x=0 is given by: V
0 |x=0 − V
0 |x=0 = 0
and thus, there holds:
−V +0 |x=0 = P
−V −0 |x=0.
Moreover
SR∂xV
0 − P
−V −1 = SRARV
0 , {x < 0}.
Projecting this equation on E+(SR) collinearly to E−(SR) gives then:
SR∂xP
0 − P
+SRARV
0 = 0, {x < 0},
Since
+SRARV
0 = P
+SRARP
+V −0 + P
+SRARP
−Γĝ,
we have then:
SR∂x(P
+V −0 )− P
+SRAR(P
+V −0 ) = P
+SRARP
−Γĝ, {x < 0},
and as a consequence, P+V −0 is solution of the following problem:
(3.2)
SR∂x(P
0 )− P
+SRAR(P
0 ) = P
+SRARP
−Γĝ {x < 0},
+V −0 |x=0 = P
+V +0 |x=0.
Let us precise how (3.2) has to be interpreted: we denote w = P+V −0 . w is
then totally polarized on E+(SR), and satisfies the problem:
(3.3)
+w = w
SR∂xw − P
+SRARw = P
+SRARP
−Γĝ {x < 0},
w|x=0 = P
+V +0 |x=0.
As we will see, the problem (3.3) is Kreiss-Symmetrizable and thus well-
posed. Indeed, for all ζ such that τ2 + γ2 + |η|2 = 1, we have, omitting the
dependencies in ζ in our notations:
• For all q ∈ CN , there holds:
〈SRq, q〉 = 〈q, SRq〉.
• Since Re(SRAR) is positive definite and P
+ is an orthogonal projector,
there is C > 0 such that, for all q ∈ E+(SR), there holds:
〈P+SRARP
+q, q〉+ 〈q,P+SRARP
+q〉 ≥ C〈q, q〉.
Indeed, for all q ∈ E+(SR), there holds:
〈P+SRARP
+q, q〉 = 〈P+SRARP
+q,P+q〉 = 〈SRARP
+q,P+q〉.
• −SR is definite negative on kerP
+ that is to say, that there is c > 0
such that, for all q ∈ kerP+, there holds:
〈−SRq, q〉 ≤ −c〈q, q〉.
Moreover kerP+ has the same dimension as the number of negative
eigenvalues in −SR.
The profile U−0 can then be computed by:
U−0 := e
γtF−1
R−1(w + P−Γĝ)
belongs to Hkγ (Ω
), moreover its trace U−0 |x=0 belongs
to Hkγ (ΥT ). Consider now the equation:
1 = SR∂xV
0 − SRARV
0 , {x < 0}.
Since P−V −1 |x=0 = P
−V +1 |x=0, V
1 is solution of the well-posed problem:
SR∂xV
1 = SRARV
1 , {x > 0},
−V +1 |x=0 = SR∂xV
0 |x=0 − SRARV
0 |x=0.
Due to the loss of regularity in the boundary condition, the associated profile
U+1 = e
γtF−1
R−1V +1
belongs to H
), moreover its trace U+1 |x=0 belongs
γ (ΥT ). Moreover, applying P
+ to the equation:
−V −2 + SRARV
1 = SR∂xV
1 , {x < 0},
we obtain:
SR∂xP
+V −1 = P
+SRARP
+V −1 + P
+SRARP
−V −1 , {x < 0},
+V −1 |x=0 = P
+V +1 |x=0.
As before, let us take P+V −1 as the unknown of the well-posed problem:
SR∂x(P
+V −1 )− P
+SRAR(P
+V −1 ) = P
+SRAR
SR∂xV
0 − SRARV
, {x < 0},
(P+V −1 )|x=0 = P
1 |x=0.
This problem is Kreiss-Symmetrizable since, for all ζ such that
τ2 + γ2 + |η|2 = 1, there holds:
• For all q ∈ CN , there holds:
〈SRq, q〉 = 〈q, SRq〉.
• There is C > 0 such that for all q ∈ E+(SR), there holds:
〈P+SRARP
+q, q〉+ 〈q,P+SRARP
+q〉 ≥ C〈q, q〉.
• −SR is definite negative on kerP
+ that is to say, that there is c > 0
such that, for all q ∈ kerP+, there holds:
〈−SRq, q〉 ≤ −c〈q, q〉.
Moreover kerP+ has the same dimension as the number of negative
eigenvalues in −SR.
However, due to a loss of regularity in both the source term and the boundary
condition, the associated profile
U−1 = e
γtF−1
+V −1 + SR∂xV
0 − SRARV
belongs to H
). The construction of the following profiles can be
pursued at any order the same way. In practice, we take:
uε+app = U
uε−app = U
0 + εU
As a result, the approximate solution writes uεapp := u
app1x>0 + u
app1x<0;
where uε+app belongs to H
) and uε−app belongs to H
). uεapp is then
solution of a well-posed problem of the form:
(3.4)
Huεapp +
Muεapp1x<0 = f1x>0 +
θ1x<0 + εr
ε, {x ∈ R},
uεapp|t<0 = 0 .
Where rε := rε+1x>0 + r
ε−1x<0, with r
ε+ ∈ H
) and
rε− ∈ Hk−3γ (Ω
Remark 3.1. In the case where g = 0, the loss of regularity in the profiles
is delayed by one step. As a result, in this case we obtain:
uε+app ∈ H
uε−app ∈ H
rε+ ∈ Hkγ (Ω
rε− ∈ H
3.2 Stability estimates
We will begin by proving energy estimates on the following equation:
(3.5) SRARê
ε − SR∂xê
−êε1x<0 = εr̂
ε, {x ∈ R},
where êε = R
F(e−γtuε)−F(e−γtuεapp)
:= ŵε; with wε = uε − uεapp.
Refering to (3.4), wε is the solution of the Cauchy problem:
(3.6)
Hwε +
Mwε1x<0 = εr
wε|t<0 = 0 .
For a fixed positive ε, the perturbation is nonsingular and thus the principal
part of the pseudodifferential operator H+ 1
M is the same as the principal
part of H. Hence, there is a unique solution of the Cauchy problem (3.6):
wε which belongs to Hk−3γ (ΩT ). In order to simplify the notations, in this
chapter we shall denote by L2 and H
γ the spaces: L
2(ΩT ) and H
γ (ΩT ).
We recall the definition of the weighted spaces: H
γ (ΩT ) for ρ ∈ N.
H̺γ (ΩT ) = {̟ ∈ e
γtL2(ΩT ), ‖̟‖H̺γ (ΩT ) < ∞};
where
γ (ΩT )
α,|α|≤̺
γρ−|α|‖e−γt∂α̟‖2
L2(ΩT )
For fixed positive ε, there holds:
∂x〈SRê
ε, êε〉L2 dx = 0.
2Re〈SR∂xê
ε, êε〉L2 dx = 0.
Using the equation, we have then:
Re〈SRARê
−êε − εr̂ε, êε〉L2 dx = 0.
which is equivalent to:
Re〈SRARê
ε, êε〉L2 dx =
Re〈P−êεεr̂ε, êε〉L2 dx
Re〈r̂ε, êε〉L2) dx.
But Re〈SRARê
ε, êε〉 = 〈Re (SRAR) ê
ε, êε〉 and Re (SRAR) is positive defi-
nite, hence there is C > 0, independent of ε such that:
Cγ‖êε‖2
Re〈P−êε, êε〉 ≤
Re〈εr̂ε, êε〉L2 dx.
Thus, because P− is an orthogonal projector, for all positive λ, there holds:
Cγ‖êε‖2
‖P−êε‖2
‖êε‖2
‖εr̂ε‖2
Choosing λ big enough we have C− ε
> 0 and the following energy estimate:
γ‖êε‖2
‖P−êε‖2
‖r̂ε‖2
This shows that êε converges towards zero in L2 when ε tends to zero, with
a rate in O(ε). We recall that êε is given by:
êε := RF
e−γt(uεapp − u
and r̂ε is given by:
r̂ε := RF
e−γtrε
Moreover, since R and P− are two uniformly bounded, uniformly definite
positive matrices, there are two positive real numbers α and β such that,
for all ζ 6= 0 and x ∈ R, there holds:
• α‖F
e−γt(uεapp − u
≤ ‖êε‖2
• α‖P−F
e−γt(uεapp − u
≤ ‖P−êε‖2
• ‖r̂ε‖2
≤ β‖F
e−γtrε
Applying then Plancherel’s equality we obtain then:
γ‖uεapp − u
eγtL2
uεapp − u
eγtL2
‖rε‖2
eγtL2
We have thus proved there are two positive constants c and C such that:
cγ‖uεapp − u
eγtL2
uεapp − u
eγtL2
‖rε‖2
eγtL2
Let us denote by ‖.‖∗
+ ‖.‖2
. More generally, when
rε ∈ H̺, there is two positive constants cρ and Cρ such that:
cργ‖u
app − u
uεapp − u
‖rε‖∗2
As we have seen during the construction of the profiles, ̺ = k− 3 in general
and ρ = k − 3
in the case where g = 0.
3.3 End of the proof of Theorem 1.6.
As a consequence of our stability estimate, there holds:
‖uεapp − u
Hk−3(Ω
+ ‖uεapp − u
Hk−3(Ω
= O(ε2).
Moreover, by construction of uεapp, there holds:
‖uεapp − u
Hk−3(Ω
+ ‖uεapp − u‖
Hk−3(Ω
= O(ε2).
Hence, we have proved that:
‖uε − u−‖2
Hk−3(Ω
+ ‖uε − u‖2
Hk−3(Ω
= O(ε2).
By the same arguments, if g = 0, there holds:
‖uε − u−‖2
+ ‖uε − u‖2
= O(ε2).
This completes the proof of Theorem 1.6.
4 Proof of Theorem 1.9.
Like in the proof of Theorem 1.6, we begin by constructing formally an
approximate solution of equation (1.7). We prove then suitable energy esti-
mates that ensures both uε and its approximate solution converges towards
ũ as ε → 0+.
4.1 Construction of an approximate solution.
The goal of this Lemma is to replace the boundary condition Γu|x=0 = Γg of
problem (1.1) by a condition of the form P−(e−γtu)|x=0 = h with a suitable
h ∈ Hk(ΥT ).
Lemma 4.1. Let u denote the unique solution in Hk(Ω+
) of the mixed
hyperbolic problem (1.1), P+(∂t, ∂y, γ)
e−γtu
does not depend of the choice
of the boundary operator Γ and of g. Let us introduce the function h of
Hk(ΥT ) defined by:
e−γtv|x=0
e−γt(g − v|x=0)
The solution u of the mixed hyperbolic problem (1.1) is the unique solution
of the following well-posed mixed hyperbolic problem (4.1):
(4.1)
Hu = f, {x > 0},
−(∂t, ∂y, γ)
e−γtu|x=0
u|t<0 = 0 .
In addition, the mapping (f, g) → h is linear continuous from
Hk(Ω+
)×Hk(ΥT ) to H
k(ΥT ).
Proof. Let v denote a solution in Hk(ΩT ) of the equation:
Hv = f, (t, y, x) ∈ ΩT ,
v|t<0 = 0 .
We introduce then U which is, by definition, the solution of the following
mixed hyperbolic problem:
HU = 0, {x > 0},
Γ(∂t, ∂y, γ)U|x=0 = Γ(∂t, ∂y, γ)g − Γ(∂t, ∂y, γ)v|x=0,
U|t<0 = 0 .
The right hand side of the boundary condition is, a priori,
in Hk−
2 (ΥT ). Hence the solution U belongs to H
2 (Ω+
). By construction
we have:
(4.2) u = U+ v.
Hence, since u ∈ Hk(Ω+
) and v ∈ Hk(Ω+
), in fact we have:
U ∈ Hk(Ω+
Let Û denote the Fourier-Laplace transform in (t, y) of U (Fourier-Laplace
transform tangential to the boundary) given by: F(e−γtU). It satisfies the
following symbolic equation:
∂xÛ = A(ζ)Û, {x > 0},
Γ(ζ)Û|x=0 = Γ(ζ)ĝ − Γ(ζ)v̂|x=0,
where ĝ and v̂ denotes respectively the tangential Fourier-Laplace transform
of g and v. Since A(ζ) is independent of x, projecting the above equation on
E+(A(ζ)) gives then:
Û = A(ζ)P+Û.
Moreover P+Û|x=0 ∈ E−(A(ζ))
E+(A(ζ)) since limx→∞P
Û = 0. Hence,
there holds:
Û = 0,
and thus
Û = P−Û.
The boundary condition:
Γ(ζ)Û|x=0 = Γ(ζ)ĝ − Γ(ζ)v̂|x=0
is equivalent to:
Û|x=0 ∈ ĝ − v̂|x=0 +KerΓ.
We have thus:
Û|x=0 ∈ ĝ − v̂|x=0 + ker Γ.
Let us denote by Π the projector on Ẽ−(A) parallel to ker Γ, which has a
sense because the Uniform Lopatinski Condition holds.
Since Û|x=0 ∈ Ẽ−(A), and of the Uniform Lopatinski Condition, the above
boundary condition is equivalent to:
Û|x=0 = Π(ĝ − v̂|x=0),
and thus, because P−Û|x=0 belongs to E−(A), we have:
Û|x=0 = Π(ĝ − v̂|x=0).
As a consequence, we obtain:
−û|x=0 = P
−v̂|x=0 +Π(ĝ − v̂|x=0).
Hence, there holds:
e−γtu|x=0
e−γtv|x=0
e−γt(g − v|x=0)
:= h.
P+(∂t, ∂y, γ)
e−γtu
= P+(∂t, ∂y, γ)
e−γtv
, thus it does not depend of the
choice of the boundary operator Γ and of g.Moreover, since u|x=0 ∈ H
k(ΥT ),
it follows that
g ∈ Hk(ΥT ). Now, since the Uniform Lopatinski Condition holds, u satisfies
the following energy estimate:
eγtL2(Ω
+ ‖u|x=0‖
eγtL2(ΥT )
≤ γ‖f‖2
eγtL2(Ω
+ ‖g‖eγtL2(ΥT ),
More generally, we have:
Hkγ (Ω
+ ‖u|x=0‖
Hkγ (ΥT )
≤ γ‖f‖2
Hkγ (Ω
+ ‖g‖Hkγ (ΥT ).
where ‖̟‖2
|α|=0 γ
k−|α|‖∂α̟‖2
eγtL2
h = P−(e−γtu|x=0) hence
L2(ΥT )
≤ C‖e−γtu|x=0‖
L2(ΥT )
= C‖u|x=0‖
eγtL2(ΥT )
and for 0 ≤ j ≤ d− 1, there holds:
‖∂jh‖
L2(ΥT )
≤ cj‖ηjP
−F(e−γtu)|x=0‖ ≤ c
j‖u|x=0‖H1γ(ΥT ).
More generally, we have:
Hkγ (ΥT )
≤ Ckγ‖f‖
Hkγ (Ω
+ Ck‖g‖
Hkγ (ΥT )
But γ is a positive real number fixed once and for all at the beginning of the
paper, hence this proves that the mapping (f, g) → h is continuous from
Hk(Ω+
)×Hk(ΥT ) to H
k(ΥT ). ✷
As we will see, Lemma 4.1 is central in our construction of an approximate
solution. We will construct an approximate solution
uεapp := u
app1x>0 + u
app1x<0,
along the following ansatz:
uε+app :=
εju+j (t, y, x),
with u+j ∈ H
), u+j |x=0 ∈ H
γ (ΥT ); and
uε−app :=
j (t, y, x),
with u−j ∈ H
), u−j |x=0 ∈ H
γ (ΥT ). As usual, we will refer to the
terms u±
as profiles. We will rather work on the reformulation of
problem (1.7) as the transmission problem (4.3):
(4.3)


Huε+ = f, {x > 0},
Huε− +
−e−γtuε− =
γth̃, {x < 0},
uε+|x=0 − u
ε−|x=0 = 0,
uε±|t<0 = 0 .
Plugging uε+app and u
app in (4.3) and identifying the terms with same power
in ε, we obtain the following profiles equations:
• Identification of the terms of order ε−1 :
(4.4) Ade
−e−γtu
0 = Ade
γth̃, {x < 0}.
• Identification of the terms of order ε0 :
(4.5) Hu−0 +Ade
−e−γtu−1 = 0, {x < 0}.
(4.6) Hu+0 = f, {x > 0},
• Identification of the terms of order εj for j ≥ 1 :
(4.7) Hu
j +Ade
−e−γtu
j+1 = 0, {x < 0}.
(4.8) Hu
j = 0, {x > 0},
• Translation of the continuity condition over the boundary on the pro-
files:
For all 1 ≤ j ≤ M, there holds:
(4.9) u
j |x=0 − u
j |x=0 = 0.
Denote by û±j := F(e
−γtu±j ) . We have then:
j := e
γtF−1(û±j ).
We will now give the equations satisfied by the Fourier-Laplace transform
of the profiles: û±j . First, equation (4.4) is equivalent to:
−û−0 = F(h̃), {x < 0}.
We deduce from this equation that there holds:
−û−0 |x=0 = ĥ.
Then, using (4.9) for j = 0, and (4.6) gives that, for γ big enough,
0 = F(e
−γtû
where û+0 is the solution of the well-posed first order ODE in x:
0 −Aû
0 = F(e
−γt(Ad)
−1f), {x > 0},
−û+0 |x=0 = h,
Thus u+0 is solution of:
Hu+0 = f, {x > 0},
eγtP−e−γtu+0 |x=0 = h.
Thanks to Lemma 4.1, we recognize u+0 as the solution of our starting mixed
hyperbolic problem (1.1). Once u+0 is known, so is û
0 and thus û
0 |x=0 is
given by:
û−0 |x=0 = û
0 |x=0.
Moreover,
u+0 |x=0 = u
0 |x=0 ∈ H
γ (ΥT ).
By (4.5), there holds:
0 −Aû
−û−1 = 0, {x < 0}.
As a consequence, P+û−0 is given by the well-posed ODE:
0 )−A(P
0 ) = 0, {x < 0},
+û−0 |x=0 = P
+û+0 |x=0.
Indeed, since kerP+(ζ) = E−(A(ζ)), this problem satisfies the Uniform
Lopatinski Condition: for all ζ 6= 0, there holds:
E−(A(ζ))
E+(A(ζ)) = C
For γ big enough, by linearity of the inverse Fourier transform, u−0 can then
be computed by:
u−0 := e
γtF−1(P−û−0 ) + e
γtF−1(P+û−0 ).
Following up with that process of construction, we can go on with the con-
struction of the profiles at any order. Indeed, assume that all the profiles
(u+j , u
j ) up to order j have been computed. Then consider the equation
obtained through identification:
j+1 = −∂xû
j +Aû
j , {x < 0}.
We see there is a loss of regularity between û−j+1 and û
Let us say that u±j ∈ H
). Considering the traces, we have then:
j |x=0 ∈ H
γ (ΥT ). We will show in this part how the Sobolev regularity
of the profiles u±j+1, which is by definition mj+1, can be computed know-
ing mj . To begin with P
j+1 belongs to H
). P−u+j+1|x=0, which
belongs to H
γ (ΥT ), is known by P
j+1|x=0 = e
γtF−1(P−û+j+1|x=0),
with:
j+1|x=0 = P
j+1|x=0.
Hence, û+j+1 := F(e
−γtu+j+1) is the solution of the first order ODE in x :
j+1 −Aû
j+1 = 0, {x > 0},
−û+j+1|x=0 = P
−û−j+1|x=0.
Since kerP−(ζ) = E+(A(ζ)), this problem satisfies the Uniform Lopatinski
Condition: for all ζ 6= 0, there holds:
E−(A(ζ))
E+(A(ζ)) = C
As a consequence, this problem is well-posed and, u+j+1 ∈ H
).More-
over, there holds:
u+j+1|x=0 = u
j+1|x=0 ∈ H
γ (ΥT ).
Indeed, P+û+j+1 ∈ H
∞(Rd+1+ ) hence P
+u+j+1|x=0 ∈ H
γ (ΥT ) and thus
u+j+1|x=0 ∈ H
γ (ΥT ). Furthermore, we have:
u−j+1|x=0 = u
j+1|x=0.
Applying P+ on the following equation:
−û−j+2 = −∂xû
j+1 +Aû
j+1, {x < 0};
we obtain then the equation:
j+1)−AP
j+1 = 0, {x < 0}.
Remark 4.2.
−û−j+2 = −∂xû
j+1 +Aû
j+1, {x < 0}.
shows that the ”Fourier profile” û−j+1 must be so that −∂xû
j+1 + Aû
j+1 is
polarized on E−(A). It is indeed the case because we search for û
j+1 satisfy-
+û−j+1)−AP
+û−j+1 = 0, {x < 0}.
j+1 is given by:
j+1 := e
γtF−1(P−û−j+1) + e
γtF−1(P+û−j+1).
with P+u−j+1 = e
γtF−1(P+û−j+1) belongs to H
) and is the unique
solution of the well-posed first order ODE:
+û−j+1)−A(P
+û−j+1) = 0, {x < 0},
+û−j+1|x=0 = P
+û+j+1|x=0.
The profile u−j+1 belongs to H
). This achieves to show that the
knowledge of (u+j , u
j ), allows us to compute (u
j+1, u
j+1).
Moreover mj+1 = mj −
, that is to say that a construction of each supple-
mentary profile consummate 3
of Sobolev regularity. In practice, we take:
uε+app = u
uε−app = u
0 + εu
As a result, the approximate solution writes uεapp := u
app1x>0 + u
app1x<0;
where uε+app belongs to H
) and uε−app belongs to H
). The so de-
fined uεapp is solution of a well-posed problem of the form:
(4.10)
Huεapp +
−e−γtuεapp1x<0 = f1x>0 +
γth̃1x<0 + εr
uεapp|t<0 = 0 .
Where rε := rε+1x>0 + r
ε−1x<0, with r
ε+ ∈ H
) and
rε− ∈ Hk−3γ (Ω
4.2 Asymptotic Stability of the problem as ε tends towards
zero.
Denote by vε = uεapp − u
ε. By construction of uεapp, v
ε is solution of the
following Cauchy problem:
(4.11)
Hvε +
−e−γtvε1x<0 = εr
vε|t<0 = 0 .
For all positive ε, this problem is well-posed. In order to investigate the
stability of this problem as ε goes to zero, we will reformulate it as a trans-
mission problem. The restrictions of vε to {x > 0} and {x < 0}, respectively
denoted by vε+ and vε− are solution the following transmission problem:
(4.12)


Hvε+ = εrε+, {x > 0},
Hvε− +
−e−γtvε− = εrε−, {x < 0},
vε+|x=0 − v
ε−|x=0 = 0,
vε±|t<0 = 0 .
Let us denote by V ε the function, valued in R2N , defined for all {x > 0}
and (t, y) ∈ [0, T ]× Rd−1 by:
V ε(t, y, x) =
V ε+(t, y, x)
V ε−(t, y,−x)
vε is solution of the Cauchy problem (4.11) iff V ε is solution of the mixed
hyperbolic problem on a half space (4.13) given below:
(4.13)
H̃V ε +BεV ε = εRε, {x > 0},
Γ̃V ε|x=0 = 0,
V ε|t<0 = 0 ,
where
H̃ = ∂t +
0 −Ad
γtP−e−γt
Rε(t, y, x) =
rε+(t, y, x)
rε−(t, y,−x)
Id −Id
Returning to the construction of our approximate solution, we have
Rε ∈ H
)×Hk−3γ (Ω
) and is such that Rε|t<0 = 0.
In fact Rε ∈ Hk
) with k′ = k−3. For all positive ε, there exists a unique
solution V ε in Hkγ (Ω
) to the above problem. We will prove here that this
solution converges, uniformly in ε, towards 0 in Hk
), as ε vanishes. As
in the proof of Kreiss Theorem, see [3] for instance, existence of solution
for mixed hyperbolic systems like (1.7) or (4.13), are obtained through the
proof of both direct and ”dual”a priori estimates on an adjoint problem. This
estimates results in the constant coefficient case of estimates on the Fourier-
Laplace transform of the solution. Additionally, if this ”Fourier” estimate
can be proved, both direct and ”dual” energy estimates are deduced from
it. In a first step, let us recall formally how to conduct the Fourier-Laplace
transform of a mixed hyperbolic problem:
Hu = f, {x > 0},
Γu|x=0 = g,
u|t<0 = 0 ,
Denote by u∗ := e
−γtu, u∗ is in particular a solution of the following problem:
Hu∗ + γu∗ = e
−γtf, {x > 0},
Γu∗|x=0 = g .
We take then the tangential (with respect to (t,y)) Fourier transform of this
equation, which gives:
Ad∂xû∗ + (γ + iτ)û∗ + iηj
Aj û∗ = F
e−γtf
, {x > 0},
Γû∗|x=0 = ĝ .
Multiplying this equation by A−1
, we obtain that u∗ is solution of the fol-
lowing ODE in x:
∂xû∗ −Aû∗ = (Ad)
e−γtf
, {x > 0},
Γû∗|x=0 = ĝ .
Note that, û∗ and u can be freely deduced from each other through the
formulas:
û∗ = F(e
−γtu)
u = eγtF−1(û∗).
We shall now introduce a rescaled solution V ε of the solution V ε of
(4.13) defined as follows: V ε(t, y, x) := V ε(t, y, εx), and the rescaled remain-
der: Rε(t, y, x) := Rε(t, y, εx). Denoting by V̂
= F(e−γtV ), the associated
equation writes then:
− εÃV̂
= ε2R̂ε, {x > 0},
|x=0 = 0 .
where
M(ζ) =
0 P−(ζ)
We remark that
εÃ(ζ) = Ã(εζ) = Ã(ζ̂),
with ζ̂ = (τ̂ , γ̂, η̂) := εζ. Moreover P− is homogeneous of order zero in ζ.
Let us denote R̃ε(ζ̂ , x) := R̂ε(ζ, x) and Ṽ
(ζ̂ , x) := V̂
(ζ, x). Hence Ṽ
solution of the following problem:
−Ã(ζ̂) +M(ζ̂)
= ε2R̃ε(ζ̂ , x), {x > 0},
|x=0 = 0 .
As a consequence, the Uniform Lopatinski Condition for problem (4.13)
writes: For all γ̂ > 0,
|det(E−(Ã(ζ̂)−M(ζ̂), ker Γ)| ≥ C > 0.
In view of the proof of the Proposition (4.3), we recall that the spaces E±(A)
have to be considered in the extended sense defined above.
Proposition 4.3. Since H satisfies the hyperbolicity Assumption in As-
sumption 1.1, the Uniform Lopatinski Condition is satisfied for our present
problem; that is to say that, for all ζ̂ such that γ̂ > 0 there holds:
|det(E−(Ã(ζ̂)−M(ζ̂), ker Γ)| ≥ C > 0.
Proof. We will begin to show that the Uniform Lopatinski Condition writes
as well that for all ζ̂ 6= 0 there holds:
(4.14) E+(A(ζ̂)−P
−(ζ̂))
E−(A(ζ̂)) = {0} .
This notation keeps a sense for ζ̂ such that γ̂ = 0 because we will prove
a posteriori that the involved linear subspaces continuously extends from
{ζ̂ , γ̂ > 0} to {ζ̂ , γ̂ = 0}. Then we will prove that, for all ζ̂, the property
4.14 holds true. The Uniform Lopatinski Condition writes actually, for all
ζ̂ 6= 0 :
E−(Ã(ζ̂)−M(ζ̂))
ker Γ̃ = {0}.
and thus, since we have:
E−(Ã(ζ̂)−M(ζ̂)) = E−(A(ζ̂))× E+(A(ζ̂)−P
−(ζ̂)),
and by definition of Γ̃, the Uniform Lopatinski Condition writes then that,
for all ζ̂ 6= 0, there holds:
E+(A(ζ̂)−P
−(ζ̂))
E−(A(ζ̂)) = {0}.
Lemma 4.4.
A(ζ̂)−P−(ζ̂)
A(ζ̂)
A(ζ̂)−P−(ζ̂)
A(ζ̂)
Proof. For all ζ̂ 6= 0, there is an invertible N × N matrix with complex
coefficients P (ζ̂) such that: P−1AP is trigonal and the diagonal coefficients
are sorted by increasing order of their real parts. Let us denote by ν the
dimension of E− (A) . The above matrix P traduces the change of basis from
the canonical basis of CN into (v1, . . . , vν , vν+1, . . . , vN ), where
Span ((vk)1≤k≤ν) = E− (A) ,
Span ((vk)ν+1≤k≤N ) = E+ (A) .
Moreover, there holds
P−1P−P = D
where D is the diagonal matrix whose ν first diagonal terms are equal to 1
and the N − ν last diagonal terms are null.
P−1(A−P−)P = P−1AP −D.
P−1AP−D is also trigonal, with the same eigenvalues with positive real part
as P−1AP and the same eigenvalues with negative real part as P−1AP −Id.
As a consequence, for all ζ̂ 6= 0, there holds:
A(ζ̂)−P−(ζ̂)
A(ζ̂)
A(ζ̂)−P−(ζ̂)
A(ζ̂)
As a consequence of Lemma 4.4, the rescaled Uniform Lopatinski Con-
dition for ε > 0, ε → 0 happens to be exactly the same as the one written
for bigger positive ε. Indeed, it writes, for all ζ̂ 6= 0 :
E+(A(ζ̂))
E−(A(ζ̂)) = {0}.
✷ The Lopatinski condition is satisfied, and, as a result, the following,
uniform in ε, energy estimate holds for V ε, for all γ ≥ γk′ > 0 :
γ‖V ε‖2
+ ‖V ε|2x=0‖Hk′γ (ΥT )
‖εRε‖2
which is equivalent to:
(4.15) γ‖V ε‖2
+ ‖V ε|x=0‖
γ (ΥT )
‖εRε‖2
This proves the convergence of V ε towards zero in Hk
). The weight
γ is fixed beforehand thus, in fact, the solution of (4.13) tends to zero in
) at a rate at least in O(ε).
5 End of proof of Theorem 1.9.
Let us consider V ε defined by:
V ε(t, y, x) :=
uε+app(t, y, x)− u
ε+(t, y, x)
uε−app(t, y,−x)− u
ε−(t, y,−x)
This notation is perfectly fine because the so-defined function is solution of
an equation of the form (4.13). Moreover, thanks to the stability estimate
(4.15), there is γk positive such that, for all γ > γk, we have:
γ‖uεapp−u
+γ‖uεapp−u
+‖uεapp−u
γ (ΥT )
‖εRε+‖2
Hence, it follows that:
‖uεapp − u
Hk−3(Ω
+ ‖uεapp − u
Hk−3(Ω
= O(ε2).
Moreover, by construction of uεapp, we have:
‖uεapp − u‖
Hk−3(Ω
+ ‖uεapp − u
Hk−3(Ω
= O(ε2).
As a result, we obtain that there holds:
‖uε − u‖2
Hk−3(Ω
+ ‖uε − u−‖2
Hk−3(Ω
= O(ε2).
This concludes the proof of Theorem 1.9.
6 Appendix: answer to a question asked in [11].
In this chapter, we will show that the loss of convergence observed numeri-
cally in [11] in a neighborhood of the boundary is due to a boundary layer
phenomenon. We consider the 1-D wave equation:
(6.1)
∂ttU − c
2∂xxU = 0, (x, t) ∈]0, π[×R
U |x=0 = U |x=π = 0,
U |t=0(x) = sin(x),
∂tU |t=0 = 0.
As in [11], we define then U ε = U ε+1x>0 + U
ε−1x<0 by:
(6.2)


ε+ − c2∂xxU
ε+ = 0, (x, t) ∈]0, π[×R+,
ε− − c2∂xxU
U ε− = 0, (x, t) ∈]−∞, 0[×R+,
U ε+|x=0 − U
ε−|x=0 = 0
ε+|x=0 − ∂xU
ε−|x=0 = 0
U ε+|x=π = 0.
U ε±|t=0(x) = sin(x), {±x > 0}.
ε±|t=0 = 0, {±x > 0}.
We will now construct formally an approximate solution U ε±app of U
ε± satis-
fying the following ansatz:
U ε+app =
U+j (t, x)ε
U ε−app =
t, x,
where the profiles U−j (t, x, z) := U
j (t, x) + U
j (t, z), with
e−αzU∗−j = 0,
for some α > 0. Since the stability estimates are trivial here, we will only
focus on the construction of
U εapp := U
app1x>0 + U
app1x<0.
Plugging U ε±app into problem (6.2) and identifying the terms with same power
of ε, we obtain the following equations:
U−0 = 0,
Moreover, U∗−0 = 0 as it is the only solution of the problem:
U∗−0 − c
2∂zzU
0 = 0, {z < 0},
0 |z=0 = 0,
0 = 0.
U ε+app converges towards U
0 as ε → 0
+. As awaited U+0 is the solution of the
well-posed 1-D wave equation:


0 − c
2∂xxU
0 = 0, (x, t) ∈]0, π[×R
U+0 |x=0 = U
0 |x=0 + U
0 |z=0 = 0.
U+0 |x=π = 0.
0 |t=0(x) = sin(x), {x > 0}.
0 |t=0 = 0, {x > 0}.
Let us write the following profiles equations: First, we can see that, for all
j ≥ 1, there holds:
where U∗−1 is the solution of the well-posed profile equation:
1 − c
2∂zzU
1 = −∂ttU
0 = 0, {z < 0},
1 |z=0 = ∂xU
0 |x=0,
U∗−1 = 0.
Hence U∗−1 is given by:
U∗−1 = c∂xU
0 |x=0e
We will show now that the profiles can be computed as any order. Assume
that U∗−j has been computed, U
j is solution of the well-posed 1-D wave
equation: 


− c2∂xxU
= 0, (x, t) ∈]0, π[×R+,
j |x=0 = U
j |z=0.
U+j |x=π = 0.
U+j |t=0(x) = 0, {x > 0}.
0 |t=0 = 0, {x > 0}.
U∗−j+1 is then solution of the well-posed profile equation:
U∗−j+1 − c
2∂zzU
j+1 = −∂ttU
j , {z < 0},
j+1|z=0 = ∂xU
|x=0,
U∗−j+1 = 0.
Let us answer the question asked in [11]: U ε− is bound to present boundary
layer behavior in {x = 0−}, indeed its approximate solution is composed
exclusively of boundary layer profiles, which describes quick transitions at
the boundary using a fast scale in ε. As a result of the loss in convergence
induced by the boundary layer, the following estimate holds:
‖U ε − U‖L2(]−∞,π[×R+) = O(ε
In [11], their small parameter is µ = ε2, as a result, adopting the same
notations as them, our estimate writes:
‖Uµ − U‖L2(]−∞,π[×R+) = O(µ
which is in agreement with the estimates given in [11]. Like in the penaliza-
tion approach proposed by Bardos and Rauch [2] and underlined by Droniou
in [4], the boundary layer only forms on one side of the boundary. The ap-
proximation U ε+ of U, is computed by taking U ε+|x=0 = U
ε−|x=0, thus, in
numerical applications, the boundary layer phenomenon also affects the rate
of convergence of U ε+ towards U,
as ε → 0+.
References
[1] Ph. Angot, Ch.H. Bruneau, P. Fabrie, A penalization method to take
into account obstacles in viscous flows. Numerische Mathematik 1999;
81:497-520.
[2] C. Bardos, J. Rauch, Maximal positive boundary value problems as limits
of singular perturbation problems, Trans. Amer. Math. Soc.,270 (1982),
pp 377-408.
[3] J. Chazarain, A. Piriou, Introduction to the theory of linear partial dif-
ferential equations. translated from the french , Studies in Mathematics
and its Applications, 14 , North Holland Publishing Co., Amsterdam-
New York,1982.
[4] J. Droniou, Perturbation Singulière par Pénalisation d’un Système Hy-
perbolique, Rapport de stage (1997).
[5] B. Fornet, O.Guès, Penalization approach of semi-linear symmetric hy-
perbolic problems with dissipative boundary conditions, preprint (2007).
[6] O. Guès, G. Métivier, M. Williams, K.Zumbrun Uniform stability esti-
mates for constant-coefficient symmetric hyperbolic boundary value prob-
lems. preprint (2005).
[7] H.O. Kreiss, Initial boundary value problems for hyperbolic systems,
Comm. Pure Appl. math 13 (1970), 277-298.
[8] G. Métivier, Small Viscosity and Boundary Layer Methods : Theory,
Stability Analysis, and Applications, Birkhauser (2003).
[9] G. Métivier, K. Zumbrun Viscous Boundary Layers for Noncharacteris-
tic Nonlinear Hyperbolic Problems, Preprint.
[10] G. Métivier, K. Zumbrun Symmetrizers and Continuity of Stable Sub-
spaces for Parabolic-Hyperbolic Boundary Value Problems, Preprint.
[11] A. Paccou, G. Chiavassa, J. Liandrat and K. Schneider A penalization
method applied to the wave equation. C. R. Acad. Sci. Paris Serie II,
(2003)
[12] J. Rauch, Boundary value problems as limits of problems in all space.,
In Séminaire Goulaouic-Schwartz (1978/1979), pages Exp. No. 3, 17.
École Polytech., Palaiseau, 1979.
	Introduction.
	A Kreiss Symmetrizer Approach.
	A second Approach.
	Underlying approach leading to the proof ofTheorem ??.
	Some preliminaries.
	Detailed proof of Lemma ??: Construction of the matrices B solving Lemma ??.
	A change of dependent variables.
	Proof of Theorem ??.
	Construction of the approximate solution.
	Stability estimates
	End of the proof of Theorem ??.
	Proof of Theorem ??.
	Construction of an approximate solution.
	Asymptotic Stability of the problem as  tends towards zero.
	End of proof of Theorem ??.
	Appendix: answer to a question asked in paccou.
ABSTRACT
  In this paper, we describe a new, systematic and explicit way of
approximating solutions of mixed hyperbolic systems with constant coefficients
satisfying a Uniform Lopatinski Condition via different Penalization
approaches.

<|endoftext|><|startoftext|>
RN.eps
A unified analysis of the reactor neutrino program
towards the measurement of the θ13 mixing angle
G. Mention
(a), Th. Lasserre (a,b), D. Motta (a)
(a)DAPNIA/SPP, CEA Saclay, 91191 Gif sur Yvette, France
(b) Laboratoire Astroparticule et Cosmologie (APC), Paris, France
November 9, 2018
Abstract
We present in this article a detailed quantitative discussion of the measurement of the leptonic
mixing angle θ13 through currently scheduled reactor neutrino oscillation experiments. We thus focus
on Double Chooz (Phase I & II), Daya Bay (Phase I & II) and RENO experiments. We perform a
unified analysis, including systematics, backgrounds and accurate experimental setup in each case. Each
identified systematic error and background impact has been assessed on experimental setups following
published data when available and extrapolating from Double Chooz acquired knowledge otherwise.
After reviewing the experiments, we present a new analysis of their sensitivities to sin2(2θ13) and study
the impact of the different systematics based on the pulls approach. Through this generic statistical
analysis we discuss the advantages and drawbacks of each experimental setup.
1 Experimental context
Over the last years the phenomenon of neutrino flavor conversion induced by nonzero neutrino masses
has been demonstrated by experiments with solar [1, 2, 3], atmospheric [4], reactor [5, 6] and accelerator
neutrinos [7, 8]. Neutrino oscillation, that can be described by the Pontecorvo–Maki–Nakagawa–Sakata
(PMNS) mixing matrix [11], is the current best mechanism to explain the data. Considering only the three
known families, the neutrino mixing matrix is parameterized by the three mixing angles (θ12, θ23, θ13) and a
possible δ CP violation phase. The angle θ12 has been measured to be large (sin
2(2θ12) ≃ 0.8), the angle θ23
has been measured to be close to maximum (sin2(2θ23) & 0.9); but the last angle θ13 has only been upper
bounded sin2(2θ13) . 0.15 at 90 % C.L., by the CHOOZ reactor experiment [9, 12]. These achievements
have now shifted the field of neutrino oscillation physics into a new era of precision measurements. Next
generation experiments are underway all around the world to further pin down the values of the oscillation
parameters of both solar and atmosheric driven oscillations. Currently the most important task, for the
experimentalists, is the determination of the last oscillation through the measurement of the unknown
mixing angle θ13. An improved sensitivity on θ13 is not only important for the understanding of neutrino
oscillations, but also to open up the possibility of observing CP-violation in the lepton sector if the θ13 driven
oscillation is discovered by the forthcoming experiments.
In order to improve the CHOOZ constraint on θ13 at least two identical unsegmented liquid scintillator
neutrino detectors close to a nuclear power plant (NPP) are required. The near detector(s) located a few
hundred meters away from the reactor cores monitor the unoscillated νe flux. The far detector(s) is(are)
located at a distance between 1 and 2 km, searching for a departure from the 1/L2 behavior induced by
oscillations. Experimental errors are being partially cancelled when using identical detectors. The goal is
to achieve an overall effective systematic error of less than 1 % [19, 34].
Three experiments have received partial or full approval to perform such a measurement in a near future:
Double Chooz in France [24], Daya Bay in China [25] and RENO in Korea [26]. In addition, a project is
http://arxiv.org/abs/0704.0498v2
under study at the Angra power plant in Brazil to further improve the sensitivity of the measurement on a
longer time scale [29]. Furthermore the Japanese KASKA collaboration is promoting the reactor neutrino
oscillation physics [28].
2 Neutrino oscillation at reactor and θ13
Fission reactors are prodigious sources of electron antineutrinos which have a continuous energy spectrum
up to about 10 MeV. For Eνe > Ethr = 1.806 MeV they can be detected though the νe+p → e++n reaction
using the delayed coincidence technique, where an electron antineutrino interacts with a free proton in a
tank containing a target volume filled with Gd loaded liquid scintillator. The positron and the resultant
annihilation gamma-rays are detected as a prompt signal while the neutron slows down and then thermalizes
in the liquid scintillator before being captured by a hydrogen or gadolinium nucleus. The excited nucleus
then emits gamma rays which are detected as the delayed signal. Electron antineutrinos energy is derived
from the measured visible energy from positron energy loss and annihilation,
Evis = Ee+ +me ≃ Eν − Ethrν + 2me . (1)
The νe spectrum is calculated from measurements of the beta decay spectra of
235U, 239Pu and 241Pu [15]
after fissioning by thermal neutrons and theoretical 238U spectrum, since no data are available for 238U 1.
As a nuclear reactor operates, the fission element proportions evolve in time (the so-called burn-up). Since
we are interested here on long term interpretation of the data for oscillation search, we will use an average
fuel composition for a reactor cycle corresponding to
235U (55.6 %), 239Pu (32.6 %), 238U (7.1 %) and 241Pu (4.7 %) . (2)
The mean energy release per fission, 〈Ef 〉, is then 205 MeV and the energy weighted cross section for
νep → e+n amounts to 〈σf 〉 = 6 10−43 cm2 per fission. Let us introduce a new luminosity unit, called the
r.n.u. (for reactor neutrino unit) and defined as 1 r.n.u. = 0.197 1060 MeV. With this unit, an experiment
taking data for T years with a total NPP (nuclear power plant) thermal power of P GW and with N 1030 free
protons inside the target has a luminosity L = T P N r.n.u.. The event number, N(L), at a distance L from
the source, assuming no - oscillation, can be quickly assessed with
N(L) =
〈σf 〉
4π 〈Ef 〉
≃ 4.6 109
1 GWth
For the full antineutrino reactor energy spectra simulation, we follow Vogel’s analytical parameteriza-
tion [16], based on second order Eν polynomials. Higher order parameterizations [18] give very comparable
results and do not require a specific attention for our aim in this article. The antineutrino event rates
per energy range is then computed according to the mean reactor core composition (2) and experimental
site specifications (reactor and detector locations, average efficiencies and running time as described in
section 6).
Reactor neutrino experiments measure the survival probability Pee which does not depend on the Dirac
δCP phase. In addition, the oscillation of MeV’s reactor neutrinos studied over a distance of a few kilome-
ters is not affected by the modification of the coherent forward scattering from matter electrons [32, 33].
Expanding the full three flavors νe oscillation probability as a function of (∆m
21/∆m
2 ≃ 1/302 ratio
and sin2(2θ13), Pee measurements from reactor experiments on the kilometer scale may be described by the
simplest two flavors oscillation formula:
Pee ≃ 1− sin2(2θ13) sin2
∆m231L
. (4)
as long as sin2(2θ13) & 10
−3. We assume that in eq. (4), ∆m231 is measured by other experiments (MINOS [8],
K2K [7] and super-K [4]). With a determination of ∆m231 better than 10 %, the impact on sin
2(2θ13)
1238U fissions only with fast neutrons. Theoretical predictions are computed by summing all known beta decay processes
contributing.
determination can be neglected [34]. All results within this study are, unless otherwise stated, computed
for a representative value of ∆m231 [13]:
∆m231 = 2.5
+0.25
−0.25 10
−3 eV2 at 68 % C.L.. (5)
3 Generic analysis of θ13 sensitivity
The calculation of event rates is a convolution of the νe flux spectrum, the cross section, the oscillation
probability, the detector efficiency with the energy response function.
The detector energy response simulation, as well as its correction through detector callibrations are specific
to each experiment. The details of the corrections are thus beyond the scope of this article. We therefore
assume in the following that the reconstructed energy is identical to the true deposited energy. We will
implement, anyway, a simple energy scale systematic uncertainty (section 6.1).
We based our event rates computations on an extended version of the numerical code developed for Dou-
ble Chooz [34, 24]. These computations take into account the characteristics of each experimental setup as
the number of reactors, detectors, locations, overburdens, efficiencies, operating time, and so on which will
be described in section 6.
The resulting event rates, then, form the basis for a χ2 analysis, where systematic uncertainties are properly
included. Since event rates in the disappearance channel of reactor experiments are quite large, we can use
a Gaussian χ2, which has the advantage of allowing a natural inclusion of systematic errors through the
so-called “pull-approach” [14]:
χ2 = min
i=1,...,Nb
D=D1,...,DN
∆Di −
k,k′=1
k,k′αk′
. (6)
This generic χ2 definition encompasses all the spectral information (i index) from each detector (D index)
and systematics parameterization through αk and S
i,k. For S
i,k = 0, we recover the classical χ
2 definition,
χ2no syst =
, through
∆Di =
i −NDi
/UDi (7)
where we assume that the simulated data event numbers, N
i , are uncorrelated between bins and detectors.
In the absence of real data, they are computed for fixed given values of ∆m231
⋆ and sin2(2θ13)
⋆. On the
other hand NDi , the theoretical model, relies on the searched sin
2(2θ13) value. We assume an uncorrelated
weight error UDi which, in the absence of systematic uncertainty, is simply expressed as the statistical error:
UDi =
NDi .
Systematic uncertainties are included in the χ2 definition (6) through αk and S
i,k coefficients. The S
coefficient represents the shift of the ith bin of detector D spectrum due to a 1 σ variation in the kth
systematic uncertainty parameter αk. Following this definition, we introduce bin, detector and reactor
correlations in the systematic errors through SDi,k definitions whereas systematic parameter correlations are
gathered in Wk,k′ (we refer to the appendix for details). Eventually, some fully uncorrelated systematic
uncertainties may be included through the UDi definition, by adding quadratically all their effects together
with the statistical uncertainty. We will use this property to include background shape uncertainties in
our analysis as described in section 5. We refer to the appendix for the proper inclusion and definition of
systematic coefficients SDi,k and αk inside the χ
2 definition (6).
We define the sensitivity or sensitivity limit sin2(2θ13)lim as the largest value of sin
2(2θ13) which fits the
true value sin2(2θ13)
⋆ = 0 at the chosen confidence level. We therefore determine the sin2(2θ13) sensitivity
at 90 % C.L. of an experiment as the value of sin2(2θ13) for which
∆χ2 = χ2(sin2(2θ13))− χ2min = 2.71 . (8)
4 Generic overview of systematic error inputs
Systematic errors can be classified into three main categories: reactor, detector and data analysis induced
uncertainties. In this section we provide a brief and generic description of the systematic uncertainties
included in our modelization. Details concerning specific experimental cases are given in sections 6, 7.
The dominant reactor induced systematic error comes from our limited knowledge of the physical processes
which produce electron antineutrinos in nuclear reactors. This leads to an overall normalization error on
the production rate of 1.9 % [9]. Similarily we include a 2.0 % uncertainty on the antineutrino spectral
shape [15], with the conservative hypothesis that the energy bins are not correlated among themselves.
Furthermore, even at a perfectly defined thermal power and with an absolute knowledge of the number of
antineutrinos emitted for each fission, an underlying uncertainty remains since the nuclear energy released
per fission is known to about 0.5 % [9]. In our model the last three uncertainties are taken to be fully
correlated between the nuclear cores.
We included another group of reactor induced systematics, taken to be uncorrelated between the reactor
cores: the uncertainty on the determination of the thermal power of each nuclear core, within 0.6 – 3 % and
the uncertainty on the isotopic composition of the nuclear core fuel elements, within 2 – 3 %. Also, finite
size and solid angle effects, distances bias between reactors and detectors, as well as displacements of the
neutrino production barycenter might affect the fluxes at the near detector(s) if they are close to the power
plants (below 200 m) up to a level of 0.1 %.
We did not implemented the uncertainty coming from neutrinos produced in the spent fuel pools, often
located within tens of meters from the nuclear cores since we could not gather all the relevant information
for the different sites. We thus neglect this effect in our simulation, whithout any justification except in
the case of Double Chooz, since a detailed evaluation has shown that this uncertainty does not affect its
sensitivity [24]. We point out that in particular cases this additional neutrino source slightly affects the
antineutrino spectrum, and is thus relevant for experiments aiming at high sensitivities, e.g. sin2(2θ13) ∼
0.01. We could also have introduced a specific error on the inverse-beta decay neutrino cross section of
0.1 % [9, 36]. However, being fully correlated between the detectors, the latter can be gathered into a global
uncertainty, adding up to the overall neutrino rate knowledge.
The basic principle of the multi-detector concept is the cancellation of the reactor induced systematics;
additional contributions would not modify significantly the sensitivities of the forthcoming experiments.
Let us now focus on the uncorrelated errors between detectors that could affect strongly the experimental
results.
The uncorrelated errors between detectors directly contribute to the relative normalization of the mea-
sured antineutrino energy spectrum of each detector. One of the major improvement with respect to the
CHOOZ experiment relies on the precise measurement of the number of free protons inside target volumes,
proportional to the antineutrino rates. Experimentally the target mass will be determined at 0.2 %. An
uncertainty on the fraction of free hydrogen per unit volume remains, at the level of 0.2 – 0.8 %, if different
batches of liquid scintillator are used to fill the detectors of a given experiment. This complex case diserve a
special treatment as described in section 7.2. We did not include any error associated with the measurement
of the live time of the experiments.
The last set of systematics concerns the selection cuts applied to extract the antineutrino signal and reject
the backgrounds. Neutrino events are identified as positrons followed in time by a single neutron captured on
a gadolinium nucleus. New detector designs have been proposed in order to simplify the analysis, reducing
the systematic errors while keeping high statistics and high detection efficiency. We accounted for three
uncertainties for both positron and neutron associated to a candidate event: the possibility of missing the
particle as it escapes the target (escape), the uncertainty related to the particle interactions2 (capture) and
2Note that it affects essentially the neutrons, since the Gd concentration might differ between the detectors if they are not
the identification cut based on the energy deposited in the detector active region (identification). Finally
we take into account the uncertainty on the efficiency of the delay cut and an error on the neutron unicity
of the event selections, whereas we do not consider any position vertex reconstruction. We provide in
section 6.1 (Table 1) the detailed systematic inputs necessary for our simulations. The uncertainty on these
efficiencies have been treated, otherwise stated, as uncorrelated between detectors. Uncertainties induced
by the background subtractions are discussed in sections 5 and 6.3.
5 Backgrounds: description and modelization
In this section we briefly review the three main kinds of background for the next generation of reactor
neutrino experiments: accidental coincidences, fast neutrons and the long-lived muon induced isotopes
9Li/8He. We then describe our simplified background modelization.
Naturally occurring radioactivity mostly creates accidental background, defined as a coincidence of a prompt
energy deposition between 0.5 and 10 MeV, followed by a delayed neutron-like event in the fiducial volume
of the detector within a few hundredths of a millisecond. Selection of high purity materials for detector
construction (scintillator, mineral oil, PMT’s, steel, etc.) and passive shielding provide an efficient handle
against this type of background. We assume that the accidental background rate can be measured in situ
with a precision of 10 %.
Cosmic ray muons will be the dominating trigger rate at the depth of all near detector sites. Even though the
energy deposition corresponds to about 2 MeV per centimeter path length (providing a strong discrimination
tool) they induce the main source of background.
Muon induced production of the radioactive isotopes 8He, 9Li and 11Li can not be correlated to the primary
muon interaction since their lifetimes are much longer than the characteristic time between two subsequent
muon interactions. The characteristic signature of this last class of events consists in a four-fold coincidence
(µ → n → β → n). The initial muon interaction is followed by the capture of spallation neutrons within
about 1 ms. The time scale of the decay of the considered isotopes is on the order of a few 100 ms, again
followed by a neutron capture. This background mimics the νe signal and is considered among the most
serious difficulty to overcome for the next generation reactor neutrino experiments. In our simulation we
will assume that it can be estimated to within 50 %.
A further source of background are neutrons that are produced in the surrounding rocks by radioactivity
and in cosmic ray muon induced hadronic cascades. In the latter case, which is dominant at shallow depth,
the primary cosmic ray muon may not penetrate the detector, being thus invisible. Fast neutrons may
then enter the detector and create recoil protons and be captured by hydrogen or gadolinium nuclei after
thermalization. Such a sequence can mimick a νe event. In the case of Double Chooz far detector (depth of
300 m.w.e.), muon induced neutron production can be fairly well estimated from the results of the CHOOZ
experiment, since it was the dominating backgroundmonitored during reactor off periods [9]. We will assume
that this background rate can be estimated within a factor of two. Figure 1 illustrates the background
spectra that we implemented in our modelization. We used the CHOOZ reactor off data [9] to estimate the
fast neutron background spectral shape, measured to be flat. We used a simple approximate exponential
shape for accidental backgrounds. Finally we implemented the spectra of 8He and 9Li based on nuclear data
information; we weighted them in the ratio 0.2/0.8, respectively. We checked that slight modifications of
the background shapes do not change the results of our simulations for sin2(2θ13) sensitivities above 0.01.
Background rates are adjusted to match to the predictions at each detector location. Afterward they are
subtracted from the total Evis spectrum in each detector. These three backgrounds, Bn, have rates and
shapes known at σBD
and σshp,BD
, respectively. We take the conservative assumption that these shape
uncertainties are fully uncorrelated between bins. Thus, the σshp,BD
contributions may be directly included
inside UDi (eq. (7)). These generic uncorrelated errors will then take into account the statistical and the
background shape subtraction uncertainties:
filled with a single batch of liquid scintillator.
0 1 2 3 4 5 6 7 8 9 10
Evis (MeV)
Proton recoils
Accidentals
Cosmogenics
Figure 1: Energy spectrum of the backgrounds from spallation neutrons, accidentals, and cosmogenic
20 % 8He and 80 % 9Li. Each curve is normalized to unity.
UDi =
NDi +
BDn,i + σ
shp,BD
BDn,i
As the background rate uncertainties are correlated between bins, we treat them with additional pull terms,
, in eq. (6), with weights σBD
included in SDi,k (see the appendix).
6 Comparison of the current proposals
Several sites are currently being considered for new reactor experiments to search for θ13: Angra dos
Reis (Angra, Brazil), Chooz (Double Chooz, France, and possibly Triple Chooz), Daya Bay (Daya Bay,
China), Kashiwazaki (KASKA, Japan) and Yonggwang (RENO, Korea). All these experiments may be
classified in two generations. The first aims to probe the value of sin2(2θ13) till 0.02 – 03, and the second to
track sin2(2θ13) down to 0.01 (90 % C.L.). The first phase concerns Double Chooz, RENO, and possibly
Daya Bay (with its phase I). This phase should end by 2013. Angra, Daya Bay (nominal setup with
8 detectors), KASKA and possibly Triple Chooz are focusing on the second phase. For these second
generation experiments, a significant R&D effort is required since the effective Gd-scintillator mass will be
increased by, at least, one order of magnitude, and systematics, as well as backgrounds uncertainties, have
to be further reduced with respect to the first phase experiments. In the following comparisons we will not
include Angra, KASKA and Triple Chooz. We will only focus on Double Chooz, Daya Bay and RENO.
In the next sections we first introduce a generic discussion on systematics, scintillator composition and
backgrounds (sections 6.1 to 6.3). We will then shortly describe each setup and compute the associated
baseline sensitivity (sections 7.1 to 7.3). We then perform a comparative analysis of the setups in section 8
to highlight advantages and drawbacks of each setup.
6.1 Detailed systematics review
The two Double Chooz [24] and Daya Bay [25] proposals take a careful inventory of systematics, compiled
in Table 1 for comparison. Double Chooz and Daya Bay estimates for the case of no additional R&D are
at hand. We decided to strictly use the systematic errors (Table 1) and background values quoted by the
collaborations [24, 25, 26]. However, we could not find any detailed background estimate for the RENO and
Daya Bay Mid site setups. In the latter cases we use a simple model to estimate the background subtraction
uncertainties from the scaling of the Double Chooz far detector (see section 6.3).
Error Description CHOOZ Double Chooz Daya Bay
No R&D R&D
Absolute Absolute Relative Absolute Relative Relative
Reactor
Production cross section 1.90 % 1.90 % 1.90 %
Core powers 0.70 % 2.00 % 2.00 %
Energy per fission 0.60 % 0.50 % 0.50 %
Solid angle/Bary. displct. 0.07 % 0.08 % 0.08 %
Detector
Detection cross section 0.30 % 0.10 % 0.10 %
Target mass 0.30 % 0.20 % 0.20 % 0.20 % 0.20 % 0.02 %
Fiducial volume 0.20 %
Target free H fraction 0.80 % 0.50 % ? 0.20 % 0.10 %
Dead time (electronics) 0.25 %
Analysis (paticle id.)
e+ escape (D) 0.10 %
e+ capture (C)
e+ identification cut (E) 0.80 % 0.10 % 0.10 %
n escape (D) 0.10 %
n capture (% Gd) (C) 0.85 % 0.30 % 0.30 % 0.10 % 0.10 % 0.10 %
n identification cut (E) 0.40 % 0.20 % 0.20 % 0.20 % 0.20 % 0.10 %
νe time cut (T) 0.40 % 0.10 % 0.10 % 0.10 % 0.10 % 0.03 %
νe distance cut (D) 0.30 %
unicity (n multiplicity) 0.50 % 0.05 % 0.05 %
Total 2.72 % 2.88 % 0.44 % 2.82 % 0.39 % 0.20 %
Table 1: Breakdown of the systematic errors included in the computation of the sensitivity of Dou-
ble Chooz [24] and Daya Bay [25]. Since no breakdown of the RENO systematic errors has been pub-
lished we use the same systematic error budget as for Double Chooz. Double Chooz and Daya Bay relative
systematics are almost comparable, the only main difference coming from the determination of the gadolin-
ium concentration and the free proton fraction inside the target volume. The absolute determination of
the free proton fraction can have some impact in Daya Bay since multiple batches will be used to fill all
the 8 detectors (see section 7.2 for details). Nevertheless, there is no published value in the Daya Bay
proposal [25].
Through all the available publications [24, 25, 26, 28], the differences between the systematics are found
only for the relative normalization of the dectetor (σrel) and the subtraction uncertainties on background
rates (σBD
). From section 4 and Table 1, we thus group systematics in two categories to be used in our χ2
analysis eq. (6):
1. generic systematics common to all the experiments (Table 2):
– σabs, the theoretical uncertainty on reactor antineutrino spectrum prediction. We call it also the
absolute normalization of event rates (since common to all the detectors), extracted from Table 1
without power uncertainty contributions. σabs is at the level of 2 %;
– σshp, the theoretical reactor spectrum shape uncertainty, at the level of 2 %;
– σabs
, σrel
, the absolute and relative energy scale uncertainties, roughly assessed at the level of
0.5 % each;
– σpwr, the reactor thermal power uncertainty, at the level of 2.0 %;
– σcmp, the reactor core specific fuel composition uncertainty which is roughly at the level of the
power uncertainties (2 – 3 %) on each fuel element.
σabs σshp σ
σpwr σcmp
2.0 % 2.0 % 0.5 % 0.5 % 2.0 % 2 – 3 %
Table 2: Generic systematic uncertainties as included in the χ2 analysis. For more details about the
correlations between the systematics, we refer to the appendix, and more particularly to Table 15.
2. specific systematics:
– σrel, the relative normalization of event rates between all the detectors. This uncertainty is
uncorrelated between detectors;
– σBD
, the background subtraction unceratinties, described in section 6.3.
6.2 Impact of the scintillator composition
In this section we stress the impact of the scintillator composition on the sensitivity of reactor neutrino
experiments. All current projects will use a gadolinium doped liquid scintillator to enhance the neutron
capture. The long-term stability of this scintillator is among the most difficult experimental challenges,
since a degradation of the scintillator transparency would induce large systematic uncertainties. In the
following we consider a stable scintillator for all experiments. The choice of the scintillating base has
some importance since it defines the free proton number per unit volume, the νe rate and proton recoil
background rate. Similarly the 12C number per volume drives the long-lived muon induced isotopes in the
target scintillator.
Different bases can be used as neutrino target scintillator, mixture of dodecane (DOD), pseudocumene
(PC) or phenylxylylethane (PXE), or linear alkylbenzene (LAB), as described in details in Table 3. If we
consider the Double Chooz scintillator (80 % DOD + 20 % PXE) as our reference, a pure LAB scintillator
contains 4.9 % less free proton per volume, and 9.5 % more carbon atoms. In the following we will assume
the Daya Bay [25] and RENO [26] experiments will use pure LAB as the target scintillator, and we will
renormalize the neutrino and the backgound rates accordingly in section 7.2 and 7.3.
6.3 Estimation of the µ-induced backgrounds
Our estimates of the backgrounds induced by cosmic muons is based on the Double Chooz proposal [24]. A
modification of the MUSIC code [20] was used to compute the muon rates and energy spectra by propagating
Liquid Formula density 1028H/m3 1028C/m3
Dodecane (DOD) C12H26 0.753 6.93 3.20
pseudocumene (PC) C9H12 0.88 5.30 3.97
phenylxylylethane (PXE) C16H18 0.985 5.08 4.52
90 % DOD+10 % PC mixture in vol. 0.77 6.77 3.32
80 % DOD+20 % PXE mixture in vol. 0.80 6.56 3.46
linear alkylbenzene (LAB) C16H30 0.86 6.24 3.79
Table 3: Impact of the scintillator composition on the target free proton number driving the νe rate
as well as the recoil proton background, and on the carbon composition which drives the long-lived muon
induced isotopes produced in the detector. In the following we consider that Double Chooz is using modules
containing 8.26 tons of 80 % dodecane +20 % PXE (in volume) based target scintillator, and that Daya Bay
and RENO are both using 20 ton modules of a LAB based target scintillator.
surface muons through rock. The site topographies have been included, according to a digitized map of the
Chooz hill profile [22] in the case of the far site (300 m.w.e. overburden), and according to a flat topography
in the case of the near site (80 m.w.e. overburden). For the far site this full Monte-Carlo simulation predicts
a muon flux Φµ=0.612± 0.007 m−2 s−1, slightly higher than the approximate measured value quoted in [9].
The mean muon energy computed according to this method is 〈Eµ〉 = 61 GeV. For the near site, we get
Φµ = 5.9 m
−2 s−1, and 〈Eµ〉 = 22 GeV. A similar detailed computation was performed for the three sites
of the Daya Bay collaboration [25].
Thus, for both the cases of Double Chooz and Daya Bay we use only the values of the muon flux and mean
energy computed by the collaborations. However, we could not find any published data for the case of
RENO. We then use the underground muon fluxes and mean energies calculated analytically following [21].
In order to justify this approximation, Table 4 reports the muon flux and mean energy computed by the
Double Chooz and the Daya Bay collaborations, from 300 m.w.e. to 923 m.w.e., as well as the analytical
computations according to [21]. We found a reasonable agreement which bears out the use of the analytical
model [21] for the case of the RENO sites (depths of 255 m.w.e. and 675 m.w.e.). Nevertheless, note
that the analytical simulation assumes a flat topography. It is worth noting, however, that the mean muon
energy predicted by the analytical computation is systematically ∼ 25 % lower in the depth range of interest.
Therefore we arbitrarily renormalize the analytical calculation by 25 % to estimate the mean muon energy
at the RENO sites. We also note a large discrepancy between the detailed computation and the analytical
model of [21] for the Double Chooz near site, probably due to its very shallow depth.
Cross sections of muon induced isotope production on liquid scintillator targets (12C) have been measured
by the NA54 experiment at the CERN SPS muon beam at 100 GeV and 190 GeV muon energies [23]. The
energy dependence was found to scale as σtot(Eµ) ∝ 〈Eµ〉α with α = 0.73± 0.10 averaged over the various
isotopes produced. We consider in the following that both long-lived muon induced isotopes and muon
induced fast neutron backgrounds scale as Φµ ×〈Eµ〉α, our reference being taken at the Chooz far site (full
Monte-Carlo simulation). We define the depth scaling factor as
DSF = (Φµ 〈Eµ〉α)/(Φµ 〈Eµ〉α)Double Chooz far , (10)
which is illustrated in the last column of Table 4 for various detector sites. Daily background rates computed
for Daya Bay, Double Chooz, and RENO detectors at the different sites are then given in Table 5.
Two cases are considered and compared: the background rates taken from the literature when available,
and the background rates extrapolated from the background computed for the Double Chooz far detector,
scaled with the target mass, the scintillator free proton and carbon numbers, as well as the depth scaling
factor (DSF). We note here the good agreement between the Daya Bay cosmogenic induced backgrounds
(9Li/8He and fast neutrons) estimated from our Double Chooz extended model and the original estimates
of the Daya Bay collaboration. This bears out the use of our model for the RENO and Daya Bay mid
site configurations. For the case of the 9Li/8He background, we understand well this agreement since
the background rate mainly depends on the mass of the neutrino target region. The agreement is more
Site depth (m.w.e.), Detailed simulation Analytical model DSF
topography Φµ 〈Eµ〉 Φµ 〈Eµ〉
m−2 s−1 GeV m−2 s−1 GeV
DC near 80, flat 5.9 22 9.9 17 6.80
RENO near 230, hill — — 1.2 40 1.57
DB near 1 255, hill 1.2 55 0.9 44 1.32
DB near 2 291, hill 0.73 60 0.72 49 1.06
DC far 300, hill 0.61 61 0.67 50 1
DB mid 541, hill 0.17 97 0.15 71 0.32
RENO far 675, hill — — 0.084 94 0.20
DB far 923, hill 0.04 138 0.035 118 0.10
Table 4: Muon flux Φµ and mean energy 〈Eµ〉 for the underground site of the reactor neutrino experiments.
We compare the values obtained from a full Monte-Carlo simulation for Double Chooz and Daya Bay to the
analytical model of [21]. We use the latter model for RENO. The depth scaling factor (DSF) is defined by
the product (Φµ×〈Eµ〉α)/(Φµ×〈Eµ〉α)Double Chooz far, the Double Chooz far site is taken as the reference.
Backgrounds induced by cosmic muons are scaled according to this factor.
Detector Accidental (d−1) µ-induced fast-n (d−1) µ-induced 9Li/8He (d−1)
Site Original DC ext. Original DC ext. Original DC ext.
Double Chooz near 13.60 ± 1.36 1.36 ± 1.36 9.52 ± 4.76
RENO near — 7.10 ± 0.71 — 0.68 ± 0.68 — 5.40 ± 2.70
Daya Bay DB 1.86± 0.19 5.98 ± 0.60 0.50 ± 0.50 0.57 ± 0.57 3.7 ± 1.85 4.55 ± 2.27
Daya Bay LA 1.52± 0.15 4.76 ± 0.48 0.35 ± 0.35 0.45 ± 0.45 2.5 ± 1.25 3.63 ± 1.81
Double Chooz far 2.00± 0.20 — 0.20 ± 0.20 — 1.40 ± 0.70 —
Daya Bay mid — 1.45 ± 0.14 — 0.14 ± 0.14 — 1.10 ± 0.57
RENO far — 0.90 ± 0.09 — 0.09 ± 0.09 — 0.69 ± 0.35
Daya Bay far 0.12± 0.01 0.44 ± 0.04 0.03 ± 0.03 0.04 ± 0.04 0.26 ± 0.13 0.33 ± 0.17
Table 5: Daily background rates computed for Daya Bay, Double Chooz, and RENO detectors at the
different sites. We consider the three main background sources: accidental events, µ-induced fast neutrons,
and µ-induced 9Li/8He. The columns labelled “Original” quote the background rates taken from the
literature when available. The columns labelled “DC ext.” (for extended) quote the background value
extrapolated from the background computed for the Double Chooz far detector, scaled with the detector
target mass, the scintillator free proton and carbon numbers, as well as the depth scaling factor (DSF).
Background subtraction systematic errors (%)
Detector Accidental µ-induced fast-n µ-induced 9Li/8He
Site Original DC ext. Original DC ext. Original DC ext.
Double Chooz near 0.123 0.123 0.043
RENO near — 0.019 — 0.019 — 0.074
Daya Bay DB 0.020 0.064 0.054 0.061 0.199 0.245
Daya Bay LA 0.020 0.063 0.046 0.060 0.164 0.239
Double Chooz far 0.292 — 0.292 — 1.020 —
Daya Bay mid — 0.120 — 0.115 — 0.458
RENO far — 0.100 — 0.095 — 0.382
Daya Bay far 0.010 0.036 0.025 0.035 0.108 0.138
Table 6: Background subtraction systematic errors (in percent) computed for Daya Bay, Double Chooz, and
RENO detectors at the different sites. We consider the three main background sources: accidental events,
µ-induced fast neutrons, and µ-induced 9Li/8He. The columns labelled “Original” quote the systematic
errors taken from the literature when available. The columns labelled “DC ext.” (for extended) quote
the systematic errors extrapolated from the Double Chooz far detector, taking into account the estimated
detector signal to background ratio as well as the background rate uncertainty.
surprising, however, for the case of the fast neutron background, since the size of the liquid shielding around
the detector active area is rather different between the Double Chooz and Daya Bay detectors (the RENO
detector design is very close to the Double Chooz case). In addition, further detector differences such as
the thickness of the buffer oil shielding the inner target region, as well as the different mechanical structure
explain the discrepancy between the Daya Bay computation and the DC extended model for the case of
the accidental background. Nevertheless, we found out that these differences influence only weakly the
sensitivity computed for the three experiments since the accidental background energy spectrum is different
enough from the expected oscillation signal, and it is supposed to be known with a precision of 10 %.
In a similar way, Table 6 gives the background subtraction systematic errors (in percent) computed for
Daya Bay, Double Chooz, and RENO detectors at the different sites, taking into account the estimated
detector signal to noise ratios as well as the background rate uncertainties.
7 Reactor experiments baseline sensitivity
7.1 Double Chooz
The Double Chooz collaboration is composed of institutes from Brazil, France, Germany, Japan, Russia,
Spain, United Kingdom, and the United States. The experimental site is located close to the twin reactor
cores of the Chooz nuclear power station (two PWR3 producing 8.5 GWth), operated by the French company
Électricité de France (EDF). The two, almost identical, detectors will contain a 8.3 ton fiducial volume of
liquid scintillator (density of 0.8) doped with 0.1 g/l of gadolinium (Gd). The far detector will be installed
in the existing laboratory, 1.05 km from the cores barycenter, shielded by 300 m.w.e. of rock. This detector
should be operating alone for 1.5 – 2 years (DC Phase I), starting data taking by the end of 2008. The
second detector will be installed in the meantime about 280 m from the nuclear core barycenter, at the
bottom of a 40 m shaft (80 m.w.e.) to be excavated. Distances between detectors and nuclear cores as well
as site overburdens are given in Table 7. Since there are no more than two NPP cores, it is still possible to
install the near detector at a suitable position where the ratio of reactor νe fluxes from each core is the same
as for the far detector (the iso-ratio curve is plotted on figure 2). This allows reactor relative uncertainty
cancellations (NPP core compositions). This detector should be operational by 2010, and will take data
3Pressurized Water Reactor
Detector near far
Distance from West reactor (m) 290.7 1114.6± 0.1
Distance from East reactor (m) 260.3 997.9± 0.1
Detector Efficiency 80 % 80 %
Dead Time 25 % 2.5 %
Rate without efficiency (d−1) 977 66
Rate with detector efficiency (d−1) 782 53
Integrated rate (y−1) 1.67 105 1.48 104
Table 7: Double Chooz antineutrinos rate expected in the near and far detectors, with and without reactor
and detector efficiencies. The integrated rate in the last line includes detector efficiency, dead time, and
reactor off periods averaged over a year. The averaged reactor global load factor is estimated at 79 % [30].
Figure 2: Double Chooz experiment site configuration. We show also on this figure the far flux iso-ratio
line. Another detector located on this particular curve will receive the same reactor flux ratio as for the far
detector: 44.5 % from West and 55.5 % from East reactor. The near detector of Double Chooz is foreseen
to be placed on this line.
for three years (DC Phase II). Other details concerning the experiment may be found in the collaboration
proposals [24].
For the Double Chooz phase I (DC Phase I) analysis, we used systematics of Table 2, setting σrel and
to 0 since only one detector will be present. For a data taking period of 1.5 years, the sensitivity
is sin2(2θ13)lim = 0.0544. The second phase (DC Phase II) will then start and both detectors will take
data for 3 years as scheduled in the proposal. The full experiment will then achieve a final sensitivity
sin2(2θ13)lim = 0.0278, assuming systematics from Table 2 and σrel = 0.6 % [24]. The sensitivity worsens
for ∆m231 < 2.5 10
−3 eV2, due to the close distance of the far detector4. If we take the lower bound [12, 13]
on ∆m231 (2.0 10
−3 eV2), Double Chooz will lose 30 % in sensitivity whereas for the upper bound [12, 13]
on ∆m231 (3.0 10
−3 eV2), Double Chooz will gain 15 % in sensitivity.
7.2 Daya Bay
Daya Bay is an experiment proposed by institutes from China, the United States, and Russia. Daya Bay [25]
will be located in the Guang-Dong Province, on the site of the Daya Bay nuclear power station. The site is
made up of two pairs of twin reactors, Daya Bay (DB) and Ling Ao I (LA I). An additional pair of reactors,
Ling Ao II (LA II), is currently under construction and should be operational by 2010 – 2011 [25]. Each
core has a thermal power of 2.9 GW [30]. In the full installation setup 3.3 km of tunnel and 3 detector halls
have to be excavated, in order to accommodate 8 detector modules [25]. Each module contains an effective
volume of 20 tons of Gd-loaded LAB liquid scintillator (Table 3). Distances between detectors and nuclear
cores as well as site overburdens are given in Table 8. This site yields a rather complex signal composition
Detector near DB near LA mid far
Distance from DB 1 (m) 350 1,356 1,153 1,970
Distance from DB 2 (m) 381 1,331 1,161 2,000
Distance from LA I 1 (m) 942 492 783 1,619
Distance from LA I 2 (m) 1,030 475 818 1,623
Distance from LA II 1 (m) 1,378 500 968 1,602
Distance from LA II 2 (m) 1,463 555 1,029 1,624
Detector eff. 80 % 80 % 80 % 80 %
Dead Time 7.2 % 4.3 % 1 % 0.2 %
Rate without eff. (d−1) 1,938 1,813 494 430
Rate with detector eff. (d−1) 1,550 1,450 395 344
Integrated rate (y−1) 4.10 105 3.95 105 1.11 105 9.77 104
Table 8: Daya Bay antineutrino rates expected in the near and far detectors, with and without reactor and
detector efficiencies. The integrated rate in the last line includes detector efficiency, dead time, and reactor
off periods averaged over a year. We assumed that LA II NPP will be operating for the time the far site
will be fully installed, but for the Mid site installation we assumed that LA II will be off.
in each detector coming from up to 6 different NPP cores, as shown in Table 9. According to the Day Bay
proposal, our estimate of the sensitivity is sin2(2θ13) = 0.0085 very close to the Daya Bay quoted value
(sin2(2θ13) = 0.008). However, in the Daya Bay proposal, the uncertainties on the reactor fuel composition,
σcmp, as well as the energy scale associated uncertainties σ
and σrel
are neglected. Taking into account
these systematics (Table 2) and the quoted value of σrel = 0.39 % with no R&D [25], our estimate of the
Daya Bay final sensitivity is then sin2(2θ13) = 0.009, with the full installation (DB Phase II, Figure 3)
after 3 years of data taking. We draw the attention of the reader on the point that these computations are
based on the assumption that σrel = 0.39 % is fully uncorrelated between all the detectors, and in particular
4a bit too close to the NPP to get the maximum amount of information from first minimum of oscillation over the reactor
spectrum distortion
DB LA1 LA2
near 1 (DB) 83.1 % 11.4 % 5.5 %
near 2 (LA) 6.5 % 50.6 % 42.8 %
Mid (LA2 OFF) 32.3 % 67.7 % 0.0 %
Mid (LA2 ON) 22.5 % 47.1 % 30.4 %
far 24.9 % 37.4 % 37.7 %
Table 9: Daya Bay rate contributions from each NPP set (2 cores by set) while assuming, except if otherwise
noticed (3rd line), the new Ling Ao II NPP is operating at full power.
between detectors on a same site. Nevertheless this hypothesis is not guarenteed. We discuss this point as
well as the current filling scenario (multi-batches) just below.
Figure 3: Daya Bay installation phases and site configuration. On the left: phase I, with 2 × (2 × 20 t)
detectors, Daya Bay and Ling Ao I power plants are operating. On the right: phase II with 2× (2× 20 t)
near sites, and 4× 20 t at far site, all three power plants are operational.
A preliminary fast phase (DB Phase I) is proposed by the Daya Bay collaboration. This phase includes only
4 detectors, 2 of them located at the DB near site and the other 2 at the mid site (see Figure 3). Taking
systematics from Table 2 and σrel = 0.39 % fully uncorrelated between detectors, we get a sensitivity after
1 year of data taking of sin2(2θ13) = 0.040. if LA II NPP is off. If DB Phase I starts after 2010, with LA II
operational, we get a sensitivity of sin2(2θ13) = 0.038 still after 1 year of data taking. The reason for a
better sensitivity in the second scenario is that the mid detectors get 44 % more νe events (Table 8, 9 and
Figure 3), with a larger oscillation baseline with respect to LA I.
Daya Bay site correlation
The main concept of the Daya Bay experiment is based on the multi-inter-calibration of detectors. Since
many detectors are installed on a same site, the total uncorrelated uncertainty of a site is decreased by
a factor of 1/
NSd compared to the single detector uncorrelated uncertainty, where N
d is the number of
detectors on a given site. In the Daya Bay proposal, the full relative uncertainty (σrel = 0.39 %) is assumed
to be uncorrelated between all the detectors. However, the fraction of correlated and non-correlated error
between detectors of a same site is not trivial. It relies on many experimental assumptions on uncertainty
correlations. The absence of correlations between detectors on a same site implies there will be no detector-
to-detector correction applied. If detector responses happen to be different, any data correction from
detector to detector on a same site would yield correlations between detector uncertainties.
For DB Phase II, if we assume that σrel is fully uncorrelated between the detectors, we get a sensitivity of
sin2(2θ13) = 0.009. On the contrary, if we assume that σrel is fully uncorrelated between detectors in different
sites, but fully correlated between detectors on a same site, we get a sensitivity of sin2(2θ13) = 0.012. The
real sensitivity should lay between these two extremes.
For DB Phase I, if we assume σrel is fully correlated on a same site and fully uncorrelated on distant sites,
we get sin2(2θ13) = 0.041 if LA II is off, and sin
2(2θ13) = 0.038 if LA II is on. We conlude from these results
that DB Mid sensitivity does not strongly depend on the correlations in σrel. More generally, DB Mid setup
weakly depends on σrel. This latter point will be discussed in section 8, together with a full description of
the impact of each systematic on the forseen sensitivity.
Daya Bay and the filling procedure
A large amount of Gd-loaded liquid scintillator will have to be produced, stored and filled into the detectors
(8×20 tons, which is 10 times more than in Double Chooz). Due to the large number of detectors and
the large amount of liquid to manage, the Daya Bay collaboration plans to fill detectors with four different
Gd-doped liquid scintillator batches. A single batch will be used to fill detectors by pairs [25] (Figure 4).
The best installation scenario (which is the one chosen by the Daya Bay collaboration) is then to move
one of the filled detector to a near site, the other one to the far site (scheme 1 of Figure 4 and Table 10).
Another batch of Gd-loaded liquid scintillator will be used for the next pair of detectors and so on. With
the adopted filling procedure [25], extra systematic uncertainties on the hydrogen content between different
batches have to be included.
DB LA
Pair 1
Pair 2
Pair 3
Pair 4
Scheme 1 Scheme 2 Scheme 3
Scheme 4 Scheme 5 Scheme 6
Figure 4: Possible installation of detector pairs in the Daya Bay experiment according to the adopted filling
procedure [25]. Due to the large number of detectors, and the large amount of liquid to be managed,
the Daya Bay collaboration plans to fill detectors by pairs with four different Gd-doped liquid scintillator
batches. The forseen detector installation scenario [25] corresponds to scheme 1. We illustrate here other
installation possibilities (schemes 2 to 6).
Although the relative uncertainty on the free proton content within a same batch could be kept at a very
low level (0.2 % in the Daya Bay proposal [25], negligeable in the Double Chooz proposal [24]), it is not
necessarily true in the case of different batches, in which the chemical composition may slightly change.
The free proton content between different batches relies then on the measurement of this quantity. In the
CHOOZ experiment [9], the free proton fraction inside the Gd-loaded liquid scintillator was known to 0.8 %.
In Double Chooz, this uncertainty is assessed at the level of 0.5 %. Since there is no published value on the
absolute determination of this quantity in the Daya Bay proposal, we assume here, as in Table 1, a 0.5 %
uncertainty between different batches. The filling systematic coefficients are explained in Table 10 (refer to
the appendix and to Table 15 for full description). The uncertainty on the free proton fraction of a single
Scheme 1
Scheme 2
Scheme 3
Scheme 4
Scheme 5
Scheme 6
0.008 0.01 0.012 0.014 0.016
(2θ13)
Single batch
Four batches
Figure 5: Sensitivity on sin2(2θ13) at 90 % C.L. after 3 years of data taking for the 6 different installation
schemes illustrated on Figure 4, with σpair = 0.5 %. Left bounds are computed assuming uncorrelated errors
between all the detectors and right bounds are for the assumption of full correlation of σrel between detectors
of a same site. The real sensitivity should be somewhere in between these two bounds. For comparison,
we also show on this graph the computations for the single batch hypothesis where we take σrel = 0.39 %.
Note that obviously the first installation scheme provides the best sensitivity. In these results we do not
include any detector swapping scenario.
Error type k SD
Filling (Ns = 5Nd +Nb + 5Nr + 2)
of batch 1 Ns +1 σpair
αpair,1
of batch 2 Ns +2 σpair
αpair,2
of batch 3 Ns +3 σpair
αpair,3
of batch 4 Ns +4 σpair
αpair,4
Table 10: Daya Bay, phase II (8 detectors), specific filling systematic parameters table in addition to
standard one (see the appendix for details). Here we adopt a fully uncorrelated uncertainty of the different
batches, σpair = 0.5 %, but fully correlated in a same batch.
batch is taken to σpair = 0.5 %. This uncertainty is taken to be fully correlated when detectors are filled
with the same batch and fully uncorrelated otherwise.
According to this filling procedure, the final sensitivity of DB Phase II would be sin2(2θ13) = 0.0093
instead of 0.0089 with the initial installation scenario (scheme 1 of Figure 4). In all the other configuration
schemes (2 – 6), which allows comparing on a same site at least two detectors filled with the same batch,
the sensitivity is more largely weakened (Figure 5), to an extent depending on σpair.
Note that we did not include any detector swapping option in previous conclusions. In the Day Bay proposal,
the retained installation scenario is the first scheme of Figure 4. The baseline swapping option is then the
permutation of two detectors filled with the same batch, in 4 steps, 1 step per batch. On the one hand,
the drawback of such a swapping scenario is that two detectors filled with the same batch will never be
directly compared. On the other hand, configuration schemes 2 – 6 allow detector intercalibration within a
pair. However, it should be noticed that the time spent in any configuration different from scheme 1 may
decrease the combined final sin2(2θ13) sensitivity (Figure 5).
7.3 RENO
The RENO experiment [26] will be located close to the Yonggwang nuclear power plant in Korea, about
400 km south of Seoul. The power plant is a complex of six PWR reactors, each of them producing a thermal
power of 2.74 GW [30]. The Yonggwong power station is ranked number 4 in the world, with a total thermal
power of 16.4 GW. Its power rating is often cited as an advantage of the RENO experiment. These six
reactors are equally distributed on a straight segment spanning over 1.5 km. The average cumulative
operating factors for the reactors are all above 80 %. Figure 6 shows the foreseen layout of the experimental
site. The near and far detectors will be located 150 m and 1,500 m away from the center of the reactor
Figure 6: RENO experiment site configuration.
row (Table 11), and will be shielded by a 88 m hill (230 m.w.e.) and a 260 m “mountain” (675 m.w.e.)
respectively. Two neutrino laboratories have to be excavated and equipped in order to host the detectors.
They will be located at the bottom of two tunnels having a length of 100 m and 600 m for near and far
detector, respectively. In this configuration, the flux contribution from R1 to R6 ranges from 3 % to 39 %
for the near detector, and from 15 % to 18 % for the far detector (Table 12).
Assuming the sytematics of Table 1 and, lacking of published data, fixing σrel = 0.6 % as for Double Chooz,
we obtain a final sensitivity after 3 years of data taking of sin2(2θ13) = 0.021. This is calculated for
2× 20 t detectors. However, a recent talk [27], quotes a different detector size, with 15 t of Gd-doped liquid
scintillator. In that case, the final sensitivity after 3 years, would be sin2(2θ13) = 0.023.
As seen from Table 12, the near detector monitors mainly the two central NPP cores, and the experiment
sensitivity is quite affected by the reactor power uncertainties. If we compute the sensitivity with only the
2 central NPP cores, we get even better results compared to the full NPP configuration. This means that
four of the six cores are useless for the sin2(2θ13) measurement. This can be understood by the fact that the
Detector near far
Distance from R1 (m) 765 1677
Distance from R2 (m) 474 1566
Distance from R3 (m) 212 1507
Distance from R4 (m) 212 1507
Distance from R5 (m) 474 1566
Distance from R6 (m) 765 1677
Detector Efficiency 80 % 80 %
Dead Time 7.2 % 0.5 %
Rate without efficiency (d−1) 2859 121
Rate with detector efficiency (d−1) 2287 97
Integrated rate (y−1) 6.20 105 2.82 104
Table 11: RENO antineutrino rates expected in the near and far detectors, with and without reactor and
detector efficiencies. The integrated rate in the last line includes detector efficiency, dead time, and reactor
off periods averaged over a year.
R1 R2 R3 R4 R5 R6
near 3.0 % 7.8 % 39.2 % 39.2 % 7.8 % 3.0 %
far 14.8 % 16.9 % 18.3 % 18.3 % 16.9 % 14.8 %
Table 12: RENO rate contributions from each reactor core.
sensitivity gained by statistics is compensated by a loss of information on the νe rates and energy spectra.
Thus the appeal of this site is diminished. The sensitivity is equivalent to a 5.8 GWth NPP reactor neutrino
experiment for σpwr = 2.0 %.
The RENO collaboration considers using 3 small very near detectors (200 – 300 kg) to monitor sub-groups
of cores of the NPP. However, taking into account current knowledge on reactor spectra (σabs = 2.0 %,
σshp = 2.0 % [15]), even dedicated detectors with 10
5 νe events will not improve the thermal power knowledge
below ∼ 2 %. Thus, the overall sin2(2θ13) sensitivity will not improve.
8 Discussion
We develop here a two step comparison of the experiments described before: Double Chooz phase I (single
detector, DC I ), phase II (both detectors, DC II ), Daya Bay phase I (DB Mid) and phase II (DB Full) and,
RENO (RN ). The first elements of comparison are based on a single core equivalent approach. Although
purely hypothetical, this analysis provides a lot of information on the impact of the layout of the site
(locations of NPP reactor cores and detectors). The second approach, giving far more information on the
impact of systematics on each experiment, is based on the pulls-approach, presented in section 3 and detailed
in the Appendix. Note that in the following discussion we adopt the approximation that all the NPP cores
operate with the same average efficiency. However, this assumption is not guaranteed, and, especially for
NPP with many reactors such as RENO and Daya Bay experiments, running time and procedure of each
NPP core have to be taken into account in the final analysis. For the Double Chooz experiment, which
places the near detector on the flux iso-ratio line of the far detector, the full running operation time and
procedure of each core is not needed to perform the final analysis.
8.1 The single core equivalent approach
First of all we may simplify each experiment to its roughest single core equivalent (SCE) with total matching
power P =
r=1 P
r, with P r the power of the rth reactor and Nr the number of available reactors on site.
In this case, we compute the sin2(2θ13) sensitivity for each experiment (see Table 13) for their baseline
option [25, 24, 26] but also for a single core equivalent, and the averaged near and far detector locations
computed with
)−1/2
. (11)
This average distance, L, yields the same event rates associated to near and far detectors of each experiment
except for the Daya Bay Phase II experiment. For this particular case, we have to compute this average
distance in two steps for the equivalent near detector. In this special case, since there are two near sites,
we compute the L for the DB near site, LDB, and for the LA near site, LLA, and then compute the overall
single near detector equivalent distance as
)−1/2
. (12)
Following this SCE simplification, numerical values of sin2(2θ13) sensitivity are gathered in Table 13. As
DC DB Mid DB Full RN
LA II OFF LA II ON
L (in r.n.u.) 17 67 101 303 71
Lfar (in m) 1,051 931 951 1,716 1579
Lnear (in m) 274 484 576 441
⋆ 325
sin2(2θ13)lim 0.0278 0.0410 0.0381 0.0110 0.0213
sin2(2θ13)
lim 0.0274 0.0274 0.0289 0.0105 0.0176
Table 13: In this table we provide the luminosity (expressed in r.n.u., see eq. (3)), sin2(2θ13)lim at 90 % C.L.
of Double Chooz (DC II ), Daya Bay phase I (DB Mid) and phase II (DB Full), and RENO (RN ). Also
quoted in this table are the average distances, L, to a single equivalent core (SCE) with total power
r=1 P
r (see text for explanations, the star (⋆) indicates a particular treatement). The sensitivity in
the SCE case, sin2(2θ13)
lim , is then computed to highlight possible drawback on the site configuration by
comparison with the baseline sensitivity sin2(2θ13)lim.
a first remark, the biggest discrepancy between sin2(2θ13)lim and sin
2(2θ13)
is as large as 40 % for
the DB Mid experiment. With four to six times higher luminosity compared to DC, a farther near site
from the cores and a closer far site, the sensitivity for the hypothetical DB Mid SCE experiment is not
improved with respect to DC. The discrepancy between DB Mid and DB Mid SCE clearly comes from the
wide repartition of NPP. LA cores can not be properly monitored at the DB near site. We conclude that
the DB Mid experiment is half way between an experiment with a single far detector and an experiment
with two identical detectors, one near and one far. One may notice that DC is an experiment where the
reactor cores may be considered as a single equivalent core with double power since there is only a less
than 1 % difference between DC and DC SCE. This is not the case for the RN experiment where the
relative discrepancy between RN and RN SCE is at the level of 15 %. Daya Bay Phase II, thanks to its two
near sites, is only marginally affected by a ∼ 5 % difference between DB Full and DB Full SCE (more on
section 8.2).
8.2 The pulls analysis
In this discussion we are interested in assessing how much a given experiment sensitivity relies on the
knowldege of the systematics (e.g.the number of systematics and the impact on sensitivity). The pulls-
approach is perfectly adapted to this analysis. The idea is to break down the total ∆χ2
∆χ2(slim) = χ
2(slim)− χ2min (13)
= min
{α1,...,αK}
χ2(slim, α1, . . . , αK)− min
{s,α1,...,αK}
χ2(s, α1, . . . , αK) ,
(where s is shorthand for sin2(2θ13)) (14)
into sub-parts δχ2i which represent their relative contribution to the overall ∆χ
2 of eq. (13). Since the
sensitivity limit on sin2(2θ13) is computed at 90 % C.L., ∆χ
2 has a common value for every experiment
and is equal to 2.71. Thus, δχ2i is defined as
δχ2i =
ith pull term
with the trivial induced normalization:
i = 1.
Table 14 shows the results of our computation for the baseline option of each experimental setup. We report,
together with the final sensitivities discussed earlier, the relative contributions δχ2i where δχ
N1,N2,F
are the
observables (the contribution from the first term in eq. (6) for the respective detector) and the other δχ2i
are the pulls (the weight term contributions in eq. (6)). Since the role of the far detector is to determine
δχ2i in % DC I DC II DB Mid DB Full RN
δχ2N1 — 3.0 % 4.3 % 1.1 % 1.5 %
δχ2N2 — — — 3.3 % —
δχ2F 29.5 % 38.0 % 23.4 % 31.2 % 34.4 %
29.1 % 1.5 % 9.0 % 1.0 % 7.9 %
18.4 % 1.3 % 8.4 % 0.5 % 1.0 %
δχ2rel — 48.3 % 6.6 % 56.8 % 28.5 %
δχ2scl,abs 6.5 % 1.2 % 6.1 % 0.1 % 0.1 %
δχ2scl,rel — 5.0 % 11.8 % 1.6 % 0.2 %
1.0 % 0.8 % 0.4 % 0.1 % 0.5 %
δχ2pwr 14.7 % 0.8 % 27.3 % 3.9 % 16.9 %
δχ2cmp 0.1 % 0.0 % 0.6 % 0.1 % 9.1 %
δχ2ε 0.6 % 0.0 % 2.2 % 0.3 % 0.0 %
sin2(2θ13)lim 0.054 0.028 0.041 0.011 0.021
Table 14: Relative contributions, δχ2i , to the global ∆χ
2. The higher is the δχ2i contribution, the more
the sin2(2θ13) sensitivity depends on the considered parameter. In red we highlight the main contributions
(20 – 60 %), in orange other significant terms (5 – 20 %). All the values are calculated for the base case of
each experimental setup. For the DB Full setup, where correlations between detectors on a same site are
possible, we take σrel half correlated and half uncorrelated between detectors of a same site, and completely
uncorrelated between detectors of different sites (see section 7.2 for details). Note that for Daya Bay we do
not include here batch-to-batch uncertainties.
sin2(2θ13), it is obvious that the associated residual should significantly contribute to the sensitivity. How-
ever what we see is that all computed sensitivities mainly depend on systematics, which contribute from 60 %
to 70 % of the overall ∆χ2. In the concept of identical detectors, correlated uncertainties between detectors
should weakly impact the sensitivity (at the level of near detector “precision”). This is automatically the
case if the near detector successfully monitors the NPP cores. Two of the quoted experimental setups reach
this goal: DC II and DB Full. Double Chooz, with its final two detector installation, has one dominant
systematic: the relative normalization uncertainty. The second most important contribution comes from
the relative energy scale uncertainty.
Daya Bay full installation is mostly limited by the relative normalization uncertainty. The weaker impact
of the relative energy scale uncertainties comes from the better far site distance to the NPP Cores. The
uncertainty on the energy scale matches less the oscillation induced distortion. On the other hand, three
of the described experimental setups still rely on theoretical knowledge of the spectrum and NPP cores
associated uncertainties: DC I, DB Mid and RN. In particular, the DB Mid installation is sensitive to
several systematics, especially the NPP power uncertainties. Taking data on a longer time scale (3 years,
for instance) with such a detector configuration will not improve the sin2(2θ13) sensitivity as much. Single
core power uncertainties do not weaken the sensitivity in two particular cases:
– if the near and far detectors have the same NPP core flux ratio contributions (this is the case of
DC II);
– if the far detector distance to each NPP core is the same, even if the spectra ratios are not the same in
near/far detectors. In this particular case, the oscillation pattern is not entangled with power uncer-
tainties in the far detector, since the νe travelling distances are the same. Power uncertainties would
only contribute, weakly, through the absolute normalization and correlations with other systematics
in the near detector.
The DB site configuration does not meet any of the above conditions. Moreover, for DB Mid, no near site
monitors the LA I NPP. This makes the DB Mid setup an intermediate between a two identical near/far
detector experiment and a single far detector configuration. Since the near site is farther away from the
NPP cores (Table 13), theoretical uncertainties on the spectrum have a larger impact than in DC II. Also,
since the average distance of DB Mid far site is closer, the oscillation pattern matches slightly better with
the energy scale associated distortion (bigger contribution than in DC II).
The RENO far site location is the best among the quoted setups in canceling the impact of the relative
and absolute energy scale uncertainties. However, because the site configuration does not fill any of the two
conditions for the cancellation of the NPP core power uncertainties, this experiment relies on the precision
with which each core power can be determined. Moreover, the near detector is a bit farther away than in
the DC II case, which explains why the global reactor νe rate is less effectively determined and have a larger
contribution than in DC II to the final sin2(2θ13) sensitivity.
8.3 Touching the “right systematic chord”
In the previous section we have determined the dominant systematics of each setup. In this section we
focus on the comparison of all the reactor experiments under the assumption that systematics are known
at the same level. Moreover, we want to illustrate the impact of the determination of the most significant
systematics on the sensitivity of each experimental setup:
• the single core power uncertainties;
• the relative normalization between detectors;
• the energy scale uncertainties (absolute and relative, between detectors).
In the CHOOZ experiment, the power uncertainties were assessed at 0.6 % [9]. However, this estimate
was uniquely based on the heat balance of the steam generators. Even if quite precise, this method could
be inaccurate. Other methods, such as the external neutron flux measurements, more directly linked to
the fission rates inside the cores, lead to a power evaluation less effectively determined with an assessed
error around 1.5 %. This latter method allows continuous tracking of the NPP core power variations. The
Double Chooz and Daya Bay proposals set their baseline estimates of these uncertainties at a conservative
value of 2 %. We studied the impact of this knowledge on the sensitivity by assuming two extreme scenarios:
in the worst case, a 3 % error, and the best case of 0.6 % precision. As a standard, we set central value
σpwr = 2.0 % for all the experiments.
The relative normalization between detectors is the most significant systematic in two identical detector
setups, with a near detector successfully monitoring the whole NPP (DC II and DB Full setups). The
Total
∆m231
0.4 0.6 0.8 1 1.2 1.4 1.6
Double Chooz Phase I
 sin2(2θ13)lim = 0.0539 (90 % C.L.)
Total
∆m231
0.4 0.6 0.8 1 1.2 1.4 1.6
Double Chooz Phase II
 sin2(2θ13)lim = 0.0235 (90 % C.L.)
Total
∆m231
0.4 0.6 0.8 1 1.2 1.4 1.6
Daya Bay Phase I
 sin2(2θ13)lim = 0.0402 (90 % C.L.)
Total
∆m231
0.4 0.6 0.8 1 1.2 1.4 1.6
Daya Bay Phase II
 sin2(2θ13)lim = 0.0106 (90 % C.L.)
Total
∆m231
0.4 0.6 0.8 1 1.2 1.4 1.6
 sin2(2θ13)lim = 0.0180 (90 % C.L.)
Common systematic framework
σabs 2.0 %
σshp 2.0 %
σrel 0.4 %
σpwr 2.0 %
0.5 %
(eV2) 2.5 10−3
Best Worst
σpwr 0.6 % 3.0 %
σrel 0.2 % 0.6 %
0.0 % 1.0 %
(eV2) 3 10−3 2 10−3
Figure 7: Double Chooz, Daya Bay and RENO sensitivities as a function of the size of the main systematics.
The common systematic framework is what experimentalists believe to be achievable, without any further
R&D. It is worth noting that the main difference between the common and the baseline cases comes from
Double Chooz, which takes a conservative value of the relative normalization at 0.6 %. The common
framework is used to compute the reference sin2(2θ13) sensitivity of each setup (value on top of each graph).
Then each systematic (σpwr, σrel, σscl) impact on sensitivity is separately computed and illustrated as
ratio R = sin2(2θ13)best or worst/ sin
2(2θ13)baseline on each graph. The overall impact changing all three
systematics together is also illustrated with the “Total” label. Moreover we also provide a quick guess on
sin2(2θ13) sensitivity behaviour as a function of ∆m
31 best fit value provided by other experiments. For the
Daya Bay Phase II experiment, where possible correlation between detectors on a same site may happen, we
take σrel half correlated and half uncorrelated between detectors of a same site, and completely uncorrelated
between detectors of different sites (see section 7.2 for details).
two available detailed quantifications of this uncertainty are the Double Chooz [24] and Daya Bay [25]
proposals. Each detector is estimated to measure the νe rate to a relative accuracy with respect to each
other of 0.39 % (DB Full) and 0.44 % (DC II). In the Double Chooz proposal, however, a conservative
value has been preferred for the baseline sensitivity calculation: σrel = 0.6 %. We will take this number
as the worst case. The Daya Bay collaboration plans, after some R&D, to reach a relative uncertainty of
0.2 %. This value has been taken as the best case. We set as a standard central value σrel = 0.4 % for all
experiments.
Even if we did not implement detector response in this simulations, we included energy scale uncertainties.
The Double Chooz proposal quotes σabs
= 0.5 % and σrel
= 0.5 %. This is taken as the common central
values for all the experiments. However, in the CHOOZ experiment [9], the energy scale uncertainty was
estimated at the level of 1.1 %. We thus take a 1.0 % uncertainty on both relative and absolute energy scale
determination as the worst case. As a best case, we switched off the impact of energy scale systematics on
the sensitivity.
Although the impact of the ∆m231 uncertainty on the sensitivity to sin
2(2θ13) is negligible with the current
knowledge, the central best fit value on this parameter from other experiments such as MINOS [8], K2K [7]
and SuperK [4], may have some influence on the reactor experiment sensitivities. We thus include the current
∆m231 bounds on the sensitivity computations: the “worst case” is taken to be ∆m
31 = 2.0 10
−3 eV2 and the
“best” one ∆m231 = 3.0 10
−3 eV2 (2 σ bounds from [13]). The standard central value for all the experiments
has been taken to be ∆m231 = 2.5 10
−3 eV2.
We illustrate in Figure 7 these three systematic scenarii: best, central and worst cases. Each contribution
is assessed separately, but we also show on this graph the total impact by summing the effect of the
three discussed systematics (σpwr, σrel and σscl) to their respective best, central and worst values. We also
illustrate the impact of the true central value of ∆m231 on the sensitivity. In this representation, we show the
ratio of the computed sensitivity sin2(2θ13)b,w for the best (resp. worst) case over the sensitivity sin
2(2θ13)c
for standard central systematic values. The “Total” bar shows that in Daya Bay and RENO the sensitivity
can vary from 0.6 to 1.2 – 1.3 of the baseline case, for experimental systematics ranging from a “best” to
a “worst” scenario. In the case of Double Chooz, the impact of systematics is less significant, at the level
of 20 % on both sides. In the Double Chooz and DB Mid cases, the sensitivity could be worsened for best
fit values of ∆m231 below 2.5 10
−3 eV2. In DB Full, the sensitivity is quite stable over the current allowed
range for ∆m231 with only a 5 % effect on sensitivity.
Double Chooz is an optimized experiment in the sense of robustness with respect to systematics for a goal
sensitivity in the 0.02 – 0.03 range. Daya Bay phase II is adequate to reach a sensitivity at the level of 0.01.
However, a simpler experiment for this class of sensitivity would be a scaled-up variant of Double Chooz,
with a very close near site at 150 m and a 1.5 km baseline for the far site. At the Diablo Canyon power
plant [19], where two 3.19 GWth cores are operational and modest civil engineering works would be required,
four 20 t detectors (2 near, 2 far) would give a sensitivity of 0.013 after 3 years of data taking.
9 Conclusion
In this work we have presented a detailed comparative analysis of the sensitivities to sin2(2θ13) of upcoming
and proposed reactor experiments.
We have first calculated the sensitivities using all available data published by the respective collaborations
for the baseline of both the systematical uncertainties and the experimental setups. Our results are generally
in good agreement with the sensitivities quoted by the collaborations: 0.054 for Double Chooz Phase I; 0.028
for Double Chooz Phase II; 0.041 for Daya Bay “mid”; 0.0089 for Daya Bay “full”; and 0.021 for RENO.
In the case of Daya Bay, we have additionally evaluated the impact of the proposed filling procedure,
with pairs of near-far detectors filled with the same scintillator batch and 4 different scintillator batches.
If the hydrogen mass fraction is controlled to 0.5 % between different batches, the sensitivity worsens
slightly to 0.0093. We also examined, still for Day Bay, the until-now implicit assumption that the errors
of the relative normalizations between the 8 detectors are fully uncorrelated. In reality this is the most
optimistic scenario, since a part of these uncertainties may come from site-dependent systematics. In the
most pessimistic scenario (all relative normalizations fully correlated for the same site), the Daya Bay
sensitivity would be 0.012.
An important result of this work is that the total thermal power available for an experiment, a figure of
merit that has been often used as a strong argument to “rank” different projects, has a modest impact on
the success of an experiment. Large powers are only available in multi-core sites, which are very difficult to
monitor. The associated systematics can be overwhelming with respect to the benefit from the statistics.
This is very nicely exemplified by the case of RENO, which would reach the same sensitivity with just 2 of
its 6 reactors on, and by Daya Bay “mid”, which results to be just half way between Double Chooz Phase I
and Double Chooz Phase II. We have illustrated how optimal is the use of the available thermal power in
each site through a comparison of the real experiments with ideal “single core equivalent” setups: Double
Chooz and Daya Bay “full” are nearly optimal, Daya Bay “mid” and RENO do not make full advantage of
their huge power. Daya Bay pays for the complexity of its nuclear power plant by the inevitable construction
of two near sites.
We have carried out a detailed and unified χ2−pull analysis for all the experiments, under common assump-
tions for all systematic uncertainties. This study allowed us to compare all projects on an equal footing and
evaluating the impact of each single systematic uncertainty on the sensitivity of each experiment. In this
calculation the Double Chooz baseline sensitivity is 0.0235 and a single systematic dominates, which is the
error of the relative normalization between near and far detector. This shows that taking into consideration
its small mass, Double Chooz is an optimized experiment. With respect to the other projects, the Double
Chooz sensitivity has a more pronounced dependence on the best fit ∆m213, with sin
2(2θ13)lim ranging from
0.020 to 0.031 for the currently 2 σ−allowed ∆m213 interval.
Daya Bay “full” proves to be robust with respect to the size of the systematical errors and to variations of
∆m213 and is the only partly approved project with an achievable sensitivity potential below 0.01. The Daya
Bay sensitivity is dominated by the accuracy of the relative normalization between detectors and the degree
of correlation existing on a same site between the latter parameters. With the knowledge of the systematics
we have today, the Daya Bay sensitivity would be on average between 0.0094 and 0.0123, depending on the
degree of correlation of the systematics between detectors on the same site.
Contrary to Double Chooz and Daya Bay, the sensitivity of RENO is largely degraded by the uncertainties
of the reactor powers and fuel composition. This, again, shows that the site is not optimal. Nevertheless,
due to the large target mass and optimal baseline, a very competitive sensitivity, around 0.02, is achievable.
This pull analysis also shows that the impact of the backgrounds on the χ2 is minor with respect to other
systematics. Backgrounds, at least in our simulation, are therefore not critical in any of the analyzed
experiments.
Taken into consideration all the above results, we come to a conclusion about the features of the optimal
experiment to approach a sin2(2θ13) sensitivity of 0.01. Where by “optimal” we mean a robust sensitivity,
the simplest configuration, the minimal amount of civil works and the smallest mass. Such an optimal
experiment would have a nuclear power plant layout as simple and powerful as Double Chooz; a favorable
topography for sufficiently deep underground laboratories, a very close, single near site, a ∼ 1.5 km baseline
for the far site and a total target mass of about the half of Daya Bay. Diablo Canyon, California, is a good
example of an suitable site for such an experiment.
Acknowledgement
We are greatful to A. Milsztajn for the careful proofreading of this article and his useful comments. We
wish also to acknowledge M. Lindner for fruitful discussions on the comparison of the θ13 experiments. It
is also a pleasure to thank M. Cribier and H. de Kerret for continuous helpful exchanges.
Appendix
In this section we first describe the way we computed event rates, and then the χ2 analysis implementation
with detailed systematic inclusions.
Event rates
The visible energy inside the detector has a simple expression as a function of the positron energy or neutrino
energy of the inverse β-decay reaction:
Evis = Ee+ +me ≃ Eν − Ethrν + 2me , (16)
where Ethrν = Mn−Mp+me is the νe threshold energy of the reaction. The event rates produced in reactor
R and recorded in detector D per visible energy bin [Ei;Ei+1] may be written as
i (θ13,∆m
31) =
∫ Ei+1−me
Ei−me
(Ee+ , Eν) (17)
(Ee+ , Eν) =
4πL2R,D
h(LR,D, L)σ(Eν)φ
R,D(Eν) ǫ(Ee+) (18)
×R(Ee+ , Eν)Pee(Eν , L, θ13,∆m231) ,
where h(LR,D, L), σ(Eν ), ǫ(Ee+), φ
R,D(Eν), R(Ee+ , Eν) stand for the finite size effect function the νe
inverse β-decay reaction cross section, the e+ detection efficiency, the νe flux from reactor R in detector D
and the energy response of the detector respectively. The generic normalization factor NR,D is the product
of the experiment life time by the number of available free protons inside the target volume, the global load
factor of reactor R and the dead time of detector D. The νe flux from reactor R in detector D is described
in term of the isotope composition as:
φR,D(Eν) = P
Efisℓ
ℓ (Eν) (19)
where ℓ = 235U, 238U, 239Pu, 241Pu labels the most important isotopes contributing to the νe flux, C
is the relative contribution of the isotope ℓ to the total reactor power (CRℓ = N
R, and Nfisℓ is the
number of fissions per second of isotope ℓ), Efisℓ is the energy release per fission for isotope ℓ and P
R is the
thermal power of reactor R. In eq. (19), φ
ℓ (Eν) is the energy differential number of neutrinos emitted
per fission by the isotope ℓ, and we adopt the parameterization for the φ
ℓ (Eν) from ref. [16]. For the C
we take a typical isotope composition in a nuclear reactor given in eq. (2).
We take for Pee the oscillation probability expressed in eq. (4). We assume in this article a constant efficiency
of ǫ(Ee+) = 80 % and a Gaussian energy response function
R(Ee+ , Eν) =
2πρ(Ee+)
Ee+ − Eν +Mn −Mp
2ρ(Ee+)
. (20)
with an energy resolution of ρ(Ee+) = 8 %/
Ee+ [MeV ]. The finite size effect function h(LR,D, L) is
assumed to be a Dirac distribution (pointlike sources and detectors) in this paper. The total event rates in
ith bin of detector D is then simply expressed as
NDi (θ13,∆m
31) =
R=R1,...,RNr
i (θ13,∆m
31) . (21)
χ2 analysis and systematics inclusion
The computed event rates, N
i , are then included in a χ
2 pull-approach analysis [14], where correlations
between the systematic uncertainties are properly included:
χ2 = min
i=1,...,Nb
D=D1,...,DN
∆Di −
k,k′=1
k,k′αk′ . (22)
with,
∆Di =
i −NDi
/UDi (23)
and UDi are given by eq. (9). The αk and S
i,k coefficients are described in Table 15.
The SDi,k coefficient represents the shift of the i
th bin of detector D spectrum due to a 1 σ variation in the
kth systematic uncertainty parameter αk. Most of the systematics are expressed as function of N
i of B
quantities already described previously. The energy scale systematic coefficients in SDi,k are defined through
MDi which follows the relation
MDi =
R=R1,...,RNr
i , (24)
where
∫ Ei+1−me
Ei−me
(Ee+ , Eν) . (25)
It is often assumed in the pull-approach that Wk,k′ = δk,k′ . If we keep this definition of Wk,k′ then we
are faced with the problem that the reactor spectrum shape uncertainties may contribute to the absolute
normalization error and the fuel composition uncertainties may contribute to the reactor core power errors.
If we want to get rid of these free contributions we could use two methods:
1. the first one consists to infer that
ℓ αcmp,ℓC
ℓ = 0
2. the second one is to introduce additional weight terms in the χ2 definition (22):
ℓ αcmp,ℓC
ℓ /εcmp
The first method allows disentangling fuel composition from power uncertainties. However, this assumption
is a bit too restrictive since in practice the sum
ℓ αcmp,ℓC
R = 0 is only constrained at the knowledge
level of the power of reactor R. The second method has the advantage of allowing an estimate of the level
of contribution of this systematic to the power uncertainties. Moreover this simply leads to redefining the
Wk,k′ matrix as
W−1k,k′ = δk,k′ +
ℓ,ℓ′,R
cmp,R
ℓ,ℓ′,k,k′
CRℓ C
ε2cmp
, (26)
cmp,R
ℓ,ℓ′,k,k′ =
δℓ,k−kR
δℓ′,k′−kR
if k, k′ ∈ {kR0 + 1, . . . , kR0 + 5},
kR0 = 5Nd +Nb +Nr + 2 + 4(R− 1),
0 otherwise.
With these definitions, εcmp determines at which level the fuel composition uncertainties are allowed to
contribute to the core power errors. One wants typically that fuel composition uncertainties contribute
within the allowed region of power uncertainty. Thus,
εcmp = σpwr/P
R . (28)
Regarding the fuel composition uncertainties, σcmp may be assessed roughly at the level of σpwr uncertainties:
σ2cmp = 2 − 3 % . (29)
Error type k SD
i,k ×UDi αDi,k
Absolute normalization 1 σabsN
i αabs
Relative normalization (Ns = 1)
in D1 Ns + 1 σrelN
in DNd Ns +Nd σrelN
Absolute Energy scale Nd + 2 σsclN
i αscl
Relative Energy scale (Ns = Nd + 2)
in D1 Ns + 1 σ
MD1i α
in DNd Ns +Nd σ
Backgrounds (Ns = 2Nd + 2)
accidentals in D1 Ns + 1 σ
BD11,i α
accidentals in DNd Ns +Nd σ
1,i α
cosmogenics in D1 Ns +Nd + 1 σ
BD12,i α
cosmogenics in DNd Ns + 2Nd σ
2,i α
proton recoils in D1 Ns + 2Nd + 1 σ
BD13,i α
proton recoils in DNd Ns + 3Nd σ
3,i α
Reactor spectrum shape (Ns = 5Nd + 2)
in bin 1 Ns + 1 σshpN
1 αshp,1
in bin Nb Ns +Nb σshpN
αshp,Nb
Reactor power (Ns = 5Nd +Nb + 2)
from R1 Ns + 1 σ
from RNr Ns +Nr σ
pwr N
RNr ,D
Reactor R composition (Ns = 5Nd +Nb +Nr + 2 + 4(R− 1))
from 235U Ns + 1 σcmpN
cmp,R
from 239Pu Ns + 2 σcmpN
cmp,R
from 238U Ns + 3 σcmpN
cmp,R
from 241Pu Ns + 4 σcmpN
cmp,R
Table 15: Systematic parameters table. We used the following definitions: Nb is the number of bins in the
reactor spectra, Nd is the number of detectors in the experiment, Nr is the number of reactor cores in the
NPP, Ns is short for the number of previously defined systematic parameters. For specific values of σabs,
σrel, σshp, σBD
, σpwr, σcmp, we refer to the experiment comparison sections 6 – 8.
References
[1] B. T. Cleveland et al., Astrophys. J. 496 (1998) 505; J.N. Abdurashitov et al. [SAGE Collaboration],
J. Exp. Theor. Phys. 95 (2002) 181, astro-ph/0204245; T. Kirsten et al. [GALLEX and GNO Collab-
orations], Nucl. Phys. B (Proc. Suppl.) 118 (2003) 33; C. Cattadori, Talk given at Neutrino04, June
14-19, 2004, Paris, France.
[2] Super-K Collaboration, S. Fukuda et al., Phys. Lett. B 539 (2002) 179, hep-ex/0205075; J. Hosaka et
al. hep-ex/0508053.
[3] SNO Collaboration, Q.R. Ahmad et al., Phys. Rev. Lett. 89, 011302 (2002), nucl-ex/0204009;
B. Aharmim et al., Phys. Rev. C 72, 055502 (2005), nucl-ex/0502021.
[4] Super-K Collaboration, Y. Fukuda et al., Phys. Rev. Lett. 81 (1998) 1562, hep-ex/9807003; Y. Ashie
et al., Phys. Rev. D 71 (2005) 112005, hep-ex/0501064.
[5] K. Eguchi et al. [KamLAND Collaboration], Phys. Rev. Lett. 90 (2003) 021802, hep-ex/0212021.
[6] T. Araki et al. [KamLAND Collaboration], Phys. Rev. Lett. 94, 081801 (2005), hep-ex/0406035.
[7] K2K Collaboration, E. Aliu et al., Phys. Rev. Lett. 94, 081802 (2005), hep-ex/0411038; M. H. Ahn et
al., hep-ex/0606032.
[8] B. J. Rebel [MINOS Collaboration], arXiv:hep-ex/0701049.
[9] M. Apollonio et al. [CHOOZ Collaboration], Phys. Lett. B 466, 415 (1999), hep-ex/9907037; Eur.
Phys. J. C 27, 331 (2003), hep-ex/0301017.
[10] G. Alimonti et al. [BOREXINO Collaboration], Astropart. Phys. 8, 141 (1998).
[11] B. Pontecorvo, J. Exptl. Theoret. Phys. 33 (1957) 549 [Sov. Phys. JETP 6 (1958) 429]; J. Exptl.
Theoret. Phys. 34 (1958) 247 [Sov. Phys. JETP 7 (1958) 172]; Z. Maki, M. Nakagawa and S. Sakata,
Prog. Theor. Phys. 28 (1962) 870.
[12] M. Maltoni et al., New J. Phys. 6, 122 (2004), hep-ph/0405172.
[13] T. Schwetz, Phys. Scripta T127 (2006) 1, hep-ph/0606060.
[14] G. L. Fogli, E. Lisi, A. Marrone, D. Montanino and A. Palazzo, Phys. Rev. D 66, 053010 (2002),
hep-ph/0206162.
[15] Schreckenbach K. et al., Phys. Lett. B160,325(1985); Hahn A.A. et al., Phys. Lett. B218, 365(1989);
F. Von Feilitzsch et al. Phys. Lett. B118 , 162(1982).
[16] P. Vogel and J. Engel, Phys. Rev. D 39 (1989) 3378.
[17] C. Bemporad, G. Gratta and P. Vogel, Rev. Mod. Phys. 74 (2002) 297, hep-ph/0107277.
[18] P. Huber and T. Schwetz, Phys. Rev. D 70, 053011 (2004) hep-ph/0407026.
[19] K. Anderson et al., hep-ex/0402041.
[20] P.Antonioli et al., Astropart. Phys.7, 4 (1997), 357-368
[21] B.P. Heisinger, Muon-induzierte Produktion von Radionukliden, dissertation at the Physik Department
E15, TU-München, (1998).
[22] A.Tang, G. Horton-Smith, V. A. Kudryavtsev, A. Tonazzo, Journal-ref: Phys.Rev. D74 (2006) 053007.
[23] T. Hagner, R. von Hentig, B. Heisinger, L. Oberauer, S. Schönert, F. von Feilitzsch, and E. Nolte,
Astropart. Phys. 14, 33 (2000).
http://arxiv.org/abs/astro-ph/0204245
http://arxiv.org/abs/hep-ex/0205075
http://arxiv.org/abs/hep-ex/0508053
http://arxiv.org/abs/nucl-ex/0204009
http://arxiv.org/abs/nucl-ex/0502021
http://arxiv.org/abs/hep-ex/9807003
http://arxiv.org/abs/hep-ex/0501064
http://arxiv.org/abs/hep-ex/0212021
http://arxiv.org/abs/hep-ex/0406035
http://arxiv.org/abs/hep-ex/0411038
http://arxiv.org/abs/hep-ex/0606032
http://arxiv.org/abs/hep-ex/0701049
http://arxiv.org/abs/hep-ex/9907037
http://arxiv.org/abs/hep-ex/0301017
http://arxiv.org/abs/hep-ph/0405172
http://arxiv.org/abs/hep-ph/0606060
http://arxiv.org/abs/hep-ph/0206162
http://arxiv.org/abs/hep-ph/0107277
http://arxiv.org/abs/hep-ph/0407026
http://arxiv.org/abs/hep-ex/0402041
[24] F. Ardellier et al., hep-ex/0405032; S. Berridge et al., hep-ex/0410081; M. Goodman, Th. Lasserre et
al., hep-ex/0606025.
[25] X. Guo [Daya Bay Collaboration], hep-ex/0701029. Yifang Wang,
hep-ex/0610024. Seminar of D. Jaffe at CEA/Saclay on 0ctober 23 2006,
www-dapnia.cea.fr/Phocea/file.php?class=std&file=Seminaires/1425/1425.pdf
[26] http://neutrino.snu.ac.kr/RENO/
[27] http://www.ba.infn.it/~now/now2006/Program/program.html
[28] M. Aoki et al., hep-ex/0607013.
[29] J.C. Anjos et al., hep-ex/0511059.
[30] CEA, Nuclear power plants in the world (2005).
http://132.166.172.2/fr/Publications/trilogie/Elecnuc-2005.pdf
CEA, Energy handbook (2005).
http://132.166.172.2/fr/Publications/trilogie/Infos-Energie-2005.pdf
[31] Achkar B. et al., Phys. Lett. B374, 243(1996).
[32] H. Minakata, et al., Phys.Rev. D68 (2003) 033017, hep-ph/0211111.
[33] P. Huber, et al., Nucl.Phys. B665 (2003) 487-519, hep-ph/0303232.
[34] G. Mention, PhD. Thesis, http://doublechooz.in2p3.fr/
[35] M. Apollonio, et al., Eur. Phys. J. C27, (2003) 331, hep-ex/0301017.
[36] S. Eidelman et al., Phys. Lett. B592 (2004).
http://arxiv.org/abs/hep-ex/0405032
http://arxiv.org/abs/hep-ex/0410081
http://arxiv.org/abs/hep-ex/0606025
http://arxiv.org/abs/hep-ex/0701029
http://arxiv.org/abs/hep-ex/0610024
http://arxiv.org/abs/hep-ex/0607013
http://arxiv.org/abs/hep-ex/0511059
http://arxiv.org/abs/hep-ph/0211111
http://arxiv.org/abs/hep-ph/0303232
http://doublechooz.in2p3.fr/
http://arxiv.org/abs/hep-ex/0301017
	Experimental context
	Neutrino oscillation at reactor and 13
	Generic analysis of 13 sensitivity
	Generic overview of systematic error inputs
	Backgrounds: description and modelization
	Comparison of the current proposals
	Detailed systematics review
	Impact of the scintillator composition
	Estimation of the -induced backgrounds
	Reactor experiments baseline sensitivity
	Double Chooz
	Daya Bay
	RENO
	Discussion
	The single core equivalent approach
	The pulls analysis
	Touching the ``right systematic chord''
	Conclusion
ABSTRACT
  We present in this article a detailed quantitative discussion of the
measurement of the leptonic mixing angle theta_13 through currently scheduled
reactor neutrino oscillation experiments. We thus focus on Double Chooz (Phase
I & II), Daya Bay (Phase I & II) and RENO experiments. We perform a unified
analysis, including systematics, backgrounds and accurate experimental setup in
each case. Each identified systematic error and background impact has been
assessed on experimental setups following published data when available and
extrapolating from Double Chooz acquired knowledge otherwise. After reviewing
the experiments, we present a new analysis of their sensitivities to sin^2(2
theta_13) and study the impact of the different systematics based on the pulls
approach. Through this generic statistical analysis we discuss the advantages
and drawbacks of each experimental setup.

<|endoftext|><|startoftext|>
Introduction
Research in high data rate wireless systems has enabled applications to go wireless and become
more interesting, e.g., wireless Internet access, mobile video conferencing and mobile TV on buses
and trains. These applications would have been impossible without high rate wireless transmission
links. As many wireless devices are battery operated, power constraint is often imposed on them
to make sure that they maintain a certain desired lifespan. In this paper, we investigate optimal
routing problem to maximize the transmission rate in the wireless network where there is a power
constraint on each node.
The wireless channel is inherently broadcast, in that messages sent out by a node are heard
by all nodes listening in the same frequency band and in communication range. This opens up
opportunities for richer forms of cooperation among the wireless users/nodes. For example, rather
than using point-to-point multi-hop routing (a direct adaptation from wired networks), where a
node only transmits to the next node in the “route”, cooperative strategies, such as information
theoretic relaying [1, 2, 3] and opportunistic routing [4], could be used. These richer forms of
cooperation can lead to efficient distributed algorithms and can increase the end-to-end data
rates. The gain from cooperation has been shown in information theoretic analyses [5][6] and
demonstrated in practical implementations [7, 8, 9].
We now briefly describe what we mean by these richer forms of cooperation, often using the
term “coding” to highlight that our approach stems from information theory [10]. Figs. 1(a)–(c)
depict wireless networks in which node 1 is the source, nodes 2 and 3 are relays, and node 4
is the destination. In Fig. 1(a), since every node can hear what node 1 transmits, the simplest
strategy is for node 4 to directly decode from node 1, which we call the single-hop coding strategy
(SH). However, when nodes 1 and 4 are situated far apart, signals from node 1 go through severe
attenuation before they reach node 4. This is when relay nodes 2 and 3 can help. Referring to
http://arxiv.org/abs/0704.0499v1
Figure 1: Different coding strategies for multiple relay channels.
Fig. 1(b), node 1 transmits to node 2. Node 2 fully decodes the data and re-transmits to node
3. Node 3 does the same and relays the data to node 4. This is the well known multi-hop coding
strategy (MH). Although we can view relay nodes helping the source to transmit data as a form of
cooperation, as far as decoding is concerned, SH and MH are still point-to-point strategies (a node
only decodes from one node) and we categorize them as non-cooperative coding strategies. Taking
a closer look at MH, we see that node 3 can hear and decode node 1’s transmission (although
it is intended for node 2). This suggests a cooperative coding strategy, depicted in Fig. 1(c), in
which node 3 decodes transmissions from nodes 1 and 2, and node 4 decodes transmissions from
nodes 1–3. This cooperative way of encoding and decoding stems from an information theoretic
approach and is termed the decode-and-forward coding strategy (DF) [1, 2, 3].
Regardless of whether MH or DF is used for data transmission, there is a sequence of nodes
through which data flows. Kurose and Ross [11] define a route as “the path taken by a datagram
between source and destination”. The datagram “hops” from one node to the next node, capturing
the scenario in which a node receives data only from a node behind and forwards data only to the
node in front. However, in the cooperative coding paradigm, data does not flow from one node to
another; rather it is from many to many with complex ways of cooperating. To describe the flow
of information in these new modes of cooperation, we define a route as follows.
Definition 1 The route taken by a packet from the source to the destination is an ordered set of
nodes involved in encoding/transmitting and receiving/decoding of the packet. The sequence of the
nodes in the route is determined by the order in which nodes’ transmit signals first depend on the
packet.
Remark 1 If a group of nodes transmits simultaneously, then they can be ordered arbitrarily
within the group. For example, consider a four node network, in which node 1 first broadcasts the
message, and then nodes 2 and 3 listen and simultaneously transmit to node 4. The route here
can be described by {1, 2, 3, 4} or {1, 3, 2, 4}.
Remark 2 Fig. 1 describes three coding strategies for the same four-node network. The route
for SH in Fig. 1a is {1, 4}. The routes for MH and DF in Figs. 1b & 1c respectively are both
{1, 2, 3, 4}.
Given the myriad of ways in which nodes can cooperate, there is a natural routing problem
in the cooperative coding paradigm. Furthermore, route selection directly affects the end-to-end
data transmission rate. For DF, the current routing solutions for MH cannot be applied trivially.
In this paper, we construct an algorithm to find an optimal route (in terms of maximizing rates)
for DF.
Our contributions in this paper are as follows.
1. We show how much gain one can expect using DF, a cooperative coding strategy, over MH,
a non-cooperative coding strategy, on the same route.
2. We construct an algorithm that finds rate maximizing routes for DF.
3. We construct a heuristic algorithm that runs in polynomial time. We show that the heuristic
algorithm finds an optimal route for DF when the nodes send independent codewords.
4. We implement DF using low-density parity-check (LDPC) codes [12][13]. Also, we show the
performance of codes using different coding strategies and on different routes.
This paper investigates cooperative coding and routing in the wireless network based on an
information theoretic approach. A few idealized assumptions are made (e.g., infinite block length,
unbounded communication range). Some of these assumptions are, however, relaxed in the simu-
lations in Section 8.
1.1 Related Work
Communications in wireless networks has been progressing from MH to that using cooperative
strategies. More research is being directed toward designing codes that are based on information
theoretic cooperative coding strategies to harvest the gain in transmission rates predicted by
information theory. Examples of codes based on cooperative coding strategies include DF-based
Turbo codes [14][15] and LDPC codes [16, 17, 18, 19] for the single relay channel. It has been
mentioned that some of these codes can be extended to the multiple relay channel [2][3][5][6].
In the past, link optimization (i.e., maximizing the transmission rate between node pairs)
and route optimization were done separately. Routing was optimized after the links between the
nodes had been established. Algorithms such as Bellman-Ford [20, Section 24.1][21] and Dijkstra’s
algorithm [22] that assign costs to all links were used to find a route with the lowest cost from
source to the destination. These ways of separating routing and coding are not optimal for MH
or DF as the rates of the links change depending on which route is chosen. Realizing the inter-
dependency between links and routes, it has been suggested that links and routes be jointly
optimized [23, 24, 25, 26]. This gives rise to cross-layering [27] in the OSI model. However, in
these joint routing and coding work, data transmission from the source to destination is still based
on MH. Routing algorithms that are optimized for MH might not be suitable for DF.
In Ad hoc On-demand Distance Vector Routing (AODV) [28] and Dynamic Source Routing
(DSR) [29], the source node broadcasts a route discovery packet. Neighboring nodes receive and
re-broadcast the packet. When the destination receives the packet, a route is formed by tracing
the path that the packet took. These routing algorithms minimize the transmission delay but
might not optimize the transmission rate.
In Extremely Opportunistic Routing (ExOR) [4], a node broadcasts its data to a set of potential
relays. Nodes in this set transmit acknowledgments and then selected nodes forward the data.
Though ExOR does not have predefined routes, MH is used on the effective route taken by a
packet.
As far as we know, routing algorithms for cooperative coding have not been investigated. In
this paper, we propose algorithms to find optimal routes for DF-based codes in the multiple relay
channel. Our work complements code design by finding the best route (rate maximizing) on
which the codes can be used. As previous work focused on cooperative coding for the single relay
channel, in this paper, we implement DF-based LDPC codes on the multiple relay channel. We
then compare the transmission rate of different coding strategies on different routes.
We focus on the multiple relay channel [2][3][5][6], which is a single-source single-destination
network, as a first step towards understanding general multiple-source multiple-destination net-
works. We study DF because it is one of the “more implementable” information theoretic coding
strategies [19, 14, 15, 16, 17, 18].
2 Motivating Cooperation
2.1 Network Model
We consider a D-node network S = {1, 2, 3 . . . , D − 1, D} with one source (node 1) and one
destination (node D). Node i, ∀i ∈ S, either transmits at fixed average power Pi or turns off.
We use the standard path loss model for signal propagation. The received power at node t from
node i is given by Pit = κd
it Pi, where dit is the distance between nodes i and t, η is the path
loss exponent (η ≥ 2 with equality for free space transmission), and κ is a positive constant. The
receiver at node t is subject to thermal ambient noise of power Nt. We assume duplex nodes, i.e.,
nodes can transmit and receive simultaneously. We assume that all nodes have the same noise
variance.
Given this network model, we investigate how nodes can cooperatively send messages from the
source to destination. We study and compare several coding strategies.
Remark 3 We consider single-flow networks. This is the first step in understanding a more
complicated problem of multiple flows. The relevance of our work in multiple-flow networks is as
follows:
1. In a multiple-flow networks where each flow uses an allocated orthogonal channel, the rate
of each flow can be optimized in respective channel using the algorithm derived in this paper.
2. In a multiple-flow network with existing flows, if we wish to add a new flow, the algorithm
in this paper finds an optimal route for the new flow. Note that adding a new flow might
affect existing flows. We can restrict the transmit power of nodes in the new flow to control
the interference introduced.
2.2 Single-Hop Coding Strategy (SH)
In SH, the source directly transmits data to the destination. The signal-to-noise ratio (SNR) at
the destination, node D, is given by γSH(D) = P1DN
. The Shannon capacity of this SH link is
RSH =
log (1 + γSH(D)) . This rate depends on the source-destination distance and can be poor
if the source and destination are situated far away from each other (because of signal attenuation).
Remark 4 We assume that nodes that do not participate in relaying the data for a source-
destination pair do not cause interference. Another way to account for the external noise is to
include it in the receiver noise.
2.3 Multi-Hop Coding Strategy (MH)
In MH, we make use of the relays to aid the transmission from the source to the destination. The
source simply transmits to the next relay. The first relay decodes the message and re-transmits
it to the second relay, and so on until the destination. This can improve the transmission rate
if the attenuation from the source/relay to the next relay is reduced as compared to that in SH.
However, since all relays transmit simultaneously, there exists interference, beside noise, at the
receiver.
In the rest of this paper, we denote a route by M = {m1,m2, . . . ,m|M|}. We define the
set of all possible routes from the source (node 1) to the destination (node D) by Π(S) =
{m1,m2, . . . ,m|M|} : m2, . . . ,m|M|−1 are all possible selections and permutations of the relays
(including the empty set), m1 = 1,m|M| = D
Using the routeM, the SNR at node mt is
γMH(mt,M) = Pmt−1mt
|M|−1
i=1,i6=t
Pmimt +Nmt
. (1)
Since all relays and the destination must fully decode the messages, the transmission rate from
the source to the destination using routeM is
RMH(M) = min
m∈M\{m1}
log (1 + γMH(m,M)) . (2)
We term RMH(M) the rate supported by the route M using MH. The maximum rate using by
MH, optimized over all possible routes, is
RmaxMH = max
M∈Π(S)
RMH(M). (3)
We note that there may exist more than one route that support this maximum rate.
2.4 Decode-and-Forward Cooperative Coding (DF)
Using DF [1, 2, 3], each decoder decodes transmissions from all nodes behind. E.g., the third node
in the route decodes the transmissions from the first and the second node. So, the third node
decodes each source message using two blocks of received codewords. In addition, assuming that
the nodes in front decode the messages correctly, a node knows what they transmit and hence it
cancels the interference from these nodes. It has been shown that in order to maximize the DF
rate on routeM, node mi transmits Xmi =
j=i+1
αmimjPmiUmj , for 0 ≤
j=i+1 αmimj ≤ 1,
∀i = 1, . . . , |M|−1. Umj are independent Gaussian random variables with unit variance. {αij |j =
i+1, . . . , |M|} are the power splits of node i, allocating portions of its transmit power to transmit
independent sub-codewords Uj . Doing this the SNR of node mt in routeM is
γDF(mt,M) = N
αmimjPmimt
. (4)
Using the routeM, DF can achieve rates up to
RDF(M) = max
m∈M\{m1}
log (1 + γDF(m,M)) , (5)
and the maximum rate using by DF is
RmaxDF = max
M∈Π(S)
RDF(M). (6)
Definition 2 We define the reception rate of node m in routeM as Rm(M) =
log (1 + γDF(m,M)) .
It is the rate at which node m can fully decode the messages. The same concept applies to MH.
Remark 5 In practice, a relay in the route might decode a message wrongly, and hence forward
the wrong message. When this happens, the nodes behind, when trying to cancel the co-channel in-
terference introduced by this relay, will introduce more noise at their decoders. While this scenario
is not captured in (4), we allow imperfect interference cancellation in our simulations (Section 8).
2.5 Comparing the Strategies
It is easy to see that, for any chosen routeM with four nodes or more, γDF(m,M) > γMH(m,M), ∀m ∈
M. Also, we can show that for anyM∈ Π(S),
RDF(M) = RMH(M) = RSH, for |M| = 2, (7a)
RDF(M) ≥ RMH(M), for |M| = 3, (7b)
RDF(M) > RMH(M), for |M| ≥ 4, (7c)
 1000
 5  10  15  20  25  30  35
No. of nodes in route, |M|
Samples taken for each |M| = 105
(1) DF rate/MH rate
(1) DF rate/SH rate
(2) DF rate/MH rate
(2) DF rate/SH rate
Figure 2: Ratio of average transmission rate SH, MH, and DF versus |M| for two cases: (1)
d1D = 10 and (2) d1D = |M| − 1.
and RmaxDF ≥ R
MH ≥ RSH.
However, it is not clear how much again, on average, we can expect using DF compared to
MH and SH. Now, we compare the rates of SH, MH, and DF for randomly generated routes of
different lengths in a line topology. We consider two cases: (1) d1D = 10 and (2) d1D = |M| − 1.
Then |M|− 2 nodes are randomly placed along the straight line joining nodes 1 and D. Note that
in case 1, node density increases with the number of nodes while in case 2, the average adjacent
node spacing is constant for all |M|. We set Pi = Ni = 1 for all transmitters and receivers, and
κ = 1, η = 2. For each randomly generated route, we calculate the transmission rate using SH, MH,
and DF. Here we restrict the nodes to transmit independent codewords for easier optimization,
i.e., we set αij = 1, ∀i, ∀j = i+ 1 and αij = 0, ∀j 6= i+ 1.
Fig 2 shows the results for cases 1 and 2. For a route of 25 nodes, the DF rate is roughly two
orders of magnitude higher than that of SH and MH for both cases. Moreover, as we increase the
number of nodes in the route, the gain of DF over MH/SH increases for both cases.
We note that if the nodes are allowed to send arbitrarily correlated codewords, the DF rate
can be higher. For example, consider the route M = {(0, 0), (0.5, 0), (2, 0), (3, 0), (4, 0)}. With
independent codewords,
RDF(M)/RMH(M) = 2.95, but with arbitrarily correlated codewords,
RDF(M)/RMH(M) = 4.40.
3 The Optimal Routing Problem
We define the optimal route set for DF as
QDF , {M ∈ Π(S) : RDF(M) = R
DF } (8)
where Π(S) is a set of all possible routes from the source to the destination. We define the optimal
route set because the rate maximizing route may not be unique. Then the optimal DF routing
problem is
Find at least oneM
DF ∈ QDF and RDF
The optimal route set and routing problem for MH are similarly defined.
Finding optimal routes for MH and DF by brute force is hard, as it involves testing all routes
in Π(S). In Sections 4 and 5, we construct an algorithm that finds M
DF, potentially without
having to test all routes in Π(S). However, this algorithm runs in factorial time in the worst case.
Hence, in Section 7, we proposed a heuristic algorithm, which runs in polynomial time.
4 The Nearest Neighbor Algorithm
Now, we present an algorithm to find an optimal route for DF. In the section, we assume that
nodes use independent codewords, i.e., we set αij = 1, ∀i, ∀j = i+ 1 and αij = 0, ∀j 6= i+ 1.
Remark 6 Although the theorems in this section are proven assuming that the nodes send in-
dependent codewords, we can also show that they hold even when the nodes transmit arbitrarily
correlated codewords [30].
Remark 7 We consider independent codewords in this section because coherent combining is
practically infeasible. When the nodes are operating in the GHz range, it is difficult, if not im-
possible, to synchronize the carriers to nanosecond accuracy. Furthermore, even if we have very
precise clocks, coherent combining is still unlikely in a multiple-node network. For example, in a
four-node route, even if we manage to synchronize nodes 1 and 2 to allow coherent combining at
node 3, we might not be able to ensure that they will coherently combine at node 4.
First, we define the nearest neighbor with respect to a route.
Definition 3 Node i /∈M is a nearest neighbor with respect to the routeM iff
Pmi ≥ Pmj , ∀m ∈ M, ∀j ∈ S \ (M∪ {i}). (9)
Note that nearest neighbor might not be unique. Now, we describe the nearest neighbor algorithm
(NNA).
Algorithm 1 (NNA)
1. Initialize M = {m1}, where m1 = 1.
2. If there exists a unique nearest neighbor i∗ with respect to the current routeM, we append
i∗ to the current route: M←M∪ {i∗}. Else, the NNA terminates prematurely. Since M
is an ordered set, the notation A ∪ B means appending ordered set B to the end of ordered
set A.
3. Step 2 is repeated until the destination node, node D, is added intoM.
The algorithm is said to terminate normally if node D is added to the route. Otherwise, the
algorithm is said to terminate prematurely. If the NNA terminates normally, we have the following
theorem.
Theorem 1 Consider a multiple node wireless network with one source and one destination. If
the NNA terminates normally, then the NNA route is optimal for DF.
To prove Theorem 1, we need the following lemmas.
Lemma 1 When we add the unique nearest neighbor, node a∗, to routeM, the rate supported by
the new route M1 =M∪ {m|M1| = a
∗} is greater or equal than the rate supported by the route
formed by adding any other node toM. Mathematically,
RDF(M∪ {a
∗}) ≥ RDF(M∪ {b}), ∀b ∈ S \ (M∪{a
∗}). (10)
Proof: [Proof for Lemma 1] ConsideringM2, the reception rate of node m|M2| = b is
Rb(M∪ {b}) =
1 +Nb
 , (11)
and the reception rate of the node m|M1| = a
∗ in routeM1 is
Ra∗(M∪ {a
∗}) =
1 +N−1a∗
Pmia∗
 . (12)
Clearly, if Pma∗ ≥ Pmb, ∀m ∈ M with at least one inequality, Ra∗(M1) > Rb(M2). Hence
RDF(M1) ≥ RDF(M2). Hence we have Lemma 1.
We have proven that at any point of time during route construction, in order to maximize the
rate supported by the route, we must choose the nearest neighbor (assuming it exists). Next, we
show that choosing the nearest neighbor will not harm the rate supported by the route even when
more nodes are added.
Lemma 2 Let M = {a∗1, a
2, . . . , a
} be a route formed by adding the nearest neighbor one by
one starting from the source. Now, arbitrarily add K nodes to M. The first node b1 is not
a nearest neighbor and the rest may or may not be nearest neighbors. In other words, M1 =
{a∗1, a
2, . . . , a
, b1, b2, . . . , bK}, where b1 is not a nearest neighbor toM. We can always replace
b1 by the nearest neighbor a
|M|+1
(assuming it exists) to obtain
{a∗1, . . . , a
|M|, a
|M|+1, b1, . . . , bK−1},
if a∗|M|+1 /∈ {b1, . . . , bK−1},
{a∗1, . . . , a
|M|+1
, b1, . . . , bk−1, bk+1, . . . , bK},
if a∗
|M|+1
= bk, for some bk ∈ {b1, . . . , bK−1},
where RDF(M2) ≥ RDF(M1).
Proof: [Proof for Lemma 2] For both cases in (13), the reception rates for the first |M| nodes
in both M1 and M2 remain the same as each of them decodes from the same nodes behind the
route. In equations,
(M2) = Ra∗
(M1), ∀i = 2, 3, . . . , |M|. (14)
From Lemma 1, Ra∗
|M|+1
(M2) > Rb1(M1).
Now, we study the case when a∗
|M|+1
/∈ {b1, . . . , bK−1}. For nodes {b1, . . . , bK−1} inM2, with
an additional node behind, i.e., a∗
|M|+1
, the reception rates of these nodes are higher than the
same nodes inM1:
Rbi(M2) > Rbi(M1), ∀i = 1, 2, . . . ,K − 1. (15)
Now, we study the case when a∗
|M|+1
= bk for some bk ∈ {b1, . . . , bK−1}. Similar to the first
case, with an additional transmitting node a∗
|M|+1
, the nodes {b1, . . . , bk−1} in M2 have higher
reception rates compared to those inM1, i.e.,
Rbi(M2) > Rbi(M1), ∀i = 1, 2, . . . , k − 1. (16)
For nodes {bk+1, . . . , bK}, there is no change in the reception rate because each of them has exactly
the same nodes behind them in bothM2 andM1. So,
Rbi(M2) = Rbi(M1), ∀i = k + 1, k + 2, . . . ,K. (17)
Lemma 2 follows by noting thatRDF(M2) = mini∈M2\{a∗1} Ri(M2) ≥ mini∈M1\{a∗1} Ri(M1) =
RDF(M1).
Lemma 3 For a route that contains all nearest neighbors, the supported rate is always higher or
equal to any route, of the same length, with one or more non-nearest neighbors in it.
Proof: [Proof for Lemma 3] Lemma 3 can be proven by applying Lemma 2 recursively until
the entire set is replaced by nearest neighbor nodes.
Now we consider routes of different lengths but that end on the same node.
Lemma 4 Consider a route M1, where node 1 is the source, and node m|M1| = D is the desti-
nation, that is
M1 = {m
1,m2, . . . ,m|M1|}. (18)
Here, one or more nodes in {m2, . . . ,m|M1|} are not nearest neighbors. The following route where
all nodes are added according to the NNA (assuming that it does not terminate prematurely)
supports rate as good or higher than that supported byM1.
M2 = {m
2, . . . ,m
}, (19)
where m∗|M2| = D and |M1| not necessarily equals |M2|. In other words, RDF(M2) ≥ RDF(M1).
Proof: [Proof for Lemma 4] First of all, we consider the case |M1| = |M2|. The results follows
immediately from Lemma 3. Second, we consider |M1| > |M2|. We consider first |M2| nodes in
M1, i.e.,M
1 = {m
1,m2, . . . ,m|M2|}. Then,
RDF(M2) ≥ RDF(M
1) ≥ RDF(M1). (20)
The first inequality is obtained by applying Lemma 3. M2 andM
1 are of the same length. The
former is formed using the NNA while the latter is not. The second inequality can be argued as
follows. The first |M2| nodes in both routesM
1 andM1 are identical. Hence the reception rates
are the same. However, there are additional nodes in M1 whose reception rate might be lower
than RDF(M
1). Hence, the rate supported byM
1 can only be higher than that ofM1.
Lastly, consider |M2| > |M1|. We replace the transmitting nodes inM1 with nearest neighbors
and obtain
M3 = {m
2, . . . ,m
|M1|−1
,m|M1| = D}. (21)
Note that m|M1| might not be the nearest neighbor. Clearly, using Lemma 2, RDF(M3) ≥
RDF(M1). Now,
RDF(M2)
= min{Rm∗
(M2), . . . , Rm∗
|M1|−1
(M2), Rm∗
(M2),
. . . , Rm∗
|M2|−1
(M2), Rm∗
(M2) = RD(M2)}, (22a)
≥ min{Rm∗
(M3), . . . , Rm∗
|M1|−1
(M3), RD(M3)}, (22b)
= RDF(M3) ≥ RDF(M1). (22c)
The inequality in (22b) is because inM2, {m
, . . . ,m∗
|M2|−1
} are added to {m∗1, . . . ,m
|M1|−1
before D. A necessary condition for this is
Pmn ≥ PmD, ∀m ∈ {m
1, . . . ,m
|M1|−1
∀n ∈ {m∗|M1|, . . . ,m
|M2|−1
}, (23)
with at least one strict inequality for each n. Hence, Rn(M2) > RD(M3), ∀n ∈ {m
, . . . ,m∗
|M2|−1
With additional nodes transmitting to D inM2, RD(M2) > RD(M3). Hence, we have Lemma 4.
Proof: [Proof for Theorem 1] From Lemma 4, we know that if the NNA terminates normally,
the route (from the source to the destination) formed using the NNA can support transmission
rates as high as any other route. In other words, the NNA finds a route that supports the highest
rate achievable by DF. Theorem 1 follows.
Remark 8 We note the NNA terminates normally if and only if a unique nearest neighbor exists
at each step. In the next section, we extend the NNA to an algorithm which terminates normally
given any network topology.
5 The Nearest Neighbor Set Algorithm
In this section, we modify the NNA so that it terminates normally in any multiple node wireless
network with a single source and a single destination. We term this algorithm the nearest neighbor
set algorithm (NNSA). First, we define the nearest neighbor set.
Definition 4 The nearest neighbor set N = {n1, n2, . . . , n|N |} with respect to routeM = {m1,m2,
. . . ,m|M|} is defined as the smallest set N where each n ∈ N ⊆ S \ M satisfies the following
condition.
Pmn ≥ Pma, ∀m ∈M, ∀a ∈ S \ (M∪N ), (24)
with at least one strict inequality for every pair of (n, a) ∈ {(n, a) : n ∈ N , a ∈ S \ (M∪N )}.
Now we describe the NNSA.
Algorithm 2 (NNSA)
1. Starting with the source node, we have M = {1}.
2. Find the nearest neighbor set N . The original route M branches out to |N | new routes as
follows:
Mi ←M∪ {ni}, i = 1, . . . , |N |. (25)
3. For each new route in (25), step 2 is repeated until the destination is added to all routes.
When the algorithm terminates, we end up with many routes from the source to the destination.
We term these routes NNSA candidates. We calculate the supported rate of each candidate and
choose the one which gives the highest supported rate. The following theorem says that any NNSA
candidate that gives the highest supported rate is an optimal route for DF.
Theorem 2 Consider a single-source single-destination multiple node wireless network. The
NNSA candidate routes that give the highest supported rate are optimal for DF.
Proof: [Sketch of proof for Theorem 2] Using the technique used in the proof of Theorem 1,
we can show that adding a node that does not belong to the nearest neighbor set can only
be suboptimal. We can always replace that node with one from the nearest neighbor set and
obtain an equal or higher rate. In other words, we can show that the supported rate of M1 =
{1,m2, . . . ,m|M1|−1, D}, with one or more nodes in {m2, . . . ,m|M1|} not from the nearest neighbor
set, is lower or equal to the supported rate ofM2 = {1,m
2, . . . ,m
|M2|−1
, D}, where all nodes in
M2 are added according to the NNSA. In other words, RDF(M2) ≥ RDF(M1).
The NNSA finds all possible routes for which every node is added from the nearest neighbor
set. Hence one or more of the NNSA candidates must achieve the highest rate achievable by DF.
This gives us Theorem 2.
Remark 9 We can show that a shortest optimal route, defined as some MSOR
∈ QDF, s.t.
|MSOR
| ≤ |M|, ∀M ∈ QDF, is contained in one of the NNSA candidates that supports R
Remark 10 The NNSA might output optimal routes that include more nodes from the network
unnecessarily. In other words, shorter optimal routes exist. However, from Remark 9, we can find
the shortest optimal route by pruning the optimal routes output by the NNSA.
6 Complexity of NNSA
With the NNSA, we can now search for the optimal route in the NNSA candidate set, as compared
to searching in Π(S) using brute force. The number of candidates determines the number of routes
whose rate we need to calculate to find optimal routes for DF. We note that the size of the NNSA
candidate set might still, in the worst case, equal |Π(S)|. Using brute force, the number of
permutations we need to check is |Π(S)| = 1 +
+ · · ·
= O((D − 1)!), where
D is the total number of nodes in the network and
n×(n−1)×···×1
(n−k)×(n−k−1)×···×1
We ran the NNSA on 10000 randomly generated networks with a varying number of nodes
uniformly distributed in a 1m×1m square area. The source, relays and destination were randomly
assigned. On average, half of the NNSA candidate set sizes were less than 0.715% of the size of
Π(S) for the 8-node channel and less than 0.253% for the 11-node channel.
We note that the average size of the NNSA candidate set does grow factorially with the number
of nodes. However this does increase the range of finite size networks for which we can find optimal
routes. Furthermore, the NNSA provides insights for designing heuristic algorithms to find good
routes for DF based codes. In the next section, we propose heuristic algorithms which find routes
in polynomial time.
Remark 11 The NNSA builds routes from the source. Relays are added to the route regardless
of where the destination is. We will use this observation in designing heuristic algorithms.
7 Heuristic Algorithms
In the NNSA, the optimal route is constructed by adding the “next hop” node one by one to the
partial route. The node to be added is from the nearest neighbor set. If the nearest neighbor set
contains more than one node, the current route branches to more than one route, leading to a
possibly large NNSA candidate set size.
To avoid this, we consider a heuristic approach that starts from the source node and repeatedly
adds only one “good” candidate from the nearest neighbor set until the destination is reached. For
the choice of the next hop node, we consider the node which receives the largest sum of received
power from all the node in the existing route. We call this the maximum sum-of-received-power
algorithm (MSPA). By choosing only one node to be added to the partial route, we prevent the
algorithm from branching out to multiple routes. This heuristic approach yields only one route,
regardless of the network size. We now explicitly describe the MSPA.
Algorithm 3 (MSPA)
1. Start with the source node: M = {1}.
2. For every node t ∈ S\M, find the sum of received power from all nodes inM to t,
i∈M Pit.
3. Let a∗ be any node with the highest sum of received power, i.e.,
i∈M Pia∗ ≥
j∈M Pjt, ∀t ∈
S \M. Append node a∗ to the route: M←M∪ {a∗}.
4. Repeat steps 2–3 until the destination is added to the route.
Remark 12 Assuming that the value of the previous sum-of-received-power computations are
cached, the complexity of step 2 in MSPA is O(D) because there are at most (D − 1) nodes
not in the route. The complexity of the comparisons in step 3 is O(D). Steps 2–3 are repeated at
most (D − 1) times, giving a worst case complexity of the MSPA of O(D2). Recall that D = |S|.
It turns out that the MSPA is optimal if the nodes are restricted to sending independent
codewords, as proven in the following theorem.
Theorem 3 In a single-source single-destination multiple node wireless network in which the
nodes send independent codewords, the MSPA route is optimal for DF.
Proof: [Proof for Theorem 3] Consider an optimal route
M1 = {m
2, . . . ,m
k+1, . . . ,m
}. Suppose that the first k nodes of the MSPA route
are the same as this optimal route but the (k + 1)-th node is different, i.e., the MSPA route is
M2 = {m
2, . . . ,m
, a, . . . } where a 6= m∗
Since a is added to the route by MSPA, a necessary condition is
i=1 Pm∗i a ≥
j=1 Pm∗jm
. So,
Ra(M2) ≥ Rm∗
(M1). (26)
Now, consider the case where a 6= m∗i , ∀i = k + 2, . . . , |M|. We add a to M1 and obtain
M3 = {m
2, . . . ,m
, a,m∗
k+1, . . . ,m
}. Then,
(M3) = Rm∗
(M1), i = 2, . . . , k (27a)
Ra(M3) ≥ Rm∗
(M1) (27b)
(M3) > Rm∗
(M1), i = k + 1, . . . , |M|. (27c)
So, RDF(M3) ≥ RDF(M1).
Suppose a = m∗n, for some n ∈ {k + 2, . . . , |M|}. We swap the position of a and obtain
M4 = {m
2, . . . ,m
, a,m∗
k+1, . . . ,m
n−1,m
n+1,m
}. Then,
(M4) = Rm∗
(M1), i = 2, . . . , k, n+ 1, . . . , |M| (28a)
Ra(M4) ≥ Rm∗
(M1) (28b)
(M4) > Rm∗
(M1), i = k + 1, . . . , n− 1. (28c)
So, RDF(M4) ≥ RDF(M1).
In summary, we choose an optimal route. Starting from the second node, we compare the
optimal route with the MSPA route. If the nodes are different, we insert (or swap, if the node is in
the optimal route but at a different position) the node in the MSPA route to the optimal route to
obtain a new optimal route. Repeat this by comparing the new optimal route to the MSPA route
and changing the first node that differ until the MSPA is contained in an optimal route. Then,
we have Theorem 3.
Remark 13 However, unlike the NNA and the NNSA, the MSPA does not output an optimal
route when the nodes are allowed to send arbitrarily correlated codewords. Consider a four-node
network with node coordinates 1(0,0), 2(0.418,0), 3(0.209,0.6755), and 4(0.995,0). Assume Pi =
1, Ni = 1, κ = 1, η = 2. The MSPA route is M1 = {1, 2, 4}. The NNSA outputs M1 and
M2 = {1, 2, 3, 4}. It is easy to compute that RDF(M1) = 1.30826 and RDF(M2) = 1.31576.
8 DF with LDPC Codes
In the previous section, we computed achievable rates of different strategies and routes based on
an information theoretic approach. In this section, we compare the different strategies and routes
in a line network using practical low density parity check (LDPC) codes [12][13] with incremental
redundancy. The aims of this section are:
1. To illustrate that DF on the multiple relay channel is implementable.
2. To demonstrate that DF performs better than MH under certain network topologies.
3. To show that routing backward (away from the destination) can be good in DF.
4. To demonstrate that the NNSA route performs better than other routes using DF.
Figure 3: Network topology.
 1e-05
 1e-04
 0.001
 0.01
 0  1  2  3  4  5  6  7  8
Eb/N0 (dB)
LDPC in 4-node network. 1(0,0.5) 2(0,0.4) 3(0,1) 4(0,1.5)
DF M=1234
DF M=134
DF M=1324
DF M=124
SH M=14
MH M=1234
MH M=134
Figure 4: Performance (information bit error rate (BER) versus transmit SNR) of different strate-
gies on different routes in a 4-node network.
From Fig. 4, we see that for a given route, DF performs better than MH. An interesting
observation is that routing backward helps in DF but not MH. We find that the NNSA route
(which is also the MSPA route), i.e., {1, 2, 3, 4}, achieves the lowest BER compared to other
routes using DF.
Remark 14 We note that the total transmit energy differs depending on the length of the route.
One might argue that route {1, 4}, though having a higher BER, is better as only 1/3 power is
consumed compared to route {1, 2, 3, 4}. However, we stress that this paper finds a route that max-
imizes the transmission rate, given that each node must transmit within a given power constraint.
Whether or not the node transmits, it is not important in the route comparison.
Remark 15 We plotted BER versus SNR in Fig. 4. If the maximum raw channel data rate
(in bps) is Ψmax, then the throughput is Ψ = (1 − PER) · Ψmax, where PER is the packet error
rate and depends on the BER and the packet size. In simulations, we found that packet error rate
(PER) had the same behavior as BER.
9 Concluding Remarks
We first showed that DF gives a significant transmission rate gain over MH, for an arbitrary
route in the wireless network. We presented an algorithm, the nearest neighbor set algorithm
(NNSA), to find optimal routes, which maximize the rates achievable by DF. As this algorithm,
in the worst case, runs in factorial time, we designed a heuristic algorithm, the maximum sum-
of-received-power algorithm (MSPA), that runs in polynomial time. We showed that the MSPA
finds an optimal route when the nodes can only send independent codewords. However, unlike
the NNA and the NNSA, the MSPA does not find an optimal route when the nodes are allowed
to send arbitrarily correlated codewords. We implemented DF on practical networks using LDPC
codes with incremental redundancy to compare different routes.
We would like to highlight that for a given route, the choice of coding strategies, be it MH or
DF, does not affect the spatial re-use of the system. In both strategies, the nodes in the route
(except the destination) transmit at the same power level. The difference lies in how the nodes
decode the data.
We also note that there are some practical problems in implementing DF in large networks.
First, since real-world systems have finite transmit power constraints, nodes have a finite commu-
nication range, beyond which they cannot be heard. Second, large networks may be distributed
over a wide area and cooperation over large distances may not be feasible. The solution is to
partition the network into clusters and perform cooperative coding locally in the cluster, e.g., local
DF, in which only nodes in a cluster cooperate with each other.
References
[1] T. M. Cover and A. A. El-Gamal, “Capacity theorems for the relay channel,” IEEE Trans.
Inform. Theory, vol. IT-25, no. 5, pp. 572–584, Sept. 1979.
[2] L. Xie and P. R. Kumar, “An achievable rate for the multiple level relay channel,” IEEE
Trans. Inform. Theory, vol. 51, no. 4, pp. 1348–1358, Apr. 2005.
[3] G. Kramer, M. Gastpar, and P. Gupta, “Cooperative strategies and capacity theorems for
relay networks,” IEEE Trans. Inform. Theory, vol. 51, no. 9, pp. 3037–3063, Sept. 2005.
[4] S. Biswas and R. Morris, “Opportunistic routing in multi-hop wireless networks.” Computer
Communication Review, vol. 34, no. 1, pp. 69–74, 2004.
[5] L. Ong and M. Motani, “Myopic coding in wireless networks,” Proceedings of the 39th Confer-
ence on Information Sciences and Systems (CISS 2005), John Hopkins University, Baltimore,
MD, Mar. 16–18 2005.
[6] ——, “Myopic coding in multiple relay channels,” 2005 IEEE International Symposium on
Information Theory (ISIT 2005), Adelaide Convention Centre, Adelaide, Australia, Sept. 4–9
2005.
[7] A. Sendonaris, E. Erkip, and B. Aazhang, “User cooperation diversity–part i: System de-
scription,” IEEE Trans. Comms., vol. 51, no. 11, pp. 1927–1938, Nov. 2003.
[8] ——, “User cooperation diversity–part ii: Implementation aspects and perfomance analysis,”
IEEE Trans. Inform. Theory, vol. 51, no. 11, pp. 1939–1948, Nov. 2003.
[9] T. L. Lim, V. Srivastava, and M. Motani, “Selective cooperation based on link distance esti-
mations in wireless ad-hoc networks,” The 2006 IEEE International Symposium on a World of
Wireless, Mobile and Multimedia Networks (WOWMOM 2006), Niagara-Falls, Buffalo-NY,
June 26–29 2006.
[10] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley and Sons,
1991.
[11] J. F. Kurose and K. W. Ross, Computer Networking: A Top-Down Approach Featuring the
Intnernet. Pearson Education, 2003.
[12] R. G. Gallager, “Low density parity check codes,” IRE Trans. Inform. Theory, vol. IT-8, pp.
21–28, Jan. 1962.
[13] D. J. C. MacKay, “Good error-correcting codes based on very sparse matrices,” IEEE Trans.
Inform. Theory, vol. 45, no. 2, pp. 399–430, Mar. 1999.
[14] B. Zhao and M. C. Valenti, “Distributed turbo coded diversity for relay channel,” Electronics
Letters, vol. 39, no. 10, pp. 786–787, May 2003.
[15] Z. Zhang, I. Bahceci, and T. M. Duman, “Capacity approaching codes for relay channels,”
2004 IEEE International Symposium on Information Theory (ISIT), Chicago, USA,, June
27–July 2 2004.
[16] M. A. Khojastepour, N. Ahmed, and B. Aazhang, “Code design for the relay channel and
factor graph decoding,” Thirty-Eight Annual Confrernce on Signal, Systems, and Computers,
Asilomar, Pacific Grove, California, Nov. 7–10 2004.
[17] A. Chakrabarti, A. De-Baynast, A. Sabharwal, and B. Aazhang, “Low density parity check
codes for the relay channel,” under revision for IEEE JSAC special issue, 2006.
[18] J. Ezri and M. Gastpar, “On the performance of independently designed ldpc codes for the
relay channel,” 2006 IEEE International Symposium on Information Theory (ISIT 2006),
The Westin Seattle, Seattle, Washington, July 9–14 2006.
[19] P. Razaghi and W. Yu, “Bilayer ldpc codes for the relay channel,” IEEE International Con-
ference on Communications (ICC), Istanbul, Turkey (2006), 2006.
[20] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. MIT
Press and McGraw-Hill, 2001.
[21] B. L. Su, Y. M. Huang, and F. Lee, “Distributed location-aware transmission for ad hoc
networks,” 17th APAN Meetings and Joint Techs Workshop, Honolulu, USA, Jan. 2004.
[22] E. W. Dijkstra, “A note on two problems in connection with graphs,” Numerische Math,
vol. 1, pp. 269–271, 1959.
[23] R. L. Cruz and A. V. Santhanam, “Optimal routing, link scheduling and power control
in multihop wireless networks,” 22nd Annual Joint Conference of the IEEE Computer and
Communications Societies (INFOCOM), vol. 1, pp. 702–711, Mar. 30 – Apr. 3 2003.
[24] Y. Li and A. Ephremides, “Joint scheduling, power control, and routing algorithm for ad-
hoc wireless networks,” 38th Annual Hawaii International Conference on System Sciences
(HICSS), Jan 3–6 2005.
[25] G. Lu and B. Krishnamachari, “Energy efficient joint scheduling and power control for wireless
sensor networks,” Sensor and Ad Hoc Communications and Networks (IEEE SECON), pp.
362–373, Sept 26–29 2005.
[26] J. H. Zhang, H. T. Wu, Q. Zhang, and B. Li, “Joint routing and scheduling in multi-radio
multi-channel multi-hop wireless networks,” 2nd International Conference on Broadband Net-
works, vol. 1, pp. 631–640, Oct. 3–7 2005.
[27] V. Srivastava and M. Motani, “Cross-layer design: a survey and the road ahead,” IEEE
Communications Magazine, pp. 112–119, Oct 2005.
[28] C. Perkins, “Ad hoc on demand distance vector (aodv) routing,” IETF, Internet Draft, draft-
ietf-manet-aodv-00.txt, Nov. 1997.
[29] D. B. Johnson and D. Maltz, “Dynamic source routing in ad hoc wireless networks,” in Mobile
Computing. Kluwer Academic Publishers, 1996, vol. 353.
[30] L. Ong and M. Motani, “Optimal routing for the decode-and-forward strategy in the gaus-
sian multiple relay channel,” accepted and to be presented at the 2007 IEEE International
Symposium on Information Theory (ISIT 2007), Acropolis Congress and Exhibition Center,
Nice, France.
	Introduction
	Related Work
	Motivating Cooperation
	Network Model
	Single-Hop Coding Strategy (SH)
	Multi-Hop Coding Strategy (MH)
	Decode-and-Forward Cooperative Coding (DF)
	Comparing the Strategies
	The Optimal Routing Problem
	The Nearest Neighbor Algorithm
	The Nearest Neighbor Set Algorithm
	Complexity of NNSA
	Heuristic Algorithms
	DF with LDPC Codes
	Concluding Remarks
ABSTRACT
  We investigate cooperative wireless relay networks in which the nodes can
help each other in data transmission. We study different coding strategies in
the single-source single-destination network with many relay nodes. Given the
myriad of ways in which nodes can cooperate, there is a natural routing
problem, i.e., determining an ordered set of nodes to relay the data from the
source to the destination. We find that for a given route, the
decode-and-forward strategy, which is an information theoretic cooperative
coding strategy, achieves rates significantly higher than that achievable by
the usual multi-hop coding strategy, which is a point-to-point non-cooperative
coding strategy. We construct an algorithm to find an optimal route (in terms
of rate maximizing) for the decode-and-forward strategy. Since the algorithm
runs in factorial time in the worst case, we propose a heuristic algorithm that
runs in polynomial time. The heuristic algorithm outputs an optimal route when
the nodes transmit independent codewords. We implement these coding strategies
using practical low density parity check codes to compare the performance of
the strategies on different routes.

<|endoftext|><|startoftext|>
Introduction and main results
Let G be a group. We shall write A(G) for the automorphism group
of G. According to Schweigert [10], we say that an element f ∈ A(G)
is a polynomial automorphism of G if there exist integers ǫ1, . . . , ǫm ∈ Z
and elements u0, . . . , um ∈ G such that
f(x) = u0x
ǫ1u1 . . . um−1x
for all x ∈ G. Since f(1) = 1, it is easy to see that f(x) can be
expressed as a ‘product’ of inner automorphisms, that is
f(x) = (v−1
xǫ1v1) . . . (v
ǫmvm).
We shall write P0(G) for the set of polynomial automorphisms of G.
Actually, Schweigert defines a polynomial automorphism in the context
of finite groups. In particular, in this context, the set P0(G) is clearly
a subgroup of A(G). On the other hand, this is not necessarily the
case when G is infinite. For instance, in the additive group of rational
numbers, the set of polynomial automorphisms forms a monoid with
respect to the operation of functional composition, which is isomorphic
to the multiplicative monoid Z \ {0}.
1991 Mathematics Subject Classification. 20F28, 20F16, 20F18.
Key words and phrases. polynomial automorphism, metabelian group, nilpotent
group, IA-automorphism.
http://arxiv.org/abs/0704.0500v1
2 GÉRARD ENDIMIONI
In this paper, we shall consider the subgroup P(G) = 〈P0(G)〉 of
A(G), generated by all polynomial automorphisms ofG. Hence P0(G) =
P(G) when G is finite, but for example P(G) is distinct from P0(G)
when G is the additive group of rational numbers (note that P(G) =
A(G) in this last case).
It is easy to verify that P0(G) is a normal subset of A(G). Thus
P(G) is a normal subgroup of A(G); in addition, we have
I(G)E P(G)E A(G),
where I(G) is the group of inner automorphisms of G. Also P(G)
contains the group of invertible elements of the monoid P0(G). It is
worth noting that there exist finite groups G such that the quotient
P(G)/I(G) is not soluble [7].
If G is abelian, each polynomial automorphism is of the form x 7→ xǫ,
and so P(G) is abelian. When G is a finite nilpotent group of class
k ≥ 2, it is proved in [4] that P(G) is nilpotent of class k − 1 (see also
[10, Satz 3.5]). We show here that this result remains true when G is
infinite.
Theorem 1.1. Let G be a nilpotent group of class k ≥ 2. Then P(G)
is nilpotent of class k − 1.
Notice that conversely, if P(G) is nilpotent, then so is G since P(G)
contains the group of inner automorphisms.
When G is metabelian, it seems that nothing is known about P(G),
even in the context of finite groups. In this paper, we shall prove the
following.
Theorem 1.2. Let G be a metabelian group. Then the group P(G) is
itself metabelian.
In Section 3, we shall interpret a result of C. K. Gupta as a very
particular case of this theorem (see Corollary 3.1 below).
2. Proofs
As usual, in a group G, the commutator of two elements x, y is de-
fined by [x, y] = x−1y−1xy. Instead of [[x, y], z], we shall write [x, y, z].
We denote by [G,G] the derived subgroup of G.
POLYNOMIAL AUTOMORPHISMS 3
Lemma 2.1. Let f, g be two functions over a group G, respectively
defined by the relations
f(x) = (v−1
xǫ1v1) . . . (v
ǫmvm),
g(x) = (w−1
xη1w1) . . . (w
ηnwn)
(we do not suppose that f and g are automorphisms). Let t be an
element of G such that any two conjugates of t commute. Then we
have the relation
f(g(t)) =
tǫiηj [tǫiηj , vi][t
ǫiηj , wj][t
ǫiηj , wj, vi]
(notice that in this product, the order of the factors is of no conse-
quence).
Proof. Using the fact that any two conjugates of t commute, we can
write
f(g(t)) =
w−1j t
v−1i w
ǫiηjwjvi
tǫiηj [tǫiηj , wjvi].
We conclude thanks to the relation [x, yz] = [x, z][x, y][x, y, z]. �
In a nilpotent group G of class ≤ 2, two conjugates of any element
t ∈ G commute. Therefore, as an immediate consequence of Lemma
2.1, we observe that any two polynomial automorphisms of G commute.
Since these automorphisms generate P(G), we obtain:
Corollary 2.1. If G is a nilpotent group of class ≤ 2, then P(G) is
abelian.
We are now ready to prove our first theorem.
Proof of Theorem 1.1. Since P(G) contains I(G) (which is nilpotent of
class k − 1 exactly), it suffices to show that P(G) is nilpotent of class
at most k − 1. We argue by induction on the nilpotency class k of
4 GÉRARD ENDIMIONI
G. The case k = 2 follows from Corollary 2.1. Now suppose that our
theorem is proved for an integer k ≥ 2 and consider a nilpotent group
G of class k + 1. Denote by ζ(G) the centre of G. One can define a
homomorphism Θ : P(G) → A(G/ζ(G)), where for each f ∈ P(G),
Θ(f) is the automorphism induced by f in G/ζ(G). Clearly, if f is a
polynomial automorphism of G, then Θ(f) is a polynomial automor-
phism of G/ζ(G). Hence Θ(P(G)) is a subgroup of P(G/ζ(G)), and so,
by induction, is nilpotent of class at most k − 1. Since Θ(P(G)) and
P(G)/ kerΘ are isomorphic, it suffices to show that ker Θ is included
in the centre of P(G) and the theorem is proved. For that, consider
an element g ∈ ker Θ and put w(x) = x−1g(x) for any x in G. Thus
g(x) = xw(x) and w(x) belongs to ζ(G) for all x ∈ G. Notice that w
defines a homomorphism of G into ζ(G) since
w(xy) = y−1x−1g(x)g(y) = y−1w(x)g(y) = w(x)w(y).
In order to show that g belongs to the centre of P(G), it suffices to
verify that g commutes with any polynomial automorphism f of G.
Suppose that f is defined by the relation
f(x) = (v−1
xǫ1v1) . . . (v
ǫmvm).
We have easily
f(g(x)) = f(xw(x)) = f(x)f(w(x)) = f(x)w(x)ǫ,
where ǫ = ǫ1 + · · ·+ ǫm. In the same way, by using the fact that w is
a homomorphism, we can write
g(f(x)) = f(x)w(f(x))
= f(x)(w(v1)
−1w(x)ǫ1w(v1)) . . . (w(vm)
−1w(x)ǫmw(vm)),
whence g(f(x)) = f(x)w(x)ǫ. Thus g and f commute, as required, and
the result follows. �
Now we undertake the proof of our second theorem. First we need
the following result, which is well known and easy to prove (see for
example [8, Lemma 34.51] or [9, Part 2, p. 64]).
Lemma 2.2. In a metabelian group G, if t is an element of the derived
subgroup [G,G], we have the relation [t, x, y] = [t, y, x] for all x, y ∈ G.
POLYNOMIAL AUTOMORPHISMS 5
We arrive to the key lemma in the proof of Theorem 1.2. This
lemma shows that when G is metabelian, any element h ∈ [P(G),P(G)]
operates trivially on [G,G] and on G/[G,G].
Lemma 2.3. Let G be a metabelian group. Suppose that h is an ele-
ment of the derived subgroup [P(G),P(G)]. Then
(i) h(t) = t for all t ∈ [G,G];
(ii) x−1h(x) belongs to [G,G] for all x ∈ G.
Proof. (i) Consider the homomorphism Φ : P(G) → A([G,G]) defined
like this: for any f ∈ P(G), Φ(f) is the restriction of f to [G,G]. We
must show that ker Φ contains [P(G),P(G)]. For that, first notice that
any two conjugates of t ∈ [G,G] commute since G is metabelian. Now
we apply Lemma 2.1. If f and g are polynomial automorphisms of G
defined as in this lemma, we obtain the equalities
f(g(t)) =
tǫiηj [tǫiηj , vi][t
ǫiηj , wj][t
ǫiηj , wj, vi],
g(f(t)) =
tǫiηj [tǫiηj , vi][t
ǫiηj , wj][t
ǫiηj , vj, wi],
and so, by Lemma 2.2, f(g(t)) = g(f(t)) for all t ∈ [G,G]. It follows
that [f, g] belongs to ker Φ. In other words, the images of f and g in
P(G)/ ker Φ commute. Since P(G)/ ker Φ is generated by the images
of the polynomial automorphisms, this quotient is abelian. It follows
that ker Φ contains [P(G),P(G)], as desired.
(ii) Here, we consider the homomorphism Ψ : P(G) → A(G/[G,G]),
where for any f ∈ P(G), Ψ(f) is the automorphism induced inG/[G,G]
by f . Since a polynomial automorphism of G induces in G/[G,G]
a polynomial automorphism of G/[G,G], Ψ(P(G)) is a subgroup of
P(G/[G,G]). But P(G/[G,G]) is abelian (see for instance Corollary 2.1
above) and Ψ(P(G)) is isomorphic to P(G)/ kerΨ. Hence P(G)/ ker Ψ
is abelian. Consequently, kerΨ contains [P(G),P(G)] and the result
follows. �
Proof of Theorem 1.2. Let f, g be two elements of [P(G),P(G)]. For
any x ∈ G, put v(x) = x−1f(x) and w(x) = x−1g(x). By Lemma 2.3,
6 GÉRARD ENDIMIONI
v(x) and w(x) belong to [G,G]. Applying again Lemma 2.3, we can
write
f(g(x) = f(xw(x)) = f(x)f(w(x)) = xv(x)w(x).
In the same way, we have g(f(x)) = xw(x)v(x) = xv(x)w(x). It follows
that f and g commute. Thus [P(G),P(G)] is abelian, and so P(G) is
metabelian. �
3. IA-automorphisms of two-generator metabelian groups
By way of illustration, we apply Theorem 1.2 to IA-automorphisms
of a two-generator metabelian group. We recall that an automorphism
of a group G is said to be an IA-automorphism if it induces the iden-
tity automorphism on G/[G,G]. In a free metabelian group of rank
2, each IA-automorphism is inner [1], and so is a polynomial automor-
phism. It turns out that in any two-generator metabelian group, each
IA-automorphism is polynomial. This result is implicit in [2] with a
different terminology. For convenience, we give a proof since this one
is short and elementary.
Proposition 3.1. Each IA-automorphism of a two-generator metabelian
group is polynomial.
To prove this proposition, we shall use the following result.
Lemma 3.1. In a metabelian group G, each function ϕ of the form
x 7→ ϕ(x) = x[x, v1]
η1 . . . [x, vn]
ηn (vi ∈ G, ηi ∈ Z)
is an endomorphism.
Proof. Thanks to the relation [xy, z] = y−1[x, z]y[y, z], we get
ϕ(xy) = xy
y−1[x, vi]y[y, vi]
POLYNOMIAL AUTOMORPHISMS 7
But since the derived subgroup of G is abelian, we can write
ϕ(xy) = xy
y−1[x, vi]
[y, vi]
[x, vi]
[y, vi]
= ϕ(x)ϕ(y),
as required. �
Proof of Proposition 3.1. Suppose thatG is a two-generator metabelian
group generated by a and b. If f is an IA-automorphism of G, we have
f(a) = av and f(b) = bw, where v and w belong to the derived sub-
group [G,G]. Now notice that [G,G] is the normal closure of [a, b].
Therefore, [G,G] is generated by [a, b] and the elements of the form
[a, b, u], with u ∈ G. Hence v and w can be written in the form
v = [a, b]α
[a, b, vi]
w = [a, b]β
[a, b, wi]
where α, β, λ1, . . . , λn, µ1, . . . , µn are integers (possibly equal to 0). By
using the relation [x, y, z] = [x, y]−1[x, z]−1[x, yz], we obtain
v = [a, b]α−λ
[a, vi]
−λi [a, bvi]
w = [a, b]β−µ
[a, wi]
−µi [a, bwi]
where λ = λ1 + · · ·+ λn and µ = µ1 + · · ·+ µn. Now put
ϕ(x) = x[x, b]α−λ[x, a]µ−β
[x, vi]
−λi[x, bvi]
λi[x, wi]
µi [x, awi]
−µi .
By Lemma 3.1, ϕ is an endomorphism of G. Moreover, we have
ϕ(a) = a[a, b]α−λ
[a, vi]
−λi[a, bvi]
λi = av = f(a)
8 GÉRARD ENDIMIONI
since [a, wi] = [a, awi].
In the same way, we get
ϕ(b) = b[a, b]β−µ
[b, wi]
µi [b, awi]
−µi .
By using the identity [a, wi]
−1[a, bwi] = [b, awi]
−1[b, wi] (valid in any
group), we obtain
ϕ(b) = b[a, b]β−µ
[a, wi]
−µi [a, bwi]
µi = bw = f(b).
Thus f = ϕ and the proof is complete. �
We remark that Proposition 3.1 cannot be extended to three-generator
metabelian groups. For example, in the free metabelian group of rank
3 freely generated by a, b, c, consider the IA-automorphism f defined
by f(a) = a, f(b) = b and f(c) = c[a, b]. Suppose that f is polynomial.
Since [a, b] = c−1f(c), the commutator [a, b] would be in the normal
closure of c, hence would be a product of conjugates of c±1. Substi-
tuting 1 for c in this expression gives then [a, b] = 1, a contradiction.
Therefore f is an IA-automorphism which is not polynomial.
As a consequence of Theorem 1.2 and Proposition 3.1, we obtain an
alternative proof of a result due to C. K. Gupta [6] (see also [3]).
Corollary 3.1 ([6]). In a two-generator metabelian group, the group
of IA-automorphisms is metabelian.
Let Md denote the free metabelian group of rank d. By a result of
Bachmuth [1], if d ≥ 3, the group of IA-automorphisms of Md contains
a subgroup which is (absolutely) free of rank d. Thus Corollary 3.1
fails in a d-generator metabelian group when d ≥ 3. Also Bachmuth’s
result shows once again that the group of IA-automorphisms of Md is
not included in P(Md) (if d ≥ 3), since P(Md) is metabelian.
In conclusion we mention that the metabelian groups constitute an
important source of polynomial endomorphisms and automorphisms.
Indeed, by Lemma 3.1, each function of the form
x 7→ x[x, v1]
η1 . . . [x, vn]
ηn (vi ∈ G, ηi ∈ Z)
POLYNOMIAL AUTOMORPHISMS 9
is an endomorphism in a metabelian group G. Besides, when G is
metabelian and nilpotent, such an endomorphism is an automorphism
since in a nilpotent group, every function of the form
x 7→ u0x
ǫ1u1 . . . um−1x
ǫmum (ui ∈ G, ǫi ∈ Z)
is a bijection if ǫ1 + · · ·+ ǫm = ±1 (see [5, Theorem 1]).
References
[1] S. Bachmuth, Automorphisms of free metabelian groups, Trans. Amer. Math.
Soc. 118 (1965), 93–104.
[2] A. Caranti and C. M. Scoppola, Endomorphisms of two-generated metabelian
groups that induce the identity modulo the derived subgroup, Arch. Math. 56
(1991), 218–227.
[3] F. Catino and M. M. Miccoli, A note on IA-endomorphisms of two-generated
metabelian groups, Rend. Sem. Mat. Univ. Padova 96 (1996), 99–104.
[4] G. Corsi Tani and M. F. Rinaldi Bonafede, Polynomial automorphisms in
nilpotent finite groups, Boll. U.M.I. 5 (1986), 285–292.
[5] G. Endimioni, Applications rationnelles d’un groupe nilpotent, C. R. Acad.
Sci. Paris 314 (1992), 431–434.
[6] C. K. Gupta, IA-automorphisms of two-generator metabelian groups, Arch.
Math. 37 (1981), 106–112.
[7] G. Kowol, Polynomautomorphismen von Gruppen, Arch. Math. 57 (1991),
114–121.
[8] H. Neumann, Varieties of Groups, Springer-Verlag, Berlin (1967).
[9] D. J. S. Robinson, Finiteness Conditions and Generalized Soluble Groups,
Springer-Verlag, Berlin (1972).
[10] D. Schweigert, Polynomautomorphismen auf endlichen Gruppen, Arch. Math.
29 (1977), 34–38.
C.M.I-Université de Provence, 39, rue F. Joliot-Curie, F-13453 Mar-
seille Cedex 13
E-mail address : endimion@gyptis.univ-mrs.fr
	1. Introduction and main results
	2. Proofs
	3. IA-automorphisms of two-generator metabelian groups
	References
ABSTRACT
  We prove that if a group is nilpotent (resp. metabelian), then so is the
subgroup of its automorphism group generated by all polynomial automorphisms.

<|endoftext|><|startoftext|>
Introduction
The focusing nonlinear Schrödinger (NLS) equation for the complex valued function Ψ =
Ψ(x, t)
iΨt +
Ψxx + |Ψ|2Ψ = 0 (1.1)
has numerous physical applications in the description of nonlinear waves (see, e.g., the books
[47, 35, 36]). It can be considered as an infinite dimensional analogue of a completely
integrable Hamiltonian system [48], where the Hamiltonian and the Poisson bracket is given
Ψt + {Ψ(x),H} = 0
{Ψ(x),Ψ∗(y)} = i δ(x− y) (1.2)
|Ψx|2 − |Ψ |4
(here Ψ∗ stands for the complex conjugate function). Properties of various classes of so-
lutions to this equation have been extensively studied both analytically and numerically
[5, 6, 7, 8, 16, 21, 25, 26, 29, 32, 34, 43, 44]. One of the striking features that distinguishes
this equation from, say, the defocusing case
iΨt +
Ψxx − |Ψ|2Ψ = 0
is the phenomenon of modulation instability [1, 11, 16]. Namely, slow modulations of the
plane wave solutions
Ψ = Aei(kx−ωt), ω =
k2 − A2
develop fast oscillations in finite time.
The appropriate mathematical framework for studying these phenomena is the theory of
the initial value problem
Ψ(x, 0; �) = A(x) e
S(x) (1.3)
for the �-dependent NLS
i �Ψt +
Ψxx + |Ψ|2Ψ = 0. (1.4)
Here � > 0 is a small parameter, A(x) and S(x) are real-valued smooth functions. Introduc-
ing the slow variables
u = |Ψ|2, v =
(1.5)
the equation can be recast into the following system:
ut + (u v)x = 0
(1.6)
vt + v vx − ux +
The initial data for the system (1.6) coming from (1.3) do not depend on �:
u(x, 0) = A2(x), v(x, 0) = S ′(x). (1.7)
The simplest explanation of the modulation instability then comes from considering the so-
called dispersionless limit �→ 0. In this limit one obtains the following first order quasilin-
ear system
ut +v ux + u vx = 0
vt − ux + v vx = 0
 . (1.8)
This is a system of elliptic type because of the condition u > 0. Indeed, the eigenvalues of
the coefficient matrix (
are complex conjugate, λ = v ± i
u. So, the Cauchy problem for the system (1.8) is ill-
posed in the Hadamard sense (cf. [33, 7]). Even for analytic initial data the life span of a
typical solution is finite, t < t0. The x- and t-derivatives explode at some point x = x0 when
the time approaches t0. This phenomenon is similar to the gradient catastrophe of solutions
to nonlinear hyperbolic PDEs [2].
For the full system (1.6) the Cauchy problem is well-posed for a suitable class of �-
independent initial data (see details in [17, 46]). However, the well-posedness is not uniform
in �. In practical terms that means that the solution to (1.6) behaves in a very irregular way
in some regions of the (x, t)-plane when � → 0. Such an irregular behaviour begins near
the points (x = x0, t = t0) of the “gradient catastrophe” of the solution to the dispersionless
limit (1.8). The solutions to (1.8) and (1.6) are essentially indistinguishable for t < t0; the
situation changes dramatically near x0 when approaching the critical point. Namely, when
approaching t = t0 the peak near a local maximum1 of u becomes more and more narrow
due to self-focusing; the solution develops a zone of rapid oscillations for t > t0. They
have been studied both analytically and numerically in [8, 16, 21, 23, 25, 26, 34, 43, 44].
However, no results are available so far about the behaviour of the solutions to the focusing
NLS at the critical point (x0, t0).
The main subject of this work is the study of the behaviour of solutions to the Cauchy
problem (1.6), (1.7) near the point of gradient catastrophe of the dispersionless system (1.8).
In order to deal with the Cauchy problem for (1.8) we will assume analyticity2 of the initial
data u(x, 0), v(x, 0). Then the Cauchy problem for (1.8) can be solved for t < t0 via a
suitable version of the hodograph transform (see Section 2 below). An important feature of
the gradient catastrophe for this system is that it happens at an isolated point of the (x, t)-
plane, unlikely the case of KdV or defocusing NLS where the singularity of the hodograph
solution takes place on a curve. We identify this singularity for a generic solution to (1.8)
as the elliptic umbilic singularity (see Section 4 below) in the terminology of R.Thom [42].
This codimension 2 singularity is one of the real forms labeled by the root system of the D4
type in the terminology of V.Arnold et al. [3].
Our main goal is to find a replacement for the elliptic umbilic singularity when the disper-
sive terms are added, i.e., we want to describe the leading term of the asymptotic behaviour
for �→ 0 of the solution to (1.6) near the critical point (x0, t0) of a generic solution to (1.8).
Thus, our study can be considered as a continuation of the programme initiated in [13]
to study critical behaviour of Hamiltonian perturbations of nonlinear hyperbolic PDEs; the
fundamental difference is that the non perturbed system (1.8) is not hyperbolic! However,
many ideas and methods of [13] (see also [12]) play an important role in our considerations.
The most important of these is the idea of universality of the critical behaviour. The
1Regarding initial data with local minima we did not observe cusps related to minima in numerical simula-
tions. We believe they do not exist because of the focusing effect in the NLS that pushes maxima to cusps but
seems to smoothen minima.
2We believe that the main conclusions of this paper must hold true also for non analytic initial data; the
numerical experiments of [8] do not show much difference in the properties of solution between analytic and
non analytic cases. However, the precise formulation of our Main Conjecture has to be refined in the non
analytic case.
general formulation of the universality suggested in [13] for the case of Hamiltonian pertur-
bations of the scalar nonlinear transport equation says that the leading term of the multiscale
asymptotics of the generic solution near the critical point does not depend on the choice of
the solution, modulo Galilean transformations and rescalings. This leading term was identi-
fied in [13] via a particular solution to the fourth order analogue of the Painlevé-I equation
(the so-called P 21 equation). The existence of the needed solution to P
1 has been rigorously
established in [9]. Moreover, it was argued in [13] that this behaviour is essentially indepen-
dent on the choice of the Hamiltonian perturbation. Some of these universality conjectures
have been partially confirmed by numerical analysis carried out in [21].
The main message of this paper is the formulation of the Universality Conjecture for
the critical behaviour of the solutions to the focusing NLS. Our considerations suggest the
description of the leading term in the asymptotic expansion of the solution to (1.6) near the
critical point via a particular solution to the classical Painlevé-I equation (P-I)
Ωζζ = 6 Ω
2 − ζ.
The so-called tritronquée solution to P-I was discovered by P.Boutroux [4] as the unique
solution having no poles in the sector | arg ζ| < 4π/5 for sufficiently large |ζ|. Remarkably,
the very same solution3 arises in the critical behaviour of solutions to focusing NLS!
The paper is organized as follows. In Section 2 we develop a version of the hodograph
transform for integrating the dispersionless system (1.8) before the catastrophe t < t0. We
also establish the shape of the singularity of the solution near the critical point; the latter is
identified in Section 4 with the elliptic umbilic catastrophe of Thom. In Section 3 we develop
a method of constructing formal perturbative solutions to the full system (1.6) before the
critical time. In Section 5 we collect the necessary information about the tritronquée solution
of P-I and formulate the Main Conjecture of this paper. Such a formulation relies on a much
stronger property of the tritronquée solution: namely, we need this solution to be pole-free
in the whole sector | arg ζ| < 4π/5. Numerical evidence for the absence of poles in this
sector is given in Section 6. In Section 7 we analyze numerically the agreement between the
critical behaviour of solutions to focusing NLS and its conjectural description in terms of the
tritronquée solution restricted on certain lines in the complex ζ-plane. In the final Section 8
we give some additional remark and outline the programme of future research.
Acknowledgments. The authors thank K.McLaughlin for a very instructive discussion. One
of the authors (B.D.) thanks R.Conte for bringing his attention to the tritronquées solutions
of P-I. The results of this paper have been presented by one of the authors (T.G.) at the
Conference “The Theory of Highly Oscillatory Problems”, Newton Institute, Cambridge,
March 26 - 30, 2007. T.G. thanks A.Fokas and S.Venakides for the stimulating discussion
after the talk. The present work is partially supported by the European Science Foundation
Programme “Methods of Integrable Systems, Geometry, Applied Mathematics” (MISGAM),
3It is interesting that the same tritronquée solution (for real ζ only) appeares also in the study of certain
critical phenomena in plasma [41]. In the theory of random matrices and orthogonal polynomials a different
solution to P-I arises; see, e.g., [14].
Marie Curie RTN “European Network in Geometry, Mathematical Physics and Applications”
(ENIGMA). The work of B.D. and T.G. is also partially supported by Italian Ministry of
Universities and Researches (MUR) research grant PRIN 2006 “Geometric methods in the
theory of nonlinear waves and their applications”.
2 Dispersionless NLS, its solutions and critical behaviour
The equations (1.6) are a Hamiltonian system
ut + {u(x), H} = 0
vt + {v(x), H} = 0
with respect to the Poisson bracket originated in (1.2)
{u(x), v(y)} = δ′(x− y), (2.1)
other brackets vanish, with the Hamiltonian
u v2 − u2
dx. (2.2)
Let us first describe the general analytic solution to the dispersionless system (1.8).
Lemma 2.1 Let uI(x), vI(x) be two real valued analytic functions of the real variable x
satisfying
(uI,x)
+ (vI,x)
2 6= 0.
Then the solution u = u(x, t), v = v(x, t) to the Cauchy problem
u(x, 0) = uI(x), v(x, 0) = vI(x) (2.3)
for the system (1.8) for sufficiently small t can be determined from the following system
x = v t+ fu
0 = u t+ fv
 (2.4)
where f = f(u, v) is an analytic solution to the following linear PDE:
fvv + u fuu = 0. (2.5)
Conversely, given any solution to (2.5) satisfying u2f 2uu + f
uv 6= 0 at some point (u = ũ, v =
ṽ) such that fv(ũ, ṽ) = 0, the system (2.4) determines a solution to (1.8) defined locally near
the point x = x̃ := fu(ũ, ṽ) for sufficiently small t.
Remark 2.2 The solutions to the linear PDE (2.5) correspond to the first integrals of dis-
persionless NLS:
f(u, v) dx,
F = 0. (2.6)
Taking them as the Hamiltonians
us + {u(x), F} ≡ ut + (fv)x = 0
vs + {v(x), F} ≡ vt + (fu)x = 0
 (2.7)
yields infinitesimal symmetries of the dispersionless NLS:
(ut)s = (us)t , (vt)s = (vs)t . (2.8)
One of the first integrals will be extensively used in this paper. It corresponds to the
Hamiltonian density
g = −
v2 + u(log u− 1). (2.9)
The associated Hamiltonian flow reads
us + vx = 0
 (2.10)
Eliminating the dependent variable v one arrives at the elliptic version of the long wave limit
of Toda lattice:
uss + (log u)xx = 0.
Due to commutativity (2.8) the systems (1.8) and (2.10) admit a simultaneous solution u =
u(x, t, s), v = v(x, t, s). Any such solution can be locally determined from a system similar
to (2.4)
x = v t+ fu
s = u t+ fv
 (2.11)
where f = f(u, v), as above, solves the linear PDE (2.5).
The system (2.11) determines a solution u = u(x, t, s), v = v(x, t, s) provided applica-
bility of the implicit function theorem. The conditions of the latter fail to hold at the critical
point (x0, s0, t0, u0, v0) such that
x0 = v0t0 + fu(u0, v0)
s0 = u0t0 + fv(u0, v0)
fuu(u0, v0) = fvv(u0, v0) = 0, fuv(u0, v0) = −t0
(2.12)
In sequel we adopt the following system of notations: the values of the function f and of its
derivatives at the critical point will be denoted by f 0 etc. E.g., the last line of the conditions
(2.12) will read
f 0uu = f
vv = 0, f
uv = −t0.
Definition 1. We say that the critical point is generic if at this point:
f 0uuv 6= 0.
Let us the introduce real parameters r, ψ determined by the third derivatives of the function
f evaluated at the critical point,
(cosψ − i sinψ) = f 0uuv + i
uuu. (2.13)
Due to the genericity assumption
+ πk.
In order to describe the local behaviour of a solution to the dispersionless NLS/Toda
equations we define a function R(X;S, ψ) of real variables X , S depending on the real
parameter ψ satisfying
(S + cosψ)2 + (X + sinψ)2 6= 0 (2.14)
by the following formula
R(X,S, ψ) = sign [cosψ] (2.15)
1 +X sinψ + S cosψ +
1 + 2(X sinψ + S cosψ) +X2 + S2
P0(X,S, ψ) =
R(X,S, ψ) cosψ −
(X cosψ − S sinψ) sinψ
R(X,S, ψ)
− cosψ
(2.16)
Q0(X,S, ψ) =
(X cosψ − S sinψ) cosψ
R(X,S, ψ)
+R(X,S, ψ) sinψ
− sinψ.
Observe that P0(X,S, ψ) and Q0(X,S, ψ) are smooth functions of the real variable X pro-
vided validity of the inequality (2.14).
Lemma 2.3 Given an analytic solution u(x, s, t), v(x, s, t) to the dispersionless NLS/Toda
equations with a generic critical point (x0, s0, t0, u0, v0), and arbitrary real numbers X , S
satisfying (2.14), T < 0, then there exist the following limits
λ−1/2
u(x0 + λ
1/2v0T +
r X T 2, s0 + λ
1/2u0T +
rS T 2, λ1/2T )− u0
= r T P0 (X,S, ψ)
(2.17)
λ−1/2
v(x0 + λ
1/2v0T +
r X T 2, s0 + λ
1/2u0T +
rS T 2, λ1/2T )− v0
T Q0 (X,S, ψ)
where the parameters r, ψ are defined by (2.13).
Proof From the linear PDE (2.5) it follows that
fuvv = −ufuuu − fuu, fvvv = −ufuuv.
Using these formulae we expand the implicit function equations (2.11) near the critical point
in the form
x̄− v0t̄ = v̄ t̄+
f 0uuu(ū
2 − u0v̄2) + 2f 0uuvū v̄
(|ū|2 + |v̄|2)3/2
(2.18)
s̄− u0t̄ = ū t̄+
f 0uuv(ū
2 − u0v̄2)− 2u0f 0uuuū v̄
(|ū|2 + |v̄|2)3/2
where we introduce the shifted variables
x̄ = x− x0, s̄ = s− s0, t̄ = t− t0
ū = u− u0, v̄ = v − v0.
The rescaling
x̄− v0t̄ 7→ λ(x̄− v0t̄)
s̄− u0t̄ 7→ λ(s̄− u0t̄)
t̄ 7→ λ1/2t̄
ū 7→ λ1/2ū
v̄ 7→ λ1/2v̄
(2.19)
in the limit λ→ 0 yields the quadratic equation
z = t̄ w +
aw2, t̄ 6= 0 (2.20)
where the complex independent and dependent variables z and w read
z = s̄+ i
u0x̄− (u0 + i
u0v0)t̄, w = ū+ i
u0v̄ (2.21)
and the complex constant a is defined by
a = f 0uuv + i
uuu, (2.22)
therefore
= r eiψ.
The substitution
X = 2
x̄− v0t̄
r t̄2
, S = 2
s̄− u0t̄
r t̄2
reduces the quadratic equation to(
w + t̄ r eiψ
= r2t̄2e2iψ
1 + e−iψ(S + iX)
For t̄ < 0 we choose the following root
w = r t̄ei ψ
1 + e−i ψ(S + iX)− 1
(2.23)
where the branch of the square root is obtained by the analytic continuation of the one taking
positive values on the positive real axis. Equivalently,
w = t̄ r eiψ
sign (cosψ)
∆ + 1 + S cosψ +X sinψ + i
X cosψ − S sinψ
∆ + 1 + S cosψ +X sinψ
where
1 + 2(S cosψ +X sinψ) +X2 + S2.
This gives the formulae (2.15).
The result of the lemma describes the local structure of generic solutions to the dis-
persionless NLS/Toda equations near the critical point. It can also be represented in the
following form
u(x, s, t) ' u0 + r T P0(X,S, ψ)
(2.24)
v(x, s, t) ' v0 +
r T Q0(X,S, ψ)
where
X = 2
x̄− v0t̄
r t̄2
, S = 2
s̄− u0t̄
r t̄2
, T = t̄. (2.25)
We want to emphasize that the approximation (2.24) works only near the critical point.
Indeed, for large x→ ±∞ the function u(x, s, t) and v(x, s, t) have the following behaviour
u = −
r |x|u1/40
1∓ sinψ + u0 − r t̄ cosψ +O
(2.26)
v = ∓
r |x|
u01/4
sign (cosψ)
1± sinψ + v0 −
t̄ sinψ +O
. (2.27)
So, for sufficiently large |x| the function u(x, s, t) defined by (2.24) becomes negative.
The function u has a maximum at the point X = S tanψ, so locally
u ≤ u0 + r T cosψ −
r | cosψ|
+ r T 2 < u0. (2.28)
At the critical point (x0, s0, t0, u0, v0) the function u develops a cusp. Let us consider only
the particular case S = 0 in order to avoid complicated expressions. In this case the local
behaviour of the function u near the critical point is given by
t̄→−0
r |x̂|
1− sinψ, x̂ > 0
r |x̂|
1 + sinψ, x̂ < 0
(2.29)
(here x̂ =
u0(x̄ − v0t̄)). Thus the parameters r, ψ describe the shape of the cusp at the
critical point.
3 First integrals and solutions of the NLS/Toda equations
Let us first show that any first integral (2.6) of the dispersionless equations can be uniquely
extended to a first integral of the full equations.
Lemma 3.1 Given a solution f = f(u, v) to the linear PDE (2.5), there exists a unique, up
to a total derivative, formal power series in �
hf = f +
f (u, v;ux, vx, . . . , u
(2k), v(2k))
such that the integral
hf dx
commutes with the Hamiltonian of NLS equation:
{H,Hf} = 0
at every order in �. Explicitly,
hf = f −
fuuu +
u2x + 2fuuvuxvx − ufuuuv
(3.1)
fuuuu +
u2xx + 2fuuuvuxxvxx − ufuuuuv
fuuuuuxxv
fuuuvvxxu
3456u3
30fuuu − 9ufuuuu + 12u2f5u + 4u3f6u
432u2
−3fuuuv + 6ufuuuuv + 2u2f5u v
u3xvx +
9fuuuu + 9uf5u + 2u
(9fuuuuv + 10uf5u v)uxv
(18f5u + 5uf6u) v
+O(�6)
Here we use short notations
f5u :=
, f6u :=
, f5u v :=
∂u5∂v
Example 1. Taking f = 1
(u v2 − u2) one obtains the Hamiltonian of the NLS equation
(u v2 − u2) +
In this case the infinite series truncates. It is easy to see that the series in � truncates if
and only if f(u, v) is a polynomial in u. Polynomial in u solutions to the linear PDE (2.5)
correspond to the standard first integrals of the NLS hierarchy.
Example 2. Taking g = −1
v2 + u(log u− 1) (cf. (2.9)) one obtains the Hamiltonian of
Toda equation
hg = −
v2 + u(log u− 1)−
u2x + 2u v
240u3
144u5
360u3
+O(�6) (3.2)
written in terms of the function φ = log u in the form
�2φxx + e
φ(s+�) − 2eφ(s) + eφ(s−�) = 0.
Lemma 3.2 Any solution to the NLS/Toda equations in the class of formal power series in �
can be obtained from the equations
x = v t+
δu(x)
(3.3)
s = u t+
δv(x)
where f = f(u, v; �) is an arbitrary admissible solution to the linear PDE (2.5) in the class
of formal power series in �,
hf dx.
Now, we can apply to the system (3.3) the rescaling (2.19) accompanied by the transfor-
mation
� 7→ λ5/4�. (3.4)
At the limit λ→ 0 we arrive at the following system of equations
s̄− u0t̄ = ū t̄+ f 0uuv
(ū2 − u0v̄2) +
− u0f 0uuu
ū v̄ +
(3.5)
x̄− v0t̄ = v̄ t̄+ f 0uuu
(ū2 − u0v̄2) +
+ f 0uuv
ū v̄ +
Using the complex variables z, w defined in (2.21) we can rewrite the system in the following
form:
z reiψ = w t̄ reiψ +
wxx. (3.6)
The last observation is that the Toda equations generated by the Hamiltonian Hg =
hg dx
(see Example 2 above) after the scaling limit (2.19), (3.4) yield the Cauchy - Riemann equa-
tions for the function w = w(z),
∂w/∂z̄ = 0.
Therefore the system (3.5) can be recast into the form equivalent to the Painlevé-I (P-I)
equation (see (5.1) below)
z reiψ = w t̄ reiψ +
u0wzz. (3.7)
Choosing
λ = �4/5
we eliminate � from the equation.
In the Section 5 below we will write explicitly the reduction of (3.7) to the Painlevé-I
equation and give a conjectural characterization of the particular solution of the latter.
4 Critical behaviour and elliptic umbilic catastrophe
Separating again the real and complex parts of (3.6) one obtains a system of ODEs
UXX +
(U2 − V 2) + r t̄ (U cosψ − V sinψ)− r (S cosψ −X sinψ) = 0
(4.1)
VXX + UV + r t̄ (U sinψ + V cosψ)− r (S sinψ +X cosψ) = 0
that can be identified with the Euler - Lagrange equations
δS = 0, S =
L(U, V, UX , VX) dX
with the Lagrangian
V 2X − U
U3 − 3U V 2
(U2 − V 2) cosψ − 2U V sinψ
(4.2)
+r (X sinψ − S cosψ)U + r (S sinψ +X cosψ)V.
In the “dispersionless limit” � → 0 the Euler - Lagrange equations reduce to the search of
stationary points of a function (let us also set t̄ = 0)
U3 − 3U V 2
+ a+U + a−V (4.3)
where we redenote
a+ = r (X sinψ − S cosψ) , a− = r (S sinψ +X cosψ) .
At a+ = a− = 0 the function F has an isolated singularity at the origin U = V = 0 of the
type D4,− also called elliptic umbilic singularity, according to R.Thom [42]. This singular-
ity appears in various physical problems; we mention here the caustics in the collisionless
dark matter [40] to give just an example. The parameters a+ and a− define two particular
directions on the base of the miniversal unfolding of the elliptic umbilic; the full unfolding
depending on 4 parameters reads
U3 − 3U V 2
b(U2 + V 2) + a+U + a−V + c. (4.4)
It would be interesting to study the properties of the modified Euler - Lagrange equations for
the Lagrangian
L̂ = L+
b(U2 + V 2).
This deformation does not seem to arrive from considering solutions to the NLS hierarchy.
5 The tritronquée solution to the Painlevé-I equation and
the Main Conjecture
In this section we will select a particular solution to the Painlevé-I (P-I) equation
Ωζζ = 6Ω
2 − ζ (5.1)
Recall [22] that an arbitrary solution to this equation is a meromorphic function on the com-
plex ζ-plane. According to P. Boutroux [4] the poles of the solutions accumulate along the
arg ζ =
, n = 0, ±1, ±2. (5.2)
Boutroux proved that, for each ray there is a one-parameter family of particular solutions
called intégrales tronquées whose lines of poles truncate for large ζ . He proved that the
intégrale tronquée has no poles for large |ζ| within two consecutive sectors of the angle
2π/5 around the ray, and, moreover it has the asymptotic behaviour of the form
Ω = −
)1/2 [
(1−ε)
(5.3)
for a suitable choice of the branch of the square root (see below) and a sufficiently small
ε > 0.
Furthemore, if a solution truncates along any two of the rays (5.2) then it truncates along
three of them. These particular solutions to P-I are called tritronquées. They have no poles
for large |ζ| in four consecutive sectors; their asymptotics for large ζ is given by (5.3). It
suffices to know the tritronquée solution Ω0(ζ) for the sector
| arg ζ| <
. (5.4)
In this case the branch of the square root in (5.3) is obtained by the analytic continuation
of the principal branch taking positive values on the positive half axis ζ > 0. Other four
tritronquées solutions are obtained by applying the symmetry
Ωn(ζ) = e
, n = ±1, ±2. (5.5)
The properties of the tritronquées solutions in the finite part of the complex plane were
studied in the important paper of N.Joshi and A.Kitaev [24].
A. Kapaev [27] obtained a complete characterization of the tritronquées solutions in
terms of the Riemann - Hilbert problem associated with P-I. We will briefly sketch here
the main steps of his construction.
The equation (5.1) can be represented as the compatibility condition of the following
system of linear differential equations for a two-component vector valued function Φ =
Φ(λ, ζ)
 Ωζ 2λ2 + 2Ωλ− ζ + 2Ω2
2(λ− Ω) −Ωζ
 Φ (5.6)
Φζ = −
 0 λ+ 2Ω
 Φ. (5.7)
The canonical matrix solutions Φk(λ, ζ) to the system (5.6) - (5.7) are uniquely determined
by their asymptotic behaviour
Φk(λ, ζ) ∼
λ1/4 λ1/4
λ−1/4 −λ−1/4
(5.8)
λ−3/2
eθ(λ,ζ)σ3 , |λ| → ∞, λ ∈ Σk
in the sectors
λ ∈ C |
< arg λ <
, k ∈ Z. (5.9)
θ(λ, ζ) =
λ5/2 − ζ λ1/2, σ3 =
, H =
Ω2ζ − 2 Ω
3 + ζ Ω, (5.10)
the branch cut on the complex λ-plane for the fractional powers of λ is chosen along the
negative real half-line.
The Stokes matrices Sk are defined by
Φk+1(λ, ζ) = Φk(λ, ζ)Sk, λ ∈ Σk ∩ Σk+1. (5.11)
They have the triangular form
S2k−1 =
1 s2k−1
, S2k =
s2k 1
(5.12)
and satisfy the constraints
Sk+5 = σ1 Sk σ1, k ∈ Z; S1S2S3S4S5 = i σ1 (5.13)
where
Due to (5.13) two of the Stokes multipliers sk determine all others; they depend neither on λ
nor on ζ provided Ω(ζ) satisfies (5.1).
In order to obtain a parametrization of solutions to the P-I equation (5.1) by Stokes mul-
tipliers of the linear differential equation (5.6) one has to reformulate the above definitions
as a certain Riemann - Hilbert problem. The solution of the Riemann - Hilbert problem de-
pends on ζ through the asymptotics (5.8). If the Riemann - Hilbert problem has a unique
solution for the given ζ0 ∈ C then the canonical matrices Φk(λ, ζ) depend analytically on
ζ for sufficiently small |ζ − ζ0|; the coefficient Ω = Ω(ζ) will then satisfy (5.1). The poles
of the meromorphic function Ω(ζ) correspond to the forbidden values of the parameter ζ for
which the Riemann - Hilbert problem admits no solution.
We will now consider a particular solution to the P-I equation specified by the following
Riemann - Hilbert problem. Denote four oriented rays γ0, γ±1, ρ in the complex λ-plane
defined by
γk = {λ ∈ C | arg λ =
}, k = 0, ±1
(5.14)
ρ = {λ ∈ C | arg λ = π}
directed towards infinity. The rays divide the complex plane in four sectors. We are looking
for a piecewise analytic function Φ(λ, ζ) on
λ ∈ C \ (γ−1 ∪ γ0 ∪ γ1 ∪ ρ)
depending on the parameter ζ continuous up to the boundary with the asymptotic behaviour
at |λ| → ∞ of the form (5.8) satisfying the following jump conditions on the rays
Φ+(λ, ζ) = Φ−(λ, ζ)Sk, λ ∈ γk
(5.15)
Φ+(λ, ζ) = Φ−(λ, ζ)Sρ, λ ∈ ρ.
Here the plus/minus subscripts refer to the boundary values of Φ respectively on the left/right
sides of the corresponding oriented ray, the jump matrices are given by
, S±1 =
, Sρ =
. (5.16)
The following result is due to A.Kapaev4 .
Theorem 5.1 The solution to the above Riemann - Hilbert problem exists and it is unique
| arg λ| <
, |λ| > R (5.17)
for a sufficiently large positive number R. The associated function
Ω0(ζ) :=
dH(ζ)
(5.18)
H(ζ) :=
σ3Φ(λ, ζ) e−θ(λ,ζ)σ3 − 1
is analytic in the domain (5.17), it satisfies P-I and enjoys the asymptotic behaviour
Ω0(ζ) ∼ −
, |ζ| → ∞, | arg λ| <
. (5.19)
Moreover, any solution of P-I having no poles in the sector (5.17) for some large R > 0
coincides with Ω0(ζ).
4Our solution Ω0(ζ) coincides with y3(x) ≡ y−2(x), x = −ζ, of [27] (see eq. (2.73) of [27]; Kapaev uses
a different normalization y′′ = 6y2 + x of the P-I equation).
Joshi and Kitaev proved that the tritronquée solution has no poles on the positive real
axis. They found a numerical estimate for the position of the first pole ζ0 of the tritronquée
solution Ω0(ζ) on the negative real axis:
ζ0 ' −2.3841687
(cf. also [10]). Besides this estimate very little is known about the location of poles of Ω0(ζ).
Our numerical experiments (see below) suggest the following
Main Conjecture. Part 1. The tritronquée solution Ω0(ζ) has no poles in the sector
| arg λ| <
. (5.20)
We are now ready to describe the conjectural universal structure behind the critical be-
haviour of generic solutions to the focusing NLS. For simplicity of the formulation let us
assume cosψ > 0.
Main Conjecture. Part 2. Any generic solution to the NLS/Toda equations near the
critical point behaves as follows
u(x, s, t; �) + i
u0v(x, s, t; �) ' u0 + i
u0v0 − t̄ reiψ + 2 �2/5(3r
5 Ω0(ζ) +O
(5.21)
s̄− u0t̄+ i
u0(x̄− v0t̄) + 12re
iψ t̄2
where Ω0(ζ) is the tritronquée solution to the Painlevé-I equation (5.1).
The above considerations can actually be applied replacing the NLS time flow by any
other flow of the NLS/Toda hierarchy. The local description of the critical behaviour remains
unchanged.
Remark 5.2 Note that the angle of the line ζ(x̄) in (5.21) for t̄ fixed is equal to ψ/5 + π/2,
ψ ∈ [−π, π]. Thus the maximal value of argζ is equal to 7π/10 < 4π/5. The lines in (5.21)
consequently do not get close to the critical lines of the tritronquée solution.
6 Numerical analysis of the tritronquée solution of P-I
In this section we will numerically construct the tritronquée solution Ω0, i.e. the tritronquée
solution with asymptotic behavior (5.3). We will drop the index 0 in the following. The
solution will be first constructed on a straight line in the complex plane. In a second step
we will then explore global properties of these solutions within the limitations imposed by a
numerical approach5.
5Cf. [15] where a similar technique was applied to solve numerically the Painlevé-II equation in the complex
domain.
Let the straight line in the complex plane be given by ζ = ay + b with a, b ∈ C constant
(we choose a to have a non-negative imaginary part) and y ∈ R. The asymptotic conditions
Ω ∼ −
, (6.1)
for y → ±∞. The root is defined to have its cut along the negative real axis and to as-
sume positive values on the positive real axis. This choice of the root implies the following
symmetry for the solution:
Ω(ζ∗) = Ω∗(ζ). (6.2)
Thus Ω is real on the real axis, see [24].
Numerically it is not convenient to impose boundary conditions at infinity. We thus
assume that the wanted solution can be expanded in a Laurent series in
ζ around infinity.
Such an asymptotic expansion is possible for the considered tritronquée solution in the sector
| arg ζ| < 4π/5. The formal series can be written there (see [24]) in the form
Ωf = −
ζ5k/2
, (6.3)
where a0 = 1, and where the remaining coefficients follow from the recurrence relation for
k ≥ 0
ak+1 =
25k2 − 1
amak+1−m. (6.4)
This formal series is divergent, the coefficients ak behave asymptotically as ((k − 1)!)2, see
[24] for a detailed discussion.
It is known that divergent series can be used to get numerically acceptable approxima-
tions to the values of the sum by properly truncating the series. Generally the best approx-
imations for the sum result from truncating the series where the terms take the smallest
absolute values (see e.g. [18]). Since we work in Matlab with a precision of 16 digits and
with values of |ζ| ≥ 10, we typically consider up to 10 terms in the series. In this case the
terms corresponding to a10 are of the order of machine precision (10−14 and below).
Thus we have constructed approximations to the numerical values of the tritronquée
solution for large values of |ζ|. These can be used as in [24] to set up an initial value problem
for the P-I equation and to solve this with a standard ODE solver. In fact the approach works
well on the real axis starting from positive values until one reaches the first singularity on
the negative real axis. It is straightforward to check the results of [24] with e.g. ode45, the
Runge-Kutta solver in Matlab corresponding to the Maple solver used in [24]. If one solves
P-I on a line that avoids the sector | arg ζ| > 4π/5, one could integrate until one reaches once
more large values of |ζ| for which the asymptotic conditions are known. This would provide
a control on the numerical accuracy of this so-called shooting approach. Shooting methods
are problematic if the second solution to the initial value problem has poles as is the case for
P-I. In this case the numerical errors in the initial data (here due to the asymptotic conditions)
and in the time integration will lead to a large contribution of the unwanted solution close
to its poles which will make the numerical solution useless. It is obvious that P-I has such
poles from the numerical results in [24] and the property (5.5). In [24] the task was to locate
poles in the tritronquée solution, and in this case the shooting approach seems to be the only
available. Here we are studying, however, the solution on a line in the complex plane where
we know the asymptotic conditions for the affine parameter tending to ±∞.
Thus we use as in [15] the asymptotic conditions on lines avoiding the sector | arg ζ| >
4π/5 to set up a boundary value problem for y = ±y0, y0 ≥ 10. The solution in the
interval [−y0, y0] is numerically obtained with a finite difference code based on a collocation
method. The code bvp4c distributed with Matlab, see [39] for details, uses cubic polynomials
in between the collocation points. The P-I equation is rewritten in the form of a first order
system. With some initial guess (we use Ω = −
ζ/6 as the initial guess), the differential
equation is solved iteratively by linearization. The collocation points (we use up to 10000)
are dynamically adjusted during the iteration. The iteration is stopped when the equation is
satisfied at the collocation points with a prescribed relative accuracy, typically 10−10. The
solution for a = i and b = 0 is shown in Fig. 1. The values of Ω in between the collocation
!20 !15 !10 !5 0 5 10 15 20
Figure 1: Real (blue) and imaginary part (red) of the tritronquée solution to the Painlevé I
equation for ζ = iy.
points are obtained by interpolation via the cubic polynomials in terms of which the solution
has been constructed. This interpolation leads to a loss in accuracy of roughly one order of
magnitude with respect to the precision at the collocation points. To test this we determine
the numerical solution via bvp4c for P-I on Chebychev collocation points and check the
accuracy with which the equation is satisfied via Chebychev differentiation, see e.g. [45].
It is found that the numerical solution with a relative tolerance of 10−10 on the collocation
points satisfies the ODE to roughly the same order except at the boundary points where it
is of the order 10−8, see Fig. 2 where we show the residual ∆ by plugging the numerical
solution into the differential equation for the above example. It is straightforward to achieve
a prescribed accuracy by requiring a certain value for the relative tolerance. Notice that we
are not interested in a high precision solution of P-I here, but in a comparison of solutions
to the NLS equation close to the point of gradient catastrophe of the semiclassical system
with an asymptotic solution in terms of P-I transcendents. For this purpose an accuracy of
the solution of the order of 10−4 will be sufficient in all studied cases.
!20 !15 !10 !5 0 5 10 15 20
Figure 2: Error in the solution of the Painlevé I equation.
The quality of the used boundary conditions via the asymptotic behavior can be checked
by computing the solution for different values of y0. One finds that the difference between
the asymptotic square root and the tritronquée solution is only visible near the origin, see
Fig. 3. For large x it can be seen that the difference between the square root asymptotics
and the tritronquée solution reaches quickly values below the aimed at threshold of 10−4.
It is interesting to note that this difference is actually smaller than the difference between
the tritronquée solution and the truncated formal asymptotic series except at the boundary,
where the latter condition is implemented (see Fig. 3).
The dominant behavior of the square root changes if one approaches the critical lines
a = exp(4πi/5), b = 0. As can be seen from Fig. 4, the solution shows oscillations on top
of the square root. The closer one comes to the critical lines, the slower is the fall off of the
amplitude of the oscillations. We conjecture that these oscillations will have on the critical
10 12 14 16 18 20
!20 !10 0 10 20
Figure 3: The plot on the left side shows the absolute value of the difference between the
tritronquée solution and the asymptotic condition −
ζ/6 for a = i and b = 0. The plot on
the right side shows in blue the same difference for x > 10 and in red the difference between
the tritronquée solution and the truncated asymptotic series.
lines only a slow algebraic fall off towards infinity.
The above approach thus allows the computation of the tritronquée solution for a line
avoiding the sector | arg ζ| > 4π/5 with high accuracy. The picture one obtains by com-
puting Ω along several such lines is that there are indeed no singularities in the sector
| arg ζ| < 4π/5, and that the square root behavior is followed for large |ζ|. To obtain a
more complete picture, we compute the tritronquée solution for | arg ζ| < 4π/5 − 0.05 and
|ζ| < R ( we choose R = 20). The boundary data for |ζ| = R follow as before from the
truncated asymptotic series, the data for arg ζ = ±4π/5 − 0.05 are obtained by computing
the tritronquée solution on the respective lines as above.
To solve the resulting boundary value problem for the P-I equation is, however, computa-
tionally expensive since we have to solve an equation in 2 real dimensions iteratively. Since
the solution we want to construct is holomorphic there, we can instead solve the harmonicity
condition (the two dimensional Laplace equation) for the given boundary conditions. To this
end we introduce polar coordinates r, φ and use a spectral grid as described in [45]: the main
point is a doubling of the interval r ∈ [0, R] to [−R,R] to allow for a better distribution of
the Chebychev collocation points. Since we work with values of φ < φ0, we cannot use the
usual Fourier series approach for the azimuthal coordinate. Instead we use again a Cheby-
chev collocation method. The found solution in the considered domain is shown in Fig. 5
!20 !15 !10 !5 0 5 10 15 20
Figure 4: Real (blue) and imaginary part (red) of the tritronquée solution close to the critical
line (for a = exp(i(4π/5− .05))) with oscillations of slowly decreasing amplitude.
and Fig. 6. The quality of the solution can be tested by plugging the found solution to the
Laplace equation into the P-I equation. Due to the low resolution and problems at the bound-
ary, the accuracy is considerably lower in the two dimensional case than on the lines. This
is, however, not a problem since we need only the one dimensional solutions for quantitative
comparisons with NLS solutions. The two dimensional solutions give nonetheless strong
numerical evidence for the conjecture that the tritronquée solution has globally no poles in
the sector | arg ζ| < 4π/5.
7 Critical behavior in NLS and the tritronquée solution of
P-I: numerical results
In this section we will compare the numerical solution of the focusing NLS equation for
two examples of initial data for values of � between 0.1 and 0.025 with the asymptotic so-
lutions discussed in the previous sections, the semiclassical solution up to the breakup and
the tritronquée solution to the Painlevé I equation. The numerical approach to solve the NLS
equation is discussed in detail in [29]. For values of � below 0.04 we have to use Krasny
filtering [30] (Fourier coefficients with an absolute value below 10−13 are put equal to zero
to avoid the excitation of unstable modes). With double precision arithmetic we could thus
reach � = 0.025, but could not go below.
Figure 5: Real part of the tritronquée solution in the sector r < 20 and |φ| < 4π/5− 0.05.
7.1 Initial data
We consider initial data where u(x, 0) has a single positive hump, and where v(x, 0) is
monotonously decreasing. For initial data of the form u(x, 0) = A2(x) and v(x, 0) = 0
where the function A(x) is analytic with a single positive hump with maximum value A0,
the semiclassical solution of NLS follows from (2.11) with f(u, v) given by
f(u, v) = 2=
dη ρ(η)
v)2 + u
 (7.1)
where
ρ(η) =
∫ x+(η)
x−(η)
A2(x) + η2
and where x±(η) are defined by A(x±(η)) = iη. The formula (7.1) follows from results by
S.Kamvissis, K.McLaughlin and P.Miller in [26].
From f(u, v) it is straightforward to recover the initial data from the equations
x = fu, fv = 0. (7.2)
Numerically we study the critical behavior of two classes of initial data, one symmetric with
respect to x which were used in [34], and initial data without symmetry with respect to x
Figure 6: Imaginary part of the tritronquée solution in the sector r < 20 and |φ| < 4π/5 −
0.05.
which are built from the initial data studied in [43]. For the former class the corresponding
exact solution of focusing NLS is known in terms of a determinant. Nonetheless we integrate
the NLS equation for these initial data numerically since this approach is not limited to
special cases, but can be used for general smooth Schwartzian initial data as in the latter
case.
7.1.1 Symmetric initial data
We consider the particular class of initial data data
u(x, t = 0) = A20sech
2 x, v(x, t = 0) = −µ tanhx, µ ≥ 0. (7.3)
Introducing the quantity
− A20,
we find that the semiclassical solution for these initial data follows from (2.11) with
f(u, v) =
(v − 2M)∆+ −
(v + 2M)∆− −
u log u
u log
v +M + ∆+)(−
v −M + ∆−)
] (7.4)
where
v ±M)2 + u
For µ = 0 we recover the Satsuma-Yajima [37] initial data that were studied numerically in
[34]. The function f(u, v) takes the form
f(u, v) = <
+ i A0
+ i A0
+ u log
+ i A0)
+ iA0)2 + u√
(7.5)
which can also be recovered from (7.1) by setting ρ = i. The critical point is given by
u0 = 2A
0, v0 = 0, x0 = 0, t0 =
. (7.6)
Furthermore we have
f 0uuu = 0, f
uuv =
, r = 4A30, ψ = 0, (7.7)
where r, ψ are defined in (2.13). For A0 = 1 the initial data (7.3) coincides with the one
studied in [43] by A.Tovbis, S.Venakides and X.Zhou. In the particular case µ = 2, A0 = 1
the function f(u, v) in (7.1) simplifies to
f(u, v) = v −
v2 + u+ u log
−12v +
v2 + u
 . (7.8)
In this case the critical point is given by
v0 = 0, u0 = 2 + µ, t0 =
2 + µ
, x0 = 0.
Furthermore,
f 0uuu = 0, f
uuv =
(µ+ 2)3
, r =
(µ+ 2)3
, ψ = 0.
7.1.2 Non-symmetric initial data
Recall that we are interested here in Cauchy data in the Schwartz class of rapidly decreasing
functions. The above initial data are symmetric with respect to x, u is an even and v an odd
function in x. To obtain a situation which is manifestly not symmetric, we use the fact that if
f is a solution to (2.5), the same holds for derivatives and anti-derivatives of f with respect
to v and for any linear combination of those. If fv is an even function in v, this will obviously
not be the case for a linear combination of f and fv.
As a specific example, we consider the linear combination
f = f1 + αf2, α = const,
where f1 coincides with (7.8) and
f2 = 2u
v2 + u−
v2 + u
+ u v log
−12v +
v2 + u
 . (7.9)
The function f2 is obtained from the integration f2,v = f1 − v. The critical point is given in
this case by
u0 = 4(1− 16α2), v0 = −16α, x0 =
1 + 4α
1− 4α
, t0 =
1 + 4α
1− 4α
thus we have |α| < 1/4. Furthermore,
f 0uuv = −
4α2 − 1/8
1− 16α2
, f 0uuu =
1− 16α2
such that
r = 8u0, ψ = − arctan
1− 16α2
1/8− 4α2
We determine the initial data corresponding to f for a given value of |α| < 1/4 by
solving (7.2) for u, v in dependence of x. This is done numerically by using the algorithm of
[31] which is implemented as the function fminsearch in Matlab. The algorithm provides an
iterative approach which converges in our case rapidly if the starting values are close enough
to the solution, which is achieved by choosing u and v corresponding to f0 as an initial guess.
For α close to 1/4 we observe numerically a steepening of the initial pulse which will lead
to a shock front in the limit α→ 1/4. For the computations presented here, we consider the
case α = 0.1 which leads to the initial data shown in Fig. 7.
The initial data are computed in the way described above to the order of the Krasny filter
on the interval x ∈ [−15, 11] on Chebychev collocation points. Standard interpolation via
Chebychev polynomials is then used to interpolate the resulting data to a Fourier grid. To
avoid a Gibbs phenomenon at the interval ends due to the non-periodicity of the data, we use
a Fourier grid on the interval [−10π, 10π] to ensure that the function u takes values of the
order of the Krasny filter. For x < −15 and x > 11, the function u is exponentially small
which implies the zero-finding algorithm will no longer provide the needed precision. Thus
we determine the exponential tails of the solution to leading order analytically. We find for
x→ −∞
u = v2+ exp
2(x− αv+)
αv+ + 1
v = v+ − v+ exp
2(x− αv+)
αv+ + 1
2 log(v+) +
2(x− αv+)
αv+ + 1
, (7.10)
!4 !3 !2 !1 0 1 2 3 4
!4 !3 !2 !1 0 1 2 3 4
Figure 7: Initial data for the NLS equations without symmetry with respect to x.
and for x→ +∞
u = v2− exp
2(x+ α)
αv− + 1
v = v− − v− exp
2(x+ α)
αv− + 1
2 log(−v−)−
2(x+ α)
αv− + 1
, (7.11)
where v± = (
1± α − 1)/α. The initial data for the NLS equation in the form Ψ =√
u exp(iS/�) are then found by integrating v on the Chebychev grid by standard integration
of Chebychev polynomials. The exponential tails for S follow from (7.10) and (7.11). The
matching of the tails to the Chebychev interpolant is not smooth and leads to a small Gibbs
phenomenon. The Fourier coefficients decrease, however, to the order of the Krasny filter
which is sufficient for our purposes. Thus we obtain the non-symmetric initial data with
roughly the same precision as the analytic symmetric data.
7.2 Semiclassical solution
For times t� tc, the semiclassical solution gives a very accurate asymptotic description for
the NLS solution. The situation is similar to the Hopf and the KdV equation [19]. We find
for the symmetric initial data for t = tc/2 that the L∞ norm of the difference between the
solutions decreases as �2. More precisely a linear regression analysis in the case of symmetric
initial data (for the values � = 0.03, 0.04, . . . , 0.1) for the logarithm of this norm leads to an
error proportional to �a with a = 1.94, a correlation coefficient r = 0.9995 and standard error
σa = 0.03. In the non-symmetric case, we find a = 1.98, r = 0.999996 and σa = 0.003.
Close to the critical time the semiclassical solution only provides a satisfactory descrip-
tion of the NLS solution for large values of |x − xc|. In the breakup region it fails to be
accurate since it develops a cusp at xc whereas the NLS solution stays smooth. This behavior
can be well seen in Fig. 8 for the symmetric initial data. The largest difference between the
−0.5 −0.4 −0.3 −0.2 −0.1 0 0.1 0.2 0.3 0.4 0.5
Figure 8: The blue line is the function u of the solution to the focusing NLS equation for
the initial data u(x, 0) = 2 sechx and � = 0.04 at the critical time, and the red line is
the corresponding semiclassical solution given by formulas (2.4). The green line gives the
multiscales solution via the tritronquée solution of the Painlevé I equation.
semiclassical and the NLS solution is always at the critical point. We find that the L∞ norm
of the difference scales roughly as �2/5 as suggested by the Main Conjecture. More precisely
we find a scaling proportional to �a with a = 0.38 and r = 0.999997 and σa = 4.2 ∗ 10−4.
For the non-symmetric initial data, we find a = 0.36, r = 0.9999 and σa = 0.002. The
corresponding plot for u can be seen in Fig. 9.
The function v for the same situation as in Fig. 8 is shown in Fig. 10. It can be seen that
the semiclassical solution is again a satisfactory description for |x − xc| large, but fails to
be accurate close to the breakup point. The phase for the non-symmetric initial data can be
seen in Fig. 11. In the following we will always study the scaling for the function u without
further notice.
0.2 0.3 0.4 0.5 0.6 0.7
Figure 9: The blue line is the function u of the solution to the focusing NLS equation for
the non-symmetric initial data and � = 0.04 at the critical time, and the red line is the corre-
sponding semiclassical solution given by formulas (2.4). The green line gives the multiscales
solution via the tritronquée solution of the Painlevé I equation.
7.3 Multiscales solution
It can be seen in Fig. 8 and Fig. 10 that the multiscales solution (5.21) in terms of the
tritronquée solution to the Painlevé I equation gives a much better asymptotic description
to the NLS solution at breakup close to the breakup point than the semiclassical solution
for the symmetric initial data. For larger values of |x − xc|, the semiclassical solution pro-
vides, however, the better approximation. The rescaling of the coordinates in (5.21) sug-
gests to consider the difference between the NLS and the multiscales solution in an interval
[−γ�4/5, γ�4/5] (we choose here γ = 1, but within numerical accuracy the result does not
depend on varying γ around this value). These intervals can be seen in Fig. 12. We find that
the L∞ norm of the difference between these solutions in this interval scales roughly like
�4/5. More precisely we have a scaling �a with a = 0.76 (r = 0.998 and σa = 0.019).
For the non-symmetric initial data, the situation at the critical point can be seen in Fig. 9
and Fig. 11. Again the multiscales solution (5.21) gives a much better description close to
the critical point than the semiclassical solution. However, the approximation is here much
better on the side with weak slope for u than on the side with strong slope. We consider
again the L∞-norm of the difference between the multiscales and the NLS solution in the
interval [−γ�4/5, γ�4/5]. The scaling behavior of the solution can be seen in Fig. 13. For
γ = 1 we find a = 0.71, r = 0.998 and σa = 0.02. These values do not change much for
−0.1 −0.08 −0.06 −0.04 −0.02 0 0.02 0.04 0.06 0.08 0.1
Figure 10: The blue line is the function v of the solution to the focusing NLS equation
for the initial data u0(x) = 2 sechx and � = 0.04 at the critical time, and the red line is
the corresponding semiclassical solution given by formulas (2.4). The green line gives the
multiscales solution via the tritronquée solution of the Painlevé I equation.
larger γ. For smaller γ there are not enough points to provide a valid statistics. The value
of a smaller than the predicted 4/5 is seemingly due to the strong asymmetry in the quality
of the approximation of NLS by the multiscales solution as can be seen from Fig. 9. In the
considered interval, the deviation is already so big that the scaling no longer holds as in the
symmetric case. To study the scaling with a reliable statistics would, however, require the
use of a considerably higher resolution which would be computationally too expensive.
Going beyond the critical time, one finds that the real part of the NLS solution continues
to grow before the central hump breaks up into several humps. Notice that the multiscales
solution always leads to a function u that is smaller than the corresponding function of the
NLS solution at breakup and before. This changes for times after the breakup as can be
inferred from Fig. 14 which shows the time dependence of the NLS and the corresponding
multiscales solution for the non-symmetric initial data. The approximation is always best at
the critical time.
To study the quality of the approximation (5.21), we use rescaled times. The scaling of
the coordinates in (5.21) suggests to consider the NLS solution close to breakup at the times
t±(�) with
t±(�) = tc + u0/r −
(u0/r)2 ± �4/5β, (7.12)
0.2 0.3 0.4 0.5 0.6 0.7
Figure 11: The blue line is the function v of the solution to the focusing NLS equation
for the non-symmetric initial data and � = 0.04 at the critical time, and the red line is
the corresponding semiclassical solution given by formulas (2.4). The green line gives the
multiscales solution via the tritronquée solution of the Painlevé I equation.
where β is a constant (we consider β = 0.1). We will only study the symmetric initial data in
this context. Before breakup we obtain the situation shown in Fig. 15. It can be seen that the
multiscales solution always provides a better description close to xc than the semiclassical
solution, and that the quality improves in this respect with decreasing �. We find that the L∞
norm of the difference scales in this case as �a with a = 0.55 (r = 0.994 and σa = 0.03).
The situation for times after breakup can be inferred from Fig. 16. Close to the central
region the multiscales solution shows a clear difference to the NLS solution. But it is inter-
esting to note that the ripples next to the central hump are well approximated by the Painlevé
I solution. The L∞ norm of the difference between the two solutions scales roughly like �.
More precisely we find a scaling �a with a = 1.02 (r = 0.9999 and σa = 7.7 ∗ 10−3).
8 Concluding remarks
In this paper we have started the study of the critical behavior of generic solutions of the fo-
cusing nonlinear Schrödinger equation. We have formulated the conjectural analytic descrip-
tion of this behavior in terms of the tritronquée solution to the Painlevé-I equation restricted
to certain lines in the complex plane. We provided analytical as well as numerical evidence
−0.05 0 0.05
ε=0.03
−0.1 0 0.1
ε=0.1
Figure 12: The blue line is the solution to the focusing NLS equation for the initial data
u0(x) = 2 sechx at the critical time, and the green line gives the multiscales solution via the
tritronquée solution of the Painlevé I equation. The plots are shown for two values of � at the
critical time.
supporting our conjecture. In subsequent publications we plan to further study the Main
Conjecture of the present paper by applying techniques based, first of all, on the Riemann -
Hilbert problem method [26, 43, 44] and the theory of Whitham equations (see [19] for the
numerical implementation of the Whitham procedure in the analysis of oscillatory behavior
of solutions to the KdV equations). The latter will also be applied to the asymptotic descrip-
tion of solutions inside the oscillatory zone. Furthermore we plan to study the possibility
of extending the Main Conjecture to the critical behavior of solutions to the Hamiltonian
perturbations of more general first order quasilinear systems of elliptic type. Last but not
least, it would be of interest to study the distribution of poles of the tritronquée solution in
the sector | arg ζ| > 4π
and to compare these poles with the peaks of solutions to NLS inside
the oscillatory zone. The elliptic asymptotics obtained by Kitaev [28] might be useful for
studying these poles for large |ζ|.
In this paper we did not study the behaviour of solutions to NLS near the boundary u = 0.
Such a study is postponed for a subsequent publication.
0.35 0.4 0.45 0.5
!=0.03
0.2 0.3 0.4 0.5 0.6 0.7
!=0.1
Figure 13: The blue line is the solution to the focusing NLS equation for the non-symmetric
initial data at the critical time, and the green line gives the multiscales solution via the
tritronquée solution of the Painlevé I equation. The plots are shown for two values of �
at the critical time.
References
[1] G.P.Agrawal, Nonlinear Fiber Optics. Academic Press, San Diego, 2006, 4th edition.
[2] S.Alinhac, Blowup for Nonlinear Hyperbolic Equations. Progress in Nonlinear Dif-
ferential Equations and their Applications, 17. Birkhäuser Boston, Inc., Boston, MA,
1995.
[3] V.I.Arnold,V.V.Goryunov, O.V.Lyashko,V.A.Vasil’ev, Singularity Theory. I. Dynamical
systems. VI, Encyclopaedia Math. Sci. 6, Springer, Berlin, 1993.
[4] P.Boutroux, Recherches sur les transcendants de M. Painlevé et l’étude asymptotique
des équations différentielles du second ordre. Ann. École Norm 30 (1913) 265 - 375.
[5] J.C.Bronski, J.N.Kutz, Numerical simulation of the semiclassical limit of the focusing
nonlinear Schrödinger equation. Phys. Lett. A 254 (2002) 325 - 336.
[6] R.Buckingham, S.Venakides, Long-time asymptotics of the nonlinear Schrödinger
equation shock problem. Comm. Pure Appl. Math., Published Online 12.03.2007.
[7] R.Carles, WKB analysis for the nonlinear Schrödinger equation and instability results.
ArXiv:math.AP/0702318.
http://arxiv.org/abs/math/0702318
0.2 0.4 0.6
t= 0.199
0.2 0.4 0.6
t= 0.201
0.2 0.4 0.6
t= 0.203
0.2 0.4 0.6
t= 0.206
0.2 0.4 0.6
t= 0.208
0.2 0.4 0.6
t= 0.210
0.2 0.4 0.6
t= 0.212
0.2 0.4 0.6
t= 0.214
0.2 0.4 0.6
t= 0.216
Figure 14: The blue line is the solution to the focusing NLS equation for the non-symmetric
initial data for � = 0.04 for various times, and the magenta line gives the multiscales solution
via the tritronquée solution of the Painlevé I equation. The plot in the middle shows the
behavior at the critical time.
[8] H.D.Ceniceros, F.-R.Tian, A numerical study of the semi-classical limit of the focusing
nonlinear Schrödinger equation. Phys. Lett. A 306 (2002) 25–34.
[9] T.Claeys, M.Vanlessen, The existence of a real pole-free solution of the fourth order
analogue of the Painlevé I equation. ArXiv:math-ph/0604046.
[10] O.Costin, Correlation between pole location and asymptotic behavior for Painlevé I
solutions. Comm. Pure Appl. Math. 52 (1999) 461–478.
[11] M.C.Cross and P.C.Hohenberg, Pattern formation outside of equilibrium. Rev. Mod.
Phys. 65 (1993) 851-1112.
[12] B.Dubrovin, S.-Q.Liu, Y.Zhang, On Hamiltonian perturbations of hyperbolic systems
of conservation laws I: quasitriviality of bihamiltonian perturbations. Comm. Pure
Appl. Math. 59 (2006) 559-615.
[13] B.Dubrovin, On Hamiltonian perturbations of hyperbolic systems of conservation laws,
II: universality of critical behaviour, Comm. Math. Phys. 267 (2006) 117 - 139.
http://arxiv.org/abs/math-ph/0604046
−0.1 0 0.1
ε=0.1
−0.05 0 0.05
ε=0.03
Figure 15: The blue line is the solution to the focusing NLS equation for the initial data
u0(x) = 2 sechx, and the red line is the corresponding semiclassical solution given by for-
mulas (2.4). The green line gives the multiscales solution via the tritronquée solution of the
Painlevé I equation. The plots are shown for two values of � at the corresponding times t−(�).
[14] M.Duits, A.Kuijlaars, Painlevé I asymptotics for orthogonal polynomials with respect
to a varying quartic weight. ArXiv:math/0605201.
[15] A.S.Fokas, S.Tanveer, A Hele - Shaw problem and the second Painlevé transcendent.
Math. Proc. Camb. Phil. Soc. 124 (1998) 169 - 191.
[16] M.G.Forest, J.E.Lee, Geometry and modulation theory for the periodic nonlinear
Schrödinger equation. In: Oscillation Theory, Computation, and Methods of Compen-
sated Compactness (Minneapolis, Minn., 1985), 35-69. The IMA Volumes in Mathe-
matics and Its Applications, 2. Springer, New York, 1986.
[17] J. Ginibre, G.Velo, On a class of nonlinear Schrödinger equations. I. The Cauchy prob-
lem, general case. J. Funct. Anal. 32 (1979) 1-32.
[18] I. S.Gradshteyn, I. M. Ryzhik, Table of Integrals, Series, and Products. Translated from
the Russian. Sixth edition. Translation edited and with a preface by Alan Jeffrey and
Daniel Zwillinger. Academic Press, Inc., San Diego, CA, 2000.
[19] T.Grava, C.Klein, Numerical solution of the small dispersion limit of Korteweg de
Vries and Whitham equations. ArXiv:math-ph0511011, to appear in Comm. Pure Appl.
Math., 2007.
http://arxiv.org/abs/math/0605201
http://arxiv.org/abs/math-ph/0511011
−0.1 0 0.1
ε=0.1
−0.05 0 0.05
ε=0.03
Figure 16: The blue line is the solution to the focusing NLS equation for the initial data
u0(x) = 2 sechx, and the green line gives the multiscales solution via the tritronquée solution
of the Painlevé I equation. The plots are shown for two values of � at the corresponding times
t+(�).
[20] T.Grava, C.Klein, Numerical study of a multiscale expansion of KdV and Camassa-
Holm equation. ArXiv:math-ph/0702038.
[21] E.Grenier, Semiclassical limit of the nonlinear Schrödinger equation in small time.
Proc. Amer. Math. Soc. 126 (1998) 523–530.
[22] E.L.Ince, Ordinary Differential Equations. Dover Publications, New York, 1944.
[23] S.Jin, C.D.Levermore, D.W.McLaughlin, The behavior of solutions of the NLS equa-
tion in the semiclassical limit. Singular Limits of Dispersive Waves (Lyon, 1991), 235–
255, NATO Adv. Sci. Inst. Ser. B Phys., 320, Plenum, New York, 1994.
[24] N.Joshi, A.Kitaev, On Boutroux’s tritronquée solutions of the first Painlevé equation.
Stud. Appl. Math. 107 (2001) 253–291.
[25] S.Kamvissis, Long time behavior for the focusing nonlinear Schrödinger equation with
real spectral singularities. Comm. Math. Phys. 180 (1996) 325–341.
[26] S. Kamvissis, K.D.T.-R.McLaughlin, P.D.Miller, Semiclassical Soliton Ensembles for
the Focusing Nonlinear Schrödinger Equation. Annals of Mathematics Studies, 154.
Princeton University Press, Princeton, NJ, 2003.
http://arxiv.org/abs/math-ph/0702038
[27] A.Kapaev, Quasi-linear Stokes phenomenon for the Painlevé first equation. J. Phys. A:
Math. Gen. 37 (2004) 11149–11167.
[28] A.Kitaev, The isomonodromy technique and the elliptic asymptotics of the first
Painlevé transcendent. Algebra i Analiz 5 (1993), no. 3, 179–211; translation in St.
Petersburg Math. J. 5 (1994), no. 3, 577–605.
[29] C. Klein, Fourth order time-stepping for low dispersion Ko-
rteweg - de Vries and nonlinear Schrödinger equation (2006),
http://www.mis.mpg.de/preprints/2006/prepr2006 133.html
[30] R.Krasny, A study of singularity formation in a vortex sheet by the point-vortex ap-
proximation. J. Fluid Mech. 167 (1986) 65–93.
[31] J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright, Convergence properties of
the Nelder-Mead simplex method in low dimensions. SIAM Journal of Optimization 9
(1988) 112-147.
[32] G.D.Lyng, P.D.Miller, The N -soliton of the focusing nonlinear Schrödinger equation
for N large. Comm. Pure Appl. Math. 60 (2007) 951-1026.
[33] G.Métivier, Remarks on the well-posedness of the nonlinear Cauchy problem.
ArXiv:math.AP/0611441.
[34] P.D.Miller, S.Kamvissis, On the semiclassical limit of the focusing nonlinear
Schrödinger equation. Phys. Lett. A 247 (1998) 75–86.
[35] A.C.Newell, Solitons in Mathematics and Physics. CBMS-NSF Regional Conference
Series in Applied Mathematics, 48. SIAM, Philadelphia, PA, 1985.
[36] S.P.Novikov, S.V.Manakov, L.P.Pitaevskiı̆, V.E.Zakharov, Theory of Solitons. The In-
verse Scattering Method. Translated from the Russian. Contemporary Soviet Mathe-
matics. Consultants Bureau [Plenum], New York, 1984.
[37] J. SATSUMA AND N. YAJIMA, Initial value problems of one-dimensional self-
modulation of nonlinear waves in dispersive media, Supp. Prog. Theo. Phys. 55 (1974),
pp. 284-306.
[38] A.B.Shabat, One-dimensional perturbations of a differential operator, and the inverse
scattering problem. In: Problems in Mechanics and Mathematical Physics, 279–296.
Nauka, Moscow, 1976.
[39] L. F. Shampine, M. W. Reichelt and J. Kierzenka, Solving Boundary Value Prob-
lems for Ordinary Differential Equations in MATLAB with bvp4c, available at
http://www.mathworks.com/bvp tutorial
[40] P.Sikivie, The caustic ring singularity. Phys. Rev. D60 (1999) 063501.
http://www.mis.mpg.de/preprints/2006/prepr$2006_$133.html
http://arxiv.org/abs/math/0611441
http://www.mathworks.com/bvp_tutorial
[41] M.Slemrod, Monotone increasing solutions of the Painlevé 1 equation y′′ = y2 +x and
their role in the stability of the plasma-sheath transition. European J. Appl. Math. 13
(2002) 663–680.
[42] R.Thom, Structural Stability and Morphogenesis: An Outline of a General Theory of
Models. Reading, MA: Addison-Wesley, 1989.
[43] A.Tovbis, S.Venakides, X.Zhou, On semiclassical (zero dispersion limit) solutions of
the focusing nonlinear Schrödinger equation. Comm. Pure Appl. Math. 57 (2004) 877–
[44] A.Tovbis, S.Venakides, X.Zhou, On the long-time limit of semiclassical (zero disper-
sion limit) solutions of the focusing nonlinear Schödinger equation: pure radiation case.
Comm. Pure Appl. Math. 59 (2006) 1379–1432.
[45] L. N. Trefethen, Spectral Methods in MATLAB, SIAM, Philadelphia, PA, 2000.
[46] Y.Tsutsumi, L2-solutions for nonlinear Schrödinger equations and nonlinear groups.
Funkcial. Ekvac. 30 (1987) 115–125.
[47] G.B.Whitham, Linear and Nonlinear Waves. Wiley-Intersci. 1974.
[48] V.E.Zakharov, A.B.Shabat, A. B. Exact theory of two-dimensional self-focusing and
one-dimensional self-modulation of waves in nonlinear media. Soviet Physics JETP 34
(1972), no. 1, 62-69.; translated from Ž. Eksper. Teoret. Fiz. (1971), no. 1, 118-134.
	Introduction
	Dispersionless NLS, its solutions and critical behaviour
	First integrals and solutions of the NLS/Toda equations
	Critical behaviour and elliptic umbilic catastrophe
	The tritronquée solution to the Painlevé-I equation and the Main Conjecture
	Numerical analysis of the tritronquée solution of P-I
	Critical behavior in NLS and the tritronquée solution of P-I: numerical results
	Initial data
	Symmetric initial data
	Non-symmetric initial data
	Semiclassical solution
	Multiscales solution
	Concluding remarks
ABSTRACT
  We argue that the critical behaviour near the point of ``gradient
catastrophe" of the solution to the Cauchy problem for the focusing nonlinear
Schr\"odinger equation $ i\epsilon \psi_t +\frac{\epsilon^2}2\psi_{xx}+
|\psi|^2 \psi =0$ with analytic initial data of the form $\psi(x,0;\epsilon)
=A(x) e^{\frac{i}{\epsilon} S(x)}$ is approximately described by a particular
solution to the Painlev\'e-I equation.

<|endoftext|><|startoftext|>
Introduction
In the paper [?] we defined the category of functors Fquad from a category having
as objects the nondegenerate F2-quadratic spaces to the category E of F2-vector
spaces, where F2 is the field with two elements. The motivation for the construction
of this category is to obtain an analogous framework for the orthogonal groups over
F2, to that which exists for the general linear groups. We recall that the category
F of functors from the category Ef of finite dimensional F2-vector spaces to the
category E of all F2-vector spaces is a very useful tool for the study of the stable
cohomology of the general linear groups with suitable coefficients (see [?]). Another
motivation, in topology, for the study of the category F is the connection which
exists between this category and unstable modules over the Steenrod algebra (see
[?]). In order to have a good understanding of the category Fquad, we seek to
classify its simple objects. We constructed in [?] two families of simple objects in
Fquad. The first one is obtained by the fully-faithful, exact functor ι : F → Fquad,
defined in [?], which preserves simple objects. By [?], the simple objects in F are
in one-to-one correspondence with the irreducible representations of finite general
linear groups over F2. The second family is obtained by the fully-faithful, exact
functor κ : Fiso → Fquad, which preserves simple objects, where Fiso is equivalent
to the product of the categories of modules over the orthogonal groups of possibly
degenerate quadratic forms. In [?], we constructed two families of simple objects
in the category Fquad which are neither in the image of ι nor in the image of κ.
These simple objects are subfunctors of the tensor product between an object in
the image of ι and an object in the image of κ. We proved that these simple objects
in Fquad are the composition factors of two particular mixed functors, defined in
Date: November 11, 2018.
http://arxiv.org/abs/0704.0502v1
2 CHRISTINE VESPA
The aim of this paper is to begin a programme to obtain a complete classification
of the simple objects in Fquad. Accordingly, we seek to decompose the projective
generators of this category into indecomposable factors and to obtain the simple
factors of these indecomposable factors. This paper begins the study of the standard
projective objects in the category Fquad. Although explicit decompositions of all
the projective generators are not provided in this paper, we give several useful
tools, results and examples for the realization of this programme. Furthermore,
we deduce from the results contained in this paper several interesting consequences
for the structure of the category Fquad. In work in progress, we obtain a general
decomposition of standard projective object PH of Fquad which is indexed by the
subspaces of H . Here we present explicit decompositions of the standard projective
objects associated to “small” quadratic spaces, since these decompositions play a
fundamental rôle in the category Fquad (for example, for the description of the
polynomial functors of Fquad). Furthermore, recall that the decompositions of the
injective standard IF
of the category F and thus, by duality, that of the projective
standard PF
, is fundamental for the comprehension of the other injective standards
of F . Hence, the decompositions of the two smaller projective standard of Fquad
represent an important step in the understanding of the category Fquad.
We briefly summarize the contents of this paper. After some recollections on
the category Fquad, where we recall the definitions of the isotropic functors and
the mixed functors, we define a filtration of the standard projective objects PV in
Fquad:
0 ⊂ P
V ⊂ P
V ⊂ . . . ⊂ P
(dim(V )−1)
V ⊂ P
(dim(V ))
V = PV .
We obtain a general description of the two extremities of this filtration.
Theorem. Let V be a nondegenerate F2-quadratic space.
(1) There is a natural equivalence: P
V ≃ ι(P
ǫ(V )
), where ι : F → Fquad,
ǫ is the functor that forgets the quadratic form and PF
ǫ(V )
is the standard
projective object in F associated to the vector space ǫ(V ).
(2) The functor P
V is a direct summand of PV .
Proposition. Let V be a nondegenerate F2-quadratic space, we have a natural
equivalence:
PV /P
(dim(V )−1)
V ≃ κ(isoV )
where κ : Fiso → Fquad and isoV is an isotropic functor in Fiso.
An important consequence of the Theorem concerning the functor P
V is given
in following result.
Theorem. The category ι(F) is a thick subcategory of Fquad.
Then by an explicit study of the filtration of the functors PH0 and PH1 we
obtain the following fundamental decompositions of these two standard projective
functors.
Theorem. (1) The standard projective object PH0 admits the following decom-
position:
PH0 = ι(P
⊕2)⊕ (Mix0,1
⊕2 ⊕Mix1,1)⊕ κ(isoH0)
where Mix0,1 and Mix1,1 are two mixed functors and isoH0 is an isotropic
functor.
GENERIC REPRESENTATIONS OF ORTHOGONAL GROUPS: PROJECTIVE FUNCTORS 3
(2) The standard projective object PH1 admits the following decomposition:
PH1 = ι(P
⊕2)⊕Mix1,1
⊕3 ⊕ κ(isoH1)
where Mix1,1 is a mixed functor and isoH1 is an isotropic functor.
These decompositions have several interesting consequences. Firstly, thanks to
this theorem we can complete the study of the functors Mix0,1 and Mix1,1 started
in [?] by the following result.
Proposition. The functors Mix0,1 and Mix1,1 are indecomposable.
We want to emphasize that the complete structure of the direct summands of
the decompositions of PH0 and PH1 is understood. The structure of the isotropic
functors is given in [?], those of the mixed functors Mix0,1 and Mix1,1 is the main
result of [?] and is completed by the previous proposition and those of PF
⊕2 follows
from [?]. Then, these decompositions give rise to a classification of the simple
functors S of Fquad such that S(H0) 6= 0 or S(H1) 6= 0.
Proposition. The isomorphism classes of non-constant simple functors of Fquad
such that either S(H0) 6= 0 or S(H1) 6= 0 are:
ι(Λ1), ι(Λ2), ι(S(2,1)), κ(iso(x,0)), κ(iso(x,1)), RH0 , RH1 , SH1
where RH0 , RH1 and SH1 are the simple functors introduced in Corollary 1.7.
These decompositions also allow us to derive some homological calculations in
the category Fquad.
Proposition. For n a natural number, we have:
ExtnFquad(RH0 , RH0) ≃ F2 and Ext
Fquad
(RH1 , RH1) ≃ F2
where RH0 and RH1 are the simple functors introduced in Corollary 1.7.
Finally, after having introduced the notion of polynomial functor for the category
Fquad, which generalizes that for F , we obtain the following result as an application
of the classification of the simple functors S of Fquad such that S(H0) 6= 0 or
S(H1) 6= 0 and of the thickness of the subcategory ι(F) of Fquad.
Theorem. The polynomial functors of Fquad are in the image of the functor ι :
F → Fquad.
Most of the results of this paper are contained in the Ph.D. thesis of the author
1. The category Fquad: some recollections
We recall in this section some definitions and results about the category Fquad
obtained in [?].
Let Eq be the category having as objects finite dimensional F2-vector spaces
equipped with a non degenerate quadratic form and with morphisms linear maps
that preserve the quadratic forms. By the classification of quadratic forms over
the field F2 (see, for instance, [?]) we know that only spaces of even dimension can
be nondegenerate and, for a fixed even dimension, there are two non-equivalent
nondegenerate spaces, which are distinguished by the Arf invariant. We will denote
by H0 (resp. H1) the nondegenerate quadratic space of dimension two such that
Arf(H0) = 0 (resp. Arf(H1) = 1). The orthogonal sum of two nondegenerate
4 CHRISTINE VESPA
quadratic spaces (V, qV ) and (W, qW ) is, by definition, the quadratic space (V ⊕
W, qV⊕W ) where qV ⊕W (v, w) = qV (v) + qW (w). Recall that the spaces H0⊥H0
and H1⊥H1 are isomorphic. Observe that the morphisms of Eq are injective linear
maps and this category does not admit push-outs or pullbacks. There exists a
pseudo push-out in Eq that allows us to generalize the construction of the category
of co-spans of Bénabou [?] and thus to define the category Tq in which there exist
retractions.
Definition 1.1. The category Tq is the category having as objects those of Eq and,
for V and W objects in Tq, HomTq (V,W ) is the set of equivalence classes of dia-
grams in Eq of the form V
←− W for the equivalence relation generated by
the relation R defined as follows: V
−→ X1
←− W R V
−→ X2
←− W if there
exists a morphism α of Eq such that α ◦ f = u and α ◦ g = v. The composition is
defined using the pseudo push-out. The morphism of HomTq (V,W ) represented by
the diagram V
←−W will be denoted by [V
←−W ].
Remark 1.2. A morphism of HomTq (V,W ) is represented by a diagram of the
form: V
−→ W⊥W ′
←−−W , where iW is the canonical inclusion. In the following,
we will use this representation of a morphism, without further comment.
By definition, the category Fquad is the category of functors from Tq to E . Hence
Fquad is abelian and has enough projective objects. By the Yoneda lemma, for any
object V of Tq, the functor PV = F2[HomTq (V,−)] is a projective object and there
is a natural isomorphism: HomFquad(PV , F ) ≃ F (V ), for all objects F of Fquad.
The set of functors {PV |V ∈ S}, named the standard projective objects in Fquad,
is a set of projective generators of Fquad, where S is a set of representatives of
isometry classes of nondegenerate quadratic spaces.
There is a forgetful functor ǫ : Tq → E
f in Fquad, defined by ǫ(V ) = O(V ) and
−→W⊥W ′
←−W ]) = pg ◦ O(f)
where pg is the orthogonal projection from W⊥W
′ to W and O : Eq → E
f is the
functor which forgets the quadratic form. By the fullness of the functor ǫ and an
argument of essential surjectivity, we obtain the following theorem.
Theorem 1.3. [?] There is a functor ι : F → Fquad, which is exact, fully faithful
and preserves simple objects.
In order to define another subcategory of Fquad, we consider the category E
having as objects finite dimensional F2-vector spaces equipped with a (possibly
degenerate) quadratic form and with morphisms injective linear maps which pre-
serve the quadratic forms. The category Edegq admits pullbacks; consequently the
category of spans Sp(Edegq ) ([?]) is defined. By definition, the category Fiso is the
category of functors from Sp(Edegq ) to E . As in the case of the category Fquad, the
category Fiso is abelian and has enough projective objects: by the Yoneda lemma,
for any object V of Sp(Edegq ), the functor QV = F2[HomSp(Edegq )(V,−)] is a projec-
tive object in Fiso. We define a particular family of functors of Fiso, the isotropic
functors, which form a set of projective generators and injective cogenerators of
Fiso. The category Fiso is related to Fquad by the following theorem.
Theorem 1.4. [?] There is a functor κ : Fiso → Fquad, which is exact, fully-faithful
and preserves simple objects.
GENERIC REPRESENTATIONS OF ORTHOGONAL GROUPS: PROJECTIVE FUNCTORS 5
We obtain the classification of the simple objects of the category Fiso from the
following theorem.
Theorem 1.5. [?] There is a natural equivalence of categories
Fiso ≃
F2[O(V )]−mod
where S is a set of representatives of isometry classes of quadratic spaces (possibly
degenerate) and O(V ) is the orthogonal group.
The object ofFiso which corresponds, by this equivalence, to the module F2[O(V )]
is the isotropic functor isoV , defined in [?]. Recall that, as a vector space, isoV (W )
is isomorphic to the subspace of QV (W ) generated by the elements [V
←− V →W ].
A straightforward consequence of the classification of simple objects of Fiso given
in Theorem 1.5 is given in the following corollary. Recall that, by definition, an
object F of Fquad is finite if it has a finite composition series with simple subquo-
tients.
Corollary 1.6. The isotropic functors κ(isoV ) are finite in the category Fquad.
In section 3, we will require the composition series for the isotropic functors asso-
ciated to some small quadratic spaces. For α ∈ {0, 1}, let (x, α) be the degenerate
quadratic space of dimension one generated by x such that q(x) = α. Since the or-
thogonal groups O(x, 0) and O(x, 1) are trivial and O(H0) ≃ S2 and O(H1) ≃ S3,
we deduce from Theorem 1.5 and 1.4, the following corollary.
Corollary 1.7. (1) The functors κ(iso(x,0)) and κ(iso(x,1)) are simple in Fquad.
(2) The functor κ(isoH0) is indecomposable. We have the following non-split
short exact sequence:
0→ RH0 → κ(isoH0)→ RH0 → 0
where RH0 is the functor obtained from the trivial representation of O(H0).
(3) The functor κ(isoH1) admits the following decomposition:
κ(isoH1) = FH1 ⊕ (SH1)
where SH1 is the functor obtained from the natural representation of O(H1)
and FH1 is an indecomposable functor for which we have the following non-
split short exact sequence:
0→ RH1 → FH1 → RH1 → 0
where RH1 is the functor obtained from the trivial representation of O(H1).
In [?], we define a new family of functors of Fquad, named the mixed functors
and we decompose two particular functors of this family: the functors Mix0,1 and
Mix1,1. We recall the following description of these functors.
Proposition 1.8. [?] For α ∈ {0, 1}, the functors Mixα,1 : Tq → E are defined by
Mixα,1(V ) = F2[SV ]
where SV = {(v1, v2) |v1 ∈ V, v2 ∈ V, q(v1 + v2) = α, B(v1, v2) = 1} and
Mixα,1([V
−→W⊥L
←−−W ])[(v1, v2)] =
[(pW ◦ f(v1), pW ◦ f(v2))] if f(v1 + v2) ∈W
0 otherwise
where pW is the orthogonal projection.
6 CHRISTINE VESPA
In [?], for a positive integer n, we defined subfunctors Lnα of ι(Λ
n)⊗ κ(iso(x,α)),
where Λn is the nth exterior power and we proved that these functors are simple.
The functor L1α is equivalent to the functor κ(iso(x,α)). We obtain the following
result.
Theorem 1.9. [?] Let α be an element in {0, 1}.
(1) The functor Mixα,1 is infinite.
(2) There exists a subfunctor Σα,1 of Mixα,1 such that we have the following
short exact sequence
0→ Σα,1 → Mixα,1 → Σα,1 → 0.
(3) The functor Σα,1 is uniserial with unique composition series given by the
decreasing filtration given by the subfunctors kdΣα,1 of Σα,1:
. . . ⊂ kdΣα,1 ⊂ . . . ⊂ k1Σα,1 ⊂ k0Σα,1 = Σα,1.
(a) The head of Σα,1 (i.e. Σα,1/k1Σα,1) is isomorphic to the functor
κ(iso(x,α)) where iso(x,α) is a simple object in Fiso.
(b) For d > 0
kdΣα,1/kd+1Σα,1 ≃ L
where Ld+1α is a simple object of the category Fquad that is neither in
the image of ι nor in the image of κ.
The functor Ld+1α is a subfunctor of ι(Λ
d+1)⊗κ(iso(x,α)), where Λ
is the (d+ 1)st exterior power functor.
2. Filtration of the standard projective functors PV of Fquad
In this section, we define a filtration of the standard projective functors PV of
Fquad. This construction gives rise to an essential tool to obtain, in section 3,
the direct decompositions of the projective objects PH0 and PH1 of Fquad, into
indecomposable summands.
After defining this filtration, we will deduce general results about the projective
PV of Fquad. In Theorem 2.6 we prove that the rank zero part is a direct summand
of PV and we identify this functor. This result allows us to prove that ι(F) is a
thick subcategory of Fquad. We will also show that the top quotient of this filtration
is isomorphic to κ(isoV ), where isoV is the isotropic functor.
2.1. Definition of the filtration. We recall that a morphism in Tq from V to W ,
where V and W are nondegenerate quadratic spaces, is represented by a diagram
V → X ←W.
Definition 2.1. A morphism [V → X ← W ] in Tq has rank equal to i if the
pullback in Edegq of the diagram V → X ←W is a quadratic space of dimension i.
Notation 2.2. We denote by Hom
(V,W ) the subset of HomTq (V,W ) of mor-
phisms of rank less than or equal to i.
We have the following proposition:
Proposition 2.3. For W an object in Tq, the following subvector space of PV (W ):
V (W ) = F2[Hom
(V,W )]
defines a subfunctor P
V of PV .
GENERIC REPRESENTATIONS OF ORTHOGONAL GROUPS: PROJECTIVE FUNCTORS 7
Proof. It is sufficient to verify that for all morphisms f = [W → Y ← Z] of Tq
and g = [V → X ← W ] of Hom
(V,W ), the composition f ◦ g has rank less than
or equal to i. The composition f ◦ g is represented by the following commutative
diagram:
P ′ //
V // X // X⊥W
where P and P ′ are the pullbacks and X⊥
Y is the pseudo push-out defined in [?].
Consequently: f ◦ g = [V → X⊥
Y ← Z]. Since [V → X ← W ] is an element of
V (W ), we know that the dimension of P is less than or equal to i. We deduce
from the injectivity of the morphisms of Edegq , that P
′ has dimension smaller than
or equal to i. �
The following lemma is a straightforward consequence of Definition 2.1.
Lemma 2.4. There exists a natural equivalence: P
(dim(V ))
V ≃ PV .
We deduce the following proposition.
Proposition 2.5. The functors P
V , for i = 0, . . . , dim(V ), define an increasing
filtration of the functor PV .
Proof. The inclusion of vector spaces P
V (W ) ⊂ P
(i+1)
V (W ) is clear, forW an object
in Tq. Consequently, P
V is a subfunctor of P
(i+1)
V by the proposition 2.3. �
2.2. Extremities of the filtration. In the previous section, we have obtained,
for all objects V of Tq, the following filtration of the functor PV :
0 ⊂ P
V ⊂ P
V ⊂ . . . ⊂ P
(dim(V )−1)
V ⊂ P
(dim(V ))
V = PV .
The aim of this section is to study the two extremities of this filtration, namely,
the functor P
V and the quotient PV /P
(dim(V )−1)
2.2.1. The functor P
V . For V an object in Tq, we recall that the functor P
ǫ(V ) of
F defined by PFǫ(V )(−) = F2[HomEf (ǫ(V ),−)], where ǫ : Tq → E
f is the forgetful
functor of Fquad, is projective, by the Yoneda lemma. The aim of this paragraph
is to prove the following theorem:
Theorem 2.6. Let V be an object of Tq.
(1) There is a natural equivalence: P
V ≃ ι(P
ǫ(V )
), where ι : F → Fquad is the
functor given in Theorem 1.3.
(2) The functor P
V is a direct summand of PV .
Before proving this result, we give the following useful characterization of the
morphisms of rank zero, which is a straightforward consequence of the definition of
the rank of a morphism.
8 CHRISTINE VESPA
Lemma 2.7. Let V be an object in Tq. A morphism T = [V
−→ W⊥W ′
←−− W ]
has rank zero if and only if pW ′ ◦ α is an injective linear map, where pW ′ is the
orthogonal projection from W⊥W ′ to W ′.
For V and W objects in Tq, the forgetful functor ǫ : Tq → E
f gives rise to a map
HomTq(V,W )→ HomEf (ǫ(V ), ǫ(W )). By passage to the vector spaces freely gener-
ated by these sets and by functoriality of ǫ, we deduce the existence of a morphism
from PV to ι(P
ǫ(V )
). As the functors P
V are subfunctors of PV , we obtain a mor-
phism f from P
V to ι(P
ǫ(V )
). Consequently, to prove Theorem 2.6, it is sufficient
to prove the following proposition.
Proposition 2.8. The map P
V (W )
−−→ PFǫ(V )(ǫ(W )) is an isomorphism for V
and W objects in Tq.
The surjectivity of fW relies on the following lemma, which is an improved
version of the fullness of the forgetful functor ǫ given in [?].
Lemma 2.9. Let (V, qV ) and (W, qW ) be two objects of Tq and
f ∈ HomEf (ǫ(V, qV ), ǫ(W, qW )) a linear map, then there exists a morphism
T = [V
−→W⊥Y
←−−W ] of rank zero such that ǫ(T ) = f .
Proof. As the quadratic space V is nondegenerate, we know that it has even di-
mension. We write dim(V ) = 2n. We prove the result by induction on n.
To start the induction, let (V, qV ) be a nondegenerate quadratic space of di-
mension two, with symplectic basis {a, b} and f : V → W be a linear map. The
following linear map preserves the quadratic form:
g1 : V → W⊥H1⊥H0⊥H0 ≃W⊥Span(a1, b1)⊥Span(a0, b0)⊥Span(a
a 7−→ f(a) + (q(a) + q(f(a)))a1 + a0
b 7−→ f(b) + (q(b) + q(f(a)))a1 + (1 +B(f(a), f(b)))b0 + a
Consequently, the morphism: T = V
−→W⊥H1⊥H0 ←֓ W , is a morphism of rank
zero of Tq such that ǫ(T ) = f .
Let Vn be a nondegenerate quadratic space of dimension 2n, {a1, b1, . . . , an, bn}
be a symplectic basis of Vn and fn : Vn →W be a linear map. By induction, there
exists a map:
gn : Vn → W⊥Y
ai 7−→ fn(ai) + yi
bi 7−→ fn(bi) + zi
where yi and zi, for all integers i between 1 and n, are elements of Y . The map
gn preserves the quadratic form and the morphism T = [Vn
−→W⊥Y ←֓ W ] is of
rank zero and verifies ǫ([Vn
−→W⊥Y ←֓ W ]) = fn.
Let Vn+1 be a nondegenerate quadratic space of dimension 2(n + 1),
{a1, b1, . . . , an, bn, an+1, bn+1} a symplectic basis of Vn+1 and fn+1 : Vn+1 → W
a linear map. To define the map gn+1, we will consider the restriction of fn+1 to
Vn and extend the map gn given by the inductive assumption. For that, we need
the following space: E ≃ W⊥W ′⊥H⊥n0 ⊥H
0 ⊥H1⊥H0⊥H0 for which we specify
the notations for a basis:
E ≃W⊥W ′⊥(⊥ni=1Span(a
0))⊥(⊥
i=1Span(A
0))⊥Span(A1, B1)
⊥Span(C0, D0)⊥Span(E0, F0).
GENERIC REPRESENTATIONS OF ORTHOGONAL GROUPS: PROJECTIVE FUNCTORS 9
The following map:
gn+1 : V → W⊥W
′⊥H⊥n0 ⊥H
0 ⊥H1⊥H0⊥H0
ai 7−→ fn+1(ai) + yi + a
0 for i between 1 and n
bi 7−→ fn+1(bi) + zi +A
an+1 7−→ fn+1(an+1) + (q(an+1) + q(fn+1(an+1)))A1 + C0
i=1 B(fn+1(ai), fn+1(an+1))b
i=1 B(fn+1(bi), fn+1(an+1))B
bn+1 7−→ fn+1(bn+1) + (q(bn+1) + q(fn+1(bn+1)))A1
+(1 +B(fn+1(an+1), fn+1(bn+1)))D0
i=1 B(fn+1(ai), fn+1(bn+1))b
i=1 B(fn+1(bi), fn+1(bn+1))B
0 + E0
preserves the quadratic form. Furthermore, the morphism
T = [Vn+1
−−−→W⊥W ′⊥H⊥n0 ⊥H
0 ⊥H1⊥H0⊥H0 ←֓ W ]
is of rank zero and satisfies: ǫ(T ) = fn+1, which completes the inductive step.
The proof of the injectivity of fW relies on the following result, which can be
regarded as Witt’s theorem for degenerate quadratic forms.
Theorem 2.10. Let V be a nondegenerate quadratic space, D and D′ subquadratic
spaces (possibly degenerate) of V and f : D → D′ an isometry between these two
quadratic spaces. Then, there exists an isometry f : V → V such that the following
diagram is commutative:
f // V
// D′.
Proof. For a proof of this result, we refer the reader to [?] §4, theorem 1. �
Proof of the injectivity of fW . The natural map f is induced by the natural map
(V,−)→ HomEf (ǫ(V ), ǫ(−))
by passage to the vector spaces freely generated by these sets. So, f is injective if
and only if this natural map is injective. Consequently, it is sufficient to verify that,
for T = [V
−→ W⊥W ′
←−− W ] and T ′ = [V
−→ W⊥W ′′
←−−W ] two generators of
V (W ) such that
(2.10.1) pW ◦ O(α) = p
W ◦ O(α
we have T = T ′.
Let {a1, b1, . . . , an, bn} be a symplectic basis of V . We deduce from 2.10.1 that,
for all i ∈ {1, . . . , n}, we have:
(2.10.2) α(ai) = wi + w
i, α(bi) = xi + x
(2.10.3) α′(ai) = wi + w
i , α
′(bi) = xi + x
10 CHRISTINE VESPA
where, for all i ∈ {1, . . . , n}, wi and xi are in W , w
i and x
i are in W
′ and
x′′i and w
i are in W
′′. By Lemma 2.7, since the morphisms are of rank zero,
{w′1, x
1, . . . , w
n} and {w
1 , x
1 , . . . , w
n} are two linearly independent families
of vectors.
We will denote by W ′ = Span(w′1, x
1, . . . , w
n) (respectively
W ′′ = Span(w′′1 , x
1 , . . . , w
n)) the subquadratic space (possibly degenerate), of
W ′ (respectively W ′′) and we define the linear map f : W ′ → W ′′ by f(w′i) = w
and f(x′i) = x
i for all i ∈ {1, . . . , n}.
Since α and α′ preserve the quadratic forms, we deduce from the relations 2.10.2
and 2.10.3 that f preserves the quadratic form. Hence, we can apply Theorem
2.10 to the nondegenerate space W ′⊥W ′′, which gives a morphism f : W ′⊥W ′′ →
W ′⊥W ′′ of Eq, such that, the restriction of this morphism to W
′ coincides with f .
We deduce the commutativity of the following diagram:
α̃ //
α̃′ ++WWW
W W⊥(W ′⊥W ′′)
W⊥(W ′⊥W ′′)
where α̃ = iW⊥W ′ ◦ α and α̃′ = iW⊥W ′′ ◦ α
′. Consequently, we obtain the equality
T = T ′ since, by inclusion, we have
T = [V
−→W⊥W ′
←−−W ] = [V
−→W⊥W ′⊥W ′′
←−−W ]
T ′ = [V
−→W⊥W ′′
←−−W ] = [V
−→ W⊥W ′⊥W ′′
←−−W ].
Notation 2.11. For V and W two objects of Eq, and f a morphism of HomEf (ǫ(V ), ǫ(W )),
we denote by tf the morphism of Hom
(V,W ) corresponding to f and by [tf ] the
canonical generator of P
V (W ) obtained from tf . To simplify the notation, we will
denote the morphism tIdV of Hom
(V, V ) by eV .
We deduce from the first point of Theorem 2.6 the following corollary.
Corollary 2.12. For V , W and X objects of Eq, f : ǫ(W )→ ǫ(X) and g : ǫ(V )→
ǫ(W ) morphisms of Ef , we have:
tf ◦ tg = tf◦g
where tf , tg and tf◦g are respectively the morphisms of Hom
(W,X), Hom
(V,W )
and Hom
(V,X) associated to the linear maps f, g and f ◦ g.
We can apply this result to the idempotents of the ring of endomorphisms
End(PV ), to obtain the following proposition.
Proposition 2.13. The canonical generator [eV ] of P
V is an idempotent of the
ring of endomorphisms End(PV ) such that PV .[eV ] ≃ P
GENERIC REPRESENTATIONS OF ORTHOGONAL GROUPS: PROJECTIVE FUNCTORS 11
Proof. The canonical generator [eV ] is an idempotent of End(PV ) by Corollary 2.12.
By definition of the rank filtration PV .[eV ] ⊂ P
V and, for a canonical generator
[tf ] of P
V , we have [tf ] = [tf ] · [eV ]. �
The idempotent [eV ] plays a central rôle in the proof of the thickness of the
subcategory ι(F) in Fquad, which is the subject of the following paragraph. For
that, the following result is necessary.
Lemma 2.14. Let V and W be objects of Tq, the functor ι induces an isomorphism:
HomF (P
ǫ(V ), P
ǫ(W ))
−→ HomFquad(P
V , P
where ǫ : Tq → E is the forgetful functor.
Proof. By Proposition 2.13 and Theorem 2.6 we have the following equivalences:
HomFquad(P
V , P
W ) ≃ P
W (eV )P
W (V ) ≃ P
W (V )
≃ ι(PFǫ(W ))(V ) ≃ P
ǫ(W )(ǫ(V )) ≃ HomF (P
ǫ(V ), P
ǫ(W )).
To conclude this paragraph, we give the following property of eV which will be
useful in section 4 concerning the polynomial functors of Fquad.
Lemma 2.15. For V and W two objects of Tq, we have: eV⊥W = eV⊥eW , where
⊥ : Tq × Tq → Tq is the functor induced by the orthogonal sum.
Proof. This is a straightforward consequence of Proposition 2.8. �
2.2.2. The category ι(F) is a thick subcategory of Fquad. The aim of this paragraph
is to prove the following result.
Theorem 2.16. The category ι(F) is a thick subcategory of Fquad, where
ι : F → Fquad is the functor defined in Theorem 1.3.
To prove this theorem, we need the following general result about the precom-
position functor which is proved in the Appendix of [?].
Proposition 2.17. Let C and D be two small categories, A be an abelian category,
F : C → D be a functor and −◦F : Func(D,A)→ Func(C,A) be the precomposition
functor, where Func(C,A) is the category of functors from C to A. If F is full and
essentially surjective, then any subobject (respectively quotient) of an object in the
image of the precomposition functor is isomorphic to an object in the image of the
precomposition functor.
Proof of Theorem 2.16. • The subcategory ι(F) of Fquad is full by Theorem
• Let FF be an object in F and G a subobject of ι(FF ). Let F ′ be the
category of functors from Ef−(even) to E , where Ef−(even) is the full sub-
category of Ef having as objects the F2-vector spaces of even dimension.
The categories F and F ′ are equivalent [?]. The functor ǫ : Eq → E
factorizes through the inclusion Ef−(even) →֒ Ef . This induces a functor
ǫ′ : Eq → E
f−(even) which is full and essentially surjective. Consequently,
we can use Proposition 2.17 to obtain: G ≃ ι(GF ). Similarly, we obtain
the result for the quotient.
12 CHRISTINE VESPA
• Let GF and HF be objects of F , we set G = ι(GF ) and H = ι(HF ). For a
short exact sequence: 0 → G → F → H → 0, we have to prove that there
exists a functor FF in F such that F = ι(FF ).
Let P1 → P0 → G
F → 0 and Q1 → Q0 → H
F → 0 be projective
presentations of GF and HF in F , we have the following commutative
diagram
0 // ι(P1) //
ι(P1)⊕ ι(Q1) //
ι(Q1) //
0 // ι(P0) //
ι(P0)⊕ ι(Q0) //
ι(Q0) //
0 // G
0 0 0
where the columns are projective resolutions in Fquad, by the horseshoe
lemma. By Lemma 2.14, the morphism ι(P1)⊕ ι(Q1)→ ι(P0)⊕ ι(Q0) is in-
duced by a morphism of F denoted by f . Consequently,
F ∼= ι(Coker(f)) ∈ ι(F).
By Theorem 2.16, we deduce from Lemma 2.14, the following characterization
of the simple functors of F in Fquad which will be used in section 4 of this paper
concerning the polynomial functors of Fquad.
Lemma 2.18. (1) Let F be a functor of Fquad, then F is in the image of the
functor ι : F → Fquad if and only if, for all objects V in Tq,
F (eV )F (V ) = F (V ).
(2) Let S be a simple object in Fquad, then S is in the image of the functor
ι : F → Fquad if and only if there exists an object W in Tq such that
S(eW )S(W ) 6= 0.
Proof. (1) The forward implication is a consequence of the following fact: for a
functor F in the image of ι, HomFquad(P
V , F ) = F (eV )F (V ) = F (V ). The
reverse implication relies on the fact that the condition
F (eV )F (V ) = F (V ) implies that F is a quotient of a sum of projective
objects of the form P
V . Since the category ι(F) is thick in Fquad by
Theorem 2.16, we obtain the result.
(2) Observe that, if S(eW )S(W ) 6= 0, we have HomFquad(P
W , S) 6= 0, thus
S is a quotient of P
W by simplicity of S. Lemma 2.14 implies that there
exists a one-to-one correspondance between the indecomposable factors of
V and those of P
ǫ(V ). We deduce that the simple quotients of P
V arise
from F . Consequently, S is in the image of the functor ι.
GENERIC REPRESENTATIONS OF ORTHOGONAL GROUPS: PROJECTIVE FUNCTORS 13
2.2.3. The quotient PV /P
(dim(V )−1)
V . The aim of this paragraph is to prove the
following result:
Proposition 2.19. Let V be an object in Tq, we have a natural equivalence:
PV /P
(dim(V )−1)
V ≃ κ(isoV )
where isoV is an isotropic functor and κ : Fiso → Fquad is the functor given in
Theorem 1.4.
To prove this proposition, we need the following notation and result:
Notation 2.20. Denote by σf the natural map PV
−→ κ(isoV ) which corre-
sponds to the canonical generator [V
−→ V ] of isoV (V ) by the equivalence
Hom(PV , κ(isoV )) ≃ isoV (V ) ≃ F2[O(V )] given by the Yoneda lemma.
Lemma 2.21. The functor κ(isoV ) of Fquad is a quotient of the functor PV =
F2[HomTq(V,−)].
Proof. The natural map PV
−→ κ(isoV ) is surjective: a pre-image of the canonical
generator [V
−→ W ] of κ(isoV )(W ) by (σf )W , is the morphism
g◦f−1
−−−−→W
←−W ]. �
A formal consequence of the previous lemma is given in the following result.
Lemma 2.22. The functor κ(isoV ) of Fquad is a quotient of the functor PV /P
(dim(V )−1)
Proof. By definition of the filtration and by the previous lemma, we have the dia-
gram:
0 // P
(dim(V )−1)
i // PV
// // PV /P
(dim(V )−1)
κ(isoV )
where i is the canonical inclusion of P
(dim(V )−1)
V in PV . By definition of σId,
we have σId ◦ i = 0, from which we deduce the existence of the surjection τ :
PV /P
(dim(V )−1)
V → κ(isoV ). �
We will prove below that this natural map is an isomorphism. It is sufficient to
prove the following result.
Proposition 2.23. For V and W two objects of Tq, we have an isomorphism
(PV /P
(dim(V )−1)
V )(W ) ≃ κ(isoV )(W ).
The proof of this proposition relies on the following lemma.
Lemma 2.24. For a non-zero canonical generator of (PV /P
(dim(V )−1)
V )(W ) repre-
sented by the morphism T = [V
−→ W⊥W ′
←−− W ] of Tq, we have g(V ) ⊂ W and
T = [V
←−W ], where g = iW ◦ f .
14 CHRISTINE VESPA
Proof. By definition of the filtration, for V andW two objects of Tq, the vector space
(PV /P
(dim(V )−1)
V )(W ) is generated by Hom
[dim(V )]
(V,W ) where Hom
[dim(V )]
(V,W )
is the set of morphisms from V to W whose the pullback D in Edegq is a quadratic
space such that dim(D) = dim(V ). We deduce from the existence of a monomor-
phism from D to V and from the equality of the dimensions, that D and V are
isometric. Consequently, for the morphism T of the statement, we have, by defi-
nition of the pullback, g(V ) ⊂ W . Thus, we have T = [V
←− W ], where
g = iW ◦ f , by the equivalence relation defined over the morphisms of Tq in Defini-
tion 1.1.
Proof of Proposition 2.23. The natural map τ obtained in the proof of Lemma 2.22
defines, for W an object in Tq, the linear map
τW : (PV /P
(dim(V )−1)
V )(W ) → κ(isoV )(W )
T = [V
←−W ] 7−→ [V
−→ W ]
which is clearly an isomorphism. �
3. Decomposition of the standard projective functors PH0 and PH1
On abelian categories, the decompositions into direct summands of a functor
F of Fquad correspond to decompositions into orthogonal idempotents of 1 in the
ring EndFquad(F ) (see for example [?]). One of the difficulties of the category
Fquad lies in the fact that the rings of endomorphisms of projectives PV and their
representations are not well-understood. The decompositions of projectives PV ,
obtained in work in preparation, using a refinement of the rank filtration will allow
us to understand the structure of these rings better.
In this section, we obtain the decompositions into indecomposable factors of the
projective objects PH0 and PH1 by an explicit study of the filtration defined in
section 2. This section concludes by several consequences of these decompositions.
In particular, we give a classification of the “small” simple functors of Fquad, which
is an essential ingredient in the following section about the polynomial functors of
Fquad.
3.1. Decomposition of PH0 . To obtain the decomposition of the functor PH0 into
indecomposable factors, we give an explicit description of the subquotients of the
filtration; then, we prove that the filtration splits for this functor and we identify
the factors of this decomposition.
3.1.1. Explicit description of the subquotients of the filtration. The aim of this para-
graph is to give a basis of the vector spaces P
(V ), P
(V ) and PH0/P
for V a given object in Tq.
We deduce from Theorem 2.6 and Notation 2.11, the following result.
Lemma 3.1. A basis B
(V ) is given by the set:
= {[tf ] for f ∈ HomEf (F2
⊕2, ǫ(V ))}.
By definition of the filtration, a canonical generator of P
(V ), represented
by the morphism T = [H0
−→ V⊥L
←− V ] of Tq, satisfies the following property:
I = f(H0) ∩ i(V ) is a quadratic space of dimension one.
GENERIC REPRESENTATIONS OF ORTHOGONAL GROUPS: PROJECTIVE FUNCTORS 15
Lemma 3.2. Let T = [H0
−→ V⊥L
←− V ] be a morphism of Tq which represents a
canonical generator of P
(V ), and {a0, b0} be a symplectic basis of H0, then
the map f in T has one of the three following forms.
(1) If I = (f(a0), 0), the map f : H0 → V⊥L is defined by:
f(a0) = v and f(b0) = w + l
for v and w elements of V satisfying q(v) = 0 and B(v, w) = 1 and l a
non-zero element of L.
(2) If I = (f(b0), 0), the map f : H0 → V⊥L is defined by:
f(a0) = v + l and f(b0) = w
for v and w elements of V satisfying q(w) = 0 and B(v, w) = 1 and l a
non-zero element of L.
(3) If I = (f(a0 + b0), 1), the map f : H0 → V⊥L is defined by:
f(a0) = v + l and f(b0) = w + l
for v and w elements of V satisfying q(v + w) = 1 and B(v, w) = 1 and l
a non-zero element of L.
Proof. The quadratic space H0 has three subspaces of dimension one which are:
Span(a0) and Span(b0) isometric to (x, 0) and Span(a0 + b0) isometric to (x, 1).
These three subspaces give rise to each one of the maps f defined in the statement.
Notation 3.3. The morphisms [H0
−→ V⊥L
←− V ], where f is one of the mor-
phisms described in the point (1) (respectively (2) and (3)) of the previous lemma,
will be known as type A (respectively B and C) morphisms.
We have the following proposition.
Proposition 3.4. For T = [H0
−→ V⊥L
←− V ] and T ′ = [H0
−→ V⊥L′
←− V ]
morphisms of Tq which represent canonical generators of P
(V ), the follow-
ing properties are equivalent.
(1) The morphisms T and T ′ of HomTq (H0, V ) have the same type and satisfy
the relation pV ◦ f = p
V ◦ f
(2) The morphisms T and T ′ of HomTq (H0, V ) are equal.
The proof of the implication (2)⇒ (1) relies on the following technical lemma.
Lemma 3.5. Let T = [V
←− W ] and T ′ = [V
−→ X ′
←− W ′] be morphisms
of HomTq (V,W ). If T = T
′, then g(V ) + h(W ) ≃ g′(V ) + h′(W ) in Edegq .
Proof. By definition of the equivalence relation given in Definition 1.1, it is sufficient
to prove that, for two morphisms T and T ′ such that TRT ′, we have g(V )+h(W ) ≃
g′(V ) + h′(W ).
16 CHRISTINE VESPA
By definition, g(V )+h(W ) is the smallest, possibly degenerate, quadratic space
such that we have a commutative diagram in Edegq , of the form:
**UUU
U g(V ) + h(W )
Similarly, g′(V ) + h′(W ) is the smallest quadratic space such that we have an
analogous commutative diagram. By definition of the relation R, we have the
existence of a morphism δ in Eq such that the following diagram is commutative:
By the consideration of the following commutative diagram in Edegq
where Y = g(V ) + h(W ), we deduce from the minimality of g′(V ) + h′(W ), the
existence of a morphism in Edegq from g
′(V ) + h′(W ) to g(V ) + h(W ) such that the
corresponding diagram is commutative. Then, by minimality of g(V ) + h(W ) for
T , we obtain: g(V ) + h(W ) ≃ g′(V ) + h′(W ). �
Proof of Proposition 3.4. Suppose that the morphisms T and T ′ of HomTq (H0, V )
are of type A such that pV ◦ f = p
V ◦ f
′. We deduce that:
f(a0) = v; f(b0) = w + l and f
′(a0) = v; f
′(b0) = w + l
for v and w elements of V and l (resp. l′) a non-zero element of L (resp. L′).
Since the maps f and f ′ preserve the quadratic forms, we have
q(b0) = q(w) + q(l) = q(w) + q(l
′). We deduce that q(l) = q(l′), thus the map
Span(l)
−→ Span(l′), such that α(l) = l′, preserves the quadratic form. Conse-
quently, we can apply Theorem 2.10 to obtain the existence of a map α : L⊥L′ →
GENERIC REPRESENTATIONS OF ORTHOGONAL GROUPS: PROJECTIVE FUNCTORS 17
L⊥L′ such that the following diagram is commutative:
α // L⊥L′
Span(l)
// Span(l′).
We deduce the commutativity of the diagram:
V � _
f ′ **VVV
V⊥(L⊥L′)
V⊥(L⊥L′).
Since T = [H0
−→ V⊥L⊥L′
←− V ] and T ′ = [H0
−→ V⊥L⊥L′
←− V ], by inclusion,
we deduce from the previous diagram that T = T ′.
We reason in the same way, for the morphisms of type B and C.
Conversely, if T = T ′, by Lemma 3.5 f(H0) + iV (V ) ≃ f
′(H0) + i
V (V ). Conse-
quently we deduce from Theorem 2.10 the existence of an isometry β : V⊥L⊥L′→
V⊥L⊥L′ making the following diagram commutative:
V⊥L⊥L′
β // V⊥L⊥L′
f(H0) + iV (V )
// f ′(H0) + i
V (V ).
This yields the commutativity of the following diagram:
f ′ **UUU
U V⊥L⊥L′
V⊥L⊥L′
which implies that β ◦ iV = i
V . Thus, β = IdV⊥β
′ where β′ : L⊥L′ → L⊥L′ is a
morphism of HomEq (L⊥L
′, L⊥L′). Consequently, we have
f(a0) = v + l; f(b0) = w + l
′ and f ′(a0) = v + β
′(l); f ′(b0) = w + β
′(l′)
and we deduce that pV ◦ f = p
V . Furthermore, since β
′ is inversible, for all x in
L⊥L′ we have: x is non-zero if and only if β′(x) is non-zero. Consequently, T and
T ′ have the same type. �
This proposition justifies the following notation.
Notation 3.6. We will denote by Av,w, Bv,w and Cv,w the morphisms of HomTq (H0, V )
respectively of type A, B and C and such that pV ◦ f(a0) = v and pV ◦ f(b0) = w.
18 CHRISTINE VESPA
The following result is a straightforward consequence of Lemma 3.2 and Propo-
sition 3.4.
Lemma 3.7. A basis B
(V ) is given by the set:
= { [Av,w] for v and w elements of V satisfying q(v) = 0 and B(v, w) = 1,
[Bv,w] for v and w elements of V satisfying q(w) = 0 and B(v, w) = 1,
[Cv,w] for v and w elements of V satisfying q(v + w) = 1 and B(v, w) = 1}
By Proposition 2.19, we have (PH0/P
)(V ) ≃ κ(isoH0)(V ). We deduce the
following result.
Lemma 3.8. A basis B
of PH0/P
(V ) is given by the set:
= {[Df ] for f ∈ HomEq (H0, V )},
where Df is the morphism of Tq represented by the diagram: H0
←− V .
We end this paragraph by the rules of composition for the morphisms tf , Av,w,
Bv,w, Cv,w and Df , summarized in the following proposition. This technical re-
sult will be fundamental in the following paragraph, to prove the splitting of the
filtration.
Lemma 3.9. Let T = V
−→ W⊥L
←−− W be a morphism of HomTq(V,W ). The
following relations are satisfied:
(1) For f a morphism of HomEf (F2
⊕2, ǫ(V )) we have:
T ◦ tf = tϕ◦f .
(2) (a) For v and w elements of V satisfying q(v) = 0 and B(v, w) = 1, we
have:
T ◦Av,w =
Aϕ(v),pW ◦ϕ(w) if ϕ(v) ∈W
tpW ◦(ϕ⊥Id)◦α otherwise.
(b) For v and w elements of V satisfying q(w) = 0 and B(v, w) = 1, we
have:
T ◦Bv,w =
BpW ◦ϕ(v),ϕ(w) if ϕ(w) ∈W
tpW ◦(ϕ⊥Id)◦α otherwise.
(c) For v and w elements of V satisfying q(v + w) = 1 and B(v, w) = 1,
we have:
T ◦ Cv,w =
CpW ◦ϕ(v),pW ◦ϕ(w) if ϕ(v + w) ∈W
tpW ◦(ϕ⊥Id)◦α otherwise.
(3) For f a morphism of HomEq (H0, V ), we have:
T ◦Df =
Dϕ◦f if ϕ ◦ f(a0) ∈ W and ϕ ◦ f(b0) ∈ W
Aϕ◦f(a0),pW ◦ϕ◦f(b0) if ϕ ◦ f(a0) ∈ W and ϕ ◦ f(b0) /∈ W
BpW ◦ϕ◦f(a0),ϕ◦f(b0) if ϕ ◦ f(a0) /∈ W and ϕ ◦ f(b0) ∈ W
CpW ◦ϕ◦f(a0),pW ◦ϕ◦f(b0) if ϕ ◦ f(a0) /∈ W and ϕ ◦ f(b0) /∈ W
and ϕ ◦ f(a0 + b0) ∈W
tpW ◦(ϕ⊥Id)◦α if ϕ ◦ f(a0) /∈ W and ϕ ◦ f(b0) /∈ W
and ϕ ◦ f(a0 + b0) /∈W.
GENERIC REPRESENTATIONS OF ORTHOGONAL GROUPS: PROJECTIVE FUNCTORS 19
Proof. By definition of the composition in Tq, we have the following diagram:
α // V⊥L′
// W⊥L⊥L′
(1) For a morphism tf , we have α(a0) = f(a0) + l and α(b0) = f(b0) + m.
where {l,m} is a linearly independent family of L′. Consequently:
(ϕ⊥Id) ◦ α(a0) = ϕ ◦ f(a0) + l et (ϕ⊥Id) ◦ α(b0) = ϕ ◦ f(b0) +m.
We deduce that T ◦ tf = tϕ◦f .
(2) For Av,w, we have α(a0) = v and α(b0) = w + l
′, where l′ is a non-zero
element of L′. Consequently:
(ϕ⊥Id) ◦ α(a0) = ϕ(v) et (ϕ⊥Id) ◦ α(b0) = ϕ(w) + l
We have to distinguish two cases:
• if ϕ(v) ∈W , since ϕ preserves quadratic forms, we have q(ϕ(v)) = q(v)
and, since L′ is orthogonal to V , B(ϕ(v), pW ◦ϕ(w)) = B(ϕ(v), ϕ(w)) =
B(v, w). Thus the morphism Aϕ(v),pW ◦ϕ(w) is defined and we have:
T ◦Av,w = Aϕ(v),pW ◦ϕ(w);
• otherwise, ϕ(v) = pW ◦ ϕ(v) + m where m is a non-zero element of
L. Consequently, we obtain a morphism of nul rank and we have:
T ◦Av,w = tpW ◦(ϕ⊥Id)◦α.
The cases Bv,w and Cv,w are similar to the case of Av,w and are left to
the reader.
(3) For the morphism Df , where f is an element of HomEq (H0, V ), we have
α(a0) = f(a0) = v and α(b0) = f(b0) = w where v and w are elements of V .
Consequently: (ϕ⊥Id)◦α(a0) = ϕ(v) and (ϕ⊥Id)◦α(b0) = ϕ(w). Since ϕ◦f
preserves the quadratic forms, we have: q(ϕ ◦ f(a0)) = q(ϕ ◦ f(b0)) = 0
and B(ϕ ◦ f(a0), ϕ ◦ f(b0)) = 1. Thus the morphisms Aϕ◦f(a0),ϕ◦f(b0),
Bϕ◦f(a0),ϕ◦f(b0) and Cϕ◦f(a0),ϕ◦f(b0) are defined.
We have to distinguish four cases:
• if ϕ(v) ∈W and ϕ(w) ∈W then T ◦Df = Dϕ◦f ;
• if ϕ ◦ f(a0) ∈ W and ϕ ◦ f(b0) /∈ W , we have ϕ ◦ f(a0) = w
′ and
ϕ ◦ f(b0) = w
′′ + l, where l is a non-zero element of L. Consequently,
we obtain a morphism of type A and we have T ◦Df = Aϕ◦f(a0),ϕ◦f(b0);
• if ϕ ◦ f(a0) /∈ W and ϕ ◦ f(b0) ∈ W , we have ϕ ◦ f(a0) = w
′ + l and
ϕ ◦ f(b0) = w
′′, where l is a non-zero element of L. Consequently, we
obtain a morphism of type B and we have T ◦Df = Bϕ◦f(a0),ϕ◦f(b0);
• if ϕ ◦ f(a0) /∈ W , ϕ ◦ f(b0) /∈ W and ϕ ◦ f(a0 + b0) ∈ W , we have
ϕ◦f(a0) = w
′+ l and ϕ◦f(b0) = w
′′+ l, where l is a non-zero element
of L. Consequently, we obtain a morphism of type C and we have
T ◦Df = Cϕ◦f(a0),ϕ◦f(b0);
• if ϕ ◦ f(a0) /∈ W , ϕ ◦ f(b0) /∈ W and ϕ ◦ f(a0 + b0) /∈ W , we have
ϕ ◦ f(a0) = w
′ + l and ϕ ◦ f(b0) = w
′′ + l′, where l and l′ are non-zero
20 CHRISTINE VESPA
elements of L. Consequently, we obtain a morphism of nul rank and
we have T ◦Df = tpW ◦(ϕ⊥Id)◦α.
3.1.2. Splitting of the filtration for the functor PH0 . In this paragraph, we prove
the following result.
Proposition 3.10. The rank filtration splits for the functor PH0 , namely:
PH0 = P
⊕ PH0/P
Proof. By Theorem 2.6, we have: P
. To prove the proposition,
it is sufficient to prove that PH0 = P
⊕ PH0/P
By definition of the filtration, we have the short exact sequence:
(3.10.1) 0→ P
→ PH0
−→ PH0/P
Let V be an object in Tq, we consider a morphism f of HomEq (H0, V ) and
the generator [Df ] of PH0/P
(V ) associated to f . Since the map f preserves
the quadratic forms, we have: q(f(a0)) = q(f(b0)) = 0, q(f(a0 + b0)) = 1; thus
B(f(a0), f(b0)) = 1. Consequently, the morphisms Af(a0),f(b0), Bf(a0),f(b0) and
Cf(a0),f(b0) of HomTq (H0, V ) are defined. We define a map sV : PH0/P
(V ) →
PH0(V ) by:
sV : PH0/P
(V ) −→ PH0(V )
[Df ] 7→ [Df ] + [Af(a0),f(b0)] + [Bf(a0),f(b0)] + [Cf(a0),f(b0)].
We verify the two following statements.
(1) pV ◦ sV = Id.
For [Df ] a canonical generator of PH0/P
(V ), we have
pV ◦ sV ([Df ]) = [Df ]
since the morphisms Af(a0),f(b0), Bf(a0),f(b0) and Cf(a0),f(b0) have a rank
equal to one.
(2) The maps sV define a natural map.
One verifies that, for a morphism T = V
−→W⊥L
←−−W of HomTq (V,W ),
we have the commutativity of the following diagram:
PH0/P
sV //
PH0/P
PH0(V )
PH0 (T )
PH0/P
sW // PH0(W ).
In order to simplify notation, we will write: A′ = Aϕ◦f(a0),pW ◦ϕ◦f(b0),
B′ = BpW ◦ϕ◦f(a0),ϕ◦f(b0), C
′ = CpW ◦ϕ◦f(a0),pW ◦ϕ◦f(b0) and t
′ = tpW ◦(ϕ⊥Id)◦α.
On the one hand, by Lemma 3.9, we have:
PH0 (T ) ◦ sV ([Df ])
= PH0(T )([Df ] + [Af(a0),f(b0)] + [Bf(a0),f(b0)] + [Cf(a0),f(b0)])
= [T ◦Df ] + [T ◦Af(a0),f(b0)] + [T ◦Bf(a0),f(b0)] + [T ◦ Cf(a0),f(b0)]
GENERIC REPRESENTATIONS OF ORTHOGONAL GROUPS: PROJECTIVE FUNCTORS 21
[Dϕ◦f ] +[A
′] +[B′] +[C′] if ϕ ◦ f(a0) ∈ W and ϕ ◦ f(b0) ∈ W
[A′] +[A′] +[t′] +[t′] = 0 if ϕ ◦ f(a0) ∈ W and ϕ ◦ f(b0) /∈ W
[B′] +[t′] +[B′] +[t′] = 0 if ϕ ◦ f(a0) /∈ W and ϕ ◦ f(b0) ∈ W
[C′] +[t′] +[t′] +[C′] = 0 if ϕ ◦ f(a0) /∈ W and ϕ ◦ f(b0) /∈ W
and ϕ ◦ f(a0 + b0) ∈W
[t′] +[t′] +[t′] +[t′] = 0 if ϕ ◦ f(a0) /∈ W and ϕ ◦ f(b0) /∈ W
and ϕ ◦ f(a0 + b0) /∈W.
[Dϕ◦f ] +[A
′] +[B′] +[C′] if ϕ ◦ f(a0) ∈ W and ϕ ◦ f(b0) ∈W
0 otherwise.
On the other hand, by Lemma 3.9, we have:
PH0/P
(T )([Df ]) =
[Dϕ◦f ] if ϕ ◦ f(a0) ∈W and ϕ ◦ f(b0) ∈W
0 otherwise
since the morphisms A, B, C and t are zero in the quotient PH0/P
(W ).
We deduce,
sW ◦ PH0/P
(T )([Df ]) =
[Dϕ◦f ] + [A
′] + [B′] + [C′] if ϕ ◦ f(a0) ∈ W
and ϕ ◦ f(b0) ∈W
0 otherwise.
Consequently, the maps sV define a natural map which is a section of p. This
gives rise to the splitting of the exact sequence 3.10.1. �
3.1.3. Identification of the direct summands. The aim of this paragraph is to iden-
tify the summands of the decomposition given in Proposition 3.10. We begin by
proving that the morphisms of type A (respectively of type B and C) define a
subfunctor of P
which is a direct summand of this functor.
Lemma 3.11. The functor P
admits the following decomposition into di-
rect summands:
= FA ⊕ FB ⊕ FC
where FA, FB and FC are subfunctors of P
generated by, respectively, the
morphisms of type A, B and C.
Proof. By Lemma 3.7, we have an isomorphism of vector spaces
(V ) = FA(V )⊕ FB(V )⊕ FC(V ),
for all objects V in Tq.
Consequently, it is sufficient to prove that FA, FB and FC are subfunctors of
For FA, we have to verify the commutativity of the diagram
FA(V )
iV //
FA(T )
FA(W )
iW // P
22 CHRISTINE VESPA
where T is a morphism of HomTq(V,W ). Let [Av,w] be a canonical generator of
FA(V ), we have by Lemma 3.9
T ◦Av,w =
Aϕ(v),pW ◦ϕ(w) if ϕ(v) ∈W
tpW ◦(ϕ⊥Id)◦α otherwise.
Consequently,
(T ) ◦ iV ([Av,w]) =
[Aϕ(v),pW ◦ϕ(w)] if ϕ(v) ∈ W
0 otherwise,
since the morphism tpW ◦(ϕ⊥Id)◦α, has nul rank.
We deduce that P
(T ) ◦ iV ([Av,w ]) is in the vector space FA(W ); thus,
FA is a subfunctor of P
In the same way, by the use of values of T ◦Bv,w and T ◦ Cv,w given in Lemma
3.9, we prove that the functors FB and FC are subfunctors of P
In the following lemma, we identify the functors FA, FB and FC with certain
mixed functors defined in [?] and recalled in section 1.
Lemma 3.12. (1) The functors FA and FB are isomorphic to the functor
Mix0,1.
(2) The functor FC is isomorphic to the functor Mix1,1.
Proof. (1) The isomorphism FA ≃ Mix0,1.
Let [Av,w] be a canonical generator of FA(V ), we have, by definition,
B(v, w) = 1. Consequently, the following linear map exists:
σ1V : FA(V ) → Mix0,1(V )
[Av,w] 7−→ [(w, v + w)].
The map σ1V is an isomorphism, whose inverse is given by
(σ1V )
: Mix0,1(V ) → FA(V )
[(v, w)] 7−→ [Av+w,v].
We have to verify that the maps σ1V define a natural map; namely, for
a morphism T = [V
−→ W⊥L
←−− W ], that the following diagram is
commutative:
FA(V )
σ1V //
FA(T )
Mix0,1(V )
Mix0,1(T )
FA(W )
σ1W // Mix0,1(W ).
We have:
Mix0,1(T ) ◦ σ
V ([Av,w ]) = Mix0,1(T )[(w, v + w)]
[(pW ◦ (ϕ(w)), pW ◦ (ϕ(v + w))] if ϕ(v) ∈W
0 otherwise
GENERIC REPRESENTATIONS OF ORTHOGONAL GROUPS: PROJECTIVE FUNCTORS 23
by the definition of the mixed functors, and
σ1W ◦ FA(T )([Av,w]) = σ
[Aϕ(v),pW ◦ϕ(w)] if ϕ(v) ∈W
0 otherwise
[(pW ◦ (ϕ(w)), ϕ(v) + pW ◦ (ϕ(w))] if ϕ(v) ∈W
0 otherwise.
When ϕ(v) ∈ W , we have:
[(pW ◦ (ϕ(w)), ϕ(v) + pW ◦ (ϕ(w))] = [(pW ◦ (ϕ(w)), pW ◦ (ϕ(v + w))],
what proves the naturality of σ1.
Since the two following cases are very close to the previous one, we only
give the definition of the isomorphism of vector spaces and we leave the
reader to verify that they define natural equivalences.
(2) The isomorphism FB ≃ Mix1,0.
σ2V : FB(V ) → Mix0,1(V )
Bv,w 7−→ [(v, v + w)].
(3) The isomorphism FC ≃Mix1,1.
σ3V : FC(V ) → Mix1,1(V )
Cv,w 7−→ [(v, w)]
We deduce the following proposition.
Proposition 3.13. The projective functor PH0 admits the following decomposition
into direct summands:
PH0 = ι(P
⊕2))⊕ (Mix0,1
⊕2 ⊕Mix1,1)⊕ κ(isoH0)
where Mix0,1 and Mix1,1 are mixed functors and isoH0 is an isotropic functor.
Proof. This proposition is a straightforward consequence of Proposition 3.10, The-
orem 2.6, Proposition 2.19 and Lemmas 3.11 and 3.12. �
3.2. Decomposition of PH1 . The study of the functor PH1 is analogous to that
of the functor PH0 given in the previous section. Consequently, for the functor PH1 ,
we give only the principal results without proofs.
3.2.1. Explicit description of the subquotients of the filtration. In this paragraph,
we give basis of the vector spaces P
(V ), P
(V ) and PH1/P
(V ) for V
an object in Tq.
Lemma 3.14. A basis B
(V ) is given by the set:
= {tf for f ∈ HomEf (F2
⊕2, ǫ(V ))}.
Lemma 3.15. Let T = [H1
−→ V⊥L
←− V ] be a morphism of Tq which represents
a canonical generator of P
(V ), and {a1, b1} a symplectic basis of H1, then
the map f in T has one of the three following forms.
24 CHRISTINE VESPA
(1) If I = (f(a1), 0) the map f : H1 → V⊥L is defined by:
f(a1) = v et f(b1) = w + l
for v and w elements of V satisfying q(v) = 1 and B(v, w) = 1 and l a
non-zero element of L.
(2) If I = (f(b1), 0) the map f : H1 → V⊥L is defined by:
f(a1) = v + l et f(b1) = w
for v and w elements of V satisfying q(w) = 1 and B(v, w) = 1 and l a
non-zero element of L.
(3) If I = (f(a1 + b1), 1) the map f : H1 → V⊥L is defined by:
f(a1) = v + l et f(b1) = w + l
for v and w elements of V satisfying q(v + w) = 1 and B(v, w) = 1 and l
a non-zero element of L.
Notation 3.16. The morphisms [H1
−→ V⊥L
←− V ], where f is one of the
morphisms described in the point (1) (respectively (2) and (3)) of the previous lemma
will be known as type E (respectively F and G) morphisms.
The analogous proposition to Proposition 3.4 holds for H1. This justifies the
following notation.
Notation 3.17. Denote by Ev,w, Fv,w and Gv,w the morphisms of HomTq (H1, V )
respectively of type E, F and G and such that pV ◦ f(a1) = v and pV ◦ f(b1) = w.
We deduce the following lemmas.
Lemma 3.18. A basis B
(V ) is given by the set:
= { [Ev,w] for v and w elements of V satisfying q(v) = 1 and B(v, w) = 1,
[Fv,w] for v and w elements of V satisfying q(w) = 1 and B(v, w) = 1,
[Gv,w] for v and w elements of V satisfying q(v + w) = 1 and B(v, w) = 1}
Lemma 3.19. A basis B
of PH1/P
(V ) is given by the set:
= {[Hf ] for f ∈ HomEq (H1, V )},
where Hf is the morphism of Tq represented by the diagram: H1
←− V .
The rules of composition of morphisms Ev,w, Fv,w , Gv,w and Hf are similar to
those given for Av,w, Bv,w, Cv,w and Df in Lemma 3.9. The details can be provided
by the reader.
3.2.2. Splitting of the filtration for the functor PH1 .
Proposition 3.20. The rank filtration splits for the functor PH1 , namely:
PH1 = P
⊕ PH1/P
Proof. One verifies that the map sV : PH1/P
(V )→ PH1(V ) given by:
sV : PH1/P
(V ) −→ PH1(V )
[Hf ] 7→ [Hf ] + [Ef(a1),f(b1)] + [Ff(a1),f(b1)] + [Gf(a1),f(b1)].
defines a natural map which is a section of the projection PH1 → PH1/P
GENERIC REPRESENTATIONS OF ORTHOGONAL GROUPS: PROJECTIVE FUNCTORS 25
3.2.3. Identification of the direct summands. We have the following lemma.
Lemma 3.21. The functor P
admits the following decomposition into di-
rect summands
= FE ⊕ FF ⊕ FG
where FE, FF and FG are subfunctors of P
generated by, respectively, the
morphisms of type E, F and G.
In the following lemma, we identify the functors FE , FF and FG with mixed
functor.
Lemma 3.22. The functors FE , FF and FG are equivalent to the functor Mix1,1.
We deduce the following decomposition.
Proposition 3.23. The projective functor PH1 admits the following decomposition
into direct summands:
PH1 = ι(P
)⊕Mix1,1
⊕3 ⊕ κ(isoH1)
where Mix1,1 is a mixed functor and isoH1 is an isotropic functor.
3.3. Consequences of decompositions of functors PH0 and PH1 . In this sec-
tion, we draw the conclusions of the decompositions of PH0 and PH1 given in Propo-
sitions 3.13 and 3.23. We deduce the indecomposability of the functors Mix0,1 and
Mix1,1, we study the projectivity of the first isotropic functors in Fquad and we give
the classification of the “small” simple objects of Fquad.
3.3.1. Indecomposability of functors Mix0,1 and Mix1,1. The aim of this paragraph
is to prove the following result:
Proposition 3.24. The functors Mix0,1 and Mix1,1 are indecomposable.
The proof of this proposition relies on the following obvious lemma.
Lemma 3.25. If the functor F of Fquad decomposes as a direct sum: F1⊕ . . .⊕Fn,
then the projections πi : F → Fi and the inclusions ji : Fi → F induce idempotents
ei = ji ◦ πi in the ring End(F ).
Proof of Proposition 3.24. (1) By the Yoneda lemma, we have:
Hom(PH0 ,Mix0,1) = Mix0,1(H0).
By a calculation, we obtain that the dimension of the space Mix0,1(H0) is
equal to 4. According to Proposition 3.13, the order of multiplicity of the
summand Mix0,1 in the decomposition of PH0 is equal to 2. Consequently,
the dimension of the vector space E := Hom(Mix0,1,Mix0,1) is 2. We have
the following basis: {Id, τ} where the map τ is given by: τ([(u, v)]) =
([(v, u)]). Consequently, E = ({0, Id, τ, Id + τ},+, ◦), as a ring, and it is
easy to see that this ring does not admit a non-trivial idempotent.
(2) Similarly, we have Hom(PH0 ,Mix1,1) = Mix1,1(H0) and
dim(Mix1,1(H0)) = 2. The order of multiplicity of the summand Mix1,1
in the decomposition of PH0 is equal to 1. We deduce that the ring
Hom(Mix1,1,Mix1,1) does not admit a non-trivial idempotent.
26 CHRISTINE VESPA
We deduce from this proposition the following result, which complements The-
orem 1.9, obtained in [?]:
Corollary 3.26. The short exact sequence 0 → Σα,1 → Mixα,1 → Σα,1 → 0 does
not split.
3.3.2. Projectivity of certain isotropic functors in Fquad. The decompositions given
in Propositions 3.13 and 3.23 allow us to study the projectivity of isotropic functors
in Fquad. Corollary 4.37 in [?] shows that the set of functors {isoV |V ∈ S} is a
set of projective generators of Fiso, where S is a set of representatives of isometry
classes of (possibly degenerate) quadratic spaces.
Since the functor κ(isoH0) (respectively κ(isoH1) ) is a direct summand of the
functor PH0 (respectively PH1) we have the following result.
Proposition 3.27. The functors κ(isoH0) and κ(isoH1) are projective in the cat-
egory Fquad.
We deduce from Corollary 1.6 and the previous proposition, the following result.
Corollary 3.28. The category Fquad contains non-constant finite, projective ob-
jects.
This corollary constitutes one of the new features of the category Fquad compared
to F . Recall that, according to Corollary B7 in [?], due to Lionel Schwartz, the
category F does not contain non-constant finite projective functors.
Recall that, the functor κ(iso(x,0)) is the top composition factor of Mix0,1 and
κ(iso(x,1)) is that of Mix1,1. We have the following result.
Proposition 3.29. The projective cover of κ(iso(x,0)) (respectively κ(iso(x,1))) is
the functor Mix0,1 (respectively Mix1,1). In particular, the functors κ(iso(x,0)) and
κ(iso(x,1)) are not projective in Fquad.
Proof. Since κ(iso(x,0))(H0) 6= {0} and κ(iso(x,1))(H0) 6= {0}, if these two functors
were projective, they would be direct summands of the functor PH0 . We deduce
from Proposition 3.13, that these functors are not projective. �
Remark 3.30. Propositions 3.27 and 3.29 let us conjecture that, for a nondegen-
erate F2-quadratic space H, κ(isoH) is a projective functor in Fquad and, for a
degenerate quadratic space D, κ(isoD) is not a projective functor in Fquad and its
projective cover is a generalized mixed functor. This result will be the subject of
future work.
3.3.3. Classification of simple objects S of Fquad such that either S(H0) 6= 0 or
S(H1) 6= 0. If S is a simple object in Fquad, such that S(H0) 6= 0, the Yoneda lemma
implies that Hom(PH0 , S) = S(H0) 6= 0. Consequently, there exists a morphism of
Fquad from PH0 to S which is an epimorphism, by simplicity of S. We deduce
from the decompositions given in Proposition 3.13 and 3.23, from Corollary 1.7
concerning the functors isoH0 and isoH1 and from the study of the functors Mix0,1
and Mix1,1 done in [?] and recalled in section 1, the following result.
Proposition 3.31. The isomorphism classes of non-constant simple functors of
Fquad such that either S(H0) 6= 0 or S(H1) 6= 0 are:
ι(Λ1), ι(Λ2), ι(S(2,1)), κ(iso(x,0)), κ(iso(x,1)), RH0 , RH1 , SH1
where RH0 , RH1 and SH1 are the simple functors introduced in Corollary 1.7.
GENERIC REPRESENTATIONS OF ORTHOGONAL GROUPS: PROJECTIVE FUNCTORS 27
3.3.4. Extension groups in Fquad. By Theorem 1.4 and 1.5 we obtain an exact, fully-
faithful functor
V ∈S F2[O(V )]−mod
−→ Fquad, where S is a set of representatives
of isometry classes of quadratic spaces (possibly degenerate). Consequently, for
M and N two F2[O(V )]− modules, this functor induces a morphism of extension
groups:
F2[O(V )]−mod
(M,N)
(κ̃)∗
−−−→ Ext∗Fquad(κ̃(M), κ̃(N)).
We have the following proposition.
Proposition 3.32. For V ∈ {H0, H1}, the morphism (κ̃)∗ is an isomorphism.
The proof of this proposition relies on the following lemma.
Lemma 3.33. For V ∈ {H0, H1}, if P is a finite projective F2[O(V )]-module, κ̃(P )
is projective in Fquad.
Proof. If P is a finite projective F2[O(V )]-module, there exists a F2[O(V )]-module
Q such that P ⊕Q ≃ F2[O(V )]
⊕N . We deduce from the exactness of κ that κ̃(P ⊕
Q) ≃ κ̃(P ) ⊕ κ̃(Q). Since κ̃(F2[O(V )]) = κ(isoV ) and the functors κ(isoH0) and
κ(isoH1) are projective, by Proposition 3.27, we obtain that κ̃(P ) is projective. �
Proof of Proposition 3.32. Let M and N be F2[O(V )]-modules for V ∈ {H0, H1}
and P• → M be a projective resolution of M . Lemma 3.33 implies that κ̃(P•)
is a projective resolution of κ̃(M). The functor κ̃ induces a morphism of cochain
complexes
HomF2[O(V )]−mod(P•, N)→ HomFquad(κ̃(P•), κ̃(N))
which induces the morphism (κ̃)∗ in cohomology. Since the functor κ̃ is fully-
faithful the previous morphism is an isomorphism and so induces an isomorphism
in cohomology.
We deduce the following corollary:
Corollary 3.34. For n a natural number, we have:
ExtnFquad(RH0 , RH0) ≃ F2 and Ext
Fquad
(RH1 , RH1) ≃ F2
where RH0 and RH1 are the simple functors introduced in Corollary 1.7.
Proof. Let ǫ be an element in {0, 1}. For V = Hǫ, by Corollary 1.7 (1), we have
κ̃(F2) = RHǫ . So, applying Proposition 3.32 to M = N = F2 we obtain:
Ext∗Fquad(RHǫ , RHǫ) ≃ Ext
F2[O(Hǫ)]−mod
(F2,F2) = H
∗(O(Hǫ),F2).
Since O(H0) ≃ S2 ≃ C2 and O(H1) ≃ S3 ≃ GL2(F2) we know by classical results
of cohomology of groups that
Hn(O(H0),F2) = H
n(O(H1),F2) = F2.
Remark 3.35. This corollary exhibits an important difference between the cate-
gories F and Fquad; recall that in F Ext
F (S, S) = 0 for all simple objects S of F
(see [?]).
28 CHRISTINE VESPA
4. Application: the polynomial functors of Fquad
In this section, having generalized the notion of polynomial functor to the cat-
egory Fquad, we prove, by induction, that the polynomial functors of Fquad are in
the image of the functor ι : F → Fquad.
4.1. Definition of polynomial functors of Fquad.
4.1.1. The difference functors of Fquad. We define the difference functors of Fquad
which generalize the notion of difference functor of F . Recall that, according to
[?], the difference functor ∆ : F → F is the functor given by
∆F (V ) := Ker(F (V ⊕ F2)
F (p)
−−−→ F (V )),
for F an object in F , V an object in Ef and p : V ⊕ F2 → V the projection.
Definition 4.1. The difference functors ∆H0 : Fquad → Fquad and ∆H1 : Fquad →
Fquad are the functors defined by:
∆H0F (V ) := Ker(F (V⊥H0)
F (T0)
−−−−→ F (V )),
∆H1F (V ) := Ker(F (V⊥H1)
F (T1)
−−−−→ F (V )),
for F an object in Fquad, V an object in Tq, and Ti = [V⊥Hi
−→ V⊥Hi
←− V ] for
i ∈ {0, 1} .
We have the following result:
Lemma 4.2. The functors ∆H0 and ∆H1 are exact.
4.1.2. Definition of polynomial functors. Before giving the definition of polynomial
functors in Fquad, let us recall that of polynomial functors of F ( [?]). For an object
F of F , F is a polynomial functor of degree 0 if and only if ∆F = 0 and, for an
integer d, F is polynomial of degree at most d+ 1 if and only if ∆F is polynomial
of degree at most d.
Definition 4.3. Let F be an object in Fquad:
(1) the functor F is polynomial of degree 0 if and only if ∆H0F = ∆H1F = 0;
(2) for an integer d, the functor F is polynomial of degree at most d+1, if and
only if ∆H0F and ∆H1F are polynomial of degree at most d.
The following proposition allows us to simplify the definition of a polynomial
functor of degree 0.
Lemma 4.4. Let F be an object in Fquad. The functor ∆H0F is zero if and only
if the functor ∆H1F is zero.
Proof. If ∆H0F = 0, we have, for all objects V of Tq, F (V ) ≃ F (V⊥H0). Conse-
quently,
F (V ) ≃ F (V⊥H0) ≃ F (V⊥H0⊥H0) ≃ F (V⊥H1⊥H1),
where the last isomorphism is obtained from the isomorphism H0⊥H0 ≃ H1⊥H1,
recalled in section 1. We deduce from the existence of morphisms F (V ) →֒ F (V⊥H1)
and F (V ⊥H1) →֒ F (V⊥H1⊥H1), induced by the inclusions and from the isomor-
phism between F (V ) and F (V⊥H1⊥H1), that
F (V ) ≃ F (V⊥H1) ≃ F (V⊥H1⊥H1).
GENERIC REPRESENTATIONS OF ORTHOGONAL GROUPS: PROJECTIVE FUNCTORS 29
Therefore ∆H1F = 0.
The proof of the converse is similar. �
4.2. Study of polynomial functors of Fquad. The aim of this section is to prove
the following result:
Theorem 4.5. The polynomial functors of Fquad are in the image of the functor
ι : F → Fquad.
We will prove this theorem by induction over the degree of the polynomial func-
tors.
4.2.1. Polynomial functors of degree zero of Fquad. In this paragraph, we start the
induction. The proof of the following result relies, in an essential way, on the
classification of simple functors S of Fquad such that S(H0) 6= 0 or S(H1) 6= 0,
obtained in Proposition 3.31.
Lemma 4.6. Let S be a simple functor of Fquad, S is a polynomial functor of
degree zero if and only if S is the constant functor F2.
Proof. In order to prove the direct implication, we have to distinguish two cases.
(1) If S(H0) = S(H1) = 0.
By the classification of the nondegenerate quadratic spaces over F2, if W
is a space of minimal dimension, satisfying S(W ) 6= 0, we have the existence
of an element ǫ of {0, 1} and a nondegenerate quadratic space V which is
non-zero, such that: W ≃ Hǫ⊥V. Since W is of minimal dimension, we
have S(V ) = 0. This implies:
∆HǫS(V ) = S(Hǫ⊥V ) 6= 0.
We deduce the result in this case.
(2) If S(H0) 6= 0 or S(H1) 6= 0.
In this case, we use the classification of simple functors S of Fquad such
that S(H0) 6= 0 or S(H1) 6= 0 obtained in Proposition 3.31. By an explicit
calculation for all the functors S obtained in this classification, we obtain
that the functors ∆H0S are non-zero except for the constant functor S = F2.
The converse is trivial.
4.2.2. Proof of Theorem 4.5. To prove Theorem 4.5, we need the following result
where the idempotents [eV ], obtained in Proposition 2.13, play a crucial rôle.
Proposition 4.7. Let S be a non-trivial simple functor of Fquad which is not in
the image of the functor ι : F → Fquad, then, one of the functors ∆H0S or ∆H1S
is not in the image of the functor ι : F → Fquad.
Proof. Let W be a nondegenerate quadratic space of minimal dimension, such that
S(W ) 6= 0. We distinguish the two following cases.
(1) If dim(W ) = 2.
By an explicit calculation for all the functors S of the classification given
in Proposition 3.31, we obtain the result.
30 CHRISTINE VESPA
(2) If dim(W ) > 2.
There exists a nondegenerate quadratic space V , possibly trivial, and an
element ǫ of {0, 1}, such that: W ≃ H0⊥Hǫ⊥V. Suppose that ∆H0S and
∆H1S are in the image of the functor ι, we prove, below, that this implies
that S is in the image of ι. By Lemma 2.18 it is sufficient to show the
existence of an object W in Tq such that: S(eW )S(W ) 6= 0. By Lemma
2.15, we have: eW = eH0⊥eHǫ⊥eV .
Since W is assumed to be a space of minimal dimension such that
S(W ) 6= 0, we have S(H0⊥V ) = S(Hǫ⊥V ) = 0. This implies that
∆H0S(Hǫ⊥V ) ≃ S(W )(4.7.1)
∆HǫS(H0⊥V ) ≃ S(W ).(4.7.2)
These isomorphisms are natural and, for (4.7.1), the action of EndTq(Hǫ⊥V )
on ∆H0S(Hǫ⊥V ) corresponds to the restriction of the action of EndTq (W )
on S(W ). In the same way, for (4.7.2), the action of EndTq (H0⊥V ) on
∆HǫS(H0⊥V ) corresponds to the restriction of the action of EndTq (W ) on
S(W ). Suppose that ∆H0S and ∆H1S are in the image of ι. We deduce
that:
S(1H0⊥eHǫ⊥eV )S(W ) = ∆H0S(eHǫ⊥eV )∆H0S(Hǫ⊥V )
= ∆H0S(Hǫ⊥V ) = S(W )
where the first equality comes from the action described previously, the
second is a consequence of Lemma 2.15 and the third is given by 4.7.1. In
the same way, we obtain:
S(eH0⊥1Hǫ⊥eV )S(W ) = ∆HǫS(eH0⊥eV )∆HǫS(H0⊥V )
= ∆HǫS(H0⊥V ) = S(W ).
We deduce that: S(1H0⊥eHǫ⊥eV ) ◦ S(eH0⊥1Hǫ⊥eV )S(W ) = S(W ). Since
S(1H0⊥eHǫ⊥eV ) ◦ S(eH0⊥1Hǫ⊥eV ) = S(eW ) by Lemma 2.15, we have:
S(eW )S(W ) 6= 0, as required.
The proof of the theorem relies, also, on the following lemmas.
Lemma 4.8. (1) A functor F of Fquad which takes values in finite vector
spaces and such that ∆HǫF is finite, is finite.
(2) A polynomial functor F of Fquad which takes values in finite vector spaces
is finite.
Proof. (1) The functor
Fquad
−→ E × Fquad
F 7→ (F (0),∆HǫF )
is exact and faithful by Lemma 4.2. Hence, if γ(F ) is finite then F is finite.
(2) If F takes values in finite vector spaces, so does ∆HǫF . Consequently we
can apply the first point recursively to obtain the result.
Lemma 4.9. A finite object F of Fquad whose composition factors are in the image
of the functor ι is in the image of the functor ι.
GENERIC REPRESENTATIONS OF ORTHOGONAL GROUPS: PROJECTIVE FUNCTORS 31
Proof. This result is a straightforward consequence of the thickness of the subcat-
egory ι(F) in Fquad given in Theorem 2.16. �
Proof of Theorem 4.5. Since a functor of Fquad is colimit of its subfunctors which
take values in finite vector spaces, we deduce from Lemma 4.8, that it is sufficient
to prove the result for the finite polynomial functors of Fquad. Furthermore, since
the functors ∆H0 and ∆H1 are exact by Lemma 4.2, it is sufficient to consider the
case of a simple functor S. We prove the theorem by induction over the polynomial
degree.
If S is polynomial of degree 0, according to Lemma 4.2, S is in the image of the
functor ι.
Suppose that all simple polynomial functors of Fquad and of degree d are in the
image of the functor ι and consider a simple polynomial functor S of Fquad such
that deg(S) = d + 1. By the definition of polynomial functor in Fquad given in
4.3, the functors ∆H0S and ∆H1S are polynomial of degree d. We deduce that all
composition factors of ∆H0S and ∆H1S are polynomial of degree smaller than or
equal to d and, by induction, we obtain that they are in the image of ι. Since S is a
simple functor, it is a quotient of a standard projective functor PV . Consequently
S takes its values in finite dimensional vector spaces. Therefore, ∆H0S and ∆H1S
take their values in finite dimensional vector spaces. We deduce from Lemma 4.8
that the functors ∆H0S and ∆H1S are finite, and, by Lemma 4.9, we obtain that
∆H0S and ∆H1S are in the image of the functor ι. Consequently, by Proposition
4.7, S is in the image of the functor ι. �
Ecole Polytechnique Fédérale de Lausanne, Institut de Géométrie, Algèbre et
Topologie, Lausanne, Switzerland.
E-mail address: christine.vespa@epfl.ch
	Introduction
	1. The category Fquad: some recollections
	2. Filtration of the standard projective functors PV of Fquad
	2.1. Definition of the filtration
	2.2. Extremities of the filtration
	3. Decomposition of the standard projective functors PH0 and PH1
	3.1. Decomposition of PH0
	3.2. Decomposition of PH1
	3.3. Consequences of decompositions of functors PH0 and PH1
	4. Application: the polynomial functors of Fquad
	4.1. Definition of polynomial functors of Fquad
	4.2. Study of polynomial functors of Fquad
ABSTRACT
  In this paper, we continue the study of the category of functors Fquad,
associated to F_2-vector spaces equipped with a nondegenerate quadratic form,
initiated in two previous papers of the author. We define a filtration of the
standard projective objects in Fquad; this refines to give a decomposition into
indecomposable factors of the two first standard projective objects in Fquad.
As an application of these two decompositions, we give a complete description
of the polynomial functors of the category Fquad.

<|endoftext|><|startoftext|>
Introduction
Recently a new class of geometries related with stable forms has been discovered [Hitchin2000],
[Hitchin2001], [Witt2005], [Le2006], [LPV2007]. In some cases we can define easily a nec-
essary and sufficient condition for a manifold M to admit a stable form of type ω in terms
of topological invariants of M , for example if ω is a 3-form of G2-type [Gray1969]. But
in general there is no method to solve the question how to find a necessary and sufficient
condition for a manifold to admit a stable form. In a previous note [Le2006] we have
wrongly stated a sufficient condition for an open manifold to admit a closed stable 3-form
of G̃2-type. We recall that [Bryant1987] a 3-form on R
7 is called of G̃2-type, if it lies on
the Gl(R7)-orbit of a 3-form
3 = θ1 ∧ θ2 ∧ θ3 + α1 ∧ θ1 + α2 ∧ θ2 + α3 ∧ θ3
Here αi are 2-forms on V
7 which can be written as
α1 = y1 ∧ y2 + y3 ∧ y4, α2 = y1 ∧ y3 − y2 ∧ y4, α3 = y1 ∧ y4 + y2 ∧ y3
and (θ1, θ2, θ3, y1, y2, y3, y4) is an oriented basis of (V
The group G̃2 can be defined as the isotropy group of ω
3 under the action of Gl(R7).
Bryant proved that [Bryant1987] G̃2 coincides with the automorphism group of the split
octonians.
In this note we prove the following
http://arxiv.org/abs/0704.0503v2
Main Theorem. Suppose that M7 is a compact 7-manifold. Then M7 admits a 3-form
of G̃2-type, if and only if M
7 is orientable and spinnable. Equivalently the first and second
Stiefel-Whitney classes of M7 vanish. Suppose that M7 is an open manifold which admits
an embedding to a compact orientable and spinnable 7-manifold. Then M7 admits a closed
3-form of G̃2-type.
2 Proof of Main Theorem
Our proof is based on the following simple fact on G̃2.
2.1. Lemma. We have π1(G̃2) = Z2. Hence its maximal compact Lie group is SO(4).
This Lemma is well-known, (Bryant mentioned it but he omitted a proof in [Bryant1987]),
but I did not find an explicit proof of it in popular lectures on Lie groups, though it could
be given as an exercise. For a hint to a solution of this exercise we refer to [HL1982], p.115,
for an explicit embedding of SO(4) into G2. The reader can also check that the image of
this group is also a subgroup of G̃2 ⊂ Gl(R
7). We shall denote this image by SO(4)3,4.
The Cartan theory on symmetric spaces implies that SO(4)3,4 is a maximal compact Lie
subgroup of G̃2.
Now let us return to proof of our Main theorem. Clearly if M7 admits a G̃2-structure,
then it must be orientable and spinnable, since a maximal compact Lie subgroup SO(4)3,4
of G2 is also a compact subgroup of G2.
2.2. Lemma. Assume that M7 is compact, orientable and spinnable. Then M7 admits a
G̃2-structure.
Proof. SinceM7 is compact, orientable and spinable,M7 admits a SU(2)-structure [Friedrich1997].
Now it is easy to see that it admits a SO(4)3,4-structure, where SO(4)3,4 is a maximal com-
pact Lie subgroup of G2. Hence M
7 admits a G̃2-structure. ✷
To prove the last statement of the Main Theorem we shall use the following theorem due
to Eliashberg-Mishachev to deform the 3-form ω3 to a closed 3-form ω̄3 of G̃2-type on
For a subspaceR ⊂ ΛpM we denote by CloaR a subspace of the space SecR which consists
of closed p-forms ω : M → R in the cohomology class a ∈ Hp(M).
Eliashberg-Mishashev Theorem [E-M2002,10.2.1] Let M be an open manifold, a ∈
Hp(M) a fixed cohomology class and R an open Diff M-invariant subset. Then the inclu-
CloaR →֒ SecR
is a homotopy equivalence. In particular,
- any p-form ω : M → R is homotopic in R to a closed form ω̄.
- any homotopy of p-form ωt : M :→ R which connects two closed forms ω0, ω1 ∈ a can be
deformed in R into a homotopy of closed forms ω̄t connecting ω0 and ω1 ∈ a.
Let R be the space of all 3-forms of G̃2-type on M = M
7. Clearly this space is an open
DiffM7-invariant subset of Λ3M7. Now we apply the Eliashberg-Mishashev theorem to
our 3-form ω3 of G̃2-type whose existence has been proved above. Hence M
7 admits a
closed 3-form ω̄3 of G̃2-type. ✷
2.3. Remark. It seems that we can drop the closedness condition in our Main Theorem
and use the classical obstruction theory to prove the main Theorem.
Acknowledgement.
This note is partially supported by grant of ASCR Nr IAA100190701.
References
[Adams1996] J.F. Adams, Lectures on exceptional Lie groups, The Chicago University
Press, 1996.
[Bryant1987] R. Bryant, Metrics with exceptional holonomy, Ann. of Math. (2), 126
(1987), 525-576.
[E-M2002] Y. Eliashberg and N. Mishachev, Introduction to the h-Principle, AMS
2002.
[Gray1969] A. Gray, Vector cross products on manifolds, TAMS 141, (1969), 465-504,
(Errata in TAMS 148 (1970), 625).
[Friedrich1997] Th. Friedrich, I. Kath, A. Moroianu, U. Semmelmann, On nearly
parallel G2-manifolds, Journal Geom. Phys. 23 (1997), 259-286.
[HL1982] , R. Harvey and H. B. Lawson, Calibrated geometries, Acta Math. (182),
47-157.
[Hitchin2000] N. Hitchin, The geometry of three-forms in 6 and 7 dimensions, J.D.G. 55
(2000), 547-576.
[Hitchin2001] N. Hitchin, Stable forms and special metrics, Contemporean math., (2001),
288, 70-89.
[Le2006] H. V. Le, The existence of symplectic 3-forms on 7-manifolds,
arXiv:math.DG/0603182.
[LPV2007] H.V.Le, M. Panak and J. Vanzura, Manifolds admitting stable forms, in
preparation.
[Witt2005] F. Witt, Special metric structures and closed forms, Ph.D. Thesis ,
arxiv:math.DG/0502443.
Hong Van Le, Institute of Mathematics, Zitna 25, 11567 Praha 1, hvle@math.cas.cz,
http://arxiv.org/abs/math/0603182
http://arxiv.org/abs/math/0502443
	Introduction
	Proof of Main Theorem
ABSTRACT
  We find a necessary and sufficient condition for a compact 7-manifold to
admit a $\tilde G_2$-structure. As a result we find a sufficient condition for
an open 7-manifold to admit a closed 3-form of $\tilde G_2$-type.

<|endoftext|><|startoftext|>
Introduction
Reports on recent observations of pulsars in various binary systems show that
the maximum mass of a neutron star can be large as (1.7 ∼ 2.1)M⊙.
1), 2) Most of
them still have large uncertainty, but a few are within the above range with relatively
small error bars. The possibility of large mass of a neutron star thus has led to a
claim that exotic states of matter at high densities are not necessary in the neutron
star as far as its mass is concerned,3) but there also appeared a counter argument
that the large mass does not necessarily rule out the exotic states.4) There are many
sources of uncertainties at high densities, e.g. state of matter, constituent particles
and their interactions, but the information available to reduce the uncertainties is
not sufficient yet.
We revisit the neutron star mass problem with a simple phenomenological ap-
proach. One fixed point of nuclear matter physics is the nuclear saturation density;
its properties such as density, binding energy, symmetry energy, and compression
modulus are fairly well constrained. We describe these saturation properties in terms
of quantum hadrodynamics (QHD).5) The other fixed point we choose is the hard
core repulsion at short range. Though it is a kind of artifact adopted to describe the
nucleon-nucleon data, its role is clear in many phenomena of nuclear physics. The
effect of hard core can be parametrized with an excluded volume in the estimation
of thermodynamic variables.6), 7), 8) It has been more frequently employed to explain
the phase transition in the relativistic heavy ion collision environment, and could
describe well the transition from hadronic to quark-gluon plasma phase.9)
In this work, by including hard cores in the interaction of baryons, we explore
the bulk properties of the neutron star, and compare the result with the recent
observation of heavy neutron star masses. This paper is outlined as follows. In the
next section, we briefly address the basic formalism of QHD with hard core. The next
section comes up with numerical results, and brief concluding remarks are drawn in
the following section.
∗) e-mail address: hch@meson.skku.ac.kr
typeset using PTPTEX.cls 〈Ver.0.9〉
http://arxiv.org/abs/0704.0504v1
2 C. H. Hyun
§2. Formalism
We employ the QHD Lagrangian,
ψ̄B (i∂ · γ −mB + gσBσ − gωBγ0ω0 − gρBτ3γ0b30)ψB
mN (gσNσ)
(gσNσ)
l=e, µ
ψ̄l(i∂ · γ −ml)ψl, (2.1)
where the baryon species B includes octet baryons, and σ, ω0 and b30 are non-
vanishing meson fields in the mean field approximation. When we account for the
forbidden region due to hard core, the baryon density is redefined as
1 + vevρ′
, (2.2)
where ρ′ is the density in the case of point particle and vev the excluded volume.
We assume vev =
πr30 where r0 is the radius of hard core, which is treated as a free
parameter in our consideration. Consistency with thermodynamic relations and self-
consistency conditions alter the form of state variables (pressure, chemical potential,
energy density and etc) and equation of motion of σ-meson field from those of point
particle ones. The explicit formulas and equations can be found in old6), 7), 8) and
recent10), 11) publications.
Three meson-nucleon coupling constants gσN , gωN and gρN and two σ-meson
self interaction coefficients b and c are fitted to five saturation properties, the sat-
uration density (0.17 fm−3), binding energy (16.0 MeV), symmetry energy (32.5
MeV), compression modulus (300 MeV) and nucleon effective mass (0.75m∗N ), with
a given hard core radius r0. Meson-hyperon coupling constants are determined by
quark counting rules, gMY = gMN
q=u,d nqY /3, where gMY is the meson-hyperon
coupling constant, nqY is the number of u and d quarks in a hyperon species Y
and gMN is the meson-nucleon coupling constant. As for the hard core radius of
hyperons, we assume the same value as that of the nucleon for simplicity.
Table I summarizes the parameters determined from the given saturation prop-
erties and hard core radii.
r0 (fm) gσN gωN gρN b (×10
3) c (×103)
0 8.44 8.92 7.76 3.97 4.00
0.2 8.43 8.91 7.72 3.80 4.37
0.3 8.39 8.89 7.64 3.38 5.26
0.4 8.30 8.85 7.47 2.51 7.11
0.5 8.16 8.78 7.19 0.93 10.47
Table I. Meson-nucleon coupling constants and coefficients b and c fitted to a set of saturation
properties described in the text with a given r0 value.
Compatibility of Exotic States with Neutron Star Observation 3
§3. Numerical result
Fig. 1 shows the binding energy per a nucleon in the symmetric nuclear matter
with different hard core radii. Though the saturation properties are the same re-
gardless of r0 values, the equation of state becomes stiffer at high densities with a
larger r0 value.
 0  0.5  1  1.5  2  2.5
r0 = 0.0
    0.2
    0.4
    0.5
Fig. 1. Binding energy of a nucleon in the symmetric nuclear matter with different hard core radii.
The equation of state of neutron star matter is determined self-consistently by
the baryon number conservation, charge neutrality, β-equilibrium of baryons and
leptons, and equations of motion of meson fields. Once the equation of state is
determined, the mass-radius relation of a neutron star can be obtained by solving
Tolman-Oppenheimer-Volkoff (TOV) equation. Table II shows the maximum mass of
a neutron star, corresponding radius and central density with nucleons only (columns
of “np”) and with hyperons (columns of “npY ”). Consistent with the behavior of
the equation of state at high densities in Fig. 1, the maximum mass becomes larger
with a larger r0 value. For np case, when r0 = 0.5 fm, the increase amounts to
10% of the maximum mass without hard core. With hyperons, the maximum mass
increases to the range of large mass in recent observations when r0 & 0.3 fm.
n p n p Y
r0 (fm) M (M⊙) R (km) ρcent (ρ0) M (M⊙) R (km) ρcent (ρ0)
0 2.10 10.9 6.4 1.53 11.3 6.1
0.2 - - - 1.58 11.4 6.1
0.3 2.14 11.1 6.2 1.70 11.5 5.9
0.4 2.20 11.4 5.9 1.97 12.2 5.2
0.5 2.34 11.7 5.4 - - -
Table II. Maximum mass M in units of solar mass, and corresponding radius R in km and central
density ρcent in unit of the saturation density ρ0.
4 C. H. Hyun
§4. Conclusion
We investigated the maximum mass of a neutron star in a simple phenomeno-
logical approach where the hard-core repulsion is included in the QHD model. The
hard core radius is treated as a free parameter, and the meson-nucleon coupling con-
stants are fixed identical saturation properties. We obtained the equation of state
of neutron star matter that satisfies thermodynamic equations and self-consistency
conditions. Solving TOV equation, we obtained the mass-radius relation of a neu-
tron star. Our result shows that the maximum mass with hyperons can be as large
as observed masses with a hard core radius r0 & 0.3 fm. These values of r0 are in the
range of hard core radius 0.3 ∼ 0.6 fm in well-known hard-core potential models such
as Hamada-Johnston12) or Reid.13) More investigations are necessary to figure out
the uncertainties. For instance, the hard core size of hyperons can matter. The effect
of hard cores to the formation of other exotic states such as meson condensation or
deconfined quark phases is also worthy to be studied.
To conclude, the our result shows that the hyperon matter, which is known to
give the biggest effect to the mass-radius relation of a neutron star among possible
exotic states in the interior of a neutron star, is not necessarily incompatible with
the observed mass.
Acknowledgments
The author thanks the Yukawa Institute for Theoretical Physics at Kyoto Uni-
versity, where this work was initiated and developed during the YKIS2006 on ”New
Frontiers on QCD”. Author is grateful to Shung-ichi Ando for reading the manuscript.
This works was supported by the Basic Research Program of Korea Science & En-
gineering Foundation (R01-2005-000-10050-0).
References
1) D. J. Nice et al., Astrophys. J. 634 (2005), 1242
2) D. Page and S. Reddy, Ann. Rev. Nucl. Part. Sci. 65 (2006), 327
3) F. Özel, Nature 441 (2006), 1115
4) T. Klahn et al., nucl-th/0609067.
5) B. D. Serot and J. D. Walecka, Adv. Nucl. Phys. 16 (1986), 1
6) D. H. Rischke, M. I. Gorenstein, H. Stöcker and W. Greiner, Z. Phys. C 51 (1991), 485
7) S. Kagiyama, A. Nakamura and T. Omodaka, Z. Phys. C 53 (1992), 163
8) J. Cleymans, J. Stalnacke and E. Suhonen, Z. Phys. C 55 (1992), 317
9) S. Kagiyama et al., Eur. Phys. J. C 25 (2002), 453
10) P. K. Panda et al., Phys. Rev. C 65 (2002), 065206
11) R. M. Aguirre and A. L. De Paoli, Phys. Rev. C 68 (2003), 055804
12) T. Hamada and I. D. Johnston, Nucl. Phys. 34 (1962), 382
13) R. V. Reid, Ann. Phys. (N.Y.) 50 (1968), 411
http://arxiv.org/abs/nucl-th/0609067
	Introduction
	Formalism
	Numerical result
	Conclusion
ABSTRACT
  We consider the effect of hard core repulsion in the baryon-baryon
interaction at short distance to the properties of a neutron star. We obtain
that, even with hyperons in the interior of a neutron star, the neutron star
mass can be as large as $\sim 2 M_\odot$.

<|endoftext|><|startoftext|>
OCU-PHYS 263
Exact Solutions of Einstein-Yang-Mills Theory with
Higher-Derivative Coupling
Hironobu Kihara∗
Osaka City University, Advanced Mathematical Institute (OCAMI),
3-3-138 Sugimoto, Sumiyoshi, Osaka 558-8585, Japan
Muneto Nitta†
Department of Physics, Keio University, Hiyoshi,
Yokohama, Kanagawa 223-8521, Japan
Abstract
We construct a classical solution of an Einstein-Yang-Mills system with a fourth order term
with respect to the field strength of the Yang-Mills field. The solution provides a compactification
proposed by Cremmer and Scherk; ten-dimensional space-time with a cosmological constant is
compactified to the four-dimensional Minkowski space with a six-dimensional sphere S6 on which
an instanton solution exists. The radius of the sphere is not a modulus but is determined by the
gauge coupling and the four-derivative coupling constants and the Newton’s constant. We also
construct a solution of ten-dimensional theory without a cosmological constant compactified to
AdS4 × S6.
∗Electronic address: kihara(at)sci.osaka-cu.ac.jp
†Electronic address: nitta(at)phys-h.keio.ac.jp
http://arxiv.org/abs/0704.0505v3
mailto:kihara(at)sci.osaka-cu.ac.jp
mailto:nitta(at)phys-h.keio.ac.jp
Unification of fundamental forces with space-time and matter often requires higher-
dimensional space-time rather than our four-dimensional Universe. The early Kaluza-Klein
theory unifies gravity and the electro-magnetic interaction by considering five-dimensional
space-time with one direction compactified into a circle S1 [1]. This old idea has been revis-
ited several times. After supergravity was discovered many people tried to unify all forces
and matter in higher-dimensional space-time compactified on various internal manifolds [2].
String theory was proposed as the most attractive candidate of unification, but it is defined
only in ten-dimensional space-time. In order to realize four-dimensional Universe one has
to find a suitable six-dimensional internal space. So many candidates of such spaces were
proposed; Calabi-Yau manifolds and orbifold models. Internal manifolds can be deformed
with satisfying the Einstein equation, and these degrees of freedom are called the moduli.
The moduli introduce unwanted massless particles in four-dimensional world. Recently a
new mechanism has been suggested to fix these moduli by turning on the Ramond-Ramond
flux on the internal space [3, 4]. This flux compactification has been extensively studied in
these years.
We would like to revise the compactification scenario with fixed moduli proposed by
Cremmer and Scherk long time ago [5] (see also [6, 7]) in a theory with a cosmological
constant. By placing solitons on a compact internal space they showed decompactifying limit
with large radius of the internal space is disfavored and the radius is fixed to a certain value
determined by coupling constants. They considered the ’t Hooft-Polyakov monopole [8] on
S2 and the Yang-Mills instanton [9] on S4, both of which can satisfy, with proper coupling
constants, the first order (self-dual) equations rather than the second order equations of
motion, but their solutions on higher dimensional sphere are not the case. Since string
theory is defined in ten dimensions, it is natural to consider this scenario with stable BPS
solitons on a six-dimensional internal space like S6.
Higher dimensional generalization of self-dual equations was suggested by Tchrakian some
years ago [10]. Eight dimensional case is known as octonionic instantons [11]. Though several
works have been done for generalized self-dual equations [12, 13], a six-dimensional case was
not discussed because of the lack of conformal property. Recently we have found a new
solution to the generalized self-dual equations in an SO(6) pure Yang-Mills theory with a
fourth order term with respect to the field strength of the Yang-Mills field (a four-derivative
term) on a six-dimensional sphere S6 [14].
In this letter we propose to use this solution in the context of a compactification of the
Cremmer-Scherk type. In our model ten-dimensional space-time with (without) a cosmo-
logical constant is compactified to a four-dimensional Minkowski space M4 (anti de Sitter
space AdS4) with a six-dimensional sphere S
6, where dimensionality of the internal space,
six, is required by the four derivative term. Unlike the case of the absence of gravity [14] the
four-derivative coupling constant α can differ from the constant β in the generalized self-dual
equations. When the relation α = β holds the generalized self-dual equations become the
Bogomol’nyi equations and solutions are BPS. We find for both M4 × S6 and AdS4 × S6
that certain relations exist between the radius of S6, the gauge coupling, the four-derivative
coupling α and the gravitational coupling constants. When the four-derivative coupling con-
stant α vanishes in the case of M4 × S6, these relations reduce to those of the original work
by Cremmer and Scherk. The advantage of our model to the Cremmer-Scherk model is that
the Yang-Mills soliton in our model satisfies the self-dual equations (the Bogomol’nyi equa-
tions for α = β) rather than usual equations of motion in the case of the Cremmer-Scherk
model. This ensures the stability of configuration at least for the sector of Yang-Mills fields.
Let us consider that space-time is a ten-dimensional manifold. We consider an Einstein-
Yang-Mills theory. Our action contains as dynamical variables the Yang-Mills (gauge) fields
and a graviton field or the metric ĝµ̂ν̂ . Indices with a hat “̂ ” will refer to a ten-
dimensional space-time (X0, X1, · · · , X9). Latin indices (a, b, · · · ) run from 1 to 6 and refer
to an internal space. The Clifford algebra associated with the orthogonal group SO(6) is
useful and we represent generators of the Lie algebra so(6) as their elements. The Clifford
algebra is defined by gamma matrices {Γa} which satisfy the following anti-commutation
relations, {Γa,Γb} = 2δab. These matrices can be realized as 8 × 8 matrices with complex
coefficients. The generators of so(6) are represented by Γab =
[Γa,Γb]. We often abbreviate
the Yang-Mills fields as Aµ̂ =
Γab and we also use notations with differential forms.
Thus the gauge fields are expressed as A = Aµ̂dX
µ̂. In this notation, the corresponding field
strength F is written as F = dA+eA∧A, where e is a gauge coupling. Covariant derivative
Dµ̂ on an adjoint representation Y =
Y [ab]Γab is defined as Dµ̂Y = ∂µ̂Y + e(Aµ̂Y − Y Aµ̂),
where Y is a scalar multiplet. The action Stotal consists of two parts. One is the Einstein-
Hilbert action SE and the other SYMT is a Yang-Mills action with a term which is the fourth
power of the field strength F . Such a quartic term has been studied by Tchrakian [10] and
so we call it the Tchrakian term. The total action is:
Stotal = SE + SYMT , SE =
dvR ,
SYMT =
−F ∧ ∗F + α2(F ∧ F ) ∧ ∗(F ∧ F )− V0dv
. (1)
Here the 10-form dv is an invariant volume form with respect to the metric ĝ and is written
as dv =
−ĝd10X in a local patch. The scalar curvature is denoted by R. The asterisk “∗”
denotes the Hodge dual operator. This operator defines an inner product over differential
forms, and for a given form ω, ω∧∗ω is proportional to the invariant volume form dv.1 The
parameters of this action are the Newton’s gravitational constant G, the gauge coupling e,
the four-derivative coupling α and the cosmological constant V0.
We show the explicit form of the Yang-Mills part with components of A and F ,
SYMT = −
µ̂ν̂ F
µ̂ν̂,[ab] +
[abcd]
µ̂ν̂ρ̂σ̂ T
µ̂ν̂ρ̂σ̂,[ab][cd] +
3 · 16
Sµ̂ν̂ρ̂σ̂S
µ̂ν̂ρ̂σ̂ +
, (2)
dX µ̂ ∧ dX ν̂Γab , Sµ̂ν̂ρ̂σ̂ = F [ab]µ̂ν̂ F
, (3)
[ab][cd]
µ̂ν̂ρ̂σ̂
[abcd]
µ̂ν̂ρ̂σ̂
[ab][cd]
µ̂ν̂ρ̂σ̂
[ac][db]
µ̂ν̂ρ̂σ̂
[ad][bc]
µ̂ν̂ρ̂σ̂
[cd][ab]
µ̂ν̂ρ̂σ̂
[db][ac]
µ̂ν̂ρ̂σ̂
[bc][ad]
µ̂ν̂ρ̂σ̂
. (4)
The Euler-Lagrange equations from these actions read the usual Einstein equation and the
equations for the Yang-Mills fields:
Rµ̂ν̂ −
ĝµ̂ν̂R = 8πGTµ̂ν̂ , Dµ̂
−gF µ̂ν̂ − 2α2
−gF [µ̂ν̂F ρ̂σ̂]Fρ̂σ̂
= 0 . (5)
Here the energy-momentum tensor Tµ̂ν̂ is obtained by the variation of the Yang-Mills part
with respect to the metric:
Tµ̂ν̂ =
ρ̂,[ab]F
+ α2T̃
[abcd]
µ̂ρ̂σ̂τ̂
ρ̂σ̂τ̂ ,[ab][cd] +
3 · 2
Sµ̂ρ̂σ̂τ̂Sν̂
ρ̂σ̂τ̂ − 1
gµ̂ν̂χ
F µ̂ν̂,[ab] +
[abcd]
µ̂ν̂ρ̂σ̂
T µ̂ν̂ρ̂σ̂,[ab][cd] +
3 · 8
Sµ̂ν̂ρ̂σ̂S
µ̂ν̂ρ̂σ̂ + V0 . (6)
To solve these equations, we make an ansatz which is the same as that of Cremmer-Scherk.
Our ansatz for the metric is the following:
ds2 = ηµνdx
µdxν +
(1 + y2/4R20)
dyIdyJ = ĝµ̂ν̂dX
µ̂dX ν̂ , y2 =
(yI)2 , (7)
1 The Hodge dual operator acting on a differential form on a space with Minkowski signature satisfies the
following relation: (Fµνdx
µν) ∧ ∗(Fρσdxρσ) = −FµνFµνdv.
where the coordinates X are the total space-time coordinates. The metric ηµν = diag(− +
++) is the Lorentz metric on the four-dimensional Minkowski space. Greek indices without
a hat “̂ ”, for instance µ will refer to the first four variables. Capital indices (I, J, · · · ) run
from one to six and refer to the compact space. The six-dimensional space is taken as a
sphere with a radius R0. The Riemann tensor, Ricci tensor and scalar curvature are
RIJKL =
δIKgJL − δILgJK
, RIJ =
gIJ , R =
. (8)
The rest components of the curvature tensor vanish. In this space, the Einstein equations
in (5) reduce to simple equations,
= 8πGTµν , 0 = TµI , −
gIJ = 8πGTIJ . (9)
We now make ansatzes for the gauge fields. We assume that the fields A do not depend
on the four-dimensional directions, ∂µA = 0, and they have no four-dimensional components
Aµ = 0. This implies that the field strengths are two forms on the six-dimensional sphere:
A = AI(y)dy
I , F =
FIJdy
I ∧ dyJ . With these ansatzes, the four-dimensional part of the
energy-momentum tensor becomes −1
ηµνχ, and the equation reduces to 30/R
0 = 8πGχ.
This equation requires that the χ is a constant. Suppose that the field strength fulfils the
generalized self-dual condition
F = iβγ7 ∗6 (F ∧ F ), (10)
where β is a real parameter. Here “∗6” means the Hodge dual on the six-dimensional sphere.
Then the second part of the equations of motion (5) is fulfilled automatically by the relation
DF = 0, where the exterior covariant derivative is defined as DF = dF+e (A ∧ F − F ∧A).
In fact we have an explicit solution to the self-dual equation:
4eR20
yaebΓab , F =
4eR20
ea ∧ ebΓab , β =
. (11)
Here we identify the internal space index and the sphere index. The energy-momentum
tensor of this configuration becomes
, χ = (1 + ζ)
4e2R40
+ V0 , TIJ = −
(1− ζ)
4e2R40
gIJ . (12)
With these ansatzes we obtain algebraic equations from the Einstein equations:
= 8πG
(1 + ζ)
2e2R40
= 8πG
(1− ζ) 5
8e2R40
, (13)
From these we finally obtain
e2R20
(2 + 4ζ) , V0 =
4e2R40
(1 + 3ζ) . (14)
When the four-derivative coupling vanishes, α = 0 and therefore ζ = 0, these relations
reduce to those of the Cremmer and Scherk [5]. 2 When the relation α = β holds (ζ = 1)
our solution saturates the Bogomol’nyi bound and becomes a BPS state. The energy density
is given by an integral over S6 as follows:
YMT =
−F ∧ ∗6F + α2(F ∧ F ) ∧ ∗6(F ∧ F )
Tr (iF ∓ αγ7 ∗6 (F ∧ F )) ∧ ∗6 (iF ∓ αγ7 ∗6 (F ∧ F ))±
Trγ7F ∧ F ∧ F
≥ ± i
Trγ7F ∧ F ∧ F = ∓
ǫabcdefF
[ab] ∧ F [cd] ∧ F [ef ] ≡ ±Q , (15)
where the field strength F has only components along S6. If the coupling α is equal to β,
the solution of eq. (10) satisfies the Bogomol’nyi equation and the energy attains the local
minimum. We can also consider a system coupled with scalar fields. Suppose that scalar
fields Qm transform as a representation of SO(6). The index m labels the representation
space. Let us add an action SQ of the scalar fields Q with a Higgs potential
dvDµ̂Q
mDµ̂Qm + V (Q2)
, Dµ̂Q
m = ∂µ̂Q
R(Γab)mm′Q
m′ (16)
to Stotal. The equations of motion are modified. In general, our solution mentioned above
does not satisfy the modified equations any more. However, for the scalars which fulfil the
covariantly constant condition Dµ̂Q
m = 0 and attain the absolute minimum V (Q) = 0,
the configurations of A and g in equations (7), (11) are still solutions for the modified
equations. Here the constant value of the minimum is shifted to 0. Thus we can argue the
Higgs mechanism around our solutions.
Next we suppose that the four-dimensional part is an anti-de Sitter space AdS4 of radius
RA. Our ansatz for the metric is the following:
ds2 = ηµν(x)dx
µdxν + gIJ(y)dy
IdyJ = ĝµ̂ν̂dX
µ̂dX ν̂ , (17)
2 We need to redefine e the half when we compare to the result of [5].
gIJ(y)dy
IdyJ =
(1 + y2/4R20)
dyIdyJ y2 =
(yI)2 ,
ηµν(x)dx
µdxν =
cos2 θ
−dτ 2 + dθ2 + sin2 θdΩ2
, dΩ2 =
|dz|2
(1 + |z|2/4)2
, (18)
where z parametrizes a whole complex plane. The metric ηµν(x) is a maximally symmetric
metric on the four-dimensional anti-de Sitter space. The Riemann tensor and the Ricci
tensor are
Rµνρσ = −
δµρηνσ − δµσηνρ
, Rµν = −
ηµν ,
RIJKL =
δIKgJL − δILgJK
, RIJ =
gIJ . (19)
The total scalar curvature is obtained by a summation of those of two parts: R = −
In this space, the Einstein equations are
Rµν −
ηµνR = 8πGTµν , RIJ −
gIJR = 8πGTIJ . (20)
The ansatz for the gauge fields is the same as previous one and the energy momentum tensor
does not change. With these ansatzes, we obtain algebraic equations from the Einstein
equations as
= −4πG
(1 + ζ)
4e2R40
= −4πG
(1− ζ) 5
4e2R40
We are interested in a possible relation to string theory and therefore we consider the case
with the vanishing cosmological constant, V0 = 0. In this case, the radii (RA, R0) are
written by the couplings,
R20 = (5 + 7ζ)
, R2A =
5 + 7ζ
5 + 15ζ
R20. (22)
Thus the additional higher derivative coupling term of the Tchrakian type does not affect
critically to the equations of motion. When ζ = 1 our solution becomes a solution of the
Bogomol’nyi equation again.
Our solutions introduced in this letter are new solutions of the system with a Tchrakian
term. The origin of this term has not been clear so far but it seems rather universal in order
to construct solitons with codimensions higher than four: for instance it has played a crucial
role to construct a finite energy monopole (with codimension five) in a six-dimensional space-
time [13]. Though the parameter ζ(= α2/β2) is a free parameter, we expect that the system
goes to ζ = 1 because it becomes BPS. There are several discussions on the (in)stability of
higher-dimensional Yang-Mills theories [15]. To compute the mass spectra of the fluctuations
around our solutions is a future work. When the scalar fields Qm are non-trivially coupled,
the system may allow BPS composite solitons which are made of solitons with different
codimensions, as in the case of usual self-dual Yang-Mills equations coupled to Higgs fields
[16].
Finally our solution of AdS4 ×S6 may have a relation with D2-branes, and we hope that
there exists some impact on AdS/CFT duality [17].
Acknowledgments
We are grateful to D. H. Tchrakian for various comments. We would like to thank
Y. Hosotani, H. Itoyama, Y. Yasui, M. Sakaguchi, T. Oota, T. Kimura, S. Shimasaki and
E. Itou. We also thank M. Sheikh-Jabbari for an advice. This work is supported by the
21 COE program “Constitution of wide-angle mathematical basis focused on knots” from
Japan Ministry of Education.
[1] G. Nordström, Phys. Z. 15, 504 (1914) [arXiv:physics/0702221]; T. Kaluza, Sitzungsber.
Preuss. Akad. Wiss. Berlin (Math. Phys. ) 1921, 966 (1921); O. Klein, Z. Phys. 37, 895
(1926) [Surveys High Energ. Phys. 5, 241 (1986)].
[2] M. J. Duff, B. E. W. Nilsson and C. N. Pope, Phys. Rept. 130, 1 (1986).
[3] K. Dasgupta, G. Rajesh and S. Sethi, JHEP 9908, 023 (1999) [arXiv:hep-th/9908088].
[4] S. Kachru, R. Kallosh, A. Linde and S. P. Trivedi, Phys. Rev. D 68, 046005 (2003)
[arXiv:hep-th/0301240].
[5] E. Cremmer and J. Scherk, Nucl. Phys. B 108, 409 (1976); Nucl. Phys. B 118, 61 (1977).
[6] Z. Horvath, L. Palla, E. Cremmer and J. Scherk, Nucl. Phys. B 127, 57 (1977).
[7] R. Kerner and D. H. Tchrakian, Phys. Lett. B 215, 87 (1988).
[8] G. ’t Hooft, Nucl. Phys. B 79, 276 (1974); A. M. Polyakov, JETP Lett. 20, 194 (1974) [Pisma
http://arxiv.org/abs/physics/0702221
http://arxiv.org/abs/hep-th/9908088
http://arxiv.org/abs/hep-th/0301240
Zh. Eksp. Teor. Fiz. 20, 430 (1974)].
[9] A. A. Belavin, A. M. Polyakov, A. S. Schwarts and Yu. S. Tyupkin, Phys. Lett. B 59, 85
(1975).
[10] D. H. Tchrakian, J. Math. Phys. 21, 166 (1980).
[11] B. Grossman, T. W. Kephart and J. D. Stasheff, Commun. Math. Phys. 96, 431 (1984)
[Erratum-ibid. 100, 311 (1985)]; R. V. Buniy and T. W. Kephart, Phys. Lett. B 548 (2002)
97 [arXiv:hep-th/0210037].
[12] D. H. Tchrakian, Phys. Lett. B 150, 360 (1985); D. O’Se and D. H. Tchrakian, Lett. Math.
Phys. 13, 211 (1987); Z. Ma and D. H. Tchrakian, J. Math. Phys. 31, 1506 (1990).
[13] H. Kihara, Y. Hosotani and M. Nitta, Phys. Rev. D 71, 041701 (2005) [arXiv:hep-th/0408068];
E. Radu and D. H. Tchrakian, Phys. Rev. D 71, 125013 (2005) [arXiv:hep-th/0502025].
[14] H. Kihara and M. Nitta, arXiv:hep-th/0703166.
[15] S. Randjbar-Daemi, A. Salam and J. A. Strathdee, Phys. Lett. B 124, 345 (1983) [Erratum-
ibid. B 144, 455 (1984)]; O. DeWolfe, D. Z. Freedman, S. S. Gubser, G. T. Horowitz and
I. Mitra, Phys. Rev. D 65, 064033 (2002) [arXiv:hep-th/0105047]; A. E. Mosaffa, S. Randjbar-
Daemi and M. M. Sheikh-Jabbari, arXiv:hep-th/0612181.
[16] M. Eto, Y. Isozumi, M. Nitta, K. Ohashi and N. Sakai, J. Phys. A 39, R315 (2006)
[arXiv:hep-th/0602170].
[17] J. M. Maldacena, Adv. Theor. Math. Phys. 2, 231 (1998) [Int. J. Theor. Phys. 38,
1113 (1999)] [arXiv:hep-th/9711200]; E. Witten, Adv. Theor. Math. Phys. 2, 253 (1998)
[arXiv:hep-th/9802150]; O. Aharony, S. S. Gubser, J. M. Maldacena, H. Ooguri and Y. Oz,
Phys. Rept. 323, 183 (2000) [arXiv:hep-th/9905111].
http://arxiv.org/abs/hep-th/0210037
http://arxiv.org/abs/hep-th/0408068
http://arxiv.org/abs/hep-th/0502025
http://arxiv.org/abs/hep-th/0703166
http://arxiv.org/abs/hep-th/0105047
http://arxiv.org/abs/hep-th/0612181
http://arxiv.org/abs/hep-th/0602170
http://arxiv.org/abs/hep-th/9711200
http://arxiv.org/abs/hep-th/9802150
http://arxiv.org/abs/hep-th/9905111
	Acknowledgments
	References
ABSTRACT
  We construct a classical solution of an Einstein-Yang-Mills system with a
fourth order term with respect to the field strength of the Yang-Mills field.
The solution provides a spontaneous compactification proposed by Cremmer and
Scherk; ten-dimensional space-time with a cosmological constant is compactified
to the four-dimensional Minkowski space with a six-dimensional sphere S^6 on
which an instanton solution exists. The radius of the sphere is not a modulus
but is determined by the gauge coupling and the four-derivative coupling
constants and the Newton's constant. We also construct a solution of
ten-dimensional theory without a cosmological constant compactified to AdS_4 x
S^6.

<|endoftext|><|startoftext|>
APS/123-QED
Dimensional crossover of quantum critical behavior in CeCoIn5
J. G. Donath,1 P. Gegenwart,2 F. Steglich,1 E. D. Bauer,3 and J. L. Sarrao3
Max-Planck-Institute for Chemical Physics of Solids, D-01187 Dresden, Germany
I. Physik. Institut, Georg-August-Universität Göttingen, D-37077 Göttingen
Los Alamos National Laboratory, Los Alamos, NM 87545, USA
(Dated: November 11, 2018)
The nature of quantum criticality in CeCoIn5 is studied by low-temperature thermal expansion
α(T ). At the field-induced quantum critical point at H = 5 T a crossover scale T ⋆ ≈ 0.3 K is
observed, separating α(T )/T ∝ T−1 from a weaker T−1/2 divergence. We ascribe this change to a
crossover in the dimensionality of the critical fluctuations which may be coupled to a change from
unconventional to conventional quantum criticality. Disorder, whose effect on quantum criticality
is studied in CeCoIn5−xSnx (0 ≤ x ≤ 0.18), shifts T ⋆ towards higher temperatures.
PACS numbers: 71.10.Hf,71.27.+a,74.70.Tx
Quantum criticality in heavy fermion (HF) systems
continues to attract interest due to the occurrence of
highly anomalous metallic states with severe deviations
from Landau Fermi liquid (LFL) behavior [1, 2] and
the emergence of unconventional superconductivity in
close vicinity to antiferromagnetic (AF) quantum criti-
cal points (QCPs) [3]. Neither the nature of the non-
Fermi liquid (NFL) normal state related to quantum crit-
icality, nor the superconducting (SC) pairing mechanism
has been clarified up to now. It is thus of great inter-
est to investigate whether quantum criticality in these
systems can be described by conventional theory within
the framework of a spin-density-wave (SDW) instability
[4, 5], or whether unconventional scenarios in which the
f-electrons localize at the magnetic QCP due to a de-
struction of the Kondo resonance [6, 7, 8] may be more
appropriate. For the formation of the latter, magnetic
frustration leading to a reduced dimensionality of the
critical fluctuations may be crucial.
The CeMIn5 (M=Rh, Ir, Co) systems are prototyp-
ical as they display a generic phase diagram with un-
conventional HF superconductivity in close vicinity to
an AF QCP [9]. They crystallize in a tetragonal struc-
ture which can be viewed as an alternating series of
CeIn3 and MIn2 layers. As a result of the layered
crystal structure, the Fermi surface displays a strongly
two-dimensional (2D) character with cylindrical sheets
along the crystallographic c-axis [10]. Compared to cu-
bic CeIn3, a HF superconductor with Tc = 0.2 K [3]
in a very narrow pressure range close to the magnetic
quantum phase transition at pc ≈ 2.6GPa, SC transition
temperatures of about 2 K are observed over wide pres-
sure ranges for the tetragonal CeRhIn5 (at p ≥ 1.6 GPa)
and CeCoIn5 (at ambient pressure) [11, 12]. This Tc
enhancement has been attributed to the layered crystal
structure and, relatedly, strongly anisotropic magnetic
fluctuations [12]. Indeed the nuclear magnetic relaxation
rate 1/T1 of CeCoIn5 displays a weak T
1/4 dependence
in the normal state between 2 and 40 K which signals
strongly anisotropic quantum critical fluctuations [13].
��� ��� ���
� � � � � ��
:;<=>?@ABCDE
J KLM
TUVWXYZ
Figure 1: (Color online) Phase diagram of CeCoIn5 for H ‖
c as determined from thermal expansion. Superconducting
phase in gray with first-order boundary below 0.7 K indicated
by thick black line. Regions where thermal expansion follows
2D- and 3D quantum critical behavior are marked in blue and
yellow, respectively. The inset displays the evolution of the
crossover with Sn-doping in CeCoIn5−xSnx at the respective
Hc2(x).
The aim of this Letter is a detailed investigation of the
nature of quantum criticality in CeCoIn5. We focus in
particular to the region very close to the upper critical
field Hc2 for superconductivity (5 T for H ‖ c, cf. Figure
1) which previously has been studied by heat and charge
transport [14, 15, 16] and specific heat measurements
[17]. Diverging coefficients of the T 2 contributions to the
electrical and the thermal resistivity prove the existence
of a magnetic-field induced QCP at 5 T. NFL behavior
in the temperature dependence of the electronic specific
heat coefficient at 5 T has been described in the frame of
the SDW theory [17]. However, below 0.3 K a large nu-
clear contribution arising from the Zeeman splitting of In-
nuclear moments needs to be subtracted. Therefore, the
data do not allow to distinguish between a saturation or
http://arxiv.org/abs/0704.0506v2
logarithmic divergence at lowest temperatures and thus
further thermodynamic measurements are needed to de-
termine the nature of quantum criticality in the system
(transport data will be discussed later).
Thermal expansion is ideally suited for this purpose.
It probes the pressure dependence of the entropy which
close to QCPs is accumulated at finite temperatures.
Scaling arguments have revealed that thermal expansion
α(T ) is far more singular than specific heat C(T ) in the
approach of any pressure-sensitive QCP [18]. Within
the SDW theory the leading contribution to α(T )/T di-
verges like T−1/2 and T−1 for 3D and 2D AF QCPs,
respectively [18]. Both can easily be distinguished from
α(T )/T = const. expected for a LFL. Especially impor-
tant in this context, thermal expansion, in contrast to
specific heat, is not affected by nuclear hyperfine contri-
butions.
For our study, we have used high-quality single crys-
tals of CeCoIn5−xSnx grown from In flux, whose low-
temperature specific heat and electrical resistivity are
discussed in [17, 19, 20]. For details on the sample char-
acterization see [19, 20, 21]. Thermal expansion has been
determined with the aid of high-resolution dilatometers
at temperatures down to 0.04 K and in magnetic fields up
to 10 T. We have measured the length change ∆Lc along
the c-axis and determined the linear (c-axis) expansion
coefficient α = ∂ lnLc/∂T .
Figure 2 displays our thermal expansion data on un-
doped CeCoIn5. At the upper critical field of 5 T, the
thermal expansion coefficient α(T )/T (cf. inset a) grows
much stronger upon cooling than the respective specific
heat coefficient which diverges only logarithmically [17].
Over more than one decade in temperature, i.e. for
0.3 K ≤ T ≤ 6 K, the data follow a 1/T divergence
which hints at 2D AF quantum critical fluctuations [18].
The latter may result from the layered crystal structure
[12, 13]. At T ⋆ ≈ 0.3 K, the temperature dependence
changes to α ∝ T 1/2 (see main part and inset b, which
also displays data obtained on a second sample down
to 40 mK). Note, that α/T does not show a saturation
excluding the formation of a LFL above the lowest mea-
sured temperature. The square-root behavior for α(T ) is
compatible with a 3D AF QCP of itinerant nature [18]
as observed for CeNi2Ge2 [2] and CeIn3−xSnx [22].
As H is increased above 5 T, T ⋆ increases and α(T )
becomes less singular, i.e. the coefficient of the square-
root contribution decreases (cf. Fig.2, inset b). This
suggests that the system is tuned away from the QCP,
compatible with previous studies [14, 17], although LFL
behavior is not yet fully established in thermal expansion.
Our data on CeCoIn5 are summarized in the main part
of Fig. 1. We have observed a crossover scale separating
2D from 3D quantum critical behavior. To provide fur-
ther evidence for this crossover and to investigate how
it is influenced by weak disorder, we now focus on the
series CeCoIn5−xSnx where the Sn-atoms preferentially
0 1 2 3 4 5 6
0.1 1 7
0.0 0.5 1.0 1.5
 T (K)
 1/2 CeCoIn
5T // c
  T (K)
α/T (10
   H (T)
 5 (S2)
 8  (S2)
 10 (S2)
 T (K)
Figure 2: (Color online) Temperature dependence of the lin-
ear thermal expansion coefficient of CeCoIn5 at H = 5 T
(‖ c). Dotted line and arrow indicate α ∝
T and crossover
temperature T ⋆, defined as upper limit for this T -dependence,
respectively. Inset (a) displays data from main part as α/T
vs T (on logarithmic scale). Solid line indicates T−1 depen-
dence. Inset (b) compares data from main part in the low-
temperature regime with 5, 8 and 10T data obtained from a
second sample (S2). Lines display square-root behavior.
x Tc (K) Hc2 (T)
0.00 (2.25 ± 0.05) (4.9± 0.1)
0.03 (1.80 ± 0.05) (4.5± 0.1)
0.06 (1.50 ± 0.05) (3.9± 0.1)
0.09 (1.15 ± 0.05) (3.4± 0.1)
0.12 (0.75 ± 0.05) (2.5± 0.1)
0.18 0 0
Table I: Values for the SC transition temperature Tc and up-
per critical magnetic field Hc2 for CeCoIn5−xSnx [19].
occupy the In-1 position within in the tetragonal plane
[21]. Sn doping weakens superconductivity, leading to a
linear suppression of Tc towards zero for x = 0.18 [19].
The temperature-magnetic field phase diagram of var-
ious CeCoIn5−xSnx single crystals has previously been
studied by low-temperature electrical resistivity and spe-
cific heat measurements [19, 20]. As Tc is reduced, a
corresponding reduction of Hc2 is observed (for the x-
dependence of Tc and Hc2, see Table I). For all differ-
ent Sn concentrations the temperature dependence of the
specific heat displays NFL behavior at the respective up-
per critical field and the formation of a LFL state at
fields exceeding Hc2(x) [19]. This suggests that field-
induced quantum criticality is always pinned at the upper
critical field Hc2 when the latter is reduced by Sn dop-
ing. Furthermore, the low-T specific heat coefficient at
0.1 1 6
            x
 0.00
 0.03
 0.06
 0.09
 0.12
 0.18
T (K)
CeCoIn
Figure 3: Linear thermal expansion coefficient as α vs. T on
a logarithmic scale for CeCoIn5−xInx at H ≃ Hc2(x) (given
in Table I). Note that data sets are shifted by 5× 10−6 K−2,
subsequently. Arrows indicate lower limit of α/T ∝ T−1 be-
havior.
H = Hc2(x) remains unchanged within the scatter of the
data for 0 ≤ x ≤ 0.12 [19]. On the other hand, the resid-
ual resistivity ρ0 shows a tenfold increase for x ranging
from 0 to 0.18, indicating the effect of disorder scatter-
ing due to the random distribution of Sn-atoms on the
in-plane In site [20]. The study of CeCoIn5−xSnx thus
allows to systematically investigate the disorder depen-
dence of NFL behavior without tuning the system away
from the QCP.
Figure 3 shows c-axis thermal expansion data for the
various studied CeCoIn5−xSnx single crystals at their re-
spective upper critical magnetic fields (for the zero-field
data see [23]). In all these samples, 2D-like quantum crit-
ical behavior α(T )/T ∝ T−1 is found from 6 K down to a
lower bound which increases from about 0.3 K for x = 0
to about 1.4 K for x = 0.18.
Like for x = 0, the low-T thermal expansion of all
samples studied is well described by α(T ) ∝
T . For
x = 0.18, this temperature dependence holds up to
T ⋆ ≈ 1.4 K, i.e. over more than one decade (see Fig. 4),
providing clear evidence for 3D AF quantum critical fluc-
tuations in the latter system. Clearly, the temperature
0 1 2 3 4
T (K)
CeCoIn
         x
 0.18
Figure 4: Linear thermal expansion coefficient α(T ) of
CeCoIn5 at H = 5 T (‖ c, open squares), as well as
CeCoIn4.82Sn0.18 at H = 0 (open circles). Dotted lines and
arrows indicate α ∝
T and crossover temperatures T ⋆, re-
spectively.
at which the dimensional crossover occurs is shifted with
Sn doping in CeCoIn5−xSnx to values above 1 K, cf. the
inset of Fig. 1. As stated above, the partial substitution
of the In-(1) site by Sn-atoms enhances impurity scat-
tering without tuning the system away from the QCP.
Our observation of a shift of the crossover scale T ⋆ with
x is then naturally attributed to the effect of isotropic
impurity scattering, which ”smears out” the magnetic
anisotropy. Crossovers have also been observed in the
electrical and heat transport [15, 16] as well as in the
Hall coefficient [24] for the current direction j ⊥ c. How-
ever, transport experiments are influenced by electronic
relaxational properties, which can give rise to compli-
cated behavior for anisotropic and multiband systems
like CeCoIn5. Indeed, for j ‖ c no crossover is visible
in ρ(T ), and the Wiedemann-Franz law, which is obeyed
for j ⊥ c, seems to be violated [16].
In order to clearly show that, at the QCP in
CeCoIn5−xSnx, a finite energy scale kBT
⋆ exists which
marks the crossover from 2D to 3D quantum critical
behavior, measurements either of the fluctuation spec-
trum in equilibrium, for example by inelastic neutron
scattering (INS), or of thermodynamic properties are re-
quired. q-scans of the INS over wide regions in recip-
rocal space, required to decide on the dimensionality of
the quantum critical fluctuations, are not possible at high
fields. Therefore, our thermodynamic measurements pro-
vide the only way to investigate this question and indeed
prove such a crossover.
We now address the nature of quantum criticality
(SDW-type or unconventional) in the regime where 2D-
like behavior is observed. Theory suggests that 2D
fluctuations are necessary for the occurrence of locally-
[\] ^ _
������
������
  − �� 
  − ¢£¤¥
ª «¬­
Figure 5: (Color online) Critical Grüneisen ratio Γcr =
Vm/κT × αcr/Ccr, where αcr and Ccr denote thermal expan-
sion and specific heat after subtraction of non-critical back-
ground contributions [18], of YbRh2(Si0.95Ge0.05)2 (left axis,
[2]) and CeCoIn5−xInx (x = 0 at H = 5 T and x = 0.18
at 0 T, right axis). For the latter, the molar volume and
isothermal compressibility equal Vm = 9.57 · 10−5 m3/mol
and κT = (3.43 ± 0.16) × 10−3 GPa−1 [25], respectively.
A small background term [0.35 × 10−6 K−2] has been sub-
tracted from α/T for x = 0.12. No specific-heat background
contributions have been subtracted since C(T )/T ∝ log T at
T > T ⋆ (T ⋆ indicated by dotted arrows). The so-derived crit-
ical Grüneisen ratio is invalid for T < T ⋆, where the specific
heat is dominated by a non-critical contribution [26]. Lines
indicate power-law behavior at T > T ⋆.
critical quantum criticality [7]. The latter is well estab-
lished for the magnetic-field tuned AF QCP in the heavy
fermion system YbRh2Si2 and its slightly Ge-doped vari-
ant YbRh2(Si0.95Ge0.05)2, for which the critical field is
almost zero [2]. It is therefore very interesting to com-
pare the low-T thermodynamics of CeCoIn5−xSnx with
the latter system. Of particular importance is the tem-
perature dependence of the critical Grüneisen ratio Γcr,
i.e. the ratio of the critical components of thermal ex-
pansion to specific heat. It has previously been shown,
that Γcr(T ) ∝ T−ǫ with ǫ = 1 and 2/3 for conventional
and unconventional quantum criticality, respectively [2].
Figure 5 shows striking similarities in the temper-
ature dependence of the critical Grüneisen ratio of
YbRh2(Si0.95Ge0.05)2 and CeCoIn5−xSnx at tempera-
tures above T ⋆(x), i.e. in the 2D regime: A rather simi-
lar fractional Grüneisen exponent is found which is close
to the prediction for the locally-critical QCP scenario in
the presence of xy anisotropy [7]. Theoretically, it has
been shown that such behavior requires quasi-2D quan-
tum critical fluctuations [7] supporting further the lat-
ter at T > T ⋆ in CeCoIn5. In view of the lack of su-
perconductivity in YbRh2Si2 (at least for T > 10 mK),
it is highly desirable to check, whether or not a similar
crossover towards conventional behavior at temperatures
below the lower limit of previous studies (20 mK) takes
place in the latter material.
To summarize, we have found thermodynamic evidence
for a finite crossover scale T ⋆ at the magnetic-field tuned
QCP in CeCoIn5. We associate T
⋆ with a dimensional
crossover from 2D (T > T ⋆) to 3D (T < T ⋆) quantum
critical behavior. The introduction of disorder shifts the
crossover scale towards higher temperatures.
Stimulating discussions with M. Nicklas, Q. Si and S.
Wirth are gratefully acknowledged. Work at Dresden and
Göttingen was partially financed by the DFG Research
unit 960 (Quantum phase transitions), while work at Los
Alamos was carried out under the auspices of the U.S.
[1] G.R. Stewart, Rev. Mod. Phys. 73, 797-855 (2001), 78,
743-753 (2006).
[2] P. Gegenwart, Q. Si, F. Steglich, arXiv:0712.2045v2 and
refs. therein.
[3] N.D. Mathur et al., Nature 394, 39 (1998).
[4] A.J. Millis, Phys. Rev. B 48, 7183 (1993).
[5] T. Moriya and T. Takimoto, J. Phys. Soc. Jpn. 64, 90
(1995), and refs. therein.
[6] P. Coleman et al., J. Phys. Cond. Matt. 13 R723 (2001).
[7] Q. Si et al., Nature 413, 804 (2001).
[8] T. Senthil, M. Vojta, S. Sachdev, Phys. Rev. B 69,
035111 (2004).
[9] J.D. Thompson, et al., Physica B 329, 446 (2003) and
refs. therein.
[10] R. Settai et al., J. Phys. Condens. Matter 13 L627 (2001).
[11] H. Hegger et al., Phys. Rev. Lett. 84, 4986 (2000).
[12] C. Petrovic et al., J. Phys. Condens. Matter 13, L337
(2001).
[13] Y. Kawasaki et al., J. Phys. Soc. Jpn. 72, 2308 (2003).
[14] J. Paglione et al., Phys. Rev. Lett. 91, 246405 (2003).
[15] J. Paglione et al., Phys. Rev. Lett. 97, 106606 (2006).
[16] M.A. Tanatar, J. Paglione, C. Petrovich, L. Taillefer, Sci-
ence 316, 1320 (2007).
[17] A. Bianchi, R. Movshovich, I. Vekhter, P.G. Pagliuso,
J.L. Sarrao, Phys. Rev. Lett. 91, 257001 (2003).
[18] L. Zhu, M. Garst, A. Rosch, Q. Si, Phys. Rev. Lett. 91,
066404 (2003).
[19] E.D. Bauer et al., Phys. Rev. Lett. 94, 047001 (2005).
[20] E.D. Bauer et al., Phys. Rev. B 73, 245109 (2006).
[21] M. Daniel et al., Phys. Rev. Lett. 95, 016406 (2005).
[22] R. Küchler et al., Phys. Rev. Lett. 96, 256403 (2006).
[23] J.G. Donath, et al., Physica B 378-380, 98 (2006).
[24] S. Singh et al., Phys. Rev. Lett. 98, 057001 (2007).
[25] R.S. Kumar, A.L. Cornelius, J.L. Sarrao, Phys. Rev. B
70, 214526 (2004).
[26] For T < T ⋆, specific heat follows the predictions of the
3D SDW scenario [17], i.e. C(T )/T = γ′ − β′
T . Since
αcr ∝
T , Γcr ∝ T−1 (as expected [18]).
http://arxiv.org/abs/0712.2045
ABSTRACT
  The nature of quantum criticality in CeCoIn$_5$ is studied by low-temperature
thermal expansion $\alpha(T)$. At the field-induced quantum critical point at
H=5 T a crossover scale $T^\star\approx 0.3$ K is observed, separating
$\alpha(T)/T\propto T^{-1}$ from a weaker $T^{-1/2}$ divergence. We ascribe
this change to a crossover in the dimensionality of the critical fluctuations
which may be coupled to a change from unconventional to conventional quantum
criticality. Disorder, whose effect on quantum criticality is studied in
CeCoIn$_{5-x}$Sn$_x$ ($0\leq x\leq 0.18$), shifts $T^\star$ towards higher
temperatures.

<|endoftext|><|startoftext|>
arXiv:0704.0507v2  [hep-th]  9 Oct 2007
Imperial/TP/2007/mjd/1
CERN-PH-TH/2007-78
UCLA/07/TEP/7
E6 and the bipartite entanglement of three qutrits
M. J. Duff 1† and S. Ferrara2‡
† The Blackett Laboratory, Imperial College London,
Prince Consort Road, London SW7 2AZ
‡Physics Department,Theory Unit, CERN, CH1211, Geneva23, Switzerland
Department of Physics & Astronomy, Universuty of California,Los Angeles, USA
INFN-Laboratori Nazionale di Frascati, Via E. Fermi 40, 00044 Frascati, Italy
ABSTRACT
Recent investigations have established an analogy between the entropy of four-dimensional
supersymmetric black holes in string theory and entanglement in quantum information the-
ory. Examples include: (1) N = 2 STU black holes and the tripartite entanglement of
three qubits (2-state systems), where the common symmetry is [SL(2)]3 and (2) N = 8
black holes and the tripartite entanglement of seven qubits where the common symmetry
is E7 ⊃ [SL(2)]
7. Here we present another example: N = 8 black holes (or black strings)
in five dimensions and the bipartite entanglement of three qutrits (3-state systems), where
the common symmetry is E6 ⊃ [SL(3)]
3. Both the black hole (or black string) entropy and
the entanglement measure are provided by the Cartan cubic E6 invariant. Similar analogies
exist for “magic” N = 2 supergravity black holes in both four and five dimensions.
1m.duff@imperial.ac.uk
2Sergio.Ferrara@cern.ch
http://arxiv.org/abs/0704.0507v2
Contents
1 D = 4 black holes and qubits 3
1.1 N = 2 black holes and the tripartite entanglement of three qubits . . . . . . 3
1.2 N = 2 black holes and the bipartite entanglement of two qubits . . . . . . . 4
1.3 N = 8 black holes and the tripartite entanglement of seven qubits . . . . . . 4
1.4 Magic supergravities in D = 4 . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Five-dimensional supergravity 8
3 D = 5 black holes and qutrits 10
3.1 N = 2 black holes and the bipartite entanglement of two qutrits . . . . . . . 10
3.2 N = 8 black holes and the bipartite entanglement of three qutrits . . . . . . 11
3.3 Magic supergravities in D = 5 . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4 Conclusions 13
5 Acknowledgements 14
1 D = 4 black holes and qubits
It sometimes happens that two very different areas of theoretical physics share the same
mathematics. This may eventually lead to the realisation that they are, in fact, dual de-
scriptions of the same physical phenomena, or it may not. Either way, it frequently leads
to new insights in both areas. Recent papers [1, 2, 3, 4, 5, 6] have established an analogy
between the entropy of certain four-dimensional supersymmetric black holes in string the-
ory and entanglement measures in quantum information theory. In this paper we extend
the analogy from four dimensions to five which also involves going from two-state systems
(qubits) to three-state systems (qutrits).
We begin by recalling the four-dimensional examples:
1.1 N = 2 black holes and the tripartite entanglement of three
qubits
The three qubit system (Alice, Bob, Charlie) is described by the state
|Ψ〉 = aABC |ABC〉 (1.1)
where A = 0, 1, so the Hilbert space has dimension 23 = 8. The complex numbers aABC
transforms as a (2, 2, 2) under SL(2, C)A×SL(2, C)B ×SL(2, C)C . The tripartite entangle-
ment is measured by the 3-tangle [7, 8]
τ3(ABC) = 4|Det aABC |. (1.2)
where Det aABC is Cayley’s hyperdeterminant [9].
Det a = −
ǫA1A2ǫB1B2ǫA3A4ǫB3B4ǫC1C4ǫC2C3aA1B1C1aA2B2C2aA3B3C3aA4B4C4 (1.3)
The hyperdeterminant is invariant under SL(2)A × SL(2)B × SL(2)C and under a triality
that interchanges A, B and C.
In the context of stringy black holes the 8 aABC are the 4 electric and 4 magnetic charges
of the N = 2 STU black hole [10] and hence take on real (integer) values. The STU model
corresponds to N = 2 supergravity coupled to three vector multiplets, where the symmetry
is [SL(2, Z)]3. The Bekenstein-Hawking entropy of the black hole, S, was first calculated in
[11]. The connection to quantum information theory arises by noting [1] that it can also be
expressed in terms of Cayley’s hyperdeterminant
S = π
|Det aABC |. (1.4)
One can then establish a dictionary between the classification of various entangled states
(separable A-B-C; bipartite entangled A-BC, B-CA, C-AB; tripartite entangled W; tripartite
entangled GHZ) and the classfication of various “small” and “large” BPS and non-BPS black
holes [1, 2, 3, 4, 5, 6]. For example, the GHZ state [12]
|Ψ〉 ∼ |111〉+ |000〉 (1.5)
with Det aABC ≥ 0 corresponds to a large non-BPS 2-charge black hole; the W-state
|Ψ〉 ∼ |100〉+ |010〉+ |001〉 (1.6)
with Det aABC = 0 corresponds to a small-BPS 3-charge black hole; the GHZ-state
|Ψ〉 = −|000〉+ |011〉+ |101〉+ |110〉 (1.7)
corresponds to a large BPS 4-charge black hole.
1.2 N = 2 black holes and the bipartite entanglement of two qubits
An even simpler example [2] is provided by the two qubit system (Alice and Bob) described
by the state
|Ψ〉 = aAB|AB〉 (1.8)
where A = 0, 1, and the Hilbert space has dimension 22 = 4. The aAB transforms as a (2, 2)
under SL(2, C)A × SL(2, C)B. The entanglement is measured by the 2-tangle
τ2(AB) = C
2(AB) (1.9)
where
C(AB) = 2 |det aAB| (1.10)
is the concurrence. The determinant is invariant under SL(2, C)A × SL(2, C)B and under
a duality that interchanges A and B. Here it is sufficient to look at N = 2 supergravity
coupled to just one vector multiplet and the 4 aAB are the 2 electric and 2 magnetic charges
of the axion-dilaton black hole with entropy
S = π|det aAB| (1.11)
For example, the Bell state
|Ψ〉 ∼ |11〉+ |00〉 (1.12)
with det aAB ≥ 0 corresponds to a large non-BPS 2-charge black hole.
1.3 N = 8 black holes and the tripartite entanglement of seven
qubits
We recall that in the case of D = 4, N = 8 supergravity, the the 28 electric and 28 magnetic
charges belong to the 56 of E7(7). The black hole entropy is [15, 18]
S = π
|J4| (1.13)
where J4 is Cartan’s quartic E7 invariant [13, 14]. It may be written
J4 = P
ijQjkP
klQli −
P ijQijP
klQkl
ǫijklmnopQijQklQmnQop + ǫijklmnop P
ijP klPmnP op
. (1.14)
where P ij and Qjk are 8× 8 antisymmetric matrices.
The qubit interpretation [4] relies on the decomposition
E7(C) ⊃ [SL(2, C)]
7 (1.15)
under which
(2, 2, 1, 2, 1, 1, 1)
+(1, 2, 2, 1, 2, 1, 1)
+(1, 1, 2, 2, 1, 2, 1)
+(1, 1, 1, 2, 2, 1, 2)
+(2, 1, 1, 1, 2, 2, 1)
+(1, 2, 1, 1, 1, 2, 2)
+(2, 1, 2, 1, 1, 1, 2) (1.16)
suggesting the tripartite entanglement of seven qubits (Alice, Bob, Charlie, Daisy, Emma,
Fred and George) described by the state.
|Ψ〉 =
aABD|ABD〉
+bBCE |BCE〉
+cCDF |CDF 〉
+dDEG|DEG〉
+eEFA|EFA〉
+fFGB|FGB〉
+gGAC |GAC〉 (1.17)
where A = 0, 1, so the Hilbert space has dimension 7.23 = 56. The a, b, c, d, e, f, g transform
as a 56 of E7(C). The entanglement may be represented by a heptagon where the vertices
A,B,C,D,E,F,G represent the seven qubits and the seven triangles ABD, BCE, CDF, DEG,
EFA, FGB, GAC represent the tripartite entanglement. See Figure 1. Alternatively, we can
use the Fano plane. See Figure 2. The Fano plane also corresponds to the multiplication
table of the octonions3
The measure of the tripartite entanglement of the seven qubits is provided by the 3-tangle
τ3(ABCDEFG) = 4|J4| (1.18)
J4 ∼ a
4 + b4 + c4 + d4 + e4 + f 4 + g4+
3Not the “split” octonions as was incorrectly stated in the published version of [4].
Figure 1: The E7 entanglement diagram. Each of the seven vertices A,B,C,D,E,F,G rep-
resents a qubit and each of the seven triangles ABD, BCE, CDF, DEG, EFA, FGB, GAC
describes a tripartite entanglement.
2[a2b2 + b2c2 + c2d2 + d2e2 + e2f 2 + f 2g2 + g2a2+
a2c2 + b2d2 + c2e2 + d2f 2 + e2g2 + f 2a2 + g2b2+
a2d2 + b2e2 + c2f 2 + d2g2 + e2a2 + f 2b2 + g2c2]
+8[bcdf + cdeg + defa+ efgb+ fgac+ gabd+ abce] (1.19)
where products like
a4 = (ABD)(ABD)(ABD)(ABD)
= ǫA1A2ǫB1B2ǫD1D4ǫA3A4ǫB3B4ǫD2D3aA1B1D1aA2B2D2aA3B3D3aA4B4D4 (1.20)
exclude four individuals (here Charlie, Emma, Fred and George), products like
a2b2 = (ABD)(ABD)(FGB)(FGB)
= ǫA1A2ǫB1B3ǫD1D2ǫF3F4ǫG3G4ǫB2B4aA1B1D1aA2B2D2bF3G3B3bF4G4B4 (1.21)
exclude two individuals (here Charlie and Emma), and products like
abce = (ABD)(BCE)(CDF )(EFA)
= ǫA1A4ǫB1B2ǫC2C3ǫD1D3ǫE2E4ǫF3F4aA1B1D1bB2C2E2cC3D3F3eE4F4A4 (1.22)
exclude one individual (here George)4.
Once again large non-BPS, small BPS and large BPS black holes correspond to states
with J4 > 0, J4 = 0 and J4 < 0, respectively.
4This corrects the corresponding equation in the published version of [4] which had the wrong index
contraction.
Figure 2: The Fano plane has seven points, representing the seven qubits, and seven lines
(the circle counts as a line) with three points on every line, representing the tripartite
entanglement, and three lines through every point.
1.4 Magic supergravities in D = 4
The black holes described by Cayley’s hyperdeterminant are those of N = 2 supergravity
coupled to three vector multiplets, where the symmetry is [SL(2, Z)]3. In [4] the following
four-dimensional generalizations were considered:
1) N = 2 supergravity coupled to l vector multiplets where the symmetry is SL(2, Z)×
SO(l − 1, 2, Z) and the black holes carry charges belonging to the (2, l + 1) representation
(l + 1 electric plus l + 1 magnetic).
2) N = 4 supergravity coupled to m vector multiplets where the symmetry is SL(2, Z)×
SO(6, m, Z) where the black holes carry charges belonging to the (2, 6 +m) representation
(m+ 6 electric plus m+ 6 magnetic).
3) N = 8 supergravity where the symmetry is the non-compact exceptional group
E7(7)(Z) and the black holes carry charges belonging to the fundamental 56-dimensional
representation (28 electric plus 28 magnetic).
In all three case there exist quartic invariants akin to Cayley’s hyperdeterminant whose
square root yields the corresponding black hole entropy. In [4] we succeeded in giving a
quantum theoretic interpretation in the N = 8 case together with its truncations to N = 4
(with m = 6) and N = 2 (with l = 3, the case we already knew [1]).
However, as suggested by Levay [5], one might also consider the “magic” supergravi-
ties [22, 23, 24]. These correspond to the R, C, H, O (real, complex, quaternionic and
octonionic) N = 2, D = 4 supergravity coupled to 6, 9, 15 and 27 vector multiplets with
symmetries Sp(6, Z), SU(3, 3), SO∗(12) and E7(−25), respectively. Once again, as has been
shown just recently [20], in all cases there are quartic invariants whose square root yields the
corresponding black hole entropy.
Here we demonstrate that the black-hole/qubit correspondence does indeed continue to
hold for magic supergravities. The crucial observation is that, although the black hole
charges aABC are real (integer) numbers and the entropy (1.13) is invariant under E7(7)(Z),
the coefficients aABC that appear in the qutrit state (1.17) are complex. So the three tangle
(1.18) is invariant under E7(C) which contains both E7(7)(Z) and E7(−25)(Z) as subgroups.
To find a supergravity correspondence therefore, we could equally well have chosen the magic
octonionic N = 2 supergravity rather than the conventional N = 8 supergravity. The fact
E7(7)(Z) ⊃ [SL(2)(Z)]
7 (1.23)
E7(−25)(Z) 6⊃ [SL(2)(Z)]
7 (1.24)
is irrelevant. All that matters is that
E7(C) ⊃ [SL(2)(C)]
7 (1.25)
The same argument holds for the magic real, complex and quaternionic N = 2 supergravities
which are, in any case truncations of N = 8 (in contrast to the octonionic) .
Having made this observation, one may then revisit the conventional N = 2 and N = 4
cases (1) and (2) above. When we looked at the seven qubit subsector E7(C) ⊃ SL(2, C)×
SO(12, C), we gave an N = 4 supergravity interpretationwith symmetry SL(2, R)×SO(6, 6)
[4], but we could equally have given an interpretation in terms of N = 2 supergravity coupled
to 11 vector multiplets with symmetry SL(2, R)× SO(10, 2).
Moreover, SO(l−1, 2) is contained in SO(l+1, C) and SO(6, m) is contained in SO(12+
m,C) so we can give a qubit interpretation to more vector multiplets for both N = 2 and
N = 4, at least in the case of SO(4n, C) which contains [SL(2, C)]2n.
2 Five-dimensional supergravity
In five dimensions we might consider:
1)N = 2 supergravity coupled to l+1 vector multiplets where the symmetry is SO(1, 1, Z)×
SO(l, 1, Z) and the black holes carry charges belonging to the (l+1) representation (all elec-
tric) .
2)N = 4 supergravity coupled tom vector multiplets where the symmetry is SO(1, 1, Z)×
SO(m, 5, Z) where the black holes carry charges belonging to the (m+5) representation (all
electric).
3) N = 8 supergravity where the symmetry is the non-compact exceptional group
E6(6)(Z) and the black holes carry charges belonging to the fundamental 27-dimensional
representation (all electric).
The electrically charged objects are point-like and the magnetic duals are one-dimensional,
or string-like, transforming according to the contrgredient representation. In all three cases
above there exist cubic invariants akin to the determinant which yield the corresponding
black hole or black string entropy.
In this section we briefly describe the salient properties of maximal N = 8 case, following
[16]. We have 27 abelian gauge fields which transform in the fundamental representation of
E6(6). The first invariant of E6(6) is the cubic invariant [13, 17, 16, 18, 19]
J3 = qijΩ
jlqlmΩ
mnqnpΩ
pi (2.1)
where qij is the charge vector transforming as a 27 which can be represented as traceless
Sp(8) matrix. The entropy of a black hole with charges qij is then given by
S = π
|J3| (2.2)
We will see that a configuration with J3 6= 0 preserves 1/8 of the supersymmetries. If J3 = 0
and ∂J3
6= 0 then it preserves 1/4 of the supersymmetries, and finally if ∂J3
= 0 (and the
charge vector qi is non-zero), the configuration preserves 1/2 of the supersymmetries. We
will show this by choosing a particular basis for the charges, the general result following by
U-duality.
In five dimensions the compact group H is USp(8). We choose our conventions so that
USp(2) = SU(2). In the commutator of the supersymmetry generators we have a central
charge matrix Zab which can be brought to a normal form by a USp(8) transformation. In
the normal form the central charge matrix can be written as
eab =
s1 + s2 − s3 0 0 0
0 s1 + s3 − s2 0 0
0 0 s2 + s3 − s1 0
0 0 0 −(s1 + s2 + s3)
(2.3)
we can order si so that s1 ≥ s2 ≥ |s3|. The cubic invariant, in this basis, becomes
J3 = s1s2s3 (2.4)
Even though the eigenvalues si might depend on the moduli, the invariant (2.4) only depends
on the quantized values of the charges. We can write a generic charge configuration as UeU t,
where e is the normal frame as above, and the invariant will then be (2.4). There are three
distinct possibilities
J3 6= 0 s1, s2, s3 6= 0
J3 = 0,
6= 0 s1, s2 6= 0, s3 = 0
J3 = 0,
= 0 s1 6= 0, s2, s3 = 0 (2.5)
Taking the case of type II on T 5 we can choose the rotation in such a way that, for example,
s1 corresponds to solitonic five-brane charge, s2 to fundamental string winding charge along
some direction and s3 to Kaluza-Klein momentum along the same direction. We can see that
in this specific example the three possibilities in (2.5) break 1/8, 1/4 and 1/2 supersymme-
tries. The respective orbits are
E6(6)
F4(4)
E6(6)
SO(5, 4)×T16
E6(6)
SO(5, 5)×T16
(2.6)
This also shows that one can generically choose a basis for the charges so that all others
are related by U-duality. The basis chosen here is the S-dual of the D-brane basis usually
chosen for describing black holes in type II B on T 5 . All others are related by U-duality to
this particular choice. Note that, in contrast to the four-dimensional case where flipping the
sign of J4 (1.14) interchanges BPS and non-BPS black holes, the sign of the J3 (2.4) is not
important since it changes under a CPT transformation. There is no non-BPS orbit in five
dimensions.
In five dimensions there are also string-like configurations which are the magnetic duals of
the configurations considered here. They transform in the contragredient 27′ representation
and the solutions preserving 1/2, 1/4, 1/8 supersymmetries are characterized in an analogous
way. We could also have configurations where we have both point-like and string-like ch the
point-like charge is uniformly distributed along the string, it is more natural to consider this
configuration as a point-like object in D = 4 by dimensional reduction.
It is useful to decompose the U-duality group into the T-duality group and the S-duality
group. The decomposition reads E6 → SO(5, 5)× SO(1, 1), leading to
27 → 161 + 10−2 + 14 (2.7)
The last term in (2.7) corresponds to the NS five-brane charge. The 16 correspond to the
D-brane charges and the 10 correspond to the 5 directions of KK momentum and the 5
directions of fundamental string winding, which are the charges that explicitly appear in
string perturbation theory. The cubic invariant has the decomposition
(27)3 → 10−2 10−2 14 + 161 161 10−2 (2.8)
This is saying that in order to have a non-zero area black hole we must have three NS charges
(more precisely some “perturbative” charges and a solitonic five-brane); or we can have two
D-brane charges and one NS charge. In particular, it is not possible to have a black hole
with a non-zero horizon area with purely D-brane charges.
Notice that the non-compact nature of the groups is crucial in this classification.
3 D = 5 black holes and qutrits
So far, all the quantum information analogies involve four-dimensional black holes and qubits.
In order to find an analogy with five-dimensional black holes we invoke three state systems
called qutrits.
3.1 N = 2 black holes and the bipartite entanglement of two
qutrits
The two qutrit system (Alice and Bob) is described by the state
|Ψ〉 = aAB|AB〉
where A = 0, 1, 2, so the Hilbert space has dimension 32 = 9. The aAB transforms as a (3, 3)
under SL(3)A × SL(3)B. The bipartite entanglement is measured by the concurrence [21]
C(AB) = 33/2|det aAB|. (3.1)
The determinant is invariant under SL(3, C)A × SL(3, C)B and under a duality that inter-
changes A and B.
The black hole interpretation is provided by N = 2 supergravity coupled to 8 vector
multiplets with symmetry SL(3, C) where the black hole charges transform as a 9. The
entropy is given by
S = π|det aAB| (3.2)
3.2 N = 8 black holes and the bipartite entanglement of three
qutrits
As we have seen in section (2) in the case of D = 5, N = 8 supergravity, the black hole
charges belong to the 27 of E6(6) and the entropy is given by (2.2).
The qutrit interpretation now relies on the decomposition
E6(C) ⊃ SL(3, C)A × SL(3, C)B × SL(3, C)C (3.3)
under which
27 → (3, 3, 1) + (3′, 1, 3) + (1, 3′, 3′) (3.4)
suggesting the bipartite entanglement of three qutrits (Alice, Bob, Charlie). However, the
larger symmetry requires that they undergo at most bipartite entanglement of a very specific
kind, where each person has bipartite entanglement with the other two:
|Ψ〉 = aAB|AB〉+ b
C |BC〉+ c
CA|CA〉 (3.5)
where A = 0, 1, 2, so the Hilbert space has dimension 3.32 = 27. The three states trans-
forms as a pair of triplets under two of the SL(3)’s and singlets under the remaining one.
Individually, therefore, the bipartite entanglement of each of the three states is given by the
determinant (3.1). Taken together however, we see from (3.4) that they transform as a com-
plex 27 of E6(C). The entanglement diagram is a triangle with vertices ABC representing
the qutrits and the lines AB, BC and CA representing the entanglements. See Fig. 3. The
N=2 truncation of section 3.1 is represented by just the line AB with endpoints A and B.
Note that:
1) Any pair of states has an individual in common
2) Each individual is excluded from one out of the three states
The entanglement measure will be given by the concurrence
C(ABC) = 33/2|J3| (3.6)
J3 being the singlet in 27× 27× 27:
J3 ∼ a
3 + b3 + c3 + 6abc (3.7)
Figure 3: The entanglement diagram is a triangle with vertices ABC representing the qutrits
and the lines AB, BC and CA representing the entanglements.
where the products
a3 = ǫA1A2A3ǫB1B2B3aA1B1aA2B2aA3B3 (3.8)
b3 = ǫB1B2B3ǫ
C1C2C3bB1C1b
C3 (3.9)
c3 = ǫC1C2C3ǫA1A2A3c
C1A1cC2A2cC3A3 (3.10)
exclude one individual (Charlie, Alice, and Bob respectively), and the product
abc = aABb
CA (3.11)
excludes none.
3.3 Magic supergravities in D = 5
Just as in four dimensions, one might also consider the “magic” supergravities [22, 23,
24]. These correspond to the R, C, H, O (real, complex, quaternionic and octonionic)
N = 2, D = 5 supergravity coupled to 5, 8, 14 and 26 vector multiplets with symmetries
SL(3, R), SL(3, C), SU∗(6) and E6(−26) respectively. Once again, in all cases there are cubic
invariants whose square root yields the corresponding black hole entropy [20].
Here we demonstrate that the black-hole/qubit correspondence continue to hold for these
D = 5 magic supergravities, as well as D = 4 . Once again, the crucial observation is that,
although the black hole charges aAB are real (integer) numbers and the entropy (2.2) is
invariant under E6(6)(Z), the coefficients aAB that appear in the wave function (3.5) are
complex. So the 2-tangle (3.6) is invariant under E6(C) which contains both E6(6)(Z) and
E6(−26)(Z) as subgroups. To find a supergravity correspondence therefore, we could equally
well have chosen the magic octonionic N = 2 supergravity rather than the conventional
N = 8 supergravity. The fact that
E6(6)(Z) ⊃ [SL(3)(Z)]
3 (3.12)
E6(−26)(Z) 6⊃ [SL(3)(Z)]
3 (3.13)
is irrelevant. All that matters is that
E6(C) ⊃ [SL(3)(C)]
3 (3.14)
The same argument holds for the magic real, complex and quaternionic N = 2 supergravities
which are, in any case truncations of N = 8 (in contrast to the octonionic). In fact, the
example of section 3.1 corresponds to the complex case.
Having made this observation, one may then revisit the conventional N = 2 and N = 4
cases (1) and (2) of section (2). SO(l, 1) is contained in SO(l + 1, C) and SO(m, 5) is
contained in SO(5 +m,C), so we can give a qutrit interpretation to more vector multiplets
for both N = 2 and N = 4, at least in the case of SO(6n, C) which contains [SL(3, C)]n.
4 Conclusions
We note that the 27-dimensional Hilbert space given in (3.4) and (3.5) is not a subspace of
the 33-dimensional three qutrit Hilbert space given by (3, 3, 3), but rather a direct sum of
three 32-dimensional Hilbert spaces. It is, however, a subspace of the 73-dimensional three
7-dit Hilbert space given by (7, 7, 7). Consider the decomposition
SL(7)A × SL(7)B × SL(7)C → SL(3)A × SL(3)B × SL(3)C
under which
(7, 7, 7) →
(3′, 3′, 3′) + (3′, 3′, 3) + (3′, 3, 3′) + (3, 3′, 3′) + (3′, 3, 3) + (3, 3′, 3) + (3, 3, 3′) + (3, 3, 3)
+(3′, 3′, 1) + (3′, 1, 3′) + (1, 3′, 3′) + (3′, 1, 3) + (3′, 3, 1) + (1, 3, 3′)
+(3, 3, 1) + (3, 1, 3) + (1, 3, 3) + (3, 1, 3′) + (3, 3′, 1) + (1, 3′, 3)
+(3′, 1, 1) + (1, 3′, 1) + (1, 1, 3′) + (3, 1, 1) + (1, 3, 1) + (1, 1, 3)
+(1, 1, 1)
This contains the subspace that describes the bipartite entanglement of three qutrits, namely
(3′, 3, 1) + (3, 1, 3) + (1, 3′, 3′)
So the triangle entanglement we have described fits within conventional quantum information
theory.
Our analogy between black holes and quantum information remains, for the moment,
just that. We know of no physics connecting them.
Nevertheless, just as the exceptional group E7 describes the tripartite entanglement of
seven qubits [4, 5], we have seen is this paper that the exceptional group E6 describes the
bipartite entanglement of three qutrits. In the E7 case, the quartic Cartan invariant provides
both the measure of entanglement and the entropy of the four-dimensional N = 8 black hole,
whereas in the E6 case, the cubic Cartan invariant provides both the measure of entanglement
and the entropy of the five-dimensional N = 8 black hole.
Moreover, we have seen that similar analogies exist not only for the N = 4 and N = 2
truncations, but also for the magic N = 2 supergravities in both four and five dimensions
(In the four-dimensional case, this had previously been conjectured by Levay[4, 5]). Murat
Gunaydin has suggested (private communication) that the appearance of octonions implies a
connection to quaternionic and/or octonionic quantum mechanics. This was not apparent (at
least to us) in the four-dimensional N = 8 case [4], but the appearance in the five dimensional
magic N = 2 case of SL(3, R), SL(3, C), SL(3, H) and SL(3, O) is more suggestive.
5 Acknowledgements
MJD has enjoyed useful conversations with Leron Borsten, Hajar Ebrahim, Chris Hull,
Martin Plenio and Tony Sudbery. This work was supported in part by the National Science
Foundation under grant number PHY-0245337 and PHY-0555605. Any opinions, findings
and conclusions or recommendations expressed in this material are those of the authors and
do not necessarily reflect the views of the National Science Foundation. The work of S.F.
has been supported in part by the European Community Human Potential Program under
contract MRTN-CT-2004-005104 Constituents, fundamental forces and symmetries of the
universe, in association with INFN Frascati National Laboratories and by the D.O.E grant
DE-FG03-91ER40662, Task C. The work of MJD is supported in part by PPARC under
rolling grant PPA/G/O/2002/00474, PP/D50744X/1.
References
[1] M. J. Duff, “String triality, black hole entropy and Cayley’s hyperdeterminant,” Phys.
Rev. D 76, 025017 (2007) [arXiv:hep-th/0601134].
[2] R. Kallosh and A. Linde, “Strings, black holes, and quantum information,” Phys. Rev.
D 73, 104033 (2006) [arXiv:hep-th/0602061].
[3] P. Levay, “Stringy black holes and the geometry of entanglement,” Phys. Rev. D 74,
024030 (2006) [arXiv:hep-th/0603136].
[4] M. J. Duff and S. Ferrara, “E7 and the tripartite entanglement of seven qubits,” Phys.
Rev. D 76, 025018 (2007) [arXiv:quant-ph/0609227].
[5] P. Levay, “Strings, black holes, the tripartite entanglement of seven qubits and the Fano
plane,” Phys. Rev. D 75, 024024 (2007) [arXiv:hep-th/0610314].
[6] M. J. Duff and S. Ferrara, “Black hole entropy and quantum information,” arXiv:hep-
th/0612036.
[7] V. Coffman, J. Kundu and W. Wooters, “Distributed entanglement,” Phys. Rev. A61
(2000) 52306, [arXiv:quant-ph/9907047].
[8] A. Miyake and M. Wadati, “Multipartite entanglement and hyperdeterminants,” Quant.
Info. Comp. 2 (Special), 540-555 (2002) [arXiv:quant-ph/0212146].
[9] A. Cayley, “On the theory of linear transformations,” Camb. Math. J. 4 193-209,1845.
[10] M. J. Duff, J. T. Liu and J. Rahmfeld, “Four-dimensional string-string-string triality,”
Nucl. Phys. B 459, 125 (1996) [arXiv:hep-th/9508094].
[11] K. Behrndt, R. Kallosh, J. Rahmfeld, M. Shmakova and W. K. Wong, “STU black holes
and string triality,” Phys. Rev. D 54, 6293 (1996) [arXiv:hep-th/9608059].
[12] D. M. Greenberger, M. Horne and A. Zeilinger, in Bell’s Theorem, Quantum Theory
and Conceptions of the Universe, ed. M. Kafatos (Kluwer, Dordrecht, 1989)
[13] E. Cartan, “Oeuvres completes”, (Editions du Centre National de la Recherche Scien-
tifique, Paris, 1984).
[14] E. Cremmer and B. Julia, “The SO(8) Supergravity,” Nucl. Phys. B 159, 141 (1979).
[15] R. Kallosh and B. Kol, “E(7) Symmetric Area of the Black Hole Horizon,” Phys. Rev.
D 53, 5344 (1996) [arXiv:hep-th/9602014].
[16] S. Ferrara and J. M. Maldacena, “Branes, central charges and U -duality invariant BPS
conditions,” Class. Quant. Grav. 15, 749 (1998) [arXiv:hep-th/9706097].
[17] S. Ferrara and R. Kallosh, “Universality of Supersymmetric Attractors,” Phys. Rev. D
54, 1525 (1996) [arXiv:hep-th/9603090].
[18] S. Ferrara and M. Gunaydin, “Orbits of exceptional groups, duality and BPS states in
string theory,” Int. J. Mod. Phys. A 13, 2075 (1998) [arXiv:hep-th/9708025].
[19] L. Andrianopoli, R. D’Auria and S. Ferrara, “Five dimensional U-duality, black-hole en-
tropy and topological invariants,” Phys. Lett. B 411, 39 (1997) [arXiv:hep-th/9705024].
[20] S. Ferrara, E. G. Gimon and R. Kallosh, “Magic supergravities, N = 8 and black hole
composites,” Phys. Rev. D 74, 125018 (2006) [arXiv:hep-th/0606211].
[21] Heng Fan, Keiji Matsumoto, Hiroshi Imai, “Quantify entanglement by concurrence
hierarchy” J.Phys.A:Math.Gen 36 022317 (2003), quant-ph/0205126
[22] M. Gunaydin, G. Sierra and P. K. Townsend, “Gauging The D = 5 Maxwell-Einstein
Supergravity Theories: More On Jordan Algebras,” Nucl. Phys. B 253, 573 (1985).
[23] M. Gunaydin, G. Sierra and P. K. Townsend, “The Geometry Of N=2 Maxwell-Einstein
Supergravity And Jordan Algebras,” Nucl. Phys. B 242, 244 (1984).
[24] M. Gunaydin, G. Sierra and P. K. Townsend, “Exceptional Supergravity Theories And
The Magic Square,” Phys. Lett. B 133, 72 (1983).
ABSTRACT
  Recent investigations have established an analogy between the entropy of
four-dimensional supersymmetric black holes in string theory and entanglement
in quantum information theory. Examples include: (1) N=2 STU black holes and
the tripartite entanglement of three qubits (2-state systems), where the common
symmetry is [SL(2)]^3 and (2) N=8 black holes and the tripartite entanglement
of seven qubits where the common symmetry is E_7 which contains [SL(2)]^7. Here
we present another example: N=8 black holes (or black strings) in five
dimensions and the bipartite entanglement of three qutrits (3-state systems),
where the common symmetry is E_6 which contains [SL(3)]^3. Both the black hole
(or black string) entropy and the entanglement measure are provided by the
Cartan cubic E_6 invariant. Similar analogies exist for ``magic'' N=2
supergravity black holes in both four and five dimensions.

<|endoftext|><|startoftext|>
Introduction
Let a sequence of processes Xn = Xn(·) be given, converging in distribution (in some sense, e.g., in a
sense of convergence of finite-dimensional distributions, distributions in spaces C or D, etc.) to a limit process
X = X(·). Also let the family of functionals φn of the processes Xn be given. Assume that they are additive
in an appropriate sense with respect to time variable. The general question, considered in the present paper,
is what an information about the limit behavior of the distributions of functionals φn can be obtained in a
situation where the processesXn, X possess certain Markov properties. The starting point in our considerations
is provided by the comparatively simple, but important particular case of the problem outlined above, in which
all the processes Xn coincide. In this situation, φn are a functionals of the same process X , and if X is Markov
process and φn are W -functionals (see [1], Chapter 6), then their limit behavior, according to the well known
theorem by E.B.Dynkin ([1], Theorem 6.4), is determined by the limit behavior of their characteristics (that
is, their expectations).
In the present paper we consider the processes Xn that differ one from another. The class of sequences
of processes Xn, considered in the framework of our approach, contains sequences of Markov chains with
appropriately normalized time, embedded into C or D (for example, by means of standard operations of
linearization or construction of graduated processes), and weakly convergent to Markov process X . Important
partial case is provided by random broken lines (or random step functions) Xn, constructed by a random walk
in Rd and weakly convergent to a homogenous stable process X (particularly, to the Brownian motion).
We show that, under some structural assumption about processes Xn, X (the condition is that the sequence
{Xn} provides Markov approximation for the process X), the full analogue of the Dynkin’s theorem takes
place: if the characteristics of functionals φn converge weakly to the characteristics of W -functional φ of the
limit process X , then the distributions of φn converge to the distribution of φ. Our method of proof is based
on L2-estimates for the distance between additive functionals, similar to those given in Lemma 6.5 [1]. The
proof of these estimates is concerned with a preliminary construction of processes Xn, X on one probability
space in such a way, that the functionals φn, φ, associated initially with a different processes, are interpreted
as a functionals of one two-component process. The (some kind of) Markov property of the two-component
process is essential for the estimates, analogous to those given in Lemma 6.5 [1]; the structural assumption
mentioned above is just the claim for such a property to hold true in an appropriate form.
The method, proposed by authors, allows one to reduce the problem of studying of asymptotic behavior
of the distributions of additive functionals to a priori more simple problem of studying of their means. In
our opinion, it provides a good addition to the available methods of studying the limit behavior of additive
functionals both for the important partial case of random walks (we do not give the detailed review here,
2000 Mathematics Subject Classification. Primary 60J55; Secondary 60F17.
Key words and phrases. additive functional, characteristics of additive functional, Markov approximation.
http://arxiv.org/abs/0704.0508v1
2 YURI N.KARTASHOV, ALEXEY M.KULIK
referring the reader to monographs [2],[3],[4], papers [5],[6] and reviews there), and for general Markov chains.
Among the latter, it is necessary to mention the method that is based on the passing to the limit in the difference
equations that describe characteristic functions of additive functionals of Markov chains, and ascends to the
works of I.I.Gikhman at 50-ies (see [7],[8], also [9] and the survey paper [10]).
The structure of the article is following. In Chapter 2, we introduce the notion of Markov approximation
and give examples that illustrate it. In Chapter 3, the main theorem of the article is introduced and proved.
In Chapters 4,5, the two elementary examples of application of this theorem are given. In Chapter 6, the main
theorem is applied to the proof of a general sufficient condition for weak convergence of additive functionals,
set on the sequence of Markov chains, that is formulated in terms of transition probabilities of the chains.
2. Markov approximation.
Further we assume that the processes Xn, X are defined on R
+ and have a locally compact metric phase
space (X, ρ). We say that the process X possesses the Markov property at the time moment s ∈ R+ w.r.t.
filtration {Gt, t ∈ R+}, if X is adapted to this filtration and for each k ∈ N, t1, . . . , tk > s there exists a
stochastic kernel {Pst1...tk(x,A), x ∈ X, A ∈ B(Xk)} such that
(2.1) E[1IA((X(t1), . . . , X(tk)))|Gs] = Pst1...tk(X(s), A) a.s., A ∈ B(Xk).
The measure Pst1...tk(x, ·) has a natural interpretation as the finite-dimensional distribution of X at the points
t1, . . . , tk, conditioned by {X(s) = x}; we denote below Pst1...tk(x, ·) = P ((X(t1), . . . , X(tk)) ∈ ·|X(s) = x).
Remark 1. In some cases, (2.1) implies the following functional analogue of (2.1):
(2.2) E[1I·(X |∞s )|Gs] = E[1I·(X |∞s )|X(s)],
where X |∞s denotes the trajectory of the process X on the time interval [s,+∞), considered as an element of
appropriate functional space. For instance, if the Kolmogorov’s sufficient condition for existence of continuous
modification holds true both for unconditional and conditional distributions of X , then (2.2) holds with X |∞s
considered as an element of C([s,+∞),X).
Everywhere below we assume that the process X possesses the Markov property w.r.t. its canonic filtration
at every point s ∈ R+ and for the processesXn the same property holds true at every point of the type in , i ∈ Z+
(the choice of the denominator here is quite arbitrary; it is possible to put any expression N(n) → ∞, n→ ∞
instead of n, but we avoid to do this in order to shorten the notation).
The next definition is introduced in [11].
Definition 1. The sequence {Xn} provides Markov approximation for the process X , if for arbitrary γ >
0, T < +∞ there exists K(γ, T ) ∈ N and a sequence of two-componential processes {Ŷn = (X̂n, X̂n)}, defined
on another probability space, such that
(i) X̂n
=Xn, X̂
n d=X ;
(ii) the process Ŷn, together with the processes X̂n, X̂
n, possesses the Markov property at the points
iK(γ,T )
, i ∈ N w.r.t. filtration {F̂nt = σ(Ŷn(s), s ≤ t)};
(iii) lim sup
 sup
i≤ Tn
K(γ,T )
iK(γ,T )
, X̂n
iK(γ,T )
 < γ.
Remark 2. Condition (ii) implies that, for i, k ∈ N, t1, . . . , tk > iK(γ,T )n , (x, y) ∈ X
2, the marginal distributions
(Ŷn(t1), . . . , Ŷn(tk)) ∈ ·|Ŷn( iK(γ,T )n ) = (x, y)
are equal to P
(Xn(t1), . . . , Xn(tk)) ∈ ·|Xn( iK(γ,T )n ) = x
and P
(X(t1), . . . , X(tk)) ∈ ·|X
iK(γ,T )
respectively.
Let us give some examples that illustrate Definition 1.
INVARIANCE PRINCIPLE FOR ADDITIVE FUNCTIONALS OF MARKOV CHAINS 3
Example 1. Let {ξk} be a sequence of i.i.d random vectors in Rd with E‖ξk‖2+δRd < +∞ for some δ > 0.
Assume {ξk} to have zero mean and identity for their covariance matrix. Let us introduce the sequence of
processes Xn (”random broken lines”) on R
(2.3) Xn(t) =
Sk−1√
+ (nt− k + 1)
− Sk−1√
, t ∈
k − 1
, k ∈ N,
where Sn =
k=1 ξk. Then Xn converge by distribution in C(R
+,Rd) to the Brownian motion X in Rd.
It is shown in [11] that the sequence {Xn} provides Markov approximation for the process X (part I. of
Theorem 1 [11]). On the other hand, in the same paper (part II. of the same Theorem) the following effect
is revealed. Let us denote by K(γ, T ) the minimal constant K(γ, T ) such that there exists a process Ŷn
satisfying conditions (i)-(iii) of Definition 1. Then, in all the cases except one trivial case ξk ∼ N (0, I), for
each fixed T > 0 the convergence K(γ, T ) → +∞, γ → 0+ takes place. In other words, while the accuracy of
approximation of the Brownian motion X by the random walk Xn becomes better (this accuracy is described
by the parameter γ), the Markov properties of the pair of processes (X,Xn) necessarily become worse (these
properties are characterized by K(γ, T )).
Example 2. Let {ξk} be i.i.d random variables, belonging to the normal domain of attraction for α-stable
distribution L, α ∈ (0, 2). By the definition, this means that
α [Sn − an] ⇒ L, an =
0, α ∈ (0, 1)
nEξ1, α ∈ (1, 2)
n2E sin ξ1
, α = 1
([12], Chapter XVII.5). In order to shorten the notation, we assume that an ≡ 0 and consider processes Xn
on R+ of the type
(2.4) Xn(t) = n
αSk−1 + (nt− k + 1)
αSk − n−
αSk−1
, t ∈
k − 1
, k ∈ N.
Then Xn converge by distribution in D(R
+) to the homogeneous process with independent increments X in
R, for which X(1)−X(0) d=L (we call such process a process an α-stable one).
It is shown in [11] (Theorem 2) that the sequence {Xn} provides Markov approximation for the process X .
Furthermore, in this situation, on the contrary to the previous example, K(γ, T ) = 1 for all γ, T . This means
that, in this case, the Markov properties do not become worse while accuracy of approximation improves.
Remark 3. The last example shows that the property of Markov approximation does imply, in general, the
convergence of distributions of the processes Xn to the distribution of X in C = C(R
+,X) even if Xn has
continuous trajectories. The same can be said about convergence in D = D(R+,X) (we omit the corresponding
example).
Let us remark that the approach, introduced in the present paper, is closely related to the Skorokhod’s
method of embedding of random walk into Wiener process by means of of appropriate sequence of stopping
moments ([13]), widely used in literature. The basic idea is the same: we have to construct two processes on
the same probability space, with the pair keeping Markov or martingale properties. However, the Skorokhod’s
method, while being quite efficient for one-dimensional random walks that approximate Wiener process, is
much less appropriate in a multi-dimensional situation or for stable domain of attraction. Examples 1 and 2
show that the claim for the Markov approximation to hold true is not restrictive, at least for all basic classes
of random walks with no regard to the dimension of the phase space or to the type of limit distribution.
The following example shows that the property of Markov approximation is ”stable” in the following sense.
This property is preserved under construction of a new pair (Zn, Z) from the pair (Xn, X), possessing this
property, in some regular way (e.g., as a solution of a family of stochastic equations).
4 YURI N.KARTASHOV, ALEXEY M.KULIK
Example 3. Let Xn, X be as in Example 1, functions a : R
m → Rm, b : Rd → Rd×m be Lipschitz and
b∗(x)b(x) > 0, x ∈ Rm (the sign ∗ denotes the operation of taking of the adjoint matrix). Define
(2.5) Zn
k + 1
, Zn(0) = z,
) ≡ [Xn(k+1n )−Xn(
)]. Then ([14], [15]) Zn converge by distribution in C(R
+,Rm) to the process Z,
defined by SDE
(2.6) dZ(t) = a(Z(t))dt + b(Z(t))dX(t), Z(0) = z,
where X is the Brownian motion in Rd. It is natural to call the sequence Zn the difference approximation of
the diffusion process Z.
Let us show that the sequence {Zn} provides Markov approximation for the process Z. For arbitrary γ, T ,
we construct a pair (X̂n, X̂
n), corresponding to processes Xn, X and satisfying conditions of Definition 1 (such
construction is possible due to Example 1).
Let us construct the processes Ẑn, Ẑ
n as the functionals of the processes X̂n, X̂
n by equalities (2.5),(2.6) with
Xn replaced by X̂n and X replaced by X̂
n (note that (2.6) has unique strong solution, hence this procedure
is correct). By the construction, the pair (Ẑn, Ẑ
n) satisfies condition (i) of Definition 1. It is easy to verify
that the Markov condition (ii) for the pair (X̂n, X̂
n) holds in the functional form (2.2) with Ŷn|∞s considered
as an element of C([s,+∞),Rd × Rd) (see Remark 1). Hence, the pair (Ẑn, Ẑn) also satisfies condition (ii) of
Definition 1. Let us write
∆(γ) = lim sup
 sup
i≤ Tn
K(γ,T )
iK(γ, T )
, Ẑn
iK(γ, T )
and show that
(2.7) ∆(γ) → 0+, γ → 0 + .
Note that (2.7) immediately implies Markov approximation: for arbitrary δ > 0 we chose, using (2.7), γ = γ(δ)
such that inequalities γ < δ and ∆(γ) < δ hold. Then the pair (Ẑn, Ẑ
n), constructed by the scheme described
above, satisfy Definition 1 with the constant γ replaced by δ (note that, under this construction, the value
K(δ, T ) ≡ KZ(δ, T ) for the pair (Ẑn, Ẑn) is expressed through the same value for the pair (X̂n, X̂n) by
KZ(δ, T ) = KX(γ(δ), T )).
Now assume that (2.7) does not hold, then there exist constant c > 0 and sequence γk → 0+, nk → +∞
such that
(2.8)
K(γk, T )
→ 0, P
 sup
i≤ Tnk
K(γk,T )
iK(γk, T )
iK(γk, T )
 > c.
Consider the sequence of four-component processes (X̂nk , X̂
nk , Ẑnk , Ẑ
nk). Every component of this sequence is
weakly compact in C(R+,Rd) or C(R+,Rm), hence the whole sequence is also weakly compact in C(R+,Rd×
d ×Rm ×Rm). Consider arbitrary limit point (X̂∗, X̂∗, Ẑ∗, Ẑ∗) (in a sense of convergence by distribution) of
this sequence. It follows from (2.8) that
(2.9) P (Z∗ 6= Z∗) > 0.
It follows from Theorem 2.2 [15] (see also Chapter 9.5 [14]) that the processes Z∗, Z
∗ satisfy SDE (2.6) with X
replaced by X∗, X
∗. However, the SDE (2.6) possesses the property of pathwise uniqueness (see [16]), and the
property (iii) of the pair (Xnk , X̂
nk) implies that the processes X∗, X
∗ coincide a.s. Therefore, the processes
Z∗, Z
∗ also coincide a.s., that contradicts to (2.9) and show that our assumption that ∆(γ) 6→ 0+, γ → 0+ is
false.
INVARIANCE PRINCIPLE FOR ADDITIVE FUNCTIONALS OF MARKOV CHAINS 5
The examples given above show that the claim for the Markov approximation to hold is not very restrictive,
and is provided in a typical situations. On the other hand, this claim is strong enough to provide one the
opportunity to obtain an analog of the Dynkin’s theorem; this will be shown in the next chapter.
3. Main theorem
We consider the functionals of the type
(3.1) φs,tn (Y )
k:s≤k/n<t
k + 1
, . . . , Y
k + L− 1
, 0 ≤ s < t,
where the functions Fn(·) are nonnegative, L is a fixed integer. Together with the functionals φn, that are
”stepwise” functions w.r.t. every time variable, we consider random broken lines, related to these functions:
n = φ
n + (ns− j + 1)φ
n + (nt− k + 1)φ
n , s ∈
j − 1
, t ∈
k − 1
We interpret the random broken lines ψn as a random elements in space C(T,R
+), where T
= {(s, t)|0 ≤ s ≤
If process Y possesses Markov property w.r.t. the filtration, associated with this process, at the points of
the type s = i
, i ∈ Z+, then, for functional φn, its characteristic fn is naturally defined by the formula
(3.2) f s,tn (x)
= E[φs,tn (Y )|Y (s) = x], s =
, i ∈ Z+, t > s, x ∈ X.
Note that the functional (3.1) is a function of values of Y at finite number of time moments, thus the mean
value in (3.2) is well defined as the integral over the family {Pst1...tk(x, ·), t1, . . . , tk > s, k ∈ N} of conditional
finite-dimensional distributions of the process Y .
The main result of this chapter is given in the following theorem.
Theorem 1. Assume that there exist the sequence Xn that provides Markov approximation for the homoge-
neous Markov process X and the sequence {φn ≡ φn(Xn)} of the functionals of the type (3.1). Let the following
conditions hold true:
(1) The functions Fn(·) are bounded and uniformly tend to zero:
δ(Fn)
= sup{Fn(x1, . . . , xL)|x1, . . . , xL ∈ X} → 0, n→ ∞.
(2) There exists a function f , that appears to be a characteristics (in a sense of Chapter 6 [1]) of some
W -functional φ = φ(X) of the limiting Markov process X, such that, for each T ,
,t∈(s,T )
∥∥f s,tn (·)− f s,t(·)
∥∥→ 0, n→ ∞,
where ‖g(·)‖ ≡ sup
|g(x)|.
(3) The limiting function f is uniformly continuous with respect to variable x, that is, for arbitrary T
0≤s≤t<T
∣∣f s,t(x′)− f s,t(x′′)
∣∣→ 0, |x′ − x′′| → 0.
ψn(Xn) ⇒ φ(X) ≡ {φs,t(X), (s, t) ∈ T},
where ψn are the random broken lines corresponding to the functionals φn and convergence is understood in a
sense of C(T,R+).
Remark 4. Conditions 1,2 are analogous to those of the Dynkin’s theorem: condition 2 is exactly the condition
for the characteristics to converge, condition 1 corresponds to the assumption that the prelimit functionals are
W -functionals. In the present situation, of course, we can not say that φn are W -functionals, particulary, φn
are not continuous with respect to temporary variable. Condition 1 means exactly that the values of jumps
6 YURI N.KARTASHOV, ALEXEY M.KULIK
are negligible while n→ ∞. Condition 3, though not very restrictive, is specific, and is caused by necessity to
consider functionals, set over different processes.
Remark 5. If Xn ⇒ X in C or in D (this condition is not provided by the conditions of the Theorem, see
Remark 3), then, as one can easily see from the proof, (Xn, ψn(Xn)) ⇒ (X,φ(X)) in C × C(T,R+) or in
D× C(T,R+), respectively.
Note that the result of the theorem also holds for the Markov process X that is not homogeneous w.r.t.
time variable; the claim for the limit Markov process to be homogeneous is imposed in order to shorten the
notation only. This remark concerns also the most of the results stated below.
Proof of the theorem. The general scheme of the proof is close to the one, proposed in [17] in order to
prove the analogue of the Dynkin’s theorem for the family of functionals of a single Markov process, for which
the properties of additivity, continuity and homogeneity may fail, but the violations become negligible while
n→ ∞.
First let us show that the finite-dimensional distributions of φn converge to the corresponding distributions
of φ. Let the constants γ, T be fixed and X̂n, X̂
n be processes satisfying conditions (i)-(iii) of Definition 1
with these constants. For these processes, one can consider the functionals φn(X̂n), φ(X̂
n); obviously, their
distributions and characteristics coincide with those for φn(Xn), φ(X). In order to shorten notation, we denote
further φn = φn(X̂n), φ = φ(X̂
n),K = K(γ, T ),Ft = F̂nt ≡ σ(X̂n(s), X̂n(s), s ≤ t).
It follows from the condition (iii) and the definition of characteristics that, for arbitrary t ∈
(3.3) E
,t|FKi
n |FKi
almost surely.
Lemma 1. For 0 ≤ s ≤ t ≤ T , the following estimate holds:
lim sup
n (X̂n)− φs,t(X̂)
∥∥f0,T
∥∥G(f, γ, T ) + 4
∥∥f0,T
∥∥2 ,
where G(f, γ, T ) = sup
0≤s≤t≤T,|x′−x′′|<γ
|f s,t(x′)− f s,t(x′′)|.
Proof. We will prove the statement of lemma for s = 0, t = T ; in general case the proof is exactly the same.
Consider the partition of the axis R+ by points of the type Ki
, i ∈ N. Denote Mn = [nTK ] + 1,
(i−1)K
,( iK
n , ∆̃
(i−1)K
T , i = 1,Mn.
We have that
φ0,Tn − φ0,T
∆ni − ∆̃ni
∆ni ∆̃
j = Σ
1 + 2Σ
where
(∆ni )
(∆̃ni )
2 − 2
∆ni ∆̃
1≤i<l≤Mn
∆ni ∆
1≤i<j≤Mn
∆ni ∆̃
1≤j<k≤Mn
∆̃nj ∆̃
1≤j<i≤Mn
∆ni ∆̃
Let us estimate the expectations Σn1 ,Σ
2 separately. Since the increments ∆
i , ∆̃
i are non-negative, the first
sum can be estimated by the sum of the first two terms:
(3.4) Σn1 ≤
(∆ni )
(∆̃ni )
INVARIANCE PRINCIPLE FOR ADDITIVE FUNCTIONALS OF MARKOV CHAINS 7
The expectation of the first term in (3.4) can be estimated via the definition of φn:
(∆ni )
2 ≤ E
i=1,Mn
) Mn∑
∆ni ≤ Kδnf0,Tn
X̂n(0)
≤ Kδn
∥∥f0,Tn
∥∥→ 0, n→ +∞,
where δn ≡ δ(Fn). Convergence to zero of the expectation of the second term in (3.4) is provided by the
arguments, analogous to those used in [1] Chapter 6: on the one hand, by the continuity of functional φ,∑Mn
i=1(∆̃
2 → 0 by probability; on the other hand,
i=1(∆̃
2 is dominated by the variable (φ0,T )2; the
expectation of this variable, due to Lemma 6.4 [1], does not exceed 2
∥∥f0,T
∥∥2 <∞. Therefore, E
i=1(∆̃
0 due to the Lebesgue theorem on dominated convergence. Hence, lim sup
EΣn1 ≤ 0.
The expectation of Σn2 is equal
EΣn2 = E
1≤i<l≤Mn
∆ni ∆
1≤i<j≤Mn
∆ni ∆̃
1≤j<k≤Mn
∆̃nj ∆̃
1≤j<i≤Mn
∆ni ∆̃
(3.5) = E
Mn−1∑
n − φ
Mn−1∑
n − φ
We estimate the second term in (3.5), using property (3.3). Since ∆̃ni is measurable w.r.t. FKi
, the following
estimate holds:
Mn−1∑
n − φ
Mn−1∑
∆̃ni E
n − φ
Mn−1∑
− f Kin ,T
Mn−1∑
n − f
∣∣∣+ E
Mn−1∑
∣∣∣∣f
− f Kin ,T
))∣∣∣∣ ≤
(3.6) ≤ ‖f0,T‖ sup
,t∈(s,T )
∥∥f s,tn (·)− f s,t(·)
∥∥+ E
Mn−1∑
∣∣∣∣f
− f Kin ,T
))∣∣∣∣
(in the last inequality, we have used that
∑Mn−1
i=1 ∆̃
i ≤ φ0,T and Eφ0,T ≤ ‖f0,T‖). The first term in (3.6)
tends to zero. In order to estimate the second term, we put Ωγ,T =
i≤ Tn
, X̂n
(recall
that P (Ωγ,T ) < γ due to the claim (iii) of Definition 1). We have
Mn−1∑
∣∣∣∣f
− f Kin ,T
))∣∣∣∣ ≤ Eφ
G(f, γ, T )1IΩ\Ωγ,T+
(3.7) + E
Mn−1∑
∣∣∣∣f
− f Kin ,T
))∣∣∣∣ 1IΩγ,T .
The first term in (3.7) can be estimated by ‖f0,T‖G(f, γ, T ). The second term is estimated by Cauchy
inequality:
Mn−1∑
∣∣∣∣f
− f Kin ,T
))∣∣∣∣ 1IΩγ,T ≤
∥∥f0,T
∥∥Eφ0,T 1IΩγ,T ≤
∥∥f0,T
∥∥ [E(φ0,T )2
2 [P (Ωγ,T )]
∥∥f0,T
∥∥2√2γ
8 YURI N.KARTASHOV, ALEXEY M.KULIK
(here, the Lemma 6.4 [1] was applied). Summing up the above relations, we deduce that
(3.8) lim sup
Mn−1∑
n − φ
∥∥f0,T
∥∥G(f, γ, T ) +
∥∥f0,T
∥∥2√2γ.
Now, let us proceed with the estimation of the first item in (3.5). Straightforward use of the property
(3.3) is impossible here, since the variable ∆ni is a functional of values of the process X̂n at the points
, Ki+1
, . . . Ki+L
, that is, it is not measurable with respect to FKi
. Without loss of generality, one can
assume that K ≥ L (otherwise one can make the same procedure with the constant K replaced by K · L).
Then the variable ∆ni is measurable with respect to FK(i+1)
. The functionals φn, φ are additive at points of
the type j
. Applying (3.3) and condition 1 of the Theorem, we obtain the following relation
Mn−1∑
n − φ
Mn−1∑
K(i+1)
n − φ
K(i+1)
Mn−1∑
K(i+1)
K(i+ 1)
K(i+1)
K(i+ 1)
(3.9) ≤ Kδn
∣∣f0,Tn
∣∣+ E
Mn−1∑
K(i+1)
K(i+ 1)
K(i+1)
K(i+ 1)
The first term in (3.9) tends to zero. The second term in (3.9) is estimated in the same way with the second
term in (3.5), with one necessary change. We cannot apply Lemma 6.4 [1] in order to estimate the second
moment φ0,Tn , therefore this estimate must be obtained separately. This can be done in a following way:
E(φ0,Tn )
2 = E
(∆ni )
2 + 2E
1≤i<j≤Mn
∆ni ∆
j = E
(∆ni )
2 + 2E
1≤i≤Mn
∆ni φ
(∆ni )
2 + 2E
1≤i≤Mn
∆ni [φ
iK/n,(i+1)K/n
n + φ
(i+1)K/n,T
n ] ≤
(∆ni )
2 + 2KδnE
1≤i≤Mn
∆ni + 2E
1≤i≤Mn
∥∥f0,Tn
(3.10) ≤
(2K + 1)δn + 2‖f0,Tn ‖
n ≤ (2K + 1)δn
∥∥f0,Tn
∥∥+ 2
∥∥f0,Tn
∥∥2 ,
all transitions here are analogous to those given above, and thus are not discussed in details. Repeating literally
the estimates for the second term in (3.5), we obtain the estimate
(3.11) lim sup
Mn−1∑
n − φ
∥∥f0,T
∥∥G(f, γ, T ) +
∥∥f0,T
∥∥2√2γ.
It follows from (3.8),(3.11) that lim sup
[2Σn2 ] ≤ 4
∥∥f0,T
∥∥G(f, γ, T ) + 4
∥∥f0,T
∥∥2. This, combined with the
estimate lim sup
[Σn1 ] ≤ 0 proved before, provides the needed statement. The lemma is proved.
Now, we can complete the proof of the convergence of finite-dimensional distributions of φn to those of φ.
In order to shorten notation we consider the one-dimensional distributions only; in general case considerations
are completely the same.
Take arbitrary s, t, s < t. In order to prove weak convergence φs,tn (Xn) to φ
s,t(X), it is sufficient to show
that, for arbitrary bounded Lipschitz function g,
(3.12) lim sup
∣∣Eg(φs,tn (Xn))− Eg(φs,t(X))
∣∣ = 0.
INVARIANCE PRINCIPLE FOR ADDITIVE FUNCTIONALS OF MARKOV CHAINS 9
Let g be fixed, consider a pair of processes X̂n, X̂
n, corresponding (in a sence of Definition 1) to T = t and
given positive γ. By construction, φs,tn (Xn)
=φs,tn (X̂n), φ
s,t(X)
=φs,t(X̂n). Applying Lemma 1, we obtain
lim sup
∣∣Eg(φs,tn (Xn))− Eg(φs,t(X))
∣∣ ≤ lim sup
∣∣∣g(φs,tn (X̂n))− φs,t(X̂n)
∣∣∣ ≤
≤ Lip(g) lim sup
∣∣∣φs,tn
− φs,t
)∣∣∣ ≤ 2Lip(g)
‖f0,t‖G(f, γ, t) +
2γ ‖f0,t‖2,
here Lip(g) denotes the Lipshits constant for g. Condition 3 of the Theorem provides that G(f, γ, t) → 0, γ →
0+. Therefore, since γ > 0 is arbitrary, (3.12) follows from the estimate given above.
Since sups,t |ψs,tn − φs,tn | ≤ δn → 0, the finite-dimensional distributions of φn converge to corresponding
distributions of φ. Thus, the only thing left to show in order to prove the Theorem, is that the family of
distributions of ψn is dense in C(T,R
+). The values of the functions ψn at the point s, t differ from the values
at the closest knots of partition s∗, t∗ ∈ 1nZ+ at most on δn, and ψn are monotone as the functions of the time
variables. Hence, in order to prove the required statement, it is sufficient to show that, for arbitrary sequence
of partitions
Sn = {sn0 = 0 < sn1 < · · · < snk < . . . } ⊂ 1nZ+, n ∈ N
with σn ≡ maxk(snk −snk−1) → 0, n→ +∞
and arbitrary T ∈ R+,
k:sk≤T
snk−1,s
→ 0, n→ +∞.
Set γn,T = sup0<t−s<σn,t<T ‖f
n ‖, note, that γn,T → 0, n→ +∞ due to continuity of the limit characteristics
f and uniform convergence of fn ⇒ f . In the same way with (3.10) we obtain the estimate
(3.13) E
snk−1,s
≤ {(2K + 1)δn + 2γn,T }Eφ
snk−1,s
Summing up the estimates (3.13) w.r.t. k (recall that φs,tn = ψ
n when s, t ∈ 1nZ+), we obtain
k:sk≤T
snk−1,s
≤ {(2K + 1)δn + 2γn,T } ‖f0,Tn ‖ → 0, n→ +∞,
what was to be proved. The theorem is proved.
Let us make one remark. For the random walks, the Skorokhod’s method is well known, allowing one to
reduce the investigation of the sums of the type (3.1) to the case L = 1. This method can be applied in the
context of current paper, also. Namely, the reasoning, similar to the one used in the proof of Theorem 1,
Chapter 5.3 [2], provides the following result (the proof is omitted).
Proposition 1. Let the sequence of functionals {φn = φn(Xn)} of the type (3.1) be given, and, for every n,
the process Xn possesses the Markov property at the time moments
, i ∈ Z+. Consider the functionals
n (Xn)
k:s≤k/n<t
, 0 ≤ s < t,
where
Ψn,k(x) ≡ E
k + 1
, . . . , Xn
k + L− 1
)) ∣∣∣Xn
, x ∈ X.
Let functions Fn(·) be non-negative and satisfy condition 1 of Theorem 1, then the functionals φs,tn have a
limit distribution if and only if the functionals χs,tn have a limit distribution, and the limit distributions of the
functionals φs,tn , χ
n are equal as soon as they exist.
It is worth to note that the Proposition 1 does not lead to simplification of the initial problem in the context
of current paper. The number of values of process Xn, contained in a one summand for the functional φn (that
is, number L), is not involved significantly into the proof of the main theorem. We will see later that the main
problem in the application of the Theorem consists in verification of the condition 2 of uniform convergence
of characteristics; the characteristics of the functionals φn and χn, obviously, coincide.
In the following two chapters, the examples of application of Theorem 1 are given.
10 YURI N.KARTASHOV, ALEXEY M.KULIK
4. The local time of a random walk at a point.
Let the processesXn be constructed w.r.t. one-dimensional random walk that belongs to the normal domain
of attraction of an α-stable law, α ∈ (1, 2] (see Examples 1,2). We assume the centering sequence an to be
equal to zero, and set the random broken lines Xn by equality (2.4).
Consider, for arbitrary z∗ ∈ R, the functionals φn = φn(Xn) of the type (3.1) with L = 2, Fn(x, y) =
|y−x|
1I(x−z∗)(y−z∗)<0 +
(1Ix 6=z∗,y=z∗ + 1Ix=z∗,y 6=z∗)
. For every s < t, s, t ∈ { j
, j ∈ Z+}, with probability 1
the following equality takes place
(4.1) φs,tn (Xn) = lim
1IXn(r)∈(z∗−ε,z∗+ε)\{z∗} dr, 0 ≤ s < t.
Therefore the functionals φn can be naturally interpreted as the censored local times for the broken lines Xn
at the point z∗ (the censoring operation consists in removing horizontal parts of the broken lines). Theorem
3.1 allows one to obtain the following limit result.
Proposition 2. Let the distribution of the jump ξ1 of the random walk be concentrated on Z and aperiodic.
Then the conditions of Theorem 1 hold true and φs,tn (Xn) converge by distribution to φ
s,t(X) = P (ξ1 6=
0) · Ls,t(X, z∗), where L(X, z∗) is the local time of the limit α-stable process X at the point z∗.
Proof. The condition for Xn to provide Markov approximation for X holds true (see Example 2). Condition
1 of the Theorem holds with δn = 2n
−1 since either the increment of the process Xn in the neighboring knots
is equal to zero or the absolute value of this increment is not less then n−
α . Let us show that the characteristics
of functionals φn converge uniformly to the function
(4.2) f s,t(x) = P (ξ1 6= 0)
∫ t−s
pr(z∗ − x) dr,
where pr(·) is the density of distribution X(r) under condition X(0) = 0; this provides conditions 2,3 of the
Theorem.
In order to shorten notation we take z∗ = 0. Denote P
i = P (Sk = i), Pj = P
j = P (ξ1 = j), i, j ∈ Z. We
have that
f s,tn (x) = n
j 6=0
i∈(xn
α −j,xn
P ki +
+ P k
notation i ∈ (a, b) in the case a > b means that b < i < a. Using the appropriate version of the Gnedenko’s
local limit theorem (see [18], Theorem 4.2.1), one can write
(4.3) εk ≡ sup
∣∣∣∣k
αP ki − p1
)∣∣∣∣→ 0, k → +∞.
Hence
f s,tn (x) =
j 6=0
i∈(xn
α −j,xn
(4.4) +
α − j
+ Ξn(x),
where
(4.5) |Ξn(x)| ≤
[nt]∑
and Ξn ⇒ 0, n→ +∞ via the Toeplitz’s theorem.
INVARIANCE PRINCIPLE FOR ADDITIVE FUNCTIONALS OF MARKOV CHAINS 11
The density p1 is uniformly continuous over R, hence, using the same arguments, one can show that, up to
a summand that uniformly converges to zero, the value of f s,tn (x) equals
j 6=0
i∈(xn
α −j,xn
(4.6) =
P (ξ1 6= 0)
P (ξ1 6= 0)
in the latter equality, we have used that the process X is self-similar, that is, pr(x) = r
α p1(r
αx), r > 0.
The sum in the right hand part of (4.6) is exactly the integral sum for the integral in the right hand part of
(4.2), the functions {pr(·), r ≥ r0} are uniformly continuous for arbitrary r0 > 0 and supx pr(x) ≤ Cr−
α . This
immediately provides the required uniform convergence of fn to f . The proposition is proved.
The similar result can be proved for ξk with non-lattice distribution, for which there exists a bounded
distribution density of Sn0 for some n0 (the proof is omitted).
The result of Proposition 2 and its analog for non-lattice random walks is not essentially new; one can obtain
it applying either Proposition 1 and the technique, exposed in §§III.2, III.3 [3], or the reasonings, similar to
those used in the proof of Theorem 3 [9]. Our reason to give this example consists, on the one hand, in
describing the way of application of Theorem 1 in a simple situation where an appropriate local limit theorem
is available, and on the other hand, in emphasizing the following interesting fact, that is not reflected in a
literature available for us. For a ”good” random walks (lattice or essentially non-lattice), their local times at
the point, defined by the natural equality (4.1), converge by distribution exactly to the local time of the limit
process at the same point, as soon as the broken lines corresponding to the random walk does not contain
horizontal sections.
5. Difference approximations of diffusion processes.
Consider the sequence {Zn} of difference approximations of diffusion process Z (see Example 3, equalities
(2.5),(2.6)). The sequence {Zn} provides Markov approximation for Z, that allows one to apply Theorem 1
while considering the question on the limit behavior of the functionals of type (3.1) for {Zn}.
One of possible way to proceed here is to apply the estimates based on an appropriate local limit theorem,
like it was made in the previous chapter. In order to make this paper reasonably short, we do not give the
detailed exposition of this subject here (see the separate paper [19]). In this chapter, we give a simple corollary
of Theorem 1, that provides invariance principle for certain ”canonic” additive functionals, that are related to
the Doob’s decomposition of |Zn(·)|.
Let us consider the objects introduced in Example 3 with m = d = 1 and a, b, {ξn} satisfying conditions
introduced there. Put
(5.1) φs,tn (Zn) ≡
k∈(sn,tn]
Zn( k−1n )Zn(
Zn( k−1n )=0
ψn are corresponding broken lines.
Proposition 3. The processes ψn converge by distribution in C(T,R) to the local time
φs,t ≡ lim
1I|Z(r)|<εb
2(Z(r)) dr
of the diffusion process Z at the point 0.
Proof. Since the diffusion coefficient is non-degenerate, Z possesses continuous transition density pt(x, y)
and the standard estimate supx pt(x, y) ≤
C(y)√
holds true. This implies existence of the local time of Z at
12 YURI N.KARTASHOV, ALEXEY M.KULIK
the point 0. This local time is a W -functional with the characteristics f0,t(x) = b2(0)
ps(x, 0) ds, that is,
condition 3 of Theorem 1 holds. Straightforward calculations prove the equality
(5.2) |Zn(t)| − |Zn(s)| = φ0,tn (Zn) +
[nt]−1∑
where s ∈ 1
Z+, sign (0) = 0. This provides that
f s,tn (x) = E [|Zn(t)||Zn(s) = x]− |x| −
[nt]−1∑
sign (Zn
) ∣∣∣Zn(s) = x
Processes Zn converge weakly to Z, function a(x)sign (x) has unique jump at point x = 0 and P (Z(r) = 0) = 0
for every r > 0. Hence the standard reasonings provide that (we omit the details)
(5.3) f s,tn (x)⇒
E [|Z(t)||Z(s) = x]− |x| − E
a(Zr)sign (Zr) dr
∣∣∣Z(s) = x
This proves condition 2 of Theorem 1, since the right hand side of (5.3) is exactly the characteristics of the
local time φ due to Ito-Tanaka formula.
In order to provide condition 1, let us, for a while, suppose additionally that the coefficients a, b are bounded.
We apply the standard ”cutting” procedure: on each step of approximation, together with the process Zn,
we consider the process Z̃n, constructed by the same scheme from a sequence of i.i.d.r.v. {ξ̃n}, satisfying
conditions ‖ξ̃n‖ ≤ n
2 and ξn = ξ̃n for ‖ξn‖ ≤ n
2 . For such Z̃n, condition 1 of theorem holds with
δ(Fn) ≤ n−1 max
|a(x)| + n
|b(x)|,
and the other conditions of theorem for Z̃n remain to hold true. This proves the statement of Proposition 3
for {Z̃n}. On the other hand, for arbitrary T ∈ R+
Zn|[0,T ] 6= Z̃n|[0,T ]
1− 2+δ
= o(1), n→ +∞,
and therefore the statement of Proposition 3 holds true for {Zn}. At last, the additional assumption that the
coefficients a, b are bounded, can be removed via a standard localization procedure. The proposition is proved.
Remark 6. Let a = 0, b = 1, P (ξk = ±1) = 12 (that is, Zn corresponds to the Bernoulli’s random walk), then
functional (5.1) can be represented at the form
(5.4) φ̃s,tn =
# {k ∈ [sn, tn) : Zn(k) = 0} .
The functional (5.4) is widely used in a literature as the difference analogue of the local time at the point
zero for lattice random walks. Proposition 3 shows that the functional (5.1) is a natural difference analogue of
the local time both for random walks and, more generally, for difference approximations of diffusion processes
without any restrictions on the distribution of the sequence {ξk}.
6. Invariance principle for additive functionals of Markov chains
In previous two chapters we have considered more or less particular examples illustrating possible ways to
provide the main condition of Theorem 1 (condition 2). In this chapter we introduce general sufficient condition
of weak convergence of additive functionals, constructed on the sequence of Markov chains, that is formulated
in terms of the transition probabilities of these chains and the functions Fn involved in representation (3.1).
This condition is obtained as an application of Theorem 1, and the main assumption here is that the local limit
theorem (condition 4 of Theorem 2 below) takes place in an appropriate form. For recurrent Markov chains
this condition, together with a natural condition of weak convergence of ”symbols” of additive functionals
(exact formulation is given below), is sufficient for convergence of characteristics, and the estimates here are
INVARIANCE PRINCIPLE FOR ADDITIVE FUNCTIONALS OF MARKOV CHAINS 13
similar to (4.4) – (4.6) (see Theorem 3 below). For transient chains these estimates are not powerful enough,
since in this case the estimate (4.5) does not provide that Ξn is negligible. One possible way to overcome this
difficultly is to apply a more strong version of local limit theorem, for instance, to claim explicitly the rate
of convergence εk → 0 in (4.3). Such an approach restricts the range of possible applications, therefore we
introduce another one, that is concerned with a uniform condition on the modulus of continuity of processes
Xn (condition 5 of Theorem 2) and a ”dimensional” condition on the symbols of functionals (condition 6),
adjusted one with another with an appropriate way (condition 7).
We assume that a σ-finite measures ν, νn on X are given such that
P (X(t) ∈ dy|X(s) = x) = pt−s(x, y)ν(dy), 0 ≤ s < t, x, y ∈ X,
∈ dy|Xn
= pn,k(x, y)νn(dy), i ∈ Z+, k ∈ N, x, y ∈ X.
The measurable functions pt, pn,k are interpreted as the transition probability densities for X,Xn w.r.t. mea-
sures ν, νn.
We assume the W -functional φ = φ(X) with the characteristics f to be given. It is known (see [1], Chapter
6) that
s,t = L2 − lim
0,ε(X(r)) dr,
and therefore
f s,t(x) = lim
pr(x, y)
f0,ε(y)ν(dy) dr.
We assume that, as ε→ 0+, the measures 1
f0,εdν converge weakly (i.e., on every bounded continuous function)
to a finite measure µ, the characteristics f can be represented in the form
(6.1) f s,t(x) =
∫ t−s
pr(x, y)µ(dy) dr, and
pr(x, y)µ(dy)
dr < +∞, T ∈ R+.
We also consider the sequence of the functionals φn = φn(Xn) of the type (3.1) with L = 1 and Fn =
(the case L > 1 can be considered similarly and we omit it in order to shorten notation). The characteristics
of φn has the form
f s,tn (x) =
pn,k(x, y)µn(dy), 0 ≤ s < t, x ∈ X,
where µn(dy) ≡ gn(y)νn(dy) are the ”symbols” of the functionals φn.
Theorem 2. Assume the following conditions to hold true.
(1) Trajectories of the processes Xn are continuous, and the sequence {Xn} possesses Markov approxima-
tion of X.
(2) 1
supx gn(x) → 0, n→ +∞.
(3) For arbitrary t0 > 0, the function (t, x, y) 7→ pt(x, y) is uniformly continuous on [t0,+∞) × X2, and
for arbitrary y ∈ X
x 6∈B(y,R)
pt(x, y) → 0, R → +∞
(here and below B(x,R) ≡ {x ∈ X|ρ(x, y) < R}). Furthermore, there exist constants γ > 0, Cγ > 0
such that
x,y∈X
pt(x, y) ≤ Cγt−γ , t > 0.
(4) There exist sequences {αn}, {βn} ⊂ R+ tending to zero, such that
x,y∈X
|pn,k(x, y)− p k
(x, y)| ≤ (αn + βk)
, n, k ∈ N.
14 YURI N.KARTASHOV, ALEXEY M.KULIK
(5) There exist constants δ > 0, Cδ > 0 such that, for arbitrary T > 0,
x∈X,n∈N
t,s∈[0,T ],|t−s|≥ 1
ρ(Xn(t), Xn(s))
|t− s|δ
]Cδ ∣∣∣X(0) = x
 < +∞.
(6) Measures µn are finite and converge weakly to measure µ. There exist constants θ > 0, Cθ, cθ > 0 such
µn(B(x,R)) ≤ CθRθ, x ∈ X, n ∈ N, R > cθn−δ
(note that the latter condition provides that µ(B(x,R)) ≤ CθRθ, x ∈ X, R > 0).
(7) The constants γ, δ, θ, Cδ satisfy the relations
δθ + 1 > γ, Cδ > 2θ + 2.
Then (Xn, ψn(Xn)) ⇒ (X,φ(X)) in a sense of convergence in distribution in C(R+,X)×C(T,R+) (ψn are
the random broken lines corresponding to the functionals φn).
Proof. In order to prove the Theorem, it is sufficient to show that, for every T ∈ R+,
(6.2) f s,tn (x) ⇒
s≤t≤T,x∈X
f s,t(x), n→ +∞.
Indeed, the sequence {Xn} provides Markov approximation for X (condition 1), and condition 1 of Theorem 1
is provides by condition 2 of Theorem 2. Having (6.2) proved, we provide condition 2 of Theorem 1. Condition
3 of this theorem is provided by (6.1) and uniform continuity of the density p. At last, condition 5 of Theorem
2 provides weak convergence of Xn to X in C(R
+,X), that allows one to apply Theorem 1 and Remark 5.
Before proving (6.2), let us make some auxiliary estimates. Denote
δ,n(Xn) = sup
v,w∈[s,t],|v−w|≥ 1
ρ(Xn(v), Xn(w))
|v − w|δ ,
n,A =
Xn(r) ∈ B(Xn(s), A(r − s)δ), r ∈
note that {Hs,tδ,n(Xn) < A} ⊂ D
n,A. Also denote α = maxn αn, β = maxk βk, δn =
supx |gn(x)|
, B1 =
maxn δn, B2(T ) =
Cθ(Cγ+α+β)
1+δθ−γ T
1+δθ−γ. For arbitrary A > cθ, T ∈ R+, consider the functionals φs,tn,A =
φs,tn 1IHs,t
(Xn)<A
, s ≤ t ≤ T .
Lemma 2. 1. E
n,A|Xn(s) = x
≤ B1 +B2(T )Aθ.
|Xn(s) = x
≤ 3B1(B1 +B2(T )Aθ) + 2(B1 +B2(T )Aθ)2.
3. Let p ∈
1, 2Cδ−2
Cδ+2θ
(recall that 1 < 2Cδ−2
Cδ+2θ
due to condition 7 of the Theorem). Then
x∈X,n∈N,s≤t≤T
)p |Xn(s) = x
< +∞.
Proof. Using condition 4 of the Theorem and then condition 6, we obtain, for t, s ∈ 1
Z+, the estimate
n,A|Xn(s) = x
φs,tn 1IDs,t
|Xn(s) = x
gn(x)
n(t−s)−1∑
x,A( kn )
pn,k(x, y)µn(dy) ≤
≤ δn +
Cγ + α+ β
n(t−s)−1∑
≤ δn +
Cγ + α+ β
n(t−s)−1∑
)γ−δθ
INVARIANCE PRINCIPLE FOR ADDITIVE FUNCTIONALS OF MARKOV CHAINS 15
that immediately proves the first statement of the Lemma. The second statement can be obtained from the
first one via the estimate similar to (3.10) with the use of the inequality
1IHs,t
(Xn)<A
≤ 1IHs,r
(Xn)<A1IHr,t
(Xn)<A
that holds true for arbitrary r ∈ (s, t).
Applying statement 2 and Hölder inequality we obtain
φs,tn
)p |Xn(s) = x
φs,tn
(Xn)∈[N−1,N)|Xn(s) = x
(φs,tn )
21IHT
(Xn)<N
|Xn(s) = x
P (HTδ,n(Xn) ≥ N − 1)
] 2−p
B3(T ) +B4(T )N
2 ·B5(T ) [(N − 1) ∨ 1]−
here and below Bi(T ), i = 3, 4, . . . denotes a constant, that can be expressed explicitly through T and the
constants introduced in the formulation of the Theorem, but an explicit expression is not needed in our
consideration. Since θp− 2−p
Cδ < −1 by the choice of p, this proves the statement 3. The lemma is proved.
Let us proceed with the proof of (6.2). Choose non-increasing Lipschitz function Ψ : R+ → [0, 1] such that
Ψ([0, 1]) = {1},Ψ([2,+∞)) = {0}, and set
Ψr(x, y) = Ψ(r
−1 · ρ(x, y)), r > 0, x, y ∈ X, Ψ0 ≡ 1.
Note that, for arbitrary r0 > 0, the function (r, x, y) 7→ Ψr(x, y) is uniformly continuous on [r0,+∞)× X2.
For fixed s ≤ t ≤ T,A ∈ R+ we decompose φs,tn as φs,tn = η
n,A + ζ
n,A, where
n,A =
A( kn−s)
Xn(s), Xn
We have that, on the set D
n,A, for k such that s ≤ kn < t,
Xn(0), Xn
A( kn−s)
Xn(s), Xn
hence {φs,tn = η
n,A} ⊃ D
n,A and
(6.3) {ζs,tn,A 6= 0} ⊂ Ω\D
n,A ⊂ {H
δ,n ≥ A}.
Let p be the same as in statement 3 of Lemma 2. Then it follows from (6.3) and inequality 0 ≤ ζs,tn,A ≤ φs,tn
(6.4) E
n,A|Xn(s) = x
(φs,tn )
p|Xn(s) = x
δ,n ≥ A|Xn(s) = x)
] p−1
p ≤ B6(T )A−δ
Similarly, one can write φs,t = η
A + ζ
A , where η
ΨA(r−s)δ(X(s), X(r))dφ
s,r ,
(6.5) E
A |X(s) = x
≤ B6(T )A−δ
We have ∣∣∣E
n,A|Xn(s) = x
A |X(s) = x
]∣∣∣ =
∣∣∣∣∣∣
gn(x)
]n(t−s)[−1∑
pk,n(x, y)ΨA( kn )
δ (x, y)µn(dy)−
∫ t−s
pr(x, y)ΨArδ(x, y)µ(dy) dr
∣∣∣∣∣∣
≤ δn +∆1n(x,A, s, t) + ∆2n(x,A, s, t) + ∆3n(x,A, s, t),
16 YURI N.KARTASHOV, ALEXEY M.KULIK
where ]z[≡ min{N ∈ Z, N ≥ z},
∆1n(x,A, s, t) =
∣∣∣∣∣∣
]n(t−s)[−1∑
[pk,n(x, y)− p k
(x, y)]Ψ
A( kn )
δ (x, y)µn(dy)
∣∣∣∣∣∣
∆2n(x,A, s, t) =
∣∣∣∣∣∣
]n(t−s)[−1∑
(x, y)Ψ
A( kn)
δ(x, y)µn(dy)−
∫ t−s
pr(x, y)ΨArδ(x, y)µn(dy) dr
∣∣∣∣∣∣
∆3n(x,A, s, t) =
∫ t−s
pr(x, y)ΨArδ (x, y)[µn(dy)− µ(dy)] dr
∣∣∣∣ .
Denote ∆in(A, T ) = supx∈X,s≤t≤T ∆
n(x,A, s, t), i = 1, 2, 3. Since Ψr(x, y) ∈ [0, 1] and {Ψr(x, y) 6= 0} ⊂ {y ∈
B(x, 2r)},
∆1n(A, T ) ≤
]nT [−1∑
[αn + βk]
x, 2A
(6.6) ≤ Cθ(2A)θ ·
]nT [−1∑
[αn + βk]
)δθ−γ
→ 0, n→ +∞
by Toeplitz theorem.
The function (r, x, y) 7→ pr(x, y)Ψr(x, y) is uniformly continuous over [r0,+∞)×X2 for any r0 > 0, therefore
an estimate analogous to (6.6) provides that
x∈X,s≤t≤T
∣∣∣∣∣∣
]n(t−s)[−1∑
k=[r0n]+1
(x, y)Ψ
A( kn)
δ (x, y)µn(dy)−
∫ t−s
pr(x, y)ΨArδ(x, y)µn(dy) dr
∣∣∣∣∣∣
(note that maxn µn(X) < +∞ since µn weakly converge to µ). The same arguments provide that
lim sup
∆2n(A, T ) ≤
≤ lim sup
[r0n]∑
 = B7(A, T )(r0)δθ−γ+1.
Since r0 > 0 is arbitrary, this implies that
(6.7) ∆2n(A, T ) → 0, n→ +∞.
At last, the weak convergence of µn to µ and the first part of condition 3 provide that, for every t,
In(A, t) ≡ sup
pt(x, y)ΨArδ(x, y)[µn(dy)− µ(dy)]
∣∣∣∣→ 0, n→ +∞.
Since In(A, t) ≤ Cγt−γ · Cθ(2Atδ)θ, the Lebesgue theorem of dominated convergence provides that
(6.8) ∆3n(A, T ) → 0, n→ +∞.
It follows from the estimates (6.4) – (6.8) that
lim sup
x∈X,s≤t≤T
∣∣f s,tn (x)− f s,t(x)
∣∣ ≤ 2B6(T )A−δ
p , A > cθ.
Taking A→ +∞ we obtain (6.2), that completes the proof. The theorem is proved.
In order to make our exposition complete, let us formulate a version of Theorem 2 for the recurrent case.
Theorem 3. Let conditions 1 – 4 of Theorem 2 hold true and γ < 1. Also let µn converge weakly to µ, and
Xn converge to X by distribution in C(R
+,X).
Then (Xn, ψn(Xn)) ⇒ (X,φ(X)) in a sense of convergence in distribution in C(R+,X)× C(T,R+).
INVARIANCE PRINCIPLE FOR ADDITIVE FUNCTIONALS OF MARKOV CHAINS 17
The proof, with slight changes, repeats the proof of Theorem 2, and is omitted. Note that, under conditions
of Theorem 3, the convergence of finite-dimensional distributions of φn can be provided with the use of the
technique, mentioned in the Introduction, that was proposed by I.I.Gikhman and is based on studying of limit
behavior of difference equations for characteristic functions of φs,tn (see for instance the proof of Theorem 3
[9]). In the transient case, treated in Theorem 2, this technique can not be applied since the uniform estimates,
analogous to (4.4) – (4.6), are not available in this case.
At last, let us give an example of application of Theorem 2. To shorten exposition we omit the proofs of
some technical details.
Example 4. Let X = Rd, d ≥ 2 and Xn, X be as in Example 1. Let K ⊂ Rd be a compact set, for which the
surface measure λK is well defined by equality
λK(·) ≡ w − lim
λd(· ∩Kε)
λd(Kε)
where w − lim means the limit in the sense of weak convergence of measures, λd is Lebesgue measure on Rd,
Kε ≡ {x|dist(x,K) ≤ ε}. Assume that the condition
(6.9) λd(Kε) ≥ const · εβ, ε > 0
holds with some β < 2. In particular, the set K can be smooth (or, more generally, Lipschitz) surface of
codimension 1 or fractal with its Haussdorf-Besikovich dimension greater then d− 2.
It not hard to verify that µ ≡ λK is W -measure (see [1], Chapter 8.1 for the terminology), and therefore
corresponds to some W -functional φ of the Wiener process X . This functional is naturally interpreted as the
local time of Wiener process at the set K, and can be written as φs,t =
λK(Xr) dr.
We consider the functionals φn(Xn) of the form
k∈[sn,tn)
1I{Xn( k
)∈K 1√
and apply Theorem 2 in order to prove convergence of the distributions in C(R+,Rd)× C(T,R+)
(6.10) (Xn, ψn(Xn)) ⇒ (X,φ(X))
(ψn are the broken lines corresponding to φn).
Condition 1 holds true due to Example 1, condition 2 is provided by condition (6.9) (by this condition,
supx gn(x) ≤ const · n
2 ). Condition 3 holds with pt(x, y) = (2πt)
2 exp{− 1
‖y− x‖2
} and γ = d
. Condition
(6.9) implies condition 6 with θ = d− β.
We assume that the random walk Sn is either aperiodic on some lattice hZ
d or is strongly non-lattice (i.e.,
Sn0 has bounded distribution density for some n0). Under this assumption, condition 4 holds with αn ≡ 0,
ν = λd and νn equal to counting measures on
d in lattice case or λd in strongly non-lattice case.
It remains to provide conditions 5, 7. We have γ−1
= d−2
2(d−β) <
. Choose some δ ∈
and consider
α > 0 such that
> δ and α > 2θ + 2. Suppose that
(6.11) E‖ξk‖αRd < +∞.
Then applying Burkholder inequality we obtain that
(6.12) E‖Xn(t)−Xn(s)‖αRd ≤ const · |t− s|
2 , |t− s| ≥ 1√
, x ∈ Rd.
Repeating the standard proof of the Kolmogorov’s theorem on existence of continuous modification (see, for
instance [20], p. 44,45), one can deduce from (6.12) that, for ς < α, ϑ <
t,s∈[0,T ],|t−s|≥ 1
‖Xn(t)−Xn(s)‖Rd
|t− s|ϑ
< +∞.
18 YURI N.KARTASHOV, ALEXEY M.KULIK
Finally, choosing ϑ = δ, ς > 2θ + 2 we obtain that conditions 5,7 hold with Cθ = ς . Applying Theorem 2,
we obtain weak convergence (6.10) under additional moment condition (6.11). One can remove this condition
using the ”cutting” procedure, described in the proof of the Proposition 3.
Let us remark that for the lattice random walks the result, exposed in Example 4, was obtained in [5] by a
technique, essentially different from the one proposed here. Convergence (6.10) in continuous case, as far as it
is known to authors, is a new result.
References
[1] Dynkin E.B. Markov processes, M.: Fizmatgiz, 1963 (in Russian).
[2] Skorokhod A.V., Slobodeniuk M.P. Limit theorems for random walks, Kiev: Naukova dumka, 1970 (in Russian).
[3] Borodin A.N., Ibragimov I.A. Limit theorems for the functionals of random walks, Proc. of the Mathematical Institute of R.
Acad. Sci, vol. 195. St.-P.: Nauka, 1994 (in Russian).
[4] Revesz P. Random walk in random and nonrandom environments, World Sci. Publ. Co., Inc., Teaneck, NJ, 1990.
[5] Bass R.F., Khoshnevisan D. Local times on curves and uniform invariance principles, Prob. Theory Rel. Fields 92, 1992, p.
465 – 492.
[6] Cherny A.S., Shiryaev A.N., Yor M. Limit behavior of the ”horizontal-vertical” random walk and some extensions of the
Donsker-Prokhorov invariance principle. Probability theory and its applications, vol. 47, 3, 2002, p. 498 – 517.
[7] Gikhman I.I. Some limit theorems for the number of intersections of a boundary of a given domain by a random function,
Sci. notes of Kiev Un-ty, 1957, vol. 16, 10, p. 149 – 164 (in Ukrainian).
[8] Gikhman I.I. Asymptotic distributions for the number of intersections of a boundary of a domain by a random function,
Visnyk of Kiev Un-ty, serie astron., athem and mech., 1958, v. 1, 1, p. 25 – 46 (in Ukrainian).
[9] Portenko N.I. Integral equations and limit theorems for additive functionals of Markov processes, robability theory and its
applications, 1967, v. 12, 3, p. 551 – 558 (in Russian).
[10] Portenko N.I. The development of I.I.Gikhman’s idea concerning the methods for investigating local behavior of diffusion
processes and their weakly convergent sequences, Probab. Theory and Math. Stat., 1994, 50, p. 7 – 22.
[11] Kulik A.M. Markov Approximation of stable processes by random walks, vol.12(28) 2006, .1-2, p. 87 – 93.
[12] Feller W. An introduction to probability theory and its applications, Vol II, M.: Mir, 1984 (Russian, translated from
W.Feller, An introduction to probability theory and its applications, John Wiley & Sons, New York, 1971).
[13] Skorokhod A.V. Studies in theory of stochastic processes, Kiev, Kiev Univ-ty publishing house, 1961 (in Russian).
[14] Jacod J., Shiryaev A. Limit theorems for stochastic processes,Springer, Berlin, 1987.
[15] Kurtz T.G., Protter Ph. Weak limit theorems for stochastic integrals and SDE’s, Annals of Probability, 1991, vol. 19, 3, p.
1035 – 1070.
[16] Yamada T., Watanabe S. On the uniqueness of solutions of stochastic differential equations, J. Math. Kyoto Univ., 1971,
vol. 11, p. 156 – 167.
[17] Androshchuk T.O., Kulik A.M. Limit theorems for oscillatory functionals of a Markov process. Theory of stochastic proc-
cesses, vol. 11(27), p. 3 – 13.
[18] Ibragimov I.A., Linnik Yu.V. Linnik, Independent and stationary related variables, M.: Nauka, 1965 (in Russian).
[19] Kulik A.M. Difference approximation for local times of multidimensional diffusions, arXiv:math/0702175
[20] Skorokhod A.V. Lections on theory of stochastic processes, Kyiv: Lybid, 1990 (in Ukrainian).
E-mail address: kulik@imath.kiev.ua
http://arxiv.org/abs/math/0702175
	1. Introduction
	2. Markov approximation.
	3. Main theorem
	4. The local time of a random walk at a point.
	5. Difference approximations of diffusion processes.
	6. Invariance principle for additive functionals of Markov chains
	References
ABSTRACT
  We consider a sequence of additive functionals {\phi_n}, set on a sequence of
Markov chains {X_n} that weakly converges to a Markov process X. We give
sufficient condition for such a sequence to converge in distribution,
formulated in terms of the characteristics of the additive functionals, and
related to the Dynkin's theorem on the convergence of W-functionals. As an
application of the main theorem, the general sufficient condition for
convergence of additive functionals in terms of transition probabilities of the
chains X_n is proved.

<|endoftext|><|startoftext|>
Introduction
Let H,K be real separable Hilbert spaces with norms | · |H and | · |K . Let W
be a cylindrical Wiener process in K defined on a probability space (Ω,F ,P)
and let {Ft}t∈[0,T ] denote its natural augmented filtration. Let L
2(K,H) be
the Hilbert space of Hilbert-Schmidt operators from K to H.
http://arxiv.org/abs/0704.0509v1
We are interested in solving the following backward stochastic differential
equation
dYt = −AYtdt− f(t, Yt, Zt)dt+ ZtdWt, 0 ≤ t ≤ T, YT = ξ (1)
where ξ is a random variable with values in H, f(t, Yt, Zt) = f0(t, Yt) +
f1(t, Yt, Zt) and f0, f1 are given functions, and the operator A is an un-
bounded operator with domain D(A) contained in H. The unknowns are
the processes {Yt}t∈[0,T ] and {Zt}t∈[0,T ], which are required to be adapted
with respect to the filtration of the Wiener process and take values in H,
L2(K,H) respectively.
In finite dimensional framework such type of equations has been solved
by Pardoux and Peng [12] in the nonlinear case. They proved an existence
and uniqueness result for the solution of the equation (1) when A = 0, the
coefficient f(t, y, z) is Lipschitz continuous in both variables y and z, and
the data ξ and the process {f(t, 0, 0)}t∈[0,T ] are square integrable. Since this
first result, many papers were devoted to existence and uniqueness results
under weaker assumptions. In finite dimension, when A = 0, the Lipschitz
condition on the coefficient f with respect to the variable y is replaced by a
monotonicity assumption; moreover, more general growth conditions in the
variable y are formulated. Let us mention the contribution of Briand and
Carmona [1], for a study of polynomial growth in Lp with p > 2, and the
work of Pardoux [11] for an arbitrary growth. In [13] Pardoux and Rascanu
deal with a BSDE involving the subdifferential of a convex function; in
particular, one coefficient is not everywhere defined for y in Rk.
In other works the existence of the solution is proved when the data,
ξ and the process {f(t, 0, 0)}t∈[0,T ], are in L
p for p ∈ (1, 2). El Karoui,
Peng and Quenez [4] treat the case when f is Lipschitz continuous; in [2]
this result is generalized to the case of a monotone coefficient f (both for
equations on a fixed and on a random time interval) and is studied even the
case p = 1.
In the infinite-dimensional framework Hu and Peng [6], and Oksendal
and Zhang [10] give an existence and uniqueness result for the equation with
an operator A, infinitesimal generator of a strongly continous semigroup and
the coefficient f Lipschitz in y and z. Pardoux and Rascanu [14] replace the
operator A with the subdifferential of a convex function and assume that f is
dissipative, everywhere defined and continuous with respect to y, Lipschitz
with respect to z and with linear growth in y and z.
Special results deal with stochastic backward partial differential equa-
tions (BSPDEs): we recall in particular the works of Ma and Yong [8] and
[9]. Earlier, Peng [16] studied a backward stochastic partial differential
equation and regarded the classical Hamilton-Jacobi-Bellman equation of
optimal stochastic control as special case of this problem.
Our work extends these results in a special direction. We consider an
operator A which is the generator of an analytic contraction semigroup on
H and a coefficient f(t, y, z) of the form f0(t, y)+ f1(t, y, z). The coefficient
f1(t, y, z) is assumed to be bounded and Lipschitz with respect to y and z.
The term f0(t, y) is defined for y only taking values in a suitable subspaceHα
of H and it satisfies the following growth condition for some 1 < γ < 1/α,
S ≥ 0, P-a.s.
|f0(t, y)|H ≤ S(1 + ||y||
) ∀t ∈ [0, T ], ∀y ∈ Hα.
Following [6], we understand the equation (1) in the following integral
e(s−t)A[f0(s, Ys) + f1(s, Ys, Zs)]ds+
e(s−t)AZsdWs = e
(T−t)Aξ,
requiring, in particular, that Y takes values in Hα. This requires generally
that the final condition also takes values in the smaller spaceHα. We take as
Hα a real interpolation space which belongs to the class Jα between H and
the domain of an operator A (see Section 2). Moreover f0(t, ·) is assumed
to be locally Lipschitz from Hα into H and dissipative in H. We prove
(Theorem 5) that if ξ takes its values in the closure of D(A) in Hα and is
such that ||ξ||Hα is essentially bounded, then equation (2) has a unique mild
solution, i.e. there exists a unique pair of progressively measurable processes
Y : Ω×[0, T ] → Hα, Z : Ω×[0, T ] → L
2(K;H), satisfying P-a.s. equality (2)
for every t in [0, T ] and such that E supt∈[0,T ] ||Yt||
||Zt||
L2(K,H)
This result extends former results concerning the deterministic case to
the stochastic framework: see [7], where previous works of Fujita - Kato [5],
Pazy [15] and others are collected. In these papers similar assumptions are
made on the coefficients f0, f1 and on the operator A.
The plan of the paper is as follows. In Section 2 some notations and
definitions are fixed. In Section 3 existence and uniqueness of the solution
of a simplified equation are proved, where f1 is a bounded progressively
measurable process which does not depend on y and z. In Section 4, applying
the previous result, a fixed point argument is used in order to prove our
main result on existence and uniqueness of a mild solution of (2). Section 5
is devoted to applications.
2 Notations and setting
The letters K and H will always denote two real separable Hilbert spaces.
Scalar product is denoted by 〈·, ·〉; L2(K;H) is the separable Hilbert space of
Hilbert-Schmidt operators from K to H endowed with the Hilbert-Schmidt
norm. W = {Wt}t∈[0,T ] is a cylindrical Wiener process with values in K,
defined on a complete probability space (Ω,F ,P). {Ft}t∈[0,T ] is the natural
filtration of W , augmented with the family of P-null sets of F .
Next we define several classes of stochastic processes with values in a
Banach space X.
• L2(Ω× [0, T ];X) denotes the space of measurable X-valued processes
Y such that
|Yτ |
is finite, identified up to modification.
• L2(Ω;C([0, T ];X)) denotes the space of continuousX-valued processes
Y such that
E sup
τ∈[0,T ]
|Yτ |
is finite, identified up to indistinguisha-
bility.
• Cα([0, T ];X) denotes the space of α-Hölderian functions on [0, T ] with
values in X such that [f ]α = sup
0≤x<y≤T
|f(x)− f(y)|
(y − x)α
Now we need to recall several preliminaries on semigroup and interpo-
lation spaces. We refer the reader to [7] for the proofs and other related
results.
A linear operator A in a Banach space X, with domain D(A) ⊂ X, is
called sectorial if there are constants ω ∈ R, θ ∈ (π/2, π), M > 0 such that
(i) ρ(A) ⊇ Sθ,ω = {λ ∈ C : λ 6= ω, |arg(λ− ω)| < θ},
(ii) ||(λI −A)−1||L(X) ≤
|λ−ω|
∀λ ∈ Sθ,ω
where ρ(A) is the resolvent set of A. For every t > 0, (3) allows us to define
a linear bounded operator etA in X, by means of the Dunford integral
etA =
ω+γr,η
etλ(λI −A)−1dλ, t > 0, (4)
where, r > 0, η ∈ (π/2, π) and γr,η is the curve {λ ∈ C : |argλ| = η, |λ| ≥
r} ∪ {λ ∈ C : |argλ| ≤ η, |λ| = r}, oriented counterclockwise. We also set
e0Ax = x,∀x ∈ X. Since the function λ 7→ etλR(λ,A) is holomorphic in
Sθ,ω, the definition of e
tA is independent of the choice of r and η. If A is
sectorial, the function [0,+∞) → L(X), t 7→ etA, with etA defined by (4)
is called analytic semigroup generated by A in X. We note that for every
x ∈ X the function t 7→ etAx is analytic (and hence continuous) for t > 0.
etA is a strongly continuous semigroup if and only if D(A) is dense in X; in
particular this holds if X is a reflexive space.
We need to introduce suitable classes of subspaces of X.
Definition 1. Let (α, p) be two numbers such that 0 < α < 1, 1 ≤ p ≤ ∞
or (α, p) = (1,∞). Then we denote with DA(α, p) the space
DA(α, p) = {x ∈ X : t 7→ v(t) = ||t
1−α−1/pAetAx|| ∈ Lp(0, 1)}
where ||x||DA(α,p) = ||x||X + [x]α = ||x||X + ||v||Lp(0,1).
(We set as usual 1/∞ = 0).
We recall here some estimates for the function t 7→ etA when t → 0,
which we will use in the sequel. For convenience, in the next proposition we
set DA(0, p) = X, p ∈ [1,∞].
Proposition 1. Let (α, p), (β, p) ∈ (0, 1)× [1,+∞]∪{(1,∞)}, α ≤ β. Then
there exists C = C(p;α, β) such that
||t−α+βetA||L(DA(α,p),DA(β,p)) ≤ C, 0 < t ≤ 1.
Definition 2. Let 0 ≤ α ≤ 1 and let D,X be Banach spaces, D ⊂ X. A Ba-
nach space Y such that D ⊂ Y ⊂ X is said to belong to the class Jα between
X and D if there is a constant C such that ||x||Y ≤ C||x||
X ||x||
D, ∀x ∈
D. In this case we write Y ∈ Jα(X,D).
Now we give the definition of solution to the BSDE:
e(s−t)A[f0(s, Ys) + f1(s, Ys, Zs)]ds+
e(s−t)AZsdWs = e
(T−t)Aξ,
Definition 3. A pair of progressively measurable processes (Y,Z) is called
mild solution of (5) if it belongs to the space L2(Ω;C([0, T ];Hα))×L
[0, T ];L2(K,H)) and P-a.s.solves the integral equation (5) on the interval
[0, T ].
We finally state a lemma needed in the sequel. It is a generalization of
the well known Gronwall’s lemma. Its proof is given in the Appendix.
Lemma 1. Assume a, b, α, β are nonnegative constants, with α < 1, β > 0
and 0 < T < ∞. For any nonnegative process U ∈ L1(Ω× [0, T ]), satisfying
P-a.s. Ut ≤ a(T − t)
(s− t)β−1EFtUsds for almost every t ∈ [0, T ], it
holds P-a.s. Ut ≤ aM(T − t)
−α, for almost every t ∈ [0, T ]. M is a constant
depending only on b, α, β, T .
3 A simplified equation
As a preparation for the study of (2), in this section we consider the following
simplified version of that equation:
e(s−t)A[f0(s, Ys)ds+ f1(s)]ds +
e(s−t)AZsdWs = e
(T−t)Aξ, (6)
for all t ∈ [0, T ].
We suppose that the following assumptions hold.
Hypothesis 2.
1. A : D(A) ⊂ H → H is a sectorial operator. We also assume that A is
dissipative, i.e. it satisfies < Ay, y >≤ 0,∀y ∈ D(A);
2. for some 0 < α < 1 there exists a Banach space Hα continuously embed-
ded in H and such that
(i) DA(α, 1) ⊂ Hα ⊂ DA(α,∞);
(ii) the part of A in Hα is sectorial in Hα.
3. the final condition ξ is an FT -measurable random variable defined on
Ω with values in the closure of D(A) with respect to Hα-norm. We
denote this set D(A)
. Moreover ξ belongs to L∞(Ω,FT ,P;Hα);
4. f0 : Ω× [0, T ]×Hα → H satisfies:
i) {f0(t, y)}t∈[0,T ] is progressively measurable ∀y ∈ Hα;
ii) there exist constants S > 0, 1 < γ < 1/α such that P-a.s.
|f0(t, y)|H ≤ S(1 + ||y||
) t ∈ [0, T ], y ∈ Hα;
iii) for every R > 0 there is LR > 0 such that P-a.s.
|f0(t, y1)− f0(t, y2)|H ≤ LR||y1 − y2||Hα
for t ∈ [0, T ] and yi ∈ Hα with ||yi||Hα ≤ R;
iv) there exists a number µ ∈ R such that P-a.s., ∀t ∈ [0, T ], y1, y2 ∈
< f0(t, y1)− f0(t, y2), y1 − y2 >H≤ µ|y1 − y2|
H ; (7)
5. f1 : Ω × [0, T ] → H is progressively measurable and for some constant
C > 0 it satisfies P-a.s. |f1(t)|H ≤ C, for t ∈ [0, T ].
Remark 1. We note that the pair (Y,Z) solves the BSDE (6) with final con-
dition ξ and drift f = f0+f1 if and only if the pair (Ȳ , Z̄) := (e
λtYt, e
λtZt) is
a solution of the same equation with final condition eλT ξ and drift f ′(t, y) :=
0(t, y) + f
1(t) where f
0(t, y) = e
λt(f0(t, e
−λty)− λy), f
1(t) = e
λtf1(t). If we
choose µ = λ, then f
0 satisfies the same assumption as f0, but with (7) re-
placed by < f0(t, y1)− f0(t, y2), y1 − y2 >H≤ 0. If this last condition holds,
then f0 is called dissipative. Hence, without loss of generality, we shall
assume until the end that f0 is dissipative, or equivalently that µ = 0 in (7).
3.1 A priori estimates
We prove a basic estimate for the solution in the norm of H.
Proposition 2. Suppose that Hypothesis 2 holds; if (Y,Z) is a mild solution
of (6) on the interval [a, T ], 0 ≤ a ≤ T , then there exists a constant C1,
which depends only on ||ξ||L∞(Ω;H) and on the constants S of 4.ii) and C of
5. such that P-a.s. supa≤t≤T ||Yt||H ≤ C1. In particular the constant C1 is
independent of a.
Proof. Let the pair (Y,Z) ∈ L2(Ω, C([a, T ];Hα)× L
2(Ω × [a, T ];L2(K;H))
satisfy (6). Let us introduce the operators Jn = n(nI − A)
−1, n > 0. We
note that the operators AJn are the Yosida approximations of A and they
are bounded. Moreover |Jnx − x| → 0 as n → ∞, for every x ∈ H. We
set Y nt = JnYt, Z
t = JnZt. It is readily verified that Y
n admits the Itô
differential
dY nt = −AY
t dt− Jnf(t, Yt)dt− Jnf1(t)dt+ Z
t dWt, and Y
T = Jnξ.
Applying the Ito formula to |Y nt |
H , using the dissipativity of A, we obtain
|Y nt |
||Zns ||
L2(K;H)ds ≤ |Jnξ|
H + 2
< Jnf0(s, Ys), Y
s >H ds+
< Jnf1(s), Y
s >H ds− 2
< Y ns , Z
s dWs >H .
We note that
< Jnf0(s, Ys) + Jnf1(s), Y
s >H ds →
< f0(s, Ys) +
f1(s), Ys >H ds by dominated convergence, as n → ∞. Moreover by
the dominated convergence theorem we have
||(Zns )
∗Y ns − Z
sYs||
Kds →
0 P-a.s. and it follows that
< Y ns , Z
s dWs >H→
< Ys, ZsdWs >H
in probability. If we let n → ∞ in (8) we obtain
||Zs||
L2(K;H)ds ≤ |ξ|
H + 2
< f0(s, Ys) + f1(s), Ys >H ds
< Ys, ZsdWs >H .
Recalling (7), that we assume to hold with µ = 0, it follows that
||Zs||
L2(K,H) ≤
≤ |ξ|2H + 2
< f0(s, 0), Ys >H +2
< f1(s), Ys >H ds+
< Ys, ZsdWs >H
≤ |ξ|2H +
|f(s, 0)|2Hds+
|f1(s)|
Hds + 2
< Ys, ZsdWs >H .
Now, since sup0≤t≤T |f(t, 0)|
H ≤ S
2 and since the stochastic integral
< Ys, ZsdWs >H , t ∈ [a, T ] is a martingale, if we take the conditional
expectation given Ft we have
H ≤ E
Ft |ξ|2H + 2E
|f(s, 0)|2Hds+ E
|f1(s)|
≤ |ξ|2L∞(Ω,H) + (S
2 + C2)T + 2
Ft |Ys|
Since Y belongs to L2(Ω;C([a, T ];Hα)) and, consequently, ||Y ||
L1(Ω× [0, T ]), we can apply Lemma 1 to |Y |2H and conclude that
H ≤ (|ξ|
L∞(Ω,H)
+ [S2 + C2]T )(1 + 2Te2T ).
Now we will show that the result of Proposition 2, together with the
growth condition satisfied by f0, yields an a priori estimate on the solution
in the Hα-norm.
Let 0 < α < 1 and let γ > 1 be given by 4.ii). We fix θ = αγ and consider
the Banach space DA(θ,∞) introduced in Definition 1. It is easy to check
(see [7]) that, if we take θ ∈ (0, 1), θ > α, then Hα contains DA(θ,∞) and
belongs to the class Jα/θ between DA(θ,∞) and H, hence the following
inequality is satisfied:
|x|Hα ≤ c|x|
DA(θ,∞)
H , x ∈ DA(θ,∞). (9)
Proposition 3. Suppose that Hypothesis 2 is satisfied. Let (Y,Z) be a mild
solution of (6) in [a, T ], a ≥ 0 and assume that there exists two constants
R > 0 and K > 0, possibly depending on a, such that, P-a.s.,
t∈[a,T ]
||Yt||Hα ≤ R, sup
t∈[a,T ]
|Yt|H ≤ K. (10)
Then the following inequality holds P-a.s.:
|Yt|L∞(Ω,DA(θ,∞)) ≤ C2
(T − t)θ−α
, a ≤ t < T (11)
with C2 depending on the operator A, ||ξ||L∞(Ω,Hα), θ, α, K, C of 5. and S
of 4.ii) of Hypothesis 2.
Proof. Taking the conditional expectation given Ft in equation (6) we find
Yt = E
e(T−t)Aξ +
e(s−t)A[f0(s, Ys) + f1(s)]ds
, a ≤ t ≤ T.
Consequently, we have
||Yt||DA(θ,∞) ≤ E
Ft ||e(T−t)Aξ||DA(θ,∞)
||e(s−t)A[f0(s, Ys) + f1(s)]||DA(θ,∞)ds, a ≤ t ≤ T .
Since Hα ⊂ DA(α,∞), we have
Ft ||e(T−t)Aξ||DA(θ,∞) ≤
≤ EFt ||e(T−t)A||L(DA(α,∞),DA(θ,∞))||ξ||L∞(Ω,DA(α,∞))
(T − t)θ−α
||ξ||L∞(Ω,Hα),
with C0 = C0(α, θ,∞), where in the last inequality we use Proposition 1.
Moreover
||e(s−t)A[f0(s, Ys) + f1(s)]||DA(θ,∞)ds ≤
≤ EFt
||e(s−t)A||L(H,DA(θ,∞))|f0(s, Ys) + f1(s)|Hds ≤
≤ EFt
(s− t)θ
[|f0(s, Ys)|H + |f1(s)|H ]ds
≤ EFt
(s− t)θ
[S(1 + ||Ys||
) + C]ds.
In the inequality we used Hypotheses 4.ii) and 5. and Proposition 1. Re-
calling (9), we conclude that the last term is dominated by
(s − t)θ
S(1 + c|Ys|
γ(1−α)/θ
H ||Ys||
DA(θ,∞)
) + C
= EFt
(s − t)θ
S(1 + c|Ys|
γ(1−α)/θ
H ||Ys||DA(θ,∞)) + C
by choosing θ = αγ. By the second inequality in (10) this can be estimated
(s− t)θ
S(1 + cKγ(1−α)/θEFt ||Ys||DA(θ,∞) + C)ds
(s− t)θ
(C + S)ds +
(s− t)θ
ScKγ(1−α)/θEFt ||Ys||DA(θ,∞)ds.
Hence by (13) and (14) it follows
||Yt||DA(θ,∞) ≤
(T − t)θ−α
||ξ||L∞(Ω,Hα) +
(s− t)θ
(C + S)ds
(s− t)θ
ScKγ(1−α)/θEFt ||Ys||DA(θ,∞)ds,
and (11) follows from Lemma 1. In order to justify the application of Lemma
1, we need to prove that ||Y ||DA(θ,∞) belongs to L
1(Ω × [a, T ]). This also
follows from(13) and (14) since, for some constant K1,
||Yt||DA(θ,∞) ≤
(T − t)θ−α
||ξ||L∞(Ω,Hα) + E
Ft [ sup
s∈[a,T ]
(1 + ||Ys||
(s − t)θ
(T − t)θ−α
||ξ||L∞(Ω,Hα) + (1 +R
(s − t)θ
3.2 Local existence and uniqueness
We prove that, under Hypothesis 2, there exists a unique solution of (6) on
an interval [T − δ, T ] with δ sufficiently small.
To treat the ordinary integral in the left hand side of (6), we need the
following result, whose proof can be found in [7], Proposition 4.2.1 and
Lemma 7.1.1.
Lemma 3. Let φ ∈ L∞((a, T );H), 0 < a < T and set
v(t) =
e(s−t)Aφ(s)ds, a ≤ t ≤ T.
If 0 < α < 1, then v ∈ C1−α([a, T ];DA(α, 1)) and there is G0 > 0, not
depending on a, such that
||v||C1−α([a,T ];DA(α,1)) ≤ G0||φ||L∞((a,T );H).
Since DA(α, 1) ⊂ Hα, we also have v ∈ C
1−α([a, T ];Hα) and there is G > 0,
not depending on a, such that
||v||C1−α([a,T ];Hα) ≤ G||φ||L∞((a,T );H).
Theorem 4. Let us assume that Hypothesis 2 holds, except possibly 4.iv).
Then there exists δ > 0 such that the equation (6) has a unique local mild
solution (Y,Z) ∈ L2(Ω;C([T − δ, T ];Hα))× L
2(Ω× [T − δ, T ];L2(K;H)).
Remark 2. The dissipativity condition 4.iv) only plays a role in obtaining
the a priori estimate in H (Proposition 2) and consequently global existence,
as we will see later.
Proof. Let Mα := sup0≤t≤T ||e
tA||L(Hα). We fix a positive number R such
that R ≥ 2Mα||ξ||L∞(Ω;Hα). This implies that sup0≤t≤T ||e
tAξ||Hα ≤ R/2
P-a.s. Moreover, let LR be such that
|f0(t, y1)− f0(t, y2)|H ≤ LR||y1 − y2||Hα 0 ≤ t ≤ T, ||yi||Hα ≤ R
We recall that the space L2(Ω;C([T − δ, T ];Hα)) is a Banach space en-
dowed with the norm Y →
E supt∈[T−δ,T ] ||Yt||
. We define
K = {Y ∈ L2(Ω;C([T − δ, T ],Hα)) : sup
t∈[T−δ,T ]
||Yt||Hα ≤ R, a.s.}.
It easy to check that K is a closed subset of L2(Ω;C([T − δ, T ],Hα)),
hence a complete metric space (with the inherited metrics). We look for a
local mild solution (Y,Z) in the space K. We define a nonlinear operator
Γ : K → K as follows: given U ∈ K, Y = Γ(U) is the first component of the
mild solution (Y,Z) of the equation
e(s−t)A[f0(s, Us)ds+ f1(s)]ds+
e(s−t)AZsdWs = e
(T−t)Aξ (15)
for t ∈ [T − δ, T ]. Since U ∈ K we have P-a.s.
|f0(t, Ut) + f1(t)|H ≤ S(1 + ||Ut||
) + C ≤ S(1 +Rγ) + C, (16)
for all t in [T − δ, T ]. Hence f0(·, U·)+ f1(·) belongs to L
2(Ω× [T − δ, T ];H)
and, by a result of Hu and Peng [6], there exists a unique pair (Y,Z) ∈
L2(Ω× [T − δ, T ];H)× L2(Ω× [T − δ, T ];L2(K;H)) satisfying (15). More-
over, by taking the conditional expectation given Ft, Y has the following
representation
Yt = E
e(T−t)Aξ +
e(s−t)A[f0(s, Us) + f1(s)]ds
We will show that Γ is a contraction for the norm of L2(Ω, C([T − δ, T ];Hα)
and maps K into itself, if δ is sufficiently small; clearly, its unique fixed point
is the required solution of the BSDE.
We first check the contraction property. Let U1, U2 ∈ K. Then
Γ(U1)t − Γ(U
2)t = Y
t − Y
t = E
e(s−t)A(f0(s, U
s )− f0(s, U
Let v(t) =
e(s−t)A
f0(s, U
s )− f0(s, U
ds. Then, noting that v(T ) = 0
and recalling Lemma 3, for t ∈ [T − δ, T ]
||Y 1t − Y
t ||Hα =
= ||EFtv(t)||Hα ≤ E
Ft ||v(t)||Hα
≤ δ1−αEFt ||v||C(1−α)([T−δ,T ],Hα)
≤ Gδ(1−α)EFt ||f0(·, U
· )− f0(·, U
· )||L∞([T−δ,T ],H)
≤ Gδ(1−α)LRE
Ft sup
t∈[T−δ,T ]
||U1t − U
t ||Hα =: Mt,
where {Mt, t ∈ [T − δ, T ]} is a martingale. Hence, by Doob’s inequality
E sup
t∈[T−δ,T ]
||Y 1t − Y
≤ E sup
t∈[T−δ,T ]
2 ≤ 2E|MT |
= 2G2L2Rδ
2(1−α)
E sup
t∈[T−δ,T ]
||U1t − U
If δ ≤ δ0 = 2GLR
(1−α) , then Γ is a contraction with constant 1/2.
Next we check that Γ mapsK into itself. For each U ∈ K and t ∈ [T−δ, T ]
with δ ≤ δ0 we have
t∈[T−δ,T ]
||Γ(U)t||Hα = sup
t∈[T−δ,T ]
||Yt||Hα ≤ sup
t∈[T−δ,T ]
Ft ||e(T−t)Aξ||Hα+
+ sup
t∈[T−δ,T ]
Ft ||
e(s−t)A[f0(s, Us) + f1(s)]ds||Hα
≤ R/2 + sup
t∈[T−δ,T ]
||e(s−t)A[f0(s, Us) + f1(s)]||Hαds
≤ R/2 + sup
t∈[T−δ,T ]
||e(s−t)A[f0(s, Us) + f1(s)]||DA(α,1)ds,
where in the last inequality we have used the fact that DA(α, 1) ⊂ Hα. Now,
by Proposition 1, and from 4.ii) and 5., it follows that
||e(s−t)A[f0(s, Us) + f1(s)]||DA(α,1) ≤
≤ ||e(s−t)A||L(H,DA(α,1))|f0(s, Us) + f1(s)|H
(s− t)α
[S(1 + ||Us||
) + C].
Then, since U ∈ K, we arrive at
t∈[T−δ,T ]
||Γ(U)t||Hα ≤
≤ R/2 + sup
t∈[T−δ,T ]
(s− t)α
[S(1 + ||Us||
) + C]ds
≤ R/2 + sup
t∈[T−δ,T ]
(s− t)α
[S(1 +Rγ) + C]ds
≤ R/2 + CαS
[(1 +Rγ) + C]
δ1−α,
where Cα depends on A, α. Hence, if δ ≤ δ0 is such that CαS
[(1+Rγ )+C]
is less or equal to R/2, then sup
t∈[T−δ,T ]
||Γ(U)t||Hα ≤ R. Due to Lemma 3, P-
a.s. the function t 7→ Yt−E
Fte(T−t)Aξ belongs to C[T−δ, T ];Hα); moreover,
the map t 7→ EFte(T−t)Aξ belongs to C[T − δ, T ];Hα), since ξ is a random
variable taking values in D(A)
. Therefore, P-a.s. Y· ∈ C([T − δ, T ];Hα)
and Γ maps K into itself and has a unique fixed point in K.
Remark 3. By Lemma 3, using properties of analytic semigroups, it can
be proved that for every fixed ω the range of the map Γ is contained in
C1−β([T − δ, T − ǫ];DA(β, 1)) for every ǫ ∈ (0, δ), β ∈ [0, 1].
3.3 Global existence
Now we are able to prove a global existence theorem for the solution of the
equation (6), using all the results presented above.
Theorem 5. If Hypothesis 2 is satisfied, the equation (6) has a unique mild
solution (Y,Z) ∈ L2(Ω;C([0, T ],Hα))× L
2(Ω× [0, T ]);L2(K;H)).
Proof. By Theorem 4 equation (6) has a unique mild solution (Y 1, Z1) ∈
L2(Ω;C([T − δ1, T ],Hα)) × L
2(Ω × [T − δ1, T ]);L
2(K;H)) on the interval
[T − δ1, T ], for some δ1 > 0. By Proposition 2 we know that there exists a
constant C1 such that P-a.s.
|YT−δ1 |H ≤ C1. (17)
We recall that the constant C1 depends only on |ξ|L∞(Ω;H) and on the con-
stants S of 4.ii) and C of 5. and is independent of δ1. Moreover, by Propo-
sition 3, there exists a constant C2 such that P-a.s.
||YT−δ1 ||L∞(Ω,DA(θ,∞)) ≤ C2
δθ−α1
, (18)
with C2 depending on the operator A, ||ξ||L∞(Ω,Hα), θ, α, C1. This implies
that YT−δ1 belongs to L
∞(Ω;Hα) and it can be taken as final value for the
problem
∫ T−δ1
e(s−t)A[f0(s, Ys)ds+ f1(s)]ds +
∫ T−δ1
e(s−t)AZsdWs =
= e(T−δ1−t)AYT−δ1
on an interval [T − δ1 − δ2, T − δ1], for some δ2 > 0. As in the proof of
Theorem 4, we fix a positive number R2 such that
R2 = 2Mα
≥ 2Mα||YT−δ1 ||L∞(Ω,DA(θ,∞)).
By Theorem 4 there exists a pair of progressively measurable processes
(Y 2, Z2) in L2(Ω;C([T − δ1 − δ2, T − δ1];Hα)) × L
2(Ω × [T − δ1 − δ2, T −
δ1];L
2(K,H)) which solves (19) on the interval [T − δ1 − δ2, T − δ1] where
δ2 depends on the operator A, α, R2. We note that the continuity in T − δ1
of Y 2 follows from the fact that YT−δ1 takes values in DA(α, 1) (see Remark
3), so that YT−δ1 takes values in D(A)
. Now, the process Yt defined by
Y 1t on the interval [T − δ1, T ] and by Y
t on [T − δ1 − δ2, T − δ1] belongs
to L2(Ω;C([T − δ1 − δ2, T ];Hα)) and it easy to see that it satisfies (6) in
the whole interval [T − δ1 − δ2, T ]. Consequently, by Proposition 2, P-a.s.,
|YT−δ1−δ2 |H ≤ C1 with C1 the constant in (17), and by (18)
||YT−δ1−δ2 ||L∞(Ω,DA(θ,∞)) ≤
(δ1 + δ2)θ−α
, (20)
where C2 is the same constant as in (18). Again, YT−δ1−δ2 can be taken as
initial value for problem
∫ T−δ1−δ2
e(s−t)A[f0(s, Ys)ds+ f1(s)]ds +
∫ T−δ1−δ2
e(s−t)AZsdWs =
e(T−δ1−δ2−t)AYT−δ1−δ2
on the interval [T − δ1 − δ2 − δ3, T − δ1 − δ2], where δ3 will be fixed later.
In this case, by (20), we can choose
R3 = R2 = 2Mα
≥ 2Mα||YT−δ1−δ2 ||L∞(Ω,DA(θ,∞))
and prove that there exists a unique mild solution (Y 3, Z3) of (21) on the
interval [T−δ1−δ2−δ3, T−δ1−δ2], with δ3 = δ2 . So we extend the solution
to [T − δ1 − 2δ2, T ]. Proceeding this way we prove the global existence to
(6) on [0, T ].
4 The general case
We can now study the equation:
e(s−t)A[f0(s, Ys) + f1(s, Ys, Zs)]ds +
e(s−t)AZsdWs = e
(T−t)Aξ
We require that the function f1 satisfy the following assumptions:
Hypothesis 6.
1. there exists K ≥ 0 such that P-a.s.
|f1(t, y, z)− f1(t, y
)|H ≤ K|y − y
|H +K||z − z
||L2(K;H),
for every t ∈ [0, T ], y, y
∈ H, z, z
∈ L2(K;H),
2. there exists C ≥ 0 such that P-a.s. |f1(t, y, z)|H ≤ C, for every t ∈
[0, T ], y ∈ H, z ∈ L2(K;H).
Theorem 7. If Hypotheses 2 and 6 hold, then equation (22) has a unique
solution in L2(Ω;C([0, T ];Hα))× L
2(Ω× [0, T ];L2(K;H)).
Proof. LetM be the space of progressive processes (Y,Z) in the space L2(Ω×
[0, T ];H) × L2(Ω× [0, T ];L2(K;H)) endowed with the norm
|||(Y,Z)|||2β = E
eβs(|Ys|
H + ||Zs||
L2(K;H))ds,
where β will be fixed later. We define Φ : M → M as follows: given (U, V ) ∈
M, (Y,Z) = Φ(U, V ) is the unique solution on the interval [0, T ] of the
equation
e(s−t)A[f0(s, Ys)ds+f1(s, Us, Vs)]ds+
e(s−t)AZsdWs = e
(T−t)Aξ.
By Theorem 5 the above equation has a unique mild solution (Y,Z) which
belongs to L2(Ω;C([0, T ];Hα))×L
2(Ω×[0, T ];L2(K;H)). Therefore Φ(M) ⊂
M. We will show that Φ is a contraction for a suitable choice of β; clearly,
its unique fixed point is the required solution of (22). We take another
pair (U
) ∈ M and apply Proposition 3.1 in [3] to the difference of two
equations. We obtain
β|Y 1t − Y
H + ‖Z
s − Z
L2(K;H)ds
eβs < f0(s, Y
s ) + f1(s, U
s , V
−f0(s, Y
s )− f1(s, U
s , V
s ), Y
s − Y
s >H ds
eβsK(|U1s − U
s |H + ||V
s − V
s ||L2(K;H))|Y
s − Y
s |Hds
eβs(|U1s − U
H + ||V
s − V
L2(K;H))/2 + 4K
2|Y 1s − Y
where we have used 4.iv) of Hypothesis 2 and 1. of Hypothesis 6. Choosing
β = 4K2 + 1, we obtain the required contraction property.
5 Applications
In this section we present some backward stochastic partial differential prob-
lems which can be solved with our techniques.
5.1 The reaction-diffusion equation
Let D be an open and bounded subset of Rn with a smooth boundary
∂D. We choose K = L2(D). This choice implies that dWt/dt is the
so-called ”space-time white noise”. Moreover, since Hilbert-Schmidt op-
erators on L2(D) are represented by square integrable kernels, the space
L2(L2(D), L2(D)) can be identified with L2(D ×D). We are given a com-
plete probability space (Ω,F ,P) with a filtration (Ft)t∈[0,T ] generated by
W and augmented in the usual way. Let us consider a non symmetric
bilinear, coercive continuous form a : H10 (D) × H
0 (D) → R defined by
a(u, v) := −
i,j aij(x)Diu(x)Djv(x)dx, where the coefficients aij are
Lipschitz continuous and there exists α > 0 such that
i,j=1 aij(x)ξiξj ≥
α|ξ|2 for every x ∈ D, ξ ∈ Rn. Let A be the operator associated with the bi-
linear form a such that < Au, v >L2(D)= a(u, v), v ∈ H
0 (D) and u ∈ D(A).
It is known that, in this case, D(A) = H2(D) ∩H10 (D), where H
2(D) and
H10 (D) are the usual Sobolev spaces.
We consider for t ∈ [0, T ] and x ∈ D the backward stochastic problem
written formally
∂Y (t, x)
= AY (t, x) + r(Y (t, x)) + g(t, Y (t, x), Z(t, x), x)+
+ Z(t, x)
∂W (t, x)
on Ω× [0, T ]× D̄
Y (T, x) = ξ(x) on Ω× D̄
Y (t, x) = 0 on Ω× [0, T ]× ∂D
We suppose the following.
Hypothesis 8.
1. r : R → R is a continuous, increasing and locally Lipschitz function;
2. r satisfies the following growth condition: |r(x)| ≤ S(1 + |x|γ) ∀x ∈ R
for some γ > 1;
3. g is a measurable real function defined on [0, T ] × R × L2(D × D) ×D
and there exists a constant K > 0 such that
|g(t, y1, z1, x)− g(t, y2, z2, x)| ≤ K(|y1 − y2|+ ||z1 − z2||L2(D×D))
for all t ∈ [0, T ], y1, y2 ∈ R, z1, z2 ∈ L
2(D), x ∈ D;
4. there exists a real function h in L2(D×D) such that P-a.s. |g(t, y, z, x)| ≤
K1h(x) for all t ∈ [0, T ], y ∈ R, z ∈ L
2(D), x ∈ D;
5. ξ belongs to L∞(Ω;H2(D) ∩H10 (D)).
We define the operator A by (Ay)(x) = Ay(x) with domain D(A) =
H2(D) ∩ H10 (D). We set f0(t, y)(x) = −r(y(t, x)) for t ∈ [0, T ], x ∈ D
and y in a suitable subspace of H which will be determined below. For
t ∈ [0, T ], x ∈ D, y ∈ L2(D), z ∈ L2(D × D) we define f1 as the operator
f1(t, y, z)(x) = −g(t, y(t, x), z(t, x), x). Then problem (23) can be written
in abstract way as
dYt = −AYtdt− f0(t, Yt)dt− f1(t, Yt, Zt)dt+ ZtdWt, YT = ξ.
Under the conditions in Hypothesis 8, the assumptions in Hypotheses
2, 6 are satisfied. The operator A is a closed operator in L2(D) and it
is the infinitesimal generator of an analytic semigroup in L2(D) satisfying
‖etA‖L(H) ≤ 1 (see [17], Chapter 3). In particular, by Lumer-Philips theo-
rem, A is dissipative. The non linear function f0(t, ·) : L
2γ(D) → L2(D),
y 7→ −r(y) is locally Lipschitz. We look for a space of class Jα between H
and D(A) where f0 is well defined and locally Lipschitz. It is well known
(see [18]) that the fractional order Sobolev space W β,2(D) is of class Jβ/2
between L2(D) and H2(D) for every β ∈ (0, 2). Hence the space Hα defined
by Hα = W
β,2(D) if β < 1, by W β,2(D) ∩H10 (D) if β ≥ 1 is of class Jβ/2
between H and D(A). Moreover the restriction of A on Hα is a sectorial
operator ([18]). By the Sobolev embedding theorem, W β,2 is contained in
Lq(D) for all q if β ≥ n
, and in L2n/(n−2β)(D) if β < n
. If we choose
β ∈ (0, 2) we have W β,2(D) ⊂ L2γ(D) for n < 4
. It is clear that f0
is locally Lipschitz with respect to y from Hα into H. It is easy to verify
that f0 satisfies 4.ii) of Hypothesis 2 with γ = 2n + 1 and that it is dis-
sipative with constant µ = 0. The function f1 is Lipschitz uniformly with
respect to y and z and it is bounded. The final condition ξ takes values in
and belongs to L∞(Ω;Hα). Hence we can apply the global exis-
tence theorem and state that the above problem has a unique mild solution
(Y,Z) ∈ L2(Ω;C([0, T ];Hα))× L
2(Ω× [0, T ];L2(K,H)).
5.2 A spin system
Let Z be the one-dimensional lattice of integers. Its elements will be inter-
preted as atoms. A configuration is a real function y defined on Z. The
value y(n) of the configuration y at the point n can be viewed as the state
of the atom n.
We consider an infinite system of equations
dY nt = −anY
t dt+
|n−j|≤1
V (Y nt − Y
t )dt+ Z
t n ∈ Z, 0 ≤ t ≤ T
Yn(T ) = ξn n ∈ Z,
where Y n and Zn are real processes, and V : R → R.
Let l2(Z) be the usual Hilbert space of square summable sequences. To
study system (24) we apply results of previous sections. To fit our assump-
tion in Hypotheses 2 and 6, we suppose the following
Hypothesis 9.
1. W n, n ∈ Z are independent standard real Wiener processes;
2. a = {an}n∈Z is a sequence of nonnegative real numbers;
3. ξ = {ξn}n∈Z is a random variable belonging to L
∞(Ω, l2(Z));
4. the function V : R → R is defined by V (x) = x2k+1 k ∈ N.
We will study system (24) regarded as a backward stochastic evolution
equation for t ∈ [0, T ]
dYt = (AYt + f0(t, Yt))dt+ ZtdWt, YT = ξ (25)
on a properly chosen Hilbert space H of functions on Z.
To reformulate problem (24) in the abstract form (25), we set K = H =
l2(Z). We set Wt = {W
t }n∈Z, t ∈ [0, T ]. By 1. of Hypothesis 9, W is a
cylindrical Wiener process inH defined on (Ω,F , P ). We define the operator
A(y) = (anyn)n, D(A) = {y ∈ l
2(Z) such that
n∈Z a
n < ∞}.
It is easy to prove that A is a self-adjoint operator in l2(Z), hence the
infinitesimal generator of a sectorial semigroup. The coefficient f0 is given
by (f0(t, y))n = (V (yn+1−yn)+V (yn−1−yn)), t ∈ [0, T ], y ∈ D(f0) where
D(f0) = {y ∈ l
2(Z) such that
n∈Z |xn+1 − xn|
2(2k+1) < +∞}. Under Hy-
pothesis 9, A, f0, ξ satisfy Hypotheses 2 and 6. We observe that in this case
the domain of f0 is the whole space H: if y ∈ l
2(Z) then
|yn+1 − yn|
2(2k+1)}
2(2k+1) ≤ {
|yn+1 − yn|
2 ≤ 2||y||l2(Z).
Consequently, we can take Hα with α = 0, i.e. H0 = H. The function f0 is
dissipative. Namely
< f0(t, y)− f0(t, y
′), y − y′ >l2(Z) =
{[(yn+1 − yn)
(2k+1) + (yn−1 − yn)
(2k+1)]+
+[(y′n+1 − y
(2k+1) + (y′n−1 − y
(2k+1)])[yn − y
[(yn+1 − yn)
(2k+1) − (y′n+1 − y
(2k+1)][(yn+1 − yn)− (y
n+1 − y
and the last term is negative. Moreover, f0 satisfies 4.ii) of Hypothesis 2
with γ = 2k + 1. The map f0 is also locally Lipschitz from H in to H.
Then by Theorem 7, problem (25) has a unique mild solution (Y,Z) which
belongs to L2(Ω, C([0, T ];H)) × L2(Ω× [0, T ];L2(K,H)).
6 Appendix
This section is devoted to the proof of Lemma 1. Assume first that β =
1. Using recursively the inequality Ut ≤ a(T − t)
−α + b
FtUsds we can
easily prove that
≤ a(T − t)−α +
(r − t)k−1
(k − 1)!
(T − r)α
+bEFt
(b(r − t))n−1
(n− 1)!
Urdr.
The last term in the above inequality tends to zero as n tends to infinity
for each t in the interval [0, T ]. Thus
Ut ≤ a(T − t)
−α + a
bk(T − t)k−1
(k − 1)!
(T − r)α
≤ a(T − t)−α + abeb(T−t)
(T − r)α
≤ a(T − t)−α + abeb(T−t)
(T − t)1−α ≤ a(T − t)−αM
where M = 1 + bebT 1
In the case β 6= 1 a similar proof can be given, based on recursive use of
the inequality Ut ≤ a(T − t)
−α + b
(s − t)β−1EFtUsds.
Acknowledgments:
I wish to thank Giuseppe Da Prato for hospitality at the Scuola Normale
Superiore in Pisa, suggestions and helpful discussions. I would like to express
my gratitude to Marco Fuhrman: I am indebted to him for his precious help
and encouragement. Special thanks are due to Alessandra Lunardi, who
gave me valuable advice and support.
References
[1] Ph. Briand and R. Carmona. BSDEs with polynomial growth generators.
J. Appl. Math. Stochastic Anal., 13(3):207–238, 2000.
[2] Ph. Briand and B. Deylon and Y. Hu and E. Pardoux and L. Stoica.
Lp solutions of backward stochastic differential equations. Stochastic
Process. Appl., 108(1):109–129, 2003.
[3] F. Confortola. Dissipative backward stochastic differential equations in
infinite dimensions. Infinite Dimensional Analysis, Quantum Probability
and Related Topics, 9 (1):155–168, 2006.
[4] N. El Karoui and S. G. Peng and M. C. Quenez. Backward Stochastic
Differential equations in Finance. Math. Finance, 7(1):1–71, 1997.
[5] H. Fujita and T. Kato. On the Navier-Stokes initial value problem I.
Arch. Rational Mech. Anal. 16:269–315, 1964.
[6] Y. Hu and S. G. Peng. Adapted solution of a backward semilinear
stochastic evolution equation. Stochastic Anal. Appl., 9(4):445–459,
1991.
[7] A. Lunardi. Analytic semigroups and optimal regularity in parabolic prob-
lems volume 16 of Progress in Nonlinear Differential Equations and their
Applications. Birkhser Verlag, Basel 1995.
[8] J. Ma and J. Yong Adapted solution of a degenerate backward SPDE,
with applications. Stochastic Process. Appl. 70:59–84, 1997.
[9] J. Ma and J. Yong On linear, degenerate backward stochastic partial
differential equations. Probab. theory Related Fields 113:135–170 1999.
[10] B. Oksendal and T. Zhang. On backward stochastic partial differential
equations, 2001. Preprint.
[11] E. Pardoux. BSDEs, weak convergence and homogenization of semilin-
ear PDEs. Nonlinear analysis, differential equations and control (Mon-
treal, QC, 1998), 503–549, NATO Sci. Ser. C Math. Phys. Sci., 528,
Kluwer Acad. Publ., Dordrecht, 1999.
[12] É. Pardoux and S. Peng. Adapted solution of a backward stochastic
differential equation. Systems and Control Lett. 14:55–61, 1990.
[13] E. Pardoux and A. Răşcanu. Backward stochastic differential equa-
tions with subdifferential operator and related variational inequalities.
Stochastic Process. Appl., 76(2):191–215, 1998.
[14] E. Pardoux and A. Răşcanu. Backward stochastic variational inequali-
ties. Stochastics Stochastics Rep., 67(3-4):159–167, 1999.
[15] A. Pazy Semigroups of linear operators and applications to partial dif-
ferential equations, Springer-Verlag, (1983).
[16] S. Peng Stochastic Hamilton-Jacobi-Bellman equations. SIAM J. Con-
trol Optim., 30:284–304, 1992.
[17] H. Tanabe Equations of evolution. Monographs and Studies in Mathemat-
ics, 6. Pitman (Advanced Publishing Program), Boston, Mass.-London,
1979.
[18] H. Triebel. Interpolation Theory, Function Spaces, Differential Op-
erators vol. 18 of North-Holland Mathematical Library North-Holland
Publishing Co., Amsterdam-New York, 1978
ABSTRACT
  In this paper we study a class of backward stochastic differential equations
(BSDEs) of the form dY(t)= -AY(t)dt -f_0(t,Y(t))dt -f_1(t,Y(t),Z(t))dt +
Z(t)dW(t) on the interval [0,T], with given final condition at time T, in an
infinite dimensional Hilbert space H. The unbounded operator A is sectorial and
dissipative and the nonlinearity f_0(t,y) is dissipative and defined for y only
taking values in a subspace of H. A typical example is provided by the
so-called polynomial nonlinearities. Applications are given to stochastic
partial differential equations and spin systems.

<|endoftext|><|startoftext|>
Introduction
	b - DM coincidence from Affleck-Dine leptogenesis 
	Lepton asymmetry
	Baryon asymmetry and LSP production from Q-balls
	A solution to the missing satellite problem and the cusp problem
	Concluding remarks
	Acknowledgments
	References
ABSTRACT
  We show that axinos, which are dominantly generated by the decay of the
next-to-lightest supersymmetric particles produced from the leptonic $Q$-ball
($L$-ball), become warm dark matter suitable for the solution of the missing
satellite problem and the cusp problem. In addition, $\Omega_b - \Omega_{DM}$
coincidence is naturally explained in this scenario.

<|endoftext|><|startoftext|>
A UNIFIED APPROACH TO SIC-POVMs AND MUBs
Olivier Albouy and Maurice R. Kibler
Université de Lyon, Institut de Physique Nucléaire,
Université Lyon 1 and CNRS/IN2P3, 43 bd du 11 novembre 1918,
F–69622 Villeurbanne, France
Electronic mail: o.albouy@ipnl.in2p3.fr, m.kibler@ipnl.in2p3.fr
Abstract
A unified approach to (symmetric informationally complete) positive op-
erator valued measures and mutually unbiased bases is developed in this arti-
cle. The approach is based on the use of Racah unit tensors for the Wigner-
Racah algebra of SU(2) ⊃ U(1). Emphasis is put on similarities and differ-
ences between SIC-POVMs and MUBs.
Keywords: finite–dimensional Hilbert spaces; mutually unbiased bases; positive op-
erator valued measures; SU(2) ⊃ U(1) Wigner–Racah algebra
1 INTRODUCTION
The importance of finite–dimensional spaces for quantum mechanics is well recognized
(see for instance [1]-[3]). In particular, such spaces play a major role in quantum informa-
tion theory, especially for quantum cryptography and quantum state tomography [4]-[27].
Along this vein, a symmetric informationally complete (SIC) positive operator valued
measure (POVM) is a set of operators acting on a finite Hilbert space [4]-[14] (see also
[3] for an infinite Hilbert space) and mutually unbiased bases (MUBs) are specific bases
for such a space [15]-[27].
The introduction of POVMs goes back to the seventies [4]-[7]. The most general quan-
tum measurement is represented by a POVM. In the present work, we will be interested
http://arxiv.org/abs/0704.0511v3
in SIC-POVMs, for which the statistics of the measurement allows the reconstruction of
the quantum state. Moreover, those POVMs are endowed with an extra symmetry condi-
tion (see definition in Sec. 2). The notion of MUBs (see definition in Sec. 3), implicit or
explicit in the seminal works of [15]-[18], has been the object of numerous mathematical
and physical investigations during the last two decades in connection with the so-called
complementary observables. Unfortunately, the question to know, for a given Hilbert
space of finite dimension d, whether there exist SIC-POVMs and how many MUBs there
exist has remained an open one.
The aim of this note is to develop a unified approach to SIC-POVMs and MUBs based
on a complex vector space of higher dimension, viz. d2 instead of d. We then give a
specific example of this approach grounded on the Wigner-Racah algebra of the chain
SU(2) ⊃ U(1) recently used for a study of entanglement of rotationally invariant spin
systems [28] and for an angular momentum study of MUBs [26, 27].
Most of the notations in this work are standard. Let us simply mention that I is the
identity operator, the bar indicates complex conjugation, A† denotes the adjoint of the
operator A, δa,b stands for the Kronecker symbol for a and b, and ∆(a, b, c) is 1 or 0
according as a, b and c satisfy or not the triangular inequality.
2 SIC-POVMs
Let Cd be the standard Hilbert space of dimension d endowed with its usual inner product
denoted by 〈 | 〉. As is usual, we will identify a POVM with a nonorthogonal decompo-
sition of the identity. Thus, a discrete SIC-POVM is a set {Px : x = 1, 2, · · · , d2} of d2
nonnegative operators Px acting on C
d, such that:
• they satisfy the trace or symmetry condition
Tr (PxPy) =
, x 6= y; (1)
moreover, we will assume the operators Px are normalized, thus completing this
condition with
= 1; (2)
• they form a decomposition of the identity
Px = I; (3)
• they satisfy a completeness condition: the knowledge of the probabilities px defined
by px = Tr(Pxρ) is sufficient to reconstruct the density matrix ρ.
Now, let us develop each of the operators Px on an orthonormal (with respect to the
Hilbert–Schmidt product) basis {ui : i = 1, 2, · · · , d2} of the space of linear operators on
vi(x)ui, (4)
where the operators ui satisfy Tr(u
iuj) = δi,j . The operators Px are thus considered as
vectors
v(x) = (v1(x), v2(x), · · · , vd2(x)) (5)
in the Hilbert space Cd
of dimension d2 and the determination of the operators Px is
equivalent to the determination of the components vi(x) of v(x). In this language, the
trace property (1) together with the normalization condition (2) give
v(x) · v(y) = 1
(dδx,y + 1) , (6)
where v(x) · v(y) =
i=1 vi(x)vi(y) is the usual Hermitian product in C
In order to compare Eq. (6) with what usually happens in the search for SIC-POVMs,
we suppose from now on that the operators Px are rank-one operators. Therefore, by
putting
Px = |Φx〉〈Φx| (7)
with |φx〉 ∈ Cd, the trace property (1, 2) reads
|〈Φx|Φy〉|2 =
(dδx,y + 1) . (8)
From this point of view, to find d2 operators Px is equivalent to finding d
2 vectors |φx〉
in Cd satisfying Eq. (8). At the price of an increase in the number of components from
d3 (for d2 vectors in Cd) to d4 (for d2 vectors in Cd
), we have got rid of the square
modulus to result in a single scalar product (compare Eqs. (6) and (8)), what may prove
to be suitable for another way to search for SIC-POVMs. Moreover, our relation (6) is
independent of any hypothesis on the rank of the operators Px. In fact, there exists a lot
of relations among these d4 coefficients that decrease the effective number of coefficients
to be found and give structural constraints on them. Those relations are highly sensitive
to the choice of the basis {ui : i = 1, 2, · · · , d2} and we are going to exhibit an example
of such a set of relations by choosing the basis to consist of Racah unit tensors.
The cornerstone of this approach is to identify Cd with a subspace ε(j) of constant
angular momentum j = (d− 1)/2. Such a subspace is spanned by the set {|j,m〉 : m =
−j,−j + 1, · · · , j}, where |j,m〉 is an eigenvector of the square and the z-component
of a generalized angular momentum operator. Let u(k) be the Racah unit tensor [29]
of order k (with k = 0, 1, · · · , 2j) defined by its 2k + 1 components u(k)q (where q =
−k,−k + 1, · · · , k) through
u(k)q =
m′=−j
(−1)j−m
j k j
−m q m′
|j,m〉〈j,m′|, (9)
where (· · ·) denotes a 3–jm Wigner symbol. For fixed j, the (2j + 1)2 operators u(k)q
(with k = 0, 1, · · · , 2j and q = −k,−k + 1, · · · , k) act on ε(j) ∼ Cd and form a basis
of the Hilbert space CN of dimension N = (2j + 1)2, the inner product in CN being the
Hilbert–Schmidt product. The formulas (involving unit tensors, 3–jm and 6–j symbols)
relevant for this work are given in Appendix (see also [29] to [31]). We must remember
that those Racah operators are not normalized to unity (see relation (46)). So this will
generate an extra factor when defining vi(x).
Each operator Px can be developed as a linear combination of the operators u
Hence, we have
ckq(x)u
q , (10)
where the unknown expansion coefficients ckq(x) are a priori complex numbers. The
determination of the operators Px is thus equivalent to the determination of the coefficients
ckq(x), which are formally given by
ckq(x) = (2k + 1)〈Φx|u(k)q |Φx〉, (11)
as can be seen by multiplying each member of Eq. (10) by the adjoint of u
p and then
using Eq. (46) of Appendix.
By defining the vector
v(x) = (v1(x), v2(x), · · · , vN(x)), N = (2j + 1)2 (12)
vi(x) =
2k + 1
ckq(x), i = k
2 + k + q + 1, (13)
the following properties and relations are obtained.
• The first component v1(x) of v(x) does not depend on x since
c00(x) =
2j + 1
for all x ∈ {1, 2, · · · , (2j + 1)2}.
Proof: Take the trace of Eq. (10) and use Eq. (48) of Appendix.
• The components vi(x) of v(x) satisfy the complex conjugation property described
ckq(x) = (−1)qck−q(x) (15)
for all x ∈ {1, 2, · · · , (2j + 1)2}, k ∈ {0, 1, · · · , 2j} and q ∈ {−k,−k + 1, · · · , k}.
Proof: Use the Hermitian property of Px and Eq. (43) of Appendix.
• In terms of ckq, Eq. (6) reads
2k + 1
ckq(x)ckq(y) =
2(j + 1)
[(2j + 1)δx,y + 1] (16)
for all x, y ∈ {1, 2, · · · , (2j + 1)2}, where the sum over q is SO(3) rotationally
invariant.
Proof: The proof is trivial.
• The coefficients ckq(x) are solutions of the nonlinear system given by
2K + 1
cKQ(x) = (−1)2j−Q
k ℓ K
−q −p Q
k ℓ K
j j j
ckq(x)cℓp(x) (17)
for all x∈ {1, 2, · · · , (2j+1)2}, K ∈ {0, 1, · · · , 2j} and Q∈ {−K,−K+1, · · · , K}.
Proof: Consider P 2x = Px and use the coupling relation (51) of Appendix involving
a 3–jm and a 6–j Wigner symbols.
As a corollary of the latter property, by taking K = 0 and using Eqs. (47) and (50)
of Appendix, we get again the normalization relation ‖v(x)‖2 = v(x) · v(x) = 1.
• All coefficients ckq(x) are connected through the sum rule
(2j+1)2
ckq(x)
j k j
−m q m′
= (−1)j−m(2j + 1)δm,m′ , (18)
which turns out to be useful for global checking purposes.
Proof: Take the jm–jm′ matrix element of the resolution of the identity in terms of
the operators Px/(2j + 1).
3 MUBs
A complete set of MUBs in the Hilbert space Cd is a set of d(d + 1) vectors |aα〉 ∈ Cd
such that
|〈aα|bβ〉|2 = δα,βδa,b +
(1− δa,b), (19)
where a = 0, 1, · · · , d and α = 0, 1, · · · , d − 1. The indices of type a refer to the bases
and, for fixed a, the index α refers to one of the d vectors of the basis corresponding to a.
We know that such a complete set exists if d is a prime or the power of a prime (e.g., see
[16]-[24]).
The approach developed in Sec. 2 for SIC-POVMs can be applied to MUBs too. Let
us suppose that it is possible to find d+ 1 sets Sa (with a = 0, 1, · · · , d) of vectors in Cd,
each set Sa = {|aα〉 : α = 0, 1, · · · , d − 1} containing d vectors |aα〉 such that Eq. (19)
be satisfied. This amounts to finding d(d+ 1) projection operators
Πaα = |aα〉〈aα| (20)
satisfying the trace condition
Tr (ΠaαΠbβ) = δα,βδa,b +
(1− δa,b), (21)
where the trace is taken on Cd. Therefore, they also form a nonorthogonal decomposition
of the identity
Πaα = I. (22)
As in Sec. 2, we develop each operator Πaα on an orthonormal basis with expansion
coefficients wi(aα). Thus we get vectors w(aα) in C
w(aα) = (w1(aα), w2(aα), · · · , wd2(aα)) (23)
such that
w(aα) · w(bβ) = δα,βδa,b +
(1− δa,b) (24)
for all a, b ∈ {0, 1, · · · , d} and α, β ∈ {0, 1, · · · , d− 1}.
Now we draw the same relations as for POVMs by choosing the Racah operators to be
our basis in Cd
. We assume once again that the Hilbert space Cd is realized by ε(j) with
j = (d − 1)/2. Then, each operator Πaα can be developed on the basis of the (2j + 1)2
operators u
Πaα =
dkq(aα)u
q , (25)
to be compared with Eq. (10). The expansion coefficients are
dkq(aα) = (2k + 1)〈aα|u(k)q |aα〉 (26)
for all a ∈ {0, 1, · · · , 2j + 1}, α ∈ {0, 1, · · · , 2j}, k ∈ {0, 1, · · · , 2j} and q ∈ {−k,−k +
1, · · · , k}. For a and α fixed, the complex coefficients dkq(aα) define a vector
w(aα) = (w1(aα), w2(aα), · · · , wN(aα)) , N = (2j + 1)2 (27)
in the Hilbert space CN , the components of which are given by
wi(aα) =
2k + 1
dkq(aα), i = k
2 + k + q + 1. (28)
We are thus led to the following properties and relations. The proofs are similar to those
in Sec. 2.
• First component w1(aα) of w(aα):
d00(aα) =
2j + 1
for all a ∈ {0, 1, · · · , 2j + 1} and α ∈ {0, 1, · · · , 2j}.
• Complex conjugation property:
dkq(aα) = (−1)qdk−q(aα) (30)
for all a ∈ {0, 1, · · · , 2j + 1}, α ∈ {0, 1, · · · , 2j}, k ∈ {0, 1, · · · , 2j} and q ∈
{−k,−k + 1, · · · , k}.
• Rotational invariance:
2k + 1
dkq(aα)dkq(bβ) = δα,βδa,b +
2j + 1
(1− δa,b) (31)
for all a, b ∈ {0, 1, · · · , 2j + 1} and α, β ∈ {0, 1, · · · , 2j}.
• Tensor product formula:
2K + 1
dKQ(aα) = (−1)2j−Q
k ℓ K
−q −p Q
k ℓ K
j j j
dkq(aα)dℓp(aα) (32)
for all a ∈ {0, 1, · · · , 2j + 1}, α ∈ {0, 1, · · · , 2j}, K ∈ {0, 1, · · · , 2j} and Q ∈
{−K,−K + 1, · · · , K}.
• Sum rule:
dkq(aα)
j k j
−m q m′
= (−1)j−m2(2j + 1)δm,m′ (33)
which involves all coefficients dkq(aα).
4 CONCLUSIONS
Although the structure of the relations in Sec. 1 on the one hand and Sec. 2 on the other
hand is very similar, there are deep differences between the two sets of results. The
similarities are reminiscent of the fact that both MUBs and SIC-POVMs can be linked to
finite affine planes [12, 13, 22, 23, 25] and to complex projective 2–designs [8, 10, 19, 24].
On the other side, there are two arguments in favor of the differences between relations (6)
and (24). First, the problem of constructing SIC-POVMs in dimension d is not equivalent
to the existence of an affine plane of order d [12, 13]. Second, there is a consensus around
the conjecture according to which there exists a complete set of MUBs in dimension d if
and only if there exists an affine plane of order d [22].
In dimension d, to find d2 operators Px of a SIC-POVM acting on the Hilbert space
d amounts to find d2 vectors v(x) in the Hilbert space CN with N = d2 satisfying
‖vx‖ = 1, v(x) · v(y) =
for x 6= y (34)
(the norm ‖v(x)‖ of each vector v(x) is 1 and the angle ωxy of any pair of vectors v(x)
and v(y) is ωxy = cos
−1[1/(d+ 1)] for x 6= y).
In a similar way, to find d + 1 MUBs of Cd is equivalent to find d + 1 sets Sa (with
a = 0, 1, · · · , d) of d vectors, i.e., d(d + 1) vectors in all, w(aα) in CN with N = d2
satisfying
w(aα) · w(aβ) = δα,β, w(aα) · w(bβ) =
for a 6= b (35)
(each set Sa consists of d orthonormalized vectors and the angle ωaαbβ of any vector
w(aα) of a set Sa with any vector w(bβ) of a set Sb is ωaαbβ = cos
−1(1/d) for a 6= b).
According to a well accepted conjecture [8, 10], SIC-POVMs should exist in any
dimension. The present study shows that in order to prove this conjecture it is sufficient
to prove that Eq. (34) admits solutions for any value of d.
The situation is different for MUBs. In dimension d, it is known that there exist d+ 1
sets of d vectors of type |aα〉 in Cd satisfying Eq. (19) when d is a prime or the power of
a prime. This shows that Eq. (35) can be solved for d prime or power of a prime. For d
prime, it is possible to find an explicit solution of Eq. (19). In fact, we have [26, 27]
|aα〉 =
2j + 1
ω(j+m)(j−m+1)a/2+(j+m)α|j,m〉, (36)
ω = exp
2j + 1
, j =
(d− 1) (37)
for a, α ∈ {0, 1, · · · , 2j} while
|aα〉 = |j,m〉 (38)
for a = 2j + 1 and α = j +m = 0, 1, · · · , 2j. Then, Eq. (26) yields
dkq(aα) =
2k + 1
2j + 1
m′=−j
ωθ(m,m
′)(−1)j−m
j k j
−m q m′
, (39)
θ(m,m′) = (m−m′)
(1−m−m′)a+ α
for a, α ∈ {0, 1, · · · , 2j} while
dkq(aα) = δq,0(2k + 1)(−1)j−m
j k j
−m 0 m
for a = 2j + 1 and α = j + m = 0, 1, · · · , 2j. It can be shown that Eqs. (40) and (41)
are in agreement with the results of Sec. 3. We thus have a solution of the equations for
the results of Sec. 3 when d is prime. As an open problem, it would be worthwhile to find
an explicit solution for the coefficients dkq(aα) when d = 2j + 1 is any positive power
of a prime. Finally, note that to prove (or disprove) the conjecture according to which a
complete set of MUBs in dimension d exists only if d is a prime or the power of a prime
is equivalent to prove (or disprove) that Eq. (35) has a solution only if d is a prime or the
power of a prime.
APPENDIX: WIGNER-RACAH ALGEBRA OF SU(2) ⊃
We limit ourselves to those basic formulas for the Wigner-Racah algebra of the chain
SU(2) ⊃ U(1) which are necessary to derive the results of this paper. The summations
in this appendix have to be extended to the allowed values for the involved magnetic and
angular momentum quantum numbers.
The definition (9) of the components u
q of the Racah unit tensor u(k) yields
〈j,m|u(k)q |j,m′〉 = (−1)j−m
j k j
−m q m′
, (42)
from which we easily obtain the Hermitian conjugation property
u(k)q
= (−1)qu(k)−q . (43)
The 3–jm Wigner symbol in Eq. (42) satisfies the orthogonality relations
j j′ k
m m′ q
j j′ ℓ
m m′ p
2k + 1
δk,ℓδq,p∆(j, j
′, k) (44)
(2k + 1)
j j′ k
m m′ q
j j′ k
M M ′ q
= δm,Mδm′,M ′. (45)
The trace relation on the space ε(j)
u(k)q
u(ℓ)p
2k + 1
δk,ℓδq,p∆(j, j, k) (46)
easily follows by combining Eqs. (42) and (44). Furthermore, by introducing
j j′ 0
m −m′ 0
= δj,j′δm,m′(−1)j−m
2j + 1
in Eq. (44), we obtain the sum rule
(−1)j−m
j k j
−m q m
2j + 1δk,0δq,0∆(j, k, j), (48)
known in spectroscopy as the barycenter theorem.
There are several relations involving 3–jm and 6–j symbols. In particular, we have
(−1)j−M
j k j
−m q M
j ℓ j
−M p m′
j K j
−m Q m′
= (−1)2j−Q
k ℓ K
−q −p Q
k ℓ K
j j j
, (49)
where {· · ·} denotes a 6–j Wigner symbol (or W Racah coefficient). Note that the intro-
duction of
k ℓ 0
j j J
= δk,ℓ(−1)j+k+J
(2k + 1)(2j + 1)
in Eq. (49) gives back Eq. (44). Equation (49) is central in the derivation of the coupling
relation
u(k)q u
(−1)2j−Q(2K + 1)
k ℓ K
−q −p Q
k ℓ K
j j j
Q . (51)
Equation (51) makes it possible to calculate the commutator [u
q , u
p ] which shows that
the set {u(k)q : k = 0, 1, · · · , 2j; q = −k,−k + 1, · · · , k} can be used to span the Lie
algebra of the unitary group U(2j + 1). The latter result is at the root of the expansions
(17) and (32).
Note added in version 3
After the submission of the present paper for publication in Journal of Russian Laser
Research, a pre-print dealing with the existence of SIC-POVMs was posted on arXiv [32].
The main result in [32] is that SIC-POVMs exist in all dimensions. As a corollary of this
result, Eq. (34) admits solutions in any dimension.
Acknowledgements
This work was presented at the International Conference on Squeezed States and Un-
certainty Relations, University of Bradford, England (ICSSUR’07). The authors wish to
thank the organizer A. Vourdas and are grateful to D. M. Appleby, V. I. Man’ko and M.
Planat for interesting comments.
References
[1] A. Peres, “Quantum Theory: Concepts and Methods”, Dordrecht: Kluwer (1995)
[2] A. Vourdas, J. Phys. A: Math. Gen. 38, 8453 (2005)
[3] W. M. de Muynck, “Foundations of Quantum Mechanics, an Empiricist Approach”,
Dordrecht: Kluwer (2002)
[4] J. M. Jauch and C. Piron, Helv. Phys. Acta 40, 559 (1967)
[5] E. B. Davies and J. T. Levis, Comm. Math. Phys. 17, 239 (1970)
[6] E. B. Davies, IEEE Trans. Inform. Theory IT-24, 596 (1978)
[7] K. Kraus, “States, Effects, and Operations”, Lect. Notes Phys. 190 (1983)
[8] G. Zauner, Diploma Thesis, University of Wien (1999)
[9] C. M. Caves, C. A. Fuchs and R. Schack, J. Math. Phys. 43, 4537 (2002)
[10] J. M. Renes, R. Blume-Kohout, A. J. Scott and C. M. Caves, J. Math. Phys. 45, 2171
(2004)
[11] D. M. Appleby, J. Math. Phys. 46, 052107 (2005)
[12] M. Grassl, Proc. ERATO Conf. Quant. Inf. Science (EQIS 2004) ed. J. Gruska, Tokyo
(2005)
[13] M. Grassl, Elec. Notes Discrete Math. 20, 151 (2005)
[14] S. Weigert, Int. J. Mod. Phys. B 20, 1942 (2006)
[15] J. Schwinger, Proc. Nat. Acad. Sci. USA 46, 570 (1960)
[16] P. Delsarte, J. M. Goethals and J. J. Seidel, Philips Res. Repts. 30, 91 (1975)
[17] I. D. Ivanović, J. Phys. A: Math. Gen. 14, 3241 (1981)
[18] W. K. Wootters, Ann. Phys. (N.Y.) 176, 1 (1987)
[19] H. Barnum, Preprint quant-ph/0205155 (2002)
[20] S. Bandyopadhyay, P. O. Boykin, V. Roychowdhury and F. Vatan, Algorithmica 34,
512 (2002)
[21] A. O. Pittenger and M. H. Rubin, Linear Alg. Appl. 390, 255 (2004)
[22] M. Saniga, M. Planat and H. Rosu, J. Opt. B: Quantum Semiclassical Opt. 6, L19
(2004)
[23] I. Bengtsson and Å. Ericsson, Open Syst. Inf. Dyn. 12, 107 (2005)
[24] A. Klappenecker and M. Rötteler, Preprint quant-ph/0502031 (2005)
[25] W. K. Wootters, Found. Phys. 36, 112 (2006)
[26] M. R. Kibler and M. Planat, Int. J. Mod. Phys. B 20, 1802 (2006)
[27] O. Albouy and M. R. Kibler, SIGMA 3, article 076 (2007)
[28] H.-P. Breuer, J. Phys. A: Math. Gen. 38, 9019 (2005)
[29] G. Racah, Phys. Rev. 62, 438 (1942)
[30] U. Fano and G. Racah, “Irreducible Tensorial Sets”, New York: Academic (1959)
[31] M. Kibler and G. Grenet, J. Math. Phys. 21, 422 (1980)
[32] J.L. Hall and A. Rao, Preprint quant-ph/0707.3002v1 (20 July 2007)
http://arxiv.org/abs/quant-ph/0205155
http://arxiv.org/abs/quant-ph/0502031
	INTRODUCTION
	SIC-POVMs
	MUBs
	CONCLUSIONS
ABSTRACT
  A unified approach to (symmetric informationally complete) positive operator
valued measures and mutually unbiased bases is developed in this article. The
approach is based on the use of operator equivalents expanded in the enveloping
algebra of SU(2). Emphasis is put on similarities and differences between
SIC-POVMs and MUBs.

<|endoftext|><|startoftext|>
Introduction
The simplest model exhibiting time-oscillations in a two-component system is the model
proposed independently by Lotka [1, 2, 3] and by Volterra [4]. In this model the
individuals of two species are dispersed over an assumed homogeneous space. It is
implicitly assumed in this approach that any individual can interact with any other one
with equal intensity implying that their positions are not taken into account. The time
evolution of the densities of the two species in the Lotka-Volterra model is given by a set
of two ordinary differential equations [5, 6, 7, 8] and is set up in analogy with the laws
of mass-action. Depending on the level of description wanted, the approach based on
mass-action laws, contained on the Lotka-Volterra model, suffices. However, there are
situations in which the coexistence takes place in a spatially heterogeneous habitat such
that the population densities can be very low in some regions. In this case we need to
proceed beyond the mass-law equations and consider the space structure of the habitat.
In other words, it becomes necessary to analyze the coexistence by taking explicitly into
account spatial structured models.
In fact, the role of space in the description of population biology problems has been
recognized by several authors in the last years [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26]. In a very clear manner, Durrett and Levin [11] have pointed
out that the modelling of population dynamics systems which are spatially distributed
by interacting particle systems [11, 27, 28, 29] is the appropriate theoretical approach
that is able to give the more complete description of the problem. We include in this
approach probabilistic cellular automata (PCA) [29, 30, 31], which will concern us here.
We refer to interacting particle systems and PCA as stochastic lattice models. They are
both Markovian processes defined by discrete stochastic variables residing on the sites
of a lattice; the former being a continuous time process and the latter a discrete time
process.
In the present work we study the coexistence and the emergence of stable self-
sustained oscillations in a predator-prey system by considering a PCA previously studied
by numerical simulations [24, 26]. This PCA is defined by local rules, similar to the
ones of the contact process [27], that are capable of describing the interaction between
prey and predator. Here, we focus on the analysis of the PCA by means of dynamic
mean-field approximations [10, 28, 29, 30, 32, 33]. In this approach the equations for
the time evolution of correlations of various orders are truncated at a certain level and
high order correlations of sites are written in terms of small order correlations. The
simplest approximation is the one in which all correlations are written in terms of one-
site correlation, called simple approximation. In a more sophisticated approximation,
called pair approximation [10, 29], any correlation is written in terms of one-site and
two-site correlations.
The simple mean-field approximation is capable of predicting coexistence of
individuals in a stationary state where the densities of each species, and of empty
sites, are constant. However, it is not capable of predicting possible time oscillating
Stable oscillations of a predator-prey probabilistic cellular automaton 3
behavior of the population densities and we have proceed to the next order of mean-
field approximation. The simple approximation, on the other hand, can be placed in an
explicit correspondence with a patch model [7, 12, 34] where unoccupied patches can be
colonized by prey and patches occupied by prey can be colonized by predators that in
turn may become extinct. In this approximation the PCA can be seen as an extended
version of the Lotka-Volterra model which includes an extra logistic term related to the
empty sites.
The pair-mean field approximation is able to predict possible time oscillating
behavior of the population densities that are self-sustained and are attained thorough
Hopf bifurcations. This is in contrast with the Lotka-Volterra model which presents
no stable oscillations but exhibits instead infinite cycles that are associated to different
initial conditions. However, from the biological point of view, one does not expect that a
small variation in the initial densities of prey and predator result in different amplitudes
of oscillations. Within our approach, a PCA treated in the pair-approximation, the
oscillations are associated with limit cycles what mean to say that they are stable
against the changes in the initial conditions. According to our point of view, the pair-
approximation, in which the correlation between neighboring sites are treated exactly,
provides a basic description of the predator-prey spatial interactions. For this reason,
we will refer to the PCA in this approximation as a quasi-spatial-structured model.
2. Model
2.1. Probabilistic cellular automaton
We consider interacting particles living on the sites of a lattice and evolving in time
according to Markovian local rules. The lattice is the geometrical object that plays the
role of the spatial region occupied by particles, in a general case, or by individuals of each
species in the present case. The lattice sites are the possible locations for the individuals.
Each site can be either empty or occupied by one individual of different species and a
stochastic variable ηi is introduced to describe the state of each site at a given instant
of time. The state of the entire system is denoted by η = (η1, . . . , ηi, . . . , ηN) where
N is the total number of sites. The transition between the states is governed by the
interactions between neighbor sites in the lattice and by a synchronous dynamics.
The probability P (ℓ)(η) of configuration η at time step ℓ evolves according to the
Markov chain equation
P (ℓ+1)(η) =
W (η|η′)P (ℓ)(η′), (1)
where the summation is over all the microscopic configurations of the system, and
W (η|η′) is the conditional transition probability from state η′ at time ℓ to state η at
time ℓ + 1. This transition probability does not depend on time and contains all the
information about the dynamics of the system. Taking into account that all the sites
are simultaneously updated, which is the fundamental property of a PCA, the transition
Stable oscillations of a predator-prey probabilistic cellular automaton 4
Figure 1. Transitions of the predator-prey model. The three states are: prey or
herbivorous (H or 1), predator (P or 2) and empty or vegetable (V or 0). The allowed
transitions obey the cyclic order shown.
probability can be factorized and written in the form [29, 30]
W (η|η′) =
wi(ηi|η
′), (2)
where wi(ηi|η
′) is the conditional transition probability that site i takes the state ηi
given that the whole system is in state η′. Being a probability distribution, the quantity
wi(ηi|η
′) must satisfy the following properties: wi(ηi, η
′) ≥ 0 and
wi(ηi|η
′) = 1. (3)
The average of any state function F (η) is evaluated by
〈F (η)〉ℓ =
F (η)P (ℓ)(η). (4)
The time evolution equation for 〈F (η)〉 is obtained from definition (4) and equation (1).
For example, we can derive the equations for the time evolution of densities and two-site
correlations.
2.2. Predator-prey probabilistic cellular automaton
To model a predator-prey system by a PCA, the stochastic variable ηi associated to site
i will represent the occupancy of the site by one prey, or the occupancy by one predator
or the vacancy (a site devoid of any individual). The variable ηi is assumed to take
the value 0, 1, or 2, according to whether the site is empty (V), occupied by a prey
individual (H) or by a predator (P), respectively. That is,
0, empty (V),
1, prey (H),
2, predator (P),
which defines a three-state per site PCA.
The stochastic rules, embodied in the transition rate wi(ηi|η
′), are set up according
to the following assumptions. (a) The space is homogeneous, which means to say that
no region of the space will be privileged against the others, that is, in principle the
individuals have the same conditions of surveillance in any space region. (b) The space
is isotropic, which means to say that there is no preferential direction in this space for
any interaction. (c) The allowed transitions between states are only the ones that obey
Stable oscillations of a predator-prey probabilistic cellular automaton 5
the cyclic order shown in figure 1. Prey can only born in empty sites; prey can give place
to a predator, in a process where a prey individual dies and a predator is instantaneously
born; finally a predator can die leaving an empty site. The empty sites are places where
prey can proliferate and can be seen as the resource for prey surveillance. The death of
predators complete this cycle, reintegrating to the system the resources for prey.
The predator-prey PCA has three parameters: a, the probability of birth of prey, b,
the probability of birth of predator and death of prey, and c, the probability of predator
death. Two of the process are catalytic: the occupancy of a site by prey or by a predator
is conditioned, respectively, to the existence of prey or predator in the neighborhood of
the site. The third reaction, where predator dies, is spontaneous, that is, it occurs, with
probability c, independently of the neighbors of the site. We assume that
a+ b+ c = 1, (6)
with 0 ≤ a, b, c ≤ 1.
The transition probabilities of the predator-prey PCA are described in what follows:
(a) If a site i is empty, ηi = 0, and there is at least one prey in its first neighborhood
there is a favorable condition for the birth of a new prey. The probability of site i being
occupied in next time step by a prey is proportional to the parameter a and to the
number of prey that are in the first neighborhood of the empty site.
(b) If a site is occupied by a prey, ηi = 1, and there is at least one predator in its
first neighborhood then the site has a probability of being occupied by a new predator
in the next instant of time. In this process the prey dies instantaneously. The transition
probability is proportional to the parameter b and the number of predators in first
neighborhood of the site.
(c) If site i is occupied by a predator, ηi = 2, it dies with probability c.
The transition probabilities associated to the three processes above mentioned can
be summarized as follows:
wi(0|η) = cδ(ηi, 2) + [1− fi(η)]δ(ηi, 0), (7)
wi(1|η) = fi(η)δ(ηi, 0) + [1 − gi(η)]δ(ηi, 1), (8)
wi(2|η) = gi(η)δ(ηi, 1) + (1− c)δ(ηi, 2), (9)
where
fi(η) =
δ(ηk, 1), gi(η) =
δ(ηk, 2), (10)
and the summation is over the four nearest neighbors of site i in a regular square lattice.
The notation δ(x, y) stands for the Kronecker delta function. These stochastic local
rules, when inserted in equation (2), define the dynamics of the PCA for a predator-
prey system.
The present stochastic dynamics predicts the existence of states, called absorbing
states, in which the system becomes trapped. Once the system has entered such a state
Stable oscillations of a predator-prey probabilistic cellular automaton 6
it cannot escape from it anymore remaining there forever. There are two absorbing
states. One of them is the empty lattice. Since the predator death is spontaneous, a
configuration where just predators are present is not stationary. This situation happens
whenever the prey have been extinct. In this case the predator cannot reproduce
anymore and also get extinct, leaving the entire lattice with empty sites. The other
absorbing state is the lattice full of prey. This situation occurs if there are few predators
and they become extinct. The remaining prey will then reproduce without predation
filling up the whole lattice. The existence of absorbing stationary states is an evidence
of the irreversible character of the model or, in other words, of the lack of detailed
balance [29]. However, the most interesting states, the ones that we are concerned with
in the present study, are the active states characterized by the coexistence of prey and
predators.
2.3. Time evolution equations for state functions
We start by defining the densities, which are the one-site correlations, and the two-site
correlations. These quantities will be useful in our mean-field analysis to be developed
below. The density of prey, predator, and empty sites at time step ℓ are defined thought
the expressions
i (1) = 〈δ(ηi, 1)〉ℓ, (11)
i (2) = 〈δ(ηi, 2)〉ℓ, (12)
i (0) = 〈δ(ηi, 0)〉ℓ. (13)
The evolution equations for the above densities are obtained from their definitions as
state functions, as given by equation (4), and by using the evolution equation for P (ℓ)(η),
given by equation (1). The resulting equations can be formally written as
(ℓ+1)
i (1) = 〈wi(1|η)〉ℓ, (14)
(ℓ+1)
i (2) = 〈wi(2|η)〉ℓ, (15)
(ℓ+1)
i (0) = 〈wi(0|η)〉ℓ, (16)
where the transition probabilities for this model are given in equations (7), (8) and (9).
The correlation between a prey localized at site i and a predator localized at site j
at time step ℓ is defined by
ij (1, 2) = 〈δ(ηi, 1)δ(ηj, 2)〉ℓ. (17)
The other two-site correlations are defined similarly. The time evolution equation for
the correlation of two neighbor sites i and j, one being occupied by a prey and the other
by a predator, is given by
(ℓ+1)
ij (1, 2) = 〈wi(1|η)wj(2|η)〉ℓ. (18)
The other two-site evolution equations are given by similar formal expressions. We
can also derive equations for three-site correlations. Since we are interested here on
Stable oscillations of a predator-prey probabilistic cellular automaton 7
approximations in which only the one-site and two-site correlations should be treated
exactly, the above equations suffice.
We call the attention to the fact that equation (18) includes the product of two
transition probabilities. This is a consequence of the synchronous update of the PCA
which allows that both neighboring sites i and j have their states changed at same
time step. This situation does not occur when we consider a continuous time one-
site dynamics. Therefore, although local interaction in the present PCA and in the
continuous time model considered in reference [10] are the same, the predator-prey
system evolves according to different global dynamics which leads to different time
evolution equations for the densities and the correlations.
The exact evolution equations for the one-site correlations are
P ′j (1) =
Pji(01)−
Pji(12) + Pj(1), (19)
P ′j (2) =
Pji(12) + (1− c)Pj(2), (20)
where the summation in j is over the ζ nearest neighbors of site i. To simplify notation
we are using unprimed and primed quantities to refer to quantities taken at time ℓ and
ℓ+ 1, respectively.
The exact evolution equations for the correlations of two nearest neighbor sites j
and k are
P ′jk(01) =
n(6=j)
Pjkn(001)−
i(6=k)
Pijkn(1001)
+ (1−
Pjk(01)−
n(6=j)
Pjkn(012)
i(6=k)
Pijk(101)−
n(6=j)
Pijkn(1012)
)Pjk(21)−
n(6=j)
Pjkn(212)
n(6=j)
Pjkn(201), (21)
P ′jk(12) =
n(6=j)
Pjkn(012) +
i(6=k)
Pijkn(1012)
n(6=j)
Pjkn(112)−
i(6=k)
Pijkn(2112)
+ (1− c)
)Pjk(12)−
i(6=k)
Pijk(212)
(1− c)
i(6=k)
Pijk(102), (22)
Stable oscillations of a predator-prey probabilistic cellular automaton 8
P ′jk(02) =
n(6=j)
)Pjkn(012)−
i(6=k)
Pijkn(1012)
+ (1− c)
cPjk(22) + Pjk(02)−
i(6=k)
Pijk(102)
Pjk(21) +
n(6=j)
Pjkn(212)
 , (23)
where the summation in i is over the nearest neighbors of j and the summation in n is
over the nearest neighbors of k.
3. Mean-field approximation
3.1. One and two site approximations
The evolution equation for a density in any interacting particle system which evolves in
time according to local interaction rules always contains terms related to the correlations
between neighbor sites in a lattice. The evolution equations for the correlations of two
neighbor sites includes the correlation of clusters of three or more sites in the lattice and
so on. In this way we can have an infinite set of coupled equations for the correlations
which is equivalent to the evolution equation for the probability P (ℓ)(η), described in
equation (1) for the automaton. The scope of the dynamic mean-field approximation
consists in the truncation of this infinite set of coupled equations [30, 31, 32, 33].
The lowest order dynamic mean-field approximation is the one where the probability
of a given cluster is written as the product of the probabilities of each site. That is, all
the correlations between sites in the cluster are neglected. For example, let us consider
the cluster constituted by a center (C) site and its first neighboring sites to the north
(N), south (S), east (E) and west (W) as shown in figure 2.
Within the one-site approximation the probability P (N,E,W, S, C) corresponding
to the cluster shown in figure 2 is approximated by
P (N,E,W, S, C) = P (N)P (E)P (W )P (S)P (C), (24)
where P (X), X = N,E,W, S, C are the one-site probabilities corresponding to each
site. For some stochastic dynamics models this approximation is able to give qualitative
results that are in agreement with the expected results.
In order to get a better approximation we must include fluctuations. The
simplest mean-field approximation that includes correlations is the pair-mean field
approximation. This approximation is better explained by taking again, as an example,
the cluster constituted by a center site which and its four nearest neighbors, shown
above. Within the pair-approximation the conditional probability P (N,E,W, S |C) is
approximated by
P (N,E,W, S |C) = P (N |C)P (E, |C)P (W |C)P (S |C), (25)
Stable oscillations of a predator-prey probabilistic cellular automaton 9
Figure 2. A site (C) of the square lattice and its four nearest neighbor sites (N, E,
W, S).
that is, the conditional probability P (N,E,W, S |C) is written in terms of the product
of the conditional probabilities P (X|C), X = N,E,W, S. Now using the definition of
conditional probability we have
P (N,E,W, S, C)
P (C)
P (N,C)
P (C)
P (E,C)
P (C)
P (W,C)
P (C)
P (S, C)
P (C)
, (26)
P (N,E,W, S, C) =
P (N,C)P (E,C)P (W,C)P (S, C)
[P (C)]3
. (27)
We see that the resulting probability is written as a function of two-site correlations
P (X,C), and the one-site correlation P (C).
3.2. Patch model
The simple mean-field approximation of the predator-prey PCA describes exactly
the same properties of an extended Levins patch model [7, 34]. That is, the PCA
with local rules similar to the contact process becomes, in the simple mean-field
approximation, analogous to the Levins model for metapopulation with empty patches,
patches colonized by prey and patches colonized by predators.
In the one-site mean-field approximation we consider that the probability of any
cluster of sites can be written as the product of the probabilities of each site, as in
equation (24). Using this approach, and writing x = Pi(1), y = Pi(2), and z = Pi(0) it
can be seen that the set of equations can be reduced to the following two-dimensional
map [26]
x′ = x+ axz − bxy, (28)
which is an evolution equation for prey density x, and
y′ = y + bxy − cy, (29)
which is an evolution equation for predator density yℓ. Notice that
z = 1− x− y. (30)
The fixed point of this map are those that represent the stationary solutions x′ = x
and y′ = y, and they correspond to the three following solutions x1 = 0, y1 = 0, and
x2 = 1, y2 = 0, and x3 = a/b, y3 = (1 − c/b)/(1 + b/a). The first solution corresponds
to an absorbing states where both species have been extinct. The second corresponds
Stable oscillations of a predator-prey probabilistic cellular automaton 10
−0.5 0 0.5
Figure 3. Phase diagram of the patch model. The continuous line represents
the transition, c1(p), between the prey absorbing (A) state and the active species
coexistence (C) state. The dashed line separates the two asymptotic time behavior of
the active state.
to an absorbing state where predators have extinct. The third solution corresponds to
an active state where prey and predator coexist.
Due to the constraint (6), the parameters a, b and c are not all independent and
only two can be chosen as independent. For this reason it is convenient to introduce the
following parametrization [10]
− p, b =
+ p, (31)
and consider p and c as the independent variables. The parameter p is such that
−1/2 ≤ p ≤ 1/2 and 0 ≤ c ≤ 1 as before. This parametrization will useful in the
determination of the different phases displayed by the model.
A linear stability analysis reveals that solution the (x1, y1) is a hyperbolic saddle
point for any set of the parameters a, b and c and so it is always unstable. The empty
absorbing state will never be reached. A linear stability analysis also shows that the
solution (x2, y2) is a stable node in the following region of the phase diagram c > c1
where
c1(p) =
(1 + 2p). (32)
The active solution is stable in the region c < c1 and is attained in two ways: by an
asymptotic stable focus, where the successive interactions of the map show damped
oscillations; or trough an asymptotic stable node. In the phase diagram of figure 3 we
show the transition line between the prey absorbing state and the active state given by
c = c1.
In figure 4 it is shown the behavior of the densities against the parameter c, the
probability of predators death, for the special case p = 0.2. In terms of phase transitions
what happens is that in the phase diagram there is a transition line separating the
absorbing prey phase and the active phase which is characterized by constant and
nonzero densities of prey and predator.
Stable oscillations of a predator-prey probabilistic cellular automaton 11
0 0.1 0.2 0.3 0.4 0.5 0.6
predator
Figure 4. Densities of predator and prey as functions of the parameter c for p = 0.2,
for the patch model.
We may conclude that the mean-field approximation for the predator-prey
probabilistic cellular automaton with rules (7), (8), and (9) is capable to show, under a
robust set of control parameters, that prey and predators can coexist without extinction.
However the map defined by equations (28) and (29) is not able to describe self-sustained
oscillations of species population densities.
3.3. Quasi-spatial model
In order to find if oscillations in the species populations can be described within a mean-
field approach we consider a more sophisticated approximation, the pair-approximation,
where correlations of two neighbor sites are included in the time evolution equations for
the densities. This is the lowest order mean-field approximation which takes into account
the spatial localization of neighboring individuals.
In this analysis we will maintain the correlations of one site and the correlations
of two-sites in the equations. Correlations of three and four neighbor sites will be
approximated by means of equation (27). With these approximation the model is
described by the following set of five coupled equations
x′ = au− bv + x, (33)
y′ = bv + (1− c)y, (34)
u′ = αa[
] + [(1− βa)− αa
][u− αb
+ αac
+ c[(1− βb)v − αb
], (35)
v′ = αb[βa
] + αa(1− c)
+ αb[
] + (1− c)[(1− βb)v − αb
], (36)
Stable oscillations of a predator-prey probabilistic cellular automaton 12
−0.5 0 0.5
Figure 5. Phase diagram of the quasi-spatial model. The upper continuous
line represents the transition, c1(p), between the prey absorbing (A) state and the
nonoscillating coexistence (CNO) state. The lower continuous line represents the
transition, c2(p), between the nonoscillating coexistence and the oscillating (COS)
coexistence state. The dashed line separates the two asymptotic time behavior of
the nonoscillating coexistence state.
w′ = αb[(1− βa)
] + (1− c)[w − αa
+ c[βbv + αb
] + c(1− c)s, (37)
where α and β are numerical fractions defined by α = (ζ − 1)/ζ and β = 1/ζ where
ζ is the coordination number of the lattice. For the present case of a square lattice,
ζ = 4 so that α = 3/4 and β = 1/4. We are using the following notation: u = P (0, 1),
v = P (1, 2), and w = P (0, 2) and also r = P (1, 1), q = P (0, 0) and s = P (2, 2). The
last three correlations are not independent but are related to others by
r = x− u− v, (38)
q = z − u− w, (39)
s = y − v − w. (40)
We used the properties P (1, 0) = P (0, 1), P (1, 2) = P (2, 1) and P (2, 0) = P (0, 2), that
follows from the assumption that space is isotropic and homogeneous.
We have analyzed numerically the five-dimensional map, described by the set of
equations (33), (34), (35), (36) and (37), and we have obtained four types of solutions.
Two solutions are trivial and are given by x = y = u = v = w = 0 and x = 1,
y = u = v = w = 0. They correspond to the empty and prey absorbing states,
respectively. The empty absorbing state, where both species have been extinct is
an unstable solution and never occurs. However, the prey absorbing state is one of
the possible stable stationary solutions and is stable above the critical transition line
Stable oscillations of a predator-prey probabilistic cellular automaton 13
0 0.1 0.2 0.3 0.4 0.5 0.6
predator
Figure 6. Densities of predator and prey as functions of c for the quasi-spatial model,
for p = 0.2.
c = c1(p) shown in figure 5. Below this line it becomes unstable giving rise to the active
state.
The other solutions correspond to the active states where both prey and predator
coexist. These solutions are of two kinds: a stationary solution where there is a
coexistence of the two species with densities constant in time, which we call the
nonoscillating (NO) active state; and another solution where both population densities
oscillate in time. This solution corresponds to a self-sustained oscillation of the predator-
prey system and will be called the oscillating (O) active state. In the phase diagram of
figure 5 there is a line c = c2(p) that separates the NO and O active phases. Figure 6
shows the behavior of the densities as a function of c for p = 0.
3.4. Oscillatory behavior
In figure 7 we show an example of self-sustained oscillations of the densities of prey
and predators as functions of time. The oscillating solutions are attained from the
nonoscillating solutions by a Hopf bifurcation. The fixed point associated to this solution
is an unstable center which produces a stable limit cycle as trajectories in the phase-
space of the predator density versus prey density, as can be seen in figure 7. Notice
that the oscillations are not damped and have a well defined period which is the same
for the prey density and for the predator density, which implies that the oscillations are
coupled. A maximum of predators always follow a maximum of prey. This means that
the abundance of prey is a condition that favors the increase in the number of predators.
As the predator number increases the prey population decays. The evanescence of prey
is followed by a decrease in the predator number, giving conditions for the increase of
prey population until the cycle starts again.
A well defined oscillatory behavior is found for many biological population, the
most famous being the one related to the time oscillations of the population of lynx
and snowshoe hare in Canada for which data were collected for a long period of time
Stable oscillations of a predator-prey probabilistic cellular automaton 14
1000 1200 1400 1600 1800 2000
predator
0 0.05 0.1 0.15
Prey population density
Figure 7. (a) Densities of predator and prey as functions of time and (b) density of
predator versus density of prey, for the quasi-spatial model, for p = 0 and c = 0.016.
[7, 8]. If the hare population cycles are mainly governed by the lynx cycle then the
oscillations shown by the present model reproduces qualitatively some of the features of
this predator-prey dynamics.
Next we analyze the behavior of the frequency and amplitude of oscillations. Fixing
the parameter p and varying the parameter c, we verify that in all the oscillating region
the frequency of oscillation is proportional to parameter c,
ω ∼ c, (41)
as can be seen in figure 8. Low frequencies are associated to low values of c; what means
that, for small values of c, the greater the predator lifetime the greater will be period of
the oscillation. As to the amplitude A of the oscillations, we have verified, that fixing
the value of p and varying the parameter c, it increases as c decreases. Our results show
that,
A ∼ (c− c2)
1/2, (42)
as expected for a Hopf bifurcation and shown in figure 8. The transition line c = c2
from the oscillating phase to the nonoscillating phase can either be obtained by using the
criterion given by equation (42) or by analyzing the eigenvalues associated to the map
given by the set of equations (33), (34), (35), (36) and (37). This last criterion means
to find the points of phase diagram such that the real part of the dominant complex
eigenvalue equals 1.
4. Discussion and conclusion
The main result coming from the pair mean-field approximation applied to the predator-
prey PCA is that it is possible to describe coexistence and self-sustained time oscillations.
Moreover, these are stable oscillations. Given a set of parameters, just one limit cycle
is achieved, no matter what the initial conditions are. This property is essential in
describing a biological system since a small variation in the initial condition can not
Stable oscillations of a predator-prey probabilistic cellular automaton 15
0 0.005 0.01 0.015 0.02
0 0.005 0.01 0.015 0.02
Figure 8. (a) Frequency of oscillations ω versus the parameter c. The frequency
vanishes linearly as one approaches c = 0. (b) Amplitude A of oscillations versus c
near the Hopf bifurcation point c2 = 0.019. The quantity A
2 vanishes linearly when
c → c2 in accordance with a Hopf bifurcation.
modify the amplitude, frequency and mean value of the time oscillation densities of a
predator-prey system. Similar results were obtained from a continuous time version of
the present model [10]. Although the simple mean-field equations are essentially the
same in both versions this is not the case concerning the pair mean-field approximation.
The time evolutions of the pair correlations for the PCA, presented here, depend on
higher order correlations (up to fourth) when compared to the ones of the continuous
version (up to third).
The model studied here is a spatial structured model with individuals residing in
sites of a lattice and described by discrete dynamic variables. When we perform simple
mean-field approximation we neglect all the correlations of sites in the lattice. But we
take into account that there are limited resources for the surveillance of each species.
For example in the time evolution equation for the density of prey we have an explicit
term relative to reaction of birth of prey which is the product of the density of prey x by
the density of empty sites z = (1−x−y). This coincides with an extended patch model
approach for predator-prey systems. The presence of this term is what differs the simple
mean-field equations from the Lotka-Volterra equations. However, taking into account
the limitation of space and resources the simple mean-field equations are not sufficient
to get self-sustained oscillations although able to describe damped time oscillations of
population densities.
To get self-sustained time oscillations we had to proceed to the next level of
approximation in which a pair of nearest neighbor sites is treated exactly. This
approximation can be seen as representing a pair of nearest neighbor sites immersed
in a mean field produced by the rest of the lattice. The most important feature being
the fact that the two sites of this pair can be seen as localized in space. The set of
five equations which results from the pair approximation for the PCA is indeed able
to produce self-sustained oscillations of population densities. It presents an important
Stable oscillations of a predator-prey probabilistic cellular automaton 16
property that the Lotka-Volterra model lacks, namely, the oscillating solutions are stable
and are unique for a given set of the control parameters.
Acknowledgements
The authors have been supported by the Brazilian agency CNPq.
References
[1] Lotka A 1920 J. Am. Chem. Soc. 42 1595
[2] Lotka A 1920 Proc. Nat. Acad. of Sciences USA 6 410
[3] Lotka A 1924 Elements of Mathematical Biology (new York: Dover)
[4] Volterra V 1931 Leçons sur la Théorie Mathématique de la Lutte pour la Vie Paris: Gauthier-
Villars)
[5] Haken H 1976 Synergetics, An Introduction (Berlin: Springer)
[6] Renshaw E 1991 Modelling Biological Populations in Space and Time (Cambridge: Cambridge
University Press)
[7] Hastings A 1997 Population Biology: Concepts and Models (New York: Springer)
[8] Ricklefs R E and Miller G L 2000 Ecology (New York: Freeman)
[9] Tainaka K 1989 Phys. Rev. Lett. 63 2688
[10] Satulovsky J and Tomé T 1994 Phys. Rev. E 49 5073
[11] Durrett R and Levin S 1994 Theor. Popul. Biol. 46 363
[12] Hanski I and Gilpin M E (eds.) 1997 Metapopulation Biology: Ecology, Genetic and Evolution
(San Diego: Academic Press)
[13] Satulovsky J and Tomé T 1997 J. Math. Biol. 35 344
[14] Tilman D and Kareiva P 1997 Spatial Ecology: the Role of Space in Population Dynamics and
Interactions (Princeton: Princeton University Press)
[15] Fracheburg L and Krapvisky P 1998 J. Phys. A 31 L287
[16] Liu Y C, Durrett R and Milgroom M 2000 Ecol. Model. 127 291
[17] Antal T, Droz M, Lipowsky A and Odor G 2001 Phys. Rev. E 64 036118
[18] Ovaskanien O, Sato K, Bascompte J and Hanski I 2002 J. Theor. Biol. 215 95
[19] Aguiar M A M, Sayama H, Baranger M and Bar-Yam Y 2003 Braz. J. Phys. 33 514
[20] de Carvalho K C and Tomé T 2004 Mod. Phys. Lett. B 18 873
[21] Nakagiri N and Tainaka K 2004 Ecol. Model. 174 103
[22] Szabó G 2005 J. Phys. A 38 6689
[23] Stauffer D, Kunwar A and Chowdhury D 2005 Physica A 352 202
[24] de Carvalho K C and Tomé T 2006 Int. Mod. Phys. C 17 1647
[25] Mobilia M, Georgiev I T and Tauber U C 2006 Phys. Rev. E 73 040903
[26] Arashiro E and Tomé T 2007 J. Phys. A 40 887
[27] Liggett T M 1985 Interacting Particle Systems (New York: Springer)
[28] Marro J and Dickman R 1999 Nonequilibrium Phase Transitions (Cambridge: Cambridge
University Press)
[29] Tomé T and de Oliveira M J 2001 Dinâmica Estocástica e Irreversibilidade (São Paulo: Editora
da Universidade de São Paulo)
[30] Tomé T 1994 Physica A 212 99
[31] Tomé T, Arashiro E, Drugowich de Feĺıcio J R and de Oliveira M J, 2003 Braz. J. Phys. 33 458
[32] Dickman R 1986 Phys. Rev. A 34 4246
[33] Tomé T and Drugowich de Feĺıcio J R 1996 Phys. Rev. E 53 3976
[34] Levins R 1969 Bull. Entomol. Soc. Am. 15 237
	Introduction
	Model
	Probabilistic cellular automaton
	Predator-prey probabilistic cellular automaton
	Time evolution equations for state functions
	Mean-field approximation
	One and two site approximations
	Patch model
	Quasi-spatial model
	Oscillatory behavior
	Discussion and conclusion
ABSTRACT
  We analyze a probabilistic cellular automaton describing the dynamics of
coexistence of a predator-prey system. The individuals of each species are
localized over the sites of a lattice and the local stochastic updating rules
are inspired on the processes of the Lotka-Volterra model. Two levels of
mean-field approximations are set up. The simple approximation is equivalent to
an extended patch model, a simple metapopulation model with patches colonized
by prey, patches colonized by predators and empty patches. This approximation
is capable of describing the limited available space for species occupancy. The
pair approximation is moreover able to describe two types of coexistence of
prey and predators: one where population densities are constant in time and
another displaying self-sustained time-oscillations of the population
densities. The oscillations are associated with limit cycles and arise through
a Hopf bifurcation. They are stable against changes in the initial conditions
and, in this sense, they differ from the Lotka-Volterra cycles which depend on
initial conditions. In this respect, the present model is biologically more
realistic than the Lotka-Volterra model.

<|endoftext|><|startoftext|>
Introduction
	Observations and data reduction
	WHT spectroscopy
	Calar Alto photometry
	Data analysis
	Radial velocity analysis
	Light curve analysis
	Discussion
	Conclusions
ABSTRACT
  Intermediate polars (IPs) are cataclysmic variables which contain magnetic
white dwarfs with a rotational period shorter than the binary orbital period.
Evolutionary theory predicts that IPs with long orbital periods evolve through
the 2-3 hour period gap, but it is very uncertain what the properties of the
resulting objects are. Whilst a relatively large number of long-period IPs are
known, very few of these have short orbital periods. We present phase-resolved
spectroscopy and photometry of SDSS J233325.92+152222.1 and classify it as the
IP with the shortest known orbital period (83.12 +/- 0.09 min), which contains
a white dwarf with a relatively long spin period (41.66 +/- 0.13 min). We
estimate the white dwarf's magnetic moment to be mu(WD) \approx 2 x 10^33 G
cm^3, which is not only similar to three of the other four confirmed
short-period IPs but also to those of many of the long-period IPs. We suggest
that long-period IPs conserve their magnetic moment as they evolve towards
shorter orbital periods. Therefore the dominant population of long-period IPs,
which have white dwarf spin periods roughly ten times shorter than their
orbital periods, will likely end up as short-period IPs like SDSS J2333, with
spin periods a large fraction of their orbital periods.

<|endoftext|><|startoftext|>
Microsoft Word - Complexity_Considerations_FOL2.doc
Radosław Hofman, cSAT problem lower bound, 2007  
Abstract—This article deals with the lower bound that 
is considered as the worst case minimal amount of time 
required to calculate a problem result for cSAT (counted 
Boolean satisfiability problem). It uses the observation 
that Boolean algebra is a complete first-order theory 
where every sentence is decidable. Lower bound of this 
decidability is defined and shown. 
The article shows that deterministic calculation model 
made up of finite number of machines (algorithms), 
oracles, axioms, or predicates is incapable of solving 
considered NP-complete problem when its instance 
grows to infinity. This is a direct proof of the fact that P 
and NP complexity classes differ and oracle capable of 
solving NP-complete problems in polynomial time must 
consist of infinite number of objects (i.e., must be 
nondeterministic). 
Corollary of this article clears complexity hierarchy: 
P < NP 
Index terms—complexity class, P vs NP, Boolean 
algebra, first order theory, first order predicate calculus. 
I. INTRODUCTION 
Unknown relation between P and NP  [5] complexity 
classes remains one of the significant unsolved problems in 
complexity theory. P complexity class consists of problems 
solvable by deterministic Turing machine (DTM) in 
polynomially bounded time, while NP complexity class 
consists of problem solvable by nondeterministic Turing 
machine (NDTM) in polynomially bounded time. This means 
that DTM can verify the solution of every NP problem in 
polynomially bounded time, even if polynomial algorithm for 
finding this solution is unknown  [13]. 
All known attempts to prove whether these classes are or 
are not equal could not convince the community that 
arguments used there are final. Problem with attempts 
showing that P=NP is mainly with counter examples 
provided for methods described by solvers (see for example: 
 [6],  [9]), especially for large instances. Problem with proof 
attempts that P≠NP touches mainly the difference between a 
problem and an algorithm. Proving the inequality of these 
classes is equivalent to proving that “there is no such 
algorithm that solves a particular NP problem in 
polynomially bounded time.” Algorithm is an immaterial 
object, so proving that it does not exist is rather difficult. 
Can then the inequality of complexity classes be proved? 
One of the possible ways is to use the properties of first-order 
theory. Useful properties include every sentence ϕ in theory 
T is provable if there exists a set of axioms a, b, c, … such 
that ϕ can be obtained using these axioms and the inference 
Manuscript created December 29, 2006. Author is Ph. D. student of 
Department of Information Systems at The Poznan University of 
Economics, http://www.kie.ae.poznan.pl, email: radekh@teycom.pl. 
rules “modus ponens” and “universal generalization” (a ∧ b ∧ 
c… → ϕ)  [2]. 
II. BACKGROUND 
This section presents some background for the first-order 
theory and other rules used in the article. 
A. First-Order Theories 
First-order theory is a given set of axioms in some 
language. Language consists of logical symbols and set 
constants, functions, and relation symbols (predicates). 
Terms and formulas are built from language and give rise to 
sentences, which are formulas with no free variables in body. 
Theory is then a set of sentences which may be closed if it 
contains all consequences of its elements. Theory can be also 
complete (i.e. every sentence can be proved or disproved), 
consistent (not every sentence is provable), or decidable 
(every sentence can be proved or disproved and there exists a 
computational path (algorithm) showing which sentences are 
provable). 
An example of first-order theory that is complete and 
decidable is Boolean algebra  [16] or Zermelo–Frænkel set 
theory. 
B. First-Order Logic 
First-order logic, also called first-order predicate calculus 
(FOPC), is a system of deductions extending propositional 
logic. Atomic sentences of first-order logic are called 
predicates and are written usually in the form P(t1, t2, …,tn). 
An important ingredient of the first-order logic not found in 
propositional logic is quantification. 
In 1929, Gödel  [8] proved that every valid logical formula 
is valid in first-order logic. In other words, it is proved that 
for complete first-order theory, inference rules of FOPC are 
sufficient to prove any valid formula. 
First-order predicate calculus language consists of 
predicates, constants, functions, variables, logical operators 
(NOT, OR, AND), quantifiers, parentheses, and some types 
of equality symbol. There is also a set of rules for recognition 
of terms and well-formed formulas (wffs). 
There are four axioms for quantification: 
1) PRED-1: (∀ x Z(x)) → Z(t) 
2) PRED-2: Z(t) → (∃ x Z(x)) 
3) PRED-3: (∀ x (W → Z(x))) → (W → ∀ x Z(x)) 
4) PRED-4: (∀ x (Z(x) → W)) → (∃ x Z(x) → W) 
An important theorem for first-order logic is the outcome 
from Herbrand’s work (known as Herbrand’s theorem). It 
states that in predicate logic without equality, a formula A in 
prenex form (all quantifiers at the front) is provable if and 
only if a sequent S comprising substitution instances of the 
quantifier-free subformula of A is propositionally derivable, 
and A can be obtained from S by structural rules and 
quantifier rules only. In other words, it states that the formula 
cSAT problem lower bound 
Radosław Hofman, cSAT problem lower bound, 2007  
is provable, if, and only if we can rewrite it without quantifier 
substituting values and obtain provable formula. For 
example: 
∀ x Z(x) = Z(0) ∧ Z(1) 
∃ x Z(x) = Z(0) ∨ Z(1) 
C. Boolean Algebra 
Boolean algebra (also called Boolean lattice) is an 
algebraic structure containing objects and operations upon 
them and set of axioms (see Section D). It consists of one 
unary operation ¬ (not) and two binary operations ∧ (and), ∨ 
(or) also with two distinct elements 0 (constant representing 
false), 1 (constant representing true). Language of Boolean 
algebra considered as language for first order logic also 
contains symbols: = (equality), ⇒ (implication), parentheses 
and quantifiers, ∀ (universal), and ∃ (existential). 
Boolean algebra has the essentials of logic properties as 
well as all set operations (union, intersection, complement). 
D. Axioms of Boolean Algebra 
Given below is a complete list of Boolean algebra axioms. 
This set is not a minimal set of axioms (especially staring 
from  Ax13)) – some axioms can be derived from others, but it 
does not change the reasoning used in this article (the list is 
larger only for clearness and ensuring that it is complete): 
Ax1) a = b can be written as (a ∧ b) ∨ (¬a ∧ ¬b) 
Ax2) a ⇒ b = ¬a ∨ (a ∧ b) 
Ax3) a ∨ (b ∨ c) = a ∨ b ∨ c = (a ∨ b) ∨ c 
Ax4) a ∧ (b ∧ c) = a ∧ b ∧ c = (a ∧ b) ∧ c 
Ax5) a ∨ b = b ∨ a 
Ax6) a ∧ b = b ∧ a 
Ax7) a ∨ (a ∧ b) = a 
Ax8) a ∧ (a ∨ b) = a 
Ax9) a ∨ (b ∧ c) = (a ∨ b) ∧ (a ∨ c) 
Ax10) a ∧ (b ∨ c) = (a ∧ b) ∨ (a ∧ c) 
Ax11) a ∨ ¬a = 1 
Ax12) a ∧ ¬a = 0 
Ax13) a ∨ a = a 
Ax14) a ∧ a = a 
Ax15) a ∨ 0 = a 
Ax16) a ∧ 1 = a 
Ax17) a ∨ 1 = 1 
Ax18) a ∧ 0 = 0 
Ax19) ¬0 = 1 
Ax20) ¬1 = 0 
Ax21) ¬(a ∨ b) = ¬a ∧ ¬b 
Ax22) ¬(a ∧ b) = ¬a ∨ ¬b 
Ax23) ¬¬a=a 
Using universal generalization, one may add in every 
axiom definition, the universal quantifier stating “for all x 
axiom body” (). 
E. Computational Tree 
Corollary of the considerations stated earlier establishes 
that every formula in Boolean algebra is decidable. It is said 
to be proved (or called “tautology”) if there exists a 
transformation path from a set of axioms to a sentence that we 
are trying to prove. Until the authors are discussing FOPC, 
one may say that every sentence is provable, if, and only if we 
can start with axiom and repeatedly apply “modus ponens” or 
“universal generalization” and obtain this sentence  [2]. 
One may then consider every possible (provable) sentence 
to be deducible from axioms, which may be presented as the 
graph shown in Fig.  1. 
Axiom 1 Axiom 2 Axiom 3
Axiom n
Sentence 1.1 Sentence 1.2 Sentence 1.3 Sentence 1.m...
Modus ponens
General universalization
  Figure 1 Example of deducible tree 
Axioms may also be the result of computations, especially 
when they are not independent (e.g., ZF axioms) or when 
computations fall in a cycle. Usually, during computation one 
would skip deduction to already proven sentences because it 
does not introduce any new information, so deductions to 
axioms would have been omitted. 
F. Inference Rules and Deduction 
Modus ponens is an inference rule using the reasoning: if a 
and a → b are both proved, then b is also proved. 
Universal generalization is an inference rule using the 
reasoning: if P(a) is proved, and a is a free variable, then ∀ a 
P(a) is also proved. 
Deduction theorem (in fact deduction meta-theorem) states 
that if formula F can be deduced from E, then the implication 
P → Q can be directly shown to be deducible from the empty 
set. Using symbol “├” for deducible, one may write: if P ├ Q 
then ├ P → Q. One may generalize it to a finite sequence of 
assumption formulas P1, P2, P3, …, Pn ├ Q: 
P1, P2, P3… Pn-1 ├ Pn → Q and repeat it until we obtain the 
empty set on the left-hand side: ├ (P1 → (…(Pn-1 → (Pn → 
Q))…). 
Deduction follows three kinds of steps: setting up a set of 
assumptions (hypothesis), reiteration - calling hypothesis 
made previously to make it recent, and deduction, which is 
removing recent hypothesis. If one wants to convert proof 
done using deduction meta-theorem to axiomatic proof, then 
usually the following axioms would have been involved: 
1) P → (Q → P) 
2) (P → (Q → R)) → ((P → Q) → (P → R)) 
3) Modus ponens: (P ∧ (P → Q)) → Q 
G. Corollaries 
Theorem 1––If formula expressible in FOPC language is 
deducible, then every possible transformation of this formula 
obtained by usage inference rules and axioms is also 
deducible, and can be expressed in the same language. 
Theorem 2––If every transformation of formula is 
Radosław Hofman, cSAT problem lower bound, 2007  
expressible in FOPC, then the optimal for certain resource for 
chosen computational model is also expressible in the same 
language. 
Proofs of these theorems are provided in the appendix 
( VI.A and  VI.B). 
One needs to focus on Theorem 1 and have good 
understanding of significance of Gödels work. Let there be 
considered some formula ϕ which is intended to be proven or 
disproved. Assuming that there exists some deterministic 
transformation T1 which transform formula ϕ to ϕ1: 
ϕ1=T1(ϕ), there can also exist another transformation T2 
taking ϕ1 as input and returning ϕ2 as output: 
ϕ2=T2(ϕ1)=T2(T1(ϕ)). Continuing this idea of 
transformations one reaches ϕTRUE or ϕFALSE formula 
allowing to prove or disprove ϕ. 
Theorem 1 is in fact summary of Gödels Theorem  [8], 
stating that ϕ, ϕ1, ϕ2… ϕx can all be expressed in FOPC 
language. 
Above in fact causes statement of Theorem 2: if every 
possible transformation / reformulation of formula is 
expressible in FOPC language, then (by power of every 
quantification) also optimal transformation is expressible in 
FOPC language. It does not matter what is nature of this 
transformations. If they are deterministic (for certain input 
always returns same output in finite number of steps). 
Optimal way to solve the problem (decide on formula) can be 
then written as: ϕTRUE/FALSE=Tx(Tx-1(Tx-1(…(T2(T1(ϕ)))…)) 
III. CSAT LOWER BOUND 
A. Problem Definition 
In this work, the authors consider  a problem called “Count 
of Satisfaction of Boolean Expression for formula ϕ.” This 
problem is almost the same as the classical SAT problem, but 
instead of the question “Is there an assignment to variables 
such that formula ϕ is satisfied?”, they ask the question “Are 
there at least L assignments such that formula ϕ is satisfied?”. 
L in problem instance is written unary, and the remaining 
part of the instance is exactly the same as in SAT problem 
(the authors assume that it is in conjunctive normal form 
(CNF)). 
It is easy to show that the problem is in NP – Guess & 
Check algorithm, for NDTM requires O(L*n) steps to check 
(certificate size is L*v where v is the number of Boolean 
variables used). 
It is also easy to show that the problem is NP-complete. 
One can show it using reduction from SAT problem and ask 
the question “Is there at least L=1 assignment such that 
formula ϕ is satisfied?” 
B. Measurable Predicate 
Problem question is easy to understand by a human, but it 
certainly extends to FOPC language defined in Section  II. To 
express it in a defined language, one  needs to define 
predicate “µ” – measure. This predicate is a representation of 
sigma-additive (countably additive) measurable function 
known as “set cardinality.” Definition of this predicate 
requires one constant variable n – number of different 
Boolean variables used. Predicate “µ” will measure number 
of assignments satisfying formula ϕ. 
1: µ(∅) :- 0 
2: µ(TRUE) :- 2n 
3: µ(FALSE) :- 0 
4: µ(¬ϕ1) :- 2n-µ(ϕ1) 
5: µ(a1) :- 2n-1 
6: µ(a1∧a2…∧ak) 
 ∃ ai, aj: i≠j ∧ ai=¬aj :- µ(FALSE) 
 ∃ ai, aj: i≠j ∧ ai=aj :- µ(a1∧a2…aj-1∧aj+1...∧ak) 
  :- 2n-k 
µ(ϕ1∨ϕ2) :- µ(ϕ1)+µ(ϕ2)−µ(ϕ1∧ϕ2) 
One may think of adding some more conditions to this 
predicate, but the list given earlier is sufficient to calculate 
measure for every formula for a defined language (growth of 
number of axioms and definitions is discussed in Section  H). 
It is also compliant with sigma-measurable function 
definition. 
One may also observe that usage of measure leads to 
exponential number of calculations required for CNF. This is 
a consequence of sigma-additivity property: for any sets a 
and b: µ(a∪b)=µ(a)+µ(b)−µ(a∩b). If one considers m sets, 
then this function transforms to: 
⋅−=
})..({
aaPS Sa
aa IU µµ , where 
P({a1..am}) is power set over m sets, which means that it has 
2m objects in it. If one is able to calculate the measure of a set 
or intersection of sets, then the calculation of union of m sets 
requires Ω(2m) intersections to be measured. 
Problem question using predicate “µ” is then: “µ(ϕ)≥L?”. 
Direct calculation may not  be the only possible way for 
solving problems, and the authors now analyze the definition 
of lower bound, deterministic and nondeterministic 
computation models. 
C. Lower Bound Definition 
Lower bound in Big-O notation is denoted as Ω(g(n)), and 
for its use in this article, one may assume that it is used to 
express problem lower bound. Interpretation of lower bound 
is “minimum value of function in the worst case,” and is 
defined as f(n) ∈ Ω(g(n)) ⇔ 0
inflim >
∞→ ng
In most of the complexity considerations, two types of 
resources are used in the expressions of problem lower 
bounds or algorithm upper bounds. These resources are time 
(number of steps required) and space (number of 
symbols/tape cells required). 
Theorem 3––Time complexity of problem/algorithm is 
always greater than or equal to space complexity. This 
theorem is proved in Section  VI.C. 
Theorem 4––Minimal number of symbols required for 
unambiguous description of object is Ω(log(N)), where N 
represents the number of possible objects to be stored. In 
other words, this means that if one has N different objects that 
may occur in computations at a certain step and would want 
to store information on which one occurred, then Ω(log(N)) 
Radosław Hofman, cSAT problem lower bound, 2007  
symbols are required. This theorem is proved in Section 
 VI.D. 
In this work, the authors mainly  consider time complexity 
using observation from Theorems 3 and 4. 
Theorem 5––Lower bound calculated to express a specific 
resource (time or space) usage for deciding formula 
expressed in FOPC for a chosen computational model is 
equal to the minimal usage of this resource for the best 
possible transformation of formula in this language. This 
theorem is proved in Section  VI.E. 
Theorem 5 is consequence of Theorem 2. If one had set of 
deterministic transformations expressing optimal way to 
solve the problem:  
ϕTRUE/FALSE=Tx(Tx-1(Tx-1(…(T2(T1(ϕ)))…)) then by power of 
definitions it can be shown that lower bound for problem 
solution is exactly equal to time required by this optimal 
solution. This is consequence of lower bound definition – it is 
asymptotically minimal amount of resource required to solve 
the problem. Repeating most important observations till this 
point: 
a) formula ϕ can be expressed in FOPC language (from 
Gödels Theorem  [8]) 
b) any possible transformation of formula can be 
expressed in FOPC language ϕ1=T1(ϕ) (Theorem 1) 
c) if every deterministic transformation can be expressed 
in FOPC language then also optimal deterministic 
transformation can be expressed in FOPC language   
(Theorem 2)  
ϕTRUE/FALSE=Tx(Tx-1(Tx-1(…(T2(T1(ϕ)))…)) 
d) resource cost of optimal transformation of formula is 
equal to deterministic lower bound of the problem 
Roughly speaking, lower bound should be considered as 
the minimal amount of resource used for computation for the 
worst case. In case of time, it is the minimal number of 
operations to perform. It is even intuitive to see that if one 
could express calculation in some “steps,” then lower bound 
is equivalent to minimal number of “steps” required in the 
worst case. 
D. Nondeterministic Calculation Model 
Nondeterministic calculation model may be considered as 
the “luckiest possible guesser.” Such an approach expresses 
that the role of NDTM to answer a problem question is to 
guess the certificate and check it. If the check can be 
performed in O(nc) for some constant c, then one considers 
the problem as part of NP complexity class. 
One has to remember that DTM is a “special case” of 
NDTM where from every machine state, only one possibility 
to choose the next state exists, regardless of the symbol in the 
cell where the tape read/write head is positioned. This means 
that every problem solvable by DTM in O(nc) steps is 
solvable also on NDTM in at most same number of steps (or 
may be less). 
A good example expressing the differences between DTM 
and NDTM is the 2SAT problem (classic satisfaction of 
Boolean expression in CNF problem, but where in each 
clause there are at most two literals). This problem is solvable 
by DTM in O(n3) steps, but NDTM may guess the correct 
assignment and verify it in O(n). 
In terms of first-order logic and Herbrand’s theorem, one 
can see that NDTM is a verifier of Herbrand’s subformulas. 
When the formula is expressed using existential quantifier:  
∃ <a: assignment> F(a), then, according to Herbrand’s 
theorem, it is equivalent to: F(a1) ∨ F(a2)… ∨ F(ak). NDTM 
is able to check each of F(ai) simultaneously, even if the 
number of possible assignments is exponential, excepting 
when at least one of the computation paths led to an accepting 
state. 
One can see that for a nondeterministic calculation model 
problem, the number of steps of lower bound is equal to the 
minimal number of steps required to check the certificate. 
For example, for 2SAT problems one can have different 
approaches. 
Number of possible 
Herbrand’s 
subformulas 
Minimal number 
of steps to check 
each subformula 
Total 
calculatio
n cost 
1 2n N n 
(guessing only p 
variables) 
n*p3 N*p3 
(without splitting) 
3 n3 
 Table 1  Different approaches for nondeterministic 
calculation 
Table 1 presents different approaches differing mainly in 
the number of “guesses.” Calculation of problem lower 
bound for nondeterministic model of calculation returns the 
minimal number of steps required to check subformula. 
The last row presents the approach where the problem is 
not split, so it is calculated as in the deterministic model of 
calculations. 
E. Deterministic Calculation Model 
As mentioned in  Section D, deterministic model of 
calculations follows a single computation path. It is obvious 
that despite direct calculations, DTM can also perform Guess 
& Check algorithm (simulating NDTM). This time, the 
authors do not assume that DTM is the “luckiest possible 
guesser” and for lower bound complexity calculation of this 
approach, they have to assume that DTM is the “worst 
possible guesser.” This is also a consequence of the slight 
change in computation goal - NDTM has to “accept” when 
there is computational path leading to accepting state, while 
DTM has to “decide” on input, which means that the answer 
“NO” can be produced only when there is no possible way of 
reaching the accepting state (NDTM can be defined without 
rejecting state). 
Additionally, DTM requires an iterator (space on tape 
where number of current “guess” can be stored), which 
according to Theorem 4 requires Ω(log(H)), where H 
represents the possible number of “guesses.” 
Table 2 shows what time complexity would look like. 
Radosław Hofman, cSAT problem lower bound, 2007  
Number of 
possible 
Herbrand’s 
subformulas 
Minimal 
number of 
steps to 
check each 
subformula 
Total calculation 
cost 
1 2n N 2n*n+log(2n) 
2 2p n*p3 2n*n*p3+log(2p) 
K 1 n3 n3 
 Table 2 Different approaches for deterministic calculation 
In this table, the last row represents the minimal possible 
number of steps to calculate result. It is easy to show that for 
DTM, this row also presents deterministic problem lower 
bound because if any of the “guessing” approaches had been 
better, then it would have been used to present minimal 
deterministic calculation cost (we assume that values in the 
table are “best possible” not “best known” - see Theorem 5). 
F. cSAT Nondeterministic Algorithm Upper Bound 
Upper bound for algorithm solving cSAT problem is 
polynomial. It is a consequence of Herbrand’s theorem and 
ability of NDTM to: 
generate all possible subformulas in O(nc) 
verify each of them in O(nc) 
NDTM algorithm can be described using the following 
steps: 
1) Guess sets of measure L consisting of assignments of 
variables (time O(L*v)) 
2) Verify guessed set (time O(L*v)) 
This procedure leads to accepting the state (if at least one 
computation path is accepting) in at most O(L*v) steps and 
because instance size n∈Ω(L+v), the solution is provided in 
O(n2). 
G. cSAT Deterministic Lower Bound 
Now, using the observations described in the earlier 
sections, the authors calculate deterministic lower bound of 
cSAT problem (it is known that its nondeterministic upper 
bound is O(L*v)). 
First, one needs to write the problem in the FOPC 
language. One  uses the predicate µ: µ(ϕ)≥L. 
This problem may be considered to be harder than the 
classic SAT problem. If one tries to guess all possible 
subsets, then we would have Ω(
v22 ) subsets, so according to 
Theorem 4, it would require Ω(2v) symbols to store 
information about the considered subset, which, according to 
Theorem 3, leads to the conclusion that such a calculation 
requires at least Ω(2v) steps. “Guessing” only subsets of size 
L leads to Ω(2v) different subsets, so that it can be calculated 
by NDTM in polynomial time (Ω(v*L) steps), but requires 
Ω(2v*v*L) steps to calculate on DTM. 
In fact, following the assumption that DTM is the worst 
possible guesser, one may see that the number of hypotheses 
(“guesses”) used during computation can lead to an 
exponential usage of time if “depth” of hypothesis path is 
longer than O(log(n)) or any of the hypotheses has more than 
polynomial number of possible values. For example, if one 
states hypothesis A with possibilities, it is true or false 
(constant number of possibilities) and it is followed by 
hypothesis B (true or false), etc.; we need O(n) hypotheses 
before we can decide on formula, then in the worst case we 
require Ω(2n) steps to give the answer “NO.” 
Leaving then all Guess & Check approaches, the authors 
try to determine the minimal possible number of steps for 
DTM to decide on problem input. According to Theorem 5 
and considerations from the earlier sections, the authors 
conclude that the shortest possible path consists of steps 
transforming input formula to axioms of theory. If one can 
show that every transformation requires exponential number 
of steps or usage of object using exponential number of 
symbols to store, then it will be direct proof that lower bound 
of cSAT problem is over-polynomial. 
When will one be able to observe exponential growth of 
minimum number of required steps? If after using an axiom 
or predicate, one will obtain a formula of multiplicative 
length by a factor greater than 1. For example, if for formula 
of size n1 (considered to be in CNF), the authors use  Ax9) for 
one parenthesis, they obtain new formula in the format 
v1∧(n2,1)∨v2∧(n2,2)∨…∨vm∧(n2,m). In each of the m parts, 
one can use a variable from the beginning to remove all its 
negations from body, so |n2,*|<|n1|−m, but for very large n1, 
these parts of formula will still require further 
transformations, which if done only with  Ax9) would lead to 
exponential growth. Concluding this paragraph, one may say 
that if transformation reduces size of formula substring by 
O(nc) and multiplies this shorter string in formula making 
string grow to n2, where n2∈Ω(n1*c), then this path leads to 
exponential growth of formula and thus its lower bound is 
Ω(2n). 
In Table 3, the authors present the effect obtained by usage 
of every possible transformation, but before this the authors 
define polynomial purifying function for formula. This 
function will use axioms  Ax7),  Ax8),  Ax11),  Ax12),  Ax13), 
 Ax14),  Ax15),  Ax16),  Ax17),  Ax18),  Ax19),  Ax20),  Ax23), 
and two observations: 
µ(ϕ1)=µ(ϕ1∧(TRUE))=µ(ϕ1∧(v1∨v2∨…∨TRUE)); 
µ(v1∧ϕ1)=µ(ϕ2) where ϕ2 is obtained by replacing every 
occurrence of v1 in ϕ1 with constant TRUE. 
Roughly speaking, this function  looks for variables that 
can be cleared out from formula and prepare it for the next 
step of calculation. The authors assume that at every step of 
calculation, formula is in a form not allowing the use of any 
of the above axioms or rules. It is also important to remember 
that the number of transformation rules does not matter - refer 
to Section  H. 
Transfor
mation 
used 
Length 
string 
used 
Result 
string 
length 
Lower 
bound for 
path 
Remarks for “worst 
case” 
 Ax1)    
 Ax2)    
These axioms cannot be 
used since input never 
contains these symbols 
 Ax3) n1 n1 Ω(cSAT) 
 Ax4) n1 n1 Ω(cSAT) 
These axioms do not 
change formula length 
Radosław Hofman, cSAT problem lower bound, 2007  
Transfor
mation 
used 
Length 
string 
used 
Result 
string 
length 
Lower 
bound for 
path 
Remarks for “worst 
case” 
 Ax5) n1 n1 Ω(cSAT) 
 Ax6) n1 n1 Ω(cSAT) 
 Ax7)    
 Ax8)    
These axioms cannot be 
used because formula is 
transformed by purifying
function 
 Ax9) 
m1+m2+
2*m1*
m2+nr-
 Ax10) 
m1+m2+
2*m1*
m2+nr-
Used on two parenthesis 
replaces them with string 
of size 2*m1*m2, after 
using these axioms 
purifying function will 
reduce size but in the 
worst case only by 2 
symbols 
 Ax11), 
 Ax12), 
 Ax13), 
 Ax14), 
 Ax15), 
 Ax16), 
 Ax17), 
 Ax18), 
 Ax19), 
 Ax20) 
These axioms cannot be 
used because formula is 
transformed by purifying
function 
 Ax21) n1 n1 Ω(cSAT) 
 Ax22) n1 n1 Ω(cSAT) 
These axioms do not 
change formula length 
 Ax23)    As  Ax20) above 
µ1    
µ2    
µ3    
These rules cannot be 
used because formula is 
transformed by purifying
function 
µ4 n1 n1 Ω(cSAT) 
Can be used with  Ax21), 
 Ax22) or  Ax23), but 
does not change length 
of formula 
µ5    As µ3 above 
µ6 m1+m2 
m1*m2 
n  Treating formula as 
consisting of two parts 
 Table 3 Lower bounds for every possible transformation 
of cSAT formula 
The variables used in Table 3 are the following: 
n1 – Length of the formula 
nr – Length of the remaining part of the formula 
m1 – Length of first part/parenthesis of the formula 
m2 – Length of second part/parenthesis of the formula 
p1 – Number of parentheses in the formula 
 – The authors consider asymptotic behavior of the 
function, so one may use kind of “mean” m1 – representing 
Ω(m1). 
It is clear that in the worst case,  Ax9),  Ax10), or µ6 have to 
be used several times before purifying function would make 
significant reduction of length. Lower bound is considered as 
the minimal worst case, so from this table it is clear that in the 
worst case it is Ω(mp)=Ω(2p*log(m)) and because p and m are 
both O(n), the whole lower bound is Ω(2n). 
H. More Conditions and More Axioms 
The above considerations prove clearly that deterministic 
lower bound for cSAT problem considered with FOPC 
language defined is exponential. But one needs to answer one 
additional question – Is it the result of too poor FOPC axioms 
set definition? Or are too few predicates defined? 
In  [1] Baker–Gill–Solovay theorem, authors have shown 
that problem “Is P equal to NP?” can be relativized using 
oracles. Oracle is a machine (black box) that gives answers to 
certain type of problems in one step. One can then imagine 
that there are a very large number of oracles which can solve 
certain types of instances. DTM task is to pick up one of them 
(or use them sequentially because if the number of oracles is 
an attribute of the machine, then even if we have used 
millions of them, the complexity in terms of relation to 
instance size is O(1)). 
The authors presume then, that for cSAT problem, there 
exists some deterministic algorithm calculating answer in 
O(nc) steps. Following lower bound calculation, one knows 
that this algorithm calculates a result requiring Ω(2n) 
transformations. Reminding optimal transformation as 
described above: ϕTRUE/FALSE=Tx(Tx-1(Tx-1(…(T2(T1(ϕ)))…)) 
and x∈Ω(2n). 
The presumption made here can be presented as existence 
of some transformation TA≡Tx-k(Tx-k-1(…(Tx-k-m(   ))…)), 
where m is exponential (TA is equivalent to exponential 
number of transformations in FOPC language, on optimal 
transformation path). The authors also assume that TA is 
deterministic, as deterministic lower bound is discussed in 
this section, and computable in polynomial number of steps. 
Now, one need to look on transformation path as on 
decision process, where at each step there is a decision to be 
made (decision which transformation is to be used). Each 
decision takes Ω(1) space to be stored. If m was dynamic and 
asymptotically equal to 2n, and also computable in 
polynomial number of steps then this would be equal to O(2n) 
decisions in O(nc) time what contradicts Theorem 3. 
Considering constant m (invented by algorithm designer) 
one may ask what is common in a large number of Turing 
machines (in the sense of defined algorithms), large number 
of axioms, large number of predicates, large number of 
oracles, or large m in above transformation TA? Their number 
is always a constant, even if very large. If then anyone defines 
multiple TMs, adds multiple axioms, predicates, defines 
large number of oracles, or finds one transformation TA 
equivalent to exponential number of other transformation, 
then in fact after defining them, one may have constant 
number of machines, axioms, predicates, transformations, 
and oracles. 
The authors then assume that there exists a machine 
denoted by LDTM in which implements are equivalent to 
large number of TMs, large number of axioms, predicates, 
implements TA and are connected to multitude of oracles. 
Such a defined machine is (by power of assumption) capable 
Radosław Hofman, cSAT problem lower bound, 2007  
of answering cSAT questions for a finite number of differing 
input types (number of types is a consequence of maximal 
input size). 
In other words, the authors assume that there exists a 
machine LDTM able to answer cSAT questions for instance 
size less than or equal to nl. They may consider having 
( )lncO  different input types, and each type is covered at least 
by one combination of axioms, predicates, or oracles 
allowing LDTM to give answer in O(n) steps. One may 
assume that there are gl such combinations. Denoting |gi(nl)| 
number of instances solved by ith combination of axioms and 
oracles for instance nl symbols long, the authors have 
assumed that: |||)(|
cng ≥∑
, where ( ) ||||
cng <∀
Now the authors determine the ability of this machine to 
answer cSAT question where n=nl*y. Number of 
combinations of axioms and oracles remain constant (gl), but 
they assume that each combination covers more instances 
(considered to be “same type”) gl(n)=gl(nl*y)≤gl(nl)
y. The 
number of possible types grows from ( )lncO  to 
( ) ( )( )ynyn ll cOcO =* . 
Calculating instances covered by gl definitions, we have: 
ngyng
|)(||)*(| . 
If one proves that for y growing to infinity 
lcng |||)(|
, it will be proof that not all instances of 
size O(nl*y) are solvable using LDTM definitions, so these 
large instances will require calculations using deterministic 
lower bound discussed in Section  G. Proof is presented in 
 VI.F, so corollary about impossibility to answer cSAT 
problems in polynomial time by LDTM holds. 
IV. COROLLARIES 
To summarize this article, the authors repeat the deduction 
path: 
a) formula ϕ can be expressed in FOPC language (from 
Gödels Theorem  [8]) 
b) any possible transformation of formula can be 
expressed in FOPC language ϕ1=T1(ϕ) (Theorem 1) 
c) if every deterministic transformation can be expressed 
in FOPC language then also optimal deterministic 
transformation can be expressed in FOPC language   
(Theorem 2)  
ϕTRUE/FALSE=Tx(Tx-1(Tx-1(…(T2(T1(ϕ)))…)) 
d) resource cost of optimal transformation of formula is 
equal to deterministic lower bound of the problem 
e) TA equivalent to exponential number of 
transformations computable in polynomial time 
contradicts Theorem 3 
f) large number of defined constant set of 
transformations, oracles, algorithms, machines ect. 
cannot cover all possible inputs for growing instance 
size (Theorem 6) 
g) optimal solution of problem requires Ω(2n) 
transformations 
h) deterministic lower bound for cSAT problem is then 
Ω(2n), then cSAT∉P (Theorem 5) 
i) NDTM solves cSAT in polynomial time, so 
cSAT∈NP 
j) this means that P≠NP 
If above considerations are correct then checking problem 
known to be in P has to show that it is in P using the same 
reasoning. Such check for 2SAT problem if presented in 
Section  VI.G - lower bound for this problem is Ω(nc). 
In  [1], there was presented an oracle A for which PA=NPA. 
Proof presented in Section  VI.F and problem lower bound 
lead to corollary, and if A is able to solve cSAT in 
polynomial time, then A has to be nondeterministic - it has to 
consist of infinite number of objects: deterministic oracles, 
algorithms, DTMs, axioms, rules etc. (the authors also 
consider NDTM as an infinite set of DTM duplicates - each 
for one computational branch). 
This work discusses problem P=NP, as described in  [5]. It 
may be said to relativize (see  [1]) to deterministic model of 
computation showing that deterministic calculation model 
made up of finite number of machines (algorithms), oracles, 
axioms, or predicates is incapable of solving the considered 
problem when its instance grows to infinity. 
On the other hand, one may conclude that if restrictions on 
maximum input length problem are set, then the problem can 
be proved to be in P using a large number of machines, 
axioms, algorithms, predicates, or oracles. 
For deterministic model of computation, one  knows then 
that P≠NP. 
Using Theorem 13 from  [12], the authors also know that 
NP-complete≠(NP-P). In this theorem, the authors have 
proved that: if P≠NP and U is some NP-complete language 
then U=A∪B where neither A nor B language is 
NP-complete (at least one of them is also not equal to P: A≠P 
∨ B≠P). Complexity classes can be put in a picture (Fig. 2): 
Figure 2 Relation between P, NP, and NP-complete classes 
V. REFERENCES 
[1] Baker T. P., Gill J., Solovay R., “Relativizations of the P 
=? NP question”, SIAM Journal on Computing, vol. 4, 
no. 4, 1975, pp. 431-442. 
[2] Barwise J., Etchemendy J., “Language Proof and Logic”, 
Seven Bridges Press, CSLI (University of Chicago 
Press) and New York, 2000. 
[3] Chandra A. K., Kozen D. C., Stockmeyer L. J., 
“Alternation”, Journal of the ACM, vol. 28, no. 1, 1981. 
[4] Cook S. A., “The complexity of theorem-proving 
procedures”, Proceedings of the Third Annual ACM 
Symposium on Theory of Computing, 1971, pp. 
151-158. 
Radosław Hofman, cSAT problem lower bound, 2007  
[5] Cook S. A., “P versus NP problem”, unpublished. 
Available at: 
http://www.claymath.org/millennium/P_vs_NP/Official
_Problem_Description.pdf 
[6] Diaby M., “P = NP: Linear programming formulation of 
the traveling salesman problem”, 2006, unpublished. 
Available at: http://arxiv.org/abs/cs.CC/0609005 
[7] Gallier J. H., "Logic for Computer Science: Foundations 
of Automatic Theorem Proving", Harper & Row 
Publishers, 1986. 
[8] Gödel K., "Über formal unentscheidbare Sätze der 
Principia Mathematica und verwandter Systeme", I. 
Monatshefte für Mathematik und Physik, vol. 38, 1931, 
pp. 173-198. 
[9] Hofman R., “Report on article: P=NP linear 
programming formulation of the traveling salesman 
problem”, 2006, unpublished. Available at: 
http://arxiv.org/abs/cs.CC/0610125 
[10] Jech T., “Set Theory: The Third Millennium Edition, 
Revised and Expanded”, ISBN 3-540-44085-2, 2003. 
[11] Karp R. M., “Reducibility among combinatorial 
problems”, In Complexity of Computer Computations, 
Proceedings of the Symposium of IBM Thomas J. 
Watson Research Center, Yorktown Heights, NY. 
Plenum, New York, 1972, pp. 85-103. 
[12] Landweber, Lipton, Robertson, “On the structure of sets 
in NP and other complexity classes”, Theoretical 
Computer Science, vol. 15, 1981, pp. 181-200. 
[13] Papadimitriou C.H., Steiglitz K., “Combinatorial 
Optimization: Algorithms and Complexity”, 
Prentice-Hall, Englewood Cliffs, 1982. 
[14] Razborov A., Rudich S., “Natural proofs”, Journal of 
Computer and System Sciences, vol. 55, no. 1, 1997, pp. 
24-35. 
[15] Savitch W. J., “Relationships between nondeterministic 
and deterministic tape complexities”, Journal of 
Computation and System Science, vol. 4, 1970, pp. 
177-192. 
[16] Tarski A., Givant S., “A Formalization of Set Theory 
Without Variables”, American Mathematical Society, 
Providence, RI, 1987. 
VI. APPENDIX 
A. Proof 1 - Proof of Theorem 1 
Theorem 1 - If formula expressible in FOPC language is 
deducible, then every possible transformation of this formula 
obtained by usage inference rules and axioms is also 
deducible and can be expressed in the same language. 
This theorem is a direct consequence of FOPC definitions. 
If ϕ is deducible, then: 
• ϕ ∧ axiom 
• ϕ → axiom 
• ∀ x ϕ 
are also deducible. 
B. Proof 2 - Proof of Theorem 2 
Theorem 2 - If every transformation of formula is 
expressible in FOPC, then the optimal for certain resource for 
chosen computational model is also expressible in the same 
language. 
This theorem is a consequence of Theorem 1 and FOPC 
definitions. If the goal of calculation is to decide on formula 
based on theory axioms, then it is required to obtain formula 
as a consequence of axioms (with empty left-hand side):  
├ (P1 → (…(Pn-1 → (Pn → Q))…). 
The authors said that every possible transformation of 
formula is expressible in FOPC and this directly means that 
the optimal in the aspect of a certain resource (time or space) 
path is also expressible in FOPC. 
C. Proof 3 - Proof of Theorem 3 
Theorem 3 - Time complexity of problem/algorithm is 
always greater than or equal to space complexity. 
This theorem is a consequence of Turing machine 
definition, which states that in one step, a machine can read or 
write one (or in general constant) number of symbols. If then 
f(n) symbols were written, then machine had used at least f(n) 
steps to write them. 
D. Proof 4 - Proof of Theorem 4 
Theorem 4 - Minimal number of symbols required for 
unambiguous description of object is Ω(log(N)), where N 
represents the number of possible objects to be stored. 
In this section, the function log is considered to have ∑ in 
root, where ∑ represents the number of the symbols in the 
alphabet: log(∑)=1. 
The authors prove the theorem using contradiction. 
Suppose that one knows “compression” algorithm allowing 
to write each of N symbols using log(N)−f(N) symbols, where 
f(N) is a function such that: ∀ N: 0<f(N)<log(N). 
On log(N)−f(N),one can write at most ∑log(N)-f(N) different 
strings. 
)log(
)log(
)log(
)()log(
NfNfNf
NfN NN
Now the authors check whether this number is greater than 
the number of objects to be identified by checking limens: 
lim  
It is easy to see that if f(N)=0, then limens is equal to zero 
(which means that exactly N different objects can be 
described using a string of desired length), but when f(N)>1 
(it is the smallest value making difference in the number of 
symbols used), it is negative which means that less than N 
objects can be represented using a string of this length. 
E. Proof 5 - Proof of Theorem 5 
Theorem 5 - Lower bound calculated to express specific 
resource (time or space) usage for deciding formula 
expressed in FOPC for a chosen computational model is 
equal to the minimal usage of this resource for best possible 
transformation of formula in this language. 
Proof of this theorem is in fact a direct corollary of 
Radosław Hofman, cSAT problem lower bound, 2007  
Theorems 1 and 2. If any transformation of formula is 
expressible in FOPC language, then the optimal in terms of 
chosen resource is also expressible in FOPC language and 
when the authors calculate lower bound for this resource for 
the whole transformation path (from input string to decidable 
form (to axioms)), then they obtain the value of lower bound 
for the considered problem. 
F. Proof 6 - Proof of Not Covering by Constant Set of 
Definitions All Possible Large Instances by LDTM 
Assumptions: ( ) ||||
cng <∀
 and |||)(|
cng ≥∑
. Also 
gi(n) function operates on natural numbers and returns natural 
numbers, so ( ) |1|||
cng . 
One want to solve yn
lcng |||)(|
 for y growing to 
infinity. 
First one may observe that: ∑∑
|1||)(|  if 
one can solve inequality yn
yn ll cc |||1|
, then it will be 
equivalent to prove that the proof is correct. 
The  new equality presented by the authors is free from i 
variables, so it can be rewritten as: ynyn ll ccl |||1|* <− . 
Now the authors take logarithm on both sides to the base l: 
( ) ( )
( ) ( ) ( )
( ) ( )
( ) ( )||log|1|log1
||log*|1|log*1
||log|1|loglog
||log|1|*log
At this point, it is obvious that this inequality holds - 1/y 
when y grows to infinity may be omitted and one has 
inequality of two logarithms where this one on the left-hand 
side is the logarithm of lower value. 
More formally, one may calculate limens: 
( ) ( )
1loglim
loglim
||log|1|log
lim  
Proof is then correct. 
G. Proof 7 - Lower Bound for 2SAT Problem 
2SAT problem is a special case of cSAT problem. Its 
special factors are: 
• L = 1 (problem question is “µ(ϕ)>=1” or 
“µ(ϕ)>0”) 
m1=m2=…=mp=2 
The authors assume that the input string is in CNF and 
purifying function (defined in Section  III.G) cannot be 
applied. 
They use  Ax9) on parenthesis to select next parenthesis 
such that: 
• parenthesis has not been used yet 
• parenthesis contains negation of variable used in 
a previous step 
Every time the usage of  Ax9) will be followed by purifying 
function usage. 
For example:  
(a∨b)1∧(a∨¬c)2∧(c∨d)3∧(¬b∨¬c)4∧(¬b∨¬a)5∧(¬d∨¬a)
6∧(e∨f)7 the authors would have used  Ax9) for parenthesis: 1 
and 6 (last variable is: ¬d), then 3 (lv: c), then 4 (lv: ¬b), then 
1 (second time, lv: a), then 5 (lv: ¬b), then 1 (third time, lv: 
a), then 6 (second time, lv: ¬d), then 3 (second time, lv: c), 
and finally 2. Every parenthesis will be used at most on every 
path from any pair of parentheses. 
At every stage, calculation formula will contain at most 
p+1 conjunctions (where p is the number of parenthesis 
processed) and each conjunction will contain at most every 
variable once. Every parenthesis will be used at most p2 
times, which means that at every stage of computation, 
formula length is O(p3). 
This may not be a time optimal solution. According to 
Theorem 2, optimal transformation path is expressible using 
axioms and predicates defined for FOPC, but to show that the 
problem is in P, one does not need to look for optimal 
transformation path - the authors have shown that there exists 
at least one transformation path polynomially bounded to 
instance size, and even if it is not the optimal one, it shows 
that 2SAT problem is in P.
ABSTRACT
  This article discusses completeness of Boolean Algebra as First Order Theory
in Goedel's meaning. If Theory is complete then any possible transformation is
equivalent to some transformation using axioms, predicates etc. defined for
this theory. If formula is to be proved (or disproved) then it has to be
reduced to axioms. If every transformation is deducible then also optimal
transformation is deducible. If every transformation is exponential then
optimal one is too, what allows to define lower bound for discussed problem to
be exponential (outside P). Then we show algorithm for NDTM solving the same
problem in O(n^c) (so problem is in NP), what proves that P \neq NP.
  Article proves also that result of relativisation of P=NP question and oracle
shown by Baker-Gill-Solovay distinguish between deterministic and
non-deterministic calculation models. If there exists oracle A for which
P^A=NP^A then A consists of infinite number of algorithms, DTMs, axioms and
predicates, or like NDTM infinite number of simultaneous states.

<|endoftext|><|startoftext|>
arXiv:0704.0515v2  [cond-mat.mes-hall]  16 Jul 2007
Temperature dependence of Coulomb drag between finite-length quantum wires
J. Peguiron,1 C. Bruder,1 and B. Trauzettel1
Department of Physics and Astronomy, University of Basel, Klingelbergstrasse 82, 4056 Basel, Switzerland
(Dated: July 2007)
We evaluate the Coulomb drag current in two finite-length Tomonaga-Luttinger-liquid wires cou-
pled by an electrostatic backscattering interaction. The drag current in one wire shows oscillations
as a function of the bias voltage applied to the other wire, reflecting interferences of the plasmon
standing waves in the interacting wires. In agreement with this picture, the amplitude of the current
oscillations is reduced with increasing temperature. This is a clear signature of non-Fermi-liquid
physics because for coupled Fermi liquids the drag resistance is always expected to increase as the
temperature is raised.
PACS numbers: 71.10.Pm,72.10.-d,72.15.Nj
Coulomb drag phenomena in coupled one-
dimensional (1D) electron systems have been
investigated quite extensively in the past [1–10]. The
interest has mainly been driven by the fact that
Coulomb drag, i.e. the electrical response of one wire
as a finite bias is applied to the other wire, seems to
be an ideal testing ground for Tomonaga-Luttinger-
liquid (TLL) physics in nature. This is because
both inter-wire and intra-wire Coulomb interactions
substantially modify transport properties such as
the average current and the current noise. On the
experimental side, there have been a few works, some
of which have claimed to have observed TLL behavior
in the drag data [11–13]. Recently, Yamamoto and
coworkers have measured Coulomb drag in coupled
quantum wires of different lengths and found peculiar
transport properties that depend, for instance, on the
asymmetry of the two wires [14]. This experiment is
the major motivation for our work.
We analyze theoretically the Coulomb drag current
of two electrostatically coupled quantum wires using
the concept of the inhomogeneous TLL model [15–17].
This model is known to capture the essential physics of
an interacting 1D wire of finite length coupled to non-
interacting (Fermi liquid) electron reservoirs. Within
this framework, we are able to study finite-length and
finite-temperature effects and therefore to make quali-
tative contact with the experimental setup of Ref. [14].
Since the Coulomb interaction varies between the wire
regions and the lead regions, charge excitations feel
the interaction difference at the boundaries and are
known to exhibit Andreev-type reflections [17]. We
show that these reflections play a crucial role in the
Coulomb drag setup illustrated in Fig. 1. Further-
more, we show that the quantum interference phe-
nomena associated with the Andreev-type reflections
considerably modify the drag current. This is particu-
larly interesting as far as the temperature dependence
of the drag current is concerned. For Fermi-liquid sys-
tems, it is well known that the drag resistance should
always increase as the temperature is raised [6]. In
our setup instead, the drag current at a fixed drive
bias can either increase or decrease as a function of
temperature. It crucially depends on the interference
pattern due to finite-length effects. This is a clear
signature of non-Fermi-liquid physics which could be
observed in the double-wire setup of Ref. [14].
The system considered consists of two in-
teracting parallel wires (j = <,>) of fi-
nite length L< (shorter wire) and L> (longer
wire) connected to non-interacting semi-infinite 1D
leads (Fig. 1) and is described by the Hamiltonian
j=<,>
H0j +H
+HC. The intra-wire inter-
action is modelled through a TLL description [15–17]
H0j =
Π2j +
g2j (x)
(∂xΦj)
with the piecewise constant interaction parame-
ter gj(x) = gj < 1 in the wire region |x| <
Lj/2 and gj(x) = 1 in the non-interacting lead re-
gions |x| > Lj/2. The Fermi velocity vFj , the in-
teraction strength gj, and the wire length Lj set the
frequency ωLj = vFj/gjLj of the collective plasmonic
excitations hosted in each wire. A voltage eVj =
interwire
coupling
wire <
wire >
FIG. 1: (color online). The system under consideration.
Each interacting wire of length Lj [gray area, interaction
parameter gj(|x| < Lj/2) = gj < 1] is connected to a pair
of non-interacting leads [gj(|x| > Lj/2) = 1]. The region
of backscattering inter-wire interaction (red dashed box)
extends over the length of the shorter wire. A voltage V is
applied between the leads connected to the drive wire (here
the longer wire j = >) and the backscattering-induced
current Idr in the drag wire (here the shorter wire j = <)
is investigated.
http://arxiv.org/abs/0704.0515v2
0 1 2 3
V / 2πV
a) g=0.1
b) g=0.25
c) g=0.5
d) g=0.75
e) g=1
FIG. 2: Drag current as a function of drive voltage for
identical wires at zero temperature (solid curves). The
inter-wire interaction strength ranges from strongly inter-
acting (g = 0.1) to non-interacting (g = 1) for the dif-
ferent curves (with I
dr = eλ
2α4g/~2ωL and VL =
~ωL/e). The dashed curves show the dominant contri-
bution ∝ V 4g−2 [given in Eq. (10)] for g = 0.1, 0.25, 0.5.
L − µ
R applied to the leads is described by
HVj = −
dx µj(x)∂xΦj(x), (2)
with the piecewise constant electro-chemical potential
µj(x) =
L for x < −Lj/2,
0 for |x| < Lj/2,
R for x > Lj/2
with µ
L = −µ
R . This model is expected to cap-
ture the essential physics of a quantum wire coupled
smoothly to electron reservoirs (with typical smooth-
ing length Ls) as long as Lj=<,> ≫ Ls ≫ λF , where
λF is the electron Fermi wavelength [18, 19]. Fi-
nally, we include an inter-wire backscattering inter-
action over the length L< of the shorter wire,
HC = λBS
∫ L</2
−L</2
dx cos{
4π[Φ<(x)−Φ>(x)]}. (4)
This term includes the contribution of the density-
density interaction which is most relevant to Coulomb
drag [1, 2] when the Fermi wave-vectors of both wires
are similar in magnitude, i.e. kF< ≈ kF> [22].
In the following, we choose to apply a voltage V> =
V to the longer wire (µ>L = −µ
R = eV/2, drive wire)
and none V< = 0 to the shorter wire (µ
L = µ
R = 0,
drag wire). The average current in the wires may then
be written as I< = Idr and I> =
V − Idr. In our
model, the two currents I< and I> always flow in the
same direction, which is due to momentum conserva-
tion. This is known as positive Coulomb drag.
In order to get an expression for the drag cur-
rent Idr, we follow the formalism used in [18] in the
case of a single wire with an impurity. We consider
the situation of weak inter-wire coupling. To second
order in λBS, we obtain
Idr = I
∫ 1−R
drjdr(r, R), (5)
with the normalization I
dr = eλ
2ωL<,
where α< = ωL</ωc is the ratio between the plas-
mon frequency of the shorter wire and a cutoff fre-
quency ωc, of the order of the wire bandwidth. The
plasmon frequency defines a voltage VL = ~ωL</e
and a temperature TL = ~ωL</kB. It is conve-
nient to introduce corresponding dimensionless volt-
age u = V/VL and temperature θ = T/TL. The inte-
grand in (5),
jdr(r, R) =
eiuτ − e−iuτ
× exp
4πC<(r, R; τ) + 4πC>
, (6)
involves the parameters l = L>/L<, p = g>/g<,
and q = vF>/vF<. The correlation function Cj =
CGSj + C
j of each wire can be decomposed in a
zero-temperature and a finite-temperature contribu-
tion given by [18]
CGSj (r, R; τ) = −
αj + i(τ − sr − 2k)
αj + i(−2k)
|2k+1|
αj + i(τ − sR− 2k − 1)
[α2j + (r − sR− 2k − 1)2]1/2
, (7)
CTFj (r, R; τ) = −
sinch[πθ(τ − sr − 2k)]
sinch[πθ(−2k)]
|2k+1|
sinch[πθ(τ − sR− 2k − 1)]
sinch[πθ(r − sR− 2k − 1)]
, (8)
with γj = (1− gj)/(1 + gj) and sinchx = (sinhx)/x. It is to be noted that the expression for the drag cur-
3rent Idr does not depend on whether the drive wire is
the longer wire or the shorter one due to the symme-
try of our model. In the following, we present results
obtained by numerical evaluation of the triple integral
involved in (5) and (6) and discuss several analytical
approximations.
First we set the temperature to zero and consider
identical wires (l = p = q = 1, thus we drop the wire
index j). The drag current shows non-monotonous
behavior and oscillations with period ∼ 2π~ωL/e as a
function of the bias voltage (Fig. 2). It decays at large
voltages for g < 1/2 whereas it increases for g > 1/2.
Thus, we obtain qualitatively the same behavior as in
a dual Coulomb drag setup where a drive current is
applied and a drag voltage is measured [2]. Similar os-
cillations as a function of voltage have been predicted
in the context of two coupled fractional quantum Hall
line junctions [20].
An analytic approximation can be derived in the
limit u = eV/~ωL ≫ 1, that is for high voltages
or long wires. In Eq. (7), the terms proportional
to γ|m| account for contributions from plasmon exci-
tations reflected |m| times inside the wire. When the
wire length is much longer than other relevant length
scales, the contribution without any reflection m = 0
becomes dominant and yields the integrand
jdr(r, R) ∼
Γ(2g)
)2g−1/2
J2g−1/2(ur), (9)
where Γ(z) denotes the Gamma function and Jν(z)
the Bessel function of order ν. The resulting expres-
sion for the drag current, which involves hypergeo-
metric functions, underestimates the amplitude of the
oscillations with respect to the exact numerical re-
sult. However, the behavior at large u, governed by
the dominant contribution
Idr ∼
2Γ2(2g)
)4g−2
, (10)
shows good agreement in the appropriate parameter
regime (dashed curves in Fig. 2). Since we do per-
turbation theory in λBS, the relation Idr ≪ (e2/h)V
has to hold. In the large u regime, this means
(λBSL/~ωc)
2(ωL/ωc)(eV/~ωc)
4g−3 ≪ 1.
Now we consider the situation where the two wires
have different lengths. The qualitative behavior of the
drag current does not change for increasing length ra-
tio l = L>/L<, but the peak positions get shifted to
lower voltages (Fig. 3). Here, neglecting plasmon re-
flections in the correlation function of the wires leads
again to the expression (9), which is independent of l.
This fact explains why the drag current does not
change appreciably as a function of l and indicates
that the peak shifts observed result from plasmon re-
flections inside the wires. Our studies are the first to
analyze the effect of an asymmetry in the length on
Coulomb drag phenomena in coupled quantum wires
which is of recent experimental relevance [14].
 0  0.5  1  1.5  2
Idr/Idr
L>/L<V/2πVL
Idr/Idr
FIG. 3: (color online). Drag current as a function of the
bias voltage and of the length ratio of the wires (with
dr = eλ
2ωL< and VL = ~ωL</e).
We now discuss in detail the temperature depen-
dence of the drag current. For clarity, we consider
again symmetric wires (l = 1) [23]. The oscilla-
tions of the drag current as a function of bias voltage
get washed out with increasing temperature (Fig. 4).
This behavior is consistent with the picture which at-
tributes the oscillations to interferences of the plas-
mon excitations of the wires. Thus, for bias voltages
such that the drag current is close to a maximum at
zero temperature, one observes a decrease of the drag
current with increasing temperature, whereas the op-
posite behavior can be observed close to a minimum of
the zero-temperature drag current for strong interac-
tions. This behavior is in stark contrast to the linear
temperature dependence predicted for Coulomb drag
between 1D Fermi-liquid conductors [4] and therefore
a clear signature of TLL physics. Note that our Fig. 4
bears significant resemblance to Fig. 9 of Ref. [12].
A good approximation of the drag current can be
obtained for temperatures much larger than the tem-
perature associated with the plasmon frequency, θ =
kBT/~ωL ≫ 1, by neglecting contributions from plas-
mon reflections in the wires. Then, we obtain the
dominant contribution
Idr ∼
dr (2πθ)
4g−2 sinh
g + iu
4Γ2(2g)
Taking the limit of low bias voltage in this result,
Idr ∼
θ≫1,u
πΓ4(g)
4Γ2(2g)
(2πθ)4g−3u, (12)
we recover the power-law dependence T 4g−3 of the lin-
ear conductance predicted by renormalization group
analysis [5]. At large bias voltage u ≫ θ ≫ 1, we
recover the zero-temperature result (10).
In the case g = 1/2, Eq. (11) as well as the con-
tribution to next order in θ−1 can be brought into a
0 1 2 3
V / 2πV
0 1 2 3
g=0.25
g=0.6
FIG. 4: Temperature dependence of the drag current for
identical wires with interaction strength g = 0.25. The
solid curves (labelled a - e) are evaluated for tempera-
tures given by T/TL = kBT/~ωL = 0, 0.5, 1, 1.5, 5, respec-
tively (with I
dr = eλ
2α4g/~2ωL and VL = ~ωL/e).
The dashed curve shows the high-temperature limit (11)
for T/TL = 5. The inset shows a similar plot for weaker
intra-wire interaction g = 0.6 where the oscillations are
less pronounced.
compact analytic form
Idr ∼
, (13)
where Ψ′(z) = d2 ln Γ(z)/dz2 denotes the trigamma
function. The first term is Eq. (11) evaluated
at g = 1/2 (dotted curve in Fig. 5), and the second
one is the dominant correction, which takes values
within [−I(0)dr /θ, 0] (included in the dashed curve in
Fig. 5). This illustrates that the first-order approx-
imation already yields a nice qualitative description
for the full numerical result, which makes us confident
that Eq. (11) is a good high-temperature approxima-
tion also for g 6= 1/2.
In summary, we have analyzed two coupled quan-
tum wires that exhibit both a finite intra-wire interac-
tion and a finite inter-wire interaction. We have taken
into account finite-length effects within the inhomoge-
neous TLL model that is known to capture the essen-
tial physics of quantum wires coupled to Fermi-liquid
reservoirs. We have investigated how an asymmetry
in the lengths of the wires changes the drag current
and we have predicted a rich temperature dependence
of the drag current that shows clear signatures of non-
Fermi-liquid physics.
We would like to thank F. Dolcini, H. Grabert,
M. Kindermann, Y. Nazarov, S. Tarucha, and M. Ya-
mamoto for interesting discussions. This work
was supported by the Swiss NSF and the NCCR
Nanoscience.
0 1 2 3 4 5 6 7 8
V / 2πV
exact numerical result [Eq. (5)]
dominant term [Eq. (11)]
dominant term + first correction [Eq. (13)]
g=0.5, T/T
=5g=0.5, T/T
FIG. 5: Drag current at high temperature (T/TL = 5)
for identical wires with interaction strength g = 0.5 (solid
curve), where I
dr = eλ
2α4g/~2ωL and VL = ~ωL/e.
The dotted curve shows the dominant contribution in the
high-temperature limit [first term in (13)], the dashed
curve includes the first correction as well [both terms
in (13)].
[1] K. Flensberg, Phys. Rev. Lett. 81, 184 (1998).
[2] Y. V. Nazarov and D. V. Averin, Phys. Rev. Lett. 81,
653 (1998).
[3] A. Komnik and R. Egger, Phys. Rev. Lett. 80, 2881
(1998).
[4] V. L. Gurevich, V. B. Pevzner, and E. W. Fenton, J.
Phys.: Condens. Matter 10, 2551 (1998).
[5] R. Klesse and A. Stern, Phys. Rev. B 62, 16912
(2000).
[6] V. V. Ponomarenko and D. V. Averin, Phys. Rev.
Lett. 85, 4928 (2000).
[7] B. Trauzettel, R. Egger, and H. Grabert, Phys. Rev.
Lett. 88, 116401 (2002).
[8] M. Pustilnik, E. G. Mishchenko, L. I. Glazman, and
A. V. Andreev, Phys. Rev. Lett. 97, 126805 (2003).
[9] G. A. Fiete, K. Le Hur, and L. Balents, Phys. Rev. B
73, 165104 (2006).
[10] M. Pustilnik, E. G. Mishchenko, and O. A. Starykh,
Phys. Rev. Lett. 97, 246803 (2006).
[11] P. Debray et al., Physica E 6, 694 (2000).
[12] P. Debray et al., J. Phys.: Condens. Matter 13, 3389
(2001).
[13] M. Yamamoto, M. Stopa, Y. Tokura, Y. Hirayama,
and S. Tarucha, Physica E 12, 726 (2002).
[14] M. Yamamoto, M. Stopa, Y. Tokura, Y. Hirayama,
and S. Tarucha, Science 313, 204 (2006).
[15] D. L. Maslov and M. Stone, Phys. Rev. B 52, R5539
(1995).
[16] V. V. Ponomarenko, Phys. Rev. B 52, R8666 (1995).
[17] I. Safi and H. J. Schulz, Phys. Rev. B 52, R17040
(1995).
[18] F. Dolcini, B. Trauzettel, I. Safi, and H. Grabert,
Phys. Rev. B 71, 165309 (2005).
[19] The effect of a finite smoothing length Ls on the con-
ductance in an equivalent model has been analyzed by
K. Janzen, V. Meden, and K. Schönhammer, Phys.
Rev. B 74, 085301 (2006).
[20] U. Zülicke and E. Shimshoni, Phys. Rev. B 69, 085307
5(2004).
[21] A. Komnik and R. Egger, Eur. Phys. J. B 19, 271
(2001).
[22] Inter-wire forward-scattering is not explicitely in-
cluded in our model as it can be recast in a mere renor-
malization of the intra-wire interaction strength gj of
each wire [5]. We also neglect electron tunneling be-
tween the two wires for two reasons: (i) It is a less
relevant process than HC in a renormalization group
sense [21]. (ii) It can be tuned to zero in a Coulomb
drag experiment [12]. We assume to be away from
half-filling where Umklapp scattering is forbidden.
[23] In view of the results shown in Fig. 3, we do not expect
any qualitative changes of the predicted temperature
dependence for the case l = 1 as we make the wires
asymmetric in length (l 6= 1).
ABSTRACT
  We evaluate the Coulomb drag current in two finite-length
Tomonaga-Luttinger-liquid wires coupled by an electrostatic backscattering
interaction. The drag current in one wire shows oscillations as a function of
the bias voltage applied to the other wire, reflecting interferences of the
plasmon standing waves in the interacting wires. In agreement with this
picture, the amplitude of the current oscillations is reduced with increasing
temperature. This is a clear signature of non-Fermi-liquid physics because for
coupled Fermi liquids the drag resistance is always expected to increase as the
temperature is raised.

<|endoftext|><|startoftext|>
Journal of the Chinese Chemical Society, 2001, 48: 449-454
Effects of Imperfect Gate Operations in Shor’s Prime Factorization Algorithm
Hao Guo1,2, Gui-Lu Long1,2,3,4,5 and Yang Sun1,2,6,7
Department of Physics, Tsinghua University, Beijing 100084
Key Laboratory for Quantum Information and Measurements, MOE
Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100080, P.R. China
Centre for Nuclear Theory, Lanzhou National Laboratory of Heavy Ions
Chinese Academy of Sciences, Lanzhou 740000, P.R. China
Center of Atomic, Molecular and Nanosciences, Tsinghua University, Beijing 100084
Department of Physics, Xuzhou Normal University, Xuzhou,
Jiangsu 221009
Department of Physics and Astronomy,
University of Tennessee, Knoxville, TN 37996, U.S.A.
(Dated: 2001)
The effects of imperfect gate operations in implementation of Shor’s prime factorization algorithm
are investigated. The gate imperfections may be classified into three categories: the systematic error,
the random error, and the one with combined errors. It is found that Shor’s algorithm is robust
against the systematic errors but is vulnerable to the random errors. Error threshold is given to the
algorithm for a given number N to be factorized.
PACS numbers: PACS numbers: 03.67.Lx, 89.70.+c, 89.80.+h
I. INTRODUCTION
Shor’s factorization algorithm [1] is a very impor-
tant quantum algorithm, through which one has demon-
strated the power of quantum computers. It has greatly
promoted the worldwide research in quantum computing
over the past few years. In practice, however, quantum
systems are subject to influence of environment, and in
addition, quantum gate operations are often imperfect
[2, 3]. Environment influence on the system can cause de-
coherence of quantum states, and gate imperfection leads
to errors in quantum computing. Thanks to Shor’s an-
other important work, in which he showed that quantum
error correlation can be corrected [4]. With quantum
error correction scheme, errors arising from both deco-
herence and imperfection can be corrected.
There have been several works on the effects of deco-
herence on Shor’s algorithm. Sun et al. discussed the
effect of decoherence on the algorithm by modeling the
environment [5]. Palma studied the effects of both deco-
herence and gate imperfection in ion trap quantum com-
puters [6]. There have also been many other studies on
the quantum algorithm [7, 8, 9, 10].
The error correction scheme uses available resources.
Thus it is important to study the robustness of the algo-
rithm itself so that one can strike a balance between the
amount of quantum error correction and the amount of
qubits available. In this paper, we investigate the effects
of gate imperfection on the efficiency of Shor’s factoriza-
tion algorithm. The results may guide us in practice to
suppress deliberately those errors that influence the algo-
rithm most sensitively. For those errors that do not affect
the algorithm very much, we may ignore them as a good
approximation. In addition, study of the robustness of
algorithm to errors is important where one can not apply
the quantum error correction at all, for instance, in cases
that there are not enough qubits available.
The paper is organized as follows. Section II is devoted
to an outline of Shor’s algorithm and different error’s
modes. In Section III, we present the results. Finally, a
short summary is given in Section IV.
II. SHOR’S ALGORITHM AND ERROR’S
MODES
Shor’s algorithm consists of the following steps:
1) preparing a superposition of evenly distributed
states
|ψ〉 = 1√
|a〉|0〉,
where q = 2L and N2 ≤ q ≤ 2N2 with N being the
number to be factorized;
2) implementing yamodN and putting the results into
the 2nd register
|ψ1〉 =
|a〉|yamodN〉;
3) making a measument on the 2nd register; The state
of the register is then
|φ2〉 =
|jr + l〉|z = yl = yjr+lmodN〉
where j ≤
4) performing discrete Fourier transformation (DFT)
on the first register |φ3〉 =
f̃ (c) |c〉
|z〉, where
f̃ (c) =
2πi(jr + l)
2πilc
2πijrc
http://arxiv.org/abs/0704.0516v1
This term is nonzero only when c = k q
, with k =
0, 1, 2...r − 1, which correspond to the peaks of the dis-
tribution in the measured results, and thus this term be-
comes f̃(c) = 1√
2πilc
q . The Fourier transformation is
important because it makes the state in the first register
the same for all possible values in the 2nd register. The
DFT is constructed by two basic gate operations: the
single bit gate operation Aj =
, which is
also called the Walsh-Hadmard transformation, and the
2-bits controlled rotation
Bjk =
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 eiθjk
with θjk =
. The gate sequence for implementing
DFT is
(Aq−1)(Bq−2q−1Aq−2) . . . (B0q−1B0q−2 . . . B01A0).
Errors can occur in both Aj and Bjk. Aj is actually a
rotation about y-axis through π
Aj(θ) = e
Syθ = I cos(
)−i sin(θ
)σy =
cos( θ
) − sin( θ
sin( θ
) cos( θ
If the gate operation is not perfect, the rotation is not
exactly π
. In this case, Aj is a rotation of
Aj(δ) =
cos(δ)− sin(δ) −(sin(δ) + cos(δ))
sin(δ) + cos(δ) cos(δ)− sin(δ)
If δ is very small, we have:
Aj(θ) =
1− δ −(1 + δ)
1 + δ+ 1− δ
Similarly, errors in Bjk can be written as
Bjk =
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 ei(θjk+δ)
With these errors, the DFT becomes
|a〉 →
i( 2π
+δc)a(1 + δ′c)|c̃〉 =
i( 2πc
+δc)a(1 + δ′c)|c̃〉,(1)
where δc and δ
c denote the error of Aj and Bjk, respec-
tively.
Let us assume the following error modes: 1) system-
atic errors, where δc or δ
c in (1) can only have system-
atic errors (EM1); 2) random errors (EM2), for which
we assume that δc or δ
c can only be random errors of
the Gaussian or the uniform type; 3) coexistence of both
systematic and random errors (EM3). In the next sec-
tion, we shall present the results of numerical simulations
and discuss the effects of imperfect gate operation on the
DFT algorithm, and thus on the Shor’s algorithm.
III. INFLUENCE OF IMPERFECT GATE
OPERATIONS
We first discuss the influence of imperfect gate opera-
tions in the initial preparation
Al−1Al−2...A0|0...0〉 = 1√
(|0〉+ |1〉+ δ1(|0〉 − |1〉))⊗ (|0〉+ |1〉+ δ2(|0〉 − |1〉))⊗ . . .⊗ (|0〉+ |1〉+ δn(|0〉 − |1〉))
i1i2...in=0
|i1i2...in〉+ 1√
R=1 δn
i1i2...in=0
(|i1..iR−10iR+1..in〉 − |i1..iR−11iR+1..in〉
If the errors are systematic, for instance, caused by the
inaccurate calibration of the rotations, then δ1 = δ2 =
. . . = δn = δ. In this case, we can write the 2nd term as
|ψ〉 = 1√
i1i2...in=0
(2s− n)|i1i2...in〉,
where s stands for the number of 1’s, and 2s− n = s −
(n − s) is the difference in the number of 1’s and 0’s.
Thus the results after the first procedure is
(|a〉+ δ(2s− n)|a〉) =
(1 + δa)|a〉. (2)
This implies that after the procedure, the amplitude of
each state is no longer equal, but have slight difference.
Combining the effect in the initialization and in the DFT,
we have
(1 + δa)(1 + δc)e
i( 2πc
+δ′c)a
= (1 + δ′′)ei(
+δ′c)a,
where δ′′c = δc + δa. In the DFT, we have
|ψ〉 ⇒
(1 + δj)e
i( 2πc
+δ′j)(jr+l)|c̃〉,
where we have rewrite δ′′ as δj here. Let Pc denote the
probability of getting the state |c̃〉 after we perform a
measurement, we have
(1 + δm)(1 + δk)e
i( 2πc
+δ′m)(mr+l)×e−i(
+δ′k)(kr+l)
(1 + δm)(1 + δk) cos[
r(m − k) + (mr + l)δ′m − (kr + l)δ′k](3)
From Eq. (3), we find that after the last measurement,
each state can be extracted with a probability which is
nonzero, and the offset l can’t be eliminated.
Eq. (3) is very complicated, so we will make some
predigestions to discuss different error modes for conve-
nience. Generally speaking, the influence of exponential
error δj is more remarkable than δj , so we can omit the
error δj , thus
DFTq |φ〉 =
j=0 e
i( 2πc
+δ′j)(jr+l)|c〉 .
A. Case 1
If only systematic errors (EM1) are considered, namely,
all the δj ’s are equal, then f̃(c) can be given analytically
f̃(c) =
i( 2πc
+δ)(jr+l)
il( 2πc
+δ) 1− ei(
1− ei(
The relative probability of finding c is
f̃(c)
sin2(
sin2(πcr
and if c = k q
, then
r sin2(
q2 sin2( δr
It can be easily seen that limδ→0 Pc =
, which is just
the case that no error is considered.
When δ takes certain values, say, δ = 2
(k− r
)π where
k is an integer, then the summation in Eq. (4) is on
longer valid. In our simulation, δ does not take these
values. Here we consider the case where q = 27 = 128
and r = 4. For comparisons, we have drawn the relative
probability for obtaining state c in Fig.1. for this given
example. We have found the following results:
(i) When δ is small, the errors do hardly influence the
final result, for instance when c = k q
, then
Pc = lim
r sin2( δq
q2 sin2( δr
The probability distribution is almost identical to those
without errors.
(ii) Let us increase δ gradually, from Fig.2, we see that
a gradual change in the probability distribution takes
place. (Here, we again consider the relative probabilities)
When δ is increased to certain values, the positions of
peaks change greatly. For instance at δ = 0.05, there
appears a peak at c=127, whereas it is Pc = 0 when no
systematic errors are present. In general, the influence of
systematic errors on the algorithm is a shift of the peak
positions. This influences the final results directly.
B. Case 2
When both random errors and systematic errors are
present, we add random errors to the simulation. To see
the effect of different mode of random errors, we use two
random number generators. One is the Gaussian mode
and the other is the uniform mode. In this case, the er-
ror has the form δ = δ0 + s, where δ0 is the systematic
error. s has a probability distribution with respect to
c, depending on the uniform or the Gaussian distribu-
tion. When δ0 = 0, we have only random errors which is
our error mode 2. When δ0 6= 0, we have error mode 3.
For the uniform distribution, s ∼ ±smax × u(0, 1) where
u(0, 1) is evenly distributed in [0,1]. smax indicates the
maximum deviation from δ0. For Gaussian distribution,
s ∼ N(0, σ0). Through the figure, we see the following:
(1) When only random errors are present (δ0 = 0), the
peak positions are not affected by these random errors.
However, different random error modes cause similar re-
sults. The results for uniform random error mode are
shown in Fig.3. For the uniform distribution error mode,
with increasing δmax, the final probability distribution of
the final results become irregular. In particular, when
δmax is very large, all the patterns are destroyed and is
hardly recognizable. Many unexpected small peaks ap-
pear. For the Gaussian distribution error mode, as shown
in Fig.4, the influence of the error is more serious. This
is because in Gaussian distribution, there is no cut-off
of errors. Large errors can occur although their proba-
bility is small. The influence of σ0 on the final results
is also sensitive, because it determines the shape of the
distribution. When σ0 increases, the final probability dis-
tribution becomes very messy. A small change in σ0 can
cause a big change in the final results.
(2) When δ0 6= 0, which corresponds to error mode 3,
the effect is seen as to shift the positions of the peaks in
addition to the influences of the random errors.
IV. SUMMARY
To summarize, we have analyzed the errors in Shor’s
factorization algorithm. It has been seen that the effect
of the systematic errors is to shift the positions of the
peaks, whereas the random errors change the shape of the
probability distribution. For systematic errors, the shape
of the distribution of the final results is hardly destroyed,
though displaced. We can still use the result with several
trial guesses to obtain the right results because the peak
positions are shifted only slightly. However, the random
errors are detrimental to the algorithm and should be
reduced as much as possible. It is different from the
case with Grover’s algorithm where systematic errors are
disastrous while random errors are less harmful [10].
[1] P.W. Shor, Proceedings of the 35th Annual Symposium
on the Foundations of Computer Science, edited by S.
Goldwasser (IEEE Computer Society Press, Los Alami-
tos, CA, 1994) p.124.
-20 0 20 40 60 80 100 120 140
FIG. 1: Relative probability for finding state c in the absence
of errors.
[2] A. Ekert and R. Jozsa, Rev. Mod. Phys. 68 (1996) 733.
[3] W.G. Unruh, Phys. Rev A51 (1995) 992.
[4] I. Chuang and R. laflamme, ”Quantum error correction
by codding” (1995) quant-ph/9511003.
[5] C.P. Sun, H. Zhan and X.F. Liu, Phys. Rew. A58 (1998)
1810.
[6] G.M. Palma, K.A. Suominen and A.K. Ekert, Proc. R.
Soc. London, A 452 (1996) 567.
[7] R.P. Feynman, Int. J. Theo. Phys., 21 (1982) 467.
[8] D. Deutsch, Proc. R. Soc. Land. A 400 (1985) 97.
[9] L.K. Grover, Phys. Rev, Lett, 79 (1997) 325.
[10] G.L. Long, Y.S. Li, W.L. Zhang, C.C. Tu, Phys. Rev. A
61 (2000) 042305.
[11] L.K. Grover, Phys. Rev. Lett, 80 (1998) 4329.
http://arxiv.org/abs/quant-ph/9511003
-20 0 20 40 60 80 100 120 140
-20 0 20 40 60 80 100 120 140
-20 0 20 40 60 80 100 120 140
-20 0 20 40 60 80 100 120 140
FIG. 2: The same as Fig.1. with systematic errors. In sub-
figures (1), (2), (3), (4), δ are 0.02, 0.03, 0.05 respectively. In
sub-figure (4), the curve with solid circles(with higher peaks)
is the result with δ = 0.1, and the one without solid cir-
cles(with lower peaks) denotes the result with δ = 0.33.
-20 0 20 40 60 80 100 120 140
c -20 0 20 40 60 80 100 120 140
-20 0 20 40 60 80 100 120 140
-20 0 20 40 60 80 100 120 140
0.000
0.005
0.010
0.015
0.020
0.025
0.030
FIG. 3: The same as Fig.1. with uniform random errors. In
sub-figures (1), (2), (3), (4), smax are set to 0.01, 0.03, 0.05,
0.1 respectively.
-20 0 20 40 60 80 100 120 140
-20 0 20 40 60 80 100 120 140
-20 0 20 40 60 80 100 120 140
0.004
0.006
0.008
0.010
0.012
0.014
0.016
0.018
0.020
-20 0 20 40 60 80 100 120 140
FIG. 4: The same as Fig.1. with Gaussian random errors
and systematic errors. In sub-figures (1), (2), and (3) τ are
set to 0.01, 0.03 and 0.05 respectively, and δ0 = 0(without
systematic errors). In sub-figure (4), both systematic and
random Gaussian errors exist, where δ0 = 0.33, τ = 0.02.
ABSTRACT
  The effects of imperfect gate operations in implementation of Shor's prime
factorization algorithm are investigated. The gate imperfections may be
classified into three categories: the systematic error, the random error, and
the one with combined errors. It is found that Shor's algorithm is robust
against the systematic errors but is vulnerable to the random errors. Error
threshold is given to the algorithm for a given number $N$ to be factorized.

<|endoftext|><|startoftext|>
Introduction
The quantitative assessment of dietary exposure to certain contaminants is of high priority to the
Food and Agricultural Organization and the World Health Organization (FAO/WHO). For exam-
ple, excessive exposure to methylmercury, a contaminant mainly found in fish and other seafood
(mollusks and shellfish) may have neurotoxic effects such as neuronal loss, ataxia, visual disturbance,
impaired hearing, and paralysis (WHO, 1990). Quantitative risk assessments for such chronic risk
require the comparison between a tolerable dose of the contaminant called Provisional Tolerable
Weekly Intake (PTWI) and the population’s usual intake. The usual intake distribution is gener-
ally estimated from independent individual food consumption surveys (generally not exceeding 7
days) and food contamination data. Several models have been developed to estimate the distribu-
tion of usual dietary intake from short-term measurements (see for example, Nusser et al., 1996;
Hoffmann et al., 2002). The proportion of consumers whose usual weekly intake exceeds the PTWI
can then be viewed as a risk indicator (see for example, Tressou et al., 2004). This kind of risk
assessment does not account for the underlying dynamic process, i.e. for the fact that the contami-
nant is ingested over time and naturally eliminated at a certain rate by the human body. Moreover,
longer term measurements of consumption are available through household budget surveys (HBS).
In this paper, we propose to use HBS data to quantify individual long term exposure to a
contaminant. This data provides long time series of household food acquisitions which are first used
in a decomposition model, similar to the one proposed by Chesher (1997, 1998) in the nutrition
field, in order to obtain time series of individual intakes. Then, the pharmacokinetic properties of
the contaminant are integrated into an autoregressive model in which the current body burden is
defined as a fraction of the previous one plus the current intake.
From a toxicological point of view, this approach is, to our knowledge, novel and hence requires
the definition of an ad-hoc long term safe dose as proposed in the next section. We refer to this
autoregressive model as Kinetic Dietary Exposure Model (KDEM).
From a statistical point of view, such autoregressive models are well known in general time series
analysis (see for example, Hamilton, 1994) and most of the paper is devoted to the description of
the decomposition model. This statistical model aims at estimating individual quantities from total
household quantities and structures. This problem is similar to that studied by Engle et al. (1986),
Chesher (1997, 1998), and Vasdekis and Trichopoulou (2000), and is addressed in a slightly different
way. In the present article, the individual contaminant intake is firstly viewed as a nonlinear
function of age within each gender, with time and socioeconomic characteristics being secondly
introduced in a linear way. The nonlinear function is represented by a truncated polynomial
spline of order 1 that admits a mixed model spline representation (section 4.9 in Ruppert et al.,
2003). These choices yield a simple linear mixed model which is estimated by REstricted Maximum
Likelihood (REML, Patterson and Thompson, 1971). One major extension of the proposed model
compared to Chesher (1997) is the introduction of dependence between the individual intakes of a
given household.
In the next section, focusing on the methylmercury example even though the method is much
more general and could be applied to any chronic food risk, SECODIP data are described along
with the construction of a household intake series and the individual cumulative and long term
exposure concepts yielding the KDEM. Section 2 is devoted to the statistical methodology used to
decompose the household intake series into individual intake series, namely the presentation of the
model and its estimation and tests. Section 3 displays the results for the quantification of long term
exposure to methylmercury of the French population using the 2001 SECODIP panel. Finally, a
discussion on the use of household acquisition data, with the focus on the French SECODIP panel,
is conducted in section 4 with respect to the proposed long term risk analysis.
1 Motivating example: risk related to methylmercury in seafoods
in the French population
In this section, the Kinetic Dietary Exposure Model (KDEM) and the concept of long term risk are
defined. Then a brief panorama of consumption data in France is given and the way the SECODIP
HBS data will be used as an input of the KDEM is described.
1.1 Cumulative exposure and long term risk: the Kinetic Dietary Exposure
Model (KDEM)
The main objective of the analysis is to assess individuals’ long term exposure to a contaminant to
deduce whether these individuals are at risk or not. As mentioned in the introduction the only ”safe
dose” reference is the PTWI expressed in terms of body weight (relative intake). Unfortunately,
TNS SECODIP did not record the body weight of the individuals until 2001. The body weights
are thus estimated from independent data sets; namely the French national survey on individual
consumption (INCA, CREDOC-AFSSA-DGAL, 1999) for people older than 18, and the weekly
body weight distribution available from French health records (Sempé et al. (1979)) for individuals
under 18. In both cases, gender differentiation is introduced.
Assume that estimations of the individual weekly intakes are available, that is yi,h,t denotes the
intake of individual i belonging to household h for the tth week (with i = 1, . . . , nh,t; h = 1, . . . ,H
and t = 1, . . . , T ), and Di,h,t denotes the same quantity expressed on a body weight basis. The
cumulative exposure up to the tth week of this individual is then given by
Si,h,t = exp(−η) · Si,h,t−1 +Di,h,t, (1.1)
where η > 0 is the natural dissipation rate of the contaminant in the organism. This dissipation
parameter is defined from the so called half life of the contaminant,which is the time required for
the body burden to decrease by half in the absence of any new intake. For methylmercury, the half
life, denoted by l1/2, is estimated to 6 weeks, so that η = ln(2)/l1/2 := ln(2)/6 (Smith and Farris,
1996).
The autoregressive model defined by (1.1) and a given initial state Si,h,0 = Di,h,0 has a stationary
solution since exp(−η) < 1. As a convention, Si,h,0 is set to the mean of all positive exposures
(Di,h,t)t=1,. . . ,T . However, this convention has little impact on the level of an individual’s long term
exposure since the contribution of the initial state Si,h,0 tends to zero as t increases. We call this
autoregressive model ”KDEM” for Kinetic Dietary Exposure Model.
The individual cumulative exposure Si,h,t can be considered to be the long term exposure of an
individual for sufficiently large values of t. For methylmercury, the long term steady state of the
individual exposure to a contaminant is reached after 5 or 6 half lives according to Dr P. Granjean,
a methylmercury expert. Thus, the long term individual’s exposure to methylmercury is defined
as the cumulative exposure reached after say 6l1/2 = 36 weeks.
The risk assessment usually consists of comparing the exposure with the so called Provisional
Tolerable Weekly Intake (PTWI). This tolerable dose, determined from animal experiments and
extrapolated to humans, refers to the dose an individual can ingest throughout his entire life
without appreciable risk. For methylmercury, the PTWI is set to 1.6 microgram per kilogram of
body weight per week (1.6 µg/kg bw, see FAO/WHO, 2003).
In our dynamic approach, the long term exposure is compared to a reference long term exposure
denoted by Sref , and defined as the cumulative exposure of an individual whose weekly intake is
equal to the PTWI, d, such as
Sref = lim
1− exp(−η) , (1.2)
where
d exp(−η(t− s)) = dexp(−η(t+ 1))− 1
exp(−η)− 1 . (1.3)
For methylmercury, the reference for long term exposure Sref is 14.6 µg/kg bw. An individual
is then assumed to be at risk if his cumulative exposure Si,h,t exceeds the reference S
t for any
t > 6l1/2.
This KDEM model requires some long surveys of individual intakes which are not monitored
and can only be approximated from available consumption data and contamination data.
1.2 From household acquisition data to household intake series
Two current major consumption data sources in France are the national survey on individual
consumption (INCA, CREDOC-AFSSA-DGAL, 1999) and the SECODIP panel managed by the
company TNS SECODIP. Most quantitative risk assessments conducted by the French agency for
food safety (AFSSA) use the 7 day individual consumption data of the INCA survey jointly with
contamination data collected by several French institutions. Regarding methylmercury, seafood
contamination data have been collected through different analytical surveys (MAAPAR, 1998-2002;
IFREMER, 1994-1998) and were used in Tressou et al. (2004) and Crépet et al. (2005) combined
with the INCA survey. In this paper, a methodology using the SECODIP data is developed (see
Boizot, 2005, for a full description of this database).
The company TNS SECODIP has been collecting the weekly food acquisition data of about five
thousand households since 1989. All participating households register grocery purchases through
the use of EAN bar codes but other grocery purchases are registered differently: the fresh fruit
and vegetable purchases are recorded by the FL sub-panel while fresh meat, fresh fish and wine
purchases are recorded by the VP sub-panel. The households are selected by stratification according
to several socioeconomic variables and stay in the survey for about 4 years. TNS SECODIP provides
weights for each sub-panel and each period of 4 weeks to make sure of the representativeness of
the results in terms of several socioeconomic variables. TNS SECODIP also defines the notion of
household activity which refers to the correct and regular reporting of household purchases over a
year. For each household, the age and gender of each member of the household are retained in our
decomposition model with some socioeconomic variables: the region, the social class (from modest
to well-to-do), the occupation category and level of education of the principal household earner.
For methylmercury risk assessment, the households of the VP panel are considered; in the 2001
data set, there are H = 3229 active households (corresponding to 9288 individuals) and T = 53
weeks during which the households may or may not acquire seafood. The weekly purchases of
seafood are clustered into two categories (”Fish” and ”Mollusks and Shellfish”) for which the mean
contamination levels are calculated from the MAAPAR-IFREMER data and are given in table 1.
Table 1 around here, see page 21
Household intake series ((yh,t)h=1,...,H;t=1,...,T) are computed as the cross product between weekly
purchases of seafoods which are assimilated to weekly consumptions, and mean contamination
levels. They are expressed in micrograms per week (µg/w). The food ”purchase-consumption”
assimilation is of course arguable and will be the main subject of the final discussion (see section
4). An additional assumption concerns the household size, denoted by nh,t for the household h
and the week t. This can indeed vary over time in the case of a birth or death of a household
member. Since a new born baby will not consume fish in his first few months, we assume that
food diversification (and hence consumption of seafoods) starts at one year of age, yielding a total
sample of 8913 individuals for the 2001 panel. These household intake series are then decomposed
into individual intake series using the model described in the next section. These individual intake
series are then used as imputs of the KDEM.
2 Statistical methodology
In this section, the decomposition model is described and compared to similar models described in
the literature, namely Chesher (1997, 1998); Vasdekis and Trichopoulou (2000). Its estimation and
some structure tests are then presented.
2.1 The decomposition model
2.1.1 General principle
Consider a household composed of nh,t members, each member having unobserved weekly intakes
yi,h,t, with i = 1,. . . , nh,t, h = 1,. . . ,H, and t = 1,. . . , T . The week t intake of a household h is
simply the sum across household members of the individual weekly intakes, such as
yh,t =
nh,t∑
yi,h,t. (2.1)
As detailed below, the individual weekly intake yi,h,t is assumed to depend on
• the age and gender of the individual via a function f,
• some socioeconomic characteristics of the household,
• time (seasonal variations).
There are obviously several ways to model the individual intake under these assumptions
and this choice leads to more or less simple estimation procedures. In Chesher (1997, 1998);
Vasdekis and Trichopoulou (2000), a discretization argument on age is used leading to a penalized
least square estimation of a great number of parameters, that is one parameter for each year of age
and gender. We propose to use a truncated polynomial spline of order 1 for each gender, which
admits a mixed model spline representation for f. As far as socioeconomic characteristics are con-
cerned, Chesher (1997) retained a multiplicative specification whereas Vasdekis and Trichopoulou
(2000) chose the additive one. In the multiplicative model, a change in income for example would
proportionally affect all the individual intakes whereas in the additive setting, they would be af-
fected by the same value. Following Vasdekis and Trichopoulou (2000), we retained the additive
specification since the difference between the two specifications may not be notable, and the addi-
tive setting yields to a much simpler estimation procedure (linear model). Finally, time dependency
is only introduced in Chesher (1998) to track changes with age within cohorts: this time depen-
dency is directly introduced into the function f that is bivariately smoothed according to age and
time (cf. Green and Silverman, 1994). Again, we adopt a simpler specification in which time is
introduced as a dummy variable. All these assumptions yield an individual model of the form
yi,h,t = xi,h,tβ + zi,h,tu+ wh,tγ + δtα+ εi,h,t, (2.2)
where the terms xi,h,tβ + zi,h,tu stand for the mixed model spline representation of the function
f, the term wh,tγ denotes the socioeconomic effects, the term δtα the time effect, and εi,h,t is the
individual error term.
Combining (2.1) and (2.2) , we obtain the final rescaled household model given by
Yh,t = Xh,tβ + Zh,tu+
nh,twh,tγ +
nh,tδtα+ εh,t, (2.3)
where Yh,t ≡
∑nh,t
i=1 yi,h,t/
nh,t, Xh,t ≡
∑nh,t
i=1 xi,h,t/
nh,t, Zh,t ≡
∑nh,t
i=1 zi,h,t/
nh,t, and εh,t ≡
∑nh,t
i=1 εi,h,t/
nh,t.
2.1.2 Specification details
Age-gender function specification Let ai,h,t and si,h denote the age and sex of individual i
of household h for the tth week. Individual dietary intake is generally different according to the
gender of individuals, so the function f takes the following form
f(ai,h,t, si,h) = fM (ai,h,t)1l{si,h=M} + fF (ai,h,t)1l{si,h=F},
where fM(.) and fF (.) are age-intake relationships for males (M) and females (F) respectively, and
1l{A} is the indicator function of event A. The function fS(.) is approximated by a spline of order
one with a truncated polynomial basis for either sex, such as
fS(ai,h,t) = β
0 + β
1 ai,h,t +
uSk (ai,h,t − κS,k)+ , (2.4)
where the (κS,k)k=1,. . . ,KS
are nodes chosen from an age list and
(ai,h,t − κS,k)+ ≡ (ai,h,t − κS,k) 1l{ai,h,t−κS,k>0}
denotes the positive part of the difference between the age of the individual ai,h,t and the node κS,k
and the uSk are random effects assumed to be i.i.d. Gaussian with distribution N
0, σ2uS
. This
last assumption allows us to introduce some penalties into the model and to smooth the function
fS yielding a mixed model representation for the spline as shown in Speed (1991); Verbyla (1999);
Brumback et al. (1999); Ruppert et al. (2003). As in Ruppert et al. (2003), page 125, the total
number of nodes KS is set to min
{∣∣aS,d
∣∣ , 35
, where aS,d is the list of distinct ages for individuals
of sex S, and the nodes κS,k are defined as the
percentile of vector aS,d for k = 1,. . . ,KS .
Defining xi,h,t as a line vector
1l{si,h=M} ai,h,t1l{si,h=M} 1l{si,h=F} ai,h,t1l{si,h=F}
, and
zi,h,t as the line vector
(ai,h,t − κS,k)+ 1l{si,h=S}
k=1,. . . ,KS ; S=M,F
, we finally obtain the first
terms of (2.2) , that is f(ai,h,t, si,h) = xi,h,tβ + zi,h,tu.
Socioeconomic characteristics and time dependency In the application, all the socioe-
conomic characterics are categorical variables. Consider the Q categorical variables W
h,t , q =
1, . . . , Q, with mq modalities, and fix the m
q modality as the reference modality, then the socioe-
conomic effect term in (2.2) and (2.3) is
wh,tγ =
mq−1∑
γq,m1l
where γq,m is the effect of the m
th modality of the socioeconomic variable q.
Similarly, time is only measured by weekly counts throughout the year so that the time effect
in (2.2) and (2.3) is simply
δtα =
τ 6=τR
ατ1l{τ=t},
where ατ is the effect of week τ and τR is the reference week.
Error specification The error at the individual level εi,h,t is assumed to be Gaussian with zero
mean, and the variance-covariance structure is such that
• households are independent, i.e. ∀i, i′, t, t′ and ∀h 6= h′
cov(εi,h,t, εi′,h′,t′) = 0,
• members of the same household are dependent, that is for ∀h, t and i 6= i′,
cov(εi,h,t, εi′,h,t) = ρσ
where ρ measures the dependence between individuals within the same household.
• there is no time dependence, that is ∀i, i′ and ∀t 6= t′
cov(εi,h,t, εi′,h,t′) = 0.
In the rescaled household model (2.3), the error εh,t ≡
∑nh,t
i=1 εi,h,t/
nh,t is i.i.d. Gaussian with
a zero mean and a variance R such that ∀t, t′ and ∀h 6= h′,
V(εh,t) = ρσ
εnh,t + (1− ρ)σ2ε and cov(εh,t, εh′,t′) = 0. (2.5)
2.2 Estimation and tests
The model (2.3) is a linear mixed model that can be estimated using restricted maximum likelihood
(REML) techniques, see Ruppert et al. (2003) for details. An attractive consequence of the use of
the mixed model representation of a penalized spline in (2.4) is that mixed model methodology
and software can be used to estimate the parameters and predict the random effect in the resulting
household model. The amount of smoothing of the underlying functions fS is estimated with the
REML technique via the estimation of σ2uS . The estimation was conducted using R©SAS MIXED
procedure. To get estimators for σ2ε and ρ, asymptotic least square techniques combined with the
linear relationship between the variance given in (2.5) and the household size were used. More
precisely, a residual variance σ2n is first estimated for each household size n = 1, . . . , N = maxnh,t
using an option of the MIXED procedure (see the program for the detailed syntax). Then, ordinary
least square regression and the delta method give estimators for σ2ε and ρ and their standard
deviations.
The individual intake is then predicted by
ŷi,h,t = xi,h,tβ̂ + zi,h,tû+ wh,tγ̂ + δtα̂, (2.6)
where β̂, γ̂, and α̂ are the estimators of β, γ, and α respectively and û is the best prediction of the
random effect u in the model (2.3).
Confidence and prediction intervals can be built for the prediction ŷi,h,t as proposed in Ruppert et al.
(2003) and several tests can be conducted in this model:
1. Are the random effects different according to sex? In other words, is the assertion σ2uM =
σ2uF = σ
u true?
2. Another question is the necessity for such random effects. Is the assertion σ2u = 0 (resp.
σ2uM = 0 or σ
= 0) true?
3. More globally, is the function f the same for both sexes? Is the assertion fM = fS true?
These tests can be conducted using classical likelihood (or restricted likelihood) ratio techniques.
The likelihood ratio statistic is asymptotically distributed as a chi square with a degree of freedom
being the number of tested equalities, except for point 2, where the limiting distribution is known to
be a mixture of chi-square (Self and Liang, 1987; Crainiceanu et al., 2003) because the test concerns
the frontier of the parameter definition (σ2u ∈ [0,+∞[).
3 Applying our methodology to the methylmercury risk assess-
In this section, we illustrate our approach on our motivating example. Firstly, several tests are
conducted on the decomposition model, and secondly, individual long term exposure is compared
to the reference long term exposure described in section 1.
3.1 Estimation and tests on the structure of the model
Table 2 shows the REML estimates for all socioeconomic variables (parameter γ) and the p-values
of Student tests in the model (2.3). The socioeconomic variables used are household income, region
of residence, occupation category and level of education of the principal household earner. For each
socioeconomic variable, the reference modality is given in Table 2. We assume here that
• the function f differs according to the gender but the random effect does not (fM 6= fF and
σ2uM = σ
• the maximum household size N is set to 6 for variance-covariance estimation. Indeed, the
dependence between individuals within the same household depends on the household size
nh in (2.5). For each household size, a variance is estimated, and estimates of ρ and σ
are obtained using asymptotic least square techniques as mentioned in section 2.2. Since
large households are not numerous in the database, the estimations are implemented with a
maximum household size, N , set to 6; it is assumed that there is a common variance for all
households with size greater than N .
In this sub-section, we show the results of several tests we carried out to simplify the inter-
pretation of our study. These tests have been implemented in a hierarchical way, starting with
the highest-order interaction terms, combining to the reference modality the modality which does
not differ significantly from the reference. All tests are performed on the 5% level of significance
and each new hypothesis is tested, conditionally on the results of the previous tests. Each null
hypothesis and the p-value resulting from the appropriate F-test are shown in Table 3.
First of all, concerning the occupation category variable, the self-employed modality does not
significantly differ from the reference modality blue collar workers (H1, Pval = 0.771). Refitting the
model with the reference modality ”Blue collar workers and self employed”, all the socioeconomic
variables are significantly different from the reference. Then, F-tests allow us to conclude that the
resulting three groups are significantly different from each other (H2, H3, H4).
Let us now consider the region of residence variable. First, there are some very substantial
differences among the 4 regions of residence (H5, Pval =< 0.001). However, the modality ”North,
Brittany, and Vendee coast” and the modality ”Paris and its suburbs” should be grouped (H6 c,
Pvalc = 0.881). Then, the other tests implemented for the level of education and income variables
suggest that no further simplification is possible (see p-values of null hypotheses H7, H8, H9 in
Table 3). Finally, the overall F-test comparing our resulting final model to the original model (2.3)
shows that no important variable has been left out of the model (Pval = 0.59).
Table 4 shows the parameter estimates and p-values of the Student’s t-tests for all socioeco-
nomic variables of the reduced final model. The income effects on individual exposure are those
expected: the richer the households are, the higher their exposures are because seafoods are ex-
pensive. Furthermore, living in a coastal region or in Paris and its suburbs brings about larger
individual exposure relatively to living in a non coastal region because of the more ready supply of
seafoods in these regions. Moreover, the more educated you are, the larger the individual exposure
is. The occupation category of the principal household earner has an unexpected effect on the in-
dividual exposure. Indeed a higher exposure is expected for white collar workers and retirees whan
compared to blue collar workers but an opposite effect is observed. This may be explained by the
fact that the reference modality for this variable is a very heterogeneous modality also comprising
managers and self-employed persons (farmers and craftsmen). Another explanation could be that
white collars workers have a higher propensity to eat out in restaurants whereas outside the home
consumption is not included in the model.
Table 2 around here, see page 21
Table 3 around here, see page 22
Table 4 around here, see page 22
Likelihood ratio tests are implemented to test the structure of the final model. First, the
dependence of individual exposures to methylmercury within a household is tested. The null
hypothesis ρ = 0 (cf. equation (2.5)) is rejected (null Pval) which confirms that individuals within
the same household have correlated exposures. Then, we test if the function f is the same for both
genders. The null hypothesis fM = fF is rejected (null Pval) but the null hypothesis σ
= σ2uF
is accepted. This means that individual exposure differs with gender but both functions need the
same amount of smoothing.
3.2 The cumulative and the long term individual exposure
The cumulative individual exposure Si,h,t is calculated from the estimated individual weekly intakes
according to equation (1.1) and the resulting values for t > 35 are compared to the reference
cumulative exposure defined by (1.3). Figure 1 shows the cumulative individual exposure over the
53 weeks of the year 2001 for different individuals. Only certain percentiles of the distribution of
the individual cumulative exposures of the last week are displayed. For example, the curve Pmax
represents the cumulative exposure of an individual whose last week’s cumulative exposure is the
highest. This is the cumulative exposure of a girl who turned one year old during the 30th week of
2001, lives in Paris or its suburbs in a well to do household.
Very few individuals have a cumulative individual exposure above the reference long term ex-
posure. We estimate that only 0.186% of individuals are deemed at risk. This risk index should be
compared to the more common one defined as the percentage of weekly intakes Di,h,t exceeding the
PTWI, denoted R1.6, such as R1.6 =
i=1 1l (Di,h,t > 1.6). R1.6 is equal to 0.45%,
and is slightly higher since each occasional deviation above the PTWI increases the risk index
whereas only long term deviations above this PTWI should be taken into account to assess the
risk.
A deeper analysis of at risk individuals shows that all these vulnerable individuals are children
less than three years old. They represent 5.29% of the children aged between 1 and 3 in 2001.
Further, no child of a modest households is found to be at risk.
Figure 1 around here, see page 23
4 Discussion
As mentioned in section 1, the use of household acquisition data in a food safety context, and in
our case the use of the SECODIP database for assessing methylmercury dietary intakes, gives rise
to some approximations:
1. Consumption outside of the home is out of the scope of household acquisition data. TNS
SECODIP does not provide any information on the quantities of seafoods consumed out
of the home or bought for outside consumption. Nevertheless, Serra-Majem et al. (2003)
assert that these data are good estimates for the consumption of the whole household.
Vasdekis and Trichopoulou (2000) avoid this question by using the term ”availaibility” in-
stead of intake or consumption. However, as in Chesher (1997), auxiliary information about
outdoor consumption could be introduced in the model as a correction factor accounting for
the propensity to eat outside of the home according to age, sex or socioeconomic variables.
The French INCA survey on individual consumptions gives details about inside / outside the
home consumption for 3003 individuals people aged 3 and older. The mean outside the home
consumption proportion is 20% for seafoods. Applying such a factor to all household intakes
yields a long term risk of 0.226%, and R1.6 = 0.791%. Furthermore, in this case, a small
proportion of consumers older than 3 years old are vulnerable. Nevertheless, children aged
between 1 and 3 in 2001 still represent the most vulnerable consumer group, at 10% of the
corresponding population.
2. The amount of food bought by a household can be different from the amount actually con-
sumed. Indeed, namely for seafoods, a non negligible part is not edible: Favier et al. (1995)
show than on average only 61% of fresh or frozen fish is edible. Besides, Maresca and Poquet
(1994) also demonstrate some part of the purchased food is thrown away, which also reduces
the actual amount of food consumed by a household. However, SECODIP does not specify
whether the quantity of fresh or frozen fish bought is ready to be consumed or as a whole fish
that needs some preparation. Applying such a factor to all household intakes yields a long
term risk of 0.00%, and R1.6 = 0.043%. If both the 20% outside of the home consumption
correction factor and the 61% edible proportion factor are applied to our series, the long term
risk is equal to 0.021%, R1.6 = 0.13%, and 1.06% of the population of children aged between
1 and 3 are vulnerable. These results stress that applying such a correction factor to assess
the actual quantity consumed is probably too strong and is certainly a crude approximation
of the quantity of seafoods ingested. Thus, a more detailed database on fish and seafood is
needed, to realize an accurate assessment of exposure to methylmercury, taking into account
only the edible part of fish and other seafood.
Body weight information is crucial in a food safety context and will be included in the future
SECODIP data since it has now been added to the list of required individual characteristics.
The measurement error afferent to this quantity will remain however, namely for children
whose body weight changes a lot throughout a year. Nevertheless, approximating the weekly
body weight of young children by the median of the weekly body weight distribution available
in French health records is the best approximation possible.
3. The food nomenclature of the SECODIP database is not as detailed as the contamination
database. Unfortunately, fish and seafood species are not well documented so it is not possible
to consider more than two food categories when computing household intakes. This problem
of nomenclature matching is ubiquitous of food risk assessments since contamination analysis
are generally conducted independently from the food nomenclature of consumption data.
These arguments mainly show the disadvantages of the use of household food acquisition data
such as the SECODIP database. Nevertheless, they also present many advantages compared to the
individual food record survey mainly used in France in the food safety context:
• As mentioned before, households respond for a long period of time (the average is 4 years in
the SECODIP panel) which allows us to observe long term behaviors and avoid some well
known biases of individual food record surveys. For example, respondents might over- (under-
) declare certain foods with a good (bad) nutritional value either deliberately or just because
they increased (reduced) their consumption for the short (7 days) period of the survey.
• The individual surveys are expensive and very difficult to conduct. Highly trained interviewers
are required and extraordinary cooperation is required from respondents. Household food
acquisition data can serve many other applications (economics or marketing) and, at least for
the SECODIP data, acquisition recording is simplified by optical scanning of food barcodes.
Conclusion
In this paper, we proposed a methodology to assess chronic risks related to food contamination
using the example of methylmercury exposure through seafood consumption. This methodology
includes the definition of a Kinetic Dietary Exposure Model (KDEM) that integrates the fact that
contaminants are eliminated from the body at different rates, the rate being measured by the half
life of the contaminant. In this paper, the estimation is based on the use of household food acqui-
sition data which are first decomposed into individual intake data through a disaggregation model
accounting for the dependence among household members. Several extensions of this methodology
are currently studied. First, the disaggregation model could be improved by considering a prelim-
inary step in which we determine what member is an actual consumer, in the spirit of the Tobit
model. The KDEM idea is also currently being developed by studying the stability and ergodic
properties of the underlying continuous time piecewise deterministic Markov process (Bertail et al.,
2006). The parameters of this new model are the intake distribution, the inter intake time distri-
bution and the dissipation rate distribution. In this framework, the dissipation parameter η of the
KDEM model is random and the intake and inter-intake distributions can be estimated either from
individual (INCA-type) data or household (SECODIP-type) data.
References
Bertail, P., S. Clémençon and J. Tressou (2006). A storage model with random release rate for
modeling exposure to food contaminants. Submitted for publication.
Boizot, C. (2005). Présentation du panel de données SECODIP. Technical report. INRA-CORELA.
Brumback, B., D. Ruppert and M. P. Wand (1999). Comment on ”variable selection and function
estimation in additive nonparametric regression using a data-based prior” by Shively, Kohn, and
Wood. Journal of the American Statistical Association 94, 794–797.
Chesher, A. (1997). Diet revealed?: Semiparametric estimation of nutrient intake-age relationships.
Journal of the Royal Statistical Society A 160(3), 389–428.
Chesher, A. (1998). Individual demands from household aggregates: Time and age variation in the
quality of diet. Journal of Applied Econometrics 13(5), 505–524.
Crainiceanu, C. M., D. Ruppert and T. J. Vogelsang (2003). Some properties of likelihood ratio
tests in linear mixed models. (Working Paper).
CREDOC-AFSSA-DGAL (1999). Enquête INCA (individuelle et nationale sur les consommations
alimentaires). TEC&DOC ed.. Lavoisier, Paris. (Coordinateur : J.L. Volatier).
Crépet, A., J. Tressou, P. Verger and J. Ch. Leblanc (2005). Management options to reduce ex-
posure to methyl mercury through the consumption of fish and fishery products by the French
population. Regulatory Toxicology and Pharmacology 42(2), 179–189.
Engle, R. F., C. W. J. Granger, J. Rice and A. Weiss (1986). Non-parametric estimation of the rela-
tionship between weather and electricity demand. Journal of the American Statistical Association
81, 310–320.
FAO/WHO (2003). Evaluation of certain food additives and contaminants for methylmercury. Sixty
first report of the Joint FAO/WHO Expert Committee on Food Additives, Technical Report
Series. WHO. Geneva, Switzerland.
Favier, C., J. Ireland-Ripert, C. Toque and M. Feinberg (1995). Rpertoire Gnral des Aliments,
Table de composition, tome 1. TEC&DOC ed.. Lavoisier, Paris.
Green, P.J. and B.W. Silverman (1994). Nonparametric Regression and Generalized Linear Models.
Chapman & Hall.
Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press.
Hoffmann, K., H. Boeingand, A. Dufour, J. L. Volatier, J. Telman, M. Virtanen, W. Becker and
S. De Henauw (2002). Estimating the distribution of usual dietary intake by short-term mea-
surements. European Journal of Clinical Nutrition 56, 53–62.
IFREMER (1994-1998). Résultat du réseau national d’observation de la qualité du milieu marin
pour les mollusques (RNO).
MAAPAR (1998-2002). Résultats des plans de surveillance pour les produits de la mer. Ministère
de l’Agriculture, de l’Alimentation, de la Pêche et des Affaires Rurales.
Maresca, B. and G. Poquet (1994). Collectes slectives des dchets et comportements des mnages.
Technical Report R146. CREDOC.
Nusser, S.M., A.L. A.L. Carriquiry, K.W. Dodd and W.A. Fuller (1996). A semiparametric trans-
formation approach to estimating usual intake distributions. Journal of the American Statistical
Association 91, 1440–1449.
Patterson, H. D. and R. Thompson (1971). Recovery of inter-block information when block sizes
are unequal. Biometrika 58, 545–554.
Ruppert, D., M .P. Wand and R. J. Carroll (2003). Semiparametric regression. Cambridge Series
in Statistical and Probabilistic Mathematics. Cambrige University Press.
Self, S. G. and K.Y. Liang (1987). Asymptotic properties of maximum likelihood estimators and
likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Associ-
ation 82(398), 605–610.
Sempé, M., G. Pédron and M. P. Roy-Pernot (1979). Auxologie, méthode et séquences. Théraplix.
Paris.
Serra-Majem, L., D. MacLean, L. Ribas, D. Brule, W. Sekula, R. Prattala, R. Garcia-Closas,
A. Yngve and M. Lalondeand A. Petrasovits (2003). Comparative analysis of nutrition data from
national, household, and individual levels: results from a WHO-CINDI collaborative project in
Canada, Finland, Poland, and Spain. Journal of Epidemiology and Community Health 57, 74–80.
Smith, J. C. and F. F. Farris (1996). Methyl mercury pharmacokinetics in man: A reevaluation.
Toxicology And Applied Pharmacology 137, 245–252.
Speed, T. (1991). Discussion of “that blup is a good thing: the estimation of random effects” by g.
robinson. Statistical science 6, 42–44.
Tressou, J., A. Crépet, P. Bertail, M. H. Feinberg and J. C. Leblanc (2004). Probabilistic exposure
assessment to food chemicals based on extreme value theory. application to heavy metals from
fish and sea products. Food and Chemical Toxicology 42(8), 1349–1358.
Vasdekis, V.G.S. and A. Trichopoulou (2000). Non parametric estimation of individual food avail-
ability along with bootstrap confidence intervals in household budget surveys. Statistics and
Probability Letters 46, 337–345.
Verbyla, A. (1999). Mixed Models for Practitioners. Biometrics SA, Adelaide.
WHO (1990). Methylmercury, environmental health criteria 101. Technical report. Geneva, Switzer-
land.
Figures and Tables
Table 1: Description of the contamination database (Unit: microgram per kilogram
Mean Min Max Standard Deviation Number of analysis
Fish 0.147 0.003 3.520 0.235 1350
Mollusk and Shellfish 0.014 0.001 0.172 0.011 1293
Table 2: Restricted maximum likelihood estimates (REML) for age and all socioeconomic variables
and the p-value of the Student’s tests (Pval)
Effect Parameter REML Pval
Income (ref: Mean sup)
Well to do γ1 6.027 <0.001
Mean inf γ2 2.686 <0.001
Modest γ3 -1.928 <0.001
Region of residence (ref: Noncoastal regions)
North, Brittany, Vendee coast γ4 0.962 0.003
South West coast γ5 5.232 <0.001
Mediterranean coast γ6 2.303 <0.001
Paris and its suburbs γ7 1.023 0.009
Occupation category of the principal household earner (ref: Blue collar workers)
self-employed persons γ8 -0.122 0.771
white collar workers γ9 -3.733 <0.001
retirees γ10 -5.261 <0.001
no activity γ11 -1.910 0.004
Level of Education of the principal household earner (ref: BAC and higher degree)
student γ12 5.901 <0.001
no or weak diploma γ13 -1.281 <0.001
Table 3: The different steps performed in testing the socioeconomic part of our model. For each
step, the null hypothesis tested and the p-value resulting from the appropriate F-test are shown.
All tests are performed conditionally on the results of the previous tests (Pval)
Null hypothesis Pval
H1 : γ8 = 0 0.771
H2 : γ9 = γ10 0.030
H3 : γ9 = γ11 0.018
H4 : γ10 = γ11 <0.001
H5 : γ4 = γ5 = γ6 = γ7 <0.001
H6 : a : γ4 = γ5 <0.001
b : γ4 = γ6 <0.001
c : γ4 = γ7 0.881
d : γ5 = γ6 <0.001
e : γ5 = γ7 <0.001
f : γ6 = γ7 0.0103
H7 : γ12 = γ13 <0.001
H8 : γ1 = γ2 = γ3 <0.001
H9 : a : γ1 = γ2 <0.001
b : γ1 = γ3 <0.001
c : γ2 = γ3 <0.001
Table 4: Restricted maximum likelihood estimates (REML) for all age and socioeconomic variables
of the reduced final model with all variance components and their standard errors (s.e)
Effect Parameter REML Pval
Income (ref: Mean sup)
Well to do γ1 6.108 <0.001
Mean inf γ2 2.760 <0.001
Modest γ3 -1.915 <0.001
Region of residence (ref: Non coastal regions)
Paris and North, Brittany, Vendee coast γ4= γ7 0.995 <0.001
South west coast γ5 5.156 <0.001
Mediterranean coast γ6 2.250 <0.001
Occupation category of the principal household earner (ref: Blue collar workers and self employed persons)
white collar workers γ9 -3.745 <0.001
retirees γ10 -5.243 <0.001
no activity γ11 -1.871 0.005
Level of education of the principal household earner (ref: BAC and higher degree)
student γ12 5.879 <0.001
no or weak diploma γ13 -1.279 <0.001
REML s.e
Variance of the random effect σu 24.832 6.7316
Variance-covariance structure
variance σ2 1260705 282309
correlation ρ -0.22 0.0434
Figure 1: Cumulative exposure to MeHg (unit: µg per kg of body weight)
	Motivating example: risk related to methylmercury in seafoods in the French population
	Cumulative exposure and long term risk: the Kinetic Dietary Exposure Model (KDEM)
	From household acquisition data to household intake series
	Statistical methodology
	The decomposition model
	General principle
	Specification details
	Estimation and tests
	Applying our methodology to the methylmercury risk assessment
	Estimation and tests on the structure of the model
	The cumulative and the long term individual exposure
	Discussion
ABSTRACT
  Foods naturally contain a number of contaminants that may have different and
long term toxic effects. This paper introduces a novel approach for the
assessment of such chronic food risk that integrates the pharmacokinetic
properties of a given contaminant. The estimation of such a Kinetic Dietary
Exposure Model (KDEM) should be based on long term consumption data which, for
the moment, can only be provided by Household Budget Surveys such as the
SECODIP panel in France. A semi parametric model is proposed to decompose a
series of household quantities into individual quantities which are then used
as inputs of the KDEM. As an illustration, the risk assessment related to the
presence of methyl mercury in seafood is revisited using this novel approach.

<|endoftext|><|startoftext|>
Introduction
Hot molecular cores represent an early evolutionary stage in
massive star formation prior to the formation of an ultra-
compact Hii region (UCHii). Single-dish line surveys toward
hot cores have revealed high abundances of many molecu-
lar species and temperatures usually exceeding 100 K (e.g.,
Schilke et al. 1997; Hatchell et al. 1998; McCutcheon et al.
2000). Unfortunately, most hot cores are relatively far away
(a few kpc, Orion-KL being an important exception), and high-
spatial resolution studies are important to disentangle the var-
Send offprint requests to: H. Beuther
ious components in the region, to resolve potential multiple
heating sources, and to search for chemical variations through-
out the regions. Here we present sub-arcsecond resolution
submm spectral line and dust continuum observations of the
hot core G29.96−0.02, characterizing the physical and chemi-
cal properties of this prototypical region.
The hot core/UCHii region G29.96−0.02 is a well studied
source comprising a cometary UCHii region and approximately
2.6′′ to the west a hot molecular core (Wood & Churchwell
1989; Cesaroni et al. 1994, 1998). G29.96−0.02 is at a dis-
tance of ∼6 kpc (Pratap et al. 1999), the bolometric luminos-
ity measured with IRAS is very high with L ∼ 1.4 × 106 L⊙
http://arxiv.org/abs/0704.0518v1
2 Beuther et al.: SMA observations of G29.96−0.02
(Cesaroni et al. 1994). Since the region harbors at least two
massive (proto)stars (within the UCHii region and the hot core)
this luminosity must be distributed over various sources. Based
on cm continuum free-free emission, Cesaroni et al. (1994) cal-
culate a luminosity for the UCHii region of Lcm ∼ 4.4×10
5 L⊙.
Furthermore, they try to estimate the luminosity of the hot core
via a first order black-body approximation and get a value of
Lbb ∼ 1.2×10
5 L⊙. Later, Olmi et al. (2003) derive a similar es-
timate (∼ 9 × 104 L⊙) via integrating a much better determined
SED. The exciting source of the UCHii region has been identi-
fied in the near-infrared as an O5-O8 star (Watson & Hanson
1997). Furthermore, Pratap et al. (1999) identified two addi-
tional sources toward the rim of the UCHii region and an en-
hanced density of reddened sources indicative of an embedded
cluster.
A line survey toward a number of UCHii regions reveals
that G29.96−0.02 is a strong molecular line emitter in nearly
all observed species (Hatchell et al. 1998). High-angular res-
olution studies show that many species (e.g., NH3, CH3CN,
HNCO, HCOOCH3) peak toward the main H2O maser cluster
∼ 2.6′′ west of the UCHii region (e.g, Hofner & Churchwell
1996; Cesaroni et al. 1998; Olmi et al. 2003), whereas CH3OH
peaks ∼ 4′′ further south-west associated with another iso-
lated H2O maser feature (Pratap et al. 1999). Hoffman et al.
(2003) detected one of the relatively rare H2CO masers toward
the hot core position. These masers are proposed to trace the
warm molecular gas in the vicinity of young forming massive
stars (Araya et al. 2006). The signature of a CH3OH peak off-
set from the other molecular lines is reminiscent of Orion-KL
(e.g., Wright et al. 1996; Beuther et al. 2005b). Temperature
estimates toward the hot core based on high-density trac-
ers vary between 80 and 150 K (e.g., Cesaroni et al. 1994;
Hatchell et al. 1998; Pratap et al. 1999; Olmi et al. 2003).
While Gibb et al. (2004) detect a molecular outflow in
H2S emanating from the hot core in approximately the south-
east north-west direction, Cesaroni et al. (1998) and Olmi et al.
(2003) detect a velocity gradient in the east-west direction
in the high-density tracers NH3(4,4) and CH3CN, consistent
with a rotating disk around an embedded protostar. However,
Maxia et al. (2001) also report that their rather low-resolution
5.9′′ × 3.7′′ (≈ 0.15 pc) SiO(2–1) data are consistent with the
disk scenario as well. This is a bit puzzling since SiO is usually
found to trace shocked gas in outflows and not more quies-
cent gas in disks. Inspecting their SiO image again (Fig. 6 in
Maxia et al. 2001), this interpretation is not unambiguous, the
data also appear to be consistent with the outflow observed in
H2S (Gibb et al. 2004). It is possible that the spatial resolution
of their SiO(2–1) observations is not sufficient to really disen-
tangle the outflow in this distant region.
Olmi et al. (2003) compiled the SED from cm to mid-
infrared wavelengths. While the 3 mm data are still strongly
dominated by the free-free emission (Olmi et al. 2003), at
1 mm the hot core becomes clearly distinguished from the ad-
jacent UCHii region (Wyrowski et al. 2002). G29.96−0.02 is
one of the few hot cores which is detected at mid-infrared
wavelengths (De Buizer et al. 2002). Interestingly, the mid-
infrared peak is ∼ 0.5′′ (∼3000 AU) offset from the NH3(4,4)
hot core position. While Gibb et al. (2004) speculate that the
mid-infrared peak might arise from the scattered light only,
De Buizer et al. (2002) suggest that it could trace a second mas-
sive source within the same core. This hypothesis can be tested
via very-high-angular-resolution submm continuum studies.
2. Observations
We have observed the hot core G29.96−0.02 with the
Submillimeter Array (SMA1, Ho et al. 2004) during four nights
between May and November 2005. We used all available ar-
ray configurations (compact, extended, very extended, for de-
tails see Table 1) with unprojected baselines between 16 and
500 m, resulting at 862 µm in a projected baseline range from
16.5 to 591 kλ. The chosen phase center was the peak position
of the associated UCHii region R.A. [J2000.0]: 18h46m03.s99
and Decl. [J2000.0] −02◦39′21.′′47. The velocity of rest is
vlsr ∼ +98 km s
−1 (Churchwell et al. 1990).
Table 1. Observing parameters
Date Config. # ant. Source loop τ(225GHz)
[hours]
28.May05 very ext. 6 7.0 0.13-0.16
18.Jul.05 comp. 7 7.5 0.06-0.09
4.Sep.05 ext. 6 4.5 0.06-0.08
5.Nov.05 very ext. 7 3.0 0.06
For bandpass calibration we used Ganymede in the com-
pact configuration and 3C279 and 3C454.3 in the extended and
very extended configuration. The flux scale was derived in the
compact configuration again from observations of Ganymede.
For two datasets of the more extended configurations, we used
3C454.3 for the relative scaling between the various baselines
and then scaled that absolutely via observations of Uranus.
For the fourth dataset we did the flux calibration using 3C279
only. The flux accuracy is estimated to be accurate within 20%.
Phase and amplitude calibration was done via frequent observa-
tions of the quasars 1743-038 and 1751+096, about 15.5◦ and
18.3◦ from the phase center of G29.96−0.02. The zenith opac-
ity τ(348GHz), measured with the NRAO tipping radiometer
located at the Caltech Submillimeter Observatory, varied dur-
ing the different observation nights between ∼0.15 and ∼0.4
(scaled from the 225 GHz measurement). The receiver operated
in a double-sideband mode with an IF band of 4-6 GHz so that
the upper and lower sideband were separated by 10 GHz. The
central frequencies of the upper and lower sideband were 348.2
and 338.2 GHz, respectively. The correlator had a bandwidth
of 2 GHz and the channel spacing was 0.8125 MHz. Measured
double-sideband system temperatures corrected to the top of
the atmosphere were between 110 and 800 K, depending on the
zenith opacity and the elevation of the source. Our sensitivity
was dynamic-range limited by the side-lobes of the strongest
1 The Submillimeter Array is a joint project between the
Smithsonian Astrophysical Observatory and the Academia Sinica
Institute of Astronomy and Astrophysics, and is funded by the
Smithsonian Institution and the Academia Sinica.
Beuther et al.: SMA observations of G29.96−0.02 3
emission peaks and thus varied between the line maps of dif-
ferent molecules and molecular transitions. This limitation was
mainly due to the incomplete sampling of short uv-spacings
and the presence of extended structures. The 1σ rms for the
velocity-integrated molecular line maps (the velocity ranges for
the integrations were chosen for each line separately depend-
ing on the line-widths and intensities) ranged between 36 and
76 mJy. The average synthesized beam of the spectral line maps
was 0.65′′×0.48′′ (P.A. −83◦). The 862 µm submm continuum
image was created by averaging the apparently line-free parts
of the upper sideband. The 1σ rms of the submm continuum
image was ∼ 21 mJy/beam, and the achieved synthesized beam
was 0.36′′×0.25′′ (P.A. 18◦), the smallest beam obtained so far
with the SMA. The different synthesized beams between line
and continuum maps are due to different applied weightings in
the imaging process (“robust” parameters set in MIRIAD to 0
and -2, respectively) because there was insufficient signal-to-
noise in the line data obtained in the very extended configura-
tion. The initial flagging and calibration was done with the IDL
superset MIR originally developed for the Owens Valley Radio
Observatory (Scoville et al. 1993) and adapted for the SMA2.
The imaging and data analysis were conducted in MIRIAD
(Sault et al. 1995).
3. Results
3.1. Submillimeter continuum emission
Figure 1 presents the 862 µm continuum emission extracted
from the line-free parts of the upper sideband spectrum
(∼1.8 GHz in total used) shown in Figure 4. The very high spa-
tial resolution of 0.36′′ × 0.25′′ corresponds to a linear res-
olution of ∼ 1800 AU at the given distance of ∼6 kpc. The
submm continuum emission peaks approximately 2′′ west of
the UCHii region and is associated with the molecular line
emission known from previous observations. We do not de-
tect any submm continuum emission toward the UCHii re-
gion itself. At the given spatial resolution, for the first time
multiplicity within the G29.96−0.02 hot core is resolved and
we identify 6 submm continuum emission peaks (submm1 to
submm6) above the 3σ level of 63 mJy beam−1 (Fig. 1). We
consider submm1 and submm2 to be separate sources instead
of a dust ridge because we count compact spherical or ellip-
tical sources and their emission peaks are separated by about
one synthesized beam. The four strongest submm peaks, that
are all > 6σ detections, are located within a region of (1.3′′)2
(7800 AU) in diameter. The submm peak submm1 is associated
with H2O and H2CO maser emission (Hofner & Churchwell
1996; Hoffman et al. 2003), and we consider this to be probably
the most luminous sub-source. The other H2O maser peaks are
offset from the submm continuum emission. The mid-infrared
source detected by De Buizer et al. (2002) is offset > 1′′ from
the submm emission. This may either be due to uncertainties
in the MIR astrometry or the MIR emission may trace another
young source in the region. It should be noted that the class ii
CH3OH masers detected by Minier et al. (2001) peak close to
2 The MIR cookbook by Charlie Qi can be found at
http://cfa-www.harvard.edu/∼cqi/mircook.html.
the MIR source as well, which indicates that the MIR offset
from the hot core may well be real.
Table 2 lists the absolute source positions, their 862µm
peak intensities and the integrated flux densities approximately
associated with each of the sub-sources. Calculating the bright-
ness temperature Tb of the corresponding Planck-function for,
e.g., submm1, we get Tb(Peak1) ∼ 27 K. Assuming hot core
dust temperatures of ∼ 100 K, the usual assumption of opti-
cally thin dust emission is not really valid anymore, and one
gets an approximate beam-averaged optical depth τ of the
dust emission of ∼0.3. To calculate the dust and gas masses,
we can follow the mass determination outlined in Hildebrand
(1983) and Beuther et al. (2002, 2005a), which assumes op-
tically thin emission, and correct that for the increased dust
opacity. Assuming constant emission along the line of sight,
the opacity correction factor C is
1 − e−τ
With τ ∼ 0.3, we get a correction factor C ∼ 1.16 still compa-
rably small. Assuming a dust opacity index β = 1.5, the dust
opacity per unit dust mass is κ(862µm) ∼ 1.5 cm2g−1 (with
the reference value κ(250µm) ∼ 9.4 cm2g−1, see Hildebrand
1983), and we assume a gas-to-dust ratio of 100. Given the
uncertainties in β and T , we estimate the masses to be accu-
rate within a factor 4. Table 2 gives the derived masses and
beam-averaged column densities. Each sub-peak has a mass
of a few M⊙, and the main submm1 exhibits approximately
10 M⊙ of compact, warm gas and dust emission. The inte-
grated 862 µm continuum flux density of the central region
comprising the four main submm continuum sources amounts
to 1.16 Jy. At an average dust temperature of 100 K, this cor-
responds to a central core mass of 39.9 M⊙. In comparison to
these flux density measurements, Thompson et al. (2006) ob-
served with SCUBA 850µm peak and integrated flux densities
of ∼ 11.5 ± 1.2 Jy/(14′′beam) and ∼19.2 Jy, respectively. The
ratio between peak and integrated JCMT fluxes already indi-
cates non-compact emission even on that scales. Furthermore,
subtracting a typical line contamination of the continuum emis-
sion in hot cores of the order 25% (e.g., NGC6334I, Hunter et
al. in prep.), the total 850µm single-dish continuum flux den-
sity should amount to ∼8.6 Jy. Compared with the integrated
flux density in the SMA data of ∼1.74 Jy, this indicates that
approximately 80% of the single-dish emission is filtered out
by the missing short spacings in the interferometer data. The
dust and gas in the central region have higher temperatures than
the components filtered out on larger spatial scales, and since
the dust and gas mass is inversely proportionally related to the
temperature by MH2 ∝ (e
hν/kT − 1) (e.g., Beuther et al. 2002),
a greater proportion of the mass (> 80%) is filtered out in the
SMA data. However, the SMA image reveals the most compact
hot gas and dust cores at the center of the evolving massive
star-forming region. The shortest baseline of the SMA obser-
vations of ∼16.5 kλ correspond to scales > 12′′ which hence
have to be filtered out entirely. However, even smaller scales are
missing because the uv-spacings corresponding to scales ≥ 5′′
are still relatively poorly sampled and the image presented in
Figure 1 is only sensitive to spatial scales of the order a few
http://cfa-www.harvard.edu/~cqi/mircook.html
4 Beuther et al.: SMA observations of G29.96−0.02
Fig. 1. The hot core UCHii region G29.96−0.02. The grey-scale with contours shows the submm continuum emission with a
spatial resolution of 0.36′′×0.25′′. The contour levels start at the 1σ level of 21 mJy beam−1 and continue at 63, 105 mJy beam−1
(black contours) to 147, 168 mJy beam−1 (white contours). The dashed contours outline the cm continuum emission from the
UCHii region and the thick contours show the NH3 emission (Cesaroni et al. 1994). The contouring is done from 15 to 95% (step
10%) of the peak emission of each image, respectively (S peak(1.2cm) = 109mJy/beam, S peak(NH3) = 15mJy/beam). Triangles,
circles and pentagons show the H2O (Hofner & Churchwell 1996), H2CO (Hoffman et al. 2003) and class ii CH3OH (Minier et al.
2001) maser positions. The star marks the peak of the MIR emission (De Buizer et al. 2002), which is not a point source but has
a similar size as the NH3 emission. The squares mark the infrared sources by Pratap et al. (1999).
arcseconds. The submm peaks detected by the SMA are much
stronger than what would have been expected if the single-dish
flux (∼8.6 Jy) were uniformly distributed over the SCUBA pri-
mary beam of 14′′, even ignoring any spatial filtering and miss-
ing flux effects (This would result in ∼ 4 mJy per synthesized
SMA beam.). This shows that the emission measured on the
small spatial scales sampled by the SMA represents the com-
pact core emission much better than expected. However, it does
not imply that the gas masses measured by the SMA are the
only gas reservoir the embedded protostars have for their on-
going accretion; they may also gain mass from the large-scale
gas envelope that is filtered out by our observations (see also
the competitive accretion scenario, e.g., Bonnell et al. 2004).
The derived beam-averaged H2 column densities are of the or-
der a few times 1024 cm−2, corresponding to visual extinctions
Av of a few 1000 (Av = NH/0.94 × 10
21, Frerking et al. 1982).
3.2. Spectral line emission
Figure 4 presents spectra extracted toward the main submm
submm1 with an angular resolution of 0.64′′×0.47′′ compared
to the submm continuum map (see §2). More than 80 spectral
lines from 18 molecular species, isotopologues or vibrationally
excited species have been identified with a minor fraction of
∼5% of unidentified lines (UL) (Tables 6 & 4). The range of up-
per level excitation temperatures for the many lines varies be-
tween approximately 40 and 750 K (Table 6). Therefore, with
one set of observations we are able to trace various gaseous
temperature components from the relatively colder gas sur-
Beuther et al.: SMA observations of G29.96−0.02 5
Table 2. Submm continuum source parameters
Source R.A. Dec. S peak S int M N
[J2000] [J2000] [ mJy
] [mJy] [M⊙] [10
24cm−3]
submm1 18:46:03.786 -02:39:22.19 173 288 11.5 5.7
submm2 18:46:03.789 -02:39:22.48 168 237 9.5 5.5
submm3 18:46:03.758 -02:39:22.16 138 178 7.1 4.5
submm4 18:46:03.736 -02:39:22.65 151 249 9.9 5.0
submm5 18:46:03.710 -02:39:23.33 68 106 4.2 2.2
submm6 18:46:03.665 -02:39:23.80 84 85 3.4 2.8
The Table shows the peak intensities S peak, the integrated intensities S int, the derived gas masses M as well as the H2 column densities N.
rounding the hot core region to the densest and warmest gas
best observed in some of the vibrationally excited lines.
Table 3. Peak intensities, rms and velocity ranges for images
in Figs. 2 & 3.
Line S peak rms ∆v
mJy/beam mJy/beam km/s
862µm cont., low res. 422 17
CH3OH(73,5 − 62,4) 878 64 [90,104]
13CH3OH(137,7 − 127,6) 752 51 [95,101]
CH3OH(74,3 − 64,3), vt = 1 1419 69 [91,105]
CH3OCH3(74,3 − 63,4) 669 46 [94,104]
C2H5OH(157,9 − 156,10) 586 51 [95,100]
SiO(8 − 7) 391 36 [75,105]
C34S(7 − 6) 592 62 [92,104]
H2CS(101,0 − 91,9) 933 69 [92,100]
34SO(88 − 77) 827 57 [95,103]
SO2(144,14 − 183,15) 544 53 [94,100]
HCOOCH3(275,22 − 265,21) 491 70 [96,100]
CH3CN(198 − 188) 788 71 [94,100]
CH3CH2CN(383,36 − 373,35) 791 56 [94,102]
CH3CHCN(362,34 − 352,32) 655 68 [96,100]
HC3N(37 − 36), v7 = 1 622 55 [94,102]
HC3N(37 − 36), v7 = 2 416 57 [94,100]
HN13C(4 − 3) 1149 76 [94,100]
Figures 2 and 3 now present integrated images of the var-
ious detected species, isotopologues and vibrationally excited
lines. For comparison, Figure 2 also shows the submm con-
tinuum emission reduced with the same degraded spatial res-
olution as the line images. All images show emission in the
vicinity of the hot molecular core and no emission toward the
associated UCHii region. However, the morphology varies sig-
nificantly between many of the observed molecular line maps.
The molecular emission is largely confined to the central region
of the main four submm continuum peaks, and we do not detect
appreciable molecular emission toward the continuum peaks
5 and 6. Reducing the submm continuum data with the same
spatial resolution as the line images, the four submm peaks
are smoothed to a single elongated structure peaking close to
the submm peak submm1 (Fig. 2, top-left panel). The ground
state CH3OH emission is relatively broadly distributed with
two peaks in east-west direction, and one may associate one
with the submm peaks 1 and 2 and the other with the submm
peak submm3, but most other maps show on average one spec-
tral line peak somewhere in the middle of the 4 main submm
continuum peaks, similar to the lower-resolution submm con-
tinuum map.
However, there are also a few species which significantly
deviate from this picture and show a different spatial morphol-
ogy. For example SiO is more extended in north-east south-
west direction likely due to a molecular outflow (§3.3). Also
interesting is the emission from C34S which lacks emission
around the central four submm peaks but is stronger in the in-
terface region between the hot molecular core and the UCHii
region (§4.3). Furthermore, there are a few spectral line maps
– mainly those from likely optically thin lines (HCOOCH3,
HN13C), highly excited lines (CH3CHCN) and vibrationally
excited lines (CH3OH vt = 1, 2, HC3N v7 = 1, 2) – which show
their emission peaks concentrated toward the main submm
peak submm1 (§4.5).
Previous lower-resolution (∼ 10′′) molecular line observa-
tions revealed strong CH3OH emission toward the H2O maser
feature approximately 4′′ south-west of the hot core peak
(Fig. 1, Hofner & Churchwell 1996; Pratap et al. 1999). A lit-
tle bit surprising, we do not detect any CH3OH emission (nor
any other species) toward that south-western position, even
when imaged at low angular resolution using only the com-
pact configuration data (therefore, we do not cover that posi-
tion in Figures 2 and 3). Pratap et al. (1999) discuss mainly
two possibilities to explain this discrepancy: Either their ob-
served specific CH3OH(80 − 71) line is a weak maser and
we do not cover any comparable CH3OH line, or the emis-
sion covered by the lower-resolution data is relatively extended
and filtered out by our observations. As discussed in the pre-
vious section, the shortest baseline of our observations was
∼16 m, implying that we are not sensitive to any scales > 12′′.
Since the CH3OH emission in Pratap et al. (1999) is slightly
resolved by their synthesized beam of 12.6′′ × 9.8′′, it is un-
likely that we would have filtered out all emission. However,
among the many observed CH3OH lines (Table 6), some have
similar excitation temperatures of the order 80 K as the line
observed by Pratap et al. (1999), and we would expect to de-
tect thermal emission from these lines as well. Therefore, our
non-detection of CH3OH emission toward the south-western
H2O maser position supports rather their suggested scenario of
weak CH3OH maser emission in the previously reported obser-
vations (Pratap et al. 1999).
6 Beuther et al.: SMA observations of G29.96−0.02
Fig. 4. Lower and upper sideband spectra extracted toward the
submm1. The spatial resolution of these data is 0.64′′ × 0.47′′.
The main line identifications are shown in both panels.
Table 4. Detected molecular species
Species Isotopologues Vibrational states
CH3OH
13CH3OH CH3OH, vt = 1, 2
CH3OCH3
C2H5OH
HCOOCH3
CH3CN
CH3CH2CN
CH3CHCN
HC3N, v7 = 1, 2
HN13C
a The detection of this CH3OH vt = 2 line is
doubtful since other close vt = 2 lines with
similar excitation temperatures were not detected.
3.3. Molecular outflow emission
The SiO(8-7) spectrum spans a large range of velocities from
∼75 to ∼111 km s−1. Integrating the blue- and red-shifted emis-
sion, one gets the outflow map presented in Fig. 5. The elon-
gated north-west south-east structure is consistent with the pre-
viously proposed outflow by Gibb et al. (2004). The additional
red feature north-east of the central hot core region makes the
interpretation ambiguous: If the north-west south-east outflow
is a relatively highly collimated jet, then the north-eastern red
feature could be attributed to an additional outflow leaving
the core in north-east south-west direction. The blue wing of
that potential second outflow would not detected in our data.
However, since we are filtering out any larger-scale emission,
it is also possible that the red SiO features south-east and north-
east of the main core are part of the same wide-angle outflow
tracing potentially the limb-brightened cavity walls. In this sce-
nario, our observations would miss part of the blue-shifted
wide-angle outflow lobe. With the current data, it is difficult to
clearly distinguish between the two scenarios. However, com-
paring the elongated blue-shifted SiO(8–7) data with the pre-
vious north-west south-eastern outflow observed in H2S by
Gibb et al. (2004), it appears that this is the most likely direc-
tion of the main outflow of the region. Therefore, the multi-
ple outflow scenario appears more likely for the hot core in
G29.96−0.02. The lower resolution SiO(2–1) observation by
Maxia et al. (2001) are also consistent with this scenario. Based
on these data, we cannot conclusively say which of the submm
continuum sources submm1 to submm4 contribute to driving
the outflows.
Fig. 5. SiO(8-7) outflow map. The full and dashed contours
are integrated over the blue- and redshifted SiO emission as
shown in the figure. The contouring starts at ±2σ and continues
in ±1σ steps (thick contours positive, thin contours negative).
The 1σ values for the blue- and red-shifted images are 48 and
46 mJy beam−1, respectively. The markers are the same as in
the previous images, the synthesized beam of 0.68′′ × 0.49′′ is
shown at the bottom right, and the arrows guide the eye for the
potential directions of the two discussed outflows. The offsets
on the axes are relative to the phase center.
4. Discussion
4.1. The formation of a proto-Trapezium system?
The four main submm continuum peaks are located within a
projected area of 7800 × 7800 (AU)2 on the sky. The projected
separation ∆θ between individual sub-sources varies between
1800 AU (peaks 1 and 2) and 5400 AU (peaks 1 and 4, see Table
5). Could the four central submm peaks be the predecessors
of a future Trapezium system? Trapezia are defined as non-
hierarchical multiple systems of three or more stars where the
Beuther et al.: SMA observations of G29.96−0.02 7
largest projected separation between Trapezia members should
not exceed the smallest projected separation by a factor of 3
(Sharpless 1954; Ambartsumian 1955; Abt & Corbally 2000).
This criterion is satisfied by the four submm peaks at the cen-
ter of the G29.96−0.02 hot core. The 14 optically identified
Trapezia discussed by Abt & Corbally (2000) have mean radii
to the furthest outlying member of ∼ 4 × 104 AU, with the
largest radius of ∼ 5.4 × 105 AU (∼2.6 pc), the approximate
dimension of an open cluster. Therefore, the protostellar pro-
jected separations of the tentative proto-Trapezium candidate in
G29.96−0.02 are significantly smaller than in typical optically
visible Trapezia systems. A similar small size for a candidate
Trapezium system has recently been reported for the multiple
system in W3IRS5 (Megeath et al. 2005).
Table 5. Spatial separation
Pair ∆θ ∆x
[′′] [AU]
1-2 0.3 1800
1-3 0.5 3000
1-4 0.9 5400
2-3 0.6 3600
2-4 0.8 4800
3-4 0.6 3600
The numbers in column 1 correspond to the numbers of the submm
peaks.
The small sizes of the proto-Trapezia in G29.96−0.02 and
W3IRS5 may be attributed to their youth. During their upcom-
ing evolution, these young system will expel most of the sur-
rounding gas and dust envelope via the protostellar outflows
and strong uv-radiation. Therefore, the whole gravitational po-
tential of the system will decrease and the kinetic energy may
dominate. Systems with positive total energy will globally ex-
pand and will eventually be observable as a larger-scale optical
Trapezia systems (Ambartsumian 1955).
With the given data it is hard to estimate how massive the
expected Trapezia stars are and will finally be at the end of
their formation processes. The integrated hot core luminosity
is estimated to be ∼ 105 L⊙ (Cesaroni et al. 1994; Olmi et al.
2003), in contrast to the integrated luminosity of the whole re-
gion measured by the large IRAS beam of ∼ 106 L⊙. Producing
105 L⊙ requires either an O7 star or a few stars of comparable
but lower masses. Nevertheless, the numbers imply that this
Trapezium system should form at least one or more massive
stars. Although the gas masses we derived from our dust con-
tinuum data (Table 2) are relatively low, that does not neces-
sarily imply that their mass reservoir is restricted to these gas
masses because it is possible that they may accrete additional
gas from the larger-scale envelope that is filtered out by our
observations. This scenario is predicted by the competitive ac-
cretion model for massive star formation (e.g., Bonnell et al.
2004). The fact that the gas masses we find for the four
strongest submm sources are all similar allows to speculate that
they may form about similar mass stars in the end, however, this
cannot be proven by these data in more detail.
Assuming that the projected size of the potential
proto-Trapezium system in G29.96−0.02 of approximately
7800 (AU)2 resembles a 3-dimensional sphere of radius
∼3900 AU, we can estimate the current protostellar volume
density of the region to approximately 1.4 × 105 protostars
per cubic pc. This number is larger than typical stellar den-
sities in young clusters of the order 104 stars per cubic pc
(Lada & Lada 2003), but it is still below the extremely high
(proto)stellar densities required for protostellar merger models
of the order 106 to 108 stars per cubic pc (Bonnell et al. 1998,
2004; Stahler et al. 2000; Bally & Zinnecker 2005).
Although we have not yet observed the extremely high
(proto)stellar densities predicted by the coalescence scenario,
as soon as we observe massive star-forming regions with a spa-
tial resolution ≤ 4000 AU, we begin to resolve multiplicity and
potential proto-Trapezia (see also the recent observations of
NGC6334I and I(N) by Hunter et al. 2006). Furthermore, this
(proto)stellar density may even be a lower limit, since we ob-
serve only a two-dimensional projection and are additionally
sensitivity limited to masses ≥ 2.1 M⊙ (corresponding to the
3σ flux limit of 63 mJy beam−1 at the assumed temperature of
100 K). Higher spatial resolution has so far always increased
the observed (proto)stellar densities, and it is possible that in
the future we may reach the 106 requirement for merging to
play a role. However, it is also important to get better theoret-
ical predictions of potential merger signatures that observers
could look for.
4.2. Various episodes of massive star formation?
It is interesting to note that the previously identified mid-
infrared source (De Buizer et al. 2002) is offset from the
submm continuum peaks. Although the mid-infrared astrome-
try is usually relatively uncertain, the association of the mid-
infrared peak with class ii CH3OH maser emission with an
absolute positional uncertainty of only 30 mas (Minier et al.
2001) is indicative that the offset may be real. Combining the
facts that we find within a small region of only ∼20000 AU
(∼0.1 pc) at least three different regions of massive star for-
mation – the UCHii region, the mid-infrared source, and the
submm continuum sources – indicates that not all massive stars
within the same evolving cluster are coeval but that sequences
of massive star formation may take place even on such small
spatial scales.
4.3. Carbon mono-sulfide C34S
One of the most striking spectral line maps is from the rare car-
bon mono-sulfide isotopologue C34S(7–6). Its emission peak is
not toward the hot core nor any of the submm continuum peaks,
but largely east of it in the interface region between the submm
continuum peaks and the UCHii region. Hence, one likes to
understand why the C34S emission is that weak toward the hot
core region and that strong at the hot core/UCHii region inter-
face.
CS usually desorbs from dust grains at moderate temper-
atures of a few 10 K, hence it should be observable relatively
8 Beuther et al.: SMA observations of G29.96−0.02
early in the evolution of a growing hot molecular core (e.g.,
Viti et al. 2004). From 100 K upwards H2O is released from
grains, then it forms OH molecules, and the OH can react
with S to SO and SO2 (e.g., Charnley 1997). Therefore, the
initial high CS abundances should decrease with time while
the SO and SO2 are expected to increase with time (e.g.,
Wakelam et al. (2005)). As shown in Figure 2, 34SO peaks to-
ward the hot core where the derived CH3OH temperatures ex-
ceed the H2O evaporation temperature (see §4.4 and Fig. 7b,
potentially validating this theoretical prediction. According to
such chemical models, the hot core G29.96−0.02 should have
a chemical age of at least a few times 104 years.
The strong C34S emission in the hot core/UCHii interface
region may be explained in the same framework. In the molec-
ular evolution scheme outlined above, one would expect low
C34S emission toward the hot core with a maybe symmetrical
increase further-out. In the case of the G29.96−0.02 hot core,
we have the decrease toward the center, but the emission rises
only toward the east, north and west with the strongest increase
in the eastern hot core/UCHii region interface. If one compares
the C34S morphology in Figure 2 with the temperature distri-
bution in Figure 7b, one finds the lowest CH3OH temperatures
right in the vicinity of the C34S emission peaks, adding further
support to the proposed chemical picture.
Extrapolating this scenario to other molecules, it indicates
that species which are destroyed by H2O, e.g., molecular ions
such as HCO+ or N2H
+ (e.g., Bergin et al. 1998), are no good
probes of the inner regions of hot molecular cores.
4.4. Temperature structure
Leurini et al. (2004, 2007) investigated the diagnostic proper-
ties of methanol over a range of physical parameters typical
of high-mass star-forming regions. They found that the ground
state lines of CH3OH are mainly tracers of the spatial density
of the gas, although at submillimeter wavelengths high k tran-
sitions are also sensitive to the kinetic temperature. However,
in hot, dense regions such as hot cores, the effects of infrared
pumping on the level populations due to the thermal heating of
the dust is not negligible, but mimic the effect of collisional ex-
citation. For the ground state line, Leurini et al. (2007) found
that it is virtually impossible to distinguish between IR pump-
ing and pumping by collisions, as both mechanisms equally
populate the vt = 0 levels. On the other hand, the vibrationally
or torsionally excited lines have very high critical densities
(1010–1011 cm−3) and high level energies (T ≥ 200 K). They
are difficult to be populated by collisions and trace the IR field
instead.
To study the physical conditions of the gas around the main
continuum peaks in G29.96–0.02, we analyzed only the emis-
sion coming from the vt = 1 lines, as their optical depth is lower
than for the ground state, and their emission is confined to the
gas around the dust condensations, while the vt = 0 transitions
are more extended and can be affected by problems of missing
flux. We first fitted the methanol emission of the vt = 1 lines
(see Fig. 6) towards the peak position, using the method de-
scribed by Leurini et al. (2004, 2007) that is based on an LVG
analysis and includes radiative pumping (Leurini et al. 2007).
The continuum emission derived in §3.1 was used in the calcu-
lations to solve the equations for the level populations. The two
main dust condensations submm1 and submm2 fall in the beam
of the line data; however, we assumed that the emission is com-
ing from only one component, which is more extended than our
beam, and derived a CH3OH column density averaged over the
beam of 4×1017 cm−2. The corresponding methanol abundance,
relative to H2 is of the order of 10
−7, typical of hot core sources.
Since the emission from the vt = 1 lines is optically thin for
this column density, and also at higher values, we consider this
approach valid. The temperature derived toward the line peak
is 340 K. This corresponds to our best fit model, but from a χ2
analysis we can only infer a low limit of ∼220 K for the temper-
ature of the gas. Since lines are optically thin, the degeneracy
between kinetic temperature and column density is not solved,
and the model delivers good fit to the vt = 1 lines for lower
or higher temperatures by adjusting the methanol column den-
sity. However, the low temperature solutions (Tkin=100–200 K)
need high methanol abundances relative to H2(∼ 10
−6), which
can be hardly found at these temperatures. Moreover, lines are
optically thick for these column densities, and the assumption
of our analysis is not valid anymore.
We also investigated the line ratio between several vt = 1
lines at the column density derived for the main position, to
find the best temperature diagnostic tool among the methanol
lines and derive a temperature map of the region. We found
that the line ratios with the blend of lines at ∼ 337.64 GHz
increase with the temperature of the gas (Fig. 7a). However,
the blending of several transitions together complicates the use
of such diagnostic. In Fig. 7b, we show the map of the line ratio
between the 71,6 → 61,5-E vt = 1 at 337.708 GHz and the blend
between the 71,7 → 61,6-E vt = 1 at 337.642 GHz and 70,7 →
60,6-E vt = 1 at 337.644 GHz. Since line intensities do not
simply add up, we did not correct for the overlapping between
the two transitions. Two other lines, the 74,3 → 64,2-E vt = 1
and the 75,3 → 64,2-E vt = 1, are also very close in frequency.
This is seen in the linewidth of the blending, which is wider
than for the other lines. Therefore, we considered only half of
the channels of the blending at 337.64 GHz in our line ratio
analysis. From the ratio-map in Fig. 7b, submm1, submm2 and
submm3 of Table 1 show high temperatures (T≥ 300 K), while
relatively low temperature gas (T∼ 100 K) is found at R.A.
[J2000]=18h46m03s.818 Dec. [J2000]= −02◦39′22′′.14, close
to a secondary peak of many ground state lines of methanol
(Fig. 2). The temperature then decreases towards submm4. The
increase in the line ratio towards the south-east and north is
probably not true, but due to the poor signal to noise ratio in
these areas. Changes in the column densities along the area may
affect our results.
Fig. 6. Spectrum of the 7ka ,kb → 6ka ,kb−1 vt = 1 methanol band to-
ward the main dust condensation. Overlaid in black is the synthetic
spectrum resulting from the fit.
Beuther et al.: SMA observations of G29.96−0.02 9
Fig. 7. a: Modeled line ratio between the 71,6 → 61,5-E vt = 1 line and
the 71,7 → 61,6-E vt = 1 transitions, as function of the temperature. b:
Map of the line ratio between the same transitions in the inner region
around the peaks. The white stars mark the positions of the dust peaks;
the white dashed contours show the values of the line ratio from ∼ 150
to ∼ 350 K, which correspond to levels from 0.3 to 0.7 in step of 0.1
in the map. The solid black contours show the continuum emission
smoothed to the resolution of the line data (from 0.2 to 0.4 Jy/beam in
step of 0.05). The offsets on the axes are relative to the phase center.
4.5. Tracing rotation toward the massive cores
At the given lower spatial resolution of the spectral line data
compared to the submm continuum, we cannot resolve the four
submm peaks well. However, one of the aims of such multi-line
studies is to identify spectral lines that trace the massive proto-
stars and that are potentially associated with massive disk-like
structures. Such lines may then be used for kinematic gas stud-
ies of rotating gas envelopes, tori or accretion disks. Therefore,
we analyzed the data-cubes searching for velocity structures
indicative of any kind of rotation. In the large majority of spec-
tral lines, this was not successful and we could mostly not
identify coherent velocity structure. While chemical and tem-
perature effects (§4.3 & 4.4) may be responsible for parts of
that, the large column densities derived in §3.1 imply also large
molecular line column densities and hence large optical depths.
Therefore, many of the observed lines are likely optically thick
tracing only outer gas layers of the hot molecular core not pen-
etrating down to the deeply embedded protostars. Furthermore,
many molecules would not only be excited in the central ro-
tating disk-like structures but also in the surrounding envelope
and maybe the outflow. Hence, disentangling the different com-
ponents observationally remains a challenging task.
Fig. 8. Moment 1 maps of HN13C(4–3) (top) and HC3N(37–
36)v7 = 1 (bottom). The markers are the same as in the previous
images, and the synthesized beam of 0.68′′ × 0.49′′ is shown at
the bottom left. The offsets on the axes are relative to the phase
center.
The major exceptions are the molecular lines of the rare
isotopologue of hydrogen isocyanide HN13C(4–3) with a low
excitation temperature of only 42 K, and the vibrationally ex-
cited line of cyanoacetylene HC3N(37–36)v7 = 1 with a higher
excitation temperature of 629 K (Fig. 8). In both cases we find
a velocity gradient across the main submm peak submm1 with
a position angle of ∼ 45◦ from north. This is approximately
perpendicular to the molecular outflow discussed in §3.3 and
by Gibb et al. (2004). Interestingly, Gibb et al. (2004) also find
a similar velocity gradient in their central velocity channels of
H2S. The previously reported NH3 and CH3CN velocity gradi-
ents in approximately east-west direction (Cesaroni et al. 1998;
Olmi et al. 2003) have been observed with slightly lower spa-
tial resolution and are consistent with our data as well.
Our observations as well as previous work in the liter-
ature suggest that the G29.96−0.02 hot core exhibits a ve-
locity gradient in the dense gas in approximately north-east
south-west direction perpendicular to the molecular outflow
observed at larger scales. Based on the HN13C(4–3) map, the
diameter of this structure is ∼ 1.6′′ corresponding to radius
of ∼4800 AU. Since this emission encompasses not only the
submm peak submm1 but also the submm2 and submm3,
it is not genuine protostellar disk as often observed in low-
mass star-forming regions. The velocity structure does not re-
semble Keplerian rotation and may hence be due to some
larger-scale rotating envelope or torus that could transform
into a genuine accretion disks at smaller still unresolved spa-
tial scales (Cesaroni et al. 2007). Additional options to ex-
plain such a velocity gradient may be (a) interaction with the
2nd outflow in north-east–south-western direction, (b) inter-
action with the expanding UCHii region, and (c) global col-
lapse like recently proposed for NGC2264 (Peretto et al. 2006).
While we cannot exclude (a) and (b), option (c) of a globally
collapsing core appears particularly interesting because com-
bining rotation and collapse would result in an inward spi-
raling kinematic structure, potentially similar to the models
originally proposed for rotating low-mass cores (e.g., Ulrich
1976; Terebey et al. 1984). Recent hydrodynamic simulations
by Dobbs et al. (2005) and Krumholz et al. (2006) as well as
analytic studies by Kratter & Matzner (2006) find fragmenta-
tion and star formation within the massive disks forming early
in the collapse process of high-mass cores. This would be con-
sistent with the found three sub-sources (submm1 to submm3)
within the HN13C/HC3N structure. However, on a cautionary
note it needs to be stressed that the collapse/rotation scenario
is far from conclusive, and that the outflow and/or UCHii re-
gion can potentially influence the observed velocity pattern as
well. It remains puzzling that only these two lines exhibit the
discussed signatures whereas all the other spectral lines in our
setup do not.
5. Conclusions and Summary
The new 862 µm submm continuum and spectral line data ob-
tained with the SMA toward G29.96−0.02 at sub-arcsecond
spatial resolution resolve the hot molecular core into sev-
eral sub-sources. At an angular resolution of 0.36′′ × 0.25′′,
corresponding to linear scales of ∼1800 AU, the central core
contains four submm continuum peaks which resemble a
Trapezium-like multiple system at a very early evolutionary
stage. Assuming spherical symmetry for the hot core region,
the protostellar densities are high of the order 1.4 × 105 pro-
tostars per pc3. However, these protostellar densities are still
below the required values between 106 to 108 protostars/pc3 to
make coalescence of protostars a feasible process. Derived H2
column densities of the order a few 1024 cm−2 imply visual ex-
tinctions of a few 1000. The existence of three sites of massive
star formation in different evolutionary stages within a small
region (the UCHii region, the mid-infrared source, and the
submm continuum sources) indicates that sequences of mas-
sive star formation may take place within the same evolving
massive protocluster.
The 4 GHz of observed bandpass reveal a plethora of ap-
proximately 80 spectral lines from 18 molecular species, iso-
topologues or vibrationally excited lines. Only about 5% of the
10 Beuther et al.: SMA observations of G29.96−0.02
spectral lines remain unidentified. Most spectral lines peak to-
ward the hot molecular core, while a few species also show
more extended emission, likely due to molecular outflows and
chemical differentiation. The CH3OH line forest allows us to
investigate the temperature structure in more detail. We find
hot core temperatures≥ 300 K and decreasing temperature gra-
dients to the core edges. The SiO(8-7) observations confirm
a previously reported outflow Gibb et al. (2004) in north-west
south-east direction with a potential identification of a second
outflow emanating approximately in perpendicular direction.
Furthermore, C34S exhibits a peculiar morphology being weak
toward the hot molecular core and strong in its surroundings,
particular in the UCHii/hot core interface region. The C34S de-
ficiency toward the hot molecular core may be explained by
time-dependent chemical desorption from grains, where the
C34S desorbs early, and later-on after H2O desorbs from grains
forming OH, the sulphur reacts with the OH to form SO and
Furthermore, we were interested in identifying the best
molecular line tracers to investigate the kinematics and po-
tential disk-like structures in such dense and young massive
star-forming regions. Most spectral lines do not exhibit any
coherent velocity structure. A likely explanation for this un-
correlation between molecular line peaks and submm contin-
uum peaks is that many spectral lines may be optically thick in
such high-column-density regions, and that additional chem-
ical evolution and temperature effects complicate the picture.
Furthermore, many molecules are excited in various gas com-
ponents (e.g., disk, envelope, outflow), and it is often observa-
tionally difficult to disentangle the different contributions prop-
erly. There are a few exceptions of optically thin and vibra-
tionally excited lines that apparently probe deeper into the core
tracing submm1 better than other transitions. Investigating the
velocity pattern of these spectral lines, we find for some of
them a velocity gradient in the north-east south-west direc-
tion perpendicular to the molecular outflow. Since the spatial
scale of this structure is relatively large (∼4800 AU) compris-
ing three of the central protostellar sources, and since the veloc-
ity structure is not Keplerian, this is not a genuine Keplerian ac-
cretion disk. While these data are consistent with a larger-scale
toroid or envelope that may rotate and/or globally collapse, we
cannot exclude other explanations, such as that the influence
of the outflow(s) and/or expanding UCHII region produces the
observed velocity pattern. In addition to this, these data con-
firm previous findings that the high column densities, the large
optical depths of the spectral lines, the chemical evolution, and
the different spectral line contributions from various gas com-
ponents make it very difficult to identify suitable massive ac-
cretion disk tracers, and hence to study this phenomenon in a
more statistical fashion. (e.g., Beuther et al. 2006)
Acknowledgements. We like to thank Peter Schilke and Sebastian
Wolf for many interesting discussions about related subjects. We
also thank the anonymous referee whose comments helped improv-
ing the paper. H.B. acknowledges financial support by the Emmy-
Noether-Program of the Deutsche Forschungsgemeinschaft (DFG,
grant BE2578).
References
Abt, H. A. & Corbally, C. J. 2000, ApJ, 541, 841
Ambartsumian, V. A. 1955, The Observatory, 75, 72
Araya, E., Hofner, P., Goss, W. M., et al. 2006, ApJ, 643, L33
Bally, J. & Zinnecker, H. 2005, AJ, 129, 2281
Bergin, E. A., Neufeld, D. A., & Melnick, G. J. 1998, ApJ, 499,
Beuther, H., Schilke, P., Menten, K. M., et al. 2002, ApJ, 566,
Beuther, H., Schilke, P., Menten, K. M., et al. 2005a, ApJ, 633,
Beuther, H., Zhang, Q., Greenhill, L. J., et al. 2005b, ApJ, 632,
Beuther, H., Zhang, Q., Sridharan, T. K., Lee, C.-F., & Zapata,
L. A. 2006, A&A, 454, 221
Bonnell, I. A., Bate, M. R., & Zinnecker, H. 1998, MNRAS,
298, 93
Bonnell, I. A., Vine, S. G., & Bate, M. R. 2004, MNRAS, 349,
Cesaroni, R., Churchwell, E., Hofner, P., Walmsley, C. M., &
Kurtz, S. 1994, A&A, 288, 903
Cesaroni, R., Galli, D., Lodato, G., Walmsley, C. M., &
Zhang, Q. 2007, in Protostars and Planets V, ed. B. Reipurth,
D. Jewitt, & K. Keil, 197–212
Cesaroni, R., Hofner, P., Walmsley, C. M., & Churchwell, E.
1998, A&A, 331, 709
Charnley, S. B. 1997, ApJ, 481, 396
Churchwell, E., Walmsley, C. M., & Cesaroni, R. 1990, A&AS,
83, 119
De Buizer, J. M., Radomski, J. T., Piña, R. K., & Telesco, C. M.
2002, ApJ, 580, 305
Dobbs, C. L., Bonnell, I. A., & Clark, P. C. 2005, MNRAS,
360, 2
Frerking, M. A., Langer, W. D., & Wilson, R. W. 1982, ApJ,
262, 590
Gibb, A. G., Wyrowski, F., & Mundy, L. G. 2004, ApJ, 616,
Hatchell, J., Thompson, M. A., Millar, T. J., & MacDonald,
G. H. 1998, A&AS, 133, 29
Hildebrand, R. H. 1983, QJRAS, 24, 267
Ho, P. T. P., Moran, J. M., & Lo, K. Y. 2004, ApJ, 616, L1
Hoffman, I. M., Goss, W. M., Palmer, P., & Richards, A. M. S.
2003, ApJ, 598, 1061
Hofner, P. & Churchwell, E. 1996, A&AS, 120, 283
Hunter, T. R., Brogan, C. L., Megeath, S. T., et al. 2006, ApJ,
649, 888
Kratter, K. & Matzner, C. 2006, ArXiv Astrophysics e-prints:
astro-ph/0609692
Krumholz, M., Klein, R., & McKee, C. 2006, ArXiv
Astrophysics e-prints: astro-ph/0609798
Lada, C. J. & Lada, E. A. 2003, ARA&A, 41, 57
Leurini, S., Schilke, P., Menten, K. M., et al. 2004, A&A, 422,
Leurini, S., Schilke, P., Wyrowski, F., & Menten, K. 2007, sub-
mitted
Maxia, C., Testi, L., Cesaroni, R., & Walmsley, C. M. 2001,
A&A, 371, 287
Beuther et al.: SMA observations of G29.96−0.02 11
McCutcheon, W. H., Sandell, G., Matthews, H. E., et al. 2000,
MNRAS, 316, 152
Megeath, S. T., Wilson, T. L., & Corbin, M. R. 2005, ApJ, 622,
Minier, V., Conway, J. E., & Booth, R. S. 2001, A&A, 369, 278
Olmi, L., Cesaroni, R., Hofner, P., et al. 2003, A&A, 407, 225
Peretto, N., Hennebelle, P., & Andre, P. 2006, ArXiv
Astrophysics e-prints, astro-ph/0611277
Pratap, P., Megeath, S. T., & Bergin, E. A. 1999, ApJ, 517, 799
Sault, R. J., Teuben, P. J., & Wright, M. C. H. 1995, in ASP
Conf. Ser. 77: Astronomical Data Analysis Software and
Systems IV, 433
Schilke, P., Walmsley, C. M., Pineau des Forets, G., & Flower,
D. R. 1997, A&A, 321, 293
Scoville, N. Z., Carlstrom, J. E., Chandler, C. J., et al. 1993,
PASP, 105, 1482
Sharpless, S. 1954, ApJ, 119, 334
Stahler, S. W., Palla, F., & Ho, P. T. P. 2000, Protostars and
Planets IV, 327
Terebey, S., Shu, F. H., & Cassen, P. 1984, ApJ, 286, 529
Thompson, M. A., Hatchell, J., Walsh, A. J., MacDonald,
G. H., & Millar, T. J. 2006, A&A, 453, 1003
Ulrich, R. K. 1976, ApJ, 210, 377
Viti, S., Collings, M. P., Dever, J. W., McCoustra, M. R. S., &
Williams, D. A. 2004, MNRAS, 354, 1141
Wakelam, V., Selsis, F., Herbst, E., & Caselli, P. 2005, A&A,
444, 883
Watson, A. M. & Hanson, M. M. 1997, ApJ, 490, L165+
Wood, D. O. S. & Churchwell, E. 1989, ApJS, 69, 831
Wright, M. C. H., Plambeck, R. L., & Wilner, D. J. 1996, ApJ,
469, 216
Wyrowski, F., Gibb, A. G., & Mundy, L. G. 2002, in
Astronomical Society of the Pacific Conference Series, 43
12 Beuther et al.: SMA observations of G29.96−0.02
Fig. 2. Compilation of integrated line images (and submm continuum at the same spatial resolution) always shown in grey-scale
with contours and labeled at the bottom of each panel. The dashed contours show negative features due to missing short spacings.
The contouring is done from ±15 to ±95% (step ±10%) of the peak emission of each image, respectively. Peak fluxes S peak, rms
and integrated velocity ranges for each image are given in Table 3. The dotted contours again show the UCHii region and the
stars mark the submm continuum peaks from Figure 1. The offsets on the axes are relative to the phase center.
Beuther et al.: SMA observations of G29.96−0.02 13
Fig. 3. Continued Figure 2.
14 Beuther et al.: SMA observations of G29.96−0.02
Table 6. Line parameters
Freq. Line Eu Freq. Line Eu
GHz K GHz K
337.279 CH3OH(72,5 − 62,4)E(vt=2)
a 727 338.409 CH3OH(70,7 − 60,6)A 65
337.297 CH3OH(71,7 − 61,6)A(vt=1) 390 338.431 CH3OH(76,1 − 66,0)E 254
337.348 CH3CH2CN(383,36 − 373,35) 328 338.442 CH3OH(76,1 − 66,0)A 259
337.397 C34S(7–6) 65 CH3OH(76,2 − 66,1)A
− 259
337.421 CH3OCH3(212,19 − 203,18) 220 338.457 CH3OH(75,2 − 65,1)E 189
337.446 CH3CH2CN(374,33 − 364,32) 322 338.475 CH3OH(75,3 − 65,2)E 201
337.464 CH3OH(76,1 − 60,0)A(vt=1) 533 338.486 CH3OH(75,3 − 65,2)A 203
337.474 UL CH3OH(75,2 − 65,1)A
− 203
337.490 HCOOCH3(278,20 − 268,19)E 267 338.504 CH3OH(74,4 − 64,3)E 153
337.519 CH3OH(75,2 − 65,2)E(vt=1) 482 338.513 CH3OH(74,4 − 64,3)A
− 145
337.546 CH3OH(75,3 − 65,2)A(vt=1) 485 CH3OH(74,3 − 64,2)A 145
CH3OH(75,2 − 65,1)A
−(vt=1) 485 CH3OH(72,6 − 62,5)A
− 103
337.582 34SO(88 − 77) 86 338.530 CH3OH(74,3 − 64,2)E 161
337.605 CH3OH(72,5 − 62,4)E(vt=1) 429 338.541 CH3OH(73,5 − 63,4)A
+ 115
337.611 CH3OH(76,1 − 66,0)E(vt=1) 657 338.543 CH3OH(73,4 − 63,3)A
− 115
CH3OH(73,4 − 63,3)E(vt=1) 388 338.560 CH3OH(73,5 − 63,4)E 128
337.626 CH3OH(72,5 − 62,4)A(vt=1) 364 338.583 CH3OH(73,4 − 63,3)E 113
337.636 CH3OH(72,6 − 62,5)A
−(vt=1) 364 338.612 SO2(201,19 − 192,18) 199
337.642 CH3OH(71,7 − 61,6)E(vt=1) 356 338.615 CH3OH(71,6 − 61,5)E 86
337.644 CH3OH(70,7 − 60,6)E(vt=1) 365 338.640 CH3OH(72,5 − 62,4)A 103
337.646 CH3OH(74,3 − 64,2)E(vt=1) 470 338.722 CH3OH(72,5 − 62,4)E 87
337.648 CH3OH(75,3 − 65,2)E(vt=1) 611 338.723 CH3OH(72,6 − 62,5)E 91
337.655 CH3OH(73,5 − 63,4)A(vt=1) 461 338.760
13CH3OH(137,7 − 127,6)A 206
CH3OH(73,4 − 63,3)A
−(vt=1) 461 338.769 HC3N(37 − 36)v7 = 2 525
337.671 CH3OH(72,6 − 62,5)E(vt=1) 465 338.886 C2H5OH(157,8 − 156,19) 162
337.686 CH3OH(74,3 − 64,2)A(vt=1) 546 339.058 C2H5OH(147,7 − 146,8) 150
CH3OH(74,4 − 64,3)A
−(vt=1) 546 347.232 CH2CHCN(381,38 − 371,37) 329
CH3OH(75,2 − 65,1)E(vt=1) 494 347.331
28SiO(8–7) 75
337.708 CH3OH(71,6 − 61,5)E(vt=1) 489 347.446 UL
337.722 CH3OCH3(74,4 − 63,3)EE 48 347.494 HCOOCH3(275,22 − 265,21)A 247
337.732 CH3OCH3(74,3 − 63,3)EE 48 347.759 CH2CHCN(362,34 − 352,32) 317
337.749 CH3OH(70,7 − 60,6)A(vt=1) 489 347.792 UL
337.778 CH3OCH3(74,4 − 63,4)EE 48 347.842 UL
337.787 CH3OCH3(74,3 − 63,4)AA 48 347.916 C2H5OH(204,17 − 194,16) 251
337.825 HC3N(37 − 36)v7 = 1 629 347.983 UL
337.838 CH3OH(206,14 − 215,16)E 676 348.261 CH3CH2CN(392,37 − 382,36) 344
337.878 CH3OH(71,6 − 61,5)A(vt=2) 748 348.340 HN
13C(4–3) 42
337.969 CH3OH(71,6 − 61,5)A(vt=1) 390 348.345 CH3CH2CN(402,39 − 392,38) 351
338.081 H2CS(101,10 − 91,9) 102 348.532 H2CS(101,9 − 91,8) 105
338.125 CH3OH(70,7 − 60,6)E 78 348.910 HCOOCH3(289,20 − 279,19)E 295
338.143 CH3CH2CN(373,34 − 363,33) 317 348.911 CH3CN(199 − 189) 745
338.306 SO2(144,14 − 183,15) 197 349.025 CH3CN(198 − 188) 624
338.345 CH3OH(71,7 − 61,6)E 71 349.107 CH3OH(141,13 − 140,14) 43
338.405 CH3OH(76,2 − 66,1)E 244
a The detection of this CH3OH vt = 2 line is doubtful since other close vt = 2 lines with similar
excitation temperatures were not detected.
This figure "ch3oh1.jpg" is available in "jpg"
 format from:
http://arxiv.org/ps/0704.0518v1
http://arxiv.org/ps/0704.0518v1
This figure "hc3n_v1_mom1.jpg" is available in "jpg"
 format from:
http://arxiv.org/ps/0704.0518v1
http://arxiv.org/ps/0704.0518v1
This figure "hn13c_mom1.jpg" is available in "jpg"
 format from:
http://arxiv.org/ps/0704.0518v1
http://arxiv.org/ps/0704.0518v1
This figure "ch3oh2.jpg" is available in "jpg"
 format from:
http://arxiv.org/ps/0704.0518v1
http://arxiv.org/ps/0704.0518v1
This figure "ch3oh3.jpg" is available in "jpg"
 format from:
http://arxiv.org/ps/0704.0518v1
http://arxiv.org/ps/0704.0518v1
	Introduction
	Observations
	Results
	Submillimeter continuum emission
	Spectral line emission
	Molecular outflow emission
	Discussion
	The formation of a proto-Trapezium system?
	Various episodes of massive star formation?
	Carbon mono-sulfide C34S
	Temperature structure
	Tracing rotation toward the massive cores
	Conclusions and Summary
ABSTRACT
  Aiming at a better understand of the physical and chemical processes in the
hot molecular core stage of high-mass star formation, we observed the
prototypical hot core G29.96-0.02 in the 862mu band with the Submillimeter
Array (SMA) at sub-arcsecond spatial resolution. The observations resolved the
hot molecular core into six submm continuum sources with the finest spatial
resolution of 0.36''x0.25'' (~1800AU) achieved so far. Four of them located
within 7800(AU)^2 comprise a proto-Trapezium system with estimated protostellar
densities of 1.4x0^5 protostars/pc^3. The plethora of ~80 spectral lines allows
us to study the molecular outflow(s), the core kinematics, the temperature
structure of the region as well as chemical effects. The derived hot core
temperatures are of the order 300K. We find interesting chemical spatial
differentiations, e.g., C34S is deficient toward the hot core and is enhanced
at the UCHII/hot core interface, which may be explained by temperature
sensitive desorption from grains and following gas phase chemistry. The
SiO(8-7) emission outlines likely two molecular outflows emanating from this
hot core region. Emission from most other molecules peaks centrally on the hot
core and is not dominated by any individual submm peak. Potential reasons for
that are discussed. A few spectral lines that are associated with the main
submm continuum source, show a velocity gradient perpendicular to the
large-scale outflow. Since this velocity structure comprises three of the
central protostellar sources, this is not a Keplerian disk. While the data are
consistent with a gas core that may rotate and/or collapse, we cannot exclude
the outflow(s) and/or nearby expanding UCHII region as possible alternative
causes of this velocity pattern.

<|endoftext|><|startoftext|>
Introduction 
The Hamiltonian formulation of non-conservative systems has been developed 
by Riewe[1,2].He used the fractional derivative [3,4,5] to construct the Lagrangian and 
Hamiltonian for non-conservative systems. As a sequel to Riewe's work, Rabei et al. 
[6] used Laplace transforms of fractional integrals and fractional derivatives to develop 
a general formula for the potential of any arbitrary forces, conservative or non-
conservative. This led directly to the consideration of the dissipative effects in 
Lagrangian and Hamiltonian formulations. Besides, the canonical quantization of non-
conservative systems carried out by Rabei et al. [7]. 
Other investigations and further developments are given by Agrawal [8] .He 
presented the fractional variational problems and the resulting equations are found to 
be similar to those for variation problems containing integral order derivatives. This 
approach is extended to classical fields with fractional derivatives [9]. Besides, Kilmek 
[10] showed that the fractional Hamiltonian is usually not a constant of motion, even in 
the case when the Hamiltonian is not an explicit function of time. In addition, as a 
continuation of Agrawal’s work [8], Rabei et al. [11] achieved the passage from the 
Lagrangian containing fractional derivatives to the Hamiltonian. The Hamilton's 
equations of motion are obtained in a similar manner to the usual mechanics. 
In the present work, the Hamilton – Jacobi partial differential equation (HJPDE) 
is generalized to be applicable for systems containing fractional derivatives.   
The paper is organized as follows: In Sec. 2 Lagrangian and Hamiltonian 
formalisms with fractional derivatives are reviewed briefly. In Sec.3, Hamilton-Jacobi 
Partial differential equations with fractional derivatives is constructed, and two 
illustrative examples are given in Sec. 4. 
2- Hamiltonian Formalism with Fractional Derivative 
Several definitions of a fractional derivative have been proposed. These 
definitions include Riemann–Liouville, Grünwald–Letnikov, Weyl, Caputo, Marchaud, 
and Riesz fractional derivatives. Here; the problem is formulated in terms of the left 
and the right Riemann–Liouville fractional derivatives. 
The left Riemann–Liouville fractional derivative defined as 
∫ −−−⎟⎠
xa dfx
xfD τττ
αα )()(
)( 1                                                 (1) 
Which is denoted as the LRLFD and the right Riemann–Liouville fractional 
derivative reads as 
∫ −−−⎟⎠
bx dfx
xfD τττ
αα )()(
)( 1                                              (2) 
Which is denoted as the RRLFD. Here α is the order of the derivative such that 
nn ≤≤− α1  and Γ  represents the Euler gamma function. If α is an integer, these 
derivatives are defined in the usual sense, i.e. 
,....3,2,1,)()(,)()( =⎟
xfDxf
xfD bxxa             (3) 
The fractional operator 
xa D can be written as [13] 
D −= αα                                                                                                  (4) 
Where the number of additional differentiations n is equal to [α] +1, where [α] is 
the whole part of α. The operator αxa D is a generalization of differential and integral 
operators and can be introduced as follows: 
∫ − 0)Re()(
0)Re(1
0)Re(
                                                           (5) 
 Following to Agrawal [8], the Euler-Lgrange equations for fractional calculus 
of variations problem is obtained as 
                                                                  (6) 
Where L is the genaralized Lagrangian function of the form 
),,,( tqDqDqL btta
The generalized momenta are introduced as  
ββαα ∂
= ,                                                                          (7) 
And the Hamiltonian depending on the fractional time derivatives reads as  
LqDpqDpH btta −+=
α                                                                                   (8) 
In Ref [11], the Hamilton’s equations of motion are obtained in a similar manner 
to the usual mechanics. These equations read as,   
 ;    qD
,   ;     β
α pDpD
tabt +=∂
                           
                                                                                         
It is observed that the fractional Hamiltonian is not a constant of motion even 
though the Lagrangian does not depend on the time explicitly. 
3. Hamilton-Jacobi Partial Differential Equation with Fractional 
Derivatives 
In this section, the determination of the Hamilton-Jacobi partial differential 
equation for systems with fractional derivatives is discussed. According to Rabei et al. 
[11], the fractional Hamiltonian is written as 
( ) ),,,(,,, tqDqDqLqDpqDptppqH bttabtta βαββααβα −+=                            (9) 
Consider the canonical transformation with a generating function 
 ( )tPPqDqDF btta ,,,, 112 βαβα −−  
Then, the new Hamiltonian will take the form 
( ) ),,,(,,, tQDQDQLQDPQDPtPPQK bttabtta βαββααβα ′−+=                          (10) 
The old canonical coordinates βα ppq ,, , satisfy the fractional Hamilton’s 
principle that can be put in the form 
( ) 0
btta dtHqDpqDp
αδ                                                                  (11) 
At the same time the new canonical coordinates βα PPQ ,, , of course satisfy a similar 
principle.  
( ) 0
btta dtKQDPQDP
αδ                                                                  (12) 
The simultaneous validity of Eq. (11) and Eq. (12) does not mean of course that the 
integrands in both expressions are equal. Since the general form of the Hamilton’s 
principle has zero variation at the end points, both statements will be satisfied if the 
integrands connected by a relation of the form [12] 
KQDPQDPHqDpqDp bttabtta +−+=−+
α                              (13) 
Where the function F is given as:  
( ) QDPQDPtPPqDqDFF bttabtta 11112 ,,,, −−−− −−= ββααβαβα                              (14) 
The function F2 is called Hamilton’s principal function S for a contact transformation. 
( )tPPqDqDSF btta ,,,, 112 βαβα −−=                                                                        (15) 
Thus,  
btbttata
1111 −−−− −−−−= ββ
By using definitions of fractional calculus given in Eq. (4) then we have    
QDPQD
QDPQD
btbttata
αα −−−−= −− 11                           (16) 
Substituting the values of the 
 from Eq. (16) into the Eq. (13) we have  
KHqDpqDp bttabtta
11 −− −−+−=−+ ββααββ
α                  (17) 
Again using definitions of fractional calculus given in Eq. (4) we have the following 
form 
α 11                   (18) 
Substituting the values of the 
 from Eq. (18) into the Eq. (17) we get 
= ββαα                                                              (19) 
http://scienceworld.wolfram.com/physics/ContactTransformation.html
              
QD btta ∂
= −− 11 ,                                              (20) 
                           K
+                                                                                (21) 
We can automatically ensure that the new variables are constant in time by 
requiring that the transformed Hamiltonian K shall be identically zero, In other words, 
βα PPQ ,,  are constants. We see by putting K = 0 that this generating function must 
satisfy the partial differential equation. 
                         0=
H                                                                                  (22) 
This equation is called the Hamilton –Jacobi equation. Let us assume that 
21 , EPEP == βα  
Where 1E , 2E  are constants. Then the action function (15), can be expressed as 
                 ( )tEEqDqDSS btta ,,,, 2111 −−= βα                                                      (23) 
Further insight into the physical significance of Hamilton’s principal function S 
is furnished by an examination of the total time derivative, which can be computed 
from the formula 
           
α 11                                       (24) 
By using Eq. (19) we have 
                        HqDpqDp
btta −+=
And using Eq. (9) we have 
                                   L
Thus 
                                  ∫=
LdtS                                                                                (25) 
If we restrict our considerations to the time -independent Hamiltonians, then the 
Hamilton-Jacobi function can be written in the form 
     ( ) ( ) ( )tEEfEqDWEqDWS btta ,,,, 21212111 ++= −− βα                                          (26) 
Where W  is called Hamilton’s characteristic function and the function, f  takes the 
following form: 
                                      ( ) EttEEf −=,, 21  
 Making use of equations (19) and (20) we obtain: 
                
= ββαα                                               (27) 
            2
11 , λλ βα =
QD btta                                 (28) 
Here 1λ , 2λ  are constants.    
                                                                                         
The physical significance of W can be understood by writing its total time derivative 
                            qD
=                                                              (29) 
Comparing this expression to the results of substituting Eq. (27) into Eq. (29) we see 
that 
∫∫ −=⇒=⇒= qDdpWqdtDpWqDpdt
tatata
α            (30) 
Again one may show that 
                                  ∫ −= qDdpW bt 12 ββ                                                            (31) 
4. Illustrative Examples 
To demonstrate the application of our formalism, let us discuss the following models: 
As a first model consider the lagrangian given by Agrawal [8]  
                                                      ( )20
qDL t
The (HJPDE) for this Lagrangian is calculated as  
                                                   ( ) 0
1 2 =
Using Eq. (27) we obtain 
                                         0
1 =−⎟⎟
− EqD
Solving this equation we have 
                                                 qDEW t
−= α  
Thus 
                                       Ep 2=α  
Making use of Eq. (26) we obtain the function S as: 
                                   EtqDES t −=
Eq. (28) leads to 
                       1
1 λαα =−=
= −− tqD
QD tt  
Thus 
                                  ( )110 2 λα +=− tEqDt  
                                α
α pEqDt == 20  
This is the same result obtained by Rabei et al. [11], which is equivalent to Agrawal 
formalism [8].  
As a second model consider the Lagrangian given by Rabei et al. [11] 
                                     ( ) ( ) qDqDqDqDL tttt βαβα 102120
++=  
The Hamiltonian is calculated as 
                                            ( ) ( )22
βα ppH ==  
Thus, the Hamilton-Jacobi partial differential equation reads as: 
                                          ( ) 0
1 2 =
Making use of Eq. (26) we have 
                                          0
1 =−⎟⎟
− EqD
Thus, 
                                            qDEW t
−= α  
Again the (HJPDE) can be written as 
                                        ( ) 0
1 2 =
Then 
                                       0
2 =−⎟⎟
− EqD
Which leads to 
                                      qDEW t
−= β  
Thus the Hamilton-Jacobi action function can be written as 
                               EtqDEqDES tt −+=
Where 
                             E
= −αα  
                               E
= −ββ  
Using Eq. (28) we get 
                              1
1 λβαα =−+= −−− tqD
QD ttt  
Thus 
                              ( )11110 2 λβα +=+ −− tEqDqD tt  
                                        EqDqD tt 210 =+
Then 
                                        qDqDp tt
α 10 +=  
                                          qDqDp tt
β 10 +=  
These Leads to 
                              0))(( 1010 =++ qDqDDD tttt
This result is in exact agreement with Rabei et al. [11]. 
5- Conclusion 
In This work we have studied the Hamilton-Jacobi partial differential equation 
for systems containing fractional derivatives. A general theory to solve the Hamilton-
Jacobi partial differential equation is proposed for systems containing fractional 
derivatives under the condition that this equation is separable. The Hamilton-Jacobi 
function is determined in the same manner as for usual systems. Finding this function 
enables us to get the solutions of the equations of motion. 
In order to test our formalism, and to get a somewhat deeper understanding, we 
have examined two examples of systems with fractional derivatives. The result found 
to be in exact agreement with Lagrangian formulation given by Agrawal [8] and with 
Hamiltonian formulation given by Rabei et al. [11]. 
6- References 
[1]F. Riewe, Non Conservative Lgrangian and Hamiltonian mechanics. Physical 
Review E. 53:1890-1899, (1996). 
[2] F. Riewe, Mechanics with Fractional Derivatives. Physical Review E.55: 3581-
3592, (1997). 
[3] R. Hilfer, Applications of Fractional Calculus in Physics, World Scientific 
Publishing Company, Singapore, New Jersey, London and Hong Kong, (2000). 
[4] S. G. Samko, A. A. Kilbas, O. I. Marichev, Fractional Integrals and Derivatives: 
theory and applications, Gordon and Breach, Amsterdam, (1993). 
[5] I. Podlubny, Fractional Differential Equations, Academic Press, San Diego (1999); 
Fractional Calculus and Applied Analysis 5: 367, (2002). 
[6] Eqab M. Rabei, T. Al-halholy, A. Rousan, Potentials of Arbitrary Forces with 
Fractional Derivatives, International Journal of Modern Physics A, 19: 3083-3092, 
(2004). 
[7] Eqab M. Rabei, Abdul-Wali Ajlouni, Humman B. Ghassib, Quantization of Non-
Conservative Systems Using Fractional Calculus, WSEAS Transactions on 
Mathematics, 5: 853-863, (2006); Quantization of Brownian Motion, International 
Journal of Theoretical physics, (2006) in press. 
[8]Om P. Agrawal, Formulation of Euler–Lagrange equations for fractional variational 
problems, J. Math. Anal. Appl.272: 368-379, (2002). 
[9] Dumitru Baleanu, Sami I. Muslih, Lagrangian formulation of classical fields within 
Riemann-Liouville fractional derivatives, Phys. Scripta (in press).  
[10] M. Klimek, Lagrangian and Hamiltonian fractional sequential mechanics; Czech 
J. Phys., 52: 1247-1253, (2002); Fractional sequential mechanics – models with 
symmetric fractional derivative, Czech J. Phys. 51: 1348-1354, (2001). 
[11] Eqab M. Rabei, Khaled I. Nawafleh, Raed S. Hijjawi , Sami I. Muslih, Dumitru 
Baleanu, The Hamilton Formalism With Fractional Derivatives, J. Math. Anal. Appl. 
(in press) 
[12] H. Goldstein, Classical Mechanics, Addison-Wesley Publishing Company, 
(1980). 
[13] Igor M. Sokolov,Joseph Klafter, Alexander Blumen, Fractional Kinetics, Physics 
Today(2002) American Institute of physics,S-0031-9228-0211-030-1. 
[14] B.N.N. Achar, J.W. Hanneken, T. Enck, T.Clarke, Dynamics of the fractional 
oscillator, Physica A, 297: 361-367, (2001).
ABSTRACT
  As a continuation of Rabei et al. work [11], the Hamilton- Jacobi partial
differential equation is generalized to be applicable for systems containing
fractional derivatives. The Hamilton- Jacobi function in configuration space is
obtained in a similar manner to the usual mechanics. Two problems are
considered to demonstrate the application of the formalism. The result found to
be in exact agreement with Agrawal's formalism.

<|endoftext|><|startoftext|>
Introduction and the model
Entanglement is a physical observable measured by the von Neumann entropy
or, alternatively, by the Concurrence of the system under consideration.
The concept of entanglement gives a physical meaning to the electron cor-
relation energy in structures of interacting electrons. The electron correlation
is not directly observable, since it is defined as the difference between the ex-
act ground state energy of the many electrons Schrödinger equation and the
Hartree–Fock energy.
In this paper we discuss the Hamiltonian which describes the Hydrogen
molecule regarded as a two interacting spin 1
(qubit) model.
In [1] it was argued that the entanglement (a quantum observable) can be
used in analyzing the so–called correlation energy which is not directly observ-
able. From our point of view, the Hydrogen molecule is dealt with a bipartite
system governed by the Hamiltonian
HH2 = −
(1 + g)σ1 ⊗ σ1 −
(1− g)σ2 ⊗ σ2 − B(σ3 ⊗ σ3 + σ0 ⊗ σ3), (1)
∗Dipartimento di Fisica dell’Università del Salento and Sezione INFN di Lecce, 73100
Lecce, Italy; e–mail: tina.maiolo@le.infn.it
†Dipartimento di Fisica dell’Università del Salento and Sezione INFN di Lecce, 73100
Lecce, Italy; e–mail: luigi.martina@le.infn.it
‡Dipartimento di Fisica dell’Università del Salento and Sezione INFN di Lecce, 73100
Lecce, Italy; e–mail: giulio.soliani@le.infn.it
http://arxiv.org/abs/0704.0520v2
where σi stand for the Pauli matrices (σ0 = I). Actually, this model was con-
sidered in [1] in order to illustrate their method. However, here we will make
some interpretative changes. Indeed, from our point of view, the states of an
isolated atom are strongly reduced to a system with two energy levels related to
the intensity of the magnetic field B. Relatively to this scale, the exchange in-
teraction constant J is usually smaller than B, in order to represent the residual
interatomic interactions. From the point of view of quantum chemistry, one may
interpret the discrete spectrum as provided by the Hartree–Fock calculations,
while the interaction coupling J models the residual multielectronic effects, not
taken into account by the mean field approximation.
For simplicity we limit ourselves to the ferromagnetic phase with J > 0.
The parameter g, such that 0 ≤ g ≤ 1, describes the degree of anisotropy
corresponding for g = 0 to the completely isotropic XY spin model. Conversely,
g = 1 provides the anisotropic XY spin model, the so-called Ising model.
We notice that when the atoms are far apart, their interaction is quite weak.
This corresponds to a vanishing value of J . In this situation the state of the
system is completely factorized in the product state of the ground states of the
indipendent spins. The corresponding total energy, in unit of B, is just the sum
of the two fundamental levels, E0 = −2, which we may consider as the Hartree-
Fock approximated fundamental level in molecular structure calculations.
When J 6= 0, the fundamental energy eigenvalue is E= −
4 + g2λ2 in Re-
gion I defined by 0 < λ ≤ 2√
, otherwise E = −λ (λ means the coupling
constant) in Region II, which is the complement of I which respect to pos-
itive real axis. The corresponding (non normalized) eigenstates are |ΨI〉 =
g2λ2+4+2
, 0, 0, 1
and |ΨII〉 =
0, 1, 1, 0
, respectively. In both cases the
state is entangled.
Since we are dealing with pure states, the von Neumann entropy [2]
SvN = −Tr
ρ1log2ρ1
is chosen to be a measurement of the entanglement, where ρ1 is the 1-particle
reduced density matrix. However, for general mixed states other entanglement
estimators (for instance, the Concurrence [4]) have to be used. In the considered
case, one has
SvN,I = −
g2λ2 + 4 log
g2λ2 + 4
g2λ2 + 4 + 4
λ2 + 8
g2λ2 + 4 + 2
g2λ2 + 4
g2λ2 + 4
g2λ2 + 2
g2λ2 + 4 + 4
log(4)
SvN,II = 1. (4)
Scrutinizing Eq. (3) and Eq. (4) it emerges that the entropy is an increasing
function of the coupling constant λ in Region I, but the state is maximally
entangled in Region II independently from the anisotropy parameter g. One
sees that, as it arises graphycally, for g = 1 the entanglement is a monotonic
increasing function of the interaction coupling λ. Moreover for weak (< 1)
coupling values it is always less than the 30%. Of course, for large coupling
constants the entropy approaches 1, meaning that all levels are equiprobably
visited by the considered spin.
Limiting all further considerations to the case of weak interaction, we observe
that at the boundary point λb =
a discontinuity occurs, signaling a
crossing of the lowest eigenvalues and, in a more general context, a quantum
phase transition [5].
As it was pointed out in [6], for quantifying the entanglement we can resort
to the reduced density matrix. Furthermore, in [7], Wootters has shown that
for a pair of binary qubits one can use the concept of Concurrence C to measure
the entanglement.
The Concurrence reads
C(ρ) = max(0, ν1 − ν2 − ν3 − ν4), (5)
where the νi’s are the eigenvalues of the Hermitian matrix
where ρ̃ = (σy ⊗ σy)ρ∗(σy ⊗ σy), ρ∗ being the complex conjugate of ρ taken in
the standard basis [7].
Some interesting results on the simple model (1) of the Hydrogen molecule
can be achieved by realizing a comparative study of the von Neumann entropy
and the Concurrence.
To this aim, we compute the Concurrence CI and CII, i. e.
CI = gλ
g2λ2 + 4
, CII = 1. (6)
where I and II refer to Regions I and II, where 0 ≤ λ ≤ 2
1−g2 , and E = −λ,
respectively.
In Figure 1 a comparison between the Concurrence and the von Neumann
entropy for two spins system as a function of the coupling λ for g = 1 is pre-
sented.
Sec. 2 contains a comparison between the entanglement and the correlation
energy. In Sec. 3 the Configuration Interaction method is introduced to compare
entanglement and correlation energy. In Sec. 4 some differences between the
Configuration Interaction approach and the two spin Ising model are presented.
Finally, our main results are summarized in Sec. 5.
1 2 3 4 5
Conc.
Figure 1: Comparison between the Concurrence and the von Neumann entropy
for the two spins system as a function of the coupling constant λ for g = 1.
2 A comparison between the entanglement and
the correlation energy
Now we look for a comparison between the entanglement with the energy cor-
relation, which as we have already recalled, it is understood as the difference of
the fundamental energy level compared with respect to the corresponding value
at vanishing coupling constant λ.
For g = 1 and in unities of B it is given by
Ecorr = |E0| − 2 =
4 + λ2 − 2. (7)
We observe that the entanglement measure is always bounded, while Ecorr is
a divergent function of λ. So it does not make much sense to look for simple
relations valid on the entire λ-axes. Consequently, limiting ourselves to weak
couplings, for 0 ≤ λ ≤ 1, we minimize the mean squared deviation
∆S2α dλ, with ∆Sα = Ecorr − αSvN . (8)
Thus the minimizing parameter αmin will be given by
αmin =
EcorrSvN dλ
≈ −0.691217. (9)
A formula analogous to (9) can be obtained by using the Concurrence as
a measure of entanglement. In this case, by minimizing the mean squared
deviation we have
∆C2α′ dλ, with ∆Cα′ = Ecorr − α′ C. (10)
Now, in order to estimate the relative deviation of SvN with respect to Ecorr,
let us report |∆Sαmin |/SvN and |∆Sαmin/Ecorr| as functions of λ at the optimal
value αmin. The graphs of these functions are shown in Figure 2.
0.2 0.4 0.6 0.8 1
ÈDSmin�SvN È
0.2 0.4 0.6 0.8 1
ÈDSmin�EcorrÈ
Figure 2: The relative quadratic deviation between the von Neumann entropy
and the correlation energy with respect to the former and the latter, respectively,
at the optimal value αmin as a function of the coupling constant λ for g = 1.
In Figure 3, the relative quadratic deviation between the Concurrence and
the correlation energy with respect to the former and the latter, at the optimal
values α′min, is represented.
0.2 0.4 0.6 0.8 1
ÈDCminÈ�C
0.2 0.4 0.6 0.8 1
ÈDCmin�EcorrÈ
Figure 3: The relative quadratic deviation between the Concurrence and the
correlation energy with respect to the former and the latter, respectively, at the
optimal value α′min as a function of the coupling constant λ for g = 1.
Remark 1
From these graphs, one can argue that the agreement between the two func-
tions SvN and Ecorr is only qualitatively good, in fact, for very small λ, it is not
good at all. However, in an intermediate range of values, i. e., 0.6 ≤ λ ≤ 1 the
two functions are almost proportional within the 10%. Analogously, the same
is true between energy and Concurrence. Even, the agreement becomes worst
comparing the relative deviation of the Concurrence with respect to the corre-
lation energy, since the range in which the relative deviations become smaller
than 10% are narrower. Then, the question is whether the above results are
i) sufficient to justify the conjecture advanced in [1], i.e., entanglement can be
considered as an estimation of correlation energy; ii) if such a relation has a
more concrete physical meaning, in particular whether the minimizing parame-
ter αmin and the vanishing point of ∆Sαmin does possess any physical meaning
(or α′min and the vanishing point of ∆Cα′min). Notice that in the case of the
comparison for the Concurrence simpler analytical expressions appear. For in-
stance one finds ∆Cα′
0.383249 λ√
λ2 + 4 + 2
Remark 2
We note that in an interval of values around αmin, the deviation function
(8) possesses a minimum in the interval of interest 0 ≤ λ ≤ 1, otherwise the
minimum is achieved at larger value of λ, or the function is monotonically
increasing (see Figure 4).
0.2 0.4 0.6 0.8 1
-0.05
0.2 0.4 0.6 0.8 1
-0.05
d DSΑ � dΛ
Figure 4: The deviation ∆Sα and its derivative with respect to λ are computed
for values of −1.29(red) ≤ α ≤ −0.091(violet), for steps of 0.06. The curve
drawn thicker corresponds to αmin
This behavior suggests to consider the function ∆Sαmin as a sort of ”free en-
ergy” , where αmin mimics the ”temperature” specific of the system. If, for some
reason, we allow λ to change, then we expect that spontaneously the interaction
coupling adjusts itself to the minimum of ∆Sαmin . Similar considerations can
be made looking at the graphs drawn for the function ∆Cα′
and its derivative
with respect to λ (see Figure 5).
The function ∆Sαmin or, alternatively, the minimum of ∆Cα′
can be ob-
tained algebraically. Such a minimum is at the value of the coupling constant
λSvNmin ≈ 0.485 and λCmin ≈ 0.371, respectively.
The authors in [1] studied numerically the von Neumann entropy and the
correlation function for a Hydrogen molecule, using an old result by Herring and
Flicker [8], going back to an oldest idea by Heitler and London [9], which con-
sists in substituting the molecular binding with a position dependent exchange
coupling:
J(r) ≈ 1.641 r
2 e−2 r Ry, (11)
where r is given in Bohr radius, see Figure 6. The maximum value taken by this
function is at the point rmax = 1.25. Assuming B = 0.5 Ry, i.e. 12 of the funda-
mental level of the Hydrogen atom, the maximum value λ′max = J(rmax)/B ≈
0.2 0.4 0.6 0.8 1
0.2 0.4 0.6 0.8 1
dDCΑ'� dΛ
Figure 5: The deviation ∆Cα′ and its derivative with respect to λ are computed
for values of −0.98(red) ≤ α′ ≤ 0.22(violet), for steps of 0.06. The curve drawn
thicker corresponds to α′min
0.5 1 1.5 2 2.5 3
JHrLHRyL
Figure 6: The effective interaction Hydrogen-Hydrogen atom
0.470628 < λSvNmin , i.e. the value of the effective interaction value is less than
the minimum for the deviation function ∆Sαmin . Then, the equilibrium bal-
ance between entanglement (as von Neumann entropy) and correlation energy
predicts a length of the molecule equal to rmax (see the first panel of Figure 7).
On the other hand, if we consider the energy gap 2B = 3/4 Ry, i.e. the energy
step to the first excited state, one obtains the new value λ′′max ≈ 0.628, which
goes beyond λmin, even if it is always less than 1. Now, the deviation function
∆Sαmin has two minima as seen in the second panel of Figure 7, one of which
is at r′′− ≈ 0.76 , the other one being at r′′+ ≈ 1.91.
These results should be compared with the experimental equilibrium length
of the Hydrogen molecule, which is rexp ≈ 2.0.
We point out that although the spin–model described by the Hamiltonian
(1) is characterized by features which are essentially rough, however we are
induced to answer positively to the quest for a physical meaning of the deviation
function ∆Sαmin . Indeed, the results elucidated in Figure 7 encourage, on one
part, improvement of the computation of r in order to make more accurate the
comparison with the experimental value rexp.
0.5 1 1.5 2 2.5 3
-0.015
-0.0125
-0.01
-0.0075
-0.005
-0.0025
DS minHr; B= .5 RyL
0.5 1 1.5 2 2.5 3
-0.015
-0.0125
-0.01
-0.0075
-0.005
-0.0025
DS minHr; B= .375 RyL
r''- r''+
Figure 7: The von Neumann entropy for the 2-spin model for B = .5 Ry (left
panel) and for B = .375 Ry (right panel) and the position depending interaction
given by (11).
The first question to answer is whether this draft works also for the Concur-
rence. A statement about it is not obvious, since the von Neumann entropy is
a nonlinear function of the Concurrence in the 2-qubits case.
However, from Figure 8 one can see that the minimized deviation of the
Concurrence takes one minimum for relatively large intensity of the magnetic
field ( say B ≥ 0.6 Ry), while for weak fields two minima appear, corresponding
to the situation depicted nearby.
0 0.2 0.4 0.6 0.8 1
r HBohrL
B HRyL
0.2 0.4 0.6 0.8 1
       
         B(Ry)
r(Bohr)
Figure 8: Two contour plots of the minimized deviation of the Concurrence as a
function of the magnetic field B (Ry) and of the internuclear distance r, as given
by (11). The range of values divided by the contour lines is [−0.038, 0, 04] for
the left panel and [−0.03705, −0, 03000] for the right one that approximatively
corresponding to the black area in the left panel.
In correspondence of the same values considered above, for B = 0.5 Ry
the function ∆Cα′
(r) has two minima at r = 0.79 and r = 1.88, while for
B = 0.375 Ry they are located at r = 0.60 and r = 2.25. So one sees that the
resulting equilibrium configurations are not much very close to the experimental
one. The equilibrium configuration more closest to the experimental one is the
minimum occurring at r = 1.88 (B = 1
Ry) for the function ∆Cα′
One sees that one of the resulting equilibrium configurations is only roughly
close to the experimental one.
In other words, to conclude monitoring numerically B the equilibrium config-
uration more closest to experimental one in the minimum occurring at r = 1.88
for B = 1
Ry and at r = 2.25 for B = 0.375 Ry for the function ∆Cα′
3 A quantum chemical framework to compare
entanglement and correlation energy
In this Section we represent the results produced in [1], where the electron entan-
glement in the Hydrogen molecule, calculated by the von Neumann entropy of
the reduced density matrix ρ1, is obtained starting by the excitation coefficients
of the wave function expanded by a configuration interaction method:
ρCISD1
= −Tr
ρCISD1 log2ρ
|c2i+11 |2 +
|c2i+1,2i+21,2 |2
|c2i+11 |2 +
|c2i+1,2i+21,2 |2
|c0|2 +
|c2i+22 |2
|c0|2 +
|c2i+22 |2
, (12)
where c1 is the coefficient for a single excitation, and c1,2 is the double
excitation (in Appendix A of [10] more details are shown).
In this framework, entanglement (S) and correlation energy (Ecorr), as func-
tions of nucleus – nucleus separation are those in Figure 9
0 1 2 3 4 5
R ( Å ) 
S ( ρ1
Figure 9: Comparison between the entanglement, calculated by the von Neu-
mann entropy of the reduced density matrix, and the electron correlation energy
in the Hydrogen molecule.
By the results given by this model, we want to discuss and to suggest some
answers to the questions i) and ii) presented in Remark 1. Even if, in order
to represent correlation energy and entanglement, we use two different scales,
in Figure 9 we can see that entanglement has a small value in the united atom
limit after it is growing for small distances till it arrives at a maximum value
then it decrease till it assumes zero value at the separated atom limit and it is
exactly the progress of the correlation curve.
In order to compare the entropy S with the electron correlation energy Ecorr,
we rescale S with the parameter αmin calculated with some procedure illustrated
in Eq. (8) and Eq. (9) replacing the integration variable λ with R; in this way
we extract
EcorrSvNdR
SvNdR
≈ 0.009. (13)
The corresponding ∆Sαmin = Ecorr−αS allows us to answer to the question
ii); in fact, as it is shown in Figure 10, the vanishing point of ∆Sαmin is,
according to the two –spin Ising model, nearby R ≈ 2 Å that corresponds to
the equilibrium configuration of the Hydrogen molecule.
0 1 2 3 4 5
−0.01
Figure 10: ∆Sαmin for theH2 molecule as a function of nucleus–nucleus distance.
4 Differences between the Configuration Inter-
action approach and the two–spin Ising model
The model proposed in Sec. 1 provides us with a measurement of entanglement:
indeed, Eq. (3) describes the von Neumann entropy as a function of coupling
constant λ, for small λ. By using Eq. (7), we can express λ in terms of corre-
lation energy and substituting it in Eq. (3) we can obtain the variation of SvN
in terms of Ecorr.
SvN = −
EcorrLog
Ecorr
2(Ecorr+2)
+ (Ecorr + 4)Log
Ecorr+4
2(Ecorr+2)
(Ecorr + 2)Log4
. (14)
In order to calculated the coefficient of proportionality among SvN and Ecorr
we make an expansion of SvN for Ecorr → 0 (or equivalently for λ → 0) at the
first order, obtaining a straight line characterized by an angular coefficient given
by mSvN (Ecorr) = (
)(1 + 1
). Since this behavior is uncorrect to represent
the logatithmic singularity of SvN in the origin, we make an expansion of Eq.
(14), preserving the logarithmic deviation, and we obtain an expression of the
SvN = AEcorr +BEcorrLog(Ecorr), (15)
where A = 1/2 and B = −1/(4Log2).
0.02 0.04 0.06 0.08 0.1
0.025
0.075
0.125
Ecorr   
Linear 
AE+BELogE
Figure 11: A comparison among the behavior of Eq. (14) and its linear approx-
imation and the logarithmic one, for the Ising model.
In order to compare the behavior of SvN in Eq. (14), we have organized
the numerical data, calculated with the method proposed in [1], by making a
correspondence between each value of Ecorr and its respective value of SvN ,
obtaining the plot in Figure 12
0 0.01 0.02 0.03 0.04 0.05
Figure 12: A correspondence of Ecorr and SvN calculated by the numerical
procedure suggested by [1]
Of particular significance is the fact that, in the range where S is monotoni-
cally increasing, the correlation energy has its maximum, consequently S seems
to be not a function. Moreover, it is important to note that Ecorr begins to
decrease for R > 1 Å, region where the states become mixed, i. e. ,Trρ 6= Trρ2;
as depicted in Figure 13.
0 1 2 3 4
R(Å )
Figure 13: The increasing of the degree of mixing in the two electron state: in
black we depict the trace of ρ, in red the trace of ρ2.
Probably, for this reason, the procedure adopted in [1] seems to be not cor-
rect: the density matrix, in fact, is calculated starting by the excitation coeffi-
cient of a wave function obtained developping with the Configuration Interaction
Single Double method a pure two electrons state.
However, even if we consider only the first branch of the plot in Figure 12,
i.e. , the numerical values of SvN corresponding with increasing values of Ecorr,
and we fit the values around Ecorr → 0 with a F = AEcorr + BEcorrLog(Ecorr)
we draw out numerical values of the coefficient different from the ones used in
Eq. (15). This result is shown in Figure 14.
0.01 0.02 0.03 0.04 0.05
0.35 S
Ecorr
A=17.1
B=3.3
Figure 14: A fit of SvN as a function of Ecorr, around the origin, with a function
of the form F = AEcorr +BEcorrLog(Ecorr) whose coefficients A and B assume
the numerical values in Figure.
In particular the arithmetic sign of the coefficient B in the two models are
opposite and this implies the opposite concavity of the curve.
This fact, clearly demonstrates a not satisfactory agreement between the
Ising model and the one proposed in [1].
5 Concluding remarks
We have explored the role of entanglement in the model of two qubits describing
the Hydrogen molecule (1), considered as a bipartite system. In our discussion
we have limited to the ferromagnetic case governed by the interaction coupling
parameter J > 0.
The concept of entanglement gives a physical meaning to the electron cor-
relation energy in structures of interacting electrons. The entanglement can be
measured by using the von Neumann entropy or, alternatively, the notion of
Concurrence [7]. To compute the entanglement it is convenient to consider two
Regions, say I and II, which provide two different reduced density matrices.
The entropy turns out to be an increasing function of the coupling constant λ
in Region I, but the state under consideration is maximally entangled in Region
II indipendently from the anisotropy parameter g.
An interesting result is that for large coupling constants the entropy ap-
proach 1, meaning that all levels are equiprobably visited by the considered
spin.
For weak interactions, at the boundary point λb =
the von Neumann
entropy admits a discontinuity, indicating a crossing of the lowest eigenvalues
and, in a more general constext, a quantum phase transition [5].
In Sec. 2 a comparison between the entanglement and the correlation energy
is performed.
To quantifying the entanglement we resort to the reduced density matrix.
The entanglement can also be measured by exploiting the concept of Concur-
rence.
The entanglement measure is always bounded, while the energy correlation,
Ecorr = |E0| − 2 =
4 + λ2 − 2, is a divergent function of λ. This fact tells us
that to look for simple relations valid on the whole λ−axes has no sense.
Thus, by limiting ourselves to weak couplings, we have minimized the mean
square deviation given by Eq. (8). This procedure leads to the value αmin ≈
−0.691217 for the minimizing parameter (see Eq. (9)).
Sec. 1 contains a comparison between the von Neumann entropy and the
Concurrence.
Such a comparison is illustrated in Figure 1, for two spin system as a function
of the coupling λ for g = 1.
Some important points are commented in Remark 1 and Remark 2 .
In Figure 4 the deviation ∆Sα and its derivatives with respect to λ are
computed and αmin is evaluated for α ranging in the interval −1.29 ≤ α ≤
−0.091.
In Figure 5 the minimized Concurrence deviation ∆C
for the four eigen-
states of the 2-spin model is shown.
We point out the existence of a perfect symmetry among the Concurrence
deviations for pairs of eigenstates of opposite eigenvalues.
Formula (11), due to Heitler–London [9], is reported, where the position
dependent exchange coupling J(r) is expressed in term of the length r of the
nucleus–nucleus separation in the Hydrogen molecule.
To conclude, the magnetic field B has been monitored such that the equi-
librium configuration more closest to the experimental one, r ≈ 2.00, is the
minimum occurring at r = 1.88 for B = 1
Ry and r = 2.25 for B = 0.375 Ry
for the function ∆Cα′
We observe also that in the intermediate range of values, i. e., for 0.6 ≤ λ ≤
1, the two functions SvN and the correlation energy are almost proportional
within the 10%.
However, when we organized the pairs of points (Ecorr, SvN ) calculated by
following the procedure described by [1], it is clear that the von Neumann en-
tropy cannot be considered a function of correlation energy. The principle cause
is that the function Ecorr presents a maximum in the region where SvN is mono-
tonically increasing.
The reversing behavior of correlation energy occurs in correspondence with
an increase of the mixing degree of the two electrons state. The function Ecorr
in terms of the nucleus – nucleus distance R, increases till the state is pure, on
the contrary, when Tr(ρ2) becomes discordant from Tr(ρ), the function Ecorr
decreases.
This fact suggests us that the numerical model based on the calculation of
SvN starting by the excitation coefficients ci, isn’t completley correct because
the density matrix is obtained as a product of two electron pure states. However,
even if we consider only a branch of the plot in Figure 12, the function obtained
by the two spin Ising model, i. e., Eq. (14), is unsuitable for fitting these
numerical data.
On the basis of our results, essentially grounded on numerical considerations,
in the near feature we would explore more complicated systems of molecules,
such as for example the ethylene or other hydrocarbons, and compare these
studies with the goals obtained for the Hydrogen molecule.
Acknowledgments
The authors acknowledge the Italian Ministry of Scientific Researches (MIUR)
for partial support of the present work under the project SINTESI 2004/06 and
the INFN for partial support under the project Iniziativa Specifica LE41.
References
[1] Z. Huang, S. Kais, Chem. Phys. Lett. 413, 1 (2005).
[2] M. A. Nielsen and I. L. Chuang Quantum Computation and Quantum In-
formation, Cambridge Univ. Press, Cambridge, 2000.
[3] D. M. Collin, Z. Naturforsch A 48, 68 (1993).
[4] P. Rungta and C. M. Caves Phys. Rev. A 67, 012307 (2003).
[5] S. Sachdev Quantum Phase Transition, Cambridge University Press, 2001.
[6] O. Osenda, Z. Huang and S. Kais Phys. Rev A 67, 062321 (2003).
[7] W. K. Wootters, Phys. Rev. Lett. 80, 2245 (1998).
[8] C. Herring and M. Flicker, Phys. Rev. A 134, 362 (1964).
[9] W. Heitler, F. London, Z. Physik 44, 455 (1927)
[10] T. Maiolo, F. Della Sala, L. Martina, G. Soliani arXiv: quant–ph/ 0610238
(2006).
http://arxiv.org/abs/quant--ph/0610238
	Introduction and the model 
	A comparison between the entanglement and the correlation energy 
	A quantum chemical framework to compare entanglement and correlation energy 
	Differences between the Configuration Interaction approach and the two–spin Ising model 
	Concluding remarks
ABSTRACT
  In this paper we investigate some entanglement properties for the Hydrogen
molecule considered as a two interacting spin 1/2 (qubit) model. The
entanglement related to the $H_{2}$ molecule is evaluated both using the von
Neumann entropy and the Concurrence and it is compared with the corresponding
quantities for the two interacting spin system. Many aspects of these functions
are examinated employing in part analytical and, essentially, numerical
techniques. We have compared analogous results obtained by Huang and Kais a few
years ago. In this respect, some possible controversial situations are
presented and discussed.

<|endoftext|><|startoftext|>
Introduction
	Fractional charges on frustrated lattices
	Effective Hamiltonians
	Height representation, conserved quantities, and gauge symmetries
	Ground states and lowest excitations in the undoped case
	Static and dynamical properties of the doped system
	References
ABSTRACT
  Systems of strongly correlated fermions on certain geometrically frustrated
lattices at particular filling factors support excitations with fractional
charges $\pm e/2$. We calculate quantum mechanical ground states, low--lying
excitations and spectral functions of finite lattices by means of numerical
diagonalization. The ground state of the most thoroughfully studied case, the
criss-crossed checkerboard lattice, is degenerate and shows long--range order.
Static fractional charges are confined by a weak linear force, most probably
leading to bound states of large spatial extent. Consequently, the
quasi-particle weight is reduced, which reflects the internal dynamics of the
fractionally charged excitations. By using an additional parameter, we
fine--tune the system to a special point at which fractional charges are
manifestly deconfined--the so--called Rokhsar--Kivelson point. For a deeper
understanding of the low--energy physics of these models and for numerical
advantages, several conserved quantum numbers are identified.

<|endoftext|><|startoftext|>
BABAR-PUB-07/009
SLAC-PUB-12430
hep-ex/0704.0522
Measurement of Decay Amplitudes of B → (cc)K∗ with an Angular Analysis, for
(cc) = J/ψ , ψ(2S) and χ
B. Aubert,1 M. Bona,1 D. Boutigny,1 Y. Karyotakis,1 J. P. Lees,1 V. Poireau,1 X. Prudent,1 V. Tisserand,1
A. Zghiche,1 J. Garra Tico,2 E. Grauges,2 L. Lopez,3 A. Palano,3 G. Eigen,4 I. Ofte,4 B. Stugu,4 L. Sun,4
G. S. Abrams,5 M. Battaglia,5 D. N. Brown,5 J. Button-Shafer,5 R. N. Cahn,5 Y. Groysman,5 R. G. Jacobsen,5
J. A. Kadyk,5 L. T. Kerth,5 Yu. G. Kolomensky,5 G. Kukartsev,5 D. Lopes Pegna,5 G. Lynch,5 L. M. Mir,5
T. J. Orimoto,5 M. Pripstein,5 N. A. Roe,5 M. T. Ronan,5, ∗ K. Tackmann,5 W. A. Wenzel,5 P. del Amo Sanchez,6
C. M. Hawkes,6 A. T. Watson,6 T. Held,7 H. Koch,7 B. Lewandowski,7 M. Pelizaeus,7 T. Schroeder,7
M. Steinke,7 W. N. Cottingham,8 D. Walker,8 D. J. Asgeirsson,9 T. Cuhadar-Donszelmann,9 B. G. Fulsom,9
C. Hearty,9 N. S. Knecht,9 T. S. Mattison,9 J. A. McKenna,9 A. Khan,10 M. Saleem,10 L. Teodorescu,10
V. E. Blinov,11 A. D. Bukin,11 V. P. Druzhinin,11 V. B. Golubev,11 A. P. Onuchin,11 S. I. Serednyakov,11
Yu. I. Skovpen,11 E. P. Solodov,11 K. Yu Todyshev,11 M. Bondioli,12 S. Curry,12 I. Eschrich,12 D. Kirkby,12
A. J. Lankford,12 P. Lund,12 M. Mandelkern,12 E. C. Martin,12 D. P. Stoker,12 S. Abachi,13 C. Buchanan,13
S. D. Foulkes,14 J. W. Gary,14 F. Liu,14 O. Long,14 B. C. Shen,14 L. Zhang,14 H. P. Paar,15 S. Rahatlou,15
V. Sharma,15 J. W. Berryhill,16 C. Campagnari,16 A. Cunha,16 B. Dahmes,16 T. M. Hong,16 D. Kovalskyi,16
J. D. Richman,16 T. W. Beck,17 A. M. Eisner,17 C. J. Flacco,17 C. A. Heusch,17 J. Kroseberg,17 W. S. Lockman,17
T. Schalk,17 B. A. Schumm,17 A. Seiden,17 D. C. Williams,17 M. G. Wilson,17 L. O. Winstrom,17 E. Chen,18
C. H. Cheng,18 A. Dvoretskii,18 F. Fang,18 D. G. Hitlin,18 I. Narsky,18 T. Piatenko,18 F. C. Porter,18
G. Mancinelli,19 B. T. Meadows,19 K. Mishra,19 M. D. Sokoloff,19 F. Blanc,20 P. C. Bloom,20 S. Chen,20
W. T. Ford,20 J. F. Hirschauer,20 A. Kreisel,20 M. Nagel,20 U. Nauenberg,20 A. Olivas,20 J. G. Smith,20
K. A. Ulmer,20 S. R. Wagner,20 J. Zhang,20 A. M. Gabareen,21 A. Soffer,21 W. H. Toki,21 R. J. Wilson,21
F. Winklmeier,21 Q. Zeng,21 D. D. Altenburg,22 E. Feltresi,22 A. Hauke,22 H. Jasper,22 J. Merkel,22 A. Petzold,22
B. Spaan,22 K. Wacker,22 T. Brandt,23 V. Klose,23 H. M. Lacker,23 W. F. Mader,23 R. Nogowski,23 J. Schubert,23
K. R. Schubert,23 R. Schwierz,23 J. E. Sundermann,23 A. Volk,23 D. Bernard,24 G. R. Bonneaud,24 E. Latour,24
V. Lombardo,24 Ch. Thiebaux,24 M. Verderi,24 P. J. Clark,25 W. Gradl,25 F. Muheim,25 S. Playfer,25
A. I. Robertson,25 Y. Xie,25 M. Andreotti,26 D. Bettoni,26 C. Bozzi,26 R. Calabrese,26 A. Cecchi,26 G. Cibinetto,26
P. Franchini,26 E. Luppi,26 M. Negrini,26 A. Petrella,26 L. Piemontese,26 E. Prencipe,26 V. Santoro,26 F. Anulli,27
R. Baldini-Ferroli,27 A. Calcaterra,27 R. de Sangro,27 G. Finocchiaro,27 S. Pacetti,27 P. Patteri,27 I. M. Peruzzi,27, †
M. Piccolo,27 M. Rama,27 A. Zallo,27 A. Buzzo,28 R. Contri,28 M. Lo Vetere,28 M. M. Macri,28 M. R. Monge,28
S. Passaggio,28 C. Patrignani,28 E. Robutti,28 A. Santroni,28 S. Tosi,28 K. S. Chaisanguanthum,29 M. Morii,29
J. Wu,29 R. S. Dubitzky,30 J. Marks,30 S. Schenk,30 U. Uwer,30 D. J. Bard,31 P. D. Dauncey,31 R. L. Flack,31
J. A. Nash,31 M. B. Nikolich,31 W. Panduro Vazquez,31 P. K. Behera,32 X. Chai,32 M. J. Charles,32 U. Mallik,32
N. T. Meyer,32 V. Ziegler,32 J. Cochran,33 H. B. Crawley,33 L. Dong,33 V. Eyges,33 W. T. Meyer,33 S. Prell,33
E. I. Rosenberg,33 A. E. Rubin,33 A. V. Gritsan,34 Z. J. Guo,34 C. K. Lae,34 A. G. Denig,35 M. Fritsch,35
G. Schott,35 N. Arnaud,36 J. Béquilleux,36 M. Davier,36 G. Grosdidier,36 A. Höcker,36 V. Lepeltier,36
F. Le Diberder,36 A. M. Lutz,36 S. Pruvot,36 S. Rodier,36 P. Roudeau,36 M. H. Schune,36 J. Serrano,36 V. Sordini,36
A. Stocchi,36 W. F. Wang,36 G. Wormser,36 D. J. Lange,37 D. M. Wright,37 C. A. Chavez,38 I. J. Forster,38
J. R. Fry,38 E. Gabathuler,38 R. Gamet,38 D. E. Hutchcroft,38 D. J. Payne,38 K. C. Schofield,38 C. Touramanis,38
A. J. Bevan,39 K. A. George,39 F. Di Lodovico,39 W. Menges,39 R. Sacco,39 G. Cowan,40 H. U. Flaecher,40
D. A. Hopkins,40 P. S. Jackson,40 T. R. McMahon,40 F. Salvatore,40 A. C. Wren,40 D. N. Brown,41 C. L. Davis,41
J. Allison,42 N. R. Barlow,42 R. J. Barlow,42 Y. M. Chia,42 C. L. Edgar,42 G. D. Lafferty,42 T. J. West,42
J. I. Yi,42 J. Anderson,43 C. Chen,43 A. Jawahery,43 D. A. Roberts,43 G. Simi,43 J. M. Tuggle,43 G. Blaylock,44
C. Dallapiccola,44 S. S. Hertzbach,44 X. Li,44 T. B. Moore,44 E. Salvati,44 S. Saremi,44 R. Cowan,45 P. H. Fisher,45
G. Sciolla,45 S. J. Sekula,45 M. Spitznagel,45 F. Taylor,45 R. K. Yamamoto,45 S. E. Mclachlin,46 P. M. Patel,46
S. H. Robertson,46 A. Lazzaro,47 F. Palombo,47 J. M. Bauer,48 L. Cremaldi,48 V. Eschenburg,48 R. Godang,48
R. Kroeger,48 D. A. Sanders,48 D. J. Summers,48 H. W. Zhao,48 S. Brunet,49 D. Côté,49 M. Simard,49 P. Taras,49
http://arxiv.org/abs/0704.0522v2
F. B. Viaud,49 H. Nicholson,50 G. De Nardo,51 F. Fabozzi,51, ‡ L. Lista,51 D. Monorchio,51 C. Sciacca,51
M. A. Baak,52 G. Raven,52 H. L. Snoek,52 C. P. Jessop,53 J. M. LoSecco,53 G. Benelli,54 L. A. Corwin,54
K. K. Gan,54 K. Honscheid,54 D. Hufnagel,54 H. Kagan,54 R. Kass,54 J. P. Morris,54 A. M. Rahimi,54
J. J. Regensburger,54 R. Ter-Antonyan,54 Q. K. Wong,54 N. L. Blount,55 J. Brau,55 R. Frey,55 O. Igonkina,55
J. A. Kolb,55 M. Lu,55 R. Rahmat,55 N. B. Sinev,55 D. Strom,55 J. Strube,55 E. Torrence,55 N. Gagliardi,56 A. Gaz,56
M. Margoni,56 M. Morandin,56 A. Pompili,56 M. Posocco,56 M. Rotondo,56 F. Simonetto,56 R. Stroili,56 C. Voci,56
E. Ben-Haim,57 H. Briand,57 J. Chauveau,57 P. David,57 L. Del Buono,57 Ch. de la Vaissière,57 O. Hamon,57
B. L. Hartfiel,57 Ph. Leruste,57 J. Malclès,57 J. Ocariz,57 A. Perez,57 L. Gladney,58 M. Biasini,59 R. Covarelli,59
E. Manoni,59 C. Angelini,60 G. Batignani,60 S. Bettarini,60 G. Calderini,60 M. Carpinelli,60 R. Cenci,60
A. Cervelli,60 F. Forti,60 M. A. Giorgi,60 A. Lusiani,60 G. Marchiori,60 M. A. Mazur,60 M. Morganti,60 N. Neri,60
E. Paoloni,60 G. Rizzo,60 J. J. Walsh,60 M. Haire,61 J. Biesiada,62 P. Elmer,62 Y. P. Lau,62 C. Lu,62 J. Olsen,62
A. J. S. Smith,62 A. V. Telnov,62 E. Baracchini,63 F. Bellini,63 G. Cavoto,63 A. D’Orazio,63 D. del Re,63 E. Di
Marco,63 R. Faccini,63 F. Ferrarotto,63 F. Ferroni,63 M. Gaspero,63 P. D. Jackson,63 L. Li Gioi,63 M. A. Mazzoni,63
S. Morganti,63 G. Piredda,63 F. Polci,63 F. Renga,63 C. Voena,63 M. Ebert,64 H. Schröder,64 R. Waldi,64 T. Adye,65
G. Castelli,65 B. Franek,65 E. O. Olaiya,65 S. Ricciardi,65 W. Roethel,65 F. F. Wilson,65 R. Aleksan,66 S. Emery,66
M. Escalier,66 A. Gaidot,66 S. F. Ganzhur,66 G. Hamel de Monchenault,66 W. Kozanecki,66 M. Legendre,66
G. Vasseur,66 Ch. Yèche,66 M. Zito,66 X. R. Chen,67 H. Liu,67 W. Park,67 M. V. Purohit,67 J. R. Wilson,67
M. T. Allen,68 D. Aston,68 R. Bartoldus,68 P. Bechtle,68 N. Berger,68 R. Claus,68 J. P. Coleman,68 M. R. Convery,68
J. C. Dingfelder,68 J. Dorfan,68 G. P. Dubois-Felsmann,68 D. Dujmic,68 W. Dunwoodie,68 R. C. Field,68
T. Glanzman,68 S. J. Gowdy,68 M. T. Graham,68 P. Grenier,68 C. Hast,68 T. Hryn’ova,68 W. R. Innes,68
M. H. Kelsey,68 H. Kim,68 P. Kim,68 D. W. G. S. Leith,68 S. Li,68 S. Luitz,68 V. Luth,68 H. L. Lynch,68
D. B. MacFarlane,68 H. Marsiske,68 R. Messner,68 D. R. Muller,68 C. P. O’Grady,68 A. Perazzo,68 M. Perl,68
T. Pulliam,68 B. N. Ratcliff,68 A. Roodman,68 A. A. Salnikov,68 R. H. Schindler,68 J. Schwiening,68 A. Snyder,68
J. Stelzer,68 D. Su,68 M. K. Sullivan,68 K. Suzuki,68 S. K. Swain,68 J. M. Thompson,68 J. Va’vra,68 N. van Bakel,68
A. P. Wagner,68 M. Weaver,68 W. J. Wisniewski,68 M. Wittgen,68 D. H. Wright,68 A. K. Yarritu,68 K. Yi,68
C. C. Young,68 P. R. Burchat,69 A. J. Edwards,69 S. A. Majewski,69 B. A. Petersen,69 L. Wilden,69 S. Ahmed,70
M. S. Alam,70 R. Bula,70 J. A. Ernst,70 V. Jain,70 B. Pan,70 M. A. Saeed,70 F. R. Wappler,70 S. B. Zain,70
W. Bugg,71 M. Krishnamurthy,71 S. M. Spanier,71 R. Eckmann,72 J. L. Ritchie,72 A. M. Ruland,72 C. J. Schilling,72
R. F. Schwitters,72 J. M. Izen,73 X. C. Lou,73 S. Ye,73 F. Bianchi,74 F. Gallo,74 D. Gamba,74 M. Pelliccioni,74
M. Bomben,75 L. Bosisio,75 C. Cartaro,75 F. Cossutti,75 G. Della Ricca,75 L. Lanceri,75 L. Vitale,75 V. Azzolini,76
N. Lopez-March,76 F. Martinez-Vidal,76 D. A. Milanes,76 A. Oyanguren,76 J. Albert,77 Sw. Banerjee,77 B. Bhuyan,77
K. Hamano,77 R. Kowalewski,77 I. M. Nugent,77 J. M. Roney,77 R. J. Sobie,77 J. J. Back,78 P. F. Harrison,78
T. E. Latham,78 G. B. Mohanty,78 M. Pappagallo,78, § H. R. Band,79 X. Chen,79 S. Dasu,79 K. T. Flood,79
J. J. Hollar,79 P. E. Kutter,79 Y. Pan,79 M. Pierini,79 R. Prepost,79 S. L. Wu,79 Z. Yu,79 and H. Neal80
(The BABAR Collaboration)
1Laboratoire de Physique des Particules, IN2P3/CNRS et Université de Savoie, F-74941 Annecy-Le-Vieux, France
2Universitat de Barcelona, Facultat de Fisica, Departament ECM, E-08028 Barcelona, Spain
3Università di Bari, Dipartimento di Fisica and INFN, I-70126 Bari, Italy
4University of Bergen, Institute of Physics, N-5007 Bergen, Norway
5Lawrence Berkeley National Laboratory and University of California, Berkeley, California 94720, USA
6University of Birmingham, Birmingham, B15 2TT, United Kingdom
7Ruhr Universität Bochum, Institut für Experimentalphysik 1, D-44780 Bochum, Germany
8University of Bristol, Bristol BS8 1TL, United Kingdom
9University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z1
10Brunel University, Uxbridge, Middlesex UB8 3PH, United Kingdom
11Budker Institute of Nuclear Physics, Novosibirsk 630090, Russia
12University of California at Irvine, Irvine, California 92697, USA
13University of California at Los Angeles, Los Angeles, California 90024, USA
14University of California at Riverside, Riverside, California 92521, USA
15University of California at San Diego, La Jolla, California 92093, USA
16University of California at Santa Barbara, Santa Barbara, California 93106, USA
17University of California at Santa Cruz, Institute for Particle Physics, Santa Cruz, California 95064, USA
18California Institute of Technology, Pasadena, California 91125, USA
19University of Cincinnati, Cincinnati, Ohio 45221, USA
20University of Colorado, Boulder, Colorado 80309, USA
21Colorado State University, Fort Collins, Colorado 80523, USA
22Universität Dortmund, Institut für Physik, D-44221 Dortmund, Germany
23Technische Universität Dresden, Institut für Kern- und Teilchenphysik, D-01062 Dresden, Germany
24Laboratoire Leprince-Ringuet, CNRS/IN2P3, Ecole Polytechnique, F-91128 Palaiseau, France
25University of Edinburgh, Edinburgh EH9 3JZ, United Kingdom
26Università di Ferrara, Dipartimento di Fisica and INFN, I-44100 Ferrara, Italy
27Laboratori Nazionali di Frascati dell’INFN, I-00044 Frascati, Italy
28Università di Genova, Dipartimento di Fisica and INFN, I-16146 Genova, Italy
29Harvard University, Cambridge, Massachusetts 02138, USA
30Universität Heidelberg, Physikalisches Institut, Philosophenweg 12, D-69120 Heidelberg, Germany
31Imperial College London, London, SW7 2AZ, United Kingdom
32University of Iowa, Iowa City, Iowa 52242, USA
33Iowa State University, Ames, Iowa 50011-3160, USA
34Johns Hopkins University, Baltimore, Maryland 21218, USA
35Universität Karlsruhe, Institut für Experimentelle Kernphysik, D-76021 Karlsruhe, Germany
36Laboratoire de l’Accélérateur Linéaire, IN2P3/CNRS et Université Paris-Sud 11,
Centre Scientifique d’Orsay, B. P. 34, F-91898 ORSAY Cedex, France
37Lawrence Livermore National Laboratory, Livermore, California 94550, USA
38University of Liverpool, Liverpool L69 7ZE, United Kingdom
39Queen Mary, University of London, E1 4NS, United Kingdom
40University of London, Royal Holloway and Bedford New College, Egham, Surrey TW20 0EX, United Kingdom
41University of Louisville, Louisville, Kentucky 40292, USA
42University of Manchester, Manchester M13 9PL, United Kingdom
43University of Maryland, College Park, Maryland 20742, USA
44University of Massachusetts, Amherst, Massachusetts 01003, USA
45Massachusetts Institute of Technology, Laboratory for Nuclear Science, Cambridge, Massachusetts 02139, USA
46McGill University, Montréal, Québec, Canada H3A 2T8
47Università di Milano, Dipartimento di Fisica and INFN, I-20133 Milano, Italy
48University of Mississippi, University, Mississippi 38677, USA
49Université de Montréal, Physique des Particules, Montréal, Québec, Canada H3C 3J7
50Mount Holyoke College, South Hadley, Massachusetts 01075, USA
51Università di Napoli Federico II, Dipartimento di Scienze Fisiche and INFN, I-80126, Napoli, Italy
52NIKHEF, National Institute for Nuclear Physics and High Energy Physics, NL-1009 DB Amsterdam, The Netherlands
53University of Notre Dame, Notre Dame, Indiana 46556, USA
54Ohio State University, Columbus, Ohio 43210, USA
55University of Oregon, Eugene, Oregon 97403, USA
56Università di Padova, Dipartimento di Fisica and INFN, I-35131 Padova, Italy
57Laboratoire de Physique Nucléaire et de Hautes Energies,
IN2P3/CNRS, Université Pierre et Marie Curie-Paris6,
Université Denis Diderot-Paris7, F-75252 Paris, France
58University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
59Università di Perugia, Dipartimento di Fisica and INFN, I-06100 Perugia, Italy
60Università di Pisa, Dipartimento di Fisica, Scuola Normale Superiore and INFN, I-56127 Pisa, Italy
61Prairie View A&M University, Prairie View, Texas 77446, USA
62Princeton University, Princeton, New Jersey 08544, USA
63Università di Roma La Sapienza, Dipartimento di Fisica and INFN, I-00185 Roma, Italy
64Universität Rostock, D-18051 Rostock, Germany
65Rutherford Appleton Laboratory, Chilton, Didcot, Oxon, OX11 0QX, United Kingdom
66DSM/Dapnia, CEA/Saclay, F-91191 Gif-sur-Yvette, France
67University of South Carolina, Columbia, South Carolina 29208, USA
68Stanford Linear Accelerator Center, Stanford, California 94309, USA
69Stanford University, Stanford, California 94305-4060, USA
70State University of New York, Albany, New York 12222, USA
71University of Tennessee, Knoxville, Tennessee 37996, USA
72University of Texas at Austin, Austin, Texas 78712, USA
73University of Texas at Dallas, Richardson, Texas 75083, USA
74Università di Torino, Dipartimento di Fisica Sperimentale and INFN, I-10125 Torino, Italy
75Università di Trieste, Dipartimento di Fisica and INFN, I-34127 Trieste, Italy
76IFIC, Universitat de Valencia-CSIC, E-46071 Valencia, Spain
77University of Victoria, Victoria, British Columbia, Canada V8W 3P6
78Department of Physics, University of Warwick, Coventry CV4 7AL, United Kingdom
79University of Wisconsin, Madison, Wisconsin 53706, USA
80Yale University, New Haven, Connecticut 06511, USA
(Dated: November 4, 2018)
We perform the first three-dimensional measurement of the amplitudes of B → ψ(2S)K∗ and
B → χc1K
∗ decays and update our previous measurement for B → J/ψK∗. We use a data sample
collected with the BABAR detector at the PEP-II storage ring, corresponding to 232 million BB
pairs. The longitudinal polarization of decays involving a JPC = 1++ χc1 meson is found to be
larger than that with a 1−− J/ψ or ψ(2S) meson. No direct CP -violating charge asymmetry is
observed.
PACS numbers: 13.25.Hw, 12.15.Hh, 11.30.Er
In the context of measuring the parameters of the
Unitarity Triangle of the CKM matrix, B0 decays to
charmonium-containing final states (J/ψ , ψ(2S), χc1)K
defined collectively here as B0 → (cc̄)K∗, are of in-
terest for the precise measurement of sin 2β, where
β ≡ arg[−VcdV ∗cb/VtdV ∗tb], in a similar way as for B0 →
J/ψK0. Furthermore, the J/ψK∗ channel allows the
measurement of cos 2β [1].
For the modes considered in this paper, the final state
consists of two spin-1 mesons, leading to three possible
values of the total angular momentum with different CP
eigenvalues (L = 1 is odd, while L = 0, 2 are even). The
different contributions must be taken into account in the
measurement of sin 2β. The amplitude for longitudinal
polarization of the two spin-1 mesons is A0. There are
two amplitudes for polarizations of the mesons transverse
to the decay axis, here expressed in the transversity basis
[2]: A‖ for parallel polarization and A⊥ for their perpen-
dicular polarization. Only the relative amplitudes are
measured, so that |A0|2 + |A‖|2 + |A⊥|2 = 1. Previous
measurements by the CLEO [3], CDF [4], BABAR [1] and
Belle [5] collaborations for the B → J/ψK∗ channels are
all compatible with each other, and with a CP -odd in-
tensity fraction |A⊥|2 close to 0.2.
Factorization predicts that the phases of the transver-
sity decay amplitudes are the same. BABAR has observed
[1, 6] a significant departure from this prediction.
Precise measurements of the branching fractions of
B → (cc̄)K∗ decays are now available [7] to test the
theoretical description of the non-factorizable contribu-
tions [8], but polarization measurements are also needed.
In particular, measurements for ψ(2S) and χc1, com-
pared to that of J/ψ , would discriminate the mass de-
pendence from the quantum number dependence. CLEO
has measured the longitudinal polarization of B → ψ(2S)
K∗ decays to be |A0|2 = 0.45 ± 0.11 ± 0.04 [9]. Belle
has studied B → χc1 K∗ decays and obtained |A0|2 =
0.87± 0.09± 0.07 [10].
B → (cc̄)K(∗) decays provide a clean environment
for the measurement of the CKM angle β because one
tree amplitude dominates the decay. Very small direct
CP -violating charge asymmetries are expected in these
decays, and no such signal has been found [7]. While
more than one amplitude with different strong and weak
phases are needed to create a charge asymmetry in a sim-
ple branching fraction measurement, London et al. have
suggested [11] that an angular analysis of vector-vector
decays can detect charge asymmetries even in the case of
vanishing strong phase difference. Belle has looked for,
and not found, such a signal [5].
In this paper we present the amplitude measurement of
charged and neutral B → (cc)K∗ using a selection simi-
lar to that of Ref. [7], and a fitting method similar to that
of Ref. [1]. We use the notation ψ for the 1−− states J/ψ
and ψ(2S). ψ (χc1) candidates are reconstructed in their
decays to ℓ+ℓ− (J/ψγ), where ℓ represents an electron or
a muon. Decays to the flavor eigenstates K∗0 → K±π∓,
K∗± → K0
π± and K∗± → K±π0 are used. The relative
strong phases are known to have a two-fold ambiguity
when measured in an angular analysis alone. In con-
trast to earlier publications [3, 4, 6] we use here the set
of phases predicted in Ref. [12], with arguments based
on the conservation of the s-quark helicity in the decay
of the b quark. We have confirmed experimentally this
prediction through the study of the variation with Kπ in-
variant mass of the phase difference between theK∗(892)
amplitude and a non-resonant Kπ S-wave amplitude [1].
The data were collected with the BABAR detector at the
PEP-II asymmetric e+e− storage ring, and correspond to
an integrated luminosity of about 209 fb−1 at the center-
of-mass energy near the Υ (4S) mass. The BABAR detec-
tor is described in detail elsewhere [13]. Charged-particle
tracking is provided by a five-layer silicon vertex tracker
(SVT) and a 40-layer drift chamber (DCH). For charged-
particle identification (PID), ionization energy loss in the
DCH and SVT, and Cherenkov radiation detected in a
ring-imaging device (DIRC) are used. Photons are iden-
tified by the electromagnetic calorimeter (EMC), which
comprises 6580 thallium-doped CsI crystals. These sys-
tems are mounted inside a 1.5-T solenoidal superconduct-
ing magnet. Muons are identified in the instrumented
flux return (IFR), composed of resistive plate chambers
and layers of iron that return the magnetic flux of the
solenoid. We use the GEANT4 [14] software to simulate
interactions of particles traversing the detector, taking
into account the varying accelerator and detector condi-
tions.
J/ψ → e+e− (µ+µ−) candidates must have a mass
between 2.95 − 3.14 (3.06 − 3.14) GeV/c2. ψ(2S) can-
didates are required to have invariant masses 3.44 <
me+e− < 3.74 GeV/c
2 or 3.64 < mµ+µ− < 3.74 GeV/c
Electron candidates are combined with photon candi-
dates in order to recover some of the energy lost through
Bremsstrahlung. J/ψ candidates and γ candidates with
an energy larger than 150MeV, are combined to form χc1
candidates, which must satisfy 350 < mℓ+ℓ−γ −mℓ+ℓ− <
450 MeV/c2. π0 → γγ candidates must satisfy 113 <
mγγ < 153 MeV/c
2. The energy of each photon has to
be greater than 50MeV. K0
→ π+π− candidates are
required to satisfy 489 < mπ+π− < 507 MeV/c
2. In ad-
dition, the K0
flight distance from the ψ vertex must be
larger than three times its uncertainty. K∗0 and K∗+
candidates are required to satisfy 796 < mKπ < 996
MeV/c2 and 792 < mKπ < 992 MeV/c
2, respectively. In
addition, due to the presence of a large background of
low-energy non-genuine π0’s, the cosine of the angle θK∗
between the K momentum and the B momentum in the
K∗ rest frame has to be less than 0.8 for K∗ → K±π0.
In events where two B’s reconstruct to modes with the
same cc̄ and K candidate, one with a π± and the other
with a π0, the B candidate with a π0 is discarded due to
the high background induced by fake π0’s.
B candidates, reconstructed by combining cc̄ and K∗
candidates, are characterized by two kinematic variables:
the difference between the reconstructed energy of the
B candidate and the beam energy in the center-of-mass
frame ∆E = E∗B −
s/2, and the beam-energy substi-
tuted mass mES ≡
(s/2 + p0 · pB)2/E20 − p2B, where
subscript 0 and B correspond to Υ (4S) and the B can-
didate in the laboratory frame. For a correctly recon-
structed B meson, ∆E is expected to peak near zero and
mES near the B-meson mass 5.279GeV/c
2. The analysis
is performed in a region of the mES vs ∆E plane defined
by 5.2 < mES < 5.3 GeV/c
2 and −120 < ∆E < 120
MeV. The signal region is defined asmES > 5.27 GeV/c
and |∆E| smaller than 40 (30) MeV for channels with
(without) a π0. For events that have multiple candi-
dates, the candidate having the smallest |∆E| is chosen.
mES distributions are available in Ref. [18].
The B decay amplitudes are measured from the dif-
ferential decay distribution, expressed in the transversity
basis [1, 6], Fig. 1, with conventions detailed in Ref. [15].
θK∗ is the helicity angle of the K
∗ decay. It is defined in
FIG. 1: Definition of the transversity angles. Details are given
in the text.
the rest frame of the K∗ meson, and is the angle between
the kaon and the opposite direction of the B meson in
this frame. θtr and φtr are defined in the ψ (χc1) rest
frame and are the polar and azimutal angle of the posi-
tive lepton (J/ψ daughter of χc1) , with respect the axis
defined by:
• xtr: opposite direction of the B meson;
• ytr: perpendicular to xtr, in the (xtr,pK∗) plane,
with a direction such that pK∗ · ytr > 0;
• ztr: to complete the frame, ie: ztr = xtr × ytr.
In terms of the transversity angular variables ω ≡
(cos θK∗ , cos θtr, φtr), the time-integrated differential de-
cay rate for the decay of the B meson is
g(ω;A) ≡ 1
d cos θK∗d cos θtrdφtr
Akfk(ω), (1)
where the amplitude coefficientsAi and the angular func-
tions fk(ω), k = 1 · · · 6 are listed in Table I. The ψ
decays to two spin-1/2 particles, while the χc1 decays
to two vector particles. The angular dependencies are
therefore different [15]. The symbol A ≡ (A0, A‖, A⊥)
denotes the transversity amplitudes for the decay of the
B meson, and A for the B meson decay. In the absence
of direct CP violation, we can choose a phase conven-
tion in which these amplitudes are related by A0 = +A0,
A‖ = +A‖, A⊥ = −A⊥, so that A⊥ is CP -odd and A0
and A‖ are CP -even. The phases δj of the amplitudes,
where j = 0, ‖,⊥, are defined by Aj = |Aj |eiδj . Phases
are defined relative to δ0 = 0.
We perform an unbinned likelihood fit of the three-
dimensional angle probability density function (PDF).
The acceptance of the detector and the efficiency of
the event reconstruction may vary as a function of the
transversity angles, in particular as the angle θK∗ is
strongly correlated with the momentum of the final kaon
and pion. We use the acceptance correction method de-
velopped in Ref. [1]. The PDF of the observed events,
gobs, is :
gobs(ω;A) = g(ω;A)
〈ε〉(A)
, (2)
where
ε(ω) is the angle-dependent acceptance and
〈ε〉(A) ≡
g(ω;A)ε(ω)dω (3)
is the average acceptance. We take into account the pres-
ence of cross-feed from channels with the same cc̄ candi-
date and a differentK∗ candidate that has (due to isospin
symmetry) the same A dependence as the signal. The
observed PDF for channel b (b = K±π∓,K0
π±,K±π0)
is then
gbobs(ω;A) = g(ω;A)
εb(ω)
k=1 Ak(A)Φbk
, (4)
TABLE I: Amplitude coefficients Ak and angular functions fk(ω) that contribute to the differential decay rate. An overall
normalization factor 9/32π (for ψ) and 9/64π (for χc1) has been omitted. In the case of a B decay, the ℑm terms change sign.
i Ak fk(ω) for ψ [1, 6] fk(ω) for χc1 [15]
1 |A0|
2 2 cos2 θK∗
1− sin2 θtr cos
2 φtr
2 cos2 θK∗
1 + sin2 θtr cos
2 φtr
2 |A‖|
2 sin2 θK∗
1− sin2 θtr sin
2 φtr
sin2 θK∗
1 + sin2 θtr sin
2 φtr
3 |A⊥|
2 sin2 θK∗ sin
2 θtr sin
2 θK∗
2 cos2 θtr + sin
2 θtr
4 ℑm(A∗‖A⊥) sin
2 θK∗ sin 2θtr sinφtr − sin
2 θK∗ sin 2θtr sinφtr
5 ℜe(A‖A
sin 2θK∗ sin
2 θtr sin 2φtr
sin 2θK∗ sin
2 θtr sin 2φtr
6 ℑm(A⊥A
sin 2θK∗ sin 2θtr cos φtr −
sin 2θK∗ sin 2θtr cos φtr
where εb(ω) is the efficiency, defined as the ratio between
the reconstructed and generated yield for the process
(B → (cc̄)K∗, K∗ → b), and we do not distinguish be-
tween correctly reconstructed signal and cross-feed in the
numerator
εb(ω) ≡
a→b(ω). (5)
εa→b(ω) is the probability for an event generated in chan-
nel a and with angle ω to be detected as an event in
channel b. Fa, a = K
π0,K±π∓,K±π0,K0
π± denotes
the fraction of each channel in the total branching frac-
tion B → ccK∗,
a Fa = 1. The Φ
k are the fk(ω)
moments of the total efficiency εb, including cross-feed :
Φbk ≡
fk(ω)ε
a→b(ω)dω. (6)
Under the approximations of neglecting the angular
resolution for signal and cross-feed events, and the pos-
sible mis-measurement of the B flavor such as in events
where both daughters inK∗0 → K±π∓ are mis-identified
(K-π swap), the PDF gobs can be expressed as in Eq. (2),
and only the coefficients ΦbK are needed. The biases in-
duced by these approximations have been estimated with
Monte Carlo (MC) based studies and found to be negli-
gible.
The coefficients Φbk are computed with exclusive signal
MC samples obtained using a full simulation of the ex-
periment [14, 16]. PID efficiencies measured with data
control samples are used to adjust the MC simulation to
the observed performance of the detector. Separate co-
efficients are used for different charges of the final state
mesons, in particular to take into account the charge de-
pendence of the interaction of charged kaons with matter,
and a possible charge asymmetry of the detector. Writ-
ing the expression for the log-likelihood Lb(A) for the
PDF gbobs(ωi;A) for a pure signal sample of NS events,
the relevant contribution is
Lb(A) =
ln (g(ωi;A))−NS ln
Ak(A)Φbk
, (7)
since the remaining term
i=1 ln
εb(ωi)
does not de-
pend on the amplitudes.
We use a background correction method [1] in which
background events from a pure background sample of
NB events are added with a negative weight to the log-
likelihood that is maximized
L′b(A) ≡
nB+NS
L(ωi;A)−
L(ωj ;A), (8)
where L(ω;A) = ln(gbobs(ω;A)). The fit is performed
within the mES signal region. Background events used
here for subtraction are from generic (BB, qq) MC sam-
ples. ñB is an estimate of the unknown number nB of
background events that are present in the signal region
in the data sample.
As L′b is not a log-likelihood, the uncertainties yielded
by the minimization program Minuit [17] are biased es-
timates of the actual uncertainties. An unbiased esti-
mation of the uncertainties is described and validated in
Appendix A of Ref. [1]. With this pseudo-log-likelihood
technique, we avoid parametrizing the acceptance as well
as the background angular distributions.
The measurement is affected by several systematic un-
certainties. The branching fractions used in the cross-
feed part of the acceptance cross section are varied by
±1σ, and the largest variation is retained. The uncer-
tainty induced by the finite size of the MC sample used
to compute the coefficients Φbk is estimated by the statis-
tical uncertainty of the angular fit on that MC sample [6].
The uncertainty due to our limited understanding of the
PID efficiency is estimated by using two different meth-
ods to correct for the MC-vs-data differences. The back-
ground uncertainty is obtained by comparing MC and
data shapes of the mES distributions for the combinato-
rial component and by using the corresponding branching
errors for the peaking component. The uncertainty due
to the presence of a Kπ S wave under the K∗(892) peak
is estimated by a fit including it. The differential decay
rate is described by Eqs. (6-9) of Ref. [1].
The results are summarized in Table II. The values of
|A0|2, |A‖|2, |A⊥|2 are negatively correlated due to the
constraint |A0|2+ |A‖|2+ |A⊥|2 = 1. In particular, |A‖|2,
TABLE II: Summary of the measured amplitudes. For decays to χc1, as A⊥ is compatible with zero, its phase is not defined.
Channel |A0|
2 |A‖|
2 |A⊥|
2 δ‖ δ⊥
J/ψK∗ 0.556 ± 0.009 ± 0.010 0.211 ± 0.010 ± 0.006 0.233 ± 0.010 ± 0.005 −2.93 ± 0.08± 0.04 2.91 ± 0.05± 0.03
ψ(2S)K∗ 0.48± 0.05 ± 0.02 0.22 ± 0.06 ± 0.02 0.30± 0.06 ± 0.02 −2.8± 0.4± 0.1 2.8± 0.3± 0.1
∗ 0.77± 0.07 ± 0.04 0.20 ± 0.07 ± 0.04 0.03± 0.04 ± 0.02 0.0± 0.3± 0.1 –
Ψ(2S)
-1 0 1
-1 0 1
-1 0 1
-1 0 1
-1 0 1
-1 0 1
-1 0 1
-1 0 1
-1 0 1
cosθK*
K+π- Ksπ
+ K+π0
-1 0 1
-1 0 1
-1 0 1
-1 0 1
-1 0 1
-1 0 1
-1 0 1
-1 0 1
-1 0 1
cosθtr
K+π- Ksπ
+ K+π0
0 2.5 5
0 2.5 5
0 2.5 5
0 2.5 5
0 2.5 5
0 2.5 5
0 2.5 5
0 2.5 5
0 2.5 5
K+π- Ksπ
+ K+π0
FIG. 2: Angular distributions with PDF from fit overlaid. The asymmetry of the cos θK∗ distributions induced by the S-wave
interference is clearly visible.
TABLE III: Difference between the interference terms mea-
sured in B and B decays to J/ψ .
δA4 δA6
(K+π−) 0.002 ± 0.025 ± 0.005 −0.011 ± 0.043 ± 0.016
(K+π0) −0.017 ± 0.047 ± 0.023 −0.051 ± 0.098 ± 0.064
(K0Sπ
+) −0.008 ± 0.049 ± 0.011 0.075 ± 0.089 ± 0.009
which would be the least precisely measured parameter in
separate one-dimensional fits, is strongly anti-correlated
with |A0|2, which would be the best measured. The
one-dimensional (1D) distributions, acceptance-corrected
with an 1D Ansatz and background-subtracted, are over-
laid with the fit results and shown on Figure 2. In con-
trast with the dedicated method used in the fit, for the
plots, we simply computed the 1D efficiency maps from
the distributions of the accepted events divided by the
1D PDF. As in lower statistics studies, the cos θK∗ for-
ward backward asymmetry due to the interference with
the S wave is clearly visible.
Our measurement of the amplitudes of B decays to
J/ψ are compatible with, and of better precision than,
previous measurements. A comparison of neutral and
charged B decays (not shown) yields results consistent
with isospin symmetry. The strong phase difference δ‖ −
δ⊥ is obtained from a fit in which the phase origin is
δ⊥ ≡ 0. We confirm our previous observation that the
strong phase differences are significantly different from
zero, in contrast with what is predicted by factorization.
For B → J/ψK∗, it amounts to δ‖ − δ⊥ = 0.45± 0.05±
0.02. The presence of direct CP -violating triple-products
in the amplitude would produce a B to B difference in
the interference terms A4 and A6: δA4 and δA6. Our
results (see Table III), with improved precision relative
to Ref. [19], are consistent with no CP violation.
In summary, we have performed the first three-
dimensional analysis of the decays to ψ(2S) and χc1. The
longitudinal polarization of the decay to ψ(2S) is lower
than that to J/ψ , while the CP -odd intensity fraction is
higher (by 1.4 and 1.0 standard deviations, respectively).
This is compatible with the prediction of models of me-
son decays in the framework of factorization. The lon-
gitudinal polarization of the decay to χc1 is found to be
larger than that to J/ψ , in contrast with the predictions
of Ref. [8], which include non-factorizable contributions.
The CP -odd intensity fraction of this decay is compatible
with zero. The parallel and longitudinal amplitudes for
χc1 seem to be aligned (|δ‖ − δ0| ∼ 0) while for ψ they
are anti-aligned (|δ‖ − δ0| ∼ π).
We are grateful for the extraordinary contributions of
our PEP-II colleagues in achieving the excellent luminos-
ity and machine conditions that have made this work pos-
sible. The success of this project also relies critically on
the expertise and dedication of the computing organiza-
tions that support BABAR. The collaborating institutions
wish to thank SLAC for its support and the kind hospi-
tality extended to them. This work is supported by the
US Department of Energy and National Science Foun-
dation, the Natural Sciences and Engineering Research
Council (Canada), the Commissariat à l’Energie Atom-
ique and Institut National de Physique Nucléaire et de
Physique des Particules (France), the Bundesministerium
für Bildung und Forschung and Deutsche Forschungsge-
meinschaft (Germany), the Istituto Nazionale di Fisica
Nucleare (Italy), the Foundation for Fundamental Re-
search on Matter (The Netherlands), the Research Coun-
cil of Norway, the Ministry of Science and Technology of
the Russian Federation, Ministerio de Educación y Cien-
cia (Spain), and the Science and Technology Facilities
Council (United Kingdom). Individuals have received
support from the Marie-Curie IEF program (European
Union) and the A. P. Sloan Foundation.
∗ Deceased
† Also with Università di Perugia, Dipartimento di Fisica,
Perugia, Italy
‡ Also with Università della Basilicata, Potenza, Italy
§ Also with IPPP, Physics Department, Durham Univer-
sity, Durham DH1 3LE, United Kingdom
[1] B. Aubert et al. [BABAR Collaboration], Phys. Rev. D
71, 032005 (2005).
[2] I. Dunietz, H. R. Quinn, A. Snyder, W. Toki and
H. J. Lipkin, Phys. Rev. D 43, 2193 (1991).
[3] C. P. Jessop et al. [CLEO Collaboration], Phys. Rev.
Lett. 79, 4533 (1997).
[4] T. Affolder et al. [CDF Collaboration], Phys. Rev. Lett.
85, 4668 (2000).
[5] R. Itoh et al. [Belle Collaboration], Phys. Rev. Lett. 95,
091601 (2005).
[6] B. Aubert et al. [BABAR Collaboration], Phys. Rev.
Lett. 87, 241801 (2001).
[7] B. Aubert et al. [BABAR Collaboration], Phys. Rev.
Lett. 94, 141801 (2005).
[8] C. H. Chen and H. N. Li, Phys. Rev. D 71, 114008 (2005).
[9] S. J. Richichi et al. [CLEO Collaboration], Phys. Rev. D
63, 031103 (2001).
[10] N. Soni et al. [Belle Collaboration], Phys. Lett. B 634,
155 (2006).
[11] D. London, N. Sinha and R. Sinha, Phys. Rev. Lett. 85,
1807 (2000).
[12] M. Suzuki, Phys. Rev. D 64, 117503 (2001).
[13] B. Aubert et al. [BABAR Collaboration], Nucl. Instrum.
Meth. A 479, 1 (2002).
[14] S. Agostinelli et al. [GEANT4 Collaboration], Nucl. In-
strum. Meth. A 506, 250 (2003).
[15] Ph. D. Thesis, S. T’Jampens, BaBar THESIS-03/016,
Paris XI Univ., 18 Dec 2002.
[16] D. J. Lange, Nucl. Instrum. Meth. A 462 152 (2001).
[17] F. James and M. Roos, Comput. Phys. Commun. 10
(1975) 343.
[18] B. Aubert et al. [BABAR Collaboration],
arXiv:hep-ex/0607081.
[19] R. Itoh et al. [BELLE Collaboration], Phys. Rev. Lett.
95, 091601 (2005).
http://arxiv.org/abs/hep-ex/0607081
	References
ABSTRACT
  We perform the first three-dimensional measurement of the amplitudes of $B\to
\psi(2S) K^*$ and $B\to \chi_{c1} K^*$ decays and update our previous
measurement for $B\to J/\psi K^*$. We use a data sample collected with the
BaBar detector at the PEP2 storage ring, corresponding to 232 million $B\bar B$
pairs. The longitudinal polarization of decays involving a $J^{PC}=1^{++}$
$\chi_{c1}$ meson is found to be larger than that with a $1^{--}$ $J/\psi$ or
$\psi(2S)$ meson. No direct {\it CP}-violating charge asymmetry is observed.

<|endoftext|><|startoftext|>
Quantum superpositions and entanglement of thermal states at high temperatures
and their applications to quantum information processing
Hyunseok Jeong and Timothy C. Ralph
Centre for Quantum Computer Technology, Department of Physics,
University of Queensland, St Lucia, Qld 4072, Australia
(Dated: October 26, 2018)
We study characteristics of superpositions and entanglement of thermal states at high tempera-
tures and discuss their applications to quantum information processing. We introduce thermal-state
qubits and thermal-Bell states, which are a generalization of pure-state qubits and Bell states to
thermal mixtures. A scheme is then presented to discriminate between the four thermal-Bell states
without photon number resolving detection but with Kerr nonlinear interactions and two single-
photon detectors. This enables one to perform quantum teleportation and gate operations for
quantum computation with thermal-state qubits.
I. INTRODUCTION
In many problems considered within the framework of
quantum physics, physical systems are treated as pure
states that can be represented by state vectors, or equiva-
lently, by wave functions. Even though such an approach
is simple and useful to address certain problems, it could
often be quite different from real conditions of physical
systems. This may be particularly true when one deals
with macroscopic physical systems in terms of quantum
physics. A macroscopic object is a complex open sys-
tem which cannot avoid continuous interactions with the
environment. Such a physical system is generally in a
significantly mixed state and cannot be represented by
a state vector. In general, mixed states are subtle ob-
jects whose properties are significantly more difficult to
characterize than pure states.
Schrödinger’s famous cat paradox is a typical example
where a massive classical object was assumed to be a pure
state. It describes a counter-intuitive feature of quantum
physics which dramatically appears when the principle of
quantum superposition is applied to macroscopic objects.
In the original paradox and its various explanations, the
initial cat isolated in the steel chamber is considered a
pure state that can be represented by a state vector such
as |alive〉 (or a wave function such as ψalive). The cat
isolated from the environment is then assumed to inter-
act with a microscopic superposition state, (|g〉+|e〉)/
where |g〉 and |e〉 are the ground and excited states of a
two-level atom. The cat will be dead if the atom is found
in the excited state, |e〉, while it will remain alive if other-
wise. Thus in Schrödinger’s gedanken experiment the cat
is entangled with the atom as (|g〉|alive〉+ |e〉|dead〉)
where the alive and dead statuses of the cat are described
by the state vectors |alive〉 and |dead〉. If one mea-
sures out the atomic system on the superposed basis,
(|g〉 ± |e〉)/
2, the cat will be in a superposition of alive
and dead states such as (|alive〉± |dead〉)/
2. It is often
argued that such superposed states and entangled states
can theoretically exist but are virtually impossible to ob-
serve because one cannot perfectly isolate a macroscopic
object such as the cat from its environment [4].
However, this explanation is not fully satisfactory be-
cause the cat, a macroscopic object, is a complex open
system which cannot be represented by a state vector.
One may argue that the cat could be assumed to be in an
unknown pure state such that the cat was certainly alive
but the exact state of the cat was unknown. However,
the interactions between the cat and its environment can
cause the cat to become entangled with the environment
[5]. In such a case, even though one can perfectly iso-
late the cat in the steel chamber from the enviroment,
the cat will remain entangled with the environment due
to its pre-interactions with the environment. Therefore,
strictly speaking, even to assume a cat as an unknown
pure state in the steel chamber is not legitimate. Thus
a key point here is that it is unsatisfactory to describe
the cat by a pure state such as |alive〉 and |dead〉. We
may need a more realistic assumption that the “cat” in
Schrödinger’s paradox was in a significantly mixed clas-
sical state. An intriguing question is then whether the
quantum properties of the resulting state would still re-
main or diminish under such an assumption.
Recently, such an analogy of Schrödinger’s cat para-
dox, where the state corresponding to the virtual cat
is a significantly mixed thermal state, was investigated
[6]. A thermal state with a high temperature is consid-
ered a classical state in quantum optics. As the tem-
perature of the thermal state increases, the degree of
mixedness, which can be quantified by linear entropy,
rapidly approaches the maximum value. When the tem-
perature approaches infinity, the thermal state does not
show any quantum properties. As a comparison, coher-
ent states with large amplitudes are known as the most
classical pure states [7], and their superposition is of-
ten regarded as a superposition of classical states [8].
However, coherent states are still pure states which may
not well represent truly classical systems, and they dis-
play some nonclassical features [9]. In Ref. [6], it was
shown that prominent quantum properties can actually
be transferred from a microscopic superposition to a sig-
nificantly mixed thermal state (i.e. a thermal state of
which the degree of mixedness is close to the maximum
value) at a high temperature through an experimentally
http://arxiv.org/abs/0704.0523v2
feasible process. This result clarifies that unavoidable ini-
tial mixedness of the cat does not preclude strong quan-
tum phenomena.
One of the results in Ref. [6] is that quantum entan-
glement can be produced between thermal states with
nearly the maximum Bell-inequality violation when the
temperatures of both modes goes to infinity. In previous
related results, Bose et al. showed that entanglement
can arise when two systems interact if one of the system
are pure even when the other system is extremely mixed
[10]. There is an interesting previous example shown by
Filip et al. for the maximum violation of Bell’s inequal-
ity when one of the modes is an extremely mixed thermal
state [11]. Very recently, Ferreira et al. showed that en-
tanglement can be generated at any finite temperature
between high Q cavity mode field and a movable mirror
thermal state [12]. However, in these example [10, 11, 12]
only one of the modes is considered a large thermal state
[10, 11, 12] and entanglement vanishes in the infinite tem-
perature limit [10, 12], which is obviously in contrast to
the result presented in Ref. [6]. Entanglement for both
of the modes at the thermal limit of the infinitely high
temperature has not been found before. Remarkably, the
violation of Bell’s inequality in our examples reaches up
to Cirel’son’s bound [13] even in this infinite-temperature
limit for both modes. As Vedral [14] and Ferreira et al.
[12] pointed out it is believed that high temperatures re-
duce entanglement and all entanglement vanishes if the
temperature is high enough, which is obviously not the
case in Ref. [6].
The purpose of this paper is twofold. Firstly, we review
and further investigate various properties of superposi-
tions and entanglement of thermal states at high tem-
peratures [6]. In particular, we investigate two classes of
highly mixed symmetric states in the phase space. Both
the classes of these states do not show typical interference
patterns in the phase space while they manifest strong
singular behaviors. Interestingly, the first class of states
has neither squeezing properties nor negative values in
their Wigner functions, however, they are found to be
highly nonclassical states. The second class of states
has the maximum negativity in the Wigner function.
Further, we discuss the possibility of quantum informa-
tion processing with thermal-state qubits. We introduce
thermal-state qubits and thermal-Bell states, which are
a generalization of pure Bell states. We show that four
thermal-Bell states can be well discriminated by nonlin-
ear interactions without photon number resolving mea-
surements. Quantum teleportation and gate operations
for thermal-state qubits can be realized using the Bell
measurement scheme.
This paper is organized as follows. In Sec. II, we review
the generation process of superpositions of thermal states
and study their characteristics. In Sec. III, we study en-
tanglement of thermal states, i.e., Bell inequality viola-
tions. In Sec. IV, we discuss the possibility of quantum
information processing using thermal states. We first de-
fine the thermal-state qubit and the Bell-basis states us-
ing thermal-state entanglement. We then show that the
four Bell states can be well discriminated by homodyne
detection and two Kerr nonlinearities. It follows that
quantum teleportation and quantum gate operations can
be realized with thermal-state qubits. We conclude with
final remarks in Sec. V.
II. SUPERPOSITIONS OF THERMAL STATES
A. Generation of thermal-state superpositions
Let us first consider a two-mode harmonic oscillator
system. A displaced thermal state can be defined as
ρth(V, d) =
d2αP th(V, d)|α〉〈α| (1)
where |α〉 is a coherent state of amplitude α and
P thα (V, d) =
π(V − 1) exp[−
2|α− d|2
V − 1 ] (2)
with variance V and displacement d in the phase space.
The thermal temperature τ increases as V increases as
e~ν/τ = (V + 1)/(V − 1), where ~ is Planck’s constant
and ν is the frequency [15]. Suppose that a microscopic
superposition state
|ψ〉a =
(|0〉a + |1〉a), (3)
where |0〉 and |1〉 are the ground and first excited states
of the harmonic oscillator, interacts with a thermal state
ρthb (V, d) and the interaction Hamiltonian is
HK = λâ†âb̂†b̂ (4)
which corresponds to the cross Kerr nonlinear interac-
tion. The resulting state is then
ρentab =
d2αP th(V, d)
|0〉〈0| ⊗ |α〉〈α|
+ |1〉〈0| ⊗ |αeiϕ〉〈α|+ |0〉〈1| ⊗ |α〉〈αeiϕ|
+ |1〉〈1| ⊗ |αeiϕ〉〈αeiϕ|
and ϕ is determined by the strength of the nonlinearity
λ and the interaction time. The Wigner representation
of ρentab is
W entab (α, β) =
e−2|α|
W th(β; d) + 2αV c(β; d) + 2[αV c(β; d)]∗ + (4|α|2 − 1)W th(β; deiϕ)
where α and β are complex numbers parametrizing the phase spaces of the microscopic and macroscopic systems
respectively and
W th(α; d) =
exp[−2|α− d|
], (7)
V c(α; d) =
exp[− 2
(1 − eiϕ)d2 − 1
(α− 2e
)(α∗ − 2d
)], (8)
K = 2+ (V − 1)(1− eiϕ), J = (sinϕ/2 + iV cosϕ/2)/(2V sinϕ/2 + 2i cosϕ/2), and d has been assumed real without
loss of generality. If one traces ρentab over mode a, the remaining state will be simply in a classical mixture of two
thermal states and its Wigner function will be positive everywhere. However, if one measures out the “microscopic
part” on the superposed basis, i.e., (|0〉a ± |1〉a)/
2, the “macroscopic part” for mode b may not lose its nonclassical
characteristics. Such a measurement on the the superposed basis will reduce the remaining state to
ρsup(±) = N±s
d2αP th(V, d)
|α〉〈α| ± |αeiϕ〉〈α| ± |α〉〈αeiϕ|+ |αeiϕ〉〈αeiϕ|
, (9)
where N±s are the normalization factors, and its Wigner function is
W sup(±)(α) = N±s {W th(α; d)± V c(α; d) ± {V c(α; d)}∗ +W th(α; deiϕ)}. (10)
The ± signs in Eqs. (8) and (9) correspond to the two
possible results from the measurement of the microscopic
system. The state in Eq. (10) is a superposition of two
thermal states.
A feasible experimental setup to generate superposi-
tions of thermal states is atom-field interactions in cavi-
ties, where a π/2 pulse can be used to prepare the atom
in a superposed state. This type of experiment has al-
ready been performed to produce a superposition of co-
herent states [16]. In our cases, simply thermal states
can be used instead of coherent states. Another pos-
sible setup is an all-optical scheme with free-traveling
fields and a cross-Kerr medium, where a standard single-
photon qubit could be used as the microscopic superpo-
sition. Recently, there have been theoretical and experi-
mental efforts to produce and observe giant Kerr nonlin-
earities using electromagnetically induced transparency
[17]. Furthermore, it was shown that a weak Kerr non-
linearity can still be useful if a initially strong field is
employed in this type of experiment [18]. We shall fur-
ther explain this with examples in Sec. III.
B. Negativity of the Wigner function
The negativity of the Wigner function is known as an
indicator of non-classicality of quantum states. In order
to observe negativity of the Wigner function in a real
experiment, its absolute minimum negativity should be
large enough. The minimum negativity of the Wigner
function in Eq. (6) for V = 1 is −0.144 for d = 0 and
−0.246 for d → ∞. Now suppose the initial state can
be considered a classical thermal state by letting V ≫ 1.
One might expect that the negativity would be washed
out as the initial state becomes mixed, but this is not
the case. The minimum negativity actually increases as
V gets larger. If V → ∞, the minimum negativity of the
Wigner function (6) is −0.246 regardless of d: no matter
how mixed the initial thermal state was, the minimum
negativity of Wigner function is found to be a large value.
The point in the phase space which gives the minimum
negativity when V ≫ 1 or d ≫ 0 is (− 1
, 0) and has
negativity
Wneg ≡W entab (−
, 0) =
2(−2 + 1
exp[− 2d
. (11)
It can be shown that Wneg approaches −4/(π2
−0.246 when either d→ ∞ or V → ∞.
This effect is obviously due to the interaction between
the microscopic superposition and the macroscopic ther-
mal state. If the initial microscopic state is not super-
posed, e.g., |ψ〉a = |1〉a, the resulting state will be a
simple direct product, (|1〉〈1|)a ⊗ ρthb (V,−d). Whilst for
V = 1 this state will exhibit negativity, this is washed out
and tends to zero as V → ∞. Needless to say, if it was
|0〉a instead of |1〉a, the resulting Wigner function will be
a direct product of two Gaussian states whose Wigner
fucntion can never be negative. The superpositon state
(3) plays the crucial role in making the minimum negativ-
ity of the resulting Wigner function always saturate to a
certain negative value no matter how mixed and classical
the initial state of the other mode becomes.
− −0.04
100 0.15 0
0.040
0.005
0.015
FIG. 1: The probability distributions of x (left) and p (right)
for a “superposition” of two distant thermal states. A thermal
state with a large mixedness is converted to such a “thermal-
state superposition” by interacting with a microscopic super-
potion (see text). The variance V and displacement d for the
thermal state are chosen as (a) V = 100 and d = 100, and (b)
V = 1000 and d = 300. The fringe visibility is 1 regardless of
V and the fringe spacing (the distance between the fringes)
does not depend on the variance (i.e. mixedness) but only on
the distance d between the two component thermal states.
The Wigner functions of the single-mode states,
W sup(±)(α), in Eq. (10) show large negative values. The
minimum negativity of the Wigner function W sup(−)(α)
is W sup(−)(0) = 2/π regardless of the values of V and
d. On the other hand, the minimum negativity of the
Wigner function W sup(+)(α) approaches 2/π for d→ ∞
and disappears when d = 0.
C. Quantum interference in the phase space
When ϕ = π, the state (9) becomes
ρ± = N(ρth(V, d)±σ(V, d)±σ(V,−d)+ρth(V,−d)), (12)
where σ(V, d) =
d2αP th(V, d)| − α〉〈α| and
N = 2
exp[− 2d
. (13)
If the initial state for mode b is a pure coherent state,
i.e., V = 1, the measurement on the superposed basis for
mode a will produce a superposition of two pure coherent
states as
|Ψ̃±〉 =
1± e−2|α|2
(|α〉 ± | − α〉), (14)
where α = d. The probability P± to obtain the state ρ±
is obtained as [19]
P± = 〈ψ±|Trb[ρentab ]|ψ±〉 =
exp[− 2d2
), (15)
1998 2000 −2002 5 5
FIG. 2: The probability distributions P for a “superposition”
of thermal states where V = 5, d = 2000, ϕ = π/1000. The
x′ (p′) axis in this figure has been rotated by π/2000 from the
x (p) axis for clarity.
where |ψ±〉 = (|0〉±|1〉)/
2. The probability approaches
P± = 1/2 when either d or V becomes large.
As an analogy of Schrödinger’s cat paradox, the vari-
ance V corresponds to the size the initial “cat”, and the
distance d between the two thermal component states
corresponds to distinguishability between the “alive cat”
and the “dead cat”. Suppose that both V and d are very
large for the initial thermal state. The two thermal states
ρth(V,±d) become macroscopically distinguishable when
V , and our example may become a more realis-
tic analogy of the cat paradox in this limit. Both the
states ρ± in this case show probability distributions with
two Gaussian peaks and interference fringes [6]. Figure 1
presents the probability distributions of x (≡ Re[α]) and
p (≡ Im[α]) for ρ− (a) when V = 100 and d = 100 and
(b) when V = 1000 and d = 300. The probability dis-
tribution of x (p) for ρ± can be obtained by integrating
the Wigner function of ρ± over p (x). The two Gaussian
peaks along the x axis and interference fringes along the
p axis shown in Fig. 1 are a typical signature of a quan-
tum superposition between macroscopically distinguish-
able states. The visibility v of the interference fringes is
defined as [15]
Imax − Imin
Imax + Imin
, (16)
where I =
dxW sup(−)(α) and the maximum should be
taken over p. It can be simply shown that the visibil-
ity v is always 1 regardless of the value of V . Note that
d should increase proportionally to
V to maintain the
condition of classical distingushability between the two
component thermal states ρth(V,±d). The interference
fringes with high visibility are incompatible with classical
physics and evidence of quantum coherence. The fringe
spacing (the distance between the fringes) does not de-
pend on V but only on d, i.e., a pure superposition of
coherent states shows the same fringe spacing for a given
d. We emphasize that the states shown in Fig. 1 are
“superpositions” of severely mixed thermal states.
An experimental realization of a nonlinear effect cor-
responding to ϕ = π is very demanding particularly in
the presence of decoherence. Here we point out that the
method using a weak nonlinear effect (ϕ≪ π) combined
with a strong field (d≫ 1) [18] can be useful to generate a
thermal-state superposition with prominent interference
(a) (b) (c)
0.005
0.005
0.015
0.005
0.015
(d) (e) (f)
FIG. 3: (Color online) The time dependent Wigner functions of the thermal state of V = 100 at the origin (d = 0) after an
interaction with a microscopic superposition and a conditional measurement. The measurement result on the microscopic part
was supposed to be (|0〉 + |1〉)/
2. The interaction times are (a) θ = λt = 0, (b) θ = λt = π/32, (c) θ = π/16, (d) θ ≈ 3.102,
(e) θ ≈ 3.122 and (f) θ = π.
patterns. In Fig. 2, we have used experimentally acces-
sible values, V = 5, d = 2000 and ϕ = π/1000, but the
fringe visibility is still 1. In this case, decoherence during
the nonlinear interaction would be significantly reduced
because of the decrease of the interaction time [18]. Note
also that, if required, the state in Fig. 2 can be moved to
the center of the phase space, for example, using a biased
beam splitter (BS) and a strong coherent field [18].
D. Symmetric macroscopic quantum states
Let us assume that d = 0, i.e., the initial state is the
thermal state, ρth(V, 0), at the origin of the phase space.
In this case, the thermal-state superpositions, ρ±, are
produced with probabilities, P± = (1/2){1± (1/V )}, re-
spectively. Figure 3 shows the Wigner functions of ρ+
dependent on the interaction time between the macro-
scopic thermal state and the microscopic superposition
in a cross Kerr medium. The state is always symmetric
in the phase space regardless of the interaction time as
shown in Fig. 3. In this figure, the initial state is a ther-
mal state of V = 100 (Fig. 3(a)). In a relatively short
time (θ = π/32 and θ = π/16), the state shows some in-
terference patterns. When θ = π, the evolved state looks
very localized around the origin as shown in Fig 3. The
generated state at θ = π does not show negativity of the
Wigner function nor squeezing properties. On the other
hand, a well defined P function does not exist for this
state.
In the case of ρ−, with the same assumption d = 0, the
Wigner function at ϕ = π has the minimum negativity
(−2/π) at the origin regardless of V . As a result of the
interaction with the microscopic superposition, a deep
hole to the negative direction below zero has been formed
around the origin for ρ− as shown in Fig. 4 .
III. ENTANGLEMENT BETWEEN THERMAL
STATES
Entanglement between macroscopic objects and its
Bell-type inequality tests are an important issue. In this
section, we shall show that entanglement can be gener-
ated between high-temperature thermal states even when
the temperature of each mode goes to infinity.
−1 −0.5 0 0.5 1
FIG. 4: (Color online) The Wigner function of the thermal
state of V = 100 at the origin (d = 0) after an interac-
tion with a microscopic superposition and a conditional mea-
surement. The measurement result on the microscopic part
was supposed to be (|0〉 − |1〉)/
2 with the interaction time
θ = λt = π.
2001 400 600 800 1000
50 100 150
FIG. 5: (a) The optimized violation, B ≡ |B+|max, of Bell-
CHSH inequality for the “thermal-state entenglement”, ρ+,
of V = 1000 (solid curve) and V = 100 (dashed curve).
The Bell-violation of a pure entangled coherent state, i.e.,
V = 1, has been plotted for comparison (dotted curve). The
Bell-violation B approaches its maximum bound, 2
2, when
V regardless of the level of the mixedness V . (b)
The optimized Bell-violation B against d for the different type
of thermal-state entanglement generated using a 50:50 beam
splitter from ρ+. V = 1000 (solid curve), V = 100 (dashed
curve) and V = 1 (dotted curve).
A. Entanglement using two initial thermal states
If the microscopic superposition interacts with two
thermal states, ρthb (V, d) and ρ
c (V, d), and the micro-
scopic particle is measured out on the superposed basis,
the resulting state will be
ρtm(±) = Nt
ρth(V, d)⊗ ρth(V, d)± σ(V, d) ⊗ σ(V, d)
± σ(V,−d)⊗ σ(V,−d) + ρth(V,−d)⊗ ρth(V,−d)
where
Nt = 2
exp[− 4d
. (18)
Such two-mode thermal-state entanglement can be gener-
ated using two cavities and an atomic state detector [20].
Extending the two cavities to N cavities, entanglement
of N -mode thermal states can also be generated. Such
a state is an analogy of the N -mode pure GHZ state
[21] but each mode is extremely mixed. Here we shall
consider the Bell-CHSH inequality [22, 23] with photon
number parity measurements [20, 24]. The parity mea-
surements can be performed in a high-Q cavity using a
far-off-resonant interaction between a two-level atom and
the field [25]. The Bell-CHSH inequality can be repre-
sented in terms of the Winger function as [24]
|B(±)| = π
|W tm(±)(α, β) +W tm(±)(α, β′)
+W tm(±)(α′, β)−W tm(±)(α′, β′)| ≤ 2,
where W tm(±)(α, β) is the Wigner function of ρtm(±) in
Eq. (17). As shown in Fig. 5, the Bell-violation ap-
proaches the maximum bound for a bipartite measure-
ment, 2
2 [13], when d ≫
V regardless of the level of
the mixedness V , i.e., the temperatures of the thermal
states. Note that it is true for both of ρ+ and ρ− even
though only the case of ρ+ has been plotted in Fig. 5(a).
This implies that entanglement of nearly 1 ebit has been
produced between the two significantly mixed thermal
states for d ≫
V , and such “thermal-state entangle-
ment” cannot be described by a local theory.
B. Entanglement using a beam splitter
A different type of macroscopic entanglement can be
generated by applying the beam splitter operation
exp[θ/2(eiφâ†sâd − e−iφâ
dâs)], (20)
on the “thermal-state superpositions” in Eq. (9). The
state after passing through a 50:50 beam splitter can be
represented as
d2αP thα (V, d)
,− α√
〉 ± | − α√
,− α√
| ± 〈− α√
, (21)
200 400 600 800 1000
FIG. 6: The optimized Bell-violation B against V for the
slightly different type of thermal-state entanglement gener-
ated using a 50:50 beam splitter using ρ+ when d = 0.
3.13 3.14 3.15 3.16
3.13 3.14 3.15 3.16
FIG. 7: (a) The Bell-CHSH function B against θ (= λt) for
V = 1 (solid curve), V = 10 (dashed curve) and V = 20
(dotted curve) for d = 30. (b) The Bell-CHSH function for
d = 10 (solid curve), d = 20 (dashed curve) and d = 30
(dotted curve) for V = 10. The Bell violations are more
sensitive to the interaction time as either V or d increases.
where N is defined in Eq. (13). When d is large, this
state violates the Bell-CHSH inequality to the maximum
bound 2
2 regardless of the level of mixedness V as
shown in Fig. 5(b). Again, it is true for both of ρ+ and
ρ− even though only the case of ρ+ has been plotted
in Fig. 5(b). Furthermore, these states severely violate
Bell’s inequality even when d = 0 as V increases as shown
in Fig. 6. We have found that the optimized Bell violation
of these states approaches 2.32449 for V → ∞. Interest-
ingly, this value is exactly the same as the optimized
Bell-CHSH violation for a pure two-mode squeezed state
in the infinite squeezing limit [26]. Note that multilmode
entangled states can be generated using multiple beam
splitters.
It should be noted that the Bell violations are more
sensitive to the interaction time when either V or d is
larger. Figure 7 clearly shows this tendency. Therefore,
in order to observe the Bell violations using the mixed
state of V (and d) large, the interaction time in the Kerr
medium should be more accurate.
IV. QUANTUM INFORMATION PROCESSING
WITH THERMAL-STATE QUBITS
In this section, we discuss the possibility of quan-
tum information processing with thermal-state qubits
and thermal-state entanglement.
A. Qubits and Bell-state measurements
We introduce a thermal-state qubit
ρψ = |a|2ρth(V, d)± ab∗σ(V, d)± a∗bσ(V,−d) + |b|2ρth(V,−d),
where a and b are arbitrary complex numbers. The ba-
sis states, ρth(V, d) and ρth(V,−d), can be well discrimi-
nated by a homodyne measurement when d is larger than
V . The thermal state qubit (22) can be re-written as
d2αP thα (V, d)
a|α〉+ b| − α〉
a∗〈α| + b∗〈−α|
which can be understood as a generalization of the co-
herent state qubit, a|d〉+ b| − d〉, where |d〉 is a coherent
state of amplitude d. The thermal-state qubit (23) be-
comes identical to the coherent-state qubit when V = 1.
We also define four thermal-Bell states as
ρΦ(±) = Nt
ρth(V, d) ⊗ ρth(V, d)± σ(V, d)⊗ σ(V, d)± σ(V,−d)⊗ σ(V,−d) + ρth(V,−d)⊗ ρth(V,−d)
ρΨ(±) = Nt
ρth(V, d)⊗ ρth(V,−d)± σ(V, d) ⊗ σ(V,−d)± σ(V,−d)⊗ σ(V, d) + ρth(V,−d)⊗ ρth(V, d)
where Nt was defined in Eq. (18). The thermal-Bell states can be written as
ρΦ(±) = Nt
dα2dβ2P thα (V, d)P
β (V, d)
|α, β〉 ± | − α,−β〉
〈α, β| ± 〈−α,−β|
, (26)
ρΨ(±) = Nt
dα2dβ2P thα (V, d)P
β (V, d)
|α,−β〉 ± | − α, β〉
〈α,−β| ± 〈−α, β|
. (27)
Homodyne 
detector C
FIG. 8: A schematic of the thermal-Bell state measurement
(a) using photon number resolving detection and (b) using ho-
modyne measurements with cross-Kerr nonlinear interactions
(NL). See text for details.
For quantum information processing applications, it is
an important task to discriminate between the four Bell
states. Here we discuss two possible ways to discrimi-
nate between the thermal-Bell states (25). We shall only
briefly describe the first scheme using photon number re-
solving measurements and focus on the second scheme
using nonlinear interactions.
The first method is to simply use a 50-50 beam splitter
and two photon number resolving detectors as shown in
Fig. 8(a). This scheme is basically the same as the Bell-
state measurement scheme with pure entangled coherent
states [27, 28]. Let us suppose that the amplitude, d, is
large enough, i.e., d ≫
V . If the incident state was
ρΦ(+) or ρΦ(−), most of the photons are detected on de-
tector A in in Fig. 8(a). Meanwhile, most of the photons
are detected on detector B when the incident state was
ρΨ(+) or ρΨ(−). The average photon numbers between
the “many-photon case” and the “few-photon case” are
compared in Fig. 9. Furthermore, the states ρΨ(+) and
ρΦ(+) contain only even numbers of photons while ρΨ(−)
and ρΦ(−) contain only odd numbers of photons. There-
fore, all the four Bell states can be well discriminated by
analyzing numbers of photons detected at detectors A
and B. For example, if detector A detects many photons
while detector B detects few and the total photon num-
ber detected by the two detectors are even, this means
that state ρΦ(+) was measured by the thermal-Bell mea-
surement. The nonzero failure probability can be made
arbitrarily small by increasing d.
However, the average photon numbers of the thermal-
Bell states are high when V ≫ 1 and d≫ 1. In this case,
it would be unrealistic to use photon number resolving
detectors. It would be an interesting question whether
0.5 1 1.5 2 2.5 3
0.5 1 1.5 2 2.5 3
FIG. 9: The average photon number N for the “many-photon
case” (solid line) and the “few-photon case” (dashed line) for
V = 10 against d (a) when the input state is either ρΦ(+) or
ρΨ(+) and (b) when the input state is either ρΦ(−) or ρΨ(−).
these four thermal-Bell states can be distinguished by
classical measurements, such as homodyne detection, in-
stead of photon number resolving detection. Our alterna-
tive scheme employs cross-Kerr nonlinearities and single
photon detectors as shown in Fig. 8(b). Let us first sup-
pose that the input field was ρΦ(+). The incident two-
mode state passes through a 50-50 beam splitter, BS1.
The state after passing through the 50:50 beam splitter,
BS1, is
ρB = Nt
d2αd2βP thα (V, d)P
β (V, d)
|η,−ξ〉〈η,−ξ|
+ |η,−ξ〉〈−η, ξ|+ | − η, ξ〉〈η,−ξ| + | − η, ξ〉〈−η, ξ|
where η = (α+β)/
2 and ξ = (α−β)/
2. Two dual-rail
single photon qubits, |ψ+〉ee′ and |ψ+〉ff ′ , where
|ψ+〉 =
(|0〉|1〉+ |1〉|0〉), (29)
are prepared using two single photons and 50:50 beam
splitters, BS2 and BS3, as shown in Fig. 8(b). Then,
traveling fields at modes c and d interacts with those
of modes e and f , respectively, in cross-Kerr nonlinear
media. We suppose that the interaction time is t = π/λ,
and the resulting state is then
= UceUdfρ
ff ′U
df (30)
where Uce = exp[iπHKce/λ~] and ρq = |ψq〉〈ψq |. An ex-
plicit form of Eq. (30) can then be simply obtained using
the identity
Uce|α〉c|0〉e = |α〉c|0〉e,
Uce|α〉c|1〉e = | − α〉c|1〉e
where |α〉 is a coherent state. However, we omit such an
explicit expression in this paper for it is too lengthy.
After the nonlinear interactions, the qubit parts,
modes e, e′, f and f ′, should be measured with the mea-
surement basis
{|++〉, |+−〉, | −+〉, | − −〉} (32)
where |+ +〉 = |ψ+〉ee′ |ψ+〉ff ′ , | + −〉 = |ψ+〉ee′ |ψ−〉ff ′ ,
| − +〉 = |ψ−〉ee′ |ψ+〉ff ′ , | − −〉 = |ψ−〉ee′ |ψ−〉ff ′ , and
|ψ−〉 = (|0〉|1〉 − |1〉|0〉)/
2. This measurement can be
performed using two 50:50 beam splitters, BS4 and BS5,
and four detectors, A1, A2, B1 and B2, as shown in
Fig. 8(b). If detector A1 and B1 click, i.e., the mea-
surement result is | + +〉, the resulting state at modes c
and d is
ρ++ =
d2αd2βP thα (V, d)P
β (V, d)
(|η〉 + | − η〉)(〈η| + 〈−η|)
(|ξ〉+ | − ξ〉)(〈ξ| + 〈−ξ|)
Note that state ρ++ is not normalized, which implies that
the probability of obtaining the corresponding measure-
ment result is not unity. The probability of obtaining
this result is
P++ =
(V + 1)(V + e−
2(V 2 + e−
. (34)
When the result is either |+−〉 or | −+〉, the result is
〈ψ2|ρB
|ψ2〉 = 〈ψ3|ρB
|ψ3〉 = 0, (35)
which obviously means that the probability of the ob-
taining this result is zero. When the result is | −−〉, i.e.,
detector A2 and B2 click,
ρ−− =
d2αd2βP thα (V, d)P
β (V, d)
(|η〉 − | − η〉)(〈η| − 〈−η|)
(|ξ〉 − | − ξ〉)(〈ξ| − 〈−ξ|)
which is not normalized. The probability of obtaining
this result is
P−− =
(V − 1)(V − e− 4d
2(V 2 + e−
, (37)
and it can be simply verified that P+++P−− = 1. There-
fore, only the measurement results |++〉 and | −−〉 can
be obtained in the case of the input state ρΦ(+). This is
exactly the same for the case of ρΨ(+). In the same way,
it can be shown that if either the input state was ρΦ(−)
or ρΨ(−), only the measurement results |+−〉 and | −+〉
−20 −10 10 20
Probability
−20 −10 10 20
Probability
FIG. 10: (a) The probability distributions, P++
(solid curve)
and P++
(dashed curve), for homodyne measurements at de-
tector C. (b) The probability distributions, P++
(solid curve)
and P++
(dashed curve), for homodyne measurements at de-
tector C.
can be obtained. In other words, the parity of the to-
tal incoming state is perfectly well discriminated by the
measurements on single-photon qubits.
Subsequently, a homodyne measurement is performed
for mode c by homodyne detector C as shown in Fig. 8(b).
We assume that ideal homodyne measurements are per-
formed, i.e., when a homodyne measurement is per-
formed the state is projected onto eigenstate |x〉 of oper-
ator X with eigenvalue x, where
(a+ a†). (38)
Let us first consider the case when the measurement re-
sult for the single photon qubits is | + +〉. In this case,
the remaining state is ρ++ in Eq. (33). The probabil-
ity distribution P++
for the homodyne measurement at
detector C is
= 〈x|Trd[ρ++]|x〉 =
2 (e−V x
2 (V + 1)
. (39)
Note that the superscript, ++, denotes that the qubit
measurement result was |++〉, and the subscript, Φ(+),
denotes that the input state was ρΦ
. These notations
will be used also for the other cases in this section. The
same analysis can be performed for the other possible
measurement outcome | − −〉:
= 〈x|Trd[ρ−−]|x〉 =
2 (e−V x
2 − e−x
2 (V − 1)
. (40)
In the same way, for another input state, ρΦ(−), it is
straightforward to show:
= P++
, P−+
= P−−
, (41)
and P++
= P−−
= 0. On the other hand, if the input
state was ρΨ(+), the probability distributions P++
at detector C are
= 〈x|Trc[ρ++]|x〉 =
x{4d+(2+V 2)x}
(1+V 2)x2
V + 2e
2x(2d+x)
V + e
x(8d+x+V 2x)
V V ) + 1
, (42)
= 〈x|Trc[ρ−−]|x〉 =
x{4d+(2+V 2)x}
(1+V 2)x2
V − 2e
2x(2d+x)
V + e
x(8d+x+V 2x)
V V )− 1
. (43)
20 4 6 8 10 d
FIG. 11: The distinguishability Ps between states ρ
Ψ(+) and
ρΦ(+) by a homodyne measurement against for V = 10 (solid
curve) and V = 20 (dashed curve) against distance d. See
text for details.
It is straightforward to show for the other input state
ρΨ(−):
= P−−
, P−+
= P++
. (44)
The probability distributions P++
and P++
are plotted
in Fig. 10. Figure 10 shows that when the input state was
ρΦ(+) or ρΦ(−), the homodyne measurement outcome by
detector C, characterized by P++
and P−−
, is located
around the origin. However, when the input state was
ρΨ(+) or ρΨ(−), the homodyne measurement outcome by
detector C, characterized by P++
and P−−
, is located
far from the origin. Therefore, two of the Bell states,
ρΦ(+) or ρΦ(−), can be well distinguished from the other
two by the homodyne detector C for the case of the mea-
surement outcome |++〉. Finally, by combining the ho-
modyne measurement result and the qubit measurement
result, all four Bell states can be effectively distinguished.
For example, let us assume that the measurement out-
come of the single photon detectors was | + +〉 and the
homodyne detection outcome was around the origin, i.e.,
x ≈ 0. Then, one can say that state ρΨ(−) has been mea-
sured for the result of the thermal-Bell measurement.
As implied in Fig. 10, the overlaps between the proba-
bility distributions around the origin, P++
and P−−
and the other distributions, P++
and P−−
, are ex-
tremely small for a sufficiently large d. In other words,
the distinguishability by the homodyne detection rapidly
approaches 1 as d increases. As an example, we can cal-
culate the distinguishability between the states ρΨ(+) and
ρΦ(+) by the homodyne measurement by detector C. The
distinguishability by homodyne detection is
|x|<d
dxP++c (x) +
|x|≥d
dxP++d (x)
which is plotted in Fig. 11. The distinguishability is Ps ≈
0.99 for d = 5.5 (d = 7.8) when V = 10 (V = 20), and
it becomes as high as Ps > 0.99999 for d = 10 (d = 15)
when V = 10 (V = 20). If necessary, another homodyne
measurement can be performed for mode d to enhance
distinguishability of the Bell measurement. When the
probability distribution at detector C is around the origin
that of detector D is far from the origin and vice versa.
Note also that the second scheme using homodyne de-
tection is robust to detection inefficiency compared with
the first scheme using photon number resolving measure-
ments. In the first scheme, even if a detector misses only
one photon, it will result in a completely wrong mea-
surement outcome. In the second scheme, however, the
measurement outcome will not be affected in that way. If
a single photon detector misses a photon, it will be imme-
diately recognized. Such a case can simply be discarded
so that it will only degrade the success probability of the
Bell measurement. The homodyne detection inefficiency
will not significantly affect the result when the distribu-
tions around the origin and the distributions far from the
origin are well separated, i.e., when d ≫
V , as shown
in Fig. 10. On the other hand, loss in the Kerr medium
will have a detrimental affect.
B. Quantum teleportation and computation
Quantum teleportation of a thermal-state qubit can be
performed using one of the Bell states as the quantum
channel. Let us assume that Alice needs to teleport a
thermal-state qubit, ρψ, to Bob using a thermal-state
entanglement, ρΨ(−), shared by the two parties. The total state can be represented as
1 ⊗ ρ
23 = Nt
dα2dβ2dγ2P thα (V, d)P
β (V, d)P
γ (V, d)
(a|α〉+ b| − α〉)1(|β,−γ〉 − | − β, γ〉)23
. (46)
Alice first needs to perform the thermal-Bell measure-
ment described in the previous subsection. To complete
the teleportation process, Bob should perform an appro-
priate unitary transformation on his part of the quantum
channel according to the measurement result sent from
Alice via a classical channel. It is straightforward to show
that the required transformations are exactly the same
to those for the coherent-state qubit [27]. When the mea-
surement outcome is ρΨ(−), Bob obtains a perfect replica
of the original unknown qubit without any operation.
When the measurement outcome is ρΦ(−), Bob should
perform |α〉 ↔ | − α〉 on his qubit in Eq. (23). Such a
phase shift by π can be done using a phase shifter whose
action is described by P (ϕ) = eiϕa
†a, where a and a† are
the annihilation and creation operators. When the out-
come is ρΨ(+), the transformation should be performed
as |α〉 → |α〉 and | − α〉 → −| − α〉. It is known that the
displacement operator is a good approximation of this
transformation for d ≫ 1 [29]. This transformation can
also be achieved by teleporting the state again locally
and repeating until the required phase shift is obtained
[30]. When the outcome is ρΦ(+), σx and σz should be
successively applied.
V. CONCLUSION
In this paper, we have studied characteristics of su-
perpositions and entanglement of thermal states at high
temperatures and discussed their applications to quan-
tum information processing. The superpositions and en-
tanglement of thermal states show various nonclassical
properties such as interference patterns, negativity of the
Wigner functions, and violations of the Bell-CHSH in-
equality. The Bell violations are more sensitive to the
interaction time during the generation process when the
thermal temperature (i.e. mixedness) of the thermal-
state entanglement is larger. Therefore, in order to ob-
serve the Bell violations using the mixed state at a high
temperature, the interaction time in the Kerr medium
should be accurate. We have pointed out that certain
superpositions of high-temperature thermal states, sym-
metric in the phase space, can also be generated. Some
of these states have neither squeezing properties nor neg-
ative values in their Wigner functions but they are found
to be highly nonclassical.
We have introduced the thermal-state qubit and
thermal-Bell states for applications to quantum informa-
tion processing. We have presented two possible methods
for the Bell-state measurement. The Bell-state measure-
ment enables one to perform quantum teleportation and
gate operations for quantum computation with thermal-
state qubits. The first scheme uses two photon number
resolving detectors and a 50-50 beam splitter to discrim-
inate the thermal-Bell states. Using the second scheme,
it is possible to effectively discriminate the thermal-Bell
states without photon number resolving detection. The
required resources for the second scheme are two Kerr
nonlinear interactions, two single photon detectors, two
50:50 beam splitters and one homodyne detector. The
second scheme is more robust to inefficiency of the de-
tectors: the inefficiency of the single photon detectors
only degrades the success probability of the Bell mea-
surement.
Acknowledgments
This work was supported by the DTO-funded U.S.
Army Research Office Contract No. W911NF-05-0397,
the Australian Research Council and Queensland State
Government.
[1] E. Schrödinger, Naturwissenschaften. 23, pp. 807-812;
823-828; 844-849 (1935).
[2] A.J. Leggett and A. Garg, Phys. Rev. Lett. 54, 857
(1985).
[3] M.D. Reid, preprint quant-ph/0101052 and references
therein.
[4] M.A. Nielsen and I.L. Chuang, Quantum Computation
and Quantum Information (Cambridge, 2000).
[5] H.M. Wiseman and J.A. Vaccaro, Phys. Rev. Lett. 87,
240402, (2001); See discussions in the introduction and
references therein.
[6] H. Jeong and T.C. Ralph, Phys. Rev. Lett. 97, 100401
(2006).
[7] E. Schrödinger, Naturwissenschaften 14, 664 (1926).
[8] W. Schleich, M. Pernigo, and F.L. Kien, Phys. Rev. A
44, 2172 (1991).
[9] L.M. Johansen, Phys. Lett. A 329, 184.
[10] S. Bose, I. Fuentes-Guridi, P.L. Knight, and V. Vedral,
http://arxiv.org/abs/quant-ph/0101052
Phys. Rev. Lett. 87, 050401 (2001).
[11] R. Filip, M. Dusek, J. Fiurasek, L. Mista, Phys. Rev. A
65, 043802 (2002).
[12] A. Ferreira, A. Guerreiro, and V. Vedral, Phys. Rev. Lett.
96, 060407 (2006); We note that this work appeared on
the Los Alamos archive (quant-ph/0504186) after we up-
loaded the main results of our work (quant-ph/0410210).
[13] B. S. Cirel’son, Lett. Math. Phys. 4, 93 (1980).
[14] V. Vedral, New J. Phys. 6 102 (2004).
[15] D. F. Walls and G. J. Milburn, Quantum Optics,
Springer-Verlag (1994).
[16] M. Brune et al., Phys. Rev. Lett. 77, 4887 (1996); A.
Auffeves et al., Phys. Rev. Lett. 91 230405 (2003).
[17] H. Schmidt and A. Imamoglu, Opt. Lett. 21, 1936 (1996);
L. V. Hau et al., Nature 397, 594 (1999).
[18] H. Jeong, Phys. Rev. A 72, 034305 (2005) and references
therein.
[19] We note that denominator V was missing in the genera-
tion probability P± in [6].
[20] M. S. Kim and J. Lee, Phys. Rev. A 61 042102 (2000).
[21] D. M. Greenberger, M. Horne and A. Zeilinger, Bells
theorem, Quantum theory, and Conceptions of the the
Universe, ed. M. Kafatos, Kluwer, Dordrecht, 69 (1989);
[22] S. Bell, Physics 1, 195 (1964).
[23] J. F. Clauser et al., Phys. Rev. Lett. 23, 880 (1969).
[24] K. Banaszek and K. Wódkiewicz, Phys. Rev. A 58, 4345
(1998); Phys. Rev. Lett. 82, 2009 (1999).
[25] B. -G. Englert, N. Sterpi, and H.Walther, Opt. Commun.
100 526 (1993).
[26] H. Jeong, W. Son, M. S. Kim, D. Ahn, and C. Brukner,
Phys. Rev. A 67, 012106 (2003).
[27] H. Jeong, M. S. Kim, and J. Lee, Phys. Rev. A. 64,
052308 (2001).
[28] S. J. van Enk and O. Hirota, Phys. Rev. A. 64, 022313
(2001).
[29] H. Jeong and M. S. Kim Phys. Rev. A 65, 042305 (2002).
[30] T. C. Ralph, A. Gilchrist, G. J. Milburn, W. J. Munro,
and S. Glancy, Phys. Rev. A 68, 042319 (2003).
http://arxiv.org/abs/quant-ph/0504186
http://arxiv.org/abs/quant-ph/0410210
ABSTRACT
  We study characteristics of superpositions and entanglement of thermal states
at high temperatures and discuss their applications to quantum information
processing. We introduce thermal-state qubits and thermal-Bell states, which
are a generalization of pure-state qubits and Bell states to thermal mixtures.
A scheme is then presented to discriminate between the four thermal-Bell states
without photon number resolving detection but with Kerr nonlinear interactions
and two single-photon detectors. This enables one to perform quantum
teleportation and gate operations for quantum computation with thermal-state
qubits.

<|endoftext|><|startoftext|>
Optimal control of stochastic differential equations
with dynamical boundary conditions
Stefano BONACCORSI∗, Fulvia CONFORTOLA†, Elisa MASTROGIACOMO
Dipartimento di Matematica, Università di Trento,
via Sommarive 14, 38050 Povo (Trento), Italia
In this paper we investigate the optimal control problem for a class of stochastic Cauchy
evolution problem with non standard boundary dynamic and control. The model is
composed by an infinite dimensional dynamical system coupled with a finite dimensional
dynamics, which describes the boundary conditions of the internal system. In other
terms, we are concerned with non standard boundary conditions, as the value at the
boundary is governed by a different stochastic differential equation.
Keywords: Stochastic differential equations in infinite dimensions, dynamical bound-
ary conditions, optimal control
1991 MSC :
1 Setting of the problem
Our model is a one dimensional semilinear diffusion equation in a confined system,
where interactions with extremal points cannot be disregarded. The extremal points
have a mass and the boundary potential evolves with a specific dynamic. Stochas-
ticity enters through fluctuations and random perturbations both in the inside as
on the boundaries; in particular, in our model we assume that the control process
is perturbed by a noisy term.
There is a growing literature concerning such problems; we shall mention the
paper [2] where a problem in a domain O ⊂ Rn is concerned; the authors cite as an
example an SPDE with stochastic perturbations which appears in connection with
random fluctuations of the atmospheric pressure field. As opposite to ours, however,
that paper is not concerned with control problems. Quite recently, the authors
became aware of the paper [1] where a different application to some generalized
Lamb model is proposed.
The internal dynamic is described by a stochastic evolution problem in the unit
∗stefano.bonaccorsi@unitn.it
†Current address: fulvia.confortola@unimib.it
http://arxiv.org/abs/0704.0524v1
2 S. Bonaccorsi, F. Confortola, E. Mastrogiacomo
interval D = [0, 1]
∂tu(t, x) = ∂
xu(t, x) + f(t, x, u(t, x)) + g(t, x, u(t, x))Ẇ (t, x) (1)
which we write as an abstract evolution problem on the space L2(0, 1)
du(t) = Amu(t) + F (t, u(t)) dt+G(t, u(t)) dW (t), (2)
where the leading operator is Am = ∂
x with domain D(Am) = H
2(0, 1). We assume
that f and g are real valued mappings, defined on [0, T ]× [0, 1]× R, which verify
some boundedness and Lipschitz continuity assumptions.
The boundary dynamic is governed by a finite dimensional system which follows
a (ordinary, two dimensional) stochastic differential equation
∂tvi(t) = −bivi(t) + ∂νu(t, i) + hi(t)V̇i(t), i = 0, 1
where bi are positive numbers and hi(t) are bounded, measurable functions; ∂ν is
the normal derivative on the boundary, and coincides with (−1)i∂x for i = 0, 1. For
notational semplicity, we introduce the 2× 2 diagonal matrices B = diag(−b−0, b1)
and h(t) = diag(h0(t), h1(t)). There is a constraint
Lu = v
which we interpret as the operator evaluating boundary conditions; the system is
coupled by the presence, in the second equation, of a feedback term C that is an
unbounded operator
∂xu(0)
−∂xu(1)
The idea is to write the problem in abstract form for the vector u =
the space X = L2(0, 1)× R2, that is
du = Au(t) + F(t,u(t)) dt +G(t,u(t)) dW(t)
u(0) =
Our main concern is to study spectral properties of the matrix operator
on the domain
D(A) = {u ∈ D(Am)× R2 : Lu = v}.
Theorem 1. A is the infinitesimal generator of a strongly continuous, analytic
semigroup of contractions etA, self-adjoint and compact.
Control of stochastic differential equations with dynamical boundary conditions 3
We shall prove the above theorem in Section 2. Further, we shall prove that
A is a self-adjoint operator with compact resolvent, which implies that the gener-
ated semigroup is Hilbert-Schmidt. Moreover, we can characterize the complete,
orthonormal system of eigenfunctions associated to A.
Let us fix a complete probability space (Ω,F, {Ft},P); on this space we de-
fine W (t), that is a space-time Wiener process taking values in X and V (t) =
(V1(t), V2(t)), that is a R
2-valued Wiener process, such that W (t, x) and V (t) are
independent.
As a corollary to Theorem 1, using standard results for infinite dimensional
stochastic differential equations, compare [3, Theorem 7.4], we obtain the following
existence result
Theorem 2. For any initial condition
∈ X×R2 there exists a unique process
u ∈ L2F (0, T ;X × R2) such that
u(t) = etA
e(t−s)AF(u(s)) ds+
e(t−s)AG(u(s)) dW(s)
that is by definition a mild solution of (3).
The abstract semigroup setting we propose in this paper allows to obtain an
optimal control synthesis for the above evolution problem with boundary control
and noise. This means that we assume a boundary dynamics of the form:
∂tv(t) = bv(t)− ∂νu(t, ·) + h(t)[z(t) + V̇ (t)] (4)
where z(t) is the control process and takes values in a given subset of R2.
As before, we can write the system – defined by the internal evolution problem
(1) and the dynamical boundary conditions described by (4) – in the following
abstract form
duzt = Au
t dt+ F(t,u
t ) dt+G(t,u
t )[Pzt dt+ dWt]
ut0 = u0.
P : R2 → X denote the immersion of the boundary space in the product space
X = L2(0, 1)× R2.
The aim is to choose a control process z, within a set of admissible controls, in
such way to minimize a cost functional of the form
J(t0, u0, z) = E
λ(s,uzs , zs)) ds+ Eφ(u
T ) (6)
where λ and φ are given real functions. In our setting, altough the control lives in a
finite dimensional space, we obtain an abstract optimal control problem in infinite
dimensions. Such type of problems has been exhaustively studied by Fuhrman and
Tessitore in [8]. The control problem is understood in the usual weak sense (see [7]).
We prove that if f and g are sufficiently regular then the abstract control problem,
under suitable assumptions on λ and φ, can be solved and we can characterize
optimal controls by a feedback law (see Theorem 17 and compare Theorem 7.2 in
[8]).
4 S. Bonaccorsi, F. Confortola, E. Mastrogiacomo
Theorem 3. In our assumptions, there exists an admissible control {z̄t, t ∈ [0, T ]}
taking values in a bounded subset of R2, such that the closed loop equation:
duτ = Auτ dτ +G(τ,uτ )PΓ(τ,uτ ,G(τ,uτ )
∗∇xv(τ,uτ )) dτ
+ F(τ,uτ ) dτ +G(τ,uτ ) dWτ , τ ∈ [t0, T ],
ut0 = u0 ∈ X.
admits a solution and the couple (z,u) is optimal for the control problem.
Stochastic boundary value problems are already present in the literature, see the
paper [11] and the references therein; in those papers, the approach to the solution
of the system is more similar to that in [2]. We also need to mention the paper [5]
for a one dimensional case where the boundary values are set equal to a white noise
mapping.
2 Generation properties
Let X = L2(0, 1) be the Hilbert space of square integrable real valued functions
defined on D = [0, 1] and X = X × R2. In this section we consider the following
initial-boundary value problem on the space X
u(t) = Amu(t)
v(t) = Lu(t)
v(t) = Bv(t)− Cu(t)
u(0) = u0 ∈ X, v(0) = v0 ∈ R2.
In the above equation, Am is an unbounded operator with maximal domain
Am = ∂
x, D(Am) = H
2(0, 1);
B is a diagonal matrix with negative entries (−b0,−b1).
Let C : D(C) ⊂ X → ∂X the feedback operator, defined on D(C) = H1(0, 1)
∂xu(0)
−∂xu(1)
The boundary evaluation operator L is the mapping L : X → R2 given by
Its inverse is the Dirichlet mapping D
λ : R
2 → D(Am)
λ φ = u(x) ∈ D(Am) :
(λI −Am)u(x) = 0,
Lu = φ.
As proposed in [10], we define a mild solution of (8) a function u ∈ C([0, T ];X)
such that
u(t) = u0 +Am
u(s) ds, t ∈ [0, T ]
v(t) = v0 +B
v(s) ds+ C
u(s) ds.
Control of stochastic differential equations with dynamical boundary conditions 5
In order to use semigroup theory to study equation (8), we consider a matrix
operator describing the evolution with feedback on the boundary
on the domain
D(A) = {u ∈ D(Am)× R2 : Lu = v}.
Then a mild solution for equation (8) exists if and only if A is the generator of a
strongly continuous semigroup.
The above definition of the domain D(A) puts in evidence the relation between
the first and the second component of the vector u. There is a different characteri-
zation that is sometimes useful in the applications.
Let us define the operator A0 as A0 = Am on D(A0) = {u ∈ D(Am) : Lu = 0}.
We can then write the domain of A as
D(A) = {u ∈ D(Am)× ∂X : u−DA,L0 v ∈ D(A0)}.
The operator A can be decomposed as the product
I −DA,L0
Then, according to Engel [6], A is called a one-sided K-coupled matrix-valued
operator.
Proof of Theorem 1
In this section we apply form theory in order to prove generation property of the
operator A, compare the monograph [13].
Proposition 4. A is the infinitesimal generator of a strongly continuous, analytic
semigroup of contractions, self-adjoint and compact.
We will give the proof in two steps. First of all we will consider the following
form:
a(u,v) =
u′(x)v′(x) dx + b0 u(0) v(0) + b1 u(1) v(1)
on the domain
u = (u, α) ∈ H1(0, 1)× R2 | u(0) = α0, u(1) = α1
and we will show that it is densely defined, closed, positive, symmetric and continue.
Moreover, the operator associated with the form a is (A, D(A)) defined above.
According to [13], this implies that the operator A is self-adjoint and generates a
contraction semigroup etA on X that is analytic of angle π
. Then we will show the
self-adjointness and the compactness of the semigroup etA. To see this, we will refer
to [9].
Let us begin with the properties of the form a.
6 S. Bonaccorsi, F. Confortola, E. Mastrogiacomo
Lemma 5. The form a is densely defined, closed, positive, symmetric and continue.
Proof. By assumption, since b0 and b1 are positive real numbers, it follows that in
particular a is symmetric and positive.
It is clear that V is a linear subspace of X. Observe that V is dense in X if any
u ∈ X can be approximated with elements of V . Consider (u, α) ∈ L2[0, 1] × R2.
Since C∞c [0, 1] is dense in L
2(0, 1) it follows that for all ε > 0 there exists v ∈
C∞c [0, 1] such that
|u− v|L2[0,1] ≤
Now let ρ0(x) be a symmetric function in C
c (R) with support in Bε(0), ρ0(0) = 1
ρ0(x) dx = ε/3. Finally, let ρ1(x) = ρ0(x−1). Then, if we define the function
ρ = v + α0 ρ0
[0,1]
+ α1 ρ1
[0,1]
, we have:
|u− ρ|L2[0,1] ≤ |u− v|L2[0,1] + |α0ρ0|L2[0,1] + |α1ρ1|L2[0,1] ≤
≤ max {1, α0, α1} ε.
Morever, ρ(0) = α0 and ρ(1) = α1. Thus
|(u, α)− (ρ, ρ(0), ρ(1))|
for a suitable M . This shows that V is dense in X.
In order to check closedness and continuity of a, observe first that the norm
induced by a on the space V is equivalent to the norm given by the inner product
(u,v)V =
[u′(x)v′(x) + u(x)v(x)] dx+ u(1)v(1) + u(0)v(0).
In fact, if we set b = b0 + b1, we have
‖u‖a =
a(u,u) + ‖u‖2V
so that
‖u‖2a ≤ 2 ‖u‖
H1(0,1) + 2b
u(0)2 + u(1)2
≤ max {2, 2b} ‖u‖2V .
Now observe that V becomes a Hilbert space when equipped with the inner product
defined above since V is a closed subspace of H1(0, 1)× R2. Then a is closed.
Finally, a is continuous. To see this, take u,v ∈ V ; then
|a(u,v)| ≤
|u′(x)v′(x)| dx+ b [|u(0)| |v(0)|+ |u(1)| |v(1)|]
≤ ‖u‖H1(0,1) ‖v‖H1(0,1) + b [|u(0)| |v(0)|+ |u(1)| |v(1)|]
≤ ‖u‖V ‖v‖V ≤M ‖u‖a ‖v‖a
by the Cauchy-Schwartz inequality.
Control of stochastic differential equations with dynamical boundary conditions 7
Lemma 6. The operator associated with a is (A, D(A)) defined above.
Proof. Denote by (C, D(C)) the operator associated with a. By definition, C is given
D(C) = {f ∈ V | ∃g ∈ X s.t. a(f ,g) = (g,h)X∀h ∈ V }
Cf = −g.
Let us first show that A ⊂ C. Take f ∈ D(A). Then for all h ∈ V
a(f ,h) =
f ′(x)h′(x) dx + b0f(0)h(0) + b1f(1)h(1)
= f ′(x)h(x)|10 −
f ′′(x)h(x) dx + b0f(0)h(0) + b1f(1)h(1)
= f ′(1)h(1)− f ′(0)h(0)−
f ′′(x)h(x) dx + b0f(0)h(0) + b1f(1)h(1).
At the same time, if we set α = (f(0), f(1)), β = (h(0), h(1)), we have
(Af ,h) = (Af, h)L2(0,1) + (Cf +Bα, β)R2 =
f ′′(x)h(x) dx + f ′(0)h(0)− f ′(1)h(1)
− b0f(0)h(0)− b1f(1)h(1) = −a(f ,g).
The last equality shows that A ⊂ C.
To check the converse inclusion C ⊂ A take f ∈ D(C). By definition, there exists
g ∈ X such that
a(f ,h) = (g,h)X, ∀h ∈ V
that is,
f ′(x)h′(x) dx =
g(x)h(x) dx.
Now choose h = (h, α) ∈ V such that the function h belongs to H10 (0, 1) (the
existence of such a function is ensured by the continuous embedding of H10 (0, 1)in
H1(0, 1)). Then by the last equality we cand derive that f ′ ∈ H1(0, 1) and g
is the weak derivative of f ′: it follows that f ′ ∈ H1(0, 1) and we conclude that
f ∈ H2(0, 1). Integrating by parts as in the proof of the first inclusion we see that
a(f ,h) =
f ′(x)h′(x) dx + b0f(0)h(0) + b1f(1)h(1)
= f ′(x)h(x)|10 −
f ′′(x)h(x) dx + b0f(0)h(0) + b1f(1)h(1)
= (−Af ,h) = (g,h), ∀h ∈ V.
This implies that Af = −g, and the proof is complete.
8 S. Bonaccorsi, F. Confortola, E. Mastrogiacomo
Corollary 7. The operator (A, D(A)) is self-adjoint and dissipative. Moreover it
has compact resolvent.
Proof. The self-adjointness of A follows by [13] (Proposition 1.24) and he dissipativ-
ity is obsvious. Since D(A) ⊂ H2(0, 1)×R2, the operator A has compact resolvent
and the claim follows.
Taking into account the above corollary, it follows that A generates a contraction
semigroup (etA)t≥0 on X that is analytic of angle π/2 and self-adjoint. Finally, by
[9, Corollary XIX.6.3] we obtain that etA is compact for all t > 0.
Thus we have just proved Proposition 4.
Remark 1. By the Spectral Theorem [9, Chapter XIX, Corollary 6.3] it follows
that there exists an orthonormal basis {en}n∈N of X and a sequence {λn}n∈N of real
negative numbers λn ≤ 0, such that en ∈ D(A), Aen = λnen and lim
λn = −∞.
Moreover, A is given by
λn(u, en)en, u ∈ D(A)
etAu =
eλnt(u, en)en, u ∈ X.
2.1 Spectral properties of the matrix operator
We shall now apply Theorem 2.5 in Engel[6] in order to describe the spectrum of
A. According to that result
σ(A) ⊆ σ(A0) ∪ σ(B) ∪ S (9)
where
S = {λ ∈ ρ(A0) ∩ ρ(B) : Det(F (λ)) = 0}. (10)
The matrix F (λ) is defined as
F (λ) = I − (λ−B)LλKλR(λ,B)
where the operators Lλ and Kλ are given by
Lλ = −BR(λ,B)R(0, B)C, Kλ = −A0R(λ,A0)DA,L0 .
Notice that the matrix F (λ) can also be written as
F (λ) = I + CA0R(λ,A0)D
0 R(λ,B).
Remark 2. In case when the feedback operator matrix C is identically zero, the
above construction implies that S = ∅.
Control of stochastic differential equations with dynamical boundary conditions 9
Determining the set S
In the following, we construct explicitly the set S. The idea is to construct the
matrix F (λ) and compute its determinant.
We have to distinguish two cases. If λ < 0 we have
Det(F (λ)) = 1 +
−λcos(
λ+ b0
λ+ b1
(λ+ b0)(λ + b1)
We note that the equation Det(F (λ)) = 0 has infinite solutions {λj}j∈N and
every λj belongs to the interval (−π2(j + 1)2,−π2j2).
Each λj is eigenvalue of the operator A corresponding to the eigenfunction φj =
(ej(x), ej(0), ej(1)) where
ej(x) =
−λjBj
b0 + λj
−λjx+Bj sin
−λjx.
for a normalizing constant 0 < Bj <
If λ > 0 then
Det(F (λ)) = 1 +
1 + e2
− 1 + e2
b0 + λ
b1 + λ
(b0 + λ) (b1 + λ)
We note that Det(F (λ)) > 0 for every λ > 0. This means that there are not
elements λ strictly positive in S. Moreover the eigenvalues of A in S are all negative.
Remark 3. It is possible to verify directly with some computation that the eigen-
values of A are not eigenvalues of A.
Further, the same happens in general with the eigenvalues of B, except in case
b0 and b1 satisfy an explicit relation. In any case, also if b0 and b1 happen to belong
to σ(A), they are in a finite number and do not affect its behaviour.
Therefore, with no loss of generality, in the following we may and do assume
that all the eigenvalues of A are contained in S.
Theorem 8. In the above assumptions the semigroup etA is Hilbert-Schmidt, that
|etAφi|2L2(0,1)×R2 <∞ (11)
for any orthonormal basis {φi} of L2(0, 1)× R2.
Proof. In order to prove that the semigroup etA is Hilbert-Schmidt, it is enough
verify the (11) for an orthonormal basis. Let {φi} the orthonormal sequence of
eigenfunctions of the operator A described in Remark 1. Then
|etAφi|2L2(0,1)×R2 =
e2tλi
10 S. Bonaccorsi, F. Confortola, E. Mastrogiacomo
where λi are the eigenvalues of the operator A. By (9) it follows that
e2tλi ≤
i:λi∈σ(A)
e2tλi +
i: λi∈σ(B)
e2tλi +
i:λi∈S
e2tλi .
But, by Remark 3 we have that
e2tλi ≤
i: λi∈σ(B)
e2tλi +
i:λi∈S
e2tλi
and the first of the last two series is a finite sum and the second one converges since
the eigenvalues λi in S are asymptotic to −π2i2.
3 The abstract problem
In this section we are concerned with problem (3): we introduce the relevant assump-
tions and we formulate the main existence and uniqueness result for its solution.
Let W = (W,V ) be the Wiener process taking values in = L2(0, 1) × R2. We
denote {Ft, t ∈ [0, T ]} the natural filtration of W, augmented with the family N
of P-null sets of FT :
Ft = σ(W(s) : s ∈ [0, t]) ∨N.
The filtration {Ft} satisfies the usual conditions.
Define F : [0, T ]× X → X for every u =
F(t,u) = F
F (t, u)
where F (t, u)(ξ) = f(t, ξ, u(ξ)).
Let G be the mapping [0, T ]×X → L(X,X) such that, for u =
and y =
in X,
G1(t, u) y
G2(t, v) η
where
(G1(t, u) y)(ξ) = g(t, ξ, u(ξ))y(ξ) and (G2(t, v) · η) = h(t) η;
we stress that h is a diagonal matrix.
Therefore, we are concerned with the following abstract problem
dut = Aut dt+ F(t,ut) dt+G(t,ut)dWt
ut0 = u0
on which we formulate the following assumptions.
Control of stochastic differential equations with dynamical boundary conditions 11
Assumption 9.
(i) f : [0, T ] × [0, 1] × R → R, is a measurable mapping, bounded and Lipschitz
continuous in the last component
|f(t, x, u)| ≤ K, |f(t, x, u)− f(t, x, v)| ≤ L|u− v|.
for every t ∈ [0, T ], x ∈ [0, 1], u, v ∈ R.
(ii) g : [0, T ]× [0, 1]× R → R, is a measurable mapping such that
|g(t, x, u)| ≤ K, |g(t, x, u)− g(t, x, v)| ≤ L|u− v|
for every t ∈ [0, T ], x ∈ [0, 1], u, v ∈ R.
(iii) h : [0, T ] →M(2, 2) is a bounded measurable mapping verifying |h(t)| ≤ K for
every t ∈ [0, T ].
The existence and uniqueness of the solution to (12) is a standard result in the
literature, see for instance the monograph [3]. In order to apply the known results,
we shall verify that the nonlinear coefficients F and G satisfy suitable Lipschitz
continuous conditions. That will be enough to prove the existence of a mild solution
which is a process ut adapted to the filtration Ft satisfying the following integral
equation
ut = e
e(t−s)AF(s,us) ds+
e(t−s)AG(s,us) dWs. (13)
Proposition 10. Under Assumptions 9(i)–(iii), the following hold:
1. the mapping F : X → X is measurable and satisfies, for some constant L > 0,
|F(t,u)− F(t,v)|X ≤ L|u− v|X u,v ∈ X.
2. G is a mapping [0, T ]× X → L(X) such that
a. for every v ∈ X the map G(·, ·)v : [0, T ]× X → X is measurable,
b. esAG(t,u) ∈ L2(X) for every s > 0, t ∈ [0, T ] and u ∈ X, and
c. for every s > 0, t ∈ [0, T ] and u.v ∈ X we have
|esAG(t,u)|L2(X) ≤ L s
−1/4 (1 + |u|X), (14)
|esAG(t,u)− esAG(t,v)|L2(X) ≤ L s
−1/4|u− v|X, (15)
|G(t,u)|L(X) ≤ L (1 + |u|X), (16)
for a constant L > 0.
Proof. 1. We have, for u =
and v =
|F(t,u)− F(t,v)|X = |F (t, u)− F (t, v)|X ≤ L|u− v|X ≤ L|u− v|X.
12 S. Bonaccorsi, F. Confortola, E. Mastrogiacomo
2. Condition (16) follows from the definition of G and the Assumptions 9 (ii)-(iii)
on g and h.
Now we prove condition (14). Let {φk}k∈N be an orthonormal basis in X.
|esAG(t,u)|2L2(X) =
| < esAG(t,u)φj , φk > |2X
| < G(t,u)φj , esAφk > |2X
≤ |G(t,u)|2L(X) |esA|2L2(X) ≤ L
2(1 + |u|2
)|esA|2L2(X).
Using Theorem 8,
|esA|2L2(X) ≈
e−2sn
where f(t) ≈ g(t) means that f(s)/g(s) = O(1) as s→ 0; this verifies (14).
In order to prove the last statement (15), we take the orthonormal basis
{φk}k∈N consisting of eigenvectors of A (see Remark 1). We recall that φk =
(ek(x), ek(0), ek(1)) where
ek(x) = Bk
b0 + λk
−λkx+Bk sin
−λkx.
We have
|esAG(t,u)− esAG(t,v)|2L2(X) =
| < esA[G(t,u)−G(t,v)]φj , φk > |2X
| < G(t,u)−G(t,v)φj , esAφk > |2X =
e2sλk |G(t,u)−G(t,v)φk|2.
But, for u =
and v =
, by the definition of the operator G, we have
|G(t,u)−G(t,v)φk|2X =
|g(t, x, u(x))− g(t, x, v(x))|2|ek(x)|2dx
K2|u(x)− v(x)|2dx ≤ K2|u− v|2
since the function g is Lipschitz and |ek(x)| ≤ Bk is uniformly bounded in k.
Consequently
|esAG(t,u)− esAG(t,v)|L2(X) ≤ {
e2tλk}1/2K|u− v|X
≤ |esA|L2(X)K|u− v|X
which concludes the proof.
Control of stochastic differential equations with dynamical boundary conditions 13
Proposition 11. Under the assumptions 9 for every p ∈ [2,∞) there exists a
unique process u ∈ Lp(Ω;C([0, T ];X)) solution of (12).
Proof. We can apply Theorem 5.3.1 in [4]. In fact by Proposition 4 the operator
A generates a strongly continuous semigroup {etA} of bounded linear operators in
the Hilbert space X. Moreover, for this theorem to apply we need to verify that
coefficients F and G satisfy conditions (14)—(16), which follows from Proposition
4 Stochastic control problem
After some preliminaries, in this section we are concerned with an abstract control
problem in infinite dimensions. We settle the problem in the framework of weak
control problems (see [7]).
We aim to control the evolution of the system by the boundary. This means
that we assume a boundary dynamic of the form:
∂tv(t) = bv(t)− ∂νu(t, ·) + h(t)[z(t) + V̇ (t)] (17)
where z(t) is the control process. We require that z ∈ L2(Ω× [0, T ];R2).
As in the previous section we can write the system
∂tu(t, x) = ∂
xu(t, x) + f(t, x, u(t, x)) + g(t, x, u(t, x))Ẇ (t, x)
∂tv(t) = bv(t)− ∂νu(t, ·) + h(t)[z(t) + V̇ (t)]
in the following abstract form
duzt = Au
t dt+ F(t,u
t ) dt+G(t,u
t )[Pzt dt+ dWt] ut0 = u0 (19)
where P : R2 → X is the immersion of the boundary space in the product space
X = X×R2. Equation (19), in the framework of stochastic optimal control problem,
is called the controlled state equation associated to an admissible control system.
We recall that, in general, fixed t0 ≥ 0 and u0 ∈ X, an admissible control system
(a.c.s) is given by (Ω,F, {Ft}t≥0,P, {Wt}t≥0, z) where
• (Ω,F,P) is a probability space,
• {Ft}t≥0 is a filtration in it, satisfying the usual conditions,
• {Wt}t≥0 is a Wiener process with values in X and adapted to the filtration
{Ft}t≥0,
• z is a process with values in a space K, predictable with respect to the fil-
tration {Ft}t≥0 and satisfies the constraint: z(t) ∈ Z, P-a.s., for almost every
t ∈ [t0, T ], where Z is a suitable domain of K.
14 S. Bonaccorsi, F. Confortola, E. Mastrogiacomo
In our case the space K coincide with R2.
To each a.c.s. we associate the mild solution uz of state equation the mild so-
lution uz ∈ C([t0, T ];L2(Ω;X)) of the state equation. We introduce the functional
J(t0, u0, z) = E
λ(s,uzs , zs)) ds+ Eφ(u
T ) (20)
We consider the problem of minimizing the functional J over all admissible
control systems (which is known in the literature as the weak formulation of the
control problem); any a.c.s. that minimize J -if it exsts- is called optimal for the
control problem.
We define in classical way the Hamiltonian function relative to the above problem
ψ : [0, T ]× X× X → R
setting
ψ(t,u,w) = inf
{λ(t,u, z)+ < w, P z >} (21)
and we define he following set
Γ(t,u,w) = {z ∈ Z : λ(t,u, z)+ < w, P z >= ψ(t,u, z)}
We consider the Hamilton-Jacobi-Bellman equation associated to the control
problem
∂v(t, x)
+ Lt[v(t, ·)](x) = ψ(t, x, v(t, x),G(t, x)∗∇xv(t, x)),
t ∈ [0, T ], x ∈ X,
v(T, x) = Φ(x).
where the operator Lt is defined by
Lt[φ](x) =
Trace
G(t, x)G(x)
∗∇2φ(x)
+ < Ax,∇φ(x) > .
Under suitable assumptions, if we let v denote the unique solution of (22) then we
have J(t, x, z) ≥ v(t, x) and the equality holds if and only if the following feedback
law is verified by z and uzσ:
z(σ) = Γ(σ,uzσ,G(σ,u
∗∇xv(σ,uzσ)).
Thus, we can characterize optimal controls by a feedback law.
This class of stochastic control problems, in infinite dimensional setting, has
been studied by Fuhrman and Tessitore [8] (We refer to Theorem 7.2 in that paper
for precise statements and additional results).
In order to characterize optimal controls by a feedback law we have to require
that the abstract operators F and G satisfy further regularity conditions.
We will prove that, under suitable assumptions on the functions f and g in the
problem (18), the abstract operators fit the required conditions.
Control of stochastic differential equations with dynamical boundary conditions 15
We impose that the operators F and G are Gâteaux differentiable. This notion
of differentiability is weaker than the differentiability in the Fréchet sense.
We recall that for a mapping F : X → V , where X and V denote Banach spaces,
the directional derivative at point x ∈ X in the direction h ∈ X is defined as
∇F (x;h) = lim
F (x+ sh)− F (x)
whenever the limit exists in the topology of V . F is called Gâteaux differentiable
at point x if it has directional derivative in every direction at point x and there
exists an element of L(X,V ), denoted ∇F (x) and called Gâteaux derivative, such
that ∇F (x;h) = ∇F (x)h for every h ∈ X .
Definition 12. We say that a mapping F : X → V belongs to the class G1(X ;V )
if it is continuous, Gâteaux differentiable on X, and ∇F : X → L(X,V ) is strongly
continuous.
The last requirement of the definition means that for every h ∈ X the map
∇F (·)h : X → V is continuous. Note that ∇F : X → L(X,V ) is not continuous
in general if L(X,V ) is endowed with the norm operator topology; clearly, if this
happens then F is Fréchet differentiable on X . Membership of a map in G1(X,V )
may be conveniently checked as shown in the following lemma.
Lemma 13. A map F : X → V belongs to G1(X,V ) provided the following condi-
tions hold:
i) the directional derivatives ∇F (x;h) exist at every point x ∈ X and in every
direction h ∈ X;
ii) for every h, the mapping ∇F (·;h) : X → V is continuous;
iii) for every x, the mapping h 7→ ∇F (x;h) is continuous from X to V .
When F depends on additional arguments, the previous definitions and proper-
ties have obvious generalizations.
The following assumptions are necessary in order to provide Gâteaux differen-
tiability for the coefficients of the abstract formulation.
Assumption 14. For a.a. t ∈ [0, T ], ξ ∈ [0, 1] the functions f(t, ξ, ·) and g(t, ξ, ·)
belong to the class C1(R).
Proposition 15. Under assumptions 9 and 14, for every s > 0, t ∈ [0, T ],
F(t, ·) ∈ G1(X,X), esAG(t, ·) ∈ G1(X, L2(X)).
Proof. The first statement is an immediate consequence of the fact that f(t, ξ, ·) ∈
C1(R,R). In order to prove that esAG(t, ·) belongs to the class G1(X, L2(X)) we use
the continuous differentiability of g and an argument similar to that used in the
proof of Proposition 10.
We note that, for u =
and v =
, the gradient operator∇
esAG(t,u)
is an Hilbert Schmidt operator that maps
7→ esA
gu(t, ·, u(·))w(·)v(·)
= esA (∇u(G(t,u)v)(w))
16 S. Bonaccorsi, F. Confortola, E. Mastrogiacomo
In fact, we have
esAG(t,u+ rv) − esAG(t,u)
−∇esAG(t,u)v
L2(X)
= lim
esAG(t,u+ rv) − esAG(t,u)
φj − esA (∇u(G(t,u)v)φj) , φk >
= lim
G(t,u+ rv) −G(t,u)
−∇uG(t,u)v
φj , e
sAφk >
= lim
e2sλk
G(t,u+ rv) −G(t,u)
−∇uG(t,u)v
= lim
e2sλk
g(t, u(ξ) + rv(ξ)) − g(t, u(ξ))
ek(ξ)− gu(t, u(ξ))v(ξ)ek(ξ)
≤ c lim
e2sλk
g(t, u(ξ) + rv(ξ)) − g(t, u(ξ))
− gu(t, u(ξ))v(ξ)
= c lim
e2sλk
gu(t, u(ξ) + αrv(ξ)) − gu(t, u(ξ))
dα v(ξ)
and, by dominated convergence, this limit is equal to zero. In similar way we can
prove the points (ii)− (iii) of Lemma 13 to obtain the thesis.
In order to prove the main result of this section we require the following hypoth-
esis.
Assumption 16.
(i) λ is measurable and for a.e. t ∈ [0, T ], for all u,u′ ∈ X, z ∈ Z
|λ(t,u, z)− λ(t,u′, z)| ≤ C|1 + u+ u′|m|u− u′|
|λ(t, 0, z)| ≤ C
for suitable C ∈ R+, m ∈ N;
(ii) Z is a Borel and bounded subset of R2;
(iii) Φ ∈ G1(X,R) and, for every σ ∈ [0, T ], ψ(σ, ·, ·) ∈ G1,1(X × X,R);
(iv) for every t ∈ [0, T ], u,w,h ∈ X
ψ(t,u,w)h| + |∇
φ(u)h| ≤ L|h|(1 + |u|)m;
(v) for all t ∈ [0, T ], for all u ∈ X and w ∈ X there exists a unique Γ(t,u,w) ∈ Z
that realizes the minimum in (21). Namely
λ(t,u,Γ(t,u,w))+ < w, PΓ(t,u,w) >= ψ(t,u,w)
Control of stochastic differential equations with dynamical boundary conditions 17
Theorem 17. Suppose that assumptions 9, 14 and 16 hold. For all a.c.s. we have
J(t0, u0, z) ≥ v(t0, u0) and the equality holds if and only if the following feedback
law is verified by z and uz:
z(σ) = Γ(σ,uzσ, G(σ,u
∗∇xv(σ,uzσ)), P− a.s. for a.a. σ ∈ [t0, T ]. (23)
Finally there exists at least an a.c.s. for which (23) holds. In such a system the
closed loop equation:
duτ = Auτ dτ +G(τ,uτ )PΓ(τ,uτ ,G(τ,uτ )
∗∇xv(τ,uτ )) dτ
+ F(τ,uτ ) dτ +G(τ,uτ ) dWτ , τ ∈ [t0, T ],
ut0 = u0 ∈ X.
admits a solution and if z(σ) = Γ(σ,uσ, G(σ,uσ)
∗∇xv(σ,uσ)) then the couple (z,u)
is optimal for the control problem.
Proof. By Proposition 4 we know that A generates a strongly continuous semigroup
of linear operators etA on X. The assumption 9 ensures that the statements in
Proposition 10 hold. Moreover the assumption 14 guarantees that the results in
Proposition 15 are true. Finally these conditions together with the assumption 16
allow us to apply Theorem 7.2 in [8] and to perform the synthesis of the optimal
control.
References
1. M. Bertini, D. Noja, A. Posilicano, Dynamics and Lax-Phillips scattering for gen-
eralized Lamb models, J. Phys. A: Math. Gen. 39 (2006), 15173–15195
2. Igor Chueshov, Björn Schmalfuss, Parabolic stochastic partial differential equations
with dynamical boundary conditions, Differential Integral Equations 17 (2004), no.
7-8, 751–780.
3. Giuseppe Da Prato, Jerzy Zabczyk, Stochastic equations in infinite dimen-
sions, Encyclopedia of Mathematics and its Applications, 44. Cambridge University
Press, Cambridge, 1992.
4. G. Da Prato, J. Zabczyk, Ergodicity for infinite-dimensional systems, Lon-
don Mathematical Society Lecture Notes Series, 229, Cambridge University Press,
1996.
5. A. Debussche, M. Fuhrman, G. Tessitore, Optimal Control of a Stochastic Heat
Equation with Boundary-noise and Boundary-control, to appear in ESAIM Con-
trol, Optimisation and Calculus of Variations.
6. K.-J. Engel, Spectral theory and generator property for one-sided coupled operator
matrices, Semigroup Forum 58 (1999), 267–295.
7. W. H. Fleming, H. M. Soner, Controlled Markov processes and viscosity
solutions, Springer-Verlag, 1993.
8. M. Fuhrman, G. Tessitore, Non linear Kolmogorov equations in infinite dimen-
sional spaces: the backward stochastic differential equations approach and appli-
cations to optimal control, Ann. Probab. 30 (2002), no. 3: 1397-1465.
9. Israel Gohberg, Seymour Goldberg, Marinus A. Kaashoek, Classes of linear oper-
ators, Vol. I, Birkhser Verlag, Basel, 1990. Operator Theory: Advances and Applica-
tions, 49.
18 S. Bonaccorsi, F. Confortola, E. Mastrogiacomo
10. Marjeta Kramar, Delio Mugnolo, Rainer Nagel, Semigroups for initial-boundary
value problems.In: Evolution equations: applications to physics, industry,
life sciences and economics (Levico Terme, 2000), 275–292, Progr. Nonlinear
Differential Equations Appl., 55, Birkhuser, Basel, 2003.
11. Bohdan Maslowski, Stability of semilinear equations with boundary and pointwise
noise, Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4) 22 (1995), no. 1, 55–93.
12. D. Mugnolo, Asymptotics of semigroups generated by operator matrices, Ulmer
seminare 10 (2005), 299–311.
13. El Maati Ouhabaz, Analysis of heat equations on domains, London Math-
ematical Society Monographs Series, 31. Princeton University Press, Princeton, NJ,
2005.
	Setting of the problem
	Generation properties
	Spectral properties of the matrix operator
	The abstract problem
	Stochastic control problem
ABSTRACT
  In this paper we investigate the optimal control problem for a class of
stochastic Cauchy evolution problem with non standard boundary dynamic and
control. The model is composed by an infinite dimensional dynamical system
coupled with a finite dimensional dynamics, which describes the boundary
conditions of the internal system. In other terms, we are concerned with non
standard boundary conditions, as the value at the boundary is governed by a
different stochastic differential equation.

<|endoftext|><|startoftext|>
Introduction 
Fractional calculus is a branch of mathematics that deal with a 
generalization of well-known operations of differentiations and 
integrations to arbitrary non-integer order, which can be real non-integer or 
even imaginary number. 
Nowadays physicists have used this powerful tool to deal with some 
problems which were not solvable in the classical sense. Therefore, the  
fractional calculus became one of the most powerful and widely useful 
tools in describing and explaining some physical complex systems. 
Recently, the Euler-Lagrange equations has been presented for 
unconstrained and constrained fractional variational problems  [1 and other 
references]. This technique enable us to solve some  problems including 
describing the behavior of non-conservative systems developed by Riewe 
[2], where he used the fractional derivative to construct the Lagrangian and 
Hamiltonian for non-conservative systems. 
From these reasons in  [3] was developed a general formula for the potential 
of any arbitrary force conservative or not conservative, which leads directly 
to the consideration of dissipative effect in Lagrangian and Hamiltonian 
formulation. Also, the canonical quantization of non-conservative systems 
has been carried out in [4].  
Starting from a Lagrangian containing a fractional derivative, the fractional 
Hamiltonian is achieved in [5]. In addition, the passage from Hamiltonian 
containing fractional derivatives to the fractional Hamilton-Jaccobi is 
achieved by Rabei et.al [6]. The equations of motion are obtained in a 
similar manner to the usual mechanics. 
All these outstanding results using the fractional derivative make us 
concentrate on another branch of quntam physics. WKB approximation [7, 
8, 9, 10,14]. In this paper we are mainly interested to construct the solution 
of Schroödinger equation in an exponential form (Griffith 1995) starting 
from fractional Hamilton-Jaccobi equation and how it leads naturally to this 
semi-classical approximation namely fractional WKB.  
     The purpose of this paper is to find the solution of Schrödinger equation  
for some systems that have a fractional behavior in their Lagrangians and 
obey the WKB approximation assumptions. 
The plan of this paper is as follows: In section II the derivation of 
generalized Hamilton-Jaccobi partial differential equation which given in 
[6] is briefly reviewed. In section III the fractional WKB approximation is 
derived. In Section IV some examples with the fractional WKB technique 
is reported. Section V is dedicated to conclusions. 
II. Basic Tools 
The left and right Reimann-Loville fractional derivative are defined as 
follows [3] 
The left Riemann-Liouville fractional derivative is given by 
αα dfx
xa ∫ −−−⎟⎠
= )()(
)( 1                         (1) 
The right Riemann-Liouville fractional derivative has the form 
ββ dfx
bx ∫ −−−⎟⎠
= )()(
)( 1                     (2) 
Here α, β are the order of derivation such that n-1≤α <n, n-1≤β<n, and they 
are  not zero. 
If α is an integer, these derivatives are defined in usual sense as 
)()( xf
xfDxa
⎛=                                                                    (3-a) 
)()( xf
xfDbx
⎛−=                                                                (3-b) 
                                                                         
Hamilton formalism with fractional derivative was proposed in [5] namely 
),,,(),,,( tqDqDqLqDpqDptppqH bttabtta
αβα −+=  ,               (4) 
where L represents the fractional Lagrangian obtained by replacing the 
classical derivatives with the corresponding fractional ones [5]. 
Hamilton's equations of motion are obtained as follows [5] 
; ;qDp
 ;qDp
β pDpD
btta +=∂
           (5) 
In  [6] based on the sequential derivatives the fractional Hamilton-Jacobi 
partial differential equation is obtained. The Hamilton-Jacobi function in 
configuration space is written in a similar manner to the usual mechanics 
by using the Reimann-Loville fractional derivative.  In [6] the following 
generating function  is used, where α 
and β are bigger or equal to 1. Thus, the new Hamiltonian is expressed as  
StPPqDqDFF btta ==
−− ),,,,( 112 βα
),,,('),,,( tQDQDQLQDPQDPtPPQK bttabtta
αβα −+=         (7)                      
It is concluded that, the following relation relates the two Hamiltonians 
KQDPQDPHqDpqDp bttabtta +−+=−+
α                    (8) 
According to reference [6] the function F is proposed as   
),,,,( 11 tPPqDqDSF btta βα
βα −−= QDPQDP Btta
11 −− −− ββ
α  ,                    (9) 
                                                           
The function S is called Hamilton's principle function. 
                    
       Therefore, requiring that the transformed Hamiltonian K  shall be zero 
the Hamilton-Jacobi equation is satisfied. In other words Q, Pα, Pβ are 
constants. 
H                                                                                          (10) 
Since Q, Pα, Pβ are constants, The Hamilton’s principle function is written 
),,,,( 21
11 tEEqDqDSS btta
−−= βα                                                            (11) 
where 
1EP =α               2EP =β
If the Hamiltonian is explicitly independent of time, then S can be written 
as follows 
                                 (12) ),,(),(),( 212
1 tEEfEqDWEqDWS btta ++=
−− βα
W represents the Hamilton's characteristic function; therefore, the 
following equations of motion are obtained in [6] as: 
αα                         qD
ββ                                 (13) 
11 λα =
QDta       2
21 λβ =
QDbt                            (14) 
Here λ1 ,λ2 are constants. 
 III. Fractional WKB approximation 
The outstanding result regarding the meaning of the state function ψ and its 
relationship to Hamilton's principle function S enables us to write the 
exponential solution of Schrödinger equation [13]. 
exp),(
tqψ                                   (15) 
                    
The phase of state function obeys the same mathematical equation, as does 
Hamilton's principle function S. The physical significance of S in classical 
mechanics is that it represents the generator of trajectories [12] for 
fractional systems; the fractional Hamilton's principle function is become 
the phase of the state function ψ. One can write the solution of Schrödinger 
equation under the postulated constrains by the WKB approximation and 
using the fractional Hamilton's principle function eq (12).  Thus we 
propose the fractional state function  as: 
⎛= −−−− tqDqDS
tqDqD bttabtta ,,exp),,(
1111 βαβαψ
       (16)                 
From the quantization using WKB approximation [7,8,9,10,14] a general 
solution of Schrödinger equation is obtained using the expansion for S and 
then using the transformation to the N-dimensional system as: 
exp)(
ψψ                                                      (17)                      
where 
iio qp
                                                                                (18) 
                                                                                       
In our case, S behaves like a 2-dimensional problem with two distinct 
momenta. Thus, 
                                                             (19) qDq ta
−≡ α αPP ˆ1≡
                                                                                (20) qDq bt
−≡ β βPP ˆ2 ≡
And the momenta are defined as operators. Therefore, we can propose the 
wave function ψ of the fractional system in the following form  
= −−−− tEEqDqDS
tqDqD bttabtta ,,,,exp
),,( 21
1111 βα
h       (21) 
and the momenta operators in the form 
−− qDi
ˆ,ˆ ββαα
                                     (22) 
We conclude that (21) is the solution of Schrödinger equation for any given 
fractional systems. If α and β both are equal to unity, then we will return to 
the usual classical solution of Schrödinger equation, also we can notice 
how the probability is inversely proportional to the momentum 
IV. Examples  
IV. a) Example 1: 
As a first model let us consider the following fractional Lagrangian, 
( ) ( )21
qDqDL tt
βα +=                                       (23) 
The fractional Hamilton-Jacobi equation for this fractional Lagrangian can 
be calculated as: 
( ) ( ) .0
1 22 =
PP βα                                          (24) 
where 
              qD
=   ;     qD
=    
Making use of equation (13), the fractional Hamilton-Jacobi equation (24) 
becomes: 
βα                            (25) 
Taking into account 
           t
−=                                                                (26) 
If we apply (26) on a wave function it gives: 
)( 21 EEEt
+−≡−=
                                               (27) 
By using the fact that E is the total energy of the system and taking into 
account (27) we obtain 
βα           (28) 
Thus, both sides of (28) should be zero, and we obtain  
qDEWqDEW tt
011 2,2
−− == βα               (29) 
By using  (12) and (21) we obtain 
−+= −−−− tEqDEqDE
tqDqD tttt
0 22exp
),,( βα
                                                                                                          (30) 
Which represents the wave function of the following Hamiltonian: 
( ) ( )22
βα PPH +=                                                          (31) 
Let us deal now with the momenta as operators of the form (22), and 
applying these operators on the wave function, one obtain the following 
momenta eigenvalues 
ψψψψ βα 21 2ˆ2ˆ EPEP ==                    (32) 
Then, 
                  21 2ˆ2ˆ EPEP == βα                   (33) 
It’s the same as the classical solution. Also, when applying the energy 
operator it gives the energy eigenvalues:  
( ) ( ) ψψψ βα 22 ˆ2
PPH +=         ( )ψ21 EE +=                               (34) 
as in the classical case. 
IVb.)Example 2: 
As a second example let us consider the following fractional Lagrangian 
( ) ( ) 210
qqDqDqDqDL tttt ++++=
                                     (35) 
The corresponding fractional Hamilton is calculated as follows 
( ) ( ) 222
qPPH −−+−= βα                                                              (36) 
Thus, the fractional Hamilton-Jacobi equation becomes 
( ) ( ) 0
1 222 =
+−−+−
qPP βα                                                       (37)                      
The fractional Hamilton's principle function is calculated as,  
( ) ( ) tEEqDEqDEqS tt )(1212 211121012 +−++++= −− βα                      (38) 
As a result the wave function can be written in the form 
( ) ( )( )⎟
−++++
tEqDEqDEq
tqDqD
1212exp
(39) 
   To identify the influence of the operators let us test the effect of the 
momenta 
−− qDi
         (40) 
Using the characteristic equations, it can be shown that  
12ˆ,12ˆ 21
2 +=++= EPEqP βα              (41) 
The result shown in (41 ) is  the same classical solution. When applying the 
energy operator it will give the energy eigenvalues  
( ) ( ) ψψψψ βα 222 2
qPPH −−+−=                                        (42) 
( ) ( ) ψψψψ βαβα 222 2
1ˆˆˆˆ
qPPPP −++−+=        
( ) ψ}
222222
EEqEq
−++++−
+++++=
                                                                   
Then we get 
        ψψ EH =                                                                (43) 
which is exactly the total energy as the case for the classical systems. 
V. Conclusions  
We use the generating function "S" of the Hamilton-Jaccobi equation in its 
fractional form to be the phase factor of the wave function describing some 
potentials valid for the assumptions suggested by the WKB approximation       
The proof of our results arises from the new proposed concepts of the 
momentum and energy operators, that they give the same eigenvalues 
producing the ordinary results achieved by the classical approach. 
Giving the same eigenvalues that means this form of fractional operator 
also eigen, valid, and useful in effecting on a state functions. 
References 
[1] Om P.Agrawal, Formulation of Euler-Lagrange equations for fractional 
variational problems, J.Math. Anal. Appl. 272,(2002),368-379.    
[2] F. Riewe, Non-conservative Lagrangian and Hamiltonian mechanics, 
Phys. Rev.E 53: (1996), 1890-1899. 
[3] Eqab M.Rabei, Tareq S.Alhalholy and Akram A. Rousan, Potential of 
arbitrary forces with fractional derivatives, International journal of 
modern physics A.19: 17&18July(2004), 3083-3092.  
[4]  Eqab M.Rabei, Abdul-wali Ajlouni, an Humam B.Ghassib. 
Quantization of Brownian motion. International Journal of theoretical 
physics, 45: (2005), 1613-1623. 
[5] Eqab M.Rabei and khaled I.Nawafleh, Raed S.Hijjawi, Sami I.Muslih, 
Dumitru Baleanu.The Hamilton Formalism with fractional derivatives, 
J. Math. Anal. Appl.: 327, (2007) ,891–897  
[6] Eqab M.Rabei, Bashar S. Ababneh, Hamilton-Jaccobi  
      fractional Sequential Mechanics , J. Math. Anal. Appl. (in press). 
[7] Eqab M. Rabei, Eyad H. Hasan, and Humam B. Ghassib, Hamilton-
Jaccobi Treatment of Constrained Systems with Second-Order 
Lagrangians, International Journal of Theoretical Physics, Vol. 43, No. 
4, April (2004), 1073-1096. 
[8] Eqab M.Rabei, Khaled I.Nawafleh, Y.S Abdelrahman, H.Y.R Omari. 
Hamilton-Jaccobi treatment of Lagrangians with linear velocities 
Modern physics letters A.18:(2003), 1591-1596 
[9] Eqab M.Rabei, Eyad H. Hasan, Humam B.Ghassib, S Muslih, 
Quantization of Second-Order constrained Lagrangians systems using 
the WKB approximation, International Journal of geometric methods 
in modern Physics 2: (2005)1-20. 
[10] Khaled I.Nawafleh, Eqab M. Rabei and H. B. Ghassib, Hamilton-
Jacobi Treatment of constrained systems, International Journal of 
Modern Physics A, 19: (2004), 347-354.  
[11] I. Podlubny, Fractional Differential Equations, Academic Press, New 
York, 1999. 
[12] H. Goldstein, Classical Mechanics, second ed., Addison Wesley, 1980. 
[13] David J Griffiths, Introduction to quantum mechanics, prentice hall, 
New Jersey (1995). 
[14] Eqab M. Rabei, K.I.Nawafleh and H.B.Ghassib, Quantization of 
Constrained Systems Using the WKB Approximation, Phys. Rev. A 
66, (2002), 24101. 
	Abstract
ABSTRACT
  Wentzel, Kramers, Brillouin (WKB) approximation for fractional systems is
investigated in this paper using the fractional calculus. In the fractional
case the wave function is constructed such that the phase factor is the same as
the Hamilton's principle function "S". To demonstrate our proposed approach two
examples are investigated in details.

<|endoftext|><|startoftext|>
Introduction
The Skyrme model, in its initial form, was proposed and developed by T.H.R. Skyrme in
a series of papers as a non-linear field theory of pions [2], [3]. Skyrme’s initial idea was to
think of baryons (in particular the nucleons) as secondary structures arising from a more
fundamental mesonic fluid. The key property of the model was that the baryons arose as
solitons in a topological manner and thus possessed a conserved topological charge identified
with the baryon number.
The lowest energy stable solutions of the model are termed Skyrmions and can be thought
of as baryonic solitons. The Skyrme model has been very successful in modelling the struc-
tures of various nuclei and has been shown by Witten et al. [4] to possess the general
features of a low energy effective field theory for QCD.
Some studies of the Skyrme model coupled to gravity have previously been undertaken
[1], [5], [6], mainly with the motivation of a comparison of its features with those of other non-
linear field theories coupled to gravity. Of particular note is the Einstein-Yang-Mills theory,
in which gravitationally bound configurations of non-abelian gauge fields are produced.
Other reasons for studying the Einstein-Skyrme model are cosmological and astrophysi-
cal ones. Various authors have studied black hole formation in the model, with the conclu-
sion that the so-called no-hair conjecture may not hold [7], [8].
The purpose of this paper is to study large baryon number Skyrmions or configurations
of Skyrmions in the Einstein-Skyrme model. In particular, we wish to investigate if stable
solitonic stars could exist within the model and to compare their properties to those of
neutron stars.
Preliminary studies of Skyrmion stars have predicted instability to single particle decay
[1]. However this was done using the hedgehog ansatz for baryon number larger than 1
which is known to lead to unstable solutions even for the usual Skyrme model. Since then,
it has been shown that the Skyrme model has stable shell-like solutions[9] which can be well
approximated by the so called rational map ansatz [10].
In this paper we use the rational map ansatz and its extension to multiple shells to
construct configurations in the Gravitating Skyrme model that have a very large number
of baryon. We show that those configurations, contrary to the hedgehog ansatz are bound
even for very large baryon numbers.
To construct configurations that have a baryon number comparable to that of neutron
star, we have to introduce a further approximation, which we call the ramp ansatz. We show
that this anstaz introduces further errors of only a few percent and we use it to compute
very large Skyrmion configurations.
The paper is organised as follow: first we outline the Einstein-Skyrme model and discuss
the main features of the results on static gravitating SU(2) hedgehogs obtained by Bizon
and Chmaj [1]. We then use the rational map ansatz to construct shell like gravitating
multibaryon configurations and show that for a fixed value of the coupling constant, the
configurations exist only when the baryon number is below a certain critical value. Finally
we introduce a ramp profile approximation to construct solutions with extremely high baryon
numbers. We show how accurate it is and use it to construct Skyrmion stars configuration.
2 The Einstein-Skyrme Model
The action for gravitating Skyrmions is formed from the standard Skyrme action for the
matter field and the Einstein-Hilbert action for the gravitational field.
LSk −
x. (1)
Here LSk is the Lagrangian density for the Skyrme model defined on the manifold M :
LSk =
Tr(∇µU∇µU−1) +
Tr[(∇µU)U−1, (∇νU)U−1]2, (2)
where U belongs to SU(2). As we eventually wish to study baryon stars, we take a spheri-
cally symmetric metric, such as associated with the line element
= −A2(r)
1− 2m(r)
1− 2m(r)
+ sin
), (3)
where A(r) and m(r) are two profile functions that must be determined by solving the
Einstein equations for the model. Our choice of ansatz is motivated by the fact that although
in some cases we will be studying non-spherical Skyrmion configurations, the regime we are
primarily interested in (i.e. Skyrmions of extremely high baryon number) will be shown to
admit quasi spherical solutions. Also, for realistic values of the couplings, the gravitational
interaction is small compared to the Skyrme interaction and thus the use of a spherical
metric even with non-spherical configurations, is not a great problem.
From (3), it can be shown that the Ricci scalar is
−A′′r2 − 2A′r + 2A′′rm+ A′m+ 3A′rm′ + Arm′′ + 2Am′
which, after integrating various terms by parts and noting that asymptotic flatness requires
both A(r) and m(r) to take a constant value at spatial infinity, reduces the gravitational
part of the action to
Sgr =
−m′(r)
. (5)
For what follows, it will be convenient to scale to dimensionless variables by defining x =
eFπr and µ(x) = eFπm(r)/2, resulting in one dimensionless coupling parameter for the
model, α = πF 2πG. We note that taking Fπ = 186Mev and G = 6.72 × 10−45Mev−2, then
the physical value of the coupling is α = 7.3× 10−40.
As the Skyrme field is an SU(2) valued scalar field, at any given time one can think of it
as a map from R3 to the SU(2) manifold. Finite energy considerations impose that the field
at spatial infinity should map to the same point on SU(2), say the identity. Thus, one can
simply think of the Skyrme field as a map between three-spheres. All such maps fall into
disjoint homotopy classes characterised by their winding number. This winding number is
a conserved topological charge because no continuous deformation of the field and thus no
time evolution, can allow transitions between homotopy classes. It is this topological charge
that is interpreted as the baryon number.
3 Gravitating Hedgehog Skyrmions
Gravitating Skyrmions were first studied by Bizon and Chmaj[1] who analysed the properties
of static spherically symmetric gravitating SU(2) skyrmions. Taking the Hedgehog Ansatz
for the Skyrme field
U = exp(i−→σ .r̂F (r)) (6)
subject to the boundary conditions
F (r = 0) = Bπ (7)
F (r = ∞) = 0 (8)
where B is the Baryon number associated with the Skyrmion configuration, they derived
the Euler-Lagrange equation for the profiles F (r), (A(r) and m(r) and found that the
model admit two branches of global solitonic solutions at each given baryon number, which
annihilate at a critical value of the coupling parameter. Above αcrit no further solutions
were found. In particular the value of the critical coupling decreased quite considerably
with increasing baryon number as αcrit ≈ 0.040378/B2 . It appears that the existence of a
critical coupling does not signal the collapse of a Skyrmion to form a black hole. In fact
the metric factor S(x) = (1 − 2µ(x)
) is non-zero at αcrit; there simply ceases to be any
stationary points of the action above the critical coupling.
The major problem with the ansatz (7) is that it leads to unstable solutions, i.e. for any
given value of α, MADM (B = N) > NMADM (B = 1). This is actually the case for the pure
Skyrme model as well where the hedgehog anstaz (7) with B > 1 does not correspond to the
lowest energy solution for the model. The solutions of the pure Skyrme model when B > 1
are known not to be spherically symmetric[11] but are stable i.e. E(B = N) < N ∗E(B = 1).
It was actually shown by Houghton et al [10], [12] that the multi-baryon solutions of
the pure Skyrme model can be well approximated by the so called rational maps ansatz
which is a generalisation of the hedgehog ansatz. While not radially symmetric, the ansatz
separates its radial and angular dependence through a profile function and a rational map
respectively.
In the following sections we will generalise the construction of Houghton et al to approx-
imate the solution of the Einstein-Skyrme model.
4 The Rational Map Ansatz
The rational map ansatz introduced by Houghton et al.[10] works by decomposing the
field into angular and radial parts. Using the polar coordinates in R3 and defining the
stereographic coordinates z = tan(θ/2) expiφ the ansatz reads [10]
U = exp (i~σ · n̂RF (r, t)) (9)
where
n̂R =
1 + |R|2
2ℜ(R), 2ℑ(R), 1− |R|2
is a unit vector where R is a rational function of z.
It can be shown that the baryon number for Skyrmions constructed in this way, is equal
to the degree of the rational map providing we take the boundary conditions
F (r = 0) = π
F (r = ∞) = 0. (11)
Substituting the ansatz (9) into the action for the model and scaling to dimensionless
variables as earlier, we obtain the following reduced Hamiltonian
16πFπ
S(x)F (x)
+ Bsin2F (x)(1 + S(x)F (x)′2)
Isin4F (x)
where
S(x) = 1− 2µ(x)
From which one obtains the following field equations
S(x)x
F (x)
+ B sin2 F (x) + S(x)BF (x)′2 sin2 F (x) + I sin
4 F (x)
F (x)
S(x)V (x)
sin 2F (x)
B + S(x)BF (x)′2 + I sin
2 F (x)
− αS(x)F (x)
′3V (x)2
− S(x)′F (x)′V (x)− S(x)F (x)′V (x)′
= αA(x)F (x)
x+ 2B sin
2 F (x)
where, for convenience, we have defined V (x) as
V (x) = x
+ 2B sin2 F (x). (17)
B is the baryon number and
I = 1
1 + |z|2
1 + |R|2
2idzdz
(1 + |z|2)2
Its value depends on the chosen rational map R. To compute low energy configurations for
a given baryon charge B one must find the rational map R or degree B that minimize I.
This has been done in [10] and [11] for several values of B. Moreover when b is large, one
can use the approximation[11] I ≈ 1.28B2 . The value of I so obtained is then used as a
parameter and one can solve equations (14) - (16) for the radial profiles F (x), A(x) and
µ(x).
We should point out here that for the pure Skyrme model the rational map ansatz
produce very good approximation to the multi skyrmion solutions [10]: the energies are only
3 or 4 percent higher and the energy densities exhibit the same symmetries and differ by
very little. All the solutions computed by Battye and Sutcliffe[11], when B is not too small,
have somehow the shape of a hollow shell. The baryon density is very small everywhere
outside the shell, while on the shell itself, it forms a lattice of hexagons and pentagons.
5 Hollow Skyrmion Shells
Using the rational map ansatz, we will now solves the field equations (14) - (16) to compute
some low action configurations. These solutions will correspond, initially, to a hollow shell
of Skyrmions similar to the configuration obtained with the rational map anstaz for the pure
Skyrme model. In the following sections we will show how our ansatz can be generalised to
allow for more realistic configuration made out of embedded shells.
The first thing to note about our solutions is that we again obtain two branches of
solutions at each baryon number (Fig. 1). Obtaining this same qualitative behaviour is not
surprising when one considers that the B = 1 rational map Skyrmion reproduces the usual
B = 1 hedgehog. However, the behaviour of the critical coupling itself is drastically altered
for the rational map generated configurations. Namely, we observe that it decreases as
approximately 0.040378/B
2 (Fig. 2). In particular this means that for a given value of the
coupling, the rational map generated skyrmions can possess a much higher topological charge
than their hedgehog counterparts, before there ceases to be any solutions. Quantitatively
if Bhedgehog is the maximum baryon number for which hedgehog solutions can be found at
a given value of the coupling, then the highest baryon number rational map solution found
at the same value of α will be approximately B4hedgehog . Again we observe that the metric
function S(x) is non-zero at the critical coupling for all the solutions we have found and as
such a horizon has not formed.
In Table 1 we present the radius, ADM mass per baryon and minimum value of the
metric function, S(x), for configurations up to the maximum baryon number allowed at
α = 1× 10−6. These values were obtained by direct numerical solution of equations (14) -
(16), where we have used the boundary data as specified in (11).
We didn’t didn’t use the physical value of α (7.3×10−40) because for this value, the ratio
between the width of the shell and its radius is so small when we reach the maximum value
of B that it becomes very difficult to solve the equation reliably. The value α = 1 × 10−6
is small enough to allow for a shell with a large baryon number to exist but large enough
to make it possible to compute these solution nears the critical value of B for a single shell
configuration.
 0  0.005  0.01  0.015  0.02  0.025  0.03  0.035
Figure 1: Plot of the two branches of solutions found for B=2 configurations generated with the
rational map ansatz.
The major difference between these configuration and the solutions of Bizon and Chmadj
is that the rational map ansatz configurations become more bound when the baryon number
increases This suggests the possibility that giant gravitating Skyrmions can be bound and
consequently, that the Skyrme model can be used to study baryon stars.
Another interesting feature of the data is the observed change in the radius of the solu-
tions with increasing baryon number. We note that the radius grows as approximately B
However there are two main deviations from this. Firstly, the constant of proportionality
relating the radius to the square root of the baryon number decreases slightly but persis-
tently as we increase the baryon number, indicating the gravitational interaction becoming
more important as the number of baryons increases.
As we approach the maximum baryon charge that can exist at α = 1 × 10−6, we also
notice that the radius of the skyrmion actually decreases as we add more baryons. This
shows that the gravitation pull plays a crucial role near the critical value of the skyrmion.
This is a tantalising property when one considers that generally a neutron star’s radius must
decrease for an increase in mass in order to achieve sufficient degenerate neutron pressure
 0.005
 0.01
 0.015
 0.02
 0.025
 0.03
 0.035
 0.04
 0.045
 0.05
 2  4  6  8  10  12  14  16  18  20
Figure 2: Plot of the decrease in αcrit with increasing baryon number, for configurations generated
with the rational map ansatz. +: αcr for the minimum value of I; curve: αcr = 0.0404B
−1/2.
to support the star.
To motivate the further approximation that we will introduce in the next section, we
now look at the profiles of the configuration that we have computed. First of all, we observe
that the profile function F (x) stays approximately at its boundary value, π, for a finite
radial distance before decreasing monotonically over some small region and finally attaining
its second boundary value, 0. A similar behaviour is seen for both the mass field µ(x)
and the metric field A(x) (see Fig. 3). Furthermore, as we increase the baryon number
the structure becomes more pronounced, with the distance before the fields change (shell
radius) increasing significantly, whilst the distance over which the fields change (shell width)
settles to a constant size. We conclude that at large baryon numbers, those configurations
correspond to hollow shells where the baryons are distributed on a tight lattice over the
shell. As such the, structures are nearly spherical, validating our choice of radial metric.
Such structures immediately pose an interesting question. Can the gravitating Skyrmions
exist as shells with more than one layer? To investigate this we note that it is possible to
B R( 2
) MADM (
2etopconv
) Smin
1 0.8763 1.2315 1.0000
4 1.7728 1.1365 1.0000
8 2.5065 1.1180 1.0000
100 8.6829 1.0845 0.9999
500 19.3994 1.0827 0.9998
1× 103 27.4314 1.0825 0.9997
1× 104 86.7192 1.0821 0.9989
1× 105 274.0397 1.0814 0.9963
1× 106 864.6968 1.0792 0.9883
1× 107 2715.0729 1.0722 0.9628
1× 108 8377.4601 1.0500 0.88192
1× 109 23585.5315 0.9743 0.6107
1.5× 109 26860.2040 0.9463 0.5020
1.8× 109 27470.2449 0.9302 0.4256
1.81× 109 27456.5804 0.9296 0.4225
1.85× 109 27357.9201 0.9274 0.4090
1.9× 109 27078.6014 0.9246 0.3886
1.95× 109 26126.5508 0.9217 0.3517
1.951× 109 26050.7695 0.9217 0.3495
1.952× 109 25937.4210 0.9216 0.3463
Table 1: Properties of the one shell low energy configuration for α = 1× 10−6
 1210  1215  1220  1225  1230  1235
 1210  1215  1220  1225  1230  1235
 0.99
 0.995
 1.005
 1.01
 1210  1215  1220  1225  1230  1235
Figure 3: Numerical solutions for the profiles F (x), µ(x) and A(x) when B = 2 × 106 and
α = 1× 10−6.
modify the boundary condition (11) to read
F (r = 0) = Nπ (19)
F (r = ∞) = 0 (20)
whilst still ensuring that the Skyrme field is well defined at the origin. This idea was first
used in[12] to construct two shell configurations for the pure Skyrme model.
The baryon charge is now N times the degree of the rational map. Fig. 4 shows the
structure of the solutions we find in this case when N = 2. They suggest that the Skyrmion
now exists as a N-layered structure. This is exhibited in the form of the profile, mass and
metric functions which interpolate between the boundary values in N distinct steps of equal
 1200  1205  1210  1215  1220  1225  1230
 1200  1205  1210  1215  1220  1225  1230
 0.995
 1.005
 1.01
 1.015
 1.02
 1200  1205  1210  1215  1220  1225  1230
Figure 4: Numerical solutions for for the profiles F (x), µ(x) and A(x) for 2 layers configurations
(F (0) = 2π) when B = 2× 106 and α = 1× 10−6.
size stacked next to each other.
We can therefore think of this as a naive way of constructing a gravitating Skyrmion.
Instead of using the boundary conditions as in (11) and a rational map of degree B we
consider constructing the B-Skyrmion using a rational map of degree B/N (with the asso-
ciated value of I) and the boundary condition (20). This is a crude construction as we are
effectively considering N adjacent shells of baryons, all with the same baryon number. We
might realistically expect that the baryon number per shell and distribution of shells may
vary significantly for the minimum energy configuration. Nevertheless we shall study the
properties of such structures. In fact, in the case where the baryon number is large and the
number of shells is small, we expect this crude construction to be quite valid. That is, we
do not expect the baryon number to change significantly over the few shells at large radius.
B R( 2
) MADM (
2etopconv
) Smin
4 1.2898 1.6179 1.0000
8 1.7858 1.4072 1.0000
100 6.1754 1.1363 0.9999
1× 103 19.4157 1.0913 0.9996
1× 104 61.3207 1.0833 0.9985
1× 105 193.7006 1.0812 0.9949
1× 106 610.6271 1.0779 0.9835
1× 107 1911.3704 1.0680 0.9475
1× 108 5825.2626 1.0362 0.8325
9.0× 108 13736.9982 0.9302 0.4258
9.7× 108 13263.0853 0.9224 0.3644
9.76× 108 12998.2817 0.9217 0.3480
9.764× 108 12931.5189 0.9216 0.3444
9.7647× 108 12895.6984 0.9216 0.3425
9.76472× 108 12891.4247 0.9216 0.3423
9.764724× 108 12889.8645 0.9216 0.3422
Table 2: Table of properties of double layer solutions obtained numerically at α = 1× 10−6
For the remainder of this section we will restrict ourselves to the case where N = 2.
Table 2 summarises the properties of double layered gravitating skyrmions up to the
maximum baryon charge allowed at α = 1 × 10−6. Briefly, we note the main features.
Firstly, for all baryon numbers, the radius of the double layered solutions is significantly
less than their single layered counterparts. This is not surprising as the baryon charge exists
over a thicker region and so the mean radius can decrease with the baryon density remaining
the same. Secondly, when B is large enough, i.e. when the double layer starts to make sense,
the double layer solutions are energetically favourable when one compares the ADM mass
with the single layer solutions. Finally we note that the maximum baryon number allowed
(at the given coupling) is almost twice as much in the case of the single layer skyrmions.
Of course the results of this section are not really the main regime of interest. We clearly
need to study configurations of extremely high baryon number (of order 1058) relevant for
baryon stars.
We will now discuss this high baryon number regime.
6 The Ramp-profile Approximation
Unfortunately, at very high baryon numbers, eqns. ((14) - (16)) become difficult to handle
numerically. This is largely because the radius of the solutions becomes much larger than
the distance over which the fields change. That is, we need to integrate over a region which
is much less than 10−16radius, and so even double precision data types have insufficient
precision.
Moreover, single shell configurations are not physically relevant and multiple shells will
only yield configuration that looks like a star if the number of layers is very large, typically
well over 1017. With such a large number of layers we won’t be able to solve the equation nu-
merically as we will need at least 10 times as many sampling points for the profile functions.
We must thus resort to another level of approximation: approximate the profile functions
by profiles that are piecewise linear. This is inspired by the work of Kopeliovich [13] [14]
except that our ansatz has to be piecewise linear to be able to generate configurations with
a huge number of layers. After defining the ansatz for an arbitrary number of layers, we
will show that for a single layer configuration the ansatz produces configurations that are
in good agreements with the rational map ansatz configuration. Then we will use the new
ansatz to construct configurations that are made out of a very large no of layers.
We have shown, in the previous section, that one can construct shell like structures
with very large Baryon numbers. At large baryon numbers, the Skyrmions resemble shell
like structures. That is, the fields are constant nearly everywhere except in a small region
corresponding to the shell. In that region, the profile look like linear functions smoothly
linked to the constant parts at the edges (cfr. Fig 3 ). Motivated by this we approximate
the fields by the ramp-functions
F (x) =
− (x− x0)
, (x0 −NW/2) ≤ x ≤ (x0 +NW/2) (21)
µ(x) =
+ (x− x0)
, (x0 −NW/2) ≤ x ≤ (x0 +NW/2) (22)
A(x) =
(1 + A0)
+ (x− x0)
(1−A0)
, (x0 −NW/2) ≤ x ≤ (x0 +NW/2) (23)
In the above there are four free parameters, namely the central radius x0 of the shell
over which the fields change, the width of the shell W , the mass field at spatial infinity M
and the value of the metric field at the origin A0 such that limx→∞ = 0. N is the number
of layers we wish to study and, as such, is treated as an input parameter.
The picture is of a gravitating skyrmion with very high baryon number existing as N
thin layers or shell of small thickness.
The above ansatz, allow us to find an approximation to the integrated energy. To do
this we use the fact that the shell width is much smaller than the radius at large baryon
numbers. In particular to evaluate the action integral we can approximate expressions of
the type
G(x) sinp F (x) for any function G(x) that varies very little over the width of the
shell by
G(x0) sin
p F (x). We then use the fact that
Z x0+NW/2
x0−NW/2
F (x) =
y dy. (24)
This leads to the following expression for the energy:
E = −16πFπ
1 +A0
1 + A0
0 −Mx0 +
1− A0
W 2x0
− MWx0
1 + A0
−W − π
− 3IW
16x20
1 + A0
To find the configurations which minimize this energy we first minimised it with respect to
A0 and M algebraically in order to find an expression for the energy as a function of the
width and radius only. Then we minimised this numerically using Mathematica. We will
now discuss the features of these configurations.
First of all, we must compare the results obtained with the ramp-profile when N = 1
and compare them to the result obtained with the full profile. Tables. 3 and 4 show
the properties of solutions we obtained using the ramp-profile approximation, again at
α = 1 × 10−6. All the general features of the full numerical solutions are reproduced. In
particular, the approximate B
2 scaling and then decrease of the radius, the decreasing ADM
mass and the differences between the double and single layer solutions are all exhibited by
the data obtained using the ramp-profile approximation.
Quantitatively though, there are some differences. The approximation allows a signif-
icant increase in the maximum allowed baryon charge. Also, the radius of configurations
obtained using the approximation, tend to be smaller than those obtained numerically. If
we concentrate on the baryon numbers greater than 105 so as to ensure our approximation,
B R( 2
) W MADM (
) Smin
100 8.3063 3.1286 1.1023 0.9999
500 18.6031 3.1386 1.1160 0.9997
1× 103 26.313 3.1397 1.1195 0.9996
1× 104 83.206 3.1396 1.1254 0.9987
1× 105 262.94 3.1357 1.1266 0.9960
1× 106 829.60 3.1230 1.1251 0.9872
1× 107 2604.2 3.0825 1.1186 0.9595
1× 108 8032.8 2.9512 1.0972 0.8713
1× 109 22899 2.4837 1.0272 0.5772
2× 109 29121 2.1092 0.9818 0.3645
2.8× 109 29098 1.6623 0.9505 0.1380
2.83× 109 28514 1.6066 0.9495 0.1119
2.839× 109 28024 1.5671 0.94922 0.09373
2.8397× 109 27869.3 1.5556 0.94924 0.08845
2.83975× 109 27869.8 1.5524 0.94925 0.08699
2.839752× 109 27822 1.5521 0.94925 0.08687
Table 3: Table of properties of the single layer step ansatz configurations for varying the baryon
number at fixed α = 1× 10−6
B R( 2
) W MADM (
) Smin
100 5.7924 3.0428 1.0692 0.9999
1× 103 18.5788 3.1305 1.1047 0.9995
1× 104 58.8202 3.1380 1.1201 0.9983
1× 105 185.8420 3.1332 1.1246 0.9944
1× 106 585.7950 3.1153 1.1233 0.9820
1× 107 1833.0500 3.0578 1.1143 0.9428
1× 108 5587.3600 2.8688 1.0840 0.8172
9× 108 14147.1782 2.1859 0.9900 0.4065
9.764724× 108 14472.3851 2.1276 0.9837 0.3746
1× 109 14560.5000 2.1092 0.9818 0.3646
1.4× 109 14549.0000 1.6623 0.9505 0.1381
1.41963× 109 13994.0523 1.5644 0.9492 0.0926
1.419635134× 109 13993.2000 1.5643 0.9492 0.0925
Table 4: Table of properties of the double layer step ansatz configurations for varying the baryon
number at fixed α = 1× 10−6
that the width is much smaller than the radius, is valid, then at worst we find a discrepancy
in the ADM mass of 11% and in the radius of 7%.
In general then, the data seems to confirm the reliability of the ramp-profile approxi-
mation. In fact the approach will be even more reliable at the extremely high values of the
baryon number that we are interested in. This is because the radius of solutions is of orders
of magnitudes greater than the width in such a regime, consistent with the approximations
we have made.
Moreover, whilst searching for minima of the energy does not allow us to probe both
branches of solutions, it does allow us to locate the value of αcrit. We again obtain the
approximate trend αcrit ∝ B−
2 , for large B.
Now in order to say anything about the possibility of baryon stars in the Skyrme model
we need to be able to verify that the decrease in the ADM mass per baryon we observed
at low and moderate baryon numbers, extends to baryon numbers of order 1058 for α =
7.3× 10−40.
Table. 5 summarizes our solutions in such a regime. Firstly we consider constructing
a single layer self-gravitating Skyrmion with these values. We do indeed see that the con-
figuration is bound. This is verified by checking that the ADM mass is lower (even at this
significantly lower value of α) than for the B = 1 hedgehog. So the possibility of baryon
stars in the Einstein-Skyrme model cannot be ruled out on the grounds of energy.
The Skyrmion exists as a giant thin shell, and the large baryon charge is distributed as
a tight lattice over this. However a hollow shell is clearly not a realistic construction for a
neutron star. This fact manifests itself in the extremely high radius of the configuration.
Transferring to standard units, the single layer B = 1058 gravitating Skyrmion has a radius
of 2.42× 1010km !
To address this issue, we can use a large number of layered Skyrmions as discussed
earlier. This has several benefits. Firstly, as we are distributing the baryon number through
a larger volume, then at a given baryon density the necessary radius can decrease. Similar
to what we see in the double layer results. On top of this, we expect the radius to decrease
further due to extra gravitational compression, as the outer layers of the Skyrmions feel
the attraction of inner layers. Finally, the many layer approach is also a more realistic
construction of a solid baryon star.
The results for using more and more layers in the construction (for fixed B and α), are
also presented in Fig. 5. We note that not only does the radius decrease significantly, but
the added gravitational binding further improves the energies of the configurations, reflected
in the low ADM masses obtained. There appears to be a critical number of layers that can
be used before there ceases to be any solutions and although the value of Smin is close to
zero at this point, the star still has not collapsed to form a black hole. Finally, we note that
the radius of the Skyrmion at the critical number of layers is approximately 20.91km. This
is comparable to a real neutron star, with a typical radius of 10km.
We reemphasise here that our approach to embedding shells of baryons is quite crude.
For few shells and large baryon number, we might reasonably believe that baryon number
does not chance significantly from one shell to the next. However, when we embed many
shells we should really consider that the baryon number of the inner most shells would likely
be significantly less than the that of the outer shells. Nevertheless, our naive embedding has
produced some interesting properties. In a future work we hope to improve our multi-layer
construction to obtain a more realistic description of a baryon star.
NShell R(
) W MADM/(6π
2B) Smin
1× 102 8.3236× 1027 3.1416 1.1285 1.0000
1× 103 2.6321× 1027 3.1416 1.1285 1.0000
1× 104 8.3236× 1026 3.1416 1.1285 1.0000
1× 105 2.6321× 1026 3.1416 1.1285 1.0000
1× 106 8.3236× 1025 3.1416 1.1285 1.0000
1× 107 2.6321× 1025 3.1416 1.1285 1.0000
1× 108 8.3236× 1024 3.1416 1.1285 1.0000
1× 109 2.6321× 1024 3.1415 1.1285 1.0000
1× 1010 8.3234× 1023 3.1415 1.1285 0.9999
1× 1011 2.6319× 1023 3.1412 1.1285 0.9997
1× 1012 8.3216× 1022 3.1402 1.1283 0.9991
1× 1013 2.6301× 1022 3.1373 1.1278 0.9971
1× 1014 8.3034× 1021 3.1280 1.1263 0.9907
1× 1015 2.6118× 1021 3.0986 1.1213 0.9705
1× 1016 8.1147× 1020 3.0037 1.1057 0.9063
1× 1017 2.4001× 1020 2.6810 1.0552 0.6977
5× 1017 8.2066× 1019 1.7888 0.9552 0.2036
5.3× 1017 7.4172× 1019 1.6227 0.9491 0.1247
5.33× 1017 7.1871× 1019 1.5625 0.94866 0.0971
5.3306× 1017 7.1597× 1019 1.5549 0.94868 0.0936
5.33065× 1017 7.1525× 1019 1.5528 0.948694 0.0927
5.330657× 1017 7.1506× 1019 1.5523 0.948692 0.0924
Table 5: Table of properties of the step ansatz configurations for varying the number of embedded
shells at fixed B = 1058 and α = 7.3× 10−40.
7 Conclusions
Previous work on the Einstein-Skyrme model highlighted a considerable problem with using
the Skyrmions as a model for baryon stars. Namely, multibaryon hedgehog Skyrmions were
simply not energetically favourable states. We have shown that this is simply a consequence
of a poor ansatz for the true Skyrmion and, having used the more appropriate rational map
ansatz, we have generated energetically favourable configurations of multibaryons.
We also observe the interesting property that near the critical coupling, the Skyrmions
can decrease in radius as we add more baryons. This hints towards the similar behaviour
exhibited by real neutron stars.
Although the rational map ansatz does not have an exact radial symmetry, at large scale
it does. The anisotropy only appears at the nucleon scale.
Finally, since we started with the motivation of studying baryon stars within the Skyrme
model, it is interesting to compare the features of our configurations with those of neutron
stars. For realistic values, B = 1058 and α = 7.3 × 10−40 we find a minimal energy single
layer configuration with radius=2.42 × 1010km. This is clearly too large for a neutron star
(which is of order 10km. in radius). This is to be expected however due to the shell model
we have taken. Firstly, as we are distributing the baryons over the surface area rather
than throughout the volume of the star we naturally must require a much larger star for
a given baryon number. This effect is two-fold in that if we were distributing the baryons
throughout the volume, outer layers would feel the attraction of inner layers and enhanced
radial compression would occur. The loss of such an effect is pronounced when we are
considering realistically small values of the coupling.
It seems therefore that the way to construct baryon stars in the Skyrme model is to
consider embedding shells of baryons within shells. This gives rise to more appropriate
specifications for the star and is also more realistic. We do indeed observe such improvements
for a many layered configuration. In fact the radius of B = 1058 gravitating Skyrmion (at
realistic α), can be decreased in this manner to approximately 20.91km.
We note however that this approach to shell embedding has only be done naively thus far.
We have only considered the case where the baryon number is equal for each shell. We really
should allow the baryon number(and hence the rational map quantities) to vary over the
shells. One approach towards this would be to assume that the baryon density is a constant
over the shells. An even better approach would be to allow this to be a smoothly varying
function that must be determined by minimising the energy. This will give a more realistic
description of baryon stars within the Einstein-Skyrme model, as traditional descriptions
of neutron stars also involve many strata, of differing neutron density. We are currently
investigating such configurations.
8 Acknowledgements
GIP is supported by a PPARC studentship.
References
[1] P. Bizon & T. Chmaj “Gravitating Skyrmions” Phys. Lett. B 297 (1992), 55-62
[2] T. H. R. Skyrme, “A Non-Linear Field Theory” Proc. Roy. Soc. A 260 (1961), 127-138
[3] T. H. R. Skyrme, “A Unified Theory of Mesons and Baryons” Nucl. Phys.31 (1962)
[4] E. Witten “Global Aspects of Current Algebra” Nucl. Phys. B223 (1983), 422-433
[5] N. K. Glendenning, T. Kodama & F. R. Klinkhamer “Skyrme Topological Soliton
Coupled to Gravity” Phys. Rev. D 38 Number 10 (1988), 3226-3230
[6] M. S. Volkov & D. V. Gal’tsov “Gravitating Non-Abelian solitons and Black Holes with
Yang-Mills Fields” Physics Reports, 319, Numbers 1-2, 1-83 (1999)
[7] H. Luckock & I. Moss “Black Holes HAve Skyrmion Hair” Phys. Lett B176 (1986),341-
[8] S. Droz, M. Heusler & N. Straumann “New Black Hole Solutions with Hair” Phys.
Lett. B268 (1991), 371-376
[9] R.A. Battye, P.M. Sutcliffe “ MULTI - SOLITON DYNAMICS IN THE SKYRME
MODEL.” Phys.Lett. B391 (1997), 150-156
[10] C. Houghton, N. Manton & P. Sutcliffe “Rational MAps, Monopoles and Skyrmions”
Nucl. Phys. B510 (1998), 507-537
[11] R. A. Battye & P. M. Sutcliffe “Skyrmions, Fullerenes and Rational Maps” Rev. Math.
Phys. 14 (2002), 29-86
[12] N. S. Manton & B. M. A. G. Piette “Understanding Skyrmions using Rational
Maps” hep-th/0008110 Understanding Skyrmions Using Rational Maps: Proceedings
of the European Congress of Mathematics, Barcelona 2000, eds. C.Casacuberta et al.,
Progress in Mathematics, Birkhauser, Basel Vol. 201 (2001) 469-479
http://arxiv.org/abs/hep-th/0008110
[13] V. B. Kopeliovich “The Bubbles of Matter from MultiSkyrmions” JETP Lett. 73
(2001), 587-591; Pisma Zh.Eksp.Teor.Fiz. 73 (2001), 667-671
[14] V. B. Kopeliovich “MultiSkyrmions and Baryonic Bags” J.Phys. G28 (2002), 103-120
	Introduction
	The Einstein-Skyrme Model
	Gravitating Hedgehog Skyrmions
	The Rational Map Ansatz
	Hollow Skyrmion Shells
	The Ramp-profile Approximation
	Conclusions
	Acknowledgements
ABSTRACT
  We investigate the large baryon number sector of the Einstein-Skyrme model as
a possible model for baryon stars. Gravitating hedgehog skyrmions have been
investigated previously and the existence of stable solitonic stars excluded
due to energy considerations. However, in this paper we demonstrate that by
generating gravitating skyrmions using rational maps, we can achieve
multi-baryon bound states whilst recovering spherical symmetry in the limit
where B becomes large.

<|endoftext|><|startoftext|>
Microsoft Word - Transaction _Mar 2, 2007__1.doc
IEEE TRANSACTIONS ON MOBILE COMPUTING,  MANUSCRIPT ID 1 
Many-to-One Throughput Capacity of IEEE 
802.11 Multi-hop Wireless Networks 
Chi Pan Chan, Student Member, IEEE, Soung Chang Liew, Senior Member IEEE, and An Chan, 
Student Member, IEEE 
Abstract—This paper investigates the many-to-one throughput capacity (and by symmetry, one-to-many throughput capacity) 
of IEEE 802.11 multi-hop networks.  It has generally been assumed in prior studies that the many-to-one throughput capacity is 
upper-bounded by the link capacity L. Throughput capacity L is not achievable under 802.11. This paper introduces the notion of 
“canonical networks”, which is a class of regularly-structured networks whose capacities can be analyzed more easily than 
unstructured networks. We show that the throughput capacity of canonical networks under 802.11 has an analytical upper 
bound of 3L/4 when the source nodes are two or more hops away from the sink; and simulated throughputs of 0.690L (0.740L) 
when the source nodes are many hops away. We conjecture that 3L/4 is also the upper bound for general networks. When all 
links have equal length, 2L/3 can be shown to be the upper bound for general networks. Our simulations show that 802.11 
networks with random topologies operated with AODV routing can only achieve throughputs far below the upper bounds. 
Fortunately, by properly selecting routes near the gateway (or by properly positioning the relay nodes leading to the gateway) to 
fashion after the structure of canonical networks, the throughput can be improved significantly by more than 150%. Indeed, in a 
dense network, it is worthwhile to deactivate some of the relay nodes near the sink judiciously. 
Index Terms—wireless mesh networks, many-to-one, one-to-many, data-gathering networks, 802.11, sensor networks, 
throughput capacity, wireless multi-hop networks. 
——————————      —————————— 
1 INTRODUCTION
any-to-one communication is a common communi-
cation mode in many multi-hop wireless networks. 
Two relevant applications are sensor networks and 
multi-hop wireless mesh networks. In sensor networks, 
there is often a “data processing center” to which data 
collected at distributed sensors are to be forwarded. In 
multi-hop wireless mesh networks, there is an Internet 
gateway connecting the mesh network to the core wired 
Internet – the client stations and the Internet gateway 
form a many-to-one relationship.  
This paper investigates the many-to-one throughput 
capacity of IEEE 802.11 multi-hop networks. In this set-
ting, there are multiple source nodes generating traffic 
streams to be forwarded to a common sink node via relay 
nodes. The relay nodes could be sources themselves. By 
symmetry, the throughput capacity thus found is also the 
same as that in a one-to-many scenario in which a source 
node generates multiple distinct data streams to be for-
warded to their respective sinks (note: this is not to be 
confused with the multicast scenario in which the same 
data is to be forwarded to multiple sinks). For conven-
ience, we shall refer to the sink in the many-to-one sce-
nario as the “center” of the network.  
There have been many related studies on the capacity 
of general wireless networks. Gupta and Kumar [1] ana-
lyzed the capacity in many-to-many situation. It provides 
the basic model that can be adapted for use in the analysis 
of the many-to-one communication. As a loose bound, it 
is obvious that the many-to-one throughput capacity is 
upper-bounded by L [1]-[3], where L is the single-link 
throughput capacity, since this is the rate at which the 
sink can receive data. There is a high probability, how-
ever, that the throughput capacity is lower than L for a 
random network [3]. This paper follows the approach 
used in [1]-[3] in characterizing which nodes can transmit 
together without packet collisions. The main difference is 
that here we are interested in the capacity throughput 
obtained under the IEEE 802.11 distributed MAC protocol 
[4]. Specifically, we integrate into our analysis the effects 
of carrier sensing,  the existence of an ACK frame for each 
DATA frame transmission, and the distributed nature of 
the CSMA protocol,  while [1]-[3] do not and their bounds 
are obtained with the implicit assumption of perfectly 
scheduled transmissions.     
There are three main contributions to this paper: 
1. We introduce the notion of “canonical networks”, 
which is a class of regularly-structured networks 
whose capacities can be analyzed more easily than 
general unstructured networks. We find that the 
throughput capacity of canonical networks under 
802.11 is upper bounded by 3L/4 when the source 
nodes are at least two hops away from the sink. We 
conjecture that this is also the upper bound for general 
networks. Indeed, when all the links in the network 
are of equal length, canonical networks and general 
networks have the same upper bound of 2L/3.   
xxxx-xxxx/0x/$xx.00 © 200x IEEE 
———————————————— 
• All authors are with the Department of Information Engineering, The 
Chinese University of Hong Kong, New Territories, Hong Kong. E-mail: 
C.P. Chan : cpchan4@ie.cuhk.edu.hk , S. C. Liew : soung@ie.cuhk.edu.hk, 
A. Chan : achan5@ie.cuhk.edu.hk.  
Manuscript received (insert date of submission if desired). Please note that all 
acknowledgments should be placed at the end of the paper, before the bibliography. 
2 IEEE TRANSACTIONS ON MOBILE COMPUTING,  MANUSCRIPT ID 
2. We find that canonical networks give much insight on 
how a many-to-one network should be designed in 
general. Our simulations show that 802.11 networks 
with random topologies operated with AODV routing 
can only achieve throughputs far below the upper 
bound of canonical networks. However, if we route 
the traffic in accordance to the optimized routes ob-
tained from an optimization algorithm, the routes near 
the center have a structure similar to that of the opti-
mal canonical network structure. In other words, as a 
principle, routing or network design near the center 
should be fashioned after the canonical network. Our 
further investigation shows that a “manifold” canonical 
network structure near the center may yield through-
put improvement of more than 150% relative to that 
obtained by using AODV routing in a general network 
structure. Indeed, in a dense network, it is worthwhile 
to deactivate some of the relay nodes near the sink ju-
diciously.  
3. We find that ensuring the many-to-one network is 
hidden-node free (HNF) in our design leads to higher 
throughputs as compared to not doing so. This is in 
contrast to the many-to-many case, in which the large 
carrier-sensing range required to ensure the HNF 
property may lower the network throughput due to 
the increased exposed-node problem [5]. This observa-
tion is used as a design principle in much of the study 
in 1 and 2 above.  
 The rest of this paper is organized as follows. Section 
II provides the definitions and assumptions used in our 
analysis. Section III derives the throughput capacities of 
canonical networks, and presents simulation results to 
support our findings. In addition, we demonstrate the 
desirability of ensuring the HNF property in many-to-one 
networks. Section IV investigates general networks not 
restricted to the canonical network structure. We show 
that the optimal routing in general networks results in a 
subset of selected routes that form a structure near the 
center that resembles the optimal canonical network. We 
then apply this insight to demonstrate the desirability of 
designing the network according to a “manifold” canoni-
cal-network structure near the center. Section V concludes 
this paper. 
2 DEFINITIONS AND ASSUMPTIONS 
Let us first provide some definitions used in our analy-
sis.  
Definition 1: The source nodes are nodes that generate data 
traffic. 
Definition 2: The sink node is the center to which the data 
collected at the source nodes are to be forwarded. 
Definition 3: The relay nodes relay data traffic from the 
source nodes to the sink node.  
Note that a node can be classified as one of the follow-
ings: 1) a source node; 2) a sink node; 3) a relay node; or 4) 
both a source node and a relay node.  
Definition 4: Given a network topology, the uniform throu-
ghput capacity uC with respect to a set of source nodes and 
a sink node is the maximum total rate at which the data 
can be forwarded to the sink node, with equal amount of 
traffic from each source node to the sink node. The throu-
ghput capacity, mC , on the other hand, does not require 
equal amounts of traffic from sources to sink. Thus, in 
general, m uC C≥ . 
Fig. 1 shows a simple example of a network consisting 
of three nodes. Suppose that node 2 is the sole source 
node and node 1 is the relay node that forwards packets 
from node 2 to node 0. Node 1 does not generate traffic 
by itself.  Then, / 2mC L= , where L is the capacity of one 
link. This is because node 1 cannot receive and transmit at 
the same time (typical assumption of half-duplexity of 
wireless links). Also, since there is only one source node, 
m uC C= .  
If node 1 is also a source node in addition to being a re-
lay node, then mC L=   (obtained when only node 1 is 
allowed to transmit), and 2 / 3uC L= , with nodes 1 and 2 
having a throughput of L/3 each. Since node 1 needs to 
serve as the relay node for node 2, node 1 will need to 
transmit twice as often as node 2. So, proper scheduling is 
required.  
Now, if we generalize the above linear network [7] to 
the one consisting (n+1) nodes, in which there are n 
sources nodes with (n-1) of them also being relay nodes. 
Then, uC  can be obtained as follows. Node 1 will trans-
mit to node 0, the sink node, at rate uC . Node 2 will 
transmit to node 1 at rate ( 1) /uC n n− , and so on. In gen-
eral, node (i+1) transmits to node i at rate ( ) /uC n i n− . We 
note that when node i transmits, nodes (i+1) and (i+2) 
cannot:  node (i+2) cannot transmit because the reception 
at node (i+1) will be corrupted by the transmission by 
node i. So, considering transmissions of nodes 3, 2, and 1 
(which       form      the       bottleneck), we             have 
( 1) / .u
C n i n L
− + =∑  That is, /(3 3) / 3uC Ln n L= − ≈  for 
large n.  
We note that L/3 is also the mC  if node n were the only 
source node. As a matter of fact, / 3u mC C L= =  if the 
source nodes in the linear network were nodes i for 3i ≥  
only. Thus, for reasonably large n, if the traffic from 
nodes 1 and 2 is only a small fraction of the total traffic to 
the sink,   we could treat nodes 1 and 2 as pure relay, non-
source, nodes. Once we do that, we then do not have to 
distinguish between uC and mC .  
We next consider a general many-to-one network, such 
as that in Fig. 2. For the study of many-to-one networks in this 
paper, we focus on the case where the source nodes are two or 
more hops away from the sink. This is a good approximation 
when the nodes within one hop to the sink only generate 
a small fraction of the total traffic.  
Definition 5: The throughput capacity with respect a 
multi-access protocol p (e.g., IEEE 802.11), pC , is the total 
rate at which the data can be forwarded to the sink 
nodeusing that protocol, assuming the source nodes are 
two or more hops away from the sink.   The  transmission  
CHAN ET AL.:  MANY-TO-ONE THROUGHPUT CAPACITY OF IEEE 802.11 MULTI-HOP WIRELESS NETWORKS 3 
schedule by the links is dictated by the protocol.  
This paper focuses on the throughput capacity under 
the 802.11 CSMA protocol, 802.11C . Henceforth, by 
throughput capacity, we mean 802.11C . For illustration, let 
us consider the two-chain linear topology shown in Fig. 3. 
Suppose that only nodes 2 and 2’ are the source nodes. 
Under “perfect scheduling”, nodes 1 and 2’ will transmit 
together; and nodes 1’ and 2 will transmit together. This 
results in a throughput capacity of L. Under 802.11, how-
ever, the transmissions are usually not perfectly aligned 
in time. In addition, a DATA frame is followed by an 
ACK frame in the reverse direction. Suppose nodes 1 and 
2’ transmit together. Say, the transmission of the DATA 
frame of node 1 completes first, while the transmission 
node 2’ is ongoing. When node 0 returns an ACK to node 
1, this ACK also reaches node 1’, the receiver of the 
transmission from node 2’, causing a collision there. Thus, 
under 802.11, simultaneous transmissions by nodes 1 and 
2’ will usually result in a collision unless the completion 
times of their DATA transmissions are perfectly aligned, 
which is rare. In this case, 802.11C is at best 2L/3, since at 
best node 2 and 2’ can transmit together, and nodes 1 and 
1’ will need to transmit at separate times.  
For many-to-one networks, the capacity bottleneck is 
likely to be near the sink node because all traffic travels 
toward the sink node. Specifically, nodes near the sink 
node are responsible for forwarding more traffic, and 
these nodes contend for access of the wireless medium 
because they are close to each other. To obtain an idea on 
the upper limit of the throughput capacity under 802.11, 
we consider a class of networks referred to as the canoni-
cal networks. An example of a canonical network is 
shown in Fig. 4. We show that 3L/4  is the upper bound 
of the throughput capacity of canonical networks, and 
conjecture that this is also the upper bound for networks 
with general structures. We will motivate the study of the 
canonical networks shortly. In the special case in which 
all links have equal length, then the throughput capacities 
of the canonical network as well as general networks are 
upper-bounded by 2L/3. We now define the canonical 
networks.  
Definition 6: A chain is formed by a sequence of at least 
three nodes leading to the center sink node. Traffic is for-
warded from one node to the next node in the sequence 
on its way to the sink node. A linear chain is a chain 
which is a straight line.  
In Fig. 4, for example, there are eight linear chains. 
Definition 7: An i-hop node is a node that is i hops away 
from the sink node in a chain (see Fig. 4). 
Definition 8: A canonical network is formed by a number of 
linear chains leading to a common center sink node; the 
nodes in different chains are distinct except the sink node. 
In addition, the distance between an i-hop node and an (i-
1)-hop node, di, is the same for all the linear chains (see 
Fig. 4).  
Definition 9: A ring is a circle centered on the sink node. 
An i-hop ring consists of all the i-hop nodes of the differ-
ent linear chains in a canonical network (see Fig. 4). 
Motivation for the Study of Canonical Networks  
Canonical networks have regular structures and can 
be analyzed more easily than general networks. We con-
jecture that the upper bound of throughput capacity ob-
tained for canonical networks is also the upper bound for 
general networks, because intuitively canonical networks 
model a rich class of networks the optimal of which may 
yield very good throughput performance. Consider the 
following intuitive argument. (i) In a densely populated 
network (say, infinitely dense), we may choose to form 
linear chains from the source nodes to the center sink 
node for routing purposes. Since the direction of traffic 
flow is pointed exactly to the center, there is no “wastage” 
Fig. 1. Simple network example. 
Fig. 2. A random many-to-one network. 
Fig. 4. A Canonical Network. 
Fig. 3. A two-chain many-to-one network with equal link length.
Sink Source 
Node 0 Node 1 Node 2 
Relay Sink Source 
Node 0 Node 1 Node 2 
Relay Relay Source 
Node 1’ Node 2’ 
ring 
3-hop node 
d0 d1 d2 
2-hop node 
1-hop node 
chain 
4 IEEE TRANSACTIONS ON MOBILE COMPUTING,  MANUSCRIPT ID 
with respect to the case in which the routing direction is 
at an angle to the center. (ii) We have defined the class of 
canonical networks to be quite general in that we do not 
restrict the number of linear chains in it. Neither do we 
limit the distance di. In deriving the capacity of the ca-
nonical network later, we allow for the possibility of an 
infinite number of linear chains and arbitrarily small di. 
This provides us with a high degree of freedom in identi-
fying the best-structured canonical networks. The above 
intuitive reasoning will be validated by simulation results 
later. In addition, we will show later that in a random 
network with many nodes (so that there is a high degree 
of freedom in forming routes), establishing a canonical-
network-like structure near the center for routing pur-
poses will generally lead to superior throughput per-
formance.  
In this paper, unless otherwise stated, we further as-
sume the following: 
Assumptions: 
(1) The nodes and links are homogenous. They are con-
figured similarly, i.e., same transmission power, carrier-
sensing range (CSRange), transmission rate, etc. 
(2) ACK is sent by the receiver when a packet is received 
successfully, as per the 802.11 DCF operation.  
(3) The following constraints apply to simultaneous 
transmissions [1][6]. Consider two links (T1 ,R1) and 
(T2 ,R2). For simultaneous transmissions without collisions, 
they must satisfy all the eight inequalities below: 
2 1 1 1
2 1 1 1
2 1 1 1
2 1 1 1
1 2 2 2
1 2 2 2
1 2 2 2
1 2 2 2
T R T R
R R T R
T T T R
R T T R
T R T R
R R T R
T T T R
R T T R
X X X X
X X X X
X X X X
X X X X
X X X X
X X X X
X X X X
X X X X
− > + ∆ −
− > + ∆ −
− > + ∆ −
− > + ∆ −
− > + ∆ −
− > + ∆ −
− > + ∆ −
− > + ∆ −
 (1) 
where Xi is the location of node i, |Xi – Xj| is the distance 
between Xi and Xj, ∆ > 0 is the distance margin (see next 
paragraph). These are the physical constraints that pre-
vent DATA-DATA, DATA-ACK and ACK-ACK colli-
sions. 
The received power function can be expressed in the 
form of 
 ( ) /tP d P d
α∝ , (2) 
where Pt is the transmission power, d is the distance and 
α is the path-loss exponent, which typically ranges from 2 
to 6 according to different environments [8]. By the as-
sumptions that all the nodes have the same transmission 
power and α = 4, and Signal-to-Interference Ratio (SIR) 
requirement of 10dB. Then at R1 , we require 
(|  -  |)
(|  -  |)
P X X
P X X
>    (3) 
giving 
|  -  |
 10   1.78
|  -  |
> =    
In other words, ∆ = 0.78. Unless otherwise stated, we as-
sume ∆ = 0.78 throughout this work.  
(4) In 802.11 networks, there are two types of packet colli-
sions: collisions due to hidden nodes (HN) (see explana-
tion of assumption (5) below or [6]), and collisions due to 
simultaneous countdown to zero in the backoff period of 
the MAC of different transmitters. In much of our throu-
ghput-capacity analysis, we will neglect the latter colli-
sions and assume that they have only small effects toward 
throughput capacity, a fact which has been borne out by 
simulations and which can be understood through intui-
tive reasoning, particularly for a network in which a node 
is surrounded by only a few other active nodes who may 
collide with it. As will be shown later in this paper, this is 
generally a characteristic of a network with good throu-
ghput performance (see results of Fig. 14 and Fig. 18, for 
example). Also, an upper bound on throughput capacity 
obtained by neglecting the countdown collisions is still a 
valid upper bound. It is a good upper bound so long as it 
is tight. We will see later that the upper bounds we obtain 
are reasonably tight when verified against simulations 
results in which countdown collisions are taken into ac-
count. In the remainder of this paper, unless otherwise 
stated, the term “collisions” refers to collisions due to HN 
(i.e., caused by the failure of carrier-sensing) rather than 
simultaneous countdown to zero.  
(5) In this paper, unless otherwise specified, we assume 
the so-called Hidden-Node Free Design (HFD) [6] in the 
network. That is, we design the network such that simul-
taneous transmissions that will cause collisions can be 
carrier-sensed by transmitters and be avoided. A reason 
for this assumption is that for many-to-one communica-
tion, eliminating hidden nodes is worthwhile (see simula-
tion results in Section III-C). According to [6], HFD re-
quires 
(i) Use of Receiver Restart (RS) Mode, and 
(ii) Sufficiently large CSRange. 
This paper assumes the 802.11 basic mode and 
RTS/CTS are not used. We briefly describe the HFD re-
quirements for understanding of the analysis later. More 
details can be found in [6]. Fig. 5 is an example showing 
that no matter how large CSRange is, the hidden node 
(HN) phenomenon can still occur in the absence of RS. In 
the figure, T1 and T2 are more than CSRange apart, and so 
simultaneous transmissions can occur. Furthermore, the 
SIR is sufficient at R1 and R2 so that no “physical colli-
sions” occur. But HN can still happen, as described below. 
Assume T1 starts first to transmit a DATA packet to R1. 
After the physical-layer preamble of the packet is re-
ceived by R2, R2 will “capture” the packet and will not 
attempt to receive another new packet while T1’s DATA is 
ongoing. If at this time T2 starts to transmit a DATA to R2, 
CHAN ET AL.:  MANY-TO-ONE THROUGHPUT CAPACITY OF IEEE 802.11 MULTI-HOP WIRELESS NETWORKS 5 
R2 will not receive it and will not reply with an ACK to T2, 
causing a transmission failure on link (T2, R2). This is the 
default receiver mode assumed in the NS-2 simulator [10] 
and most 802.11 commercial products.  Note that the ex-
ample in Fig. 5 is independent of the size of CSRange. 
This HN problem can be solved with the Receiver Re-
start Mode (RS) which can be enabled in some 802.11 
products (e.g., Atheros Wi-Fi chips; however, the default 
is that this mode is not enabled). With RS, a receiver will 
switch to receive the stronger packet if its power is Ct 
times greater than the current packet (say, 10 dB higher). 
The example in Fig. 5 will not give rise to HN with RS if 
CSRange is sufficiently large.  
RS Mode alone, however, cannot prevent HN without 
sufficiently large CSRange. To see this, consider the ex-
ample in Fig. 6. Assume T1 transmits a DATA to R1 first. 
During the DATA’s period, T2 starts to send a shorter 
DATA packet to R2. With RS Mode, R2 switches to receive 
T2’s DATA and sends an ACK after the reception. If T1’s 
DATA is still in progress, R2’s ACK will corrupt the 
DATA at R1, since the distance between R1 and R2 is 
within interference range (
max(1 )d+ ∆ ). To prevent T2 from 
transmission (hence the collision), the following must be 
satisfied:  
1 2|  -  |   T TX X CSRange≤ . (4) 
Reference [6] proved that in general if CSRange > (3+∆) 
dmax, where dmax is the maximum link length, then HN can 
be prevented in any network. However, for a specific 
network topology, e.g., the canonical network, the re-
quired CSRange can be smaller. 
Throughout this work, we primarily focus on the pair-
wise-interference model [1][6]. The concept of CSRange 
and the constraints in (1) rely on this assumption. An 
analysis which at the outset takes into account the simul-
taneous interferences from more than one source will 
complicate things significantly. So, given a network to-
pology, our approach is to first identify the capacity 
based on pair-wise interference analysis only, and then 
verify the capacity is still largely valid under multiple 
interferences (this verification is done in Section III-D). 
3 CANONICAL NETWORKS 
In this section, we derive the throughput capacities of 
canonical networks. Section A analyzes two kinds of ca-
nonical networks: equal link-length and variable link-
length networks. Simulation results are presented and 
discussed in Section B. Section C compares the perform-
ance of HFD and non-HFD networks, and Section D veri-
fies the results under multiple interferences. 
3.1 Theoretical Analysis 
(1)      Equal Link-Length Networks 
We first consider the case where all links have the 
same length d, i.e., d0 = d1 =… =d. Theorem 1, which fol-
lows from  Lemma 1 and Corollary 1 below, proves that 
the throughput capacity in this network is upper-
bounded by 2L/3, where L is the single-link throughput. 
Lemma 1: Given three nodes on the periphery of a circle of 
radius d, we can identify two nodes with distance smaller 
than (1+∆)d between them. 
Proof: The three nodes form the vertices of a triangle. 
Consider the equilateral triangle inscribed on the circle of 
radius d , and let t be the length of one side (see Fig. 7). 
Then 
2 cos = 1.731 (1+ )
t d d d
= < ∆  
That is, it is not possible to inscribe a triangle with all 
sides no less than (1+∆)d  on the circle.  
Corollary 1: At any time, at most two 2-hop nodes can 
transmit at the same time. 
Proof: With reference to Fig. 8, suppose that three 2-hop 
nodes can transmit together. In order that the ACK of any 
1-hop node to not interfere with the reception of DATA 
packet of another transmission, the distances between the 
three 1-hop nodes must all be larger than (1+∆)d.  By 
Lemma 1, this is not possible.  
Theorem 1: For equal-link-length canonical networks, 
802.11 2 / 3C L≤ , where L is the link capacity. 
Proof: Define “airtime” usage of a node to include the 
transmission time of DATA packets as well as the ACK 
from the receiver [7]. Let Sij be the airtime occupied by the 
transmission   of  the  i-hop node on the  j-th  chain over  a 
Fig. 5. Lack of RS Mode leads to HN no matter how large
CSRange and SIR are.  
Fig. 6. With RS Mode, CSRange not sufficiently large still leads to 
HN due to insufficient SIR . 
Fig. 7. Equilateral triangle inscribed in a circle. 
R1 T1 T2 R2 
dmax dmax 
DATA 
DATA 
CSRange 
>(1+ ∆ )dmax 
T1 R2 R1 DATA ACK 
dmax dmax <(1+ ∆ )dmax 
CSRange 
6 IEEE TRANSACTIONS ON MOBILE COMPUTING,  MANUSCRIPT ID 
long time interval [0, Time]. 
Let S1 = the union of airtimes occupied by all 1-hop nodes 
S1j. Similarly, let S2 = the union of airtimes occupied by all 
2-hop nodes S2j . That is, 1 11 12 1... NS S S S= ∪ ∪ ∪  and  
2 21 22 2... NS S S S= ∪ ∪ ∪ . We further define xij = |Sij|/Time. 
By definition, 
1 2| |S S Time∪ ≤  (5) 
According to assumption (3), when any 1-hop node trans-
mits, none of the other 1-hop nodes or 2-hop nodes can 
transmit at the same time if collisions are not to happen. 
Thus, if carrier-sensing works perfectly and collisions due 
to simultaneous countdown to zero in the 802.11 backoff 
algorithm are negligible (see assumptions (4) and (5) in 
Section II), then  
1 2S S∩ = ∅  (6) 
and   
                          
1 1i jS S∩ = ∅           for  i ≠  j . (7) 
This implies 
1 2 1 2| | | | | |S S S S Time+ = ∪ ≤  (8) 
1 11 12 1| | | | | | ... | |NS S S S= + + +  . (9) 
By Corollary 1,  
 21 22 2
| | | | ... | |
NS S SS
+ + +
≥  . (10) 
Recall that we assume that the 1-hop nodes are relay nodes that 
do not generate data (see Definition 5 and the justification be-
fore that in Section II). All traffic transmitted by 1-hop nodes 
must therefore come from 2-hop nodes.  By the “no collision” 
assumption, the sum of the airtimes of 1-hop nodes must not be 
greater than the sum of airtimes of 2-hop nodes. We have 
11 12 1 21 22 2| | | | ... | | | | | | ... | |N NS S S S S S+ + + ≤ + + +  (11) 
From (8)-(10), we have 
11 12 1 21 22 2| | | | ... | | (| | | | ... | |) / 2N NS S S S S S Time+ + + + + + + ≤ . 
Applying (11), we get  
11 12 1
11 12 1
( ... )
( ... ) 1
x x x
x x x
+ + +
+ + + + ≤  
giving 
      
11 12 1
x x x+ + + ≤  
where 
11 12 1( ... )Nx x x L+ + +  is the throughput. 
We now show a specific schedule on a 2-chain network 
which achieves the capacity of 2L/3. Consider the topol-
ogy shown in Fig. 9. There are two chains, having link 
distance d and CSRange = 2.9d, which removes HN. Recall 
that the general HFD has two requirements, (i) RS mode 
and (ii) CSRange > (3+∆) dmax [6]. For the topology in Fig. 9, 
it turns out that CSRange = 2.9d is enough. 
The numbers shown on the links in Fig. 9 represent a 
possible transmission schedule. Links with same number 
transmits at the same time. Following this pattern, the 
throughput capacity of 2L/3 is “potentially” achievable. 
Our simulation results in Subsection B below show that 
the 802.11 protocol throughput capacity is below but close 
to this upper bound.  
The reader may be curious as to why we did not use a 
“symmetric” 2-chain network (where the angle between 
the chains isπ ) as the illustrating example above. It turns 
out that the symmetric structure cannot achieve the 
throughput of 2L/3 if there are source nodes four or more 
hops away. To see this, first we note that for a symmetric 
2-chain network, CSRange must be at least 3d to ensure 
HFD in the areas around the sink node (see discussion of 
the example in Fig. 3 in Section II). Given CSRange=3d, 
each of the chains (assuming a long chain with more than 
four hops (or five nodes)) cannot have throughput of L/3, 
as can be easily verified by analysis of one linear chain [7], 
[9]. 
Before going to the next subsection, we note that Theo-
rem 1 actually applies not just to canonical networks (the 
proof does not require it), but general networks in which 
(i) all links are of the same length; and (ii) source nodes 
are two hops are more away from the center. In other 
words, the chains leading to the data center need not be 
straight-line linear chains. Thus, Theorem 1 can be stated 
more generally as Theorem 1’ below: 
Theorem 1’: For equal-link-length general networks, 
802.11 2 / 3C L≤ , where L is the link capacity. 
Proof: Same as Theorem 1 since Lemma 1 and Corollary 1 
apply to general networks with equal link length also.  
(2)     Variable Link-Length Networks 
In this subsection, we consider canonical networks in 
which the distance between adjacent rings can be varied 
(i.e., d0 , d1 ,…  may be distinct). With this assumption, the 
capacity is upper-bounded by 3L/4. This is proved in 
Fig. 8. At most two simultaneous transmissions from 2-hop nodes.
Fig. 9. Example of equal-link-length topology, CSRange=2.9d.
1 2 3 
O N11 N12 
>(1+∆)d >(1+∆)d 
<(1+∆)d 
CHAN ET AL.:  MANY-TO-ONE THROUGHPUT CAPACITY OF IEEE 802.11 MULTI-HOP WIRELESS NETWORKS 7 
Theorem 2 after Lemma 2 in the following. 
Lemma 2: At any time, at most three 2-hop nodes can 
transmit at the same time. 
Proof: Assume the contrary that we can have four 2-hop 
nodes belonging to four different chains transmitting at 
the same time. With respect to Fig. 10, consider the four 
straight lines formed by the four nodes to the center (note: 
the network could have more chains, just that we are fo-
cusing on the four chains of the four 2-hop nodes in focus 
here). Four angles are formed between adjacent lines. Let 
θ < / 2π  be the minimum of the four angles.  Four angles 
are also formed between non-adjacent lines. Let β π≤ be 
the angle encompassing θ (see Fig. 10).  
For simultaneous transmissions of 2-hop nodes, the 
transmitters should not be able to carrier-sense each other. 
This implies an upper bound for CSRange as follows: 
0 12( )sin 2
CSRange d d
< +  . (12) 
In addition, by assumption (5), to prevent collisions of 1-
hop nodes and 2-hop nodes, they should be able to car-
rier-sense each other. This implies a lower bound for 
CSRange. By (4), 
0 1 0 0 0 1( ) 2 ( )cosCSRange d d d d d d≥ + + − +  β . (13) 
By assumption (3), the receivers of simultaneous transmis-
sions should not violate the physical constraints.    By (1), 
1 0(1 ) 2 sin 2
+ ∆ <  . (14) 
Since there are four chains, / 2θ π≤  and β π≤ .  From 
the definitions of θ and β, we have 
 2θ β π≤ ≤ . (15) 
From (13) and (15), 
0 1 0 0 0 1( ) 2 ( )cos(2 )CSRange d d d d d d≥ + + − + θ . (16) 
Let d1= α d0. We can form two inequalities from (12), (14) 
and (16): 
 2(1 cos )
 , (17) 
(2cos 1) 1 2cos1 2cos
1 2cos 1 2cos
(2cos 1) 1 2cos1 2cos
1 2cos 1 2cos
⎧ θ − + − θ− θ
α > + −⎪
− θ − θ⎪⎪
θ − + − θ− θ⎪α < − −⎪ − θ − θ⎩
   . (18) 
Fig. 11 shows the plot of (17) and (18) when ∆ = 0.78. The 
shadowed region is the area of solution. From the plot, 
 1.73 / 2θ π> > . 
This leads to a contradiction. Thus, there can be at most 
three simultaneous 2-hop transmissions. 
Theorem 2: For variable-link-length canonical networks,  
802.11 3 / 4C L≤ , where L is the link capacity. 
Proof: Similar to the proof of Theorem 1, from Lemma 2,  
 21 22 2
| | | | ... | |
NS S SS
+ + +
≥   . (19) 
Hence, 
11 12 1
11 12 1
( ... )
( ... ) 1
x x x
x x x
+ + +
+ + + + ≤  
or       
11 12 1
x x x+ + + ≤  , 
where 
11 12 1( ... )Nx x x L+ + +   is the throughput 
Fig. 12 shows an example of a canonical network. The 
CSRange has to be set larger than 2.62d0 and smaller than 
3.417d0. The numbers on the links show a possible trans-
mission schedule that achieves capacity of 3L/4. 1  Our 
simulation results in Subsection B below show that 802.11 
throughput capacity is below but close to this upper 
bound. 
In the analysis of canonical networks, we have as-
sumed that the loss exponent is 4,  corresponding  to  ∆ = 
1 For the one-to-many network (i.e., the sink becomes the source, and 
the sources become the sinks with respect to the many-to-one case here), 
some parameters should be changed to attain the capacity of 3L/4. Spe-
cifically, CSRange = 1.7d0, and di = 0.7d0 for i=1, 2, … The derivation 
method for the capacity of the one-to-many case is similar to that in the 
many-to-one case here. 
Fig. 11. Plot of Inequalities (17) and (18). 
Fig. 10. Example of 4-chain canonical network. 
d0 d1 
d1 (12) 
(13) (14) 
8 IEEE TRANSACTIONS ON MOBILE COMPUTING,  MANUSCRIPT ID 
0.78. In outdoor environment, the typical value of loss 
exponent is in the range 2 to 4. Similar analytical tech-
nique can be used to find their throughput capacities. 
Since smaller loss exponent implies larger ∆ (larger inter-
ference), the throughput capacity under the assumption 
of loss exponent 4 serves as an upper-bound for the 
throughput capacity in outdoor environment.  
3.2 Simulation 
We use the network simulator NS2 [10] to simulate the 
canonical network shown in Fig. 12. As shown in Subsec-
tion A, for the 3-chain canonical network, 802.11 3 / 4C L≤ . 
In the simulation, the RS Mode is enabled. Table I shows 
the details of the simulated configuration. Only the n-hop 
nodes at the boundary are source nodes that generate 
data. Offered load control is applied to prevent them 
from injecting too much traffic into the network. For the 
interested reader, it has been shown in [7] that offered-
load control can yield higher throughput in multi-hop 
networks. 
Fig. 13 shows the simulation result assuming the set-
up of Table I. The x-axis is the number of nodes per chain, 
including the sink. Given a number of nodes per chain, 
we vary the offered load in the simulation to identify an 
offered load that achieves the highest average throughput. 
When the number of nodes per chain is 3, i.e., the 2-hop 
nodes are the source nodes, the throughput is 4.62Mbps 
(0.740L), which is very close to the theoretical capacity 
3L/4, where the link capacity L is around 6.24Mbps as 
obtained by simulating one single link. But when the 
number of nodes per chain increases, the throughput 
drops to 4.30Mbps (0.690L). 
 An explanation for this phenomenon is that the sched-
uling scheme of IEEE 802.11 does not result in the optimal 
transmission schedule presented in Fig. 12 needed to 
achieve the 3L/4 upper bound. That is, the incorporation 
of random backoff countdown time in 802.11 causes im-
perfect scheduling. Consider Fig. 12, it is possible for 2-
hop and 3-hop nodes of different chains to transmit at the 
same time in 802.11, since they are out of the carrier-
sensing range of each other. To achieve capacity 3L/4, 
however, all the 2-hop nodes must transmit together. 
However, a 3-hop transmission may prevent this, result-
ing in only some of the 2-hop nodes transmitting together. 
In other words, there are times when not all 2-hop nodes 
transmit together, meaning |S2| cannot reach the lower 
bound in (19). Meeting the lower bound, however, is es-
sential to achieving the optimal throughput 3L/4.  
Fig. 14 shows the simulation results of canonical net-
works with different numbers of chains but with equal 
link length. The simulated configuration is shown in Ta-
bles II and III. For the 2-chain canonical network, we use 
the network structure in Fig. 9. The angle between two 
chains are slightly less than π . The reason of not using a 
symmetric structure has been given in Subsection A 
above. For other cases, the chains are evenly placed on the 
network. The CSRange for each topology is determined 
by minimizing its value while preventing HN. The 
throughput is obtained by varying the offered load and 
choosing the highest one. From the graph, the highest 
throughput is 3.86Mbps (0.619L), which is slightly smaller 
than the theoretical capacity of 2L/3. This is due to the 
imperfect scheduling by 802.11, which has been discussed 
in the previous paragraph. 
In Fig. 14, the throughput converges to around 
2.0Mbps (0.321L) when the number of chains increases. 
The convergence can be explained as follows. From the 
analysis in Subsection A, we see that the bottleneck is 
around the center. When the number of chains is large, 
the area near the center will become dense. The possible 
transmission patterns are similar in this area, and thus the 
throughput converges. In addition, note that the con-
verged value, 0.321L, is considerably smaller than the 
value achieved when the number of chains is three, 
0.619L. This is again due to imperfect scheduling of 802.11 
MAC protocol. An interesting insight is that when the 
number of chains is small, the possible transmission pat-
terns arise from “random” 802.11 MAC scheduling is 
more limited. And by limiting this degree of freedom, 
higher throughput can actually be achieved because ran-
dom transmission patterns that degrade throughputs are 
eliminated.  
The above observation has two implications: (i) For 
network design, we may want to design the network in such a 
way that the number of routes leading to the center is limited. 
(ii) Even for a general, non-canonical, network densely popu-
lated with nodes and with many routes leading to the center, it 
is better to selectively turn on only a subset of the nodes to limit 
the routes to the center. This principle will be further dis-
cussed in Section IV.  
TABLE I  
SIMULATION CONFIGURATION FOR VARIABLE-LINK-LENGTH CA-
NONICAL NETWORKS 
Number of chains 3 
d0 250m 
d1 242m 
di      for i>1 250m 
Transmission Range 250m 
Carrier Sensing Range 675m 
Routing Protocol AODV 
Propagation Model Two Ray Ground 
Packet Data Size 1460 bytes 
Fig. 12. Example of 3-chain canonical network, CSRange=2.7d.
1.732d0 
0.973d0 
2.62d0 
3.417d0 
CHAN ET AL.:  MANY-TO-ONE THROUGHPUT CAPACITY OF IEEE 802.11 MULTI-HOP WIRELESS NETWORKS 9 
3 4 5 6 7 8 9 10
Number of nodes per chain
Fig. 13. Simulated throughput of a 3-chain canonical network with 
offered load control. 
TABLE II  
SIMULATION CONFIGURATION FOR EQUAL-LINK-LENGTH CA-
NONICAL NETWORKS 
Number of nodes per chain 8 
di      for all i 250m 
Transmission Range 250m 
Carrier Sensing Range Refer  to Table III 
Routing Protocol AODV 
Propagation Model Two Ray Ground 
Packet Data Size 1460 bytes 
TABLE III  
CARRIER SENSING RANGE FOR  EQUAL-LINK-LENGTH CANONICAL 
NETOWKRS 
Number of chains Carrier Sensing Range 
2 725m 
3 875m 
4 750m 
5 725m 
6 875m 
7 800m 
8 750m 
9 875m 
10 825m 
>10 900m 
2 6 10 14 18
Number of chains
Fig. 14. Simulated throughput of equal-link-length canonical net-
works with offered load control. 
3.3 HFD versus Non-HFD Performance 
In the preceding sections, we have assumed HFD net-
works to simplify the analysis by eliminating the effect of 
collision. We now investigate the performance of HFD 
versus that of non-HFD networks. As a reminder, HFD 
requires 
(i) Use of Receiver Restart (RS) Mode, and 
(ii) Sufficiently large CSRange. 
From [11], we know that increasing CSRange increases 
the number of exposed nodes (EN) and decrease the 
number of hidden nodes (HN), and vice versa. When HN 
is removed, say with HFD, the EN phenomenon will be 
more severe, which lowers the throughput.  However, 
that is the case for many-to-many data delivery only. For 
this paper, we are interested in many-to-one data delivery. 
Table IV shows the simulation results with same configu-
ration as in Table II with varying CSRange. The shaded 
entries correspond to HFD. From the table, when the 
number of chains is between 2 to 10, the highest through-
put is achieved if we choose the smallest CSRange within 
HFD. This shows that the best HFD configuration gener-
ally works better than non-HFD. 
TABLE IV 
SIMULATION RESULT FOR EQUAL-LINK-LENGTH   CANONICAL 
NETWOKRS 
No. of Chains Through- 
(Mbps) 2 3 4 5 6 7 8 9 10 
975 2.388  2.981  3.355  2.833  2.863  3.022  2.891  3.054  3.114  
925 2.793  2.993  3.329  3.518  2.837  2.805  2.943  3.270  3.108  
875 2.797  2.999  3.508  3.535  3.393  3.272  3.163  3.384  2.883  
825 2.795  2.490  3.513  3.483  2.615  3.681  3.575  3.053  3.366  
775 2.808  2.473  3.724  3.540  2.760  2.754  3.709  3.367  3.269  
725 3.589  2.226  3.210  3.854  2.095  2.264  3.147  3.199  2.686  
675 3.170  2.288  2.398  2.799  2.142  2.261  2.176  2.367  2.633  
625 3.166  1.806  2.219  2.657  1.735  2.020  2.670  1.906  2.156  
575 3.183  1.788  2.168  2.202  1.657  1.609  2.280  1.929  2.041  
bold: highest throughput;  shaded: HFD 
The better performance of HFD could be explained as 
follows. When CSRange is decreased, the number of HN 
increases and the number of EN decreases. More links 
could be active when there are fewer EN, thus the 
throughput in multiple-source-multiple-destination net-
work could be higher in the non-HN free situation. In a 
many-to-one network, however, all the traffic is directed 
toward the same destination. With a non-HN free design, 
although the total throughput on a link basis (point-to-
point throughput) may be increased, the many-to-one 
throughput (or the end-to-end throughput) could not 
benefit from the increase, because all the traffic in the end 
will flow toward the bottleneck and be dropped there due 
to HNs. We will see later that this observation suggests a 
design in which the area near the center should be made 
HN-free, while areas far away from the data center need 
not be HN-free.  
3.4 Multiple Interference 
Thus far, we have considered pair-wise interferences 
only. The analysis of pair-wise interferences is appealing 
from the simplicity viewpoint. However, it may not have 
taken into account the fact that the interferences from 
several other simultaneously transmitting sources may 
add up to yield unacceptable SIR even though each of the 
interferences may not be detrimental. In this section, we 
extend our analysis to take into account the effect of mul-
tiple interferences. For brevity, we refer to the throughput 
10 IEEE TRANSACTIONS ON MOBILE COMPUTING,  MANUSCRIPT ID 
capacity obtained by assuming pair-wise interferences as 
pair-wise-interference throughput capacity, and the 
throughput capacity with mul-tiple interferences taken 
into account as multiple-interference throughput capacity.  
The multiple-interference throughput capacity is in 
general less than or equal to that of the pair-wise 
throughput capacity. The question then is whether the 
pair-wise-inter-ference capacity is a tight bound for mul-
tiple-interference capacity. We show in the following that 
this is indeed the case. In the following, we focus on the 3-
chain network. The analytical argument and the qualita-
tive results for the 2-chain network are similar.  
Consider the canonical network in the Fig. 15, 
where d0=d2=d3=d4, and d1=0.9d0. In some cases, the SIR 
may not satisfy the constraint 10dB. For example, 
when N11 is receiving DATA from N12, and at the same 
time N21 and N31 are replying ACK to N22 and N32, the 
SIR is at most 
11 11
21 31
4 4 4
( ) (0.9 )
6.859
1 1( ) ( ) ( )
1.7321 1.7321
P N d
PP N P N
where PX(Y) is the received power from node Y to node X, 
Pt is the transmission power. 
This situation, however, occurs only if multiple ACKs 
are transmitted simultaneous in nearby links near the 
center. The probability of this occurring is low, since the 
transmission time of ACK is much lower than that of 
DATA. If we ignore the simultaneous transmissions of 
ACKs in these nearby links, we can show that the SIR due 
to multiple interferences is still more than 10dB, given 
that the SIR due to pair-wise interferences is more than 
10dB, as follows.  
1. 1-hop node to sink node 
When the sink node is receiving DATA from N11, the 
nearest three active links that cause largest interference 
are: N23 to N22, N33 to N32 and N14 to N13. If no two ACKs 
are transmitted simultaneously by these three links, the 
“worst-case” interference power at N0 (which includes 
ACK from N22 DATAs from N33 and N14, and transmis-
sions by other nodes) is at most 
Hence, the SIR is at least 1/0.09949=10.513 
2. 2-hop node to 1-hop node 
Consider the link N12 to N11. The nearest three active 
links are: N22 to N21, N32 to N31, N15 to N14. Similar to 
above, the SIR is at least 
3. 3-hop node to 2-hop node and others 
The interference is less than the above cases. This part 
is skipped because the analytical approach is similar. 
In the above, we have argued analytically the consid-
eration of multiple interferences will not have substan-
tially different performance than that of pair-wise inter-
ference. We have focused on the 3-chain network with 
variable link distance because this structure provides the 
highest capacity bound among the canonical networks. 
We now present simulation results for general canoni-
cal networks with arbitrary number of chains. We have 
modified the NS2 simulator to take into account the ef-
fects of multiple interferences (the modified NS2 code can 
be downloaded from the website in [12]). The throughput 
results are shown in Fig. 16. The multiple-interference 
throughput is only lower than the pair-wise-interference 
throughput by a small margin, and therefore the pair-
wise-interference throughput serves a good bound for 
multiple-interference throughput. 
4 GENERAL NETWORKS 
In this section, we consider the throughput of general 
networks. Since general networks may not have the regu-
lar structure of canonical networks, the throughput capac-
Fig. 15. Example of 3-chain canonical network, CSRange=2.7d.
Fig. 16. Simulated throughput of 3-chain canonical network 
with offered load control. 
0 0 0 0 0 022 33 14 25 35 16
4 4 4 4 4 4 4 4
( ) ( ) ( ) ( ) ( ) ( ) ...
1 1 1 1 1 1
( ...) 0.0995
1.9 2.9 3.9 4.9 4.9 5.9
N N N N N N
P N P N P N P N P N P N
+ + + + + +
= + + + + + + ≈
11 11 11 11 11 11
N 21 N 32 N 15 N 24 N 34 N 17
4 4 4 4 4 4 4
(N ) (N ) (N ) (N ) (N ) (N ) ...
(0.9 )
1 1 1 1 1 1
( ...)
1.7321 2.5515 3.9 4.4844 4.4844 5.9
10.5259
P P P P P P
+ + + + + +
+ + + + + +
1.732d0 
0.9d0 
2.55d0 
3.29d0 N11 
3 4 5 6 7 8 9 10
Number of nodes per chain
multiple
interference
pairw ise
interference
CHAN ET AL.:  MANY-TO-ONE THROUGHPUT CAPACITY OF IEEE 802.11 MULTI-HOP WIRELESS NETWORKS 11 
ity could be lower than 3L/4. We propose a method to 
find the capacity by selecting Hidden-node Free Paths 
(HFP).  
4.1 Discussion of HFP 
In Section III-C, we found that the network with HN-
free outperforms that with HN in terms of throughput 
capacity. We could have three schemes which satisfy the 
HN-free condition for general network analysis. As one of 
the requirements of HFD, we assume RS Mode is used in 
all the analyses and experiments in the remaining of the 
paper. We assume that all nodes use a common fixed 
CSRange in each of the following schemes (assumption (1) 
in Section II); however, the schemes set the fixed CSRange 
differently. 
Scheme 1: CSRange is set to 3.78‧TxRange, where 
TxRange is the transmission range. This is a sufficient 
condition of HN free for any networks [6]. 
Scheme 2: CSRange is minimized according to the 
network topology so that no hidden node exists with 
respect to any two links in the network. This scheme, 
for example, was used in the analysis of canonical 
networks. 
Scheme 3: HFP - We select a subset of links to form 
paths to the center which are hidden-node free and 
achieve the highest possible throughput. Since some 
links are not used, the CSRange can be smaller than 
scheme 1 and 2 (i.e., only the links in the path are 
considered when fixing CSRange.) 
 Based on Table IV, the highest throughput is achieved 
when we choose the smallest CSRange within HFD. So 
we have the following predictions for the throughputs of 
the different schemes above. The throughput of scheme 1 
cannot be higher than that of scheme 2 (because the 
CSRange of some links are forced to adopt a higher value 
than necessary in scheme 1). Also, the throughput of 
scheme 2 cannot be higher than that of scheme 3 (because 
scheme 3 requires the HN property to be maintained only 
for links along the paths, and the paths that will be used 
are optimally chosen with regard to the throughput; 
whereas scheme 2 requires all links to be HN-free, even 
for links that are not used). For an example where HFP 
can achieve a higher throughput than scheme 2, we add 
two nodes to the 3-chain canonical network in Fig. 12 to 
yield the network in Fig. 17. In the network, link BB’ in-
terfere with link AA’. If we set CSRange to be less than 
3.417d0, node B will become a hidden node of link AA’. If 
we set CSRange larger than 3.417d0, the capacity upper-
bound 3L/4 cannot be achieved. On the other hand, if we 
use HFP, we could select the links in the canonical net-
work only. So node A could be “switched off” and there 
will not be hidden-node problem if we set CSRange to 
2.7d0.  
4.2 Experiments and Discussions 
To conserve space, this paper will not go into the de-
tails of the formulation of the HFP problem, and the HFP 
experimental methodology. For the interested readers, 
such details can be found in the Appendix of our techni-
cal report [12]. In a nutshell, our approach extends that of 
[13] by additionally taking into consideration the effects 
of carrier sensing and HFD requirements. We also pro-
vide a branch-and-bound heuristic algorithm for the re-
sulting integer linear program (ILP). Here we only pre-
sent the performance results of experiments on schemes 1, 
2, and 3 and their implications. Solving the ILP of scheme 
3 is computationally intensive. The experimental results 
of scheme 3 in this subsection are therefore obtained us-
ing our branch-and-bound heuristic. Schemes 1 and 2 are 
still solved in an optimal manner. As will be seen, even 
with a suboptimal heuristic, scheme 3 still yields better 
results. 
In our experiments, we put the nodes inside a disk of 
radius one. A sink node is placed at the center of the disk, 
and six source nodes are placed evenly at the boundary of 
the disk spaced evenly apart. For each source node, a 
node is randomly generated within the transmission 
range 0.4. More nodes are generated similarly with refer-
ence to the newly created node until a node is within the 
transmission range from the sink node. In this way, we 
could ensure that there is a path from any source node to 
the sink node. By setting the transmission range to 0.4, the 
data from the source nodes will need at least three hops 
to reach the sink node. 
Table V shows the experiment results for five ran-
domly generated networks, Net1, Net2, …, Net5 . T1, T2 
and T3 are the throughputs of the three schemes. In ob-
taining Ti, we vary the offered load at the source nodes 
until the highest throughput is obtained [7]. From Table V, 
scheme 3 has improvements of 4.8% to 43.8% over scheme 
1, and 4.8% to 23.2% over scheme 2. As related earlier, we 
did not solve scheme 3 optimally, but rather used a heu-
ristic. Therefore, the CSRange (CS) found for HFP in the 
experiments may not be the shortest possible CSRange.  
Nevertheless, the result shows that the solutions of 
scheme 3 exhibit some properties similar to the canonical 
network, as shown in Fig. 12. We discuss the similarities 
in the following paragraph. 
First, for scheme 3, CSRange/TxRange (CS3/TX) for 
Fig. 17. Example of HFP. 
1.732d0 
0.973d0 
2.62d0 
3.417d0 
12 IEEE TRANSACTIONS ON MOBILE COMPUTING,  MANUSCRIPT ID 
Net1 to Net5 is in the range of 2.62 to 3.417, which is the 
CSRange region we mentioned near the end of Section III-
A for achieving the capacity of 3L/4 in a canonical net-
work.  Second, exactly three paths leading to the sink 
node are used, which is the same as the 3-chain canonical 
network (Fig. 18). This gives us an intuition that the ca-
nonical network is in a sense optimal – that is, we may 
want to form a structure similar to the canonical network 
by turning on only some of the relay nodes. 
TABLE V  
RESULT FOR THROUGHPUT OF RANDOM NETWORKS 
 T1 T2 T3 T3/T1 T3/T2 CS3 CS3/TX 
Net1 0.4 0.5 0.575 1.438 1.15 1.253 3.133 
Net2 0.412 0.439 0.541 1.313 1.232 1.265 3.162 
Net3 0.429 0.451 0.536 1.25 1.189 1.265 3.163 
Net4 0.429 0.5 0.6 1.4 1.2 1.205 3.012 
Net5 0.5 0.5 0.524 1.048 1.048 1.287 3.216 
Neti: Network i 
TX: Transmission range, set to 0.4 in experiments 
T1: Throughput when CSRange=3.78 TX (Scheme 1)  
T2: Throughput when CSRange is minimized with respect to links in 
the network (Scheme 2). 
T3: Throughput when only some links in the network are activated 
(HFP) (Scheme 3) 
CS3: CSRange for Scheme 3 
4.3 Applying Canonical Network to General 
Networks 
The preceding subsection shows that HFP outperforms 
other HN-free schemes in terms of throughput. We also 
observe from the results that (i) HFP solutions for a ran-
dom network exhibit structures similar to that of the 3-
chain canonical network near the center. Furthermore, 
from simulation results in Section III-B (see Fig. 13), we 
observe that (ii) IEEE 802.11 scheduling in the canonical 
network achieves throughput close to that of perfect 
scheduling. Observations (i) and (ii) lead to the following 
general engineering principle: 
Centric Canonical-Network Design Principle 
 In a general multi-hop network densely populated 
with relay nodes, instead of solving the complex 
HFP optimization problem, as a heuristic, we may 
select routes near the center so that the structure 
looks like that of a 3-chain canonical network.  
 If we have the freedom for node placement near the 
center during the network design process, then the 
nodes around the center should be structured like a 
3-chain canonical network.  
Note that there is no restriction on nodes far away from 
the center, and that they can be randomly distributed (see 
Fig. 19 for illustration).  
This subsection investigates the application of the Cen-
tric Canonical-Network Design Principle. For our simula-
tions, we assume there is a disk with radius 2000m. 
Within the disk, there is an inner circle with radius 980m. 
As illustrated in Fig. 19, the inner circle is structured as a 
canonical network. The nodes outside the inner circle are 
placed randomly with the constraint that the smallest 
distance between any two of them is not shorter than 
125m. The nodes outside the inner circle act as source 
nodes and relay nodes at the same time, while the nodes 
inside the inner circle act merely as relay nodes. We refer 
to the network structure in Fig. 19 as centric canonical net-
work, alluding to the fact that only the vicinity of the cen-
ter looks like a canonical network. Henceforth, we shall 
refer to vicinity of the center as the canonical network and 
the randomly-structured part beyond that as the random 
network. The number of nodes beyond the inner circle is 
284. We use the default setting in NS2, CSRange of 550m 
and TXRange of 250m, for performing the simulations. 
AODV routing is assumed. For the canonical network, 
with respect to Fig.12, we set d0=200m. Since 
550m/200m=2.75, which is within the range 2.62 to 3.417 
(see Fig. 12), the canonical network is HN free.  The ran-
dom network, however, is not necessarily HN-free in our 
experiments. The assumption is reasonable, and corre-
sponds to the real situation in which we only try to de-
sign the network architecture near the center judiciously 
by careful node placement.  
As a benchmark, we have also conducted simulation 
experiments for a random network in which the inner 
circle is populated by 146 randomly placed nodes with no 
constraint on the node-to-node distance. In all our simula-
tions below, the offered load to the source nodes are var-
ied until we find the largest throughput for each network 
structure [7]. Simulation of 802.11 with AODV yields a 
Fig. 18. Random Networks and HFP.. 
CHAN ET AL.:  MANY-TO-ONE THROUGHPUT CAPACITY OF IEEE 802.11 MULTI-HOP WIRELESS NETWORKS 13 
throughput of 1.16 Mbps for the benchmark random net-
work, and a throughput of 2.79Mbps for the centric ca-
nonical network. That is, the throughput of the centric 
canonical network is more than 100% higher. This dem-
onstrates that a carefully designed structured network around 
the data center yields superior performance. 
Although the improvement is significant, 2.79 Mbps is 
still a bit lower than the 4.30Mbps simulated throughput 
of the 3-chain canonical network in Section III. It turns out 
that the centric-canonical network actually fails to take 
another bottleneck into account. That is, in addition to the 
bottleneck around the center, there is also a bottleneck at 
the “confluence” of the random network and the canoni-
cal network, where the canonical network may branch off 
to many paths in the random network, and the nodes on 
these branches may interfere with each other in a negative 
way to bring down the throughput.  
To mitigate the bottleneck at the confluence, we mod-
ify the canonical network as in Fig. 20. As shown, each 
chain in the canonical network only branches out further 
into two chains before meeting the random network. We 
refer to this design as the manifold canonical network, in 
reference to the fact that there are actually two “layers” of 
canonical networks. The first one is at the center, with 
three more before meeting the random network. We refer 
to this design principle as the Manifold Canonical-Network 
Design Principle.  
In our simulations, the manifold canonical network is 
placed inside an inner circle of radius 1026s. The nodes 
beyond the manifold canonical network are randomly 
generated with the same constraints as the nodes gener-
ated beyond the inner circle of the centric canonical net-
work. As the inner circle is larger than previous networks 
and the number of nodes (which are relay nodes) in the 
manifold canonical network is 31, to keep the total num-
ber of nodes in the network constant, the number of ran-
domly generated nodes (which are also the source nodes) 
outside the inner circle is decreased from 284 to 269. We 
set CSRange 550m and d0=200m in the manifold canonical 
network in our simulation (see Fig. 12). Simulation of 
802.11 with AODV routing yields a throughput of 
3.34Mbps, which is 20% higher than that of the centric 
canonical network. For fair bench-marking, we again per-
form the simulation with the inner circle replaced by ran-
dom node placements, but this time with the inner circle 
having a radius of 1026m, as in the manifold canonical 
network. The simulation of the benchmark network 
yields a throughput of 1.31Mbps. We find that the 
throughput of the manifold canonical network is more 
than 150% over that by the pure random benchmark net-
work. 
We have also investigated the robustness of the mani-
fold canonical network with respect node positioning. 
Simulations show that 5% position error of the nodes in 
the two “layers” of the canonical network only decreases 
the throughput by 10% on average, as summarized in 
TABLE VI.  
TABLE VI  
COMPARISON OF THROUGHPUTS OF MANIFOLD CANONICAL NETWORKS WITH 
AND WITHOUT NODE POSITION ERROR 
Throughput without 
position error (Mbps) 
Throughput with 
position error (Mbps) 
Ratio 
3.44 3.45 1.003 
3.35 3.11 0.928 
3.32 3.18 0.958 
3.29 2.94 0.894 
3.37 2.96 0.878 
3.36 2.82 0.839 
5 CONCLUSION 
In this paper, we have studied the throughput capacity 
of many-to-one multi-hop wireless networks based on the 
IEEE 802.11 MAC protocol. We have defined a class of 
canonical networks whose throughput capacity serves as a 
benchmark for general networks. Specifically, the throu-
ghput capacity of canonical networks under 802.11 is up-
per bounded by 3L/4, where L is the single-link capacity, 
when the source nodes are at least two hops away from 
the sink.  
If we restrict our attention to networks in which all 
links have the same length, the upper bound is further 
reduced to 2L/3. While the 3L/4 result in the previous 
paragraph has been established for canonical networks 
only, the 2L/3 result applies to general networks so long 
as (i) source nodes are at least two hops away from the 
data center; (ii) all links have the same length.  
Our 802.11 simulation results yield throughputs are 
Fig. 19. Example of a centric-canonical. 
Fig. 20. Example of a manifold canonical network. 
-2000
-1000
-2000 -1000 0 1000 2000
-2000
-1000
-2000 -1000 0 1000 2000
14 IEEE TRANSACTIONS ON MOBILE COMPUTING,  MANUSCRIPT ID 
around 0.690L (for variable-link-length canonical net-
works) and 0.619L (for equal-link-length canonical net-
works) under the worse-case scenario when the source 
nodes are very far away and their traffic needs to go 
through many hops before reaching the sink node. That is, 
the simulated throughputs are reasonably close to the 
theoretical upper bounds of 3L/4 and 2L/3, respectively. 
This is a quite positive result considering the fact that 
802.11 schedules transmissions in a rather random man-
ner, while the examples we gave in Section III-A to 
achieve throughputs of 3L/4 and 2L/3 require very spe-
cific transmission orders.  
The above results also imply that using variable link 
length is more desirable than using fixed link length. 
When the network is very dense (say, infinitely dense), if 
each node chooses a routing path with maximum hop 
distance in each hop, an equivalent network with fixed 
link length dmax, may result, where dmax is the maximum 
hop-distance governed by the transmit power and re-
ceiver sensitivity. This max-hop-distance routing is not 
optimal for the many-to-one traffic pattern.  
This paper has considered both canonical networks 
with and without hidden nodes. Our results indicate that 
hidden-node free designs (HFD) yield higher throughput 
capacity. This is in contrast to the many-to-many case 
where HFD may not yield better throughputs [5] [6] and 
may actually decrease the overall system throughput.   
For general networks, we have used the concept of 
HFP (Hidden-node Free Path) to set up routes that yield 
optimal throughput. HFP routing, however, requires 
solving a complicated integer linear program, which may 
not be practical. Fortunately, our experimental results 
indicate that the routes selected by the HFP algorithm 
resemble the structure of the canonical network near the 
center. This gives rise to simple network design principles 
that attempt to approximate the canonical network struc-
ture in the center. Specifically, we have shown that a 
manifold canonical network structure near the sink can yield 
superior throughput that is as much as 150% higher than 
that of a dense random network.  A key insight is that in a 
network densely populated with nodes, deliberating turn-
ing off some nodes in the area near the sink node so as to 
approximate the canonical network structure can actually 
give rise to better throughput performance. 
REFERENCES 
[1] P. Gupta and P. R. Kumar, “The Capacity of Wireless Net-
works,” IEEE Transactions on Information Theory, vol. IT-46, 
March 2000. 
[2] D. Marco, E.J. Duarte-Melo, M. Liu, and D.L. Neuhoff, “On the 
Many-to-One Transport Capacity of a Dense Wireless Sensor 
Network and the Compressibility of Its Data,” IPSN 2003, pp. 1-
16, April 2003 
[3] E.J. Duarte-Melo, M. Liu, “Data-Gathering Wireless Sensor 
Networks: Organization and Capacity,” Computer Networks, vol. 
43, pp.519-537, Nov. 2003 
[4] IEEE Computer Society LAN MAN Standards Committee, 
“Wireless LAN Medium Access Control (MAC) and Physical 
Layer (PHY) Specifications,” IEEE Std. 802.11, 1997 
[5] L. Jiang, “Improving Capacity and Fairness by Elimination of 
Exposed and Hidden Nodes in 802.11 Networks,” M.Phil Thesis, 
The Chinese University of Hong Kong, Jun. 2005. 
[6] L. Jiang and S. C. Liew, “Removing Hidden Nodes in IEEE 
802.11 Wireless Networks,” IEEE VTC, Sept. 2005. More com-
prehensive version to appear as “Hidden-node Removal and Its 
Application in Cellular WiFi Networks”  IEEE Trans. On Vehicu-
lar Technology, Nov 2007. 
[7] P.C. Ng and S.C. Liew, “Offered Load Control in IEEE802.11 
Multi-hop Ad-hoc Networks,” IEEE MASS, Oct. 2004. More 
comprehesive version to appear as “Throughput Analysis of 
IEEE 802.11 Multi-hop Ad hoc Networks,” IEEE/ACM Transac-
tions on Networking, June 2007. 
[8] The Institute of Electrical and Electronics Engineers Inc. Press, 
“Wireless Communications Principles and Practice” 
[9] J. Li, C. Blake et al., “Capacity of Ad Hoc Wireless Networks,” 
ACM MobiCom, July 2001 
[10] “The Network Simulator NS-2”, http://www.isi.edu 
/nsnam/ns 
[11] P. C. Ng, S. C. Liew, and L. Jiang, “Achieving Scalable Perform-
ance in Large-Scale IEEE 802.11 Wireless Networks,” IEEE 
WCNC, March 2005 
[12] http://www.ie.cuhk.edu.hk/soung/many_to_one, Technical 
Report with Appendix on HFP Algorithm and NS-2 code modeling 
multiple interferences. 
[13] K. Jain, J. Padhye et al, “Impact of Interference on Multi-hop Wire-
less Network Performance”, MobiCom ’03, Sept. 2003  
Chi Pan Chan received his B.Eng and M.Phil. degrees in Informa-
tion Engineering from The Chinese University of Hong Kong in 2004 
and 2006. His research was mainly related to capacity analysis in 
multi-hop wireless networks. He is now involved in the software in-
dustry in the field of multimedia and networking. 
Soung Chang Liew received his S.B., S.M., E.E., and Ph.D. de-
grees from the Massachusetts Institute of Technology. From March 
1988 to July 1993, Soung was at Bellcore (now Telcordia), New Jer-
sey, where he engaged in Broadband Network Research. Soung is 
currently Professor and Chairman of the Department of Information 
Engineering, the Chinese University of Hong Kong. Soung’s current 
research interests focus on wireless networking. Recently, Soung 
and his student won the best paper awards in the 1st IEEE Interna-
tional Conference on Mobile Ad-hoc and Sensor Systems (IEEE 
MASS 2004) the 4th IEEE International Workshop on Wireless Local 
Network (IEEE WLN 2004). Separately, TCP Veno, a version of TCP 
to improve its performance over wireless networks proposed by 
Soung and his student, has been incorporated into a recent release 
of Linux OS. Publications of Soung can be found in 
www.ie.cuhk.edu.hk/soung. Besides academic activities, Soung is 
also active in the industry. He co-founded two technology start-ups in 
Internet Software and has been serving as consultant to many com-
panies and industrial organizations. He is currently consultant for the 
Hong Kong Applied Science and Technology Research Institute 
(ASTRI), providing technical advice as well as helping to formulate 
R&D directions and strategies in the areas of Wireless Internetwork-
ing, Applications, and Services. 
An Chan received the B.Eng degree in Information Engineering from 
The Chinese University of Hong Kong, Hong Kong in 2005. He is 
currently working toward a M.Phil degree in the same field at The 
Chinese University of Hong Kong. His research interests are in QoS 
over wireless network and advanced IEEE 802.11-like multi-access 
protocols. He is a graduate student member of IEEE.
ABSTRACT
  This paper investigates the many-to-one throughput capacity (and by symmetry,
one-to-many throughput capacity) of IEEE 802.11 multi-hop networks. It has
generally been assumed in prior studies that the many-to-one throughput
capacity is upper-bounded by the link capacity L. Throughput capacity L is not
achievable under 802.11. This paper introduces the notion of "canonical
networks", which is a class of regularly-structured networks whose capacities
can be analyzed more easily than unstructured networks. We show that the
throughput capacity of canonical networks under 802.11 has an analytical upper
bound of 3L/4 when the source nodes are two or more hops away from the sink;
and simulated throughputs of 0.690L (0.740L) when the source nodes are many
hops away. We conjecture that 3L/4 is also the upper bound for general
networks. When all links have equal length, 2L/3 can be shown to be the upper
bound for general networks. Our simulations show that 802.11 networks with
random topologies operated with AODV routing can only achieve throughputs far
below the upper bounds. Fortunately, by properly selecting routes near the
gateway (or by properly positioning the relay nodes leading to the gateway) to
fashion after the structure of canonical networks, the throughput can be
improved significantly by more than 150%. Indeed, in a dense network, it is
worthwhile to deactivate some of the relay nodes near the sink judiciously.

<|endoftext|><|startoftext|>
Scanning Tunneling Spectroscopy in the Superconducting State and Vortex Cores of
the β-pyrochlore KOs2O6
C. Dubois,∗ G. Santi, I. Cuttat, C. Berthod, N. Jenkins, A. P. Petrović, A. A. Manuel, and Ø. Fischer
DPMC-MaNEP, Université de Genève, Quai Ernest-Ansermet 24, 1211 Genève 4, Switzerland
S. M. Kazakov, Z. Bukowski, and J. Karpinski
Laboratory for Solid State Physics ETHZ, CH-8093 Zürich, Switzerland
(Dated: October 24, 2018)
We performed the first scanning tunneling spectroscopy measurements on the pyrochlore super-
conductor KOs2O6 (Tc = 9.6 K) in both zero magnetic field and the vortex state at several temper-
atures above 1.95 K. This material presents atomically flat surfaces, yielding spatially homogeneous
spectra which reveal fully-gapped superconductivity with a gap anisotropy of 30%. Measurements
performed at fields of 2 and 6 T display a hexagonal Abrikosov flux line lattice. From the shape of
the vortex cores, we extract a coherence length of 31–40 Å, in agreement with the value derived from
the upper critical field Hc2. We observe a reduction in size of the vortex cores (and hence the coher-
ence length) with increasing field which is consistent with the unexpectedly high and unsaturated
upper critical field reported.
PACS numbers: 74.70.Dd, 74.50.+r, 74.25.Qt
The discovery of superconductivity in the β-pyrochlore
osmate compounds AOs2O6 (A = K, Rb, Cs) [1] has high-
lighted the question of the origin of superconductivity
in classes of materials which possess geometrical frustra-
tion [2, 3]. Interest has been predominantly focused on
the highest-Tc compound KOs2O6 which presents many
striking characteristics. In particular, the absence of in-
version symmetry in its crystal structure [4] raises the
question of its Cooper pair symmetry and the possibility
of spin singlet-triplet mixing [5, 6].
The pyrochlore osmate compound KOs2O6 displays a
critical temperature Tc = 9.6 K, the largest in its class of
materials (CsOs2O6 and RbOs2O6 which differ only by
the nature of the alkali ion have Tcs of 3.3 and 6.3 K re-
spectively). Although band structure calculations show
that the K ion does not influence the density of states
(DOS) at the Fermi level [7, 8], it seems to affect sev-
eral key properties [9]. In particular, the first order
phase transition revealed by specific heat measurements
in magnetic fields at the temperature Tp ≈ 7.5 K has
been ascribed to a “freezing” of its rattling motion [10].
The negative curvature of the resistivity as a function
of temperature also indicates a large electron-phonon
scattering [11]. Specific heat measurements [12] sug-
gest the coexistence of strong electron correlations and
strong electron-phonon coupling, two generally antago-
nistic phenomena with respect to the superconducting
pairing symmetry. The nature of the symmetry remains
a controversial subject in the literature. NMR [13] and
µSR [14] data suggest anisotropic gap functions with
nodes whereas thermal conductivity experiments [15] fa-
vor a fully-gapped state.
The peculiar behavior of KOs2O6 is demonstrated
by its upper critical magnetic field Hc2, whose tem-
perature dependence is linear down to sub-Kelvin tem-
peratures and whose amplitude is above the Clogston
limit [16]. One possible interpretation is the occur-
rence of spin-triplet superconductivity driven by spin-
orbit coupling [5, 6]. Alternatively, it has also been sug-
gested that this behavior can be explained by the peculiar
topology of the Fermi surface (FS) sheets of KOs2O6,
assuming that superconductivity occurs mainly on the
closed sheet [16].
The understanding of the physics of this compound
would greatly benefit from a detailed knowledge of the
local density of states (LDOS). Scanning Tunneling Spec-
troscopy (STS) is an ideal tool for this, particularly
since it allows one to map the vortices in real space and
also access the normal state below Tc by probing their
cores [17, 18, 19, 20]. In this Letter we present a detailed
STS study of KOs2O6 single crystals, including the first
vortex imaging in this material.
The KOs2O6 single crystals were grown from Os and
KO2 in oxygen-filled quartz ampoules. Their dimensions
are around 0.3 × 0.3 × 0.3 mm3. The details of their
chemical properties as well as their growth conditions can
be found in Ref. 4. AC susceptibility measurements show
a very sharp superconducting transition (∆Tc = 0.35 K).
Our measurements are carried out using a home-built low
temperature scanning tunneling microscope featuring a
compact nanopositioning stage [21] to target the small-
sized crystals. Electrochemically etched iridium tips are
used for STS measurements on as-grown single crystal
surfaces and the differential conductivity was measured
using a standard AC lock-in technique.
The surface topography of as-grown samples (Fig. 1a)
reveals atomically flat regions speckled with small corru-
gated islands a few Ångströms high whose spectroscopic
characteristics are noisy and not superconducting (thus
restraining our field of view for spectroscopic imaging).
http://arxiv.org/abs/0704.0529v1
0 50 100 150 200
PSfrag replacements
x (nm)
Distance d (nm)
Bias voltage V (mV)
100 (Å)
Conductance (shifted, arb. units)
−5 −4 −3 −2 −1 0 1 2 3 4 5
PSfrag replacements
x (nm)
y (nm)
Distance d (nm)
Height z (Å)
Bias voltage V (mV)
)(a) (b)
FIG. 1: (a) Large-scale topography of KOs2O6 (T = 2 K,
Rt = 60 MΩ); the box shows the measurement area for the
vortex maps. (b) Spectroscopic trace along a 100 Å path
taken on an atomically flat region with one spectrum every
1 Å. The spectra show raw data offset vertically for clarity
(T = 2 K, Rt = 20 MΩ).
The large flat regions display highly homogeneous super-
conducting spectra (Fig. 1b), which were perfectly repro-
ducible over the timescale of our experiments (4 months).
We have checked that the spectra obtained by varying
the tunnel resistance Rt all collapse onto a single curve,
thus confirming true vacuum tunneling conditions. We
have also verified that the numerical derivative of the
tunnel current with respect to the voltage gives the same
spectroscopic signature as the dI/dV lock-in signal. We
stress that all measurements presented in this paper are
raw data.
The lack of inversion symmetry in this compound to-
gether with several experimental findings raises the ques-
tion of the symmetry of the gap function. In order to
clarify this point, we have fitted our data to several sym-
metry models, focusing on the question of the presence
or absence of nodes and the amplitude of any possible
gap anisotropy. We therefore considered three scenarii
with an approximate angular dependence of the gap, i.e.
an isotropic s-wave (∆0), a d-wave (∆ cos 2φ) with nodes
and an “anisotropic” s-wave (∆0 + ∆sinφ) which has
the same angular dependence as the s-p-wave singlet-
triplet mixed state [6]. We do not take the real topol-
ogy of the FS [7] into account, since it comprises two
3D Fermi sheets and is hence unlikely to have any sig-
nificant effect on the gap structure. For an anisotropic
gap, ∆(φ), the quasiparticle DOS is given by N(ω) ∝
|Re[〈(ω+iΓ)/
(ω + iΓ)2 − |∆(φ)|2〉φ]| where Γ is a phe-
nomenological scattering rate. In addition, we included
broadenings due to the experimental temperature and
the lock-in in our fits. The results are presented in Fig. 2.
The d-wave model can be rejected at this stage since its
zero bias conductance (ZBC) is larger than in experi-
ment (increasing Γ in the model can only increase the
ZBC). The differences between symmetries appear much
more clearly in the second derivative spectrum (d2I/dV 2,
Fig. 2d) which is not surprising as it emphasizes the varia-
tions of the DOS on a small energy scale and is very sensi-
tive to the model parameters (in contrast with the dI/dV
−4 −3 −2 −1 0 1 2 3 4
V (mV)
Experiment
anisotropic s−wave
s−wave
d−wave
PSfrag replacements
x (nm)
y (nm)
Distance d (nm)
Height z (Å)
Bias voltage V (mV)
100 (Å)
Conductance (shifted, arb. units)
−2 −1 0 1 2
V (mV)
PSfrag replacements
x (nm)
y (nm)
Distance d (nm)
Height z (Å)
Bias voltage V (mV)
100 (Å)
Conductance (shifted, arb. units)
−5 0 5
V (mV)
1.95 K
3.10 K
4.00 K
5.10 K
6.00 K
9.00 K
10.00 K
PSfrag replacements
x (nm)
y (nm)
Distance d (nm)
Height z (Å)
Bias voltage V (mV)
100 (Å)
Conductance (shifted, arb. units)
anisotropic
(meV) s d s-wave
∆0 1.22 - 1.09
∆ - 1.52 0.40
Γ 0.12 0 0.05
2.93 3.66 3.58
(a) (b)
T = 1.95 K
T = 1.95 K
FIG. 2: Experimental and theoretical tunneling spectra.
(a) Normalized dI/dV spectra at different temperatures from
1.95 to 10 K (spectra are offset vertically for clarity). (b) Pa-
rameters for the different theoretical models. (c) Comparison
of the experimental spectrum at low temperature and low en-
ergy with the different theoretical models; the color codes are
explained in (d). (d) Same as (c) for the second derivative
d2I/dV 2.
curve). The best fit is clearly given by the “anisotropic”
s-wave model with an anisotropy of around 30%. With
respect to the singlet-triplet mixed state, we note that
we do not see any evidence in our data for a second co-
herence peak arising from spin-orbit splitting. Since the
3D nature of both sheets implies that tunneling takes
place in both of them, the absence of a second peak also
rules out the possibility of two different isotropic gaps
on separate FS sheets. Our results would however be
compatible with multiband superconductivity with two
(overlapping) anisotropic gaps. Finally, we see no signa-
ture of a normal-normal tunneling channel in our junc-
tion, suggesting that all electrons involved in the tunnel-
ing process come from the superconducting condensate.
To investigate the temperature evolution of the quasi-
particle DOS, we acquired tunneling conductance spec-
tra at different temperatures between 1.95 K and 10 K
(Fig. 2a). The closure of the gap at the bulk Tc shows
that we are probing the bulk properties of KOs2O6. This
−6 −4 −2 0 2 4 6
bias voltage V (mV)
−6 −4 −2 0 2 4 6
bias voltage V (mV)
(a) (b)H = 2 T H = 6 T
FIG. 3: Spectroscopic traces at T = 2 K across vortices for
a field of 2 T (a) and 6 T (b). The spectra at the vortex
centers are highlighted in red. The spatial variation of the
conductance is shown in the corresponding insets.
is further confirmed by the fact that similar spectra were
also obtained on freshly cleaved surfaces. The totally
flat conductance spectra at higher temperature show no
support for a pseudogap in the DOS above Tc, imply-
ing that the steep decrease in the 1/(T1T ) curve around
16 K in NMR data [13] must have a different origin. The
spectra taken between 6 and 9 K (not shown) were very
noisy. This could be explained by the proximity to the
first order transition at Tp ≃ 7.5 K [10].
The BCS coupling ratio 2∆max/kBTc inferred from
our measured gaps and critical temperature is about 3.6
for the anisotropic s-wave case, a value slightly smaller
than that reported from specific heat measurements [12].
However, we stress that STS is a direct probe of the su-
perconducting gap. Our findings therefore lead us to the
conclusion that KOs2O6 is fully gapped with a significant
anisotropy of around 30%.
We now focus on measurements performed in an ap-
plied magnetic field. In the vortex cores whose radial size
is roughly given by the coherence length ξ, superconduc-
tivity is suppressed leading to a drastic change in the
LDOS which can be measured by STM. Our measure-
ments were performed for two fields, 2 and 6 T, over the
particularly flat region of about 60 × 60 nm2 (Fig. 1a).
Each measurement was taken at 2 K with a typical ac-
quisition time of 40 hours.
The results are presented in Figs 3 and 4. The vor-
tex maps (insets of Fig. 3 and Figs 4a and 4b) show
the ZBC normalized to the conductance at 6 meV. Fig. 3
displays the spectra taken along traces passing through
vortex cores for each of the two fields considered. The
suppression of superconductivity and its effect on the
conductance in a vortex core can clearly be seen. The
vortex maps show a roughly hexagonal vortex lattice
with vortex spacings d = 352 ± 17 Å and 216 ± 21 Å
at 2 and 6 T respectively, in agreement with the spacings
2Φ0/H
expected for an Abrikosov hexago-
nal lattice [22], i.e. 345 Å and 199 Å. We ascribe the
variations in the core shapes and the deviation from a
perfectly hexagonal lattice to vortex pinning. In partic-
ular, the vortex identified by the arrow in Fig. 4 appears
to be split. We attribute this to the vortex oscillating
between two pinning centers during the measurement, a
situation which has been seen in other compounds [23].
One should also note that the islands (surface defects)
at the border of the measurement area (Fig. 1) could
influence the vortex core shapes and positions.
In order to estimate the coherence length ξ from our
measurements, we now consider the spatial dependence of
the ZBC. Due to the proximity of the vortices, we model
the LDOS as a superposition of isolated vortex LDOS
which can be expressed as N(ω, r) =
n |un(r)|
δ(ω −
En) + |vn(r)|2 δ(ω + En), where ψn(r) = (un(r), vn(r))
is the wave function of the nth vortex core state and
En its energy. An approximate solution for the iso-
lated vortex was given long ago [24] in which the ra-
dial dependence of each ψn(r) consists of a rapidly os-
cillating n-dependent Bessel function multiplied by a
cosh−1/π(r/ξ) envelope common to all states. We there-
fore construct a phenomenological model for our 2D ZBC
maps, σ(ω = 0, r) ∝ N(ω = 0, r), by retaining the slowly
varying parts of the wave functions alone, i.e.
σ(ω = 0, r) = σ0 + Λ
|r − ri|
where σ0 = 0.13 is the residual normalized conductance
at zero bias in the absence of field (Fig. 2c), Λ a scaling
factor, ξ the coherence length and the sum runs over all
the vortices with positions ri in the map. Using (1), we
fitted ri and ξ over the entire map for each field, thus
considering all imaged vortices to determine ξ.
The results from the 2D fits are presented in Fig. 4c
and d in map format and along traces selected to pass
through vortex cores in Fig. 4e and f. The traces help
to visualize the spatial extent of the vortices and assess
the (extremely high) quality of the 2D fits. We first ob-
serve that the normalized ZBC between the vortices is
slightly enhanced at H = 2 T but increases strongly at
H = 6 T with respect to the value at zero-field (Fig. 2c),
indicating a significant core overlap. From our data taken
at T = 2 K, we obtain ξ = 35 ± 3 Å and 45 ± 7 Å at
H = 6 and 2 T respectively (the uncertainties are esti-
mated from the spread of the results obtained on several
maps: two for 6 T and three for 2 T). Using Ginzburg-
Landau theory, we extrapolate the corresponding T = 0
values as ξ = 31± 3 and 40± 6 Å respectively, consistent
with the value derived from Hc2. Furthermore, our re-
sults indicate that the vortex size decreases with increas-
ing field and, although at the limit of the error bars, we
believe this trend to be genuine. In addition, this finding
is consistent with the abnormally large Hc2: if the vor-
tices become smaller as the field increases, the material
can accommodate more vortices before the breakdown
0 20 40 60
Distance (nm)
0 20 40 60
Distance (nm)
) (e)
x (nm)
0 10 20 30 40 50
x (nm)
0 10 20 30 40 50 60
60 (c)
H = 2 T H = 6 T
0 0.2 0.4 0.6 0.8 1
FIG. 4: (a), (b) Experimental ZBC maps (T = 2 K) normal-
ized to the background conductance at 2 and 6 T respectively
with corresponding fits (c), (d); large values (red) correspond
to normal regions (i.e. vortex cores) and low values (blue) to
superconducting (gapped) regions. (e), (f) Experimental ZBC
profiles across vortex centers together with the corresponding
profiles from the 2D fits (red lines).
of superconductivity, leading to a higher upper critical
field. This correlates with the observed temperature de-
pendence of the upper critical field.
We find that the spectra at the vortex centers are flat
for both fields (Fig. 3), showing the presence of localized
quasiparticle states in the vortex cores. However, our
spectra show no excess spectral weight at or close to zero
bias and thus no ZBCP which is the generally expected
signature of vortex core states. The absence of a ZBCP
is at first glance striking considering the large mean free
path ℓ ≈ 200 nm ≫ ξ in KOs2O6 [15]. In fact, this ab-
sence is common to many non-cuprate superconductors,
the only known exceptions being 2H-NbSe2 [17, 25, 26]
and YNi2B2C [27]. Although no definitive theory cur-
rently exists to explain such an absence, a possible ex-
planation assumes that the scattering rate is strongly en-
hanced in the vortex cores. This interpretation is sup-
ported by our numerical solutions of the Bogoliubov-de
Gennes equations for a single vortex with an r-dependent
scattering rate Γ. Furthermore, these simulations show a
radial dependence of the LDOS which is fully consistent
with (1).
In conclusion, we have presented the first scanning tun-
neling spectroscopic measurements on superconducting
KOs2O6. The fitted spectra demonstrate that KOs2O6
is a fully-gapped superconductor with an anisotropy of
around 30%, possibly resulting from a s-p singlet-triplet
mixed state allowed by the lack of inversion symme-
try. We have imaged hexagonal vortex lattices matching
Abrikosov’s prediction for 2 and 6 T fields. Using Caroli-
de Gennes-Matricon theory we extract a field-dependent
coherence length of 31–40 Å, in good agreement with the
thermodynamic estimate fromHc2. The absence of a zero
bias conductance peak, the apparent field dependence of
ξ and the precise radial dependence of the LDOS all call
for deeper exploration.
We acknowledge T. Jarlborg, M. Decroux, I. Maggio-
Aprile and P. Legendre for valuable discussions and thank
P.E. Bisson, L. Stark and M. Lancon for technical sup-
port. This work was supported by the Swiss National
Science Foundation through the NCCR MaNEP.
∗ Electronic address: duboisc@mit.edu
[1] S. Yonezawa, Y. Muraoka, Y. Matsushita and Z. Hiroi,
J. Phys.:Condens. Matter 16, L9 (2004); ibid, J. Phys.
Soc. Jpn 73, 819 (2004); S. Yonezawa, Y. Muraoka and
Z. Hiroi, J. Phys. Soc. Jpn 73, 1655 (2004).
[2] P. W. Anderson, Mater. Res. Bull. 8, 153 (1973).
[3] H. Aoki, J. Phys.: Condens. Matter 16, V1 (2004).
[4] G. Schuck, S. Kazakov, K. Rogacki, N. Zhigadlo, and
J. Karpinski, Phys. Rev. B 73, 144506 (2006).
[5] P. A. Frigeri, D. F. Agterberg, A. Koga, and M. Sigrist,
Phys. Rev. Lett. 92, 097001 (2004); ibid, Phys. Rev. Lett.
93, 099903 (2004).
[6] N. Hayashi, Y. Kato, P. A. Frigeri, K. Wakabayashi, and
M. Sigrist, Physica C 437-38, 96 (2006).
[7] J. Kuneš, T. Jeong, and W. E. Pickett, Phys. Rev. B 70,
174510 (2004).
[8] R. Saniz, J. Medvedeva, L.-H. Ye, T. Shishidou, and
A. Freeman, Phys. Rev. B 70, 100505(R) (2004).
[9] J. Kuneš and W. E. Pickett, Phys. Rev. B 74, 094302
(2006).
[10] Z. Hiroi, S. Yonezawa, and J. Yamaura, cond-
mat/0607064, to be published in the Proceedings of
HFM2006 (J.Phys.: Condens. Matter) (2006).
[11] Z. Hiroi, S. Yonezawa, J. Yamaura, T. Muramatsu, and
Y. Muraoka, J. Phys. Soc. Jpn. 74, 1682 (2005).
[12] M. Brühwiler, S. Kazakov, J. Karpinski, and B. Batlogg,
Phys. Rev. B 73, 094518 (2006).
[13] K. Arai, J. Kikuchi, K. Kodama, M. Takigawa,
S. Yonezawa, Y. Muraoka, and Z. Hiroi, Physica B 359-
361, 488 (2005).
[14] A. Koda, W. Higemoto, K. Ohishi, S. R. Saha,
R. Kadono, S. Yonezawa, Y. Muraoka, and Z. Hiroi, J.
mailto:duboisc@mit.edu
Phys. Soc. Jpn. 74, 1678 (2005).
[15] Y. Kasahara, Y. Shimono, T. Shibauchi, Y. Matsuda,
S. Yonezawa, Y. Muraoka, and Z. Hiroi, Phys. Rev. Lett.
96, 247004 (2006).
[16] T. Shibauchi, L. Krusin-Elbaum, Y. Kasahara, Y. Shi-
mono, Y. Matsuda, R. D. McDonald, C. H. Mielke,
S. Yonezawa, Z. Hiroi, M. Arai, et al., Phys. Rev. B 74,
220506 (2006).
[17] H. Hess, R. Robinson, R. Dynes, J. J. Valles, , and
J. Waszczak, Phys. Rev. Lett. 62, 214 (1989).
[18] Y. DeWilde, M. Iavarone, U. Welp, V. Metlushko,
A. Koshelev, I. Aranson, G. Crabtree, and P. Canfield,
Phys. Rev. Lett. 78, 4273 (1997).
[19] M. Eskildsen, M. Kugler, S. Tanaka, J. Jun, S. Kazakov,
J. Karpinski, and Ø. Fischer, Phys. Rev. Lett. 89, 187003
(2002).
[20] N. Bergeal, V. Dubost, Y. Noat, W. Sacks, D. Roditchev,
N. Emery, C. Hérold, J.-F. Marêché, P. Lagrange, and
G. Loupias, Phys. Rev. Lett. 97, 077003 (2006).
[21] C. Dubois, P. E. Bisson, S. Reymond, A. A. Manuel, and
Ø. Fischer, Rev. Sci. Instrum. 77, 043712 (2006).
[22] A. A. Abrikosov, Sov. Phys.-JETP 5, 1174 (1957).
[23] B. Hoogenboom, M. Kugler, B. Revaz, I. Maggio-Aprile,
Ø. Fischer, and C. Renner, Phys. Rev. B 62, 9179 (2000).
[24] C. Caroli, P. de Gennes, and J. Matricon, Physics Letters
9, 307 (1964).
[25] F. Gygi and M. Schluter, Phys. Rev. B 41, 822 (1990).
[26] C. Renner, A. D. Kent, P. Niedermann, Ø. Fischer, and
F. Lévy, Phys. Rev. Lett. 67, 1650 (1991).
[27] H. Nishimori, K. Uchiyama, S. Kaneko, A. Tokura,
H. Takeya, K. Hirata, and N. Nishida, J. Phys. Soc. Jpn.
73, 3247 (2004).
ABSTRACT
  We performed the first scanning tunneling spectroscopy measurements on the
pyrochlore superconductor KOs2O6 (Tc = 9.6 K) in both zero magnetic field and
the vortex state at several temperatures above 1.95 K. This material presents
atomically flat surfaces, yielding spatially homogeneous spectra which reveal
fully-gapped superconductivity with a gap anisotropy of 30%. Measurements
performed at fields of 2 and 6 T display a hexagonal Abrikosov flux line
lattice. From the shape of the vortex cores, we extract a coherence length of
31-40 {\AA}, in agreement with the value derived from the upper critical field
Hc2. We observe a reduction in size of the vortex cores (and hence the
coherence length) with increasing field which is consistent with the
unexpectedly high and unsaturated upper critical field reported.

<|endoftext|><|startoftext|>
Introduction
In the low-energy limit string theory with D-branes gives rise to noncommutative field theory on
the branes when the string propagates in a nontrivial NS-NS two-form (B-field) background [1, 2,
3, 4]. In particular, if the open string has N=2 worldsheet supersymmetry, the tree-level target
space dynamics is described by a noncommutative self-dual Yang-Mills (SDYM) theory in 2+2
dimensions [5]. Furthermore, open N=2 strings in a B-field background induce on the worldvolume
of n coincident D2-branes a noncommutative Yang-Mills-Higgs Bogomolny-type system in 2+1
dimensions which is equivalent to a noncommutative generalization [6] of the modified U(n) chiral
model known as the Ward model [7]. The topological nature of N=2 strings and the integrability
of their tree-level dynamics [8] render this noncommutative sigma model integrable.1
Being integrable, the commutative U(n≥2) Ward model features a plethora of exact scattering
and no-scattering multi-soliton and wave solutions, i.e. time-dependent stable configurations on R2.
These are not only a rich testing ground for physical properties such as adiabatic dynamics or
quantization, but also descend to more standard multi-solitons of various integrable systems in
2+0 and 1+1 dimensions, such as sine-Gordon, upon dimensional and algebraic reduction. There
is a price to pay however: Nonlinear sigma models in 2+1 dimensions may be Lorentz-invariant
or integrable but not both [7, 11]. In fact, Derrick’s theorem prohibits the existence of stable
solitons in Lorentz-invariant scalar field theories above 1+1 dimensions. A Moyal deformation,
however, overcomes this hurdle, but of course replaces Lorentz invariance by a Drinfeld-twisted
version. There is another gain: The deformed Ward model possesses not only deformed versions
of the just-mentioned multi-solitons, but in addition allows for a whole new class of genuinely
noncommutative (multi-)solitons, in particular for the U(1) group [12, 13]! Moreover, this class is
related to the generic but perturbatively constructed noncommutative scalar-field solitons [14, 15]
by an infinite-stiffness limit of the potential [16].
In [12, 13] and [17]–[20] families of multi-solitons as well as their reduction to solitons of the
noncommutative sine-Gordon equations were described and studied. In the nonabelian case both
scattering and nonscattering configurations were obtained. For static configurations the issue of
their stability was analyzed [21]. The full moduli space metric for the abelian model was computed
and its adiabatic two-soliton dynamics was discussed [16].
Recall that the critical N=2 string theory has a four-dimensional target space, and its open
string effective field theory is self-dual Yang-Mills [8], which gets deformed noncommutatively in
the presence of a B-field [5]. Conversely, the noncommutative SDYM equations are contained [19]
in the equations of motion of N=2 string field theory (SFT) [22] in a B-field background. This
SFT formulation is based on the N=4 topological string description [23]. It is well known that
the SDYM model can be described in terms of holomorphic bundles over (an open subset of) the
twistor space2 [26] CP 3 and the topological N=4 string theory contains twistors from the outset.
The Lax pair, integrability and the solutions to the equations of motion by twistor and dressing
methods were incorporated into the N=2 open SFT in [27, 28]. However, this theory reproduces
only bosonic SDYM theory, its symmetries (see e.g. [29, 30, 31]) and integrability properties. It
is natural to ask: What string theory can describe supersymmetric SDYM theory [32, 33] in four
dimensions?
1For discussing some other noncommutative integrable models see e.g. [9, 10] and references therein.
2For reviews of twistor theory see, e.g., the books [24, 25].
There are some proposals [33, 34, 35, 36] for extending N=2 open string theory (and its SFT) to
be space-time supersymmetric. Moreover, it was shown by Witten [37] that N=4 supersymmetric
SDYM theory appears in twistor string theory, which is a B-type open topological string with the
supertwistor space CP 3|4 as a target space.3 Note that N<4 SDYM theory forms a BPS subsector
of N -extended super Yang-Mills theory, and N=4 SDYM can be considered as a truncation of the
full N=4 super Yang-Mills theory [37]. It is believed [43, 39] that twistor string theory is related
with the previous proposals [33, 34, 35, 36] for a Lorentz-invariant supersymmetric extension of
N=2 (and topological N=4) string theory which also leads to the N=4 SDYM model.
A dimensional reduction of the above relations between twistor strings and N=4 super Yang-
Mills and SDYM models was considered in [44, 45, 46, 47]. The corresponding twistor string
theory after this reduction is the topological B-model on the mini-supertwistor space P2|4. In [47]
it was shown that the 2N=8 supersymmetric extension of the Bogomolny-type model in 2+1
dimensions is equivalent to an 2N=8 supersymmetric modified U(n) chiral model on R2,1. The
subject of the current paper is an 2N≤8 version of the above supersymmetric Bogomolny-type
Yang-Mills-Higgs model in signature (− + +), its relation with an N -extended supersymmetric
modified integrable U(n) chiral model (to be defined) in 2+1 dimensions and the Moyal-type
noncommutative deformation of this chiral model. We go on to explicitly construct multi-soliton
configurations on noncommutative R2,1 for the corresponding supersymmetric sigma model field
equations. By studying the scattering properties of the constructed configurations, we prove their
asymptotic factorization without scattering for large times. We also briefly discuss a D-brane
interpretation of these soliton configurations from the viewpoint of twistor string theory.
2 Supersymmetric Bogomolny model in 2+1 dimensions
2.1 N -extended SDYM equations in 2+2 dimensions
Space R2,2. Let us consider the four-dimensional space R2,2 = (R4, g) with the metric
ds2 = gµνdx
µdxν = det(dxαα̇) = dx11̇dx22̇ − dx21̇dx12̇ (2.1)
with (gµν) = diag(−1,+1,+1,−1), where µ, ν, . . . = 1, . . . , 4 are space-time indices and α = 1, 2,
α̇ = 1̇, 2̇ are spinor indices. We choose the coordinates4
(xµ) = (xa, t̃) = (t, x, y, t̃) with a, b, . . . = 1, 2, 3 , (2.2)
and the signature (− ++−) allows us to introduce real isotropic coordinates (cf. [19, 6])
x11̇ = 1
(t− y) , x12̇ = 1
(x+ t̃) , x21̇ = 1
(x− t̃) , x22̇ = 1
(t+ y) . (2.3)
SDYM. Recall that the SDYM equations for a field strength tensor Fµν on R
2,2 read
εµνρσF
ρσ = Fµν , (2.4)
3For other variants of twistor string models see [38, 39, 40]. For recent reviews providing a twistor description of
super Yang-Mills theory, see [41, 42] and references therein.
4Our conventions are chosen to match those of [12] after reduction to the space R2,1 with coordinates (t, x, y).
where εµνρσ is a completely antisymmetric tensor on R
2,2 and ε1234 = 1. In the coordinates (2.3)
we have the decomposition
αα̇,ββ̇
= ∂αα̇Aββ̇ − ∂ββ̇Aαα̇ + [Aαα̇, Aββ̇ ] = εαβ Fα̇β̇ + εα̇β̇ Fαβ (2.5)
:= −1
αα̇,ββ̇
and Fαβ := −12ε
α̇β̇F
αα̇,ββ̇
, (2.6)
where εαβ is antisymmetric, εαβε
βγ = δ
α, and similar for ε
α̇β̇, with ε12 = ε1̇2̇ = 1. The gauge
potential (Aαα̇) will appear in the covariant derivative
, · ] . (2.7)
In spinor notation, (2.4) is equivalently written as
= 0 ⇔ F
αα̇,ββ̇
Fαβ . (2.8)
Solutions {Aαα̇} to these equations form a subset (a BPS sector) of the solution space of Yang-Mills
theory on R2,2.
N -extended SDYM in component fields. The field content of N -extended super SDYM is5
N = 0 Aαα̇ (2.9a)
N = 1 Aαα̇, χiα with i = 1 (2.9b)
N = 2 Aαα̇, χiα, φ[ij] with i, j = 1, 2 (2.9c)
N = 3 Aαα̇, χiα, φ[ij], χ̃
[ijk]
with i, j, k = 1, 2, 3 (2.9d)
N = 4 Aαα̇, χiα, φ[ij], χ̃
[ijk]
[ijkl]
with i, j, k, l = 1, 2, 3, 4 . (2.9e)
Here (Aαα̇, χ
[ij], χ̃
[ijk]
[ijkl]
) are fields of helicities (+1,+1
, 0,−1
,−1). These fields obey
the field equations of the N = 4 SDYM model, namely [33, 37]
= 0 , (2.10a)
Dαα̇χ
iα = 0 , (2.10b)
Dαα̇D
αα̇φij + 2{χiα, χjα} = 0 , (2.10c)
Dαα̇χ̃
α̇[ijk] − 6[χ[iα, φjk]] = 0 , (2.10d)
D γ̇α G
[ijkl]
+ 12{χ[iα, χ̃
} − 18[φ[ij ,D
φkl]] = 0 . (2.10e)
Note that the N < 4 SDYM field equations are governed by the first N+1 equations of (2.10),
where F
= 0 is counted as one equation and so on.
5We use symmetrization (·) and antisymmetrization [·] of k indices with weight 1
, e.g. [ij] = 1
(ij − ji).
2.2 Superfield formulation of N -extended SDYM
Superspace R4|4N . Recall that in the space R2,2 = (R4, g) with the metric g given in (2.1)
one may introduce purely real Majorana-Weyl spinors6 θα and ηα̇ of helicities +1
and −1
as anti-
commuting (Grassmann-algebra) objects. Using 2N such spinors with components θiα and ηα̇i for
i = 1, . . . ,N , one can define the N -extended superspace R4|4N and the N -extended supersymmetry
algebra generated by the supertranslation operators
Pαα̇ = ∂αα̇ , Qiα = ∂iα − ηα̇i ∂αα̇ and Qiα̇ = ∂iα̇ − θiα∂αα̇ , (2.11)
where
∂αα̇ :=
∂xαα̇
, ∂iα :=
and ∂iα̇ :=
∂ηα̇i
. (2.12)
The commutation relations for the generators (2.11) read
{Qiα, Qjα̇} = −2δ
iPαα̇ , [Pαα̇, Qiβ ] = 0 and [Pαα̇, Q
] = 0 . (2.13)
To rewrite equations of motion in terms of R4|4N superfields one uses the additional operators
Diα = ∂iα + η
i ∂αα̇ and D
α̇ = ∂
α̇ + θ
iα∂αα̇ , (2.14)
which (anti)commute with the operators (2.11) and satisfy
{Diα,Dj
} = 2δjiPαβ̇ , [Pαα̇,Diβ ] = 0 and [Pαα̇,D
] = 0 . (2.15)
Antichiral superspace R4|2N . On the superspace R4|4N one may introduce tensor fields de-
pending on bosonic and fermionic coordinates (superfields), differential forms, Lie derivatives LX
etc.. Furthermore, on any such superfield A one can impose the constraint equations LDiαA = 0,
which for a scalar superfield f reduce to the so-called antichirality conditions
Diαf = 0 . (2.16)
These are easily solved by using a coordinate transformation on R4|4N ,
(xαα̇, ηα̇i , θ
iα) → (x̃αα̇ = xαα̇−θiαηα̇i , ηα̇i , θiα) , (2.17)
under which ∂αα̇,Diα and D
α̇ transform to the operators
∂̃αα̇ = ∂αα̇ , D̃iα = ∂iα and D̃
α̇ = ∂
α̇ + 2θ
iα∂αα̇ . (2.18)
Then (2.16) simply means that f is defined on a sub-superspace R4|2N ⊂ R4|4N with coordinates
x̃αα̇ and ηα̇i . (2.19)
This space is called antichiral superspace. In the following we will usually omit the tildes when
working on the antichiral superspace.
6Note that in Minkowski signature the Weyl spinor θα is complex and ηα̇ = εα̇β̇η
β̇ = θα is complex conjugate
to θα. For the Kleinian (split) signature 2 + 2, however, these spinors are real and independent of one another.
N -extended SDYM in superfields. The N -extended SDYM equations can be rewritten in
terms of superfields on the antichiral superspace R4|2N [33, 48]. Namely, for any given 0 ≤ N ≤ 4,
fields of a proper multiplet from (2.9) can be combined into superfields Aαα̇ and Aiα̇ depending on
xαα̇, ηα̇i ∈ R4|2N and giving rise to covariant derivatives
∇αα̇ := ∂αα̇ +Aαα̇ and ∇iα̇ := ∂iα̇ +Aiα̇ . (2.20)
In such terms the N -extended SDYM equations (2.10) read
[∇αα̇,∇ββ̇] + [∇αβ̇ ,∇βα̇] = 0 , [∇
α̇,∇ββ̇ ] + [∇
,∇βα̇] = 0 , {∇iα̇,∇
}+ {∇i
} = 0 , (2.21)
which is equivalent to
[∇αα̇,∇ββ̇] = εα̇β̇ Fαβ , [∇
α̇,∇ββ̇ ] = εα̇β̇ F
β and {∇iα̇,∇
} = ε
F ij , (2.22)
where F ij is antisymmetric and Fαβ is symmetric in their indices.
The above gauge potential superfields (Aαα̇, Aiα̇) as well as the gauge strength superfields
(Fαβ , F iα, F ij) contain all physical component fields of theN -extended SDYMmodel. For instance,
the lowest component of the triple (Fαβ , F iα, F ij) in an η-expansion is (Fαβ , χiα, φij), with zeros
in case N is too small. By employing Bianchi identities for the gauge strength superfields, one
successively obtains [48] the superfield expansions and the field equations (2.10) for all component
fields.
It is instructive to extend the antichiral combination in (2.18) to potentials and covariant
derivatives,
D̃iα̇ = ∂
α̇ + 2 θ
iα ∂αα̇
+ + +
Ãiα̇ := Aiα̇ + 2 θiαAαα̇
‖ ‖ ‖
∇̃iα̇ := ∇iα̇ + 2 θiα∇αα̇
(2.23)
where ∇αα̇, ∇iα̇ and D̃iα̇ are given by (2.20) and (2.18), while Aiα̇ and Aαα̇ depend on xαα̇ and ηα̇i
only. With the antichiral covariant derivatives, one may condense (2.21) or (2.22) into the single
{∇̃iα̇, ∇̃
} + {∇̃i
, ∇̃j
} = 0 ⇔ {∇̃iα̇, ∇̃
} = ε
F̃ ij , (2.24)
with F̃ ij = F ij + 4 θ[iαF j]α + 4 θiαθjβFαβ . The concise form (2.24) of the N -extended SDYM
equations is quite convenient, and we will use it interchangeable with (2.21).
Linear system for N -extended SDYM. It is well known that the superfield SDYM equations
(2.21) can be seen as the compatibility conditions for the linear system of differential equations
ζ α̇(∂αα̇ +Aαα̇)ψ = 0 and ζ α̇(∂iα̇ +Aiα̇)ψ = 0 , (2.25)
where (ζ
and ζ α̇ = εα̇β̇ζ
. The extra (spectral) parameter7 ζ lies in the extended complex
plane C∪∞ = CP 1. Here ψ is a matrix-valued function depending not only on xαα̇ and ηα̇i but also
(meromorphically) on ζ ∈ CP 1. We subject the n×n matrix ψ to the following reality condition:
ψ(xαα̇, ηα̇i , ζ)
ψ(xαα̇, ηα̇i , ζ̄)
= 1l , (2.26)
7The parameter ζ is related with λ used in [45] by the formula ζ = i 1−λ
(cf. e.g. [31]).
where “†” denotes hermitian conjugation and ζ̄ is complex conjugate to ζ. This condition guarantees
that all physical fields of the N -extended SDYM model will take values in the adjoint representation
of the algebra u(n). In the concise form the linear system (2.25) is written as
ζ α̇(∇iα̇ + 2θiα∇αα̇)ψ = 0 ⇔ ζ α̇(D̃iα̇ + Ãiα̇)ψ = 0 ⇔ ζ α̇ ∇̃iα̇ ψ = 0 . (2.27)
2.3 Reduction of N -extended SDYM to 2+1 dimensions
The supersymmetric Bogomolny-type Yang-Mills-Higgs equations in 2+1 dimensions are obtained
from the described N -extended super SDYM equations by a dimensional reduction R2,2 → R2,1.
In particular, for the N=0 sector we demand the components Aµ of a gauge potential to be
independent of x4 and put A4 =: ϕ. Here, ϕ is a Lie-algebra valued scalar field in three dimensions
(the Higgs field) which enters into the Bogomolny-type equations. Similarly, for N ≥ 1 one can
reduce the N -extended SDYM equations on R2,2 by imposing the ∂4-invariance condition on all
the fields (Aαα̇, χ
[ij], χ̃
[ijk]
[ijkl]
) from the N=4 supermultiplet or its truncation to N<4
and obtain supersymmetric Bogomolny-type equations on R2,1.
Spinors in R2,1. Recall that on R2,2 both N=4 SDYM theory and full N=4 super Yang-
Mills theory have an SL(4, R) ∼= Spin(3,3) R-symmetry group [33]. A dimensional reduction to
2,1 enlarges the supersymmetry and R-symmetry to 2N=8 and Spin(4,4), respectively, for both
theories (cf. [49] for Minkowski signature). More generally, any number N of supersymmetries gets
doubled to 2N in the reduction. Since dimensional reduction collapses the rotation group Spin(2,2)
∼= Spin(2,1)L×Spin(2,1)R of R2,2 to its diagonal subgroup Spin(2,1)D as the local rotation group
of R2,1, the distinction between undotted and dotted indices disappears. We shall use undotted
indices henceforth.
Coordinates and derivatives in R2,1. The above discussion implies that one can relabel the
bosonic coordinates xαβ̇ from (2.3) by xαβ and split them as
xαβ = 1
(xαβ + xβα) + 1
(xαβ − xβα) = x(αβ) + x[αβ] (2.28)
into antisymmetric and symmetric parts,
x[αβ] = 1
εαβx4 = 1
εαβ t̃ and x(αβ) =: yαβ , (2.29)
respectively, with
y11 = x11 = 1
(t− y) , y12 = 1
(x12 + x21) = 1
x , y22 = x22 = 1
(t+ y) . (2.30)
We also have θiα 7→ θiα and ηα̇i 7→ ηαi for the fermionic coordinates on R4|4N reduced to R3|4N .
Bosonic coordinate derivatives reduce in 2+1 dimensions to the operators
∂(αβ) =
(∂αβ + ∂βα) (2.31)
which read explicitly as
∂(11) =
= ∂t−∂y , ∂(12) = ∂(21) = 12
= ∂x , ∂(22) =
= ∂t+∂y . (2.32)
We thus have
= ∂(αβ) − εαβ∂4 = ∂(αβ) − εαβ∂t̃ , (2.33)
where ε12 = −ε21 = −1, ∂4 = ∂/∂x4 and ∂t̃ = ∂/∂t̃.
The operators Diα and D
α̇ acting on t̃-independent superfields reduce to
Diα = ∂iα + η
i ∂(αβ) and D
α = ∂
α + θ
iβ∂(αβ) , (2.34)
where ∂iα = ∂/∂θ
iα and ∂iα = ∂/∂η
i . Similarly, the antichiral operators D̃iα and D̃
α̇ in (2.18)
become
D̂iα = ∂iα and D̂
α = ∂
α + 2θ
iβ∂(αβ) . (2.35)
Supersymmetric Bogomolny-type equations in component fields. According to (2.33),
the components A
of a gauge potential in four dimensions split into the components A(αβ) of a
gauge potential in three dimensions and a Higgs field A[αβ] = −εαβ ϕ, i.e.
Aαβ = A(αβ) +A[αβ] = A(αβ) − εαβ ϕ . (2.36)
Then the covariant derivatives D
reduced to three dimensions become the differential operators
Dαβ − εαβ ϕ = ∂(αβ) + [A(αβ), · ]− εαβ [ϕ, · ] , (2.37)
and the Yang-Mills field strength on R2,1 decomposes as
Fαβ, γδ = [Dαβ , Dγδ] = εαγ fβδ + εβδ fαγ with fαβ = fβα . (2.38)
Substituting (2.36) and (2.37) into (2.10), i.e. demanding that all fields in (2.10) are independent
of x4 = t̃, we obtain the following supersymmetric Bogomolny-type equations on R2,1:
fαβ +Dαβϕ = 0 , (2.39a)
Dαβ χ
iβ + εαβ [ϕ, χ
iβ] = 0 , (2.39b)
Dαβ D
αβφij + 2[ϕ, [ϕ, φij ]] + 2{χiα, χjα} = 0 , (2.39c)
Dαβ χ̃
β[ijk] − εαβ [ϕ, χ̃β[ijk]]− 6[χ[iα, φjk]] = 0 , (2.39d)
[ijkl]
+ [ϕ,G
[ijkl]
] + 12{χ[iα, χ̃jkl]β } − 18[φ[ij ,Dαβφkl]]− 18εαβ [φ[ij , [φkl], ϕ]] = 0 .(2.39e)
Supersymmetric Bogomolny-type equations in terms of superfields. Translations gen-
erated by the vector field ∂4 = ∂t̃ are isometries of superspaces R
4|4N and R4|2N . By taking the
quotient with respect to the action of the abelian group G generated by ∂4, we obtain the reduced
full superspace R3|4N ∼= R4|4N /G and the reduced antichiral superspace R3|2N ∼= R4|2N/G. In the
following, we shall work on R3|2N and R3|2N × CP 1, since the reduced ψ-function from (2.25) and
(2.27) is defined on the latter space.
The linear system stays in the center of the superfield approach to the N -extended SDYM
equations. After imposing t̃-independence on all fields in the linear system (2.27), we arrive at the
linear equations
ζα ∇̂iα ψ ≡ ζα(D̂iα + Âiα)ψ = 0 (2.40)
of the same form but with
D̂iα = ∂
α + 2θ
iβ∂(αβ) and Âiα = Aiα + 2θiβ(A(αβ) − εαβΞ) , (2.41)
where Aiα, A(αβ) and Ξ are superfields depending on yαβ and ηαi only. These linear equations
expand again to the pair (cf. (2.25))
ζβ(∂(αβ) +A(αβ) − εαβΞ)ψ = 0 and ζα(∂iα +Aiα)ψ = 0 . (2.42)
The compatibility conditions for the linear system (2.40) read
{∇̂iα, ∇̂
} + {∇̂iβ, ∇̂jα} = 0 ⇔ {∇̂iα, ∇̂
} = εαβ F̂ ij (2.43)
and present a condensed form of (2.39) rewritten in terms of R3|2N superfields. Similarly, these
equations can also be written in more expanded forms analogously to (2.21) or using the superfield
analog of (2.37). However, we will not do this since all these sets of equations are equivalent.
3 Noncommutative N -extended U(n) chiral model in 2+1 dimensions
As has been known for some time, nonlinear sigma models in 2 + 1 dimensions may be Lorentz-
invariant or integrable but not both [7, 11]. We will show that the super Bogomolny-type model
discussed in Section 2 after a gauge fixing is equivalent to a super extension of the modified U(n)
chiral model (so as to be integrable) first formulated by Ward [7]. Since integrability is compatible
with noncommutative deformation (if introduced properly, see e.g. [9]–[20]) we choose from the
beginning to formulate our super extension of this chiral model on Moyal-deformed R2,1 with
noncommutativity parameter θ ≥ 0. Ordinary space-time R2,1 can always be restored by taking
the commutative limit θ → 0.
Star-product formulation. Classical field theory on noncommutative spaces may be realized
in a star-product formulation or in an operator formalism8. The first approach is closer to the
commutative field theory: it is obtained by simply deforming the ordinary product of classical
fields (or their components) to the noncommutative star product
(f ⋆ g)(x) = f(x) exp{ i
ab −→∂b} g(x) ⇒ xa ⋆ xb − xb ⋆ xa = iθab (3.1)
with a constant antisymmetric tensor θab. Specializing to R2,1, we use real coordinates (xa) =
(t, x, y) in which the Minkowski metric g on R3 reads (gab) = diag(−1,+1,+1) with a, b, . . . = 1, 2, 3
(cf. Section 2). It is straightforward to generalize the Moyal deformation (3.1) to the superspaces
introduced in the previous section, allowing in particular for non-anticommuting Grassmann-odd
coordinates. Deferring general superspace deformations and their consequences to future work,
we here content ourselves with the simple embedding of the “bosonic” Moyal deformation into
superspace, meaning that (3.1) is also valid for superfields f and g depending on Grassmann
variables θiα and ηαi .
For later use we consider not only isotropic coordinates and vector fields
u := 1
(t+y) = y22 , v := 1
(t−y) = y11 , ∂u = ∂t + ∂y = ∂(22) , ∂v = ∂t − ∂y = ∂(11) (3.2)
8See [50] for reviews on noncommutative field theories.
introduced in Section 2, but also the complex combinations
z := x+ iy , z̄ := x− iy , ∂z = 12 (∂x − i∂y) , ∂z̄ =
(∂x + i∂y) . (3.3)
Since the time coordinate t remains commutative, the only nonvanishing component of the non-
commutativity tensor θab is
θxy = −θyx =: θ > 0 ⇒ θzz̄ = −θz̄z = −2i θ . (3.4)
Hence, we have
z ⋆ z̄ = zz̄ + θ and z̄ ⋆ z = zz̄ − θ (3.5)
as examples of the general formula (3.1).
Operator formalism. The nonlocality of the star products renders explicit computation cum-
bersome. We therefore pass to the operator formalism, which trades the star product for operator-
valued spatial coordinates (x̂, ŷ) or their complex combinations (ẑ, ˆ̄z), subject to
[t, x̂] = [t, ŷ] = 0 but [x̂, ŷ] = iθ ⇒ [ẑ, ˆ̄z] = 2 θ . (3.6)
The latter equation suggests the introduction of annihilation and creation operators,
ẑ and a† =
ˆ̄z with [a , a†] = 1 , (3.7)
which act on a harmonic-oscillator Fock space H with an orthonormal basis { |ℓ〉, ℓ = 0, 1, 2, . . .}
such that
a |ℓ〉 =
ℓ |ℓ−1〉 and a† |ℓ〉 =
ℓ+1 |ℓ+1〉 . (3.8)
Any superfield f(t, z, z̄, ηαi ) on R
3|2N can be related to an operator-valued superfield f̂(t, ηαi ) ≡
F (t, a, a†, ηαi ) on R
1|2N acting in H, with the help of the Moyal-Weyl map
f(t, z, z̄, ηαi ) 7→ f̂(t, ηαi ) = Weyl-ordered f
2θa†, ηαi
. (3.9)
The inverse transformation recovers the ordinary superfield,
f̂(t, ηαi ) ≡ F (t, a, a†, ηαi ) 7→ f(t, z, z̄, ηαi ) = F⋆
t, z√
, z̄√
, ηαi
, (3.10)
where F⋆ is obtained from F by replacing ordinary with star products. Under the Moyal-Weyl
map, we have
f ⋆ g 7→ f̂ ĝ and
dx dy f = 2π θTrf̂ = 2π θ
〈ℓ|f̂ |ℓ〉 , (3.11)
and the spatial derivatives are mapped into commutators,
∂zf 7→ ∂̂z f̂ = − 1√
[a†, f̂ ] and ∂z̄f 7→ ∂̂z̄ f̂ = 1√
[a , f̂ ] . (3.12)
For notational simplicity we will from now on omit the hats over the operators except when con-
fusion may arise.
Gauge fixing for ψ. Note that the linear system (2.40) and the compatibility conditions (2.43)
are invariant under a gauge transformation
ψ 7→ ψ′ = g−1ψ , (3.13a)
A 7→ A′ = g−1A g + g−1∂ g (with appropriate indices) , (3.13b)
Ξ 7→ Ξ′ = g−1Ξ g , (3.13c)
where g = g(xa, ηαi ) is a U(n)-valued superfield globally defined on the deformed superspace R
CP 1. Using a gauge transformation of the form (3.13), we can choose ψ such that it will satisfy
the standard asymptotic conditions (see e.g. [51])
ψ = Φ−1 + O(ζ) for ζ → 0 , (3.14a)
ψ = 1l + ζ−1Υ + O(ζ−2) for ζ →∞ , (3.14b)
where the U(n)-valued function Φ and u(n)-valued function Υ depend on xa and ηαi . This “unitary”
gauge is compatible with the reality condition for ψ,
ψ(xa, ηαi , ζ)
ψ(xa, ηαi , ζ̄)
= 1l , (3.15)
obtained by reduction from (2.26).
Gauge fixing for Âiα. After fixing the unitary gauge (3.14) for ψ and inserting (ζα) =
the linear system (2.40), one can easily reconstruct the superfield given in (2.41) from Φ or Υ via
Âi1 = 0 and Âi2 = Φ−1D̂i2Φ = D̂i1Υ (3.16)
and thus fix a gauge for the superfields Âiα. The operators D̂iα were defined in (2.35). One can
express (3.16) in terms of Aiα and A(αβ) − εαβΞ as
Ai1 = 0 and Ai2 = Φ−1∂i2Φ = ∂i1Υ , (3.17)
A(11) = 0 and A(12) + Ξ = Φ−1∂(12)Φ = ∂(11)Υ , (3.18)
A(21) − Ξ = 0 and A(22) = Φ−1∂(22)Φ = ∂(12)Υ . (3.19)
Using (2.32), we can rewrite the nonzero components as
A := Φ−1∂uΦ = ∂xΥ , B := Φ−1∂xΦ = ∂vΥ , Ci := Φ−1∂i2Φ = ∂i1Υ . (3.20)
Recall that the superfields Φ and Υ depend on xa and ηαi .
Linear system. In the above-introduced unitary gauge the linear system (2.42) reads
(ζ∂x − ∂u −A)ψ = 0 , (ζ∂v − ∂x − B)ψ = 0 , (ζ∂i1 − ∂i2 − Ci)ψ = 0 , (3.21)
which adds the last equation to the linear system of the Ward model [7] and generalizes it to
superfields A(xa, ηαj ), B(xa, ηαj ) and Ci(xa, ηαj ). The concise form of (3.21) reads
ζ D̂i1 − D̂i2 − Âi2
ψ = 0 (3.22)
or, in more explicit form,
∂i1 + 2θ
i1∂v + 2θ
∂i2 + Ci + 2θi1(∂x + B) + 2θi2(∂u +A)
ψ = 0 . (3.23)
N -extended sigma model. The compatibility conditions of this linear system are the N -
extended noncommutative sigma model equations
D̂i1(Φ
−1D̂j2 Φ) + D̂
−1D̂i2 Φ) = 0 (3.24)
which in expanded form reads
(gab + vcε
cab) ∂a(Φ
−1∂bΦ) = 0 ⇔ ∂x(Φ−1∂xΦ) − ∂v(Φ−1∂uΦ) = 0 , (3.25a)
∂i1(Φ
−1∂xΦ) − ∂v(Φ−1∂i2Φ) = 0 , ∂i1(Φ−1∂uΦ) − ∂x(Φ−1∂i2Φ) = 0 , (3.25b)
∂i1(Φ
−1∂j2Φ) + ∂
−1∂i2Φ) = 0 . (3.25c)
Here, the first line contains the Wess-Zumino-Witten term with a constant vector (vc) = (0, 1, 0)
which spoils the standard Lorentz invariance but yields an integrable chiral model in 2+1 dimen-
sions. Recall that Φ is a U(n)-valued matrix whose elements act as operators in the Fock space H
and depend on xa and 2N Grassmann variables ηαi . As discussed in Section 2, the compatibility
conditions of the linear equations (3.22) (or (3.21)) are equivalent to the N -extended Bogomolny-
type equations (2.39) for the component (physical) fields. Thus, chiral model field equations (3.25)
are equivalent to a gauge fixed form of equations (2.39).
Υ-formulation. Instead of Φ-parametrization of (A,B, Ci) given in (3.17)–(3.20) we may use the
equivalent Υ-parametrization also given there. In this case, the compatibility conditions for the
linear system (3.21) reduce to
(∂2x − ∂u∂v)Υ + [∂vΥ , ∂xΥ] = 0 , (3.26a)
(∂i2∂v − ∂i1∂x)Υ + [∂i1Υ , ∂vΥ] = 0 , (∂i2∂x − ∂i1∂u)Υ + [∂i1Υ , ∂xΥ] = 0 , (3.26b)
(∂i2∂
1 + ∂
1)Υ + {∂i1Υ , ∂
1Υ} = 0 , (3.26c)
which in concise form read
(D̂i2 D̂
1 + D̂
1)Υ + {D̂i1Υ , D̂
1Υ} = 0 . (3.27)
Recall that Υ is a u(n)-valued matrix whose elements act as operators in the Fock space H and
depend on xa and 2N Grassmann variables ηαi .
For N=4, the commutative limit of (3.27) can be considered as Siegel’s equation [33] reduced
to 2+1 dimensions. According to Siegel, one can extract the multiplet of physical fields appearing
in (2.39) from the prepotential Υ via
∂i1Υ = A
2 , ∂
1Υ = φ
ij , ∂i1∂
1Υ = χ̃
[ijk]
2 , ∂
1Υ = G
[ijkl]
22 , (3.28a)
∂(α1)Υ = A(α2) − εα2ϕ , ∂(α1)∂i1Υ = χiα , ∂(α1)∂(β1)Υ = fαβ , (3.28b)
where one takes Υ and its derivatives at η2i = 0. The other components of the physical fields,
i.e. χ̃
[ijk]
1 , G
[ijkl]
11 , G
[ijkl]
21 , A(11) and A(21)−ϕ, vanish in this light-cone gauge.
Supersymmetry transformations. The 4N supercharges given in (2.11) reduce in 2+1 dimen-
sions to the form
Qiα = ∂iα − ηβi ∂(αβ) and Q
α = ∂
α − θiβ∂(αβ) . (3.29)
Their antichiral version, matching to D̂iα and D̂
of (2.35), reads
Q̂iα = ∂iα − 2ηβi ∂(αβ) and Q̂
, (3.30)
so that
{Q̂iα , Q̂jβ} = −2 δ
i ∂(αβ) . (3.31)
On a (scalar) R3|2N superfield Σ these supersymmetry transformations act as
δ̂Σ := εiαQ̂iαΣ + ε
αΣ (3.32)
and are induced by the coordinate shifts
δ̂ yαβ = −2εi(αηβ)i and δ̂ η
i = ε
i , (3.33)
where εiα and εαi are 4N real Grassmann parameters. It is easy to see that our equations (3.24)
and (3.27) are invariant under the supersymmetry transformations (3.32) (applied to Φ or Υ). This
is simply because the operators D̂iα and D̂
anticommute with the supersymmetry generators Q̂iα
and Q̂
. Therefore, the equations of motion (3.25) of the modified N -extended chiral model in
2+1 dimensions as well as their reductions to 2+0 and 1+1 dimensions carry 2N supersymmetries
and are genuine supersymmetric extensions of the corresponding bosonic equations. Note that this
type of extension is not the standard one since the R-symmetry groups are Spin(N ,N ) in 2+1 and
Spin(N ,N )× Spin(N ,N ) in 1+1 dimensions, which differ from the compact unitary R-symmetry
groups of standard sigma models. Contrary to the standard case of two-dimensional sigma models
the above “noncompact” 2N supersymmetries do not impose any constraints on the geometry of
the target space, e.g. they do not demand it to be Kähler [52] or hyper-Kähler [53]. This may be
of interest and deserves further study.
Action functionals. In either formulation of the N -extended supersymmetric SDYM model on
2,2 there are difficulties with finding a proper action functional generalizing the one [54, 55] for
the purely bosonic case. These difficulties persist after the reduction to 2+1 dimensions, i.e. for the
equations (3.25) and (3.26) describing our supersymmetric modified U(n) chiral model. It is the
price to be paid for overcoming the no-go barrier N ≤ 4 and the absence of geometric target-space
constraints. On a more formal level, the problem is related to the chiral character of (3.24) as well
as (3.27), where only the operators D̂iα but not D̂iα appear. Note however, that for N = 4 one can
write an action functional in component fields producing the equations (2.39), which are equivalent
to the superspace equations (3.24) when i, j = 1, . . . , 4 (see e.g. [47]).
One proposal for an action functional stems from Siegel’s idea [33] for the Υ-formulation of the
N -extended SDYM equations. Namely, one sees that ∂i2Υ enters only linearly into the last two
lines in (3.26). Therefore, if we introduce
Υ(1) := Υ|η2
=0 (3.34)
then it must satisfy the first equation from (3.26), and the remaining equations iteratively define
the dependence of Υ on η2i starting from Υ(1). Hence, all information is contained in Υ(1), as can
also be seen from (3.28). In other words, the dependence of Υ on η2i is not ‘dynamical’. For an
action one can then take (cf. [33])
d3x dN η1
Υ(1)∂(αβ)∂
(αβ)Υ(1) +
Υ(1) ε
αβ∂(α1)Υ(1) ∂(β1)Υ(1)
. (3.35)
Extremizing this functional yields the first line of (3.26) at η2i = 0. Except for the Grassmann
integration, this action has the same form as the purely bosonic one [55]. One may apply the same
logic to the Φ-formulation where the action for the purely bosonic case is also known [54, 56].
4 N -extended multi-soliton configurations via dressing
The existence of the linear system (3.22) (equivalent to (3.21)) encoding solutions of theN -extended
U(n) chiral model in an auxiliary matrix ψ allows for powerful methods to systematically construct
explicit solutions for ψ and hence for Φ† = ψ|ζ=0 and Υ = lim
ζ (ψ−1l). For our purposes the
so-called dressing method [57, 51] proves to be the most practical [12]–[20], and so we shall use it
here for our linear system, i.e. already in the N -extended noncommutative case.
Multi-pole ansatz for ψ. The dressing method is a recursive procedure for generating a new
solution from an old one. More concretely, we rewrite the linear system (3.21) in the form
ψ(∂u − ζ∂x)ψ† = A , ψ(∂x − ζ∂v)ψ† = B , ψ(∂i2 − ζ∂i1)ψ† = Ci . (4.1)
Recall that ψ† := (ψ(xa, ηαi , ζ̄))
† and (A,B, Ci) depend only on xa and ηαi . The central idea is to
demand analyticity in the spectral parameter ζ, which strongly restricts the possible form of ψ.
One way to exploit this constraint starts from the observation that the left hand sides of (4.1) as
well as of the reality condition (3.15) do not depend on ζ while ψ is expected to be a nontrivial
function of ζ globally defined on CP 1. Therefore, it must be a meromorphic function on CP 1
possessing some poles which we choose to lie at finite points with constant coordinates µk ∈ CP 1.
Here we will build a (multi-soliton) solution ψm featuringm simple poles at positions µ1, . . . , µm
with9 Imµk < 0 by left-multiplying an (m−1)-pole solution ψm−1 with a single-pole factor of the
µm − µ̄m
ζ − µm
a, ηαi ) , (4.2)
where the n×n matrix function Pm is yet to be determined. Starting from the trivial (vacuum)
solution ψ0 = 1l, the iteration ψ0 7→ ψ1 7→ . . . 7→ ψm yields a multiplicative ansatz for ψm,
µm−ℓ − µ̄m−ℓ
ζ − µm−ℓ
, (4.3)
which, via partial fraction decomposition, may be rewritten in the additive form
ψm = 1l +
ζ − µk
, (4.4)
9This condition singles out solitons over anti-solitons, which appear for Imµk > 0.
where Λmk and Sk are some n×rk matrices depending on xa and ηαi , with rk ≤ n.
Equations for Sk. Let us first consider the additive parametrization (4.4) of ψm. This ansatz
must satisfy the reality condition (3.15) as well as our linear equations in the form (4.1). In
particular, the poles at ζ = µ̄k on the left hand sides of these equations have to be removable
since the right hand sides are independent of ζ. Inserting the ansatz (4.4) and putting to zero the
corresponding residues, we learn from (3.15) that
µ̄k − µℓ
Sk = 0 , (4.5)
while from (4.1) we obtain the differential equations
µ̄k − µℓ
A,B,i
Sk = 0 , (4.6)
where L̄
A,B,i
stands for either
L̄Ak = ∂u − µ̄k∂x , L̄Bk = µk(∂x − µ̄k∂v) or L̄ik = ∂i2 − µ̄k∂i1 . (4.7)
Note that we consider a recursive procedure starting from m=1, and operators (4.7) will appear
with k = 1, . . . ,m if we consider poles at ζ = µ̄k.
Because the L̄
A,B,i
for k = 1, . . . ,m are linear differential operators, it is easy to write down
the general solution for (4.6) at any given k, by passing from the coordinates (u, v, x; η1i , η
i ) to
“co-moving coordinates” (wk, w̄k, sk; η
, η̄i
). The precise relation for k = 1, . . . ,m is [12, 58]
wk := x+ µ̄ku+ µ̄
v = x+ 1
(µ̄k−µ̄−1k )y +
(µ̄k+µ̄
)t and ηik := η
i + µ̄kη
i , (4.8)
with w̄k and η̄
obtained by complex conjugation and the co-moving time sk being inessential
because by definition nothing will depend on it. The kth moving frame travels with a constant
velocity
(vx , vy)k = −
( µk + µ̄k
µkµ̄k + 1
µkµ̄k − 1
µkµ̄k + 1
, (4.9)
so that the static case wk=z is recovered for µk = −i. On functions of (wk, ηik, w̄k, η̄ik) alone the
operators (4.7) act as
L̄Ak = L̄
k = (µk−µ̄k)
=: L̄k and L̄
k = (µk−µ̄k)
. (4.10)
By induction in k = 1, . . . ,m we learn that, due to (4.5), a necessary and sufficient condition for a
solution of (4.6) is
L̄kSk = SkZ̃k and L̄
kSk = SkZ̃
k (4.11)
with some rk×rk matrices Z̃k and Z̃ik depending on (wk, w̄k, η
Passing to the noncommutative bosonic coordinates we obtain
ŵk , ˆ̄wk
= 2θ νkν̄k with νkν̄k =
µk−µ̄k−µ−1k +µ̄
. (4.12)
Thus, we can introduce annihilation and creation operators
and c
so that [ck , c
] = 1 (4.13)
for k = 1, . . . ,m. Naturally, this Heisenberg algebra is realized on a “co-moving” Fock space Hk,
with basis states |ℓ〉k and a “co-moving” vacuum |0〉k subject to ck|0〉k = 0. Each co-moving
vacuum |0〉k (annihilated by ck) is related to the static vacuum |0〉 (annihilated by a) through an
ISU(1,1) squeezing transformation (cf. [12]) which is time-dependent. The fermionic coordinates
ηik and η̄
k remain spectators in the deformation. Coordinate derivatives are represented in the
standard fashion as
7→ −[c†
, · ] and ν̄k
7→ [ck , · ] . (4.14)
After the Moyal deformation, the n×rk matrices Sk have become operator-valued, but are
still functions of the Grassmann coordinates ηi
and η̄i
. The noncommutative version of the BPS
conditions (4.11) naturally reads
ck Sk = Sk Zk and
Sk = Sk Z
k (4.15)
where Zk and Z
k are some operator-valued rk×rk matrix functions of η
and η̄
Nonabelian solutions for Sk. For general data Zk and Z
it is difficult to solve (4.15), but it
is also unnecessary because the final expression ψm turns out not to depend on them. Therefore,
we conveniently choose
Zk = ck ⊗ 1lrk×rk and Zik = 0 ⇒ Sk = Rk(ck, ηik) , (4.16)
where Rk is an arbitrary n×rk matrix function independent of c†k and η̄ik.10 It is known that
nonabelian (multi-) solitons arise for algebraic functions Rk (cf. e.g. [7] for the commutative and [12]
for the noncommutative N=0 case). Their common feature is a smooth commutative limit. The
only novelty of the supersymmetric extension is the ηi
dependence, i.e.
Rk = Rk,0 + η
kRk,i + η
Rk,ij + η
Rk,ijp + η
Rk,ijpq . (4.17)
Abelian solutions for Sk. It is useful to view Sk as a map from C
rk⊗Hk to Cn⊗Hk (momentarily
suppressing the η dependence). The noncommutative setup now allows us to generalize the domain
of this map to any subspace of Cn ⊗Hk. In particular, we may choose it to be finite-dimensional,
say Cqk , and represent the map by an n×qk array |Sk〉 of kets in H. In this situation, Zk and Zik
in (4.15) are just number -valued qk×qk matrix functions of ηjk and η̄
. In case they do not depend
on η̄
, we can write down the most general solution as
|Sk〉 = Rk(ck, ηjk) |Zk〉 exp
) η̄ik
with |Zk〉 := exp
|0〉k . (4.18)
10Changing Zk or Z
k multiplies Rk by an invertible factor from the right, which drops out later, except for the
degenerate case Zk=0 which yields Sk = Rk |0〉k〈0|k.
As before, we may put Zi
= 0 without loss of generality, but now the choice of Zk does matter.
For any given k generically there exists a qk-dimensional basis change which diagonalizes the
ket-valued matrix
|Zk〉 7→ diag
c† , eα
c†, . . . , eα
|0〉k = diag
|α1k〉 , |α2k〉 , . . . , |α
, (4.19)
where we defined coherent states
|αlk〉 := eα
c† |0〉k so that ck |αlk〉 = αlk |αlk〉 for l = 1, . . . , qk and αlk ∈ C . (4.20)
Note that not only the entries of Rk but also the α
k are holomorphic functions of the co-moving
Grassmann parameters η
and thus can be expanded like in (4.17). In the U(1) model, we must
use ket-valued 1×qk matrices |Sk〉 for all k, yielding rows
|Sk〉 =
R1k |α1k〉 , R2k |α2k〉 , . . . , R
for k = 1, . . . ,m , (4.21)
with functions αl
). Here, the Rl
only affect the states’ normalization and can be collected in a
diagonal matrix to the right, hence will drop out later and thus may all be put to one. Formally,
we have recovered the known abelian (multi-) soliton solutions, but the supersymmetric extension
has generalized |Sk〉 → |Sk(ηjk)〉.
Explicit form of Pk. Let us now consider the multiplicative parametrization (4.3) of ψm which
also allows us to solve (4.5). First of all, note that the reality condition (3.15) is satisfied if
Pk = P
= P 2k ⇔ Pk = Tk (T
−1T †
for k = 1, . . . ,m , (4.22)
meaning that Pk is an operator-valued hermitian projector (of group-space rank rk ≤ n) built from
an n×rk matrix function Tk (the abelian case of n=1 is included). The reality condition follows
just because
µk − µ̄k
ζ − µk
µ̄k − µk
ζ − µ̄k
= 1l for any ζ and k = 1, . . . ,m . (4.23)
The rk columns of Tk span the image of Pk and obey
Pk Tk = Tk ⇔ (1l−Pk)Tk = 0 . (4.24)
Furthermore, the equation (4.5) with m = k (induction) rewritten in the form
(1l−Pk)
µk−ℓ − µ̄k−ℓ
µ̄k − µk−ℓ
Sk = 0 (4.25)
reveals that (cf. (4.24))
T1 = S1 and Tk =
1l − µk−ℓ − µ̄k−ℓ
µk−ℓ − µ̄k
Sk for k ≥ 2 , (4.26)
where the explicit form of Sk for k = 1, . . . ,m is given in (4.16) or (4.18). The final result reads
µm−ℓ − µ̄m−ℓ
ζ − µm−ℓ
= 1l +
ζ − µk
(4.27)
with hermitian projectors Pk given by (4.22), Tk given by (4.26) and Sk given by (4.16) or (4.18).
The explicit form of Λmk (which we do not need) can be found in [12]. The corresponding superfields
Φ and Υ are
Φm = ψ
m|ζ=0 =
(1l− ρkPk) with ρk = 1−
, (4.28a)
Υm = lim
ζ (ψm − 1l) =
(µk−µ̄k)Pk . (4.28b)
From (4.22) it is obvious that Pk is invariant under a similarity transformation
Tk 7→ Tk Λk ⇔ Sk 7→ Sk Λk (4.29)
for an invertible operator-valued rk×rk matrix Λk. This justifies putting Zik = 0 from the beginning
and also the restriction to Zk = ck ⊗1lrk×rk in the nonabelian case, both without loss of generality.
Hence, the nonabelian solution space constructed here is parametrized by the set {Rk}m1 of matrix-
valued functions of ck and η
k and the pole positions µk. The abelian moduli space, however, is
larger by the set {Zk}m1 of matrix-values functions of ηik which generically contain the coherent-
state parameter functions {αl
)}. Restricting to ηi
=0 reproduces the soliton configurations of
the bosonic model [12].
Static solutions. Let us consider the reduction to 2+0 dimensions, i.e. the static case. Recall
that static solutions correspond to the choice m = 1 and µ1 ≡ µ = −i implying w1 = z, so we drop
the index k. Specializing (4.27), we have
ψ = 1l − 2 i
ζ + i
P so that Φ = Φ† = 1l− 2P , (4.30)
where a hermitian projector P of group-space rank r satisfies the BPS equations
(1l−P ) aP = 0 ⇒ (1l−P ) aT = 0 , (4.31a)
(1l−P ) ∂
P = 0 ⇒ (1l−P ) ∂
T = 0 , (4.31b)
with P = T (T †T )−1T † and ηi = η1i + iη
i . In this case T = S, and for a nonabelian r=1 projector
P we get T = T (a, ηi) as an n×1 column. For the simplest case of N=1 we just have (cf. [59])
T = Te(a) + η To(a) with η = η
1 + iη2 , (4.32)
where Te(a) and To(a) are rational functions of a (e.g. polynomials) taking values in the even and
odd parts of the Grassmann algebra. Similarly, an abelian N=1 projector (for n=1) is built from
|T 〉 =
|α1〉 , |α2〉 , . . . , |αq〉
. (4.33)
At θ=0, the static solution (4.32) of our supersymmetric U(n) sigma model is also a solution
of the standard N=1 supersymmetric CPn−1 sigma model in two dimensions (see e.g. [59]).11 For
11In fact, Φ in (4.30) takes values in the Grassmannian Gr(r, n), and Gr(1, n) = CPn−1.
this reason, one can overcome the previously mentioned difficulty with constructing an action (or
energy from the viewpoint of 2+1 dimensions) for static configurations. Moreover, on solutions
obeying the BPS conditions (4.31) the topological charge
Q = 2πθ
dη1dη2 Tr tr Φ
D+Φ ,D−Φ
(4.34)
is proportional to the action (BPS bound)
S = 2πθ
dη1dη2 Tr tr
D+Φ ,D−Φ
(4.35)
and is finite for algebraic functions Te and To. Here, the standard superderivatives D± are defined
+ iη ∂z and D− =
+ iη̄ ∂z̄ . (4.36)
One-soliton configuration. For one moving soliton, from (4.27) and (4.28) we obtain
ψ1 = 1l +
µ− µ̄
ζ − µ P with P = T (T
†T )−1T † (4.37)
Φ = 1l − ρP with ρ = 1− µ
. (4.38)
Now our n×r matrix T must satisfy (putting Zi = 0 and Z = c⊗ 1lr×r)
[c , T ] = 0 and
T = 0 with ηi = η1i + µ̄ η
i , (4.39)
where c is the moving-frame annihilation operator given by (4.13) for k=1.
Recall that the operators c and c† and therefore the matrix T and the projector P can be
expressed in terms of the corresponding static objects by a unitary squeezing transformation (see
e.g. (4.8) and (4.13)). For simplicity we again consider the case N=1 and a nonabelian projector
with r=1. Then (4.39) tells us that T is a holomorphic function of c and η, i.e.
T = Te(c) + η To(c) =
T 1e (c) + η T
o (c)...
Tne (c) + η T
o (c)
(4.40)
with polynomials T ae and T
o of order q, say, analogously to the static case (4.32). Note that,
for T ao to be Grassmann-odd and nonzero, some extraneous Grassmann parameter must appear.
Similarly, abelian projectors for a moving one-soliton obtain by subjecting (4.33) to a squeezing
transformation.
For N=1 the moving frame was defined in (4.8) (dropping the index k) via
w = x + 1
(µ̄−µ̄−1)y + 1
(µ̄+µ̄−1)t and η = η1 + µ̄η2 hence ∂tη = 0 . (4.41)
Consider the moving frame with the coordinates (w, w̄, s; η, η̄) with the choice s = t and the related
change of the derivatives (see [12, 58])
∂x = ∂w + ∂w̄ , (4.42a)
(µ̄−µ̄−1) ∂w + 12(µ−µ
−1) ∂w̄ , (4.42b)
(µ̄+µ̄−1) ∂w +
(µ+µ−1) ∂w̄ + ∂s , (4.42c)
∂η1 = ∂η + ∂η̄ , (4.42d)
∂η2 = µ̄ ∂η + µ∂η̄ . (4.42e)
In the moving frame our solution (4.38) is static, i.e. ∂sΦ = 0, and the projector P has the same
form as in the static case. The only difference is the coefficient ρ instead of 2 in (4.38). Therefore,
by computing the action (4.35) in (w, w̄; η1, η2) coordinates, we obtain for algebraic functions T
in (4.40) a finite answer, which differs from the static one by a kinematical prefactor depending
on µ (cf. [12] for the bosonic case).
Large-time asymptotics. Note that in the distinguished (z, z̄, t) coordinate frame (4.41) implies
that at large times w→ κ t with κ = 1
(µ̄+µ̄−1). As a consequence, the tq term in each polynomial
in (4.40) will dominate, i.e.
T → tq
a1 + η b1
an + η bn
=: tq Γ , (4.43)
where Γ is a fixed vector in Cn. It is easy to see that in the distinguished frame the large-time
limit of Φ given by (4.38) is
Φ = 1l − ρΠ with Π = Γ (Γ†Γ)−1Γ† (4.44)
being the projector on the constant vector Γ.
Consider now them-soliton configuration (4.28). By induction of the above argument one easily
arrives at the m-soliton generalization of (4.44). Namely, in the frame moving with the ℓth lump
we have
Φm = (1l− ρ1Π1) . . . (1l− ρℓ−1Πℓ−1)(1l− ρℓPℓ)(1l− ρℓ+1Πℓ+1) . . . (1l− ρmΠm) , (4.45)
where the Πm are constant projectors. This large-time factorization of multi-soliton solutions
provides a proof of the no-scattering property because the asymptotic configurations are identical
for large negative and large positive times.
5 Conclusions
In this paper we introduced a generalization of the modified integrable U(n) chiral model with
2N≤ 8 supersymmetries in 2+1 dimensions and considered a Moyal deformation of this model.
It was shown that this N -extended chiral model is equivalent to a gauge-fixed BPS subsector of
an N -extended super Yang-Mills model in 2+1 dimensions originating from twistor string theory.
The dressing method was applied to generate a wide class of multi-soliton configurations, which
are time-dependent finite-energy solutions to the equations of motion. Compared to the N=0
model, the supersymmetric extension was seen to promote the configurations’ building blocks to
holomorphic functions of suitable Grassmann coordinates. By considering the large-time asymptotic
factorization into a product of single soliton solutions we have shown that no scattering occurs
within the dressing ansatz chosen here.
The considered model does not stand alone but is motivated by twistor string theory [37] with
a target space reduced to the mini-supertwistor space [44, 45, 47]. In this context, the obtained
multi-soliton solutions are to be regarded as D(0|2N )-branes moving inside D(2|2N )-branes [60].
Here 2N appears due to fermionic worldvolume directions of our branes in the superspace de-
scription [60]. Switching on a constant B-field simply deforms the sigma model and D-brane
worldvolumes noncommutatively, thereby admitting also regular supersymmetric noncommutative
abelian solutions.
Restricting to static configurations, the models can be specialized to Grassmannian supersym-
metric sigma models, where the superfield Φ takes values in Gr(r, n), and the field equations are
invariant under 2N supersymmetry transformations with 0 ≤ N ≤ 4. This differs from the results
for standard 2D sigma models [52, 53] where the target spaces have to be Kähler or hyper-Kähler
for admitting two or four supersymmetries, respectively. This difference will be discussed in more
details elsewhere.
We derived the supersymmetric chiral model in 2+1 dimensions through dimensional reduction
and gauge fixing of the N -extended supersymmetric SDYM equations in 2+2 dimensions. Recall
that for the purely bosonic case most (if not all) integrable equations in three and fewer dimensions
can be obtained from the SDYM equations (or their hierarchy [25]) by suitable dimensional reduc-
tions (see e.g. [61]–[65] and references therein). Moreover, this Ward conjecture [61] was extended
to the noncommutative case (see e.g. [66, 67]). It will be interesting to consider similar reductions
of the N -extended supersymmetric SDYM equations (and their hierarchy [68]) to supersymmetric
integrable equations in three and two dimensions generalizing earlier results [69].
Acknowledgements
We acknowledge fruitful discussions with C. Gutschwager. This work was supported in part by the
Deutsche Forschungsgemeinschaft (DFG).
References
[1] M.R. Douglas and C.M. Hull, J. High Energy Phys. 02 (1998) 008 [hep-th/9711165].
[2] C.S. Chu and P.M. Ho, Nucl. Phys. B 550 (1999) 151 [hep-th/9812219].
[3] V. Schomerus, J. High Energy Phys. 06 (1999) 030 [hep-th/9903205].
[4] N. Seiberg and E. Witten, J. High Energy Phys. 09 (1999) 032 [hep-th/9908142].
[5] O. Lechtenfeld, A.D. Popov and B. Spendig, Phys. Lett. B 507 (2001) 317
[hep-th/0012200].
[6] O. Lechtenfeld, A.D. Popov and B. Spendig, J. High Energy Phys. 06 (2001) 011
[hep-th/0103196].
[7] R.S. Ward, J. Math. Phys. 29 (1988) 386; Commun. Math. Phys. 128 (1990) 319.
[8] H. Ooguri and C. Vafa, Mod. Phys. Lett. A 5 (1990) 1389; Nucl. Phys. B 361 (1991) 469.
[9] M. Hamanaka, “Noncommutative solitons and D-branes,” PhD thesis, Tokyo University, 2003
[hep-th/0303256];
F.A. Schaposnik, Braz. J. Phys. 34 (2004) 1349 [hep-th/0310202];
L. Tamassia, “Noncommutative supersymmetric/integrable models and string theory,”
PhD thesis, Pavia University, 2005 [hep-th/0506064];
[10] M. Hamanaka, “Noncommutative solitons and integrable systems,”in Noncommutative Geome-
try and Physics, Eds. Y. Maeda, N. Tose, N. Miyazaki, S. Watamura and D. Steinheimer (World
Scientific, 2005), p.175 [hep-th/0504001]; Nucl. Phys. B 741 (2006) 368 [hep-th/0601209].
[11] W.J. Zakrzewski, “Low dimensional sigma models”, IOP Publishing, Bristol, 1989.
[12] O. Lechtenfeld and A.D. Popov, J. High Energy Phys. 11 (2001) 040 [hep-th/0106213].
[13] O. Lechtenfeld and A.D. Popov, Phys. Lett. B 523 (2001) 178 [hep-th/0108118].
[14] R. Gopakumar, S. Minwalla and A. Strominger, J. High Energy Phys. 05 (2000) 020
[hep-th/0003160].
[15] R. Gopakumar, M. Headrick and M. Spradlin, Commun. Math. Phys. 233 (2003) 355
[hep-th/0103256].
[16] M. Klawunn, O. Lechtenfeld and S. Petersen, J. High Energy Phys. 06 (2006) 028
[hep-th/0604219].
[17] S. Bieling, J. Phys. A 35 (2002) 6281 [hep-th/0203269].
[18] M. Wolf, J. High Energy Phys. 06 (2002) 055 [hep-th/0204185].
[19] M. Ihl and S. Uhlmann, Int. J. Mod. Phys. A 18 (2003) 4889 [hep-th/0211263].
[20] O. Lechtenfeld, L. Mazzanti, S. Penati, A.D. Popov and L. Tamassia,
Nucl. Phys. B 705 (2005) 477 [hep-th/0406065].
[21] A.V. Domrin, O. Lechtenfeld and S. Petersen, J. High Energy Phys. 03 (2005) 045
[hep-th/0412001].
[22] N. Berkovits, Nucl. Phys. B 450 (1995) 90 [Erratum-ibid. B 459 (1996) 439] [hep-th/9503099].
[23] N. Berkovits and C. Vafa, Nucl. Phys. B 433 (1995) 123 [hep-th/9407190];
H. Ooguri and C. Vafa, Nucl. Phys. B 451 (1995) 121 [hep-th/9505183].
[24] R.S. Ward and R.O. Wells, “Twistor geometry and field theory,”
Cambridge University Press, Cambridge, 1990.
[25] L.J. Mason and N.M. J. Woodhouse, “Integrability, self-duality, and twistor theory,”
Oxford University Press, Oxford, 1996.
[26] R.S. Ward, Phys. Lett. A 61 (1977) 81.
[27] O. Lechtenfeld and A.D. Popov, Phys. Lett. B 494 (2000) 148 [hep-th/0009144];
O. Lechtenfeld, A.D. Popov and S. Uhlmann, Nucl. Phys. B 637 (2002) 119
[hep-th/0204155].
[28] A. Kling, O. Lechtenfeld, A.D. Popov and S. Uhlmann, Phys. Lett. B 551 (2003) 193
[hep-th/0209186]; Fortsch. Phys. 51 (2003) 775 [hep-th/0212335];
A. Kling and S. Uhlmann, J. High Energy Phys. 07 (2003) 061 [hep-th/0306254];
M. Ihl, A. Kling and S. Uhlmann, J. High Energy Phys. 03 (2004) 002 [hep-th/0312314];
S. Uhlmann, J. High Energy Phys. 11 (2004) 003 [hep-th/0408245].
[29] L.L. Chau, M.L. Ge and Y.S. Wu, Phys. Rev. D 25 (1982) 1080;
L. Dolan, Phys. Lett. B 113 (1982) 387;
L.L. Chau, M.L. Ge, A. Sinha and Y.S. Wu, Phys. Lett. B 121 (1983) 391;
L. Crane, Commun. Math. Phys. 110 (1987) 391.
[30] A.D. Popov and C.R. Preitschopf, Phys. Lett. B 374 (1996) 71 [hep-th/9512130];
T.A. Ivanova, J. Math. Phys. 39 (1998) 79 [hep-th/9702144];
A.D. Popov, Rev. Math. Phys. 11 (1999) 1091 [hep-th/9803183];
Nucl. Phys. B 550 (1999) 585 [hep-th/9806239].
[31] T.A. Ivanova and O. Lechtenfeld, Int. J. Mod. Phys. A 16 (2001) 303 [hep-th/0007049].
[32] A.M. Semikhatov, Phys. Lett. B 120 (1983) 171;
I.V. Volovich, Phys. Lett. B 123 (1983) 329.
[33] W. Siegel, Phys. Rev. D 46 (1992) 3235 [hep-th/9205075].
[34] W. Siegel, Phys. Rev. Lett. 69 (1992) 1493 [hep-th/9204005];
Phys. Rev. D 47 (1993) 2512 [hep-th/9210008].
[35] N. Berkovits and W. Siegel, Nucl. Phys. B 505 (1997) 139 [hep-th/9703154].
[36] S. Bellucci, A. Galajinsky and O. Lechtenfeld, Nucl. Phys. B 609 (2001) 410 [hep-th/0103049].
[37] E. Witten, Commun. Math. Phys. 252 (2004) 189 [hep-th/0312171].
[38] N. Berkovits, Phys. Rev. Lett. 93 (2004) 011601 [hep-th/0402045];
N. Berkovits and L. Motl, J. High Energy Phys. 04 (2004) 056 [hep-th/0403187].
[39] W. Siegel,“Untwisting the twistor superstring,” hep-th/0404255;
O. Lechtenfeld and A.D. Popov, Phys. Lett. B 598 (2004) 113 [hep-th/0406179].
[40] I.A. Bandos, J.A. de Azcarraga and C. Miquel-Espanya, J. High Energy Phys. 07 (2006) 005
[hep-th/0604037];
M. Abou-Zeid, C.M. Hull and L.J. Mason,
“Einstein supergravity and new twistor string theories,” hep-th/0606272;
L. Dolan and P. Goddard, “Tree and loop amplitudes in open twistor string theory,”
hep-th/0703054.
[41] A.D. Popov and C. Saemann, Adv. Theor. Math. Phys. 9 (2005) 931 [hep-th/0405123];
A.D. Popov and M. Wolf, “Hidden symmetries and integrable hierarchy of the N=4 super-
symmetric Yang-Mills equations,” hep-th/0608225.
[42] C. Saemann, “Aspects of twistor geometry and supersymmetric field theories within super-
string theory,” PhD thesis, Leibniz University of Hannover, 2006 [hep-th/0603098];
M. Wolf, “On supertwistor geometry and integrability in super gauge theory,” PhD thesis,
Leibniz University of Hannover, 2006 [hep-th/0611013].
[43] A. Neitzke and C. Vafa, “N = 2 strings and the twistorial Calabi-Yau,” hep-th/0402128.
[44] D.W. Chiou, O.J. Ganor, Y.P. Hong, B.S. Kim and I. Mitra, Phys. Rev. D 71 (2005) 125016
[hep-th/0502076];
D.W. Chiou, O.J. Ganor and B.S. Kim, J. High Energy Phys. 03 (2006) 027
[hep-th/0512242].
[45] A.D. Popov, C. Saemann and M. Wolf, J. High Energy Phys. 10 (2005) 058 [hep-th/0505161].
[46] C. Saemann, “On the mini-superambitwistor space and N = 8 super Yang-Mills theory,”
hep-th/0508137; “The mini-superambitwistor space,” In: Proc. of the Intern. Workshop on
Supersymmetries and Quantum Symmetries (SQS’05), Eds. E. Ivanov and B. Zupnik, Dubna,
2005 [hep-th/0511251].
[47] A. D. Popov, Phys. Lett. B 647 (2007) 509 [hep-th/0702106].
[48] C. Devchand and V. Ogievetsky, Nucl. Phys. B 481 (1996) 188 [hep-th/9606027].
[49] N. Seiberg, Nucl. Phys. Proc. Suppl. 67 (1998) 158 [hep-th/9705117].
[50] A. Konechny and A.S. Schwarz, Phys. Rept. 360 (2002) 353 [hep-th/0012145, hep-th/0107251];
M.R. Douglas and N.A. Nekrasov, Rev. Mod. Phys. 73 (2001) 977 [hep-th/0106048];
R.J. Szabo, Phys. Rept. 378 (2003) 207 [hep-th/0109162].
[51] L.D. Faddeev and L.A. Takhtajan, “Hamiltonian methods in the theory of solitons”,
Springer, Berlin, 1987.
[52] B. Zumino, Phys. Lett. B 87 (1979) 203.
[53] L. Alvarez-Gaumé and D.Z. Freedman, Commun. Math. Phys. 80 (1981) 443.
[54] V.P. Nair and J. Schiff, Nucl. Phys. B 371 (1992) 329;
A. Losev, G.W. Moore, N. Nekrasov and S. Shatashvili,
Nucl. Phys. Proc. Suppl. 46 (1996) 130 [hep-th/9509151].
[55] E.T. Newman, Phys. Rev. D 18 (1978) 2901;
A.N. Leznov, Theor. Math. Phys. 73 (1988) 1233.
[56] T.A. Ioannidou and W. Zakrzewski, Phys. Lett. A 249 (1998) 303 [hep-th/9802177].
[57] V.E. Zakharov and A.V. Mikhailov, Sov. Phys. JETP 47 (1978) 1017;
V.E. Zakharov and A.B. Shabat, Funct. Anal. Appl. 13 (1979) 166;
P. Forgács, Z. Horváth and L. Palla, Nucl. Phys. B 229 (1983) 77;
K. Uhlenbeck, J. Diff. Geom. 30 (1989) 1;
O. Babelon and D. Bernard, Commun. Math. Phys. 149 (1992) 279 [hep-th/9111036].
[58] C.S. Chu and O. Lechtenfeld, Phys. Lett. B 625 (2005) 145 [hep-th/0507062].
[59] A.M. Perelomov, Phys. Rept. 174 (1989) 229.
[60] O. Lechtenfeld and C. Saemann, J. High Energy Phys. 03 (2006) 002 [hep-th/0511130].
[61] R.S. Ward, Phil. Trans. Roy. Soc. Lond. A 315 (1985) 451;
“Multidimensional integrable systems,” In: Field Theory, Quantum Gravity and Strings,
Eds. H.J. De Vega, N. Sanchez, Vol. 2, p.106, 1986.
[62] L.J. Mason and G.A.J. Sparling, Phys. Lett. A 137 (1989) 29; J. Geom. Phys. 8 (1992) 243;
L.J. Mason and M.A. Singer, Commun. Math. Phys. 166 (1994) 191.
[63] S. Chakravarty, M.J. Ablowitz and P.A. Clarkson, Phys. Rev. Lett. 65 (1990) 1085;
M.J. Ablowitz, S. Chakravarty and L.A. Takhtajan, Commun. Math. Phys. 158 (1993) 289;
S. Chakravarty, S.L. Kent and E.T. Newman, J. Math. Phys. 36 (1995) 763;
M.J. Ablowitz, S. Chakravarty and R.G. Halburd, J. Math. Phys. 44 (2003) 3147.
[64] G.V. Dunne, R. Jackiw, S.Y. Pi and C.A. Trugenberger, Phys. Rev. D 43 (1991) 1332
[Erratum-ibid. D 45 (1992) 3012];
I. Bakas and D.A. Depireux, Int. J. Mod. Phys. A 7 (1992) 1767.
[65] T.A. Ivanova and A.D. Popov, Phys. Lett. A 170 (1992) 293; Phys. Lett. A 205 (1995) 158
[hep-th/9508129]; Theor. Math. Phys. 102 (1995) 280;
M. Legaré and A. D. Popov, JETP Lett. 59 (1994) 883; Phys. Lett. A 198 (1995) 195.
[66] M. Hamanaka and K. Toda, Phys. Lett. A 316 (2003) 77 [hep-th/0211148];
J. Phys. A 36 (2003) 11981 [hep-th/0301213];
M. Hamanaka, Phys. Lett. B 625 (2005) 324 [hep-th/0507112];
[67] A. Dimakis and F. Mueller-Hoissen, J. Phys. A 33 (2000) 6579 [nlin.si/0006029];
M.T. Grisaru, L. Mazzanti, S. Penati and L. Tamassia, J. High Energy Phys. 04 (2004) 057
[hep-th/0310214];
I. Cabrera-Carnero, J. High Energy Phys. 10 (2005) 071 [hep-th/0503147].
[68] M. Wolf, J. High Energy Phys. 02 (2005) 018 [hep-th/0412163]; “Twistors and aspects of inte-
grability of self-dual SYM theory”, In: Proc. of the Intern. Workshop on Supersymmetries and
Quantum Symmetries (SQS’05), Eds. E. Ivanov and B. Zupnik, Dubna, 2005 [hep-th/0511230].
[69] S.J. Gates and H. Nishino, Phys. Lett. B 299 (1993) 255 [hep-th/9210163];
A.K. Das and C.A.P. Galvao, Mod. Phys. Lett. A 8 (1993) 1399 [hep-th/9211014];
H. Nishino and S. Rajpoot, Phys. Lett. B 572 (2003) 91 [hep-th/0306290].
ABSTRACT
  We consider a supersymmetric Bogomolny-type model in 2+1 dimensions
originating from twistor string theory. By a gauge fixing this model is reduced
to a modified U(n) chiral model with N<=8 supersymmetries in 2+1 dimensions.
After a Moyal-type deformation of the model, we employ the dressing method to
explicitly construct multi-soliton configurations on noncommutative R^{2,1} and
analyze some of their properties.

<|endoftext|><|startoftext|>
Introduction and Summary 1
2. Action and Hamiltonian 2
2.1 The 3 + 1 split 4
2.2 Shifted Variables 7
2.3 Linearization 7
3. Linearized Gravitational Duality and Holography 9
3.1 Duality and Holography 9
3.2 Linearized gravitational duality 10
3.3 Linearized Constraints and Bianchi Identities 11
3.4 Connection with other known dualities 12
4. The Effect on the Boundary Theory 13
5. Conclusions and Outlook 13
6. Appendix: other duality mappings 14
1. Introduction and Summary
Duality has played an important role in our understanding of Yang-Mill theories and it is believed
that it will play an important role also in gravity and in higher-spin gauge theories. Indeed,
although it is less clear what could be the implications of duality for theories whose quantum
versions are still unknown, gravity and higher-spin gauge theories1 are intimately connected to
a quantum string theory where certainly duality plays a crucial role.
The recent advent of holography raises some intriguing questions for duality. For example
one may wonder what is the holographic image of a duality invariant spectrum, a duality trans-
formation or a possible quantization condition that usually duality implies for charges. Some of
these issues were raised by Witten in [2] where it was argued that the standard electric-magnetic
duality of a U(1) gauge theory on AdS4 is responsible for a “natural” SL(2,Z) action on current
two-point functions in three-dimensional CFTs.2 Shortly afterwards it was shown in [3] that such
an SL(2,Z) action is intimately related to certain “double-trace” deformations in the boundary,
1For reviews of higher-spin theories see e.g. [1].
2See [4] and [5] for more recent works.
– 1 –
assuming suitable large-N limits and existence of non-trivial fixed points. The latter assumptions
are strengthened by the fact that there exist models (e.g., see [6] and references therein) which
exhibit the required behavior. In particular, it was shown in [3] that certain ”double-trace”
deformations induce an SL(2,Z) action on two-point functions of higher-spin (i.e. spin s ≥ 2)
currents. This has led to the Duality Conjecture of [3]: linearized higher-spin theories on AdS4
spaces possess a generalization of electric-magnetic duality whose holographic image is the natural
SL(2,Z) action on boundary two-point functions.
Surprisingly, even the duality for linearized spin-2 gauge fields (linearized gravity) was not
widely known by the time of this conjecture.3 Second order linearized gravitational duality was
discussed among other in [8, 9, 10, 11, 12]. More recently, the duality properties of linearized
gravity around flat space were studied in [13] and were further discussed in [14]. The duality of
linearized gravity around dS4 was later studied in [15].
In this note we present our calculations regarding the duality properties of gravity in the
presence of a cosmological constant. Having in mind applications to higher-spin gauge theories
we use forms and work in the first order formalism where duality is also manifested at the level
of the action [16]. Moreover, the first order formalism is relevant for applications of duality to
holography, since the correlation functions of the boundary theory are essentially determined by
the bulk canonical momenta (see e.g. [17]).
Our aim in this work is to formulate linearized first order gravity using suitable ”electric”
and ”magnetic” variables, in close analogy with electromagnetism. We find that this is pos-
sible only when the background geometry is Minkowski or (A)dS4. Then we implement the
standard electric-magnetic duality rotations. We find that, up to ”boundary” terms, the lin-
earized Hamiltonian changes by terms that do not alter the bulk dynamics i.e. do not alter
the second order bulk equations of motion. Moreover, the duality rotation interchanges the
(linearized) constraints with the (linearized) Bianchi identitites. The ”boundary” terms have
important holographic consequences since they correspond to marginal ”double-trace” deforma-
tions [3] that induce the boundary SL(2,Z) action. In the Appendix we exhibit a modified
duality rotation that leaves the bulk Hamiltonian invariant and induces ”boundary” terms that
correspond to relevant deformations as in [3].
2. Action and Hamiltonian
Having in mind the extension of our results to higher-spin gauge theorieswe start from the
MacDowell-Mansouri form [18] of the gravitational action4
IMM =
ǫabcd
Rab ∧ Rcd + 2Λea ∧ eb ∧ Rcd + Λ2ea ∧ eb ∧ ec ∧ ed
, (2.1)
3An interesting formulation of first order duality for linearized gravity around flat space was presented in [7].
4We note I = −16πGNS, where S is the usually normalized gravitational action.
– 2 –
where a, b, ... are Lorentz indices. In this formalism, the vierbein ea and the spin connection ωab
are initially thought of as independent variables. The curvature 2-form is
Rab = dω
b + ω
c ∧ ω
Rabcde
c ∧ ed.
Varying the action with respect to ea and ωab, we find
Rab + Λea ∧ eb = 0 , (2.2)
T a = dea + ωab ∧ e
b = 0 . (2.3)
The relation to gravity is established via the vanishing torsion equation (2.3), which relates e
and ω in the familiar way. The above equations are equivalent to the Einstein equation in metric
variables
Rµν −
Rgµν = +3Λgµν . (2.4)
and the scalar curvature is R = −d(d − 1)Λ = −12Λ. Note that our Λ is related to the
cosmological constant in its usual definition via Λcosm = −6Λ. Λ > 0 corresponds to AdS.
Note that this is actually SO(3, 2) covariant, as we can combine ω, e into a super-connection.
Note that Λ has units (Length)−2. In the SO(3, 2)-invariant formalism, IMM arises from
IMM =
ǫABCDEV
ERAB ∧RCD , (2.5)
where V E is a non-dynamical 0-form field (that we take to have value V −1 = 1 to gauge back
to the SO(3, 1) formalism) and RAB is the curvature of Ω
B ≡ {e
a, ωab}. There are also quasi-
topological terms of the form
Itop =
RAB ∧R
RAB ∧ RACV
BV C (2.6)
that we could add to the action. In the stated gauge, this reduces to
Itop =
P2 + (θ + α)CNY + α
Rab ∧ e
a ∧ eb (2.7)
where P2 =
Rab∧R
a is the Pontryagin class, CNY =
(T a∧Ta−Rab∧e
a∧eb) is the Nieh-Yan
class and we also note the Euler class E2 =
ǫabcdR
ab∧Rcd. Note that in the presence of torsion,
the action (2.7) contains the non-topological term
Rab ∧ e
a ∧ eb with “Immirzi parameter”
γ = −2/α. In the absence of torsion, this term is a total derivative.
The Hilbert-Palatini action is
IHP = IMM −
E2 . (2.8)
It differs from IMM by a boundary term, is smooth as Λ → 0 but is not manifestly SO(3, 2)-
invariant.
– 3 –
2.1 The 3 + 1 split
Next, we carefully consider the 3 + 1 split. Although much of the discussion here is familiar
from the ADM formalism, we feel it is important to set notation carefully, as we will introduce
some new ingredients. To accommodate both AdS and dS signatures simultaneously, we will
introduce a ‘time’ function t and a foliation of space-time Σt →֒ M . In dS, t is time-like, and
this corresponds to the usual Hamiltonian foliation; in AdS on the other hand, we will take t to
be the (space-like) radial coordinate. We will keep track of the resulting signs by a parameter
σ⊥, equal to ±1 in dS(AdS).
Proceeding as usual then, we get a vector field t that satisfies ∇tt = 1 ≡ t(t) (so t =
and a 1-form dt. Given a 4-metric, we can introduce the normal 1-form n as
n = σ⊥Ndt , (2.9)
which is normalized as (n, n) = σ⊥. The dual vector field n can be expanded as
N , (2.10)
where the shift N satisfies (N,n) = 0, and thus (t,n) = σ⊥N .
Next, we will locally choose a basis of 1-forms
e0 = σ⊥n = Ndt , (2.11)
eα = ẽα +Nαdt . (2.12)
The ẽα span T ∗Σt, and correspond to a 3-metric hij = ẽ
j ηαβ . The quantities N
α are the
components of N: Nα = eαi N
i. These basis 1-forms are dual to {e0 = n, eα = ẽα}, with
b) = δba.
We expand the spin connection in the same basis5
ωab = q
bdt+ ω̃
b , (2.13)
which leads to
Rab = R̃
b + dt ∧ r
b , (2.14)
where R̃ is formed from ω̃ and d̃ only, and
rab = ˙̃ω
b − d̃q
b − ω̃
b + q
b . (2.15)
Note that these quantities are merely decompositions along T ∗Σt in the 4-geometry; we will
introduce the intrinsically defined objects shortly.
We then find
IHP = 2ǫαβγ
N(R̃αβ + Λẽα ∧ ẽβ) ∧ ẽγ − 2Nα(R̃0β) ∧ ẽγ + r0α ∧ ẽβ ∧ ẽγ
. (2.16)
5We have qab = Nω0
b and ω̃
b ≡ ωα
– 4 –
As is familiar, the lapse and shift appear as Lagrange multipliers. The constraints that they
multiply are of course zero in any background (i.e. vacuum solution), such as (A)dS4. The final
term in the action contains the real dynamics – r0α depends on the components R0α0β of the
Riemann tensor.
Note though that the tensors used here are 4-dimensional. Let us define the ”electric field”
Kα = σ⊥ω̃
α = Kβαẽ
β . (2.17)
In the case that ω is the torsion-free Levi-Civita connection, this agrees with the standard
definition for extrinsic curvature, regarded as a vector-valued one-form. We then find
R̃αβ =
(3)Rαβ − σ⊥K
α ∧Kβ , (2.18)
R̃0α = σ⊥(d̃Kα +Kβ ∧ ω̃
α) ≡ σ⊥(D̃K)α . (2.19)
These equations amount to the Gauss-Codazzi relations.
Furthermore, r0α contains time derivatives of ω̃0α as well as terms linear in components of
q. We find
2ǫαβγr
0α ∧ ẽβ ∧ ẽγ = 2ǫαβγ
α − (D̃q)0α
∧ ẽβ ∧ ẽγ , (2.20)
= 2σ⊥ǫαβγ
K̇α + qαδKδ
∧ ẽβ ∧ ẽγ + 4q0α
ǫαβγ T̃
β ∧ ẽγ
up to a total 3-derivative. We have defined the intrinsic 3-torsion T̃ α = d̃ẽα + ω̃α β ∧ ẽ
β. Since
we wish to regard the ẽ as coordinate variables,6 we integrate the first term by parts to obtain
(up to the total time-derivative ∂
α ∧ ẽβ ∧ ẽγǫαβγ
2ǫαβγr
0α ∧ ẽβ ∧ ẽγ = Πα ∧ ˙̃e
α + 4q0αǫαβγ T̃
β ∧ ẽγ + 2σ⊥q
αδǫαβγKδ ∧ ẽ
β ∧ ẽγ . (2.21)
where we have defined the momentum 2-form
Πα = −4σ⊥ǫαβγK
β ∧ ẽγ . (2.22)
The qab appear as Lagrange multipliers. In particular, the qαβ constraint precisely sets the
antisymmetric (torsional) part of the extrinsic curvature tensor K[αβ] to zero. Next, we define
the ”magnetic field”
σ⊥ǫαβγω̃
βγ, ωαβ = −ǫαβγBγ. (2.23)
and we find that the q0α constraint
ǫαβγ T̃
β ∧ ẽγ = ǫαβγ d̃ẽ
β ∧ ẽγ − σ⊥Bβ ∧ ẽ
β ∧ ẽα = 0 , (2.24)
6Without this integration by parts, we would be in the Ashtekar formalism. Here, our choice gives a formalism
closely related to the metric variable formalism. Note that the induced boundary term may be written − 1
Πα∧ ẽ
– 5 –
involves only the antisymmetric part B[α,β] of the magnetic field Bα = Bαβ ẽ
β . The antisymmetric
part of Bα spoils the gauge covariance of the constraint (2.24) under an SO(3) rotation of the
dreibein ẽα, hence it represents degrees of freedom that can be gauged fixed to zero by an SO(3)
rotation. On the other hand, an algebraic equation of motion connects the symmetric part of
Bαβ to derivatives of ẽ
d̃ẽα + ǫαβγBβ ∧ ẽγ = 0 (2.25)
At the end, one is left with the canonically conjugate variables ẽα and Πα. These results are
familiar from the metric formalism.
Dropping the torsional terms, we then arrive at the action
IHP =
˙̃eα ∧ Πα + 2Nǫαβγ(
(3)Rαβ − σ⊥K
α ∧Kβ + Λẽα ∧ ẽβ) ∧ ẽγ
−4σ⊥N
αǫαβγ(D̃K)
β ∧ ẽγ
. (2.26)
Furthermore, using ∗3ẽ
α ∧ ẽβ = 1
αβγ ẽδ, we have
Π̂α = ∗3Πα = −2(Kαβ − ηαβtrK)ẽ
β , (2.27)
where trK = ηαβKαβ . We can solve the above equation to get
Kα = −
(Π̂αβ −
ηαβtrΠ̂)ẽ
β . (2.28)
As stated above, Kαβ (and Π̂αβ) is symmetric when the torsion vanishes.
Finally, with the definition (2.23) we find7
(3)Rαβ ∧ ẽγ = ǫαβγ
d̃ω̃αβ + ω̃αδ ∧ ω̃
∧ ẽγ = σ⊥
2d̃Bγ + ǫαβγB
α ∧ Bβ
∧ ẽγ . (2.29)
Introducing Bα is an unusual thing to do but it will play a role in duality: in this form, the
Hamiltonian contains terms which are reminiscent of those of the Maxwell theory. The full HP
action is of the form
IHP =
˙̃eα ∧Πα − 4σ⊥N
αǫαβγ(D̃K)
β ∧ ẽγ
+2σ⊥N(2d̃Bγ + ǫαβγB
α ∧ Bβ − ǫαβγK
α ∧Kβ + σ⊥Λǫαβγ ẽ
α ∧ ẽβ) ∧ ẽγ
. (2.30)
Note that the entire contribution of the cosmological constant appears in the last term of the
Hamiltonian constraint.
7The spatial signature σ3 appears in ǫαβγǫφδρη
αφ = σ3(ηβδηγρ − ηβρηγδ). We will always consider Lorentzian
spacetime signature, so σ3 = −σ⊥.
– 6 –
2.2 Shifted Variables
It is possible to make a transformation of the canonical variables in order to absorb the cosmo-
logical constant term in (2.30). This can be achieved by introducing the new variables
K̂α = Kα − ρẽα , (2.31)
and requiring that
ρ2 = σ⊥Λ . (2.32)
This is positive only when σ⊥ and Λ are simultaneously positive or negative, as it is the case for
both AdS4 (Λ > 0) and dS4 (Λ < 0). We will often write Λ = σ⊥/L
2 where L is a length scale.
Under (2.31) the momentum 2-form becomes
Πα → Pα − 4σ⊥ρǫαβγ ẽ
β ∧ ẽγ . (2.33)
The last term in (2.33) contributes a total time derivative to the action (of the form of a boundary
cosmological term). We have introduced a new momentum variable
Pα = −4σ⊥ǫαβγK̂
β ∧ ẽγ .
Then, we get the action
IHP =
˙̃eα ∧ Pα − 4σ⊥N
αǫαβγ(D̃K̂ + ρT̃ )
β ∧ ẽγ −
σ⊥ρǫαβγ
(ẽα ∧ ẽβ ∧ ẽγ)
+2σ⊥N
2d̃(Bα ∧ ẽ
α) + 2Bγ ∧ T̃
γ − ǫαβγ
Bα ∧Bβ + K̂α ∧ K̂β + 2ρK̂α ∧ ẽβ
∧ ẽγ
.(2.34)
Note that the shift constraint is still written in terms of the ordinary covariant derivative, and
thus involves a non-linear term coupling B to K̂. Consistent with our previous discussion, we
drop the terms involving the torsion T̃ , and disregard the boundary term to obtain
IHP =
˙̃eα ∧ Pα − 4σ⊥N
αǫαβγ(D̃K̂)
β ∧ ẽγ
+2σ⊥N
2d̃(Bα ∧ ẽ
α)− ǫαβγ
Bα ∧Bβ + K̂α ∧ K̂β + 2ρK̂α ∧ ẽβ
∧ ẽγ
. (2.35)
We note that the parameter ρ can be of either sign (although, this sign does not appear in the
second order equations of motion).
2.3 Linearization
Next, we linearize the above action around an appropriate fixed background. We expand as
ẽα = ẽα + Eα, N = 1 + n, Nα = nα, Bα = Bα + bα, K̂α = K̂
+ kα . (2.36)
The background values should satisfy the constraints. The simplest choice is the background
where
= 0 = Bα . (2.37)
– 7 –
In fact, reaching this simple form was a motivation for the shift (2.31). Then, to quadratic order
in the fluctuating fields the Hamiltonian gives
IHP =
Ėα ∧ pα − 4σ⊥n
αǫαβγ d̃k
β ∧ ẽγ + 4σ⊥n
d̃(bα ∧ ẽ
α)− ρǫαβγk
α ∧ ẽβ ∧ ẽγ
−2σ⊥ǫαβγ
bα ∧ bβ + kα ∧ kβ + 2ρkα ∧ Eβ
∧ ẽγ
,(2.38)
where
pα = −4σ⊥ǫαβγk
β ∧ ẽγ (2.39)
are the linearized momentum variables conjugate to Eα.
In order to reach the form (2.38) the linear terms in the fluctuations must vanish. For this
to happen we find the relationships
˙̃eα + ρẽα = 0 . (2.40)
Notice that we can also write the linearized action in the form
IHP =
(Ėα + ρEα) ∧ pα − 2σ⊥ǫαβγ
bα ∧ bβ + kα ∧ kβ
∧ ẽγ
−4σ⊥n
αǫαβγ d̃k
β ∧ ẽγ + n
4σ⊥d̃bγ + ρpγ
∧ ẽγ
. (2.41)
The form of the first term, involving the momentum, makes clear that longitudinal fluctuations
are non-dynamical. The natural time dependence of Eα is of the form e−ρt (correspondingly, the
natural time dependence of pα is e
+ρt). Other than that, we see that in comparing to the flat
space action, in these variables, the only change is that the Hamiltonian constraint is modified.
The solutions of (2.40) and (2.37) are components of (A)dS4 spacetimes. We can solve (2.40)
to obtain
e0 = dt, eα = e−ρtdxα . (2.42)
With these we construct the usual Poincaré metric on (A)dS which, however, covers only half of
the space even though the parameter t runs from −∞ to +∞. The conformal boundary in these
coordinates is at t = +∞. Then we derive
ωα0 = −ρe
−ρtdxα = −ρeα , (2.43)
and so
Rαβ = −
eα ∧ eβ
Rα0 = −
eα ∧ e0
Rab = −
ea ∧ eb . (2.44)
Hence Ricab = −
ηab and R = −12σ⊥/L
2 = −12Λ. We also evaluate
Πα = −4σ⊥ρǫαβγ ẽ
β ∧ ẽγ , Π̂
= 4ρẽα, trΠ̂ = 12ρ (2.45)
Bα = 0, K
α = ρẽα ⇒ K̂
= 0 (2.46)
Note that in this gauge, (D̃K)α = 1
= 0, which solves the shift constraint, while the Hamil-
tonian constraint is satisfied through a cancellation between the K2 term and the cosmological
term.
– 8 –
3. Linearized Gravitational Duality and Holography
Let us summarize what we have obtained so far. In the presence of a cosmological constant
we have defined variables such that the action resembles most closely the action without the
cosmological constant. This was done in order to look for a suitable background around which
linear fluctuations are as simple as possible. Requiring that K̂ (the “electric field”) and B (the
”magnetic field”) vanish in such a background - as they do around flat space - we found that
the background should be (A)dS4. Quite satisfactorily, both sign choices for ρ in the change of
variables (2.31) lead to (A)dS4 spacetimes.
3.1 Duality and Holography
This is the appropriate point to recall some salient features of duality rotations. In simple
Hamiltonian systems the effect of the canonical transformation p 7→ q and q 7→ −p to the action
is (see e.g. [19])
dt[pq̇ −H(p, q)] 7→ ID =
dt[−qṗ−H(q,−p)] . (3.1)
Notice that ID involves the dual variables, for which we have however kept the same notation for
simplicity. The transformed Hamiltonian H(q,−p) is in general not related to H(p, q). However,
if H(q,−p) = H(p, q) we call the above transformation a duality. It then holds
ID = I − qp
. (3.2)
The dual action describes exactly the same dynamics as the initial one, up to a modification of
the boundary conditions. For example, if I is stationary on the e.o.m for fixed q in the boundary,
ID is stationary on the same e.o.m. for fixed p in the boundary. This simple example illustrates
the role of duality in holography; a bulk duality transformation corresponds to a particular
modification of the boundary conditions. This property of duality transformations is behind the
remarkable holographic properties of electormagnetism in (A)dS4 [2, 3].
Clearly, the crucial properties of a duality transformation are to be canonical and to leave
the Hamiltonian unchanged. However, consider a slight generalization
dt[pq̇ −
(p2 + q2 + 2λpq)] (3.3)
where λ is an arbitrary parameter. The Hamiltonian now is not invariant under the canonical
transformation p 7→ q and q 7→ −p – the pq term changes sign. Consequently, the first order form
of the equations of motion are also not duality invariant. Nevertheless, the second order equation
of motion is invariant. We will find that gravity in the presence of a cosmological constant follows
precisely this model. Of course, gravity is a much more complicated constrained system, but as
we will show, the constraints and Bianchi identities transform appropriately.
– 9 –
We also note that the canonical transformation (implemented by a generating functional of
the first kind)
p 7→ q + 2λp , q 7→ −p . (3.4)
is of interest here. The above does not change the Hamiltonian and the transformed action differs
from the initial one by total time derivative terms8
S 7→ SD = S − pq
. (3.5)
3.2 Linearized gravitational duality
As a preamble to gravity we recall the duality properties of Maxwell theory
IMax =
A ∧ ∗3E −
(E ∧ ∗3E +B ∧ ∗3B)−A0d̃ ∗3 E
, (3.6)
Under the duality E 7→ − ∗3 B, B 7→ ∗3E, Ã 7→ ÃD, we find
IMax 7→ IMax,D =
AD ∧ B −
(E ∧ ∗3E +B ∧ ∗3B) + A0d̃B
. (3.7)
E and B in (3.7) should be expressed through ÃD. We observe that the kinetic term has changed
sign, while the Hamiltonian remains invariant. In addition, the (Gauss) constraint is dualized to
the trivial ‘Bianchi’ identity dB = 0 for the dual magnetic field.
Next we try to apply a Maxwell-type duality map in gravity. We consider the following
transformation around the fixed background (2.40)
kα 7→ −bα, bα 7→ kα . (3.8)
To implement the map (3.8) we need to specify the mapping of Eα to a ‘dual 3-bein’ Eα. We do
that using the linearized form of (2.25) as
ǫαβγbβ ∧ ẽγ + d̃E
α = 0 7→ ǫαβγkβ ∧ ẽγ + d̃E
α = 0 = d̃Eα −
pα (3.9)
Since pα = 4σ⊥d̃Eα, it is natural to define
pD,α = 4σ⊥d̃Eα = −4σ⊥ǫαβγb
β ∧ ẽγ , (3.10)
and thus the mapping (3.8) is supplemented by
E 7→ E , E 7→ −E , p 7→ −pD , pD 7→ p (3.11)
8In holography, the latter terms correspond to the relevant ”multi-trace” boundary deformations discussed in
– 10 –
Now, let us see the effects of the above duality mapping. The action transforms to
IHP 7→ IHP,D =
−Ėα ∧ pD,α − ρE
α ∧ pD,α − 2σ⊥ǫαβγ
bα ∧ bβ + kα ∧ kβ
∧ ẽγ (3.12)
+4σ⊥n
αǫαβγ d̃b
β ∧ ẽγ + n
4σ⊥d̃kγ + ρpD,α
∧ ẽγ
where now kα and bα should be expressed in terms of the dual variables Eα and pD,α via (3.9) and
(3.10). We notice that the ’kinetic’ part Ė ∧ p of the action changes sign under the duality map,
in direct analogy with the Maxwell case. However, the Hamiltonian is not invariant due to the
change of sign of the second term in the first line of (3.12). We will discuss this further in a later
section. For now, we note that this sign change would not show up in the equations of motion,
written in second order form. It is important to also note that the constraints are transformed
into quantities which in the next subsection we will recognize as the linearized Bianchi identities.
This is to be expected since the duality transformations are canonical. We also note that it may
be possible to choose an alternative canonical transformation, designed to leave the Hamiltonian
invariant. The latter is presumably related to the work of Julia et. al. [15] and is considered in
the Appendix.
3.3 Linearized Constraints and Bianchi Identities
By virtue of the discussion above we may now demonstrate that under the duality mapping (3.8)
the linearized constraints transform to the linearized Bianchi identities as
Cα ≡ ǫαβγ d̃k
β ∧ ẽγ 7→ −ǫαβγ d̃b
β ∧ ẽγ (3.13)
C0 ≡ −σ⊥
d̃bγ − ρǫαβγk
α ∧ ẽβ
∧ ẽγ 7→ −σ⊥
d̃kγ + ρǫαβγb
α ∧ ẽβ
∧ ẽγ (3.14)
To identify the right hand sides, we first note that the Bianchi identities are
b = dR
c ∧ ω
b + ω
b = 0 (3.15)
BaT = dT
a − Rab ∧ e
b + ωab ∧ T
b = 0 (3.16)
which are obtained from the definitions of Rab and T
a by exterior differentiation. The first
equation is satisfied identically. Since the torsion vanishes, the second equation tells us only that
Rab ∧ e
b = 0. If we do the 3+1 split, we find two equations. The first is
α = −((3)Rαβ − σ⊥K
α ∧Kβ) ∧ ẽ
β = 0 (3.17)
which upon using the symmetry of Kα linearizes to
α = −ǫαβγ d̃b
β ∧ ẽγ + . . . (3.18)
Note that this is the image under duality of the shift constraint as in (3.13).
The second identity is
0 = −R̃0α ∧ ẽ
α = −σ⊥(D̃K)α
= −σ⊥
d̃kα + ρǫαβγb
β ∧ ẽγ
∧ ẽα = 0 (3.19)
– 11 –
where to arrive in the second line we used (2.46). This is the image of the Hamiltonian constraint
as in (3.14).
Summarizing, the duality transformations between linearized constraints and Bianchi iden-
tities are
Cα 7→ BT,α C0 7→ B
T (3.20)
BT,α 7→ −Cα B
T 7→ −C0 (3.21)
3.4 Connection with other known dualities
The Maxwell-type duality operation (3.8) is closely related to the dualization of the first two
indices of the Riemann tensor as9
Rab → S
dRcd (3.22)
at least at the linearized level. Let us investigate (3.22) by rewriting expressions in the 3+1 split.
We have
Rab = R̃
b + dt ∧ r
Sab = S̃
b + dt ∧ s
We begin with the spatial 2-forms when we have
R̃αβ = −ǫαβγ d̃Bγ + σ⊥(B
α ∧ Bβ −Kα ∧Kβ) (3.23)
R̃0α = σ⊥(d̃Kα +Kβ ∧ ω̃
α) ≡ σ⊥(D̃K)α (3.24)
S̃0γ =
σ⊥ǫαβγR̃
αβ (3.25)
S̃αβ = ǫαβγR̃
0γ (3.26)
If we linearize these expressions, we find under the duality transformation (3.8)
R̃ab 7→ −σ⊥S̃
ab (3.27)
Because the expressions (3.24) involve derivatives of B and K, the duality (3.8) is an ‘integrated
form’ of the usual Riemann tensor duality, but implies it.
Similarly, if we investigate the spatial 1-forms, we find
rab 7→ −σ⊥s
ab (3.28)
To arrive at this result we have set to zero the Lagrange multiplier field q.
9For a discussion of the duality properties of gravity in terms of the Riemann tensor see [10].
– 12 –
4. The Effect on the Boundary Theory
It is well known that AdS is holographic. We may well ask, in the context of AdS/CFT, how
the duality transformation that we have defined here acts in the boundary. We are instructed
to consider the on-shell bulk action as a function of bulk fields. So, we evaluate the action on a
solution to the equation of motion, resulting in a pure boundary term which is of the form
Sbdy =
pα ∧ E
α (4.1)
Applying the duality transformation to the bulk theory, although the bulk action is not invari-
ant as we have discussed above, nevertheless it may be easily shown that it induces a simple
transformation on the (linearized) boundary term: it simply changes its sign.
Sdualbdy = −
pD,α ∧ E
α (4.2)
This transformation is exactly analogous to what happens in the Maxwell case: it amounts to
the result [21].10
2 = −1 . (4.3)
5. Conclusions and Outlook
Motivated by possible application in holography and in higher-spin gauge theory we have studied
the duality properties of gravity in the Hamiltonian formulation. We have presented the gravity
action in terms of suitable variables that closely resemble the electric and magnetic fields in
Maxwell theory. We have found suitable ”electric” and ”magnetic” field variables, such that at
the linearized level first order gravity most closely resembles electromagnetism. This can be done
only around Minkowksi and (A)dS4 backgrounds.
We have implemented duality transformations in the linearized gravity fluctuations around
these backgrounds. In the presence of a cosmological constant, the Hamiltonian changes, nev-
ertheless the bulk dynamics remains unaltered, while the linearized lapse and shift constraints
are mapped into the linearized Bianchi identities. Moreover, the duality transformations induce
boundary terms whose relevance in holography we have briefly discussed. Finally, we have ex-
hibited a modified duality rotation that leaves the bulk Hamiltonian invariant, while it induces
boundary terms corresponding to relevant deformations.
The main implication of our results is that certain properties of correlations functions in
three-dimensional CFTs mimic the duality of gravity. It would be interesting to extend our
results to black-hole backgrounds and also when topological terms are present in the bulk. We
also expect that one can analyze the duality of higher-spin gauge theories based on our first-order
approach.
Acknowledgments
10See also [22] for an interesting recent application of this formula.
– 13 –
The work of A. C. P. was partially supported by the research program ”PYTHAGORAS
II” of the Greek Ministry of Education. RGL was supported in part by the U.S. Department of
Energy under contract DE-FG02-91ER40709.
6. Appendix: other duality mappings
It is possible to find a transformation that leaves the Hamiltonian unchanged. Consider the
following transformation in the fixed background (2.40)
kα 7→ −bα − 2ρEα, bα 7→ kα . (6.1)
The mapping to the dual dreibein is still specified by (3.9). A straightforward calculation reveals
that the action transforms as
IHP 7→ IHP,D = IHP + 4σ⊥
ǫαβγE
α ∧ bβ + ρǫαβγE
α ∧ Eβ
∧ ẽγ
αǫαβγ d̃b
β ∧ ẽγ − 8ρnαkβ ∧ ẽα ∧ ẽ
+n(4σ⊥d̃kα + 4σ⊥ρǫαβγb
β ∧ ẽγ + 8ΛǫαβγE
β ∧ ẽγ) ∧ ẽα
(6.2)
The transformations (6.1) leaves unchanged the Hamiltonian and changes the action by the total
”time” derivative terms shown in the first line of (6.2).
Moreover, the linearized constraints transform into the linearized Bianchi identities. Let us
see that in some detail. The second term in the shift constraint is zero since kα is a symmetric
one form kα = kαβ ẽ
β with kαβ = kβα; see (2.21).
The term proportional to Λ in the lapse constraint is also zero. This is slightly more involved
to see and it is based on the possibility of solving (3.9) for Eα after gauge fixing.11 One way to
see this is in components. Write Eα = Eαβ ẽ
β and (3.9) becomes
γ − ∂γE
α = ǫ
γ − ǫ
α (6.3)
In the ”Lorentz gauge” where ∂αEβα = 0 = ∂
αkβα the above can be inverted as
Eαβ =
ǫαδγ∂
γkδβ (6.4)
Using (6.4) one verifies that the last term in the lapse constraint vanishes. This modified duality
transformation is probably related to the one considered by Julia et. al. in [15].
References
[1] D. Francia and A. Sagnotti, arXiv:hep-th/0601199.
X. Bekaert, S. Cnockaert, C. Iazeolla and M. A. Vasiliev, arXiv:hep-th/0503128.
11This is the equivalent of inverting Ē = ∇× Ā in the discussion of duality in electromagnetism [16].
– 14 –
[2] E. Witten, arXiv:hep-th/0307041.
[3] R. G. Leigh and A. C. Petkou, JHEP 0312 (2003) 020 [arXiv:hep-th/0309177].
[4] R. Zucchini, Adv. Theor. Math. Phys. 8 (2005) 895 [arXiv:hep-th/0311143].
H. U. Yee, Phys. Lett. B 598 (2004) 139 [arXiv:hep-th/0402115].
[5] S. de Haro and P. Gao, arXiv:hep-th/0701144.
[6] S. Hands, Phys. Rev. D 51 (1995) 5816 [arXiv:hep-th/9411016].
[7] P. C. West, Class. Quant. Grav. 18, 4443 (2001) [arXiv:hep-th/0104081].
[8] T. Curtright, Phys. Lett. B 165 (1985) 304.
[9] J. A. Nieto, Phys. Lett. A 262 (1999) 274 [arXiv:hep-th/9910049].
[10] C. M. Hull, JHEP 0109 (2001) 027 [arXiv:hep-th/0107149].
[11] X. Bekaert, N. Boulanger and M. Henneaux, Phys. Rev. D 67 (2003) 044010
[arXiv:hep-th/0210278]. and
[12] N. Boulanger, S. Cnockaert and M. Henneaux, JHEP 0306 (2003) 060 [arXiv:hep-th/0306023].
[13] M. Henneaux and C. Teitelboim, Phys. Rev. D 71, 024018 (2005) [arXiv:gr-qc/0408101].
[14] S. Deser and D. Seminara, Phys. Rev. D 71, 081502 (2005) [arXiv:hep-th/0503030].
S. Deser and D. Seminara, Phys. Lett. B 607, 317 (2005) [arXiv:hep-th/0411169].
[15] B. L. Julia, arXiv:hep-th/0512320.
B. Julia, J. Levie and S. Ray, JHEP 0511, 025 (2005) [arXiv:hep-th/0507262].
[16] S. Deser and C. Teitelboim, Phys. Rev. D 13, 1592 (1976).
[17] I. Papadimitriou and K. Skenderis, arXiv:hep-th/0404176.
[18] S. W. MacDowell and F. Mansouri, Phys. Rev. Lett. 38 (1977) 739 [Erratum-ibid. 38 (1977)
1376].
[19] H. Goldstein, ”Classical Mechanics”, Addison-Wesley Publishing Company Inc. (1980)
[20] E. Witten, arXiv:hep-th/0112258.
[21] A. C. Petkou, Fortsch. Phys. 53, 962 (2005).
[22] C. P. Herzog, P. Kovtun, S. Sachdev and D. T. Son, arXiv:hep-th/0701036.
– 15 –
ABSTRACT
  We discuss the implementation of electric-magnetic duality transformations in
four-dimensional gravity linearized around Minkowski or (A)dS4 backgrounds. In
the presence of a cosmological constant duality generically modifies the
Hamiltonian, nevertheless the bulk dynamics is unchanged. We pay particular
attention to the boundary terms generated by the duality transformations and
discuss their implications for holography.

<|endoftext|><|startoftext|>
Introduction
It is with great pleasure that we contribute to this book in honor of Prof.
Takeo Fujiwara. GTL enjoyed eighteen months of Prof. Fujiwara’s hospi-
tality at the University of Tokyo during the early 1990’s. At that time the
work of Prof. Fujiwara in the field of electronic structure of quasicrystals
had already made a major contribution to the literature (see for instance
[1]). Since that time our research owes much to his work.
Prof. Fujiwara was the first who performed realistic calculations of the
electronic structure in quasicrystalline materials without adjustable param-
eters (ab-initio calculations) [2]. Indeed these complex alloys [3] have very
exotic physical properties (see Refs. [4, 5] and Refs therein), and it rapidly
appeared that realistic calculations on the actual quasicrystalline materials
are necessary to understand the physical mechanism that govern this prop-
erties. In particular, these calculations allow to analyze numerically the role
http://arxiv.org/abs/0704.0532v1
2 ELECTRONIC STRUCTURE 2
of transition-metal elements which is essential in those materials.
In this paper, we briefly present our work on the role of transition-metal
element in electronic structure and transport properties of quasicrystals and
related complex phases. Several Parts of these works have been done or
initiated in collaboration with Prof. T. Fujiwara.
2 Electronic structure
2.1 Ab-initio determination of the density of states
A way to study the electronic structure of quasicrystal is to consider the case
of approximants. Approximants are crystallines phases, with very large unit
cell, which reproduce the atomic order of quasicrystals locally. Experiments
indicate that approximant phases, like α-AlMnSi, α-AlCuFeSi, R-AlCuFe,
etc., have transport properties similar to those of quasicrystals [4, 6]. In 1989
and 1991, Prof. Fujiwara performed the first numerical calculations of the
electronic structure in realistic approximants of quasicrystals [2, 7, 8]. He
showed that their density of states (DOS, see figure 1) is characterized by a
depletion near the Fermi energy EF, called “pseudo-gap”, in agreement with
experimental results (for review see Ref. [4, 9, 18]) and a Hume-Rothery
stabilization [10, 11]. The electronic structure of simpler crystals such as
orthorhombic Al6Mn, cubic Al12Mn, present also a pseudo-gap near EF
which is less pronounced than in complex approximants phases (figure 1)
[11].
2.2 Models to analyze the role of transition-metal element
sp–d hybridization model
The role of the transition-metal (TM, TM= Ti, Cr, Mn, Fe, Co, Ni) elements
in the pseudo-gap formation has been shown from experiments, ab-initio
calculations and model analysis [4,13–19,11]. Indeed the formation of the
pseudo-gap results from a strong sp–d coupling associated to an ordered
2 ELECTRONIC STRUCTURE 3
-12 -10 -8 -6 -4 -2 0 2 4
-12 -10 -8 -6 -4 -2 0 2 4
Energy (eV)
Fα-Al
Figure 1: Ab-initio total DOS of Al6Mn (simple crystal) and α-
Al69.6Si13.0Mn17.4 (approximant of icosahedral quasicrystals) [11, 12].
sub-lattice of TM atoms [19, 11]. Consequently, the electronic structure, the
magnetic properties and the stability, depend strongly on the TM positions,
as was shown from ab-initio calculations [28–33,20,21].
How an effective TM–TM interaction induces stability?
Just as for Hume-Rothery phases a description of the band energy can be
made in terms of pair interactions (figure 2) [17, 19]. Indeed, it has been
shown that an effective medium-range Mn–Mn interaction mediated by the
sp(Al)–d(Mn) hybridization plays a determinant role in the occurrence of
the pseudo-gap [19]. We have shown that this interaction, up to distances
10–20 Å, is essential in stabilizing these phases, since it can create a Hume-
Rothery pseudo-gap close to EF. The band energy is then minimized as
shown on figure 3 [20, 11].
2 ELECTRONIC STRUCTURE 4
2 3 4 5 6 7 8 9 10 11 12 13
Mn−Mn distance r (A)
-0.04
-0.02
with repulsive term
without repulsive term
repulsive term : b e
− a r 
r = 4.8 A
r = 6.7 A
Figure 2: Effective medium-range Mn–Mn interaction between two non-
magnetic manganese atoms in a free electron matrix which models aluminum
atoms. [11]
0 4 8 12 16 20 24 28 32 36 40
L (A)
α-AlMnSi
β-AlMnSi
Figure 3: Variation of the band energy due to the effective Mn–Mn interac-
tion in o-Al6Mn, α-AlMnSi and β-Al9Mn3Si. [20]
The effect of these effective Mn–Mn interactions has been also studied
by several groups [17, 20, 21] (see also Refs in [11]). It has also explained the
origin of large vacancies in the hexagonal β-Al9Mn3Si and ϕ-Al10Mn3 phases
on some sites, whereas equivalent sites are occupied by Mn in µ-Al4.12Mn and
λ-Al4Mn, and by Co in Al5Co2 [20]. On the other hand, an spin-polarized
2 ELECTRONIC STRUCTURE 5
effective Mn–Mn interaction is also determinant for the existence (or not)
of magnetic moments in AlMn quasicrystals and approximants [21, 22, 32].
The analysis can be applied to any Al(rich)-Mn phases, where a small
number of Mn atoms are embedded in the free electron like Al matrix. The
studied effects are not specific to quasicrystals and their approximants, but
they are more important for those alloys. Such a Hume-Rothery stabiliza-
tion, governed by the effective medium-range Mn–Mn interaction, might
therefore be intrinsically linked to the emergence of quasi-periodicity in
Al(rich)-Mn system.
Cluster Virtual Bound states
One of the main results of the ab-initio calculations performed by Prof.
Fujiwara for realistic approximant phases, is the small energy dispersion
of electrons in the reciprocal space. Consequently, the density of states of
approximants is characterized by “spiky” peaks [2, 7, 8, 28]. In order to
analyze the origin of this spiky structure of the DOS, we developed a model
that show a new kind of localization by atomic cluster [23].
As for the local atomic order, one of the characteristics of the quasicrys-
tals and approximants is the occurrence of atomic clusters on a scale of 10–30
Å [25]. The role of clusters has been much debated in particular by C. Janot
[24] and G. Trambly de Laissardière [23]. Our model is based on a standard
description of inter-metallic alloys. Considering the cluster embedded in a
metallic medium, the variation ∆n(E) of the DOS due to the cluster is cal-
culated. For electrons, which have energy in the vicinity of the Fermi level,
transition atoms (such as Mn and Fe) are strong scatters whereas Al atoms
are weak scatters. In the figure 4 the variation, ∆n(E), of the density of
states due to different clusters are shown. The Mn icosahedron is the actual
Mn icosahedron of the α-AlMnSi approximant. As an example of a larger
cluster, we consider one icosahedron of Mn icosahedra.
∆n(E) of clusters exhibits strong deviations from the Virtual Bound
2 ELECTRONIC STRUCTURE 6
6 7 8 9 10 11 12 13 14 15
Energy E (eV)
)) 1 Mn atom
     1 Mn 
 icosahedron
 1 icosahedron 
    of 12 Mn 
    icosahedra
Figure 4: Variation ∆n(E) of the DOS due to Mn atoms. Mn atoms are
embedded in a metallic medium (Al matrix). From [23].
States (1 Mn atom) [26]. Indeed several peaks and shoulders appear. The
width of the most narrow peaks (50 − 100meV) are comparable to the fine
peaks of the calculated DOS in the approximants (figure 1). Each peak
indicates a resonance due to the scattering by the cluster. These peaks
correspond to states “localized” by the icosahedron or the icosahedron of
icosahedra. They are not eigenstate, they have finite lifetime of the order of
~/δE, where δE is the width of the peak. Therefore, the stronger the effect
of the localization by cluster is, the narrower is the peak. A large lifetime
is the proof of a localization, but in the real space these states have a quite
large extension on length scale of the cluster.
The physical origin of these states can be understood as follows. Elec-
trons are scattered by the Mn atoms of a cluster. By an effect similar to
that of a Faraday cage, electrons can by confined by the cluster provided
that their wavelength λ satisfies λ & l, where l is the distance between two
Mn spheres. Consequently, we expect to observe such a confinement by the
3 TRANSPORT PROPERTIES 7
cluster. This effect is a multiple scattering effect, and it is not due to an
overlap between d-orbitals because Mn atoms are not first neighbor.
3 Transport properties
Quasicrystals have many fascinating electronic properties, and in particular
quasicrystals with high structural quality, such as the icosahedral AlCuFe
and AlPdMn alloys, have unconventional conduction properties when com-
pared with standard inter-metallic alloys. Their conductivities can be as low
as 150–200 (Ω cm)−1 (see Refs. [4, 5, 27] and Refs. therein). Furthermore
the conductivity increases with disorder and with temperature, a behavior
just at the opposite of that of standard metal. In a sense the most striking
property is the so-called “inverse Mathiessen rule” according to which the
increases of conductivity due to different sources of disorder seems to be ad-
ditive. This is just the opposite that happens with normal metals where the
increases of resistivity due to several sources of scattering are additive. An
important result is also that many approximants of these quasicrystalline
phases have similar conduction properties. For example the crystalline α-
AlMnSi phase with a unit cell size of about 12 Å and 138 atoms in the unit
cell has a conductivity of about 300 (Ω cm)−1 at low temperature [4].
3.1 Small Boltzmann velocity
Prof. Fujiwara et al. was the first to show that the electronic structure
of AlTM approximants and related phases is characterized by two energy
scales [2, 7, 8, 28, 29] (see previous section). The largest energy scale, of
about 0.5−1 eV, is the width of the pseudogap near the Fermi energy EF. It
is related to the Hume–Rothery stabilization via the scattering of electrons
by the TM sub-lattice because of a strong sp–d hybridization. The smallest
energy scale, less than 0.1 eV, is characteristic of the small dispersion of the
band energy E(k). This energy scale seems more specific to phases related to
3 TRANSPORT PROPERTIES 8
Temperature
Metallic alloys
"Perfect" stable quasicrystals
Doped semi-conductors
4 K 300 K
Metastable quasicrystals (i-AlMn),
(i-AlCuFe and i- AlPdMn)
"Imperfect" stable quasicrystals
(i-AlLiCu)
Amorphous alloys
ρ Mott
Figure 5: Schematic temperature dependencies of the experimental resistiv-
ity of quasicrystals, amorphous and metallic crystals.
1e+14 1e+15
1/τ     (s−1)
Al (f.c.c.)
Temperature
Figure 6: Ab-initio elec-
trical resistivity versus
inverse scattering time,
in cubic approximant α-
Al69.6Si13.0Mn17.4, pure Al
(f.c.c.), and cubic Al12Mn.
the quasi-periodicity. The first consequence on transport is a small velocity
at Fermi energy, Boltzmann velocity, VB = (∂E/∂k)E=EF . From numerical
calculations, Prof. Fujiwara et al. evaluated the Bloch–Boltzmann dc con-
ductivity σB in the relaxation time approximation. With a realistic value
3 TRANSPORT PROPERTIES 9
of scattering time, τ ∼ 10−14 s [27], one obtains σB ∼ 10 − 150 (Ωcm)
−1 for
a α-AlMn model [8] and 1/1-AlFeCu model [28]. This corresponds to the
measured values [4, 6], which are anomalously low for metallic alloys. For
decagonal approximant the anisotropy found experimentally in the conduc-
tivity is also reproduced correctly [29].
3.2 Quantum transport in Quasicrystals and approximants
The semi-classical Bloch–Boltzmann description of transport gives inter-
esting results for the intra-band conductivity in crystalline approximants,
but it is insufficient to take into account many aspects due to the spe-
cial localization of electrons by the quasi-periodicity (see Refs. [34–43] and
Refs. therein). Some specific transport mechanisms like the temperature
dependence of the conductivity (inverse Mathiessen rule, the defects influ-
ence, the proximity of a metal / insulator transition), require to go beyond
a Bloch–Boltzmann analysis. Thus, it appears that in quasicrystals and re-
lated complex metallic alloys a new type of breakdown of the semi-classical
Bloch-Boltzmann theory operates. In the literature, two different unconven-
tional transport mechanisms have been proposed for these materials. Trans-
port could be dominated, for short relaxation time τ by hopping between
“critical localized states”, whereas for long time τ the regime could be dom-
inated by non-ballistic propagation of wave packets between two scattering
events.
We develop a theory of quantum transport that applies to a normal bal-
listic law but also to these specific diffusion laws. As we show phenomenolog-
ical models based on this theory describe correctly the experimental trans-
port properties [41, 42, 43] (compare figures 5 and 6).
3.3 Ab-initio calculations of quantum transport
According to the Einstein relation the conductivity σ depends on the diffu-
sivity D(E) of electrons of energy E and the density of states n(E) (summing
3 TRANSPORT PROPERTIES 10
the spin up and spin down contribution). We assume that n(E) and D(E)
vary weakly on the thermal energy scale kT , which is justified here. In that
case, the Einstein formula writes
σ = e2n(EF)D(EF) (1)
where EF is the chemical potential and e is the electronic charge. The tem-
perature dependence of σ is due to the variation of the diffusivity D(EF )
with temperature. The central quantity is thus the diffusivity which is re-
lated to quantum diffusion. Within the relaxation time approximation, the
diffusivity is written [41]
D(E) =
C0(E, t) e
−|t|/τ dt (2)
where C0(E, t) =
Vx(t)Vx(0) + Vx(0)Vx(t)
it the velocity correlation
functions without disorder, and τ is the relaxation time. Here, the effect
of defects and temperature (scattering by phonons ...) is taken into account
through the relaxation time τ . τ decreases as disorder increases. In the
case of crystals phases (such as approximants of quasicrystals), one obtains
[42, 43]:
σ = σB + σNB (3)
σB = e
2n(EF)V
B τ and σNB = e
2n(EF)
L2(τ)
where σB is actual the Bolzmann contribution to the conductivity and σNB
a non-Boltzmann contribution. L2(τ) is smaller than the square of the unit
cell size L0. L
2(τ) can be calculated numerically for the ab-initio electronic
structure [42]. From (3) and (4), it is clear that the Bolzmann term domi-
nates when L0 ≪ VBτ : The diffusion of electrons is then ballistic, which is
the case in normal metallic crystals. But, when L0 ≃ VBτ , i.e. when the
Bolzmann velocity VB is very low, the non-Bolzmann term is essential. In
the case of α-Al69.6Si13.0Mn17.4 approximant (figure 7) [42], with realistic
value of τ (τ equals a few 10−14 s [27]), σNB dominates and σ increases when
3 TRANSPORT PROPERTIES 11
1e+14 2e+14 3e+14 4e+14
1/τ     (s−1)
Figure 7: Ab-initio dc-conductivity σ in cubic approximant α-
Al69.6Si13.0Mn17.4 versus inverse scattering time. [42]
1e+14 2e+14 3e+14 4e+14
1/τ     (s−1)
1e+05
2e+05
3e+05
4e+05
Figure 8: Ab-initio dc-conductivity σ in an hypothetical cubic approximant
α-Al69.6Si13.0Cu17.4 versus inverse scattering time. [43]
1/τ increases, i.e. when defects or temperature increases, in agreement with
experimental measurement (compare figures 5 and 6).
To evaluate the effect of TM elements on the conductivity, we have
considered an hypothetical α-Al69.6Si13.0Cu17.4 constructed by putting Cu
atoms in place of Mn atoms in the actual α-Al69.6Si13.0Mn17.4 structure.
Cu atoms have almost the same number of sp electrons as Mn atoms, but
their d DOS is very small at EF. Therefore in α-Al69.6Si13.0Cu17.4, the effect
of sp(Al)–d(TM) hybridization on electronic states with energy near EF is
4 CONCLUSION 12
very small. As a result, the pseudogap disappears in total DOS, and the
conductivity is now ballistic (metallic), σ ≃ σB, as shown on figure 8.
4 Conclusion
In this article we present the effect of transition-metal atoms on the physical
properties of quasicrystals and related complex phases. These studies lead
to consider these aluminides as spd electron phases [11], where a specific
electronic structure governs stability, magnetism and quantum transport
properties. The principal aspects of this new physics are now understood
particularly thanks to seminal work of Prof. T. Fujiwara and subsequent
developpements of his ideas.
References
[1] Fujiwara T, Tsunetsugu H. In: Di Vincenxo DP, Steinhart PJ, editors,
Quasicrystals: The states of the art, Singapore: World Scientific, 1991.
[2] Fujiwara T. Phys Rev 1989;B40:942.
[3] Shechtman D, Blech I, Gratias D, Cahn JW. Phys Rev Lett
1984;53:1951.
[4] Berger C. In: Hippert F, Gratias D, editors. Lecture on Quasicrystals.
Les Ulis: Les Editions de Physique, 1994; p. 463.
[5] Grenet T. In: Belin-Ferré E, Berger C, Quiquandon M, Sadoc A, edi-
tors. Quasicrystals: Current Topics. Singapor: World Scientific, 2000;
p. 455.
[6] Quivy A, Quiquandon M, Calvayrac Y, Faudot F, Gratias D, Berger C,
Brand RA, Simonet V, Hippert. J Phys Condens Matter 1996;8:4223.
[7] Fujiwara T, Yokokawa T. Phys Rev Lett 1991;66:333.
REFERENCES 13
[8] Fujiwara T, Yamamoto S, Trambly de Laissardière G. Phys Rev Lett
1993;71:4166. Mat Sci Forum 1994;150-151:387.
[9] Mizutani U, Takeuchi T, Sato H. J Phys: Condens Matter
2002;14:R767.
[10] Massalski TB, Mizutani U. Prog Mater Sci 1978;22:151.
[11] Trambly de Laissardière G, Nguyen Manh D, Mayou D, Prog Mater Sci
2005;50:679.
[12] Zijlstra ES, Bose SK. Phys Rev 2003;B67:224204.
[13] Dankházi Z, Trambly de Laissardière G, Nguyen–Manh D, Belin E,
Mayou D. J Phys: Condens Matter 1993;5:3339.
[14] Trambly de Laissardière G, Mayou D, Nguyen Manh D. Europhys
Lett 1993;21:25. J Non-Cryst Solids 1993;153-154:430. Trambly de Lais-
sardière G, et al. Phys Rev 1995;B52:7920.
[15] Berger C, Belin E, Mayou D. Annales de Chimie-Science des Matériaux
1993;18:485.
[16] Mayou D, Cyrot–Lackmann F, Trambly de Laissardière G, Klein T. J
Non-Cryst Solids 1993;153-154:412.
[17] Zou J, Carlsson AE. Phys Rev Lett 1993;70:3748.
[18] Belin-Ferré E. J Non-Cryst Solids 2004;334-335:323.
[19] Trambly de Laissardière G, Nguyen Manh D, Mayou D. J Non-Cryst
Solids 2004;334-335:347.
[20] Trambly de Laissardière G. Phys Rev 2003;B68:045117.
[21] Trambly de Laissardière G, Mayou D. Phys Rev Lett 2000;85:3273.
[22] Simonet V, Hippert F, Audier M, Trambly de Laissardière G. Phys Rev
1998;B58:R8865.
REFERENCES 14
[23] Trambly de Laissardière G, Mayou M. Phys Rev 1997;B55:2890. Tram-
bly de Laissardière G, Roche S, Mayou D. Mat Sci Eng 1997;A226-
228:986.
[24] Janot C, de Boissieu M. Phys Rev Lett 1994;72:1674.
[25] Gratias D, Puyraimond F, Quiquandon M, Katz A. Phys Rev
2000;B63:24202.
[26] Friedel J. Can J Phys 1956;34:1190.
Anderson PW. Phys Rev 1961;124:41.
[27] Mayou D, Berger C, Cyrot–Lackmann F, Klein T, Lanco P. Phys Rev
Lett 1993;70:3915.
[28] Trambly de Laissardière G, Fujiwara T. Phys Rev 1994;B50:5999.
[29] Trambly de Laissardière G, Fujiwara T. Phys Rev 1994;B50:9843.
Mat Sci Eng 1994;A181-182:722.
[30] Hafner J, Krajč́ı M. Phys Rev 1998;B57:2849.
[31] Krajč́ı M, Hafner J. Phys Rev 1998;B58:14110.
[32] Nguyen–Manh D, Trambly de Laissardière G. J Mag Mag Mater
2003;262:496.
[33] Zijlstra ES, Bose SK, Klanǰsek M, Jeglič P, Dolinšek J. Phys Rev
2005;B72:174206.
[34] Tokihiro T, Fujiwara T, Arai M. Phys. Rev 1988;B38:5981.
[35] Fujiwara T, Mitsui T, Yamamoto S. Phys Rev B 1996;53,R2910.
[36] Roche S, Trambly de Laissardière G, Mayou D. J Math Phys
1996;38:1794.
[37] Roche S, Mayou D. Phys Rev Lett 1997;79:2518.
REFERENCES 15
[38] Mayou D. In: Belin-Ferré E, Berger C, Quiquandon M, Sadoc A, edi-
tors. Quasicrystals: Current Topics. Singapor: World Scientific, 2000;
p. 412.
[39] Triozon F, Vidal J, Mosseri R, Mayou D. Phys Rev 2002;B65:220202.
[40] Bellissard J. In: Garbaczeski P, Olkieicz R, editors. Dynamics of Dissi-
pation, Lecture Notes in Physics. Berlin: Springer, 2003; p. 413.
[41] Mayou D. Phys Rev Lett 2000;85:1290.
[42] Trambly de Laissardière G, Julien JP, Mayou D. Phys Rev Lett
2006;97:026601.
[43] Mayou D, Trambly de Laissardière G. In: Fujiwara T, Ishii Y, editors.
Quasicrystals. Series “Handbook of Metal Physics”. Elsevier Science,
2007. to appear
	Introduction
	Electronic structure
	Ab-initio determination of the density of states
	Models to analyze the role of transition-metal element
	Transport properties
	Small Boltzmann velocity
	Quantum transport in Quasicrystals and approximants
	Ab-initio calculations of quantum transport
	Conclusion
ABSTRACT
  In this paper, we briefly present our work on the role of transition-metal
element in electronic structure and transport properties of quasicrystals and
related complex phases. Several Parts of these works have been done or
initiated in collaboration with Prof. T. Fujiwara.

<|endoftext|><|startoftext|>
TbMn2O5
Non-resonant and Resonant X-ray Scattering Studies on Multiferroic TbMn2O5
J. Koo1, C. Song1, S. Ji1, J.-S. Lee1, J. Park1, T.-H. Jang1, C.-H. Yang1, J.-H. Park1,2, Y. H. Jeong1, K.-B. Lee1,2,∗
T.Y. Koo2, Y.J. Park2, J.-Y. Kim2, D. Wermeille3,† A.I. Goldman3, G. Srajer4, S. Park5, and S.-W. Cheong5,6
eSSC and Department of Physics, POSTECH, Pohang 790-784, Korea
Pohang Accelerator Laboratory, Pohang University of Science and Technology, Pohang 790-784, Korea
Ames Laboratory, Department of Physics and Astronomy, Iowa State University, Ames, IA 50011, USA
Advanced Photon Source, Argonne National Laboratory, Argonne, IL 60439, USA
Rutgers Center for Emergent Materials and Department of Physics and Astronomy,
Rutgers University, Piscataway, NJ 08854, USA
Laboratory of Pohang Emergent Materials and Department of Physics, POSTECH, Pohang 790-784, Korea
(Dated: November 1, 2018)
Comprehensive x-ray scattering studies, including resonant scattering at Mn L-edge, Tb L- and
M -edges, were performed on single crystals of TbMn2O5. X-ray intensities were observed at a
forbidden Bragg position in the ferroelectric phases, in addition to the lattice and the magnetic
modulation peaks. Temperature dependences of their intensities and the relation between the mod-
ulation wave vectors provide direct evidences of exchange striction induced ferroelectricity. Resonant
x-ray scattering results demonstrate the presence of multiple magnetic orders by exhibiting their
different temperature dependences. The commensurate-to-incommensurate phase transition around
24 K is attributed to discommensuration through phase slipping of the magnetic orders in spin
frustrated geometries. We proposed that the low temperature incommensurate phase consists of the
commensurate magnetic domains separated by anti-phase domain walls which reduce spontaneous
polarizations abruptly at the transition.
PACS numbers: 77.80.e-, 75.25.+z, 64.70.Rh, 61.10.-i
In recent years, much attention has been paid to mul-
tiferroic materials, in which magnetic and ferroelectric
orders coexist and are cross-correlated [1, 2, 3, 4, 5,
6, 7, 8, 9, 10], due to theoretical interests and poten-
tial application to magnetoelectric (ME) devices. Ma-
nipulation of electric polarizations by external magnetic
fields has been demonstrated in some of these materi-
als [4, 5]. Orthorhombic TbMn2O5, one of the multi-
ferroic materials, displays a rich phase diagram. Upon
cooling through TN ∼ 41 K, TbMn2O5 becomes anti-
ferromagnetic with an incommensurate magnetic (ICM)
order which transits to a commensurate magnetic (CM)
phase with spontaneous electric polarization at T c1 ∼
36 K, and reenters a low temperature incommensurate
magnetic (LT-ICM) phase at T c2 ∼ 24 K. Anomalies
of ferroelectricity and dielectric properties were observed
concurrently with these magnetic phase transitions [4, 9].
Especially, the reentrant LT-ICM phase is a phenomenon
peculiar to RMn2O5 multiferroics while commensurate
phases are more common as the low temperature ground
states. Since the CM to LT-ICM phase transition is also
accompanied with an abrupt loss of spontaneous polar-
izations, it is critical to elucidate the natures of the in-
commensurability of the material, including the mecha-
nism of the CM to LT-ICM phase transition.
The origin of the complex phases of the material is at-
tributed to the coupling between magnetic moments of
Mn ions and lattice [8, 9]. It is suggested that, when a
magnetic order is modulated with a wave vector qm, the
exchange striction affects inter-atomic bondings result-
ing in a periodic lattice modulation with a wave vector
qc = 2qm [5, 6, 7, 8, 9]. Recently, Chapon et al. pro-
posed for RMn2O5 systems that ferroelectricity results
from the exchange striction of acentric spin density waves
for the CM phases [9]. Indeed, Kimura et al. insisted
that CM modulations are indispensable to the ferroelec-
tricity in the LT-ICM phase, from their neutron scat-
tering results on HoMn2O5 under high magnetic fields
[11]. However, lattice distortions derived from ICM spin
structures turned out to describe well the spontaneous
polarizations of YMn2O5 even in the ICM phase [12],
implying that commensurability is not a necessary con-
dition for the ferroelectricity. In order to understand the
intriguing magnetoelectricity well, detailed information
on the lattice and spin structure changes is necessary.
However, only limited crystallographic data are available
and even any direct evidence on the symmetry lowering
has not been reported yet [9, 10, 11, 12, 13, 14].
In this letter, we present synchrotron x-ray scatter-
ing results on single crystals of TbMn2O5. Since x-ray
scattering is sensitive to both lattice and magnetic mod-
ulations, x-ray scattering with intense undulator x-rays
allowed simultaneous measurements for qm and qc. Non-
resonant x-ray scattering results show the relationship of
qc = 2qm, confirming lattice modulations are generated
by the magnetic orders. A (3 0 0) forbidden Bragg peak,
which is a direct evidence of the symmetry lowering to
a non-centrosymmetry space group, was observed in the
ferroelectric (FE) phases. Furthermore, the temperature
dependence of the peak intensity, I (300), was found to
coincide with those of the lattice modulation peak inten-
sities, I c, and the spontaneous polarization square,P
2, in
http://arxiv.org/abs/0704.0533v1
the CM phase. This indicates the ferroelectricity is gen-
erated by the lattice modulations. In the LT-ICM phase,
temperature dependences of I c cannot be described by a
single order parameter, implying the presence of differ-
ent magnetic orders. Resonant x-ray magnetic scattering
results at Mn L-, Tb L3- and M5-edges show that each
magnetic order has its own temperature dependence. It
is proposed that CM to LT-ICM phase transition is in-
duced by discommensuration through phase slipping due
to competing magnetic orders under the frustrated ge-
ometry. Moreover, the CM modulations with anti-phase
domain walls are consistent with the temperature depen-
dences of qm and I (300) in the LT-ICM phase, and explain
well the abrupt loss of P at the transition.
Single crystals of TbMn2O5 were grown by a flux
method [4]. The specimen used for the hard x-ray scat-
tering measurements has a plate-like shape with (1 1 0) as
a surface normal direction. Its mosaicity was measured
to be about 0.01◦ at (3 3 0) Bragg reflection. For soft
x-ray scattering, a different sample was cut and polished
to have (2 0 1) as a surface normal direction. Soft x-ray
scattering measurements were performed at 2A beamline
in the Pohang Light Source (PLS). Details of the soft
x-ray scattering chamber were described elsewhere [15].
X-ray diffraction experiments were conducted at the 3C2
bending magnet beamline in the PLS and at the 6-ID
undulator beamline in the Midwest Universities Collab-
orative Access Team (MUCAT) Sector in the Advanced
Photon Source. For non-resonant x-ray scattering exper-
iments, 6.45 keV was selected as an incident x-ray energy
below Mn K -edge (∼ 6.55 keV). All the incident x-rays
were σ-polarized and PG(006) was used to have a σ-to-π
channel at Tb L3-edge.
Nonresonant x-ray scattering measurements were per-
formed to investigate the temperature dependence of qm
and qc simultaneously. The measured lattice modula-
tion peak position of (2 5 -0.5) for the CM phase and
those of its 4 split peaks for the ICM phases are pre-
sented as solid and open circles, respectively, in Fig. 1
(a). For magnetic satellites, (2.5 5 -0.25) peak and its 2
split ones were measured for the CM and ICM phases.
Their positions are presented as solid and open squares,
respectively. The magnetic and lattice modulation satel-
lites for ICM phases are linked with broken and solid
lines to their corresponding main Bragg peaks. Temper-
ature dependences of qm and qc are shown in Fig. 1 (b)
and (c). From the results, it is obvious that relation,
qc = 2qm, holds within experimental errors in the whole
temperature range below TN . It is consistent with the
magnetic order induced lattice modulations. The tem-
perature dependence of qm shown here is qualitatively
similar to the neutron scattering results by others [16].
Below TN , ICM magnetic peaks develop, and qm locks
into a CM ordering at (1
) via a first order transi-
tion at T c1. On further cooling the sample below T c2,
the CM to LT-ICM phase transition takes place. With
FIG. 1: (Color online) Positions of the measured magnetic
satellites (square) and lattice modulation peaks (circle) in
the (h 5 l) reciprocal lattice plane are shown in (a). The
temperature dependences of qxm (square) and q
c (circle), and
those of qzm (square) and q
c (circle) are shown in (b) and (c),
respectively. For direct comparisons with those of qm, the
components of qc are divided by two. Vertical broken lines
indicate TN ∼ 41 K, T c1 ∼ 36 K, T c2 ∼ 24 K and T c3 ∼ 13
K, respectively.
further decreasing temperature, qm of the LT-ICM mod-
ulations evolves and is eventually pinned around (0.486 0
0.308) which can be approximated to a CM value of (17
) at T c3 ∼ 13 K. Such a long-period CM modulation
can be interpreted as the CM modulations (qm = (
)) with domain walls, as is the case for ErNi2B2C [17].
As shown in Fig. 2 (a), measurable x-ray intensi-
ties were observed, in the ferroelectric phase, at (3 0 0)
Bragg position which is forbidden under a space group of
the room temperature paraelectric phase, Pbam. Resid-
ual intensities above T c1 are due to higher harmonic
FIG. 2: (Color online) (a) Rocking curves of a (3 0 0) forbid-
den Bragg peak measured below (open) and above T c1 (solid).
(b) Temperature dependences of the integrated intensities of
a (3 0 0) Bragg peak (circle), CM lattice modulation peak
(square) and squared spontaneous polarization (broken line)
taken from Ref. 4. All the data are properly scaled.
contaminations. Values for full-width-at-half-maximum
(FWHM) of the peak are about 0.01◦, close to those of (4
0 0) main Bragg peak in the LT-ICM phase. The results
explicitly evidenced that inversion symmetry is broken
concomitantly with the FE phase as speculated before.
According to the models suggested by others [9, 10], dis-
placements of Mn3+ are in ab-plane. While b-axis com-
ponents of the atomic displacements mainly contribute to
P, a-axis components enable the emergence of I (300). If
the atomic displacements correspond to the periodic lat-
tice modulations, it is expected that both P2 and I (300)
are proportional to I c, as shown in Fig. 2 (b). (The spon-
taneous polarization data are taken from Ref. 4 and are
shifted in order to get the same values for T c1.) It con-
firms that spontaneous polarization is due to the atomic
displacements driven by magnetic orders: a direct crys-
tallographic evidence of exchange striction as the origin
of ferroelectricity in the material [8, 9, 10, 12]. Also it is
noted that I (300) drops abruptly at T c2 and has a broad
minimum around T c3.
Though many interesting ME phenomena have been
reported in the LT-ICM phases below T c2 [4, 11, 18, 19],
their basic mechanisms still remain to be understood.
Since the lattice modulations reflect basic ME natures,
temperature dependences below T c2 of integrated inten-
sities were measured at the four split ICM peak positions
illustrated in Fig. 1 (a). From the results displayed in
Fig. 3, it is clear that temperature dependences of all four
peaks cannot be described by a single order parameter,
implying the presence of various magnetic orders having
the same qm’s but different temperature dependences.
To investigate different magnetic orders, we performed
resonant x-ray magnetic scattering measurements at Mn
L-, Tb L3- and M5-edges. Figure 4 (a) shows energy pro-
files around Mn L-edge of magnetic satellites at 10 K and
FIG. 3: (Color online) Temperature dependences of the ICM
lattice modulation peak intensities.
x-ray absorption spectroscopy (XAS) at room tempera-
ture. Magnetic peaks and XAS data clearly show reso-
nances at both Mn L2- and L3-edges. XAS results show
broad peaks containing contributions from the multiplet
states of 3d electrons of Mn3+ and Mn4+ ions. Magnetic
satellites show relatively sharp double peaks at both Mn
L-edges. The sharp resonances represent different multi-
plet states of Mn 3d electrons including charge transfer
excitations, while Mn ions are expected to be in the high-
spin configurations with all the 3d electron spins aligned
FIG. 4: (a) Energy profiles of the ICM magnetic peaks (cir-
cle)and XAS (solid line) around Mn L2,3-edges. Vertical bro-
ken lines correspond to 640.8 eV and 644.2 eV, respectively.
(b) Temperature dependences of the ICM (circle) and the
CM (square) magnetic peaks. Open (Solid) symbols denote
the data taken E = 640.8 eV (644.2 eV), respectively. (c)
Temperature dependences of the ICM (open circle) and the
CM (solid square) magnetic peak at Tb L3-edge.
parallel. Therefore, although the resonances do not have
one-to-one correspondences with the magnetic orders of
Mn ions, changes in the resonances at magnetic satellites
reflect the changes in spin ordering which are periodi-
cally modulated with the wave vector qm. Temperature
dependences of x-ray intensities at the ICM peak of Qm
= (qxm 0 q
m) were measured at the two resonances, 640.8
and 644.2 eV. The results are presented in Fig. 4 (b).
Data for a CM peak of Qm = (0.5 0 0.25) at the reso-
nance of 644.2 eV are presented together. It is clear that,
above 15 K, intensities of each resonance have different
temperature dependences from each other. Though the
origin of the anomalous temperature dependences is not
understood in detail, it reflects complicated natures of
magnetic moments of Mn ions under the frustrated con-
figuration.
Magnetic ordering of Tb3+ ions was investigated with
resonant x-ray scattering measurements at Tb L3-edge.
Figure 4 (c) shows that ordering temperature of Tb mag-
netic moments is the same with that of Mn, TN , which is
consistent with neutron scattering results [9]. The modu-
lation wave vector of Tb magnetic order is the same with
the values of qm measured in nonresonant x-ray scatter-
ing. Soft x-ray magnetic scattering measurements were
also performed at Tb M 5-edge and the result not shown
here confirms that observed x-ray intensities in Fig. 4
(c) reflect magnetic order of Tb 4f electrons which grows
monotonically below TN .
From the results shown in Fig. 4 (b) and (c), it is clear
that there exist multiple magnetic order parameters hav-
ing the same qm’s but different temperature dependences.
The contributing portions of each magnetic order to scat-
tering factors of magnetic satellites are different depend-
ing on Qm(= QBragg + qm), and it results in different
temperature dependence for each magnetic peak and its
corresponding lattice modulation peak intensities, which
explains the temperature dependences presented in Fig.
Since the magnetic orders are located under the spin
frustrated geometry, it is reasonable to suppose that
phase-slips take place due to competitions between the
magnetic orders, as their order parameters grow with
different temperature dependences. The discommensu-
ration results in the transition to the LT-ICM phase.
Anti-phase domain walls for the phase slips are consis-
tent with the aforementioned long-period CM modula-
tions below T c3. Assuming the model suggested by oth-
ers [9], atomic displacements are canted antiferroelectric
type. Across an anti-phase domain wall, directions of
the atomic displacements and the spontaneous polariza-
tions are reversed. Therefore, not only the polarizations
from domains separated by the domain wall cancel each
other but also x-ray scattering amplitudes for the (3 0
0) Bragg peak are canceled due to the crystal symme-
try. Then, only remnants resulting from unequal popu-
lations of the domains contribute to P and I(300). Since
a density of the domain walls determines qm, tempera-
ture dependences of P, I(300) and qm down to T c3 can be
explained consistently in terms of CM modulations with
the anti-phase domain walls. This indicates that CM
modulations are preferred as its low temperature ground
state. Then, the low temperature phase seems to have
a higher entropy due to the domain walls than the high
temperature CM phase, violating the entropy rule. How-
ever, due to the geometrical frustration and the presence
of multiple magnetic orders many different energy scales
can exist. The complicated temperature dependences of
the magnetic orders in Fig. 4(c) reflect the presence of
the different energy scales. Smaller energy scales become
important at low temperatures and induce discommen-
suration. Upturns of the electrical polarization and I(300)
below T c3 are attributed to lattice modulations enhanced
by increasing Tb magnetic moments, which is consistent
with results of others demonstrating couplings between
Tb moments and lattices [18, 19, 20].
In summary, we have shown that exchange striction
is the driving mechanism for the magnetoelectricity in
the material. The same temperature dependences of x-
ray intensities at a (3 0 0) forbidden Bragg peak and a
lattice modulation peak in the CM FE phase, together
with observation of the relation, qc = 2qm, demonstrate
that spontaneous electric polarization is due to atomic
displacements driven by the exchange striction of mag-
netic orders. Resonant x-ray magnetic scattering results
confirm the presence of multiple magnetic orders hav-
ing different temperature dependences. The CM to LT-
ICM phase transition is attributed to discommensura-
tion through phase slipping in the competing magnetic
orders in the frustrated configurations. Temperature de-
pendences of qm, P and I(300) in the LT-ICM phase are
explained in terms of the CM modulations with anti-
phase domain walls.
We thank D.J. Huang for the useful discussions. This
work was supported by the KOSEF through the eSSC
at POSTECH, and by MOHRE through BK-21 pro-
gram. The experiments at the PLS were supported by
the POSTECH Foundation and MOST. Use of the Ad-
vanced Photon Source (APS) was supported by the U.S.
Department of Energy, Office of Science, Office of Ba-
sic Energy Sciences, under Contract No. W-31-109-Eng-
38. The Midwest Universities Collaborative Access Team
(MUCAT) sector at the APS is supported by the U.S.
Department of Energy, Office of Science, Office of Basic
Energy Sciences, through the Ames Laboratory under
contract No. DE-AC02-07CH11358. Work at Rutgers
was supported by NSF-DMR-0520471.
∗ Electronic address: kibong@postech.ac.kr
† Present address: European Synchrotron Radiation Facil-
mailto:kibong@postech.ac.kr
ity, BP 220, F-38043 Grenoble Cedex 9, France
[1] M. Kenzelmann et al., Phys. Rev. Lett. 95, 87206 (2005).
[2] G. Lawes et al., Phys. Rev. Lett. 95, 087205 (2005).
[3] S. Kobayashi et al., J. Korean Phys. Soc. 46, 289 (2005).
[4] N. Hur et al., Nature 429, 392 (2004).
[5] T. Kimura et al., Nature 426, 55 (2003).
[6] T. Goto et al., Phys. Rev. Lett. 92, 257201 (2004).
[7] T. Kimura et al., Phys. Rev. B 68, 060403(R) (2003).
[8] S.-W. Cheong et al., Nature Mater. 6, 13 (2007).
[9] L. C. Chapon et al., Phys. Rev. Lett. 93 177402, (2004).
G.R. Blake et al., Phys. Rev. B. 71, 214402 (2005).
[10] I. Kagomiya et al., Ferroelectrics 286, 167 (2003).
[11] H. Kimura et al., J. Phys. Soc. Jpn. 75, 113701 (2006).
[12] L. C. Chapon et al., Phys. Rev. Lett. 96, 097601 (2006).
[13] V. Polyakov et al., Physica B 297, 208 (2001).
[14] D. Higashiyama et al., Phys. Rev. B 70, 174405 (2004).
[15] J.-S. Lee, Ph. D. Thesis, POSTECH (2006).
[16] S. Kobayashi et al., J. Phys. Soc. Jpn. 73, 3439 (2004).
[17] H. Kawano-Furukawa et al., Phys. Rev. B. 65, 180508
(2002).
[18] S. Y. Haam et al., Ferroelectrics 336, 153 (2006).
[19] S.-H. Baek et al., Phys. Rev. B 74 140410(R) (2006).
[20] R. Valdés Aguilar et al., Phys. Rev. B 74, 184404 (2006).
ABSTRACT
  Comprehensive x-ray scattering studies, including resonant scattering at Mn
L-edge, Tb L- and M-edges, were performed on single crystals of TbMn2O5. X-ray
intensities were observed at a forbidden Bragg position in the ferroelectric
phases, in addition to the lattice and the magnetic modulation peaks.
Temperature dependences of their intensities and the relation between the
modulation wave vectors provide direct evidences of exchange striction induced
ferroelectricity. Resonant x-ray scattering results demonstrate the presence of
multiple magnetic orders by exhibiting their different temperature dependences.
The commensurate-to-incommensurate phase transition around 24 K is attributed
to discommensuration through phase slipping of the magnetic orders in spin
frustrated geometries. We proposed that the low temperature incommensurate
phase consists of the commensurate magnetic domains separated by anti-phase
domain walls which reduce spontaneous polarizations abruptly at the transition.

<|endoftext|><|startoftext|>
Introduction to Their Statistical Mechanics
(World Scientific, London, 2005).
[7] R. Vogelsang and C. Hoheisel, Phys. Rev. A 38, 6296
(1988); H.P. van den Berg and C. Hoheisel, Phys. Rev.
A 42, 2090 (1990).
[8] K.W. Kehr, K. Binder, and S.M. Reulein, Phys. Rev. B
39, 4891 (1989).
[9] W. Hess, G. Nägele, and A.Z. Akcasu, J. Polymer Sc. B
28, 2233 (1990).
[10] J. Trullàs and J.A. Padrò, Phys. Rev. E 50, 1162 (1994).
[11] A. Baumketner and Ya. Chushak, J. Phys.: Condens.
Matter 11, 1397 (1999).
[12] J.–F. Wax and N. Jakse, Phys. Rev. B 75, 024204 (2007).
[13] L.S. Darken, Trans. AIME 180, 430 (1949).
[14] M. Fuchs and A. Latz, Physica A 201, 1 (1993).
[15] M. Asta, D. Morgan, J.J. Hoyt, B. Sadigh, J.D. Althoff,
D. de Fontaine, and S.M. Foiles, Phys. Rev. B 59, 14271
(1999).
[16] F. Faupel, W. Frank, M.–P. Macht, H. Mehrer, V. Naun-
dorf, K. Rätzke, H.R. Schober, S.K. Sharma, and H. Te-
ichler, Rev. Mod. Phys. 75, 237 (2003).
[17] Y. Mishin, M. J. Mehl, and D. A. Papaconstantopoulos,
Phys. Rev. B 65, 224114 (2002).
[18] S. K. Das, J. Horbach, M. M. Koza, S. Mavila Chatoth,
and A. Meyer, Appl. Phys. Lett. 86, 011918 (2005).
[19] G.L. Batalin, E.A. Beloborodova, and V.G. Kazimirov,
Thermodynamics and the Constitution of Liquid Al Based
Alloys (Metallurgy, Moscow, 1983).
[20] G.D. Ayushina, E.S. Levin, and P.V. Geld, Russ. J. Phys.
Chem. 43, 2756 (1969).
[21] M. Maret, T. Pomme, A. Pasturel, and P. Chieux, Phys.
Rev. B 42, 1598 (1990).
[22] S. Sadeddine, J.F. Wax, B. Grosdidier, J.G. Gasser, C.
Regnaut, and J.M. Dubois, Phys. Chem. Liq. 28, 221
(1994).
[23] M. Asta, V. Ozolins, J.J. Hoyt, and M. van Schilfgaarde,
Phys. Rev. B 64, 020201(R) (2001).
[24] V.S. Sudovtseva, A.V. Shuvalov, N.O. Sharchina, Ras-
plavy No. 4, 97 (1990); U.K. Stolz, I. Arpshoven, F.
Sommer, and B. Predel, J. Phase Equilib. 14, 473 (1993);
K.V. Grigorovitch and A.S. Krylov, Thermochim. Acta
314, 255 (1998).
[25] S.K. Das, J. Horbach, and K. Binder, J. Chem. Phys.
119, 1547 (2003); S.K. Das, J. Horbach, K. Binder, M.E.
Fisher, J.V. Sengers, J. Chem. Phys. 125, 024506 (2006);
S.K. Das, M. E. Fisher, J.V. Sengers, J. Horbach, K.
Binder, Phys. Rev. Lett. 97, 025702 (2006).
[26] F.O. Raineri and H.L. Friedman, J. Chem. Phys. 91,
5633 (1989); F.O. Raineri and H.L. Friedman, J. Chem.
Phys. 91, 5642 (1989).
[27] A.B. Bhatia and D.E. Thornton, Phys. Rev. B 52, 3004
(1970).
[28] D. Holland–Moritz, O. Heinen, R. Bellissent, T. Schenk,
and D.M. Herlach, Int. J. Mat. Res. 97, 948 (2006).
[29] J.R. Manning, Phys Rev. 124, 470 (1961).
[30] M. P. Allen, D. Brown, and A. J. Masters, Phys. Rev.
E 49, 2488 (1994); M. P. Allen, Phys. Rev. E 50, 3277
(1994).
[31] A. Griesche, F. Garcia Moreno, M.P. Macht, and G.
Frohberg, Mat. Sc. Forum 508, 567 (2006).
[32] A. Griesche, M.P. Macht, J.P. Garandet, and G. Froh-
berg, J Non–Cryst. Solids 336, 173 (2004).
[33] A. Griesche, M.P. Macht, and G. Frohberg, unpublished.
[34] J.P. Garandet, C. Barrat, and T. Duffar, Int. J. Heat
Mass Transfer 38, 2169 (1995).
[35] C. Barrat and J.P. Garandet, Int. J. Heat Mass Transfer
39, 2177 (1996).
[36] A. Meyer, Phys. Rev. B 66, 134205 (2002).
[37] S. Mavila Chathoth, A. Meyer, M. M. Koza, and F. Yu-
ranji, Appl. Phys. Lett. 85, 4881 (2004).
[38] D. P. Landau and K. Binder, A Guide to Monte Carlo
Simulations in Statistical Physics (Cambridge University
Press, Cambridge, 2000).
[39] A.F. Voter and S.P. Chen, in Characterization of Defects
in Materials, edited by R.W. Siegel et al., MRS Sym-
posia Proceedings No. 82 (Materials Research Society,
Pittsburgh, 1978), p. 175.
[40] S.M. Foiles and M.S. Daw, J. Mat. Res. 2, 5 (1987).
[41] S.K. Das, J. Horbach, and K. Binder, unpublished.
	Introduction
	Self–diffusion and interdiffusion: Basic theory
	Experimental Methods
	Long–capillary technique
	Neutron scattering experiments
	Details of the simulation
	Results
	Conclusion
	Acknowledgments
	References
ABSTRACT
  A combination of experimental techniques and molecular dynamics (MD) computer
simulation is used to investigate the diffusion dynamics in Al80Ni20 melts.
Experimentally, the self-diffusion coefficient of Ni is measured by the
long-capillary (LC) method and by quasielastic neutron scattering. The LC
method yields also the interdiffusion coefficient. Whereas the experiments were
done in the normal liquid state, the simulations provided the determination of
both self-diffusion and interdiffusion constants in the undercooled regime as
well. The simulation results show good agreement with the experimental data. In
the temperature range 3000 K >= T >= 715 K, the interdiffusion coefficient is
larger than the self-diffusion constants. Furthermore the simulation shows that
this difference becomes larger in the undercooled regime. This result can be
refered to a relatively strong temperature dependence of the thermodynamic
factor \Phi, which describes the thermodynamic driving force for
interdiffusion. The simulations also indicate that the Darken equation is a
good approximation, even in the undercooled regime. This implies that dynamic
cross correlations play a minor role for the temperature range under
consideration.

<|endoftext|><|startoftext|>
Introduction
1.1. The Multimodality of Globular Cluster Systems
One of the most significant developments in the study of extragalactic globular cluster
systems (GCSs) was the discovery of bimodality in their color distributions (see Ashman & Zepf
1998; Harris 2001; West et al. 2004 and references therein). Today, we generally refer to glob-
ular clusters (GCs) belonging to the blue peak of the color distribution as metal-poor GCs
and to the red-peak members as the metal-rich sub-population. It is generally considered
that the presence of multiple modes implies multiple distinct GC formation epochs and/or
mechanisms and ties those directly into formation scenarios that have to describe the par-
allel assembly histories of GCSs and the diffuse stellar populations in their host galaxies. In
massive early-type galaxies the current GCS assembly paradigms view the origin of the two
color peaks from the perspective of either episodic star-cluster formation bursts triggered by
gas-rich galaxy mergers (e.g. Ashman & Zepf 1992), temporarily interrupted cluster forma-
tion (so-called in-situ formation, e.g. Forbes et al. 1997; Harris et al. 1998), and star-cluster
accretion as a result of the hierarchical assembly of galaxies (e.g. Côté et al. 1998).
While the majority of GCSs in early-type galaxies show clearly bimodal color distri-
butions, the general picture is much more complex, ranging from purely blue to purely
red color distributions (e.g. Gebhardt & Kissler-Patig 1999; Kundu & Whitmore 2001a,b;
Larsen et al. 2001; Peng et al. 2006). This complexity is exacerbated by the fact that color
– 3 –
bimodality is a function of galaxy mass and morphology, as less massive and later-type
galaxies tend to have single-mode blue (i.e. metal-poor) GC populations (e.g. Lotz et al.
2004; Sharina et al. 2005; Peng et al. 2006). Furthermore, color bimodality is also a func-
tion of galactocentric distance and is mainly due to the more extended spatial distribution
of the metal-poor sub-population relative to metal-rich clusters (e.g. Harris & Harris 2002;
Rhode & Zepf 2004; Dirsch et al. 2003, 2005).
1.2. Numerical Models of Globular Cluster System Formation
The aspect of GCS formation and assembly entered recently the domain of numerical
simulations of galaxy formation due to the increasing spatial resolution of these computa-
tions. For instance, Li et al. (2004) model GC formation by identifying absorbing sink parti-
cles in their smoothed particle hydrodynamics (SPH) high-resolution simulation of isolated
gaseous disks and their mergers. They find a bimodal globular-cluster metallicity distribution
in their merger remnant under the assumption of a particular age-metallicity relation. A key
finding of their merger simulation is a more concentrated spatial distribution of metal-rich
GCs with respect to the metal-poor sub-population in good agreement with observations.
Since their models of isolated galaxies produce a smooth age distribution (implying a smooth
metallicity and color distribution), Li et al. conclude that mergers are required to produce
a bimodal metallicity (i.e. color) distribution.
In a more detailed adaptive-grid cosmological simulation, Kravtsov & Gnedin (2005)
followed the formation of a star-cluster system during the early evolution of a Milky Way-size
disk galaxy to redshift z=3. Their model could reproduce the extended spatial distribution of
metal-poor halo globular clusters as observed in M31 and the Milky Way. However, because
their simulation does not follow the later evolution at z < 3 it is unclear whether it would
produce a metallicity bimodality and any significant age-metallicity relation.
An alternative, more statistical approach to modeling GCS assembly is to directly link
the mode of GC formation to the star-formation rate in semi-analytic models. Beasley et al.
(2002) were the first to explore this path by assuming that metal-poor GCs form in gaseous
proto-galactic disks while metal-rich GCs are created during gaseous merger events. Their
study showed that the observed globular-cluster color bimodality can only be reproduced by
artificially stopping the formation of metal-poor GCs at redshifts z & 5. By construction,
no spatial information on metal-rich and/or metal-poor GCs is provided in these models.
– 4 –
1.3. A Spatially Resolved Chemical Evolution Model for Spheroid Galaxies
Recently, Pipino & Matteucci (2004, hereafter PM04) presented a spatially resolved
chemical evolution model for the formation of spheroids, which successfully reproduces a large
number of photo-chemical properties that could be inferred from either the optical or from
the X-ray spectra of the light coming from ellipticals. The model includes an initial gas infall
and a subsequent galactic wind; it takes into account detailed nucleosynthesis prescriptions
of both type-II and Ia supernovae as well as low and intermediate-mass stars. It has been
extensively tested against the main photo-chemical properties of nearby ellipticals, including
the observed increase of the α-enhancement in their stellar populations with galaxy mass (e.g.
Worthey et al. 1992; Weiss et al. 1995). This is at variance with standard models based on
the hierarchical merging paradigm, which do not reproduce this trend (Thomas et al. 2002).
Since the PM04 model provides full radial information on the composite nature of
stellar populations that make up elliptical galaxies, the observation of different GC sub-
populations is, therefore, a new sanity check for the validity of this model. Moreover, we
recall that PM04 and, more recently, Pipino et al. (2006, hereafter PMC06) suggested that
elliptical galaxies should form outside-in, namely the outermost regions form faster as well
as develop an earlier galactic wind with respect to the central parts (see also Martinelli et al.
1998). This mechanism implies that the stars in massive spheroids form a Composite Stellar
Population (CSP), whose chemical properties, in particular their metallicity distribution,
changes with galactocentric distance.
Starting with the assumption that GC sub-populations trace the components of CSPs,
we will show how the observed multi-modality in GCSs can be ascribed to the radial vari-
ation in the underlying stellar populations. In particular, the observed GCSs are a linear
combination of GC sub-populations inhabiting a given projected galactocentric radius.
The paper is organized as follows: in Section 2 we briefly describe the adopted theo-
retical model; in Section 3 we compare the predictions with observations and discuss the
implications, while Section 4 presents the final conclusions.
2. The model
2.1. The Chemical Evolution Code
The chemical evolution code for elliptical galaxies adopted here is described in PM04,
where we refer the reader for more details. In this work, we present the results for a
galaxy with Mlum ∼ 10
11M⊙, taken from PM04’s Model IIb. This model is characterized
– 5 –
by a Salpeter (1955) IMF, Thielemann et al. (1996) yields for massive stars, Nomoto et al.
(1997) yields for Type-Ia SNe, and van den Hoek & Groenewegen (1997) yields for low- and
intermediate-mass stars.
An important feature of the PM04 model is its multi-zone nature, namely the model
galaxy is divided into several non-interacting spherical shells of radius ri, which facilitate a
detailed study of the radial variation of the photo-chemical properties of the GCS and its
host galaxy. In each zone i, the equations for the chemical evolution of 21 chemical elements
are solved (see PM04, Matteucci 2001).
The model assumes that the galaxy assembles by merging of gaseous lumps on short
timescales. The chemical composition of the lumps is assumed to be primordial. In fact, our
model assumes that the accretion of primordial gas from the surroundings1 is more efficient
in more massive systems, given their higher cross section per unit mass (see PM04). The
model galaxy suffers a strong starburst which injects a large amount of energy into the
interstellar medium, able to trigger a galactic wind, occurring at different times at different
radii, mainly due to the radial variation of the potential well, which is shallower in the
galactic outskirts. After the onset of wind activity the star formation is assumed to stop
and the galaxy evolves passively with continuous mass loss. In order to correctly evaluate
the amount of energy driving the wind, a detailed treatment of stellar feedback is included
in the code (that takes into account the stellar lifetimes). In particular, the energy restored
to the interstellar medium by both Type-Ia and Type-II supernovae has been calculated
in a self-consistent manner according to the time of explosion of each supernova and the
characteristics of the ambient medium (see PM04 for details). The potential well that keeps
the gas bound to the galaxy is assumed to be dominated by a diffuse and massive halo of
Dark Matter surrounding the galaxy.
In the following we adopt the standard star formation rate ψ∗(t, ri) = ν · ρgas(ri, t)
before the onset of the galactic wind (tgw), where ρgas is the gas density, ν the star-formation
efficiency; otherwise we assume that ψ∗(t > tgw, ri) = 0. We recall here that the adopted
star-formation efficiency is ν = 10Gyr−1, while the infall timescale is τ = 0.4Gyr in the
galactic core and τ = 0.01Gyr at one effective radius (of the diffuse light, Reff), respectively.
These values were chosen by PM04 in order to reproduce the majority of the chemical and
photometric properties of ellipticals such as: the 〈[Mg/Fe]〉−σ (e.g. Faber et al. 1992), the
Color-Magnitude (e.g. Bower et al. 1992), the Mass-Metallicity relation (e.g. Gallazzi et al.
2005) as well as the observed gradients in metallicity (e.g. Carollo et al. 1993), 〈[Mg/Fe]〉
1Since we lack a cosmological framework we cannot further specify the properties of the infalling primordial
– 6 –
(e.g. Mendez et al. 2005), and color (e.g. Peletier et al. 1990). Pipino et al. (2005) recently
extended this model to explain also the properties of hot X-ray emitting halos surrounding
more massive spheroids.
2.2. Globular Cluster Formation
The formation rate of GCs, ψGC , in the i-th shell is assumed to be directly linked to its
star formation rate ψ∗(t, ri, Zi) via a suitable function of time t, radius ri, and metallicity
Z, which represents some scaling law between star formation rate ψ∗ and the star cluster
formation ψGC and can be regarded as a GC formation efficiency. A similar relation be-
tween the average star formation rate per surface area and the star cluster formation was
recently found by Larsen & Richtler (2000) to hold in nearby spiral galaxies. In addition,
the efficiency of cluster formation in massive ellipticals appears to be constant, where the
mass ratio between the mass in star clusters and the baryons locked in field stars+gas is
ǫGC≈ 0.25% (McLaughlin 1999). Here we extend this surface density relation to 3-D space.
Moreover, PMC06 showed that at a given galactocentric radius model galaxies are made
of a CSP, namely a mixture of several simple stellar populations (SSPs) each with a single age
and chemical composition. The CSP reflects the chemical enrichment history of the entire
system, weighted by the star formation rate. We define the stellar metallicity distribution
Υ∗ as the distribution of stars belonging to a given CSP as a function of [Z/H].
We can then write the globular-cluster metallicity distribution ΥGC at a given radius ri
and time t as:
ΥGC(t, ri, Z) = f(t, ri, Z) ·Υ∗(t, ri, Z) , (1)
where f includes all the information pertaining to the connection between ψGC and ψ∗.
It is not trivial, and beyond the scope of the paper, to find an explicit definition for
f(t, ri, Zi), which basically carries the information on the internal physics of gas clouds where
GCs are expected to form. In the following we will show that, for a few and sensible choices
of f , the observed multi-modality in the color distribution of globular clusters may be driven
by the radial variations in the stellar population mix of ellipticals. We will first adopt a
constant function f (see Sec. 3.1) and then allow f to mildly vary with Z (see Sec. 3.2). No
absolute values for f will be given, since our formalism deals with normalized distributions.
In particular, the total ΥGC summed over all radial shells can be written as:
ΥGC,tot(t, Z) =
f(t, ri, Z) ·Υ∗(t, ri, Z) . (2)
– 7 –
Similar equations hold for other GC distributions as a function of either [Mg/Fe] or [Fe/H].
At this stage it is useful to recall that Υ∗(t, ri, Z) can be represented in two following
ways: i) as the fraction of mass of a CSP which is locked in stars at any given metallicity
(Pagel & Patchett 1975; Matteucci 2001). In the following we refer to this stellar metal-
licity distribution as Υ∗,m (MSMD: mass-weighted stellar metallicity distribution); ii) as a
fraction of luminosity of the CSP in each metallicity bin. This definition is closer to the
measurement as it can be directly compared to the luminosity-weighted mean Υ∗,l (LSMD:
luminosity-weighted stellar metallicity distribution) at a given radius (see Arimoto & Yoshii
1987; Gibson 1996). This classification is important since PMC06 showed that the Υ∗,m and
Υ∗,l might differ, especially at large radii, even for old stellar populations. The advantage
of GCSs, for which accurate ages are known, is that they directly probe the mass-weighted
distributions.
At this point it is useful to recall that our the adopted chemical evolution model divides
the galaxy in several non-interacting shells. In each shell the time at which the galactic wind
occurs is self-consistently evaluated from the local condition. In particular, we follow Mar-
tinelli et al. (1998) suggestion that gradients can arise as a consequence of a more prolonged
SF, and thus stronger chemical enrichment, in the inner zones. In the galactic core, in fact,
the potential well is deeper and the supernovae (SNe) driven wind develops later relative to
the most external regions. This particular formation scenario leaves a characteristic imprint
on the shape of both Υ∗,m and Υ∗,l and here we give some general considerations. In partic-
ular, we can explain the slow rise in the low metallicity tail of the distributions as the effect
of the initially infalling gas, whereas the onset of the galactic wind sets the maximum metal-
licity of the Υ∗,m and Υ∗,l. In general, the suggested outside-in formation process reflects in
a more asymmetric stellar metallicity distribution at larger radii, where the galactic wind
occurs earlier (i.e. closer to the peak of the star formation rate), with respect to the galactic
center. The qualitative agreement between these model predictions and the observed stellar
metallicity distributions derived at different radii by Harris & Harris (2002, see their Fig. 18)
for the stars in the elliptical galaxy NGC 5128 is remarkable. If confirmed from observations
in other ellipticals, the expected sharp truncation of Υ∗,m at large radii might be the first
direct evidence of a sudden and strong wind which stopped the star formation earlier in the
galactic outskirts (see PMC06 and Pipino, D’Ercole, & Matteucci, in preparation).
– 8 –
3. Results and discussion
3.1. The Multi-Modality of Globular Cluster Systems in Elliptical Galaxies
The general presence of multi-modal GCSs implies that their host galaxies did not form
in a single, isolated monolithic event, but experienced spatially and/or temporally separated
star-formation bursts. In the recent past, both semi-analytic and hydrodynamic simula-
tions of galaxy formation attempted to follow the process of GC formation (Beasley et al.
2002; Kravtsov & Gnedin 2005), but neither could produce a clearly bimodal MDF in their
simulated GCSs.
In this section we will show how to obtain a bimodal metallicity distribution function for
GCs ΥGC,tot starting from single-mode stellar metallicity distribution functions Υ∗(t, ri, Z)
(commonly known as G-dwarf-like diagrams) for the CSP inhabiting different radii of a
prototypical elliptical galaxy according to Equation 2.
3.1.1. The Comparison Sample
As stressed in the introduction, the multi-modality in GCSs varies as a function of host
galaxy properties (e.g. mass, morphological type, etc.). Here we try to match the distri-
butions resulting from the recent compilation of spectroscopic data by Puzia et al. (2006,
hereafter P06), which samples the typical bimodal color distribution of GCs in nearby galax-
ies (see also Puzia et al. 2004, 2005). This is illustrated in Figure 1, where we plot the
(V−I)0 color distribution of the P06 sample of GCs in elliptical galaxies (top panel) together
with those of GCs in NGC 4472 and the Milky Way (middle and bottom panel). NGC 4472
is the most luminous elliptical in the Virgo galaxy cluster and hosts a GC system with a
prototypical color bimodality (e.g. Puzia et al. 1999). To allow direct comparison with the
P06 sample, we use GCs in NGC 4472 that are brighter than V ≃ 22.5 since the P06 sample
includes only the brightest GCs in nearby early-type galaxies in order to maximize the S/N
of their spectra. The resulting color distribution is remarkably similar to the one of the P06
sample, which assures that the P06 sample includes a representative sampling of the GC
color bimodality in massive elliptical galaxies.
However, the comparison with the Milky Way GCs shows that the P06 sample covers
few of the most metal-poor GCs. Therefore, the bimodality which we refer to in the fol-
lowing may not be the same as the one observed in spirals or in some elliptical galaxies,
where a substantial population of metal-poor clusters with [Z/H] . −1.5 is present (e.g.
Gebhardt & Kissler-Patig 1999).
– 9 –
We do not include the dynamical evolution of GCs in our model, since we are considering
only the brightest (most massive, i.e. > 105.5M⊙, see also Puzia et al. 2004, A&A 415, 123)
clusters in nearby galaxies as comparison sample. The comparison sample includes GCs much
brighter than the typical turnover magnitude of the globular cluster luminosity function and
we, therefore, do not expect significant differential dynamical evolution for these massive
systems (see Gnedin & Ostriker 1997, for details). In fact, it has been shown (e.g. Fall
& Zhang, 2001) that the timescales for both evaporation by two body relaxation and tidal
stripping of star clusters is longer than a Hubble time for GCs more massive than ∼ 105.5M⊙.
In our model, the number of clusters formed in a star-formation burst of a given strength
is adjusted to match the observations. Hence, the absolute scaling of GC numbers is arbi-
trary, i.e., within physical limitations of the star formation rate any number of GCs can be
reproduced by adjusting the function f in Equation 2.
If, however, metal-poor and metal-rich GCs are on systematically different orbits and
experience significantly different dynamical evolutions the effect of tidal disruption might
be slowly changing relative GC numbers with time. Another complication is the variation
of the initial star-cluster mass function, in particular as a function of metallicity. Modeling
these effects requires detailed knowledge of the orbital characteristics and chemo-dynamical
processes that lead to star cluster formation, and goes far beyond the scope of this work.
We keep these potential systematics in mind, but expect negligible impact on our analysis.
3.1.2. A Simple Model
In order to make a first-order comparison between our model predictions and the ob-
served ΥGC,tot at t=13 Gyr, we first focus on the simple case in which:
ΥGC,tot(Z) = fred ·Υ∗(t = 13Gyr, r1, Z) +
fblue ·Υ∗(t = 13Gyr, r2, Z) , (3)
with fred , fblue = const and r1 = 0.1Reff , r2 ≥ 1Reff . The first term (red) corresponds
to a population typical of the galaxy core (well inside r < 1Reff). The second term (blue)
represents Υ∗ in the outer regions. In order to take into account the different amounts of
stars formed in each galactic region, we point out that the stellar metallicity distributions
entering Equation 2 were not normalized.
As a first step, we take several values for the weights fred , fblue in order to mimic
different mixtures of the two GC populations. In particular, we used the relative numbers of
– 10 –
the red (here identified as the metal-rich core population) and the blue globular clusters (the
halo metal-poor population) as a function of galactocentric radius for the elliptical galaxy
NGC 1399 (Dirsch et al. 2003). Our particular choice is driven by observationally motivated
values for the weights, although we realize that NGC 1399 is a quite peculiar, massive cD
elliptical and might not be representative of less massive
systems. In the context of this first step, the weights might reflect the effects of the
projection on the sky of a three-dimensional structure. However, we show below that the
results do not strongly depend on the weights. Therefore they might be interpreted as mean
values and could be changed if one decides to model a particular galaxy, with a different
ellipticity, inclination, and luminosity profile.
In Figure 2 we show the globular-cluster metallicity distribution ΥGC by mass (compu-
tation based on Υ∗,m) in two radial bins for three particular choices of weights. In particular
in the following we will use the ratios fred = 0.77,fblue = 0.23, fred = 0.60,fblue = 0.40 and
fred= fblue = 0.50 in order to define the theoretical innermost, intermediate and outermost
sub-sample of the GCS, respectively. These ΥGC will be compared against subsamples of
the P06 data, obtained by selecting GCs with either r < 1Reff (in the case of the innermost
population) or r ≥ 1Reff (for the intermediate and the outemost cases, respectively), unless
otherwise stated.
3.1.3. Globular Cluster Metallicity Distribution
In order to plot the different cases on the same scale we normalize each ΥGC by its
maximum value. In the left panel of Figure 2 the shaded histogram represents the innermost
population. Our predictions match the data very well, especially in the metal-rich slope
and the mean of the distribution. The same happens for the pure core populations, which
shows how the GCS might be used to probe the CSP in ellipticals. It should be remarked
that a second peak centered at super-solar metallicity appears in the distribution predicted
by our models, although not evident in the data of the particular radial sub-sample. The
right panel of Figure 2 illustrates model predictions which are more representative of the
galaxy as a whole (either at 1Reff , i.e. the intermediate population, or at several effective
radii, the outermost population), and we consider them as the fiducial case. These two
cases look quite similar to each other and have clear signs of bimodality in remarkable
agreement with the spectroscopic data (solid empty histogram, sub-sample of the P06 data
with r ≥ Reff). A Kolmogorov-Smirnov test returns > 99% probability that both model
predictions and observations are drawn from the same parent distribution in the left panel of
Figure 2. The right panel statistics gives a lower likelihood of 98.4% that both distributions
– 11 –
have the same origin, which is mainly due to the observed excess of metal-poor GCs at
large galactocentric radii compared to the model predictions. The prediction of a super-
solar metallicity globular cluster sub-population is entirely new and a result of the radially
varying and violent formation of the parent galaxy. Moving to the low-metallicity tail, we
predict slightly fewer low metallicity objects than expected from observations. But we recall
the systematics mentioned in Section 3.1.1.
In Figure 3 (left panel) we show the results for a pure core GCs, namely one in which we
adopt fred :fblue = 1:0. In this quite extreme case the observed GCs have been selected with
radius r < 0.5Reff . The histogram reflects the shape of a G-dwarf-like diagram expected for
a typical CSP inhabiting the galactic core. This finding is particularly important, because it
might offer the opportunity to resolve the SSPs in ellipticals, at variance with data coming
from the integrated spectra which deal with luminosity-weighted quantities. Whereas in
Figure 3 (right panel), the intermediate population is compared to a sub-sample of P06 GCs
with 0.5 < r < 0.5Reff . This is to show that the multimodality is not an artifact due to the
particular radial binning adopted in this paper.
Figure 4 shows the V -band luminosity-weighted ΥGC for which the computation is based
on Υ∗,l. This metallicity distribution has been obtained by converting the mass in each [Z/H]
bin of the previous figure into LV , by means of theM/LV ratio computed by Maraston (2005)
as a function of [Z/H] for 13 Gyr old SSPs. Due to the well-known increase of the M/LV
in the high metallicity tail of the distribution2, we notice in Figure 4 that now the second
peak has a smaller intensity in all the cases. The corresponding diffuse-light population goes
undetected in integrated-light studies. In any case, the conclusions reached by analyzing
Figure 2 are not significantly altered. We conclude that our analysis is not significantly
biased by some metallicity effect which may alter the shape of the observed ΥGC,tot by
luminosity. We stress the power of GCSs in disentangling stellar sub-populations in massive
ellipticals, due to their nature as simple stellar populations that can be directly compared
to SSP model predictions, unlike diffuse light measurements.
Even with this simple parametrization, where f in Equation 3 does not depend on
metallicity, we suggest that the bimodality for the metal-rich GCs is the result of different
shapes of Υ∗,m (and the Υ∗,l) at different galactocentric radii.
2See PMC06 for a comparison between G-dwarf like diagrams for Υ∗,m and Υ∗,l predicted for the same
– 12 –
3.1.4. Globular Cluster [Mg/Fe] Distributions
Finally, in Figure 5 we show the [Mg/Fe] distributions for GCs divided in radial bins as in
Figure 2. According to PMC06, these [Mg/Fe] distributions are narrower, more symmetric,
and exhibit a smaller radial variation with respect to the [Z/H] distributions. In any case, a
small degree of bimodality is still present. We point out the impressive agreement with the
spectroscopic observations by P06. There is a rather large discrepancy between data and
models at the high-[Mg/Fe] end. Hence, the corresponding Kolmogorov-Smirnov likelihood
tests return a probaility of 1% (for inner sample) and 95% (for the outer ones). If we limit
the model predictions to [Mg/Fe] < 0.8 dex, the agreement slightly improves, reaching a 10%
probablity in the inner region. However, the inner field data still does not reach the extreme
[Mg/Fe] values of the models. In fact, due to the monotonic decrease of the [Mg/Fe] as a
function of either metallicity or time (see PM04), the lack of low-metallicity GCs, evident
from Figure 2, translates into a lack of α-enhanced clusters.
The [Mg/Fe] bimodality of our model predictions and the match with the spectroscopic
measurements strongly imply that globular clusters in massive elliptical galaxies form on two
different timescales. Their chemical compositions are consistent with an early mode with a
duration of ∆t . 100 Myr and a normal formation that lasted for ∆t . 500 Myr. In fact,
according to the time-delay model (see Matteucci, 2001) and given the typical star formation
history of our model ellipticals, the [Mg/Fe] ratio in the gas - out of which the GCs form -
is quickly and continuously decreasing with time. We predict that the [Mg/Fe] ratio can be
higher than 0.65 dex (i.e. in the bins in which our predictions exhibit a deficit of GCs with
respect to the observed distribution) only in the first ∼ 100 Myr (see also Fig.3 in P06 and
related discussion). In fact, such a high value for the [Mg/Fe] can be attained only if very
massive type II SNe contribute to the chemical evolution, without any contribution from
either lower-mass type II or type Ia SNe. The normal formation, instead, is the one already
plotted in Fig. 5 and forms on a typical timescale of 0.5 − 0.7 Gyr. More quantitatively,
our initial theoretical GCMD predicts that only ∼ 4% of the GCS forms at [Mg/Fe] larger
than 0.65 dex. In order to improve the agreement with observations we require that the
above fraction should be increased to ∼ 12− 15%. Since star and globular cluster formation
are expected to be closely linked (e.g. Chandar et al. 2006) the same must be true for the
diffuse stellar population of such galaxies. We therefore foresee the presence of a similar
[Mg/Fe] bimodality in the diffuse light of massive elliptical galaxies. Unfortunately, there
are not direct observations confirming our suggestions, until the metallicity distributions for
the diffuse stellar component in ellipticals will become available for a number of galaxies.
Indeed, the detection of bimodality in the [Mg/Fe]-distribution might be a benchmark test
for our predictions.
– 13 –
We will still refer to multiple GC sub-populations. However, their differences ought
to be ascribed only to the fact that they are created during an extended (and intense) star
formation event during which the variation in chemical evolution is not negligible. The
radial differences originate from the fact that the galactic wind epoch is tightly linked to the
potential, occurring later in the innermost regions (e.g. Carollo et al. 1993; Martinelli et al.
1998). Moreover, PM04 and PMC06 found that also the infall timescale is linked to the
galactocentric radius. In particular, it lasts longer in the more internal regions, owing to the
continuous gas flows in the center of the galactic potential well. In particular, we recall that
in our model the core experiences a longer (∼ 0.7 Gyr) star-formation history with respect
to the outskirts where the typical star-formation timescale is ∼0.2 Gyr.
According to our models the metal-rich population of GCs in massive elliptical galaxies
may consist of multiple sub-populations which basically play the same role as the CSPs
populating each galactocentric shell in our framework of the global galaxy evolution. At the
same time, we point out the lack of our models to produce a significant fraction of metal-
poor GCs similar to the halo GC population in the Milky Way, with the caveat that the star
formation histories are very different. This, in turn, suggests that GCSs in giant elliptical
galaxies were assembled by accretion of a significant number of metal-poor GCs.
3.2. Metallicity Dependent Globular Cluster Formation
In this section we explore the approach outlined in Equation 2, by introducing the effect
of metallicity in the function f. In particular, we start by assuming that
f(t, ri, [Z/H]<−1)
f(t, ri, [Z/H]>−1)
= 2 , (4)
roughly following what was found for the GCS of the most nearby giant elliptical galaxy
NGC 5128 (Centaurus A) by Harris & Harris (2002, hereafter HH02) for their “inner field”,
regardless of the radius of the i-th shell. Since the final distributions are normalized, the
actual zeropoint of the function f is not relevant. Our model predictions are plotted in
Figure 6. We notice a modest increase of the low-metalliticy tail of the distribution with
respect to the simple picture sketched in Section 3.1 without metallicity dependence, as
well as a lower fraction of globular clusters populating the high-metallicity peak. Including
the metallicity dependence leads to an ambiguous change in agreement with the general
observed trend. Hence, no firm conclusions on the real need for a metallicity dependence
can be drawn. Similar results are obtained in the more realistic case in which f is a linearly
decreasing function of [Z/H].
– 14 –
At variance with HH02, we chose to adopt the same scaling irrespective of galacto-
centric radius, for the following reason. Despite the fact that the HH02 stellar metallicity
distributions Υ∗ as functions of [Z/H] confirm both the shape and the radial behavior of
our model predictions for the mass-weighted stellar metallicity distribution Υ∗,m (compare
their Fig. 7 with PMC06 Fig. 4), care should be taken when comparing their results for
Υ∗ as a function of [Fe/H]. The latter, in fact, had been obtained by assuming a particular
trend in the [α/Fe] as a function of galactocentric radius which disagrees with the results
of our detailed chemical evolution model. In particular, we find an offset of at least 0.2
dex in the sense that [Fe/H] HH02 ∼ 0.2+ [Fe/H] PM04 at a given metallicity ([Z/H]). This
disagreement becomes larger either at very low metallicity or at larger galactocentric radii,
where we expect a stronger α-enhancement. Once the PM04 value for [Fe/H] is adopted in
Fig. 18 of HH02, we find that: i) for the inner halo, the stellar Υ∗,m should be shifted by
∼ 0.2 dex toward lower metallicities, removing any metallicity effect, and ii) for the outer
halo the discrepancy between the stellar Υ∗,m and the ΥGC,tot should be reduced.
Nevertheless, we believe that some decrease with time of the function f could be mo-
tivated by theoretical arguments. In fact, recent work (e.g. Elmegreen & Efremov 1997;
Elmegreen 2004) shows that GCs of all ages preferentially form in turbulent high-pressure
regions. If we interpret the decrease in the efficiency of star formation (inside the gas clouds
that form GCs), as a function of the ambient pressure (Elmegreen & Efremov 1997) as a
proxy for the temporal behaviour of our function f, we find again a reduction of a factor
∼2−3 from the early high-pressure epochs to a late, more quiescent evolutionary phase.
3.2.1. The Ratio of Metal-poor to Metal-rich Globular Clusters
For our fiducial model we predict a ratio of metal-poor (namely with [Z/H] ≤ −1) to
metal-rich GCs of ∼0.2. Previous photometric surveys found that the typical value for GC
systems in elliptical galaxies is close to unity (Gebhardt & Kissler-Patig 1999; Kundu & Whitmore
2001a). Provided a linear color-metallicity transformation (see also Section 3.5), a possible
explanation for the discrepancy between our models and the observations might be obtained
by boosting the metal-poor population by a factor of f(t, ri, [Z/H]<−1)/f(t, ri, [Z/H]>−1) ≥
Another way to solve the discrepancy is to assume that all the missing globular clusters
have been accreted from the surroundings, e.g. from dwarf satellites (e.g. Côté et al. 1998).
We estimate the amount of the accreted metal-poor GCs, needed to achieve a ratio close to
1, as a factor of ∼4 of the number of globular clusters initially formed inside the galaxy.
– 15 –
3.3. The Role of the Host Galaxy Mass
A natural consequence of the scenario depicted in Sections 3.1 and 3.2 is that, at a
given galactocentric radius, the mean metallicity and [α/Fe] ratios of a GCS coincide with
the mass-weighted [〈Z/H〉∗] and [〈α/Fe〉∗] of the underlying stellar population, because the
GC quantities are calculated either from Υ∗,m or the Υ∗,l (see Eqs. 1 and 2 of PMC06),
unless the scaling function f is allowed to strongly vary with time. We expect this to happen
at least in the innermost GC sub-populations, in which the effects of the accretion of GCs
from the environment can be reasonably neglected. In particular, PM04 predict that more
massive galaxies should show higher [〈α/Fe〉∗] and [〈Z/H〉∗]. If accretion plays a negligible
role, we expect the same correlations for the total GC population with host galaxy mass for
the most massive systems, in agreement with current observations (e.g. van den Bergh 1975;
Brodie & Huchra 1991; Peng et al. 2006).
In fact, if we perform the same study of the above sections for a 1012M⊙ galaxy (see
Table 2 of PM04 for its properties), both peaks in ΥGC,tot shift their positions by about 0.2
dex to higher [Z/H]. This is in good agreement with the results of (Peng et al. 2006, see their
Figure 13 and 14). This trend holds for smaller objects as well. If we apply the procedure
to a 1010M⊙ galaxy (Model IIb of PM04), we find that the metal-rich peak shifts towards a
lower metallicity by 0.3 dex (with respect to our fiducial model with Mlum = 10
11M⊙), while
the other peak is now centered around [Z/H] = −0.8 dex. In particular, we find a faster
decrease in the mean metalliticy of the metal-poor GCs than for the metal-rich ones, again
in agreement with the Peng et al. results.
Interestingly, the ratio of metal-poor to metal-rich cluster increases up to ∼ 0.5 for
less massive halos. We recall that in the PM04 scenario, the low-mass galaxies are those
forming on a longer timescales and with a slower infall rate. Therefore, we suggest that
the combination of these factors is likely to at least partly explain the change of the GC
distributions in different galaxy morphologies. This is especially the case in dwarf galaxies,
where star formation is slow and still on-going, together with the fact that the probability
for a substantial change in the pressure of the interstellar medium relative to its initial values
is higher than in ellipticals, thus implying a much stronger variation of f with time.
3.4. Merger-Induced Globular Cluster Formation
It has been suggested that GC populations are produced during major merger events
which would lead to present-day ellipticals and their rich GCSs (e.g. Schweizer 1987; Ashman & Zepf
1992). Subsequent studies (e.g. Forbes et al. 1997; Kissler-Patig et al. 1998b) challenged this
– 16 –
view by pointing out the much higher SN and more metal-rich GCSs in early-type galaxies
compared to those of spiral and irregular galaxies, which are thought to represent the early
building blocks of massive ellipticals.
In the following, we study the impact of the merger hypothesis on the predictions of
our simulations. In order to do that, we extended the procedure sketched in the previous
sections to the merger models presented by Pipino & Matteucci (2006, hereafter PM06). In
this paper, the effects of late gas accretion episodes and subsequent merger-induced starbursts
on the photo-chemical evolution of elliptical galaxies have been studied and compared to the
picture of galaxy formation emerging from PM04; in particular the PM04 best model is
taken here as a reference. By means of the comparison with the colour-magnitude relations
and the [〈Mg/Fe〉V ]-σ relation observed in ellipticals (e.g. Renzini 2006), PM06 showed that
either bursts involving a gas mass comparable to the mass already transformed into stars
during the first episode of star formation and occurring at any redshift (major mergers), or
bursts occurring at low redshift (i.e. z ≤ 0.2) and with a large range of accreted mass (minor
mergers), are ruled out. The reason lies in the fact that the chemical abundances in the ISM
after the galactic wind (and before the occurrence of the merger) are dominated by Type
Ia SN explosions, which continuously enrich the gas with their ejecta (mainly Fe). When
the merger-induced starburst occurs, most stars form out of this enriched gas (thus, e.g.,
lowering the total [〈Mg/Fe〉]); at the same time, we expect the metallicities of GCs formed
out of this gas to be on average higher and their [Mg/Fe] ratios to be lower than those of
the bulk of stars and GCs formed in the initial starburst.
In this work we present the case in which the galaxy accretes a gas mass Macc = Mlum
at tacc = 2 Gyr (i.e. ∼1 Gyr after the onset of the galactic wind). We make this choice for
several reasons: i This model is quite similar to the PM06 models b and g, which were among
those in good agreement with observations of the diffuse galaxy light properties. ii) The
formation epoch of the bulk of these second generation GCs cannot occur &2 Gyr later than
tgw, because the majority of GCs in the most massive elliptical galaxies studied today appear
old within the age resolution of current photometric (∆t/t ≈ 0.4−0.5) and spectroscopic
studies (∆t/t ≈ 0.2−0.3). Finally, the composition of the newly accreted gas is assumed to
be primordial (see PM06 for a detailed discussion), but we remark that we reach roughly the
same conclusion in the case of solar composition, in order to mimic some pre-enrichment for
the newly accreted gas. We point out that, lacking dynamics, PM06 presented their results
for one-zone models. Therefore, in this section we are considering Equation 2 limited to only
one shell. In this way we can check whether the single merger hypothesis alone is enough to
produce some bimodality in the total globular-cluster metallicity distribution ΥGC,tot, and if
it is consistent with the predictions based on our fiducial model described in Section 3.2.
– 17 –
We show our results in Figure 7 and 8. We notice a clear change in the overall shape of
the metallicity distribution ΥGC,tot with respect to the cases shown in the previous sections, in
the sense that now ΥGC,tot is narrower and dominated by objects with super-solar metallicity
(and sub-solar [Mg/Fe] ratios) with a dominant population at [Z/H] ≈ 0.1, which is not
prominent in the observations of P06. The high-metallicity globular cluster populations
dominate the metallicity distribution which is at variance with both the results from previous
sections and the observations.
Our merger model does not include the accretion of globular clusters that were already
formed within the accreted satellite galaxies. The inclusion of this effect could remedy the
match between models and observations at low metallicities, as the typical GC in a dwarf
galaxy is metal-poor (e.g. Lotz et al. 2004; Sharina et al. 2005) and their addition to the
initial GC population would enhance the total number of metal-poor GCs and improve the
fit to the data. However, these clusters need to be α-enhanced to match the observations.
The impact of GC accretion on our post-merger model predictions will be studied in detail
in a future paper. Here, we remark that the time at which the purely gaseous subsequent
merger event can occur (which does not import already formed GCs), is limited by the
onset of the galactic wind, after which the type-Ia SNe dominates the nucleosynthesis, and
needs to be completed at tmrg . 1 − 2 Gyr after the first starburst. However, this time
constraint implies that such merger events would overlap with the initial starburst and be
mostly indistinguishable from each other. Such a scenario closely resembles the Searle-Zinn
scenario (Searle & Zinn 1978), in which galaxy halos are formed from the agglomeration of
gaseous protogalactic fragments. Later merger events are excluded in our models, as they
would produce GCs with sub-solar [Mg/Fe] ratios which is at variance with the observations.
Note also that the fraction of GCs at [Z/H]< −1 can be recovered in our models only
if the cluster formation at low metallicity is enhanced (e.g. using a value of 10 instead of 2
in eq.4). However, even in the case in which we adopt some f(Z) strongly declining with
total metallicity, which may alter the shape of ΥGC,tot enhancing the low-metallicity tail and
thus improving the agreement with observations, the position of both super-solar metallicity
peaks will not change, remaining at variance with the data.
3.5. Other Mechanisms responsible for Multimodality
The picture emerging from our analysis is far from being the general solution to ex-
plaining the complexity of GC color distributions, and it suggests only a scheme in which
multiple mechanisms could be at work together, either broadening or adding features to the
observed distributions.
– 18 –
For instance, Yoon et al. (2006) suggested that the color bimodality could arise from the
presence of hot horizontal-branch stars (so far not accounted for in SSP model predictions)
that results in a non-linear color-metallicity transformation producing two color peaks from
an originally single-peak metallicity distribution. We tested this scenario on our fiducial
model GC metallicity distribution, by applying to each SSP the following transformation
from [Fe/H] to the (g − z) color:
(g − z) = α + β [Fe/H] + γ [Fe/H]
+δ [Fe/H]3 + ǫ [Fe/H]
The numerical values of the coefficients are given in Table 1 and the relation was adopted from
Yoon et al. (2006) and is consistent with the best-fit relation presented in their Figure 1b.
We show our results in Figure 9.
Since we start from symmetric metallicity distributions, the non-linear transformation
seems to work and produce a color bimodality for the CSP inhabiting the < 10Reff shell
(Figure 9, solid line), although the bimodality is slightly exaggerated compared to real data
(see Figure 1). In fact, a look at the color distribution which we obtained for the sole
0.1Reff shell reveals that it still has one peak and is roughly symmetric (Figure 9, dashed
line). Obviously, since the (g − z)− [Fe/H] relationship is meant to explain the GCs color
bimodality without invoking any other effect, we did not combine the two histograms, either
according to Eq.2 or to Eq. 3 in our models, as we are comparing metallicity distributions
to spectroscopic measurements.
It is of great importance to investigate this transformation with large and homoge-
neous data sets that cover a wide enough metallicity range to allow a robust analysis of
the non-linear inflection point in the color-metallicity transformation. However, as a re-
sult of Figure 9, we point out that the color bimodality typically found for GCSs in massive
early-type galaxies might be only partly due to a non-linear color-metallicity transformation.
Another effect put forward by, e.g.,Recchi & Danziger (2005) is the claim that GCs
might have undergone a self-enrichment phase at the early stages of their formation, and
Table 1. Numerical values of coefficients used in Equation 6.
coefficient numerical value
α 1.5033
β 0.172774
γ −0.623522
δ −0.453331
ǫ −0.089038
– 19 –
therefore some GCs could have experienced a boost in metallicity which would be not rep-
resentative of the metallicity of their parent gas cloud. Finally, as already mentioned above
in Section 3.4, some GCs residing in the outermost regions of the galaxies (e.g. Lee et al.
2006) could have experienced entirely different chemical enrichment histories at the time
of their formation and later been added to a more massive system through accretion (e.g.
Côté et al. 1998). The inclusion of these effects goes far beyond the scope of this work, but
we remind the reader that all the aforementioned effects might influence the interpretation
of any globular cluster color and metallicity distribution.
4. Conclusions
By means of the comparison between PM04’s best model predictions for the radial
changes in the CSP chemical properties and the recent spectroscopic data on the metallicity
distributions of extragalactic GCSs from Puzia et al. (2006), we are able to derive some
conclusions on the GC metallicity distributions in massive elliptical galaxies. In particular,
we focused on the main drivers of the multi-modality that is observed in the majority of
GCSs in massive elliptical galaxies. Our main conclusions are:
• We show that the observed multi-modality in the GC metallicity distributions can be
ascribed to the radial variation in the underlying stellar populations in giant elliptical
galaxies. In particular, the observed GCSs are consistent with a linear combination of
the GC sub-populations inhabiting different galactocentric radii projected on the sky.
• A new prediction of our models, which is in astonishing agreement with the spectro-
scopic observations, is the presence of a super-solar metallicity mode that seems to
emerge in the most massive elliptical galaxies. In smaller objects, instead, this mode
disappears quickly with decreasing stellar mass of the host galaxy.
• Our models successfully reproduce the observed [Mg/Fe] bimodality in GCSs of mas-
sive elliptical galaxies. This, in turn, suggests a bimodality in formation timescales
during the early formation epochs of GCs in massive galaxy halos. The two modes are
consistent with an early (initial) and later (triggered) formation mode.
• Since the GC populations trace the properties of galactic CSPs in our scenario, we
predict an increase of the mean metallicity of the cluster system with the host galaxy
mass, which closely follows the mass-metallicity relation for ellipticals. Moreover, we
expect that a major fraction of the GCs (i.e. those born inside the galaxy) follows an
age-metalliticity relationship, in the sense that the older ones are also more α-enhanced
and more metal-poor.
– 20 –
• The role of host galaxy metallicity in shaping the observed GC metallicity distribution
is non-negligible, although its effects have been estimated to change the function f ≃
ψGC/ψ∗ by a factor of ∼ 2 − 5, in order to match the sample of Puzia et al. (2006).
Either a non-linear color-metallicity transformation, or a stronger metallicity effect,
and/or accretion of GCs from the surrounding environment is needed to explain a
ratio of metal-poor to metal-rich GCs close to unity, as reported for ellipticals based
on results from photometric surveys.
• Merger models which include the later accretion of primordial and/or solar-metallicity
gas predict a shape for the GC metallicity distribution which is at variance with the
spectroscopic observations.
We thank the referee for a careful reading of the paper. A.P. warmly thanks S.Recchi
for useful discussions. A.P. acknowledges support by the Italian Ministry for University
under the COFIN03 prot. 2003028039 scheme. T.H.P. acknowledges support by NASA
through grants GO-10129 and GO-10515 from the Space Telescope Science Institute, which
is operated by AURA, Inc., under NASA Contract NAS5-26555, and the support in form of
a Plaskett Research Fellowship at the Herzberg Institute of Astrophysics.
REFERENCES
Arimoto, N., & Yoshii, Y. 1987, A&A, 173, 23
Ashman, K. M., & Zepf, S. E. 1992, ApJ, 384, 50
Ashman, K. M. & Zepf, S. E. 1998, Globular Cluster Systems, Cambridge University Press
Beasley, M. A., Baugh, C. M., Forbes, D. A., Sharples, R. M., & Frenk, C. S. 2002, MNRAS, 333,
Bower, R. G., Lucey, J. R., & Ellis, R. S. 1992, MNRAS, 254, 601
Bressan, A., Chiosi, C., Fagotto, F. 1994, ApJs, 94, 63
Brodie, J.P., & Huchra, J.P., 1991, ApJ, 379, 157
Brodie, J.P., & Strader, J., 2006, astro-ph/0602601
Carollo, C.M., Danziger, I.J., & Buson, L. 1993, MNRAS, 265, 553
Chandar, R., Fall, S. M., & Whitmore, B. C. 2006, ApJ, 650, L111
http://arxiv.org/abs/astro-ph/0602601
– 21 –
Côté, P., Marzke, R. O., & West, M. J. 1998, ApJ, 501, 554
Dirsch, B., Richtler, T., Geisler, D., Forte, J.C., Bassino, L.P., & Gieren, W.P., 2003, ApJ, 125,
Dirsch, B., Schuberth, Y., & Richtler, T. 2005, A&A, 433, 43
Elmegreen, B. C. 2004, ASP Conf. Ser. 322: The Formation and Evolution of Massive Young Star
Clusters, 322, 277
Elmegreen, B. G., & Efremov, Y. N. 1997, ApJ, 480, 235
Elmegreen, B. G., & Scalo, J. 2004, ARA&A, 42, 211
Faber, S.M., Worthey, G., & Gonzalez, J.J. 1992, in IAU Symp. n.149, eds. B. Barbuy & A. Renzini,
p. 255
Fall, S.M., & Zhang, Q. 2001, ApJ, 561, 751
Forbes, D. A., Brodie, J. P., & Grillmair, C. J. 1997, AJ, 113, 1652
Gallazzi, A., Charlot, S., Brinchmann, J., White, S. D. M., & Tremonti, C. A. 2005, MNRAS, 362,
Gebhardt, K., & Kissler-Patig, M. 1999, AJ, 118, 1526
Gibson, B.K., 1996, MNRAS, 278, 829
Gnedin, O. Y., & Ostriker, J. P. 1997, ApJ, 474, 223
Greggio, L., 1997, MNRAS, 285, 151
Harris, W. E. 1991, ARA&A, 29, 543
Harris, W. E. 1996, AJ, 112, 1487
Harris, W. E. 2001, in Saas-Fee Advanced School on Star Clusters, ed. L. Labhardt & B. Binggeli
(course 28), Springer, New York
Harris, W. E., Harris, G. L. H., & McLaughlin, D. E. 1998, AJ, 115, 1801
Harris, W. E., & Harris, G. L. H. 2002, AJ, 123, 3108
Holtzman, J. A., et al. 1992, AJ, 103, 691
Kissler-Patig, M., Brodie, J. P., Schroder, L. L., Forbes, D. A., Grillmair, C. J., & Huchra, J. P.
1998a, AJ, 115, 105
– 22 –
Kissler-Patig, M., Forbes, D. A., & Minniti, D. 1998b, MNRAS, 298, 1123
Kravtsov, A. V., & Gnedin, O. Y. 2005, ApJ, 623, 650
Kundu, A., & Whitmore, B. C. 2001a, AJ, 121, 2950
Kundu, A., & Whitmore, B. C. 2001b, AJ, 122, 1251
Larsen, S. S., Brodie, J. P., Huchra, J. P., Forbes, D. A., & Grillmair, C. J. 2001, AJ, 121, 2974
Larsen, S. S., & Richtler, T. 2000, A&A, 354, 836
Lee, J.-W., López-Morales, M., & Carney, B. W. 2006, ApJ, 646, L119
Li, Y., Mac Low, M.-M., & Klessen, R. S. 2004, ApJ, 614, L29
Lotz, J. M., Miller, B. W., & Ferguson, H. C. 2004, ApJ, 613, 262
Lutz, D. 1991, A&A, 245, 31
Maraston, C. 2005, MNRAS, 362, 799
Martinelli, A., Matteucci, F., Colafrancesco, S., 1998, MNRAS, 298, 42
Matteucci, F. 2001, The chemical evolution of the Galaxy, Kluwer Academic Publishers, Dordrecht
McLaughlin, D. E. 1999, AJ, 117, 2398
Mendez, R.H., Thomas, D., Saglia, R.P., Maraston, C., Kudritzki, R.P., & Bender, R., 2005, ApJ,
627, 767
Nomoto, K., Hashimoto, M., Tsujimoto, T., Thielemann, F.K., Kishimoto, N., Kubo, Y., Nakasato,
N., 1997, Nuclear Physics A, A621, 467
Osterbrock, D. E., & Ferland, G. J. 2006, Astrophysics of gaseous nebulae and active galactic
nuclei, 2nd. ed. by D.E. Osterbrock and G.J. Ferland. Sausalito, CA: University Science
Books, 2006,
Pagel, B. E. J., & Patchett, B.E. 1975, MNRAS, 172, 13
Peletier, R.F., Davies, R.L., Illingworth, G.D., Davis, L.E., Cawson, M. 1990, AJ, 100, 1091
Peng, E. W., et al. 2006, ApJ, 639, 95
Pipino, A., Matteucci, F. 2004, MNRAS, 347, 968 (PM04)
Pipino, A., Kawata, D., Gibson, B. K., & Matteucci, F. 2005, A&A, 434, 553
– 23 –
Pipino, A., Matteucci, F., & Chiappini, C. 2006, ApJ, 638, 739 (PMC06)
Pipino, A., & Matteucci, F. 2006, MNRAS, 365, 1114 (PM06)
Puzia, T. H., Kissler-Patig, M., Brodie, J. P., & Huchra, J. P. 1999, AJ, 118, 2734
Puzia, T. H., et al. 2004, A&A, 415, 123
Puzia, T. H., Kissler-Patig, M., Thomas, D., Maraston, C., Saglia, R. P., Bender, R., Goudfrooij,
P., & Hempel, M. 2005, A&A, 439, 997
Puzia, T. H., Kissler-Patig, M., & Goudfrooij, P. 2006, ApJ, 648, 383, (P06)
Recchi, S., & Danziger, I. J. 2005, A&A, 436, 145
Renzini, A. 2006, ARA&A, 44, 141
Rhode, K. L., & Zepf, S. E. 2004, AJ, 127, 302
Salpeter, E. E. 1955, ApJ, 121, 161
Schweizer, F. 1987, Nearly Normal Galaxies. From the Planck Time to the Present, 18
Searle, L., & Zinn, R. 1978, ApJ, 225, 357
Sharina, M. E., Puzia, T. H., & Makarov, D. I. 2005, A&A, 442, 85
Thielemann, F. K., Nomoto, K., Hashimoto, M. 1996, ApJ, 460, 408
Thomas, D., Maraston, C., & Bender, R., 2002, Ap&SS, 281, 371
Tinsley, B.M., 1980, ApJ, 241, 41
van den Hoek, L.B., Groenewegen, M.A.T. 1997, A&AS, 123, 305
van den Bergh, 1975, ARA&A, 13, 217
Weiss, A., Peletier, R.F., Matteucci, F. 1995, A&A, 296, 73
West, M. J., Côté, P., Marzke, R. O., & Jordán, A. 2004, Nature, 427, 31
Worthey, G., Faber, S.M., & Gonzalez, J.J. 1992, ApJ, 398, 69
Yoon, S.-J., Yi, S. K., & Lee, Y.-W. 2006, Science, 311, 1129
Zepf, S. E., & Ashman, K. M. 1993, MNRAS, 264, 611
This preprint was prepared with the AAS LATEX macros v5.2.
– 24 –
Fig. 1.— Color distributions of GCs in nearby, massive elliptical galaxies (Puzia et al. 2006,
top panel), in NGC 4472 (Puzia et al. 1999, middle panel), and the Milky Way, taken from the
February 2003 update of the McMaster catalog (Harris 1996, bottom panel). In order to allow
a robust comparison between the P06 and NGC 4472 sample, only GCs in NGC 4472 with
with luminosities brighter than V ≃ 22.5 mag are shown. The solid lines are Epanechnikov-
kernel probability density estimates with their bootstrapped 90% confidence limits.
– 25 –
Fig. 2.— Predicted globular-cluster metallicity distribution ΥGC,tot by mass as a function
of [Z/H] for three different radial compositions (i.e. fred/fblue). The left panel shows both
model predictions and observations related to the central part of an elliptical galaxy. The
right panel shows the same quantities for cluster populations residing at r ≥ Reff . Solid empty
histograms: observational data taken as sub-samples of the P06 compilation, according to
the galactic regions presented in each panel.
– 26 –
Fig. 3.— Predicted globular-cluster metallicity distribution ΥGC,tot by mass as a function
of [Z/H] for two different projected galactocentric radii. The left panel shows both model
predictions and observations related to the pure core of an elliptical galaxy (namely fred :
fblue = 1 : 0). The right panel shows the same quantities for cluster populations residing
either at 0.5Reff < r < 1.5Reff . Solid empty histograms: observational data taken as sub-
samples of the P06 compilation, according to the galactic regions presented in each panel.
– 27 –
Fig. 4.— Predicted globular-cluster metallicity distribution ΥGC,tot by luminosity at three
different projected galactocentric radii. Solid: innermost region; dotted: average galactic
(intermediate population); dashed: outermost part.
– 28 –
Fig. 5.— Shaded histogram: Predicted distribution of globular-cluster [Mg/Fe] values at
two different projected galactocentric radii (innermost and outermost regions). Solid empty
histogram: observational data taken as sub-samples of the P06 compilation, according to
the galactic regions presented in each panel (see text).
– 29 –
Fig. 6.— Predicted globular-cluster metallicity distribution ΥGC,tot by mass as a function
of [Z/H] for three different radial compositions (i.e. different fred/fblue). In this case, the
function f has an explicit dependence on [Z/H] (see text). The left panel shows both model
predictions and observations related to the central part of an elliptical galaxy. The right
panel shows the same quantities for cluster populations residing at r ≥ Reff . Solid empty
histograms: observational data taken as sub-samples of the P06 compilation, according to
the galactic regions presented in each panel.
– 30 –
Fig. 7.— Shaded histogram: predicted total GC metallicity distribution ΥGC,tot by mass
for the < 1Reff shell, for a case in which a second episode of star formation, induced by
a gaseous merger, is taken into account (see text). Solid histogram: observations from
Puzia et al. (2006), their entire sample.
– 31 –
Fig. 8.— Shaded histogram: predicted total GC [Mg/Fe] distribution by mass for the < 1Reff
shell, for a case in which a second episode of star formation, induced by a gaseous merger,
is taken into account (see text). Solid histogram: observations by Puzia et al. (2006), their
entire sample.
– 32 –
Fig. 9.— Predicted globular-cluster metallicity distribution by mass as a function of the
(g−z) colour at two different projected galactocentric radii. Dashed: galactic core. Solid:
galactic halo out to 10 Reff .
	Introduction
	The Multimodality of Globular Cluster Systems
	Numerical Models of Globular Cluster System Formation
	A Spatially Resolved Chemical Evolution Model for Spheroid Galaxies
	The model
	The Chemical Evolution Code
	Globular Cluster Formation
	Results and discussion
	The Multi-Modality of Globular Cluster Systems in Elliptical Galaxies
	The Comparison Sample
	A Simple Model
	Globular Cluster Metallicity Distribution
	Globular Cluster [Mg/Fe] Distributions
	Metallicity Dependent Globular Cluster Formation
	The Ratio of Metal-poor to Metal-rich Globular Clusters
	The Role of the Host Galaxy Mass
	Merger-Induced Globular Cluster Formation
	Other Mechanisms responsible for Multimodality
	Conclusions
ABSTRACT
  The most massive elliptical galaxies show a prominent multi-modality in their
globular cluster system color distributions. Understanding the mechanisms which
lead to multiple globular cluster sub-populations is essential for a complete
picture of massive galaxy formation. By assuming that globular cluster
formation traces the total star formation and taking into account the radial
variations in the composite stellar populations predicted by the Pipino &
Matteucci (2004) multi-zone photo-chemical evolution code, we compute the
distribution of globular cluster properties as a function of galactocentric
radius. We compare our results to the spectroscopic measurements of globular
clusters in nearby early-type galaxies by Puzia et al. (2006) and show that the
observed multi-modality in globular cluster systems of massive ellipticals can
be, at least partly, ascribed to the radial variation in the mix of stellar
populations. Our model predicts the presence of a super-metal-rich population
of globular clusters in the most massive elliptical galaxies, which is in very
good agreement with the spectroscopic observations. Furthermore, we investigate
the impact of other non-linear mechanisms that shape the metallicity
distribution of globular cluster systems, in particular the role of
merger-induced globular cluster formation and a non-linear color-metallicity
transformation, and discuss their influence in the context of our model
(abridged)

<|endoftext|><|startoftext|>
Introduction
	GLAST studies of accreting binaries
	X-ray jets
	Large scale jet-ISM interaction
	Gamma-ray spectral states and major ejections
	The observational status: gamma-ray binaries
	The view from space
	The view from the ground
	Gamma-ray binaries
	Gamma-ray binaries as compact pulsar wind nebulae
	GLAST studies of rotation-powered binaries
	Gamma-ray orbital modulation
	Probing pulsar winds
	Population studies of gamma-ray binaries
ABSTRACT
  Radio and X-ray observations of the relativistic jets of microquasars show
evidence for the acceleration of particles to very high energies. Signatures of
non-thermal processes occurring closer in to the compact object can also be
found. In addition, three binaries are now established emitters of high (> 100
MeV) and/or very high (> 100GeV) energy gamma-rays. High-energy emission can
originate from a microquasar jet (accretion-powered) or from a shocked pulsar
wind (rotation-powered). I discuss the impact GLAST will have in the very near
future on studies of such binaries. GLAST is expected to shed new light on the
link between accretion and ejection in microquasars and to enable to probe
pulsar winds on small scales in rotation-powered binaries.

<|endoftext|><|startoftext|>
Introduction
1.1 The main questions and results
In this paper, every surface will be complex, rational, algebraic and smooth, and
except for C2, will also be projective. By an automorphism of a surface we mean
a biregular algebraic morphism from the surface to itself. The group of automor-
phisms (respectively of birational transformations) of a surface S will be denoted
by Aut(S) (respectively by Bir(S)).
The group Bir(P2) is classically called the Cremona group. Taking some sur-
face S, any birational map S 99K P2 conjugates Bir(S) to Bir(P2); any subgroup
of Bir(S) may therefore be viewed as a subgroup of the Cremona group, up to
conjugacy.
The minimal surfaces are P2, P1×P1 and the Hirzebruch surfaces Fn for n ≥ 2;
their groups of automorphisms are a classical object of study, and their structures
are well known (see for example [Bea1]). These groups are in fact the maximal
connected algebraic subgroups of the Cremona group (see [Mu-Um], [Um]).
Given some group acting birationally on a surface, we would like to determine
some geometric properties that allow us to decide whether the group is conjugate to
a group of automorphisms of a minimal surface, or equivalently to decide whether
http://arxiv.org/abs/0704.0537v2
it belongs to a maximal connected algebraic subgroup of the Cremona group. This
conjugation looks like a linearisation, as we will see below, and explains our title.
We observe that the set of points of a minimal surface which are fixed by a
non-trivial automorphism is the union of a finite number of points and rational
curves. Given a group G of birational transformations of a surface, the following
properties are thus related (note that for us the genus is the geometric genus, so
that a curve has positive genus if and only if it is not rational); property (F ) is
our candidate for the geometric property for which we require:
(F ) No non-trivial element of G fixes (pointwise) a curve of positive genus.
(M) The group G is birationally conjugate to a group of automorphisms of
a minimal surface.
The fact that a curve of positive genus is not collapsed by a birational trans-
formation of surfaces implies that property (F ) is a conjugacy invariant; it is
clear that the same is true of property (M). The above discussion implies that
(M) ⇒ (F ); we would like to prove the converse.
The implication (F ) ⇒ (M) is true for finite cyclic groups of prime order
(see [Be-Bl]). The present article describes precisely the case of finite Abelian
groups. We prove that (F ) ⇒ (M) is true for finite cyclic groups of any order,
and that we may restrict the minimal surfaces to P2 or P1 × P1. In the case of
finite Abelian groups, there exists, up to conjugation, only one counterexample
to the implication, which is represented by a group isomorphic to Z/2Z × Z/4Z
acting biregularly on a special conic bundle. Precisely, we will prove the following
results, announced without proof as Theorems 4.4 and 4.5 in [Bla3]:
Theorem 1. Let G be a finite cyclic subgroup of order n of the Cremona group.
The following conditions are equivalent:
• If g ∈ G, g 6= 1, then g does not fix a curve of positive genus.
• G is birationally conjugate to a subgroup of Aut(P2).
• G is birationally conjugate to a subgroup of Aut(P1 × P1).
• G is birationally conjugate to the group of automorphisms of P2 generated
by (x : y : z) 7→ (x : y : e2iπ/nz).
Theorem 2. Let G be a finite Abelian subgroup of the Cremona group. The
following conditions are equivalent:
• If g ∈ G, g 6= 1, then g does not fix a curve of positive genus.
• G is birationally conjugate to a subgroup of Aut(P2), or to a subgroup of
Aut(P1 × P1) or to the group Cs24 isomorphic to Z/2Z × Z/4Z, generated
by the two elements
(x : y : z) 99K (yz : xy : −xz),
(x : y : z) 99K (yz(y − z) : xz(y + z) : xy(y + z)).
Moreover, this last group is conjugate neither to a subgroup of Aut(P2), nor to a
subgroup of Aut(P1 × P1).
Then, we discuss the case in which the group is infinite, respectively non-
Abelian (Section 11) and provide many examples of groups satisfying (F ) but not
Note that many finite groups which contain elements that fix a non-rational
curve are known, see for example [Wim] or more recently [Bla2] and [Do-Iz]. This
can also occur if the group is infinite, see [BPV] and [Bla5]. In fact, the set of
non-rational curves fixed by the elements of a group is a conjugacy invariant very
useful in describing conjugacy classes (see [Ba-Be], [dFe], [Bla4]).
1.2 How to decide
Given a finite Abelian group of birational transformations of a (rational) surface,
we thus have a good way to determine whether the group is birationally conjugate
to a group of automorphisms of a minimal surface (in fact to P2 or P1 × P1). If
some non-trivial element fixes a curve of positive genus (i.e. if condition (F ) is not
satisfied), this is false. Otherwise, if the group is not isomorphic to Z/2Z×Z/4Z,
it is birationally conjugate to a subgroup of Aut(P2) or of Aut(P1 × P1). There
are exactly four conjugacy classes of groups isomorphic to Z/2Z×Z/4Z satisfying
condition (F ) (see Theorem 5); three are conjugate to a subgroup of Aut(P2) or
Aut(P1 × P1), and the fourth (the group Cs24 of Theorem 2, described in detail
in Section 7) is not.
1.3 Linearisation of birational actions
Our question is related to that of linearisation of birational actions on C2. This
latter question has been studied intensively for holomorphic or polynomial actions,
see for example [De-Ku], [Kra] and [vdE]. Taking some group acting birationally
on C2, we would like to know if we may birationally conjugate this action to have
a linear action. Note that working on P2 or C2 is the same for this question.
Theorem 1 implies that for finite cyclic groups, being linearisable is equivalent
to fulfilling condition (F ). This is not true for finite Abelian groups in general,
since some groups acting biregularly on P1 × P1 are not birationally conjugate to
groups of automorphisms of P2. Note that Theorem 1 implies the following result
on linearisation, also announced in [Bla3] (as Theorem 4.2):
Theorem 3. Any birational map which is a root of a non-trivial linear automor-
phism of finite order of the plane is conjugate to a linear automorphism of the
plane.
1.4 The approach and other results
Our approach – followed in all the modern articles on the subject – is to view the
finite subgroups of the Cremona group as groups of (biregular) automorphisms of
smooth projective rational surfaces and then to assume that the action is minimal
(i.e. that it is not possible to blow-down some curves and obtain once again a
biregular action on a smooth surface). Manin and Iskovskikh ([Man] and [Isk2])
proved that the only possible cases are action on del Pezzo surfaces or conic bun-
dles. We will clarify this classification, for finite Abelian groups fillfulling (F), by
proving the following result:
Theorem 4. Let S be some smooth projective rational surface and let G ⊂ Aut(S)
be a finite Abelian group of automorphisms of S such that
• the pair (G,S) is minimal;
• if g ∈ G, g 6= 1, then g does not fix a curve of positive genus.
Then, one of the following occurs:
1. The surface S is minimal, i.e. S ∼= P2, or S ∼= Fn for some integer n 6= 1.
2. The surface S is a del Pezzo surface of degree 5 and G ∼= Z/5Z.
3. The surface S is a del Pezzo surface of degree 6 and G ∼= Z/6Z.
4. The pair (G,S) is isomorphic to the pair (Cs24, Ŝ4) defined in Section 7.
We will then prove that all the pairs in cases 1, 2 and 3 are birationally equiv-
alent to a group of automorphisms of P1 × P1 or P2, and that this is not true
for case 4. In fact, we are able to provide the precise description of all conjugacy
classes of finite Abelian subgroups of Bir(P2) satisfying (F ):
Theorem 5. Let G be a finite Abelian subgroup of the Cremona group such that
no non-trivial element of G fixes a curve of positive genus. Then, G is birationally
conjugate to one and only one of the following:
[1] G ∼= Z/nZ× Z/mZ g.b. (x, y) 7→ (ζnx, y) and (x, y) 7→ (x, ζmy)
[2] G ∼= Z/2Z× Z/2nZ g.b. (x, y) 7→ (x−1, y) and (x, y) 7→ (−x, ζ2ny)
[3] G ∼= (Z/2Z)2 × Z/2nZ g.b. (x, y) 7→ (±x±1, y) and (x, y) 7→ (x, ζ2ny)
[4] G ∼= (Z/2Z)3 g.b. (x, y) 7→ (±x,±y) and (x, y) 7→ (x−1, y−1)
[5] G ∼= (Z/2Z)4 g.b. (x, y) 7→ (±x±1,±y±1)
[6] G ∼= Z/2Z× Z/4Z g.b. (x, y) 7→ (x−1, y−1) and (x, y) 7→ (−y, x)
[7] G ∼= (Z/2Z)3 g.b. (x, y) 7→ (−x,−y), (x, y) 7→ (x−1, y−1),
and (x, y) 7→ (y, x)
[8] G ∼= (Z/2Z)× (Z/4Z) g.b. (x : y : z) 99K (yz(y − z) : xz(y + z) : xy(y + z))
and (x : y : z) 99K (yz : xy : −xz)
[9] G ∼= (Z/3Z)2 g.b. (x : y : z) 7→ (x : ζ3y : (ζ3)
and (x : y : z) 7→ (y : z : x)
(where n,m are positive integers, n divides m and ζn = e
2iπ/n).
Furthermore, the groups in cases [1] through [7] are birationally conjugate to sub-
groups of Aut(P1×P1), but the others are not. The groups in cases [1] and [9] are
birationally conjugate to subgroups of Aut(P2), but the others are not.
To prove these results, we will need a number of geometric results on automor-
phisms of rational surfaces, and in particular on automorphisms of conic bundles
and del Pezzo surfaces (Sections 3 to 9). We give for example the classification of
all the twisting elements (that exchange the two components of a singular fibre)
acting on conic bundles in Proposition 6.5 (for the elements of finite order) and
Proposition 6.8 (for those of infinite order); these are the most important elements
in this context (see Lemma 3.8). We also prove that actions of (possibly infinite)
Abelian groups on del Pezzo surfaces satifying (F ) are minimal only if the degree
is at least 5 (Section 9) and describe these cases precisely (Sections 4, 5 and 9).
We also show that a finite Abelian group acting on a projective smooth surface
S such that (KS)
2 ≥ 5 is birationally conjugate to a group of automorphisms of
P1 × P1 or P2 (Corollary 9.10) and in particular satisfies (F ).
1.5 Comparison with other work
Many authors have considered the finite subgroups of Bir(P2). Among them,
S. Kantor [Kan] gave a classification of the finite subgroups, which was incomplete
and included some mistakes; A. Wiman [Wim] and then I.V. Dolgachev and V.A.
Iskovskikh [Do-Iz] successively improved Kantor’s results. The long paper [Do-Iz]
expounds the general theory of finite subgroups of Bir(P2) according to the modern
techniques of algebraic geometry, and will be for years to come the reference on
the subject. Our viewpoint and aim differ from those of [Do-Iz]: we are only
interested in Abelian groups in relation with the above conditions (F) and (M);
this gives a restricted setting in which the theoretical approach is simplified and the
results obtained are more accurate. In the study of del Pezzo surfaces, using the
classification [Do-Iz] of subgroups of automorphisms would require the examination
of many cases; for the sake of readibility we prefered a direct proof. The two
main theorems of [Do-Iz] on automorphism of conic bundles (Proposition 5.3 and
Theorem 5.7(2)) do not exclude groups satisfying property (F ) and do not give
explicit forms for the generators of the groups or the surfaces.
1.6 Aknowledgements
This article is part of my PhD thesis [Bla2]; I am grateful to my advisor T. Vust
for his invaluable help during these years, to I. Dolgachev for helpful discussions,
and thank J.-P. Serre and the referees for their useful remarks on this paper.
2 Automorphisms of P2 or P1 × P1
Note that a linear automorphism of C2 may be extended to an automorphism of
either P2 or P1 × P1. Moreover, the automorphisms of finite order of these three
surfaces are birationally conjugate. For finite Abelian groups, the situation is quite
different. We give here the birational equivalence of these groups.
Notation 2.1. The element [a : b : c] denotes the diagonal automorphism (x : y :
z) 7→ (ax : by : cz) of P2, and ζm = e
2iπ/m.
Proposition 2.2 (Finite Abelian subgroups of Aut(P2)). Every finite Abelian
subgroup of Aut(P2) = PGL(3,C) is conjugate, in the Cremona group Bir(P2), to
one and only one of the following:
1. A diagonal group, isomorphic to Z/nZ×Z/mZ, where n divides m, generated
by [1 : ζn : 1] and [ζm : 1 : 1]. (The case n = 1 gives the cyclic groups).
2. The special group V9, isomorphic to Z/3Z×Z/3Z, generated by [1 : ζ3 : (ζ3)
and (x : y : z) 7→ (y : z : x).
Thus, except for the group V9, two isomorphic finite Abelian subgroups of PGL(3,C)
are conjugate in Bir(P2).
Proof. First of all, a simple calculation shows that every finite Abelian subgroup
of PGL(3,C) is either diagonalisable or conjugate to the group V9. Furthermore,
since this last group does not fix any point, it is not diagonalisable, even in Bir(P2)
[Ko-Sz, Proposition A.2].
Let T denote the torus of PGL(3,C) constituted by diagonal automorphisms
of P2. Let G be a finite subgroup of T ; as an abstract group it is isomorphic to
Z/nZ× Z/mZ, where n divides m. Now we can conjugate G by a birational map
of the form h : (x, y) 99K (xayb, xcyd) so that it contains [ζm : 1 : 1] (see [Be-Bl]
and [Bla1]). Since h normalizes the torus T , the group G remains diagonal and
contains the n-torsion of T , hence it contains [1 : ζn : 1].
Corollary 2.3. Every finite Abelian group of linear automorphisms of C2 is bi-
rationally conjugate to a diagonal group, isomorphic to Z/nZ × Z/mZ, where n
divides m, generated by (x, y) 7→ (ζnx, y) and (x, y) 7→ (x, ζmy).
Proof. This follows from the fact that the group GL(2,C) of linear automorphisms
of C2 extends to a group of automorphisms of P2 that leaves the line at infinity
invariant and fixes one point.
Example 2.4. Note that Aut(P1 × P1) contains the group (C∗)2 ⋊ Z/2Z, where
(C∗)2 is the group of automorphisms of the form (x, y) 7→ (αx, βy), α, β ∈ C∗, and
Z/2Z is generated by the automorphism (x, y) 7→ (y, x).
The birational map (x, y) 99K (x : y : 1) from P1×P1 to P2 conjugates (C∗)2⋊
Z/2Z to the group of automorphisms of P2 generated by (x : y : z) 7→ (αx : βy : z),
α, β ∈ C∗ and (x : y : z) 7→ (y : x : z).
Proposition 2.5 (Finite Abelian subgroups of Aut(P1 × P1)). Up to birational
conjugation, every finite Abelian subgroup of Aut(P1×P1) is conjugate to one and
only one of the following:
[1] G ∼= Z/nZ× Z/mZ g.b. (x, y) 7→ (ζnx, y) and (x, y) 7→ (x, ζmy)
[2] G ∼= Z/2Z× Z/2nZ g.b. (x, y) 7→ (x−1, y) and (x, y) 7→ (−x, ζ2ny)
[3] G ∼= (Z/2Z)2 × Z/2nZ g.b. (x, y) 7→ (±x±1, y) and (x, y) 7→ (x, ζ2ny)
[4] G ∼= (Z/2Z)3 g.b. (x, y) 7→ (±x,±y) and (x, y) 7→ (x−1, y−1)
[5] G ∼= (Z/2Z)4 g.b. (x, y) 7→ (±x±1,±y±1)
[6] G ∼= Z/2Z× Z/4Z g.b. (x, y) 7→ (x−1, y−1) and (x, y) 7→ (−y, x)
[7] G ∼= (Z/2Z)3 g.b. (x, y) 7→ (−x,−y), (x, y) 7→ (x−1, y−1),
and (x, y) 7→ (y, x)
(where n,m are positive integers, n divides m and ζn = e
2iπ/n).
Furthermore, the groups in [1] are conjugate to subgroups of Aut(P2), but the others
are not.
Proof. Recall that Aut(P1 × P1) = (PGL(2,C) × PGL(2,C)) ⋊ Z/2Z. Let G be
some finite Abelian subgroup of Aut(P1 × P1); we now prove that G is conjugate
to one of the groups in cases [1] through [7].
First of all, if G is a subgroup of the group (C∗)2⋊Z/2Z given in Example 2.4,
then it is conjugate to a subgroup of Aut(P2) and hence to a group in case [1].
Assume that G ⊂ PGL(2,C)×PGL(2,C) and denote by π1 and π2 the projec-
tions πi : PGL(2,C)×PGL(2,C) → PGL(2,C) on the i-th factor. Since π1(G) and
π2(G) are finite Abelian subgroups of PGL(2,C) each is conjugate to a diagonal
cyclic group or to the group x 99K ±x±1, isomorphic to (Z/2Z)2. We enumerate
the possible cases.
If both groups π1(G) and π2(G) are cyclic, the group G is conjugate to a
subgroup of the diagonal torus (C∗)2 of automorphisms of the form (x, y) 7→
(αx, βy), α, β ∈ C∗.
If exactly one of the two groups π1(G) and π2(G) is cyclic we may assume, up
to conjugation in Aut(P1 × P1), that π2(G) is cyclic, generated by y 7→ ζmy, for
some integer m ≥ 1, and that π1(G) is the group x 99K ±x
±1. We use the exact
sequence 1 → G ∩ kerπ2 → G → π2(G) → 1 and find, up to conjugation, two
possibilities for G:
(a) G is generated by (x, y) 7→ (x−1, y) and (x, y) 7→ (−x, ζmy).
(b) G is generated by (x, y) 7→ (±x±1, y) and (x, y) 7→ (x, ζmy).
If m is even, we obtain respectively [2] and [3] for n = m/2. If m is odd, the two
groups are equal; conjugating by ϕ : (x, y) 99K (x, y(x + x−1)) (which conjugates
(x, y) 7→ (−x, y) to (x, y) 7→ (−x,−y)) we obtain the group [2] for n = m.
If both groups π1(G) and π2(G) are isomorphic to (Z/2Z)
2, then up to conju-
gation, we obtain three groups, namely
(a) G is generated by (x, y) 7→ (−x,−y) and (x, y) 7→ (x−1, y−1).
(b) G is generated by (x, y) 7→ (±x,±y) and (x, y) 7→ (x−1, y−1).
(c) G is given by (x, y) 7→ (±x±1,±y±1).
The group [2] with n = 1 is conjugate to (a) by (x, y) 99K (x, x y+x
y+x−1
). The groups
(b) and (c) are respectively equal to [4] and [5].
We now suppose that the group G is not contained in PGL(2,C)×PGL(2,C).
Any element ϕ ∈ Aut(P1×P1) not contained in PGL(2,C)×PGL(2,C) is conjugate
to ϕ : (x, y) 7→ (α(y), x), where α ∈ Aut(P1), and if ϕ is of finite order, α may be
chosen to be y 7→ λy with λ ∈ C∗ a root of unity.
Thus, up to conjugation, G is generated by the group H = G ∩ (PGL(2,C)×
PGL(2,C)) and one element (x, y) 7→ (λy, x), for some λ ∈ C∗ of finite order. Since
the group G is Abelian, every element of H is of the form (x, y) 7→ (β(x), β(y)),
for some β ∈ PGL(2,C) satisfying β(λx) = λβ(x). Three possibilities occur,
depending on the value of λ which may be 1, −1 or something else.
If λ = 1, we conjugate the group by some element (x, y) 7→ (γ(x), γ(y)) so that
H is either diagonal or equal to the group generated by (x, y) 7→ (−x,−y) and
(x, y) 7→ (x−1, y−1). In the first situation, the group is contained in (C∗)2 ⋊Z/2Z
(which gives [1]); the second situation gives [7].
If λ = −1, the group H contains the square of (x, y) 7→ (−y, x), which is
(x, y) 7→ (−x,−y) and is either cyclic or generated by (x, y) 7→ (−x,−y) and
(x, y) 7→ (x−1, y−1). If H is cyclic, it is diagonal, since it contains (x, y) 7→
(−x,−y), so G is contained in (C∗)2 ⋊ Z/2Z. The second possibility gives [6].
If λ 6= ±1, the group H is diagonal and then G is contained in (C∗)2 ⋊ Z/2Z.
We now prove that distinct groups of the list are not birationally conjugate.
First of all, each group of case [1] fixes at least one point of P1 × P1. Since the
other groups of the list don’t fix any point, they are not conjugate to [1] [Ko-Sz,
Proposition A.2].
Consider the other groups. The set of isomorphic groups are those of cases [3]
(with n = 1), [4] and [7] (isomorphic to (Z/2Z)3), and of cases [2] (with n = 2)
and [6] (isomorphic to Z/2Z× Z/4Z).
The groups of cases [2] to [5] leave two pencils of rational curves invariant (the
fibres of the two projections P1 × P1 → P1) which intersect freely in exactly one
point. We prove that this is not the case for [6] and [7]; this shows that these
two groups are not birationally conjugate to any of the previous groups. Take
G ⊂ Aut(P1×P1) to be either [6] or [7]. We have then Pic(P1×P1)G = Zd, where
d = − 1
KP1×P1 is the diagonal of P
1×P1. Suppose that there exist two G-invariant
pencils Λ1 = n1d and Λ2 = n2d of rational curves, for some positive integers n1, n2
(we identify here a pencil with the class of its elements in Pic(P1 × P1)G). The
intersection Λ1 · Λ2 = 2n1n2 is an even integer. Note that the fixed part of the
intersection is also even, since G is of order 8 and acts without fixed points on
P1 × P1. The free part of the intersection is then also an even integer and hence
is not 1.
Let us now prove that [4] is not birationally conjugate to [3] (with n = 1).
This follows from the fact that [4] contains three subgroups that are fixed-point
free (the groups generated by (x, y) 7→ (x−1, y−1) and one of the three involutions
of the group (x, y) 7→ (±x,±y)), whereas [3] (with n = 1) contains only one such
subgroup, which is (x, y) 7→ (±x±1, y).
We now prove the last assertion. The finite Abelian groups of automorphisms
of P2 are conjugate either to [1] or to the group V9, isomorphic to (Z/3Z)
2 (see
Proposition 2.2). As no group of the list [2] through [7] is isomorphic to (Z/3Z)2,
we are done.
Summary of this section. We have found that the groups common to the
three surfaces C2,P2 and P1 × P1 are the ”diagonal” ones (generated by (x, y) 7→
(ζnx, y) and (x, y) 7→ (x, ζmy)). On P
2 there is only one more group, which is the
special group V9, and on P
1 × P1 there are 2 families ([2] and [3]) and 4 special
groups ([4], [5], [6] and [7]).
3 Some facts about automorphisms of conic bun-
We first consider conic bundles without mentioning any group action on them. We
recall some classical definitions:
Definition 3.1. Let S be a rational surface and π : S → P1 be a morphism. We
say that the pair (S, π) is a conic bundle if a general fibre of π is isomorphic to P1,
with a finite number of exceptions: these singular fibres are the union of smooth
rational curves F1 and F2 such that (F1)
2 = (F2)
2 = −1 and F1 · F2 = 1.
Let (S, π) and (S̃, π̃) be two conic bundles. We say that ϕ : S 99K S̃ is a
birational map of conic bundles if ϕ is a birational map which sends a general fibre
of π on a general fibre of π̃.
We say that a conic bundle (S, π) is minimal if any birational morphism of
conic bundles (S, π) → (S̃, π̃) is an isomorphism.
We remind the reader of the following well-known result:
Lemma 3.2. Let (S, π) be a conic bundle. The following conditions are equivalent:
• (S, π) is minimal.
• The fibration π is smooth, i.e. no fibre of π is singular.
• S is a Hirzebruch surface Fm, for some integer m ≥ 0. �
Blowing-down one irreducible component in any singular fibre of a conic bundle
(S, π), we obtain a birational morphism of conic bundles S → Fm for some integer
m ≥ 0. Note that m depends on the choice of the blown-down components. The
following lemma gives some information on the possibilities. Note first that since
the sections of Fm have self-intersection≥ −m, the self-intersections of the sections
of π are also bounded from below.
Lemma 3.3. Let (S, π) be a conic bundle on a surface S 6∼= P1 × P1. Let −n be
the minimal self-intersection of sections of π and let r be the number of singular
fibres of π. Then n ≥ 1 and:
1. There exists a birational morphism of conic bundles p− : S → Fn such that:
(a) p− is the blow-up of r points of Fn, none of which lies on the exceptional
section En.
(b) The strict pull-back Ẽn of En by p− is a section of π with self-intersection
2. If there exist two different sections of π with self-intersection −n, then r ≥
2n. In this case, there exist birational morphisms of conic bundles p0 : S →
F0 = P
1 × P1 and p1 : S → F1.
Proof. We denote by s a section of π of minimal self-intersection −n, for some
integer n (this integer is in fact positive, as will appear in the proof). Note that
this curve intersects exactly one irreducible component of each singular fibre.
If r = 0, the lemma is trivially true: take p− to be the identity map. We now
suppose that r ≥ 1, and denote by F1, ..., Fr the irreducible components of the
singular fibres which do not intersect s. Blowing these down, we get a birational
morphism of conic bundles p− : S → Fm, for some integerm ≥ 0. The image of the
section s by p− is a section of the conic bundle of Fm of minimal self-intersection,
so we get m = n, and n ≥ 0. If we had n = 0, then taking some section s̃ of
P1×P1 of self-intersection 0 passing through at least one blown-up point, its strict
pull-back by p− would be a section of negative self-intersection, which contradicts
the minimality of s2 = −n = 0. We find finally that m = n > 0, and that p−(s)
is the unique section Fn of self-intersection −n. This proves the first assertion.
We now prove the second assertion. Suppose that some section t 6= s has self-
intersection −n. The Picard group of S is generated by s = p∗−(En), the divisor f
of a fibre of π and F1, ..., Fr. Write t as t = s+ bf −
i=1 aiFi, for some integers
b, a1, ..., ar, with a1, ..., ar ≥ 0. We have t
2 = −n and t · (t+KS) = −2 (adjunction
formula), where KS = p
−(KFn) +
i=1 Fi = −(n + 2)f − 2s +
i=1 Fi. These
relations give:
s2 = t2 = s2 −
i=1 a
i + 2b,
n− 2 = t ·KS = −(n+ 2) + 2n− 2b+
i=1 ai,
whence
i=1 ai =
i=1 a
i = 2b, so each ai is equal to 0 or 1 and consequently
2b ≤ r. Since s · t = b− n ≥ 0, we find that r ≥ 2n, as announced.
Finally, by contracting f − F1, f − F2, ..., f − Fn, Fn+1, Fn+2, ..., Fr, we obtain
a birational morphism p0 of conic bundles which sends s on a section of self-
intersection 0 and whose image is thus F0. Similarly, the morphism p1 : S → F1
is given by the contraction of f − F1, f − F2, ..., f − Fn−1, Fn, Fn+1, ..., Fr.
We now add some group actions on the conic bundles, and give natural defi-
nitions (note that we will restrict ourselves to finite or Abelian groups only when
this is needed and will then say so):
Definition 3.4. Let (S, π) be some conic bundle.
• We denote by Aut(S, π) ⊂ Aut(S) the group of automorphisms of the conic
bundle, i.e. automorphisms of S that send a general fibre of π on another
general fibre.
Let G ⊂ Aut(S, π) be some group of automorphisms of the conic bundle (S, π).
• We say that a birational map of conic bundles ϕ : S 99K S̃ is G-equivariant
if the G-action on S̃ induced by ϕ is biregular (it is clear that it preserves
the conic bundle structure).
• We say that the triple (G,S, π) is minimal if any G-equivariant birational
morphism of conic bundles ϕ : S → S̃ is an isomorphism.
Remark 3.5. We insist on the fact that since a conic bundle is for us a pair (S, π),
an automorphism of S is not necessarily an automorphism of the conic bundle (i.e.
Aut(S) 6= Aut(S, π) in general). One should be aware that in the literature, conic
bundle sometimes means ”a variety admitting a conic bundle structure”.
Remark 3.6. If G ⊂ Aut(S, π) is such that the pair (G,S) is minimal, so is the
triple (G,S, π). The converse is not true in general (see Remark 4.7).
Note that any automorphism of the conic bundle acts on the set of singular
fibres and on its irreducible components. The permutation of the two components
of a singular fibre is very important (Lemma 3.8). For this reason, we introduce
some terminology:
Definition 3.7. Let g ∈ Aut(S, π) be an automorphism of the conic bundle (S, π).
Let F = {F1, F2} be a singular fibre. We say that g twists the singular fibre F if
g(F1) = F2 (and consequently g(F2) = F1).
If g twists at least one singular fibre of π, we will say that g twists the conic
bundle (S, π), or simply (if the conic bundle is implicit) that g is a twisting element.
Here is a simple but very important observation:
Lemma 3.8. Let G ⊂ Aut(S, π) be a group of automorphisms of a conic bundle.
The following conditions are equivalent:
1. The triple (G,S, π) is minimal.
2. Any singular fibre of π is twisted by some element of G. �
Remark 3.9. An automorphism of a conic bundle with a non-trivial action on
the basis of the fibration may twist at most two singular fibres. However, an
automorphism with a trivial action on the basis of the fibration may twist a large
number of fibres. We will give in Propositions 6.5 and 6.8 a precise description of
all twisting elements.
The following lemma is a direct consequence of Lemma 3.3; it provides infor-
mation on the structure of the underlying variety of a conic bundle admitting a
twisting automorphism.
Lemma 3.10. Suppose that some automorphism of the conic bundle (S, π) twists
at least one singular fibre. Then, the following occur.
1. There exist two birational morphisms of conic bundles p0 : S → F0 and
p1 : S → F1 (which are not g-equivariant).
2. Let −n be the minimal self-intersection of sections of π and let r be the
number of singular fibres of π. Then, r ≥ 2n ≥ 2.
Proof. Note that any section of π touches exactly one component of each singular
fibre. Since g twists some singular fibre, its action on the set of sections of S is
fixed-point-free. The number of sections of minimal self-intersection is then greater
than 1 and we apply Lemma 3.3 to get the result.
Remark 3.11. A result of the same kind can be found in [Isk1], Theorem 1.1.
Lemma 3.12. Let G ⊂ Aut(S, π) be a group of automorphisms of the conic bundle
(S, π), such that:
• π has at most 3 singular fibres (or equivalently (KS)
2 ≥ 5);
• the triple (G,S, π) is minimal.
Then, S is either a Hirzeburch surface or a del Pezzo surface of degree 5 or 6,
depending on whether the number of singular fibres is 0, 3 or 2 respectively.
Proof. Let −n be the minimal self-intersection of sections of π and let r ≤ 3 be
the number of singular fibres of π. If r = 0, we are done, so we may suppose that
r > 0. Since (G,S, π) is minimal, every singular fibre is twisted by some element
of G (Lemma 3.8). From Lemma 3.10, we get r ≥ 2n ≥ 2, whence r = 2 or 3 and
n = 1, and we obtain the existence of some birational morphism of conic bundles
(not G-equivariant) p1 : S → F1. So the surface S is obtained by the blow-up
of 2 or 3 points of F1, not on the exceptional section (Lemma 3.3), and thus by
blowing-up 3 or 4 points of P2, no 3 of which are collinear (otherwise we would
have a section of self-intersection ≤ −2). The surface is then a del Pezzo surface
of degree 6 or 5.
Remark 3.13. We conclude this section by mentioning an important exact se-
quence. Let G ⊂ Aut(S, π) be some group of automorphisms of a conic bundle
(S, π). We have a natural homomorphism π : G → Aut(P1) = PGL(2,C) that
satisfies π(g)π = πg, for every g ∈ G. We observe that the group G′ = kerπ of
automorphisms that leave every fibre invariant embeds in the group PGL(2,C(x))
of automorphisms of the generic fibre P1(C(x)). Then we get the exact sequence
1 → G′ → G
→ π(G) → 1. (1)
This restricts the structure of G; for example if G is Abelian and finite, so are
G′ and π(G), and we know that the finite Abelian subgroups of PGL(2,C) and
PGL(2,C(x)) are either cyclic or isomorphic to (Z/2Z)2.
We also see that the group G is birationally conjugate to a subgroup of the
group of birational transformations of P1 × P1 of the form (written in affine coor-
dinates):
(x, y) 99K
ax+ b
cx+ d
α(x)y + β(x)
γ(x)y + δ(x)
where a, b, c, d ∈ C, α, β, γ, δ ∈ C(x), and (ad− bc)(αδ − βγ) 6= 0.
This group, called the de Jonquières group, is the group of birational transfor-
mations of P1 ×P1 that preserve the fibration induced by the first projection, and
is isomorphic to PGL(2,C(x))⋊ PGL(2,C).
The subgroups of this group can be studied algebraically (as in [Bea2] and
[Bla4]) but we will not adopt this point of view here.
4 The del Pezzo surface of degree 6
There is a single isomorphism class of del Pezzo surfaces of degree 6, since all
sets of three non-collinear points of P2 are equivalent under the action of linear
automorphisms. Consider the surface S6 of degree 6 defined by the blow-up of
the points A1 = (1 : 0 : 0), A2 = (0 : 1 : 0) and A3 = (0 : 0 : 1). We may view
it in P2 × P2, defined as {
(x : y : z), (u : v : w)
| ux = vy = wz}, where the
blow-down p : S6 → P
2 is the restriction of the projection on one copy of P2,
explicitly p :
(x : y : z), (u : v : w)
7→ (x : y : z). There are exactly 6 exceptional
divisors, which are the pull-backs of the Ai’s by the two projection morphisms.
We write Ei = p
−1(Ai) and denote by Dij the strict pull-back by p of the line of
P2 passing through Ai and Aj .
The group of automorphisms of S6 is well known (see for example [Wim],
[Do-Iz]). It is isomorphic to (C∗)2 ⋊ (Sym3 × Z/2Z), where (C
∗)2 ⋊ Sym3 is the
lift on S6 of the group of automorphisms of P
2 that leave the set {A1, A2, A3}
invariant, and Z/2Z is generated by the permutation of the two factors (it is the
lift of the standard quadratic transformation (x : y : z) 99K (yz : xz : xy) of P2);
the action of Z/2Z on (C∗)2 sends an element on its inverse.
There are three conic bundle structures on the surface S6. Let π1 : S6 → P
be the morphism defined by
(x : y : z), (u : v : w)
(y : z) if (x : y : z) 6= (1 : 0 : 0),
(w : v) if (u : v : w) 6= (1 : 0 : 0).
Note that p sends the fibres of π1 on lines of P
2 passing through A1. There are
exactly two singular fibres of this fibration, namely
π−11 (1 : 0) = {E2, D12} and π
1 (0 : 1) = {E3, D13};
and E1, D23 are sections of π1.
Lemma 4.1. The group Aut(S6, π1) of automorphisms of the conic bundle (S6, π1)
acts on the hexagon {E1, E2, E3, D12, D13, D23} and leaves the set {E1, D23} in-
variant.
1. The action on the hexagon gives rise to the exact sequence
1 → (C∗)2 → Aut(S6, π1) → (Z/2Z)
2 → 1.
2. This exact sequence is split and Aut(S6, π1) = (C
∗)2 ⋊ (Z/2Z)2, where
(a) (C∗)2 is the group of automorphisms of the form(
(x : y : z), (u : v : w)
(x : αy : βz), (αβu : βv : αw)
, α, β ∈ C∗.
(b) The group (Z/2Z)2 is generated by the automorphisms
(x : y : z), (u : v : w)
(x : z : y), (u : w : v)
whose action on the set of exceptional divisors is (E2 E3)(D12 D13);
(x : y : z), (u : v : w)
(u : v : w), (x : y : z)
whose action is (E1 D23)(E2 D13)(E3 D12).
(c) The action of (Z/2Z)2 on (C∗)2 is generated by permutation of the
coordinates and inversion.
Proof. Since Aut(S6) acts on the hexagon, so does Aut(S6, π1) ⊂ Aut(S6). Since
the group Aut(S6, π1) sends a section on a section, the set {E1, D23} is invariant.
The group (C∗)2 leaves the conic bundle invariant, and is the kernel of the
action of Aut(S6, π1) on the hexagon. As the set {E1, D23} is invariant, the
image is contained in the group (Z/2Z)2 generated by (E2 E3)(D12 D13) and
(E1 D23)(E2 D13)(E3 D12). The rest of the lemma follows directly.
By permuting coordinates, we have two other conic bundle structures on the
surface S6, given by the following morphisms π2, π3 : S6 → P
(x : y : z), (u : v : w)
(x : z) if (x : y : z) 6= (0 : 1 : 0),
(w : u) if (u : v : w) 6= (0 : 1 : 0).
(x : y : z), (u : v : w)
(x : y) if (x : y : z) 6= (0 : 0 : 1),
(v : u) if (u : v : w) 6= (0 : 0 : 1).
The description of the exceptional divisors on S6 shows that π1, π2 and π3 are
the only conic bundle structures on S6.
Lemma 4.2. For i = 1, 2, 3, the pair (Aut(S6, πi), S6) is not minimal. More
precisely the morphism πj×πk : S6 → P
1×P1 conjugates Aut(S6, πi) to a subgroup
of Aut(P1 × P1), where {i, j, k} = {1, 2, 3}.
Proof. The union of the sections E1 and D23 is invariant by the action of the
whole group Aut(S6, π1). Since these two exceptional divisors don’t intersect, we
can contract both and get a birational Aut(S6, π1)-equivariant morphism from S6
to P1×P1: the pair (Aut(S6, π1), S6) is thus not minimal; explicitly, the birational
morphism is given by q 7→ (π2(q), π3(q)), as stated in the lemma. We obtain the
other cases by permuting coordinates.
Remark 4.3. The subgroup of Aut(P1×P1) obtained in this manner doesn’t leave
any of the two fibrations of P1 × P1 invariant.
Corollary 4.4. If (G,S6) is a minimal pair (where G ⊂ Aut(S6)), then G does
not preserve any conic bundle structure. �
We conclude this section with a fundamental example; we will use several times
the following automorphism κα,β of (S6, π1):
Example 4.5. For any α, β ∈ C∗, we define κα,β to be the following automorphism
of (S6, π1):
κα,β :
(x : y : z), (u : v : w)
(u : αw : βv), (x : α−1z : β−1y)
Note that κα,β twists the two singular fibres of π1 (see Lemma 4.6 below); its
action on the basis of the fibration is (x1 : x2) 7→ (αx1 : βx2) and
κ2α,β(
(x : y : z), (u : v : w)
(x : αβ−1y : α−1βz), (u : α−1βv : αβ−1w)
So κα,β is an involution if and only if its action on the basis of the fibration is
trivial.
Lemma 4.6. Let g ∈ Aut(S6, π1) be an automorphism of the conic bundle (S6, π1).
The following conditions are equivalent:
• the triple (< g >, S6, π1) is minimal;
• g twists the two singular fibres of π1;
• the action of g on the exceptional divisors of S6 is (E1 D23)(E2 D12)(E3 D13);
• g = κα,β for some α, β ∈ C
Proof. According to Lemma 4.1 the action of Aut(S6, π1) on the exceptional curves
is isomorphic to (Z/2Z)2 and hence the possible actions of g 6= 1 are these:
1. id, 2. (E2 E3)(D12 D13),
3. (E1 D23)(E2 D13)(E3 D12), 4. (E1 D23)(E2 D12)(E3 D13).
In the first three cases, the triple (< g >, S6, π1) is not minimal. Indeed, the
blow-down of {E2, E3} or {E2, D13} gives a g-equivariant birational morphism of
conic bundles.
Hence, if (< g >, S6, π1) is minimal, its action on the exceptional curves is the
fourth one above, as stated in the lemma, and it then twists the two singular fibres
of π1. Conversely if g twists the two singular fibres of π1, the triple (< g >, S6, π1)
is minimal (by Lemma 3.8).
It remains to see that the last assertion is equivalent to the others. This follows
from Lemma 4.1; indeed this lemma implies that (C∗)2κ1,1 is the set of elements
of Aut(S6, π1) inducing the permutation (E1 D23)(E2 D12)(E3 D13).
Remark 4.7. The pair (Aut(S6, π1), S6) is not minimal (Lemma 4.2). Consequently
< κα,β > is an example of a group whose action on the surface is not minimal,
but whose action on a conic bundle is minimal.
5 The del Pezzo surface of degree 5
As for the del Pezzo surface of degree 6, there is a single isomorphism class of del
Pezzo surfaces of degree 5. Consider the del Pezzo surface S5 of degree 5 defined
by the blow-up p : S5 → P
2 of the points A1 = (1 : 0 : 0), A2 = (0 : 1 : 0),
A3 = (0 : 0 : 1) and A4 = (1 : 1 : 1). There are 10 exceptional divisors on S5,
namely the divisor Ei = p
−1(Ai), for i = 1, ..., 4, and the strict pull-back Dij of
the line of P2 passing through Ai and Aj , for 1 ≤ i < j ≤ 4. There are 5 sets of 4
skew exceptional divisors on S5, namely
F1 = {E1, D23, D24, D34}, F2 = {E2, D13, D14, D34}, F3 = {E3, D12, D14, D24},
F4 = {E4, D12, D13, D23}, F5 = {E1, E2, E3, E4}.
Proposition 5.1. The action of Aut(S5) on the five sets F1, ..., F5 of four skew
exceptional divisors of S5 gives rise to an isomomorphism
ρ : Aut(S5) → Sym5.
Furthermore, the actions of Symn, Altm ⊂ Aut(S5) on S5 given by the canonical
embedding of these groups into Sym5 are fixed-point free if and only if n = 3, 4, 5,
respectively m = 4, 5.
Proof. Since any automorphism in the kernel of ρ leaves E1, E2, E3 and E4 invari-
ant and hence is the lift of an automorphism of P2 that fixes the 4 points, the
homomorphism ρ is injective.
We now prove that ρ is also surjective. Firstly, the lift of the group of au-
tomorphisms of P2 that leave the set {A1, A2, A3, A4} invariant is sent by ρ on
Sym4 = Sym{F1,F2,F3,F4}. Secondly, the lift of the standard quadratic transforma-
tion (x : y : z) 99K (yz : xz : xy) is an automorphism of S5, as its lift on S6 is an
automorphism, and as it fixes the point A4; its image by ρ is (F4 F5).
It remains to prove the last assertion. First of all, it is clear that the actions
of the cyclic groups Alt3 and Sym2 fix some points. The group Sym3 ⊂ Aut(P
of permutations of A1, A2 and A3 fixes exactly one point, namely (1 : 1 : 1). The
blow-up of this point gives a fixed-point free action on F1, and thus its lift on S5
is also fixed-point free. The group Alt4 ⊂ Aut(P
2) contains the element (x : y :
z) 7→ (z : x : y) (which corresponds to (1 2 3)) that fixes exactly three points, i.e.
(1 : a : a2) for a3 = 1. It also contains the element (x : y : z) 7→ (z − y : z − x : z)
(which corresponds to (1 2)(3 4)) that does not fix (1 : a : a2) for a3 = 1. Thus,
the action of Alt4 on P
2 is fixed-point free and the same is true on S5.
Remark 5.2. The structure of Aut(S5) is classical and can be found for example
in [Wim] and [Do-Iz].
Lemma 5.3. Let π : S5 → P
1 be some morphism inducing a conic bundle (S5, π).
There are exactly four exceptional curves of S5 which are sections of π; the blow-
down of these curves gives rise to a birational morphism p : S5 → P
2 which
conjugates the group Aut(S5, π) ∼= Sym4 to the subgroup of Aut(P
2) that leaves
invariant the four points blown-up by p. In particular, the pair (Aut(S5, π), S5) is
not minimal.
Proof. Blowing-down one component in any singular fibre, we obtain a birational
morphism of conic bundles (not Aut(S5, π)-equivariant) from S5 to some Hirze-
bruch surface Fn. Since S5 does not contain any curves of self-intersection ≤ −2, n
is equal to 0 or 1. Changing the component blown-down in a singular fibre performs
an elementary link Fn 99K Fn±1; we may then assume that n = 1, and that F1 is
the blow-up of A1 ∈ P
2. Consequently, the fibres of the conic bundles correspond
to the lines passing through A1. Denoting by A2, A3, A4 the other points blown-up
by the constructed birational morphism S5 → P
2 and using the same notation as
before, the three singular fibres are {Ei, D1i} for i = 2, ..., 4, and the other excep-
tional curves are four skew sections of the conic bundle, namely the elements of
F1 = {E1, D23, D24, D34}. The blow-down of F1 gives an Aut(S5, π)-equivariant
birational morphism (that is not a morphism of conic bundles) p : S5 → P
2 and
conjugates Aut(S5, π) to a subgroup of the group Sym4 ⊂ Aut(P
2) of automor-
phisms that leaves the four points blown-up by p invariant. The fibres of π are
sent on the conics passing through the four points, so the lift of the whole group
Sym4 belongs to Aut(S5, π).
Corollary 5.4. Let G be some group of automorphisms of a conic bundle (S, π)
such that the pair (G,S) is minimal and (KS)
2 ≥ 5 (or equivalently such that the
number of singular fibres of π is at most 3). Then, the fibration is smooth, i.e. S
is a Hirzebruch surface.
Proof. Since (G,S) is minimal, so is the triple (G,S, π). By Lemma 3.12, the
surface S is either a Hirzebruch surface, or a del Pezzo surface of degree 5 or 6.
Corollary 4.4 shows that the del Pezzo surface of degree 6 is not possible and
Lemma 5.3 eliminates the possibility of the del Pezzo surface of degree 5.
6 Description of twisting elements
In this section, we describe the twisting automorphisms of conic bundles, which
are the most important automorphisms (see Lemma 3.8).
Lemma 6.1 (Involutions twisting a conic bundle). Let g ∈ Aut(S, π) be a twist-
ing automorphism of the conic bundle (S, π). Then, the following properties are
equivalent:
1. g is an involution;
2. π(g) = 1, i.e. g has a trivial action on the basis of the fibration;
3. the set of points of S fixed by g is an irreducible hyperelliptic curve of genus
(k − 1) – a double covering of P1 by means of π, ramified over 2k points –
plus perhaps a finite number of isolated points, which are the singular points
of the singular fibres not twisted by g.
Furthermore, if the three conditions above are satisfied, the number of singular
fibres of π twisted by g is 2k ≥ 2.
Proof. 1 ⇒ 2: By contracting some exceptional curves, we may assume that the
triple (< g >, S, π) is minimal. Suppose that g is an involution and π(g) 6= 1.
Then g may twist only two singular fibres, which are the fibres of the two points
of P1 fixed by π(g). Hence, the number of singular fibres is ≤ 2. Lemma 3.12
tells us that S is a del Pezzo surface of degree 6 and then Lemma 4.6 shows that
g = κα,β (Example 4.5) for some α, β ∈ C
∗. But such an element is an involution
if and only if it acts trivially on the basis of the fibration.
(1 and 2) ⇒ 3: Suppose first that (< g >, S, π) is minimal. This implies that g
twists every singular fibre of π. Therefore, since π(g) = 1 and g2 = 1, on a singular
fibre there is one point fixed by g (the singular point of the fibre) and on a general
fibre there are two fixed points. The set of points of S fixed by g is thus a smooth
irreducible curve. The projection π gives it as a double covering of P1 ramified over
the points whose fibres are singular and twisted by g. By the Riemann-Hurwitz
formula, this number is even, equal to 2k and the genus of the curve is k − 1.
The situation when (< g >, S, π) is not minimal is obtained from this one, by
blowing-up some fixed points. This adds in each new singular fibre (not twisted
by the involution) an isolated point, which is the singular point of the singular
fibre. We then get the third assertion and the final remark.
3 ⇒ 2: This implication is clear.
2 ⇒ 1: If π(g) = 1, then, g2 leaves every component of every singular fibre of
π invariant. Let p1 : S → F1 be the birational morphism of conic bundles given by
Lemma 3.10; it is a g2-equivariant birational morphism which conjugates g2 to an
automorphism of F1 that necessarily fixes the exceptional section. The pull-back
by p1 of this section is a section C of π, fixed by g
2. Since C touches exactly
one component of each singular fibre (in particular those that are twisted by g),
g sends C on another section D also fixed by g2. The union of the sections D and
C intersects a general fibre in two points, which are exchanged by the action of g.
This implies that g has order 2.
We now give some further simple results on twisting involutions.
Corollary 6.2. Let (S, π) be some conic bundle. No involution twisting (S, π) has
a root in Aut(S, π) which acts trivially on the basis of the fibration.
Proof. Such a root must twist a singular fibre and so (Lemma 6.1) is an involution.
Remark 6.3. There may exist some roots in Aut(S, π) of twisting involutions which
act non trivially on the basis of the fibration.
Take for example four general points A1, ..., A4 of the plane and denote by g ∈
Aut(P2) the element of order 4 that permutes these points cyclically. The blow-up
of these points conjugates g to an automorphism of the del Pezzo surface S5 of
degree 5 (see Section 5). The pencil of conics of P2 passing through the four points
induces a conic bundle structure on S5, with three singular fibres which are the
lift of the pairs of two lines passing through the points. The lift on S5 of g is an
automorphism of the conic bundle whose square is a twisting involution.
Corollary 6.4. Let (S, π) be some conic bundle and let g ∈ Aut(S, π). The
following conditions are equivalent.
1. g twists more than 2 singular fibres of π.
2. g fixes a curve of positive genus.
And these conditions imply that g is an involution which acts trivially on the basis
of the fibration and twists at least 4 singular fibres.
Proof. The first condition implies that g acts trivially on the basis of the fibration,
and thus (by Lemma 6.1) that g is an involution which fixes a curve of positive
genus.
Suppose that g fixes a curve of positive genus. Then, g acts trivially on the
basis of the fibration, and fixes 2 points on a general fibre. Consequently, the curve
fixed by g is a smooth hyperelliptic curve; we get the remaining assertions from
Lemma 6.1.
As we mentioned above, the automorphisms that twist some singular fibre are
fundamental (Lemma 3.8). We now describe these elements and prove that the
only possibilities are twisting involutions, roots of twisting involutions (of even or
odd order) and elements of the form κα,β (see Example 4.5):
Proposition 6.5 (Classification of twisting elements of finite order). Let g ∈
Aut(S, π) be a twisting automorphism of finite order of a conic bundle (S, π).
Let n be the order of its action on the basis.
Then gn is an involution that acts trivially on the basis of the fibration and
twists an even number 2k of singular fibres; furthermore, exactly one of the fol-
lowing situations occurs:
1. n = 1.
2. n > 1 and k = 0; in this case n is even and there exists a g-equivariant bi-
rational morphism of conic bundles η : S → S6 (where S6 is the del Pezzo
surface of degree 6) such that ηgη−1 = κα,β for some α, β ∈ C
∗ (see Exam-
ple 4.5).
3. n > 1 is odd and k > 0; here g twists 1 or 2 fibres, which are the fibres twisted
by gn that are invariant by g.
4. n is even and k > 0; here g twists r = 1 or 2 singular fibres; none of them
are twisted by gn; moreover the action of g on the set of 2k fibres twisted by
gn is fixed-point free; furthermore, n divides 2k, and 2k/n ≡ r (mod 2).
Proof. Lemma 6.1 describes the situation when n = 1. We now assume that n > 1;
by blowing-down some components of singular fibres we may also suppose that the
triple (G,S, π) is minimal.
Denote by a1, a2 ∈ P
1 the two points fixed by π(g) ∈ Aut(P1). For i 6≡ 0
(mod n) the element π(gi) fixes only two points of P1, namely a1 and a2 (since
π(g) has order n); the only possible fibres twisted by gi are thus π−1(a1), π
−1(a2).
Suppose that gn does not twist any singular fibre. By minimality there are
at most 2 singular fibres (π−1(a1) and/or π
−1(a2)) of π and g twists each one.
Lemma 3.12 tells us that S is a del Pezzo surface of degree 6 and Lemma 4.6
shows that
g = κα,β :
(x : y : z), (u : v : w)
(u : αw : βv) , (x : α−1z : β−1y)
for some α, β ∈ C∗. We compute the square of g and find
(x : y : z), (u : v : w)
(x : αβ−1y : α−1βz) , (u : α−1βv : αβ−1w)
Consequently, the order of g is 2n. The fact that gi twists π−1(a1) and π
−1(a2)
when i is odd implies that n is even. Case 2 is complete.
If gn twists at least one singular fibre, it twists an even number of singular
fibres (Lemma 6.1) which we denote by 2k, and gn is an involution. If n is odd,
each fibre twisted by gn is twisted by g, and conversely; this yields case 3. It
remains to consider the more difficult case when n is even.
Firstly we observe that there are r + 2k singular fibres with r ∈ {1, 2}, cor-
responding to the points a1 and/or a2, c1, ..., c2k of P
1, the first r of them be-
ing twisted by g and the 2k others by gn. Under the permutation π(g), the
set {c1, ..., c2k} decomposes into disjoint cycles of length n (this action is fixed-
point-free); this shows that n divides 2k. We write t = 2k/n ∈ N and set
{c1, ..., c2k} = ∪
i=1Ci, where each Ci ⊂ P
1 is an orbit of π(g) of size n. To
deduce the congruence r ≡ t (mod 2), we study the action of g on Pic(S).
For i ∈ {1, ..., t}, choose Fi to be a component in the fibre of the singular fibre
of some point of Ci, and for i ∈ {1, r} choose Li to be a component in the fibre of
ai. Let us write
i=1(Fi + g(Fi) + ...+ g
n−1(Fi)) +
i=1 Li ∈ Pic(S).
Denoting by f ⊂ S a general fibre of π, we find the equalities g(Li) = f − Li
and gn(Fi) = f − Fi in Pic(S), which yield (once again in Pic(S)):
g(R) = R+ (r + t)f − 2(
i=1 Li +
i=1 Fi).
The contraction of the divisor R gives rise to a birational morphism of conic
bundles (not g-equivariant) ν : S → Fm for some integer m ≥ 0. Denote by s ⊂ S
the pull-back by ν of a general section of Fm of self-intersection m (which does
not pass through any of the base-points of ν−1). The canonical divisor KS of S is
then equal in Pic(S) to the divisor −2s+ (m − 2)f + R. We compute g(2s) and
2(g(s)− s) = g(2s)− 2s in Pic(S):
g(2s) = g(−KS + (m− 2)f +R) = −KS + (m− 2)f + g(R);
g(2s)− 2s = g(R)−R = (r + t)f − 2(
i=1 Li +
i=1 Fi).
This shows that (r + t)f ∈ 2Pic(S), which implies that r ≡ t (mod 2). Case 4 is
complete.
Corollary 6.6. If g ∈ Aut(S, π) is a root of a twisting involution h that fixes a
rational curve (i.e. that twists 2 singular fibres) and if g twists at least one fibre
not twisted by h, then g2 = h, g twists exactly one singular fibre, and it exchanges
the two fibres twisted by h.
Proof. We apply Proposition 6.5 and obtain case 4 with k = 1.
Corollary 6.6 and the following result will be useful in the sequel.
Lemma 6.7. Let g ∈ Aut(S, π) be a non-trivial automorphism of finite order
that leaves every component of every singular fibre of π invariant (i.e. that acts
trivially on Pic(S)) and let h ∈ Aut(S, π) be an element that commutes with g.
Then, either no singular fibre of π is twisted by h or each singular fibre of π which
is invariant by h is twisted by h.
Proof. If no twisting element belongs to Aut(S, π), we are done. Otherwise, the
birational morphism of conic bundles p0 : S → P
1 × P1 given by Lemma 3.10
conjugates g to an element of finite order of Aut(P1 × P1, π1) whose set of fixed
points is the union of two rational curves. The set of points of S fixed by g is thus
the union of two sections and a finite number of points (which are the singular
points of the singular fibres of π). Any element h ∈ Aut(S, π) that commutes
with g leaves the set of these two sections invariant. More precisely, the action on
one invariant singular fibre F implies the action on the two sections: h exchanges
the two sections if and only if it twists F . Since the situation is the same at any
other singular fibre, we obtain the result.
We conclude this section with some results on automorphisms of infinite order
of conic bundles, which will not help us directly here but seem interesting to
observe.
Proposition 6.8 (Classification of twisting elements of infinite order). Let (S, π)
be a conic bundle and g ∈ Aut(S, π) be a twisting automorphism of infinite order.
Then g twists exactly two fibres of π and there exists some g-equivariant bira-
tional morphism of conic bundles η : S → S6, where S6 is the del Pezzo surface of
degree 6 and ηgη−1 = κα,β for some α, β ∈ C
Proof. Assume that the triple (< g >, S, π) is minimal. Lemma 6.1 shows that
no twisting element of infinite order acts trivially on the basis of the fibration.
Consequently, gk acts trivially on the basis if and only if k = 0, whence gk twists a
fibre F if and only if k is odd and g twists F . There thus exist at most 2 singular
fibres of π, and Lemma 3.12 tells us that S is a del Pezzo surface of degree 6.
Lemma 4.6 shows that g = κα,β for some α, β ∈ C
Corollary 6.9. Let g ∈ Aut(S, π) be an element of infinite order; then a birational
morphism conjugates g to an automorphism of a Hirzebruch surface.
Proof. Assume that the triple (< g >, S, π) is minimal. If the fibration is smooth,
we are done. Otherwise, a birational morphism conjugates g to an automorphism
κα,β ∈ Aut(S6) of a conic bundle on the del Pezzo surface of degree 6 (Lemma 6.8).
We conclude by using Lemma 4.2.
7 The example Cs24
We now give the most important example of this paper. This is the only finite
Abelian subgroup of the Cremona group which is not conjugate to a group of
automorphisms of P2 or P1 × P1 but whose non-trivial elements do not fix any
curve of positive genus (Theorem 2).
Let S6 ⊂ P
2 × P2 be the del Pezzo surface of degree 6 (see Section 4) defined
S6 = {
(x : y : z), (u : v : w)
| ux = yv = zw};
we keep the notation of Section 4. We denote by η : Ŝ4 → S6 the blow-up of
A4, A5 ∈ S6 defined by
(0 : 1 : 1) , (1 : 0 : 0)
∈ D23,
(1 : 0 : 0) , (0 : 1 : −1)
∈ E1.
We again denote by E1, E2, E3, D12, D13, D23 the total pull-backs by η of these
divisors of S6. We denote by Ẽ1 and D̃23 the strict pull-backs of E1 and D23 by
η. (Note that for the other exceptional divisors, the strict and total pull-backs are
the same.) Let us illustrate the situations on the surfaces S6 and Ŝ4 respectively:
E2 D15 E4 E3
D12 E5 D14 D13
Let π1 denote the morphism S6 → P
1 defined in Section 4. The morphism
π = π1 ◦ η gives the surface Ŝ4 a conic bundle structure (Ŝ4, π). It has 4 singular
fibres, which are the fibres of (−1 : 1), (0 : 1), (1 : 1) and (1 : 0). We denote by f
the divisor of Ŝ4 corresponding to a fibre of π and set E4 = η
−1(A4), E5 = η
−1(A5).
Note that E4 is one of the components of the singular fibre of (1 : 1); we denote by
D14 = f−E4 the other component, which is the strict pull-back by η of π
1 (1 : 1).
Similarly, we denote by D15 the divisor f−E5, so that the singular fibre of (−1 : 1)
is {E5, D15}.
Lemma 7.1. On the surface Ŝ4 there are exactly 10 irreducible rational smooth
curves of negative self-intersection. Explicitly, the 8 curves
E2, E3, E4, E5, D12, D13, D14, D15
have self-intersection −1 and the two curves
Ẽ1 = E1 − E5 and D̃23 = D23 − E4
have self-intersection −2.
Proof. The difficult part is to show that every rational irreducible smooth curve
of negative self-intersection is one of the ten given above. Let C be such a curve.
Denote by L the pull-back of a general line of P2 by the blow-up pr1 ◦ η : Ŝ4 →
P2 of the five points. If C is collapsed by pr1 ◦ η, then C is one of the curves
Ẽ1, E2, E3, E4, E5. Otherwise, C = mL −
i=1 aiEi, where m, a1, ..., a5 are non-
negative integers, and m > 0. Since C is rational we have C · (C + K
) = −2,
and by hypothesis C2 = −r for some positive integer r. The relations C2 = −r
and C ·K
= r − 2 imply (since K
= −3L+
i=1 Ei) the equations
i=1 a
i = m
2 + r,
i=1 ai = 3m+ r − 2.
If m = r = 1, we find that C is the pull-back of a line passing through two of the
points, so C = D1i for some i ∈ {2, ..., 5}. If m = 2 and r = 1, C is the pull-back
of a conic passing through each blown-up point. The configuration of the points
eliminates this possibility. If m = 1 and r = 2, we obtain a line passing through
three blown-up points, so C = D̃23.
We now prove that if there is no integral solution to (2) for m, r ≥ 2. Since
i=1 ai)
2 ≤ 5(
i=1 a
i ) (by the Cauchy-Schwarz inequality with the vectors
(1, ..., 1) and (a1, ..., a5)), we obtain (3m+ (r − 2))
2 ≤ 5(m2 + r), and this gives
4m2 − 10 + (r − 2) · (6m+ r − 7) ≤ 0.
But this is not possible if m, r ≥ 2, since in this case 4m2 > 10 and 6m+r > 7.
Note that (K
)2 = 4, which is why we denote this surface by Ŝ4; the hat is
here because the surface is not a del Pezzo surface, since it contains irreducible
divisors of self-intersection −2.
Corollary 7.2. There is only one conic bundle structure on Ŝ4, which is the one
induced by π = π1 ◦ η.
Proof. Since (K
)2 = 4, the number of singular fibres of any conic bundle is 4,
and thus it consists of eight (−1)-curves C1, ..., C8. The divisor of a fibre of the
conic bundle is thus equal to 1
i=1 Ci. Since there are exactly eight (−1)-curves
on Ŝ4, there is no choice.
The group of automorphisms of Ŝ4 that leave every curve of negative self-
intersection invariant is isomorphic to C∗ and corresponds to automorphisms of P2
of the form (x : y : z) 7→ (αx : y : z), for α ∈ C∗. Indeed, such automorphisms are
the lifts of automorphisms of S6 leaving invariant every exceptional curve (which
are of the form
(x : y : z), (u : v : w)
(x : αy : βz), (u : α−1v : β−1w)
, for
α, β ∈ C∗) and which fix both points A4 and A5.
Definition 7.3. Let h1 and h2 be the following birational transformations of P
h1 : (x : y : z) 99K (yz : xy : −xz)
h2 : (x : y : z) 99K (yz(y − z) : xz(y + z) : xy(y + z))
and denote respectively by g1, g2 the lift of these elements on Ŝ4 and by Cs24 the
group generated by g1 and g2.
The following lemma shows that Cs24 ⊂ Aut(Ŝ4, π) and describes some of the
properties of this group.
Lemma 7.4. Let h1, h2, g1, g2,Cs24 be as in Definition 7.3. Then:
1. The group Cs24 is a group of automorphisms of Ŝ4 that preserve the conic
bundle (Ŝ4, π), i.e. Cs24 ⊂ Aut(Ŝ4, π).
2. The action of g1 and g2 on the set of irreducible rational curves of negative
self-intersection is respectively:
(Ẽ1 D̃23)(E2 D12)(E3 D13)(E4 E5)(D14 D15),
(Ẽ1 D̃23)(E2 D13)(E3 D12)(E4 D14)(E5 D15).
In particular, both g1 and g2 twist the conic bundle (Ŝ4, π).
3. Both g1 and g2 are elements of order 4 and
2 = (h2)
2 = (x : y : z) 7→ (−x : y : z).
Thus (g1)
2 = (g2)
2 ∈ kerπ is an automorphism of Ŝ4 which leaves every
divisor of negative self-intersection invariant.
4. The group Cs24 is isomorphic to Z/2Z × Z/4Z and the action on the basis
of the fibration π yields the exact sequence
1 →< (h1)
2 >∼= Z/2Z → Cs24
→< π(h1), π(h2) >∼= (Z/2Z)
2 → 1.
5. The group Cs24 contains no involution that twists the conic bundle (Ŝ4, π).
In particular, no element of Cs24 fixes a curve of positive genus.
6. The pair (Cs24, Ŝ4) and the triple (Cs24, Ŝ4, π) are both minimal.
Proof. Observe first that h1 and h2 preserve the pencil of lines of P
2 passing
through the point A1 = (1 : 0 : 0), so g1, g2 are birational transformations of Ŝ4
that send a general fibre of π on another fibre. Then, we compute (h1)
2 = (h2)
(x : y : z) 7→ (−x : y : z). This implies that both h1 and h2 are birational maps of
order 4.
Note that the lift of h1 on the surface S6 is the automorphism
κ1,−1 :
(x : y : z), (u : v : w)
(u : w : −v), (x : z : −y)
(see Example 4.5). Since this automorphism permutes A4 and A5, its lift on Ŝ4
is biregular. The action on the divisors with negative self-intersection is deduced
from that of κ1,−1 (see Lemma 4.6).
Compute the involution
h3 = h1h2 = (x : y : z) 99K (x(y + z) : z(y − z) : −y(y − z)).
Its linear system is
{ax(y + z) + (by + cz)(y − z) = 0 | (a : b : c) ∈ P2},
which is the linear sytem of conics passing through (0 : 1 : 1) and A1 = (1 : 0 : 0),
with tangent y + z = 0 at this point (i.e. passing through A5). Blowing-up these
three points (two on P2 and one in the blow-up of A1), we get an automorphism
g3 of some rational surface. As the points A2 = (0 : 1 : 0) and A3 = (0 : 0 : 1)
are permuted by h3, we can also blow them up and again get an automorphism.
The isomorphism class of the surface obtained is independent of the order of the
blown-up points. We may first blow-up A1, A2, A3 and get S6. Then, we blow-up
the two other base-points of h3, which are in fact A4 (the point (0 : 1 : −1)) and
A5 (the point infinitely near to A1 corresponding to the tangent y + z = 0). This
shows that g3, and therefore g2, belong to Aut(Ŝ4, π).
Since h3 permutes the points A2 and A3, g3 = g1g2 permutes the divisors E2
and E3. It also permutes D12 and D13, since h3 leaves the pencil of lines passing
through A1 invariant. It therefore leaves Ẽ1 and D̃23 invariant, since E2 and E3
touch D̃23 but not E1. The remaining exceptional divisors are E4, E5, D14, D15.
Either g1g2 leaves all four invariant, or it acts as (E4 D15)(E5 D14) (using the
intersection with Ẽ1 and D̃23). Since A4 and A5 are base-points of h1h2, E4 and
E5 are not invariant. Thus, g1g2 acts on the irreducible rational curves of negative
self-intersection as (E2 E3)(D12 D13)(E4 D15)(E5 D14). We obtain the action of
g2 by composing that of g1g2 with that of g1 and thus have proved assertions 1
through 3.
Assertion 4 follows from assertion 3 and the fact that g1 and g2 commute.
Let us prove that Cs24 contains no involution that twists the conic bundle
(Ŝ4, π). Recall that such elements are involutions acting trivially on the basis
of the fibration (see Lemma 6.1). Note that the 2-torsion of Cs24 is equal to
{1, g21, g1g2, g1g
2 }. The elements g1g2 and g1g
2 do not act trivially on the basis
of the fibration, and the element (g1)
2 does not twist any singular fibre since it
leaves every curve of negative self-intersection invariant. This proves assertion 5.
It remains to prove the last assertion. Observe that the orbits of the action of
Cs24 on the exceptional divisors of Ŝ4 are {E2, E3, D12, D13} and {E4, E5, D14, D15}.
Since these orbits cannot be contracted, the pair (Cs24, Ŝ4) is minimal, and so is
the triple (Cs24, Ŝ4, π).
Remark 7.5. The pair (Cs24, Ŝ4) was introduced in [Bla2] and was called Cs.24
because it is a group acting on a conic bundle, which is special, and isomorphic
to Z/2Z× Z/4Z.
8 Finite Abelian groups of automorphisms of conic
bundles - birational representative elements
In this section we use the tools prepared in the previous sections to describe the
finite Abelian groups of automorphisms of conic bundles such that no non-trivial
element fixes a curve of positive genus.
We first treat the case in which no involution twisting the conic bundle belongs
to the group:
Proposition 8.1. Let G ⊂ Aut(S, π) be a finite Abelian group of automorphisms
of the conic bundle (S, π) such that:
• no involution that twists the conic bundle (S, π) belongs to G;
• the triple (G,S, π) is minimal.
Then, one of the following occurs:
• The fibration is smooth, i.e. S is a Hirzebruch surface.
• S is the del Pezzo surface of degree 6.
• The triple (G,S, π) is isomorphic to the triple (Cs24, Ŝ4, π) of Section 7.
Proof. We assume that the fibration is not smooth. Recall that since the triple
(G,S, π) is minimal, any singular fibre of π is twisted by an element of G (by
Lemma 3.8). Since no twisting involution belongs to G, any element g ∈ G that
twists a fibre corresponds to case 2 of Proposition 6.5. In particular, g is the lift
on S of an automorphism of the form κα,β of the del Pezzo surface of degree 6 and
it twists 2 singular fibres, which correspond to the fibres of the two fixed points
of π(g) ∈ PGL(2,C). Furthermore, g is the root of an involution that leaves every
component of every singular fibre of π invariant.
If the number of singular fibres is exactly two, then S is the del Pezzo surface
of degree 6, and we are done.
Now suppose that the number of singular fibres is larger than two. This implies
that π(G) is not a cyclic group (otherwise the non-trivial elements of π(G) would
have the same two fixed points: there would then be at most two singular fibres);
therefore, π(G) is isomorphic to (Z/2Z)2. By a judicious choice of coordinates we
may suppose that
π(G) =
Since a singular fibre corresponds to a fixed point of one of the three elements
of order 2 of π(G), only the fibres of (0 : 1), (1 : 0), (1 : 1), (−1 : 1), (i : 1), (−i : 1)
can be singular. Since the group π(G) acts transitively on the sets {(1 : 0), (0 : 1)},
{(1 : ±1)} and {(1 : ±i)}, there are 4 or 6 singular fibres.
We denote by g1 an element of G which twists the two singular fibres of (1 : 0)
and (0 : 1). Let η : S → S6 denote the birational g1-equivariant morphism given
by Proposition 6.5, which conjugates g1 to the automorphism
−1 = κα,β :
(x : y : z), (u : v : w)
(u : αw : βv), (x : α−1z : β−1y)
of the del Pezzo surface S6 of degree 6, for some α, β ∈ C
∗. In fact, since π(g1)
has order 2, we have β = −α, so ηg1η
−1 = κα,−α. The points blown-up by η are
fixed by
η(g1)
2η−1 = (κα,−α)
(x : y : z), (u : v : w)
(x : −y : −z), (u : −v : −w)
and therefore belong to the curves
E1 = {
(1 : 0 : 0), (0 : a : b)
| (a : b) ∈ P1}
and D23 = {
(0 : a : b), (1 : 0 : 0)
| (a : b) ∈ P1}.
Since these points consist of orbits of ηg1η
−1, half of them lie in E1 and the
other half in D23. In fact, up to a change of coordinates,
(x, y, z), (u, v, w)
(u, v, w), (x, y, z)
, the points that may be blown-up by η are
(0 : 1 : 1) , (1 : 0 : 0)
∈ D23,
κα,−α(A4) = A5 =
(1 : 0 : 0) , (0 : 1 : −1)
∈ E1,
(0 : 1 : i) , (1 : 0 : 0)
∈ D23,
κα,−α(A6) = A7 =
(1 : 0 : 0) , (0 : 1 : i)
∈ E1.
The strict pull-backs Ẽ1 and D̃23 by η of E1 and D23 respectively thus have self-
intersection −2 or −3 in S, depending on the number of points blown-up. By
convention we again denote by E1, E2, E3, D12, D13, D23 the total pull-backs by
η of these divisors. (Note that for E2, E3, D12, D13, the strict and the total pull-
backs are the same.) We set E4 = η
−1(A4),..., E7 = η
−1(A7) and denote by f the
divisor class of the fibre of the conic bundle.
(a) Suppose that η is the blow-up of A4 and A5, which implies that S is the
surface Ŝ4 of Section 7. The Picard group of S is then generated by E1, E2, ..., E5
and f .
Since we assumed that (G,S, π) is minimal, the singular fibres of (1 : 1) and
(−1 : 1) must be twisted. One element g2 twists these two singular fibres and
acts with order 2 on the basis of the fibration, with action (x1 : x2) 7→ (x2 : x1).
Since g1 and g2 twist some singular fibre, both must invert the two curves of self-
intersection −2, namely Ẽ1 and D̃23. The action of g1 and g2 on the irreducible
rational curves of negative self-intersection is then respectively
(Ẽ1 D̃23)(E2 D12)(E3 D13)(E4 E5)(D14 D15),
(Ẽ1 D̃23)(E2 D13)(E3 D12)(E4 D14)(E5 D15).
The elements g1 and g2 thus have the same action on Pic(S) = Pic(Ŝ4) as the
two automorphisms with the same name in Definition 7.3 and Lemma 7.4, which
generate Cs24. Note that the group H of automorphisms of S that leave every
curve of negative self-intersection invariant is isomorphic to C∗ and corresponds
to automorphisms of P2 of the form (x : y : z) 7→ (αx : y : z), for any α ∈ C∗.
Then, g1 and g2 are equal to the lift of the the following birational maps of P
h1 : (x : y : z) 99K (µyz : xy : −xz),
h2 : (x : y : z) 99K (νyz(y − z) : xz(y + z) : xy(y + z)),
for some µ, ν ∈ C∗.
As h1h2(x : y : z) = (µx(y + z) : νz(y − z) : −νy(y − z)) and h2h1(x : y :
z) = (νx(y + z) : µz(y − z) : −µy(y − z)) must be the same by hypothesis, we get
µ2 = ν2.
We observe that π(g1) and π(g2) generate π(G) ∼= (Z/2Z)
2; on the other hand,
by hypothesis an element of G′ does not twist a singular fibre and hence belongs
to H . As the only elements of H which commute with g1 are id and (g1)
2 (which
is the lift of (h1)
2 : (x : y : z) 7→ (−x : y : z)), we see that g1 and g2 generate the
whole group G.
Conjugating h1 and h2 by (x : y : z) 7→ (αx : y : z), where α ∈ C
∗, α2 = µ, we
may suppose that µ = 1. So ν = ±1 and we get in both cases the same group,
because (h1)
2(x : y : z) = (−x : y : z). The triple (G,S, π) is hence isomorphic to
the triple (Cs24, Ŝ4, π) of Section 7.
(b) Suppose that η is the blow-up of A6 and A7. We get a case isomorphic
to the previous one, using the automorphism
(x : y : z), (u : v : w)
(x : y :
iz), (u : v : −iw)
of S6.
(c) Suppose that η is the blow-up of A4, A5, A6 and A7. The Picard group of
S is then generated by E1, E2, ..., E6, E7 and f . Since (G,S, π) is minimal, there
must be two elements g2, g3 ∈ G that twist respectively the fibres of (±1 : 1) and
those of (±i : 1). As in the previous example, the three actions of these elements
on the basis are of order 2, and the three elements transpose Ẽ1 and D̃23. The
actions of g1, g2 and g3 on the set of irreducible components of the singular fibres
of π are then respectively
(E2 D12)(E3 D13)(E4 E5)(D14 D15)(E6 E7)(D16 D17),
(E2 D13)(E3 D12)(E4 D14)(E5 D15)(E6 E7)(D16 D17),
(E2 D13)(E3 D12)(E4 E5)(D14 D15)(E6 D16)(E7 D17).
This implies that the action of the element g1g2g3 is
(E2 D12)(E3 D13)(E4 D14)(E5 D15)(E6 D17)(E7 D16),
and thus it twists six singular fibres of the conic bundle and fixes a curve of genus 2
(Lemma 6.1), which contradicts the hypothesis. (In fact, one can also show that
the group generated by g1, g2 and g3 is not Abelian, see [Bla2], page 66.)
After studying the groups that do not contain a twisting involution, we now
study those which contain such elements. Since these twisting involutions cannot
fix a curve of positive genus, they twist exactly two fibres (Lemma 6.1).
Proposition 8.2. Let G ⊂ Aut(S, π) be a finite Abelian group of automorphisms
of a conic bundle (S, π) such that:
1. If g ∈ G, g 6= 1, then g does not fix a curve of positive genus.
2. The group G contains at least one involution that twists the conic bundle
(S, π).
3. The triple (G,S, π) is minimal.
Then, S is a del Pezzo surface of degree 5 or 6.
Proof. If the number of singular fibres is at most 3, then the surface is a del Pezzo
surface of degree 5 or 6 (Lemma 3.12).
We now assume that the number of singular fibres is at least 4 and show that
this situation is not compatible with the hypotheses. We recall once again the
exact sequence of Remark 3.13
1 → G′ → G
→ π(G) → 1, (1)
and prove the following important assertions:
(a) No element of G twists more than two singular fibres.
(b) Any twisting involution that belongs to G belongs to G′ and twists exactly
two singular fibres.
(c) Any singular fibre is twisted by an element of G.
(d) No non-trivial element preserves every component of every singular fibre.
(e) Any twisting element of G is a root of (or equal to) a twisting involution
that belongs to G′.
Corollary 6.4 shows that an element that twists more than two fibres fixes a
curve of positive genus; since this possibility is excluded by hypothesis, we obtain
assertion (a). Lemma 6.1 shows that any twisting involution contained in G be-
longs to G′ and twists an even number of fibres; using assertion (a), we thus obtain
assertion (b). Assertion (c) follows from the minimality of the triple (G,S, π) (see
Lemma 3.8). Let us prove assertion (d). Suppose that there exists a non-trivial
element g ∈ G that leaves every component of every singular fibre invariant, and
denote by h ∈ G′ a twisting involution (which exists by hypothesis). Since g and h
commute, Lemma 6.7 shows that each singular fibre invariant by h – there are
at least 4 – is twisted by h, which contradicts assertion (a). Therefore, such an
element g doesn’t exist and assertion (d) is proved. Finally, Proposition 6.5 shows
that any twisting element that does not act trivially on the basis of the fibration
is a root of an involution that belongs to G′, and assertion (d) shows that this
involution is twisting, and we obtain assertion (e).
Now that assertions (a) through (e) are proved, we deduce the proposition from
them. Let us denote by σ ∈ G′ a twisting involution, which twists two singular
fibres that we denote by F1 and F2. There are at least two other singular fibres
F3 and F4 that are twisted by other elements of G.
If G′ =< σ >, the fibres F3 and F4 are twisted by roots of σ belonging to G
(assertions (c) and (e)). The description of these elements (Proposition 6.5, and
in particular Corollary 6.6) shows that the roots must be square roots that twist
exactly one singular fibre and permute the two fibres F1 and F2 twisted by σ.
There thus exist two elements h3, h4 ∈ G that twist respectively the fibres F3 and
F4. Since h3 commutes with h4, it must leave invariant the unique fibre twisted
by h4, i.e. F4. Similarly, h4 must leave F3 invariant. Therefore, h3h4 leaves the
four fibres F1,...,F4 invariant and twists the two fibres F3 and F4; it is thus an
involution that belongs to G′, which contradicts the fact that G′ =< σ >.
If G′ 6=< σ >, since σ has no root in G′ (Corollary 6.2), the Abelian groupG′ ⊂
PGL(2,C(x)) is isomorphic to (Z/2Z)2 and contains (using (d)) three twisting
involutions σ, ρ and σρ. Note that two of these three involutions do not twist
singular fibres which are all distinct, otherwise the product of the two involutions
would give an involution that twists 4 singular fibres, contradicting (a). We may
thus suppose that ρ twists F1 and F3, which implies that σρ twists F2 and F3.
The fibre F4 is then twisted by an element which is a square root of one of the
three twisting involutions (assertion (e) and Corollary 6.6). Denote this square
root by h and suppose that h2 6= σ. Note that h exchanges the two singular fibres
twisted by h2. One of these is twisted by σ and the other is not, so h and σ do
not commute.
The only remaining possible finite Abelian groups of automorphisms of conic
bundles satisfying property (F ) are thus del Pezzo surfaces of degree 6 or 5 (studied
in Sections 4 and 5), the triple (Cs24, Ŝ4, π) studied in Section 7, and Hirzebruch
surfaces. We now describe this last case and prove that it is birationally reduced
to the case of P1 × P1.
Proposition 8.3. Let G ⊂ Aut(Fn) be a finite Abelian subgroup of automorphisms
of Fn, for some integer n ≥ 1. Then, a birational map of conic bundles conjugates
G to a finite group of automorphisms of F0 = P
1 × P1 that leaves one ruling
invariant.
Proof. Let G ⊂ Aut(Fn) be a finite Abelian group, with n ≥ 1. Note that G
preserves the unique ruling of Fn. We denote by E ⊂ Fn the unique section
of self-intersection −n, which is necessarily invariant by G. We have the exact
sequence (see Remark 3.13)
1 → G′ → G
→ π(G) → 1. (1)
Since the group π(G) ⊂ PGL(2,C) is Abelian, it is isomorphic to a cyclic group
or to (Z/2Z)2.
If π(G) is a cyclic group, at least two fibres are invariant by G. The group
G fixes two points in one such fibre. We can blow-up the point that does not lie
on E and blow-down the corresponding fibre to get a group of automorphisms of
Fn−1. We do this n times and finally obtain a birational map of conic bundles
that conjugates G to a group of automorphisms of F0 = P
1 × P1.
If π(G) is isomorphic to (Z/2Z)2, there exist two fibres F, F ′ of π whose union
is invariant by G. Let GF ⊂ G be the subgroup of G of elements that leave F
invariant. This group is of index 2 in G and hence is normal. Since GF fixes the
point F ∩ E in F , it acts cyclically on F . There exists another point P ∈ F ,
P /∈ E, which is fixed by GF . The orbit of P by G consists of two points, P and
P ′, such that P ′ ∈ F ′, P ′ /∈ E. We blow-up these two points and blow-down the
strict transforms of F and F ′ to get a group of automorphisms of Fn−2. We do
this ⌊n/2⌋ times to obtain G as a group of automorphisms of F0 or F1.
If n is even, we get in this manner a group of automorphisms of F0 = P
1 × P1.
Note that n cannot be odd, if the group π(G) is not cyclic. Otherwise, we
could conjugate G to a group of automorphisms of F1 and then to a group of
automorphisms of P2 by blowing-down the exceptional section on a point Q ∈
P2. We would get an Abelian subgroup of PGL(3,C) that fixes Q, and thus a
group with at least three fixed points. In this case, the action on the set of lines
passing through Q would be cyclic (see Proposition 2.2), which contradicts our
hypothesis.
We can now prove the main result of this section:
Proposition 8.4. Let G ⊂ Aut(S, π) be some finite Abelian group of automor-
phisms of the conic bundle (S, π) such that the triple (G,S, π) is minimal and no
non-trivial element of G fixes a curve of positive genus. Then, one of the following
situations occurs:
1. S is a Hirzebruch surface Fn;
2. S is a del Pezzo surface of degree 5 or 6;
3. The triple (G,S, π) is isomorphic to the triple (Cs24, Ŝ4, π) of Section 7.
If we suppose that the pair (G,S) is minimal, then we are in case 1 with n 6= 1 or
in case 3. Moreover, cases 1 and 2 are birationally conjugate to automorphisms of
P1 × P1 whereas the third is not.
Proof. The fact that one of the three cases occurs follows directly from Proposi-
tions 8.1 and 8.2.
Case 1 is clearly minimal if and only if n 6= 1 and Proposition 8.3 shows that
it is conjugate to automorphisms of P1 × P1. In the case of del Pezzo surfaces
of degree 5 and 6, the pair (G,S) is not minimal and the group is respectively
birationally conjugate to a subgroup of Sym4 ⊂ Aut(P
2) (Lemma 5.3) or Aut(P1×
P1) (Lemma 4.2). If the first situation occurs, since the group is Abelian and
not isomorphic to (Z/3Z)2 it is diagonalisable and conjugate to a subgroup of
Aut(P1 × P1) (Proposition 2.2). Thus, we are done with case 2.
It remains to show that the pair (Cs24, Ŝ4) is not birationally conjugate to a
group of automorphisms of P1 × P1. Let us suppose the contrary, i.e. that there
exists some Cs24-equivariant birational map ϕ : Ŝ4 99K P
1 × P1 (that conjugates
Cs24 to a group of automorphisms). Then, ϕ is the composition of Cs24-equivariant
elementary links (see for example [Isk3, Theorem 2.5], or [Do-Iz, Theorem 7.7]).
Since our group preserves the conic bundle, the first link is of type II, III or IV
(in the classical notation of Mori theory). We now study these possibilities and
show that it is not possible to go to P1 × P1.
Link of type II - In our case, this link is a birational map of conic bundles,
which is the composition of the blow-up of an orbit of Cs24, no two points on the
same fibre, with the blow-down of the strict transforms of the fibres of the points
blown-up. The points must be fixed by the elements of Cs24 that act trivially on
the basis of the fibration, and thus an orbit has 4 points, two on Ẽ1 and two on
D̃23. This link conjugates the triple (Cs24, Ŝ4, π) to a triple isomorphic to it, by
Proposition 8.1.
Link of type III - It is the contraction of some set of skew exceptional curves, in-
variant by Cs24. This is impossible since the pair (Cs24, Ŝ4) is minimal (Lemma 7.4).
Link of type IV - It is a change of the fibration. This is not possible since the
surface Ŝ4 admits only one conic bundle fibration (Corollary 7.2).
9 Actions on del Pezzo surfaces with fixed part
of the Picard group of rank one
In this section we prove the following result (note that finiteness is not required
and that minimality of the action is implied by the condition on Pic(S)G).
Proposition 9.1. Let S be a del Pezzo surface, and let G ⊂ Aut(S) be an Abelian
group such that rk Pic(S)G = 1 and no non-trivial element of G fixes a curve of
positive genus. Then, one of the following occurs:
1. S ∼= P2 or S ∼= P1 × P1;
2. S is a del Pezzo surface of degree 5 and G ∼= Z/5Z;
3. S is a del Pezzo surface of degree 6 and G ∼= Z/6Z.
Furthermore, in cases 2 and 3, the group G is birationally conjugate to a diagonal
cyclic subgroup of Aut(P2).
This will be proved separately for each degree, in Lemmas 9.7, 9.8, 9.13, 9.15,
9.16 and 9.17.
Remark 9.2. A del Pezzo surface S is either P1 × P1 or the blow-up of 0 ≤ r ≤
8 points in general position on P2 (i.e. such that no irreducible curve of self-
intersection ≤ −2 appears on S). The group Pic(S) has dimension r + 1, and its
intersection form gives a decomposition Pic(S) ⊗ Q = QKS ⊕K
S ; the signature
is (1,−1, ...,−1).
The group Aut(S) of automorphisms of a del Pezzo surface S acts on Pic(S)
and preserves the intersection form. This gives an homomorphism of Aut(S) →
Aut(Pic(S)) which is injective if and only if r > 3, since the kernel is the lift of
automorphisms of P2 that fix the r blown-up points. Furthermore, the image is
contained in the Weyl group and is finite (see [Dol]). In particular, the group
Aut(S) is finite if and only if r > 3.
When we have some group action on a del Pezzo surface, we would like to
determine the rank of the fixed part of the Picard group. Here are some tools to
this end.
Lemma 9.3 (Size of the orbits). Let S be a del Pezzo surface, which is the blow-up
of 1 ≤ r ≤ 8 points of P2 in general position, and let G ⊂ Aut(S) be a subgroup
of automorphisms with rk Pic(S)G = 1. Then:
• G 6= {1};
• the size of any orbit of the action of G on the set of exceptional divisors is
divisible by the degree of S, which is 9− r;
• in particular, if the order of G is finite, it is divisible by the degree of S.
Proof. It is clear that G 6= {1}, since rk Pic(S) > 1. Let D1, D2, ..., Dk be k
exceptional divisors of S, forming an orbit of G (the orbit is finite, see Remark 9.2).
The divisor
i=1 Di is fixed by G and thus is a multiple of KS . We can write∑k
i=1 Di = aKS, for some a ∈ Q. In fact, since aKS is effective, we have a < 0 and
a ∈ Z. Since the Di’s are irreducible and rational, we deduce from the adjunction
formula Di(KS +Di) = −2 that Di ·KS = −1. Hence
i=1 Di =
i=1 KS ·Di = −k = KS · aKS = a(9− r).
Consequently, the degree 9− r divides the size k of the orbit.
Remark 9.4. This lemma shows in particular that rk Pic(S)G > 1 if S is the blow-
up of r = 1, 2 points of P2, a result which is obvious when r = 1, and is clear
when r = 2, since the line joining the two blown-up points is invariant by any
automorphism.
Lemma 9.5. Let S be some (smooth projective rational) surface, and let g ∈
Aut(S) be some automorphism of finite order. Then, the trace of g acting on
Pic(S) is equal to χ(Fix(g)) − 2, where Fix(g) ⊂ S is the set of fixed points of g
and χ is the Euler characteristic.
Proof. This follows from the topological Lefschetz fixed-point formula, which as-
serts that the trace of g acting on H∗(S,Z) is equal to χ(Fix(g)) (this uses the fact
that g is an homeomorphism of finite order). Since S is a complex rational surface,
H0(S,Z) and H4(S,Z) have dimension 1, H2(S,Z) ∼= Pic(S), and Hi(S,Z) = 0
for i 6= 0, 2, 4. Since the trace on H2 and H4 is 1, we obtain the result.
Remark 9.6. This lemma is false if the order of g is infinite. Take for example the
automorphism (x : y : z) 7→ (λx : y : z + y) of P2, for any λ ∈ C∗, λ 6= 1. It fixes
exactly two points, namely (1 : 0 : 0) and (0 : 0 : 1), but its trace on Pic(P2) = Z
is 1.
We now start the proof of Proposition 9.1 by studying the cases of del Pezzo
surfaces of degree 6 or 5.
Lemma 9.7 (Actions on the del Pezzo surface of degree 6). Let S6 = {
(x : y :
z), (u : v : w)
| ux = vy = wz} ⊂ P2 × P2 be the del Pezzo surface of degree 6
and let G ⊂ Aut(S6) be an Abelian group such that rk Pic(S6)
G = 1. Then, G is
conjugate in Aut(S6) to the cyclic group of order 6 generated by
(x : y : z), (u :
v : w)
(v : w : u), (y : z : x)
. Furthermore, G is birationally conjugate to a
diagonal subgroup of Aut(P2).
Proof. Lemma 9.3 implies that the sizes of the orbits of the action of G on the ex-
ceptional divisors are divisible by 6. The action of G on the hexagon of exceptional
divisors is thus transitive, so G contains an element of the form
(x : y : z), (u : v : w)
(αv : βw : u), (βy : αz : αβx)
where α, β ∈ C∗. As the only element of (C∗)2 that commutes with g is the
identity (see the description of Aut(S6) = (C
∗)2 ⋊ (Sym3 × Z/2Z) in Section 4),
G must be cyclic, generated by g. Conjugating it by
(x : y : z), (u : v : w)
(βx : y : αz), (αu : αβv : βw)
we may assume that α = β = 1, as stated in the lemma (this shows in particular
thatG is of finite order). It remains to prove that this automorphism is birationally
conjugate to a linear automorphism of the plane.
Denote by p : S → P2 the restriction of the projection on the first factor.
This is a birational morphism which is the blow-up of the three diagonal points
A1, A2, A3 of P
2. Consider the birational map ĝ = pgp−1 of P2, which is explicitly
ĝ : (x : y : z) 99K (xz : xy : yz). Since g is an automorphism of the surface, it
fixes the canonical divisor KS , so the birational map ĝ leaves the linear system
of cubics of P2 passing through A1, A2 and A3 invariant (this can also be verified
directly).
Note that ĝ fixes exactly one point of P2, namely P = (1 : 1 : 1), and that its
action on the projective tangent space P(TP (P
2)) of P2 at P is of order 3, with two
fixed points, corresponding to the lines (x− y) + ωk(z − y) = 0, where ω = e2iπ/3,
k = 1, 2. Hence, the birational map ĝ preserves the linear system of cubics of P2
passing through A1, A2 and A3, which have a double point at P and are tangent
to the line (x− y) + ω(z − y) = 0 at this point. This linear system thus induces a
birational transformation of P2 that conjugates ĝ to a linear automorphism.
Lemma 9.8 (Actions on the del Pezzo surface of degree 5). Let S5 be the del
Pezzo surface of degree 5 and let G ∈ Aut(S5) = Sym5 be an Abelian group such
that rk Pic(S5)
G = 1. Then, G is cyclic of order 5. Furthermore, G is birationally
conjugate to a diagonal subgroup of Aut(P2).
Proof. We use the description of the surface S5 and its automorphisms group
Aut(S5) = Sym5 given in Section 5. Lemma 9.3 implies that the order of G is
divisible by 5, and thus that G is a cyclic subgroup of Sym5 of order 5. Since
all such subgroups are conjugate in Aut(S5) = Sym5, we may suppose that G is
generated by the lift of the birational transformation h : (x : y : z) 99K (xy :
y(x− z) : x(y− z)) of P2, that fixes two points of P2, namely (ζ +1 : ζ : 1), where
ζ2 − ζ − 1 = 0. Denoting one of them by P , the linear system of cubics passing
through the four blown-up points and having a double point at P is invariant
by h. The birational transformation associated to this system thus conjugates h
to a linear automorphism of P2.
Remark 9.9. The fact that (x : y : z) 99K (xy : y(x− z) : x(y − z)) is linearisable
was proved in [Be-Bl], using the same argument as above.
Corollary 9.10. Let S be a rational surface with (KS)
2 ≥ 5 and let G ⊂ Aut(S)
be a finite Abelian group. Then G is birationally conjugate to a subgroup of Aut(P2)
or Aut(P1 × P1).
Proof. We may assume that the pair (G,S) is minimal; consequently there are two
possibilities (see [Man], [Isk2] or [Do-Iz]):
1. S is a del Pezzo surface and rk Pic(S)G = 1. Then S is either P2, P1×P1 or
a del Pezzo surface of degree 6 or 5 (Remark 9.4); we apply Lemmas 9.7 and 9.8
to conclude.
2. G preserves a conic bundle structure on S. Here the number of fibres is at
most 3, hence no element of G fixes a curve of positive genus (Corollary 6.4); we
apply Proposition 8.4 to conclude.
To study del Pezzo surfaces of degree 4, let us describe their group of auto-
morphisms (note that we do not use the notation Sd for the del Pezzo surfaces of
degree d ≤ 4, because there are many different surfaces of the same degree):
Lemma 9.11 (Automorphism group of del Pezzo surfaces of degree 4). Let S
be a del Pezzo surface of degree 4 given by the blow-up η : S → P2 of five points
A1, ..., A5 ∈ P
2 such that no three are collinear. Setting Ei = η
−1(Ai) and denoting
by L the pull-back by η of a general line of P2, we have:
1. There are exactly 10 conic bundle structures on S, whose fibres are respec-
tively L− Ei, −KS − (L− Ei), for i = 1, ..., 5.
2. The action of Aut(S) on the five pairs of divisors {L−Ei,−KS− (L−Ei)},
i = 1, ..., 5 gives rise to a split exact sequence
0 → F → Aut(S)
→ Sym5,
where F = {(a1, ..., a5) ∈ (F2)
ai = 0} ∼= (F2)
4, and the automorphism
(a1, ..., a5) permutes the pair {L−Ei,−KS − (L−Ei)} if and only if ai = 1.
3. We have
Aut(S) = F⋊Aut(S, η),
where Aut(S, η) is the lift of the group of automorphisms of P2 that leave
the set {A1, ..., A5} invariant, and Aut(S, η) acts on F = {(a1, ..., a5) ∈
ai = 0} by permutation of the ai’s, as it acts on {A1, ..., A5}, and
as ρ(Aut(S)) = ρ(Aut(S, η)) ⊂ Sym5 acts on the exceptional pairs.
4. The elements of F with two ”ones” correspond to quadratic involutions of
P2 and fix exactly 4 points of S.
5. The elements of F with four ”ones” correspond to cubic involutions of P2
and the points of S fixed by these elements form a smooth elliptic curve.
Remark 9.12. The group F ⊂ Aut(S) has been studied intensively since 1895 (see
[Kan], Theorem XXXIII). A modern description of the group as the 2-torsion of
PGL(5,C) can be found in [Bea2, (4.1)], together with a study of the conjugacy
classes of such groups in the Cremona group. For further descriptions of the auto-
morphism groups of these surfaces, see [Do-Iz, section 6.4] and [Bla2, section 8.1].
Proof. Let A = mL −
i=1 aiEi be the divisor of the fibre of some conic bundle
structure on S, for some m, a1, ..., a5 ∈ Z. From the relations A
2 = 0 (the fibres
are disjoint) and AKS = −2 (adjunction formula) we get:
i=1 ai
2 = m2,
i=1 ai = 3m− 2.
As in Lemma 7.1, we have (
i=1 ai)
2 ≤ 5
i=1 ai
2, which implies here that
(3m− 2)2 ≤ 5m2, that is 4(m2 − 3m+ 1) ≤ 0. As m is an integer, we must have
1 ≤ m ≤ 2. If m = 1, we replace it in (3) and see that there exists i ∈ {1, ..., 5}
such that A = L − Ei. Otherwise, taking m = 2 and replacing it in (3), we see
that four of the aj ’s are equal to 1, and one is equal to 0. This gives the ten conic
bundles of assertion 1, which are the lift on S of the lines of P2 passing through
one of the Ai’s or of the conic passing through four of the Ai’s.
The group Aut(S) acts on the set ∪5i=1{L−Ei,−KS − (L−Ei)}; since KS is
fixed, this induces an action on the set of five pairs {L−Ei,−KS − (L−Ei)}. We
denote by ρ : Aut(S) → Sym5 the corresponding homomorphism. The action of
the kernel of ρ on the pairs of conic bundles gives a natural embedding of Ker(ρ)
into (F2)
We now prove that Ker(ρ) = {(a1, ..., a5) |
ai = 0} = F. Acting by a
linear automorphism of P2, we may assume that the points blown-up by η are
A1 = (1 : 0 : 0), A2 = (0 : 1 : 0), A3 = (0 : 0 : 1), A4 = (1 : 1 : 1), A5 = (a : b : c),
for some a, b, c ∈ C∗. Then, the birational involution τ : (x0 : x1 : x2) 99K (ax1x2 :
bx0x2 : cx0x1) of P
2 lifts as an automorphism η−1τη ∈ Aut(S) that acts on Pic(S)

0 −1 −1 0 0 −1
−1 0 −1 0 0 −1
−1 −1 0 0 0 −1
0 0 0 0 1 0
0 0 0 1 0 0
1 1 1 0 0 2

with respect to the basis (E1, E2, E3, E4, E5, L). It follows from this observation
that η−1τη belongs to the kernel of ρ, and acts on the pairs of conic bundles
as (0, 0, 0, 1, 1) ∈ (F2)
5. Permuting the roles of the points A1, ..., A5, we get 10
involutions whose representations in (F2)
5 have two ”ones” and three ”zeros”.
These involutions generate the group {(a1, ..., a5) |
ai = 0} = F. To prove that
this group is equal to Ker(ρ), it suffices to show that (1, 1, 1, 1, 1) does not belong
to Ker(ρ). This follows from the fact that (1, 1, 1, 1, 1) would send L = 1
(KS +∑5
i=1(L − Ei)) on the divisor
(KS +
i=1(−KS − L + Ei)) =
(−2L − 3KS),
which doesn’t belong to Pic(S). This concludes the proof of assertion 2 (except
the fact that the exact sequence is split, which will be proved by assertion 3).
We now prove assertion 3. Let σ ∈ Sym5 be a permutation of the set {1, ..., 5}
in the image of ρ and g be an automorphism of S such that ρ(g) = σ. Let α be
the element of Aut(Pic(S)) that sends Ei on Eσ(i) and fixes L. Viewing Aut(S)
as a subgroup of Aut(Pic(S)), the element gα−1 ∈ Aut(Pic(S)) fixes the five pairs
of conic bundles. There exists some element h ∈ F ⊂ Aut(S) such that hgα−1
either fixes the divisor of every conic bundle or permutes the divisors of conic
bundles in each pair. The same argument as in the above paragraph shows that
this latter possibility cannot occur. Hence hgα−1 fixes L−E1, ..., L−E5 and KS.
It follows that hgα−1 acts trivially on Pic(S), so α = hg ∈ Aut(S), and α is by
construction the lift of an automorphism of P2 that acts on the set {A1, ..., A5} as
σ does on {1, ..., 5}. Conversely, it is clear that every automorphism r of P2 which
leaves the set {A1, A2, A3, A4, A5} invariant lifts to the automorphism η
−1rη of
S whose action on the pairs of conic bundles is the same as that of r on the set
{A1, A2, A3, A4, A5}. This gives assertion 3.
Assertion 4 follows from the above description of some element of F ⊂ Aut(S)
with two ”ones” as the lift of a birational map of the form τ : (x0 : x1 : x2) 99K
(ax1x2 : bx0x2 : cx0x1). As the automorphism η
−1τη ∈ Aut(S) does not leave any
exceptional divisor invariant, its fixed points are the same as those of τ , which are
the four points (α : β : γ), where α2 = a, β2 = b, γ2 = c.
It remains to prove the last assertion. Note that the element h = (0, 1, 1, 1, 1) ∈
Aut(S) fixes the divisor L−E1, hence acts on the associated conic bundle structure.
Furthermore, the four singular fibres of this conic bundle, {L − E1 − Ei, Ei}, for
i = 2, ..., 5, are invariant by h and this element switches the two components of
each fibre. This shows that the action of h on the basis of the fibration is trivial,
so the restriction of h on each fibre is an involution of P1 which fixes two points.
On each singular fibre, exactly one point is fixed, which is the singular point of
the fibre. The situation is similar for the other elements with four ”ones” (in fact,
the involutions described here are twisting involutions, see Lemma 6.1).
Lemma 9.13 (Actions on the del Pezzo surfaces of degree 4). Let S be a del
Pezzo surface of degree 4, and let G ∈ Aut(S) be an Abelian group such that
rk Pic(S)G = 1. Then, G contains an involution that fixes an elliptic curve.
Proof. We keep the notation of Lemma 9.11 for η : S → P2,Aut(S, η), ρ,F, ... and
denote by H the group G ∩ F = G ∩ Kerρ. We will prove that H contains an
element of F with four ”ones”, which is an involution that fixes an elliptic curve
(Lemma 9.11).
The group ρ(G) ⊂ ρ(Aut(S)) ∼= Aut(S, η) is isomorphic to a subgroup of
Aut(S, η). The group Aut(S, η) is the lift of the group of automorphisms of P2 that
leave the set {A1, ..., A5} invariant (Lemma 9.11). The restriction of this group to
the conic of P2 passing through the five points is a subgroup of PGL(2,C) that
leaves five points invariant. Since ρ(G) is finite and Abelian, it is cyclic, of order
at most 5. We consider the different possibilities.
The order of ρ(G) is 1. This implies that G ⊂ F. If G contains an element
with four ”ones”, we are done. Otherwise, up to conjugation G is a subset of the
group generated by (1, 1, 0, 0, 0) and (1, 0, 1, 0, 0), and fixes L − E4 and L − E5
(thus rk Pic(S)G > 1).
The order of ρ(G) is 2. Up to a change of numbering, ρ(G) is generated by
(1 2)(3 4); since G is Abelian, we find that H ⊂ V = {(a, a, b, b, 0) | a, b ∈ F2}.
Let g = ((a, b, c, d, e), (1 2)(3 4)) ∈ G be such that ρ(g) = (1 2)(3 4). We may
suppose that e = 1 (otherwise, the group G would fix L − E5 and we would
have rk Pic(S)G ≥ 2.) Conjugating by ((0, b, 0, d, b + c), id) we may assume that
g = ((a+ b, 0, c+ d, 0, 1), (1 2)(3 4)). In fact, since a+ b+ c+ d+ e = 0, we have
g = ((α, 0, 1+α, 0, 1), (1 2)(3 4)), where α = a+b = c+d+1 ∈ F2. If α = 1, then g
has order 4 and fixes the divisor 2L− E3 − E4, thus G cannot be equal to < g >
and it follows that V ⊂ G; in particular the element (1, 1, 1, 1, 0) is contained in
G. If α = 0, then < g > fixes 2L− E1 − E2, so once again G contains V .
The order of ρ(G) is 3. In this case, ρ(G) is generated by a 3-cycle, namely
(1 2 3); then H must be a subgroup of V = {(a, a, a, b, a + b) | a, b ∈ F2}. The
order of G must be a multiple of 4, by Lemma 9.3, hence H = V , and thus G
contains the element (1, 1, 1, 1, 0).
The order of ρ(G) is 4. Then ρ(G) is generated by (1 2 3 4), so H must be a
subgroup of V =< (1, 1, 1, 1, 0) >. Let g = ((a, b, c, d, e), (1 2 3 4)) ∈ G be such
that ρ(g) = (1 3 2 4). Conjugating the group by ((a, a+b, a+b+c, 0, a+c), id), we
may suppose that g = ((0, 0, 0, e, e), (1 3 2 4)). If e = 1, then g4 = (1, 1, 1, 1, 0) ∈ G.
If e = 0, the element g belongs to HS , so it fixes the divisors L and E5. As the
group V fixes L− E5, the rank of Pic(S)
G cannot be 1.
The order of ρ(G) is 5. Then, ρ(G) is generated by a 5-cycle and H = {1}.
The rank of Pic(S)H cannot be 1, by Lemma 9.3.
Before studying the case of del Pezzo surfaces of degree ≤ 3, we remind the
reader of some classical embeddings of these surfaces.
Remark 9.14. Recall ([Kol], Theorem III.3.5) that a del Pezzo surface of degree 3
(respectively 2, 1) is isomorphic to a smooth hypersurface of degree 3 (respectively
4, 6) in the projective space P3 (respectively in P(1, 1, 1, 2), P(1, 1, 2, 3)). Further-
more, in each of the 3 cases, any automorphism of the surface is the restriction of
an automorphism of the ambient space. We will use these classical embeddings,
take w, x, y, z as the variables on the projective spaces, and denote by [α : β : γ : δ]
the automorphism (w : x : y : z) 7→ (αw : βx : γy : δz). Note that a del Pezzo
surface of degree 4 is isomorphic to the intersection of two quadrics in P4, but we
will not use this here.
Lemma 9.15 (Actions on the del Pezzo surfaces of degree 3). Let S be a del
Pezzo surface of degree 3, and let G ∈ Aut(S) be an Abelian group such that
rk Pic(S)G = 1. Then, G contains an element of order 2 or 3 that fixes an elliptic
curve of S.
Proof. Lemma 9.3 implies that the order of G is divisible by 3, so G contains an
element of order 3. We view S as a cubic surface in P3, and Aut(S) as a subgroup
of PGL(4,C) (see Remark 9.14). There are three kinds of elements of order 3 in
PGL(4,C), depending on the nature of their eigenvalues. Setting ω = e2iπ/3, there
are elements with one eigenvalue of multiplicity 3 (conjugate to [1 : 1 : 1 : ω], or its
inverse), elements with two eigenvalues of multiplicity 2 (conjugate to [1 : 1 : ω : ω])
and elements with three distinct eigenvalues (conjugate to [1 : 1 : ω : ω2]). We
consider the three possibilities.
Case a: G contains an element of order 3 with one eigenvalue of multiplicity 3.
The element [1 : 1 : 1 : ω] fixes the hyperplane z = 0, whose intersection with the
surface S is an elliptic curve (because Fix(g) ⊂ S is smooth). Thus, we are done.
Case b: G contains an element of order 3 with two eigenvalues of multiplicity 2.
With a suitable choice of coordinates, we may assume that this element is
g = [1 : 1 : ω : ω].
Since S is smooth, its equation F is of degree at least 2 in each variable, which
implies that F (w, x, ωy, ωz) = F (w, x, y, z) (the eigenvalue is 1); up to a change
of coordinates F = w3 + x3 + y3 + z3, which means that S is the Fermat cubic
surface. The group of automorphisms of S is (Z/3Z)3 ⋊ Sym4 and the centraliser
of g in it is (Z/3Z)3 ⋊ V , where V ∼= (Z/2Z)2 is the subgroup of Sym4 generated
by the two transpositions (w, x) and (y, z). The structure of the centraliser gives
rise to an exact sequence
1 → (Z/3Z)3 → (Z/3Z)3 ⋊ V
→ V → 1
∪ ∪ ∪
1 → G ∩ (Z/3Z)3 → G → γ(G) → 1.
We may suppose that G contains no element of order 3 with an eigenvalue of
multiplicity 3, since this case has been studied above (case a). There are then
three possibilities for G ∩ (Z/3Z)3, namely < g >, < g, [1 : ω : 1 : ω] > and
< g, [1 : ω : ω : 1] >. The last is conjugate to the second by the automorphism
(y, z). Note that g preserves exactly 9 of the 27 lines on the surface; these
are {w + ωix = y + ωjz = 0}, for 0 ≤ i, j ≤ 2. If G ∩ (Z/3Z)3 is equal to
< g >, then G/ < g >∼= γ(G) has order 1, 2 or 4 and thus G leaves at least
one of the 9 lines invariant, whence rk Pic(S)G > 1. If G ∩ (Z/3Z)3 is the group
H =< g, [1 : ω : 1 : ω] > we have G = H , since the centraliser of H in (Z/3Z)3⋊V
is the group (Z/3Z)3. As the set of three skew lines {w + ωix = y + ωiz = 0} for
0 ≤ i ≤ 2 is an orbit of H , the rank of Pic(S)G is strictly larger than 1.
Case c: G contains an element g of order 3 with three distinct eigenvalues.
We may suppose that g = [1 : 1 : ω : ω2]. Note that the action of g on P3
fixes the line Lyz of equation y = z = 0 and thus the whole group G leaves this
line invariant. If Lyz ⊂ S, the rank of rk Pic(S)
G is at least 2. Otherwise, the
equation of S is of the form L3(w, x)+L1(w, x)yz+ y
3+ z3 = 0, where L3 and L1
are homogeneous forms of degree respectively 3 and 1, and L3 has three distinct
roots, so Fix(g) = S ∩ Lyz. Since g fixes exactly three points, the trace of its
action on Pic(S) ∼= Z7 is 1 (Lemma 9.5) and thus rk Pic(S)g > 1, which implies
that G 6=< g >.
Note that every subgroup of PGL(4,C) isomorphic to (Z/3Z)2 contains an
element with only two distinct eigenvalues, so we may assume that G contains
only two elements of order 3, which are g and g2. This implies that the action of
G on the three points of Lyz ∩ S gives an exact sequence
1 →< g >→ G → Sym3,
where the image on the right is a transposition. The group G thus contains an
element of order 2, that we may assume to be diagonal of the form (w : x : y :
z) 7→ (−w : x : y : z) and that fixes the elliptic curve which is the trace on S of
the plane w = 0.
Lemma 9.16 (Actions on the del Pezzo surfaces of degree 2). Let S be a del
Pezzo surface of degree 2, and let G ∈ Aut(S) be an Abelian group such that
rk Pic(S)G = 1. Then, G contains either the Geiser involution (that fixes a curve
isomorphic to a smooth quartic curve) or an element of order 2 or 3 that fixes an
elliptic curve.
Proof. We view S as a surface of degree four in the weighted projective space
P(2, 1, 1, 1) (see Remark 9.14). Note that the projection on the last three coordi-
nates gives S as a double covering of P2 ramified over a smooth quartic curve Q.
Lemma 9.3 implies that the order of G is divisible by 2, so G contains an
element g of order 2.
If the element g is the involution induced by the double covering (classically
called the Geiser involution), we are done; otherwise we may assume that g acts
on P(2, 1, 1, 1) as g : (w : x : y : z) 7→ (ǫw : x : y : −z), where ǫ = ±1, and the
equation of S is w2 = z4 + L2(x, y)z
2 + L4(x, y), where Li is a form of degree i,
and L4 has four distinct roots. The trace on S of the equation z = 0 defines an
elliptic curve Lz ⊂ S. If ǫ = 1, then g fixes the curve Lz and we are done; we
therefore assume that ǫ = −1.
If G contains another involution, we diagonalise the group generated by these
two involutions and see that one element of the group fixes either an elliptic curve
or the smooth quartic curve, so we may assume that g is the only involution of G.
Note that g fixes exactly four points of S, which are the points of intersection
of Lz with the quartic Q (of equation w = 0). The trace of g on Pic(S) ∼= Z
thus equal to 2 (Lemma 9.5), whence rk Pic(S)g = 5 and G 6=< g >.
The group G acts on the line z = 0 of P2 and on the four points of Lz ∩ Q.
Since g is the only element of order 2 of G, the action of G on these four aligned
points has order 3 and thus, we may assume that L4(x, y) = x(x
3 +λy3) and that
there exists an element h of G that acts as (w : x : y : z) 7→ (αw : x : e2iπ/3y : βz),
with α2 = β4 = 1. We find that h4 is an element of order 3 that fixes the elliptic
curve which is the trace on S of the equation y = 0.
Lemma 9.17 (Actions on the del Pezzo surfaces of degree 1). Let S be a del
Pezzo surface of degree 1, and let G ∈ Aut(S) be an Abelian group such that
rk Pic(S)G = 1. Then, some non-trivial element of G fixes a curve of S of positive
genus.
Proof. We view S as a surface of degree six in the weighted projective space
P(3, 1, 1, 2) (see Remark 9.14). Up to a change of coordinates, we may assume
that the equation is
w2 = z3 + zL4(x, y) + L6(x, y),
where L4 and L6 are homogeneous forms of degree 4 and 6 respectively. The
embedding of S into P(3, 1, 1, 2) is given by | − 3KS| × | −KS | × | − 2KS|, which
implies that G is a subgroup of P (GL(1,C)×GL(2,C)×GL(1,C)). The projection
(w : x : y : z) 99K (x : y) is an elliptic fibration generated by | − KS |, and has
one base-point, namely (1 : 0 : 0 : 1), which is fixed by Aut(S). This projection
induces an homomorphism ρ : Aut(S) → Aut(P1) = PGL(2,C). Note that the
kernel of ρ is generated by the Bertini involution w 7→ −w (and the element z 7→ ωz
(ω = e2iπ/3) if L4 = 0) and is hence cyclic of order 2 (or 6). Furthermore, any
element of this kernel fixes a curve of positive genus.
We assume that no non-trivial element of G fixes a curve of positive genus.
This implies that G is isomorphic to ρ(G) ⊂ Aut(P1), and thus is either cyclic or
isomorphic to (Z/2Z)2. Since the lift of this latter group in Aut(S) is not Abelian,
G is cyclic. We use the Lefschetz fixed-point formula (Lemma 9.5) to deduce the
eigenvalues of the action of elements of G on Pic(S) ∼= Z9. For any element g ∈ G,
g 6= 1, Fix(g) contains the point (1 : 0 : 0 : 1) and is the disjoint union of points
and lines. Thus χ(Fix(g)) ≥ 1 and so the trace of g on Pic(S) is at least −1
(Lemma 9.5).
Elements of order 2: The eigenvalues are < 1a, (−1)b > with a ≥ 4, b ≤ 5.
Elements of order 3: The eigenvalues are < 1a, (ω)b, (ω2)b > with a ≥ 3, b ≤ 3.
Elements of order 4: The eigenvalues are < 1a, (−1)b, (i)c, (−i)c > with a ≥
b−1. Furthermore, the information on the square induces that a+b ≥ 4, so a ≥ 2.
Elements of order 5: The eigenvalues are < 15, l1, l2, l3, l4 >, where l1, ..., l4 are
the four primitive 5-th roots of unity.
Elements of order 6: The eigenvalues are< 1a, (−1)b, (ω)c, (ω2)c, (−ω)d, (−ω2)d >,
where a − b − c + d ≥ −1. Computing the square and the third power, we find
respectively a + b ≥ 3, c + d ≤ 3 and a + 2c ≥ 4, b + 2d ≤ 5. This implies that
a ≥ 2. Indeed, if a = 1, we get b, c ≥ 2 and thus d ≤ 1, which contradicts the fact
that the trace a− b− c+ d is at least −1.
Since rk Pic(S)G = 1, the order of the cyclic group G is at least 7. As the
action of G leaves L4 and L6 invariant, both L6 and L4 are monomials. If some
double root of L6 is a root of L4, the surface is singular, so up to an exchange of
coordinates we may suppose that L4 = x
4 and either L6 = xy
5 or L6 = y
In the first case, the equation of the surface is w2 = z3+x4z+xy5 whose group
of automorphisms Aut(S) is isomorphic to Z/20Z, generated by [i : 1 : ζ10 : −1],
and contains the Bertini involution. No subgroup of Aut(S) fullfills our hypotheses.
In the second case, the equation of the surface is w2 = z3 + x4z + y6, whose
group of automorphisms is isomorphic to Z/2Z×Z/12Z, generated by the Bertini
involution and g = [i : 1 : ζ12 : −1]. The only possibility for G is to be equal to
< g >. Since g4 = [1 : 1 : ω : 1] fixes an elliptic curve, we are done.
Proposition 9.1 now follows, using all the lemmas proved above.
10 The results
We now prove the five theorems stated in the introduction.
Proof of Theorem 4. Since the pair (G,S) is minimal, either rk Pic(S)G = 1 and
S is a del Pezzo surface, or G preserves a conic bundle structure (see [Man], [Isk2]
or [Do-Iz]).
In the first case, either S ∼= P2, or S ∼= P1 × P1 or S is a del Pezzo surface of
degree d = 5 or 6 and G ∼= Z/dZ (Proposition 9.1).
In the second case, either S is a Hirzebruch surface or the pair (G,S) is the
pair (Cs24, Ŝ4) of Section 7 (Proposition 8.4).
Proof of Theorem 2. No non-trivial element of Aut(P2),Aut(P1 × P1) or Cs24
fixes a non-rational curve (the first two cases are clear, the last one follows from
Lemma 7.4).
Conversely, suppose that G is a finite Abelian subgroup of the Cremona group
such that no non-trivial element fixes a curve of positive genus. Since G is finite,
it is birationally conjugate to a group of automorphisms of a rational surface S
(see for example [dF-Ei, Theorem 1.4] or [Do-Iz]). Then, we assume that the pair
(G,S) is minimal and use the classification of Theorem 4.
If S is an Hirzebruch surface, the group is birationally conjugate to a subgroup
of Aut(P1 ×P1) (Proposition 8.3). If S is a del Pezzo surface, the group G is bira-
tionally conjugate to a subgroup of Aut(P1 × P1) or Aut(P2), by Proposition 9.1.
Otherwise, the pair (G,S) is isomorphic to the pair (Cs24, Ŝ4).
It remains to show that the group Cs24 is not birationally conjugate to a
subgroup of Aut(P1 × P1) or Aut(P2). Since the group is isomorphic to Z/2Z ×
Z/4Z, only the case of Aut(P1 × P1) need be considered (see Section 2). This was
proved in Proposition 8.4.
Proof of Theorem 5. By Theorem 2, G is birationally conjugate either to a sub-
group of Aut(P2), or of Aut(P1 × P1), or to the group Cs24.
The group Cs24 is case [8]. The finite Abelian subgroups of Aut(P
2) are conju-
gate to the groups of case [1] or [9] (Proposition 2.2). The finite Abelian subgroups
of Aut(P1 × P1) are conjugate to the groups of cases [1] through [7] (Proposi-
tion 2.5).
It was proved in Proposition 2.5 that cases [1] through [7] are distinct. In
Proposition 8.4 we showed that [8] (Cs24) is not birationally conjugate to any
groups of cases [1] through [7]. Finally, the group [9] is isomorphic only to [1], but
is not birationally conjugate to it (Proposition 2.2). This completes the proof that
the distincts cases given above are not birationally conjugate.
The proof of Theorem 1 follows directly from Theorem 5, and Theorem 3 is a
corollary of Theorem 1.
11 Other kinds of groups
Our main interest up to now was in finite Abelian subgroups of the Cremona
group. In this section, we give some examples in the other cases, in order to show
why the hypothesis ”finite”, respectively ”Abelian”, is necessary to ensure that
condition (F ) (no curve of positive genus is fixed by a non-trivial element) implies
condition (M) (the group is birationally conjugate to a group of automorphisms
of a minimal surface). We refer to the introduction for more details.
Finiteness is important since it imposes that the group is conjugate to a group
of automorphisms of a projective rational surface. This is not the case if the group
is not finite (see for example [Bla2], Proposition 2.2.4).
Lemma 11.1. Let ϕ : P2 99K P2 be a quadratic birational transformation with
three proper base-points, and such that deg(ϕn) = 2n for each integer n ≥ 1.
Then, the following occur:
1. no pencil of curves is invariant by ϕ;
2. ϕ is not birationally conjugate to an automorphism of P2 or of P1 × P1.
Proof. Denote by A1, A2, A3 the three base-points of ϕ and by B1, B2, B3 those
of ϕ−1. Up to a change of coordinates, we may suppose that A1 = (1 : 0 : 0),
A2 = (0 : 1 : 0) and A3 = (0 : 0 : 1). The birational transformation ϕ is thus the
composition of the standard quadratic transformation σ : (x : y : z) 99K (yz : xz :
xy) with a linear automorphism τ ∈ Aut(P2) that sends Ai on Bi for i = 1, 2, 3.
Let Λ be some pencil of curves, and assume that ϕ(Λ) = Λ. We will prove
that some base-point of Λ is sent by ϕ on an orbit of infinite order. The con-
dition deg(ϕn) = 2n is equivalent to saying that for i = 1, 2, 3, the sequence
Bi, ϕ(Bi), ..., ϕ
n(Bi), ... is well-defined, i.e. that ϕ
m(Bi) is not equal to Aj for any
i, j ∈ {1, 2, 3},m ∈ N. Denote by α1, α2, α3, β1, β2, β3 the multiplicity of Λ at
respectively A1, A2, A3, B1, B2, B3 and by n the degree of the curves of Λ. The
curves of the pencil ϕ(Λ) thus have degree 2n−α1−α2−α3. Since Λ is invariant,
n = α1 + α2 + α3, so at least one of the αi’s is not equal to zero. The equality
n = α1+α2+α3 implies that the curves of σ(Λ) have multiplicity αi at Ai, so the
curves of ϕ(Λ) have multiplicity αi at Bi, whence αi = βi for i = 1, 2, 3. Since Λ
passes through Bi with multiplicity αi, the pencil ϕ(Λ) = Λ passes through ϕ(Bi)
with multiplicity αi for i = 1, 2, 3. Continuing in this way, we see that Λ passes
through ϕn(Bi) with multiplicity αi for each n ∈ N. Consequently, Λ has infinitely
many base-points, which is not possible. This establishes the first assertion.
The second assertion follows directly, as each automorphism of P2 or P1 × P1
leaves a pencil of rational curves invariant.
Corollary 11.2. The group generated by a very general quadratic transformation
is a infinite cyclic group satisfying (F ) but not (M).
Proof. The condition deg(ϕn) = 2n, n ∈ N is satisfied for all quadratic transfor-
mations, except for a countable set of proper subvarieties. Consequently condition
(F ) is not satisfied (Lemma 11.1) for a very general quadratic transformation.
Let n be some positive integer and write ϕn : (x : y : z) 99K (f1(x, y, z) :
f2(x, y, z) : f3(x, y, z)), for some homogeneous polynomials fi of degree 2
n. The
set of points fixed by ϕn belongs to the intersection of the curves with equations
xf2 − yf1, xf3 − zf1 and yf3 − zf2. In general, there is only a finite number of
points; this yields condition (F ).
In fact, the argument of Lemma 11.1 works for any very general birational
transformation of P2, since this is a composition of quadratic transformations.
We thus find infinitely many cyclic subgroups of the Cremona group that are not
birationally conjugate to a group of automorphisms of a minimal surface although
none of their non-trivial elements fixes a non-rational curve. The implication
(F ) ⇒ (M) is therefore false for general cyclic groups.
We now study the finite non-Abelian subgroups and provide, in this case, many
examples satisfying (F ) but not (M):
Lemma 11.3. Let S6 = {
(x : y : z), (u : v : w)
| ux = vy = wz} ⊂ P2 × P2
be the del Pezzo surface of degree 6. Let G ∼= Sym3 × Z/2Z be the subgroup of
automorphisms of S6 generated by
(x : y : z), (u : v : w)
(u : v : w), (x : y : z)
(x : y : z), (u : v : w)
(y : x : z), (v : u : w)
(x : y : z), (u : v : w)
(z : y : x), (w : v : u)
Then no non-trivial element of G fixes a curve of positive genus, and G is not
birationally conjugate to a group of automorphisms of a minimal surface.
Proof. Since every non-trivial element of finite order of Aut(S6) is birationally
conjugate to a linear automorphism of P2 (Corollary 9.10), no such element fixes
a curve of positive genus. The description of every G-equivariant elementary link
starting from S6 was given by Iskovskikh in [Isk4]. This shows that this group is
not birationally conjugate to a group of automorphisms of a minimal surface.
Lemma 11.4. Let S5 be the del Pezzo surface of degree 5. Let G ∼= Sym5 be
the whole group Aut(S5). Then no non-trivial element of G fixes a curve of posi-
tive genus, and G is not birationally conjugate to a group of automorphisms of a
minimal surface.
Proof. Since every non-trivial element of Aut(S5) is birationally conjugate to a
linear automorphism of P2 (Corollary 9.10), such an element does not fix a curve
of positive genus. Suppose that there exists some G-equivariant birational trans-
formation ϕ : S5 99K S̃ where S̃ is equal to P
2 or P1 × P1. We decompose ϕ into
G-equivariant elementary links (see for example [Isk3], Theorem 2.5). The classi-
fication of elementary links ([Isk3], Theorem 2.6) shows that a link S5 99K S
either a Bertini or a Geiser involution (and in this case S′ = S5, and thus this link
conjugates G to itself), or the composition of the blow-up of one or two points,
and the contraction of 5 curves to respectively P1 × P1 or P2. It remains to show
that no orbit of G has size 2 or 1, to conclude that these links are not possible.
This follows from the fact that the actions of Sym5,Alt5 ⊂ G on S5 are fixed-point
free (Proposition 5.1).
Finally, the way to find more counterexamples is to look at groups acting on
conic bundles. The generalisation of the example Cs24 gives many examples of
non-Abelian finite groups. Here is the simplest family:
Lemma 11.5. Let n be some positive integer, and let G be the group of birational
transformations of P2 generated by
g1 : (x : y : z) 99K (yz : xy : −xz),
g2 : (x : y : z) 99K (yz(y − z) : xz(y + z) : xy(y + z)),
h : (x : y : z) 99K (e2iπ/2nx : y : z).
Then, G preserves the pencil Λ of lines passing through (1 : 0 : 0) and the corre-
sponding action gives rise to a non-split exact sequence
1 →< h >∼= Z/2nZ → G → (Z/2Z)2 → 1.
In particular, the group G has order 8n. Furthermore, no non-trivial element of
G fixes a curve of positive genus, and G is not birationally conjugate to a group
of automorphisms of a minimal surface.
Proof. Firstly, since g1 and g2 generate the group Cs24, which is not birationally
conjugate to a group of automorphisms of a minimal surface, this is also the case
for G.
Secondly, we compute that (g1)
2 = (g2)
2 = (h)n is the birational transforma-
tion (x : y : z) 7→ (−x : y : z). The maps g1 and g2 thus have order 4 and h has
order 2n.
Thirdly, every generator of G preserves the pencil Λ of lines passing through
(1 : 0 : 0). The action of g1, g2 and h on this pencil is respectively (y : z) 7→ (−y :
z), (y : z) 7→ (z : y) and (y : z) 7→ (y : z). The action of G on the pencil thus gives
an exact sequence
1 → G′ → G → (Z/2Z)2 → 0,
where G′ is the subgroup of elements of G that act trivially on the pencil Λ. It is
clear that < h >∼= Z/2nZ is a subgroup of G′. Since g1h(g1)
−1 = g2h(g2)
−1 = h−1
and g1 and g2 commute, the group < h > is equal to G
Finally, any element of G that fixes a curve of positive genus must act trivially
on the pencil Λ and thus belongs to < h >. Hence, only the identity is possible.
References
[Ba-Be] L. Bayle, A. Beauville, Birational involutions of P2. Asian J. Math. 4
(2000), no. 1, 11–17.
[Be-Bl] A. Beauville, J. Blanc, On Cremona transformations of prime order.
C.R. Acad. Sci. Paris, Sér. I 339 (2004), 257-259.
[Bea1] A. Beauville, Complex Algebraic Surfaces. LondonMathematical Society
Student Texts, 34, 1996.
[Bea2] A. Beauville, p-elementary subgroups of the Cremona group. J. Algebra
314 (2007), no. 2, 553-564
[Bla1] J. Blanc, Conjugacy classes of affine automorphisms of Kn and linear
automorphisms of Pn in the Cremona groups. Manuscripta Math., 119
(2006), no.2 , 225-241.
[Bla2] J. Blanc, Finite Abelian subgroups of the Cremona group of the
plane, PhD Thesis, University of Geneva, 2006. Available online at
http://www.unige.ch/cyberdocuments/theses2006/BlancJ/meta.html
[Bla3] J. Blanc, Finite Abelian subgroups of the Cremona group of the plane.
C.R. Acad. Sci. Paris, Sér. I 344 (2007), 21-26.
[Bla4] J. Blanc, The number of conjugacy classes of elements of the Cremona
group of some given finite order. Bull. Soc. Math. France 135 (2007),
no. 3, 419-434.
[Bla5] J. Blanc, On the inertia group of elliptic curves in the Cremona group
of the plane. Michigan Math. J. (to appear) math.AG/0703804
http://www.unige.ch/cyberdocuments/theses2006/BlancJ/meta.html
http://arxiv.org/abs/math/0703804
[BPV] J. Blanc, I. Pan, T. Vust, Sur un théorème de Castelnuovo. Bull. Braz.
Math. Soc. 39 (2008), no. 1, 61-80.
[De-Ku] H. Derksen, F. Kutzschebauch, Nonlinearizable holomorphic group ac-
tions. Math. Ann. 311 (1998), no. 1, 41–53.
[dFe] T. de Fernex, On planar Cremona maps of prime order. Nagoya Math.
J. 174 (2004), 1–28.
[dF-Ei] T. de Fernex, L. Ein, Resolution of indeterminacy of pairs. Algebraic
geometry, 165-177, de Gruyter, Berlin (2002).
[Dol] I.V. Dolgachev, Weyl groups and Cremona transformations. Singulari-
ties I, 283–294, Proc. Sympos. Pure Math. 40, AMS, Providence (1983).
[Do-Iz] I.V. Dolgachev, V.A. Iskovskikh, Finite subgroups of the plane Cremona
group. To appear in ”Arithmetic and Geometry - Manin Festschrift”
math.AG/0610595
[vdE] A. van den Essen, Polynomial Automorphisms and the Jacobian Con-
jecture. Progress in Mathematics, 190. Birkhäuser Verlag, Basel, 2000.
[Isk1] V.A. Iskovskikh, Rational surfaces with a pencil of rational curves.
Math. USSR Sbornik 3 (1967), no 4.
[Isk2] V.A. Iskovskikh, Minimal models of rational surfaces over arbitrary
fields. Izv. Akad. Nauk SSSR Ser. Mat. 43 (1979), no 1, 19-43, 237.
[Isk3] V.A. Iskovskikh, Factorization of birational mappings of rational sur-
faces from the point of view of Mori theory. Uspekhi Mat. Nauk 51
(1996) no 4 (310), 3-72.
[Isk4] V.A. Iskovskikh, Two nonconjugate embeddings of the group S3×Z2 into
the Cremona group. Tr. Mat. Inst. Steklova 241 (2003), Teor. Chisel,
Algebra i Algebr. Geom., 105–109.
[Kan] S. Kantor, Theorie der endlichen Gruppen von eindeutigen Transforma-
tionen in der Ebene. Mayer & Müller, Berlin (1895).
[Kol] J. Kollár, Rational Curves on Algebraic Varieties. Ergebnisse der Mathe-
matik und ihrer Grenzgebiete. 3. Folge. Band 32, Springer-Verlag, Berlin
(1996)
[Ko-Sz] J. Kollár, E. Szabó, Fixed points of group actions and rational maps.
Canadian J. Math. 52 (2000), 1054-1056.
[Kra] H. Kraft, Challenging problems on affine n-space. Séminaire Bourbaki,
Vol. 1994/95. Astérisque No. 237 (1996), Exp. No. 802, 5, 295–317.
[Man] Yu. Manin, Rational surfaces over perfect fields, II. Math. USSR -
Sbornik 1 (1967), 141-168.
http://arxiv.org/abs/math/0610595
[Mu-Um] S. Mukai, H. Umemura,Minimal rational threefolds. Algebraic geometry,
Tokyo/Kyoto, (1982), 490–518, Lecture Notes in Math., 1016, Springer,
Berlin, 1983.
[Um] H. Umemura, On the maximal connected algebraic subgroups of the Cre-
mona group. I. Nagoya Math. J. 88 (1982), 213–246.
[Wim] A. Wiman, Zur Theorie der endlichen Gruppen von birationalen Trans-
formationen in der Ebene. Math. Ann., vol. 48, (1896), 497-498, 195-
	Introduction
	The main questions and results
	How to decide
	Linearisation of birational actions
	The approach and other results
	Comparison with other work
	Aknowledgements
	Automorphisms of P2 or P1P1
	Some facts about automorphisms of conic bundles
	The del Pezzo surface of degree 6
	The del Pezzo surface of degree 5
	Description of twisting elements
	The example Cs24
	Finite Abelian groups of automorphisms of conic bundles - birational representative elements
	Actions on del Pezzo surfaces with fixed part of the Picard group of rank one
	The results
	Other kinds of groups
ABSTRACT
  This article gives the proof of results announced in [J. Blanc, Finite
Abelian subgroups of the Cremona group of the plane, C.R. Acad. Sci. Paris,
S\'er. I 344 (2007), 21-26.] and some description of automorphisms of rational
surfaces.
  Given a finite Abelian subgroup of the Cremona group of the plane, we provide
a way to decide whether it is birationally conjugate to a group of
automorphisms of a minimal surface.
  In particular, we prove that a finite cyclic group of birational
transformations of the plane is linearisable if and only if none of its
non-trivial elements fix a curve of positive genus. For finite Abelian groups,
there exists only one surprising exception, a group isomorphic to Z/2ZxZ/4Z,
whose non-trivial elements do not fix a curve of positive genus but which is
not conjugate to a group of automorphisms of a minimal rational surface.
  We also give some descriptions of automorphisms (not necessarily of finite
order) of del Pezzo surfaces and conic bundles.

<|endoftext|><|startoftext|>
Introduction 
Organic electronics, in particular, organic field effect transistors (OFET) is a fast 
developing field of research and technological development [1-3]. Pentacene (PnC) is 
one of the most extensively studied organic semiconductors for OFETs due to its 
relatively high carrier mobility [2]. Ordered molecular materials are used in electronic 
and photonic organic devices for obtaining anisotropic properties. Therefore, techniques 
for formation of high-quality films play an important role in the development of organic 
thin film devices. For such applications, uniform films with the thickness range from 
nanometers to submicrons are required. For electronic applications, film purity and 
interface characteristics influence the charge transport and energy transfer processes. 
For optical applications, controlling of dipole orientation is required as well as uniform 
thickness and low scattering loss. It is not easy to fulfill all these requirements by the 
wet processing. On the other hand, stable polymers like polytetrafluoroethylene (PTFE) 
do not dissolve in any solvent. Therefore, vacuum-based dry processing is the only 
possible method for deposition of such polymers. Some polymers can be evaporated by 
heating in vacuum, but for complex polymers low temperature plasma polymerisation 
should be used. Primary polymer degradation products are generated by the scission of 
the molecular chain at various sites and/or the cleavage of side groups or atoms. 
Depending on the nature of the polymer structure, the scission of polymer chains can 
occur either randomly or in an ordered depolymerisation mechanism. PTFE films were 
deposited in vacuum, but with a modified technique, which includes electron cloud 
activation of the decomposition products [4]. Since the discovery of the friction transfer 
method of PTFE hot friction transfer has been used extensively to prepare substrates 
materials on top of which deposited chromophores form oriented layers by self-
organization [5-7]. Recently it was found that vacuum deposited and rubbed PTFE films 
also support growth of oriented dye layers [8-10]. Using a series of measuring 
techniques (e.g. ellipsometry, optical and infrared spectroscopy and atomic force 
microscopy) we investigated physical and optical properties of vacuum deposited PTFE 
and PnC thin films formed on top of these PTFE layers in order to find optimal 
conditions for deposition of highly oriented PnC films. 
2. Experimental 
2.1. Description of PTFE and PnC 
PTFE is a linear polymer having the chemical structure shown in figure 1a. 
(a)  polytetrafluoroethylene (PTFE) (b)  pentacene
Fig.1. Chemical structure of: (a) polytetrafluoroethylene (PTFE), (b) pentacene 
(PnC). 
PTFE can be considered to be a suitable organic material serving as gate dielectric in 
organic field-effect transistor (OFET) devices because of its physical and chemical 
properties: very good chemical, photochemical and thermal stability, low dielectric 
constant, very low conductivity and high breakdown filed strength. PTFE is one of the 
most thermally stable plastic materials manifesting no appreciable decompositions 
below 260°C.  
The chemical structure of pentacene which consists of five annulated benzene rings 
is shown in fig. 1b. Due to its flat conformation it can easily form crystals, which show 
highly anisotropic transport properties. Pentacene has a molar mass of 278.35 grams. 
The melting point is at about 300°C and the heat of vaporization is 74.4 kJ/mol. 
2.2. Deposition technique 
The preparation of PTFE films was carried out by use of a special vacuum 
deposition technique. The films were obtained by evaporation of bulk PTFE pellets in 
the temperature range between 300° and 450°C with electron cloud-assisted activation 
with typical process pressure of 10-2 Pa, an accelerating voltage of 1-3 kV and an 
electron activation current of 0 – 5 mA as proposed before [4,11]. The electron cloud 
was produced by an electron gun with a ring cathode. A computer equipped with a 
quartz oscillator card Sigma SQM-242 was monitoring the film thickness and deposition 
rate. The temperature of the crucible was monitored by a chromel-alumel thermocouple. 
The deposition rate depends both on the electron current used for activation and on the 
temperature of the crucible. At the start of a deposition run, the increase of PTFE 
temperature in the evaporator causes an increase of both pressure and deposition rate. 
Fragments are colliding with each other before reaching the substrate, losing their 
chemical reactivity by forming stable gaseous species, which will not be incorporated 
into the deposited layer on the substrate. The evaporation rate is limited by the fact that 
the pressure can rise only to a certain value at which a breakdown of the electrical gun 
occurs. Hence, there exists an operation heating temperature [4, 11], which strongly 
depends on the pumping speed of the vacuum system and should be determined for each 
installation. This method gives the possibility to have a fast control of the evaporation 
rate, that in general is limited by the thermal inertia of the crucible, but in this method it 
is controlled instantly by the electrical power, that produces the activating cloud of 
electrons and, therefore, changing the quantity of active species. All PTFE films were 
deposited at a substrate temperature of 200C. Fig. 2 shows schematically the deposition 
installation used for the PTFE film deposition. 
Fig.2. Deposition set-up used for PTFE and pentacene deposition. 
PnC for fluorescence was purchased from Sigma-Aldrich and used as received. PnC 
films were deposited onto rubbed PTFE layers using conventional a tantalum boat 
heated by electric current. Important parameter which governs film formation is the 
temperature of the substrate. Related to this temperature, kinetic limitations such as 
molecular mobility, crystallization speed, and other thermodynamic factors are 
controlling the structure and morphology of the film. The substrate temperature was 
kept constant at room or elevated temperature and monitored by a chromel-alumel 
thermocouple. The deposition rate was chosen in the range between 0.05 and 0.2 nm/s. 
The distance between evaporators and substrate was 0.15 m. 
2.3. Mechanical rubbing method 
The PTFE layers deposited onto different substrates were rubbed in an unidirectional 
mode on a cotton surface used to clean optical systems. The cotton for friction was 
placed in a fixed position on an optical table. The samples were rubbed 3- 6 times on a 
cotton surface with a constant force and speed. The scheme for cold friction is shown in 
Fig. 3. 
Fig.3. Schematic representation of the cold friction technique applied to a PTFE 
layer. 
2.4. Studies of the deposited films 
The surface morphology of the films was obtained using an Atomic Force 
Microscope (Autoprobe VP 2 Park Scientific Instruments), operating in non-contact 
mode in air at room temperature. The mean thickness and index of refraction (n) of the 
PTFE films were determined by means of ellipsometry using a Plasmos SD2000 
Automatic Ellipsometer operating at a wavelength of 632,8 nm. The thickness of the 
investigated thin films was measured using a Dektak Profilometer (DEKTAK 3 from 
Veeco Instruments) device, which has the capability of measuring the step height down 
to a few nm. Polarized absorption spectra of pentacene films were obtained with a 
UV/VIS Spectrometer (Lambda 16 Perkin Elmer). Measurements of infrared spectra of 
PTFE films have been carried out by use of a Perkin-Elmer Spectrum 2000 fourier 
transform infrared (FT-IR) spectrometer. 
3. Results and discussions 
The analysis of the results, obtained using electron cloud assisted activation 
evaporation revealed that only a few important processes determine the film properties. 
Fig. 4 shows the influence of the evaporator temperature on the layer thickness at 
constant deposition time of 10 minutes. The presented curve stops well below the 
limiting pressure above which a decrease of deposition rate occurs due to the reason 
described above [4]. With increase of electron activation current the deposition rate and 
resulting film thickness are increasing. The maximum deposition rate obtained at a 
limiting pressure of 5-6 x10–2 Pa was 0.18 nm/s at an activation current of 10 mA and a 
voltage of 3 kV. The surface relief of PTFE films deposited at different conditions onto 
silicon substrate is shown in Fig. 5. 
Fig.4. Dependence of layer thickness on crucible temperature after 10 minutes of 
deposition, I=2mA, V=1.5 kV. 
Fig.5. Surface morphology and profile of PFTE films: (a) 2 mA and (b) 1 mA 
electron current activation, respectively. 
The surface of all PTFE films is smooth. For smaller electron activation current, a 
larger granular structure on the surface is detected. Root mean square (RMS) roughness 
is 1 nm and 3 nm, respectively. The obtained RMS values indicate a smoother surface 
occurs at higher electron activation energy. Ellipsometry results confirm the AFM 
investigations: thicknesses determined by both methods are comparable. In addition, a 
change of refractive index in dependence on electron activation energy and on current 
density was found, as determined by means of ellipsometry. Thus, electron activation 
parameters affect the surface morphology and refractive index of the PTFE films. 
Table 1. Refractive index of PTFE films versus activation conditions. 
The IR spectra of deposited films under different activation are depicted in Fig. 6. 
The bands at 1161 and 1258 cm-1 was assigned to the -CF2- groups, the band at 1350 cm-
1 to groups with a double bond. The intensity of the bands at 524 and 556 cm-1 is lower, 
than the intensity of the band at 736 cm-1, thus indicating, that the material of the films 
is almost amorphous [11-13]. Normally at low electron activation PTFE layers are 
crystalline [4, 13]. An increase of electron activation current makes the films amorphous 
and increases the content of double bonds and side branches. Here at low activation 
power almost amorphous films with some double bonds but almost without branches 
were deposited. 
Fig.6. IR-spectra of a 500 nm-thick films, deposited by PTFE evaporation under 
following conditions: 1 with activation current 1,5 mA; 2 with activation current 2 mA. 
Inset: the magnification of IR spectra in the range from 500 to 1000 cm-1 is shown. 
After rubbing with a cotton cloth, the film surfaces were investigated by AFM and 
profilometry techniques. The film surface acquired ordered relief oriented in the 
direction of friction. Fig. 7 shows the relief and the profile of the PTFE layer after 
friction. The PTFE grooves have 10 -100 µm length, and about 300 nm height. Also, we 
can see that the spectral lines show exactly the linear structure of PTFE. 
Fig.7. The 1 µm × 1 µm AFM scan of rubbed PTFE films:  (a) 3D image of the film 
relief; (b) profile of a series of grooves. The groove length is about 100 nm, and the 
height is 300 nm.  
Fig.8. Polarized absorption spectra of pentacene films for the parallel (dotted 
curves) and perpendicular (solid curves) orientation of the electric vector of light in 
respect to the PTFE layer alignment.  Films were deposited at the following substrate 
conditions: a – onto 36 nm PTFE at 200C, b – onto50 nm PTFE at 200C, c – onto 90 nm 
PTFE at 750C, d – onto50 nm PTFE at 750C. Film thicknesses by quartz monitor: a) 
and b) – 75 nm, c) and d) – 80 nm. Band splitting at the main absorption is 30, 34, 40 
and 40 nm for a), b), c) and d) respectively. 
Fig.9. Surface relief of PnC film onto rubbed PTFE sublayer. 
Measurements of electronic absorption spectra of the pentacene films deposited onto 
rubbed PTFE layers of different thickness have shown that orientation of the PnC films 
depend on both the PTFE film thickness and on substrate temperature. Optical spectra of 
some PnC films are presented in Fig.8. They are in a good agreement with spectra of α- 
and β- phase of PnC films, deposited onto both inorganic and polymer substrates, 
including PTFE [7, 14]. Rubbed PTFE films of about 50 nm thickness lead to the best 
oriented PnC films. Absorption measurements with polarized light have shown that the 
deposited PnC films show a pronounced dichroism. A dichroic ratio of about 2 was 
measured even at deposition temperature of 20°C. This dichroic ratio is larger than 
obtained for PnC films deposited onto friction transferred PTFE layers, and for 
deposition at 20°C there no dichroism was observed at all. The temperature elevation 
from 20°C to 75°C slightly enhances the PnC film orientation and changes the spectral 
shape. The latter two effects can be explained by the PnC molecular mobility 
enhancement. The former one is subject for further detailed studies. A little difference in 
the spectral shape indicates different molecular interactions inside of the PnC crystals 
dependent on deposition conditions. The crystal size and structure is also sensitive to the 
deposition conditions and results in modification of the absorption spectra. The optical 
spectra of the PnC films deposited at 75°C show a small shift of all bands towards to red 
region and an increase of band splitting in comparison with bands of the films, deposited 
at 20°C, thus evidencing better intermolecular interactions in films, deposited at elevated 
temperature. Both PTFE film thickness and substrate temperature allow controlling this 
parameter in order to deposit PnC films with predetermined properties. The absorption 
of films deposited at 75°C is smaller than the absorption of films deposited at 20°C, 
although the quartz monitor thickness was the same for both samples. Obviously, a re-
evaporation took place already at 75°C as mentioned before. By AFM no preferred 
crystal orientation was found in all PnC films. The typical relief of a PnC film on a 
PTFE aligned layer is shown in Fig.9a. The crystal size is in the range of 80 to 200 nm, 
depending on deposition conditions. Sometimes freely distributed needle-like PnC 
crystal with long axis up to 500 nm appeared (Fig.9b). Such films have low optical 
anisotropy. Therefore, the optical anisotropy of PnC films is due to the unidirectional 
arrangement of PnC molecules inside all crystals. Comparison of the obtained PnC 
crystals with those grown on friction-transferred PTFE layers shows that the crystals 
grown on the vacuum deposited, rubbed PTFE layers have smaller size and more round 
shape. The like effect was found for the growth of squarylium dyes on such vacuum 
deposited PTFE films [10]. This effect is caused by a smaller relief of the surface of the 
vacuum-deposited and rubbed PTFE film in comparison to the friction transferred films. 
In addition, some differences in the structure of friction-transferred and vacuum-
deposited PTFE also plays a role. The PnC nucleation directed by PTFE edges are the 
main mechanism of growth of oriented PnC film as it was proposed by Brinkmann et. 
al.[7]. They observed that the top material domains have been enforced to grow parallel 
to the ledge direction due to the confinement by the PTFE nanofibrils. Only when the 
height of these domains exceeds that of the ledge the lateral growth of the domains is 
possible. The opinion about the prevailing influence of PTFE aligned on the molecular 
level onto dye oriented growth was expressed previously by Tanaka et. al. [8] and 
Wittmann et al. [5, 6, 7]. Our results seem to support the latter opinion, but the 
amorphous structure of our PTFE films should be taken into account. Perhaps, both 
mechanisms are taking place with different contributions in dependence on both the 
sublayer properties and deposition conditions. But even this suggestion does not explain 
all peculiarities, so further research should be carried out. 
4. Conclusions 
Amorphous PTFE films with RMS roughness of 1-3 nm were deposited by electron 
cloud-assisted deposition in vacuum. Aligned grooves and ridges on the PTFE film 
surface were obtained by rubbing with a cotton cloth. PTFE film thickness and growth 
temperature elevation influence anisotropy of pentacene film. A dichroic ratio about of 2 
was obtained even when the substrate was held at room temperature. The pentacene film 
is oriented on the molecular level.  
The strength of this technique is that the vacuum deposited, and rubbed PTFE layers 
have a higher orienting power than friction transferred PTFE layers so that they may 
favorably be used in OFETs as bottom gate dielectric which induces enhanced order in 
the channel material deposited on top of them. In addition, the vacuum deposited PTFE 
layers can also be used in top gate geometry, i.e. by deposition on top of OFET channel 
materials on plastic substrates.  
5. Acknowledgements 
The authors would like to thanks to Dagmar Stabenow (University of Potsdam), 
Ramakrishna Velagapudi (University of Applied Sciences Wildau) for the AFM and 
optical measurements and Dr. Oleg Dimitiriev (Institute of Semiconductor Physics, 
Kyiv) for the fruitful discussions. Financial support of the European Commission under 
contract number: HPRN-CT-2002-00327-RTN EUROFET and of Federal Ministry of 
Education and Research (BMBF) Project under no. Ukr 04/004 is gratefully 
acknowledged. 
6. References 
1. Daraktchlev M, von Muchlenen A, Nuesch F. New J. of Physics 2005; 7:113. 
2. Mattis BA, Pei Y, Subramanian V. Appl.Phys. Lett. 2005; 86: 033113. 
3. Misaki M, Ueda Y. Appl. Phys. Lett. 2005; 87:243503. 
4. Gritsenko KP, Krasovsky AM. Chem. Rev. 2004; 103(9):3607. 
5. Wittmann JC, Smith P. Nature 1991; 352:414. 
6. Moulin JF, Brinkmann M, Thierry A, Wittmann JC. Adv. Mater.2002; 14(6):436. 
7. Brinkmann M, Graff S, Straupe C, Wittmann JC. J.Phys.Chem. 2003; 
B107:10531. 
8. Tanaka T, Honda Y, Ishitobi M. Langmuir 2002; 17:2192. 
9. Gritsenko KP, Slominski Yu L, Tolmachev AI, Tanaka T, Schrader S. Proc.SPIE 
2002; 4833: 482. 
10. Gritsenko KP, Grinko DO, Dimitrev OP, Schrader S, Thierry A, Wittmann JC. 
Optical Memory and Neutral Networks 2004; N3:135. 
11. Roeges NP. G. A Guide to the Complete Interpretation of Infrared Spectra of 
Organic Structures, Wiley: New York (1994). 
12. Liang CY, Krimm S J. J. Chem. Phys. 1956; 25:563. 
13. Gritsenko KP, Lantoukh GV. J. Applied Spectroscopy. 1990; 52:677. 
14. Brinkmann M, Videva VS, Bieber A. J. Phys. Chem. 2004; A108:8170. 
15. Ruiz R, Chouldhary D, Nickel B. Chem. Mater. 2004; 16:4497. 
16. Pratontep S, Nüesch F, Zuppiroli L, Brinkmann M. Phys. Rev.2005; B 
72:085211.
ABSTRACT
  We investigated structure and morphology of PTFE layers deposited by vacuum
process in dependence on deposition parameters: deposition rate, deposition
temperature, electron activation energy and activation current. Pentacene (PnC)
layers deposited on top of those PTFE films are used as a tool to demonstrate
the orienting ability of the PTFE layers. The molecular structure of the PTFE
films was investigated by use of infrared spectroscopy. By means of
ellipsometry, values of refractive index between 1.33 and 1.36 have been
obtained for PTFE films in dependence on deposition conditions. Using the cold
friction technique orienting PTFE layers with unidirectional grooves are
obtained. On top of these PTFE films oriented PnC layers were grown. The
obtained order depends both on the PTFE layer thickness and on PnC growth
temperature.

<|endoftext|><|startoftext|>
Introduction
The following notations are used:
(n) stands for
n1+...+np=n
n1, . . . , np ∈ N0 and
without any indices means
(n). The notation D ≥ 0
is also used for non–symmetrical matrices Dp×p with only non–negative eigenvalues.
The spectral norm of a p× p–matrix B is denoted by ‖B‖, I or Ip is always an identity
matrix and Cp is the p–cube (−π, π]p.
The Laplace transform (L.t.) of a p–variate non–central Γp(α,Σ,∆)–density with
α > 0, Σ > 0 and a non–centrality matrix ∆ ≥ 0 was originally obtained from the L.t.
of a non-central Wp(2α,Σ,∆)–Wishart distribution (with an additional scale factor 2)
and is given by
f̂(t1, . . . , tp;α,Σ,∆) = |Ip +ΣT |−αetr(−ΣT (I +ΣT )−1∆), (1)
T = diag(t1, . . . , tp), t1, . . . , tp ≥ 0.
This function f̂ is generally the L.t. of the density of a real measure on (0,∞)p which
is not always a probability measure. The term ”Γp(α,Σ,∆)–distribution” is used here
in this general sense. The exact set of values α, leading to a probability density (pdf)
f(x1, . . . , xp;α,Σ,∆), depends on Σ and presumedly on ∆. To obtain a pdf, all positive
integers 2α (degrees of freedom) are admissible and all 2α > p − 1. Moreover, in the
central case all non–integer values 2α > p− 2 ≥ 0 are allowed. For p− 2 < 2α < p− 1
http://arxiv.org/abs/0704.0539v1
see Royen (1997). Furthermore all α > 0 are admissible if |I + ΣT |−1 is infinitely
divisible. Two characterizations of infinite divisibility of a Γp(α,Σ)–distribution are
found in Griffiths (1984) and Bapat (1989). Further conditions for admissible non–
integer 2α < p− 2 are given in Royen (1997), (2006).
Three integral representations by integration over Cp are provided by theorem 2 in
section 4 for the functions
F (x1, . . . , xp;α1, . . . , αn,Σ1, . . . ,Σn,∆1, . . . ,∆n) (2)
. . .
f(ξ1, . . . , ξp;α1, . . . , αn,Σ1, . . . ,Σn,∆1, . . . ,∆n)dξ1 . . . dξp,
where f has the L.t.
|Ip +ΣkT |−αketr(−ΣkT (I +ΣkT )−1∆k), (3)
α1, . . . , αn > 0,Σ1, . . . ,Σn > 0,∆1, . . . ,∆n ≥ 0.
Thus, F is not always the cumulative distribution function (cdf) of a probability measure.
In particular letXp×n be aNp×n(Mp×n,Σp×p⊗In)–random matrix and An×n ≥ 0 of
rank q with T ′AT = Λ = diag(λ1, . . . , λn), λ1 ≥ . . . ≥ λn. Then the joint distribution of
the diagonal elements of the generalized quadratic form 1
XAX ′ equals the distribution
of the diagonal of 1
Y ΛY ′ with a Np×n(MT,Σp×p⊗In)–distributed Y = XT . This is the
distribution of a sum of q independent Γp(
, λkΣ,∆k =
−1)–random vectors,
where µ∗k is the k–th column of M
∗ = MT . This joint distribution of p quadratic
forms of normal random vectors is comprised within theorem 2 as a special case with
, Σk = λkΣ, k = 1, . . . , q. For methods under more general assumptions see also
Blacher (2003). For a survey of univariate quadratic forms of normal random variables
see chapter 4 in Mathai and Provost (1992). For several quadratic forms of skew elliptical
distributions see B.Q. Fang (2005).
In Royen (1991), (1992) three different types of series expansions for the χ2p(2α,Σ)–
cdf were derived from three different representations of the χ2p(2α,Σ)–L.t. which are
extended to the general Γp(α,Σ,∆)–L.t. in section 3 in a similar way as in Royen
(1995).
Some series expansions, closely related to the first two types, are already found in
Khatri, Krishnaiah and Sen (1977). The third type was introduced because of its superior
convergence properties. The simple method to transform many series expansions into
integrals over Cp is explained in more detail in section 2 and summarized in theorem 1.
The idea is as follows:
If A(z1, . . . , zp) and B(z1, . . . , zp) are analytical functions whose power series have
the coefficients a(m1, . . . ,mp) and b(n1, . . . , np) and which are absolutely convergent for
max |zj | < rA and max |zj | < rB respectively, where r−1B < rA, then
(2π)−p
A(y1, . . . , yp)B(y
1 , . . . , y
p )dϕ1 . . . dϕp (4)
a(n1, . . . , np)b(n1, . . . , np)
holds with yj = re
iϕj , −π < ϕj ≤ π, j = 1, . . . , p and r−1B < r < rA.
The integrals in (4) might be more economical than the series if the generating
functions A and B are simple available functions and if the series are slowly convergent
with very intricate coefficients. For non–central multivariate gamma distributions series
expansions are practically not feasible.
The integral representations in theorem 2 of section 4 are of the type in (4). As
long as no elementary density formulas are availale it should be a reasonable way to
obtain the joint cdf by integration of elementary terms only over Cp and not over Rp as
by the Fourier or Laplace inversion formula. A single Γp(α,Σ,∆)–cdf is represented by
a (p− 1)–variate integral over Cp−1 in section 5.
A totally different
–variate integral representation of the Γp(α,Σ)–cdf has
been given recently by Royen (2006), which is based on m–factorial decompositions∑
p×p = D −BB′, where D is a real or complex diagonal matrix minimizing the rank
m of Σ−1 −D. Approximations to a Γp(α,Σ)–cdf are obtained by m–factorial approx-
imations to Σ with a low value of m. These approximations are improved further by
successive correction terms.
2. The method
Theorem 1 in this section can be generalized in many ways, e.g. for Fourier trans-
forms, but the version below is sufficient for the purpose of the underlying paper.
Let f̂(t1, . . . , tp), t1, . . . , tp ≥ 0, be a given L.t. of an unknown function f(x1, . . . , xp)
with f = 0 for minxj < 0. It is assumed that there are univariate L.t. ĝj0(t) of
some probability densities gj0(x) on (0,∞) and further functions hj(t) with |hj(t)| ≤ 1,
uniformly for t ≥ 0, which enable a representation
f̂(t1, . . . , tp) =
ĝj0(tj)
B (h1(t1), . . . , hp(tp)) (5)
with an analytical function B(z1, . . . , zp) whose power series expansion
b(n1, . . . , np)
j (6)
is absolutely convergent for |z1|, . . . , |zp| < rB with a certain value rB > 1.
Furthermore, the products ĝj0(t)(hj(t))
n are supposed to be the L.t. of continuous
functions gjn(x), x > 0, which satisfy the conditions
|gjn(x)| ≤ nck(x) with a constant c and
(7)∫ ∞
k(x)e−txdx < ∞ for all t > 0 .
Hence, the generating functions (generators)
gj(x, y) =
gjn(x)y
n, j = 1, . . . , p, (8)
are defined for all x > 0 and |y| < 1, and they have the L.t.
ĝj(t, y) =
ĝj0(t)
1− yhj(t)
, t ≥ 0. (9)
Theorem 1. Under the assumptions from (6) and (7) f̂ in (5) is the L.t. of
f(x1, . . . , xp) = (2π)
B(y−11 , . . . , y
gj(xj , yj)dϕj (10)
with yj = re
iϕj , −π < ϕ ≤ π, r−1B < r < 1, gj from (8).
Proof. The integral in (10) is evaluated by
(2π)−p
b(m1, . . . ,mp)
gjnj (xj)y
 dϕ1 . . . dϕp
b(n1, . . . , np)
j=1 gjnj (xj)
and this series has the L.t. from (5).
Some further remarks: With
Gj(xj , yj) =
gj(ξ, yj)dξ (11)
instead of the gj in (10), the corresponding representation arises for
F (x1, . . . , xp) =
. . .
f(ξ1, . . . , ξp)dξ1 . . . dξp. (12)
If the series in (8) are absolutely convergent for all y ∈ C then additionally
hj(t) = 0 (13)
is supposed to hold. Then the rhs of (9) is the L.t. of gj(x, y) for any fixed y and all
sufficiently large t.
In some cases the functions gj0 and their L.t. ĝj0 are known from univariate marginal
distributions apart from some scale factors. If the functions uj = hj(t) are explicitly
invertible then
B(u1, . . . , up) =
h−11 (u1), . . . , h
p (up)
j=1 ĝj0
h−1j (uj)
) (14)
can sometimes be found easily from the given f̂ .
3. Three representations for the Γp(α,Σ,∆)–Laplace transform
and the related generators
With any v > 0 we define
zj = (1 + v
−1tj)
−1, tj ≥ 0, uj = 1− zj = v−1tjzj , ωj = zj − uj ,
Z = diag(z1, . . . , zp), U = diag(u1, . . . , up), Ω = diag(ω1, . . . , ωp).
The scale factor v is introduced to obtain ‖B‖ < 1 for the matrices B defined in (20)
below and to effect the convergence of some series expansions. For a more general scaling
see remarks following theorem 2 in section 4.
From the relations
v−1T = UZ−1, Ip = Z + U, Ω = Z − U, (16)
it follows for the matrices I +ΣT in the L.t. (1):
I +ΣT = I + vΣUZ−1 = (Z + vΣU)Z−1 (17)
Z + vΣU =
I + (vΣ− I)U, (18a)
vΣ(I + (v−1Σ−1 − I)Z), (18b)
(I + vΣ)(I + (2(I + vΣ)−1 − I)Ω), (18c)
and therefore
|I +ΣT |−α = cα|Z|α|I +BY |−α (19)
Y = U, B = vΣ− I, c = 1, (20a)
Y = Z, B = (vΣ)−1 − I, c = |I +B|, (20b)
Y = Ω, B = 2(I + vΣ)−1 − I, c = |I +B|. (20c)
It should be noticed that ‖B‖ < 1 in (20c) for every v > 0 and Σ > 0.
Now, using (16), by a straightforward calculation the L.t. in (1) can be represented
f̂(t1, . . . , tp;α,Σ,∆) =
|Z|α|I +BU |−αetr(−(I +B)U(I +BU)−1∆), (21a)
|I +B|αetr(−∆)|Z|α|I +BZ|−αetr(Z(I +BZ)−1(I +B)∆), (21b)
|I +B|αetr(− 1
∆(I −B)) (21c)
·|Z|α|I +BΩ|−αetr(1
Ω(I +BΩ)−1(I +B)∆(I −B)),
with the corresponding matrices B from (20) and Z,U,Ω from (15).
For the former series expansions the following relations were used:
Laplace transform f̂(t): f(x): F (x) =
f(ξ)dξ:
zαun vg
α+n(vx) G
α+n(vx) (22a)
zα+n vgα+n(vx) Gα+n(vx) (22b)
zαωn vhα,n(vx) Hα,n(vx) (22c)
where z = (1 + v−1t)−1, gα+n(x) = e
−xxα−1+n/Γ(α+ n),
α+n(x) =
gα+n(x) =
α− 1 + n
L(α−1)n (x)gα(x)
with the generalized Laguerre polynomials L
(α−1)
n and
hα,n(x) = (−1)n
α− 1 + n
L(α−1)n (2x)gα(x).
The last identity is verified by L.t.
The following bounds are derived from (22.14.13) in Abramowitz and Stegun (1965):
∣∣∣g(n)α+n(x)
∣∣∣ ≤
ex/2gα(x), α ≥ 1
2nα−1ex/2gα(x), 0 < α < 1
|hα,n(x)| ≤
xα−1/Γ(α), α ≥ 1
2nxα−1/Γ(α+ 1), 0 < α < 1
, (24)
matching with the conditions in (7).
The following generators (generating functions) with the Γ(α+n)–cdf Gα+n(x) are
required for the formulas in theorem 2:
Fα(x, y) =
n=0 G
α+n(x)y
n = 1
, |y| < 1, (25a)
n=0 Gα+n(x)y
n = Gα(x, y), y ∈ C, (25b)
n=0 Hα,n(x)y
n = 1
x, 2y
, |y| < 1 (25c)
The identities (a) and (c) are verified by the L.t. of fα(x, y) =
Fα(x, y). A short
calculation shows
Gα(x, y) =
Gα(x) − y1−αe(y−1)x Gα(xy)
, y 6= 1, α > 0
Gα−1(x) − y1−αe(y−1)x Gα−1(xy)
, α ≥ 1, G0 := 1
xgα(x) + (1 + x− α) Gα(x), y = 1
gα(x, y) =
Gα(x, y) =
gα(x) + y
1−αe(y−1)x Gα(xy), α > 0
y1−αe(y−1)x Gα−1(xy), α ≥ 1
The functions Fα(x, y) are especially simple for α ∈ N since Gα(z) = 1−e−z
j=0 z
j/j!,
α ∈ N.
Besides,
Gk+1/2(z) = erf(z
1/2)− e−z
zj−1/2
Γ(j + 1/2)
, k ∈ N0.
The following simple lemma is used for the proof of theorem 2.
Lemma 1. If B is a symmetrical p× p–matrix with ‖B‖ < 1 and
Y = diag(y1, . . . , yp) then the power series expansion
|I +BY |−α =
b(n1, . . . , np)
is absolutely convergent for max |yj | < rB = ‖B‖−1.
This follows from
(n) |b(n1, . . . , np)| = O(ϑn) with any ϑ > ‖B‖, which has been
already shown in (2.1.16) . . . (2.1.18) in Royen (1991) (with the notation −C instead of
4. The integral representations
In theorem 2 below the functions F (x1, . . . , xp;α1, . . . , αn,Σ1, . . . ,Σn,∆1, . . . ,∆n)
from (2) are represented by three different integrals over Cp = (−π, π]p. Together with
the generators Fα from (25), α =
k=1 αk, the following matrices are used with a scale
factor v to enforce ‖Bk‖ < 1:
Bk = vΣk − I, Dk = ∆k(I +Bk), Fα from (25a), (27a)
Bk = (vΣk)
−1 − I, Dk = (I +Bk)∆k, Fα from (25b), (27b)
Bk = 2(I + vΣk)
−1 − I, Dk = 12 (I +Bk)∆k(I −Bk), Fα from (25c). (27c)
Furthermore, we define λmax = max ‖Σk‖ , λ−1min = max ‖Σ
k ‖, yj = reiϕj ,
−π < ϕj ≤ π, Y = diag(y1, . . . , yp),
K = K(y1, . . . , yp) =
etr(±(Y +Bk)−1Dk)|I +BkY −1|−αk ,
where the negative sign occurs only with Bk, Dk from (27a), and
Fαdϕ =
j=1 Fα(vxj , yj)dϕj .
Theorem 2. With the above notations the functions F from (2) are respresentable
by each of the following three integrals:
(2π)−p
KFαdϕ, (28)
Fα from (25a), Bk, Dk from (27a), ‖Bk‖ < 1 if v < 2λ−1max, max ‖Bk‖ < r < 1,
etr(−∆k)|I +Bk|αk
(2π)−p
KFαdϕ, (29)
Fα from (25b), Bk, Dk from (27b), ‖Bk‖ < 1 if v > 12λ
min, max ‖Bk‖ < r,
∆k(I −Bk)
|I +Bk|αk
(2π)−p
KFαdϕ, (30)
Fα from (25c), Bk, Dk from (27c), v > 0, max ‖Bk‖ < r < 1.
Proof. Because of lemma 1 the assumptions of theorem 1 are satisfied with
ĝj0(t) = z
j = (1 + v
−1tj)
−α and hj(tj) corresponding to zj or uj = v
−1tjzj = 1 − zj
or ωj = zj − uj respectively. The functions ĝj0(t)(hj(t))n are the L.t. of the functions
in the second column of (22) from which type (a) and (c) have the bounds in (23), (24),
satisfying the condition (7) for theorem 1. The series
n=0 Gα+n(x)y
n = Gα(x, y) in
(25b) is absolutely convergent for every y ∈ C. Thus, all r > max ‖Bk‖ are admissible
in (29). In (30) we have max ‖Bk‖ < 1 for every v > 0. Hence, theorem 1 together with
the respresentations of the L.t. in (21) implies (28), (29) and (30).
The univariate case of (29) provides
F (x;α1, . . . , αn, σ
1 , . . . , σ
1 , . . . , δ
n) = (31)(
σ−2αkk e
Gα(vx, e
δ2k/(1 + vσ
iϕ − 1))
1 + (v−1σ−2k − 1)e−iϕ
with 2v > maxσ−2k , r = 1, Gα from (26). With p = 1 similar formulas arise from (28)
or (30).
The cdf of a quadratic form 1
x′Ax with T ′AT = diag(λ1, . . . , λn) ≥ 0 of rank q
and a N (µ, σ2In)–random vector x is a special case of (31) with αk = 12 , σ
k = λkσ
2 and
non–centrality parameters δ2k =
µ∗2k /σ
2, k = 1, . . . , q, µ∗ = T ′µ.
Some further remarks: In (29) also ‖Bk‖ > 1 is allowed since every r = ‖Y ‖ >
max ‖Bk‖ is admissible, which entails max ‖BkY −1‖ < 1.
With ϑ = λmax/λmin it follows with special values of v:
max ‖Bk‖ ≤
in (28) with v = 2(λmin + λmax)
max ‖Bk‖ ≤
in (29) with v =
(λ−1min + λ
max),
max ‖Bk‖ ≤
ϑ− 1√
in (30) with v = (λminλmax)
−1/2.
More generally, the scale factor v = w2 can be replaced by a scale matrix
W 2 = diag(w21 , . . . , w
p) > 0. Then with Tw = W
−1TW−1, Σw = WΣW , ∆w = W∆W
the L.t. (1) equals
|I +ΣwTw|−α etr(−ΣwTw(I +ΣwTw)−1∆w). (32)
Consequently, besides the substitutions vΣk → WΣkW , ∆k → W∆kW−1, the matrices
I + Bk in theorem 2 must be replaced by WΣkW , (WΣkW )
−1 and 2(I + WΣkW )
respectively, and the generators Fα(vxj , yj) by Fα(w
jxj , yj).
In particular for a single Γp(α,Σ,∆)–distribution this more general scaling can be
used to minimize ‖B‖ or for a ”natural scaling” i.e. to standardize I+B to a correlation
matrix. However, ‖B‖ < 1 must be taken into account in (28), whereas this condition
is satisfied in (30) for every scaling. It was shown in Royen (1991) that natural scaling
can always be accomplished also in I +B = 2(I +WΣW )−1 by a unique W 2.
5. Representations of the Γp(α,Σ,∆) distribution function by (p− 1)–variate
integrals
For a single Γp(α,Σ,∆)–cdf it is always possible to perform the integration over a
single variable ϕj within the integrals from theorem 2.
We use the following functions
Gα(x, y) = e−y
Gα+n(x)
α+n(x)
(−y)n
x, y ∈ C, Gα+n, G(n)α+n from (22), and
G∗α(x, y) = eyGα(x, y).
For positive half integers α = 1/2 + k these functions can also be computed by the
erf–function and a sum of k terms which are essentially given by the modified Bessel
functions Ij−1/2(2(xy)
1/2), j = 1, . . . , k, (see e.g. Royen (1995) or (2006)).
Now let be W 2 = diag(w21 , . . . , w
p) a general scale matrix,
Y = diag(y1, . . . , yp), yj = re
iϕj , −π < ϕj ≤ π,
Bpp bp
b′p bpp
WΣW − I, (34a)
(WΣW )−1 − I, (34b)
2(I +WΣW )−1 − I, (34c)
Dpp dp
dp dpp
W∆ΣW, (35a)
W−1Σ−1∆W−1, (35b)
2(I +WΣW )−1W∆W−1(I − (I +WΣW )−1), (35c)
y0 = y0(y1, . . . , yp−1) = b
p(Ypp +Bpp)
−1bp − bpp (36)
q = q(y1, . . . , yp−1) = (b
p(Ypp +Bpp)
−1,−1)D
(Ypp +Bpp)
Kα = Kα(y1, . . . , yp−1) = etr(±(Ypp +Bpp)−1Dpp)|I +BppY −1pp |−α,
where the negative sign is only taken for Bpp from (34a).
Theorem 3. With the above notations the Γp(α,Σ,∆)–cdf F (x1, . . . , xp;α,Σ,∆)
is given by each of the following three integrals:
(2π)p−1
w2pxp
1− y0
1− y0
1− yj
w2jxj ,
yj − 1
dϕj , (38)
B from (34a), D from (35a), ‖B‖ < r < 1,
etr(−W∆W−1)
|WΣW |α ·
(2π)p−1
(1− y0)−α G∗α
(1− y0)w2pxp,
1− y0
jxj , yj)dϕj ,
B from (34b), D from (35b), ‖B‖ < r,
2αpetr(− 1
W∆W−1(I −B))
|I +WΣW |α ·
(2π)p−1
1− y0
(1− y0)−αGα
1− y0
1 + y0
w2pxp,
1− y20
1 + yj
w2jxj ,
yj + 1
dϕj ,
B from (34c), D from (35c), ‖B‖ < r < 1.
For the proof of theorem 3 the following two lemmas are required.
Lemma 2. With Y = diag(y1, . . . , yp), yj = re
iϕj , ‖B‖ < r, B,D, y0, q from (34),
(35), (36), (37) the following decomposition is obtained
etr((Y +B)−1D)|Y +B|−α
= etr
(Ypp +Bpp)
−1Dpp
|Ypp +Bpp|−α exp
yp − y0
(yp − y0)−α.
Proof. From frequently used formulas for p× p–matrices, (see e.g. complements
and problems 2.4, 2.7 in chapter 1b of Rao (1973)) it follows for
A = Y +B =
App bp
b′p yp + bpp
|A| = |App|(yp + bpp − b′pA−1pp bp) = |Ypp +Bpp|(yp − y0),
A−1 =
A−1pp +
yp−y0
A−1pp bpb
pp − 1yp−y0A
pp bp
yp−y0
yp−y0
trace(A−1D)
= trace
A−1pp Dpp +
yp − y0
A−1pp bpb
pp Dpp −A−1pp bpdp
yp − y0
(dpp − b′pA−1pp dp)
= trace(A−1pp Dpp) +
yp − y0
, which implies (41).
Lemma 3. Let be q any number, Sr = {y ∈ C
∣∣|y| = r}, y0 any number with
|y0| < r, then with Fα from (25), Gα,G∗α from (33), and the negative sign in ±q only for
(42a)
y − y0
Fα(x, y)(y − y0)−αyα−1dy
Fα from (25a), r < 1, (42a)
(1− y0)−α G∗α
(1 − y0)x, q1−y0
, Fα from (25b), (42b)
(1− y0)−α Gα
x, 2q
, Fα from (25c), r < 1, (42c)
Proof. It is sufficient to verify (42) for the corresponding derivatives fα =
At first, (42a) is shown:
With Fα from (25a) and the binomial series for (1− y0/y)−(α+n) we obtain
fα(x, y)(y − y0)−(α+n)yα−1dy
α+m(x)y
α+ n+ k − 1
y−n−1dy.
With z = (1 + t)−1, u = tz, the last integral has the L.t.
(uy)m
α+ n+ k − 1
y−n−1dy
m=n+k
α+ n+ k − 1
yk0 = z
αun(1− uy0)−(α+n).
Multiplication by (−q)n/n! and summation over n leads to the L.t.
(1− uy0)α
1− uy0
(1 + (1− y0)t)α
(1− y0)t
1 + (1− y0)t
and this is the L.t. of ∂
To verify (42b) we obtain with Fα from (25b):
fα(x, y)(y − y0)−(α+n)yα−1dy
gα+m(x)y
Γ(α+ n+ k)
Γ(α+ n)k!
y−n−1dy
m=n+k
gα+m(x)
Γ(α + n+ k)
Γ(α+ n)k!
yk0 = gα+n(x)e
= (1− y0)−(α+n)(1− y0)gα+n((1 − y0)x).
Multiplication by qn/n! and summation provides (1− y0)−α ∂∂x G
(1− y0)x, q1−y0
(42c) can be shown by L.t. in a similar way as (42a).
Proof of theorem 3. Without loss of generality yp is selected from the variables
yj = re
iϕj in Y = diag(y1, . . . , yp) with any fixed r > ‖B‖. If yp is replaced by a variable
y with any |y| then the equation
|Y +B| = |Ypp +Bpp|(yp + bpp − b′p(Ypp +Bpp)−1bp) = 0
has always a unique solution
y = y0 = b
p(Ypp +Bpp)
−1bp − bpp
with |y0| < r since ‖Bpp‖ ≤ ‖B‖.
Hence, with lemma 2 and lemma 3, theorem 3 is obtained by integration over ϕp in
the integrals of theorem 2 with n = 1.
References
Abramowitz, M. and Stegun, I.A. (1968). Handbook of Mathematical Functions, Dover, New York.
Bapat, R.B. (1989). Infinite divisibility of multivariate gamma distributions and M–matrices, Sankhyā,
Series A 51, 73–78.
Blacher, R. (2003). Multivariate quadratic forms of random vectors, Journal of Multivariate Analysis
87, 2–23.
Fang, B.Q. (2005). Noncentral quadratic forms of the skew elliptical variables, Journal of Multivariate
Analysis 95, 410–430.
Griffiths, R.C. (1984). Characterization of infinitely divisible multivariate gamma distributions, Journal
of Multivariate Analysis 15, 13–20.
Khatri, C.G., Krishnaiah, P.R. and Sen, P.K. (1977). A note on the joint distribution of correlated
quadratic forms, Journal of Statistical Planning and Inference 1, 299–307.
Krishnamoorthy, A.S. and Parthasarathy, M. (1951). A multivariate gamma type distribution, Annals
of Mathematical Statistics 22, 549–557 (correction: ibid. (1960), 31, p. 229).
Mathai, A.M. and Provost, S.B. (1992). Quadratic forms in random variables: Theory and applications,
Marcel Dekker, New York.
Rao, C.R. (1973). Linear Statistical Inference and its Applications, 2nd edition, Wiley, New York.
Royen, T. (1991). Expansions for the multivariate chi–square distribution, Journal of Multivariate
Analysis 38, 213–232.
Royen, T. (1992). On representation and computation of multivariate gamma distributions, in: Data
Analysis and Statistical Inference - Festschrift in Honour of Friedhelm Eicker, 201–216, Verlag
Josef Eul, Bergisch Gladbach, Köln.
Royen, T. (1995). On some central and non–central multivariate chi–square distributions, Statistica
Sinica 5, 373–397.
Royen, T. (1997). Multivariate gamma distributions (Update), Encyclopedia of Statistical Sciences,
Update Volume 1, 419–425, Wiley, New York.
Royen, T. (2006). Integral representations and approximations for multivariate gamma distributions,
Annals of the Institute of Statistical Mathematics, DOI 10.1007/s10463-006-0057-5.
	Introduction 
	The method 
	Three representations for the p(,,)–Laplace transform and the related generators
	The integral representations
	Representations of the p(,,) distribution function by (p-1)–variate integrals
ABSTRACT
  Three types of integral representations for the cumulative distribution
functions of convolutions of non-central p-variate gamma distributions are
given by integration of elementary complex functions over the p-cube Cp =
(-pi,pi]x...x(-pi,pi]. In particular, the joint distribution of the diagonal
elements of a generalized quadratic form XAX' with n independent normally
distributed column vectors in X is obtained. For a single p-variate gamma
distribution function (p-1)-variate integrals over Cp-1 are derived. The
integrals are numerically more favourable than integrals obtained from the
Fourier or laplace inversion formula.

<|endoftext|><|startoftext|>
Introduction
	The Channel Model
	An Achievable Rate Region for the Discrete Memoryless IC-DMS
	Random Codebook Generation
	Encoding and Transmission
	Decoding
	Evaluation of Probability of Error
	Relating with Existing Rate Regions
	A Subregion of R
	A Subregion of Rsim
	The Gaussian IC-DMS
	The Channel Model of the GIC-DMS
	Achievable Rate Regions for the GIC-DMS
	Gaussian Extension of R
	Gaussian Extension of Rsuc
	Numerical Examples
	Comparing with Rate Regions in Tarokh06:icdmscog
	Comparing with Rate Regions in jovicic06:cogICDMS,wuwei06icdms
	Conclusions
	References
ABSTRACT
  The interference channel with degraded message sets (IC-DMS) refers to a
communication model in which two senders attempt to communicate with their
respective receivers simultaneously through a common medium, and one of the
senders has complete and a priori (non-causal) knowledge about the message
being transmitted by the other. A coding scheme that collectively has
advantages of cooperative coding, collaborative coding, and dirty paper coding,
is developed for such a channel. With resorting to this coding scheme,
achievable rate regions of the IC-DMS in both discrete memoryless and Gaussian
cases are derived, which, in general, include several previously known rate
regions. Numerical examples for the Gaussian case demonstrate that in the
high-interference-gain regime, the derived achievable rate regions offer
considerable improvements over these existing results.

<|endoftext|><|startoftext|>
Introduction
The additive group of integers modulo n will be denoted by Zn.
Let G be a finite Abelian group and let X ⊂ G. The subgroup generated by a subset X of G
will be denoted 〈X〉. For a positive integer k, we shall write
k ∧X =
A ⊂ X and |A| = k
Following the terminology of [12] we write
k ∧X.
The set X is said to be complete if SX = 〈X〉. The reader may find the connection between this
notion and the corresponding notion for integers in [12]. We shall also write
S0X = SX ∪ {0}.
Note that S0X =
x∈X{0, x}.
Université Pierre et Marie Curie, E. Combinatoire, Case 189, 4 Place Jussieu, 75005 Paris, France.
yha@ccr.jussieu.fr
Universitat Politècnica de Catalunya, Dept. Matemàtica Apl. IV; Jordi Girona, 1, E-08034 Barcelona, Spain.
allado@ma4.upc.edu
Universitat Politècnica de Catalunya, Dept. Matemàtica Apl. IV; Jordi Girona, 1, E-08034 Barcelona, Spain.
oserra@ma4.upc.edu
http://arxiv.org/abs/0704.0541v1
Let p denote a prime number and let A ⊂ Zp \ {0}. Erdős and Heilbronn [4] showed that A is
complete if |A| ≥
p, and conjectured that
18 can be replaced by 2. This conjecture was
proved by Olson[8]. More precisely, Olson’s Theorem states that A is complete if |A| ≥
4p− 4.
This result was sharpened by Dias da Silva and one of the authors [1] by showing that |k∧A| = p,
if |A| ≥
4p − 4, where k = ⌈
p− 1 ⌉. They also showed that |(j ∧ A) ∪ ((j + 1) ∧A)| = p, if
|A| ≥
4p− 8, where j = ⌈
p− 2 ⌉.
Let G be a finite abelian group and let A ⊂ G\{0}. Complete sets for general abelian group were
investigated by Diderrich and Mann [3]. Diderrich [2] proved that, if |G| = pq is the product of
two primes, then A is complete if |A| ≥ p+ q − 1.
Let p be the smallest prime dividing |G|. Diderrich conjectured [2] that A is complete, if |G|/p
is composite and |A| = p+ |G|/p− 2. This conjecture was finally proved by Gao and one of the
authors [5]. More precise results were later proved by Gao and the present authors [6]. Note
that the bound of Diderrich is best possible, since one may construct non complete sets of size
p+ |G|/p − 3.
However the result of Olson was extended recently by Vu [13] to general cyclic groups. Let
A ⊂ Zn be such that all the elements of A are coprime with n. Vu proved that there is an
absolute constant c such that, for an arbitrary large n, A is complete if |A| ≥ c
n. The proof of
Vu is rather short and depends on a recent result of Szemerédi and Vu [11]. In the same paper
Vu conjectures that the constant is essentially 2.
Our main result is the following:
Theorem 1.1 Let A be a subset of Zn be such that all the elements of A are coprime with n.
If |A| > 1 + 2
n− 4 then A is complete.
This result implies the validity of the last conjecture of Vu. We conjecture the following:
Conjecture 1.2 Let A ⊂ Zn be such that all the elements of A are coprime with n and |A| ≥√
4n− 4. Then |k ∧A| = n, where k = ⌈
n− 1 ⌉.
2 Some tools
In this section we present known material and some easy applications of it. We give short proofs
in order to make the paper self-contained.
Recall the following well-known and easy lemma.
Lemma 2.1 Let G be a finite group. Let X and Y be subsets of G such that X + Y 6= G. Then
|X|+ |Y | ≤ |G|.
Proof. Take a ∈ G \ (X + Y ). We have (a− Y ) ∩X = ∅. ✷
We use also the Chowla’s Theorem [7, 10] :
Theorem 2.2 (Chowla [7, 10]) Let n be a positive integer and let X and Y be non-empty
subsets of Zn. Assume that 0 ∈ Y and that the elements of Y \ {0} are coprime with n. Then
|X + Y | ≥ min(n, |X| + |Y | − 1).
Proof. The proof is by induction on |Y |, the result being obvious for |Y | = 1. Assume first
that Y ⊂ X − x, for all x ∈ X. Then X + Y ⊂ X, and hence X + Y = X. It follows that
X + Y = X + nY = Zn.
Assume now that Y 6⊂ X − x, for some x ∈ X. Then 0 ∈ Y ∩ (X − x) and |Y ∩ (X − x)| < |Y |.
By the induction hypothesis, |X|+ |Y | − 1 ≤ |((X − x) ∪ Y ) + ((X − x) ∩ Y )| ≤ |(X − x) + Y |.
Let B ⊂ G and x ∈ G. Following Olson, we write
λB(x) = |(B + x) \B|.
The following result is implicit in [8]:
Lemma 2.3 (Olson, [8]) Let Y be a nonempty subset of G \ {0}, z /∈ Y and y ∈ Y . Put
B = S0Y . Then
|B| ≥ |S0Y \{y}|+ λB(y), (1)
|S0Y ∪{z}| = |S
Y |+ λB(z). (2)
Proof. Clearly we have B∩(B−y) ⊂ B\S0
Y \{y}
and hence λB(y) = |B∩(B−y)| ≤ |B|−|S0Y \{y}|.
¿From S0
Y ∪{z}
= B + {0, z} we have |S0
Y ∪{z}
| = |B|+ |(B + z) \B|} = |B|+ λB(z). ✷
We need the following helpful result also due to Olson:
Lemma 2.4 (Olson [8]) Let B and C be nonempty subsets of an abelian group G such that
0 6∈ C. Then,
λB(x) = λB(−x). (3)
λB(x+ y) ≤ λB(x) + λB(y). (4)
λB(x) ≥ |B|(|C| − |B|+ 1). (5)
Proof. For each x ∈ G we have
|(B + x) ∩B| = |B + x| − |(B + x) ∩B|
= |B − x| − |B ∩ (B − x)|
= |B ∩ (B − x)| = λB(−x),
proving (3). Let x, y ∈ G. Then,
λB(x+ y) = |(B + x+ y) ∩B|
= |(B + x) ∩ (B − y)|
= |(B + x) ∩B ∩ (B − y)|+ |(B + x) ∩B ∩ (B − y)|
≤ |(B + x) ∩B|+ |B ∩ (B − y)|
= λB(x) + λB(y),
proving (4). Finally,
λB(x) ≥
(|B + x| − |B ∩ (B + x)|)
≥ |C||B| −
|B ∩ (B + x)|
≥ |C||B| −
x∈G\0
|B ∩ (B + x)|
= |B|(|C| − |B|+ 1),
proving (5). ✷
3 The main result
The next Lemma is the key tool for our main result.
Lemma 3.1 Let A and B be nonempty subsets of Zn. Assume that A ∩ (−A) = ∅ and that
each element in A is coprime with n. Put a = |A| and b = |B|. Assume also that a ≥ 3 and
2b ≤ n+ 2. Then
λB(x) > a−
a(a− 3)
. (6)
In particular, if 2b ≥ a(a− 3), then
λB(x) ≥ a− 1. (7)
Proof. Put A∗ = A ∪ (−A) ∪ {0}. Let t < n be a positive integer and set
t = 2ma+ r, m ≥ 0, 0 ≤ r ≤ 2a− 1.
Let Cj = jA
∗. By Chowla’s theorem, |Cj| ≥ min{n, 2ja + 1} = 2ja + 1, for j ≤ m. Therefore
we can choose a set C ⊃ A∗ of cardinality t + 1 which intersects Cj in exactly 2ja elements
j = 2, . . . ,m, and intersects Cm+1 in exactly r elements. Let E = C \{0}. Let α = max{λB(x) :
x ∈ A}. By (3) we have λB(x) ≤ α, for all x ∈ A∗. For an element x in Cj there are elements
x1, · · · , xj ∈ A∗ such that x = x1+ · · ·+xj. In view of (4) we have λB(x) ≤ λ(x1)+ · · ·+λ(xj) ≤
jα. Therefore,
λB(x) ≤ α2a+ 2α2a + · · ·+mα2a+ r(m+ 1)α
= α(m+ 1)(ma+ r) =
α(t− r + 2a)(t+ r)
≤ α(t+ a)
By using (5) we have
x∈E λB(x)
(t+ a)2
≥ 4ab(t− b+ 1)
(t+ a)2
In particular, since 2b ≤ n+ 2, we can set t = 2b− 3 to get,
α ≥ 4ab(b− 2)
(2b+ a− 3)2
≥ a(b− 2)
(1− a− 3
> a− a(a− 3)
where we have used a ≥ 3. In particular, if 2b ≥ a(a − 3), then α > a − 2 so that α ≥ a − 1.
This completes the proof. ✷
Lemma 3.1 gives the following estimation for the cardinality of the set of subset sums.
Lemma 3.2 Let A ⊂ Zn such that A ∩ (−A) = ∅ and every element of A is coprime with n.
Also assume |A| ≥ 2. Then
|S0A| ≥ min{
n + 2
, 3 +
|A|(|A| − 1)
Proof. We shall prove the result by induction on a = |A|, the result being obvious for a = 2.
Suppose a > 2. Put B = S0A. We may assume b = |B| ≤ n2 + 1 so that 2b ≤ n + 2. By the
induction hypothesis, 2b ≥ 6 + (a− 1)(a− 2) > a(a− 3).
By (7) there is an x ∈ A with λB(x) ≥ a− 1. Then, by Lemma 2.3,
|B| ≥ |S0A\{x}|+ λB(x) ≥ 3 + (a− 2)(a − 1)/2 + a− 1 = 3 +
a(a− 1)
as claimed. ✷
We are now ready for the proof of Theorem 1.1.
Proof of Theorem 1.1. Suppose A non complete and put |A| = k. Let X,Y be disjoint subsets
of A. We clearly have SX + S
Y ⊂ SA 6= Zn. Since |SX | ≥ |S0X | − 1, we have
|S0X |+ |S0Y | ≤ n+ 1, (8)
by Lemma 2.1.
Partition A = A1 ∪ A2 into two almost equal parts, i.e. |A1| = ⌈k/2⌉ and |A2| = ⌊k/2⌋, such
that Ai ∩ (−Ai) = ∅, i = 1, 2.
We must have
3 + ⌊k
⌋ − 1)/2 < (n+ 2)/2, (9)
since otherwise, by Lemma 3.2, we have |S0A1 |+ |S
| ≥ n+ 2, contradicting (8).
Case 1. k even.
Then we have by (9)
n/2 > 2 +
− 1)/2 = 2 + k(k − 2)/8,
and hence (k − 1)2 + 16 ≤ 4n, a contradiction.
Case 2. k odd.
Put a = k−1
. In view of (9), Lemma 3.2 implies
|S0A2 | ≥ 3 + a(a− 1)/2.
By (7) with B = S0A2 , there is a y ∈ A1 such that
λB(y) ≥ a− 1.
Put C1 = A1 \ {y} and C2 = A2 ∪ {y}. Then we have, by Lemma 2.3,
|S0C2 | ≥ |S
|+ λB(y) ≥ 3 + a(a− 1)/2 + a− 1 = 2 +
a(a+ 1)
On the other hand, from (9) and Lemma 3.2 we get
|S0C1 | ≥ 3 +
a(a− 1)
By (8),
n+ 1 ≥ |S0C1 |+ |S
| ≥ 3 + a(a− 1)/2 + 2 + a(a+ 1)/2 = 5 + a2.
Therefore 4n ≥ 16 + 4a2 = 16 + (k − 1)2, a contradiction. This completes the proof. ✷
References
[1] J.A. Dias da Silva and Y. O. Hamidoune, Cyclic spaces for Grassmann derivatives and
additive theory, Bull. London Math. Soc., 26 (1994), 140-146.
[2] G.T. Diderrich, An addition theorem for abelian groups of order pq, J. Number Theory 7
(1975), 33-48.
[3] G. T. Diderrich and H. B. Mann, Combinatorial problems in finite abelian groups, In: ”A
survey of Combinatorial Theory” (J.L. Srivasta et al. Eds.), pp. 95- 100, North- Holland,
Amsterdam (1973).
[4] P. Erdős and H. Heilbronn, On the Addition of residue classes mod p, Acta Arith. 9 (1964),
149-159.
[5] W. Gao and Y.O. Hamidoune, On additive bases, Acta Arith. 88 (1999), 3, 233-237.
[6] W. Gao, Y.O. Hamidoune A. S. Lladó and O. Serra, Covering a finite abelian group by
subset sums. Combinatorica 23 (2003), no. 4, 599–611.
[7] H.B. Mann, Addition Theorems, R.E. Krieger, New York, 1976.
[8] J. E. Olson, An addition theorem mod p, J. Comb. Theory 5 (1968), 45-52.
[9] S. Chowla, A theorem on the addition of residue classes: applications to the number Γ(k)
in Waring’s problem, Proc.Indian Acad. Sc. 2 (1935) 242–243.
[10] M. B. Nathanson, Additive Number Theory. Inverse problems and the geometry of sumsets,
Grad. Texts in Math. 165, Springer, 1996.
[11] E. Szemerédi and V.H. Vu, Long arithmetic progressions in finite and infinite sets, Annals
of Math., to appear.
[12] T. Tao and V.H. Vu, Additive Combinatorics, Cambridge Studies in Advanced Mathematics
105 (2006), Cambridge Press University.
[13] V.H. Vu, Olson Theorem for cyclic groups, Preprint, arXiv:math.NT/0506483 v1, 23 june
2005.
http://arxiv.org/abs/math/0506483
	Introduction
	Some tools
	The main result
ABSTRACT
  A subset $X$ of an abelian $G$ is said to be {\em complete} if every element
of the subgroup generated by $X$ can be expressed as a nonempty sum of distinct
elements from $X$.
  Let $A\subset \Z_n$ be such that all the elements of $A$ are coprime with
$n$. Solving a conjecture of Erd\H{o}s and Heilbronn, Olson proved that
  $A$ is complete if $n$ is a prime and if $|A|>2\sqrt{n}.$
  Recently Vu proved that there is an absolute constant $c$, such that for an
arbitrary large $n$, $A$ is complete if $|A|\ge c\sqrt{n},$ and conjectured
that 2 is essentially the right value of $c$. We show that $A$ is complete if
$|A|> 1+2\sqrt{n-4}$, thus proving the last conjecture.

<|endoftext|><|startoftext|>
Introduction 4
Organization of the paper . . . . . . . . . . . . . . . . . . . . . . . . . 5
Important note added . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
I The theorem 7
1 The set up 7
1.1 The statement of the problem . . . . . . . . . . . . . . . . . . . . 7
1.2 Some convenient choices . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Reduction to the case n even . . . . . . . . . . . . . . . . . . . . 8
2 The theorem 9
2.1 Basic notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Two fundamental definitions . . . . . . . . . . . . . . . . . . . . 11
2.2.1 Definition of v-chain . . . . . . . . . . . . . . . . . . . . . 11
2.2.2 Definition of O-domination . . . . . . . . . . . . . . . . . . 11
2.3 The main theorem and its corollary . . . . . . . . . . . . . . . . . 11
II From geometry to combinatorics 13
3 Reduction to combinatorics 13
3.1 Homogeneous co-ordinate ring of the Schubert variety X(w) . . . 13
3.1.1 The line bundle L on Md(V ) . . . . . . . . . . . . . . . . . 13
3.1.2 The section qθ of L . . . . . . . . . . . . . . . . . . . . . . 13
3.1.3 Standard monomial theory for Md(V ) . . . . . . . . . . . . 14
3.2 Co-ordinate rings of affine patches and tangent cones of X(w) . . 14
3.2.1 Standard monomial theory for affine patches . . . . . . . . . 14
3.2.2 Standard monomial theory for tangent cones . . . . . . . . . 16
4 Further reductions 17
4.1 The main propositions . . . . . . . . . . . . . . . . . . . . . . . . 17
4.2 From the main propositions to the main theorem . . . . . . . . . 18
III The proof 21
5 Terminology and notation 21
5.1 Distinguished subsets . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.1.1 Distinguished subsets of N . . . . . . . . . . . . . . . . . . 21
5.1.2 Attaching elements of I(d, 2d) to distinguished subsets of N 21
5.2 The involution # . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.2.1 The involution # on I(d, 2d) . . . . . . . . . . . . . . . . . 21
5.2.2 The involution # on N and R . . . . . . . . . . . . . . . . 22
5.3 The subset SC attached to a v-chain C . . . . . . . . . . . . . . 22
5.3.1 Vertical and horizontal projections of an element of ON . . 22
5.3.2 The “connection” relation on elements of a v-chain . . . . . 22
5.3.3 The definition of SC . . . . . . . . . . . . . . . . . . . . . 23
5.3.4 The type of an element α of a v-chain C, and the set SC,α 23
6 O-depth 27
6.1 Definition of O-depth . . . . . . . . . . . . . . . . . . . . . . . . 27
6.2 O-depth and depth . . . . . . . . . . . . . . . . . . . . . . . . . . 29
6.3 O-depth and type . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
7 The map Oπ 34
7.1 Description of Oπ . . . . . . . . . . . . . . . . . . . . . . . . . . 34
7.2 Illustration by an example . . . . . . . . . . . . . . . . . . . . . . 35
7.3 A proposition about Sj,j+1 . . . . . . . . . . . . . . . . . . . . . 38
7.4 Proof of Proposition 7.1.1 . . . . . . . . . . . . . . . . . . . . . . 40
7.5 More observations . . . . . . . . . . . . . . . . . . . . . . . . . . 41
8 The map Oφ 42
8.1 Description of Oφ . . . . . . . . . . . . . . . . . . . . . . . . . . 42
8.2 Basic facts about Tw,j,j+1 and T
w,j,j+1 . . . . . . . . . . . . . . . 45
9 Some Lemmas 46
9.1 Lemmas from the Grassmannian case . . . . . . . . . . . . . . . . 46
9.2 Orthogonal analogues of Lemmas of 9.1 . . . . . . . . . . . . . . 49
9.3 Orthogonal analogues of some lemmas in [7] . . . . . . . . . . . . 51
9.4 More lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
10 The Proof 59
10.1 Proof of Proposition 4.1.1 . . . . . . . . . . . . . . . . . . . . . . 59
10.2 Proof that OφOπ = identity . . . . . . . . . . . . . . . . . . . . 62
10.3 Proof that OπOφ = identity . . . . . . . . . . . . . . . . . . . . 63
10.4 Proof of Proposition 4.1.3 . . . . . . . . . . . . . . . . . . . . . . 64
IV An Application 65
11 Multiplicity counts certain paths 65
11.1 Description and illustration . . . . . . . . . . . . . . . . . . . . . 65
11.2 Justification for the interpretation . . . . . . . . . . . . . . . . . 67
References 71
Index of definitions and notation 73
Introduction
In this paper the following problem is solved: given a Schubert variety in an or-
thogonal Grassmannian (by which is meant the variety of isotropic subspaces of
maximum possible dimension of a finite dimensional vector space with a sym-
metric non-degenerate form—see §1 for precise definitions) and an arbitrary
point on the Schubert variety, how to compute the multiplicity, or more gener-
ally the Hilbert function, of the local ring of germs of functions at that point.
In a sense, our solution is but a translation of the problem: we do not give
closed form formulas but alternative combinatorial descriptions. The meaning
of “alternative” will presently become clear.
The same problem for the Grassmannian was treated in [11, 8, 7, 9, 12]
and for the symplectic Grassmannian in [4]. The present paper is a sequel
to [11, 7, 9, 12, 4] and toes the same line as them. In particular, its strategy
is borrowed from them and runs as follows: first translate the problem from
geometry to combinatorics, or, more precisely, apply standard monomial theory
to obtain an initial combinatorial description of the Hilbert function (the earliest
version of the theory capable of handling Schubert varieties in an orthogonal
Grassmannian is to be found in [17]); then transform the initial combinatorial
description to obtain the desired alternative description. But that is easier said
than done.
While the problem makes sense for Schubert varieties of any kind and stan-
dard monomial theory itself is available in great generality [13, 15], the transla-
tion of the problem from geometry to combinatorics has been made—in [14]—
only for “minuscule1 generalized Grassmannians.” Orthogonal Grassmannians
being minuscule, this translation is available to us and we have an initial combi-
natorial description of the Hilbert function. As to the passage from the initial to
the alternative description—and this is where the content of the present paper
lies—neither the end nor the means is clear at the outset.
The first problem then is to find a good alternative description. But how to
measure the worth of an alternative description? The interpretation of multi-
plicity as the number of certain non-intersecting lattice paths (deduced in §11
from our alternative description) seems to testify to the correctness of our al-
ternative description, but we are not sure if there are others that are equally or
more correct.
The proof of the equivalence of the initial and alternative combinatorial de-
scriptions is, unfortunately, a little technically involved. It builds on the details
of the proofs of the corresponding equivalences in the cases of the Grassmannian
and the symplectic Grassmannian. In [10] it is shown that the equivalence in
the case of the Grassmannian is a kind of KRS correspondence, called “bounded
KRS.” The proof there is short and elegant and it would be nice to realise the
main result of the present paper too in a similar spirit as a kind of KRS corre-
spondence.
The initial description is in terms of “standard monomials” and the alterna-
1Symplectic Grassmannians are not minuscule but can be treated as if they were.
tive description in terms of “monomials in roots.” The equivalence of the two
descriptions thus gives a bijective correspondence between standard monomials
and monomials in roots. Roughly—but not actually—the correspondence maps
each standard monomial to its initial term (with respect to a certain monomial
order). Thus it is natural to wonder whether we can compute the initial ideal
of the ideal of the tangent cone to the Schubert variety at the given point. We
believe that this can be done but that it is far more involved and difficult than
the corresponding computation for Grassmannians and symplectic Grassmanni-
ans (the natural set of generators of the ideal of the tangent cone do not form a
Gröbner basis unlike in those cases). If all goes well, the computation will soon
appear [16].
Taking the Schubert variety to be of a special kind and the point to be the
“identity coset,” our problem specializes to a problem about Pfaffian ideals con-
sidered in [5, 2]. On the other side of the spectrum from the identity coset, so to
speak, lie the “generic singularities,” points that are generic in the complement
of the open orbit of the stabiliser of the Schubert variety. For these, a geometric
solution to the problem appears in [1].
Given that our solution of the problem is but a translation, it makes sense
to ask if one can extract more tangible information—closed form formulas for
example—from our alternative description. See the papers quoted in the previ-
ous paragraph and also [3] for some answers in the special cases they consider.
Organization of the paper
The table of contents indicates how the paper is organized. There is a brief
description at the beginning of every subdivision of the contents therein. An
index of definitions and notation is included, for it would otherwise be difficult
to find the meanings of certain words and symbols.
Important note added
The recent article [6] treats some of the questions addressed here and some that
could be addressed by using the main result proved here. It includes:
• an interpretation of the multiplicity similar to ours.
• a closed formula for the multiplicity (as a specialization of a factorial Schur
function), thereby answering the question we raised above.
• a formula for the restriction to the torus fixed point of the equivariant
cohomology class of a Schubert variety.
The approach in [6] is quite different from ours. In fact, it is the opposite of
ours in that it circumvents the lack of results about initial ideals of tangent
cones, while our prime motivation is to remedy the lack. The starting points
in the two approaches are also different: [6] takes off from certain results of
Kostant-Kumar and Arabia on equivariant cohomology, while our launchpad is
standard monomial theory.
The appearance of [6] notwithstanding, our approach is worthwhile, for,
quite apart from the difference in starting points, there is no way, as far as we
can tell, to the Hilbert function via the approach of [6], nor to the initial ideal,
both of which are interesting in their own right.
Part I
The theorem
Definitions are recalled, the problem formulated, and the theorem stated.
1 The set up
In this section, we state the problem to be addressed after recalling the neces-
sary basic definitions, make some choices that are convenient for studying the
problem, and see why it is enough to focus on a particular case of the problem.
1.1 The statement of the problem
Fix an algebraically closed field of characteristic not equal to 2. Fix a vector
space V of finite dimension n over this field and a non-degenerate symmetric
bilinear form 〈 , 〉 on V . Let d be the integer such that either n = 2d or n =
2d+ 1. A linear subspace of V is said to be isotropic if the form 〈 , 〉 vanishes
identically on it. It is elementary to see that an isotropic subspace of V has
dimension at most d and that every isotropic subspace is contained in one of
dimension d. Denote by Md(V )
′ the closed sub-variety of the Grassmannian
of d-dimensional subspaces consisting of the points corresponding to isotropic
subspaces.
The orthogonal group O(V ) of linear automorphisms of V preserving 〈 , 〉
acts transitively on Md(V )
′, for by Witt’s theorem an isometry between sub-
spaces can be lifted to one of the whole vector space. If n is odd the special
orthogonal group SO(V ) (consisting of form preserving linear automorphisms
with trivial determinant) itself acts transitively on Md(V )
′. If n is even the spe-
cial orthogonal group SO(V ) does not act transitively on Md(V )
′, and Md(V )
has two connected components. We define the orthogonal Grassmannian Md(V )
to be Md(V )
′ if n is odd and to be one of the two components of Md(V )
′ if n
is even.
The Schubert varieties of Md(V ) are defined to be the B-orbit closures
in Md(V ) (with canonical reduced scheme structure), where B is a Borel sub-
group of SO(V ). The choice of B is immaterial, for any two of them are con-
jugate. The question that is tackled in this paper is this: given a point on a
Schubert variety in Md(V ), how to compute the multiplicity (and more gen-
erally, the Hilbert function) of the Schubert variety at the given point? The
answers are contained in Theorem 2.3.1 and Corollary 2.3.2. But in order to
make sense of those statements, we need some preparation.
1.2 Some convenient choices
We now make some choices that are convenient for the study of Schubert vari-
eties. For k an integer such that 1 ≤ k ≤ n, set k∗ := n + 1 − k. Fix a basis
e1, . . . , en of V such that
〈ei, ek〉 =
1 if i = k∗
0 otherwise
The advantage of this choice is: the elements of SO(V ) for which each ek is an
eigenvector form a maximal torus, and the elements that are upper triangular
with respect to this basis form a Borel subgroup (a linear transformation is upper
triangular if for each k, 1 ≤ k ≤ n, the image of ek under the transformation
is a linear combination of e1, . . . , ek). We denote this maximal torus and this
Borel subgroup by T and B respectively. Our Schubert varieties will be orbit
closures of this particular Borel subgroup B.
The B-orbits of Md(V )
′ are naturally indexed by its T -fixed points: each
orbit contains one and only one such point. The T -fixed points are evidently of
the form 〈ei1 , . . . , eid〉, where 1 ≤ i1 < . . . < id ≤ n and for each k, 1 ≤ k ≤ d,
there does not exist j, 1 ≤ j ≤ d, such that i∗k = ij—in other words, for each
ℓ, 1 ≤ ℓ ≤ n, such that ℓ 6= ℓ∗, exactly one of ℓ and ℓ∗ appears in {i1, . . . , id};
in addition, if n is odd, then d+ 1 does not appear in {i1, . . . , id}. Denote the
set of such d-element subsets {i1 < . . . < id} by I
n. We thus have a bijective
correspondence between I ′n and the B-orbits of Md(V )
′. Each B-orbit being
irreducible and open in its closure, it follows that B-orbit closures are indexed
by the B-orbits. Thus I ′n is an indexing set for B-orbit closures in Md(V )
Suppose that n is even—it will be shown presently that it is enough to
consider this case. As already observed, Md(V )
′ has two connected components
on each of which SO(V ) acts transitively. The B-orbits belong to one or the
other component accordingly as the parity of the cardinality of the number of
entries bigger than d in the corresponding element of I ′n. We take Md(V ) to
be the component in which these cardinalities are even. We let In denote the
subset of I ′n consisting of elements for which this cardinality is even. Schubert
varieties in Md(V ) are thus indexed by elements of In.
1.3 Reduction to the case n even
We now argue that it is enough to consider the case n even. Suppose that
n is odd. Let ñ := n + 1 and Ṽ be a vector space of dimension ñ with a
non-degenerate symmetric form. Let ẽ1, . . . , ẽen be a basis of Ṽ as in 1.2. Put
e := ẽd+1 and f := ẽd+2. Take λ to be an element of the field such that
λ2 = 1/2. We can take V to be the subspace of Ṽ spanned by the vectors
ẽ1, . . . , ẽd, λe + λf, ẽd+3, . . . , ẽen, and a basis of V to be these vectors in that
order.
There is a natural map from Md+1(Ṽ )
′ to Md(V ): intersecting with V an
isotropic subspace of Ṽ of dimension d + 1 gives an isotropic subspace of V of
dimension d. This map is onto, for every isotropic subspace of Ṽ (and hence
of V ) is contained in an isotropic subspace of Ṽ of dimension d + 1. It is
also elementary to see that the map is two-to-one (essentially because in a two-
dimensional space with a non-degenerate symmetric form there are two isotropic
lines), and that the two points in any fiber lie one in each component (there
is clearly an element in O(Ṽ ) \ SO(Ṽ ) that moves one element of the fiber to
the other, and so if there was an element of SO(Ṽ ) that also moved one point
to the other, the isotropy at the point would not be contained in SO(Ṽ ), a
contradiction).
We therefore get a natural isomorphism between Md+1(Ṽ ) and Md(V ). We
will now show that the B̃-orbits in Md+1(Ṽ ) correspond under the isomorphism
to B-orbits of Md(V ) (we denote by T̃ and B̃ the maximal torus and Borel
subgroups of SO(Ṽ ) as in §1.2). It will then follow that Schubert varieties in
Md+1(Ṽ ) are isomorphic to those in Md(V ) and the purpose of this subsection
will be achieved.
The group SO(V ) can be realized as the subgroup of SO(Ṽ ) consisting of
the elements that fix e − f . The isomorphism Md+1(Ṽ ) ∼= Md(V ) above is
equivariant for SO(V ), and we have T̃ ∩ SO(V ) = T and B̃ ∩ SO(V ) = B. It
should now be clear that the preimages in Md+1(Ṽ ) of two elements in the same
B-orbit of Md(V ) are in the same B̃-orbit: an element of B that moves one to
the other considered as an element of B̃ moves also the preimage of the one to
that of the other.
On the other hand, the preimages of distinct T -fixed points are distinct
T̃ -fixed points, the corresponding map from I ′n to Ien being given as follows:
i = {i1 < . . . < id} 7→
{̃i1, . . . , ĩd, d+ 1} if i ∈ In
{̃i1, . . . , ĩd, d+ 2} if i ∈ I
n \ In
where
ĩk =
ik if ik ≤ d
ik + 1 if ik ≥ d+ 2
(Note that d + 1 never occurs as an entry in any element of I ′n and that the
elements ĩ1, . . . , ĩd, d + 1 (respectively ĩ1, . . . , ĩd, d + 2) are not in increasing
order except in the trivial case i = {1 < . . . < d}.) Given that each B-orbit
has a T -fixed point and that distinct T̃ -fixed points belong to distinct B̃-orbits,
this implies that the preimages of two elements in distinct B-orbits belong to
distinct B̃-orbits, and the proof is over. �
2 The theorem
The purpose of this section is to state the main theorem and its corollary. We
first set down some basic notation and two fundamental definitions needed in
order to state the theorem.
2.1 Basic notation
We keep the terminology and notations of §1.1, 1.2. As observed in §1.3, it is
enough to consider the case n even. So from now on let n = 2d. Recall that,
for an integer k, 1 ≤ k ≤ 2d, k∗ := 2d + 1 − k. As observed in §1.2, Schubert
varieties in Md(V ) are indexed by In.
Since d now determines n, we will henceforth write I(d) instead of In. In
other words, I(d) is the set of d-element of subsets of {1, . . . , 2d} such that
• for each k, 1 ≤ k ≤ 2d, the subset contains exactly one of k, k∗, and
• the number of elements in the subset that exceed d is even.
We write I(d, 2d) for the set of all d-element subsets of {1, . . . , 2d}. There is a
natural partial order on I(d, 2d) and so also on I(d): v = (v1 < . . . < vd) ≤ w =
(w1 < . . . < wd) if and only if v1 ≤ w1, . . . , vd ≤ wd.
Given v ∈ I(d), the corresponding T -fixed point in Md(V ) (namely, the
span of ev1 , . . . , evd) is denoted e
v. Given w ∈ I(d), the corresponding Schubert
variety in Md(V ) (which, by definition, is the closure of the B-orbit of the T -
fixed point ew with canonical reduced scheme structure) is denoted X(w). The
point ev belongs to X(w) if and only if v ≤ w in the partial order just defined.
Since, under the natural action of B on X(w), each point of X(w) is in the
B-orbit of a T -fixed point ev for some v such that v ≤ w, it is enough to focus
attention on such T -fixed points.
For the rest of this section an element v of I(d) will remain fixed.
We will be dealing extensively with ordered pairs (r, c), 1 ≤ r, c ≤ 2d, such
that r is not and c is an entry of v. Let R denote the set of all such ordered
pairs, and set
N := {(r, c) ∈ R | r > c}
OR := {(r, c) ∈ R | r < c∗}
ON := {(r, c) ∈ R | r > c, r < c∗}
= OR ∩N
d := {(r, c) ∈ R | r = c∗}
diagonal
boundary
(r, c)
(c∗, c)
(r, r∗)
The picture shows a drawing of R. We think of r and c in (r, c) as row
index and column index respectively. The columns are indexed from left to
right by the entries of v in ascending order, the rows from top to bottom by
the entries of {1, . . . , 2d} \ v in ascending order. The points of d are those on
the diagonal, the points of OR are those that are (strictly) above the diag-
onal, and the points of N are those that are to the South-West of the poly-
line captioned “boundary of N”—we draw the boundary so that points on
the boundary belong to N. The reader can readily verify that d = 13 and
v = (1, 2, 3, 4, 6, 7, 10, 11, 13, 15, 18, 19, 22) for the particular picture drawn. The
points of ON indicated by solid circles form a v-chain (see §2.2.1 below).
We will be consideringmonomials, also calledmultisets, in some of these sets.
A monomial, as usual, is a subset with each member being allowed a multiplicity
(taking values in the non-negative integers). The degree of a monomial has also
the usual sense: it is the sum of the multiplicities in the monomial over all
elements of the set. The intersection of a monomial in a set with a subset
of the set has also the natural meaning: it is a monomial in the subset, the
multiplicities being those in the original monomial.
We will refer to d as the diagonal.
2.2 Two fundamental definitions
2.2.1 Definition of v-chain
Given two elements (R,C) and (r, c) in ON, we write (R,C) > (r, c) if R > r
and C < c (note that these are strict inequalities). An ordered sequence α, β, . . .
of elements of ON is called a v-chain if α > β > . . . . A v-chain α1 > . . . > αℓ
has head α1, tail αℓ, and length ℓ.
2.2.2 Definition of O-domination
To a v-chain C : α1 > α2 > . . . inON there corresponds, as described in §5.3.3, a
subset SC ofN which, as observed in Proposition 5.3.5, is “distinguished” in the
sense of §5.1.1. To a distinguished subset of N there corresponds, as described
below in §5.1.2, an element of I(d, 2d). Following these correspondences through,
we get an element of I(d, 2d) attached to the v-chain C. Let w(C) denote this
element—sometimes we write wC . (All this makes sense even when C is empty—
w(C) will turn out to be v itself in that case.)
Furthermore, as will be obvious from its definition, the monomial SC is
“symmetric” in the sense of §5.2.2 and contains evenly many elements of the
diagonal d. Thus, by Proposition 5.2.1, the element w(C) of I(d, 2d) belongs
to I(d).
An element w of I(d) is said toO-dominate C if w ≥ w(C), or, equivalently—
and this is important for the proofs—if w dominates in the sense of [7] the mono-
mial SC (for the proof of the equivalence, see [7, Lemma 5.5]). An element w of
I(d) O-dominates a monomial S of ON (repsectively of OR) if it O-dominates
every v-chain in S (respectively in S ∩ON).
2.3 The main theorem and its corollary
Theorem 2.3.1 Fix a positive integer d and elements v ≤ w of I(d). Let V
be a vector space of dimension 2d with a symmetric non-degenerate bilinear
form (over a field of characteristic not 2). Let X(w) be the Schubert variety
corresponding to w in the orthogonal Grassmannian Md(V ), and e
v the torus
fixed point of X(w) corresponding to v. Let Rwv denote the associated graded
ring with respect to the unique maximal ideal of the local ring of germs at ev of
functions on X(w). Then, for any non-negative integer m, the dimension as a
vector space of the homogeneous piece of Rwv of degree m equals the cardinality
of the set Sw(v)(m) of monomials of degree m of OR that are O-dominated
by w.
The proof of this theorem occupies us for most of this paper. It is reduced
in §3, by an application of standard monomial theory, to combinatorics. The
resulting combinatorial problem is solved in §4–10. For now, let us note the
following immediate consequence:
Corollary 2.3.2 The multiplicity at the point ev of the Schubert variety X(w)
equals the number of monomials in ON of maximal cardinality that are square-
free and O-dominated by w.
Proof: The proof of Corollary 2.2 of [7] holds verbatim here too. �
Part II
From geometry to combinatorics
The problem is translated from geometry to combinatorics. The main combi-
natorial results are formulated.
3 Reduction to combinatorics
In this section we translate the problem from geometry to combinatorics. In §3.1
we recall from [17] the theorem that enables the translation. The translation
itself is done in 3.2 and follows [14].
3.1 Homogeneous co-ordinate ring of the Schubert vari-
ety X(w)
3.1.1 The line bundle L on Md(V )
Let Md(V ) ⊆ Gd(V ) →֒ P(∧
dV ) be the Plücker embedding (where Gd(V )
denotes the Grassmannian of all d-dimensional subspaces of V ). The pull-back
toMd(V ) of the line bundle O(1) on P(∧
dV ) is the square of the ample generator
of the Picard group ofMd(V ). Letting L denote the ample generator, we observe
that it is very ample and want to describe the homogeneous coordinate rings
of Md(V ) and its Schubert subvarieties in the embedding defined by L.
3.1.2 The section qθ of L
For θ in I(d, 2d), let pθ denote the corresponding Plücker coordinate. Consider
the affine patch A of P(∧dV ) given by pǫ = 1, where ǫ := (1, . . . , d). The
intersection A ∩Gd(V ) of this patch with the Grassmannian is an affine space.
Indeed the d-plane corresponding to an arbitrary point z of A ∩ Gd(V ) has a
basis consisting of column vectors of a matrix of the form
where I is the identity matrix and A an arbitrary matrix both of size d × d.
The association z 7→ A is bijective. The restriction of a Plücker coordinate pθ
to A∩Gd(V ) is given by the determinant of a submatrix of size d× d of M , the
entries of θ determining the rows to be chosen from M to form the submatrix.
As can be readily verified, a point z of A ∩ Gd(V ) represents an isotropic
subspace if and only if the corresponding matrix A = (aij) is skew-symmetric
with respect to the anti-diagonal : aij + aj∗i∗ = 0, where the columns and rows
of A are numbered 1, . . . , d and d+1, . . . , 2d respectively. For example, if d = 4,
then a matrix that is skew-symmetric with respect to the anti-diagonal looks
like this: 
−d −c −b 0
−g −f 0 b
−i 0 f c
0 i g d
Since the set of these matrices is connected and contains the point that is
spanned by e1, . . . , ed, it follows that A∩Gd(V ) does not intersect the other com-
ponent of Md(V )
′. In other words, pǫ vanishes everywhere on Md(V )
′ \Md(V ).
Now suppose that θ belongs to I(d). Computing pθ/pǫ as a function on the
affine patch pǫ 6= 0, we see that it is the determinant of a skew-symmetric matrix
of even size, and therefore a square. The square root, which is determined up
to sign, is called the Pfaffian. This suggests that pθ itself is a square: more
precisely that there exists a section qθ of the line bundle L on Md(V ) such that
q2θ = pθ. A weight calculation confirms this to be the case. The qθ are also
called Pfaffians.
3.1.3 Standard monomial theory for Md(V )
A standard monomial in I(d) is a totally ordered sequence θ1 ≥ . . . ≥ θt (with
repetitions allowed) of elements of I(d). Such a standard monomial is said to
be w-dominated for w ∈ I(d) if w ≥ θ1. To a standard monomial θ1 ≥ . . . ≥ θt
in I(d) we associate the product qθ1 · · · qθt , where the qθ are the sections defined
above of the line bundle L. Such a product is also called a standard monomial
and it is said to be dominated by w for w ∈ I(d) if the underlying monomial in
I(d) is dominated by w. Standard monomial theory for Md(V ) says:
Theorem 3.1.1 (Seshadri [17]) Standard monomials qθ1 · · · qθr of degree r form
a basis for the space of forms of degree r in the homogeneous coordinate ring of
Md(V ) in the embedding defined by the ample generator L of the Picard group.
More generally, for w ∈ I(d), the w-dominated standard monomials of degree r
form a basis for the space of forms of degree r in the homogeneous coordinate
ring of the Schubert subvariety X(w) of Md(V ).
3.2 Co-ordinate rings of affine patches and tangent cones
of X(w)
From Theorem 3.1.1 one can deduce rather easily, as we now show, bases for
co-ordinate rings of affine patches of the form qv 6= 0 and of tangent cones of
Schubert varieties. An element v of I(d) will remain fixed for the rest of this
section. To simplify notation we will suppress explicit reference to v.
3.2.1 Standard monomial theory for affine patches
Let A denote the affine patch of P(H0(Md(V ), L)
∗) given by qv 6= 0. The
origin of the affine space A is identified as the T -fixed point ev. The functions
fθ := qθ/qv, v 6= θ ∈ I(d), provide a set of coordinate functions on A. Monomials
in these fθ form a k-basis for the polynomial ring k[A] of functions on A, where k
denotes the underlying field.
Fix w ≥ v in I(d), so that the point ev belongs to the Schubert varietyX(w),
and let Y (w) be the affine patch of X(w) defined thus:
Y (w) := X(w) ∩ A.
The coordinate ring k[Y (w)] of Y (w) is a quotient of the polynomial ring k[A],
and the proposition that follows identifies a subset of the monomials in fθ which
forms a k-basis for k[Y (w)].
We say that a standard monomial θ1 ≥ . . . ≥ θt in I(d) is v-compatible if for
each k, 1 ≤ k ≤ t, either θk 
 v or v 
 θk. Given w in I(d), we denote by SM
the set of w-dominated v-compatible standard monomials.
Proposition 3.2.1 As θ1 ≥ . . . ≥ θt runs over the set SM
w of w-dominated
v-compatible standard monomials, the elements fθ1 · · · fθt form a basis for the
coordinate ring k[Y (w)] of the affine patch Y (w) = X(w) ∩ A of the Schubert
variety X(w).
Proof: The proof is similar to the proof of Proposition 3.1 of [7]. First consider
a linear dependence relation among the fθ1 · · · fθt . Replacing fθ by qθ and “ho-
mogenizing” by qv yields a linear dependence relation among the w-dominated
standard monomials qθ1 · · · qθs restricted to X(w), and so the original relation
must only have been the trivial one, for by Theorem 3.1.1 the qθ1 · · · qθs are
linearly independent on X(w).
To prove that fθ1 · · · fθt generate k[Y (w)] as a vector space, we make the
following claim: if qµ1 · · · qµr be any monomial in the Pfaffians qθ, and qτ1 · · · qτs
a standard monomial that occurs with non-zero co-efficient in the expression for
(the restriction to X(w) of) qµ1 · · · qµr as a linear combination of w-dominated
standard monomials, then τ1∪· · ·∪τs = µ1∪· · ·∪µr as multisets of {1, . . . , 2d}.
To prove the claim, consider the maximal torus T of SO(V ) as in §1.2. The affine
patch A is T -stable and there is an action of T on k[Y (w)]. The sections qθ are
eigenvectors for T with corresponding characters ǫθ1+· · ·+ǫθd , where ǫk denotes
the character of T given by the projection to the diagonal entry on row k. The
claim now follows since eigenvectors corresponding to different characters are
linearly independent.
Let fµ1 · · · fµr be an arbitrary monomial in the fθ. Fix an integer h such
that h > r(d − 1) and consider the expression for (the restriction to X(w) of)
qµ1 · · · qµr · q
v as a linear combination of w-dominated standard monomials. We
claim that qv occurs in every standard monomial qτ1 · · · qτr+h in this expression
(from which it will follow that the τj are all comparable to v). Suppose that
none of τ1, . . . , τr+h equals v. For each τj there is at least one entry of v that
does not occur in it. The number of occurrences of entries of v in τ1 ∪ · · · ∪ τr+h
is thus at most (r + h)(d − 1). But these entries occur at least hd times in
µ1 ∪ · · · ∪ µr ∪ v ∪ · · · ∪ v (where v is repeated h times), a contradiction to the
claim proved in the previous paragraph. Hence our claim is proved. Dividing
by qr+hv the expression for qµ1 · · · qµr .q
v as a linear combination of w-dominated
standard monomials provides an expression for fµ1 · · · fµr as a linear combina-
tion of fθ1 · · · fθt , as θ1 ≥ . . . ≥ θt varies over SM
3.2.2 Standard monomial theory for tangent cones
The affine patch Md(V )∩A of the orthogonal Grassmannian Md(V ) is an affine
space whose coordinate ring can be taken to be the polynomial ring in variables
of the form X(r,c) with (r, c) ∈ OR, where (as in §2.1)
OR = {(r, c) | 1 ≤ r, c ≤ 2d, r 6∈ v, c ∈ v, r < c∗}
Taking d = 5 and v = (1, 3, 4, 6, 9) for example, a general element of Md(V )∩A
has a basis consisting of column vectors of a matrix of the following form:

1 0 0 0 0
X21 X23 X24 X26 0
0 1 0 0 0
0 0 1 0 0
X51 X53 X54 0 −X26
0 0 0 1 0
X71 X73 0 −X54 −X24
X81 0 −X73 −X53 −X23
0 0 0 0 1
0 −X81 −X71 −X51 −X21

The expression for fθ = qθ/qv in terms of the X(r,c) is a square root of the
determinant of the submatrix of a matrix like the one above obtained by choosing
the rows given by the entries of θ. Thus fθ is a homogeneous polynomial of
degree the v-degree of θ, where the v-degree of θ is defined as one half of the
cardinality of v \ θ.
Since the ideal of the Schubert variety X(w) in the homogeneous coordinate
ring of Md(V ) is generated
2 by the qτ , τ ∈ I(d) such that τ 6≤ w, it follows
that the ideal of Y (w) := X(w) ∩ A in Md(V ) ∩ A is generated by the the fτ ,
τ ∈ I(d) such that τ 6≤ w. We are interested in the tangent cone to X(w)
at ev (or, what is the same, the tangent cone to Y (w) at the origin), and since
k[Y (w)] is graded, its associated graded ring with respect to the maximal ideal
corresponding to the origin is k[Y (w)] itself.
Proposition 3.2.1 says that the graded piece of k[Y (w)] of degree m is gener-
ated as a k-vector space by elements of degreem of the set SMw of w-dominated
v-compatible standard monomials, where the degree of a standard monomial
θ1 ≥ . . . ≥ θt is defined to be the sum of the v-degrees of θ1, . . . , θt. To prove
Theorem 2.3.1 it therefore suffices to prove the following:
2This is a consequence of Theorem 3.1.1. It is easy to see that the qτ such that τ 6≤ w vanish
on X(w). Since all standard monomials form a basis for the homogeneous coordinate ring
of Md(V ) in P(H
0(Md(V ), L)
∗), it follows that w-dominated standard monomials span the
quotient ring by the ideal generated by such qτ . Since such monomials are linearly independent
in the homogeneous coordinate ring of X(w), the desired result follows.
Theorem 3.2.2 The set SMw(m) of standard monomials in I(d) of degree m
that are w-dominated and v-compatible is in bijection with the set Sw(v)(m) of
monomials in OR of degree m that are O-dominated by w.
4 Further reductions
In the last section, we reduced the proof of our main theorem (Theorem 2.3.1)
to that of Theorem 3.2.2. We now reduce the proof of Theorem 3.2.2 to that of
Propositions 4.1.1, 4.1.2 and 4.1.3 below. These propositions will eventually be
proved in §10.
4.1 The main propositions
Fix once and for all an element v of I(d). The bijection stated in Theorem 3.2.2
will be described by means of two maps Oπ and Oφ whose definitions will be
given in §7 and §8 below. We will now state some properties of these maps.
In §4.2 we will see how Theorem 3.2.2 follows once these properties are estab-
lished.
The map Oπ associates to a monomial S in ON a pair (w,S′) consisting
of an element w of I(d) and a “smaller” monomial S′ in ON. This map enjoys
the following good properties:
Proposition 4.1.1 1. w ≥ v.
2. v-degree(w) + degree(S′) = degree(S).
3. w O-dominates S′.
4. w is the least element of I(d) that O-dominates S.
The map Oφ, on the other hand, associates a monomial in ON to a pair (w,T)
consisting of an element w of I(d) with w ≥ v and a monomial T in ON that is
O-dominated by w.
Proposition 4.1.2 The maps Oπ and Oφ are inverses of each other.
For an integer f , 1 ≤ f ≤ 2d, consider the following conditions, the first on
a monomial S in ON, the second on an element w of I(d):
(‡) f is not the row index of any element of S and f⋆ is not the
column index of any element of S.
(‡) f is not an entry of w.
(It is convenient to the use the same notation (‡) for both conditions.)
Proposition 4.1.3 Assume that v satisfies (‡)—all references to (‡) in this
proposition are with respect to a fixed f , 1 ≤ f ≤ 2d.
1. Let w be an element of I(d) with w ≥ v and T a monomial in ON that is
O-dominated by w. If w and T both satisfy (‡), then so does Oφ(w,T).
2. If a monomial S in ON satisfies (‡), then so do the “components” w and
S′ of its image under Oπ.
4.2 From the main propositions to the main theorem
Let us now see how Theorem 3.2.2 follows from the propositions of §4.1. Most
of the following argument runs parallel to its counterparts in the case of the
Grassmannian and symplectic Grassmannian (Propositions 4.1.1 and 4.1.2 have
their counterparts in [7, 4]), but, in the case that d is odd, the part involving
the “mirror image” requires additional work. This is where Proposition 4.1.3
comes in.
Let S, T , and U , denote respectively the sets of monomials in OR, ON, and
OR\ON. Let SMv denote the set of v-compatible standard monomials that are
“anti-dominated” by v: a standard monomial θ1 ≥ . . . ≥ θt is anti-dominated
by v if θt ≥ v (we can also write θt > v since θt 6= v by v-compatibility).
Define the domination map from T to I(d) by sending a monomial in ON to
the least element that O-dominates it. Define the domination map from SMv
to I(d) by sending θ1 ≥ . . . ≥ θt to θ1. Both these maps take, by definition, the
value v on the empty monomial.
Notation 4.2.1 In the following, we use subscripts, superscripts, suffixes, and
combinations thereof to modify the meanings of S, T , U , SM , and SMv.
• superscript: this will be an element w of I(d); when used on T it denotes
O-domination (more precisely, Tw denotes the subset of T consisting of
those elements that are O-dominated by w); when used on SM or SMv
it denotes domination by w.
• subscript: denotes anti-domination (applied only to standard monomials).
• suffix “(m)”: indicates degree (for example, SMwv (m) denotes the set of v-
compatible standard monomials that are anti-dominated by v, dominated
by w, and of degree m).
Repeated application of Oπ gives a map from T to SMv that commutes
with domination (as just defined) and preserves degree. Repeated application
of Oφ gives a map from SMv to T . These two maps being inverses of each
other (Proposition 4.1.2) and so we have a bijection between SMv and T . In
fact, since domination and degree are respected (Proposition 4.1.1), we get a
bijection SMwv (m)
∼= Tw(m).
As explained below, the “mirror image” of the bijection SMv(m) ∼= T (m)
gives a bijection SMv(m) ∼= U(m). Putting these bijections together, we get
the desired result:
SMw(m) =
SMwv (k)× SM
v(m− k)
Tw(k)× U(m− k) = Sw(m).
We now explain how to realize the bijection SMv(m) ∼= U(m) as the “mirror
image” of the bijection SMv(m) ∼= T (m). For an element u of I(d), define
u∗ := (u∗d, . . . , u
1). In the case d is even, the association u 7→ u
∗ is an order
reversing involution, and the argument in [4] for the symplectic Grassmannian
holds here too. In the case d is odd, u∗ is not an element of I(d), and so some
additional work is required.
Recall that a “base element” v of I(d) has been fixed and that our notation
does not explicitly indicate this dependence upon v: for example, OR is depen-
dent upon v. For a brief while now (until the end of this section) we need to
simultaneously handle several base elements of I(d). We will use the following
convention: when the base element of I(d) is not v, we will explicitly indicate
it by means of a suffix. For instance, SM(v∗) denotes the set of v∗-compatible
standard monomials in I(d).
Let us first do the case when d is even. We get a bijection SMv ∼= SMv∗(v
by associating to θ1 ≥ . . . ≥ θt the element θ
t ≥ . . . ≥ θ
1 . The sum of the
v-degrees of θ1, . . . , θt equals the sum of the v
∗-degrees of θ∗t , . . . , θ
1 , so that we
get a bijection SMv(m) ∼= SMv∗(v
∗)(m).
For an element (r, c) of ON(v∗), consider its flip (c, r). Since v belongs
to I(d), the complement of v∗ in {1, . . . , 2d} is v, and it follows that (c, r)
belongs to OR \ ON. This induces a degree preserving bijection T (v∗) ∼= U .
Putting this together with the bijection of the previous paragraph and the one
deduced earlier in this section (using Oπ and Oφ), we get what we want:
SMv(m) ∼= SMv∗(v
∗)(m) ∼= T (v
∗)(m) ∼= U(m).
Now suppose that d is odd. Then the map x 7→ x∗ does not map I(d) to
I(d) but to I(d)∗ (defined as the set consisting of those elements u of I(d, 2d)
such that, for each k, 1 ≤ k ≤ 2d, exactly one of k, k∗ belongs to u, and the
number of entries of u greater than d is odd). We define a map u 7→ ũ from
I(d)∗ to I(d + 1) as follows: ũ := {ũ1, . . . , ũd, d + 2} (the elements are not in
increasing order except in the trivial case u = (1, . . . , d)), where, for an integer
e, 1 ≤ e ≤ 2d, we set
ẽ :=
e if 1 ≤ e ≤ d
e+ 2 if d+ 1 ≤ e ≤ 2d
This map u 7→ ũ is an order preserving injection.
Consider the composition x 7→ x∗ 7→ x̃∗ from I(d) to I(d+1). This is an order
reversing injection. The induced map on standard monomials is an injection
from SMv to SMfv∗(ṽ
∗). It is readily seen that the image under this map
is the subset SMfv∗(ṽ
∗)(‡) consisting of those standard monomials all of whose
elements satisfy (‡) with f = d+1. We have already established (using the maps
Oπ and Oφ) a bijection SMfv∗(ṽ
∗) ∼= T (ṽ∗). It follows from Proposition 4.1.3
that under this bijection the subset SMfv∗(ṽ
∗)(‡) maps to T (ṽ∗)(‡) (defined as
the set of those monomials in ON(ṽ∗) satisfying (‡) with f = d+ 1).
Now T (ṽ∗)(‡) is in degree preserving bijection with U : every element of
degree 1 of T (ṽ∗)(‡) is uniquely of the form (c̃, r̃) for (r, c) in OR\ON, and the
desired bijection is induced from this. Putting all of these together, we finally
SMv ∼= SMfv∗(ṽ
∗)(‡) ∼= T (ṽ∗)(‡) ∼= U.
Thus, in order to prove our main theorem (Theorem 2.3.1), it suffices to
describe the maps Oπ and Oφ and to prove Propositions 4.1.1–4.1.3.
Part III
The proof
The main combinatorial results formulated in §4.1 are proved. An attempt is
made to maintain parallelism with the proofs in [7].
5 Terminology and notation
5.1 Distinguished subsets
5.1.1 Distinguished subsets of N
Following [7, §4], we define a multiset S of N to be distinguished , if, first of all,
it is a subset in the usual sense (in other words, it is “multiplicity free”), and
if, for any two distinct elements (R,C) and (r, c) of S, the following conditions
are satisfied:
A. R 6= r and C 6= c.
B. If R > r, then either r < C or C < c.
In terms of pictures, condition A says that (r, c) cannot lie exactly due North
or East of (R,C) (or the other way around); so we can assume, interchanging
the two points if necessary, that (r, c) lies strictly to the Northeast or Northwest
of (R,C); condition B now says that, if (r, c) lies to the Northwest of (R,C),
then the point that is simultaneously due North of (R,C) and due East of (r, c)
(namely (r, C)) does not belong to N.
5.1.2 Attaching elements of I(d, 2d) to distinguished subsets of N
To a distinguished subset S of N there is naturally associated an element w of
I(d, 2d) as follows: start with v, remove all members of v which appear as column
indices of elements of S, and add row indices of all elements of S. As observed
in [7, Proposition 4.3], this association gives a bijection between distinguished
subsets of N and elements w ≥ v of I(d, 2d). The unique distinguished subset
of N corresponding to an element w ≥ v of I(d, 2d) is denoted Sw.
5.2 The involution #
5.2.1 The involution # on I(d, 2d)
There are two natural order reversing involutions on I(d, 2d). First there is
w 7→ w∗ induced by the natural order reversing involution j 7→ j∗ on {1, . . . , 2d}:
here w∗ has the obvious meaning, namely, it consists of all j∗ such that j
belongs to w. Then there is the map taking w to its complement {1, . . . , 2d}\w.
These two involutions commute. Composing the two we get an order preserving
involution on I(d, 2d) which we denote by w 7→ w#. The elements of the
subset I(d) are fixed points under this involution (there are points not in I(d)
that are also fixed).
5.2.2 The involution # on N and R
For α = (r, c) in N, or more generally in R, define α# = (c∗, r∗). The involution
α 7→ α# is just the reflection with respect to the diagonal d. For a subset or
even multiset S of N (or R), the symbol S# has the obvious meaning. We call
S symmetric if S = S#.
Proposition 5.2.1 An element w ≥ v of I(d, 2d) belongs to I(d) if and only
if the distinguished subset Sw of N corresponding to it as described in §5.1.2 is
symmetric and has evenly many diagonal elements.
Proof: That the symmetry of Sw is equivalent to the condition that w = w
is proved in [4, Proposition 5.7]. Now suppose that Sw is symmetric. We claim
that for an element (r, c) of Sw that is not on the diagonal, either both r and c
are bigger than d or both are less than d + 1. It is enough to prove the claim,
for w is obtained from v by removing the column indices and adding the row
indices of elements of Sw, and it would follow that the number of entries in w
that are bigger than d equals the number of such entries in v plus the number
of diagonal elements in Sw.
We now prove the claim. Since Sw is symmetric, it follows that (c
∗, r∗) also
belongs to Sw. Since Sw is distinguished, it follows that in case r < c
∗ (that
is, if (r, c) lies above the diagonal), we have r < r∗, and so c < r < r∗; and in
case r > c∗, we have c∗ < c, and so c∗ < c < r. Thus the claim is proved. �
5.3 The subset SC attached to a v-chain C
5.3.1 Vertical and horizontal projections of an element of ON
For α = (r, c) in ON (or more generally in OR), the elements pv(α) := (c
∗, c)
and ph(α) := (r, r
∗) of the diagonal d are called respectively the vertical and
horizontal projections of α. In terms of pictures, the vertical projection is the
element of the diagonal due South of α; the horizontal projection is the element
of the diagonal due East of α. The vertical line joining α to its vertical projection
pv(α) and the horizontal line joining α to its horizontal projection ph(α) are
called the legs of α.
5.3.2 The “connection” relation on elements of a v-chain
Let C : α1 = (r1, c1) > α2 = (r2, c2) > · · · be a v-chain in ON. Two consecutive
elements αj and αj+1 of C are said to be connected if the following conditions
are both satisfied:
• their legs are “intertwined”; equivalently and more precisely, this means
that r∗j ≥ cj+1, or, what amounts to the same, rj ≤ c
• the point (rj+1, r
j ) belongs to N; this just means that rj+1 > r
Consider the coarsest equivalence relation on the elements of C generated by
the above relation. The equivalence classes of C with respect to this equivalence
relation are called the connected components of the v-chain C.
This definition has its quirks:
The v-chain C : α > β > γ in the pic-
ture has {α, β} and {γ} as its connected
components; but the “sub” v-chain α > γ
of C is connected (as a v-chain in its own
right).
diagonal
boundary
5.3.3 The definition of SC
We will define SC as a multiset of N. It is easy to see and in any case stated
explicitly as part of Corollary 5.3.5 that it is multiplicity free and so is actually
a subset of N.
First suppose that C : α1 = (r1, c1) > · · · > αℓ = (rℓ, cℓ) is a connected
v-chain in ON. Observe that, if there is at all an integer j, 1 ≤ j ≤ ℓ, such that
the horizontal projection ph(αj) does not belong to N, then j = ℓ. Define
SC :=
{pv(α1), . . . , pv(αℓ)} if ℓ is even
{pv(α1), . . . , pv(αℓ)} ∪ {ph(αℓ)} if ℓ is odd and ph(αℓ) ∈ N
{pv(α1), . . . , pv(αℓ−1)} ∪ {αℓ, α
} if ℓ is odd and ph(αℓ) 6∈ N
For a v-chain C that is not necessarily connected, let C = C1 ∪ C2 ∪ · · · be
the partition of C into its connected components, and set
SC := SC1 ∪SC2 ∪ · · ·
5.3.4 The type of an element α of a v-chain C, and the set SC,α
We introduce some terminology and notation. Their usefulness may not be
immediately apparent.
Suppose that C : α1 > · · · > αℓ is a connected v-chain. We define the type
in C of an element αj , 1 ≤ j ≤ ℓ, of C to be V, H, or S, accordingly as:
V: j 6= ℓ, or j = ℓ and ℓ is even.
H: j = ℓ, ℓ is odd, and ph(αℓ) ∈ N.
S: j = ℓ, ℓ is odd, and ph(αℓ) 6∈ N.
The type of an element in a v-chain that is not necessarily connected is defined
to be its type in its connected component.
The set SC,α of elements of N generated by an element α of C is defined to
SC,α :=
{pv(α)} if α is of type V in C;
{pv(α), ph(α)} if α is of type H in C;
{α, α#} if α is of type S in C;
Observe that, for a v-chain C, the monomial SC defined in §5.3.3 is the union,
over all elements α of C, of SC,α.
For an element α of a v-chain C, we define qC,α to be pv(α) if α is of type V
or H and to be α if it is of type S.
If the horizontal projection of an element in a v-chain does not belong to N,
then clearly the same is true for every succeeding element. The first such element
of a v-chain is called the critical element.
Proposition 5.3.1 1. The cardinality is odd of a connected component that
has an element of type H or S. Conversely, if the cardinality of a compo-
nent is odd, then it has an element of type H or S.
2. An element of type H or S can only be the last element in its connected
component.
3. The critical element has type either V or S. No element before it can be of
type S and every element after it is of type S. In particular, any element
that succeeds an element of type S is of type S.
Proof: Clear from definitions. �
Proposition 5.3.2 Let α > γ be elements of a v-chain C (we are not assuming
that they are consecutive).
1. If α > γ is connected as a v-chain in its own right, then α is connected to
its next member in C; that is, α cannot be the last element in its connected
component in C.
2. If α > γ is not connected as a v-chain in its own right and the legs of α and
γ intertwine, then the connected component of γ in C is the singleton {γ},
and γ has type S in C.
Proof: Clear from definitions. �
Proposition 5.3.3 Let E : α > . . . > ζ be a v-chain, D and D′ two v-chains
with tail α, and C, C′ the concatenations of D, D′ respectively with E. Then
1. The last element in the connected component containing α is the same
in C and C′ (and this is the same as in E).
Let λ denote this element.
2. The only element among α, . . . , ζ that possibly has different types in C
and C′ is λ.
Proof: (1): Whether or not two successive elements in a v-chain are connected
is independent of other elements in the v-chain.
(2): The type of an element in a v-chain is V unless it is the last element
in its connected component. And the type of the last element in a component
depends on the cardinality of the component. The components of E not contain-
ing α are still components in C and C′. In contrast, the component containing α
could possibly be larger in C (respectively C′) and hence its cardinality could
be different. �
For an element α = (r, c) of N, we define α(up) to be α itself if α is either on
or above the diagonal d (more precisely, if r ≤ c∗), and to be its “reflection” in
the diagonal (more precisely, (c∗, r∗)) if α is below the diagonal (more precisely,
if r > c∗). For a monomial S of N, S(up) is defined to be the intersection
of S (as a multiset) with the subset ON ∪ d of N. The notations α(down) and
S(down) have similar meanings.
Caution: It is not true that S(up) = {α(up)|α ∈ S} (in the obvious sense one
would make of the right hand side). In particular, for a singleton monomial {α},
it is not always true that {α}(up) = {α(up)}.
Proposition 5.3.4 Let α and β be elements of a v-chain C. Let us use α′ and
β′ respectively to denote elements of SC,α(up) and SC,β(up).
1. If α > β (these elements are not necessarily consecutive in C), then, given
β′, there exists α′ such that α′ > β′. In fact, this is true for every choice
of α′ except when
(*) α is of type H, and ph(α) 6> β
′ for some β′ ∈ SC,β.
In particular, qC,α > β
′ and qC,α > qC,β.
2. Conversely, suppose that α′ > β′ for some choice of α′ and β′. Then
α ≥ β; if equality occurs, then α is of type H, α′ = pv(α) and β
′ = ph(α).
In particular, if α′ > qC,β (or more specially qC,α > qC,β), then α > β.
3. If (*) holds for α > β in C, then
(a) the critical element of C is the one just after α; in particular, α is
uniquely determined.
(b) all elements of C succeeding α are of type S; in particular, β is of
type S and β′ = β.
(c) (*) holds for γ in place of β for every γ in C that succeeds α.
Proof: (1) If α is of type V or H, we need only take α′ = pv(α), for pv(α) >
pv(β), pv(α) > ph(β), and pv(α) > β. Now suppose that α is of type S. Then β
too is of type S (Proposition 5.3.1 (3)), so β′ can only be β, and the first part
of (1) is proved.
It follows from the above that if α′ = pv(α) or if α has type S, then α
′ > β′
independent of the choice of α′. So if α′ 6> β′, then (*) holds and α′ = ph(α).
(3) Let λ be the immediate successor of α in C. Then α is not connected
to λ (Proposition 5.3.1 (2)). Since ph(α) 6> β
′, it follows that α and β have
intertwining legs. Therefore so do α and λ. By Proposition 5.3.2 (2), λ has
type S in C.
Since α has type H and λ type S, it follows immediately from the definition of
the critical element that λ is the critical element. This proves (a). Assertion (b)
now follows from Proposition 5.3.1 (3). For (c), write ph(α) = (a, a
∗), λ =
(R,C), and γ = (r, c). Then R < a∗, for α and λ have intertwining legs but
are not connected. So c < r ≤ R < a∗. This means ph(α) 6> γ. And γ being of
type S (by (b)), we can take γ′ = γ.
(2) Suppose that α 6≥ β. Then β > α. By the second part of (1) above, β
is of type H and β′ = ph(β); by item (b) of (3), α is of type S, so α
′ = α. This
leads to the contradiction β > α > ph(β). �
Corollary 5.3.5 The multiset SC attached to a v-chain C is a distinguished
subset of N in the sense of 5.1.1.
Proof: If α in C is of type V or S, then SC,α is a singleton; if it is of type H,
then SC,α = {pv(α), ph(β)}. So there can be no violation of conditions A and B
of §5.1.1 by elements of SC,α.
Suppose α > β. By Proposition 5.3.4 (1), we have α′ > β′ for any choice
of α′ ∈ SC,α and β
′ ∈ SC,β except when the condition (*) holds. By (3)
of the same proposition, if (*) holds, then β′ = β, and writing β = (r, c),
ph(α) = (a, a
∗), we have r < a (since α > β) and c < r < a∗ (see proof of
item 3(c) of the proposition). Thus there can be no violation of conditions A
and B of §5.1.1. �
Corollary 5.3.6 Let S be a v-chain in ON and w an element of I(d). If w
O-dominates S, then w dominates in the sense of [7] the monomial S ∪ S#
of N.
Proof: By [4, Proposition 5.15], it is enough to show that w dominates S. Let
C : α1 > . . . > αt be a v-chain in S. Writing αj = (rj , cj) and qC,αj = (Rj , Cj)
we have rj ≤ Rj and Cj ≤ cj . By Proposition 5.3.4 (1), we have qC,α1 > . . . >
qC,αt . Since w O-dominates S, it in particular dominates qC,α1 > . . . > qC,αt
and so also C. �
6 O-depth
The concept of O-depth defined in §6.1 below plays a key role in this paper. As
the name suggests, it is the orthogonal analogue of the concept of depth of [7].
In §6.2 below, it is observed that the O-depth is no smaller than depth in the
sense of [7]. In §6.3, some observations about the relation between O-depths
and types of elements in v-chains are recorded.
6.1 Definition of O-depth
The O-depth of an element α in a v-chain C in ON is the depth in SC in the
sense of [7] of qC,α: in other words, it is the depth in SC of pv(α) in case α is of
type V or H, and of α (equivalently of α#) in case α is of type S. It is denoted
O-depthC(α). The O-depth of an element α in a monomial S of ON is the
maximum, over all v-chains C in S containing α, of the O-depth of α in C. It
is denoted O-depth
(α). Finally, the O-depth of a monomial S in ON is the
maximum of the O-depths in S of all the elements of S.
There is a conflict in the above definitions: Is the O-depth of an element
of a v-chain C the same as its depth as an element of the monomial C? In
other words, could the O-depth of an element in a v-chain be exceeded by its
O-depth in a sub-chain? The conflict is resolved by the first item of the following
proposition.
Proposition 6.1.1 1. For v-chains C ⊆ D, the O-depth in C of an element
of C is no more than its O-depth in D.
2. If a v-chain C is an initial segment of a v-chain D, then the O-depths
in C and D of an element of C are the same.
Proof: (1): By an induction on the difference in the cardinalities of D and C,
we may assume that D has one more element than C. Call this extra element δ.
Suppose that δ lies between successive elements α and β of C (the modifications
needed to cover the extreme cases when it goes at the beginning or the end are
being left to the reader).
The only elements of C that could possibly undergo changes of type on
addition of δ are α and the last element in the connected component of β,
which let us call β′. If there are no type changes, then SC ⊆ SD and the
assertion is immediate. The only type change that α can undergo is from H
to V. The type changes that β′ can undergo are: H to V; V to H; S to V; V to
S. An easy enumeration of cases shows that only one of α and β′ can undergo
a type change.
We need not worry about changes from V to H for in this case SC ⊆ SD.
First let us suppose that α undergoes a change of type (from H to V). Then
δ is connected to α. It follows from Proposition 5.3.1 (1) that δ has type V in D:
the connected component of α in C has odd number of elements, so if δ happens
to be the last element in its connected component in D, the number of elements
in that component will be even. Replacing an occurrence of ph(α) in a v-chain
of SC by pv(δ) would result in a v-chain in SD (by Proposition 5.3.4 (1)), and
this case is settled.
Now suppose that β′ undergoes a type change. Then δ is connected to β and
δ is of type V in D (Proposition 5.3.1 (2)). Replacing by pv(δ) any occurrence
in a v-chain in SC of pv(β
′), ph(β
′), β′ accordingly as the type of β′ in C
is V, H, or S, (not necessarily in the same place but at an appropriate place)
would result in a v-chain in SD (by Proposition 5.3.4 (1)), and we see that the
O-depth cannot decrease.
(2): It follows from Proposition 5.3.4 (2) that, for an element α of C, con-
tributions to SD from elements beyond α (in particular from those not in C)
do not affect the depth in SD of qD,α. Looking for the possibility of differences
in types in C and D of elements of C, we see that the only element of C that
has possibly a different type in D is its last element. And this too can change
type only from H to V.
The above two observations imply that the calculations of O-depths in C
and D of an element α of C are no different: we would be considering the depth
in SC and SD respectively of the same element (either pv(α) or α), and the
differences in SD and SC have no effect on this consideration. �
Corollary 6.1.2 If C ⊆ D are v-chains in ON, then wC ≤ wD (although it is
not always true that SC ⊆ SD).
Proof: By [7, Lemma 5.5], it is enough to show that every v-chain in SC is
dominated by wD. Let β1 = (r1, c1) > · · · > βt = (rt, ct) be an arbitrary v-chain
in SC . To show that it is dominated by wD, it is enough, by [7, Lemma 4.5], to
show the existence of a v-chain (R1, C1) > · · · > (Rt, Ct) in SD with rj ≤ Rj
and Cj ≤ cj for 1 ≤ j ≤ t. Such a v-chain exists by the proof of (1) of Proposi-
tion 6.1.1. �
Corollary 6.1.3 1. Let S be a monomial in ON and α ∈ S. Then there ex-
ists a v-chain C in S with tail α such that O-depth
(α) = O-depthC(α).
2. For elements α > γ in a v-chain C (these need not be consecutive), we
have O-depthC(α) < O-depthC(γ).
3. For elements α > γ of a monomial S in ON, we have O-depth
(α) <
O-depth
4. No two elements of the same O-depth in a monomial in ON are compa-
rable.
Proof: (1) This follows from (2) of the Proposition above and the definition
of O-depth.
(2) This follows from Proposition 5.3.4 (1) and the definition of O-depth.
(3) By (1), there exists a v-chain C with tail α such that O-depth
(α) =
O-depthC(α). Concatenate C with α > γ and letD denote the resulting v-chain.
By (2) of the Proposition above, O-depthC(α) = O-depthD(α). By (2) above,
O-depthD(α) < O-depthD(γ). And finally, O-depthD(γ) ≤ O-depthS(γ) by
the definition of O-depth
(4) Immediate from (3). �
Corollary 6.1.4 Let β > γ be elements of a v-chain C of elements of ON. Let
E be a v-chain in SC with tail qC,γ and length O-depthC(γ). Then qC,β occurs
in E.
Proof: It is enough to show that for α′ 6= qC,β in E, either α
′ > qC,β or
qC,β > α
′. Let α be in C such that qC,β 6= α
′ ∈ SC,α. If β ≥ α, then qC,β > α
by Proposition 5.3.4 (1). If α > β and α′ 6> qC,β , then, by (1) and (3) of the
same proposition, α′ 6> qC,γ , a contradiction. �
6.2 O-depth and depth
Lemma 6.2.1 The O-depth of an element α in a monomial S of ON is no
less than its depth (in the sense of [7]) in S ∪S#.
Proof: Let C : α1 > . . . > αt be a v-chain in S∪S
# with tail αt = α, where t
is the depth of α in S ∪S#. We then have α1(up) > . . . > αt(up), so we may
assume C to be in S. By Proposition 5.3.4 (1), qC,α1 > . . . > qC,αt in SC . So
depth
S∪S#(α) = t ≤ depthSC (qC,αt) ≤ O-depthS(α). �
6.3 O-depth and type
We begin by defining some useful terminology. Let (r, c) and (R,C) be two
elements of R. To say that (R,C) dominates (r, c) means that r ≤ R and C ≤ c
(in terms of pictures, (r, c) lies (not necessarily strictly) to the Northeast of
(R,C)). To say that they are comparable means that either (R,C) > (r, c) or
(r, c) > (R,C). While this is admittedly strange, there will arise no occasion
for confusion.
For an integer i, we let i(odd) be the largest odd integer not bigger than i
and i(even) the smallest even integer not smaller than i.
Lemma 6.3.1 1. For consecutive elements α > β of a v-chain C,
O-depthC(β) =
O-depthC(α) + 2 if and only if α is of
type H and ph(α) > β
O-depthC(α) + 1 otherwise
2. For an element of a v-chain C such that either its horizontal projection
belongs to N or it is connected to its predecessor, the parity of its O-depth
in C is the same as that of its ordinality in its connected component in C.
3. The O-depth in a v-chain of an element of type H is odd.
4. If in a v-chain an element of type V is the last in its connected component,
then its O-depth is even.
5. If in a v-chain C there is an element of O-depth d, then
(a) for every odd integer d′ not exceeding d, there is in C an element of
O-depth d′.
(b) if, for an even integer d′ not exceeding d, there is no element in C of
O-depth d′, then the element α in C of O-depth d′ − 1 is of type H,
and ph(α) > β, where β denotes the immediate successor of α in C.
6. Let C be a v-chain and α an element of type H in C. Then the depth in
SC of ph(α) equals O-depthC(α) + 1. In particular, this depth is even.
Proof: (1): From items 1 and 3(a) of Proposition 5.3.4, it follows that, for
γ in C with γ > α, if γ′ 6> qC,α for some γ
′ in SC,γ , then γ
′ 6> qC,β. Thus
O-depthC(β) exceeds O-depthC(α) by the number of elements in SC,α that
dominate qC,β . This number is 1 if α is of type V, or of type S, or of type H
and ph(α) 6> β; it is 2 if α is of type H and ph(α) > β (note that ph(α) > β if
and only if ph(α) > qC,β).
(2): Let λ be such an element. Everything preceding λ in C is of type H
or V (Proposition 5.3.1 (3)). Let λ belong to the kth connected component,
and n1, . . . , nk be respectively the cardinalities of the first, . . . , k
th connected
components. By (1) above and item 3(b) of Proposition 5.3.4, O-depthC(λ)
is n1(even) + · · · + nk−1(even) plus the ordinality of λ in the k
th connected
component.
(3) and (4): These are special cases of 2.
(5): This follows easily from (1) and (3).
(6): It follows from Proposition 5.3.4 (2) that there is no element γ in SC
that lies between pv(α) and ph(α) (meaning pv(α) > γ > ph(α)), so the asser-
tion holds. �
Corollary 6.3.2 For a v-chain C in ON, if the O-depths of elements in C are
bounded by k, then the depths of elements in SC are bounded by k(even).
Proof: The depth of qC,α in SC for any α in C is at most k by hypothesis.
An element of SC that is not qC,α for any α in C can only be of the form
ph(α) for some α. By Proposition 5.3.4, depthSCpv(α) = depthSCph(α) − 1,
which implies depth
ph(α) ≤ k + 1. If, moreover, k is even, then by (3) of
Lemma 6.3.1 depth
ph(α) = depthSCpv(α) + 1 ≤ (k − 1) + 1 = k. �
Proposition 6.3.3 Given a monomial S in ON and an element α in it, there
exists a v-chain C in S with tail α such that O-depthC(β) = O-depthS(β) for
every β in C.
Proof: Proceed by induction on d := O-depth
(α). Choose a v-chain D in S
with tail α such that O-depthD(α) = O-depthS(α) (such a v-chain exists by
Corollary 6.1.3 (1)). Let α′ be the element in D just before α. It follows from
item (3) of Corollary 6.1.3 and item (1) of Lemma 6.3.1 that O-depth
(α′) (as
also O-depthD(α
′)) is either d − 1 or d − 2. By induction, there exists a v-
chain C′ with tail α′ that has the desired property. Let C be the concatenation
of C′ with α′ > α.
We claim that C has the desired property. The only thing to be proved is
that O-depthC(α) = d. By item (1) of Lemma 6.3.1, we have O-depthC(α) ≥
O-depthC′(α
′) + 1. In particular, the claim is proved in case O-depthC′(α
is d − 1, so let us assume that O-depthC′(α
′) is d− 2. It now follows from the
same item that α′ has type H in D and ph(α
′) > α; it further follows that it is
enough to show that α′ has type H in C.
Since α′ has type H in D, it follows (from item (2) of Proposition 5.3.1) that
α′ > α is not connected and (from item (3) of Lemma 6.3.1) that d− 2 is odd.
Now, by item (4) of Lemma 6.3.1, the type in C′ of α′ cannot be V, so it is H,
and the claim is proved. �
Corollary 6.3.4 Let S be a monomial in ON, β an element of S, and i an
integer such that i < O-depth
(β). Then
(a) If i is odd, there exists an element α in S of O-depth i such that α > β.
(b) If i is even and there is no element α in S of O-depth i such that α > β,
then there is element α in S of O-depth i− 1 such that ph(α) > β.
Proof: Choose a v-chain C in S having tail β and the good property of Propo-
sition 6.3.3. Apply Lemma 6.3.1 (5). �
Corollary 6.3.5 Let C be a v-chain in ON with tail α such that O-depthC(α)
is odd. Let A be a v-chain in ON with head α, and D the concatenation of C
with A. Let C′ denote the v-chain C \ {α}. Then
1. The type of an element of A is the same in both A and D. In particular,
SA ⊆ SD and qA,β = qD,β for β in A.
2. The type of an element of C′ is the same in both C′ and D. In particular,
SC′ ⊆ SD.
3. SD = SC′ ∪ SA (disjoint union); letting j0 := O-depthC(α) we have
j0 = SA and (SD)1 ∪ · · · ∪ (SD)j0−1 = SC′ . (For a monomial S,
the subset of elements of depth at least i is denoted Si, and the subset of
elements of depth exactly i is denoted Si.)
Proof: (1) Generally (meaning without the assumption that O-depthC(α) is
odd), the only element of A that could possibly have a different type in D is
the last one in the first connected component of A; whether or not it changes
type depends exactly upon whether or not the parity of the cardinality of its
connected component in D is different from that in A. Under our hypothesis,
this parity does not change, for, by (4) of Lemma 6.3.1, the type of α in C is H
or S, and so the cardinality of the connected component of α in C is odd.
(2) Generally (meaning without the assumption that O-depthC(α) is odd),
the only element of C′ that could possibly have a different type in D is the last
one of C′; it changes type if and only if it is connected to α and the cardinality
of its connected component in C′ is odd. Under our hypothesis, this cardinality
is even, for the same reason as in (1).
(3) That SD = SC′ ∪ SA (disjoint union) is an immediate consequence
of (1) and (2). By Lemma 6.3.1 (1), qA,α = qD,α dominates every element of
SA, so SA ⊆ (SD)
j0 (depth
qD,α = O-depthD(α) = O-depthC(α) = j0).
It is enough to prove the following claim: every element of SC′ has depth less
than j0 in SD. Let γ
′ be an element of SC′ . If γ
′ > qD,α then the claim is
clear. If not, then, by Proposition 5.3.4 (1), γ′ = ph(γ). By Lemma 6.3.1 (3),
O-depthD(γ) is odd. Since the claim is already true for qD,γ = pv(γ), we
have O-depthD(γ) = depthSDpv(γ) ≤ j0 − 2. By (6) of the same lemma,
depth
γ′ = O-depthD(γ)+1, so depthDγ
′ ≤ j0−1, and the claim is proved.�
Proposition 6.3.6 Let S be a monomial in ON and j an odd integer. For β
in Sj,j+1(:= {α ∈ S |O-depth
(α) ≥ j}), we have
O-depth
Sj,j+1
(β) = O-depth
(β)− j + 1
Proof: Proceed by induction on j. For j = 1, the assertion reduces to a
tautology. Suppose that the assertion has been proved upto j. By the induction
hypothesis, we have Sj+2,j+3 = (Sj,j+1)3,4, and we are reduced to proving the
assertion for j = 3.
Let A be a v-chain in S3,4 with tail β and O-depthA(β) = O-depthS3,4(β).
Let α be the head of A. We may assume that O-depth
(α) = 3 for, if
O-depth
(α) > 3, we can find, by Lemma 6.3.1 (5), α′ of O-depth 3 in S
with α′ > α, and extending A by α′ will not decrease the O-depth in A of β
(Proposition 6.1.1 (1)). Let E be a v-chain in SA with tail qA,β and length
O-depthA(β). The head of E is then qA,α (see Proposition 5.3.4 (1)).
Choose C in S with tail α such that O-depthC(α) = 3. Let D be the
concatenation of C with A. By Corollary 6.3.5, E is contained in SD, qD,α =
qA,α, and qD,β = qA,β. By Proposition 6.1.1 (2), the O-depth of α is the same
in D as in C. Choose a v-chain F in SD with tail qD,α = qA,α. Concatenating F
with E we get a v-chain inSD with tail qD,β = qA,β of lengthO-depthS3,4(β)+2.
This proves that O-depth
(β) ≥ O-depth
(β) + 2.
To prove the reverse inequality, we need only turn the above proof on
its head. Let D be a v-chain in S with tail β such that O-depth
(β) =
O-depthD(β). Let G be a v-chain in SD with tail qD,β and length O-depthS(β).
There exists an element α in D of O-depth 3 in D (by Lemma 6.3.1 (5)). Let
C be the part of D upto and including α, and A the part α > . . . > β. By
Proposition 6.1.1 (2), O-depthC(α) = 3 and, as above, Corollary 6.3.5 applies.
By Corollary 6.1.4, qA,α = qD,α occurs in G. The part F of G upto and in-
cluding qA,α is of length at most 3, and the part E : qD,α > . . . > qD,β belongs
also to SA (Proposition 5.3.4 (2)). Thus the length of G is at most 2 more than
the the length of E which is at most O-depth
(β). �
Corollary 6.3.7 For odd integers i, j, we have (Si,i+1)j,j+1 = Si+j−1,i+j . �
Corollary 6.3.8 Let E : α > . . . > ζ be a v-chain, D and D′ two v-chains with
tail α, and C, C′ the concatenations of D, D′ respectively with E. Then
1. O-depthC(ζ)−O-depthC(α) ≤ O-depthC′(ζ) −O-depthC′(α) + 1;
2. equality holds if and only if the type of λ is H in C and V in C′, and
ph(λ) > µ, where λ is the last element in the connected component con-
taining α of E and µ is the immediate successor in E of λ.
Proof: These assertions follow from combining (2) of Proposition 5.3.3 with
(1) of Lemma 6.3.1. �
Corollary 6.3.9 Let ζ be an element of a monomial S in ON. Let C be a
v-chain in S with tail ζ such that O-depthC(ζ) = O-depthS(ζ). Then
1. O-depthC(α) ≥ O-depthS(α) − 1 for any α in C.
2. If O-depthC(α) = O-depthS(α) − 1 for some α in C, then
(a) letting λ be the last element in the connected component containing α
and µ the element next to λ, the type of λ in C is H and ph(λ) > µ.
(b) O-depthC(γ) = O-depthS(γ) − 1 for all γ in C between α and λ
(both inclusive).
Proof: (1) Let α be in C. Let E denote the part of C beyond (and including) α.
Let D′ be a v-chain in S with tail α such that O-depthD′(α) = O-depthS(α).
Let C′ be the concatenation of D′ and E. Applying Proposition 6.3.8 (1), we
O-depthC(α) ≥ O-depthC(ζ) −O-depthC′(ζ) +O-depthC′(α)− 1.
But O-depthC(ζ)−O-depthC′(ζ) = O-depthS(ζ)−O-depthC′(ζ) ≥ 0, and, by
the choice of D′ and Proposition 6.1.1 (2), O-depthC′(α) = O-depthD′(α) =
O-depth
(2) Assertions (a) and (b) follow respectively from the “only if ” and “if”
parts of item (2) of Proposition 6.3.8. �
7 The map Oπ
The purpose of this section is to describe the map Oπ. The description is given
in §7.1. It relies on certain claims which are proved in §§7.3, 7.4. Those proofs
in turn refer to results from §9, but there is no circularity—to postpone the
definition of Oπ until all the results needed for it have been proved would hurt
rather than help readability. The observations in §7.5 are required only in §10.
The symbol j will be reserved for an odd positive integer throughout this
section.
7.1 Description of Oπ
The map Oπ takes as input a monomial S in ON and produces as output
a pair (w,S′), where w is an element of I(d) such that w ≥ v and S′ is a
“smaller” monomial, possibly empty, in ON. If the input S is empty, no output
is produced (by definition). So now suppose that S is non-empty.
We first partition S into subsets according to the O-depths of its elements.
Let S
be the sub-monomial of S consisting of those elements of S that have
O-depth k—the superscript “pr” is short for “preliminary”. It follows from
Corollary 6.1.3 (4) that there are no comparable elements in S
and so we can
arrange the elements of S
in ascending order of both row and column indices.
Let σk be the last element of S
in this arrangement.
Let now j be an odd integer. We set
j,j+1 := S
We say that S is truly orthogonal at j if ph(σj) belongs to N (that is, if r > r
where σj = (r, c)),
Let Sj,j+1 denote the monomial in N defined by Sj,j+1 :=
j,j+1 \ {σj}
j,j+1 \ {σj}
∪ {pv(σj), ph(σj)}
if S is truly
orthogonal at j
j,j+1 ∪
j,j+1
otherwise
Here S
j,j+1 \ {σj} and other terms on the right are to be understood as mul-
tisets. As proved in Corollary 7.3.4 (1) below, Sj,j+1 has depth at most 2.
Let Sj (respectively Sj+1 be the subset (as a multiset) of elements of depth 1
(respectively 2) of Sj,j+1.
Now, for every integer k, we apply the map of π of [7, §4] to Sk to obtain a
pair (w(k),S′k), where w(k) is an element of I(d, 2d) andS
k is a monomial in N.
Let Sw(k) be the distinguished monomial in N associated to w(k)—see §5.1.2.
Proposition 7.1.1 1. Sw(k) and S
k are symmetric. And therefore so are
∪kSw(k) and ∪kS
2. ∪kSw(k) is a distinguished subset of N (in particular, the Sw(k) are dis-
joint).
3. For j an odd integer, either
• both Sw(j) and Sw(j+1) meet the diagonal, or
• neither of them meets the diagonal,
precisely as whether or not S is truly orthogonal at j. And therefore
∪kSw(k) has evenly many diagonal elements.
4. No S′k intersects the diagonal. And therefore neither does ∪kS
The proposition will be proved below in §7.4.
Finally we are ready to define the image (w,S′) of S under Oπ. We let w
be the element of I(d, 2d) associated to the distinguished subset ∪kSw(k) of N;
since ∪kSw(k) is symmetric and has evenly many diagonal elements, it follows
from Proposition 5.2.1 that w is in fact an element of I(d). And we take S′ :=
k ∩ON.
Remark 7.1.2 Setting
π(Sj,j+1) := (wj,j+1,S
j,j+1), S
′ := ∪j oddS
j,j+1 ∩ON,
and defining w to be the element of I(d, 2d) associated to ∪j oddSwj,j+1 would give
an equivalent definition of Oπ.
7.2 Illustration by an example
We illustrate the map Oπ by means of an example. Let d = 15, and v =
(1, 2, 3, 4, 9, 10, 14, 16, 18, 19, 20, 23, 24, 25, 26). A monomial S in ON is shown
in Figure 7.2.1. Solid black dots indicate the elements that occur in S with non-
zero multiplicity. Integers written near the solid dots indicate multiplicities.
The O-depth of S is 5. The element (21, 9) has O-depth 3 although it has
depth 2 in S. Figure 7.2.2 shows the monomials S
1,2, S
3,4, and S
5,6. Solid
dots, open dots, and crosses indicate elements of these monomials respectively.
The monomial S is truly orthogonal at 1 and 3 but not at 5: σ1 = (28, 2),
σ3 = (21, 9), and σ5 = (15, 14).
Figure 7.2.3 shows the monomials S1,2, S3,4, and S5,6 of N and also their
decomposition into blocks, and Figure 7.2.4 the monomialsS′1,2, S
3,4, and S
We have
Sw = {(15, 14), (17, 16), (21, 10), (7, 4), (27, 24), (28, 3), (30, 1), (29, 2)}
hence w = (7, 9, 15, 17, 18, 19, 20, 21, 23, 25, 26, 27, 28, 29, 30). It is easy to check
that w ∈ I(d). The monomial S′ is the intersection with ON of the union of
S′1,2, S
3,4, and S
5,6—in other words it is just the monomial lying above d in
Figure 7.2.4.
diagonal
1         2        3        4       9       10      14     16     18      19     20     23     24      25     26  
2 1 3 4
3 2 6 2
2 3 2
3 1 2 3
Figure 7.2.1: The monomial S
diagonal
1         2        3        4       9       10      14     16     18      19     20     23     24      25     26  
2 1 3 4
3 2 6 2
2 1 3 2
3 1 2 3
Figure 7.2.2: S
1,2, S
3,4, and S
diagonal
1         2        3        4       9       10      14     16     18      19     20     23     24      25     26  
2 1 3 4
3 2 6 2
2 1 3 2
3 1 2
2 4 3 1 4 2 1 3 2
1 2 1 3 2 1
2 4 6 5 3
1 2 1 4
Figure 7.2.3: S1,2, S3,4, and S5,6
diagonal
1         2        3        4       9       10      14     16     18      19     20     23     24      25     26  
2 1 4
3 6 5 3
2 1 3 2 1
1 4 3 1 4 2 1 3 2
Figure 7.2.4: S′1,2, S
3,4, and S
7.3 A proposition about Sj,j+1
The aim of this subsection is to show that Sj,j+1 has depth no more than 2—
see item (1b) of Proposition 7.3.3. This basic fact was mentioned above in
the description of Oπ and is necessary (psychologically although not logically)
to make sense of the definitions of Sj and Sj+1. We prepare the way for
Proposition 7.3.3 by way of two preliminary propositions. The first of these is
about elements of O-depth j and j + 1 in S, the second about the relation of
these elements with σj .
Proposition 7.3.1 1. S
has no comparable elements.
2. For j an odd integer and β an element of S
j+1, there exists α in S
j such
that α > β. In particular, the row index of σj+1 (if σj+1 exists) is less
than the row index of σj.
Proof: (1) follows from Corollary 6.1.3 (4); (2) follows from Proposition 6.3.3
and Lemma 6.3.1 (5). �
Proposition 7.3.2 Let j be an odd integer and let S be truly orthogonal at j.
1. pv(σj) > ph(σj); if α > pv(σj), then α > σj; if α > σj , then α > ph(σj).
2. No element of S
j is comparable to pv(σj) or ph(σj).
3. No element of S
j+1 is comparable to ph(σj).
4. The following is not possible: α ∈ S
j , β ∈ S
j+1, and ph(α) > β.
Proof: (1) is trivial. (2) follows immediately from the definition of σj . We
now prove (3). First suppose β > ph(σj) for some β in S
j+1. By (2) of
Proposition 7.3.1, there exists α in S
j such that α > β. But then the row
index of α exceeds that of σj , a contradiction to the choice of σj .
We claim that it is not possible for β ∈ S
j+1 to satisfy ph(σj) > β. This
being a special case of (4), we need only prove that statement. So suppose that
α belongs to S
j and that ph(α) > β. Let C be a v-chain in S with tail α such
that O-depthC(α) = j (see Proposition 6.1.3 (1)). Concatenate C with α > β
and call the resulting v-chain D. Then, by Lemma 6.3.1 (4), α is of type H
in D, so that, by Lemma 6.3.1 (1), we have O-depthD(β) = O-depthD(α) + 2.
But, by Proposition 6.1.1 (2), O-depthD(α) = O-depthC(α) = j, so that
O-depth
(β) ≥ j + 2, a contradiction. �
Let Sj,j+1(ext) denote the set—not multiset—defined by:
Sj,j+1(ext) :=
Sj,j+1 ∪ {σj , σ
j } if S is truly orthogonal at j
Sj,j+1 otherwise
Here Sj,j+1 on the right stands for the underlying set of the multiset Sj,j+1
defined above. The set Sj,j+1(ext) is the disjoint union of the sets Sj(ext)
and Sj+1(ext) defined as follows (here again the terms on the right hand side
denote the underlying sets of the corresponding multisets):
Sj(ext) :=
∪ {pv(σ)} if S is truly orthogonal at j
otherwise
Sj+1(ext) :=
j+1 ∪
∪ {ph(σ)} if S is truly orthogonal at j
j+1 ∪
otherwise
Proposition 7.3.3 1. Sj(ext) (respectively Sj+1(ext)) is precisely the set
of elements of depth 1 (respectively 2) in Sj,j+1(ext). In particular,
(a) Neither Sj(ext) nor Sj+1(ext) contains comparable elements.
(b) The length of a v-chain in Sj,j+1(ext) is at most 2.
(c) There is a v-chain of length 2 in Sj,j+1 unless Sj+1(ext) is empty.
2. Let k be a positive integer, not necessarily odd. If there is in S an element
of O-depth at least k, then Sk(ext) is non-empty. The converse also holds
except possibly if k is even and S is truly orthogonal at k−1. In particular,
if Sk(ext) is non-empty, then there is an element of O-depth at least k−1.
Proof: (1): It is enough to show that every element of Sj(ext)(up) (respec-
tively Sj+1(ext)(up)) is of depth 1 (respectively 2) in Sj,j+1(ext)(up), for
• α > β implies α(up) > β(up) for elements α, β of N.
• Sj,j+1(ext) = Sj(ext) ∪Sj+1(ext).
• Sj,j+1(ext), Sj(ext), and Sj+1(ext) are symmetric.
In turn, it is enough to show the following:
(i) Every element of Sj(ext)(up) has depth 1.
(ii) Sj+1(ext)(up) has no comparable elements.
(iii) Every element of Sj+1(ext)(up) has depth at least 2.
Item (i) follows from Proposition 7.3.1 and Proposition 7.3.2 (2); item (ii)
from Proposition 7.3.1 (1) and Proposition 7.3.2 (3); item (iii) from Propo-
sition 7.3.1 (2) and Proposition 7.3.2 (1).
(2): The first assertion follows from Lemma 6.3.1 (5): if k is odd there is an
element ofO-depth k inS; if k is even and there is no element ofO-depth k inS,
then there is in S an element of O-depth k− 1 and of type H, so S is truly or-
thogonal at k−1. The second assertion is clear from the definition of Sk(ext). �
Corollary 7.3.4 1. No element of Sj,j+1 has depth more than 2.
2. Sj+1(ext) = Sj+1 and Sj(ext) ∩ Sj,j+1 = Sj (as sets). In particular,
Sj+1 = Sj,j+1∩Sj+1(ext) and Sj = Sj,j+1∩Sj(ext) as multisets defined
by the intersection of a multiset with a subset.
Proof: (1): Since Sj,j+1 ⊆ Sj,j+1(ext) (as sets), this follows immediately
from (1b) of the proposition above.
(2): Since the union of Sj+1(ext) (which always is contained in Sj,j+1) and
Sj(ext) ∩Sj,j+1 is all of Sj,j+1, and since Sj , Sj+1 are disjoint, it it enough
to show that Sj+1(ext) ⊆ Sj+1 and Sj(ext) ∩Sj,j+1 ⊆ Sj .
Now, since elements of Sj(ext) have depth 1 even in Sj,j+1(ext) (by item (1)
of the proposition above), it is immediate that Sj(ext) ∩ Sj,j+1 ⊆ Sj . And
it follows from the proof of item (iii) in the proof of item (1) of the proposi-
tion above that an element of Sj+1(ext) has depth 2 even in Sj,j+1 (not just
in Sj,j+1(ext)), so that Sj+1(ext) ⊆ Sj+1. �
7.4 Proof of Proposition 7.1.1
(1) The monomials Sj,j+1 are clearly symmetric. Observe that α in Sj,j+1 has
the same depth as α#, for α1 > α2 implies α(up) > α2(up) and α(down) >
α2(down) for α1, α2 in N. Thus the monomials Sk are symmetric. Since the
map π of [7] respects #—see Proposition 5.7 of [4]—it follows that Sw(k) and
S′k are symmetric. Therefore so are ∪kSw(k) and ∪kS
(2) This follows from Corollary 9.3.6.
(3) IfS is truly orthogonal at j, then pv(σj) and ph(σj) are diagonal elements
respectively in Sj and Sj+1—see Corollary 7.3.4 (2). Thus both Sj and Sj+1
have diagonal blocks in the sense of Proposition 5.10 (A) of [4]. It follows from
the result just quoted that both Sw(j) and Sw(j+1) meet the diagonal. It is
of course clear that each Sw(k) meets the diagonal at most once since diagonal
elements are clearly comparable but elements of Sw(k) are not by Lemma 4.9
of [7].
Suppose that S is not truly orthogonal at j. Then σj and σ
j belong to
different blocks—this is equivalent to the definition of S being not truly orthog-
onal at j. By Proposition 7.3.1 (2), it follows that σj+1 and σ
j+1 also belong
to different blocks. So neither Sj nor Sj+1 has a diagonal block.
(4) If S is not truly orthogonal at j, then neither Sj nor Sj+1 has a diagonal
block (as has just been said above), and it follows from Proposition 5.10 (A)
of [4] that neither S′j nor S
j+1 meets the diagonal.
So suppose that S is truly orthogonal at j. Then both Sj and Sj+1 have
a diagonal entry each of multiplicity 1, namely pv(σj) and ph(σj) respectively.
It is clear from the definition of σj that no element of Sj(up) shares its row
index with pv(σj). And it follows from Proposition 7.3.1 (2) that no element
of Sj+1(up) shares its row index with ph(σj). It now follows from the proof of
Proposition 5.10 (B) of [4]—see the last line of that proof—that neither S′j nor
S′j+1 meets the diagonal. �
7.5 More observations
Proposition 7.5.1 The length of any v-chain in Sj,j+1∪S
j+1 is at most 2.
Proof: By Corollary 7.3.4 (1), the length of any v-chain in Sj,j+1 is at most 2.
Applying Lemma 9.1.1 to Sj,j+1, we get the desired result. �
Proposition 7.5.2 1. For an element α′ = (r, c) of S′k(up), there exists an
element α = (r, C) of S
with C ≤ c.
2. For an element α′ = (r, c) of S′j+1(up), there exists an element α = (R, c)
of Sj+1(up) with r ≤ R.
3. For an element α′ of S′j+1(up), there exists an element α of S
j with
α > α′.
Proof: (1) That there exists α inSk(up) with C ≤ c follows from the definition
of S′k(up). Clearly such an α cannot be on the diagonal, so α belongs to S
(2) As in the proof of (1), it follows from the definition of S′j+1 that there
exists α = (R, c) in Sj+1 with r ≤ R. If α lies strictly below the diagonal, then
c > R∗, so that α∗ = (c∗, R∗) > α′ = (r, c), a contradiction to Lemma 9.1.1 (α∗
belongs to Sj+1 by the symmetry of Sj+1). Thus α belongs to Sj+1(up).
(3) Writing α′ = (r, c), by (1), we can find an β = (r, C) in S
j+1 with C ≤ c.
By Proposition 7.3.1 (2), there exists α in S
j,j+1 such that α > β. �
Corollary 7.5.3 If in S′j(up)∪S
j+1(up) there exists an element with horizon-
tal projection in N, then S is truly orthogonal at j.
Proof: Follows directly from Proposition 7.5.2 (1) and (3). �
Proposition 7.5.4 The O-depth of an element in S
j ∪ S
j+1 is at most 2.
More strongly, the O-depth of an element in S
j,j+1 ∪S
j(up) ∪S
j+1(up) is at
most 2.
Proof: It is enough to show that no element in S′j(up)∪S
j+1(up) hasO-depth
more than 2, for we may assume by increasing multiplicities that S
j ⊆ S
j(up)
and S
j+1 ⊆ S
j+1(up) (as sets). It follows from Proposition 7.5.1 that a v-chain
in S′j(up) ∪ S
j+1(up) has length at most 2. Let α
1 = (r1, c1) > α
2 = (r2, c2)
be such a v-chain. It follows from the proof of Corollary 4.14 (2) of [7] that
α′1 ∈ S
j(up) and α
2 ∈ S
j+1(up). By item (1) of Lemma 6.3.1, it is enough to
rule out the following possibility: α′1 is of type H in α
1 > α
2 and ph(α
1) > α
Suppose that this is the case. By Proposition 7.5.2 (1) and (2), it follows
that there exist elements α1 = (r1, C1) ∈ S
j,j+1 and α2 = (R2, c2) ∈ Sj+1(up)
with C1 ≤ c1 and r2 ≤ R2. Since ph(α
1) > α
2, it follows that α1 > α2. Now, if
α2 = ph(σj), then Proposition 7.3.2 (2) is contradicted; if α2 belongs to S
Proposition 7.3.2 (4) is contradicted (because ph(α1) > α2). �
8 The map Oφ
The purpose of this section is to describe the map Oφ and prove some basic
facts about it. Certain proofs here refer to results from §9, but there is no
circularity—to postpone the definition of Oφ until all the results needed for it
have been proved would hurt rather than help readability. As in §7, the symbol j
will be reserved for an odd integer throughout this section.
8.1 Description of Oφ
The map Oφ takes as input a pair (w,T), where T is a monomial, possibly
empty, in ON and w ≥ v an element of I(d) that O-dominates T, and produces
as output a monomial T∗ of ON. To describe Oφ, we first partition T into
subsets Tw,j,j+1. As the subscript w in Tw,j,j+1 suggests, this partition depends
on w.
For an odd integer j, let Sjw (respectively Sw,j,j+1) denote the subset of Sw
consisting of those elements that are j-deep (respectively that are j deep but
not j + 2 deep, or equivalently of depth j or j + 1) in Sw in the sense of [7,
§4]. Since Sw is distinguished, symmetric, and has evenly many elements on
the diagonal d, it follows that Sjw and Sw,j,j+1 too have these properties, and
that, in fact, the number of diagonal elements of Sw,j,j+1 is either 0 or 2 (in
the latter case, the elements have to be distinct since Sw is distinguished and
so is multiplicity free). Let us denote by wj and wj,j+1 the elements of I(d)
corresponding to Sjw and Sw,j,j+1 by Proposition 5.2.1.
Let Tw,j,j+1 denote the subset of T consisting of those elements α such that
• every v-chain in T with head α is O-dominated by wj , and
• there exists a v-chain in T with head α that is not O-dominated by wj+2.
It is evident that the subsets Tw,j,j+1 are disjoint (as j varies over the odd
integers) and that their union is all of T (for w = w1 O-dominates all v-chains
in T by hypothesis and Sjw is empty for large j and so w
j = v). In other words,
the Tw,j,j+1 form a partition of T.
Lemma 8.1.1 1. The length of a v-chain in Tw,j,j+1∪T
w,j,j+1 is at most 2.
In fact, the O-depth of any element in Tw,j,j+1 is at most 2.
2. wj,j+1 O-dominates Tw,j,j+1.
Proof: The lemma follows rather easily from Corollary 9.2.3 as we now show.
Let C be a v-chain in Tw,j,j+1. Let τ be the tail of C. Choose a v-chain D
in T with head τ that is not O-dominated by wj+2. Let E be the concate-
nation of C with D. Since the head of E belongs to Tw,j,j+1, it follows that
E is O-dominated by wj . It follows from (the only if part of) Corollary 9.2.3
(applied with S = E and x = wj) that wj,j+1 O-dominates E
1 ∪ E
2 and
wj+2 O-dominates E3,pr. This means τ 6∈ E3,pr, so τ ∈ E
1 ∪ E
2 , and so
C ⊆ E
1 ∪ E
2 . This proves (2). By Proposition 6.1.1 (2), the O-depths of
elements of C are the same in C and E, so C ⊆ C
1 ∪ C
2 , which proves the
second assertion of (1). The first assertion of (1) follows from the second (see
Lemma 6.2.1). �
Corollary 8.1.2 wj,j+1 dominates Tw,j,j+1 ∪ T
w,j,j+1 in the sense of [7].
Proof: This follows from (2) of Lemma 8.1.1 and Corollary 5.3.6 (the latter
applied with S = Tw,j,j+1 and w = wj,j+1). �
We may therefore apply the map φ of [7, §4] to the pair (wj,j+1,Tw,j,j+1 ∪
w,j,j+1) to obtain a monomial (Tw,j,j+1∪T
w,j,j+1)
⋆ in N. In applying φ, there
is the partitioning of Tw,j,j+1 ∪ T
w,j,j+1 into “pieces”, these being indexed by
elements of Swj,j+1 = Sw,j,j+1—observe that the elements of depth 1 (respec-
tively 2) of Sw,j,j+1 are precisely those of Sw of depth j (respectively j + 1).
We denote by Pβ the piece of Tw,j,j+1∪T
w,j,j+1 corresponding to β in Swj,j+1 .
We also use the notation P∗β as in [7]. Moreover, we will use the phrase piece
of T (with respect to w being implicitly understood) to refer to a piece of
Tw,j,j+1 ∪ T
w,j,j+1 for some odd integer j.
Caution: Thinking of T as a monomial in N and w as an element of I(d, 2d) that
dominates it, there is, as in [7], the notion of “piece of T” (with respect to w).
The two notions of “piece” are different.
Lemma 8.1.3 1. The monomial (Tw,j,j+1∪T
w,j,j+1)
⋆ is symmetric and has
either none or two distinct diagonal elements depending exactly on whether
Swj,j+1 = Sw,j,j+1 has 0 or 2 elements on the diagonal.
2. The depth of (Tw,j,j+1 ∪ T
w,j,j+1)
⋆ is 2; and ∪β∈(Sw)jP
β, ∪β∈(Sw)j+1P
are respectively the elements of depth 1 and 2 in (Tw,j,j+1 ∪ T
w,j,j+1)
Proof: (1) The symmetry follows by combining Proposition 5.6 of [4], which
says that the map π respects the involution #, with Proposition 4.2 of [7], which
says that π and φ are are inverses of each other.
The assertion about diagonal elements follows by combining item (B) of [4,
Proposition 5.10], which is an assertion about the existence and relative mul-
tiplicities of diagonal elements in B and B′ where B is a diagonal block of a
monomial in N, and Proposition 4.2 of [7].
(2) It follows from Propositions 4.2 of [7] that the map π (described in §4 of
that paper) applied to (Tw,j,j+1∪T
w,j,j+1)
⋆ results in the pair (wj,j+1,Tw,j,j+1∪
Tw,j,j+1). It now follows from Lemma 4.16 of [7] that the depth of (Tw,j,j+1 ∪
w,j,j+1)
⋆ is exactly 2. The latter assertions again follow from the results of [7]—
in fact, the proof that π ◦ φ is identity on pages 47–49 of [7] shows that the P∗β
are the blocks in the sense of [7] of the monomial (Tw,j,j+1 ∪ T
w,j,j+1)
Suppose that (Tw,j,j+1 ∪ T
w,j,j+1)
⋆ contains the pair (a, a∗), (b, b∗) of di-
agonal elements with a > b. We call the pair (b, a∗), (a, b∗) the “twists,” and
set δj := (b, a
∗). In other words, δj is the element of the twisted pair that lies
above the diagonal—observe that the twisted elements are reflections of each
other. We allow ourselves the following ways of expressing the condition that
(Tw,j,j+1 ∪ T
w,j,j+1)
⋆ has diagonal elements: δj exists; w is diagonal at j (the
latter expression is justified by the lemma above).
With notation as above, consider the new monomial defined as
(Tw,j,j+1 ∪ T
w,j,j+1)
⋆ if w is not diagonal at j(
(Tw,j,j+1 ∪ T
w,j,j+1)
⋆ \ d
∪ {δj, δ
j } if w is diagonal at j
This new monomial is symmetric and contains no diagonal elements. Its inter-
section with ON is denoted T⋆w,j,j+1. In other words, T
w,j,j+1 is the intersection
of the new monomial with the subset of N of those elements that lie strictly
above the diagonal.
The union of T⋆w,j,j+1 over all odd integers j is defined to be T
w, the result
of Oφ applied to (w,T). This finishes the description of the map Oφ.
For β in Sw,j,j+1(up), we define the “orthogonal piece-star” OP
β corre-
sponding to β as
OP∗β :=
P∗β = P
β(up) if β is not on the diagonal
P∗β ∩ON if β ∈ (Sw)j+1 is on the diagonal
{P∗β ∩ON} ∪ {δj} if β ∈ (Sw)j is on the diagonal
(8.1.1)
With this, we can say that T⋆w is the union of OP
β as β varies over Sw(up).
Lemma 8.1.4 Suppose that (Tw,j,j+1 ∪ T
w,j,j+1)
⋆ contains the pair (a, a∗),
(b, b∗) of diagonal elements with a > b. Let
. . . , (r1, c1), (a, a
∗), (c∗1, r
1), . . . ; . . . , (r2, c2), (b, b
∗), (c∗2, r
2), . . .
be respectively the elements of depth 1 and 2 of (Tw,j,j+1 ∪ T
w,j,j+1)
⋆ arranged
in increasing order of row and column indices. Then
1. c1 ≤ a
∗ and r1 ≤ b (assuming (r1, c1) exists); and
2. r2 < b and c2 ≤ b
∗ (assuming (r2, c2) exists).
Proof: (1) Suppose that (r1, c1) exists. It is clear that c1 ≤ a
∗. From way the
map φ of [7] is defined, it follows that (r1, a
∗) is an element of Tw,j,j+1. Suppose
that r1 > b. Then ph(r1, a
∗) = (r1, r
1) belongs to N. We consider two cases.
If (r2, c2) exists, then, again from the definition of the map φ, it follows that
(r2, b
∗) is an element of Tw,j,j+1. But then ph(r1, a
∗) = (r1, r
1) > (b, b
∗) and
(b, b∗) dominates (r2, b
∗), which means that the v-chain (r1, a
∗) > (r2, b
∗) (note
that a∗ < b∗ because a > b by hypothesis) in Tw,j,j+1 has O-depth more than 2,
a contradiction to Lemma 8.1.1 (1).
Now suppose that (r2, c2) does not exist. (Then (b, b
∗) is the diagonal ele-
ment in (Sw)j+1.) Consider the singleton v-chain C := {(r1, a
∗)} in Tw,j,j+1.
Then SC = {(a, a
∗), (r1, r
1)} which is not dominated by wj,j+1, a contradiction
to Lemma 8.1.1 (2).
(2) Suppose that (r2, c2) exists. Then there exists, by the definition of the
map φ, an element (r2, b
∗) in Tw,j,j+1. Since (r2, b
∗) lies above the diagonal, it
follows that r2 < b. That c2 ≤ b
∗ is clear. �
8.2 Basic facts about Tw,j,j+1 and T
w,j,j+1
Lemma 8.2.1 1. Let α′ > α be elements of T. Let j and j′ be the odd
integers such that α′ ∈ Tw,j′,j′+1 and α ∈ Tw,j,j+1. Then j
′ ≤ j.
2. If, further, either
(a) there exists µ in T such that α′ > µ > α, or
(b) α′ ∈ Pβ′ for β
′ in (Sw)j′+1,
then j′ < j.
Proof: (1) By hypothesis, every v-chain with head α′ is O-dominated by wj
This implies, by Corollary 6.1.2, that every v-chain with head α is O-dominated
by wj
. This shows j′ ≤ j.
(2a) Suppose that j′ = j. It follows from (1) that α′, µ, and α all belong
to Tw,j,j+1. But then α
′ > µ > α is a v-chain of length 3 in Tw,j,j+1, a
contradiction to Lemma 8.1.1 (1).
(2b) Suppose that j′ = j. Then α′ > α is a v-chain in Tw,j,j+1. Being of
length 2, it cannot be dominated by (Sw)j+1, which means, by the definition
of Pβ′ , that α
′ cannot belong to Pβ′ , a contradiction. �
Proposition 8.2.2 1. The length of a v-chain in T⋆w,j,j+1 is at most 2.
2. The O-depth of T⋆w,j,j+1 is at most 2.
3. ∪β∈(Sw)j(up)OP
β is precisely the set of depth 1 elements of T
w,j,j+1 (in
particular, no two elements there are comparable); if δj exists, then it is
the last element of ∪β∈(Sw)j(up)OP
β when the elements are arranged in
increasing order of row and column indices.
4. ∪β∈(Sw)j+1(up)OP
β is precisely the set of depth 2 elements of T
w,j,j+1 (in
particular, no two elements there are comparable); if δj exists, then its
row index exceeds the row index of any element in ∪β∈(Sw)j+1(up)OP
Proof: For (1), it is enough, given Lemma 8.1.3 (2), to show that δj is not
comparable to any element of depth 1 of (Tw,j,j+1 ∪T
w,j,j+1)
⋆, and this follows
from Lemma 8.1.4 (1). In fact, the above argument proves also (3).
For (4), it is enough, given Lemma 8.1.3 (2), the symmetry of the monomi-
als involved in that lemma, and the observation that α > β implies α(up) >
β(up) for elements α, β of N, to show the following: if (a, a∗) > γ = (e, f)
for γ an element of (Tw,j,j+1 ∪ T
w,j,j+1)
⋆ lying (strictly) above the diagonal,
then δj > γ. But this follows from Lemma 8.1.4 (2): γ is a depth 2 ele-
ment in (Tw,j,j+1 ∪ T
w,j,j+1)
⋆, and we have e ≤ r2 < b (and a
∗ < f since
(a, a∗) > γ). In fact, the above argument proves also (2): observe that f ≤ b∗
(Lemma 8.1.4 (2)). �
9 Some Lemmas
The main combinatorial results of this paper are Propositions 4.1.1 and 4.1.2.
They are analogues respectively of Propositions 4.1 and 4.2 of [7]. We have
tried to preserve the structure of the proofs in [7] of those propositions. The
proofs in [7] rely on certain lemmas and it is natural therefore to first establish
the orthogonal analogues of those. The purpose of this section is precisely that.
Needless to say that the lemmas (especially those in §9.4) may be unintelligible
until one tries to read §10.
The division of this section into four subsections is also suggested by the
structure of the proofs in [7]. Each subsection has at its beginning a brief
description of its contents.
9.1 Lemmas from the Grassmannian case
In this subsection, the terminology and notation of [7, §4] are in force. The state-
ments here could have been made in [7, §4] and would perhaps have improved
the efficiency of the proofs there, but do not appear there explicitly.
Let S be a monomial in N. Recall from [7] the notion of depth of an ele-
ment α in S: it is the largest possible length of a v-chain in S with tail α and
denoted depth
α. The depth of S is the maximum of the depths in it of all its
elements. We denote by Sk the set of elements of depth k of S (as in [7]) and
by Sk the set of elements of depth at least k of S.
Caution: For a monomial S of ON, we have introduced in §7.1 the notation Sk.
That is different from the Sk we have just defined.
Lemma 9.1.1 Let S be a monomial in N, and let π(S) = (w,S′), where π is
the map defined in [7, §4]. Then the maximum length of a v-chain in S∪S′ is
the same as the maximum length of a v-chain in S.
Proof: We use the notation of [7, §4] freely. Let d be the maximum length of
a v-chain in S. Suppose α1 > . . . > αℓ is a v-chain in S ∪S
′. Let i1, . . . , iℓ be
such that αj belongs to Sij ∪ S
(the integers ij are uniquely determined—
see Corollary 5.4 of [4]). We claim that i1 < . . . < iℓ. This suffices to prove the
lemma, for Sk ∪S
k is empty for k > d.
To prove the claim, it is enough to show i1 < i2. It follows from Lemma 4.10
of [7] that i1 6= i2. We now assume that i1 > i2 and arrive at a contradiction.
First suppose that α1 ∈ Si1 . Then, by the definition ofSi1 , there exists β inSi2
with β > α1. Now β > α2 and both β, α2 belong to Si2 ∪S
, a contradiction
to [7, Lemma 4.10]. If α1 = (r, c) belongs to S
, then, by the definition of S′i1 ,
there exists (r, a) in Si1 with a ≤ c, and there exists β in Si2 with β > (r, a).
This leads to the same contradiction as before. �
Lemma 9.1.2 Let B and U be monomials in N. Assume that
• the elements of B form a single block (in the sense of [7, Page 38]).
• U has depth 1 (equivalently, there are no comparable elements in U).
• for every β = (r, c) in B, there exist γ1(β) = (R1, C1), and γ2(β) =
(R2, C2) in U such that
C1 < c, C2 < R1, r < R2
(this holds, for example, when there exists γ(β) in U such that γ(β) > β:
take γ1(β) = γ2(β) = γ(β)).
Then there exists a unique block C of U such that w(C) > w(B).
Proof: It is useful to isolate the following observation:
Lemma 9.1.3 Let (r1, c1) and (r2, c2) be elements of N with c2 < r1 ≤ r2. Let
γ11 = (R
1 ), γ
1 = (R
1 ) and γ
2 = (R
2 ), γ
2 = (R
2 ) be elements
of N such that
1. C11 ≤ c1, C
1 < R
1, r1 ≤ R
2. C12 ≤ c2, C
2 < R
2, r2 ≤ R
3. No two of γ11 , γ
1 , γ
2 , γ
2 are comparable (they could well be equal and this
is important for us—see our definition of comparability).
Then the monomial {γ11 , γ
1 , γ
2 , γ
2} consists of a single block.
Proof: It follows from assumption (1) that γ11 and γ
1 belong to a single block:
• if R11 < R
1, then C
1 < R
1 becomes relevant;
• if R21 < R
1, then the other two inequalities in (1) become relevant:
C11 ≤ c1 < r1 ≤ R
Similarly it follows from assumption (2) that γ12 and γ
2 belong to a single block.
We therefore need only consider the cases when, in the arrangement of the
elements {γ11 , γ
1 , γ
2 , γ
2} in increasing order of row indices, both γ
1 , γ
1 come
before or after γ12 , γ
2 . In the former case, the first sequence of inequalities
below shows that γ21 and γ
2 belong to the same block, and we are done; in
the latter case, the second sequence of inequalities below shows that γ22 and γ
belong to the same block, and we are done:
• C12 ≤ c2 < r1 ≤ R
• C11 ≤ c1 < r1 ≤ r2 ≤ R
Continuing with the proof of Lemma 9.1.2, we first prove the existence part.
Arrange the elements of B in non-decreasing order of row numbers as well as
column numbers (this is possible since there are no comparable elements in B).
If β1 = (r1, c1) and β2 = (r2, c2) are successive elements, then c2 < r1 ≤ r2
(since B is a single block). Apply Lemma 9.1.3 with γ11 = γ
1(β1), γ
1 = γ
2(β1),
and γ12 = γ
1(β2), γ
2 = γ
1(β2). We conclude that {γ
1 , γ
1 , γ
2 , γ
2} belongs to a
single block, say C, of U. Continuing thus, we conclude that all γ1(β) and γ2(β),
as β varies over B, belong to C. Since the row (respectively column) index of
w(C) is the maximum (respectively minimum) of all row (respectively column)
indices of elements of C (and similarly for B), it follows that w(C) > w(B).
To prove uniqueness, let C1 and C2 be two blocks of U with w(C1) > w(B)
and w(C2) > w(B). Apply the lemma with (r1, c1) = (r2, c2) = w(B) and
γ11 = γ
1 = w(C1) and γ
2 = γ
2 = w(C2); it follows from [7, Lemma 4.9] that
w(C1) and w(C2) are not comparable. But, unless C1 = C2, neither is the mono-
mial {w(C1), w(C2)} a single block, again by [7, Lemma 4.9]. �
Lemma 9.1.4 Let S be a monomial in N and x an element of I(d, n). For
x to dominate S it is necessary and sufficient that for every α = (r, c) in S
there exist β = (R,C) in Sx with C ≤ c, r ≤ R, and depthSxβ ≥ depthSα.
(Here Sx denotes the distinguished monomial in N associated to x as in [7,
Proposition 4.3].)
Proof: The lemma is a corollary of [7, Lemma 4.5] as we now show.
First suppose that x dominatesS. Let α = (r, c) be an element ofS, and C a
v-chain in S with tail α and length depth
α. Since x dominates C, there exists,
by [7, Lemma 4.5], a chain in D in Sx of length depthSα and tail β = (R,C)
with C ≤ c and r ≤ R, and we are done with the proof of the necessity.
To prove the sufficiency, let C : α1 = (r1, c1) > . . . > αk = (rk, ck) be a
v-chain in S. By hypothesis, there exist β1 = (R1, C1), . . . , βk = (Rk, Ck) in
Sx with Ci ≤ ci, ri ≤ Ri, and depthSxβi = i for 1 ≤ i ≤ k (observe that
replacing the ≥ in the latter condition of the statement by an equality yields
an equivalent statement). We claim that β1 > . . . > βk. By [7, Lemma 4.5], it
suffices to prove the claim.
Since βk has depth k in Sx, there exists a β
k−1 = (R
k−1, C
k−1) of depth
k − 1 in Sx such that β
k−1 > βk. It follows from the distinguishedness of Sx
that that β′k−1 = βk−1: if not, then we have two distinct elements of the same
depth (namely k−1) in Sx both dominating αk, a contradiction. So βk−1 > βk,
and the claim is proved by continuing in a similar fashion. �
Let x be an element of I(d, n). Let Sx denote the distinguished monomial
in N associated to x as in [7, Proposition 4.3]. For k a positive integer, let xk
denote the element of I(d, n) corresponding to the distinguished subset (Sx)k.
For a monomial S ofN, let Sk,k+1 := Sk∪Sk+1. Let xk,k+1 denote the element
of I(d, n) corresponding to the distinguished monomial (Sx)k,k+1; let x
k denote
the element of I(d, n) corresponding to the distinguished subset (Sx)
Caution: For a monomial S of ON and an odd integer j, we have introduced
in §7.1 the notation Sj,j+1. That is different from the Sk,k+1 just defined.
Corollary 9.1.5 x dominates S ⇔ xk dominates Sk ∀ k ⇔ x1,2 dominates
S1,2 and x
3 dominates S3.
Proof: The first equivalence is a restatement of the lemma: in the statement
of the lemma we could equally well have written depth
β = depth
α. The
second follows from the first and the following observations: (S1,2)1 = S1,
(S1,2)2 = S2, (S
3)k = Sk+2; and (x1,2)1 = x1, (x1,2)2 = x2, (x
3)k = xk+2. �
9.2 Orthogonal analogues of Lemmas of 9.1
Lemma 9.2.2 below is the orthogonal analogue of Lemma 9.1.4 (more precisely,
that of the first assertion of Corollary 9.1.5). The following proposition will be
used in its proof.
Proposition 9.2.1 Let x be an element of I(d) and S a monomial in ON.
Then x O-dominates S
1 ∪ S
2 if and only if it O-dominates every v-chain
in S of O-depth at most 2.
Proof: The “if” part is immediate from definitions (in any case, see also
Proposition 7.5.4). For the “only if” part, let C be a v-chain in S of O-depth
at most 2. Our goal is to show that x dominates SC . For this, it is enough, by
Corollary 9.1.5, to show that x1 dominates (SC)1 and x2 dominates (SC)2 (by
choice of C, (SC)k is empty for k ≥ 3).
Let α′ ∈ (SC)1. Choose α in C such that α
′ ∈ SC,α. Choose α0 in S
1 such
that α0 dominates α. Since x O-dominates the singleton v-chain {α0}, it follows
that x1 dominates q{α0},α0 . We claim that q{α0},α0 dominates α
′. To prove the
claim, we need only rule out the possibility that α0 is of type S in {α0} and α
of type V in C. Since α′ ∈ (SC)1, it follows from Proposition 5.3.4 (1) that α
is the first element of C. In particular, if α is of type V in C, then ph(α) ∈ N,
so ph(α0) ∈ N, and α0 is of type H in {α0}. The claim is thus proved.
Now consider an element of (SC)2. Observe that the length of C is at most 2
(Lemma 6.2.1). So our element is either the horizontal projection ph(α) of the
head α of C, or it is qC,β where β is the tail of C. In the first case, let α0 be
as in the previous paragraph, and proceed similarly. It is clear that ph(α0) ∈ N
(because ph(α) ∈ N); x2 dominates ph(α0) and so also ph(α).
Now we handle the second case. If β ∈ S
2 , then C is contained in S
and there is nothing to prove. So assume that O-depth
(β) ≥ 3. Choose a
v-chain D in S with tail β, O-depthD(β) ≥ 3, and with the good property
as in Proposition 6.3.3. There occurs in D an element of O-depth 3, say δ.
(Lemma 6.3.1 (5)). Let A denote the part δ > . . . of D and C′ the part up to
but not including δ. There clearly is an element—call it µ—of depth 2 in SD
that dominates qD,β. This element µ belongs to SC′ (Corollary 6.3.5 (3)). Since
D has the good property of Proposition 6.3.3, C′ ⊆ S
2 , so µ is dominated
by an element in (Sx)2. In particular, qD,β is dominated by the same element
of (Sx)2.
We are still not done, for it is possible that qD,β be β and qC,β be pv(β).
Suppose that this is the case. Then α > β is connected. So ph(α) ∈ N and the
legs of α and β intertwine. As seen above in the third paragraph of the present
proof, there is an element of (Sx)2 that dominates ph(α). By the distinguished-
ness of Sx, it follows that the element in (Sx)2 dominating β is the same as
the one dominating ph(α). By the symmetry of Sx, this element lies on the
diagonal and so dominates pv(β), and, finally, we are done with the proof in the
second case. �
Lemma 9.2.2 Let S be a monomial in ON and x an element of I(d). For x to
O-dominate S it is necessary and sufficient that, for every odd integer j, every
v-chain in S
j+1 is O-dominated by xj,j+1.
Proof: First suppose that x dominates S. Let j be an odd integer and let A
a v-chain in S
j+1. We need to show that xj,j+1 dominates SA. For this,
we may assume that A is maximal (by Corollary 6.1.2). By Corollary 6.1.3 (3),
the length of A is at most 2. By Lemma 6.3.1 (5) (b), for every β in S
j+1 there
exists α in S
j with α > β. Thus we may assume that the head α of A belongs
It is enough to show (see [7, Lemma 4.5]) that for any v-chain E in SA
• the length of E is at most 2;
• there exists an x-dominated monomial in N containing E and the head
of E is an element of depth at least j in that monomial.
The first of these conditions holds by Proposition 7.5.4. We now show that the
second holds.
We may assume that E is maximal in SA. By Proposition 5.3.4 (1), the
head of E is qA,α. Let C a v-chain in S with tail α such that O-depthC(α) = j.
Let D be the concatenation of C with A. We claim that the monomial SD has
the desired properties. That SD is x-dominated is clear (since x O-dominates
S). By Corollary 6.3.5, it follows that qD,α = qA,α and SA ⊆ SD (in particular
that E ⊆ SD). By Proposition 6.1.1 (2), O-depthD(α) = O-depthC(α) = j,
that is, depth
qD,α = j. The proof of the necessity is thus complete.
To prove the sufficiency, proceed by induction on the largest odd integer J
such that S
J+1 is non-empty. When J = 1, there is nothing to prove,
for S
1 ∪ S
2 = S and x1,2 O-dominates S
1 ∪ S
2 . So suppose that J ≥
3. We implicitly use Corollary 6.3.7 in what follows. By induction, x3 O-
dominates S3,4.
Let D be a v-chain in S. Our goal is to show that x dominates SD. Let
α be the element of D with O-depthD(α) = 3—such an element exists, by
Lemma 6.3.1 (5) (if there exists in D an element of O-depth in D exceed-
ing 2); the following proof works also in the case when α does not exist. Let
A be the part α > . . . of D, and C′ the part up to but not including α. By
Proposition 6.1.1 (2), the O-depth (in C′) of elements of C′ is at most 2. By
Proposition 9.2.1, x1,2 dominates SC′ . By Corollary 6.3.5 (3), (SD)1,2 = SC′
and (SD)
3 = SA. Since A ⊆ S
3,4, it follows that x3 dominates SA (induction
hypothesis). Finally, by an application of Corollary 9.1.5, we conclude that x
dominates SD. �
Corollary 9.2.3 Let S be a a monomial in ON and x an element of I(d). For
x to O-dominate S it is necessary and sufficient that x1,2 O-dominate S
and x3 O-dominate S3,4.
Proof: It is easy to see that (x3)j,j+1 = xj+2,j+3; it follows from Proposi-
tion 6.3.6 that (S3,4)
j ∪ (S
j+1 = S
j+2 ∪S
j+3. The assertion follows from
the lemma. �
9.3 Orthogonal analogues of some lemmas in [7]
The proofs of Propositions 4.1 and 4.2 of [7] are based on assertion 4.9–4.16 (of
that paper). Assertion 4.9 being a statement about a single Sk, it is applicable
in the present situation. Since references to it are frequent, we recall it below
as Lemma 9.3.1. As to assertions 4.10–4.16 of [7], assertions 9.3.2, 9.3.4–9.3.9
below are their respective analogues.
A block of a monomial S in ON means a block of Sj,j+1 in the sense of [7]
for some odd integer j.
Caution: Considering S as a monomial in N, there is the notion of a “block”
of S as in [7], which has in fact been used in §9.1, and which is different from the
notion just defined. Both notions are used and it will be clear from the context
which is meant.
Throughout this section S denotes a monomial in ON and j an integer (not
necessarily odd).
Lemma 9.3.1 If B1, . . . ,Bl are the blocks in order from left to right of some
Sk, and w(B1) = (R1, C1), w(B2) = (R2, C2), . . ., w(Bl) = (Rl, Cl), then
C1 < R1 < C2 < R2 < . . . < Rl−1 < Cl < Rl
Proof: This is merely a recall Lemma 4.9 of [7]. In any case it follows easily
from the definitions. �
Lemma 9.3.2 No two elements of Sk(ext) ∪ S
k are comparable. More pre-
cisely, it is not possible to have elements α > β both belonging to Sk(ext)∪S
Proof: It follows from Lemma 9.3.1 that Sk ∪ S
k contains no comparable
elements. If k is even, then Sk(ext) = Sk (Corollary 7.3.4 (2)); if k is odd, we
may assumeSk(ext) = Sk (as sets) by increasing the multiplicity of σk in S
Lemma 9.3.3 For integers i ≤ k, there cannot exist γ ∈ S′i(up) and β ∈ S
such that β > γ. For integers i < k, there cannot exist γ ∈ S′i(up) and β ∈ S
such that β dominates γ.
Proof: Let γ ∈ S′i(up) and β ∈ S
. If i = k and β > γ, then we get a
contradiction immediately to Lemma 9.3.2. Now suppose that i < k and that
β dominates γ. Apply Corollary 6.3.4 (the notation of the corollary being sug-
gestive of how exactly to apply it). Let α be as in its conclusion. The chain
α > γ contradicts Lemma 9.3.2 in case i is odd and either Lemma 9.3.2 or
Proposition 7.5.4 in case i is even. �
Lemma 9.3.4 For (r, c) in S′, there exists a unique block B of S with (r, c)
in B′.
Proof: The existence is clear from the definition of S′. For the uniqueness,
suppose that B and C are two distinct blocks of S with (r, c) in both B′ and C′.
We will show that this leads to a contradiction.
Let i and k be such that B ⊆ Si and C ⊆ Sk. From Lemma 4.11 of [7] (of
which the present lemma is the orthogonal analogue) it follows that i 6= k, so
we can assume without loss of generality that i < k. By applying the involu-
tion # if necessary, we may assume that (r, c) ∈ S′i(up). Now there exists an
element (r, a) in C with a ≤ c (this follows from the definition of C′). Clearly
(r, a) ∈ S
. Taking β = (r, a) and γ = (r, c), we get a contradiction to
Lemma 9.3.3. �
Lemma 9.3.5 Let i < j be positive integers.
1. Given a block B of Sj, there exists a unique block C of Si such that
w(C) > w(B).
2. Given an element β in Sj(ext)∪S
j, there exists α in Si such that α > β.
Proof: (1): The assertion follows by applying Lemma 9.1.2 with B = B
and U = Si. We need to make sure however that the lemma can be applied.
More precisely, we need to check that for every β = (r, c) in B there exist
γ1(β) = (R1, C1) and γ2(β) = (R2, C2) in Si such that C
1 < c, C2 < R1, and
r < R2. We may assume β = β(up), for, if β = β(down), then β(up) also belongs
to Sj because Sj is symmetric, and we can set γ
1(β) = γ2(β(up))(down),
and γ2(β) = γ1(β(up))(down)—note that these two belong to Si since Si is
symmetric.
We consider three cases:
1. β belongs to S.
2. β = ph(σj−1) (in particular, j is even and S is truly orthogonal at j − 1).
3. β = pv(σj) (in particular, j is odd and S is truly orthogonal at j).
Define β′ to be β in case 1, σj−1 in case 2, and σj in case 3. Let C be a v-chain
in S with tail β′ and having the good property as in Proposition 6.3.3.
First suppose that there exists in C an element of O-depth i and denote it
by γ. If ph(γ) 6∈ N (this can happen only in case 1), then set γ
1(β) = γ2(β) = γ.
Now suppose ph(γ) ∈ N. Then γ ∈ Si except when γ = σi with i odd and σi
has multiplicity 1 in S. If γ ∈ Si, take γ
1(β) = γ and γ2(β) = γ# = γ(down);
if γ 6∈ Si, then take γ
1(β) = γ2(β) = pv(γ).
Now suppose that C has no element ofO-depth i. Then, by Lemma 6.3.1 (5),
i is even and there exists in C an element of O-depth i − 1. This element
of C is of type H by Lemma 6.3.1 (1), so S is truly orthogonal at i − 1. Set
γ1(β) = γ2(β) = ph(σi−1).
(2): This proof parallels the proof of (1) above. As in the above proof, we
may assume that β = β(up). Suppose β = (r, c) belongs to S′j. Then there
exists (r, a) ∈ Sj with a ≤ c. Since S
j does not meet the diagonal, it is clear
that (r, a) ∈ ON, and thus it is enough to prove the assertion for β ∈ Sj(ext).
So now take β ∈ Sj(ext). Let β
′ and C be in the proof of (1). First suppose
that there exists in C an element of O-depth i. Denote it by γ. If γ ∈ Si, then
take α = γ. If γ 6∈ Si, then pv(γ) ∈ Si, and we take α = pv(γ). In case there is
no element in C of O-depth i, we take α = ph(σi−1) (see the above proof). �
Corollary 9.3.6 If B and B1 are blocks of S with w(B) = (r, c) and w(B1) =
(r1, c1), then exactly one of the following holds:
c < r < c1 < r1, c1 < r1 < c < r,
c < c1 < r1 < r, or c1 < c < r < r1.
Proof: This is a formal consequence of Lemmas 9.3.1 and 9.3.5, just as Corol-
lary 4.13 of [7] is of Lemmas 4.9 and 4.12 of that paper. �
Corollary 9.3.7 If w(B) > w(C) for blocks B ⊆ Si and C ⊆ Sj of S, then
i < j.
Proof: This is a formal consequence of Lemmas 9.3.1 and 9.3.5. It follows
from the first lemma that i 6= j. Suppose i > j. Then there exists by the second
lemma a block C′ ⊆ Sj such that w(C
′) > w(B). But then w(C′) > w(C), a
contradiction of the first lemma. �
Corollary 9.3.8 Let (s, t) > (s1, t1) be elements of S
′, and B, B1 be blocks of
S such that (s, t) ∈ B′, and (s1, t1) ∈ B
1. Then w(B) > w(B1).
Proof: Let w(B) = (r, c) and w(B1) = (r1, c1). By Corollary 9.3.6, we have
four possibilities. Since (r, c) dominates (s, t) and (r1, c1) dominates (s1, t1),
the possibilities c < r < c1 < r1 and c1 < r1 < c < r are eliminated. It is
thus enough to eliminate the possibility c1 < c < r < r1. Suppose that this
is the case. Then, by Corollary 9.3.7, j1 < j, where j1 and j are such that
B ⊆ Sj and B1 ⊆ Sj1 . Now, by Lemma 9.3.5 (2), there exists α in Sj1 such
that α > (s, t) > (s1, t1). But then this contradicts Lemma 9.3.2. �
Corollary 9.3.9 For a B ⊆ Si of S, the depth of w(B) in Sw is exactly i.
Proof: That the depth is at least i follows from Lemma 9.3.5. That the depth
cannot exceed i follows from Corollary 9.3.7. �
Corollary 9.3.10 Let α ∈ S′k(up), β ∈ S
m(up), and α > β. Then k < m.
Proof: Corollary 9.3.8 and Corollary 9.3.9. �
9.4 More lemmas
This subsection is a collection of lemmas to be invoked in the later subsec-
tions. More specifically, Lemma 9.4.1 and Corollary 9.4.2 are invoked in the
proof of Proposition 4.1.1 in §10.1, Lemma 9.4.3 in the proof of the first half of
Proposition 4.1.2 in §10.2, and Lemma 9.4.4 in the proof of the second half of
Proposition 4.1.2 in §10.3. Throughout this subsection, S denotes a monomial
in ON.
Lemma 9.4.1 Let C be a v-chain in S′, α an element of C, and α′ ∈ SC,α.
Then depth
α′ ≤ k(even), for k the integer such that α ∈ S′k(up).
Proof: Proceed by induction on k. If k = 1, the assertion follows from Corol-
lary 9.3.10, so assume k > 1. Choose a v-chain C′ in SC with tail α
′ and
depthC′α
′ = depth
(α′). The length of a v-chain in SC,α is clearly at most 2.
So, if γ′ is the element two steps before α′ in C′ (if γ′ does not exist then there is
clearly nothing to prove), then γ′ ∈ SC,γ with γ > α (see Proposition 5.3.4 (2)).
We claim that depth
(γ′) ≤ k(odd) − 1. It is enough to prove the claim, for
then depth
(α′) = depthC′α
′ = depthC′γ
′ + 2 ≤ k(odd)− 1 + 2 = k(even).
The claim follows by induction from Corollary 9.3.10 if k is odd or more
generally if γ ∈ S′l(up) with l ≤ k(odd) − 1. So assume that k is even and
γ ∈ S′k−1(up). By 7.5.4, it is not possible that γ is of type H and ph(γ) > α.
So the only possibility is that α′ = ph(α) and γ > α is connected. In particular,
γ is of type V and α of type H in C and γ′ = pv(γ).
Now let µ be the first element in the connected component of α in C. The
cardinality of the part µ > . . . > γ of C is even (by Proposition 5.3.1 (1), it
follows that the cardinality of µ > . . . > α is odd), say e. Letting m be such
that µ ∈ S′m(up), we have, by Proposition 9.3.10, m ≤ k − 1 − (e − 1) =
k − e. If m(even) < k − e, then, since depthC′γ
′ = depthC′pv(µ) + e − 1 (by
Proposition 5.3.4 (1), since, by Proposition 5.3.1 (2), µ, . . . , γ all have type V
in C) and depthC′pv(µ) ≤ m(even) by induction, it follows that depthC′γ
k − e+ e− 1 = k − 1, and we are done.
So suppose that m(even) = k − e. Let ν be the element just before µ in C
(if such an element does not exist, then depthC′γ
′ = e ≤ k − 2—observe that
m(even) ≥ 2—and we are done). Then ν > µ is not connected (by choice of µ).
So ph(ν) > µ. By Proposition 7.5.4, this means that j ≤ m(even) − 2 where j
is the odd integer defined by ν ∈ S′j(up) ∪ S
j+1(up). So, again by induction,
depthC′γ
′ = depthC′ph(ν) + e ≤ m(even) − 2 + e = k − 2, and the claim is
proved. �
Corollary 9.4.2 The O-depth of an element α in S′ is at most k where k is
such that α ∈ S′k(up).
Proof: Let C′ be a v-chain in SC with tail qC,α. If k is even, then, by the
lemma, depthC′qC,α ≤ k. So suppose that k is odd. Let γ
′ be the immediate pre-
decessor of qC,α in C
′. By Proposition 5.3.4 (2), γ > α, and so γ ∈ S′l(up) with
l ≤ k− 1 (see the observation in the first paragraph of the proof of the lemma).
So depthC′γ
′ ≤ k − 1 (by the lemma) and depthC′α
′ = depthC′γ
′ + 1 ≤ k. �
Lemma 9.4.3 Let S be a monomial in ON and Oπ(S) = (w,S′). Let i <
k be integers, α an element of S′i(up), and δ an element of (Sw)k(up) that
dominates α.
1. If k is even, then there exists β ∈ S′k(up) with α > β.
2. If k is odd and wk,k+1 O-dominates the singleton v-chain α, then either
there exists β ∈ S′k(up) with α > β or there exists γ ∈ S
k+1(up) with
ph(α) > γ.
Proof: Write α = (r, c) and δ = (A,B). By Corollary 9.3.9, there exists
a block B of Sk such that δ = w(B). Let (D,B) be the first element of B
(arranged in increasing order of row and column indices). We have the following
possibilities:
(i) D ≤ A and (D,B) ∈ S
(ii) k is odd, S is truly orthogonal at k, (D,B) = (A,B) = pv(σk), and B
consists of the single diagonal element (D,B) = (B∗, B).
(iii) k is even, S is truly orthogonal at k−1, (D,B) = (A,B) = ph(σk−1), and
B consists of the single diagonal element (D,B) = (B∗, B).
We claim the following: in case (i), D < r (in particular, D < A); in case (ii),
the row index of σk is less than r; and case (iii) is not possible. The first
two assertions and also the third in the case i < k − 1 follow readily from
Lemma 9.3.3; in case (iii) holds and i = k − 1, then σk−1 > α, a contradiction
to Lemma 9.3.2.
First suppose that possibility (ii) holds. Write σk = (s,B). Since s < r
and ph(σk) ∈ N, it is clear that ph(α) = (r, r
∗) also belongs to N. From the
hypothesis that wk,k+1 O-dominates {α}, it follows that there is an element
of (Sw)k+1 that dominates ph(α) = (r, r
∗). Such an element must be diagonal
(because of the distinguishedness ofSw), and so must be the w(C) for the unique
diagonal block C ofSk+1. In particular, this means that there are elements other
than (s, s∗) in Sk+1, and so S
k+1 is non-empty. In the arrangement of elements
of S′k+1(up) in increasing order of row and column numbers, let γ = (e, s
∗) be
the last element. Then e < s < r and r∗ < s∗, so ph(α) > γ, and we are done.
Now suppose that possibility (i) holds. Let (p, q) be the element of Sk such
that p is the largest row index that is less than r, and, among those elements
with row index p, the maximum possible column index is q. The arrangement of
elements of Sk (in increasing order or row and column indices) looks like this:
. . . , (p, q), (s, t), . . .
Since p < r ≤ A and w(B) = (A,B), we can be sure that (p, q) is not the last
element of B.
We first consider the case c < t. Then α = (r, c) > β := (p, t) ∈ S′k.
If β ∈ S′k(up), then we are done. It is possible that (p, q) lies on or below
the diagonal so that β lies below the diagonal, in which case, α > β(up) and
β(up) ∈ S′k(up), and again we are done.
Now suppose that t ≤ c. We claim that:
• (s, t) belongs to the diagonal;
• k is odd and S is truly orthogonal at k; and
• σk = (u, t) with u < r.
Suppose that (s, t) does not belong to the diagonal. Since r ≤ s (by choice of
(p, q)), it follows that (s, t) dominates (r, c). This leads to a contradiction to
Lemma 9.3.3, for either (s, t) or its reflection (t∗, s∗) (whichever is above the
diagonal) belongs to S
and dominates α = (r, c) in S′i(up). This shows that
(s, t) belongs to the diagonal. If k is even, then (s, t) = ph(σk−1), which means
σk−1 > α, again contradicting Lemma 9.3.3, so k must be odd. It also follows
that S is truly orthogonal at k and that (s, t) = pv(σk). Writing σk = (u, t), if
r ≤ u, then σk would dominate α, again contradicting Lemma 9.3.3. So u < r,
and the claim is proved.
To finish the proof of the lemma, now proceed as in the proof when possi-
bility (ii) holds. �
Lemma 9.4.4 Let T be a monomial in ON and w an element of I(d) that O-
dominates T. Let β′ > β be elements Sw(up). Let d−1 and d be their respective
depths in Sw. Let α be an element of OP
β or more generally an element of
ON such that
(a) it is dominated by β,
(b) it is not comparable to any element of Pβ, and
(c) in case d is odd, then {α} ∪ Tw,d,d+1 has O-depth at most 2.
1. there exists α′ ∈ P∗β′(up) with α
′ > α;
2. for α′ as in (1), if α′ is diagonal, then ph(δd−2) > α if d is odd and
δd−1 > α if d is even.
Proof: Assertion (2) is rather easy to prove. If d is odd, then, in fact,
ph(δd−2) = α
′; if d is even, then δd−1 has the same column index as α
′ and, by
Proposition 8.2.2 (4), has row index more than that of α, so δd−1 > α.
Let us prove (1). Write α = (r, c), β = (R,C), and β′ = (R′, C′). There
exists, by the definition of P∗β′ , an element in P
β′ with column index C
′. We
have C′ < c (for C′ < C ≤ c). Let (r′, c′) be the element of P∗β′ such that c
is maximum possible subject to c′ < c and among those elements with column
index c′ the maximum possible row index is r′. If r < r′, then we are done (if
(r′, c′) is below the diagonal, its mirror image would have the desired properties).
It suffices therefore to suppose that r′ ≤ r and arrive at a contradiction.
In the arrangement of elements of P∗β′ in non-decreasing order of row and
column indices, there is a portion that looks like this:
. . . , (r′, c′), (a, b), . . .
Since there is in P∗β an element with row index R
′ (and clearly r′ ≤ r < R < R′),
it follows that (a, b) exists (that is, (r′, c′) is not the last element in the above
arrangement). It follows from the construction of P∗β′ from Pβ′ that (r
′, b) is an
element in Pβ′ . By the choice of (r
′, c′), we have c ≤ b. Thus (r, c) dominates
(r′, b).
The proof now splits into two cases accordingly as d is even or odd. First
suppose that d is even. Then, since β dominates (r′, b) and yet (r′, b) does not
belong to Pβ, there exists a v-chain in Tw,d−1,d of length 2 and head (r
′, b). The
tail of this v-chain then belongs toPβ and is dominated by (r, c), a contradiction
to our assumption that α is not comparable to any element of Pβ.
Now suppose that d is odd. Choose a v-chain C in T with head (r′, b) that
is not O-dominated by wd. Let D be the part of C consisting of elements of
O-depth (in C) at most 2. We claim that D is O-dominated by wd,d+1. In fact,
we claim the following: Any v-chain F with head (r′, b) and O-depth at most 2
is O-dominated by wd,d+1.
To prove the claim, we first prove the following subclaim:
(†) If the horizontal projection of (r′, b) belongs to N, then β is on
the diagonal and dominates the vertical projection of (r′, b), and the
diagonal element β1 of (Sw)d+1 dominates the horizontal projection
of (r′, b).
Let ph(r
′, b) ∈ N. Then β belongs to the diagonal because Sw is distinguished
and symmetric. Once β is on the diagonal, it is clear that it dominates pv(r
′, b)
(from our assumptions, β dominates (r, c) and (r, c) dominates (r′, b)). It follows
from Proposition 8.2.2 (3) that the row index of β1 exceeds the row index r of
(r, c), so β1 dominates ph(r
′, b). This finishes the proof of the subclaim (†).
To begin the proof of the claim, observe that F has length at most 2. Suppose
first that F consists only of the single element (r′, b). The type of (r′, b) in
F is either H or S. If it is S, then since β dominates (r′, b), the claim follows
immediately. If it is H, then the claim follows immediately from the subclaim (†).
Continuing with the proof of the claim, let now F consist of two elements:
(r′, b) > µ. Let γ be the element of Sw such that µ ∈ Pγ , and let e be the
depth of γ in Sw. From Lemma 8.2.1 (2b) it follows that e ≥ d. If e = d, then
γ = β (by the distinguishedness of Sw), and the comparability of (r, c) and µ
contradicts our hypothesis (b). So e ≥ d+1, and there exists δ of depth d+1 in
Sw that dominates µ. We have β > δ (again by the distinguishedness of Sw).
The possibilities for the types of (r′, b) and µ in F are: S and S, V and V, H
and S (in the last case ph(r
′, b) 6> µ by Lemma 6.3.1 (1)). Noting the existence
in (Sw)d,d+1 of the v-chain β > δ in the first case and also of β > β1 (where
β1 is as in the subclaim) in the last case, the proof of the claim in these cases
is over. So suppose that the second possibility holds. The distinguishedness of
Sw implies that δ = β1. Since δ is diagonal, it dominates the vertical projection
of µ. Noting the existence of the v-chain in β > δ in (Sw)d,d+1, the proof of
the claim in this case too is over.
We continue with the proof of the lemma. It follows from the claim that D is
O-dominated by wd,d+1. From Corollary 9.2.3 it follows that the complement E
of D in C is not O-dominated by wd+2,d+3 (in particular, that E is non-empty)
and that every v-chain in T with head ǫ (where ǫ denotes the head of E) is
O-dominated by wd (given such a v-chain, the concatenation of D with it is
O-dominated by wd−2, and ǫ continues to have O-depth 3 in the concatenated
v-chain). Thus ǫ belongs to Tw,d,d+1. From (1) and (2b) of Lemma 8.2.1 it
follows that the element µ of C in between (r′, b) and ǫ (if it exists at all) also
belongs to Tw,d,d+1. Now consider the v-chain obtained as follows: take the part
of C up to (and including) ǫ and replace its head (r′, b) by (r, c). This chain has
O-depth 3 and lives in {α} ∪ Tw,d,d+1, a contradiction to hypothesis (c). �
Corollary 9.4.5 Let T be a monomial in ON and w an element of I(d) that
O-dominates T. Let β′ > β be elements of Sw(up), α an element of OP
β, and
d′ := depth
1. If d′ is odd, there exists α′ ∈ OP∗β′ such that α
′ > α.
2. If there does not exist α′ ∈ OP∗β′ such that α
′ > α then (d′ is even by (1)
above and) there exists α′′ ∈ OP∗β′′ such that ph(α
′′) > α, where β′′ is the
unique element of (Sw)d′−1 such that β
′′ > β′.
Proof: Immediate from the lemma. �
Corollary 9.4.6 Let T be a monomial in ON and w an element of I(d) that
O-dominates T. Let β, β′ be elements of Sw(up), and α, α
′ elements of OP∗β
and OP∗β′ respectively.
1. If α′ > α then β′ > β (in particular, depth
β′ < depth
2. If ph(α
′) > α and depth
β is even, depth
β′ ≤ depth
β − 2.
Proof: (1) Writing β = (r, c) and β′ = (r′, c′), there are, since both β and β′
dominate α and Sw is distinguished, the following four possibilities:
c < r < c1 < r1, c1 < r1 < c < r, c < c1 < r1 < r, c1 < c < r < r1
Since α′ > α, and α, α′ are dominated respectively by β, β′ (this is because α,
α′ belong to OP∗β, OP
β′ respectively), the possibilities c < r < c1 < r1 and
c1 < r1 < c < r are eliminated (by the distinguishedness of Sw). It is thus
enough to eliminate the possibility β > β′. Suppose, by way of contradiction,
that β > β′. By Corollary 9.4.5, either there exists γ ∈ OP∗β such that γ > α
in which case the v-chain γ > α in OP∗β contradicts Proposition 8.2.2 (3) or (4),
or d := depth
β is even and there exists (with β′′ being the unique element
in Sw such that β
′′ > β and depth
β′′ = d− 1) an element α′′ ∈ OP∗β′′ with
′′) > α′, in which case the v-chain α′′ > α in T⋆w,d−1,d has O-depth 3 and
so contradicts Proposition 8.2.2 (2).
(2) Set d := depth
β. If depth
β′ were d− 1, then the v-chain α′ > α in
T⋆w,d−1,d would be ofO-depth 3 and so would contradict Proposition 8.2.2 (2). �
10 The Proof
The aim of this section is to prove Propositions 4.1.1 and 4.1.2. The proof of
first proposition appears in §10.1 and that of the second in §§10.2, 10.3. In §9.4
some lemmas are established that are used in the proofs. Needless to say that
the lemmas maybe unintelligible until one tries to read the proofs in the later
subsections.
10.1 Proof of Proposition 4.1.1
(1) By definition, w is the element of I(d) associated to the distinguished mono-
mial ∪kSw(k). By the very definition of this association, we have w ≥ v.
(2) This follows from the corresponding property of the map π of [7]. More
precisely, that property justifies the third equality below. The other equalities
are clear from the definitions.
v-degree(w) + degree(S′) =
degree(Sw) +
degree(S′k)
degree(Sw(k)) + degree(S
degree(Sk)
j odd
degree(Sj,j+1)
j odd
degree(S
j ) + degree(S
= degree(S)
(3) We have:
w O-dominates S′ ⇔ w ≥ wC ∀ v-chain C in S
⇔ w dominates SC ∀ v-chain C in S
⇔ ∀ v-chain C in S′, ∀ α′ = (r, c) ∈ SC ,
∃ β = (R,C) ∈ Sw with C ≤ c, r ≤ R,
and depth
β ≥ depth
The first equivalence above follows from the definition of O-domination, the
second from [7, Lemma 4.5], the third from Lemma 9.1.4.
Now let C be a v-chain in S′ and α′ = (r, c) in SC . We will show that
there exists β in Sw that dominates α and satisfies depthSwβ ≥ depthSCα
Let α be the element in C such that α′ ∈ SC,α, let k be such that α ∈ S
k(up),
and let B be the block of Sk such that α ∈ B
′. Writing α = (r1, c1) and
w(B) = (R1, C1), we have C1 ≤ c1 and r1 ≤ R1 straight from the definition of
w(B). By Corollary 9.3.9, depth
w(B) = k.
First suppose that w(B) dominates α′ (meaning C1 ≤ c and r ≤ R1).
If k ≥ depth
α′, we are clearly done; by Corollary 9.4.2, this is the case
when α′ = qC,α. So suppose that α is of type H, α
′ = ph(α), and that k <
depth
α′. By Lemma 9.4.1, depth
α′ ≤ k(even). It follows that k is odd
and depth
α′ = k + 1. By Corollary 7.5.3, S is truly orthogonal at k, which
means that Sk+1 has a diagonal block, say C. Note that w(C) dominates ph(σk)
which in turn dominates ph(α). Since depthSww(C) = k+1 by Corollary 9.3.9,
we are done.
Now suppose that w(B) does not dominate α′. Then B is non-diagonal and
α′ = pv(α). Since B is non-diagonal, ph(α) 6∈ N, and α cannot be of type H.
So α is of type V in C. It follows easily (see Proposition 5.3.1 (3)) that α is
the critical element in C, and and that last element in its connected component
in C; by Lemma 6.3.1 (4), O-depthC(α) = depthSC qC,α =: d is even. By
Proposition 5.3.1 (1), (2), the cardinality of the connected component of α in
C is even.
The immediate predecessor γ of α in C is connected to α (this follows from
what has been said above). It is of type V in C, ph(γ) belongs to N, and
depth
pv(γ) = d − 1 (see Lemma 6.3.1 (1)). Let ℓ be such that γ ∈ S
ℓ(up).
Let C be the block of Sℓ such that γ ∈ C
′. Since ph(γ) ∈ N, C is diagonal.
Note that w(C) dominates pv(γ) and that pv(γ) > pv(α). By Corollary 9.3.9,
depth
w(C) = ℓ. Thus if d ≤ ℓ we are done. On the other hand, d− 1 ≤ ℓ by
Corollary 9.4.2.
So we may assume that ℓ = d− 1. By Corollary 7.5.3, S is truly orthogonal
at d − 1. This implies that Sd has a diagonal block, say D. Note that w(D)
dominates ph(σd−1) which in turn dominates ph(γ). Writing γ = (r2, c2), since
γ > α is connected, it follows that (r1, r
2) belongs to ON. Now both w(B) and
w(D) dominate (r1, r
2). Since Sw is distinguished and symmetric and w(B) is
not on the diagonal, it follows that w(D) > w(B). This implies, since w(D) is
on the diagonal, w(D) > pv(α). Since depthSww(D) = d by Corollary 9.3.9, we
are done.
(4) Let x be an element of I(d) that O-dominates S. We will show that
x ≥ w. By [7, Lemma 5.5], it is enough to show that x dominates Sw. By
Lemma 9.1.4, it is enough to show the following: for every block B of S, there
exists β in Sx such that β dominates w(B) and depthSxβ ≥ depthSww(B).
Let B be a block of S. By Corollary 9.3.9, depth
w(B) = k where B ⊆
Sk. Let S
x denote the set of elements of Sx of depth at least k. Our goal is
to show that there exists β in Skx that dominates w(B). It follows easily from
the distinguishedness of Sx and the fact that B is a block, that it suffices to
show the following: given α ∈ B, there exists β in Skx (depending upon α) that
dominates α. Moreover, since B and Skx are symmetric, we may assume that
α = α(up).
So now let α = α(up) belong to B. Then either
1. α belongs to S
2. k is odd, S is truly orthogonal at k, and α = pv(σk), or
3. k is even, S is truly orthogonal at k − 1, and α = ph(σk−1).
The proofs in the three cases are similar. In the first case, choose a v-chain C
in S with tail α such that O-depthC(α) = k (see Corollary 6.1.3 (1)). Then
depth
qC,α = k and, clearly, qC,α dominates α. Since x dominates SC , there
exists, by Lemma 4.5 of [7], β in Skx that dominates qC,α (and so also α).
In the second case, choose a v-chain C in S with tail σk with the property
that O-depthC(σk) = k. Then depthSC qC,σk = k. Since ph(σk) belongs to N,
σk is of type V or H in C, so qC,σk = α. Since x dominates SC , there exists, by
[7, Lemma 4.5], β in Skx that dominates qC,σk = α.
In the third case, choose a v-chain C in S with tail σk−1 such that the
O-depth in C of σk−1 is k − 1. Then depthSCqC,σk−1 = k − 1. Since ph(σk−1)
belongs to N, σk−1 is of type V or H in C, so qC,σk−1 = pv(σk−1). From
Lemma 6.3.1 (4), it follows, since k − 1 is odd, that σk−1 is of type H. Since
pv(σk−1) > ph(σk−1) = α, it follows that depthSCph(σk−1) ≥ k (in fact equality
holds as is easily seen). Since x dominates SC , there exists, by [7, Lemma 4.5],
β in Skx that dominates ph(σk−1) = α. �
10.2 Proof that OφOπ = identity
LetS be a monomial inON and let Oπ = (w,S′). We need to show thatOφ ap-
plied to the pair (w,S′) gets us back toS. We know from (3) of Proposition 4.1.1
that w O-dominates S′, so Oφ can indeed be applied to the pair (w,S′).
The main ingredients of the proof are the corresponding assertion in the
case of Grassmannian [7, Proposition 4.2] and the following claim which we will
presently prove:
(S′)w,j,j+1 = S
j(up) ∪S
j+1(up) for every odd integer j
Let us first see how the assertion follows assuming the truth of the claim, by
tracing the steps involved in applying Oφ to (w,S′). From the claim it follows
that when we partition S′ into pieces (see §8), we get S′j(up) ∪S
j+1(up) (for
odd integers j). Adding the mirror images will get us to S′j ∪ S
j+1. From
Corollary 9.3.9 it follows that wj,j+1 is exactly the element of I(d, 2d) obtained
by acting π on Sj ∪ Sj+1. Now, since φ ◦ π = identity, it follows that on
application of φ to (wj,j+1,S
j+1) we obtain Sj ∪Sj+1. By twisting the
two diagonal elements in Sj ∪ Sj+1 (if they exist at all) and removing the
elements below the diagonal d, we get back S
j,j+1. Taking the union of S
j,j+1
(over odd integers j), we get back S.
Thus we need only prove the claim. Since S′ is the union over all odd
integers of the right hand sides (this follows from the definition of S′), and the
left hand sides as j varies are mutually disjoint, it is enough to show that the
right hand side is contained in the left hand side. Thus we need only prove: for
j an odd integer and α an element in S′j(up) ∪S
j+1(up),
• every v-chain in S′ with head α is O-dominated by wj .
• there exists a v-chain in S′ with head α that is not O-dominated by wj+2.
To prove the first item, write T = Sj,j+1 := {α ∈ S|O-depth
(α) ≥ j}
and set Oπ(T) = (x,T′). By Proposition 6.3.6, we have T
i ∪ T
i+1 = S
i+j−1 ∪
i+j for any odd integer i. Thus, by the description of Oπ, we have T
∪k≥jS
k(up). By Corollary 9.3.9 and the description of Oπ, we have x = w
By Corollary 9.3.10, any v-chain inS′ with head belonging toS′j(up)∪S
j+1(up)
is contained entirely in ∪k≥jS
k(up). Finally, by Proposition 4.1.1 (3) applied
to T, the desired conclusion follows.
To prove the second item we use Lemma 9.4.3. Proceed by decreasing induc-
tion on j. For j sufficiently large the assertion is vacuous, for S′j(up)∪S
j+1(up)
is empty. To prove the induction step, assume that the assertion holds for j+2.
If the v-chain consisting of the single element α is not O-dominated by wj+2,
then we are done. So let us assume the contrary. Since the O-depth of the
singleton v-chain α is at most 2, it follows from Lemma 9.2.2 that wj+2,j+3 O-
dominates the v-chain α. Apply Lemma 9.4.3 with k = j+2. By its conclusion,
either there exists β ∈ S′j+2(up) such that α > β or there exists γ ∈ S
j+3(up)
such that ph(α) > γ.
First suppose that a γ as above exists. By induction, there exists a v-chain
in S′—call it D—with head γ that is not O-dominated by wj+4. Let C be the
concatenation of α > γ and D. Since elements of D haveO-depth at least 3 in C
(Lemma 6.3.1 (1)), it follows from Corollary 9.2.3 that C is not O-dominated
by wj+2, and we are done.
Now suppose that such a γ does not exist. Then a β as above exists. If
α > β is not O-dominated by wj+2 we are again done. So assume the contrary.
Since the O-depth of β in α > β is at least 2, it follows that there exists an
element of (Sw)j+3 that dominates β. Applying Lemma 9.4.3 again, this time
with k = j + 3, we find γ′ ∈ S′j+3(up) such that β > γ
′. Arguing as in the
previous paragraph with γ′ in place of γ, we are done. �
10.3 Proof that OπOφ = identity
Let T be a monomial in ON and w an element of I(d) that O-dominates T. We
can apply Oφ to the pair (w,T) to obtain a monomial T⋆w in ON. We need to
show that Oπ applied to T⋆w results in (w,T). The main step of the proof is to
establish the following:
T⋆w,j,j+1 = (T
j,j+1 (10.3.1)
(for the meaning of the left and right sides of the above equation, see §8 and
§7 respectively). Assuming this for the moment let us show that Oπ ◦ Oφ =
identity.
We trace the steps involved in applying Oπ to T⋆w. From Eq. (10.3.1) it
follows that when we break up T⋆w according to the O-depths of its elements
as in §7, we get T⋆w,j,j+1 (as j varies over odd integers). The next step in the
application of Oπ is the passage from (T⋆w)
j,j+1 to (T
w)j,j+1. This involves
replacing σj by its projections and adding the mirror image of the remaining
elements of (T⋆w)
j,j+1. It follows from Proposition 8.2.2 (3) that σj = δj and so
(T⋆w)j,j+1 = (Tw,j,j+1 ∪ T
w,j,j+1)
⋆. The next step is to apply π to (Tw,j,j+1 ∪
w,j,j+1)
⋆. Since π is the inverse of φ (as proved in [7]), we have π((Tw,j,j+1 ∪
w,j,j+1)
⋆) = (wj,j+1,Tw,j,j+1). Since Sw and T are respectively the unions, as
j varies over odd integers, of (Sw)j,j+1 and Tw,j,j+1, we see that Oφ applied to
T⋆w results in (w,T).
Thus it remains only to establish Eq. (10.3.1). It is enough to show that the
left hand side is contained in the right hand side, for the union over all odd j of
either side is T⋆w and the right hand side is moreover a disjoint union. In other
words, we need only show that the O-depth in T⋆w of an element of T
w,j,j+1
is either j or j + 1. We will show, more precisely, that, for any element β of
Sw, the O-depth in T
w of any element of OP
β equals the depth in Sw of β.
Lemma 9.4.4 will be used for this purpose.
Let α be an element of OP∗β and set e := O-depthT⋆w (α). We first show,
by induction on d := depth
β, that e ≥ d. There is nothing to prove in case
d = 1, so we proceed to the induction step. Let β′ be the element of Sw of depth
d − 1 such that β′ > β. If there exists α′ in OP⋆β′ with α
′ > α, the desired
conclusion follows from Corollary 6.1.3 (3) and induction. Lemma 9.4.4 says
that such an α′ exists in case d is even. So suppose that d is odd and such an
α′ does not exist. The same lemma now says that ph(δd−2) > α, so the desired
conclusion follows from Lemma 6.3.1 (1).
We now show, by induction on e, that d ≥ e. There is nothing to prove
in case e = 1, so we proceed to the induction step. Let C be a v-chain in T⋆w
with tail α and having the good property of Proposition 6.3.3. Let α′ be the
immediate predecessor in C of α. Let β′ in Sw be such that α
′ ∈ OP∗β′ (we are
not claiming at the moment that β′ is unique although that is true and follows
from the assertion that we are proving, the distinguishedness of Sw, and the
fact that β′ dominates α′). It follows from Corollary 9.4.6 that β′ > β.
Let d′ := depth
β′. It follows from Corollary 6.1.3 (3) that e′ < e where
e′ := O-depth
(α′). We have, d ≥ d′+1 ≥ e′+1 ≥ (e−2)+1 = e−1, the first
equality being justified because β′ > β, the second by the induction hypothesis,
and the last by Lemma 6.3.1 (1). It suffices to rule out the possibility that
d = e − 1. So assume d = e − 1. Then d = d′ + 1 and d′ = e′ = e − 2. It
follows from (1) of Lemma 6.3.1 that the v-chain α′ > α has O-depth 3 and
from (3) of the same lemma that e′ is odd. But then we get a contradiction to
Proposition 8.2.2 (2) (α′ and α belong to T⋆w,d′,d′+1). The proof of Eq. (10.3.1)
is thus over. �
10.4 Proof of Proposition 4.1.3
Observe that the condition (‡) makes sense also for a monomial of N. By virtue
of belonging to I(d), v has f∗ as an entry. It follows from the description of
the bijection w ↔ Sw of §5.1.2 that for an element w of I(d) to satisfy (‡) it is
necessary and sufficient that Sw (equivalently all its parts Sw,j,j+1) satisfy (‡).
(1) Since T satisfies (‡), so do its parts Tw,j,j+1 and Tw,j,j+1 ∪ T
w,j,j+1
(adding the mirror image preserves (‡)). Since Sw,j,j+1 also satisfies (‡), it
follows from the description of the map φ of [7] (observe the passage from a
piece P to its “star” P∗) that the (Tw,j,j+1 ∪ T
w,j,j+1)
⋆ satisfy (‡). Since
the “twisting” involved in the passage from (Tw,j,j+1 ∪ T
w,j,j+1)
⋆ to T⋆w,j,j+1
involves only a rearrangement of row and column indices, it follows that the
T⋆w,j,j+1 satisfy (‡). Finally so also does their union T
(2) The parts S
j,j+1 of S clearly satisfy (‡). Therefore so do the Sj,j+1, for,
first of all, adding the mirror image preserves (‡), and then the removal of σj
and addition of its projections involves only a rearrangement of row and column
indices. It follows from description of the map π of [7] (observe the passage from
a block B to the pair (w(B),B′)) that both Sw,j,j+1 and S
j,j+1 satisfy (‡).
Finally, Sw and S
′ being the union (respectively) of Sw,j,j+1 and S
j,j+1(up),
they satisfy (‡). �
Part IV
An Application
As an application of the main theorem (Theorem 2.3.1), an interpretation of
the multiplicity is presented.
11 Multiplicity counts certain paths
Fix elements v, w in I(d) with v ≤ w. It follows from Corollary 2.3.2 that the
multiplicity of the Schubert variety X(w) in Md(V ) at the point e
v can be in-
terpreted as the cardinality of a certain set of non-intersecting lattice paths. We
first illustrate this by means of two examples and then justify the interpreta-
tion.
11.1 Description and illustration
The points of N can be represented, in a natural way, as the lattice points of
a grid. The column indices of the points of the grid are the entries of v and
the row indices are the entries of {1, . . . , 2d} \ v. In Figure 11.1.1 the points
of ON and those of the diagonal in N are shown (for the specific choice of v in
Example 11.1.1). The open circles represent the points of Sw(up), where Sw
is the distinguished monomial in N that is associated to w as in §5.1.2. From
each point β of Sw(up) we draw a vertical line upwards from β and let β(start)
denote the top most point of ON on this line. In case β is not on the diagonal,
draw also a horizontal line rightwards from β and let β(finish) denote the right
most point of ON on this line. In case β is on the diagonal, then β(finish) is
not a fixed point but varies subject to the following constraints:
• β(finish) is one step away from the diagonal (that is, it is of the form (r, c),
for some entry c of v, where r is the largest integer less than c∗ that is not
an entry of v);
• the column index of β(finish) is not less than that of β;
• if depth
β is odd, then the horizontal projection of β(finish) is the same
as the vertical projection of γ(finish) where γ is the diagonal element of
Sw of depth 1 more than that of β.
With v and w as in Example 11.1.1, we have β(start) = (6, 3) and β(finish) =
(9, 5) for β = (9, 3); β(start) = β(finish) = (21, 20) for β = (21, 20); β(start) =
(15, 11) for the diagonal element β = (36, 11); β(start) = (6, 1) for the diagonal
element β = (46, 1). In the particular case (of non-intersecting lattice paths)
drawn in Figure 11.1.1, β(finish) = (27, 19) for β = (36, 11) and β(finish) =
(28, 14) for β = (46, 1).
A lattice path between a pair of such points β(start) and β(finish) is a se-
quence α1, . . . , αq of elements of ON with α1 = β(start) and αq = β(finish)
such that, for 1 ≤ j ≤ q − 1, if we let αj = (r, c), then αj+1 is either (R, c) or
(r, C) where R is the least element of {1, . . . , 2d} \ v that is bigger than r and
C the least element of v that is bigger than c. Note that if β(start) = (r, c) and
β(finish) = (R,C), then q equals
|({1, . . . , 2d} \ v) ∩ {r, r + 1, . . . , R}|+ |v ∩ {c, c+ 1, . . . , C}| − 1,
where | · | is used to denote cardinality.
Consider the set Pathsw of all tuples (Λβ)β∈Sw(up) of paths where
• Λβ is a lattice path between β(start) and β(finish) (if β is on the diagonal,
then β(finish) is allowed to vary in the manner described above);
• Λβ and Λγ do not intersect for β 6= γ.
The number of such p-tuples, where p := |Sw(up)|, is the multiplicity of X(w)
at the point ev.
Example 11.1.1 Let d = 23,
v = (1, 2, 3, 4, 5, 11, 12, 13, 14, 19, 20, 22, 23, 26, 29, 30, 31, 32, 37, 38, 39, 40, 41),
w = (4, 5, 9, 10, 14, 17, 18, 21, 23, 25, 27, 28, 31, 32, 34, 35, 36, 39, 40, 41, 44, 45, 46),
so that
Sw ={(9, 3), (10, 2), (17, 13), (18, 12), (21, 20), (25, 22), (27, 26),
(28, 19), (34, 30), (35, 29), (36, 11), (44, 38), (45, 37), (46, 1)}
and Sw(up) =
{(9, 3), (10, 2), (17, 13), (18, 12), (21, 20), (25, 22), (28, 19), (36, 11), (46, 1)}.
A particular element of Pathsw is depicted in Figure 11.1.1. �
Example 11.1.2 Figure 11.1.2 shows all the elements of Pathsw in the follow-
ing simple case:
d = 7, v = (1, 2, 3, 4, 7, 9, 10), and w = (4, 6, 7, 10, 12, 13, 14).
We have Sw = {(6, 3), (12, 9), (13, 2), (14, 1)}, Sw(up) = {(6, 3), (13, 2)(14, 1)}.
There are 15 elements in Pathsw and thus the multiplicity in this case is 15. �
Example 11.1.3 Let d = 10,
v = (1, 2, 3, 4, 6, 8, 11, 12, 14, 16), and w = (8, 9, 11, 14, 15, 16, 17, 18, 19, 20).
so that Sw = {(20, 1)(19, 2)(18, 3), (17, 4), (9, 6)(15, 12)}. Figure 11.1.3 shows
a tuple of paths that is disallowed (meaning one that is not in Pathsw). The
elements of ON are represented as usual by a grid. The slanted line represents
1 2 3 4 5
11 12 13 14
19 20
22 23
Figure 11.1.1: An element of Pathsw with v and w as in Example 11.1.1
the diagonal d. The solid dot represents the point of Sw(up) that is not on d,
and the crosses on d represent the points of Sw(up) that lie on d. The tuple is
disallowed because the horizontal projection of the last point of the path Λβ1
is not the vertical projection of the last point of the path Λβ2 , where β1 =
(20, 1) and β2 = (19, 2) are the diagonal elements of Sw of depths 1 and 2
respectively. �
11.2 Justification for the interpretation
We now justify the interpretation in the previous subsection of the multiplicity.
Corollary 2.3.2 says that the multiplicity is the number of monomials in OR
of maximal cardinality that are square-free and O-dominated by w. Any such
monomial contains OR \ON, for, by the definition of O-domination, adding or
removing elements of OR\ON to or from a monomial does not alter the status
of its O-domination. One could therefore equally well consider the number of
monomials in ON of maximal cardinality that are square-free and O-dominated
✲ ✲ ✲ ✲ ✲
❄ ❄ ❄ ❄ ❄
❄ ❄ ❄ ❄ ❄
❄ ❄ ❄
Figure 11.1.2: All the 15 non-intersecting lattice paths of Example 11.1.2
diagonal
1        2        3        4        6        8        11      12      14      16     
The horizontal projection of the last point of the path associated to (20,1) is
not the same as the vertical projection of the last point of the path associated
to (19,2).    This is the reason that this tuple is disallowed.
Figure 11.1.3: A disallowed tuple of lattice paths (see Example 11.1.3)
by w. We now establish a bijection between the set Monw of such monomials
and the set Pathsw of non-intersecting lattice paths as in §11.1.
Each element Λ of Pathsw can be thought of, in the obvious way, as a
monomial in ON. We will continue to denote the corresponding monomial by Λ.
It is clear that the monomial Λ is square-free and that all such monomials Λ
have the same cardinality (in particular, that if Λ1 ⊆ Λ2 for two such monomials
then Λ1 = Λ2). In order to establish the bijection it therefore suffices to prove
the following proposition.
Proposition 11.2.1 1. w is the element of I(d) obtained on application of
Oπ to the monomial Λ (in particular (see Proposition 4.1.1), the monomial
Λ is O-dominated by w).
2. Given a monomial T of ON that is square-free and O-dominated by w,
there exists Λ such that T ⊆ Λ.
Proof: (1) Write Λ = (Λβ)β∈Sw(up). From the description of the map Oπ
in §7, it follows that it suffices to show that Λ
(in the notation of §7) is the
union ∪Λβ where β runs over all elements of depth k in Sw(up). In other words,
it suffices to show that the O-depth in Λ of any element of Λβ equals the depth
in Sw of β. To prove this, we observe the following (these assertions are easily
seen to be true thinking in terms of pictures): for fixed β ∈ Sw(up) and α ∈ Λβ ,
(A) For β′ in Sw(up) such that β
′ > β, there exists α′ ∈ Λβ′ such that α
′ > α.
(B) If α′ > α for some α′ in Λβ′ for some β
′ in Sw(up), then β
′ > β. If,
furthermore, β and β′ are diagonal, their depths in Sw are 1 apart, and
the depth in Sw of β is even, then the following is not possible: ph(α
belongs to N and ph(α
′) > α.
From (A) it is immediate that the O-depth e in Λ of an element α of Λβ is not
less than the depth d in Sw of β. We now show, by induction on e, that e ≤ d.
For e = 1 there is nothing to show. Suppose that e ≥ 2. Let C be a v-chain
in Λ having tail α and the good property of Proposition 6.3.3, α′ the immediate
predecessor in C of α, e′ the O-depth of α′ in Λ, β′ the element of Sw(up)
such that α′ ∈ Λβ′ , and d
′ the depth in Sw of β
′. From Corollary 6.1.3 (3) it
follows that e′ ≤ e − 1, so we may apply induction. From (B) it follows that
d′ ≤ d − 1, so that, by induction, e′ ≤ d − 1. If e′ ≤ d − 2, then we are done
by Lemma 6.3.1 (1). So suppose that e′ = d′ = d − 1. If d is odd, then the
conclusion e ≤ d follows from (1) and (3) of the same lemma. In case d is even,
then it follows from condition (B) and (1) of the same lemma.
(2) Let T be a square-free monomial in ON that is O-dominated by w. To
construct Λ such that T ⊆ Λ, we construct the “components” Λβ . As in §8, let
Pβ denote the piece of T corresponding to β ∈ Sw. From every point belonging
to Pβ(up) and also from β(start) carve out the South-West quadrant; if β is
not diagonal, then do this also from β(finish). The boundary of the carved out
portion (intersected with ON) gives a lattice path starting from β(start). In
case β is not diagonal, the path ends in β(finish). In this case as well as in
the case when β is diagonal and of even depth in Sw, we take Λβ to be this
lattice path. In case β is diagonal and of odd depth in Sw we do the carving
out from one more point before taking Λβ to be the boundary of the carved
out region, namely from the point that is one step away from the diagonal and
whose horizontal projection is the vertical projection of the end point of Λγ
where γ is the diagonal element of Sw of depth 1 more than β. We need to
justify why carving out from the extra point is still valid, and we do this now
by applying Lemma 8.1.4.
Let us first choose notation that is consistent with that of that lemma. Let
β and γ be diagonal elements in Sw of depths d and d + 1. Assume that d is
odd. Let the pieces of T corresponding to β and γ, when their elements are
arranged in increasing order of row and column indices, look like this:
. . . , (r1, a
∗), (a, r∗1), . . . ; . . . , (r2, b
∗), (b, r∗2), . . .
It is easy to see that the conditions on the numbers in the above display that pro-
vide the requisite justification are: r1 ≤ b and a
∗ < b∗ (if Pβ is empty then the
justification is easy). To prove that a∗ < b∗, observe that the diagonal elements
in P∗β and P
γ are respectively (a, a
∗) and (b, b∗), and apply Lemma 8.1.3 (2).
That r1 ≤ b now follows from Lemma 8.1.4 (1). This finishes the justification.
It suffices to prove the following claim: the lattice paths Λβ as β varies are
non-intersecting. Suppose that Λβ and Λβ′ intersect for β 6= β
′. Let α be a point
of intersection. Clearly β dominates all elements of Λβ and in particular α; for
the same reason β′ also dominates α. By the distinguishedness of Sw, we may
assume without loss of generality that β′ > β. It is easy to see graphically that
if γ in Sw is such that β
′ > γ > β then Λγ intersects either Λβ′ or Λβ : consider
the open portion of ON “caught between” the segment of Λβ′ from β
′(start) to
α and the segment of Λβ from β(start) to α; the starting point γ(start) of Λγ
lives in this region but its ending point does not (points strictly to the Northwest
of α can neither be of the form γ(finish) for γ not on the diagonal nor can they
be one step away from the diagonal); so Λγ must intersect one of the two lattice
path segments. We may therefore assume that the depths of β′ and β differ
by 1.
We now apply Lemma 9.4.4. From the construction of Λβ it readily fol-
lows that α satisfies the hypotheses (a), (b), and (c) of that lemma. By the
conclusion of Lemma 9.4.4, there exists α′ ∈ P∗β′(up) such that α
′ > α. On
the other hand, it follows from the construction of P∗β′ from Pβ′ , and from the
construction of Λβ′ that two elements one from P
β′ and another from Λβ′ are
not comparable. This is a contradiction to the comparability of α′ and α. �
References
[1] M. Brion and P. Polo, Generic singularities of certain Schubert varieties ,
Math. Z., 231, no. 2, 1999, pp. 301–324.
[2] E. De Negri, Some results on Hilbert series and a-invariant of Pfaffian
ideals , Math. J. Toyama Univ., 24, 2001, pp. 93–106.
[3] S. R. Ghorpade and C. Krattenthaler, The Hilbert series of Pfaffian rings ,
in: Algebra, arithmetic and geometry with applications (West Lafayette,
IN, 2000), Springer, Berlin, 2004, pp. 337–356.
[4] S. R. Ghorpade and K. N. Raghavan, Hilbert functions of points on Schubert
varieties in the Symplectic Grassmannian, Trans. Amer. Math. Soc., 358,
2006, pp. 5401–5423.
[5] J. Herzog and N. V. Trung, Gröbner bases and multiplicity of determinantal
and Pfaffian ideals , Adv. Math., 96, no. 1, 1992, pp. 1–37.
[6] T. Ikeda and H. Naruse, Excited Young diagrams and equivariant Schubert
calculus , arXiv:math/0703637.
[7] V. Kodiyalam and K. N. Raghavan, Hilbert functions of points on Schubert
varieties in Grassmannians , J. Algebra, 270, no. 1, 2003, pp. 28–54.
[8] C. Krattenthaler, On multiplicities of points on Schubert varieties in Grass-
mannians. II , J. Algebraic Combin., 22, no. 3, 2005, pp. 273–288.
[9] V. Kreiman, Monomial bases and applications for Schubert and Richardson
varieties in ordinary and affine Grassmannians , Ph. D. Thesis, Northeast-
ern University, 2003.
[10] V. Kreiman, Local Properties of Richardson Varieties in the Grassmannian
via a Bounded Robinson-Schensted-Knuth Correspondence, preprint, 2005,
URL arXiv:math.AG/0511695.
[11] V. Kreiman and V. Lakshmibai, Multiplicities of singular points in Schu-
bert varieties of Grassmannians , in: Algebra, arithmetic and geometry with
applications (West Lafayette, IN, 2000), Springer, Berlin, 2004, pp. 553–
[12] V. Kreiman and V. Lakshmibai, Richardson varieties in the Grassmannian,
in: Contributions to automorphic forms, geometry, and number theory,
Johns Hopkins Univ. Press, Baltimore, MD, 2004, pp. 573–597.
[13] V. Lakshmibai and C. S. Seshadri, Geometry of G/P . V , J. Algebra, 100,
no. 2, 1986, pp. 462–557.
[14] V. Lakshmibai and J. Weyman, Multiplicities of points on a Schubert va-
riety in a minuscule G/P , Adv. Math., 84, no. 2, 1990, pp. 179–208.
[15] P. Littelmann, Contracting modules and standard monomial theory for sym-
metrizable Kac-Moody algebras , J. Amer. Math. Soc., 11, no. 3, 1998,
pp. 551–567.
[16] K. N. Raghavan and S. Upadhyay, Initial ideals of tangent cones to Schubert
varieties in orthogonal Grassmannians . In preparation.
[17] C. S. Seshadri, Geometry of G/P . I. Theory of standard monomials for
minuscule representations , in: C. P. Ramanujam—a tribute, vol. 8 of Tata
Inst. Fund. Res. Studies in Math., Springer, Berlin, 1978, pp. 207–239.
Index
>, relation on ON . . . . . . . . . . . . . . . . . 11
≤, partial order on I(d, 2d) . . . . . . . . 10
A, affine patch qv 6= 0 of Md(V ) . . . 14
α(down), for α ∈ N . . . . . . . . . . . . . . . . 25
α# for α in N . . . . . . . . . . . . . . . . . . . . . 22
α(up), for α ∈ N . . . . . . . . . . . . . . . . . . .25
anti-domination . . . . . . . . . . . . . . . . . . . 18
B, a specific Borel subgroup . . . . . . . . 8
β(finish), for β ∈ Sw(up) . . . . . . . . . . 65
β(start), for β ∈ Sw(up) . . . . . . . . . . . 65
block
in the sense of [7] . . . . . . . . . . . . .47
of a monomial S in ON . . . . . . . 51
comparability, of elements of R . . . . 29
connected components of a v-chain.23
connectedness of two succcessive ele-
ments in a v-chain . . . . . . . . 22
critical element (of a v-chain) . . . . . . 24
d, integral part of n/2 (unfortunately
also used otherwise) . . . . . 7, 9
degree, of a monomial . . . . . . . . . . . . . 10
degree, of a standard monomial . . . . 16
δj , for j odd . . . . . . . . . . . . . . . . . . . . . . . 44
depth (of an element α in a monomial
S in N) = depth
α . . . . . . 46
depth (of a monomial S in N) . . . . . 46
diagonal, d . . . . . . . . . . . . . . . . . . . . . 10, 11
distinguished (a subset of N) . . . . . . 21
domination (among elements of R) 29
domination map . . . . . . . . . . . . . . . . . . . 18
e1, . . . , en, a specific basis of V . . . . . . 8
ev, T -fixed point . . . . . . . . . . . . . . . . . . . 10
〈 , 〉, bilinear form on V . . . . . . . . . . . . 7
fθ := qθ/qv . . . . . . . . . . . . . . . . . . . . . . . . 14
head, of a v-chain. . . . . . . . . . . . . . . . . .11
horizontal projection ph(α) . . . . . . . . 22
I(d) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
I(d, 2d) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
i(even), for an integer i . . . . . . . . . . . . 29
i(odd), for an integer i . . . . . . . . . . . . . 29
intersection (of a monomial in a set
with a subset) . . . . . . . . . . . . 11
isotropic subspace . . . . . . . . . . . . . . . . . . 7
In . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
I ′n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
k, base field, (characteristic 6= 2) .7, 15
k∗(:= n+ 1− k) . . . . . . . . . . . . . . . . 7, 10
L, line bundle . . . . . . . . . . . . . . . . . . . . . 13
Λβ , for β ∈ Sw(up) . . . . . . . . . . . . . . . . 66
lattice path, from β(start) to β(finish),
denoted Λβ . . . . . . . . . . . . . . . 65
legs of α, for α ∈ ON . . . . . . . . . . . . . . 22
legs, intertwining of . . . . . . . . . . . . . . . .22
length, of a v-chain . . . . . . . . . . . . . . . . 11
Md(V ), orthogonal Grassmannian 7, 8
Md(V )
′ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
monomial. . . . . . . . . . . . . . . . . . . . . . . . . .10
w . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
multiplicity, of X(w) at ev . . . . . . . . . 65
multiset := monomial . . . . . . . . . . . . . . 10
N . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
n := dimV , (even from §2.1 on) . . 7, 9
O(V ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
O-depth . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
ON . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10
OP∗β . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Oφ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17, 42
Oπ . . . . . . . . . . . . . . . . . . . . . . . . . 17, 34, 35
OR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10
O-domination . . . . . . . . . . . . . . . . . . . . . 11
orthogonal Grassmannian (Md(V )) . 7
Paths
w . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Pβ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
P∗β . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Pfaffian qθ . . . . . . . . . . . . . . . . . . . . . . . . . 14
ph(α), horizontal projection. . . . . . . .22
piece of T (see also caution) . . . . . . . 43
pθ, Plücker coordinate . . . . . . . . . . . . . 13
pv(α), vertical projection . . . . . . . . . . 22
qC,α, for α in a v-chain C . . . . . . . . . . 24
qθ, Pfaffian . . . . . . . . . . . . . . . . . . . . . . . . 14
R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
S, fixed monomial in ON in §7, §9.3 .
S, set of monomials in OR . . . . . . . . 18
modifications . . see Notation 4.2.1
SC , where C is a v-chain . . . . . . . . . . 23
SC,α, for α in a v-chain C . . . . . . . . . 24
Schubert varieties . . . . . . . . . . . . . . . . 7, 8
S(down), for a monomial S . . . . . . . 25
S#, for monomial S in N or R . . . . 22
σk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Sj,j+1, for S in ON, j odd . . . . . . . . 34
Sj,j+1(ext), for S in ON, j odd . . . 38
j,j+1, for S in ON, j odd . . . . . . . . 34
Sk, for monomial S in N . . . . . . . . . . 46
Sk, for monomial S in ON . . . . . . . . 34
Sk(ext), for monomial S in ON . . . 39
Sk,k+1, for monomial S in N . . . . . . 49
, for monomial S in ON . . . . . . . 34
SM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
modifications . . see Notation 4.2.1
SMv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
SMw . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
SO(V ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
S′, for monomial S in ON . . . . . . . . 35
standard monomial . . . . . . . . . . . . . . . . 14
v-compatible . . . . . . . . . . . . . . . . . . 15
w-dominated . . . . . . . . . . . . . . . . . . 14
S(up), for a monomial S . . . . . . . . . . 25
Sj,j+1, for monomial S in ON . . . . 32
Sk, for monomial S in N . . . . . . . . . . 46
Sw(v)(m) . . . . . . . . . . . . . . . . . . . . . . . . . .11
Sw, w in I(d, 2d) or I(d, n) . . . . 21, 48
Sw,j,j+1, w in I(d), j odd . . . . . . . . . 42
Sjw, w in I(d), j odd . . . . . . . . . . . . . . 42
symmetric (monomial of N). . . . . . . .22
T , a specific maximal torus . . . . . . . . . 8
T , set of monomials in ON . . . . . . . . 18
modifications . . see Notation 4.2.1
tail, of a v-chain . . . . . . . . . . . . . . . . . . . 11
T, fixed monomial in ON in §8 . . . . . . .
truly orthogonal at j (j odd) . . . . . . 34
Tw,j,j+1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
(Tw,j,j+1 ∪ T
w,j,j+1)
⋆ . . . . . . . . . . . . . . 43
T⋆w,j,j+1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
T⋆w(:= Oφ(w,T)) . . . . . . . . . . . . . . . . . . 44
type (V, H, S), of an element in a v-
chain. . . . . . . . . . . . . . . . . .23–24
U , set of monomials in OR \ON . . 18
modifications . . see Notation 4.2.1
u∗, for u ∈ I(d) . . . . . . . . . . . . . . . . . . . . 19
V , vector space of dimension n . . . . . .7
v, fixed element of I(d) . . . . . . . . . . . . 10
v-chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
v-degree . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
vertical projection pv(α) . . . . . . . . . . . 22
w(C) (or wC), where C is v-chain. .11
w#, for w an element of I(d, 2d) . . . 21
w(k) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .34
(w,S′)(:= Oπ(S)), for S in ON . . 34,
w∗, for w an element of I(d, 2d) . . . .21
wj,j+1, j odd . . . . . . . . . . . . . . . . . . . . . . 42
wj , j odd . . . . . . . . . . . . . . . . . . . . . . . . . . 42
X(w), Schubert variety . . . . . . . . . . . . 10
xk, x
k, xk,k+1, for x ∈ I(d, n). . .48–49
Xr,c, variable . . . . . . . . . . . . . . . . . . . . . . 16
Y (w)(:= X(w) ∩ A). . . . . . . . . . . . . . . .15
ABSTRACT
  A solution is given to the following problem: how to compute the
multiplicity, or more generally the Hilbert function, at a point on a Schubert
variety in an orthogonal Grassmannian. Standard monomial theory is applied to
translate the problem from geometry to combinatorics. The solution of the
resulting combinatorial problem forms the bulk of the paper. This approach has
been followed earlier to solve the same problem for the Grassmannian and the
symplectic Grassmannian.
  As an application, we present an interpretation of the multiplicity as the
number of non-intersecting lattice paths of a certain kind.
  Taking the Schubert variety to be of a special kind and the point to be the
"identity coset," our problem specializes to a problem about Pfaffian ideals
treatments of which by different methods exist in the literature. Also
available in the literature is a geometric solution when the point is a
"generic singularity."

<|endoftext|><|startoftext|>
Introduction
The hard X–ray transient IGR J11215–5952 was discovered
with the INTEGRAL satellite during an outburst in April 2005
(Lubinski et al., 2005) and was associated with HD 306414
(Negueruela et al., 2005), a B1Ia supergiant located at a dis-
tance of 6.2 kpc (Masetti et al., 2006). The short duration of the
outburst together with the likely optical counterpart suggested
that IGR J11215–5952 could be a new member of the class
of Supergiant Fast X-ray Transients (SFXTs; Negueruela et al.
2006). Analysing archival INTEGRAL observations of the
source field, Sidoli, Paizis, & Mereghetti (2006, hereafter Paper
I) discovered two previously unnoticed outbursts (in July 2003
and in May 2004) which demonstrate the recurrent nature of this
transient and suggest a possible periodicity of ∼330 days. This
periodicity was confirmed by the detection of the fourth outburst
from IGR J11215–5952 with RossiXTE/PCA on 2006 March
16–17, 329 days after the third outburst (Smith et al., 2006b).
The RXTE/PCA observations showed strong flux variability and
a hard spectrum (power-law photon index of 1.7 ± 0.2 in the
range 2.5–15 keV) as well as a possible pulse period of ∼195 s
(Smith et al., 2006a). The periodicity was confirmed with RXTE
observations of the latest outburst, yielding P = 186.78 ± 0.3 s
(Swank et al., 2007). Follow-up observations with Swift/XRT
Send offprint requests to: P. Romano, patrizia.romano@brera.inaf.it
refined the source position and confirmed the association with
HD 306414 (Steeghs et al., 2006). A hard power-law with a high
energy cut-off around 15 keV is a good fit to the spectra ob-
served with INTEGRAL (Paper I). For the distance of 6.2 kpc,
the peak fluxes of the outbursts correspond to a luminosity of
∼ 3×1036 erg s−1 (5–100 keV). All these findings confirmed
IGR J11215–5952 as a member of the class of the SFXTs, and
the first object of this class of High Mass X–ray Binaries dis-
playing periodic outbursts.
Predicting a fifth outburst for 2007 Feb 9, we obtained a
Target of Opportunity (ToO) observing campaign with Swift,
which commenced on Feb 4. The source started showing re-
newed activity on Feb 8 (Romano et al., 2007) and under-
went a powerful outburst on Feb 9 (Mangano et al., 2007a,b;
Sidoli et al., 2007; Swank et al., 2007). This paper presents our
observations of IGR J11215–5952 and it is organized as follows.
In Sect. 2 we describe our observations and data reduction; in
Sect. 3 we describe our spatial, timing and spectral data anal-
ysis. Finally, in Sect. 4 we discuss our findings and draw our
conclusions.
2. Observations and Data Reduction
Table 1 reports the log of the Swift/XRT observations used for
this work. Thanks to Swift’s fast-slewing and flexible observing
http://arxiv.org/abs/0704.0543v1
2 P. Romano et al.: The fifth outburst of IGR J11215–5952 observed by Swift
140 145 150 155
MJD (−54,000)
(a) 1−10 keV
Feb  9 14 19 24
0 2×104 4×104 6×104 8×104
Time (s since 2007−02−09 00:03:05 UT)
1 2 3
(b) 1−10 keV
1−4 keV(c) Flare 1
te 4−10 keV
5200 5400 5600 5800 6000
Time (s)
4−10/1−4
1−4 keV (d) Flare 2
3 4−10 keV
1.05×104 1.1×104 1.15×104
Time (s)
4−10/1−4
Fig. 1. XRT light curves, cor-
rected for pile-up, PSF losses,
vignetting and background-
subtracted. a) 1–10 keV light
curve for the whole campaign.
Different colours denote dif-
ferent observations (Table 1),
and points before Feb 6 (MJD
54,137) and after Feb 15 (MJD
54,146) are drawn from the
sum of several observations.
Filled circles are full detections
(S/N>3), triangles marginal
detections (2 <S/N< 3), while
downward-pointing arrows are
3-σ upper limits. The vertical
lines mark our time selections
for spectroscopic analysis
(observation 6, observations
7–8, observations 9–12, end
of observation 18). The hor-
izontal line marks the region
shown enlarged in panel b).
b) Detail of observation 6,
with a binning that allowed to
achieve a S/N in excess of 6.
The numbers mark the 5 flares
on which we performed spec-
troscopic analysis. c) Detail of
flare 1, showing the 1–4 keV,
4–10 keV count rates (top and
middle panel) and the hardness
ratio 4–10/1–4 (bottom). The
data were rebinned in order
to have at least 20 counts per
bin in both the 1–4 keV and
4–10 keV band. d) Same as c),
for flare 2.
scheduling, the ToO observations started on 2007 Feb 4 with
2 ks per day evenly spread throughout the day to maximize the
chances of detection of the outburst onset, and were increased
to 5 ks afterwards for a total of 23 days and a total on-source
exposure of ∼ 73 ks. We also retrieved from the Swift Archive
the data from a 643 s ToO performed on 2006 Mar 20 during the
fourth outburst of this source (Steeghs et al., 2006).
The XRT data were processed with standard procedures
(xrtpipeline v0.10.6), filtering and screening criteria by us-
ing FTOOLS in the Heasoft package (v.6.1.2). Given the low
rate of the source during the whole campaign, we only consid-
ered photon counting data (PC) and further selected XRT grades
0–12 (Burrows et al. 2005). With the exception of observation
6 (2006 Feb 9) the data show an average count rate of < 0.5
counts s−1 and no pile-up correction was necessary. We there-
fore extracted the source events from a circular region with a
radius of 11 pixels (1 pixel ∼ 2.37′′). During observation 6, pile-
up correction was required and we adopted an annular source
extraction region with radii 4 and 30 pixels. To account for the
background, we also extracted events within an annular region
centered on the source, and with radii 40 and 100 pixels, free
from background sources. Ancillary response files were gen-
erated with xrtmkarf, and account for different extraction re-
gions, vignetting and PSF corrections. We used the latest spec-
tral redistribution matrices (v008) in the Calibration Database
maintained by HEASARC. For timing analysis, the arrival times
of XRT events were converted to the Solar System barycentre
with the task barycorr and source events were extracted from
the circular region with 30 pixels radius to maximize statistics.
BAT always observed IGR J11215–5952 simultaneously
with XRT, but only survey data products, in the form of Detector
Plane Histograms (DPH) with typical integration time of ∼
300 s, are available. The BAT data were analysed using the stan-
dard BAT analysis software distributed within FTOOLS. DPH
data were calibrated with the task baterebin using the proper
BAT gain/offset files from the housekeeping data directory, and
sky images of each observation were extracted in the 15–25,
25–50, 50–100, 100–150 and 15–150 keV energy bands. The
batcelldetect task never detected the source above a signal-
to-noise ratio (S/N) threshold of 4. This is consistent with the ex-
trapolation at the high energies of the XRT data fit (Sect. 3) with
an absorbed power-law with exponential cutoff, with e-folding
energy of 15 ± 2 keV drawn from the RXTE fit (Swank et al.,
2007).
Throughout this paper the uncertainties are given at 90%
confidence level for one interesting parameter (i.e., ∆χ2 = 2.71)
unless otherwise stated. The spectral indices are parameterized
as Fν ∝ ν
−α, where Fν (erg cm
−2 s−1 Hz−1) is the flux density as
a function of frequency ν; we also use Γ = α + 1 as the photon
index, N(E) ∝ E−Γ (ph cm−2 s−1 keV−1).
P. Romano et al.: The fifth outburst of IGR J11215–5952 observed by Swift 3
0 0.5 1 1.5
Phase
Fig. 2. Folded 0.2–10 keV light curve of the combined obser-
vations 6 though 11, using a period of 186.78 s (Swank et al.,
2007).
3. Analysis and Results
A refined position was obtained by summing all data taken in
2007 with the exclusion of observation 6 (affected by pile-up) at
RA(J2000) = 11h21m46.s90, Dec(J2000) = −59◦ 51′46.′′9, with
an error, drawn from the cross-correlation with the USNO-B1.0
catalogue, of 1.′′1 (90% confidence). This position is 1.′′2 from
the optical counterpart HD 306414.
We extracted light curves in the 1–10 keV (total), 1–4 keV
(soft) and 4–10 keV (hard) bands. The 0.2–1 keV band was not
used in our analysis because, given the high absorbing column
density, its signal was significantly lower than the one of the
other bands. The light curves were corrected for Point-Spread
Function (PSF) losses, due to the extraction region geometry,
bad/hot pixels and columns falling within this region, and for
vignetting, by using the task xrtlccorr (v0.1.9), which gen-
erates an orbit-by-orbit correction based on the instrument map.
We then subtracted the scaled background rate in each band from
their respective source light curves and calculated the 4–10/1–4
hardness ratio. The IGR J11215–5952 light curve (Fig. 1) shows
an increase in count rate by a factor of ∼ 10 in less than 1.5
hours, and of a factor of ∼ 65 in 17 hours on 2007 Feb 9.
However, no significant variation in the hardness ratio can be ev-
idenced (panels c,d of Fig. 1). Indeed, fitting the hardness ratio
as a function of time (or as a function of count rate) to a constant
model yields a value of 0.49±0.03 and χ2 = 1.13 for 80 degrees
of freedom, d.o.f.
We folded the data at the period of 186.78 s reported
by Swank et al. (2007) based on Feb 9 01:20–03:20 UT
RXTE/PCA observations and obtained the 0.2–10 keV light
curve shown in Fig. 2.
Upon examination of the light curve presented in Fig. 1 and
the available counting statistics, we selected different time bins
over which we accumulated spectra. These include i) the qui-
escent phase before the 2007 Feb 9 outburst, ii) the Feb 9 out-
burst (observation 6) and iii) the tail phase of the outburst (ob-
servations 7–8, 9–12, 7–12, 7–18). We further selected 5 flaring
episodes from observation 6 (see Fig. 1b). A comparison with
the data collected during the tail of the 2006 outburst was also
performed. The data were rebinned with a minimum of 20 counts
per energy bin to allow χ2 fitting. However, for the 2006 observa-
tion performed (28 counts), before the onset of the outburst (41
0.5 1 1.5 2 2.5 3
NH (10
22 cm−2)
0.5 1 1.5 2 2.5 3
NH (10
22 cm−2)
Fig. 3. XRT time-selected spectroscopy. The ∆χ2 =
2.3, 4.61, 9.21 contour levels for the column density in units of
1022 cm−2 vs. the photon index, with best-fit values indicated by
crosses. Orange contours are from observation 6, while the blue
ones are from the combined observations 7 though 18.
counts), for the late flares, and the late observations, the Cash
statistic (Cash, 1979) and spectrally unbinned data were used.
The spectra were all fit with XSPEC (v11.3.2) in the 0.5–9 keV
energy range, adopting the typical pulsar spectral model, an ab-
sorbed power law model.
The best fit parameters are reported in Table 2 along with
the mean luminosity of each time selection. The spectrum of
the brightest part of the outburst (observation 6) could be fit
with a single power law, with a photon index Γ = 1.00+0.16
−0.14 and
an absorbing column density of NH = (1.04
+0.25
−0.20) × 10
22 cm−2
(χ2red = 1.04/83 d.o.f.), while the combined observations 7–18
yielded a photon index of 2.08+0.41
−0.37 and NH = (2.04
+0.62
−0.50) × 10
cm−2 (χ2red = 1.19/19 d.o.f). We note that the 1–10 keV count
rate to unabsorbed 1–10 keV flux conversion factor, obtained
from the best fit model for observation 6, is 2.9×10−10 erg cm−2
count−1. We also performed fits of observation 6 with an ab-
sorbed black-body. Since the column density assumed a value
significantly below that resulting from the interstellar medium,
the power law fit was favoured. In all cases the best-fit absorb-
ing column density is consistent (within 2-σ) with the Galactic
absorption along the line of sight of IGR J11215–5952. This
value is significantly lower than the column density measured
with RXTE/PCA during the 2006 outburst (Smith et al., 2006b),
(11±3)×1022 cm−2, which is likely overestimated because it was
derived with RXTE/PCA in the energy range 2.5–15 keV.
To investigate spectral variations, we created the contour lev-
els for the column density vs. the photon index. The most inter-
esting example is shown in Fig. 3. They indicate that the photon
index showed significant variations, with the spectrum soften-
ing as the outburst progresses, confirming the 2006 observation
of the outburst tail; there is also evidence of an increasing ab-
sorbing column density. These findings are independent on our
choice of an absorbed power-law model.
4. Discussion and Conclusions
We have carried out the most complete monitoring campaign of
an outburst from a SFXT, thanks to the known periodicity of the
4 P. Romano et al.: The fifth outburst of IGR J11215–5952 observed by Swift
outburst activity from IGR J11215–5952 (Paper I). This is re-
markable, since the transient and unpredictable nature of the out-
bursts from all other SFXTs hampers a similar extensive study,
from the almost “quiescent” level up to the “flaring” activity. The
entire “outburst event” was monitored for 23 days. The source
was under the threshold of detectability in the early days of the
campaign, with a luminosity below 3.7×1033 erg s−1. On Feb 9
the source underwent a bright outburst up to ∼ 1.1×1036 erg s−1.
The bright part of the outburst (Feb 9; see Fig. 1b) is composed
of at least five flares, with variable peak flux, each with a du-
ration of ∼ 15 min–2 hours. This bright flaring activity lasted
about 1 day, then the source underwent a decline phase, not flat,
but composed of other equally short flares, one order of magni-
tude fainter. This decline phase lasted about 5 days, and then the
source faded to a much fainter level, almost below the thresh-
old of detectability. The whole outburst lasted about 15 d, after
which the source became fainter than 1.2 ×1033 erg s−1. Thus,
IGR J11215–5952 reached the typical luminosity of SFXTs dur-
ing outburst (around 1036 erg s−1), showing a dynamic range
larger than 103 and a hard X–ray spectrum, proper of this kind of
sources. The brightest part of the outburst lasted less than a day,
on 9th February, and would have been the only flaring activity
seen with less sensitive instruments. Indeed up to now, observa-
tions of outbursts from SFXTs have been mostly performed with
instruments on-board RXTE and/or INTEGRAL, which could
only catch the brightest flares, and missed the complete evo-
lution of the phenomenon, from the onset of the outburst, and
down again to the level of almost quiescence, which is expected
at a level around ∼ 1032 erg s−1 (possible magnetospheric emis-
sion plus the contribution from the soft X–ray emission from the
OB supergiant). Only during XMM-Newton and Chandra ob-
servations of IGR J17544-2619 (González-Riestra et al., 2004;
in’t Zand, 2005) outbursts were observed starting from the qui-
escent emission, but the “post-flare” phases could not be com-
pletely followed and thus the duration of the entire outburst
phase could not be measured.
For IGR J11215–5952 we could exceptionally observe the
whole phenomenon, which for the first time reveals that the
“short outbursts” (the “flares” lasting minutes or few hours), are
actually part of a much longer “outburst event” (lasting several
days), which we believe is triggered at the periastron passage in a
wide, highly eccentric orbit. Indeed, the IGR J11215–5952 out-
burst recurrence time (329 days) is remarkably stable and reveals
an underlying clock, which can be naturally associated with
the orbital motion in a non-circular orbit. The short flares most
prominent on Feb 9 are probably produced by the episodic accre-
tion of clumps from the massive wind (Owocki & Cohen, 2006),
or by an inhomogeneous accretion stream near periastron (sim-
ilar to what proposed to explain the periodic outbursts from the
eccentric X-ray pulsar GX 302-1, e.g. Leahy 1991). Thus, both
mechanisms originally proposed to explain the SFXTs outbursts,
seem to be at work in IGR J11215–5952, i.e., accretion at perias-
tron passage in a wide eccentric orbit (Negueruela et al., 2006),
and accretion from clumpy winds, (in’t Zand, 2005). Applying
a spherically symmetric homogeneous wind model to a B1 Ia
spectral type companion, with a mass of 39 M⊙, 42 solar radii,
and a wind mass loss of 3.67×10−6 M⊙ yrs
−1 (Vink et al., 2000),
the short outburst duration implies an eccentricity larger than
0.9. From the spectroscopy of the single flares there is evidence
for only minor variations in the local absorbing column density
(which would suggest the clear presence of clumps). This may
be partly due to the high column density along the line of sight
that absorbs most of the radiation below 1 keV, thus preventing
us from detecting comparatively small variations of the intrinsic
Table 2. Spectral fit results.
Spectruma NH Γ χ
2 (d.o.f.)/ L1−10 keV
(1022 cm−2) C-stat(%)c (erg s−1)
001 (2006) 0.88+0.96
−0.62 1.89
+1.07
−0.92 167.7 (65.8%) 2.35 × 10
001–005 2.28+2.21
−1.50 1.34
+1.11
−0.96 225.9 (65.8%) 4.32 × 10
006 1.04+0.25
−0.20 1.00
+0.16
−0.14 1.04 (83) 4.78
006 Flare 1 0.85+0.46
−0.32 0.94
+0.31
−0.28 0.66 (17) 8.68
006 Flare 2 1.11+0.79
−0.49 0.91
+0.42
−0.32 1.16 (25) 8.32
006 Flare 3 0.83+0.62
−0.42 0.82
+0.44
−0.40 0.93 (10) 11.1
006 Flare 4 0.88+0.38
−0.31 1.03
+0.32
−0.31 623.6 (39.0%) 4.86
006 Flare 5 2.02+1.01
−0.79 1.51
+0.58
−0.53 436.0 (54.6%) 2.75
007–008 1.75+1.03
−0.83 1.94
+0.64
−0.60 518.3 (66.0%) 2.22 × 10
009–012 1.04+0.47
−0.36 1.48
+0.39
−0.35 576.9 (53.6%) 8.82 × 10
007–012 1.86+0.68
−0.44 1.92
+0.43
−0.38 1.09 (18) 1.40 × 10
007–018 2.04+0.62
−0.50 2.08
+0.41
−0.37 1.19 (19) 5.96 × 10
a Last three digits of observation numbers, see Table 1, column 1.
b Luminosity in the 1–10 keV band in units of 1035 erg s−1 obtained
from the spectral fits.
c Cash statistics (C-stat) and percentage of Monte Carlo realizations
that had statistic < C-stat. We performed 104 simulations.
column density. However, XRT data show evidence of softening
of the spectrum in the long decay to quiescent state (thus con-
firming the 2006 observation of the outburst tail) and a possible
evidence of an NH growth connected with the same transition.
Acknowledgements. We thank the Swift team for making these observations
possible, in particular the duty scientists and science planners M. Chester, S.
Hunsberger, J. Kennea, C. Pagani and J. Racusin; we thank N. Gehrels for ap-
proving this ToO and D. Burrows for a winning observing strategy. We thank
S. Campana, P. D’Avanzo, A. Paizis, P. Persi, V.F. Polcaro, and S. Vercellone
for insightful discussions. This research has made use of NASA’s Astrophysics
Data System Bibliographic Services as well as the NASA/IPAC Extragalactic
Database (NED) which is operated by the Jet Propulsion Laboratory, California
Institute of Technology, under contract with the National Aeronautics and Space
Administration. This work was supported by MIUR grant 2005-025417, and
contract ASI/INAF I/023/05/0. PR thanks INAF-IASFMi, where most of the
work was carried out, for their kind hospitality.
References
Burrows, D. N., Hill, J. E., Nousek, J. A., et al. 2005, Space Science Reviews,
120, 165
Cash, W. 1979, ApJ, 228, 939
González-Riestra, R., Oosterbroek, T., Kuulkers, E., Orr, A., & Parmar, A. N.
2004, A&A, 420, 589
in’t Zand, J. J. M. 2005, A&A, 441, L1
Leahy, D. A. 1991, MNRAS, 250, 310
Lubinski, P., Bel, M. G., von Kienlin, A., et al. 2005, ATel, 469
Mangano, V., Romano, P., & Sidoli, L. 2007a, ATel, 995
Mangano, V., Romano, P., & Sidoli, L. 2007b, ATel, 996
Masetti, N., Pretorius, M. L., Palazzi, E., et al. 2006, A&A, 449, 1139
Negueruela, I., Smith, D. M., & Chaty, S. 2005, ATel, 470
Negueruela, I., Smith, D. M., Reig, P., Chaty, S., & Torrejón, J. M. 2006, in
Proceedings of the “The X-ray Universe 2005”, 26-30 September 2005, El
Escorial, Madrid, Spain. ESA SP-604, ed. A. Wilson, 165–170
Owocki, S. P. & Cohen, D. H. 2006, ApJ, 648, 565
Romano, P., Sidoli, L., & Mangano, V. 2007, ATel, 994
Sidoli, L., Mereghetti, S., Vercellone, S., et al. 2007, ATel, 997
Sidoli, L., Paizis, A., & Mereghetti, S. 2006, A&A, 450, L9
Smith, D. M., Bezayiff, N., & Negueruela, I. 2006a, ATel, 773
Smith, D. M., Bezayiff, N., & Negueruela, I. 2006b, ATel, 766
Steeghs, D., Torres, M. A. P., & Jonker, P. G. 2006, ATel, 768
Swank, J., Smith, D., & Markwardt, C. 2007, ATel, 997
Vink, J. S., de Koter, A., & Lamers, H. J. G. L. M. 2000, A&A, 362, 295
P. Romano et al.: The fifth outburst of IGR J11215–5952 observed by Swift 5
Table 1. Observation log.
Sequence Start time (MJD) Start time (UT) End time (UT) Net Exposurea
(yyyy-mm-dd hh:mm:ss) (yyyy-mm-dd hh:mm:ss) (s)
00030384001 53814.7853 2006-03-20 18:50:47 2006-03-20 19:01:53 643
00030881001 54135.5060 2007-02-04 12:08:34 2007-02-04 23:38:58 2048
00030881002 54136.5092 2007-02-05 12:13:12 2007-02-05 23:44:57 1865
00030881003 54137.1798 2007-02-06 04:18:55 2007-02-06 17:28:17 1941
00030881004 54138.1244 2007-02-07 02:59:06 2007-02-07 17:13:56 1213
00030881005 54139.0604 2007-02-08 01:27:02 2007-02-08 16:03:57 1403
00030881006 54140.0021 2007-02-09 00:03:05 2007-02-09 23:59:57 4668
00030881007 54141.6747 2007-02-10 16:11:33 2007-02-11 00:16:56 3141
00030881008 54142.0696 2007-02-11 01:40:15 2007-02-12 00:10:11 4232
00030881009 54143.6078 2007-02-12 14:35:17 2007-02-12 19:38:57 3337
00030881010 54144.0085 2007-02-13 00:12:17 2007-02-13 16:30:58 3091
00030881011 54145.0182 2007-02-14 00:26:09 2007-02-14 21:25:56 4521
00030881012 54146.0142 2007-02-15 00:20:26 2007-02-15 09:58:57 4590
00030881013 54147.6084 2007-02-16 14:36:05 2007-02-16 19:44:56 5230
00030881014 54148.6848 2007-02-17 16:26:09 2007-02-17 21:23:57 4295
00030881015 54149.2812 2007-02-18 06:44:59 2007-02-18 12:01:58 4804
00030881016 54150.2187 2007-02-19 05:14:52 2007-02-19 12:02:56 4636
00030881017 54151.6337 2007-02-20 15:12:28 2007-02-20 20:19:56 4847
00030881018 54152.0412 2007-02-21 00:59:16 2007-02-21 17:11:57 5194
00030881019 54153.4431 2007-02-22 10:38:04 2007-02-22 15:39:58 3814
00030881021 54155.1683 2007-02-24 04:02:20 2007-02-24 13:46:57 2963
00030881023 54157.5108 2007-02-26 12:15:37 2007-02-26 18:48:58 786
a The exposure time is spread over several snapshots (single continuous pointings at the target) during each observation.
	Introduction
	Observations and Data Reduction
	Analysis and Results
	Discussion and Conclusions
ABSTRACT
  IGR J11215-5952 is a hard X-ray transient source discovered in April 2005
with INTEGRAL and a confirmed member of the new class of High Mass X-ray
Binaries, the Supergiant Fast X-ray Transients (SFXTs). Archival INTEGRAL data
and RXTE observations showed that the outbursts occur with a periodicity of
~330 days. Thus, IGR J11215-5952 is the first SFXT displaying periodic
outbursts, possibly related to the orbital period. We performed a Target of
Opportunity observation with Swift with the main aim of monitoring the source
behaviour around the time of the fifth outburst, expected on 2007 Feb 9. The
source field was observed with Swift twice a day (2ks/day) starting from 4th
February, 2007, until the fifth outburst, and then for ~5 ks a day afterwards,
during a monitoring campaign that lasted 23 days for a total on-source exposure
of ~73 ks. This is the most complete monitoring campaign of an outburst from a
SFXT. The spectrum during the brightest flares is well described by an absorbed
power law with a photon index of 1 and N_H~1 10^22 cm^-2. A 1-10 keV peak
luminosity of ~10^36 erg s^-1 was derived (assuming 6.2 kpc, the distance of
the optical counterpart). These Swift observations are a unique data-set for an
outburst of a SFXT, thanks to the combination of sensitivity and time coverage,
and they allowed a study of IGR J11215-5952 from outburst onset to almost
quiescence. We find that the accretion phase lasts longer than previously
thought on the basis of lower sensitivity instruments observing only the
brightest flares. The observed phenomenology is consistent with a smoothly
increasing flux triggered at the periastron passage in a wide eccentric orbit
with many flares superimposed, possibly due to episodic or inhomogeneous
accretion.

<|endoftext|><|startoftext|>
Introduction
	Background
	Model
	Functional representation of the grand partition function of an ionic model
	Effective Hamiltonian in the vicinity of the critical point
	Ginzburg temperature
	Summary
	Appendices
	 Recurrence formulas for the cumulants Fourier space.
	The nth-particle structure factors of a one component hard sphere systems in the Percus-Yevick approximation
	Explicit expression for SR2 
	 Explicit expressions for the integrals used in equations (??)-(??)
	References
ABSTRACT
  According to extensive experimental findings, the Ginzburg temperature
$t_{G}$ for ionic fluids differs substantially from that of nonionic fluids
[Schr\"oer W., Weig\"{a}rtner H. 2004 {\it Pure Appl. Chem.} {\bf 76} 19]. A
theoretical investigation of this outcome is proposed here by a mean field
analysis of the interplay of short and long range interactions on the value of
$t_{G}$. We consider a quite general continuous charge-asymmetric model made of
charged hard spheres with additional short-range interactions (without
electrostatic interactions the model belongs to the same universality class as
the 3D Ising model). The effective Landau-Ginzburg Hamiltonian of the full
system near its gas-liquid critical point is derived from which the Ginzburg
temperature is calculated as a function of the ionicity. The results obtained
in this way for $t_{G}$ are in good qualitative and sufficient quantitative
agreement with available experimental data.

<|endoftext|><|startoftext|>
Mostovoy Reply: In their Comment [1] Kenzelmann
and Harris argue against the conclusion made in [2] that
spiral magnets are in general ferroelectric. First of all, I
believe, this conclusion was proved experimentally. The
systematic search for ferroelectricity in magnets with spi-
ral ordering recently led to a discovery of new multifer-
roic materials, such as CoCr2O4 [3], MnWO4 [4, 5] and
LiCu2O2 [6].
Furthermore, Kenzelmann and Harris argue that the
continuum theory outlined in [2] leads to misleading pre-
dictions about the magnetically-induced electric polar-
ization. To prove their point, they consider two hy-
pothetical spin configurations shown in Fig. 1 (c) and
(d) of their Comment, and argue that the results of the
continuum theory are incompatible with crystal symme-
tries. While one cannot deny the importance of symme-
try considerations, the arguments Kenzelmann and Har-
ris are themselves very misleading. They incorrectly as-
sert that for the spin configurations shown in Fig. 1 (c)
and (d) ‘the spiral theory’ would predict electric polar-
ization along, respectively, the c and a axes.
The continuum model of multiferroics [2] is based on
assumption that the spin state can be described by a sin-
gle magnetization vector. For TbMnO3 (see Fig. 1b),
where the wave vector of the magnetic spiral is along the
b axis and spins are rotating in the bc plane, it predicts
electric polarizationP along the c axis, in agreement with
experiment. The magnetic structures (c) and (d) are of a
different kind, as they are made of spirals rotating in op-
posite directions. Thus in the configuration (c) there are
two counter-rotating bc spirals in each ab plane, which is
why the net polarization along the c axis is zero. Simi-
larly, in the configuration (d) the ab spirals in neighboring
bc planes rotate in opposite directions, resulting in zero
net Pa.
It is not difficult to modify the continuum model con-
sidered in [2] to describe these more general magnetic
orders. For more than one magnetic ion per unit cell
one can introduce several independent magnetic order
parameters, which increases the number of possible mag-
netoelectric coupling terms. For instance, all three spin
configurations shown in Fig. 1 of the Comment can be
described by three antiferromagnetic order parameters
L1 = S1 + S2 − S3 − S4,
L2 = S1 − S2 + S3 − S4,
L3 = S1 − S2 − S3 + S4
(the labels of the 4 Mn ions in the unit cell of TbMnO3
are the same as in [7]). The spiral configuration (b)
can be described by a single order parameter L1 with
nonzero Lb
and Lc
. As discussed in [2], the magneto-
electric coupling linear in the gradient of the magnetic
order parameter (Lifshitz invariant) allowed by symme-
tries has the form P c
, which gives
rise to magnetically-induced P c. The configuration (c)
is described by two different order parameters, Lb
. The term Lc
does not transform like
any of the components of P, so that the induced polar-
ization is zero. Finally, for the configuration (d) with
nonzero Lb
and La
, the only possible coupling term is
, allowing for nonzero P c.
The point is, however, that the spin configurations (c)
and (d) considered by Kenzelmann and Harris, are very
artificial, as it is difficult to find a system where interac-
tions between spins would favor the simultaneous pres-
ence of counter-rotating spirals. The average interaction
between counter-rotating spirals is zero, while for spirals
with spins rotating in the same direction some interac-
tion energy can always be gained by properly adjusting
their relative phases. This is the reason why the simple
model of Ref. [2] with a single vector order parameter
successfully describes thermodynamics and magnetoelec-
tric properties of many spiral multiferroics.
Maxim Mostovoy
Materials Science Center, University of Groningen,
Nijenborgh 4, 9747 AG Groningen, The Netherlands
[1] M. Kenzelmann and A. B. Harris, Comment
arXiv/cond-mat0610471.
[2] M. Mostovoy, Phys. Rev. Lett. 96, 067601 (2006).
[3] Y. Yamasaki, S. Miyasaka, Y. Kaneko, J.-P. He, T.
Arima, and Y. Tokura, Phys. Rev. Lett. 96, 207204
(2006).
[4] K. Taniguchi, N. Abe, T. Takenobu, Y. Iwasa, and T.
Arima, Phys. Rev. Lett. 97, 097203 (2006).
[5] O. Heyer et al., J. Phys. Condens. Matter 18, L471
(2006).
[6] S. Park, Y. J. Choi, C. L. Zhang and S.-W. Cheong, to
be published.
[7] A. B. Harris and G. Lawes, arXiv/cond-mat0508617.
http://arxiv.org/abs/0704.0545v1
	References
ABSTRACT
  In response to the comment of Kenzelmann and Harris I show how the continuum
theory of spiral multiferroics can be modified to describe general magnetic
orders and discuss why the microscopic mechanism of magnetically-induced
ferroelectricity usually makes such modifications unnecessary. This explains
why the simple model with a single vector order parameter successfully
describes thermodynamics and magnetoelectric properties of many spiral
multiferroics.

<|endoftext|><|startoftext|>
Microsoft Word - SQubit.doc
PERSISTENT CURRENTS IN SUPERCONDUCTING QUANTUM INTERFERENCE 
DEVICES 
F. Romeo 
Dipartimento di Fisica “E. R. Caianiello”, Università degli Studi di Salerno 
 I-84081 Baronissi (SA), Italy 
R. De Luca 
CNR-INFM and DIIMA, Università degli Studi di Salerno 
 I-84084 Fisciano (SA), Italy 
ABSTRACT 
Starting from the reduced dynamical model of a two-junction quantum interference device, a 
quantum analog of the system has been exhibited, in order to extend the well known properties of 
this device to the quantum regime. By finding eigenvalues of the corresponding Hamiltonian 
operator, the persistent currents flowing in the ring have been obtained. The resulting quantum 
analog of the overdamped two-junction quantum interference device can be seen as a supercurrent 
qubit operating in the limit of negligible capacitance and finite inductance.  
PACS: 74.50.+r, 85.25.Dq 
Keywords: Josephson junctions, d. c. SQUID, Qubit 
I   INTRODUCTION 
The d. c. SQUID (Superconducting QUantum Interference Device) is a well known system, 
widely investigated in the literature [1-3]. This system, though not confined to atomic scale in its 
dimensions, has been proposed as the basic unit for quantum computing (qubit) by resorting to a 
characteristic feature of superconductivity: macroscopic quantum coherence [4]. In general, a qubit 
can be realized by means of a two-level quantum mechanical system [5]. Therefore, the quantum 
states of a qubit can be a linear combinations of the orthogonal basis 0  and 1 , so that the Hilbert 
space generated by this basis is two-dimensional. Alternatively, a qubit state can be represented by 
elements of an infinite-dimensional Hilbert space. In this case, however, the effective potential of 
the system must show a double-well potential, in such a way that one of the two stationary states can 
be defined as state 0  and the other as state 1 . 
The electrodynamic properties of d. c. SQUIDs can be analyzed by means of two-junction 
quantum interferometer models, where each Josephson junction is assumed to be in the overdamped 
regime. The simplest possible analysis of these systems is done by assuming negligible values of the 
inductance L of a single branch of the device, so that 0
β , where 0Φ  is the elementary flux 
quantum, and 
21 JJ
=  is the mean value of the maximum Josephson currents of the junctions. 
In this case, the dynamical equation for the superconducting phase differences 1φ  and 2φ  across the 
two junctions can be written as a single equation for the average phase variable 
21 φφϕ
= . This 
equation is similar to the nonlinear differential equation governing the time evolution of a single 
overdamped junction, so that it can be defined as an equivalent single junction model, and is written 
as follows: 
sincos Bex
=+ ϕπψ
,       (1) 
where t
τ , with 
21 RRR
= , 1R  and 2R  being the resistive junction parameters, exψ  is 
the externally applied flux normalized to 0Φ  and Bi  is the bias current normalized to JI . 
Following the same type of approach, by means of a perturbation analysis, taking β  as the 
perturbation parameter, it can be shown that, to first order in β , the equivalent single junction 
model can be written as follows for a symmetric SQUID with identical junctions [6]:  
( ) ( ) ( )
2sinsinsincos1 2 Bexex
=Ψ+Ψ−+ ϕππβϕπ
,    (2) 
where n is an integer. This model allows, at least for small values of the parameter, to calculate in 
closed form some electrodynamic quantities, such as, for example, the amplitude of the half-integer 
Shapiro steps appearing in these systems [7]. It has also been shown that, by extending this model to 
SQUIDs with non-identical junction, one can obtain an effective classical double-well potential in 
which the transition from one state to the other can be enhanced by applying an opportune external 
magnetic flux [8]. However, this classical analysis by itself does not allow to define the quantum 
states of the system. Nonetheless, the equation of the motion (2) could be assumed to be a classical 
version of the time evolution of a quantum phase state. Therefore, the aim of the present work is to 
obtain, starting from the time evolution of the superconducting phase difference ϕ , the quantum 
mechanical Hamiltonian and to compute, by means of this quantum mechanical system, which in the 
classical limit reduces to Eq. (2), the persistent currents in the SQUID. It is interesting to notice that 
the resulting “Hamiltonian” quantum model derives from the overdamped limit of a classical 
dissipative system in the presence of a double well potential. The present analysis can be seen as an 
alternative approach to the study of the quantum properties of supercurrent qubits: It allows to study 
the response of the quantum system in the limit of negligible capacitance and finite values of the 
inductance, as opposed to the case usually considered in the literature, where negligible inductance 
and finite capacitance is assumed [9 - 10].   
II   FROM CLASSICAL TO QUANTUM MECHANICS 
Let us consider the classical dynamical equation ( )xfx =&  for the state variable x, where the dot 
notation indicates the derivative with respect to the normalized time τ. Making use of the previous 
equation, taking the time derivative of both sides, we can obtain the equation of the motion of the 
quantity x&  as follows: 
( ) ( ) ( )xfxfxxfx xx == &&& ,      (3) 
where the notation ( )xf x  stands for the partial derivative of ( )xf  with respect to x. Given the 
above equation and following the procedure described by Huang and Lin [11], the Lagrangian 
associated to this problem is obtained in the form: 
( )( )[ ]22
xfxL += & .      (4) 
Starting from the Lagrangian L, the Legendre transformation allows us to get the following classical 
Hamiltonian: 
( )( )[ ]22
xfH −= π ,      (5) 
where 
=π  is the canonical momentum conjugated to the variable x, while ( ) ( )( )2
xfxU −=  
could be considered as an effective potential. We are now interested in the quantization of the 
classical model described so far. According to the standard procedure, the recipe to transform the 
classical Hamiltonian in a quantum operator is implemented by making the substitution 
xi∂−=→ππ ˆ , xxx =→ ˆ  (in dimensionless units). From the previous definitions, the 
commutation rule [ ] ix =π̂,ˆ  for the conjugated variables follows directly. Furthermore, the 
Hamiltonian operator can be written as   ( )( )[ ]22
1ˆ xfH x +∂−= . 
The general procedure described above can be adopted to obtain the quantum model of an 
overdamped d.c. SQUID in the limit in which the reduced two-junctions interferometer model [6] 
can be applied. In the framework of this model, the phase dynamics can be written (in the 
homogeneous case) as in Eq. (2), so that the function ( )ϕff =  takes the following form: 
( ) ( ) ( )ϕϕγϕ 2sinsin baf −−= ,    (6) 
where ( )exa πψcos= , ( )exb πψπβ 2sin= , 2
Bi=γ , having chosen 0=n  for simplicity. We notice 
that this analysis cannot be extended to the similar case, considered by Grønbech-Jensen et al. [12], 
of junctions with finite capacitance. Therefore, by setting ϕ=x  and ϕπ &=  in the above general 
analysis, we notice that the phase and the voltage across the two-junction quantum interferometer 
are conjugate variables of the system. In the present case, therefore, proceeding as we said, by 
squaring ( )ϕf  and exhibiting the final result of the calculation in terms of the higher harmonics of 
the phase variable instead of powers of trigonometric functions, the  following Hamiltonian 
operator is obtained: 
                           ( ) ( ) ( ) ( )
( ) ( ) ( )γϕγϕγ
ϕϕϕϕϕ
,,2sinsin
babaab
= ,                     (7) 
where ( )
222 ba
γ  is a flux dependent energy shift which will be important in the 
following discussion. 
In order to calculate the relevant physical quantities of the system, we introduce the orthonormal  
complete basis 
n =  of the infinite-dimensional Hilbert space with the inner product 
== −∫ , where mn,δ  is the Kronecker delta. In this representation the matrix 
elements of the Hamiltonian operator can be written as follows: 
( ) ( ) ( )
( ) ( ) ( ) ( )2,2,1,1,4,4,
1,1,,
+−+−+−+−
−+−+++++
++++−
mnmnmnmnmnmnmnmn
mnmnmnmnmnmn
aabban
δδδδδ
, (8) 
where the following useful relations have been used: 
( ) ( )lmnlmninlm +− −= ,,2
sin δδϕ ,     (9a) 
( ) ( )lmnlmnnlm +− += ,,2
cos δδϕ .     (9b) 
Once the matrix elements of the Hamiltonian operator are known, we can diagonalize a reduced 
version of the complete infinite-dimensional matrix by introducing an energy cut-off. Such 
procedure can be safely carried out when we need to characterize low energy states which are 
located very far from the cut and when the number of the vectors in the basis of the reduced Hilbert 
space is able to capture the essential features of the low energy states. For instance, the Hilbert 
space spanned by the first 20 basis functions can be a very effective choice, if we need to study only 
the lowest energy states close to the ground state of our system. In fact, in our case we have noted 
that, by halving the number of the basis elements, no evident difference is present in the lowest 
energy eigenvalues.  
III   PERSISTENT CURRENTS 
Following the procedure described above, in the present section we shall derive the behavior of 
the persistent currents associated to each eigenstate of the Hamiltonian as a consequence of the time 
reversal symmetry breaking provided by the magnetic flux.  Such a current, in units of the 
Josephson current divided by π2  (i. e., in units of 
JE , where JE  is the Josephson energy), can be 
defined as follows: 
−= ,      (10) 
where nε  and exψ  are the eigenvalues of the Hamiltonian and the normalized external magnetic 
flux, respectively. According to the above relation, the persistent current nI  can be computed once 
the pertinent eigenvalue nε  is known. Furthermore, it should be noticed that, in the absence of the 
off-diagonal terms in the Hamiltonian given in Eq. (8), the state independent persistent current 
computed by means of Eq. (10) would be given by: 
                                    ( ) ( )[ ]exexI πψβππψπ 222 sin212sin4 −−= .                                                  (11) 
The solution of the full problem can thus be seen as the state dependent correction to the above 
relation induced by the off-diagonal terms. Last point can be well understood by analyzing Figs. 1a-
b. In these figures, even tough we are in the presence of finite off-diagonal terms, the relation given 
in Eq. (11) is able to describe quite accurately the behavior of the persistent currents which appears 
insensitive to the state index due to the small value of β . When the value of β  is raised (see Fig. 
2a), the persistent currents starts to become weakly state sensitive and some deformation of the 
original shape occurs. Furthermore, the states of higher energy (see Fig. 2b) induce a behavior of 
the persistent current which is quite insensitive to the state index.  A further raising of β  (see Fig. 
3a) induces a suppression of the persistent current carried by the first excited state in the vicinity of 
half integer values of the normalized applied magnetic flux. This implies that, in the low energy 
regime (i. e., when the quantum state can be written as SSS 1
+= , where 0
 and 
 represent the ground state and the first excited state, respectively), the average persistent 
current 1
ISISI +=  close to an half integer flux is mainly related to the ground 
state properties of the system, since 10 II >>  in the vicinity of 2
=exψ  (for 2
≠exψ ). In Fig. 4a, 
raising once again β ,  it can be noticed that, in the vicinity of half integer values of the normalized 
flux, the ground state and the first excited state carries currents of opposite sign, inducing a 
competing magnetic behavior. Therefore, the average persistent current, and its magnetic behavior,  
depend, on both coefficients of the decomposition (i. e., on S0
 and S1
).  This last point 
implies that, by measuring the magnetic momentum of the system in a particular magnetic field 
configuration, we can obtain constraints on the nature of the quantum superposition. For instance, 
under these conditions, we could prepare the quantum state in such a way that the average persistent 
current is negligibly small in the vicinity of half integer values of the normalized applied magnetic 
flux. 
Furthermore, we point out that a double well potential can be obtained setting the model 
parameters as done in Fig.5 ( 2.0=β  and 7.0=exψ ), where the potential ( )ϕU  is shown. Indeed, 
we notice that for 0=γ  two low-energy degenerate states are present, the degeneracy being 
removed by means of a small current bias. Such bias can drive the response of the system toward 
one of the two minima of the potential allowing a complete control of the quantum state which can 
be exploited for technological applications. Finally, we notice that, even thought the chosen β  
values in Figs. 2 – 5 are close to the validity limits of the first order approximation of the reduced 
model in ref. [6], the above characteristic response of the system remain qualitatively valid, since 
we are here considering the leading order in the value of β .  
IV   CONCLUSION 
Starting from the reduced dynamical model of the two-junction quantum interference device, 
the applied flux dependence of persistent currents in this system has been studied in the quantum 
regime. The extension of the dissipative overdamped classical system, from classical to quantum 
mechanics, allows to consider the electrodynamical response of a supercurrent quantum bit in the 
limit of negligible capacitance and finite inductance. For null bias current and for opportune values 
of the externally applied magnetic flux, the quantum analog of the two-junction interferometer 
shows effective potential with a degenerate ground state; degeneracy can be removed by applying a 
control non-null bias current. 
In the literature, the quantum behavior of the two-junction quantum interference device is 
studied by considering the charging energy of the junctions as preponderant with respect to the 
energy of the circulating currents [5, 9, 10]. In the present work it is shown that it is possible to 
obtain an Hamiltonian quantum analog of d. c. SQUIDs containing overdamped junctions in the 
limit of null capacitance and finite inductance values. In this framework, a flux qubit can be 
realized, under quite different conditions than those with high junction capacitance value [5]. 
Finally, the present analysis can also be considered as a link between classical dissipative systems 
and their corresponding quantum mechanical models.  
REFERENCES 
1. A. Barone and G. Paternò, Physics and Applications of the Josephson Effect (Wiley, NY, 
1982). 
2. K. K. Likharev, Dynamics of Josephson Junctions and circuits, Gordon and Breach, 
Amsterdam, 1986.  
3. J. Clarke and A. I. Braginski, Eds., The SQUID Handbook, Vol. I (Wiley-VCH, Weinheim, 
2004). 
4. M. F. Bocko, A. M. Herr and M. J. Feldman, IEEE Tans. Appl. Supercond. 7, 3638 (1997).  
5. J. B. Majer, F. G. Paauw, A. C. J. ter Haar, C. J. P. M. Harmans, and J. E. Mooij, Phys. Rev. 
Lett. 94, 090501 (2005). 
6. F. Romeo, R. De Luca, Phys. Lett. A 328, 330 (2004). 
7. C. Vanneste, C. C. Chi, W. J. Gallagher, A. W. Kleinsasser, S. I. Raider, and R. L. 
Sandstrom, J. Appl. Phys. 64,  242 (1988). 
8. R. De Luca, F. Romeo, Phys. Rev. B 73, 214518 (2006). 
9. G. Burkard, Phys. Rev B 71, 144511 (2005). 
10.  T. P. Orlando, J. E. Mooij, Lin Tian, Caspar H. van der Wal, L. S. Levitov, Seth Lloyd, J. J. 
Mazo, Phys. Rev B 60, 15398 (1999).  
11. Y.-S. Huang and C.-L. Lin, Am. J. Phys. 70, 741 (2002). 
12. N. Grønbech-Jensen, D. B. Thompson, M. Cirillo, C. Cosmelli, Phys. Rev. B 67, 224505 
(2003). 
FIGURE CAPTIONS 
Fig. 1  
(a) Persistent currents 1I  (triangle) and 2I  (box) plotted as a function of the applied external flux 
exψ  and  by fixing 075.0=β  and 0=γ . (b) Persistent currents 3I  (star) and 4I  (diamond) plotted 
as a function of the applied external flux exψ  and  by fixing 075.0=β  and 0=γ . 
Fig. 2 
(a) Persistent currents 1I  (triangle) and 2I  (box) plotted as a function of the applied external flux 
exψ  and  by fixing 15.0=β  and 0=γ . (b) Persistent currents 3I  (star) and 4I  (diamond) plotted 
as a function of the applied external flux exψ  and  by fixing 15.0=β  and 0=γ . 
Fig. 3 
(a) Persistent currents 1I  (triangle) and 2I  (box) plotted as a function of the applied external flux 
exψ  and  by fixing 2.0=β  and 0=γ . (b) Persistent currents 3I  (star) and 4I  (diamond) plotted as 
a function of the applied external flux exψ  and  by fixing 2.0=β  and 0=γ . 
Fig. 4  
 (a) Persistent currents 1I  (triangle) and 2I  (box) plotted as a function of the applied external flux 
exψ  and  by fixing 25.0=β  and 0=γ . (b) Persistent currents 3I  (star) and 4I  (diamond) plotted 
as a function of the applied external flux exψ  and  by fixing 25.0=β  and 0=γ . 
Fig. 5 
 Density plot of the potential ( )ϕU  plotted as a function of the phase ϕ  and of the normalized bias 
current γ  by setting the remaining parameters as: 2.0=β  and 7.0=exψ . Lower energy states are 
represented by darker regions in the plot. 
Fig. 1  
0 0.2 0.4 0.6 0.8 1
�0.75
�0.25
0 0.2 0.4 0.6 0.8 1
Fig. 2 
0 0.2 0.4 0.6 0.8 1
0 0.2 0.4 0.6 0.8 1
Fig. 3 
0 0.2 0.4 0.6 0.8 1
0 0.2 0.4 0.6 0.8 1
Fig. 4 
0 0.2 0.4 0.6 0.8 1
0 0.2 0.4 0.6 0.8 1
Fig. 5 
0 Π�����
Π 3 Π����������
�����
����������
2 2 Π
ABSTRACT
  Starting from the reduced dynamical model of a two-junction quantum
interference device, a quantum analog of the system has been exhibited, in
order to extend the well known properties of this device to the quantum regime.
By finding eigenvalues of the corresponding Hamiltonian operator, the
persistent currents flowing in the ring have been obtained. The resulting
quantum analog of the overdamped two-junction quantum interference device can
be seen as a supercurrent qubit operating in the limit of negligible
capacitance and finite inductance.

<|endoftext|><|startoftext|>
Draft version August 6, 2018
Preprint typeset using LATEX style emulateapj v. 4/12/04
MID-INFRARED FINE STRUCTURE LINE RATIOS IN ACTIVE GALACTIC NUCLEI OBSERVED WITH
SPITZER IRS: EVIDENCE FOR EXTINCTION BY THE TORUS
R. P. Dudik
, J. C. Weingartner
, S. Satyapal
, J. Fischer
, C. C. Dudley
, & B. O’Halloran
Draft version August 6, 2018
ABSTRACT
We present the first systematic investigation of the [NeV] (14µm/24µm) and [SIII] (18µm/33µm)
infrared line flux ratios, traditionally used to estimate the density of the ionized gas, in a sample
of 41 Type 1 and Type 2 active galactic nuclei (AGNs) observed with the Infrared Spectrograph on
board Spitzer. The majority of galaxies with both [NeV] lines detected have observed [NeV] line flux
ratios consistent with or below the theoretical low density limit, based on calculations using currently
available collision strengths and ignoring absorption and stimulated emission. We find that Type 2
AGNs have lower line flux ratios than Type 1 AGNs and that all of the galaxies with line flux ratios
below the low density limit are Type 2 AGNs. We argue that differential infrared extinction to the
[NeV] emitting region due to dust in the obscuring torus is responsible for the ratios below the low
density limit and we suggest that the ratio may be a tracer of the inclination angle of the torus to our
line of sight. Because the temperature of the gas, the amount of extinction, and the effect of absorption
and stimulated emission on the line ratios are all unknown, we are not able to determine the electron
densities associated with the [NeV] line flux ratios for the objects in our sample. We also find that the
[SIII] emission from the galaxies in our sample is extended and originates primarily in star forming
regions. Since the emission from low-ionization species is extended, any analysis using line flux ratios
from such species obtained from slits of different sizes is invalid for most nearby galaxies.
Subject headings: Galaxies: Active— Galaxies: Starbursts— X-rays: Galaxies — Infrared: Galaxies
1. INTRODUCTION
Mid-infrared (mid-IR) emission-line spectroscopy of
active galactic nuclei (AGNs) is used to investigate the
physical conditions of the dust-enshrouded gas that is
in close proximity to the active nucleus. In particular,
many spectral lines are emitted in the so-called narrow-
line region (NLR) of these objects which typically ex-
tends between tens to at most a thousand parsecs from
the nucleus (Capetti, et al. 1995, 1997, Schmitt & Kin-
ney 1996, Falcke et al. 1998; Ferruit et al. 1999, Schmitt
et al. 2003).
The NLRs of AGNs have been studied extensively us-
ing optical spectroscopic observations. However, there
have been very few systematic studies of the NLR using
infrared spectroscopic observations. Infrared (IR) fine-
structure emission lines have a number of special char-
acteristics that have been regarded as distinct advan-
tages, particularly in determining the electron density of
the ionized gas very close to the central AGN. Infrared
spectroscopic observations allow access to fine-structure
lines from ions with higher ionization potentials than the
most widely used optical diagnostic lines. This is impor-
tant in many AGNs, where a significant fraction of the
line emission from lower ionization species can originate
in gas ionized by star forming regions. In addition, it
is generally assumed that the density-sensitive infrared
line ratios originate in gas with temperatures around 104
K and are less dependent on electron temperature vari-
ations, enabling a more straightforward determination
of the electron density in the ionized gas. Finally, it
1 George Mason University, Department of Physics & Astron-
omy, MS 3F3, 4400 University Drive, Fairfax, VA 22030
2 Naval Research Laboratory, Remote Sensing Division, 4555
Overlook Ave SW, Washington DC, 20375
has long been assumed that the IR diagnostic line ratios
are insensitive to reddening corrections–a serious impedi-
ment to optical and ultraviolet observations, particularly
in the NLRs of AGNs, where the dust composition and
spatial distribution are highly uncertain. For these rea-
sons, IR spectroscopic observations, especially since the
era of the Infrared Space Observatory (ISO), have pro-
vided us with some of the most reliable tools for studying
the NLRs in AGNs. However, while there are clear ad-
vantages of mid-IR fine-structure diagnostics in studying
the physical state of the ionized gas, very little work has
been done to investigate their robustness in determin-
ing the gas densities of the NLRs in a large sample of
AGNs. The Spitzer Space Telescope Infrared Spectrome-
ter (IRS), with its extraordinary sensitivity and spectral
resolution, offers the opportunity to examine for the first
time the physical state of NLR gas in a large sample of
AGNs.
The focus of most previous comparative studies of the
infrared fine-structure lines in AGNs has been on the
excitation state of the ionized gas, in an effort to de-
termine the existence and energetic importance of po-
tentially buried AGNs and to constrain their ionizing
radiation fields (Genzel et al. 1998, Lutz et al. 1999,
Alexander & Sternberg 1999, Sturm et al. 2002, Satya-
pal, Sambruna, & Dudik 2004, Spinoglio et al. 2005).
Remarkably, very little work has been done in the in-
frared on studying the line flux ratios traditionally used
to probe the NLR gas densities in a significant number of
AGNs. We present in this paper the first systematic in-
frared spectroscopic study of the line flux ratios of [NeV]
and [SIII] in order to 1) test the robustness of these line
ratios as density diagnostics and 2) if possible, to probe
the densities of the NLR gas in a large sample of AGNs.
http://arxiv.org/abs/0704.0547v2
2. THE SAMPLE
We searched the Spitzer archive for galaxies with an ac-
tive nucleus and both high- and low-resolution Infrared
Spectrometer (IRS; Houck et al. 2004) observations cur-
rently available. Only those galaxies with indisputable
optical, X-ray, or radio signatures of active nuclei (such
as broad Hα or X-ray or radio point sources) were in-
cluded in our sample. The sample includes three AGN
subclasses: Seyferts, LINERs, and Quasars. The galax-
ies in this sample span a wide range of distances (4 to
400 Mpc; median = 21 Mpc), Hubble types, bolomet-
ric luminosities (log (LBOL) ∼ 40 to 46, median = 43),
and Eddington Ratios (log(L/LEdd) ∼ -6.5 to 0.3; me-
dian= -2.5). The entire sample consists of 41 galaxies.
The basic properties of the sample are given in Table
1. The black hole masses listed in Table 1 were derived
using resolved stellar kinematics, if available, reverber-
ation mapping, or by applying the correlation between
optical bulge luminosity and central black hole mass de-
termined in nearby galaxies only when the host galaxy
was clearly resolved. Bolometric luminosities listed in
Table 1 were calculated from the X-ray luminosities for
most objects. For Seyferts, the relationship LBOL = 10 ×
LX was adopted (Elvis 1994). For LINERs
1 we assumed
LBOL = 34× LX , as derived from the spectral energy dis-
tribution of a sample of nearby LINERs from Ho (1999)
(see also Dudik et al. 2005 and Satyapal et al. 2005).
The bolometric luminosities and black hole masses for
quasars and radio galaxies were taken from Woo & Urry
(2002) and Marchesini, Celotti, & Ferrarese (2004), re-
spectively. A detailed discussion of our methodology and
justification of assumptions for determining black hole
masses and bolometric luminosities for the various AGN
classes represented in Table 1 can be found in Satyapal et
al. (2005) and Dudik et al. (2005). Table 1 also lists the
AGN type (1 or 2) for the galaxies in our sample based
on the presence or absence of broad (full width at half
max (FWHM) exceeding 1000 km s−1) Balmer emission
lines in the optical spectrum. We emphasize that the
selection basis for the objects in our sample was on the
availability of high resolution IRS Spitzer observations.
The sample should therefore not be viewed as complete
in any sense.
3. DATA ANALYSIS AND RESULTS
We extracted archival spectral data obtained us-
ing the short-wavelength, low-resolution module (SL2,
3.6”×57”, λ = 5.2-7.7µm) and both the short-
wavelength, high-resolution (SH, 4.7”×11.3”, λ = 9.9-
19.6µm) and long-wavelength, high-resolution (LH,
11.1”×22.3”, λ = 18.7-37.2µm) modules of IRS.
The data presented here were preprocessed by the
IRS pipeline (version 13.0) at the Spitzer Science Center
(SSC) prior to download. Preprocessing includes ramp
fitting, dark-sky subtraction, droop correction, linear-
ity correction, flat-fielding, and flux calibration2. The
Spitzer data were further processed using the SMART v.
5.5.7 analysis package (Higdon et al. 2004). The slit for
1 We include all galaxies that are classified as LINERs using
either the Heckman (1980) or Veilleux & Osterbrock (1987) diag-
nostic diagrams.
2 See Spitzer Observers Manual, Chapter 7,
http://ssc.spitzer.caltech.edu/documents/som/irs60.pdf
Table 1: Properties of the Sample
Galaxy Distance Hubble log log log AGN
Name (Mpc) Type (MBH) (LX ) (L/LEdd) Type
(1) (2) (3) (4) (5) (6) (7)
Seyferts
NGC4151 13 SABab 7.13a 42.7b -1.53 1r
NGC1365 19 SBb 7.64b 41.3d -3.42 2s
NGC1097 15 SBb · · · 40.7e · · · · · ·
NGC7469 65 SABa 6.84a 44.3a 0.34 1t
NGC4945 4 SBcd 7.35b 42.5f -1.97 2u
Circinus 4 SAb 7.72b 42.1g -2.74 2v
Mrk 231 169 SAc 7.24c 42.2h -2.16 · · ·
Mrk3 54 S0 8.65a 43.5a -2.21 2w
Cen A 3 S0 7.24b 41.8i -2.54 2x
Mrk463 201 Merger · · · 43.0j · · · 2y
NGC 4826 8 SAab 6.76b · · · · · · · · ·
NGC 4725 16 SABab 7.40b · · · · · · 2r
1 ZW 1 245 Sa · · · 43.9k · · · · · ·
NGC 5033 19 SAc 7.39b 41.4l -3.13 1r
NGC1566 20 SABbc 6.92a 43.5a -0.57 1t
NGC 2841 9 SAb 8.21a 42.7a -2.64 · · ·
NGC 7213 24 SA0 7.99a 43.30a -1.79 · · ·
LINERs
NGC4579 17 SABb 7.85b 41.0b -3.47 · · ·
NGC3031 4 SAab 7.79b 40.2b -4.16 · · ·
NGC6240 98 Merger 9.15b 44.2b -1.52 2z
NGC5194 8 SAbc 6.90b 41.0b -2.43 2r
MRK266NE 112 Merger · · · 40.9b · · · 2t
NGC7552 21 SBab 6.99b · · · · · · · · ·
NGC 4552 17 · · · 8.57b 39.6b -5.52 · · ·
NGC 3079 15 SBc 7.58b 40.1m -4.05 · · ·
NGC 1614 64 SBc 6.94b · · · · · · · · ·
NGC 3628 10 SAb 7.86b 39.9n -4.58 · · ·
NGC 2623 74 Pec 6.83b · · · · · · 2aa
IRAS23128-5919 178 Merger · · · 41.0b · · · 2bb
MRK273 151 Merger 7.74b 44.0o -0.31 2t
IRAS20551-4250 171 Merger 7.52c 40.9b -3.23 · · ·
NGC3627 10 SABb 7.16b 39.4p -4.33 2r
UGC05101 158 S · · · 40.9b · · · 1cc
NGC4125 18 E6 8.50b 38.6b -6.47 · · ·
NGC 4594 10 SAa 9.04b 40.1q -5.47 · · ·
Quasars
PG 1351+640 353 · · · 8.48a 44.5a -1.08 · · ·
PG 1211+143 324 · · · 7.49a 44.8a 0.22 1t
PG 1119+120 201 · · · · · · · · · · · · 1y
PG 2130+099 252 Sa 7.74a 44.47a -0.37 1y
PG 0804+761 400 · · · 8.24a 44.93a -0.41 1dd
PG 1501+106 146 E · · · · · · · · · 1y
Columns Explanation: Col(1):Common Source Names; Col(2): Dis-
tance (for H0= 75 km s
−1Mpc−1); Col(3): Morphological Class;
Col(4): Mass of central black hole in solar masses; Col(5): Log of
the hard X-ray luminosity (2-10keV) in erg s−1. Col(6): log of the
Eddington Ratio. (* = We include all galaxies that are classified as
LINERs using either the Heckman (1980) or Veilleux & Osterbrock
(1987) diagnostic diagrams. Col(6): AGN type based on the presence
or absence of broad Balmer emission lines.) References:aWoo & Urry
2002, b Satyapal et al. 2005, c Tacconi et al. 2002, d Risaliti et al.
2005, e Terashima et al. 2002, f Guainazzi et al. 2000, g Smith &
Wilson 2001, h Gallagher et al. 2002, i Evans et al. 2004, j Iman-
ishi & Terashima et al. 2004, kGallo et al. 2004 , l Terashima et al.
1999, m Cappi et al. 2006, n Roberts, Schurch, & Warwick 2001, o
Balestra et al. 2005, pGeorgantopoulos et al 2002, q Dudik et al. 2005,
r Ho et al. 1997, s Storchi-Bergmann, Mulchaey, & Wilson, 1992, t
Veron-Cetty & Veron 2003, u Marconi et al. 2000, vOliva et al. 1994,
w Khachikian & Weedman 1974, x Veron-Cetty & Veron 1986y Dahari
& De Robertis 1988, z Andreasian, Khachikian, & Ye, 1987, aa Laine
et al. 2003, bb Duc, Mirabel, & Maza 1997, cc Sanders et al. 1988, dd
Thompson 1992.
http://ssc.spitzer.caltech.edu/documents/som/irs60.pdf
the SH and LH modules is too small for background sub-
traction to take place and separate SH or LH background
observations do not exist for any of the galaxies in this
sample. For the SL2 module, background subtraction
was done using either a designated background file when
available or the interactive source extraction option. In
the case of the latter, the exact position of the slit on
the host galaxy was first checked using Leopard, the data
archive access tool available from the SSC. The source
was then carefully defined according to the boundary of
the slit and the edge of the host galaxy. The background
was defined at the edge of the slit, where no other obvious
source was present. In some cases, the slit was enveloped
in the host galaxy and background subtraction could not
take place. For both high and low resolution spectra, the
ends of each order were manually cut from the rest of the
spectrum.
The 41 observations presented in this work are archived
from various programs, including the SINGS Legacy Pro-
gram, and therefore contain both mapping and staring
observations. All of the staring observations were cen-
tered on the nucleus of the galaxy. The SH, LH, SL2
staring observations include data from two slit positions
overlapping by one third of a slit. In order to isolate
the nuclear region in the mapping observations so that
we might compare them to the staring observations, we
extracted only those 3 overlapping slit positions coin-
ciding with either radio or 2MASS nuclear coordinates.
Because the slits in both the mapping and staring obser-
vations occupy distinctly different regions of the sky, the
slits cannot be averaged unless the emission originates
from a compact source that is contained entirely in each
slit. Therefore the procedure for flux extraction was the
following: 1) If the fluxes measured from the two slits
differed by no more than the calibration error of the in-
strument, then the fluxes were averaged; otherwise, the
slit with the highest measured line flux was chosen. 2) If
an emission line was detected in one slit, but not in the
other, then the detection was selected. This is true for
all of the high and low resolution staring and mapping
observations.
In Tables 2 and 3 we list the line fluxes and statistical
errors from the SH and LH observations for the [NeV]
14.3µm and 24.3µm lines, the [SIII] 18.1µm and 33.5µm
lines, as well as the 6.2µm PAH emission feature. For
all galaxies with previously published fluxes, we list in
Tables 2 and 3 the published flux values. Our values dif-
fer by no more than a factor of 1.9, much less in most
cases, from the Weedman et al. (2005) or Armus et al.
(2004, 2006) published values. These differences can be
attributed to differences in the pipeline used for prepro-
cessing. In all cases detections were defined when the
line flux was at least 3σ. For the absolute photometric
flux uncertainty we conservatively adopt 15%, based on
the assessed values given by the Spitzer Science Center
(SSC) over the lifetime of the mission.3 This error is cal-
culated from multiple observations of various standard
stars throughout the Spitzer mission by the SSC. The
dominant component of the total error arises from the
3 See Spitzer Observers Manual, Chapter 7,
(http://ssc.spitzer.caltech.edu/documents/som/som7.1.irs.pdf
and IRS Data Handbook (http://ssc.spitzer.caltech.edu/irs/dh/dh20v2.pdf,
Chapter 7.2
uncertainty at mid-IR wavelengths in the stellar models
used in calibration and is systematic rather than Gaus-
sian in nature. We note that the spectral resolution of
the SH and LH modules of IRS (λ / ∆λ ∼ 600) is in-
sufficient to resolve the velocity structure for most of the
lines. There are a few galaxies which do show slightly
broadened [NeV] line profiles (FWHM ∼ 200 - 1200 km
s−1). These results will be discussed in a future paper.
Abundance-independent density estimates can readily
be obtained using infrared fine-structure transitions from
like ions in the same ionization state with different crit-
ical densities. The density diagnostics available in the
IRS spectra of our objects are: [NeV] 14.32µm, 24.32
µm (ncrit ∼ 4.9 × 10
4 cm−3, and 2.7 × 104 cm−3, where
ncrit = Aul/γul, with Aul the Einstein A coefficient and
γul the rate coefficient for collisional de-excitation from
the upper to the lower level), [NeIII] 15.55µm, 36.04 µm
(ncrit ∼ 3 × 10
5 cm−3, and 5 × 104 cm−3, Giveon et al.
2002), and [SIII]18.71µm, 33.48 µm (ncrit ∼ 1.5 × 10
cm−3, and 4.1 × 103 cm−3). The results are very insen-
sitive to the shape of the ionizing continuum. Since the
[NeIII] 36µm line was either not detected or was outside
the wavelength range of the LH module in virtually all
galaxies, we omit any analysis of the [NeIII] line ratio
from this work.
4. THE [NEV] LINE FLUX RATIOS
In Figure 1 we plot the calculated 14µm/24µm line lu-
minosity ratio as a function of electron density ne for gas
temperatures T = 104K, 105K, and 106K. We include
only the five levels of the ground 2s22p2 configuration and
neglect absorption and stimulated emission. The results
are nearly identical if only the lowest three levels of the
ground term are included. We adopt collision strengths
from Griffin & Badnell (2000) and radiative transition
probabilities from Galavis, Mendoza, & Zeippen (1997).
Fig. 1.— [NeV] 14µm/24µm line flux ratio versus electron den-
sity, ne, for gas temperatures T = 10
4 K, 105 K, 106 K
In Table 2, we list the observed [NeV] line flux ratios
http://ssc.spitzer.caltech.edu/documents/som/som7.1.irs.pdf
and their associated calibration uncertainties. In calcu-
lating the upper and lower limits on the ratios, RMAX
and RMIN , shown in Table 2, we did not propagate the
errors in quadrature as would be appropriate for statis-
tical uncertainties, but propagated them as follows:
RMAX =
F [NeV]14 + 0.15(F [NeV]14)
F [NeV]24 − 0.15(F [NeV]24)
RMIN =
F [NeV]14 − 0.15(F [NeV]14)
F [NeV]24 + 0.15(F [NeV]24)
We note that this is conservative, since some components
of the calibration errors should cancel in the ratio. Both
line fluxes were measured for 19 galaxies. In what fol-
lows we compare the line flux ratios measured in all but
one, MKN 266, for reasons that are discussed in detail
in Section 5.2. Of these 18 AGNs, 13 have ratios that
are consistent with the low density limit to within the
uncertainties, while only 2, both Type 1, have ratios sig-
nificantly above it. The remaining 3, all Type 2, have
ratios significantly below the low-density limit. Inter-
estingly, we note that a similar range of ratios was also
measured with the ISO SWS (Sturm et al. 2002, Alexan-
der et al. 1999). There are several possible explanations
for this finding. The observed, unphysically low ratios
could result from artifacts introduced by variations in
the slit sizes from which the line fluxes are obtained,
from calibration uncertainties, or from substantial mid-
IR extinction. Alternatively, perhaps important physical
processes were neglected in calculating the theoretical ra-
tios. In addition, errors in the collisional rate coefficients
for the [NeV] transitions associated with the mid-infrared
lines may be important. We explore these scenarios in
the following sections.
Observational Effects: Because the IRS LH slit is
larger than the SH slit, if the [NeV] emission is extended,
or multiple AGNs are present, the 14/24 µm line ratio
will be artificially reduced. However, since the ionization
potential of [NeV] is ∼ 97 eV, we expect that the [NeV]-
emitting gas is ionized by the AGN radiation field only
and is concentrated very close to the central source. Vir-
tually all of the [NeV] fluxes presented in this work were
obtained from IRS staring observations. Thus it is im-
possible to determine whether the emission is extended
using Spitzer observations alone. However, a number of
galaxies have been observed at 14 and 24 µm by ISO. In
Table 2 we list in addition to our Spitzer [NeV] fluxes, all
available [NeV] fluxes from ISO. The ISO aperture at 14
and 24 µm (14”×27”) is much larger than either the SH
or LH slits. In Figure 2 we plot the ratio of the [NeV] flux
measured by ISO to that measured by Spitzer for both
the 14 and 24 µm lines. The ranges of the [NeV] line
flux ratios are consistent with the instrument uncertain-
ties and are similar for all galaxies in the sample. Only
the 14µm ratio for Mrk 266 falls outside of the expected
range. This strongly suggests that the [NeV] emission is
indeed compact and originates in the NLR and that the
ratios are not affected by aperture variations, except for
Mrk 266 which is discussed in detail in Section 5.2.
If the data were affected by aperture variations we
would expect to see an overall systematic increase of
the 14µm/24µm line ratio with distance(See Figure 3).
The Spearman rank correlation coefficient (rS, Kendell
& Stuart 1976) corresponding to this plot is -0.069 (with
 [NeV] 14 micron Ratio
0.0 0.5 1.0 1.5 2.0 2.5
[NeV]
(ISO) / F
[NeV]
(Spitzer)
Mrk 266
[NeV] 24 micron Ratio
Fig. 2.— Ratio of the ISO to Spitzer [NeV] 14µm and 24µm
fluxes for those galaxies with overlapping observations. The range
indicated with arrows is that corresponding to the absolute flux
calibration for ISO (20%) and Spitzer (15%). Within the calibra-
tion uncertainties of the instrument, the [NeV] fluxes are virtually
the same for all of the galaxies except Mrk 266 (See Section 6.2).
This strongly suggests that the [NeV] emission is compact and orig-
inates in the NLR. We note that Sturm et al. 2002 find that the
[NeV] 24µm detection for NGC 7469 is questionable. The ISO to
Spitzer ratio for this galaxy (0.43) is the lowest shown here.
a probability of chance correlation of 0.78), where a co-
efficient of 1 or -1 indicates a strong correlation and a
coefficient of 0 indicates no correlation. Thus we find
that there is no correlation between the [NeV] ratio and
distance in our sample. However this does not completely
rule out aperture effects, if the size of the [NeV] emitting
region increases with the bolometric luminosity of the
AGN and the sample displays a significant trend in bolo-
metric luminosity with distance. In this case, a correla-
tion between the [NeV] ratio and distance would not be
apparent since aperture variations would affect all galax-
ies in the same way, regardless of distance. However this
scenario is unlikely since the size of the [NeV]-emitting
region would have to increase proportionately with dis-
tance in order to remain extended beyond the slit for all
galaxies. Nevertheless, we checked for this possibility,
both by examining the [NeV] ratio vs. bolometric lu-
minosity and by plotting the ratio vs. distance, binning
the galaxies according to their bolometric luminosity. We
find neither to be correlated over 5 orders of magnitude
in LBOL. Thus, in the case of the [NeV] line flux ratio,
we find no indication that ratios below the low density
limit are artifacts of aperture effects.
We point out that the [NeV] 24µm IRS line fluxes in the
small overlapping sample plotted in Figure 2 are system-
atically higher than the corresponding ISO-SWS fluxes,
despite the smaller IRS slit. This indicates that one or
both of the instruments is affected by systematic errors
more severe than are indicated by the calibration un-
certainty estimates. The SWS band 3D that includes
the [NeV] 24µm line was characterized by strong fring-
ing effects that when combined with the narrow range
of the line scan mode introduced sometimes large un-
0.5 1.0 1.5 2.0 2.5
Mrk 266
 = 0.069
log(Distance, Mpc)
 Seyferts
 Liners 
 Quasars 
Fig. 3.— The [NeV] 14µm/24µm ratio as a function of distance.
Open symbols signify Type 1 AGNs, Filled symbols signify Type 2
AGNs. The error bars shown here mark the calibration uncertain-
ties on the line ratio. If the ratio were indeed affected by aperture
variations we would expect a systematic increase of the ratio with
distance. As can be seen here, this is not the case, and we find no
indication that the low ratio is attribuable to aperture effects.
certainties in the baseline fitting, and therefore the line
flux measurement accuracy. In contrast, the baseline fit-
ting over the entire Spitzer IRS SH and LH full spectra
can be much more accurate. Moreover pointing accu-
racy and stability are an order of magnitude improved
over that obtained by ISO. We therefore assume in the
sections that follow that the adopted conservative Spitzer
IRS calibration uncertainties are accurate characteriza-
tions of the IRS measurements. Importantly, regardless
of which instrument is used, [NeV] ratios consistent with
the low density limit have been observed in a number of
sources with both ISO (e.g. Sturm et al. 2002 NGC 1365,
NGC 7582, NGC4151, NGC 5506; Alexander et al. 1999,
NGC 4151) and Spitzer (Weedman et al. 2005, Haas et
al. 2005, and this work).
Extinction: We consider the possibility that mid-IR
differential extinction toward the [NeV]-emitting regions
is responsible for the low [NeV] line ratios. Adopting the
low-density limit (LDL) for the intrinsic value of the ratio
([NeV]14µm/24µm ∼0.83 for ne≤200 cm
−3) for galaxies
with ratios below the LDL, the observed line ratio gives a
lower limit to the extinction, for a given MIR extinction
curve. We examined the visual extinctions correspond-
ing to the mid-IR differential extinction derived using
three separate extinction curves: 1) the Draine (1989)
extinction curve amended by the more recent ISO SWS
extinction curve toward the Galactic center for 2.5-10µm
(Lutz et al. 1996), 2) the Chiar & Tielens (2006) ex-
tinction curve for the Galactic Center using 2.38-40µm
ISO SWS observations of a bright IR source in the Quin-
tuplet cluster (GCS3-I) 3) the Chiar & Tielens (2006)
extinction curve for the local ISM using 2.38-40µm ISO
SWS observations of a WC-type Wolf-Rayet (WR) star
(WR98a). The Draine (1989) and Lutz et al. (1996)
extinction curve yields AV ∼ 3 to 99 mag (See Table
2). However, these values result from an extinction law
that is unexplored beyond 10µm. The Chiar & Tie-
lens (2006) Galactic center extinction curve cannot ex-
plain the observed [NeV] ratios since the extinction at
24µm is greater than the extinction at 14µm, so we do
not discuss it further. The visual extinction resulting
from their local ISM extinction curve is unrealistically
high (AV median=500mag). The calculated extinction ob-
tained using the Draine (1989), Lutz et al. (1996), and
Chiar & Tielens (2006) local ISM extinction curves are
given in Table 2.
The AV derived from the two extinction curves de-
scribed above are extremely high in many cases. Even
if the extinction is calculated from the upper limit on
the ratio to the LDL for the three galaxies whose upper
limits are below the LDL, the corresponding visual ex-
tinction is still very high (for the Draine 1989 and Lutz
1996 extinction curve AV = 21, 26, and 30 mag for these
three galaxies; for the Chiar and Tielens extinction curve
AV = 260, 330, and 370 mag). However, we caution the
reader that the actual value for extinction is highly un-
certain. Indeed very little is known about the 8-40µm
extinction curve in AGNs. Specifically, the 10 and 18
µm silicate features in this band are the source of in-
consistency. Even within the AGN class, extinction may
vary dramatically from 8-40µm because of variations in
the silicate features due to differences in grain size, poros-
ity, shape, composition, abundance, and location in each
galaxy. Hao et al. (2005) show that in five AGNs (4
of which are in our sample), both silicate features vary
considerably in strength and width. Sturm et al. (2005)
also show that the standard ISM silicate models do not
accurately fit NGC 3998, a LINER with silicate emis-
sion. Sturm et al. (2005) suggest that increased grain
size and possibly the presence of crystalline silicates such
as clino-pyroxenes may improve the fit, but that clearly
circumnuclear dust in AGNs has very different proper-
ties than dust in the Galactic ISM (see also Maiolino et
al. 2001a, 2001b, but Weingartner & Murray 2002 for
an alternative view). Chiar & Tielens (2006) even show
that the GC observations and the local ISM observations
within the Galaxy deviate from each other most dramat-
ically in the wavelength region between the two silicate
absorption features. In their observations, this is the re-
gion between ∼ 12-15µm -directly overlapping with the
14µm values in which we are interested. Because of ir-
regularity of the silicate features in the mid-IR, it is very
difficult to interpret the true extinction there. Moreover,
in addition to the uncertainty in the extinction law, the
geometry of the obscuring material is unknown and can
vary substantially from galaxy to galaxy. The most that
can be said here for the galaxies with ratios below the
LDL is that if extinction is responsible for the low ratios,
then the extinction must be less at 24µm than at 14µm.
Physical Processes: It is possible that important
physical processes have been neglected in calculating the
[NeV] line luminosity ratio as a function of electron den-
sity shown in Figure 1. We consider three physical pro-
cesses that may affect the line ratios:
(1) A source of gas heating in addition to photoioniza-
tion (e.g., shocks, turbulence) that may yield gas temper-
atures substantially higher than 104K. As can be seen
in Figure 1, higher gas temperatures do not yield sig-
nificantly lower line ratios in the low-density limit, but
could explain the generally low values of the ratios that
lie above the LDL.
(2) Pumping from the ground term to the first excited
term, e.g., by O III resonance lines. The specific energy
density required for this to significantly affect the line
ratio exceeds 10−14 erg cm−3 Hz−1, which is implausibly
large by orders of magnitude.
(3) Absorption and stimulated emission within the
ground term, which could be important if, e.g., a large
quantity of warm dust yielding copious 24µm continuum
emission is located close to the [NeV]-emitting region.
Figures 4a through 4e show the line ratio as a func-
tion of the specific energy density at 24µm, uν(24µm).
We display results for electron density ne = 10
2, 103,
104, 105, and 106 cm−3; gas temperature T = 104, 105,
and 106K; and ratio of the specific energy density at 14
and 24µm, uν(14µm)/uν(24µm) = 0.4, 1.0, and 1.8 (val-
ues were chosen to reproduce the observed range of the
14µm/24µm continuum flux ratios; see Section 5).
For the moment, assume that the NeV is located suf-
ficiently far from the source of the 14 and 24µm contin-
uum emission to treat the source as a point. If hot dust
within or near the inner edge of the torus is responsible
for this emission, then this assumption requires that the
distance to the NeV, rNe, be large compared with the
dust sublimation radius, rsub ∼ 1 pcL
bol, 46 (Ferland et
al. 2002); Lbol, 46 is the bolometric luminosity in units of
1046 erg s−1. In this case, we can obtain a simple estimate
of uν(24µm) at the location of the [NeV]-emitting region
from the observed specific flux Fν(24µm), the distance
to the galaxy D, and rNe:
uν(24µm) ∼
Fν(24µm)
. (3)
With rNe = 100 pc, uν(24µm) estimated in this way
ranges from ≈ 10−24 erg cm−3 Hz−1 to somewhat less
than 10−20 erg cm−3 Hz−1 for the galaxies in our sam-
ple. These can be compared to the results of Hönig
et al. (2006), who modeled the infrared emission from
clumpy tori. They presented plots of Fν at a distance
of 10Mpc from an AGN with bolometric luminosity
Lbol = 4 × 10
45 erg s−1. Extrapolating to a distance
of 100 pc, we find uν(24µm) as large as a few times
10−21 erg cm−3 Hz−1, close to the estimate for the most
luminous AGN in our sample.
From Figures 4a through 4e, we see that the infrared
continuum can only reduce the line ratio significantly at
rNe ≈ 100 pc if T & 10
5K when ne ∼ 10
2 cm−3 and
T & 106K when ne ∼ 10
3 cm−3. However, the NeV, as
a high-ionization species, may lie closer to the central
source than does the bulk of the narrow line region. If
rNe ≈ 10 pc, then uν(24µm) increases by a factor ∼ 100.
In this case, the observed low line ratios can be explained
by this mechanism with T ∼ 104K, if ne ∼ 10
2 cm−3.
Higher values of electron density would require higher
gas temperatures.
In Section 5.1, we suggest that the [NeV]-emitting re-
gion may lie within the torus. In this case, absorp-
tion and stimulated emission within the ground term
are probably important. For the high-luminosity objects,
these may even dominate over collisional excitation and
de-excitation. At these central locations, gas tempera-
tures T ∼ 106 K may be natural (Ferland et al. 2002).
Relatively high densities may also be expected, in which
case the infrared continuum may not appreciably depress
the line ratio (see Figure 4d).
Adopting the Mathews & Ferland (1987) spectrum and
T ≈ 106K, the ionization parameter U ≡ nγ/ne ∼ 10
in order for a substantial fraction of the Ne to be NeV;
nγ is the number density of H-ionizing photons. For
this spectrum, nγ ≈ 1.7×10
3Lbol, 46 r
Ne, 100 cm
−3, where
rNe, 100 = rNe/100 pc. If rNe = 1pc, then either (1)
ne ∼ 10
10Lbol,46 cm
−3 or (2) the nuclear continuum is
filtered through a far-UV/X-ray-absorbing medium be-
fore reaching the [NeV]-emitting region.
If absorption and stimulated emission are indeed rele-
vant processes in [NeV] line production, we might expect
a relationship between the [NeV] line flux ratio and the
24 µm continuum luminosity that is consistent with one
of the curves shown in Figures 4a through 4e. In Figure
5 we plot this relationship for the [NeV] emitting galax-
ies in our sample. As can be seen in Figure 5, we find
no relationship between the [NeV] line flux ratio and the
24µm continuum luminosity for our sample of galaxies.
The Spearman rank correlation coefficient for this plot
is -0.01 (probability of chance correlation = 0.95), indi-
cating no correlation. As a result, as can be seen from
Figures 4a and 4b, stimulated emission and absorption
at low densities can be ruled out as possible scenarios be-
cause the scatter plot shown in Figure 5 does not follow
the model predictions. We note that the location of the
[NeV]-emitting region relative to the source of the 24µm
continuum emission is uniform among the galaxies in the
sample. Variations in the location might obscure any
correlation in these plots. Figures 4c and 4d reveal that,
for some values of ne, T, and uν(14µm)/uν(24µm), the
line ratio is very insensitive to the value of uν(24µm). In
these cases, the line ratio remains above ∼ 0.8. Thus, al-
though absorption and stimulated emission may be con-
tributing processes to [NeV] production, another mecha-
nism is required to explain the low (<0.8) [NeV] line flux
ratios in our sample.
Computed Quantities: Finally, it is possible that
there is significant error in the adopted collisional rate
coefficients. The accuracy of collisional strengths of in-
frared atomic transitions has been a longstanding ques-
tion. We adopt the collisional rate coefficients from the
state of the art IRON project (Hummer et al. 1993)
which produced the most up-to-date and accurate colli-
sion strengths for a large database of atomic transitions.
While these calculations have been questioned based on
recent ISO observations of nebulae (Clegg et al. 1987,
Oliva et al. 1996, Rubin et al. 2002, Rubin 2004), it is
likely that the discrepancies between the observational
and theoretical values can be explained by inaccuracies
in the fluxes employed (van Hoof et al. 2000). Uncer-
tainties in the collisional rate coefficients for the [NeV]
transitions are unlikely to exceed 30% (van Hoof et al.
2000). It is therefore unlikely that the low critical den-
sities implied by our data can be attributed to uncer-
tainties in the theoretical values of the [NeV] collisional
strengths.
5. EXTINCTION EFFECTS OF THE TORUS AND AGN
UNIFICATION
Although low electron densities, high gas tempera-
tures, and/or high infrared radiation densities may play
Fig. 4.— The [NeV] line ratio as a function of the specific energy density at 24µm, uν(24µm), for temperatures, T = 10
4, 105, 106 K,
for 14µm/24µm continuum ratios of 0.4, 1.0, and 1.8, and finally for electron densites, ne = 10
2, 103, 104, 105, 106 cm−3.
Table 2: NeV Line Fluxes and Derived Extinction
Galaxy [NeV] [NeV] [NeV] [NeV] [NeV] AV AV AV AV
Source 14.32 14.32 24.32 24.32 Ratio Ratio to Ratio to Ratio to Ratio to
SH ISO LH ISO LDL (D&L) HDL (D&L) LDL (C&T) HDL (C&T)
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10)
Seyferts
NGC4151 7.77a 5.50c 6.77a 5.60c 1.15
+0.41
−0.30
· · · 126 · · · 1686
NGC1365 2.20±0.06 2.50c 5.36±0.06 3.90c 0.41
+0.14
−0.11
45 189 570 2519
NGC1097 <0.05 · · · <0.18 · · · · · · · · · · · · · · · · · ·
NGC7469 1.16a <1.50c 1.47a 0.63c∗ 0.79
+0.28
−0.21
3 149 41 1990
NGC4945 0.28±0.03 <0.50d <0.75 · · · >0.38 · · · · · · · · · · · ·
Circinus 23.94±0.61 31.70c 24.00±3.90 21.80c 1.00
+0.35
−0.26
· · · 135 · · · 1799
Mrk 231 <0.44a <1.50e <0.69a · · · · · · · · · · · · · · · · · ·
Mrk3 6.45a 4.60c 6.75a 3.40c 0.96
+0.34
−0.25
· · · 138 · · · 1835
Cen A 2.32a 2.70c 2.99a 2.00c 0.77
+0.27
−0.20
4 150 56 2005
Mrk463 1.83b 1.40c 2.04b · · · 0.90
+0.32
−0.23
· · · 141 · · · 1886
NGC 4826 · · · · · · · · · · · · · · · · · · · · · · · · · · ·
NGC 4725 <0.09 · · · 0.09±0.03 · · · <1.04 · · · · · · · · · · · ·
1 ZW 1 <0.11a 0.27c <0.10a · · · · · · · · · · · · · · · · · ·
NGC 5033 0.07±0.02 · · · 0.11±0.02 · · · 0.65+0.23
−0.17
16 161 198 2146
NGC1566 0.16±0.05 · · · 0.22±0.04 · · · 0.74+0.26
−0.19
7 153 92 2041
NGC 2841 <0.04 · · · <0.03 · · · · · · · · · · · · · · · · · ·
NGC 7213 <0.04 · · · <0.09 · · · · · · · · · · · · · · · · · ·
LINERs
NGC4579 <0.06 · · · <0.03 · · · · · · · · · · · · · · · · · ·
NGC3031 <0.06 · · · <0.04 · · · · · · · · · · · · · · · · · ·
NGC6240 0.51b <1.00e <0.39b · · · <1.31 · · · · · · · · · · · ·
NGC5194 0.41±0.04 <0.20c 0.39±0.09 · · · 1.06
+0.37
−0.28
· · · 131 · · · 1751
MRK266∗∗ 0.21±0.02 0.50f 1.19±0.06 · · · 0.18
+0.06
−0.05
100 240 1254 3203
NGC7552 <0.11 · · · <0.83 · · · · · · · · · · · · · · · · · ·
NGC 4552 <0.06 · · · <0.07 · · · · · · · · · · · · · · · · · ·
NGC 3079 <0.07a · · · <0.14a · · · · · · · · · · · · · · · · · ·
NGC 1614 <0.28 · · · <1.49 · · · · · · · · · · · · · · · · · ·
NGC 3628 <0.06 · · · <0.34 · · · · · · · · · · · · · · · · · ·
NGC 2623 0.30±0.04 · · · 0.47±0.07 · · · 0.63
+0.22
−0.14
17 163 218 2167
IRAS23128· · · 0.22±0.02 <0.40e 0.34±0.10 · · · 0.65
+0.23
−0.22
16 161 203 2152
MRK273 1.06±0.05 0.82e 2.74±0.19 · · · 0.39
+0.14
−0.10
49 192 617 2565
IRAS20551· · · <0.06 <0.25e <0.25 · · · · · · · · · · · · · · · · · ·
NGC3627 0.08±0.01 · · · 0.19±0.05 · · · 0.45+0.16
−0.12
40 184 504 2453
UGC05101 0.52b <1.50e 0.49b · · · 1.06+0.37
−0.28
· · · 131 · · · 1750
NGC4125 <0.03 · · · <0.07 · · · · · · · · · · · · · · · · · ·
NGC 4594 <0.03 · · · <0.04 · · · · · · · · · · · · · · · · · ·
Quasars
PG1351· · · <0.04 · · · <0.07 · · · · · · · · · · · · · · · · · ·
PG1211· · · 0.04±0.007 · · · <0.04 · · · >1.01 · · · · · · · · · · · ·
PG1119· · · 0.30±0.06 · · · 0.22±0.02 · · · 1.39
+0.49
−0.36
· · · 115 · · · 1531
PG2130· · · 0.42±0.03 · · · 0.42±0.05 · · · 1.00
+0.35
−0.26
· · · 135 · · · 1798
PG0804· · · <0.06 · · · <0.07 · · · · · · · · · · · · · · · · · ·
PG1501· · · 0.78±0.02 · · · 0.83±0.02 · · · 0.94
+0.33
−0.25
· · · 138 · · · 1846
Columns Explanation: Col(1):Common Source Names; Col(2): 14.32 µm [NeV] line flux and statistical error in units of 10−20 W cm−2 from
Spitzer; Col(3): 14.32 µm [NeV] line flux and statistical error in units of 10−20 W cm−2 from ISO; Col(4): 24.31 µm [NeV] line flux and statistical
error in units of 10−20 W cm−2 from Spitzer; Col(5): 24.32 µm [NeV] line flux and statistical error in units of 10−20 W cm−2 from ISO; Col(6):
[NeV] Line Ratio used in plots and calculations; Col(7): Extinction required to bring ratios below the low-density limit (LDL) up to the LDL,
calculated using the Draine (1989) extinction curve amended by the more recent ISO SWS extinction curve toward the Galactic center for 2.5-10µm
(Lutz et al. 1996); Col(8): Extinction required to bring ratios below the low-density limit (LDL) up to the high-density limit (HDL), calculated
using the Draine (1989) extinction curve amended by the more recent ISO SWS extinction curve toward the Galactic center for 2.5-10µm (Lutz
et al. 1996), Col(9): Extinction required to bring ratios below the low-density limit (LDL) up to the LDL, calculated using the Chiar & Tielens
(2006) extinction curve for the local ISM, Col(10): Extinction required to bring ratios below the low-density limit (LDL) up to the high-density
limit (HDL), calculated using the Chiar & Tielens (2006) extinction curve for the local ISM.∗ Sturm et al. 2002 find that the [NeV] 24µm detection
for NGC 7469 is a questionable one, ∗∗ As discussed in detail in Section 6.2, Mrk 266 is the only galaxy in our sample where we find that aperture
variation may affect the observed [NeV] line flux ratio. For this reason it has been excluded from relevent plots and calculations. References for
Table 2:
a Weedman et al. 2005, b Armus et al. 2004 & 2006, c Sturm et al. 2002, d Verma et al. 2003, e Genzel et al. 1998, f Prieto & Viegas
Table 3: SIII Line Fluxes and Derived Extinction
Galaxy [SIII] [SIII] [SIII] [SIII] [SIII] Av Av Av Av PAH6.2
Source 18.71 18.71 33.48 33.48 Ratio Ratio to Ratio to Ratio to Ratio to SL2
SH ISO LH ISO LDL (D&L) HDL (D&L) LDL (C&T) HDL (C&T)
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11)
Seyferts
NGC4151 7.50a 5.40c 6.57a 8.10c 1.14
+0.40
−0.30
· · · · · · · · · · · · 168.1
NGC1365 5.73±0.05 13.50c 27.20±0.38 36.10c 0.21
+0.07
−0.05
· · · · · · · · · · · · 132.3
NGC1097 2.18±0.02 · · · 11.40±0.23 · · · 0.19
+0.07
−0.05
· · · · · · · · · · · · 151.7
NGC7469 7.70a 9.20c 9.80a 10.40c 0.79
+0.28
−0.20
· · · · · · · · · · · · 415.0
NGC4945 3.18±0.03 6.30d 38.70±1.80 51.40d 0.08
+0.03
−0.02
· · · · · · · · · · · · 671.7
Circinus 19.10±0.70 35.20c 56.30±3.31 93.20c 0.37
+0.13
−0.10
· · · · · · · · · · · · 1018.4
Mrk 231 <0.47a <3.00e <2.30a <3.00e · · · · · · · · · · · · · · · 175.6
Mrk3 5.55a · · · 5.25a · · · 1.06
+0.37
−0.28
· · · 83 · · · 72 18.4
Cen A 4.54a 6.40c 14.80a 22.30c 0.31
+0.11
−0.08
· · · · · · · · · · · · 220.8
Mrk463 1.50b <0.80c 1.35b 1.20c 1.11
+0.39
−0.29
· · · 81 · · · 70 55.4
NGC 4826 3.39±0.03f · · · 4.61±0.08f · · · 0.74
+0.26
−0.19
· · · 96 · · · 82 66.2
NGC 4725 0.02±0.02f · · · 0.11±0.02f · · · 0.23
+0.08
−0.06
23 135 20 116 3.7
1 ZW 1 <0.11a <0.50c <0.18a <1.00c · · · · · · · · · · · · · · · 22.2
NGC 5033 0.85±0.11 · · · 2.42±0.10 · · · 0.35
+0.12
−0.09
· · · · · · · · · · · · 33.9
NGC1566 0.55±0.05f · · · 0.55±0.06f · · · 1.00
+0.35
−0.26
· · · 85 · · · 73 61.0
NGC 2841 0.22±0.04f · · · 0.29±0.03f · · · 0.75
+0.26
−0.20
· · · 95 · · · 82 10.9
NGC 7213 0.47±0.05 · · · 0.59±0.06 · · · 0.80
+0.28
−0.21
· · · · · · · · · · · · 28.7
LINERs
NGC4579 0.32±0.06f <0.78g 0.24±0.03f <1.20g 1.33
+0.47
−0.35
· · · 75 · · · 65 17.3
NGC3031 0.61±0.03 · · · 0.09±0.09 · · · 0.67
+0.24
−0.18
· · · · · · · · · · · · 23.4
NGC6240 1.99b <4.00e 2.63b 4.50e 0.76
+0.27
−0.20
· · · 95 · · · 81 399b
NGC5194 1.06±0.05f 1.00d 1.48±0.03f 4.60d 0.72
+0.25
−0.19
· · · 96 · · · 83 26.4
MRK266∗∗ 1.00±0.13 · · · 4.65±0.09 · · · 0.21
+0.08
−0.06
25 138 22 118 23.1
NGC7552 17.11±0.08f 24.60d 13.38±0.41f 41.10d 1.28
+0.45
−0.33
· · · 77 · · · 66 872.0
NGC 4552 0.07±0.03f · · · 0.06±0.02f · · · 1.29
+0.45
−0.34
· · · 76 · · · 66 21.5
NGC 3079 1.25a 6.80g 6.08a 6.60g 0.21
+0.07
−0.05
· · · · · · · · · · · · 620.3
NGC 1614 9.63±0.27 · · · 11.60±0.43 · · · 0.83
+0.29
−0.15
· · · 91 · · · 79 508.5
NGC 3628 2.14±0.03 · · · 15.80±0.33 · · · 0.14
+0.05
−0.04
· · · · · · · · · · · · 430.4
NGC 2623 0.88±0.05 · · · 3.16±0.20 · · · 0.28
+0.10
−0.07
16 129 14 111 128.6
IRAS23128· · · 2.62±0.12 0.89e 2.11±0.18 2.80e 1.24
+0.44
−0.32
· · · 78 · · · 67 90.1
MRK273 1.24±0.07 <0.82e 3.88±0.40 2.30e 0.32
+0.11
−0.08
12 124 10 107 69.2
IRAS20551· · · 0.66±0.06 0.30e 1.18±0.13 1.40e 0.56
+0.20
−0.15
· · · 105 · · · 90 38.3
NGC3627 0.38±0.03f · · · 0.57±0.09f · · · 0.67
+0.24
−0.18
· · · 99 · · · 85 153.8
UGC05101 0.98b <1.40e 1.30b 2.50e 0.75
+0.27
−0.20
· · · 95 · · · 82 190b
NGC4125 · · ·f · · · 0.06±0.05f · · · · · · · · · · · · · · · · · · 14.9
NGC 4594 0.39±0.03 · · · 1.24±0.13 · · · 0.32
+0.11
−0.08
· · · · · · · · · · · · 14.2
Quasars
PG1351· · · 0.34±0.06 · · · <0.13 · · · >2.70 · · · · · · · · · · · · 29.6
PG1211· · · <0.06 · · · <0.08 · · · · · · · · · · · · · · · · · · 25.8
PG1119· · · <0.13 · · · 0.19±0.06 · · · <0.71 · · · · · · · · · · · · 8.7
PG2130· · · <0.19 · · · 0.34±0.06 · · · <0.55 · · · · · · · · · · · · 26.9
PG0804· · · <0.06 · · · <0.21 · · · · · · · · · · · · · · · · · · 27.4
PG1501· · · 0.67±0.15 · · · 0.41±0.05 · · · 1.64
+0.58
−0.43
· · · 68 · · · 59 19.2
Columns Explanation: Col(1):Common Source Names; Col(2): 18.71 µm [SIII] line flux and statistical error in units of 10−20 W cm−2 from
Spitzer; Col(3): 18.71 µm [SIII] line flux and statistical error in units of 10−20 W cm−2 from ISO; Col(4): 33.48 µm [SIII] line flux and statistical
error in units of 10−20 W cm−2 from Spitzer; Col(5): 33.48 µm [SIII] line flux and statistical error in units of 10−20 W cm−2 from ISO; Col(6):[SIII]
line flux ratio used for plots and calculations; Col(7): Extinction required to bring ratios below the low-density limit (LDL) up to the LDL, calculated
using the Draine (1989) extinction curve amended by the more recent ISO SWS extinction curve toward the Galactic center for 2.5-10µ (Lutz et
al. 1996) for those galaxies with distances greater than 55 Mpc that are not effected by aperture variations, Col(8): Extinction required to bring
ratios below the low-density limit (LDL) up to the high-density limit (HDL), calculated using the Draine (1989) extinction curve amended by the
more recent ISO SWS extinction curve toward the Galactic center for 2.5-10µ (Lutz et al. 1996) for those galaxies with distances greater than 55
Mpc that are not effected by aperture variations, Col(9): Extinction required to bring ratios below the low-density limit (LDL) up to the LDL,
calculated using the Chiar & Tielens (2006) extinction curve for the Galactic Center for those galaxies with distances greater than 55 Mpc that
are not effected by aperture variations, Col(10): Extinction required to bring ratios below the low-density limit (LDL) up to the high-density limit
(HDL), calculated using the Chiar & Tielens (2006) extinction curve for the Galactic Center for those galaxies with distances greater than 55 Mpc
that are not effected by aperture variations, Col(11): 6.2 µm PAH line flux in units of 10−21 W cm−2 ∗∗ The [SIII] ratio for Mrk 266 is known
to be affected by aperture variations(See Section 6.2). For this reason it has been excluded from relevent plots and calculations. References for
Table 3:
a Weedman et al. 2005, b Armus et al. 2004 & 2006, c Sturm et al. 2002, d Verma et al. 2003, e Genzel et al. 1998, f Dale et al. 2006, g
Satyapal et al. 2004.
32 33 34 35 36 37 38
 = -0.01
log(24 m Specific Luminosity) (W m-1)
 14/24 < 0.50
 0.50 < 14/24 < 1.0
 14/24 > 1.0
Fig. 5.— The observed [NeV] line ratio as a function of the 24µm
specific luminosity for our sample of galaxies. The error bars shown
here represent the calibration uncertainties on the [NeV] line flux
ratio as in Figure 3. The symbol type indicates the 14µm/24µm
continuum ratio.
a role in lowering the [NeV] line flux ratio, we argue that
differential infrared extinction to the [NeV] emitting re-
gion due to dust in the obscuring torus is responsible
for the low line ratios in at least some AGN. Clearly,
this requires that there is significant extinction at mid-IR
wavelengths, and specifically toward the [NeV]-emitting
regions. Is this reasonable? If there is significant ex-
tinction, it is possible that: 1) the [NeV]-emitting region
originates much closer to the central source than previ-
ously recognized, close enough to be extinguished by the
central torus in some galaxies, 2) the [NeV]-emitting por-
tion of the NLR is obscured by dust in the host galaxy
or in the NLR itself, or 3) some combination of these
scenarios. We explore these possibilities in the following
analysis.
5.1. The [NeV] originates in gas interior to the central
torus.
In the conventional picture of an AGN, the broad line
region (BLR) is thought to exist within a small region
interior to a dusty molecular torus while the NLR origi-
nates further out. This of course is the paradigm invoked
to explain the Type 1/Type 2 dichotomy. However there
have been multiple optical spectroscopic studies that con-
tradict the assumption that the observational properties
of the NLR are not dependent on the viewing angle and
the inclination of the system, suggesting that some of
the narrow emission lines originate in gas interior to the
torus. For instance, Shuder and Osterbrock (1981) and
Cohen (1983) showed that narrow high ionization forbid-
den lines such as [Fe VII] λ 6374 (requiring photons with
energies ≥ 100eV to ionize) are stronger relative to the
low ionization lines in Seyfert 1 galaxies (including inter-
mediate Seyferts, 1.2, 1.5 etc.) than in Seyfert 2 galaxies,
suggesting that some of the emission is obscured by the
torus. In addition, [FeX] λ 6374 and [NeV] λ 3426 have
also been shown to be less luminous in Type 2 objects
than in Type 1 objects (Murayama & Taniguchi, 1998a;
Schmitt 1998, Nagao et al. 2000, 2001a, 2001b, 2003,
Tran et al. 2000, see also Jackson and Browne (1990) for
narrow line radio galaxies and quasars.) These findings
may imply that the emission lines of species with the
highest ionization potentials originate closer to the AGN
than those of lower ionization species such as [OII]λ3727,
[SII]λλ6716, 6731, [OI] λ6300 etc. and therefore may be
partially obscured by the central torus.
If there is considerable extinction to the line-emitting
regions due to the torus, one may expect the mid-infrared
continuum to be similarly obscured. To test this scenario
we divided our sample into Type 1 or Type 2 objects
based on the presence or absence of broad (full width at
half max (FWHM) exceeding 1000 km s−1) Balmer emis-
sion lines in the optical spectrum. The spectral classifi-
cation for the [NeV]-emitting galaxies is given in Table 1.
In Figure 6a, we plot the [NeV] 14µm/24µm line flux ra-
tio versus the 14µm/24µm continuum ratio of the [NeV]
emitting galaxies in our sample. Assuming there is no
correlation between the electron density and the contin-
uum shape, a correlation between the line flux and con-
tinuum ratios would suggest that the mid-IR extinction
associated with the torus (such as that found by Clavel et
al. 2000) affects the observed line flux ratios. As can be
seen, there is a correlation between the line and contin-
uum ratios for galaxies with [NeV] emission. Moreover
we note that the 3 nuclei with ratios significantly below
the LDL are all Type 2 AGNs, while the 2 that lie sig-
nificantly above this limit are Type 1 AGNs, suggestive
that the extinction of the [NeV]-emitting region in Type
2 AGNs may be due to the torus. We note that the
error bars displayed in Figure 6 are based on a conserva-
tive estimate (15%) of the absolute calibration error on
the flux (see Section 3). Moreover, we have adopted the
most conservative approach in propagating the error (see
Section 4) for each line ratio. We further note that two
of the three nuclei with ratios below the LDL were also
observed in high-accuracy peak-up mode, resulting in a
pointing accuracy on the continuum for these galaxies of
0.4”. The third galaxy, NGC 3627, was observed in high
resolution mapping mode over 15” X 22”. We extracted
the spectra and found that the full map and the single
slit fluxes agree to within 10%. Thus pointing errors do
not appear to be responsible for the low ratios in these
galaxies. Finally we find that the ratios for the all galax-
ies except Mrk 266 are not sensitive to the line-fitting or
flux extraction methods that we have employed.
The Spearman rank correlation coefficient for Figure 6
is 0.60 (with a probability of chance correlation of 0.008),
indicating a significant correlation between the [NeV] line
flux ratios and the mid-IR continuum ratio. We note
that some AGNs are known to contain prominent sili-
cate emission features (Hao et al. 2005, Sturm et al.
2006) which have not been disentangled from the under-
lying continuum in this study. Because of this, the 14µm
or 24µm flux may be overestimated in some cases mak-
ing intrinsic value of the continuum at 14µm and 24µm
somewhat uncertain. However only one galaxy plotted
in Figure 6 is currently known to contain such features
(PG1211+143, Hao et al. 2005). Variations in ne and
the underlying continuum shape will also add scatter to
the correlation, as will differences in extinction to the
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0
 = 0.60
F (14 m) / F (24 m) Continuum
Type 1 Objects
Type 2 Objects
0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2
 D < 55 Mpc
 D > 55 Mpc
F (14 m) / F (24 m) Continuum
Fig. 6.— The [NeV] line ratio vs. the Fν(14µm)/Fν (24µm) continuum ratio for our sample. In both plots, the error bars mark the
calibration uncertainties on the line ratio. There is a correlation between the line and continuum ratios which suggests that extinction
affects the observed line flux ratios. 6a) The majority of galaxies with ratios below the LDL are Type 2 objects, implying that the extinction
toward the [NeV]-emitting region may be due to the torus. 6b) The correlation shown here is not an artifact of aperture vatiations between
the SH and LH slits. The correlation holds when only the most distant galaxies are considered.
line- and continuum-producing regions. We should note
that the correlation seen in Figure 6 is not an artifact
of aperture variations between the LH and SH slit. The
correlation holds when only the most distant galaxies
(closed symbols in Figure 6b) are considered.
Independent of this correlation, our most important
finding is that the [NeV] line flux ratio is significantly
lower for Type 2 AGNs than it is for Type 1 AGNs. Fig-
ure 7 shows the relative [NeV] flux ratios for the Type
1 and Type 2 objects in our sample. The mean ratios
are 0.97 and 0.72 for the eight Type 1 and ten Type 2
AGNs, respectively, with uncertainties in the mean of
about 0.08 for each. Interestingly, although the sample
size is limited, precluding us from drawing firm statisti-
cally significant conclusions, there is a similar suggestive
trend seen in the sample of AGNs observed by Sturm et
al. (2002) with ISO-SWS. That is, in their work, the two
galaxies with the lowest [NeV] flux ratios are NGC 1365
and NGC 7582, both Type 2 AGNs. The galaxy with
the highest ratio in their work is TOL 0109-383 , a Type
1 AGN.
If indeed the torus obscures the IR [NeV] emission
in Type 2 objects, one would expect the optical/UV
[NeV] emission in these objects to be obscured as well.
We searched the literature for optical/UV detections of
[NeV] λ3426 for all of the galaxies in our sample and
found five galaxies with observations at this wavelength.
Four of these galaxies (Mrk 463, Mrk 3, NGC 1566, and
NGC 4151) were detected at [NeV] λ3426; the other
(NGC 3031) was not detected (see, Kuraszkiewicz et al.
2002, 2004 and Forster et al. 2001 for optical/UV fluxes).
Of the four galaxies with optical/UV [NeV] detections,
two are Type 1 galaxies (NGC 1566 & NGC 4151) and,
surprisingly, two are Type 2 galaxies (Mrk 3 and Mrk
463). If the Type 2 galaxies Mrk 3 and Mrk 463 had
[NeV] emitting regions interior to the torus, then the op-
 Type 1 AGN
0.00 0.25 0.50 0.75 1.00 1.25 1.50 1.75
[NeV]14 m/[NeV]24 m 
 Type 2 AGN
Fig. 7.— Histogram of the [NeV] 14µm/24µm line flux ratio as
a function of AGN type. The [NeV] line ratios for Type 2 AGNs
are consistently lower than those from Type 1 AGNs.
tical/UV lines in these objects should not be detected
due to severe obscuration. We note that Mrk 3 and Mrk
463 have some of the highest X-ray luminosities (both
∼ 1043 erg s−1 ) in the sample and mid-IR [NeV] ra-
tios that are comparable to similarly luminous Type 1
objects–consistent with little or no obscuration in the
mid-IR for these Type 2 galaxies. This finding may im-
ply that, in the most powerful AGNs, the [NeV] emitting
region is pushed beyond the torus because the radiation
field is so intense, while lines with higher ionization po-
tentials than [NeV] (such as [NeVI], [FeX] etc.) are still
concealed by the torus. More data, both from the mid-
IR and from the optical/UV, are needed to further test
this hypothesis.
5.2. The [NeV]-emitting region is obscured by the host
galaxy or dust in the NLR.
While the correlation in Figures 6a and 6b are very
promising explanations for the observed [NeV] line ra-
tios, it is not completely clear why some Type 2 galaxies
appear obscured and others may not. Perhaps the [NeV]
emission is attenuated by dust in the NLR itself or else-
where in the host galaxy. Indeed, it is well-known that
dust does exist in the NLR (e.g., Radomski et al. 2003,
Tran et al. 2000) and that it can be extended and patchy
(e.g Alloin et al. 2000; Galliano et al. 2005; Mason et
al. 2006). In addition, dust in the host galaxy could be
responsible for the extinction seen here. For complete-
ness, we have conducted a detailed archival analysis of
all of the galaxies in our sample with [NeV] ratios close
to or below the LDL in order to see if there is additional
evidence for high extinction either in the host galaxy or
within the NLR. We find that the majority of galaxies
with low densities do indeed have well-known dust lanes,
large X-ray inferred column densities, or other properties
indicative of extinction.
Cen A: This nearby (D = 3.4Mpc) early type (S0)
galaxy at one time devoured a smaller gas-rich spiral
galaxy (Israel 1998, Quillen 2006). There is clear evi-
dence for substantial obscuration toward the nucleus of
Cen A. For example, the central region is veiled by a well
known dense dust lane thought to be a warped thin disk
(Ebneter & Balick 1983, Bland et al. 1986, 1987, Nichol-
son et al. 1992, Sparke 1996, Israel 1998, Quillen et al.
2006). Schreier et al. (1996) find V-band extinction av-
eraging 4-5 mag and infrared observations by Alonso &
Minniti yield AV values exceeding 30 mag in some re-
gions. Thus, it is plausible that there are regions toward
the nucleus of Cen A that are obscured even at infrared
wavelengths.
NGC 1566: The optical nuclear spectrum of this
nearby galaxy is known to vary dramatically over a pe-
riod of months, changing its optical classification from a
Type 2 object to a Type 1 object and back again (Pastor-
iza & Gerola 1970, de Vaucouleurs 1973, Penfold 1979,
Alloin et al. 1985). The narrow optical lines in this ob-
ject also show prominent blue wings and the radio prop-
erties of this galaxy are more consistent with a Type 2
object than a Type 1 object (Alloin et al. 1985). HST
continuum imagery reveals spiral dust lanes within 1” of
the nucleus (Griffiths et al. 1997) which might be re-
sponsible for the Type 1/Type 2 variability. Baribaud et
al. (1992) find hot dust which lies just outside the broad
line region in this galaxy and a large covering factor that
might explain the steep continuum of the AGN. Ehle et
al. (1996) find NH ∼ 2.5 × 10
20 cm−2 from ROSAT
X-ray observations of this galaxy.
NGC 2623: This galaxy’s tidal tails are evidence of a
merger event, however infrared observations reveal a sin-
gle symmetric nucleus, implying that the merging galax-
ies have coalesced. Multi-color, near infrared observa-
tions reveal strong concentrations of obscuring material
in the central 500 pc.(Joy & Harvey 1987; Lipari et al.
2004). Lipari et al. (2004) also find an optically obscured
nucleus with V-band extinction ≥ 5 mag.
IRAS 23128-5919: This galaxy is also in the late
stages of a merger. The nuclei of the two galaxies are
4kpc apart and have not yet coalesced. The northern
nucleus is a starburst. The southern nucleus is a known
AGN, though its optical classification, Seyfert or LINER,
is unclear (Duc, Mirabel, & Maza 1997; Charmandaris
et al. 2002; Satyapal et al. 2004). IRAS 23128-5919 is
an ultraluminous infrared galaxy (ULIRG), clearly con-
sistent with the presence of substantial dust towards the
nucleus. Optical spectroscopy of the southern nucleus in-
dicates very large (1500 km s−1) blue asymmetries in the
Hβ and [OIII] lines. This blue wing could be a signature
of extinction toward the far side of an expanding region,
where the red wing is preferentially obscured. (Johans-
son & Bergvall 1988).
Mrk 273: This galaxy is also a ULIRG, so significant
dust obscuration toward the nucleus is expected. Near-
IR imaging and high resolution radio observations show
evidence for a double nucleus in this galaxy separated
by less than 1 kpc (Ulvestad & Wilson 1984; Mazzarella
et al. 1991; Majewski et al. 1993). However, high res-
olution Chandra observations reveal only the northern
of the two nuclei, suggesting that this galaxy is hosting
only one AGN and that perhaps the other ”nucleus” is
in fact a portion of the southern radio jet. The soft X-
ray emission from the northern nucleus is obscured by
column densities of at least 1023 cm−2 (Xia et al. 2002).
Although the X-ray-emitting regions are physically dis-
tinct from the NLR and some of the obscuration at X-ray
wavelengths likely arises in dust-free gas within the sub-
limation radius, the high column density derived may
be consistent with high extinction toward the central re-
gions of this galaxy. Though Xia et al (2002) find that
the X-ray morphology of the AGN in Mrk 273 is con-
sistent with a Seyfert, Colina et al. (1999) find that it
has a LINER optical spectrum, thus implying that some
LINER galaxies are in fact heavily absorbed powerful
AGN. The soft diffuse X-ray halo in combination with
the radio morphology found by Carilli & Taylor (2000)
may suggest a circumnuclear starburst surrounding the
northern AGN nucleus, again consistent with substantial
obscuration toward the AGN.
NGC 3627: This nearby galaxy (D ∼ 10Mpc) is
thought to have had tidal interactions with NGC 3628,
a neighboring galaxy in the Leo Triplet, some 8 × 108
years ago which caused an intense burst of star forma-
tion in the nuclear regions around the same time (Rots
1978, Zhang et al. 1993, Afanasiev & Sil’chenko 2005).
Zhang et al. (1993) also discovered an extremely dense
molecular bar (mass ≥ 4 × 108 M⊙) and Chemin et al.
(2003) uncovered a warped disk using Hα observations,
both evidence of the tidal interaction. In their spectral
fitting to the BeppoSAX observation of NGC 3627, Geor-
gantopoulos et al. (2002) find intrinsic column densities
of ∼ 1.5 × 1022 cm−2 which, like Mrk 273, may suggest
substantial extinction to other regions near the nucleus.
NGC 7469: This is a well-known, extensively-studied
galaxy with strong, active star formation surrounding
a Seyfert 1 nucleus. Meixner et al. (1990) find dense
molecular gas (2 × 1010 M⊙), two orders of magnitude
above the Galactic value, within the central 2.5kpc of
the nucleus. 3.3µm imaging of the galaxy reveals that
80% of the PAH emission comes from an annulus ∼ 1”-
3”in radius around the central nucleus, indicating that
there is an elongated region of material that shelters the
PAH from the harsh radiation field of the AGN (Cutri
et al. 1984, Mazzarella et al. 1994). [OIII] line asym-
metries may corroborate the presence of a dense obscur-
ing medium, revealing a blue wing resulting when the
redshifted gas is obscured by the star forming ring (Wil-
son et al. 1986). In addition, Genzel et al. (1995) find
variation in the NIR emission attributable to extinction
and estimate the extinction from the CO observations of
Meixner et al. (1990) to be AV ∼ 10 mag.
NGC 1365: This nearby (D = 18.6 Mpc) AGN is
known to be circumscribed by embedded young star clus-
ters. The galaxy also contains a prominent bar with a
dust lane that penetrates the nuclear region (Phillips et
al. 1983, Lindblad et al. 1996 & 1999, Galliano et al.
2005). Like NGC 7469, NGC 1365 shows a peak at 3.5µm
implying PAH emission in spite of the harsh AGN radia-
tion field (Galliano et al. 2005). The large Hα/Hβ ratio
found by Alloin et al. (1981) implies substantial extinc-
tion toward the emission line regions, ranging from 3-4
mag. Observations with ASCA and ROSAT imply high
intrinsic column densities toward the X-ray emitting re-
gions, suggesting possibly high obscuration towards other
regions near the nucleus (Iyomoto et al. 1997, Komossa
& Schulz (1998), see also Schulz et al. 1999). Komossa &
Schulz show that the ratio of Hα to both the mid-IR and
X-ray radiation is substantially different in NGC 1365
compared with typical Seyfert 1 galaxies, possibly sug-
gesting inhomogenous obscuration (Schultz et al. 1999).
In an XMM X-ray study of NGC 1365, Risaliti et al.
(2005) also find a heavily absorbed Seyfert nucleus. The
blueshifted X-ray spectral lines imply high column den-
sities of 1023 cm−2 or more.
Mrk 266 (NGC 5256): This luminous infrared
galaxy is the only galaxy for which aperture effects most
likely account for the low 14µm/24µm ratio. Mrk 266
contains a very complicated structure which includes at
least two bright nuclei, a Seyfert and a LINER, that are
10” apart–a signature of a merger in progress. The mor-
phology of the northeast LINER nucleus is extremely
controversial (Wang et al. 1997; Kollatschny & Kowatsch
1998; Satyapal 2004, 2005; Ishigaki et al. 2000; Davies,
Ward, & Sugai 2000). Mazzarella et al. (1988) find three
non-thermal radio structures, two that coincide with the
nuclei and one between the two nuclei. Mazzarella et
al. (1988) suggest that the two nuclear structures are
associated with classical AGN and are in the stage of a
violent interaction in which the center of gravity of the
collision produces a massive burst of star formation with
supernovae or shocks which are responsible for the third
nonthermal radio source. As can be seen in Figure 8,
the SH slit, which provides the 14µm flux, overlaps with
this third radio source, while the LH slit, responsible for
the 24µm flux, encompasses the southwestern nucleus,
the third radio source, and part of the northeastern nu-
cleus. In this case the two lines observed originate in
physically distinct regions that do not each encompass
all potential sources of [NeV] emission, resulting in an
unphysical 14µm/24µm ratio. This is not to say that
Mrk 266 does not suffer from extinction at all. Indeed
the possible presence of a circumnuclear starburst im-
plies that there may be substantial extinction (Ishigaki
et al. 2000; Davies, Ward, & Sugai 2000). We have ver-
ified that this is the only distant galaxy in our sample
Fig. 8.— 20 cm image of Mrk 266 taken from NED
(http://nedwww.ipac.caltech.edu/). As can be seen here, the SH
slit (from which the 14µm line is extracted) overlaps with a third
radio source, while the LH slit (from which the 24µm line is ex-
tracted) encompasses the southwestern nucleus and part of the
northeastern nucleus.
with a complicated nuclear structure that will result in
aperture effects.
5.3. Can the [NeV] line flux ratio be used as a density
diagnostic?
Our analysis reveals that extinction towards parts of
the NLR in some objects is significant and cannot be ig-
nored at mid-IR wavelengths. In fact, it is quite possible
that extinction affects the [NeV] line flux ratios of those
galaxies with ratios above the low density limit (LDL)
and the amount of extinction in all cases is highly un-
certain. In addition to extinction, the temperature of
the [NeV] emitting gas is unknown. If the [NeV] emis-
sion originates within the walls of the obscuring central
torus, which may be the source of extinction in many of
our galaxies, we might expect the temperature of the gas
to reach 106 K (Ferland et al. 2002). If, on the other
hand, the [NeV] emission comes from further out in the
NLR and is instead attenuated by the intervening mate-
rial, we might expect the temperature of the gas to be
closer to 104 K. As shown in Figure 1, the electron densi-
ties inferred from the [NeV] line flux ratios are sensitive
to temperature when such large temperature variations
are considered. Based on the calculations shown in Fig-
ure 1, the low ratios could indicate that the densities in
the [NeV] line emitting gas are typically ≤ 3000 cm−3 for
T = 104K. However, if the [NeV] gas is characterized by
temperatures as high as T = 105K to 106K, densities
as high as 105 cm−3 would be consistent with our mea-
surements. We note that the [NeV] line flux ratios for
the galaxies in our sample (especially the Type 1 AGNs)
http://nedwww.ipac.caltech.edu/
all cluster around a ratio of ≈ 1.0. Two separate con-
clusions may be drawn from this finding: 1) That the
temperatures of the gas are low (∼ 104K) and that the
electron density is relatively constant over many orders
of magnitude in X-ray Luminosity and Eddington Ratio
for these AGNs, or 2) That the temperature of the gas
is high (105K to 106 K) and that the AGNs here sam-
ple a wide range of electron densities (from 102 cm−3
to 105 cm−3). Since gas temperature, electron density,
mid-IR continuum, and extinction are all unknown for
these objects, the electron density cannot be determined
here.
6. THE SIII LINE FLUX RATIOS
In Figure 9 we plot the 18µm/33µm line ratio as a
function of electron density ne. As with [NeV], we only
consider the five levels of the ground configuration when
computing the line ratio and we plot the relationship for
gas temperatures of T = 104K and 105K. We adopt col-
lision strengths from Tayal & Gupta (1999) and radiative
transition probabilities from Mendoza & Zeippen (1982).
Fig. 9.— 18µm/33µm line flux ratio in S III versus electron
density ne, for gas temperatures T = 10
4 K and 105 K.
In Table 3, we list the observed [SIII] line flux ratios
for the galaxies in our sample. As with the [NeV] ratios,
the [SIII] ratios in many galaxies listed in Table 3 are
well below the theoretically allowed value of 0.45 for a
gas temperature of T = 104K (13/33 detections). Again
we explore the observational effects and the theoretical
uncertainties that could artificially lower these ratios.
Aperture Effects: The ionization potential of [SIII]
is ∼ 35 eV and therefore the [SIII] emission may arise
from gas ionized by either the AGN or young stars. In
Table 3 we list, in addition to our Spitzer [SIII] fluxes,
all available [SIII] fluxes from ISO. Unlike [NeV], the
[SIII] fluxes from ISO are significantly larger than the
Spitzer fluxes for most galaxies. In Figure 10 we plot
the ISO to Spitzer flux ratios for the 18µm and 33µm
the [SIII] lines. As can be seen here, the [SIII] emission
extends beyond the Spitzer slit for many galaxies (6 out
of 9 for [SIII] 18µm and 11 out of 13 for [SIII] 33µm).
Similarly, when we compare the [SIII] flux arising from
a single slit centered on the nucleus to the flux arising
from a more extended region obtained using mapping
observations (Dale et al. 2006), we find that in most cases
the fluxes from the extended region are much larger than
the nuclear single-slit fluxes. Galaxies with fluxes from
Dale et al. (2006) are not included in Figure 10 since
the extraction aperture for these galaxies is comparable
to the 18µm ISO slit. We point out that the value for
this ratio is dependent on the orientation of the Spitzer
slit relative to the ISO slit and on the distance of each
object. We also note that IRAS20551 and IRAS23128 are
point sources with Spitzer 18µm fluxes greater than the
ISO fluxes from Genzel et al. (1998), however they fall
within the Genzel et al. (1998) quoted errors of 30% and
the Spitzer calibration error of 15%. Figure 10 suggests
that the [SIII] emission may be produced in the extended,
circumnuclear star forming regions associated with many
AGNs and that aperture effects need to be considered in
our analysis of the [SIII] ratio for nearby objects.
[SIII]
(ISO) / F
[SIII]
(Spitzer)
 [SIII] 18 micron Ratio
0
 1
 2
 3
 4
 5
 6
 [SIII] 33 micron Ratio
Fig. 10.— The ratios of the [SIII] flux from ISO and Spitzer for
the 18µm and 33µm lines. The range indicated with arrows is that
corresponding to the absolute flux calibration for ISO (20%) and
Spitzer (15%). The [SIII] emission is indeed extended beyond the
Spitzer slit for many galaxies, suggesting that the [SIII] emission
may be produced in star forming regions. We note that IRAS20551
and IRAS23128 are point sources with Spitzer 18µm fluxes greater
than the ISO fluxes from Genzel et al. 1998, however they fall
within the Genzel et al. (1998) quoted errors of 30% and the Spitzer
calibration error of 15%. Galaxies with fluxes from Dale et al. 2006
are not included in this plot since the extraction aperture for these
galaxies is comparable to the 18µm ISO slit.
The contribution from star formation to the [SIII] lines
can be estimated using the strength of the PAH emis-
sion, one of the most widely used indicators of the star
formation activity in galaxies (e.g. Luhman et al. 2003;
Genzel et al. 1998; Roche et al. 1991; Rigopoulou et
al. 1999, Clavel et al. 2000; Peeters, Spoon, & Tielens
2004). We examined the [SIII] 18.71 µm/PAH 6.2 µm
and [SIII] 33.48 µm/PAH 6.2 µm line flux ratios in 7
starburst galaxies observed by Spitzer and found them
to be comparable to the analogous ratios in our entire
sample of AGNs as shown in Figure 11. This suggests
that the bulk of the [SIII] emission originates in gas ion-
ized by young stars. We note that the apertures of the
SH and LH IRS modules are smaller than that of the
SL2 module, which may artificially raise the line ratios
plotted in Figure 11 for nearby galaxies compared with
the more distant ones. However, the fact that the line
ratios plotted in Figure 11 span a very narrow range sug-
gests that the [SIII] line emission has a similar origin in
starbursts and in AGNs. Thus, we assume that the bulk
of the [SIII] emission originates in gas ionized by young
stars and that the electron densities derived using these
lines taken from slits of the same size (such those galaxies
coming from Dale et al. 2006 mapping observations) or
from the most distant galaxies are representative of the
gas density in star forming regions.
Extinction: We have shown that aperture effects are
the likely explanation for why many of the [SIII] ratios
for the galaxies in our sample fall below the LDL. How-
ever, there are three galaxies in the sample with ratios
below the LDL that are distant enough (D>55 Mpc, cor-
responding to projected distances greater than 1.2 by 3
kpc and 3 by 6 kpc for the SH and LH slits, respectively)
that aperture effects may not be as important (NGC 2623
& Mrk 273, Mrk 266 has been excluded since it is known
to be affected by aperture variations See Section 5.2).
Extinction may be the explanation for the low ratios in
these galaxies. However, even though the SH and LH slits
likely cover the entirety of the NLR at these distances, we
note that these three galaxies contain well-known, large
circumnuclear starbursts (See Section 5.2 for the indi-
vidual galaxy summaries) which may produce extremely
extended [SIII] emission. It is therefore still possible that
the line ratios in these galaxies are artificially lowered by
aperture variations between the SH and LH slits. How-
ever, in addition to these three distant galaxies, NGC
4725 from Dale et al. (2006) has a [SIII] ratio below the
low density limit. The low [SIII] ratio (<0.45) in this
case cannot be attributed to aperture variations since
the extraction region is the same for both the 18 and
33µm lines. Thus, for completeness, the extinction de-
rived using the extinction curves given in Section 4 from
the observed [SIII] line ratio for these four sources are
given in Table 3. The Draine (1989) and Lutz et al.
(1996) extinction curve calculations yield extinction val-
ues that range from ∼ 12 to 25 mag. The Chiar and
Tielens (2006) extinction curve for the Galactic Center
may also be used since, unlike [NeV], the extinction at
the longer wavelength line (33µm) is greater than that
at the shorter wavelength line (18µm). The values de-
rived from this method are quite similar, ranging from ∼
10 to 22 mag. The Chiar and Tielens (2006) extinction
curve from the local ISM cannot be used here since it
only extends to 27.0µm.
Computed Quantities: As with NeV, there may be
uncertainties in the computed SIII infrared collisional
rate coefficients. However, there is generally less contro-
versy surrounding the [SIII] coefficients and these values
are widely accepted.
Our analysis suggests that aperture effects severely in-
fluence the [SIII] line flux ratios in most cases and that
the observed flux is likely dominated by star forming
regions. Figure 12, a plot of the [SIII] line ratio as a
function of distance, illustrates the influence of aperture
effects on the [SIII] line ratio. Most of the galaxies at dis-
tances <55 Mpc with [SIII] fluxes extracted from aper-
tures of different sizes (i.e. NOT the Dale et al. (2006)
galaxies) are below the LDL. On the other hand, galax-
ies at larger distances and galaxies with fluxes from Dale
et al. (2006) are generally above the LDL. Thus, for the
most distant galaxies in our sample and the galaxies with
fluxes from Dale et al. (2006) where the aperture for the
18 and 33 µm lines are equal, aperture effects are not
problematic, but extinction, as can be seen from Mrk
273, NGC2623, and NGC 4725 in Figure 12, needs to be
considered. As with the [NeV] line ratio, the [SIII] line
ratio is NOT a tracer of the electron density in our sam-
ple. In conclusion, the ambiguity of the intrinsic [SIII]
line ratio is primarily the result of aperture variations.
However there is at least one case (NGC 4725) where
aperture effects cannot explain the low ratio, implying
that, in addition to aperture variations, extinction likely
plays a role in lowering the [SIII] line flux ratios.
7. SUMMARY
We report in this paper the [NeV] 14µm/24µm and
[SIII]18µm/33µm line flux ratios, traditionally used to
measure electron densities in ionized gas, in an archival
sample of 41 AGNs observed by the Spitzer Space Tele-
scope.
1. We find that the [NeV] 14µm/24µm line flux ratios
are low: approximately 70% of those measured are
consistent with the low density limit to within the
calibration uncertainties of the IRS.
2. We find that Type 2 AGNs have lower [NeV]
14µm/24µm line flux ratios than Type 1 AGNs.
The mean ratios are 0.97 and 0.72 for the eight
Type 1 and ten Type 2 AGNs, respectively, with
uncertainties in the mean of about 0.08 for each.
3. For several galaxies, the observed [NeV] line ratios
are below the theoretical low density limit. All of
these galaxies are Type 2 AGNs.
4. We discuss the physical mechanisms that may play
a role in lowering the line ratios: differential mid-
IR extinction, low density, high temperature, and
high mid-IR radiation density.
5. We argue that the [NeV]-emitting region likely
originates interior to the torus in many of these
AGNs and that differential infrared extinction due
to dust in the obscuring torus may be responsible
for the ratios below the low density limit. We sug-
gest that the ratio may be a tracer of the torus
inclination angle to our line of sight.
6. Our results imply that the extinction curve in these
galaxies must be characterized by higher extinction
at 14µm than at 24µm, contrary to recent studies
of the extinction curve toward the Galactic Center.
7. A comparison between the [NeV] line fluxes ob-
tained with Spitzer and ISO reveals that there are
systematic discrepancies in calibration between the
two instruments. However, our results are indepen-
dent of which instrument is used; [NeV] line flux
ratios are consistently lower in Type 2 AGNs than
in Type 1 and [NeV] line flux ratios below the LDL
are observed with both ISO and Spitzer.
0
 1
 2
 3
 4
 5
[SIII]18/F(PAH6.2)
 
Starbursts
0
 1
 2
 3
 4
 5
 
AGN
0
 1
 2
 3
 4
 5
[SIII](33)/F(PAH6.2)
 
Starbursts
0
 1
 2
 3
 4
 5
 
AGN
Fig. 11.— Distribution of the [SIII]33µm/PAH 6.2µm and the [SIII]18µm/PAH 6.2µm line flux ratios for our sample of AGNs and a
small sample of starburst galaxies observed by Spitzer. It is apparent that the line ratios of the AGNs are comparable to the corresponding
ratios in starbursts, suggesting that the bulk of the [SIII] emission originates in star forming regions and not the NLRs in our sample of
AGNs.
8. Our work provides strong motivation for investigat-
ing the mid-IR spectra of a larger sample of galaxies
with Spitzer in order to test our conclusions.
9. Finally, an analysis of the [SIII] emission reveals
that it is extended in many or all of the galaxies
and likely originates in star forming gas and NOT
the NLR. Since there is a variation in the aper-
tures between the SH and LH modules of the IRS,
we cannot use the [SIII] line flux ratios to derive
densities for the majority of galaxies in our sam-
We are extremely thankful for all of the invaluable data
analysis assistance from Dan Watson and Joel Green,
without which this work would not have been possible.
We are also very grateful to Davide Donato, Eli Dwek,
Frederic Galliano, Paul Martini, Kartik Sheth, Eckhard
Sturm, Peter van Hoof, and Dan Watson for their en-
lightening and thoughtful comments/expertise that sig-
nificantly improved this paper. Carissa Khanna was also
very helpful in providing assistance in the preliminary
data analysis. We are also grateful for the helpful and
constructive comments from the referee. This research
has made use of the NASA/IPAC Extragalactic Database
(NED) which is operated by the Jet Propulsion Labora-
tory, California Institute of Technology, under contract
with the National Aeronautics and Space Administra-
tion. SS gratefully acknowledges financial support from
NASA grant NAG5-11432 and NAG03-4134X. JCW
gratefully acknowledges support from Spitzer Space Tele-
scope Theoretical Research Program. JCW is a Cottrell
Scholar of Research Corporation. Research in infrared
astronomy at NRL is supported by 6.1 base funding.
RPD gratefully acknowledges financial support from the
NASA Graduate Student Research Program.
REFERENCES
Alexander, T. & Sternberg, A., 1999, ApJ, 520, 137
Alexander, T., Sturm, E., Lutz, D., et al. 1999, ApJ, 512, 204
Alloin, D., Edmunds, M. G., Lindblad, P. O., & Pagel, B. E. J.,
1981, A&A, 101, 377
Alloin, D., Pelat, D., Phillips, M., & Whittle, M., 1985, ApJ, 288,
Alloin, D., Pantin, E, Lagage, P.O., & Granato, G. L., 2000, A&A,
363, 929
Afanasiev, V. L. & Sil’chenko, 2005, A&A, 429, 825
Aoki, Kawaguchi, & Ohta, 2005, ApJ, 618, 601
Andreasian, N. K. & Khachikian, E. Y., 1987, IAUS, 121, 541
0.0 0.5 1.0 1.5 2.0 2.5 3.0
NGC 4725
NGC2623
Mrk 273
log(Distance) Mpc
Fig. 12.— The [SIII] 18µm/33µm line ratio as a function of
distance. The error bars shown here mark the calibration uncer-
tainties on the line ratio. Dale et al. (2006) quote 30% calibration
error which is shown here for those galaxies. For the rest of the
sample the calibration error is 15% as per the Spitzer handbook.
Most of the galaxies at distances <55 Mpc with [SIII] fluxes ex-
tracted from apertures of different sizes (i.e. not the Dale et al.
(2006) galaxies) are below the LDL. However, for the most distant
galaxies in our sample and the galaxies with fluxes from Dale et
al. (2006) where the aperture for the 18 and 33 µm lines are equal,
aperture effects are not problematic, but extinction needs to be
considered (see Mrk 273, NGC2623, and NGC 4725 above).
Armus, L., Charmandaris, V., Spoon, H. W. W., et al. 2004, ApJS,
154, 178
Armus, L., Bernard-Salas, J., Spoon,et al., H. W. W., et al. 2006,
ApJ, 640, 204
Balestra, I., Boller, T., Gallo, L., et al., 2005, A&A, 442, 469
Baribaud, T., Alloin, D., Glass, I., & Pelat, D., 1992, A&A, 256,
Barth, A. J., Ho, L. C., Filippenko, A. V., Rix, H., & Sargent, W.
L. W., 2001, ApJ, 546, 205
Bland, J., Taylor, K, & Atherton, P. D., 1987, MNRAS, 228, 595
Bland, J., Taylor, K, & Atherton, P. D., 1986, IAUS, 127, 417
Capetti, A., Macchetto, F., Axon, D. J., Sparks, W. B., &
Boksenberg, A., 1995, ApJ, 448, 600.
Capetti, A., Axon, D. J., & Macchetto, F., 1997, ApJ, 487, 560
Cappi, M, Panessa, F., Bassani, L., et al. 2006, 446, 459.
Carilli, C. L., & Taylor, G. B., 2000, ApJ, 532, 95
Charmandaris, V., Laurent, O., Le Floch, E., 2002, A&A, 391, 429
Chemin, L., Cayatte, V., Balkowski, C., et al., 2003, A&A, 405, 89
Chiar, J. E. & Tielens, A. G. G. M., 2006, ApJ, 637, 774
Clavel, J., Schulz, B., Altieri, B., et al., 2000, A&A, 357, 839
Clegg, R.E.S, Harrington, J. P., Barlow, M. J., & Walsh, J. R.,1987,
ApJ, 314, 551
Cohen, R. D., 1983, ApJ, 273, 489
Colina, L. Arribas, S., & Borne, K. D., 1999, ApJ, 527, 13
Cutri, R. M., Rieke, G. H., Tokunaga, A. T., Willner, S. P. & Rudy,
R. J., 1984, ApJ, 280, 521
Dahari, O. & De Robertis, M. M., 1988, ApJS, 67, 249
Dale, D. A., Smith, J. D. T., Armus, L, et al., 2006,
astroph/0604007
Davies, R., Ward, M., & Sugai, H., 2000, ApJ, 535, 735
de Vaucouleurs, G., 1973, ApJ, 181, 31
Draine, B. T., 1989, Interstellar Extinction in the Infrared, Edited
by B.H. Kaldeich, European Space Agency.
Duc, P. A., Mirabel, I. F., & Maza, J., 1997, A&AS, 124, 533
Dudik, R. P., Satyapal, S. Gliozzi, M. & Sambruna, R. M.2005,
ApJ, 620, 113
Ebneter, K., & Balick, B., 1983, ASP, 95, 675
Ehle, M., Beck, R., Haynes, et al., 1996, A&A, 306, 73
Elvis, M.; Wilkes, B. J., McDowell, J. C., et al. 1994, ApJS, 95,1
Evans, D. A., Kraft, R. P., Worrall, D. M., et al., 2004, ApJ, 612,
Falcke, H., Wilson, A. S., & Simpson, C., 1998, ApJ, 502, 199
Ferland, G. J., Martin, P. G., van Hoof, P. A. M., & Weingartner,
J. C. 2002, in Workshop on X-ray Spectroscopy of AGN with
Chandra and XMM-Newton, MPE Report 279, 103
Ferruit, P., Wilson, A. S., Falcke, H. et al., 1999, MNRAS, 309, 1
Filippenko, A. V. & Halpern, J. P., 1984, ApJ, 285, 458
Forster, K., Green, P. J., Aldcroft, T. L. et al., 2001, ApJS, 134,
Galavis, M. E., Mendoza, C., Zeippen, C. J., 1997, A&AS, 123, 159
Gallagher, S. C., Brandt, W. N., Chartas, G., Garmire, G. P., &
Sambruna, R. M., 2002, ApJ, 569, 655
Galliano, E., Alloin, D., Pantin, E., Lagage, P.O., & Marco, O.,
2005, A&A, 438, 803
Gallo, L. C., Boller, Th., Brandt, W. N., et al., 2004, A&A, 417,
Genzel, R., Weitzel, L., Tacconi-Garman, L. E., et al., 1995, ApJ,
444, 129
Genzel, R., Lutz, D., Sturm, E., Egami, E., et al., 1998, ApJ, 498,
Georgantopoulos, I. Panessa, F., Akylas, A., et al., 2002, A&A,
386, 60
Giveon, U., Sternberg, A., Lutz, D., Feuchtgruber, H., &
Pauldrach, A. W. A. 2002, ApJ, 566, 880
Griffin, D. C. & Badnell, N. R., 2000, JPhB, 33, 4389
Griffiths, R. E., Homeier, N., Gallagher, J., & HST/WFPC2
Investigation Definition Teamn, 2005, AAS, 191, 7607
Guainazzi, M., Matt, G., Brandt, W. N., et al., 2000, A&A, 356,
Hao, L., Spoon, H. W. W., Sloan, G. C., et al. 2005, ApJ, 625, 75
Haas, M., Siebenmorgen, R., Schulz, B., Krugel, E., & Chini, R.,
2005, A&A, 442, 39
Heckman, T. M., 1980, A&A, 87, 152
Higdon, S. J. U., Devost, D., Higdon, J. L., et al., 2004, PASP, 116,
Ho, L. C., Fillipenko, A. V., & Sargent, W. L. W. 1997, ApJS, 112,
Ho, L. C. 1999, ApJ, 516, 672
Hönig, S. F., Bekcert, T., Ohnaka, K. & Weigelt, G., 2006, A&A,
452, 459
Houck, J. R., Roellig, T. L., Van Cleve, J. et al., 2004, ApJS, 154,
Hummer, D. G., Berrington, K. A., Eissner, W. et al., 1993, A&A,
279, 298
Imanishi, M. & Terashima, Y., 2004, AJ, 127, 758
Ishigaki, T., Yoshida, M., Aoki, K., et al., 2000, PASJ, 52, 185
Israel, F. P, 1998, A&ARv, 8, 237
Iyomoto, N., Makishima, K., Fukazawa, Y., Tashiro, M. & Ishisaki,
Y., 1997, PASJ, 49, 425
Jackson & Browne, 1990, Nature, 343, 43
Johansson, L. & Bergvall, N., 1988, A&A, 192, 81
Joy, M. & Harvey, P. M., 1987, ApJ, 315, 480
Kendall, M., & Stuart, A.,1976, In: The Advanced Theory of
Statistics, Vol. 2, (New York: Macmillian)
Khachikian, E. Y. & Weedman, D. W., 1973, ApJ, 192, 581
Kollatschny, W. & Kowatsch, P., 1998, A&A, 336, 21
Komossa, S. & Schulz, H., 1998, A&A, 339, 345
Kuraszkiewicz, J. K., Green, P. J., Forster, K., et al., 2002, ApJS,
143, 257.
Kuraszkiewicz, J. K., Green, P. J., Crenshaw, D. M., et al., 2004,
Laine, S., van der Marel, R. P., & Rossa, J.; et al., 2003, AJ,126,
Leech, K., Kester, D., Shipman, R., Beintema, D., et al., 2003,
ESASP, The ISO Handbook, Volume V-SWS-Short Wavelength
Spectrometer, 1262
Lindblad, P. O., Hjelm, M., Hoegbom, J., et al., 1996, A&AS, 120,
Lipari, S., Mediavilla, E., Diaz, R. J., et al., 2004, MNRAS, 348,
Lindblad, P. 1999, A&ARv, 9, 22
Luhman, M. L., Satyapal, S., Fischer, J., Wolfire, M. G., Sturm, E
et al., 2003, ApJ, 594, 758
Lutz, D.; Genzel, R., Sternberg, A., et al., 1996, A&A, 315, 137
Lutz, D., Veilleux, S. & Genzel, R. 1999, ApJ, 517L, 13L
Maiolino, R., Marconi, A., Salvati, M. et al., 2001a, A&A, 365, 37
Maiolino, R., Marconi, A., & Oliva, E., 2001b, A&A, 365, 37
Majewski, S. R., Hereld, M., Koo, D.C. Illingworth, G. D., &
Heckman, T. M., 1993, ApJ, 402, 125.
Marchesini, D., Celotti, A., & Ferrarese, L. 2004, MNRAS, 351,
Marconi, A.; Oliva, E.; van der Werf, P. P.; et al., 2000, A&A, 357,
Mason, R. E., Geballe, T. R., Packham, C., et al., 2006, ApJ, 640,
Mathews, W. G. & Ferland, G. J. 1987, ApJ, 323, 456
Mazzarella, J. M., Bothun, G. D., & Boroson, T. A., 1991, AJ, 101,
Mazzarella, J. M., Voit, G. M., Soifer, B. T., et al., 1994, AJ, 107,
Mazzarella, J. M., Gaume, R. A., Aller, H. D., & Hughes, P. A.,
1988, ApJ, 333, 168
Meixner, M., Puchalsky, R., Blitz, L., Wright, M. & Heckman, T.,
1990, ApJ, 354, 158
Mendoza, C., & Zeippen, C. J., 1982, MNRAS, 199, 1025
Miller, P., Rawlings, S., & Saunders, R., 1993, MNRAS, 263, 425
Murayama, T. & Taniguchi, Y., 1998, ApJ, 497, 9
Nagao, T., Taniguchi, Y, & Murayama, T. 2000, AJ, 119, 2605
Nagao, T., Murayama, T , & Taniguchi, Y, 2001a, ApJ, 549, 155
Nagao, T., Murayama, T , & Taniguchi, Y, 2001b, PASJ, 53, 629
Nagao, T., Murayama, T., Shioya, Y, & Taniguchi, Y, 2003, 125,
Nelson, C. H. & Whittle, M., 1996, ApJ, 465, 96
Nicholson, R. A., Bland-Hawthorn, J, & Taylor, K, 1992, ApJ, 387,
Oliva, E.; Salvati, M.; Moorwood, A. F. M.; & Marconi, A., 1994,
A&A, 288, 457
Oliva, E., Pasquali, A., & Reconditi, M., 1996, A&A, 305, L21
Pastoriza, M. & Gerola, H., 1970, ApL, 6, 155
Peeters, E., Spoon, H. W. W., & Tielens, A. G. G. M., 2004, ApJ,
613, 986
Pelat, D., Alloin, D. & Fosbury, R. A. E., 1981, MNRAS, 195, 787
Penfold, J. E., 1979, MNRAS, 186, 297
Phillips, M. M., Edmunds, M. G., Pagel, B. E. J, & Turtle, A. J.,
1983, MNRAS, 203, 759
Prieto, M. A., & Viegas, S., 2000, RMxAC, 9, 324
Quillen, A. C., Bland-Hawthorn, J., Brookes, M. H. et al. 2006,
ApJ, 641, 29
Radomski, J. T., Pina, R. K., Packham, C., et al., 2003, ApJ, 587,
Rigopoulou, D., Spoon, H. W. W., Genzel, R., et al., 1999, AJ,
118, 2625
Risaliti, G., Bianchi, S., Matt, G., et al., 2005, ApJ, 630, 129
Roberts, T.P., Schurch, N. J., & Warwick, R. S., MNRAS, 2001,
324, 737
Roche, P. F.; Aitken, Smith, & Ward, 1991, MNRAS, 248, 606
Rots, A. H, 1978, AJ, 83, 219
Rubin, R. H., Colgan, S. W. J., Daane, A. R. & Dufour, R. J. 2002,
AAS, 34, 1252
Rubin, R. H., 2004, IAUS, 217, 190
Sanders,D.B., Soifer,B.T., Elias,J.H., et al., 1988, ApJ., 325, 74.
Satyapal, S., Sambruna, R. M., & Dudik, R. P. 2004, A&A, 414,
Satyapal, S. Dudik, R. P., O’Halloran, B. & Gliozzi, M, 2005, ApJ,
633, 86
Schmitt, H. R. & Kinney, A. L., 1996, ApJ, 463, 498
Schmitt, H. R., 1998, ApJ, 506, 647
Schmitt, H. R., Donley, J. L, Antonucci, R. R., et al., 2003, ApJ,
597, 768
Schulz, H., Komossa, S., Schmitz, C., & Mücke, A., 1999, A&A,
346, 764
Schreier, E. J., Capetti, A., Macchetto, F., Sparks, W. B., & Ford,
H. J., 1996, ApJ, 459, 535
Shields, J. C.; Boeker, T., Ho, L. C., et al., 2004, AAS, 205, 6411
Shuder & Osterbrock, 1981, ApJ, 250, 55.
Sparke, L., 1996, ApJ, 473, 810
Spinoglio, L, Malkan, M. A., Smith, Howard, A., Gonzalez-Alfonso,
E., & Fischer, J., 2005, ApJ, 623, 123
Smith, D. A. & Wilson, A. S., 2001, ApJ 557, 180
Storchi-Bergmann, T.; Mulchaey, J. S.; & Wilson, A. S. 1992, ApJ,
395, 73
Sturm, E., Lutz, D, Verma, A, et al., 2002, A&A, 393, 821
Sturm, E., Schweitzer, M., Lutz, D., et al., 2005, ApJ, 629, 21
Tacconi, L. J., Genzel, R., Lutz, D, et al, 2002, ApJ, 580, 73
Tayal, S. S., & Gupta, G. P., 1999, ApJ, 526, 544
Terashima, Y., Kunieda, H. & Misaki, K., 1999, PASJ, 51, 277
Terashima, Y, Iyomoto, N., Ho L. C., & Ptak, A. F., 2002, ApJ,
139, 1
Thompson, K. L., 1992, ApJ, 395, 403
Tran, H.D., Cohen, M. H., & Villar-Martin, M., 2000, AJ, 120, 562
Ulvestad, J. S., & Wilson, A. S., 1984, 285, 439
Van Hoof, P. A. M., Beintema, D. A., Verner D. A. & Ferland, G.
J., 2000, A&A, 354, 41
Veron-Cetty, M. P. & Veron, P., 1986, A&AS, 66, 335
Veron-Cetty, M. P. & Veron, P., 2003, A&A, 412, 399
Veilleux, S. & Osterbrock, D. E., 1987, ApJS, 63, 295
Verma, A., Lutz, D., Sturm, E., et al., 2003, A&A, 403, 829
Wang, J., Heckman, T. M., Weaver, K. A., & Armus, L. 1997, ApJ,
474, 659
Weedman, D. W., Hao, L., Higdon, S. J. U., et al. 2005, ApJ, 605,
Weingartner, J. C., & Murray, N., 2002, ApJ, 580, 88
Wilson, A. S., Baldwin, J. A., Sun, S., & Wright, A. E., 1986, ApJ,
310, 121
Wilson, A. S. & Heckman, T. M., 1985, Astrophysics of Active
Galaxies and Quasi-Stellar Objects, Mill Valley, CA
Woo, J.H., & Urry, C. M., 2002, ApJ, 579, 530
Xia, X. Y., Xue, S. J., Mao, S., Boller, Th., Deng, Z. G., & Wu,
H., 2002, ApJ, 564, 196
Zhang, X., Wright, M., & Alexander, P., 1993, ApJ, 418, 100
ABSTRACT
  We present the first systematic investigation of the [NeV] (14um/24um) and
[SIII] (18um/33um) infrared line flux ratios, traditionally used to estimate
the density of the ionized gas, in a sample of 41 Type 1 and Type 2 active
galactic nuclei (AGNs) observed with the Infrared Spectrograph on board
Spitzer. The majority of galaxies with both [NeV] lines detected have observed
[NeV] line flux ratios consistent with or below the theoretical low density
limit, based on calculations using currently available collision strengths and
ignoring absorption and stimulated emission. We find that Type 2 AGNs have
lower line flux ratios than Type 1 AGNs and that all of the galaxies with line
flux ratios below the low density limit are Type 2 AGNs. We argue that
differential infrared extinction to the [NeV] emitting region due to dust in
the obscuring torus is responsible for the ratios below the low density limit
and we suggest that the ratio may be a tracer of the inclination angle of the
torus to our line of sight. Because the temperature of the gas, the amount of
extinction, and the effect of absorption and stimulated emission on the line
ratios are all unknown, we are not able to determine the electron densities
associated with the [NeV] line flux ratios for the objects in our sample. We
also find that the [SIII] emission from the galaxies in our sample is extended
and originates primarily in star forming regions. Since the emission from
low-ionization species is extended, any analysis using line flux ratios from
such species obtained from slits of different sizes is invalid for most nearby
galaxies.

<|endoftext|><|startoftext|>
Hawaii
Neutrinos & Non-proliferation in Europe 
Michel Cribier*  
APC, Paris 
CEA/Saclay, DAPNIA/SPP 
The International Atomic Energy Agency (IAEA) is the United Nations agency in 
charge of the development of peaceful use of atomic energy. In particular IAEA is the 
verification authority of the Treaty on the Non-Proliferation of Nuclear Weapons (NPT). To 
do that jobs inspections of civil nuclear installations and related facilities under safeguards 
agreements are made in more than 140 states. 
IAEA uses many different tools for these verifications, like neutron monitor, gamma 
spectroscopy, but also bookeeping of the isotopic composition at the fuel element level before 
and after their use in the nuclear power station. In particular it verifie that weapon-origin and 
other fissile materials that Russia and USA have released from their defense programmes are 
used for civil application.  
The existence of an antineutrino signal sensitive to the power and to the isotopic 
composition of a reactor core, as first proposed by Mikaelian et al. [Mik77] and as 
demonstrated by the Bugey [Dec95] and Rovno experiments, [Kli94], could provide a means 
to address certain safeguards applications. Thus the IAEA recently ask members states to 
make a feasibility study to determine whether antineutrino detection methods might provide 
practical safeguards tools for selected applications.  If this method proves to be useful, IAEA 
has the power to decide that any new nuclear power plants built has to include an antineutrino 
monitor. 
Within the Double Chooz collaboration, an experiment [Las06] mainly devoted to 
study the fundamental properties of neutrinos, we thought that we were in a good position to 
evaluate the interest of using antineutrino detection to remotely monitor nuclear power 
station. This effort in Europe, supplemented by the US effort [Ber06], will constitute the basic 
answer to IAEA of the neutrino community. 
                                                 
* On behalf of a collective work by S. Cormon, M. Fallot, H. Faust, T. Lasserre, A. 
Letourneau, D. Lhuillier, V. Sinev from DAPNIA, Subatech and ILL. 
- 2 - 
Figure 1 : The statistical distribution of the fission products resulting 
from the fission of the most important fissile nuclei 235U and 239Pu 
shows two humps, one centered around masses 100 and the other one 
centered around 135. The low mass hump is at higher mass in 239Pu 
fission than in 235U, resulting in different nuclei and decays. 
The high penetration power of antineutrinos and the detection capability might 
provide a means to make remote, non-intrusive measurements of plutonium content in 
reactors [Ber02]. The antineutrino flux and energy spectrum depends upon the thermal power 
and on the fissile isotopic composition of the reactor fuel. Indeed, when a heavy nuclei 
(Uranium, Plutonium) experience a fission, it produce two unequal fission fragments (and a 
few free neutrons) ; the statistical distribution of the atomic masses is depicted in figure 1. All 
these nuclei immediately produced are extremely unstable - they are too rich in neutrons - and 
thus ß decay toward stable nuclei with an average of 6 ß decays. All these process involving 
several hundreds of unstable nuclei, with their excited states, makes very difficult to 
understand details of the physics, moreover, the most energetic antineutrinos, which are 
detected more easily, are produced in the very first decays, involving nuclei with typical 
lifetime smaller than a second. 
 235U 239Pu 
released energy per fission 201.7 MeV 210.0 MeV 
Mean energy of ν 2.94 MeV 2.84 MeV 
ν per fission > 1.8 MeV 1.92 1.45 
average inter. cross section 
≈ 3.2 10-43 
≈ 2.76 10-43 
Based on predicted and observed ß spectra, the number of antineutrinos per fission 
from 239Pu is known to be less than the number from 235U, and the energy released bigger by 
5%. Hence an hypothetical reactor able to use only 235U would induce in a detector an 
antineutrino signal 60% higher than the same reactor producing the same amount of energy 
but burning only 239Pu (see table). This offers a means to monitor changes in the relative 
amount of 235U and 239Pu in the core. If made in conjunction with accurate independent 
- 3 - 
measurements of the thermal power (with the temperature and the flow rate of cooling water), 
antineutrino measurements might provide an estimate of the isotopic composition of the core, 
in particular its plutonium inventories. The shape of the antineutrino spectrum can provide 
additional information about core fissile isotopic composition.  
Because the antineutrino signal from the reactor decreases as the square of the 
distance from the reactor to the detector a precise "remote" measurement is really only 
practical at distances of a few tens of meters if one is constrained to "small" detectors of the 
order of few cubic meter in size. 
Simulations 
MAGNITUDES OF SOME EFFECTS 
In our group, the development of detailed simulations using professional reactor 
codes started (see below), but it seems wise to use less sophisticated methods in order to 
evaluate already, with some flexibility, the magnitude of some effects. To do that we started 
from the set of Bateman equations, as depicted graphicaly in figure 2, which discribed the 
evolution of fuel elements in a reactor. The gross simplification in such treatment is the use of 
average cross section, depending only on 3 groups (thermal neutron, resonance region, fast 
neutrons), and moreover the fact that the neutron flux is imposed and not calculated. 
Figure 2 : The Bateman equations are the set of differential equations which described 
all transformations of the nuclei submitted to a given neutron flux : capture of 
neutrons are responsible to move at Z constant (green arrow), ß-decay are 
responsible to increase the atomic mass by one unit (dark blue arrow), and fission 
destroy the heavy nuclei and produce energy (orange arrows). 
Given this we use for each isotope under consideration, the cross section for capture, 
fission, and also plug in the parameters of the decays. Then it is rather easy (and fast) to 
simulate the evolution of a given initial core composition ; in the same way, it is possible to 
« make a diversion » by manipulation the fuel composition at a choosen moment. As an 
example, the figure 3 show the evolution of a fresh core composed of Uranium enriched at 3.5 
% in 235U : the build up of 239Pu and 241Pu is rather well reproduced. 
- 4 - 
Figure 3 : In a new reactor the initial fuel consist of enriched uranium rods, with 
an 235U content typically at 3.5 %, the rest is 238U. As soon as the reactor is 
operating, reactions described by Bateman equations produce 239Pu (and 241Pu), 
which then participate to the energy production, at the expense of 238U. 
Knowing the amount of fissions at a given time, it is straight forward to translate that 
in a given antineutrino flux using the parametrisation of [Hub04], and finally using the 
interaction cross section for inverse ß decay reaction, to produce the recorded signal in a given 
detector placed at a suitable location from the reactor under examination. 
- 5 - 
Figure 4 : Positron spectrum recorded in an typical antineutrino detector (10 tons of 
target) placed at 150m of a nuclear reactor (1000 MWel). Positrons results from the 
inverse ß-decay reaction used in the detection of anti-neutrino. The signal is the 
superposition of several component whose spectrum exhibit small but sizeable 
differences, especialy at high energy. 
As an example of this type of computation, we show in figure 5, the effect of the 
modification of fuel composition after 100 days : here the operator, clever enough, knows that 
he cannot merely remove Plutonium from the core without changing the thermal power which 
will be immediatly noticed. Hence he takes the precaution to add 28 kg of 235U at the same 
time where he remove 20 kg of 239Pu : although the thermal power is kept constant, the 
imprint on the antineutrino signal, although modest, is such that, after 10 days, there is an 
increase of more than 1 σ in the number of interactions recorded. Such a diversion is clearly 
impossible in PWR or BWR, but more easy in Candu-type reactor, and even more in a molten 
salt reactor. 
- 6 - 
Figure 5 : An hypothetical diversion scenario where an exchange of 239Pu with 
235U is made such that the power does not change, but the antineutrino signal 
recorded by the monitor is slightly increased, giving some evidence of an 
abnormal operation. 
SIMULATIONS OF DIVERSION SCENARIOS 
The IAEA recommends the study of specific safeguards scenarios. Among its 
concerns are the confirmation of the absence of unrecorded production of fissile material in 
declared reactors and the monitoring of the burn-up of a reactor core. The time required to 
manufacture an actual weapon estimated by the IAEA (conversion time), for plutonium in 
partially irradiated or spent fuel, lies between 1 and 3 months. The significant quantity of Pu 
is 8 kg, to be compared with the 3 tons of 235U contained in a Pressurized Water Reactor 
(PWR) of power 900MWe enriched to 3%. The small magnitude of the researched signal 
requires a carefull feasability study.  
The proliferation scenarios of interest involve different kinds of nuclear power plants 
such as light water or heavy water reactors (PWR, BWR, Candu...), it has to include isotope 
production reactors of a few tens of MWth, and future reactors (e.g., PBMRs, Gen IV 
reactors, accelerator-driven sub-critical assemblies for transmutation, molten salt reactors). To 
perform these studies, core simulations with dedicated Monte-Carlo codes should be 
provided, coupled to the simulation of the evolution of the antineutrino flux and spectrum 
over time. 
We started a simulation work using the widely used particle transport code MCNPX 
[Mcn05], coupled with an evolution code solving the Bateman equations for the fission 
- 7 - 
products within a package called MURE (MCNP Utility for Reactor Evolution) [Mur05]. This 
package offers a set of tools, interfaced with MCNP or MCNPX, that allows to define easily 
the geometry of a reactor core. In the evolution part, it accesses, the set of evaluated nuclear 
data and cross sections. MURE is perfectly adapted to simulate the evolution with time of the 
composition of the fuel, taking into account the neutronics of a reactor core. We are adapting 
the evolution code to simulate the antineutrino spectrum and flux, using simple Fermi decay 
as starting point.  
The extended MURE simulation will allows to perform sensitivity studies by varying 
the Pu content of the core in the relevant scenarios for IAEA. By varying the reactor power, 
the possibility to use antineutrinos for power monitoring can be evaluated. 
Preliminary results show that nuclei with half-lives lower than 1s emit about 70% 
(50%) of the 235U( 239Pu) antineutrino spectrum above 6 MeV. The high energy part of the 
spectrum is the energy region where Pu and U spectra differ mostly. The influence of the ß 
decay of these nuclei on the antineutrino spectrum might be preponderant also in scenarios 
where rapid changes of the core composition are performed, e.g. in reactors such as Candu, 
refueled on line. 
The appropriate starting point for this scenario is a representative PWR, like the 
Chooz reactors. For this reactor type, simulations of the evolution of the antineutrino flux and 
spectrum over time will be provided and compared to the accurate measurement provided by 
the near detector of Double Chooz. This should tell the precision on the fuel composition and 
of an independent thermal power measurements. An interesting point to study is at the time of 
the partial refuelling of the core, thanks to the fact that reactors like Chooz (N4-type) does not 
use MOX fuel. 
Without any extra experimental effort, the near detector of the Double Chooz 
experiment will provide the most important dataset of anti neutrino detected (5x105 ν per 
year) by a PWR. The precise neutrino energy spectrum recorded at a given time will be 
correlated to the fuel composition and to the thermal power provided by EDF. This valuable 
dataset will constitute an excellent experimental basis for the above feasibility studies of 
potential monitoring and for bench-marking fuel management codes ; it is expected that 
individual component due to fissile element (235U, 239Pu) could be extracted with some 
modest precision and serve as a benchmark of this techniques. 
To fulfil the goal of non-proliferation additional lab tests and theoretical calculations 
should also be performed to more precisely estimate the underlying neutrino spectra of 
plutonium and uranium fission products, especially at high energies. Contributions of decays 
to excited states of daughter nuclei are mandatory to reconstruct the shape of each spectrum. 
Following the conclusion of P. Huber and Th. Schwetz [Hub04] to achieve this goal a 
reduction of the present errors on the anti-neutrino fluxes of about a factor of three is 
necessary. We will see that such improvement needs an important effort. 
Experimental effort  
The precise measurement of β-decay spectra from fission products produced by the 
irradiation of a fissile target can be performed at the high flux reactor at Institut Laue 
- 8 - 
Langevin (ILL) in Grenoble, where similar studies performed in the past [Sch85] are the basis 
of the actual fluxes of antineutrinos used in these reactor neutrino experiment. The ILL 
reactor produces the highest neutron flux in the world : the fission rate of a fissile material 
target placed close to the reactor core is about 1012 per second. It is possible to choose 
different fissile elements as target in order to maximize the yield of the nucleus of interest. 
Using the LOHENGRIN recoil mass spectrometer [Loh04], measurement of individual 
β−spectra from short lived fission products are possible ; in the same irradiation channel, 
measurements of integral ß-spectrum with the Mini-INCA detectors [Mar06], could be 
envisaged to perform study on the evolution with time of the antineutrino energy spectrum of 
a nuclear power plant. 
EXPERIMENTS WITH LOHENGRIN 
The LOHENGRIN recoil mass spectrometer offers the possibility to measure β-
decays of individual fission products. The fissile target (235U, 239Pu, 241Pu, …) is placed into a 
thermal neutron flux of 6.1014 n/cm2/s, 50 cm from the fuel element. Recoil fission products 
are selected with a dipolar magnetic field followed by an electrostatic condenser. At the end 
the fragments could be implanted in a moving tape, and the measurement of subsequent β and 
γ-rays are recorded by a β-spectrometer (Si-detector) and Ge-clover detectors, respectively. 
Coincidences between these two quantities could also be made to reconstruct the decay 
scheme of the observed fission products or to select one fission product. Fragments with half-
lives down to 2 µs can be measured, so that nuclei with large Qß (above 4 MeV) can be 
measured.  
The LOHENGRIN experimental objectives are to complete existing β-spectra of 
individual fission products [Ten89] with new measurements for the main contributors to the 
detected ν-spectra and to clarify experimental disagreements between previous measurements. 
This ambitious experimental programme is motivated by the fact - noted by C. Bemporad 
[Bem02] - that unknown decays contribute as much as 25% of the antineutrinos at energies > 
4MeV. Folding the antineutrino energy spectrum over the detection cross-section for inverse 
beta decay enhances the contribution of the high energy antineutrinos to the total detected flux 
by a factor of about 10 for Eν > 6 MeV. The focus of these experiments will be on neutron 
rich nuclei with yields very different in 239Pu and 235U fission. In the list : 86Ge,90-92Se, 94Br, 
96-98Kr, 100Rb, 100-102Sr, 108-112Mo, 106-113Tc,113-115Ru…contribute to the high energy part of the 
spectrum and have never been measured. 
IRRADIATION TESTS IN SUMMER 2005  
A test-experiment has been performed during two weeks last in summer 2005. The 
isobaric chains A=90 and A=94 were studied where some isotopes possess a high Qß energy, 
contributing significantly to the high energy part of the antineutrino spectra following 235U 
and 239Pu fissions and moreover produced with very different fission yields after 235U and 
239Pu fission [Eng94]. The well-known nuclei, such as 90Br, will serve as a test of the 
experimental set-up, while the beta decay of more exotic nuclei such as 94Kr and 94Br will 
constitute a test case for how far one can reach in the very neutron rich region with this 
experimental device. The recorded data (figure 6) will validate the simulation described in the 
- 9 - 
previous section, in particular the evolution over time of the isobaric chains beta decay 
spectra. 
Silicon detector
Germanium detector
Figures 6 : Beta energy spectrum (6a) recorded with the silicon detector 
corresponding to ß decay of fission products with mass A=94. The fission 
products arising from the LOHENGRIN spectrometer were implanted on a mylar 
tape of adjustable velocity in front of the silicon detector. The highest velocity was 
selected in order to enhance shorter-lived nuclei such as 94Kr and 94Br. The 
gamma energy spectrum (6b) obtained with the germanium detector 
corresponding and to the same runs is displayed also. 
INTEGRAL ß SPECTRA MEASUREMENTS 
In complement to individual studies on LOHENGRIN, more integral studies can be 
envisaged using the so called “Mini-INCA chamber” at ILL [Mar06] in return for adding a β-
spectrometer (to be developped). The existing α- and γ-spectroscopy station is connected to 
the LOHENGRIN channel and offers the possibility to perform irradiations in a quasi thermal 
neutron flux up to 20 times the nominal value in a PWR. Moreover, the irradiation can be 
repeated as many time as needed. It offers then the unique possibility to characterize the 
evolution of the ß spectrum as a function of the irradiation time and the irradiation cooling. 
The expected modification of the β spectrum as a function of the irradiation time is connected 
to the transmutation induced by neutron capture of the fissile and fission fragment elements. It 
is thus related to the “natural” evolution of the spent-fuel in the reactor. The modification of 
the β spectrum as a function of the cooling time is connected to the decaying chain of the 
fission products and is then a means to select the emitted fragments by their livetime. This 
- 10 - 
information is important because long-lived fission fragments accumulate in the core and after 
few days mainly contribute to the low energy part of the antineutrino-spectra. 
Due to the mechanical transfer of the sample from the irradiation location to the 
measurement station an irreducible delay time of 30 mn is imposed leading to the loss of 
short-live fragments. 
PROSPECT TO STUDY FISSION OF 238U 
The integral beta decay spectrum arising from 238U fission has never been measured. 
All information relies on theoretical computations [Vog89]. Some experiments could be  
envisaged using few MeV neutron sources in Europe (Van de Graaf in Geel, SINQ in PSI, 
ALVARES or SAMES accelerators at Valduc, …). Here the total absence of experimental 
data on the ß emitted in the fission of 238U change the context of this measurement compared 
to the other isotopes. Indeed any integral measurements performed could be used to constraint 
the present theoretical estimations of the antineutrino flux produced in the fission of 238U. In 
any case it seems rather difficult to fulfil the goal of a determination of the isotopic content 
from antineutrinos measurements as long as in important part of the energy spectrum is so 
poorly known. 
Conclusions 
After the preliminary studies, some thoughts can already be made. A realistic 
diversion (≈ 10 kg Pu) has an imprint in the antineutrino signal which is very small. The 
present knowledge on antineutrino spectrum emitted in fissions is not precise enough to allow 
a determination of the isotopic content in the core sensitive to such diversion. 
On the other hand, the thermal power measurement is a less difficult job. Neutrinos 
sample the whole core, without attenuation, and would bring valuable information on the 
power with totally different systematics than present methods. 
Even if its measurement is not dissuasive by itself, the operator cannot hide any stops 
or change of power, and in most case, such a record made with an external and independent 
device, virtually impossible to fake, will act as a strong constraint. 
In spite of the uncertainty mentioned previously, we see that the most energetic part 
offers the best possibility to disentangle fission from 235U and 239Pu. The comparison between 
the cumulative numbers of antineutrinos as a function of antineutrino energy detected at low 
vs. high energy is an efficient observable to distinguish pure 235U and 239Pu. 
IAEA seeks also monitoring large spent-fuel elements. For this application, the 
likelihood is that antineutrino detectors could only make measurements on large quantities of 
beta-emitters, e.g., several cores of spent fuel. In the time of the experiment the discharge of 
parts of the core will happen and the Double-Chooz experiment will quantify the sensitivity of 
such monitoring. 
More generally the techniques developed for the detection of antineutrinos could be 
applied for the monitoring of nuclear activities at the level of a country. Hence a KamLAND 
type detector deeply submerged off the coast of the country, would offer the sensitivity to 
- 11 - 
detect a new underground reactor located at several hundreds of kilometers. All these 
common efforts toward more reliable techniques, remotely operated detectors, not to mention 
undersea techniques will automatically benefit to both fields, safeguard and geo-neutrinos. 
References 
[Bem02] Bemporad et al.,Rev. of Mod. Phys., Vol. 74, (2002). 
[Ber02] A. Bernstein, Y. Wang, G. Gratta, and T. West, J. Appl. Phys. 91, 4672 
(2002) 
[Ber06] A. Bernstein, these proceeding 
[Dec95] Y. Declais et al., Nucl. Phys. B434, 503 (1995) 
[Eng94] T.R. England and B.F. Rider, ENDF-349, LA-UR-94-3106. 
[Hub04] P. Huber,  Th. Schwetz, Precision spectroscopy with reactor anti-
neutrinos Phys.Rev. D70 (2004) 053011 
[Kli94] Klimov et al., Atomic Energy, v.76-2, 123, (1994) 
[Las06] T. Lasserre, these proceeding 
[Loh04] ILL Instrument Review, 2004/2005. 
[Mik77] Mikaelian L.A. Neutrino laboratory in the atomic plant, Proc. Int. 
Conference Neutrino-77,  v. 2, p. 383-387 
[Mar06] F. Marie, A. Letourneau et al., Nucl. Instr and Meth A556 (2006) 547. 
[Mcn05] Monte Carlo N-Particle eXtended, LA-UR-05-2675, J.S.Hendricks et al. 
[Mur05] MURE : MCNP Utility for Reactor Evolution -Description of the 
methods, first applications and results. MÃl'plan O., Nuttin A., 
Laulan O., David S., Michel-Sendis F. et al. In Proceedings of the 
ENC 2005 (CD-Rom) (2005) 1-7. 
[Sch85] K. Schreckenbach, G. Colvin, W. Gelletly, F.v. Feilitzch, Phys. Lett. 
B160 (1985) 325 
[Ten89] O. Tengblad et al., Nuclear Physics A 503 (1989) 136-160. 
[Vog89] P. Vogel and J. Engel, Phys. Rev. D39, 3378 (1989)
ABSTRACT
  Triggered by the demand of the IAEA, neutrino physicists in Europe involved
with the Double Chooz experiment are studying the potential of neutrino
detection to monitor nuclear reactors. In particular a new set of experiments
at the ILL is planned to improve the knowledge of the neutrino spectrum emitted
in the fission of 235U and 239Pu.

<|endoftext|><|startoftext|>
Introduction
Two–dimensional massive Integrable Quantum Field Theories (IQFTs) have proven
to be one of the most successful topics of relativistic field theory, with a large
variety of applications to statistical mechanical models. The main reason for this
success consists of their simplified on–shell dynamics which is encoded into a set of
elastic and factorized scattering amplitudes of their massive particles [1, 2]. The two-
particle S-matrix has a very simple analytic structure, with only poles in the physical
strip, and it can be computed combining the standard requirements of unitarity,
crossing and factorization together with specific symmetry properties of the theory.
The complete mass spectrum is obtained looking at the pole singularities of the
S–matrix elements. Off–mass shell quantities, such as the correlation functions, can
be also determined once the elastic S–matrix and the mass spectrum are known.
In fact, one can compute the exact matrix elements of the (semi)local fields on the
asymptotic states with the Form Factor (FF) approach [3], and use them to write
down the spectral representation of the correlators. By following this approach, it
has been possible, for instance, to tackle successfully the long-standing problem of
spelling out the mass spectrum and the correlation functions of the two dimensional
Ising model in a magnetic field [2, 4], as well as many other interesting problems of
statistical physics (for a partial list of them see, for instance, [5]).
The S-matrix approach can be also constructed for massless IQFTs [6, 7, 8, 9],
despite the subtleties in defining a scattering theory between massless particles in
(1 + 1) dimensions, and turns out to be useful mainly when conformal symmetry
is not present. In this case, massless IQFTs generically describe the Renormaliza-
tion Group trajectories connecting two different Conformal Field Theories, which
respectively rule the ultraviolet and infrared limits of all physical quantities along
the flows.
Given the large number of remarkable results obtained by the study of IQFTs,
one of the most interesting challenges is to extend the analysis to the non–integrable
field theories, at least to those obtained as deformations of the integrable ones and
to develop the corresponding perturbation theory. The breaking of integrability
is expected to considerably increase the difficulties of the mathematical analysis,
since scattering processes are no longer elastic. Non–integrable field theories are
in fact generally characterized by particle production amplitudes, resonance states
and, correspondingly, decay events. All these features strongly effect the analytic
structure of the scattering amplitudes, introducing a rich pattern of branch cut
singularities, in addition to the pole structure associated to bound and resonance
states. For massive non–integrable field theories, a convenient perturbative scheme
was originally proposed in [10] and called Form Factor Perturbation Theory (FFPT),
since it is based on the knowledge of the exact Form Factors (FFs) of the original
integrable theory. It was shown that, even using just the first order correction of
the FFPT, a great deal of information can be obtained, such as the evolution of
their particle content, the variation of their masses and the change of the ground
state energy. Whenever possible, universal ratios were computed and successfully
compared with their value obtained by other means. Recently, for instance, it has
been obtained the universal ratios relative to the decay of the particles with higher
masses in the Ising model in a magnetic field, once the temperature is displayed away
from the critical value [11] (see also the contibution by G. Delfino in this proceedings
[12]). For other and important aspects of the Ising model along non-integrable lines
see the references [13, 14, 15, 16]. Applied to the double Sine–Gordon model [17], the
FFPT has been useful in clarifying the rich dynamics of this non–integrable model.
In particular, in relating the confinement of the kinks in the deformed theory to the
non–locality properties of the perturbed operator and predicting the existence of a
Ising–like phase transition for particular ratios of the two frequencies – results which
were later confirmed by a numerical study [18]. The FFPT has been also used to
study the spectrum of the O(3) non-linear sigma model with a topological θ term,
by varying θ [19, 20].
In this talk I would like to focus the attention on a different approach to tackle
some interesting non-integrable models, i.e. those two dimensional field theories
with kink topological excitations. Such theories are described by a scalar real field
ϕ(x), with a Lagrangian density
(∂µϕ)
2 − U(ϕ) , (1.1)
where the potential U(ϕ) possesses several degenerate minima at ϕ
a (a = 1, 2, . . . , n),
as the one shown in Figure 1. These minima correspond to the different vacua | a 〉
of the associate quantum field theory.
The basic excitations of this kind of models are kinks and anti-kinks, i.e. topological
configurations which interpolate between two neighbouring vacua. Semiclassically
they correspond to the static solutions of the equation of motion, i.e.
∂2x ϕ(x) = U
′[ϕ(x)] , (1.2)
with boundary conditions ϕ(−∞) = ϕ(0)a and ϕ(+∞) = ϕ(0)b , where b = a ± 1.
Denoting by ϕab(x) the solutions of this equation, their classical energy density is
(A) (B)
Figure 1: Potential U(ϕ) of a quantum field theory with kink excitations (A) and
istogram of the masses of the kinks (B).
given by
ǫab(x) =
+ U(ϕab(x)) , (1.3)
and its integral provides the classical expression of the kink masses
Mab =
ǫab(x) . (1.4)
It is easy to show that the classical masses of the kinks ϕab(x) are simply proportional
to the heights of the potential between the two minima ϕ
a and ϕ
: their istogram
provides a caricature of the original ptential (see Figura 1).
The classical solutions can be set in motion by a Lorentz transformation, i.e.
ϕab(x) → ϕab
(x± vt)/
1− v2
. In the quantum theory, these configurations de-
scribe the kink states | Kab(θ) 〉, where a and b are the indices of the initial and final
vacuum, respectively. The quantity θ is the rapidity variable which parameterises
the relativistic dispersion relation of these excitations, i.e.
E = Mab cosh θ , P = Mab sinh θ . (1.5)
Conventionally | Ka,a+1(θ) 〉 denotes the kink between the pair of vacua {| a 〉, | a+ 1 〉}
while | Ka+1,a 〉 is the corresponding anti-kink. For the kink configurations it may
be useful to adopt the simplified graphical form shown in Figure 2.
The multi-particle states are given by a string of these excitations, with the adja-
cency condition of the consecutive indices for the continuity of the field configuration
| Ka1,a2(θ1)Ka2,a3(θ2)Ka3,a4(θ3) . . .〉 , (ai+1 = ai ± 1) (1.6)
In addition to the kinks, in the quantum theory there may exist other excitations in
the guise of ordinary scalar particles (breathers). These are the neutral excitations
a,a+1K K a+1,a
| a+1>
| 0 >
| a  >
| n  >
Figure 2: Kink and antikink configurations.
| Bc(θ) 〉a (c = 1, 2, . . .) around each of the vacua | a 〉. For a theory based on a
Lagrangian of a single real field, these states are all non-degenerate: in fact, there
are no extra quantities which commute with the Hamiltonian and that can give
rise to a multiplicity of them. The only exact (alias, unbroken) symmetries for a
Lagrangian as (1.1) may be the discrete ones, like the parity transformation P , for
instance, or the charge conjugation C. However, since they are neutral excitations,
they will be either even or odd eigenvectors of C.
The neutral particles must be identified as the bound states of the kink-antikink
configurations that start and end at the same vacuum | a 〉, i.e. | Kab(θ1)Kba(θ2) 〉,
with the “tooth” shapes shown in Figure 3.
| 0 >
| 0 >
| 0 >
Figure 3: Kink-antikink configurations which may give rise to a bound state nearby
the vacuum | 0 〉a.
If such two-kink states have a pole at an imaginary value i ucab within the physical
strip 0 < Im θ < π of their rapidity difference θ = θ1 − θ2, then their bound states
are defined through the factorization formula which holds in the vicinity of this
singularity
| Kab(θ1)Kba(θ2) 〉 ≃ i
θ − iuc
| Bc 〉a . (1.7)
In this expression gcab is the on-shell 3-particle coupling between the kinks and the
neutral particle. Moreover, the mass of the bound states is simply obtained by sub-
stituing the resonance value i ucab within the expression of the Mandelstam variable
s of the two-kink channel
s = 4M2ab cosh
−→ mc = 2Mab cos
. (1.8)
Concerning the vacua themselves, as well known, in the infinite volume their
classical degeneracy is removed by selecting one of them, say | k 〉, out of the n
available. This happens through the usual spontaneously symmetry breaking mech-
anism, even though – stricly speaking – there may be no internal symmetry to break
at all. This is the case, for instance, of the potential shown in Figure 1, which does
not have any particular invariance. In the absence of a symmetry which connects the
various vacua, the world – as seen by each of them – may appear very different: they
can have, indeed, different particle contents. The problem we would like to examine
in this talk concerns the neutral excitations around each vacuum, in particular the
question of the existence of such particles and of the value of their masses. To this
aim, let’s make use of a semiclassical approach.
2 A semiclassical formula
The starting point of our analysis is a remarkably simple formula due to Goldstone-
Jackiw [23], which is valid in the semiclassical approximation, i.e. when the coupling
constant goes to zero and the mass of the kinks becomes correspondingly very large
with respect to any other mass scale. In its refined version, given in [24] and redis-
covered in [25], it reads as follows1 (Figure 4)
(θ) = 〈Kab(θ1) | ϕ(0) | Kab(θ2)〉 ≃
dx eiMab θ x ϕab(x) , (2.9)
where θ = θ1 − θ2.
1The matrix element of the field ϕ(y) is easily obtained by using ϕ(y) = e−iPµy
ϕ(0) eiPµy
and by acting with the conserved energy-momentum operator Pµ on the kink state. Moreover, for
the semiclassical matrix element FG
(θ) of the operator G[ϕ(0)], one should employ G[ϕab(x)]. For
instance, the matrix element of ϕ2(0) are given by the Fourier transform of ϕ2
Figure 4: Matrix element between kink states.
Notice that, if we substitute in the above formula θ → iπ − θ, the corresponding
expression may be interpreted as the following Form Factor
(θ) = f(iπ − θ) = 〈a | ϕ(0) | Kab(θ1)Kba(θ2)〉 . (2.10)
In this matrix element, it appears the neutral kink states around the vacuum | a〉
we are interested in.
Eq. (2.9) deserves several comments.
1. The appealing aspect of the formula (2.9) stays in the relation between the
Fourier transform of the classical configuration of the kink, – i.e. the solu-
tion ϕab(x) of the differential equation (1.2) – to the quantum matrix ele-
ment of the field ϕ(0) between the vacuum | a 〉 and the 2-particle kink state
| Kab(θ1)Kba(θ2) 〉.
Once the solution of eq. (1.2) has been found and its Fourier transform has
been taken, the poles of Fab(θ) within the physical strip of θ identify the
neutral bound states which couple to ϕ. The mass of the neutral particles can
be extracted by using eq. (1.8), while the on-shell 3-particle coupling gcab can
be obtained from the residue at these poles (Figura 5)
θ→i uc
(θ − iucab)Fab(θ) = i gcab 〈a | ϕ(0) | Bc 〉 . (2.11)
2. It is important to stress that, for a generic theory, the classical kink config-
uration ϕab(x) is not related in a simple way to the anti-kink configuration
ϕba(x). It is precisely for this reason that neighbouring vacua may have a
different spectrum of neutral excitations, as shown in the examples discussed
in the following sections.
ab ba
ab ba
Figure 5: Residue equation for the matrix element on the kink states.
3. It is also worth noting that this procedure for extracting the bound states
masses permits in many cases to avoid the semiclassical quantization of the
breather solutions [22], making their derivation much simpler. The reason
is that, the classical breather configurations depend also on time and have,
in general, a more complicated structure than the kink ones. Yet, it can be
shown that in non–integrable theories these configurations do not exist as
exact solutions of the partial differential equations of the field theory. On
the contrary, in order to apply eq. (2.9), one simply needs the solution of the
ordinary differential equation (1.2). It is worth notice that, to locate the poles
(θ), one only needs to looking at the exponential behavior of the classical
solutions at x → ±∞, as discussed below.
In the next two sections we will present the analyse a class of theories with only
two vacua, which can be either symmetric or asymmetric ones. A complete analysis
of other potentials can be found in the original paper [27].
3 Symmetric wells
A prototype example of a potential with two symmetric wells is the ϕ4 theory in its
broken phase. The potential is given in this case by
U(ϕ) =
ϕ2 − m
. (3.12)
Let us denote with | ±1 〉 the vacua corresponding to the classical minima ϕ(0)± =
. By expanding around them, ϕ = ϕ
± + η, we have
± + η) = m
2 η2 ±m
λ η3 +
η4 . (3.13)
Hence, perturbation theory predicts the existence of a neutral particle for each of
the two vacua, with a bare mass given by mb =
2m, irrespectively of the value of
the coupling λ. Let’s see, instead, what is the result of the semiclassical analysis.
The kink solutions are given in this case by
ϕ−a,a(x) = a
, a = ±1 (3.14)
and their classical mass is
ǫ(x) dx =
. (3.15)
The value of the potential at the origin, which gives the height of the barrier between
the two vacua, can be expressed as
U(0) =
M0 , (3.16)
and, as noticed in the introduction, is proportional to the classical mass of the kink.
If we take into account the contribution of the small oscillations around the
classical static configurations, the kink mass gets corrected as [22]
+O(λ) . (3.17)
It is convenient to define
> 0 ,
and also the adimensional quantities
; ξ =
1− πcg
. (3.18)
In terms of them, the mass of the kink can be expressed as
. (3.19)
Since the kink and the anti-kink solutions are equal functions (up to a sign), their
Fourier transforms have the same poles. Hence, the spectrum of the neutral particles
will be the same on both vacua, in agreement with the Z2 symmetry of the model.
We have
f−a,a(θ) =
dx eiMθ xϕ−a,a(x) = i a
By making now the analitical continuation θ → iπ−θ and using the above definitions
(3.18), we arrive to
F−a,a(θ) = 〈a | ϕ(0) | K−a,a(θ1)Ka,−a(θ2)〉 ∝
(iπ−θ)
) . (3.20)
The poles of the above expression are located at
θn = iπ (1− ξ n) , n = 0,±1,±2, . . . (3.21)
and, if
ξ ≥ 1 , (3.22)
none of them is in the physical strip 0 < Im θ < π. Consequently, in the range of
the coupling constant
1 + πc
= 1.02338... (3.23)
the theory does not have any neutral bound states, neither on the vacuum to the
right nor on the one to the left. Viceversa, if ξ < 1, there are n =
neutral
bound states, where [x] denote the integer part of the number x. Their semiclassical
masses are given by
= 2M sin
= n mb
n2 + ...
. (3.24)
Note that the leading term is given by multiples of the mass of the elementary boson
| B1〉. Therefore the n-th breather may be considered as a loosely bound state of n of
it, with the binding energy provided by the remaining terms of the above expansion.
But, for the non-integrability of the theory, all particles with mass mn > 2m1 will
eventually decay. It is easy to see that, if there are at most two particles in the
spectrum, it is always valid the inequality m2 < 2m1. However, if ξ <
, for the
higher particles one always has
mk > 2m1 , for k = 3, 4, . . . n . (3.25)
According to the semiclassical analysis, the spectrum of neutral particles of ϕ4 theory
is then as follows: (i) if ξ > 1, there are no neutral particles; (ii) if 1
< ξ < 1, there
Figure 6: Neutral bound states of ϕ4 theory for g < 1. The lowest two lines are the
stable particles whereas the higher lines are the resonances.
is one particle; (iii) if 1
< ξ < 1
there are two particles; (iv) if ξ < 1
there are
particles, although only the first two are stable, because the others are resonances.
Let us now briefly mention some general features of the semiclassical methods,
starting from an equivalent way to derive the Fourier transform of the kink solution.
To simplify the notation, let’s get rid of all possible constants and consider the
Fourier transform of the derivative of the kink solution, expressed as
G(k) =
dx eikx
cosh2 x
. (3.26)
We split the integral in two terms
G(k) =
dx eikx
cosh2 x
dx eikx
cosh2 x
, (3.27)
and we use the following series expansion of the integrand, valid on the entire real
axis (except the origin)
cosh2 x
(−1)n+1n e−2n|x| . (3.28)
Substituting this expression into (3.27) and computing each integral, we have
G(k) = 4i
(−1)n+1n
ik + 2n
−ik + 2n
. (3.29)
Obviously it coincides with the exact result, G(k) = πk/ sinh π
k, but this derivation
permits to easily interpret the physical origin of each pole. In fact, changing k to
the original variable in the crossed channel, k → (iπ − θ)/ξ, we see that the poles
which determine the bound states at the vacuum | a〉 are only those relative to
the exponential behaviour of the kink solution at x → −∞. This is precisely the
point where the classical kink solution takes values on the vacuum | a〉. In the case
of ϕ4, the kink and the antikink are the same function (up to a minus sign) and
therefore they have the same exponential approach at x = −∞ at both vacua | ±1〉.
Mathematically speaking, this is the reason for the coincidence of the bound state
spectrum on each of them: this does not necessarily happens in other cases, as we
will see in the next section, for instance.
The second comment concerns the behavior of the kink solution near the minima
of the potential. In the case of ϕ4, expressing the kink solution as
ϕ(x) =
2x − 1
2x + 1
, (3.30)
and expanding around x = −∞, we have
ϕ(t) = − m√
1− 2t+ 2t2 − 2t3 + · · · 2 (−1)ntn · · ·
, (3.31)
where t = exp[
2x]. Hence, all the sub-leading terms are exponential factors, with
exponents which are multiple of the first one. Is this a general feature of the kink
solutions of any theory? It can be proved that the answer is indeed positive [27].
The fact that the approach to the minimum of the kink solutions is always
through multiples of the same exponential (when the curvature ω at the minimum
is different from zero) implies that the Fourier transform of the kink solution has
poles regularly spaced by ξa ≡ ωπMab in the variable θ. If the first of them is within
the physical strip, the semiclassical mass spectrum derived from the formula (2.9)
near the vacuum | a 〉 has therefore the universal form
mn = 2Mab sin
. (3.32)
As we have previously discussed, this means that, according to the value of ξa,
we can have only the following situations at the vacuum | a 〉: (a) no bound state
if ξa > 1; (b) one particle if
< ξa < 1; (c) two particles if
< ξa <
; (d)
particles if ξa <
, although only the first two are stable, the others being
resonances. So, semiclassically, each vacuum of the theory cannot have more than
two stable particles above it. Viceversa, if ω = 0, there are no poles in the Fourier
transform of the kink and therefore there are no neutral particles near the vacuum
| a 〉.
4 Asymmetric wells
In order to have a polynomial potential with two asymmetric wells, one must nec-
essarily employ higher powers than ϕ4. The simplest example of such a potential is
obtained with a polynomial of maximum power ϕ6, and this is the example discussed
here. Apart from its simplicity, the ϕ6 theory is relevant for the class of universality
of the Tricritical Ising Model [28]. As we can see, the information available on this
model will turn out to be a nice confirmation of the semiclassical scenario. .
A class of potentials which may present two asymmetric wells is given by
U(ϕ) =
ϕ− b m√
ϕ2 + c
, (4.33)
with a, b, c all positive numbers. To simplify the notation, it is convenient to use the
dimensionless quantities obtained by rescaling the coordinate as xµ → mxµ and the
field as ϕ(x) →
λ/mϕ(x). In this way the lagrangian of the model becomes
L = m
(∂ϕ)2 − 1
(ϕ+ a)2(ϕ− b)2(ϕ2 + c)
. (4.34)
The minima of this potential are localised at ϕ
0 = −a and ϕ
1 = b and the
corresponding ground states will be denoted by | 0 〉 and | 1 〉. The curvature of the
potential at these points is given by
U ′′(−a) ≡ ω20 = (a+ b)2(a2 + c) ;
U ′′(b) ≡ ω21 = (a+ b)2(b2 + c) .
(4.35)
For a 6= b, we have two asymmetric wells, as shown in Figure 7. To be definite, let’s
assume that the curvature at the vacuum | 0 〉 is higher than the one at the vacuum
| 1 〉, i.e. a > b.
The problem we would like to examine is whether the spectrum of the neutral
particles | B 〉s (s = 0, 1) may be different at the two vacua, in particular, whether it
would be possible that one of them (say | 0〉) has no neutral excitations, whereas the
other has just one neutral particle. The ordinary perturbation theory shows that
both vacua has neutral excitations, although with different value of their mass:
m(0) = (a+ b)
2 (a2 + c) , m(1) = (a+ b)
2 (b2 + c) . (4.36)
Let’s see, instead, what is the semiclassical scenario. The kink equation is given
in this case by
= ±(ϕ + a)(ϕ− b)
ϕ2 + c . (4.37)
Figure 7: Example of ϕ6 potential with two asymmetric wells and a bound state only
on one of them.
We will not attempt to solve exactly this equation but we can present nevertheless its
main features. The kink solution interpolates between the values −a (at x = −∞)
and b (at x = +∞). The anti-kink solution does viceversa, but with an important
difference: its behaviour at x = −∞ is different from the one of the kink. As a matter
of fact, the behaviour at x = −∞ of the kink is always equal to the behaviour at
x = +∞ of the anti-kink (and viceversa), but the two vacua are approached, in this
theory, differently. This is explicitly shown in Figure 8 and proved in the following.
-4 -2 0 2 4
0.005
0.015
0.025
0.035
Figure 8: Typical shape of
, obtained by a numerical solution of eq. (4.37).
Let us consider the limit x → −∞ of the kink solution. For these large values
of x, we can approximate eq. (4.37) by substituting, in the second and in the third
term of the right-hand side, ϕ ≃ −a, with the result
≃ (ϕ+ a)(a+ b)
a2 + c , x → −∞ (4.38)
This gives rise to the following exponential approach to the vacuum | 0〉
ϕ0,1(x) ≃ −a+ A exp(ω0x) , x → −∞ (4.39)
where A > 0 is a arbitrary costant (its actual value can be fixed by properly solving
the non-linear differential equation). To extract the behavior at x → −∞ of the
anti-kink, we substitute this time ϕ ≃ b into the first and third term of the right
hand side of (4.37), so that
≃ (ϕ− b)(a+ b)
b2 + c , x → −∞ (4.40)
This ends up in the following exponential approach to the vacuum | 1〉
ϕ1,0(x) ≃ b− B exp(ω1x) , x → −∞ (4.41)
where B > 0 is another constant. Since ω0 6= ω1, the asymptotic behaviour of the
two solutions gives rise to the following poles in their Fourier transform
F(ϕ0,1) →
ω0 + ik
(4.42)
F(ϕ1,0) →
ω1 + ik
In order to locate the pole in θ, we shall reintroduce the correct units. Assuming to
have solved the differential equation (4.37), the integral of its energy density gives
the common mass of the kink and the anti-kink. In terms of the constants in front
of the Lagrangian (4.34), its value is given by
α , (4.43)
where α is a number (typically of order 1), coming from the integral of the adimen-
sional energy density (1.4). Hence, the first pole2 of the Fourier transform of the
kink and the antikink solution are localised at
θ(0) ≃ iπ
1− ω0
1− ω0
(4.44)
θ(1) ≃ iπ
1− ω1
1− ω1
2In order to determine the others, one should look for the subleading exponential terms of the
solutions.
If we now choose the coupling constant in the range
, (4.45)
the first pole will be out of the physical sheet whereas the second will still remain
inside it! Hence, the theory will have only one neutral bound state, localised at
the vacuum | 1 〉. This result may be expressed by saying that the appearance of a
bound state depends on the order in which the topological excitations are arranged:
an antikink-kink configuration gives rise to a bound state whereas a kink-antikink
does not.
Finally, notice that the value of the adimensional coupling constant can be chosen
so that the mass of the bound state around the vacuum | 1 〉 becomes equal to mass
of the kink. This happens when
. (4.46)
Strange as it may appear, the semiclassical scenario is well confirmed by an
explicit example. This is provided by the exact scattering theory of the Tricritical
Ising Model perturbed by its sub-leading magnetization. Firstly discovered through
a numerical analysis of the spectrum of this model [29], its exact scattering theory
has been discussed later in [30].
5 Conclusions
In this paper we have used simple arguments of the semi-classical analysis to in-
vestigate the spectrum of neutral particles in quantum field theories with kink ex-
citations. We have concentrated our analysis on two cases: the first relative to a
potential with symmetric wells, the second concerning with a potential with asym-
metric wells. Leaving apart the exact values of the quantities extracted by the
semiclassical methods, it is perhaps more important to underline some general fea-
tures which have emerged through this analysis. One of them concerns, for instance,
the existence of a critical value of the coupling constant, beyond which there are
no neutral bound states. Another result is about the maximum number n ≤ 2 of
neutral particles living on a generica vacuum of a non-integrable theory. An addi-
tional aspect is the role played by the asymmetric vacua, which may have a different
number of neutral excitations above them.
Acknowledgements
I would like to thank G. Delfino and V. Riva for interesting discussions. I am
particularly grateful to M. Peyrard for very useful and enjoyable discussions on
solitons. This work was done under partial support of the ESF grant INSTANS.
References
[1] A.B. Zamolodchikov and Al.B. Zamolodchikov, Ann. Phys. 120 (1979) 253.
[2] A.B. Zamolodchikov, Adv. Stud. Pure Math. 19 (1989), 641.
[3] F. A. Smirnov, Form Factors in Completely Integrable Models of Quantum Field
Theory, (World Scientific, Singapore, 1992); M. Karowski and P. Weisz, Nucl.
Phys. D 139, (1978), 455.
[4] G. Delfino and G. Mussardo, Nucl. Phys. B 455, (1995), 724; G. Delfino and P.
Simonetti, Phys. Lett. B 383, (1996), 450.
[5] G. Mussardo, Phys. Rept. 218 (1992), 215.
[6] Al.B.Zamolodchikov, Nucl.Phys. B 358, (1991), 524.
[7] A.B.Zamolodchikov and Al.B.Zamolodchikov, Nucl.Phys. B 379, (1992), 602.
[8] P. Fendley, H. Saleur and N.P. Werner, Nucl.Phys. B 430, (1994), 577.
[9] G.Delfino, G.Mussardo and P.Simonetti, Phys. Rev. D 51, (1995), 6622.
[10] G.Delfino, G.Mussardo and P.Simonetti,Nucl.Phys. B 473, (1996), 469.
[11] P. Grinza, G. Delfino and G. Mussardo, hep/th 0507133, Nucl. Phys. B in press.
[12] G. Delfino, Particle decay in Ising field theory with magnetic field, Proceedings
ICMP 2006.
[13] B.M. McCoy and T.T. Wu, Phys. Rev. D 18 (1978), 1259.
[14] P. Fonseca and A.B. Zamolodchikov, J.Stat.Phys.110 (2003), 527.
[15] S.B. Rutkevich, Phys. Rev. Lett. 95 (2005), 250601.
[16] P. Fonseca and A.B. Zamolodchikov, Ising Spectoscopy I: Mesons at T < Tc,
hep-th/0612304.
http://arxiv.org/abs/hep-th/0612304
[17] G. Delfino and G. Mussardo, Nucl. Phys. B 516, (1998), 675.
[18] Z. Bajnok, L. Palla, G. Takacs, F. Wagner, Nucl.Phys. B 601, (2001), 503.
[19] D. Controzzi and G. Mussardo, Phys. Rev. Lett. 92, (2004), 021601.
[20] D. Controzzi and G. Mussardo, Phys. Lett. B 617, (2005), 133.
[21] G. Delfino, P. Grinza and G. Mussardo, Nucl. Phys. B 737 (2006), 291.
[22] R.F.Dashen, B.Hasslacher and A.Neveu, Phys. Rev. D 10 (1974) 4130;
R.F.Dashen, B.Hasslacher and A.Neveu, Phys. Rev. D 11 (1975) 3424.
[23] J. Goldstone and R. Jackiw, Phys.Rev. D 11 (1975) 1486.
[24] R. Jackiw and G. Woo, Phys. Rev. D 12 (1975), 1643.
[25] G. Mussardo, V. Riva and G. Sotkov, Nucl. Phys. B 670 (2003), 464.
[26] G. Mussardo, V. Riva and G. Sotkov, Nucl. Phys. B 699 (2004), 545.
G. Mussardo, V. Riva and G. Sotkov, Nucl. Phys. B 705 (2005), 548
[27] G. Mussardo, Neutral bound states in kink-like theories, hep-th/0607025, to
appear on Nucl. Phys. B.
[28] A.B. Zamolodchikov, Sov.J.Nucl.Phys. 44 (1986), 529.
[29] M. Lassig, G. Mussardo and J.L. Cardy, Nucl. Phys. B 348 (1991), 591.
[30] F. Colomo, A. Koubek and G. Mussardo, Int. Journ. Mod. Phys. A 7 (1992),
5281.
http://arxiv.org/abs/hep-th/0607025
	Introduction
	A semiclassical formula
	Symmetric wells
	Asymmetric wells
	Conclusions
ABSTRACT
  In this talk we discuss an elementary derivation of the semi-classical
spectrum of neutral particles in two field theories with kink excitations. We
also show that, in the non-integrable cases, each vacuum state cannot
generically support more than two stable particles, since all other neutral
exitations are resonances, which will eventually decay.

<|endoftext|><|startoftext|>
Introduction
	Observations and data reduction
	The colour - magnitude diagrams
	Cluster parameters
	King 11
	Berkeley 32
	Summary and discussion
ABSTRACT
  We have obtained CCD BVI imaging of the old open clusters Berkeley 32 and
King 11. Using the synthetic colour-magnitude diagram method with three
different sets of stellar evolution models of various metallicities, with and
without overshooting, we have determined their age, distance, reddening, and
indicative metallicity, as well as distance from the Galactic centre and height
from the Galactic plane. The best parameters derived for Berkeley 32 are:
subsolar metallicity (Z=0.008 represents the best choice, Z=0.006 or 0.01 are
more marginally acceptable), age = 5.0-5.5 Gyr (models with overshooting;
without overshooting the age is 4.2-4.4 Gyr with poorer agreement),
(m-M)_0=12.4-12.6, E(B-V)=0.12-0.18 (with the lower value being more probable
because it corresponds to the best metallicity), Rgc ~ 10.7-11 kpc, and |Z| ~
231-254 pc. The best parameters for King 11 are: Z=0.01, age=3.5-4.75 Gyr,
(m-M)_0=11.67-11.75, E(B-V)=1.03-1.06, Rgc ~ 9.2-10 kpc, and |Z| ~ 253-387 pc.

<|endoftext|><|startoftext|>
Introduction
The genetic programming (GP) bibliography1, created and maintained by
one of us (WBL) and by S. Gustafson contains most of the GP papers. As
such, it is a rich source of data that implicitly describes many aspects of
the structure of the GP community. Searching the bibliography and looking
at the images2 provides a lot of useful information about the field and the
people working on GP. However, a deeper analysis of the data, that goes
beyond the mere pictorial aspect, provides a much more complete view.
The coauthorship data is a social network since collaborating in a research
∗Information Systems Department, University of Lausanne, Switzerland
†Information Systems Department, University of Lausanne, Switzerland
‡Dpt. of Animal Production Epidemiology and Ecology, University of Torino, Italy
§Department of Computer Science, University of Essex, UK
1http://www.cs.bham.ac.uk/∼wbl/biblio/
2http://www.cs.bham.ac.uk/∼wbl/biblio/gp-coauthors/
http://arxiv.org/abs/0704.0551v1
study usually requires that the coauthors become personally acquainted.
Thus, studying those ties, their structure, and their evolution allows a better
understanding of the factors that shape scientific collaboration.
We present a systematic study of the GP coauthorship data base us-
ing methods and tools pertaining to complex networks and social network
analysis. Social network analysis (see [?] for a survey), although it is an
old discipline, has recently received new impetus and tools from the field of
complex networks (see [?] for an excellent review). This is mainly due to
the relatively recent availability of large machine-readable databases such as
the GP bibliography. Social acquaintances involve psychological and other
human aspects that are difficult to quantify. However, as it has been done
in other fields [?, ?, ?, ?], we use objective data such as joint published work
to stand for social bonds. Since this must ignore subtler aspects of a col-
laboration relationship, it is obviously far from perfect as a social indicator,
yet it is still a good “proxy” for the network of social relationships and can
reveal several interesting facts and trends.
A preliminary investigation of the GP coauthorship network appears in
[?]. In the first part of this article we update this initial study using the
most recent data and adding the study of the influence of excluding co-
edited proceedings and books. In the second part we offer a new analysis of
the finer community structure of the collaboration network. Similar studies
have been performed in the last few years on several other collaboration
networks in disciplines such as physics, mathematics, medicine, biology, and
computer science [?, ?, ?, ?]. A related investigation concerning the EC
collaboration network [?] has appeared recently in popular form, but it does
not take into account, for example the community structure of the network.
[?] deals with some of the same statistical features for the EC community
at large as we describe in detail here for GP. The values reported by [?] are
in line with those found here for the GP field. Given that the intersection
between the GP researchers and general EC is likely to be rather large, it
would be interesting to study how they are related to each other.
2 The GP Collaboration Network
We treat the genetic programming social network as a graph where each
node is a GP researcher, i.e. someone who has at least one entry in the bib-
liography. There is a connection between two people if they have coauthored
at least one paper, or if they have coedited one or more book or proceedings.
As of the start of 2007, there is a total of N = 2809 connected nodes, i.e.
authors that have at least one GP collaborator, and a total of 5853 edges
(collaborations) in the GP coauthorship network. There are 367 isolated
vertices, which represent authors who have not collaborated with others to
the extent of coauthoring a paper. Isolated vertices are ignored in our graph
statistics. We have also excluded a single paper with 108 coauthors in a
nuclear physics journal. This is because we consider it to be an anomalous
entry that is not representative of typical collaborations in our discipline.
Due to the youth of GP, the graph is relatively small compared to some
studied collaboration networks [?, ?, ?]. (Although some published studies
have covered much smaller and more specialised networks, e.g. of only 50
people [?].) The main disadvantage of studying a relatively small database
is that, like any statistical study, more data allows deeper and more mean-
ingful inferences to be drawn. In particular, studies of the form of the
distributions (such as whether they follow exponential or power laws) re-
quire a large amount of data. The advantages include that the graph almost
fully represents the state of the whole GP community. This allows reliable
characterisation of collaboration in the community. Also, the problems of
multiple authors with the same name (e.g. A. Smith), outliers and different
name spelling that plague the larger data sets, are unlikely and easy to spot
in our data.
Although in many cases in our field co-editing a book or proceedings vol-
ume does reflect personal acquaintance, there are some large coeditorships
which are not representative and so may give a slanted view. Therefore in
the following figures we present two kinds of statistics: those that include
all joint publications and those in which co-edited conference proceedings
and co-edited books are excluded (but not their contents, of course). Next
we present and discuss some basic measures that characterise the GP col-
laboration network.
2.1 Number of Papers per Author
The average number of papers per author is 3.16 with co-edited books and
proceedings and it is 3.14 without. The five most prolific authors are, in de-
creasing order: J. Koza, R. Poli, W. B. Langdon, W. Banzhaf and C. Ryan.
If we exclude proceedings’ co-editors the ranking remains unchanged. Nat-
urally the distribution of the number of papers per author, P (k), has some
scatter, particularly in the tail of the distribution. Thus, we present in Fig-
ure 1 the graph of the cumulative distribution P (k ≥ n) which is smoother
and allows the same inferences to be made. The curves are rather well fitted
by a straight line, and thus the distributions follow a power-law P (k) ∝ k−γ
10000
1 10 100 1000
number of entries
with coeditors
without coeditors
power-law fit
Figure 1: Cumulative distribution of the number of entries per author. Log-
log scale. The straight line is the best mean-square fit and shows the number
of authors is ∝ k−2.5.
with a calculated exponent γ of 2.5 for both of them. A power-law distribu-
tion with similar exponents has been observed for analogous collaboration
networks, e.g. 2.86 for a biological publication database (Medline), 3.41 for
a computer science database (NCSTRL), 2.4 for mathematics, and 2.1 for a
neuroscience papers database [?, ?]. A smaller exponent (in absolute value)
means that the tail of the distribution is more stretched towards high values
of degree.
2.2 Number of Collaborators per Author
The average number of collaborators per author, i.e. the mean degree 〈k〉 of
the coauthorship graph, is 4.17 with proceedings and 3.62 without. This is
close to the values reported by studies of computer science, physics (exclud-
ing high energy physics) and Mathematics, suggesting GP follows similar
collaboration patterns to those disciplines. However it is much less than
found in high energy physics and medicine. See Table 1. In order and
including co-edited volumes, the five authors that have the largest num-
ber of collaborators are: W. Banzhaf, J.A. Foster, P. Nordin, W.B. Lang-
don, U.-M. O’Reilly. Without co-edited books the ranking is: P. Nordin,
W. Banzhaf, J. Daida, C. Ryan and R. Goodacre. The five “pairs” that
have the highest number of coauthored papers are, in decreasing order both
with or without co-edited proceedings: J. Koza–M. A. Keane, R. Poli–W.B.
Langdon, J. Koza–D. Andre, J. Koza–F. Bennet and F. Bennet–M.A. Keane.
This shows that J. Koza’s group has been tightly collaborating for a long
time, a conclusion that is confirmed in the community study of section 4.
It is also evident that the W.B. Langdon–R. Poli association has been an
extremely productive one.
10000
1 10 100
number of collaborators
with coeditors
without coeditors
Figure 2: Cumulative distribution of the number of authors with a given
number of collaborators. Logarithmic scale on both axes.
Figure 2 shows the cumulative distributions of the number of collabora-
tors. One sees that the distributions are not pure power-laws, otherwise the
points would approximately lie on a straight line. Rather, the distributions
shows a power-law regime in the first part followed by an exponential decay
in the tail. That is, the whole network cannot be fitted by a power-law.
This is quite common. In fact, several measured social networks do not
follow a power-law degree distribution [?, ?] and are best fitted either by
an exponential degree distribution P (k) ≈ e−k/〈k〉 or by an exponentially
truncated power-law of the type P (k) ≈ k−γe−k/kc , where kc represents a
critical connectivity and 〈k〉 is the average degree.
2.3 Number of Authors per Paper
Figure 3 shows the cumulative distribution of the number of papers written
by a given number of coauthors. Here the distribution also has a tail that is
longer than that of a Gaussian or exponential distribution, however it does
not follow a power-law. The average number of authors per paper is 2.25
(2.22 without co-editors). From Table 1 we can see that these figures are
close to the equivalent ones for computer science (NCSTRL) and physics,
while Mathematics has a lower number of co-authors per paper. On the
other hand, nuclear physics stands out with an unusually high number of
coauthors per paper.
10000
1 10 100
number of coauthors
with coeditors
without coeditors
Figure 3: Cumulative distribution of the number of papers with a given
number of coauthors on log-log scales.
From Figures 2 and 3 one can see that the tails of the distribution with
co-editors are longer than without them. Thus, taking co-editorship into
account seems to rather artificially inflate the number of publications with
many co-authors and, by consequence, the number of collaborators that a
person has.
2.4 Connected Components
In the theory of Poisson random graphs there is a critical value of average de-
gree 〈k〉 = 1 above which there is a sudden appearance of a giant component.
This is so-called since most vertices belong to it. The other components are
smaller and have an exponentially decreasing size distribution [?]. Although
collaboration graphs are not random, a similar phenomenon appears. In-
cluding coeditors there are 1025 GP authors in the giant component. This
is 36.5% of the total graph. If we exclude coediting proceedings etc. the
size is 743, representing the 26.9% of the total. In the giant component the
average number of collaborators per author is 5.83 with co-editors and 4.39
without them.
The cumulative size distribution of the connected components with and
without co-editors are depicted in Figure 4. Figure 4 shows that the proba-
bility density functions are well approximated by a power law with exponent
of 2.9 (excluding co-editors) and 2.6 (total). Since the other authors did not
provide the analogous data for their databases, we do not know how our
figures would compare with those for other coauthorship databases.
The existence of a big connected component has a social meaning. It
suggests 36.5% of GP researchers are members of a single community, since
those researchers are either directly connected via a collaboration or they
are close to each other in a way that will be made clear in section 3. The
size of the giant component is notably smaller in the GP graph with respect
to other measured coauthorship networks (see Table 1). This may be due
to the comprehensive nature of the GP bibliography. It captures work done
by smaller groups which does not get into major journals, whereas, perhaps,
the other databases concentrate upon higher impact outlets where work is
heavily cited but at the expense of ignoring less regarded authors. This
may artificially inflate the fraction of authors within their giant component.
Alternatively it may be due to the youth of the GP field, with many semi-
isolated individuals and groups starting research independently.
One should also consider that all collaboration networks are in a non-
equilibrium state as they are continuously evolving [?]. Accordingly, as
time goes by, one should observe small components progressively connecting
themselves to the large one. For example, in less than one year the size of
the giant component including co-editors has grown from 942 to 1025 nodes.
This is due in part to a number of newcomers collaborating with people
already belonging to the giant component. The other part comes from the
absorption of a few disconnected small components into the giant one thanks
to one or more new collaborations. This suggests that the size of the giant
component has not yet reached its “steady-state” value and it will continue
to grow in relative size. Since we possess all the time-stamped data, it is
possible to study the evolution of this component, as well as several other
indicators from the beginning and up to the present days. This investigation
is currently under way.
2.5 Social GP Clusters
The clustering coefficient of a node in a graph is the proportion of its neigh-
bouring nodes which are also neighbours of each other. The average clus-
tering coefficient 〈C〉 is calculated across all nodes in the graph [?]. In other
words, 〈C〉 is a simple statistical measure of the amount of local structure
1 10 100 1000
component size
with coeditors
without coeditors
power-law fit
power-law fit
Figure 4: Cumulative distributions of the number of connected components
in the collaboration graph by number of people. Log-log scale.
that is present in a graph. Most real-world networks, e.g. the world wide
web, roads, electrical power transmission and including the social networks
that have been studied to date, have a much larger clustering coefficient
than would be expected of a random graph with the same number of ver-
tices and edges. Social networks are particularly clustered. For example,
the average clustering coefficient is 〈C〉 = 0.665 for the GP collaboration
graph including book co-editors, and it is 0.660 without. (We would expect
0.0015 and 0.0013 for the corresponding random graphs). In terms of scien-
tific collaborations, a high clustering coefficient means that people tend to
collaborate in groups of three or more. This agrees with what we know of
the GP field. It may mean that two researchers that collaborate indepen-
dently with a third one may, in time, become acquainted and so collaborate
themselves. Alternatively it might be due to collaborators coming from the
same institution. In all cases, a high value of 〈C〉 for a social network is
an indication that collaborations are not made at random at all, and that
social forces and processes are at work in the network structure formation.
Table 1 summarises the results of this section and compares them with
those for some other collaboration networks. Some of the entries in the table
will be discussed in the following section. Most GP statistics are similar to
those of the larger databases. However one notable difference, as we have
already remarked, is the relative smallness of the largest component. The
clustering is rather high, which shows that GP researchers know each other
quite well within the large component, and the community is rather homoge-
Table 1: Basic statistics for some scientific collaboration networks. GP1 is
the GP bibliography at the start of 2007, including coedited books and pro-
ceedings. GP2 is the same but without coeditors. SPIRES is a data set of
papers in high-energy physics. Medline is a database of articles on biomed-
ical research. Mathematics comprises articles from Mathematical Reviews.
NCSTRL is a database of preprints in computer science. Physics has been
assembled from papers posted on the Physics E-print Archive. Details about
these databases can be found in [?, ?, ?].
GP1 GP2 SPIRES Medline Mathematics NCSTRL Physics
Total number of papers 4564 4504 66652 2163923 1600000 13169 98502
Total number of authors 2809 2765 56627 1520251 253339 11994 52909
Average papers per author 3.16 3.14 11.6 6.4 7 2.55 5.1
Average authors per paper 2.25 2.22 8.96 3.754 1.5 2.22 2.53
Average collaborators per author 4.17 3.62 173 18.1 2.94 3.59 9.7
Size of the giant component (%) 36.5 26.9 88.7 92.6 82.0 57.2 85.0
Clustering coefficient 0.665 0.660 0.726 0.066 0.15 0.496 0.43
Average path length 4.74 5.2 4.0 4.6 7.73 9.7 5.9
neous. In contrast, in biology and medicine or mathematics, where scientist
from different sub-disciplines seldom collaborate, the clustering coefficient is
lower. Note also the high number of authors per paper, and especially the
strikingly high number of collaborators per author in the nuclear physics
community (SPIRES). Clearly, nobody can maintain an average of 173 sci-
entific partners on a first-hand acquaintance basis and thus this figure does
not seem to be socially meaningful.
3 Distances and Centrality
A social network can be characterised by a number of measures that give an
idea of “how far” people are from each other, or how “central” they are with
respect to the whole community. These measures are well known in social
network analysis. Here we shall concentrate on average path length and on
betweenness centrality.
3.1 Average Path Length
The average path length L of a graph is the average value of the shortest
paths between all of its pairs of vertices. In random graphs and many real
networks, such as the Internet, the World Wide Web and social networks,
the average path distance scales as a logarithmic function O(logN) of the
number of vertices N . Such networks, if they also have a high clustering
coefficient, are known as small worlds networks [?]. Since, even for very large
graphs, any two nodes in a small world network are only a few steps apart.
In contrast in regular lattices, two nodes are O(N
D ) apart. (Where D is the
lattice’s dimensionality. For example, for a square lattice L ≤ 2
2 ). The
average path length of the giant component of the GP collaboration graph
including coeditors is 4.74 (it is 5.2 without coeditors). The longest among
all the shortest paths (known as the diameter) is 12 (14 without coeditors).
Thus, unsurprisingly, the GP community, as far as its “core” component
is concerned, is indeed a small world and is characterised by values that
are typical of these kinds of network (see Table 1). Being a small world
means that information may circulate quickly and collaborations are easier
to set up. These are clearly advantageous for a research community. The
connected components following the largest one are themselves small worlds.
We expect over time some of them will merge with the largest component.
(For this to happen, only a single new collaboration between two scientists
each belonging to one of the components is needed.)
3.2 Betweenness
The betweenness b(v) of a vertex v is the total number of shortest paths be-
tween all possible pairs of vertices that pass through this vertex. Nodes that
have a high betweenness potentially have more influence, i.e. they are more
central in the network, in that there is more “traffic” that goes through
them. The first five authors in terms of betweenness in the network (in-
cluding co-editors and in decreasing order) are: W. Banzhaf, H. Iba, U.-M.
O’Reilly, H. de Garis and W. B. Langdon. W. Banzhaf is also the researcher
that has the highest number of different collaborators. Without co-editors
the ranking is: W. B. Langdon, U.-M. O’Reilly, W. Banzhaf, M. Tomassini
and P. Nordin. People who have a large value of betweenness play the role
of intermediaries or “brokers” in a social sense.
3.3 Non-random collaborations between
directly connected authors
Most technological and biological networks are disassortive in that they have
negative correlation, meaning that high-degree vertices are preferentially
connected to low-degree vertices. However most measured social networks
are assortative, meaning highly connected nodes tend to be connected with
other highly connected nodes [?]. The GP collaboration network confirms
this general observation with a correlation coefficient of +0.15 for the gi-
Powered by yFiles
Lanz Pizz
Figure 5: One of the communities belonging to the main network compo-
nent. The thickness of the links gives an indication of the number of co-
authored papers. The largest thickness indicates more than 16 coauthored
works. The thinnest link (light gray) stands for a single collaboration. The
different symbols and colours represent sub-communities of the illustrated
community.
ant component, and +0.30 for the whole graph (including coeditors and
excluding the single physicist’s paper). These are close to the coefficients
observed for other social networks (specifically 0.127 for Medline and 0.120
for Mathematics [?]).
4 Communities in the Giant Component
All the researchers belonging to the largest component of the network can be
said to form part of the GP community at large, in the sense that they are all
only a few steps away from any other member of the community. However,
we know from direct experience that some groups of GPers are more closely
connected between themselves than with other people. In other words, they
belong to what one might call a group or a tighter community within the
global one. It is not easy to give a rigorous quantitative definition of a
community within a network. For our purposes a community can be seen
as a set of highly connected vertices having few connections with vertices
belonging to other communities. In the analysis of social networks, several
algorithms that attempt to split a network into communities have been
proposed. We used Newman’s method [?], which is based on a measure
of the fraction of edges that fall within communities minus the expected
value of the same quantity if edges fall at random without regard for the
community structure.
Since the GP bibliography contains the number of papers that any two
collaborators have published together, it is possible to go a step further
than just saying that two people have coauthored at least a paper, and
give a measure of the intensity of the collaboration. We use the number of
papers that two given authors have in common as a measure of the strength
of their collaboration. Newman [?] has proposed a more refined measure
which takes into account the actual number of coauthors of each paper.
However this is more complicated than we need, instead we ignore the total
number of coauthors for each paper. Our measure of collaboration strength
is used in our communities algorithm to highlight groups of researchers that
have collaborated strongly with the aim of uncovering the stability of the
scientific relationship. We have also excluded coedited proceedings, books,
etc., as we have already seen that these might sometimes represent spurious
collaboration relationships.
The results of running the algorithm on the subgraph represented by the
largest connected component are qualitatively surprisingly close to what one
would expect, given our knowledge of the GP field. The advantage is that the
analysis makes them explicit and uncovers a number of other relationships
that would be difficult to infer without an explicit study of the raw data.
As an example of the about 25 communities that the algorithm discovers,
Figure 5 shows the structure of the groups around one of us (“Toma”). If we
now consider this community as an isolated subgraph and apply again New-
man’s algorithm to it, we obtain the groups highlighted by different symbols
Powered by yFiles
Chio BuxtHoll
Figure 6: Another community belonging to the main network component.
The thickness of the links gives an indication of the number of co-authored
papers. The largest thickness indicates more than 16 coauthored works.
The thinnest link (light gray) stands for a single collaboration. The different
symbols and colours represent sub-communities.
and colours in the figure. Thus, the groups correspond to sub-communities
within the main community. The thickness of the links represents the inten-
sity of the relationship. It is easy to recognise a “hard core” of collaborating
researchers strongly connected to “Toma” forming triads and higher poly-
gons of order four and five. The strong triangle (“Foli”, “Pizz”, “Spez”) is
relatively loosely connected to the rest, showing that these researchers be-
long to the community but often collaborate between themselves. It is also
possible to discern institutional and geographical components in the com-
munity. For example, most of the upper right part of the figure through the
node “Chop” comprises researchers essentially belonging to the University
of Geneva, which is close to the University of Lausanne, to which “Toma”
belongs. However, geographical closeness is not the key factor in the other
groups which belong to Universities in France, Italy, Spain, and the US.
We might conjecture that many collaborations start locally at the same or
at close institutions and then they spread through people being introduced
to others via a common acquaintance, or through people physically moving
or visiting other institutions. This is the case in the figure, where “Vann”,
”Chop”, and “Vega” among others have played the role of “bridges” between
different institutions and across countries.
As a second illustration, let us look at Figure 6 which is the community
that revolves around one of us (“Lang”) and “Poli”. In contrast to the pre-
vious case, one can see that the graph structure is more “star-like”, with two
large directly connected big hubs (“Lang” and “Poli”) who have about 70 co-
authored papers, and three other highly connected nodes (“Buxt”,“McPh”,
“Rowe”) which are strongly connected to one of the main hubs but not to
both. It is interesting to observe the role of “McPh” who, like “Vega” in the
previous community (cf. Figure 5), plays a bridging role, this time between
some UK and some American researchers. We can also recognise a strong
”theory-oriented” group, which is almost a clique in the graph, formed by
(“McPh”,“Poli”,“Rowe”, “Steph”, “Wrig”). There is also another bridge
formed by “Cagn” from UK to Italy, again due to a long-standing collabo-
ration and friendship. The small cliques or almost cliques at the periphery
of the figure essentially represent people that have worked at the same in-
stitution in either Italy or the US.
The discussion above, motivated by our belonging to the mentioned com-
munities, and thus by our direct human knowledge about them, should be
enough to get an impression of the many useful observations that one can
make on the communities that interlock in the main network component.
There are of course several other large well known and interesting commu-
nities in the network but unfortunately we cannot describe them here for
reasons of space.
5 Conclusions
In sections 2 and 3 we characterised the genetic programming (GP) coau-
thorship graph using a number of local and global statistics. We extended
and updated the findings presented in [?] by studying the influence of
coedited volumes and by using the latest data available. Section 3 showed
the GP field to be highly clustering and that the GP coauthorship network
has a small mean path length. Together these suggest that, at least for the
core, GP is indeed a “small world”. We also found, compared with other
published collaboration networks, that the fraction of GP authors connected
by coauthorship is a relatively small fraction of all GP authors.
Section 4 is a study of the community structure of GP. It uses a more pre-
cise definition of collaboration, which takes into account the intensity of the
relationship. This uncovers many groups of tightly interacting researchers.
From the detailed study of two of the communities we have drawn inferences
about the pivotal role of some researchers or groups of researchers in pro-
moting collaborations within and between academic institutions. Adding
our human knowledge about geographical location and personal acquain-
tance, allows some conjectures to be drawn about the way in which different
continents and countries collaborate on research projects.
It should be obvious that the present data driven approach to social
network analysis can only provide some answers but not all of them. Algo-
rithms and data cannot take into account human aspects such as friendship
in scientific collaboration. While these may be buried in the sea of numbers
they will never appear explicitly from such analyses. Nevertheless, we feel
that our results are interesting and useful in the way that they characterise
our community.
There is another aspect of the collaboration graph that would be re-
vealing: the analysis of its development over the years. Indeed, since each
paper has a date of publication, we possess all the data that are needed
for such an investigation. This would allow the detailed study of how the
network has grown to its present size and structure from the beginning and
might give hints as to its future progress. This extension is currently under
investigation.
	Introduction
	The GP Collaboration Network
	Number of Papers per Author
	Number of Collaborators per Author
	Number of Authors per Paper
	Connected Components
	Social GP Clusters
	Distances and Centrality
	Average Path Length
	Betweenness
	Non-random collaborations between directly connected authors
	Communities in the Giant Component
	Conclusions
ABSTRACT
  Useful information about scientific collaboration structures and patterns can
be inferred from computer databases of published papers. The genetic
programming bibliography is the most complete reference of papers on GP\@. In
addition to locating publications, it contains coauthor and coeditor
relationships from which a more complete picture of the field emerges. We treat
these relationships as undirected small world graphs whose study reveals the
community structure of the GP collaborative social network. Automatic analysis
discovers new communities and highlights new facets of them. The investigation
reveals many similarities between GP and coauthorship networks in other
scientific fields but also some subtle differences such as a smaller central
network component and a high clustering.

<|endoftext|><|startoftext|>
arXiv:0704.0552v1  [astro-ph]  4 Apr 2007
The Expanding Photosphere Method: Progress
and Problems
József Vinkó and Katalin Takáts
Department of Optics & Quantum Electronics, University of Szeged, Hungary
Abstract. Distances to well-observed Type II-P SNe are determined from an updated version of
the Expanding Photosphere Method (EPM), based on recent theoretical models. The new EPM
distances show good agreement with other independent distances to the host galaxies without any
significant systematic bias, contrary to earlier results in the literature. The accuracy of the method
is comparable with that of the distance measurements for Type Ia SNe.
Keywords: supernovae; core-collapse; distances
PACS: 97.10.Vm, 97.60.Bw
INTRODUCTION
Distance is one of the most fundamental quantities in astrophysics, and it is especially
true for supernovae. Type Ia SNe are thought to be the most reliable distance indicators,
even up to z ∼ 1.5 redshift, and they play major role in determining the expansion of the
Universe as well as the cosmic equation of state. On the other hand, accurate distances
to SNe are crucial in understanding not only their physical properties, but also revealing
their progenitor objects and the possible explosion mechanisms.
The Expanding Photosphere Method (EPM) is a tool for measuring distances to SNe
that have large amount of ejected material [1]. The concept of EPM is based on a few
assumptions about the general physics of the expanding ejecta. These are the followings:
1. The expansion of the ejected material is spherically symmetric.
2. The ejecta is expanding homologously, i.e. R(t) = v(R) · (t − te), where R(t) is the
time-dependent radius of a particular layer in the ejecta, v(R) is the (constant)
expansion velocity of this layer and t − te is the time elapsed since the moment
of explosion (te).
3. The ejecta is optically thick, i.e. there exists a layer where the optical depth τλ ∼ 1.
This layer is the „photosphere” (Rphot ). Because of the expansion, the location of
the photosphere moves inward the ejecta, so its velocity (vphot ) is decreasing with
time.
4. The photosphere radiates as a blackbody, so the shape of the emergent flux spec-
trum is Planckian with a well-defined effective temperature Te f f . However, the ab-
solute flux value differs from that of the blackbody due to the dominance of scatter-
ing opacity over true absorption in the ejecta. The deviation from the blackbody can
be described with a simple scaling, i.e. Fλ = ζ 2πBλ (T ). where Fλ is the surface
flux, Bλ (T ) is the Planck function and ζ is the correction (or “dilution”) factor.
http://arxiv.org/abs/0704.0552v1
These assumptions are most likely to be valid in Type II-P SNe. These eject a massive,
hydrogen-rich envelope that remains optically thick for ∼ 100 days after explosion, and
the emergent spectrum is indeed close to be Planckian. Thus, EPM is expected to work
best for such SNe.
Based on the assumptions, the instantaneous radius of the photosphere can be ex-
pressed as Rphot = vphot(t) · (t − te) (the radius of the progenitor is usually neglected).
Meantime, the observed flux is fλ = θ 2 · ζ 2πBλ (T ), where θ = Rphot/D is the angular
radius of the photosphere from distance D. Combining these two equations, one gets the
basic equation of EPM [2, 3]:
t = te+D ·
vphot
. (1)
Since θ and vphot can be determined from observations, te and D are the only unknowns
in Eq.1. These can be derived via least-squares fitting to the observed quantities.
If the SNe under study are at high redshifts, the equations should be slightly modified
[4]. The definition of the angular radius is connected with the angular distance DA,
while in the expression of the observed flux the luminosity distance DL enters. At high z
DL = (1+ z)
2DA, so the angular radius of the photosphere can be inferred from
fλ (1+ z)
πBλ ′(T )
, (2)
where λ ′ = λ/(1+ z).
One particular advantage of EPM is that it does not require initial calibration, i.e. a
sample of objects with a priori known distances. However, the computation of the ζ
correction factors needs detailed model atmospheres, which makes the method essen-
tially model-dependent. Currently, there are two independent sets of model atmospheres
of Type II-P SNe in the literature, which were used to compute correction factors as a
function of Te f f [5, 6]. The former one was used in detailed studies of SN 1999em (the
most extensively studied SN II-P so far) that resulted in DEPM ≈ 8±1 Mpc [2, 3, 7] . This
is in significant disagreement with the Cepheid distance to the host galaxy NGC 1637
being DCep = 11.7±1 Mpc [8]. This problem has been solved in [9] by using a new set
of correction factors based on the NLTE radiative transfer code CMFGEN which gave
DEPM = 11.5±1.0 Mpc for SN 1999em.
NEW EPM DISTANCES TO SNE II-P
The method outlined above has been implemented in a new code that needs observed
BVRI light curves, radial velocities (determined from the absorption minima of certain
spectral features, see below) and reddening information (typically E(B−V )) as input.
As in any method based on photometry, the magnitudes must be dereddened, but fortu-
nately the results of EPM are quite insensitive to reddening errors, compared with other
methods [5].
 3000  4000  5000  6000  7000  8000  9000
Wavelength (Å)
SN 1999em (+8 d)
Tbb = 16862 K
 4  6  8  10  12  14  16  18  20
T (103 K)
FIGURE 1. Left panel: Result of fitting a blackbody (solid line) to broadband BVI fluxes (filled
symbols) of SN 1999em [3]. The R−band flux is also in good agreement with the fitted blackbody. The
flux-calibrated spectrum obtained simultaneously (dotted line) is shown for comparison. Right panel: The
correction factor as a function of Te f f from [6] (filled circles) and [5] (open circles).
At each epoch, the angular radius has been computed by a simultaneous fitting to the
dereddened B, V and I fluxes, as described in [2]. The corresponding effective temper-
ature has been derived by fitting a blackbody curve to the broadband fluxes converted
from the dereddened magnitudes. Our experience shows that the best results can be
achieved by considering all optical+NIR (i.e. BVRI) fluxes simultaneously. Earlier stud-
ies were sometimes limited to the usage of B and V bands only, which may result in
increased systematic errors due to the large deviation of the B-band fluxes from the
blackbody curve at later phases. The left panel of Fig.1 illustrates the optimum fitting of
a blackbody to either photometric, or precisely calibrated spectroscopic fluxes.
From the resulting Te f f , the correction factor ζ has been computed from the ζBV I(T )
function of Dessart & Hillier [6] for data obtained less than 40 days after explosion. For
data measured between 40 - 60 days after explosion, the function given by Eastman et al.
[5] was applied. As noted above, the usage of the function of Dessart & Hillier produces
better distances, but their models are valid only during the first month after explosion,
before the hydrogen starts to recombine. The ζBV I(T ) functions are plotted in the right
panel of Fig.1.
Beside the correction factors, the other important quantity is the photospheric velocity
vphot , because the resulting distance is very sensitive to the velocities that appear in the
denominator in Eq.1. Thus, the problem of finding an optimum method to infer vphot
from Type II-P SNe spectra has been addressed in several studies (see [2, 3, 6]).
We have studied this problem by computing model spectra with the parametrized
spectral synthesis code SYNOW [10]. SYNOW computes the emergent spectrum in a
homologously expanding atmosphere assuming LTE and pure scattering line formation.
The input parameters are the velocity and the blackbody temperature at the photosphere
(vphot and Te f f ), the exponent of the atmospheric structure, the list of ions contributing
to the spectral features, and the optical depth of one strong line for each ion.
Four sets of spectra have been defined corresponding to phases +10, +15, +50 and
+95 days after explosion, respectively. The list of ions contained H, He I, Na I, Fe II, Sc
 3000  4000  5000  6000  7000  8000
Wavelength (Å)
+10 d
+15 d
+50 d
+95 d
 0.96
 0.98
 1.02
 1.04
 1.06
 1.08
 2  4  6  8  10  12  14  16
vobs (10
3 km/s)
HeI 5876
FeII 4924
FeII 5018
FeII 5169
ScII 5526
FIGURE 2. Left panel: SYNOW model spectra of Type II-P SNe. The phase of each spectrum (ex-
pressed in days after explosion) is indicated. Right panel: The ratio of the true photospheric velocity (an
input parameter of a SYNOW model) to the „observed” velocity (derived from the absorption minimum
of P Cygni lines) as a function of the „observed” velocity. Different symbols mean different photospheric
lines indicated on the righ-hand side.
II, Ti II and Ba II, because these ions are thought to be responsible for the strongest lines
in the optical [3]. The input parameters except vphot were tuned to match real Type II-P
SNe spectra. Then, several model spectra were synthesized with different input vphot for
each phase. The left panel of Fig.2 shows representative spectra of all phases.
The synthesized spectra were used to compute “observed” radial velocities by mea-
suring the Doppler-shift of the absorption minima of selected lines. For P Cygni line
profiles, this should give exactly vphot if the line is isolated and optically thin. However,
in reality the features in a SN spectrum are all blends and may not be optically thin.
Therefore, vobs will differ from vphot .
In the right panel of Fig.2 the ratio of vphot/vobs is plotted as a function of vobs for
the features shown. It is seen that in almost all cases vphot is slightly underestimated.
The explanation of such a phenomenon is discussed in [6] for the Hα line. However, the
relative difference is below 5 %, thus, these lines are expected to represent vphot with 2 -
4 % accuracy. Motivated by these results, we have selected the He I λ5876 and the Fe II
λ5169 features to infer vphot from early-phase (< +20 days) and late-phase spectra of
real SNe, respectively.
In order to apply the method to real SNe, we have collected the available data of Type
II-P SNe from the literature (details and references will be published in a forthcoming
paper). Eq.1 was fitted to the observed data via least squares using either t or θ/vphot
as the independent variable. The two results for D and te were averaged to obtain their
final value. Whenever possible, the fit was restricted to data obtained between +5 – +40
days after explosion, and the angular radii were calculated using the Dessart & Hillier
correction factors (see above). In a few cases only late-phase (t ∼ 40− 60 days) data
were available. The Eastman et al. correction factors were applied for those SNe.
The EPM distances are plotted against the “reference” distances to their host galaxies
(mostly Tully-Fisher or SBF-distances for the nearby ones and Hubble-flow distances for
the more distant ones) in the left panel of Fig.3. As a comparison, the distances coming
 1000
 1  10  100  1000
Dref (Mpc)
 26  28  30  32  34  36  38  40
FIGURE 3. Left panel: the comparison of EPM (filled circles) and SCM (open triangles) distances with
the reference distances of the host galaxies. Right panel: residuals of the distance moduli of Type II-P SNe
from EPM (filled symbols) and Type Ia SNe (see text).
from the „Standard Candle Method” (SCM) [11] for nearly the same observational
sample are also shown. The scattering is very similar for both EPM and SCM. It is
concluded that these two methods provide distances to Type II-P SNe with ∼ 15−20 %
accuracy.
The accuracy of the new EPM distances is also similar to that of individual SNe Ia
distances. This is illustrated in the right panel of Fig.3, where the difference between
the distance moduli of Type II-P SNe (from this paper) and the low-redshift subsample
of Type Ia SNe (from [12]) are plotted against the reference distance moduli (adopting
Dre f = cz/H0 for Type Ia SNe). Again, the scattering of the data is similar for the two
samples. Thus, the concept of EPM combined with the present knowledge of Type II-P
SNe atmospheres may provide consistent and reliable distances, which may be extended
toward higher redshifts in the future. This could be a very important, independent test of
the Type Ia SNe distance scale.
This work was supported by Hungarian OTKA Grants No. T 042509 and TS 049872.
REFERENCES
1. Kirshner R.P., Kwan J., ApJ 193, 27 (1974)
2. Hamuy M. et al., ApJ, 558, 615 (2001)
3. Leonard D.C. et al., PASP 114, 35 (2002)
4. Schmidt, B.P. et al., AJ 107, 1444 (1994)
5. Eastman, R.G., Schmidt, B.P., & Kirshner, R., ApJ 466, 911 (1996)
6. Dessart, L. and Hillier, D. J., Astronomy & Astrophysics 439, 671 (2005)
7. Elmhamdi, A. et al. MNRAS 338, 939 (2003)
8. Leonard, D.C., Kanbur, S.M., Ngeow, C.C., Tanvir, N.R. ApJ 594, 247 (2003)
9. Dessart, L. and Hillier, D. J., Astronomy & Astrophysics 447, 691 (2006)
10. Baron E. et al., ApJ 545, 444 (2000)
11. Hamuy, M., in Cosmic Explosions - IAU Colloquium 192, edited by J. M. Marcaide and K. W. Weiler,
Springer Proceedings in Physics 99, Springer-Verlag, Berlin, Heidelberg, 2005, pp. 535–541.
12. http://braeburn.pha.jhu.edu/~ariess/R06/Davis07_R07_WV07.dat
ABSTRACT
  Distances to well-observed Type II-P SNe are determined from an updated
version of the Expanding Photosphere Method (EPM), based on recent theoretical
models. The new EPM distances show good agreement with other independent
distances to the host galaxies without any significant systematic bias,
contrary to earlier results in the literature. The accuracy of the method is
comparable with that of the distance measurements for Type Ia SNe.

<|endoftext|><|startoftext|>
Introduction
The old idea[1] that spontaneous Lorentz invariance violation (SLIV) may lead to
an alternative theory of quantum electrodynamics still remains extremely attractive
in numerous theoretical contexts[2] (for some later developments, see the papers[3]).
The SLIV could generally cause the appearance of massless vector Nambu-Goldstone
modes which are identified with photons and other gauge fields underlying the mod-
ern particle physics framework like as Standard Model and Grand Unified Theory.
At the same time, the Lorentz violation by itself has attracted a considerable at-
tention in recent years as an interesting phenomenological possibility appearing in
various quantum field and string theories[4-9].
Early models realizing the SLIV conjecture were based on the four fermion
(current-current) interaction, where the proposed gauge field may appear as a fermion-
antifermion pair composite state[1], in a complete analogy with a massless composite
scalar field in the original Nambu-Jona-Lazinio model[10]. Unfortunately, owing to
the lack of a starting gauge invariance in such models and composite nature of
Goldstone modes appeared it is hard to explicitly demonstrate that these modes
really form together a massless vector boson being a gauge field candidate. Actu-
ally, one must make a precise tuning of parameters, including a cancellation between
terms of different orders in the 1/N expansion (where N is the number of fermion
species involved), in order to achieve the massless photon case[11]. Rather, there are
in general three separate massless Goldstone modes, two of which may mimic the
transverse photons polarizations, while the third one must properly be suppressed.
In this connection, the more instructive laboratory for SLIV consideration proves
to be some simple class of the QED type models having from the outset a gauge
invariant form, whereas the Lorentz violation is realized through the nonlinear dy-
namical constraint imposed on the starting vector field Aµ
A2µ = n
2 (1)
where nµ is an properly oriented unit Lorentz vector, while M is a proposed SLIV
scale. This constraint means in essence that the vector field Aµ develops the vacuum
expectation value 〈Aµ(x)〉 = nµM and Lorentz symmetry SO(1, 3) breaks down to
SO(3) or SO(1, 2) depending on the time-like (n2µ = +1) or space-like (n
µ = −1)
SLIV. Such QED model was first studied by Nambu a long time ago[12], but only
for the time-like SLIV case and in the tree approximation. For this purpose he
applied the technique of nonlinear symmetry realizations which appeared successful
in the handling of the spontaneous breakdown of chiral symmetry in the nonlinear
σ model[13] and beyond1.
1Actually, the simplest possible way to obtain the above supplementary condition (1) could be
an inclusion the “standard” quartic vector field potential V (A) = −
A2µ +
(A2µ)
2 into the
QED type Lagrangian, as can be motivated to some extent[14] from the superstring theory. This
potential inevitably causes the spontaneous violation of Lorentz symmetry in a standard way, much
as an internal symmetry violation is caused in a linear σ model for pions[13]. As a result, one has a
In the present paper, we mainly address ourselves to the Yang-Mills gauge fields
as the possible vector Goldstone modes (Sec.3) once some basic ingredients of the
Goldstonic QED model are established in a general SLIV case (Sec.2). This prob-
lem has been discussed many times in the literature within quite different contexts,
such as the Yang-Mills gauge fields as the Goldstone modes for the spontaneous
breaking of general covariance in a higher-dimensional space[17] or for the nonlinear
realization of some special infinite parameter gauge group[18]. However, all these
considerations look rather speculative and optional. Specifically, they do not give
a correlation between the SLIV induced photon case, from the one hand, and the
Yang-Mills gauge field case, from the other. In contrast, our approach is solely
based on the spontaneous Lorentz violation thus properly generalizing the Nambu’s
QED model[12] to the non-Abelian internal symmetry case. Just in this approach
evolved the interrelation between both of cases appears most transparent. We will
see that in the Yang-Mills theory case with an internal symmetry group G having
D generators not only the pure Lorentz symmetry part SO(1, 3) in the symmetry
SO(1, 3) ×G of the Lagrangian, but the larger accidental symmetry SO(D, 3D) of
the SLIV constraint Tr(AµA
µ) = ±M2 in itself is spontaneously broken as well.
Because the starting non-Abelian theory proves to be expanded about the vacuum
which violates the much higher accidental symmetry appeared, many extra mass-
less modes, the pseudo-Goldstone vector bosons (PGB), have to arise. Actually,
while the spontaneous Lorentz violation on its own still generates only one genuine
Goldstone vector boson, the accompanying vector PGBs related to the SO(D, 3D)
breaking also come into play in the final arrangement of the entire Goldstone vec-
tor field multiplet. Remarkably, in contrast to the familiar scalar PGB case[13] the
vector PGBs remain strictly massless being protected by the non-Abelian gauge in-
variance of the Yang-Mills theory involved. Then in Sec.4 we show by some examples
of the lowest order SLIV processes that, while the Goldstonic non-Abelian theory
evolved contains a rich variety of Lorentz and CPT violating couplings, it proves
to be physically indistinguishable from a conventional Yang-Mills theory. Actually,
one of the goals of the present work is to explicitly demonstrate that a conventional
Yang-Mills theory (as well as QED) is in fact the spontaneously broken theory. The
Lorentz violation, due to the quadratic field constraint of the type (1), renders this
theory highly nonlinear in the Goldstone vector modes, while physically equivalent
to the usual one. So, as well as in the pure QED case, the SLIV only means the
noncovariant gauge choice in the otherwise gauge invariant and Lorentz invariant
Yang-Mills theory. However, even a tiny breaking of the starting gauge invariance at
massive Higgs mode (with mass
2mA) together with a massless Goldstone mode associated with
photon. Furthermore, just as in the pion model one can go from the linear model for the SLIV to
the non-linear one taking a limit λA → ∞, m
A → ∞ (while keeping the ratio m
A/λA to be finite).
This immediately leads to the constraint (1) for vector potential Aµ with n
2 = m2A/λA, as it
appears from a validity of its equation of motion. Another motivation for the nonlinear vector field
constraint (1) might be an attempt to avoid the infinite self-energy of the electron in a classical
electrodynamics, as was originally indicated by Dirac[15] and extended later to various vector field
theory cases[16].
very small distances influenced by gravity would render the SLIV physically signifi-
cant. For the SLIV scale comparable with the Planck one the spontaneous Lorentz
violation could become directly observable at low energies. We summarize the results
obtained in the final Sec.5.
2 Goldstonic quantum electrodynamics
The simplest SLIV model is given by a conventional QED Lagrangian for the charged
fermion field ψ
L(A,ψ) = −
µν + ψ(iγ · ∂ −m)ψ − eAµψγ
µψ (2)
where the nonlinear vector field constraint (1) is imposed[12]. For the resulting
Lorentz violation, one can rewrite the Lagrangian L(A,ψ) in terms of the standard
parametrization for the vector potential Aµ
Aµ = aµ +
(n ·A) (n2 ≡ n2µ) (3)
where the aµ is pure Goldstonic mode
n · a = 0 (4)
while the effective Higgs mode (or the Aµ component in the vacuum direction) is
given according to the above nonlinear constraint (1) by
n ·A = (M2 − n2a2ν)
2 =M −
n2a2ν
+O(1/M2) (5)
where, for definiteness, the positive sign for the above square root was taken when
expanding it in powers of a2ν/M
2. Putting the parametrization (3) with the SLIV
constraint (1, 5) into our basic gauge invariant Lagrangian (2) one comes to the
truly Goldstonic model for QED. This model might look unacceptable due to the
inappropriately large Lorentz violating fermion bilinear eMψ(γ ·n)ψ stemming from
the vector-fermion current interaction eAµψγ
µψ in the Lagrangian L (2) when the
expansion (5) is taken. However, thanks to a local invariance of the Lagrangian L
this term can be gauged away by a suitable redefinition of the fermion field
ψ → eieM(n·x)ψ (6)
after which the above fermion bilinear is exactly cancelled by an analogous term
stemming from the fermion kinetic term. So, one eventually comes to the essentially
nonlinear SLIV Lagrangian for the Goldstonic aµ field of the type (taken in the first
approximation in a2ν/M
L(a, ψ) = −
δ(n · a)2 −
n2a2ρ
+ (7)
+ψ(iγ · ∂ +m)ψ − eaµψγ
en2a2ρ
ψ(γ · n)ψ
We denoted its strength tensor by fµν = ∂µaν − ∂νaµ, while h
µν = nµ∂ν −nν∂µ is a
new SLIV oriented differential tensor. This tensor hµν acts on the infinite series in
a2ρ coming from the expansion of the effective Higgs mode (5) from which the first
order term −n2a2ν/2M was only taken in this expansion throughout the Lagrangian
L(a, ψ). Also, we explicitly included the orthogonality condition n · a = 0 into
Lagrangian through the term which can be treated as the gauge fixing term (taking
the limit δ → ∞) and retained the former notation for the fermion ψ.
The Lagrangian (7) completes the Goldstonic QED construction for the charged
fermion field ψ. The model, as one can see, contains the massless Goldstone modes
given by the tree broken generators of the Lorentz group, while keeping the massive
Higgs mode frozen. These modes, lumped together, constitute a single Goldstone
vector boson associated with photon2. In the limit M → ∞ the model is indistin-
guishable from a conventional QED taken in the general axial (temporal or pure
axial) gauge. So, for this part of the Lagrangian L(a, ψ) given by the zero-order
terms in 1/M the spontaneous Lorentz violation only means the noncovariant gauge
choice in otherwise the gauge invariant (and Lorentz invariant) theory. Remarkably,
furthermore, also all the other (first and higher order in 1/M) terms in the L(a, ψ)
(7), though being by themselves the Lorentz and CPT violating ones, do not lead to
the physical SLIV effects which turn out to be strictly cancelled in all the physical
processes involved. So, the nonlinear constraint (1) imposed on the standard QED
Lagrangian (2) appears, in fact, as a possible gauge choice, while the S-matrix re-
mains unaltered under such a gauge convention. This conclusion was first reached
at tree level[12] and recently extended to the one-loop approximation[19]. All the
one-loop contributions to the photon-photon, photon-fermion and fermion-fermion
interactions violating the physical Lorentz invariance were shown to be exactly can-
celled as well. This means that the vector field constraint A2µ = n
2 which has
been treated as the nonlinear gauge choice at a tree (classical) level, remains just
as a pure gauge condition when quantum effects are also taken into account. Re-
markably, this conclusion appears to work also for a general Abelian theory case[20],
particularly, when the internal U(1) charge symmetry is spontaneously broken hand
in hand with the Lorentz one. As a result, the massless photon being first generated
by the Lorentz violation become then massive due to the standard Higgs mechanism,
while the SLIV condition in itself remains to be a gauge choice3.
2Strictly speaking one can no longer use the standard definition of photon as a state being
the spin-1 representation of the (now spontaneously broken) Poincare group. However, due to
gauge symmetry of the starting QED Lagrangian (2) the separate SLIV Goldstone modes appear
combined in such a way that a standard photon (taken in an axial gauge (4)) emerges.
3Note in this connection that there was discussed[12] a possibility of an explicit construction of
the gauge function corresponding to the nonlinear gauge constraint (1) that would eliminate the
need for all the kinds of checks of gauge invariance mentioned above. Remarkably, the equation
for this gauge function appears to be mathematically equivalent to the classical Hamilton-Jacobi
equation of motion for a charged particle. Thus, this gauge function should in principle exist
because there is a solution to the classical problem. However, this formal analogy only works for
the time-like SLIV (n2µ = +1) in the pure QED leaving aside a general Abelian theory when the
gauge invariance can spontaneously be broken. Apart from that, it does not generally extend to
3 Goldstonic Yang-Mills theory
In this section, we extend our discussion to the non-Abelian internal symmetry case
given by a general group G with generators ti([ti, tj ] = icijktk and Tr(titj) = δij
where cijk are structure constants and i, j, k = 0, 1, ...,D − 1). The corresponding
vector fields which transform according to its adjoint representation are given in the
proper matrix form Aµ = A
i, while the matter fields (fermions, for definiteness)
are presented in the fundamental representation column ψr (r = 0, 1, ..., d− 1) of G.
By analogy with the above Goldstonic QED case we take for them a conventional
Yang-Mills type Lagrangian
L(A, ψ) = −
Tr(F µνF
µν) + ψ(iγ · ∂ −m)ψ + gψAµγ
µψ (8)
(where F µν = ∂µAν − ∂νAµ − ig[Aµ,Aν ] and g stands for the universal coupling
constant in the theory) with the nonlinear SLIV constraint
Tr(AµA
µ) = n2µM
2, n2µ = ±1 (9)
imposed4. One can easily see that, although we propose only the SO(1, 3) × G
invariance in the theory, the SLIV constraint taken (9) possesses, in fact, the much
higher accidental symmetry SO(D, 3D) determined by the dimensionality D of the
G group adjoint representation to which the vector fields Aiµ are belonged. This
symmetry is indeed spontaneously broken at a scale M
Aiµ(x)
= niµM (10)
with the vacuum direction given now by the ‘unit’ rectangular matrix niµ which
describes both of the generalized SLIV cases at once, time-like (SO(D, 3D) →
SO(D−1, 3D)) or space-like (SO(D, 3D) → SO(D, 3D−1)), respectively, depending
on the sign of the n2µ ≡ n
µ,i = ±1. This matrix has only one non-zero element for
both of cases determined by the proper SO(D, 3D) rotation. They are, particularly,
0 or n
3 provided that the vacuum expectation value (10) is developed along the
i = 0 direction in the internal space and along the µ = 0 or µ = 3 direction, respec-
tively, in the Minkowskian space-time. In response to each of these two breakings,
side by side with one true vector Goldstone boson and the D − 1 scalar Goldstone
bosons corresponding to the spontaneous violation of actual SO(1, 3) ⊗ G symme-
try of the total Lagrangian L, the D − 1 vector pseudo-Goldstone bosons related
to breaking of the accidental SO(D, 3D) symmetry of the SLIV constraint taken
(9) are also produced. Remarkably, in contrast to the familiar scalar PGB case[13]
the non-Abelian case (see next Section).
4As in the Abelian case, the existence of such a constraint could be related with some non-
linear σ type SLIV model proposed for the vector field multiplet Aiµ in the Yang-Mills theory
(8). Note in this connection that, due to its generic antisymmetry, the familiar quadrilinear terms
g2Tr([Aµ, Aν ])
2 in the Lagrangian (8) do not contribute into the SLIV since they identically
vanish for any single-valued vacuum configuration
the vector PGBs remain strictly massless being protected by the non-Abelian gauge
invariance of the starting Lagrangian (8). Together with the aforementioned true
vector Goldstone boson they complete the entire Goldstonic vector field multiplet
of the internal symmetry group G.
As in the Abelian case, upon an explicit use of the corresponding SLIV constraint
(9) being so far the only supplementary condition for vector field multiplet Aiµ, one
comes to the pure Goldstone field modes aiµ identified in a similar way
Aiµ = a
(n · A) , n · a ≡ niµa
µ,i = 0 (n2 ≡ n2µ) , (11)
At the same time, an effective Higgs mode (i.e., the Aiµ component in the vacuum
direction niµ) is given by the product n · A ≡ n
µ,i determined by the SLIV con-
straint
n · A =
M2 − n2(aiν)
2 =M −
2(aiν)
+O(1/M2) (12)
where, as earlier in the Abelian case, we took the positive sign for the square
root when expanding it in powers of (aiν)
2/M2. Note that the general Goldstonic
modes aiµ, apart from pure vector fields, contain the D − 1 scalar ones, a
0 and a
(i′ = 1...D − 1), for the time-like (niµ = n
0gµ0δ
i0) and space-like (niµ = n
3gµ3δ
SLIV, respectively. They can be eliminated from the theory if one puts the proper
supplementary conditions on the aiµ fields which were still the constraint free. Using
their overall orthogonality (11) to the physical vacuum direction niµ one can formu-
late these supplementary conditions in terms of a general axial gauge for the entire
aiµ multiplet
n · ai ≡ nµa
µ,i = 0, i = 0...D − 1 (13)
where nµ is the unit Lorentz vector introduced in the Abelian case which is now
oriented in Minkowskian space-time so as to be parallel to the vacuum matrix niµ.
For such a choice the simple equation holds
µ = s
inµ (s
n · ni
) (14)
which shows that the rectangular vacuum matrix niµ has the factorized ”two-vector”
form. As a result, apart from the Higgs mode excluded earlier by the orthogonality
condition (11), all the scalar fields also appear eliminated, and only pure vector fields,
(µ′ = 1, 2, 3) or ai
(µ′′ = 0, 1, 2) for time-like or space-like SLIV, respectively,
are only left in the theory.
We now show that the such constrained Goldstone vector fields aiµ (with the
supplementary conditions (13) taken) appear truly massless when the starting non-
Abelian Lagrangian L (8) is rewritten in the form determined by the SLIV. Actually,
putting the parametrization (11) with the SLIV constraint (12) into the Lagrangian
(8) one is led to the highly nonlinear Yang-Mills theory in terms of the pure Gold-
stonic gauge field modes aiµ. However, as in the above Abelian case, one should
first gauge away (using the local invariance of the Lagrangian L) the enormously
large, while false, Lorentz violating terms appearing in the theory in the form of
the fermion and vector field bilinears. As one can readily see, they stem from the
couplings gψAµγ
µψ and −1
g2Tr([Aµ, Aν ])
2, respectively, when the effective Higgs
mode expansion (12) is taken in the Lagrangian (8). Making the appropriate redef-
initions of the fermion (ψ ) and vector (aµ ≡ a
i) field multiplets
ψ → U(ω)ψ , aµ → U(ω)aµU(ω)
†, U(ω) = eigM(n
i·x)ti (15)
and using the evident equalities for the linear (in coordinate) transformations U(ω)
with the single-valued vacuum matrix niµ (n
0 or n
3 for the particular SLIV cases)
∂µU(ω) = ign
iU(ω) = igU(ω)niµt
i (16)
one can confirm that the abovementioned Lorentz violating terms are exactly can-
celled with the analogous bilinears stemming from their kinetic terms. So, the final
Lagrangian for the Goldstonic Yang-Mills theory takes the form (in the first approx-
imation in (aiν)
2/M2)
L(a,ψ) = −
Tr(fµνf
δ(n · ai)2 +
Tr(fµνh
2(aiν)
+ψ(iγ · ∂ −m)ψ + gψaµγ
gn2(aiν)
ψ(γ · nk)tkψ (17)
where the tensor fµν is, as usual, fµν = ∂µaν − ∂νaµ − ig[aµ,aν ], while hµν is a
new SLIV oriented tensor of the type
hµν = nµ∂ν − nν∂µ + ig([nµ,aν ]− [nν ,aµ]), nµ ≡ n
k (18)
This tensor hµν acts on the infinite series in (a
2 coming from the expansion of the
effective Higgs mode (12) from which only the first order term −n2(aiν)
2/2M was
taken throughout the Lagrangian L(a,ψ). We also retained the former notations for
the fermion and vector field multiplets after transformations (15), and explicitly in-
cluded the (axial) gauge fixing term into Lagrangian according to the supplementary
conditions taken (13).
The theory derived gives a proper generalization of the nonlinear QED model[12]
for the non-Abelian case. It contains the massless vector boson multiplet aiµ (con-
sisting of one Goldstone and D − 1 pseudo-Goldstone vector states) which gauges
the starting internal symmetry G. In the limit M → ∞ it is indistinguishable from
a conventional Yang-Mills theory taken in the general axial gauge. So, for this part
of the Lagrangian L(a,ψ) given by the zero-order in 1/M terms the spontaneous
Lorentz violation only means the noncovariant gauge choice in the otherwise gauge
invariant (and Lorentz invariant) theory. However, one may expect that, just as
it appears in the nonlinear QED model, also all the first and higher order in 1/M
terms in the L (17), though being by themselves the Lorentz and CPT violating
ones, do not lead to the physical SLIV effects due to the mutual cancellation of their
contributions into all the physical processes appeared.
4 The lowest order SLIV processes
Let us now show that the simple tree level calculations related to the Lagrangian
L(a,ψ) confirms in essence this proposition. As an illustration, we consider SLIV
processes in the lowest order in g and 1/M being the fundamental parameters of the
Lagrangian (17). They are, as one can readily see, the vector-fermion and vector-
vector elastic scattering going in the order g/M , which we turn to once the Feynman
rules in the Goldstonic Yang-Mills theory are established.
4.1 Feynman rules
The corresponding Feynman rules, apart from the ordinary Yang-Mills theory rules
(i) the vector-fermion vertex
− ig γµ t
i (19)
(ii) the vector field propagator (taken in a general axial gauge nµaiµ = 0)
Dijµν (k) = −
gµν −
nµkν + kµnν
n · k
n2kµkν
(n · k)2
which automatically satisfies the orthogonality condition nµD
µν(k) = 0 and on-shell
transversality kµD
µν(k) = 0 (k
2 = 0); the latter means that free vector fields with
polarization vector ǫiµ(k, k
2 = 0) are always appeared transverse kµǫiµ(k) = 0;
(iii) the 3-vector vertex (with vector field 4-momenta k1, k2 and k3; all 4-momenta
in vertexes are taken ingoing throughout)
gcijk[(k1 − k2)γgαβ + (k2 − k3)αgβγ + (k3 − k1)βgαγ ] (21)
include the new ones, violating Lorentz and CPT invariance, for
(iv) the contact 2-vector-fermion vertex
(γ · nk)τkgµν δ
ij (22)
(v) another 3-vector vertex
(k1 · n
i)k1,αgβγδ
jk + (k2 · n
j)k2,βgαγδ
ki + (k3 · n
k)k3,γgαβδ
where the second index in the vector field 4-momenta k1, k2 and k3 denotes their
Lorentz components;
(vi) the extra 4-vector vertex (with the vector field 4-momenta k1,2,3,4 and their
proper differences k12 ≡ k1 − k2 etc.)
[cijpδklgαβgγδ(n
p · k12) + c
klpδijgαβgγδ(n
p · k34) +
+cikpδjlgαγgβδ(n
p · k13) + c
jlpδikgαγgβδ(n
p · k24) + (24)
+cilpδjkgαδgβγ(n
p · k14) + c
jkpδilgαδgβγ(n
p · k23)]
where only the terms which can not lead to contractions of the rectangular vacuum
matrix n
µ with vector field polarization vectors ǫ
µ(k) are presented. These contrac-
tions are in fact vanished due to the gauge taken (13), np · ǫi = sp(n · ǫi) = 0 (with
a factorized two-vector form for the matrix n
µ (14) used).
Just the rules (i-vi) are needed to calculate the lowest order amplitudes of the
processes we have mentioned in the above.
4.2 Vector boson scattering on fermion
This process is directly related to two SLIV diagrams one of which is given by the
contact a2-fermion vertex (22), while another corresponds to the pole diagram with
the longitudinal a-boson exchange between Lorentz violating a3 vertex (23) and
ordinary a-boson-fermion one (19). Since ingoing and outgoing a-bosons appear
transverse (k1 · ǫ
i(k1) = 0, k2 · ǫ
j(k2) = 0) only the third term in this a
3 coupling
(23) contributes to the pole diagram so that one comes to a simple matrix element
iM for both of diagrams
iM = i
ū(p2)τ
(γ · nl) + i(k · nl)γµkνDµν(k)
u(p1)[ǫ(k1) · ǫ(k2)] (25)
where the spinors u(p1,2) and polarization vectors ǫ
µ(k1) and ǫ
µ(k2) stand for the in-
going and outgoing fermions and a-bosons, respectively, while k is the 4-momentum
transfer k = p2 − p1 = k1 − k2. Upon the further simplifications in the square
bracket related to the explicit form of the a boson propagator Dµν(k) (20) and ma-
trix niµ (14), and using the fermion current conservation ū(p2)(p̂2 − p̂1)u(p1) = 0,
one is finally led to the total cancellation of the Lorentz violating contributions to
the a-boson-fermion scattering in the g/M approximation.
Note, however, that such a result may be in some sense expected since from the
SLIV point of view the lowest order a-boson-fermion scattering discussed here is
hardly distinct from the photon-fermion scattering considered in the nonlinear QED
case[12]. Actually, the fermion current conservation which happens to be crucial for
the above cancellation works in both of cases, whereas the couplings being peculiar
to the Yang-Mills theory have not yet touched on. In this connection the next
example seems to be more instructive.
4.3 Vector-vector scattering
The matrix element for this process in the lowest order g/M is given by the contact
SLIV a4 vertex (24) and the pole diagrams with the longitudinal a-boson exchange
between the ordinary a3 vertex (21) and Lorentz violating a3 one (23), and vice
versa. There are six pole diagrams in total describing the elastic a − a scattering
in the s- and t-channels, respectively, including also those with an interchange of
identical a-bosons. Remarkably, the contribution of each of them is exactly canceled
with one of six terms appeared in the contact vertex (24). Actually, writing down the
matrix element for one of the pole diagrams with ingoing a-bosons (with momenta k1
and k2) interacting through the vertex (21) and outgoing a-bosons (with momenta
k3 and k4) interacting through the vertex (23) one has
cijpδkl[(k1 − k2)µgαβ + (k2 − k)αgβµ + (k − k1)βgαµ] ·
·Dpqµν(k)gγδkν(n
q · k)[ǫi,α(k1)ǫ
j,β(k2)ǫ
k,γ(k3)ǫ
l,δ(k4)] (26)
where polarization vectors ǫi,α(k1), ǫ
j,β(k2), ǫ
k,γ(k3) and ǫ
l,δ(k4) belong, respectively,
to ingoing and outgoing a-bosons, while k = −(k1 + k2) = k3 + k4 according to the
momentum running in the diagrams taken above. Again, as in the previous case of
vector-fermion scattering, due to the fact that outgoing a-bosons appear transverse
(k3 · ǫ
k(k3) = 0 and k4 · ǫ
l(k4) = 0), only the third term in the Lorentz violating a
coupling (23) contributes to this pole diagram. Upon evident simplifications related
to the a-boson propagator Dµν(k) (20) and matrix n
µ (14) one comes to the expres-
sion which is exactly cancelled with the first term in the contact SLIV vertex (24)
when it is properly contracted with a-boson polarization vectors. Likewise, other
terms in this vertex provide the further one-to-one cancellation with the remaining
pole matrix elements iM
(2−6)
. So, again, the Lorentz violating contribution to the
vector-vector scattering is absent in Goldstonic Yang-Mills theory in the lowest g/M
approximation.
4.4 Other processes
Other tree level Lorentz violating processes, related to a bosons and fermions, appear
in higher orders in the basic SLIV parameter 1/M . They come from the subsequent
expansion of the effective Higgs mode (12) in the Lagrangian (17). Again, their
amplitudes are essentially determined by an interrelation between the longitudinal
a-boson exchange diagrams and the corresponding contact a-boson interaction dia-
grams which appear to cancel each other thus eliminating physical Lorentz violation
in theory.
Most likely, the same conclusion can be derived for SLIV loop contributions
as well. Actually, as in the massless QED case considered earlier [19], the corre-
sponding one-loop matrix elements in Goldstonic Yang-Mills theory either vanish
by themselves or amount to the differences between pairs of the similar integrals
whose integration variables are shifted relative to each other by some constants (be-
ing in general arbitrary functions of external four-momenta of the particles involved)
that in the framework of dimensional regularization leads to their total cancellation.
So, the Goldstonic vector field theory (17) for a non-Abelian charge-carrying
matter is likely to be physically indistinguishable from a conventional Yang-Mills
theory.
5 Conclusion
The spontaneous Lorentz violation in 4-dimensonal flat Minkowskian space-time was
shown to generate vector Goldstone bosons both in Abelian and non-Abelian theo-
ries with the corresponding nonlinear vector field constraint (1) or (9) imposed. In
the Abelian case such a massless vector boson is naturally associated with photon.
In non-Abelian case, although the pure Lorentz violation still generates only one
genuine Goldstone vector boson, the accompanying vector PGBs related to a vio-
lation of the larger accidental symmetry SO(D, 3D) of the SLIV constraint (9) in
itself come also into play in the final arrangement of the entire Goldstone vector
field multiplet of the internal symmetry group G. Remarkably, they remain strictly
massless being protected by the gauge invariance of the Yang-Mills theory involved.
These theories, both Abelian and non-Abelian, while being essentially nonlinear in
the Goldstone vector modes, are physically indistinguishable from conventional QED
and Yang-Mills theory. One could actually see that just the gauge invariance not
only provides these theories to be free from the unreasonably large Lorentz violation
stemming from the fermion and vector field bilinears (see Sections 2 and 3), but also
render all the other physical SLIV effects (including those which are suppressed by
the Lorentz violation scale M) non-observable (Section 4). As a result, Abelian and
non-Abelian SLIV theory appear, respectively, as standard QED and Yang-Mills
theory taken in the nonlinear gauge (to which the vector field constraints (1) and
(9) are virtually reduced), while the S-matrix remains unaltered under such a gauge
convention.
So, while at present the Goldstonic nature of gauge fields, both Abelian and non-
Abelian, seems to be highly plausible, the most fundamental question of physical
Lorentz violation in itself, that only could uniquely point toward such a possibility,
is still an open question. Note, that here we are not dealing with direct (and quite
arbitrary in essence) Lorentz non-invariant extensions of QED or Standard Model
which were intensively discussed on their own in recent years [6-8]. Rather, the case
in point is a construction of genuine SLIV models which would generate gauge fields
as the proper vector Goldstone bosons, from one hand, and could lead to observed
Lorentz violating effects, from the other. In this connection, somewhat natural
framework for physical Lorentz violation to occur would be a model where the
internal gauge invariance were slightly broken at very small distances through some
high-order operators stemming from the gravity-influenced area. Such physical SLIV
effects would be seen in terms of powers of ratio M/MP l (where MP l is the Planck
mass). So, for the SLIV scale comparable with the Planck one they would become
directly observable. Remarkably enough, if one has such internal gauge symmetry
breaking in an ordinary Lorentz invariant theory this breaking appears vanishingly
small at laboratory being properly suppressed by the Planck scale. However, the
spontaneous Lorentz violation would render it physically significant: the higher
Lorentz scale, the greater SLIV effects observed. If true, it would be of particular
interest to have a better understanding of the internal gauge symmetry breaking
mechanism that brings out the spontaneous Lorentz violation at low energies. We
return to this basic question elsewhere.
Acknowledgments
We would like to thank Colin Froggatt, Rabi Mohapatra and Holger Nielsen for useful
discussions and comments. One of us (J.L.C.) is grateful for the warm hospitality
shown to him during a visit to Center for Particle and String Theory at University
of Maryland where part of this work was carried out.
References
[1] W. Heisenberg, Rev. Mod. Phys. 29 (1957) 269;
J.D. Bjorken, Ann. Phys. (N.Y.) 24 (1963) 174;
I. Bialynicki-Birula, Phys. Rev. 130 (1963) 465 ;
G. Guralnik, Phys. Rev. 136 (1964) B1404;
T. Eguchi, Phys.Rev. D 14 (1976) 2755;
H. Terazava, Y. Chikashige and K. Akama, Phys. Rev. D 15 (1977) 480 .
[2] C.D. Froggatt and H.B. Nielsen, Origin of Symmetries (World Scientific, Sin-
gapore, 1991).
[3] J.L. Chkareuli, C.D. Froggatt and H.B. Nielsen, Phys. Rev. Lett. 87 (2001)
091601;
J.L. Chkareuli, C.D. Froggatt and H.B. Nielsen Nucl. Phys. B 609 (2001) 46;
J.D. Bjorken, hep-th/0111196;
Per Kraus and E.T. Tomboulis, Phys. Rev. D 66 (2002) 045015;
A. Jenkins, Phys. Rev. D 69 (2004) 105007;
J.L. Chkareuli, C.D. Froggatt, R.N. Mohapatra and H.B. Nielsen,
hep-th/0412225;
J.L. Chkareuli, C.D. Froggatt and H.B. Nielsen, hep-th/0610186.
[4] D. Colladay and V.A. Kostelecky, Phys. Rev. D58 (1998) 116002 ;
V.A. Kostelecky, Phys. Rev. D69 (2004) 105009 ;
R. Bluhm and V.A. Kostelecky, Phys. Rev. D 71(2005) 065008;
CPT and Lorentz Symmetry, ed. A. Kostelecky (World Scientific, Singapore,
1999, 2002, 2005).
[5] S.M. Carroll, G.B. Field and R. Jackiw, Phys. Rev. D 41 (1990) 1231;
R. Jackiw and V.A. Kostelecky, Phys. Rev. Lett. 82 (1999) 3572.
[6] S. Coleman and S.L. Glashow, Phys. Rev. D 59 (1999) 116008.
http://arxiv.org/abs/hep-th/0111196
http://arxiv.org/abs/hep-th/0412225
http://arxiv.org/abs/hep-th/0610186
[7] J. W. Moffat, Int. J. Mod.Phys. D2 (1993) 351;
J.W. Moffat, Int. J. Mod.Phys. D12 (2003) 1279.
[8] O. Bertolami and D.F. Mota, Phys. Lett. B 455 (1999) 96.
[9] T. Jacobson, S. Liberati and D. Mattingly, Ann. Phys. (N.Y.) 321 (2006) 150.
[10] Y. Nambu and G. Jona-Lasinio, Phys. Rev. 122 (1961) 345.
[11] M. Suzuki, Phys. Rev. D 37 (1988) 210 .
[12] Y. Nambu, Progr. Theor. Phys. Suppl. Extra 190 (1968).
[13] S. Weinberg, The Quantum Theory of Fields, v.2, Cambridge University Press,
2000.
[14] V.A. Kostelecky and S. Samuel, Phys. Rev. D 39 (1989) 683;
V.A. Kostelecky and R. Potting, Nucl. Phys. B 359 (1991) 545.
[15] P.A.M. Dirac, Proc. Roy. Soc. 209A (1951) 292;
P.A.M. Dirac, Proc. Roy. Soc. 212A (1952) 330.
[16] R. Righi and G. Venturi, Lett. Nuovo Cim. 19 (1977) 633;
R. Righi, G. Venturi and V. Zamiralov, Nuovo Cim. A47 (1978) 518.
[17] Y.M. Cho and P.G.O. Freund, Phys. Rev. D 12 (1975) 1711.
[18] E. A. Ivanov and V.I. Ogievetsky, Lett. Math. Phys. 1 (1976) 309 .
[19] A.T. Azatov and J.L. Chkareuli, Phys. Rev. D 73 (2006) 065026.
[20] J.L. Chkareuli and Z.R. Kepuladze, Phys. Lett. B 644 (2007) 212;
J.L. Chkareuli and Z.R. Kepuladze, Proc. of XIV Int. Seminar “Quarks-2006”,
eds. S.V. Demidov at al (Moscow, INR, 2006); hep-th/0610227.
http://arxiv.org/abs/hep-th/0610227
	Introduction
	Goldstonic quantum electrodynamics
	Goldstonic Yang-Mills theory
	The lowest order SLIV processes
	Feynman rules
	Vector boson scattering on fermion
	Vector-vector scattering
	Other processes
	Conclusion
ABSTRACT
  We argue that non-Abelian gauge fields can be treated as the pseudo-Goldstone
vector bosons caused by spontaneous Lorentz invariance violation (SLIV). To
this end, the SLIV which evolves in a general Yang-Mills type theory with the
nonlinear vector field constraint $Tr(% \boldsymbol{A}_{\mu
}\boldsymbol{A}^{\mu})=\pm M^{2}$ ($M$ is a proposed SLIV scale) imposed is
considered in detail. With an internal symmetry group $G$ having $D$ generators
not only the pure Lorentz symmetry SO(1,3), but the larger accidental symmetry
$SO(D,3D)$ of the SLIV constraint in itself appears to be spontaneously broken
as well. As a result, while the pure Lorentz violation still generates only one
genuine Goldstone vector boson, the accompanying pseudo-Goldstone vector bosons
related to the $SO(D,3D)$ breaking also come into play in the final arrangement
of the entire Goldstone vector field multiplet. Remarkably, they remain
strictly massless, being protected by gauge invariance of the Yang-Mills theory
involved. We show that, although this theory contains a plethora of Lorentz and
$CPT$ violating couplings, they do not lead to physical SLIV effects which turn
out to be strictly cancelled in all the lowest order processes considered.
However, the physical Lorentz violation could appear if the internal gauge
invariance were slightly broken at very small distances influenced by gravity.
For the SLIV scale comparable with the Planck one the Lorentz violation could
become directly observable at low energies.

<|endoftext|><|startoftext|>
Introduction
The knowledge of the properties of highly compressed and heated hadronic
matter is an important issue for the understanding of astrophysical processes,
such as the mechanism of supernovae explosions and the physics of neutron
stars [1,2]. Heavy ion collisions provide the unique opportunity to explore
highly excited hadronic matter, i.e. the high density behavior of the nuclear
EoS, under controlled conditions (high baryon energy densities and tempera-
tures) in the laboratory [3]. Of particular recent interest is also the still poorly
known density dependence of the isovector channel of the EoS.
Suggested observables have been the nucleon collective flows [3,4] and the
distributions of produced particles such as pions and, in particular, particles
with strangeness (kaons) [5,6]. Because of the rather high energy threshold
(Elab = 1.56 GeV for Nucleon-Nucleon collisions), kaon production in HICs
at energies in the range 0.8− 1.8 AGeV is mainly due to secondary processes
involving ∆ resonances and pions (π). On the other hand, secondary processes
require high baryon density. This explains why the kaon production around
threshold is intimately connected to the high density stage of the nucleus-
nucleus collision. Furthermore, the relatively large mean free path of positive
charged (K+) and neutral (K0) kaons inside the hadronic environment causes
hadronic matter to be transparent for kaons [7]. Therefore kaon yields and
generally strangeness ratios have been proposed as important signals for the
investigation of the high density behavior of the nuclear EoS. This idea, as
firstly suggested by Aichelin and Ko [8], has been recently applied in HIC
at intermediate energies in terms of strangeness ratios, e.g. the ratio of the
kaon yields in Au+Au and C+C collisions [5,9]. In these studies it was found
that this ratio is very sensitive to the stiffness of the nuclear EoS. Indeed
comparisons with KaoS data [10] favored a soft behavior of the high density
nuclear EoS, a statement which is particularly consistent with elliptic flow
data of the FOPI collaboration [11].
The idea of studying particle ratios in HICs around the kinematical threshold
has been recently applied in the determination of the isovector channel of the
nuclear EoS, i.e. the high density dependence of the symmetry energy Esym. It
has turned out that particle ratios, such as (π−/π+) [12] or (K0/K+) [13–15],
are sensitive to the stiffness of the symmetry energy and, in particular to the
strength of the vector isovector field. However in medium effects on the kaon
propagation have been neglected so far. Here we will test the robustness of the
yield ratio against the inclusion and the variation of the corresponding kaon
potentials. At the same time in Ref. [16] the role of the in-medium modifica-
tions of NN cross sections has been studied in terms of baryon and strangeness
dynamics. It was found that the pion and kaon yields are sensitively influenced
by the reduced effective NN cross sections for inelastic processes. Here we will
see that the kaon yield ratio appears robust even with respect to the density
dependence of the in-medium inelastic NN cross sections, while at variance
the pion ratio seems to be more sensitive.
The collision dynamics is rather complex and involves the nuclear mean field
(EoS) and binary 2-body collisions. In the presence of a nuclear medium the
treatment of binary collisions represents a non-trivial problem. The NN cross
sections for elastic and inelastic processes, which are the crucial physical pa-
rameters here, are experimentally accessible only in free space and not for
2-body scattering at finite baryon density. Recent microscopic studies, based
on the G-matrix approach, have shown a strong decrease of the elastic NN
cross section [17,18] in a hadronic medium. These in-medium effects of the
elastic NN cross section considerably influence the hadronic reaction dynam-
ics [19]. Obviously the question arises whether similar in-medium effects of the
inelastic NN cross sections may affect the reaction dynamics and, in particular,
the production of particles (pions and kaons).
Furthermore, the strangeness propagation inside the nuclear medium is even
more complex and involves the additional consideration of kaon mean field
potentials in the dynamical description. This is an important issue when com-
paring with experimental kaon data [10]. In a Chiral Perturbation approach
at the lowest order (ChPT Potentials), the kaon (antikaon) potential has an
attractive scalar and a repulsive (attractive) vector part [20]. This leads to
weakly repulsive (strongly attractive) potentials for kaons (antikaons) with
corresponding scalar and vector kaon-nucleon coupling constants depending
on the parametrization [20,21] accounted for. Similar results can be obtained
in an effective meson-coupling model (OBE Potentials, in the RMF spirit),
where the K-meson couplings are simply related to the nucleon-meson ones,
in the spirit of ref. [22]. The latter approach has the advantage of being fully
consistent with the covariant transport equations used to simulate the reaction
dynamics [14,15]. We remind that the high density dependence of the kaon
self energies is still an object of current debate, e.g. see Refs. [23,7] in which
the role of the kaon potential has been investigated in terms of kaon in-plane
and out-of-plane flows. Moreover for studies aimed to the determination of
the symmetry energy from strangeness production one has to consider with
particular care the isospin dependence of the kaon mean field potential.
The main focus of the present work is on a detailed study of the robustness of
the pionic (π−/π+) and, in particular, the strangeness ratio (K0/K+) with re-
spect to the in-medium modifications of the imaginary part of the nucleon self
energy, i.e. the NN cross sections, and to the in-medium variations of the kaon
self energy, i.e. the density dependence of the kaon potential. This analysis,
which goes beyond our previous investigations of [14,15], is also motivated by
new measurements of the FOPI collaboration [24] by means of the strangeness
ratios.
The paper is organized as follows: The next Section describes the theoret-
ical treatment of the reaction dynamics within the Relativistic Boltzmann-
Uheling-Uhlenbeck (RBUU) transport equation. A detailed discussion on the
in-medium modifications of the inelastic NN cross sections is presented. In Sec-
tion 3 we discuss the kaon mean field potentials (in both ChPT and OBE/RMF
schemes) and their expected isospin dependence. Section 4 is devoted to a short
introduction to the dynamical calculations. Results are then shown in Section
5, mostly for central 197Au+197 Au collisions at 1AGeV , in terms of pion and
kaon yields. The initial presentation of the absolute yields is relevant for a de-
tailed discussion as well as for a comparison with theoretical results of other
groups and with experimental data of the KaoS and FOPI collaborations.
All together this intermediate step is important for testing the reliability of
the calculations, since ratios do not do it. Finally we present the pion and
strangeness ratios and discuss their dependence on the in-medium modifica-
tions of the cross NN cross sections and of the kaon potentials, including the
isospin effects. In Section 6 we conclude with a summary and some general
comments and perspectives.
2 Theoretical description of the collision dynamics
In this chapter we briefly discuss the transport equation focusing on the treat-
ment of two features important for kaon dynamics: (a) the collision integral by
means of the cross sections; (b) the kaon mean field potential and its isospin
dependence.
2.1 The RBUU equation
The theoretical description of HICs is based on the semiclassical kinetic the-
ory of statistical mechanics, i.e. the Boltzmann Equation with the Uehling-
Uhlenbeck modification of the collision integral [25]. The relativistic analog of
this equation is the Relativistic Boltzmann-Uehling-Uhlenbeck (RBUU) equa-
tion [26]
k∗µ∂xµ + (k
µν +M∗∂µxM
∗) ∂k
f(x, k∗) =
2(2π)9
W (kk2|k3k4)
f3f4f̃ f̃2 − ff2f̃3f̃4
, (1)
where f(x, k∗) is the single particle distribution function. In the collision term
the short-hand notations fi ≡ f(x, k∗i ) for the particle and f̃i ≡ (1− f(x, k∗i ))
0 50 100 150 200 250 300 350
 [MeV]
=1.1 fm
=1.34 fm
=1.7 fm
pn-data
Fig. 1. Elastic in-medium neutron-proton cross section σel at various Fermi momenta
kF as a function of the laboratory energy Elab. The free cross section (kF = 0) is
compared to the experimental total np cross section [17].
for the hole distributions are used, with E∗
M∗2 + k2. The collision in-
tegral explicitly exhibits the final state Pauli-blocking while the in-medium
scattering amplitude includes the Pauli-blocking of intermediate states.
The dynamics of the drift term, i.e. the lhs of eq.(1), is determined by the
mean field. Here the attractive scalar field Σs enters via the effective mass
M∗ = M − Σs and the repulsive vector field Σµ via the kinetic momenta
k∗µ = kµ − Σµ and via the field tensor F µν = ∂µΣν − ∂νΣµ. The dynamical
description according to Eq.(1) involves the strangeness propagation in the
nuclear medium. This topic will be discussed in more detail at the end of this
section.
2.2 In-medium effects on NN cross sections
The in-medium cross sections for 2-body processes (see below) enter in the
collision integral via the transition amplitude
W = (2π)4δ4 (k + k2 − k3 − k4) (M∗)4|T |2 (2)
with T the in-medium scattering matrix element.
In the kinetic equation (1) both physical input quantities, the mean field (EoS)
and the collision integral (cross sections) should be derived from the same
underlying effective two-body interaction in the medium, i.e. the in-medium
T-matrix; Σ ∼ ℜTρB, σ ∼ ℑT , respectively. However, in most practical
applications phenomenological mean fields and cross sections have been used.
In such approach the strategy is to adjust to the known bulk properties of
nuclear matter around the saturation point, and to try to constrain the models
at supra-normal densities with the help of heavy ion reactions [27,28]. Medium
modifications of the NN cross sections are usually not taken into account.
In spite of that for several observables the comparison to experimental data
appears to work astonishingly well [27–30]. However, in particular kinematical
regimes a sensitivity to the elastic NN cross sections of dynamical observables,
such as collective flows and stopping [19,31] or transverse energy transfer [32],
has been observed.
Microscopic Dirac-Brueckner-Hartree-Fock (DBHF) studies for nuclear matter
above the Fermi energy regime show a strong density dependence of the elastic
[17] and inelastic [18,33] NN cross sections. In such studies one starts from the
bare NN-interaction in the spirit of the One-Boson-Exchange (OBE) model by
fitting the parameters to empirical nucleon-nucleus scattering and solves then
the equations of the nuclear matter many body problem in the T -matrix or
ladder approximation. It is not the aim of the present work to go into further
details on this topic. An important feature of such microscopic calculations is
the inclusion of the Pauli-blocking effect in the intermediate scattering states
of the T -matrix elements and their in-medium modifications, i.e. the density
dependence of the nucleon mass and momenta. Here of particular interest are
the in-medium modifications of the inelastic NN cross sections since they di-
rectly influence the production mechanism of resonances and thus the creation
of pions and kaons according to the channels listed later (see Sect.3). DBHF
studies on inelastic NN cross sections are rare and in limited regions of density
and momentum [18]. For this reason we will first discuss in the following the
in-medium dependence of the elastic NN cross sections, which will be then
used as a starting basis for a detailed analysis of the density dependence of
the inelastic NN cross sections.
The microscopic in-medium dependence of the elastic cross sections can be
seen in Fig. 1, where the energy dependence of the in-medium neutron-proton
(np) cross section at Fermi momenta kF = 0.0, 1.1, 1.34, 1.7fm
−1, correspond-
ing to ρB ∼ 0, 0.5, 1, 2ρ0 (ρ0 = 0.16fm−3 is the nuclear matter saturation den-
sity) is shown. These results are obtained from relativistic Dirac-Brueckner
calculations [17]. The presence of the medium leads to a substantial suppres-
sion of the cross section which is most pronounced at low laboratory energy
Elab and high densities where the Pauli-blocking of intermediate states is most
efficient. At larger Elab asymptotic values of 15-20 mb are reached. Also the
angular distributions are affected by the presence of the medium. E.g. the ini-
tially strongly forward-backward peaked np cross sections become much more
isotropic at finite densities, mainly due to the Pauli suppression of intermedi-
ate soft modes (π-exchange) [17]. As a consequence a larger transverse energy
transfer can be expected.
The case of the inelastic NN cross sections is similar, but more complicated.
The presence of the medium influences not only the matrix elements, but also
the threshold energy Etr, which is an important quantity at beam energies be-
low or near the threshold of particle production. In free space it is calculated
from the invariant quantity s = (p
1 + p
2 )(p1µ + p2µ) with p
i , (i = 1, 2) the
4-momenta of the two particles in the ingoing collision channel, e.g. NN −→
N∆. This quantity is conserved in binary collisions in free space, from which
one determines the modulus of the momenta of the particles in the outgoing
channel. The threshold condition reads Etr ≡
s ≥ M1 +M2. Cross sections
in free space are usually parametrized in terms of
s or the corresponding mo-
mentum in the laboratory system plab within the One-Boson-Exchange (OBE)
model, see e.g. [34] for details.
At finite density, however, particles carry kinetic momenta and effective masses
and obey a dispersion relation p∗µp
∗µ = m∗2 modified with respect to the free
case. These in-medium effects shift the threshold energy in the free space
according to s∗ = (p
2 )(p
2µ) and the threshold condition for inelastic
processes inside the medium reads now E∗
s∗ ≥ m∗
. The requirement
of energy-momentum conservation can be carried out in terms of the quantity
s∗ or s, only as long as the in-medium mean fields or the corresponding self
energies do not change between ingoing and outgoing channels.
The application of free parametrizations of cross sections for inelastic processes
in dynamical situations of HICs at finite density leads thus to an inconsistency,
since the threshold condition is performed in terms of effective quantities,
but the matrix elements are carried out in free space, e.g. by fitting their
parameters to free empirical NN scattering. This effect can be seen in Fig.
2 (left panel) where the free inelastic NN −→ N∆ cross section σinel as a
function of the laboratory energy Elab is displayed, at various baryon densities
ρB. The threshold energy in the free space is Etr =
s = 2.014 GeV (for
M = 0.939 GeV and Mmin = 1.076 for the nucleon and the lower limit mass of
the ∆ resonance). The corresponding threshold value of the laboratory energy
Elab = (E
− 4M2)/2M is 0.32 GeV. However, at finite density the threshold
is shifted towards lower energies, i.e. the free cross section increases, due to
the reduction of the free masses of the outgoing particles in the threshold
condition E∗
. Obviously at higher energies far from threshold the
free cross section does not depend on the density.
A more consistent approach is the determination of the inelastic cross section
under the consideration of in-medium effects, i.e. the Pauli-blocking of inter-
mediate scattering states and in-medium modified spinors in the determina-
tion of the matrix elements within the OBE model. A simultaneous treatment
of the transport equation and the structure equations of DBHF for actual
anisotropic momentum configurations is not possible, due to its high com-
plexity. For this reason we have applied the same method as for the case
0 0.5 1 1.5 2
 (GeV)
0 0.5 1 1.5 2
 (GeV)
=0.5ρ0
free effective
Fig. 2. Inelastic NN −→ N∆ cross section σinel at various baryon densities ρB (in
units of the saturation density ρ0 = 0.16 fm
−3) as a function of the laboratory
energy Elab using the free parametrizations (left) and the in-medium modified ones
from DBHF [18] (right).
of elastic binary processes, i.e. in-medium parametrizations of the inelastic
cross sections of the type NN −→ N∆ within the same underlying DBHF
approach as already used for the elastic processes. Haar and Malfliet [18]
investigated this topic for infinite nuclear matter with the result of a strong
in-medium modification of the inelastic cross sections due to the reasons given
above. However, these studies were performed at various densities but only in
a limited region of momenta. For a practical application in HICs we have
thus extended these DBHF calculations using an extrapolation technique. We
have imposed an exponential decay law of the form ae−bplab on the values of
the in-medium cross sections of the channel NN → N∆ given in ref. [18].
The parameter a normalizes to the last value of the extrapolated cross sec-
tion and b is defined by fitting the slope of the free cross section, since it
does not change with density. For the density dependence we have enforced
a correction of the form f(ρB) = 1 + a0(ρB/ρ0) + a1(ρB/ρ0)
2 + a2(ρB/ρ0)
where a0 = −0.601, a1 = 0223, a2 = −0.0035, with ρ0 saturation density,
are extracted from the results of ref.[18]. The same modification is imposed
on the cross sections of all the inelastic channels, in a form of the type
σeff = σfree(Elab)f(ρB), with σfree taken from the standard free parametriza-
tions of Ref. [34]. Such a procedure is well appropriate at low energies but at
higher momenta can be less accurate. This, however, should not be a problem
at the reaction energies below the kaon production threshold considered in
this work.
Fig. 2 (right panel) shows the energy dependence of the inelastic NN cross
section at various densities as obtained from DBHF calculations [17] for sym-
metric nuclear matter. As in the case of elastic processes (see Fig. 1), the
inelastic one drops with increasing baryon density ρB mainly due to the Pauli
blocking of intermediate scattering states and the in-medium modification of
the effective Dirac mass [17]. There are also phenomenological studies [16,33]
which give similar medium effects on the inelastic cross sections, within the
limitation to isospin symmetric nuclear matter. More suitable results would
come from a DBHF approach to isospin asymmetric nuclear matter. Only re-
cently such studies have been started [35], however, limiting to low momenta
regions, below the threshold energy of inelastic channels.
3 Kaon Potentials
Before starting with the presentation of the results, it is important to analyse
the in-medium kaon potential, since it could be relevant when theoretical re-
sults will be compared with experiments. In fact it has been widely discussed
whether the kaon potential plays a crucial role in describing kaon production
and their dynamics [23,7,9]. Kaplan and Nelson [20] found that the explicit chi-
ral symmetry breaking is not so small forK mesons and this leads to significant
corrections to the free kaon mass at finite baryon density. There are different
models for the description of kaon properties in the nuclear medium. Here we
will briefly discuss two main approaches, one based on Chiral Perturbation
Theory (ChPT ) and a second on effective meson couplings (OBE/RMF ),
more consistent with the general frame of our covariant reaction dynamics.
The results are in good agreement and this is not surprising on the basis of
a simple physics argument. It is well established [7] that kaons (K0,+) feel a
weak repulsive potential in nuclear matter, of the order of 20 − 30 MeV at
normal density. This can be described as the net result of the cancellation of an
attractive scalar and a repulsive vector interaction terms. Such a mechanism
can be reproduced in the ChPT approach through the competition between
an attractive scalar Kaplan-Nelson term [20] and a repulsive vector Weinberg-
Tomozawa [36] term. The same effect can be obtained in an effective meson
field scheme just via a coupling to the attractive σ-scalar and to the repulsive
ω-vector fields.
In this paper antikaons K− and their strong attractive potential will be not
discussed, since for the higher threshold they have been not considered in the
energy range of interest here.
Finally, for studies aimed to the determination of the symmetry energy from
strangeness production one has to treat with particular care the isospin de-
pendence of the kaon mean field potential.
3.1 Chiral Perturbative Results
Starting from an effective chiral Lagrangian for the K mesons one obtains a
density and isospin dependence for the effective kaon (K0,+) masses [7]. In
isospin asymmetric matter we finally get
m∗K =
m2K −
ρs3 + VµV µ (upper sign, K
+), (3)
where ρs, ρs3 are total and isospin scalar densities, with mK = 494MeV the
free kaon mass, fπ = 93MeV the pion decay constant, and ΣKN the kaon-
nucleon sigma term (attractive scalar), here chosen as 450 MeV. The vector
potential is given by:
8f ∗2π
8f ∗2π
jµ3 (upper sign, K
+), (4)
with jµ, jµ3 baryon and isospin currents. The f
π is an in-medium reduced pion
decay constant. It is expected to scale with density in a way similar to the chiral
condensate [37]. This leads to a reduction around normal density f ∗2π ≃ 0.6f 2π .
Such a reduction is compensated in one-loop ChPT by other contributions
in the scalar attractive term so we will use f ∗π only for the vector potential,
with an enhanced repulsive effect [7]. The constant C has been fixed from the
Gell-Mann-Okubo mass formula (i.e. in free space) to a value of 33.5MeV
[22]. In Eqs. (3-4) upper signs hold for K+ and lower signs for K0. As can be
seen, the vector term, which dominates over the scalar one at high density, is
more repulsive for K0 than for K+. This leads to a higher (lower) K0 (K+)
kaon in-medium energy given by the dispersion relation
EK(k) = k0 =
k2 +m∗2K + V0 (5)
The density dependence, evaluated in the chiral approach, of the quantity
EK(k)k=0 = m
K + V0 for K
0,+, that directly influences the in-medium pro-
duction thresholds is shown by the upper curves in Fig. 3 (left panel). In
particular, it can be noted that that K0 and K+ in medium-energy differs by
≈ 5% at ρB = 2ρ0 (with EK0 > EK+), at a fixed isospin asymmetry around
0.2. Therefore, the inclusion of isovector terms favors K+ over K0 production,
with a consequent reduction of the K0/K+ strangeness ratio.
0 1 2 3
0 1 2 3
0 - E
Fig. 3. Density dependence (ρ0 is the saturation density) of in medium kaon energy
(left panel) in unit of the free kaon mass (mK = 0.494GeV ). Upper curves refer
to ChPT model calculations: the central line corresponds to symmetric matter, the
other two give the isospin effect (up K0, down K+). Bottom curves are obtained
in the OBE/RMF approach, the solid one is for symmetric matter. The isospin
splitting is given by the dashed (NLρ) and dotted (NLρδ) lines, again up K0, down
K+. Right panel: relative weight of the isospin splitting, see text. All the curves are
obtained considering an asymmetry parameter α = 0.2.
3.2 Relativistic Mean Field Results
Kaon potentials can be also derived within an effective meson field OBE ap-
proach, fully consistent with the RMF transport scheme used to simulate the
reaction dynamics, see Eq.(1). We will use a simple constituent quark-counting
prescription to relate the kaon-meson couplings to the nucleon-meson cou-
plings, i.e. just a factor 3 reduction. Following the chiral argument discussed
before, only for the scalar vector case we have further increased the kaon
coupling to gωK ≃ 1.4/3gωN . This will ensure the required repulsion around
normal densities for K+s. Consistently the isospin dependence will be directly
derived from the coupling between the kaon fields and the ρ and δ isovector
mesons [22].
The in-medium energy carried by kaons will have the same form as in Eq.(5)
but with effective masses and vector potentials given by
m∗K =
m2K −mK(gσKσ ±
gδKgδN
m2K −mK(gσKσ ±
fδρs3) (6)
gωKgωN
gρKgρN
∗ρB ± fρρB3) (7)
where upper signs are forK+s. The fi ≡ g2iN/m2i , i = σ, ω, ρ, δ are the nucleon-
meson coupling constants used in our RMF Lagrangians and f ∗ω = 1.4fω due
to the enhanced kaon-scalar/vector coupling. σ represents the solution of the
non linear equation for the scalar/isoscalar field which gives the reduction of
the nucleon mass in symmetric matter, therefore we can directly evaluate the
kaon-σ coupling using
gσKσ =
(M −M∗)
where M∗ is the nucleon effective mass at the fixed baryon density.
In this RMF approach we can derive an almost analytical expression for
the isospin effects on the kaon in-medium energy Eq.(5) at k = 0. Using
the approximate form ρs ≃ M∗/E∗FρB for the scalar density, we get a rel-
ative weight of the isospin splitting of the kaon potentials ∆EK(k)k=0 ≡
EK0(k)k=0 −EK+(k)k=0 given by
2α(fρ − M
f ∗ω +
(mK − 16(M −M∗))
with α ≡ ρB3/ρB the asymmetry parameter.
We can now easily estimate the isospin splitting of K0 vs. K+ for the two
isovector mean field Lagrangians used here, NLρ and NLρδ. The effect will
be clearly larger when the δ coupling is included since we have to increase
the ρ-coupling fρ, see [14,15], but still the expected weight is relatively small,
going from about 1.5% (NLρ) to about 3.0% (NLρδ) at ρB = 2ρ0, for a fixed
isospin asymmetry around 0.2. The complete results are also shown in Fig. 3
(right panel). The agreement with the ChPT estimations is rather good, but
in the RMF scheme we see an overall reduced repulsion and a smaller isospin
splitting. Both effects are of interest for our discussion, the first affecting the
K0,+ absolute yields, the second important for the K0/K+ yield ratios.
4 Numerical realization and notations
The Vlasov term of the RBUU equation (1) is treated within the Relativistic
Landau-Vlasov method, in which the phase space distribution function f(x, p∗)
is represented by covariant Gaussians in coordinate and momentum space [38].
For the nuclear mean field or the corresponding EoS in symmetric matter the
fσ (fm
2) fω (fm
2) fρ (fm
2) fδ (fm
2) A (fm−1) B
NLρ 9.3 3.6 1.22 0.0 0.015 -0.004
NLρδ 9.3 3.6 3.4 2.4 0.015 -0.004
Table 1
Coupling parameters in terms of fi ≡ ( gimi )
2 for i = σ, ω, ρ, δ, A ≡ a
and B ≡ b
for the non-linear NL models [14] using the ρ (NLρ) and both, the ρ and δ mesons
(NLρδ) for the description of the isovector mean field.
NL2 parametrization [26] of the non-linear Walecka model [39] is adopted with
a compression modulus of 200 MeV and a Dirac effective mass of m∗ = 0.82 M
(M is the bare nucleon mass) at saturation. The momentum dependence enters
via the relativistic treatment in terms of the vector component of the baryon
self energy. The isovector components in the mean fields are introduced in the
NLρ,NLρδ Lagrangians as in the recent Refs. [14,15]. In Table 1 we report
all the coupling constants and the coefficients of the non-linear σ-terms.
The collision integral is treated within the standard parallel ensemble algo-
rithm imposing energy-momentum conservation. For the elastic NN cross sec-
tions the DBHF calculations of Ref. [17] have been used throughout this work.
At intermediate relativistic energies up to the threshold of kaon (K0,+) pro-
duction, i.e. Elab = 1.56 GeV, the major inelastic channels are (B, Y,K stand
for a baryon (nucleons N or a ∆-resonance), hyperon and kaon, respectively)
– NN ←→ N∆ (∆-production and absorption)
– ∆←→ πN (π-production and absorption)
– BB −→ BYK, Bπ −→ Y K (K-production from BB and Bπ-channels)
The produced resonances propagate in the same mean field as the nucleons,
and their decay is characterized by the energy dependent lifetime Γ which is
taken from Ref. [34]. The produced pions propagate under the influence of the
Coulomb interaction with the charged hadrons. Kaon production is treated
hereby perturbatively due to the low cross sections, taken from Refs. [40].
Kaons undergo elastic scattering and their phase space trajectories are deter-
mined by relativistic equations of motion, if the kaon potential is accounted
In the next section the results of transport calculations in terms of pion and
kaon yields and their rapidity distributions will be presented. The following
cases for the inelastic NN cross sections σinel and the kaon potential ΣK (scalar
and vector) will be particularly discussed:
– free σinel, without ΣK (w/o K-pot σfree)
– free σinel, with ΣK (w K-pot σfree)
– free σinel, with isospin dependent ΣK (w ID K-pot σfree)
– effective σinel, without ΣK (w/o K-pot σeff )
– effective σinel, with ΣK (w K-pot σeff )
– effective σinel, with isospin dependent ΣK (w ID K-pot σeff )
For pions only the different cases of σinel will be labelled, since they do not ex-
perience any potential, apart coulomb. One should note that in all calculations
only inelastic processes including the lowest mass resonance ∆(1232MeV )
have been considered, without accounting for the N∗(1440) resonance. This
will have not appreciable consequences for pions yields, but it slightly reduces
the kaon multiplicities.
5 Results
As mentioned in the introduction, the main topic of the present work is to
study the sensitivity of particle ratios to physical parameters such as in-
medium effects of cross sections and the isospin dependence of the kaon po-
tential. This is an important issue to clarify since there is some evidence sug-
gesting the yield ratios as good observables in determining the high density
behavior of the symmetry energy. In a near future these data will be exper-
imentally accessible with the help of reactions with radioactive ion beams.
However, a comparison of absolute values with experimental data, although it
is not the aim of this work, is essential and it has to be included in order to
show the consistency of our approach. Thus we will start the presentation of
the results first in terms of absolute yields, and comparison with data, before
passing to the main section on the particle ratios. Most calculations refer to
central 197Au+197 Au collisions at 1 AGeV .
5.1 Effects of in-medium inelastic NN cross sections on particle yields
5.1.1 Resonance and Pion Production
Here we study the role of the density dependence of the effective inelastic NN
cross sections on particle yields (pions and kaons). We start with the temporal
evolution of the ∆ resonances and the produced pions, as shown in Fig. 4.
The maximum of the multiplicity of produced ∆-resonances occurs around 15
fm/c which corresponds to the time of maximum compression. Due to their
finite lifetimes these resonances decay into pions (and nucleons) as ∆ −→
πN . Some of these pions are re-absorbed in the inverse process, i.e. πN −→
∆ but chemical equilibrium is never reached, as pointed out in [15]. This
0 10 20 30 40 50 60
time (fm/c)
0 10 20 30 40 50 60
time (fm/c)
Fig. 4. Time evolution of the ∆-resonances (left panel) and total pion yield (right
panel) for a central (b = 0 fm) Au+Au reaction at 1 AGeV incident energy. Cal-
culations with free (solid lines) and effective (DBHF, dashed lines) σinel are shown.
mechanism continues until all resonances have decayed leading to a saturation
of the pion yield for times t ≥ 50 fm/c (the so-called freeze-out time). The
resonance production takes place during the high density phase, where the
in-medium effects of the effective cross sections are expected to dominate.
In fact, the transport results with the in-medium modified σinel reduce the
multiplicity of inelastic processes, and thus the yields of ∆ resonances and
pions. However, the in-medium effect is not so pronounced here with respect
to similar phenomenological studies of Ref. [16,33], which should come from
the moderate density dependence of the effective cross sections, see also again
Fig. 2.
Fig. 5 shows the centrality dependence of the charged pion yields for Au+Au
collisions at 1.0 AGeV incident energy. The degree of centrality is characterized
by the observable Apart, which gives the number of participant nucleons and
can be calculated within a geometrical picture using smooth density profiles for
the nucleus [41]. Obviously Apart increases with decreasing impact parameter
b and its value approaches the total mass number of the two colliding nuclei
in the limiting case of b = 0 fm. As can be seen in Fig. 5, the charged pion
yields are enhanced with increasing Apart, particularly in a non-linear Apart-
dependence. As pointed out in [41], the charged pion multiplicities show a
similar non-linear increase also in the data. However, by directly comparing
the theoretical charged pion yields with the experiments [41] we observe that
our calculations overpredict the data, even when the in-medium reductions in
σinel are accounted for.
This discrepancy is a general feature of the transport models and may lie on
the role of the rescattering processes that take place in the spectator region,
0.0 100.0 200.0 300.0 400.0
 0  0
 5  5
 10  10
 15  15
 20  20
 (FOPI)
 (FOPI)
E=1.0 AGeV
Au+Au
Fig. 5. Centrality dependence (in terms of Apart) of the negative (π
−) and positive
(π+) charged pions for Au+Au collisions at 1 AGeV incident energy. Calculations
with free (solid lines, filled circles) and effective (dashed lines, filled squares) cross
sections are shown as indicated. Experimental data, taken from FOPI collaboration
[41], are also displayed for comparison.
where nuclear surface effects can play a crucial role. In order to check this
point we have performed a selection on pions produced at central rapidity,
where data are also available [41].
In Fig. 6 we present the inclusive (all centralities) pion rapidity distributions
vs. the FOPI data for charged pions. We see that the agreement is rather good
at mid-rapidity while we see a definite overcounting in the spectator sources.
Such a good evaluation of the pion production ad mid-rapidity is confirmed
by the results shown in Fig. 7, where we present the inclusive (all centralities)
pion transverse spectrum at midrapidity (−0.2 < y0 < 0.2). We first note that
this is also not much affected by the inclusion of the in-medium inelastic cross
sections. Moreover we see again that our results are in good agreement with
the experimental values from the FOPI collaboration [41], in the same rapidity
selection. The overestimation of the pion yields shown in Fig. 5 probably
results from other rapidity regions where the role of the spectator sources is
more evident. We have also to say that we are not imposing any experimental
filter to our results. The point is rather delicate since the main discrepancies
appear in high rapidity regions. In any case such a fine agreement at mid-
-2 -1 0 1 2
-2 -1 0 1 2
Fig. 6. Inclusive (all centralities) pion rapidity distributions for a Au+Au reaction
at Ebeam = 1 AGeV incident energy. Comparison with the experimental values given
by FOPI collaboration [41]; as in the data we have used a transverse momentum
cut to pt > 0.1GeV/c.
0 200 400 600 800
 (MeV)
 FOPI
 FOPIπ
0.1*π
Fig. 7. Inclusive transverse spectrum at midrapidity of π−, π+ for a Au+Au reaction
at Ebeam = 1 AGeV incident energy. Comparison with the experimental values given
by FOPI collaboration [41]. The cross sections are normalized to a rapidity interval
dy = 1.
rapidity is very important for the reliability of our results on kaon production,
mostly produced in that rapidity range via secondary πN,∆N channels, see
[15].
The pion reaction dynamics is furthermore not sensitively affected by the
in-medium inelastic cross sections. We restrict here the analysis to central
Au+Au collisions at 1 AGeV. In Fig. 8 we show cross section effects on the
-2 -1 0 1 2
-2 -1 0 1 2
Fig. 8. Rapidity distributions of negative and positive charged pions (left and right
panels, respectively) for a central (b = 0 fm) Au+Au reaction at Ebeam = 1 AGeV
incident energy.
rapidity distributions (normalized to the projectile rapidity in the cm sys-
tem) for π±, an observable which characterizes the degree of stopping or the
transparency of the colliding system. This is due to the fact that the global
dynamics is mainly governed by the total NN cross sections, in which its elas-
tic contribution is the same for all the cases. In previous studies [19,31] the
in-medium effects of the elastic NN cross sections gave important contribu-
tions to the degree of transparency or stopping. It was found that a reduction
of the effective NN cross section particularly at high densities is essential in
describing the experimental data [19], as confirmed by various other analy-
ses [31]. The density effects on the inelastic NN cross section influence only
those nucleons associated with resonance production, and therefore they do
not affect the global baryon dynamics significantly.
5.1.2 Kaon Production
The situation is different for kaon production, see Fig. 9. The influence of the
in-medium dependence of σinel is important, and reduces the kaon abundancies
by a factor of ≈ 30%. This is due to the fact that the leading channels for
kaon production are N∆ −→ BY K and Nπ −→ ΛK. Thus kaon production is
essentially a twostep process and the medium-modified inelastic cross sections
enter twice, leading to an increased sensitivity.
Fig. 10 shows the rapidity distributions of kaons, where the in-medium effect
is more visible with respect to the corresponding pion rapidity distributions
(see Fig. 8). These results seem to show that kaon production could be used
to determine the in-medium dependence of the NN cross section for inelas-
0 10 20 30 40 50 60
time (fm/c)
0 10 20 30 40 50 60
time (fm/c)
w/o K-pot σ
w/o K-pot σ
Fig. 9. Time evolution of the K0 (left panel) and K+ (right panel) multiplicities,
for the same reaction and models as in Fig. 4, with free and in-medium inelastic
cross sections, without the inclusion of the kaon potentials.
-2 -1 0 1 2
w/o K-pot σ
w/o K-pot σ
-2 -1 0 1 2
Fig. 10. Same as in Fig. 9, but for the normalized rapidity distributions.
tic processes. Similar phenomenological studies based on the BUU approach
[16,33] strongly support in-medium modifications of the free cross sections. It
is of great interest to perform an extensive comparison with experimental data
on kaon production, in order to have a more clear image of the effect of the
in-medium cross sections on their production. The point is that kaon absolute
yields are also largely affected by the kaon potentials, see the following, as
expected from the general discussion of the previous section. However since
kaons are mainly produced in more uniform high density regions the effects
of the medium on cross sections tend to disappear in the yield ratios. In the
next section we will show that the same holds true for the K0, K+ potentials.
Our conclusion is that the kaon yield ratios might finally be a rather robust
0 10 20 30 40 50 60
time (fm/c)
0 10 20 30 40 50 60
time (fm/c)
w/o K-pot
w K-pot
w ID K-pot
Fig. 11. Time evolution of the K0 (left panel) and K+ (right panel) multiplicities
for the same reaction as in Fig. 4. Calculations without (w/o K-pot, solid), with (w
K-pot, dashed) and with the isospin dependent (w ID K-pot,dotted-dashed) kaon
potential are shown. In all the cases the free choice for σinel is adopted.
observable to probe the nuclear EoS at high baryon densities.
5.2 The role of the kaon potential
As discussed in the previous sections, the important quantity which influences
the kaon production threshold is the in-medium energy at zero momentum [7].
This quantity rises with increasing baryon density and in the general case of
isospin asymmetric matter shows a splitting between K0 and K+, see Fig. 3.
We are presenting here several K-production results in ab initio collision sim-
ulations using the Chiral determination of the K-potentials, ChPT . Fig. 11
shows the time dependence of the two isospin states of the kaon with respect
to the role of the kaon potential and its isospin dependence. First of all, the
repulsive kaon potential considerably reduces the kaon yields, at least in this
ChPT evaluation.
The inclusion of the isospin dependent part of the kaon potential slightly
modifies the kaon yields, towards a larger K+ production in neutron-excess
matter. However by comparing to the corresponding isospin dependence of
the in-medium kaon energy, see Fig. 3, the effect is less pronounced in the
dynamical situation. This is due to the fact that in heavy ion collisions the local
asymmetry in the interacting region varies with time, see [15]. In particular,
it decreases with respect to the initial asymmetry because of partial isospin
equilibration due to stopping and inelastic processes with associated isospin
exchange. This is reflected also in the kaon rapidity distributions, see Fig. 12,
-2 -1 0 1 2
-2 -1 0 1 2
w/o K-pot
w K-pot
w ID K-pot
Fig. 12. Same as in Fig. 11, but for the rapidity distributions.
-1 0 1 2
normalized rapidity y
-1 0 1 2
normalized rapidity y
-1 0 1 2
normalized rapidity y
w/o K-pot w K-pot w ID K-pot
Fig. 13. K+ rapidity distributions for semi-central(b < 4 fm) Ni+Ni reactions at
1.93 AGeV. Theoretical calculations (as indicated) are compared with the exper-
imental data of FOPI (open triangles) and KaoS (open diamonds) collaborations
[42,43].
where the role of the kaon potential is crucial, but not its isospin dependence.
As we have already seen even in-medium modifications of inelastic cross sec-
tions are affecting the kaon absolute yields, so it appears of interest to look at
the combined effects. For that purpose we have performed calculations for a
semi-central (b < 4fm) Ni+Ni system at 1.93AGeV , where data are existing
from the FOPI [42] and KaoS [43] collaborations. The results for K+ rapidity
distributions, compared to experimental data, are shown in Fig.13.
We observe that although the kaon yields are reduced when using the in-
medium inelastic cross section, we are still rather far away from the data, left
panel of Fig. 13. We note that the reduction due to the density dependence
of the effective inelastic cross sections is rather moderate here with respect
to that of the heavier Au-system (see Fig. 10). for kaons). This is due to the
less compression achieved for the lighter Ni-systems. The inclusion of the kaon
potential, without (central panel) and with (right panel) isospin dependence,
is further suppressing the K+ yield, towards a better agreement with data, as
expected for the repulsive behavior at high density.
In fact the results obtained with kaon potentials and effective cross sections
seem to underestimate the data. This could be an indication that the ChPT
K-potentials are too repulsive at densities around 2ρ0 where kaons are pro-
duced, see [15]. We like to remind that the parameters of ChPT potentials
are essentially derived from free space considerations. When we follow a more
consistent RMF approach, directly linked to the effective Lagrangians used to
describe bulk properties of the nuclear matter as well as the relativistic trans-
port dynamics, we see less repulsion, bottom curves in Fig. 3 (left panel). This
is valid also for the isospin dependent part of the K-potentials, that more di-
rectly will affect the K0/K+ yield ratio. We see from the same Fig. 3 (right
panel) that in the RMF frame this splitting is reduced to a few percent for
all the different isovector interactions. The conclusion is that when kaon po-
tentials are evaluated within a consistent effective field approach we have a
better agreement with data for absolute yields, with a very similar reduction
of the K0 and K+ rates. This is important for the yield ratio, that then should
be not much sensitive to the in-medium effects on kaon propagation.
A similar conclusion on K-potential effects, obtained within the ChPT ap-
proach, can be drawn from the centrality dependence of the K+ yields shown
in Fig. 14 in the case of Au+Au collisions at 1 AGeV beam energy and com-
pared with KaoS data [44]. The trend in centrality can be reproduced by all
theoretical calculations (with different cross sections), however, all of them
seem to underestimate the experimental yields.
In fact we have to mention that another possible source of the discrepancy
with data can be that in all our simulations only the lowest mass resonance
∆(1232MeV ) has been dynamically included. Transport calculations from
other groups, that take care also of the N∗(1440MeV ) resonance, are getting
an enhancement of the K+ yield for Au+Au collisions at 1 AGeV incident
energy [7]. This significant dependence of the kaon yields on the N∗ resonance
comes from the 2-pionic N∗-decay channel, i.e. N∗ −→ ππN . Therefore, since
the most important channels of kaon production are the pionic ones, we can
expect some underestimation of the absolute yields in our calculations. Just
to confirm this point, in Fig. 14 we report also transport results from the
Tübingen group, in which all resonances are accounted for [7]. We finally re-
0,0 0,2 0,4 0,6 0,8
 5,0×10
 1,0×10
 1,5×10
 2,0×10
0,0 0,2 0,4 0,6 0,8 1,0
w K-pot w ID K-pot
Fig. 14. K+ centrality dependence in Au+Au reactions at 1 AGeV incident energy.
Our theoretical calculations (as indicated) are compared with KaoS data from [44]
(open diamonds) and with results of the Tübingen group (open squares).
mark that the inclusion of other nucleon resonances in neutron-rich matter
will further contribute to increase the K0 yield through a larger intermedi-
ate π− production. This can contribute to compensate the opposite effect of
isospin dependent part of the K-potentials on the K0/K+ yield ratios.
5.3 Pionic and Strangeness Ratios
A crucial question is whether particle yield ratios are influenced by in-medium
effects both on inelastic cross section and kaon potentials. This point is of ma-
jor importance particularly for kaons, since ratios of particles with strangeness
have been widely used in determining the nuclear EoS at supra-normal density.
Relative ratios of kaons between different colliding systems have been utilized
in determining the isoscalar sector of the nuclear EoS [5]. More recently, the
(π−/π+)- and (K0/K+)-ratios have been proposed in order to explore the high
density behavior of the symmetry energy, i.e. the isovector part of the nuclear
mean field [14,15,13,12].
Fig. 15 shows the incident energy dependence of the pionic (π−/π+, left panel)
and strangeness (K0/K+, right panel) ratios for the different choices of in-
elastic cross sections and kaon potentials, as widely discussed in the previous
sections.
First of all, a rapid decrease of the pionic ratio with increasing beam energy
0.8 1 1.2 1.4 1.6 1.8 2
 (AGeV)
0.8 1 1.2 1.4 1.6 1.8 2
 (AGeV)
free w/o K-pot
eff w/o K-pot
0.8 1 1.2 1.4 1.6 1.8 2
 (AGeV)
free w K-pot
eff w K-pot
0.8 1 1.2 1.4 1.6 1.8 2
 (AGeV)
tio σ
free w ID K-pot
eff w ID K-pot
Fig. 15. Energy dependence of the π−/π+ (left panel) and K0/K+ (right panel)
ratios for central (b = 0 fm) Au+Au reactions.
is observed, related to the opening of secondary rescattering processes (reab-
sorption/recreation of pions with associated isospin exchange) channels. The
corresponding strangeness ratio depends only moderately on beam energy due
to the absence of secondary interactions with the hadronic environment.
The pionic ratio is partially affected by the in-medium effects of σinel, as it can
be seen in the left panel of Fig.15. Its slope is slightly changing with respect to
beam energy. The situation is similar for the strangeness ratio, which actually
appears even more robust vs. in-medium modifications, even with the kaon
potentials. This can be seen in the right panel of Fig.15, where for all the
considered beam energies the ratio remains almost unchanged. Such a result
is consistent with those of the previous sections, where it was found that the
absolute kaon yields decrease in the same way when the effective σinel are
applied and when the K-potentials are included.
The different sensitivity to variations in the inelastic cross sections of pionic
vs. strangeness ratios can be easily understood. For the large rescattering and
lower masses pions can be produced at different times during the collision,
in different density regions. At variance kaons are mainly produced at early
times in a rather well definite compression stage, i.e. in a source with a more
uniform high density, and so the density dependence of the inelastic cross
sections will affect in the same way neutral and charged kaon yields, leaving
the ratio unchanged. At this level of investigation one could argue that the
strangeness ratio is a very promising observable in determining the nuclear
EoS and particularly its isospin dependent part. This has been also the main
conclusion in Ref. [15].
However, a strong isospin dependence of the kaon potentials could directly
NL NLρ NLρδ
K0/K+ (w/o K-pot) 1.24 (± 0.02) 1.35 (± 0.01) 1.43 (± 0.02)
K0/K+ (w ID K-pot) 1.02 (± 0.03) 1.22 (± 0.04) 1.34 (± 0.05)
Table 2
Sensitivity of the strangeness ratio K0/K+ to the isospin dependent kaon potential
and to the isovector mean field (NL, no isovector fields, NLρ and NLρδ). The
considered reaction is a central (b = 0 fm) Au+Au collision at 1 AGeV incident
energy.
affect the ratio, since the K0 and K+ rates will be modified in opposite ways.
This is shown by the two triangle points at 1 AGeV in the right panel of Fig.
15. As already discussed this large isospin dependence of the kaon potential,
clearly present in the ChPT evaluation, is greatly reduced in a consistent
mean field approach, see Fig. 3 and the arguments presented in Section 3.
In any case this point deserves more detailed studies. We plan to perform
ab initio kaon-production simulations within the OBE/RMF evaluation of
kaon potentials, with an isospin part fully consistent with the isovector fields
of the Hadronic Lagrangians used for the reaction dynamics.
An interesting final comment is that the sensitivity of the strangeness ratio
to the isovector part of the nuclear EoS remains even when strong isospin
dependence of the kaon potentials is inserted, as in the ChPT case.
In order to check this, we have repeated for Au+Au at 1 AGeV incident energy
the calculations by varying the isovector part of the nuclear mean field. As
in Refs. [14,15], three options for the isovector mean field have been applied:
the NL (no isovector fields), NLρ and NLρδ parametrizations, but now in-
cluding the isospin effect in the kaon potential in the ChPT evaluation. The
options of the symmetry energy differ from each other in the high density stiff-
ness. NL gives a relatively soft Esym, NLρδ a relatively stiff one, and NLρ
lies in the middle between the other limiting cases [14]. Table 2 shows the
strangeness ratio as function of these different cases for the isovector mean
field, keeping now constant the other parameters (free σinel, isospin dependent
kaon potential). The ratio, indeed, strongly decreases when the isospin part in
the kaon potential is accounted for. The more interesting result is, however,
that the relative difference between the different choices of the symmetry en-
ergy remains stable. This can be understood from the fact that in the kaon
self energies the isospin sector contains only the isospin densities and currents
without additional parameters such meson-nucleon coupling constants. Since
the local asymmetry does not strongly vary from one case to the other (NL,
NLρ, NLρδ), one would expect a robustness of the EoS dependence. Thus
we conclude that the strangeness ratio appears to be well suited in determin-
ing the isovector EoS, however, a fully consistent mean field approach is still
missing.
6 Conclusions
We have investigated the role of the in-medium modifications of the inelastic
cross section and of the kaon mean field potentials on particle production in
intermediate energy heavy ion collisions within a covariant transport equation
of a Boltzmann type. We have used for both, the elastic and inelastic NN cross
sections the same DBHF approach which provide in a parameter free manner
the in-medium modifications of the imaginary part of the self energy in nu-
clear matter. The kaon potential has been evaluated in two ways, following a
Chiral Perturbative approach and an Effective Field scheme, considering va-
lence quark-meson couplings. We have applied these modifications of the cross
sections and kaon potentials to the collision integral of the transport equation
and analyzed Au+Au and Ni+Ni collisions at intermediate relativistic energies
around the kaon threshold energy.
Our studies have shown a good sensitivity of the particle multiplicities and
rapidity distributions of pions and kaons. In particular, a moderate reduction
for pions has been seen when the in-medium effects in the inelastic cross section
are accounted for. The pion yields are still overestimating the inclusive data
while we have a very nice agreement with the pion spectra and multiplicities at
mid-rapidity. The latter point is important for trusting the kaon production,
mainly due to secondary pion collisions at mid-rapidity.
At variance the kaon (K0,+) yields show a larger sensitivity to the reduction
of the inelastic cross sections, with a decrease of about 30 %. However we
see that the introduction of a repulsive kaon potential is essential in order to
reproduce even inclusive data.
We have then focused our attention on π−/π+ and K0/K+ yield ratios, re-
cently suggested as good probes of the isovector part of the EoS at high den-
sities. The pionic ratios, due to their strong secondary interaction processes
with the hadronic environment, show a dependence on the density behavior
of the inelastic cross sections. A further selection of the production source, i.e.
a transverse momentum discrimination, could be required in order to have a
more reliable probe of the nuclear EoS.
The situation appears more favorable for the kaon ratios. In fact we find that
the multiplicities of K0 and K+ are influenced in such a way that their ratio
is not affected by the density dependence of the inelastic cross sections. This
is due to the long mean free path of the K0,+ that are produced only in the
compression stage of the collision [15]. The effects of the in medium kaon
potentials are also largely compensating in the K0/K+ yield ratio, due to
the similar repulsive field seen by K0 and K+ mesons. Such a result can be
modified by the isospin dependence of the kaon potentials which is expected to
act in opposite directions for neutral and charged kaons rates. Actually this is a
rather stimulating open problem. In our analysis with two completely different
approaches, ChPT vs. RMF , we get a good agreement for the isoscalar kaon
potential but a rather different prediction for the isovector part.
However, a study in terms of the different choices of the isovector kaon field
has shown that the relative dependence of the strangeness ratios on the stiff-
ness of the isovector nuclear EoS remains a well robust observable. This is
an important issue in determining the high density behavior of the symmetry
energy in more systematic analyses in the future, when more experimental
data will be available.
Acknowledgments. This work is supported by BMBF, grant 06LM189 and
the State Scholarships Foundation (I.K.Y.). It is also co-funded by European
Union Social Fund and National funded Pythagoras II - EPEAEK II, under
project 80861. One of the authors (V.P.) would like to thank H.H. Wolter and
M. Di Toro for the warm hospitality during her short stays at their institutes.
References
[1] J. M. Lattimer, M.Prakash, Nucl. Phys A777, (2006) 479 and refs. therein;
B. Liu, H. Guo, V. Greco, U. Lombardo, M. Di Toro and Cai-Dian Lue, Eur.
Phys. J. A22, (2004) 337.
[2] T.Klähn et al., Phys. Rev. C74, (2006) 035802.
[3] P. Danielewicz, R. Lacey, W.G. Lynch, Science 298, (2002) 1592
[4] N. Hermann, J.P. Wessels, T. Wienold, Annu. Rev. Nucl. Part. Sci. 49, (1999)
[5] C. Fuchs et al., Phys. Rev. Lett. 86, (2001) 1974.
[6] A.B. Larionov, U. Mosel, Phys. Rev. C72, (2005) 014901.
[7] C. Fuchs, Prog. Part. Nucl. Phys. 56, (2006) 1.
[8] J. Aichelin, C.M. Ko, Phys. Rev. Lett. 55, (1985) 2661.
[9] C. Hartnack, H. Oeschler, J. Aichelin, Phys. Rev. Lett. 90, (2003) 102302.
[10] C. Sturm et al. (KaoS Collaboration), Phys. Rev. Lett. 86, (2001) 39.
[11] A. Andronic et al. (FOPI Collaboration), Phys. Lett. B612, (2005) 173.
[12] Bao-An Li, Phys.Rev.C71, (2005) 014608.
[13] Qing-feng Li, Zhu-xia Li, En-guang Zhao, Raj K. Gupta, Phys. Rev. C71,
(2005) 054907.
[14] G. Ferini, M. Colonna, T. Gaitanos, M. Di Toro, Nucl. Phys. A762, (2005)
147-166.
[15] G. Ferini, T. Gaitanos, M. Colonna, M. Di Toro, H.H. Wolter, Phys. Rev. Lett.
97, (2006) 202301..
[16] A.B. Larionov, W. Cassing, S. Leupold, U. Mosel, Nucl.Phys. A696, (2001)
[17] C. Fuchs et al., Phys. Rev. C 64, (2001) 024003.
[18] B. Ter Haar, R. Malfliet, Phys. Rev. C36, (1987) 1611.
[19] T. Gaitanos, C. Fuchs, H.H. Wolter, Phys. Lett. B609, (2005) 241;
E. Santini, T. Gaitanos, M. Colonna, M. Di Toro, Nucl. Phys. A756, (2005)
T. Gaitanos, C. Fuchs, H.H. Wolter, Prog. Part. Nucl. Phys. 53, (2004) 45.
[20] D.B. Kaplan, A.E. Nelson, Phys. Lett. B175, (1986) 57;
A.E. Nelson, D.B. Kaplan, Phys. Lett. B192, (1987) 193.
[21] G.Q. Li, C.H. Lee, G.E. Brown, Nucl. Phys. A625, (1997) 372.
[22] J. Schaffner-Bielich, I.N. Mishustin, J. Bondorf, Nucl. Phys. A625, (1997) 325.
[23] E.L. Bratkovskaya, W. Cassing, U. Mosel, Nucl. Phys. A622, (1997) 593.
[24] X. Lopez et al. (FOPI Collaboration), submitted for publication.
[25] L.P. Kadanoff, G. Baym, Quantum Statistical Mechanics (Benjamin, New York,
1962).
[26] B. Blättel, V. Koch, U. Mosel, Rep. Prog. Phys. 56, (1993) 1.
[27] P. Danielewicz, Nucl. Phys. A 673, 375 (2000).
[28] A.B. Larionov et al., Phys. Rev. C62, 064611 (2000).
[29] T. Gaitanos, C. Fuchs, H.H. Wolter, Amand Faessler, Eur.Phys.J. A12, (2001)
[30] T. Gaitanos, C. Fuchs, H.H. Wolter, Nucl.Phys. A741, (2004) 287.
[31] D. Persam, C. Gale, Phys. Rev. C65, (2002) 064611.
[32] P. Danielewicz, Acta Phys. Polon. B33, (2002) 45.
[33] A.B. Larionov, U. Mosel, Nucl.Phys. A728, (2003) 135.
[34] H. Huber, J. Aichelin, Nucl. Phys. A573, (1994) 587.
[35] E.N.E. van Dalen, C. Fuchs, Amand Faessler, Phys, Rev. C72, (2005) 065803.
[36] S. Weinberg, Phys. Rev. 166, (1968) 1568.
[37] G.E.Brown, M.Rho, Nucl. Phys. A596, (1996) 503.
[38] C. Fuchs, H.H. Wolter, Nucl. Phys. A589, (1995) 732.
[39] J.D. Walecka, Ann. Phys. (N.Y.) 83, (1974) 491.
[40] K. Tsushima, A. Sibirtsev, A.W. Thomas, G.Q. Li, Phys. Rev. C59, (1999)
K. Tsushima, S.W. Huang, Amand Faessler, Phys. Lett. B337, (1994) 245;
Austral. J. Phys. 50, (1997) 35 (nucl-th/9602005).
[41] D. Pelte et al., Z. Phys. A357, (1997) 215;
More precise pion data have been recently published in W. Reisdorf et. al.,
FOPI Collaboration, Nucl. Phys. A781 (2007) 459.
[42] D. Best et al. (FOPI collaboration), Nucl. Phys. A625, (1997) 307
[43] M. Menzel et al. (KaoS collaboration), Phys. Lett. B495, (2000) 26.
[44] R. Barth et al. (KaoS collaboration), Phys. Rev. Lett. 78, (1997) 4007;
P. Senger, H. Ströbele, J. Phys. G25, (1999) R59.
ABSTRACT
  The effect of possible in-medium modifications of nucleon-nucleon ($NN$)
cross sections on particle production is investigated in heavy ion collisions
($HIC$) at intermediate energies. In particular, using a fully covariant
relativistic transport approach, we see that the density dependence of the {\it
inelastic} cross sections appreciably affects the pion and kaon yields and
their rapidity distributions. However, the $(\pi^{-}/\pi^{+})$- and
$(K^{0}/K^{+})$-ratios depend only moderately on the in-medium behavior of the
inelastic cross sections. This is particularly true for kaon yield ratios,
since kaons are more uniformly produced in high density regions. Kaon
potentials are also suitably evaluated in two schemes, a chiral perturbative
approach and an effective meson-quark coupling method, with consistent results
showing a similar repulsive contribution for $K^{+}$ and $K^{0}$. As a
consequence we expect rather reduced effects on the yield ratios. We conclude
that particle ratios appear to be robust observables for probing the nuclear
equation of state ($EoS$) at high baryon density and, particularly, its
isovector sector.

<|endoftext|><|startoftext|>
Introduction
One famous conjecture of Erdös and Turán [2] asserts that any set A ⊂ N with∑
a∈A 1/a = ∞ should contain infinitely many progressions of arbitrary length
k ≥ 3. There are two important progresses towards this direction due to Szemerédi
[7] and Green and Tao [5] respectively, which assert that if A has positive upper
density or A is the set of the prime numbers, then A contains infinitely many
progressions of arbitrary length.
If one considers the similar question in the two-dimensional plane, Graham [4]
conjectured that if B ⊂ N× N satisfies
(x,y)∈B
x2 + y2
then B contains the four vertices of an axes-parallel square. More generally, for
any s ≥ 2 it should be true that B contains an s× s axes-parallel grid. Furstenberg
and Katznelson [3] proved the two-dimensional Szemerédi theorem, that is, any set
B ⊂ N × N with positive upper density contains an s × s axes-parallel grid. In
another words, such a set B contains any finite pattern.
The purpose of this paper is to show that if the Graham conjecture is true, then
the Erdös-Turán conjecture is also true.
2. The Graham conjecture implies the Erdös-Turán conjecture
Suppose that the Erdös-Turán conjecture is false for k = 3. Then there exists a
A = {a1 < a2 < a3 < · · · } ⊂ N
Date: April 4, 2007.
2000 Mathematics Subject Classification. 11B25.
http://arxiv.org/abs/0704.0555v1
2 LIANGPAN LI
n∈N 1/an = ∞ such that A contains no arithmetic progression of length 3.
Define a set B ⊂ N× N by
(an +m,m) : n ∈ N,m ∈ N
(x,y)∈B
x2 + y2
(an +m)2 +m2
(an +m)2 +m2
(an + an)2 + a2n
In the sequel we indicate that B contains no square and argue it by contradiction.
This would mean that the Graham conjecture is false for s = 2. Suppose that for
some n,m, l ∈ N, B contains a square of the following form:
(an +m,m+ l), (an +m+ l,m+ l),
(an +m,m), (an +m+ l,m).
It follows easily from the construction of B that an − l, an, an + l ∈ A, which yields
a contradiction since A contains no arithmetic progression of length 3 according to
the initial assumption.
Similarly, if the Graham conjecture is true for some s ≥ 2, then the Erdös-Turán
conjecture is true for k = 2s− 1. The interested reader can easily provide a proof.
3. Concluding Remarks
Let r(k,N) be the maximal cardinality of a subset A of {1, 2, . . . , N} which is
free of k-term arithmetic progressions. Behrend [1] and Rankin [6] had shown that
r(k,N) ≥ N · exp(−c(logN)1/(k−1)).
Similarly, let r̃(s,N) be the maximal cardinality of a subset B of {1, 2, . . . , N}2
which is free of s× s axes-parallel grids. For any set A ⊂ {1, 2, . . . , N}, define
Θ(A) = {(a+m,m) : a ∈ A,m = 1, 2, . . . , N} ⊂ {1, 2, . . . , 2N}2.
Following the discussion in Section 2, one can easily deduce that if A is free of 2s−1
term of arithmetic progression, then Θ(A) is free of s× s axes-parallel grid. Hence
r̃(s, 2N) ≥ r(2s− 1, N) ·N
≥ N2 exp(−c(logN)1/(2s−2)).
We end this paper with a question. Does the Erdös-Turán conjecture imply the
Graham conjecture?
THE GRAHAM CONJECTURE IMPLIES THE ERDÖS-TURÁN CONJECTURE 3
References
[1] F.A. Behrend, On sets of integers which contain no three terms in arithmetic progression,
Proc. Nat. Aca. Sci. 32 (1946), 331–332.
[2] P. Erdös and P.Turán, On some sequences of integers, J. London Math. Soc. 11 (1936),
261–264.
[3] H. Furstenberg and Y. Katznelson, An ergodic Szemeredi theorem for commuting transfor-
mation, J. d’Analyse Math. 34 (1979), 275–291.
[4] R. Graham, Conjecture 8.4.6 in Discrete and Computational Geometry (J.E. Goodman and
J. O’Rourke, eds), CRC Press, Boca Raton, NY, p.11.
[5] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, to appear
in Ann. of Math.
[6] R.A. Rankin, Sets of integers containing not more than a given number of terms in arithmetic
progression, Proc. Roy. Soc. Edinburgh Sect A. 65 (1960/61), 332–344.
[7] E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Acta
Arith. 27 (1975), 299–345.
Department of Mathematics, Shanghai Jiaotong University, Shanghai 200240, Peo-
ple’s Republic of China
E-mail address: liliangpan@sjtu.edu.cn
ABSTRACT
  Erd\"{o}s and Tur\'{a}n once conjectured that any set $A\subset\mathbb{N}$
with $\sum_{a\in A}{1}/{a}=\infty$ should contain infinitely many progressions
of arbitrary length $k\geq3$. For the two-dimensional case Graham conjectured
that if $B\subset \mathbb{N}\times\mathbb{N}$ satisfies $$\sum\limits_{(x,y)\in
B}\frac{1}{x^2+y^2}=\infty,$$ then for any $s\geq2$, $B$ contains an $s\times
s$ axes-parallel grid. In this paper it is shown that if the Graham conjecture
is true for some $s\geq2$, then the Erd\"{o}s-Tur\'{a}n conjecture is true for
$k=2s-1$.

<|endoftext|><|startoftext|>
Effective conservation of energy and momentum
algorithm using switching potentials suitable for
molecular dynamics simulation of
thermodynamical systems
Christopher G. Jesudason ∗
Laboratory of Physics and Helsinki Institute of Physics,
P.O.Box 1100, FIN-02015 HUT, Finland.
Email: jesu@um.edu.my, chrysostomg@gmail.com
4 April, 2007
Abstract
During a crossover via a switching mechanism from one 2-body poten-
tial to another as might be applied in modeling (chemical) reactions in
the vicinity of bond formation, energy violations would occur due to finite
step size which determines the trajectory of the particles relative to the
potential interactions of the unbonded state by numerical (e.g. Verlet)
integration. This problem is overcome by an algorithm which preserves
the coordinates of the system for each move, but corrects for energy dis-
crepancies by ensuring both energy and momentum conservation in the
dynamics. The algorithm is tested for a hysteresis loop reaction model
with an without the implementation of the algorithm. The tests involve
checking the rate of energy flow out of the MD simulation box; in the
equilibrium state, no net rate of flows within experimental error should
be observed. The temperature and pressure of the box should also be
invariant within the range of fluctuation of these quantities. It is demon-
strated that the algorithm satisfies these criteria
AMS (MSC2000) Subject Classification. 00A71-2, 70H05, 80A20
1 PRELIMINARIES
The dimeric particle reaction simulated may be written
A2 (1.1)
∗on leave from Chemistry Department, University of Malaya, 50603 Kuala Lumpur,
Malaysia.
http://arxiv.org/abs/0704.0556v1
where k1 is the forward rate constant and k−1 is the backward rate constant.
The reaction simulation was conducted at extremely high temperatures which
are off-scale and not used in ordinary simulations of LJ (Lennard-Jones) fluids
where normally [1] the reduced temperatures T ∗ ranges ∼ 0.3 − 1.2, whereas
here, T ∗ ∼ 8.0 − 16.0, well above the supercritical regime of the LJ fluid At
these temperatures, the normal choices for time step increments do not obtain
without also taking into account energy-momentum conservation algorithms in
regions where there are abrupt changes of gradient [1, 2, 3]. The global literature
does not seem to cover such extreme conditions of simulation with discrete time
steps using the Verlet velocity algorithm. The units used here are reduced LJ
ones [1]. The simulation was at density ρ = 0.70 with 4096 atomic particls which
could react. The potentials used are as given in Fig. (1) where rb = 1.20 for the
vicinity where the bond of the dimer is broken and where 2 free particles emerge,
and rf = 0.85 is the point along the hysteresis potential curve where the dimer
is defined to exist for two previously free particles which collide. The reaction
proceeds as follows; all particles interact with the splined LJ pair potential uLJ
except for the dimeric pair (i, j) formed from particles i and j which interact
with a harmonic-like intermolecular potential modified by a switch u(r) given
u(r) = uvib(r)s(r) + uLJ [1− s(r)] (1.2)
where uvib(r) is the vibrational potential given by eq.(1.3) below
uvib(r) = u0 +
k(r − r0)
2 (1.3)
The switching function s(r) is defined as
s(r) =
)n (1.4)
where
s(r) → 1 if r < rsw
s(r) → 0 for r > rsw
The switching function becomes effective when the distance between the atoms
approach the value rsw (see Fig. (1) ). Some of the other parameters used in
the equations that follow include:
u0 = −10, r0 = 1.0, k ∼ 2446 (exact value is determined by the other input pa-
rameters), n = 100, rf = 0.85, rb = 1.20, and rsw = 1.11. Particles i and j above
also interact with all other particles not bonded to it via uLJ . Full simulation
details are given elsewhere [2]; suffice to say the activation energy at rf is ex-
tremely high at approximately 17.5. At rf , the molecular potential is turned on
where at this point there is actually a crossing of the potential curves although
the gradients of the molecular and free uLJ potentials are ”‘very close”’. On
the other hand, at rb , the switch forces the two curves to coalesce, but detailed
examination shows that there is an energy gap of about the same magnitude as
the cut-off point in a normal non-splined LJ potential (∼ 0.04 energy units),
meaning there is no crossing of the potentials. The current algorithm is applied
for both these cross-over regions with their different mechanisms of cross-over.
The MD cell is rectangular, with unit distance along the axis ( x direction)
of the cell length, whereas the breadth and height was both 1/16, implying a
thin pencil-like system where the thermostats were placed at the ends of the
MD cell, and the energy supplied per unit time step δt at both ends of the cell
(orthogonal to the x axis) in the vicinity of x = 0 and x = 1 maintained at
temperatures Th and Tl could be monitored, where this energy per unit step
time is respectively ǫh and ǫl. At equilibrium, (when Th = Tl), the net energy
supplied within statistical error (meaning 1-3 units of the standard error of the
ǫ distributions ) is zero, i.e. ǫl ≈ ǫh ≈ 0. The cell is divided up uniformly into
64 rectangular regions along the x axis and its thermodynamical properties of
temperature and pressure are probed. The resulting values of the ǫ’s and the
relative invariance of the pressure and temperature profiles would be a measure
of the accuracy of the algorithm from a thermodynamical point of view at the
steady state. For systems with a large number of particles such thermodynam-
ical criteria are appropriate. The synthetic thermostats now frequently used
in conjunction with ”‘non-Hamiltonian”’ MD [3] cannot be employed for this
type of study, where actual energy increments are sampled. The runs were for
4 million time steps, with averages taken over 100 dumps, where each variable
is sampled every 20 time steps. The final averages were over the 20-100 dump
values of averaged quantities.
0.8 1 1.2 1.4 1.6 1.8
r/LJ distance units
Potentials for simulation model
 intermolecular potential
s(r) switching function
atomic LJ potential
Figure 1: Potentials used for this work.
The temperature T and pressure p are computed by the equipartition and
0 20 40 60 80
x layer number
Figure 2: Temperature profile across the cell for different set conditions a−e for
temperature T ∗ and step time δt pairs (T ∗, δt) where a = (8.0, 2.0 ep− 3), b =
(8.0, 5.0 ep− 4), c = (8.0, 5.0 ep− 5), d = (12.0, 5.0 ep− 5), e = (16.0, 5.0 ep− 5).
The curves {l1, l3, t1, t2, t3} results with the application of the algorithm at rb
and rf with associated conditions l1 ⇔ a, l3 ⇔ b, t1 ⇔ c, t2 ⇔ d, t3 ⇔ e whilst
the curves {l2, l4, l5, l6, l7} are for the cases without implementing the algorithm
with the associated conditions l2 ⇔ a, l4 ⇔ b, l5 ⇔ c, l6 ⇔ d, l7 ⇔ e, where
ep x ≡ 10x.
0 20 40 60 80
x layer number
Figure 3: Pressure profile across the cell for different runs.The conditions of the
runs and the labeling of the curves are exactly as in Fig. (2).
Curve ǫh ǫl Mean Temperature
l1 -.2274E+00 ±0.19E-02 -.2295E+00 ±0.21E-02 0.9063E+01 ±0.62E-02
l2 -.5602E+00 ±0.22E-02 -.5596E+00 ±0.22E-02 0.1032E+02 ±0.63E-02
l3 -.4161E-01 ±0.14E-02 -.4089E-01 ±0.14E-02 0.8774E+01 ±0.79E-02
l4 -.5201E-01 ±0.16E-02 -.5103E-01 ±0.17E-02 0.8980E+01 ±0.98E-02
t1 -.5312E-03 ±0.92E-03 -.3334E-03 ±0.76E-03 0.8082E+01 ±0.49E-02
l5 0.1311E-02 ±0.82E-03 0.1147E-02 ±0.84E-03 0.7731E+01 ±0.97E-02
t2 -.6823E-03 ±0.12E-02 -.1507E-02 ±0.13E-02 0.1216E+02 ±0.17E-01
l6 0.7291E-02 ±0.12E-02 0.6343E-02 ±0.14E-02 0.1088E+02 ±0.15E-01
t3 -.9348E-03 ±0.18E-02 -.3379E-02 ±0.17E-02 0.1622E+02 ±0.18E-01
l7 0.1918E-01 ±0.14E-02 0.1938E-01 ±0.16E-02 0.1329E+02 ±0.20E-01
Table 1: Table with values for the mean heat supply per unit step and temper-
ature. The error is one unit of standard error for the quantities.
Virial expression given respectively by
pi.pi/mi
= 3NkBT andP = ρkBT +W/V
where W = − 1
w(rij) and the intermolecular pair Virial w(r) is given
by w(r) = r
dv(r)
with v being the potential.
2 ALGORITHMAND AND ANALYSIS OF NU-
MERICAL RESULTS
The velocity Verlet algorithm [4, p. 81]used here [1] and allied types generate
a trajectory at time nδt from that at (n− 1)δt with step increment δt through
a mapping Tm where (v(nδt), r(nδt)) = Tm(v((n − 1)δt), r((n − 1)δt)) which
does not scale linearly with δt. For a Hamiltonian H whose potential V is
dependent only on position r having momentum components pi, the system
without external perturbation has constant energy E, and the normal assump-
tion in MD (NAMD)is that for the nth step, ∆En = |H(nδt)− E| ≤ ǫ and also
i=1 ∆Ei ≤ ǫ
s for the specified ǫ′s. In the simulation under NAMD, the force
fields are constant and do not change for any one time step. In these cases, the
energy is a constant of the motion for any time interval δtT when no external
perturbations (e.g. due to thermostat interference) are impressed. When there
is a crossing of potentials at such a time interval interval from φb to φa at an
inter particle distance(icd ) rc (such as points rf and rb of Fig. (1)) of general
particle 1 and 2 (the (1, 2) particle pair) due to a reactive process (such as oc-
curs in either direction of (1.1)) a bifurcation occurs where the MD program
computes the next step coordinates as for the unreacted system (potential φb),
which needs to be corrected. Let the icd at time step i be ri (with φb potential)
and at step i + 1 after interval δt be rf = ri+1 where rf < rc < ri. Due to
this crossover, a different Hamiltonian H ′ is operative after point rc is crossed,
where under NAMD, the other coordinates not undergoing crossover are not
affected. For what follows, subscripts refer to the particle concerned. Let the
interparticle potential at rf be Ea = Ef = φa(rf ) and at rf be Eb = φb(rf ),
where ∆ = Eb − Ea. Then if rf be the final coordinate due to the φb potential
and force field, two questions may be asked: (i) Can the velocities of (1,2) be
scaled, so that there is no energy or momentum violation during the crossover
based on the φb trajectory calculation and (ii) Can a pseudo stochastic potential
be imposed from coordinates rc (at virtual time tc) to rf such that (i) above is
true? For (ii) we have
Theorem 2.1 A virtual potential which scales velocities to preserve momentum
and energy can be constructed about region rc.
proof The external work done δW on particles 1 and 2 over the time step is
proportional to the distance traveled since these forces are constant and so for
each of these particles i, Fext,i.∆ri = δWi where ∆ri is the distance increment
during at least part of the time step from rc to rf . For the non-reacting trajec-
tory over time λδt (λ ≤ 1) (virtual because it is not the correct path due to the
crossover at rc),
δW2 + δW1 − (φb(rf )− φb(rc)) = ∆
(K.E.) (2.1)
where ∆
(K.E.) is the change of kinetic energy for the (1, 2) pair from the
First Law between the end points rf , rc. Now over time interval tc to tf , for
the reactive trajectory, we introduce a ”‘virtual potential”’ V vir that will lead
to the same positional coordinates for the pair at the end of the time step with
different velocities than for the non-reactive transition leading to the transition
δW2 + δW1 − (V
vir(rf )− V
vir(rc)) = ∆
′(K.E.) (2.2)
where ∆
′(K.E.) is the change of kinetic energy for the pair with V vir turned
on and along this trajectory, the change of potential for V vir is equated to the
change in the K.E. of the pair as given in the results of theorem (2.2) for all
three orthogonal coordinates, i.e.
δV vir(r) − δφb(r) = δ
(K.E.x,y,z)−∆
′(K.E.)x,y,z
with momentum conservation, that is δV vir(ri) = δφa(ri) for the variation
along the ri coordinate, but δφa(ri) = −δK.E. along internuclear coordinate ri
whereas δV vir = −K.E. (scaled about all three axes). Continuity of potential
implies
φa(rf ) = V
vir(rf );φa(rc) = V
vir(rc);φb(rc) = V
vir(rc); (2.3)
Subtracting (2.1) from (2.2) and applying b.c.’s (2.6) leads to
∆ = φb(rf )− V
vir(rf ) = φb(rf )− φa(rf ) = Eb − Ea (2.4)
′(K.E.)−∆
(K.E.) (2.5)
The above shows that a conservative virtual potential could be said to be oper-
ating in the vicinity of the transition (from tc to ta) .•
Question (i) above leads to:
Theorem 2.2 Relative to the velocities at any rf due to the φb potential, the
rescaled velocities v ′ due to the potential difference ∆ leading to these final
velocities due to the virtual potential can have a form given by
′ = (1 + α)vi + β (2.6)
(where i = 1, 2) for a vector β.
proof From the v velocities at rf due to φb we compute the v
′ velocities at
rf due to the virtual potential. Since net change of momentum is due to the
external forces only, which is invariant for the (1, 2) pair, conservation of total
momentum relating v′ and v in (2.6) yields a definition of β ( summation from
1 to 2 for what follows, where the mass of particle i is mi)
β = −α
mivi/
mi (2.7)
Defining for any vector s2 = s.s,β2 = α2Q, where
(2.8)
then the rescaled velocities become from (2.6)
= (1 + α)2vi
2 + 2(1 + α)vi.β + β
2. (2.9)
With ∆ = Eb − Ea, Energy conservation implies
2 = ∆ (2.10)
The coupling of (2.9-2.10) leads, after several steps of algebra to
α2m1m2
2(m1 +m2)
2 + v2
2 − 2v1.v2
(2.11)
2αm1m2
2(m1 +m2)
2 + v2
2 − 2v1.v2
Defining a = (v1 − v2)
2, q = m2m1/[2(m1 +m2)], (q > 0, a ≥ 0), then the
above is equivalent to the quadratic equation
α2qa+ 2qaα−∆ = 0 (2.12)
and in simulations, only α is unknown and can be determined from (2.12) where
real solutions exist for ∆/qa ≥ −1. • The above Inequality leads to a certain
asymmetry concerning forward and backward reactions, even for reversible re-
actions where the region of formation and breakdown of molecules are located
in the same region with the reversal of the sign of approximate ∆. For this
simulation, a reaction in either direction (formation or breakdown of dimer )
proceeds if (??) is true; if not then the trajectory follows the one for the initial
trajectory without any reaction (i.e. no potential crossover).
Interpretation of results. Fig. (1) shows a rapidly changing potential curve
with several inflexion points used in the simulation at very high temperature
(as far as I know such ranges have not been reported in the literature for non-
synthetic methods) warranting smaller time steps; larger ones would introduce
errors due to the rapidly changing potential and high K.E.; thus, even with
the application of the algorithm between cordinates rf and rb, curves l1 and
l2 have too large a δt value to achieve equilibrium - meaning flat or invariant
- temperature (see Fig. (2) ) or pressure (see Fig. (3))or unit step thermostat
heat supply (see Table 1)(ǫh and ǫl) profiles where for these curves, the (ǫh, ǫl)
values show net heat absorption; the curve at t1 (with δt = 5.0 ep− 5 show flat
profiles (within statistical fluctuations and 2 standard errors of variation) for
temperature, pressure and net zero heat supply; and this choice of time step
interval was found adequate for runs at much higher temperatures (T = 12 and
T = 16) which was used to determine thermodynamical properties [2]. For this
δt value and all others, no reasonable stationary equilibrium conditions could
be obtained without the application of the algorithm (curves l2,l4,l5,l6 and l7).
The algorithm is seen to be effective over a wide temperature range for this
complex dimer reaction simulated under extreme values of thermodynamical
variables and the results here do not vary for longer runs and greater sampling
statistics (e.g. 6 or 10 million time steps). The thin, pencil-like geometry of the
rectangular cell with thermostats located at the ends would highlight the energy
non-conservation leading to a non-flat temperature distribution, as observed and
which was used to determine the regime of validity of the algorithm.
References
[1] J. M. Haile,Molecular Dynamics Simulation,JohnWiley & Sons,Inc.,New
York, 1992.
[2] C. G. Jesudason, Model hysteresis dimer molecule. I. Equilibrium prop-
erties. J. Math. Chem. JOMC, Accepted 2006.
[3] D. Frenkel and B. Smit, Understanding Molecular Simulations: From
Algorithms to Applications, Vol(1) of Computational Science Series, Aca-
demic Press, San Diego, Second Ed., 2002.
[4] M.P. Allen and D. J. Tildesley, Computer Simulation of Liquids,Oxford
Univ. Press, Oxford, 1992
	PRELIMINARIES
	ALGORITHM AND AND ANALYSIS OF NUMERICAL RESULTS
ABSTRACT
  During a crossover via a switching mechanism from one 2-body potential to
another as might be applied in modeling (chemical) reactions in the vicinity of
bond formation, energy violations would occur due to finite step size which
determines the trajectory of the particles relative to the potential
interactions of the unbonded state by numerical (e.g. Verlet) integration. This
problem is overcome by an algorithm which preserves the coordinates of the
system for each move, but corrects for energy discrepancies by ensuring both
energy and momentum conservation in the dynamics. The algorithm is tested for a
hysteresis loop reaction model with an without the implementation of the
algorithm. The tests involve checking the rate of energy flow out of the MD
simulation box; in the equilibrium state, no net rate of flows within
experimental error should be observed. The temperature and pressure of the box
should also be invariant within the range of fluctuation of these quantities.
It is demonstrated that the algorithm satisfies these criteria.

<|endoftext|><|startoftext|>
Baltic Astronomy, vol. 16, xxx–xxx, 2007.
MIXED CHEMISTRY PHENOMENON DURING LATE STAGES
OF STELLAR EVOLUTION
R. Szczerba, M.R. Schmidt, M. Pulecka1
1 Nicolaus Copernicus Astronomical Center, ul. Rabiańska 8, 87-100 Toruń,
Poland
Received 2006 October 15; revised —
Abstract. We discuss phenomenon of simultaneous presence of O- and C-
based material in surroundings of evolutionary advanced stars. We concentrate
on silicate carbon stars and present observations that directly confirm the binary
model scenario for them. We discuss also class of C-stars with OH emission
detected, to which some [WR] planetary nebulae do belong.
Key words: stars: Asymptotic Giant Branch, carbon stars, chemical com-
position, planetary nebulae, stars: individual (V778 Cyg, IRAS 04496−6859,
IRAS 06238+0904, M 2−43)
1. INTRODUCTION
During Asymptotic Giant Branch (AGB) phase of evolution stars with ini-
tial masses between 0.8 and 8M⊙ lose a significant amount of their initial mass
by ejecting the matter into interstellar space with rates between 10−7 and 10−4
M⊙ yr
−1. The chemistry in the formed circumstellar envelopes is determined by
the photospheric C/O ratio and is O-based for n(O)>n(C) (usually less evolution-
ary advanced stars) and C-based when carbon abundance exceeds that of oxygen
(evolved stars which experienced thermal pulses and dredged-up carbon to the
surface). This dichotomy is a consequence of CO molecule (very stable) formation
which is so efficient that less abundant element (C or O) is mostly used. There-
fore, the detection of co-existence of O-rich and C-rich material in surroundings
of evolved stars was (and still is) surprising and attracts a significant attention.
Hereafter, we call this phenomenon a mixed chemistry phenomenon.
Already, due to the IRAS observations it was realized that there is a group
of carbon stars which show typical for O-rich environment the 9.7 and 18µm
amorphous silicate features (Little-Marenin 1986, Willems & de Jong 1986). The
Infrared Space Observatory (ISO) observations (Yamamura et al., 2000) showed
that 9.7µm feature in one of such objects (V778 Cyg) is very stable and did not
change during the last 15 years (the time spanned between IRAS and ISO obser-
vations). This put a very strong constraint on a model and evolutionary status
of this class of objects with most likely explanation being a long-lived reservoir
of O-rich material located inside or around a binary system. In this review we
discuss MERLIN interferometer observations of V778 Cyg which proved existence
of such reservoir (disk) around companion of C-rich star. We note that the recent
Spitzer Space Telescope (SST) data showed that the first extra-galactic silicate
http://arxiv.org/abs/0704.0557v1
2 R. Szczerba, M.R. Schmidt, M. Pulecka
carbon star (IRAS 04496−6859, Trams et al. 1999) is in fact a normal carbon star
and do not show the 9.7µm dust emission (see Speck et al. 2006).
There is another group of carbon stars suspected to have mixed chemistry.
Namely, carbon stars with OH maser emission. Lewis (1992) listed a group of stars
with SiC emission seen in the IRAS Low Resolution Spectra (LRS) and OH maser
emission detected. While most of these sources appeared to have wrong LRS clas-
sification the 3 C-stars with OH maser emission remained and Chen et al. (2001)
added 6 more sources to this class. However, this class of mixed chemistry sources
did not attract a significant attention (except of [WR] planetary nebulae – see
below), since OH emission is not well spatially resolved and this group of sources
may be result of spatial coincidence between OH maser emission from interstellar
medium and location of C-star. For example, Szczerba et al. (2002) presented
observational evidence that IRAS 05373−0810 (C-star with OH maser emission)
is a genuine carbon star and that OH maser and SiO thermal emission detected
toward this star is not coming from its envelope, but from molecular clouds. Here
we discuss a case of another C-star with OH maser emission (IRAS 06238+0904)
toward which we have detected, using IRAM radiotelescope, the SiO thermal emis-
sion coming from its envelope. Here, we present arguments that shock and Photon
Dominated Region (PDR) chemistry allow to form a significant amount of SiO in
C-rich environment.
One of the most important achievements of the ISO mission was detection of
crystalline silicates. Surprisingly, crystalline silicates were detected also in [WR]
planetary nebulae, which have H-poor and C-rich central stars of WR-type1. [WR]
planetary nebulae show at the same time presence of Polycyclic Aromatic Hydro-
carbons (PAHs) and crystalline silicates (Waters et al. 1998, Cohen et al. 1999).
Scenarios proposed to explain simultaneous presence of PAHs and crystalline sil-
icates include: destruction of fossil comets orbiting the star, ejection of matter
before star become C-rich, formation of stable O-rich disk or torus around com-
panion or system at some point of binary evolution. Hajduk, Szczerba & Gesicki
(this Proceedings) present an attempt to determine spatial location of PAHs and
crystalline silicates inside the [WR] planetary nebula M 2-43, by means of the
radiative transfer modelling of ISO spectrum. They concluded that crystalline
silicates have to be located at significant distance from the central star to avoid
their emission at about 10 µm. We note also an attempt to find precursors of [WR]
planetary nebulae (C-rich stars with C- and O-rich material in their circumstellar
shells) among proto-planetary nebulae (Szczerba et al. 2003). The authors have
argued that formation of crystalline silicates is necessary before proto-planetary
nebula phase, while post-AGB star may be still O-rich and change to C-rich one
during the fatal thermal pulse. They indicated five proto-planetary nebulae as a
possible precursors of [WR] planetary nebulae, including famous Red Rectangle,
other C-rich source with crystalline silicates (IRAS 16279-4757), as well as three
O-rich sources which show presence of crystalline silicates in their ISO spectra:
AC Her, IRAS 18095+2704 and IRAS 19244+1115.
In this review we will not cover such cases as: NGC 6302 – O-rich planetary
nebula which show presence of crystalline silicates as well as PAHs (e.g. Kemper
et al. 2002); HD 233517 – an evolved O-rich red giant with orbiting polycyclic
1Note, that Zijlstra et al. (1991) detected OH maser emission from [WR] planetary nebula
IRAS 07027−7934. Therefore, at least this [WR] planetary nebula belongs also to the discussed
above class of C-stars with OH maser emission.
Mixed chemistry phenomenon 3
aromatic hydrocarbons (Jura et al. 2006); IRAS 09425−6040 – a carbon-rich AGB
star with the highest abundance of crystalline silicates detected up to now (Molster
et al. 2001); IRC +10216 – a well known C-rich AGB star with water and OH
maser lines detected (Melnick et al. 2001, Ford et al. 2003); and possibly some
other spectacular sources which we, not intentionally, have overlooked.
2. V778 CYG A SILICATE CARBON STAR
To test the hypotheses related to the mixed chemistry phenomenon observed in
silicate carbon stars, we observed water masers towards V778Cyg at high angular
resolution. Details of our observations and data analysis are presented by Szczerba
et al. (2006), so here we repeat only some of the most important points and
findings.
The observations were taken on 2001 October 12/13 under good weather con-
ditions, using five telescopes of MERLIN. The longest MERLIN baseline of 217 km
gave a fringe spacing of 12mas at 22.235080GHz. The bandwidth was 2MHz with
256 spectral channels per baseline providing a channel separation of 0.105km s−1.
The continuum calibrator sources were observed in 16MHz band with 16 channels.
The data were obtained in left and right circular polarisation and the velocities
were measured with respect to the local standard of rest.
We used the phase referencing method; 4min scans on V778Cyg were inter-
leaved by 2min scans on the source 2021+614 (at 3.◦8 from the target) over 11.5 h.
The flux density of 2021+614 at K band of 1.48 Jy was derived from observation of
4C39.25. At the epoch of observation the flux density of 3C39.25 was 7.5±0.3Jy
(Terasranta 2002, private communication). This source was also used for bandpass
calibration.
After initial calibration with the MERLIN software the data were processed
using the AIPS package. In order to derive phase and amplitude corrections for
atmospheric and instrumental effects the phase reference source was mapped and
self-calibrated. These corrections were applied to the target visibility data. The
absolute position of the brightest feature at −15.1 km s−1 was determined. The
phase solutions for this feature were obtained with self-calibration method and
were then applied for the all channels. The target was mapped and cleaned using
a 12mas circular restoring beam. The map noise of ∼27mJy beam−1 for I Stokes
parameter in a line-free channel was close to the predicted thermal noise level.
In order to determine the position and the brightness of the maser components
two dimensional Gaussian components were fitted to the emission in channel maps.
The position uncertainty of this fitting depends on the signal to noise ratio in the
channel map and is lower than 1mas for about 80% of the maser components to-
wards V778Cyg. The absolute position of the phase reference source is known with
an accuracy of ∼3mas. The highest uncertainties in the absolute position of maser
spots are due to tropospheric effects and errors in the telescope positions. The
first effects, estimated by observing the phase rate on the point source 3C39.25,
introduce the position error of ∼9mas, whereas uncertainties in telescope positions
of 1−2 cm cause an error of spot positions of ∼10mas. In order to check the posi-
tion accuracy of maser spots we applied a reverse phase referencing scheme. The
emission of 15 channels around the reference feature at −15.1 km s−1 was averaged
and mapped. The map obtained was used as a model to self-calibrate the raw tar-
get data then these target solutions were applied to the raw data of 2021+614.
4 R. Szczerba, M.R. Schmidt, M. Pulecka
The position of the reference source was shifted by ∼2mas with respect to the
catalogue position. This indicates excellent phase connection when referencing
2021+614 to the set of the brightest maser spots. The above discussed factors im-
ply the absolute position accuracy of the maser source to be of order of ∼25mas.
Fig. 1. The absolute positions of the H2O 22GHz maser components towards
V778Cyg relative to the reference feature at −15.1 km s−1 (RA(J2000) = 20h
36m07.s3833, DE(J2000) = 60◦05′26.′′024). The symbols indicate the ranges of
component velocities in km s−1. The size of each symbol is proportional to the
logarithm of peak brightness of the corresponding component.
The maser emission brigther than 150mJ beam−1 (∼ 5σ) was found in 51 spectral
channels. In these channels single and unresolved component only was detected.
The overall structure of the H2O maser is shown in Fig. 1. The maser com-
ponents form a distorted ”S” like shape structure along a direction of position
angle of about −10◦. There is a clear velocity gradient along this structure with
weak south components blueshifted with respect to the brightest north compo-
nents. The angular extend of maser emisson is 18.5mas. The axis of alonga-
tion of the maser structure is fairly perpendicular to the line towards the op-
tical position of V778Cyg measured by Tycho2 (see Fig. 2). Angular separa-
tion between the optical star and the maser reference component is 0.192±0.′′048.
Mixed chemistry phenomenon 5
200 150 100 50 0 -50
Relative RA [mas]
Tycho-2 
MERLIN 
Fig. 2. Comparison of th optical position of V778 Cyg as determined in the
Tycho-2 catalogue with the radio position of the H2O 22 GHz maser components
as obtained from the MERLIN measurements. The epochs of optical and radio
observations differ by about 10 yrs.
Szczerba et al. (2006) have argued that such separation cannot be explained by
proper motion and instead provide direct observational evidence for the binary
system model of Yamamura et al. (2000). They suggested that the observed wa-
ter maser components can be interpreted as an almost edge-on warped Keplerian
disk located around a companion object and tilted by no more 20◦ relative to the
orbital plane. More detailed model of disk around companion in V778 Cyg system
is presented by Babkovskaia et al. (2006). Finally, note that recently Ohnaka
et al. (2006) reported indirect detection of disk around another silicate carbon
star (IRAS 08002−38003). They argued that oxygen-rich material is stored in
circumbinary disk surrounding the carbon-rich primary star and its putative low-
luminosity companion. These two findings may suggest that there are two different
kinds of silicate carbon stars: with circumbinary disk and disk around companion
only.
3. IRAS 06238+0904 - AN OH MASER C-STAR OR GENUINE CARBON STAR?
Genuine carbon stars are formed during evolution on AGB. The star on that
stage posses extended circumstellar envelope (CSE). In its inner part (near the
photosphere) physical conditions (T∼2500 K, ρ ∼ 1014 cm−3) make the material
mainly molecular, with composition determined by the local thermodynamic equi-
librium (LTE). In carbon CSE (C/O>1) after CO formation there is almost no
oxygen left. However silicon monoxide (SiO) is observed in carbon stars. Recent
observations (Schöier et al. 2006) show relatively high SiO fractional abundances
(1× 10−7− 5× 10−5), while LTE models give ∼ 5× 10−8 (Millar 2004). Therefore
6 R. Szczerba, M.R. Schmidt, M. Pulecka
the non-equilibrium processes should be considered in modelling of circumstellar
chemistry.
In this review we focus on IRAS 06238+0904 – an OH maser C-star (see Chen
et al. 2001). We first built model of carbon circumstellar envelope (CSE) and
then computed radiative transfer in molecular rotational lines of HCN J=1-0, CS
J=3-2, CS J=5-4 and SiO J=3-2, detected by us with the IRAM radiotelescope.
Spectral energy distribution (SED) for IRAS 06238+0904 was modelled by
means of the code and method described in Szczerba et al. (1997). The best fit (see
Fig. 3) is obtained for the star’s effective temperature T∗=2500 [K], luminosity to
distance ratio L/d2=6500 [L⊙kpc
−2], mass loss rate Ṁ = 2×10−5 [M⊙ yr
−1], dust
temperature at the inner boundary Tdust(R
in )=900 [K], amorphous carbon (AC)
and silicon carbide (SiC) to gas ratios: ρ(AC)/ρgas=0.001, ρ(SiC)/ρgas=0.00019.
Fig. 3. Spectral energy distribution for IRAS 06238+0904. See text for details
concerning assumed and estimated parameters.
The chemical model is computed with the network based on RATE99 database
Le Teuff et al. (2000) composed of 343 species made of 10 elements. The gas tem-
perature profile is approximated by the power law function r−1.8 established by
iterations from the best fits to CS lines. We assume solar gas composition with
modifications of carbon (C/O=1.5) and sulfur ǫ(S)=6.71 abundances. As initial
concentrations we put LTE values of 23 important species, where SiO number den-
sity is equal 1×10−8 [cm−3] The effect of dust formation is included by reduction
of Si and C by amount locked up in SiC and amorphous carbon grains according
to dusty model. This results in decrease of silicon abundance to ǫ(Si)=7.39 and
decrease of C to O ratio to 1.3.
Mixed chemistry phenomenon 7
The radiative transfer is computed in Sobolev approximation with molecular
data taken from the Leiden database (Schöier et al. 2005). Only interstellar
radiation is taken into account as an important source of UV photons. Level
populations of investigated molecules were computed for the assumed temperature
and molecular densities resulting from chemical model. The half-width main beam
(HPBW) for SiO rotational transition J=3-2 (v=130 GHz) is equal to 18.9′′, 16.7′′
for CS(3-2), 10.0′′ for CS(5-4) and 28.9′′ for HCN(1-0) transition, in case of the
IRAM telescope observations. The synthetic profile was computed for assumed
distance to IRAS 06238+0904 being equal 2.3 kpc.
The observed and obtained molecular line profiles of SiO(3-2), CS(3-2) and
CS(5-4) are shown in 3 panels of Fig. 4. During line profiles modelling we included
simple treatment of CO self-shielding based on Mamon et al. (1988). This process
has considerable influence on all molecules, and is especially important for SiO.
As one can see in Fig. 4, when self-shielding is not included (solid line) we can
explain observed spectrum solely by the PDR chemistry. Around 1×1016 [cm]
we observed considerable reproduction of SiO. On the other hand, inclusion of
CO self-shielding prevents formation of SiO (dashed line). Partial reproduction in
PDR is still present. In both cases the exchange reaction OH + Si → SiO + H is
a main process responsible for formation of silicon monoxide. Exchanges between
atomic oxygen and SiH, SiC, and HCSi molecules (O + SiH → SiO + H, O +
HCSi → SiO + CH, and O + SiC → SiO + C) are also important.
Simulation of the shock passage (see Willacy & Cherchneff 1998) enlarge initial
abundance of SiO, in comparison to the LTE value, for about one order. Profile
obtain with abundance of this molecule increased by factor of 10 is shown as
dashed-dotted line in Fig. 4.
Fig. 4. Observed and modelled molecular rotational lines without (solid line)
and with (dashed line) CO self-shielding. Dashed-dotted line in the left panel
show results when the intial LTE abundance of SiO is increased ten times due to
the shock passage.
Therefore, we can conclude that IRAS 06238+0904 is a genuine C-star and
no assumption of mixed chemistry is necessary. Chemical reactions considered in
network can reproduce O-bearing SiO molecule in C-rich environment if no CO
self-shielding is considered. In presence of CO self-shielding the computed SiO
8 R. Szczerba, M.R. Schmidt, M. Pulecka
emission is too low. This may be improved, however, if we consider the effect of
shock passage which can increase the initial SiO abundance by order of magnitude
as predicted by Willacy & Cherchneff (1998).
ACKNOWLEDGMENTS. This work has been supported by grants 2.P03D.017.25
and 1.P03D.010.29 of the Polish State Committee for Scientific Research.
REFERENCES
Babkovskaia N., Poutanen J., Richards A. M. S., Szczerba R. 2006, MNRAS, 370,
Chen P. S., Szczerba R., Kwok S, Volk K. 2001, AA, 368, 1006
Cohen M., Barlow M. J., Sylvester R. J. et al. 1999, ApJ, 513, L135
Ford K. E. S., Neufeld D. A., Goldsmith P. F., Melnick G. J. 2003, ApJ, 589, 430
Jura M., Bohac C. J., Sargent B. et al. 2006, ApJ, 637, L45
Lewis B. M., 1992, ApJ, 396, 251
Le Teuff Y. H., Millar T. J., Markwick A. J. 2000, A&AS, 146, 157
Little-Marenin I. 1986, AA, 307, L15
Mamon G. A., Glassgold A. E., Huggins P. J., 1988, ApJ, 328
Melnick G. J., Neufeld D. A., Ford K. E. S. et al. 2001, Nature, 412, 160
Millar T. J. 2004, in AGB stars, eds. H. J. Habing, H. Olofsson, 247
Molster F. J., Yamamura I., Waters L. B. F. M. et al. 2001, AA, 366, 923
Ohnaka K., Driebe T., Hoffman K.-H. et al. 2006, AA, 445, 1015
Schöier F., Olofsson H., Lundgren A. 2006, AA, 454, 247
Schöier F., van der Tak F., van Dishoeck E., Black J. H. 2005, AA, 432, 369
Speck A., Cami J., Markwick-Kemper C. et al., 2006, ApJ, 650, 892
Szczerba R., Chen P. S., Szymczak M., Omont A. 2002, AA, 381, 491
Szczerba R., Omont A., Volk K. et al. 1997, AA.,317,859
Szczerba R. Stasińska G., Siódmiak N., Górny S. K. 2003, in Exploiting the ISO
Data Archive. Infrared Astronomy in the Internet Age, eds. C. Gry, S. Peschke,
J. Matagne, P. Garcia-Lario, R. Lorente, A. Salama, ESA SP-511, 149
Szczerba R., Szymczak M., Babkovskaia N. et al. 2006, AA, 452, 561
Trams N. R., van Loon J. Th., Zijlstra A. A. et al. 1999, AA, 344, L17
Waters L. B. F. M., Beintema D. A., Zijlstra A. A. 1998, AA, 331, L61
Willacy K., Cherchneff I., 1998, AA, 330, 676
Willems F. J., de Jong T. 1986, ApJ, 309, L39
Yamamura I., Dominik C., de Jong T. 2000, AA, 363, 629
Zijlstra A. A., Gaylard M. J., Te Lintel Hekkert P. et al. 1991, AA, 243, L9
ABSTRACT
  We discuss phenomenon of simultaneous presence of O- and C-based material in
surroundings of evolutionary advanced stars. We concentrate on silicate carbon
stars and present observations that directly confirm the binary model scenario
for them. We discuss also class of C-stars with OH emission detected, to which
some [WR] planetary nebulae do belong.

<|endoftext|><|startoftext|>
Introduction
Let X3 ⊂ P4 be a smooth cubic threefold, then its intermediate Jacobian
J(X) := H2,1(X,C)∗/H3(X,Z)
is a principally polarised abelian variety (J(X),Θ) of dimension five that is not a Ja-
cobian of a curve [4, Thm.0.12]. The Fano scheme F parametrising lines contained
in X is a smooth surface, and the Abel-Jacobi map F → J(X) is an embedding
that induces an isomorphism Alb(F )) ≃ J(X) [4, Thm.0.6,0.9]. Furthermore the
cohomology class of F ⊂ J(X) is minimal, that is
[F ] =
There is only one other known family of examples of principally polarised abelian
varieties (A,Θ) of dimension n such that for 1 ≤ d ≤ n− 2, a minimal cohomology
class Θ
(n−d)!
can be represented by an effective cycle of dimension d: the Jacobians
of curves J(C) where the suvarietiesWd(C) ⊂ J(C) have minmal cohomology class.
O. Debarre has shown that on a Jacobian these are the only subvarieties having
minimal class [5, Thm.5.1], furthermore by a theorem of Z. Ran [11, Thm.5], the
only principally polarised abelian fourfolds with a subvariety of minimal class are
(products of) Jacobians of curves. In higher dimension few things are known about
subvarieties having minimal class.
In [9], G. Pareschi and M. Popa introduce a new approach to the characterisation
of these subvarieties: they consider the (probably more tractable) cohomological
properties of the twisted structure sheaf of the subvariety. More precisely we have
the following conjecture.
1.1. Conjecture. [5],[9] Let (A,Θ) be an irreducible principally polarised abelian
varieties of dimension n, and let Y be a nondegenerate subvariety (cf. [11, p.464])
of A of dimension d ≤ n− 2. The following statements are equivalent.
1.) The variety Y has minimal cohomology class, i.e. [Y ] = Θ
(n−d)!
2.) The twisted structure sheaf OY (Θ) is M -regular (cf. definition 1.4 below),
and h0(Y,OY (Θ)⊗ Pξ) = 1 for Pξ ∈ Pic
0(A) general.
Date: 4th April, 2007.
http://arxiv.org/abs/0704.0558v2
3.) Either (A,Θ) is the Jacobian of a curve of genus n and Y is a translate
of Wd(C) or −Wd(C), or n = 5, d = 2 and (A,Θ) is the intermediate
Jacobian of a smooth cubic threefold and Y is a translate of F or −F .
The implication 2) ⇒ 1) is the object of [9, Thm.B]. The implication 3) ⇒ 2)
has been shown for Jacobians of curves in [8, Prop.4.4]. We complete the proof of
this implication by treating the case of the intermediate Jacobian.
1.2. Theorem. Let X3 ⊂ P4 be a smooth cubic threefold, and let (J(X),Θ) be
its intermediate Jacobian. Let F ⊂ J(X) be an Abel-Jacobi embedded copy of the
Fano variety of lines in X. Then OF (Θ) is M -regular and h
0(F,OF (Θ)⊗ Pξ) = 1
for Pξ ∈ Pic
0 J(X) general.
Since the properties considered are invariant under isomorphisms, the theorem
implies the same statement for −F .
The study of the remaining open implications of conjecture 1.1 is a much harder
task than the proof of theorem 1.2. In an upcoming paper we will start to investi-
gate this problem under the additional hypothesis that (A,Θ) is the intermediate
Jacobian of a generic smooth cubic threefold. In this case we can show the following
statement.
1.3. Theorem. [6] Let X3 ⊂ P4 be a general smooth cubic threefold. Let (J(X),Θ)
be its intermediate Jacobian, and let F ⊂ J(X) be an Abel-Jacobi embedded copy
of the Fano variety of lines in X. Let S ⊂ J(X) be a surface that has minimal
cohomology class, i.e. [S] = Θ
. Then S is a translate of F or −F .
Notation and basic facts.
We work over an algebraically closed field of characteristic different from 2. We
will denote by D ≡ D′ the linear equivalence of divisors, and by D ≡num D
′ the
numerical equivalence.
For (A,Θ) a principally polarised abelian variety (ppav), we identify A with
Â = Pic0(A) via the morphism induced by Θ. If ξ ∈ A is a point, we denote by Pξ
the corresponding point in Â = Pic0(A) which we consider as a numerically trivial
line bundle on A.
1.4. Definition. [10] Let (A,Θ) be a ppav of dimension n, and let F be a coherent
sheaf on A. For all n ≥ i > 0, we denote by
V iF := {ξ ∈ A | h
i(A,F ⊗ Pξ) > 0}
the i-th cohomological support locus of F . We say that F is M -regular if
codimV i
for all i ∈ {1, . . . , n}.
If l ⊂ X is a line, we will denote by [l] the corresponding point of the Fano
surface F and by Dl ⊂ F the incidence curve of l, that is, Dl parametrises lines in
X that meet l. Furthermore we have by [4, §10], [12, §6] and Riemann-Roch that
OF (Θ) ≡num 2Dl,(1.5)
KF ≡num 3Dl,(1.6)
Dl ·Dl = 5,(1.7)
χ(F,OF (Θ)) = 1.(1.8)
2. Prym construction of the Fano surface
We recall the construction of the Fano surface as a special subvariety of a Prym
variety [3, 2]: let C̃ := Dl0 ⊂ F be the incidence curve of a general line l0 ⊂ X . Let
X ′ be the blow-up of X in l0. Then the projection from l0 induces a conic bundle
structure X ′ → P2 with branch locus C ⊂ P2 a smooth quintic. This conic bundle
induces a natural connected étale covering of degree two π : C̃ → C (cf. [1, Ch.I]
for details), and we denote by σ : C̃ → C̃ the involution induced by π.
The kernel of the normmorphism Nm : JC̃ → JC has two connected components
which we will denote by P and P1. The zero component P is called the Prym variety
associated to π, and it is isomorphic as a ppav to J(X) [1, Thm.2.1].
Let H ⊂ C be an effective divisor given by a hyperplane section in P2. Then H
has degree five and h0(C,OC(H)) = 3, so the complete linear system g
5 corresponds
to a P2 ⊂ C(5). We choose a divisor H̃ ∈ C̃(5) such that π(5)([H̃ ]) = [H ], where
π(5) : C̃(5) → C(5) is the morphism induced by π on the symmetric products. Let
φH : C
(5) → JC and φ
: C̃(5) → JC̃ be the Abel-Jacobi maps given by H and H̃.
We have a commutative diagram
C̃(5)
The fibre of φ
(C̃(5)) → φH(C
(5)) over the point 0 (and thus the intersection of
(C̃(5)) with kerNm) has two connected components F0 ⊂ P and F1 ⊂ P1. If
we identify P and P1 via H̃ − σ(H̃), we obtain an identification F1 = −F0 [3,
p.360]. The (non-canonical) isomorphism of ppavs P ≃ J(X) transforms F0 into a
translate of the Fano surface F [3, Thm.4].
From now on we will identify P (resp. F0) and J(X) (resp. some Abel-Jacobi
emdedded copy of the Fano surface F ).
We will now prove two technical lemmata on certain linear systems on C̃. The
first is merely a reformulation of [2, §2,ii)].
2.9. Lemma. The line bundle O
(C̃) is a base-point free pencil of degree five such
that any divisor D ∈ |O
(C̃)| satisfies π∗D ≡ H.
Proof. We define a morphism µ : C̃ = Dl0 → l0 ≃ P
1 by sending [l] ∈ C̃ to
l∩l0. Since l0 is general and through a general point of l0 there are five lines distinct
from l0, the morphism µ has degree 5. If [l] ∈ F , then Dl · Dl0 = 5 by formula
(1.7), so for [l] 6= [l0] the divisor Dl0 ∩ Dl ∈ |OC̃(Dl)| is effective. Furthermore
π∗Dl ≡ H , since π∗Dl is the intersection of C ⊂ P
2 with the image of l under
the projection X ′ → P2. By specialisation the linear system |O
(C̃)| is not empty
and a general divisor D in it corresponds to the five lines distinct from l0 passing
through a general point of l0. Hence OC̃(C̃) ≃ µ
∗OP1(1) and π∗D ≡ H . �
2.10. Lemma. The sets
V ′0 := {ξ ∈ P | h
0(C̃,O
(C̃)⊗ Pξ) > 0}
V ′1 := {ξ ∈ P | h
0(C̃,O
(2C̃)⊗ Pξ) > 1}
are contained in translates of F ∪ (−F ).
Proof.
1) Let D ∈ |O
(C̃) ⊗ Pξ| be an effective divisor. Then π∗C̃ ≡ π∗D ≡ H . It
follows that D ∈ (φ
(C̃(5)) ∩ kerNm), so D is in F or −F .
2) We follow the argument in [2, §3]. By [2, §2,iv)] we have h0(C̃,O
(C̃ +
σ(C̃))) = 4, so h0(C̃,O
(2C̃)) is odd. It follows from the deformation invariance
of the parity [7, p.186f] that
V ′1 = {ξ ∈ P | h
0(C̃,O
(2C̃)⊗ Pξ) ≥ 3}.
Fix ξ ∈ P such that h0(C̃,O
(2C̃)⊗ Pξ) ≥ 3 and D ∈ |OC̃(2C̃)⊗ Pξ|. Let s and t
be two sections of O
(C̃) such that the associated divisors have disjoint supports,
then we have an exact sequence
0 → O
(D − C̃)
(t,−s)
(D)⊕2
(s,t)
(D + C̃) → 0.
This implies
h0(C̃,O
(D − C̃)) + h0(C̃,O
(D + C̃)) ≥ 2h0(C̃,O
(D)) = 6,
furthermore by Riemann-Roch h0(C̃,O
(D + C̃)) = h0(C̃,O
−D − C̃)) + 5.
Now K
−D ≡ σ(D) and h0(C̃,O
(σ(D) − C̃)) = h0(C̃,O
(D − σ(C̃))) imply
h0(C̃,O
(D − C̃)) + h0(C̃,O
(D − σ(C̃))) ≥ 1.
Hence D ≡ C̃ + D′ or D ≡ σ(C̃) + D′ where D′ is an effective divisor such that
′ ≡ H . We see as in the first part of the proof that the effective divisors D′
such that π∗D
′ ≡ H are parametrised by a set that is contained in a translate of
F ∪ (−F ). �
3. Proof of theorem 1.2.
Since OF (Θ) ≡num OF (2C̃) by formula (1.5), it is equivalent to verify the stated
properties for the sheaf OF (2C̃).
Step 1. The second cohomological support locus is contained in a translate of
F ∪ (−F ). By formula (1.6), we have KF ≡ OF (3C̃)⊗Pξ0 for some ξ0 ∈ P . Hence
by Serre duality h2(F,OF (2C̃)⊗Pξ) = h
0(F,OF (C̃)⊗P
ξ ⊗Pξ0), so it is equivalent
to consider the non-vanishing locus
V0 := {ξ ∈ P | h
0(F,OF (C̃)⊗ Pξ) > 0}.
If l ∈ F is a line on X , the corresponding incidence curve Dl ⊂ F is an effective
divisor numerically equivalent to C̃, so it is clear that ±F is (up to translation) a
subset of V0. In order to show that we have an equality, consider the exact sequence
0 → OF ⊗ Pξ → OF (C̃)⊗ Pξ → OC̃(C̃)⊗ Pξ → 0.
Clearly h0(F,OF ⊗ Pξ) = 0 for ξ 6= 0, so h
0(F,OF (C̃)⊗ Pξ) ≤ h
0(C̃,O
(C̃)⊗ Pξ)
for ξ 6= 0. Since a divisor D ∈ |O
(C̃)| satisfies π∗D ≡ H , we conclude with
Lemma 2.10.
Step 2. The first cohomological support locus is is contained in a union of trans-
late of F ∪ (−F ). Since χ(F,OF (2C̃)) = χ(F,OF (Θ)) = 1 (formula (1.8)), we
h1(F,OF (2C̃)⊗ Pξ) = h
0(F,OF (2C̃)⊗ Pξ) + h
0(F,OF (C̃)⊗ P
ξ ⊗ Pξ0)− 1.
Since
h0(F,OF (2C̃)⊗ Pξ) = h
0(F,OF (Θ)⊗ Pξ) ≥ 1
for all ξ ∈ P , the first cohomological support locus is contained in the locus where
h0(F,OF (C̃)⊗ P
ξ ⊗Pξ0) > 0 or h
0(F,OF (2C̃)⊗ Pξ) > 1. By step 1 the statement
follows if we show the following claim: the set
V1 := {ξ ∈ P | h
0(F,OF (2C̃)⊗ Pξ) > 1}
is contained in a union of translates of F ∪ (−F ).
Step 3. Proof of the claim and conclusion. Consider the exact sequence
0 → OF (C̃)⊗ Pξ → OF (2C̃)⊗ Pξ → OC̃(2C̃)⊗ Pξ → 0.
By the first step we know that h0(F,OF (C̃)⊗ Pξ) = 0 for ξ in the complement of
a translate of F ∪ (−F ), so
h0(F,OF (2C̃)⊗ Pξ) ≤ h
0(C̃,O
(2C̃)⊗ Pξ)
for ξ in the complement of a translate of F ∪ (−F ). The claim is then immediate
from Lemma 2.10. By the same lemma h0(C̃,O
(2C̃)⊗Pξ) = 1 for ξ ∈ P general,
so h0(F,OF (2C̃)⊗ Pξ) = h
0(F,OF (Θ)⊗ Pξ) = 1 for ξ ∈ P general. �
Remark. It is possible to strengthen a posteriori the statements in the proof:
since Theorem 1.2 holds, we can use the Fourier-Mukai techniques from [9] to see
that the cohomological support loci are supported exactly on the theta-dual of F
(ibid, Definition 4.2), which in our case is just F .
Acknowledgements. I would like to thank Mihnea Popa for suggesting to me
to work on this question. Olivier Debarre has shown much patience at explaining
to me the geometry of abelian varieties. For this and many discussions on minimal
cohomology classes I would like to express my deep gratitude.
References
[1] A. Beauville. Variétés de Prym et jacobiennes intermédiaires. Ann. Sci. École Norm. Sup.
(4), 10(3):309–391, 1977.
[2] A. Beauville. Les singularités du diviseur Θ de la jacobienne intermédiaire de l’hypersurface
cubique dans P4. In Lect. Notes Math. 947., pages 190–208. Springer, Berlin, 1982.
[3] A. Beauville. Sous-variétés spéciales des variétés de Prym. Comp. Math., 45(3):357–383, 1982.
[4] C. H. Clemens and P. A. Griffiths. The intermediate Jacobian of the cubic threefold. Ann. of
Math. (2), 95:281–356, 1972.
[5] O. Debarre. Minimal cohomology classes and Jacobians. J. Alg. Geom., 4(2):321–335, 1995.
[6] A. Höring. Paper in preparation. Soon on this server, 2007.
[7] D. Mumford. Theta characteristics of an algebraic curve. Ann. Sci. École Norm. Sup. (4),
4:181–192, 1971.
[8] G. Pareschi and M. Popa. Regularity on abelian varieties. I. J. Amer. Math. Soc., 16(2):285–
302, 2003.
[9] G. Pareschi and M. Popa. Generic vanishing and minimal cohomology classes on abelian
varieties. arXiv:math.AG/0610166, 2006.
[10] G. Pareschi and M. Popa. GV-sheaves, Fourier-Mukai transform, and Generic Vanishing.
arXiv:math.AG/0608127, 2006.
[11] Z. Ran. On subvarieties of abelian varieties. Inventiones Math., 62:459–479, 1981.
[12] G. E. Welters. Abel-Jacobi isogenies for certain types of Fano threefolds, volume 141 of
Mathematical Centre Tracts. Mathematisch Centrum, Amsterdam, 1981.
Andreas Höring, IRMA, Université Louis Pasteur, 7 rue René Descartes, 67084 Stras-
bourg, France
E-mail address: andreas.hoering@ujf-grenoble.fr
http://arxiv.org/abs/math/0610166
http://arxiv.org/abs/math/0608127
	1. Introduction
	2. Prym construction of the Fano surface
	3. Proof of theorem ??.
	References
ABSTRACT
  Let $(A,\Theta)$ be a principally polarised abelian variety, and let Y be a
subvariety. Pareschi and Popa conjectured that Y has minimal cohomology class
if and only if the structure sheaf of Y satisfies a property that they call
M-regularity.
  Let now X be a smooth cubic threefold. By a classical result due to Clemens
and Griffiths, its intermediate Jacobian J(X) is a principally polarised
abelian variety; furthermore the Fano surface of lines on X can be embedded in
J(X) and has minimal cohomology class. In this short note we show that its
structure sheaf is M-regular.

<|endoftext|><|startoftext|>
arXiv:0704.0559v1  [hep-ph]  4 Apr 2007
Signal for space-time noncommutativity: the
Z → γγ decay in the renormalizable gauge sector
of the θ-expanded NCSM ∗
Josip Trampetić†
Rudjer Bošković Institute, Zagreb, Croatia
Abstract
We propose the Z → γγ decay, a process strictly forbidden in the standard
model, as a signal suitable for the search of noncommutativity of coordinates at
very short distances. We compute the Z → γγ partial widthin the framework of
the recently proposed renormalizable gauge sector of the noncommutative standard
model. The one-loop renormalizability is obtained for the model containing the
usual six representations of matter fields of the first generation. Even more, the
noncommutative part is finite or free of divergences, showing that perhaps new
interaction symmetry exists in the noncommutative gauge sector of the model.
Discovery of such symmetry would be of tremendous importance in further search
for the violation of the Lorentz invariance at very high energies. Experimental
possibilities of Z → γγ decay are analyzed and a firm bound to the scale of the
noncommutativity parameter is set around 1 TeV.
∗ Based on presentation given at the IV Summer School in Modern Mathematical
Physics, Belgrad, Serbia, September 3-14, 2006 and LHC Days in Split, Croatia, October
2-7, 2006. Work supported by the Croatian Ministry of Science, Education and Sport
project 098-0982930-2900.
† e-mail address: josipt@rex.irb.hr
http://arxiv.org/abs/0704.0559v1
The title 2
Gauge theories can be extended to a noncommutative (NC) setting in
different ways. In our model, the classical action is obtained via a two-step
procedure. First, the noncommutative Yang-Mills (NCYM) is equipped
with a star product carrying information about the underlying noncommu-
tative manifold, and, second, the ⋆-product and noncommutative fields are
expanded in the noncommutative parameter θ using the Seiberg-Witten
(SW) map [1]. In this approach, noncommutativity is treated perturba-
tively. The major advantage is that models with any gauge group and any
particle content can be constructed [2, 3, 4, 5, 6, 7], so we can construct
the standard model (SM). Commutative gauge symmetry is the underlying
symmetry of the theory and is present in each order of the θ-expansion.
Noncommutative (NC) symmetry, on the other hand, exists only in the full
theory, i.e. after summation.
There are a number of versions of the noncommutative standard model
(NCSM) in the θ-expanded approach, [3, 4, 5, 6]. The action is gauge in-
variant; furthermore, it has been proved that the action is anomaly free
whenever its commutative counterpart is also anomaly free [8]. The ar-
gument of renormalizability was previously included in the construction
of field theories on noncommutative Minkowski space producing not only
the one-loop renormalizable model [9], but the model containing one-loop
quantum corrections free of divergences [10], contrary to previous results
[11, 12].
In [10] we analyzed the gauge theory based on the U(1)Y × SU(2)L ×
SU(3)C group: we succeeded in constructing a model which had the renor-
malizable gauge sector to θ-linear order. The condition of the gauge sector
renormalizability determines the additional θ-linear interactions between
gauge bosons.
Experimental evidence for noncommutativity coming from the gauge
sector should be searched for in the process of the Z → γγ decay, kinemati-
cally allowed for on-shell particles [10, 7]. As it is forbidden in the SM by an-
gular momentum conservation and Bose statistics (Landau-Pomeranchuk-
Yang Theorem), it would serve as a clear signal for the existence of space-
time noncommutativity. Signatures of noncommutativity were discussed
previously within particle physics in [7, 13, 14].
The noncommutative space which we consider is the flat Minkowski
space, generated by four hermitian coordinates x̂µ which satisfy the com-
mutation rule
[x̂µ, x̂ν ] = iθµν = const. (1)
The algebra of the functions φ̂(x̂), χ̂(x̂) on this space can be represented
by the algebra of the functions φ̂(x), χ̂(x) on the commutative R4 with the
Moyal-Weyl multiplication:
φ̂(x) ⋆ χ̂(x) = e
θµν ∂
∂yν φ̂(x)χ̂(y)|y→x . (2)
It is possible to represent the action of an arbitrary Lie group G (with
the generators denoted by T a) on noncommutative space. In analogy to
the ordinary case, one introduces the gauge parameter Λ̂(x) and the vector
The title 3
potential V̂µ(x). The main difference is that the noncommutative Λ̂ and V̂µ
cannot take values in the Lie algebra G of the group G: they are enveloping
algebra-valued. The noncommutative gauge field strength F̂µν is
F̂µν = ∂µV̂ν − ∂ν V̂µ − i(V̂µ ⋆ V̂ν − V̂ν ⋆ V̂µ). (3)
There is, however, a relation between the noncommutative gauge symmetry
and the commutative one: it is given by the Seiberg-Witten (SW) mapping
[1]. Namely, the matter fields φ̂, the gauge fields V̂µ, F̂µν and the gauge
parameter Λ̂ can be expanded in the noncommutative θµν and in the com-
mutative Vµ and Fµν . This expansion coincides with the expansion in the
generators of the enveloping algebra of G, {T a, : T aT b :, : T aT bT c :};
here : : denotes the symmetrized product. The SW map is obtained as a
solution to the gauge-closing condition of infinitesimal (noncommutative)
transformations. The expansions of the NC vector potential and of the field
strength, up to first order in θ, read
V̂ρ(x) = Vρ(x)−
θµν {Vµ(x), ∂νVρ(x) + Fνρ(x)}+ . . . , (4)
F̂ρσ = Fρσ +
2{Fµρ, Fνσ} − {Vµ, (∂ν +Dν)Fρσ}
+ . . . , (5)
where Dν = ∂ν − i[Vν , ] is the commutative covariant derivative.
The solution for the SW map given above is not unique and along with
(5) all expressions V̂ ′µ, F̂
µν of the form
V̂ ′µ = V̂µ +Xµ, F̂
µν = F̂µν +DµXν −DνXµ (6)
are solutions to the closing condition to linear order, if Xµ is a gauge
covariant expression linear in θ, otherwise arbitrary. One can think of this
transformation as of a redefinition of the fields Vµ and Fµν .
Taking the action of the noncommutative gauge theory, analogous to
that of the ordinary Yang-Mills theory with the commutative field strengths
replaced by the noncommutative ones,
S = −
d4x F̂µν ⋆ F̂
µν , (7)
and expanding the fields as in (4-5) and the ⋆-product in θ, we obtain the
expression
S = −
d4xFµνF
µν + θµν Tr
FµνFρσ − FµρFνσ
F ρσ , (8)
which is the starting point for the analysis of θ-expanded noncommutative
gauge models. Due to the renormalizability condition, we add term, includ-
ing NC freedom parameter 1
(a− 1), to the original Lagrangian, producing
the following general form of the noncommutative gauge field action:
S = −
d4xFµνF
µν + θµν Tr
d4x (
FµνFρσ − FµρFνσ)F
ρσ. (9)
The title 4
The most general form of the NC action, invariant under the NC gauge
transformation, is given in [3, 5, 6, 4],
Sgauge = −
R(F̂µν) ⋆R(F̂
. (10)
The sum in (10) is, in principle, taken over all irreducible representations
R of GSM with arbitrary weights CR. Obviously, gauge models are rep-
resentation dependent in the NC case: the choice of representations has a
strong influence on the theory, on both the form of interactions and the
renormalizability properties.
Expanding the NC gauge action (10) to first order in the noncommuta-
tivity parameter θ, we obtain
Sgauge = −
d4xR(Fµν)R(F
µν) (11)
R(Fµν)R(Fρσ)−R(Fµρ)R(Fνσ)
R(F ρσ).
The arbitrariness in the gauge action, introduced through the coefficient
a, reflects in part also the nonuniqueness of the SW map. As we have
already mentioned, renormalizability points out the value a = 3 as physical;
however, we keep the value of a arbitrary in calculations and use a = 3 at
the end.
Note that by generalizing the expression (5) to equivalent form
F̂µν(a) = Fµν +
2{Fµρ, Fντ} − a{Vρ, (∂τ +Dτ )Fµν}
, (12)
one could also obtain the actions (9,11) directly from (7,10).1 The im-
portant question, if the freedom parameter a is eventually comming from
different class of SW maps and/or some other new interaction symmetry
extends the purpose of this presentation and, consequentlly, shall be dis-
cussed elsewhere.
The noncommutative correction, that is the θ-linear part of the La-
grangian, reads
Lθi = g
′3κ1θ
fµνfρσf
ρσ − fµρfνσf
+ g3κ
BiµνB
ρσk −BiµρB
+ g3Sκ
GaµνG
ρσc −GaµρG
1This is in part due to the properties of the integral over the two-function ⋆-product,
i.e. the Stokes theorem.
The title 5
+ g′g2κ2θ
ρσi − fµρB
ρσi + c.p.
+ g′g2Sκ3θ
ρσa − fµρG
ρσa + c.p.
, (13)
where the c.p. in (13) denotes the addition of the terms obtained by a
cyclic permutation of fields without changing the positions of indices. Here,
fµν , B
µν , and G
µν are the physical field strengths which correspond to
U(1)Y, SU(2)L, and SU(3)C, respectively. The couplings κi, (i = 1, ..., 5),
as functions of the weights CR, that is of the Ci(= 1/g
i ), i = 1, ..., 6, are
parameters of the model. The couplings in (13) are defined as follows:
CRd(R2)d(R3)R1(Y )
3, (14)
CRd(R3)R1(Y )Tr (R2(T
L)R2(T
L)), (15)
CRd(R2)R1(Y )Tr (R3(T
S )R3(T
S)), (16)
CRd(R3)Tr ({R2(T
L),R2(T
L)}R2(T
L)), (17)
κabc5 =
CRd(R2)Tr ({R3(T
S ),R3(T
S)}R3(T
S)). (18)
The κ1, . . . , κ5 depend on the representations of matter fields through the
dependence on the coefficients CR. For the first generation of the standard
model there are six such representations, summarized in Table 1 of [4]; they
produce six independent constants CR
2. However, one can immediately
verify that κ
4 = 0. This follows from the fact that the symmetric coeffi-
cients dijk of SU(2) vanish for all irreducible representations. In addition,
we take that κabc5 = 0. The argument for this assumption is related to the
invariance of the color sector of the SM under charge conjugation. Although
apparently in Table 1 from [4] one has only the fundamental representa-
tion 3 of SU(3)C, there are in fact both 3 and 3̄ representations with the
same weights, C3 = C3̄. In the Lagrangian this corresponds to writing each
minimally-coupled quark term as a half of the sum of the original and the
charge-conjugated terms. Since the symmetric coefficients for the 3 and 3̄
representations satisfy dabc
= −dabc
, we obtain
κabc5 = C3d
= 0. (19)
2We assume that CR > 0; therefore the six CR’s were denoted by
, i = 1, ..., 6, in
[3, 6].
The title 6
-0.3 -0.2 -0.1
0 0.1
ΓΓΓ -0.2
-0.3 -0.2 -0.1 0 0.1
Figure 1: (a) The three-dimensional simplex that bounds possi-
ble values for the coupling constants Kγγγ , KZγγ and KZgg at the
MZ scale. The vertices of the simplex are: (−0.184,−0.333, 0.054),
(−0.027,−0.340,−0.108), (0.129,−0.254, 0.217), (−0.576, 0.010,−0.108),
(−0.497,−0.133, 0.054), and (−0.419, 0.095, 0.217). (b) The allowed region
for KZγγ and Kγγγ at theMZ scale, projected from the simplex given in Fig
(a). The vertices of the polygon are: (−0.333, −0.184), (−0.340, −0.027),
(−0.254, 0.129), (0.095, −0.419), (0.0095, −0.576), and (−0.133, −0.497).
We are left only with three nonvanishing couplings, κ1, κ2, and κ3, depend-
ing on six constants C1, . . . , C6:
κ1 = −C1 −
κ2 = −
C6 ; κ3 =
C5 . (20)
There are three relations among Ci’s:
= 2C1 + C2 +
C5 + C6 ,
= C2 + 3C5 + C6 ;
= C3 + C4 + 2C5 , (21)
in effect representing three consistency conditions imposed on (8) in a way
to match the SM action at zeroth order in θ. See detailes in [6].
Fig.(1) shows the three-dimensional simplex that bounds allowed values
for the dimensionless coupling constants Kγγγ , KZγγ and KZgg. For any
choosen point within the simplex in Fig.(1) the remaining coupling con-
stants KZZγ, KZZZ, KWWγ, KWWZ and Kγgg are uniquely fixed by the
NCSM [6, 4]. This is true for any combination of three coupling constants.
The title 7
Our total classical action reads
Scl = SSM +
Sθi = g
′3κ1θ
fµνfρσf
ρσ − fµρfνσf
+ g′g2κ2θ
ρσi − fµρB
ρσi + c.p.
+ g′g2Sκ3θ
ρσa − fµρG
ρσa + c.p.
. (22)
The term Sθ1 in (22) is one-loop renormalizable to linear order in θ [9] since
the one-loop correction to the Sθ1 is of the second order in θ. We need to
investigate only the renormalizability of the remaining Sθ2 and S
3 parts of
the action (22).
To realize the one-loop renormalization of the gauge part action (22), we
apply, as before [9, 10], the background field method [15, 16]. As we have
already explained the details of the method in [12], here we only discuss
the points needed for this computation. The main contribution to the func-
tional integral is given by the Gaussian integral. However, technically, this
is achieved by splitting the vector potential into the classical-background
plus the quantum-fluctuation parts, that is, φV → φV +ΦV , and by comput-
ing the terms quadratic in the quantum fields. In this way we determine
the second functional derivative of the classical action, which is possible
since our interactions (22) are of the polynomial type. The quantization
is performed by the functional integration over the quantum vector field
ΦV in the saddle-point approximation around the classical (background)
configuration φV .
First, an advantage of the background field method is the guarantee of
covariance, because by doing the path integral the local symmetry of the
quantum field ΦV is fixed, while the gauge symmetry of the background
field φV is manifestly preserved.
Since we are dealing with gauge symmetry, our Lagrangian (22) is sin-
gular owing to its invariance under the gauge group. Therefore, a proper
quantization of (22) requires the presence of the gauge fixing term Sgf [φ],
i.e. the Feynman-Fadeev-Popov ghost appears in the effective action
Γ[φ] = Scl[φ] + Sgf [φ] + Γ
(1)[φ], Sgf [φ] = −
d4x(DµΦ
)2 . (23)
The one-loop effective part Γ(1)[φ] is given by
Γ(1)[φ] =
log detS(2)[φ] =
Tr logS(2)[φ]. (24)
In (24), the S(2)[φ] is the 2nd-functional derivative of the classical action,
with the following structure:
S2 = ✷+N1 +N2 + T2 + T3 + T4 . (25)
The title 8
Here N1, N2 are commutative vertices, while T2, T3, T4 are noncommutative
ones. The indices denote the number of classical fields. The one-loop
effective action computed by using the background field method is
θ,2 =
Tr log
I +✷−1(N1 +N2 + T2 + T3 + T4)
(−1)n+1
−1N1 +✷
−1N2 +✷
−1T2 +✷
−1T3 +✷
As the conventions and the notation are the same as in [10], we only en-
counter and discuss the final results.
The divergent one-loop vertex correction to (22) as a function of the
SW freedom parameter a is [10]
Γdiv =
3(4π)2ǫ
BiµνB
µνi +
GaµνG
3(4π)2ǫ
g′g2κ2(3− a)θ
ρσ − fµρB
3(4π)2ǫ
g′g2Sκ3(3− a)θ
ρσ − fµρG
ρσa .
From (27) it is clear that the expanded gauge action (22) is renormalizable
only for the value a = 3 and, its noncommutative part is finite or free of di-
vergencies, so the noncommutativity parameter θ need not be renormalized.
The results for the bare fields and couplings, are given in [10].
Note that we have also analized the renormalizability properties of the
pure NC SU(N) gauge sector, for vector fields in the adjoint representation
[17]. We have found that this model is also renormalizable for a = 3.
However, to obtain renormalizability, we had to pay a price by necessity
for the renormalization of the noncommutative deformation parameter h.
In this way the parameter h and/or the scale of noncommutativity ΛNC
become running quantities, dependent on energy [17].
In addition, it was shown that the one-loop contributions to the U(1)
gauge-field part of the noncommutative gauge theories in the enveloping-
algebra formalism are renormalizable at first order in θ even if the scalar
matter, with and without spontaneous symmetry breaking, contributions
are taken into account [18]. There is reasonable hope that the same con-
clusion should hold for SU(N), but the computations are expected to be
extremely involving. Nevertheless, the results [18] further strengthen the
philosophy which is embraced in our latest papers [10, 17].
From the action (22) we extract the triple-gauge boson terms which are
not present in the commutative SM Lagrangian. In terms of the physical
fields A, W±, Z, and G they are
Lθγγγ =
sin 2θW Kγγγθ
ρτAµν (aAµνAρτ − 4AµρAντ ) ,
Kγγγ =
gg′(κ1 + 3κ2); (28)
The title 9
LθZγγ =
sin 2θW KZγγ θ
× [2Zµν (2AµρAντ − aAµνAρτ ) + 8ZµρA
µνAντ − aZρτAµνA
µν ] ,
KZγγ =
− 2g2
; (29)
where Aµν ≡ ∂µAν − ∂νAµ, etc. The structure of the other interactions
such as ZZγ, WWZ, ZZZ, Zgg, and γgg is given in [4, 6].
Next we focus on the branching ratio of the Z → γγ decay in the renor-
malizable model. Note that each term from the θ-expanded action (22),
(28) and (29) is manifestly invariant under the ordinary gauge transforma-
tions. The gauge-invariant amplitude AθZ→γγ for the Z(k1) → γ(k2) γ(k3)
decay in the momentum space reads
AθZ→γγ = −2e sin 2θWKZγγΘ
3 (a; k1,−k2,−k3)ǫµ(k1)ǫν(k2)ǫρ(k3). (30)
The tensor Θ
3 (a; k1, k2, k3) is given by
3 (a; k1, k2, k3) = − (k1θk2) (31)
× [(k1 − k2)
ρgµν + (k2 − k3)
µgνρ + (k3 − k1)
νgρµ]
− θµν [k
1 (k2k3)− k
2 (k1k3)]
− θνρ [k
2 (k3k1)− k
3 (k2k1)]
− θρµ [kν3 (k1k2)− k
1 (k3k2)]
+ (θk2)
gνρ k23 − k
+ (θk3)
gνρ k22 − k
+ (θk3)
gµρ k21 − k
+ (θk1)
gµρ k23 − k
+ (θk1)
gµν k22 − k
+ (θk2)
gµν k21 − k
+ θµα(ak1 + k2 + k3)α [g
νρ (k3k2)− k
+ θνα(k1 + ak2 + k3)α [g
µρ (k3k1)− k
+ θρα(k1 + k2 + ak3)α [g
µν (k2k1)− k
1 ] ,
where the 4-momenta k1, k2, k3 are taken to be incoming, satisfying the
momentum conservation (k1+ k2+ k3 = 0). In (31) the freedom parameter
a appears symmetric in physical gauge bosons which enter the interaction
point, as one would expect. The amplitude (30), for a = 3, with the Z
boson at rest gives the total rate for the Z → γγ decay:
ΓZ→γγ =
sin2 2θWK
~E2θ +
~B2θ ), (32)
where ~Eθ = {θ
01, θ02, θ03} and ~Bθ = {θ
23, θ31, θ12} are dimensionless coef-
ficients of order one, representing the time-space and space-space noncom-
mutativity, respectively. For the Z boson at rest, polarized in the direction
The title 10
of the third axis, we obtain the following polarized partial width:
Γ3Z→γγ =
sin2 2θWK
~E2θ +
~B2θ + 42
(θ03)2 + (θ12)2
. (33)
In order to estimate the scale of noncommutativity ΛNC from ΓZ→γγ,we
consider new experimental possibilities at LHC. According to the CMS
Physics Technical Design Report [19], around 107 Z → e+e− events are
expected to be recorded with 10 fb−1 of the data. From this one can
estimate the expected number of Z → γγ events per 10 fb−1. Assuming
that BR(Z → γγ) ∼ 10−8 and using BR(Z → e+e−) = 3 × 10−2, we may
expect to have ∼ 3 events of Z → γγ with 10 fb−1. Now the question
is: What would be the background from Z → e+e− when the electron
radiates a very high-energy bremsstrahlung photon in the beam pipe or
in the first layer(s) of the Pixel Detector and is thus lost for the tracker
reconstruction? In that case, the electron would not be reconstructed and
would be misidentified as a photon. The probability of such an event should
be evaluated from the full detector simulation. According to the CMS
note [20] which studies the Z → e+e− background for Higgs → γγ, the
probability to misidentify the electron as a photon is huge (see Fig. 3 in [20])
but the situation can be improved by applying more stringent selections to
the photon candidate when searching for Z → γγ events [21]. However,
the irreducible di-photon background (Fig. 3 in [20]) might also kill the
signal. In that case, one can only set the upper limits to the scale of
noncommutativity from the Z → γγ rate.
In accord with the analysis of the LHC experimental expectations [19,
20, 21] it is bona fide reasonable to assume that the lower bound for the
branching ratio is BR(Z → γγ)
∼ 10−8. Next, choosing the lower central
value of |KZγγ | = 0.05, from the figures and the Table in [6], we find that
the upper bound to the scale of noncommutativity is ΛNC
∼ 1.0 TeV for
~E2θ +
~B2θ ≃ 1. The obtained bound is strongly supported in [18].
Clearly, the measurement of the Z → γγ decay branching ratio would
fix the quantity |KZγγ/Λ
NC|, while the inclusion of other triple gauge boson
interactions through 2 → 2 scattering experiments [14] would sufficiently
reduce the available parameter space of our model by more precisely de-
termining the relations among the couplings Kγγγ , KZγγ , KZZγ, KZZZ ,
KWWγ, and KWWZ. Next, we summarize our results and compare with
those obtained previously.
The first Z → γγ calculation [22] was performed within a different
model which has different symmetries in comparison with ours and, because
of the absence of the SW map, the model does not possess the commutative
gauge invariance. Also, the Z → γγ rate obtained in [22] by imposing the
unitarity of the theory in the usual manner, θ0i = 0, [23, 24], vanishes 3.
The partial width for the same process was obtained in [6] in the frame-
work of similar theories, which, however, were not renormalizable. The
3The condition of unitarity can be covariantly generalized to θµνθ
µν = 2( ~B2θ −
~E2θ) > 0
[25].
The title 11
present results for the partial widths ΓZ→γγ and Γ
Z→γγ are about three
times larger than those in [6] and consistently symmetric with respect to
time-space and space-space noncommutativity. In the polarized rate (33)
the third components ((θ03)2 + (θ12)2) are enhanced relative to the other
two components by a large factor, as expected. Also, the rate (33) is en-
hanced by a factor of ∼ 3 with respect to the total rate (32). The upper
limit to the scale of noncommutativity ΛNC
∼ 1 TeV is significantly higher
than in [6]. This bound is now firmer owing to the regular behavior of
the triple gauge boson interactions (28-29) with respect to the one-loop
renormalizability of the NCSM gauge sector.
After 10 years of the LHC running the integrated luminosity is expected
to reach ∼ 1000 fb−1, [20]. This means that for the assumed BR(Z →
γγ) ∼ 10−8 we should have ∼ 300 events of Z → γγ, that is we should
be well above the background. On the other hand, this result can also be
understood as ∼ 3 events with the BR(Z → γγ) ∼ 10−10, which lifts the
scale of noncommutativity up by a factor of ∼ 3. Therefore, with a more
stringent selection of photon candidates and if the irreducible di-photon
contamination becomes controllable, the Z → γγ decay will become a clean
signature of space-time noncommutativity in LHC experiments.
Finally, the results of [17,18], while strongly supporting this computa-
tions, might also hint at the existence of new interaction symmetry of the
noncommutative gauge sector. Such new symmetry could be a responsible
for the renormalizability of the noncommutative matter sector including
fermions and, next, for the main goal, i.e. in general, the physical realiza-
tion of the Lorentz invariance breaking at very high energies, respectively.
References
[1] N. Seiberg and E. Witten, String theory and non-commutative geometry, JHEP
09 (1999) 032 [arXiv:hep-th/9908142].
[2] J. Madore, S. Schraml, P. Schupp and J. Wess, Gauge theory on non-commutative
spaces, Eur. Phys. J. C16 (2000) 161 [arXiv:hep-th/0001203]; B. Jurčo, S.
Schraml, P. Schupp and J. Wess, Enveloping algebra valued gauge transforma-
tions for non-Abelian gauge groups on non-commutative spaces, Eur. Phys. J. C17
(2000) 521 [arXiv:hep-th/0006246]; B. Jurčo, L. Möller, S. Schraml, P. Schupp
and J. Wess, Construction of non-Abelian gauge theories on non-commutative
spaces, Eur. Phys. J. C21 (2001) 383 [arXiv:hep-th/0104153].
[3] X. Calmet, B. Jurčo, P. Schupp, J. Wess and M. Wohlgenannt, The standard
model on non-commutative space-time, Eur. Phys. J. C23 (2002) 363 [arXiv:hep-
ph/0111115].
[4] B. Melic, K. Passek-Kumericki, J. Trampetic, P. Schupp and M. Wohlgenannt,
The standard model on non-commutative space-time: Electroweak currents and
Higgs sector, Eur. Phys. J. C 42 (2005) 483 [arXiv:hep-ph/0502249]. B. Melic,
K. Passek-Kumericki, J. Trampetic, P. Schupp and M. Wohlgenannt, The stan-
dard model on non-commutative space-time: Strong interactions included, Eur.
Phys. J. C 42 (2005) 499 [arXiv:hep-ph/0503064].
[5] P. Aschieri, B. Jurčo, P. Schupp and J. Wess, Non-Commutative GUTs, Standard
Model and C,P,T, Nucl. Phys. B651 (2003) 45 [arXiv:hep-th/0205214].
The title 12
[6] W. Behr, N.G. Deshpande, G. Duplančić, P. Schupp, J. Trampetić and J. Wess,
The Z → γγ, gg Decays in the Noncommutative Standard Model, Eur. Phys. J.
C29 (2003) 441 [arXiv:hep-ph/0202121]; G. Duplančić, P. Schupp and J. Tram-
petić, Comment on triple gauge boson interactions in the non-commutative elec-
troweak sector, Eur. Phys. J. C32 (2003) 141 [arXiv:hep-ph/0309138].
[7] M. Buric, D. Latas, V. Radovanovic and J. Trampetic, Improved Z → γγ de-
cay in the renormalizable gauge sector of the non-commutative standard model,
[arXiv:hep-ph/0611299].
[8] F. Brandt, C.P. Martin and F. Ruiz Ruiz, Anomaly freedom in Seiberg-Witten
non-commutative gauge theories, JHEP 07 (2003) 068 [arXiv:hep-th/0307292].
[9] M. Buric, D. Latas and V. Radovanovic, Renormalizability of non-commutative
SU(N) gauge theory, JHEP 0602 (2006) 046 [arXiv:hep-th/0510133].
[10] M. Buric, V. Radovanovic and J. Trampetic, The one-loop renormalization of
the gauge sector in the θ-expanded non-commutative standard model, JHEP 03
(2007) 030 [arXiv:hep-th/0609073].
[11] R. Wulkenhaar, Non-Renormalizability Of Theta-Expanded Noncommutative
QED, JHEP 0203 (2002) 024 [arXiv:hep-th/0112248].
[12] M. Buric and V. Radovanovic, Non-renormalizability of non-commutative SU(2)
gauge theory, JHEP 0402 (2004) 040 [arXiv:hep-th/0401103];
[13] J. Trampetić, Rare and forbidden decays, Acta Phys. Polon. B33 (2002) 4317
[hep-ph/0212309]; B. Melic, K. Passek-Kumericki and J. Trampetic, Quarkonia
decays into two photons induced by the space-time non-commutativity, Phys. Rev.
D 72 (2005) 054004 [arXiv:hep-ph/0503133]; K → pi gamma decay and space-time
non-commutativity, Phys. Rev. D 72 (2005) 057502 [arXiv:hep-ph/0507231];
[14] A. Alboteanu, T. Ohl and R. Ruckl, Collider tests of the non-commutative stan-
dard model, PoS HEP2005 (2006) 322 [arXiv:hep-ph/0511188]; Probing the non-
commutative standard model at hadron collider, Phys. Rev. D 74, 096004 (2006)
[arXiv:hep-ph/0608155].
[15] G. ’t Hooft, An algorithm for the poles at dimension four in the dimensional
regularization procedure, Nucl. Phys. B 62 (1973) 444.
[16] M. E. Peskin and D. V. Schroeder, An introduction to Field Theory, Perseus
Books, Reading 1995.
[17] D. Latas, V. Radovanovic and J. Trampetic, Non-commutative SU(N) gauge the-
ories and asymptotic freedom, arXiv:hep-th/0703018.
[18] C. P. Martin, D. Sanchez-Ruiz and C. Tamarit, The noncommutative U(1)
Higgs-Kibble model in the enveloping-algebra formalism and its renormalizabil-
ity, arXiv:hep-th/0612188.
[19] CMS Physics Technical Design Report, Vol.1. CERN/LHCC 2006-001.
[20] M. Pieri et al., CMS Note 2006/112.
[21] A. Nikitenko, private communications.
[22] I. Mocioiu, M. Pospelov and R. Roiban, Low-energy limits on the antisymmetric
tensor field background on the brane and on the non-commutative scale, Phys.
Lett. B 489, 390 (2000) [arXiv:hep-ph/0005191];
[23] N. Seiberg, L. Susskind and N. Toumbas, Space/time non-commutativity and
causality, JHEP 0006, 044 (2000) [arXiv:hep-th/0005015].
[24] J. Gomis and T. Mehen, Space-time noncommutative field theories and unitarity,
Nucl. Phys. B 591, 265 (2000) [arXiv:hep-th/0005129].
[25] S. M. Carroll, J. A. Harvey, V. A. Kostelecky, C. D. Lane and T. Okamoto,
Noncommutative field theory and Lorentz violation, Phys. Rev. Lett. 87, 141601
(2001) [arXiv:hep-th/0105082].
ABSTRACT
  We propose the Z -> gamma gamma decay, a process strictly forbidden in the
standard model, as a signal suitable for the search of noncommutativity of
coordinates at very short distances. We compute the Z -> gamma gamma partial
widthin the framework of the recently proposed renormalizable gauge sector of
the noncommutative standard model. The one-loop renormalizability is obtained
for the model containing the usual six representations of matter fields of the
first generation. Even more, the noncommutative part is finite or free of
divergences, showing that perhaps new interaction symmetry exists in the
noncommutative gauge sector of the model. Discovery of such symmetry would be
of tremendous importance in further search for the violation of the Lorentz
invariance at very high energies. Experimental possibilities of Z -> gamma
gamma decay are analyzed and a firm bound to the scale of the noncommutativity
parameter is set around 1 TeV.

<|endoftext|><|startoftext|>
Introduction
Quantum electrodynamics (QED) was the first
quantum field theory to be formulated and has suc-
cessfully passed every experimental test at low and
intermediate fields. A well-known example of QED
effects at low fields (∼ 109 V/cm) is the Lamb
shift in hydrogen [1]. At low fields, the QED effects
(self-energy and vacuum polarisation) can still be
∗ Corresponding author.
Email address: d.winters@gsi.de (D.F.A. Winters).
treated as a perturbation, only taking into account
lower order terms [2]. However, up to now QED
calculations have never been tested at high fields
(∼ 1015 V/cm) because such fields cannot be pro-
duced in a laboratory, nor by the strongest lasers
available. At high fields, perturbative QED is no
longer valid and higher order terms become impor-
tant as well [2]. Experiments carried out at high
fields therefore test different aspects of QED cal-
culations and are complementary to high-precision
tests of the lower order terms.
Heavy atoms that have been stripped of almost
Preprint submitted to Canadian Journal of Physics 4 November 2018
http://arxiv.org/abs/0704.0560v2
all their electrons, the so-called highly-charged ions
(HCI), are ideal ‘laboratories’ for tests of QED at
high fields. These ions have, for example, electric
field strengths of the order of 1015 V/cm close to
the nucleus [2] and can be produced at high veloc-
ities at the Gesellschaft für Schwerionenforschung
(GSI) in Darmstadt, Germany.
At the HITRAP facility, which is currently be-
ing built at GSI, ions coming from the experimen-
tal storage ring (ESR) with MeV energies will be
slowed down by linear and radiofrequency stages
to keV kinetic energies, trapped and cooled down
to sub-eV energies, and finally made available for
experiments. Within the HITRAP project, instru-
mentation is being developed for high-precision
measurements of atomic and nuclear properties,
mass and g-factor measurements and ion-atom
and ion-surface interaction studies [3,4,5].
2. Hydrogen- and lithium-like ions
Hydrogen- and lithium-like ions are the best can-
didates for our studies, since they have s-electrons
which are very close to the nucleus. The (higher
order) QED effects are most pronounced at the
high fields close to the nucleus, therefore the best
measurable quantity is the ground state hyperfine
splitting (HFS). Due to the simple electronic struc-
ture of H- and Li-like species, accurate (higher or-
der) calculations of ground state HFS can be done,
which will then be compared with accurate exper-
imental results.
As a first approximation, good within about 4%,
the energy of the (1s) 2S1/2 ground state HFS of
hydrogen-like ions is given by [2,6]:
EHFS = α(Zα)
2(2I + 1)
2As(1− δ)(1)
where α is the fine structure constant, gI =
µ/(µNI) is the nuclear g-factor (with µ the nuclear
magnetic moment and µN the nuclear magneton),
I the nuclear spin, me and mp are the electron and
proton mass, respectively, and c is the speed of
light. Equation (1) represents the normal ground
state HFS multiplied by a correction As for the
relativistic energy of the s-electron, and by a fac-
tor (1− δ), which takes the ‘Breit-Schawlow’ (BS)
effect into account. The BS effect is due to the spa-
tial distribution of the nuclear charge. It corrects
for the fact that we cannot assume a homogeneous
charge distribution over a spherical nucleus. The
values for δ were taken from [6], those for gI and
I from [7]. In principle eq.(1) should also contain
a correction for the finite nuclear mass, but since
this correction is very small it can be neglected
[2]. The energy of the (1s22s) 2S1/2 ground state
HFS of lithium-like ions only differs from eq.(1)
by a factor 1/n3 = 1/8 and by the As-value [2].
However, eq.(1) requires two further important
corrections, the one of most interest to us being
that which corrects for the QED effects. The other
correction takes the ‘Bohr-Weisskopf’ (BW) effect
into account [8]. The BW effect is due to the spa-
tial distribution of the nuclear magnetisation and
is only known with an accuracy of 20-30 %, which
is mainly due to the single-particle model used for
its calculation [9]. Unfortunately, the QED effects
are of the same order of magnitude as the uncer-
tainty in the BW effect [10]. Thus, from a HFS
measurement of a single species (i.e. H- or Li-like)
the QED effects cannot be determined accurately.
Equation (1) can also be written as E1sHFS =
C1s +E1sQED, where the constant C
1s includes ev-
erything except the QED effects. Since the equa-
tions for the (1s) and (2s) states are so similar, it
is possible to write the difference between the two
HFS as ∆EHFS = E
HFS−ξE
HFS = Enon−QED+
EQED [10]. The factor ξ only contains non-QED
terms and can be calculated to a high precision [10].
From the difference between the HFS measure-
ments of H- and Li-like ions of the same isotope,
the QED effects can thus be determined within a
few percent. However, this requires measurements
of transitionwavelengthswith an experimental res-
olution of the order of 10−6.
The transition lifetime t is defined as t = A−1
(see e.g. [11]). The transition probability A, for
an M1 transition from the excited to the lower
hyperfine state, is given by [2]
4α(2πν)3~2I (2κ+ 1)
27m2ec
4 (2I + 1)
where ~ is Planck’s constant divided by 2π and κ
Table 1
Calculated HFS transition wavelengths (λ) and lifetimes (t)
of the most interesting ion species for systematic studies.
Also shown are the nuclear spin (I) and magnetic moment
(µ), taken from [7]. The half-lives of these species are longer
than 10 minutes. (The values listed are truncated and the
QED and BW effects are not included.)
element ion type λ (nm) t (ms) I µ (µN )
lead 207Pb81+ H-like 973 45 1/2 0.59
bismuth 209Bi82+ H-like 239 0.38 9/2 4.11
209Bi80+ Li-like 1469 87
protactinium 231Pa90+ H-like 262 0.64 3/2 2.01
231Pa88+ Li-like 1511 123
lead [12] 207Pb+ P3/2 - P1/2 710 41 1/2 0.59
chlorine [13] 35Cl+ 3P2 -
1D2 858 - 3/2 0.82
3P1 -
1D2 913 -
argon [14] 37Ar2+ 3P2 -
1D2 714 - 7/2 1.3
3P1 -
1D2 775 -
is related to the electron’s angular momentum [2].
From eq.(2) it can be seen that A scales with the
transition frequency as ν3, whereas ν is propor-
tional to Z3, see eq.(1). Therefore, the transition
lifetime scales with Z as t ∝ Z−9 and is roughly of
the order of milliseconds for Z > 70.
In table 1 the calculated transition wavelengths
(λ) and lifetimes (t), together with their corre-
sponding I and µ values, of the most interesting
species for our laser spectroscopy studies are listed.
(The QED and BW effects are not taken into ac-
count.) The half-lives of these species exceed 10
minutes, which corresponds to the minimum time
required for a measurement. Although the wave-
lengths span a broad range, roughly from 200 to
1600 nm, these transitions are still accessible with
standard laser systems. The three species (Pb [12],
Cl [13] and Ar [14]) at the bottom of the table
are considered as candidates for pilot experiments.
They are singly charged ions, which are easily pro-
duced, have M1 transitions at convenient wave-
lengths, and can be used to test the laser spec-
troscopy part of the experiment. A measurement
of the HFS in 207Pb+ is of special interest, be-
cause it will be possible to extract the value of µ.
Currently two different values exist, which unfor-
tunately leads to a 2% difference in the HFS cal-
culations [15].
In principle, similar experiments could be car-
ried out with metastable hafnium (180Hf, level
energy 1141 keV, half-life 5.5 h [7]). For H-like
hafnium, the transition values are λ = 217 nm and
t = 0.25 ms. For the Li-like ion, λ = 1434 nm and
t = 72 ms are obtained. The difficulty with this
isotope is that its nucleus is in an excited state,
which is difficult to produce.
Figure 1 shows the calculated transition wave-
lengths of all H-like lead, and all H- and Li-like
bismuth isotopes with half-lives exceeding 10 min-
utes. (The QED and BW effects are not included.)
The isotopes are labelled by their corresponding
atomic mass units (in u) and the stable isotopes
(207Pb and 209Bi) are indicated by the small ar-
rows. For Pb, only the H-like isotopes are acces-
sible with standard lasers, because the transition
wavelengths of Li-like isotopes are much longer
than 1600 nm. For Bi, many isotopes of both ion
species are accessible, although their transition
wavelengths differ considerably.
From Fig.1 it is clear that both elements offer
many candidates for laser spectroscopy measure-
ments of ground state HFS and that bismuth, in
particular, allows for a systematic study of the
(higher order) QED effects at high fields. Further-
more, a systematic study of different isotopes of
the same species, for example a study of the H-like
Pb isotopes, will make it possible to study trends
in nuclear properties across a range of isotopes.
There already exist two previous measurements
of the 2s ground state HFS in 209Bi. A direct mea-
surement [16] was carried out using the ESR at
GSI (Darmstadt), but unfortunately no resonance
was found at the predicted value of ≈ 1554 nm
[10]. An indirect measurement [17] was performed
in an electron beam ion trap (EBIT) and yielded
a value of ≈ 1512 nm, but the error in the mea-
surement was rather large (≈ 50 nm). In the ESR
the ions have relativistic velocities (≈ 200MeV/u),
H-like Li-like
H-like
Fig. 1. Calculated transition wavelengths for H-like Pb
and Bi isotopes (full circles), and Li-like Bi isotopes (open
circles). Only isotopes with half-lives exceeding 10 minutes
are shown. The small arrows indicate the stable isotopes,
the numbers are the masses in u. (The QED and BW effects
are not included.)
which are used to shift the transition wavelength
to a lower value (≈ 532 nm), and the transitions
are Doppler-broadened (≈ 40 GHz). In the EBIT
the ions have temperatures of several hundreds of
eV (∼ 106 K), which lead to considerable Doppler
broadening (≈ 10 GHz). The resolution obtained
in previous measurements at the ESR is of the or-
der of 10−4, whereas that of the EBIT measure-
ment is of the order of 10−2.
3. Experiment overview
A detailed description of the proposed experi-
ments, as well as a treatment of the techniques
used, can be found elsewhere [18,19]. Briefly, an
externally produced bunch of roughly 105 HCI at
an energy of a few keV is loaded into a cylindrical
open-endcap Penning trap [20] on axis, i.e. along
the magnetic field lines. Electron capture (neutral-
isation) by collisions is strongly reduced by operat-
ing the trap at cryogenic temperatures under UHV
conditions. The HCI are captured in flight, con-
fined, cooled by ‘resistive cooling’ [21] and radially
compressed by a ‘rotating wall’ [22] technique. Af-
ter these steps a cold and dense ion cloud is ob-
tained. The spectroscopy laser enters the trap ax-
ially through an open-endcap and will fully irradi-
ate the ion cloud. The fluorescence from the excited
HCI is detected perpendicular to the cooled axial
motion (trap axis) through segmented ring elec-
trodes, which are covered by a highly-transparent
copper mesh. (The ring is segmented for the rotat-
ing wall technique.)
The above mentioned transition lifetimes imply
that, for a detection efficiency of ∼ 10−3, accept-
able fluorescence rates, up to a few thousand counts
per second, from M1 transitions can be expected
from a (∼ 3mm diameter) cloud of 105 ions [18,19].
Confining the HCI in a trap, and cooling and com-
pressing the cloud, will thus enable fluorescence
detection and ensure long interrogation times by
the laser.
However, due to the high density of HCI in the
cloud, space charge effects will play a role and will
lead to shifts of the motional frequencies of the
trapped ions. We have studied this effect in detail
and understand the corresponding frequency shifts
well [23]. Since these shifts are fairly small, the (fre-
quency dependent) cooling and compression tech-
niques can still be applied.
The HCI also need to be strongly cooled to re-
duce Doppler broadening of the transitions. This
will be achieved by resistive cooling of the (axial)
ion motion in the trap. For example, for the F =
1 → F = 0 transition in 207Pb81+ at ν ≈ 3× 1014
Hz, the Doppler broadened linewidth at a temper-
ature of 4 K is ∆νD ≈ 3× 10
7 Hz. The anticipated
resolution is therefore of the order of 107/1014 =
10−7. This is three orders of magnitude better than
any previous measurement, see e.g. [15,24], and
good enough to measure the QED effects within a
few percent.
4. Acknowledgments
This work is supported by the European Com-
mission within the framework of the HITRAP
project (HPRI-CT-2001-50036). W.N. acknowl-
edges funding by the Helmholtz Association (VH-
NG-148).
References
[1] W.E. Lamb Jr. and R.C. Rhetherford, Phys. Rev. 72,
(1974) 241.
[2] T. Beier, Phys. Rep. 339, (2000) 79.
[3] W. Quint J. Dilling, S. Djekic, H. Häffner, N.
Hermanspahn, H.-J. Kluge, G. Marx, R. Moore, D.
Rodriguez, J. Schönfelder, G. Sikler, T. Valenzuela, J.
Verdú, C. Weber and G. Werth, Hyp. Int. 132, (2001)
[4] T. Beier, L. Dahl, H.-J. Kluge, C. Kozhuharov, W.
Quint and the HITRAP collaboration, Nucl. Instr.
Meth. Phys. Res. B 235, (2005) 473.
[5] H.-J. Kluge, T. Beier, K. Blaum, M. Block, L. Dahl,
S. Eliseev, F. Herfurth, S. Heinz, O. Kester, C.
Kozhuharov, T. Kühl, G. Maero, W. Nörtershäuser, T.
Stöhlker, W. Quint, G. Vorobjev, G. Werth, and the
HITRAP Collaboration, Proceedings of the Memorial
Symposium for Gerhard Soff, Topics in Heavy Ion
Physics (Eds Walter Greiner and Joachim Reinhardt),
pages 89-101 (2005), EP Systema (Budapest).
[6] V.M. Shabaev, J. Phys. B 27, (1994) 5825.
[7] R.B. Firestone and V.S. Shairley, Table of Isotopes
(Appendix E), Wiley (1998).
[8] A. Bohr and V.F. Weisskopf, Phys. Rev. 77, 94 (1950).
[9] V.M. Shabaev, M. Tomaselli, T. Kühl, A.N. Artemyev
and V.A. Yerokhin, Phys. Rev. A 56, (1997) 252.
[10] V.M. Shabaev, A.N. Artemyev, V.A. Yerokhin, O.M.
Zherebtsov and G. Soff, Phys. Rev. Lett. 86, (2001)
3959.
[11] W. Demtröder, Laser Spectroscopy, Springer, New
York (1996).
[12] A. Roth, Ch. Gerz, D. Wilsdorf and G. Werth, Z. Phys.
D 11, (1989) 283.
[13] I.S. Bowen, Astrophys. J. 132, (1960) 1.
[14] M.H. Prior, Phys. Rev. A 30, (1984) 3051.
[15] P. Seelig, S. Borneis, A. Dax, T. Engel, S. Faber, M.
Gerlach, C. Holbrow, G. Huber, T. Kühl, D. Marx, K.
Meier, P. Merz, W. Quint, F. Schmitt, M. Tomaselli, L.
Völker, H. Winter, M. Würtz, K. Beckert, B. Franzke,
F. Nolden, H. Reich, M. Steck and T. Winkler, Phys.
Rev. Lett. 81, (1998) 4824.
[16] S. Borneis, A. Dax, T. Engel, C. Holbrow, G. Huber,
T. Kühl, D. Marx, P. Merz, W. Quint, F. Schmitt,
P. Seelig, M. Tomaselli, H. Winter, K. Beckert, B.
Franzke, F. Nolden, H. Reich and M. Steck, Hyp. Int.
127, (2000) 305.
[17] P. Beiersdorfer, A.L. Osterheld, J.H. Scofield, J.R.
Crespo López-Urrutia and K. Widmann, Phys. Rev.
Lett. 80, (1998) 3022.
[18] D.F.A. Winters, A.M. Abdulla, J.R. Castrejón Pita,
A. de Lange, D.M. Segal and R.C. Thompson, Nucl.
Instr. Meth. Phys. Res. B 235, (2005) 201.
[19] M. Vogel, D.F.A. Winters, D.M. Segal and R.C.
Thompson, Rev. Sci. Instrum. 76, (2005) 103102.
[20] G. Gabrielse, L. Haarsma and S.L. Rolston, Int. J.
Mass Spectr. Ion Proc. 88, (1989) 319.
[21] D.J. Wineland and H.G. Dehmelt, J. Appl. Phys. 46,
(1975) 919.
[22] W.M. Itano, J.J. Bollinger, J.N. Tan, B. Jelenković,
X.-P. Huang and D.J. Wineland, Science 279, (1998)
[23] D.F.A. Winters, M. Vogel, D.M. Segal and R.C.
Thompson, J. Phys. B: At. Mol. Opt. Phys. 39, (2006)
3131.
[24] I. Klaft, S. Borneis, T. Engel, B. Fricke, R. Grieser, G.
Huber, T. Kühl, D. Marx, R. Neumann, S. Schröder,
P. Seelig and L. Völker, Phys. Rev. Lett. 73, (1994)
2425.
	Introduction
	Hydrogen- and lithium-like ions
	Experiment overview
	Acknowledgments
	References
ABSTRACT
  An overview is presented of laser spectroscopy experiments with cold,
trapped, highly-charged ions, which will be performed at the HITRAP facility at
GSI in Darmstadt (Germany). These high-resolution measurements of ground state
hyperfine splittings will be three orders of magnitude more precise than
previous measurements. Moreover, from a comparison of measurements of the
hyperfine splittings in hydrogen- and lithium-like ions of the same isotope,
QED effects at high electromagnetic fields can be determined within a few
percent. Several candidate ions suited for these laser spectroscopy studies are
presented.

<|endoftext|><|startoftext|>
Introduction to Superconducting Circuits (Wiley, New York, 1999); M. Tinkham,
Introduction to Superconductivity (McGraw-Hill, New York, 1996); K. K. Likharev, Dynamics
of Josephson Junctions and Circuits (Gordon and Breach, New York, 1986); A. Barone and
G. Paternò, Physics and Applications of the Josephson Effect (Wiley, New York, 1982).
[11] P.C. Hendry, N.S. Lawson, R.A.M. Lee, P.V.E. McClintock, and C.H.D. Williams, in: Forma-
tion and Interactions of Topological Defects, ed. A.C. Davis and R.N. Brandenberger (Plenum,
New York,1995).
mailto:phr76jb@tx.technion.ac.il
	References
ABSTRACT
  We study a loop of Josephson junctions that is quenched through its critical
temperature. For three or more junctions, symmetry breaking states can be
achieved without thermal activation, in spite of the fact that the relaxation
time is practically constant when the critical temperature is approached from
above. The probability for these states decreases with quenching time, but the
dependence is not allometric. For large number of junctions, cooling does not
have to be fast. For this case, we evaluate the standard deviation of the
induced flux. Our results are consistent with the available experimental data.

<|endoftext|><|startoftext|>
Title
Frequency modulation Fourier transform spectroscopy  Mandon, Guelachvili, Picqué, 2007 
Frequency modulation Fourier transform spectroscopy 
Julien Mandon, Guy Guelachvili, Nathalie Picqué 
Laboratoire de Photophysique Moléculaire, CNRS; Univ. Paris-Sud, Bâtiment 350, 91405 
Orsay, France 
Corresponding author: 
Dr. Nathalie Picqué, 
Laboratoire de Photophysique Moléculaire 
Unité Propre du CNRS, Université Paris Sud, Bâtiment 350 
91405 Orsay Cedex, France 
Phone number: +33 1 69156649 
Fax number: +33 1 69157530 
Email: nathalie.picque@ppm.u-psud.fr
Web: http://www.laser-fts.org  
Abstract: A new method, FM-FTS, combining Frequency Modulation heterodyne laser 
spectroscopy and Fourier Transform Spectroscopy is presented. It provides simultaneous 
sensitive measurement of absorption and dispersion profiles with broadband spectral coverage 
capabilities. Experimental demonstration is made on the overtone spectrum of C2H2 in the 1.5 
µm region. 
OCIS codes: 120.6200, 300.6300, 300.6380, 300.6360, 300.6310, 300.6390, 120.5060 
120.6200  Spectrometers and spectroscopic instrumentation, 300.6300 Spectroscopy, Fourier 
transforms, 300.6380  Spectroscopy, modulation, 300.6360  Spectroscopy, laser, 300.6310  
Spectroscopy, heterodyne, 300.6390 Spectroscopy, molecular, 120.5060  Phase modulation 
mailto:nathalie.picque@ppm.u-psud.fr
http://www.laser-fts.org/
Frequency modulation Fourier transform spectroscopy  Mandon, Guelachvili, Picqué, 2007 
Improving sensitivity is presently one of the major concern of spectroscopists. This may 
be obtained both from the enhancement of the intrinsic signal, and from the reduction of the 
background noise. In this latter case, modulation has been one of the most effective approach. 
In particular, Frequency Modulation (FM) absorption spectroscopy [1] has reached detection 
sensitivity near to the fundamental quantum noise limit, by shifting the frequency modulation 
of the measurements to a frequency range where the 1/f noise becomes negligible. Moreover, 
FM spectroscopy benefits from high-speed detection and simultaneous measurement of 
absorption and dispersion signals. Since Bjorklund’s first demonstrations [1,2] of the 
efficiency of FM spectroscopy with a single-mode continuous-wave dye laser, the technique 
has been widely used as a tunable laser spectroscopic method in fields such as laser 
stabilization [3], two-photon spectroscopy [4], optical heterodyne saturation spectroscopy [5], 
trace gas detection [6]. In most schemes, the laser wavelength is scanned across the 
atomic/molecular resonance to retrieve the line shape. More rarely, the modulation frequency 
is tuned. However in both cases, the measurements are limited to narrow spectral ranges. 
This letter reports the first results in FM broadband spectroscopy. This work is  
motivated by our ongoing effort of implementing a new spectroscopic approach 
simultaneously delivering sensitivity, resolution, accuracy, broad spectral coverage and rapid 
acquisition. The basic idea, named FM-FTS, is to associate the advantages of FM 
spectroscopy and high-resolution Fourier transform spectroscopy (FTS). FTS is able to record 
at once extended ranges, with no spectral restriction. In particular it gives easy access to the 
infrared domain. In this letter, a new way of modulating the interferogram is implemented. 
The key concept is that a radio frequency (RF) modulation is performed. The beat signal at 
the output port of the Fourier transform spectrometer is modulated at constant RF, which is 
about 104 times greater than the audio frequency generally delivered by the interferometer 
optical conversion. Together with the advantage, over classical FTS, of measurements 
performed at much higher frequency, our approach benefits from the synchronous detection 
ability and from the simultaneous acquisition of both the absorption and the dispersion of the 
recorded profiles. 
The experimental principle is presented in Fig. 1. The light emitted by the broadband 
source is first passing through the interferometer. The output beam is then phase-modulated 
by an electro-optic modulator (EOM) before entering the absorption cell and falling on the 
fast detector. The synchronous detection of the detector signal is realized by the lock-in 
amplifier at the EOM driver reference frequency fm. Recorded data are finally stored on the 
computer disk with their corresponding path difference position ∆. Their Fourier transform is 
the spectrum. 
In more details, the electric field E at the output of the interferometer may be written as: 
0 ( )( , ) 1 exp exp( )d c.c.     (1)
c c ct i i tc
ω ω ω
⎡ ⎤∆⎛ ⎞
∆ = + − +⎜ ⎟⎢ ⎥
⎝ ⎠⎣ ⎦
where E0 is the electric field amplitude of the source at ωc optical pulsation, c is the velocity 
of light and c.c the conjugate complex of the preceding expression in Eq. 1. The EOM effect 
on the beam is assumed to have a low modulation index M. As a consequence, each carrier 
wave of pulsation ωc, has two weak sidebands located at ± ωm = ± 2π fm. Equation (1) 
becomes:  
( ) ( ) }
0 ( )( , ) 1 exp exp
M exp M exp d c.c.     (2)
c m c m c
t i i t
i t i t
ω ω ω ω ω
⎡ ⎤∆⎛ ⎞
∆ = + −⎜ ⎟⎢ ⎥
⎝ ⎠⎣ ⎦
⎡ ⎤ ⎡ ⎤+ + − − +⎣ ⎦ ⎣ ⎦
When interacting with the gas, the carrier and the sidebands experience attenuation and phase-
shift due to absorption and dispersion. Following the notations introduced in [1], this 
interaction may be written as exp(-δ(ω)- i φ(ω)) where δ is the amplitude attenuation and φ is 
Frequency modulation Fourier transform spectroscopy  Mandon, Guelachvili, Picqué, 2007 
the phase shift. The following convention is adopted: δn and φn denotes for n = 0, ±1 the 
respective components at ωc and ωc± ωm. Then Eq. 2 may be written: 
( ) ( ){
( ) ( ) ( ) ( ) }
1 1 1 1
( , ) 1 exp exp exp
M exp exp M exp exp d c.c.     (3)
c m c m c
t i i i t
i i t i i t
ω δ φ ω
δ φ ω ω δ φ ω ω ω+ + − −
⎡ ⎤∆⎛ ⎞
∆ = + − − −⎜ ⎟⎢ ⎥
⎝ ⎠⎣ ⎦
⎡ ⎤ ⎡ ⎤+ − − + − − − − +⎣ ⎦ ⎣ ⎦
The intensity I detected by the fast photodetector is proportional to : 
*I( , ) ( , ) ( , ).     (4)t t t∆ ∝ ∆ ∆E E  
( ) ( ) ( )
( ) ( ) ( ) ( ) ( )
( ) ( ) ( ) ( ) ( )
0 1 1
0 1 0 1 0 1 0 1
0 1 1 0 0 1 0 1
I( , ) exp 2 exp 2 exp 2 1 cos
2M cos exp cos exp cos 1 cos
2M sin exp sin exp sin 1 cos
δ δ δ ω
ω δ δ φ φ δ δ φ φ ω
ω δ δ φ φ δ δ φ φ ω
+ + − −
− − + +
⎛ ⎡ ⎤∆⎛ ⎞⎡ ⎤∆ ∝ − + − + − +⎜ ⎜ ⎟⎢ ⎥⎣ ⎦ ⎝ ⎠⎣ ⎦⎝
⎡ ⎤∆⎛ ⎞⎡ ⎤+ − − − − − − − + ⎜ ⎟⎢ ⎥⎣ ⎦ ⎝ ⎠⎣ ⎦
∆⎛ ⎞⎡ ⎤+ − − − − − − − + ⎜⎣ ⎦ ⎝ ⎠
( ) ( ) ( )
( ) ( ) ( )
1 1 1 1
1 1 1 1
2M cos 2 exp sin 1 cos
2M sin 2 exp cos 1 cos d .                                        (5)
m c c
ω δ δ φ φ ω
ω δ δ φ φ ω ω
+ − − +
+ − − +
⎡ ⎤∆⎛ ⎞⎡ ⎤+ − − − + ⎜ ⎟⎢ ⎥⎣ ⎦ ⎝ ⎠⎣ ⎦
⎞⎡ ⎤∆⎛ ⎞⎡ ⎤+ − − − + ⎟⎜ ⎟⎢ ⎥⎣ ⎦ ⎝ ⎠⎣ ⎦ ⎠
After synchronous detection at fm frequency and with the assumption that |δ0-δj |<<1 and |φ0-
φj|<<1 (with j = ±1), the in-phase Icos(∆) and the in-quadrature Isin(∆) parts of the electric 
signal are given by  
( )( )cos 0 1 1I ( ) M 1 cos exp 2 d .     (6)c cc
ω δ δ δ ω− +
⎡ ⎤∆⎛ ⎞
∆ ∝ + − −⎜ ⎟⎢ ⎥
⎝ ⎠⎣ ⎦
( )( )sin 0 1 1 0I ( ) M 1 cos exp 2 2 d .     (7)c cc
ω δ φ φ φ ω+ −
⎡ ⎤∆⎛ ⎞
∆ ∝ + − + −⎜ ⎟⎢ ⎥
⎝ ⎠⎣ ⎦
Summarizing, two interferograms are simultaneously measured, allowing to obtain 
broadband FM spectra. The in-phase interferogram provides spectrally resolved information 
on the difference of absorption experienced by each group of two sidebands. The in-
quadrature interferogram gives the difference between the average of the dispersions 
experienced by the sidebands and the dispersion undergone by each carrier. 
 For this first experimental demonstration, a narrow-band emission source covering 0.25 
cm-1 (7.5 GHz) has been implemented as a test source. It is made of a fiber-coupled 
distributed feedback laser diode emitting around 1530 nm with an output power of a few mW. 
The current of the laser diode is modulated at about 20 Hz by a ramp generator. At each path 
difference step, while the interferometer is recording one interferogram sample, the laser 
frequency excursion is equal to 7.5 GHz, corresponding to one period of the triangular ramp. 
Consequently, for the interferometer, the laser diode behaves as a continuous emission source 
emitting over 0.25 cm-1. The interferometer output light is phase-modulated at fm = 150 MHz 
by the EOM and passes through an 80-cm cell filled at 10 hPa with acetylene in natural 
abundance. The light is next focused on an InGaAs nanosecond infrared photodetector, which 
according to Eq.5 delivers a signal proportional to the intensity of the beam containing a beat 
signal at the RF modulation frequency. The amplified detector signal is mixed with the 
reference signal at fm, down to d.c., using a commercial high frequency dual-phase lock-in 
amplifier. The reference may be phase-shifted with respect the signal used to drive the EOM. 
The two channels detected in-phase and in-quadrature are measured simultaneously.  
Frequency modulation Fourier transform spectroscopy  Mandon, Guelachvili, Picqué, 2007 
Figure 2 shows a typical in-phase interferogram of C2H2. Its shape is characteristic of an 
interferogram of first-derivative type line-shapes. The 3 cm period amplitude modulation is 
due to the beat between the two strongest acetylene lines in the explored spectral domain. 
Figure 3 shows the two narrow-band spectra, Fourier transform of the in-phase (absorption) 
and in-quadrature (dispersion) interferograms. The spectral domain extension is limited by the 
tuning capabilities of the diode laser, which was used as a test source. This does not restrict 
the generality of the present demonstration. The lines belong to the ν1+ν3 and ν1+ν3+ν51-ν51 
overtone bands of 12C2H2. The unapodised spectral instrumental resolution: 12.5 10-3 cm-1 
(0.375 GHz) is narrower than the Doppler width of the lines. Signal to noise ratio is of the 
order of 1200. The total recording time of the order of 15 minutes is due to the need of 
adapting the interferometer recording mode procedure to the rather low laser diode frequency 
excursion period.  
The present validation of FM-FTS with a narrow band light source made the experience 
much simpler. Indeed, in wideband FTS, processing the signal of the interferogram needs 
special dynamic range solutions. Thanks to the only 0.25 cm-1-wide spectrum analysed in this 
experiment, a sophisticate RF detection chain, presently under development, was not 
necessary. The design of our Connes-type interferometer allows a balanced detection of the 
signals recorded at the two output ports. This will be helpful to remove the part of the 
interferogram which is not modulated by path difference and to consequently improve the 
dynamic range of the measurements. Similar solutions have already been successfully 
practiced for time-resolved FTS [7]. In FM-FTS, they are formally even easier to implement 
since the signal may be band-pass filtered around the modulation radio-frequency.  
In the present experimental set-up, the light should sequentially reach the equipment 
parts as shown in Fig.1. Briefly, to have a broadband equivalent of FM tunable laser 
spectroscopy, the sidebands generated by the EOM must not be resolved by the spectrometer. 
Also, since each carrier and its sidebands have to experience different attenuation and phase-
shift, the EOM must be placed before the cell containing the gas of interest. This matter will 
be discussed in more detail elsewhere.  
This first FM-FTS experiment demonstrates the feasibility of coupling broadband laser 
sources, Fourier spectrometers and RF detection. This opens new perspectives in high 
sensitivity multiplex spectroscopy. FM-FTS may be coupled to a large variety of high 
brightness sources. This includes broadband cw lasers, supercontinua sources, mode-locked 
lasers as demonstrated recently [8], and Amplified Spontaneous Emission sources. Frequency 
nonlinear conversion may also be used when no laser source is available in the spectral range 
of interest.  
FM-FTS may be practiced with any kind of Fourier transform spectrometers, including 
commercially available instruments, at the expense of reasonable modifications in the signal 
detection scheme. The approach is also suitable at low spectral resolution. In such case, 
modulation frequencies lying in the GHz domain may be used. Moreover,  FM-FTS induces 
new practices in Fourier transform spectroscopy. The modulation frequency is very high. The 
optical fringes generated by the interferometer can then be scanned at a much higher 
frequency than what is usually practiced nowadays. Path difference variation of the order of 1 
m/s, is easily affordable. It corresponds to acquisition times expressed in second when 
presently the most efficient existing high resolution interferometers need 1 to 10 hours to 
record interferograms.  Additionally, due to the low étendue of the analysed laser beams in 
our method, miniaturized instruments may be implemented. In addition to the radio-frequency 
detection scheme, sensitivity may be further enhanced by using an external optical resonator, 
thus increasing the effective absorption length. 
With FM-FTS, both the absorption and the dispersion associated with each spectral 
features are measured simultaneously. Despite its recognized interest for lineshape parameters 
retrieval, traditional dispersion spectroscopy has been poorly developed, only at low spectral 
Frequency modulation Fourier transform spectroscopy  Mandon, Guelachvili, Picqué, 2007 
resolution, mostly due to its experimental complexity. FM-FTS should represent an easy 
manner of getting this information over extended spectral domains, which may induce new 
interest to the experimental investigation of dispersion profiles. 
References 
[1] G.C. Bjorklund, Frequency-modulation spectroscopy: a new method for measuring weak 
absorptions and dispersions, Optics Letters 5, 15-17 (1980). 
[2] G.C. Bjorklund, M.D. Levenson, W. Lenth, C. Ortiz, Frequency-modulation (FM) 
spectroscopy. Theory of  lineshapes and signal-to-noise analysis, Applied Physics B 32, 145-
152 (1983). 
[3] R. W. P. Drever, J. L. Hall, F. V. Kowalski, J. Hough, G. M. Ford, A. J. Munley and H. 
Ward, Laser phase and frequency stabilization using an optical resonator, Applied Physics B 
31, 97-105 (1983).  
[4] W. Zapka, M. D. Levenson, F. M. Schellenberg, A. C. Tam, G. C. Bjorklund, Continuous-
wave Doppler-free two-photon frequency-modulation spectroscopy in Rb vapor Optics 
Letters 8, 27-29 (1983) 
[5] J.L. Hall, L. Hollberg, T. Baer, H.G. Robinson, Optical heterodyne saturation 
spectroscopy, Applied Physics Letters 39, 680-682 (1981). 
[6] P. Maddaloni, P. Malara, G. Gagliardi, P. De Natale, Two-tone frequency modulation 
spectroscopy for ambient-air trace gas detection using a portable difference-frequency source 
around 3 µm, Applied-Physics-B-Lasers-and-Optics B85, 219-22 (2006). 
[7] N. Picqué, G. Guelachvili, High-information time-resolved Fourier transform 
spectroscopy at work, Applied Optics 39, 3984-3990 (2000). 
[8] J. Mandon, G. Guelachvili, N. Picqué, Frequency Comb Spectrometry with Frequency 
Modulation, in preparation, 2007. 
Frequency modulation Fourier transform spectroscopy  Mandon, Guelachvili, Picqué, 2007 
Figure captions 
Fig. 1. 
Schematic of the experimental setup. 
Fig. 2. 
Absorption interferogram using in-phase RF detection with FM-FTS. Maximum path 
difference is 40 cm corresponding to 12.5 10-3 cm-1 unapodized resolution. 
Fig. 3. 
FM-FTS dispersion and absorption spectra of the acetylene molecule at 1528.6 nm. The 
middle plot represents the line relative intensities taken from the HITRAN database. 
Frequency modulation Fourier transform spectroscopy  Mandon, Guelachvili, Picqué, 2007 
Broadband 
source
FTS EOM Sample Cell
Driver
Lock-In
Detector
Fig. 1. Schematic of the experimental setup. 
0 10 20 30 40
Path difference (cm)
FM-FTS Interferogram (absorption channel)
Fig. 2. Absorption interferogram using in-phase RF detection with FM-FTS. Maximum path 
difference is 40 cm corresponding to 12.5 10-3 cm-1 unapodized resolution. 
6541.50 6541.75 6542.00
Dispersion
Absorption
(2) R
ν1+ν3+ν51-ν51
ν1+ν3
(cm-1)
Fig. 3. FM-FTS dispersion and absorption spectra of the acetylene molecule at 1528.6 nm. 
The middle plot represents the line relative intensities taken from the HITRAN database. 
	Frequency modulation Fourier transform spectroscopy
ABSTRACT
  A new method, FM-FTS, combining Frequency Modulation heterodyne laser
spectroscopy and Fourier Transform Spectroscopy is presented. It provides
simultaneous sensitive measurement of absorption and dispersion profiles with
broadband spectral coverage capabilities. Experimental demonstration is made on
the overtone spectrum of C2H2 in the 1.5 $\mu$m region.

<|endoftext|><|startoftext|>
Introduction 
There are strong theoretical coherence reasons which impose to critically reconsider the 
approach to cosmological problem on the whole. The Quantum Cosmology’s main problem is to 
individuate the proper boundary conditions for the Universe’s wave function in the Wheeler-DeWitt 
equation. These conditions have to be such to allow the confrontation between a  probability 
distribution of states and the observed Universe. In particular, it is expected to select a path in the 
configuration space able to solve the still open problems of the Big-Bang traditional scenario: flat 
space, global homogeneity (horizon problem) and the “ruggedness” necessary to explain the tiny 
initial dishomogeneities which have led to the formation of the galactic structures. 
The inflationary cosmology ideas has partly supplied with a solution to the standard model 
wants by introducing the symmetry breaking and phase transition notions which are at the core of 
Quantum Cosmology. The last one also finds its motivation in the necessity to provide with a 
satisfactory physical meaning to the initial singularity problem, unavoidable in GR under the 
condition of the Hawking-Penrose theorem (Hawking & Ellis, 1973). 
The Hartle-Hawking “no-boundary” condition seems to provide a very powerful constraint for 
the Quantum Cosmology main requirements, but appears as an “ad hoc” solution which could be 
deduced by a fundamental approach. Particularly, the mix of topologies  used to conciliate the 
without boundary Universe symmetry with the Big-Bang evolutionary scenario is unsatisfactory. 
We realize that  most part of the Quantum Cosmology problems inherit the uncertainties of the 
Fridman model in GR, so they derive from the euristic use of the local laws on cosmic scale. 
A possible way-out is the Fantappié-Arcidiacono group approach which allows to individuate a 
Universe model without recourse to arbitrary extrapolations of the symmetry groups valid in 
physics. 
The group extension theory naturally finds again the Hartle-Hawking condition on the Universe 
wave function and allows to firmly founding theoretically the Quantum Cosmology. The price to 
pay is a subtle methodological  question on using the GR in cosmology. In fact, in 1952 Fantappié 
pointed out that the problem of the use of local laws to define the cosmological boundary 
conditions is due to the fact that  GR describes matter  in terms of local curvature, but leaves the 
question of space-time global structure indeterminate. It happens because, differently from RR, GR 
has not be built on group base, which thing should be central in building any theory up, especially 
when it aims to express  universally valid statements on physical world, the class of the superb 
theories, how Roger Penrose called them. 
We are going to examine here the foundations of the group extension method (par. 2) and the 
relativity in the De Sitter Universe (par. 3, 4), we introduce the conditions to define matter-fields 
(par.5).In (par.6) we analyze the physical significance of the observers in an istantonic Universe at 
imaginary time, and in (par.7) investigate the physical meaning of an Hartle-Hawking condition in 
an hyper-spherical  universe. 
2. An Erlangen Program for Cosmology  
In 1872 Felix Klein (1849-1925) presented the so-called Erlangen program for geometry, 
centred upon the symmetry transformations group. From 1952, Fantappié, basing on a similar idea 
and in perfect consonance with Relativity spirit, proposed an Erlangen program for physics, where 
a Universe is univocally individuated by a symmetry group which let its physical laws invariant 
(Fantappié,1954, 1959). It has to be underlined that in the theory Universe means any physical 
system characterized by a symmetry group.  
The space-time isotropy and homogeneity principle  with respect to physical laws tells us that 
the physical law concept itself is based upon symmetry.  So the essential idea is to individuate 
physical laws starting from the transformations group which let them invariant. We observe here 
that there are infinite possible transformations group which individuate an isotropic and 
homogeneous space-time. In order to build the next improvements in physics using the group 
extension method, we can follow the path indicated by the two groups we know to be two valid 
description levels of the physical world: the Galilei group and the Lorentz-Poincarè one. It is useful 
to remember that the Galilei group is a particular case of the Lorentz one when ∞→c ,i.e. when  it 
is not made use of the field notion and the interactions velocity is considered to be infinite. Staying 
within a quadrimensional space-time and consequently considering only groups at 10 parameters 
and continuous transformations, Fantappié showed that the Poincaré group can be considered a limit 
case of a broader group depending with continuity on c and another parameter r: the Fantappié 
group; moreover this group cannot be further extended under the condition to stay within a group at 
10 parameters. 
So we have the sequence: 
                              
                                        1031
31 +++ →→ FLG  
Where G is the Galilei group, L the Lorentz one and F the Fantappié final one, from which with 
∞→R , we get the L group. It is shown that such sequence of universes is univocal. 
The Lorentz group can be mathematically interpreted as the group of roto-translations such to 
let that particular object that is the Minkowski space-time invariant. Similarly, the Fantappié group 
is the one of the pentadimensional rotations of a new space-time: the hyper-spherical and at 
constant curvature  De Sitter universe (maximally symmetric). We point out we have obtained the 
De Sitter model without referring to the gravitational interaction, differently from the GR where the 
De Sitter universe is one of the possible solutions of the Einstein equations with cosmological 
constant. From a formal viewpoint we make recourse to pentadimensional rotations because in the 
De Sitter universe there appears a new constant r, which can be interpreted as the Universe radius. 
The group extension mechanism individuates an univocal sequence of symmetry groups; for 
each symmetry group we have a corresponding level of physical world description and a new 
universal constant, so providing the most general boundary conditions and constraining the form of 
the possibile physical laws. The Fantappié group fixes the c  and r constants and defines a new 
relativity for the inertial observers in De Sitter Universe. In this sense, the Theory of Universes- 
based on group extension method- is actually a version of what is sought for in the Holographic 
Principle: the possibility to describe laws and boundaries in a compact and unitary way. 
In 1956 G. Arcidiacono proposed to study the De Sitter S4 absolute universe by means of the 
tangent relative spaces where observers localize and describe the physical events by using the 
Beltrami-Castelnuovo 4P  projective representation in the Projective Special Relativity, PSR 
(Arcidiacono,1956; 1976; 1984).  
We note that we pass from hyper-spherical S4 to its real representation as hyperboloid by means 
of an inverse Wick rotation, rotating τ→it  and associating the great circles on the hyper-sphere 
with a family of geodesics on the hyperboloid. In this way, we get a realization of the Weyl 
principle for defining a Universe model, because it fixes a set of privileged observers (Ellis & 
Williams, 1988). So, the choice of 4P Beltrami-Castelnuovo is equivalent to study a relativity in 
4S . 
3. The Fantappié Group Transformations 
To study the De Sitter 4S  universe according to Beltrami-Castelnuovo representation we have 
to set the projectivities which let the Cayley-Klein interval invariant: 
(1.3)                                 0222222 =+−++ rtczyx . 
The (1.3) meets the time axis in the two 0tt ±=  “singularities”, where crt =0  is the time it 
takes light to run the Universe r radius. In this case the singularities’ meaning is purely geometrical, 
not physical, and they represent the hyperboloid rims (1.3), since the De Sitter universe is lacking in 
“structural” singularities. The 4S  invariant transformations are the 5-dimensional space rotations 
which lead on the 4P  observer’s space the projectivities that let the (1.3) unchanged. 
Let’s introduce the five homogeneous projective coordinates (Weierstrass condition): 
(2.3)                                  2rxx aa = , with   .4,3,2,1,0=a  
The ix  space-time coordinates, with i = 1,2,3,4 are: 
(3.3)                          xx =1 ,    yx =2  , zx =3 , ictx =4 . 
The connection between the (2.3) and (3.3) is given by the relation: 
(4.3)                         0xxrx ii =  
from which, owing to (2.3), we get the inverse relation: 
(5.3)                        arx =0 , axx ii =  , 
where 2222 11 γα −+=+= rxxa ii   , with rx
=α  and 0tt=γ . 
The searched transformation between the two 'O and O  observers consequently has the form: 
(6.3)            
ax = bab xα   with abα  orthogonal matrix. 
Limiting ourselves, just for simplicity reasons, to the 410 ,, xxx variables and following the 
standard method, also used in RR, we get 3 families of transformations: 
A) the space translations along the x axis, given by the  ( 10 , xx ) rotation: 
(7.3)       ϑϑ sincos 01
1 xxx +=  
                 
                +−= ϑsin1
0 xx ϑcos0x  
                 4
4 xx = . 
Using the (4.3) and putting αϑ == r
Ttg , we get the space-time transformations with T 
parameter: 
(8.3)        
'     ,  
1 2' . 
The (8.3) for r  indeterminate, i.e. ∞→r , are reduced to the well-known space translations of 
the classical and relativistic cases, connected by the T parameter. 
B) the T0 parameter time translation, given by the ( 40 , xx ) rotation: 
(9.3)       0004
4 sincos ϑϑ xxx +=  
               +−= 04
0 sinϑxx 00 cosϑx  
               1
1 xx = . 
          
                
Putting γϑ itTitg == 000   we obtain: 
(10.3)      
=     ,     
Also the (10.3), when ∞→r  are reduced to the known cases of classical and relativistic 
physics. 
C) the V parameter inertial transformations, given by the ( 41 , xx ) rotation: 
(11.3)           04011
sincos ϕϕ xxx +=  
                      +−= 01
4 sinϕxx 04 cosϕx  
                       0
0 xx = . 
Putting βϕ icVitg == , here we find again the Lorentz transformations: 
(12.3)             
x  , 
The (A), (B) and (C) transformations form the Fantappié projective group which for two 
variables (x,t) and three parameters (T,T0,V), with T translations and V velocity along x, can be  
written: 
(13.3)          
( )[ ]
( ) ( ) 0
ttrxab
bTctax
αβγβγα
γβγαβ
( )[ ]
( ) ( ) 0
ttrxab
bTtcxa
αβγβγα
αβγαβ
where we have put 221 γα −+=a  and 22 )(1 βγαβ −+−=b , with rx=α , cV=β  and 
0tt=γ . 
For ∞→r  we get a = 1 and 21 β−=b , and from (13.3) we obtain the Poincaré group with 
three parameters (T, T0,V). 
The Fantappié group can be synthesized by a very clear geometrical viewpoint, saying that the 
De Sitter universe at 21 r constant curvature shows an elliptic geometry in its hyper spatial global 
aspect (Gauss-Riemann) and an hyperbolic geometry in its space-time sections (Lobacevskij). 
Making the “natural” r unit of this two geometries tend towards infinity we obtain the parabolic 
geometry of Minkowski flat space. 
4. The Projective Relativity in De Sitter Universe 
The Projective Special Relativity (PSR) widens and contextualizes the relativistic results in De 
Sitter geometry.Just like in any physics there exists a wll-defined connection between mechanics 
and geometry. Therefore the PSR makes use of the notion of observer’s private space, redifining it 
on the basis of a constant curvature.  
In PSR it is introduced a space temporal double scale which connects a ( τχ , ) point of S4 with a 
(x,t) one of P4 by means the (1.3) projective invariant. Given a AB straight line and put as R and S 
the intersections with (1.3), the projective distance is given by the logarithm of the (ABRS) bi-ratio: 
(1.4)             ( ) ( ) ( ) ( ) ( )ASBRBSARtABRStAB ⋅⋅== log2log2 00 . 
From the (1.4) we obtain: 
    (2.4)     
rarctg=χ    and   
00 log
From the (2.4) second one, similar to the Milne’s formula, we can see that the “formal” 
singularities are related to the projective description which depicts a universe with infinite space 
and finite time, whereas the De Sitter one is with finite space and infinite time. It is important to 
underline that such equivalence between an “evolutionary” model and a “stationary” one, 
differently from what is often stated, is purely geometrical and has nothing to do with the physical 
processes, but it deals with the cosmological observer definition.We will speak again about such 
fundamental point further. 
The addition of durations’ new law: 
(3.4)                            
1 tdd
it is obtained by the (10.3) formulae and finds its physical meaning in the appearing of the new 
crt =0 , interpretable as the “universe age” for any 
4P observer family. 
Let us consider a uniform motion with U velocity, given by '' Utx = , by means of Fantappié 
transformations we have a uniform motion with W  velocity given by: 
(4.4)               
( ) ( ) ( )
cUVcVU
−+−++
For the visible universe of the O  observer, inside the light-cone, it is valid the condition 
γα ±=  and a=1 , and the (4.4) can be simplified as: 
(5.4)                
( )( )
cVcUcVU
For V = c then W=c, according to RR, while for U=c we have: 
(6.4)               ( ) ( ) ccVcVccW ≠+−±= 112 2α . 
The  (6.4) expresses the possibility of observing hyper-c velocity in PSR. The outcome is less 
strange than it can seem at first sight, because now the space-time of an observer is defined not only 
by the c constant but also by r, and the light-cone is at variable aperture. In straighter physical terms 
it means that when we observe a far universe region of the crt =0  order,  the cosmic objects’ 
velocity appears to be superior to c value, even if the region belongs to the light-cone of the 
observer’s past. For b=0 we obtain the angular coefficients of the tangents to the (1.3) Cayley-Klein 
invariant starting from a P point of the Beltrami-Castelnuovo projection, which represent the two 
light-cone’s straight lines. Differently from RR, here the light-cone’s angle is not constant and 
depends on the P point according to the formula: 
(7.4)             ( )222 γαϑ += atg . 
From the (7.4) derives the C variation of the light velocity with time: 
(8.4)          
21 γ−
C , with 
from which follows that ∞→C in the  two 0t±  singularities  which fix the limit duration according 
to the addition of durations’ new law (3.4).  
Another remarkable consequence of the projective group is the expansion-collapse law, that is 
the connection between the two singularities. Differentiating the (10.3) and dividing them we obtain 
the  velocities’ variation law for a translation in time: 
 (9.4)         ( ) 002' 11 txttVV γγγ −+=− . 
For 1=γ  and 00 tT =  we have the law of projective expansion valid for 00 <<− tt : 
(10.4)           
= , or also 
If ( )00 == tγ , we can write 
(11.4)              HxtxV == 0 , ( )αβ = , 
where 01 trcH == is the well-known Hubble constant. 
The analogous procedure will be followed for the law of projective collapse valid for 00 tt << , 
with 1−=γ  and 00 tT −= : 
(12.4)               
= , or 
We note that in singularities the expansion-collapse velocity becomes infinite. In PSR such 
process, differently from GR, is not connected to gravitation, but derives from Beltrami-
Castelnuovo geometry. 
From the Fantappié group it also follows a new formula for the Doppler effect: 
(13.4)              ( ) ( ) 2' 11 αββωω ++−= , 
where ω is the frequency. For 1=β , which is V=c, we get nothing but the traditional 
proportionality between distance and frequency, αωω =' . For V=0 there follows a Doppler effect 
depending on distance: 
(14.4)               2' 1 αωω += . 
The z red-shift is defined by '1 ωω=+ z  and the (13.4) becomes: 
(15.4)                ( ) ( ) ( ) 21111 αββ ++−=+ z , 
which was historically introduced- in a 1930 Accademia dei Lincei famous memoir- by 
Castelnuovo to explain the “new” Hubble observations on galactic red-shift. If  we are placed on the 
observer’s light-cone where the (12.4) becomes )1( ααβ −= , the (15.4) will be: 
(16.4)                    ( )α−=+ 111 z . 
The red-shift tends towards infinity for x = r, and hyper- c  velocities are possible if z > 1. 
As everybody would naturally expect, modifying geometry implies, as well as in RR, a deep 
redefinition of mechanics. In PSR, the m mass of a body varies with velocity and distance according 
(17.4)                   bamm 20= .    
From the (17.4) it follows that for a = 0, in singularities, the mass is null, while on the light-
cone, for b = 0, ∞→m . The mass of a body at rest varies with t according to: 
(18.4)                     ( )20 1 γ−= mm , 
from which we deduce that at the initial and final instant, 1±=γ , the mass vanishes. 
Another greatly important outcome (Arcidiacono, 1977) is the relation between m mass and the    
J polar inertia momentum of a body: 
(19.4)                       2mrJ =  
A remarkable consequence is that the universe M mass varies with t: 
(20.4)                         ( ) ( )
MtM +−= γ , 
where M0 is the mass for 0=t , and J  the polar momentum with respect to the observer. 
So the overall picture for an inertial observer in a De Sitter Universe is that of a universe 
coming into existence in a singularity at –t0 time, expanding and collapsing at t0 time and where c 
light velocity is only locally constant. In the initial and final instants the light velocity is infinite and 
the global mass is zero while in the expansion-collapse time it varies according to (20.4). In the 
projective scenario the space flatness is linked to the observer geometry in a universe at constant 
curvature. All this is linked to the fact that in PSR the translations and rotations are indivisible. In 
the singularities there is no “breakdown” of the physical laws because the global space-time 
structure is univocally individuated by the group which is independent of the matter-energy 
distribution. In this case, the singularities in 4P  are – more properly- an horizon of events with a 
natural “cosmic censure” fixed by observers’ geometry. 
5. The Projective Gravitation 
The connection between the metric approach to Einstein gravitation and Fantappié-Arcidiacono 
group one is the aim of Projective General Relativity(PGR), which describes a universe globally at 
constant curvature and locally at variable curvature. It can be done by following the Cartan idea, 
where any  4V  Riemann manifold is associated with an infinite family of Euclidean, pseudo-
Euclidean, non-Euclidean spaces tangent to it in each of its P points. Those spaces’ geometry is 
individuated by a holonomy group. The Cartan connection law links the tangent spaces so as to 
obtain both the 4V  local characteristics (curvature and torsion) and the global ones (holonomy 
group). The GR holonomy group is the one at four dimension rotations, i.e. the Lorentz group. So 
we get a general method which builds a bridgeway up between differential geometry and group 
theory (Pessa, 1973; Arcidiacono, 1986) 
To make a PGR it is introduced the 5V  Riemann manifold which allows as holonomy group the 
De Sitter-Fantappié one, isomorphic to the 5S five-dimensional rotations’ group. The 5V  geometry 
is successively written in terms of Beltrami projective inducted metric for a  anholomonous 4V  
manifold at variable curvature. The Veblen projective connection: 
(1.5)              { }ABCABC =π = ( )BCSCSBBSCAS gggg ∂−∂+∂2
defines a projective translation law which let the field of the Q quadrics invariant in the tangent 
spaces, in each 4V  point, 0== BAAB xxgQ ,where ABg  are the coefficients of the five-dimensional 
metric, the Kx are the homogeneous projective coordinates, and (ABC)=0,1,..,4.From the (1.5) we 
build the projective torsion-curvature tensor: 
(2.5)             SBC
BCDR ππππππ −+∂−∂= . 
So the gravitation equations of Projective General Relativity are: 
(3.5)              ABABAB TRgR χ=− 2
with ABT energy-momentum tensor, and χ Einstein gravitational constant. The (2.5) tensor is 
projectively flat, i.e. when it vanishes we get the De Sitter space at constant curvature. The deep 
link between rotations and translations in 4S  naturally leads  the (3.5) to include the torsion, 
showing an interesting formal analogy with Einstein-Cartan- Sciama-Kibble spin-fluids  theory. The 
construction is analogous to the GR one, but in lieu of the relation between Riemann curvature and 
Minkowski s-t, we get here a curvature-torsion connected to the De Sitter-Fantappié holonomy  
group. It has to be noted that, in concordance with the equivalence principle, the PGR gives a metric 
description of the local gravity, valid for single( i.e., non cosmological) systems. 
It is here proposed again the problem of the relations between local physics and its extension on 
cosmic scale. In fact, if we take the starting expression of standard cosmology based upon GR, i.e. 
let us consider the whole matter of Universe, and transfer it within the ambit of PGR, we can ask 
ourselves if the torsion role, associated to the rotation one, could get a feed-back on the background 
metric, modifying it deeply. Generally, the syntax of a purely group-based theory does not get the 
tools to give an answer, because it is independent from gravity and the hypotheses on ABT . For 
example, Snyder (Snyder, 1947) showed that in a De Sitter space it is introduced an uncertainty 
relation linked to a curvature of the kind: 21 rxx ki ≈∆∆ . Only a third quantization formalism, able 
to take into account the dynamical two-way  inter-relations between local and global, will succeed 
in giving an answer. 
The essential point we have to underline here is that the introduction of a cosmological constant, 
both as additional hypothesis on Einstein equations or via group, is a radical alternative to the 
“machian philosophy” of the GR. 
So, for a Universe without metter-fields we assume the constant curvature as a sort of “pre-
matter” which describes in topological terms the most general conditions for the quantum vacuum. 
Therefore the Einstein equations in the following form are valid: 
(4.5)           ABAB gG Λ=  and ( ) ABAB gRR Λ−= 2 , 
with their essentially physical content, i.e. the deep connection among curvature, radius and matter-
energy’s density vacρ  by means of the cosmological constant: 
       (5.5)            
vac π
6. De Sitter Observers, Singularities and Wick Rotations 
From a quantum viewpoint the 4S  interesting aspect is that it is at imaginary cyclic time and 
without singularities. It means that it is impossible to define on De Sitter a global temporal 
coordinate. So it has an istanton feature, individuated by its Euler topological number which is 2 
(Rajaraman,1982). This leads to a series of formal analogies both with black holes’ quantum 
physics and the theoretical proposals for the “cure” for singularities. 
Let us consider the De Sitter-Castelnuovo metric in real time: 
(1.6)             222
2 11 Ω+��
drdrr
ds , 
where 2222 sin ϕϑϑ ddd +=Ω in polar coordinates. 
As we have seen in PSR, the singularity in Hcr = becomes an horizon of events for any 
observer when it passes to the Euclidean metric with it−→τ : 
(2.6)           ( )222
22 sincos
Ω++= rddrH
dds ττ , 
with a close analogy with the Schwarzschild solution’s case. The τ period is Hπβ 2= ; for the 
observers in De Sitter it implies the possibility to define a temperature, an entropy and an area of 
the horizon, respectively given by: 
(3.6)     1
−== β
Tb ; π
From the (3.6) we get the following fundamental outcome: 
(4.6)        AS
which is the well-known expression of the t’Hooft-Susskind-Bekenstein Holographic 
Principle(Susskind,1995). The (4.6) connects the non-existence of a global temporal coordinate 
with the information accessible to any observer in the De Sitter model. In this way we obtain a deep 
physical explanation for applying  the Weyl Principle in the De Sitter Universe, and sum up that in 
cosmology, as well as in QM, a physical system cannot be fully specified without defining an 
observer. G. Arcidiacono stated that the hyper-spherical Universe is like a book written with seven 
seals ( Apocalypse, 6-11), and consequently two operations are necessary to investigate its physics: 
1) inverse Wick rotation and 2) Beltrami-Castelnuovo representation. That’s the way we can 
completely define a relativity in  De Sitter. 
The association of imaginary time with temperature gets a remarkable physical significance 
which implies some  considerations on the statistical partition function (Hawking, 1975). For our 
aims it will be sufficient to say that such temperature is linked to the (4.6) relation, i.e. to the 
information that an observer spent within his area of events. Which thing has patent implications 
from the dynamical viewpoint, because it is the same as to state that, as well as in Schwarzschild 
black hole’ s case, the De Sitter space and the quantum field defined on it behave as if they were 
immersed in background fluctuations. The transition amplitude from a configuration of a φ  generic 
field  in dttt =− 12  time will be given by the 
iHdte−  matrix element which acts as a ( )1U  group 
transformation of the ( ) ( )timespace UU 11 ⇔ . It means that a transition amplitude on 4S  will appear to 
an observer as the ( )tR scale factor’s variation with H variation rate. 
It makes possible to link the hyper-spherical description with the Big-Bang evolutionary 
scenario and to get rid of the thermodinamic ambiguities which characterize its “beginning” and 
“ending” notions. The last ones have to be re-interpretated as purely quantum dynamics of the 
matter-fields on the hyper-sphere free of singularities.  
7. Physical Considerations for Further Developments 
Such considerations suggest a research program we are going here to shortly delineate ; it 
furthermore develops the analogy between black holes, istantons and De Sitter Universes (see – for 
example – Frolov, Markov, Mukhanov,  1989;Strominger, 1992). It is known that the Hartle-
Hawking proposal of “no-boundary” condition removes the initial singularity and allows to 
calculate the Universe wave function (Hartle-Hawking, 1989). In fact, it is possible – as in the usual 
QFT- to calculate the path integrals by using a Wick rotation as “Euclidization” procedure. In such 
way also the essential characteristics of the inflationary hypotheses are englobed (A. Borde, A. 
Guth and A. Vilenkin, 2003). The derived formalism is similar to that used in the ordinary QM for 
the tunnel effect, an analogy which should explain the physics at its bottom (Vilenkin, 1982; S.W. 
Hawking and I.G. Moss, 1982). 
The group extension method provides this procedure with a solid foundation, because the De 
Sitter space, maximally symmetric and simply connected, is univocally individuated by the group 
structure, and consequently is directly linked to the space-time homogeneity and isotropy principle 
with respect to physical laws. The original Hartle-Hawking formulation operates a mix of 
topologies hardly justified both on the formal level and the conceptual one. The “no-boundary” 
condition is only valid if we works with imaginary time, and the theory does not contain a strict 
logical procedure to explain the passage to real time. This corresponds to a quite vague attempt to 
conciliate an hyper-spherical description at imaginary time with an evolutive one at real time 
according to the traditional Big-Bang scenario.In fact, it has been observed that the Hartle-Hawking 
condition is the same as to substitute a singularity with a “nebulosity”.  
The spontaneous proposal, at this point, is considering the Hartle-Hawking conditions on 
primordial space-time as a consequence of a global charaterization of the hyper-sphere and directly 
developing quantum physics on 4S .Which thing does not contradict the quantum mechanics 
formulation and its fundamental spirit, which is to say the Feynman path integrals. In other words, 
quantum mechanics has not to be applied to cosmology for the Universe smallness at its beginning, 
but because each physical system – without exception- gets quantum histories with amplitude 
interferences. We point out that such view is in perfect consonance with the so-called quantum 
mechanics Many Worlds Interpretation ( Halliwell, 1994). The “by nothing creation” means that we 
cannot “look inside” an istanton (hyper-spherical space), but we have to recourse to an 
“evolutionary” description which separates space from time. The projective methods tell us how to 
do it. 
An analogous problem– to some extent – is that of the Weyl Tensor Hypothesis. Recently, 
Roger Penrose has suggested a condition on the initial singularity that, within the GR, ties entropy 
and gravity and makes a time arrow emerge (Penrose,1989). It is known that the ABCDW  Weyl 
conformal  tensor describes the freedom degrees of the gravitational field. The Penrose Hypothesis 
is that 0→ABCDW in the Big-Bang, while ∞→ABCDW  in the Big-Crunch. The physical reason is 
that in the Universe’s initial state we have an highly uniform matter distribution at low entropy 
( entalpic order), while in Big-Crunch, just like a black hole, we have an high entropy situation. 
This differentiates the two singularities and provides a time arrow. In an hyper-spherical Universe 
there is no “beginning” and “ending”, but only quantum transitions.Consequently, the Penrose 
Hypothesis can only be implemented in terms of projective representation within the ambit of PGR. 
Finally, we can take into consideration the possibility to build a Quantum Field Theory on 4S . 
A QFT, for T tending towards zero, is a limit case of a theory describing some physical fields 
interacting with an external environment at T temperature. Without this external environment we 
could not speak of dechoerence , could not introduce concepts such as like dissipation, chaos, noise 
and, obviously, the possibility to describe phase transitions would vanish too. Therefore, it is of 
paramount importance to write a QFT on De Sitter background metric and then studying it in 
projective representation. If we admit decoherence processes on 4S , it is possible to interpret the 
Weyl Principle as a form of Anthropic Principle: the “classical”  and observable Universes are the 
ones where it can be operated a description at real time. 
In conclusion, it is possible to delineate an alternative, but not incompatible with traditional 
cosmology scenario.The Universe is the quantum configuration of the quantum fields on 4S .Thus 
developing a Quantum Cosmology coincides with developing a Quantum Field Theory on a space 
free of singularities.The Big-Bang is a by vacuum nucleation in an hyper-spherical background at 
imaginary time, and so the concepts of “beginning”, “expansion” and “ending” belong to the space-
time foreground and gain their meaning only by means of a suitable representation which defines a 
family of cosmological observers. 
Acknowledgements: 
I owe my knowledge  of the group extension method to the regretted Prof. G. Arcidiacono 
(1927 – 1998), during our intense discussions while strolling throughout Rome.  
Special thanks to my friends E. Pessa and L. Chiatti for the rich exchange of viewpoints and e-
mails. 
 References 
Arcidiacono, G.(1956),  Rend. Accad. Lincei,20, 4 
Arcidiacono, G. (1976), Gen. Rel. And Grav., 7, 885 
Arcidiacono, G.(1977), Gen. Rel. And Grav.,7, 865 
Arcidiacono, G.(1984), in De Sabbata,V. & T.M.Karade (eds), Relativistic Astrophysics and      
Cosmology, World Scientific, Singapore 
Arcidiacono, G. (1986), Projective Relativity,Cosmology and Gravitation, 
Hadr.Press,Cambridge,USA 
Borde, A. , Guth,A., Vilenkin, A. (2003),Phys. Rev. Lett. 90 
Ellis, G.F.R. & Wiliams,R. (1988), Flat and Curved Space-Times, Clarendon Press 
Fantappié, L. (1954), Rend. Accad. Lincei,17,5 
Fantappié, L. (1959), Collectanea Mathematica,XI, 77 
Frolov, V.P., Markov,M.A., Mukhanov,V.F.(1989), Phys.Lett.,B216,272 
Halliwell,J.J. (1994), in Greenberger,D. & Zeilinger,A.(eds), Fundamental Problems in 
Quantum Theory, New York Academy of Sciences,NY 
Hartle, J.B. & Hawking,S.W. (1983), Phys.Rev.D,28,12 
Hawking,S.W. & Ellis, G.F.R.,(1973) The Large Scale Structure of Space-Time,Cambridge 
Univ.Press 
Hawking,S.W. (1975), Commun.Math.Phys.,43 
Hawking, S.W. & Moss, I.G. (1982), Phys.Lett.,B110,35 
Pessa,E. (1973), Collectanea Mathematica,XXIV,2 
Rajaraman,R.(1982), Solitons and Istantons,North-Holland Publ.,NY 
Snyder, H.S. (1947), Phys.Rev., 51,38 
Strominger, A. (1992), Phys.Rev.D,46,10 
Susskind, L. (1995), Jour.Math.Phys.,36 
Vilenkin, A. (1982), Phys.Lett.,117B,1.
ABSTRACT
  In the last years the traditional scenario of Big Bang has been deeply
modified by the study of the quantum features of the Universe evolution,
proposing again the problem of using local physical laws on cosmic scale, with
particular regard to the cosmological constant role. The group extention method
shows that the De Sitter group univocally generalizes the Poincare group,
formally justifies the cosmological constant use and suggests a new
interpretation for Hartle-Hawking boundary conditions in Quantum Cosmology.

<|endoftext|><|startoftext|>
Introduction
The spectral action introduced by Chamseddine–Connes plays an important role [3] in noncom-
mutative geometry. More precisely, given a spectral triple (A,H,D) where A is an algebra acting
on the Hilbert space H and D is a Dirac-like operator (see [8, 23]), they proposed a physical
action depending only on the spectrum of the covariant Dirac operator
DA := D +A+ ǫ JAJ−1 (1)
where A is a one-form represented on H, so has the decomposition
ai[D, bi], (2)
with ai, bi ∈ A, J is a real structure on the triple corresponding to charge conjugation and
ǫ ∈ { 1,−1 } depending on the dimension of this triple and comes from the commutation relation
JD = ǫDJ. (3)
This action is defined by
S(DA,Φ,Λ) := Tr
Φ(DA/Λ)
where Φ is any even positive cut-off function which could be replaced by a step function up
to some mathematical difficulties investigated in [16]. This means that S counts the spectral
values of |DA| less than the mass scale Λ (note that the resolvent of DA is compact since, by
assumption, the same is true for D, see Lemma 3.1 below).
In [18], the spectral action on NC-tori has been computed only for operators of the form D+A
and computed for DA in [20]. It appears that the implementation of the real structure via J ,
does change the spectral action, up to a coefficient when the torus has dimension 4. Here we
prove that this can be also directly obtained from the Chamseddine–Connes analysis of [4] that
we follow quite closely. Actually,
S(DA,Φ,Λ) =
0<k∈Sd+
− |DA|−k +Φ(0) ζDA(0) +O(Λ−1) (5)
where DA = DA + PA, PA the projection on KerDA, Φk = 12
Φ(t) tk/2−1 dt and Sd+ is the
strictly positive part of the dimension spectrum of (A,H,D). As we will see, Sd+ = { 1, 2, · · · , n }
|DA|−n =
|D|−n. Moreover, the coefficient ζDA(0) related to the constant term in (5)
can be computed from the unperturbed spectral action since it has been proved in [4] (with an
invertible Dirac operator and a 1-form A such that D +A is also invertible) that
ζD+A(0) − ζD(0) =
(−1)q
−(AD−1)q, (6)
using ζX(s) = Tr(|X|−s). We will see how this formula can be extended to the case a noninvert-
ible Dirac operator and noninvertible perturbation of the form D+ Ã where Ã := A+ εJAJ−1.
All this results on spectral action are quite important in physics, especially in quantum field
theory and particle physics, where one adds to the effective action some counterterms explicitly
given by (6), see for instance [2–5,17,18,20,22,28,35–38].
Since the computation of zeta functions is crucial here, we investigate in section 2 residues of
series and integrals. This section contains independent interesting results on the holomorphy
of series of holomorphic functions. In particular, the necessity of a Diophantine constraint is
naturally emphasized.
In section 3, we revisit the notions of pseudodifferential operators and their associated zeta
functions and of dimension spectrum. The reality operator J is incorporated and we pay a
particular attention to kernels of operators which can play a role in the constant term of (5).
This section concerns general spectral triple with simple dimension spectrum.
Section 4 is devoted to the example of the noncommutative torus. It is shown that it has a
vanishing tadpole.
In section 5, all previous technical points are then widely used for the computation of terms in
(5) or (6).
Finally, the spectral action (6) is obtained in section 6 and we conjecture that the noncom-
mutative spectral action of DA has terms proportional to the spectral action of D + A on the
commutative torus.
2 Residues of series and integral, holomorphic continuation, etc
Notations:
In the following, the prime in
means that we omit terms with division by zero in the summand.
Bn (resp. Sn−1) is the closed ball (resp. the sphere) of Rn with center 0 and radius 1 and the
Lebesgue measure on Sn−1 will be noted dS.
For any x = (x1, . . . , xn) ∈ Rn we denote by |x| =
x21 + · · ·+ x2n the euclidean norm and
|x|1 := |x1|+ · · · + |xn|.
N = {1, 2, . . . } is the set of positive integers and N0 = N ∪ {0} the set of non negative integers.
By f(x, y) ≪y g(x) uniformly in x, we mean that |f(x, y)| ≤ a(y) |g(x)| for all x and y for some
a(y) > 0.
2.1 Residues of series and integral
In order to be able to compute later the residues of certain series, we prove here the following
Theorem 2.1. Let P (X) =
j=0 Pj(X) ∈ C[X1, · · · ,Xn] be a polynomial function where Pj is
the homogeneous part of P of degree j. The function
ζP (s) :=
P (k)
, s ∈ C
has a meromorphic continuation to the whole complex plane C.
Moreover ζP (s) is not entire if and only if PP := {j :
u∈Sn−1
Pj(u) dS(u) 6= 0} 6= ∅. In that
case, ζP has only simple poles at the points j + n, j ∈ PP , with
s=j+n
ζP (s) =
u∈Sn−1
Pj(u) dS(u).
The proof of this theorem is based on the following lemmas.
Lemma 2.2. For any polynomial P ∈ C[X1, . . . ,Xn] of total degree δ(P ) :=
i=1 degXiP and
any α ∈ Nn0 , we have
P (x)|x|−s
≪P,α,n (1 + |s|)|α|1 |x|−σ−|α|1+δ(P )
uniformly in x ∈ Rn verifying |x| ≥ 1, where σ = ℜ(s).
Proof. By linearity, we may assume without loss of generality that P (X) = Xγ is a monomial.
It is easy to prove (for example by induction on |α|1) that for all α ∈ Nn0 and x ∈ Rn \ {0}:
|x|−s
β,µ∈Nn0
β+2µ=α
|β|1+|µ|1
) (|β|1+|µ|1)!
β! µ!
|x|σ+2(|β|1+|µ|1)
It follows that for all α ∈ Nn0 , we have uniformly in x ∈ Rn verifying |x| ≥ 1:
|x|−s
≪α,n (1 + |s|)|α|1 |x|−σ−|α|1 . (7)
By Leibniz formula and (7), we have uniformly in x ∈ Rn verifying |x| ≥ 1:
xγ |x|−s
∂β(xγ) ∂α−β
|x|−s
≪γ,α,n
β≤α;β≤γ
xγ−β (1 + |s|)|α|1−|β|1 |x|−σ−|α|1+|β|1
≪γ,α,n (1 + |s|)|α|1 |x|−σ−|α|1+|γ|1 .
Lemma 2.3. Let P ∈ C[X1, . . . ,Xn] be a polynomial of degree d. Then, the difference
∆P (s) :=
P (k)
Rn\Bn
P (x)
which is defined for ℜ(s) > d+ n, extends holomorphically on the whole complex plane C.
Proof. We fix in the sequel a function ψ ∈ C∞(Rn,R) verifying for all x ∈ Rn
0 ≤ ψ(x) ≤ 1, ψ(x) = 1 if |x| ≥ 1 and ψ(x) = 0 if |x| ≤ 1/2.
The function f(x, s) := ψ(x) P (x) |x|−s, x ∈ Rn and s ∈ C, is in C∞(Rn × C) and depends
holomorphically on s.
Lemma 2.2 above shows that f is a “gauged symbol” in the terminology of [24, p. 4]. Thus
[24, Theorem 2.1] implies that ∆P (s) extends holomorphically on the whole complex plane C.
However, to be complete, we will give here a short proof of Lemma 2.3:
It follows from the classical Euler–Maclaurin formula that for any function h : R → C of class
CN+1 verifying lim|t|→+∞ h(k)(t) = 0 and
|h(k)(t)| dt < +∞ for any k = 0 . . . , N + 1, that we
have ∑
h(k) =
h(t) +
(−1)N
(N+1)!
BN+1(t) h
(N+1)(t) dt
where BN+1 is the Bernoulli function of order N + 1 (it is a bounded periodic function.)
Fix m′ ∈ Zn−1 and s ∈ C. Applying this to the function h(t) := ψ(m′, t) P (m′, t) |(m′, t)|−s (we
use Lemma 2.2 to verify hypothesis), we obtain that for any N ∈ N0:
ψ(m′,mn) P (m
′,mn) |(m′,mn)|−s =
ψ(m′, t) P (m′, t) |(m′, t)|−s dt+RN (m′; s) (8)
where RN (m′; s) := (−1)
(N+1)!
BN+1(t)
N+1 (ψ(m
′, t) P (m′, t) |(m′, t)|−s) dt.
By Lemma 2.2,
∣∣∣BN+1(t) ∂
ψ(m′, t) P (m′, t) |(m′, t)|−s
) ∣∣∣ dt ≪P,n,N (1 + |s|)N+1 (|m′|+ 1)−σ−N+δ(P ).
m′∈Zn−1 RN (m′; s) converges absolutely and define a holomorphic function in the half
plane {σ = ℜ(s) > δ(P ) + n−N}.
Since N is an arbitrary integer, by letting N → ∞ and using (8) above, we conclude that:
(m′,mn)∈Zn−1×Z
ψ(m′,mn) P (m
′,mn) |(m′,mn)|−s−
m′∈Zn−1
ψ(m′, t) P (m′, t) |(m′, t)|−s dt
has a holomorphic continuation to the whole complex plane C.
After n iterations, we obtain that
ψ(m) P (m) |m|−s −
ψ(x) P (x) |x|−s dx
has a holomorphic continuation to the whole C.
To finish the proof of Lemma 2.3, it is enough to notice that:
• ψ(0) = 0 and ψ(m) = 1, ∀m ∈ Zn \ {0};
• s 7→
ψ(x) P (x) |x|−s dx =
{x∈Rn:1/2≤|x|≤1}
ψ(x) P (x) |x|−s dx is a holomorphic
function on C.
Proof of Theorem 2.1. Using the polar decomposition of the volume form dx = ρn−1 dρ dS in
Rn, we get for ℜ(s) > d+ n,
Rn\Bn
Pj(x)
ρj+n−1
Pj(u) dS(u) =
j+n−s
Pj(u) dS(u).
Lemma 2.3 now gives the result.
2.2 Holomorphy of certain series
Before stating the main result of this section, we give first in the following some preliminaries
from Diophantine approximation theory:
Definition 2.4. (i) Let δ > 0. A vector a ∈ Rn is said to be δ−diophantine if there exists c > 0
such that |q.a−m| ≥ c |q|−δ, ∀q ∈ Zn \ { 0 } and ∀m ∈ Z.
We note BV(δ) the set of δ−diophantine vectors and BV := ∪δ>0BV(δ) the set of diophantine
vectors.
(ii) A matrix Θ ∈ Mn(R) (real n × n matrices) will be said to be diophantine if there exists
u ∈ Zn such that tΘ(u) is a diophantine vector of Rn.
Remark. A classical result from Diophantine approximation asserts that for all δ > n, the
Lebesgue measure of Rn \ BV(δ) is zero (i.e almost any element of Rn is δ−diophantine.)
Let Θ ∈ Mn(R). If its row of index i is a diophantine vector of Rn (i.e. if Li ∈ BV) then
tΘ(ei) ∈ BV and thus Θ is a diophantine matrix. It follows that almost any matrix of Mn(R) ≈
is diophantine.
The goal of this section is to show the following
Theorem 2.5. Let P ∈ C[X1, · · · ,Xn] be a homogeneous polynomial of degree d and let b be in
S(Zn × · · · × Zn) (q times, q ∈ N). Then,
(i) Let a ∈ Rn. We define fa(s) :=
P (k)
e2πik.a.
1. If a ∈ Zn, then fa has a meromorphic continuation to the whole complex plane C.
Moreover if S is the unit sphere and dS its Lebesgue measure, then fa is not entire if and only
u∈Sn−1
P (u) dS(u) 6= 0. In that case, fa has only a simple pole at the point d + n, with
s=d+n
fa(s) =
u∈Sn−1
P (u) dS(u).
2. If a ∈ Rn \ Zn, then fa(s) extends holomorphically to the whole complex plane C.
(ii) Suppose that Θ ∈ Mn(R) is diophantine. For any (εi)i ∈ {−1, 0, 1}q , the function
g(s) :=
l∈(Zn)q
b(l) fΘ
i εili
extends meromorphically to the whole complex plane C with only one possible pole on s = d+n.
Moreover, if we set Z := {l ∈ (Zn)q :
i=1 εili = 0} and V :=
l∈Z b(l), then
1. If V
P (u) dS(u) 6= 0, then s = d+ n is a simple pole of g(s) and
s=d+n
g(s) = V
u∈Sn−1
P (u) dS(u).
2. If V
P (u) dS(u) = 0, then g(s) extends holomorphically to the whole complex plane C.
(iii) Suppose that Θ ∈ Mn(R) is diophantine. For any (εi)i ∈ {−1, 0, 1}q , the function
g0(s) :=
l∈(Zn)q\Z
b(l) fΘ
i=1 εili
where Z := {l ∈ (Zn)q :
i=1 εili = 0} extends holomorphically to the whole complex plane C.
Proof of Theorem 2.5: First we remark that
If a ∈ Zn then fa(s) =
P (k)
. So, the point (i.1) follows from Theorem 2.1;
g(s) :=
l∈(Zn)q\Z b(l) fΘ
i εili
(s) +
l∈Z b(l)
P (k)
. Thus, the point (ii) rises
easily from (iii) and Theorem 2.1.
So, to complete the proof, it remains to prove the items (i.2) and (iii).
The direct proof of (i.2) is easy but is not sufficient to deduce (iii) of which the proof is more
delicate and requires a more precise (i.e. more effective) version of (i.2). The next lemma gives
such crucial version, but before, let us give some notations:
F := { P (X)
(X21+···+X
r/2 : P (X) ∈ C[X1, . . . ,Xn] and r ∈ N0}.
We set g =deg(G) =deg(P )− r ∈ Z, the degree of G = P (X)
(X21+···+X
r/2 ∈ F .
By convention we set deg(0) = −∞.
Lemma 2.6. Let a ∈ Rn. We assume that d (a.u,Z) := infm∈Z |a.u−m| > 0 for some u ∈ Zn.
For all G ∈ F , we define formally,
F0(G; a; s) :=
e2πi k.a and F1(G; a; s) :=
(|k|2+1)s/2
e2πi k.a.
Then for all N ∈ N, all G ∈ F and all i ∈ {0, 1}, there exist positive constants Ci := Ci(G,N, u),
Bi := Bi(G,N, u) and Ai := Ai(G,N, u) such that s 7→ Fi(G;α; s) extends holomorphically to
the half-plane {ℜ(s) > −N} and verifies in it:
Fi(G; a; s) ≤ Ci(1 + |s|)Bi
d (a.u,Z)
Remark 2.7. The important point here is that we obtain an explicit bound of Fi(G;α; s) in
{ℜ(s) > −N} which depends on the vector a only through d(a.u,Z), so depends on u and
indirectly on a (in the sequel, a will vary.) In particular the constants Ci := Ci(G,N, u),
Bi = Bi(G,N) and Ai := Ai(G,N) do not depend on the vector a but only on u. This is crucial
for the proof of items (ii) and (iii) of Theorem 2.5!
2.2.1 Proof of Lemma 2.6 for i = 1:
Let N ∈ N0 be a fixed integer, and set g0 := n+N + 1.
We will prove Lemma 2.6 by induction on g =deg(G) ∈ Z. More precisely, in order to prove
case i = 1, it suffices to prove that:
Lemma 2.6 is true for all G ∈ F verifying deg(G) ≤ −g0.
Let g ∈ Z with g ≥ −g0+1. If Lemma 2.6 is true for all G ∈ F such that deg(G) ≤ g−1,
then it is also true for all G ∈ F satisfying deg(G) = g.
• Step 1: Checking Lemma 2.6 for deg(G) ≤ −g0 := −(n+N + 1).
Let G(X) =
P (X)
(X21+···+X
r/2 ∈ F verifying deg(G) ≤ −g0. It is easy to see that we have
uniformly in s = σ + iτ ∈ C and in k ∈ Zn:
|G(k) e2πi k.a|
(|k|2+1)σ/2
|P (k)|
(|k|2+1)(r+σ)/2
≪G 1(|k|2+1)(r+σ−deg(P ))/2 ≪G
(|k|2+1)(σ−deg(G))/2
≪G 1(|k|2+1)(σ+g0)/2 .
It follows that F1(G; a; s) =
(|k|2+1)s/2
e2πi k.a converges absolutely and defines a holo-
morphic function in the half plane {σ > −N}. Therefore, we have for any s ∈ {ℜ(s) > −N}:
|F1(G; a; s)| ≪G
(|k|2+1)(−N+g0)/2
(|k|2+1)(n+1)/2
≪G 1.
Thus, Lemma 2.6 is true when deg(G) ≤ −g0.
• Step 2: Induction.
Now let g ∈ Z satisfying g ≥ −g0+1 and suppose that Lemma 2.6 is valid for all G ∈ F verifying
deg(G) ≤ g − 1. Let G ∈ F with deg(G) = g. We will prove that G also verifies conclusions of
Lemma 2.6:
There exist P ∈ C[X1, . . . ,Xn] of degree d ≥ 0 and r ∈ N0 such that G(X) = P (X)(X21+···+X2n+1)r/2
and g =deg(G) = d− r.
Since G(k) ≪ (|k|2 +1)g/2 uniformly in k ∈ Zn, we deduce that F1(G; a; s) converges absolutely
in {σ = ℜ(s) > n+ g}.
Since k 7→ k + u is a bijection from Zn into Zn, it follows that we also have for ℜ(s) > n+ g
F1(G; a; s) =
P (k)
(|k|2+1)(s+r)/2
e2πi k.a =
P (k+u)
(|k+u|2+1)(s+r)/2
e2πi (k+u).a
= e2πi u.a
P (k+u)
(|k|2+2k.u+|u|2+1)(s+r)/2
e2πi k.a
= e2πi u.a
α∈Nn0 ;|α|1=α1+···+αn≤d
∂αP (k)
(|k|2+2k.u+|u|2+1)(s+r)/2
e2πi k.a
= e2πi u.a
|α|1≤d
∂αP (k)
(|k|2+1)(s+r)/2
2k.u+|u|2
(|k|2+1)
)−(s+r)/2
e2πi k.a.
Let M := sup(N + n+ g, 0) ∈ N0. We have uniformly in k ∈ Zn
2k.u+|u|2
(|k|2+1)
)−(s+r)/2
−(s+r)/2
)(2k.u+|u|2)j
(|k|2+1)j
+OM,u
( (1+|s|)M+1
(|k|2+1)(M+1)/2
Thus, for σ = ℜ(s) > n+ d,
F1(G; a; s) = e
2πi u.a
|α|1≤d
∂αP (k)
(|k|2+1)(s+r)/2
2k.u+|u|2
(|k|2+1)
)−(s+r)/2
e2πi k.a
= e2πi u.a
|α|1≤d
−(s+r)/2
∂αP (k)(2k.u+|u|2)
(|k|2+1)(s+r+2j)/2
e2πi k.a
+OG,M,u
(1 + |s|)M+1
(|k|2+1)(σ+M+1−g)/2
. (9)
Set I := {(α, j) ∈ Nn0 × {0, . . . ,M} | |α|1 ≤ d} and I∗ := I \ { (0, 0) }.
Set also G(α,j);u(X) :=
∂αP (X)(2X.u+|u|2)
(|X|2+1)(r+2j)/2
∈ F for all (α, j) ∈ I∗.
Since M ≥ N + n+ g, it follows from (9) that
(1 − e2πi u.a) F1(G; a; s) = e2πi u.a
(α,j)∈I∗
−(s+r)/2
G(α,j);u;α; s
+RN (G; a;u; s) (10)
where s 7→ RN (G; a;u; s) is a holomorphic function in the half plane {σ = ℜ(s) > −N}, in
which it satisfies the bound RN (G; a;u; s) ≪G,N,u 1.
Moreover it is easy to see that, for any (α, j) ∈ I∗,
G(α,j);u
= deg(∂αP ) + j − (r + 2j) ≤ d− |α|1 + j − (r + 2j) = g − |α|1 − j ≤ g − 1.
Relation (10) and the induction hypothesis imply then that
(1− e2πi u.a) F1(G; a; s) verifies the conclusions of Lemma 2.6. (11)
Since |1− e2πi u.a| = 2| sin(πu.a)| ≥ d (u.a,Z), then (11) implies that F1(G; a; s) satisfies conclu-
sions of Lemma 2.6. This completes the induction and the proof for i = 1.
2.2.2 Proof of Lemma 2.6 for i = 0:
Let N ∈ N be a fixed integer. Let G(X) = P (X)
(X21+···+X
r/2 ∈ F and g = deg(G) = d− r where
d ≥ 0 is the degree of the polynomial P . Set also M := sup(N + g + n, 0) ∈ N0.
Since P (k) ≪ |k|d for k ∈ Zn\{ 0 }, it follows that F0(G; a; s) and F1(G; a; s) converge absolutely
in the half plane {σ = ℜ(s) > n+ g}.
Moreover, we have for s = σ + iτ ∈ C verifying σ > n+ g:
F0(G; a; s) =
k∈Zn\{ 0 }
(|k|2+1−1)s/2
e2πi k.a =
′ G(k)
(|k|2+1)s/2
|k|2+1
)−s/2
e2πi k.a
(−1)j G(k)
(|k|2+1)(s+2j)/2
e2πi k.a
(1 + |s|)M+1
′ |G(k)|
(|k|2+1)(σ+2M+2)/2
(−1)jF1(G; a; s + 2j)
(1 + |s|)M+1
′ |G(k)|
(|k|2+1)(σ+2M+2)/2
. (12)
In addition we have uniformly in s = σ + iτ ∈ C verifying σ > −N ,
′ |G(k)|
(|k|2+1)(σ+2M+2)/2
′ |k|g
(|k|2+1)(−N+2M+2)/2
|k|n+1
< +∞.
So (12) and Lemma 2.6 for i = 1 imply that Lemma 2.6 is also true for i = 0. This completes
the proof of Lemma 2.6.
2.2.3 Proof of item (i.2) of Theorem 2.5:
Since a ∈ Rn \ Zn, there exists i0 ∈ {1, . . . , n} such that ai0 6∈ Z. In particular d(a.ei0 ,Z) =
d(ai0 ,Z) > 0. Therefore, a satisfies the assumption of Lemma 2.6 with u = ei0 . Thus, for all
N ∈ N, s 7→ fa(s) = F0(P ; a; s) has a holomorphic continuation to the half-plane {ℜ(s) > −N}.
It follows, by letting N → ∞, that s 7→ fa(s) has a holomorphic continuation to the whole
complex plane C.
2.2.4 Proof of item (iii) of Theorem 2.5:
Let Θ ∈ Mn(R), (εi)i ∈ {−1, 0, 1}q and b ∈ S(Zn × Zn). We assume that Θ is a diophantine
matrix. Set Z := { l = (l1, . . . , lq) ∈ (Zn)q :
i εili = 0 } and P ∈ C[X1, . . . ,Xn] of degree
d ≥ 0.
It is easy to see that for σ > n+ d:
l∈(Zn)q\Z
|b(l)|
′ |P (k)|
|e2πi k.Θ
i εili | ≪P
l∈(Zn)q\Z
|b(l)|
|k|σ−d
l∈(Zn)q\Z
|b(l)|
< +∞.
g0(s) :=
l∈(Zn)q\Z
b(l) fΘ
i εili
(s) =
l∈(Zn)q\Z
′ P (k)
e2πi k.Θ
i εili
converges absolutely in the half plane {ℜ(s) > n+ d}.
Moreover with the notations of Lemma 2.6, we have for all s = σ + iτ ∈ C verifying σ > n+ d:
g0(s) =
l∈(Zn)q\Z
b(l)fΘ
i εili
(s) =
l∈(Zn)q\Z
b(l)F0(P ; Θ
εili; s) (13)
But Θ is diophantine, so there exists u ∈ Zn and δ, c > 0 such
|q. tΘu−m| ≥ c (1 + |q|)−δ , ∀q ∈ Zn \ { 0 }, ∀m ∈ Z.
We deduce that ∀l ∈ (Zn)q \ Z,
.u−m| = |
.tΘu−m| ≥ c
1 + |
εili|
)−δ ≥ c (1 + |l|)−δ.
It follows that there exists u ∈ Zn, δ > 0 and c > 0 such that
∀l ∈ (Zn)q \ Z, d
εili).u;Z
≥ c (1 + |l|)−δ . (14)
Therefore, for any l ∈ (Zn)q \Z, the vector a = Θ
i εili verifies the assumption of Lemma 2.6
with the same u. Moreover δ and c in (14) are also independent on l.
We fix now N ∈ N. Lemma 2.6 implies that there exist positive constants C0 := C0(P,N, u),
B0 := Bi(P,N, u) and A0 := A0(P,N, u) such that for all l ∈ (Zn)q \ Z, s 7→ F0(P ; Θ
i εili; s)
extends holomorphically to the half plane {ℜ(s) > −N} and verifies in it the bound
F0(P ; Θ
εili; s) ≤ C0 (1 + |s|)B0 d
εili).u;Z
This and (14) imply that for any compact set K included in the half plane {ℜ(s) > −N},
there exist two constants C := C(P,N, c, δ, u,K) and D := D(P,N, c, δ, u) (independent on
l ∈ (Zn)q \ Z) such that
∀s ∈ K and ∀l ∈ (Zn)q \ Z, F0(P ; Θ
εili; s) ≤ C (1 + |l|)D . (15)
It follows that s 7→
l∈(Zn)q\Z b(l)F0(P ; Θ
iεili; s) has a holomorphic continuation to the half
plane {ℜ(s) > −N}.
This and ( 13) imply that s 7→ g0(s) =
l∈(Zn)q\Z b(l)fΘ
i εili
(s) has a holomorphic contin-
uation to {ℜ(s) > −N}. Since N is an arbitrary integer, by letting N → ∞, it follows that
s 7→ g0(s) has a holomorphic continuation to the whole complex plane C which completes the
proof of the theorem.
Remark 2.8. By equation (11), we see that a Diophantine condition is sufficient to get Lemma
2.6. Our Diophantine condition appears also (in equivalent form) in Connes [7, Prop. 49] (see
Remark 4.2 below). The following heuristic argument shows that our condition seems to be
necessary in order to get the result of Theorem 2.5:
For simplicity we assume n = 1 (but the argument extends easily to any n).
Let θ ∈ R \Q. We know (see this reflection formula in [15, p. 6]) that for any l ∈ Z \ {0},
gθl(s) :=
e2πiθlk
s−1/2
) hθl(1− s) where hθl(s) :=
|θl+k|s
So, for any (al) ∈ S(Z), the existence of meromorphic continuation of g0(s) :=
l∈Z al gθl(s) is
equivalent to the existence of meromorphic continuation of
h0(s) :=
al hθl(s) =
|θl+k|s
So, for at least one σ0 ∈ R, we must have |al||θl+k|σ0 = O(1) uniformly in k, l ∈ Z
It follows that for any (al) ∈ S(Z), |θl + k| ≫ |al|1/σ0 uniformly in k, l ∈ Z∗. Therefore, our
Diophantine condition seems to be necessary.
2.2.5 Commutation between sum and residue
Let p ∈ N. Recall that S((Zn)p) is the set of the Schwartz sequences on (Zn)p. In other words,
b ∈ S((Zn)p) if and only if for all r ∈ N0, (1 + |l1|2 + · · · |lp|2)r |b(l1, · · · , lp)|2 is bounded on
(Zn)p. We note that if Q ∈ R[X1, · · · ,Xnp] is a polynomial, (aj) ∈ S(Zn)p, b ∈ S(Zn) and φ
a real-valued function, then l := (l1, · · · , lp) 7→ ã(l) b(−l̂p)Q(l) eiφ(l) is a Schwartz sequence on
(Zn)p, where
ã(l) := a1(l1) · · · ap(lp),
l̂i := l1 + . . .+ li.
In the following, we will use several times the fact that for any (k, l) ∈ (Zn)2 such that k 6= 0
and k 6= −l, we have
|k + l|2 =
|k|2 −
2k.l + |l|2
|k|2|k + l|2 . (16)
Lemma 2.9. There exists a polynomial P ∈ R[X1, · · · ,Xp] of degree 4p and with positive
coefficients such that for any k ∈ Zn, and l := (l1, · · · , lp) ∈ (Zn)p such that k 6= 0 and k 6= −l̂i
for all 1 ≤ i ≤ p, the following holds:
|k + l̂1|2 . . . |k + l̂p|2
≤ 1|k|2p P (|l1|, · · · , |lp|).
Proof. Let’s fix i such that 1 ≤ i ≤ p. Using two times (16), Cauchy–Schwarz inequality and
the fact that |k + l̂i|2 ≥ 1, we get
|k+bli|2
2|k||bli|+|bli|
(2|k||bli|+|bli|
|k|4|k+bli|2
|l̂i|+
|l̂i|2 + 4|k|3 |l̂i|
3 + 1
|l̂i|4.
Since |k| ≥ 1, and |l̂i|j ≤ |l̂i|4 if 1 ≤ j ≤ 4, we find
|k+bli|2
|l̂i|j ≤ 5|k|2
1 + 4|l̂i|4
1 + 4(
|lj |)4
|k+bl1|2...|k+blp|2
|k|2p
1 + 4(
|lj |)4
Taking P (X1, · · · ,Xp) := 5p
1 + 4(
j=1Xj)
now gives the result.
Lemma 2.10. Let b ∈ S((Zn)p), p ∈ N, Pj ∈ R[X1, · · · ,Xn] be a homogeneous polynomial
function of degree j, k ∈ Zn, l := (l1, · · · , lp) ∈ (Zn)p, r ∈ N0, φ be a real-valued function on
Zn × (Zn)p and
h(s, k, l) :=
b(l)Pj(k) e
iφ(k,l)
|k|s+r|k + l̂1|2 · · · |k + l̂p|2
with h(s, k, l) := 0 if, for k 6= 0, one of the denominators is zero.
For all s ∈ C such that ℜ(s) > n+ j − r − 2p, the series
H(s) :=
(k,l)∈(Zn)p+1
h(s, k, l)
is absolutely summable. In particular,
l∈(Zn)p
h(s, k, l) =
l∈(Zn)p
h(s, k, l) .
Proof. Let s = σ + iτ ∈ C such that σ > n+ j − r − 2p. By Lemma 2.9 we get, for k 6= 0,
|h(s, k, l)| ≤ |b(l)Pj(k)| |k|−r−σ−2p P (l),
where P (l) := P (|l1|, · · · , |lp|) and P is a polynomial of degree 4p with positive coefficients.
Thus, |h(s, k, l)| ≤ F (l)G(k) where F (l) := |b(l)|P (l) and G(k) := |Pj(k)||k|−r−σ−2p. The
summability of
l∈(Zn)p F (l) is implied by the fact that b ∈ S((Zn)p). The summability of∑′
k∈ZnG(k) is a consequence of the fact that σ > n + j − r − 2p. Finally, as a product of two
summable series,
k,lF (l)G(k) is a summable series, which proves that
k,lh(s, k, l) is also
absolutely summable.
Definition 2.11. Let f be a function on D× (Zn)p where D is an open neighborhood of 0 in C.
We say that f satisfies (H1) if and only if there exists ρ > 0 such that
(i) for any l, s 7→ f(s, l) extends as a holomorphic function on Uρ, where Uρ is the open
disk of center 0 and radius ρ,
(ii) the series
l∈(Zn)p ‖H(·, l)‖∞,ρ is summable,where ‖H(·, l)‖∞,ρ := sups∈Uρ |H(s, l)|.
We say that f satisfies (H2) if and only if there exists ρ > 0 such that
(i) for any l, s 7→ f(s, l) extends as a holomorphic function on Uρ − {0},
(ii) for any δ such that 0 < δ < ρ, the series
l∈(Zn)p ‖H(·, l)‖∞,δ,ρ is summable, where
‖H(·, l)‖∞,δ,ρ := supδ<|s|<ρ |H(s, l)|.
Remark 2.12. Note that (H1) implies (H2). Moreover, if f satisfies (H1) (resp. (H2) for
ρ > 0, then it is straightforward to check that f : s 7→
l∈(Zn)p f(s, l) extends as an holomorphic
function on Uρ (resp. on Uρ \ { 0 }).
Corollary 2.13. With the same notations of Lemma 2.10, suppose that r + 2p − j > n, then,
the function H(s, l) :=
k∈Znh(s, k, l) satisfies (H1).
Proof. (i) Let’s fix ρ > 0 such that ρ < r + 2p − j − n. Since r + 2p − j > n, Uρ is inside
the half-plane of absolute convergence of the series defined by H(s, l). Thus, s 7→ H(s, l) is
holomorphic on Uρ.
(ii) Since
∣∣|k|−s
∣∣ ≤ |k|ρ for all s ∈ Uρ and k ∈ Zn \ { 0 }, we get as in the above proof
|h(s, k, l)| ≤ |b(l)Pj(k)| |k|−r+ρ−2p P (|l1|, · · · , |lp|).
Since ρ < r + 2p − j − n, the series
k∈Zn |Pj(k)||k|−r+ρ−2p is summable.
Thus, ‖H(·, l)‖∞,ρ ≤ K F (l) where K :=
′|Pj(k)||k|−r+ρ−2p <∞. We have already seen that
the series
l F (l) is summable, so we get the result.
We note that if f and g both satisfy (H1) (or (H2)), then so does f + g. In the following, we
will use the equivalence relation
f ∼ g ⇐⇒ f − g satisfies (H1).
Lemma 2.14. Let f and g be two functions on D × (Zn)p where D is an open neighborhood of
0 in C, such that f ∼ g and such that g satisfies (H2). Then
l∈(Zn)p
f(s, l) =
l∈(Zn)p
g(s, l) .
Proof. Since f ∼ g, f satisfies (H2) for a certain ρ > 0. Let’s fix η such that 0 < η < ρ and
define Cη as the circle of center 0 and radius η. We have
g(s, l) = Res
f(s, l) = 1
f(s, l) ds =
u(t, l)dt .
where I = [0, 2π] and u(t, l) := 1
ηeitf(η eit, l). The fact that f satisfies (H2) entails that the
series
l∈(Zn)p ‖f(·, l)‖∞,Cη is summable. Thus, since ‖u(·, l)‖∞ =
η ‖f(·, l)‖∞,Cη , the series∑
l∈(Zn)p ‖u(·, l)‖∞ is summable, so, as a consequence,
l∈(Zn)p u(t, l)dt =
l∈(Zn)p
u(t, l)dt
which gives the result.
2.3 Computation of residues of zeta functions
Since, we will have to compute residues of series, let us introduce the following
Definition 2.15.
ζ(s) :=
Zn(s) :=
|k|−s,
ζp1,...,pn(s) :=
1 · · · k
|k|s , for pi ∈ N,
where ζ(s) is the Riemann zeta function (see [25] or [14]).
By the symmetry k → −k, it is clear that these functions ζp1,...,pn all vanish for odd values of pi.
Let us now compute ζ0,··· ,0,1i,0··· ,0,1j ,0··· ,0(s) in terms of Zn(s):
Since ζ0,··· ,0,1i,0··· ,0,1j ,0··· ,0(s) = Ai(s) δij , exchanging the components ki and kj , we get
ζ0,··· ,0,1i,0··· ,0,1j ,0··· ,0(s) =
Zn(s− 2).
Similarly,
|k|s+8
n(n−1)
Zn(s+ 4)− 1n−1
|k|s+8
but it is difficult to write explicitly ζp1,...,pn(s) in terms of Zn(s− 4) and other Zn(s−m) when
at least four indices pi are non zero.
When all pi are even, ζp1,...,pn(s) is a nonzero series of fractions
P (k)
where P is a homogeneous
polynomial of degree p1 + · · ·+ pn. Theorem 2.1 now gives us the following
Proposition 2.16. ζp1,...,pn has a meromorphic extension to the whole plane with a unique pole
at n+ p1 + · · ·+ pn. This pole is simple and the residue at this pole is
s=n+p1+···+pn
ζp1,...,pn(s) = 2
)···Γ(
n+p1+···+pn
when all pi are even or this residue is zero otherwise.
In particular, for n = 2,
′ kikj
|k|s+4
= δij π , (18)
and for n = 4,
′ kikj
|k|s+6
= δij
′ kikjklkm
|k|s+8
= (δijδlm + δilδjm + δimδjl)
. (19)
Proof. Equation (17) follows from Theorem (2.1)
s=n+p1+···+pn
ζp1,...,pn(s) =
k∈Sn−1
1 · · · kpnn dS(k)
and standard formulae (see for instance [32, VIII,1;22]). Equation (18) is a straightforward
consequence of Equation (17). Equation (19) can be checked for the cases i = j 6= l = m and
i = j = l = m.
Note that Zn(s) is an Epstein zeta function associated to the quadratic form q(x) := x
1+...+x
so Zn satisfies the following functional equation
Zn(s) = π
s−n/2Γ(n/2− s/2)Γ(s/2)−1 Zn(n− s).
Since πs−n/2Γ(n/2−s/2) Γ(s/2)−1 = 0 for any negative even integer n and Zn(s) is meromorphic
on C with only one pole at s = n with residue 2πn/2Γ(n/2)−1 according to previous proposition,
so we get Zn(0) = −1. We have proved that
Zn(s+ n) = 2π
n/2 Γ(n/2)−1, (20)
Zn(0) = −1. (21)
2.4 Meromorphic continuation of a class of zeta functions
Let n, q ∈ N, q ≥ 2, and p = (p1, . . . , pq−1) ∈ Nq−10 .
Set I := {i | pi 6= 0} and assume that I 6= ∅ and
I := {α = (αi)i∈I | ∀i ∈ I αi = (αi,1, . . . , αi,pi) ∈ N
0 } =
We will use in the sequel also the following notations:
- for x = (x1, . . . , xt) ∈ Rt recall that |x|1 = |x1|+ · · ·+ |xt| and |x| =
x21 + · · ·+ x2t ;
- for all α = (αi)i∈I ∈ I =
i∈I N
|α|1 =
|αi|1 =
|αi,j| and
2.4.1 A family of polynomials
In this paragraph we define a family of polynomials which plays an important role later.
Consider first the variables:
- for X1, . . . ,Xn we set X = (X1, . . . ,Xn);
- for any i = 1, . . . , 2q, we consider the variables Yi,1, . . . , Yi,n and set Yi := (Yi,1, . . . , Yi,n) and
Y := (Y1, . . . , Y2q);
- for Y = (Y1, . . . , Y2q), we set for any 1 ≤ j ≤ q, Ỹj := Y1 + · · ·+ Yj + Yq+1 + · · ·+ Yq+j.
We define for all α = (αi)i∈I ∈ I =
i∈I N
0 the polynomial
Pα(X,Y ) :=
(2〈X, Ỹi〉+ |Ỹi|2)αi,j . (22)
It is clear that Pα(X,Y ) ∈ Z[X,Y ], degXPα ≤ |α|1 and degY Pα ≤ 2|α|1.
Let us fix a polynomial Q ∈ R[X1, · · · ,Xn] and note d := degQ. For α ∈ I, we want to expand
Pα(X,Y )Q(X) in homogeneous polynomials in X and Y so defining
L(α) := {β ∈ N(2q+1)n0 : |β|1 − dβ ≤ 2|α|1 and dβ ≤ |α|1 + d }
where dβ :=
1 βi, we set
Pα(X,Y )Q(X) =:
β∈L(α)
cα,βX
where cα,β ∈ R, Xβ := Xβ11 · · ·X
n and Y
β := Y
1,1 · · ·Y
β(2q+1)n
2q,n . By definition, X
β is a
homogeneous polynomial of degree in X equals to dβ . We note
Mα,β(Y ) := cα,β Y
2.4.2 Residues of a class of zeta functions
In this section we will prove the following result, used in Proposition 5.4 for the computation of
the spectrum dimension of the noncommutative torus:
Theorem 2.17. (i) Let 1
Θ be a diophantine matrix, and ã ∈ S
(Zn)2q
. Then
s 7→ f(s) :=
l∈[(Zn)q]2
|k + l̃i|pi |k|−sQ(k) eik.Θ
has a meromorphic continuation to the whole complex plane C with at most simple possible poles
at the points s = n+ d+ |p|1 −m where m ∈ N0.
(ii) Let m ∈ N0 and set I(m) := { (α, β) ∈ I × N(2q+1)n0 : β ∈ L(α) and m = 2|α|1 − dβ + d }.
Then I(m) is a finite set and s = n+ d+ |p|1 −m is a pole of f if and only if
C(f,m) :=
(α,β)∈I(m)
Mα,β(l)
u∈Sn−1
uβ dS(u) 6= 0,
with Z := {l :
1 lj = 0} and the convention
∅ = 0. In that case s = n + d + |p|1 −m is a
simple pole of residue Res
s=n+d+|p|1−m
f(s) = C(f,m).
In order to prove the theorem above we need the following
Lemma 2.18. For all N ∈ N we have
|k + l̃i|pi =
α=(αi)i∈I∈
i∈I{0,...,N}
) Pα(k,l)
|k|2|α|1−|p|1
+ON (|k||p|1−(N+1)/2)
uniformly in k ∈ Zn and l ∈ (Zn)2q verifying |k| > U(l) := 36 (
∑2q−1
i=1, i 6=q |li|)4.
Proof. For i = 1, . . . , q − 1, we have uniformly in k ∈ Zn and l ∈ (Zn)2q verifying |k| > U(l),
∣∣2〈k,eli〉+|eli|2
. (23)
In that case,
|k + l̃i| =
|k|2 + 2〈k, l̃i〉+ |l̃i|2
= |k|
2〈k,eli〉+|eli|
|k|2u−1
P iu(k, l)
where for all i = 1, . . . , q − 1 and for all u ∈ N0,
P iu(k, l) :=
2〈k, l̃i〉+ |l̃i|2
with the convention P i0(k, l) := 1.
In particular P iu(k, l) ∈ Z[k, l], degk P iu ≤ u and degl P iu ≤ 2u. Inequality (23) implies that for
all i = 1, . . . , q − 1 and for all u ∈ N,
|k|2u
|P iu(k, l)| ≤
uniformly in k ∈ Zn and l ∈ (Zn)2q verifying |k| > U(l).
Let N ∈ N. We deduce from the previous that for any k ∈ Zn and l ∈ (Zn)2q verifying |k| > U(l)
and for all i = 1, . . . , q − 1, we have
|k + l̃i| =
|k|2u−1
P iu(k, l) +O
|k| |
|k|)−u
|k|2u−1
P iu(k, l) +ON
|k|(N−1)/2
It follows that for any N ∈ N, we have uniformly in k ∈ Zn and l ∈ (Zn)2q verifying |k| > U(l)
and for all i ∈ I,
|k + l̃i|pi =
αi∈{0,...,N}
|k|2|αi|1−pi
P iαi(k, l) +ON
|k|(N+1)/2−pi
where P iαi(k, l) =
j=1 P
(k, l) for all αi = (αi,1, . . . , αi,pi) ∈ {0, . . . , N}pi and
|k + l̃i|pi =
α=(αi)∈
i∈I{0,...,N}
|k|2|α|1−|p|1
Pα(k, l) +ON
|k|(N+1)/2−|p|1
where Pα(k, l) =
i∈I P
(k, l) =
j=1 P
(k, l).
Proof of Theorem 2.17.
(i) All n, q, p = (p1, . . . , pq−1) and ã ∈ S
(Zn)2q
are fixed as above and we define formally for
any l ∈ (Zn)2q
F (l, s) :=
|k + l̃i|pi Q(k) eik.Θ
1 lj |k|−s. (24)
Thus, still formally,
f(s) :=
l∈(Zn)2q
ãl F (l, s). (25)
It is clear that F (l, s) converges absolutely in the half plane {σ = ℜ(s) > n + d + |p|1} where
d = degQ.
Let N ∈ N. Lemma 2.18 implies that for any l ∈ (Zn)2q and for s ∈ C such that σ > n+ |p|1+d,
F (l, s) =
|k|≤U(l)
|k + l̃i|pi Q(k) eik.Θ
1 lj |k|−s
α=(αi)i∈I∈
i∈I{0,...,N}
|k|>U(l)
|k|s+2|α|1−|p|1
Pα(k, l)Q(k) e
1 lj +GN (l, s).
where s 7→ GN (l, s) is a holomorphic function in the half-plane DN := {σ > n+ d+ |p|1 − N+12 }
and verifies in it the bound GN (l, s) ≪N,σ 1 uniformly in l.
It follows that
F (l, s) =
α=(αi)i∈I∈
i∈I{0,...,N}
Hα(l, s) +RN (l, s), (26)
where
Hα(l, s) :=
′ (1/2
|k|s+2|α|1−|p|1
Pα(k, l)Q(k) e
1 lj ,
RN (l, s) :=
|k|≤U(l)
|k + l̃i|pi Q(k) eik.Θ
1 lj |k|−s
|k|≤U(l)
α=(αi)i∈I∈
i∈I{0,...,N}
) Pα(k,l)
|k|s+2|α|1−|p|1
Q(k) eik.Θ
1 lj +GN (l, s).
In particular there exists A(N) > 0 such that s 7→ RN (l, s) extends holomorphically to the
half-plane DN and verifies in it the bound RN (l, s) ≪N,σ 1 + |l|A(N) uniformly in l.
Let us note formally
hα(s) :=
ãlHα(l, s).
Equation (26) and RN (l, s) ≪N,σ 1 + |l|A(N) imply that
f(s) ∼N
α=(αi)i∈I∈
i∈I{0,...,N}
hα(s), (27)
where ∼N means modulo a holomorphic function in DN .
Recall the decomposition
Pα(k, l)Q(k) =
β∈L(α)Mα,β(l) k
β and we decompose similarly
hα(s) =
β∈L(α) hα,β(s). Theorem 2.5 now implies that for all α = (αi)i∈I ∈
i∈I{0, . . . , N}pi
and β ∈ L(α),
- the map s 7→ hα,β(s) has a meromorphic continuation to the whole complex plane C with
only one simple possible pole at s = n+ |p|1 − 2|α|1 + dβ ,
- the residue at this point is equal to
s=n+|p|1−2|α|1+dβ
hα,β(s) =
ãlMα,β(l)
u∈Sn−1
uβdS(u) (28)
where Z := {l ∈ (Z)n)2q :
1 lj = 0}. If the right hand side is zero, hα,β(s) is holomorphic on
By (27), we deduce therefore that f(s) has a meromorphic continuation on the halfplane DN ,
with only simple possible poles in the set {n+ |p|1+k : −2N |p|1 ≤ k ≤ d }. Taking now N → ∞
yields the result.
(ii) Let m ∈ N0 and set I(m) := { (α, β) ∈ I × N(2q+1)n0 : β ∈ L(α) and m = 2|α|1 − dβ + d }.
If (α, β) ∈ I(m), then |α|1 ≤ m and |β|1 ≤ 3m+ d, so I(m) is finite.
With a chosen N such that 2N |p|1 + d > m, we get by (27) and (28)
s=n+d+|p|1−m
f(s) =
(α,β)∈I(m)
Mα,β(l)
u∈Sn−1
uβ dS(u) = C(f,m)
with the convention
∅ = 0. Thus, n+d+ |p|1−m is a pole of f if and only if C(f,m) 6= 0.
3 Noncommutative integration on a simple spectral triple
In this section, we revisit the notion of noncommutative integral pioneered by Alain Connes, pay-
ing particular attention to the reality (Tomita–Takesaki) operator J and to kernels of perturbed
Dirac operators by symmetrized one-forms.
3.1 Kernel dimension
We will have to compare here the kernels of D and DA which are both finite dimensional:
Lemma 3.1. Let (A,H,D) be a spectral triple with a reality operator J and chirality χ. If
A ∈ Ω1D is a one-form, the fluctuated Dirac operator
DA := D +A+ ǫJAJ−1
(where DJ = ǫ JD, ǫ = ±1) is an operator with compact resolvent, and in particular its kernel
KerDA is a finite dimensional space. This space is invariant by J and χ.
Proof. Let T be a bounded operator and let z be in the resolvent of D + T and z′ be in the
resolvent of D. Then
(D + T − z)−1 = (D − z′)−1 [1− (T + z′ − z)(D + T − z)−1].
Since (D− z′)−1 is compact by hypothesis and since the term in bracket is bounded, D+ T has
a compact resolvent. Applying this to T = A+ ǫJAJ−1, DA has a finite dimensional kernel (see
for instance [27, Theorem 6.29]).
Since according to the dimension, J2 = ±1, J commutes or anticommutes with χ, χ commutes
with the elements in the algebraA andDχ = −χD (see [10] or [23, p. 405]), we get DAχ = −χDA
and DAJ = ±JDA which gives the result.
3.2 Pseudodifferential operators
Let (A,D,H) be a given real regular spectral triple of dimension n.
We note
P0 the projection on KerD , PA the projection on KerDA ,
D := D + P0 ,DA := DA + PA .
P0 and PA are thus finite-rank selfadjoint bounded operators. We remark that D and DA are
selfadjoint invertible operators with compact inverses.
Remark 3.2. Since we only need to compute the residues and the value at 0 of the ζD, ζDA
functions, it is not necessary to define the operators D−1 or D−1A and the associated zeta func-
tions. However, we can remark that all the work presented here could be done using the process
of Higson in [26] which proves that we can add any smoothing operator to D or DA such that
the result is invertible without changing anything to the computation of residues.
Define for any α ∈ R
OP 0 := {T : t 7→ Ft(T ) ∈ C∞
R,B(H)
OPα := {T : T |D|−α ∈ OP 0 }.
where Ft(T ) := e
it|D| T e−it|D| = eit|D| T e−it|D| since |D| = |D|+ P0. Define
δ(T ) := [|D|, T ],
∇(T ) := [D2, T ],
σs(T ) := |D|sT |D|−s, s ∈ C.
It has been shown in [13] that OP 0 =
p≥0Dom(δ
p). In particular, OP 0 is a subalgebra of B(H)
(while elements of OPα are not necessarily bounded for α > 0) and A ⊆ OP 0, JAJ−1 ⊆ OP 0,
[D,A] ⊆ OP 0. Note that P0 ∈ OP−∞ and δ(OP 0) ⊆ OP 0.
For any t > 0, Dt and and |D|t are in OP t and for any α ∈ R,Dα and |D|α are in OPα. By
hypothesis, |D|−n ∈ L(1,∞)(H) so for any α > n, OP−α ⊆ L1(H).
Lemma 3.3. [13]
(i) For any T ∈ OP 0 and s ∈ C, σs(T ) ∈ OP 0.
(ii) For any α, β ∈ R, OPαOP β ⊆ OPα+β .
(iii) If α ≤ β, OPα ⊆ OP β.
(iv) For any α, δ(OPα) ⊆ OPα.
(v) For any α and T ∈ OPα, ∇(T ) ∈ OPα+1.
Proof. See the appendix.
Remark 3.4. Any operator in OPα, where α ∈ R, extends as a continuous linear operator from
Dom |D|α+1 to Dom |D| where the Dom |D|α spaces have their natural norms (see [13,26]).
We now introduce a definition of pseudodifferential operators in a slightly different way than
in [9,13,26] which in particular pays attention to the reality operator J and the kernel of D and
allows D and |D|−1 to be a pseudodifferential operators. It is more in the spirit of [4].
Definition 3.5. Let us define D(A) as the polynomial algebra generated by A, JAJ−1, D and
A pseudodifferential operator is an operator T such that there exists d ∈ Z such that for any
N ∈ N, there exist p ∈ N0, P ∈ D(A) and R ∈ OP−N (p, P and R may depend on N) such
that P D−2p ∈ OP d and
T = P D−2p +R .
Define Ψ(A) as the set of pseudodifferential operators and Ψ(A)k := Ψ(A) ∩OP k.
Note that if A is a 1-form, A and JAJ−1 are in D(A) and moreover D(A) ⊆ ∪p∈N0OP p. Since
|D| ∈ D(A) by construction and P0 is a pseudodifferential operator, for any p ∈ Z, |D|p is a
pseudodifferential operator (in OP p.) Let us remark also that D(A) ⊆ Ψ(A) ⊆ ∪k∈ZOP k.
Lemma 3.6. [9, 13] The set of all pseudodifferential operators Ψ(A) is an algebra. Moreover,
if T ∈ Ψ(A)d and T ∈ Ψ(A)d′, then TT ′ ∈ Ψ(A)d+d′ .
Proof. See the appendix.
Due to the little difference of behavior between scalar and nonscalar pseudodifferential operators
(i.e. when coefficients like [D, a], a ∈ A appears in P of Definition 3.5), it is convenient to also
introduce
Definition 3.7. Let D1(A) be the algebra generated by A, JAJ−1 and D, and Ψ1(A) be the
set of pseudodifferential operators constructed as before with D1(A) instead of D(A). Note that
Ψ1(A) is subalgebra of Ψ(A).
Remark that Ψ1(A) does not necessarily contain operators such as |D|k where k ∈ Z is odd.
This algebra is similar to the one defined in [4].
3.3 Zeta functions and dimension spectrum
For any operator B and if X is either D or DA, we define
ζBX(s) := Tr
B|X|−s
ζX(s) := Tr
|X|−s
The dimension spectrum Sd(A,H,D) of a spectral triple has been defined in [9,13]. It is extended
here to pay attention to the operator J and to our definition of pseudodifferential operator.
Definition 3.8. The spectrum dimension of the spectral triple is the subset Sd(A,H,D) of all
poles of the functions ζPD := s 7→ Tr
P |D|−s
where P is any pseudodifferential operator in
OP 0. The spectral triple (A,H,D) is simple when these poles are all simple.
Remark 3.9. If Sp(A,H,D) denotes the set of all poles of the functions s 7→ Tr
P |D|−s
where
P is any pseudodifferential operator, then, Sd(A,H,D) ⊆ Sp(A,H,D).
When Sp(A,H,D) = Z, Sd(A,H,D) = {n − k : k ∈ N0 }: indeed, if P is a pseudodifferential
operator in OP 0, and q ∈ N is such that q > n, P |D|−s is in OP−ℜ(s) so is trace-class for s in
a neighborhood of q; as a consequence, q cannot be a pole of s 7→ Tr
P |D|−s
Remark 3.10. Sp(A,H,D) is also the set of poles of functions s 7→ Tr
B|D|−s−2p
where
p ∈ N0 and B ∈ D(A).
3.4 The noncommutative integral
We already defined the one parameter group σz(T ) := |D|zT |D|−z, z ∈ C.
Introducing the notation (recall that ∇(T ) = [D2, T ]) for an operator T ,
ε(T ) := ∇(T )D−2,
we get from [4, (2.44)] the following expansion for T ∈ OP q
σz(T ) ∼
g(z, r) εr(T ) mod OP−N−1+q (29)
where g(z, r) := 1
) · · · (z
− (r − 1)) =
with the convention g(z, 0) := 1.
We define the noncommutative integral by
− T := Res
ζTD(s) = Res
T |D|−s
Proposition 3.11. [13] If the spectral triple is simple,
is a trace on Ψ(A).
Proof. See the appendix.
4 Residues of ζDA for a spectral triple with simple dimension
spectrum
We fix a regular spectral triple (A,H,D) of dimension n and a self-adjoint 1-form A.
Recall that
DA := D + Ã where Ã := A+ εJAJ−1,
DA := DA + PA
where PA is the projection on KerDA. Remark that Ã ∈ D(A) ∩OP 0 and DA ∈ D(A) ∩OP 1.
We note
VA := PA − P0.
As the following lemma shows, VA is a smoothing operator:
Lemma 4.1. (i)
k≥1Dom(DA)k ⊆
k≥1Dom |D|k.
(ii) KerDA ⊆
k≥1Dom |D|k.
(iii) For any α, β ∈ R, |D|βPA|D|α is bounded.
(iv) PA ∈ OP−∞.
Proof. (i) Let us define for any p ∈ N, Rp := (DA)p−Dp, so Rp ∈ OP p−1 and Rp
Dom |D|p
Dom |D| (see Remark 3.4).
Let us fix k ∈ N, k ≥ 2. Since DomDA = DomD = Dom |D|, we have
Dom(DA)k = {φ ∈ Dom |D| : (Dj +Rj)φ ∈ Dom |D| , ∀j 1 ≤ j ≤ k − 1 }.
Let φ ∈ Dom(DA)k. We prove by recurrence that for any j ∈ { 1, · · · , k − 1 }, φ ∈ Dom |D|j+1:
We have φ ∈ Dom |D| and (D +R1)φ ∈ Dom |D|. Thus, since R1 φ ∈ Dom |D|, Dφ ∈ Dom |D|,
which proves that φ ∈ Dom |D|2. Hence, case j = 1 is done.
Suppose now that φ ∈ Dom |D|j+1 for a j ∈ { 1, · · · , k − 2 }. Since (Dj+1 +Rj+1)φ ∈ Dom |D|,
and Rj+1 φ ∈ Dom |D|, we get Dj+1 φ ∈ Dom |D|, which proves that φ ∈ Dom |D|j+2.
Finally, if we set j = k − 1, we get φ ∈ Dom |D|k, so Dom(DA)k ⊆ Dom |D|k.
(ii) follows from KerDA ⊆
k≥1Dom(DA)k and (i).
(iii) Let us first check that |D|αPA is bounded. We define D0 as the operator with domain
DomD0 = ImPA ∩Dom |D|α and such that D0 φ = |D|α φ. Since DomD0 is finite dimensional,
D0 extends as a bounded operator on H with finite rank. We have
φ∈Dom |D|αPA, ‖φ‖≤1
‖|D|αPA φ‖ ≤ sup
φ∈DomD0, ‖φ‖≤1
‖|D|α φ‖ = ‖D0‖ <∞
so |D|αPA is bounded. We can remark that by (ii), DomD0 = ImPA and Dom |D|αPA = H.
Let us prove now that PA|D|α is bounded: Let φ ∈ DomPA|D|α = Dom |D|α. By (ii), we have
ImPA ⊆ Dom |D|α so we get
‖PA|D|α φ‖ ≤ sup
ψ∈ImPA, ‖ψ‖≤1
| < ψ, |D|α φ > | ≤ sup
ψ∈ImPA, ‖ψ‖≤1
| < |D|αψ, φ > |
≤ sup
ψ∈ImPA, ‖ψ‖≤1
‖|D|αψ‖ ‖φ‖ = ‖D0‖ ‖φ‖ .
(iv) For any k ∈ N0 and t ∈ R, δk(PA)|D|t is a linear combination of terms of the form
|D|βPA|D|α, so the result follows from (iii).
Remark 4.2. We will see later on the noncommutative torus example how important is the
difference between DA and D + A. In particular, the inclusion KerD ⊆ KerD + A is not
satisfied since A does not preserve KerD contrarily to Ã.
The coefficient of the nonconstant term Λk (k > 0) in the expansion (5) of the spectral action
S(DA,Φ,Λ) is equal to the residue of ζDA(s) at k. We will see in this section how we can
compute these residues in term of noncommutative integral of certain operators.
Define for any operator T , p ∈ N, s ∈ C,
Kp(T, s) := (− s2)
0≤t1≤···≤tp≤1
σ−st1(T ) · · · σ−stp(T ) dt
with dt := dt1 · · · dtp.
Remark that if T ∈ OPα, then σz(T ) ∈ OPα for z ∈ C and Kp(T, s) ∈ OPαp.
Let us define
X := D2A −D2 = ÃD +DÃ+ Ã2,
XV := X + VA,
thus X ∈ D1(A) ∩OP 1 and by Lemma 4.1,
XV ∼ X mod OP−∞. (30)
We will use
Y := log(D2A)− log(D2)
which makes sense since D2A = D2A + PA is invertible for any A.
By definition of XV , we get
Y = log(D2 +XV )− log(D2).
Lemma 4.3. [4]
(i) Y is a pseudodifferential operator in OP−1 with the following expansion for any N ∈ N
k1,··· ,kp=0
(−1)|k|1+p+1
|k|1+p
∇kp(X∇kp−1(· · ·X∇k1(X) · · · ))D−2(|k|1+p) mod OP−N−1.
(ii) For any N ∈ N and s ∈ C,
|DA|−s ∼ |D|−s +
Kp(Y, s)|D|−s mod OP−N−1−ℜ(s). (31)
Proof. (i) We follow [4, Lemma 2.2]. By functional calculus, Y =
I(λ) dλ, where
I(λ) ∼
(−1)p+1
(D2 + λ)−1XV
(D2 + λ)−1 mod OP−N−3.
By (30),
(D2 + λ)−1XV
(D2 + λ)−1X
mod OP−∞ and we get
I(λ) ∼
(−1)p+1
(D2 + λ)−1X
(D2 + λ)−1 mod OP−N−3.
We set Ap(X) :=
(D2 + λ)−1X
(D2 + λ)−1 and L := (D2 + λ)−1 ∈ OP−2 for a fixed λ. Since
[D2 + λ,X] ∼ ∇(X) mod OP−∞, a recurrence proves that if T is an operator in OP r, then,
for q ∈ N0,
A1(T ) = LTL ∼
(−1)k∇k(T )Lk+2 mod OP r−q−5.
With Ap(X) = LXAp−1(X), another recurrence gives, for any q ∈ N0,
Ap(X) ∼
k1,··· ,kp=0
(−1)|k|1∇kp(X∇kp−1(· · ·X∇k1(X) · · · ))L|k|1+p+1 mod OP−q−p−3,
which entails that
I(λ) ∼
(−1)p+1
k1,··· ,kp=0
(−1)|k|1∇kp(X∇kp−1(· · ·X∇k1(X) · · · ))L|k|1+p+1 mod OP−N−3.
(D2 + λ)−(|k|1+p+1)dλ = 1
|k|1+p
D−2(|k|1+p), we get the result provided we control the
remainders. Such a control is given in [4, (2.27)].
(ii) We have |DA|−s = eB−(s/2)Y e−B |D|−s where B := (−s/2) log(D2). Following [4, Theorem
2.4], we get
|DA|−s = |D|−s +
Kp(Y, s)|D|−s . (32)
and each Kp(Y, s) is in OP
Corollary 4.4. For any p ∈ N and r1, · · · , rp ∈ N0, εr1(Y ) · · · εrp(Y ) ∈ Ψ1(A).
Proof. If for any q ∈ N and k = (k1, · · · , kq) ∈ Nq0,
Γkq(X) :=
(−1)|k|1+q+1
|k|1+q
∇kq(X∇kq−1(· · ·X∇k1(X) · · · )),
then, Γkq (X) ∈ OP |k|1+q. For any N ∈ N,
k1,··· ,kq=0
Γkq(X)D
−2(|k|1+q) mod OP−N−1. (33)
Note that the Γkq(X) are in D1(A), which, with (33) proves that Y and thus εr(Y ) = ∇r(Y )D−2r,
are also in Ψ1(A).
We remark, as in [11], that the fluctuations leave invariant the first term of the spectral action
(5). This is a generalization of the fact that in the commutative case, the noncommutative
integral depends only on the principal symbol of the Dirac operator D and this symbol is stable
by adding a gauge potential like in D+A. Note however that the symmetrized gauge potential
A+ ǫJAJ−1 is always zero in this case for any selfadjoint one-form A.
Lemma 4.5. If the spectral triple is simple, formula (6) can be extended as
ζDA(0)− ζD(0) =
(−1)q
−(ÃD−1)q. (34)
Proof. Since the spectral triple is simple, equation (32) entails that
ζDA(0)− ζD(0) = Tr(K1(Y, s)|D|−s)|s=0 .
Thus, with (29), we get ζDA(0) − ζD(0) = −12
Y . Replacing A by Ã, the same proof as in [4]
gives
− Y =
(−1)q
−(ÃD−1)q.
Lemma 4.6. For any k ∈ N0,
s=n−k
ζDA(s) = Res
s=n−k
ζD(s) +
r1,··· ,rp=0
s=n−k
h(s, r, p) Tr
εr1(Y ) · · · εrp(Y )|D|−s
where
h(s, r, p) := (−s/2)p
0≤t1≤···≤tp≤1
g(−st1, r1) · · · g(−stp, rp) dt .
Proof. By Lemma 4.3 (ii), |DA|−s ∼ |D|−s +
p=1Kp(Y, s)|D|−s mod OP−(k+1)−ℜ(s), where
the convention
∅ = 0 is used. Thus, we get for s in a neighborhood of n− k,
|DA|−s − |D|−s −
Kp(Y, s)|D|−s ∈ OP−(k+1)−ℜ(s) ⊆ L1(H)
which gives
s=n−k
ζDA(s) = Res
s=n−k
ζD(s) +
s=n−k
Kp(Y, s)|D|−s
. (35)
Let us fix 1 ≤ p ≤ k and N ∈ N. By (29) we get
Kp(Y, s) ∼ (− s2)
0≤t1≤···tp≤1
r1,··· ,rp=0
g(−st1, r1) · · · g(−stp, rp)
εr1(Y ) · · · εrp(Y ) dt mod OP−N−p−1. (36)
If we now take N = k − p, we get for s in a neighborhood of n− k
Kp(Y, s)|D|−s −
r1,··· ,rp=0
h(s, r, p) εr1(Y ) · · · εrp(Y )|D|−s ∈ OP−k−1−ℜ(s) ⊆ L1(H)
so (35) gives the result.
Our operators |DA|k are pseudodifferential operators:
Lemma 4.7. For any k ∈ Z, |DA|k ∈ Ψk(A).
Proof. Using (36), we see that Kp(Y, s) is a pseudodifferential operator in OP
−p, so (31) proves
that |DA|k is a pseudodifferential operator in OP k.
The following result is quite important since it shows that one can use
for D or DA:
Proposition 4.8. If the spectral triple is simple, Res
P |DA|−s
P for any pseudodiffer-
ential operator P . In particular, for any k ∈ N0
− |DA|−(n−k) = Res
s=n−k
ζDA(s).
Proof. Suppose P ∈ OP k with k ∈ Z and let us fix p ≥ 1. With (36), we see that for any N ∈ N,
PKp(Y, s)|D|−s ∼
r1,··· ,rp=0
h(s, r, p)Pεr1(Y ) · · · εrp(Y )|D|−s mod OP−N−p−1+k−ℜ(s).
Thus if we take N = n− p+ k, we get
PKp(Y, s)|D|−s
n−p+k∑
r1,··· ,rp=0
h(s, r, p) Tr
Pεr1(Y ) · · · εrp(Y )|D|−s
Since s = 0 is a zero of the analytic function s 7→ h(s, r, p) and s 7→ TrPεr1(Y ) · · · εrp(Y )|D|−s
has only simple poles by hypothesis, we see that Res
h(s, r, p) Tr
Pεr1(Y ) · · · εrp(Y )|D|−s
PKp(Y, s)|D|−s
= 0. (37)
Using (31), P |DA|−s ∼ P |D|−s +
p=1 PKp(Y, s)|D|−s mod OP−n−1−ℜ(s) and thus,
Tr(P |DA|−s) =
− P +
PKp(Y, s)|D|−s
. (38)
The result now follows from (37) and (38). To get the last equality, one uses the pseudodiffer-
ential operator |DA|−(n−k).
Proposition 4.9. If the spectral triple is simple, then
− |DA|−n =
− |D|−n. (39)
Proof. Lemma 4.6 and previous proposition for k = 0.
Lemma 4.10. If the spectral triple is simple,
− |DA|−(n−1) =
− |D|−(n−1) − (n−1
− X|D|−n−1.
− |DA|−(n−2) =
− |D|−(n−2) + n−2
− X|D|−n + n
− X2|D|−2−n
Proof. (i) By (31),
s=n−1
ζDA(s)− ζD(s) = Res
s=n−1
(−s/2)Tr
Y |D|−s
= −n−1
Y |D|−(n−1)|D|−s
where for the last equality we use the simple dimension spectrum hypothesis. Lemma 4.3 (i)
yields Y ∼ XD−2 mod OP−2 and Y |D|−(n−1) ∼ X|D|−n−1 mod OP−n−1 ⊆ L1(H). Thus,
Y |D|−(n−1)|D|−s
= Res
X|D|−n−1|D|−s
− X|D|−n−1.
(ii) Lemma 4.6 (ii) gives
s=n−2
ζDA(s) = Res
s=n−2
ζD(s) + Res
s=n−2
h(s, r, 1) Tr
εr(Y )|D|−s
+ h(s, 0, 2) Tr
Y 2|D|−s
We have h(s, 0, 1) = − s
, h(s, 1, 1) = 1
)2 and h(s, 0, 2) = 1
)2. Using again Lemma 4.3 (i),
Y ∼ XD−2 − 1
∇(X)D−4 − 1
X2D−4 mod OP−3.
Thus,
s=n−2
Y |D|−s
− X|D|−n − 1
−(∇(X) +X2)|D|−2−n.
Moreover, using
∇(X)|D|−k = 0 for any k ≥ 0 since
is a trace,
s=n−2
ε(Y )|D|−s
= Res
s=n−2
∇(X)D−4|D|−s
− ∇(X)|D|−2−n = 0.
Similarly, since Y ∼ XD−2 mod OP−2 and Y 2 ∼ X2D−4 mod OP−3, we get
s=n−2
Y 2|D|−s
= Res
s=n−2
X2D−4|D|−s
− X2|D|−2−n.
Thus,
s=n−2
ζDA(s) = Res
s=n−2
ζD(s)+(−n−22 )(
− X|D|−n − 1
−(∇(X) +X2)|D|−2−n)
− ∇(X)|D|−2−n + 1
− X2|D|−2−n.
Finally,
s=n−2
ζDA(s) = Res
s=n−2
ζD(s) + (−n−22 )
− X|D|−n − 1
− X2|D|−2−n
− X2|D|−2−n
and the result follows from Proposition 4.8.
Corollary 4.11. If the spectral triple is simple and satisfies
|D|−(n−2) =
ÃD|D|−n =∫
DÃ|D|−n = 0, then
− |DA|−(n−2) = n(n−2)4
− ÃDÃD|D|−n−2 + n−2
− Ã2|D|−n
Proof. By previous lemma,
s=n−2
ζDA(s) =
− Ã2|D|−n + n
−( ÃDÃD +DÃDÃ+ ÃD2Ã+DÃ2D )|D|−n−2
Since ∇(Ã) ∈ OP 1, the trace property of
yields the result.
5 The noncommutative torus
5.1 Notations
Let C∞(TnΘ) be the smooth noncommutative n-torus associated to a non-zero skew-symmetric
deformation matrix Θ ∈Mn(R) (see [6], [30]). This means that C∞(TnΘ) is the algebra generated
by n unitaries ui, i = 1, . . . , n subject to the relations
ui uj = e
iΘij uj ui, (40)
and with Schwartz coefficients: an element a ∈ C∞(TnΘ) can be written as a =
k∈Zn ak Uk,
where {ak} ∈ S(Zn) with the Weyl elements defined by Uk := e−
k.χk u
1 · · · uknn , k ∈ Zn,
relation (40) reads
UkUq = e
k.Θq Uk+q, and UkUq = e
−ik.Θq UqUk (41)
where χ is the matrix restriction of Θ to its upper triangular part. Thus unitary operators Uk
satisfy U∗k = U−k and [Uk, Ul] = −2i sin(
k.Θl)Uk+l.
Let τ be the trace on C∞(TnΘ) defined by τ
k∈Zn ak Uk
:= a0 and Hτ be the GNS Hilbert
space obtained by completion of C∞(TnΘ) with respect of the norm induced by the scalar product
〈a, b〉 := τ(a∗b). On Hτ = {
k∈Zn ak Uk : {ak}k ∈ l2(Zn) }, we consider the left and right
regular representations of C∞(TnΘ) by bounded operators, that we denote respectively by L(.)
and R(.).
Let also δµ, µ ∈ { 1, . . . , n }, be the n (pairwise commuting) canonical derivations, defined by
δµ(Uk) := ikµUk. (42)
We need to fix notations: let AΘ := C∞(TnΘ) acting on H := Hτ ⊗ C2
with n = 2m or
n = 2m+ 1 (i.e., m = ⌊n
⌋ is the integer part of n
), the square integrable sections of the trivial
spin bundle over Tn.
Each element of AΘ is represented on H as L(a) ⊗ 12m where L (resp. R) is the left (resp.
right) multiplication. The Tomita conjugation J0(a) := a
∗ satisfies [J0, δµ] = 0 and we define
J := J0 ⊗ C0 where C0 is an operator on C2
. The Dirac operator is given by
D := −i δµ ⊗ γµ, (43)
where we use hermitian Dirac matrices γ. It is defined and symmetric on the dense subset of H
given by C∞(TnΘ)⊗ C2
. We still note D its selfadjoint extension. This implies
α = −εγαC0, (44)
D Uk ⊗ ei = kµUk ⊗ γµei,
where (ei) is the canonical basis of C
2m . Moreover, C20 = ±12m depending on the parity of m.
Finally, one introduces the chirality (which in the even case is χ := id⊗ (−i)mγ1 · · · γn) and this
yields that (AΘ,H,D, J, χ) satisfies all axioms of a spectral triple, see [8, 23].
The perturbed Dirac operator VuDV ∗u by the unitary
Vu :=
L(u)⊗ 12m
L(u)⊗ 12m
defined for every unitary u ∈ A, uu∗ = u∗u = U0, must satisfy condition (3) (which is equivalent
toH being endowed with a structure ofAΘ-bimodule). This yields the necessity of a symmetrized
covariant Dirac operator:
DA := D +A+ ǫJ AJ−1
since VuDV ∗u = DL(u)⊗12m [D,L(u∗)⊗12m ]: in fact, for a ∈ AΘ, using J0L(a)J
0 = R(a
∗), we get
L(a)⊗ γα
J−1 = −R(a∗)⊗ γα
and that the representation L and the anti-representation R are C-linear, commute and satisfy
[δα, L(a)] = L(δαa), [δα, R(a)] = R(δαa).
This induces some covariance property for the Dirac operator: one checks that for all k ∈ Zn,
L(Uk)⊗ 12m [D, L(U∗k )⊗ 12m ] = 1⊗ (−kµγµ), (45)
so with (44), we get Uk[D, U∗k ] + ǫJUk[D, U∗k ]J−1 = 0 and
VUk D V
= D = DL(Uk)⊗12m [D,L(U∗k )⊗12m ]. (46)
Moreover, we get the gauge transformation:
VuDAV ∗u = Dγu(A) (47)
where the gauged transform one-form of A is
γu(A) := u[D, u∗] + uAu∗, (48)
with the shorthand L(u)⊗ 12m −→ u.
As a consequence, the spectral action is gauge invariant:
S(DA,Φ,Λ) = S(Dγu(A),Φ,Λ).
An arbitrary selfadjoint one-form A, can be written as
A = L(−iAα)⊗ γα, Aα = −A∗α ∈ AΘ, (49)
DA = −i
δα + L(Aα)−R(Aα)
⊗ γα. (50)
Defining
Ãα := L(Aα)−R(Aα),
we get D2A = −gα1α2(δα1 + Ãα1)(δα2 + Ãα2)⊗ 12m − 12Ωα1α2 ⊗ γ
α1α2 where
γα1α2 := 1
(γα1γα2 − γα2γα1),
Ωα1α2 := [δα1 + Ãα1 , δα2 + Ãα2 ] = L(Fα1α2)−R(Fα1α2)
Fα1α2 := δα1(Aα2)− δα2(Aα1) + [Aα1 , Aα2 ]. (51)
In summary,
D2A = −δα1α2
δα1 + L(Aα1)−R(Aα1)
δα2 + L(Aα2)−R(Aα2)
⊗ 12m
L(Fα1α2)−R(Fα1α2)
⊗ γα1α2 . (52)
5.2 Kernels and dimension spectrum
We now compute the kernel of the perturbed Dirac operator:
Proposition 5.1. (i) KerD = U0 ⊗C2
, so dimKerD = 2m.
(ii) For any selfadjoint one-form A, KerD ⊆ KerDA.
(iii) For any unitary u ∈ A, KerDγu(A) = Vu KerDA.
Proof. (i) Let ψ =
k,j ck,j Uk ⊗ ej ∈ KerD. Thus, 0 = D2ψ =
k,i ck,j|k|2 Uk ⊗ ej which
entails that ck,j|k|2 = 0 for any k ∈ Zn and 1 ≤ j ≤ 2m. The result follows.
(ii) Let ψ ∈ KerD. So, ψ = U0 ⊗ v with v ∈ C2
and from (50), we get
DAψ = Dψ + (A+ ǫJAJ−1)ψ = (A+ ǫJAJ−1)ψ = −i[Aα, U0]⊗ γαv = 0
since U0 is the unit of the algebra, which proves that ψ ∈ KerDA.
(iii) This is a direct consequence of (47).
Corollary 5.2. Let A be a selfadjoint one-form. Then KerDA = KerD in the following cases:
(i) Au := L(u)⊗ 12m [D, L(u∗)⊗ 12m ] when u is a unitary in A.
(ii) ||A|| < 1
(iii) The matrix 1
Θ has only integral coefficients.
Proof. (i) This follows from previous result because Vu(U0 ⊗ v) = U0 ⊗ v for any v ∈ C2
(ii) Let ψ =
k,j ck,j Uk ⊗ ej be in KerDA (so
k,j |ck,j|2 <∞) and φ :=
j c0,j U0 ⊗ ej . Thus
ψ′ := ψ − φ ∈ Ker DA since φ ∈ KerD ⊆ KerDA and
06=k∈Zn, j
ck,j kα Uk ⊗ γαej ||2 = ||Dψ′||2 = || − (A+ ǫJAJ−1)ψ′||2 ≤ 4||A||2||ψ′||2 < ||ψ′||2.
Defining Xk :=
α kαγα, X
α |kα|2 12m is invertible and the vectors {Uk ⊗Xkej }06=k∈Zn, j
are orthogonal in H, so
06=k∈Zn, j
|kα|2
|ck,j|2 <
06=k∈Zn, j
|ck,j|2
which is possible only if ck,j = 0, ∀k, j that is ψ′ = 0 et ψ = φ ∈ Ker D.
(iii) This is a consequence of the fact that the algebra is commutative, thus A+ǫJAJ−1 = 0.
Note that if Ãu := Au + ǫJAuJ
−1, then by (45), ÃUk = 0 for all k ∈ Zn and ‖AUk‖ = |k|, but
for an arbitrary unitary u ∈ A, Ãu 6= 0 so DAu 6= D.
Naturally the above result is also a direct consequence of the fact that the eigenspace of an iso-
lated eigenvalue of an operator is not modified by small perturbations. However, it is interesting
to compute the last result directly to emphasize the difficulty of the general case:
Let ψ =
l∈Zn,1≤j≤2m cl,j Ul⊗ ej ∈ KerDA, so
l∈Zn,1≤j≤2m |cl,j |2 <∞. We have to show that
ψ ∈ Ker D that is cl,j = 0 when l 6= 0.
Taking the scalar product of 〈Uk ⊗ ei| with
0 = DAψ =
l, α, j
cl, j(l
αUl − i[Aα, Ul])⊗ γαej ,
we obtain
l, α, j
cl, j
lαδk,l − i〈Uk, [Aα, Ul]〉
〈ei, γαej〉.
If Aα =
α,l aα,l Ul ⊗ γα with { aα,l }l ∈ S(Zn), note that [Ul, Um] = −2i sin(12 l.Θm)Ul+m and
〈Uk, [Aα, Ul]〉 =
l′∈Zn
aα,l′(−2i sin(12 l
′.Θl)〈Uk, Ul′+l〉 = −2i aα,k−l sin(12k.Θl).
cl, j
lαδk,l − 2aα,k−l sin(12k.Θl)
〈ei, γαej〉, ∀k ∈ Zn, ∀i, 1 ≤ i ≤ 2m. (53)
We conjecture that KerD = KerDA at least for generic Θ’s:
the constraints (53) should imply cl,j = 0 for all j and all l 6= 0 meaning ψ ∈ KerD. When 12πΘ
has only integer coefficients, the sin part of these constraints disappears giving the result.
Lemma 5.3. If 1
Θ is diophantine, Sp
C∞(TnΘ),H,D
= Z and all these poles are simple.
Proof. Let B ∈ D(A) and p ∈ N0. Suppose that B is of the form
B = arbrDqr−1|D|pr−1ar−1br−1 · · · Dq1 |D|p1a1b1
where r ∈ N, ai ∈ A, bi ∈ JAJ−1, qi, pi ∈ N0. We note ai =:
l ai,l Ul and bi =:
i bi,l Ul.
With the shorthand kµ1,µqi := kµ1 · · · kµqi and γ
µ1,µqi = γµ1 · · · γµqi , we get
Dq1 |D|p1a1b1 Uk ⊗ ej =
l1, l
a1,l1b1,l′1
Ul1UkUl′1
|k + l1 + l′1|p1 (k + l1 + l′1)µ1,µq1 ⊗ γ
µ1,µq1 ej
which gives, after r iterations,
BUk⊗ej =
ãlb̃lUlr · · ·Ul1UkUl′1 · · ·Ul′r
|k+ l̂i+ l̂′i|pi(k+ l̂i+ l̂′i)µi1,µiqi ⊗γ
qr−1 · · · γµ
q1 ej
where ãl := a1,l1 · · · ar,lr and b̃l′ := b1,l′1 · · · br,l′r .
Let us note Fµ(k, l, l
′) :=
i=1 |k + l̂i + l̂′i|pi(k + l̂i + l̂′i)µi1,µiqi and γ
µ := γ
µr−11 ,µ
qr−1 · · · γµ
Thus, with the shortcut ∼c meaning modulo a constant function towards the variable s,
B|D|−2p−s
ãlb̃l′ τ
U−kUlr · · ·Ul1UkUl′1 · · ·Ul′r
)Fµ(k,l,l′)
|k|s+2p
Tr(γµ) .
Since Ulr · · ·Ul1Uk = UkUlr · · ·Ul1e−i
1 li.Θk we get
U−kUlr · · ·Ul1UkUl′1 · · ·Ul′r
= δPr
1 li+l
eiφ(l,l
′) e−i
1 li.Θk
where φ is a real valued function. Thus,
B|D|−2p−s
eiφ(l,l
′) δPr
1 li+l
ãlb̃l′
Fµ(k,l,l
′) e−i
1 li.Θk
|k|s+2p
Tr(γµ)
∼c fµ(s)Tr(γµ).
The function fµ(s) can be decomposed has a linear combination of zeta function of type described
in Theorem 2.17 (or, if r = 1 or all the pi are zero, in Theorem 2.5). Thus, s 7→ Tr
B|D|−2p−s
has only poles in Z and each pole is simple. Finally, by linearity, we get the result.
The dimension spectrum of the noncommutative torus is simple:
Proposition 5.4. (i) If 1
Θ is diophantine, the spectrum dimension of
C∞(TnΘ),H,D
equal to the set {n− k : k ∈ N0 } and all these poles are simple.
(ii) ζD(0) = 0.
Proof. (i) Lemma 5.3 and Remark 3.9.
(ii) ζD(s) =
1≤j≤2m〈Uk ⊗ ej , |D|−sUk ⊗ ej〉 = 2m(
+ 1) = 2m(Zn(s) + 1).
By (21), we get the result.
We have computed ζD(0) relatively easily but the main difficulty of the present work is precisely
to calculate ζDA(0).
5.3 Noncommutative integral computations
We fix a self-adjoint 1-form A on the noncommutative torus of dimension n.
Proposition 5.5. If 1
Θ is diophantine, then the first elements of the expansion (5) are given
− |DA|−n =
− |D|−n = 2m+1πn/2 Γ(n
)−1. (54)
− |DA|n−k = 0 for k odd.
− |DA|n−2 = 0.
We need few technical lemmas:
Lemma 5.6. On the noncommutative torus, for any t ∈ R,
− ÃD|D|−t =
− DÃ|D|−t = 0.
Proof. Using notations of (49), we have
Tr(ÃD|D|−s) ∼c
〈Uk ⊗ ej,−ikµ|k|−s[Aα, Uk]⊗ γαγµej〉
∼c −iTr(γαγµ)
kµ|k|−s〈Uk, [Aα, Uk]〉 = 0
since 〈Uk, [Aα, Uk]〉 = 0. Similarly
Tr(DÃ|D|−s) ∼c
〈Uk ⊗ ej , |k|−s
aα,l 2 sin
(l + k)µUl+k ⊗ γµγαej〉
∼c 2Tr(γµγα)
aα,l sin
(l + k)µ |k|−s〈Uk, Ul+k〉 = 0.
Any element h in the algebra generated by A and [D,A] can be written as a linear combination of
terms of the form a1
p1 · · · anpr where ai are elements of A or [D,A]. Such a term can be written
as a series b :=
a1,α1,l1 · · · aq,αq,lqUl1 · · ·Ulq ⊗ γα1 · · · γαq where ai,αi are Schwartz sequences
and when ai =:
l alUl ∈ A, we set ai,α,l = ai,l with γα = 1. We define
L(b) := τ
a1,α1,l1 · · · aq,αq,lqUl1 · · ·Ulq
Tr(γα1 · · · γαq ).
By linearity, L is defined as a linear form on the whole algebra generated by A and [D,A].
Lemma 5.7. If h is an element of the algebra generated by A and [D,A],
h|D|−s
∼c L(h)Zn(s).
In particular, Tr
h|D|−s
has at most one pole at s = n.
Proof. We get with b of the form
a1,α1,l1 · · · aq,αq,lqUl1 · · ·Ulq ⊗ γα1 · · · γαq ,
b|D|−s
a1,α1,l1 · · · aq,αq,lqUl1 · · ·UlqUk〉 Tr(γα1 · · · γαq )|k|−s
∼c τ(
a1,α1,l1 · · · aq,αq,lqUl1 · · ·Ulq)Tr(γα1 · · · γαq )Zn(s) = L(b)Zn(s).
The results follows now from linearity of the trace.
Lemma 5.8. If 1
Θ is diophantine, the function s 7→ Tr
εJAJ−1A|D|−s
extends meromor-
phically on the whole plane with only one possible pole at s = n. Moreover, this pole is simple
εJAJ−1A|D|−s
= aα,0 a
m+1πn/2 Γ(n/2)−1.
Proof. With A = L(−iAα) ⊗ γα, we get ǫJAJ−1 = R(iAα) ⊗ γα, and by multiplication
εJAJ−1A = R(Aβ)L(Aα)⊗ γβγα. Thus,
εJAJ−1A|D|−s
〈Uk, AαUkAβ〉 |k|−s Tr(γβγα)
aα,l aβ,−l e
ik.Θl |k|−sTr(γβγα)
∼c 2m
aα,l a
ik.Θl |k|−s.
Theorem 2.5 (ii) entails that
l aα,l a
ik.Θl |k|−s extends meromorphically on the
whole plane C with only one possible pole at s = n. Moreover, this pole is simple and we
aα,l a
ik.Θl |k|−s = aα,0 aα0 Res
Zn(s).
Equation (20) now gives the result.
Lemma 5.9. If 1
Θ is diophantine, then for any t ∈ R,
− X|D|−t = δt,n 2m+1
aα,l a
−l + aα,0 a
2πn/2 Γ(n/2)−1.
where X = ÃD +DÃ+ Ã2 and A =: −i
l aα,l Ul ⊗ γα.
Proof. By Lemma 5.6, we get
X|D|−t = Ress=0Tr(Ã2|D|−s−t). Since A and εJAJ−1 commute,
we have Ã2 = A2 + JA2J−1 + 2εJAJ−1A. Thus,
Tr(Ã2|D|−s−t) = Tr(A2|D|−s−t) + Tr(JA2J−1|D|−s−t) + 2Tr(εJAJ−1A|D|−s−t).
Since |D| and J commute, we have with Lemma 5.7,
Ã2|D|−s−t
∼c 2L(A2)Zn(s+ t) + 2Tr
εJAJ−1A|D|−s−t
Thus Lemma 5.8 entails that Tr(Ã2|D|−s−t) is holomorphic at 0 if t 6= n. When t = n,
Ã2|D|−s−t
= 2m+1
aα,l a
−l + aα,0 a
2πn/2 Γ(n/2)−1, (55)
which gives the result.
Lemma 5.10. If 1
Θ is diophantine, then
− ÃDÃD|D|−2−n = −n−2
− Ã2|D|−n.
Proof. With DJ = εJD, we get
− ÃDÃD|D|−2−n = 2
− ADAD|D|−2−n + 2
− εJAJ−1DAD|D|−2−n.
Let us first compute
ADAD|D|−2−n. We have, with A =: −iL(Aα)⊗ γα =: −i
l aα,lUl ⊗ γα,
ADAD|D|−s−2−n
l1,l2
aα2,l2 aα1,l1 τ(U−kUl2Ul1Uk)
kµ1(k+l1)µ2
|k|s+2+n
Tr(γα,µ)
where γα,µ := γα2γµ2γα1γµ1 . Thus,
− ADAD|D|−2−n = −
aα2,−l aα1,lRes
′ kµ1kµ2
|k|s+2+n
Tr(γα,µ).
We have also, with εJAJ−1 = iR(Aα)⊗ γa,
εJAJ−1DAD|D|−s−2−n
l1,l2
aα2,l2aα1,l1τ(U−kUl1UkUl2)
kµ1 (k+l1)µ2
|k|s+2+n
Tr(γα,µ).
which gives
− εJAJ−1DAD|D|−2−n = aα2,0aα1,0Res
′ kµ1kµ2
|k|s+2+n
Tr(γα,µ).
Thus,
− ÃDÃD|D|−2−n =
aα2,0aα1,0 −
aα2,−laα1,l
Ress=0
′ kµ1kµ2
|k|s+2+n
Tr(γα,µ).
kµ1kµ2
|k|s+2+n
δµ1µ2
Zn(s+ n) and Cn := Ress=0 Zn(s+ n) = 2π
n/2Γ(n/2)−1 we obtain
− ÃDÃD|D|−2−n =
aα2,0aα1,0 −
aα2,−laα1,l
Tr(γα2γµγα1γµ).
Since Tr(γα2γµγα1γµ) = 2
m(2− n)δα2,α1 , we get
− ÃDÃD|D|−2−n = 2m
− aα,0 aα0 +
aα,−l a
)Cn(n−2)
Equation (55) now proves the lemma.
Lemma 5.11. If 1
Θ is diophantine, then for any P ∈ Ψ1(A) and q ∈ N, q odd,
− P |D|−(n−q) = 0.
Proof. There exist B ∈ D1(A) and p ∈ N0 such that P = BD−2p + R where R is in OP−q−1.
As a consequence,
P |D|−(n−q) =
B|D|−n−2p+q. Assume B = arbrDqr−1ar−1br−1 · · · Dq1a1b1
where r ∈ N, ai ∈ A, bi ∈ JAJ−1, qi ∈ N. If we prove that
B|D|−n−2p+q = 0, then the general
case will follow by linearity. We note ai =:
l ai,l Ul and bi =:
l bi,l Ul. With the shorthand
kµ1,µqi := kµ1 · · · kµqi and γ
µ1,µqi = γµ1 · · · γµqi , we get
Dq1a1b1Uk ⊗ ej =
a1,l1 b1,l′1 Ul1UkUl
(k + l1 + l
1)µ1,µq1 ⊗ γ
µ1,µq1ej
which gives, after iteration,
B Uk ⊗ ej =
ãlb̃lUlr · · ·Ul1UkUl′1 · · ·Ul′r
(k + l̂i + l̂
i)µi1,µiqi
qr−1 · · · γµ
where ãl := a1,l1 · · · ar,lr and b̃l′ := b1,l′1 · · · br,l′r . Let’s note Qµ(k, l, l
′) :=
i=1 (k + l̂i + l̂
i)µi1,µiqi
and γµ := γ
qr−1 · · · γµ
q1 . Thus,
− B |D|−n−2p+q = Res
ãl b̃l′ τ
U−kUlr · · ·Ul1UkUl′1 · · ·Ul′r
) Qµ(k,l,l′)
|k|s+2p+n−q
Tr(γµ) .
Since Ulr · · ·Ul1Uk = UkUlr · · ·Ul1e−i
1 li.Θk, we get
U−kUlr · · ·Ul1UkUl′1 · · ·Ul′r
= δPr
1 li+l
eiφ(l,l
′) e−i
1 li.Θk
where φ is a real valued function. Thus,
− B |D|−n−2p+q = Res
eiφ(l,l
′) δPr
1 li+l
ãl b̃l′
Qµ(k,l,l
′)e−i
1 li.Θk
|k|s+2p+n−q
Tr(γµ)
=: Res
fµ(s)Tr(γ
We decompose Qµ(k, l, l
′) as a sum
h=0Mh,µ(l, l
′)Qh,µ(k) where Qh,µ is a homogeneous poly-
nomial in (k1, · · · , kn) and Mh,µ(l, l′) is a polynomial in
(l1)1, · · · , (lr)n, (l′1)1, · · · , (l′r)n
Similarly, we decompose fµ(s) as
h=0 fh,µ(s). Theorem 2.5 (ii) entails that fh,µ(s) extends
meromorphically to the whole complex plane C with only one possible pole for s+2p+n−q = n+d
where d := deg Qh,µ. In other words, if d+ q− 2p 6= 0, fh,µ(s) is holomorphic at s = 0. Suppose
now d+ q− 2p = 0 (note that this implies that d is odd, since q is odd by hypothesis), then, by
Theorem 2.5 (ii)
fh,µ(s) = V
u∈Sn−1
Qh,µ(u) dS(u)
where V :=
l,l′∈ZMh,µ(l, l
′) eiφ(l,l
′) δPr
1 li+l
,0 ãl b̃l′ and Z := { l, l′ :
i=1 li = 0 }. Since d is
odd, Qh,µ(−u) = −Qh,µ(u) and
u∈Sn−1
Qh,µ(u) dS(u) = 0. Thus, Res
fh,µ(s) = 0 in any case,
which gives the result.
As we have seen, the crucial point of the preceding lemma is the decomposition of the numer-
ator of the series fµ(s) as polynomials in k. This has been possible because we restricted our
pseudodifferential operators to Ψ1(A).
Proof of Proposition 5.5. The top element follows from Proposition 4.9 and according to (20),
− |D|−n = Res
|D|−s−n
= 2mRes
Zn(s + n) =
2m+1πn/2
Γ(n/2)
For the second equality, we get from Lemmas 5.7 and 4.6
s=n−k
ζDA(s) =
r1,··· ,rp=0
h(n− k, r, p)
− εr1(Y ) · · · εrp(Y )|D|−(n−k).
Corollary 4.4 and Lemma 5.11 imply that
εr1(Y ) · · · εrp(Y )|D|−(n−k) = 0, which gives the
result.
Last equality follows from Lemma 5.10 and Corollary 4.11.
6 The spectral action
Here is the main result of this section.
Theorem 6.1. Consider the n-NC-torus
C∞(TnΘ),H,D
where n ∈ N and 1
Θ is a real n×n
skew-symmetric real diophantine matrix, and a selfadjoint one-form A = L(−iAα)⊗ γα. Then,
the full spectral action of DA = D +A+ ǫJAJ−1 is
(i) for n = 2,
S(DA,Φ,Λ) = 4πΦ2Λ2 +O(Λ−2),
(ii) for n = 4,
S(DA,Φ,Λ) = 8π2 Φ4Λ4 − 4π
Φ(0) τ(FµνF
µν) +O(Λ−2),
(iii) More generally, in
S(DA,Φ,Λ) =
Φn−k cn−k(A)Λ
n−k +O(Λ−1),
cn−2(A) = 0, cn−k(A) = 0 for k odd. In particular, c0(A) = 0 when n is odd.
This result (for n = 4) has also been obtained in [20] using the heat kernel method. It is however
interesting to get the result via direct computations of (5) since it shows how this formula is
efficient. As we will see, the computation of all the noncommutative integrals require a lot of
technical steps. One of the main points, namely to isolate where the Diophantine condition on
Θ is assumed, is outlined here.
Remark 6.2. Note that all terms must be gauge invariants, namely, according to (48), invariant
by Aα −→ γu(Aα) = uAαu∗ + uδα(u∗). A particular case is u = Uk where Ukδα(U∗k ) = −ikαU0.
In the same way, note that there is no contradiction with the commutative case where, for any
selfadjoint one-form A, DA = D (so A is equivalent to 0!), since we assume in Theorem 6.1 that
Θ is diophantine, so A cannot be commutative.
Conjecture 6.3. The constant term of the spectral action of DA on the noncommutative n-torus
is proportional to the constant term of the spectral action of D+A on the commutative n-torus.
Remark 6.4. The appearance of a Diophantine condition for Θ has been characterized in di-
mension 2 by Connes [7, Prop. 49] where in this case, Θ = θ
with θ ∈ R. In fact, the
Hochschild cohomology H(AΘ,AΘ∗) satisfies dim Hj(AΘ,AΘ∗) = 2 (or 1) for j = 1 (or j = 2)
if and only if the irrational number θ satisfies a Diophantine condition like |1−ei2πnθ|−1 = O(nk)
for some k.
Recall that when the matrix Θ is quite irrational (see [23, Cor. 2.12]), then the C∗-algebra
generated by AΘ is simple.
Remark 6.5. It is possible to generalize above theorem to the case D = −i gµν δµ ⊗ γν instead
of (43) when g is a positive definite constant matrix. The formulae in Theorem 6.1 are still
valid.
6.1 Computations of
In order to get this theorem, let us prove a few technical lemmas.
We suppose from now on that Θ is a skew-symmetric matrix in Mn(R). No other hypothesis is
assumed for Θ, except when it is explicitly stated.
When A is a selfadjoint one-form, we define for n ∈ N , q ∈ N, 2 ≤ q ≤ n and σ ∈ {−,+}q
+ := ADD−2,
− := ǫJAJ−1DD−2,
σ := Aσq · · ·Aσ1 .
Lemma 6.6. We have for any q ∈ N,
−(ÃD−1)q =
−(ÃDD−2)q =
σ∈{+,−}q
− Aσ.
Proof. Since P0 ∈ OP−∞, D−1 = DD−2 mod OP−∞ and
(ÃD−1)q =
(ÃDD−2)q.
Lemma 6.7. Let A be a selfadjoint one-form, n ∈ N and q ∈ N with 2 ≤ q ≤ n and σ ∈ {−,+}q.
Then ∫
− Aσ =
− A−σ.
Proof. Let us first check that JP0 = P0J . Since DJ = εJD, we get DJP0 = 0 so JP0 =
P0JP0. Since J is an antiunitary operator, we get P0J = P0JP0 and finally, P0J = JP0. As a
consequence, we get JD2 = D2J , JDD−2 = εDD−2J , JA+J−1 = A− and JA−J−1 = A+.
In summary, JAσiJ−1 = A−σi .
The trace property of
now gives
− Aσ =
− Aσq · · ·Aσ1 =
− JAσqJ−1 · · · JAσ1J−1
− A−σq · · ·A−σ1 =
− A−σ.
Definition 6.8. In [4] has been introduced the vanishing tadpole hypothesis:
− AD−1 = 0, for all A ∈ Ω1D(A). (56)
By the following lemma, this condition is satisfied for the noncommutative torus, a fact more or
less already known within the noncommutative community [34].
Lemma 6.9. Let n ∈ N, A = L(−iAα)⊗γα = −i
l∈Zn aα,l Ul⊗γα, Aα ∈ AΘ, { aα,l }l ∈ S(Zn),
be a hermitian one-form. Then,
ApD−q =
(ǫJAJ−1)pD−q = 0 for p ≥ 0 and 1 ≤ q < n (case p = q = 1 is tadpole
hypothesis.)
(ii) If 1
Θ is diophantine, then
BD−q = 0 for 1 ≤ q < n and any B in the algebra generated
by A, [D,A], JAJ−1 and J [D,A]J−1.
Proof. (i) Let us compute ∫
− Ap(ǫJAJ−1)p′D−q.
With A = L(−iAα)⊗ γα and ǫJAJ−1 = R(iAα)⊗ γα, we get
Ap = L(−iAα1) · · ·L(−iAαp)⊗ γα1 · · · γαp
(ǫJAJ−1)p
= R(iAα′1
) · · ·R(iAα′
)⊗ γα′1 · · · γα
We note ãα,l := aα1,l1 · · · aαp,lp . Since
L(−iAα1) · · ·L(−iAαp)R(iAα′1) · · ·R(iAα′p′ )Uk = (−i)
ãα,l ãα′,l′ Ul1 · · ·UlpUkUl′
· · ·Ul′1 ,
Ul1 · · ·UlpUk = UkUl1 · · ·Ulp e−i(
i li).Θk,
we get, with
Ul,l′ := Ul1 · · ·UlpUl′
· · ·Ul′1 ,
gµ,α,α′(s, k, l, l
′) := eik.Θ
kµ1 ...kµq
|k|s+2q
ãα,l ãα′,l′ ,
′,µ := γα1 · · · γαpγα′1 · · · γα
p′γµ1 · · · γµq ,
Ap(ǫJAJ−1)p
D−q|D|−sUk ⊗ ei ∼c (−i)p ip
gµ,α,α′(s, k, l, l
′)UkUl,l′ ⊗ γα,α
′,µei.
Thus,
Ap(ǫJAJ−1)p
D−q = Res
f(s) where
f(s) : = Tr
Ap(ǫJAJ−1)p
D−q|D|−s
∼c (−i)p ip
〈Uk ⊗ ei,
gµ,α,α′(s, k, l, l
′)UkUl,l′ ⊗ γα,α
′,µei〉
∼c (−i)p ip
gµ,α,α′(s, k, l, l
′)Ul,l′
Tr(γµ,α,α
∼c (−i)p ip
gµ,α,α′(s, k, l, l
Ul,l′
Tr(γµ,α,α
It is straightforward to check that the series
k,l,l′gµ,α,α′(s, k, l, l
Ul,l′
is absolutely summable
if ℜ(s) > R for a R > 0. Thus, we can exchange the summation on k and l, l′, which gives
f(s) ∼c (−i)p ip
gµ,α,α′(s, k, l, l
Ul,l′
Tr(γµ,α,α
If we suppose now that p′ = 0, we see that,
f(s) ∼c (−i)p
′ kµ1 ...kµq
|k|s+2q
ãα,l δ
Tr(γµ,α,α
which is, by Proposition 2.16, analytic at 0. In particular, for p = q = 1, we see that
AD−1 = 0,
i.e. the vanishing tadpole hypothesis is satisfied. Similarly, if we suppose p = 0, we get
f(s) ∼c (−i)p
′ kµ1 ...kµq
|k|s+2q
ãα,l′ δPp′
i=1 l
Tr(γµ,α,α
which is holomorphic at 0.
(ii) Adapting the proof of Lemma 5.11 to our setting (taking qi = 0, and adding gamma matrices
components), we see that
− BD−q = Res
eiφ(l,l
′) δPr
1 li+l
ãα,l b̃β,l′
kµ1 ···kµq e
1 li.Θk
|k|s+2q
Tr(γ(µ,α,β))
where γ(µ,α,β) is a complicated product of gamma matrices. By Theorem 2.5 (ii), since we
suppose here that 1
Θ is diophantine, this residue is 0.
6.1.1 Even dimensional case
Corollary 6.10. Same hypothesis as in Lemma 6.9.
(i) Case n = 2:
− AqD−q = −δq,2 4π τ
(ii) Case n = 4: with the shorthand δµ1,...,µ4 := δµ1µ2δµ3µ4 + δµ1µ3δµ2µ4 + δµ1µ4δµ2µ3 ,
− AqD−q = δq,4 π
Aα1 · · ·Aα4
Tr(γα1 · · · γα4γµ1 · · · γµ4)δµ1,...,µ4 .
Proof. (i, ii) The same computation as in Lemma 6.9 (i) (with p′ = 0, p = q = n) gives
− AnD−n = Res
(−i)n
′ kµ1 ...kµn
|k|s+2n
l∈(Zn)n
ãα,lUl1 · · ·Uln
Tr(γα1 · · · γαnγµ1 · · · γµn)
and the result follows from Proposition 2.16.
We will use few notations:
If n ∈ N, q ≥ 2, l := (l1, · · · , lq−1) ∈ (Zn)q−1, α := (α1, · · · , αq) ∈ {1, · · · , n}q, k ∈ Zn\{0},
σ ∈ {−,+}q, (ai)1≤i≤n ∈ (S(Zn))n,
lq := −
1≤j≤q−1
lj , λσ := (−i)q
j=1...q
σj , ãα,l := aα1,l1 . . . aαq ,lq ,
φσ(k, l) :=
1≤j≤q−1
(σj − σq) k.Θlj +
2≤j≤q−1
σj (l1 + . . .+ lj−1).Θlj ,
gµ(s, k, l) :=
kµ1 (k+l1)µ2 ...(k+l1+...+lq−1)µq
|k|s+2|k+l1|2...|k+l1+...+lq−1|2
with the convention
2≤j≤q−1 = 0 when q = 2, and gµ(s, k, l) = 0 whenever l̂i = −k for a
1 ≤ i ≤ q − 1.
Lemma 6.11. Let A = L(−iAα) ⊗ γα = −i
l∈Zn aα,l Ul ⊗ γα where Aα = −A∗α ∈ AΘ and
{ aα,l }l ∈ S(Zn), with n ∈ N, be a hermitian one-form, and let 2 ≤ q ≤ n, σ ∈ {−,+}q.
Then,
Aσ = Res
f(s) where
f(s) :=
l∈(Zn)q−1
φσ(k,l) gµ(s, k, l) ãα,l Tr(γ
αqγµq · · · γα1γµ1).
Proof. By definition,
Aσ = Res
f(s) where
Tr(Aσq · · ·Aσ1 |D|−s) ∼c
〈Uk ⊗ ei, |k|−s Aσq · · ·Aσ1Uk ⊗ ei〉 =: f(s).
Let r ∈ Zn and v ∈ C2m . Since A = L(−iAα)⊗ γα, and ǫJAJ−1 = R(iAα)⊗ γα, we get
+Ur ⊗ v = ADD−2Ur ⊗ v = A rµ|r|2+δr,0Ur ⊗ γ
µv = −i rµ
|r|2+δr,0
AαUr ⊗ γαγµv ,
−Ur ⊗ v = ǫJAJ−1DD−2Ur ⊗ v = ǫJAJ−1 rµ|r|2+δr,0Ur ⊗ γ
µv = i
|r|2+δr,0
UrAα ⊗ γαγµv.
With UlUr = e
Ur+l and UrUl = e
Ur+l, we obtain, for any 1 ≤ j ≤ q,
σjUr ⊗ v =
(−σj) i eσj
r.Θl rµ
|r|2+δr,0
aα,l Ur+l ⊗ γαγµv.
We now apply q times this formula to get
|k|−sAσq · · ·Aσ1Uk ⊗ ei = λσ
l∈(Zn)q
φσ(k,l) gµ(s, k, l) ãα,l Uk+
⊗ γαqγµq · · · γα1γµ1ei
φσ(k, l) := σ1 k.Θl1 + σ2 (k + l1).Θl2 + . . .+ σq (k + l1 + . . . + lq−1).Θlq.
Thus,
f(s) =
l∈(Zn)q
φσ(k,l) gµ(s, k, l) ãα,l U
Tr(γαqγµq · · · γα1γµ1)
l∈(Zn)q
φσ(k,l) gµ(s, k, l) ãα,l δ(
lj)Tr(γ
αqγµq · · · γα1γµ1)
l∈(Zn)q−1
φσ(k,l) gµ(s, k, l) ãα,l Tr(γ
αqγµq · · · γα1γµ1)
where in the last sum lq is fixed to −
1≤j≤q−1 lj and thus,
φσ(k, l) =
1≤j≤q−1
(σj − σq) k.Θlj +
2≤j≤q−1
σj (l1 + . . .+ lj−1).Θlj .
By Lemma 2.10, there exists a R > 0 such that for any s ∈ C with ℜ(s) > R, the family
φσ(k,l) gµ(s, k, l) ãα,l
(k,l)∈(Zn\{ 0 })×(Zn)q−1
is absolutely summable as a linear combination of families of the type considered in that lemma.
As a consequence, we can exchange the summations on k and l, which gives the result.
In the following, we will use the shorthand
c := 4π
Lemma 6.12. Suppose n = 4. Then, with the same hypothesis of Lemma 6.11,
(i) 1
−(A+)2 = 1
−(A−)2 = c
aα1,l aα2,−l
lα1 lα2 − δα1α2 |l|2
(ii) − 1
−(A+)3 = −1
−(A−)3 = 4c
li∈Z4
aα3,−l1−l2 a
aα1,l1 sin
l1.Θl2
(iii) 1
−(A+)4 = 1
−(A−)4 = 2c
li∈Z4
aα1,−l1−l2−l3 aα2,l3 a
aα2l1 sin
l1.Θ(l2+l3)
sin l2.Θl3
(iv) Suppose 1
Θ diophantine. Then the crossed terms in
(A+ + A−)q vanish: if C is the
set of all σ ∈ {−,+}q with 2 ≤ q ≤ 4, such that there exist i, j satisfying σi 6= σj, we have∑
Aσ = 0.
Proof. (i) Lemma 6.11 entails that
A++ = Res
l∈Zn −f(s, l) where
f(s, l) :=
′ kµ1 (k+l)µ2
|k|s+2|k+l|2
ãα,l Tr(γ
α2γµ2γα1γµ1) and ãα,l := aα1,l aα2,−l .
We will now reduce the computation of the residue of an expression involving terms like |k+l|2 in
the denominator to the computation of residues of zeta functions. To proceed, we use (16) into
an expression like the one appearing in f(s, l). We see that the last term on the righthandside
yields a Zn(s) while the first one is less divergent by one power of k. If this is not enough,
we repeat this operation for the new factor of |k + l|2 in the denominator. For f(s, l), which
is quadratically divergent at s = 0, we have to repeat this operation three times before ending
with a convergent result. All the remaining terms are expressible in terms of Zn functions. We
get, using three times (16),
|k+l|2
− 2k.l+|l|
(2k.l+|l|2)2
− (2k.l+|l|
|k|6|k+l|2
. (57)
Let us define
fα,µ(s, l) :=
′ kµ1 (k+l)µ2
|k|s+2|k+l|2
ãα,l
so that f(s, l) = fα,µ(s, l)Tr(γ
α2γµ2γα1γµ1). Equation (57) gives
fα,µ(s, l) = f1(s, l)− f2(s, l) + f3(s, l)− r(s, l)
with obvious identifications. Note that the function
r(s, l) =
′ kµ1 (k+l)µ2 (2kl+|l|
|k|s+8|k+l|2
ãα,l
is a linear combination of functions of the typeH(s, l) satisfying the hypothesis of Corollary 2.13.
Thus, r(s, l) satisfies (H1) and with the previously seen equivalence relation modulo functions
satisfying this hypothesis we get fα,µ(s, l) ∼ f1(s, l)− f2(s, l) + f3(s, l).
Let’s now compute f1(s, l).
f1(s, l) =
′ kµ1 (k+l)µ2
|k|s+4
ãα,l = ãα,l
′ kµ1kµ2
|k|s+4
Proposition 2.1 entails that s 7→
′ kµ1kµ2
|k|s+4
is holomorphic at 0. Thus, f1(s, l) satisfies (H1),
and fα,µ(s, l) ∼ −f2(s, l) + f3(s, l).
Let’s now compute f2(s, l) modulo (H1). We get, using several times Proposition 2.1,
f2(s, l) =
′ kµ1 (k+l)µ2 (2kl+|l|
|k|s+6
ãα,l =
′ (2kl)kµ1kµ2+(2kl)kµ1 lµ2+|l|
2kµ1kµ2+lµ2 |l|
|k|s+6
ãα,l
∼ 0 +
′ (2kl)kµ1 lµ2
|k|s+6
ãα,l +
′ |l|2kµ1kµ2
|k|s+6
ãα,l + 0 .
Recall that
|k|s+6
Zn(s + 4). Thus,
f2(s, l) ∼ 2lilµ2 ãα,l
Zn(s+ 4) + |l|2 ãα,l
δµ1µ2
Zn(s+ 4).
Finally, let us compute f3(s, l) modulo (H1) following the same principles:
f3(s, l) =
′ kµ1 (k+l)µ2 (2kl+|l|
|k|s+8
ãα,l
′ (2kl)2kµ1kµ2+(2kl)
2kµ1 lµ2+|l|
4kµ1kµ2+|l|
4kµ1 lµ2+(4kl)|l|
2kµ1kµ2+(4kl)|l|
2kµ1 lµ2
|k|s+8
ãα,l
∼ 4lilj
′ kikjkµ1kµ2
|k|s+8
ãα,l + 0.
In conclusion,
fα,µ(s, l) ∼ −14(2lµ1 lµ2 + |l|
2 δµ1µ2)ãα,lZn(s+ 4) + 4l
ilj ãα,l
′ kikjkµ1kµ2
|k|s+8
=: gα,µ(s, l).
Proposition (2.1) entails that Zn(s+ 4) and s 7→
′ kikjkµ1kµ2
|k|s+8
extend holomorphically in a
punctured open disk centered at 0. Thus, gα,µ(s, l) satisfies (H2) and we can apply Lemma 2.14
to get
−(A+)2 = Res
f(s, l) =
gα,µ(s, l)Tr(γ
α2γµ2γα1γµ1) =:
g(s, l).
The problem is now reduced to the computation of Res
g(s, l). Recall that Ress=0 Z4(s+4) = 2π
by (20) or (17), and
Ress=0
′ kikjklkm
|k|s+8
= (δijδlm + δilδjm + δimδjl)
Thus,
gα,µ(s, l) = −π
ãα,l (lµ1 lµ2 +
|l|2δµ1µ2).
We will use
Tr(γµ1 · · · γµ2j ) = Tr(1)
all pairings of { 1···2j }
s(P ) δµP1µP2 δµP3µP4 · · · δµP2j−1µP2j (58)
where s(P ) is the signature of the permutation P when P2m−1 < P2m for 1 ≤ m ≤ n. This gives
Tr(γα2γµ2γα1γµ1) = 2m(δα2µ2δα1µ1 − δα1α2δµ2µ1 + δα2µ1δµ2α1). (59)
Thus,
g(s, l) = −c ãα,l (lµ1 lµ2 + 12 |l|
2δµ1µ2)(δ
α2µ2δα1µ1 − δα1α2δµ2µ1 + δα2µ1δµ2α1)
= −2c ãα,l (lα1 lα2 − δα1α2 |l|2).
Finally,
−(A+)2 = 1
−(A−)2 = c
aα1,l aα2,−l
lα1 lα2 − δα1α2 |l|2
(ii) Lemma 6.11 entails that
A+++ = Res
(l1,l2)∈(Zn)2
f(s, l) where
f(s, l) :=
l1Θl2 kµ1 (k+l1)µ2 (k+
bl2)µ3
|k|s+2|k+l1|2|k+bl2|2
ãα,l Tr(γ
α3γµ3γα2γµ2γα1γµ1)
=: fα,µ(s, l)Tr(γ
α3γµ3γα2γµ2γα1γµ1),
and ãα,l := aα1,l1 aα2,l2 aα3,−bl2
with l̂2 := l1 + l2.
We use the same technique as in (i):
|k+l1|2
− 2k.l1+|l1|
(2k.l1+|l1|
|k|4|k+l1|2
|k+bl2|2
|k|2 −
2k.bl2+|bl2|
(2k.bl2+|bl2|
|k|4|k+bl2|2
and thus,
|k+l1|2|k+bl2|2
− 2k.l1
− 2k.bl2
+R(k, l) (60)
where the remain R(k, l) is a term of order at most −6 in k. Equation (60) gives
fα,µ(s, l) = f1(s, l) + r(s, l)
where r(s, l) corresponds to R(k, l). Note that the function
r(s, l) =
l1Θl2 kµ1(k+l)µ2 (k+
bl2)µ3R(k,l)
|k|s+2
ãα,l
is a linear combination of functions of the type H(s, l) satisfying the hypothesis of Corollary
(2.13). Thus, r(s, l) satisfies (H1) and fα,µ(s, l) ∼ f1(s, l).
Let us compute f1(s, l) modulo (H1)
f1(s, l) =
l1Θl2 kµ1 (k+l1)µ2 (k+
bl2)µ3
|k|s+6
ãα,l −
l1Θl2 kµ1 (k+l1)µ2 (k+
bl2)µ3 (2k.l1+2k.
|k|s+8
ãα,l
l1Θl2 kµ1kµ2
bl2µ3+kµ1kµ3 l1µ2
|k|s+6
ãα,l −
l1Θl2 kµ1kµ2kµ3 (2k.l1+2k.
|k|s+8
ãα,l
= i e
l1Θl2 ãα,l
(l1µ2δµ1µ3 + l̂2µ3δµ1µ2)
Z4(s+ 4)− 2(li1 + l̂i2)
′ kµ1kµ2kµ3ki
|k|s+8
=: gα,µ(s, l).
Since gα,µ(s, l) satisfies (H2), we can apply Lemma 2.14 to get
−(A+)3 = Res
(l1,l2)∈(Zn)2
f(s, l)
(l1,l2)∈(Zn)2
gα,µ(s, l)Tr(γ
α3γµ3γα2γµ2γα1γµ1) =:
Recall that l3 := −l1 − l2 = −l̂2. By (17) and (19),
gα,µ(s, l)i e
l1Θl2 ãα,l
2(−li1 + li3)π
(δµ1µ2δµ3i + δµ1µ3δµ2i + δµ1iδµ2µ3)
+ (l1µ2δµ1µ3 − l3µ3δµ1µ2)π
We decompose Xl in five terms: Xl = 2
l1Θl2 ãα,l (T1 + T2 + T3 + T4 + T5) where
T0 :=
(−li1 + li3)(δµνδρi + δµρδνi + δµiδνρ) + l1νδµρ − l3ρδµν ,
T1 := (δ
α3ρδα2νδα1µ − δα3ρδα2α1δµν + δα3ρδα2µδα1ν)T0,
T2 := (−δα2α3δρνδα1µ + δα2α3δα1ρδµν − δα2α3δρµδα1ν)T0,
T3 := (δ
α3νδα2ρδα1µ − δα3νδα1ρδα2µ + δα3νδρµδα1α2)T0,
T4 := (−δα1α3δα2ρδµν + δα1α3δρνδα2µ − δα1α3δρµδα2ν)T0,
T5 := (δ
α3µδα2ρδα1ν − δα3µδρνδα1α2 + δα3µδα1ρδα2ν)T0.
With the shorthand p := −l1 − 2l3, q := 2l1 + l3, r := −p− q = −l1 + l3, we compute each Ti,
and find
3T1 = δ
α1α2(2− 2m)pα3 + δα3α1qα2 − δα2α1qα3 + δα3α2qα1 + δα3α2rα1 − δα2α1rα3 + δα3α1rα2 ,
3T2 = (2
m − 2)δα2α3pα1 − 2mδα2α3qα1 − 2mδα2α3rα1 ,
3T3 = δ
α1α3pα2 − δα2α3pα1 + δα1α2pα3 + 2mδα2α1qα3 + δα3α2rα1 − δα3α1rα2 + δα1α2rα3 ,
3T4 = −δα1α32mpα2 − δα1α32mqα2 + δα1α3(2m − 2)rα2 ,
3T5 = δ
α1α3pα2 − δα1α2pα3 + δα3α2pα1 + δα3α2qα1 − δα1α2qα3 + δα3α1qα2 + (2− 2m)δα1α2rα3 .
Thus,
Xl = 2
m 2π2
l1.Θl2 ãα,l (q
α3δα1α2 + rα2δα1α3 + pα1δα2α3) (61)
−(A+)3 = i 2c (S1 + S2 + S3),
where S1, S2 and S3 correspond to respectively q
α3δα1α2 , rα2δα1α3 and pα1δα2α3 . In S1, we
permute the li variables the following way: l1 7→ l3, l2 7→ l1, l3 7→ l2. Therefore, l3.Θ l1 7→ l3.Θ l1
and q 7→ r. With a similar permutation of the αi, we see that S1 = S2. We apply the same
principles to prove that S1 = S3 (using permutation l1 7→ l2, l2 7→ l3, l3 7→ l1). Thus,
−(A+)3 = i 2c
ãα,l e
l1.Θl2 (l1 − l2)α3δα1α2 = S4 − S5,
where S4 correspond to l1 and S5 to l2. We permute the li variables in S5 the following way:
l1 7→ l2, l2 7→ l1, l3 7→ l3, with a similar permutation on the αi. Since l1.Θ l2 7→ −l1.Θ l2, we
finally get
−(A+)3 = −4c
aα1,l1 aα2,l2 aα3,−l1−l2 sin
l1.Θl2
lα31 δ
α1α2 .
(iii) Lemma 6.11 entails that
A++++ = Res
(l1,l2,l3)∈(Zn)3
fµ,α(s, l)Tr γ
µ,α where
θ := l1.Θl2 + l1.Θl3 + l2.Θl3,
Tr γµ,α := Tr(γα4γµ4γα3γµ3γα2γµ2γα1γµ1),
fµ,α(s, l) :=
θ kµ1 (k+l1)µ2 (k+
bl2)µ3 (k+
bl3)µ4
|k|s+2|k+l1|2|k+bl2|2|k+bl3|2
ãα,l,
ãα,l := aα1,l1 aα2,l2 aα3,l3 aα4,−l1−l2−l3 .
Using (16) and Corollary 2.13 successively, we find
fµ,α(s, l) ∼
θ kµ1kµ2kµ3kµ4
|k|s+2|k+l1|2|k+l1+l2|2|k+l1+l2+l3|2
ãα,l ∼
θ kµ1kµ2kµ3kµ4
|k|s+8
ãα,l.
Since the function
θ kµ1kµ2kµ3kµ4
|k|s+8
ãα,l satisfies (H2), Lemma 2.14 entails that
−(A+)4 =
(l1,l2,l3)∈(Zn)3
ãα,l Res
′ kµ1kµ2kµ3kµ4
|k|s+8
Tr γµ,α =:
Therefore, with (19), we get Xl =
ãα,l e
(A+B + C), where
A := Tr(γα4γµ4γα3γµ4γ
α2γµ2γα1γµ2),
B := Tr(γα4γµ4γα3γµ2γα2γµ4γ
α1γµ2),
C := Tr(γα4γµ4γα3γµ2γ
α2γµ2γα1γµ4).
Using successively {γµ, γν} = 2δµν and γµγµ = 2m 12m , we see that
A = C = 4 Tr(γα4γα3γα2γα1),
B = −4
Tr(γα4γα3γα1γα2) + Tr(γα4γα2γα3γα1)
Thus, A+B +C = 8 2m
δα4α3δα2α1 + δα4α1δα3α2 − 2δα4α2δα3α1
, and
ãα,l
δα4α3δα2α1 + δα4α1δα3α2 − 2δα4α2δα3α1
. (62)
By (62), we get ∫
−(A+)4 = 2c (−2T1 + T2 + T3),
where
T1 :=
l1,...,l4
aα4,l4 aα3,l3 aα2,l2 aα1,l1 e
δα4α2 δα3α1 ,
T2 :=
l1,...,l4
aα4,l4 aα3,l3 aα2,l2 aα1,l1 e
δα4α3 δα2α1 ,
T3 :=
l1,...,l4
aα4,l4 aα3,l3 aα2,l2 aα1,l1 e
δα4α1 δα3α2 .
We now proceed to the following permutations of the li variables in the T1 term : l1 7→ l2,
l2 7→ l1, l3 7→ l4, l4 7→ l3. While
i li is invariant, θ is modified : θ 7→ l2.Θl1 + l2.Θl4 + l1.Θl4.
With δ0,
in factor, we can let l4 be −l1 − l2 − l3, so that θ 7→ −θ. We also permute the αi
in the same way. Thus,
l1,...,l4
aα3,l3 aα4,l4 aα1,l1 aα2,l2 e
δα3α1 δα4α2 .
Therefore,
2T1 = 2
l1,...,l4
aα4,l4 aα3,l3 aα2,l2 aα1,l1 cos
δα4α2 δα3α1 . (63)
The same principles are applied to T2 and T3. Namely, the permutation l1 7→ l1, l2 7→ l3, l3 7→ l2,
l4 7→ l4 in T2 and the permutation l1 7→ l2, l2 7→ l3, l3 7→ l1, l4 7→ l4 in T3 (the αi variables are
permuted the same way) give
l1,...,l4
aα4,l4aα3,l3aα2,l2 aα1,l1 e
δα4α2 δα3α1 ,
l1,...,l4
aα4,l4 aα3,l3aα2,l2 aα1,l1 e
δα4α2 δα3α1
where φ := l1.Θ l2 + l1.Θ l3 − l2.Θ l3. Finally, we get
−(A+)4 = 4c
l1,...,l4
aα1,l4 aα2,l3 a
− cos θ
l1,...,l3
aα1,−l1−l2−l3 aα2,l3 a
l1.Θ(l2+l3)
sin l2.Θl3
. (64)
(iv) Suppose q = 2. By Lemma 6.11, we get
− Aσ = Res
λσfα,µ(s, l)Tr(γ
α2γµ2γα1γµ1)
where
fα,µ(s, l) :=
′ kµ1 (k+l)µ2
|k|s+2|k+l|2
eiη k.Θl ãα,l
and η := 1
(σ1 − σ2) ∈ {−1, 1}. As in the proof of (i), since the presence of the phase does not
change the fact that r(s, l) satisfies (H1), we get
fα,µ(s, l) ∼ f1(s, l)− f2(s, l) + f3(s, l)
where
f1(s, l) =
′ kµ1(k+l)µ2
|k|s+4
eiη k.Θl ãα,l,
f2(s, l) =
′ kµ1(k+l)µ2 (2k.l+|l|
|k|s+6
eiη k.Θl ãα,l,
f3(s, l) =
′ kµ1(k+l)µ2 (2k.l+|l|
|k|s+8
eiη k.Θl ãα,l.
Suppose that l = 0. Then f2(s, 0) = f3(s, 0) = 0 and Proposition 2.1 entails that
f1(s, 0) =
kµ1kµ2
|k|s+4
ãα,0
is holomorphic at 0 and so is fα,µ(s, 0).
Since 1
Θ is diophantine, Theorem 2.5 3 gives us the result.
Suppose q = 3. Then Lemma 6.11 implies that
− Aσ = Res
l∈(Zn)2
fµ,α(s, l) Tr(γ
µ3γα3 · · · γµ1γα1)
where
fµ,α(s, l) :=
ik.Θ(ε1l1+ε2l2)e
σ2l1.Θl2 kµ1 (k+l1)µ2 (k+l1+l2)µ3
|k|s+2|k+l1|2|k+l1+l2|2
ãα,l,
and εi :=
(σi − σ3) ∈ {−1, 0, 1}. By hypothesis (ε1, ε2) 6= (0, 0). There are six possibilities
for the values of (ε1, ε2), corresponding to the six possibilities for the values of σ: (−,−,+),
(−,+,+), (+,−,+), (+,+,−), (−,+,−), and (+,−,−). As in (ii), we see that
fµ,α(s, l) ∼
′ eik.Θ(ε1l1+ε2l2)kµ1 (k+l1)µ2 (k+
bl2)µ3
|k|s+6
′ eik.Θ(ε1l1+ε2l2)kµ1 (k+l1)µ2 (k+
bl2)µ3 (2k.l1+2k.
|k|s+8
λσ ãα,l e
σ2l1.Θl2 .
With Z := {(l1, l2) : ε1l1 + ε2l2 = 0}, Theorem 2.5 (iii) entails that
l∈(Zn)2\Z fµ,α(s, l) is
holomorphic at 0. To conclude we need to prove that
g(σ) :=
fµ,α(s, l) Tr(γ
µ3γα3 · · · γµ1γα1)
is holomorphic at 0. By definition, λσ = iσ1σ2σ3 and as a consequence, we check that
g(−,−,+) = −g(+,+,−), g(+,−,+) = −g(+,−,−), g(−,+,+) = −g(−,+,−),
which implies that
σ g(σ) = 0. The result follows.
Suppose finally that q = 4. Again, Lemma 6.11 implies that
− Aσ = Res
l∈(Zn)3
fµ,α(s, l) Tr(γ
µ4γα4 · · · γµ1γα1)
where
fµ,α(s, l) :=
i=1 εili e
(σ2l1.Θl2+σ3(l1+l2).Θl3) kµ1 (k+l1)µ2 (k+l1+l2)µ3 (k+l1+l2+l3)µ4
|k|s+2|k+l1|2|k+l1+l2|2|k+l1+l2+l3|2
ãα,l
and εi :=
(σi − σ4) ∈ {−1, 0, 1}. By hypothesis (ε1, ε2, ε3) 6= (0, 0, 0). There are fourteen pos-
sibilities for the values of (ε1, ε2, ε3), corresponding to the fourteen possibilities for the values of
σ: (−,−,−,+), (−,−,+,+), (−,+,−,+), (+,−,−,+), (−,+,+,+), (+,−,+,+), (+,+,−,+),
(+,+,+,−), (−,−,+,−), (−,+,−,−), (+,−,−,−), (−,+,+,−), (+,−,+,−) and (+,+,−,−).
As in (ii), we see that, with the shorthand θσ := σ2l1.Θl2 + σ3(l1 + l2).Θl3,
fµ,α(s, l) ∼
i=1 εili e
θσ kµ1kµ2kµ3kµ4
|k|s+8
ãα,l =: gµ,α(s, l) .
With Zσ := {(l1, l2, l3) :
i=1 εili = 0}, Theorem 2.5 (iii), the series
l∈(Zn)3\Zσ
fµ,α(s, l) is
holomorphic at 0. To conclude, we need to prove that
g(σ) :=
gµ,α(s, l) Tr(γ
µ4γα4 · · · γµ1γα1) = 0.
Let C be the set of the fourteen values of σ and C7 be the set of the seven first values of σ given
above. Lemma 6.7 implies ∑
g(σ) = 2
g(σ).
Thus, in the following, we restrict to these seven values. Let us note Fµ(s) :=
kµ1kµ2kµ3kµ4
|k|s+8
so that
g(σ) = Res
Fµ(s)λσ
θσ ãα,l Tr(γ
µ4γα4 · · · γµ1γα1).
Recall from (62) that
Fµ(s)Tr(γ
µ4γα4 · · · γµ1γα1) = 2c
δα4α3δα2α1 + δα4α1δα3α2 − 2δα4α2δα3α1
As a consequence, we get, with ãα,l := aα1,l1 · · · aα4,l4 ,
g(σ) = 2cλσ
l∈(Zn)4
θσ ãα,l δP4
i=1 li,0
i=1 εili,0
δα4α3δα2α1 + δα4α1δα3α2 − 2δα4α2δα3α1
=: 2cλσ(T1 + T2 − 2T3).
We proceed to the following change of variable in T1: l1 7→ l1, l2 7→ l3, l3 7→ l2, l4 7→ l4. Thus,
we get θσ 7→ ψσ := σ2l1.Θl3 + σ3(l1 + l3).Θl2, and
i=1 εili 7→ ε1l1 + ε3l2 + ε2l3 =: uσ(l). With
a similar permutation on the αi, we get
l∈(Zn)4
ψσ ãα,l δP4
i=1 li,0
δε1l1+ε3l2+ε2l3,0 δ
α4α2δα3α1 .
We proceed to the following change of variable in T2: l1 7→ l2, l2 7→ l3, l3 7→ l1, l4 7→ l4. Thus,
we get θσ 7→ φσ := σ2l2.Θl3 + σ3(l2 + l3).Θl1, and
i=1 εili 7→ ε3l1 + ε1l2 + ε2l3 =: vσ(l). After
a similar permutation on the αi, we get
l∈(Zn)4
φσ ãα,l δP4
i=1 li,0
δε3l1+ε1l2+ε2l3,0 δ
α4α2δα3α1 .
Finally, we proceed to the following change of variable in T3: l1 7→ l2, l2 7→ l1, l3 7→ l4, l4 7→ l3.
Thus, we get θσ 7→ −θσ, and
i=1 εili 7→ (ε2− ε3)l1+(ε1− ε3)l2− ε3l3 =: wσ(l). With a similar
permutation on the αi, we get
l∈(Zn)4
θσ ãα,l δP4
i=1 li,0
δ(ε2−ε3)l1+(ε1−ε3)l2−ε3l3,0δ
α4α2δα3α1 .
As a consequence, we get
g(σ) = 2c
l∈(Zn)4
Kσ(l1, l2, l3) ãα,l δP4
i=1 li,0
δα4α2δα3α1 ,
where Kσ(l1, l2, l3) = λσ
ψσ δuσ(l),0 + e
φσ δvσ(l),0 − e
θσ δP3
i=1 εili,0
θσ δwσ(l),0
The computation of Kσ(l1, l2, l3) for the seven values of σ yields
K−−++(l1, l2, l3) = δl1+l3,0 + δl2+l3,0 − δl1+l2,0 − δl1+l2,0,
K−+−+(l1, l2, l3) = δl1+l2,0 + δl1+l2,0 − δl1+l3,0 − δl1+l3,0,
K−−++(l1, l2, l3) = δl2+l3,0 + δl1+l3,0 − δl2+l3,0 − δl2+l3,0,
K−−−+(l1, l2, l3) = −
l1.Θl2δP3
i=1 li,0
l2.Θl1δP3
i=1 li,0
l2.Θl1δP3
i=1 li,0
l1.Θl2δl3,0
K−+++(l1, l2, l3) = −
l3.Θl2δl1,0 + e
l3.Θl1δl2,0 − e
l2.Θl3δl1,0 − e
l3.Θl1δl2,0
K+−++(l1, l2, l3) = −
l1.Θl2δl3,0 + e
l2.Θl1δl3,0 − e
l1.Θl3δl2,0 − e
l3.Θl2δl1,0
K++−+(l1, l2, l3) = −
l1.Θl3δl2,0 + e
l2.Θl3δl1,0 − e
l1.Θl2δl3,0 − e
l2.Θl1δP3
i=1 li,0
Thus, ∑
Kσ(l1, l2, l3) = 2i(δl3,0 − δP3
i=1 li,0
) sin l1.Θl2
and ∑
g(σ) = i4c
l∈(Zn)4
(δl3,0 − δP3
i=1 li,0
) sin l1.Θl2
ãα,l δP4
i=1 li,0
δα4α2δα3α1 .
The following change of variables: l1 7→ l2, l1 7→ l2, l3 7→ l4, l4 7→ l3 gives
l∈(Zn)4
1 li,0
sin l1.Θl2
ãα,l δP4
1 li,0
δα4α2δα3α1 = −
l∈(Zn)4
δl3,0 sin
l1.Θl2
ãα,l δP4
1 li,0
δα4α2δα3α1
g(σ) = i8c
l∈(Zn)4
δl3,0 sin
l1.Θl2
ãα,l δP4
1 li,0
δα4α2δα3α1 .
Finally, the change of variables: l2 7→ l4, l4 7→ l2 gives
l∈(Zn)4
δl3,0 sin
l1.Θl2
ãα,l δP4
1 li,0
δα4α2δα3α1 = −
l∈(Zn)4
δl3,0 sin
l1.Θl2
ãα,l δP4
1 li,0
δα4α2δα3α1
which entails that
g(σ) = 0.
Lemma 6.13. Suppose n = 4 and 1
Θ diophantine. For any self-adjoint one-form A,
ζDA(0) − ζD(0) = −c τ(Fα1,α2Fα1α2).
Proof. By (34) and Lemma 6.6 we get
ζDA(0) − ζD(0) =
(−1)q
σ∈{+,−}q
− Aσ.
By Lemma 6.12 (iv), we see that the crossed terms all vanish. Thus, with Lemma 6.7, we get
ζDA(0)− ζD(0) = 2
(−1)q
−(A+)q. (65)
By definition,
Fα1α2 = i
aα2,k kα1 − aα1,k kα2
aα1,k aα2,l [Uk, Ul]
(aα2,k kα1 − aα1,k kα2)− 2
aα1,k−l aα2,l sin(
τ(Fα1α2F
α1α2) =
α1, α2=1
(aα2,k kα1 − aα1,k kα2)− 2
l′∈Z4
aα1,k−l′ aα2,l′ sin(
k.Θl′
(aα2,−k kα1 − aα1,−k kα2)− 2
l”∈Z4
aα1,−k−l” aα2,l” sin(
k.Θl”
One checks that the term in aq of τ(Fα1α2F
α1α2) corresponds to the term
(A+)q given by
Lemma 6.12. For q = 2, this is
l∈Z4, α1, α2
aα1,l aα2,−l
lα1 lα2 − δα1α2 |l|2
For q = 3, we compute the crossed terms:
k,k′,l
(aα2,k kα1 − aα1,k kα2) a
Uk[Uk′ , l] + [Uk′ , Ul]Uk
which gives the following a3-term in τ(Fα1α2F
α1α2)
aα3,−l1−l2 a
aα1,l1 sin
l1.Θl2
For q = 4, this is
aα1,−l1−l2−l3 aα2,l3 a
l1.Θ(l2+l3)
sin l2.Θl3
which corresponds to the term
(A+)4. We get finally,
(−1)q
−(A+)q = − c
τ(Fα1,α2F
α1α2). (66)
Equations (65) and (66) yield the result.
Lemma 6.14. Suppose n = 2. Then, with the same hypothesis as in Lemma 6.11,
−(A+)2 =
−(A−)2 = 0.
(ii) Suppose 1
Θ diophantine. Then
− A+A− =
− A−A+ = 0.
Proof. (i) Lemma 6.11 entails that
A++ = Res
l∈Z2 −f(s, l) where
f(s, l) :=
kµ1 (k+l)µ2
|k|s+2|k+l|2
ãα,l Tr(γ
α2γµ2γα1γµ1) =: fµ,α(s, l)Tr(γ
α2γµ2γα1γµ1)
and ãα,l := aα1,l aα2,−l. This time, since n = 2, it is enough to apply just once (16) to obtain an
absolutely convergent series. Indeed, we get with (16)
fµ,α(s, l) =
′ kµ1 (k+l)µ2
|k|s+4
ãα,l −
′ kµ1 (k+l)µ2 (2k.l+|l|
|k|s+4|k+l|2
ãα,l.
and the function r(s, l) :=
kµ1 (k+l)µ2 (2k.l+|l|
|k|s+4|k+l|2
ãα,l is a linear combination of functions of
the type H(s, l) satisfying the hypothesis of Corollary 2.13. As a consequence, r(s, l) satisfies
(H1) and
fµ,α(s, l) ∼
′ kµ1 (k+l)µ2
|k|s+4
ãα,l ∼
′ kµ1kµ2
|k|s+4
ãα,l
Note that the function (s, l) 7→ hµ,α(s, l) :=
kµ1kµ2
|k|s+4
ãα,l satisfies (H2). Thus, Lemma 2.14
yields
f(s, l) =
hµ,α(s, l)Tr(γ
α2γµ2γα1γµ1).
By Proposition 2.16, we get Res
hµ,α(s, l) = δµ1µ2 π ãα,l. Therefore,
− A++ = −π
ãα,l Tr(γ
α2γµγα1γµ) = 0
according to (59).
(ii) By Lemma 6.11, we obtain that
A−+ = Res
l∈Z2 λσfα,µ(s, l)Tr(γ
α2γµ2γα1γµ1) where
λσ = −(−i)2 = 1 and
fα,µ(s, l) :=
′ kµ1 (k+l)µ2
|k|s+2|k+l|2
eiη k.Θl ãα,l
and η := 1
(σ1 − σ2) = −1. As in the proof of (i), since the presence of the phase does not
change the fact that r(s, l) satisfies (H1), we get
fα,µ(s, l) ∼
′ kµ1 (k+l)µ2
|k|s+4
eiη k.Θl ãα,l := gα,µ(s, l) .
Since 1
Θ is diophantine, the functions s 7→
l∈Z2\{0} gα,µ(s, l) are holomorphic at s = 0 by
Theorem 2.5 3. As a consequence,
− A−+ = Res
gα,µ(s, 0)Tr(γ
α2γµ2γα1γµ1) = Res
′ kµ1kµ2
|k|s+4
ãα,0 Tr(γ
α2γµ2γα1γµ1).
Recall from Proposition 2.1 that Ress=0
|k|s+4
= δij π. Thus, again with (59),
− A−+ = ãα,0 π Tr(γα2γµγα1γµ) = 0.
Lemma 6.15. Suppose n = 2 and 1
Θ diophantine. For any self-adjoint one-form A,
ζDA(0) − ζD(0) = 0.
Proof. As in Lemma 6.13, we use (34) and Lemma 6.6 so the result follows from Lemma 6.14.
6.1.2 Odd dimensional case
Lemma 6.16. Suppose n odd and 1
Θ diophantine. Then for any self-adjoint 1-form A and
σ ∈ {−,+}q with 2 ≤ q ≤ n, ∫
− Aσ = 0 .
Proof. Since Aσ ∈ Ψ1(A), Lemma 5.11 with k = n gives the result.
Corollary 6.17. With the same hypothesis of Lemma 6.16, for any self-adjoint one-form A,
ζDA(0)− ζD(0) = 0.
Proof. As in Lemma 6.13, we use (34) and Lemma 6.6 so the result follows from Lemma 6.16.
6.2 Proof of the main result
Proof of Theorem 6.1. (i) By (5) and Proposition 5.5, we get
S(DA,Φ,Λ) = 4πΦ2Λ2 +Φ(0) ζDA(0) +O(Λ−2),
where Φ2 =
Φ(t) dt. By Lemma 6.15, ζDA(0) − ζD(0) = 0 and from Proposition 5.4,
ζD(0) = 0, so we get the result.
(ii) Similarly, S(DA,Φ,Λ) = 8π2 Φ4Λ4+Φ(0) ζDA(0)+O(Λ−2) with Φ4 = 12
Φ(t) t dt. Lemma
6.13 implies that ζDA(0)−ζD(0) = −c τ(FµνFµν) and by Proposition 5.4, ζDA(0) = −c τ(FµνFµν)
leading to the result.
(iii) is a direct consequence of (5), Propositions 5.4, 5.5, and Corollary 6.17.
A Appendix
A.1 Proof of Lemma 3.3
(i) We have |D|T |D|−1 = T + δ(T )|D|−1 and |D|−1T |D| = T − |D|−1δ(T ). A recurrence
proves that for any k ∈ N, |D|kT |D|−k =
δq(T )|D|−q and we get |D|−kT |D|k =
q=0(−1)q
|D|−qδq(T ).
As a consequence, since T , |D|−q and δq(T ) are in OP 0 for any q ∈ N, for any k ∈ Z,
|D|kT |D|−k ∈ OP 0. Let us fix p ∈ N0 and define Fp(s) := δp(|D|sT |D|−s) for s ∈ C. Since
for k ∈ Z, Fp(k) is bounded, a complex interpolation proves that Fp(s) is bounded, which gives
|D|sT |D|−s ∈ OP 0.
(ii) Let T ∈ OPα and T ′ ∈ OP β. Thus, T |D|−α, T ′|D|−β are in OP 0. By (i) we get
|D|βT |D|−α|D|−β ∈ OP 0, so T ′|D|−β|D|βT |D|−β−α ∈ OP 0. Thus, T ′T |D|−(α+β) ∈ OP 0.
(iii) For T ∈ OPα, |D|α−β and T |D|−α are in OP 0, thus T |D|−β = T |D|−α|D|α−β ∈ OP 0.
(iv) follows from δ(OP 0) ⊆ OP 0.
(v) Since ∇(T ) = δ(T )|D|+ |D|δ(T )− [P0 , T ], the result follows from (ii), (iv) and the fact that
P0 is in OP
A.2 Proof of Lemma 3.6
The non-trivial part of the proof is the stability under the product of operators. Let T, T ′ ∈
Ψ(A). There exist d, d′ ∈ Z such that for any N ∈ N, N > |d|+ |d′|, there exist P,P ′ in D(A),
p, p′ ∈ N0, R ∈ OP−N−d
, R′ ∈ OP−N−d such that T = PD−2p + R, T ′ = P ′D−2p′ + R′,
PD−2p ∈ OP d and P ′D−2p′ ∈ OP d′ .
Thus, TT ′ = PD−2pP ′D−2p
+RP ′D−2p
+ PD−2pR′ +RR′.
We also have RP ′D−2p
′ ∈ OP−N−d′+d′ = OP−N and similarly, PD−2pR′ ∈ OP−N . Since
RR′ ∈ OP−2N , we get
TT ′ ∼ PD−2pP ′D−2p′ mod OP−N .
If p = 0, then TT ′ ∼ QD−2p′ mod OP−N where Q = PP ′ ∈ D(A) and QD−2p′ ∈ OP d+d′ .
Suppose p 6= 0. A recurrence proves that for any q ∈ N0,
D−2P ′ ∼
(−1)k∇k(P ′)D−2k−2 + (−1)q+1D−2∇q+1(P ′)D−2q−2 mod OP−∞ .
By Lemma 3.3 (v), the remainder is in OP d
′+2p′−q−3, since P ′ ∈ OP d′+2p′ . Another recurrence
gives for any q ∈ N0,
D−2pP ′ ∼
k1,··· ,kp=0
(−1)|k|1∇|k|1(P ′)D−2|k|1−2p mod OP d′+2p′−q−1−2p.
Thus, with qN = N + d+ d
′ − 1,
TT ′ ∼
k1,··· ,kp=0
(−1)|k|1P∇|k|1(P ′)D−2|k|1−2(p+p′) mod OP−N .
The last sum can be written QND
−2rN where rN := p qN + (p + p
′). Since QN ∈ D(A) and
−2rN ∈ OP d+d′ , the result follows.
A.3 Proof of Proposition 3.11
Let P ∈ OP k1 , Q ∈ OP k2 ∈ Ψ(A). With [Q, |D|−s] =
Q− σ−s(Q)
|D|−s and the equivalence
Q− σ−s(Q) ∼ −
r=1 g(−s, r) εr(Q) mod OP−N−1+k2 , we get
P [Q, |D|−s] ∼ −
g(−s, r)Pεr(Q)|D|−s mod OP−N−1+k1+k2−ℜ(s)
which gives, if we choose N = n+ k1 + k2,
P [Q, |D|−s]
n+k1+k2∑
g(−s, r)Tr
Pεr(Q)|D|−s
By hypothesis s 7→ Tr
Pεr(Q)|D|−s
has only simple poles. Thus, since s = 0 is a zero of the
analytic function s 7→ g(−s, r) for any r ≥ 1, we have Res
g(−s, r) Tr
Pεr(Q)|D|−s
= 0, which
entails that Res
P [Q, |D|−s]
= 0 and thus
− PQ = Res
P |D|−sQ
When s ∈ C with ℜ(s) > 2max(k1 + n + 1, k2), the operator P |D|−s/2 is trace-class while
|D|−s/2Q is bounded, so Tr
P |D|−sQ
|D|−s/2QP |D|−s/2
σ−s/2(QP )|D|−s
Thus, using (29) again,
P |D|−sQ
− QP +
n+k1+k2∑
g(−s/2, r)Tr
εr(QP )|D|−s
As before, for any r ≥ 1, Res
g(−s/2, r)Tr
εr(QP )|D|−s
= 0 since g(0, r) = 0 and the spectral
triple is simple. Finally,
P |D|−sQ
− QP.
Acknowledgments
We thank Pierre Duclos, Emilio Elizalde, Victor Gayral, Thomas Krajewski, Sylvie Paycha,
Joe Varilly, Dmitri Vassilevich and Antony Wassermann for helpful discussions and Stéphane
Louboutin for his help with Proposition 2.16.
A. Sitarz would like to thank the CPT-Marseilles for its hospitality and the Université de
Provence for its financial support and acknowledge the support of Alexander von Humboldt
Foundation through the Humboldt Fellowship.
References
[1] A. L. Carey, J. Phillips, A. Rennie and F. A. Sukochev, “The local index formula in semifi-
nite von Neumann algebras I: Spectral flow”, Advances in Math. 202 (2006), 415–516.
[2] L. Carminati, B. Iochum and T. Schücker, “Noncommutative Yang-Mills and noncommu-
tative relativity: a bridge over troubled water, Eur. Phys. J. C 8 (1999) 697–709.
[3] A. Chamseddine and A. Connes, “The spectral action principle”, Commun. Math. Phys.
186 (1997), 731–750.
[4] A. Chamseddine and A. Connes, “Inner fluctuations of the spectral action”, J. Geom. and
Phys. 57 (2006), 1–21.
[5] A. Chamseddine, A. Connes and M. Marcolli, “Gravity and the standard model with neu-
trino mixing”, [arXiv:hep-th/0610241].
[6] A. Connes, “C∗-algèbres et géométrie différentielle”, C. R. Acad. Sci. Paris 290 (1980),
599–604.
[7] A. Connes, “Noncommutative differential geometry”, Pub. Math. IHÉS, 39 (1985), 257–
[8] A. Connes, Noncommutative Geometry, Academic Press, London and San Diego, 1994.
[9] A. Connes, “Geometry from the spectral point of view”, Lett. Math. Phys., 34 (1995),
203–238.
[10] A. Connes, “Noncommutative geometry and reality”, J. Math. Phys. 36 (1995), 6194–6231.
[11] A. Connes, Cours au Collège de France, january 2001.
[12] A. Connes and G. Landi, “Noncommutative manifolds, the instanton algebra and isospectral
deformations”, Commun. Math. Phys. 221 (2001), 141–159.
[13] A. Connes and H. Moscovici, “The local index formula in noncommutative geometry”,
Geom. And Funct. Anal. 5 (1995), 174–243.
[14] A. Edery, “Multidimensional cut-off technique, odd-dimensional Epstein zeta functions and
Casimir energy of massless scalar fields”, J. Phys. A: Math. Gen. 39 (2006), 678–712.
[15] E. Elizalde, S. D. Odintsov, A. Romeo, A. A. Bytsenko and S. Zerbini, Zeta Regularization
Techniques with Applications, Singapore: World Scientific, 1994.
[16] R. Estrada, J. M. Gracia-Bond́ıa and J. C. Várilly, “On summability of distributions and
spectral geometry”, Commun. Math. Phys. 191 (1998), 219–248.
[17] V. Gayral, “Heat-kernel approach to UV/IR Mixing on isospectral deformation manifolds”,
Ann. H. Poincaré 6 (2005), 991–1023.
[18] V. Gayral and B. Iochum, “The spectral action for Moyal plane”, J. Math. Phys. 46 (2005),
no. 4, 043503, 17 pp.
http://arxiv.org/abs/hep-th/0610241
[19] V. Gayral, B. Iochum and J. C. Várilly, “Dixmier traces on noncompact isospectral defor-
mations”, J. Funct. Anal. 237 (2006), 507–539.
[20] V. Gayral, B. Iochum and D. Vassilevich, “Heat kernel and number theory on NC-torus”,
Commun. Math. Phys. 273 (2007), 415–443.
[21] P. B. Gilkey, Asymptotic Formulae in Spectral Geometry, Chapman & Hall/CRC, Boca
Raton, FL, 2004.
[22] A. de Goursac, J.-C. Wallet and R. Wulkenhaar, “Noncommutative induced gauge theory”,
Eur. Phys. J. C51 (2007) 977–988.
[23] J. M. Gracia-Bond́ıa, J. C. Várilly and H. Figueroa, Elements of Noncommutative Geome-
try, Birkhäuser Advanced Texts, Birkhäuser, Boston, 2001.
[24] V.W. Guillemin, S. Sternberg and J. Weitsman, “The Ehrhart function for symbols”,
arXiv:math.CO/06011714.
[25] G. H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers, Clarendon,
Oxford, 1979.
[26] N. Higson, “The local index formula in noncommutative geometry”, Lectures given at the
School and Conference on Algebraic K-theory and its applications, Trieste, 2002.
[27] T. Kato, Perturbation Theory For Linear Operators, Springer–Verlag, Berlin-Heidelberg-
New York, 1980.
[28] M. Knecht and T. Schücker, “Spectral action and big desert”, Phys. Lett. B640 (2006)
272-277
[29] R. Nest, E. Vogt and W. Werner, “Spectral action and the Connes–Chamseddine model”,
p. 109-132 in Noncommutative Geometry and the Standard Model of Elementary Parti-
cle Physics, F. Scheck, H. Upmeier and W. Werner (Eds.), Lecture Notes in Phys., 596,
Springer, Berlin, 2002.
[30] M. A. Rieffel, “C∗-algebras associated with irrational rotations”, Pac. J. Math. 93 (1981),
415–429.
[31] M. A. Rieffel, Deformation Quantization for Actions of Rd, Memoirs Amer. Soc. 506,
Providence, RI, 1993.
[32] L. Schwartz, Méthodes mathématiques pour les sciences physiques, Hermann, Paris, 1979.
[33] B. Simon, Trace ideals and their applications, London Math. Lecture Note Series, Cam-
bridge University Press, Cambridge, 1979.
[34] W. van Suijlekom, Private communication.
[35] A. Strelchenko, “Heat kernel of non-minimal gauge field kinetic operators on Moyal plane,
Int. J. Mod. Phys. A22 (2007), 181–202.
[36] D. V. Vassilevich, “Non-commutative heat kernel”, Lett. Math. Phys. 67 (2004), 185–194.
http://arxiv.org/abs/math/0601171
[37] D. V. Vassilevich, “Heat kernel, effective action and anomalies in noncommutative theories”,
JHEP 0508 (2005), 085.
[38] D. V. Vassilevich, “Induced Chern–Simons action on noncommutative torus”,
[arXiv:hep-th/0701017].
http://arxiv.org/abs/hep-th/0701017
	Introduction
	Residues of series and integral, holomorphic continuation, etc
	Residues of series and integral
	Holomorphy of certain series
	Proof of Lemma ?? for i=1:
	Proof of Lemma ?? for i=0:
	Proof of item (i.2) of Theorem ??:
	Proof of item (iii) of Theorem ??:
	Commutation between sum and residue
	Computation of residues of zeta functions
	Meromorphic continuation of a class of zeta functions
	A family of polynomials
	Residues of a class of zeta functions
	Noncommutative integration on a simple spectral triple
	Kernel dimension
	Pseudodifferential operators
	Zeta functions and dimension spectrum
	The noncommutative integral   
	Residues of DA for a spectral triple with simple dimension spectrum
	The noncommutative torus
	Notations
	Kernels and dimension spectrum
	Noncommutative integral computations
	The spectral action
	Computations of   
	Even dimensional case
	Odd dimensional case
	Proof of the main result
	Appendix
	Proof of Lemma ??
	Proof of Lemma ??
	Proof of Proposition ??
ABSTRACT
  The spectral action on noncommutative torus is obtained, using a
Chamseddine--Connes formula via computations of zeta functions. The importance
of a Diophantine condition is outlined. Several results on holomorphic
continuation of series of holomorphic functions are obtained in this context.

<|endoftext|><|startoftext|>
Introduction
The late-stage behavior of a material undergoing a first-order phase transition
(due to changes in temperature and/or pressure for example) is characterized by
thermodynamic instability resolved through phase separation and consequent coars-
ening of the emerging phase. In the case of the new phase occupying much smaller
volume fraction, and thus appearing as well-separated particles, this coarsening pro-
cess (known as Ostwald ripening) is driven by the minimization of surface energy at
the interface via diffusional mass exchange between particles while the total mass
or volume of each phase is conserved. The result of this kind of mass diffusion from
regions of high to regions of low interfacial curvature is the growth of large parti-
cles and the shrinkage and final extinction of smaller ones. For a review of some
aspects of Ostwald ripening, mainly from the physical and modeling viewpoint, see
the survey by Voorhees [21] or the book by Ratke and Voorhees [18].
In this coarsening scenario the mass-diffusion process can be controlled by two
different mechanisms: either by the diffusion of atoms away from the particles and
into the bulk, or by the reaction-rate of attachment of atoms at the phase interface.
In the former case (diffusion control), the random exchange of atoms between the
particles and the bulk is sufficiently rapid and the surrounding of each particle is in
thermal equilibrium with the atoms in it; in the latter (interface-reaction control),
detachment and attachment are slow compared to diffusion and the surrounding
bulk can be out of equilibrium with the particle interface. We refer to the physics
literature for more details, for example, Slezov and Sagalovich [19], Bartelt, Theis,
and Tromp [3]; for a related mathematical treatment see Dai and Pego [5].
The classical theory for Ostwald ripening was developed by Lifshitz and Slyozov
[9] and Wagner [22] in the case of supersaturated solid solutions in three dimensions.
The Lifshitz–Slyozov–Wagner theory statistically characterizes the evolution by the
particle-radius density n(t, R), where n(t, R) dR is defined to be the number of
particles with radii between R and dR at time t per unit volume. In the late
stages of the phase transition nucleation and coalesence of particles can be neglected
since new nuclei dissolve immediately and since particles cannot merge because of
the large distances between them. Thus, the particle-radius density satisfies the
This work was supported by the DFG through the Graduiertenkolleg RTG-1128 “Analysis,
Numerics, and Optimization of Multiphase Problems” at the Humboldt-Universität zu Berlin.
http://arxiv.org/abs/0704.0565v4
2 APOSTOLOS DAMIALIS
continuity equation (see [18, §5.1])
n(t, R) +
v(t, R)n(t, R)
where v(t, R) denotes the growth rate of particles of radius R at time t. Using a
mean-field ansatz (cf. Section 3), Lifshitz, Slyozov, and Wagner formally calculate
n(t, R) +
(Rū− 1)n(t, R)
ū(t) =
n(t, R) dR
Rn(t, R) dR,
in the diffusion-controlled case, and
n(t, R) +
n(t, R)
ū(t) =
Rn(t, R) dR
R2n(t, R) dR,
in the reaction-controlled one, both results valid in the limit of vanishing mass or
volume fraction of particles.
In [11] and [12] Niethammer rigorously derived the effective equations in the dif-
fusion-controlled case, starting from a quasi-static one-phase Stefan problem with
surface tension and kinetic undercooling,
−∆u = 0 in Ω \G,
V = ∇u · n on ∂G,
u = H + βV on ∂G,
(1.1)
and restricting it to spherical particles. The same was also done in [11] for the full
time-dependent parabolic problem but without the kinetic-drag term βV . Here, u
is a chemical potential, n is the outer normal to the particle phase G, V is the nor-
mal velocity of the phase interface ∂G, and H is its mean curvature. The domain
Ω ⊂ R3 is considered bounded and β is a parameter that comes from the nondi-
mensionalization and scales like diffusivity over mobility. The second boundary
condition is the Gibbs–Thomson law, coupling the curvature of the interface with
the chemical potential, modified by accounting for kinetic drag. Note that while
under diffusion control the parameter β is small and the kinetic drag can even
be neglected (thus yielding the well-known Mullins–Sekerka model [10]), in the
reaction-controlled case the values of β are large and, therefore, the kinetic-drag
term is necessary. For a derivation of such sharp-interface free-boundary problems
from continuum mechanics and thermodynamics see the book of Gurtin [8].
The goal in the following is to use the techniques developed in [11] and [12] to
derive the effective equations in the reaction-controlled case. This involves passing
over to a different time scale incorporating the parameter β tending to infinity
(see Section 2) and, as a result, some extra manipulations in the proofs. Except
for the scaling, in Section 2 we also give short proofs of some useful preliminaries
and discuss the validity of the mean-field description while in Section 3 we prove
pointwise estimates for approximate solutions and for the growth rates of particles.
Finally, using these estimates, in Section 4 we pass to the homogenization limit of
infinitely-many particles and obtain a weak form of the Lifshitz–Slyozov–Wagner
equation.
THE LSW EQUATION FOR REACTION-CONTROLLED KINETICS 3
In comparison with the results in the diffusion-controlled case, we make precise
that the crucial quantity that has to vanish in order to neglect direct interactions
between particles and justify the expected mean-field law is the surface-area density
of the particles in contrast to their capacity in the other case (see [11] and [12]).
This difference is of interest since the asymptotic limits of vanishing surface area
and capacity have different physical interpretations and further refine the näıve
general limit of vanishing mass or volume. For the reaction-controlled case though,
the result is in some sense to be expected since the limit of vanishing surface-
area density corresponds to the physics of the interface-reaction-controlled scenario,
where there is an obvious dependence on the area of the interface.
2. Formulation, scaling, and preliminary estimates
We start with problem (1.1) where the quasi-static approximation to the para-
bolic diffusion equation is justified by the small interfacial velocities present during
late-stage coarsening. (See the discussion in Mullins and Sekerka [10].)
We further suppose that the solid phase consists of spherical particles with cen-
ters fixed in space, a simplification that can be justified by the work of Alikakos
and Fusco [1], [2], and Velázquez [20]. Denoting these particles as Bi, where each
Bi is the closed ball B(xi, Ri(t)), the particle phase is then the union ∪Bi and its
isotropic evolution can be modeled by averaging the flux in the Stefan condition,
i.e.,
V = Ṙi(t) := −
∇u · n ,
where the average integral is defined as
for a function f on some domain D, and where the overdot denotes a derivative
with respect to time; the Gibbs–Thomson law becomes then
+ βṘi,
since in the case of spheres the mean curvature is the inverse radius.
To have many small particles in a bounded domain, for a system with size of order
O(1), say the unit cube [0, 1]3, let δ be the typical particle distance with 0 < δ ≪ 1.
For the distribution of particle centers in space, we assume, for simplicity, that they
are situated on a three-dimensional lattice of spacing δ. Then, the initial number
density of particles Ni(δ) will be bounded by 1/δ
3, and for the particles to be small
let the typical particle size be δα for α > 1. For times t ∈ [0, T ] we choose a δ small
enough so that adjacent particles of size δα will not collide during the evolution up
to a maximal time T .
Concerning the assumption on the spatial distribution of particles, a more general
assumption like infi6=j |xi−xj| > cδ, for a constant c > 0, would still be enough for
our purposes in this work. These considerations will also be used in the proof of
Lemma 3.2 where we approximate a certain sum over all particles by an integral. For
an approach using more sophisticated deterministic and stochastic assumptions on
the distribution of particles with respect to homogenization we refer to Niethammer
and Velázquez [16], [17], where also further refinements of the theory are made.
To have particle sizes of order O(1) as well, we rescale
Rδi :=
4 APOSTOLOS DAMIALIS
and motivated by the scaling invariance of problem (1.1) (cf. [5]),
uδ := δαu, tδ :=
Notice that this rescaling is another way of addressing the reaction-controlled
regime. Instead of rescaling time by β and then letting β tend to infinity, we
keep β fixed and positive, and specially rescale as above letting δ tend to zero.
Since now β plays no significant role, we will set it to unity in what follows. In
addition, one easily sees that the transformations Rδi , u
δ, and tδ preserve the form
of the equations. From hereon we also drop the superscript δ from the notation for
time and to denote the dependence on the new scale we write
Bδi := B(xi, δ
αRδi ).
Finally, note that under diffusion control the relevant scale for time would be δ3α
instead of δ2α. This difference is key to all that follows, leading to different consid-
erations on the validity of the mean-field model. (Cf. the remarks following Lemma
2.1.)
As initial data, for every particle-center xi we associate a corresponding bounded
initial radius Rδi (0) with the assumption that
supi∈NiR
i (0) ≤ R0,
uniformly for some constant R0. To consider a closed system, we impose a no-flux
Neumann boundary condition on the outer boundary of Ω, i.e.,
∇uδ · n = 0 on ∂Ω.
In case the ith particle vanishes at time ti := sup{t | Rδi (t) > 0}, for times later
than ti we define R
i to be zero, reduce the number N(t) := {j | Rδj(t) > 0} of
active particles by one, and neglect the boundary ∂Bδi in the boundary conditions.
In the following, all sums, unions, and suprema will run over the set N(t), with
N(0) ≡ Ni, and any further reference to the particle-number density will mean the
active particle-number density N unless otherwise noted.
Summarizing, the restricted and rescaled problem for the particle radii can be
considered as a nonlocal, N -dimensional system of ordinary differential equations
Ṙδi (t) =
4πδ2αRδi (t)
∇uδ · n on ∂Bδi (t), (2.1)
for times t ∈ (0, ti), ti < T , and with bounded initial data Rδi (0) for every i, while
the chemical potential is determined by
−∆uδ(t, x) = 0 in Ω \ ∪Bδi (t), (2.2)
uδ(t, x) =
Rδi (t)
+ Ṙδi (t) on ∂B
i (t), (2.3)
and the Neumann condition on the outer boundary.
Global existence and uniqueness of continuous, piecewise-smooth solutions for
a similar restricted Stefan problem was proved in [12] by an application of the
Picard–Lindelöf theorem, the only difference being the different time scale. These
solutions are not globally smooth due to the singularities arising from the extinction
of particles; however, they are smooth in the intervals between the extinction times
ti. In the following, when we mention solutions of the problem we will mean such
continuous, piecewise-smooth solutions that exist up to any given time T .
THE LSW EQUATION FOR REACTION-CONTROLLED KINETICS 5
It is easy to see that equations (2.1), (2.2), (2.3), along with the outer boundary
condition conserve the volume and decrease the interfacial area of the particle phase.
Indeed, differentiating the total volume of particles with respect to time gives
Rδi (t)
3 = 3
Rδi (t)
2Ṙδi (t) = 3
Rδi (t)
4πδ2αRδi (t)
∇uδ · n
where the last sum vanishes due to the divergence theorem, equation (2.2), and the
no-flux condition on ∂Ω. The decrease of total surface area follows from the next
a priori estimate.
Lemma 2.1. For any time t ∈ (0, T ), the solutions of the problem satisfy the
following energy equality.
(Rδi )
2|Ṙδi |2 +
Rδi (t)
4πδ2α
Ω\∪Bδ
|∇uδ|2 = 1
Rδi (0)
Proof. Multiplying −∆uδ = 0 with uδ, integrating over Ω \ ∪Bδi , and integrating
by parts gives
Ω\∪Bδ
|∇uδ|2 +
(∇uδ · n)uδ −
(∇uδ · n)uδ = 0,
where the last term vanishes due to the Neumann condition on the outer boundary.
Thus, using equations (2.3) and (2.1) we get
Ω\∪Bδ
|∇uδ|2 =
+ Ṙδi
∇uδ · n
+ Ṙδi
4πδ2α(Rδi )
2Ṙδi
= 4πδ2α
Rδi Ṙ
i + (R
2|Ṙδi |2
and after rearranging,
(Rδi )
2|Ṙδi |2 +
Rδi Ṙ
4πδ2α
Ω\∪Bδ
|∇uδ|2 = 0. (2.4)
The result follows from an integration over time. �
After normalization with respect to the initial particle-number density Ni, this
energy equality can yield useful information on the validity of the mean-field ap-
proach. In fact, we have
(Rδi )
2|Ṙδi |2+
Rδi (t)
4πNiδ2α
Ω\∪Bδ
|∇uδ|2 = 1
Rδi (0)
where the right-hand side is uniformly bounded by the assumption on the initial
radii. For the left-hand side to stay bounded as well, if the quantity Niδ
2α tends to
zero, the same must hold for |∇uδ| and it is exactly this limit of vanishing surface-
area density of particles that results in a mean field that is constant in space since,
in particular,
∇uδ → 0 in L2
0, T ;H1(Ω)
Here and in the following, to obtain global estimates that are uniform in δ we extend
uδ to the interior of particles, and thus to the whole of Ω, by its boundary values.
It is important to note that in our scaling setup, for the surface area to vanish as δ
6 APOSTOLOS DAMIALIS
tends to zero, the exponent α must be strictly larger than 3/2 since Ni is O(1/δ
These facts will be made precise in Corollary 3.3 where we give an estimate of the
mean-field effect. Note also that we do not address here the critical case α = 3/2
that corresponds to finite surface area. For that one would have to use the different
methods developed by Niethammer and Otto in [13].
Finally, note that for similar considerations under diffusion control, the corre-
sponding quantity would be the capacity Niδ
α due to the different time scale. In
three dimensions, this capacity effect fits to general homogenization results as in
the work of Cioranescu and Murat [4]; to our knowledge though, the surface-area
effect has not been explicitly discussed in the relevant literature.
3. Approximation and growth-rate estimates
As in the mean-field ansatz of Lifshitz, Slyozov, and Wagner, we suppose that
the system is dilute enough so that particles behave as if they were isolated and we
base our approximation on the solution of a single-particle problem.
Consider problem (2.1), (2.2), (2.3) for a single spherical particle centered at
the origin and with initially unscaled radius r that we rescale as rδ := r/δα, along
with the corresponding reaction-controlled rescalings for a chemical potential uδr
and time, as in Section 2. For this rescaled particle Bδr we consider the following
problem in the whole space:
ṙδ(t) =
4πδ2αrδ(t)2
∇uδr · n on ∂Bδr ,
where the chemical potential uδr(t, x) satisfies
−∆uδr(t, x) = 0 in R3 \Bδr ,
uδr(t, x) =
rδ(t)
+ ṙδ(t) for x ∈ ∂Bδr ,
and the mean-field assumption is posed as a condition at infinity, i.e.,
|x|→∞
uδr(t, x) = ū
r(t).
This problem can be explicitly solved to give
uδr(t, x) = ū
r(t) +
δαrδ(t)
1 + δαrδ(t)
1− ūδr(t)rδ(t)
ṙδ(t) =
1 + δαrδ(t)
ūδr(t)−
rδ(t)
Note that in the formal limit of δ tending to zero, the expected effective equations
take the general form
u(t, x) = ū(t) and ṙ = ū− 1
as in the reaction-controlled Lifshitz–Slyozov–Wagner theory.
Going now back to the many-particle problem, a calculation using the single-
particle growth rate above along with the requirement that the volume is conserved
gives the following expression for the mean field
ūδ =
1 + δαRδi
(Rδi )
1 + δαRδi
. (3.1)
THE LSW EQUATION FOR REACTION-CONTROLLED KINETICS 7
The effect of this mean field plus a sum of single-particle solutions will be the
monopole approximation to the solution uδ supposing that there are no direct
interactions between particles. To this end, let us define the approximate solution
ζδ(t, x) := ūδ(t) +
δαRδi (t)
1 + δαRδi (t)
1− ūδ(t)Rδi (t)
|x− xi|
(3.2)
for x ∈ Ω \ ∪Bδi (t).
Below is a maximum principle tailored to our setting that will be used to compare
the approximation and the solution in the lemma next. Its proof can be found in
[12].
Lemma 3.1. Let Ω be a Lipschitz domain and let ∪Bi ⊂ Ω be a finite collection of
disjoint closed balls. Then, a function v which is constant on each of the boundaries
∂Bi and satisfies
−∆v = 0 in Ω \ ∪Bi,
v − ci
∇v · n ≥ 0 on ∂Bi,
∇v · n ≥ 0 on ∂Ω,
where ci ≥ 0 for all i, also satisfies
v ≥ 0 in Ω \ ∪Bi.
Lemma 3.2. For any time t ∈ (0, T ) and small positive ε, the chemical potential
and its approximation satisfy
‖uδ − ζδ‖L∞(Ω\∪Bδ
)(t) ≤ Cδ2α−3−ε supRδi (t)
1 + ūδ(t) supRδi (t)
Proof. Since the difference uδ − ζδ is already harmonic in Ω \ ∪Bδi as ζδ is a su-
perposition of fundamental solutions, we would like to estimate to what extent it
satisfies the maximum principle’s boundary conditions.
For the condition on the particle boundaries, we use equations (2.1), (2.3), and
the definition of ζδ to calculate for x on the boundary ∂Bδi of the ith particle,
ζδ(t, x)−−
∇ζδ · n
uδ(t, x)−−
∇uδ · n
ζδ − 1
∇ζδ · n
ūδ − 1
δαRδj
1 + δαRδj
(1− ūδRδj)
|x− xj |
|x− xj |
and since by the divergence theorem there holds for j 6= i,
|x− xj |
· n = 0,
while for j = i,
|x− xi|
· n = − 1
δα(Rδi )
8 APOSTOLOS DAMIALIS
we continue the calculation to get
ūδ − 1
1− ūδRδi
Rδi (1 + δ
αRδi )
δαRδj
1 + δαRδj
(1− ūδRδj)
|x− xj |
j 6=i
δαRδj
1 + δαRδj
(1− ūδRδj)
|x− xj |
≤ δ2α−3 supRδj (1 + ūδ supRδj )
j 6=i
|x− xj |
≤ Cδ2α−3 supRδj(1 + ūδ supRδj). (3.3)
In the last step, keeping in mind the assumptions on the spatial distribution of
particle centers, the sum is bounded for j 6= i since it is considered as a Riemann-
sum approximation to the integral
|x− y|
which in turn is bounded using radial symmetry around the singularity and where
the factor δ3 in the sum compensates for the scaling in space.
To further fulfil the maximum principle’s outer boundary condition on ∂Ω, we
consider the comparison function ζδ+zδ, where the auxiliary function zδ solves the
problem
−∆zδ =
∇ζδ · n in Ω,
∇zδ · n = −∇ζδ · n on ∂Ω,
zδ = 0,
(3.4)
such that the comparison function ζδ + zδ has zero normal derivative on ∂Ω. To
work with the maximum principle, zδ also needs to be harmonic in Ω and for that
we need that the integral
∇ζδ · n vanishes. But,
∇ζδ · n = δα
δαRδi
1 + δαRδi
(1−Rδi ūδ)
|x− xi|
· n ,
where the last integral equals −4π, independent of i. Thus, zδ is harmonic if and
only if
ūδ =
1 + δαRδi
(Rδi )
1 + δαRδi
which is exactly the mean field (3.1) as dictated by the single-particle ansatz in
the beginning of the section. Moreover, since now zδ is harmonic, the divergence
theorem further gives
∇zδ · n = 0.
A construction as in Lemma 3 of [11] and elliptic regularity theory (see Gilbarg
and Trudinger [7]) give the estimate
‖zδ‖L∞(Ω) ≤ Cεδ2α−3−ε supRδi (1 + ūδ supRδi ),
where ε is a small positive number.
THE LSW EQUATION FOR REACTION-CONTROLLED KINETICS 9
Let us now apply the maximum principle to the function
f+ := u
δ − ζδ − zδ + Cδ2α−3−ε supRδi (1 + ūδ supRδi ).
For a large enough constant C, say 2Cε, the following hold for f+: it is harmonic,
there holds ∇f+ · n = 0 on ∂Ω by the construction of zδ, and for the constants
ci = 1/4πδ
2α(Rδi )
2, estimate (3.3) gives
uδ − ζδ − zδ + Cδ2α−3−ε supRδi (1 + ūδ supRδi )− ci
∇(uδ − ζδ − zδ) · n ≥ 0.
Thus, f+ satisfies the maximum principle’s conditions and therefore, f+ ≥ 0 in
Ω \ ∪Bδi , i.e.,
uδ − ζδ − zδ ≥ −Cδ2α−3−ε supRδi (1 + ūδ supRδi ).
Using the maximum principle with −v instead of v, the function
f− := u
δ − ζδ − zδ − Cδ2α−3−ε supRδi (1 + ūδ supRδi ),
again satisfies the corresponding conditions and, as above, yields f− ≤ 0 in Ω\∪Bδi ,
i.e.,
uδ − ζδ − zδ ≤ Cδ2α−3−ε supRδi (1 + ūδ supRδi ).
Combining the last two inequalities, we get
‖uδ − ζδ − zδ‖L∞(Ω\∪Bδ
) ≤ Cδ2α−3−ε supRδi (1 + ūδ supRδi )
and the lemma follows by the triangle inequality using the regularity of zδ. �
In the previous lemma it is clear that our approach excludes the critical case
α = 3/2. In the following we introduce, for technical reasons, a new exponent
γ > 0 with the property
δγ := max {δα, δ2α−3, δ2α−3−ε}
for each α greater than 3/2 + ε.
As a corollary to the previous lemma we can now estimate the effect of the mean
field.
Corollary 3.3. For any time t ∈ (0, T ) and γ > 0, the chemical potential and the
mean field satisfy
‖uδ − ūδ‖L∞(Ω\∪Bδ
)(t) ≤ Cδγ
1 + 2 supRδi (t)
1 + ūδ(t) supRδi (t)
Proof. By the triangle inequality and Lemma 3.2 there holds
‖uδ − ūδ‖L∞(Ω\∪Bδ
) ≤ ‖ζδ − ūδ‖L∞(Ω\∪Bδ
) + Cδ
2α−3−ε supRδj(1 + ū
δ supRδj).
To estimate ‖ζδ− ūδ‖L∞(Ω\∪Bδ
), by the definition of ζ
δ there holds for x ∈ Ω\∪Bδj ,
ζδ(t, x)− ūδ(t)
δαRδj
1 + δαRδj
(1− ūδRδj)
|x− xj |
δαRδi
1 + δαRδi
(1 + ūδRδi )
|x− xi|
j 6=i
δαRδj
1 + δαRδj
(1− ūδRδj)
|x− xj |
and since |x− xi| ≥ δαRδi in Ω \ ∪Bδj , arguing as in estimate (3.3) gives
α(1 + ūδRδi )
1 + δαRδi
+ Cδ2α−3 supRδj(1 + ū
δ supRδj)
≤ C(δα + δ2α−3 supRδj )(1 + ūδ supRδj),
10 APOSTOLOS DAMIALIS
thus,
‖ζδ − ūδ‖L∞(Ω\∪Bδ
) ≤ C(δα + δ2α−3 supRδj )(1 + ūδ supRδj),
and finally,
‖uδ − ūδ‖L∞(Ω\∪Bδ
) ≤ C(δα + δ2α−3 supRδj + δ2α−3−ε supRδj )(1 + ūδ supRδj).
Using the exponent γ, we get
‖uδ − ūδ‖L∞(Ω\∪Bδ
) ≤ Cδγ(1 + 2 supRδi )(1 + ūδ supRδi ). �
The following lemma gives an estimate for the growth rate of particles in accor-
dance with the reaction-controlled Lifshitz–Slyozov–Wagner theory.
Lemma 3.4. For any time t ∈ (0, T ) and γ > 0, for the growth rates of particles
holds
Ṙδi −
ūδ − 1
≤ Cδγ(1 + ūδ supRδi )
1 + (1 + δγ supRδi )(1 + 2 supR
Proof. Let wδi be the capacity potential of the ball B
i with respect to a larger ball
Bλδi := B(xi, λδ
αRδi ) for λ > 1, i.e., let wi solve
−∆wδi = 0 in Bλδi \Bδi ,
wδi = 0 on ∂B
wδi = 1 in B
(3.5)
An explicit calculation gives
wδi =
1− λδ
|x− xi|
(3.6)
and also
∇wδi · n =
∇wδi · n = 4π
1− λδ
αRδi . (3.7)
Using equations (2.1), (2.2), (2.3), and the Neumann boundary condition, along
with the above properties of wδi , and integrating by parts, gives
4πδ2α(Rδi )
2Ṙδi =
∇uδ · n
wδi∇uδ · n
∇wδi∇uδ
uδ∇wδi · n −
uδ∇wδi · n
+ Ṙδi
∇wδi · n −
uδ∇wδi · n
δαRδi
+ Ṙδi − ūδ
(uδ − ūδ)∇wδi · n ,
THE LSW EQUATION FOR REACTION-CONTROLLED KINETICS 11
where in the last equation we used (3.7) and added and subtracted ūδ. After
rearranging, we have
Ṙδi −
ūδ − 1
δαRδi Ṙ
4πλδαRδi
(uδ − ūδ)∇wδi · n
≤ λ− 1
δαRδi |Ṙδi |+
4πλδαRδi
(uδ − ūδ)∇wδi · n
≤ δαRδi |Ṙδi |+ ‖uδ − ūδ‖L∞(Ω\∪Bδ
), (3.8)
where in the last step we again used equation (3.7). But by using equation (2.3)
for uδ on ∂Bδi we have
Rδi |Ṙδi | ≤ 1 +Rδi |uδ| ≤ 1 +Rδi (‖uδ − ūδ‖L∞(Ω\∪Bδ
) + ū
δ). (3.9)
Substituting back in (3.8) and using Corollary 3.3 gives the final estimate. �
The next lemma ensures that the bounds in the approximation and the growth-
rate estimates are indeed uniform.
Lemma 3.5. For any time t ∈ (0, T ), the mean field and the radii of the particles
are uniformly bounded, i.e.,
ūδ(t) ≤ C and supRδi (t) ≤ C.
Proof. For the mean field (3.1) holds
ūδ =
1 + δαRδi
(Rδi )
1 + δαRδi
≤ supRδi (1 + δα supRδi )
(Rδi )
and since by Hölder’s inequality
(Rδi )
(Rδi )
δ3(Rδi )
conservation of the total volume of particles gives
ūδ ≤ C supRδi (1 + δα supRδi ).
or, using the exponent γ,
ūδ ≤ C supRδi (1 + δγ supRδi ). (3.10)
Consider now the set
t | supRδi (t) ≤
then, for times t ∈ A, plugging (3.10) in estimate (3.9) and using Corollary 3.3
gives
(Rδi )
2 ≤ C sup(Rδi )2 + C.
Integrating over the time interval (0, T ), Gronwall’s inequality implies that
supi supt∈A∩[0,T ] (R
2 ≤ C(T ),
therefore, [0, T ] ⊂ A, i.e., the radii are bounded up to time T as is the mean field
by estimate (3.10). �
Finally, the following lemma gives control over the growth rates of vanishing
particles and will prove useful for some regularity considerations in the next section.
12 APOSTOLOS DAMIALIS
Lemma 3.6. For any time t ∈ (0, T ) such that Rδi (t) ≤ 1/4 supt,δ ūδ(t) and for
sufficiently small δ, there holds
≤ Ṙδi ≤ −
and √
ti − t ≤ Rδi ≤ 2
ti − t.
Proof. For
δαRδi
1 + δαRδi
|x− xi|
it can be verified that the function uδ−g satisfies the assumptions of the maximum
principle in Lemma 3.1 for the constants ci = 1/4πδ
2α(Rδi )
2, thus yielding uδ ≥ g
in Ω \ ∪Bδi . But since uδ = g on the boundary ∂Bδi , monotonicity implies that
∇uδ · n ≥ ∇g · n on ∂Bδi and taking the average integrals over ∂Bδi we have
Ṙδi ≥ −
Rδi (1 + δ
αRδi )
≥ − 2
Moreover, Lemma 3.4 gives
Ṙδi ≤ ūδ −
+ Cδγ(1 + ūδ supRδi )
1 + (1 + δγ supRδi )(1 + 2 supR
Using now the assumption that Rδi ≤ 1/4 supt,δ ūδ and since from Lemma 3.5 it
follows that for sufficiently small δ the O(δγ) term is uniformly bounded by 1/4Rδi ,
we get
Ṙδi ≤
≤ − 1
Let now
y1 :=
ti − t, y2 := 2
ti − t
be sub- and supersolutions that respectively solve
ẏ1 = −
, ẏ2 = −
By comparison, we get the lemma’s second assertion, i.e., y1 ≤ Rδi ≤ y2. �
4. Homogenization
In order to pass to the homogenization limit of infinitely-many particles, we need
first describe the particle-radius density in the limit. To that end, define at any
time t ∈ (0, T ) the empirical measure νδt as
〈φ, νδt 〉 =
t, Rδi (t)
dνδt :=
t, Rδi (t)
for φ ∈ Cc,
i.e., for functions φ(t, R) continuous and compactly supported in the radius variable.
Using now the estimates from the previous section, we can prove the following
Lemma 4.1. For a subsequence δ → 0 and for a function ū ∈ W 1, p(0, T ), for
p < 2, holds
ūδ → ū in L2(0, T ),
uδ → ū in L2
0, T ;H1(Ω)
THE LSW EQUATION FOR REACTION-CONTROLLED KINETICS 13
Furthermore, the measures νδt converge to a family νt of probability measures such
φdνδt →
φa(t) dνt uniformly in t,
where a(t) denotes the percentage of active particles in the limit.
Proof. As a consequence of Lemma 3.6, we have
sup ‖Ṙδi ‖Lp(0,T ) ≤ C(p) for p < 2,
thus, conservation of volume and boundedness of the radii give for p < 2,
Lp(0,T )
≤ C sup ‖Ṙδi ‖Lp(0,T ) ≤ C.
Therefore, ūδ ∈ W 1, p(0, T ), for p < 2, and the compactness following from the
Rellich–Kondrachov theorem gives that ūδ converges to a limit ū in L2.
Taking into consideration that uδ is extended to the whole of Ω and using the
lemmas in the previous section, ζδ converges to ū in L2(Ω) and uδ − ζδ converges
uniformly to 0 as δ → 0, therefore, uδ converges to ū in L2(Ω). By the energy
equality in Lemma 2.1 we have further control over ‖∇uδ‖L2 and thus, we have
strong convergence in L2(0, T ;H1(Ω)).
For the measures νδt holds
‖νδt ‖ := sup‖φ‖Cc≤1|〈φ, ν
t 〉| ≤ 1
in the norm of (Cc)
∗, so for a subsequence δ → 0 there holds that νδt converges
weakly-* to νt. Furthermore, for positive functions φ the limit measure νt is non-
negative and from this it follows that νt becomes zero if there are no particles left
in the system.
Choosing now a function ψ(t) that depends only on time, we calculate
ψ(t) dνδt =
ψ(t).
The ratio N/Ni is the percentage of active particles at time t. This ratio is bounded
by 1 and decreasing, therefore it is uniformly bounded in the space BV (0, T ) and
by the compact embedding of BV (0, T )∩L∞(0, T ) in L2(0, T ), it converges in L2,
for a subsequence δ → 0, to a limit a ∈ BV (0, T ).
If we project now the measure νt to the interval [0, T ], we get that the projection
satisfies
proj[0, T ] νt = a(t) dt
and according to [6, Ch. 1, Thm. 10], the decomposition and convergence to νt
follow from the slicing of measures. �
We conclude with the following theorem which states that the limit measure νt
satisfies the Lifshitz–Slyozov–Wagner equation in a weak sense. Note that in the
theorem’s statement, the initial condition is defined as
t, Rδi (t)
dνδ0 :=
0, Rδi (0)
Theorem 4.2. The measure νt satisfies the Lifshitz–Slyozov–Wagner equation in
the sense that
φ(t, R) +
ū− 1
φ(t, R)
a(t) dνt +
φ(0, R) dν0 = 0, (4.1)
14 APOSTOLOS DAMIALIS
for all smooth and compactly supported functions φ ∈ C∞c ([0, T )× R+), where the
mean field ū is given by
R dνt
R2 dνt.
Proof. We begin by computing the mean-field limit ū(t). For a continuous function
φ(t) there holds, by the definition of ūδ,
1 + δαR
dνδt =
∑ R2i
1 + δαRi
1 + δαRi
1 + δαR
dνδt .
Taking the limit δ → 0 on both sides, Lemma 4.1 gives
R dνt
R2 dνt.
Consider now a smooth and compactly supported function φ as in the theorem’s
statement. Then, the fundamental theorem of calculus and Lemma 3.4 give
t, Rδi (t)
dνδt +
0, Rδi (0)
t, Rδi (t)
+ Ṙδi (t)
t, Rδi (t)
dνδt +
0, Rδi (0)
t, Rδi (t)
ūδ − 1
t, Rδi (t)
dνδt +O(δ
0, Rδi (0)
dνδ0 .
The result follows by taking the limit for a subsequence δ → 0 and using the strong
convergence of ūδ. �
As a concluding remark, we note that the well-posedness (existence, uniqueness,
and continuous dependence on initial data) of the weak formulation (4.1) can be
treated by the methods developed by Niethammer and Pego in [14] and [15].
Acknowledgments
Thanks are due to Barbara Niethammer for her substantial help and to Nick
Alikakos and Bob Pego for helpful discussions. Thanks are also due to the anony-
mous referee for a careful reading of the manuscript.
References
[1] N. D. Alikakos and G. Fusco. The equations of Ostwald ripening for dilute systems. J. Stat.
Phys. 95 No. 5/6 (1999), pp. 851–866.
[2] N. D. Alikakos and G. Fusco. Ostwald ripening for dilute systems under quasistationary
dynamics. Comm. Math. Phys. 238 No. 3 (2003), pp. 429–479.
[3] N. C. Bartelt, W. Theis, and R. M. Tromp. Ostwald ripening of two-dimensional islands on
Si(001). Phys. Rev. B 54 (1996), pp. 11741–11751.
[4] D. Cioranescu and F. Murat. A strange term coming from nowhere. In Topics in the math-
ematical modelling of composite materials, A. Cherkaev, R. Kohn eds. Birkhäuser, Boston,
MA, 1997, pp. 45–94.
THE LSW EQUATION FOR REACTION-CONTROLLED KINETICS 15
[5] S. Dai and R. L. Pego. Universal bounds on coarsening rates for mean-field models of phase
transitions. SIAM J. Math. Anal. 37 No. 2 (2005), pp. 347–371.
[6] L. C. Evans. Weak convergence methods for nonlinear partial differential equations. American
Mathematical Society, Providence, RI, 1990.
[7] D. Gilbarg and N. S. Trudinger. Elliptic partial differential equations of second order.
Springer-Verlag, Berlin, second edition, 1983.
[8] M. E. Gurtin. Thermomechanics of evolving phase boundaries in the plane. The Clarendon
Press, Oxford, 1993.
[9] I. M. Lifshitz and V. V. Slyozov. The kinetics of precipitation from supersaturated solid
solutions. J. Phys. Chem. Solids 19 (1961), pp. 35–50.
[10] W. W. Mullins and R. F. Sekerka. Morphological stability of a particle growing by diffusion
or heat flow. J. Appl. Phys. 34 No. 2 (1963), pp. 323–329.
[11] B. Niethammer. Derivation of the LSW-theory for Ostwald ripening by homogenization meth-
ods. Arch. Rat. Mech. Anal. 147 (1999), pp. 119–178.
[12] B. Niethammer. The LSW model for Ostwald ripening with kinetic undercooling. Proc. R.
Soc. Edinburgh 130A No. 6 (2000), pp. 1337–1361.
[13] B. Niethammer and F. Otto. Ostwald ripening: The screening length revisited. Calc. Var.
13 No. 1 (2001), pp. 33–68.
[14] B. Niethammer and R. L. Pego. On the initial-value problem in the Lifshitz–Slyozov–Wagner
theory of Ostwald ripening. SIAM J. Math. Anal. 31 No. 3 (2000), pp. 467–485.
[15] B. Niethammer and R. L. Pego. Well-posedness for measure transport in a family of nonlocal
domain coarsening models. Indiana Univ. Math. J. 54 No. 2 (2005), pp. 499–530.
[16] B. Niethammer and J. J. L. Velázquez. Homogenization in coarsening systems I: Deterministic
case. Math. Mod. Meth. Appl. Sci. 14 No. 8 (2004), pp. 1211–1233.
[17] B. Niethammer and J. J. L. Velázquez. Homogenization in coarsening systems II: Stochastic
case. Math. Mod. Meth. Appl. Sci. 14 No. 9 (2004), pp. 1–24.
[18] L. Ratke and P. W. Voorhees. Growth and coarsening: Ostwald ripening in material process-
ing. Springer-Verlag, Berlin, 2002.
[19] V. V. Slezov and V. V. Sagalovich. Diffusive decomposition of solid solutions. Sov. Phys.
Usp. 30 No. 1 (1987), pp. 23–45.
[20] J. J. L. Velázquez. On the effect of stochastic fluctuations in the dynamics of the Lifshitz–
Slyozov–Wagner model. J. Stat. Phys. 99 No. 1/2 (2000), pp. 231–252.
[21] P. W. Voorhees. The theory of Ostwald ripening. J. Stat. Phys. 38 No. 1/2 (1985), pp. 231–
[22] C. Wagner. Theorie der Alterung von Niederschlägen durch Umlösen. Z. Elektrochem. 65
No. 7/8 (1961), pp. 581–591.
Institut für Mathematik, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099
Berlin, Germany
E-mail address: damialis@mathematik.hu-berlin.de
Current address: Department of Mathematics, University of Athens, Panepistemiopolis, 15784
Athens, Greece
mailto:damialis@mathematik.hu-berlin.de
	1. Introduction
	2. Formulation, scaling, and preliminary estimates
	3. Approximation and growth-rate estimates
	4. Homogenization
	Acknowledgments
	References
ABSTRACT
  We rigorously derive a weak form of the Lifshitz-Slyozov-Wagner equation as
the homogenization limit of a Stefan-type problem describing
reaction-controlled coarsening of a large number of small spherical particles.
Moreover, we deduce that the effective mean-field description holds true in the
particular limit of vanishing surface-area density of particles.

<|endoftext|><|startoftext|>
Introduction 1
1.1 Canonical AZD hcan . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Supercanonical AZD ĥcan . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Variation of the supercanonical AZD ĥcan . . . . . . . . . . . . . 5
2 Proof of Theorem 1.7 6
2.1 Upper estimate of K̂Am . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Lower estimate of K̂Am . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Independence of ĥcan,A from hA . . . . . . . . . . . . . . . . . . 9
2.4 Completion of the proof of Theorem 1.7 . . . . . . . . . . . . . . 10
2.5 Comparison of hcan and ĥcan . . . . . . . . . . . . . . . . . . . . 10
3 Variation of ĥcan under projective deformations 11
3.1 Construction of ĥcan on a family . . . . . . . . . . . . . . . . . . 12
3.2 Semipositivity of the curvature current of ĥm,A . . . . . . . . . . 13
3.3 Uniqueness of ĥcan,A for singular hA’s . . . . . . . . . . . . . . . 16
3.4 Case dimS > 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.5 Completion of the proof of Theorem 1.10 . . . . . . . . . . . . . 17
4 Appendix 20
1 Introduction
Let X be a smooth projective variety and let KX be the canonical bundle of X .
In algebraic geometry, the canonical ring R(X,KX) := ⊕∞m=0Γ(X,OX(mKX))
is one of the main object to study.
http://arxiv.org/abs/0704.0566v5
Let X be a smooth projective variety such that KX is pseudoeffective. The
purposes of this article are twofold. The first purpose is to construct a singular
hermitian metric ĥcan on KX such that
1. ĥcan is uniquely determined by X .
ĥcan
is semipositive in the sense of current.
3. H0(X,OX(mKX)⊗I(ĥmcan)) ≃ H0(X,OX(mKX)) holds for every m ≧ 0,
where I(ĥmcan) denotes the multiplier ideal sheaf of ĥmcan as is defined in [N].
And the second purpose is to study the behavior of ĥcan on projective families.
We may summerize the 2nd and the 3rd conditions by introducing the following
notion.
Definition 1.1 (AZD)([T1, T2]) Let M be a compact complex manifold and
let L be a holomorphic line bundle on M . A singular hermitian metric h on L
is said to be an analytic Zariski decomposition (AZD in short), if the followings
hold.
1. Θh is a closed positive current.
2. For every m ≥ 0, the natural inclusion
H0(M,OM (mL)⊗ I(hm)) → H0(M,OM (mL))
is an isomorphim.
Remark 1.2 A line bundle L on a projective manifold X admits an AZD, if
and only if L is pseudoeffective ([D-P-S, Theorem 1.5]). �
In this sense, the first purpose of this article is to construct an AZD on KX
depending only on X , when KX is pseudoeffective (by Remark 1.2 this is the
minimum requirement for the existence of an AZD).
The main motivation to construct such a singular hermitian metric is to
study the canonical ring in terms of it. This is indeed possible. For example,
we obtain the invariance of plurigenera under smooth projective deformations
(cf. Corollary 1.12).
In fact the hermitian metric constructed here is useful in many other con-
texts. Other applications and a generalization to subKLT pairs will be treated
in the forthcoming papers ([T6]).
I would like to express thanks to Professor Bo Berndtsson who pointed out
an error in the previous version.
1.1 Canonical AZD hcan
If we assume the stronger assumption that X has nonnegative Kodaira dimen-
sion, we have already konwn how to construct a canonical AZD for KX . Let us
review the construction in [T5].
Theorem 1.3 ([T5]) Let X be a smooth projective variety with nonnegative
Kodaira dimension. We set for every point x ∈ X
Km(x) := sup{| σ |
m (x);σ ∈ Γ(X,OX(mKX)), |
(σ ∧ σ̄) 1m |= 1}
K∞(x) := lim sup
Km(x).
hcan := the lower envelope of K
is an AZD on KX. �
Remark 1.4 By the ring structure of R(X,KX), we see that
lim sup
Km(x) = sup
Km(x)
holds. �
Remark 1.5 Since h∞ depends only on X, the volume
h−1can
is an invariant of X. �
Apparently this construction is very canonical, i.e., hcan depends only on the
complex structure of X . We call hcan the canonical AZD of KX . But this
construction works only if we know that the Kodaira dimension of X is nonneg-
ative apriori. This is the main defect of hcan. For example, hcan is useless to
solve the abundance conjecture.
1.2 Supercanonical AZD ĥcan
To avoid the defect of hcan we introduce the new AZD ĥcan. Let us use the
following terminology.
Definition 1.6 Let (L, hL) be a singular hermitian line bundle on a complex
manifold X. (L, hL) is said to be pseudoeffective, if the curvature current of hL
is semipositive. �
Let X be a smooth projective n-fold such that the canonical bundle KX is
pseudoeffective. Let A be a sufficiently ample line bundle such that for every
pseudoeffective singular hermitian line bundle (L, hL) on X , OX(A+L)⊗I(hL)
and OX(KX+A+L)⊗I(hL) are globally generated. Such an ample line bundle
A extists by L2-estimates. Let hA be a a C
∞ hermitian metric on A with
strictly positive curvature 1. Let us fix a C∞ volume form dV on X . By the
L2-extension theorem ([O]) we may and do assume that A is sufficiently ample
1Later we shall also consider the case that hA is any C
∞ hermitian metric (without posi-
tivity of curvature) or a singular hermitian metric on A.
so that for every x ∈ X and for every pseudoeffective singular hermitian line
bundle (L, hL), there exists a bounded interpolation operator
Ix : A
2(x, (A+ L)x, hA · hL, δx) → A2(X,A+ L, hA · hL, dV )
such that the operator norm of Ix is bounded by a positive constant independent
of x and (L, hL), where A
2(X,A + L, hA · hL, dV ) denotes the Hilbert space
defined by
A2(X,A+L, hA·hL, dV ) := {σ ∈ Γ(X,OX(A+L)⊗I(hL)) |
| σ |2 ·hA·hL·dV < +∞}
with the L2 inner product
(σ, σ′) :=
σ · σ̄′ · hA · hL · dV
and A2(x, (A+L)x, hA ·hL, δx) is defined similarly, where δx is the Dirac measure
supported at x. We note that if hL(x) = +∞, then A2(x, (A+L)x, hA ·hL, δx) =
0. For every x ∈ X we set
K̂Am(x) := sup{| σ |
m (x) | σ ∈ Γ(X,OX(A+mKX)), |
A · (σ ∧ σ̄)
m |= 1}.
Here | σ | 2m is not a function on X , but the supremum is takan as a section of
the real line bundle |A | 2m ⊗ |KX |2 in the obvious manner2. Then h
A · K̂Am is a
continuous semipositive (n, n) form on X . Under the above notations, we have
the following theorem.
Theorem 1.7 We set
K̂A∞ := lim sup
A · K̂
ĥcan,A := the lower envelope of K̂
Then ĥcan,A is an AZD of KX . And we define
ĥcan := the lower envelope of inf
ĥcan,A,
where inf means the pointwise infimum and A runs all the ample line bundles
on X. Then ĥcan is a well defined AZD
3 depending only on X. �
Definition 1.8 (Supercanonical AZD) We call ĥcan in Theorem 1.7 the
supercanonical AZD of KX . And we call the semipositive (n, n) form ĥ
can the
supercanonical volume form on X. �
Remark 1.9 Here “super” means that corresponding volume form ĥ−1can satisfies
the inequality :
ĥ−1can ≧ h
if X has nonnegative Kodaria dimension (cf. Theorem 2.9). �
In the statement of Theorem 1.7, one may think that ĥcan,A may dependent of
the choice of the metric hA. But later we prove that ĥcan,A is independent of
the choice of hA (cf. Theorem 2.7).
2We have abused the notations |A|, |KX| here. These notations are similar to the notations
of corresponding linear systems. But I think there is no fear of confusion.
3I believe that ĥcan,A is already independent of the sufficiently ample line bundle A.
1.3 Variation of the supercanonical AZD ĥcan
Let f : X −→ S be an algebraic fiber space, i.e., X,S are smooth projective
varieties and f is a projective morphism with connected fibers. Suppose that
for a general fiber Xs := f
−1(s), KXs is pseudoeffective
4. In this case we may
define a singular hermitian metric ĥcan on KX/S similarly as above. Then ĥcan
have a nice properties on f : X −→ S as follows.
Theorem 1.10 Let f : X −→ S be an algebraic fiber space such that for a
general fiber Xs, KXs is pseudoeffective. We set S
◦ be the maximal nonempty
Zariski open subset of S such that f is smooth over S◦ and X◦ = f−1(S◦).
Then there exists a unique singular hermitian metric ĥcan on KX/S such that
1. ĥcan has semipositive curvature in the sense of current.
2. ĥcan |Xs is an AZD of KXs for every s ∈ S◦.
3. There exists the union F of at most countable union of proper subvarieties
of S such that for every s ∈ S \F ,
ĥcan|Xs ≦ ĥcan,s
holds, where ĥcan,s denotes the supercanonical AZD of KXs .
4. There exists a subset G of measure 0 in S◦, such that for every s ∈ S◦ \G,
ĥcan |Xs = ĥcan,s holds.
Remark 1.11 Even for s ∈ G, ĥcan|Xs is an AZD of KXs by 2. I do not know
whether F or G really exists in some cases. �
By Theorem 1.10 and the L2-extension theorem ([O-T, p.200, Theorem]), we
obtain the following corollary immediately.
Corollary 1.12 ([S1, S2, T3]) Let f : X −→ S be a smooth projective family
over a complex manifold S. Then plurigenera Pm(Xs) := dimH
0(Xs,OXs(mKXs))
is a locally constant function on S �
The following corollary is immediate consequence of Theorem 1.10, since the
supercanonical AZD is always has minimal singularities (cf. Definition 2.2 and
Remark 2.8).
Corollary 1.13 Let f : X −→ Y be an algebraic fiber space. Suppose that
KX and KY are pseudoeffective. Let ĥcan be the canonical singular hermitian
metric on KX/Y constructed as in Theorem 1.10. Let ĥcan,X , ĥcan,Y be the
supercanonical AZD’s of KX and KY respectively. Then there exists a positive
constant C such that
ĥcan,X ≦ C · ĥcan · f∗ĥcan,Y
holds on X. �
4This condition is equivalent to the one that for some regular fiber Xs, KXs is pseudoef-
fective. This is well known. For the proof, see Lemma 3.7 below for example.
Cororally 1.13 is very close to Iitaka’s conjecture which asserts that
Kod(X) ≧ Kod(Y ) + Kod(F )
holds for any algebraic fiber space f : X −→ Y , where F is a general fiber of
f : X −→ Y and Kod(M) denotes the Kodaira dimension of a compact complex
manifold M .
In this paper all the varieties are defined over C. And we frequently use the
classical result that the supremum of a family of plurisubharmonic functions
locally uniformly bounded from above is again plurisubharmonic, if we take the
uppersemicontinuous envelope of the supremum ([L, p.26, Theorem 5]). For
simpliciy, we denote the upper(resp. lower)semicontinuous envelope simply by
the upper(resp. lower) envelope. We note that this adjustment occurs only
on the set of measure 0. In this paper all the singular hermitian metrics are
supposed to be lowersemicontinuous.
There are other applications of the supercanonical AZD. Also it is imme-
diate to generalize it to the log category and another generalization involving
hermitian line bundles with semipositive curvature is also possible. These will
be discussed in the forthcoming papers.
2 Proof of Theorem 1.7
In this section we shall prove Theorem 1.7. We shall use the same notations as
in Section 1.2. The upper estimate of K̂Am is almost the same as in [T5], but the
lower estimate of K̂Am requires the L
2 extension theorem ([O-T, O]).
2.1 Upper estimate of K̂Am
Let X be as in Theorem 1.7 and let n denote dimX and let x ∈ X be an
arbitrary point. Let (U, z1, · · · , zn) be a coordinate neighbourhood ofX which is
biholomorphic to the unit open polydisk ∆n such that z1(x) = · · · = zn(x) = 0.
Let σ ∈ Γ(X,OX(A+mKX)). Taking U sufficiently small, we may assume
that (z1, · · · , zn) is a holomorphic local coodinate on a neighbourhood of the
closure of U and there exists a local holomorphic frame eA of A on a neighbour-
hood of the clousure of U . Then there exists a bounded holomorphic function
fU on U such that
σ = fU · eA · (dz1 ∧ · · · ∧ dzn)⊗m
holds. Suppose that
A · (σ ∧ σ̄)
m |= 1
holds. Then we see that
| fU (z) |
m dµ(z) ≦ (inf
hA(eA, eA))
hA(eA eA)
m | fU |2 dµ(z)
≦ (inf
hA(eA, eA))
hold, where dµ(z) denotes the standard Lebesgue measure on the coordinate.
Hence by the submeanvalue property of plurisubharmonic functions,
A · | σ |
m (x) ≦ { hA(eA, eA)(x)
infU hA(eA, eA)
m · π−n· |dz1 ∧ · · · ∧ dzn |2 (x)
holds. Let us fix a C∞ volume form dV on X . Since X is compact and every
line bundle on a contractible Stein manifold is trivial, we have the following
lemma.
Lemma 2.1 There exists a positive constant C independent of the line bundle
A and the C∞ metric hA such that
lim sup
A · K̂
m ≦ C · dV
holds on X. �
2.2 Lower estimate of K̂Am
Let hX be any C
∞ hermitian metric on KX . Let h0 be an AZD of KX defined
by the lower envelope of :
inf{h(x) | h is a singular hermitian metric on KX with Θh ≧ 0,h ≧ hX}.
Then by the classical theorem of Lelong ([L, p.26, Theorem 5]) it is easy to
verify that h0 is an AZD of KX (cf. [D-P-S, Theorem 1.5]). h0 is of minimal
singularities in the following sense.
Definition 2.2 Let L be a pseudoeffective line bundle on a smooth projective
variety X. An AZD h on L is said to be an AZD of minimal singularities,
if for any AZD h′ on L, there exists a positive constant C such that
h ≦ C · h′
holds. �
Let us compare h0 and ĥcan.
By the L2-extension theorem ([O]), we have the following lemma.
Lemma 2.3 There exists a positive constant C independent of m such that
K(A+mKX , hA · hm−10 ) ≧ C · (hA · hm0 )−1
holds, where K(A+mKX , hA · hm−10 ) is the (diagonal part of) Bergman kernel
of A+mKX with respect to the L
2-inner product:
(σ, σ′) := (
σ ∧ σ̄′ · hA · hm−10 ,
where we have considered σ, σ′ as A+ (m− 1)KX valued canonical forms. �
Proof of Lemma 2.3. By the extremal property of the Bergman kernel (see
for example [Kr, p.46, Proposition 1.4.16]) we have that
K(A+mKX , hA·hm−10 )(x) = sup{|σ(x) |
2| σ ∈ Γ(X,OX(A+mKX)⊗I(hm−10 )), ‖σ‖= 1},
holds for every x ∈ X , where ‖σ‖= (σ, σ) 12 . Let x be a point such that h0 is not
+∞ at x. Let dV be an arbitrary C∞ volume form on X as in Section 1.2. Then
by the L2-extension theorem ([O, O-T]) and the sufficiently ampleness of A (see
Section 1.2), we may extend any τx ∈ (A+mKX)x with hA·hm−10 ·dV −1(τx, τx) =
1 to a global section τ ∈ Γ(X,OX(A+mKX)⊗ I(hm−10 )) such that
‖τ ‖≦ C0,
where C0 is a positive constant independent of x and m. Let C1 be a positive
constant such that
h0 ≧ C1 · dV −1
holds on X . By (1), we obtain the lemma by taking C = C−10 · C1. �
Let σ ∈ Γ(X,OX(A+mKX)⊗ I(hm−10 )) such that
σ ∧ σ̄ · hA · hm−10 = 1
| σ |2 (x) = K(A+mKX , hA · hm−10 )(x)
hold, i.e., σ is a peak section at x. Then by the Hölder inequality we have that
A · (σ ∧ σ̄)
m | ≦ (
hA · hm0 · | σ |2 ·h−10 )
m · (
h−10 )
h−10 )
hold. Hence we have the inequality:
K̂Am(x) ≧ K(A+mKX , hA · hm−10 )(x)
m · (
h−10 )
m (2)
holds. Now we shall consider the limit
lim sup
A ·K(A+mKX , hA · h
Let us recall the following result.
Lemma 2.4 ([D, p.376, Proposition 3.1])
lim sup
A ·K(A+mKX , hA · h
m = h−10
holds. �
Remark 2.5 In ([D, p.376, Proposition 3.1], Demailly only considered the local
version of Lemma 2.4. But the same proof works in our case by the sufficiently
ampleness of A. This kind of localization principle for Bergman kernels is quite
standard. �
In fact the L2-extension theorem ([O-T, O]) implies the inequality
lim sup
A ·K(A+mKX , hA · h
m ≧ h−10
and the converse inequality is elementary. See [D] for details and applications.
Hence letting m tend to infinity in (2), by Lemma 2.4, we have the following
lemma.
Lemma 2.6
lim sup
A · K̂
m ≧ (
h−10 )
−1 · h−10
holds. �
By Lemmas 2.1 and 2.6, we see that
K̂A∞ := lim sup
A · K̂
exists as a bounded semipositive (n, n) form on X . We set
ĥcan,A := the lower envelope of (K
2.3 Independence of ĥcan,A from hA
In the above construction, ĥcan,A depends on the choice of the C
∞ hermitian
metric hA apriori. But actually ĥcan,A is independent of the choice of hA.
Let h′A be another C
∞-hermitian metric on A. We define
(K̂Am)
′ := sup{| σ | 2m ; σ ∈ Γ(X,OX(A+mKX)), |
(h′A)
m · (σ ∧ σ̄) 1m |= 1}.
We note that the ratio hA/h
A is a positive C
∞-function on X and
m = 1
uniformly on X . Since the definitions of K̂Am and (K̂
′ use the extremal prop-
erties, we see easily that for every positive number ε, there exists a positive
integer N such that for every m ≧ N
(1− ε)(K̂Am)′ ≦ K̂Am ≦ (1 + ε)(K̂Am)′
holds on X . Hence we obtain the following uniqueness theorem.
Theorem 2.7 K̂A∞ = lim supm→∞ h
A · K̂Am is independent of the choice of the
C∞ hermitian metric hA. Hence hcan,A is independent of the choice of the C
hermitian metric hA. �
2.4 Completion of the proof of Theorem 1.7
Let h0 be an AZD of KX constructed as in Section 2.1. Then by Lemma 2.6 we
see that
ĥcan,A ≦ (
h−10 ) · h0
holds. Hence we see
I(ĥmcan,A) ⊇ I(hm0 )
holds for every m ≧ 1. This implies that
H0(X,OX(mKX)⊗I(hm0 )) ⊆ H0(X,OX(mKX)⊗I(ĥmcan,A)) ⊆ H0(X,OX(mKX))
hold, hence
H0(X,OX(mKX)⊗ I(ĥmcan,A)) ≃ H0(X,OX(mKX))
holds for every m ≧ 1. And by the construction and the classical theorem of
Lelong ([L, p.26, Theorem 5]) stated in Section 1.3, ĥcan,A has semipositive
curvature in the sense of current. Hence ĥcan,A is an AZD of KX and depends
only on X and A by Lemma 2.7.
Let us consider
K̂∞ := sup
K̂∞,A
where sup means the pointwise supremum and A runs all the sufficiently am-
ple line bundle on X . Then Lemma 2.1, we see that K̂∞ is a well defined
semipositive (n, n) form on X . We set
ĥcan := the lower envelope of K̂
Then by the construction, ĥcan ≦ ĥcan,A for every ample line bundle A. Since ĥA
is an AZD ofKX , ĥcan is also an AZD ofKX indeed (again by [L, p.26, Theorem
5]). Since ĥcan,A depends only on X and A, ĥcan is uniquely determined by X .
This completes the proof of Theorem 1.7. �
Remark 2.8 As one see Section 2.2, we see that ĥcan is an AZD of KX of
minimal singularities (cf. Definition 2.2). �
2.5 Comparison of hcan and ĥcan
Suppose that X has nonnegative Kodaira dimension. Then by Theorem 1.3, we
can define the canonical AZD hcan on KX . We shall compare hcan and ĥcan.
Theorem 2.9
ĥcan,A ≦ hcan
holds on X. In particular
ĥcan ≦ hcan
holds on X �
Proof of Theorem 2.9. If X has negative Kodaira dimension, then the right
hand side is infinity. Hence the ineuqality is trivial.
Suppose thatX has nonnegative Kodaira dimension. Let σ ∈ Γ(X,OX(mKX))
be an element such that
(σ ∧ σ̄)
m |= 1
Let x ∈ X be an arbitrary point on X . Since OX(A) is globally generated by
the definition of A, there exists an element τ ∈ Γ(X,OX(A)) such that τ(x) 6= 0
and hA(τ, τ) ≦ 1 on X . Then we see that
hA(τ, τ)
m · (σ ∧ σ̄) 1m ≦ 1
holds. This implies that
K̂Am(x) ≧|τ(x) |
m ·Km(x)
holds at x. Noting τ(x) 6= 0,letting m tend to infinity, we see that
K̂A∞(x) ≧ K∞(x)
holds. Since x is arbitrary, this completes the proof of Theorem 2.9. �
Remark 2.10 The equality hcan = ĥcan implies the abundance of KX . �
By the same proof we obtain the following comparison theorem (without
assuming X has nonnegative Kodaira dimension).
Theorem 2.11 Let A,B a sufficiently ample line bundle on X. Suppose that
B −A is globally generated, then
ĥcan,B ≦ ĥcan,A
holds. �
Remark 2.12 Theorem 2.11 implies that
ĥcan = lim
ĥcan,ℓA
holds for any ample line bundle A on X. �
3 Variation of ĥcan under projective deforma-
tions
In this section we shall prove Theorem 1.10. The main ingredient of the proof
is the variation of Hodge structure.
3.1 Construction of ĥcan on a family
Let f : X −→ S be an algebraic fiber space as in Theorem 1.10.
The construction of ĥcan can be performed simultaeneously on the family
as follows. The same construction works for flat projective family with only
canonical singularities. But for simplicity we shall work on smooth category.
Let S◦ be the maximal nonempty Zariski open subset of S such that f is
smooth over S◦ and let us set X◦ := f−1(S◦).
Hereafter we shall assume that dimS = 1. The general case of Theorem 1.10
easily follows from just by cutting down S to curves. Let A be a sufficiently
ample line bundle on X such that for every pseudoeffective singular hermitian
line bundle (L, hL), OX(A+L)⊗I(hL) and OX(KX+A+L)⊗I(hL) are globally
generated and OXs(A+L |Xs)⊗I(hL |Xs) and OXs(KXs+A+L |Xs)⊗I(hL |Xs)
are globally generated for every s ∈ S◦ as long as hL|Xs is well defined.
Let us assume that there exists a smooth member D of | 2A | such that D
does not contain any fiber over S◦. Let σD a holomorphic section of 2A with
divisor D. We consider the singular hermitian metric
hA :=
| σD |
on A. We set
Em := f∗OX(A+mKX/S).
Since we have assumed that dimS = 1, Em is a vector bundle for every m ≧ 1.
We denote the fiber of the vector bundle over s ∈ S by Em,s. Then we shall
define the sequence of 1
A-valued relative volume forms by
K̂Am,s := sup{|σ |
m ;σ ∈ Em,s, |
A · (σ ∧ σ̄)
m |= 1}
for every s ∈ S◦. This fiberwise construction is different from that in Section
1.2 in the following two points :
1. We use the singular metric hA |Xs instead of a C∞ hermitian metric on
A |Xs.
2. We use Em,s instead of Γ(Xs,OXs(A|Xs +mKXs)).
We note that the 2nd difference occurs only over at most countable union of
proper analytic subsets in S◦. Since hA is singular, at some point s ∈ S◦ and
for some positive integer m0, K̂
might be identically 0 on Xs. But for any
s ∈ S◦ we find a positive integer m0 such that for every m ≧ m0, we have
A |Xs) = OXs holds for every m ≧ m0. Hence even in this case we see that
K̂Am,s is not identically 0 for every sufficiently large m.
We define the relative |A | 2m valued volume form K̂Am by
K̂Am|Xs := K̂Am,s(s ∈ S)
and a relative volume form K̂A∞ by
K̂A∞|Xs := lim sup
A · K̂
m,s(s ∈ S).
Of course the above construction of K̂Am,s(s ∈ S◦) works also for C∞ hermitian
metric instead of the singular hA as above. The reason why we use the singular
hA is that we shall use the variation of Hodge structure to prove the plurisub-
harmonic variation of log K̂Am,s.We may use a C
∞ metric with strictly positive
curvature on A, instead of the singular hA as above, if we use the plurisubhar-
monicity properties of Bergman kernels ([Ber, Theorem 1.2]) instead of Theorem
3.1. See Theorem 4.1 below.
We define singular hermitian metrics on A+mKX/S by
ĥm,A := the lower envelope of (K̂
Let us fix a C∞ hermitian metric hA,0 on A and we set
ĥcan,A := the lower envelope of lim inf
A,0 · ĥm,A.
Cleary ĥcan,A does not depend on the choice of hA,0 (in this sense, the presence
of hA,0 is rather auxilary). Then we define
ĥcan := the lower envelope of inf
ĥcan,A,
where A runs all the ample line bundle on X . At this moment, ĥcan is defined
only on KX/S |X◦. The extension of ĥcan to the singular hermitian metric on
the whole KX/S will be discussed later.
3.2 Semipositivity of the curvature current of ĥm,A
To prove the semipositivity of the curvature of ĥm,A, the following theorem is
essential.
Theorem 3.1 ([Ka3, p.174,Theorem 1.1] see also [F, Ka1]) φ : M −→ C be a
projective morphism with connected fibers from a smooth projective variety M
onto a smooth curve C. Let KM/C be the relative canonical bundle. We set
F := φ∗OM (KM/C)) and let C◦ denote the nonempty maximal Zariski open
subset of C such that φ is smooth over C◦. Let hM/C be the hermitian metric
on F | C◦ by
hM/C(σ, σ
′) := (
σ ∧ σ′,
where n = dimM − 1. Let π : P(F ∗) −→ C be the projective bundle associated
with F ∗ and Let L −→ P(F ∗) be the tautological line bundle. Let hL denote the
hermitian metric on L | π−1(S◦) induced by hM/C .
Then hL has semipositive curvature on π
−1(S◦) and hL extends to the sin-
gular hermitian metric on L with semipositive curvature current. �
We define the pseudonorm ‖σ‖ 1
of σ ∈ Em,s by
‖σ‖ 1
A · (σ ∧ σ̄)
m |m2 .
We set Em = f∗OX(A+mKX/S) and let Lm be the tautological line bundle
on P(E∗m), where E
m denotes the dual of Em. By Theorem 3.1 and the branched
covering trick, we obtain the following essential lemma.
Lemma 3.2 ([Ka1, p.63, Lemma 7 and p.64, Lemma 8]) Let
σ ∈ Γ(X,OX(A + mKX/S)). Then ‖ ‖ defines a singular hermitian metric
with semipositive curvature on Lm. �
Proof of Lemma 3.2. If there were no A, the lemma is completely the same as
[Ka1, p.63, Lemma 7 and p.64, Lemma 8]. In our case, we use the Kawamata’s
trick to reduce the logarithmic case to the non logarithmic case. Since this trick
has been used repeatedly by Kawamata himself (see [Ka2, Ka3] for example),
the following argument has no originality. We consider the multivalued relative
log canonical form
Then there exists a finite cyclic covering
µ : Y −→ X
such that µ∗( σ√
m is a (single valued) relative canonical form on Y 5. Here
the branch locus of µ may be much larger than the union of D∪(σ). But it does
not matter. The branch covering is used to reduce the log canonical case to the
canonical case. Let π : Ỹ −→ Y be an equivariant resolution of singularities
and let
f̃ : Ỹ −→ S
be the resulting family. We shall denote the composition µ ◦ π : Ỹ −→ X by
µ̃. Let U be a Zariski open subset of Sσ such that f̃ is smooth. We note that
the Galois group action is isometric on f̃∗OỸ (KỸ /S) with respect to the natural
L2-inner product on f̃∗OỸ (KỸ /S). Therefore by Theorem 3.1, we see that ‖ ‖
defines a singular hermitian metric on Lm with semipositive curvature on a
nonempty Zariski open of P(E∗m). Again by Theorem 3.1 the singular hermitian
metric extends to the whole P(E∗m) preserving semipositive curvature property.
We also present an alternative proof indicated by Bo Berndtsson at the workshop
at MSRI in April, 2007.
Alternative proof of Lemma 3.2(cf. [B-P, Section 6]).
We use the eqality
| σ | 2m= | σ |
| σ |2m−1m
and view
| σ |2m−1m
as a singular hermitian metric on (m− 1)KX/S +A. Then by [T4, Therem 5.4]
or [B-P], we see that
A · (σ ∧ σ̄)
m | 2m
defines a singular hermitian metric with semipositive curvature current on Lm.
The rest of the proof is identical as the previous one. �
5If we use a C∞ hermitian metric instead of the above hA, we also construct a cyclic
covering µ : Y −→ X such that 1
µ∗L is a genuine line bundle on Y and µ∗σ
m is a 1
valued canonical form on Y .
Remark 3.3 The metric hA can be replaced by a C
∞-hermitian metric with
semipositive curvature in the second proof. �
Corollary 3.4 (see also [B-P, Section 6]) The curvature Θ
ĥm,A
−1∂∂̄ log K̂m,A
is semipositive everywhere on X◦. �
Proof. Let x ∈ Xs(s ∈ S◦) and let Ω be a holomorphic local generatorof KX/S
and let eA be a holomorphic local generator of A on a neighbourhood U of x
in X◦. Viewing ξ(y) := (e−1A · Ω−m)(y) as an element of the dual of Em,f(y) by
σ ∈ Em,f(y) 7→ σ(y) · (e−1A · Ω−m)(y)(y ∈ U),
log(K̂m,A(y)· | eA |−
m · | Ω |−2 (y)) (y ∈ U)
is plurisubharmonic function on U , since
| ξ(y) |
m ·K̂m,A(y) = sup{
| ξ(y) · σ(y) | 2m
‖ [σ][ξ(y)] ‖
; σ ∈ Em,f(y), [σ][ξ(y)] 6= 0}
holds, where [σ][ξ(y)] denotes the class of σ ∈ Em,f(y) in the fiber Lm,[ξ(y)] at
[ξ(y)] ∈ P(E∗m). �
Now let us consider the behavior of ĥm,A along X\X◦. Since the problem is
local, we may and do assume S is a unit open disk ∆ in C for the time being.
For every local holomorphic section σ of Em the function
A · (σ ∧ σ̄)
is of algebraic growth along S \S◦. More precisely for s0 ∈ S \S◦ as in [Ka1,
p.59 and p. 66] there exist positive numbers C,α, β such that
A · (σ ∧ σ̄)
m |≦ C· |s− s0|−α · | log(s− s0) |β (3)
holds. Moreover as [Ka1, p.66] for a nonvanishing holomorphic section σ of Em
around p ∈ S \S◦, the pseudonorm
‖σ‖ 1
A (σ ∧ σ̄)
has a positive lower bound around every p ∈ S. This implies that ĥm,A is
bounded from below by a smooth metric along the boundary X \X◦. By the
above estimate, ĥm,A is of algebraic growth along the fiber on X \X◦ by its
definition and ĥm,A extends to a singular hermitian metric on
A+KX/S with
semipositive curvature on the whole X .
Now we set
ĥcan,A := the lower envelope of lim inf
A,0 · ĥm,A,
where hA,0 be a C
∞ metric on A (with strictly positive curvature) as in the last
subsetion 6.
To extend ĥcan,A across S \S◦, we use the following useful lemma.
6One may use hA instead of hA,0 here. But the corresponding limits may be different
along D, although the difference is negligible by taking the lower envelopes.
Lemma 3.5 ([B-T, Corollary 7.3]) Let {uj} be a sequence of plurisubharmonic
functions locally bounded above on the bounded open set Ω in Cm. Suppose
further
lim sup
is not identically −∞ on any component of Ω. Then there exists a plurisubhar-
monic function u on Ω such that the set of points
{x ∈ Ω | u(x) 6= (lim sup
uj)(x)}
is pluripolar. �
Since ĥm,A extends to a singular hermitian metric on
A + KX/S with
semipositive curvature current on the whole X and
ĥcan,A := the lower envelope of lim inf
A,0 · ĥm,A
exists as a singular hermitian metric on KX/S on X
◦ = f−1(S◦), we see that
ĥcan,A extends as a singular hermitian metric with semipositive curvature cur-
rent on the whole X by Lemma 3.5.
Repeating the same argument we see that ĥcan is a well defined singular her-
mitian metric with semipositive curvature current on KX/S |X◦ and it extends
to a singular hermitian metric on KX/S with semipositive curvature current on
the whole X .
3.3 Uniqueness of ĥcan,A for singular hA’s
In the above construction, we use a singular hermitian metric hA on A instead of
a C∞ hermitian metric. We note that hA is singular along the divisor D. Hence
the resulting metric may be a little bit different from the original construction
apriori. But actually Theorem 2.7 still holds. Our metric hA is defined as
as above. Let h′A be a C
∞ hermitian metric on A. Let us fix an arbitrary point
s ∈ S◦. Let us fix a Kähler metric on X and let Uε be the ε neighbourhood
of D with respect to the metric. By the upper estimate Lemma 2.1, we see
that although hA is singular along D, there exists a positive integer m0 and a
positive constnat C depending only on s such that for every m ≧ m0 and any
σ ∈ Em,s with
‖ σ ‖ 1
A · (σ ∧ σ̄)
2 = 1,
Uε∩Xs
A · (σ ∧ σ̄)
m |≦ C · ε
holds. This means that there is no mass concentration around the neighbour-
hood of D ∩ Xs. We note that on Xs \Uε the ratio (hA/h′A)
m converges uni-
formly to 1 as m tends to infinity. Hence by the definitions of K̂Am,s and (K̂
we see that for every s ∈ S◦ and δ > 0, there exists a positive integer m1 such
that for every m ≧ m1
(1− δ)(K̂Am,s)′ ≦ K̂Am,s ≦ (1 + δ)(K̂Am,s)′
holds on Xs. Hence we have the following lemma.
Lemma 3.6 K̂A∞,s is same as the one defined by a C
∞ hermitian metric on A
for every s ∈ S◦. �
3.4 Case dimS > 1
In Sections 3.1,3.2, we have assumed that dimS = 1. In the case of dimS > 1
the same proof works similarly. But there are several minor differences.
First there may not exist D ∈| 2A | which does not contain any fibers, hence
the restriction of hA may not be well defined on some fibers in this case. But
this can be taken care by Lemma 3.6. Namely ĥcan is independent of the choice
of D. Hence replacing hA by a C
∞ hermitian metric, we see that K̂A∞ is defined
on all fibers over S◦.
Second in this case Em = f∗OX(A+mKX/S) may not be locally free on S◦.
If Em.s is not locally free at s0 ∈ S◦, then K̂A∞ may be discontinuous at s0. But
J := {s ∈ S◦ | Em is not locally free at s for some m ≧ 1}
is at most a countable union of proper subvarieties of S◦ and
ĥcan,A := the lower envelope of
is a well defined singular hermitian metric with semipositive curvature current
on X◦, i.e., the construction is indifferent to the thin set J . Hence we may
construct ĥcan on X
◦ in this case. The extension of ĥcan as a singular hermitian
metric onKX/S with semipositive curvature current can be accomplished just by
slicing S by curves. Hence we complete the proof of the assertion 1 in Theorem
1.10.
3.5 Completion of the proof of Theorem 1.10
To complete the proof of Theorem 1.10, we need to show that ĥcan defines an
AZD for KXs for every s ∈ S. To show this fact, we modify the construction of
K̂Am. Here we do not assume dimS = 1.
Let us fix s ∈ S◦ and let h0,s be an AZD constructed as in Section 2.2. Let
U be a neighbourhood of s ∈ S◦ in S◦ which is biholomorphic to an open ball in
k(k := dimS). By the L2-extension theorem ([O-T, O]), we have the following
lemma.
Lemma 3.7 Every element of Γ(Xs,OXs(A | Xs+mKXs)⊗I(hm−10,s )) extends
to an element of Γ(f−1(U),OX(A+mKX)) for every positive integer m. �
Proof of Lemma 3.7. We prove the lemma by induction on m. If m =
1, then the L2-extension theorem ([O-T, O]) implies that every element of
Γ(Xs,OXs(A +KXs)) extends to an element of Γ(f−1(U),OX(A +KX)). Let
{σ(m−1)1,s , · · · , σ
(m−1)
N(m−1)} be a basis of Γ(Xs,OXs(A | Xs+(m−1)KXs)⊗I(h̃
0,s ))
for some m ≧ 2. Suppose that we have already constructed holomorphic exten-
sions
{σ̃(m−1)1,s , · · · , σ̃
(m−1)
N(m−1),s} ⊂ Γ(f
−1(U),OX(A+ (m− 1)KX))
of {σ(m−1)1,s , · · · , σ
(m−1)
N(m−1),s} to f
−1(U). We define the singular hermitian metric
Hm−1 on (A+ (m− 1)KX) | f−1(U) by
Hm−1 :=
1∑N(m−1)
j=1 | σ̃
(m−1)
j,s |2
We note that by the choice of A, OXs(A |Xs + mKXs) ⊗ I(hm−10,s ) is globally
generated. Hence we see that
I(hm0,s) ⊆ I(hm−10,s ) ⊆ I(Hm−1|Xs)
hold on Xs. Apparently Hm−1 has a semipositive curvature current. Hence
by the L2-extension theorem ([O-T, p.200, Theorem]), we may extend every
element of
Γ(Xs,OXs(A+mKXs)⊗ I(hm−10,s ))
extends to an element of
Γ(f−1(U),OX(A+mKX)⊗ I(Hm−1)).
This completes the proof of Lemma 3.7 by induction. �
Let hA,0 be a C
∞ hermitian metric on A with strictly positive curvature as
in the end of the last subsection. We define the sequence of {K̃Am,s} by
K̃Am,s := sup{| σ |
m ; σ ∈ Γ(Xs,OXs(A | Xs+mKXs)⊗I(hm−10,s )), |
A,0·(σ∧σ̄)
m |= 1}.
By Lemma 3.7, we obtain the following lemma immediately.
Lemma 3.8
lim sup
A,0 · K̃
m,s ≦ K̂
holds. �
Proof. We set
K̂A,0m,s = sup{| σ |
m ; σ ∈ Em,s, |
A,0 · (σ ∧ σ̄)
m |= 1}.
Then by the definition of K̃Am,s and Lemma 3.7 we have that
K̃Am,s ≦ K̂
m,s (4)
holds on Xs. On the other hand by Lemma 3.6, we see that
lim sup
A,0 · K̂
m,s = lim sup
A,0 · K̂
m,s = K̂∞,s (5)
hold. Hence combining (4) and (5), we complete the proof of Lemma 3.8. �
We set
h̃m,A,s := (K̃
We have the following lemma.
Lemma 3.9 If we define
K̃A∞,s := lim sup
A,0 · K̃
h̃∞,A,s := the lower envelope of K̃
∞.A,s,
h̃∞,A,s is an AZD of KXs . �
Proof. Let h0,s be an AZD of KXs as above. We note that
OXs(A |Xs + mKXs) ⊗ I(hm−10,s ) is globally generated by the definition of A.
Then by the definition of K̃Am,s,
I(hm0,s) ⊆ I(h̃mm,A,s)
holds for every m ≧ 1. Hence by repeating the arugument in Section 2.2, similar
to Lemma 2.6, we have that
h̃∞,A,s ≦ (
h−10,s) · h0,s
holds. Hence h̃∞,A,s is an AZD of KXs . �
Since by the construction and Lemma 3.6
ĥcan,s ≦ h̃∞,A,s
holds on s, we see that ĥcan |Xs is an AZD of KXs . Since s ∈ S◦ is arbitrary,
we see that ĥcan |Xs is an AZD of KXs for every s ∈ S◦. This completes
the proof of the assertion 2 in Theorem 1.10. We have already seen that the
singular hermitian metric ĥcan has semipositive curvature in the sense of current
(cf. Section 3.2 expecially Corollary 3.4). We note that there exists the union
F of at most countable union of proper subvarieties of S such that for every
s ∈ S◦ \F
E(ℓ)m,s = Γ(Xs,OX(ℓA+mKXs))
holds for every ℓ,m ≧ 1. Then by the construction and Theorem 2.11(see
Remark 2.12)7 for every s ∈ S◦ \F ,
ĥcan|Xs ≦ ĥcan,s
holds, where ĥcan,s is the supercanonical AZD of KXs . This completes the proof
of the assertion 3 in Theorem 1.10.
7Theorem 2.11 is used because some ample line bundle on the fiber may not extends to an
ample line bundle on X in general.
We shall define the singular hermitian metric Ĥcan on KX/S|X◦ by
Ĥcan|Xs := ĥcan,s (s ∈ S◦).
Then by the construction of ĥcan there exists a subset Z of measure 0 in X
such that
Ĥcan|X◦ \Z = ĥcan|X◦ \Z
holds. Let us set
G := {s ∈ S◦ | Xs ∩ Z is not of measure 0 in Xs}.
Then since Z is of measure 0, G is of measure 0 in S◦. For s ∈ S \G, by the
definition of the supercanonical AZD ĥcan,s of KXs , we see that
ĥcan|Xs = ĥcan,s
holds. This completes the proof of Theorem 1.10. �
Remark 3.10 As above we have used the singular hermitian metric hA to prove
Theorem 1.10 and then go back to the case of a C∞ metric by the uniqueness
result (Lemma 3.6). This kind of interaction between singular and smooth met-
rics have been seen in the convergence of the currents associated with random
sections of a positive line bundle to the 1-st Chern form of the positive line
bundle (see [S-Z]). My first plan of the proof of Theorem 1.10 was to use the
random sections to go to the smooth case from the singular case. Although I
cannot justify it, it seems to be interesting to pursue this direction. �
4 Appendix
The following theorem is a generalization of Theorem 3.1.
Theorem 4.1 φ : M −→ C be a projective morphism with connected fibers
from a smooth projective variety M onto a smooth curve C. Let KM/C be the
relative canonical bundle. Let (L, hL) be a pseudoeffective singular hermitian
line bundle on M Let m be a positive integer. We set F := φ∗OM (mKM/C +L)
and let C◦ denote the nonempty maximal Zariski open subset of C such that
φ is smooth over C◦. Let π : P(F ∗) −→ C be the projective bundle associated
with F ∗ and Let H −→ P(F ∗) be the tautological line bundle. Let hH denote
the singular hermitian metric on H | π−1(S◦) defined by
hH(σ, σ) := {(
L · (σ ∧ σ)
m }m2 ,
where n = dimM − 1. Then hH has semipositive curvature on π−1(S◦) and
hH extends to the singular hermitian metric on H with semipositive curvature
current. �
Proof. The proof is a minor modification of the proof of Lemma 3.2. Let σ
be a local holomorphic section of H on π−1(S◦). We consider the multivalued
L-valued canonical form m
σ and uniformize it by taking a suitable cyclic
Galois covering
µ : Y −→ X
as in Lemma 3.2. Then applying [Ber, Theorem 1.2] or [T4, Theorem 5.4] (see
also [B-P]) on Y , as in Lemma 3.2, we see that hH defines a singular hermitian
metric on the tautological line bundle on P(E∗m). Hence we see that hH has
semipositive curvature on π−1(S◦). The extension of hH to the whole H is also
follows from [T4, Theorem 5.4]. This completes the proof of Theorem 4.1. �
References
[B-T] E. Bedford, B.A. Taylor : A new capacity of plurisubharmonic functions, Acta
Math. 149 (1982), 1-40.
[Ber] B. Berndtsson: Curvature of vector bundles associated to holomorphic fibra-
tions, math.CV/0511225 (2005).
[B-P] B. Berndtsson, B. and M. Paun : Bergman kernels and the pseudoeffectivity of
relative canonical bundles, math.AG/0703344 (2007).
[D] J.P. Demailly : Regularization of closed positive currents and intersection theory.
J. Algebraic Geom. 1 (1992), no. 3, 361–409.
[D-P-S] J.P. Demailly-T. Peternell-M. Schneider : Pseudo-effective line bundles on
compact Kähler manifolds, International Jour. of Math. 12 (2001), 689-742.
[F] T. Fujita : On Kähler fiber spaces over curves, J. Math. Soc. Japan 30, 779-794
(1978).
[Ka1] Y. Kawamata: Kodaira dimension of Algebraic fiber spaces over curves, Invent.
Math. 66 (1982), pp. 57-71.
[Ka2] Y. Kawamata, Subadjunction of log canonical divisors II, alg-geom
math.AG/9712014, Amer. J. of Math. 120 (1998),893-899.
[Ka3] Y. Kawamata, On effective nonvanishing and base point freeness, Kodaira’s
issue, Asian J. Math. 4, (2000), 173-181.
[Kr] S. Krantz : Function theory of several complex variables, John Wiley and Sons
(1982).
[L] P. Lelong : Fonctions Plurisousharmoniques et Formes Differentielles Positives,
Gordon and Breach (1968).
[N] A.M. Nadel: Multiplier ideal sheaves and existence of Kähler-Einstein metrics of
positive scalar curvature, Ann. of Math. 132 (1990),549-596.
[O-T] T. Ohsawa, K. Takegoshi: L2-extension of holomorphic functions, Math. Z. 195
(1987),197-204.
[O] T. Ohsawa: On the extension of L2 holomorphic functions V, effects of gener-
alization, Nagoya Math. J. 161 (2001) 1-21, Erratum : Nagoya Math. J. 163
(2001).
[S-Z] B. Shiffman, S. Zelditch :Distribution of zeros of random and quantum chaotic
sections of positive line bundles. Comm. Math. Phys. 200 (1999), no. 3, 661–683.
[S1] Y.-T. Siu : Invariance of plurigenera, Invent. Math. 134 (1998), 661-673.
[S2] Y.-T. Siu : Extension of twisted pluricanonical sections with plurisubharmonic
weight and invariance of semipositively twisted plurigenera for manifolds not nec-
essarily of general type, Collected papers Dedicated to Professor Hans Grauert
(2002), pp. 223-277.
[T1] H. Tsuji: Analytic Zariski decomposition, Proc. of Japan Acad. 61(1992) 161-
[T2] H. Tsuji: Existence and Applications of Analytic Zariski Decompositions, Trends
in Math. Analysis and Geometry in Several Complex Variables, (1999) 253-272.
http://arxiv.org/abs/math/0511225
http://arxiv.org/abs/math/0703344
http://arxiv.org/abs/math/9712014
[T3] H. Tsuji: Deformation invariance of plurigenera, Nagoya Math. J. 166 (2002),
117-134.
[T4] H. Tsuji: Dynamical construction of Kähler-Einstein metrics, math.AG/0606023
(2006).
[T5] H. Tsuji: Curvature semipositivity of relative pluricanonical systems,
math.AG/0703729 (2007).
[T6] H. Tsuji: Kodaira dimension of algebraic fiber spaces, in preparation.
Author’s address
Hajime Tsuji
Department of Mathematics
Sophia University
7-1 Kioicho, Chiyoda-ku 102-8554
Japan
http://arxiv.org/abs/math/0606023
http://arxiv.org/abs/math/0703729
	Introduction
	Canonical AZD hcan
	Supercanonical AZD can
	Variation of the supercanonical AZD can
	Proof of Theorem 1.7
	Upper estimate of mA
	Lower estimate of mA
	Independence of can,A from hA
	Completion of the proof of Theorem 1.7
	Comparison of hcan and can
	Variation of can under projective deformations
	Construction of can on a family
	Semipositivity of the curvature current of m,A
	Uniqueness of can,A for singular hA's
	Case dimS > 1
	Completion of the proof of Theorem 1.10
	Appendix
ABSTRACT
  We introduce a new class of canonical AZD's (called the supercanonical AZD's)
on the canonical bundles of smooth projective varieties with pseudoeffective
canonical classes. We study the variation of the supercanonical AZD
$\hat{h}_{can}$ under projective deformations and give a new proof of the
invariance of plurigenera.

<|endoftext|><|startoftext|>
Introduction
We consider a model for the term structure of interest rates, where the short
rate (rt)t≥0 is given under the martingale measure by a one-dimensional conserva-
tive affine process in the sense of Duffie, Filipović, and Schachermayer [2003]. An
affine short rate process of this type will lead to an exponentially-affine structure
of zero-coupon bond prices and thus also to an affine term structure of yields and
forward rates.
We emphasize here that the definition of Duffie et al. [2003] is not limited to diffu-
sions, but also includes processes with jumps and even with jumps whose intensity
depends in an affine way on the state of the process itself. The class of models we
consider naturally includes the Vasiček model, the CIR model and variants of them
that are obtained by adding jumps, such as the JCIR-model of Brigo and Mercurio
[2006, Section 22.8]. Since they are the best-known, the two ‘classical’ models of
Vasiček and Cox-Ingersoll-Ross will serve as the starting point for our discussion of
yield curve shapes:
A common criticism of the (time-homogenous) CIR and the Vasiček model is that
they are not flexible enough to accommodate more complex shapes of yield curves,
such as curves with a dip (a local minimum), curves with a dip and a hump, or
Date: November 4, 2018.
2000 Mathematics Subject Classification. 60J25, 91B28.
Key words and phrases. affine process, term structure of interest rates, Ornstein-Uhlenbeck
process, yield curve.
Supported by the Austrian Science Fund (FWF) through project P18022 and the START
programm Y328.
Supported by the module M5 “Modelling of Fixed Income Markets” of the PRisMa Lab,
financed by Bank Austria and the Republic of Austria through the Christian Doppler Research
Association.
Both authors would like to thank Josef Teichmann for most valuable discussions and encour-
agement. We also thank various proof-readers at FAM for their comments.
http://arxiv.org/abs/0704.0567v2
2 MARTIN KELLER-RESSEL AND THOMAS STEINER
other shapes that are frequently observed in the markets. Often these shortcomings
are explained by ‘too few parameters’ in the model (cf. Carmona and Tehranchi
[2006, Section 2.3.5] or Brigo and Mercurio [2006, Section 3.2]). However if jumps
are added to the mentioned models, additional parameters (potentially infinitely
many) are introduced through the jump part, while the model still remains in the
scope of affine models. It is not clear per se what consequences the introduction of
jumps will have for the range of attainable yield curves, and this is one question we
intend to answer in this article.
Moreover, there seems to be some confusion about what shapes of yield curves
are actually attainable even in well-studied models like the CIR-model. While
most sources (including the original paper of Cox et al. [1985]) mention inverse,
normal and humped shapes, Carmona and Tehranchi [2006, Section 2.3.5] write
that ‘tweaking the parameters [of the CIR model] can produce yield curves with one
hump or one dip’, and Brigo and Mercurio [2006, Section 3.2] state that ‘some typ-
ical shapes, like that of an inverted yield curve, may not be reproduced by the [CIR
or Vasiček] model.’ In our main result, Theorem 3.9, we settle this question and
prove that in any time-homogenous, affine one-factor model the attainable yield
curves are either inverse, normal or humped. The proof will rely only on tools
of elementary analysis and on the characterization of affine processes through the
generalized Riccati equations of Duffie et al. [2003].
Another related problem is how the shape of the yield curve is determined by the
parameters of the model, and also how – when the parameters are fixed – the yield
curve is determined by the level of the current short rate. We show in Section 4.2
that also in this respect the CIR model has not been completely understood and
discuss a misconception that originates in [Cox et al., 1985] and is repeated for ex-
ample in [Rebonato, 1998].
In Section 3.3 we provide conditions under which an affine process converges
to a limit distribution. We also characterize the limit distribution in terms of its
cumulant generating function, extending results of Jurek and Vervaat [1983] and
Sato and Yamazato [1984] for OU-type processes to the class of affine processes.
These results can again be interpreted in the context of interest rates, where they
can be used to derive the risk-neutral asymptotic distribution of the short rate
(rt)t≥0 as t goes to infinity.
We conclude our article in Section 4 by applying the theoretical results to several
interest rate models, such as the Vasiček model, the CIR model, the JCIR model
and an Ornstein-Uhlenbeck-type model.
2. Preliminaries
In this section we collect some key results on affine processes from Duffie et al.
[2003]. In their article affine processes are defined on the (m+n)-dimensional state
space Rm
>0 × Rn, and we will try to simplify notation where this is possible in the
one-dimensional case. Results on affine processes with state space R>0 can also be
found in Filipović [2001].
Definition 2.1 (One-dimensional affine process). A time-homogenous Markov pro-
cess (rt)t≥0 with state space D = R>0 or R and its semi-group (Pt)t≥0 are called
YIELD CURVE SHAPES IN AFFINE ONE-FACTOR MODELS 3
affine, if the characteristic function of its transition kernel pt(x, .), given by
p̂t(x, u) =
euξ pt(x, dξ)
and defined (at least) on
{u ∈ C : Reu ≤ 0} if D = R>0 ,
{u ∈ C : Reu = 0} if D = R ,
is exponentially affine in x. That is, there exist C-valued functions φ(t, u) and
ψ(t, u), defined on R>0 × U , such that
(2.1) p̂t(x, u) = exp (φ(t, u) + xψ(t, u)) for all x ∈ D, (t, u) ∈ R>0 × U .
For subsequent results the following regularity condition for (rt)t≥0 will be
needed:
Definition 2.2. An affine process is called regular if it is stochastically continuous
and the right hand derivatives
∂+t φ(t, u)|t=0 and ∂+t ψ(t, u)|t=0
exist for all u ∈ U and are continuous at u = 0.
Definition 2.3. The parameters (a, α, b, β, c, γ,m, µ) are called admissible for a
process with state space R>0 if
a = 0,
α, b, c, γ ∈ R>0 ,
β ∈ R ,
m, µ are Lévy measures on (0,∞), where m satisfies
(0,∞)
(ξ ∧ 1)m(dξ) <∞ ,
and admissible for a process with state space R if
a, c ∈ R>0 ,
b, β ∈ R ,
m is a Lévy measure on R \ {0} ,
α = 0, γ = 0, µ ≡ 0 .
Moreover define the truncation functions
hF (ξ) =
0 if D = R>0
if D = R
and hR(ξ) =
if D = R>0
0 if D = R
and finally the functions F (u), R(u) for u ∈ C as
F (u) = au2 + bu− c+
D\{0}
euξ − 1− uhF (ξ)
m(dξ) ,(2.2)
R(u) = αu2 + βu− γ +
D\{0}
euξ − 1− uhR(ξ)
µ(dξ) .(2.3)
The next result is a one-dimensional version of the key result of Duffie et al.
[2003]:
4 MARTIN KELLER-RESSEL AND THOMAS STEINER
Theorem 2.4 (Duffie, Filipović, and Schachermayer, Theorem 2.7). Suppose (rt)t≥0
is a one-dimensional regular affine process. Then it is a Feller process. Let A be
its infinitesimal generator. Then C∞c (D) is a core of A, C2c (D) ⊆ D(A) and there
exist some admissible parameters (a, α, b, β, c, γ,m, µ) such that, for f ∈ C2c (D),
Af(x) = (a+ αx)f ′′(x) + (b + βx)f ′(x)− (c+ γx)f(x)+
D\{0}
(f(x+ ξ)− f(x)− f ′(x)hF (ξ)) m(dξ)+
D\{0}
(f(x+ ξ)− f(x) − f ′(x)hR(ξ)) µ(dξ) .(2.4)
Moreover φ(t, u) and ψ(t, u), defined by (2.1), solve the generalized Riccati equa-
tions
∂t φ(t, u) = F (ψ(t, u)) , φ(0, u) = 0 ,(2.5a)
∂t ψ(t, u) = R (ψ(t, u)) , ψ(0, u) = u .(2.5b)
Conversely let (a, α, b, β, c, γ, µ,m) be some admissible parameters. Then there ex-
ists a unique regular affine semigroup (Pt)t≥0 with infinitesimal generator (2.4),
and (2.1) holds with φ(t, u) and ψ(t, u) given by (2.5).
Closely related to affine processes is the notion of an Ornstein-Uhlenbeck (OU-
)type process. These processes are of some importance, since they usually offer
good analytic tractability and have been studied for longer than affine processes.
Following Sato [1999, Chapter 17] an OU-type process (Xt)t≥0 can be defined as
the solution of the Langevin SDE
dXt = −λXt dt+ dLt, λ ∈ R, X0 ∈ R,
where (Lt)t≥0 is a Lévy process, often called background driving Lévy process
(BDLP). In an equivalent definition, an OU-type process is a time-homogenous
Markov process, whose transition kernel pt(x, .) has the characteristic function
p̂t(x, u) = exp
F (e−λsu) ds+ xe−λtu
where F (u) is the characteristic exponent of (Lt)t≥0. From the last equation it is
immediately seen that every OU-type process is an affine process in the sense of
Definition 2.1. It is also seen that in the generalized Riccati equations (2.5) for an
OU-type process necessarily R(u) = −λu. Comparing this with (2.3) and Defini-
tion 2.3, it is seen that any regular affine process with state space R is a process of
OU-type. The reverse, however is not true, as there also exist OU-type processes
with state space R>0. We will give an example of such a process in Section 4.4.
Naturally we will not only be interested in the process (rt)t≥0 itself, but also in
its integral
rs ds and in quantities of the type
(2.6) Qt f(x) := E
rs ds
f(rt)
∣∣∣∣ r0 = x
where f is a bounded function on D. The next result is an application of the
Feynman-Kac formula for Feller semigroups (cf. Rogers and Williams [1994, Sec-
tion III.19]) and can be found in Duffie et al. [2003]. It relies on the positivity of
(rt)t≥0 and is therefore only applicable if D = R>0.
YIELD CURVE SHAPES IN AFFINE ONE-FACTOR MODELS 5
Proposition 2.5 (Duffie, Filipović, and Schachermayer, Proposition 11.1). Let (rt)t≥0
be a one-dimensional, regular affine process with state space R>0. Then the fam-
ily (Qt)t≥0 defined by (2.6) forms a regular, affine semigroup with infinitesimal
generator
Bf(x) = Af(x)− xf(x) for all f ∈ C2c (D) .
We will make extensive use of the convexity and continuous differentiability of
the functions F and R from Definition 2.3. These properties are established in this
Lemma:
Lemma 2.6. If c = γ = 0 then F, R as defined in Definition 2.3 have the following
properties:
(i) R(0) = 0 and F (0) = 0.
(ii) R(u) <∞ for all u ∈ (−∞, 0].
(iii) If F (u) < ∞ on (c1, c2) ⊆ R, then F is either strictly convex on (c1, c2) or
F (u) = bu for all u ∈ R. The same holds for R with b replaced by β.
(iv) If F (u) <∞ on (c1, c2) ⊆ R, then F is continuously differentiable on (c1, c2).
Also the one-sided derivatives at c1 and c2 are defined but may take the values
−∞ (at c1) and +∞ (at c2). The same holds for R.
Proof. Property (i) is obvious. If D = R then by Definition 2.3 R(u) = βu such
that (ii) follows immediately. If D = R>0 we use the estimate
(2.7) |euξ − 1− uhR(ξ)| ≤ |u|
O(ξ2) ∧ 1
for all u ∈ (−∞, 0] and ξ ∈ R>0, and (ii) follows from (2.3). For Property (iii)
note that by the Lévy-Khintchine formula there exists an infinitely divisible random
variableX , such that F is its cumulant generating function, i.e. F (u) = logE
for u ∈ (c1, c2). Choosing two distinct numbers u, v ∈ (c1, c2), we apply the Cauchy-
Schwarz inequality to
= logE
2 · e vX2
≤ log
E[euX ] · E[evX ] = F (u) + F (v)
which shows convexity of F . The inequality is strict unless there exists some c 6= 0
such that euX = cevX almost surely. This can only be the case if X is con-
stant a.s., in which case F is linear. The same argument applies to R. Property
(iv) follows from the convexity and from the fact that F and R are analytic on
{u ∈ C : Reu ∈ (c1, c2)} (cf. Lukacs [1960, Chapter 7]). �
3. Theoretical Results
We will now use the theory from the last section to calculate bond prices, yields
and other quantities in an interest rate model where the short rate follows a one-
dimensional regular affine process (rt)t≥0 under the martingale measure. Naturally
we will also make the assumption that (rt)t≥0 is conservative, i.e. that pt(x,D) = 1
for all (t, x) ⊆ R>0 ×D. This implies by Duffie et al. [2003, Proposition 9.1] that
c = γ = 0 in Definition 2.3. We will need some additional assumptions which are
summarized in the following condition:
Condition 3.1. The one-dimensional affine process (rt)t≥0 is assumed to be reg-
ular and conservative. In addition, if the process has state space R, such that by
6 MARTIN KELLER-RESSEL AND THOMAS STEINER
Definition 2.3 R(u) = βu, we require that
(3.1) F (u) <∞ for all u ∈
(1/β, 0] if β < 0 ,
(−∞, 0] else .
It will be seen that the condition on F is necessary to guarantee existence of bond
prices for all maturities in the term structure model. By Sato [1999, Theorem 25.17]
we get an equivalent formulation of Condition 3.1, if we replace F (u) < ∞ by∫
|ξ|>1
euξm(dξ) <∞. Next we define a quantity that will generalize the coefficient
of mean reversion from OU-type processes:
Definition 3.2 (quasi-mean-reversion). Given a one-dimensional conservative affine
process (rt)t≥0, define the quasi-mean-reversion λ as the positive solution of
(3.2) R(−1/λ) = 1 .
If there is no positive solution we set λ = 0.
Since R is by Lemma 2.6 a convex function satisfying R(0) = 0, it is easy to see
that (3.2) can have at most one solution and thus λ is well-defined. The name quasi-
mean-reversion is derived from the fact that if (rt)t≥0 is a process of OU-type with
positive mean-reversion, then R(u) = βu and the quasi-mean-reversion λ = −β
is exactly the coefficient of mean reversion of (rt)t≥0. When the process (rt)t≥0
satisfies Condition 3.1, it is seen that F must be defined at least on (−1/λ, 0].
We will encounter several times the condition that λ > 0. The next result gives an
equivalent formulation in terms of (α, β, µ):
Proposition 3.3. The quasi-mean reversion λ is strictly positive if and only if
α > 0,
D\{0}
hR(ξ)µ(dξ) = ∞, or β −
D\{0}
hR(ξ)µ(dξ) < 0.
Proof. First note that by Lemma 2.6 R(u) < ∞ for all u ∈ (−∞, 0]. Using the
estimate (2.7) and a dominated convergence argument it is seen from (2.3) that
= α(3.3)
R(u)− αu2
= β0 := β −
D\{0}
hR(ξ)µ(dξ) ,(3.4)
where β0 can also take the value −∞. Suppose now that α > 0. Then by (3.3) we
get limu→−∞R(u) = ∞. Since R(0) = 0 and R is continuous it follows that there
exists a λ > 0 such that R(−1/λ) = 1. Similarly if α = 0, but β0 < 0, it follows
from (3.4) that limu→−∞R(u) = ∞ and thus again that λ > 0.
Conversely, suppose that α = 0 and β0 ≥ 0. Then
R′(u) = lim
= β0 ≥ 0 .
By the convexity of R it follows that R′(u) ≥ 0 for all u ∈ (−∞, 0). Since R(0) = 0
this implies that R(u) ≤ 0 for all u ∈ (−∞, 0), and consequently that λ = 0. �
3.1. Bond Prices. We consider now the price P (t, t + x) of a zero-coupon bond
with time to maturity x, at time t, given by
P (t, t+ x) = E
∫ t+x
rs ds
)∣∣∣∣Ft
YIELD CURVE SHAPES IN AFFINE ONE-FACTOR MODELS 7
The affine structure of (rt)t≥0 carries over to the bond prices, and we get the
following result:
Proposition 3.4. Let the short rate be given by a one-dimensional affine process
(rt)t≥0 satisfying Condition 3.1.
Then the bond price P (t, t+ x) exists for all t, x ≥ 0 and is given by
(3.5) P (t, t+ x) = exp (A(x) + rtB(x))
where A and B solve the generalized Riccati equations
∂xA(x) = F (B(x)) A(0) = 0 ,(3.6a)
∂xB(x) = R (B(x)) − 1 B(0) = 0 .(3.6b)
Proof. If D = R>0 the assertion follows directly from Proposition 2.5 by noting
that P (t, t+ x) = Qx 1.
If D = R then, as discussed after Theorem 2.4, (rt)t≥0 is a process of OU-type and
R(u) has the simple structure R(u) = βu. By Sato [1999, (17.2) - (17.3)] we obtain
in this case directly that
(3.7) E
∫ t+x
rs ds
= exp
F (B(s)) ds + rtB(x)
with B(x) = (1 − eβx)/β if β 6= 0 and B(x) = −x when β = 0. As a function of
x ∈ R>0, B is continuously decreasing from 0 to 1/β if β < 0, and from 0 to −∞
if β ≥ 0. It is therefore seen that the integral on the right side of (3.7) is finite for
all x ∈ R>0 if and only if F satisfies (3.1), as required by Condition 3.1. �
Corollary 3.5. Let (rt)t≥0 satisfy Condition 3.1 and have quasi-mean-reversion
λ. Then the function B(x) from Proposition 3.4 is strictly decreasing and satisfies
B(x) = −1/λ .
Proof. The result follows from a qualitative analysis of the autonomous ODE (3.6b).
Let λ > 0. Since R(−1/λ)−1 = 0 the point x∗ := −1/λ is an critical point of (3.6b).
By the convexity of R and the fact that R(0) = 0 it follows that R′(x∗) < 0 such
that x∗ is asymptotically stable, i.e. solutions entering a small enough neighborhood
of x∗ must converge to x∗. Since R(x)− 1 < 0 for x ∈ (x∗, 0] and there is no other
critical point in (x∗, 0], we conclude that B(x) – the solution of (3.6b) starting at
0 – is strictly decreasing and converges to x∗.
If λ = 0 then there is no critical point in (−∞, 0] and R(x)−1 < 0 for x ∈ (−∞, 0].
It follows that B(x) is strictly decreasing and diverges to −∞. �
3.2. The Yield Curve and the Forward Rate Curve. The next results are the
central theoretical results of this article and describe the global shapes of attainable
yield curves in any affine one-factor term structure model.
Definition 3.6. The (zero-coupon) yield Y (rt, x) is given by Y (rt, 0) := rt and
(3.8) Y (rt, x) := −
logP (t, t+ x)
= −A(x)
for all x > 0 .
For rt fixed, we call the function Y (rt, .) the yield curve.
The (instantaneous) forward rate f(rt, x) is given by f(rt, 0) := rt and
(3.9) f(rt, x) := −∂x logP (t, t+ x) = −A′(x)− rtB′(x) for all x > 0 .
For rt fixed, we call the function f(rt, .) the forward rate curve.
8 MARTIN KELLER-RESSEL AND THOMAS STEINER
By l’Hospital’s rule and the generalized Riccati equations (3.6) it is seen that
both the yield and the forward rate curve are continuous at 0.
The first quantity associated to the yield curve that we consider, is the asymptotic
level basymp of the yield curve as x → ∞, also known as long-term yield, consol
yield or simply ‘long end’.
Theorem 3.7. Let the short rate process be given by a one-dimensional affine pro-
cess (rt)t≥0 satisfying Condition 3.1 with quasi-mean-reversion λ.
If λ > 0 then
basymp := lim
Y (rt, x) = lim
f(rt, x) = −F (−1/λ) .
If λ = 0 then
basymp = lim
−F (u) + rt (1−R(u)) .
Proof. From (3.6a) we obtain that
(3.10) lim
= lim
A′(x) = lim
F (B(x)) .
If λ > 0 then by Corollary 3.5
(3.11) lim
B(x) = −1/λ, lim
= 0 and lim
B′(x) = 0
and the assertion follows by combining (3.8) – (3.11).
If λ = 0 then limx→∞B(x) = −∞ and
= lim
B′(x) = lim
R(B(x)) − 1 .
By setting u := B(x) we obtain the desired result. �
From Theorem 3.7 it is clear that for practical purposes only models with λ > 0
will be useful. So far we know that in this case the short end of the yield curve is
given by Y (rt, 0) = rt and the long end by Y (rt,∞) = basymp. We will now examine
what happens between these two endpoints.
Definition 3.8. The yield curve Y (rt, x) is called
• normal if it is a strictly increasing function of x,
• inverse if it is a strictly decreasing function of x,
• humped if it has exactly one local maximum and no minimum on (0,∞).
In addition we call the yield curve flat if it is constant over all x ∈ R>0.
This is our main result on the shapes of yield curves in affine one-factor models:
Theorem 3.9. Let the risk-neutral short rate process be given by a one-dimensional
affine process (rt)t≥0 satisfying Condition 3.1 and with quasi-mean-reversion λ > 0.
In addition suppose that F 6= 0 and that either F or R is non-linear. Then the
following holds:
• The yield curve Y (rt, .) can only be normal, inverse or humped.
• Define
bnorm := −
F ′(−1/λ)
R′(−1/λ) and binv :=
R′(0)
if R′(0) < 0
+∞ if R′(0) ≥ 0 .
YIELD CURVE SHAPES IN AFFINE ONE-FACTOR MODELS 9
The yield curve is normal if rt ≤ bnorm , humped if bnorm < rt < binv and
inverse if rt ≥ binv .
The above theorem is visualized in Figure 1. For its proof we will use a simple
Lemma. We state the Lemma without proof, since it follows in an elementary way
from the usual definition of a convex function on R.
Lemma 3.10. A strictly convex or a strictly concave function on R intersects an
affine function in at most two points. In the case of two intersection points p1 < p2,
the convex function lies strictly below the affine function on the interval (p1, p2); if
the function is concave it lies strictly above the affine function on (p1, p2).
Proof of Theorem 3.9. Define the function H(x) : R>0 → R by
(3.12) H(x) := Y (rt, x)x = −A(x)− rtB(x) .
We will see that the convexity behavior of H will be crucial for the shape of the
yield curve Y (rt, .). From the generalized Riccati equations (3.6) the first derivative
of H is calculated as
(3.13) ∂xH(x) = −F (B(x)) − rt (R(B(x))− 1)
and the second as
(3.14) ∂xxH(x) = −B′(x) (F ′(B(x)) + rtR′(B(x))) .
Note that F and R are continuously differentiable by Lemma 2.6, and also B by
(3.6b), such that the second derivative of H is well-defined and continuous. Since B
is strictly decreasing by Corollary 3.5, the factor −B′(x) is positive for all x ∈ R>0.
The sign of ∂xxH(x) therefore equals the sign of
(3.15) k(x) := F ′(B(x)) + rtR
′(B(x)) .
From the fact that B is decreasing and F and R are convex it is obvious that k
must be decreasing. We will now show that k has at most a single zero in [0,∞):
(a) D = R>0: We have assumed that either F or R is non-linear. By Lemma 2.6
this implies that either F or R is strictly convex, and thus that either F ′ or R′
is strictly increasing. If rt > 0, then it follows that k is strictly decreasing and
thus has at most a single zero. If rt = 0, an additional argument is needed:
It could happen that F is of the form F = bu such that k(x) = b and k is no
longer strictly decreasing. However, by assumption, F 6= 0 such that in this
case k has no zero in [0,∞).
(b) D = R: In this case, by the admissibility conditions in Definition 2.3, we
have necessarily R(u) = βu. Also, since either F or R is non-linear, F must
be non-linear and thus by Lemma 2.6 strictly convex. It follows that k(x) =
F ′(B(x))+rtβ is strictly decreasing and thus has at most a single zero in [0,∞).
We have shown that k is decreasing and has at most a single zero; to determine
whether it has a zero for some value of rt, we consider the two ‘endpoints’ k(0) and
limx→∞ k(x). First we show that
(3.16) k(0) ≥ 0 if and only if rt ≤ binv :=
R′(0)
if R′(0) < 0
+∞ if R′(0) ≥ 0 .
Since B(0) = 0 by Proposition 3.4 it follows that
k(0) = F ′(0) + rtR
′(0) .
10 MARTIN KELLER-RESSEL AND THOMAS STEINER
We distinguish two cases:
(a) If R′(0) < 0 then the assertion (3.16) follows immediately.
(b) Consider the case that R′(0) ≥ 0: Assume that D = R. Then we have
R(u) = βu and R′(0) = β ≥ 0. This, however, stands in contradiction to
our assumption λ > 0, which implies that β = −λ < 0 (cf. Definition 3.2).
Thus we must have D = R>0 and rt ≥ 0; in this case it follows that k(0) ≥ 0,
for all rt ∈ D, and we set binv = +∞.
Next we consider the right end of k(x) and show that
(3.17) lim
k(x) ≤ 0 if and only if rt ≥ bnorm := −
F ′(−1/λ)
R′(−1/λ) .
Since limx→∞B(x) = −1/λ by Corollary 3.5 we have that
(3.18) lim
k(x) = F ′(−1/λ) + rtR′(−1/λ) .
By assumption λ > 0, and by Definition 3.2 it holds that R(−1/λ) = 1. Also
R(0) = 0, and by the mean value theorem
1 = R(−1/λ)−R(0) = − 1
R′(ξ)
for some ξ ∈ (−1/λ, 0). Since R′ is increasing, it follows that R′(−1/λ) ≤ −λ < 0,
and we can deduce (3.17) directly from (3.18).
We summarize our results on the function k so far: k stays negative on (0,∞) if
rt ≥ binv and positive if rt ≤ bnorm. It has a single zero on (0,∞) if and only if
bnorm < rt < binv. If k has a zero on (0,∞), since k is decreasing, the sign of k will
be positive to the left of the zero and negative to the right of the zero.
Since ∂xxH has the same sign as k, the statements above translate in the obvious
way to the convexity behavior of H . We will now use the convexity behavior of H
to derive our results about the yield curve.
Consider the equation
(3.19) H(x) = cx, x ∈ [0,∞)
for some fixed c ∈ R. Since H(0) = 0 this equation has at least one solution,
x0 = 0. If rt ≥ binv then H(x) is strictly concave on [0,∞), and by Lemma 3.10
the equation (3.19) has at most one additional solution x1. Also, when the solution
exists, H(x) crosses cx from above at x1. Similarly if rt ≤ bnorm then H(x) is
strictly convex, and there exists at most one additional solution x2 to (3.19) on
[0,∞). If the solution exists, then cx is crossed from below at x2. In the last case
bnorm < rt < binv, there exists a x∗ – the zero of k(x) – such that H(x) is strictly
convex on (0, x∗) and strictly concave on (x∗,∞). Now there can exist at most two
additional solutions x1, x2 to (3.19) with x1 < x
∗ < x2, such that cx is crossed from
below at x1 and from above at x2.
Because of definition (3.12), every solution to (3.19), excluding x0 = 0, is also a
solution to
(3.20) Y (rt, x) = c, x ∈ (0,∞)
with rt fixed. Also the properties of crossing from above/below are preserved since
x is positive. This means that in the case rt ≥ binv, equation (3.20) has at most a
single solution, or in other words, that every horizontal line is crossed by the yield
curve at most in a single point. If it is crossed, it is crossed from above. This
YIELD CURVE SHAPES IN AFFINE ONE-FACTOR MODELS 11
implies that Y (x) is a strictly decreasing function of x, or following Definition 3.8,
that the yield curve is inverse. In the case rt ≤ bnorm we have again that (3.20) has
at most a single solution and that every horizontal line is crossed from below by
the yield curve, if it is crossed. In other words, the yield curve is normal. In the
last case of bnorm < rt < binv, the yield curve crosses every horizontal line at most
twice, in which case it crosses first from below, then from above. Thus in this case
the yield curve is humped. �
Corollary 3.11. Under the conditions of Theorem 3.9 the instantaneous forward
rate curve has the same global behavior as the yield curve, i.e.
Y (rt, .) is inverse ⇐⇒ f(rt, .) is strictly decreasing
Y (rt, .) is humped ⇐⇒
f(rt, .) has exactly one local maximum
and no local minimum
Y (rt, .) is normal ⇐⇒ f(rt, .) is strictly increasing .
In the second case the maximum of the forward rate curve is f(rt, x∗), where x∗
solves
(3.21) rt = −
F ′(B(x))
R′(B(x))
, x ∈ (0,∞) .
Proof. This follows from the fact that ∂xH(x) as given in (3.13) is exactly the
forward rate f(rt, x). The derivative of the forward rate is therefore ∂xxH(x),
which is given in (3.14) as
∂xf(rt, x) = ∂xxH(x) = −B′(x) · k(x) .
The factor −B′(x) 6= 0 is always positive, and the possible sign changes and zeroes
of k(x) are discussed in the proof of Theorem 3.9, leading to the stated equivalences.
Equation (3.21) is simply the condition k(x∗) = 0. �
Corollary 3.12. Under the conditions of Theorem 3.9 it holds that
(3.22) bnorm < basymp < binv
whenever the quantities are finite. In addition it holds that
(3.23) D ∩ (bnorm, binv) 6= ∅ .
Remark 3.13. Note that equation (3.23) implies that there is always some rt ∈ D
such that the yield curve Y (rt, .) is humped.
Proof. By the mean value theorem there exists a ξ ∈ (−1/λ, 0) such that
basymp = −F (−1/λ) = F (0)− F (−1/λ) =
F ′(ξ) .
Since F is convex and thus F ′ increasing, it holds that
(3.24)
F ′(−1/λ)
≤ basymp ≤
F ′(0)
Applying the mean value theorem to R, there exists another ξ ∈ (−1/λ, 0) such
1 = R(−1/λ)−R(0) = − 1
R′(ξ) .
12 MARTIN KELLER-RESSEL AND THOMAS STEINER
Since R′ is increasing we deduce that R′(−1/λ) ≤ −λ < 0. Assuming that also
R′(0) < 0 we get
(3.25) − 1
R′(−1/λ) ≤
≤ − 1
R′(0)
Since either F orR is non-linear, one of the functions is strictly convex by Lemma 2.6.
Consequently either both inequalities in (3.24) or in (3.25) are strict. Putting them
together we get
′(−1/λ)
R′(−1/λ) < basymp < −
F ′(0)
R′(0)
proving (3.22) under the assumption that R′(0) < 0.
If R′(0) ≥ 0 then by definition binv = ∞. Equation (3.24) still holds, but in
(3.25) only the left inequality sign remains valid. Together this still proves that
bnorm < basymp and we have shown (3.22).
To prove (3.23) we distinguish two cases:
(a) D = R. In this case it is sufficient to prove−∞ < binv and bnorm <∞. Consider
first binv. If R
′(0) ≥ 0 then by definition binv = ∞ and nothing is to prove.
If R′(0) < 0 then binv = −F ′(0)/R′(0). By convexity F ′(0) > −∞ and the
assertion follows. Consider now bnorm = −F ′(−1/λ)/R′(−1/λ). From (3.25)
we know that R′(−1/λ) ≤ −λ < 0. By convexity F ′(−1/λ) <∞ and it follows
that bnorm <∞.
(b) D = R>0. In this case it is sufficient to prove 0 ≤ bnorm and to apply (3.22).
As above we have that bnorm = −F ′(−1/λ)/R′(−1/λ) and that R′(−1/λ) ≤
−λ < 0. By Definition 2.3
F ′(−1/λ) = b+
(0,∞)
ξe−ξ/λm(dξ)
with b ≥ 0. It follows that F ′(−1/λ) ≥ 0, proving the assertion. �
The last Corollary of this section shows the interesting fact that the occurrence
of a humped yield curve is a necessary and sufficient sign of randomness in the
short rate model:
Corollary 3.14. Let the risk-neutral short rate process be given by a one-dimensional
affine process (rt)t≥0 satisfying Condition 3.1 with F 6= 0 and quasi-mean-reversion
λ > 0. Then the following statements are equivalent:
(i) There exists a rt ∈ D such that Y (rt, .) is flat.
(ii) There exists no rt ∈ D such that Y (rt, .) is humped.
(iii) The short rate process (rt)t≥0 is deterministic.
(iv) F (u) = bu and R(u) = βu.
Proof. Theorem 3.9, together with Corollary 3.12, shows already that ¬(iv) implies
¬(i) and ¬(ii). Also, from the form of the generator in (2.4), it is seen that (iii) and
(iv) are equivalent. It remains to show that (iv) implies (i) and (ii). Proceeding
as in the proof of Theorem 3.9 we obtain instead of (3.15) simply
k(x) = b+ rtβ .
The yield curve will be humped if and only if k has a single (isolated) zero in [0,∞).
Since k is a constant function, this cannot be the case for any rt ∈ D and we have
shown (ii). By the same arguments as in the proof of Theorem 3.9 the yield curve
YIELD CURVE SHAPES IN AFFINE ONE-FACTOR MODELS 13
binv = −
F’(0)
R’(0)
basymp = − F(− 1 λ)
bnorm = −
F’(− 1 λ)
R’(− 1 λ)
Time to Maturity
Figure 1. This Figure shows a graphical summary of Theorems
3.7 and 3.9, as well as the definitions of the key quantities bnorm,
basymp and binv. In any affine model satisfying the conditions of
Theorem 3.9, the shapes of yield curves will follow the picture
given here. They will be normal if r0 is below bnorm, humped if r0
is between bnorm and binv and inverse if r0 is above binv. Also all
yield curves will tend asymptotically to the same level basymp.
is flat if and only if k is constant and equal to 0. This is the case if rt = − bβ . It
remains to show that rt ∈ D. Note that β = −λ < 0. In particular β 6= 0, such
that for D = R we are already done. If D = R>0 we have by the admissibility
conditions in Definition 2.3 that b ≥ 0. Thus rt = − bβ ≥ 0 and we have shown
(i). �
3.3. The Limit Distribution of an Affine Process. It is well-known that the
Gaussian Ornstein-Uhlenbeck process, for example, converges in law to a limit
distribution and that this distribution is Gaussian. The goal of this section is to
establish a corresponding result for affine processes. While calculating the marginal
distributions of an affine process involves solving the generalized Riccati equations
(2.5), it will be seen that the limit distribution is much easier obtained and can be
determined directly from the functions F and R.
In the interest rate model considered in the preceding section, the short rate follows
an affine process under the martingale measure, such that the results will allow us
to characterize the risk-neutral asymptotic short rate distribution. Often also the
14 MARTIN KELLER-RESSEL AND THOMAS STEINER
limit distribution under the objective measure is of interest, but the affine prop-
erty is in general not preserved by an equivalent change of measure, such that the
results are not directly applicable. Nevertheless, for the sake of tractability, condi-
tions on the measure change can be imposed, such that the model is affine under
both the objective and the risk-neutral measure. (See Nicolato and Venardos [2003]
for an example from option pricing and Cheridito et al. [2005] for more general re-
sults). In such a setting the results can also be applied under the objective measure.
Before we state the result, we want to recall that a real-valued random variable
L is called self-decomposable if for every c ∈ (0, 1) there exists a random variable
Lc, independent of L, such that
L = cL+ Lc for all c ∈ (0, 1) .
Since self-decomposability is a distributional property, we will identify L and its
law, and refer to both as self-decomposable.
For OU-type processes, limit distributions have been studied for some time;
the first results can be found in Jurek and Vervaat [1983] and Sato and Yamazato
[1984]. The next theorem summarizes these results, and can be found in similar
form in Sato [1999, Theorem 17.5]:
Theorem 3.15. Let (rt)t≥0 be a OU-type process on R. If
β < 0 and
|ξ|>1
log |ξ|m(dξ) <∞
then (rt)t≥0 converges in law to a limit distribution L which is independent of r0
and has the following properties:
(i) L is self-decomposable.
(ii) The cumulant generating function κ(u) = log
eux dL(x) satisfies
(3.26) κ(iu) =
F (is)
ds for all u ∈ R .
Conversely, if L is a self-decomposable distribution on R and β < 0, there exists
a unique triplet (a, b,m) satisfying the admissibility conditions of Definition 2.3,
such that L is the limit distribution of the affine process (of OU-type) given by the
parameters (a, b,m, β).
As discussed in Section 2, every regular affine process with state space R is of
OU-type, such that the above theorem applies. We now state our corresponding
result for affine processes on R>0:
Theorem 3.16. Let (rt)t≥0 be a one-dimensional, regular, conservative affine pro-
cess with state space R>0. If
R′(0) < 0 and
log ξ m(dξ) <∞
then (rt)t≥0 converges in law to a limit distribution L which is independent of r0,
and whose cumulant generating function κ is given by
(3.27) κ(u) =
F (s)
ds for all u ∈ (−∞, 0] .
YIELD CURVE SHAPES IN AFFINE ONE-FACTOR MODELS 15
Proof. By Theorem 2.4 the transition kernel pt(x, .) of the process (rt)t≥0 has the
characteristic function
p̂t(x, u) = exp (φ(t, u) + xψ(t, u))
where φ and ψ satisfy the generalized Riccati equations (2.5) for all u ∈ U , and
thus in particular for all u ∈ (−∞, 0]. Since R(0) = 0, 0 is a critical point of the
autonomous ODE (2.5b), and by the assumption R′(0) < 0 it is asymptotically
stable. By the convexity of R, R′(0) < 0 also implies that R(u) > 0 for all u ∈
(−∞, 0), such that ψ(t, u) is a strictly increasing function in t for all u ∈ (−∞, 0).
Since 0 is the only critical point of (2.5b) on (−∞, 0] it also follows that
ψ(t, u) = 0 for all u ∈ (−∞, 0] .
Consequently,
(3.28) lim
log p̂t(x, u) = lim
φ(t, u) =
F (ψ(r, u)) dr =
F (s)
where the last two equalities follow from (2.5) and the transformation s = ψ(r, u).
We will now show that the last integral in (3.28) converges absolutely for all u ∈
(−∞, 0]: Since R(u) ≥ 0 and F (u) ≤ 0 for all u ∈ (−∞, 0] we obtain
F (s)
∣∣∣∣ ds = −
F (s)
ds ≤ − 1
R′(0)
F (s)
ds, u ∈ (−∞, 0] ,
where the inequality follows from the fact that the convex function R is supported by
its tangent at 0. From the definition of F (u) in (2.2) it is clear that the convergence
of the last integral will depend only on the jump part of F , i.e. the integral converges
if and only if
(3.29)
(0,∞)
esξ − 1
m(dξ) ds <∞, for all u ∈ (−∞, 0].
Define M(u, ξ) =
esξ−1
ds. For a fixed u ∈ (−∞, 0], it is easily verified that
M(u, ξ) = O(ξ) as ξ → 0, and that M(u, ξ) = O(log ξ) as ξ → ∞. Since the Lévy
measurem(dξ) integrates (ξ∧1) by Definition 2.3, and log ξ ·1{ξ>1} by assumption,
it must also integrateM(u, ξ). Applying Fubini’s theorem, (3.29) follows, such that
κ(u) :=
F (s)
ds converges for all u ∈ (−∞, 0]. In particular limu↑0 κ(u) = 0, such
that the limit in (3.28) is a function that is left-continuous at 0. By standard results
on Laplace transforms of probability measures (cf. Steutel and van Harn [2004,
Theorem A.3.1]), the pointwise convergence of cumulant generating functions to a
function that is left-continuous at 0 implies convergence in distribution of (rt)t≥0
to a limit distribution L with cumulant generating function given by (3.28). �
Since the marginal distributions of an affine process are infinitely divisible, also
the limit distribution L must be infinitely divisible, if it exists. In Theorem 3.15
a stronger result is given for an affine process on R: In this case L is also self-
decomposable. An obvious question is, if this result can be extended to the state
space R>0. We will see that the answer is negative. In Section 4.3 an example of an
affine process with state space R>0 is given, which converges to an infinitely divisible
limit distribution that is not self-decomposable. This result is interesting, since it
leaves open the possibility of some unexpected properties of the limit distribution of
16 MARTIN KELLER-RESSEL AND THOMAS STEINER
an affine process. For example a self-decomposable distribution is always unimodal,
whereas an infinitely divisible distribution might be not.
4. Applications
4.1. The Vasiček model. We apply the results of the last section to the classical
Vasiček model
(4.1) drt = −λ(rt − θ) dt+ σ dWt, r0 ∈ R
where (Wt)t≥0 is a standard Brownian motion under the risk-neutral measure and
λ, θ, σ > 0. The Vasiček model is arguably the simplest affine model, and no
surprises are to be expected here. In fact all results that we state here can already
be found in the original paper of Vasiček [1977]. We advise the reader to view this
paragraph as a warm-up for the following examples.
Clearly (rt)t≥0 is a conservative affine process with
F (u) = λθu +
u2 ,(4.2)
R(u) = −λu .(4.3)
From the quadratic term in F and Definition 2.3, it is seen that (rt)t≥0 has state
space R. This property is often criticized, since it allows the short rate to become
negative.
From Theorem 3.9 we calculate
binv = θ and bnorm = θ −
such that the yield curve in the Vasiček model is normal if rt ≤ θ − σ2/λ2, inverse
if rt ≥ θ and humped in the remaining cases.
The long term yield is calculated from (3.7) as
basymp = −F (−1/λ) = θ −
in this case exactly the arithmetic mean of binv and bnorm.
Theorem 3.15 applies and the cumulant generating function κ of the risk-neutral
limit distribution L satisfies
κ(iu) = − 1
F (is)
iθ − σ
ds = uiθ − u
for u ∈ R. Hence, L is Gaussian with mean θ and variance σ2
4.2. The Cox-Ingersoll-Ross model. The Cox-Ingersoll-Ross (CIR)-model was
introduced by Cox et al. [1985]. In this model the short rate process (rt)t≥0 is given
by the SDE
(4.4) drt = −a(rt − θ)dt+ σ
rt dWt, r0 ∈ R>0
where (Wt)t≥0 is a standard Brownian Motion under the risk-neutral measure and
a, θ, σ > 0. The process (rt)t≥0 is a conservative affine process with
F (u) = aθu ,(4.5)
R(u) =
u2 − au .(4.6)
YIELD CURVE SHAPES IN AFFINE ONE-FACTOR MODELS 17
From Definition 2.3 it is seen that (rt)t≥0 has state space R>0. The fact that
interest rates stay non-negative in the CIR-model is often cited as an advantage
of the model over the Vasiček model. Calculating the quasi-mean-reversion (see
Definition 3.2), we find that
a2 + 2σ2 + a
From Theorem 3.7 we find that the long-term yield is given by
basymp = −F (−1/λ) =
a2 + 2σ2 + a
The boundary between humped and inverse behavior binv is calculated from Theo-
rem 3.9 as
binv = −
F ′(0)
R′(0)
Both quantities basymp and binv can also be found in [Cox et al., 1985, Eq. (26) and
following paragraph]. Before we consider bnorm, we quote (with notation adapted
to (4.4)) from page 394 of [Cox et al., 1985] where the shape of the yield curve is
discussed:
‘When the spot rate is below the long-term yield [= basymp], the
term structure is uniformly rising. With an interest rate in excess
of θ [= binv], the term structure is falling. For intermediate values
of the interest rate, the yield curve is humped.’
In our terminology, they claim that the yield curve is normal for rt ≤ basymp,
humped for basymp < rt < binv and inverse for rt ≥ binv. This stands in clear
contradiction to Theorem 3.9 and Corollary 3.12 where we have obtained that yield
curves are normal if and only if rt ≤ bnorm and that bnorm < basymp, or – in plain
words – that there are yield curves starting strictly below the long-term yield that
are still humped.
The claims of Cox et al. [1985] are repeated in [Rebonato, 1998, p. 244f], where even
several plots of ‘yield surfaces’ (the yield as a function of rt and x) are presented as
evidence. However Rebonato fails to indicate the level of basymp in the plots, such
that the conclusion remains ambiguous.
To clarify the scope of humped yield curves in the CIR-model we calculate bnorm
from Theorem 3.9:
bnorm = −
F ′(−1/λ)
R′(−1/λ) =
a2 + 2σ2
The relation bnorm < basymp < binv is immediately confirmed by noting that basymp
is the harmonic mean of bnorm and binv. For a graphical illustration we refer to
the second yield curve from below in Figure 1. The plot actually shows CIR yield
curves with parameters
a = 0.5, σ = 0.5, θ = 6%
plotted over a time scale of 25 years. The second curve from below starts at
r0 = 4.2%, i.e. below the long-term yield, but is visibly humped.
18 MARTIN KELLER-RESSEL AND THOMAS STEINER
To calculate the limit distribution of (rt)t≥0, we apply Theorem 3.16: The cu-
mulant generating function κ(u) of the limit distribution is given by
κ(u) =
F (s)
1− sσ2/2a ds = −
This is the cumulant generating function of a gamma distribution with shape pa-
rameter 2aθ/σ2 and scale parameter σ2/2a. Again this result can already be found
in Cox et al. [1985, p. 392].
4.3. An extension of the CIR model. To illustrate the power of the affine
setting, we consider now an extension of the CIR model that is obtained by adding
jumps to (4.4). We define the risk-neutral short rate process by
(4.7) drt = −a(rt − θ)dt+ σ
rt dWt + dJt, r0 ≥ 0
where (Jt)t≥0 is a compound Poisson process with intensity c > 0 and expo-
nentially distributed jumps of mean ν > 0. This model has been introduced by
Duffie and Gârleanu [2001] as a model for default intensity and is used by Filipović
[2001] as a short rate model. It can also be found in Brigo and Mercurio [2006]
under the name JCIR model. It is easily calculated that
F (u) = aθu+
ν − u, u ∈ (−∞, ν) ,(4.8)
R(u) =
u2 − au .(4.9)
Solving the generalized Riccati equations (3.6) for A(x) and B(x) becomes quite
tedious, but the quantities binv, basymp, bnorm can be calculated from Theorem 3.7
and Theorem 3.9 in a few lines: The quasi-mean reversion λ stays the same as in
the CIR model, since R does not change. From
F ′(u) = aθ +
(ν − u)2
we derive immediately
binv = θ +
basymp =
ν(a+ ν) + 2
bnorm =
γ(σ2ν + γ − a)2 ,
where γ =
a2 + 2σ2. Note that by setting the jump intensity c to zero, the ex-
pressions of the (original) CIR model are recovered.
Next we calculate the limit distribution of the model. Using the abbreviations
ρ := σ2/2 and ∆ := a− νρ we obtain
κ(u) =
F (s)
1− sρ/a ds+ c
(s− ν)(ρs− a) =
if ∆ 6= 0
−θν log
if ∆ = 0
YIELD CURVE SHAPES IN AFFINE ONE-FACTOR MODELS 19
as the cumulant generating function of the limit distribution L under the martin-
gale measure.
We now take a closer look at the distribution L, since this will answer the
question raised at the end of Section 3.3: For certain parameters, L is an example
for a limit distribution of an affine process that is infinitely divisible, but not self-
decomposable. We consider the case ∆ = 0 and define
(4.10) l(x) :=
νe−νx, x ∈ R>0 .
By Frullani’s integral formula
(4.11) κ(u) =
(eux − 1) l(x)
for all u ∈ (−∞, ν). Since l is non-negative on R>0, l(x)/x is the density of a
Lévy measure and (4.11) is seen to be the Lévy-Khintchine representation for the
cumulant generating function of the infinitely divisible distribution L. In addition,
L is self-decomposable if and only if l is non-negative and non-increasing on R>0
(cf. Sato [1999, Corollary 15.11]).
In the case of l(x) given by (4.10), it is easily calculated that l(x) has a single
maximum at x∗ = 1
. Thus, if c ≤ aθν, then x∗ ≤ 0, such that l is non-
increasing on R>0 and L is self-decomposable. If c > aθν then l is increasing
in the interval [0, x∗) and the limit distribution L is infinitely divisible, but not
self-decomposable.
4.4. The gamma model. Instead of analyzing the properties of a known model,
we will now follow a different route and construct a model that satisfies some given
properties. We want to construct an affine process on R>0 that has the same limit
distribution as the CIR model (i.e. a gamma distribution), but is a process of OU-
type. The second property is equivalent to R(u) = βu. Considering Theorem 3.16,
we know that if we want to obtain a limit distribution, we need β < 0. To keep
with the notation of the Vasiček model, we will write R(u) = −λu where λ > 0.
Now by (3.27) the cumulant generating function of the limit distribution is given
(4.12) κ(u) =
F (s)
ds for all u ∈ (−∞, 0] .
Let the limit distribution be a gamma distribution with shape parameter k > 0
and scale parameter θ > 0. Then κ(u) = −k log(1 − θu) and by (4.12)
F (u) =
1− θu .
Setting c = λk and ν = 1/θ it is seen that F (u) is equal to the last term in
(4.8). This means that the driving Lévy process of (rt)t≥0 is of the same kind
as the process (Jt)t≥0 in (4.7), i.e. (rt)t≥0 is a pure jump OU-type process with
exponentially distributed jump heights of mean 1/θ and with jump intensity λk.
We interpret the affine process we have constructed as a risk-neutral short rate
process. It is clear that the bond prices are of the exponentially-affine form (3.5).
From the generalized Riccati equation (3.6b) we obtain
B(x) =
e−λx − 1
20 MARTIN KELLER-RESSEL AND THOMAS STEINER
From equation (3.6a) we calculate
A(x) =
F (B(s)) ds =
θ + λ
(log(1− θB(x)) − θx) ,
such that the bond prices are given by
P (t, t+ x) = exp
−x λθk
θ + λ
+ rtB(x)
(1− θB(x))
θ+λ .
The global shape of the yield curve is described by the quantities
binv = kθ, basymp =
1/θ+ 1/λ
, bnorm =
(1/θ + 1/λ)2
and it is seen that for the gamma-OU-process basymp is the geometric average of
binv and bnorm.
5. Conclusions
In this article we have given, under very general conditions, a characterization of
the yield curve shapes that are attainable in term structure models where the risk-
neutral short rate is given by a time-homogenous, one-dimensional affine process.
Even though the parameter space for this class of models is infinite-dimensional,
the scope of attainable yield curves is very narrow, with only three possible global
shapes. In addition we have given conditions under which an affine process con-
verges to a limit distribution, and we have characterized the limit distribution in
terms of its cumulant generating function, extending some known results on OU-
type processes.
The most obvious question for future research is the extension of these results to
multi-factor models. It is evident from numerical results that in two-factor models
yield curves with e.g. a dip, or also with a dip and a hump, can be obtained. It
would be interesting to see if more complex shapes can also be produced, or if there
are similar limitations as in the single-factor case. Also, in the one-factor case the
dependence of the yield curve shape on the current short rate is basically described
by the intervals D ∩ (−∞, bnorm], (bnorm, binv) and [binv,∞). In the two-factor case
the partitioning of the state-space might be more complex, and we expect to see
more interesting transitions between yield curve types. Another aspect is, that
since affine processes as a general framework become better understood, extensions
of classical models e.g. by adding jumps, like in the JCIR model described in
Section 4.3, become more feasible and attractive for applications.
References
Damiano Brigo and Fabio Mercurio. Interest Rate Models - Theory and Practice.
Springer Finance. Springer, 2nd edition, 2006.
René Carmona and Michael Tehranchi. Interest Rate Models: An Infinite Dimen-
sional Stochastic Analysis Perspective. Springer Finance. Springer, 2006.
Patrick Cheridito, Damir Filipović, and Marc Yor. Equivalent and absolutely con-
tinuous measure changes for jump-diffusion processes. The Annals of Applied
Probability, 15(3), 2005.
John C. Cox, Jonathan E. Ingersoll, and Stephen A. Ross. A theory on the term
structure of interest rates. Econometrica, 53(2):385–407, 1985.
YIELD CURVE SHAPES IN AFFINE ONE-FACTOR MODELS 21
Darrell Duffie and Nicolae Gârleanu. Risk and valuation of collateralized debt
obligations. Financial Analysts Journal, 57(1):41 – 59, 2001.
Darrell Duffie, Damir Filipović, and Walter Schachermayer. Affine processes and
applications in finance. The Annals of Applied Probability, 13(3):984–1053, 2003.
Damir Filipović. A general characterization of one factor affine term structure
models. Finance and Stochastics, 5:389–412, 2001.
Zbigniew J. Jurek and Wim Vervaat. An integral representation for self-
decomposable Banach space valued random variables. Zeitschrift für Wahrschein-
lichkeitstheorie und verwandte Gebiete, 62:247–262, 1983.
Eugene Lukacs. Characteristic Functions. Charles Griffin & Co Ltd., 1960.
Elisa Nicolato and Emmanouil Venardos. Option pricing in stochastic volatility
models of the Ornstein-Uhlenbeck type. Mathematical Finance, 13 (4):445–466,
2003.
Riccardo Rebonato. Interest-Rate Option Models. Wiley, 2nd edition, 1998.
L.C.G. Rogers and David Williams. Diffusions, Markov Processes and Martingales,
Volume 1. Cambridge Mathematical Library, 2nd edition, 1994.
Ken-iti Sato. Lévy processes and infinitely divisible distributions. Cambridge Uni-
versity Press, 1999.
Ken-iti Sato and M. Yamazato. Operator-selfdecomposable distributions as limit
distributions of processes of Ornstein-Uhlenbeck type. Stochastic Processes and
Applications, 17:73–100, 1984.
Fred Steutel and Klaas van Harn. Infinite Divisibility of Probability Distributions
on the Real Line. Marcel Dekker Inc., 2004.
Oldrich Vasiček. An equilibrium characterization of the term structure. Journal of
Financial Economics, 5:177–188, 1977.
Vienna University of Technology, Wiedner Hauptstrasse 8–10, A-1040 Wien, Austria
E-mail address: mkeller@fam.tuwien.ac.at
Vienna University of Technology, Wiedner Hauptstrasse 8–10, A-1040 Wien, Austria
E-mail address: thomas@fam.tuwien.ac.at
	1. Introduction
	2. Preliminaries
	3. Theoretical Results
	3.1. Bond Prices
	3.2. The Yield Curve and the Forward Rate Curve
	3.3. The Limit Distribution of an Affine Process
	4. Applications
	4.1. The Vasicek model
	4.2. The Cox-Ingersoll-Ross model
	4.3. An extension of the CIR model
	4.4. The gamma model
	5. Conclusions
	References
ABSTRACT
  We consider a model for interest rates, where the short rate is given by a
time-homogenous, one-dimensional affine process in the sense of Duffie,
Filipovic and Schachermayer. We show that in such a model yield curves can only
be normal, inverse or humped (i.e. endowed with a single local maximum). Each
case can be characterized by simple conditions on the present short rate. We
give conditions under which the short rate process will converge to a limit
distribution and describe the limit distribution in terms of its cumulant
generating function. We apply our results to the Vasicek model, the CIR model,
a CIR model with added jumps and a model of Ornstein-Uhlenbeck type.

<|endoftext|><|startoftext|>
Introduction  
The discussion about the presence of localised states and their density naturally leads to the 
question about the kind of traps we are dealing with. In a first approach we should distinguish 
between intrinsic and extrinsic defects. In the first type we should inscribe polymer end groups, grain 
boundaries, structural defects, conformational disorder up to molecular groups with large permanent 
dipole moment that could increase the level of energetic disorder [1]. For the second type we should 
mention the chemical impurities, somehow unavoidable in the synthesis of organic molecules. We 
can further distinguish the kind of traps in function of their location, interfacial or bulk, or in term of 
energy, deep traps or shallow traps. Also polarons could be seen, in a simplistic way, like defects 
caused by an electron plus an induced lattice polarisation [2] followed by a lattice distortion. Traps 
are into the samples in a great variety and in different proportion, often despite the same preparation 
procedure. For that reason sometimes their nature is difficult to investigate and the data must be 
handled with care. 
In the studies of trapped states because of the above underlined variety of defects in solids the 
most successful approach is to start introducing a single type of defect in a well-known system in a 
controlled way.  
2. Experimental Part 
 In our work we focused on low molecular compounds as well as on polymers, especially of 
two classes of materials: oxadiazoles and quinoxalines. Both organic compounds are well know as 
electron transport materials in OLEDs. PPQs (see figure 1) in general show very high solubility [3] in 
a variety of common organic solvents, and according to literature they exhibit a glass transition at 
quite high temperature (250-350 °C).  
The materials were deposited by spin coating on gold, aluminium or silicon in different speed 
or concentration for the film optimisation. The layer thickness was controlled by Dektak techniques 
and Ellypsometry. 
The thermally stimulated luminescence (TSL) and thermally stimulated current (TSC) are 
powerful instruments in order to study de-trapping and relaxation processes in organic materials. TSL 
is a contact less technique that allows to distinguish between deep and shallow trapping states. The 
proposed mathematical model for the TSL enables to study trap levels and recombination centres 
inside the band gap. The analytical solution of the rate equations allows two different de-trapping 
regimes, including or excluding subsequent re-trapping effects. The first order solution kinetic 
indicates that no re-trapping phenomena are permitted. The electron released from a localised level 
recombines with a hole in a recombination centre and its re-trapping probability, before to 
recombine, is negligible. The second order equation deals with the opposite extreme case. The 
phonon-assisted release of an electron is followed by multiple re-trapping. In this second order 
kinetic regime the probability of a released electron to get re-trapped is very high. The main factors 
governing both solutions are the energy depth of the traps calculated with respect to the conduction 
band edge and the frequency factor. This second important factor in general indicates the attempt-to-
escape frequency of electrons from the localised levels. The mathematical model takes also into 
account the occurrence of distributions of localised levels. In case of a Gaussian distribution of 
localised states a meaningful parameter is the width of the distribution. Numerical simulations, 
calculated with the proposed model, show that while a first order peak is characterised by an 
asymmetric peak shape with a steep decreasing side, the second order kinetic peaks are characterised 
by a more symmetric shape. The signal is smeared along the whole peak temperature range due to the 
re-trapping effect.  
Figure 1.  Poly-[2,2’-(1,4-phenylene)-6,6’-bis(3-phenylquinoxaline)] (PPQ IA) 
The same theoretical description holds for both techniques, TSL and TSC. However, TSC 
theory requires the presence of an extended conduction band. During a TSC experiment a driving 
voltage is applied to the sample and the de-trapped charges are extracted at the device contacts. 
However, the equations describing a TSC peak are similar to equations describing TSL. Additionally, 
it is possible to determine the density of the trapping states evaluating the area under a TSC peak.  
Simultaneous TSL and TSC measurements give useful information about the localised states 
combining the best possibilities of both thermal techniques. Unambiguous information about trap 
depth, density of states, kinetics order and frequency factor can be extract making use of the full 
possibilities of the combined measurements. 
In typical thermally stimulated process experiments a sample is heated in a controlled way and 
the current, in case of TSC, or the light emission, in case of TSL, or both simultaneously are 
monitored. The effect appears only when an optical or electrical excitation takes place prior to the 
heating. TSC, in contrast to TSL, requires the presence of good ohmic contacts.  
In the following a TSL experiment is described in more detail and the rate equations derived. 
The sample, in an equilibrium state at room temperature where all the shallow traps are empty, is 
cooled down to a low temperature. Then, it is illuminated with electromagnetic radiation of certain 
energy. The incident radiation excites the electrons from the valence band to the conduction band 
trough the gap. 
In the case of prompt fluorescence the generated electrons recombine promptly. Otherwise 
they can form an electron hole pair followed by geminate recombination or by dissociation with 
subsequent trapping. Charge carriers can get trapped in localised levels that, considering the random 
fluctuation of the potential in disordered materials, are distributed in energy. From statistical 
consideration, the distribution type should be generally Gaussian. The thermal emission from traps at 
low temperature is negligibly small. Therefore, the perturbed equilibrium created by the incident 
radiation resists for a long time and the electrons are just stored in the localised levels. 
Temperature is then raised in a controlled way, electrons acquire energy and finally escape 
from the traps by means of a phonon assisted jump and recombine with holes trough a recombination 
centre where recombination occurs with subsequent photon emission. By means of spectrally 
resolved TSL experiments it is possible to get information about recombination centres studying the 
wavelength of the emitted light as function of temperature and intensity.  
The above-described processes are illustrated in figure 2. 
The illustrated scheme for thermoluminescence is simple, but despite of its simplicity it can 
describe all fundamental features of a thermoluminescence process. Following Chen [4], the electron 
exchange between the HOMO and LUMO levels, during the trap emptying, can be described by the 
following three differential equations: 
h Ann
⋅⋅−=   (1) 
pnAnNn
c ⋅−⋅−= )(   (2) 
c AnnAnNnpn
⋅⋅−⋅−−⋅= )(  (3) 
Here nh is the concentration of holes in the recombination centres, nc is the concentration of 
electrons in the conduction band, Ar is the recombination coefficient for electrons in the conduction 
band with holes in the recombination centres, n is the concentration of electrons in traps, N is the 
function describing the concentration of electron traps at depth E below the edge of the conduction 
band, A is the transition coefficient for electrons in the conduction band becoming trapped and p is 
the same probability of thermal release of electrons from traps defined in equation (1), which 
represents in fact their release rate. Equation (2) describes the change of hole density nh in 
recombination centres versus time. The recombination rate depends both from the concentration of 
free electrons (nc) and from the concentration of holes already present in the recombination centres 
trough a probability coefficient (Ar) that depends on the cross section and the thermal velocity72 of 
electrons. An increase in these parameters results in an increase of the recombination probability. 
Equation (3) describes the exchange of electrons between conduction band and traps. The first term 
in the right hand side includes the probability A for an electron to be trapped. That probability A, like 
Ar, also depends on the thermal velocity of electrons and on the cross section of traps. The second 
term on the right hand side is the de-trapping term. It is proportional to the concentration of trapped 
Electron centre
Hole centre
Conduction band
Valence band 
Recombination centre
Figure 2. Energy diagram describing the elementary 
process of the simple model for TSL 
electrons and to the Boltzmann’s function, i.e. equation (1). The proportionality factor s, often called 
frequency factor or pre-exponential factor, should be, when interpreted in terms of attempt to escape 
of an electron from the potential well, in the order of magnitude of 1010 ÷ 1014 s-1. A saturation effect 
for carrier release from traps, caused by a limited number of available states in the conduction band, 
is neglected in this model. In each moment the number of available states in conduction band is much 
higher than the released amount of electrons from the localised states [5]. 
Equation (3) describes the variation of electron density in the conduction band and essentially 
it takes into account the charge neutrality of the whole system. The variation rate of electrons in the 
conduction band depends on electrons being released - first term on the right hand side-, electrons 
being trapped - second term - and electrons that recombine - third term. Electrons and holes in that 
model are generated at the same time - geminate couples, but they are not necessarily still bound. 
Saturation effects due to filled deeper traps or recombination centres that have already a hole on them 
are not considered.  
Complex models have the disadvantage to introduce an increased number of parameters [7]. 
Actually, several combinations of too many parameters can generate the same shape of a real glow 
curve, making impossible to find a most probable fit [8]. For that reason it is preferable to deal with a 
reasonable simple model that involves few reliable parameters. Actually, the proposed simple model 
can successfully describe the experimental glow curves, but it is necessary to take also the energetic 
distribution of localised states into account in order to describe the complex behaviour of disordered 
systems, like amorphous polymers or organic polycrystalline thin films. In such case the total number 
of traps is represented by the following equation (4). The traps do not have single activation energy, 
but they are continuously energetically distributed. 
dEENN   (4) 
 Here N(E) can be in principle any kind of distribution, but considering the statistical disorder 
in organic materials it should have a Gaussian shape. In principle also the frequency factor should 
have the same energetic distribution. In order to solve the system of differential equations (1)-(4), 
equation (5), regarding the time dependence of temperature, should be add. 
tTT ⋅+= β0   (5) 
In equation (5) β is the experimental constant heating rate. It should be note that as long as T is 
a well knows function of the time the only real variable is the time. For that reason it is very 
important, experimentally, to have a perfect control of the temperature linearity. 
3. Results and discussion 
The main peak in figure 3 has the maximum temperature at Tm = 159.7 K and the second, of the 
roughly half the intensity, at Tm = 230.7 K. The peak at Tm = 159.7 K has a symmetry factor µ = 
0.54, very near to the typical value for a second order kinetic. This fact gives a hint that in PPQ IA an 
electron, before recombination, has high probability to get re-trapped several times. Because of its 
hidden position the analysis of the minor peak of PPQ IA appearing at Tm = 230.7 K is very difficult  
[6]. 
60 80 100 120 140 160 180 200 220 240
=230.7 K
= 159.7 K
Temperature (K)
40 60 80 100 120 140 160 180 200 220 240 260
Temperature (K)
Figure 3. TSL of PPQ IA Figure 4. TSL glow curve of a PPQ IA sample (red 
line) compared with the second order numerical 
simulation of the first peak (green line) 
Figure 4 shows the numerical fitting of the main TSL peak of PPQ IA. The fit is performed by 
means of a second order equation characterised by a Gaussian distribution of traps. While the 
high temperature side perfectly fits the glow curve, the low temperature side do not follow the 
curve shape. For that reason the numerical simulation is not completely satisfactory. The 
distribution has a width σ = 0.12 eV and the distribution maximum is at Em = 0.37 eV. The 
energy maximum Em lies exactly in the middle of the integration limits E1 = 0.25 eV and E2 = 
0.49 eV, having the distribution in such case a perfect Gaussian shape. The natural frequency s of 
this peak can be estimated to be in the order of s = 1x1010 s-1, considering, as is normal, the 
recombination coefficient / trapping coefficient ratio equal to 1 and a density of traps of 1014 cm-
3. The value of Em, derived by numerical analysis, is far from the expected energy depth of a trap 
calculated with the initial rise method. However, this mismatching can be explained considering 
the particular complexity of the peak that could result from the sum of at least two distributed 
peaks. In effect the initial rise procedure reveals the activation energy of a hidden peak at low 
temperature. This is an important point to clarify because of the importance of the presence of 
shallow traps in materials suitable for plastic electronic applications. Shallow traps are, at room 
temperature, empty and they play a crucial role in the electron transport property of organic 
materials. For a different thickness we obtain glow curves from figure 5. 
40 60 80 100 120 140 160 180 200 220 240 260
Temperature (K)
Figure 5. Glow curve of PPQ IA for 1500 nm 
We made TSC measurements for Poly-3-hexyle-thiophene (P3HT) on SiO2 – treated and 
untreated in oxygen plasma. The TSC experiment consisted of: 
(1) Cooling to -180°C 
(2)  Trap filling by light (Mercury lamp) with + 4 V bias voltage  
(3)  Application of readout voltage, heating with 0.10 K/sec.  and measurement of 
detrapping current 
All measurements were carried out in vacuum  (4,5 x 10 –5 mbar). The traps were filled by 
creating carriers with band-to-band photoexcitation of the samples. The light source was a Mercury 
lamp (200 W). The thermally stimulated currents were measured by a  Keithley 617  electrometer. 
The TSC and temperature data were stored in a personal computer as described earlier. In a typical 
experiment, the samples are cooled down to T = 80 K and kept at this temperature for 15 min. Then 
they are illuminated through front electrode for a 15 minutes at a bias voltage   + 4 V. Measurements 
were started after exposure to light, and samples are then heated with a constant rate (β = 0.1 K/sec.) 
from 50 up to 240 K. We measured and compared 2 samples of P3HT on SiO2 – treated and 
untreated in oxygen plasma.  
Both experiments were performed under the same conditions.  
The concentration of the traps was estimated using (Manfredotti et. al.) the relation: 
NT =   (6) 
Here Q is the quantity of charge released during a TSC experiment and can be calculated from the 
area under the TSC peaks; A and L are the area and the thickness of the sample, respectively; e is the 
electronic charge and G is the photoconductivity gain, which equals to the number of electrons 
passing through the sample for each absorbed photon. NT was calculated by assuming G = 1. For that 
samples L = 60 nm and samples have 2,5 x 2,5 cm,    A = 6,25. 10-4 m2. The trap is characterized by 
the temperature (Tm) corresponding to the peak maximum at the thermally stimulated current.  The 
energy associated with the trap is the thermal energy at Tm given by: 
mmo KTTTfE ),,(
´βα=           (7) 
80 100 120 140 160 180 200 220 240
1.54E-010
1.55E-010
1.55E-010
1.56E-010
1.56E-010
Ea = 22.2 meV
Ea=20.1 meV
Ea = 25.8 meV
Ea=17,5 meV
=26.79x1014
=4,452x1014Nt=0.2825x10
=48.31x1014
198.56
171.08
155.05
134.43
Temperature (K)
80 100 120 140 160 180 200 220 240
1.00E-013
1.10E-013
1.20E-013
1.30E-013
1.40E-013
1.50E-013
Ea=26 meV
=1.368x1012 cm3
=0.608x1012 cm3
=0.84x1012 cm-3 Ea=21 meV
Ea=16 meV
202.914
167.161
127.709
Temperature (K)
Figure 6.  TSC for P3HT on SiO2 untreated sample Figure 7.  TSC for P3HT on SiO2 treated sample 
In this equation, α is a dimensionless model dependent constant. The variable T’ is the temperature at 
half of the maximum current value on the low temperature side of the current peak. Assuming the 
Grossweiner model, the constant α and function f are given by: 
51.1=α        (8) 
´ 1),,(
TTf mmβ      (9) 
 We observed a difference between the trap concentrations in these two samples (treated and 
untreated) of two orders of magnitude.  
4. Conclusions 
Thermal techniques are a powerful tool in the to study of localised levels in inorganic and organic 
materials. Thermally stimulated luminescence, thermally stimulated currents and thermally 
stimulated depolarisation currents allow, when applied in synergy the details shallow of traps and 
deeper levels to be investigated. They also permit to study, in synergy with dielectric spectroscopy, 
as polarisation and depolarisation effects. The analysis of the thermograms, emerging from the 
thermal techniques, can be performed starting from the differential rate equations of the de-trapping 
phenomena. Such an approach, allowed by the computing power of the modern computers, is not the 
most fruitful, while the number of free variables involved in the numerical resolution of the rate 
differential equations is too high. Sometimes completely different sets of parameters can fit the same 
thermally stimulated peak and ambiguous results are often achieved. 
5. Acknowledgements 
Many thanks for the European Commission (contract number - HPRN-CT-2002-00327 - RTN-
EUROFET) for the financial support as well as all co-workers and many friends of EUROFET 
network. 
6. References 
 [1] Ashcroft, N. W. & Mermin, N. D. Solid State Physics (Holt, Rinehart & Winston, New York, 
1976). 
[2] McKeever, S. W. S. Thermoluminescence of Solids (eds. Cahn, R. W., Davis, E. A. & 
Ward, I. M.) (Cambridge University Press, Cambridge, 1985). 
[3] Paolo Imperia, Localised States in Organic Semiconductors and their Detection, 
University of Potsdam, 2003. 
[4] Bässler, H. Charge Transport in Disordered Organic Photoconductors, a Monte Carlo Simulation 
Study. phys. stat. sol. (b) 175, 15 (1993). 
[5] M. Prelipceanu, O.G. Tudose, S. Schrader, Thermally Stimulated Luminescence Investigations of 
New Materials For OFET’s and OLED’s, Winterschool on Organic Electronics (OEWS’04) 
Materials, Thin Films, Charge Transport & Device, Planneralm, Austria, 2004. 
[6] van Turnhout, J. in Electrets (ed. Sessler, G. M.) 81 (Springer Verlag, Berlin, 1980). 
[7] Schrader, S., Imperia, P., Koch, N., Leising, G. & Falk, B. in Organic Light-Emitting Materials 
and Devices, (ed. Kafafi, Z. H.) 209 (SPIE, San Diego, California, 1999).
ABSTRACT
  The present work is focused on theoretical and experimental study of
localised levels in organic materials suitable for light-emitting devices and
field effect transistors by means of thermal techniques. In our work we focused
on low molecular compounds as well as on polymers, especially of two classes of
materials: oxadiazoles and quinoxalines. Both organic compounds are well know
as electron transport materials in OLEDs.

<|endoftext|><|startoftext|>
A mi
ro�uidi
 devi
e based on droplet storage
for s
reening solubility diagrams
Philippe Laval,
Ni
olas Lisai, Jean-Baptiste Salmon, and Mathieu Joani
ot
LOF, unité mixte Rhodia�CNRS�Bordeaux 1, 178 avenue du Do
teur S
hweitzer, F�33608 Pessa
 
edex � FRANCE
(Dated: O
tober 25, 2018)
This work des
ribes a new mi
ro�uidi
 devi
e developed for rapid s
reening of solubility diagrams.
In several parallel 
hannels, hundreds of nanoliter-volume droplets of a given solution are �rst stored
with a gradual variation in the solute 
on
entration. Then, the appli
ation of a temperature gradient
along these 
hannels enables us to read dire
tly and quantitatively phase diagrams, 
on
entration
vs. temperature. We show, using a solution of adipi
 a
id, that we 
an measure ten points of the
solubility 
urve in less than 1 hr and with only 250 µL of solution.
I. INTRODUCTION
Chemistry, biology, and pharma
ology, are fa
ing al-
ways more 
omplex systems depending on multiple pa-
rameters. Therefore their 
omplete investigations take
time and require signi�
ant amounts of produ
ts. In this

ontext, roboti
 �uidi
 workstations have already met a
great su

ess and proved their e�
ien
y for instan
e in
the genome sequen
ing and analysis [1℄. However, these
instruments remain very expensive, need important la-
bor, and the volumes involved (≤ mL) are still too large
for some spe
i�
 appli
ations (e.g. proteomi
s) [2, 3℄.
Nowadays, other high throughput te
hniques based on
mi
ro�uidi
s [4, 5℄ 
an o�er suitable alternative solutions
for the development of rapid s
reening tools. Mi
ro�uidi
devi
es are now largely used in biologi
al and 
hemi
al
�elds for multiple appli
ations [6℄ like mole
ular sepa-
rations and 
ells sorting [7℄, polymerase 
hain rea
tion
[8, 9℄, rapid mi
romixing and analysis of 
hemi
al rea
-
tions [10, 11, 12, 13℄. . . Moreover, the development of
mi
rovalves and mi
romixers has made possible the pro-
du
tion of highly integrated systems whi
h 
an be used
to address individually hundreds of rea
tion 
hambers
[14℄. These devi
es are well adapted to 
arry out high
throughput s
reening of phase diagrams, parti
ularly in
the 
ase of protein 
rystallization investigation. However,
their fabri
ation and multiplexing are still 
ompli
ated.
Another possible strategy is the use of droplets playing
the role of nanoliter-sized rea
tion 
ompartments. These
droplets 
an be produ
ed in spe
i�
 mi
ro�uidi
 geome-
tries [15℄, and their volume and 
hemi
al 
omposition 
an
be �xed in a 
ontrolled way. In addition, they also allow
a rapid mixing of the di�erent 
ompounds, prevent from
any hydrodynami
 dispersion and 
ross 
ontamination,
and 
an be stored in mi
ro
hannels (see Ref. [16℄ and
referen
es therein). Su
h a strategy has already proved
to be useful for 
rystallization studies: e.g. s
reening
of protein 
rystallization 
onditions [17, 18℄, or 
rystal
nu
leation kineti
s measurements [19℄.
Figure 1 summarizes the main insights of our work.
Ele
troni
 address: philippe.laval-exterieur�eu.rhodia.
om
We have engineered a new mi
ro�uidi
 
hip that allow
a dire
t and quantitative reading of two-dimensional di-
agrams. Hundreds of nanoliter-sized droplets of di�er-
ent 
hemi
al 
ompositions 
an be stored in parallel mi-

ro
hannels, and a temperature gradient applied along
these 
hannels enables us to obtain a two-dimensional
array of droplets of di�erent 
on
entrations and tempera-
tures. For solubility diagram s
reening, droplets 
ontain-
ing a given solute are �rst stored with a gradual variation
of 
on
entration. Then, 
rystallization in the droplets is
indu
ed by 
ooling, and �nally, the appli
ation of an ad-
equat temperature gradient dissolves 
rystals in droplets
whose temperature is higher than their solubility tem-
perature. As a result, we dire
tly read the limit between
droplets with and without 
rystals as shown on Fig. 1(
),
whi
h gives the solubility temperatures of the solution at
the di�erent 
on
entrations.
In the materials and methods se
tion, we des
ribe the
mi
ro�uidi
 devi
e, the method used to store the droplets
in the 
hannels, and the temperature 
ontrol setup. We
also 
hara
terize the 
on
entration and temperature gra-
dients. In the last se
tion, we present an experimental
proto
ol to measure quantitatively solubility diagrams
using this devi
e. We demonstrate its e�
ien
y by mea-
suring with only 250 µL of solution, the solubility 
urve
of an organi
 
ompound.
II. MATERIALS AND METHODS
A. Mi
rofabri
ation
The mi
ro�uidi
 devi
e is fabri
ated in
poly(dimethylsiloxane) (PDMS) by using soft-
lithographi
 te
hniques [20℄. PDMS (Sili
one Elastomer
Base, Sylgard 184; Dow Corning) is molded on master
fabri
ated on a sili
on wafer (3-In
h-Si-Wafer; Siegert
Consulting e.k.) using a negative photoresist (SU-8 2100;
Mi
roChem). To make molds of 500 µm height, we spin
su

essively two 250 µm thi
k SU-8 layers on the wafer.
After ea
h spin
oating pro
ess, the wafer is soft-baked
(10 min/65
C and 60 min/95
C). Photolithography is
used to de�ne negative images of the mi
ro
hannels.
Eventually, the wafer is hard-baked (25 min/95
C) and
http://arxiv.org/abs/0704.0569v1
mailto:philippe.laval-exterieur@eu.rhodia.com
(a) 1
FIG. 1: (a) Design of the mi
ro�uidi
 devi
e (
hannels width
500 µm). Sili
one oil is inje
ted in inlet 1 and aqueous so-
lutions in inlets 2 and 3. The two dotted areas indi
ate the
positions of the two Peltier modules used to apply tempera-
ture gradients ∇T . The three lines of dots mark the positions
of temperature measurements. (b) Pi
ture of the mi
ro�uidi

hip made of PDMS sealed with a glass slide (76×52 mm
improve 
larity. Droplets 
ontaining a 
olored dye at di�erent

on
entrations are stored in the ten parallel 
hannels. (
) Ex-
ample of dire
t reading of a solubility diagram. The droplets

ontain an organi
 solute. The dotted line bounding droplets

ontaining 
rystals give an estimation of the solubility limit
(see se
tion Results for details).
developed (SU-8 Developer; Mi
roChem). A mixture
10:1 of PDMS is molded on the SU-8 master des
ribed
above (65
C/60 min). The 
rossed linked PDMS layer
is then peeled o� the mold and holes for the inlets
and outlets (1/32 and 1/16 in. o.d.) are pun
hed into
the material. Then, the PDMS surfa
e and a 
lean
sili
on wafer surfa
e (3-In
h-Si-Wafer; 500 µm; Siegert
Consulting e.k.) are a
tivated for 2 min in a UV ozone
apparatus (UVO Cleaner, Model 144AX; Jelight) and
brought together. Finally, the devi
e is pla
ed at 65
for 2 hr to improve the sealing.
B. Droplet storage proto
ol
The devi
e, presented on Fig. 1(a), is 
omposed of
three inlets and ten outlets lo
ated at the extremities
of 
hannels 
1 to 
10. As shown on Fig. 1(b), ea
h out-
let is 
onne
ted to a ≈ 20 
m long rigid tubing (FEP
1/16 in.) ended with a pie
e of soft PVC tubing (Nal-
gene, ≈ 5 
m long) inserted in an automated pin
h ele
-
trovalve (105S�01059P; As
o Jou
omati
). Thanks to
this system, ea
h outlet 
an be independently 
losed or
opened by pin
hing or not the 
orresponding PVC tub-
ing. However, the pin
hing out of a tube leads in a liquid
displa
ement. To minimize the subsequent liquid distur-
ban
e in the mi
ro
hannels, the ele
trovalves are pla
ed

lose to the rigid ones, and the hydrodynami
 resistan
e
after the ele
trovalves is kept as weak as possible using
large tubing.
Sili
one oil (500 
St; Rhodorsil) is inje
ted in inlet 1 at

onstant �ow rate Q1 ≈ 3 mL hr
, and aqueous phases
are inje
ted at �ow rates Q2 and Q3 ranging from 0 to
about 1 mL hr
, in inlets 2 and 3 respe
tively. All
liquids are inje
ted with syringe pumps (PHD 2000 infu-
sion; Harvard Apparatus). At the interse
tion between
the oil and the aqueous streams, monodisperse droplets
of the aqueous phase in oil are 
ontinuously produ
ed
[21℄. Both the droplet volume (about 100�300 nL) and
their produ
tion frequen
y (typi
ally between one to ten
droplets per se
ond) 
an be tuned by the ratio of oil to
aqueous phase �ow rates. The droplet 
omposition is
monitored by the ratio Q2/Q3.
Thanks to the possible opening and 
losing of ea
h out-
let, we 
an store droplets of given aqueous 
ompositions
in the di�erent storage 
hannels 
i. Several steps are ne
-
essary to perform su
h a �lling. First, all the 
hannels
are initially �lled with sili
one oil. Se
ondly, the outlet of

hannel 
1 is opened and all the others are 
losed. In this

on�guration, all the droplets of a given 
omposition �ow
through 
1. Finally, on
e the �ow is stable, the outlet of

1 is suddenly 
losed and simultaneously, the outlet of

hannel 
2 is opened. All the droplets previously present
in 
1 stay immobilized whereas the other droplets, whose

omposition 
an be 
hanged, �ow through 
2. Su

es-
sively, in the same way, we 
an store droplets of various

ompositions in all 
hannels 
i.
C. Chemi
al 
omposition 
ontrol
For solubility investigations, the 
ontrol of the 
on-

entrations in the droplets is 
ru
ial. However, be
ause
of PDMS elasti
ity and syringe pumps pre
ision, an in-
a

ura
y in droplet 
on
entration remains. To estimate
this error, we have performed investigations with a 
on-
fo
al Raman mi
ros
ope (HR800 Horiba; Jobin-Yvon).
A 50× mi
ros
ope obje
tive was used for fo
using a
532 nm wavelength laser beam in the droplets, and for

olle
ting Raman s
attered light, subsequently dispersed
with a grating of 600 lines per millimeter. To minimize
the out-of-fo
us ba
kground signals, we �xed the 
on-
fo
al pinhole at 500 µm. Experiments were performed
on droplets made of two initial aqueous solutions of
K4Fe(CN)6 (0.5 M) and K3Fe(CN)6 (0.5 M) inje
ted in
inlets 2 and 3 respe
tively. These two 
ompounds display
strong and distin
t Raman signals [22℄.
Figure 2 shows three Raman spe
tra measured
in droplets 
ontaining di�erent 
on
entration ratios
RC=[K4Fe(CN)6℄/[K3Fe(CN)6℄. The two bands 
entered
2000 2060 2095 2136 2200
wave number (cm−1)
FIG. 2: Raman spe
tra of droplets 
ontaining di�erent 
on-

entration ratios RC of potassium ferro
yanide K4Fe(CN)6
and potassium ferri
yanide K3Fe(CN)6. (a) RC = 0 (b)
RC = 1 (
) RC = 9.
at 2060 and 2095 
m

orrespond to K3Fe(CN)6 and
the one at 2136 
m

orresponds to K4Fe(CN)6. The

on
entration of ea
h 
ompound 
an be probed from the
area under their spe
i�
 Raman bands by:
Ai = KiCitV , (1)
where Ai is the area under the Raman band of the 
om-
pound i, Ci its 
on
entration, Ki a spe
i�
 
onstant, t
the a
quisition time, and V the analysis volume. As a

onsequen
e, the ratio RA of the Raman bands areas of
K4Fe(CN)6 and K3Fe(CN)6 is proportional to the 
on-

entrations ratio RC , and does not depend on the a
qui-
sition parameters.
In order to optimize the �lling proto
ol, we �rst use Ra-
man mi
ros
opy to follow the kineti
s of the 
on
entra-
tion stabilization in the droplets after a sudden 
hange in
the aqueous phases �ow rates. Indeed, due to the PDMS
elasti
ity and the inje
tion system (syringe pumps), the
�nite response time of the devi
e does not allow instan-
taneous 
hange of the 
on
entrations. To estimate this
response time, we have performed the following experi-
ment: for t < 0 s, Q2 = 0 and Q3 = 500 µL hr
, and
for t > 0 s, Q2 = Q3 = 250 µL hr
. Droplets �rst
�ow through 
hannel 
1 whi
h is 
losed after 30 s. Then,
droplets are stored in �ve other 
hannels after 1, 2, 4,
6, and 10 min. Thus, Raman spe
tra obtained from the
droplets in the di�erent 
hannels enable us to follow the
evolution of RA as a fun
tion of time after the �ow rates

hange. Figure 3(a) shows it rea
hes almost a 
onstant
value after 60 s meaning the 
on
entrations be
ome sta-
ble after this time. Su
h measurements illustrate that
20 min long proto
ols are e�
ient to store droplets of
desired 
ompositions in the ten 
hannels (≈ 2 min per

hannel).
A se
ond series of experiments was performed to es-
timate and 
hara
terize the 
on
entration gradient we

an apply in the devi
e. The storage 
hannels are �lled
with droplets of di�erent 
on
entrations in K3Fe(CN)6
and K4Fe(CN)6 set from the �ow rates. In ea
h 
hannel

i, to rea
h a stable droplets 
omposition, we maintain
the �ow for 90 s before 
losing the outlet to store them
[see Fig. 3(a)℄. By measuring the Raman spe
tra of the
droplets 
omposition in the di�erent 
hannels, we ob-
tain the ratio RA as a fun
tion of the theoreti
al ratio
= Q2/Q3. The error bar 
orresponds to the stan-
dard deviation of the measurements performed on the
droplets in a given 
hannel. As 
an be seen on Fig. 3(b),
a linear relationship between RA and R
is observed as
expe
ted. Deviations of a few per
ents around the linear
law are probably due to the Raman measurements un-

ertainties, to the a

ura
y of the inje
tion system, and
also to the PDMS elasti
ity.
These Raman measurements demonstrate that with
the developed proto
ol, we are able to store hundreds
of droplets in ten 
hannels in about 20 min, and 
on-
suming less than a few hundreds of µL of solution. We
believe that more rigid and smaller mi
rodevi
es 
om-
bined with even more rea
tive inje
tion system would
de
rease signi
antly the amount of liquids used when �ll-
ing the 
hannels. Other strategies involving for instan
e
droplet generation thanks to integrated mi
rovalves [23℄,
may also proved to be useful to de
rease the required
volumes of solution.
0 1 2 3 4
0 60 200 400 600
t (s)
FIG. 3: (a) Evolution of the ratio RA in the droplets after a
sudden 
hange of the aqueous solutions �ow rates Q2 and Q3.
Before t = 0 s, Q2 = 0 and Q3 = 500 µL hr
. For t > 0 s,
Q2 = Q3 = 250 µL hr
. Between 5 and 30 s, RA are obtained
from three single droplets in 
hannel 
1. After t = 60 s, ea
h
point is a mean value 
al
ulated on several droplets in a given

hannel. (b) Con
entration ratio RA in droplets as a fun
tion
of the 
on
entration ratio R
C determined from the aqueous
solutions �ow rates. The dotted line 
orresponds to the linear
�t of the data.
D. Temperature 
ontrol
The temperature �eld of the 
hip is 
ontrolled with two
Peltier modules (30×30×3.3 mm
; CP1.4�71�06L; Mel-

or) pla
ed underneath the wafer at positions marked by
the two dotted areas on Fig. 1(a). Sin
e the two Peltier
modules are independant, we 
an heat or 
ool the de-
vi
e, and also apply important temperature gradients.
We use a sili
on wafer as 
hip support to optimize ther-
mal transfers and thus to 
reate regular temperature gra-
dients along the storage 
hannels. Thin thermo
ouples
(type K, 76 µm o.d., 5SRTC-TTKI-40-1M; Omega) mea-
sure the temperature of the devi
e along three series of
positions parallel to the storage 
hannels. The �rst series
is pla
ed above 
1, the se
ond one between 
5 and 
6, and
the third one below 
10 [see Fig. 1(a)℄. To rea
h the maxi-
mal pre
ision on liquid temperature measurements inside
the 
hannels, the thermo
ouples are inserted in holes pre-
viously pun
hed through the PDMS layer and �lled with
sili
one oil. Thermo
ouples signals are pro
essed with
a data a
quisition instrument (USB�9161; National In-
struments) and LabView software. Figure 4(a) shows
we are able to apply easily temperature gradients up to
C on 5 
m. To estimate the temperature at any po-
0 10 20 30 40
X (mm)
X (mm)C
FIG. 4: Temperature pro�les of the 
hip for a given tem-
perature gradient (a) Temperatures measured along the stor-
age 
hannels with thermo
ouples inserted through the PDMS
layer at di�erent positions shown on Fig. 1(a). (N) measure-
ments series above 
hannel 
1; (�) series between 
5 and 
6;
(H) series below 
10. (b) Interpolated temperature pro�le of
the 
hip.
sitions along the storage 
hannels, we perform a longi-
tudinal and transverse linear interpolation of the three
series of measurements. The �nal pro�le obtained after
su
h interpolation is depi
ted on Fig. 4(b). Note that the
temperature is not perfe
tly homogeneous transversely
to the storage 
hannels. This is due to the size of the
Peltier module as 
ompared to the size of the droplet
storage area: smaller storage area, or larger Peltier mod-
ules, would give homogeneous temperature pro�les along
the transverse dire
tion of the 
hannels.
III. RESULTS
In the previous se
tion we have shown that our mi-

rodevi
e allows us to build a two-dimensional array of
droplets with both 
on
entration and temperature gra-
dients. We now present an appli
ation of this 
hip by
measuring the solubility 
urve of an organi
 solute.
Su
h measurements are 
arried out with an adipi
 a
id
solution previously prepared in a beaker. It is made of
10.14 g of adipi
 a
id (99%; Aldri
h) in 50.66 g of deion-
ized water. The solubility temperature of the solution
is 63
C. To avoid any 
rystallization before the droplets
formation, the syringe 
ontaining the solution and the

orresponding tubing are heated at about 65
C with two
�exible heaters (Min
o) 
ontrolled with temperature 
on-
trollers (Min
o). A stereo mi
ros
ope (SZX12; Olympus)
with an obje
tive (DF PLFL 0.5× PF; Olympus) enables
us to observe the devi
e during the solubility study.
We inje
t the adipi
 a
id solution in inlet 2 and deion-
ized water in inlet 3. By 
hanging the �ow rates ratio
we �ll the storage 
hannels with droplets whose 
on
en-
tration in adipi
 a
id varies from 20 g / 100 g of water
in 
hannel 
1 down to 6 g / 100 g of water in 
10. The
massi
 
on
entration C in the droplets is 
al
ulated a
-

ording to:
1 + (1 + C0)Q3/Q2
where C0 is the massi
 
on
entration of the initial adipi
a
id solution, Q2 and Q3 the respe
tive �ow rates of the
solution and water (we 
he
ked that density variations in-
du
ed by the presen
e of adipi
 a
id are negligible). The
mi
ro�uidi
 
hip is kept at about 65
C using the Peltier
modules to avoid any 
rystallization during the droplet
storage. Before stopping the droplets in a 
hannel, we
maintain it open for 90 s for �ow stabilization. In these

onditions, the total �lling of the ten 
hannels is rea
hed
in less than 20 min and only 250 µL of solution are spent.
After the droplet storage, 
rystallization is indu
ed by

ooling. Note that the mean time of 
rystal nu
leation
is inversely proportional to the rea
tor volume. Indeed,
the nu
leation frequen
y is given by 1/JV where J is the
nu
leation rate that does no depend on the volume V
of the rea
tor (see Refs. [19, 24, 25, 26℄ and referen
es
therein). Crystal nu
leation in a droplet of 100 nL is
thus 10
times longer than in a vial of 1 mL. To redu
e
su
h long indu
tion time, we apply a strong 
ooling to
in
rease signi�
antly the supersaturation. In our 
ase,
down to ≈ −5◦C, 
rystals appear in all the droplets after
a few minutes.
To obtain the solubility 
urve dire
tly on the 
hip, we
then apply a temperature gradient between 32 and 65
after the 
rystallization step. Crystals dissolve in all the
droplets whose temperature is higher than their solubil-
ity temperature. In the other droplets, 
rystals are partly
solubilized but still exist (the equilibrium is rea
hed in
about 20 min). Typi
al images of the storage area are
FIG. 5: Images of a part of the storage area obtained under

rossed polarizers. Droplets of adipi
 a
id solution are stored
in the 
hannels. The 
on
entration in adipi
 a
id was grad-
ually 
hanged between the upper and the bottom 
hannels.
After 
rystallisation of all the droplets, a temperature gradi-
ent is applied (low temperature on the left and high temper-
ature on the right). (a) The dotted line separating droplets

ontaining 
rystals from empty droplets give an estimation
of the solubility limit. (b) Same image but with a di�erent

ontrast displaying the droplets positions.
presented on Fig. 5. Sin
e adipi
 a
id 
rystals have bire-
fringent properties, they are easily dete
ted under 
rossed
polarizers. The smallest dete
table 
rystals size is about
50×50 µm2 at the magni�
ation used. Figure 5(a) en-
ables us to dire
tly observe the limit of 
rystal presen
e.
Using interpolated temperature pro�les su
h as the one
displayed in Fig. 4, allows us to estimate the solubility
temperatures for all the ten 
on
entrations (we 
hoose
them in the middle of the two su
essive droplets with
and without 
rystals). Figure 6 presents su
h solubility
temperatures measured with our mi
ro�uidi
 devi
e. The
error 
orresponds to the temperatures di�eren
e between
the two droplets en
losing the solubility limit positions.
These results are in good agreements with data obtained
from literature [27℄.
Naturally, the errors done on su
h measurements de-
pend on the distan
e between two su

essive droplets,
and on the amplitude of the temperature gradient. In
our 
ase, the temperature gradient of 0.7
a typi
al distan
e of 3 mm between two droplets give an
error of ±1◦C. The appli
ation of smaller temperature
40 45 50 55 60
T (°C)
FIG. 6: (•) Solubility of adipi
 a
id in water measured in the

ase of a temperature gradient of 0.7
. (◦) Solubility
data from literature, the dotted line is a guideline for eyes.
gradients and the redu
tion of the distan
e between two
su

essive droplets would give a better a

ura
y on the
solubility limit.
For the moment, the maximal temperature whi
h 
an
be investigated is limited by the evaporation of water
through the PDMS layer [28℄. Simple measurements
show that the volume of an aqueous droplet stored in
our devi
e at 60
C, de
reases by ≈ 10% in 4 hr. Su
h an
e�e
t is negligible for the experiments des
ribed above
(droplet �lling time 20 min at 65
C), but may explain
the small dis
repan
y observed on Fig. 6 at high temper-
ature. We believe that the use of non-permeable materi-
als su
h as glass, instead of PDMS, 
ould easily broaden
the possibilities o�ered by our system.
IV. CONCLUSION
In this work we have presented a new mi
ro�uidi
tool to perform rapid s
reening of solubility diagrams.
The devi
e enables us to store hundreds of droplets
(≈ 100 nL) of various 
hemi
al 
ompositions in parallel
mi
ro
hannels, and to apply large temperature gradients.
We have demonstrated using a model system (adipi
 a
id
in water), that we 
ould easily and dire
tly a

ess to
ten simultaneous measurements of the solubility 
urve
on a large temperature range, in less than 1 hr, and with
only 250 µL of solution. To 
on
lude, we believe our
devi
e is a suitable tool for solubility diagrams s
reen-
ing, more rapid, with a better temperature 
ontrol, and

heaper than 
lassi
al roboti
 workstations. Su
h a mi-

ro�uidi
 tool may also be useful for many other appli
a-
tions, where two-dimensional s
reening, temperature vs.

omposition, is required.
A
knowledgments
We gratefully thank G. Cristobal, J. Krishnamurti,
J. Leng, and F. Sarrazin for fruitful dis
ussions and 
riti-

al reading of this manus
ript. We also a
knowledge Ré-
gion Aquitaine for funding and support, and the Atelier
Mé
anique of the CRPP for their te
hni
al help.
[1℄ G. H. W. Sanders and A. Manz, Trends Anal. Chem. 19,
364 (2000).
[2℄ J. R. Luft, J. Wol�ey, I. Jurisi
a, J. Glasgow, S. Fortier,
and G. T. DeTitta, J. Cryst. Growth 232, 591 (2001).
[3℄ D. L. Chen and R. F. Ismagilov, Curr. Opin. Chem. Biol.
10, 226 (2006).
[4℄ H. A. Stone, A. D. Stroo
k, and A. Ajdari, Annu. Rev.
Fluid. Me
h. 36, 381 (2004).
[5℄ T. M. Squires and S. R. Quake, Rev. Mod. Phys. 77, 977
(2005).
[6℄ T. Vilkner, D. Janasek, and A. Manz, Anal. Chem. 76,
3373 (2004).
[7℄ N. Min
, C. Futterer, K. D. Dorfman, A. Ban
aud,
C. Gosse, C. Goubault, and J. L. Viovy, Anal. Chem.
76, 3770 (2004).
[8℄ J. Khandurina and A. Guttman, J. Chromatogr. A 943,
159 (2002).
[9℄ M. Chabert, K. D. Dorfman, P. de Cremoux, J. Roer-
aade, and J.-L. Viovy, Anal. Chem. 78, 7722 (2006).
[10℄ A. D. Stroo
k, S. K. Dertinger, A. Ajdari, I. Mezi
, H. A.
Stone, and G. M. Whitesides, S
ien
e 295, 647 (2002).
[11℄ E. M. Chan, A. P. Alivisatos, and R. A. Mathies, J. Am.
Chem. So
. 127, 13854 (2005).
[12℄ J.-B. Salmon, C. Dubro
q, P. Tabeling, S. Charier, D. Al-

or, L. Jullien, and F. Ferrage, Anal. Chem. 77, 3417
(2005).
[13℄ S. A. Khan, A. Gunther, M. A. S
hmidt, and K. F.
Jensen, Langmuir 20, 8604 (2004).
[14℄ T. Thorsen, S. J. Maerkl, and S. R. Quake, S
ien
e 298,
580 (2002).
[15℄ T. Thorsen, R. W. Roberts, F. H. Arnold, and S. R.
Quake, Phys. Rev. Lett. 86, 4163 (2001).
[16℄ H. Song, D. L. Chen, and R. F. Ismagilov, Angew. Chem.
Int. Ed Engl. 45, 7336 (2006).
[17℄ B. Zheng, L. S. Roa
h, and R. F. Ismagilov, J. Am.
Chem. So
. 125, 11170 (2003).
[18℄ J. Shim, G. Cristobal, D. Link, T. Thorsen, and
S. Fraden, Using mi
ro�uidi
s to de
ouple nu
leation and
growth of protein 
rystals, Unpublished Work (2006).
[19℄ P. Laval, J.-B. Salmon, and M. Joani
ot, J. Cryst.
Growth doi:10.1016/j.j
rysgro.2006.12.044 (2007).
[20℄ J. C. M
Donald and G. M. Whitesides, A

. Chem. Res.
35, 491 (2002).
[21℄ S. L. Anna, N. Boutoux, and H. A. Stone, Appl. Phys.
Lett. 82, 364 (2003).
[22℄ G. Cristobal, L. Arbouet, F. Sarrazin, D. Talaga, J.-L.
Bruneel, M. Joani
ot, and L. Servant, Lab Chip 6, 1140
(2006).
[23℄ B. T. Lau, C. A. Baitz, X. P. Dong, and C. L. Hansen,
J. Am. Chem. So
. 129, 454 (2007).
[24℄ A. C. Zettlemoyer, Nu
leation (Mar
el Dekker, New
York, 1969).
[25℄ D. Kash
hiev and G. M. Rosmalen, Cryst. Res. Te
hnol.
38, 555 (2003).
[26℄ J. W. Mullin, Crystallization (Butterworth-Heinemann,
Oxford, 2001), 4th ed.
[27℄ A. Apelblat and E. Manzurola, J. Chem. Thermodynam-
i
s 19, 317 (1986).
[28℄ J. Leng, B. Lonetti, P. Tabeling, M. Joani
ot, and A. Aj-
dari, Physi
al Review Letters 96, 084503 (2006).
ABSTRACT
  This work describes a new microfluidic device developed for rapid screening
of solubility diagrams. In several parallel channels, hundreds of
nanoliter-volume droplets of a given solution are first stored with a gradual
variation in the solute concentration. Then, the application of a temperature
gradient along these channels enables us to read directly and quantitatively
phase diagrams, concentration vs. temperature. We show, using a solution of
adipic acid, that we can measure ten points of the solubility curve in less
than 1 hr and with only 250 $\mu$L of solution.

<|endoftext|><|startoftext|>
Introduction
	One and two quasiparticles in the Laughlin state
	The ground state and the quasihole states
	One quasiparticle
	Two or more quasiparticles
	Quasiparticles and quasiholes
	 Composite Fermion states in the Jain series
	The = 2/5  composite fermion ground state
	The quasihole operators
	The quasiparticle operator
	The = 3/7 state and the Jain series
	Connection to effective Chern-Simons theories and edge states
	Localized quasiparticles and fractional charge and statistics
	Numerical tests
	Two-quasiparticle wave function
	Random Phase Approximation
	Summary and Outlook
	The background charge
	Equivalence between CFT and CF wave functions
	An identity
	Equivalence between the = 2/5 CF and CFT wave functions. 
	The general CF operators and the Jain series
	The normalization factors N1 and N2 
	References
ABSTRACT
  It is known that a subset of fractional quantum Hall wave functions has been
expressed as conformal field theory (CFT) correlators, notably the Laughlin
wave function at filling factor $\nu=1/m$ ($m$ odd) and its quasiholes, and the
Pfaffian wave function at $\nu=1/2$ and its quasiholes. We develop a general
scheme for constructing composite-fermion (CF) wave functions from conformal
field theory. Quasiparticles at $\nu=1/m$ are created by inserting anyonic
vertex operators, $P_{\frac{1}{m}}(z)$, that replace a subset of the electron
operators in the correlator. The one-quasiparticle wave function is identical
to the corresponding CF wave function, and the two-quasiparticle wave function
has correct fractional charge and statistics and is numerically almost
identical to the corresponding CF wave function. We further show how to exactly
represent the CF wavefunctions in the Jain series $\nu = s/(2sp+1)$ as the CFT
correlators of a new type of fermionic vertex operators, $V_{p,n}(z)$,
constructed from $n$ free compactified bosons; these operators provide the CFT
representation of composite fermions carrying $2p$ flux quanta in the $n^{\rm
th}$ CF Landau level. We also construct the corresponding quasiparticle- and
quasihole operators and argue that they have the expected fractional charge and
statistics. For filling fractions 2/5 and 3/7 we show that the chiral CFTs that
describe the bulk wave functions are identical to those given by Wen's general
classification of quantum Hall states in terms of $K$-matrices and $l$- and
$t$-vectors, and we propose that to be generally true. Our results suggest a
general procedure for constructing quasiparticle wave functions for other
fractional Hall states, as well as for constructing ground states at filling
fractions not contained in the principal Jain series.

<|endoftext|><|startoftext|>
Untitled
0 → π+π−π0 Time Dependent Dalitz analysis at BaBar.
Gianluca Cavoto∗
INFN Sezione di Roma, Piazzale Aldo Moro 2, 00185 Rome, Italy
I present here results of a time-dependent analysis of the Dalitz structure of neutral B meson
decays to π+π−π0 from a dataset of 346 million BB̄ pairs collected at the Υ (4S) center of mass
energy by the BaBar detector at the SLAC PEP-II e+e− accelerator. No significant CP violation
effects are observed and 68% confidence interval is derived on the weak angle α to be [75,152]
I. INTRODUCTION
The time-dependent analysis of the B0 → π+π−π0
Dalitz plot (DP), dominated by the ρ(770) intermedi-
ate resonances, extracts simultaneously the strong tran-
sition amplitudes and the weak interaction phase α ≡
arg [−VtdV ∗tb/VudV ∗ub] of the Unitarity Triangle [1]. In
the Standard Model, a non-zero value for α is respon-
sible for the occurrence of mixing-induced CP violation
in this decay. ρ±π∓ is not a CP eigenstate, and four
flavor-charge configurations (B0(B0) → ρ±π∓) must be
considered. The corresponding isospin analysis [2] is un-
fruitful with the present statistics since two pentagonal
amplitude relations with 12 unknowns have to be solved
(compared to 6 unknowns for the π+π− and ρ+ρ− sys-
tems).
The differential B0 decay width with respect to the
Mandelstam variables s+, s− (i.e., the Dalitz plot [3])
reads dΓ(B0 → π+π−π0) = 1
(2π)3
|A3π |
ds+ds−, where
A3π (A3π) is the Lorentz-invariant amplitude of the
three-body decay B0 → π+π−π0 (B0 → π+π−π0). We
assume in the following that the amplitudes are dom-
inated by the three resonances ρ+, ρ− and ρ0 and we
write A3π = f+A
+ + f−A
− + f0A
0 and A3π = f+A+ +
−+f0A
0, where the fκ (with κ = {+,−, 0} denoting
the charge of the ρ from the decay of the B0 meson) are
functions of s+ and s− that incorporate the kinematic
and dynamical properties of the B0 decay into a (vec-
tor) ρ resonance and a (pseudoscalar) pion, and where
the Aκ are complex amplitudes that include weak and
strong transition phases and that are independent of the
Dalitz variables.
With ∆t ≡ t3π−ttag defined as the proper time interval
between the decay of the fully reconstructedB03π and that
of the other meson B0tag, the time-dependent decay rate
∗Electronic address: gianluca.cavoto@roma1.infn.it
when the tagging meson is a B0 (B0) is given by
|A±3π(∆t)|
e−|∆t|/τB0
|A3π |2 + |A3π|2 ∓
|A3π |2 − |A3π|2
cos(∆md∆t)
± 2Im
A3πA∗3π
sin(∆md∆t)
, (1)
where τB0 is the mean B
0 lifetime and ∆md is the B
oscillation frequency. Here, we have assumed that CP
violation in b mixing is absent (|q/p| = 1), ∆ΓBd = 0
and CPT is conserved. Inserting the amplitudes A3π and
A3π one obtains for the terms in Eq. (1)
|A3π |2 ± |A3π|2 =
κ∈{+,−,0}
|fκ|2U±κ +
κ<σ∈{+,−,0}
Re [fκf
κσ − Im [fκf∗σ ]U±,Imκσ
A3πA∗3π
κ∈{+,−,0}
|fκ|2Iκ +
κ<σ∈{+,−,0}
Re [fκf
σ ] I
κσ + Im [fκf
σ ] I
, (2)
The 27 real-valued coefficients defined in Tab.IV that
multiply the fκf
σ bilinears are determined by the fit.
Each of the coefficients is related in a unique way to phys-
ically more intuitive quantities, such as tree-level and
penguin-type amplitudes, the angle α, or the quasi-two-
body CP and dilution parameters [4] (cf. Section IVB).
We determine the quantities of interest in a subsequent
least-squares fit to the measured U and I coefficients.
II. DALITZ MODEL
The ρ resonances are assumed to be the sum of the
ground state ρ(770) and the radial excitations ρ(1450)
and ρ(1700), with resonance parameters determined by
a combined fit to τ+ → ντπ+π0 and e+e− → π+π−
data [5]. Since the hadronic environment is different in
B decays, we cannot rely on this result and therefore de-
termine the relative ρ(1450) and ρ(1700) amplitudes si-
multaneously with the CP parameters from the fit. Vari-
ations of the other parameters and possible contributions
http://arxiv.org/abs/0704.0571v1
0 0.2 0.4 0.6 0.8 1
Interference
FIG. 1: Square Dalitz plots for Monte-Carlo generated B0 →
π+π−π0 decays.The decays have been simulated without any
detector effect and the amplitudes A+, A− and A0 have all
been chosen equal to 1 in order to have destructive inter-
ferences at equal ρ masses. The main overlap regions be-
tween the charged and neutral ρ bands are indicated by the
hatched areas. Dashed lines in both plots correspond to√
s+,−,0 = 1.5 GeV/c
2: the central region of the Dalitz plot
contains almost no signal event.
to the B0 → π+π−π0 decay other than the ρ’s are studied
as part of the systematic uncertainties (Section IVA).
Following Ref. [5], the ρ resonances are parameterized
in fκ by a modified relativistic Breit-Wigner function in-
troduced by Gounaris and Sakurai (GS) [6].
Large variations occurring in small areas of the Dalitz
plot are very difficult to describe in detail. These re-
gions are particularly important since this is where the
interference, and hence our ability to determine the
strong phases, occurs. We therefore apply the trans-
formation ds+ ds− −→ | detJ | dm′ dθ′, which defines
the Square Dalitz plot (SDP). The new coordinates are
m′ ≡ 1
arccos
−mmin
, θ′ ≡ 1
θ0, where
m0 is the invariant mass between the charged tracks,
mmax0 = mB0 − mπ0 and mmin0 = 2mπ+ are the kine-
matic limits of m0 and θ0 is the ρ
0 helicity angle; θ0 is
defined by the angle between the π+ in the ρ0 rest frame
and the ρ0 flight direction in the B0 rest frame. J is
the Jacobian of the transformation that zooms into the
kinematic boundaries of the Dalitz plot, shown in Fig.1 .
III. ANALYSIS DESCRIPTION
The U and I coefficients and the B0 → π+π−π0 event
yield are determined by a maximum-likelihood fit of the
signal model to the selected candidate events. Kinematic
and event shape variables exploiting the characteristic
properties of the events are used in the fit to discriminate
signal from background.
A. Signal and background parametrization
We reconstruct B0 → π+π−π0 candidates from pairs
of oppositely-charged tracks, which are required to form
a good quality vertex, and a π0 candidate. In order to
ensure that all events are within the Dalitz plot bound-
aries, we constrain the three-pion invariant mass to the
B mass.
A B-meson candidate is characterized kinemat-
ically by the energy-substituted mass mES =
s+ p0 · pB)2/E20 − p2B]
2 and energy difference
∆E = E∗B − 12
s, where (EB,pB) and (E0,p0) are
the four-vectors of the B-candidate and the initial
electron-positron system, respectively. The asterisk
denotes the Υ (4S) frame, and s is the square of the
invariant mass of the electron-positron system. We
require 5.272 < mES < 5.288GeV/c
2. The ∆E res-
olution exhibits a dependence on the π0 energy and
therefore varies across the Dalitz plot. We account
for this effect by introducing the transformed quantity
∆E′ = (2∆E − ∆E+ − ∆E−)/(∆E+ − ∆E−), with
∆E±(m0) = c± − (c± ∓ c̄) (m0/mmax0 )2, where m0 is
strongly correlated with the energy of π0. We use the val-
ues c̄ = 0.045GeV, c− = −0.140GeV, c+ = 0.080GeV,
mmax0 = 5.0GeV, and require −1 < ∆E′ < 1.
Backgrounds arise primarily from random combina-
tions in continuum qq̄ events. To enhance discrimination
between signal and continuum, we use a neural network
(NN) [7] to combine discriminating topological variables.
The time difference ∆t is obtained from the measured
distance between the z positions (along the beam direc-
tion) of the B03π and B
tag decay vertices, and the boost
βγ = 0.56 of the e+e− system: ∆t = ∆z/βγc. To deter-
mine the flavor of the B0tag we use the B flavor tagging
algorithm of Ref. [8]. This produces six mutually exclu-
sive tagging categories.
Events with multiple B candidates passing the full se-
lection occur in 16% (ρ±π∓) and 9% (ρ0π0) of the time,
according to signal MC. If the multiple candidates have
different π0 candidates, we choose the B candidate with
the reconstructed π0 mass closest to the nominal π0 mass;
in the case that both candidates have the same π0, we
pick the first one.
The signal efficiency determined from MC simulation
is 24% for B0 → ρ±π∓ and B0 → ρ0π0 events, and 11%
for non-resonant B0 → π+π−π0 events.
Of the selected signal events, 22% of B0 → ρ±π∓,
13% of B0 → ρ0π0, and 6% of non-resonant events are
misreconstructed. Misreconstructed events occur when a
track or neutral cluster from the tagging B is assigned
to the reconstructed signal candidate. This occurs most
often for low-momentum tracks and photons and hence
the misreconstructed events are concentrated in the cor-
ners of the Dalitz plot. Since these are also the areas
where the ρ resonances overlap strongly, it is important
to model the misreconstruced events correctly.
We use MC simulated events to study the background
from other B decays. More than a hundred channels
were considered in preliminary studies, of which twenty-
nine are included in the final likelihood model. For each
mode, the expected number of selected events is com-
puted by multiplying the selection efficiency (estimated
using MC simulated decays) by the world average branch-
ing fraction (or upper limit), scaled to the dataset lumi-
nosity (310 fb−1). The selected on-resonance data sample
is assumed to consist of signal, continuum-background
and B-background components, separated by the flavor
and tagging category of the tag side B decay. The sig-
nal likelihood consists of the sum of a correctly recon-
structed (“truth-matched”, TM) component and a mis-
reconstructed (“self-cross-feed”, SCF) component.
B. Dalitz and ∆t distribution
The Dalitz plot PDFs require as input the Dalitz plot-
dependent relative selection efficiency, ǫ = ǫ(m′, θ′), and
SCF fraction, fSCF = fSCF(m
′, θ′). Both quantities are
taken from MC simulation.
Away from the Dalitz plot corners the efficiency is uni-
form, while it decreases when approaching the corners,
where one of the three particles in the final state is close
to rest so that the acceptance requirements on the par-
ticle reconstruction become restrictive. Combinatorial
backgrounds and hence SCF fractions are large in the
corners of the Dalitz plot due to the presence of soft neu-
tral clusters and tracks.
The width of the dominant ρ(770) resonance is large
compared to the mass resolution for TM events (about
8MeV/c2 core Gaussian resolution). We therefore neglect
resolution effects in the TM model. Misreconstructed
events have a poor mass resolution that strongly varies
across the Dalitz plot. It is described in the fit by a
2 × 2-dimensional resolution function, convoluted with
signal Dalitz PDF.
The ∆t resolution function for signal and B-
background events is a sum of three Gaussian distribu-
tions, with parameters determined by a fit to fully recon-
structed B0 decays [8].
The Dalitz plot- and ∆t-dependent PDFs factorize for
the charged-B-backgroundmodes, but not necessarily for
the neutral-B background due to B0B0 mixing.
The charged B-background contribution to the likeli-
hood parametrizes tag-“charge” correlation (represented
by an effective flavor-tag-versus-Dalitz-coordinate cor-
relation), and therefore possible direct CP violation in
these events.
The Dalitz plot PDFs are obtained from MC simula-
tion and are described with the use of non-parametric
functions. The ∆t resolution parameters are determined
by a fit to fully reconstructed B+ decays.
The neutral-B background is parameterized with
PDFs that depend on the flavor tag of the event and,
depending on the final states they can show correla-
tions between the flavor tag and the Dalitz coordinate.
The Dalitz plot PDFs are obtained from MC simulation
and are described with the use of non-parametric func-
tions. For neutral-B background, the signal ∆t resolution
model is assumed.
The Dalitz plot of the continuum events is
parametrized with an empirical shape. extracted from
on-resonance events selected in the mES sidebands and
corrected for feed-through from B decays. The contin-
uum ∆t distribution is parameterized as the sum of three
Gaussian distributions with common mean and three dis-
tinct widths that scale the ∆t per-event error, all deter-
mined by the fit.
IV. RESULTS
The maximum-likelihood fit results in a B0 → π+π−π0
event yield of 1847 ± 69, where the error is statistical
only. For the U and I coefficients, the results are given
together with their statistical and systematic errors in
Table IV. The signal is dominated by B0 → ρ±π∓ de-
cays. We observe an excess of ρ0π0 events, which is in
agreement with our previous upper limit [9], and the lat-
est measurement from the Belle collaboration [10]. The
result for the ρ(1450) amplitude is in agreement with the
findings in τ and e+e− decays [5]. For the relative strong
phase between the ρ(770) and the ρ(1450) amplitudes we
find (171± 23)◦ (statistical error only), which is compat-
ible with the result from τ and e+e− data.
A. Systematics studies
The most important contribution to the systematic un-
certainty stems from the modeling of the Dalitz plot dy-
namics for signal. We evaluated this by observing the
difference between the true values and Monte Carlo fit re-
sults, in which events are generated based on an alterna-
tive model. The alternative fit model has, in addition, a
uniform Dalitz distribution for the non-resonance events
and possible resonances including f0(980), f2(1270), and
a low mass S-wave σ. The fit does not find significant
number of any of those decays. However, the inclusion
of a low mass π+π− S-wave component significantly de-
grades our ability to identify ρ0π0 events. .
We vary the mass and width of the ρ(770), ρ(1450),
and ρ(1700) within ranges that exceed twice the errors
found for these parameters in the fits to τ and e+e−
data [5], and assign the observed differences in the mea-
sured U and I coefficients as systematic uncertainties.
To validate the fitting tool, we perform fits on large MC
samples with the measured proportions of signal, contin-
uum and B-background events. No significant biases are
observed in these fits, and the statistical uncertainties on
the fit parameters are taken as systematic uncertainties
”Quasi twobody” U±κ = |Aκ|2 ± |Aκ|2
U+0 ρ
0π0 fit fraction 0.237 ± 0.053 ± 0.043
U+− ρ
−π+ fit fraction 1.33± 0.11 ± 0.04
U−0 Direct CPV (ρ
0π0) −0.055± 0.098 ± 0.13
U−− Direct CPV (ρ
−π+) −0.30± 0.15 ± 0.03
U−+ Direct CPV (ρ
+π−) 0.53± 0.15 ± 0.04
”Quasi twobody” Iκ = Im
AκAκ∗
I0 Int. Mixing CPV ρ
0π0 −0.028± 0.058 ± 0.02
I− Int. Mixing CPV ρ
−π+ −0.03± 0.10 ± 0.03
I+ Int. Mixing CPV ρ
+π− −0.039± 0.097 ± 0.02
”Interference” U
±,Re(Im)
κσ = Re(Im)
AκAσ∗ ± AκAσ∗
+− 0.62± 0.54 ± 0.72
+− 0.13± 0.94 ± 0.17
+− 0.38± 0.55 ± 0.28
+− 2.14± 0.91 ± 0.33
+0 0.03± 0.42 ± 0.12
+0 −0.75± 0.40 ± 0.15
+0 −0.93± 0.68 ± 0.08
+0 −0.47± 0.80 ± 0.3
−0 −0.03± 0.40 ± 0.23
−0 −0.52± 0.32 ± 0.08
−0 0.24± 0.61 ± 0.2
−0 −0.42± 0.73 ± 0.28
”Interference” IReκσ = Re
AκAσ∗ −AσAκ∗
IRe+− −0.1 ± 1.9 ± 0.3
IRe+0 0.2 ± 1.1 ± 0.4
IRe−0 0.92± 0.91 ± 0.4
”Interference” IImκσ = Im
AκAσ∗ + AσAκ∗
IIm+− −1.9 ± 1.1 ± 0.1
IIm+0 −0.1 ± 1.1 ± 0.3
IIm−0 0.7 ± 1.0 ± 0.3
TABLE I: Definitions and results for the 26 U and I observ-
ables extracted from the fit. We determine the relative values
of U and I coefficients to U++ .
Another major source of systematic uncertainty is the
B-background model. The expected event yields from
the background modes are varied according to the uncer-
tainties in the measured or estimated branching fractions
Since B-backgroundmodes may exhibit CP violation, the
corresponding parameters are varied within appropriate
uncertainty ranges.
Continuum Dalitz plot PDF is extrapolated form mES
sideband, and large samples of off-resonance data with
loosened requirements on ∆E and the NN are used to
compare the distributions of m′ and θ′ between the mES
sideband and the signal region. No significant differences
are found. We assign as systematic error the effect seen
when weighting the continuum Dalitz plot PDF by the
ratio of both data sets. This effect is mostly statistical
in origin.
Other systematic effects due to the signal PDFs com-
prise uncertainties in the PDF parameterization, the
treatment of misreconstructed events, the tagging per-
0 50 100 150
α (deg)
B A B A R
P R E L I M I N A R Y
FIG. 2: Confidence level functions for α. Indicated by the
dashed horizontal lines are the confidence level (C.L.) values
corresponding to 1σ and 2σ, respectively.
formance, and the modeling of the signal contributions
and are estimated using arious data control samples.
B. Intepretation of the results
The U and I coefficients are related to the quasi-two-
body parameters (Tab.IVB) defined in Ref. [4], explic-
itly accounting for the presence of interference effects,
and are thus exact even for a ρ with finite width. The
systematic errors are dominated by the uncertainty on
the CP content of the B-related backgrounds. One can
transform the experimentally convenient, namely uncor-
related, direct CP -violation parameters C and Aρπ into
the physically more intuitive quantities A+−ρπ and A−+ρπ .
The significance, including systematic uncertainties and
calculated by using a mininum χ2 method, for the ob-
servation of non-zero direct CP violation is at the 3.0σ
level.
C = (C+ + C−)/2 0.154 ± 0.090 ± 0.037
S = (S+ + S−)/2 0.01± 0.12 ± 0.028
∆C = (C+ − C−)/2 0.377 ± 0.091 ± 0.021
∆S = (S+ − S−)/2 0.06 ± 0.13 ± 0.029
Aρπ =
−0.142± 0.041 ± 0.015
A+−ρπ = |κ
+−|2−1
|κ+−|2+1
0.03 ± 0.07± 0.03
A−+ρπ = |κ
−+|2−1
|κ−+|2+1
−0.38+0.15−0.16 ± 0.07
TABLE II: Quasi twobody parameters definition and results,
where C± =
and S± =
; κ+− = (q/p)(A−/A+)
and κ−+ = (q/p)(A+/A−), so that A+−ρπ (A−+ρπ ) involves only
diagrams where the ρ (π) meson is emitted by the W bo-
son. A+−ρπ and A−+ρπ are evaluated as −
Aρπ+C+Aρπ∆C
1+∆C+AρπC
Aρπ−C−Aρπ∆C
1−∆C−AρπC
. Their correlation coefficient is 0.62.
The measurement of the resonance interference terms
allows us to constrain the relative phase δ+− =
arg (A+∗A−) between the amplitudes of the decays B0 →
ρ−π+ and B0 → ρ+π−. This constraint can be improved
with the use of strong isospin symmetry. The amplitudes
Aκ represent the sum of tree-level (T κ) and penguin-
type (P κ) amplitudes, which have different CKM fac-
tors. Here we denote by κ the charge conjugate of κ,
where 0 = 0. We define [11] Aκ = T κe−iα + P κ and
Aκ = T κe+iα + P κ, where the magnitudes of the CKM
factors have been absorbed in the T κ, P κ, T κ and P κ.
Using strong isospin symmetry and neglecting isospin-
breaking effects, one can identify P 0 = −(P+ + P−)/2
and 9 unknowns have to be determined by the fit.
We find for the solution that is favored by the fit
δ+− = (34 ± 29)◦, where the errors include both sta-
tistical and systematic effects, but only a marginal con-
straint on δ+− is obtained for C.L. < 0.05.
Finally, following the same procedure, we can also de-
rive a constraint on α. The resulting C.L. function versus
α is given in Fig. 2 and includes systematic uncertain-
ties. Ignoring the mirror solution at α + 180◦, we find
α ∈ (75◦, 152◦) at 68% C.L. No constraint on α is
achieved at two sigma and beyond.
V. CONCLUSIONS
We have presented the preliminary measurement of
CP -violating asymmetries in B0 → π+π−π0 decays dom-
inated by the ρ resonance. The results are obtained from
a data sample of 346 million Υ (4S) → BB decays. We
perform a time-dependent Dalitz plot analysis. From the
measurement of the coefficients of 26 form factor bilin-
ears we determine the three CP -violating and two CP -
conserving quasi-two-body parameters, where we find a
3.0σ evidence of direct CP violation. Taking advantage
of the interference between the ρ resonances in the Dalitz
plot, we derive constraints on the relative strong phase
between B0 decays to ρ+π− and ρ−π+, and on the an-
gle α of the Unitarity Triangle. These measurements are
consistent with the expectation from the CKM fit [12].
Acknowledgments
The author wishes to thank the conference organizers
for an enjoyable and well-organized workshop. This work
is supported by the Istituto Nazionale di Fisica Nucle-
are (INFN) and the United State Department of Energy
(DOE) under contract DE-AC02-76SF00515.
[1] H.R. Quinn and A.E. Snyder, Phys. Rev. D48, 2139
(1993).
[2] H.J. Lipkin, Y. Nir, H.R. Quinn and A. Snyder, Phys.
Rev. D44, 1454 (1991).
[3] W. M. Yao et al. [Particle Data Group], J. Phys. G 33
(2006) 1.
[4] BABAR Collaboration (B. Aubert et al.), Phys. Rev.
Lett. 91, 201802 (2003); updated preliminary results at
BABAR-PLOT-0055 (2003).
[5] ALEPH Collaboration, (R. Barate et al.), Z. Phys. C76,
15 (1997); we use updated lineshape fits including new
data from e+e− annihilation [13] and τ spectral func-
tions [14] (masses and widths in MeV/c2): mρ±(770) =
775.5±0.6, mρ0(770) = 773.1±0.5, Γρ±(770) = 148.2±0.8,
Γρ±(770) = 148.0 ± 0.9, mρ(1450) = 1409 ± 12, Γρ(1450) =
500± 37, mρ(1700) = 1749 ± 20, and Γρ(1700) ≡ 235.
[6] G.J. Gounaris and J.J. Sakurai, Phys. Rev. Lett. 21, 244
(1968).
[7] P. Gay, B. Michel, J. Proriol, and O. Deschamps, “Tag-
ging Higgs Bosons in Hadronic LEP-2 Events with Neural
Networks.”, In Pisa 1995, New computing techniques in
physics research, 725 (1995).
[8] BABAR Collaboration, B. Aubert et al., Phys. Rev. D66,
032003 (2002).
[9] BABAR Collaboration (B. Aubert et al.), Phys. Rev. Lett.
93, 051802 (2004).
[10] Belle Collaboration (J. Dragic et al.), Phys. Rev. D73,
111105 (2006).
[11] The BABAR Physics Book, Editors P.F. Harrison and
H.R. Quinn, SLAC-R-504 (1998).
[12] M. Bona et al., JHEP, 0507 (2005) 028, J. Charles et al.,
Eur. Phys. J. C41, 1 (2005).
[13] R.R. Akhmetshin et al. (CMD-2 Collaboration), Phys.
Lett. B527, 161 (2002).
[14] ALEPH Collaboration, ALEPH 2002-030 CONF 2002-
019, (July 2002).
ABSTRACT
  I present here results of a time-dependent analysis of the Dalitz structure
of neutral $B$ meson decays to
  $\pip\pim\piz$ from a dataset of 346 million $B \bar B$ pairs collected at
the $\Upsilon(4S)$ center of mass energy by the BaBar detector at the SLAC
PEP-II $e^+e^-$ accelerator. No significant CP violation effects are observed
and 68% confidence interval is derived on the weak angle $\alpha$ to be
[75$^o$,152$^o$]

<|endoftext|><|startoftext|>
Introduction 
Thermally stable polymers have attracted a lot of interest due to their potential use as the 
active component in electronic, optical and optoelectronic applications, such as light-emitting diodes, 
light emitting electrochemical cells, photodiodes, photovoltaic cells, field effect transistors, 
optocouplers and optically pumped lasers in solution and solid state.  
Polymer-based structures are the focus of intensive investigations as mechanically and 
physically flexible, processible materials for large-area photoemitting and photosensitive devices. 
Their wide practical application is inhibited by present-day limitations in control over luminescent 
spectra, sensitivity and efficiency. We report results of our investigations into the use of thermal 
treatment of poly(p-phenylene vinylene) (PPV) films grown on a variety of substrates (quartz and 
glass). The samples studied had a thickness in the range 50 - 200 nm. Film thickness, morphology 
and structural properties were investigated by a range of techniques in particular: atomic force 
microscope - AFM, DEKTAK method, Ellipsometry and UV-VIS spectroscopy. 
2. Experiment Part 
Thin polymeric films are often used in the microelectronic industry, the development of 
optoelectronic applications. Homogeneous films with thickness varying from 50 – 200 nm are 
commonly prepared by spin coating. I this technique, polymers solution is dropped on the substrate 
surface (our case glass and quartz), which rotates at a given angular velocity during a give period of 
time. The film thickness is controlled by the concentration of the polymer is solution – 5% PPV in 
our experiment -, polymer molecular weight, spinning velocity and solvent evaporation rate. The 
polymers films are annealed at higher temperatures in vacuum and normal atmosphere and after this 
are investigated and results are compared. This work is concerned with the morphology of the thin 
films obtained from spin coating when different annealing method. The interactions between 
substrate, polymer and solvent were qualitatively correlated with the resulting surface morphology of 
spin coated films and treatment applied. We choose quartz and glass as substrates because this is 
transparent and easy for the spectroscopic investigations in transmission mode, and P-PPV dissolved 
in common solvents like toluene and chloroform. Moreover, the determination of the optical 
absorption and transmission, morphology and stability of the films are important for the development 
of electronic applications and waveguides[1].  
Analytical grade toluene and chloroform were used to prepare the solutions at the polymer 
concentration 5 mg mL-1. The P-PPV was dissolved in solvents, where no phase separation takes 
place. The chemical structure for PPV is schematically represented in Figure 1. 
Cl Cl
H2O2 / TeO2
NaOtBu
200oC
III IV
Figure 1. Synthesis of PPV ( after C.J. Brabec, et al.) 
3. Methods and Results  
Spin coating – The PPV films were prepared 
by spin coating on commercial quartz and glass 
substrates. The substrates dimensions of 1 cm x 2 cm 
were previously cleaned in standard manner and dried 
under a stream of N2 [2]. All coatings were performed 
with the spinning velocity of 2000 rpm and the 
spinning time of 60 seconds.  
Ellipsometry – The mean thickness and index 
of refraction (n) of the films were determined by 
means of ellipsometry in a Plasmos SD2000Automatic ellipsometer, Munich, Germany. The samples 
characteristics are shown in Table 1. 
Table 1. Characteristic of PPV films obtained from spin coating. All measurements 
were performed at 24 ± 2 0 C 
Dektak measurements – We measured and compared the morphology, thickness and aspect 
of films before and after treatment. For PPV films before annealing we obtained the thickness of 87 
nm witch is shown in figure 3, and in figure 4 after annealing in vacuum we obtained 45 nm and 
figure 5 and 6 shown aspects layers before and after annealing. [3]. 
Sample Solvent Thickness (nm) 
Reflection index 
P-PPV on quartz 
(before annealing) 
Toluene and 
Chloroform 
87 ± 5 1,3 ± 0.05 
PPV on Quartz 
(annealed in 
vacuum) 
Toluene and 
Chloroform 
45 ± 5 2,590 ± 0.05 
PPV on quartz 
(annealed in 
normal 
atmosphere) 
Toluene and 
Chloroform 
58 ± 5 2,567 ± 0.05 
PPV on quartz 
(after second 
annealing in 
vacuum) 
Toluene and 
Chloroform 
44 ± 5 2,578 ± 0,05 
Figure 2.  Spin coating deposition 
-100 0 100 200 300 400 500 600 700
87 nm
distance µm
-100 0 100 200 300 400 500 600 700
Distance (µm)
Figure 3.PPV film thickness before 
annealing 
Figure 4. PPV film thickness after annealing 
in vacuum 2 h at 200 oC 
Figure 5. PPV film aspects before annealing Figure 6. PPV film aspects before annealing 
Atomic Force Microscopy – Measurements were carried out with an instrument from Park 
Instrument Scientific (Sunnyvale, CA, USA) in non-contact mode in air at room temperature. All 
AFM images represent unfiltered original data and are displayed in color scale in figure 7, 8 and 9 
[4]. 
Figure 7. AFM image from PPV films after 
anneling in vacuum at 200 0 C from 2 h 
Figure 8. AFM image from PPV films after 
second anneling in vacuum at 200 0 C from 2 
Figure 9. AFM image from PPV films after 
anneling in mormal atmosphere  at 200 0 C 
from 2 h 
In figure 7 and 8 are shown the image of PPV films after first and second annealing in 
vacuum at 200 0C for 2 hours. We can see not many changes between films, thichness were almost 
the same ( 45 nm respectively 44 nm) [5]. 
Figure 9 shows the surface structure of PPV films after annealing in normal atmophere at 
200 0C for 2 hours and the film structure are diferent, compare with films structure which were 
annealed in vacuum. For all images, films are continuous and smooth with a a root mean square 
(r.m.s) roughness of 2 – 3 nm, from the annealed in vacuum and 5-7 nm from the normal 
athmosphere annealed [6]. We get amorphous films in the both case. The main informations observed 
in Dektak measurements, AFM investigations and ellipsomentry are: the surface roughness of the 
films depend on the speed of heating, slow heat up raises the roughness, quick heat up leads to more 
smoth films. The same situation is meet in case of vacuum annealing and normal atmosphere 
annealing. More over, the thickness of the layers is reduced to about the haltf after annealing in the 
both case. The PPV layers are not orienteded in the both annealing method [6]. 
UV/VIS measurements - were made using the Perkin Elmer – UV/VIS Spectrometer Lambda  
16. Spectra were acquired from 300 to 900 nm for optical excitation.  Figure 9, 10 and 11 shows a set 
of absorption spectra of PPV films obtained from spin coating converted by heating under vacuum 
and normal atmosphere at 200oC for 2 hours [7]. 
Spectra were normalized by dividing absorption spectrum of each individual sample by its 
absorption at the maximum. In this way relative changes within the spectrum and between the spectra 
are easily observed. One can notice differences in the position of the absorption maxima of PPV 
films prepared in different annealing method. The changes in the optical spectra of PPV films 
obtained from the precursors prepared in vacuum conditions and normal atmosphere condition are 
consistent with earlier observations in figure 10, 11 and 12 [8].  
4. Conclusions 
We summarize our findings as follows: (i) Annealing of PPV films causes ordering of polymer 
chains and, as a result, change in the luminescence intensity and spectra. (ii) spectral characteristics 
of the converted PPV-precursor strongly dependent on the preparation condition of the precursor (iii) 
the thickness of layers is reduced to about the half after annealing. (iv) The surface roughness of the 
films depends on the speed of heating: slow heat up raises the roughness; quick heat up leads to more 
smooth films. (v) PPV is thermally stable up to more than 500 0C measured by TGA. (vi) We get 
amorphous films in spin coating deposition. 
5. Acknowledgements 
Financial support of   the   European Commission   under   contract   number: FP6 – 505478-1  
ODEON - Project  and RTN EUROFET – Project is gratefully acknowledged. 
300 400 500 600 700 800 900 1000
after annealing in normal atmosphere
before annealing
after annealing in vacuum
wavelength (nm)
100 200 300 400 500 600 700 800 900 1000
before annealing
after 1st annealing (in vacuum)
after 2nd annealing (in vacuum)
X Axis Title
Figure 10.  UV-VIS spectra of PPV films 
obtained in vacuum conversion and normal 
atmosphere conversion 
Figure 11. UV-VIS spectra of PPV films 
obtained in vacuum conversion made several 
times 
300 400 500 600 700 800 900
before  annealing
after  annealing
wavelength (nm)
Figure 12.  UV-VIS spectra comparation of PPV films after and before vacuum conversion 
6. References 
[1] L. Bakueva, E.H. Sargent, R. Resendes, A. Bartole, I. Manners, J. 
Mater. Sci.: Mater. Electron. 12 (2001) 21. 
[2] M. Pope, C.E. Swenberg, Electronic Processes in Organic Crystals and Polymers, Oxford Science 
Publications, Oxford, 1999. 
[3] L. Bakueva, S. Musikhin, E.H. Sargent, A. Shik, 2001. MRS Fall Meeting, Boston, November 
26–30, 2001 Book of Abstracts. 
[4] D. Moses, A. Dogariu, A.J. Heeger, Synth. Met. 116 (2001) 19. 
[5] B. Hu, F.E. Karaz, Chem. Phys. 227 (1998) 263. 
[6] X.-R. Zeng, T.-M. Ko, J. Polym. Sci. B 35 (1997) 1993. 
[7] C.E. Lee, C.-H. Jin, Synthet. Met. 117 (2001) 27. 
[8] D.F.S. Petri,  J.Braz.Chem.Soc. vol. 13, no 5, 695-699,2002.
ABSTRACT
  Thermally stable polymers have attracted a lot of interest due to their
potential use as the active component in electronic, optical and optoelectronic
applications, such as light-emitting diodes, light emitting electrochemical
cells, photodiodes, photovoltaic cells, field effect transistors, optocouplers
and optically pumped lasers in solution and solid state.We report results of
investigations into the use of thermal treatment of poly(p-phenylene vinylene)
(PPV) films grown on a variety of substrates (quartz and glass). Film
thickness, morphology and structural properties were investigated by a range of
techniques in particular: atomic force microscope - AFM, DEKTAK method,
Ellipsometry and UV-VIS spectroscopy.

<|endoftext|><|startoftext|>
arXiv:0704.0573v1  [quant-ph]  4 Apr 2007
Relativistic treatment in D-dimensions to a spin-zero particle with
noncentral equal scalar and vector ring-shaped Kratzer potential
Sameer M. Ikhdair∗ and Ramazan Sever†
∗Department of Physics, Near East University, Nicosia, North Cyprus, Mersin 10, Turkey
†Department of Physics, Middle East Technical University, 06531 Ankara, Turkey.
(November 4, 2018)
Abstract
The Klein-Gordon equation in D-dimensions for a recently proposed Kratzer
potential plus ring-shaped potential is solved analytically by means of the
conventional Nikiforov-Uvarov method. The exact energy bound-states and
the corresponding wave functions of the Klein-Gordon are obtained in the
presence of the noncentral equal scalar and vector potentials. The results
obtained in this work are more general and can be reduced to the standard
forms in three-dimensions given by other works.
Keywords: Energy eigenvalues and eigenfunctions, Klein-Gordon equa-
tion, Kratzer potential, ring-shaped potential, non-central potentials, Niki-
forov and Uvarov method.
PACS numbers: 03.65.-w; 03.65.Fd; 03.65.Ge.
∗sikhdair@neu.edu.tr
†sever@metu.edu.tr
http://arxiv.org/abs/0704.0573v1
I. INTRODUCTION
In various physical applications including those in nuclear physics and high energy physics
[1,2], one of the interesting problems is to obtain exact solutions of the relativistic equations
like Klein-Gordon and Dirac equations for mixed vector and scalar potential. The Klein-
Gordon and Dirac wave equations are frequently used to describe the particle dynamics in
relativistic quantum mechanics. The Klein-Gordon equation has also been used to under-
stand the motion of a spin-0 particle in large class of potentials. In recent years, much
efforts have been paid to solve these relativistic wave equations for various potentials by
using different methods. These relativistic equations contain two objects: the four-vector
linear momentum operator and the scalar rest mass. They allow us to introduce two types
of potential coupling, which are the four-vector potential (V) and the space-time scalar
potential (S).
Recently, many authors have worked on solving these equations with physical potentials
including Morse potential [3], Hulthen potential [4], Woods-Saxon potential [5], Pösch-Teller
potential [6], reflectionless-type potential [7], pseudoharmonic oscillator [8], ring-shaped har-
monic oscillator [9], V0 tanh
2(r/r0) potential [10], five-parameter exponential potential [11],
Rosen-Morse potential [12], and generalized symmetrical double-well potential [13], etc. It
is remarkable that in most works in this area, the scalar and vector potentials are almost
taken to be equal (i.e., S = V ) [2,14]. However, in some few other cases, it is considered
the case where the scalar potential is greater than the vector potential (in order to guar-
antee the existence of Klein-Gordon bound states) (i.e., S > V ) [15-19]. Nonetheless, such
physical potentials are very few. The bound-state solutions for the last case is obtained for
the exponential potential for the s-wave Klein-Gordon equation when the scalar potential is
greater than the vector potential [15].
The study of exact solutions of the nonrelativistic equation for a class of non-central po-
tentials with a vector potential and a non-central scalar potential is of considerable interest
in quantum chemistry [20-22]. In recent years, numerous studies [23] have been made in
analyzing the bound states of an electron in a Coulomb field with simultaneous presence
of Aharanov-Bohm (AB) [24] field, and/or a magnetic Dirac monopole [25], and Aharanov-
Bohm plus oscillator (ABO) systems. In most of these works, the eigenvalues and eigen-
functions are obtained by means of seperation of variables in spherical or other orthogonal
curvilinear coordinate systems. The path integral for particles moving in non-central poten-
tials is evaluated to derive the energy spectrum of this system analytically [26]. In addition,
the idea of SUSY and shape invariance is also used to obtain exact solutions of such non-
central but seperable potentials [27,28]. Very recently, the conventional Nikiforov-Uvarov
(NU) method [29] has been used to give a clear recipe of how to obtain an explicit exact
bound-states solutions for the energy eigenvalues and their corresponding wave functions in
terms of orthogonal polynomials for a class of non-central potentials [30].
Another type of noncentral potentials is the ring-shaped Kratzer potential, which is a
combination of a Coulomb potential plus an inverse square potential plus a noncentral angu-
lar part [31,32]. The Kratzer potential has been used to describe the vibrational-rotational
motion of isolated diatomic molecules [33] and has a mixed-energy spectrum containing both
bound and scattering states with bound-states have been widely used in molecular spec-
troscopy [34]. The ring-shaped Kratzer potential consists of radial and angular-dependent
potentials and is useful in studying ring-shaped molecules [22]. In taking the relativistic
effects into account for spin-0 particle in the presence of a class of noncentral potentials, Ya-
suk et al [35] applied the NU method to solve the Klein-Gordon equation for the noncentral
Coulombic ring-shaped potential [21] for the case V = S. Further, Berkdemir [36] also used
the same method to solve the Klein-Gordon equation for the Kratzer-type potential.
Recently, Chen and Dong [37] proposed a new ring-shaped potential and obtained the
exact solution of the Schrödinger equation for the Coulomb potential plus this new ring-
shaped potential which has possible applications to ring-shaped organic molecules like cyclic
polyenes and benzene. This type of potential used by Chen and Dong [37] appears to be
very similar to the potential used by Yasuk et al [35]. Moreover, Cheng and Dai [38],
proposed a new potential consisting from the modified Kratzer’s potential [33] plus the
new proposed ring-shaped potential in [37]. They have presented the energy eigenvalues
for this proposed exactly-solvable non-central potential in three dimensional (i.e., D = 3)-
Schrödinger equation by means of the NU method. The two quantum systems solved by
Refs [37,38] are closely relevant to each other as they deal with a Coulombic field interaction
except for a slight change in the angular momentum barrier acts as a repulsive core which
is for any arbitrary angular momentum ℓ prevents collapse of the system in any dimensional
space due to the slight perturbation to the original angular momentum barrier. Very recently,
we have also applied the NU method to solve the Schrödinger equation in any arbitrary D-
dimension to this new modified Kratzer-type potential [39,40].
The aim of the present paper is to consider the relativistic effects for the spin-0 parti-
cle in our recent works [39,40]. So we want to present a systematic recipe to solving the
D-dimensional Klein-Gordon equation for the Kratzer plus the new ring-shaped potential
proposed in [38] using the simple NU method. This method is based on solving the Klein-
Gordon equation by reducing it to a generalized hypergeometric equation.
This work is organized as follows: in section II, we shall present the Klein-Gordon
equation in spherical coordinates for spin-0 particle in the presence of equal scalar and
vector noncentral Kratzer plus the new ring-shaped potential and we also separate it into
radial and angular parts. Section III is devoted to a brief description of the NU method.
In section IV, we present the exact solutions to the radial and angular parts of the Klein-
Gordon equation in D-dimensions. Finally, the relevant conclusions are given in section
II. THE KLEIN-GORDON EQUATION WITH EQUAL SCALAR AND VECTOR
POTENTIALS
In relativistic quantum mechanics, we usually use the Klein-Gordon equation for de-
scribing a scalar particle, i.e., the spin-0 particle dynamics. The discussion of the relativistic
behavior of spin-zero particles requires understanding the single particle spectrum and the
exact solutions to the Klein Gordon equation which are constructed by using the four-vector
potential Aλ (λ = 0, 1, 2, 3) and the scalar potential (S). In order to simplify the solution
of the Klein-Gordon equation, the four-vector potential can be written as Aλ = (A0, 0, 0, 0).
The first component of the four-vector potential is represented by a vector potential (V ), i.e.,
A0 = V. In this case, the motion of a relativistic spin-0 particle in a potential is described by
the Klein-Gordon equation with the potentials V and S [1]. For the case S ≥ V, there exist
bound-state (real) solutions for a relativistic spin-zero particle [15-19]. On the other hand,
for S = V, the Klein-Gordon equation reduces to a Schrödinger-like equation and thereby
the bound-state solutions are easily obtained by using the well-known methods developed
in nonrelativistic quantum mechanics [2].
The Klein-Gordon equation describing a scalar particle (spin-0 particle) with scalar
S(r, θ, ϕ) and vector V (r, θ, ϕ) potentials is given by [2,14]
2 − [ER − V (r, θ, ϕ)/2]
+ [µ+ S(r, θ, ϕ)/2]
ψ(r, θ, ϕ) = 0, (1)
where ER,P and µ are the relativistic energy, momentum operator and rest mass of the
particle, respectively. The potential terms are scaled in (1) by Alhaidari et al [14] so that
in the nonrelativistic limit the interaction potential becomes V.
In this work, we consider the equal scalar and vector potentials case, that is, S(r, θ, ϕ) =
V (r, θ, ϕ) with the recently proposed general non-central potential taken in the form of the
Kratzer plus ring-shaped potential [38-40]:
V (r, θ, ϕ) = V1(r) +
V2(θ)
V3(ϕ)
r2 sin2 θ
, (2)
V1(r) = −
, V2(θ) = Cctg
2θ, V3(ϕ) = 0, (3)
where A = 2a0r0, B = a0r
0 and C is positive real constant with a0 is the dissociation energy
and r0 is the equilibrium internuclear distance [33]. The potentials in Eq. (3) introduced
by Cheng-Dai [38] reduce to the Kratzer potential in the limiting case of C = 0 [33]. In fact
the energy spectrum for this potential can be obtained directly by considering it as special
case of the general non-central seperable potentials [30].
In the relativistic atomic units (h̄ = c = 1), the D-dimensional Klein-Gordon equation
in (1) becomes [41]
sin θ
sin θ
sin2 θ
− (ER + µ)
V1(r) +
V2(θ)
V3(ϕ)
r2 sin2 θ
E2R − µ
ψ(r, θ, ϕ) = 0. (4)
with ψ(r, θ, ϕ) being the spherical total wave function separated as follows
ψnjm(r, θ, ϕ) = R(r)Y
j (θ, ϕ), R(r) = r
−(D−1)/2g(r), Y mj (θ, ϕ) = H(θ)Φ(ϕ). (5)
Inserting Eqs (3) and (5) into Eq. (4) and using the method of separation of variables, the
following differential equations are obtained:
dR(r)
j(j +D − 2)
+ α22
α21 −
R(r) = 0, (6)
sin θ
sin θ
m2 + Cα22 cos
sin2 θ
+ j(j +D − 2)
H(θ) = 0, (7)
d2Φ(ϕ)
+m2Φ(ϕ) = 0, (8)
where α21 = µ−ER, α
2 = µ+ER, m and j are constants and with m
2 and λj = j(j+D−2)
are the separation constants.
For a nonrelativistic treatment with the same potential, the Schrödinger equation in
spherical coordinates is
sin θ
sin θ
sin2 θ
ENR − V1(r)−
V2(θ)
V3(ϕ)
r2 sin2 θ
ψ(r, θ, ϕ) = 0. (9)
where µ and ENR are the reduced mass and the nonrelativistic energy, respectively. Besides,
the spherical total wave function appearing in Eq. (9) has the same representation as in Eq.
(5) but with the transformation j → ℓ. Inserting Eq. (5) into Eq. (9) leads to the following
differential equations [39,40]:
dR(r)
ENR +
R(r) = 0, (10)
sin θ
sin θ
m2 + 2µC cos2 θ
sin2 θ
+ ℓ(ℓ+D − 2)
H(θ) = 0, (11)
d2Φ(ϕ)
+m2Φ(ϕ) = 0, (12)
where m2 and λℓ = ℓ(ℓ + D − 2) are the separation constants. Equations (6)-(8) have the
same functional form as Eqs (10)-(12). Therefore, the solution of the Klein-Gordon equation
can be reduced to the solution of the Schrödinger equation with the appropriate choice of
parameters: j → ℓ, α21 → −ENR and α
2 → 2µ.
The solution of Eq. (8) is well-known periodic and must satisfy the period boundary
condition Φ(ϕ + 2π) = Φ(ϕ) which is the azimuthal angle solution:
Φm(ϕ) =
exp(±imϕ), m = 0, 1, 2, ..... (13)
Additionally, Eqs (6) and (7) are radial and polar angle equations and they will be solved
by using the Nikiforov-Uvarov (NU) method [29] which is given briefly in the following
section.
III. NIKIFOROV-UVAROV METHOD
The NU method is based on reducing the second-order differential equation to a gener-
alized equation of hypergeometric type [29]. In this sense, the Schrödinger equation, after
employing an appropriate coordinate transformation s = s(r), transforms to the following
form:
ψ′′n(s) +
τ̃(s)
ψ′n(s) +
σ̃(s)
σ2(s)
ψn(s) = 0, (14)
where σ(s) and σ̃(s) are polynomials, at most of second-degree, and τ̃ (s) is a first-degree
polynomial. Using a wave function, ψn(s), of the simple ansatz:
ψn(s) = φn(s)yn(s), (15)
reduces (14) into an equation of a hypergeometric type
σ(s)y′′n(s) + τ(s)y
n(s) + λyn(s) = 0, (16)
where
σ(s) = π(s)
φ′(s)
, (17)
τ(s) = τ̃(s) + 2π(s), τ ′(s) < 0, (18)
and λ is a parameter defined as
λ = λn = −nτ ′(s)−
n (n− 1)
σ′′(s), n = 0, 1, 2, .... (19)
The polynomial τ(s) with the parameter s and prime factors show the differentials at first
degree be negative. It is worthwhile to note that λ or λn are obtained from a particular
solution of the form y(s) = yn(s) which is a polynomial of degree n. Further, the other
part yn(s) of the wave function (14) is the hypergeometric-type function whose polynomial
solutions are given by Rodrigues relation
yn(s) =
[σn(s)ρ(s)] , (20)
where Bn is the normalization constant and the weight function ρ(s) must satisfy the con-
dition [29]
w(s) =
w(s), w(s) = σ(s)ρ(s). (21)
The function π and the parameter λ are defined as
π(s) =
σ′(s)− τ̃ (s)
σ′(s)− τ̃ (s)
− σ̃(s) + kσ(s), (22)
λ = k + π′(s). (23)
In principle, since π(s) has to be a polynomial of degree at most one, the expression under
the square root sign in (22) can be arranged to be the square of a polynomial of first degree
[29]. This is possible only if its discriminant is zero. In this case, an equation for k is
obtained. After solving this equation, the obtained values of k are substituted in (22). In
addition, by comparing equations (19) and (23), we obtain the energy eigenvalues.
IV. EXACT SOLUTIONS OF THE RADIAL AND ANGLE-DEPENDENT
EQUATIONS
A. Separating variables of the Klein-Gordon equation
We seek to solving the radial and angular parts of the Klein-Gordon equation given
by Eqs (6) and (7). Equation (6) involving the radial part can be written simply in the
following form [39-41]:
d2g(r)
(M − 1)(M − 3)
− α22
+ α21α
g(r) = 0, (24)
where
M = D + 2j. (25)
On the other hand, Eq. (7) involving the angular part of Klein-Gordon equation retakes the
simple form
d2H(θ)
+ ctg(θ)
dH(θ)
m2 + Cα22 cos
sin2 θ
− j(j +D − 2)
H(θ) = 0. (26)
Thus, Eqs (24) and (26) have to be solved latter through the NU method in the following
subsections.
B. Eigenvalues and eigenfunctions of the angle-dependent equation
In order to apply NU method [29,30,33,35,36,38-40,42-44], we use a suitable transforma-
tion variable s = cos θ. The polar angle part of the Klein Gordon equation in (26) can be
written in the following universal associated-Legendre differential equation form [38-40]
d2H(s)
1− s2
dH(s)
(1− s2)2
j(j +D − 2)(1− s2)−m2 − Cα22s
H(θ) = 0. (27)
Equation (27) has already been solved for the three-dimensional Schrödinger equation
through the NU method in [38]. However, the aim in this subsection is to solve with different
parameters resulting from the D-space-dimensions of Klein-Gordon equation. Further, Eq.
(27) is compared with (14) and the following identifications are obtained
τ̃(s) = −2s, σ(s) = 1− s2, σ̃(s) = −m′2 + (1− s2)ν ′, (28)
where
ν ′ = j′(j′ +D − 2) = j(j +D − 2) + Cα22, m
′2 = m2 + Cα22. (29)
Inserting the above expressions into equation (22), one obtains the following function:
π(s) = ±
(ν ′ − k)s2 + k − ν ′ +m′2, (30)
Following the method, the polynomial π(s) is found in the following possible values
π(s) =


m′s for k1 = ν
′ −m′2,
−m′s for k1 = ν ′ −m′2,
m′ for k2 = ν
−m′ for k2 = ν ′.
Imposing the condition τ ′(s) < 0, for equation (18), one selects
k1 = ν
′ −m′2 and π(s) = −m′s, (32)
which yields
τ(s) = −2(1 +m′)s. (33)
Using equations (19) and (23), the following expressions for λ are obtained, respectively,
λ = λn = 2ñ(1 +m
′) + ñ(ñ− 1), (34)
λ = ν ′ −m′(1 +m′). (35)
We compare equations (34) and (35), the new angular momentum j values are obtained as
j = −
(D − 2)
(D − 2)2 + (2ñ+ 2m′ + 1)2 − 4Cα22 − 1, (36)
j′ = −
(D − 2)
(D − 2)2 + (2ñ+ 2m′ + 1)2 − 1. (37)
Using Eqs (15)-(17) and (20)-(21), the polynomial solution of yn is expressed in terms of
Jacobi polynomials [39,40] which are one of the orthogonal polynomials:
Hñ(θ) = Nñ sin
m′(θ)P
(m′,m′)
(cos θ), (38)
where Nñ =
(ñ+m′)!
(2ñ+2m′+1)(ñ+2m′)!ñ!
is the normalization constant determined by
[Hñ(s)]
ds = 1 and using the orthogonality relation of Jacobi polynomials [35,45,46].
Besides
ñ = −
(1 + 2m′)
(2j + 1)2 + 4j(D − 3) + 4Cα22, (39)
where m′ is defined by equation (29).
C. Eigensolutions of the radial equation
The solution of the radial part of Klein-Gordon equation, Eq. (24), for the Kratzer’s
potential has already been solved by means of NU-method in [39]. Very recently, using
the same method, the problem for the non-central potential in (2) has been solved in three
dimensions (3D) by Cheng and Dai [36]. However, the aim of this subsection is to solve
the problem with a different radial separation function g(r) in any arbitrary dimensions. In
what follows, we present the exact bound-states (real) solution of Eq. (24). Letting
ε2 = α21α
2, 4γ
2 = (M − 1)(M − 3) + 4Bα22, β
2 = Aα22, (40)
and substituting these expressions in equation (24), one gets
d2g(r)
−ε2r2 + β2r − γ2
g(r) = 0. (41)
To apply the conventional NU-method, equation (41) is compared with (14), resulting in
the following expressions:
τ̃ (r) = 0, σ(r) = r, σ̃(r) = −ε2r2 + β2r − γ2. (42)
Substituting the above expressions into equation (22) gives
π(r) =
4ε2r2 + 4(k − β2)r + 4γ2 + 1. (43)
Therefore, we can determine the constant k by using the condition that the discriminant of
the square root is zero, that is
k = β2 ± ε
4γ2 + 1, 4γ2 + 1 = (D + 2j − 2)2 + 4Bα22. (44)
In view of that, we arrive at the following four possible functions of π(r) :
π(r) =


εr + 1
4γ2 + 1
for k1 = β
2 + ε
4γ2 + 1,
εr + 1
4γ2 + 1
for k1 = β
2 + ε
4γ2 + 1,
εr − 1
4γ2 + 1
for k2 = β
2 − ε
4γ2 + 1,
εr − 1
4γ2 + 1
for k2 = β
2 − ε
4γ2 + 1.
The correct value of π(r) is chosen such that the function τ(r) given by Eq. (18) will have
negative derivative [29]. So we can select the physical values to be
k = β2 − ε
4γ2 + 1 and π(r) =
4γ2 + 1
, (46)
which yield
τ(r) = −2εr + (1 +
4γ2 + 1), τ ′(r) = −2ε < 0. (47)
Using Eqs (19) and (23), the following expressions for λ are obtained, respectively,
λ = λn = 2nε, n = 0, 1, 2, ..., (48)
λ = δ2 − ε(1 +
4γ2 + 1). (49)
So we can obtain the Klein Gordon energy eigenvalues from the following relation:
1 + 2n+
(D + 2j − 2)2 + 4(µ+ ER)B
µ−ER = A
µ+ ER, (50)
and hence for the Kratzer plus the new ring-shaped potential, it becomes
1 + 2n+
(D + 2j − 2)2 + 4a0r20(µ+ ER)
µ−ER = 2a0r0
µ+ ER, (51)
with j defined in (36). Although Eq. (51) is exactly solvable for ER but it looks to be little
complicated. Further, it is interesting to investigate the solution for the Coulomb potential.
Therefore, applying the following transformations: A = Ze2, B = 0, and j = ℓ, the central
part of the potential in (3) turns into the Coulomb potential with Klein Gordon solution for
the energy spectra given by
ER = µ
2q2e2
q2e2 + (2n+ 2ℓ+D − 1)2
, n, ℓ = 0, 1, 2, ..., (52)
where q = Ze is the charge of the nucleus. Further, Eq. (52) can be expanded as a series in
the nucleus charge as
ER = µ−
2µq2e2
(2n + 2ℓ+D − 1)2
2µq4e4
(2n+ 2ℓ+D − 1)4
−O(qe)6, (53)
The physical meaning of each term in the last equation was given in detail by Ref. [36].
Besides, the difference from the conventional nonrelativistic form is because of the choice of
the vector V (r, θ, ϕ) and scalar S(r, θ, ϕ) parts of the potential in Eq. (1).
Overmore, if the value of j obtained by Eq.(36) is inserted into the eigenvalues of the
radial part of the Klein Gordon equation with the noncentral potential given by Eq. (51),
we finally find the energy eigenvalues for a bound electron in the presence of a noncentral
potential by Eq. (2) as
1 + 2n+
(2j′ +D − 2)2 + 4(a0r20 − C)(µ+ ER)
µ−ER = 2a0r0
µ+ ER, (54)
where m′ =
m2 + C(µ+ ER) and ñ is given by Eq. (39). On the other hand, the solution
of the Schrödinger equation, Eq. (9), for this potential has already been obtained by using
the same method in Ref. [39] and it is in the Coulombic-like form:
ENR = −
8µa20r
2n+ 1 +
(2ℓ′ +D − 2)2 + 8µ(a0r20 − C)
]2 , n = 0, 1, 2, ... (55)
2ℓ′ +D − 2 =
(D − 2)2 + (2ñ + 2m′ + 1)2 − 1, (56)
wherem′ =
m2 + 2µC. Also, applying the following appropriate transformation: µ+ER →
2µ, µ−ER → − ENR, j → ℓ to Eq. (54) provides exactly the nonrelativistic limit given by
Eq. (55).
In what follows, let us now turn attention to find the radial wavefunctions for this
potential. Substituting the values of σ(r), π(r) and τ(r) in Eqs (42), (45) and (47) into Eqs.
(17) and (21), we find
φ(r) = r(ζ+1)/2e−εr, (57)
ρ(r) = rζe−2εr, (58)
where ζ =
4γ2 + 1. Then from equation (20), we obtain
ynj(r) = Bnjr
−ζe2εr
rn+ζe−2εr
, (59)
and the wave function g(r) can be written in the form of the generalized Laguerre polyno-
mials as
g(ρ) = Cnj
)(1+ζ)/2
e−ρ/2Lζn(ρ), (60)
where for Kratzer’s potential we have
(D + 2j − 2)2 + 4a0r20(µ+ ER), ρ = 2εr. (61)
Finally, the radial wave functions of the Klein-Gordon equation are obtained
R(ρ) = Cnj
)(ζ+2−D)/2
e−ρ/2Lζn(ρ), (62)
where Cnj is the normalization constant to be determined below. Using the normalization
condition,
R2(r)rD−1dr = 1, and the orthogonality relation of the generalized Laguerre
polynomials,
zη+1e−z [Lηn(z)]
(2n+η+1)(n+η)!
, we have
Cnj =
µ2 − E2R
)1+ ζ
(2n+ ζ + 1) (n+ ζ)!
. (63)
Finally, we may express the normalized total wave functions as
ψ(r, θ, ϕ) =
µ2 − E2R
)1+ ζ
(ñ+m′)!
√√√√(2ñ+ 2m
′ + 1)(ñ+ 2m′)!ñ!n!
2π (2n+ ζ + 1) (n + ζ)!
(ζ+2−D)
2 exp(−
µ2 − E2Rr)L
µ2 − E2Rr) sin
m′(θ)P (m
′,m′)
n (cos θ) exp(±imϕ). (64)
where ζ is defined in Eq. (61) and m′ is given after the Eq. (54).
V. CONCLUSIONS
The relativistic spin-0 particle D-dimensional Klein-Gordon equation has been solved
easily for its exact bound-states with equal scalar and vector ring-shaped Kratzer potential
through the conventional NU method. The analytical expressions for the total energy levels
and eigenfunctions of this system can be reduced to their conventional three-dimensional
space form upon setting D = 3. Further, the noncentral potentials treated in [30] can
be introduced as perturbation to the Kratzer’s potential by adjusting the strength of the
coupling constant C in terms of a0, which is the coupling constant of the Kratzer’s potential.
Additionally, the radial and polar angle wave functions of Klein-Gordon equation are found
in terms of Laguerre and Jacobi polynomials, respectively. The method presented in this
paper is general and worth extending to the solution of other interaction problems. This
method is very simple and useful in solving other complicated systems analytically without
given a restiction conditions on the solution of some quantum systems as the case in the
other models. We have seen that for the nonrelativistic model, the exact energy spectra can
be obtained either by solving the Schrödinger equation in (9) (cf. Ref. [39] or Eq. (55)) or by
applying appropriate transformation to the relativistic solution given by Eq. (54). Finally,
we point out that these exact results obtained for this new proposed form of the potential
(2) may have some interesting applications in the study of different quantum mechanical
systems, atomic and molecular physics.
ACKNOWLEDGMENTS
This research was partially supported by the Scientific and Technological Research Coun-
cil of Turkey. S.M. Ikhdair wishes to dedicate this work to his family for their love and
assistance.
REFERENCES
[1] T.Y. Wu and W.Y. Pauchy Hwang, Relativistic Quantum Mechanics and Quantum
Fields (World Scientific, Singapore, 1991).
[2] W. Greiner, Relativistic Quantum Mechanics: Wave Equations, 3rd edn (springer,
Berlin, 2000).
[3] A.D. Alhaidari, Phys. Rev. Lett. 87 (2001) 210405; 88 (2002) 189901.
[4] G. Chen, Mod. Phys. Lett. A 19 (2004) 2009; J.-Y. Guo, J. Meng and F.-X. Xu, Chin.
Phys. Lett. 20 (2003) 602; A.D. Alhaidari, J. Phys. A: Math. Gen. 34 (2001) 9827; 35
(2002) 6207; M. Şimşek and H. Eğrifes, J. Phys. A: Math. Gen. 37 (2004) 4379.
[5] J.-Y. Guo, X.-Z. Fang and F.-X. Xu, Phys. Rev. A 66 (2002) 062105; C. Berkdemir, A.
Berkdemir and R. Sever, J. Phys. A: Math. Gen. 39 (2006) 13455.
[6] G. Chen, Acta Phys. Sinica 50 (2001) 1651; Ö. Yeşiltaş, Phys. Scr. 75 (2007) 41.
[7] G. Chen and Z.M. Lou, Acta Phys. Sinica 52 (2003) 1071.
[8] S.M. Ikhdair and R. Sever, J. Mol. Structure:THEOCHEM 806 (2007) 155; G. Chen,
Z.D. Chen and Z.M. Lou, Chin. Phys. 13 (2004) 279.
[9] W.C. Qiang, Chin. Phys. 12 (2003) 136.
[10] W.C. Qiang, Chin. Phys. 13 (2004) 571.
[11] G. Chen, Phys. Lett. A 328 (2004) 116; Y.F. Diao, L.Z. Yi and C.S. Jia, Phys. Lett. A
332 (2004) 157.
[12] L.Z. Yi et al, Phys. Lett. A 333 (2004) 212.
[13] X.Q. Zhao, C.S. Jia and Q.B.Yang, Phys. Lett. A 337 (2005) 189.
[14] A.D. Alhaidari, H. Bahlouli and A. Al-Hasan, Phys. Lett. A 349 (2006) 87.
[15] G. Chen, Phys. Lett. A 339 (2005) 300.
[16] A. de Souza Dutra and G. Chen, Phys. Lett. A 349 (2006) 297.
[17] F. Dominguez-Adame, Phys. Lett. A 136 (1989) 175.
[18] A.S. de Castro, Phys. Lett. A 338 (2005) 81.
[19] G. Chen, Acta Phys. Sinica 53 (2004) 680; G. Chen and D.F. Zhao, Acta Phys. Sinica
52 (2003) 2954.
[20] M. Kibler and T. Negadi, Int. J. Quantum Chem. 26 (1984) 405; İ. Sökmen, Phys. Lett.
118A (1986) 249; L.V. Lutsenko et al., Teor. Mat. Fiz. 83 (1990) 419; H. Hartmann
et al., Theor. Chim. Acta 24 (1972) 201; M.V. Carpido-Bernido and C. C. Bernido,
Phys. Lett. 134A (1989) 315; M.V. Carpido-Bernido, J. Phys. A 24 (1991) 3013; O.
F. Gal’bert, Y. L. Granovskii and A. S. Zhedabov, Phys. Lett. A 153 (1991) 177; C.
Quesne, J. Phys. A 21 (1988) 3093.
[21] M. Kibler and P. Winternitz, J. Phys. A 20 (1987) 4097.
[22] H. Hartmann and D. Schuch, Int. J. Quantum Chem. 18 (1980) 125.
[23] M. Kibler and T. Negadi, Phys. Lett. A 124 (1987) 42; A. Guha and S. Mukherjee, J.
Math. Phys. 28 (1989) 840; G. E. Draganescu, C. Campiogotto and M. Kibler, Phys.
Lett. A 170 (1992) 339; M. Kibler and C. Campiogotto, Phys. Lett. A 181 (1993) 1; V.
M. Villalba, Phys. Lett. A 193 (1994) 218.
[24] Y. Aharonov and D. Bohm, Phys. Rev. 115 (1959) 485.
[25] P. A. M. Dirac, Proc. R. Soc. London Ser. A 133 (1931) 60.
[26] B.P. Mandal, Int. J. Mod. Phys. A 15 (2000) 1225.
[27] R. Dutt, A. Gangopadhyaya and U.P. Sukhatme, Am. J. Phys. 65 (5) (1997) 400.
[28] B. Gönül and İ. Zorba, Phys. Lett. A 269 (2000) 83.
[29] A.F. Nikiforov and V. B. Uvarov, Special Functions of Mathematical Physics
(Birkhauser, Bassel, 1988).
[30] S.M. Ikhdair and R. Sever, to appear in the Int. J. Theor. Phys. (preprint quant-
ph//0702186).
[31] H.S. Valk, Am. J. Phys. 54 (1986) 921.
[32] Q.W. Chao, Chin. Phys. 13 (5) (2004) 575.
[33] C. Berkdemir, A. Berkdemir and J.G. Han, Chem. Phys. Lett. 417 (2006) 326.
[34] A. Bastida et al, J. Chem. Phys. 93 (1990) 3408.
[35] F. Yasuk, A. Durmuş and I. Boztosun, J. Math. Phys. 47 (2006) 082302.
[36] C. Berkdemir, Am. J. Phys. 75 (2007) 81.
[37] C.Y. Chen and S.H. Dong, Phys. Lett. A 335 (2005) 374.
[38] Y.F. Cheng and T.Q. Dai, Phys. Scr. 75 (2007) 274.
[39] S.M. Ikhdair and R. Sever, preprint quant-ph/0703008; quant-ph/0703042.
[40] S.M. Ikhdair and R. Sever, preprint quant-ph/0703131.
[41] S.M. Ikhdair and R. Sever, Z. Phys. C 56 (1992) 155; C 58 (1993) 153; D 28 (1993) 1;
Hadronic J. 15 (1992) 389; Int. J. Mod. Phys. A 18 (2003) 4215; A 19 (2004) 1771; A
20 (2005) 4035; A 20 (2005) 6509; A 21 (2006) 2191; A 21 (2006) 3989; A 21 (2006)
6699; Int. J. Mod. Phys. E (in press) (preprint hep-ph/0504176); S. Ikhdair et al, Tr. J.
Phys. 16 (1992) 510; 17 (1993) 474.
[42] S.M. Ikhdair and R. Sever, Int. J. Theor. Phys. (DOI 10.1007/s10773-006-9317-7; J.
Math. Chem. (DOI 10.1007/s10910-006-9115-8).
[43] S.M. Ikhdair and R. Sever, Ann. Phys. (Leipzig) 16 (3) (2007) 218.
[44] S.M. Ikhdair and R. Sever, to appear in the Int. J. Mod. Phys. E (preprint quant-
ph/0611065).
[45] G. Sezgo, Orthogonal Polynomials (American Mathematical Society, New York, 1939).
[46] N.N. Lebedev, Special Functions and Their Applications (Prentice-Hall, Englewood
Cliffs, NJ, 1965).
ABSTRACT
  The Klein-Gordon equation in D-dimensions for a recently proposed Kratzer
potential plus ring-shaped potential is solved analytically by means of the
conventional Nikiforov-Uvarov method. The exact energy bound-states and the
corresponding wave functions of the Klein-Gordon are obtained in the presence
of the noncentral equal scalar and vector potentials. The results obtained in
this work are more general and can be reduced to the standard forms in
three-dimensions given by other works.

<|endoftext|><|startoftext|>
Introduction
	Introducing red noise terms
	Overall formalism
	Modifying the detection statistic to account for red noise
	Multiple transits
	Impact on the noise budget and detection statistic
	Noise budget on transit time-scales
	Detection probability PS/N
	Additional considerations
	Turnoff mass
	Saturation mass
	Radial velocity follow up
	Applications
	PG05a's fiducial cluster
	Example galactic open clusters
	Conclusions
ABSTRACT
  We present an extension of the formalism recently proposed by Pepper & Gaudi
to evaluate the yield of transit surveys in homogeneous stellar systems,
incorporating the impact of correlated noise on transit time-scales on the
detectability of transits, and simultaneously incorporating the magnitude
limits imposed by the need for radial velocity follow-up of transit candidates.
New expressions are derived for the different contributions to the noise budget
on transit time-scales and the least-squares detection statistic for box-shaped
transits, and their behaviour as a function of stellar mass is re-examined.
Correlated noise that is constant with apparent stellar magnitude implies a
steep decrease in detection probability at the high mass end which, when
considered jointly with the radial velocity requirements, can severely limit
the potential of otherwise promising surveys in star clusters. However, we find
that small-aperture, wide field surveys may detect hot Neptunes whose radial
velocity signal can be measured with present-day instrumentation in very nearby
(<100 pc) clusters.

<|endoftext|><|startoftext|>
Introduction
In 1873, J. Bertrand[1] published a short but important paper in which he proved that
there are of only two central fields for which all orbits radially bounded are closed, namely:
The newtonian field and the isotropic harmonic oscillator field. Because of this additional
degenerescency it is no wonder that the properties of those two fields have been under
close scrutiny since Newton’s times. Newton addresses to the isotropic harmonic oscillator
in proposition X Book I, and to the inverse-square law in proposition XI [2]. Newton
shows that both fields give rise to an elliptical orbit with the difference that in the first
case the force is directed towards the geometrical centre of the ellipse and in the second
∗e-mail: filadelf@if.ufrj.br
†e-mail: vsoares@if.ufrj.br
‡e-mail: tort@if.ufrj.br.
http://arxiv.org/abs/0704.0575v1
case the force is directed to one of the foci. Bertrand’s result, also known as Bertrand’s
theorem, continues to fascinate old and new generations of physicists interested in classical
mechanics and unsurprisingly papers devoted to it continue to be produced and published.
Bertrand’s proof concise and elegant and contrary to what one may be led to think by a
number of perturbative demonstrations that can be found in modern literature, textbooks
and papers on the subject, it is fully non-perturbative. As examples of perturbative
demonstrations the reader can consult references [3, 4, 5]. We can also find in the literature
demonstrations that resemble the spirit of Bertrand’s original work as for example [6]. As
far as the present authors are aware of all those demonstrations have a restrictive feature,
i.e., they set a limit on the number of possibilities of the existence of central fields with
the property mentioned above to a finite number and finally show explicitly that among
the surviving possibilities only two, the newtonian and the isotropic harmonic oscillator,
are really possible.
In his paper, Bertrand proves initially by taking into consideration the equal radii limit
that a central force f(r) acting on a point-like body able of generating radially bounded
orbits must necessarily be of the form
f (r) = κ r(1/m
where r is the radial distance to center of force, κ is a constant and m a rational number.
Next, making use of this particular form of the law of force and considering also an
additional limiting condition, Bertrand finally shows that only for m = 1 and m = 1/2,
which correspond to Newton’s gravitational law of force
f (r) = −
and to the isotropic harmonic oscillator law of force
f (r) = −κ r,
respectively, we can have orbits with the properties stated in the theorem. However, we
can also prove that for these laws of force all bounded orbits are closed.
Here we offer an alternative non-perturbative proof of Bertrand’s theorem that leads
in a more concise way directly to the two allowed fields.
2 Bertrand’s theorem
In a central field one can introduce a potential function V (r), through the property
f = −∇V (r) . (1)
in such a way that the mechanical energy of a point-like body of mass µ
v2 + V (r) , (2)
is conserved. For radially bounded orbits there are two extreme radii rmax e rmin, the so
called apsidal points ra, that are determined by the condition ṙa = 0, and between which
the particle oscillates indefinitely. Moreover, the conservation of the angular momentum
of the particle under the action of a central field obliges the motion to take place on a
fixed plane and allows the introduction of the effective potential
U(r) = V (r) +
, (3)
with the help of which it is possible to reduce this problem to an equivalent unidimensional
one. This procedure can be found in several textbooks at the undergraduate and gradu-
ated level, see for example [7]. In terms of the effective potential orbits radially bounded
are characterised by apsidal distances rmax e rmin that satisfy the condition E = U (ra).
Evidently there is an intermediate point r0 where the effective potential has a minimum
that satisfies
U ′ (r0) = V
′ (r0)−
= 0. (4)
The angular displacement of the particle between two successive apsidal points, the apsidal
angle ∆θa, is determined by
∆θa =
∫ rmax
[E − U (r)]
. (5)
By considering the effective potential U as the independent variable and by making use
of the inverse function r (U), Tikochinsky[3] produced a very ingenious proof of Bertrand’s
theorem. The inversion of the equation (3), however, is not possible in all the domain
on which the radial coordinate r is defined because the function is not one-to-one in the
field of the real numbers. To circumvent this difficulty we define two one-to-one branches
of the function U (r), namely, one to the left and the other to the right of the point r0.
Then we introduce the inverse functions r1 = r1 (U) and r2 = r2 (U), defined to the left
and to the right of the point r0, respectively, see Figure 1 .
We express initially the angular displacement, equation when the particle moves from
the point of minimum radial distance rmin to the point r0 in terms of the variable U
∆θ1 =
[E − U ]
[E − U ]
. (6)
radial distance r
r0rmin
Figure 1: General form of the effective potential energy.
By the same token we will also have
∆θ2 =
[E − U ]
[E − U ]
, (7)
for the angular displacement from r0 to the point of maximum radial distance r2. Upon
adding up equations (6) and (7) we obtain the angular displacement between two succes-
sive apsidal points
∆θa =
F (U)
E − U
, (8)
where
F (U) =
. (9)
Equation (8) is Abel’s integral equation the solution of which can be found, for ex-
ample, in Landau’s well known book on classical mechanics [8]. A beautiful and straight-
forward solution of this equation is the one by Oldham and Spanier [9]. Abel’s solution
reads
∆θa (E)√
U − E
dE, (10)
where the explicit dependency of the apsidal angle on the energy was stressed.
If all bounded orbits are closed then the apsidal angle ∆θa (E), for these orbits, cannot
change when the energy changes in a continual manner otherwise the continual changes
would inevitably lead to open orbits. Taking this fact into account let us determine the
central potentials that produce the same apsidal angle for all radially bounded orbits.
After integrating equation (10) we obtain
2m∆θa
U − U0. (11)
Equation (11) was derived in Ref. [3] where a perturbative technique applied on a circular
orbit leads to Bertrand’s result. The functions r1 (U) and r2 (U) being the inverse function
of the function U (r) are not independent of each other, and combined as they are in
equation (11), do not allow an efficient manipulation and hide the unique inverse we are
looking for. At this point we perform an analytical continuation of the function U (r)
such that we can consider its inverse function r = r (U). Therefore we write
2µ∆θa
U − U0 + Φ(U, U0) , (12)
where Φ (U, U0) is an analytical function of the complex variable U in an open neighbor-
hood of U0 satisfying the condition Φ (U0, U0) = 1/r0, and whose analytical continuation
cannot have poles but can have other ramification points. Notice that it s not necessary
to make use of the symbol ± before the second term of equation 12) because the square
root has two branches. The positive sign corresponds to r < r0 and the negative one to
r > r0. Taking equation (3) into equation (12) we obtain
2m∆θa
r2V (r) +
− U0r2 + Φ(U, U0) . (13)
The left-hand side of the identity (13) represents a meromorphic function with a single
pole at r = 0 and the right hand side of this same identity contains several terms but
only one can spoil the analyticity of the complete function at some point not equal to
r = 0, namely the term that depends on the square root that generates a branch point at
r = r0. To avoid this it is mandatory to undo the branching effect inherent to the square
root. This is possible only if the radicand is the square of an analytical function with a
zero at r = r0. In this way we identify two possibilities for the potential V (r), to wit
V (r) = −
, newtonian potential, (14)
V (r) =
κ r2, isotropic harmonic oscillator potential; (15)
for which the apsidal angle is independent of the energy. We can calculate the corre-
sponding constant apsidal angles for those two potentials as follows.
For the newtonian potential the effective potential, equation (3), is given by
U = −
. (16)
Solving equation (16) with respect to 1/r we obtain
. (17)
Making use of equation (4) with the effective potential given by equation (16) we find
r0 = ℓ
2/(µκ) and the corresponding minimum energy U0 = −µκ2/(2ℓ2). Now we can
recast equation (17) into the form
U − U0. (18)
Comparing equation (12) with equation (18) we can finally determine the apsidal angle
for the newtonian potential which reads
∆θa = π. (19)
The procedure employed with the newtonian potential can be also applied with a little
bit more of effort to the case of the isotropic harmonic oscillator. The effective potential
is now given by
kr2 +
. (20)
This equation is a quartic equation in 1/r, biquadratic more precisely, and its solution is
given by
− 1. (21)
Factoring out the right hand side of the equation (21) we have
U − U0 +
U + U0, (22)
where now we have made use of the relations r2
µκ and U0 = ℓ
κ/µ. Comparing
equations (12) and (22) we obtain
2µ∆θa
, ∴ ∆θa =
. (23)
We can see that both potentials for which the apsidal angle is constant the orbits are
closed. For the newtonian case the radius oscillates only once in a complete cycle and for
the oscillator case the radius oscillates twice.
3 Final Remarks
In this brief paper we derived Bertrand’s theorem in a non-perturbative way. We have
shown that simple analytical function techniques applied to the problem of finding the only
central fields that allow an entire class of bounded, closed orbits with a minimum number
of restrictions leads in a concise, straightforward way directly to the two allowed fields.
We believe that the derivation discussed here is a valid alternative to a non-perturbative
proof of Bertrand’s theorem and can be presented at the undergraduate and graduate
level or assigned as a problem for classroom discussion.
References
[1] Bertrand J 1873 C.R. Acad. Sci. Paris 77 849
[2] Newton I 1687 Philosophiae Naturalis Principia Mathematica (London: Royal Soci-
ety). English translation by A Motte revised by F Cajori 1962 (University of Cali-
fornia Press, Berkeley CA)
[3] Tikochinsky Y 1988 Am. J. Phys. 56 1073
[4] Brown L S 1978 Am. J. Phys. 46 930
[5] Zarmi Y 2002 Am. J. Phys. 70 446
[6] Arnol’d V I 1976 Les Méthodes Mathématiques de la Mécanique Classique (Mir:
Moscou)
[7] Goldstein H, Poole C and Safko J 2002 Classical Mechanics 3rd edn (Reading:
Addison-Wesley)
[8] Landau L and Lifchitz E 1969 Mècanique 3e èdition revue (Mir: Moscou)
[9] Oldham K B and Spanier J 1974 The Fractional Calculus (London: Academic Press)
	Introduction
	Bertrand's theorem
	Final Remarks
ABSTRACT
  We discuss an alternative non-perturbative proof of Bertrand's theorem that
leads in a concise way directly to the two allowed fields: the newtonian and
the isotropic harmonic oscillator central fields.

<|endoftext|><|startoftext|>
[1] [2] (Received 12 January 2007)
Neutron-Capture Elements in the Double-Enhanced Star HE 1305-0007: a New s- and
r-Process Paradigm∗
CUI Wen-Yuan()1,2,3, CUI Dong-Nuan()1, DU Yun-Shuang()1, ZHANG Bo()1,2
Department of Physics, Hebei Normal University, Shijiazhuang 050016
National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100012
Graduate School of the Chinese Academy of Sciences, Beijing 100049
The star HE 1305-0007 is a metal-poor double-enhanced star with metallicity [Fe/H] = −2.0,
which is just at the upper limit of the metallicity for the observed double-enhanced stars. Using
a parametric model, we find that almost all s-elements were made in a single neutron exposure.
This star should be a member of a post-common-envelope binary. After the s-process material
has experienced only one neutron exposure in the nucleosynthesis region and is dredged-up to its
envelope, the AGB evolution is terminated by the onset of common-envelope evolution. Based on
the high radial-velocity of HE 1305-0007, we speculate that the star could be a runaway star from
a binary system, in which the AIC event has occurred and produced the r-process elements.
PACS numbers: 97.10.Cv,26.45.+h,97.10.Tk
The discovery that several stars show enhancements
of both r-process and s-process elements (s+r stars
hereafter)[1,2] is puzzling, as they require pollution from
both an AGB star and a supernova. In 2003, Qian and
Wasserburg[3] proposed a theory, i.e. accretion-induced
collapse(AIC), for the possible creation of s+r-process
stars. Another possible s+r scenario is that the AGB
star transfers s-rich matter to the observed star but not
suffer from a large mass loss and at the end of the AGB
phase, the degenerate core of low-metallicity, high-mass
AGB star may reach the Chandresekhar mass, leading to
type-1.5 supernova.[4] Because the initial-final-mass re-
lation flats at higher metallicity,[4] the degenerate cores
of high-metallicity AGB stars are smaller than those of
the low-metallicity stars, the formation of AIC or SN1.5
is more difficult in the high-metallicity binary system,
which can explain the upper limit of the metallicity
([Fe/H] < −2.0) for the observed r+s stars.[5] Recently,
Barbuy et al.[6] and Wanajo et al.[7] suggested massive
AGB stars (M = 8 ∼ 12M⊙) to be the origin of these
double enhancements. Such a large mass AGB star could
possibly provide the observed enhancement of s-process
elements in the first phase, and explode or collapse pro-
viding the r-process elements. However, the modeling of
the evolution of such a large mass metal-poor star is a
difficult task, an amount of the s-process material is pro-
duced and its abundance distribution is still uncertain.[7]
The generally favoured s-process model till now is as-
sociated with the partial mixing of protons (PMP here-
after) into the radiative C-rich layers during thermal
pulses.[8−11] PMP activates the chain of reactions 12C(p,
γ)13N(β)13C(α, n)16O, which likely occurs in a narrow
mass region of the He intershell (i.e. 13C-pocket) during
the interpulse phases of an AGB star. The nucleosynthe-
sis of neutron-capture elements in the carbon-enhanced
metal-poor stars (CEMP stars hereafter)[12] can be in-
vestigated by abundance studies of s-rich or r-rich stars.
In 2006, Goswami et al.[13] analysed the spectra of the s-
and r-rich metal-poor star HE 1305-0007, and concluded
that the observed abundances could not be well fit by a
scaled solar system r-process pattern nor by the s-process
pattern of an AGB model. This star shows that the en-
hancements of the neutron-capture elements Sr and Y
are much lower than the enhancement of Ba and the
abundances ratio [Pb/Ba] is only about 0.05. Because
of the Na overabundance, which is believed to be formed
through deep CNO-burning, Goswami et al.[13] have also
speculated that this star should be polluted by a massive
AGB star. Clearly, the restudy of elemental abundances
in this object is still very important for well understand-
ing the nucleosynthesis of neutron-capture elements in
metal-poor stars.
The chemical abundance distributions of the very
metal-poor double-enhanced stars are excellent informa-
tion to set new constraints on models of neutron-capture
processes at low metallicity. The metallicity of HE 1305-
0007 is [Fe/H] = −2.0, which is just at the upper limit of
the metallicity for the observed double-enhanced stars.
There have been many theoretical studies of s-process
nucleosynthesis in low-mass AGB stars. Unfortunately,
the precise mechanism for chemical mixing of protons
from the hydrogen-rich envelope into the 12C-rich layer
to form a 13C-pocket is still unknown.[14] It is interest-
ing to adopt the parametric model for metal-poor stars
presented by Aoki et al.[15] and developed by Zhang et
al.[5] to study the physical conditions which could repro-
duce the observed abundance pattern found in this star.
In this Letter, we investigate the characteristics of the
nucleosynthesis pathway that produces the special abun-
dance ratios of s- and r-rich object HE 1305-0007 using
the s-process parametric model.[5] The calculated results
are presented. We also discuss the characteristics of the
s-process nucleosynthesis at low metallicity.
We explored the origin of the neutron-capture elements
in HE 1305-0007 by comparing the observed abundances
with predicted s- and r-process contribution. For this
http://arxiv.org/abs/0704.0576v1
purpose, we adopt the parametric model for metal-poor
stars presented by Zhang et al.[5] The ith element abun-
dance in the envelope of the star can be calculated by
Ni(Z) = CsNi,s + CrNi,r10
[Fe/H], (1)
where Z is the metallicity of the star, Ni,s is the abun-
dance of the i-th element produced by the s-process in the
AGB star and Ni,r is the abundance of the ith element
produced by the r-process (per Si = 106 at Z = Z⊙), Cs
and Cr are the component coefficients that correspond
to contributions from the s-process and r-process respec-
tively.
There are four parameters in the parametric model of
s- and r-rich stars. They are the neutron exposure per
thermal pulse ∆τ , the overlap factor r, the component co-
efficient of the s-process Cs and the component coefficient
of the r-process Cr. The adopted initial abundances of
seed nuclei lighter than the iron peak elements were taken
to be the solar-system abundances, scaled to the value of
[Fe/H] of the star. Because the neutron-capture-element
component of the interstellar gas to form very mental-
deficient stars is expected to consist of mostly pure r-
process elements, for the other heavier nuclei we use the
r-process abundances of the solar system,[16] normalized
to the value of [Fe/H]. The abundances of r-process nu-
clei in Eq. (1) are taken to be the solar-system r-process
abundances[16] for the elements heavier than Ba, for the
other lighter nuclei we use solar-system r-process abun-
dances multiplied by a factor of 0.4266.[5,17] Using the
observed data in the sample star HE 1305-0007, the pa-
rameters in the model can be obtained from the para-
metric approach.
Figure 1 shows our calculated best-fit result. For this
star, the curves produced by the model are consistent
with the observed abundances within the error limits.
The agreement of the model results with the observations
provides strong support to the validity of the paramet-
ric model. In the AGB model, the overlap factor r is a
fundamental parameter. In 1998, Gallino et al.[8] (G98
hereafter) have found an overlap factor of r ≃ 0.4−0.7 in
their standard evolution model of low-mass (1.5−3.0M⊙)
AGB stars at solar metallicity. The overlap factor calcu-
lated for other s-enhanced metal-poor stars lies between
0.1 and 0.81.[5] The overlap factor deduced for HE 1305-
0007 is about r = 0+0.17
−0.00, which is much smaller than
the range presented by G98. This just implies that iron
seeds could experience only one neutron exposure in the
nucleosynthesis region.[18]
For the third dredge-up and the AGB model, sev-
eral important properties depend primarily on the core
mass.[19−21] In the core-mass range 0.6 ≤ Mc ≤ 1.36,
an analytical formula for the AGB stars was given by
Iben[19] showing that the overlap factor increases with
decreasing core mass. Combing the formula and the
initial-final mass relations,[4] Cui and Zhang [22] obtained
the overlap factor as a function of the initial mass and
metallicity. In an evolution model of AGB stars, a
small r may be realized if the third dredge-up is deep
40 50 60 70 80
 = 0.71(mbarn 1)
 r  = 0.00
Cr = 67.43
Cs= 0.0047
 = 1.290
FIG. 1: Best fit to observational result of HE 1305-0007.
The black circles with appropriate error bars denote the ob-
served element abundances, the solid line represents predic-
tions from s-process calculations considering r-process contri-
bution (taken from Ref. [13]).
enough for the s-processed material to be diluted by ex-
tensive admixture of unprocessed material. Karakas[21]
and Herwig[23,24] have found that the third dredge-up is
more efficient for the AGB stars with larger core masses,
confirming the low values of r obtained by Iben[19] in
these cases. In AGB stars with initial mass in the range
M = 1.0−4.0M⊙, the core mass Mc lies between 0.6 and
1.2M⊙ at [Fe/H] = −2.0. According to the formula pre-
sented by Iben,[19] the corresponding values of r would
range between 0.76 and 0.26. Obviously, the overlap fac-
tor of HE 1305-0007 is smaller than this range.
We have extensively explored the convergence of the
abundance distribution of s-process elements through re-
current neutron exposures. All elements, including Pb,
were found to be made in the first neutron exposure.
This is consistent with the small overlap factor r ≃ 0
deduced in our best-fit model. Thus the possibility that
the s-process material has experienced only one neutron
exposure in the nucleosynthesis region is existent.
In 2000, Fujimoto, Ikeda and Iben[25] have proposed
a scenario for the extra-metal-poor AGB stars with
[Fe/H]< −2.5 in which the convective shell triggered by
the thermal runaway develops inside the helium layer.
Once this occurs,12C captures proton to synthesize13C
and other neutron-source nuclei. The thermal runaway
continues to heat material in the thermal pulse so that
neutrons produced by the 22Ne(α, n)25Mg reaction as
well as the 13C(α, n)16O reaction may contribute. In this
case, only one episode of proton mixing into He intershell
layer occurs in metal-poor stars.[25,15,45] After the first
two pulses no more proton mixing occurs although the
third dredge-up events continue to repeat, so the abun-
dances of the s-rich metal-poor stars can be characterized
by only one neutron exposure. Obviously, the metallicity
of HE 1305-0007 is higher than the range of metallicity
for this scenario.
0.1 1
FIG. 2: Best fit to observational result of metal-deficient
star HE 1305-0007 shows the calculated abundances logε(Pb),
logε(Ba) and logε(Sr) and reduced χ2 (bottom)as a function
of the neutron exposure ∆τ in a model with Cr = 67.4,
Cs = 0.0047 and r = 0. These are compared with the ob-
served abundances of HE 1305-0007.
0.0 0.2 0.4 0.6 0.8 1.0
FIG. 3: The same as those in Fig. 2 but as a function of the
overlap factor r in a model with ∆τ = 0.71.
One major goal of this work is to explore the charac-
teristics of the binary system that HE 1305-0007 origin
belongs to. The enhancement of the neutron-capture ele-
ments Ba and Pb suggests that in a binary system a mass-
transfer episode from a former AGB star took place. The
radial-velocity measurement indicates that HE 1305-0007
is a high-velocity star, with a radial-velocity of 217.8 km
s−1. From the high velocity of HE 1305-0007, we could
speculate that the star could be a runaway star from a bi-
nary system, which has experienced the AIC event. The
strong overabundance of r-process elements for HE 1305-
0007 (Cr = 67.4) should be a significant evidence for the
AIC scenario. In this case, the orbital separation must be
small enough to allow for capture of a sufficient amount
of material to create the formation of this event. As-
suming that HE 1305-0007 is formed in a binary system,
the AGB connection strongly suggests that this star is a
member of a post-common-envelope binary. This must
be the case if the overabundances of s-process elements
are attributed to mass-transfer from an AGB star. We
can only speculate about the effects of common-envelope
phase on the nuclear signatures in a metal-poor star that
was formed from this mechanism. One case could in-
volve several thermal pulses with dredge-up causing the
observed abundance distribution corresponding to larger
overlap factor. However, after the s-process material has
experienced only one neutron exposure in the nucleosyn-
thesis region and is dredged-up to its envelope, the AGB
evolution is terminated by the onset of common-envelope
evolution. This could explain the characteristic of single
neutron exposure in this star. In addition, based on the
Na overabundance, Goswami et al.[13] have speculated
that HE 1305-0007 should be polluted by a massive AGB
star, which has a large core-mass and favours the forma-
tion of AIC. Clearly, a detailed theoretical investigation
of this scenario is highly desirable.
The neutron exposure per pulse, ∆τ , is another funda-
mental parameter in the AGB model. In 2006, Zhang
et al.[5] have deduced the neutron exposure per pulse
for other s-enhanced metal-poor stars which lies between
0.45 and 0.88 mbarn−1. The neutron exposure deduced
for HE 1305-0007 is about ∆τ = 0.71+0.06
−0.04 mbarn
−1. Fig-
ures 2 and 3 show the calculated abundances logε(Pb),
logε(Ba) and logε(Sr) as versus the neutron exposure ∆τ
in a model with Cr = 67.4, Cs = 0.0047 and r = 0
and versus overlap r with ∆τ = 0.71 mbarn−1, respec-
tively. These are compared with the observed abun-
dances of HE 1305-0007. There is only one region in
Fig. 2, ∆τ = 0.71+0.06
−0.04 mbarn
−1, in which all the ob-
served ratios of three representative elements can be ac-
counted for within the error limits. The bottom panel
in Fig. 2 displays the reduced χ2 value calculated in
our model with all detected elemental abundances be-
ing taken into account and there is a minimum, with
χ2 = 1.290, at ∆τ = 0.71 mbarn−1. From Fig. 3,
we find that the abundances logε(Pb), logε(Ba) and
logε(Sr)are insensitive to the overlap factor r in a wider
range, 0 ≤ r ≤ 0.17. The uncertainties of the parameters
for the star HE 1305-0007 are similar to those for metal-
poor stars LP 625-44 and LP 706-7 obtained by Aoki et
al.[15]
In addition, it is worth further commenting on the be-
haviour of logε(Sr), logε(Ba) and logε(Pb) as a function
of the neutron exposure ∆τ seen in Fig. 2. The non-
linear trends displayed in the plot reveal the complex
dependence on the neutron exposure. The trends can be
illustrated as follows. Starting from low neutron expo-
sure and moving toward higher neutron exposure values,
they show how the Sr peak elements are preferentially
produced at nearly ∆τ∼ 0.4mbarn−1. At larger neutron
exposure (e.g., ∆τ∼ 0.7mbarn−1), the Ba-peak elements
become dominant. In fact, the higher neutron exposure
favors large amounts of production of the heavier ele-
ments such as Ba, La, etc. and less Sr, Y, etc.,[22] which
is the reason of the abundance pattern of the s-process
elements in HE 1305-0007, i.e. the enhancements of the
neutron-capture elements Sr and Y are much lower than
the enhancement of Ba and the abundances ratio [Pb/Ba]
is only about 0.05. Then a higher value of logε(Pb)∼ 4
follows at ∆τ = 1.5 mbarn−1. In this case, the s-process
flow extends beyond the Sr-peak and Ba-peak nuclei to
cause an accumulation at 208Pb. Clearly, logε(Pb) is very
sensitive to the neutron exposure.
The r- and s-process component coefficients of HE
1305-0007 are about 67.4 and 0.0047, which implies that
this star belongs to s+r stars. Recently, Zhang et al.[5]
have calculated 12 s+r stars with 0.0005 ≤ Cs ≤ 0.0060.
The s-process component coefficient of HE 1305-0007 lies
in this range. The Ba and Eu abundances are most use-
ful for unraveling the sites and nuclear parameters asso-
ciated with the s- and r-process corresponding to those
in extremely metal-poor stars, polluted by material with
a few times of nucleosynthesis processing. In the Sun,
the elemental abundances of Ba and Eu consist of signif-
icantly different combinations of s- and r-process isotope
contributions, with s:r ratios for Ba and Eu of 81:19 and
6:94, respectively.[16] From Eq. (1), we can obtain the s:r
ratios for Ba and Eu are 95.7:4.3 and 30.1:69.9, which
are obviously larger than the ratios in the solar system.
From Fig. 1 we find that our model cannot explain the
larger errors of some neutron-capture elements, such as
Y and Zr in HE 1305-0007. This implies that our un-
derstanding of the true nature of s-process or r-process
is incomplete for at least some of these elements.[27]
In conclusion, the star HE 1305-0007 is an s+r star
with metallicity [Fe/H] = −2.0, which is just at the upper
limit of the metallicity for the observed double-enhanced
stars. Theoretical predictions for abundances starting
with Sr fit well the observed data for the sample star,
providing an estimation for neutron exposure occurred
in AGB star. The calculated results indicated that al-
most all s-elements were made in the first neutron expo-
sure. Once this happens, after only one time dredge-up,
the observed abundance profile of the s-rich stars may
be reproduced in a single neutron exposure. From the
high radial-velocity of HE 1305-0007, we speculate that
the star could be a runaway star from a binary system,
which has experienced the AIC event. The r-process el-
ements in HE 1305-0007 (Cr = 67.4) should come from
the AIC event. Because the orbital separation must be
small enough to allow for capture of a sufficient amount of
material to create the formation of AIC, this star should
be a member of a post-common-envelope binary. After
the s-process material has experienced only one neutron
exposure in the nucleosynthesis region and is dredged-up
to its envelope, the AGB evolution is terminated by the
onset of common-envelope evolution. Clearly, such an
idea requires a more detailed high-resolution study and
long-term radial-velocity monitoring in order to reach a
definitive conclusion. More in-depth theoretical and ob-
servational studies of this scenario is highly desirable.
References
[1] Hill V et al 2000 Astron. Astrophys. 353 557
[2] Cohen J G et al 2003 Astrophys. J. 588 1082
[3] Qian Y Z and Wasserburg G J 2003 Astrophys. J.
588 1099
[4] Zijlstra A A 2004 Mon. Not. R. Astron. Soc. 348,
[5] Zhang B, Ma K and Zhou G D 2006 Astrophys. J.
642 1075
[6] Barbuy B et al 2005 Astron. Astrophys. 429 1031
[7] Wanajo S et al 2005 Astrophys. J. 636 842
[8] Gallino R et al 1998 Astrophys. J. 497 388
[9] Gallino R et al 2003 Nucl. Phys. A 718 181
[10] Straniero O et al 1995 Astrophys. J. 440 L85
[11] Straniero O, Gallino R and Cristallo S 2006 Nucl.
Phys. A 777 311
[12] Cohen J G et al 2005 Astrophys. J. 633 L109
[13] Aruna Goswami et al 2006 Mon. Not. R. Astron.
Soc. 372 343
[14] Busso M et al 2001 Astrophys. J. 557 802
[15] Aoki W et al 2001 Astrophys. J. 561 346
[16] Arlandini C et al 1999 Astrophys. J. 525 886
[17] Cui W Y et al 2007 Astrophys. J. 657 1037
[18] Ma K, Cui W Y and Zhang B 2007 Mon. Not. R.
Astron. Soc. 375 1418
[19] Iben I Jr 1977 Astrophys. J. 217 788
[20] Groenewegen M A T and de Jong T 1993 Astron.
Astrophys. 267 410
[21] Karakas A I, Lattanzio J C and Pols O R 2002
PASA 19 515
[22] Cui W Y and Zhang B 2006Mon. Not. R. Astron.
Soc. 368 305
[23] Herwig F 2000 Astron. Astrophys. 360 952
[24] Herwig F 2004 Astrophys. J. 605 425
[25] Fujimoto M Y, Ikeda Y and Iben I Jr 2000 Astro-
phys. J. 529 L25
[26] Iwamoto N et al 2003 Nucl. Phys. A 718 193
[27] Travaglio C et al 2004 Astrophys. J. 601 864
[1] ∗Supported by the National Natural Science Foundation of
China under Grant Nos 10373005, 10673002 and 10778616.
[2] ∗∗To whom correspondence should be addressed. Email:
zhangbo@hebtu.edu.cn
	References
ABSTRACT
  The star HE 1305-0007 is a metal-poor double-enhanced star with metallicity
[Fe/H] $=-2.0$, which is just at the upper limit of the metallicity for the
observed double-enhanced stars. Using a parametric model, we find that almost
all s-elements were made in a single neutron exposure. This star should be a
member of a post-common-envelope binary. After the s-process material has
experienced only one neutron exposure in the nucleosynthesis region and is
dredged-up to its envelope, the AGB evolution is terminated by the onset of
common-envelope evolution. Based on the high radial-velocity of HE 1305-0007,
we speculate that the star could be a runaway star from a binary system, in
which the AIC event has occurred and produced the r-process elements.

<|endoftext|><|startoftext|>
Introduction
In eleven-dimensional M theory, there exists two extended brane solutions, i.e membrane
and M5-brane. The membrane was recovered in [1] as an elementary solution of D = 11
supergravity which preserves half of the spacetime supersymmetry, which is a electric
source of four-form field. While, the M5-brane was found in [2] as a soliton solution of
D = 11 supergravity also preserving half of the spacetime supersymmetry, but is magnetic
source of the same four-form field. These extended brane solutions can be related to the
corresponding brane solutions in ten-dimensional string theory. After performing the
compactification and some dualities, these branes can be reduced to D-branes or other
brane solutions in string theory [3].
In this paper, we will investigate the properties of M2-brane in the M5-brane back-
ground. Here, we will not investigate the cases of the brane intersection. Instead, we are
mainly concerned with the classical dynamics of membrane in the given background. As
will be illustrated, due to the gravity force of M5-brane, the membrane evolves nontriv-
ially.
In the 11-dimensional supergravity, the classical solution of N coincident M5-brane
reads
ds2 = H−
3 ηµνdx
µdxν +H
3 δijdx
idxj,
H = 1 +
πNl3p
(xi)2 = r2 + x11
2, µ, ν = 0, 1, · · · , 5, i, j = 6, 7, 8, 9, 11 (1.1)
and the 4-form field strength takes the form
F4 = dA3 = 3πNl
pdvS4 (1.2)
where the dvS4 denotes the volume form of a unit S
4 and lp is the Planck length in the
11-dimensional theory. The N coincident M5-brane are parallel to the xµ directions and
located at R = 0 in the transverse space. In the near horizon limit R → 0, the harmonic
form H will become H =
πNl3p
, and the other parts will choose the same forms as in the
equations (1.1) and (1.2).
As in [4], if we suppose that there are a periodic configuration of N coincident M5-
brane along the x11 direction at intervals of 2πR11, and take the limit of 1 ≪ r/R11, then
our background metric and the 4-form field strength will become
ds2 = f−
3ηµνdx
µdxν + f
3 δijdx
idxj + f
3 (dx11)2,
f = 1 +
R11r2
2Nℓ3p
dvS3 ∧ dx11,
(xi)2, x11 = R11φ, (1.3)
where µ, ν = 0, 1, · · · , 5, i, j = 6, 7, 8, 9 and 0 ≤ φ ≤ 2π . We can see this metric has an
so(4) symmetry group of rotations in the directions transverse to the M5-brane. In the
near horizon limit, the harmonic function f becomes f =
R11r2
. While, the other parts
of background (1.3) remain unchanged. Actually, if letting the radius of x11 coordinate
approach zero, then the metric (1.3) can reduce to the N coincident NS5-brane solution
in ten-dimensional string theory [5].
Here we will mainly focus on the classical dynamics of a M2-brane in the above back-
grounds (1.1) and (1.3). The dynamics of this single membrane can be described by the
Nambu-Goto and Wess-Zumino type effective action. However, for the coincident mem-
branes, unlike the coincident D-brane in string theory which can be described by the
effective action [6], their worldvolume action is still not very clear [7]. We choose the
worldvolume coordinates of membrane as x0, x1, x2, and those of M5-brane as x0, · · · , x5.
Hence M2-brane is “parallel” to the M5-brane, i.e it is extended in some of the M5-brane
worldvolume directions xµ, and point-like in the directions transverse to the M5-brane
(x6, x7, x8, x9, x11). Indeed, this configuration breaks supersymmetry completely. We can
label the worldvolume coordinates of the M2-brane by ξµ, µ = 0, 1, 2, and use reparame-
terization invariance on the worldvolume of the M2-brane to set ξµ = xµ. The position of
the M2-brane in the transverse directions, (x6, · · · , x9, x11), give rise to scalar fields on the
worldvolume of the M2-brane, (X6(ξµ), · · · , X9(ξµ), X11(ξµ)). A single M2-brane world-
volume action [8] is given by the sum of the Nambu-Goto action and the Wess-Zumino
type term in the following form
SM2 = −T2
− detP [G]µν + T2
P [A] (1.4)
where the tension of the M2-brane is expressed as T2 = 1/4π
2l3p, and P [· · ·] means the
pullback operation
P [G]µν =
GMN(X),
P [A] =
AMNL(X) . (1.5)
The indices M,N,L run over the whole eleven dimensional spacetime. And the fields
GMN , AMNL denote the metric and form field in eleven dimensions. In the following
sections, we will discuss the M2-brane classical dynamics in the above backgrounds, and
suppose that the transverse coordinates of M5-brane only depend on the time coordinate.
In this case the Wess-Zumino term in the membrane action will vanish.
2 Classical dynamics of membrane
Now let us consider the membrane dynamics in the background (1.1). Since we have
supposed that the directions transverse to the M5-brane X i are only the function of time
t, where i = 6, 7, 8, 9, 11, the pullback quantities are as following
P [G]tt = −H−
3 Ẋ iẊ i,
P [G]x1x1 = H
P [G]x2x2 = H
3 , P [A] = 0. (2.1)
after substituting the above equations (2.1) into the M2-brane action (1.4), we get
SM2 = −V T2
H−1 − Ẋ iẊ i (2.2)
where V is the space volume of the M2-brane, also i = 6, · · · , 9, 11. We can find it is very
similar to the corresponding one in [9] which is the DBI action of D-brane in the N NS5
brane background. Then through using the Legendre transformation, the Hamiltonian is
H−1 − Ẋ iẊ i
≡ V E (2.3)
where the E denotes the energy density. And the equation of motion will be
H−1 − ẊjẊj
H−1 − ẊjẊj
. (2.4)
Using this equation of motion (2.4), one can check that the Hamiltonian is conserved. To
solve the (2.4), we need the initial conditions that it is ~X(t = 0) and ~̇X(t = 0). These
two vectors define a plane in R5. By an SO(5) rotation, we can rotate this plane into
the (x6, x7) plane. Then the motion will remain in the (x6, x7) space for all time. Thus,
without loss of generality, we can study trajectories in this space.
We choose the polar coordinates
X6 = R cos θ,
X7 = R sin θ. (2.5)
Then the energy density (2.3) will become
H−1 − Ṙ2 − R2θ̇2
, (2.6)
and the angular momentum density will be
H−1 − Ṙ2 − R2θ̇2
. (2.7)
We can find this angular momentum of the M2-brane is conserved as well. From the
membrane action (2.2), we can obtain energy momentum tensor. The components of Tµν
are listed in the following
T00 = −
H−1 − Ẋ iẊ i
Tij = −T2δij
H−1 − Ẋ iẊ i, (2.8)
and the other components of stress tensor are zero. From the angular momentum Lθ
equation (2.7) and energy density E (2.6), we can get the equations of the coordinates R
and θ
Ṙ2 =
T 22 +
, (2.9)
EH(R)R2
. (2.10)
For simplicity, we can first consider Lθ = 0 case, then the radial equation is
Ṙ2 =
. (2.11)
The right hand of the above equation can’t be smaller than zero, so we get a constraint
on the coordinate R is
πNl3p
− 1. (2.12)
From the above equation, we can see if the energy density E is larger than the tension of
a M2-brane, T2, the constraint (2.12) is empty and the M2-brane can escape to infinity.
However, for E < T2, the M2-brane does not have enough energy to overcome the grav-
itational pull of the M5-brane, and then will fall down to the M5-brane from an initial
position.
Choosing the near horizon limit, hence the harmonic function becomes H = πNl3p/R
Then the equation (2.11) will be
Ṙ2 =
πNl3p
π2N2E2l6p
R6. (2.13)
Since the left hand of the equation (2.13) is nonnegative, the coordinate R has a maximal
value
πNE2l3p/T
. Also from this equation, the minimal value of R is zero. Except
for these two, there are no other extremum. But there is one inflexion between points
R = 0 and
πNE2l3p/T
. We can regard the M2-brane is at the maximal value
πNE2l3p/T
at the initial time. Due to the gravitational force of M5-brane, the M2-
brane then will roll down to the M5-brane. As the time t → ∞, the radial coordinate
R approaches to zero. We can calculate the energy momentum tensor which is Tij =
EH(R)
as the R → 0, the Tij will approach to zero. It may regard as the pressure
decreasing to zero. But we need to mention that the coordinate R can’t reach zero, since
at this point the supergravity background will be not reliable. Then the classical dynamics
of the membrane near R = 0 from the above analysis will become incorrect. Thus, in
order to use the supergravity approximation, we must constrain the coordinate R to be
larger than the planck length lp.
Now we begin to consider the nonzero case of angular momentum. From the radial
equation of motion (2.9), and after substituting the harmonic function H = 1+πNl3p/R
we can get the constraint on the radial coordinate R is
πNE2l3p
πNl3p
. (2.14)
If choosing the equal case of the above equation, the constraint will become
πNE2l3p
πNl3p
= 0. (2.15)
The above equation only has one real root which is the maximal distance that M2-brane
is separated from M5-brane.
For simplicity, we choose the near horizon limit, then the equation of motion for the
radial coordinate will become
Ṙ2 =
πNl3p
π2N2E2l6p
π2N2E2l6p
R6. (2.16)
We find that the equation (2.16) is still very difficult to solve. Instead, here, we take
some analysis for this equation. If letting Lθ = 0, then this equation will reduce to the
equation (2.13). We let the left hand of the equation (2.16) to zero, then we can get the
extremal value for R. Actually, there are two only two real extremal values of the radial
coordinate R. One is R = 0, the other is
108πNlp
3E2T2 + 12
6 + 27π2N2lp
6E4T2
− 12Lθ2
108π Nlp
3E2T2 + 12
6 + 27π2N2lp
6E4T2
. (2.17)
When Lθ = 0, the above R value will reach the
πNE2l3p/T
. As the same in the
Lθ = 0 case, between the R = 0 and (2.17) there exists a inflexion. We can suppose
that the M2-brane is at the maximal value (2.17) at the initial time, then under the
gravitational pull of M5-brane, it will monotonic approach to M5-brane. Of course for the
Lθ nonzero case, the equation (2.10) for the θ coordinate in the near horizon background is
θ̇ = Lθ
= RLθ
πNEl3p
. Thus, if the radial coordinate R reaches the value (2.17), the angular
velocity will choose the maximum, and as the R → 0, the angular velocity does also
approach to zero. The energy momentum tensor satisfies Tij = −δij
EH(R)
= −δij
πNEl3p
Thus, it again goes to zero as in the Lθ = 0 case. As mentioned in the above, near the
region R = 0, the classical background will be instability due to the strong interaction.
Hence the above supergravity analysis will become unreliable in this region.
From the first section, we already know that, after compactifying a periodic circle of
coordinate x11, the metric (1.1) will become background (1.3). In the following, we study
the membrane dynamics in this background (1.3). Here, we still suppose the directions
transverse to the M5-brane X i and X11 are only the function of time t, where i = 6, 7, 8, 9,
then the pullback quantities take the form as follows
P [G]tt = −f−
3 + f
3 Ẋ iẊ i + f
3 Ẋ11Ẋ11,
P [G]x1x1 = f
P [G]x2x2 = f
3 , P [A] = 0. (2.18)
After inserting (2.18) into the M2-brane action (1.4), we can get
SM2 = −V T2
f−1 − Ẋ iẊ i − Ẋ11Ẋ11 (2.19)
where V is the space volume of the M2-brane. This action is also very similar to action
in [9] except for the harmonic function and dimension. From the Lagrangian (2.19), we
can derive the equations of motion for the membrane in this background as followes
f−1 − ẊjẊj − R211φ̇2
f−1 − ẊjẊj −R211φ̇2
, (2.20)
R11φ̇
f−1 − ẊjẊj − R211φ̇2
 = 0. (2.21)
Due to some symmetry of this system, there are also some conserved charges. Time
translation invariance implies that the energy
H = PiẊ i + Pφφ̇− L (2.22)
is conserved. The momentum is obtained by varying the Lagrangian L,
δẊ i
T2V Ẋi
f−1 − ẊjẊj − R211φ̇2
, (2.23)
T2V R
f−1 − ẊjẊj − R211φ̇2
. (2.24)
Substituting (2.23) into (2.22), we find that the energy is given by
f−1 − Ẋ iẊ i − R211φ̇2
≡ V E. (2.25)
And since the harmonic function f = 1 +
R11r2
, then ∂if(r) = X
if ′(r)/r, and one of the
equations of motion (2.20) can be rewritten as
f−1 − ẊjẊj − R211φ̇2
X if ′
2rf 2
f−1 − ẊjẊj − R211φ̇2
, (2.26)
the other one is unchanged.
To solve these equations, we need to specify some initial conditions for the coordinates.
One condition is ~X(t = 0) and ~̇X(t = 0). These two vectors define a plane in R4. By an
SO(4) rotation symmetry, we can rotate this plane into the (x6, x7) plane. The other one
is φ(t = 0) and φ̇(t = 0). Then the motion of the membrane will remain in the (x6, x7, φ)
space for all time. Thus, without loss of generality, we can study trajectories in this space.
In addition to the energy, the angular momentum of the M2-brane is conserved as well.
It is given by
(X6P 7 −X7P 6). (2.27)
Using the expression for the momentum, (2.23), we find that
Lθ = T2
X6Ẋ7 −X7Ẋ6
f−1 − ẊjẊj −R211φ̇2
. (2.28)
Another interest quantity is the stress tensor Tµν associated with the moving M2-
brane. The component T00 denotes the energy density, so it is given by expression (2.25)
for E, with the factor of the volume stripped off. We list the components of Tµν in the
following equations
T00 = −
f−1 − Ẋ iẊ i −R211φ̇2
Tij = −T2δij
f−1 − Ẋ iẊ i − R211φ̇2,
Tφφ = −T2R211
f−1 − Ẋ iẊ i −R211φ̇2, (2.29)
and the other components of stress tensor are zero.
Due to the so(4) rotation symmetry in the transverse directions of M5-brane, it is
convenient to change to the polar coordinates
X6 = r cos θ, X7 = r sin θ. (2.30)
In these coordinates, the expressions of the energy density and angular momentum density
becomes
f−1 − ṙ2 − r2θ̇2 − R211φ̇2
, (2.31)
f−1 − ṙ2 − r2θ̇2 − R211φ̇2
, (2.32)
f−1 − ṙ2 − r2θ̇2 − R211φ̇2
. (2.33)
One can check directly that Lθ and Lφ are conserved by using the equations of motion
(2.26) and (2.21).
In order to solve the equations of motion for the given energy and angular momentum
densities E, Lθ and Lφ, we would like to solve the equation (2.32) for θ̇, and then substitute
this solution into the (2.31). Then the equation for the θ̇ is
. (2.34)
Inserting it into (2.31), (2.32) and solving for ṙ, we find
ṙ2 =
E2f 2
T 22 +
. (2.35)
Also we have the equation of φ̇
EfR211
. (2.36)
In the next, we would like to study the solutions of the equations of motion (2.34), (2.35)
and (2.36).
Firstly, we consider the angular momentum Lθ = 0 case. Then Equation (2.34) implies
that θ is constant, while the radial equation (2.35) takes the form
ṙ2 =
E2f 2
T 22 +
. (2.37)
Since the right hand side of the equation(2.37) is non-negative, then we can get the
condition 1
T 22 +
≥ 0. After substituting the harmonic function f , (1.3),
into it, we find the constraint on r (for fixed energy density E)
R11r2
− 1 (2.38)
where we can define the effective M2-brane tension is
T 2e = T
(2.39)
From the equation of constraint (2.38), obviously, if the energy density E is larger than
the effective tension of a M2-brane, Te, the constraint (2.38) is empty and the M2-brane
can escape to infinity. For E < Te, the M2-brane does not have enough energy to escape
the gravitational pull of the M5-brane, which means that it cannot exceed some maximal
distance from the M5-brane.
Under the near horizon limit, the harmonic function f will become f =
R11r2
. Then
the equation (2.38) will be
R11r2
. Thus, if r <<
, the effective tension of
membrane Te satisfies the constraint Te/E >> 1. However, r >>
, the case will
be otherwise. Indeed, in this near horizon case, we can solve for the trajectory r(t), φ(t)
exactly. Substituting the harmonic function f =
R11r2
into (2.37), we find the equation
of motion
ṙ2 =
E2N2l6p
T 22 +
r4. (2.40)
Then the solution can be obtained
L2φ +R
NR11E2l3p
t (2.41)
where we choose t = 0 to be the time at which the M2-brane reaches its maximal distance
from the M5-brane. For an observer living on M5-brane, the M2-brane reaching r = 0
will take an infinite time. Also, the M2 radial motion is similar to D-brane’s motion in
And the equation of motion (2.24) becomes
φ̇2 =
r2 − ṙ2
11 + T
. (2.42)
Substituting the solution r into equation (2.42), we can get the equation
L2φ + T
. (2.43)
Then after solving this equation, the solution can be obtained
L2φ + T
t. (2.44)
It is interesting to calculate the energy momentum tensor of the M2-brane in this
case. The energy density T00 is constant and equal to E throughout the time evolution.
However, for the parts Tij and Tφφ, we can find
Tij = −δij
Tφφ = −
R211T
. (2.45)
We see that the pressure goes smoothly to zero as r → 0, since f(r) ∼ 1/r2. But again
as the analysis in the background (1.1), this may be unreliable near the r = 0 region.
So far we have discussed the trajectories with vanishing angular momentum density
(2.32). A natural question is whether anything qualitatively new occurs for non-zero Lθ.
Just as [9], we can think as follows, the radial equation of motion (2.35) can be thought
of as describing a particle with mass m = 2, moving in one dimension r in the effective
potential
Veff(r) =
E2f 2
T 22 +
(2.46)
with zero energy. Now we discuss the properties of this effective potential Veff . In the
small r region, it will behave as
Veff(r) ≃
E2Nl3p
r2. (2.47)
For large r, the leading terms of this potential will be
Veff(r) ≃
− 1. (2.48)
If the energy density of the M2-brane is smaller than the effective tension of a M2-brane,
E < Te, then the effective potential Veff approaches to a positive constant (2.48) as r → ∞,
which means the membrane cannot escape to infinity. From the equation (2.47), we can
find that in order to have trajectories at non-zero r, the angular momentum must satisfy
the constraint
NEl3p
. (2.49)
If the constraint (2.49) is not satisfied, the only solution is r = 0. But, if the condition
(2.49) is satisfied, the trajectory of the M2-brane is qualitatively similar to that in the
Lθ = 0 case. It will approach the M5-brane and does not have stable orbits at finite r.
For the case Te >> E, the whole trajectory lies again in the region r <<
Nl3p/R11,
and one can approximate the harmonic function (1.3) by f =
R11r2
. Then the equation
(2.35) for ṙ will be
ṙ2 =
Nl3pE
E2N2l4p
T 22 +
r4 , (2.50)
with the solution
L2φ +R
R11NE2l3p − R11L2θ
NR11E2l3p − R211L2θ
NEl3p
t. (2.51)
We can find that the non-zero angular momentum can slow down the exponential decrease
of r as t → ∞. In the near horizon limit f(r) = Nl
, the solution of the equation (2.34)
for θ is
R11Lθ
ENl3p
t. (2.52)
The solution (2.51) and (2.52) mean that the M2-brane in the background (1.3) will be
spiralling towards the origin, circling around it an infinite number of times in the process.
The equation about φ is
Lφ(NE
2l3p −R11L2θ)
ENl3p(L
cosh
R11E2Nl3p − R211L2θ
ENl3p
. (2.53)
and the solution of the above equation reads
L2φ +R
NE2l3p − L2θ
R11NE2l3p − R211L2θ
ENl3p
t. (2.54)
At t = 0, the φ = 0, however, the time t → ∞, then, φ → Lφ
NE2l3p−L
Thus, the non-zero angular momentum Lθ slows down the variation of φ. From these
three solutions, we know that the M2-brane is circling along the θ direction, varying along
the φ and falling down towards the M5-brane in the process. Also, the energy momentum
tensor Tij and Tφφ will approach to zero as r → 0, since f(r) ∼ 1/r2. But we must
mention that, near the r = 0 region, the discussion may be incorrect due to the strong
coupling.
In the background (1.3), the results about the dynamics of a M2-brane have some
similar properties as studying in [9]. This can be understood that the D2-brane and
NS5-brane in IIA can be got by compactified one transverse dimension of M2-brane and
M5-brane in M theory. The solutions of equation of motion describe the M2-brane falling
towards the M5-brane. In the non-zero angular momentum Lθ, the M2-brane is spiralling
towards the M5-brane. But both in this two case, M2-brane has a angular momentum Lφ.
We need to mention that the background (1.3) is only correct in the limit of 1 ≪ r/R11.
Therefore, as the M2-brane approaches the M5-brane, the energy momentum tensor Tij
and Tφφ approaching zero may be unreliable. Since here the radial coordinate r is smaller
than the radius R11. So we are not sure whether the membrane will have the same
behavior just like the late time behavior of unstable D-brane [10, 11, 12, 13, 14].
In the above sections, we investigated the membrane classical dynamics in various
M5-brane backgrounds. There may be some generalizations, since under the Penrose
limit, the N coincident M5-brane solution (1.1) will reduce to the AdS7 × S4 geometry.
Hence one can investigate the membrane dynamics in this geometry. For the (1.1), (1.3)
and their near horizon background geometry, after calculating the classical equations of
motion of membrane from the membrane action (1.4), we can analyze the moving trajec-
tories of membrane. In some particular cases, we can get the exact solution of trajectories
of membrane. However, generally, the equations of motion is very difficult to solve. But
through analyzing these equations, we still can obtain some qualitative information about
the motion of membrane. Consequently, in the M5-brane background, the membrane will
be falling and spiralling towards to the M5-brane by the gravitational force of M5-brane.
In the near M5-brane region, i.e R (or r) being of the order of the planck length lp,
the above analysis of the classical dynamics of membrane may not be trusted, since the
method of the supergravity approximation is unreliable.
Acknowledgements
We would like to thank Yi-hong Gao for the useful suggestions and discussions.
References
[1] M. J. Duff and K. Stelle, “Multimembrane solutions of d = 11 supergravity,” Phys.
Lett. B 253 (1991) 113.
[2] R. Gueven, “Black p-brane solutions of D = 11 supergravity theory,” Phys. Lett. B
276 (1992) 49.
[3] J. Polchinski, “String Theory (Vol. I, Vol. II),” Cambridge Press, 1998.
[4] Y. Hyakutake, “ Expanded Strings in the Background of NS5-branes via a M2-brane,
a D2-brane and D0-branes,” hep-th/0112073.
[5] C. G. Callan, J. A. Harvey and A. Strominger, “Worldbrane actions for string soli-
tons,” Nucl. Phys. B367: 60-82, 1991; “World sheet approach to heterotic instantons
and solitons,” Nucl. Phys. B359: 611-634, 1991.
[6] R. C. Myers, “Dielectric branes,” JHEP 9912: 022, 1999 [hep-th/9910053].
[7] A. Basu and J. A. Harvey, “The M2-M5 brane system and a generalized Nahm’s
equation,” Nucl. Phys. B713: 136-150, 2005 [hep-th/0412310].
[8] E. Bergshoeff, E. Sezgin and P. K. Townsend, “Properties Of The Eleven-Dimensional
Super Membrane Theory,” Annals Phys 185 (1988) 330.
[9] D. Kutasov, “D-Brane Dynamics Near NS5-Branes,” hep-th/0405058; “A Geometric
interpretation of the open string tachyon,” hep-th/0408073; K. L. Panigrahi, “D-
Brane Dynamics in Dp-Brane Background,” hep-th/0407134.
[10] A. Sen, “Tachyon dynamics in open string theory,” Int. J. Mod. Phys. A20: 5513-
5656, 2005 [hep-th/0410103].
[11] A. Sen, “Rolling tachyon,” JHEP 0204, 048 (2002) [hep-th/0203211].
[12] F. Larsen, A. Naqvi and S. Terashima, “Rolling tachyons and decaying branes,”
hep-th/0212248.
[13] T. Okuda and S. Sugimoto, “Coupling of rolling tachyon to closed strings,” Nucl.
Phys. B647, 101 (2002) [hep-th/0208196].
[14] N. Lambert, H. Liu and J. Maldacena, “Closed strings from decaying D-branes,”
hep-th/0303139.
	Introduction
	Classical dynamics of membrane
ABSTRACT
  In this paper, we investigate the properties of a membrane in the M5-brane
background. Through solving the classical equations of motion of the membrane,
we can understand the classical dynamics of the membrane in this background.

<|endoftext|><|startoftext|>
Introduction
Solar research is currently working on understanding how
turbulent convection on the Sun transports mass and en-
ergy through the convective zone, how it couples with the
magnetic field and how it manages to deposit in the higher
parts of the solar atmosphere the energy released from the
corona. Among the different approaches to these questions,
observations of the solar photosphere are essential, as they
provide the only direct look at what is happening just below
the solar surface. The hierarchy of surface features found
on the photosphere are the visible representation of the
plasma flows beneath the photosphere and are customar-
ily classified by size and lifetime as patterns of granula-
tion (1 Mm, 0.2 hr), mesogranulation (5-10 Mm, 5 hr)
and supergranulation (15-35 Mm, 24 hr). These features
have been initially regarded as direct manifestation of var-
ious sized convection cells existing in the convection zone
(Schrijver et al., 1997; Raju et al., 1999); lately, the idea
is consolidating that meso and supergranulation are sig-
natures of a collective interaction of granular cells (Rast,
2003; Roudier et al., 2003; Berrilli et al., 2005). Despite
years of intensive studies, the character of their motions
remains not completely understood (Beck & Duvall, 2000;
Krishan et al., 2002; Berrilli et al., 2004; Del Moro et al.,
2004; DeRosa & Toomre, 2004). The aim of the study we
present is to investigate the origin of the supergranular (SG)
flow field: directly convective or a collective interaction of
smaller convective features.
The study performed by Simon & Leighton (1964) initi-
ated the campaign to characterize supergranular flows.
Outflows on SG scales have been observed to sweep
embedded granules and magnetic flux elements toward
Send offprint requests to: delmoro@roma2.infn.it
convergence lanes between cells (Leighton, 1964; Zwaan,
1978; Rimmele, 1989; Shine et al., 2000). Such behaviour
causes the chromospheric transition CaII k line to be
a good proxy for the network of intercellular lanes due
to the higher magnetic elements density in the SG
perimeters. The advent of full-disk Doppler imaging, pro-
vided by the MDI onboard SOHO spacecraft, has con-
siderably improved our capability to study such fea-
tures (Hathaway et al., 2002; DeRosa & Toomre, 2004;
Paniveni et al., 2004; Meunier et al., 2007); but direct ob-
servations of supergranular flows are still hindered by the
fact that there is no contrast on supergranular scales in visi-
ble light, observations in CaII only provide the cell network
boundaries and Doppler images show SG only away from
disk centre.
At present, the only methods to reconstruct the full 3D
vector velocity field are direct Doppler measurement in
combination with a tracking type measure for the velocity
horizontal component (above the τ = 1 surface) or Local
Helioseismology (below the τ = 1 surface). To gain com-
plete insight of the dynamics of the plasma flows inside a
SG structure, we need a spatial and temporal resolution still
not reached by local helioseismology, while to obtain the
3D velocity field through the other method, observations
with very high spatial, spectral and temporal resolution are
necessary. With the assumption that granule motions are
mainly driven by plasma flows (Rieutord et al., 2001), it is
possible to employ the TST to infer the horizontal velocity
field.
In this work we reconstruct the 3D velocity field of a sin-
gle SG structure and investigate in detail its plasma flow
using data acquired with the IBIS spectrometer, trying to
discern whether the SG pattern has a convective nature or
is originated by small scale structure interaction.
http://arxiv.org/abs/0704.0578v2
2 Del Moro et al.: 3D photospheric velocity field of a SG cell
Fig. 1. A representative synoptic panel from the 16th October 2003 dataset. Upper left panel: Ca II wing intensity
image. Upper middle panel: Doppler velocity field computed from FeI 709.0 nm line scan. Upper right panel: FeI 709.0
nm line core intensity Lower left panel: Doppler velocity field computed from FeII 722.4 nm line scan. Lower middle
panel: FeII 722.4 nm line core intensity. Lower right panel: Continuum (near 709.0 nm) intensity image.
Line Wavelength zTcore FWHMRFI
zVline FWHMRFI
[nm] [km] [km] [km] [km]
FeI 709.0 ≃100 ∼300 ≃140 ∼300
FeII 722.4 ≃50 ∼200 ≃70 ∼200
Table 1. Line RF peak depths. Depths are in km above the level τ500nm = 1.
2. Observations
The data utilized in this analysis have been acquired with
the IBIS (Interferometric BIdimensional Spectrometer) 2D
spectrometer (Cavallini et al., 2001; Cavallini, 2006) on
October 16, 2003 (from 14:24 UT to 17:32 UT).
We imaged a roundish network cell near the solar disk cen-
tre (SLAT=7.8N, SLONG=3.6E). When observed in MDI
high-resolution magnetograms, all the features outlining
the cell exhibit negative polarity and seem to survive for
at least 10 hours, with little or no evolution.
The full dataset consists of 600 sequences, containing a 16
image scan of the FeI 709.0 nm line, a 14 image scan of
the FeII 722.4 nm line and 5 spectral images in the wing
(line centre + 12 nm) of the CaII 854.2 nm line, imag-
ing a round Field of View (FoV) of about 80” diameter.
Each monochromatic image was acquired with a 25 ms ex-
posure time by a 12bit CCD detector, whose pixel scale
was 0.17′′·pixel−1. The time required for the acquisition of
a single sequence was 19 s, thus setting the temporal res-
olution. Each image was reduced with the standard IBIS
pipeline (Janssen & Cauzzi, 2006; Giordano et al., 2007),
correcting for CCD non linearity effects, dark current, gain
table and blue shift. The Line of Sight (LoS) velocity fields
were computed for the Fe I and Fe II lines by means of
Doppler shifts, evaluated, pixel by pixel, fitting a Gaussian
on the line profile. In order to remove the orbital contribu-
tion, we set to zero the average value of each LoS velocity
image. The 5-minutes oscillations were removed applying a
3D Fourier filter in the kh −ω domain with a cut-off veloc-
ity of 7 km·s−1 both on intensity and velocity image series.
After the whole reduction process and selecting only the
period of good seeing we are left with a 30 minutes dataset
imaging a square FoV of ∼ 50”. An example of the im-
ages of this reduced dataset is shown in Fig. 1. The mean
resolution due to the seeing of the CaII images is 0.35”;
the mean resolution of the LoS velocity images, the inten-
sity continuum images and the line core images is instead
0.45”, somewhat degraded by both the reduction pipeline
and the kh − ω filtering.
In order to obtain information about the depth dependence
of a photospheric quantity by associating a suitable ‘for-
mation zone’ with a line, it is possible to consider its ef-
Del Moro et al.: 3D photospheric velocity field of a SG cell 3
fect on the line characteristic as linear perturbations and
to study the Response Function RF of the emergent line
characteristic at the observed wavelengths within the line.
In particular, the RF Ip is, at each depth, the function we
must use to weigh the perturbation p in order to get the
variation of the emergent intensity I (Caccin et al., 1977).
This approach has been employed to derive the RF IT and
Fig. 2. Core intensity fields of FeII 722.4 nm (z≃50 km)
and FeI 709.0 nm (z≃100 km ) in comparison with con-
tinuum image (z≃0 km) and CaII 854.2 nm wing intensity
field (z≃150 km) (Qu & Xu , 2002). The z axis is greatly
exaggerated with respect to the x-y axes in order to allow
a better visualization.
V for the spectral lines FeI 709.0 nm and FeII 722.4 nm
(Del Moro, 2005).
In Table 1 we report the photospheric depths of the line
core RF IT maximum (zTcore), and of the mean RF
V maxi-
mum (zVline) for the two spectral lines. We also report the
RF full width at half maximum for the two spectral lines:
these rather large values imply broad formation zones for
both the FeI 709.0 nm and FeII 722.4 nm Doppler velocity
and core intensity signals.
3. 3D Velocity Field
The TST procedure (Del Moro, 2004) has been applied on
the continuum image series and on both the FeII 722.4 nm
and FeI 709.0 nm Doppler field series, in order to retrieve
the horizontal velocity field at different depths of the solar
atmosphere.
To minimize the effect of the proper motion of the
granules, which are used as trackers of the mean plasma
flow, we computed the horizontal velocity field using all
the structures that were tracked, so that statistically at
least one tracker is present in each interpolated horizontal
velocity field pixel, as suggested by Behan (2000). This
means we used a grid step of ∼ 1.5 Mm and a temporal
window of ∼ 30 min. Possibly, this would not completely
remove the noise associated with granule proper motions
or residuals from the 5-min oscillation filtering, but should
minimize it.
Combining the horizontal velocity fields retrieved from
Fig. 3. 3D representation of velocity vectors extracted from
continnum (z≃0 km), FeII 722.4 nm (z≃70 km) and FeI
709.0 nm (z≃140 km). Cone size is proportional to the ve-
locity vector module: the yellowish cone corresponds to 1
km·s−1. The z axis is greatly exaggerated with respect to
the x-y axes in order to allow a better visualization.
the Dopplergrams by the TST and the Doppler LoS
velocity, the 3D vector field has been reconstructed for
the FeII 722.4 nm and FeI 709.0 nm lines. We are aware
that associating the vector fields to precise heights in the
photosphere is an oversemplification, as can be readily
understood from the large FWHM reported in Table 1,
nevertheless, we did it for the sake of a good visualization:
in Fig. 3 we show the mean 3D velocity field associated to
the the dataset.
In both the 3D fields we retrieved, the vector velocity
appears to be structured quite coherently with the SG
feature visible in the CaII wing images.
In order to further investigate the structuring of the
velocity field, we computed the average continuum (upper
panel of Fig. 4) and CaII wing (bottom panel of Fig. 4)
intensity images, the average Dopplergram from FeII 722.4
nm (middle panel of Fig. 4) and the average Dopplergram
from FeI 709.0 nm (upper panel of Fig. 4) and correlated
them. While the average continuum image does not show
any evident signal, there is a significant correlation between
strong downflows and bright CaII features, in particular
for the complex cluster of features in the lower part of the
FoV. This issue can be at least partially explained by the
coherence of the 3D velocity field with the SG structure.
We expanded this study by comparing the averaged images
with the horizontal velocity field extracted by the TST
from granules as seen in the continuum and up-flows from
the FeI 709.0 nm Doppler images.
We excluded from this analysis the FeII 722.4 nm Doppler
images because we found its horizontal velocity field to be
not as reliable as the others. This is due to the TST finding
less than optimal number of features to track because
of the shallowness of the FeII 722.4 nm line. A shallow
line Doppler shift is much harder to measure by the LoS
velocity reconstruction procedure, resulting in a more
noisy dopplergram. This noise is interpreted by the TST
as a fast variation of the structures, therefore causing a lot
of them to be rejected for the tracking. As a consequence,
the TST finds too few trackers in the 722.4 Doppler for
the divergence field to be reliably reconstructed.
4 Del Moro et al.: 3D photospheric velocity field of a SG cell
The horizontal velocity fields extracted by the TST are
Fig. 4. Upper panel: average continuum image with the
horizontal velocity field (obtained by tracking granules)
represented as red arrows. The granules were tracked by ap-
plying the TST to the continuum image time series. Lower
panel: divergence field computed from the interpolated hor-
izontal velocity field.
shown superimposed on the average images in the left
panels of Fig. 4 and Fig. 5, above the associated 2D
divergence images. The values of the divergence fields
range from +0.25 km s−1 Mm−1 in the brightest part of
the image to −0.25 km s−1 Mm−1 in the darkest parts.
The continuum granules and FeI 709.0 nm up-flow fields
show a divergent flow from the centre of the SG structure
and convergent flows in the border of the SG structure. In
detail, these two fields agree very well, showing a single,
large divergent feature in the centre of the SG, whose mean
value is about +0.1 km s−1 Mm−1, almost completely
surrounded by convergent flows of the same magnitude.
The peak divergence signals we retrieved both in the
centre and in the periphery of the SG cell are an order
of magnitude larger than the averaged values reported by
Meunier et al. (2007). This discrepancy probably stems
mainly from the different temporal and spatial averaging
processes in the divergence reconstruction and marginally
from the different resolution of the two datasets.
The structuring of the divergence field is very compatible
with a net flow from the centre of the SG to its border.
Fig. 5. Upper panel: average FeI 709.0 nm Doppler velocity
image with the horizontal velocity field (obtained by track-
ing up-flows) represented as red arrows. The up-flows were
tracked by applying the TST to the FeI 709.0 nm Doppler
velocity field time series. Lower panel: divergence field com-
puted from the interpolated horizontal velocity field.
Moreover, examining the LoS velocity fields, we found a
strong and stable up-flow region nearby the divergence
maximum, with a mean FeII 722.4 nm Doppler veloc-
ity value of Vc ∼200 m·s
−1 for almost the whole time
span. This last region is liable to be the origin of the
divergence signal we measured, possibly as suggested by
Rieutord et al. (2000); Roudier et al. (2003).
Observing the divergence images (bottom panels of Fig. 4
and Fig. 5) extracted from the horizontal velocity fields,
the supergranule is outlined by convergences on ∼ 66%
of its circumference, while the bright cluster area clearly
visible in the CaII wing image does not seem to be a region
of strong convergence, despite the fact that it is mostly
formed of down-flows.
This region has a mean FeII 722.4 nm Doppler velocity
value Vc ∼-100 m·s
−1 for the whole dataset time span (T
≃ 0.5 hour), and it seems to mantain similar values also for
the part of the observations discarded for the loss of spatial
resolution due to worsening seeing condition. A similar
cluster of bright CaII structures is present in the upper-left
part of the FoV, but its associated downflow shows a
much smaller coherence: it has a mean FeII 722.4 nm
Doppler velocity value Vc ∼-50 m·s
−1 for more than half
Del Moro et al.: 3D photospheric velocity field of a SG cell 5
of the time span, then it drops to ∼-20 m·s−1. Whether
or not downflow regions like these may be organizing the
SG pattern, as predicted by Rast (2003), is a question
we cannot address due to the short duration of our dataset.
4. Horizontal Flow Analysis via Cork Tracking
To further extract information about the plasma motion
inside the SG structure, we tracked the evolution of tracers
(corks) passively advected by instantaneous velocity and
intensity fields. The corks, initially randomly spread over
the FoV, are moved following the local gradient towards
sites of minimum intensity or of minimum velocity in the
case of intensity or velocity fields, respectively. We will com-
pare the final and initial positions of the corks, which will
give us information about the motion of downflows in the
field of view. Corks are tracked for ∼ 16 minutes (a time
sufficiently longer than the characteristic time scale of pho-
tospheric fields (Müller et al., 2001; Berrilli et al., 2002) to
let the cork settle in a downflow feature and track it for
a while) and their initial and final position are stored. As
corks tend to accumulate in long lasting downflow struc-
tures, new corks are added each ∼ 5 minutes in order to
also track structures forming during the observations. In
Fig. 6 we report the result of the cork tracing for a contin-
uum image series and for both the FeII 722.4 nm and the
FeI 709.0 nm Doppler fields. In particular, we plotted the
difference between the final and initial distances from the
image centre of the corks versus their initial distances from
the image centre. The alignment effect of the scatter plot is
due to an inverse linear relationship between the ρstart and
the δρ of corks with different initial postions which end in
the same ‘attractor’ and therefore share their final position.
As the image is centred on the SG structure, this will give us
information about a possible difference of mean flows inside
and outside the SG. In order to investigate the properties
of the distribution, we fit on the scatter plots a sigmoidal
function:
A1 −A2
1 + e(x−x0)/dx
+A2 (1)
so that x0 will tell where the transition between the two
values A1 and A2 of the distribution takes place and dx
will tell how fast this transition is. The parameters of the
fit are retrieved by a recursive Levenberg-Marquardt min-
imization algorithm. The fits agree in retrieving positive
values of A1 and near zero values for A2 (Table 2). This
means that the corks inside a circle of radius x0 from the
image centre tend to increase their radial distance, while
the corks outside have no preferred direction in their mo-
tion. Several simulations on randomly generated velocity
fields showed that we can neglect the contribute from
corks whose initial position is so near to the image centre
that they are biased towards positive radial displacement.
Finally, we tested the robustness of these results against
the initial guesses and against the SG centre position in
the FoV. The retrieved parameters do not depend on these
factors, as long as the initial guesses are of the same order
of magnitude of the convergence values or the FoV shift is
less than 2.5 Mm.
In Fig. 4 we show the mean intensity fields associated to
the plots in Fig. 6, with superimposed the location of the
change of the A value represented as an annulus of mean
Fig. 6. Cork displacements versus initial positions. Top
Panel: results of the cork tracking for the intensity field
from continuum images. Central Panel: results of the cork
tracking for the vertical velocity field extracted from FeII
722.4 nm. Bottom Panel: results of the cork tracking for
the vertical velocity field extracted from FeI 709.0 nm. Each
scatter plot has been fitted with a sigmoidal function (equa-
tion 1). The retrieved fits are overplotted on the relative
scatter plots.
radius x0 and thickness 2dx. The three annular shapes
essentially agree in retrieving the same SG diameter of
∼ 25 Mm.
The width of the annuli, instead, seems to depend on
the atmospheric altitude. In the upper panel of Fig. 8
we report the value of dx computed by the sigmoidal fits
versus the photospheric height. Error bars represent the
standard deviation from the fit.
Recently, Berrilli et al. (2002) found a similar height
dependence of the statistical properties of granular
flows. In particular, they reported an intense braking
in the first ∼ 120 km of the photosphere, confirmed by
Puschmann et al. (2005) and a damping effect that filtered
6 Del Moro et al.: 3D photospheric velocity field of a SG cell
A1 A2 x0 dx
[Mm] [Mm] [Mm] [Mm]
Continuum Intensity 0.49 ± 0.02 0.08 ± 0.01 11.3± 0.2 0.3± 0.2
FeII 722.4 LoS Velocity 0.61 ± 0.03 −0.09± 0.02 12.0± 0.2 1.1± 0.2
FeI 709.0 LoS Velocity 0.67 ± 0.08 −0.09± 0.03 10.9± 0.7 2.9± 0.6
Table 2. Parameters of the sigmoidal fits to the scatter plots reported in Fig. 6.
out small features in higher atmospheric layers, letting only
large flow features penetrate into the upper photosphere.
The same process can explain the broadening of the SG
border we found: in higher layers the corks are collected in
larger and fewer downflow structures. As more corks are
collected by the same structures, the number of indepen-
dent tracers is decreased and similarly the precision of the
retrieval of the boundary is decreased.
Instead, we can exclude that such a smoothing effect is
due to data reduction or seeing, because in that case the
SG border retrieved from the FeI 709.0 nm LoS field would
have been thinner than the one retrieved from the FeII
722.4 nm LoS field, as the latter shows lower contrast
features, more prone to be degraded by the loss of spatial
resolution.
Due to the form of equation 1, the difference A1 − A2,
divided for the time allotted to the corks to move, will
give the mean radial velocity experienced by the corks.
We plot in the bottom panel of Fig. 8 the radial velocity
retrieved from the three scatter plots as a function of the
photospheric altitude. Error bars represent the standard
deviation from the fit.
To account for these results, we assume that the large and
more coherent features present in the LoS dopplergrams
are reliable to retrieve the radial velocity measure, while
the measure from the continuum images is somewhat
reduced by the presence of tiny structures which are more
turbulent in their motion. Such structures are not present
in the higher layers dopplergrams because of the damping
effect already discussed.
We therefore discard the value obtained from the WL
dataset because it is probably smeared by the turbulent
motions of very small scale features and take into account
only the two values retrieved from the higher layers,
retrieving a mean velocity of 0.75± 0.05 km s−1.
Such a value for the flows from the SG structure centre
is consistent with the literature (Simon & Leighton,
1964; Hathaway et al., 2002; Paniveni et al., 2004;
Meunier et al., 2007).
5. Conclusions
The study of the full 3D velocity field of a SG shows that
strong downflows are located on the border of the super-
granular structure, but also that the mean granular flow
regresses from the centre to the periphery of the SG.
The divergence images show that the SG structure is out-
lined by convergence sites on ∼ 66% of its border. The
retrieved divergence values show a nearly radial flow of
∼ 0.1 km s−1 Mm−1 from the centre of the SG and con-
vergent flows of the same magnitude in its border.
The analysis of the evolution of passive tracers on inten-
sity and velocity fields shows that inside the SG structure
there is a preferential radial flow towards the SG border of
0.75± 0.05 km s−1.
The height behaviour of the thickness of the SG border,
again retrieved via cork tracing, shows an increase of the
border width with height. This is probably due to a filter-
ing effect with height, which preferentially allows large flow
features to penetrate into the upper photospheric layers.
The large and CaII bright cluster of structures in the lower
part of the FoV, is not a site of strong convergence, but
is a site of long-lasting downflows. We also found a strong
and stable upflow nearby the centre of the cell, liable to
organize the CaII bright structures by sweeping them out
of the SG cell.
The result presented in this paper are extracted from a sub-
set of a longer timeseries of excellent spectral and temporal
resolution, but varying spatial quality due to seeing. The
used 30 min subset is characterized by a constant and good
spatial resolution. This allowed us to detect precisely the
flow associated with the SG. Anyhow, our analysis would
have greatly benefited from a longer time sequence and
other SG structures to analyze. In the future, we plan to
apply this analysis to a collection of SG structures, so as
to derive some statistical describer and possibly generalize
the results.
Acknowledgements. We thank the referee, T. Roudier, for suggestions
and comments that have signicantly improved this paper. Part of
this work was supported by Rome “Tor Vergata” University Physics
Department grants. The data were acquired by instruments operated
by the National Solar Observatory. The National Solar Observatory
is a Division of the National Optical Astronomy Observatories,
which is operated by the Association of Universities for Research
in Astronomy, Inc., under cooperative agreement with the National
Science Foundation. DDM thanks the High Altitude Observatory for
support and C. Sormani for helpful comments. The authors aknowl-
edge k. Janssen for the development of the IBIS data reduction
pipeline, V. Penza for the calculation of the line RFs and M. Rast
for very useful discussions and comments.
References
Behan, A. 2000, Proceedings of the 19th ISPRS Congress and
Exhibition - Geoinformation for All. Amsterdam, The Netherlands,
16th - 23rd July 2000.
Beck, J. G., Duvall, T. L., Jr. 2000, BAAS, 32, 802
Berrilli, F., Consolini, G., Pietropaolo, E., Caccin, B., Penza, V.,
Lepreti, F. 2002, A&A, 381, 253
Berrilli, F., Del Moro, D., Consolini, G., Pietropaolo, E., Duvall, T.
L., Jr., Kosovichev, A. G. 2004, Sol. Phys., 221, 33
Berrilli, F., Del Moro, D., Russo, S., Consolini, G., Straus, Th. 2005,
ApJ, 632, 677
Caccin, B., Gomez, M. T., Marmolino, C. & Severino, G., 1977, A&A,
54, 227
Cavallini, F., Berrilli, F., Cantarano, S., Egidi, A. 2001, Mem. SaIt,
72, 554
Cavallini, F. 2006, Sol. Phys., 236, 415
Del Moro, D., Berrilli, F., Duvall, T. L., Jr., Kosovichev, A. G. 2004,
Sol. Phys., 221, 23
Del Moro, D. 2004, A&A, 428, 1007
Del Moro, D. 2005, PhD Thesis
DeRosa, M. L., Toomre, J. 2004, ApJ, 616, 1242
Del Moro et al.: 3D photospheric velocity field of a SG cell 7
Deubner, F.-L. 1971, Sol. Phys., 17, 6
Frazier, E.N. 1970, Sol. Phys., 14, 89
Giordano, S. Del Moro, D. Berrilli, F. 2007, submitted
Hathaway, D. H., Beck, J. G., Han, S., Raymond, J. 2002, Sol. Phys.,
205, 25
Janssen, K., Cauzzi, G. 2006, A&A, 450, 365
Krishan, V., Paniveni, U., Singh, Jagdev, Srikanth, R. 2002, MNRAS,
334, 230
Leighton, R. B. 1964, ApJ, 140, 1547
Meunier, N., Tkaczuk, R., Roudier, Th., Rieutord, M. 2007, A&A,
461, 1141
Müller, D.A.N., Steiner, O., Schlichenmaier, R., Brandt, P.N. 2001,
Sol. Phys., 203, 211
Musman, S., Rust, D.S. 1970, Sol. Phys., 13, 261
Paniveni, U., Krishan, V., Singh, Jagdev, Srikanth, R. 2004, MNRAS,
347, 1279
Puschmann, K. G., Ruiz Cobo, B., Vzquez, M., Bonet, J. A.,
Hanslmeier, A. 2005, A&A, 441, 1157
Qu, Z. & Xu, Z.,
2002, Chin. J. Astron. Astrophys., 2, 71
Rast, M. P., 2003, ApJ, 597, 1200
Raju, K. P., Srikanth, R., Singh, Jagdev 1999, BASI, 27, 65
Rieutord, M., Roudier, Th., Malherbe, J.M., Rincon, F. 2000, A&A,
357, 1063
Rieutord, M., Roudier, Th., Ludwig, H. G., Nordlund, Å., Stein, R.
2001, A&A, 377, L14
Rimmele, T., Schroeter, E. H. 1989, A&A, 221, 137
Roudier, Th., Lignieres, F., Rieutord, M., Brandt, P.N., Malherbe,
J.M. 2003, A&A, 409, 301
Schrijver, C. J., Hagenaar, H. J., Title, A. M. 1997, ApJ, 475, 328
Shine, R. A., Simon, G. W., Hurlburt, N. E. 2000, Soph, 193, 313
Simon, G. W., Leighton, R. B. 1964, ApJ, 140, 1120
Zwaan, C. 1978, Sol. Phys., 60, 213
Fig. 7. Mean images with superimposed the SG dimen-
sion extracted from the cork tracking. Top Panel: mean FeI
709.0 nm Doppler image with SG extracted from from FeI
709.0 nm Dopplergrams (∼ 140 km). Central Panel: mean
FeII 722.4 nm Doppler image with SG extracted from from
FeII 722.4 nm Dopplergrams (∼ 70 km). Bottom Panel:
mean CaII 854.2 nm wing image with SG extracted from
the intensity continuum images (∼ 0 km).
8 Del Moro et al.: 3D photospheric velocity field of a SG cell
-20 0 20 40 60 80 100 120 140 160 180 200
Photospheric Altitude (km)
-20 0 20 40 60 80 100 120 140 160 180 200
Photospheric Altitude [km]
Fig. 8. Top panel: dx (annulus width) versus photospheric
height. Bottom panel: radial velocity versus photospheric
height.
	Introduction
	Observations
	3D Velocity Field
	Horizontal Flow Analysis via Cork Tracking
	Conclusions
ABSTRACT
  We investigate the plasma flow properties inside a Supergranular (SG) cell,
in particular its interaction with small scale magnetic field structures. The
SG cell has been identified using the magnetic network (CaII wing brightness)
as proxy, applying the Two-Level Structure Tracking (TST) to high spatial,
spectral and temporal resolution observations obtained by IBIS. The full 3D
velocity vector field for the SG has been reconstructed at two different
photospheric heights. In order to strengthen our findings, we also computed the
mean radial flow of the SG by means of cork tracing. We also studied the
behaviour of the horizontal and Line of Sight plasma flow cospatial with
cluster of bright CaII structures of magnetic origin to better understand the
interaction between photospheric convection and small scale magnetic features.
The SG cell we investigated seems to be organized with an almost radial flow
from its centre to the border. The large scale divergence structure is probably
created by a compact region of constant up-flow close to the cell centre. On
the edge of the SG, isolated regions of strong convergent flow are nearby or
cospatial with extended clusters of bright CaII wing features forming the knots
of the magnetic network.

<|endoftext|><|startoftext|>
Introduction
According to the current cosmological paradigm, large struc-
tures in the Universe form hierarchically. Clusters of galaxies
are the largest structures that have grown through mergers of
smaller units and have achieved near dynamical equilibrium. In
the hierarchical scenario, clusters are a rather young population,
and we should be able to observe their formation process even at
rather low redshifts. A signature of such process is the presence
of cluster substructures. A cluster is said to contain substruc-
tures (or subclusters) when its surface density is characterized
by multiple, statistically significant peaks on scales larger than
the typical galaxy size, with “surface density” being referred to
the cluster galaxies, the intra-cluster (IC) gas or the dark matter
(DM hereafter; Buote 2002).
Studying cluster substructure therefore allows us to investi-
gate the process by which clusters form, constrain the cosmo-
logical model of structure formation, and ultimately test the hi-
erarchical paradigm itself (e.g. Richstone et al. 1992; Mohr et al.
1995; Thomas et al. 1998). In addition, it also allows us to better
understand the mechanisms affecting galaxy evolution in clus-
ters, which can be accelerated by the perturbative effects of a
cluster-subcluster collision and of the tidal field experienced by
Send offprint requests to: Massimo Ramella, ramella@oats.inaf.it
⋆ Figure 6 is only available in electronic form via
http://www.edpsciences.org
a group accreting onto a cluster (Bekki 1999; Dubinski 1999;
Gnedin 1999). If clusters are to be used as cosmological tools,
it is important to calibrate the effects substructures have on
the estimate of their internal properties (e.g. Schindler & Müller
1993; Pinkney et al. 1996; Roettiger et al. 1998; Biviano et al.
2006; Lopes et al. 2006). Finally, detailed analyses of clus-
ter substructures can be used to constrain the nature of DM
(Markevitch et al. 2004; Clowe et al. 2006).
The analysis of cluster substructures can be performed us-
ing the projected phase-space distribution of cluster galaxies
(e.g. Geller & Beers 1982), the surface-brightness distribution
and temperature of the X-ray emitting IC gas (e.g. Briel et al.
1992), or the shear pattern in the background galaxy distribu-
tion induced by gravitational lensing, that directly samples sub-
structure in the DM component (e.g. Abdelsalam et al. 1998).
None of these tracers of cluster substructure (cluster galaxies, IC
gas, background galaxies) can be considered optimal. The iden-
tification of substructures is in fact subject to different biases
depending on the tracer used. In X-rays projection effects are
less important than in the optical, but the identification of sub-
structures is more subject to a z-dependent bias, arising from the
point spread function of the X-ray telescope and detector (e.g.
Böhringer & Schuecker 2002). Moreover, the different cluster
components respond in a different way to a cluster-subcluster
collision. The subcluster IC gas can be ram-pressure braked and
stripped from the colliding subcluster and lags behind the sub-
cluster galaxies and DM along the direction of collision (e.g.
http://arxiv.org/abs/0704.0579v1
http://www.edpsciences.org
2 M. Ramella et al.: Substructures in the WINGS clusters
Roettiger et al. 1997; Barrena et al. 2002; Clowe et al. 2006).
Hence, it is equally useful to address cluster substructure analy-
sis in the X-ray and in the optical.
Traditionally, the first detections of cluster substructures
were obtained from the projected spatial distributions of galax-
ies (e.g. Shane & Wirtanen 1954; Abell et al. 1964), in com-
bination, when possible, with the distribution of galaxy ve-
locities (e.g. van den Bergh 1960, 1961; de Vaucouleurs 1961).
Increasingly sophisticated techniques for the detection and
characterization of cluster substructures have been developed
over the years (see Moles et al. 1986; Perea et al. 1986a,b;
Buote 2002; Girardi & Biviano 2002, and references therein).
In many of these techniques substructures are identified as de-
viations from symmetry in the spatial and/or velocity distri-
bution of galaxies and in the X-ray surface-brightness (e.g.
West et al. 1988; Fitchett & Merritt 1988; Mohr et al. 1993;
Schuecker et al. 2001). In other techniques substructures are
identified as significant peaks in the surface density distribu-
tion of galaxies or in the X-ray surface brightness, either as
residuals left after the subtraction of a smooth, regular model
representation of the cluster (e.g. Neumann & Böhringer 1997;
Ettori et al. 1998), or in a non-parametric way, e.g. by the tech-
nique of wavelets (e.g. Escalera et al. 1994; Slezak et al. 1994;
Biviano et al. 1996) and by adaptive-kernel techniques (e.g.
Kriessler & Beers 1997; Bardelli et al. 1998a, 2001).
The performances of several different methods have been
evaluated both using numerical simulations (e.g. Mohr et al.
1995; Crone et al. 1996; Pinkney et al. 1996; Buote & Xu
1997; Cen 1997; Valdarnini et al. 1999; Knebe & Müller 2000;
Biviano et al. 2006) and also by applying different methods
to the same cluster data-sets and examine the result differ-
ences (e.g. Escalera et al. 1992, 1994; Mohr et al. 1995, 1996;
Kriessler & Beers 1997; Fadda et al. 1998; Kolokotronis et al.
2001; Schuecker et al. 2001; Lopes et al. 2006). Generally
speaking, the sensitivity of substructure detection increases with
both increasing statistics (e.g. more galaxies or more X-ray pho-
tons) and increasing dimensionality of the test (e.g. using galaxy
velocities in addition to their positions, or using X-ray tempera-
ture in addition to X-ray surface brightness).
Previous investigations have found very different fractions
of clusters with substructure in nearby clusters, depending
on the method and tracer used for substructure detection, on
the cluster sample, and on the size of sampled cluster re-
gions (e.g. Geller & Beers 1982; Dressler & Shectman 1988;
Mohr et al. 1995; Girardi et al. 1997; Kriessler & Beers 1997;
Jones & Forman 1999; Solanes et al. 1999; Kolokotronis et al.
2001; Schuecker et al. 2001; Flin & Krywult 2006; Lopes et al.
2006). Although the distribution of subcluster masses has not
been determined observationally, it is known that subclusters
of ∼ 10% the cluster mass are typical, while more massive
subclusters are less frequent (Escalera et al. 1994; Girardi et al.
1997; Jones & Forman 1999). The situation is probably dif-
ferent for distant clusters which tend to show massive sub-
structures more often than nearby clusters clearly suggesting
hierarchical growth of clusters was more intense in the past
(e.g. Gioia et al. 1999; van Dokkum et al. 2000; Haines et al.
2001; Maughan et al. 2003; Huo et al. 2004; Rosati et al. 2004;
Demarco et al. 2005; Jeltema et al. 2005).
Additional evidence for the hierarchical formation of clus-
ters is provided by the analysis of brightest cluster galaxies
(BCGs hereafter) in substructured clusters. BCGs usually sit
at the bottom of the potential well of their host cluster (e.g.
Adami et al. 1998b). When a BCG is found to be significantly
displaced from its cluster dynamical center, the cluster displays
evidence of substructure (e.g. Beers et al. 1991; Ferrari et al.
2006). From the correlation between cluster and BCG luminosi-
ties, Lin & Mohr (2004) conclude that BCGs grow by merg-
ing as their host clusters grow hierarchically. The related evo-
lution of BCGs and their host clusters is also suggested by the
alignement of the main cluster and BCG axes (e.g. Binggeli
1982; Durret et al. 1998). Both the BCG and the cluster axes
are aligned with the surrounding large scale structure dis-
tribution, where infalling groups come from. These infalling
groups are finally identified as substructures once they enter
the cluster environment (Durret et al. 1998; Arnaud et al. 2000;
West & Blakeslee 2000; Ferrari et al. 2003; Plionis et al. 2003;
Adami et al. 2005). Hence, substructure studies really provide
direct evidence for the hierarchical formation of clusters.
Concerning the impact of subclustering on global cluster
properties, it has been found that subclustering leads to over-
estimating cluster velocity dispersions and virial masses (e.g.
Perea et al. 1990; Bird 1995; Maurogordato et al. 2000), but not
in the general case of small substructures (Escalera et al. 1994;
Girardi et al. 1997; Xu et al. 2000). During the collision of a
subcluster with the main cluster, both the X-ray emitting gas
distribution and its temperature have been found to be signifi-
cantly affected (e.g. Markevitch & Vikhlinin 2001; Clowe et al.
2006). As a consequence, it has been argued that substruc-
ture can explain at least part of the scatter in the scaling rela-
tions of optical-to-X-ray cluster properties (e.g. Fitchett 1988;
Girardi et al. 1996; Barrena et al. 2002; Lopes et al. 2006).
As far as the internal properties of cluster galaxies are
concerned, there is observational evidence that a higher frac-
tion of cluster galaxies with spectral features characteristic
of recent or ongoing starburst episodes is located in sub-
structures or in the regions of cluster-subcluster interactions
(Caldwell et al. 1993; Abraham et al. 1996; Biviano et al. 1997;
Caldwell & Rose 1997; Bardelli et al. 1998b; Moss & Whittle
2000; Miller et al. 2004; Poggianti et al. 2004; Miller 2005;
Giacintucci et al. 2006).
In this paper we search for and characterize substructures
in the sample of 77 nearby clusters of the WIde-field Nearby
Galaxy-cluster Survey (WINGS hereafter, Fasano et al. 2006).
This sample is an almost complete sample in X-ray flux in the
redshift range 0.04 < z < 0.07. We detect substructures from the
spatial, projected distribution of galaxies in the cluster fields, us-
ing the adaptive-kernel based DEDICA algorithm (Pisani 1993,
1996). In Sect. 2 we describe our data-set; in Sect. 3 we de-
scribe the procedure of substructure identification; in Sect. 4 we
use Monte Carlo simulations in order to tweak our procedure;
in Sect. 5 we describe the identification of substructures in our
data-set; in Sect. 6 the catalog of identified substructures is pro-
vided. In Sect. 7 we investigate the properties of the identified
substructures, and in Sect. 8 we consider the relation between
the BCGs and the substructures. We provide a summary of our
work in Sect. 9.
2. The Data
WINGS is an all-sky, photometric (multi-band) and spectro-
scopic survey, whose global goal is the systematic study of the
local cosmic variance of the cluster population and of the prop-
erties of cluster galaxies as a function of cluster properties and
local environment.
The WINGS sample consists of 77 clusters selected from
three X-ray flux limited samples compiled from ROSAT All-
Sky Survey data, with constraints just on the redshift (0.04 < z <
0.07) and distance from the galactic plane (|b| ≥20 deg). The core
M. Ramella et al.: Substructures in the WINGS clusters 3
of the project consists of wide-field optical imaging of the se-
lected clusters in the B and V bands. The imaging data were col-
lected using the WFC@INT (La Palma) and the WFI@MPG (La
Silla) in the northern and southern hemispheres, respectively.
The observation strategy of the survey favors the uniformity
of photometric depth inside the different CCDs, rather than com-
plete coverage of the fields that would require dithering. Thus,
the gaps in the WINGS optical imaging correspond to the phys-
ical gaps between the different CCDs of the mosaics.
During the data reduction process, we give particular care to
sky subtraction (also in presence of crowded fields including big
halo galaxies and/or very bright stars), image cleaning (spikes
and bad pixels) and star/galaxy classification (obtained with both
automatic and interactive tools).
According to Fasano et al. (2006) and Varela et al. (2007),
the overall quality of the data reported in the WINGS photomet-
ric catalogs can be summarized as follows: (i) the astrometric
errors for extended objects have r.m.s. ∼0.2 arcsec; (ii) the av-
erage limiting magnitude is ∼24.0, ranging from 23.0 to 25.0;
(iii) the completeness of the catalogs is achieved (on average) up
to V ∼22.0; (iv) the total (systematic plus random) photometric
r.m.s. errors, derived from both internal and external compar-
isons, vary from ∼0.02 mag, for bright objects, up to ∼0.2 mag,
for objects close to the detection limit.
3. The DEDICA Procedure
We base our search for substructures in WINGS clusters on the
DEDICA procedure (Pisani 1993, 1996). This procedure has the
following advantages:
1. DEDICA gives a total description of the clustering pattern,
in particular the membership probability and significance of
structures besides geometrical properties;
2. DEDICA is scale invariant;
3. DEDICA does not assume any property of the clusters, i.e.
it is completely non-parametric. In particular it does not re-
quire particularly rich samples to run effectively.
The basic nature and properties of DEDICA are described in
Pisani (1993, 1996, and references therein). Here we summarize
the main structure of the algorithm and how we apply it to our
data sample. The core structure of DEDICA is based on the as-
sumption that a structure (or a “cluster” in the algorithm jargon)
corresponds to a local maximum in the density of galaxies.
We proceed as follows. First we need to estimate the proba-
bility density function Ψ(ri) (with i = 1, . . .N) associated with
the set of N galaxies with coordinates ri. Second, we need to
find the local maxima in our estimate of Ψ(ri) in order to iden-
tify clusters and also to evaluate their significance relatively to
the noise. Third and finally, we need to estimate the probability
that a galaxy is a member of the identified clusters.
3.1. The probability density
DEDICA is a non-parametric method in the sense that it does
not require any assumption on the probability density function
that it is aimed to estimate. The only assumptions are that Ψ(ri)
must be continuous and at least twice differentiable.
The function f (ri) is an estimate of Ψ(ri) and it is built by
using an adaptive kernel method given by:
fka(r) =
K(ri, σi; r) (1)
where we use the two dimensional Gaussian kernel K(ri, σi; r)
centered in ri with size σi.
The most valuable feature of DEDICA is the procedure to se-
lect the values of kernel widths σi. It is possible to show that the
optimal choice for σi, i.e. with asymptotically minimum vari-
ance and null bias, is obtained by minimizing the distance be-
tween our estimate f (ri) and Ψ(ri). This distance can be eval-
uated by a particular function called the integrated square error
IS E( f ) given by:
IS E( f ) =
[Ψ(r) − f (r)]2dr (2)
Once the minimum IS E( f ) is reached we have obtained the
DEDICA estimate of the density as in Eq.1.
3.2. Cluster Identification
The second step of DEDICA consists in the identification of the
local maxima in fka(r). The positions of the peaks in the density
function fka(r) are found as the solutions of the iterative equa-
tion:
rm+1 = rm + a ·
∇ fka(rm)
fka(rm)
where a is a scale factor set according to optimal convergence
requirements. The limit R of the sequence rm defined in Eq.3
depends on the starting position rm=1.
rm = R(rm=1) (4)
We run the sequence in Eq.3 at each data position ri. We label
each data point with the limit Ri = R(rm=1 = ri). These limits
Ri are the position of the peak to which the i− th galaxy belong.
In the case that all the galaxies are members of a unique cluster,
all the labels Ri are the same. At the other extreme each galaxy
is a one-member cluster and all Ri have different values. All the
members of a given cluster belong to the same peak in fka(r)
and have the same Ri. We identify cluster members by listing
galaxies having the same values of R. We end up with ν different
clusters each with nµ (µ = 1, . . . , ν) members.
In order to maintain a coherent notation, we identify with
the label µ = 0 the n0 isolated galaxies considered a system of
background galaxies. We have: n0 = N −
µ=1 nµ.
3.3. Cluster Significance and of Membership Probability
The statistical significance S µ (µ = 1, . . . , ν) of each cluster is
based on the assumption that the presence of the µ − th cluster
causes an increase in the local probability density as well as in
the sample likelihood LN = Πi[ fka(ri)] relatively to the value Lµ
that one would have if the members of the µ− th cluster were all
isolated, i.e. belonging to the background.
A large value in the ratio LN/Lµ characterizes the most im-
portant clusters. According to Materne (1979) it is possible to
estimate the significance of each cluster by using the likelihood
ratio test. In other words 2 ln(LN/Lµ) is distributed as a χ
2 vari-
able with ν− 1 degrees of freedom. Therefore, once we compute
the value of χ2 for each cluster (χ2S ), we can also compute the
significance S µ of the cluster.
Here we assume that the contribution to the global density
field fka(ri) of the µ − th cluster is Fµ(ri). The ratio between the
value of Fµ(ri) and the total local density fka(ri) can be used to
estimate the membership probability of each galaxy relatively to
4 M. Ramella et al.: Substructures in the WINGS clusters
the identified clusters. This criterion also allows us to estimate
the probability that a galaxy is isolated.
At the end of the DEDICA procedure we are left with a)
a catalog of galaxies each with information on position, mem-
bership, local density and size of the Gaussian kernel, b) a cat-
alog of structures with information on position, richness, the
χ2S parameter, and peak density. For each cluster we also com-
pute from the coordinate variance matrix the cluster major axis,
ellipticity and position angle.
4. Tweaking the Algorithm with Simulations
In this section we describe our analysis of the performance of
DEDICA and the guidelines we obtain for the interpretation of
the clustering analysis of our observations.
We build simulated fields containing a cluster with and with-
out subclusters. The simulated fields have the same geometry
of the WFC field and are populated with the typical number of
objects we will analyze. For simplicity we consider only WFC
fields. Because DEDICA is scale-free, a different sampling of
the same field of view has no consequence on our analysis.
In the next section we limit our analysis to MV,lim ≤ −16.
At the median redshift of the WINGS cluster, z ≃ 0.05, this
absolute magnitude limit corresponds to an apparent magnitude
Vlim ≃ 21. Within this magnitude limit the representative number
of galaxies in our frames is Ntot= 900.
We then consider Ntot= Nmem+ Nbkg, with Nmemthe number
of cluster members and Nbkgthe number of field – or background
– galaxies. We set Nbkg= 670, close to the average number of
background galaxies we expect in our frame based on typical
observed fields counts, e.g. Berta et al. (2006) or Arnouts et al.
(1997). With this choice, we have Nmem= 230.
We distribute uniformly at random Nbkgobjects. We dis-
tribute at random the remaining Nmem= 230 objects in one or
more overdensities depending on the test we perform. We popu-
late overdensities according to a King profile (King 1962) with
a core radius Rcore = 90 kpc, representative of our clusters. We
then scale Rcore with the number of members of the substructure,
NS . We use
Rcore = 250
NC + NS
where NC is the number of objects in the cluster with Nmem=
NS + NC . This scaling of Rcore with cluster richness is from
Adami et al. (1998a) assuming direct proportionality between
cluster richness and luminosity (e.g. Popesso et al. 2006).
As far as the relative richnesses of the cluster and subcluster
are concerned, we consider the following richness ratios rcs =
NC/NS = 1, 2, 4, 8. With these richness ratios, the number of
objects in the cluster are NC = 115, 153, 184, 204, and those in
subclusters are NS = 115, 77, 46, 26 respectively.
In a first set of simulated fields we place the substructure at
2731 pixels (15 arcmin) from the main cluster so that they do not
overlap. In a second set of simulations, we place main cluster and
substructure at shorter distances, 683 and 1366 pixels, in order to
investigate the ability of DEDICA to resolve structures. At each
of these shorter distances we build simulations with both rcs= 1
and 2.
For each richness ratio and/or distance between cluster and
subcluster we produce 16 simulations with different realizations
of the random positions of the data points representing galaxies.
In order to minimize the effect of the borders on the detection
of structures we add to the simulation a “frame” of 1000 pixel.
5 10 15
simulation
Fig. 1. Fraction of recovered members of each substructure for
different rcs. The solid line connects substructures with rcs = 2
and 4
We fill this frame with a grid of data points at the same density
as the average density of the field.
The first result we obtain from the runs of DEDICA on the
simulations with varying richness ratio is the positive rate at
which we detect real structures. We find that we always recover
both cluster and substructure even when the substructure only
contains NS = 1/8 NC objects, i.e. 26 objects (on top of the uni-
form background). In other words, if there is a real structure
DEDICA finds it.
We also check how many original members the procedure
assigns to structures it recovers. The results are summarized in
Fig. 1. In the diagram, the fraction of recovered members of each
substructure is represented by the values of its rcs. The solid line
connects substructures with rcs = 2 and 4.
From Fig. 1 it is clear that our procedure recovers a large
fraction of members, almost irrespective of the richness of the
original structure. It is also interesting to note that the fluctu-
ations identified as substructures are located very close to the
center of the corresponding simulated substructures. In almost
all cases the distance between original and detected substructure
is significantly shorter than the mean inter-particle distance.
The second important result we obtain from the simulations
is the false positive rate, i.e. the fraction of noise fluctuations
that are as significant as the fluctuations corresponding to real
structures.
First of all we need to define an operative measure of the
reliability of the detected structures. In fact DEDICA provides a
default value S µ (µ = 1, . . . , ν) of the significance (see Sect. 3.3).
However, S µ has a relatively small dynamical range, in particular
for highly significant clusters.
Density or richness both allow a reasonable ”ranking” of
structures. However, both large low-density noise fluctuations
(often built up from more than one noise fluctuation) and very
high density fluctuations produced by few very close data points
could be mistakenly ranked as highly significant structures ac-
cording to, respectively, richness and density criteria.
M. Ramella et al.: Substructures in the WINGS clusters 5
0 20 40 60 80
Fig. 2. χ2S of simulated noise fluctuations (solid line). Labels are
the rcsof simulated structures at the abscissa corresponding to
their χ2S and at arbitrary ordinates.
We therefore prefer to use the parameter χ2S which stands at
the base of the estimate of S µ and which is naturally provided by
DEDICA. The main characteristic of χ2S is that it depends both on
the density of a cluster relative to the background and on its rich-
ness. Using χ2S we classify correctly significantly more structures
than with either density or richness alone.
In Fig. 2 we plot the distribution of χ2S of noise fluctuations
(solid line). In the same plot we also mark the rcsof real struc-
tures as detected by our procedure. We use labels indicating
rcsand place them at the abscissa corresponding to their χ
S and
at arbitrary ordinates.
Fig. 2 shows that the structures detected with rcs= 1, 2 are al-
ways distinguishable from noise fluctuations. Substructures with
rcs= 4 or higher, although correctly detected, have χ
S values that
are close to or lower than the level of noise.
With the second set of simulations, we test the minimum dis-
tance at which cluster and subcluster can still be identified as
separate entities. We place cluster and substructure (rcs= 1, 2) at
distances dcs = 683 and 1366 pixel. These distances are 1/4 and
1/2 respectively of the distance between cluster and substructure
in the first set of simulations. Again we produce 16 simulations
for each of the 4 cases.
We find that at dcs = 1366 pixel cluster and substructure are
always correctly identified. At the shorter distance dcs = 683
pixel, DEDICA merges cluster and substructure in 1 out of 16
cases for rcs= 1 and in 8 out of 16 cases for rcs= 2. With our
density profile, dcs = 683 pixel corresponds to dcs ≃ Rc + Rs
with Rc, Rs the radii of the main cluster and of the subcluster
respectively.
In order to verify the results we obtain for 900 data points
we produce more simulations with Ntot= 450, 600 and 1200. In
all these simulations RC and RS are the same as in the set with
Ntot= 900. We vary Nbkgand Nmemso that Nmem/ Nbkgis the same
as in the case Ntot= 900.
0.5 1 1.5 2 2.5
Fig. 3. Small symbols correspond to χ2S as a function of the num-
ber of members of noise fluctuations. Crosses, circles, dots and
triangles are χ2S for the noise fluctuations of the simulations with
Ntot= 450, 600, 900, and 1200 respectively. Large symbols are
χ2S of simulated clusters and subclusters with rcs= 1. Horizontal
lines mark the levels of χ2S ,threshold.
These simulations confirm the results we obtain in the
case Ntot= 900, and allow us to set a detection threshold,
χ2S ,threshold(Ntot), for significant fluctuations in the analysis of real
clusters.
We summarize the behavior of the noise fluctuations in our
simulations in Fig. 3. In this figure, the small symbols corre-
spond to χ2S as a function of the number of members of noise
fluctuations. In particular, crosses, circles, dots and triangles are
χ2S for the noise fluctuations of the simulations with Ntot= 450,
600, 900, and 1200 respectively.
The larger symbols are the χ2S of the fluctuations correspond-
ing to simulated clusters and subclusters of equal richness (rcs=
The 4 horizontal lines mark the level of χ2S ,threshold, i.e. the
average χ2S of the 3 most significant noise fluctuations in each of
the 4 groups of simulations with Ntot= 450, 600, 900, and 1200.
The expected increase of χ2S ,thresholdwith Ntotis evident.
We note that the only significant difference with these find-
ings we obtain from the simulations with rcs= 2 is that χ
simulated clusters and subclusters is closer to χ2S ,threshold(but still
higher).
We fit χ2S ,thresholdwith Ntotand obtain
log(χ2S ,threshold) = 1/2.55 log(Ntot) + 0.394 (5)
in good agreement with the expected behavior of the poissonian
fluctuations.
As a final test we verify that infra-chip gaps do not have a
dramatic impact on the detection of structures in the cases rcs= 1
and 2. We place a 50 pixel wide gap where it has the maximum
impact, i.e. where the kernel size is shortest. Even if the infra-
6 M. Ramella et al.: Substructures in the WINGS clusters
chip gap cuts through the center of the structures, DEDICA is
able to identify these structures correctly.
We summarize here the main results of our tests on simulated
clusters with substructures:
– DEDICA successfully detects even the poorest structures
above a uniform poissonian noise background.
– DEDICA recovers a large fraction (typically > 3/4) of the
real members of a substructure, almost irrespective of the
richness of the structure.
– DEDICA is able to distinguish between noise fluctuations
and true structures only if these structures are rich enough.
In the case of our simulations, structures have to be richer
than 1/4th of the main structure.
– DEDICA is able to separate neighboring structures provided
they do not overlap.
– infra-chip gaps do not threaten the detection of structures
that are rich enough to be reliably detected.
– the χ2S threshold we use to identify significant structures is
a function of the total number of points and can be scaled
within the whole range of numbers of galaxies observed
within our fields.
In the next section we apply these results to the real WINGS
clusters.
5. Substructure detection in WINGS clusters
We apply our clustering procedure to the 77 clusters of the
WINGS sample. The photometric catalog of each cluster is deep,
reaching a completeness magnitude Vcomplete ≤ 22. The num-
ber of galaxies is correspondingly large, from Ngal ≃ 3, 000 to
Ngal ≃ 10, 000.
The large number of bright background galaxies (faint ap-
parent magnitudes) dilutes the clustering signal of local WINGS
clusters. We perform test runs of the procedure on several clus-
ters with magnitude cuts brighter than Vcomplete. Based on these
tests, we decide to cut galaxy catalogs to the absolute magnitude
threshold MV = −16.0. With this choice a) we maximize the
signal-to-noise ratio of the detected subclusters and b) we still
have enough galaxies for a stable identification of the system. At
the median redshift of WINGS clusters, z ≃ 0.0535, our absolute
magnitude cut corresponds to an apparent magnitude V ≃ 21.2.
This apparent magnitude also approximately corresponds to
the magnitude where the contrast of our typical cluster relative
to the field is maximum (this estimate is based on the average
cluster luminosity function of (Yagi et al. 2002; De Propris et al.
2003) and on the galaxy counts of (Berta et al. 2006)).
The number of galaxies that are brighter than the threshold
MV = −16.0 is in the range 600 < Ntot < 1200 for a large
fraction of clusters observed with either WFC@INT or with
WFI@ESO2.2.
In order to proceed with the identification of significant
structures within WINGS clusters, we need to verify that our
simulations are sufficiently representative of the real cases.
In practice we need to compare the observed distributions of
χ2S values of noise fluctuations with the corresponding simulated
distributions. In the observations it is impossible to identify indi-
vidual fluctuations as noise. In order to have an idea of the distri-
butions of χ2S of noise fluctuations we consider that our fields are
centered on real clusters. As a consequence, on average, fluctua-
tions in the center of the frames are more likely to correspond to
real systems than those at the borders.
0 20 40 60
Fig. 4. χ2S distributions for border (thick solid histogram) and
central (thick dashed histogram) observed fluctuations. The thin
solid line is the normalized distribution of χ2S of the noise fluctu-
ations in our simulations
We therefore consider separately the fluctuations within the
central regions of the frames and all other fluctuations (borders).
We define the central regions as the central 10% of WFC and
WFI areas. We plot in Fig. 4 the two distributions. The thick
solid histogram is for the border and the thick dashed histogram
for the center of the frames. The difference between ”noise” and
”signal” is clear. In the same figure we also plot the normal-
ized distribution of χ2S of the noise fluctuations in our simulations
(thin solid line). The distributions of χ2S of the observed and sim-
ulated fluctuations are in reasonable agreement considering a)
the simple model used for the simulations and that b) in the ob-
servations we can not exclude real low-χ2S structures among noise
fluctuations. We conclude that for our clusters we can adopt the
same reliability threshold χ2S ,thresholdwe determine from our sim-
ulations (Eq. 5).
6. The Catalog of Substructures
We detect at least one significant structure in 55 (71%) clus-
ters. We find that 12 clusters (16%) have no structure above the
threshold (undetected). In the case of another 10 (13%) clusters
we find significant structures only at the border of the field of
view. In absence of a detection in the center of the frame, we con-
sider these border structures unrelated to the target cluster. We
also verify that in the Color-Magnitude Diagram (CMD) these
border structures are redder than expected given the redshift of
the target cluster. We consider also these 10 clusters undetected.
Here we list the 22 undetected clusters: A0133, A0548b,
A0780, A1644, A1668, A1983, A2271, A2382, A2589, A2626,
A2717, A3164, A3395, A3490, A3497, A3528a, A3556, A3560,
A3809, A4059, RX1022, Z1261.
We note that undetected clusters are real physical systems ac-
cording to their x-ray selection. From an operative point of view,
the fact that these clusters are not detected by DEDICA is the
M. Ramella et al.: Substructures in the WINGS clusters 7
Fig. 5. Isodensity contours (logarithmically spaced) of the Abell
85 field. The title lists the coordinates of the center. The orien-
tation is East to the left, North to the top. Galaxies belonging to
the systems detected by DEDICA are shown as dots of different
colors. Black, light green, blue, red, magenta, dark green are for
the main system and the subsequent substructures ordered as in
Table A.1. Large symbols are for galaxies with MV ≤ −17.0 that
lie where local densities are higher than the median local density
of the structure the galaxy belongs to. Open symbols mark the
positions of the first- and second-ranked cluster galaxies, BCG1
and BCG2 respectively. Similar plots for the 55 analysed clusters
are available in the electronic version of this Journal.
result of the division into too many structures of the total avail-
able clustering signal in the field (or of a too large fraction of
the clustering signal going into border structures). Several phys-
ical situations could be at the origin of missed detections. One
possibility is an excess of physical substructures of comparable
richness. Another possibility is that these clusters are embedded
in regions of the large scale structure that are highly clustered.
We do not try to recover these structures because they can
not be prominent enough. Since our analysis is bidimensional,
we can only detect and use confidently the most prominent struc-
tures. Redshifts are needed for a more detailed analysis of cluster
substructures.
We list the 55 clusters with significant structures in Table
A.1. We give, for each substructure: (1) the name of the parent
cluster; (2) the classification of the structure as main (M), sub-
cluster (S), or background (B) together with their order number;
(3) right ascension (J2000), and (4) declination (J2000) in deci-
mal degrees of the DEDICA peak; the parameters of the ellipse
we obtain from the variance matrix of the coordinates of galaxies
in the substructure, i.e. (5) major axis in arcminutes, (6) elliptic-
ity, and (7) position angle in degrees; (8) luminosity (see the next
section); (9) χ2S .
We make available contour plots of the number density fields
of all clusters in Fig. 6 of the electronic version of this Journal.
In Fig. 5 we show an example of these plots. Isodensity contours
are drawn at ten logarithmic intervals. Galaxies belonging to the
systems detected by DEDICA are shown as dots of different col-
ors. We use large symbols for brighter galaxies (MV ≤ −17.0)
that lie where local densities are higher than the median local
density of the structure the galaxy belongs to. We also mark
with open symbols the positions of the first- and second-ranked
cluster galaxies, BCG1 and BCG2 respectively. Color coding is
black, light green, blue, red, magenta, dark green for the main
system and the subsequent substructures ordered as in Table A.1.
We describe and analyze in detail our catalog in the next sec-
tion.
7. Properties of substructures
The first problem we face in order to study the statistical and
physical properties of substructures is to determine their asso-
ciation with the main structure. In fact, the main structure itself
has to be identified among the structures detected by DEDICA
in each frame.
In most cases it is easy to identify the main structure of a
cluster since it is located at the center of the frame and it has
a high χ2S . In two cases (A0168 and A1736) the choice of the
main structure is complicated because there are several similar
structures near the center of the frame. In these cases we select
the main structure for its highest χ2S .
At this point we limit our analysis to members of the struc-
ture that a) have an absolute magnitude MV ≤ −17 (corrected for
Galactic absorption) and that b) are in the upper half of the distri-
bution of DEDICA-defined local galaxy densities of the system
they belong to. The galaxy density threshold we apply allows us
to separate adjacent structures whose definition becomes more
uncertain at lower galaxy density levels. The magnitude cut in-
creases the relative weight of the galaxies we use to evaluate the
nature of structures in the CMD.
After having identified the main structure, we need to deter-
mine which structures in the field of view of a given cluster have
to be considered background structures. We consider a structure
a physical substructure (or subcluster) if its color-magnitude re-
lation (CMR hereafter) is identical, within the errors, to the CMR
of the main structure.
As a first step we define the color-magnitude relation (CMR)
of the “whole cluster”, i.e. of galaxies in the main structure to-
gether with all other galaxies not assigned to any structure by
DEDICA. We compute the (B − V) CMR of the Coma cluster
from published data (Adami et al. 2006). Then we keep fixed
the slope of the linear CMR of Coma and shift it to the mean
redshift of the cluster.
In order to determine that the main structure and a substruc-
ture are at the same redshift, we evaluate the fraction of back-
ground (red) galaxies, fbg, that each structure has in the CMD.
If these fractions are identical within the errors (Gehrels 1986),
we consider the two structures to be at the same redshift.
In practice we determine fbg by assigning to the background
those galaxies of a structure that are redder than a line parallel to
the CMR and vertically shifted (i.e. redwards) by 2.33 times the
root-mean square of the colors of galaxies in the CMR. We note
that the probability that a random variable is greater than 2.33 in
a Gaussian distribution is only 1%.
The result of the selection of main structures and substruc-
tures is the following: 40 clusters have a total of 69 substruc-
tures at the same redshift as the main structure, only 15 clus-
ters are left without substructures. A total of 35 systems are
found in the background. Considering a) the number density of
poor-to-rich clusters (Mazure et al. 1996; Zabludoff et al. 1993),
b) the average luminosity function of clusters (Yagi et al. 2002;
8 M. Ramella et al.: Substructures in the WINGS clusters
Fig. 7. Cumulative distributions of the two different indicators of
subclustering: left panel Nsub, right panel fLsub.
De Propris et al. 2003), c) the total area covered by the 55 clus-
ter fields, and d) the limiting apparent magnitude corresponding
to our absolute magnitude threshold MV = −16.0, we expect to
find ∼ 0.5± 0.2 background systems per cluster field, 28± 11 in
total. This estimate is consistent with the 35 background systems
we find.
The fraction of clusters with subclusters (73%) is higher than
generally found in previous investigations (typically∼ 30%, see,
e.g., Girardi & Biviano 2002; Flin & Krywult 2006; Lopes et al.
2006, and references therein). Even if we count all undetected
clusters as clusters without substructures, this fraction only de-
creases to 52% (40/77). It is however acknowledged that the
fraction of substructured clusters depends, among other factors,
on the algorithm used to detect substructures, on the quality
and depth of the galaxy catalog. For example Kolokotronis et al.
(2001) using optical and X-ray data find that the fraction of clus-
ters with substructures is ≥ 45%, Burgett et al. (2004) using a
battery of tests detect substructures in 84% of the 25 clusters of
their sample.
Having established the “global” fraction of substructured
clusters, we now investigate the degree of subclustering of in-
dividual clusters, i.e. the distribution of the number of substruc-
tures Nsub we find in our sample.
We find 15 (27%) clusters without substructures; 22 (40%)
clusters with Nsub = 1; 10 (18%) clusters with Nsub = 2; 6 (11%)
clusters with Nsub = 3; and 2 (3%) clusters with Nsub = 4. We
plot in the left panel of Fig. 7 the integral distribution of Nsub.
The distribution of the level of subclustering does not change
when we measure it as the fractional luminosity of subclusters,
fLsub, relative to the luminosity of the whole cluster (see Fig. 7,
right panel). The luminosities we estimate are background cor-
rected using the counts of Berta et al. (2006). We use the ellipses
output from DEDICA (see previous section) as a measure of the
area of subclusters.
We find that Nsub and fLsub are clearly correlated according
to the Spearman rank-correlation test.
We now consider the distribution of subcluster luminosities
and plot the corresponding histogram in Fig. 8. In the same fig-
ure we also plot with arbitrary scaling the power-law∝ L−1. This
relation is the prediction for the differential mass function of
substructures in the cosmological simulations of De Lucia et al.
(2004).
Fig. 8. Observed differential distribution of subcluster lumi-
nosities (histogram) and theoretical model (arbitrary scaling;
De Lucia et al. 2004).
Our observations are consistent to within the uncertainties
with the theoretical prediction of De Lucia et al. (2004) down
to L ∼ 1011.2 L⊙. The disagreement at lower luminosity is ex-
pected since: a) below this limit galaxy-sized halos become im-
portant among the simulated substructures, and b) only above
this limit we expect our catalog to be complete. In fact only
subclusters with luminosities brighter than L = 1011.2 L⊙ have
always richnesses that are ≥ 1/3 of the main structure. This
richness limit approximately corresponds to the completeness
limit of DEDICA detections according to our simulations (see
Sect. 4).
8. Brightest Cluster Galaxies
Here we investigate the relation between BCGs and cluster struc-
tures.
We find that, on average, BCG1s are located close to the den-
sity peak of the main structures. In projection on the sky, the bi-
weight average (see Beers et al. 1990) distance of BCG1s from
the peak of the main system is 72 ± 11 kpc. If we only consider
the 44 BCG1s that are on the CMR and are assigned to main
systems by DEDICA, the average distance decreases to 56 ± 8
kpc. The fact that BCG1s are close to the center of the system
is consistent with current theoretical view on the formation of
BCGs (e.g. Dubinski 1998; Nipoti et al. 2004).
BCG2s are more distant than BCG1s from the peak of the
main system: the biweight average distance is 345 ± 47 kpc. If
we only consider the 26 BCG2s that are on the CMR and are
assigned to main systems by DEDICA, the average distance de-
creases to 161 ± 34 kpc.
Projected distances of BCG2s from density peaks remain
larger than those of BCG1s even when we consider the density
peak of the structure or substructure they belong to. In Fig. 9
we plot the cumulative distributions of the distances of BCG1s
(solid line) and BCG2s (dashed line) from the density peak of
their systems. The distributions are different at the > 99.99%
level according to a Kolmogorov-Smirnov test (KS-test).
Now we turn to luminosities and find that the magnitude dif-
ference between BCG1s and BCG2s, ∆M12, is larger in clus-
ters without substructures than in clusters with substructures.
In Fig. 10 we plot the cumulative distributions of ∆M12 for
clusters with (dashed line) and without (solid line) subclusters.
M. Ramella et al.: Substructures in the WINGS clusters 9
Fig. 9. Cumulative distributions of distances of BCG1 (solid
line) and BCG2 (dashed line) from the density peak of their sys-
Fig. 10. Cumulative distributions of the magnitude difference be-
tween BCG1 and BCG2 in clusters with (dashed line) and with-
out subclusters (solid line).
The two distributions are different according to a KS-test at the
99.1% confidence level. We note that Lin & Mohr (2004) find
that ∆M12 is independent of cluster properties. These authors
however do not consider subclustering.
In order to determine whether the higher values of ∆M12 in
clusters without subclusters are due to an increased luminosity
of the BCG1 (L1) or to a decreased luminosity of the BCG2
(L2), we consider the luminosity of the 10
th brightest galaxy
(L10) as a reference. The biweight average luminosity ratios are
< L1/L10 >= 8.6 ± 1.0 and < L2/L10 >= 3.3 ± 0.3 in clus-
ters without substructures, and < L1/L10 >= 7.1 ± 0.4 and
< L2/L10 >= 3.4 ± 0.2 in clusters with substructures. We then
conclude that the ∆M12-effect is caused by a brightening of the
BCG1 relative to the BCG2 in clusters without substructures.
The fact that ∆M12 is higher in clusters without substruc-
tures can be interpreted, at least qualitatively, in the framework
of the hierarchical scenario of structure evolution. Clusters with-
out substructures are likely to be evolved after several merger
phases. Their BCG1s have already had time to accrete many
galaxies, in particular the more massive ones, which slow down
and sink to the cluster center as the result of dynamical friction.
Some of these galaxies may even have been BCGs of the merg-
ing structures. The simulations by De Lucia & Blaizot (2006)
show that the BCG1s continue to increase their mass via can-
nibalism even at recent times, and that there is a large vari-
ance in the mass accretion history of BCG1s from cluster to
cluster. The result of such a cannibalism process is an increase
of the BCG1 luminosity with respect to other cluster galaxies,
and in extreme cases may lead to the formation of fossil groups
(Khosroshahi et al. 2006).
However, according to these simulations, only 15% of all
BCG1s have accreted > 30% of their mass over the last 2 Gyr,
while another 15% have accreted <3% of their mass over the
same period. Our results indicate that about 60% of the BCG1s
are more than 1 magnitude brighter than the corresponding
BCG2s. Given the size and generality of the luminosity dif-
ferences it would seem that cannibalism alone, even if present
along the merging history of a given cluster, cannot account for
it. Most of the BCG1s should have then been assembled in early
times, as pointed out in the downsizing scenario for galaxy for-
mation (Cowie et al. 1996) and entered that merging history al-
ready with luminosity not far form the present one.
9. Summary
In this paper we search for and characterize cluster substructures,
or subclusters, in the sample of 77 nearby clusters of the WINGS
(Fasano et al. 2006). This sample is an almost complete sample
in X-ray flux in the redshift range 0.04 < z < 0.07.
We detect substructures in the spatial projected distribution
of galaxies in the cluster fields using DEDICA (Pisani 1993,
1996) an adaptive-kernel technique. DEDICA has the following
advantages for our study of WINGS clusters:
a) DEDICA gives a total description of the clustering pat-
tern, in particular membership probability and significance of
structures besides geometrical properties.
b) DEDICA is scale invariant
c) DEDICA does not assume any property of the clusters, i.e.
it is completely non-parametric. In particular it does not require
particularly rich samples to run effectively.
In order to test DEDICA and to set guidelines for the in-
terpretation of the results of the application of DEDICA to our
observations we run DEDICA on several sets of simulated fields
containing a cluster with and without subclusters.
We find that: a) DEDICA always identifies both cluster and
subcluster even when the substructure richness ratio cluster-
to-subcluster is rcs= 8, b) DEDICA recovers a large fraction
of members, almost irrespective of the richness of the original
structure (>∼ 70% in most cases), c) structures with richness ra-
tios rcs<∼ 3 are always distinguishable from noise fluctuations of
the poissonian simulated field.
These simulations also allow us to define a threshold that we
use to identify significant structures in the observed fields.
We apply our clustering procedure to the 77 clusters of the
WINGS sample. We cut galaxy catalogs to the absolute magni-
tude threshold MV = −16.0 in order to maximize the signal-to-
noise ratio of the detected subclusters.
We detect at least one significant structure in 55 (71%) clus-
ter fields. We find that 12 clusters (16%) have no structure above
the threshold (undetected). In the remaining 10 (13%) clusters
we find significant structures only at the border of the field of
view. In absence of a detection in the center of the frame, we
consider these border structures unrelated to the target cluster.
We also verify that in the CMD these border structures are redder
10 M. Ramella et al.: Substructures in the WINGS clusters
than expected given the redshift of the target cluster. We consider
also these clusters undetected.
We provide the coordinates of all substructures in the 55
clusters together with their main properties.
Using the CMR of the early-type cluster galaxies we sep-
arate ”true” subclusters from unrelated background structures.
We find that 40 clusters out of 55 (73%) have a total of 69 sub-
structures with 15 clusters left without substructures.
The fraction of clusters with subclusters (73%) we identify is
higher than most previously published values (typically ∼ 30%,
see, e.g., Girardi & Biviano 2002, and references therein). It is
however acknowledged that the fraction of substructured clus-
ters depends, among other factors, on the algorithm used to de-
tect substructures, on the quality and depth of the galaxy catalog
(Kolokotronis et al. 2001; Burgett et al. 2004).
Another important result of our analysis is the distribution of
subcluster luminosities. In the luminosity range where our sub-
structure detection is complete (L ≥ 1011.2 L⊙), we find that the
distribution of subcluster luminosities is in agreement with the
power-law ∝ L−1 predicted for the differential mass function of
substructures in the cosmological simulations of De Lucia et al.
(2004).
Finally, we investigate the relation between BCGs and clus-
ter structures.
We find that, on average, BCG1s are located close to the den-
sity peak of the main structures. In projection on the sky, the bi-
weight average distance of BCG1s from the peak of the main
system is 72±11 kpc. BCG2s are significantly more distant than
BCG1s from the peak of the main system (345 ± 47 kpc).
The fact that BCG1s are close to the center of the system
is consistent with current theoretical view on the formation of
BCGs (Dubinski 1998).
A more surprising result is that the magnitude difference be-
tween BCG1s and BCG2s, ∆M12, is significantly larger in clus-
ters without substructures than in clusters with substructures.
This fact may be interpreted in the framework of the hierarchical
scenario of structure evolution (e.g. De Lucia & Blaizot 2006).
References
Abdelsalam H.M., Saha P., Williams L.L.R., Oct. 1998, AJ, 116, 1541
Abell G.O., Neyman J., Scott E.L., 1964, AJ, 69, 529
Abraham R.G., Smecker-Hane T.A., Hutchings J.B., et al., Nov. 1996, ApJ, 471,
Adami C., Mazure A., Biviano A., Katgert P., Rhee G., Mar. 1998a, A&A, 331,
Adami C., Mazure A., Katgert P., Biviano A., Aug. 1998b, A&A, 336, 63
Adami C., Biviano A., Durret F., Mazure A., Nov. 2005, A&A, 443, 17
Adami C., Picat J.P., Savine C., et al., Jun. 2006, A&A, 451, 1159
Arnaud M., Maurogordato S., Slezak E., Rho J., Mar. 2000, A&A, 355, 461
Arnouts S., de Lapparent V., Mathez G., et al., Jul. 1997, A&AS, 124, 163
Bardelli S., Pisani A., Ramella M., Zucca E., Zamorani G., Oct. 1998a, MNRAS,
300, 589
Bardelli S., Zucca E., Zamorani G., Vettolani G., Scaramella R., May 1998b,
MNRAS, 296, 599
Bardelli S., Zucca E., Baldi A., Jan. 2001, MNRAS, 320, 387
Barrena R., Biviano A., Ramella M., Falco E.E., Seitz S., May 2002, A&A, 386,
Beers T.C., Flynn K., Gebhardt K., Jul. 1990, AJ, 100, 32
Beers T.C., Gebhardt K., Forman W., Huchra J.P., Jones C., Nov. 1991, AJ, 102,
Bekki K., Jan. 1999, ApJ, 510, L15
Berta S., Rubele S., Franceschini A., et al., Jun. 2006, A&A, 451, 881
Binggeli B., Mar. 1982, A&A, 107, 338
Bird C.M., Jun. 1995, ApJ, 445, L81
Biviano A., Durret F., Gerbal D., et al., Jul. 1996, A&A, 311, 95
Biviano A., Katgert P., Mazure A., et al., May 1997, A&A, 321, 84
Biviano A., Murante G., Borgani S., et al., Sep. 2006, A&A, 456, 23
Böhringer H., Schuecker P., Jun. 2002, Observational signatures and statistics
of galaxy Cluster Mergers: Results from X-ray observations with ROSAT,
ASCA, and XMM-Newton, 133–162, ASSL Vol. 272: Merging Processes in
Galaxy Clusters
Briel U.G., Henry J.P., Boehringer H., Jun. 1992, A&A, 259, L31
Buote D.A., Jun. 2002, X-Ray Observations of Cluster Mergers: Cluster
Morphologies and Their Implications, 79–107, ASSL Vol. 272: Merging
Processes in Galaxy Clusters
Buote D.A., Xu G., Jan. 1997, MNRAS, 284, 439
Burgett W.S., Vick M.M., Davis D.S., et al., Aug. 2004, MNRAS, 352, 605
Caldwell N., Rose J.A., Feb. 1997, AJ, 113, 492
Caldwell N., Rose J.A., Sharples R.M., Ellis R.S., Bower R.G., Aug. 1993, AJ,
106, 473
Cen R., Aug. 1997, ApJ, 485, 39
Clowe D., Bradač M., Gonzalez A.H., et al., Sep. 2006, ApJ, 648, L109
Cowie L.L., Songaila A., Hu E.M., Cohen J.G., Sep. 1996, AJ, 112, 839
Crone M.M., Evrard A.E., Richstone D.O., Aug. 1996, ApJ, 467, 489
De Lucia G., Blaizot J., Jun. 2006, ArXiv Astrophysics e-prints
De Lucia G., Kauffmann G., Springel V., et al., Feb. 2004, MNRAS, 348, 333
De Propris R., Colless M., Driver S.P., et al., Jul. 2003, MNRAS, 342, 725
de Vaucouleurs G., May 1961, ApJS, 6, 213
Demarco R., Rosati P., Lidman C., et al., Mar. 2005, A&A, 432, 381
Dressler A., Shectman S.A., Apr. 1988, AJ, 95, 985
Dubinski J., Jul. 1998, ApJ, 502, 141
Dubinski J., Aug. 1999, In: Merritt D.R., Valluri M., Sellwood J.A. (eds.) ASP
Conf. Ser. 182: Galaxy Dynamics - A Rutgers Symposium, 491–+
Durret F., Forman W., Gerbal D., Jones C., Vikhlinin A., Jul. 1998, A&A, 335,
Escalera E., Slezak E., Mazure A., Oct. 1992, A&A, 264, 379
Escalera E., Biviano A., Girardi M., et al., Mar. 1994, ApJ, 423, 539
Ettori S., Fabian A.C., White D.A., Nov. 1998, MNRAS, 300, 837
Fadda D., Slezak E., Bijaoui A., Jan. 1998, A&AS, 127, 335
Fasano G., Marmo C., Varela J., et al., Jan. 2006, A&A, 445, 805
Ferrari C., Maurogordato S., Cappi A., Benoist C., Mar. 2003, A&A, 399, 813
Ferrari C., Arnaud M., Ettori S., Maurogordato S., Rho J., Feb. 2006, A&A, 446,
Fitchett M., Merritt D., Dec. 1988, ApJ, 335, 18
Fitchett M.J., 1988, In: Dickey J.M. (ed.) ASP Conf. Ser. 5: The Minnesota lec-
tures on Clusters of Galaxies and Large-Scale Structure, 143–174
Flin P., Krywult J., Apr. 2006, A&A, 450, 9
Gehrels N., Apr. 1986, ApJ, 303, 336
Geller M.J., Beers T.C., Jun. 1982, PASP, 94, 421
Giacintucci S., Venturi T., Bardelli S., et al., Apr. 2006, New Astronomy, 11, 437
Gioia I.M., Henry J.P., Mullis C.R., Ebeling H., Wolter A., Jun. 1999, AJ, 117,
Girardi M., Biviano A., Jun. 2002, Optical Analysis of Cluster Mergers, 39–77,
ASSL Vol. 272: Merging Processes in Galaxy Clusters
Girardi M., Fadda D., Giuricin G., et al., Jan. 1996, ApJ, 457, 61
Girardi M., Escalera E., Fadda D., et al., Jun. 1997, ApJ, 482, 41
Gnedin O.Y., Oct. 1999, Ph.D. Thesis
Haines C.P., Clowes R.G., Campusano L.E., Adamson A.J., May 2001, MNRAS,
323, 688
Huo Z.Y., Xue S.J., Xu H., Squires G., Rosati P., Mar. 2004, AJ, 127, 1263
Jeltema T.E., Canizares C.R., Bautz M.W., Buote D.A., May 2005, ApJ, 624,
Jones C., Forman W., Jan. 1999, ApJ, 511, 65
Khosroshahi H.G., Ponman T.J., Jones L.R., Oct. 2006, MNRAS, 372, L68
King I., Oct. 1962, AJ, 67, 471
Knebe A., Müller V., Feb. 2000, A&A, 354, 761
Kolokotronis V., Basilakos S., Plionis M., Georgantopoulos I., Jan. 2001,
MNRAS, 320, 49
Kriessler J.R., Beers T.C., Jan. 1997, AJ, 113, 80
Lin Y.T., Mohr J.J., Dec. 2004, ApJ, 617, 879
Lopes P.A.A., de Carvalho R.R., Capelato H.V., et al., Sep. 2006, ApJ, 648, 209
Markevitch M., Vikhlinin A., Dec. 2001, ApJ, 563, 95
Markevitch M., Gonzalez A.H., Clowe D., et al., May 2004, ApJ, 606, 819
Materne J., Apr. 1979, A&A, 74, 235
Maughan B.J., Jones L.R., Ebeling H., et al., Apr. 2003, ApJ, 587, 589
Maurogordato S., Proust D., Beers T.C., et al., Mar. 2000, A&A, 355, 848
Mazure A., Katgert P., den Hartog R., et al., Jun. 1996, A&A, 310, 31
Miller N.A., Dec. 2005, AJ, 130, 2541
Miller N.A., Owen F.N., Hill J.M., et al., Oct. 2004, ApJ, 613, 841
Mohr J.J., Fabricant D.G., Geller M.J., Aug. 1993, ApJ, 413, 492
Mohr J.J., Evrard A.E., Fabricant D.G., Geller M.J., Jul. 1995, ApJ, 447, 8
Mohr J.J., Geller M.J., Fabricant D.G., et al., Oct. 1996, ApJ, 470, 724
Moles M., Perea J., del Olmo A., Jan. 1986, MNRAS, 213, 365
Moss C., Whittle M., Sep. 2000, MNRAS, 317, 667
Neumann D.M., Böhringer H., Jul. 1997, MNRAS, 289, 123
M. Ramella et al.: Substructures in the WINGS clusters 11
Nipoti C., Treu T., Ciotti L., Stiavelli M., Dec. 2004, MNRAS, 355, 1119
Perea J., Moles M., del Olmo A., Jan. 1986a, MNRAS, 219, 511
Perea J., Moles M., del Olmo A., Jan. 1986b, MNRAS, 222, 49
Perea J., del Olmo A., Moles M., Jan. 1990, A&A, 228, 310
Pinkney J., Roettiger K., Burns J.O., Bird C.M., May 1996, ApJS, 104, 1
Pisani A., Dec. 1993, MNRAS, 265, 706
Pisani A., Feb. 1996, MNRAS, 278, 697
Plionis M., Benoist C., Maurogordato S., Ferrari C., Basilakos S., Sep. 2003,
ApJ, 594, 144
Poggianti B.M., Bridges T.J., Komiyama Y., et al., Jan. 2004, ApJ, 601, 197
Popesso P., Biviano A., Böhringer H., Romaniello M., Jun. 2006, ArXiv
Astrophysics e-prints
Richstone D., Loeb A., Turner E.L., Jul. 1992, ApJ, 393, 477
Roettiger K., Loken C., Burns J.O., Apr. 1997, ApJS, 109, 307
Roettiger K., Stone J.M., Mushotzky R.F., Jan. 1998, ApJ, 493, 62
Rosati P., Tozzi P., Ettori S., et al., Jan. 2004, AJ, 127, 230
Schindler S., Müller E., May 1993, A&A, 272, 137
Schuecker P., Böhringer H., Reiprich T.H., Feretti L., Nov. 2001, A&A, 378, 408
Shane C.D., Wirtanen C.A., Sep. 1954, AJ, 59, 285
Slezak E., Durret F., Gerbal D., Dec. 1994, AJ, 108, 1996
Solanes J.M., Salvador-Solé E., González-Casado G., Mar. 1999, A&A, 343, 733
Thomas P.A., Colberg J.M., Couchman H.M.P., et al., Jun. 1998, MNRAS, 296,
Valdarnini R., Ghizzardi S., Bonometto S., Mar. 1999, New Astronomy, 4, 71
van den Bergh S., 1960, MNRAS, 121, 387
van den Bergh S., Feb. 1961, PASP, 73, 46
van Dokkum P.G., Franx M., Fabricant D., Illingworth G.D., Kelson D.D., Sep.
2000, ApJ, 541, 95
Varela et al., Jan. 2007, A&A, submitted
West M.J., Blakeslee J.P., Nov. 2000, ApJ, 543, L27
West M.J., Oemler A.J., Dekel A., Apr. 1988, ApJ, 327, 1
Xu W., Fang L.Z., Wu X.P., Apr. 2000, ApJ, 532, 728
Yagi M., Kashikawa N., Sekiguchi M., et al., Jan. 2002, AJ, 123, 87
Zabludoff A.I., Geller M.J., Huchra J.P., Ramella M., Oct. 1993, AJ, 106, 1301
12 M. Ramella et al.: Substructures in the WINGS clusters
Appendix A: The catalog of substructures
We provide here the catalog of substructures. In Table A.1 we give, for each substructure: (1) the name of the parent cluster; (2) the
classification of the structure as main (M), subcluster (S), or background (B) together with their order number; (3) right ascension
(J2000), and (4) declination (J2000) in decimal degrees of the DEDICA peak; the parameters of the ellipse we obtain from the
variance matrix of the coordinates of galaxies in the substructure, i.e. (5) major axis in arcminutes, (6) ellipticity, and (7) position
angle in degrees; (8) luminosity; (9) χ2S .
ID class αJ2000 δJ2000 a e PA L χ
(deg) (deg) (arcmin) (deg) (1012 L⊙)
A0085 M 10.4752 -9.3025 2.0 0.23 -17. 0.41536 48.4
A0085 S1 10.4410 -9.4430 1.8 0.35 -39. 0.17649 42.9
A0085 S2 10.3947 -9.3501 2.3 0.40 -72. 0.12337 32.8
A0119 M 14.0625 -1.2630 4.1 0.44 -65. 0.83955 63.4
A0119 S1 14.1183 -1.2106 4.6 0.60 -23. 0.26847 50.8
A0119 S2 14.0267 -1.0441 3.4 0.34 80. 0.03592 32.0
A0119 B1 13.9402 -1.4979 3.1 0.39 46. – 23.5
A0147 M 17.0648 2.2033 3.9 0.45 79. 0.31392 45.2
A0147 S1 16.8673 2.1393 4.1 0.25 -50. 0.05638 24.8
A0147 S2 17.1925 1.9284 4.4 0.38 55. 0.05052 21.4
A0147 B1 17.0753 2.3174 4.4 0.37 75. – 58.0
A0151 M 17.2186 -15.4219 1.7 0.26 -16. 0.47344 39.9
A0151 S1 17.3516 -15.3652 2.1 0.37 -58. 0.13761 42.9
A0151 S2 17.2632 -15.5564 1.6 0.26 -53. 0.19762 40.9
A0151 B1 17.1375 -15.6116 1.5 0.08 -4. – 59.0
A0160 M 18.2344 15.5126 3.6 0.37 82. 0.55525 66.7
A0160 S1 18.2483 15.3138 5.0 0.41 85. 0.03120 38.1
A0160 S2 18.1141 15.7501 3.0 0.16 86. 0.15196 28.3
A0160 S3 17.9981 15.4150 3.9 0.41 0. 0.06315 27.8
A0168 M 18.7755 0.3999 3.1 0.32 -11. 0.24492 30.5
A0168 S1 18.8799 0.2993 2.0 0.33 4. 0.06871 28.6
A0193 M 21.2894 8.6994 2.1 0.08 36. 0.61982 105.7
A0193 B1 20.9945 8.6119 4.9 0.45 -1. – 39.1
A0311 M 32.3793 19.7722 2.3 0.19 43. 0.43320 44.0
A0376 M 41.4276 36.9517 1.7 0.07 -67. 0.13477 40.8
A0376 S1 41.5569 36.9214 4.4 0.49 -22. 0.24350 33.6
A0500 M 69.6476 -22.1308 2.0 0.31 16. 0.41203 45.5
A0500 S1 69.5915 -22.2377 2.3 0.19 36. 0.20520 47.3
A0602 M 118.3638 29.3528 1.8 0.55 -46. 0.20112 55.4
A0602 S1 118.1848 29.4145 2.3 0.52 31. 0.08470 34.8
A0671 M 127.1237 30.4269 1.6 0.24 -51. 0.68582 69.8
A0671 S1 127.2241 30.4342 2.0 0.40 -5. 0.19736 44.3
A0671 S2 127.1617 30.2967 1.9 0.24 -90. 0.13778 43.0
A0754 M 137.1073 -9.6370 2.0 0.25 53. 0.56063 46.8
A0754 S1 137.3707 -9.6760 3.2 0.53 -8. 0.30590 54.9
A0754 S2 137.2619 -9.6367 1.7 0.14 76. 0.23734 51.2
A0957x M 153.4095 -0.9259 2.0 0.09 -83. 0.42106 38.6
A0957x B1 153.5517 -0.7023 2.2 0.44 -63. – 37.9
A0970 M 154.3595 -10.6921 1.5 0.27 -30. 0.46130 62.5
A0970 S1 154.2369 -10.6422 1.7 0.15 15. 0.13660 42.3
A0970 B1 154.1833 -10.6771 1.8 0.23 -76. – 32.2
A1069 M 159.9418 -8.6883 2.8 0.31 52. 0.37270 50.2
A1069 S1 159.9286 -8.5506 2.4 0.23 88. 0.18532 32.7
A1069 B1 159.7678 -8.9262 3.5 0.55 77. – 54.7
A1291 M 173.0467 56.0255 2.5 0.51 -11. 0.25272 32.1
A1291 S1 172.9090 56.1872 1.4 0.48 -82. 0.03530 37.6
A1631a M 193.2410 -15.3413 1.4 0.35 40. 0.20077 33.9
A1736 M 202.0097 -27.3131 3.1 0.35 58. 0.41824 52.1
A1736 S1 201.7305 -27.0170 2.8 0.32 9. 0.24023 42.6
A1736 S2 201.7662 -27.4067 3.4 0.28 7. 0.14528 42.3
A1736 S3 201.5672 -27.4291 2.7 0.44 73. 0.16926 40.4
A1736 S4 201.9057 -27.1600 3.5 0.21 -1. 0.24192 32.9
A1736 S5 201.7036 -27.1236 3.0 0.39 -12. 0.40395 31.7
A1795 M 207.1911 26.5586 0.6 0.17 55. 0.12341 52.4
M. Ramella et al.: Substructures in the WINGS clusters 13
ID class αJ2000 δJ2000 a e PA L χ
(deg) (deg) (arcmin) (deg) (1012 L⊙)
A1795 S1 207.2329 26.7362 1.3 0.38 82. 0.05123 46.7
A1831 M 209.8120 27.9714 1.9 0.43 9. 1.08418 56.0
A1831 S1 209.7356 28.0636 2.1 0.34 41. 0.36295 59.7
A1831 B1 209.5725 28.0206 1.7 0.25 -10. – 47.7
A1991 M 223.6405 18.6390 2.3 0.54 -78. 0.28195 40.3
A1991 S1 223.7575 18.7812 2.5 0.32 41. 0.11412 49.9
A1991 B1 223.7683 18.7022 1.7 0.31 71. – 36.6
A2107 M 234.9497 21.8075 2.7 0.19 48. 0.50994 61.1
A2107 B1 235.0699 22.0127 4.3 0.48 83. – 32.9
A2107 B2 235.1409 21.8276 2.4 0.10 55. – 20.4
A2124 M 236.2400 36.0990 1.3 0.24 32. 0.41727 43.3
A2124 B1 236.0207 36.1779 1.6 0.22 34. – 59.7
A2149 M 240.3723 53.9406 1.5 0.46 -10. 0.37347 48.7
A2169 M 243.4867 49.1875 0.6 0.22 72. 0.15358 34.2
A2256 M 255.9260 78.6412 1.9 0.29 -86. 1.46563 95.6
A2256 B1 256.3094 78.4886 2.2 0.48 75. – 48.2
A2256 B2 256.6024 78.4283 2.0 0.12 -88. – 46.8
A2399 M 329.3693 -7.7772 3.5 0.64 -26. 0.40505 38.8
A2415 M 331.3829 -5.5444 2.3 0.23 -60. 0.36780 44.8
A2415 S1 331.5610 -5.3960 2.3 0.33 52. 0.05032 33.9
A2415 B1 331.3800 -5.4017 1.9 0.34 32. – 41.4
A2415 B2 331.3295 -5.3890 1.6 0.41 4. – 37.3
A2457 M 338.9462 1.4765 4.3 0.50 -84. 0.88720 107.3
A2457 S1 339.0392 1.6459 4.1 0.65 -50. 0.05960 23.1
A2457 B1 339.0667 1.3266 5.6 0.53 77. – 73.8
A2572a M 349.3192 18.7197 2.9 0.39 23. 0.44749 47.8
A2572a S1 349.1122 18.5320 4.1 0.25 8. 0.07320 34.5
A2572a S2 349.3851 18.5395 2.6 0.31 67. 0.05345 25.1
A2572a S3 349.0037 18.7220 3.2 0.34 86. 0.00884 20.5
A2593 M 351.0766 14.6539 1.1 0.25 58. 0.28333 33.8
A2593 S1 351.0677 14.4048 2.2 0.42 80. 0.09810 27.0
A2622 M 353.7384 27.3856 3.1 0.09 76. 0.48920 68.1
A2622 S1 353.4880 27.2877 4.2 0.35 46. 0.03070 35.1
A2622 B1 353.7837 27.3182 3.0 0.49 -5. – 53.7
A2622 B2 353.8009 27.6217 2.6 0.38 68. – 29.9
A2657 M 356.1725 9.1818 4.5 0.47 22. 0.27061 49.6
A2657 S1 356.2755 9.1799 3.2 0.35 10. 0.20771 34.0
A2657 B1 355.9569 8.9422 2.8 0.06 -10. – 38.6
A2665 M 357.7050 6.1582 3.6 0.26 71. 0.67950 121.8
A2665 S1 357.4003 5.8659 4.9 0.64 84. 0.01780 17.5
A2665 B1 357.8218 6.3522 3.5 0.44 -50. – 32.7
A2734 M 2.8363 -28.8652 3.4 0.33 3. 0.48700 56.3
A2734 S1 2.6950 -28.7728 4.0 0.27 -7. 0.18970 55.0
A2734 S2 2.6987 -29.0394 3.2 0.24 55. 0.03030 43.3
A2734 S3 2.5727 -29.0562 3.4 0.51 84. 0.03100 33.1
A2734 B1 2.7701 -28.6488 3.7 0.39 14. – 28.0
A3128 M 52.4825 -52.5764 2.2 0.17 -36. 0.40452 37.2
A3128 S1 52.7366 -52.7089 4.1 0.44 -9. 0.25240 51.8
A3128 S2 52.6655 -52.4413 2.3 0.32 -65. 0.19646 51.0
A3128 S3 52.3697 -52.7570 3.2 0.47 81. 0.17169 39.0
A3158 M 55.7477 -53.6334 2.6 0.62 2. 0.70205 52.8
A3158 S1 55.8382 -53.6780 3.4 0.58 10. 0.45553 53.4
A3266 M 67.7893 -61.4637 1.1 0.27 -72. 0.42993 63.5
A3376 M 90.1628 -39.9950 2.5 0.14 -43. 0.33708 43.1
A3376 S1 90.4344 -39.9776 2.7 0.20 -4. 0.21279 59.8
A3376 S2 90.4712 -39.7946 2.1 0.39 -88. 0.00904 31.2
A3528b M 193.5928 -29.0136 1.3 0.04 -24. 0.65638 66.4
A3528b S1 193.6030 -29.0721 1.3 0.26 10. 0.16706 59.0
A3530 M 193.9098 -30.3606 1.9 0.26 33. 0.53043 34.9
A3532 M 194.3035 -30.3732 3.6 0.52 -44. 0.76920 51.1
A3532 B1 194.0413 -30.2130 3.8 0.38 66. – 56.3
14 M. Ramella et al.: Substructures in the WINGS clusters
ID class αJ2000 δJ2000 a e PA L χ
(deg) (deg) (arcmin) (deg) (1012 L⊙)
A3558 M 201.9587 -31.4892 4.9 0.54 49. 1.14860 64.1
A3558 B1 202.2501 -31.6887 2.6 0.49 44. – 37.6
A3562 M 203.4603 -31.6812 2.5 0.18 82. 0.39087 42.4
A3562 S1 203.1622 -31.7742 4.0 0.40 -86. 0.15010 51.1
A3562 S2 203.3137 -31.6953 3.6 0.40 76. 0.11706 41.0
A3562 S3 203.6982 -31.7171 2.5 0.21 -50. 0.07820 36.7
A3562 S4 203.6541 -31.5969 4.0 0.55 -81. 0.06542 30.3
A3667 M 303.1637 -56.8598 2.1 0.14 23. 0.56803 42.0
A3667 S1 303.5297 -56.9660 3.0 0.39 -86. 0.27086 39.2
A3667 S2 302.7241 -56.6674 1.5 0.17 75. 0.28948 34.0
A3667 S3 302.7081 -56.7557 2.3 0.39 12. 0.09961 33.7
A3716 M 312.9910 -52.7677 4.7 0.38 36. 0.76450 77.9
A3716 S1 312.9769 -52.6434 3.5 0.22 -6. 0.49159 56.9
A3716 B1 312.7735 -52.8976 4.1 0.40 28. – 48.4
A3716 B2 313.1888 -52.4785 3.2 0.23 21. – 22.6
A3880 M 336.9796 -30.5474 3.8 0.33 -50. 0.25840 44.1
A3880 B1 336.8684 -30.8171 2.5 0.30 64. – 30.9
A3880 B2 336.7356 -30.7839 2.7 0.35 46. – 28.4
IIZW108 M 318.4443 2.5706 2.6 0.38 42. 0.49940 33.4
IIZW108 S1 318.6247 2.5533 3.2 0.36 -14. 0.08565 42.3
IIZW108 B1 318.3335 2.7751 1.6 0.24 43. – 44.5
IIZW108 B2 318.5190 2.8039 2.5 0.44 22. – 33.6
MKW3s M 230.3916 7.7281 2.3 0.30 -8. 0.37614 48.7
MKW3s S1 230.4576 7.8769 3.3 0.46 -22. 0.04585 39.3
MKW3s B1 230.7349 7.8882 2.0 0.40 16. – 25.5
RX0058 M 14.5875 26.8816 2.4 0.22 -31. 0.31967 44.2
RX0058 S1 14.7652 27.0424 3.8 0.60 64. 0.31661 50.6
RX0058 B1 14.4012 26.7041 3.3 0.08 -12. – 28.9
RX1740 M 265.1398 35.6416 2.8 0.21 -26. 0.17896 42.1
RX1740 S1 265.2600 35.4366 3.3 0.22 38. 0.01946 31.7
RX1740 S2 264.8688 35.6053 3.7 0.52 28. 0.01340 27.9
RX1740 S3 265.0744 35.8116 3.5 0.44 -7. 0.01166 21.8
Z2844 M 150.7281 32.6483 2.9 0.20 -85. 0.10143 48.3
Z2844 S1 150.6524 32.7621 5.2 0.58 63. 0.04930 50.5
Z2844 S2 150.5821 32.8890 2.6 0.39 0. 0.00395 23.1
Z8338 M 272.7447 49.9078 3.0 0.37 -67. 0.45876 43.1
Z8338 S1 272.8606 49.7916 3.2 0.11 62. 0.05549 31.9
Z8338 S2 272.6903 49.9737 3.1 0.67 62. 0.07089 25.7
Z8338 B1 272.4479 49.6815 1.7 0.18 9. – 25.8
Z8852 M 347.6024 7.5824 2.7 0.41 -45. 0.76110 67.9
Z8852 S1 347.5926 7.3999 5.8 0.56 62. 0.12022 32.4
Z8852 S2 347.6986 7.8018 2.3 0.13 81. 0.02493 25.5
Z8852 B1 347.7381 7.6808 2.1 0.25 -73. – 35.9
Z8852 B2 347.4951 7.8165 2.4 0.21 72. – 27.0
M. Ramella et al.: Substructures in the WINGS clusters, Online Material p 1
Online Material
M. Ramella et al.: Substructures in the WINGS clusters, Online Material p 2
Fig. 6. Isodensity contours (logarithmically spaced) of the 55 clusters with significant structures. The title lists the coordinates of
the center. The orientation is East to the left, North to the top. Galaxies belonging to the systems detected by DEDICA are shown as
dots of different colors. Black, light green, blue, red, magenta, dark green are for the main system and the subsequent substructures
ordered as in Table A.1. Large symbols are for galaxies with MV ≤ −17.0 that lie where local densities are higher than the median
local density of the structure the galaxy belongs to. Open symbols mark the positions of the first- and second-ranked cluster galaxies,
BCG1 and BCG2 respectively.
M. Ramella et al.: Substructures in the WINGS clusters, Online Material p 3
Fig. 6. (continued)
M. Ramella et al.: Substructures in the WINGS clusters, Online Material p 4
Fig. 6. (continued)
M. Ramella et al.: Substructures in the WINGS clusters, Online Material p 5
Fig. 6. (continued)
M. Ramella et al.: Substructures in the WINGS clusters, Online Material p 6
Fig. 6. (continued)
M. Ramella et al.: Substructures in the WINGS clusters, Online Material p 7
Fig. 6. (continued)
M. Ramella et al.: Substructures in the WINGS clusters, Online Material p 8
Fig. 6. (continued)
M. Ramella et al.: Substructures in the WINGS clusters, Online Material p 9
Fig. 6. (continued)
M. Ramella et al.: Substructures in the WINGS clusters, Online Material p 10
Fig. 6. (continued)
M. Ramella et al.: Substructures in the WINGS clusters, Online Material p 11
Fig. 6. (continued)
	Introduction
	The Data
	The DEDICA Procedure
	The probability density
	Cluster Identification
	Cluster Significance and of Membership Probability
	Tweaking the Algorithm with Simulations
	Substructure detection in WINGS clusters
	The Catalog of Substructures
	Properties of substructures
	Brightest Cluster Galaxies
	Summary
	The catalog of substructures
ABSTRACT
  We search for and characterize substructures in the projected distribution of
galaxies observed in the wide field CCD images of the 77 nearby clusters of the
WIde-field Nearby Galaxy-cluster Survey (WINGS). This sample is complete in
X-ray flux in the redshift range 0.04<z<0.07. We search for substructures in
WINGS clusters with DEDICA, an adaptive-kernel procedure. We test the procedure
on Monte-Carlo simulations of the observed frames and determine the reliability
for the detected structures. DEDICA identifies at least one reliable structure
in the field of 55 clusters. 40 of these clusters have a total of 69
substructures at the same redshift of the cluster (redshift estimates of
substructures are from color-magnitude diagrams). The fraction of clusters with
subclusters (73%) is higher than in most studies. The presence of subclusters
affects the relative luminosities of the brightest cluster galaxies (BCGs).
Down to L ~ 10^11.2 L_Sun, our observed differential distribution of subcluster
luminosities is consistent with the theoretical prediction of the differential
mass function of substructures in cosmological simulations.

<|endoftext|><|startoftext|>
Ising-like dynamics and frozen states in systems of ultrafine magnetic particles
Stefanie Russ1 and Armin Bunde1
Institut für Theoretische Physik III, Justus-Liebig-Universität Giessen, D-35392 Giessen, Germany
(Dated: February 28, 2022)
We use Monte-Carlo simulations to study aging phenomena and the occurence of spinglass phases
in systems of single-domain ferromagnetic nanoparticles under the combined influence of dipolar
interaction and anisotropy energy, for different combinations of positional and orientational disorder.
We find that the magnetic moments oriente themselves preferably parallel to their anisotropy axes
and changes of the total magnetization are solely achieved by 180 degree flips of the magnetic
moments, as in Ising systems. Since the dipolar interaction favorizes the formation of antiparallel
chain-like structures, antiparallel chain-like patterns are frozen in at low temperatures, leading to
aging phenomena characteristic for spin-glasses. Contrary to the intuition, these aging effects are
more pronounced in ordered than in disordered structures.
PACS numbers: 75.75.+a, 75.40.Mg, 75.50.Lk, 75.50.Tt
INTRODUCTION
In the last decade, systems of ultrafine magnetic
nanoparticles have received considerable interest, due
both to their important technological applications
(mainly in magnetic storage and recordings) and their
rich and often unusual experimental behavior, which is
related to their role as a complex mesoscopic system
[1, 2]. It has been discussed controversially in the past,
under which circumstances these systems are able to
show spin-glass phases. While experiments on disor-
dered magnetic materials present indications of a spin-
glass phase [2, 3, 4] or of a glassy-like random anisotropy
system [5], the situation is less clear on the theoretical
side. Simulations of the zero-field cooling (ZFC) and
field-cooling susceptibility showed no indication of a spin-
glass phase [6, 7]. In contrast, simulations on aging [8]
(on a simplified system, where the dipolar interaction
was only considered up to a cut-off radius) and magnetic
relaxation [9, 10] favorize the spin-glass hypothesis, but
the structure of the frozen history-dependent states as
well as the actual mechanism leading to them has not
yet been clarified.
In this letter, in order to clarify these questions, we
use Monte Carlo simulations [11] to study aging phe-
nomena on a large variety of systems of ultrafine mag-
netic nanoparticles (see Fig. 1). Our simulations do not
only point to the existence of frozen history-dependent
states at low temperatures that are characteristic for spin
glasses, but also yield an insight into the structure of the
frozen states and the underlying dynamics. We find that
under the combined influence of dipolar and anisotropy
energy, the magnetic moments have a tendency to align
in an Ising-like manner either parallel or antiparallel to
their anisotropy axes and change their directions by 180
degree flips as in Ising systems. This way, chain-like
structures are formed where all magnetic moments point
into the same direction and neighboring chains have the
tendency to oriente themselves in an antiparallel way.
(a) (b)
(c) (d)
FIG. 1: Two-dimensional sketches of the geometries consid-
ered in this paper: (a) cubic arrangement of the particles and
all anisotropy axes aligned into the z-direction, (b) liquid-like
arrangement and all axes arranged, (c) cubic arrangement
and all axes randomly oriented and (d) liquid-like arrange-
ment and all axes randomly oriented. In the simulations, the
systems were three-dimensional (64 particles per cube).
These topological chains that freeze in at low tempera-
tures, form simple straight lines, when the particles are
arranged on the sites of a cubic lattice [10] and form com-
plex winded curves, when the arrangement of the parti-
cles is liquid-like. As a consequence, if a small external
magnetic field is applied, the magnetic moments can fol-
low the field more easily in a disordered system than in
the ordered configuration. This leads, contrary to the in-
tuition, to more pronounced aging effects (characteristic
for spin glasses) in ordered than in disordered structures.
http://arxiv.org/abs/0704.0580v1
MODEL SYSTEM AND NUMERICAL
SIMULATIONS
For the numerical calculations, we focus on the same
model as in earlier papers [6, 9], which (i) assumes a
coherent magnetization rotation within the anisotropic
particles, and (ii) takes into account the magnetic dipo-
lar interaction between them. Every particle i of volume
Vi is considered to be a single magnetic domain ~µi with
all its atomic magnetic moments rotating coherently and
the Vi are taken from a Gaussian distribution of width
σV = 0.4 and 〈V 〉 = 1 (see also [6, 9]). This results in
a constant absolute value |µi| = MsVi of the total mag-
netic moment of each particle, whereMs is the saturation
magnetization. The energy of each particle consists of
three contributions: anisotropy energy, dipolar interac-
tion and magnetic energy of an external field. We assume
a temperature independent uniaxial anisotropy energy
A = −KVi((~µi~ni)/|~µi|)
2, where K is the anisotropy
constant and the unit vector ~ni denotes the easy direc-
tions. Eventually, the magnetic moments are coupled to
an external field H leading to the additional field energy
H = −~µi
~H . Finally, the energy of the magnetic dipo-
lar interaction between two particles i and j separated by
~rij is given by E
(i,j)
D = (~µi~µj)/r
ij − 3(~µi~rij)(~µj~rij)/r
Adding up the three energy contributions and summing
over all N particles we obtain the total energy
j 6=i
(i,j)
. (1)
In the Monte Carlo simulations we concentrate on sam-
ples of N = L3 particles placed inside a cube of side
length L = 4 and average over 1000 configurations. Dur-
ing the simulations, both, the positions of the particles
and their easy axes are kept fixed. The unitless concen-
tration c is defined as the ratio between the total vol-
Vi occupied by the particles and the volume
Vs of the sample. Here, we focus on the concentra-
tion c/c0 ≈ 0.3, where c0 = 2K/M
s is a dimensionless
material-dependent constant, c0 ∼ 1.4 for iron nitride
and c0 ∼ 2.1 for maghemite nanoparticles [9]. We also
tested systems with higher concentrations c/c0 ≈ 0.4 and
(the extremely high concentration) c/c0 ≈ 0.6 and found
that the results remain qualitatively unchanged. The
temperature is measured in units of the reduced temper-
ature T̃ ≡ 1/(2βKV ), where 2KV is the height of the
anisotropy barrier and β = 1/(kBT ). Similarly, the mag-
netic field is measured in units of the anisotropy field
Ha = 2K/Ms. The relaxation of the individual magnetic
moments is simulated by the standard Metropolis algo-
rithm [12]. In contrast to [8], where dipole interactions
between the particles were only considered up to a cut-off
radius, we calculate the interaction energies by the Ewald
sum method with periodic boundary conditions in x, y
and z-direction [6, 13] and thus are able to account fully
(a) (b)
(c) (d)
FIG. 2: (Colors online) The magnetization m(τ ) after wait-
ing times tw = 0 (filled symbols) and tw = 10000 Monte
Carlo steps (open symbols) is plotted versus τ (number of
Monte Carlo steps with applied external field) for (a) cubic
lattice and aligned axes, (b) liquid-like system and aligned
axes, (c) cubic system and random axes and (d) liquid-
like systems and random axes for the reduced temperatures
T̃ = kBT/(2KV ) = 5 (black symbols, circles), T̃ = 1/10 (red
symbols, squares) and T̃ = 1/40 (blue symbols, diamonds).
for the long-range character of the dipole forces. The
magnetic moment ~µi is characterized by the spherical
angles θi and ϕi relative to a coordinate frame, where
the z-axis is parallel to the external field [9, 14, 15]. To
study the magnetic relaxation we determine as a function
of time t (number of Monte Carlo steps) for each particle
i the angle θi between the magnetic moment ~µi and the
z-axis, from which we obtain the relevant quantities, as
e.g. the normalized magnetization,
m(t) =
cos θi(t). (2)
To obtain the orientation of ~µi relative to ~ni, we intro-
duce the ”orientational order parameter” Oµ ≡ 〈|~µi~ni|〉,
i.e. the average of the absolute values of the scalar prod-
uct ~µi~ni over all N particles and all configurations. Oµ
does not distinguish between the parallel and the an-
tiparallel alignment. It is equal to zero when all ~µi are
perpendicular to their axes ~ni and equal to 1 if they are
all parallel or antiparallel to them.
To study aging phenomena, we determine the magne-
tization in a ZFC simulation. First, starting in a ran-
dom configuration of the magnetic moments, the system
is cooled down in the absence of an external field, from
T = ∞ to a reduced temperature T̃ with a constant cool-
ing rate of ∆β/∆t = 0.1, corresponding to 400 Monte
Carlo steps for T̃ = 1/40 and 10 steps for T̃ = 1. Sec-
ond, the cooling process is stopped at T̃ and the system is
allowed to relax for a certain waiting time tw. Finally, in
the third step, a small external field h = 0.1Ha is applied
in z-direction. The magnetization m(τ) is determined as
a function of τ ≡ t− tw (number of Monte Carlo steps af-
ter switching on the field). Aging effects are represented
by differences between the m(τ)-curves for different tw
and occur, when many different relaxation rates exist in
the system, so that after a given waiting time tw, the sys-
tem has only partly relaxed towards equilibrium. Experi-
mentally, aging effects have already been found in several
spin-glasses, as e.g. in Permalloy/alumina granular films
[16], rare-earth manganates [17], CuMn spin-glasses [18],
multilayer systems [19] and in Fe3N nanoparticle sys-
tems [20].
NUMERICAL RESULTS
Figure 2 shows m(τ) for the systems of Fig. 1 without
waiting time, tw = 0 (filled symbols), and for tw = 10
(open symbols). The different colors (and symbols) stand
for three different temperatures T̃ = 1/5, 1/10 and 1/40.
Clearly, all curves show aging effects, similar to the ex-
perimental results of Refs. [16, 17, 18]. Systems with no
or only small tw follow the external field faster than the
systems with longer waiting times, indicating that the
longer relaxation leads to more stable chains. The ag-
ing effects are most pronounced for those systems where
all anisotropy axes are oriented into the direction of the
external field (Fig. 2(a,b)) and less pronounced but still
visible for the systems with disordered anisotropy axes
(Fig. 2(c,d)). In these systems with orientational disor-
der, the m(τ) curves coincide for small τ and show aging
effects only after a certain crossover time (close to 102
Monte Carlo steps). This indicates that in these systems
a certain fraction of dipoles does not belong to quasi-
stable chain-like structures and can follow the external
field nearly instantaneously, independentely of the wait-
ing time and thus dominate the short-time behavior. The
aging effects decrease with increasing T̃ , when the order
is destroyed by the thermal fluctuations.
In order to understand the dynamical behavior in a
more microscopic way, we compare m(τ) with the time-
dependence of the corresponding orientational order pa-
rameters Oµ(τ). Figure 3 shows Oµ in the 3rd step of
the aging process for tw = 0 and tw = 10000 (filled and
open symbols, respectively) and for the same geometries
as before (see Fig. 1). The figure shows that quite con-
trary to the expectation, apart from a slight minimum
at intermediate τ , Oµ is constant in time for the systems
of tw = 10000. Without waiting time, the curves start
at much smaller values of Oµ, but increase rapidly un-
til they reach at a crossover time τc of about 10
3 Monte
Carlo steps the common plateau value. In the plateau
regime, the dipolar moments ~µi are either oriented par-
(a) (b)
(c) (d)
FIG. 3: (Colors online) The order parameter Oµ after a wait-
ing time tw = 0 (filled symbols) and tw = 10000 (open sym-
bols) is plotted versus τ (number of Monte Carlo steps) for
the same geometries, temperatures, system parameters and
symbols and colors as in Fig. 2.
(a) (b)
(c) (d)
FIG. 4: (Colors online) The percentage Nup of particles per
system pointing upwards after waiting times tw = 0 (filled
symbols) and tw = 10000 (open symbols) is plotted versus
τ (number of Monte Carlo steps) for the same geometries,
temperatures, system parameters and symbols and colors as
in Fig. 2.
allel or antiparallel to their easy axes ~ni and do therefore
flip only between these two directions. Accordingly, the
value of Oµ does neither depend on the external field nor
on the functional form of m(τ). Since Oµ(τ) stays con-
stant for large tw or τ > τc, while m(τ) increases with
time (see Fig. 2), the ~µi have already reached their par-
FIG. 5: (Colors online) The magnetization m(τ ) after a wait-
ing time tw = 0 (filled symbols) and tw = 10000 (open sym-
bols) for T̃ = 1/10 (red symbols) and T̃ = 1/40 (blue symbols)
of systems with aligned and randomly oriented anisotropy
axes (red circles and diamonds, respectively for T̃ = 1/10
and blue squares and triangles respectively for T̃ = 1/40) are
plotted versus τ (number of Monte Carlo steps) for systems
without dipole-interaction.
allel or antiparallel position and can only perform spin
flips by 180 degrees, thereby increasing m(τ) and leav-
ing Oµ unchanged. To make this point still clearer, we
plot in Fig. 4 the percentage Nup of particles pointing
upwards, i.e. with ϑi < π/2, again for the geometries of
Fig. 1. The similarity between Fig. 4 and Fig. 2 is obvi-
ous, showing that the number of the magnetic moments
oriented upwards determine the shape of m(τ).
We therefore arrive at a remarkably simple Ising-like
dynamics of these ultrafine magnetic particles. The
amount of aging is directly related to the degree of or-
der a system can achieve during tw. In the fully ordered
system of Figs. 1-3(a), after a long waiting time tw, the
~µi prefer to be aligned in stable chains [10] along the z-
direction and thus cannot follow an external field easily.
Single magnetic moments inside a chain will hardly flip to
the other side and flips of whole chains possess extremely
large relaxation times. Without waiting time, on the
other hand, the ~µi are in unstable positions which allows
them to follow the external field quite rapidly, leading to
large aging effects in ordered systems. As Figs. 2(b-d)
show, the situation is different in systems with positional
and/or orientational disorder. The relaxation times for
spin flips decrease with the amount of disorder, in partic-
ular with the amount of orientational disorder. When the
chains are winded and aligned into different directions,
they are less stable and possess a large variety of inter-
mediate positions to flip to the other side. Accordingly,
aging effects become weaker with increasing disorder.
For illustration, we visualize the aging process in Fig. 6
for the system with the highest order and the strongest
aging effects, i.e. for the cubic system with aligned
anisotropy axes. For this visualization, we follow the
definition of the transversal order parameter of Ref. [10]:
each of the L2 sites in the xy plane can be either a + site
or a − site, if a chain has already been formed and all
magnetic moments in the chain point into the positive or
negative z direction, respectively (white sites). If this is
not the case, the site is a 0 site (grey sites). The figure
shows that chains are quite obviously formed in the sec-
ond step of the aging process during the waiting time tw,
as can most easily be seen by comparing Fig. 6(a), where
tw = 10000 with 6(d) where tw = 0. In (a), many chains
are formed during tw that appear to be quite stable in
the following 3rd step of the aging process (Fig. 6(b,c)),
when an external field is applied in the + direction. We
can see that most of the − chains persist in spite of the
external field. The situation is different in Fig. 6(d-f),
where only few chains exist at the end of the 2nd step
of the aging process (Fig. 6(d)). Here, after switching
on the external magnetic field, new chains can be built
from the 0 sites and the system therefore follows the field
much easier than in Fig. 6(a-c).
Recently, it has been argued that also a broad distri-
bution of anisotropy energy barriers might lead to aging
effects in superparamagnetic systems [20]. To show that
these kinds of aging effects are in fact negligible com-
pared with systems where both energy contributions are
present, we have studied systems without dipole interac-
tion (solely anisotropy energy) at temperatures T̃ = 1/10
and 1/40. In this case, the particle positions play no role,
so that the geometry of Fig. 1(a) and (b) as well as (c)
and (d) are physically identical. The results of m(τ) for
these two geometries are shown in Fig. 5 for the same
aging procedure as before. The figure shows that the dif-
ferences between the curves for tw = 0 and tw = 10000
are orders of magnitude smaller than in the systems with
dipolar interaction. It is interesting to note that also for
systems with only dipolar interaction, some kind of ag-
ing can be seen, but orders of magnitude smaller than for
systems with both energy contributions.
In summary, analyzing the microscopic dynamics of ul-
trafine magnetic particles, we found that irrespective of
the strength of the dipolar interaction, the dipoles ori-
ente themselves either parallel or antiparallel to their
anisotropy axes. We therefore arrive at a remarakably
simple picture of the dipole dynamics, where similar to
the Ising model, the ~µi perform ”spin flips” between these
two orientations. Aging effects occur when after a certain
waiting time, the magnetic dipoles have arranged them-
selves in stable configurations and flips of single magnetic
moments are suppressed. These aging effects increase in
a counter-intuitive way with the order of the system and
are thus most pronounced in completely orderded sys-
tems with cubic arrangement of the particles and axes
aligned into the direction of the magnetic field.
(a) (b) (c)
(d) (e) (f)
FIG. 6: Visualization of the chains perpendicular to the xy-
plane in the cubic system with aligned anisotropy axes at
T̃ = 1/5 for one typical system. The complete chains are
indicated by white sites and by + or − signs, depending on
the direction of the chain. Sites, where chains have not (yet)
been built are indicated by the grey shade. (a-c) System with
waiting time tw = 10000, i.e. (a) after the cooling process
and tw = 10000 (b,c) after an external magnetic field in the +
direction has been applied for (b) 1000 and (c) 10000 Monte
Carlo steps. (d-f) System without waiting time (tw = 0),
i.e. (d) after the cooling process (and tw = 0) (e,f) after an
external magnetic field in the + direction has been applied
for (e) 1000 and (f) 10000 Monte Carlo steps.
ACKNOWLEDGEMENTS
We gratefully acknowledge very valuable discussions
with W. Kleemann and financial support from the
Deutsche Forschungsgemeinschaft.
[1] X. Batlle and A. Labarta, J. Phys. D 35, R15 (2002).
[2] Xi Chen, S. Sahoo, W. Kleemann, S. Cardoso and
P. P. Freitas, Phys. Rev. B 70, 172411 (2004).
[3] T. Jonsson, J. Mattsson, C. Djurberg, F. A. Khan, P.
Nordblad, and P. Svedlindh, Phys. Rev. Lett. 75, 4138
(1995).
[4] R. W. Chantrell, M. El-Hilo, and K. O Grady, IEEE
Trans. Magn. 27, 3570 (1991).
[5] W. Luo, S. R. Nagel, T. F. Rosenbaum, and R. E.
Rosensweig, Phys. Rev. Lett. 67, 2721 (1991).
[6] J. Garcia-Otero, M. Porto, J. Rivas, and A. Bunde, Phys.
Rev. Lett. 84, 167 (2000).
[7] M. Porto, Eur. Phys. J. B 45, 369 (2005).
[8] J.-O. Andersson et al., Phys. Rev. B 56, 13983 (1997).
[9] M. Ulrich, J. Garcia-Otero, J. Rivas, and A. Bunde;
Phys. Rev. B 67, 024416 (2003).
[10] S. Russ, A. Bunde, Phys. Rev. B 74, 064426 (2006).
[11] U. Nowak, R. W. Chantrell, and E. C. Kennedy, Phys.
Rev. Lett. 84, 163 (2000).
[12] In every step, we select a particle i at random and gen-
erate an attempted orientation of its magnetization, cho-
sen in a spherical segment around the present orientation
with an aperture angle dθ (see also Ref. [6]). By varying
dθ, i.e. the maximum jump angle, it is possible to modify
the rate of acceptance and to optimize the simulation.
As a compromise between simulations at low and high
temperatures, we chose dθ = 0.1 for all simulations, in-
dependent of temperature, which refers to an accecptance
rate between 0.5 and 0.8 for T̃ between 1/40 and 1/5. We
also tested larger values of dθ with considerably lower ac-
ceptation rates and found that they did not change the
final states significantly.
[13] M. P. Allen and D. J. Tildesley, Computer Simulation of
Liquids (Clarendon, Oxford, 1987).
[14] R. V. Chamberlin, G. Mozurkewich, and R. Orbach,
Phys. Rev. Lett. 52, 867 (1984).
[15] K. L. Ngai and U. Strom, Phys. Rev. B 38, 10350 (1988).
[16] E. Vincent, Y. Yuan, J. Hamman, H. Hurdequint and F.
Guevara; J. of Mag. and Mag. Mat. 161209 (1996).
[17] A. K. Kundu, P. Nordblad and C. N. R. Rao; Phys. Rev.
B 72, 144423 (2005).
[18] L. Lundgren, P. Svendlindh, P. Nordblad and O. Beck-
mann; Phys. Rev. Lett. 51, 811 (1983).
[19] S. Bedanta, O. Petracic, E. Kentzinger, W. Kleemann,
U. Rücker, A. Paul, Th. Brückel, S. Cardoso and
P. P. Freitas, Phys. Rev. B 72, 024419 (2005).
[20] M. Sasaki, P.E. Jönsson, H. Takayama and H. Mamiya;
Phys. Rev. B 71, 104405 (2005).
ABSTRACT
  We use Monte-Carlo simulations to study aging phenomena and the occurence of
spinglass phases in systems of single-domain ferromagnetic nanoparticles under
the combined influence of dipolar interaction and anisotropy energy, for
different combinations of positional and orientational disorder. We find that
the magnetic moments oriente themselves preferably parallel to their anisotropy
axes and changes of the total magnetization are solely achieved by 180 degree
flips of the magnetic moments, as in Ising systems. Since the dipolar
interaction favorizes the formation of antiparallel chain-like structures,
antiparallel chain-like patterns are frozen in at low temperatures, leading to
aging phenomena characteristic for spin-glasses. Contrary to the intuition,
these aging effects are more pronounced in ordered than in disordered
structures.

<|endoftext|><|startoftext|>
Introduction
Let G be a finite group and V be a finite G–module of characteristic p. If (|G|, |V |) = 1, then in
[3, Theorem 2.2] R. Knörr presented a beautiful argument showing how to obtain strong upper
bounds for k(GV ) (the number of conjugacy classes of GV ) by using only information on CG(v)
for a fixed v ∈ V . Note that his result immediately implies the important special case that if G
has a regular orbit on V (i.e., there is a v ∈ V with CG(v) = 1), then k(GV ) ≤ |V |, which was a
crucial result in the solution of the k(GV )–problem. In this note we give a much shorter proof
of this result (see Proposition 3.1 below).
The main objective of the paper, however, is to modify and generalize Knörr’s argument in
various directions to include non–coprime situations. This way we obtain a number of bounds
on certain subsets of Irr(GV ), such as the following:
Theorem A. Let G be a finite group and let V be a finite G–module of characteristic p. Let
v ∈ V and C = CG(v) and suppose that (|C|, |V |) = 1. Then the number of irreducible characters
whose restriction to 〈v〉 is not a multiple of the regular character of 〈v〉 is bounded above by
|CV (ci)|,
where the ci are representatives of the conjugacy classes of C.
Theorem B. Let G be a finite group and V be a finite G–module. Let g ∈ G be of prime order
not dividing |V |. Then the number of irreducible characters of GV whose restriction to A = 〈g〉
is not a multiple of the regular character of 〈g〉 is bounded above by
|CG(g)| n(CG(g), CV (g)),
where n(CG(g), CV (g)) denotes the number of orbits of CG(g) on CV (g).
Stronger versions and refinements of these results are proved in the paper. It is hoped that these
results prove useful in solving the non–coprime k(GV )–problem, as discussed, for instance, in
[2] and [1]. Theorem A and B will be proved in Sections 3 and 4 below respectively. In Section
2, we will generalize a recent result of P. Schmid [5, Theorem 2(a)] stating that in the situation
of the k(GV )–problem, if G has a regular orbit on V , then k(GV ) = |V | can only hold if G is
abelian. We prove
Theorem C. Let G be a finite group and V a finite faithful G–module with (|G|, |V |) = 1.
Suppose that G has a regular orbit on V . Then
k(GV ) ≤ |V | − |G|+ k(G).
Our proof is different from the approach taken in [5], and we actually will prove a slightly
stronger result including some non–coprime actions.
Notation: If the group A acts on the set B, we write n(A,B) for the number of orbits of A on
B. All other notation is standard or explained along the way.
2 k(GV) = |V| and regular orbits
In this paper we often work under the hypothesis of the k(GV )–problem which is the following.
2.1 Hypothesis. Let G be a finite group and let V be a finite faithful G–module such that
(|G|, |V |) = 1. Write p for the characteristic of V .
In [5, Theorem 2(a)] P. Schmid proved that under Hypothesis 2.1, if G has a regular orbit on V ,
V is irreducible, and k(GV ) = |V |, then G is abelian, and from this it follows easily that either
|G| = 1 and |V | = p, or G is cyclic of order |V | − 1. The proof in [5] is somewhat technical.
The goal of this section is to give a short proof of a generalization of Schmid’s result based on a
beautiful argument of Knörr [3]. We word it in such a way that we even do not need the coprime
hypothesis, so that the result may even be useful to study the non–coprime k(GV )–problem. To
do this, for any group X and x ∈ X we introduce the set
Irr(X,x) = {χ ∈ Irr(G)| χ|〈x〉 is not an integer multiple of the regular character of 〈x〉}
and write
k(X,x) = |Irr(X,x)|.
2.2 Theorem. Let G be a finite group and let V be a finite G–module such that G possesses a
regular orbit on V . Let v ∈ V be a representative of such an orbit. Then
k(GV, v) ≤ |V | − |G|+ k(G)
Proof. Let p be the characteristic if V . We proceed exactly as in Case (ii) of the proof of
[3, Theorem 2.2]. Write C = CG(v). As C = 1, we see that for A = 〈v〉 we trivially have that
|C| and |A| are coprime, and so that proof yields
(1) (p − 1)|V | =
τ∈Irr(GV )
(τη, τ)A
where η is the character of A defined by η = p1A − ρA with ρA being the regular character of
A. Now for any τ ∈ Irr(GV ) we have
(2) (τη, τ)A =
τ(a)(p − ρA(a))τ(a)
16=a∈A
|τ(a)|2
= 0 if τ |A is an integer multiple of ρA
≥ p− 1 otherwise
where the last step follows from [4, Corollary 4]. Next observe that if τ ∈ Irr(GV ) with V ≤ ker τ ,
then τ ∈ Irr(G) and clearly τ |A is not a multiple of ρA, and then clearly
(3) (τη, τ)A =
16=a∈A
|τ(a)|2 =
16=a∈A
τ(1)2 = (p − 1)τ(1)2.
Thus with (1), (2), and (3) we get
(p − 1)|V | =
τ∈Irr(G)
(τη, τ)A +
τ ∈ Irr(GV ),
V 6≤ ker τ
(τη, τ)A
τ∈Irr(G)
(p− 1)τ(1)2 + (k(GV, v) − k(G))(p − 1)
which yields
|V | ≥
τ∈Irr(G)
τ(1)2 + k(GV, v) − k(G) = |G|+ k(GV, v) − k(G).
This implies the assertion of the theorem, and we are done. ✸
The following consequence implies Schmid’s result [5, Theorem 2(a)].
2.3 Corollary. Assume Hypothesis 2.1 and that G has a regular orbit on V . Then
k(GV ) ≤ |V | − |G|+ k(G).
In particular, if k(GV ) = |V |, then G is abelian.
Proof. By Ito’s theorem and as (|G|, |V |) = 1, we know that χ(1) divides |G| for every
χ ∈ Irr(GV ), so in particular p does not divide χ(1). Thus for any v ∈ V # we see that
χ|〈v〉 cannot be an integer multiple of ρ〈v〉. Therefore k(GV, v) = k(GV ). Now the assertion
follows from Theorem 2.2. ✸
3 Bounds for k(GV)
In this section we study more variations of Knörr’s argument in [3, Theorem 2.2] and generalize
it to some non-coprime situations.
We begin, however, by looking at a classical application of it. An important and immediate
consequence of Knörr’s result is that if under Hypothesis 2.1 G has a regular orbit on V , then
k(GV ) ≤ |V |. This important result can be obtained in the following shorter way.
3.1 Proposition. Let G be a finite group and let V be a finite faithful G–module. Let v ∈ V .
k(GV, v) ≤ |CG(v)||V |,
in particular, if (|G|, |V |) = 1 and G has a regular orbit on V , then k(GV ) ≤ |V |.
Proof. PutA = 〈v〉. If τ ∈ Irr(GV, v), then by [4, Corollary 4] we know that
16=a∈A
|τ(a)|2 ≥ p−1.
With this and well–known character theory we get
(p− 1)k(GV, v) ≤ k(GV, v) min
τ∈Irr(GV,v)
16=a∈A
|τ(a)|2
τ∈Irr(GV )
16=a∈A
|τ(a)|2
16=a∈A
τ∈Irr(GV )
τ(a)τ(a)
16=a∈A
|CGV (a)|
16=a∈A
|CG(v)||V |
= (p− 1)|CG(v)||V |
This implies the first result. If (|G|, |V |) = 1, then by Ito’s result τ(1)||G| for all τ ∈ Irr(GV ), so
p cannot divide τ(1), and thus k(GV, v) = k(GV ), and the second result now follows by choosing
v to be in a regular orbit of G on V . ✸
Now we turn to generalizing Knörr’s argument. We discuss various ways to do so.
3.2 Remark. Let G be a finite group and let V be a finite faithful G–module of characteristic
p. Let v ∈ V and put C = CG(v) and A = 〈v〉. Let
Irr(GV,C, v) := Irr0(GV ) := Irr(GV )−{χ ∈ Irr(GV ) | χ|C×〈v〉 = τ ×ρA for a character τ of C}
Irrp′(GV ) = {χ ∈ Irr(GV ) | p does not divide χ(1)},
so that clearly Irrp′(GV ) ⊆ Irr0(GV ).
Note that if (|G|, |V |) = 1, then by Ito Irr(GV ) = Irrp′(GV ).
To work towards our next result, we again proceed somewhat similarly as in [3, Theorem 2.2]. In
the following we work under the hypothesis that (|C|, |V |) = 1. Let N = NG(A). Then |N : C|
divides p − 1. Moreover, from Knörr’s proof we know that if ci (i = 1, . . . , k(C)) with c1 = 1
are representatives of the conjugacy classes of C and aj (j = 1, . . . ,
|N :C|
) are representatives of
the N–conjugacy classes of A − 1 then, the ciaj are representatives of those conjugacy classes
of GV which intersect C × (A− 1) nontrivially.
Moreover recall from Knörr’s proof that for c ∈ C, 1 6= a ∈ A, g ∈ G, u ∈ V we know that
(ca)gu ∈ C ×A if and only if g ∈ N and u ∈ CV (c
Now define a character η on C ×A by η = 1C × (p1A − ρA).
Then for c ∈ C, a ∈ A we have
η(ca) =
p, if a 6= 1
0, if a = 1
Therefore ηGV vanishes on all conjugacy classes of GV which intersect C × (A − 1) trivially,
whereas for c ∈ C, 1 6= a ∈ A we have that
ηGV (ca) =
|C ×A|
g ∈ G
u ∈ V
η̇((ca)gu)
u∈CV (c
η(cgag)
|CV (c
|CV (c)|
= |N : C| |CV (c)|.
Thus if xi (i = 1, . . . , k(GV )) are representatives of the conjugacy classes of GV , then we get
k(GV )
ηGV (xi) =
|N:C|
ηGV (ciaj)
|N:C|
|N : C| |CV (ci)|
|N : C|
|N : C|
|CV (ci)|
= (p− 1)
|CV (ci)|,
and thus
(p − 1)
|CV (ci)| =
k(GV )
ηGV (xi) =
τ∈Irr(GV )
(τηGV , τ)GV
τ∈Irr(GV )
(τη, τ)C×A (1).
Now if τ ∈ Irr(GV ), as in [3] write
τ |C×A =
λ∈Irr(A)
τλ × λ (2)
where τλ is a character of C or τλ = 0.
Then as in [3] we see that
(τη, τ)C×A =
|C ×A|
c ∈ C
a ∈ A
τ(ca)η(ca)τ(ca)
c ∈ C
1 6= a ∈ A
τ(ca)τ(ca)
((τλ − τµ), (τλ − τµ))C (3)
where ”≤” is some arbitrary ordering on Irr(A).
Now if τλ − τµ is a nonzero multiple of ρC , then
(τλ − τµ, τλ − τµ)C ≥ |C| (4)
and thus
(τη, τ)C×A ≥ |C|.
Moreover, note that if τ ∈ Irr0(GV ), then not all τλ− τµ can be equal to 0 as otherwise from (2)
we see that τ |C×A would be equal to τλ × ρA for any λ. So we can partition the set Irr(A) into
two disjoint nonempty subsets Λ1 = {λ ∈ Irr(A) | τλ = τ1} and Λ2 = {λ ∈ Irr(A) | τλ 6= τ1},
and thus as in [3] we see that |Λ1| |Λ2| ≥ p − 1, so there are at least p − 1 pairs λ, µ ∈ Irr(A)
such that τλ − τµ 6= 0. Thus
(τη, τ)C×A ≥ p− 1 for all τ ∈ Irr0(GV ). (5)
Therefore by (1) and (5) we get that
(p− 1)
|CV (ci)| =
τ∈Irr(GV )
(τη, τ)C×A ≥
τ∈Irr0(GV )
(τη, τ)C×A ≥ (p − 1)|Irr0(GV )|
and thus
|Irr0(GV )| ≤
|CV (ci)|. (6)
From now on we assume that C > 1.
Now we repeat the arguments of this proof, but replace η by
η1 = (|C|1C − ρC)× (p1A − ρA),
so for c ∈ C and a ∈ A we have
η1(ca) =
|C|p if c 6= 1 and a 6= 1
0 if c = 1 or a = 1
Now from the above we know that the ciaj (i = 2, . . . , k(C), j = 1, . . . ,
|N :C|
) are representatives
of those conjugacy classes which intersect (C − 1)× (A− 1) nontrivially.
Clearly ηGV1 vanishes on all conjugacy classes of GV which intersect (C − 1)× (A− 1) trivially,
whereas for 1 6= c ∈ C, 1 6= a ∈ A, if (|C|, |V |) = 1, we have that
ηGV1 (ca) =
|C ×A|
g ∈ G
u ∈ V
η̇1((ca)
u∈CV (c
= |N | |CV (c)|.
Next we conclude that
k(GV )
ηGV1 (xi) =
|N:C|
ηGV1 (ciaj) = (p− 1)|C|
|CV (ci)|,
and so as in (1) we see that
(p− 1)|C|
|CV (ci)| =
τ∈Irr(GV )
(τη1, τ)C×A (7).
Now with (2) similarly as in [3] we see that
(τη1, τ)C×A =
|C ×A|
c ∈ C
a ∈ A
τ(ca)η1(ca)τ(ca)
1 6= c ∈ C
1 6= a ∈ A
τ(ca)τ(ca)
1 6= c ∈ C
1 6= a ∈ A
λ∈Irr(A)
τλ(c)λ(a)
µ∈Irr(A)
τµ(c)µ(a)
λ,µ∈Irr(A)
16=c∈C
τλ(c)τµ(c)
16=a∈A
λ(a)µ(a)
= (p − 1)
λ∈Irr(A)
16=c∈C
τλ(c)τλ(c) −
λ, µ ∈ Irr(A)
λ 6= µ
16=c∈C
τλ(c)τµ(c)
λ∈Irr(A)
16=c∈C
τλ(c)τλ(c)−
λ,µ∈Irr(A)
16=c∈C
τλ(c)τµ(c)
16=c∈C
(τλ(c) − τµ(c))(τλ(c)− τµ(c))
16=c∈C
|τλ(c) − τµ(c)|
2 (8)
for some arbitrary ordering ≤ on Irr(A).
Now recall that if τ ∈ Irr0(GV ), then not all of the τλ − τµ can be 0. So choose λ, µ ∈ Irr(C)
such that τλ − τµ 6= 0. If all the τµ (µ ∈ Irr(A)) are integer multiples of ρC then put Λ1 =
{φ ∈ Irr(A) | τφ = τλ} and Λ2 = {φ ∈ Irr(A) | τφ 6= τλ}, so Λ1 6= ∅ and Λ2 6= ∅ and from
0 ≤ (|Λ1| − 1)(|Λ2| − 1) we clearly deduce that |Λ1||Λ2| ≥ p− 1, so there are at least p− 1 pairs
(φ1, φ2) ∈ Irr(A) × Irr(A) such that τφ1 − τφ2 is a nonzero multiple of ρC .
So next we assume that τλ is not a multiple of ρC .
Then put
Γ1 = {φ ∈ Irr(A) | τλ − τφ is a multiple of ρC}
Γ2 = {φ ∈ Irr(A) | τλ − τφ is not a multiple of ρC}.
Clearly λ ∈ Γ1, so Γ1 6= ∅. If Γ2 = ∅, then Irr(A) = Γ1, and if we define Λ1, Λ2 as in the
previous argument, we see that there are at least (p − 1) pairs (φ1, φ2) ∈ Irr(A) × Irr(A) such
that τφ1 − τφ2 is a nonzero multiple of ρC .
So now suppose Γ2 6= ∅. Then |Γ1| + |Γ2| = p, and if φ1 ∈ Γ1 and φ2 ∈ Γ2, then τφ1 − τφ2 =
(τφ1 − τλ) + (τλ − τφ2) clearly is not a multiple of ρC , and by the same argument as used before
we see that |Γ1||Γ2| ≥ p − 1, so there are at least (p − 1) pairs (φ1, φ2) ∈ Irr(A) × Irr(A) such
that τφ1 − τφ2 is not a multiple of ρC .
Altogether we thus have shown that for any τ ∈ Irr0(GV ) one of the following holds:
(A) There are at least (p− 1) pairs (φ1, φ2) ∈ Irr(A)× Irr(A) such that
τφ1 − τφ2 is a nonzero multiple of ρC , or
(B) there are at least (p − 1) pairs (φ1, φ2) ∈ Irr(A)× Irr(A) such that
τφ1 − τφ2 is not a multiple of ρC .
Now it remains to consider two cases:
Case 1: At least half of the τ ∈ Irr0(GV ) satisfy (A).
Then for any of these τ by (3) and (4) we have
(τη, τ)C×A =
((τλ − τµ), (τλ − τµ))C ≥ (p − 1)|C|
and so by (1) we see that
(p − 1)
|CV (ci)| ≥
τ∈Irr0(GV )
(τη, τ)C×A ≥
|Irr0(GV )|(p− 1)|C|
which implies
|Irr0(GV )| ≤
|CV (ci)| (9).
Case 2: At least half of the τ ∈ Irr0(GV ) satisfy (B).
Then for any of these τ by (8) and [4, Corollary 4] we have
(τη1, τ)C×A ≥ (p− 1)(k(C) − 1).
Thus by (7) we have that
(p− 1)|C|
|CV (ci)| ≥
τ∈Irr0(GV )
(τη1, τ)C×A ≥
|Irr0(GV )|(p − 1) · (k(C)− 1)
whence
|Irr0(GV )| ≤
k(C)− 1
|CV (ci)| (10).
Now we drop the assumption (|C|, |V |) = 1 and work towards a general bound for |Irr0(GV )|.
For this, fix g0 ∈C such that g0 is of prime order q and put C0 = 〈g0〉 andN0 = NG(C0). Trivially
there are at most |C0|(p − 1) = q(p − 1) conjugacy classes of GV that intersect C0 × (A − 1)
nontrivially, and given 1 6= c ∈ C0, 1 6= a ∈ A, we see that for g ∈ G, u ∈ V
(ca)gu = cg[cg, u]ag ∈ C0 ×A first implies c
g ∈ C0, i.e., g ∈ N0,
and for each fixed g ∈ N0, the equation [c
g, u]ag ∈ A implies [cg, u] ∈ Aa−g which has at most
|CV (c
g)| |Ag−1| = p|CV (g0)| solutions u.
Moreover, if c = 1, then
(ca)gu = agu = ag implies g ∈ NG(A) = N and u ∈ V.
Now we define the character η2 on C0 ×A by η2 = 1C0 × (p1A − ρA). Thus η
2 vanishes on all
conjugacy classes of GV which intersect C0×(A−1) trivially, whereas for 1 6= c ∈ C0, 1 6= a ∈ A
we get
ηGV2 (ca) =
|C0 ×A|
g ∈ G
u ∈ V
η̇ ((ca)gu)
p|CV (g0)|p
|N0||CV (g0)|,
and for c = 1, 1 6= a ∈ A we get
ηGV2 (ca) = η
2 (a) =
|V |p =
|N ||V |.
Thus if xi (i = 1, . . . , k(GV )) are representatives of the conjugacy classes of GV , then
k(GV )
ηGV2 (xi) ≤ (p − 1)
|N ||V |+ (q − 1)(p − 1)
|N0||CV (g0)|
and as in (1) we see that
k(GV )
ηGV2 (xi) =
τ∈Irr(GV )
(τη2, τ)C0×A.
Now arguing as in (2), (3), (5) and (6) above will yield
|Irrp′(GV )| ≤ k(GV, v) ≤ |Irr(GV,C0, v)| ≤
(|N ||V |+ (q − 1)p|N0||CV (g0)|),
where Irr(GV,C0, v) is as defined at the beginning of Remark 3.2. Putting the main results
together, altogether we have proved the following:
3.3 Theorem. Let G be a finite group and let V be a finite faithful G–module of characteristic
p. Let v ∈ V and put C = CG(v). If ci (i = 1, . . . , k(C)) are representatives of the conjugacy
classes of C, then the following hold:
(a) If (|C|, |V |) = 1, then
|Irr0(GV )| ≤
|CV (ci)|
and if C > 1, then
|Irr0(GV )| ≤ max
|CV (ci)|,
k(C)− 1
|CV (ci)|
(b) If (|G|, |V |) = 1, then
Irr0(GV ) = Irr(G), so k(GV ) = |Irr0(GV )|
and the bounds in (a) hold true for k(GV ) instead of |Irr0(GV )|.
(c) In general, if g ∈ C such that o(g) = q is a prime, then
|Irrp′(GV )| ≤ k(GV, v) ≤
|NG(〈v〉)||V |+ (q − 1)p|NG(〈g〉)||CV (g)|
4 The dual approach
In the previous section, we always fixed v ∈ V and obtained bounds on the size of suitable subsets
of Irr(GV ) in terms of properties of the action of CG(v) on V . In this section we consider a
”dual” approach:
We fix g ∈ G and find bounds in terms of the action of CG(g) on CV (g). For this, put
Irrg(GV ) = {χ ∈ Irr(G) | χ|〈g〉×CV (g) cannot be written as ρ〈g〉×ψ for a character ψ of CV (g)}.
In particular, Irr(GV, g) ⊆ Irrg(GV ).
4.1 Theorem. Let G be a finite group and V be a finite G–module. Let g ∈ G such that
(o(g), |V |) = 1. Write A = 〈g〉, N = NG(A) and C = CV (g). Then
(a) |Irrg(GV )| ≤
(n(N,A)−1)n(CG(A),C)
(|A|−1)|C|
max16=a∈A(|NG(〈a〉)||CV (a)|)
(b) if g is of prime order, then
|Irrg(GV )| ≤ |CG(A)|n(CG(A), C)
(c) there are X,Y ⊆ Irrg(GV ) such that Irrg(GV ) is a disjoint union of X and Y and
|X| ≤
(n(N,A)− 1)n(CG(A), C)
(|A| − 1)|C|2
16=a∈A
(|NG(〈a〉)||CV (a)|) and
|Y | ≤
(n(N,A)− 1)(n(CG(A), C)− 1)
(|A| − 1)|C|
16=a∈A
(|NG(〈a〉)||CV (a)|)
(d) if g is of prime order and X,Y are as in (c), then
|X| ≤
|CG(A)|n(CG(A), C)
and |Y | ≤ |CG(A)|(n(CG(A), C)− 1)
Proof. If a1, a2 ∈ A and c1, c2 ∈ C − {1}, then it is straightforward to see that (a1, c1)
(a2, c2)
GV implies that aG1 = a
2 . Hence if T is a set of representatives of the orbits of N on
A−{1}, then every conjugacy class of GV that intersects nontrivially with (A−{1})×C has a
representative ac for some a ∈ T and some c ∈ C. Moreover, for each a ∈ T we have that if c3,
c4 ∈ C are CG(A)–conjugate, then ac3 and ac4 are CG(A)–conjugate and thus (ac3)
G = (ac4)
This shows that for each a ∈ T there are at most n(CG(A), C) conjugacy classes of GV inter-
secting nontrivially with {a} × C. Hence altogether we see that there are at most
|T |n(CG(A), C) = (n(N,A) − 1)n(CG(A), C) (1)
conjugacy classes of GV which intersect (A− {1}) × C nontrivially.
Moreover observe that for 1 6= a ∈ A, c ∈ C, h ∈ G and u ∈ V we have
(ac)hu ∈ A× C if and only if h ∈ NG(〈a〉), c
h ∈ C and u ∈ CV (a)
because the condition (ac)hu = ah[ah, u]ch ∈ A × C first forces ah ∈ A which implies (as A is
cyclic) ah ∈ 〈a〉, so h ∈ NG(〈a〉), and then as c ∈ C ≤ CV (〈a〉), it follows that c
h ∈ CV (〈a〉)
and [ah, u] ∈ [〈a〉, V ]. Now as by our hypothesis we have V = CV (〈a〉) × [〈a〉, V ], we see that
(ac)hu ∈ A× C now forces [ah, u] = 1 and ch ∈ C. Hence u ∈ CV (a
h) = CV (a).
Note that the direct product A×C is a subgroup of GV . We now define a generalized character
η on A× C by
η = (|A| · 1A − ρA)× 1C
where ρA is the regular character of A. So for a ∈ A, c ∈ C we have
η(ac) =
0, a = 1
|A|, a 6= 1
Therefore ηGV vanishes on all conjugacy classes of GV which intersect (A− {1}) × C trivially,
whereas for c ∈ C and 1 6= a ∈ A we have
ηGV (ac) =
|A× C|
h ∈ G
u ∈ V
η̇((ac)hu)
|A||C|
h ∈ NG(〈a〉)
with ch ∈ C
u∈CV (a)
η((ac)hu)
|A||C|
h ∈ NG(〈a〉)
with ch ∈ C
u∈CV (a)
η(ahch)
|CV (a)|
|A||C|
h ∈ NG(〈a〉)
with ch ∈ C
|NG(〈a〉)||CV (a)|
Thus if {xi | i = 1, . . . , k(GV )} is a set of representatives for the conjugacy classes of GV , then
by (1) and (2) we see that
(n(N,A) − 1)n(CG(A), C) ·
16=a∈A
(|NG(〈a〉)||CV (a)|) ≥
k(GV )
ηGV (xi)
τ∈Irr(GV )
(τηGV , τ)GV
τ∈Irr(GV )
(τη, τ)A×C (3).
Observe that in case that A is of prime order, then
n(N,A)− 1 =
|A| − 1
|N : CG(A)|
(|A| − 1)|CG(A)|
and max
16=a∈A
(|NG(〈a〉)||CV (a)|) = |N ||C|, so that (3) becomes
|CG(A)|(|A| − 1)n(CG(A), C) ≥
τ∈Irr(GV )
(τη, τ)A×C (3a)
Since A× C is a direct product, we can write
τA×C =
λ∈Irr(C)
(τλ × λ),
where τλ is a character of A or τλ = 0. Then
(τη, τ)A×C =
|A× C|
a ∈ A
c ∈ C
τ(ac)η(ac)τ(ac)
|A||C|
1 6= a ∈ A
c ∈ C
τ(ac)|A|τ(ac)
1 6= a ∈ A
c ∈ C
λ∈Irr(C)
τλ(a)λ(c)
µ∈Irr(C)
τµ(a)µ(c)
16=a∈A
λ,µ∈Irr(C)
τλ(a)τµ(a)
λ(c)µ(c)
16=a∈A
λ,µ∈Irr(C)
τλ(a)τµ(a)(λ, µ)C
As (λ, µ)C =
1, λ = µ
0, λ 6= µ
, we further obtain
(τη, τ)A×C =
16=a∈A
λ∈Irr(C)
τλ(a)τλ(a)
λ∈Irr(C)
16=a∈A
|τλ(a)|
2 (4)
Now observe that τ(1) =
λ∈Irr(C)
τλ(1).
If all the τλ are multiples of ρA, then clearly τ1 6∈ Irrg(GV ), and so if τ ∈ Irrg(GV ), then by [4,
Corollary 4] with (4) we see that
(τη, τ)A×C ≥ |A| − 1 (5)
So (3) and (5) yield
|Irrg(GV )| ≤
(n(N,A)− 1)n(CG(A), C)
(|A| − 1)|C|
16=a∈A
(|NG(〈a〉)||CV (a)|), (6)
and if g is of prime order, then (3a) and (5) yield
|Irrg(GV )| ≤ |CG(A)|n(CG(A), C). (6a)
Now as in Section 3, we now repeat the same arguments, but use
η1 = (|A|1A − ρA)× (|C|1C − ρC)
instead of η.
One can then easily check that
(n(N,A)− 1)(n(CG(A), C) − 1) ·
16=a∈A
(|NG(〈a〉)||CV (a)|) ≥
τ∈Irr(GV )
(τη1, τ)A×C (3b)
and if g is of prime order, then
|CG(A)|(|A| − 1)(n(CG(A), C) − 1) ≥
τ∈Irr(GV )
(τη1, τ)A×C (3c)
Moreover it is easily seen that
(τη1, τ)A×C =
1 6= a ∈ A
1 6= c ∈ C
τ(ac)τ(ac)
16=a∈A
λ,µ∈Irr(C)
τλ(a)τµ(a)
16=c∈C
λ(c)µ(c),
and as
16=c∈C
λ(c)µ(c) =
−1, if λ 6= µ
|C| − 1, if λ = µ
, it follows that
(τη1, τ)A×C =
16=a∈A
|τλ(a)− τµ(a)|
2 (7)
where ”≤” is an arbitrary ordering on Irr(C).
Next suppose that there are exactly a characters τ ∈ Irrg(GV ) such that there is a character
ψ of A (depending on τ) and there are aλ ∈ ZZ (λ ∈ Irr(C)) such that τλ = ψ + aλρA for all
λ ∈ Irr(C) and ψ is not a multiple of ρA. Then by (4) and [4, Corollary 4] we know that
(τη, τ)A×C =
λ∈Irr(C)
16=a∈A
|ψ(a)|2 ≥ |C|(|A| − 1)
and hence by (3) we get
(n(N,A)− 1)n(CG(A), C)
(|A| − 1)|C|2
16=a∈A
(|NG(〈a〉)||CV (a)|), (8)
and if g is of prime order, then by (3a) even
|CG(A)|n(CG(A), C)
Now let b be the number of τ ∈ Irrg(GV ) such that there is no such ψ.
Then there exist λ, µ ∈ Irr(C) with
16=a∈A
|τλ(a)− τµ(a)|
2 6= 0,
and thus by [4, Corollary 4] we have
(τη1, τ) ≥ |A| − 1 (9)
So (3b) and (9) yield
(n(N,A)− 1)(n(CG(A), C) − 1)
|C|(|A| − 1)
16=a∈A
(|NG(〈a〉)||CV (a)|) (10)
and, if g is of prime order, then by (3c)
b ≤ |CG(A)|(n(CG(A), C) − 1), (10b)
and clearly a+ b = |Irrg(GV )|, and hence all the assertions follow and we are done. ✸
References
[1] R. Guralnick, P. H. Tiep, The non–coprime k(GV )–problem, J. Algebra 279 (2004), 694–
[2] T. M. Keller, Fixed conjugacy classes of normal subgroups and the k(GV )–problem, J.
Algebra 305 (2006), 457–486.
[3] R. Knörr, On the number of characters in a p–block of a p–solvable group, Illinois J. Math
28 (1984), 181–209.
[4] G. R. Robinson, A bound on norms of generalized characters with applications, J. Algebra
212 (1999), 660–668.
[5] P. Schmid, Some remarks on the k(GV )–theorem, J. Group Theory 8 (2005), 589–604.
ABSTRACT
  Let $G$ be a finite group and $V$ be a finite $G$--module. We present upper
bounds for the cardinalities of certain subsets of $\Irr(GV)$, such as the set
of those $\chi\in\Irr(GV)$ such that, for a fixed $v\in V$, the restriction of
$\chi$ to $<v>$ is not a multiple of the regular character of $<v>$. These
results might be useful in attacking the non--coprime $k(GV)$--problem.

<|endoftext|><|startoftext|>
Introduction
1.1 The setup
The study of lattice effective interface models, continous and discrete, has a long tradi-
tion in statistical mechanics [14, 5, 9, 10, 13, 2, 3, 4].
The model we study is given in terms of variables ϕi ∈ R which, physically speaking,
are thought to represent height variables of a random surface at the sites i ∈ Zd.
Mathematically speaking they are just continuous unbounded (spin) variables. The
model is defined in terms of: a pair potential V , a quenched random term, and a
pinning term at interface height zero.
More precisely, we are interested in the behavior of the quenched finite-volume Gibbs
measures in a finite volume Λ⊂Zd with fixed boundary condition at height zero, given
University of Groningen, Department of Mathematics and Computing Sciences, Blauwborgje 3,
9747 AC Groningen, The Netherlands kuelske@math.rug.nl, http://www.math.rug.nl/∼kuelske/
Dipartimento di Matematica, Universit degli Studi ”Roma Tre”, Largo San Leonardo Murialdo, 1,
00146 Roma, ITALY, orlandi@mat.uniroma3.it , http://www.mat.uniroma3.it/users/orlandi/
http://arxiv.org/abs/0704.0582v1
http://www.math.rug.nl/~kuelske/
µε,Λ[η](dϕΛ)
〈i,j〉∈Λ V (ϕi−ϕj)−
i∈Λ,j∈Λc,|i−j|=1 V (ϕi)+
i∈Λ ηiϕi
i∈Λ(dϕi + εδ0(dϕi))
Zε,Λ[η]
where the partition function Zε,Λ[η] denotes the normalization constant that turns the
last expression into a probability measure. The Dirac-measures at the interface height
zero are multiplied with the parameter ε, having the meaning of a coupling strength.
The disorder configuration η = (ηi)i∈Rd denotes an arbitrary fixed configuration of
external fields, modelling a ”quenched” (or frozen) random environment.
What do we expect for such a model? Recall that the variance of a free massless
interface in a finite box diverges like the logarithm of the sidelength when there are no
random fields. Adding an arbitrarily small pinning ε (without disorder) always localizes
the interface uniformly in the volume, with the variance of the field behaving on the scale
| log ε| when ε tends to zero. Indeed, there is a beautiful and complete mathematical
understanding of the model without disorder, in the case of both Gaussian and uniformly
elliptic potentials (see [1, 7]) with precise asymptotics as the pinning force tends to zero.
These results follow from the analysis of the distribution of pinned sites and the random
walk (arising from the random walk representation of the covariance of the ϕi’s) with
killing at these sites. In this sense there is already a random system that needs to be
analyzed even without disorder in the original model.
What do we expect if we turn on randomness in the model and add the ηi’s ? Let
us review first what we know about the same model without a pinning force. In d = 2
we recently proved the deterministic lower bound µΛN [η](|ϕ0| ≥ t
logL) ≥ c exp(−ct2)
uniformly for any fixed disorder configuration η, for general potentials V (assuming not
too slow growth at infinity) [12]. So, it is not possible to stabilize an interface by cleverly
choosing a random field configuration (one could think e.g. that this might be possible
with a staggered field). As this result holds at any arbitrary fixed configuration here
we don’t need any assumptions on the distribution of random fields. This result clearly
excludes the existence of an infinite-volume Gibbs measure describing a two dimensional
interface in infinite volume in the presence of random fields. In another paper [8] the
question of existence of gradient Gibbs measures (Gibbs distributions of the increments
of the interface) in infinite volume was raised. Note that while interface states may not
exist in the infinite volume such gradient states may very well exist, as the example of
the two-dimensional Gaussian free field shows, by computation. (For existence beyond
the Gaussian case which is far less trivial, see [10, 11].) It was proved in [8] that there
are no such gradient Gibbs measures in the random model in dimension d = 2.
Now, turn to the full model in d = 2. In view of the localization taking place at any
positive pinning force ε without disorder, a natural guess might be that with disorder
at least at very large ε there would be pinning. However, we show as a result of the
present paper that this is not the case, somewhat to our own surprise, and an arbitrarily
strong pinning does not suffice to keep the interface bounded.
1.2 Main results
Delocalization in d = 2 - superextensivity of the overlap
Denote by ΛL the square of sidelength 2L+ 1 centered at the origin.
In this subsection we consider the disorder average of the overlap in ΛL showing
that it grows faster than the volume. This in particular implies that in two dimensions
there is never pinning, for arbitrarily weak random field and arbitrarily large pinning
forces ε. Here is the result.
Theorem 1.1 Assume that supt V
′′(t) ≤ 1, lim inf |t|↑∞
log V (t)
log |t|
> 1, and let ηi be sym-
metrically distributed, i.i.d. with finite second moment.
Let d = 2. Then there is a constant a > 0, independent of the distribution of the
random fields and the pinning strength ε ≥ 0, such that
lim inf
L2 logL
ηiµε,ΛL [η](ϕi)
≥ aE(η20). (2)
Note that the growth condition on V includes the quadratic case and ensures the
finiteness of the integrals appearing in (1) for all arbitrarily fixed choices of η, even at
ε = 0.
Generalizations to interactions that are non-nearest neighbor are obvious; all results
go through e.g. for finite range and we skip them in this presentation for the sake of
simplicity. We like to exhibit the case of Gaussian random fields (and not necessarily
Gaussian potential V ) since the bound acquires a form that looks even more striking
because it becomes independent of the size of the variance of the ηi’s (as long as this is
strictly positive).
Corollary 1.2 Let us assume that the random fields ηi have an i.i.d. Gaussian distri-
bution with mean zero and strictly positive variance of arbitrary size.
Then, with the same constant a as above, we have the bound
lim inf
L2 logL
µε,ΛL [η](ϕ
i )− µε,ΛL [η](ϕi)
≥ a > 0 (3)
for any 0 ≤ ε < ∞.
(3) follows from (2) by partial integration w.r.t. the Gaussian disorder average
(transforming the overlap into the variance of the ϕi’s).
Note that, even in the unpinned case of ε = 0, Theorem 1.1 is not entirely trivial in
the case of general potentials V . Here it provides an alternative simple way to see the
delocalization in the presence of random fields (while the explicit lower bound on the
tails of [12] provides more information.)
Lower bound on overlap in d ≥ 3
The analogue of Theorem 1.1 for higher dimensions is the following.
Theorem 1.3 Let d ≥ 3 and let ε ≥ 0 be arbitrary and assume the same conditions on
V and ηi as in Theorem 1.1.
Then there are positive constants B1, B2 < ∞, independent of the distribution of the
random fields and the pinning strength ε ≥ 0, such that
lim inf
ηiµε,ΛL[η](ϕi)
E(η20) (−∆−1)0,0
− log(B1 +B2ε) (4)
where the positive constant (−∆−1)0,0 is the diagonal element of the inverse of the
infinite-volume lattice Laplace operator whose existence is guaranteed in d ≥ 3.
Lower bound on the pinned volume in d ≥ 3
We complement the previous lower bounds on the overlaps which are depinning-type
of results by a pinning-type result. It is a lower bound on the disorder average of the
quenched Gibbs-expectation of the fraction of pinned sites. While we needed an upper
bound on the interaction potential V before we are assuming now a lower bound on V .
Theorem 1.4 Let d ≥ 3. Assume that inft V ′′(t) = c− > 0 and let ηi be symmetrically
distributed, i.i.d. with finite second moment.
Then there exist dimension-dependent constants C1, C2 > 0, independent of the
distribution of the disorder, such that, for all ε and for all volumes Λ, the disorder
average of the fraction of pinned sites obeys the estimate
µε,Λ[η](ϕi = 0)
≥ 1− C1 + C2E(η
log ε
. (5)
This shows pinning for the large ε regime in the ”thermodynamic sense” that the
fraction of pinned sites can be made arbitrarily close to one, uniformly in the volume.
As usual this result does not allow to make statement about the Gibbs measure itself.
The proofs follows from ”thermodynamic reasoning”. The first ”depinning-type”
result follows from taking the log of the partition function and differentiating and in-
tegrating back w.r.t. the coupling strength of the random fields. Exploiting the linear
form of the random fields, convexity, comparison of non-Gaussian with the Gaussian
partition functions, and asymptotics of Green’s functions the results follow, see Chapter
2 Proof of Depinning-type results
The estimates in formulas (2), (3), and (4) are immediate consequences of the following
fixed-disorder estimate.
Proposition 2.1 For any dimension d, there are constants CnG,d < ∞ and cG,d > 0
such that, for all fixed configurations of local fields η, we have
i,j∈Λ
(−∆Λ)−1i,j ηiηj − |Λ| log
CnG,d + ε
ηiµε,Λ[η](ϕi). (6)
Proof of the Proposition: Let us see what comes out when we differentiate and
integrate back the free energy in finite volume w.r.t. strength of the random fields.
logZε,Λ[hη] =
ηiµε,Λ[hη](ϕi). (7)
At every fixed η, this quantity is a monotone function of h, which is seen by another
differentiation w.r.t. h which produces the variance. We have
Zε,Λ[η]
Zε,Λ[0]
dhηiµε,Λ[hη](ϕi) ≤
ηi µε,Λ[η](ϕi). (8)
We note the lower bound on the numerator which we get by dropping the pinning term,
giving us
Zε,Λ[η] ≥ Zε=0,Λ[η]
≥ ZGaussε=0,Λ[η]
= exp
i,j∈Λ
(−∆Λ)−1i,j ηiηj
ZGaussε=0,Λ[0]
≥ exp
i,j∈Λ
(−∆Λ)−1i,j ηiηj
Here we have denoted by ZGaussε=0,Λ[η] the Gaussian partition function with potential V (t) =
Further we used that the lower bound on V (t) taken from the hypothesis implies
that, for any partition function in any volume D, we have Zε=0,D[0] ≤ C |D|nG,d. This gives
Zε,Λ[0] =
ε|A|Zε,Λ\A[0]
ε|A|C
|Λ\A|
nG,d = (CnG,d + ε)
So the desired estimate on the overlap follows from (8),(9),(10). This concludes the
proof of the Proposition. �
It is easy to obtain the Theorems 1.1 and 1.3 from the proposition. Indeed, taking
a disorder average we have
E(η20)
(−∆Λ)−1i,i − |Λ| log
CnG,d + ε
ηiµε,Λ[η](ϕi)
. (11)
Now use the asymptotics of the Green’s-function in a square (−∆ΛL)
i,i ∼ logL at fixed
i to get the first theorem. The proof of the case d ≥ 3 follows from the existence of the
infinite-volume Green’s-function in d ≥ 3.
Finally let us note in passing that a constant magnetic field is always winning against
an arbitrarily strong pinning, and even more strongly than a random field. Indeeed, let
d ≥ 2, let ηi = h ≥ 0 for all sites i and let ε ≥ 0 be arbitrary. Then, there is a constant
cd > 0, independent of h and ε, such that
lim inf
µε,ΛL [h](ϕi) ≥ cdh. (12)
This again follows from the Proposition, using
i,j∈Λ(−∆ΛL)
i,j ∼ Ld+2.
3 Proof of Pinning-type results
To prove the lower bound on the fraction of pinning sites in dimension d ≥ 3 given in
Theorem 1.4 we will in fact prove the following fixed-disorder lower bound:
For all finite volumes Λ and for all realizations η we have, for any ε0 > 0
µε,Λ[η](ϕi = 0)
log ε
2c−|Λ|
i,j∈Λ
(−∆Λ)−1i,j ηiηj
with a constant CG,d defined in (21).
Taking a disorder-expectation (5) follows by the finiteness of Green’s function in the
infinite volume (−∆
)−10,0 with ε0 = 1. �
Proof of (13): The proof is based on the trick to differentiate and integrate back
the log of the partition function, now w.r.t. ε: Differentiation gives
logZε,Λ[η] =
µε,Λ[η](ϕi = 0). (14)
We integrate this relation back, and it will be important for us to do it starting from a
positive ε0 > 0. So we get
Zε,Λ[η]
Zε0,Λ[η]
µε̃,Λ[η](ϕi = 0) ≤ log
µε,Λ[η](ϕi = 0) (15)
where we have used that
i∈Λ µε̃,Λ[η](ϕi = 0) is a monotone function of ε̃. Note that
the integrand itself is not a monotone function. (Compare [6] for a related non-random
pinning scenario, with back-integration from zero.)
Now we have the trivial lower bound obtained by keeping only the contribution in
the expansion where all sites are pinned, i.e.
Zε,Λ[η] ≥ ε|Λ|. (16)
For the upper bound on the partition function of the full model (at ε0) we first use
the lower bound on the potential V (t) ≥ c−t
giving us a comparison with a Gaussian
partition function with curvature c−:
Zε0,Λ[η] ≤ Z
Gauss,c−
[η]. (17)
It is a simple matter to rescale the Gaussian curvature away
Gauss,c−
[η] = c−
2 ZGauss
2 η] (18)
where the partition function on the r.h.s. is taken with unity curvature potential. For
the Gaussian partition function we claim the upper bound (writing again in the original
parameters) of the form
ZGaussε,Λ [η] ≤
ZGaussε=0,Λ[η]. (19)
Here is an elementary proof: We will replace successively the single-site integrations
involving the Dirac measure by integrations only over the Lebesgue measure with the
appropriately adjusted prefactor. Indeed, consider one site i and compute the contri-
bution to the partition function while fixing the values of ϕj for j not equal to i. Then
use that
dϕi + εδ0(dϕi)
ϕj + ηi)ϕi
= (2π)
2 exp
j∼i ϕj + ηi)
2 exp
j∼i ϕj + ηi)
dϕi exp
ϕj + ηi)ϕi
and iterate over the sites.
For the Gaussian unpinned partition function use
ZGaussε=0,Λ[η] = exp
i,j∈Λ
(−∆Λ)−1i,j ηiηj
ZGaussε=0,Λ[0]
≤ exp
i,j∈Λ
(−∆Λ)−1i,j ηiηj
with a suitable constant. From here (5) follows from (15,16,17,18,19,21) �
Acknowledgements: The authors thank Pietro Caputo for an interesting discus-
sion and Aernout van Enter for comments on a previous draft of the manuscript. C.K.
thanks the university Roma Tre for hospitality.
References
[1] E. Bolthausen, Y. Velenik, Critical behavior of the massless free field at the depinning
transition. Comm. Math. Phys. 223, 161-203, 2001.
[2] M. Biskup and R. Kotecký, Phase coexistence of gradient Gibbs states. Published Online
in Probab. Theory Rel. Fields DOI 10.1007/s00440-006-0013-6, 2007.
[3] A. Bovier and C. Külske, A rigorous renormalization group method for interfaces in random
media. Rev. Math. Phys. 6, 413–496, 1994.
[4] A. Bovier and C. Külske, There are no nice interfaces in (2 + 1)-dimensional SOS models
in random media, J. Statist. Phys., 83: 751–759, 1996.
[5] J. Bricmont, A. El Mellouki, and J. Fröhlich, Random surfaces in statistical mechanics:
roughening, rounding, wetting, . . . J. Statist. Phys. 42, 743–798, 1986.
[6] P. Caputo, Y. Velenik, A note on wetting transition for gradient fields. Stochastic Process.
Appl. 87, 107–113, 2000.
[7] J.-D. Deuschel, Y. Velenik, Non-Gaussian surface pinned by a weak potential. Probab.
Theory Related Fields 116, 359-377, 2000.
[8] A. C. D. van Enter, C. Külske, Non-existence of random gradient Gibbs measures in contin-
uous interface models in d = 2., math.PR/0611140, to be published in Annals of Applied
Probability
[9] G. Forgacs, R. Lipowski and Th.M. Nieuwenhuizen, The Behaviour of Interfaces in Ordered
and Disordered Systems, in Phase Transitions and Critical Phenomena, vol. 14, edited by
C. Domb and J.L. Lebowitz, Academic Press, 1986.
[10] T. Funaki, Stochastic Interface models. 2003 Saint Flour lectures, Springer Lecture Notes
in Mathematics, 1869, 103–294, 2005.
[11] T. Funaki and H. Spohn, Motion by mean curvature from the Ginzburg-Landau ∇ϕ inter-
face model. Comm. Math. Phys. 185, 1–36, 1997.
[12] C. Külske, E. Orlandi, A simple fluctuation lower bound for a disordered massless random
continuous spin model in d = 2. Electronic Comm. Probab. 11 200-205 (2006)
[13] S. Sheffield, Random surfaces, large deviations principles and gradient Gibbs measure clas-
sifications. arXiv math.PR/0304049, Asterisque 304, 2005.
[14] Y. Velenik, Localization and delocalization of random interfaces. Probability Surveys 3,
112-169, 2006.
http://arxiv.org/abs/math/0611140
	Introduction
	The setup
	Main results
	Proof of Depinning-type results
	Proof of Pinning-type results
ABSTRACT
  We consider statistical mechanics models of continuous height effective
interfaces in the presence of a delta-pinning at height zero. There is a
detailed mathematical understanding of the depinning transition in 2 dimensions
without disorder. Then the variance of the interface height w.r.t. the Gibbs
measure stays bounded uniformly in the volume for any positive pinning force
and diverges like the logarithm of the pinning force when it tends to zero.
  How does the presence of a quenched disorder term in the Hamiltonian modify
this transition? We show that an arbitarily weak random field term is enough to
beat an arbitrarily strong delta-pinning in 2 dimensions and will cause
delocalization. The proof is based on a rigorous lower bound for the overlap
between local magnetizations and random fields in finite volume. In 2
dimensions it implies growth faster than the volume which is a contradiction to
localization. We also derive a simple complementary inequality which shows that
in higher dimensions the fraction of pinned sites converges to one when the
pinning force tends to infinity.

<|endoftext|><|startoftext|>
Introduction
A unital and separable C∗-algebra D 6= C is strongly self-absorbing if there is an
isomorphism D
→ D ⊗ D which is approximately unitarily equivalent to the inclusion
map D → D ⊗ D, d 7→ d ⊗ 1D ([14]). Strongly self-absorbing C
∗-algebras are known to
be simple and nuclear; moreover, they are either purely infinite or stably finite. The only
known examples of strongly self-absorbing C∗-algebras are the UHF algebras of infinite
type (i.e., every prime number that occurs in the respective supernatural number occurs
with infinite multiplicity), the Cuntz algebras O2 and O∞, the Jiang–Su algebra Z and
tensor products of O∞ with UHF algebras of infinite type, see [14]. All these examples
are K1-injective, i.e., the canonical map U(D)/U0(D) → K1(D) is injective.
It was observed in [14] that any two unital ∗-homomorphisms σ, γ : D → A ⊗ D are
approximately unitarily equivalent, were A is another unital and separable C∗-algebra.
If D is K1-injective, the unitaries implementing the equivalence may even be chosen to
Date: August 3, 2021.
2000 Mathematics Subject Classification. 46L05, 47L40.
Key words and phrases. Strongly self-absorbing C∗-algebras, KK-theory, asymptotic unitary
equivalence, continuous fields of C∗-algebras.
Supported by: The first named author was partially supported by NSF grant #DMS-0500693.
The second named author was supported by the DFG (SFB 478).
http://arxiv.org/abs/0704.0583v1
2 MARIUS DADARLAT AND WILHELM WINTER
be homotopic to the unit. When D is O2, O∞, it was known that σ and γ are even
asymptotically unitarily equivalent – i.e., they can be intertwined by a continuous path of
unitaries, parametrized by a half-open interval. Up to this point, it was not clear whether
the respective statement holds for the Jiang–Su algebra Z. Theorem 2.2 below provides
an affirmative answer to this problem. Even more, we show that the path intertwining σ
and γ may be chosen in the component of the unit.
We believe this result, albeit technical, is interesting in its own right, and that it will
be a useful ingredient for the systematic further use of strongly self-absorbing C∗-algebras
in Elliott’s program to classify nuclear C∗-algebras by K-theory data. In fact, this point
of view is our main motivation for the study of strongly self-absorbing C∗-algebras; see
[8], [10], [16], [17], [18] and [15] for already existing results in this direction.
For the time being, we use Theorem 2.2 to derive some consequences for the Kasparov
groups of the form KK(D, A ⊗ D). More precisely, we show that all the elements of the
Kasparov group KK(D, A ⊗ D) are of the form [ϕ] − n[ι] where ϕ : D → K ⊗ A ⊗ D
is a ∗-homomorphism and ι : D → A ⊗ D is the inclusion ι(d) = 1A ⊗ d and n ∈ N.
Moreover, two non-zero ∗-homomorphisms ϕ,ψ : D → K⊗A⊗D with ϕ(1D) = ψ(1D) = e
have the same KK-theory class if and only if there is a unitary-valued continuous map
u : [0, 1) → e(K ⊗ A ⊗ D)e, t 7→ ut such that u0 = e and limt→1 ‖ut ϕ(d)u
t − ψ(d)‖ = 0
for all d ∈ D. In addition, we show that KKi(D,D ⊗A) ∼= Ki(D ⊗A), i = 0, 1.
One may note the similarity to the descriptions of KK(O∞,O∞ ⊗ A) ([8],[10]) and
KK(C,C ⊗ A). However, we do not require that D satisfies the universal coefficient
theorem (UCT) in KK-theory. In the same spirit, we characterize O2 and the universal
UHF algebra Q using K-theoretic conditions, but without involving the UCT.
As another application of Theorem 2.2 (and the results of [7]), we prove in [4] an
automatic trivialization result for continuous fields with strongly self-absorbing fibres over
finite dimensional spaces.
The second named author would like to thank Eberhard Kirchberg for an inspiring
conversation on the problem of proving Theorem 2.2.
1. Strongly self-absorbing C∗-algebras
In this section we recall the notion of strongly self-absorbing C∗-algebras and some facts
from [14].
1.1 Definition: Let A, B be C∗-algebras and σ, γ : A → B be ∗-homomorphisms.
Suppose that B is unital.
ON THE KK-THEORY OF STRONGLY SELF-ABSORBING C∗-ALGEBRAS 3
(i) We say that σ and γ are approximately unitarily equivalent, σ ≈u γ, if there is a
sequence (un)n∈N of unitaries in B such that
‖unσ(a)u
n − γ(a)‖
for every a ∈ A. If all un can be chosen to be in U0(B), the connected component of
1B of the unitary group U(B), then we say that σ and γ are strongly approximately
unitarily equivalent, written σ ≈su γ.
(ii) We say that σ and γ are asymptotically unitarily equivalent, σ ≈uh γ, if there is
a norm-continuous path (ut)t∈[0,∞) of unitaries in B such that
‖utσ(a)u
t − γ(a)‖
for every a ∈ A. If one can arrange that u0 = 1B and hence (ut ∈ U0(B) for
all t), then we say that σ and γ are strongly asymptotically unitarily equivalent,
written σ ≈suh γ.
1.2 The concept of strongly self-absorbing C∗-algebras was formally introduced in [14,
Definition 1.3]:
Definition: A separable unital C∗-algebra D is strongly self-absorbing, if D 6= C and
there is an isomorphism ϕ : D → D ⊗D such that ϕ ≈u idD ⊗ 1D.
1.3 Recall [14, Corollary 1.12]:
Proposition: Let A and D be unital C∗-algebras, with D strongly self-absorbing. Then,
any two unital ∗-homomorphisms σ, γ : D → A⊗D are approximately unitarily equivalent.
In particular, any two unital endomorphisms of D are approximately unitarily equivalent.
We note that the assumption that A is separable which appears in the original statement
of [14, Corollary 1.12] is not necessary and was not used in the proof.
1.4 Lemma: Let D be a strongly self-absorbing C∗-algebra. Then there is a sequence
of unitaries (wn)n∈N in the commutator subgroup of U(D ⊗ D) such that for all d ∈ D
‖wn(d⊗ 1D)w
n − 1D ⊗ d‖ → 0 as n→ ∞.
Proof: Let F ⊂ D be a finite normalized set and let ε > 0. By [14, Prop. 1.5] there is a
unitary u ∈ U(D⊗D) such that ‖u(d⊗1D)u
∗−1D⊗d‖ < ε for all d ∈ F . Let θ : D⊗D → D
be a ∗-isomorphism. Then ‖(θ(u∗) ⊗ 1D)u(d ⊗ 1D)u
∗(θ(u) ⊗ 1D) − 1D ⊗ d‖ < ε for all
d ∈ F . By Proposition 1.3 θ ⊗ 1D ≈u idD⊗D and so there is a unitary v ∈ U(D ⊗ D)
such that ‖θ(u∗) ⊗ 1D − vu
∗v∗‖ < ε and hence ‖(θ(u∗) ⊗ 1D)u − vu
∗v∗u‖ < ε. Setting
w = vu∗v∗u we deduce that ‖w(d ⊗ 1D)w
∗ − 1D ⊗ d‖ < 3ε for all d ∈ F .
1.5 Remark: In the situation of Proposition 1.3, suppose that the commutator subgroup
of U(D) is contained in U0(D). This will happen for instance if D is assumed to be K1-
injective. Then one may choose the unitaries (un)n∈N which implement the approximate
4 MARIUS DADARLAT AND WILHELM WINTER
unitary equivalence between σ and γ to lie in U0(A⊗D). This follows from [14, (the proof
of) Corollary 1.12], since the unitaries (un)n∈N are essentially images of the unitaries
(wn)n∈N of Lemma 1.4 under suitable unital
∗-homomorphisms.
2. Asymptotic vs. approximate unitary equivalence
It is the aim of this section to establish a continuous version of Proposition 1.3.
2.1 Lemma: Let D be separable unital strongly self-absorbing C∗-algebra. For any finite
subset F ⊂ D and ε > 0, there are a finite subset G ⊂ D and δ > 0 such that the following
holds:
If A is another unital C∗-algebra and σ : D → A⊗D is a unital ∗-homomorphism, and
if w ∈ U0(A⊗D) is a unitary satisfying
‖[w, σ(d)]‖ < δ
for all d ∈ G, then there is a continuous path (wt)t∈[0,1] of unitaries in U0(A ⊗ D) such
that w0 = w, w1 = 1A⊗D and
‖[wt, σ(d)]‖ < ε
for all d ∈ F , t ∈ [0, 1].
Proof: We may clearly assume that the elements of F are normalized and that ε < 1.
Let u ∈ D ⊗D be a unitary satisfying
(1) ‖u(d ⊗ 1D)u
∗ − 1D ⊗ d‖ <
for all d ∈ F . There exist k ∈ N and elements s1, . . . , sk, t1, . . . , tk ∈ D of norm at most
one such that
(2) ‖u−
si ⊗ ti‖ <
(3) δ :=
k · 10
(4) G := {s1, . . . , sk} ⊂ D.
Now let w ∈ U0(A⊗D) be a unitary as in the assertion of the lemma, i.e., w satisfies
(5) ‖[w, σ(si)]‖ < δ
for all i = 1, . . . , k. We proceed to construct the path (wt)t∈[0,1].
By [14, Remark 2.7] there is a unital ∗-homomorphism
ϕ : A⊗D ⊗D → A⊗D
ON THE KK-THEORY OF STRONGLY SELF-ABSORBING C∗-ALGEBRAS 5
such that
(6) ‖ϕ(a⊗ 1D)− a‖ <
for all a ∈ σ(F) ∪ {w}.
Since w ∈ U0(A⊗D), there is a path (w̄t)t∈[ 1
,1] of unitaries in A⊗D such that
(7) w̄ 1
= w and w̄1 = 1A⊗D.
For t ∈ [1
, 1] define
(8) wt := ϕ((σ ⊗ idD)(u)
∗(w̄t ⊗ 1D)(σ ⊗ idD)(u)) ∈ U(A⊗D);
then (wt)t∈[ 1
,1] is a continuous path of unitaries in A ⊗ D. For t ∈ [
, 1] and d ∈ F we
‖[wt, σ(d)]‖
= ‖wtσ(d)w
t − σ(d)‖
< ‖wtϕ(σ(d) ⊗ 1D)w
t − ϕ(σ(d) ⊗ 1D)‖+ 2 ·
≤ ‖((σ ⊗ idD)(u))
∗(w̄t ⊗ 1D)((σ ⊗ idD)(u(d ⊗ 1D)u
∗))(w̄∗t ⊗ 1D)
·((σ ⊗ idD)(u)) − ((σ ⊗ idD)(d⊗ 1D))‖ +
< ‖((σ ⊗ idD)(u))
∗(w̄t ⊗ 1D)((σ ⊗ idD)(1D ⊗ d))(w̄
t ⊗ 1D)
·((σ ⊗ idD)(u)) − ((σ ⊗ idD)(d⊗ 1D))‖ +
= ‖(σ ⊗ idD)(u
∗(1D ⊗ d)u− d⊗ 1D)‖+
6 MARIUS DADARLAT AND WILHELM WINTER
where for the last equality we have used that the w̄t are unitaries and that σ is a unital
∗-homomorphism. Furthermore, we have
(7),(8)
= ‖ϕ(((σ ⊗ idD)(u))
∗(w ⊗ 1D)((σ ⊗ idD)(u))) − w‖
< ‖ϕ(((σ ⊗ idD)(u))
∗(w ⊗ 1D)(
σ(si)⊗ ti))− w‖+
≤ ‖ϕ(((σ ⊗ idD)(u))
σ(si)⊗ ti)(w ⊗ 1D))− w‖
‖[w, σ(si)]‖ · ‖ti‖+
(5),(4),(2)
< ‖ϕ(w ⊗ 1D)− w‖+ k · δ + 2 ·
(6),(3)
+ 2 ·
The above estimate allows us to extend the path (wt)t∈[ 1
,1] to the whole interval [0, 1]
in the desired way: We have ‖w 1
w∗ − 1D‖ <
< 2, whence −1 is not in the spectrum
of w 1
w∗. By functional calculus, there is a = a∗ ∈ A ⊗ D with ‖a‖ < 1 such that
w∗ = exp(πia). For t ∈ [0, 1
) we may therefore define a continuous path of unitaries
wt := (exp(2πita))w ∈ U(A⊗D).
It is clear that w0 = w and wt → w 1
as t→ (1
)−, whence (wt)t∈[0,1] is a continuous path
of unitaries in A satisfying w0 = w and w1 = 1A ⊗D. Moreover, it is easy to see that
‖wt − w‖ ≤ ‖w 1
− w‖ <
for all t ∈ [0, 1
), whence
‖[wt, σ(d)]‖ < ‖[w 1
, σ(d)]‖ +
for t ∈ [0, 1
), d ∈ F .
We have now constructed a path (wt)t∈[0,1] ⊂ U(A) with the desired properties.
2.2 Theorem: Let A and D be unital C∗-algebras, with D separable, strongly self-
absorbing and K1-injective. Then, any two unital
∗-homomorphisms σ, γ : D → A⊗D are
strongly asymptotically unitarily equivalent. In particular, any two unital endomorphisms
of D are strongly asymptotically unitarily equivalent.
ON THE KK-THEORY OF STRONGLY SELF-ABSORBING C∗-ALGEBRAS 7
Proof: Note that the second statement follows from the first one with A = D, since
D ∼= D ⊗D by assumption.
Let A be a unital C∗-algebra such that A ∼= A ⊗ D and let σ, γ : D → A be unital
∗-homomorphisms. We shall prove that σ and γ are strongly asymptotically unitarily
equivalent. Choose an increasing sequence
F0 ⊂ F1 ⊂ . . .
of finite subsets of D such that
Fn is a dense subset of D. Let 1 > ε0 > ε1 > . . . be a
decreasing sequence of strictly positive numbers converging to 0.
For each n ∈ N, employ Lemma 2.1 (with Fn and εn in place of F and ε) to obtain a
finite subset Gn ⊂ D and δn > 0. We may clearly assume that
(10) Fn ⊂ Gn ⊂ Gn+1 and that δn+1 < δn < εn
for all n ∈ N.
Since σ and γ are strongly approximately unitarily equivalent by Proposition 1.3 and
Remark 1.5, there is a sequence of unitaries (un)n∈N ⊂ U0(A) such that
(11) ‖unσ(d)u
n − γ(d)‖ <
for all d ∈ Gn and n ∈ N. Let us set
wn := u
n+1un, n ∈ N.
Then wn ∈ U0(A) and
‖[wn, σ(d)]‖
= ‖wnσ(d)w
n − σ(d)‖
≤ ‖u∗n+1unσ(d)u
nun+1 − u
n+1γ(d)un+1‖
+‖u∗n+1γ(d)un+1 − σ(d)‖
for d ∈ Gn, n ∈ N. Now by Lemma 2.1 (and the choice of the Gn and δn), for each n there
is a continuous path (wn,t)t∈[0,1] of unitaries in U0(A) such that wn,0 = wn, wn,1 = 1A and
(12) ‖[wn,t, σ(d)]‖ < εn
for all d ∈ Fn, t ∈ [0, 1].
Next, define a path (ūt)t∈[0,∞) of unitaries in U0(A) by
ūt := un+1wn,t−n if t ∈ [n, n+ 1).
8 MARIUS DADARLAT AND WILHELM WINTER
We have that
(13) ūn = un+1wn = un
and that
ūt → un+1
as t → n + 1 from below, which implies that the path (ūt)t∈[0,∞) is continuous in U0(A).
Furthermore, for t ∈ [n, n+ 1) and d ∈ Fn we obtain
‖ūtσ(d)ū
t − γ(d)‖
= ‖un+1wn,t−nσ(d)w
n,t−nu
n+1 − γ(d)‖
< ‖un+1σ(d)u
n+1 − γ(d)‖ + εn
(11),(10)
< 2εn.
Since the Fn are nested and the εn converge to 0, we have
(14) ‖ūtσ(d)ū
t − γ(d)‖
for all d ∈
n=0Fn; by continuity and since
n=0Fn is dense in D, we have (14) for all
d ∈ D. Since ū0 ∈ U0(A) we may arrange that ū0 = 1A.
3. The group KK(D, A⊗D) and some applications
3.1 For a separable C∗-algebra D we endow the group of automorphisms Aut (D) with the
point-norm topology.
Corollary: Let D be a separable, unital, strongly self-absorbing and K1-injective C
algebra. Then [X,Aut(D)] reduces to a point for any compact Hausdorff space X.
Proof: Let ϕ,ψ : X → Aut (D) be continuous maps. We identify ϕ and ψ with unital
∗-homomorphisms ϕ,ψ : D → C(X) ⊗ D. By Theorem 2.2, ϕ is strongly asymptotically
unitarily equivalent to ψ. This gives a homotopy between the two maps ϕ,ψ : X →
Aut (D).
3.2 Remark: The conclusion of Corollary 3.1 was known before for D a UHF algebra of
infinite type and X a CW complex by [13], for D = O2 by [8] and [10], and for D = O∞
by [2]. It is new for the Jiang–Su algebra.
3.3 For unital C∗-algebras D and B we denote by [D, B] the set of homotopy classes of
unital ∗-homomorphisms from D to B. By a similar argument as above we also have the
following corollary.
ON THE KK-THEORY OF STRONGLY SELF-ABSORBING C∗-ALGEBRAS 9
Corollary: Let D and A be unital C∗-algebras. If D is separable, strongly self-absorbing
and K1-injective, then [D, A⊗D] reduces to a singleton.
3.4 For separable unital C∗-algebras D and B, let χi : KKi(D, B) → KKi(C, B) ∼= Ki(B),
i = 0, 1 be the morphism of groups induced by the unital inclusion ν : C → D.
Theorem: Let D be a unital, separable and strongly self-absorbing C∗-algebra. Then for
any separable C∗-algebra A, the map χi : KKi(D, A ⊗ D) → Ki(A ⊗ D) is bijective, for
i = 0, 1. In particular both groups KKi(D, A⊗D) are countable and discrete with respect
to their natural topology.
Proof: Since D is KK-equivalent to D ⊗ O∞, we may assume that D is purely infinite
and in particular K1-injective by [11, Prop. 4.1.4]. Let CνD denote the mapping cone C
algebra of ν. By [3, Cor. 3.10], there is a bijection [D, A⊗ D] → KK(CνD, SA⊗ D) and
hence KK(CνD, SA⊗D) = 0 for all separable and unital C
∗-algebras A as a consequence
of Corollary 3.3. Since KK(CνD, A ⊗ D) is isomorphic to KK(CνD, S
2A ⊗ D) by Bott
periodicity and the latter group injects in KK(CνD, SC(T) ⊗ A ⊗ D) = 0, we have that
KKi(CνD,D ⊗ A) = 0 for all unital and separable C
∗-algebras A and i = 0, 1. Since
KKi(CνD,D ⊗A) is a subgroup of KKi(CνD,D ⊗ Ã) = 0 (where Ã is the unitization of
A) we see that KKi(CνD,D ⊗ A) = 0 for all separable C
∗-algebras A. Using the Puppe
exact sequence, where χi = ν
KKi+1(CνD, A⊗D) // KKi(D, A⊗D)
// KKi(C, A⊗D) // KKi(CνD, A⊗D)
we conclude that χi is an isomorphism, i = 0, 1. The map χi = ν
∗ is continuous since it
is given by the Kasparov product with a fixed element (we refer the reader to [12], [9] or
[1] for a background on the topology of the Kasparov groups). Since the topology of Ki is
discrete and χi is injective, it follows that the topology of KKi(D, A⊗D) is also discrete.
The countability of KKi(D, A⊗D) follows from that of Ki(A⊗D), as A⊗D is separable.
3.5 Remark: In contrast to Theorem 3.4, if D is the universal UHF algebra, then
KK(D,C) ∼= Ext(Q,Z) ∼= QN has the power of the continuum [6, p. 221].
3.6 Let D and A be as in Theorem 3.4 and assume in addition that D is K1-injective and
A is unital. Let ι : D → A⊗D be defined by ι(d) = 1A ⊗ d.
Corollary: If e ∈ K ⊗ A⊗ D is a projection, and ϕ,ψ : D → e(K ⊗ A ⊗ D)e are two
unital ∗-homomorphisms, then ϕ ≈suh ψ and hence [ϕ] = [ψ] ∈ KK(D, A⊗D). Moreover:
KK(D, A⊗D) = {[ϕ]− n[ι] |ϕ : D → K⊗A⊗D is a ∗-homomorphism, n ∈ N}.
10 MARIUS DADARLAT AND WILHELM WINTER
Proof: Let ϕ, ψ and e be as in the first part of the statement. By [14, Cor. 3.1], the
unital C∗-algebra e(K⊗A⊗D)e is D-stable, being a hereditary subalgebra of a D-stable
C∗-algebra. Therefore ϕ ≈suh ψ by Theorem 2.2.
Now for the second part of the statement, let x ∈ KK(D, A ⊗ D) be an arbitrary
element. Then χ0(x) = [e]−n[1A⊗D] for some projection e ∈ K⊗A⊗D and n ∈ N. Since
e(K ⊗ A ⊗ D)e is D-stable, there is a unital ∗-homomorphism ϕ : D → e(K ⊗ A ⊗ D)e.
χ0([ϕ] − n[ι]) = [ϕ(1D)]− n[ι(1D)] = [e]− n[1A⊗D] = χ0(x),
and hence [ϕ]− n[ι] = x since χ0 is injective by Theorem 3.4.
In the remainder of the paper we give characterizations for the Cuntz algebra O2 and for
the universal UHF-algebra which do not require the UCT. The latter result is a variation
of a theorem of Effros and Rosenberg [5].
3.7 Proposition: Let D be a separable unital strongly self-absorbing C∗-algebra. If [1D] =
0 in K0(D), then D ∼= O2.
Proof: Since D must be nuclear (see [14]), D embeds unitally in O2 by Kirchberg’s
theorem. D is not stably finite since [1D] = 0. By the dichotomy of [14, Thm. 1.7] D must
be purely infinite. Since [1D] = 0 in K0(D), there is a unital embedding O2 → D, see [11,
Prop. 4.2.3]. We conclude that D is isomorphic to O2 by [14, Prop. 5.12].
3.8 Proposition: Let D, A be separable, unital, strongly self-absorbing C∗-algebras.
Suppose that for any finite subset F of D and any ε > 0 there is a u.c.p. map ϕ : D → A
such that ‖ϕ(cd) − ϕ(c)ϕ(d)‖ < ε for all c, d ∈ F . Then A ∼= A⊗D.
Proof: By [14, Thm. 2.2] it suffices to show that for any given finite subsets F of D, G of
A and any ε > 0 there is u.c.p. map Φ : D → A such that (i) ‖Φ(cd)− Φ(c)Φ(d)‖ < ε for
all c, d ∈ F and (ii) ‖[Φ(d), a]‖ < ε for all d ∈ F and a ∈ G. We may assume that ‖d‖ ≤ 1
for all d ∈ F . Since A is strongly self-absorbing, by [14, Prop. 1.10] there is a unital ∗-
homomorphism γ : A⊗A→ A such that ‖γ(a⊗1A)−a‖ < ε/2 for all a ∈ G. On the other
hand, by assumption there is a u.c.p. map ϕ : D → A such that ‖ϕ(cd) − ϕ(c)ϕ(d)‖ < ε
for all c, d ∈ F . Let us define a u.c.p. map Φ : D → A by Φ(d) = γ(1A ⊗ ϕ(d)). It is
clear that Φ satisfies (i) since γ is a ∗-homomorphism. To conclude the proof we check
now that Φ also satisfies (ii). Let d ∈ F and a ∈ G. Then
‖[Φ(d), a]‖
≤ ‖[Φ(d), a − γ(a⊗ 1A)]‖+ ‖[Φ(d), γ(a ⊗ 1A)]‖
≤ 2‖Φ(d)‖‖a − γ(a⊗ 1A)‖+ ‖[γ(1A ⊗ ϕ(d)), γ(a ⊗ 1A)]‖
< 2ε/2 + 0 = ε.
ON THE KK-THEORY OF STRONGLY SELF-ABSORBING C∗-ALGEBRAS 11
3.9 Proposition: Let D be a separable, unital, strongly self-absorbing C∗-algebra. Sup-
pose that D is quasidiagonal, it has cancellation of projections and that [1D] ∈ nK0(D)
for all n ≥ 1. Then D is isomorphic to the universal UHF algebra Q with K0(Q) ∼= Q.
Proof: Since D is separable unital and quasidiagonal, there is a unital ∗-representation
π : D → B(H) on a separable Hilbert space H and a sequence of nonzero projections
pn ∈ B(H) of finite rank k(n) such that limn→∞ ‖[pn, π(d)]‖ = 0 for all d ∈ D. Then
the sequence of u.c.p. maps ϕn : D → pnB(H)pn ∼= Mk(n)(C) ⊂ Q is asymptotically
multiplicative, i.e limn→∞ ‖ϕn(cd) − ϕn(c)ϕn(d))‖ = 0 for all c, d ∈ D. Therefore Q ∼=
Q⊗D by Proposition 3.8.
In the second part of the proof we show that D ∼= D ⊗Q. Let En : Q → Mn!(C) ⊂ Q
be a conditional expectation onto Mn!(C). Then limn→∞ ‖En(a)− a‖ = 0 for all a ∈ Q.
By assumption, for each n there is a projection e in D ⊗Mm(C) (for some m) such
that n![e] = [1D] in K0(D). Let ϕ : Mn!(C) → Mn!(C) ⊗ e(D ⊗ Mm(C))e be defined
by ϕ(b) = b ⊗ e. Since D has cancellation of projections and since n![e] = [1D], there
is a partial isometry v ∈ Mn!(C) ⊗ D ⊗Mm(C) such that v
∗v = 1Mn!(C) ⊗ e and vv
e11⊗1D⊗e11. Therefore b 7→ v ϕ(b) v
∗ gives a unital embedding ofMn!(C) into D. Finally,
ψn(a) = v (ϕ ◦ En(a)) v
∗ defines a sequence of asymptotically multiplicative u.c.p. maps
Q → D. Therefore D ∼= D ⊗Q by Proposition 3.8.
3.10 Remark: Let D be a separable, unital, strongly self-absorbing and quasidiagonal C∗-
algebra. Then D ⊗Q ∼= Q by the first part of the proof of Proposition 3.9. In particular
K1(D) ⊗ Q = 0 and K0(D) ⊗ Q ∼= Q by the Künneth formula (or by writing Q as an
inductive limit of matrices).
References
[1] M. Dadarlat. On the topology of the Kasparov groups and its applications., J. Funct. Anal. 228 (2005),
394–418.
[2] M. Dadarlat. Continuous fields of C∗-algebras over finite dimensional spaces , arXiv preprint
math.OA/0611405 (2006).
[3] M. Dadarlat. The homotopy groups of the automorphism group of Kirchberg algebras, J. Noncomm.
Geom. 1 (2007), 113–139.
[4] M. Dadarlat and W. Winter. Trivialization of C(X)-algebras with strongly self-absorbing fibres, preprint
(2007).
[5] E. G. Effros and J. Rosenberg. C∗-algebras with approximately inner flip, Pacific J. Math. 77 (1978),
417–443.
[6] L. Fuchs. Infinite abelian groups, vol. 1, Academic Press, New York and London, 1970.
[7] I. Hirshberg, M. Rørdam and W. Winter. C0(X)-algebras, stability and strongly self-absorbing C
algebras, arXiv preprint math.OA/0610344 (2006). To appear in Math. Ann.
[8] E. Kirchberg. The classification of purely infinite C∗-algebras using Kasparov’s theory, preprint (1994).
http://arxiv.org/abs/math/0611405
http://arxiv.org/abs/math/0610344
12 MARIUS DADARLAT AND WILHELM WINTER
[9] M. V. Pimsner. A topology on the Kasparov groups, draft.
[10] N. C. Phillips. A classification theorem for nuclear purely infinite simple C∗-algebras, Documenta
Math. 5 (2000), 49–114.
[11] M. Rørdam. Classification of Nuclear C∗-Algebras, Encyclopaedia Math. Sci., vol. 126, Springer,
Berlin, 2002.
[12] C. Schochet. The fine structure of the Kasparov groups I. Continuity of the KK-pairing, J. Funct.
Anal. 186 (2001), 25–61.
[13] K. Thomsen. The homotopy type of the group of automorphisms of a UHF-algebra, J. Funct. Anal. 72
(1987), 182–207.
[14] A. Toms andW.Winter. Strongly self-absorbing C∗-algebras, arXiv preprint math.OA/0502211 (2005).
To appear in Trans. Amer. Math. Soc.
[15] A. Toms and W. Winter. Z-stable ASH algebras, arXiv preprint math.OA/0508218 (2005). To appear
in Can. J. Math.
[16] W. Winter. On the classification of simple Z-stable C∗-algebras with real rank zero and finite decom-
position rank, J. London Math. Soc. 74 (2006), 167–183.
[17] W. Winter. Simple C∗-algebras with locally finite decomposition rank, J. Funct. Anal. 243 (2007),
394–425.
[18] W. Winter. Localizing the Elliott conjecture, in preparation.
Department of Mathematics, Purdue University, West Lafayette,, IN 47907, USA
E-mail address: mdd@math.purdue.edu
Mathematisches Institut der Universität Münster, Einsteinstr. 62, D-48149 Münster,
Germany
E-mail address: wwinter@math.uni-muenster.de
http://arxiv.org/abs/math/0502211
http://arxiv.org/abs/math/0508218
	0. Introduction
	1. Strongly self-absorbing C*-algebras
	2. Asymptotic vs. approximate unitary equivalence
	3. The group KK(D,AD) and some applications
	References
ABSTRACT
  Let $\Dh$ and $A$ be unital and separable $C^{*}$-algebras; let $\Dh$ be
strongly self-absorbing. It is known that any two unital $^*$-homomorphisms
from $\Dh$ to $A \otimes \Dh$ are approximately unitarily equivalent. We show
that, if $\Dh$ is also $K_{1}$-injective, they are even asymptotically
unitarily equivalent. This in particular implies that any unital endomorphism
of $\Dh$ is asymptotically inner. Moreover, the space of automorphisms of $\Dh$
is compactly-contractible (in the point-norm topology) in the sense that for
any compact Hausdorff space $X$, the set of homotopy classes $[X,\Aut(\Dh)]$
reduces to a point. The respective statement holds for the space of unital
endomorphisms of $\Dh$. As an application, we give a description of the
Kasparov group $KK(\Dh, A\ot \Dh)$ in terms of $^*$-homomorphisms and
asymptotic unitary equivalence. Along the way, we show that the Kasparov group
$KK(\Dh, A\ot \Dh)$ is isomorphic to $K_0(A\ot \Dh)$.

<|endoftext|><|startoftext|>
Effective interactions from q-deformed
inspired transformations
V. S. Timóteo a C. L. Lima b
aCentro Superior de Educação Tecnológica, Universidade Estadual de Campinas,
13484-370, Limeira, SP, Brasil
bInstituto de F́ısica, Universidade de São Paulo, CP 66318, 05315-970, São
Paulo, SP, Brazil
Abstract
From the mass term for the transformed quark fields, we obtain effective contact
interactions of the NJL type. The parameters of the model that maps a system of
non-interacting transformed fields into quarks interacting via NJL contact terms
are discussed.
It is very common in physics to use transformations that make one particular
system mathematically simpler, yet describing the same phenomena. A clear
example is the use of canonical transformations in classical mechanics.
q-Deformed algebras provide a nice framework to incorporate, in an effective
way, interactions not originally contained in the Lagrangian of a particular
system.
In hadron physics, the NJL model is a very simple effective model for strong
interactions that describes important features like the dynamical mass genera-
tion, spontaneous chiral symmetry breaking, and chiral symmetry restoration
at finite temperature.
In recent works, we have been investigating possible applications of quantum
algebras in hadronic physics. In general, we observed that when we deform
the underlying algebra, the system is affected with correlations between its
constituents. We have studied in detail the NJL model under the influence of
a quantum su(2) algebra.
The question we approach in this letter is: is it possible to obtain a transfor-
mation connecting the NJL model to a simpler non-interacting system? We
verified that we can indeed obtain the same dynamics of the NJL interaction
Preprint submitted to Elsevier 4 November 2018
http://arxiv.org/abs/0704.0584v1
with a simple transformation of the quark fields, inspired in the q-deformed
quark fields of previous works [2], [3], [6].
Mass term
We start by writing a mass term for the transformed quark fields
Lmassq =−M ΨΨ
Ψ1Ψ1 +Ψ2Ψ2
UU +DD
where
 . (2)
The transformed quark fields can be written in terms of the standard fields as
Ψ1=ψ1 + (q
−1 − 1) ψ1ψ2γ0ψ2 , (3)
Ψ2=ψ2 + (q
−1 − 1) ψ2ψ1γ0ψ1 , (4)
U = u+ (q−1 − 1) udγ0d , (5)
D= d+ (q−1 − 1) duγ0u , (6)
 . (7)
Here both components are modified in the same way, so that the above ex-
pressions are different from the ones used in [2,3], where only one component
is affected. Extending the transformation to both components is required to
obtain a set of terms that will form an interaction of the NJL type. This im-
plies that the anti-commutation relations for the deformed fields Ψ will also
be different from the ones in [2,3]. Since obtaining the new anti-commuation
relations is not in the scope of this work, we focus on the effective interactions
contained in the non-interacting Lagrangian.
Using Eqs. (5) and (6), we can re-write the Lagragian Eq.(1) in terms of the
standard quark fields
UU = uu+Q uud†d+Q d†duu+Q2 dduudd , (8)
DD= dd+Q ddu†u+Q u†udd+Q2 uudduu , (9)
where Q = (q−1 − 1).
We can re-write the above equations as follows
1 + 2Q d†d
dduudd+ dduudd
, (10)
1 + 2Q u†u
uudduu+ uudduu
, (11)
so that we identify the contact interactions between the quarks contained in
the non-interacting deformed fields Lagrangian. Figure 1 shows the six-point
contact interactions contained in the mass term for the q-deformed quark
fields.
We can reduce the six-point interactions to four-point contact terms in a mean
field approach [5], so that we have
UU +DD
1 + 2Q
uu+ dd
dduu+ dddd+ uudd+ uuuu
, (12)
where 〈ψ†ψ〉 = 〈u†u〉 = 〈d†d〉 = ρv,
= 〈uu〉 =
= ρs, and A =
A(T ; q) has the same dimension of the condensate and will be determined
later in this letter. The reduction of the six-point to four-point contact terms
by closing one fermion line is also shown in Figure 1.
Now we can write the mass term for the transformed quark fields
Lmassq = −MΨΨ = −M
1 + 2
ψψ − M
Γ2 ψψψψ , (13)
with Γ = Q/A
Kinetic energy term
Accordingly, the kinetic energy term for the transformed fields, Ψγµ∂µΨ, can
be written in terms of the standard ones as
Ψγµ∂µΨ=Uγ
µ∂µU +Dγ
= uγµ∂µu+Q
dγ0duγ
µ∂µu+ uγ
µ∂µudγ0d
+ dγµ∂µd+Q
uγ0udγ
µ∂µd+ dγ
µ∂µduγ0u
dγ0duγ
µ∂µudγ0d
uγ0udγ
µ∂µduγ0u
By using an extreme mean field approximation, namely, substituting every-
where in the kinetic energy contribution 〈ψ†ψ〉 = 〈u†u〉 = 〈d†d〉 → ρv, and
= 〈uu〉 =
→ ρs, we obtain
Ψγµ∂µΨ= uγ
µ∂µu (1 + 2Γρv)
+ dγµ∂µd (1 + 2Γρv)
+ (uγµ∂µu) Γ
2ρv +
dγµ∂µd
uγµ∂µu+ dγ
(1 + Γρv)
This corresponds to a usual kinetic energy with a shifted momentum p →
p (1 + Γρv)
The full Lagrangian
The treatment of the density dependence of the kinetic energy term is rather
cumbersome and will be postponed to a further contribution. We will consider
the influence of this momentum dependent kinetic energy term in an effective
way. Therefore, we will study a class of Lagrangians of the type
L′q =
(1 + Γρv)
Lq = ψγµ∂µψ −M
1 + 2
(1 + Γρv)
(1 + Γρv)
ψψψψ (16)
This representative of the full Lagrangian Lq = Ψγµ∂µΨ + Lmassq , when writ-
ten in terms of the standard quark fields, can be identified with the NJL
Lagrangian
LNJL = ψγµ∂µψ −m0 ψψ +G ψψψψ . (17)
The conditions for both Lagrangians, LNJL and L′q, to be equivalent for any
values of T and q are
(1 + Γρv)
(1 + 2Γρv)
m0 , (18)
G = −M
(1 + Γρv)
. (19)
Inserting Eq. (18) in Eq. (19), we obtain an equation for Γ
Γ2 − 2αρv Γ− α = 0 , (20)
where
α = − 2G
> 0. (21)
This equation has two solutions
Γ± = αρv
. (22)
The mass of the transformed fermion fields, M , has to be positive, so we
associate the two solutions Γ− and Γ+ with the two regimes q < 1 and q > 1,
respectively. The quantity A will be negative in both cases.
The scalar (ρs) and vector (ρv) densities were calculated from the NJL model
at finite temperature:
[1− n− n] , (23)
dpp2 [n− n] , (24)
where
n(p, T, µ) =
1 + exp [β (E − µ)]
, (25)
n(p, T, µ) =
1 + exp [β (E + µ)]
, (26)
are the fermions and anti-fermions distribution functions respectively with
p2 +m2.
First we solve the set of coupled gap equations for m, µ, and ρv (Eqs. 27
and 24, respectively) in the NJL model at finite temperature and chemical
potential
m = m0 − 2Gρs ,
µ = µ0 − GNcρv .
The next step is to calculate the scalar and vector densities entering in the
equation for Γ for a given value of the transformation parameter q. In this way
we obtain A(T ; q), which in turn is used to obtain M . The numerical results
are displayed in Figures 2 and 3, where we show the quantity A and the mass
M as a function of both temperature and transformation parameter in the
q > 1 and q < 1 regimes. It is worth to note that the mass of the transformed
fermion fields does not depend on the transformation parameter.
The well known results of the NJL model are mapped through A(T ; q) from the
non-interacting transformed fermion fields Lagrangian. It is worth to note that
the mass of the q-deformed fermion fields does not depend on the deformation
of the algebra.
The quantity A (T ; q) maps the simple non-interacting model into the NJL
model. It represents, in an effective way, the correlations introduced by the
transformations, when we write the non-interacting Lagrangian in terms of
the standard quark fields. These correlations, in a mean field approximation,
are effectively represented by contact interactions of the NJL type. It is also
important to mention that it inherits the phase transition. When the con-
densate and the dynamical mass vanishes with increasing T , the quantity A
also experiences the phase transition. This is an expected behavior, since it
depends on the dynamical mass. For a given temperature, T , and transfor-
mation parameter, q, there is a value of the mapping function, A(T ; q), that
makes the Lagrangians Eq.(16) and Eq.(17) equivalent.
Summarizing, we have shown that it is possible to describe the dynamics of
an interacting system of the NJL type with a simple non-interacting system
by using a set of quantum algebra inspired transformations and a mapping
function.
Acknowledgments
C. L. L. thanks Profs. D. Galetti and B. M. Pimentel for most helpful discus-
sions. This work was partially supported by FAPESP Grant No. 2002/10896-7.
V.S.T. would like to thank FAEPEX/UNICAMP for financial support.
References
[1] D. Galetti and B. M. Pimentel, An. Acad. Bras. Ci. 67 (1995) 7; S. S. Avancini,
A. Eiras, D. Galetti, B. M. Pimentel, and C. L. Lima, J. Phys. A: Math. Gen. 28
(1995) 4915; D. Galetti, J. T. Lunardi, B. M. Pimentel, and C. L. Lima, Physica
A242 (1997) 501.
[2] M. Ubriaco, Phys. Lett. A 219 (1996) 205.
[3] L. Tripodi and C. L. Lima, Phys. Lett. B 412 (1997) 7.
[4] Y. Nambu and G. Jona-Lasinio, Physical Review 122 (1961) 345.
[5] U. Vogl and W. Weise, Prog. Part. Nucl. Phys. 27 (1991) 195.
[6] V. S. Timóteo and C. L. Lima, Phys. Lett. B 448 (1999) 1.
[7] V. S. Timóteo and C. L. Lima, Mod. Phys. Lett. A 15 (2000) 219.
[8] V. S. Timóteo and C. L. Lima, nucl-th/0509089.
http://arxiv.org/abs/nucl-th/0509089
d dd d
Gq Gq
Gq Gq
u u d d
d d u u
〈uu〉〈dd〉
〈dd〉〈uu〉
Fig. 1. Contact interactions generated by the mass term for the q-deformed fermion
fields and their reduction from six-point to four-point by closing one fernion line.
0.2 1
T (GeV)
0.2 0.40.5
qT (GeV)
Fig. 2. The quantity A, in units of the chiral condensate at zero temperature
ρs(T = 0) = −1.42 × 10−2 GeV3, as a function of temperature and q-deformation
for the q > 1 and q < 1 regimes.
0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2
T (GeV)
Fig. 3. The mass of the q-deformed quark fields, in units of the current quark mass
m0 = 5 MeV, as a function of T for both q > 1 and q < 1 regimes. For small
temperatures, M = m0.
	References
ABSTRACT
  From the mass term for the transformed quark fields, we obtain effective
contact interactions of the NJL type. The parameters of the model that maps a
system of non-interacting transformed fields into quarks interacting via NJL
contact terms are discussed.

<|endoftext|><|startoftext|>
Magnetospectroscopy of epitaxial few-layer
graphene
M.L. Sadowski a G. Martinez a, M. Potemski a, C. Berger b,c,
W.A. de Heer b
aGrenoble High Magnetic Field Laboratory, Grenoble,France
bGeorgia Institute of Technology, Atlanta, Georgia, USA
cInstitut Néel, CNRS, Grenoble, France
Abstract
The inter-Landau level transitions observed in far-infrared transmission experiments
on few-layer graphene samples show a behaviour characteristic of the linear disper-
sion expected in graphene. This behaviour persists in relatively thick samples, and
is qualitatively different from that of thin samples of bulk graphite.
Key words: Graphene, Cyclotron resonance,
PACS: 71.70.Di, 76.40.+b, 78.30.-j, 78.67.-n
The interest in two-dimensional graphite is fuelled by its particular band struc-
ture and ensuing dispersion relation for electrons, leading to numerous differ-
ences with respect to “conventional” two-dimensional electron gases (2DEG).
Single graphite layers (graphene) have long been used as a starting point in
band structure calculations of bulk graphite [1,2,3] and, more recently, carbon
nanotubes [4]. The band structure of a single graphene sheet is considered to
be composed of cones located at two inequivalent Brillouin zone corners at
which the conduction and valence bands merge. In the vicinity of these points
the electron energy depends linearly on its momentum, which implies that
free charge carriers in graphene are governed not by Schrödinger’s equation,
but rather by Dirac’s relativistic equation for zero rest mass particles, with an
effective velocity c̃, which replaces the speed of light [5,6].
The recent appearance of ultrathin graphite layers (few-layer graphene, FLG),
obtained by epitaxial [7,8,9] and exfoliation techniques [10], followed by sin-
gle graphene and its unusual sequence of quantum Hall states [11,12] has
re-ignited this interest. The prospects of studying quantum electrodynamics
Preprint submitted to Solid State Communications 4 November 2018
http://arxiv.org/abs/0704.0585v1
100 200 300 400 500 600 700
Energy (cm-1)
0.4 T
1.9 K
Fig. 1. Transmission spectrum of epitaxial graphene at 0.4 T. The inset shows a
schematic of the assignations of the observed transitions.
in solid state experiments on the one hand and the possibility of future ap-
plications in carbon-based electronics on the other are currently driving a
considerable research effort. The majority of the published literature remains
theoretical; the extremely small lateral dimensions (≈ 10µm) of the graphene
flakes used in the above-mentioned transport experiments makes them difficult
objects for experimental studies. Moreover, due to the somewhat hit-and-miss
character of the exfoliation method, as well as the inherent difficulty of ob-
taining large numbers of samples, it appears to be an unlikely candidate for
possible applications. Epitaxial methods on the other hand offer the opportu-
nity of obtaining relatively large, high quality two-dimensional graphite [13].
In the following, we present optical measurements of the characteristic disper-
sion relation of FLG, confirming directly its linear (“relativistic”) character.
A number of epitaxial graphene samples have been studied by means of far-
infrared magnetotransmission measurements. The samples were about 4 × 4
mm2 in area, grown by sublimating SiC substrates at high temperatures [9,13].
The experimental details and part of the results have been described elsewhere
[14].
A representative transmission spectrum of a three-graphene-layer sample is
shown in Fig. 1 for a weak magnetic field of 0.4 T. When the magnetic field
is increased, all the features visible in this figure are displaced towards higher
energies. Furthermore, their strength increases [14] and more features become
visible at higher energies. The positions of the features observed for the sample
containing 3 graphene layers are plotted versus the square root of the magnetic
field in Fig. 2. It may be seen that the resonant energies observed evolve
proportionally to the square root of the magnetic field. The oscillator strength
Line Slope in units of c̃
2e~ Transition
1 L1 → L2
B 1 L0 → L1(L−1 → L0)
−1 → L2(L−2 → L1)
−2 → L3(L−3 → L2)
−3 → L4(L−4 → L3)
−4 → L5(L−5 → L4)
−5 → L6(L−6 → L5)
−6 → L7(L−7 → L6)
−7 → L8(L−8 → L7)
Table 1
Observed lines and their assignments
of the transition labelled B in Fig 1 has also been shown [14] to increase linearly
with the square root of the magnetic field.
These results are, in a first approximation, in excellent agreement with predic-
tions arising from a simple single-particle model of non-interacting massless
Dirac fermions.
Using appropriate graphene wavefunctions [4] and the Hamiltonian commonly
used to describe electrons in a single graphene layer, it is fairly straightforward
to work out the optical selection rules [15]. It may then be shown that the
allowed transitions are Ln → Lm such that |m| = |n| − 1 for the “+” circular
polarisation and |m| = |n|+1 in the “-” circular polarisation. For unpolarised
radiation, used in the current experiment, the allowed transitions are simply
those between states n,m such that |m| = |n| ± 1. The Landau level energies
are obtained as
En = c̃
2~eB|n| (1)
where c̃ is the effective velocity of the Dirac fermions, B is the magnetic field
and n = 0,±1,±2 ... is the Landau level index (the electron and hole levels
being identical). The energies of the allowed optical transitions may then be
concisely written as
2~eB(
|n+ 1| ±
|n|) (2)
The positions of the transitions shown in Fig. 2 are summarised in Table 1.
It should be stressed that all the positions of all the observed lines are described
0 1 2 3 4
4000 I H G F E
 )( 2/1TB
Fig. 2. Evolution with magnetic field of transitions observed in transmission. The
letters correspond to those used in Fig. 1; the shaded region corresponds to the
range where the substrate is opaque.
by a single fitting parameter - the effective light velocity c̃. We should add, for
the sake of completeness, that the present experiment, using unpolarised light,
does not distinguish between electrons and holes, which are expected to be
identical in terms of the effective mass and dispersion relation. Thus, transition
A, attributed to the L1 → L2 process, could also be due to the corresponding
−2 → L−1 one. While a p-type character appears to be unlikely, it cannot be
ruled out on the basis of the experiment in question.
The striking agreement of the experimental data obtained using several lay-
ers of graphene with expectations for a single layer is surprising, given that
calculations suggest a completely different behaviour already for a graphene
bilayer [17]. On the other hand, it has long been known that particles with a
linear dispersion exist in bulk graphite as well - a minority pocket of carriers
in the vicinity of the H point of the Brillouin zone were shown to give rise to
electronic transitions following a square root dependence on the magnetic field
[18]. The question therefore is posed: at what point, if at all, does epitaxial
FLG become bulk graphite?
Early work on epitaxial graphene [7] suggested that the process of baking
SiC substrates led to a single graphene layer floating above a graphite layer.
More recent calculations [22] suggest that the first carbon layer on top of an
SiC substrate has an electronic structure different from that of graphene, and
acts as a buffer, allowing subsequent layers to behave like graphene. A strong
dependence of the electronic structure of FLG on the type of stacking has also
been suggested [23]. The common Bernal, or AB, stacking found for example
in HOPG graphite is usually assumed for all FLG structures as well; this is
100 200 300 400 500 600 700
60 layers
9 layers
Energy (cm-1)
3 layers
Fig. 3. Transmission spectra at 4T for epitaxial FLG samples (top three) of varying
thickness and, for comparison, of HOPG graphite at the same magnetic field.
not necessarily the case. Also, let us note that the HOPG interlayer distance
of 3.354 Å may not be the correct value for epitaxial graphene.
In order to elucidate the effect of multiplying layers on the transmission spec-
trum, samples of varying thickness were studied and compared with a layer
of HOPG obtained by exfoliation. The details of this study shall be presented
elsewhere [19]; for the time being let us note the qualitative differences in the
spectra, shown in Fig. 3. Four spectra are shown, at a magnetic field of 4T: for
sample consisting of 3, 9 and 60 layers of graphene on SiC, and for the HOPG
sample. The dominant feature in the epitaxial samples is always the L0 → L1
−1 → L0) transition; we can see that it grows stronger as the number of
layers is increased, and is several times stronger for the sample containing 60
layers. In this sample one can also see the appearance of other features at
lower energies, which were absent in the thinner samples, and which appear
to correspond to bulk-like features visible in the lowest (HOPG) trace in the
figure. On the other hand, the L0 → L1 (L−1 → L0) transition, which has a
square root dispersion even in the 60 layer sample, is absent from the HOPG
spectrum.
The observed persistence of the Dirac fermion-like behaviour of the carriers
in epitaxial FLG up to relatively thick ( 19 nm) structures appears to suggest
that the structure of this material is in fact different from that of bulk HOPG.
The simplest explanation would be a far weaker interaction between adjacent
graphene layers, leading to a sequence of graphene layers instead of bulk,
or even multilayer, graphene. More studies are necessary to elucidate this
question.
The GHMFL is a “Laboratoire conventionné avec l’UJF et l’INPG de Greno-
ble”. The present work was supported in part by the European Commission
through grant RITA-CT-2003-505474 and by grants from the Intel Research
Corporation and the NSF: NIRT “Electronic Devices from Nano-Patterned
Epitaxial Graphite”.
References
[1] P.R. Wallace, Phys. Rev. 71, 622 (1947)
[2] J.W. McClure, Phys. Rev. 104, 666 (1956)
[3] J.C. Slonczewski and P.R. Weiss, Phys. Rev.109, 272 (1958)
[4] T. Ando, J. Phys. Soc. Jpn. 74, 777 (2005)
[5] F.D.M. Haldane, Phys. Rev. Lett. 61, 2015 (1988)
[6] Y. Zheng and T. Ando, Phys. Rev. B 65, 245420 (2002)
[7] I. Forbeaux, J.-M. Themlin, and J.-M. Debever, Phys. Rev. B 58, 16396 (1998)
[8] A.Charrier, A. Coati, T. Argunova, F. Thibaudau, Y. Garreau, R. Pinchaux, I.
Forbeaux, J.-M. Debever, M. Sauvage-Simkin, J.-M. Themlin, J. Appl. Phys.
92, 2479 (2002)
[9] C. Berger, Z. Song, T. Li, X. Li, A.Y. Ogbazghi, R. Feng, Z. Dai, A.N.
Marchenkov, E.H. Conrad, P.N. First, and W.A. de Heer, J. Phys. Chem. 108,
19912 (2004).
[10] K.S. Novoselov, A.K. Geim, S.V. Morozov, D. Jiang, Y. Zhang, S.V. Dubonos,
I.V. Grigorieva, and A.A. Firsov, Science 306, 666 (2004)
[11] K.S. Novoselov, A.K. Geim, S.V. Morozov, D. Jiang, M.I. Katsnelson, I.V.
Grigorieva, S.V. Dubonos, and A.A. Firsov, Nature 438, 197 (2005).
[12] Y. Zhang, Y.-W. Tan, H.L. Stormer and P. Kim, Nature 438, 201 (2005).
[13] C. Berger, Z. Song, T. Li, X. Li, X. Wu, N. Brown, C. Naud, D. Mayou, A.N.
Marchenkov, E.H. Conrad, P.N. First, and W.A. de Heer, Science 312, 1191
(2006)
[14] M.L. Sadowski, G. Martinez, M. Potemski, C. Berger, and W.A. de Heer, Phys.
Rev. Lett 97, 266405 (2006).
[15] M.L. Sadowski, G. Martinez, M. Potemski, C. Berger, and W.A. de Heer, Int.
J. Mod. Phys. B, in press.
[16] V.P. Gusynin, S.G.Sharapov, and J.P. Carbotte, J. Phys.: Condens. Matter 19,
026222 (2007)
[17] D.S.L. Abergel and V.I. Fal’ko, cond-mat/0610673
[18] W.W. Toy, M.S. Dresselhaus, and G. Dresselhaus, Phys. Rev. 15, 4077 (1977)
[19] M.L. Sadowski et al., to be published
[20] T. Ohta, A. Bostwick, T. Seyller, K. Horn, and E. Rotenberg, Science 313, 951
(2006)
[21] B. Partoens and F.M. Peeters, Phys. Rev. B 74,075404 (2006)
[22] F. Varchon, R. Feng, J. Hass, X. Li, B.N. Nguyen, C. Naud, P. Mallet, J.-Y.
Veuillen, C. Berger, E.H. Conrad, and L. Magaud, cond-mat/0702311
[23] F. Guinea, A.H. Castro Neto, N.M.R. Peres, Phys. Rev. B 73, 245426 (2006)
http://arxiv.org/abs/cond-mat/0610673
http://arxiv.org/abs/cond-mat/0702311
	References
ABSTRACT
  The inter-Landau level transitions observed in far-infrared transmission
experiments on few-layer graphene samples show a behaviour characteristic of
the linear dispersion expected in graphene. This behaviour persists in
relatively thick samples, and is qualitatively different from that of thin
samples of bulk graphite.

<|endoftext|><|startoftext|>
Introduction
	SN dust formation revisited
	Survival in the reverse shock
	Dynamics of the reverse shock
	Dust grain survival
	Extinction and emission from SN dust
	Summary
	Stochastic heating from electron collisions
ABSTRACT
  The presence of dust at high redshift requires efficient condensation of
grains in SN ejecta, in accordance with current theoretical models. Yet,
observations of the few well studied SNe and SN remnants imply condensation
efficiencies which are about two orders of magnitude smaller. Motivated by this
tension, we have (i) revisited the model of Todini & Ferrara (2001) for dust
formation in the ejecta of core collapse SNe and (ii) followed, for the first
time, the evolution of newly condensed grains from the time of formation to
their survival - through the passage of the reverse shock - in the SN remnant.
We find that 0.1 - 0.6 M_sun of dust form in the ejecta of 12 - 40 M_sun
stellar progenitors. Depending on the density of the surrounding ISM, between
2-20% of the initial dust mass survives the passage of the reverse shock, on
time-scales of about 4-8 x 10^4 yr from the stellar explosion. Sputtering by
the hot gas induces a shift of the dust size distribution towards smaller
grains. The resulting dust extinction curve shows a good agreement with that
derived by observations of a reddened QSO at z =6.2. Stochastic heating of
small grains leads to a wide distribution of dust temperatures. This supports
the idea that large amounts (~ 0.1 M_sun) of cold dust (T ~ 40K) can be present
in SN remnants, without being in conflict with the observed IR emission.

<|endoftext|><|startoftext|>
Preferential interaction coefficient for nucleic acids and other cylindrical poly-ions
Emmanuel Trizac∗
CNRS; Univ. Paris Sud, UMR8626, LPTMS, ORSAY CEDEX, F-91405 and
Center for Theoretical Biological Physics, UC San Diego,
9500 Gilman Drive MC 0374 - La Jolla, CA 92093-0374, USA
Gabriel Téllez†
Departamento de F́ısica, Universidad de Los Andes, Apartado Aéreo 4976, Bogotá, Colombia
The thermodynamics of nucleic acid processes is heavily affected by the electric double-layer
of micro-ions around the polyions. We focus here on the Coulombic contribution to the salt-
polyelectrolyte preferential interaction (Donnan) coefficient and we report extremely accurate ana-
lytical expressions valid in the range of low salt concentration (when polyion radius is smaller than
the Debye length). The analysis is performed at Poisson-Boltzmann level, in cylindrical geometry,
with emphasis on highly charged poly-ions (beyond “counter-ion condensation”). The results hold
for any electrolyte of the form z−:z+. We also obtain a remarkably accurate expression for the
electric potential in the vicinity of the poly-ion.
Coulombic interactions between salt and poly-anions
play a key role in the equilibrium and kinetics of nucleic
acid processes [1]. A convenient quantity quantifying
such interactions and allowing for the analysis and inter-
pretation of their thermodynamics consequences, is the
so called preferential interaction coefficient. Several def-
initions have been proposed and their interrelation stud-
ied, see e.g. [2, 3, 4]. In the present work, they are defined
as the integrated deficit (with respect to bulk conditions)
of co-ions concentration around a rod-like poly-ion. Our
goal is to provide analytical expressions describing the
effect of salt concentration and poly-ion structural pa-
rameters on the preferential interaction coefficient, for a
broad class of asymmetric electrolytes. For symmetric
electrolytes, it will be shown that our formulas improve
upon existing analytical results. For other asymmetries,
they seem to have no counterpart in the literature. Our
analysis holds for highly (i.e. beyond counter-ion conden-
sation [5, 6]) and uniformly charged cylindrical poly-ions,
and is explicitly limited to the low salt regime (i.e. when
the poly-ion radius a is smaller than the Debye length
1/κ). These conditions are most relevant for RNA or
DNA in their single, double, or triple strand forms.
As in several previous approaches [7, 8, 9, 10], we
adopt the mean-field framework of Poisson-Boltzmann
equation, in a homogeneous dielectric background of per-
mittivity ε. The same starting point has proven relevant
for related structural physical chemistry studies of nu-
cleic acids [11]. In a z−:z+ electrolyte, the dimensionless
electrostatic potential φ = eϕ/kT (with e > 0 the ele-
mentary charge and kT thermal energy) then obeys the
equation [12]
z+ + z−
ez−φ − e−z+φ
, (1)
∗Electronic address: trizac@lptms.u-psud.fr
†Electronic address: gtellez@uniandes.edu.co
where r is the radial distance to the rod axis. The va-
lencies z+ and z− of salt ions are both taken positive.
Denoting derivative with a prime, the boundary condi-
tions read rφ′(r) = 2ξ > 0 at the polyion radius (r = a)
and φ → 0 for r → ∞. The latter condition expresses
the infinite dilution of poly-ion limit and ensures that
the whole system is electrically neutral, since it (indi-
rectly) implies that rφ′ → 0 for r → ∞. We consider
a negatively charged poly-anion for which φ < 0 and
the line charge density reads λ = −eξ/ℓB < 0, where
ℓB = e
2/(εkT ) denotes the Bjerrum length (0.71 nm in
water at room temperature). Finally, the Debye length is
defined from the bulk ionic densities n∞+ and n
− through
κ2 = 4πℓB(z
+ + z
The Coulombic contribution to the anionic preferential
interaction coefficient is defined as [7, 8, 9, 10, 13]
Γ = κ2
(ez−φ − 1) rdr, (2)
while its cationic counterpart follows from electro-
neutrality. This quantity –which provides a measure of
the Donnan effect [14]– can be expressed in closed form
as a function of the electrostatic potential, see Appendix
A. As can be seen in (A3) and (A4), Γ depends expo-
nentially on the surface potential φ0, so that deriving a
precise analytical expression is a challenging task. Fur-
thermore, we are interested here in the limit κa < 1
(including the regime κa≪ 1) which is analytically more
difficult than the opposite high salt situation where to
leading order, the charged rod behaves as an infinite
plane, and curvature corrections can be perturbatively
included [15, 16, 17].
We will proceed in two steps. Focusing first on the
surface potential φ0 = φ(a), we make use of recent re-
sults [18] that have been obtained from a mapping of Eq.
(1) onto a Painlevé type III problem [19, 20, 21]. The
exact expressions thereby derived only hold for 1:1, 1:2
and 2:1 electrolytes, but may be written in a way that is
electrolyte independent. This remarkable feature is spe-
http://arxiv.org/abs/0704.0587v1
mailto:trizac@lptms.u-psud.fr
mailto:gtellez@uniandes.edu.co
z+/z− 1/10 1/3 1/2 1 2 3 10
C -2.51 -1.94 -1.763 -1.502 -1.301 -1.21 -1.06
TABLE I: Values of C appearing in Eq. (4) as a function
of electrolyte asymmetries. For z+/z− = 1, 1/2 and 2, C is
known analytically from the results of [18]. The corresponding
values are recalled in Appendix B. For other values of z+/z−,
C has been determined numerically, see in particular Fig. 6
of Appendix B.
cific to the short distance behaviour of φ and has been
overlooked so far, since not only short distance but also
large distance properties have been studied [18]. We are
then led to conjecture that the corresponding expression
holds for any binary electrolyte z−:z+, and we explicitly
check the relevance of our assumption on several specific
examples.
Technical details are deferred to the appendices. It is
in particular concluded in Appendix B that the surface
potential may be written
e−z+φ0 ≃
2(z+ + z−)
z+(κa)2
(z+ξ − 1)
2 + µ̃2
where
log(κa) + C − (z+ξ − 1)−1
. (4)
Expression (4) is valid for κa < 1 and z+ξ > 1 [in fact
z+ξ > 1 + O(1/| log κa|)]. These conditions are easily
fulfilled for nucleic acids. The “constant” C appearing
in (3) depends smoothly on the ratio z+/z− but is oth-
erwise salt and charge independent. We report in Table
I its values for several electrolyte asymmetries. The de-
crease (in absolute value) of C when z+/z− increases is
a signature of more efficient (non-linear) screening with
counter-ions of higher valencies.
From Eq. (3) and the results of Appendix B, our ap-
proximation for Γ takes a simple form
Γ ≃ −
(1 + µ̃2). (5)
This expression is tested in Figures 1 and 2 against the
“true” numerical results that serve as a benchmark. In
Fig. 1 which corresponds to a monovalent salt (or more
generally a z:z electrolyte), we also show the prediction
of Ref. [9], which is, to our knowledge, the most accurate
existing formula for a 1:1 salt. For the technical reasons
discussed in Appendix B, and that are evidenced in Fig-
ure 6, our expression improves that of Shkel, Tsodikov
and Record [9], particularly at lower salt content. For
1:2 and 2:1 salts, we expect Eq. (5) to be also accurate,
since it is based on exact expansions. The situation of
other salt asymmetries is more conjectural (see Appendix
B), but Eq. (5) is nevertheless in remarkable agreement
with the full solution of Eq. (1), see Fig. 2. To be spe-
cific, in both Figures 1 and 2, the relative accuracy of
our approximation is better than 0.2% for κa = 10−2
(for both ss and ds RNA parameters). At κa = 0.1, the
accuracy is on the order of 1%.
As illustrated in Fig. 3, approximation (4) assumes
that z+ξ > 1. The corresponding expression for Γ there-
fore breaks down when ξ is too low. More general expres-
sions, still for κa < 1, may be found in appendix C. The
inset of Fig. 3 offers an illustration and shows that the
limitations of approximation (4) may be circumvented at
little cost, providing a quasi-exact value for Γ. Moreover,
it is shown in this appendix that for z+ξ = 1, µ̃ reads
log(κa) + C
. (6)
On the other hand, Eq. (3) still holds. The corresponding
Γ is shown in Fig. 4.
We provide in Appendix C a general expression of the
short scale (i.e valid up to κr ∼ 1) radial dependence
of the electric potential, see Eq. (C1). The bare charge
should not be too low [more precisely, one must have
ξ > ξc with ξc given by Eq. (C5)], and µ̃ –which encodes
the dependence on ξ– follows from solving Eq. (C2). In
general, the corresponding solution should be found nu-
merically. However, one can show a) that µ̃ vanishes
for ξ = ξc, b) that µ̃ takes the value (6) when z+ξ = 1
and c) that µ̃ is given by (4) when z+ξ exceeds unity
by a small and salt dependent amount. In practice, for
DNA and RNA, we have ξ > 2 and Eq. (4) provides
0.001 0.01 0.1 1
0.001 0.01 0.1
FIG. 1: Preferential interaction coefficient for a 1:1 salt. The
main graph corresponds to ss-RNA with reduced line charge
ξ = 2.2 while the inset is for ds-RNA (ξ = 5). The circles
correspond to the value of (2) following from the numerical
solution of Eq. (1). The prediction of Eq. (5) with eµ given
by (4) and C ≃ −1.502, shown with the continuous curve, is
compared to that of Ref. [9], shown with the dashed line. As
in all other figures, the opposite of Γ is displayed, to consider
a positive quantity.
0.001 0.01 0.1
0.001 0.01 0.1
2.5 dsRNA
ssRNA
FIG. 2: Same as Figure 1 for a 1:3 and a 3:1 electrolyte. From
Table I, we have C ≃ −1.21 in the 1:3 case and conversely
C ≃ −1.94 in the 3:1 case. The symbols correspond to the
numerical solution of Eq. (1) and the continuous curves show
the results of Eq. (5) with again eµ given by (4).
1 1.5 2 2.5 3
1 1.5 2 2.5
FIG. 3: Preferential interaction coefficient for a 1:1 salt (hence
C ≃ −1.502) and κa = 10−2. The circles show the numerical
solution of PB theory (1), the continuous curve is for (5) with
(4) and the dashed line is the prediction of Ref. [9]. Although
approximation (4) breaks down at low ξ, the inset shows that
eµ following from the solution of Eq. (C2) gives through (5) a
Γ (continuous curve), that is in excellent agreement with the
“exact one”, shown with circles as in the main graph.
excellent results whenever κa < 0.1. To illustrate this,
we compare in Figure 5 the potential following from the
analytical expression (C1) to its numerical counterpart.
We do not display 1:1, 1:2 and 2:1 results since in these
cases, Eq. (C1) is obtained from an exact expansion and
fully captures the r-dependence of the potential. For the
asymmetry 1:3, Fig. 5 shows that the relatively simple
form (C1) is very reliable. A similar agreement has been
found for all couples z−:z+ sampled, with the trend that
0.001 0.01 0.1
0.001 0.01 0.1
FIG. 4: Same as Fig. 1 for ξ = 1 and z+/z− = 1. The same
quantities are shown: our prediction for Γ [Eqs. (5) and (6)
with C ≃ −1.502] is compared to that of Ref. [9]. The inset
shows −z+Γ/z− for a 1:2 salt such as MgCl2 where C takes the
value -1.301. Circles : numerical data; curve : our prediction.
the validity of (C1) extends to larger distances as z+/z−
is decreased. In this respect, the agreement shown in Fig.
5 for which z+/z− is quite high (3), is one of the “worst”
observed.
0.01 0.1 1
0 0.5 1
0.01 0.1 1
FIG. 5: Opposite of the electric potential versus radial dis-
tance in a 1:3 electrolyte with κa = 10−2. The continuous
curve shows the prediction of Eq. (C1) with eµ given by (4) ;
the circles show the numerical solution of Eq. (1). The po-
tential for ξ = 2.2 is shown in the main graph on a log-linear
scale, and on a linear scale in the lower inset. The upper inset
is for ξ = 5.
Conclusion. The poly-ion ion preferential interaction
coefficient Γ describes the exclusion of co-ions in the
vicinity of a polyelectrolyte in an aqueous solution. We
have obtained an accurate expression for Γ in the regime
of low salt (κa < 1). The present results are particu-
larly relevant for highly charged poly-ions (z+ξ > 1, that
is beyond the classical Manning threshold [22]), but are
somewhat more general and hold in the range ξc < ξ < 1,
where ξ stands for the line charge per Bjerrum length and
ξc is a salt dependent threshold, given by Eq. (C5). Our
formulae have been shown to hold for arbitrary mixed
salts of the form z−:z+ (magnesium chloride, cobalt hex-
amine etc). They have been derived from exact expan-
sions valid in 1:1,1:2 and 2:1 cases, from which a more
general conjecture has been inferred. The validity of this
conjecture, backed up by analytical arguments, has been
extensively tested for various values of z+/z−, poly-ion
charge and salt content. These tests have provided the
numerical value of the constant C reported in Table I,
which only depends of the ratio z+/z−. As a byprod-
uct of our analysis, we have obtained a very accurate
expression for the electric potential in the vicinity of the
charged rod (r < κ−1).
It should be emphasized that the validity of our
mean-field description relying on the non-linear Poisson-
Boltzmann equation depends on the valency of counter-
ions (z+), and to a lesser extent to the value of z− [12, 23].
For the 1:1 case in a solvent like water at room temper-
ature, micro-ionic correlations can be neglected up to a
salt concentration of 0.1M [8]. For z+ ≥ 2 or in sol-
vents of lower dielectric permittivity, they play a more
important role. Our results however provide mean-field
benchmarks from analytical expressions, from which the
effects of correlations may be assessed in cases where they
cannot be ignored (see e.g. [8] for a detailed discussion).
Acknowledgments
This work was supported by a ECOS
Nord/COLCIENCIAS action of French and Colom-
bian cooperation. G. T. acknowledge partial financial
support from Comité de Investigaciones, Facultad de
Ciencias, Universidad de los Andes. This work has
been supported in part by the NSF PFC-sponsored
Center for Theoretical Biological Physics (Grants No.
PHY-0216576 and PHY-0225630).
APPENDIX A
In order to explicitly relate the preferential coefficient
Γ in (2) to the electric potential, we follow a procedure
similar to that which leads to an analytical solution in
the cell model, without added salt [24]. Implicit use will
be made of the boundary conditions associated to (1).
First, integrating Eq. (1), one gets
[r′φ′(r′)]ra =
z+ + z−
e−z+φ − ez−φ
r′dr′, (A1)
where the notation [F (r′)]ra = F (r) − F (a) has been in-
troduced. Then, multiplying Eq. (1) by r2φ′ and inte-
grating, we obtain
z+ + z−
(r′φ′)2
e−z+φ
+ r′2
e−z+φ
dr′.(A2)
Combining both relations with adequate weights, in order
to suppress the integral over counter-ion (+) density, we
r′(ez−φ − 1)dr′ =
2(z+ + z−)
ez−φ0 − 1
e−z+φ0 − 1
where φ0 = φ(a) is the surface potential. Equation (A3)
will turn useful in the formulation of a general conjec-
ture concerning the surface potential φ0, see Appendix
B. We also note that for the systems under investigation
here, the surface potential is quite high, and a very good
approximation to (A3) is
r′(ez−φ − 1)dr′ ≃
a2z−e
−z+φ0
2(z+ + z−)
APPENDIX B
We start by analyzing a 1:1 electrolyte, for which it
has been shown [19, 20] that the short distance behaviour
reads
eφ/2 =
2µ log
− 2Ψ(µ)
+ O (κr)
where Ψ denotes the argument of the Euler Gamma func-
tion Ψ(x) = arg[Γ(ix)] [19, 20]. In (B1), µ denotes the
smallest positive root of
tan [2µ log(κa/8)− 2Ψ(µ)] =
ξ − 1
. (B2)
Expressions (B1) and (B2) require that ξ exceeds a salt
dependent threshold [denoted ξc below and given by Eq.
(C5)] that is always smaller than 1 [18]. They thus always
hold for ξ ≥ 1 and in particular encompass the interesting
limiting case ξ = 1, which is sufficient for our purposes.
For large ξ, we have proposed in [18] an approximation
which amounts to linearizing the argument of the tangent
in (B1) in the vicinity of −π, and similarly linearizing Ψ
to first order: Ψ(x) ≃ −π/2 − γx + O(x3) where γ is
the Euler constant, close to 0.577. It turns out however
that finding accurate expressions for exp(−z+φ0), which
is useful for the computation of the preferential interac-
tion coefficient, requires to include the first non-linear
correction in the expansion of the tangent. After some
algebra, we find :
log(κa) + C − (ξ − 1)−1
6(log(κa) + C − (ξ − 1)−1)4
(ξ − 1)3
ψ(2)(1)
where the constant C = C1:1 reads C1:1 = γ − log 8 ≃
−1.502 and ψ(2)(1) = d3 ln Γ(x)/dx3|x=1. From (B3) and
(B1) where the sinus is expanded to third order, we ob-
(κa)2e−φ0 ≃ 4[(ξ − 1)2 + µ̃2] (B4)
where µ̃ is given by
log(κa) + C − (z+ξ − 1)−1
. (B5)
In writing (B5), we have introduced the change of vari-
able µ̃ = 2µ [25]. The reason is that similar changes for
other electrolyte asymmetries allows to put the final re-
sult in a “universal” (electrolyte independent) form, see
below. A similar reason holds for introducing z+, here
equal to 1, in the denominator of (B5).
The functional proximity between our expressions and
those reported in [9] in the very same context is striking.
We note however that our µ̃ (denoted β in [9]) involves
a different constant C. More importantly, the functional
form of (B1) differs from that given in [9]. The compari-
son of the performances of our results with those of [9] is
addressed below, and is also discussed in the main text.
Performing a similar analysis as above in the 1:2 case
where z+ = 2 and z− = 1, we obtain from the expressions
derived in [18]:
(κa)2e−z+φ0 ≃ 3[(ξ − 1)2 + µ̃2] (B6)
and similarly, in the 2:1 case (z+ = 1, z− = 2):
(κa)2e−z+φ0 ≃ 6[(ξ − 1)2 + µ̃2]. (B7)
In both cases, provided again that ξ is not too low (see
below) µ̃ is given by (B5) [26], with however a different
numerical value for C [C1:2 = γ− (3 log 3)/2− (log 2)/3 ≃
−1.301 and C2:1 = γ − (3 log 3)/2− log 2 ≃ −1.763].
The similarity of expressions (B4), (B6) and (B7) leads
to conjecture that this form holds for any z−:z+ elec-
trolyte :
(κa)2e−z+φ0 ≃ A[(z+ξ − 1)
2 + µ̃2]. (B8)
We then have to determine the prefactor A as a function
of z+ and z−. To this end, we make use of the exact
relation (A3) [or equivalently (A4)], where in the limit of
large ξ, the lhs is finite while the two terms on the rhs
diverge. This yields the leading order behaviour :
(κa)2 exp(−z+φ0)
z+ + z−
(z+ξ − 1)
2. (B9)
It then follows that A = 2(z+ + z−)/z+ so that our gen-
eral expression (B8) takes the form:
(κa)2e−z+φ0 ≃ 2
z+ + z−
(z+ξ − 1)
2 + µ̃2
. (B10)
This expression holds regardless of the approximation
used for µ̃. If Eq. (B5) is used, then z+ξ should not
be too close to unity (see appendix C for more general
results including the case z+ξ = 1).
In order to test the accuracy of (B10) in conjunction
with (B5), we have solved numerically Eq. (1) for several
values of κa < 1 and electrolyte asymmetry and checked
that for several different values of z+ξ > 1, the quantity
Q = −π
(κa)2e−z+φ0
2(z+ + z−)
− (z+ξ − 1)
]−1/2
− log(κa) + (z+ξ − 1)
−1 (B11)
is a constant C, which only depends on z+/z− but not on
salt and ξ [it should be borne in mind that Eq. (B5) is a
small κa and large ξ expansion, which becomes increas-
ingly incorrect as κa is increased and/or ξ lowered]. This
is quite a stringent test (since the two terms on the rhs of
(B11) are large and close] which requires high numerical
accuracy. This is achieved following the procedure out-
lined in [27]. In doing so, we confirm the validity of (B10)
and collect the values of C given in Table I. In the 1:1
case, we predict that C = γ− log 8 ≃ −1.507, in excellent
agreement with the numerical data of Figure 6. On the
other hand, the prediction of Ref. [9] that Q reaches a
constant close to -1.90 (shown by the horizontal dashed
line in Fig 6) is incorrect. Figure 6 shows that the quality
of expression (B10) deteriorates when κa increases, as ex-
pected. It is noteworthy however that for κa = 10−1, its
accuracy is excellent whenever ξ > 2. The inset of Fig.
6 shows the validity of (B10) for a 3:1 electrolyte. When
z+ξ is close to 1, Eq. (B5) becomes an irrelevant approx-
imation to the solution of (B2), and can therefore not be
inserted into the general formula (B10). This explains
the large deviations between Q and the asymptotic value
C observed in Fig. 6 for the lower values of ξ reported.
We come back to this point in Appendix C.
The present results hold for z+ξ > 1 +O(1/| log κa|).
In this regime, our analysis shows that Eq. (B10) [with µ̃
given by (B5)] is correct up to order 1/ log4(κa) for any
(z−, z+). On the other hand the results of [9] ,valid in
the 1:1 case, appear to be correct to order 1/ log2(κa).
In addition, our expression for the surface potential may
be generalized to a broader range of ξ values and an ex-
pression for the short distance dependence of the electric
potential may also be provided. This is the purpose of
appendix C.
1 2 3 4 5
1 2 3 4
Q = -1.90
FIG. 6: Plot of the quantity Q defined in (B11) versus line
charge ξ for a 1:1 electrolyte at κa = 10−3 (continuous curve)
and κa = 10−1 (dashed curve). The value reached at large ξ is
compared to the prediction of [9] Q → eγ + log 2− γ ≃ −1.90
(horizontal dashed-dotted line) whereas Eqs. (B10) and (B5)
imply Q → γ− log 8 ≃ −1.50, shown by the horizontal dotted
line. The inset shows the same quantity for a 3:1 electrolyte
at κa = 10−5 [such a very low value is required to determine
precisely the value of the asymptotic constant C, that can
subsequently be used at experimentally relevant (higher) salt
concentrations]. Here, we obtain Q → −1.94 (dotted line)
which is the value reported for C in Table I.
APPENDIX C
In Appendix B, the “universal” results valid for all
(z+, z−) have been unveiled partly by a change of variable
µ → µ̃ from existing expressions [18]. In light of these
results, and of their accuracy (assessed in particular by
the precision reached for the preferential interaction co-
efficient), it is tempting to go further without invoking
approximations of (B2), or related expressions for other
asymmetries than 1:1. Inspection of the results given in
[18] for the 1:1, 1:2 and 2:1 cases lead, with again the
help of (A4), to the conjecture that
ez+φ/2 ≃
2(z+ + z−)
sin [µ̃ log(κr) + µ̃ C] (C1)
tan [ µ̃ log(κa) + µ̃ C ] =
z+ξ − 1
. (C2)
We emphasize that (C1), much as (B1), is a short dis-
tance expansion and typically holds for κr < 1 (hence
the requirement that κa < 1). In appendix D we give
further analytical support for conjecture (C1). A typical
plot showing the accuracy of (C1) is provided in the main
text (Fig. 5). For κr < 0.1, the agreement with the ex-
act result is better than 0.1%, and becomes progressively
worse at higher distances (20% disagreement at κr = 1).
From (C1), it follows that the integrated charge q(r)
in a cylinder of radius r [that is q(r) = −rφ′(r)/2] reads
z+q(r) = −1 + µ̃ tan
µ̃ log
where the so-called Manning radius [18, 28, 29] is given
κRM = exp
. (C4)
The Manning radius is a convenient measure of the coun-
terion condensate thickness. It is the point r where not
only z+q(r) = 1 but also where q(r) versus log r exhibits
an inflection point [30]. For high enough ξ, the logarith-
mic dependence of 1/µ̃ with salt [see (B5)] is such that
RM ∝ κ
−1/2.
The two relations (C1) and (C2) encompass those given
in Appendix B and allow to investigate the regime z+ξc <
z+ξ, and in particular the case z+ξ = 1, the so-called
Manning threshold [5]. However, (C1) and (C2) are not
valid for ξ < ξc, with
z+ξc = 1 +
log κa+ C
. (C5)
Note that ξc < 1, since the constant C is negative and
that salt should fulfill κa < 1. For κa = 10−2 and
z+/z− = 1, we obtain ξc ≃ 0.836. This is precisely the
point where −Γ = 1 in the inset of Fig. 3. This inset
also shows that the value of Γ resulting from the use of
the solution of (C2) is remarkably accurate.
At this point, it seems useful to investigate the Man-
ning threshold case z+ξ = 1 (which corresponds to
the onset of counterion condensation when κa → 0
[5, 18, 30]). It is readily seen that the solution of (C2)
reads
z+ξ=1
log(κa) + C
, (C6)
which should be inserted in (C1) to obtain the potential
profile, or in (5) to get the interaction coefficient.
APPENDIX D
In this appendix we give further support for the con-
jecture (C1) which gives the short-distance expansion
of the electric potential. Let us suppose initially that
the charge is below the Manning threshold ξ < ξc.
It is straightforward to verify that Poisson–Boltzmann
equation (1) admits solutions which behave as φ(r) =
−2A ln(κr) + lnB + o(1) for κr ≪ 1. Injecting this ex-
pansion into equation (1) allows us to compute higher
order terms. To study the regime beyond the Man-
ning threshold, we compute all higher order terms of the
form r2n(1+z+A) (for a negatively charged macroion) and
r2n(1−z−A) (for a positively charged macroion), with n a
positive integer. These terms turn out to present them-
selves as the series expansion of the logarithm, thus re-
summing them we obtain
φ(r) = −2A ln(κr) + lnB (D1)
−z+ (κr)2(1+z+A)
8(z+ + z−)(1 + z+A)2
− (κr)2(1−z−A)
8(z+ + z−)(1 − z−A)2
+ · · ·
The dots represent terms of order r2n(1+z+A)+2m(1−z−A)
with n and m two nonzero positive integers. When the
Manning threshold is approached, z+A + 1 = 0 for neg-
atively charged macroion, the terms r2n(1+z+A) (second
line of Eq. (D1)) become of order one, but the rest of the
terms (third line of Eq. (D1) and dots) remain higher or-
der: a change in the small distance behavior of φ occurs.
A similar situation is reached for 1 − z−A = 0 which is
the Manning threshold for a positively charged macroion.
A and B in the previous equations are constants of in-
tegration, which should be determined with the bound-
ary conditions rφ′(r) = 2ξ at the polyion radius (r = a)
and φ → 0 for r → ∞. Thus to proceed further, we
have to connect the long and the short distance behavior
of φ. This connection problem has been only solved in
the cases 1:1, 1:2 and 2:1 in Refs. [19, 31]. In particular,
once A has been chosen (notice that for a = 0, A = −ξ),
B should be one and only one function of A in order to
satisfy φ→ 0 for r → ∞. The results from [19, 31] show
B = 26Aγ ((1 +A)/2)
(1 : 1) (D2)
B = 33A22Aγ (2(1 +A)/3)γ ((1 +A)/3) (1 : 2)
B = 33A22Aγ ((1 + 2A)/3)γ ((2 +A)/3) (2 : 1)
where γ(x) = Γ(x)/Γ(1 − x). B turns out to have some
interesting properties in the cases 1:1, 1:2 and 2:1, where
its exact expression (D2) is known. Namely, at the Man-
ning threshold 1 + z+A = 0,
A→−1/z+
8(z+ + z−)(1 + z+A)2
= 1 (D3)
Furthermore if we put 1 + z+A = iµ̃, and define
e2iΨ(eµ) =
8(z+ + z−)(1 + z+A)2
then for µ̃ ∈ R, Ψ(µ̃) ∈ R is a real function of µ̃, with
Ψ(0) = 0.
Let us now study the regime beyond the Manning
threshold for a negatively charged macroion. From
Eq. (D1) we can write
ez+φ(r)/2 ∼ (κr)−z+ABz+/2
−z+ (κr)2(1+z+A)
8(z+ + z−)(1 + z+A)2
neglecting terms of higher order when z+A is close to −1.
Let us conjecture that the properties of B as a function
of A presented above hold in the general case z− : z+.
Then using the parameter µ̃ defined above we find after
some simple algebra
ez+φ(r)/2 =
2(z+ + z−)
sin [µ̃ log(κr) + Ψ(µ̃)]
+O(r3+2z−/z+) (D6)
Recalling that |µ̃| ≪ 1 we can approximate Ψ(µ̃) ≃ µ̃C,
where C = Ψ′(0). Replacing this approximation into (D6)
and imposing the boundary condition aφ′(a) = 2ξ leads
to (C1) and (C2). Numerical values obtained for the
constants C are reported in Table I, for different charge
asymmetries z− : z+. The previous analysis shows that
analytical predictions for C could be made if the con-
nection problem is solved and the equivalent of expres-
sions (D2) are found for the general case z− : z+.
[1] C.F. Anderson and M.T. Record Jr, Annu. Rev. Phys.
Chem. 33, 191 (1984).
[2] H. Eisenberg, Biological Macromolecules and Polyelec-
trolytes in Solution, Clarendon, Oxford (1976).
[3] J.A. Schellman, Biophys. Chem. 37, 121 (1990).
[4] S.M. Timasheff, Biochemistry 31, 9857 (1992).
[5] G.S. Manning, J. Chem. Phys. 51, 924 (1969).
[6] F. Oosawa, Polyelectrolytes, Dekker, New York (1971).
[7] K.A. Sharp, Biopolymers 36, 227 (1995).
[8] H. Ni, C.F. Anderson and M.T. Record Jr, J. Phys.
Chem. B 103, 3489 (1999).
[9] I.A. Shkel, O.V. Tsodikov and M.T. Record Jr, Proc.
Natl. Acad. Sci. USA 99, 2597 (2002).
[10] C.H. Taubes, U. Mohanty and S. Chu, J. Phys. Chem. B
109, 21267 (2005).
[11] M. Gueron, J.-Ph. Demaret and M. Filoche, Biophys.
Journal 78, 1070 (2000).
[12] Y. Levin, Rep. Prog. Phys. 65, 1577 (2002).
[13] In the 1:1 case, our definition differs from the more stan-
dard one as found e.g. in [9] by a factor 4ξ. The reason
for doing so is that this allows easier comparison of the
salt dependence of Γ for different values of the poly-ion
charge.
[14] F.G. Donnan, Chem. Rev. 1, 73 (1924).
[15] I.A. Shkel, O.V. Tsodikov and M.T. Record Jr, J. Phys.
Chem. B 104, 5161 (2000).
[16] M. Aubouy, E. Trizac, L. Bocquet, J. Phys. A: Math.
Gen. 36, 5835 (2003).
[17] G. Tellez and E.Trizac, Phys. Rev. E 70, 011404 (2004).
[18] E. Trizac and G. Téllez, Phys. Rev. Lett 96, 038302
(2006) ; G. Téllez and E. Trizac, J. Stat. Mech. P06018
(2006).
[19] B.M. McCoy, C.A. Tracy and T.T. Wu, J. Math. Phys.
18, 1058 (1977).
[20] J.S. McCaskill and E.D. Fackerell, J. Chem. Soc., Fara-
day Trans. 2 84, 161 (1988).
[21] C.A. Tracy and H. Widom, Physica A 244, 402 (1997).
[22] We emphasize that accurate results for Γ, φ etc may be
obtained for ξ < ξc from the results given in [18]. We
did not investigate this regime here, since it is of little
relevance for nucleic acids.
[23] A.Y. Grosberg, T.T. Nguyen and B.I. Shklovskii, Rev.
Mod. Phys. 74, 329 (2002).
[24] R.M. Fuoss, A. Katchalsky and S.F. Lifson, P. Natl.
Acad. Sci. USA 37, 579 (1951).
[25] It then appears that the expression given for eµ = 2µ in
(B5) corresponds to the dominant term only in (B3) (the
first one on the rhs).
[26] Compared to the expressions given in [18] where a pa-
rameter µ plays a key role, the corresponding change of
variables should be performed: eµ = 3µ (1:2 case) and
eµ = 3µ/2 for 2:1 electrolytes.
[27] E. Trizac, L. Bocquet, M. Aubouy and H.H. von
Grünberg, Langmuir 19, 4027 (2003).
[28] M. Gueron and G. Weisbuch, Biopolymers 19, 353
(1980).
[29] B. O’Shaughnessy and Q. Yang, Phys. Rev. Lett. 94,
048302 (2005).
[30] M. Deserno, C. Holm and S. May, Macromolecules 33,
199 (2000).
[31] C.A. Tracy and H. Widom, Commun. Math. Phys. 190,
697 (1998).
ABSTRACT
  The thermodynamics of nucleic acid processes is heavily affected by the
electric double-layer of micro-ions around the polyions. We focus here on the
Coulombic contribution to the salt-polyelectrolyte preferential interaction
(Donnan) coefficient and we report extremely accurate analytical expressions
valid in the range of low salt concentration (when polyion radius is smaller
than the Debye length). The analysis is performed at Poisson-Boltzmann level,
in cylindrical geometry, with emphasis on highly charged poly-ions (beyond
``counter-ion condensation''). The results hold for any electrolyte of the form
$z_-$:$z_+$. We also obtain a remarkably accurate expression for the electric
potential in the vicinity of the poly-ion.

<|endoftext|><|startoftext|>
Introduction
One of the important quantities in information theory is the mutual information of
two random variables X and Y which is expressed in terms of the Boltzmann-Gibbs
entropy H(·) as follows:
I(X ∧ Y ) = −H(X, Y ) +H(X) +H(Y )
when X, Y are continuous variables. For the expression of I(X∧Y ) of discrete variables
X, Y , the aboveH(·) is replaced by the Shannon entropy. A more practical and rigorous
definition via the relative entropy is
I(X ∧ Y ) := S(µ(X,Y ), µX ⊗ µY ),
where µ(X,Y ) denotes the joint distribution measure of (X, Y ) and µX⊗µY the product
of the respective distribution measures of X, Y .
The aim of this paper is to show that the mutual information I(X∧Y ) is gained as a
certain asymptotic limit of the volume of “discrete micro-states” consisting of permu-
tations approximating joint moments of (X, Y ) in some way. In Section 1, more gener-
ally we consider an n-tuple of real bounded random variables (X1, . . . , Xn). Denote by
∆(X1, . . . , Xn;N,m, δ) the set of (x1, . . . ,xn) of xi ∈ R
N whose joint moments (on the
uniform distributed N -point set) of order up tom approximate those of (X1, . . . , Xn) up
to an error δ. Furthermore, denote by ∆sym(X1, . . . , Xn;N,m, δ) the set of (σ1, . . . , σn)
of permutations σi ∈ SN such that (σ1(x1), . . . , σn(xn)) ∈ ∆(X1, . . . , Xn;N,m, δ) for
some x1, . . . ,xn ∈ R
≤ , where R
≤ is the R
N -vectors arranged in increasing order. Then,
the asymptotic volume
log γ⊗nSN
∆sym(X1, . . . , Xn;N,m, δ)
under the uniform probability measure γSN on SN is shown to converge as lim supN→∞
(also lim infN→∞) and then limm→∞,δց0 to
−H(X1, . . . , Xn) +
H(Xi)
1 Supported in part by Grant-in-Aid for Scientific Research (B)17340043.
2 Supported in part by the Hungarian Research Grant OTKA T068258.
AMS subject classification: Primary: 62B10, 94A17.
http://arxiv.org/abs/0704.0588v1
2 F. HIAI AND D. PETZ
as long as H(Xi) > −∞ for 1 ≤ i ≤ n. Thus, we obtain a kind of discretization of the
mutual information via symmetric group (or permutations).
The approach can be applied to an n-tuple of discrete random variables (X1, . . . , Xn)
as well. But the definition of the ∆sym-set of micro-states for discrete variables is
somewhat different from the continuous variable case mentioned above, and we discuss
the discrete variable case in Section 2 separately.
The idea comes from the paper [3]. Motivated by theory of mutual free information in
[6], a similar approach to Voiculescu’s free entropy is provided there. The free entropy
is the free probability counterpart of the Boltzmann-Gibbs entropy, and RN -vectors
and the symmetric group SN here are replaced by Hermitian N ×N matrices and the
unitary group U(N), respectively. In this way, the “discretization approach” here is in
some sense a classical analog of the “orbital approach” in [3].
1. The continuous case
For N ∈ N let RN≤ be the convex cone of the N -dimensional Euclidean space R
consisting of x = (x1, . . . , xN ) such that x1 ≤ x2 ≤ · · · ≤ xN . The space R
N is naturally
regarded as the real function algebra on the N -point set. Let SN be the symmetric
group of order N (i.e., the permutations on {1, 2, . . . , n}). Throughout this section let
(X1, . . . , Xn) be an n-tuple of real random variables on a probability space (Ω,P), and
assume that the Xi’s are bounded (i.e., Xi ∈ L
∞(Ω;P)). The Boltzmann-Gibbs entropy
of (X1, . . . , Xn) is defined to be
H(X1, . . . , Xn) := −
· · ·
p(x1, . . . , xn) log p(x1, . . . , xn) dx1 · · · dxn
if the joint density p(x1, . . . , xn) of (X1, . . . , Xn) exists; otherwise H(X1, . . . , Xn) =
−∞. Note that the above integral is well defined in [−∞,∞) since the density p is
compactly supported.
Definition 1.1. The mean value of x = (x1, . . . , xN) in R
N is given by
κN(x) :=
For each N,m ∈ N and δ > 0 we define ∆(X1, . . . , Xn;N,m, δ) to be the set of all
n-tuples (x1, . . . ,xn) of xi = (xi1, . . . , xiN ) ∈ R
N , 1 ≤ i ≤ n, such that
|κN(xi1 · · ·xik)− E(Xi1 · · ·Xik)| < δ
for all 1 ≤ i1, . . . , ik ≤ n with 1 ≤ k ≤ m, where xi1 · · ·xik means the pointwise
product, i.e.,
xi1 · · ·xik := (xi11 · · ·xik1, xi12 · · ·xik2, . . . , xi1N · · ·xikN ) ∈ R
and E(·) denotes the expectation on (Ω,P). For each R > 0, define ∆R(X1, . . . , Xn;
N,m, δ) to be the set of all (x1, . . . ,xn) ∈ ∆(X1, . . . , Xn;N,m, δ) such that xi ∈
[−R,R]N for all 1 ≤ i ≤ n.
Heuristically, ∆(X1, . . . , Xn;N,m, δ) is the set of “micro-states” consisting of n-
tuples of discrete random variables on the N -point set with the uniform probability
A NEW APPROACH TO MUTUAL INFORMATION 3
such that all joint moments of order up to m give the corresponding joint moments of
X1, . . . , Xn up to an error δ.
For x ∈ RN write ‖x‖p := (N
j=1 |xj |
p)1/p for 1 ≤ p < ∞ and ‖x‖∞ :=
max1≤j≤N |xj| while ‖X‖p denotes the L
p-norm of a real random variable X on (Ω,P).
The next lemma is seen from [4, 5.1.1] based on the Sanov large deviation theorem,
which says that the Boltzmann-Gibbs entropy is gained as an asymptotic limit of the
volume of the approximating micro-states.
Lemma 1.2. For every m ∈ N and δ > 0 and for any choice of R ≥ max1≤i≤n ‖Xi‖∞,
the limit
log λ⊗nN
∆R(X1, . . . , Xn;N,m, δ)
exists, where λN is the Lebesgue measure on R
N . Furthermore, one has
H(X1, . . . , Xn) = lim
m→∞,δց0
log λ⊗nN
∆R(X1, . . . , Xn;N,m, δ)
independently of the choice of R ≥ max1≤i≤n ‖Xi‖∞.
In the following let us introduce some kinds of mutual information in the discretiza-
tion approach using micro-states of permutations.
Definition 1.3. The action of SN on R
N is given by
σ(x) := (xσ−1(1), xσ−1(2), . . . , xσ−1(N))
for σ ∈ SN and x = (x1, . . . , xN) ∈ R
N . For each N,m ∈ N, δ > 0 and R > 0 we
denote by ∆sym,R(X1, . . . , Xn;N,m, δ) the set of all (σ1, . . . , σn) ∈ S
N such that
(σ1(x1), . . . , σn(xn)) ∈ ∆R(X1, . . . , Xn;N,m, δ)
for some (x1, . . . ,xn) ∈ (R
n. For each R > 0 define
Isym,R(X1, . . . , Xn) := − lim
m→∞,δց0
lim sup
log γ⊗nSN
∆sym,R(X1, . . . , Xn;N,m, δ)
where γSN is the uniform probability measure on SN . Define also Isym,R(X1, . . . , Xn)
by replacing lim sup by lim inf. Obviously,
0 ≤ Isym,R(X1, . . . , Xn) ≤ Isym,R(X1, . . . , Xn).
Moreover, ∆sym,∞(X1, . . . , Xn;N,m, δ) is defined by replacing ∆R(X1, . . . , Xn;N,m, δ)
in the above by ∆(X1, . . . , Xn;N,m, δ) without cut-off by the parameter R. Then
Isym,∞(X1, . . . , Xn) and Isym,∞(X1, . . . , Xn) are also defined as above.
Definition 1.4. For each 1 ≤ i ≤ n we choose and fix a sequence ξi = {ξi(N)} of
ξi(N) ∈ R
≤ , N ∈ N, such that κN (ξi(N)
k) → E(Xki ) as N → ∞ for all k ∈ N, i.e.,
ξi(N) → Xi in moments. For each N,m ∈ N and δ > 0 we define ∆sym(X1, . . . , Xn :
ξ1(N), . . . , ξn(N);N,m, δ) to be the set of all (σ1, . . . , σn) ∈ S
N such that
(σ1(ξ1(N)), . . . , σn(ξn(N))) ∈ ∆(X1, . . . , Xn;N,m, δ).
4 F. HIAI AND D. PETZ
Define
Isym(X1, . . . , Xn : ξ1, . . . , ξn)
:= − lim
m→∞,δց0
lim sup
log γ⊗nSN
∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N,m, δ)
and Isym(X1, . . . , Xn : ξ1, . . . ξn) by replacing lim sup by lim inf.
The next proposition asserts that the quantities in Definitions 1.3 and 1.4 are all
equivalent.
Lemma 1.5. For any choice of R ≥ max1≤i≤n ‖Xi‖∞ and for any choices of approxi-
mating sequences ξ1, . . . , ξn one has
Isym,∞(X1, . . . , Xn) = Isym,R(X1, . . . , Xn) = Isym(X1, . . . , Xn : ξ1, . . . , ξn), (1.1)
Isym,∞(X1, . . . , Xn) = Isym,R(X1, . . . , Xn) = Isym(X1, . . . , Xn : ξ1, . . . , ξn). (1.2)
Proof. It is obvious that ∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N,m, δ) is included in
∆sym,∞(X1, . . . , Xn;N,m, δ) for any approximating sequences ξi. Moreover, for each
1 ≤ i ≤ n an approximating sequence ξi can be chosen so that ‖ξi(N)‖∞ ≤ ‖Xi‖∞
for all N ; then ∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N,m, δ) ⊂ ∆sym,R(X1, . . . , Xn;
N,m, δ) for any R ≥ R0 := max1≤i≤n ‖Xi‖∞. Hence it suffices to prove that for any
approximating sequences ξi and for every m ∈ N and δ > 0, there are an m
′ ∈ N, a
δ′ > 0 and an N0 ∈ N so that
∆sym,∞(X1, . . . , Xn;N,m
′, δ′) ⊂ ∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N,m, δ)
for all N ≥ N0. Choose a ρ ∈ (0, 1) with m(R0 + 1)
m−1ρ < δ/2. By [5, Lemma 4.3]
(also [4, 4.3.4]) there exist an m′ ∈ N with m′ ≥ 2m, a δ′ > 0 with δ′ ≤ min{1, δ/2}
and an N0 ∈ N such that for every 1 ≤ i ≤ n and every x ∈ R
≤ with N ≥ N0,
if |κN (x
k) − E(Xki )| < δ
′ for all 1 ≤ k ≤ m′, then ‖x − ξi(N)‖m < ρ. Suppose
N ≥ N0 and (σ1, . . . , σn) ∈ ∆sym,∞(X1, . . . , Xn;N,m
′, δ′); then (σ1(x1), . . . , σn(xn)) ∈
∆(X1, . . . , Xn;N,m
′, δ′) for some (x1, . . . ,xn) ∈ (R
n. Since |κN(x
i ) − E(X
i )| < δ
for all 1 ≤ k ≤ m′, we get ‖xi − ξi(N)‖m ≤ ρ and
‖xi‖m ≤ ‖xi‖2m = κN(x
< (E(X2mi ) + 1)
≤ (R2m0 + 1)
1/2m ≤ R0 + 1.
Therefore,
|κN(σi1(ξi1(N)) · · ·σik(ξik(N)))− E(Xi1 · · ·Xik)|
≤ |κN(σi1(ξi1(N)) · · ·σik(ξik(N)))− κN(σi1(xi1) · · ·σik(xik))|
+ |κN(σi1(xi1) · · ·σik(xik))− E(Xi1 · · ·Xik)|
≤ m(R0 + 1)
m−1ρ+ δ′ < δ
for all 1 ≤ i1, . . . , ik ≤ n with 1 ≤ k ≤ m. The above latter inequality follows from the
Hölder inequality. Hence (σ1, . . . , σn) ∈ ∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N,m, δ),
and the result follows. �
A NEW APPROACH TO MUTUAL INFORMATION 5
Consequently, we denote all the quantities in (1.1) by the same Isym(X1, . . . , Xn) and
those in (1.2) by Isym(X1, . . . , Xn). We call Isym(X1, . . . , Xn) and Isym(X1, . . . , Xn) the
mutual information and upper mutual information of (X1, . . . , Xn), respectively. The
terminology “mutual information” will be justified after the next theorem.
In the continuous variable case, our main result is the following exact relation of Isym
and Isym with the Boltzmann-Gibbs entropy H(·), which says that Isym(X1, . . . , Xn) is
formally the sum of the separate entropiesH(Xi)’s minus the compoundH(X1, . . . , Xn).
Thus, a naive meaning of Isym(X1, . . . , Xn) is the entropy (or information) overlapping
among the Xi’s.
Theorem 1.6.
H(X1, . . . , Xn) = −Isym(X1, . . . , Xn) +
H(Xi)
= −Isym(X1, . . . , Xn) +
H(Xi).
Proof. If the coordinates si of s ∈ R
N are all distinct, then s is uniquely written as
s = σ(x) with x ∈ RN≤ and σ ∈ SN . Note that the set of s ∈ R
N with si = sj for some
i 6= j is a closed subset of λN -measure zero. Under the correspondence
s ∈ RN ←→ (x, σ) ∈ RN≤ × SN , s = σ(x)
(well defined on a co-negligible subset of RN), the measure λN is transformed into the
product of λN |RN
and the counting measure on SN .
In the following proof we adopt, due to Lemma 1.5, the description of Isym and
Isym as Isym,R(X1, . . . , Xn) and Isym,R(X1, . . . , Xn) with R := max1≤i≤n ‖Xi‖∞. For
each N,m ∈ N and δ > 0, suppose (s1, . . . , sn) ∈ ∆R(X1, . . . , Xn;N,m, δ) and write
si = σi(xi) with xi ∈ R
≤ and σi ∈ SN . Then it is obvious that
(x1, . . . ,xn; σ1, . . . , σn)
∆R(Xi;N,m, δ) ∩ R
×∆sym,R(X1, . . . , Xn;N,m, δ).
By Lemma 1.2 and the fact stated at the beginning of the proof, we obtain
H(X1, . . . , Xn) ≤ lim
log λ⊗nN
∆R(X1, . . . , Xn;N,m, δ)
≤ lim inf
log λN
∆R(Xi;N,m, δ) ∩ R
+ log#∆sym,R(X1, . . . , Xn;N,m, δ)
= lim inf
log λN
∆R(Xi;N,m, δ)
− n logN !
6 F. HIAI AND D. PETZ
+ log#∆sym,R(X1, . . . , Xn;N,m, δ)
log λN
∆R(Xi;N,m, δ)
+ lim inf
log γ⊗nSN
∆sym,R(X1, . . . , Xn;N,m, δ)
This implies that
H(X1, . . . , Xn) ≤
H(Xi)− Isym(X1, . . . , Xn). (1.3)
Conversely, for each m ∈ N and δ > 0, by [5, Lemma 4.3] (also [4, 4.3.4]) there are
an m′ ∈ N with m′ ≥ m, a δ′ > 0 with δ′ ≤ δ/2 and an N0 ∈ N such that for every
N ∈ N and for every x,y ∈ RN≤ , if ‖x‖∞ ≤ R and |κN(x
k) − κN(y
k)| < 2δ′ for all
1 ≤ k ≤ m′, then ‖x− y‖1 < δ/2m(R + 1)
m−1. Suppose N ≥ N0 and
(x1, . . . ,xn; σ1, . . . , σn)
∆R(Xi;N,m
′, δ′) ∩ RN≤
×∆sym,R(X1, . . . , Xn;N,m
′, δ′)
so that (σ1(y1), . . . , σn(yn)) ∈ ∆R(X1, . . . , Xn;N,m
′, δ′) for some (y1, . . . ,yn) ∈ (R
Since
|κN(x
i )− κN(y
i )| ≤ |κN(x
i )− E(X
i )|+ |κN(y
i )− E(X
i )| < 2δ
for all 1 ≤ k ≤ m′, we get ‖xi − yi‖1 < δ/2m(R + 1)
m−1 for 1 ≤ i ≤ n. Therefore,
|κN (σi1(xi1) · · ·σik(xik))− E(Xi1 · · ·Xik)|
≤ |κN(σi1(xi1) · · ·σik(xik))− κN(σi1(yi1) · · ·σik(yik))|
+ |κN(σi1(yi1) · · ·σik(yik))− E(Xi1 · · ·Xik)|
≤ m(R + 1)m−1 max
1≤i≤n
‖xi − yi‖1 + δ
+ δ′ ≤ δ
for all 1 ≤ i1, . . . , ik ≤ n with 1 ≤ k ≤ m. This implies that (σ1(x1), . . . , σn(xn)) ∈
∆R(X1, . . . , Xn;N,m, δ). By Lemma 1.2 we obtain
H(Xi)− Isym(X1, . . . , Xn)
log λN
∆R(Xi;N,m
′, δ′)
+ lim sup
log γ⊗nSN
∆sym,R(X1, . . . , Xn;N,m
′, δ′)
= lim sup
log λN
∆R(Xi;N,m
′, δ′) ∩ RN≤
A NEW APPROACH TO MUTUAL INFORMATION 7
+ log#∆sym,R(X1, . . . , Xn;N,m
′, δ′)
≤ lim sup
log λ⊗nN
∆R(X1, . . . , Xn;N,m, δ)
This implies by Lemma 1.2 once again that
H(Xi)− Isym(X1, . . . , Xn) ≤ H(X1, . . . , Xn). (1.4)
The result follows from (1.3) and (1.4). �
Let µ(X1,...,Xn) be the joint distribution measure on R
n of (X1, . . . , Xn) while µXi is
that of Xi for 1 ≤ i ≤ n. Let S(µ(X1,...,Xn), µX1 ⊗· · ·⊗µXn) denote the relative entropy
(or the Kullback-Leibler divergence) of µ(X1,...,Xn) with respect to the product measure
µX1 ⊗ · · · ⊗ µXn , i.e.,
S(µ(X1,...,Xn), µX1 ⊗ · · · ⊗ µXn) :=
dµ(X1,...,Xn)
d(µX1 ⊗ · · · ⊗ µXn)
dµ(X1,...,Xn)
if µ(X1,...,Xn) is absolutely continuous with respect to µX1 ⊗ · · · ⊗ µXn ; otherwise
S(µ(X1,...,Xn), µX1 ⊗ · · · ⊗ µXn) := +∞. When H(Xi) > −∞ for all 1 ≤ i ≤ n,
one can easily verify that
S(µ(X1,...,Xn), µX1 ⊗ · · · ⊗ µXn) = −H(X1, . . . , Xn) +
H(Xi).
Thus, the above theorem yields the following:
Corollary 1.7. If H(Xi) > −∞ for all 1 ≤ i ≤ n, then
Isym(X1, . . . , Xn) = Isym(X1, . . . , Xn)
= S(µ(X1,...,Xn), µX1 ⊗ · · · ⊗ µXn).
Corollary 1.8. Under the same assumption as the above corollary, Isym(X1, . . . , Xn) =
0 if and only if X1, . . . , Xn are independent.
In particular, the originalmutual information I(X1∧X2) of two real random variables
X1, X2 is normally defined as
I(X1 ∧X2) := S(µ(X1,X2), µX1 ⊗ µX2).
Hence we have
I(X1 ∧X2) = Isym(X1, X2) = Isym(X1, X2)
as long as H(X1) > −∞ and H(X2) > −∞ (and X1, X2 are bounded). For this reason,
we gave the term “mutual information” to Isym.
Finally, some open problems are in order:
(1) Without the assumption H(Xi) > −∞ for 1 ≤ i ≤ n, does Isym(X1, . . . , Xn) =
Isym(X1, . . . , Xn) hold true?
8 F. HIAI AND D. PETZ
(2) More strongly, does the limit such as
log γ⊗nSN (∆sym,R(X1, . . . , Xn;N,m, δ))
log γ⊗nSN (∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N,m, δ))
exist as in Lemma 1.2?
(3) Without the assumption H(Xi) > −∞ for 1 ≤ i ≤ n, does Isym(X1, . . . , Xn) =
S(µ(X1,...,Xn), µX1⊗· · ·⊗µXn) hold true? Also, is Isym(X1, . . . , Xn) = 0 equivalent
to the independence of X1, . . . , Xn?
(4) Although the boundedness assumption for X1, . . . , Xn is rather essential in
the above discussions, it is desirable to extend the results in this section to
X1, . . . , Xn not necessarily bounded but having all moments.
2. The discrete case
Let Y be a finite set with a probability measure p. The Shannon entropy of p is
S(p) := −
p(y) log p(y).
For each sequence y = (y1, . . . , yN) ∈ Y
N , the type of y is a probability measure on Y
given by
νy(t) :=
Ny(t)
where Ny(t) := #{j : yj = t}, t ∈ Y .
The number of possible types is smaller than (N + 1)#Y . If ν is a type and TN(ν)
denotes the set of all sequences of type ν from YN , then the cardinality of TN(ν) is
estimated as follows:
(N + 1)#Y
eNS(ν) ≤ #TN (ν) ≤ e
NS(ν) (2.1)
(see [1, 12.1.3] and [2, Lemma 2.2]).
Let p be a probability meausre on Y . For each N ∈ N and δ > 0 we define ∆(p;N, δ)
to be the set of all sequences y ∈ YN such that |νy(t)−p(t)| < δ for all t ∈ Y . In other
words, ∆(p;N, δ) is the set of all δ-typical sequeces (with respect to the measure p).
Then the next lemma is well known.
Lemma 2.1.
S(p) = lim
log#∆(p;N, δ).
In fact, this easily follows from (2.1). Let PN,δ be the maximizer of the Shannon
entropy on the set of all types νy, y ∈ Y
N , such that |νy(t) − p(t)| < δ for all t ∈ Y .
We can use the Shannon entropy of the type class corresponding to PN,δ to estimate
the cardinality of ∆(p;N, δ):
(N + 1)−#YeNS(PN,δ) ≤ #∆(p;N, δ) ≤ eNS(PN,δ)(N + 1)#Y .
A NEW APPROACH TO MUTUAL INFORMATION 9
It follows that
log#∆(p;N, δ) = sup{S(q) : q is a probability meausre on Y
such that |q(t)− p(t)| < δ, t ∈ Y},
and the lemma follows.
We consider the case where p is the joint distribution of an n-tuple (X1, . . . , Xn)
of discrete random variables on (Ω,P). Throughout this section we assume that the
random variables X1, . . . , Xn have their values in a finite set X = {t1, . . . , td}.
Definition 2.2. Let p(X1,...,Xn) denote the joint distribution of (X1, . . . , Xn), which is
a measure on X n while the distribution pXi of Xi is a measure on X , 1 ≤ i ≤ n. We
write ∆(Xi;N, δ) for ∆(pXi;N, δ) and ∆(X1, . . . , Xn;N, δ) for ∆(p(X1,...,Xn);N, δ).
Next, we introduce the counterparts of Definitions 1.3 and 1.4 in the discrete variable
case.
Definition 2.3. The action of SN on X
N is similar to that on RN given in Defintion
1.3. For N ∈ N let XN≤ denote the set of all sequences of length N of the form
x = (t1, . . . , t1, t2, . . . , t2, . . . , td, . . . , td).
Oviously, such a sequence x is uniquely determined by (Nx(t1), . . . , Nx(td)) or the type
of x. That is, XN≤ is regarded as the set of all types from X
N . For each N ∈ N and
δ > 0 we denote by ∆sym(X1, . . . , Xn;N, δ) the set of all (σ1, . . . , σn) ∈ S
N such that
(σ1(x1), . . . , σn(xn)) ∈ ∆(X1, . . . , Xn;N, δ)
for some (x1, . . . ,xn) ∈ (X
n. Define
Isym(X1, . . . , Xn) := − lim
lim sup
log γ⊗nSN (∆sym(X1, . . . , Xn;N, δ)),
and Isym(X1, . . . , Xn) by replacing lim sup by lim inf. Moreover, for each 1 ≤ i ≤ n,
choose a sequence ξi = {ξi(N)} of ξi(N) = (ξi(N)1, . . . , ξi(N)N) ∈ X
≤ such that
νξi(N) → pXi as N → ∞. We then define ∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N, δ),
Isym(X1, . . . , Xn : ξ1, . . . , ξn) and Isym(X1, . . . , Xn : ξ1, . . . , ξn) as in Definition 1.4.
Lemma 2.4. For any choices of approximating sequences ξ1, . . . , ξn one has
Isym(X1, . . . , Xn) = Isym(X1, . . . , Xn : ξ1, . . . , ξn),
Isym(X1, . . . , Xn) = Isym(X1, . . . , Xn : ξ1, . . . , ξn).
Proof. It suffices to show that for each δ > 0 there are a δ′ > 0 and an N0 ∈ N such
∆sym(X1, . . . , Xn;N, δ
′) ⊂ ∆sym(X1, . . . , Xn : ξ1(N), . . . , ξn(N);N, δ) (2.2)
for all N ≥ N0. Choose δ
′ > 0 so that 3ndn+1δ′ ≤ δ, where d = #X . Suppose
(σ1, . . . , σn) is in the left-hand side of (2.2) so that (σ1(x1), . . . , σn(xn)) ∈ ∆(X1, . . . , Xn;
N, δ′) for some (x1, . . . ,xn), xi = (xi1, . . . , xiN) ∈ X
≤ . Since
|ν(σ1(x1),...,σn(xn))(z1, . . . , zn)− p(X1,...,Xn)(z1, . . . , zn)| < δ
′, (z1, . . . , zn) ∈ X
n, (2.3)
10 F. HIAI AND D. PETZ
νxi(t) =
z1,...,zi−1,zi+1,...,zn∈X
ν(σ1(x1),...,σn(xn))(z1, . . . , zi−1, t, zi+1, . . . , zn), t ∈ X ,
pXi(t) =
z1,...,zi−1,zi+1,...,zn∈X
p(X1,...,Xn)(z1, . . . , zi−1, t, zi+1, . . . , zn), t ∈ X ,
it follows that
|νxi(t)− pXi(t)| < d
n−1δ′ (2.4)
for any 1 ≤ i ≤ n and t ∈ X . Now, choose an N0 ∈ N so that |νξi(N)(t)− pXi(t)| < δ
and hence
|νξi(N)(t)− νxi(t)| < 2d
n−1δ′ (2.5)
for any 1 ≤ i ≤ n and t ∈ X and for all N ≥ N0. Since
|(Nξi(N)(t1) + · · ·+Nξi(N)(tl))− (Nxi(t1) + · · ·+Nxi(tl))|
≤ |Nξi(N)(t1)−Nxi(t1)|+ · · ·+ |Nξi(N)(tl)−Nxi(tl)|
< 2Ndnδ′
for every 1 ≤ l ≤ d thanks to (2.5), it is easily seen that
j ∈ {1, . . . , N} : ξi(N)j 6= xij
< 2Ndn+1δ′
for any 1 ≤ i ≤ n. Hence we get
|ν(σ1(ξ1(N)),...,σn(ξn(N)))(z1, . . . , zn)− ν(σ1(x1),...,σn(xn))(z1, . . . , zn)|
∣#{j : ξ1(N)σ−1
(j) = z1, . . . , ξn(N)σ−1n (j) = zn}
−#{j : x1σ−1
(j) = z1, . . . , xnσ−1n (j) = zn}
#{j : ξi(N)j 6= xij} < 2nd
n+1δ′
so that thanks to (2.3)
|ν(σ1(ξ1(N)),...,σn(ξn(N)))(z1, . . . , zn)− p(X1,...,Xn)(z1, . . . , zn)| < 3nd
n+1δ′ ≤ δ
for every (z1, . . . , zn) ∈ X
n. Therefore, (σ1, . . . , σn) is in the right-hand side of (2.2),
as required. �
The next theorem is the discrete variable version of Theorem 1.6.
Theorem 2.5.
Isym(X1, . . . , Xn) = Isym(X1, . . . , Xn) = −S(X1, . . . , Xn) +
S(Xi).
Proof. For each sequence (N1, . . . , Nd) of integers Nl ≥ 0 with
l=1Nl = N , let
S(N1, . . . , Nd) denote the subgroup of SN consisting of products of permutations of
{1, . . . , N1}, {N1 + 1, . . . , N1 +N2}, . . . , {N1 + · · ·+Nd−1 + 1, . . . , N}, and let
SN/S(N1, . . . , Nd)
be the set of left cosets of S(N1, . . . , Nd). For each x ∈ X
≤ and σ ∈ SN we write
[σ]x for the left coset of S(Nx(t1), . . . , Nx(td)) containing σ. Then it is clear that
A NEW APPROACH TO MUTUAL INFORMATION 11
every s ∈ XN is represented as s = σ(x) with a unique pair (x, [σ]x) of x ∈ X
≤ and
[σ]x ∈ SN/S(Nx(t1), . . . , Nx(td)).
For any ε > 0 one can choose a δ > 0 such that for every 1 ≤ i ≤ n and every
probability measure p on X , if |p(t)−pXi(t)| < δ for all t ∈ X , then |S(p)−S(pXi)| < ε.
This implies that for each N ∈ N and 1 ≤ i ≤ n, one has |S(νx) − S(pXi)| < ε
whenever x ∈ ∆(Xi;N, δ). Notice that ∆sym(X1, . . . , Xn;N, δ/d
n−1) is the union of
[σ1]x1 × · · · × [σn]xn for all (x1, . . . ,xn; [σ1]x1 , . . . , [σn]xn) of xi ∈ X
≤ and [σi]xi ∈
SN/S(Nxi(t1), . . . , Nxi(td)) such that (σ1(x1), . . . , σn(xn)) ∈ ∆(X1, . . . , Xn;N, δ/d
n−1).
Now, suppose (x1, . . . ,xn) ∈ (X
n, (σ1, . . . , σn) ∈ S
N and (σ1(x1), . . . , σn(xn)) ∈
∆(X1, . . . , Xn;N, δ/d
n−1). Then, for each 1 ≤ i ≤ n we get xi ∈ ∆(Xi;N, δ), i.e.,
|νxi(t)− pXi(t)| < δ for all t ∈ X as (2.4). Hence we have
[σ1]x1 × · · · × [σn]xn
x∈∆(Xi;N,δ)
Nx(t)!
(2.6)
so that
#∆sym(X1, . . . , Xn;N, δ/d
≤ #∆(X1, . . . , Xn;N, δ/d
n−1) ·
x∈∆(Xi;N,δ)
Nx(t)!
Therefore,
log γ⊗nSN
∆sym(X1, . . . , Xn;N, δ/d
log#∆(X1, . . . , Xn;N, δ/d
x∈∆(Xi;N,δ)
logNx(t)!
logN !. (2.7)
For each 1 ≤ i ≤ n and for any x ∈ ∆(Xi;N, , δ), the Stirling formula yields
logNx(t)!−
logN !
Nx(t)
logNx(t)−
Nx(t)
− logN + 1 + o(1)
= −S(νx) + o(1) ≤ −S(pXi) + ε+ o(1) as N →∞ (2.8)
thanks to the above choice of δ > 0. Here, note that the o(1) in the above estimate
is uniform for x ∈ ∆(Xi;N, δ). Hence, by (2.7), (2.8) and by Lemma 2.1 applied to
p(X1,...,Xn) on X
n, we obtain
−Isym(X1, . . . , Xn) ≤ S(p(X1,...,Xn))−
S(pXi) + nε
and hence
Isym(X1, . . . , Xn) ≥ −S(X1, . . . , Xn) +
S(Xi). (2.9)
12 F. HIAI AND D. PETZ
Next, we prove the converse direction. For any ε > 0 choose a δ > 0 as above. For
N ∈ N let Ξ(N, δ/dn−1) be the set of all (x1, . . . ,xn) ∈ (X
n such that
(σ1(x1), . . . , σn(xn)) ∈ ∆(X1, . . . , Xn;N, δ/d
for some (σ1, . . . , σn) ∈ S
N . Furthermore, for each (x1, . . . ,xn) ∈ Ξ(N, δ/d
n−1), let
Σ(x1, . . . ,xn;N, δ/d
n−1) be the set of all
([σ1]x1 , . . . , [σn]xn) ∈
SN/S(Nxi(t1), . . . , Nxi(td))
such that (σ1(x1), . . . , σn(xn)) ∈ ∆(X1, . . . , Xn;N, δ/d
n−1). Then it is obvious that
#∆(X1, . . . , Xn;N, δ/d
n−1) ≤
(x1,...,xn)∈Ξ(N,δ/dn−1)
#Σ(x1, . . . ,xn;N, δ/d
n−1). (2.10)
When (x1, . . . ,xn) ∈ Ξ(N, δ/d
n−1), we get xi ∈ ∆(Xi;N, δ) as (2.4) for 1 ≤ i ≤ n.
Hence it is seen that
#Ξ(N, δ/dn−1) ≤
#∆(Xi;N, δ)
(N1, . . . , Nd) : Nl ≥ 0 is an integer in
N(pXi(tl)− δ), N(pXi(tl) + δ)
for 1 ≤ l ≤ d
< (2Nδ + 1)nd. (2.11)
For any fixed (x1, . . . ,xn) ∈ Ξ(N, δ/d
n−1), suppose ([σ1]x1 , . . . , [σn]xn) ∈ Σ(x1, . . . ,xn;
N, δ/dn−1); then we get
[σ1]x1 × · · · × [σn]xn
x∈∆(Xi;N,δ)
Nx(t)!
similarly to (2.6). Therefore,
#∆sym(X1, . . . , Xn;N, δ/d
([σ1]x1 ,...,[σn]xn )∈Σ(x1,...,xn;N,δ/d
[σ1]x1 × · · · × [σn]xn
≥ #Σ(x1, . . . ,xn;N, δ/d
n−1) ·
x∈∆(Xi;N,δ)
Nx(t)!
. (2.12)
By (2.10)–(2.12) we obtain
#∆(X1, . . . , Xn;N, δ/d
n−1) ≤
#∆sym(X1, . . . , Xn;N, δ/d
n−1) · (2Nδ + 1)nd
minx∈∆(Xi;N,δ)
t∈X Nx(t)!
A NEW APPROACH TO MUTUAL INFORMATION 13
so that
log#∆(X1, . . . , Xn;N, δ/d
log γ⊗nSN
∆sym(X1, . . . , Xn;N, δ/d
x∈∆(Xi;N,δ)
logNx(t)!
logN ! +
log(2Nδ + 1).
Since it follows similarly to (2.8) that
logNx(t)! +
logN ! ≤ S(pXi) + ε+ o(1) as N →∞
with uniform o(1) for all x ∈ ∆(Xi;N, δ), we obtain
S(p(X1,...,Xn)) ≤ −Isym(X1, . . . , Xn) +
S(pXi) + nε
by Lemma 2.1 again, and hence
Isym(X1, . . . , Xn) ≤ −S(X1, . . . , Xn) +
S(Xi). (2.13)
The conclusion follows from (2.9) and (2.13). �
In particular, the mutual information I(X1 ∧ X2) of X1 and X2 is equivalently ex-
pressed as
I(X1 ∧X2) = S(p(X1,X2), pX1 ⊗ pX2) = −S(p(X1,X2)) + S(pX1) + S(pX2)
= Isym(X1, X2) = Isym(X1, X2).
Similarly to the problem (2) mentioned in the last of Section 1, it is unknown whether
the limit
log γ⊗nSN
∆sym(X1, . . . , Xn;N, δ)
exists or not.
References
[1] T. M. Cover and J. A. Thomas, Elements of Information Theory, Second edition, Wiley-
Interscience, Hoboken, NJ, 2006.
[2] I. Csiszár and P. C. Shields, Information Theory and Statistics: A Tutorial, in “Foundations
and Trends in Communications and Information Theory,” Vol. 1, No. 4 (2004), 417-528, Now
Publishers.
[3] F. Hiai, T. Miyamoto and Y. Ueda, Orbital approach to microstate free entropy, preprint, 2007,
math.OA/0702745.
[4] F. Hiai and D. Petz, The Semicircle Law, Free Random Variables and Entropy, Mathematical
Surveys and Monographs, Vol. 77, Amer. Math. Soc., Providence, 2000.
[5] D. Voiculescu, The analogues of entropy and of Fisher’s information measure in free probability
theory, II, Invent. Math. 118 (1994), 411–440.
[6] D. Voiculescu, The analogue of entropy and of Fisher’s information measure in free probability
theory VI: Liberation and mutual free information, Adv. Math. 146 (1999), 101–166.
http://arxiv.org/abs/math/0702745
14 F. HIAI AND D. PETZ
Graduate School of Information Sciences, Tohoku University, Aoba-ku, Sendai 980-
8579, Japan
Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences, H-1053
Budapest, Reáltanoda u. 13-15, Hungary
	Introduction
	1. The continuous case
	2. The discrete case
	References
ABSTRACT
  A new expression as a certain asymptotic limit via "discrete micro-states" of
permutations is provided to the mutual information of both continuous and
discrete random variables.

<|endoftext|><|startoftext|>
Introduction
Zhou and Sornette (2003) analyzed the deflated quarterly average sales prices
p(t) from December 1992 to December 2002 of new houses sold in all the states
in the USA and by regions (northeast, midwest, south and west) and found
that, while there was undoubtedly a strong growth rate, there was no evidence
of a bubble in the latest six years (as qualified by a super-exponential growth).
Then, Zhou and Sornette (2006) analyzed the quarterly average sale prices of
new houses sold in the USA as a whole, in the northeast, midwest, south,
and west of the USA, in each of the 50 states and the District of Columbia
of the USA up to the first quarter of 2005, to determine whether they have
grown faster-than-exponential (which is taken as the diagnostic of a bubble).
Zhou and Sornette (2006) found that 22 states (mostly Northeast and West)
exhibit clear-cut signatures of a fast growing bubble. From the analysis of the
S&P 500 Home Index, they concluded that the turning point of the bubble
would probably occur around mid-2006. The specific statement found at the
bottom of page 306 of Ref.[Zhou and Sornette (2006)] is: “We observe a good
stability of the predicted tc ≈ mid-2006 for the two LPPL models (2) and (3).
The spread of tc is larger for the second-order LPPL fits but brackets mid-
2006. As mentioned before, the power-law fits are not reliable. We conclude
that the turning point of the bubble will probably occur around mid-2006.”
It should be stressed that these studies departed from most other reports by
analysts and consulting firms on real estate prices in that Zhou and Sornette
(2003, 2006) did not characterize the housing market as overpriced in 2003. It is
only in 2004-2005 that they confirmed that the signatures of an unsustainable
bubble path has been revealed.
Let us briefly analyze how this prediction has fared. The upper panel of Figure
1 shows the quarterly house price indexes (HPIs) in the 21 states and in the
District of Columbia (DC) from 1994 to the fourth quarter of 2006 released
by the OFHEO. It is evident that the growth in most of these 22 HPIs has
slowed down or even stopped during the year of 2006. When we look at the
S&P Case-Shiller Home Indexes of the 20 major US cities, as illustrated in the
lower panel of Figure 1, we observe that the majority of the S&P/CSIs had
a maximum denoted by a solid dot in the middle of 2006, validating the pre-
diction of Zhou and Sornette (2006). Specifically, the times of the maxima are
respectively 2006/06/01, 2006/09/01, 2005/11/01, 2006/05/01, 2006/08/01,
2006/05/01, 2006/12/01, 2006/07/01, 2006/08/01, 2006/09/01, 2005/09/01,
2005/12/01, 2006/09/01, 2006/09/01, 2006/08/01, 2006/06/01, 2006/07/01,
2006/09/01, 2006/08/01, 2006/12/01, 2006/06/01, and 2006/07/01 for the 20
cities shown in the legend of the lower panel. The only two cities with a max-
Education Foundation (Grant 101086), and the Alfred Kastler Foundation which
supported W.-X. Zhou for a visiting position in France.
imum occurring later towards the end of 2006 (2006/12/01) are Miami and
Seattle. However their growth rates decreased remarkably in 2006 as shown
in the figure. Furthermore, the S&P/CS Home Price Composite-10 reached
its historical high 226.29 on 2006/06/01 and the Composite-20 culminated to
206.53 on 2006/07/01, again confirming remarkably well the validity of the
forecast of Zhou and Sornette (2006).
1994 1996 1998 2000 2002 2004 2006 2008 2010
2000 2001 2002 2003 2004 2005 2006 2007
Phoenix − AZ
Los Angeles
San Diego
San Francisco
Denver
Washington
Miami
Tampa − FL
Atlanta − GA
Chicago
Boston
Detroit − MI
Minneapolis − MN
Charlotte − NC
Las Vegas
New York
Cleveland − OH
Portland − OR
Dallas − TX
Seattle − WA
Fig. 1. Evaluation of the prediction of Zhou and Sornette (2006) that “the turning
point of the bubble will probably occur around mid-2006” using the OFHEO HPI
data (upper panel) and the S&P CSI data (lower panel).
In this note, we provide a more regional study of the diagnostic of bubbles
and the prediction of their demise. Specifically, we analyze the Case-Shiller-
Weiss (CSW) Zip Code Indexes of 27 different Las Vegas regions calculated
with a monthly rate from June-1983 to March-2005. The CSW Indexes are
based on the so-called repeat sales methods which directly measure house
price appreciations. The key to these data is that they are observations of
multiple transactions on the same property, repeated over many properties
and then pooled in an index. Prices from different time periods are combined
to create “matched pairs,” providing a direct measure of price changes for a
given property over a known period of time. Bailey et al. (1963) proposed the
basic repeat sales method over four decades ago, but only after the work by
Case and Shiller (1987, 1989, 1990) did the idea receive significant attention
in the housing research community.
Studying the Las Vegas database is particular suitable since Las Vegas belongs
to a state which was identified by Zhou and Sornette (2006) as one of the
22 states with a fast growing bubble in 2005. With access to 27 different
CSW Zip Code Indexes of Las Vegas, we are able to obtain more reliable and
fine-grained measures, which both confirm and extend the previous analyses
of Zhou and Sornette (2003, 2006). The next section recalls the conceptual
background underlying our empirical approach. Then, section 3 analyzes the
regional CSW indexes for Las Vegas, showing that there is a regime shift
separated by a bubble around year 2004. Section 4 identifies and then analyzes
the yearly periodicity and intra-year pattern detected in the growth rate of
the regional CSW indexes. Section 5 offers a preliminary forecast based on the
periodicity analyses in Sec. 4. Section 6 concludes.
2 Conceptual background of our empirical analysis
2.1 Humans as social animals and herding
Humans are perhaps the most social mammals and they shape their envi-
ronment to their personal and social needs. This statement is based on a
growing body of research at the frontier between new disciplines called neuro-
economics, evolutionary psychology, cognitive science, and behavioral finance.
This body of evidence emphasizes the very human nature of humans with
its biases and limitations, opposed to the previously prevailing view of ratio-
nal economic agents optimizing their decisions based on unlimited access to
information and to computation resources.
Here, we focus on an empirical question (the existence and detection of real-
estate bubbles) which, we hypothesize, is a footprint of perhaps the most
robust trait of humans and the most visible imprint in our social affairs: im-
itation and herding. Imitation has been documented in psychology and in
neuro-sciences as one of the most evolved cognitive process, requiring a de-
veloped cortex and sophisticated processing abilities. In short, we learn our
basics and how to adapt mostly by imitation all along our life. It seems that
imitation has evolved as an evolutionary advantageous trait, and may even
have promoted the development of our anomalously large brain (compared
with other mammals). It is actually “rational” to imitate when lacking suffi-
cient time, energy and information to take a decision based only on private
information and processing, that is..., most of the time. Imitation, in obvious
or subtle forms, is a pervasive activity of humans. In the modern business,
economic and financial worlds, the tendency for humans to imitate leads in
its strongest form to herding and to crowd effects.
Based on a theory of cooperative herding and imitation, we have shown that
imitation leads to positive feedbacks, that is, an action leads to consequences
which themselves reinforce the action and so on, leading to virtuous or vicious
circles. We have formalized these ideas in a general mathematical theory which
has led to observable signature of herding, in the form of so-called log-periodic
power law acceleration of prices. A power law acceleration of prices reflects
the positive feedback mechanism. When present, log-periodicity takes into
account the competition between positive feedback (self-fulfilling sentiment),
negative feedbacks (contrariant behavior and fundamental/value analysis) and
inertia (everything takes time to adjust). Sornette (2003) presented a general
introduction, a synthesis and examples of applications.
2.2 Definition and mechanism for bubbles
The term “bubble” is widely used but rarely clearly defined. Following Case and Shiller
(2003), the term “bubble” refers to a situation in which excessive public ex-
pectations of future price increases cause prices to be temporarily elevated.
During a housing price bubble, homebuyers think that a home that they would
normally consider too expensive for them is now an acceptable purchase be-
cause they will be compensated by significant further price increases. They
will not need to save as much as they otherwise might, because they expect
the increased value of their home to do the saving for them. First-time home-
buyers may also worry during a housing bubble that if they do not buy now,
they will not be able to afford a home later. Furthermore, the expectation of
large price increases may have a strong impact on demand if people think that
home prices are very unlikely to fall, and certainly not likely to fall for long,
so that there is little perceived risk associated with an investment in a home.
What is the origin of bubbles? In a nutshell, speculative bubbles are caused
by “precipitating factors” that change public opinion about markets or that
have an immediate impact on demand, and by “amplification mechanisms”
that take the form of price-to-price feedback, as stressed by Shiller (2000). A
number of fundamental factors can influence price movements in housing mar-
kets. On the demand side, demographics, income growth, employment growth,
changes in financing mechanisms or interest rates, as well as changes in loca-
tion characteristics such as accessibility, schools, or crime, to name a few, have
been shown to have effects. On the supply side, attention has been paid to
construction costs, the age of the housing stock, and the industrial organiza-
tion of the housing market. The elasticity of supply has been shown to be a
key factor in the cyclical behavior of home prices. The cyclical process that
we observed in the 1980s in those cities experiencing boom-and-bust cycles
was caused by the general economic expansion, best proxied by employment
gains, which drove demand up. In the short run, those increases in demand
encountered an inelastic supply of housing and developable land, inventories
of for-sale properties shrank, and vacancy declined. As a consequence, prices
accelerated. This provided an amplification mechanism as it led buyers to
anticipate further gains, and the bubble was born. Once prices overshoot or
supply catches up, inventories begin to rise, time on the market increases,
vacancy rises, and price increases slow down, eventually encountering down-
ward stickiness. The predominant story about home prices is always the prices
themselves (see Shiller, 2000; Sornette, 2003); the feedback from initial price
increases to further price increases is a mechanism that amplifies the effects of
the precipitating factors. If prices are going up rapidly, there is much word-of-
mouth communication, a hallmark of a bubble. The word of mouth can spread
optimistic stories and thus help cause an overreaction to other stories, such as
stories about employment. The amplification can also work on the downside
as well. Price decreases will generate publicity for negative stories about the
city, but downward stickiness is encountered initially.
2.3 Was there a bubble? Status of the argument based on the ratio of cost of
owning versus cost of renting
In recent years, there has been increasing debates on whether there was a real
estate bubble or not in the United States of America. Case and Shiller (2003),
Shiller (2006) and Smith and Smith (2006) argued that the house prices over
the period 2000-2005 were not abnormal as they reflected only the convergence
of the prices to their fundamentals from below. In contrast, Zhou and Sornette
(2006) and Roehner (2006) have suggested that there was a bubble, which be-
came identifiable only after 2003, that is, after the work of Zhou and Sornette
(2003).
In this context, it is instructive to comment on the study by Himmelberg et al.
(2005), from the Federal Reserve Bank of New York , as it reflects the never
ending debate between tenants of the fundamental valuation explanation and
those invoking speculative bubbles. We are resolutely part of the second group.
Himmelberg et al. (2005) constructed measures of the annual cost of single-
family housing for 46 metropolitan areas in the United States over the last 25
years and compared them with local rents and incomes as a way of judging the
level of housing prices. In a nutshell, they claimed in 2005 that conventional
metrics like the growth rate of house prices, the price-to-rent ratio, and the
price-to-income ratio can be misleading and lead to incorrect conclusions on
the existence of the real-estate bubble. Their measure showed that, during
the 1980s, houses looked most overvalued in many of the same cities that
subsequently experienced the largest house price declines. But they found that
from the trough of 1995 to 2004, the cost of owning rose somewhat relative
to the cost of renting, but not, in most cities, to levels that made houses look
overvalued.
The rosy conclusion of Himmelberg et al. (2005), that 2004-2005 prices were
justifiable and that there was no risk of deflation as no bubble was present, is
based on a particularly curious comparison between cost of owning and cost of
renting, as noticed by Jorion (2005), in a letter to the Wall Street Journal. In-
deed, they candidly revealed however that their “cost of owning” calculations
imply an “expected appreciation on the property” coefficient. The value for
this factor is no doubt derived from figures for appreciation as currently ob-
served on the housing market, meaning they regarded the current appreciation
level as a reasonable assumption for what would indeed happen next – which
is precisely what our analyses and that of others question. In other words, the
authors had unwittingly hard-wired into their model the assertion that there
was no housing bubble; little wonder then that this is also what they felt au-
thorized to conclude. The circularity of their reasoning is particularly obvious
in an illustration they gave for San Francisco where for more than 60 years the
price-to-rent ratio has exceeded the national average, which, so they claimed,
“does not necessarily make owning there more expensive than renting.” The
reason why is that “high financing costs are offset by above-average expected
capital gains.” Translated, this means that as long as there is a bubble, prices
will go up and investing in a house remain a profitable operation. This trivial
statement is hollow; the real question is whether the trend that is observed
now remains sustainable.
In addition to this criticism put forward by Jorion (2005), there are other
reasons to doubt the validity of the conclusion of Himmelberg et al. (2005). In
the own words of Himmelberg et al. (2005), “the ratio of the cost of owning to
the cost of renting is especially sensitive to the real long-term interest rates.”
They are right in their rosy conclusion... as long as the long-term interest rates
remain exceptionally low. It is particularly surprising that their estimation of
the ratio of the cost of owning to the cost of renting was based on the most
recent rates over the preceding year of their analysis (2004), while the price of a
house is a long-term investment: what will be the long-term rates in 10, 20, 30,
or 50 years? Another problem is that their analysis was “mono-dimensional”:
they proposed that everything depends only on the ratio of the cost of owning
to the cost of renting. But they missed the interest rates as an independent
variable. As a consequence, it is not reasonable to compare the 1980s and the
present time, as the long-term interest rates had nothing in common. Another
problem with their analysis is that they assumed “equilibrium,” while people
are sensitive to the history-dependent path followed by the prices. In other
words, people are sensitive to the way prices reach a certain level, if there is
an acceleration that can self-fuel itself for a while, while Himmelberg et al.
(2005) discussed only the mono-dimensional level of the price, and not how it
got there. We think that this general error made by “equilibrium” economists
constitutes a fundamental flaw which fails to capture the real nature of the
organization of human societies and their decision process. In the sequel, we
actually focus our attention on signatures of price trajectories that highlight
the importance of history dependence for prediction.
This discussion is reminiscent of the proposition by Mauboussin and Hiler
(1999), offered close to the peak of the Internet and new technology bubble
that culminated in 2000, that better business models, the network effect, first-
to-scale advantages, and real options effect could account rationally for the
high prices of dot.com and other New Economy companies. These interest-
ing views expounded in early 1999 were in synchrony with the bull market
of 1999 and preceding years. They participated in the general optimistic view
and added to the strength of the herd. Later, after the collapse of the bubble,
these explanations seem less attractive. This did not escape the then U.S. Fed-
eral Reserve chairman Alan Greenspan (1997), who said : “Is it possible that
there is something fundamentally new about this current period that would
warrant such complacency? Yes, it is possible. Markets may have become more
efficient, competition is more global, and information technology has doubt-
less enhanced the stability of business operations. But, regrettably, history is
strewn with visions of such new eras that, in the end, have proven to be a
mirage. In short, history counsels caution.”
3 Regime shift in the CSW Zip Code Indexes of Las Vegas
3.1 Description of the data
We now turn to the analysis of the CSW indexes of 27 different Las Vegas zip
regions obtained with a monthly rate. The 27 monthly CSW data sets start
from June-1983 and end in March-2005. Figure 2 shows the price trajectories
of all the 27 CSW indexes. Visual inspection shows (i) a very similar behavior
of all the different zip codes and (ii) a sudden increase of the indexes since
Mid-2003. Let us now analyze this data quantitatively.
3.2 Power law fits
The simplest mathematical equation capturing the positive feedback effect
and herding is the power law formula (see Broekstra et al., 2005, for a simple
introduction in a similar context)
I(t) = A+B|tc − t|
m , (1)
1980 1985 1990 1995 2000 2005 2010
Fig. 2. Time evolution of the Case-Shiller-Weiss (CSW) Zip Code Indexes of 27 Las
Vegas zip regions from June-1983 to March-2005.
where B < 0 and 0 < m < 1 or B > 0 and m < 1. Others cases do not
qualify as a power law acceleration. For B < 0 and 0 < m < 1 or B > 0
and m < 0, the trajectory of I(t) described by (1) expresses the existence of
an accelerating bubble, which is faster than exponential. This is taken as one
hallmark of the existence of a bubble.
Notice also that this formula expresses the existence of a singularity at time tc,
which should be interpreted as a change of regime (the mathematical singu-
larity does not exist in reality and is rounded off by so-called finite-size effects
and the appearance of a large susceptibility to other mechanisms). This criti-
cal time tc must be interpreted as the end of the bubble and the time where
the regime is transiting to another state through a crash or simply a plateau
or a slowly moving correction.
We have fitted each of the 27 individual CSW indexes using the pure power
law model (1). The data used for fitting is from Dec-1995 to Jun-2005. We do
not show the results as the signature of a power law growth is not evident,
essentially because the acceleration is only over a rather short period of time
from approximately 2002 to 2004. As a consequence, power law fits give unre-
liable critical time tc too much in the future (like 2008 and beyond). We have
thus redone the fits of the 27 CSW indexes over a shorter time interval from
Aug-2001 to Jun-2005. A typical example is shown in Fig 3. All other 26 CSW
are very similar, with some variations of the parameters, but the message is
the same: while there is a clear faster-than-exponential acceleration over most
of the time interval, the price trajectory has clearly transitioned into another
regime in the latter part of the time interval considered here. The transition
occurred smoothly from mid-2004 to mid-2005 (the end of the time period
analyzed here).
It is important to recognize that the power law regime is expected only rela-
2001 2002 2003 2004 2005 2006
 = 2012.94;   m = −12;   χ = 0.007679
Fig. 3. Typical evolution of a CSW index from Aug-2001 to Jun-2005 and its fit by
a power law, showing both the faster-than-exponential growth up to mid-2004 and
the smooth transition to a much slower growth at later times. The root-mean-square
χ of the residuals of the fit as well at tc and m are given inside the figure.
tively close to the critical time tc, while other behaviors are expected far from
tc. The simplest model is to consider that, far from tc, the price follows an
exponential growth with an approximately constant growth rate µ:
I(t) = a+ beµt . (2)
A fuller description is thus to consider that formula (2) holds from the begin-
ning of the time series up to a cross-over time t∗, beyond which expression (1)
takes over. Any given price trajectory should thus be fitted by (2) from some
initial time tstart to time t
∗ and then by (1) from t∗ to the end of the time
series. Technically, t∗ is known from the parameters a, b, µ, A,B, tc, m by the
condition of continuity of I(t) at t = t∗, that is, both formulas give the same
value at t = t∗. We can further determine one of the parameters a, b or µ by
imposing a condition of differentiability at t∗, that is, the first time-derivative
of I(t) is continuous at t∗. This approach is known in numerical analysis as
“asymptotic matching” (see Bender and Orszag, 1978).
A simplified description of such a cross-over between a standard exponential
growth and the power law super-exponential acceleration is obtained by using
a more compact formulation
I(t) = A+B tanh[(tc − t)/τ ]
m , (3)
where tanh denotes the hyperbolic tangent function. This expression derives
from a study of the transition from the non-critical to critical regime in rup-
ture processes (of which bubbles and their terminal singularity belong to)
conducted by Sornette and Andersen (1998). This expression has the virtue
of providing automatically a smooth transition between the exponential be-
havior (2) and the pure power law (1), since tanh[(tc − t)/τ ] ≈ (tc − t)/τ for
tc− t < τ and tanh[(tc− t)/τ ] ≈ 1− 2e
2(t−tc)/τ for tc− t > τ . In this later case
tc − t > τ , expression (3) becomes of the form (2) with m = 1 and
a=A+B , (4)
b=−2Be−2tc/τ , (5)
µ=1/τ . (6)
In contrast, for tc − t < τ , expression (3) becomes of the form (1) with the
correspondence B/τm → B. Expression (3) has only five free parameters, in
contrast with the model involving the cross-over from (2) to (1) which has 7
free parameters (a, b, µ, A,B, tc, m) while t
∗ is determined by the asymptotic
matching). The pure power law formula (1) has 4 parameters while the ex-
ponential law (2) has just 3 parameters. The problem with expression (3) is
that it does not recover a pure exponential growth even for tc − t > τ , when
m 6= 1. Thus, expression (3) is limited in fully describing a possible cross-over
from a standard mild exponential growth and an super-exponential power law
acceleration. Our tests (not shown) find that a fit with model (3) retrieve the
pure power law model (1) with the same critical time tc and exponent m and
the same root-mean-square residual r.m.s. (the fit adjusts the parameter τ to a
very large value, ensuring that the fit is always in the regime tc−t ≪ τ so that
the hyperbolic tangential model reduces to the pure power law model). Thus,
contrary to our initial hopes, this approach does not provide any additional
insight.
Inspired by these tests, we could propose the following modified model
I(t) = a+ beµt(tc − t)
m . (7)
It has 5 adjustable parameters, like model (3), but it seems more flexible to
describe the looked-for cross-over: for large tc− t, the power law term (tc− t)
changes slowly, especially for 0 < m < 1 as is expected here; for small tc−t, the
power law term changes a lot while the exponential term is basically constant.
But, this model is correct for a critical point only if m < 0 so that b > 0;
otherwise, if 0 < m < 1, b < 0 and for tc− t large, the exponential term which
dominate does not describe a growth but an exponentially accelerating decay.
For 0 < m < 1, we thus need a different formulation. We propose
I(t) = a+ beµt + c(tc − t)
m . (8)
We have fitted this formula to the data over the four periods 1983 - Oct. 2004,
1991 - Oct. 2004, 1983 - Mar. 2005, 1991 - Mar. 2005 and, while the fits are
reasonable, the critical time tc is found to overshoot to 2007-2008, which is a
typical signature that the model is not predictive.
In conclusion of this first preliminary study, the presence of a bubble (faster-
than-exponential growth) is confirmed but the determination of the end of
this phase is for the moment unreliable.
3.3 Dependence of the growth rate on the index value
The monthly growth rate g(t) of a given CSW index at time t is defined by
g(t) = ln[p(t)/p(t− 1)] , (9)
where p(t) is the price of that CSW index at time t. Figure 4 shows the
evolution of the growth rates of the 27 CSW indexes from June-1983 to March-
2005. While there are some variations, all 27 CSW indexes follow practically
the same pattern. We clearly observe a large peak of growth over the period
2003-2005. Notice that this recent peak is much larger and coherent than the
previous one ending in 1991, which was followed by a price stabilization and
even a price drop in certain cases. This figure stresses that the acceleration in
growth rate is a very localized event which occurred essentially in 2003-2004
and the subsequent growth rate has leveled off to pre-bubble times. We can
conclude that there has been no bubble from 1990 to 2002, approximately,
then a short-lived bubble until mid-2004 followed by a smoothed transition
back to normal.
1980 1985 1990 1995 2000 2005 2010
−0.04
−0.02
Fig. 4. Evolution of the growth rates of the 27 regional CSW indexes from June-1983
to March-2005.
Fig. 5 plots the price growth rate g(t) versus the price p(t) itself for the 27
CSW indexes. A linear regression of the data points on Fig. 5, shown as the
red straight line, gives a correlation coefficient of 0.494. If we perform lin-
ear regression for each index, then we find an average correlation coefficient
0.503 ± 0.036, confirming the robustness of this estimation of the correlation
between growth rate and price level. The obtained relation between g and p
obtained from this correlation analysis is captured by the following mathe-
matical regression
g = 0.00922×
− 0.00747 . (10)
In words, if p is large, then g is large on average, which confirms the concept
of a positive feedback of price on its further growth. The continuous time limit
of g(t) defined by (9) is
g(t) =
d ln p
. (11)
This last equation together with (10), that we write as g(t) = αp − β (with
α = 0.00922/100 and β = 0.00747), implies the following ordinary differential
equation
= αp2 − βp , (12)
which indeed gives a power law acceleration p(t) ∼ 1/(tc − t) asymptotically
close to the critical time tc. Note that this critical time is determined by
the initial conditions, and is called in mathematics a movable singularity. We
conclude from this first analysis that the rough linear growth of the growth rate
confirms the existence of a bubble growing faster than exponential according
to an approximate power law. But of course, the exponent of this power law
is poorly constrained, in particular from the fact that the growth rate g(t)
exhibits significant variability and furthermore nonlinearity, as can be seen in
Fig. 5.
50 100 150 200 250 300 350 400 450
−0.04
−0.02
Jul. 1983 − Sep. 2003
Oct. 2003 − Sep. 2004
Oct. 2004 − Mar. 2005
Fig. 5. Dependence on the data price p for all CSW indexes of its growth rate g.
The overall correlation coefficient is 0.494. The red line is the linear fit of the data
points.
It is useful to refine this analysis by separating the whole time interval into
three distinct intervals. The corresponding plot of the growth rate g as a
function of price is shown in Fig. 5 with different symbols: period 1 is Jul.
1983 to Sept. 2003, period 2 is from Oct. 2003 to Sept. 2004, and period 3 is
from Oct. 2004 to Mar. 2005. An anomaly can be clearly outlined, associated
with the red dots which correspond to the anomalous peak in the growth rate
in the period from Oct. 2003 to Sept. 2004. Notice also that the most recent
time interval from Oct. 2004 to Mar. 2005 shows practically the same behavior
as the first period before 2003. In other words, when removing the data in red
for the period from Oct. 2003 to Sept. 2004, the growth rate g(t) is practically
independent of p, which qualifies the normal regime. We can thus conclude that
this so-called “phase-portrait” of the growth rate versus price has identified
clearly an anomalous time interval associated with extremely fast accelerating
prices followed by a more recent period where the price growth has resumed
a more normal regime.
4 Yearly periodicity and intra-year structure
4.1 Yearly periodicity from superposed year analysis and spectral analysis
In Fig. 4, the time dependence of the monthly growth rate exhibits a clear sea-
sonality (or periodicity), which appear visually to be predominantly a yearly
phenomenon. This visual observation is made quantitative by performing a
spectral Fourier analysis. The power spectrum of a typical CSW index is shown
in Fig. 6 (all CSW indexes show the same power spectrum). Since the unit of
time used here is one year, the frequency f is in unit of 1/year. A periodic
behavior with period one year should translate into a peak at f = 1 plus all
its harmonics f = 2, 3, 4, · · · , which is indeed observed in Fig. 6. Note also
that the spectrum has large peaks at f = 4 and f = 8 among the harmonics of
f = 1, which indicates a weak periodicity with period of one quarter. This is
consistent with Fig. 7, where four oscillations in the averaged monthly growth
rates can be observed.
0 1 2 3 4 5 6 7 8 9 10 11 12
Fig. 6. Spectrum analysis to confirm the strong periodicity in g(t).
Note that the power spectrum itself is periodic with a period of 12, which is
the sampling frequency, equal to the double of the Nyquist frequency. There
are also many peaks in the low-frequency region (larger that one-year time
scale) close to f = 0, which are associated with the time scales of the global
trends produced by the big peaks in g(t) around year 2004 as well as around
1990.
To further explore this seasonal variability of the price growth rates, we cal-
culate the averages of the growth rates for given months, where the average is
performed over all years. Consider for instance the month of January: we look
up the growth rate for all the data over all years for the month of January
and take the average. We do the same for each successive months. The result
is shown in Figure 7 for two time periods, which gives the average growth rate
〈g〉 for different months of the year. The red dash line and circles give the
resultant 〈g〉 for all the data and the black dash line and triangles give the
standard deviation σg for all data (which is a measure of the variability from
year to year and from zip code to zip code around the average). The difference
between the two time periods is precisely the time interval from June 2003 to
March 2005: this period is responsible for a significant increase of the average
growth rate (compare the red dashed line (filled circles) with the red continu-
ous line (open circles)) and an even larger increase of the variability (compare
the black continuous line (filled triangles) with the dashed black line (open
triangles)), again confirming the evidence of an anomalous behavior in that
period. In 2005, it appears that the growth rate relaxed back to the normal
level (according to the historical record).
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Month
Fig. 7. Monthly average growth rate (circles) and its standard deviation (triangles)
as a function of the month within the year. Dash: results obtained over all 27
indexes over the period from Jun. 1983 to Mar. 2005; Solid: results obtained over
all 27 indexes over the period from Jun. 1983 to May 2003.
4.2 Yearly periodicity and intra-year structure with a scale and translation
modulated model
Inspired by these results, we propose the following quantitative model. Con-
sider a time t in units of month. We write t = 12T +m, where T is the year
and m is the month within that year and thus goes from 1 (January) to 12
(December). For instance, t = 26 corresponds to T = 2 and m = 2 (Febru-
ary), while t = 38 corresponds to T = 3 and again the same month m = 2
(February) within the year. We propose to model the intra-year structure of
the growth rate g(t) together with possible yearly variations by the following
expression
g(t = 12T +m) = f(T )h(m) + j(T ) . (13)
In words, the growth rate has an intra-year structure h(m) modulated from
year to year in amplitude by f(T ) up to a possible overall translation j(T )
which can also vary from year to year. We can expect f(T ) and j(T ) to be
approximately constant for most years, except around 1990 and 2004 for which
we should see an anomaly in either or both of them, since these two periods
had bubbles. Note that this model (13) gives an exact yearly periodicity if
f(T ) and j(T ) are constant. A non-constant f(T ) describes an amplitude
modulation of the yearly periodicity. In particular, we expect a strong peak
around T = 2004. With this model, we can focus on predicting f(T ) and j(T )
only, because we have removed the complex intra-year structure.
We have thus fitted the model (13) to three subsets of the whole available
time series for the growth rate g(t) and also to the whole set taken globally, in
order to test for the robustness of the model. For this, we use the cost function
Tmax∑
[g(t = 12T +m)− f(T )h(m)− j(T )]2 (14)
which is minimized with respect to the 12 unknown variables h(1), ..., h(12)
and the 2 × Tmax variables [f(1), j(1)], ..., f(Tmax), j(Tmax). There are 12Tmax
terms in the sum and 12 + 2 × Tmax unknown variables. This shows that the
system is well-constrained as soon as Tmax ≥ 2. For instance for Tmax = 20, we
have 52 unknown variables to fit and 240 terms in the sum to constrain the
Figure 8 illustrates the result of the fit of model (13) to the growth rate over
the whole time interval from 1985 to 2005. As expected, we can observe a clear
peak in the amplitude f(T ) corresponding to the year 2004, while there is not
appreciable peak around 1990. This means that the recent bubble appears
significantly stronger than any other episodes in the last 20 years and dwarfs
them. The anomalous nature of the recent bubble is reinforced by the existence
of a peak in j(T ) for the same year 2004, showing that both the amplitude and
translation components of the growth rates has been completely anomalous
in 2004. The middle graph of the top panel of figure 8 shows the intra-year
pattern captured by the model, which is in remarkable agreement with the
pattern shown in figure 7: one can observe a peak in March, May, August and
December, the largest peak being in May. The bottom panel of figure 8 shows
visually how well (or badly) the model fits the actual data. The quality of
the fit is excellent, except in 2004-2005. In other words, we clearly identify
a very anomalous or exceptional behavior in 2004-2005, again providing a
confirmation that something exceptional or anomalous has occurred during
that period.
1985 1990 1995 2000 2005
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
−0.025
−0.02
−0.015
−0.01
1985 1990 1995 2000 2005
1980 1985 1990 1995 2000 2005 2010
−0.03
−0.02
−0.01
Fig. 8. Upper panels: three graphs showing the three functions f(T ), h(m) and
j(T ) fitted on the growth rate over the whole time interval from 1985 to 2005.
Lower panel: Comparison between the growth rate data (empty blue circles) and
the model (13) (red line).
Figure 9 is the same as figure 8 for the period from 1985 to 1990. One can
clearly here observe a peak in the scaling amplitude f(T ) at T =1988 and in
the translation term j(T ) at T =1986, suggesting that the first bubble of the
1985-2005 period occurred over a relatively large time period 1985-1990, with
two successive contributions. The intra-year structure h(m) has also its peaks
on March, May, August and December, but this intra-year structure is weaker
than for other sub-periods. The lower panel of figure 9 shows that the model
captures very well the overall trend as well as the intra-year structure. The
main discrepancies are in the amplitude of the large peaks and valleys, which
are not fully predicted.
1985 1985.5 1986 1986.5 1987 1987.5 1988 1988.5 1989 1989.5 1990
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
0.185
0.195
0.205
1985 1985.5 1986 1986.5 1987 1987.5 1988 1988.5 1989 1989.5 1990
−0.29
−0.28
−0.27
1984 1985 1986 1987 1988 1989 1990 1991
−0.03
−0.02
−0.01
Fig. 9. Same as figure 8 for the period from 1985 to 1990.
Figure 10 is the same as figure 8 for the period from 1991 to 2000. One can
clearly here observe a peak in the scaling amplitude f(T ) at T =1995 and
in the translation term j(T ) at T =1994. This thus identifies a small bubble
in the mid-1990s. The intra-year structure h(m) has also its peaks on March,
May, August and December, with very large amplitudes. The lower panel of
figure 10 shows a truly excellent fit.
1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
−0.12
−0.115
−0.11
−0.105
1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
0.155
0.165
1990 1992 1994 1996 1998 2000 2002
−0.01
−0.005
0.005
0.015
0.025
Fig. 10. Same as figure 8 for the period from 1991 to 2000.
Figure 11 is the same as figure 8 for the period from 2001 to 2005. One can
clearly here observe a peak in the scaling amplitude f(T ) at T =2004 and
in the translation term j(T ) also at T =2004. This thus clearly identifies the
bubble as peaking in 2004. The intra-year structure h(m) has also its peaks
on March, May, August and December, with very large amplitudes and very
good agreement with the other three figures. The lower panel of figure 11
shows an excellent fit up to the early 2003 and then a rather large discrepancy
starting early 2003 all the way to the last data point approaching mid-2005. In
particular, note that the intra-year structure is washed out by the anomalous
growth rate culminating in mid-2004. Symmetrically, the intra-year structure
is also absent in the fast decay of the growth rate back to normal. We do
not have enough data to ascertain if the growth rate has resumed its normal
intra-year pattern. We believe that this is a very important diagnostic to
characterize abnormal behavior and this could be a very useful variable to
monitor on a monthly basis.
2001 2001.5 2002 2002.5 2003 2003.5 2004 2004.5 2005
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
−0.025
−0.02
−0.015
−0.01
2001 2001.5 2002 2002.5 2003 2003.5 2004 2004.5 2005
2000 2001 2002 2003 2004 2005 2006
−0.01
Fig. 11. Same as figure 8 for the period from 2001 to 2005.
The four figures 8-11 validate model (13): in particular, they show the very
robust intra-year structure with peaks in March, May, August and December.
One possible contribution to this quarterly periodicity comes from the con-
struction of the CSI: the monthly indexes use a three-month moving average
algorithm. Home sales pairs are accumulated in rolling three-month periods,
on which the repeat sales methodology is applied. The index point for each
reporting month is based on sales pairs found for that month and the pre-
ceding two months. For example, the December 2005 index point is based
on repeat sales data for October, November and December of 2005. This av-
eraging methodology is used to offset delays that can occur in the flow of
sales price data from county deed recorders and to keep sample sizes large
enough to create meaningful price change averages. A three month rolling
window construction corresponds in general to a convolution of the bare price
with a kernel which possesses a three month periodicity (or size). The Fourier
transform of the convolution is the product of Fourier transforms. Thus the
spectrum of the signal should contain the peaks of the Fourier spectrum of
the kernel, which by construction contains a peak at three months. However,
our synthetic tests (not shown) suggest that this effect is by far too small to
explain the strong amplitude of the observed quarterly periodicity. It would
be important to understanding why such intra-year structure develops: is it
the result of a natural intra-day organization of buyers’ behaviors associated
with taxes/ income constraints or a problem of reporting or perhaps the effect
of other calendar regularities? Or is it the result of patterns coming from the
supply part of the equation, namely home-builders, developers, and perhaps
in the time modulation of the rates of allocated permits? Answering these
questions is important to determine how much emphasis one should give to
these results. But if indeed the intra-day structure is a genuine non-artificial
phenomenon, we believe that it offers a remarkable opportunity for monitoring
in real time the normal versus abnormal evolution of the market and also for
developing forecasts on a month time horizon.
4.3 Intra-year pattern from signs of growth rate increments
The existence of a strong and robust intra-year structure in the price growth
rate can be further demonstrated by studying the sign of g(t + 1) − g(t). A
positive (negative) sign mean that the growth rate tends to increase (decrease)
from one month to the next.
Based on the seasonality of the growth rate, we are able to answer the following
question: given the current growth rate g(t), will the growth rate increase or
decrease at time t+1? This amounts to asking what is the sign of g(t+1)−g(t)?
Technically, we construct the (unconditional) number of times the sign of the
increment g(t + 1) − g(t) is positive or negative irrespective of what is g(t).
From Fig. 4, we obtain a sequence of signs: −−+−+−−+−−++. For each
month, we calculate the percentage of positive and negative signs, respectively.
The second and the third rows of Table 1 gives the percentage of positive and
negative signs for each month. The third and fourth rows gives the signs and
the associated percentages.
For instance, the table says that the “probability” of the sign of g(t = Feb)−
g(t = Jan) being “-” is about 92.1%. If we know g(t = Jan), we can say that it
is very probable that the growth rate of February will be less than this January
value. Thus, this table has predictive power in the sense that the probabilities
to predict the signs are much higher than the value of 75% obtained under the
null hypothesis that g(t) is a white noise process (see Sornette and Andersen,
2000). This table is another way to rephrase and expand on our preceding
analysis on the yearly periodicity by identifying a very strong and robust
intra-year structure.
Table 1
Analysis of the signs of g(t + 1) − g(t). The second and the third rows gives the
percentage of positive and negative signs for each month. The third and fourth rows
give the sign for each month that dominates and the associated percentages.
Mon Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
+% 7.91 17.2 88.0 5.64 97.7 8.47 8.47 91.4 6.57 8.92 84.2 82.2
-% 92.1 82.8 12.0 94.4 2.29 91.5 91.5 8.59 93.4 91.1 15.8 17.8
sign - - + - + - - + - - + +
% 92.1 82.8 88.0 94.4 97.7 91.5 91.5 91.4 93.4 91.1 84.2 82.2
Since our initial analysis performed in the summer of 2005 which used data
up to March 2005, new data for the 27 CSW indexes has become available
which covers the interval from Apr. 2005 to Sept. 2006. It is very interesting
to check if the sign of the growth variations obtained in Table 1 using the data
until March 2005 still applies to the new data. The realized signs of the newly
available months are calculated and the sequence of signs is the following: -
(Apr. 2005, 27 CSW indexes out of 27), + (May. 2005, 27 out of 27), - (Jun.
2005, 27 out of 27), - (Jul. 2005, 27 out of 27), + (Aug. 2005, 27 out of 27), -
(Sep. 2005, 27 out of 27), - (Oct. 2005, 27 out of 27), + (Nov. 2005, 27 out of
27), + (Dec. 2005, 21 out of 27), - (Jan. 2006, 27 out of 27), - (Feb. 2006, 27 out
of 27), + (Mar. 2006, 27 out of 27), - (Apr. 2006, 27 out of 27), + (May. 2006,
27 out of 27), - (Jun. 2006, 27 out of 27), - (Jul. 2006, 27 out of 27), + (Aug.
2006, 27 out of 27), and - (Sep. 2006, 27 out of 27). Thus, table 1 predicts
exactly the signs of the growth rate variations of all 27 CSW indexes for all
months except for Dec. 2005 for which there are 6 errors: table 1 predicts that
the growth rate variation from Dec. 2005 to Jan. 2006 should be +, which is
correct for 21 CSW indexes out of 27, corresponding to a success ratio of 77%
(close to the white noise case). This score is slightly lower than the previously
estimated probability of 82.2% for the month of December, which is the lowest
among all months. Overall, the success rate is remarkably high, adding further
evidence that the Las Vegas property market has returned to a more normal
phase (no bubble from April 2005 to Sept. 2006).
5 Predicting the monthly growth rate
Conditional of the evidence that the anomalous faster than exponential growth
has ended, let us attempt to predict the future evolution of the CSW indexes
based only on the strong seasonality of the growth rate. Figure 12 presents
the predictions one year ahead for the 27 regional CSW indexes. Two different
prediction schemes are used. The RED lines are based on the average growth
rate obtained from all 27 indexes, while the MAGENTA lines are based on the
average growth rate obtained from the individual index under investigation.
There is not discernable difference.
A similar prediction of the Clark County (Las Vegas MSA) indexes (NVC003Q
and NVC003C) has also been made using the average growth rates obtained
from all 27 regional indexes. Since these two indexes are only available from
July-2000 to March-2005, we do not have enough data to calculate the average
growth rates using the indexes themselves. The results are shown in Fig. 13.
2002 2003 2004 2005 2006 2007
Fig. 12. Predicting regional CSW indexes one year ahead. Red lines: Prediction
using average growth rate obtained from all 27 indexes; Magenta lines: Prediction
using average growth rate obtained from the individual index under investigation.
The two kinds of prediction are almost undistinguishable.
2000 2001 2002 2003 2004 2005 2006 2007
NVC003C: Raw data
NVC003C: Prediction
NVC003Q: Raw data
NVC003Q: Prediction
Fig. 13. Predicting Clark County (Las Vegas MSA) indexes (NVC003Q and
NVC003C) one year ahead.
6 Conclusion
We have analyzed 27 house price indexes of Las Vegas from Jun. 1983 to
Mar. 2005, corresponding to 27 different zip codes. These analyses confirm
the existence of a real-estate bubble, defined as a price acceleration faster
than exponential. This bubble is found however to be confined to a rather
limited time interval in the recent past from approximately 2003 to mid-2004
and has progressively transformed into a more normal growth rate in 2005.
The data up to mid-2005 suggests that the current growth rate has now come
back to pre-bubble levels. We conclude that there has been no bubble from
1990 to 2002 except for a medium-sized surge in 1995, then a short-lived but
very strong bubble until mid-2004 which has been followed by a smoothed
transition back to what appears to be normal. It thus seems that, while the
strength of the real-estate bubble has been very strong over the period 2003-
2004, the price appreciation rate has returned basically to normal.
In addition, we have identified a strong yearly periodicity which provides a
good potential for fine-tuned prediction from month to month. As the intra-
year structure is likely a genuine non-artificial phenomenon, it offers a re-
markable opportunity for monitoring in real time the normal versus abnormal
evolution of the market and also for developing forecasts on a monthly time
horizon. In particular, a monthly monitoring using a model that we have de-
veloped here could confirm, by testing the intra-year structure, if indeed the
market has returned to “normal” or if more turbulence is expected ahead.
In addition, it would provide a real-time observatory of upsurges and other
anomalous behavior at the monthly scale. This requires additional technical
developments and tests beyond this report.
Compared with previous analysis of Zhou and Sornette (2003, 2006) at the
scale of states and whole regions (northeast, midwest, south and west), the
present analysis demonstrates the existence of very significant variations at the
local scale, in the sense that the bubble in Las Vegas seems to have preceded
the more global USA bubble and has ended approximately two years earlier
(mid 2004 for Las Vegas compared with mid-2006 for the whole of the USA).
References
Bailey, M., Muth, R., Nourse, H., 1963. A regression method for real estate
price index construction. Journal of the American Statistical Association
58, 933–942.
Bender, C., Orszag, S. A., 1978. Advanced Mathematical Methods for Scien-
tists and Engineers. McGraw-Hill, New York.
Broekstra, G., Sornette, D., Zhou, W.-X., 2005. Bubble, critical zone and the
crash of Royal Ahold. Physica A 346, 529–560.
Case, K. E., Shiller, R. J., 1987. Prices of single-family homes since 1970: New
indexes for four cities. New England Economic Review Sep/Oct, 45–56.
Case, K. E., Shiller, R. J., 1989. The efficiency of the market for single-family
homes. American Economic Review 79, 125–137.
Case, K. E., Shiller, R. J., 1990. Forecasting prices and excess returns in the
housing market. AREUEA J. 18, 253–273.
Case, K. E., Shiller, R. J., 2003. Is there a bubble in the housing market.
Brookings Papers on Economic Activity (2), 299–362.
Greenspan, A., 1997. Federal ReserveÕs semiannual monetary policy report,
before the Committee on Banking, Housing, and Urban Affairs, U.S. Senate,
February 26.
Himmelberg, C., Mayer, C., Sinai, T., September 2005. Assessing high house
prices: Bubbles, fundamentals, and misperceptions. Tech. Rep. Staff Report
no. 218, Federal Reserve Bank of New York.
Jorion, P., 2005. Is housing market surge really sustainable? The Wall Street
Journal September 22, A17.
Mauboussin, M. J., Hiler, R., 1999. Rational Exuberance? Equity research
report of Credit Suisse First Boston, January 26.
Roehner, B. M., 2006. Real estate price peaks: A comparative overview. Evo-
lutionary and Institutional Economics Review in press, physics/0605133.
Shiller, R. J., 2000. Irrational Exuberance. Princeton University Press, New
York.
Shiller, R. J., 2006. Long-term perspectives on the current boom in home
prices. Economists’ Voice 3(4), Art. 4.
Smith, M. H., Smith, G., 2006. Bubble, bubble, where’s the housing bubble?
Brookings Papers on Economic Activity (1), 1–67.
Sornette, D., 2003. Why Stock Markets Crash: Critical Events in Complex
Financial Systems. Princeton University Press, Princeton.
Sornette, D., Andersen, J.-V., 1998. Scaling with respect to disorder in time-
to-failure. European Physical Journal B 1, 353–357.
Sornette, D., Andersen, J.-V., 2000. Increments of uncorrelated time series
can be predicted with a universal 75% probability of success. International
Journal of Modern Physics C 11, 713–720.
Zhou, W.-X., Sornette, D., 2003. 2000-2003 real estate bubble in the UK but
not in the USA. Physica A 329, 249–263.
Zhou, W.-X., Sornette, D., 2006. Is there a real-estate bubble in the US?
Physica A 361, 297–308.
Biographies:
Wei-Xing Zhou is a Professor of Finance at the School of Business in the East
China University of Science and Technology. He received his PhD in Chemical
Engineering from the East China University of Science and Technology in
2001. His current research interest focuses on the modeling and prediction of
catastrophic events in complex systems.
Didier Sornette holds the Chair of Entrepreneurial Risks at the Depart-
ment of Management, Technology and Economics of ETH Zurich. He received
his PhD in Statistical Physics from the University of Nice, France. His current
research focuses on the modeling and prediction of catastrophic events in com-
plex systems, with applications to finance, economics, seismology, geophysics
and biology.
	Introduction
	Conceptual background of our empirical analysis
	Humans as social animals and herding
	Definition and mechanism for bubbles
	Was there a bubble? Status of the argument based on the ratio of cost of owning versus cost of renting
	Regime shift in the CSW Zip Code Indexes of Las Vegas
	Description of the data
	Power law fits
	Dependence of the growth rate on the index value
	Yearly periodicity and intra-year structure
	Yearly periodicity from superposed year analysis and spectral analysis
	Yearly periodicity and intra-year structure with a scale and translation modulated model
	Intra-year pattern from signs of growth rate increments
	Predicting the monthly growth rate
	Conclusion
ABSTRACT
  We analyze 27 house price indexes of Las Vegas from Jun. 1983 to Mar. 2005,
corresponding to 27 different zip codes. These analyses confirm the existence
of a real-estate bubble, defined as a price acceleration faster than
exponential, which is found however to be confined to a rather limited time
interval in the recent past from approximately 2003 to mid-2004 and has
progressively transformed into a more normal growth rate comparable to
pre-bubble levels in 2005. There has been no bubble till 2002 except for a
medium-sized surge in 1990. In addition, we have identified a strong yearly
periodicity which provides a good potential for fine-tuned prediction from
month to month. A monthly monitoring using a model that we have developed could
confirm, by testing the intra-year structure, if indeed the market has returned
to ``normal'' or if more turbulence is expected ahead. We predict the evolution
of the indexes one year ahead, which is validated with new data up to Sep.
2006. The present analysis demonstrates the existence of very significant
variations at the local scale, in the sense that the bubble in Las Vegas seems
to have preceded the more global USA bubble and has ended approximately two
years earlier (mid 2004 for Las Vegas compared with mid-2006 for the whole of
the USA).

<|endoftext|><|startoftext|>
Introduction
	Hermitian Codes and Syndrome Computation
	Lemma 1
	Definition 1
	Lemma 2
	Systematic Encoding
	Lemma 3
	Algorithm 1
	Algorithm 2
	Lemma 5
	Algorithm 3
	Efficient Implementation of a Systematic Encoder
	Module A: Multiplication with Matrix A, A'
	Module B: Multiplication with Matrix A-1, A'-1
	Module C: Systematic Encoding of Codes Ei
	Module D: Systematic Encoding of Codes Dl
	Encoder
	Final Remarks
	Acknowledgement
	References
ABSTRACT
  We present an algorithm for systematic encoding of Hermitian codes. For a
Hermitian code defined over GF(q^2), the proposed algorithm achieves a run time
complexity of O(q^2) and is suitable for VLSI implementation. The encoder
architecture uses as main blocks q varying-rate Reed-Solomon encoders and
achieves a space complexity of O(q^2) in terms of finite field multipliers and
memory elements.

<|endoftext|><|startoftext|>
Quantum criticality and disorder in the antiferromagnetic critical point of NiS2 pyrite
N. Takeshita1, S. Takashima2, C. Terakura1, H. Nishikubo2, S. Miyasaka3, M. Nohara2, Y. Tokura1,4, and H. Takagi1,2
1Correlated Electron Research Center (CERC), National Institute of Advanced
Industrial Science and Technology (AIST), Tsukuba 305-8562, Japan
2Department of Advanced Materials, University of Tokyo, Kashiwa 277-8561, Japan
3Department of Physics, Osaka University, Toyonaka 560-0043, Japan
4Department of Applied Physics, University of Tokyo, Tokyo 113-8656, Japan
(Dated: November 11, 2021)
A quantum critical point (QCP) between the antiferromagnetic and the paramagnetic phases was
realized by applying a hydrostatic pressure of ∼ 7 GPa on single crystals of NiS2 pyrite with a
low residual resistivity, ρ0, of 0.5 µΩcm. We found that the critical behavior of the resistivity, ρ,
in this clean system contrasts sharply with those observed in its disordered analogue, NiS2−xSex
solid-solution, demonstrating the unexpectedly drastic effect of disorder on the quantum criticality.
Over a whole paramagnetic region investigated up to P = 9 GPa, a crossover temperature, defined
as the onset of T2 dependence of ρ, an indication of Fermi liquid, was suppressed to a substantially
low temperature T ∼ 2 K and, instead, a non Fermi liquid behavior of ρ, T 3/2-dependence, robustly
showed up.
PACS numbers:
A hallmark of strongly correlated electron systems is
the presence of a rich variety of phases often competing
with each other. When two phases meet with each other
in the T = 0 limit by tuning a phase controlling parame-
ter such as pressure and chemical doping, quantum fluc-
tuations often give rise to exotic states of electrons, which
has been attracting considerable interest in condensed
matter research [1]. One of the most fascinating cases is a
breakdown of the Fermi liquid at magnetic quantum crit-
ical points (QCP) in itinerant magnets, which has been
believed to be captured by self-consistent renormalization
theory [2] and scaling analysis [3, 4]. The onset temper-
ature of Fermi liquid coherence, probed by a quadratic
temperature dependence of resistivity ρ(T ) ∝ T 2, is pre-
dicted to be suppressed by the presence of low lying spin
fluctuations near QCP and, right at QCP, a non Fermi
liquid ground state is realized which manifests itself as a
non-trivial power law behavior of resistivity ρ(T ) ∝ T n
down to the T = 0 limit, where n = 3/2 for antiferro-
magnetic (AF) QCP and n = 5/3 for ferromagnetic (F)
QCP [5]. A V-shaped recovery of Fermi liquid behavior,
T 2-resistivity, around QCP is anticipated as a function
of phase tuning parameter.
The critical behavior of ρ(T ) near the AF QCP in
NiS2−xSex solid solution is a textbook demonstration of
standard theories for QCP. NiS2 crystallizes in the pyrite
structure. Ni is divalent and therefore accommodates two
electrons in doubly degenerate eg orbitals (t
g). Due to
a strong onsite Coulomb repulsion among Ni eg electrons,
the system is a S = 1 Mott insulator [6, 7]. By substitut-
ing S with Se, the effective bandwidth can be increased
due to the increase of p-d hybridization. With increasing
x in NiS2−xSex, the system experiences a weakly first or-
der transition to an AF metal with a collinear spin struc-
ture [8] at x ∼ 0.4 and then a second order transition to
a paramagnetic metal at x = 1.0.
In the AF metal phase of NiS2−xSex, the AF transi-
tion temperature is TN ∼ 90 K at x = 0.5 and, with
increasing x, continuously goes down to T = 0 at x =
1.0, giving rise to a well defined AF QCP. The T 3/2 de-
pendence of ρ(T ), expected for AF QCP, is observed at
least down to 1.7 K at x = 1.0. On going away from x =
1.0, T 2-behavior of ρ(T ) quickly recovers and a V-shaped
region with T 2 resistivity is identified around x = 1.0. It
is known that the application of pressure is equivalent to
Se substitution in that it increases the band width. By
applying pressure P on an AF metal NiS1.3Se0.7, sup-
pression of TN analogous to Se substitiution was indeed
observed and, eventually, QCP was approached with P =
2 GPa [9]. The phase diagram and the critical behavior
of the resistivity in pressurized NiS1.3Se0.7 were essen-
tially identical with Se content x simply replaced with P
Recently, however, there has been growing evidence
that, the above mentioned textbook picture is violated
in a variety of intermetallic systems. In a helical magnet
MnSi [10] and a weak ferromagnet ZrZn2 [11], a non triv-
ial power law behavior of resistivity, ρ(T ) ∝ T 3/2, dom-
inates the resistivity down to a very low temperature,
not only right at the QCP but also over a wide range
of paramagnetic phase. At the AF QCP in CePd2Si2,
with increasing the purity of sample, the exponent of the
power law resistivity was found to deviate significantly
from the standard value of 3/2 [12]. The common feature
among these systems is that they are clean with a low
residual resistivity of ρ0 < 1 µΩcm [10, 11]. In contrast,
in NiS2−xSex solid solution where textbook example of
critical behavior of ρ(T ) is observed, Se substitution in-
herently gives rise to disorder. Indeed, ρ0 of NiS2−xSex
around QCP is as large as several 10 µΩcm. Questions
immediately arise. Does the non-trvial behavior in the
intermetallics represent a generic property of magnetic
http://arxiv.org/abs/0704.0591v1
QCP in clean systems? Does standard behavior of QCP
shows up only when the system is disordered? To tackle
these questions experimentally, we attempted to realize
a “clean” QCP in NiS2−xSex. The parent compound of
NiS2−xSex, NiS2, is pure and presumably clean. If one
can approach the QCP of pure NiS2 by pressure without
relying on Se substitution, a clean analogue of AF QCP
in NiS2−xSex can be explored and the impact of disor-
der on QCP can be captured. Recent progress of high
pressure technique enabled us to do so.
In this Letter, we address the issue of criticality and
disorder by examining the critical behavior of resistivity
of pure NiS2 under pressures. The AF QCP of NiS2 was
reached at ∼ 7 GPa, where the system was found very
clean with a low residual resistivity ρ0 of ∼ 0.5 µΩcm.
Not only right at the QCP but over an entire range of the
paramagnetic phase investigated, the recovery of Fermi
liquid T 2 of ρ(T ) is suppressed substantially to a very low
temperature below ∼ 2 K and non Fermi liquid behavior
with T 3/2 dependence of ρ(T ) dominated. We demon-
strate the drastic influence of disorder on this AF QCP
by contrasting the result with previous pressure work on
NiS1.3Se0.7 with a residual resistivity of 60 µΩcm [9].
NiS2 sample used in this study was prepared by a va-
por transport technique. The resistivity measurement
was performed by a conventional four probe technique
under hydrostatic pressure up to ∼ 10 GPa in a cubic
anvil type pressure system down to 3 K and also in a
modified Bridgman-type pressure cell down to 180 mK.
The results obtained by the two different pressure setups
agreed reasonably in the temperature range of overlap,
indicating a very good homogeneity of pressure. Pres-
sure was calibrated by measuring the superconducting
transition temperature of Pb [13].
The inset of Fig. 1 demonstrates ρ(T ) of NiS2 at rel-
atively low pressures below 4 GPa. With applying pres-
sure, the insulating behavior of ρ(T ) switches into metal-
lic behavior, indicating the occurrence of metal-insulator
transition. In between 2.6-3.4 GPa, we observe a discon-
tinuous jump of resistivity as a function of temperature,
which corresponds to a first order metal-insulator transi-
tion line on the phase diagram in Fig. 1. The discontinu-
ous jump appears to terminate around 200 K, indicating
the presence of a critical end point. In the phase dia-
gram of NiS2−xSex solid solution, the first order phase
line terminates at much lower temperature and is hard
to identify [14]. This difference appears to suggest the
strong influence of disorder on the Mott criticality.
As seen in the inset of Fig. 1, ρ(T ) of pure NiS2 showed
metallic behavior above P = 2.6 GPa. The residual re-
sistivity at the critical point was as low as ∼ 0.5 µΩcm,
demonstrating that the system is indeed very clean. Mag-
netic ordering in the AF metal phase manifests itself as a
very weak but sharp kink in ρ(T ) at TN as indicated by
the arrows. The antiferromagnetic transition tempera-
ture TN thus determined systematically goes down upon
1086420
P  (GPa)
3002001000
T  (K)
0,  2.6,  3.0,  3.2, 
3.3, 3.36, 3.4 GPa
 TNInsulator
100500
T  (K)
4.0 GPa
5.0 GPa
6.2 GPa
7.5 GPa
FIG. 1: The electronic phase diagram of clean NiS2 pyrite as
a function of pressure. PM and AFM denote paramagnetic
metal and AF metal, respectively. The inset shows the tem-
perature dependent resistivity under pressures, P = 0 - 3.4
GPa (left) and P = 4.0 - 7.5 GPa (right).
pressure and approaches T = 0 somewhere around 7-7.5
GPa. No superconductivity was observed between P =
6 and 9.1 GPa down to 180 mK, in spite of the low resid-
ual resistivity. This appears to suggest that realizing an
AF QCP in clean systems alone is not enough to achieve
exotic superconductivity as observed in heavy Fermion
compounds [15, 16, 17, 18] and that additional ingredi-
ents such as Kondo physics must be invoked.
The pressure dependence of TN , determined by the
kink in ρ(T ), together with the first order metal insulator
transition, is summarized as a phase diagram in Fig. 1.
TN appears to decrease almost linearly in contradiction
to (Pc − P )
2/3 dependence expected from self consistent
renormalization theory [5]. Unusual linear suppression of
the magnetic transition temperature was also observed
analogously for a helical magnet MnSi [10] and a weakly
ferromagnet ZrZn2 [11] when the sample is very clean. It
may be interesting to infer that, in these clean system,
the magnetic transition as a function of pressure is re-
ported to be a first order rather than a second order. In
the clean NiS2, we cannot rule out the possibility of a
first order transition at this stage, because ρ(T ) is not
very sensitive to TN near the critical point.
The signature of AF criticality in this clean system was
explored. The inset of Fig. 2 demonstrates ρ(T ) below 30
K, plotted as ρ vs. T 3/2. In the antiferromagnetic phase
at P = 6.2 GPa, ρ - T 3/2 curve is linear above TN but
shows superlinear behavior below TN . The temperature
dependence below TN is found to be approximately T
indicative of the formation of coherent quasi particles. In
the paramagnetic phase above ∼ 7 GPa, however, the ρ
- T 3/2 curve shows a linear behavior down to very low
T (K)
T 3/2
6.8 GPa
2001000
T 3/2 (K3/2)
6.2 GPa
FIG. 2: Temperature dependent part of resistivity ρ−ρ0 as a
function of temperature under pressure above ∼ 7 GPa, where
the system is paramagnetic, plotted as log(ρ− ρ0) vs. log T .
The inset shows ρ vs. T 3/2 plot.
temperature which is expected for the antiferromagnetic
critical point due to low lying spin fluctuations. It is
remarkable to see T 3/2 behavior characteristic of the an-
tiferromagnetic critical point over such a wide range of
pressure from ∼ 7 GPa to ∼ 9 GPa. ρ(T ) is surprisingly
insensitive to the pressure in the paramagnetic region
above 7 GPa and it is hard to find a signature of crit-
icality. This is analogous to those observed in a helical
magnet MnSi [10] and a weakly ferromagnetic magnet
ZrZn2 [11] when the sample is clean, implying that un-
usual critical behavior in clean systems is ubiquitous.
To investigate the details of unusual temperature de-
pendence in the paramagnetic phase further, we plotted
the temperature dependence of ρ− ρ0 as log(ρ− ρ0) vs.
logT in the main panel of Fig. 2. The residual resistivity
ρ0 was determined by extrapolating ρ - T
2 curve to T =
0 limit. We note here that the temperature dependent
part ρ− ρ0 is comparable to ρ0 at ∼ 3 K and, therefore,
the ambiguity originating from the estimate of ρ0 does
not influence the temperature dependence of ρ − ρ0 at
least above 1 K. It is again clear that the slope is appar-
ently smaller than 2 and instead close to 3/2 above ∼
2 K. At the lowest temperatures below ∼ 2 K, however,
the slope becomes steeper and T 2-resitivity appears to
recover eventually below 2 K. This crossover tempera-
ture to T 2-resistivity is again insensitive to pressure and
always 2-3 K up to 9 GPa. Close inspection of data indi-
cates that the crossover temperature is the lowest around
7.5 GPa but only slightly lower than the other pressures.
987654
P  (GPa)
6543210
P  (GPa)
NiS1.3Se0.7
FIG. 3: Contour plot of the exponent n of power low depen-
dence of resistivity on pressure-temperature plane, demon-
strating the criticality observed in the temperature depen-
dence of resistivity. The main panel is data for clean NiS2
and the inset shows data for dirty NiS2−xSex. The Néel tem-
perature determined by resistivity anomaly was shown by the
white line.
This strong suppression of the crossover to T 2 behav-
ior in ρ(T ), over a remarkably wide range of pressure
from ∼ 7 GPa up to ∼ 9 GPa, contrasts sharply with
the observation in Se-doped samples, where the recovery
of T 2 resistivity was clearly observed not only for the
magnetic side but also for the paramagnetic side. As a
function of Se-doping, a metal-insulator transition and
antiferromagnetic critical point occurs at around Se con-
tent x = 0.4 and 1.0, respectively, while ∼ 2.5 GPa and ∼
7 GPa are required to reach a metal-insulator transition
and QCP, respectively. This yields a conversion ratio
of phase controlling parameters, ∼ 0.15 Se/1 GPa. In
this disordered NiS2−xSex, the T
3/2-dependence of ρ(T )
dominates at the QCP of x = 1.0. With further doping
of Se up to x = 1.33 which is equivalent of additional
pressure of ≃ 2 GPa, however, the T 2 resistivity is fully
recovered and can be observed below ∼ 80 K [9]. Analo-
gously, in a Se doped NiS1.3Se0.7 crystal under pressure,
on going from the QCP at P ∼ 2 GPa to P = 4 GPa,
T 2 resistivity recovers quickly and shows up below 80 K
with increase of 2 GPa. These should be compared with
the low crossover temperature of 2-3 K, approximately 2
GPa above the critical point.
To visually illustrate these points, we plotted the ex-
ponent of power law dependence of ρ(T ), n as a contour
map on the pressure-temperature plane in Fig. 3. The
exponent was determined by taking the derivative of the
log(ρ − ρ0) - logT curve in Fig. 2. It is clear that the
V-shaped recovery of Fermi liquid behavior around QCP
is absent in clean NiS2. The recovery can be seen only
on the antiferromagnetic side below 7 GPa, where the
region with n = 2 (T 2) develops below TN . Above the
critical point of P ∼ 7 GPa, it is clear that the n =
1.5 (T 3/2) region predominantly occupies a majority of
the paramagnetic phase. A thin region with a different
color is lying at the T = 0 limit. This corresponds to the
marginal recovery of Fermi liquid behavior below ∼ 2 K.
In the inset of Fig. 3, we have constructed the contour
map also for the NiS1.3Se0.7 data under pressure from a
previous report [7]. Note again the V-shaped recovery on
the temperature scale of 100 K over ∼ 2 GPa pressure.
The remarkable contrast in the critical behavior be-
tween pure NiS2 and NiS2−xSex, visually demonstrated
in Fig. 3, indicates that the influence of disorder on
quantum criticality is surprisingly drastic, since the only
difference between the two systems is the disorder pro-
duced by Se substitution. In NiS1.3Se0.7 solid solution,
the residual resistivity ρ0 is approximately 60 µΩcm,
which is larger than those of pure NiS2 by two orders
of magnitude. When the samples are disordered, we do
see a canonical behavior of the QCP as predicted by stan-
dard theories [3, 4, 5]. To our surprise, once the system
becomes clean, the textbook behavior is gone and the
Fermi liquid coherence seen in ρ(T ) is dramatically sup-
pressed. We should note that the magnitude of ρ(T )−ρ0
is roughly 50 µΩcm in the temperature range below 100
K at around QCP. In the Se doped crystal, inelastic scat-
tering is always weaker than or at most comparable to
elastic scattering due to disorder below 100 K. In the
pure NiS2, in contrast, the same situation, ρ − ρ0 < ρ0
occurs only below 2-3 K, where we observed crossover
to T 2-resistivity. This suggests that disorder might be
controlling the appearance of T 2-resistivity.
One of the plausible scenarios for the strong influence
of disorder and robust non-Fermi liquid behavior might
be a dichotomy of the Fermi surface [19]. It is natural
that a specific part of Fermi surface, a “hot spot”, is
coupled strongly with a critical antiferromagnetic fluctu-
ation with a characteristic wave vector Q. There may
remains a region with well defined quasiparticles free
from critical fluctuations, a cold spot. The transport
is then determined by an interplay of these two contri-
butions at high temperatures but eventually a cold spot
with T 2-dependence should dominate the conduction at
very low temperatures. This phase separation in k-space
might explain in part the unusual temperature depen-
dence observed in pure NiS2 but it is not clear whether
the robustness of non Fermi liquid behavior can be prop-
erly described. In this scenario, the strong influence of
disorder can be naturally understood. The strong elas-
tic scatting should mix up the hot spot and cold spot
and the inelastic scattering therefore becomes effectively
isotropic, which might be close to the situation implic-
itly assumed in standard theories [5]. Another scenario
might be a phase separation and the resultant domain or
bubble formation in real space as discussed in clean MnSi
where the helical spin order disappears discontinuously as
a first order transition [10]. These bubbles and domains
have been proposed to be responsible for the robust non
Fermi liquid behavior in the paramagnetic phase. It is
worth checking the possible first order transition carefully
checking the magnetism at ∼ 7 GPa.
In conclusion, we have demonstrated the sharp con-
trast in the quantum critical behavior of ρ(T ) between
the clean and the disordered systems by examining a sin-
gle crystal of NiS2 with a low residual resistivity of ∼ 0.5
µΩcm. Previously, the V-shaped recovery of Fermi liquid
behavior (T 2-behavior of resistivity) around the antifer-
romagnetic critical point was clearly observed as a func-
tion of pressure and Se content in the dirty NiS2−xSex
systems with ρ − ρ0 < ρ0. In sharp contrast, we found
a robust non Fermi liquid behavior over a wide pressure
range in the paramagnetic side of a QCP in the clean sys-
tem with ρ−ρ0 ≫ ρ0. These results clearly demonstrate
that our understanding of the quantum critical point is
still far from complete and some important ingredient
must be missing.
The authors would like to thank M. Imada and H.
Fukuyama for discussion. This work was partly sup-
ported by a Grant-in-Aid for Scientific Research (No.
18043009) from the Ministry of Education, Culture,
Sports, Science and Technology of Japan and CREST,
Japan Science and Technology Agency (JST).
[1] P. Coleman and A. J. Schofield, Nature 433, 226 (2005).
[2] T. Moriya, Spin Fluctuations in Itinerant Electron Mag-
netism (Springer-Verlag, Berlin, 1985).
[3] J. A. Hertz, Phys. Rev. B. 14, 1165 (1976).
[4] A. J. Millis, Phys. Rev. B. 48, 7183 (1993).
[5] T. Moriya and K. Ueda, Adv. Phys., 49, 555 (2000).
[6] S. Ogawa, J. Appl. Phys., 50, 2308 (1979).
[7] J. A. Wilson, The Metallic and Non-metallic States of
Matter, pp.215-260 (Taylor & Francis, London, 1985).
[8] T. Miyadai et al., J. Phys. Soc. Jpn., 38, 115 (1975).
[9] S. Miyasaka et al., J. Phys. Soc. Jpn., 69, 3166 (2000).
[10] C. Pfleiderer, S. R. Julian, and G. G. Lonzarich, Nature
414, 427 (2001).
[11] S. Takashima et al., J. Phys. Soc. Jpn. 76, 043704 (2007).
[12] F. M. Grosche et al., J. Phys.: Condens. Matter, 12,
L533 (2000).
[13] N. Môri, H. Takahashi, and N. Takeshita, High Pressure
Research, 24, 225 (2004).
[14] G. Czjzek et al., J. Magn. Mag. Mat., 3, 58 (1976).
[15] D. Jaccard, K. Behnia, and J. Sierro, Phys. Lett. A 163,
475 (1992).
[16] S. R. Julian et al., J. Phys.: Condens. Matter, 8, 9675
(1996).
[17] C. Petrovic et al., J. Phys.: Condens. Matter, 13, L337
(2001).
[18] S. S. Saxena et al., Nature 406, 587 (2000).
[19] A.Rosch, Phys. Rev. B, 62, 4945 (2000).
ABSTRACT
  A quantum critical point (QCP) between the antiferromagnetic and the
paramagnetic phases was realized by applying a hydrostatic pressure of ~ 7 GPa
on single crystals of NiS_{2} pyrite with a low residual resistivity, rho_{0},
of 0.5 mu-Omega-cm. We found that the critical behavior of the resistivity,
rho, in this clean system contrasts sharply with those observed in its
disordered analogue, NiS_{2-x}Se_{x} solid-solution, demonstrating the
unexpectedly drastic effect of disorder on the quantum criticality. Over a
whole paramagnetic region investigated up to P = 9 GPa, a crossover
temperature, defined as the onset of T^{2} dependence of rho, an indication of
Fermi liquid, was suppressed to a substantially low temperature T sim 2 K and,
instead, a non Fermi liquid behavior of rho, T^{3/2}-dependence, robustly
showed up.

<|endoftext|><|startoftext|>
Spin coherence of holes in GaAs/AlGaAs quantum wells
M. Syperek1,2, D. R. Yakovlev1,†, A. Greilich1, J. Misiewicz2, M. Bayer1, D. Reuter3, and A. D. Wieck3
Experimentelle Physik II, Universität Dortmund, D-44221 Dortmund, Germany
Institute of Physics, Wroc law University of Technology, 50-370 Wroc law, Poland and
Angewandte Festkörperphysik, Ruhr-Universität Bochum, D-44780 Bochum, Germany
(Dated: August 2, 2021)
The carrier spin coherence in a p-doped GaAs/(Al,Ga)As quantum well with a diluted hole gas
has been studied by picosecond pump-probe Kerr rotation with an in-plane magnetic field. For
resonant optical excitation of the positively charged exciton the spin precession shows two types
of oscillations. Fast oscillating electron spin beats decay with the radiative lifetime of the charged
exciton of 50 ps. Long lived spin coherence of the holes with dephasing times up to 650 ps. The
spin dephasing time as well as the in-plane hole g factor show strong temperature dependence,
underlining the importance of hole localization at cryogenic temperatures.
PACS numbers: 42.25.Kb, 78.55.Cr, 78.67.De
Recently the investigation of the coherent spin dynam-
ics in semiconductor quantum wells (QW) and quantum
dots has attracted much attention, due to the possible
use of the spin degree of freedom in novel fields of solid
state research such as spin-based electronics or quantum
information processing [1, 2, 3]. Until now the inter-
est has been mostly focused on the spin coherence of
electrons, while experimental information about the spin
coherence of holes is limited [4]. The hole as a Luttinger
spinor has properties strongly different from the electron
spin, such as a strong spin-orbit coupling, a strong direc-
tional anisotropy, etc. It plays an important role also in
coherent control of electron spins, since in many optical
schemes charged electron-hole complexes are proposed as
intermediate manipulation states [5].
Earlier, the hole spin dynamics in GaAs-based QWs
has been measured by optical orientation detecting
photoluminescence (PL) either time-integrated or time-
resolved [4, 6, 7, 8, 9]. Experimental studies addressed
the longitudinal spin relaxation time T1 [6, 7, 8] and the
dephasing time T ∗
, exploiting the observation of hole spin
quantum beats [4]. The reported relaxation times vary
from 4 ps [6] up to ∼1 ns [4, 8] demonstrating strong de-
pendence on doping level, doping density and excitation
energy. A major drawback of PL techniques is, however,
that the spin coherence can be traced only as long as
both electrons and holes are present and photolumines-
cence can occur. Further, they work only for studying
the spin dynamics of minority carriers in a sea of ma-
jority carriers and are therefore restricted to undoped or
n-type doped QWs. However, then the holes can interact
with electrons, providing additional relaxation channels
via exchange or shake-up processes [8, 10]. These mech-
anisms can be excluded for p-doped structures if the hole
spin relaxation occurs on time scales longer than the ra-
diative annihilation of electrons. A pump-probe Kerr ro-
tation (KR) technique using resonant excitation allows to
realize such measurements, which up to now have been
reported only for bulk p-type GaN [11] and not yet for
low-dimensional systems.
The theoretical analysis of the hole spin dynamics in
QWs has been focused on free holes [10, 12, 13, 14]
by considering different relaxation mechanisms: (i) a
Dyakonov-Perel like mechanism, (ii) an acoustic phonon
assisted spin-flip due to spin mixing of valence band
states, (iii) an exchange induced spin-flip due to scatter-
ing on electrons, which resembles the Bir-Aronov-Pikus
mechanism, but for holes. Recently the attention has
been drawn on the role of hole localization and the de-
phasing caused by fluctuations of the in-plane g factor
has been calculated [15].
In this paper we use time-resolved pump-probe Kerr
rotation [16] to investigate the spin coherence of holes in
a p-doped GaAs/Al0.34Ga0.66As single QW with a low
hole density. We find spin dephasing times reaching al-
most the ns-range at a temperature T = 1.6 K with a
hole in-plane g factor of about 0.01. Both quantities de-
crease strongly with increasing temperature, suggesting
the important influence of hole localization. We discuss
also a mechanism that provides generation of spin co-
herence for the hole gas under resonant excitation of the
positively charged exciton.
The structure was fabricated by molecular-beam epi-
taxy on a (100) oriented GaAs substrate. A 15 nm-
wide GaAs QW was grown on top of a 380 nm-
thick Al0.34Ga0.66As barrier and overgrown by a 190
nm-thick Al0.34Ga0.66As layer. 21 nm-thick layers
with Al0.34Ga0.66As effective composition realized by
GaAs/AlAs short-period superlattices were deposited on
both sides of the QW in order to improve interface pla-
narity. Two δ-doped layers with Carbon acceptors were
positioned symmetrically at 110 nm distance from the
QW. The hole gas concentration and mobility in the QW
are 1.51× 1011 cm−2 and 1.2× 105 cm2/Vs, respectively,
as determined by Hall measurements at T = 4.2 K. It
was possible to deplete the hole density in the QW by
above barrier illumination and even invert the majority
carrier type, resulting in a diluted electron gas. The sam-
ple temperature was varied from 1.6 to 6 K.
A mode-locked Ti:Sapphire laser with a repetition rate
of 75.6 MHz and a pulse duration of ∼1.5 ps (∼1 meV full
width at half maximum) was used for optical excitation.
http://arxiv.org/abs/0704.0592v1
FIG. 1: (a) KR traces for a p-type 15 nm-wide
GaAs/Al0.34Ga0.66As QW vs time delay between pump and
probe pulses at B = 0 and 7 T with field tilted by ϑ = 4◦
out of QW plane. Laser at 1.5365 eV is resonant with T+
line. Power was set to 5 and 1 W/cm2 for pump and probe,
respectively. Bottom trace was recorded with additional laser
illumination at 2.33 eV. T = 1.6 K. (b) PL spectra for DHG
(excitation at 1.579 eV) and DEG regime (above barrier ex-
citation at 2.33 eV). (c) Comparison of KR traces for ϑ = 0
and 4◦.
The laser beam was split into a circularly polarized pump
and a linearly polarized probe beam. Both beams where
focused on the sample surface to a spot diameter of ∼100
µm. Magnetic fields B ≤ 10 T were applied about per-
pendicular to the structure growth z-axis (Voigt geome-
try). In a pump-probe KR experiment the pump pulse
coherently excites carriers with spins polarized along the
z axis. The subsequent coherent evolution of the spins
in form of a precession about the magnetic field is tested
by the probe pulse polarization. To detect the change
of the linear probe polarization plane (the KR angle), a
homodyne technique based on phase-sensitive balanced
detection was used.
Photoluminescence spectra excited above and below
the band gap of the Al0.34Ga0.66As barriers are shown in
Fig. 1(b) at B = 0 and 7 T. A single PL line correspond-
ing to the positively charged trion T+ (consisting of two
holes and one electron) is seen for the regime of diluted
hole gas (DHG) established for below-barrier excitation.
After inverting the type of majority carriers to the DEG
FIG. 2: Top trace: KR signal measured at B = 7 T for ϑ = 4◦.
Bottom traces are obtained by separating electron and hole
contributions (see text). Excitation conditions as in Fig. 1.
regime by above barrier illumination the PL spectra con-
sist of the exciton (X) and negatively charged trion (T−)
lines.
The type of majority carriers in the QW can be iden-
tified by the KR signals measured at B = 7 T, with the
laser energy tuned to the trion resonance. The bottom
trace in Fig. 1(a) was measured with additional above-
barrier illumination (DEG regime) and shows long-lived
electron spin beats with a dephasing time of 2.5 ns which
is considerably longer than the radiative decay time of
resonantly excited trions of about 50 ps. The precession
frequency corresponds to a g factor | ge |= 0.285± 0.005,
which is typical for electrons in GaAs-based QWs.
Without above-barrier illumination in the DHG
regime, fast electron precession is observed only during
∼ 200 ps after pump pulse arrival [see middle trace in
Fig. 1(a)]. This signal is caused by the coherent preces-
sion of the electron in T+ and disappears with the trion
recombination. The electron beats are superimposed on
the hole beats with a much longer precession period. The
hole beats decay with a time constant of about 100 ps and
can be followed up to 500 ps delay. At these long times
the KR signal is solely given by coherent hole precession.
Experimentally, it is difficult to observe the hole spin
quantum beats due to the very small in-plane hole g fac-
tor. To enhance the visibility we tilted the magnetic
field slightly out of the plane by an angle ϑ = 4◦ to
increase the hole g factor by mixing the in-plane com-
ponent (gh,⊥) with the one parallel to the QW growth
axis (gh,‖), which typically is much larger: gh(ϑ) =
sin2 ϑ+ g2
h,⊥ cos
2 ϑ [17]. The strong change of the
hole beat frequency with the tilt angle is seen in Fig. 1(c).
The precession frequency is analyzed by ωh = µB |
gh | B/~, where µB is the Bohr magneton, and gives
| gh,⊥ |= 0.012±0.005 for ϑ = 0
◦ and | gh |= 0.048±0.005
for ϑ = 4◦.
The electron and hole contributions to the KR ampli-
FIG. 3: Hole component of KR signal at different magnetic
fields and ϑ = 4◦. Top inset: Magnetic field dependence of the
hole dephasing time T ∗2 . Solid line is 1/B fit to data. Bottom
inset: Hole spin precession frequency vs magnetic fields. Line
is guide to the eye. In inserts closed and open circles show the
data measured for pump to probe powers of 1 to 5 W/cm2
and 5 to 1 W/cm2, respectively. T = 1.6 K.
tude, ΘK, can be separated by fitting the experimental
data with a superposition form of exponentially damped
harmonic functions for electrons and holes:
i=e,h
Ai exp(−
) cos(ωi∆t). (1)
Ai are the corresponding signal amplitudes at pump-to-
probe delay ∆t = 0, and T ∗
2,i are the spin dephasing
times. An example for a decomposition of the KR signal
in the DHG regime is shown in Fig. 2.
Let us turn now to the hole coherence. Figure 3 shows
the hole contribution to the KR signal for different B at
T = 1.6 K. From the fit by Eq.(1) we have obtained the
dephasing time T ∗
, which is plotted versus B in the inset.
A very long lived hole spin coherence with T ∗2 = 650 ps is
found at B = 1 T. With increasing B up to 10 T it short-
ens to 70 ps. The field dependence can be well described
by a 1/B-form (see the line in the inset), from which
we conclude that the dephasing shortening arises from
the inhomogeneity of the hole g factor ∆gh = 0.0007 in
the QW, which is translated into a spread of the preces-
sion frequency: ∆ωh = ∆ghµBB/~. Since T
∝ 1/∆ωh,
this explains the 1/B-dependence of the dephasing time.
The magnetic field dependence of the hole precession fre-
quency in the lower inset of Fig. 3 shows an approximate
B-linear dependence up to 7 T. For higher fields a super
linear increase is seen which indicates a change of the
hole g factor due to mixing between heavy and light hole
states, induced by the field.
Two sets of experimental data measured for pump to
FIG. 4: Temperature dependence of the hole KR signal at
B = 7 T and ϑ = 5◦. Pump and probe powers are set to 1
and 5 W/cm2, respectively. Inset: Hole spin dephasing time
T ∗2 vs temperature.
probe powers 1 to 5 W/cm2 and 5 to 1 W/cm2 are com-
pared in the insets of Fig. 3. The very similar results
demonstrate performance of the experiment in the linear
regime for both pump and probe beams with power not
exceeding 5 W/cm2.
Insight into the origin of the long hole spin coherence
can be taken from KR at varying temperatures. The
data in Fig. 4 measured at ϑ = 5◦ show that (i) the
dephasing time T ⋆
decreases from 110 down to 60 ps
when increasing the temperature from 1.6 to 6 K (see also
inset), and (ii) simultaneously the precession frequency
decreases notably corresponding to a g factor decrease
from 0.057 to 0.030.
These results can be naturally explained by hole lo-
calization in the QW potential relief due to monolayer
well width fluctuations. The localization energy does not
exceed 0.5 meV, which is comparable with the thermal
energy at T = 6 K. Free holes are expected to have a
short spin coherence time T2 limited by the efficient spin
relaxation mechanisms due to the spin-orbit interaction
[10, 13, 14]. For localized holes these mechanisms are
mostly switched off and one can expect long T2 times.
However, in a KR experiment we do not measure the
T2 time, but rather the ensemble dephasing time T
(Ref. 15), as confirmed by the 1/B dependence in the
inset of Fig. 3. The T ∗2 time gives a lower boundary for
the spin coherence time T2. Therefore we can conclude,
that the T2 for localized holes is at least 650 ps. Thermal
delocalization of holes on the one hand decreases the role
of inhomogeneities and reduces ∆gh, which should lead
to longer dephasing times. On the other hand, then the
fast decoherence of free holes becomes the limiting factor
for the spin beats dynamics.
Mixing of heavy and light hole states in a QW is en-
hanced by localization effects. This should be detectable
by an increase of the in-plane hole g factor, which is close
to zero for free holes [4, 15, 18]. The decrease of the hole
g factor with increasing temperature shown in Fig. 4 is
consistent with a hole delocalization scenario.
We turn now to discussing the mechanism for opti-
cal generation of hole spin coherence in a QW with a
DHG. The generation mechanism is similar to the one
suggested for singly charged quantum dots [19, 20] and
QWs with a DEG [21]. In our experiment pump and
probe are resonant in energy with the positively charged
trion T+. Due to the considerable heavy-light hole split-
ting, the circularly polarized pump creates holes and elec-
trons with well-defined spin projections, Jh,z = ±3/2 and
Se,z = ±1/2, respectively, according to the optical selec-
tion rules [12]. Therefore, |⇑⇓↓〉 (|⇑⇓↑〉) trions can be
generated by a σ+ (σ−) polarized pump. Here the thick
and thin arrows give the spin states of holes and elec-
trons, respectively.
The pump pulse duration is much shorter than the spin
coherence and the electron-hole recombination times. If
in addition the pump duration is shorter than the charge
coherence time of the trion state the pulse creates a co-
herent superposition of a resident hole from the DHG and
a hole singlet trion T+. The spin state of the resident hole
with arbitrary spin orientation before excitation can be
described by α |⇑〉+ β |⇓〉, where |α|2 + |β|2 = 1. With-
out magnetic field and for fields oriented normal to the
z-axis, the net spin polarization of the hole ensemble is
zero, so that the ensemble averaged coefficients are equal:
α = β.
For σ+ polarized excitation, for which injection of an
|⇑↓〉 electron-hole pair is possible, the excited superposi-
tion is given by α |⇑〉+β cos(Θ/2) |⇓〉+iβ sin(Θ/2) |⇑⇓↓〉.
Here Θ =
d · E(t)dt/~ is the dimensionless pulse area
with the pump laser electric field E(t) and the dipole
transition matrix element d. In general, the hole-trion
superposition state may be driven coherently by varying
the pulse area, giving rise to Rabi-oscillations as reported
recently for (In,Ga)As quantum dots [20]. Such oscilla-
tions have not been found yet in QWs, most probably due
to the fast carrier dephasing, in particular for strong ex-
citation. Dephasing of the superposition occurs shortly
after the pulse on a time scale of a few ps, converting
the coherent polarization into a population consisting of
holes with original spins ⇑ and ⇓ and trions with ⇑⇓↓.
In a simplified picture, the spin coherence generation
can be described as follows: The σ+ polarized pump cre-
ates with certain efficiency trions T+ of spin configura-
tion |⇑⇓↓〉. By this process |⇓〉 holes are pumped out of
the DHG, leaving behind holes with opposite spin |⇑〉.
Right after the pump pulse the KR signal is contributed
by the |⇑〉 hole from the DHG and |↓〉 electron of the T+.
The further evolution of the coherent signal depends on
the strength of external magnetic field applied perpen-
dicular to the z-axis.
AtB=0, the carrier spins experience no Larmor preces-
sion. The electron spin relaxation time usually exceeds
the lifetime of trions, which is limited by radiative decay,
by one-two orders of magnitude. Trion recombination re-
turns the hole to the DHG with the same spin orientation
as it was pumped out, if no electron spin scattering oc-
curred in the meantime. This compensates the induced
spin polarization and nullifies the KR signal at delays
exceeding the trion lifetime. Indeed, the KR signal in
the top trace in Fig. 1(a) shows a fast decay with a time
constant of ∼50 ps, which is characteristic for radiative
trion recombination in GaAs/(Al,Ga)As QWs [22]. The
long-lived tail of the signal has a very small amplitude
and is due to hole coherence provided by weak spin re-
laxation of electrons in T+ and/or hole relaxation in the
DHG during the trion lifetime.
In finite magnetic fields, the carrier spins start to pre-
cess about B. Due to the electron spin precession in
T+, the hole spin returned to the DHG after trion re-
combination will not compensate the spin polarization of
the resident holes. Therefore, a long-lived hole coherence
with considerable amplitude will be induced. This co-
herence is observed in the KR signal as spin beats with
low frequency (see Figs. 1 and 3). Note that the Larmor
precession of the resident holes may also contribute to
generation of hole spin coherence, but the effect is pro-
portional to the ratio of the hole and electron Larmor
frequencies and therefore will be rather small.
Let us compare the spin coherence generation for QWs
with DHG and DEG resonantly excited in the T+ and
T− states, respectively. We are interested in a long-lived
spin coherence which goes beyond the trion lifetime, i.e.
in spin coherence induced for the resident carriers. In
both cases the amplitude of the KR signal is controlled
by the ratio of the electron spin beat period to the trion
lifetime. Nevertheless, the two cases are quite different as
for DHG the precessing electron is bound in the T+ trion,
while for DEG the background electron precesses. In the
latter case the electron precession in T− is blocked due
to the singlet spin character of the trion ground state.
To conclude, a long-lived spin coherence has been
found for localized holes in a GaAs/(Al,Ga)As QW with
a diluted hole gas. The spin coherence time exceeds 650
ps and is still masked by the spin dephasing due to g
factor inhomogeneities. Localization of holes suppresses
most spin relaxation mechanisms inherent for free carri-
ers. It is also worth to note, that due to the p-type Bloch
wave functions the holes do not interact with the nuclear
spins, which provides the most efficient spin relaxation
mechanism for localized electrons [23].
Acknowledgements. This work was supported by
the BMBF program ’nanoquit’.
[†] Also at Ioffe Physico-Technical institute, Russian
Academy of Sciences, St. Petersburg, Russia.
[1] Semiconductor Spintronics and Quantum Computation,
ed. by D. D. Awschalom, D. Loss, and N. Samarth,
(Springer-Verlag, Heidelberg 2002).
[2] I. Žutić, J. Fabian, and S. Das Sarma, Rev. Mod. Phys.
76, 323 (2004).
[3] D. P. DiVincenzo, Science 270, 255 (1995); D. Loss and
D. P. DiVincenzo, Phys. Rev. A 57, 120 (1998).
[4] X. Marie et al., Phys. Rev. B 60, 5811 (1999).
[5] A. Imamoglu et al., Phys. Rev. Lett. 83, 4204 (1999).
[6] T. C. Damen, L. Viña, J. E. Cunningham, J. E. Shah,
and L. J. Sham, Phys. Rev. Lett. 67, 3432 (1991).
[7] Ph. Roussignol et al., Surf. Sci. 305, 263 (1994).
[8] B. Baylac et al., Sol. State Comm. 93, 57 (1995).
[9] B. Baylac et al., Surf. Sci. 326, 161 (1995).
[10] T. Uenoyama and L. J. Sham, Phys. Rev. Lett. 64, 3070
(1990); Phys. Rev. B 42, 7114 (1990).
[11] C. Y. Hu et al., Phys. Rev. B 72, 121203(R) (2005).
[12] Optical Orientation, ed. by F. Meier and B. P. Za-
kharachenya (North-Holland, Amsterdam 1984), Ch. 2.
[13] R. Ferreira and G. Bastard, Phys. Rev. B 43, 9687
(1991).
[14] C. Lü, J. L. Cheng, and M. W. Wu, Phys. Rev. B 73,
125314 (2006).
[15] Y. G. Semenov, K. N. Borysenko, and K. W. Kim, Phys.
Rev. B 66, 113302 (2002).
[16] J. J. Baumberg, D. D. Awschalom, N. Samarth, H. Luo,
and J. K. Furdyna, Phys. Rev. Lett. 72, 717 (1994).
[17] The value | gh,‖ |= 0.60 ± 0.01 was determined from the
Zeeman splitting of PL lines at B = 7 T applied along
the QW growth axis.
[18] R. Winkler, S. J. Papadakis, E. P. De Poortere, and M.
Shayegan, Phys. Rev. Lett. 85, 4574 (2000).
[19] A. Shabaev, Al. L. Efros, D. Gammon, and I. A.
Merkulov, Phys. Rev. B 68, 201305(R) (2003).
[20] A. Greilich et al., Phys. Rev. Lett. 96, 227401 (2006).
[21] T. A. Kennedy et al., Phys. Rev. B 73, 045307 (2006).
[22] G. Finkelstein et al., Phys. Rev. B 58, 12637 (1998).
[23] I. A. Merkulov, Al. L. Efros and M. Rosen, Phys. Rev. B
65, 205309 (2002).
ABSTRACT
  The carrier spin coherence in a p-doped GaAs/(Al,Ga)As quantum well with a
diluted hole gas has been studied by picosecond pump-probe Kerr rotation with
an in-plane magnetic field. For resonant optical excitation of the positively
charged exciton the spin precession shows two types of oscillations. Fast
oscillating electron spin beats decay with the radiative lifetime of the
charged exciton of 50 ps. Long lived spin coherence of the holes with dephasing
times up to 650 ps. The spin dephasing time as well as the in-plane hole g
factor show strong temperature dependence, underlining the importance of hole
localization at cryogenic temperatures.

<|endoftext|><|startoftext|>
Local-field effects in radiatively broadened magneto-dielectric media: negative
refraction and absorption reduction
Jürgen Kästel and Michael Fleischhauer
Fachbereich Physik, Technische Universität Kaiserslautern, D-67663 Kaiserslautern, Germany
Gediminas Juzeliūnas
Institute of Theoretical Physics and Astronomy, Vilnius University, A Goštauto 12, Vilnius 01108, Lithuania
(Dated: August 11, 2021)
We give a microscopic derivation of the Clausius-Mossotti relations for a homogeneous and
isotropic magneto-dielectric medium consisting of radiatively broadened atomic oscillators. To this
end the diagram series of electromagnetic propagators is calculated exactly for an infinite bi-cubic
lattice of dielectric and magnetic dipoles for a lattice constant small compared to the resonance
wavelength λ. Modifications of transition frequencies and linewidth of the elementary oscillators are
taken into account in a selfconsistent way by a proper incorporation of the singular self-interaction
terms. We show that in radiatively broadened media sufficiently close to the free-space resonance
the real part of the index of refraction approaches the value -2 in the limit of ρλ3 ≫ 1, where ρ is
the number density of scatterers. Since at the same time the imaginary part vanishes as 1/ρ local
field effects can have important consequences for realizing low-loss negative index materials.
PACS numbers:
INTRODUCTION
It is well known that in dense dielectric materials the
induced polarization P alters the field strength Eloc act-
ing on the constituents (i.e. the local field) compared to
the average macroscopic field Em. Macroscopic consider-
ations show that in systems with high symmetry such as
a cubic lattice the two fields are related to each other ac-
cording to Eloc = Em +P/(3ε0) [1, 2]. This leads to the
well-known Clausius-Mossotti relation for the permittiv-
ity ε(ω)
ε(ω) = 1 +
ρα(ω)/ε0
1− ρα(ω)/(3ε0)
where ρ is the density and α(ω) the polarizability of the
oscillators. Similar arguments hold for a purely magnetic
material [3], except that the required densities are usu-
ally much higher due to the smallness of magnetic dipole
moments and polarizabilities. In linear response α(ω) is
well described by a damped-oscillator model [1]
α(ω) = α′ + i α′′ =
ω20 − ω2 − iγω
. (2)
The corresponding (real-valued) parameters such as the
oscillator strength α0, the resonance frequency and
width, ω0 and γ, are determined by the microscopic
model. In general the linewidth γ contains radiative
as well as non-radiative contributions. For purely ra-
diative interaction these parameters are strongly af-
fected by the renormalization of energy levels and spon-
taneous emission processes caused by the interaction
with the vacuum electromagnetic field in the medium
[4, 5, 6, 7, 8, 9, 10, 11]. Since the mode structure of
the electromagnetic field inside a dense medium can be
substantially modified compared to free space, one would
expect that the polarizability entering eq.(1) is different
from that in free space. In a macroscopic approach α(ω)
is however an input function and no conclusion can be
drawn about possible changes due to the different struc-
ture of the vacuum modes inside the medium. To take
into account the modification of transition frequencies
and radiative linewidth in a dense medium in a self-
consistent way requires a microscopic approach.
In the present paper we develop a microscopic ap-
proach to local field effects in dense materials with simul-
taneous dielectric and magnetic response using Greens-
function techniques similar to those used by deVries and
Lagendijk for purely dielectric materials [12]. To this end
we consider an infinitely extended bi-cubic lattice of elec-
tric and magnetic point dipoles with isotropic response
with a lattice constant small compared to the transition
wavelength. We however do not make use of the as-
sumptions made in [12] to renormalize the singular self-
interaction contributions to the lattice T -matrix which
eliminated radiative contributions to linewidth and tran-
sition frequencies altogether. We show that instead the
self-interaction contributions can be summed to yield the
dressed t-matrix of an isolated oscillator interacting with
the vacuum modes of the electromagnetic field in free
space. In this way we derive Clausius-Mossotti relations
for general, radiatively broadened, isotropic magneto-
dielectrica. Apart from non-radiative broadenings, the
electric and magnetic polarizabilities entering these equa-
tions are shown to be exactly those of free space. We
then show that simultaneous local-field corrections to
electric and magnetic fields in purely radiatively broad-
ened magneto-dielectrica have a surprising and poten-
tially important effect: For sufficiently large densities the
http://arxiv.org/abs/0704.0593v2
real part of the refractive index saturates at the level of
−2. At the same time, the imaginary part of the com-
plex index approaches zero inversely proportional to the
density. Thus the medium becomes transparent and left-
handed i.e. displays a negative index of refraction with
low absorption.
LOCAL-FIELD EFFECTS AND
RENORMALIZATION OF RADIATIVE
SELF-INTERACTION IN DIELECTRIC MEDIA
We start by developing a microscopic scattering ap-
proach to local-field effects in dielectric media taking
into account possible material induced modifications of
radiative linewidth and transition frequencies in a self-
consistent way. To this end we consider a simple cubic
lattice of electric point dipoles with isotropic bare polar-
izability αb
αb(r) = αb
δ(r−R), (3)
whereR denote lattice vectors. The dipoles interact with
the quantized electromagnetic field Ê which obeys the
vector Helmholtz equation
~∇× ~∇× Ê(r, ω)− ω
Ê(r, ω) = µ0ω
P̂ . (4)
In the weak-excitation, i.e. linear response limit, the op-
erator of the microscopic electric polarization P̂ has the
form P̂(r) = αb(r)Ê(r, ω). Solving eq.(4) we can deter-
mine the (isotropic) dispersion relation k = k(ω) from
which the permittivity ε(ω) can be extracted. In the
linear response limit the solution of the quantum me-
chanical interaction problem can most easily be obtained
by means of Greensfunction techniques. In particular it
is sufficient to calculate the scattering T -matrix of the
oscillator lattice. The dispersion relation can then be
obtained via [13, 14, 15]
det T−1 = 0. (5)
The scattering T -matrix obeys a linear Dyson equation
T = V + V G(0)V + · · · = V + V G(0)T, (6)
where G(0)(r, r′, ω) is the free-space retarded propagator
of the electric field which is a solution to the classical
vector Helmholtz equation
~∇× ~∇× G(0)(r, r′, ω)− ω
G(0)(r, r′, ω) =
= 1 δ(r− r′), (7)
V (r, ω) = −ω
2αb(r)
is a linear, isotropic point vertex. Note that integration
over spatial variables was suppressed in eq.(6) for nota-
tional simplicity.
For a cubic lattice of isotropic scatterers, the series can
be summed up to yield [16]
T (k,k′) = −
ei(k−k
R 6=0
RG(0)(R)
where G(0)(R) stands for G(0)(r, r+R, ω0) which due to
the discrete translation invariance is independent on r.
The single-particle scattering t-matrix t(ω) is determined
by the bare polarizability [12]
t(ω)−1 =
+ G(0)(0). (10)
Note that G(0)(0) is diagonal and isotropic. In eq.(9) we
have separated the contribution of the lattice (
R 6=0)
from the multiple scattering events at the same oscilla-
tor (G(0)(0)). This separation is crucial since G(0)(0) is
singular. Rather than eliminating this singularity by a
regularization procedure as done in [12], we note that ex-
pression (10) gives the single-particle scattering t-matrix
t(ω) dressed by the interaction with the vacuum field in
free space. This quantity is experimentally observable
and is related to the single-particle polarizability α(ω) in
free space:
α(ω) = t(ω)
ε0 (11)
αb on the other hand is not observable and thus only a
theoretical notion. At this point other broadening mech-
anisms can be incorporated by adding appropriate non-
radiative decay rates γnon-rad to the polarizability α(ω)
(11) (cf. equation (2) and discussion thereafter).
Obviously, for the radiative part separating the sum∑
RG(0)(R) into G(0)(0)+
R 6=0 e
ik′RG(0)(R) does
the trick of writing the full lattice T -matrix in terms
of the known free space t-matrix. As a drawback we
are left with the sum over the lattice vectors R 6= 0.
Unfortunately this sum can not be evaluated exactly and
has to be treated approximately.
According to Poisson’s summation formula
f(n) =
dxf(x)e−2πikx (12)
the sum over R 6= 0 can be expressed in terms of a real
space integral and a sum over inverse lattice vectors K
of the Fourier transform of the free space Greensfunction
G̃(0)(p)
R 6=0
eikRG(0)(R) =
Ξ(|r|)
(2πa)3
ei(p+k−K)rG̃(0)(p)
Here Ξ(|r|) is some smooth function with Ξ(0) = 0 and
Ξ(|r| > 0) → 1 introduced to prevent the integral from
touching the excluded singular point r = 0.
In the following we restrict the discussion to lattices
with a lattice constant much smaller than the resonant
wavelength, i.e. ka ≪ 1. In this limit the lattice of
oscillators behaves essentially as a homogeneous medium.
Contributions from large K-vectors to the sum, which
reflect the discreteness of the lattice, can be neglected as
long as the singular contribution from the origin has been
excluded. Therefore we only keep the term K = 0 and
assume a Gaussian cutting function Ξ(|r|) = 1− e−r2/δ2 ,
with δ ≪ a. This yields
R 6=0
eikRG(0)(R) ≈ 1
G̃(0)(k)
π3/2δ3
(2π)3
dp p2e−
(k2+p2)e−
k·p̂G̃(0)(p),
where p̂ = p/|p|. Apart from the Gaussian p-integral
which provides a smooth cut-off in reciprocal space, δ
can be treated as a small parameter. That allows to
carry out the integration analytically which in leading
order of δ yields
R 6=0
eikRG(0)(R) ≈ 1
G̃(0)(k)− 1
3ω2/c2
1. (15)
The free-space Greentensor G̃(0)(k) is given by [12]
G̃(0)(k) =
1− |k|2∆k
with ∆k = 1 − k̂ ⊗ k̂ being a projector to directions
orthogonal to k.
With this we are ready to evaluate eq. (5) which reads
in the limit ka ≪ 1
ρα(ω)/ε0
1− |k|2∆k
Solving eq. (17) for the (isotropic) dispersion k = k(ω)
with k(ω) = ε(ω)ω2/c2 finally yields
ε(ω) = 1 +
ρα(ω)/ε0
1− ρα(ω)/3ε0
. (18)
This is the well-known Clausius-Mossotti relation where
for purely radiatively broadened systems α(ω) is the
dressed polarizability of an isolated oscillator interacting
with the free-space electromagnetic vacuum field.
LOCAL-FIELD EFFECTS FOR
MAGNETO-DIELECTRICS
We now extend the above discussion to the case of a
bi-cubic lattice of electric and magnetic dipole oscillators.
The microscopic, space-dependent bare electric polariz-
ability αbe(r) is then given by
αbe(r) = αbe
δ(r−R) = αbe
eiKr (19)
and, similarly, the bare magnetic polarizability by
αbm(r) = αbm
δ(r−R−∆r) = αbm
eiK(r−∆r)
HereR denotes again the lattice vectors and ∆r the spac-
ing between the electric and magnetic sublattices. The
bare atomic polarizabilities αbe and αbm are assumed
to be scalar for simplicity corresponding to an isotropic
medium. The last expressions in eqn. (19) and (20) give
the bare polarizabilities in reciprocal space, with K being
the reciprocal lattice vectors.
Due to the simultaneous presence of electric and mag-
netic dipole lattices we now have to solve the coupled set
of vector Helmholtz equations for the operators of the
electric and magnetic fields
∇×∇× Ê− ω
Ê = iωµ0∇× M̂+ µ0ω2P̂ (21)
∇×∇× Ĥ− ω
M̂− iω∇× P̂. (22)
In linear response the operator of the polarization P̂
and the magnetization M̂ are proportional to the elec-
tric and magnetic fields respectively, P̂(r) = αbe(r)Ê(r)
and M̂(r) = µ0αbm(r)Ĥ(r).
In the following we will pursue a slightly different ap-
proach to solve the coupled set of equations than used
in the previous section. Taking into account the lattice
symmetry we first write the field variables in the form
Ê(r) =
Ẽ(k−K)ei(k−K)r, (23)
where the dependence on frequency ω was suppressed for
notational simplicity. The subscript denotes integration
over the first Brillouin zone. Substituting this and the
corresponding expression for Ĥ into (21)-(22) gives the
Helmholtz equations in reciprocal space. After some ele-
mentary manipulations the following closed set of equa-
tions is derived:
ραbe/ε0
1− |k−K|2∆k−K
Ẽ(k−K′) = µ0αbm
eiK∆r(k−K)×
1− |k−K|2∆k−K
H̃(k−K′)e−iK
ρµ0αbm
1− |k−K|2∆k−K
H̃(k−K′)e−iK
′∆r = − c
ωµ0αbm
e−iK∆r(k−K)×
1− |k−K|2∆k−K
Ẽ(k−K′)
where ρ = 1/a3 is the particle density. The sum in the
brackets on the left hand sides of eqs. (24,25) can be
rewritten as
1− |k−K|2∆k−K
eikRG(0)(R)
= G(0)(0) +
R 6=0
eikRG(0)(R),
where in the second line we have separated the singular
contribution G(0)(0). One recognizes that this term can
be added to the expressions containing the bare polariz-
abilities in eqs.(24) and (25) yielding the dressed scatter-
ing t-matrices for isolated electric and magnetic dipoles
interacting with the free-space vacuum field:
te(ω)
+ G(0)(0), (26)
tm(ω)
(ω2µ0
+ G(0)(0). (27)
The sum over the Greensfunction excludingR = 0 can be
evaluated in a similar way as in the previous section. If
we again assume a lattice constant a much smaller than
the resonant wavelength, reciprocal K vectors different
from zero can be disregarded. This leads to
ρte(ω)
+ G̃(0)(k)− 1
3ω2/c2
Ê(k) = (28)
µ0αbm
1− k2∆k
Ĥ(k),
ρtm(ω)
+ G̃(0)(k) − 1
3ω2/c2
Ĥ(k) = (29)
c2αbe
ωµ0αbm
1− k2∆k
Ê(k).
Since we are furthermore only interested in propagating,
i.e. transversal modes, we can further simplify the calcu-
lation by projecting onto transversal modes using ∆k
ραe(ω)/ε0
∆kÊ(k) =
µ0αbm
k×∆kĤ(k) (30)
ρµ0αm(ω)
k×∆kĤ(k) =
c2αbe
ωµ0αbm
∆kÊ(k). (31)
Here we have substituted the dressed single parti-
cle t-matrices by the free-space dressed polarizabilities
αe(m)(ω) = te(m)(ω)c
2/ω2ε0
In order to find the dispersion k(ω) = n2ω2/c2 we have
to determine the solution of the secular equation of the
linear set of eqs. (30,31), which results in the condition
ραe(ω)/ε0
× (32)
ρµ0αm(ω)
Solving for the refractive index of the transversal modes
then gives n2 = εµ, where
ε = 1 +
ραe(ω)/ε0
1− ραe(ω)/3ε0
µ = 1 +
ρµ0αm(ω)
1− ρµ0αm(ω)/3
are the relative dielectric permittivity and magnetic
permeability, respectively, both satisfying the Clausius-
Mossotti relations.
Note that for longitudinal modes eqs. (28) and (29) de-
couple. This can be seen by applying the corresponding
projector to longitudinal waves k̂⊗k̂ which leads to a dis-
appearance of the cross-coupling terms. The dispersion
obtained in this way gives either ε = 0 corresponding to
electric excitons [17, 18] or µ = 0 for magnetic excitons.
NEGATIVE REFRACTION AND ABSORPTION
REDUCTION DUE TO LOCAL FIELD EFFECTS
IN MAGNETO-DIELECTRIC MEDIA
It is interesting to consider the implications of the
Clausius Mossotti relations for radiatively broadened me-
dia in the large density limit. Let us first consider a
purely dielectric medium and let us assume that the po-
larizability αe(ω) = α
e(ω) + i α
e (ω) does not depend on
the density, i.e. the medium is radiatively broadened. In
this case one finds
ρ→∞−→ −2 + i
|αe|2
. (35)
In the high-density limit and sufficiently close to reso-
nance the response saturates at a value of −2 with an
imaginary part that vanishes as 1/ρ. At this point the
medium becomes totally opaque since the index of re-
fraction attains an imaginary value n = i
2 indicating
the emergence of a stopping band. This is illustrated
in the left column of Fig 1 for a medium composed of
either electric or magnetic dipole oscillators. For small
densities (ρ|α0|/3 = 1/3) the resonance is centered at ω0
whereas for larger densities (ρ|α0|/3 = 3) the response
shifts to smaller frequencies and is amplified. Eventu-
ally (ρ|α0|/3 = 30) the refractive index becomes almost
purely imaginary in which case light cannot propagate
any longer.
This behavior changes dramatically if we consider me-
dia with overlapping electric and magnetic resonances
described by both an electric polarizability αe(ω) and a
magnetic polarizability αm(ω). Independent application
of Clausius-Mossotti local-field corrections to the permit-
tivity and the permeability leads in the high density limit
n = −2 + i
|αe|2
9α′′m
µ0|αm|2
. (36)
Thus in the spectral overlap region the real part of the
index of refraction approaches the value −2, i.e. attains
a constant negative value. Furthermore the imaginary
part, responsible for absorption losses, approaches zero in
that spectral region as 1/ρ. This rather peculiar behav-
ior is illustrated in the right column of Fig.1. One clearly
recognizes the emergence of a spectral region around the
bare resonance frequency where the real part of the re-
fractive index approaches −2 while the imaginary part is
strongly suppressed.
-3 -2 -1 0 1 2 3
-3 -2 -1 0 1 2 3
-3 -2 -1 0 1 2 3
-3 -2 -1 0 1 2 3
-3 -2 -1 0 1 2 3
-3 -2 -1 0 1 2 3
(b)(a)
∆ [γ ] ∆ [γ ]
FIG. 1: (color online) spectrum of the real (solid) and imagi-
nary (dashed) part of the refractive index as well as the real
(dotted) part of the response function(s) ε and/or µ as a func-
tion of the detuning ∆ for a (a) pure dielectric or magnetic
medium for ρ|α0|/3 at ∆ = 0 equal to = 1/3 (top), 3 (middle)
and 30 (bottom) (b) magneto-dielectric medium for ρ|α0|/3
at ∆ = 0 equal to = 1/3 (top), 3 (middle) and 30 (bottom).
Negative refraction of light is currently one of the
most active research areas in photonics [19, 20, 21] due
to fascinating potential applications such as superlens-
ing [22] or electromagnetic cloaking [23, 24, 25]. In
recent years substantial progress has been made in re-
alizing negative refraction in so-called meta-materials
[26, 27, 28, 29]. These are artificial periodic structures
of electric and magnetic dipoles with a resonance wave-
length much larger than the lattice constant which thus
form a quasi-homogeneous magneto-dielectric medium.
In order to achieve a large electromagnetic response, op-
eration close to resonance is needed which is associated
with rather substantial losses. The elimination of these
losses represents one of the main challenges in the field
[30]. We have shown here that in a radiatively broad-
ened medium, i.e. a medium in which density-dependent
broadening mechanism can still be disregarded for suf-
ficiently large densities, local field effects can provide a
negative index of refraction and at the same time effi-
ciently suppress absorption losses.
SUMMARY
In the present paper we have given a rigorous micro-
scopic derivation of Clausius-Mossotti relations for both
the electric and magnetic response in an isotropic, radia-
tively broadened magneto-dielectric medium formed by
a simple bi-cubic lattice of electric and magnetic dipoles.
As opposed to previous microscopic approaches we have
taken into account possible modifications of the single-
particle polarizabilities by the altered electromagnetic
vacuum inside the medium in a self-consistent way. For
a simple bi-cubic lattice it has been shown that the po-
larizabilities entering the Clausius-Mossotti relations are
those of single oscillators interacting with the free-space
vacuum field. We showed that as a consequence of the
local field corrections a radiatively broadened medium
with overlapping electric and magnetic resonances be-
comes lossless with a real part of the refractive index
approaching the value −2 in the high-density limit. The
latter could provide an interesting avenue to construct ar-
tificial materials with negative refraction and low losses.
This work was supported by the Alexander von Hum-
boldt Foundation through the institutional collabora-
tion grant between The Institute of Theoretical Physics
and Astronomy of Vilnius University and the Techni-
cal University of Kaiserslautern. J.K. acknowledges fi-
nancial support by the Deutsche Forschungsgemeinschaft
through the GRK 792 “Nichtlineare Optik und Ultra-
kurzzeitphysik”.
[1] J. D. Jackson, Classical Electrodynamics, (John Wiley &
Sons, New York, 1999)
[2] H.A. Lorentz, Wiedem. Ann. 9, 641 (1880); L. Lorenz
ibid. 11, 70 (1881); L. Onsager, J. Am. Chem. Soc. 58,
1486 (1936); C.J.F. Böttcher, Theory of Electric Polar-
ization, (Elsevier, Amsterdam, 1973)
[3] D. M. Cook, The Theory of the Electromagnetic Field,
(Prentice-Hall, Englewood Cliffs, N.J., 1975).
[4] G. Nienhuis and C. Th. J. Alkemande, Physica C 81, 181
(1976).
[5] J. Knoester and S. Mukamel, Phys. Rev. A 40, 7065
(1989).
[6] R.J. Glauber and M. Lewenstein, Phys. Rev. A 43, 467
(1991).
[7] S.M. Barnett, B. Huttner, and R. Loudon, Phys. Rev.
Lett. 68, 3698 (1992).
[8] G. Juzeliunas, Phys. Rev. A 55 R4015 (1997).
[9] S. Scheel, L. Knöll, and D.G. Welsch, Phys. Rev. A 60,
4094 (1999)
[10] M. Fleischhauer, Phys. Rev. A 60, 2534 (1999).
[11] H.T. Dung HT, S.Y. Buhmann, L. Knöll, D.G. Welsch,
S. Scheel, and J. Kästel, Phys. Rev. A 68, 043816 (2003).
[12] P. de Vries, D. V. van Coevorden, and A. Lagendijk, Rev.
Mod. Phys. 70, 447 (1998).
[13] J. Korringa, Physica 13, 392 (1947).
[14] W. Kohn, N. Rostoker, Phys. Rev. 94, 1111 (1953).
[15] J. M. Ziman, Proc. Phys. Soc. 86, 337 (1965).
[16] P. de Vries, and A. Lagendijk, Phys. Rev. Lett 81, 1381
(1998).
[17] A. S. Davydov, Theory of Molecular Excitons (Plenum,
New York, 1971).
[18] V. M. Agranovich, and M. D. Galanin, Electronic Exci-
tation Energy Transfer in Condensed Matter, edited by
V. M. Agranovich and A. A. Maradudin (North-Holland,
Amsterdam, 1982).
[19] V. G. Veselago, Sov. Phys. Usp. 10, 509 (1968).
[20] V. M. Agranovich, and Y. N. Gartstein, Physics Uspekhi
49, 1029 (2006)
[21] V. M. Shalaev, Nature Photonics 1, 41 (2007)
[22] J. B. Pendry, Phys. Rev. Lett. 85, 3966 (2000)
[23] U. Leonhardt, Science 312, 1777 (2006).
[24] J. B. Pendry, D. Schurig, and D. R. Smith, Science 312,
1780 (2006).
[25] D. Schurig et al. , Science 314, 977 (2006).
[26] J. B. Pendry et al. , IEEE Trans. Micro. Theory Tech.
47, 2075 (1999).
[27] D. R. Smith et al. , Phys. Rev. Lett. 84, 4184 (2000);
R. Shelby, D. R. Smith, and S. Schultz, Science 292, 77
(2001).
[28] T. J. Yen et al. , Science 303, 1494 (2004).
[29] S. Linden et al. , Science 306, 1351 (2004); C. Enkrich
et al. , Phys. Rev. Lett. 95, 203901 (2005).
[30] J. Kästel, M. Fleischhauer, S. F. Yelin, and R. L.
Walsworth, Phys. Rev. Lett. 99, 073602 (2007).
ABSTRACT
  We give a microscopic derivation of the Clausius-Mossotti relations for a
homogeneous and isotropic magneto-dielectric medium consisting of radiatively
broadened atomic oscillators. To this end the diagram series of electromagnetic
propagators is calculated exactly for an infinite bi-cubic lattice of
dielectric and magnetic dipoles for a lattice constant small compared to the
resonance wavelength $\lambda$. Modifications of transition frequencies and
linewidth of the elementary oscillators are taken into account in a
selfconsistent way by a proper incorporation of the singular self-interaction
terms. We show that in radiatively broadened media sufficiently close to the
free-space resonance the real part of the index of refraction approaches the
value -2 in the limit of $\rho \lambda^3 \gg 1$, where $\rho$ is the number
density of scatterers. Since at the same time the imaginary part vanishes as
$1/\rho$ local field effects can have important consequences for realizing
low-loss negative index materials.

<|endoftext|><|startoftext|>
Introduction
The Standard Model (SM), although in agreement with the available experimental
data [1], leaves several open questions. In particular, the number of fermion generations
and their mass spectrum are not predicted. The measurement of the Z decay widths [1]
established that the number of light neutrino species (m < mZ/2, where mZ is the Z boson
mass) is equal to three. However, if a heavy neutrino or a neutrinoless extra generation
exists, this bound does not exclude the possibility of extra generations of heavy quarks.
Moreover the fit to the electroweak data [2] does not deteriorate with the inclusion of one
extra heavy generation, if the new up and down-type quarks mass difference is not too
large. It should be noticed however that in this fit no mixing of the extra families with
the SM ones is assumed.
The subject of this paper is the search for the pair production of a fourth generation
b′-quark at LEP-II: b′ production and decay are discussed in section 2; in section 3, the
data sets and the Monte Carlo (MC) simulation are described; the analysis is discussed
in section 4; the results and their interpretation within a sequential model are presented
in sections 5 and 6, respectively.
2 b′-quark production and decay
Extra generations of fermions are predicted in several SM extensions [3,4]. In sequen-
tial models [5–7], a fourth generation of fermions carrying the same quantum numbers as
the SM families is considered. In the quark sector, an up-type quark, t′, and a down-type
quark, b′, are included. The corresponding 4× 4 extended Cabibbo-Kobayashi-Maskawa
(CKM) matrix is unitary, approximately symmetric and almost diagonal. As CP-violation
is not considered in the model, all the CKM elements are assumed to be real.
The b′-quark may decay via charged currents (CC) to UW, with U = t′, t, c, u, or via
flavour-changing neutral currents (FCNC) to DX , where D = b, s, d and X = Z,H, γ, g
(Fig. 1). As in the SM, FCNC are absent at tree level, but can appear at one-loop level,
due to CKM mixing. If the b′ is lighter than t′ and t, the decays b′ → t′W and b′ → tW
are kinematically forbidden and the one-loop FCNC decays can be as important as the
CC decays [6].
The analysis of the electroweak data [1] shows that the mass difference |mt′ −mb′ | <
60 GeV/c2 is consistent with the measurement of the ρ parameter [3,5]. In particular,
when mZ + mb < mb′ < mH + mb, either b
′ → cW or b′ → bZ decay tend to be domi-
nant [5–7]. In this case, the partial widths of the CC and FCNC b′ decays depend mainly
on mt′ , mb′ and RCKM = | Vcb′V
tb′Vtb
|, where Vcb′ , Vtb′ and Vtb are elements of the extended
4 × 4 CKM matrix [7].
Limits on the mass of the b′-quark have been set previously at various accelerators.
At LEP-I, all the experiments searched for b′ pair production (e+e− → b′b̄′), yielding a
lower limit on the b′ mass of about mZ/2 [8]. At the Tevatron, both the D0 [9] and
CDF [10] experiments reported limits on σ(pp̄ → b′b̄′) × BR(b′ → bX)2, where BR is
the branching ratio corresponding to the considered FCNC b′ decay mode and X = γ,Z.
Assuming BR(b′ → bZ) = 1, CDF excluded the region 100 < mb′ < 199 GeV/c2.
Although no dedicated analysis was performed for the b′ → cW decay, the D0 limits on
σ(pp̄ → tt̄) × BR(t → cW)2 from Fig. 44 and Table XXXI of reference [11] can give a
hint on the possible values for BR(b′ → cW) [12].
In the present analysis the on-shell FCNC (b′ → bZ) and CC (b′ → cW) decay modes
were studied and consequently the mass range 96 GeV/c2 < mb′ < 103 GeV/c
2 was
considered. This mass range is complementary to the one covered by CDF [10]. The
mass range mW + mc < mb′ < mZ + mb was not considered because in this region
the evaluation of the branching ratios for the different b′ decays is particularly difficult
from the theoretical point of view [7]. In the present analysis no assumptions on the
BR(b′ → bZ) and BR(b′ → cW) in order to derive mass limits were made. Different
final states, corresponding to the different b′ decay modes and subsequent decays of the
Z and W bosons, were analysed.
3 Data samples and Monte Carlo simulation
The analysed data were collected with the DELPHI detector [13] during the years
1999 and 2000 in LEP-II runs at
s = 196 − 209 GeV and correspond to an integrated
luminosity of about 420 pb−1. The luminosity collected at each centre-of-mass energy is
shown in Table 1. During the year 2000, an unrecoverable failure affected one sector of
the central tracking detector (TPC), corresponding to 1/12 of its acceptance. The data
collected during the year 2000 with the TPC fully operational were split into two energy
bins, below and above
s = 206 GeV, with 〈
s〉 = 204.8 GeV and 〈
s〉 = 206.6 GeV,
respectively. The data collected with one sector of the TPC turned off were analysed
separately and have 〈
s〉 = 206.3 GeV.
s (GeV) 196 200 202 205 207 206∗
luminosity (pb−1) 76.0 82.7 40.2 80.0 81.9 59.2
Table 1: The luminosity collected with the DELPHI detector at each centre-of-mass
energy is shown. The energy bin labelled 206∗ corresponds to the data collected with one
sector of the TPC turned off.
Signal samples were generated using a modified version of PYTHIA 6.200 [14]. Al-
though PYTHIA does not provide FCNC decay channels for quarks, it was possible to
activate them by modifying the decay products of an available channel. The angular
distributions assumed for b′ pair production and decay were those predicted by the SM
for any heavy down-type quark. Different samples, corresponding to b′ masses in the
range between 96 and 103 GeV/c2 and with a spacing of 1 GeV/c2 were generated at
each centre-of-mass energy. Specific Monte Carlo simulations (for both SM and signal
processes) were produced for the period when one sector of the TPC was turned off.
The most relevant background processes for the present analyses are those leading
to WW or ZZ bosons in the final state, i.e. four-fermion backgrounds. Radiation in
these events can mimic the six-fermion final states for the signal. Additionally qq̄(γ) and
Bhabha events can not be neglected since for signal final states with missing energy these
backgrounds can become important. SM background processes were simulated at each
centre-of-mass energy using several Monte Carlo generators. All the four-fermion final
states (both neutral and charged currents) were generated with WPHACT [15], while
the particular phase space regions of e+e− → e+e−f f̄ referred to as γγ interactions were
generated using PYTHIA [14]. The qq(γ) final state was generated with KK2F [16].
Bhabha events were generated with BHWIDE [17].
The generated signal and background events were passed through the detailed simu-
lation of the DELPHI detector [13] and then processed with the same reconstruction and
analysis programs as the data.
4 Description of the analyses
Pair production of b′-quarks was searched for in both the FCNC (b′ → bZ) and CC
(b′ → cW) decay modes. The b′ decay modes and the subsequent decays of the gauge
bosons (Z or W) lead to several different final states (Fig. 2). The final states considered
and their branching ratios are shown in Table 2. The choice of the considered final
states was done taking into account their signatures and BR. About 81% and 90% of the
branching ratio to the FCNC and CC channels were covered, respectively. All final states
include two jets originating from the low energy b (c) quarks present in the FCNC (CC)
b′ decay modes. A common preselection was adopted, followed by a specific analysis for
each of the final states (Table 2).
b′ decay boson decays BR (%) final states
b′ → bZ (FCNC) ZZ → l+l−νν̄ 4.0 bb̄l+l−νν̄
ZZ → qq̄νν̄ 28.0 bb̄qq̄νν̄
ZZ → qq̄qq̄ 48.6 bb̄qq̄qq̄
b′ → cW (CC) WW → qq̄l+ν 43.7 cc̄qq̄l+ν
WW → qq̄qq̄ 45.8 cc̄qq̄qq̄
Table 2: The final states considered in this analysis are shown. About 81% and 90% of
the branching ratio to the FCNC and CC channels were covered, respectively.
Events were preselected by requiring at least eight good charged-particle tracks and
the visible energy measured at polar angles1 above 20◦, to be greater than 0.2
s. Good
charged-particle tracks were defined as those with a momentum above 0.2 GeV/c and
impact parameters in the transverse plane and along the beam direction below 4 cm and
below 4 cm/ sin θ, respectively.
The identification of muons relied on the association of charged particles to signals
in the muon chambers and in the hadronic calorimeters and was provided by standard
DELPHI algorithms [13]. The identification of electrons and photons was performed
by combining information from the electromagnetic calorimeters and the tracking sys-
tem. Radiation and interaction effects were taken into account by an angular clustering
procedure around the main shower [18].
The search for isolated particles (charged leptons and photons) was done by construct-
ing double cones oriented in the direction of charged-particle tracks or neutral energy
deposits. The latter ones were defined as calorimetric energy deposits above 0.5 GeV,
not matched to charged-particle tracks and identified as photon candidates by the stan-
dard DELPHI algorithms [13,18]. For charged leptons (photons), the energy in the region
between the two cones, which had half-opening angles of 5◦ and 25◦ (5◦ and 15◦), was
required to be below 3 GeV (1 GeV), to ensure isolation. All the charged-particle tracks
1In the standard DELPHI coordinate system, the positive z axis is along the electron beam direction. The polar angle
(θ) is defined with respect to the z axis. In this paper, polar angle ranges are always assumed to be symmetric with respect
to the θ = 90◦ plane.
final state assignment criteria
bb̄l+l−νν̄ at least 1 isolated lepton
bb̄qq̄νν̄ no isolated leptons
Emissing > 50 GeV
bb̄qq̄qq̄ no isolated leptons
Emissing < 50 GeV
cc̄qq̄l+ν only 1 isolated lepton
cc̄qq̄qq̄ no isolated leptons
Emissing < 50 GeV
Table 3: Summary of the final state assignment criteria.
and neutral energy deposits inside the inner cone were associated to the isolated particle.
Its energy was then re-evaluated as the sum of the energies inside the inner cone and
was required to be above 5 GeV. For well identified leptons or photons [13,18] the above
requirements were weakened. In this case only the external cone was used (to ensure
isolation) and its angle α was varied according to the energy of the lepton (photon) can-
didate, down to 2◦ for Pℓ ≥ 70 GeV/c (3◦ for Pγ ≥ 90 GeV/c), with the allowed energy
inside the cone reduced by sinα/ sin 25◦ (sinα/ sin 15◦). Isolated leptons were required
to have a momentum greater than 10 GeV/c and a polar angle above 25◦. Events with
isolated photons were rejected.
All the events were clustered into two, four or six jets using the Durham jet algo-
rithm [19], according to the number of jets expected in the signal in each of the final
states, unless explicitly stated otherwise. Although two b jets are always present in the
FCNC final states, they have a relatively low energy and b-tagging techniques [20] were
not used.
Events were assigned to the different final states according to the number of isolated
leptons and to the missing energy in the event, as detailed in Table 3. Within the same
b′ decay channel, the different selections were designed to be mutually exclusive. For the
final states involving charged leptons (bb̄l+l−νν̄ and cc̄qq̄l+ν), events were divided into
different samples according to the lepton flavour identification: e sample (well identified
electrons), µ sample (well identified muons) and no-id sample (leptons with unidentified
flavour or two leptons identified with different flavours).
Specific analyses were then performed for each of the final states. The selection criteria
for the bb̄qq̄qq̄ and cc̄qq̄qq̄ final states were the same. The bb̄l+l−νν̄ final state has a very
clean signature (two leptons with ml+l− ∼ mZ, two low energy jets and missing mass close
to mZ) and consequently a sequential cut analysis was adopted. For all the other final
states, a sequential selection step was followed by a discriminant analysis. In this case,
a signal likelihood (LS) and a background likelihood (LB) were assigned to each event,
based on Probability Density Functions (PDF), built from the distributions of relevant
physical variables. The discriminant variable was defined as ln(LS/LB).
4.1 The bb̄l+l−νν̄ final state
The FCNC bb̄l+l−νν̄ final state events were preselected as described above, by re-
quiring at least eight good charged-particle tracks, the visible energy measured at polar
angles above 20◦, to be greater than 0.2
s and at least one isolated lepton. Distribu-
tions of the relevant variables are shown in Fig. 3 for all the events assigned to this final
state after the preselection. The event selection was performed in two levels. In the first
one, events were required to have at least two leptons and an effective centre-of-mass
energy [21],
s′, below 0.95
s. The particles other than the two leptons in the events
were clustered into two jets and the Durham resolution variable in the transition from
two jets to one jet2 was required to be greater than 0.002. The number of data events and
the SM expectation after the first selection level is shown in Table 4. The background
composition and the signal efficiencies at this level of selection for mb′ = 100 GeV/c
2 and√
s = 205 GeV are given in Table 8. The efficiencies for the other relevant b′ masses and√
s values were found to be the same within errors. Data, SM expectation and signal
distributions at this selection level are shown in Fig. 4.
s (GeV) data (SM expectation ± statistical error)
e sample µ sample no-id sample
196 2 (2.6±0.3) 1 (2.9±0.3) 47 (35.9±1.4)
200 3 (2.5±0.4) 4 (3.4±0.4) 30 (37.4±1.4)
202 2 (1.3±0.2) 1 (1.7±0.2) 20 (18.7±0.7)
205 5 (2.5±0.4) 3 (3.0±0.4) 35 (36.2±1.4)
207 3 (2.3±0.4) 3 (3.1±0.4) 45 (35.1±1.3)
206∗ 1 (1.9±0.3) 2 (2.6±0.2) 31 (27.6±1.0)
total 16 (13.2±0.8) 14 (16.7±0.8) 208 (191.0±3.0)
Table 4: First selection level of the bb̄l+l−νν̄ final state: the number of events selected
in data and the SM expectations after the first selection level for each sample and cen-
tre-of-mass energy are shown.
In the final selection level the momentum of the more energetic (less energetic) jet
was required to be below 30 GeV/c (12.5 GeV/c). Events in the e and no-id samples
had to have a missing energy greater than 0.4
s. In the µ sample events were required
to have an angle between the two muons greater than 125◦. In the no-id sample, the
angle between the two charged leptons had to be greater than 140◦ and pmis/Emis < 0.4,
where pmis and Emis are the missing momentum and energy, respectively. After the final
selection, one data event was selected for an expected background of 1.5±0.7. This event
belonged to the no-id sample and was collected at
s = 200 GeV. The signal efficiencies
for mb′ = 100 GeV/c
2 and
s = 205 GeV are 30.6 ± 2.5% (e sample), 48.6 ± 2.7% (µ
sample) and 7.2 ± 0.8% (no-id sample) and their variation with mb′ and
s was found
to be negligible in the relevant range.
4.2 The bb̄qq̄νν̄ final state
The FCNC bb̄qq̄νν̄ final state is characterised by the presence of four jets and a
missing mass close to mZ. At least 20 good charged-particle tracks and
s′ > 0.5
were required. Events were clustered into four jets. Monojet-like events were rejected by
requiring − log10(y2→1) < 0.7 (y2→1 is the Durham resolution variable in the two to one
jet transition). Furthermore, − log10(y4→3) was required to be below 2.8 and the energy
of the leading charged particle of the most energetic jet was required to be below 0.1
2The Durham resolution variable is the minimum value of the scaled transverse momentum obtained in the transition
from n to n− 1 jets [19] and will be represented by yn→n−1.
A kinematic fit imposing energy-momentum conservation and no missing energy was
applied and the background-like events with χ2/n.d.f. < 6 were rejected. The data,
SM expectation and signal distributions of this variable are shown in Fig. 5. Table 5
summarizes the number of selected data events and the SM expectation. The background
composition and the signal efficiency at this level of selection for mb′ = 100 GeV/c
2 and√
s = 205 GeV are given in Table 8. The efficiencies for the other relevant b′ masses and√
s values were found to be the same within errors.
s (GeV) data (SM expectation ± statistical error)
196 123 (106.3±4.0)
200 111 (104.8±4.0)
202 50 (49.8±1.9)
205 88 (94.2±3.7)
207 99 (91.2±3.6)
206∗ 62 (65.7±2.6)
total 533 (511.7±8.3)
Table 5: First selection level of the bb̄qq̄νν̄ final state: the number of events selected in
data and the SM expectation for each centre-of-mass energy are shown.
A discriminant selection was then performed using the following variables to build the
PDFs:
• the missing mass;
• Aj1j2cop × min(sin θj1 , sin θj2), where Aj1j2cop is the acoplanarity3 and θj1,j2 are the polar
angles of the jets when forcing the events into two jets4;
• the acollinearity between the two most energetic jets5 with the event particles clus-
tered into four jets;
• the sum of the first and third Fox-Wolfram moments (h1 + h3) [22];
• the polar angle of the missing momentum.
The data, SM expectation and signal distributions of these variables are shown in Fig. 6.
4.3 The bb̄qq̄qq̄ final state
The FCNC bb̄qq̄qq̄ final state is characterised by the presence of six jets and a small
missing energy. All the events were clustered into six jets and only those with at least
30 good charged-particle tracks were accepted. Moreover, events were required to have√
s′ > 0.6
s, − log10(y2→1) < 0.7 and − log10(y6→5) < 3.6. The number of selected data
events and the expected background at this level are shown in Table 6. The background
composition and the signal efficiency at this level of selection for mb′ = 100 GeV/c
2 and√
s = 205 GeV are given in Table 8. The efficiencies for the other relevant b′ masses and√
s values were found to be the same within errors.
A discriminant selection was performed using the following variables to build the PDFs:
3The acoplanarity between two particles is defined as |180◦ − |φ1 −φ2||, where φ1,2 are the azimuthal angles of the two
particles (in degrees).
4While the signal is characterised by the presence of four jets in the final state, the two jets configuration is used mainly
for background rejection.
5The acollinearity between two particles is defined as 180◦ − α1,2, where α1,2 is the angle (in degrees) between those
two particles.
s (GeV) data (SM expectation ± statistical error)
196 349 (326.7±5.3)
200 347 (342.1±5.5)
202 165 (162.1±2.6)
205 322 (319.0±5.2)
207 287 (307.6±5.0)
206∗ 192 (215.8±3.6)
total 1662 (1673.9±11.4)
Table 6: First selection level of the bb̄qq̄qq̄ and cc̄qq̄qq̄ final states: the number of events
selected in data and the SM expectations for each centre-of-mass energy are shown.
• the Durham resolution variable, − log10(y4→3);
• the Durham resolution variable, − log10(y5→4);
• the acollinearity between the two most energetic jets, with the event forced into four
jets;
• the sum of the first and third Fox-Wolfram moments;
• the momentum of the most energetic jet;
• the angle between the two most energetic jets (with the events clustered into six
jets).
The distributions of these variables are shown in Fig. 7 for data, SM expectation and
signal.
4.4 The cc̄qq̄l+ν final state
The signature of this CC final state is the presence of four jets (two of them having
low energy), one isolated lepton and missing energy (originating from the W → lν̄ decay).
The events were accepted if they had at least 15 good charged-particle tracks. The event
particles other than the identified lepton were clustered into four jets. Part of the qq̄
and γγ background was rejected by requiring − log10(y2→1) < 0.7. Furthermore, there
should be only one charged-particle track associated to the isolated lepton, and the leading
charged particle of the most energetic jet was required to have a momentum below 0.1
The number of selected data events and SM expectations at this level are summarized in
Table 7. The background composition and the signal efficiencies at this level of selection
for mb′ = 100 GeV/c
2 and
s = 205 GeV are given in Table 8. The efficiencies for the
other relevant b′ masses and
s values were found to be the same within errors.
The PDFs used to calculate the background and signal likelihoods were based on the
following variables:
• the sum of the first and third Fox-Wolfram moments;
• the invariant mass of the two jets, with the event particles other than the identified
lepton clustered into two jets;
• the Durham resolution variable, − log10(y4→3);
|~pi|/
s, where ~pi are the momenta of the charged particles (excluding the lepton)
in the same hemisphere as the lepton (the hemisphere is defined with respect to the
lepton);
• the acollinearity between the two most energetic jets;
s (GeV) data (SM expectation ± statistical error)
e µ no-id
196 65 (51.1±1.4) 53 (56.1±1.5) 38 (34.4±1.4)
200 54 (58.1±1.7) 63 (59.9±1.6) 40 (35.0±1.4)
202 30 (27.8±0.8) 21 (28.4±0.8) 13 (16.9±0.7)
205 56 (50.8±1.5) 66 (53.6±1.5) 32 (33.3±1.4)
207 53 (53.8±1.6) 48 (57.2±1.6) 35 (33.8±1.4)
206∗ 31 (37.2±1.4) 42 (39.3±1.1) 21 (23.4±1.0)
total 289 (278.8±3.5) 293 (294.5±3.4) 179 (176.8 ± 2.8)
Table 7: First selection level of the cc̄qq̄l+ν final state: the number of events selected in
data and the SM expectations for each sample and centre-of-mass energy are shown.
• the angle between the lepton and the missing momentum.
The data, SM expectation and signal distributions of these variables are shown in Fig. 8.
In order to improve the efficiency, events with no leptons seen in the detector were
kept in a fourth sample. For this sample, the selection criteria of the bb̄qq̄νν̄ final state
were applied and the same variables as in section 4.2 were used to build the PDFs. The
signal efficiency after the first selection level for mb′ = 100 GeV/c
2 and
s = 205 GeV
was 8.9±0.9%. The efficiencies for the other relevant b′ masses and
s values were found
to be the same within errors.
4.5 The cc̄qq̄qq̄ final state
This final state is very similar to bb̄qq̄qq̄ (with slightly different kinematics due to the
mass difference between the Z and the W). The analysis described in section 4.3 was
thus adopted. The number of selected events and the SM expectations can be found in
Table 6. At this level, the signal efficiency for mb′ = 100 GeV/c
2 and
s = 205 GeV was
67.3±1.5%. The efficiencies for the other b′ masses and centre-of-mass energies were the
same within errors. The PDFs were built using the same set of variables as in section 4.3.
5 Results
For all final states, a good agreement between data and SM expectation was found. The
summary of the total number of selected data events, SM expectations, the corresponding
background composition and the signal efficiencies for the studied final states are shown
in Table 8. In the bb̄l+l−νν̄ final state, one data event was retained after the final
selection level, for a SM expectation of 1.5 ± 0.7 events. This event belonged to the no-id
sample and was collected at
s = 200 GeV. For all the other final states, discriminant
analyses were used. In these cases, a discriminant variable, ln(LS/LB), was defined. The
distributions of ln(LS/LB), for the different analysis channels are shown in Fig. 9. No
evidence for a signal was found in any of the channels and the full information, i.e. event
numbers and the shapes of the distributions of the discriminant variables were used to
derive limits on BR(b′ → bZ) and BR(b′ → cW).
data background signal
final state (SM ± stat. error) composition (%) efficiency (%)
qq̄ WW ZZ γγ
bb̄l+l−νν̄ e sample 16 (13.2±0.8) 16 16 68 0 35.1±2.6
(first selection µ sample 14 (16.7±0.8) 0 10 90 0 53.4±2.7
level) no-id sample 208 (191.0±3.0) 8 80 12 0 12.3±1.0
bb̄qq̄νν̄ 533 (511.7±8.3) 76 17 2 5 57.6±1.7
bb̄qq̄qq̄ 1662 (1673.9±11.4) 35 65 0 0 66.0±1.5
e sample 289 (278.8±3.5) 7 82 11 0 45.3±2.7
cc̄qq̄l+ν µ sample 293 (294.5±3.4) 2 97 1 0 56.4±2.7
no-id sample 179 (176.8±2.8) 9 84 7 0 5.3±0.7
no lepton sample 533 (511.7±8.3) 76 17 2 5 8.9±0.9
cc̄qq̄qq̄ 1662 (1673.9±11.4) 35 65 0 0 67.3±1.5
Table 8: Summary of the total number of selected data events and SM expectations for
the studied final states after the final selection (first selection level for bb̄l+l−νν̄). The
corresponding background composition and signal efficiencies for mb′ = 100 GeV/c
2 and√
s = 205 GeV are also shown.
5.1 Limits on BR(b′ → bZ) and BR(b′ → cW)
Upper limits on the product of the e+e− → b′b̄′ cross-section and the branching ratio
as a function of the b′ mass were derived at 95% confidence level (CL) in each of the
considered b′ decay modes (FCNC and CC), taking into account the values of the dis-
criminant variables and their expected distributions for signal and background, the signal
efficiencies and the data luminosities at the various centre-of-mass energies.
Assuming the SM cross-section for the pair production of heavy quarks at LEP [7,14],
these limits were converted into limits on the branching ratios corresponding to the
b′ → bZ and b′ → cW decay modes. The modified frequentist likelihood ratio method [23]
was used. The different final states and centre-of-mass energy bins were treated as inde-
pendent channels. For each b′ mass only the channels with
s > 2mb′ were considered.
In order to avoid some non-physical fluctuations of the distributions of the discriminant
variables due to the limited statistics of the generated events, a smoothing algorithm was
used. The median expected limit, i.e. the limit obtained if the SM background was the
only contribution in data, was also computed. In Fig. 10 the observed and expected limits
on BR(b′ → bZ) and BR(b′ → cW) are shown as a function of the b′ mass. The 1σ and
2σ bands around the expected limit are also shown. The observed and expected limits
are statistically compatible. At 95% CL and for mb′ = 96 GeV/c
2, the BR(b′ → bZ) and
BR(b′ → cW) have to be below 51% and 43%, respectively. These limits were evaluated
taking into account the systematic uncertainties, as explained in the next subsection.
The limits obtained for BR(b′ → bZ) are compatible with those presented by CDF [10]
for a b′ mass of 100 GeV/c2. Below this mass, the DELPHI result is more sensitive and
the CDF limit degrades rapidly. For higher b′ masses, the LEP-II kinematical limit is
reached and the present analysis looses sensitivity.
5.2 Systematic uncertainties
The evaluation of the limits was performed taking into account systematic uncertain-
ties, which affect the background estimation, the signal efficiency and the shape of the
distributions used. The following systematic uncertainties were considered:
• SM cross-sections: uncertainties on the SM cross-sections translate into uncertainties
on the expected number of background events. The overall uncertainty on the most
relevant SM background processes for the present analyses is typically less than
2% [24], which leads to relative changes on the branching ratio limits below 6%;
• Signal generation: uncertainties on the final state quark hadronisation and fragmen-
tation modelling were studied. The Lund symmetric fragmentation function was
tested and compared with schemes where the b and c quark masses are taken into
account [14]. This systematic error source was estimated to be of the order of 20%
in the signal efficiency, by conservatively taking the maximum observed variation.
The relative effect on the branching ratio limits is below 16%;
• Smoothing: the uncertainty associated to the discriminant variables smoothing was
estimated by applying different smoothing algorithms. The smoothing procedure
does not change the number of SM expected events or the signal efficiency, but may
lead to differences in the shape of the discriminant variables. The relative effect of
this uncertainty on the limits evaluation was found to be below 9%.
Further details on the evaluation of the systematic errors and the derivation of limits can
be found in [25].
6 Constraints on RCKM
The branching ratios for the b′ decays can be computed within a four generations
sequential model [5–7]. As discussed before, if the b′ is lighter than both the t and the t′
quarks and mZ < mb′ < mH, the main contributions to the b
′ width are BR(b′ → bZ) and
BR(b′ → cW) [7]. Using the unitarity of the CKM matrix, its approximate diagonality
(Vub′ Vub ≈ 0) and taking Vcb ≈ 10−2 [12], the branching fractions can be written as a
function of three variables: RCKM = | Vcb′V
tb′ Vtb
|, mt′ and mb′ [5–7].
Fixing mt′ − mb′ , the limits on BR(b′ → bZ) and BR(b′ → cW) (Fig. 10) can be
translated into 95% CL bounds on RCKM as a function of mb′ . Two extreme cases were
considered: the almost degenerate case, with mt′−mb′ = 1 GeV/c2, and the case in which
the mass difference is close to the largest possible value, mt′ − mb′ = 50 GeV/c2 [3,5].
The results are shown in Fig. 11 and Fig. 12. In the figures, the upper curve was obtained
from the limit on BR(b′ → cW), while the lower curve was obtained from the limit on
BR(b′ → bZ), which decreases with growing mt′ . This suppression is due to the GIM
mechanism [26] as mt′ approaches mt. On the other hand, as the b
′ mass approaches
the bZ threshold, the b′ → bg decay dominates over b′ → bZ [7] and the lower limit on
RCKM becomes less stringent. The expected limits on BR(b
′ → bZ) did not allow to set
exclusions for low values of RCKM and mt′ −mb′ = 1 GeV/c2 (see Fig. 11).
7 Conclusions
The data collected with the DELPHI detector at
s = 196−209 GeV show no evidence
for the pair production of b′-quarks with masses ranging from 96 to 103 GeV/c2.
Assuming the SM cross-section for the pair production of heavy quarks at LEP, 95%
CL upper limits on BR(b′ → bZ) and BR(b′ → cW) were obtained. It was shown that, at
95% CL and for mb′ = 96 GeV/c
2, the BR(b′ → bZ) and BR(b′ → cW) have to be below
51% and 43%, respectively. The 95% CL upper limits on the branching ratios, combined
with the predictions of the sequential fourth generation model, were used to exclude
regions of the (RCKM , mb′) plane for two hypotheses of the mt′ − mb′ mass difference.
It was shown that, for mt′ −mb′ = 1 (50) GeV/c2 and 96 GeV/c2 < mb′ < 102 GeV/c2,
RCKM is bounded by an upper limit of 3.8×10−3 (1.2×10−3). For mb′ = 100 GeV/c2 and
mt′ −mb′ = 50 GeV/c2, the CKM ratio was constrained to be in the range 4.6 × 10−4 <
RCKM < 7.8 × 10−4.
Acknowledgements
We are greatly indebted to our technical collaborators, to the members of the CERN-
SL Division for the excellent performance of the LEP collider, and to the funding agencies
for their support in building and operating the DELPHI detector.
We acknowledge in particular the support of
Austrian Federal Ministry of Education, Science and Culture, GZ 616.364/2-III/2a/98,
FNRS–FWO, Flanders Institute to encourage scientific and technological research in the
industry (IWT) and Belgian Federal Office for Scientific, Technical and Cultural affairs
(OSTC), Belgium,
FINEP, CNPq, CAPES, FUJB and FAPERJ, Brazil,
Czech Ministry of Industry and Trade, GA CR 202/99/1362,
Commission of the European Communities (DG XII),
Direction des Sciences de la Matière, CEA, France,
Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie, Germany,
General Secretariat for Research and Technology, Greece,
National Science Foundation (NWO) and Foundation for Research on Matter (FOM),
The Netherlands,
Norwegian Research Council,
State Committee for Scientific Research, Poland, SPUB-M/CERN/PO3/DZ296/2000,
SPUB-M/CERN/PO3/DZ297/2000, 2P03B 104 19 and 2P03B 69 23(2002-2004)
FCT - Fundação para a Ciência e Tecnologia, Portugal,
Vedecka grantova agentura MS SR, Slovakia, Nr. 95/5195/134,
Ministry of Science and Technology of the Republic of Slovenia,
CICYT, Spain, AEN99-0950 and AEN99-0761,
The Swedish Research Council,
Particle Physics and Astronomy Research Council, UK,
Department of Energy, USA, DE-FG02-01ER41155,
EEC RTN contract HPRN-CT-00292-2002.
References
[1] The LEP Collaborations ALEPH, DELPHI, L3, OPAL and the LEP Electroweak
Working Group, A Combination of Preliminary Electroweak Measurements and Con-
straints on the Standard Model (2005) CERN-PH-EP/2005-051, hep-ex/0511027;
ALEPH, DELPHI, L3, OPAL and SLD Coll., LEP Electroweak Working Group,
SLD Heavy Flavour Groups, Phys. Rept. 427 (2006) 257.
[2] V.A. Novikov, L.B. Okun, A.N. Rozanov and M.I. Vysotsky, Phys. Lett. B529 (2002)
[3] P.H. Frampton, P.Q. Hung and M. Sher, Phys. Rep. 330 (2000) 263.
[4] A. Djouadi et al. in Electroweak symmetry breaking and new physics at the TeV scale,
ed. Barklow, Timothy - World Scientific, Singapore (1997).
[5] A. Arhrib and W.S. Hou, Phys. Rev. D64 (2001) 073016;
A. Arhrib and W.S. Hou, JHEP 0607 (2006) 009.
[6] W.S. Hou and R.G. Stuart, Phys. Rev. Lett. 62 (1989) 617;
W.S. Hou and R.G. Stuart, Nucl. Phys. B320 (1989) 277;
W.S. Hou and R.G. Stuart, Nucl. Phys. B349 (1991) 91.
[7] S.M. Oliveira and R. Santos, Phys. Rev. D68 (2003) 093012;
S.M. Oliveira and R. Santos, Acta Phys. Polon. B34 (2003) 5523.
[8] ALEPH Coll., D. Decamp et al., Phys. Lett. B236 (1990) 511;
DELPHI Coll., P. Abreu et al., Nucl. Phys. B367 (1991) 511;
L3 Coll., O. Adriani et al., Phys. Rep. 236 (1993) 1;
OPAL Coll., M.Z. Akrawy et al., Phys. Lett. B246 (1990) 285.
[9] D0 Coll., S. Abachi et al., Phys. Rev. Lett. 78 (1997) 3818.
[10] CDF Coll., T. Affolder et al., Phys. Rev. Lett. 84 (2000) 835.
[11] D0 Coll., S. Abachi et al., Phys. Rev. D52 (1995) 4877.
[12] Particle Data Group, W.-M. Yao et al., J. Phys. G33 (2006) 1.
[13] DELPHI Coll., P. Aarnio et al., Nucl. Instr. Meth. A303 (1991) 233;
DELPHI Coll., P. Abreu et al., Nucl. Instr. Meth. A378 (1996) 57.
[14] T. Sjöstrand, Comp. Phys. Comm. 82 (1994) 74;
T. Sjöstrand, PYTHIA 5.7 and JETSET 7.4, CERN-TH/7112-93;
T. Sjöstrand et al., Comp. Phys. Comm. 135 (2001) 238.
[15] E. Accomando and A. Ballestero, Comp. Phys. Comm. 99 (1997) 270;
E. Accomando, A. Ballestrero and E. Maina, Comp. Phys. Comm. 150 (2003) 166;
A. Ballestrero, R. Chierici, F. Cossutti and E. Migliore, Comp. Phys. Comm. 152
(2003) 175.
[16] S. Jadach, B.F.L. Ward and Z. Was, Comp. Phys. Comm. 130 (2000) 260.
[17] S. Jadach, W. P laczek and B.F.L. Ward, Phys. Lett. B390 (1997) 298.
[18] F. Cossutti et al., REMCLU: a package for the Reconstruction of Elec-
troMagnetic CLUsters at LEP200, DELPHI Note 2000-164 PROG 242,
http://delphiwww.cern.ch/pubxx/delnote/public/2000 164 prog 242.ps.gz.
[19] S. Catani et al., Phys. Lett. B269 (1991) 432.
[20] DELPHI Coll., J. Abdallah et al., Eur. Phys. J. C32 (2004) 185.
[21] P. Abreu et al., Nucl. Instr. Meth. A427 (1999) 487.
[22] G. Fox and S. Wolfram, Phys. Lett. B82 (1979) 134.
[23] A.L. Read, CERN report 2000-005 (2000) 81, “Workshop on Confidence Limits”,
edited by F. James, L. Lyons and Y. Perrin.
[24] S. Jadach et al., LEP2 Monte Carlo Workshop: Report of the Working Groups on
Precision Calculations for LEP2 Physics, CERN report 2000-009 (2000);
G. Altarelli et al., Physics at LEP2, CERN report 96-01 (1996).
[25] N. Castro, Search for a fourth generation b′-quark at LEP-II. MSc. Thesis, Instituto
Superior Técnico da Universidade Técnica de Lisboa (2004), CERN-THESIS-2005-
[26] S. Glashow, J. Iliopoulos and L. Maiani, Phys. Rev. D2 (1970) 1285.
Z / H / g / γ
b / s / d
t′ / t / c / u
a) b)
Figure 1: The Feynman diagrams corresponding to the b′ (a) FCNC and (b) CC decay
modes are shown.
Z/γ Z
l−/ q̄ / q̄
l+/ q / q
ν̄ / ν̄ / q̄
ν / ν / q
Z/γ W
q̄ / q̄
q / q
ν / q̄
l+/ q
a) b)
Figure 2: The final states associated to the b′ (a) FCNC and (b) CC decay modes are
shown. Only those states analysed here are indicated.
DELPHI
0 50 100 150
αl1,tr (˚)
a)       (e sample)
0 20 40 60 80 100
pmis (GeV/c)
b)      (µ sample)
0 20 40 60 80
pjet 1 (GeV/c)
c)      (no-id sample)
SM expectation
signal (mb’=100 GeV/c
Figure 3: Data and SM expectation after the preselection level for the bb̄l+l−νν̄ final state
and centre-of-mass energies above 200 GeV. (a) The angle between the most energetic
lepton and the closest charged-particle track (e sample), (b) the missing momentum (µ
sample) and (c) the momentum of the most energetic jet (no-id sample) are shown. The
signal distributions for mb′ = 100 GeV/c
2 and
s = 205 GeV are also shown with
arbitrary normalisation. The background composition is 11% of qq̄, 69% of WW, 15% of
ZZ and 5% of γγ for the e sample, 6% of qq̄, 90% of WW and 4% of ZZ for the µ sample
and 45% of qq̄, 48% of WW, 5% of ZZ and 2% of γγ for the no-id sample.
DELPHI
0 20 40 60 80
pjet 1 (GeV/c)
a)       (e sample)
0 50 100 150
αll (˚)
b)      (µ sample)
0 0.25 0.5 0.75 1
pmis / Emis
c)      (no-id sample)
SM expectation
signal (mb’=100 GeV/c
Figure 4: Data and SM expectation after the first selection level for the bb̄l+l−νν̄ final
state and for centre-of-mass energies above 200 GeV. (a) The momentum of the most
energetic jet (e sample), (b) the angle between the two leptons (µ sample) and (c) the
ratio between the missing momentum and missing energy (no-id sample) are shown. The
signal distributions for mb′ = 100 GeV/c
2 and
s = 205 GeV are also shown with
arbitrary normalisation. The arrows represent the cuts applied in the second selection
level.
DELPHI
0 5 10 15 20 25 30 35 40
0 5 10 15 20 25 30 35 40
0 5 10 15 20 25 30 35 40
χ2/n.d.f.
SM expectation
signal (mb’=100 GeV/c
Figure 5: Comparison of data and SM expectation distributions of the χ2/n.d.f. of the fit
imposing energy-momentum conservation and no missing energy for the bb̄qq̄νν̄ final state
at centre-of-mass energies above 200 GeV. The arrow shows the applied cut. The signal
for mb′ = 100 GeV/c
2 and
s = 205 GeV is also shown with arbitrary normalisation.
DELPHI
0 50 100 150 200
missing mass (GeV/c
0 10 20 30 40
scaled acoplanarity (˚)
0 50 100 150
acolj1j2 (˚)
0 0.5 1 1.5
h1+h3
0 50 100 150
θmis (˚)
SM expectation
signal (mb’=100 GeV/c
Figure 6: Variables used in the discriminant analysis (bb̄qq̄νν̄ final state). The data and
SM expectation distributions for centre-of-mass energies above 200 GeV are shown for (a)
the missing mass, (b) Aj1j2cop ×min(sin θj1 , sin θj2), where Aj1j2cop is the acoplanarity and θj1,j2
are the polar angles of the jets when forcing the events into two jets, (c) the acollinearity
between the two most energetic jets (with the event particles clustered into four jets),
(d) the sum of the first and third Fox-Wolfram moments and (e) the polar angle of the
missing momentum. The signal distributions for mb′ = 100 GeV/c
2 and
s = 205 GeV
are also shown with arbitrary normalisation.
DELPHI
1 1.5 2 2.5 3
-log10(y4→3)
1 2 3 4
-log10(y5→4)
0 50 100 150
acolj1j2 (4 jets) (˚)
0 0.2 0.4 0.6 0.8 1
h1+h3
0 20 40 60 80 100
pj1 (GeV/c)
/c e)
0 50 100 150
αj1j2 (˚)
data SM expectation signal (mb’=100 GeV/c
Figure 7: Variables used in the discriminant analysis (bb̄qq̄qq̄ final state). The data
and SM expectation for centre-of-mass energies above 200 GeV are shown for (a)
− log10(y4→3), (b) − log10(y5→4), (c) the acollinearity between the two most energetic
jets, with the events clustered into four jets (see text for explanation), (d) the h1 + h3
Fox-Wolfram moments sum, (e) the momentum of the most energetic jet and (f) the angle
between the two most energetic jets. The signal distributions for mb′ = 100 GeV/c
2 and√
s = 205 GeV are also shown with arbitrary normalisation.
DELPHI
0 0.25 0.5 0.75 1
h1+h3
a)       (e sample)
0 50 100 150
mj1j2 (2 jets) (GeV/c
b)       (e sample)
1 2 3 4
-log10(y4→3)
c)      (µ sample)
0 0.1 0.2 0.3 0.4
Σptracks lepton hem. / √s
d)      (µ sample)
0 50 100 150
acolj1j2 (˚)
e)      (no-id sample)
0 50 100 150
αlν (˚)
f)       (no-id sample)
data SM expectation signal (mb’=100 GeV/c
Figure 8: Variables used in the discriminant analysis (cc̄qq̄l+ν final state). The data
events and background expectation for centre-of-mass energies above 200 GeV are shown
for (a) the h1 + h3 Fox-Wolfram moments sum (e sample), (b) the invariant mass of
the two jets with the events clustered into two jets (e sample), (c) − log10(y4→3) (µ
sample), (d)
|~pi|/
s, where ~pi are the momenta of the charged particles (excluding
the lepton) in the same hemisphere as the lepton (µ sample), (e) the acollinearity between
the two most energetic jets (no-id sample) and (f) the angle between the lepton and the
missing momentum (no-id sample). The signal distributions for mb′ = 100 GeV/c
2 and√
s = 205 GeV are also shown with arbitrary normalisation.
DELPHI
-10 0 10
ln(LS/LB)
-10 -5 0 5
ln(LS/LB)
-20 -10 0 10
ln(LS/LB)
-20 -10 0 10
ln(LS/LB)
-20 -10 0 10
ln(LS/LB)
-10 0 10
ln(LS/LB)
-20 -10 0 10
ln(LS/LB)
SM expectation
Signal (mb’=100 GeV/c
Figure 9: Discriminant variables ln(LS/LB) for data and SM simulation (centre-of–
mass energies above 200 GeV). FCNC b′ decay mode: (a) bb̄qq̄νν̄ and (b) bb̄qq̄qq̄.
CC b′ decay mode: (c) cc̄qq̄l+ν (e sample), (d) cc̄qq̄l+ν (µ sample), (e) cc̄qq̄l+ν (no-id
sample) (f) cc̄qq̄l+ν (no lepton sample) and (g) cc̄qq̄qq̄. The signal distributions for
mb′ = 100 GeV/c
2 and
s = 205 GeV are also shown with arbitrary normalisation.
DELPHI
96 97 98 99 100 101 102 103
mb’ (GeV/c
a) b’→ bZ decay
observed limit
expected limit
expected ± 1σ
expected ± 2σ
96 97 98 99 100 101 102 103
96 97 98 99 100 101 102 103
mb’ (GeV/c
96 97 98 99 100 101 102 103
b) b’→ cW decay
observed limit
expected limit
expected ± 1σ
expected ± 2σ
Figure 10: The observed and expected upper limits at 95% CL on (a) BR(b′ → bZ) and
(b) BR(b′ → cW) are shown. The 1σ and 2σ bands around the expected limit are also
presented. Systematic errors were taken into account in the limit evaluation.
DELPHI
Figure 11: The excluded region in the plane (RCKM , mb′) with mt′ −mb′ = 1 GeV/c2,
obtained from the 95% CL upper limits on BR(b′ → bZ) (bottom) and BR(b′ → cW)
(top) is shown. The light and dark shadings correspond to the observed and expected
limits, respectively. The expected limits on BR(b′ → bZ) did not allow exclusions to be
set for low values of RCKM .
DELPHI
Figure 12: The excluded region in the plane (RCKM , mb′) with mt′ −mb′ = 50 GeV/c2,
obtained from the 95% CL upper limits on BR(b′ → bZ) (bottom) and BR(b′ → cW)
(top) is shown. The light and dark shadings correspond to the observed and expected
limits, respectively.
ABSTRACT
  A search for the pair production of fourth generation b'-quarks was performed
using data taken by the DELPHI detector at LEP-II. The analysed data were
collected at centre-of-mass energies ranging from 196 to 209 GeV, corresponding
to an integrated luminosity of 420 pb^{-1}. No evidence for a signal was found.
Upper limits on BR(b' -> bZ) and BR(b' -> cW) were obtained for b' masses
ranging from 96 to 103 GeV/c^2. These limits, together with the theoretical
branching ratios predicted by a sequential four generations model, were used to
constrain the value of R_{CKM}=|V_{cb'}/V_{tb'}V_{tb}|, where V_{cb'}, V_{tb'}
and V_{tb} are elements of the extended CKM matrix.

<|endoftext|><|startoftext|>
Introduction
The main concern of this paper is the curvature of a special family of
warped pseudo-metrics on product manifolds. We introduce a suitable form
for the relations among the involved curvatures in such metrics and apply
them to the existence and/or construction of Einstein and constant scalar
curvature metrics in this family.
Let B = (Bm, gB) and F = (Fk, gF ) be two pseudo-Riemannian manifolds
of dimensions m ≥ 1 and k ≥ 0, respectively and also let B × F be the
usual product manifold of B and F . For a given smooth function w ∈
C∞>0(B) = {v ∈ C∞(B) : v(x) > 0, ∀x ∈ B}, the warped product B ×w F =
((B ×w F )m+k, g = gB +w2gF ) was defined by Bishop and O’Neill in [19] in
order to study manifolds of negative curvature.
Date: November 4, 2018.
1991 Mathematics Subject Classification. Primary: 53C21, 53C25, 53C50
Secondary: 35Q75, 53C80, 83E15, 83E30.
Key words and phrases. Warped products, conformal metrics, Ricci curvature, scalar
curvature, semilinear equations, positive solutions, Lichnerowicz-York equation, concave-
convex nonlinearities, Kaluza-Klein theory, string theory.
http://arxiv.org/abs/0704.0595v1
2 FERNANDO DOBARRO & BÜLENT ÜNAL
In this article, we deal with a particular class of warped products, i.e.
when the pseudo-metric in the base is affected by a conformal change. Pre-
cisely, for given smooth functions c, w ∈ C∞>0(B) we will call ((B × F )m+k, g =
c2gB+w
2gF ) as a [c, w]-base conformal warped product (briefly [c, w]-bcwp),
denoted by B ×[c,w] F . We will concentrate our attention on a special sub-
class of this structure, namely when there is a relation between the conformal
factor c and the warping function w of the form c = wµ, where µ is a real
parameter and we will call the [ψµ, ψ]-bcwp as a (ψ, µ)-bcwp. Note that we
generically called the latter case as special base conformal warped products,
briefly sbcwp in [29].
As we will explain in §2, metrics of this type play a relevant role in several
topics of differential geometry and theoretical physics (see also [29]). This
article concerns curvature related questions of these metrics which are of
interest not only in the applications, but also from the points of view of dif-
ferential geometry and the type of the involved nonlinear partial differential
equations (PDE), such as those with concave-convex nonlinearities and the
Lichnerowicz-York equations.
The article is organized in the following way: in §2 after a brief description
of several fields where pseudo-metrics described as above are applied, we
formulate the curvature problems that we deal within the next sections and
give the statements of the main results. In §3, we state Theorems 2.2 and
2.3 in order to express the Ricci tensor and scalar curvature of a (ψ, µ)-bcwp
and sketch their proofs (see [29, Section 3] for detailed computations). In §4
and 5, we establish our main results about the existence of (ψ, µ)-bcwp’s of
constant scalar curvature with compact Riemannian base.
2. Motivations and Main results
As we announced in the introduction, we firstly want to mention some of
the major fields of differential geometry and theoretical physics where base
conformal warped products are applied.
i: In the construction of a large class of non trivial static anti de Sitter
vacuum space-times
• In the Schwarzschild solutions of the Einstein equations (see
[10, 18, 41, 59, 69, 74]).
• In the Riemannian Schwarzschild metric, namely (see [10]).
• In the “generalized Riemannian anti de Sitter T2 black hole
metrics” (see §3.2 of [10] for details).
• In the Bañados-Teitelboim-Zanelli (BTZ) and de Sitter (dS)
black holes (see [1, 15, 16, 28, 45] for details).
Indeed, all of them can be generated by an approach of the fol-
lowing type: let (F2, gF ) be a pseudo-Riemannian manifold and g be
ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 3
a pseudo-metric on R+ × R× F2 defined by
(2.1) g =
u2(r)
dr2 ± u2(r)dt2 + r2gF .
After the change of variables s = r2, y =
t, there results ds2 =
4r2dr2 and dy2 =
dt2. Then (2.1) is equivalent to
ds2 ± 4
s)dy2
+ sgF
2 )2(−
2 ))2(−1)ds2 ± (2s
2 ))2dy2
2 )2gF .
(2.2)
Note that roughly speaking, g is a nested application of two (ψ, µ)-
bcwp’s. That is, on R+ × R and taking
(2.3) ψ1(s) = 2s
2 ) and µ1 = −1,
the metric inside the brackets in the last member of (2.2) is a (ψ1, µ1)-
bcwp, while the metric g on (R+ × R)× F2 is a (ψ2, µ2)-bcwp with
(2.4) ψ2(s, y) = s
2 and µ2 = −
In the last section of [29], through the application of Theorems 2.2
and 2.3 below and several standard computations, we generalized
the latter approach to the case of an Einstein fiber (Fk, gF ) with
dimension k ≥ 2.
ii: In the study of the equivariant isometric embeddings of space-time
slices in Minkowski spaces (see [39, 38]).
iii: In the Kaluza-Klein theory (see [76, §7.6, Particle Physics and Ge-
ometry], [60] and [77]) and in the Randall-Sundrum theory [30, 40,
63, 64, 65, 71] with µ as a free parameter. For example, in [46] the
following metric is considered
(2.5) e2A(y)gijdx
idxj + e2B(y)dy2,
with the notation {xi}, i = 0, 1, 2, 3 for the coordinates in the 4-
dimensional space-time and x5 = y for the fifth coordinate on an
extra dimension. In particular, Ito takes the ansatz
(2.6) B = αA,
which corresponds exactly to our sbcwp metrics, considering gB =
dy2, gF = gijdx
idxj , ψ(y) = e
α = eA(y) and µ = α.
iii: In String and Supergravity theories, for instance, in the Maldacena
conjecture about the duality between compactifications of M/string
4 FERNANDO DOBARRO & BÜLENT ÜNAL
theory on various Anti-de Sitter space-times and various confor-
mal field theories (see [55, 62]) and in warped compactifications
(see [40, 72] and references therein). Besides all of these, there are
also frequent occurrences of this type of metrics in string topics (see
[33, 34, 35, 36, 37, 53, 61, 71] and also [1, 12, 67] for some reviews
about these topics).
iv: In the derivation of effective theories for warped compactification
of supergravity and the Hor̆ava-Witten model (see [50, 51]). For in-
stance, in [51] the ansatz ds2 = hαds2(X4) + h
βds2(Y ) is considered
where X4 is a four-dimensional space-time with coordinates x
is a Calabi-Yau manifold (the so called internal space) and h de-
pends on the four-dimensional coordinates xµ, in order to study the
dynamics of the four-dimensional effective theory. We note that in
those articles, the structure of the expressions of the Ricci tensor
and scalar curvature of the involved metrics result particularly use-
ful. We observe that they correspond to very particular cases of the
expressions obtained by us in [29], see also Theorems 2.2 and 2.3 and
Proposition 2.4 stated below.
v: In the discussion of Birkhoff-type theorems (generally speaking these
are the theorems in which the gravitational vacuum solutions admit
more symmetry than the inserted metric ansatz, (see [41, page 372]
and [17, Chapter 3]) for rigorous statements), especially in Equation
6.1 of [66] where, H-J. Schmidt considers a special form of a bcwp and
basically shows that if a bcwp of this form is Einstein, then it admits
one Killing vector more than the fiber. In order to achieve that, the
author considers for a specific value of µ, namely µ = (1 − k)/2, in
the following problem:
Does there exist a smooth function ψ ∈ C∞>0(B) such that
the corresponding (ψ, µ)-bcwp (B2 × Fk, ψ2µgB + ψ2gF ) is
an Einstein manifold? (see also (Pb-Eins.) below.)
vi: In the study of bi-conformal transformations, bi-conformal vector
fields and their applications (see [32, Remark in Section 7] and [31,
Sections 7 and 8]).
vii: In the study of the spectrum of the Laplace-Beltrami operator for
p−forms. For instance in Equation (1.1) of [11], the author considers
the structure that follows: let M be an n-dimensional compact, Rie-
mannian manifold with boundary, and let y be a boundary-defining
function; she endows the interior M of M with a Riemannian metric
ds2 such that in a small tubular neighborhood of ∂M inM , ds2 takes
the form
(2.7) ds2 = e−2(a+1)tdt2 + e−2btdθ2∂M ,
ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 5
where t := − log y ∈ (c,+∞) and dθ2
is the Riemannian metric on
∂M (see [11, 56] and references therein for details).
Notation 2.1. From now on, we will use the Einstein summation convention
over repeated indices and consider only connected manifolds. Furthermore,
we will denote the Laplace-Beltrami operator on a pseudo-Riemannian man-
ifold (N,h) by ∆N (·), i.e., ∆N (·) = ∇N
i∇Ni(·). Note that ∆N is elliptic if
(N,h) is Riemannian and it is hyperbolic when (N,h) is Lorentzian. If (N,h)
is neither Riemannian nor Lorentzian, then the operator is ultra-hyperbolic.
Furthermore, we will consider the Hessian of a function v ∈ C∞(N), denoted
by Hvh or H
N , so that the second covariant differential of v is given by
Hvh = ∇(∇v). Recall that the Hessian is a symmetric (0, 2) tensor field
satisfying
(2.8) Hvh(X,Y ) = XY v − (∇XY )v = h(∇X(grad v), Y ),
for any smooth vector fields X,Y on N.
For a given pseudo-Riemannian manifold N = (N,h) we will denote its
Riemann curvature tensor, Ricci tensor and scalar curvature by RN , RicN
and SN , respectively.
We will denote the set of all lifts of all vector fields of B by L(B). Note that
the lift of a vector field X on B denoted by X̃ is the vector field on B × F
given by dπ(X̃) = X where π : B × F → B is the usual projection map.
In Section 3, we will sketch the proofs of the following two theorems related
to the Ricci tensor and the scalar curvature of a generic (ψ, µ)-bcwp.
Theorem 2.2. Let B = (Bm, gB) and F = (Fk, gF ) be two pseudo-Rieman-
nian manifolds with m ≥ 3 and k ≥ 1, respectively and also let µ ∈ R \
{0, 1, µ, µ±} be a real number with
µ := − k
m− 2 and µ± := µ±
µ2 − µ.
Suppose ψ ∈ C∞>0(B). Then the Ricci curvature tensor of the corresponding
(ψ, µ)-bcwp, denoted by Ric verifies the relation
Ric = RicB + β
− β∆ 1
α∆ gB on L(B)× L(B),
Ric = 0 on L(B)× L(F ),
Ric = RicF −
ψ2(µ−1)
α∆ gF on L(F )× L(F ),
(2.9)
6 FERNANDO DOBARRO & BÜLENT ÜNAL
where
(2.10)
(m− 2)µ+ k ,
(m− 2)µ+ k ,
−[(m− 2)µ + k]
µ[(m− 2)µ+ k] + k(µ− 1) ,
[(m− 2)µ + k]2
µ[(m− 2)µ+ k] + k(µ− 1) .
Theorem 2.3. Let B = (Bm, gB) and F = (Fk, gF ) be two pseudo-Rieman-
nian manifolds of dimensions m ≥ 2 and k ≥ 0, respectively. Suppose that
SB and SF denote the scalar curvatures of B = (Bm, gB) and F = (Fk, gF ),
respectively. If µ ∈ R and ψ ∈ C∞>0(B), then the scalar curvature S of the
corresponding (ψ, µ)-bcwp verifies,
(i) If µ 6= − k
m− 1 , then
(2.11) − β∆Bu+ SBu = Su2µα+1 − SFu2(µ−1)α+1
where
(2.12) α =
2[k + (m− 1)µ]
{[k + (m− 1)µ] + (1− µ)}k + (m− 2)µ[k + (m− 1)µ] ,
(2.13) β = α2[k + (m− 1)µ] > 0
and ψ = uα > 0.
(ii) If µ = − k
m− 1 , then
(2.14) − k
|∇Bψ|2B
m−1 [S − SFψ−2]− SB.
From the mathematical and physical points of view, there are several
interesting questions about (ψ, µ)-bcwp’s. In [29] we began the study of
existence and/or construction of Einstein (ψ, µ)-bcwp’s and those of constant
scalar curvature. These questions are closely connected to Theorems 2.2 and
In [29], by applying Theorem 2.2, we give suitable conditions that allow
us to study some particular cases of the problem:
(Pb-Eins.) Given µ ∈ R, does there exist a smooth function
ψ ∈ C∞>0(B) such that the corresponding (ψ, µ)-bcwp is an
Einstein manifold?
ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 7
In particular, we obtain the following result as an immediate corollary of
Theorem 2.2.
Proposition 2.4. Let us assume the hypothesis of Theorem 2.2. Then
the corresponding (ψ, µ)-bcwp is an Einstein manifold with λ if and only
if (F, gF ) is Einstein with ν constant and the system that follows is verified
λψ2µgB = RicB + β
− β∆ 1
α∆ gB on L(B)× L(B)
λψ2 = ν − 1
ψ2(µ−1)
(2.15)
where the coefficients are given by (2.10).
Compare the system (2.15) with the well known one for a classical warped
product in [18, 49, 59]. By studying (2.15), we have obtained the generaliza-
tion of the construction exposed in the above motivational examples in i and
v, among other related results. We suggest the interested reader consider
the results about the problem (Pb-Eins.) stated in [29].
Now, we focus on the problems which we will deal in §4. Let B = (Bm, gB)
and F = (Fk, gF ) be pseudo-Riemannian manifolds.
There is an extensive number of publications about the well known Yamabe
problem namely:
(Ya) [79, 75, 68, 13] Does there exist a function ϕ ∈ C∞>0(B)
such that (Bm, ϕ
m−2 gB) has constant scalar curvature?
Analogously, in several articles the following problem has been studied:
(cscwp) [27] Is there a function w ∈ C∞>0(B) such that the
warped product B ×w F has constant scalar curvature?
In the sequel we will suppose that B = (Bm, gB) is a Riemannian manifold.
Thus, both problems bring to the study of the existence of positive solutions
for nonlinear elliptic equations on Riemannian manifolds. The involved non-
linearities are powers with Sobolev critical exponent for the Yamabe problem
and sub-linear (linear if the dimension k of the fiber is 3) for the problem of
constant scalar curvature of a warped product.
In Section 4, we deal with a mixed problem between (Ya) and (cscwp)
which is already proposed in [29], namely:
8 FERNANDO DOBARRO & BÜLENT ÜNAL
(Pb-sc) Given µ ∈ R, does there exist a function ψ ∈
C∞>0(B) such that the corresponding (ψ, µ)-bcwp has constant
scalar curvature?
Note that when µ = 0, (Pb-sc) corresponds to the problem (cscwp),
whereas when the dimension of the fiber k = 0 and µ = 1, then (Pb-sc)
corresponds to (Ya) for the base manifold. Finally (Pb-sc) corresponds to
(Ya) for the usual product metric with a conformal factor in C∞>0(B) when
µ = 1.
Under the hypothesis of Theorem 2.3 i, the analysis of the problem (Pb-
sc) brings to the study of the existence and multiplicity of solutions u ∈
C∞>0(B) of
(2.16) − β∆Bu+ SBu = λu2µα+1 − SFu2(µ−1)α+1,
where all the components of the equation are like in Theorem 2.3 i and λ
(the conjectured constant scalar curvature of the corresponding (ψ, µ)-bcwp)
is a real parameter. We observe that an easy argument of separation of
variables, like in [24, §2] and [27], shows that there exists a positive solution
of (2.16) only if the scalar curvature of the fiber SH is constant. Thus this
will be a natural assumption in the study of (Pb-sc).
Furthermore, note that the involved nonlinearities in the right hand side of
(2.16) dramatically change with the choice of the parameters, an exhaustive
analysis of these changes is the subject matter of [29, §6].
There are several partial results about semi-linear elliptic equations like
(2.16) with different boundary conditions, see for instance [2, 5, 6, 9, 21,
23, 26, 73, 78] and references in [29].
In this article we will state our first results about the problem (Pb-sc) when
the base B is a compact Riemannian manifold of dimension m ≥ 3 and the
fiber F has non-positive constant scalar curvature SF .
For brevity of our study, it will be useful to introduce the following notation:
µsc := µsc(m,k) = −
m− 1 and µpY = µpY (m,k) := −
k + 1
m− 2 (sc as scalar
curvature and Y as Yamabe). Notice that µpY < µsc < 0.
We plan to study the case of µ = µsc in a preceding project, therefore the
related results are not going to be presented here.
We can synthesize our results about (Pb-sc) in the case of non-positive SF
as follow.
• The case of scalar flat fiber, i.e. SF = 0.
Theorem 2.5. If µ ∈ (µpY , µsc) ∪ (µsc,+∞) the answer to (Pb-sc)
is affirmative.
ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 9
By assuming some additional restrictions on the scalar curvature of
the base SB , we obtain existence results for the range µ ≤ µpY .
• The case of fiber with negative constant scalar curvature, i.e. SF < 0.
In order to describe the µ−ranges of validity of the results, we will
apply the notations introduced in [29, §5] (see Appendix A for a brief
introduction of these notations).
Theorem 2.6. If “(m,k) ∈ D and µ ∈ (0, 1)” or “(m,k) ∈ CD and
µ ∈ (0, 1) ∩ (µ−, µ+)” or “(m,k) ∈ CD and µ ∈ (0, 1) ∩ C[µ−, µ+]”,
then the answer to (Pb-sc) is affirmative.
Remark 2.7. The first two cases in Theorem 2.6 will be studied by
adapting the ideas in [5] and the last case by applying the results in
[73, p. 99]. In the former - Theorem 4.15, the involved nonlinearities
correspond to the so called concave-convex whereas in the latter -
Theorem 4.16, they are singular as in the Lichnerowicz-York equation
about the constraints for the Einstein equations (see [22], [43], [58],
[57, p. 542-543] and [73, Chp.18]).
Similarly to the case of SF = 0, we obtain existence results for some
remaining µ−ranges by assuming some additional restrictions for the
scalar curvature of the base SB .
Naturally the study of (Pb-sc) allows us to obtain partial results of the
related question:
Given µ ∈ R and λ ∈ R does there exist a function ψ ∈
C∞>0(B) such that the corresponding (ψ, µ)-bcwp has constant
scalar curvature λ?
These are stated in the several theorems and propositions in §4.
3. The curvature relations - Sketch of the proofs
The proofs of Theorems 2.2 and 2.3 require long and yet standard com-
putations of the Riemann and Ricci tensors and the scalar curvature of a
general base conformal warped product. Here, we reproduce the results for
the Ricci tensor and the scalar curvature, and we also suggest the reader see
[29, §3] for the complete computations.
Theorem 3.1. The Ricci tensor of [c, w]-bcwp, denoted by Ric satisfies
(1) Ric = RicB −
(m− 2)1
HcB + k
+2(m− 2) 1
dc⊗ dc+ k 1
[dc⊗ dw + dw ⊗ dc]
(m− 3)gB(∇
Bc,∇Bc)
gB(∇Bw,∇Bc)
on L(B)× L(B),
10 FERNANDO DOBARRO & BÜLENT ÜNAL
(2) Ric = 0 on L(B)× L(F ),
(3) Ric = RicF −
(m− 2)gB(∇
Bw,∇Bc)
+(k − 1)gB(∇
Bw,∇Bw)
gF on L(F )× L(F ).
Theorem 3.2. The scalar curvature S of a [c, w]-bcwp is given by
c2S = SB + SF
− 2(m− 1)∆Bc
− 2k∆Bw
− (m− 4)(m− 1)gB(∇
Bc,∇Bc)
− 2k(m− 2)gB(∇
Bw,∇Bc)
− k(k − 1)gB(∇
Bw,∇Bw)
The following two lemmas (3.3 and 3.7) play a central role in the proof
of Theorems 2.2 and 2.3. Indeed, it is sufficient to apply them in a suitable
mode and make use of Theorems 3.1 and 3.2 several times, the reader can
find all the details in [29, §2 and 4].
LetN = (Nn, h) be a pseudo-Riemannian manifold of dimension n, |∇(·)|2 =
|∇N (·)|2N = h(∇N (·),∇N (·)) and ∆h = ∆N .
Lemma 3.3. Let Lh be a differential operator on C
>0(N) defined by
(3.1) Lhv =
where ri, ai ∈ R and ζ :=
riai, η :=
i . Then,
(3.2) Lhv = (η − ζ)
‖grad hv‖2h
(ii) If ζ 6= 0 and η 6= 0, for α = ζ
and β =
, then we have
(3.3) Lhv = β
Remark 3.4. We also applied the latter lemma in the study of curvature of
multiply warped products (see [28]).
ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 11
Corollary 3.5. Let Lh be a differential operator defined by
(3.4) Lhv = r1
for v ∈ C∞>0(N),
where r1a1 + r2a2 6= 0 and r1a21 + r2a22 6= 0. Then, by changing the variables
v = uα with 0 < u ∈ C∞(N), α = r1a1 + r2a2
1 + r2a
and β =
(r1a1 + r2a2)
1 + r2a
α(r1a1 + r2a2) there results
(3.5) Lhv = β
Remark 3.6. By the change of variables as in Corollary 3.5 equations of the
(3.6) Lhv = r1
= H(v, x, s),
transform into
(3.7) β∆hu = uH(u
α, x, s).
Lemma 3.7. Let Hh be a differential operator on C∞>0(N) defined by
(3.8) Hhv =
riai and η :=
i , where the indices extend from 1 to l ∈ N and
any ri, ai ∈ R. Hence,
(3.9) Hhv = (η − ζ)
dv ⊗ dv + ζ 1
where ⊗ is the usual tensorial product. If furthermore, ζ 6= 0 and η 6= 0,
(3.10) Hhv = β
where α =
and β =
4. The problem (Pb-sc) - Existence of solutions
Throughout this section, we will assume that B is not only a Riemannian
manifold of dimension m ≥ 3, but also “compact” and connected. We further
assume that F is a pseudo-Riemannian manifold of dimension k ≥ 0 with
constant scalar curvature SF ≤ 0. Moreover, we will assume that µ 6= µsc.
Hence, we will concentrate our attention on the relations (2.11), (2.12) and
(2.13) by applying Theorem 2.3 (i).
12 FERNANDO DOBARRO & BÜLENT ÜNAL
Let λ1 denote the principal eigenvalue of the operator
(4.1) L(·) = −β∆B(·) + SB(·),
and u1 ∈ C∞>0(B) be the corresponding positive eigenfunction with ‖u1‖∞ =
1, where β is as in Theorem 2.3.
First of all, we will state some results about uniqueness and non-existence
of positive solutions for Equation (2.16) under the latter hypothesis.
About the former, we adapt Lemma 3.3 in [5, p. 525] to our situation (for
a detailed proof see [5], [20, Method II, p. 103] and also [70]).
Lemma 4.1. Let f ∈ C0(R>0) such that t−1f(t) is decreasing. If v and w
satisfy
(4.2)
−β∆Bv + SBv ≤ f(v),
v ∈ C∞>0(B),
(4.3)
−β∆Bw + SBw ≥ f(w),
w ∈ C∞>0(B),
then w ≥ v on B.
Proof. Let θ(t) be a smooth nondecreasing function such that θ(t) ≡ 0 for
t ≤ 0 and θ(t) ≡ 1 for t ≥ 1. Thus for all ǫ > 0,
θǫ(t) := θ
is smooth, nondecreasing, nonnegative and θ(t) ≡ 0 for t ≤ 0 and θ(t) ≡ 1
for t ≥ ǫ. Furthermore γǫ(t) :=
sθ′ǫ(s)ds satisfies 0 ≤ γǫ(t) ≤ ǫ, for any
t ∈ R.
On the other hand, since (B, gB) is a compact Riemannian manifold without
boundary and β > 0, like in [5, Lemma 3.3, p. 526] there results
(4.4)
[−vβ∆Bw + wβ∆Bv]θǫ(v − w)dvgB ≤
[−β∆Bv]γǫ(v − w)dvgB .
Hence, by the above considerations about θǫ and γǫ, (4.4) implies that
(4.5)
[−vβ∆Bw + wβ∆Bv]θǫ(v − w)dvgB ≤ ǫ
[−β∆Bv≥0]
[−β∆Bv]dvgB .
Now, by applying (4.2) and (4.3) there results
(4.6)
− vβ∆Bw+wβ∆Bv = vLw−wLv ≥ vf(w)−wf(v) = vw
− f(v)
ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 13
Thus by combining (4.6) and (4.5), as ǫ→ 0+ we led to
(4.7)
[v>w]
− f(v)
dvgB ≤ 0
and conclude the proof like in [5, Lemma 3.3, p. 526-527]. But
on [v > w] and hence meas[v > w] = 0; thus v ≤ w. 1 �
Corollary 4.2. Let f ∈ C0(R>0) such that t−1f(t) is decreasing. Then
(4.8)
−β∆Bv + SBv = f(v),
v ∈ C∞>0(B)
has at most one solution.
Proof. Assume that v and w are two solutions of (4.8). Then by applying
Lemma 4.1 firstly with v and w, and conversely with w and v, the conclusion
is proved. �
Remark 4.3. Notice that Lemma 4.1 and Corollary 4.2 allow the function
f ∈ C0(R>0) to be singular at 0.
Related to the non-existence of smooth positive solutions for Equation
(2.16), we will state an easy result under the general hypothesis of this
section.
Proposition 4.4. If either maxB SB ≤ infu∈R>0 u2µα(λ− SFu−2α) or
minB SB ≥ supu∈R>0 u
2µα(λ − SFu−2α), then (2.16) has no solution in
C∞>0(B).
Proof. It is sufficient to apply the maximum principle with some easy ad-
justments to the particular involved coefficients. �
• The case of scalar flat fiber, i.e. SF = 0.
In this case, the term containing the nonlinearity u2(µ−1)α+1 becomes non-
influent in (2.16), thus (Pb-sc) equivalently results to the study of existence
of solutions for the problem:
(4.9)
−β∆Bu+ SBu = λu2µα+1,
u ∈ C∞>0(B),
where λ is a real parameter (i.e., it is the searched constant scalar curvature)
and ψ = uα.
1meas denotes the usual gB−measure on the compact Riemannian manifold (Bm, gB)
14 FERNANDO DOBARRO & BÜLENT ÜNAL
Remark 4.5. 2 Let p ∈ R\{1} and (λ0, u0) ∈ (R\{0})×C∞>0(B) be a solution
(4.10)
−β∆Bu+ SBu = λup,
u ∈ C∞>0(B).
Hence, by the difference of homogeneity between both members of (4.9), it
is easy to show that if λ ∈ R satisfies sign(λ) = sign(λ0), then (λ, uλ) is a
solution of (4.10), where uλ = tλu0 and tλ =
Thus by (4.9), we obtain geometrically: if the parameter µ is given in a
way that p := 2µα + 1 6= 1 and B ×[ψµ0 ,ψ0] F has constant scalar curvature
λ0 6= 0, then for any λ ∈ R verifying sign(λ) = sign(λ0), there results that
B×[ψµ
F is of scalar curvature λ, where ψλ = t
ψ0 and tλ given as above.
Theorem 4.6. (Case : µ = 0) The scalar curvature of a (ψ, 0)-bcwp of base
B and fiber F (i.e., a singly warped product B ×ψ F ) is a constant λ if and
only if λ = λ1 and ψ is a positive multiple of u
1 (i.e., ψ = tu
1 for some
t ∈ R>0).
Proof. First of all note that µ = 0 implies α =
k + 1
. On the other hand, in
this case, the problem (4.9) is linear, so it is sufficient to apply the well known
results about the principal eigenvalue and its associated eigenfunctions of
operators like (4.1) in a suitable setting. �
Theorem 4.7. (Case : µsc < µ < 0) The scalar curvature of a (ψ, µ)-bcwp
of base B and fiber F is a constant λ, only if sign(λ) = sign(λ1). Further-
more,
(1) if λ = 0 then there exists ψ ∈ C∞>0(B) such that B ×[ψµ,ψ] F has
constant scalar curvature 0 if and only if λ1 = 0. Moreover, such
ψ’s are the positive multiples of uα1 , i.e. tu
1 , t ∈ R>0.
(2) if λ > 0 then there exists ψ ∈ C∞>0(B) such that B ×[ψµ,ψ] F has
constant scalar curvature λ if and only if λ1 > 0. In this case, the
solution ψ is unique.
(3) if λ < 0 then there exists ψ ∈ C∞>0(B) such that B ×[ψµ,ψ] F has
constant scalar curvature λ when λ1 < 0 and is close enough to 0.
Proof. The condition µsc < µ < 0 implies that 0 < p := 2µα + 1 < 1,
i.e., the problem (4.9) is sublinear. Thus, to prove the theorem one can use
variational arguments as in [24] (alternatively, degree theoretic arguments
as in [7] or bifurcation theory as in [27]).
2Along this article we consider the sign function defined by sign = χ(0,+∞) − χ(−∞,0),
where χA is the characteristic function of the set A.
ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 15
We observe that in order to obtain the positivity of the solutions required in
(4.9), one may apply the maximum principle for the case of λ > 0 and the
antimaximum principle for the case of λ < 0.
The uniqueness for λ > 0 is a consequence of Corollary 4.2. �
Remark 4.8. In order to consider the next case we introduce the following
notation. For a given p such that 1 < p ≤ pY , let
(4.11) κp := inf
|∇Bv|2 + SB
dvgB ,
where
Hp :=
v ∈ H1(B) :
|v|p+1dvgB = 1
Now, we consider the following two cases.
(1 < p < pY ): In this case by adapting [42, Theorem 1.3], there ex-
ists up ∈ C∞>0(B) such that (βκp, up) is a solution of (4.10) and∫
up+1p dvgB = 1.
(p = pY ): For this specific and important value, analogously to [42, §2],
we distinguish three subcases along the study of our problem (4.10),
in correspondence with the sign(κpY ).
κpY = 0: in this case, there exists upY ∈ C∞>0(B) such that (0, upY )
is a solution of (4.10) and
upY +1pY dvgB = 1.
κpY < 0: here there exists upY ∈ C∞>0(B) such that (βκpY , upY ) is
a solution of (4.10) and
upY +1pY dvgB = 1.
κpY > 0: this is a more difficult case, let Km be the sharp Eu-
clidean Sobolev constant
(4.12) Km =
m(m− 2)ω
where ωm is the volume of the unit m−sphere. Thus, if
(4.13) κpY <
then there exists upY ∈ C∞>0(B) such that (βκpY , upY ) is a solu-
tion of (4.10) and
upY +1pY dvgB = 1. Furthermore, the condi-
(4.14) κpY ≤
16 FERNANDO DOBARRO & BÜLENT ÜNAL
is sharp by [42], so that this is independent of the underlying
manifold and the potential considered.
The equality case in (4.14) is discussed in [44].
This results allow to establish the following two theorems.
Theorem 4.9. (Cases : µpY < µ < µsc or 0 < µ) There exists ψ ∈ C∞>0(B)
such that the scalar curvature of B ×[ψµ,ψ] F is a constant λ if and only if
sign(λ) = sign(κp) where p := 2µα+1 and κp is given by (4.11). Furthermore
if λ < 0, then the solution ψ is unique.
Proof. The conditions (µpY < µ < µsc or 0 < µ) imply that 1 < p := 2µα+
1 < pY , i.e. the problem (4.9) is superlinear but subcritical with respect to
the Sobolev immersion theorem (see [29, Remark 5.5]). By recalling that
ψ = uα, it is sufficient to prove that follows.
Let up be defined as in the case of (1 < p < pY ) in Remark 4.8. If (λ, u)
is a solution of (4.9), then multiplying (4.9) by up and integrating by parts
there results
(4.15) βκp
upudvgB = λ
pdvgB .
Thus sign(λ) = sign(κp) since β, up and u are all positive.
Conversely, if λ is a real constant such that sign(λ) = sign(κp) 6= 0,
then by Remark 4.5, (λ, uλ) is a solution of (4.9), where uλ = tλup and
On the other side, if λ = κp = 0, then (0, up) is a solution of (4.9).
Since 1 < p, the uniqueness for λ < 0 is a consequence of Corollary 4.2. �
Theorem 4.10. (Cases : µ = µpY ) If there exists ψ ∈ C∞>0(B) such that the
scalar curvature of B ×[ψµpY ,ψ] F is a constant λ, then sign(λ) = sign(κpY ).
Furthermore, if λ ∈ R verifying sign(λ) = sign(κpY ) and (4.13), then there
exists ψ ∈ C∞>0(B) such that the scalar curvature of B ×[ψµpY ,ψ] F is λ.
Besides, if λ ∈ R is negative, then there exists at most one ψ ∈ C∞>0(B) such
that the scalar curvature of B ×[ψµpY ,ψ] F is λ.
Proof. The proof is similar to that of Theorem 4.9, but follows from the
application of the case of (p = pY ) in Remark 4.8. Like above, the uniqueness
of λ < 0 is a consequence of Corollary 4.2. �
In the next proposition including the supercritical case, we will apply the
following result (see also [73, p.99]).
ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 17
Lemma 4.11. Let (Nn, gN ) be a compact connected Riemannian manifold
without boundary of dimension n ≥ 2 and ∆gN be the corresponding Laplace-
Beltrami operator. Consider the equation of the form
(4.16)
−∆gNu = f(·, u),
u ∈ C∞>0(N)
where f ∈ C∞(N × R>0). If there exist a0 and a1 ∈ R>0 such that
(4.17)
u < a0 ⇒ f(·, u) > 0
u > a1 ⇒ f(·, u) < 0,
then (4.16) has a solution satisfying a0 ≤ u ≤ a1.
Proposition 4.12. (Cases : −∞ < µ < µsc or 0 < µ) If maxSB < 0, then
for all λ < 0 there exists ψ ∈ C∞>0(B) such that the scalar curvature of
B ×[ψµ,ψ] F is the constant λ. Furthermore, the solution ψ is unique.
Proof. The conditions (−∞ < µ < µsc or 0 < µ) imply that 1 < p :=
2µα+ 1.
On the other hand, since B is compact, by taking
f(., u) = −SB(·)u+ λup = (−SB + λup−1)u,
we obtain that limu−→0+ f(·, u) = 0+ and limu−→+∞ f(·, u) = −∞. Thus
(4.17) is verified.
Hence, the proposition is proved by applying Lemma 4.11 on (Bm, gB). No-
tice that a0 can take positive values and eventually gets close enough to 0
due to the condition of limu−→0+ f(·, u), and consequently the corresponding
solution results positive.
Again, since λ < 0 and 1 < p the uniqueness is a consequence of Corollary
4.2. �
Proof. (of Theorem 2.5) This is an immediate consequence of the above
results. �
• The case of a fiber with negative constant scalar curvature, i.e. SF < 0.
Here, the (Pb-sc) becomes equivalent to the study of the existence for
the problem
(4.18)
−β∆Bu+ SBu = λup − SFuq,
u ∈ C∞>0(B),
where λ is a real parameter (i.e., the searched constant scalar curvature),
ψ = uα, p = 2µα+ 1 and q = 2(µ − 1)α + 1.
Remark 4.13. Let u be a solution of (4.18).
18 FERNANDO DOBARRO & BÜLENT ÜNAL
(i) If λ1 ≤ 0, then λ < 0. Indeed, multiplying the equation in (4.18) by
u1 and integrating by parts there results:
(4.19) λ1
u1udvgB + SF
qdvgB = λ
pdvgB ,
where u1 and u are positive.
(ii) If λ = 0, then λ1 > 0.
(iii) If µ = 0 (the warped product case), then λ < λ1. These cases have
been studied in [27, 24].
(iv) If µ = 1 (the Yamabe problem for the usual product with conformal
factor in C∞>0(B)), there results sign(λ) = sign(λ1 + SF ).
An immediate consequence of Remark 4.13 is the following lemma.
Lemma 4.14. Let B and F be given like in Theorem 2.3(i). Suppose further
that B is a compact connected Riemannian manifold and F is a pseudo-
Riemannian manifold of constant scalar curvature SF < 0. If λ ≥ 0 and
λ1 ≤ 0 (for instance when SB ≤ 0 on B), then there is no ψ ∈ C∞>0(B) such
that the scalar curvature of B ×[ψµ,ψ] F is λ.
Theorem 4.15. [29, Rows 6 and 8 in Table 4] Under the hypothesis of
Theorem 2.3(i), let B be a compact connected Riemannian manifold and
F be a pseudo-Riemannian manifold of constant scalar curvature SF < 0.
Suppose that “(m,k) ∈ D and µ ∈ (0, 1)” or “(m,k) ∈ CD and µ ∈ (0, 1) ∩
C[µ−, µ+]”.
(1) If λ1 ≤ 0, then λ ∈ R is the scalar curvature of a B ×[ψµ,ψ] F if and
only if λ < 0.
(2) If λ1 > 0, then there exists Λ ∈ R>0 such that λ ∈ R \ {Λ} is the
scalar curvature of a B ×[ψµ,ψ] F if and only if λ < Λ.
Furthermore if λ ≤ 0, then there exists at most one ψ ∈ C∞>0(B) such that
B ×[ψµ,ψ] F has scalar curvature λ.
Proof. The proof of this theorem is the subject matter of §5. �
Once again we make use of Lemma 4.11 for the next theorem about the
singular case and the following propositions.
Theorem 4.16. [29, Row 7 Table 4] Under the hypothesis of Theorem 2.3(i),
let B be a compact connected Riemannian manifold and F be a pseudo-
Riemannian manifold of constant scalar curvature SF < 0. Suppose that
“(m,k) ∈ CD and µ ∈ (0, 1) ∩ (µ−, µ+)”, then for any λ < 0 there exists
ψ ∈ C∞>0(B) such that the scalar curvature of B×[ψµ,ψ] F is λ. Furthermore
the solution ψ is unique.
ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 19
Proof. First of all note that the conditions “(m,k) ∈ CD and µ ∈ (0, 1) ∩
(µ−, µ+)” imply that q < 0 and 1 < p , i.e. the problem (4.18) is superlinear
in p but singular in q.
On the other hand, since B is compact, taking
f(., u) = −SB(·)u+ λup − SFuq = [(−SB(·) + λup−1)u1−q − SF ]uq,
there result limu−→0+ f(·, u) = +∞ and limu−→+∞ f(·, u) = −∞. Thus
(4.17) is verified.
Thus by an application of Lemma 4.11 for (Bm, gB), we conclude the proof
for the existence part.
The uniqueness part just follows from Corollary 4.2. �
Remark 4.17. We observe that the arguments applied in the proof of The-
orem 4.16 can be adjusted to the case of a compact connected Riemannian
manifold B with 0 ≤ q < 1 < p, λ < 0 and SF < 0, so that some of
the situations included in Theorem 4.15. However, both argumentations are
compatible but different.
Proof. (of Theorem 2.6) This is an immediate consequence of the above
results. �
The approach in the next propositions is similar to Proposition 4.12 and
Theorem 4.16.
Proposition 4.18. [29, Row 10 Table 4] Let 1 < µ < +∞. If maxSB < 0,
then for all λ < 0 there exists ψ ∈ C∞>0(B) such that the scalar curvature of
B ×[ψµ,ψ] F is the constant λ.
Proof. The condition 1 < µ < +∞ implies that 1 < q < p.
On the other hand, since B is compact, taking
f(., u) = −SB(·)u+ λup − SFuq = [−SB(·) + (λup−q − SF )uq−1]u,
there result limu−→0+ f(·, u) = 0+ and limu−→+∞ f(·, u) = −∞. Thus (4.17)
is satisfied.
Thus an elementary application of Lemma 4.11 for (Bm, gB) proves the
proposition. �
Proposition 4.19. [29, Rows 2, 4 and 3 in Table 4] Let either “(m,k) ∈
D and µ ∈ (µsc, 0)” or “(m,k) ∈ CD and µ ∈ (µsc, 0) ∩ C[µ−, µ+]” or
“(m,k) ∈ CD and µ ∈ (µsc, 0)∩ (µ−, µ+)”. If minSB > 0, then for all λ ≤ 0
there exists a smooth function ψ ∈ C∞>0(B) such that the scalar curvature of
B ×[ψµ,ψ] F is the constant λ.
Proof. If either “(m,k) ∈ D and µ ∈ (µsc, 0)” or “(m,k) ∈ CD and µ ∈
(µsc, 0) ∩ C[µ−, µ+]”, then 0 < q < p < 1.
20 FERNANDO DOBARRO & BÜLENT ÜNAL
On the other hand, since B is compact, taking
f(., u) = −SB(·)u+ λup − SFuq = [−SB(·)u1−q + λup−q − SF ]uq,
there result limu−→0+ f(·, u) = 0+ and limu−→+∞ f(·, u) = −∞. Thus (4.17)
is verified and again we can apply Lemma 4.11 for (Bm, gB).
If “(m,k) ∈ CD and µ ∈ (µsc, 0) ∩ (µ−, µ+)”, then q < 0 < p < 1. Con-
sidering the limits as above, limu−→0+ f(·, u) = +∞ and limu−→+∞ f(·, u) =
−∞. So, an application of Lemma 4.11 concludes the proof. �
Remark 4.20. Notice that in Theorems 4.15 and 4.16 we do not assume
hypothesis related to the sign of SB(·), unlike in Propositions 4.12, 4.18 and
4.19.
Proposition 4.21. [29, Rows 5 and 9 in Table 4] Let (m,k) ∈ CD be.
(1) If either “µ ∈
m− 1 , 0
∩ {µ−, µ+} and minSB > 0” or “µ ∈
(0, 1) ∩ {µ−, µ+}”, then for all λ < 0 there exists a smooth function
ψ ∈ C∞>0(B) such that the scalar curvature of B ×[ψµ,ψ] F is the
constant λ. In the second case, ψ is also unique .
(2) If either “µ ∈
m− 1 , 0
∩ {µ−, µ+}” or “µ ∈ (0, 1) ∩ {µ−, µ+}”
and furthermore λ1 > 0, then there exists a smooth function ψ ∈
C∞>0(B) such that the scalar curvature of B ×[ψµ,ψ] F is 0.
Proof. In both cases q = 0, so by considering
f(., u) = −SB(·)u+ λup − SF ,
the proof of (1) follows as in the latter propositions, while that of (2) is a
consequence of the linear theory and the maximum principle. �
Remark 4.22. Finally, we observe a particular result about the cases studied
in [27]. If µ = 0, then p = 1 and q = 1−2α = k − 3
k + 1
. When the dimension of
the fiber is k = 2, the exponent q = −1
. So, writing the involved equation
∆Bu = f(., u) = −SB(·)u + λu− SFu−
and by applying Lemma 4.11 as above, we obtain that if λ < minSB, then
there exists a smooth function ψ ∈ C∞>0(B) such that the scalar curvature of
B ×ψ F is the constant λ. Furthermore, by Corollary 4.2 such ψ is unique
(see [27, 24] and [25]).
ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 21
5. Proof of the Theorem 4.15
The subject matter of this section is the proof of the Theorem 4.15, so we
naturally assume its hypothesis.
Most of the time, we need to specify the dependence of λ of (4.18), we will
do that by writing (4.18)λ. Furthermore, we will denote the right hand side
of (4.18)λ by fλ(t) := λt
p − SF tq.
The conditions either “(m,k) ∈ D and µ ∈ (0, 1)” or “(m,k) ∈ CD
and µ ∈ (0, 1) ∩ C[µ−, µ+]”, imply that 0 < q < 1 < p. But the type of
nonlinearity in the right hand side of (4.18)λ changes with the signλ, i.e. it
is purely concave for λ < 0 and concave-convex for λ > 0.
The uniqueness for λ ≤ 0 is again a consequence of Corollary 4.2.
In order to prove the existence of a solution for (4.18)λ with signλ 6= 0, we
adapt the approach of sub and upper solutions in [5].
Thus, the proof of Theorem 4.15 will be an immediate consequence of the
results that follows.
Lemma 5.1. (4.18)0 has a solution if and only if λ1 > 0.
Proof. This situation is included in the results of the second case of Theorem
4.7 by replacing −SF with λ (see [24, Proposition 3.1]). �
Lemma 5.2. Let us assume that {λ : (4.18)λ has a solution} is non-empty
and define
(5.1) Λ = sup{λ : (4.18)λ has a solution}.
(i) If λ1 ≤ 0, then Λ ≤ 0.
(ii) If λ1 > 0, then there exists λ > 0 finite such that Λ ≤ λ.
Proof.
(i) It is sufficient to observe Remark 4.13 i.
(ii) Like in [5], let λ > 0 such that
(5.2) λ1t < λt
p − SF tq,∀t ∈ R, t > 0.
Thus, if (λ, u) is a solution of (4.18)λ, then
p − SF
λ1u1u < λ
p − SF
so λ < λ.
Lemma 5.3. Let
(5.3) Λ = sup{λ : (4.18)λ has a solution}.
22 FERNANDO DOBARRO & BÜLENT ÜNAL
Figure 1. The nonlinearity fλ in Lemma 5.3, i.e. 0 < q <
1 < p, SF < 0, λ1 > 0, λ > 0.
(i) Let E ∈ R>0. There exist 0 < λ0 = λ0(E) and 0 < M = M(E,λ0)
such that ∀λ : 0 < λ ≤ λ0, so we have
(5.4) 0 < E
fλ(EM)
(ii) If λ1 > 0, then {λ > 0 : (4.18)λ has a solution} 6= ∅. As a conse-
quence of that, Λ is finite.
(iii) If λ1 > 0, then for all 0 < λ < Λ there exists a solution of the
problem (4.18)λ.
Proof.
(i) For any 0 < λ < λ0
0 < gλ(r) := E
fλ(Er)
= Erq−1(λEp−1rp−q − SFEq−1)
< Erq−1(λ0E
p−1rp−q − SFEq−1).
It is easy to see that
q − 1
p−q 1
ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 23
is a minimum point for gλ0 and
gλ0(r0) = E
q − 1
) q−1
q − 1
p− 1 − 1
→ 0+, as λ0 → 0+.
Hence there exist 0 < λ0 = λ0(E) and 0 < M =M(E,λ0) such that
(5.4) is verified.
(ii) Since λ1 > 0, by the maximum principle, there exists a solution
e ∈ C∞>0(B) of
(5.5) LB(e) = −β∆Be+ SBe = 1.
Then, applying item (i) above with E = ‖e‖∞ there exists 0 < λ0 =
λ0(‖e‖∞) and 0 < M = M(‖e‖∞, λ0) such that ∀λ with 0 < λ ≤ λ0
we have that
(5.6) LB(Me) =M ≥ fλ(Me),
hence Me is a supersolution of (4.18)λ.
On the other hand, since ǔ1 := inf u1 > 0, for all λ > 0
(5.7) ǫ−1fλ(ǫǔ1) = ǫ
q−1[λǫp−qǔ
1 − SF ǔ
1] → +∞, as ǫ→ 0+.
Furthermore, note that fλ is nondecreasing when λ > 0. Hence for
any 0 < λ there exists a small enough 0 < ǫ verifying
(5.8) LB(ǫu1) = ǫλ1u1 ≤ ǫλ1‖u1‖∞ ≤ fλ(ǫǔ1) ≤ fλ(ǫu1),
thus ǫu1 is a subsolution of (4.18)λ.
Then for any 0 < λ < λ0, (taking eventually 0 < ǫ smaller if
necessary), we have that the above constructed couple sub super
solution satisfies
(5.9) ǫu1 < Me.
Now, by applying the monotone iteration scheme, we have that
{λ > 0 : (4.18)λ has a solution} 6= ∅. Furthermore by Lemma 5.2
(ii) there results Λ is finite.
(iii) The proof of this item is completely analogous to Lemma 3.2 in [5].
We will rewrite this to be self contained.
Given λ < Λ, let uν be a solution of (4.18)ν with λ < ν < Λ.
Then uν is a supersolution of (4.18)λ and for small enough 0 < ǫ, the
subsolution ǫu1 of (4.18)λ verifies ǫu1 < uν , then as above (4.18)λ
has a solution.
Lemma 5.4. For any λ < 0, there exists γλ > 0 such that ‖u‖∞ ≤ γλ for
any solution u of (4.18)λ. Furthermore if SB is nonnegative, then positive
zero of fλ can be choose as γλ.
24 FERNANDO DOBARRO & BÜLENT ÜNAL
fΛHtL
fΛHtL+Νt
Figure 2. The nonlinearity in Lemma 5.5 , i.e. 0 < q < 1 <
p, SF < 0, λ1 > 0, λ < 0.
Proof. Define ŠB := minSB (recall that B is compact). There are two
different situations, namely.
• 0 ≤ ŠB: since there exists x1 ∈ B such that u(x1) = ‖u‖∞ and
0 ≤ −β∆Bu(x1) = −SB(x1)‖u‖∞ + λ‖u‖p∞ − SF‖u‖q∞, there results
‖u‖∞ ≤ γλ, where γλ is the strictly positive zero of fλ.
• ŠB < 0: we consider f̃λ(t) := λtp − SF tq − ŠBt. Now, our problem
(4.18)λ is equivalent to
−β∆Bu+ (SB − ŠB)u = f̃λ(u),
u ∈ C∞>0(B).
But here the potential of (SB− ŠB) is non negative and the function
f̃λ has the same behavior of fλ with a positive zero γ̃λ on the right
side of the positive zero γλ of fλ. Thus, repeating the argument for
the case of ŠB ≥ 0, we proved ‖u‖∞ ≤ γ̃λ.
Lemma 5.5. Let λ1 > 0. Then for all λ < 0 there exists a solution of
(4.18)λ.
Proof. We will apply again the monotone iteration scheme. Define ŠB :=
minSB (note that B is compact).
ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 25
• 0 ≤ ŠB: Clearly, the strictly positive zero γλ of fλ is a supersolution
(5.10) − β∆Bu+ (SB + ν)u = fλ(u) + νu,
for all ν ∈ R.
On the other hand, for 0 < ǫ = ǫ(λ) small enough,
(5.11) LB(ǫu1) = ǫλ1u1 ≤ fλ(ǫu1).
Then ǫu1 is a subsolution of (5.10) for all ν ∈ R.
By taking ε possibly smaller, we also have
(5.12) 0 < ǫu1 < γλ.
We note that for large enough values of ν ∈ R>0, the nonlinearity
on the right hand side of (5.10), namely fλ(t) + νt, is an increasing
function on [0, γλ].
Thus applying the monotone iteration scheme we obtain a strictly
positive solution of (5.10), and hence a solution of (4.18)λ (see [3],
[4], [54]).
• ŠB < 0: In this case, like in Lemma 5.4 we consider f̃λ(t) := λtp −
q − ŠBt. Then, the problem (4.18)λ is equivalent to
(5.13)
−β∆Bu+ (SB − ŠB)u = f̃λ(u),
u ∈ C∞>0(B),
where the potential is nonnegative and the function f̃λ has a similar
behavior to fλ with a positive zero γ̃λ on the right side of the positive
zero γλ of fλ.
Here, it is clear that γ̃λ is a positive supersolution of
(5.14) − β∆Bu+ (SB − ŠB + ν)u = f̃λ(u) + νu,
for all ν ∈ R. Hence, we complete the proof similarly to the case of
ŠB ≥ 0.
Lemma 5.6. Let λ1 ≤ 0, λ < 0, ŠB := minSB and also let γλ be a positive
zero of fλ and γ̃λ be a positive zero of f̃λ := fλ − ŠBidR≥0 . Then there
exists a solution u of (4.18)λ. Furthermore any solution of (4.18)λ satisfies
γλ ≤ ‖u‖∞ ≤ γ̃λ.
Proof. First of all we observe that if SB ≡ 0 (so λ1 = 0), then u ≡ γλ is the
searched solution of (4.18)λ.
Now, we assume that SB 6≡ 0. Since λ1 ≤ 0, there results ŠB < 0. In this
case, one can notice that 0 < γλ < γ̃λ.
26 FERNANDO DOBARRO & BÜLENT ÜNAL
On the other hand, the problem (4.18)λ is equivalent to
(5.15)
−β∆Bu+ (SB − ŠB)u = f̃λ(u),
u ∈ C∞>0(B).
By the second part of the proof of Lemma 5.4, if u is a solution of (4.18)λ
(or equivalently (5.15)), then ‖u‖∞ ≤ γ̃λ. Besides, since
u1(fλ ◦ u) = λ1
u, u1 > 0 and λ1 ≤ 0 results γλ ≤ ‖u‖∞.
From this point on, the proof of the existence of solutions for (5.15) follows
the lines of the second part of Lemma 5.5. �
6. Conclusions and future directions
Now, we would like to summarize the content of the paper and to propose
our future plans on this topic.
We remark to the reader that several computations and proofs, along with
other complementary results mentioned in this article and references can be
obtained in [29]. We have chosen this procedure to avoid the involved long
computations.
In brief, we introduced and studied curvature properties of a particular
family of warped products of two pseudo-Riemannian manifolds which we
called as a base conformal warped product. Roughly speaking the metric
of such a product is a mixture of a conformal metric on the base and a
warped metric. We concentrated our attention on a special subclass of this
structure, where there is a specific relation between the conformal factor c
and the warping function w, namely c = wµ with µ a real parameter.
As we mentioned in §1 and the first part of §2, these kinds of metrics and
considerations about their curvatures are very frequent in different physi-
cal areas, for instance theory of general relativity, extra-dimension theories
(Kaluza-Klein, Randall-Sundrum), string and super-gravity theories; also in
global analysis for example in the study of the spectrum of Laplace-Beltrami
operators on p-forms, etc.
More precisely, in Theorems 3.1 and 3.2, we obtained the classical relations
among the different involved Ricci tensors (respectively, scalar curvatures)
for metrics of the form c2gB⊕w2gF . Then the study of particular families of
either scalar or tensorial nonlinear partial differential operators on pseudo-
Riemannian manifolds (see Lemmas 3.3 and 3.7) allowed us to find reduced
expressions of the Ricci tensor and scalar curvature for metrics as above with
c = wµ, where µ a real parameter (see Theorems 2.2 and 2.3). The operated
reductions can be considered as generalizations of those used by Yamabe in
[79] in order to obtain the transformation law of the scalar curvature under
ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 27
a conformal change in the metric and those used in [27] with the aim to
obtain a suitable relation among the involved scalar curvatures in a singly
warped product (see also [52] for other particular application and our study
on multiply warped products in [28]).
In §4 and 5, under the hypothesis that (B, gB) be a “compact” and con-
nected Riemannian manifold of dimension m ≥ 3 and (F, gF ) be a pseudo
- Riemannian manifold of dimension k ≥ 0 with constant scalar curvature
SF , we dealt with the problem (Pb-sc). This question leads us to ana-
lyze the existence and uniqueness of solutions for nonlinear elliptic partial
differential equations with several kinds of nonlinearities. The type of non-
linearity changes with the value of the real parameter µ and the sign of SF .
In this article, we concentrated our attention to the cases of constant scalar
curvature SF ≤ 0 and accordingly the central results are Theorems 2.5 and
2.6. Although our results are partial so that there are more cases to study
in forthcoming works, we obtained also other complementary results under
more restricted hypothesis about the sign of the scalar curvature of the base.
Throughout our study, we meet several types of partial differential equa-
tions. Among them, most important ones are those with concave-convex
nonlinearities and the one so called Lichnerowicz-York equation. About the
former, we deal with the existence of solutions and leave the question of
multiplicity of solutions to a forthcoming study.
We observe that the previous problems as well as the study of the Ein-
stein equation on base conformal warped products, (ψ, µ)-bcwp’s and their
generalizations to multi-fiber cases, give rise to a reach family of interesting
problems in differential geometry and physics (see for instance, the several
recent works of R. Argurio, J. P. Gauntlett, M. O. Katanaev, H. Kodama, J.
Maldacena, H. -J. Schmidt, A. Strominger, K. Uzawa, P. S. Wesson among
many others) and in nonlinear analysis (see the different works of A. Am-
brosetti, T. Aubin, I. Choquet-Bruat, J. Escobar, E. Hebey, J. Isenberg, A.
Malchiodi, D. Pollack, R. Schoen, S. -T. Yau among others).
Appendix A.
Let us assume the hypothesis of Theorem 2.3 (i), the dimensions of the
base m ≥ 2 and of the fiber k ≥ 1. In order to describe the classification of
the type of nonlinearities involved in (2.11), we will introduce some notation
(for a complete study of these nonlinearities see [29, Section 5]). The example
in Figure 1 will help the reader to clarify the notation.
Note that the denominator in (2.12) is
(A.1) η := (m− 1)(m − 2)µ2 + 2(m− 2)kµ + (k + 1)k
28 FERNANDO DOBARRO & BÜLENT ÜNAL
and verifies η > 0 for all µ ∈ R. Thus α in (2.12) is positive if and only if
µ > − k
m− 1 and by the hypothesis µ 6= −
m− 1 in Theorem 2.3 (i), results
α 6= 0.
We now introduce the following notation:
(A.2)
p = p(m,k, µ) = 2µα+ 1 and
q = q(m,k, µ) = 2(µ− 1)α + 1 = p− 2α,
where α is defined by (2.12).
Thus, for all m,k, µ given as above, p is positive. Indeed, by (A.1), p > 0
if and only if ̟ > 0, where
̟ := ̟(m,k, µ)
:= 4µ[k + (m− 1)µ] + (m− 1)(m− 2)µ2 + 2(m− 2)kµ + (k + 1)k
= (m− 1)(m+ 2)µ2 + 2mkµ + (k + 1)k.
But discr (̟) ≤ −4km2 ≤ −16 and m > 1, so ̟ > 0.
Unlike p, q changes sign depending on m and k. Furthermore, it is im-
portant to determine the position of p and q with respect to 1 as a function
of m and k. In order to do that, we define
(A.3) D := {(m,k) ∈ N≥2 × N≥1 : discr (̺(m,k, ·)) < 0},
where N≥l := {j ∈ N : j ≥ l} and
̺ := ̺(m,k, µ)
:= 4(µ − 1)[k + (m− 1)µ] + (m− 1)(m− 2)µ2 + 2(m− 2)kµ + (k + 1)k
= (m− 1)(m + 2)µ2 + 2(mk − 2(m− 1))µ + (k − 3)k.
Note that by (A.1), q > 0 if and only if ̺ > 0. Furthermore q = 0 if and
only if ̺ = 0. But here discr (̺(m,k, ·)) changes its sign as a function of m
and k.
We adopt here the notation in [29, Table 4] below, namely CD = (N≥2×
N≥1)\D if D ⊆ N≥2×N≥1 and CI = R\I if I ⊆ R. Thus, if (m,k) ∈ CD, let
µ− and µ+ two roots (eventually one, see [29, Remark 5.3]) of q, µ− ≤ µ+.
Besides, if discr (̺(m,k, ·)) > 0, then µ− < 0; whereas µ+ can take any sign.
References
[1] O. Aharony, S. S. Gubser, J. Maldacena, H. Ooguri and Y. Oz, Large N
Field Theories, String Theory and Gravity, Physics Reports 323 (2000), 183-386
[arXiv:hep-th/9905111].
[2] S. Alama, Semilinear elliptic equations with sublinear indefinite nonlinearities, Ad-
vances in Differential Equations 4 No. 6 (1999), 813-842.
[3] H. Amann, On the number of solutions of nonlinear equations in ordered Banach spaces,
J. Func. Anal. 11(1972), 346-384.
[4] H. Amann, Fixed point equations and nonlinear eigenvalue problems in ordered Banach
spaces, SIAM Rev. 18(1976), 620-709.
http://arxiv.org/abs/hep-th/9905111
ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 29
ΜscΜpY Μ-Μ+ 1
pHm,k,ΜL
qHm,k,ΜL
ΑHm,k,ΜL
ΡHm,k,ΜL
Figure 3. Example: (m,k) = (7, 4) ∈ CD
[5] A. Ambrosetti, N. Brezis and G. Cerami, Combined effects of concave and convex
nonlinearities in some elliptic problems, J. Funct. Anal. 122 No. 2 (1994), 519-543.
[6] A. Ambrosetti, J. Garcia Azorero and I. Peral, Existence and multiplicity results for
some nonlinear elliptic equations: a survey, Rendiconti di Matematica Serie VII Vol-
ume 20 (2000), 167-198.
[7] A. Ambrosetti and P. Hess, Positive solutions of asymptotically linear elliptic eigenvalue
problems, J. Math. Anal. Appl. 73 (1980), 411-422.
[8] A. Ambrosetti, A. Malchiodi and W-M.Ni, Singularly Perturbed Elliptic Equations with
Symmetry: Existence of Solutions Concentrating on Spheres, Part I, (2002).
[9] A. Ambrosetti and P. H. Rabinowitz, Dual variational methods in critical points theory
and applications, J. Funct. Anal. 14 (1973), 349–381.
[10] M. T. Anderson, P. T. Chrusciel and E. Delay, Non-trivial, static, geodesically com-
plete, vacuum space-times with a negative cosmological constant, JHEP 10 (2002), 063.
[11] F. Antoci, On the spectrum of the Laplace-Beltrami opertor for p−forms for a
class of warped product metrics, Advances in Mathematics 188 (2) (2004), 247-293
[arXiv:math.SP/0311184].
[12] R. Argurio, Brane Physics in M-theory, PhD thesis (Université Libre de Bruxelles),
ULB-TH-98/15 [arXiv:hep-th/9807171].
[13] T. Aubin, Nonlinear analysis on manifolds. Monge-Ampere equations, Comprehensive
Studies in Mathematics no. 252, Springer Verlag, Berlin (1982).
[14] M. Badiale and F. Dobarro, Some Existence Results for Sublinear Elliptic Problems
in Rn, Funkcialaj Ekv. 39 (1996), 183-202.
[15] M. Bañados, C. Teitelboim and J. Zanelli, The black hole in three dimensional space-
time, Phys. Rev. Letters 69 (1992), 1849-1851.
[16] M. Bañados, C. Henneaux, C. Teitelboim and J. Zanelli, Geometry of 2+1 black hole,
Phys. Rev. D. 48 (1993), 1506-1525.
http://arxiv.org/abs/math/0311184
http://arxiv.org/abs/hep-th/9807171
30 FERNANDO DOBARRO & BÜLENT ÜNAL
[17] J. K. Beem, P. E. Ehrlich and K. L. Easley, Global Lorentzian Geometry, 2nd Edition,
Pure and Applied Mathematics Series Vol. 202, Marcel Dekker Ink., New York (1996).
[18] A. Besse, Einstein manifolds, Modern Surveys in Mathematics no. 10, Springer Ver-
lag, Berlin (1987).
[19] R. L. Bishop and B. O’Neil, Manifolds of negative curvature, Trans. Amer. Math. Soc.
145 (1969), 1-49.
[20] H. Brezis, S. Kamin, Sublinear elliptic equation in RN , Manus. Math. 74 (1992),
p.87-106.
[21] J. Chabrowski and J. B. do O, On Semilinear Elliptic Equations Involving Concave
and Convex Nonlinearities, Math. Nachr. 233-234 (2002), 55-76.
[22] Y. Choquet-Bruhat, J. Isenberg and D. Pollack, The constraint equations for the
Einstein-scalar field system on compact manifolds, Class. Quantum Grav. 24 (2007),
809-828 [arXiv:gr-qc/0610045].
[23] C. Cortázar, M. Elgueta and P. Felmer, On a semilinear elliptic problem in Rn with a
non-Lipschitzian nonlinearity, Advances in Differential Equations 1 (2) (1996) 199-218.
[24] V. Coti Zelati, F. Dobarro and R. Musina, Prescribing scalar curvature in warped
products, Ricerche Mat. 46 (1) (1997), 61-76.
[25] M. G. Cradall, P. H. Rabinowitz and L. Tartar, On a Dirichlet problem with a singular
nonlinearity, MRC Report 1680 (1976).
[26] D. De Figueiredo, J-P. Gossez and P. Ubilla, Local superlinearity and sublinearity for
indefinite semilinear elliptic problems, J. Funct. Anal. 199 (2) (2003), 452-467.
[27] F. Dobarro and E. Lami Dozo, Scalar curvature and warped products of Riemann
manifolds, Trans. Amer. Math. Soc. 303 (1987), 161-168.
[28] F. Dobarro and B. Ünal, Curvature of multiply warped products, J. Geom. Phys. 55
(1) (2005), 75-106 [arXiv:math.DG/0406039].
[29] F. Dobarro and B. Ünal, Curvature of Base Conformal Warped Products,
arXiv:math.DG/0412436.
[30] A. V. Frolov, Kasner-AdS spacetime and anisotropic brane-world cosmology,
Phys.Lett. B 514 (2001), 213-216 [arXiv:gr-qc/0102064].
[31] A. Garcia-Parrado, Bi-conformal vector fields and their applications to
the characterization of conformally separable pseudo-Riemannian manifolds,
arXiv:math-ph/0409037.
[32] A. Garcia-Parrado, J. M. M. Senovilla Bi-conformal vector fields and their applica-
tions, Class. Quantum Grav. 21, 2153-2177.
[33] J. P. Gauntlett, N. Kim and D. Waldram, M-Fivebranes Wrapped on Supersymmetric
Cycles, Phys.Rev. D 63 (2001), 126001 [arXiv:hep-th/0012195].
[34] J. P. Gauntlett, N. Kim and D. Waldram, M-Fivebranes Wrapped on Supersymmetric
Cycles II, Phys.Rev. D 65 (2002), 086003 [arXiv:hep-th/0109039].
[35] J. P. Gauntlett, N. Kim, S. Pakis and D. Waldram, M-theory solutions with AdS
factors, Class. Quantum Grav. 19 (2002), 3927-3945.
[36] J. P. Gauntlett, D. Martelli, J. Sparks and D. Waldram, Supersymmetric AdS5 solu-
tions of M-theory, Class.Quant.Grav. 21 (2004), 4335-4366 [arXiv:hep-th/0402153].
[37] A.M. Ghezelbash and R.B. Mann , Atiyah-Hitchin M-Branes, JHEP 0410 (2004), 012
[arXiv:hep-th/0408189].
[38] J. T. Giblin, Jr. and A. D. Hwang, Spacetime Slices and Surfaces of Revolution,
J.Math.Phys. 45 (2004), 4551 [arXiv:gr-qc/0406010].
[39] J. T. Giblin Jr., D. Marlof and R. H. Garvey, Spacetime Embedding Dia-
grams for Spherically Symmetric Black Holes, Gen.Rel.Grav. 36 (2004), 83-99
[arXiv:gr-qc/0305102].
http://arxiv.org/abs/gr-qc/0610045
http://arxiv.org/abs/math/0406039
http://arxiv.org/abs/math/0412436
http://arxiv.org/abs/gr-qc/0102064
http://arxiv.org/abs/math-ph/0409037
http://arxiv.org/abs/hep-th/0012195
http://arxiv.org/abs/hep-th/0109039
http://arxiv.org/abs/hep-th/0402153
http://arxiv.org/abs/hep-th/0408189
http://arxiv.org/abs/gr-qc/0406010
http://arxiv.org/abs/gr-qc/0305102
ABOUT CURVATURE, CONFORMAL METRICS AND WARPED PRODUCTS 31
[40] B. R. Greene, K. Schalm, G. Shiu, Warped compactifications in M and F theory ,
Nucl.Phys. B 584 (2000), 480-508 [arXiv:hep-th/0004103].
[41] S. W. Hawking and G. F. Ellis, The large scale structure of space-time, Cambridge
Monographs on Mathematical Physics, (1973).
[42] E. Hebey, Variational methods and elliptic equations in Riemannian geometry, Notes
from lectures at ICTP, Workshop on recent trends in nonlinear variational problems,
http://www.ictp.trieste.it, 2003 smr1486/3.
[43] E. Hebey, F. Pacard, D. Pollack, A variational analysis of Einstein-scalar field Lich-
nerowicz equations on compact Riemannian manifolds, arXiv:gr-qc/0702203.
[44] E. Hebey, M. Vaugon, From best constants to critical functions, Math. Z. 237 (2001),
737-767.
[45] S.-T. Hong, J. Choi and Y.-J. Park, (2 + 1) BTZ Black hole and multiply warped
product space time, General Relativity and Gravitation 35 12 (2003), 2105-2116.
[46] M. Ito, Five dimensional warped geometry with bulk scalar field,
arXiv:hep-th/0109040.
[47] M. O. Katanaev, T. Klösch and W. Kummer, Global properties of warped solutions
in general relativity, Ann. Physics 276 (2) (1999), 191-222.
[48] J. Kazdan, Some applications of partial differential equations to problems in geometry,
Surveys in Geometry Series, Tokyo Univ. (1983).
[49] D.-S. Kim and Y. H. Kim, Compact Einstein warped product spaces with nonpositive
scalar curvature, Proc. Amer. Math. Soc. 131 (8) (2003), 2573-2576.
[50] H. Kodama and K. Uzawa, Moduli instability in warped compactifications of the type-
IIB supergravity, JHEP07(2005)061 [arXiv:hep-th/0504193].
[51] H. Kodama and K. Uzawa, Comments on the four-dimensional effective theory for
warped compactification, JHEP03(2006)053 [arXiv:hep-th/0512104].
[52] J. Lelong-Ferrand, Geometrical interpretations of scalar curvature and regularity of
conformal homeomorphisms, Differential Geometry and Relativity, Mathematical Phys.
and Appl. Math. Vol. 3, Reidel, Dordrecht (1976), 91-105.
[53] J. E. Lidsey, Supergravity Brane Cosmologies, Phys. Rev. D 62 (2000), 083515
[arXiv:hep-th/0007014].
[54] P. L. Lions, On the existence of positive solutions of semilinear elliptic equations,
SIAM Review 24 4 (1982), 441-467.
[55] J. Maldacena, The Large N Limit of Superconformal field theories and supergravity,
Adv.Theor.Math.Phys. 2 (1998), 231-252; Int. J. Theor. Phys. 38 (1999), 1113-1133
[arXiv:hep-th /9711200].
[56] R. Melrose, Geometric scattering theory, Stanford Lectures, Cambridge University
Press, Cambridge (1995).
[57] C. W. Misner, J. A. Wheeler and K. S. Thorne, Gravitation, W. H. Freeman and
Company, San Francisco (1973).
[58] Ó. N. Murchadha, Readings of the Licherowicz-York equation, Acta Physica Polonica
B 36, 1 (2005), 109-120.
[59] B. O’Neil, Semi-Riemannian geometry, Academic Press, New York (1983).
[60] J. M. Overduin and P. S. Wesson, Kaluza-Klein Gravity, Phys.Rept. 283 (1997), 303-
380 [arXiv:gr-qc/9805018].
[61] G. Papadopoulos and P. K. Townsend, Intersecting M-branes, Physics Letters B 380
(1996), 273-279 [arXiv:hep-th/9603087].
[62] J. L. Petersen, Introduction to the Maldacena Conjecture on AdS/CFT,
Int.J.Mod.Phys. A 14 (1999), 3597-3672 [arXiv:hep-th/9902131].
http://arxiv.org/abs/hep-th/0004103
http://www.ictp.trieste.it
http://arxiv.org/abs/gr-qc/0702203
http://arxiv.org/abs/hep-th/0109040
http://arxiv.org/abs/hep-th/0504193
http://arxiv.org/abs/hep-th/0512104
http://arxiv.org/abs/hep-th/0007014
http://arxiv.org/abs/gr-qc/9805018
http://arxiv.org/abs/hep-th/9603087
http://arxiv.org/abs/hep-th/9902131
32 FERNANDO DOBARRO & BÜLENT ÜNAL
[63] L. Randall and R. Sundrum, A large mass hierarchy from a small extra dimension,
Phys. Rev. Letters 83 3770 (1999) [arXiv:hep-th/9905221].
[64] L. Randall and R. Sundrum, An alternative to compactification, Phys. Rev. Letters
83 (1999), 4690 [arXiv:hep-th/9906064].
[65] S. Randjbar-Daemi and V. Rubakov, 4d−flat compactifications with brane vorticiteis,
JHEP 0410 (2004), 054 [arXiv:hep-th/0407176].
[66] H.-J. Schmidt, A new proof of Birkoff’s theorem, Gravitation and Cosmology,
Grav.Cosmol. 3 (1997), 185-190 [arXiv:gr-qc/9709071].
[67] H.-J. Schmidt, Lectures on mathematical cosmology, arXiv:gr-qc/0407095.
[68] R. Schoen, Conformal deformation of a Riemannian metric to constant scalar curva-
ture, Journal of Differential Geometry 20 (1984), 479-495.
[69] K. Schwarzschild, On the Gravitational Field of a Mass Point according to Einstein’s
Theory, Sitzungsberichte der Koeniglich Preussischen Akademie der Wissenschaften zu
Berlin (1916), 189-196 [arXiv:physics/9905030].
[70] J. Shi and M. Yao, Positive solutions for elliptic equations with singular nonlinearity,
EJDE Vol. 2005(2005), 04, 1-11.
[71] J. Soda, Gravitational waves in brane world A Midi-superspace Approach,
arXiv:hep-th/0202016.
[72] A. Strominger, Superstrings with torsion, Nucl. Phys. B 274 (1986), 253.
[73] M. E. Taylor, Partial Differential Equations III - Nonlinear Equations, Applied Math-
ematical Sciences - Springer (1996).
[74] K. Thorne, Warping spacetime, The Future of Theoretical Physics and Cosmology,
Part 5, Cambridge University Press (2003), 74-104.
[75] N. Trudinger, Remarks concerning the conformal deformation of Rieamnnian struc-
tures on compact manifolds, Ann. Scuola Norm. Sup. Pisa 22 (1968), 265-274.
[76] P. S. Wesson, Space-Time-Matter, Modern Kaluza-Klein Theory, World Scientific
(1999).
[77] P. S. Wesson, On Higher-Dimensional Dynamics, arXiv:gr-qc/0105059.
[78] M. Willem, Minimax Theorems, Birkhäuser, Boston (1996).
[79] H. Yamabe On a deformation of Riemannian structures on compact manifolds, Osaka
Math. J. 12 (1960), 21-37.
(F. Dobarro) Dipartimento di Matematica e Informatica, Università degli
Studi di Trieste, Via Valerio 12/b, I-34127 Trieste, Italy
E-mail address: dobarro@dmi.units.it
(B. Ünal) Department of Mathematics, Bilkent University, Bilkent, 06800
Ankara, Turkey
E-mail address: bulentunal@mail.com
http://arxiv.org/abs/hep-th/9905221
http://arxiv.org/abs/hep-th/9906064
http://arxiv.org/abs/hep-th/0407176
http://arxiv.org/abs/gr-qc/9709071
http://arxiv.org/abs/gr-qc/0407095
http://arxiv.org/abs/physics/9905030
http://arxiv.org/abs/hep-th/0202016
http://arxiv.org/abs/gr-qc/0105059
	1. Introduction
	2. Motivations and Main results
	3. The curvature relations - Sketch of the proofs
	4. The problem (Pb-sc) - Existence of solutions
	5. Proof of the Theorem ??
	6. Conclusions and future directions
	Appendix A.  
	References
ABSTRACT
  We consider the curvature of a family of warped products of two
pseduo-Riemannian manifolds $(B,g_B)$ and $(F,g_F)$ furnished with metrics of
the form $c^{2}g_B \oplus w^2 g_F$ and, in particular, of the type $w^{2
\mu}g_B \oplus w^2 g_F$, where $c, w \colon B \to (0,\infty)$ are smooth
functions and $\mu$ is a real parameter. We obtain suitable expressions for the
Ricci tensor and scalar curvature of such products that allow us to establish
results about the existence of Einstein or constant scalar curvature structures
in these categories. If $(B,g_B)$ is Riemannian, the latter question involves
nonlinear elliptic partial differential equations with concave-convex
nonlinearities and singular partial differential equations of the
Lichnerowicz-York type among others.

<|endoftext|><|startoftext|>
Introduction
The present paper provides a finishing touch in a local classification of essentially
conformally symmetric pseudo-Riemannian metrics.
A pseudo-Riemannian manifold of dimension n ≥ 4 is called essentially conformal-
ly symmetric if it is conformally symmetric [2] (in the sense that its Weyl conformal
tensor is parallel) without being conformally flat or locally symmetric.
The metric of an essentially conformally symmetric manifold is always indefinite
[4, Theorem 2]. Compact essentially conformally symmetric manifolds are known
to exist in all dimensions n ≥ 5 with n ≡ 5 (mod 3), where they represent all in-
definite metric signatures [8], while examples of essentially conformally symmetric
pseudo-Riemannian metrics on open manifolds of all dimensions n ≥ 4 were first
constructed in [16].
On every conformally symmetric manifold there is a naturally distinguished parallel
distribution D, of some dimension d, which we call the Olszak distribution. As shown
by Olszak [13], for an essentially conformally symmetric manifold d ∈ {1, 2}.
In [7] we described the local structure of all conformally symmetric manifolds with
d = 2. See also Section 3. This paper establishes an analogous result (Theorem 4.1)
for the case d = 1.
In both cases, some of the metrics in question are locally symmetric. In Remark 4.2
we explain why a similar classification result cannot be valid just for essentially con-
formally symmetric manifolds.
Essentially conformally symmetric manifolds with d = 1 are all Ricci-recurrent, in
the sense that, for every tangent vector field v, the Ricci tensor ρ and the covariant
derivative ∇vρ are linearly dependent at each point. The local structure of essentially
conformally symmetric Ricci-recurrent manifolds at points with ρ ⊗∇ρ 6= 0 has
already been determined by the second author [16]. Our new contribution settles the
2000 Mathematics Subject Classification. 53B30.
Key words and phrases. Parallel Weyl tensor, conformally symmetric manifold.
http://arxiv.org/abs/0704.0596v1
2 A. DERDZINSKI AND W. ROTER
one case still left open in the local classification problem, namely, that of essentially
conformally symmetric manifolds with d = 1 at points where ρ⊗∇ρ = 0.
The literature dealing with conformally symmetric manifolds includes, among oth-
ers, [9, 10, 12, 15, 17, 18] and the papers cited above. A local classification of homo-
geneous essentially conformally symmetric manifolds can be found in [3].
1. Preliminaries
Throughout this paper, all manifolds and bundles, along with sections and connec-
tions, are assumed to be of class C∞. A manifold is, by definition, connected. Unless
stated otherwise, a mapping is always a C∞ mapping betweeen manifolds.
Given a connection ∇ in a vector bundle E over a manifold M , a section ψ of
E , and vector fields u, v tangent to M , we use the sign convention
(1) R(u, v)ψ = ∇v∇uψ − ∇u∇vψ + ∇[u,v]ψ
for the curvature tensor R = R∇.
The Levi-Civita connection of a given pseudo-Riemannian manifold (M, g) is al-
ways denoted by ∇. We also use the symbol ∇ for connections induced by ∇, in
various ∇-parallel subbundles of TM and their quotients.
The Schouten tensor σ and Weyl conformal tensor W of a pseudo-Riemannian
manifold (M, g) of dimension n ≥ 4 are given by σ = ρ − (2n − 2)−1 sg, with ρ
denoting the Ricci tensor, s = trgρ standing for the scalar curvature, and
(2) W = R − (n− 2)−1g ∧ σ.
Here ∧ is the exterior multiplication of 1-forms valued in 1-forms, which uses the
ordinary ∧ as the valuewise multiplication; thus, g∧σ is a 2-form valued in 2-forms.
Let (t, s) 7→ x(s, t) be a fixed variation of curves in a pseudo-Riemannian manifold
(M, g), that is, an M-valued C∞ mapping from a rectangle (product of intervals) in
the ts-plane. By a vector field w along the variation we mean, as usual, a section of
the pullback of TM to the rectangle (so that w(t, s) ∈ Tx(t,s)M). Examples are xs
and xt, which assign to (t, s) the velocity of the curve t 7→ x(t, s) or s 7→ x(t, s) at
s or t. Further examples are provided by restrictions to the variation of vector fields
on M . The partial covariant derivatives of a vector field w along the variation are
the vector fields wt, ws along the variation, obtained by differentiating w covariantly
along the curves t 7→ x(t, s) or s 7→ x(t, s). Skipping parentheses, we write wts, wstt,
etc., rather than (wt)s, ((ws)t)t for higher-order derivatives, as well as xss, xst instead
of (xs)s, (xs)t. One always has wts = wst + R(xt, xs)w, cf. [11, formula (5.29) on
p. 460], and, since the Levi-Civita connection ∇ is torsionfree, xst = xts. Thus,
whenever (t, s) 7→ x(s, t) is a variation of curves in M ,
(3) xtss = xsst + R(xt, xs)xs .
CONFORMALLY SYMMETRIC MANIFOLDS 3
2. The Olszak distribution
The Olszak distribution of a conformally symmetric manifold (M, g) is the parallel
subbundle D of TM , the sections of which are the vector fields u with the property
that ξ∧Ω = 0 for all vector fields v, v ′ and for the differential forms ξ = g(u, · ) and
Ω =W (v, v ′, · , · ). The distribution D was introduced, in a more general situation,
by Olszak [13], who also proved the following lemma.
Lemma 2.1. The following conclusions hold for the dimension d of the Olszak
distribution D in any conformally symmetric manifold (M, g) with dimM = n ≥ 4.
(i) d ∈ {0, 1, 2, n}, and d = n if and only if (M, g) is conformally flat.
(ii) d ∈ {1, 2} if (M, g) is essentially conformally symmetric.
(iii) d = 2 if and only if rankW=1, in the sense that W, as an operator acting
on exterior 2-forms, has rank 1 at each point.
(iv) If d = 2, the distribution D is spanned by all vector fields of the form
W (u, v)v′ for arbitrary vector fields u, v, v′ on M .
Proof. See Appendix I. �
In the next lemma, parts (a) and (d) are due to Olszak [13, 2o and 3o on p. 214].
Lemma 2.2. If d ∈ {1, 2}, where d is the dimension of the Olszak distribution D
of a given conformally symmetric manifold (M, g) with dimM = n ≥ 4, then
(a) D is a null parallel distribution,
(b) at any x ∈M the space Dx contains the image of the Ricci tensor ρx treated,
with the aid of gx, as an endomorphism of TxM,
(c) the scalar curvature is identically zero and R = W + (n− 2)−1g ∧ ρ,
(d) W (u, · , · , · ) = 0 whenever u is a section of D,
(e) R(v, v ′, · , · ) = W (v, v ′, · , · ) = 0 for any sections v and v ′ of D⊥,
(f) of the connections in D and E = D⊥/D, induced by the Levi-Civita connec-
tion of g, the latter is always flat, and the former is flat if d = 1.
Proof. Assertion (e) for W is immediate from the definition of D. Namely, at any
point x ∈M , every 2-form Ωx in the image of Wx (for Wx acting on 2-forms at x) is
∧-divisible by ξ = gx(u, · ) for each u ∈ Dxr{0}, and so Ωx(v, v
′) = 0 if v, v ′ ∈ Dx
We now proceed to prove (a), (b), (c) and (d).
First, let d = 2. By Lemma 2.1(iii), this amounts to the condition rankW=1, so
that (a), (b) and (c) follow from Lemma 2.1(iv) combined with [7, Lemma 17.1(ii)
and Lemma 17.2]. Also, for a nonzero 2-form Ωx chosen as in the last paragraph,
Dx is the image of Ωx, that is, Ωx equals the exterior product of two vectors in Dx
(treated as 1-forms, with the aid of gx). Now (d) follows since, by (a), Ωx(ux, · ) = 0
if u is a section of D.
4 A. DERDZINSKI AND W. ROTER
Next, suppose that d = 1. Replacing M by a neighborhood of any given point,
we may assume that D is spanned by a vector field u. If u were not null, we would
have W (u, v, u, v ′) = 0 for any sections v, v ′ of D⊥, as one sees contracting the
twice-covariant tensor field W ( · , v, · , v ′) = 0, at any point x, in an orthogonal basis
containing the vector ux. (We have already established (e) for W.) Combined with
(e) for W and the symmetries of W, the relation W (u, v, u, v ′) = 0 for v, v ′ in D⊥
would then give W = 0, contrary to the assumption that d = 1. Thus, u is null,
which yields (a). Now
(4) we choose, locally, a null vector field u′ with g(u, u′) = 1.
For any section v of D⊥ one sees that W (u, · , u′, v) = 0 by contracting the tensor
field W ( · , · , · , v) = 0 in the first and third arguments, at any point x, in
(5) a basis of TxM formed by ux, u
x and n− 2 vectors orthogonal to them,
and using (e) for W, along with the inclusion D ⊂ D⊥, cf. (a). Since u′ and D⊥
span TM , assertion (e) for W thus implies (d).
To prove (b) and (c) when d = 1, we distinguish two cases: (M, g) is either es-
sentially conformally symmetric, or locally symmetric. For (c), it suffices to establish
vanishing of the scalar curvature s (cf. (2)). Now, in the former case, s = 0 accord-
ing to [5, Theorem 7], while (b) follows since, as shown in [6, Theorem 7 on p. 18],
for arbitrary vector fields v, v ′ and v ′′ on an essentially conformally symmetric pseu-
do-Riemannian manifold, ξ ∧ Ω = 0, where ξ = ρ(v, · ) and Ω = W (v ′, v ′′, · , · ). In
the case where g is locally symmetric, (b) and (c) are established in Appendix II.
Assertion (e) for R is now obvious from (e) for W and (c), since, by (b), ρ(v, · ) =
0 for any section v of D⊥. The claim about E in (f) is in turn immediate from (1)
and (e) for R, which states that R(w,w ′)v, for arbitrary vector fields w,w ′ and any
section v of D⊥, is orthogonal to all sections of D⊥ (and hence must be a section
of D). Finally, to prove (f) for D, with d = 1, let us fix a section u of D, a vector
field v, and define a differential 2-form ζ by ζ(w,w ′) = (n−2)R(w,w ′, u, v) for any
vector fields w,w ′. By (c) and (e), ζ = g(u, · )∧ ρ(v, · ), as D ⊂ D⊥ (cf. (a)), and so
ρ(u, · ) = 0 in view of (b) and symmetry of ρ. However, by (b), both g(u, · ) and
ρ(v, · ) are sections of the subbundle of T ∗M corresponding to D under the bundle
isomorphism TM → T ∗M induced by g, so that ζ = 0 since the distribution D is
one-dimensional. �
3. The case d = 2
For more details of the construction described below, we refer the reader to [7].
Let there be given a surface Σ, a projectively flat torsionfree connection D on Σ
with a D-parallel area form α, an integer n ≥ 4, a sign factor ε = ±1, a real vector
space V of dimension n− 4, and a pseudo-Euclidean inner product 〈 , 〉 on V .
CONFORMALLY SYMMETRIC MANIFOLDS 5
We also assume the existence of a twice-contravariant symmetric tensor field T on
Σ with divD(divDT ) + (ρD, T ) = ε (in coordinates: T jk,jk + T
jkRjk = ε). Here
divD denotes the D-divergence, ρD is the Ricci tensor of D, and ( , ) stands for
the obvious pairing. Such T always exists locally in Σ. In fact, according to [7,
Theorem 10.2(i)] combined with [7, Lemma 11.2], T exists whenever Σ is simply
connected and noncompact.
For T chosen as above, we define a twice-covariant symmetric tensor field τ on
Σ, that is, a section of [T ∗Σ]⊙2, by requiring τ to correspond to the section T of
[TΣ]⊙2 under the vector-bundle isomorphism TΣ → T ∗Σ which acts on vector fields
v by v 7→ α(v, · ). In coordinates, τjk = αjlαkmT
Next, we denote by hD the Patterson-Walker Riemann extension metric [14] on
the total space T ∗Σ, obtained by requiring that all vertical and all D-horizontal
vectors be hD-null, while hDx (ζ, w) = ζ(dπxw) for x ∈ T
∗Σ, any vector w ∈ TxT
any vertical vector ζ ∈ Ker dπx = T
Σ, and the bundle projection π : T ∗Σ → Σ.
Finally, let γ and θ be the constant pseudo-Riemannian metric on V correspond-
ing to the inner product 〈 , 〉, and the function V → R with θ(v) = 〈v, v〉.
Our Σ,D, α, n, ε, V , 〈 , 〉 now give rise to the pseudo-Riemannian manifold
(6) (T ∗Σ × V, hD− 2τ + γ − θρD) ,
of dimension n, with the metric hD− 2τ + γ − θρD, where the function θ and
covariant tensor fields τ, ρD, hD, γ on Σ, T ∗Σ or V are identified with their pull-
backs to T ∗Σ × V . (Thus, for instance, hD− 2τ + γ is a product metric.)
We have the following local classification result, in which d stands for the dimen-
sion of Olszak distribution D.
Theorem 3.1. The pseudo-Riemannian manifold (6) obtained as above from any
data Σ,D, α, n, ε, V , 〈 , 〉 with the stated properties is conformally symmetric and has
d = 2. Conversely, in any conformally symmetric pseudo-Riemannian manifold such
that d = 2, every point has a connected neighborhood isometric to an open subset of
a manifold (6) constructed above from some data Σ, D, α, n, ε, V , 〈 , 〉.
The manifold (6) is never conformally flat, and it is locally symmetric if and only
if the Ricci tensor ρD is D-parallel.
Proof. See [7, Section 22]. Note that, in view of Lemma 2.1(iii), the condition
rankW=1 used in [7] is equivalent to d = 2. �
The objects Σ,D, α, n, ε, V , 〈 , 〉 are treated as parameters of the above construc-
tion, while T is merely assumed to exist, even though the metric g in (6) clearly
depends on τ (and hence on T ). This is justified by the fact that, with fixed
Σ,D, α, n, ε, V , 〈 , 〉, the metrics corresponding to two choices of T are, locally, iso-
metric to each other, cf. [7, Remark 22.1].
6 A. DERDZINSKI AND W. ROTER
The metric signature of (6) is clearly given by −− . . .++, with the dots standing
for the sign pattern of 〈 , 〉.
4. The case d = 1
Let there be given an open interval I, a C∞ function f : I → R, an integer n ≥ 4,
a real vector space V of dimension n − 2 with a pseudo-Euclidean inner product
〈 , 〉, and a nonzero traceless linear operator A : V → V , self-adjoint relative to 〈 , 〉.
As in [16], we then define an n-dimensional pseudo-Riemannian manifold
(7) (I ×R× V, κ dt2 + dt ds + γ) ,
where products of differentials represent symmetric products, t, s denote the Carte-
sian coordinates on the I × R factor, γ stands for the pullback to I × R × V of
the flat pseudo-Riemannian metric on V that corresponds to the inner product 〈 , 〉,
and the function κ : I ×R× V → R is given by κ(t, s, ψ) = f(t)〈ψ, ψ〉+ 〈Aψ, ψ〉.
The manifolds (7) are characterized by the following local classification result,
analogous to Theorem 3.1. As before, d is the dimension of the Olszak distribution.
Theorem 4.1. For any I, f, n, V , 〈 , 〉, A as above, the pseudo-Riemannian man-
ifold (7) is conformally symmetric and has d = 1. Conversely, in any conformally
symmetric pseudo-Riemannian manifold such that d = 1, every point has a connected
neighborhood isometric to an open subset of a manifold (7) constructed from some
such I, f, n, V , 〈 , 〉, A.
The manifold (7) is never conformally flat, and it is locally symmetric if and only
if f is constant.
A proof of Theorem 4.1 is given at the end of the next section.
Obviously, the metric κ dt2+ dt ds + γ in (7) has the sign pattern − . . .+, where
the dots stand for the sign pattern of 〈 , 〉.
Remark 4.2. A classification result of the same format as Theorem 4.1 cannot
be true just for essentially conformally symmetric manifolds with d = 1. Namely,
such manifolds do not satisfy a principle of unique continuation: formula (7) with f
which is nonconstant on I, but constant on some nonempty open subinterval I ′ of I,
defines an essentially conformally symmetric manifold with a locally symmetric open
submanifold U = I ′ ×R × V . At points of U, the local structure of (7) does not,
therefore, arise from a construction that, locally, produces all essentially conformally
symmetric manifolds and nothing else.
As explained in [7, Section 24], an analogous situation arises when d = 2.
5. Proof of Theorem 4.1
The following assumptions will be used in Lemma 5.1.
(a) (M, g) is a conformally symmetric manifold of dimension n ≥ 4 and y ∈M .
CONFORMALLY SYMMETRIC MANIFOLDS 7
(b) The Olszak distribution D of (M, g) is one-dimensional.
(c) u is a global parallel vector field spanning D.
(d) t :M → R is a C∞ function with g(u, · ) = dt and t(y) = 0.
(e) dim V = n− 2 for the space V of all parallel sections of E = D⊥/D.
(f) ρ = (2−n)f(t) dt⊗dt for some C∞ function f : I ′ → R on an open interval
I ′, where ρ is the Ricci tensor and f(t) denotes the composite f ◦ t.
For local considerations, only (a) and (b) are essential. In fact, condition (e) (in
which ‘parallel’ refers to the connection in E induced by the Levi-Civita connection
of g), as well (c) and (d) for some u and t, follow from (a) – (b) if M is simply
connected. See Lemma 2.2(f). On the other hand, (c) – (d), Lemma 2.2(b) and
symmetry of ρ give ∇dt = 0 and ρ = χ dt⊗ dt for some function χ : M → R, so
that ∇ρ = dχ⊗ dt⊗ dt. However, ∇ρ is totally symmetric (that is, ρ satisfies the
Codazzi equation): our assumption ∇W = 0 implies the condition divW = 0, well
known [11, formula (5.29) on p. 460] to be equivalent to the Codazzi equation for the
Schouten tensor σ, while σ = ρ by Lemma 2.2(c). Thus, dχ equals a function times
dt, and so χ is, locally, a function of t, which (locally) yields (f).
For any section v of D⊥, we denote by v the image of v under the quotient-pro-
jection morphism D⊥ → E = D⊥/D.
The data required for the construction in Section 4 consist of I, f, n, V appearing
in (a) – (f), along with the pseudo-Euclidean inner product 〈 , 〉 in V , induced in an
obvious way by g (cf. Lemma 2.2(f)), and A : V → V characterized by 〈Aψ, ψ ′〉 =
W (u′, v, v ′, u′), for ψ, ψ ′ ∈ V , with a vector field u′ and sections v, v ′ of D⊥ chosen,
locally, so that g(u, u′) = 1, ψ = v and ψ ′ = v ′. (The resulting bilinear form
(ψ, ψ ′) 7→ 〈Aψ, ψ ′〉 on V is well-defined, that is, unaffected by the choices of u′, v or
v ′, as a consequence of Lemma 2.2(d),(e), while the function W (u′, v, v ′, u′) is in fact
constant, by Lemma 2.2(d), as ones sees differentiating it via the Leibniz rule and
noting that, since v and v ′ are parallel, the covariant derivatives of v and v ′ in the
direction of any vector field are sections of D.) That A is traceless and self-adjoint
is immediate from the symmetries of W. Finally, A 6= 0 since, otherwise, W would
vanish. (Namely, in view of Lemma 2.2(d),(e), W would yield 0 when evaluated on
any quadruple of vector fields, each of which is either u′ or a section of D⊥.)
Under the assumptions (a) – (f), with f = f(t), we then have
(8) R(u′, v)v ′ = [f g(v, v ′) + 〈Av, v ′〉]g(u′, u)u
for any sections v, v ′ of D⊥ and any vector field u′. In fact, ρ(v, · ) = ρ(v ′, · ) = 0
from symmetry of ρ and Lemma 2.2(b), so that, by Lemma 2.2(c), R(u′, v)v ′ =
W (u′, v)v ′ − (n − 2)−1g(v, v ′)ρu′, where ρu′ denotes the unique vector field with
g(ρu′, · ) = ρ(u′, · ). Now (8) follows: due to (d), (f) and the definition of A, both
sides have the same g-inner product with u′, and are orthogonal to u⊥ = D⊥ (with
R(u′, v)v ′ orthogonal to D⊥ in view of Lemma 2.2(e)).
8 A. DERDZINSKI AND W. ROTER
We fix an open subinterval I of I ′, containing 0, and a null geodesic I ∋ t 7→ x(t)
in M with x(0) = y, parametrized by the function t (in the sense that the function
t restricted to the geodesic coincides with the geodesic parameter). Namely, since
∇dt = 0, the restriction of t to any geodesic is an affine function of the parameter;
thus, by (d), it suffices to prescribe the initial data formed by x(0) = y and a null
vector ẋ(0) ∈ TyM with g(ẋ(0), uy) = 1.
As g(ẋ(0), uy) = 1, the plane P in TyM , spanned by the null vectors ẋ(0) and
uy (cf. Lemma 2.2(a)) is gy-nondegenerate, and so TyM = P ⊕ Ṽ , for Ṽ = P
⊥. Let
pr : TyM → Ṽ be the orthogonal projection. Since pr(Dy) = {0}, the restriction of
pr to Dy
⊥ descends to the quotient Ey = Dy
⊥/Dy, producing an isomorphism Ey → Ṽ ,
also denoted by pr. Finally, for ψ ∈ V , we let t 7→ ψ̃(t) ∈ Tx(t)M be the parallel
field with ψ̃(0) = pr ψy, and set κ(t, s, ψ) = f(t)〈ψ, ψ〉+ 〈Aψ, ψ〉, as in Section 4.
The formula F (t, s, ψ) = expx(t)(ψ̃(t) + sux(t)/2) now defines a C
∞ mapping F
from an open subset of R2× V into M .
Lemma 5.1. Under the above hypotheses, F ∗g = κ dt2+ dtds+ h.
Proof. The F -images w,w ′, F∗ψ of the constant vector fields (1, 0, 0), (0, 1, 0)
and (0, 0, ψ) in R2×V , for ψ ∈ V , are vector fields tangent to M along F (sections of
F ∗TM). Since D⊥ is parallel, its leaves are totally geodesic and, by Lemma 2.2(e), the
Levi-Civita connection of g induces on each leaf a flat torsionfree connection. Thus,
w ′ and each F∗ψ are parallel along each leaf of D
⊥, as well as tangent to the leaf,
and parallel along the geodesic t 7→ x(t). Therefore, w ′ = u/2, while the functions
g(w ′, F∗ψ) and g(F∗ψ, F∗ψ
′), for ψ, ψ ′ ∈ V , are constant, and hence equal to their
values at y, that is, 0 and 〈ψ, ψ ′〉. It now remains to be shown that g(w,w) = κ◦F ,
g(w, u/2) = 1/2 and g(w, F∗ψ) = 0. To this end, we consider the variation x(t, s) =
F (t, sa, sψ) of curves in M , with any fixed a ∈ R and ψ ∈ V . Clearly, w = xt along
the variation (notation of Section 1). Next, xts = xst is tangent to D
⊥, since so is
xs, while D
⊥ is parallel. Consequently, [g(xt, u)]s = 0, as u is parallel and tangent
to D. Thus, g(w, u) = g(xt, u) = 1. (Note that g(xt, u) = 1 at s = 0, due to (d),
as the geodesic t 7→ x(t) is parametrized by the function t.) However, xss = 0 and
xs is tangent to D
⊥, so that (3) and (8) now give xtss = [fg(xs, xs) + 〈Axs, xs〉]u,
which is parallel in the s direction, while xts = xst = 0 at s = 0. Hence xts =
s[fg(xs, xs) + 〈Axs, xs〉]u, and so g(xts, xts) = 0 (cf. (c) above and Lemma 2.2(a)).
This further yields [g(xt, xt)]ss/2 = g(xt, xtss) = fg(xs, xs) + 〈Axs, xs〉. The last
function is constant in the s direction, while g(xt, xt) = [g(xt, xt)]s = 0 at s = 0, and
so g(w,w) = g(xt, xt) = s
2[fg(xs, xs) + 〈Axs, xs〉] = κ. Finally, being proportional
to u at each point, xts is orthogonal to D
⊥, and hence to F∗ψ, which imples that
[g(xt, F∗ψ)]s = 0, and, as g(w, F∗ψ) = g(xt, F∗ψ) = 0 at s = 0, we get g(w, F∗ψ) = 0
everywhere. �
CONFORMALLY SYMMETRIC MANIFOLDS 9
We are now in a position to prove Theorem 4.1. First, (7) is conformally sym-
metric and has d = 1, as one can verify by a direct calculation, cf. [16, Theorem 3].
Conversely, if conditions (a) and (b) above are satisfied, we may also assume (c) – (f).
(See the comment following (f).) Our assertion is now immediate from Lemma 5.1.
Appendix I: Proof of Lemma 2.1
We prove Lemma 2.1 here, since Olszak’s paper [13] may be difficult to obtain.
The condition d = n is equivalent to conformal flatness of (M, g), since n > 2
and so Ω = 0 is the only 2-form ∧-divisible by all nonzero 1-forms ξ. At a fixed
point x, the metric gx allows us to treat the Ricci tensor ρx and any 2-form Ωx as
endomorphisms of TxM, so that we may consider their images (which are subspaces
of TxM). If W 6= 0, fixing a nonzero 2-form Ωx in the image of Wx acting on
2-forms at x we see that, for every u ∈ Dx, our Ωx is ∧-divisible by ξ = gx(u, · ),
and so the image of Ωx contains Dx. Thus, d ≤ 2, and (i) follows. (Being nonzero
and decomposable, Ωx has rank 2.) As shown in [6, Theorem 7 on p. 18], if (M, g)
is essentially conformally symmetric, the image of ρx is a subspace of Dx, so that
(i) yields (ii), since g in (ii) cannot be Ricci-flat. Next, if d = 2, the image of our
Ωx coincides with Dx (as rankΩx = 2). Every 2-form in the image of Wx thus is a
multiple of Ωx, being the exterior product of two vectors in Dx, identified, via gx,
with 1-forms. Hence rankW = 1. Conversely, if rankW = 1, all nonzero 2-forms
Ωx in the image of Wx are of rank 2, as Wx, being self-adjoint, is a multiple of
Ωx ⊗ Ωx, and so the Bianchi identity for W gives Ωx ∧ Ωx = 0. All such Ωx are
therefore ∧-divisible by ξ = gx(u, · ), for every nonzero vector u in the common
2-dimensional image of such Ωx, which shows that d = 2. Finally, (iv) follows if one
chooses Ωx 6= 0 equal to Wx(v, v
′, · , · ) for some v, v ′ ∈ TxM .
Appendix II: Lemma 2.2(b),(c) in the locally symmetric case
Parts (b) and (c) of Lemma 2.2 for locally symmetric manifolds with d = 1 could,
in principle, be derived from Cahen and Parker’s classification [1] of pseudo-Riemann-
ian symmetric manifolds. We prove them here directly, for the reader’s convenience.
Our argument uses assertions (a), (d) in Lemma 2.2, along with (e) for W, which
were established in the proof of Lemma 2.2 before Appendix II was mentioned.
Suppose that ∇R = 0 and d = 1. Replacing M by an open subset, we also
assume that the Olszak distribution D is spanned by a vector field u. By (1),
(9) i) R( · , · )u = Ω ⊗ u or, in coordinates, ii) ulRjkl
s = Ωjku
for some differential 2-form Ω, which obviously does not depend on the choice of u.
(It is also clear from (1) that Ω is the curvature form of the connection in the line
bundle D, induced by the Levi-Civita connection of g.) Being unique, Ω is parallel,
10 A. DERDZINSKI AND W. ROTER
and so are ρ and W, which implies the Ricci identities R · Ω = 0, R · ρ = 0, and
R ·W = 0. In coordinates: Rmlj
sτsk +Rmlk
sτjs = 0, where τ = Ω or τ = ρ, and
(10) Rqpj
sWsklm + Rqpk
sWjslm + Rqpl
sWjksm + Rqpk
sWjkls = 0.
Summing Rmlj
sΩsk + Rmlk
sΩjs = 0 against u
l, we obtain Ω ◦ Ω = 0, where the
metric g is used to treat Ω as a bundle morphism TM → TM that sends each
vector field v to the vector field Ωv with g(Ωv, v ′) = Ω(v, v ′) for all vector fields
v ′. Lemma 2.2(d) and (9.i) give W ( · , · , u, v) = R( · , · , u, v) = 0 for our fixed vector
field u, spanning D, and any section v of D⊥. Hence, by (2), g(u, · ) ∧ σ(v, · ) =
g(v, · ) ∧ σ(u, · ). Thus, σu = cu for the Schouten tensor σ and some constant c,
with σu defined analogously to Ωv. (Otherwise, choosing v such that u, σu and
v are linearly independent at a given point x, we would obtain a contradiction with
the equality between planes in TxM , corresponding to the above equality between
exterior products.) Consequently, g(u, · ) ∧ (σ + cg)(v, · ) = 0, and so σv + cv is a
section of D whenever v is a section of D⊥. Let us now fix u′ as in (4). Symmetry
of σ gives g(σu′, u) = c. In a suitably ordered basis with (5), at any point x, the
endomorphism of TxM corresponding to σx thus has an upper triangular matrix
with the diagonal entries c,−c, . . . ,−c, c, so that trgσ = (4 − n)c. Consequently,
(n − 2) s = 2(n − 1)(4 − n)c, for the scalar curvature s, and (n − 2)ρu = 2cu.
However, contracting (9.ii) in k = s, we get ρu = −Ωu, and so (n− 2)Ωu = −2cu.
The equality Ω ◦Ω = 0 that we derived from the Ricci identity R ·Ω = 0 now gives
c = 0. Hence s = 0 (which yields Lemma 2.2(c)), and ρu = 0.
As c = 0 and σ = ρ, the assertion about σv + cv obtained above means that
ρv is a section of D whenever v is a section of D⊥. Let λ, µ, ξ be the 1-forms
with λ = g(u, · ), µ = g(u′, · ), ξ(u′) = 0, and ρv = ξ(v)u for sections v of D⊥.
Transvecting (9.ii) with µs, we get Ω = R( · , · , u, u
′) = (n − 2)−1λ ∧ ρ(u′, · ) from
Lemma 2.2(c) with ρu = 0 and Lemma 2.2(d). However, evaluating ρ(u′, · ) on
u′, u and sections v of D⊥, we see that ρ(u′, · ) = hλ+ ξ, with h = ρ(u′, u′). (Note
that ξ(u) = 0 since ρu = 0, while D ⊂ D⊥ by Lemma 2.2(a).) Therefore,
(11) i) (n− 2)Ω = λ ∧ ξ , ii) ρ = hλ⊗ λ + λ⊗ ξ + ξ ⊗ λ.
In addition, if v ′ denotes the unique vector field with g(v ′, · ) = ξ, then u and v ′
are null and orthogonal, or, equivalently,
(12) the 1-forms λ and ξ are null and mutually orthogonal.
In fact, g(u, u) = 0 by Lemma 2.2(a), g(u, v ′) = 0 as ξ(u) = 0, and v ′ is null
since (11) yields (n−2)[ρ(Ωu′)−Ω(ρu′)] = 2g(v ′, v ′)u, while, transvecting the Ricci
identity Rmlj
sRsk + Rmlk
sRjs = 0 with u
l and using (9.ii), we see that ρ and Ω
commute as bundle morphisms TM → TM .
Furthermore, transvecting with µkµm the coordinate form Rmlj
sτsk+Rmlk
sτjs = 0
of the Ricci identity R · τ = 0 for the parallel tensor field τ = (n − 2)Ω + ρ =
CONFORMALLY SYMMETRIC MANIFOLDS 11
hλ⊗ λ + 2λ⊗ ξ (cf. (11)), we get 2λjblsξ
s = 0, where b = W (u′, · , u′, · ). Namely,
R = W + (n− 2)−1g ∧ ρ by Lemma 2.2(c), Wmlj
sτsk = 0 in view of Lemma 2.2(d),
µkµmWmlk
sτjs = 2λjblsξ
s since b(u, · ) = 0 (again from Lemma 2.2(d)), and the
remaining terms, related to g ∧ ρ, add up to 0 as a consequence of (12), (11.ii)
and the formula for τ . (Note that (12) gives Rj
sτsk = Rj
sτks = 0, and so four out
of the eight remaining terms vanish individually.) However, u 6= 0, and so λ 6= 0,
which gives b( · , v ′) = 0, where v ′ is the vector field with g(v ′, · ) = ξ. Thus,
W (u′, · , u′, v ′) = 0. As a result, the 3-tensor W ( · , · , · , v ′) must vanish: it yields
the value 0 whenever each of the three arguments is either u′ or a section of D⊥.
(Lemma 2.2(e) for W is already established.)
The relation W ( · , · , · , v ′) = 0 implies in turn that W ( · , · , · , ρv) = 0 (in coor-
dinates: Wjkl
sRsp = 0). In fact, by (11.ii), the image of ρ is spanned by u and v
while W ( · , · , · , u) = 0 according to Lemma 2.2(d).
As in [13, 1o on p. 214], we have W = (λ ⊗ λ) ∧ b (notation of (2)), where,
again, b = W (u′, · , u′, · ). Namely, by Lemma 2.2(e) for W, both sides agree on any
quadruple of vector fields, each of which is either u′ or a section of D⊥.
Finally, transvecting (10) with µkµm and replacing R by W + (n− 2)−1g ∧ ρ, we
obtain two contributions, one from W and one from g ∧ ρ, the sum of which is zero.
Since W = (λ⊗λ)∧ b, the W contribution vanishes: its first two terms add up to 0,
and so do its other two terms. (As we saw, b(u, · ) = 0, while, obviously, b(u′, · ) = 0.)
Out of the sixteen terms forming the g∧ρ contribution, eight are separately equal to
zero since Wjkl
sRsp = 0, and so, in view of (11.ii) and the relation W = (λ⊗ λ) ∧ b,
vanishing of the g∧ρ contribution gives λpSjlq = λqSjlp, for Sjlq = 2bjlξq−bqlξj−bqjξl.
Thus, Sjlq = ηjlλq for some twice-covariant symmetric tensor field η, which, summed
cyclically over j, l, q, yields 0 (due to the definition of Sjlq and symmetry of b). As
λ 6= 0 and the symmetric product has no zero divisors, we get η = 0 and Sjlq = 0.
The expression bjlξq− bqlξj is, therefore, skew-symmetric in j, l. As it is also, clearly,
skew-symmetric in j, q, it must be totally skew-symmetric and hence equal to one-
third of its cyclic sum over j, l, q. That cyclic sum, however, is 0 in view of symmetry
of b, so that bjlξq = bqlξj. Thus, ξ = 0, for otherwise the last equality would yield
b = ϕξ ⊗ ξ for some function ϕ, and hence W = (λ⊗ λ) ∧ b = ϕ(λ⊗ λ) ∧ (ξ ⊗ ξ),
which would clearly imply that the vector field v ′ with g(v ′, · ) = ξ is a section of
the Olszak distribution D, not equal to a function times u (as ξ(u′) = 0, while
g(u, u′) = 1), contradicting one-dimensionality of D. Therefore, ρ = hλ ⊗ λ by
(11.ii) with ξ = 0, which proves assertion (b) of Lemma 2.2 in our case.
References
[1] Cahen, M. & Parker, M., Pseudo-riemannian symmetric spaces. Mem. Amer. Math. Soc.
229 (1980), 1–108.
12 A. DERDZINSKI AND W. ROTER
[2] Chaki, M. C. & Gupta, B., On conformally symmetric spaces. Indian J. Math. 5 (1963),
113–122.
[3] Derdziński, A., On homogeneous conformally symmetric pseudo-Riemannian manifolds. Col-
loq. Math. 40 (1978), 167–185.
[4] Derdziński, A. & Roter, W., On conformally symmetric manifolds with metrics of indices
0 and 1. Tensor (N. S.) 31 (1977), 255–259.
[5] Derdziński, A. & Roter, W., Some theorems on conformally symmetric manifolds. Tensor
(N. S.) 32 (1978), 11–23.
[6] Derdziński, A. & Roter, W., Some properties of conformally symmetric manifolds which
are not Ricci-recurrent. Tensor (N. S.) 34 (1980), 11–20.
[7] Derdzinski, A. & Roter, W., Projectively flat surfaces, null parallel distributions, and
conformally symmetric manifolds. Preprint, math.DG/0604568. To appear in Tohoku Math. J.
[8] Derdzinski, A. & Roter, W., Compact pseudo-Riemannian manifolds with parallel Weyl
tensor. Preprint, http://arXiv.org/abs/math.DG/0702491.
[9] Deszcz, R., On hypercylinders in conformally symmetric manifolds. Publ. Inst. Math.
(Beograd) (N. S.) 51(65) (1992), 101–114.
[10] Deszcz, R. & Hotloś, M., On a certain subclass of pseudosymmetric manifolds. Publ. Math.
Debrecen 53 (1998), 29–48.
[11] Dillen, F. J. E. & Verstraelen, L. C. A. (eds.), Handbook of Differential Geometry, Vol. I.
North-Holland, Amsterdam, 2000.
[12] Hotloś, M., On conformally symmetric warped products. Ann. Acad. Paedagog. Cracov. Stud.
Math. 4 (2004), 75–85.
[13] Olszak, Z., On conformally recurrent manifolds, I: Special distributions. Zesz. Nauk. Politech.
Śl., Mat.-Fiz. 68 (1993), 213–225.
[14] Patterson, E. M. & Walker, A. G., Riemann extensions. Quart. J. Math. Oxford Ser. (2)
3 (1952), 19–28.
[15] Rong, J. P., On 2K∗
space. Tensor (N. S.) 49 (1990), 117–123.
[16] Roter, W., On conformally symmetric Ricci-recurrent spaces. Colloq. Math. 31 (1974), 87–
[17] Sharma, R., Proper conformal symmetries of conformal symmetric space-times. J. Math. Phys.
29 (1988), 2421–2422.
[18] Simon, U., Compact conformally symmetric Riemannian spaces.Math. Z. 132 (1973), 173–177.
Department of Mathematics
The Ohio State University
Columbus, OH 43210
E-mail address : andrzej@math.ohio-state.edu
Institute of Mathematics and Computer Science
Wroc law University of Technology
Wybrzeże Wyspiańskiego 27, 50-370 Wroc law
Poland
E-mail address : roter@im.pwr.wroc.pl
http://arxiv.org/abs/math/0604568
	Introduction
	1. Preliminaries
	2. The Olszak distribution
	3. The case d=2
	4. The case d=1
	5. Proof of Theorem ??
	Appendix I: Proof of Lemma ??
	Appendix II: Lemma ??(b),(c) in the locally symmetric case
	References
ABSTRACT
  This is a final step in a local classification of pseudo-Riemannian manifolds
with parallel Weyl tensor that are not conformally flat or locally symmetric.

<|endoftext|><|startoftext|>
Introduction
The space-time development of a hadronic system is still poorly understood, and
models are necessary to transform a partonic system, governed by perturbative QCD, to
final state hadrons observed in the detectors.
WW events produced in e+e− collisions at LEP-2 constitute a unique laboratory to
study and test the evolution of such hadronic systems, because of the clean environment
and the well-defined initial energy in the process. Of particular interest is the possibility
to study separately one single evolving hadronic system (one of the W bosons decaying
semi-leptonically, the other decaying hadronically), and compare it with two hadronic
systems evolving at the same time (both W bosons decaying hadronically).
Interconnection effects between the products of the hadronic decays of the two W
bosons (in the same event) are expected since the lifetime of the W bosons (τW ≃ ~/ΓW ≃
0.1 fm/c) is an order of magnitude smaller than the typical hadronization times. These
effects can happen at two levels:
• in the evolution of the parton shower, between partons from different hadronic sys-
tems by exchanging coloured gluons [1] (this effect is called Colour Reconnection
(CR) for historical reasons);
• between the final state hadrons, due to quantum-mechanical interference, mainly
due to Bose-Einstein Correlations (BEC) between identical bosons (e.g. pions with
the same charge).
A detailed study by DELPHI of this second effect was recently published [2].
The first effect, the possible presence of colour flow between the two W hadronization
systems, is the topic studied in this paper. This effect is worthy of study in its own right
and for the possible effects induced on the W mass measurement in fully hadronic events
(see for instance [3] for an introduction and [4] for an experimental review).
The effects at the perturbative level are expected to be small [3], whereas they may
be large at the hadronization level (many soft gluons sharing the space-time) for which
models have to be used to compare with the data.
The most tested model is the Sjöstrand-Khoze “Type 1” CR model SK-I [5]. This
model of CR is based on the Lund string fragmentation phenomenology. The strings are
considered as colour flux tubes with some volume, and reconnection occurs when these
tubes overlap. The probability of reconnection in an event is parameterised by the value
κ, set globally by the user, according to the space-time volume overlap of the two strings,
Voverlap :
Preco(κ) = 1− e−κVoverlap . (1)
The parameter κ was introduced in the SK-I model to allow a variation of the percentage
of reconnected events and facilitate studies of sensitivity to the effect. In this model
only one string reconnection per event was allowed. The authors of the model propose
the value of κ = 0.66 to give similar amounts of reconnection as other models of Colour
Reconnection. By comparing the data with the model predictions evaluated at several
κ values, it is possible to determine the value of κ most consistent with the data and
extract the corresponding reconnection probability. Another model was proposed by the
same authors, considering the colour flux tubes as infinitely thin, which allows for Colour
Reconnection in the case the tubes cross each other and provided the total string length
is reduced (SK-II′). This last model was not tested.
Two further models are tested here, these are the models implemented in HERWIG [6]
and ARIADNE [7] Monte Carlo programs. In HERWIG the partons are reconnected, with a
reconnection probability of 1/9, if the reconnection results in a smaller total cluster mass.
In ARIADNE, which implements an adapted version of the Gustafson-Häkkinen model [8],
the model used [9] allows for reconnections between partons originating in the same W
boson, or from different W bosons if they have an energy smaller than the width of the
W boson (this model will be referred as ‘AR-2’).
Colour Reconnection has been previously investigated in DELPHI by comparing in-
clusive distributions of charged particles, such as the charged-particle multiplicity dis-
tribution or the production of identified (heavy) particles, in fully hadronic WW events
and the distributions in semi-leptonic WW events. The investigations did not show any
effect as they were limited by statistical and systematic errors and excluded only the
most extreme models of CR (see [10]).
This article presents the results of the investigations of Colour Reconnection effects in
hadronically decaying W pairs using two techniques. The first, proposed by L3 in [11],
looks at the particle flow between the jets in a 4-jet WW event. The second, proposed
by DELPHI in [12], takes into account the different sensitivity to Colour Reconnection of
several W mass estimators. The first technique is more independent of the model and it
can provide comparisons based on data. The second technique is more dependent on the
model tested, but has a much larger sensitivity to the models SK-I and HERWIG. Since
the particle flow and W mass estimator methods were found to be largely uncorrelated a
combination of the results of these two methods is provided.
The paper is organised as follows. In the next section, the LEP operation and the
components of the DELPHI detector relevant to the analyses are briefly described. In
section 3 data and simulation samples are explained. Then both of the analysis methods
discussed above are described and their results presented in sections 4 and 5. The com-
bination of the results is given in section 6 and conclusions are drawn in the seventh and
final section.
2 LEP Operation and Detector Description
At LEP-2, the second phase of the e+e− collider at CERN, the accelerator was operated
at centre-of-mass energies above the threshold for double W boson production from 1996
to 2000. In this period, the DELPHI experiment collected about 12000 WW events
corresponding to a total integrated luminosity of 661 pb−1. About 46% of the WW
events are WW → q1q̄2q3q̄4 events (fully hadronic), and 44% are WW → q1q̄2ℓν̄, where
ℓ is a lepton (semi-leptonic).
The detailed description of the DELPHI detector and its performance is provided
in [13,14]. A brief summary of the main characteristics of the detector important for the
analyses follows.
The tracking system of DELPHI consisted of a Time Projection Chamber (TPC), the
main tracking device of DELPHI, and was complemented by a Vertex Detector (VD)
closest to the beam pipe, the Inner and the Outer Detectors in the barrel region, and two
Forward Chambers in the end caps. It was embedded in a 1.2 T magnetic field, aligned
parallel to the beam axis.
The electromagnetic calorimeter consisted of the High density Projection Chamber
(HPC) in the barrel region, the Forward Electromagnetic Calorimeter (FEMC) and the
Small angle Tile Calorimeter (STIC) in the forward regions, complemented by detectors to
tag the passage of electron-positron pairs from photons converted in the regions between
the FEMC and the HPC. The total depths of the calorimeters corresponded to about 18
radiation lengths. The hadronic calorimeter was composed of instrumented iron with a
total depth along the shortest trajectory for a neutral particle of 6 interaction lengths,
and covered 98% of the total solid angle. Embedded in the hadronic calorimeter were
two planes of muon drift chambers to tag the passage of muons. The whole detector was
surrounded by a further double plane of staggered muon drift chambers.
For LEP-2, the DELPHI detector was upgraded as described in the following.
Changes were made to some of the subdetectors, the trigger system [15], the run
control and the algorithms used in the offline reconstruction of tracks, which improved
the performance compared to the earlier LEP-1 period.
The major changes were the extensions of the Vertex Detector (VD) and the Inner
Detector (ID), and the inclusion of the Very Forward Tracker (VFT) [16], which increased
the coverage of the silicon tracker to polar angles with respect to the z-axis1 of 11◦ <
θ < 169◦. To further improve the track reconstruction efficiency in the forward regions
of DELPHI, the tracking algorithms and the alignment and calibration procedures were
optimised for LEP-2.
Changes were also made to the electronics of the trigger and timing system which
improved the stability of the running during data taking. The trigger conditions were
optimised for LEP-2 running, to give high efficiency for 2- and 4-fermion processes in the
Standard Model and also to give sensitivity to events which may have been signatures
of new physics. In addition, improvements were made to the operation of the detector
during the LEP operating states, to prepare the detector for data taking at the very start
of stable collisions of the e+e− beams, and to respond to adverse background from LEP
when it arose. These changes led to an overall improvement in the efficiency for collecting
the delivered luminosity from about 85% in 1995, before the start of LEP-2, to about
95% at the end in 2000.
During the operation of the DELPHI detector in 2000 one of the 12 sectors of the
central tracking chamber, the TPC, failed. After 1st September it was not possible to
detect the tracks left by charged particles inside the broken sector. The data affected
corresponds to around 1/4 of the data collected in 2000. Nevertheless, the redundancy of
the tracking system of DELPHI meant that tracks passing through the sector could still
be reconstructed from signals in any of the other tracking detectors. As a result, the track
reconstruction efficiency was only slightly reduced in the region covered by the broken
sector, but the track parameter resolutions were degraded compared with the data taken
prior to the failure of this sector.
3 Data and Simulation Samples
The analyses presented here use the data collected by DELPHI in the years 1997 to
2000, at centre-of-mass energies
s between 183 and 209 GeV. The data collected in the
year 2000 with the TPC working in full, with centre-of-mass energies from 200 to 208
GeV and a integrated luminosity weighted average centre-of-mass energy of 206 GeV, were
analysed together. Data acquired with the TPC with a broken sector, corresponding to a
integrated luminosity weighted average centre-of-mass energy of 207 GeV, were analysed
separately and included in the results presented here.
The total integrated luminosity of the data sample is 660.8 pb−1, and the integrated
luminosity weighted average centre-of-mass energy of the data is 197.1 GeV.
To compare with the expected results from processes in the Standard Model including
or not including CR, Monte Carlo (MC) simulation was used to generate events and
1The DELPHI coordinate system is a right-handed system with the z-axis collinear with the incoming electron beam,
and the x axis pointing to the centre of the LEP accelerator.
simulate the response of the DELPHI detector. These events were reconstructed and
analysed with the same programs as used for the real data.
The 4-fermion final states were generated with the code described in [17], based on
WPHACT [18], for the WW signal (charged currents) and for the ZZ background (neutral
currents), after which the events were fragmented with PYTHIA [19] tuned to DELPHI
data [20]. The same WW events generated at 189, 200 and 206 GeV were also fragmented
with PYTHIA implementing the SK-I model, with 100% reconnection probability. The
systematic effects of fragmentation were studied using the above WW samples and WW
samples generated with WPHACT and fragmented with either ARIADNE [7] or HERWIG [6] at
183, 189, 200 and 206 GeV. For systematic studies of Bose-Einstein Correlations (BEC),
WW samples generated with WPHACT and fragmented with PYTHIA implementing the BE32
model [21] of BEC, were used at all energies, except at 207 GeV. The integrated luminosity
of the simulated samples was at least 10 times that of the data of the corresponding year,
and the majority corresponded to 100 times that of the data.
To test the consistency of the SK-I model and measure the κ parameter, large WW
samples were generated in an early stage of this work with EXCALIBUR [22] at 200 and
206 GeV, keeping only the fully hadronic decays. These samples were then fragmented
with PYTHIA. It was verified for smaller subsets that the results using these large samples
and the samples generated later with WPHACT are compatible.
The qq̄(γ) background events were generated at all energies with KK2f [23] and frag-
mented with PYTHIA. For systematic studies, similar KK2f samples fragmented with
ARIADNE [7] were used at 183, 189, 200 and 206 GeV.
These samples will be referred to as “DELPHI samples”.
At 189 GeV, to compare with the other LEP experiments and with different CR mod-
els, 6 samples generated with KORALW [24] for the 4-fermion final states were also used.
These samples 2 will be referred to as “Cetraro samples”. The events in the different sam-
ples have the final state quarks generated with the same kinematics, and differ only in the
parton shower evolution and fragmentation. Three samples were fragmented respectively
with PYTHIA, ARIADNE and HERWIG (using the tuning of the ALEPH collaboration), with
no CR implementation. Three other samples were fragmented in the same manner but
now implementing several CR models: the SK-I model with 100% reconnection proba-
bility, the AR-2 model, and the HERWIG implementation of CR with 1/9 of reconnected
events, respectively.
4 The Particle Flow Method
The first of the two analyses presented in this paper is based on the so-called “particle
flow method”. The particle flow algorithm is based on the selection of special event
topologies, in order to obtain well defined regions between any two jets originating from
the same W (called the Inside-W region) or from different Ws (called the Between-W
region). It is expected that Colour Reconnection decreases (increases) particle production
in the Inside-W (Between-W) region. Hence, by studying the particle production in the
inter-jet regions it is possible to measure the effects of Colour Reconnection. However,
this method requires a selection of events with a suitable topology (see below) which has
a low efficiency (<∼25%).
2produced by ALEPH after the LEP-W Physics Workshop in Cetraro, Italy, October 2001
4.1 Event and Particle Selection
Events with both Ws decaying into q1q̄2 are characterised by high multiplicity, large
visible energy, and the tendency of the particles to be grouped in 4 jets. The background
is dominated by qq̄(γ) events.
Charged particles were required to have momentum p larger than 100 MeV/c and below
1.5 times the beam energy, a relative error on the momentum measurement ∆p/p < 1,
and polar angle θ with respect to the beam axis between 20◦ and 160◦. To remove tracks
from secondary interactions, the distance of closest approach of the extrapolated track
to the interaction point was required to be less than 4 cm in the plane perpendicular to
the beam axis and less than 4/sin θ cm along the beam axis, and the reconstructed track
length was required to be larger than 30 cm.
Clusters in the electromagnetic or hadronic calorimeters with energy larger than 0.5
GeV and polar angle in the interval 10◦ < θ < 170◦, not associated to charged particles,
were considered as neutral particles.
The events were pre-selected by requiring at least 12 charged particles, with a sum
of the modulus of the momentum transverse to the beam axis, of charged and neutral
particles, above 20% of the centre-of-mass energy. These cuts reduced the contributions
from gamma-gamma processes and beam-gas interactions to a negligible amount. The
momentum distribution of the charged particles for the pre-selected events is shown in
Figure 1 and compared to the expected distribution from the simulation. A good agree-
ment between data and simulation is observed.
DELPHI 189 GeV
WW WPHACT
WW semileptonic
p (GeV/c)
0 5 10 15 20 25 30 35 40 45 50
DELPHI 189 GeV
WW WPHACT
WW semileptonic
p (GeV/c)
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
Figure 1: Momentum distribution for charged particles (range 0-50 GeV/c (a) and
0-5 GeV/c (b)). Points represent the data and the histograms represent the contributions
from simulation for the different processes (signal (white) and background contributions).
About half of the e+e−→qq̄(γ) events at high-energy are associated with an energetic
photon emitted by one of the beam electrons or positrons (radiative return events), thus
reducing the energy available in the hadronic system to the Z mass. To remove these
radiative return events, the effective centre-of-mass energy
s′, computed as described
in [25], was required to be above 110 GeV. It was verified that this cut does not affect the
signal from W pairs, but reduces significantly the contribution from the qq̄(γ) process.
In the WW fully hadronic decays four well separated energetic jets are expected which
balance the momentum of the event and have a total energy near to the centre-of-mass
energy. The charged and neutral particles in the event were thus clustered using the
DURHAM algorithm [26], for a separation value of ycut = 0.005, and the events were
kept if there were 4 and only 4 jets and a multiplicity (charged plus neutral) in each
jet larger than 3. The combination of these two cuts removed most of the semi-leptonic
WW decays and the 2-jet and 3-jet events of the qq̄(γ) background. The charged-particle
multiplicity distribution for the selected events at 189 GeV is given in Figure 2, with data
points compared to the histogram from simulation of signal and background processes.
0 10 20 30 40 50 60 70 80 90
DELPHI 189 GeV
WW WPHACT
WW semileptonic
Figure 2: Uncorrected charged-particle multiplicity distribution at a centre-of-mass en-
ergy of 189 GeV. Points represent the data and the histograms represent the contribution
from simulation for the different processes.
For the study of the charged-particle flow between jets, the initial quark configuration
should be well reconstructed with a good quark-jet association. At 183 GeV and above,
the produced W bosons are significantly boosted. This produces smaller angles in the
laboratory frame of reference between the jets into which the W decays, when compared
to these angles at threshold (back-to-back). Hence, this property tends to reduce the
ambiguity in the definition of the Between-W and Inside-W regions. The selection criteria
were designed in order to minimize the situation of one jet from one W boson appearing
in the Inside-W region of the other W boson.
The selection criteria are based on the event topology, with cuts in 4 of the 6 jet-jet
angles. The smallest and the second smallest jet-jet angle should be below 100◦ and not
adjacent (not have a common jet). Two other jet-jet angles should be between 100◦ and
140◦ and not adjacent (large angles).
In the case that there are two different combinations of jets satisfying the above criteria
for the large angles, the combination with the highest sum of large angles is chosen. This
selection increases the probability to have a correct pairing of jets to the same W boson.
s L Eff. Pur. Nsel MC tot. WW 4j qq̄(γ) ZZ W lep. εPAIR
183 52.7 22% 74% 127 114.2 84.4 22.3 0.7 7.0 69%
189 157.6 21% 75% 340 341.4 255.9 56.8 2.4 26.4 75%
192 25.9 21% 75% 61 56.1 41.9 9.4 0.4 4.4 77%
196 77.3 19% 74% 176 159.2 117.6 26.2 1.3 14.0 79%
200 83.4 18% 72% 173 165.0 119.5 27.8 1.3 16.4 82%
202 40.6 17% 72% 82 75.7 54.6 12.5 0.7 8.0 82%
206 163.9 15% 70% 282 274.7 193.1 47.8 2.7 31.1 79%
207 59.4 15% 70% 102 99.7 70.1 17.6 1.0 11.1 80%
Table 1: Centre-of-mass energy (
s in GeV), integrated luminosity (L in pb−1), efficiency
and purity of the data samples, number of selected events, number of expected events
from 4-jet WW and background processes (total and separated by process), and efficiency
of correct pairing of jets to the same W boson.
The integrated luminosity, the efficiency to select 4-jet WW events and the purity of
the selected data samples, estimated using simulation, and the number of selected events
are summarised for each centre-of-mass energy in Table 1. The numbers of expected
events are also given separately for the signal and the background processes, and were
estimated using simulation. The efficiency to select the correct pairing of jets to the
same W boson, estimated with simulation as the fraction of WW events for which the
selected jets 1 and 2 (see later) correspond indeed to the same W boson, is given in the
last column of the Table.
The efficiency of the event selection criteria decreases with increasing centre-of-mass
energy. This is primarily due to the ‘large’ angles being reduced as a result of the
increased boost (becoming lower than the cut value of 100◦) and ‘small’ angles being
increased due to the larger phase-space available (becoming higher than the cut value of
100◦). Much for the same reason, the efficiency to assign two jets to the same W boson in
the selected events increases slightly with increasing centre-of-mass energy, in opposition
to what would happen at threshold with the W boson decaying into two back-to-back
jets, that would never be selected to come from the same W boson by the requirement
that their interjet angle should be between 100◦ and 140◦.
In the following analysis the jets and planar regions are labeled as shown in Figure 3:
the planar region corresponding to the smallest jet-jet angle is region B in the plane
made by jets 2 and 3; the second smallest jet-jet angle corresponds to the planar region D
between jets 1 and 4 in the plane made by these two jets; the planar region corresponding
to the greatest of the large jet-jet angles in this combination is region A and spans
the angle between jets 1 and 2 in the plane made by these jets; and finally region C
corresponds to the planar region spanned by the second large angle, between jets 3 and
4 in the plane made by these two jets. In general, the planar regions are not in the same
plane, as the decay planes of the W bosons do not coincide, and the large angles in this
combination are not necessarily the largest jet-jet angles in the event.
The distribution of the reconstructed masses of the jet pairings (1,2) and (3,4), after
applying a 4C kinematic fit requiring energy and momentum conservation, is shown in
Figure 4 (two entries per event). In the figure, data at 189 GeV (points) are compared
to the expected distribution from the 4-jet WW signal without CR, plus background
processes, estimated using the simulation (histograms). The contribution from the 4-jet
Inside-W region
Inside-W region
Between-W region
Between-W region
jet 2
jet 1
jet 3
jet 4
Figure 3: Schematic drawing of the angular selection.
WW signal simulation is split between the case in which the two pairs of jets making the
large angles actually come from their parent W bosons and the case in which the jets of
a pair come from different W bosons (mismatch).
4.2 Particle Flow Distribution
The particle flow analysis uses the number of particles in the Inside-W and the
Between-W regions. An angular ordering of the jets is performed as in Figure 3. The
two large jet-jet angles in the event are used to define the Inside-W regions, and the two
smallest angles span the Between-W regions, the regions between the different Ws.
In general, the two W bosons will not decay in the same plane, and this must be
accounted for when comparing the particle production in the Inside-W and Between-W
regions. So, for each region (A, B, C and D) the particle momenta of all charged particles
are projected onto the plane spanned by the jets of that region: jets 1 and 2 for region
A; jets 2 and 3 for region B; jets 3 and 4 for region C; jets 4 and 1 for region D. Then,
for each particle the rescaled angle Φrescaled is determined as a ratio of two angles:
Φrescaled = Φi/Φr , (2)
when the particle momentum is projected onto the plane of the region r. The angle Φi is
then the angle between the projected particle momentum and the first mentioned jet in
the definition of the regions given above. The angle Φr is the full opening angle between
the jets. Hence Φrescaled varies between 0 and 1 for the particles whose momenta are
projected between the pair of jets defining the plane.
However, due to the aplanarity of the event about 9% of the particles in the data and in
the 4-jet WW simulation have projected angles outside all four regions. These particles
were discarded from further analysis. In the case where a particle could be projected
onto more than one region, with 0 < Φrescaled < 1, the solution with the lower momentum
transverse to the region was used. This happened for about 13% of the particles in data,
after background subtraction, and in the 4-jet WW simulation.
This leads to the normalised particle flow distribution shown in Figure 5 at 189 GeV,
where the rescaled angle of region A is plotted from 0 to 1, region B from 1 to 2, region
0 20 40 60 80 100 120
DELPHI 189 GeV
WW WPHACT
WW wrong pairing
WW semileptonic
Mass (GeV/c
Figure 4: Reconstructed dijet masses (after a 4C kinematic fit) for the selected pairs at
189 GeV (2 entries per event)(see text).
C from 2 to 3 and region D from 3 to 4. The statistical error on the bin contents (the
average multiplicity per bin of Φrescaled divided by the bin width) was estimated using
the Jackknife method [27], to correctly account for correlations between different bins.
In this distribution the regions between the jets coming from the same W bosons (A and
C), and from different W bosons (B and D), have the same scale and thus can be easily
compared.
After subtracting bin-by-bin the expected background from the observed distributions,
we define the Inside-W (Between-W) particle flow as the bin-by-bin sum of regions A and
C (B and D). These distributions are compared by performing the bin-by-bin ratio of
the Inside-W particle flow to the Between-W particle flow. This ratio of distributions is
shown for 189 GeV and 206 GeV in Figure 6. The data points are compared to several
fully simulated WW MC samples with and without CR.
A good agreement was found between the predictions using the WPHACT WW MC sam-
ples and the predictions based on the KORALW WW MC samples, both for the scenario
without CR and for the scenario with CR (SK-I model with 100% probability of recon-
nection). For both sets of predictions the regions of greatest difference between the two
scenarios span the rescaled variable Φrescaled from 0.2 to 0.8.
4.3 Particle Flow Ratio
After summing the particle flow distributions for regions A and C, and regions B and
D, the resulting distributions are integrated from 0.2 to 0.8. The ratio R of the Inside-W
to the Between-W particle flow is then defined as (with Φ being the rescaled variable
Φrescaled):
0 0.5 1 1.5 2 2.5 3 3.5 4
A B C D
DELPHI 189 GeV
WPH. SK-I 100%
WW WPHACT
WW semilept.
Φrescaled
Figure 5: Normalised charged-particle flow at 189 GeV. The lines correspond to the sum
of the simulated 4-jet WW signal with the background contributions (estimated from
DELPHI MC samples), normalised to the total number of expected events (Nevents). The
dashed histogram corresponds to the sum with the simulated 4-jet WW signal generated
by WPHACT with 100% SK-I.
0 0.2 0.4 0.6 0.8 1
Φrescaled
DELPHI 189 GeV
KW. SK-I 100%WPH. SK-I 100%
WW KORALWWW WPHACT
0 0.2 0.4 0.6 0.8 1
Φrescaled
DELPHI 206 GeV
WPH. SK-I 100%
WW WPHACT
Figure 6: The ratio of the particle flow distributions (A+C)/(B+D) at 189 GeV (a) and at
206 GeV (b). The data (dots) are compared to WW MC samples generated with WPHACT
(DELPHI samples) and KORALW (Cetraro samples), both without CR and implementing
the SK-Imodel with 100% probability of reconnection. The lines corresponding to WPHACT
are hardly distinguishable from the lines corresponding to KORALW in the same condition
of implementation of CR.
s (GeV) RData Rno CR RSK-I:100%
183 0.889 ± 0.084 0.928 ± 0.005 -
189 1.025 ± 0.063 0.966 ± 0.006 0.864 ± 0.005
192 1.008 ± 0.150 0.970 ± 0.006 -
196 1.041 ± 0.093 0.995 ± 0.006 -
200 0.922 ± 0.084 1.022 ± 0.007 0.889 ± 0.006
202 0.952 ± 0.126 1.015 ± 0.008 -
206 1.116 ± 0.088 1.012 ± 0.008 0.889 ± 0.006
207 1.039 ± 0.135 1.019 ± 0.008 -
Table 2: Values of the ratio R for each energy (errors are statistical only), and expected
values with errors due to limited statistics of the simulation, all from DELPHI WPHACT
WW samples.
MC Sample χ2/DF α,A β,B γ
no CR 7.31/5 1.001± 0.003 (3.20± 0.36)× 10−3 (−1.35± 0.40)× 10−4
SK-I 100% 1.46/1 0.880± 0.003 (1.68± 0.44)× 10−3 -
Table 3: Results of the fit to the evolution of R with (
s(GeV)− 197.5).
∫ 0.8
dnch/dΦ(A+ C)dΦ
∫ 0.8
dnch/dΦ(B +D)dΦ
. (3)
To take into account possible statistical correlations between particles in the Inside-W
and Between-W regions, the statistical error on this ratio R was again estimated through
the Jackknife method [27].
The values forR obtained for the different centre-of-mass energies are shown in Table 2,
and compared to the expectations from the DELPHI WPHACT WW samples without CR
and implementing the SK-I model with 100% reconnection probability. These values for
data and MC are plotted as function of the centre-of-mass energy in Figure 7.
The changes in the value of R for the MC samples are mainly due to the dif-
ferent values of the boost of the W systems. In order to quantify this effect
a linear function R(
s− 197.5) = A + B · (
s− 197.5) was fitted to the MC points
with CR (with
s in GeV), while for the points without CR the quadratic func-
tion R(
s− 197.5) = α+ β · (
s− 197.5) + γ · (
s− 197.5)2 was assumed (with
GeV), giving reasonable χ2/d.o.f. values. The fits yielded the results shown in Table 3.
The MC without CR shows a stronger dependence on
s. The function fitted to this
sample was used to rescale the measured values of R for the data collected at different
energies to the energy of 189 GeV, the centre-of-mass energy at which the combination
of the results of the LEP experiments was proposed in [4]. All the rescaled values were
combined with a statistical error-weighted average. The average of the R ratios rescaled
to 189 GeV was found to be
〈R〉 = 0.979± 0.032(stat). (4)
180 185 190 195 200 205 210
WW no CR Fit to no CR
WW SK-I 100% Fit to SK-I 100%Fit to SK-I 100%
Combined Ratio
√s (GeV)
DELPHI
Figure 7: The ratio R as function of
s for data and MC (DELPHI WPHACTWW samples),
and fits to the MC with and without CR, and the combined ratio after rescaling all values
s = 189 GeV (see text). The value of the combined ratio at 189 GeV is shown at
a displaced energy (upwards by 1 GeV) for better visibility, as well as all the values
for the MC ‘WW no CR’ points and the corresponding fitted curve which are shown at
centre-of-mass energies shifted downwards by 0.5 GeV. All errors for the MC values are
smaller than the size of the markers.
Performing the same weighted average when using for the rescaling the fit to the MC
with CR, one obtains:
〈RCR rescale〉 = 0.987± 0.032(stat). (5)
Repeating the procedure, but now without rescaling the R ratios, the result is:
〈Rno rescale〉 = 0.999± 0.033(stat). (6)
4.4 Study of the Systematic Errors in the Particle Flow
The following effects were studied as sources of systematic uncertainties in this anal-
ysis.
4.4.1 Fragmentation and Detector response
A direct comparison between the particle flow ratios measured in fully hadronic data
and MC samples, R4qData and R4qMC, respectively, is hampered by the uncertainties as-
sociated with the modelling of the WW fragmentation and the detector response. These
systematic uncertainties were estimated using mixed semi-leptonic events. In this tech-
nique, two hadronically decaying W bosons from semi-leptonic events were mixed together
to emulate a fully hadronic WW decay.
Mixing Technique
Semi-leptonic WW decays were selected from the data collected by DELPHI at centre-
of-mass energies between 189 and 206 GeV, by requiring two hadronic jets, a well isolated
identified muon or electron or, in case of a tau candidate, a well isolated particle, all
associated with missing momentum (corresponding to the neutrino) pointing away from
the beam pipe. A neural network selection, developed in [28], was used to select the
events. The same procedure was applied to the WPHACT samples fragmented with PYTHIA
and HERWIG at centre-of-mass energies of 189, 200 and 206 GeV and with ARIADNE at 189
and 206 GeV. The background to this selection was found to be of negligible importance
in this analysis. Samples of mixed semi-leptonic events were built separately at each
centre-of-mass energy for data and Monte Carlo semi-leptonic samples, following the
mixing procedure developed in [2].
In each semi-leptonic event, the lepton (or tau-decay jet) was stripped off and the
remaining particles constituted the hadronically decaying W boson. Two hadronically
decaying W bosons were then mixed together to emulate a fully hadronic WW decay.
The hadronic parts of W bosons were mixed in such a way as to have the parent W
bosons back-to-back in the emulated fully hadronic WW decay. To increase the statistics
of emulated events, and profiting from the cylindrical symmetry of the detector along
the z axis, the hadronic parts of W bosons were rotated around the z axis, but were not
moved from barrel to forward regions or vice-versa, as detailed in the following.
When mixing the hadronic parts of different W events it was required that the two Ws
had reconstructed polar angles back-to-back or equal within 10 degrees. In the latter case,
when both Ws are on the same side of the detector, the z component of the momentum
is sign flipped for all the particles in one of the Ws.
The particles of one W event were then rotated around the beam axis, in order to have
the two Ws also back-to-back in the transverse plane. Each semi-leptonic event was used
in the mixing procedure between 4 and 9 times, to minimize the statistical error on the
particle flow ratio R measured in the mixed semi-leptonic data sample.
The mixed events were then subjected to the same event selection and particle flow
analysis used for the fully hadronic events. The particle flow ratios Rmixed SLData and
Rmixed SLMC were measured in the mixed semi-leptonic data and MC samples, respectively,
and are plotted as function of the centre-of-mass energy in Figure 8.
The values of Rmixed SL measured in MC show a dependence on
s. This effect is
quantified by performing linear fits to the points measured with PYTHIA, ARIADNE and
HERWIG, respectively. The differences between the measured slopes were found to be small.
The function fitted to the PYTHIA points was used to rescale the values of R measured in
data at different energies to 189 GeV. The rescaled values were then combined using as
weights the scaled statistical errors. The weighted average R at 189 GeV for the mixed
semi-leptonic events built from data was found to be
〈Rmixed SLData〉 = 1.052± 0.027(stat). (7)
For each MC sample, the ratio Rmixed SLData/Rmixed SLMC was used to calibrate the
particle flow ratio measured in the corresponding fully hadronic sample, R4qMC, to
compare it to the ratio measured in the data, 〈R4qData〉. The correction factor
Rmixed SLData/Rmixed SLMC was computed from the values of R rescaled to 189 GeV, calcu-
lated from the fits to the mixed semi-leptonic samples built from the data and the MC.
185 190 195 200 205 210
Data Mixed SL
PYTHIA
Fit to PYTHIA
ARIADNE
Fit to ARIADNE
HERWIG
Fit to HERWIG
√s (GeV)
DELPHI
Figure 8: The ratio Rmixed SL as function of
s for data and MC, and fits to the MC (see
text). The ARIADNE points at 189 GeV and at 206 GeV have their centre-of-mass energy
shifted and the error bars on data are tilted for readability.
MC sample PYTHIA ARIADNE HERWIG
Rmixed SLData/Rmixed SLMC 1.053 1.044 0.997
RCalibrated4qMC 1.018 1.011 1.004
Table 4: Ratio of data to MC fitted values of R in mixed semi-leptonic samples, used
to calibrate the R4qMC values for different models (upper line), and calibrated values of
R4qMC. All values were computed at
s = 189 GeV.
The values for Rmixed SLMC are presented in Table 4, for the different models, along with
the calibrated values of R4q for the same models.
The calibration factors differ from unity by less than 6%, and the largest difference of
the calibrated R4qMC values when changing the fragmentation model, 0.014, was consid-
ered as an estimate of the systematic error due to simulation of the fragmentation and of
the detector response, and was added in quadrature to the systematic error. The error
in the calibrated R4qMC values due to the statistical error on 〈Rmixed SLData〉 value used
for the calibration, 0.026, was also added in quadrature to the systematic error.
4.4.2 Bose-Einstein Correlations
Bose-Einstein correlations (BEC) between identical pions and kaons are known to
exist and were established and studied in Z hadronic decays in [29]. They are expected
to exist with a similar behaviour in the W hadronic decays, and this is studied in [2].
They are implemented in the MC simulation samples with BEC via the BE32 model
of LUBOEI [21], which was tuned to describe the DELPHI data in [2]. However, the
situation for the WW (ZZ) fully hadronic decays is not so clear, i.e. whether there are
correlations only between pions and kaons coming from the same W(Z) boson or also
between pions and kaons from different W(Z) bosons. The analyses of Bose-Einstein
correlations between identical particles coming from the decay of different W bosons do
not show a significant effect [30] for three of the LEP experiments, whereas for DELPHI,
an effect was found at the level of 2.4 standard deviations [2]. Thus, a comparison was
made between the WPHACT samples without CR and with BEC only between the identical
pions coming from the same W boson (BEC only inside), to the samples without CR
and with BEC allowed for all the particles stemming from both W bosons, implemented
with the BE32 variant of the LUBOEI model (BEC all). The R values were obtained
at each centre-of-mass energy, after which a linear fit was performed for each model to
obtain a best prediction at 189 GeV. The fit values were found to be in agreement to the
estimate at 189 GeV alone, and for simplicity this estimate was used. The measurement
of BEC from DELPHI of 2.4 standard deviations above zero (corresponding to BEC only
inside), was used to interpolate the range of 4.1 standard deviations of separation between
BEC only inside and BEC all. To include the error on the measured BEC effect, one
standard deviation was added to the effect before the interpolation. The difference in the
estimated values of R at
s = 189 GeV, between the model with BEC only inside and
the model with partial BEC all (at the interpolated point of 3.4/4.1), -0.013, was added
in quadrature to the systematic error.
4.4.3 qq̄(γ) Background Shape
The fragmentation effects, in the shape of the qq̄(γ) background, were estimated by
comparing the values of R obtained when the subtracted qq̄(γ) sample was fragmented
with ARIADNE instead of PYTHIA at the centre-of-mass energy of 189 GeV, and the differ-
ence, 0.003, was added in quadrature to the systematic error.
4.4.4 qq̄(γ) and ZZ Background Contribution
At the centre-of-mass energy of 189 GeV, the qq̄(γ) cross-section in the 4-jet region
is poorly known, due to the difficulty in isolating the qq̄(γ) → 4-jet signal from other
4-jet processes such as WW and ZZ. The study performed in [31] has shown that the
maximal difference in the estimated qq̄(γ) background rate is 10% coming from changing
from PYTHIA to HERWIG as the hadronization model, with the ARIADNE model giving
intermediate results. Conservatively, at each centre-of-mass energy a variation of 10%
on the qq̄(γ) cross-section was assumed, and the largest shift in R, 0.011, was added in
quadrature to the systematic error.
The other background process considered is the Z pair production. The Standard
Model predicted cross-sections are in agreement with the data at an error level of 10% [32].
The cross-section was thus varied by ±10% at each energy and the effect in R was found
to be negligible.
4.4.5 Evolution of R with Energy
The R ratios were rescaled to
s = 189 GeV using the fit to the MC without CR,
however the correct behaviour might be given by the MC with CR. Hence, the difference of
0.009 between the R values obtained using the two rescaling methods, using MC without
CR 〈R〉 and with CR 〈RCR rescale〉, was added in quadrature to the systematic error.
4.5 Results of the Particle Flow Analysis
The final result for the average of the ratios R rescaled to 189 GeV is
MC Sample R
PYTHIA no CR 1.037± 0.004
PYTHIA SK-I 100% 0.917± 0.003
ARIADNE no CR 1.053± 0.004
ARIADNE AR2 1.021± 0.004
HERWIG no CR 1.059± 0.004
HERWIG 1/9 CR 1.040± 0.003
Table 5: R ratios for the Cetraro samples at 189 GeV, calibrated with the mixed semi-lep-
tonic events.
〈R〉 = 0.979± 0.032(stat)± 0.035(syst). (8)
In order to facilitate comparisons between the four LEP experiments, this value can
be normalised by the one determined from simulation samples produced with the full
detector simulation and analysed with the same method. The LEP experiments agreed
to use for this purpose the Cetraro PYTHIA samples. These events were generated with the
ALEPH fragmentation tuning but have been reconstructed with the DELPHI detector
simulation and analysed with this analysis. The values of the R ratios obtained from the
Cetraro samples at 189 GeV, calibrated using the mixed semi-leptonic events from these
samples, are given in Table 5.
The value of 〈R〉 measured from data is between the expected R ratios from PYTHIA
without CR and with the SK-I model with 100% fraction of reconnection. The error
of this measurement is larger than the difference between the values of R from ARIADNE
samples without and with CR, and than the difference between values of R from the
HERWIG samples without CR and with 1/9 of reconnected events.
The following normalised ratios are obtained for the sample without CR and imple-
menting the SK-I model with 100% CR probability, respectively:
rdatano CR =
〈R〉data
Rno CR
= 0.944± 0.031(stat)± 0.034(syst), (9)
rdataCR =
〈R〉data
= 1.067± 0.035(stat)± 0.039(syst). (10)
In the above expressions, the statistical errors in the MC predicted values were propagated
and added quadratically to the systematic errors on the ratios.
It is also possible to define the following quantity, taking the predictions for RCR and
Rno CR at
s = 189 GeV from the PYTHIA samples in Table 5,
〈Rdata〉 −Rno CR
RCR − Rno CR
= 0.49± 0.27(stat)± 0.29(syst) , (11)
from which it can be concluded that the measured 〈Rdata〉 is compatible with intermediate
probability of CR, and differs from the CR in the SK-I model at 100% at the level of
1.3 standard deviations. The ability to distinguish between these two models can be
computed from the inverse of the sum in quadrature of the statistical and systematic
errors; it amounts to be 2.5 standard deviations. In Figure 9 the result of δr is compared
to the predicted values, in the scope of the SK-I model, as a funtion of the fraction of
reconnected events.
Fraction of reconnected events %
DELPHI
0 20 40 60 80 100
Figure 9: Comparison of the measurement of the δr observable to the predictions from
the SK-I model as a function of the fraction of reconnected events.
The result for the value of 〈R〉 can also be used to test for consistency with the SK-I
model as a function of κ and a log-likelihood curve was obtained. This also facilitates
combination with the result obtained in the analysis in the following section, and for this
reason the value of 〈R〉 is rescaled with PYTHIA without CR to a centre-of-mass energy
of 200 GeV: the value obtained at 200 GeV is 〈R〉(200 GeV) = 1.024 ± 0.050. The
values obtained for the predicted ratios RN at 200 GeV and the log-likelihood curve, as
a function of κ, are shown in Figure 10. The value of κ most compatible with the data
within one standard deviation is
κSK-I = 4.13
+20.97
−3.46 . (12)
5 Different MW Estimators as Observables
It has been shown [12] that the MW measurement inferred from hadronically decaying
W+W− events at LEP-2, by the method of direct reconstruction, is influenced by CR
effects, most visible when changing the value of κ in the SK-I model. For the MW(4q)
estimator within DELPHI this is shown in [33]. Other published MW estimators in LEP
experiments are equally sensitive to κ [34].
To probe this sensitivity to CR effects, alternative estimators for the MW measure-
ment were designed which have different sensitivity to κ. In the following, the standard
estimator and two alternative estimators, studied in this paper, are presented. The stan-
dard estimator corresponds to that previously used in the measurement of the W mass
by DELPHI [33]. Note that in the final DELPHI W mass analysis [35] results are given
0. 0.
Fraction of reconnected events %
DELPHI
DELPHI
Fraction of reconnected events %
DELPHI
DELPHI
0 20 40 60 80 100
0 20 40 60 80 100
Figure 10: a) Estimated ratio RN at 200 GeV plotted as a function of different κ values
(top scale), or as function of the corresponding reconnection probabilities (bottom scale),
compared to 〈R〉 measured from data after rescaling to 200 GeV (horizontal lines marked
with R for the value and with 1σ(2σ) for the 〈R〉 value added/subtracted by one(two)
standard deviations); the last three marks on the x axis, close to 100% of reconnection
probability, correspond respectively to the values κ = 100, 300, 800; b) corresponding
log-likelihood curve for the comparison of the estimated values (RN ) with the data (〈R〉).
for the standard and hybrid cone estimators, with the hybrid cone estimator used to
provide the primary result. The data samples, efficiencies and purities for the analysis
corresponding to the standard estimator are provided in [33, 35].
• The standard MW estimator :
This estimator is described in [33] and was optimised to obtain the smallest sta-
tistical uncertainty for the W mass measurement. It results in an event-by-event
likelihood Li(MW) for the parameter MW.
• The momentum cut MW estimator :
For this alternative MW estimator the event selection was performed in exactly
the same way as for the standard MW estimator. The particle-jet association was
also taken from this analysis. However, when reconstructing the event for the MW
extraction a tighter track selection was applied. The momentum and energy of the
jets were calculated only from those tracks having a momentum higher than a certain
pcut value. An event-by-event likelihood L
i (MW) was then calculated.
• The hybrid cone MW estimator :
In this second alternative MW estimator the reconstruction of the event is the same
as for the standard analysis, except when calculating the jet momenta used for the
MW extraction.
coneR
(cone) (std)
Figure 11: Illustration of the iterative cone algorithm within a predefined jet as explained
in the text.
An iterative procedure was used within each jet (defined by the clustering algorithm
used in the standard analysis) to find a stable direction of a cone excluding some
particles in the calculation of the jet momentum, illustrated in Figure 11. Starting
with the direction of the original jet ~p
std , the jet direction was recalculated (direction
(1) on the Figure) only from those particles which have an opening angle smaller
than Rcone with this original jet. This process was iterated by constructing a second
cone (of the same opening angle) around this new jet direction and the jet direction
was recalculated again. The iteration was continued until a stable jet direction ~p jetcone
was found. The jet momenta obtained, ~p jetcone, were rescaled to compensate for the
lost energy of particles outside the stable cone,
~p jetcone → ~p jetcone ·
Ejetcone
. (13)
The energies of the jets were taken to be the same as those obtained with the
standard clustering algorithm (E jetcone → E jet). This was done to increase the
correlation of this estimator with the standard one. The rescaling was not done
for the pcut estimator as it will be used in a cross-check observable with different
systematic properties. Again the result is an event-by-event likelihood LRconei (MW).
Each of these previously defined MW likelihoods had to be calibrated. The slope of
the linear calibration curve for the MW estimators is tuned to be unity, therefore only
a bias correction induced by the reconstruction method has to be applied. This bias
is estimated with the nominal WPHACT Monte Carlo events and the dependence on the
value of κ is estimated with the EXCALIBUR simulation. It was verified for smaller subsets
that the results using these large EXCALIBUR samples and the samples generated with
WPHACT are compatible. Neglecting the possible existence of Colour Reconnection (CR) in
the Monte Carlo simulation results in event likelihoods Li(MW|event without CR), while
Li(MW|event with CR) are the event likelihoods obtained when assuming the hypothesis
that events do reconnect (100% CR in the scope of the SK-I model). To construct the
event likelihoods for intermediate CR (values of κ larger than 0) the following weighting
formula is used :
Li(MW|κ) = [1−Pi(κ)]·Li(MW|event without CR)+Pi(κ)·Li(MW|event with CR) (14)
where Pi(κ) is defined in Equation 1. The combined likelihood is produced for
the event sample; the calibrated values for MW(κ) were obtained for different val-
ues of κ using the maximum likelihood principle. In Figure 12 the difference
dMW(κ) = MW(κ)−MW(κ = 0) or the influence of κ on the bias of the MW estimator is
presented as function of κ.
The uncertainty on this difference is estimated with the Jackknife method [27] to
take the correlation between MW(κ) and MW(κ = 0) into account. It was observed from
simulations that the estimators dependency on κ, for κ below about 5, was not signifi-
cantly different in the centre-of-mass range between 189 and 207 GeV. Therefore in the
determination of κ the dependency at 200 GeV was taken as default for all centre-of-
mass energies. This value of centre-of-mass energy is close to the integrated luminosity
weighted centre-of-mass energy of the complete data sample, which is 197.1 GeV.
When neglecting the information content of low momentum particles or when using
the hybrid cone algorithm, the influence of Colour Reconnection on the MW estimator is
decreased. The dependence ∂MW
of the estimator to κ is decreased when increasing the
value of pcut or when working with smaller cone opening angles Rcone.
5.1 The Measurement of κ
The observed difference ∆MW(std, i) = MW
std − MWi in the event sample, where i
is a certain alternative analysis, provides a measurement of κ. When both estimators
std and MW
i are calibrated in the same hypothesis of κ, the expectation values of
∆MW(std, i) will be invariant under a change of pcut or Rcone.
When neglecting part of the information content of the events in these alternative MW
analyses, by increasing pcut or decreasing Rcone, the statistical uncertainty on the value of
1 10 10
Standard MW analysis    DELPHI
pcut = 1 GeV/c
pcut = 2 GeV/c
pcut = 3 GeV/c
Cone R=1.00 rad
Cone R=0.75 rad
Cone R=0.50 rad
Cone R=0.25 rad
SK-I Model parameter κ
κ = 0.66
Figure 12: The difference dMW(κ) = MW(κ)−MW(κ = 0) is presented as a function of
κ, for different MW estimators. The curve for the standard MW estimator is the curve
at the top. The curves obtained with the hybrid cone analysis for different values of the
cone opening angle, starting from the top with 1.00 rad down to 0.75 rad, 0.50 rad and
0.25 rad are indicated with dotted lines. The curves obtained with the momentum cut
analysis for different values of pcut, starting from the top with 1 GeV/c, down to 2 GeV/c
and 3 GeV/c are dashed. The vertical line indicates the value of κ preferred by the SK-I
authors [5] and commonly used to estimate systematic uncertainties on measurements
using e+e− → W+W− → q1q̄2q3q̄4 events.
the MW estimator is increased. Therefore a balance must be found between the statistical
precision on ∆MW(std, i) and the dependence of this difference to κ in order to obtain
the largest sensitivity for a κ measurement. This optimum was found using the Monte
Carlo simulated events and assuming that the data follow the κ = 0 hypothesis, resulting
in the smallest expected uncertainty on the estimation of κ.
For the pcut analysis an optimal sensitivity was found when using the difference
∆MW(std, pcut) with pcut equal to 2 GeV/c or 3 GeV/c. Even more information about
κ could be extracted from the data, when using the difference ∆MW(std,Rcone), which
was found to have an optimal sensitivity around Rcone = 0.5 rad. No significant im-
provement in the sensitivity was found when combining the information from these two
observables. Therefore the best measure of κ using this method is extracted from the
∆MW(std,Rcone = 0.5 rad) observable. Nevertheless, the ∆MW(std, pcut = 2GeV/c) ob-
servable was studied as a cross-check.
5.2 Study of the Systematic Errors in the ∆MW Method
The estimation of systematic uncertainties on the observables ∆MW(std, i) follows
similar methods to those used within the MW analysis. Here the double difference is a
measure of the systematic uncertainty between Monte Carlo simulation (‘MC’) and real
data (‘DA’):
∆syst(MC,DA) =
std(MC)−MWstd(DA)]− [MWi(MC)−MWi(DA)]
∣ (15)
where i is one of the alternative MW estimators. The systematic error components are
described below and summarised in Table 6.
5.2.1 Jet Reconstruction systematics with MLBZs
A novel technique was proposed in [36] to study systematic uncertainties on jet recon-
struction and fragmentation in W physics measurements with high statistical precision
through the use of Mixed Lorentz Boosted Z events (MLBZs). The technique is similar to
the one described in section 4.4.1. The main advantage of this method was that Monte
Carlo simulated jet properties in W+W− events could be directly compared with the
corresponding ones from real data using the large Z statistics.
The main extension of the method beyond that described in [36] consisted in an
improved mixing and boosting procedure of the Z events into MLBZs, demonstrated
in Figure 13.
The 4-momenta of the four primary quarks in WPHACT generated W+W− → q1q̄2q3q̄4
events were used as event templates. The Z events from data or simulation were chosen
such that their thrust axis directions were close in polar angle to one of the primary quarks
of the W+W− event template. Each template W was then boosted to its rest frame. The
particles in the final state of a selected Z event were rotated so that the thrust axis matches
the rest frame direction of the primary quarks in the W+W− template. After rescaling the
kinematics of the Z events to match the W boson mass in the generated W+W− template,
the two Z events were boosted to the lab frame of the W+W− template. All particles
having an absolute polar angle with the beam direction smaller than 11◦ were removed
from the event. The same generated WPHACT events were used for the construction of both
the data MLBZs and Monte Carlo MLBZs in order to increase the correlation between
both emulated samples to about 31%. This correlation was taken into account when
boost
lab frame WW
"re−boost" rotate
rest frame WW
lab frame  MLBZ
boost
Figure 13: Illustration of the mixing and boosting procedure within the MLBZ method
(see text for details).
quoting the statistical uncertainty on the systematic shift on the observables between
data and Monte Carlo MLBZs.
It was verified that when introducing a significant mass shift of 300 MeV/c2 on MW
by using the cone rejection algorithm, it was reproduced within 15% by applying the
MLBZ technique. Because the expected systematic uncertainties on the ∆MW(std, i)
observables of interest are one order of magnitude smaller than 300 MeV/c2, this method
is clearly justified.
The double difference of Equation 15 was determined with the MLBZ method using
Z events selected in the data sets collected during the 1998 calibration runs and Z events
from the corresponding Monte Carlo samples. The following results were obtained for
the ∆MW(std,Rcone = 0.5 rad) observable:
∆syst(ARIADNE ,DA) = −1.9 ± 3.9(stat)MeV/c2
∆syst(PYTHIA ,DA) = −5.7 ± 3.9(stat)MeV/c2
∆syst(HERWIG ,DA) = −10.6 ± 3.9(stat)MeV/c2
where the statistical uncertainty takes into account the correlation between the Monte
Carlo and the data MLBZ events, together with the correlation between the two MW
estimators. This indicates that most of the fragmentation, detector and Between-W
Bose-Einstein Correlation systematics are small. The study was not performed for the
∆MW(std, pcut) observable.
Other systematic sources on the reconstructed jets are not considered as the MW
estimators used in the difference ∆MW(std, i) have a large correlation.
5.2.2 Additional Fragmentation systematic study
The fragmentation of the primary partons is modelled in the Monte Carlo simulation
used for the calibration of the MW
i observables.
The expected values on the MW estimators from simulation (in the κ = 0 hypothe-
sis) are changed when using different fragmentation models [33], resulting in systematic
uncertainties on the measured MW
i observables and hence possibly also on our esti-
mated κ. In Figure 14 the systematic shift δMW in the different MW
i observables is
shown when using HERWIG or ARIADNE rather than PYTHIA as the fragmentation model
in the no Colour Reconnection hypothesis. When inferring κ from the data difference,
∆MW(std, i), the PYTHIA model is used to calibrate each MW
i observable. This data
difference for MW
pcut=2GeV/c, ∆MW(std, pcut = 2GeV/c), changes
3 by (27 ± 12) MeV/c2
or (8 ± 12) MeV/c2 when replacing PYTHIA by respectively HERWIG or ARIADNE. Simi-
larly, the observable ∆MW(std,Rcone = 0.5 rad) changes by (-4 ± 10) MeV/c2 or (-6 ±
10) MeV/c2 when replacing PYTHIA by respectively HERWIG or ARIADNE. The largest
shift of the observable when changing fragmentation models (or the uncertainty on this
shift if larger) is taken as systematic uncertainty on the value of the observable. Hence,
systematic errors of 27 MeV/c2 for the ∆MW(std, pcut = 2GeV/c) observable and 10
MeV/c2 for the ∆MW(std,Rcone = 0.5 rad) observable were assumed as the contribution
from fragmentation uncertainties. The MLBZ studies (see above) are compatible with
these results, hence no additional systematic due to fragmentation was quoted for the
∆MW(std,Rcone = 0.5 rad) observable.
3This change, ∆MW(std, pcut = 2GeV/c)
PYTHIA
− ∆MW(std, pcut = 2GeV/c)
HERWIG
, is given by
δMW(std ≡ pcut = 0.2GeV/c)
PYTHIA−HERWIG
− δMW(pcut = 2GeV/c)
PYTHIA−HERWIG, and similar expressions for the
ARIADNE and Rcone cases (for Rcone, std ≡ Rcone = π).
0 0.5 1 1.5 2 2.5 3 3.5 4
PYTHIA-ARIADNE         DELPHI
PYTHIA-HERWIG
pcut / GeV/c
0 0.2 0.4 0.6 0.8 1 1.2
PYTHIA-ARIADNE         DELPHI
PYTHIA-HERWIG
Rcone / rad
Figure 14: Systematic shifts δMW, on MW observables, when applying different fragmen-
tation models as a function of the pcut or Rcone values used in the construction of the
MW observable. These Monte Carlo estimates were obtained at a centre-of-mass energy
of 189 GeV. The uncertainties are determined with the Jackknife method.
5.2.3 Energy Dependence
The biases of the different MW estimators have a different dependence on the centre-
of-mass energy, hence the calibration of ∆MW(i, j) will be energy dependent. The energy
dependence of each individual MW estimator was parameterised with a second order poly-
nomial. Since WPHACT event samples were used at a range of centre-of-mass energies
the uncertainty on the parameters describing these curves are small. Therefore a small
systematic uncertainty of 3 MeV/c2 was quoted on the ∆MW(i, j) observables due to the
calibration.
5.2.4 Background
The same event selection criteria were applied for all the MW estimators, hence the
same background contamination is present in all analyses. The influence of the qq̄(γ)
background events on the individual MW estimators is small [33] and was taken into
account when constructing the centre-of-mass energy dependent calibration curves of
the individual MW estimators. The residual systematic uncertainty on both ∆MW(i, j)
observables is 3 MeV/c2.
5.2.5 Bose-Einstein Correlations
As for the particle flow method, the systematic uncertainties due to possible Bose-
Einstein Correlations are estimated via Monte Carlo simulations. The relevant values
for the systematic uncertainties on the observables are the differences between the ob-
servables obtained from the Monte Carlo events with Bose-Einstein Correlations inside
individual W’s (BEI) and those with, in addition, Bose-Einstein Correlations between
identical particles from different W’s (BEA). The values were estimated to be (6.4 ±
9.3) MeV/c2 for the ∆MW(std, pcut = 2GeV/c) observable, and (7.2 ± 8.2) MeV/c2 for
the ∆MW(std,Rcone = 0.5 rad) observable. As the uncertainties in the estimated contri-
butions were larger than the contributions themselves, these uncertainties were added in
quadrature to the systematic errors on the relevant observables.
5.2.6 Cross-check in the Semi-leptonic Channel
Colour Reconnection between the decay products originating from different W boson
decays can only occur in the W+W− → q1q̄2q3q̄4 channel. The semi-leptonic W+W−
decay channel (i.e, qq̄′ℓνℓ) is by definition free of those effects. Therefore the determi-
nation of Colour Reconnection sensitive observables, like ∆MW(std,Rcone = 0.5 rad), in
this decay channel could indicate the possible presence of residual systematic effects. A
study of the ∆MW(std,Rcone = 0.5 rad) observable was performed in the semi-leptonic
decay channel. The semi-leptonic MW analysis in [33] was used and the cone algorithm
was implemented in a similar way as for the fully hadronic decay channel. The same
data sets have been used as presented throughout this paper and the following result was
obtained:
∆MW(std,Rcone) = MW
std − MWRcone = (8 ± 56(stat))MeV/c2 (17)
where the statistical uncertainty was computed taking into account the correlation be-
tween both measurements. Although the statistical significance of this cross-check is
small, a good agreement was found for both MW estimators.
5.3 Results from the MW Estimators Analyses
The observable ∆MW(std,Rcone) with Rcone equal to 0.5 rad (defined above), was
found to be the most sensitive to the SK-I Colour Reconnection model, and the
∆MW(std, pcut = 2GeV/c) observable was measured as a cross-check. The analyses were
calibrated with PYTHIA κ = 0 WPHACT generated simulation events. The values measured
from the combined DELPHI data at centre-of-mass energies ranging between 183 and
208 GeV are:
∆MW(std,Rcone) = MW
std − MWRcone = (59 ± 35(stat) ± 14(syst))MeV/c2
∆MW(std, pcut) = MW
std − MWpcut = (143 ± 61(stat) ± 29(syst))MeV/c2
where the first uncertainty numbers represent the statistical components and the sec-
ond the combined systematic ones. The full breakdown of the uncertainties on both
observables can be found in Table 6.
Uncertainty contribution (MeV/c2)
Source ∆MW(std,Rcone = 0.5 rad) ∆MW(std, pcut = 2GeV/c)
Fragmentation 11 27
Calibration 3 3
Background 3 3
BEI-BEA 8 9
Total systematic 14 29
Statistical Error 35 61
Total 38 67
Table 6: Breakdown of the total uncertainty on both relevant observables.
From these values estimates were made for the κ parameter by comparing them with
the Monte Carlo expected values in different hypothesis of κ, shown in Figure 15 for the
observable ∆MW(std,Rcone = 0.5 rad).
The Gaussian uncertainty on the measured observables was used to construct a log-
likelihood function L(κ) = −2 log L(κ) for κ. The log-likelihood function obtained is
shown in Figure 16 for the first and in Figure 17 for the second observable.
The result shown in Figure 16 is the primary result of this analysis, because of the
larger sensitivity of the ∆MW(std,Rcone = 0.5 rad) observable to the value of κ (see sec-
tion 5.1). The value of κ most compatible with the data within one standard deviation
of the measurement is
κSK-I = 1.75
+2.60
−1.30 . (19)
The result on κ extracted from the cross-check ∆MW(std, pcut = 2GeV/c) observable
is found not to differ significantly from the quoted result obtained with the more opti-
mal ∆MW(std,Rcone = 0.5 rad) observable. The significance can be determined by the
difference between both MW estimators :
pcut − MWRcone = (−84 ± 59(stat))MeV/c2 . (20)
Taking into account that the expectation of this difference depends on κ, we find a sta-
tistical deviation of about 1 to 1.5σ between the measurements. No improved sensitivity
is obtained by combining the information of both observables.
200 DELPHI
at 188.6 GeV
at 199.5 GeV
at 206.5 GeV
SK-I Model parameter κ
κ = 0.66
Figure 15: The dependence of the observable ∆MW(std,Rcone = 0.5 rad) from simulation
events on the value of the SK-I model parameter κ. The dependence is given at three
centre-of-mass energies.
1 10 10
Likelihood of indirect measurement of κ SK-I
DELPHI measured : std - R=0.50 (stat+syst)
DELPHI measured : std - R=0.50 (stat)
SK-I Model parameter κ
Figure 16: The log-likelihood function −2 log L(κ) obtained from the DELPHI data mea-
surement of ∆MW(std,Rcone = 0.5 rad). The bottom curve (full line) gives the final result
including the statistical uncertainty on ∆MW(std,Rcone = 0.5 rad) and the investigated
systematic uncertainty contributions. The top curve (dashed) is centred on the same min-
imum and reflects the log-likelihood function obtained when only statistical uncertainties
are taken into account.
1 10 10
Likelihood of indirect measurement of κ SK-I
DELPHI measured : std - pcut=2 (stat+syst)
DELPHI measured : std - pcut=2 (stat)
SK-I Model parameter κ
Figure 17: The log-likelihood function −2 log L(κ) obtained from the DELPHI data mea-
surement of ∆MW(std, pcut = 2GeV/c). The bottom curve (full line) gives the final result
including the statistical uncertainty on ∆MW(std, pcut = 2GeV/c) and the investigated
systematic uncertainty contributions. The top curve (dashed) is centred on the same min-
imum and reflects the log-likelihood function obtained when only statistical uncertainties
are taken into account.
In this paper the SK-I model for Colour Reconnection implemented in PYTHIA was
studied because it parameterizes the effect as function of the model parameter κ. Other
phenomenological models implemented in the ARIADNE [7,8] and HERWIG [6] Monte Carlo
fragmentation schemes exist and are equally plausible. Unfortunately their effect in
W+W− → q1q̄2q3q̄4 events cannot be scaled with a model parameter, analogous to κ
in SK-I, without affecting the fragmentation model parameters. Despite this non-
factorization property, the consistency of these models with the data can still be ex-
amined. The Monte Carlo predictions of the observables in the hypothesis with Colour
Reconnection (calibrated in the hypothesis of no Colour Reconnection) give the following
values:
ARIADNE → MWstd − MWRcone = (7.2 ± 4.1) MeV/c2
ARIADNE → MWstd − MWpcut = (9.4 ± 7.0) MeV/c2
HERWIG → MWstd − MWRcone = (19.7 ± 4.0) MeV/c2
HERWIG → MWstd − MWpcut = (22.8 ± 6.9) MeV/c2 .
The small effects on the observables with the HERWIG implementation of Colour Reconnec-
tion compared to those predicted by SK-I are due to the fact that the fraction of events
that reconnect is smaller in HERWIG (1/9) compared to SK-I (& 25% at
s = 200 GeV).
After applying this scale factor between both models, their predicted effect on the W
mass and on the ∆MW(i, j) observables becomes compatible. The ARIADNE implementa-
tion of Colour Reconnection has a much smaller influence on the observables compared
to those predicted with the SK-I and HERWIG Monte Carlo.
5.4 Correlation with Direct MW Measurement
When using a data observable to estimate systematic uncertainties on some measur-
and inferred from the same data sample, the correlation between the estimator used to
measure the systematic bias and the estimator of the absolute value of the measurand
should be taken into account. Therefore the correlation between the Colour Reconnection
sensitive observables ∆MW(std,Rcone = 0.5 rad) and ∆MW(std, pcut = 2GeV/c) and the
absolute MW(std) estimator was calculated. The correlation was determined from the
Monte Carlo events and with κ = 0 or no Colour Reconnection. The values obtained
were found to be stable as a function of κ within the statistical precision. The correlation
between ∆MW(std,Rcone = 0.5 rad) and MW(std) was found to be 11%, while for the one
between ∆MW(std, pcut = 2GeV/c) and MW(std) a value of 8% was obtained. Also the
correlation between the different MW estimators was estimated and found to be stable
with the value of κ. A value of 83% was obtained for the correlation between MW(std)
and MW
Rcone=0.5 rad, while 66% was obtained between MW(std) and MW
pcut=2GeV/c.
6 Combination of the Results in the Scope of the
SK-I Model
The log-likelihood curve from the particle flow method was combined with the curve
from the ∆MW method and the result is shown in Figure 18. The correlations between the
analyses were neglected because the overlap between the samples is small and the nature
of the analyses is very different. The total errors were used (statistical and systematic
added in quadrature) in the combination.
1 10 10
Log-Likelihood of measurement    of κ SK-I
from ∆MW
from particle flow
∆MW+part.flow combined
SK-I Model parameter κ
DELPHI
Figure 18: The log-likelihood function −2 log L(κ) obtained from the combined DELPHI
measurement via ∆MW(std,Rcone = 0.5 rad) and the particle flow. The full line gives
the final result including the statistical and systematic uncertainties. The log-likelihood
functions are combined in the hypothesis of no correlation between the statistical and
systematic uncertainties of both measurements.
The best value for κ from the minimum of the curve, with its error given by the width
of the curve at the value −2 log L = (−2 log L)min + 1, is:
κSK-I = 2.2
−1.3 . (22)
7 Conclusions
Colour Reconnection (CR) effects in the fully hadronic decays of W pairs, produced
in the DELPHI experiment at LEP, were investigated using the methods of the particle
flow and the MW estimators, notably the ∆MW(std,Rcone = 0.5 rad) observable.
The average of the ratios R of the integrals between 0.2 and 0.8 of the particle distri-
bution in Inside-W regions to the Between-W regions was found to be
〈R〉 = 0.979± 0.032(stat)± 0.035(syst) . (23)
The values used in this average were obtained after rescaling the value at each energy to
the value at a centre-of-mass energy of 189 GeV using a fit to the MC without CR.
The effects of CR on the values of the reconstructed mass of the W boson, as imple-
mented in different Monte Carlo models, were studied with different estimators. From
the estimator of the W mass with the strongest sensitivity to the SK-I model of CR, the
∆MW(std,Rcone = 0.5 rad) method, the difference in data was found to be
∆MW(std,Rcone) = MW
std −MWRcone=0.5 rad = ( 59± 35(stat)± 14(syst) )MeV/c2 . (24)
From the combination of the results from particle flow and MW estimators, corre-
sponding to the curve in full line shown in Figure 18, the best value and total error for
the κ parameter in the SK-I model was extracted to be:
κSK-I = 2.2
−1.3 (25)
which corresponds to a probability of reconnection of Preco = 52% and lies in the range
31% < Preco < 68% at 68% confidence level.
The two analysis methods used in this paper are complementary: the method of parti-
cle flow provides a model-independent measurement but has significantly less sensitivity
towards the SK-I model of CR than the method of ∆MW estimators.
The obtained value of κ in equation (25) can be compared with similar values obtained
by other LEP experiments, and it was found to be compatible with, but higher than, the
values obtained with the particle flow by L3 [37] and OPAL [38]. It is also compatible
with, but higher than, the values obtained with the method of different MW estimators
by OPAL [39] and ALEPH [40].
Acknowledgements
We thank the ALEPH Collaboration for the production of the simulated “Cetraro
Samples”.
We are greatly indebted to our technical collaborators, to the members of the CERN-
SL Division for the excellent performance of the LEP collider, and to the funding agencies
for their support in building and operating the DELPHI detector.
We acknowledge in particular the support of
Austrian Federal Ministry of Education, Science and Culture, GZ 616.364/2-III/2a/98,
FNRS–FWO, Flanders Institute to encourage scientific and technological research in the
industry (IWT) and Belgian Federal Office for Scientific, Technical and Cultural affairs
(OSTC), Belgium,
FINEP, CNPq, CAPES, FUJB and FAPERJ, Brazil,
Czech Ministry of Industry and Trade, GA CR 202/99/1362,
Commission of the European Communities (DG XII),
Direction des Sciences de la Matière, CEA, France,
Bundesministerium für Bildung, Wissenschaft, Forschung und Technologie, Germany,
General Secretariat for Research and Technology, Greece,
National Science Foundation (NWO) and Foundation for Research on Matter (FOM),
The Netherlands,
Norwegian Research Council,
State Committee for Scientific Research, Poland, SPUB-M/CERN/PO3/DZ296/2000,
SPUB-M/CERN/PO3/DZ297/2000, 2P03B 104 19 and 2P03B 69 23(2002-2004)
FCT - Fundação para a Ciência e Tecnologia, Portugal,
Vedecka grantova agentura MS SR, Slovakia, Nr. 95/5195/134,
Ministry of Science and Technology of the Republic of Slovenia,
CICYT, Spain, AEN99-0950 and AEN99-0761,
The Swedish Research Council,
Particle Physics and Astronomy Research Council, UK,
Department of Energy, USA, DE-FG02-01ER41155,
EEC RTN contract HPRN-CT-00292-2002.
References
[1] G. Gustafson, U. Pettersson and P. M. Zerwas, Phys. Lett. B 209 (1988) 90.
[2] J. Abdallah et al. [DELPHI Collaboration], Eur. Phys. J. C 44 (2005) 161.
[3] V. Khoze, L. Lönnblad, R. Møller, T. Sjöstrand, Š. Todorova and N. K. Watson in
“Physics at LEP-2”, Yellow Report CERN 96-01, Eds. G. Altarelli, T. Sjöstrand
and F. Zwirner, vol. 1 (1996) 191.
[4] The LEP Collaborations ALEPH, DELPHI, L3, OPAL, and the LEP W Working
Group,
“Combined preliminary results on Colour Reconnection using Particle Flow in
e+e− → W+W−”, note LEPEWWG/FSI/2002-01, ALEPH 2002-027 PHYSIC 2002-
016, DELPHI 2002-090 CONF 623, L3 note 2768, and OPAL TN-724, July 17th,
2002, contribution to the summer Conferences of 2002, available at
http://delphiwww.cern.ch/pubxx/delnote/public/2002 090 conf 623.ps.gz .
[5] T. Sjöstrand and V. A. Khoze, Z. Phys. C 62 (1994) 281.
[6] G. Marchesini et al., Comp. Phys. Comm. 67 (1992) 465;
G. Corcella et al., JHEP 0101 (2001) 010.
[7] L. Lönnblad, Comp. Phys. Comm. 71 (1992) 15;
H. Kharraziha and L. Lönnblad, Comp. Phys. Comm. 123 (1999) 153.
[8] G. Gustafson and J. Häkkinen, Z. Phys. C 64 (1994) 659.
[9] L. Lönnblad, Z. Phys. C 70 (1996) 107.
[10] P. Abreu et al. [DELPHI Collaboration], Phys. Lett. B 416 (1998) 233.
P. Abreu et al. [DELPHI Collaboration], Eur. Phys. J. C 18 (2000) 203 [Erratum-
ibid. C 25 (2002) 493].
[11] D. Duchesneau, “New method based on energy and particle flow in e+e− →
W+W− → hadrons events for colour reconnection studies”, LAPP-EXP-2000-02
(http://wwwlapp.in2p3.fr/preplapp/LAPP EX2000 02.pdf), Presented at Workshop
on WW Physics at LEP-200 (WW99), Kolymbari, Chania, Greece, 20-23 Oct 1999.
[12] J. D’ Hondt and N. J. Kjaer, “Measurement of Colour Reconnection model parameters
using MW analyses”, contributed paper for ICHEP’02 (Amsterdam), note DELPHI
2002-048 CONF 582, available at
http://delphiwww.cern.ch/pubxx/delnote/public/2002 048 conf 582.ps.gz .
[13] P. A. Aarnio et al. [DELPHI Collaboration], Nucl. Instrum. Meth. A 303 (1991) 233.
[14] P. Abreu et al. [DELPHI Collaboration], Nucl. Instrum. Meth. A 378 (1996) 57.
[15] A. Augustinus et al. [DELPHI Trigger Group], Nucl. Instrum. Meth. A 515 (2003)
[16] P. Chochula et al. [DELPHI Silicon Tracker Group], Nucl. Instrum. Meth. A 412
(1998) 304.
[17] A. Ballestrero, R. Chierici, F. Cossutti and E. Migliore, Comp. Phys. Comm. 152
(2003) 175.
[18] E. Accomando and A. Ballestrero, Comp. Phys. Comm. 99 (1997) 270;
E. Accomando, A. Ballestrero and E. Maina, Comp. Phys. Comm. 150 (2003) 166.
[19] T. Sjöstrand, Comp. Phys. Comm. 82 (1994) 74;
T. Sjöstrand et al., Comp. Phys. Comm. 135 (2001) 238.
[20] P. Abreu et al. [DELPHI Collaboration], Z. Phys. C 73 (1996) 11.
[21] L. Lönnblad and T. Sjöstrand, Eur. Phys. J. C 2 (1998) 165.
[22] F. A. Berends, R. Pittau and R. Kleiss, Comp. Phys. Comm. 85 (1995) 437.
[23] S. Jadach, B. F. L. Ward and Z. Was, Phys. Lett. B 449 (1999) 97;
S. Jadach, B. F. L. Ward and Z. Was, Comp. Phys. Comm. 130 (2000) 260.
[24] S. Jadach et al., Comp. Phys. Comm. 140 (2001) 475.
[25] P. Abreu et al., Nucl. Instrum. Meth. A 427 (1999) 487.
[26] S. Catani et al., Phys. Lett. B 269 (1991) 432.
[27] B. Efron, “Computers and the Theory of Statistics”, SIAM Rev. 21 (1979) 460.
[28] J. Abdallah et al. [DELPHI Collaboration], Eur. Phys. J. C 34 (2004) 399.
[29] P. Abreu et al. [DELPHI Collaboration], Phys. Lett. B 286 (1992) 201;
P. Abreu et al. [DELPHI Collaboration], Z. Phys. C 63 (1994) 17;
P. Abreu et al. [DELPHI Collaboration], Phys. Lett. B 355 (1995) 415;
P. Abreu et al. [DELPHI Collaboration], Phys. Lett. B 471 (2000) 460;
P. Achard et al. [L3 Collaboration], Phys. Lett. B 524 (2002) 55;
P. Achard et al. [L3 Collaboration], Phys. Lett. B 540 (2002) 185;
P. D. Acton et al. [OPAL Collaboration], Phys. Lett. B 267 (1991) 143;
P. D. Acton et al. [OPAL Collaboration], Phys. Lett. B 287 (1992) 401;
P. D. Acton et al. [OPAL Collaboration], Phys. Lett. B 298 (1993) 456;
R. Akers et al. [OPAL Collaboration], Z. Phys. C 67 (1995) 389;
G. Alexander et al. [OPAL Collaboration], Z. Phys. C 72 (1996) 389.
K. Ackerstaff et al. [OPAL Collaboration], Eur. Phys. J. C 5 (1998) 239;
G. Abbiendi et al. [OPAL Collaboration], Eur. Phys. J. C 11 (1999) 239;
G. Abbiendi et al. [OPAL Collaboration], Eur. Phys. J. C 16 (2000) 423;
G. Abbiendi et al. [OPAL Collaboration], Eur. Phys. J. C 21 (2001) 23;
G. Abbiendi et al. [OPAL Collaboration], Phys. Lett. B 523 (2001) 35;
G. Abbiendi et al. [OPAL Collaboration], Phys. Lett. B 559 (2003) 131;
D. Decamp et al. [ALEPH Collaboration], Z. Phys. C 54 (1992) 75;
A. Heister et al. [ALEPH Collaboration], Eur. Phys. J. C 36 (2004) 147;
S. Schael et al. [ALEPH Collaboration], Phys. Lett. B 611 (2005) 66.
[30] P. Achard et al. [L3 Collaboration], Phys. Lett. B 547 (2002) 139;
G. Abbiendi et al. [OPAL Collaboration], Eur. Phys. J. C 36 (2004) 297;
S. Schael et al. [ALEPH Collaboration], Phys. Lett. B 606 (2005) 265.
[31] J. Abdallah et al. [DELPHI Collaboration], Eur. Phys. J. C 34 (2004) 127.
[32] J. Abdallah et al. [DELPHI Collaboration], Eur. Phys. J. C 30 (2003) 447.
[33] P. Abreu et al. [DELPHI Collaboration], Phys. Lett. B 511 (2001) 159.
[34] The LEP Collaborations ALEPH, DELPHI, L3 and OPAL, and the LEP W
Working Group, “Combined Preliminary Results on the Mass and Width of the
W Boson Measured by the LEP Experiments”, note LEPEWWG/MASS/2001-
02, ALEPH 2001-044 PHYSIC 2001-017, DELPHI 2001-122 PHYS 899,
L3 Note 2695, OPAL TN-697, contribution to EPS 2001, available at
http://delphiwww.cern.ch/pubxx/delnote/public/2001 122 phys 899.ps.gz .
[35] J. Abdallah et al. [DELPHI Collaboration], “Measurement of the mass and width of
the W boson in e+e− collisions at
s =161-209 GeV ”, paper in preparation.
[36] N. Kjaer and M. Mulders, “Mixed Lorentz boosted Z0’s”, CERN-OPEN-2001-026.
[37] P. Achard et al. [L3 Collaboration], Phys. Lett. B 561 (2003) 202.
[38] G. Abbiendi et al. [OPAL Collaboration], Eur. Phys. J. C 45 (2006) 291.
[39] G. Abbiendi et al. [OPAL Collaboration], Eur. Phys. J. C 45 (2006) 307.
[40] S. Schael et al. [ALEPH Collaboration], Eur. Phys. J. C 47 (2006) 309.
ABSTRACT
  In the reaction e+e- -> WW -> (q_1 qbar_2)(q_3 qbar_4) the usual
hadronization models treat the colour singlets q_1 qbar_2 and q_3 qbar_4 coming
from two W bosons independently. However, since the final state partons may
coexist in space and time, cross-talk between the two evolving hadronic systems
may be possible during fragmentation through soft gluon exchange. This effect
is known as Colour Reconnection. In this article the results of the
investigation of Colour Reconnection effects in fully hadronic decays of W
pairs in DELPHI at LEP are presented. Two complementary analyses were
performed, studying the particle flow between jets and W mass estimators, with
negligible correlation between them, and the results were combined and compared
to models. In the framework of the SK-I model, the value for its kappa
parameter most compatible with the data was found to be: kappa_{SK-I} =
2.2^{+2.5}_{-1.3} corresponding to the probability of reconnection P_{reco} to
be in the range 0.31 < P_{reco} < 0.68 at 68% confidence level with its best
value at 0.52.

<|endoftext|><|startoftext|>
Microsoft Word - ENG-EJTP.doc
EVOLUTIONARY NEURAL GAS (ENG):  A MODEL OF SELF ORGANIZING 
NETWORK FROM INPUT CATEGORIZATION. 
Ignazio Licata (a) ↑ , Luigi Lella (b) 
(a) Ixtucyber for Complex Systems, Marsala, TP and Institute for Scientific Methodology, 
Palermo, Italy; 
(b) A.R.C.H.I. - Advanced Research Center for Health Informatics, Ancona, Italy 
ABSTRACT  
Despite their claimed biological plausibility, most self organizing networks have strict 
topological constraints and consequently they cannot take into account a wide range of external 
stimuli. Furthermore their evolution is conditioned by deterministic laws which often are not 
correlated with the structural parameters and the global status of the network, as it should happen in 
a real biological system. In nature the environmental inputs are noise affected and “fuzzy”. Which 
thing sets the problem to investigate the possibility of emergent behaviour in a not strictly 
constrained net and subjected to different inputs. 
It is here presented a new model of Evolutionary Neural Gas (ENG) with any topological 
constraints, trained by probabilistic laws depending on the local distortion errors and the network 
dimension. The network is considered as a population of nodes that coexist in an ecosystem sharing 
local and global resources. 
Those particular features allow the network to quickly adapt to the environment, according to its 
dimensions. The ENG model analysis shows that the net evolves as a scale-free graph, and justifies 
in a deeply physical sense- the term “gas” here used. 
Key-words: Self-Organizing Networks; Neural Gas; Scale-Free Graph; Information in Network 
Functional Specialization. 
1. INTRODUCTION 
Self organizing networks are systems widely used in categorization tasks. A network can be 
seen as a set A={c1, c2,… ,cn} of units with associated reference vectors wc∈R
n where Rn is the 
same space where inputs are defined. Each unit (or node) can establish connections with the other 
ones, the units belonging to the same clusters are subjected to similar modification affecting their 
reference vectors.  
Self organizing networks can automatically adapt to input distributions without supervision by 
means of training algorithms that are simple sequences of deterministic rules. Competitive hebbian 
learning and neural gas are the most important strategies used for their training. 
Neural gas algorithm (Martinetz T.M. and Schulten K.J., 1991) sorts the network units according to 
the distance of their reference vector to each input. Then the reference vectors are adapted so that 
the ones related to the first nodes in the rank order are moved more close than the others to the 
considered input. 
Competitive hebbian learning (Martinetz and Schulten, 1991; Martinetz, 1993) consists in 
augmenting the weight of the link connecting the two units whose reference vectors are closest to 
the considered input (the two most activated units). 
Both strategies are examples of deterministic rules. As we know there are other rules that constrain 
the topology of the network which has a fixed dimensionality. That’s the case of Self Organizing 
Maps (Kohonen, 1982) and Growing Cell Structures (Fritzke, 1994). 
                                                
↑ Corresponding author: Ignazio.licata@ejtp.info  
In other cases the network structures haven’t topological constraints, they take a well ordered 
distribution by exactly adapting to the manifold inputs. For example TRN (Martinetz and Schulten, 
1994) and GNG are networks whose final structure is similar to a Delaunay Triangulation 
(Delaunay, 1934).We have tried to define a new self organizing network that is trained by 
probabilistic rules avoiding any topological constraints.  
According to Jefferson (1995) life and evolution are structured at least into four fundamental 
levels:  molecular, cellular, organism and population. We propose a population level based on 
evolutionary algorithm where the network is seen as a population of units whose interactions are 
conditioned by the availability of resources in their ecosystem. The evolution of the population is 
driven by a selective process that favours the fittest units. This approach has a biological 
plausibility. As stated by recent theories (Edelman, 1987) human brain evolution is subjected to 
similar selective pressures. 
Obviously we are not interested in recreating the same structure as the human brain. Our work 
aims at finding innovative and effective solutions to the categorization problem adopting natural 
system strategies. So our system falls within the Artificial Life field (Langton, 1989). 
Our model is a complex system that shows emergent features. In particular its structure evolves as a 
scale free graph. In the training phase there arise clusters of units with a limited number of nodes 
that establish a great number of links with the others.  
Scale free graphs are a particular structure that is really common in natural systems. Human 
knowledge, for instance, seems to be structured as a scale free graph (Steyvers, Tenenbaum 2001). 
If we represent words and concepts as nodes, we’ll find that some of these are more connected than 
the others. 
Scale free graphs have three main features.The small world structure. It means there is a 
relatively short path between any couple of nodes (Watts, Strogatz, 1998).The inherent tendency to 
cluster that is quantified by a coefficient introduced by Watts and Strogatz. Given a node i of ki 
degree i.e. having ki edges which connect it to ki other nodes, if those make a cluster, they can 
establish ki(ki-1)/2 edges at best. The ratio between the actual number of edges and the maximum 
number gives the clustering coefficient of node i. The clustering coefficient of the whole network is 
the average of all the individual clustering coefficients.  
Scale free graphs are also characterized by a particular degree distribution that has a power-law 
tail P(k)~k n− . That’s why such networks are called “scale free” (Albert, Barabasi, 2000). 
The three previous features are quantified by three parameters: the average path length between 
any couple of nodes, the clustering coefficient and the exponent of the power law tail. We’ll show 
that the values of these parameters in our model seem to confirm its scale free nature. 
2. AN OUTLINE ON SELF-ORGANIZATION AND EVOLUTIONARY SYSTEMS 
Natural selection mechanism has been successfully used for a lot of industrial applications 
spanning from projecting to real-time control and neural networks training. 
It was in the 60s that Genetic Algorithms based on the Evolution Theory’s three main 
mechanisms - reproduction, mutation and fitness – were first used in dealing with optimization 
problems. Although the solution is reached by a population of individuals, systems based on this 
approach are not considered self organizing because their dynamics depend on the external 
constraint of the fitness function. 
In the 80s a new approach to the study of living systems which mixed together self organization 
and evolutionary systems came out (Rocha, 1997). Its success was due to the studies on the way 
how biological systems work (metabolism, adaptability, autonomy, self repairing, growth, evolution 
etc.). The hybrid systems make us possible to get a better simulation both of the evolutionary 
optimization processes and the internal structure modification to reach a greater biological 
plausibility in the fitness 
Neuroevolutionary systems are an example of this approach. In classic neuroevolutionary models 
the network parameters are genetically set, whereas the connection weights are modified according 
to a training strategy. This solution follows the classic vision of cerebral development where genes 
control the formation of synaptic connections while their reinforcement depends on neural activity. 
More recent neuroevolutionary systems are characterized by different forms of self organizing 
processes which are cooperative coevolution (Paredis, 1995; Smith, Forrest and Perelson, 1993) and 
synaptic Darwinism (Edelman, 1987). 
Cooperative co evolutionary systems offer a promising alternative to classic evolutionary 
algorithms when we face complex dynamical problems. The main difference with respect to classic 
EA is the fact that each individual represents only a partial solution of the problem. Complete 
solutions are obtained by grouping several individuals. The goal of each individual is to optimize 
only a part of the solution, cooperating with other individuals that optimize other parts of the 
solution. It is so avoided the premature convergence towards a single group of individuals. An 
example of such approach is given by the Symbiotic Adaptive Neuroevolution System (Moriarty 
and Miikkulainen, 1998) that operates on populations of neural networks. 
While in most neuroevolutionary systems each individual represents a complete neural network, 
in SANE each individual represents a hidden unit of a two-layered network. Units are continuously 
combined and the resulting networks are evaluated on the basis of the performances shown in a 
given task. The global effect is equal to schemas promoting in standard EAs. In fact during the 
evolution of the population the neural schemas having the highest fitness values are favoured and 
the possible mutations in the copies of these schemas don’t affect the other copies in the population. 
Other recent strategies focus on the evolution of connection schemas in the network. In the human 
brain the number of synapses established by a single neuron is always much lower than the overall 
number of neurons. That gives the network a sparsely connected aspect. In the last years several 
models have been proposed to emulate the mechanism involved in the selection of links without 
referring to the physical and chemical properties of neurons. 
The Chialvo and Bak model (Chialvo and Bak, 1999) is based on two simple and biological 
inspired principles. First, the neural activity is kept low selecting the activated units by a winner 
takes all strategy. Second, the external environment gives a negative feedback that inhibits active 
synapses if the network behaviour is not satisfying. With these simple rules the model operates in a 
highly adaptive state and in critical conditions (extreme dynamics). The fundamental difference of 
this strategy based on the synaptic inhibition with respect to the classic one based on synaptic 
reinforcement is that the reinforcement-based learning is a continuative process by definition, while 
the inhibition-based learning stops when the training goal is achieved. The synaptic inhibition is 
also biologically plausible. According to Young (Young, 1964; Young, 1966) learning is the result 
of the elimination of synaptic connections (closing of unneeded channels). Dawkins (Dawkins R., 
1971) stressed that pattern learning is achieved by synaptic inhibition. As stated by the neural 
groups’ selection theory developed by Edelman (Edelman, 1978; Edelman, 1987), brain 
development is characterized by generating a structural and dynamical variability within and 
between populations of neurons, by the interaction of the neural circuit with the environment and by 
the differential attenuation or amplification of synaptic connections. Research in neurobiology 
seems to confirm the validity of the negative feedback model and the fact that neural development 
follows the process of Darwinian evolution. 
The Chialvo and Bak model is a simple two-layered network. After the training each input pattern 
is associated with a single output unit leading to the formation of an associative map. When an input 
pattern is presented the most activated input unit i is selected. Then the neuron j from the hidden 
layer that establishes the most robust connection with i is selected. Finally the output neuron k that 
is the most strongly connected with j is selected. If k is not the desired output the two links 
connecting i with j and j with k are inhibited by a coefficient d that is the only parameter of the 
model. The iterative application of these rules leads to a rapid convergence towards any input-
output mapping. This selective process followed by an inhibitory one is the essence of the natural 
selection in the evolutionary context. The fittest individual is selected on the basis of a strategy that 
doesn’t reward the best but punishes the worst. That’s the reason why this model has been 
considered a particular kind of synaptic Darwinism. 
Our neuroevolutionary model is also based on a selection strategy. The structural information of 
our network is not codified by genes. We directly consider the entire network as a population of 
nodes that can establish connections, generate other units or die. The probability of these events 
depends on the presence of local and global resources. If there are few resources the population 
falls, if there is a lot of resources the population grows. Like in the Chialvo and Bak model we don’t 
select the fittest nodes reinforcing their links, but we simply remove the worst nodes when the 
ecosystem resources are low. This generates a selective process that indirectly rewards the units 
which can better model the input patterns. Our evolutionary strategy can be seen as a selective 
retention process (Heylighen, 1992) that removes those units which cannot reach a stable state, 
remaining associated with several input patterns. Even if the stability of a unit is quantified by the 
minimum distortion error related to it, this information mustn’t be considered to be environmental 
information. The minimum distortion error simply quantifies the difficulty encountered by the unit 
during the modelling of input patterns. 
3. THE EVOLUTIONARY ALGORITHM 
Research has confirmed (Roughgarden, 1979; Song and Yu, 1988) that in natural environments 
the population size along with competition and reproduction rates continuously changes according 
to some natural resources and the available space in the ecosystem. 
These mechanisms have been reproduced in some evolutionary algorithms, for example to optimize 
the evolution of a population of chromosomes in a genetic algorithm (Annunziato and Pizzuti, 
2000). We have tried to use a similar strategy for the evolution of a population of units in a self 
organizing network without using the string representation of genetic programming. 
In our model each node is defined by a vector of neighbouring units connected to it, a reference 
vector and a variable D that is the smallest distance between its reference vector and the closest 
modelled input. The value of this variable quantifies the debility degree of the unit.  The lower is D 
the higher are the chances for the unit to survive. At each presentation of the training input set, D is 
set to the maximum value. After the presentation of a given input x, if the reference vector w of the 
unit is modified, the resulting distance between the two vectors ||x-w|| is calculated. If this value is 
lower than D it becomes its new value. 
The  training algorithm here used can be subdivided in three phases: 
1) Winners are selected. For each input the unit having the closest reference vector is selected. 
2) The reference vectors of the winners and their neighbours are updated according to the 
following formula : 
(3.1) ( ) ( ) ( )( )1w t w t x w tα+ = + − . 
So the reference vectors w of the selected units are moved towards the relative inputs x of a certain 
fraction of the distances that separate them. For winners this fraction is two or three orders of 
magnitude higher than the one used for their neighbours. So winners have the reference vectors 
moving more quickly towards the inputs. 
3) The population of units evolves producing descendants, establishing new connections and 
eliminating the less performing units. All these events can occur with a well defined probability that 
depends on the availability of resources. 
These rules are iterated until a given goal is achieved. For example the minimization of the 
expected quantization error that is the mean of the distances between the winners and the K inputs 
they model: 
(3.2) 
D x wK
= −�  
If this value falls below a certain threshold Dmin, training is stopped. 
The first two phases can be considered a kind of winner takes all strategy, where only the most 
activated units are selected and enabled to modify their reference vectors. The third phase is the 
evolutionary phase (fig. 3.1). Each unit i, i=[1…N(t)] where N(t) is the actual population size can 
meet the closest winner j with probability Pm: 
Fig. 3.1 – The evolutionary phase of the algorithm. 
If meeting occurs, the two units establish a link and they can interact by reproducing with 
probability Pr. In this case two new units are created. One is closer to the first parent, the other to 
the second parent: 
(3.3) 
If reproduction doesn’t take place due to the lack of resources the weaker unit of the population, i.e. 
the one with the highest debility degree, is removed. 
If unit i doesn’t meet any winner it can interact with the closest node k with probability Pr 
establishing a connection and producing a new unit whose reference vector is set between the 
parents reference vectors: 
(3.4)  1 22 2
p pw ww
When we fix a maximum population size, the ratio between the actual size and the threshold 
N(t)/Nmax can be seen as a global resource of the ecosystem affecting the probabilities of the events. 
For example if the population size is low the reproduction rate should be high. So we can 
reasonably choose Pr = 1-N(t)/Nmax.  If the population size is high, the chance for the units to meet 
each other will be higher, so we can set Pm = N(t)/Nmax. 
We can also consider a local resource that is the ratio between the threshold Dmin and the 
debility degree Di of the unit i. Each unit i should meet a winner with a probability 
Pm=(N(t)/Nmax)(1-Dmin/Di) and Pr = 1 – Pm. In this way winners are not encouraged to migrate to 
other groups of nodes and weaker units don’t participate in reproduction activities. 
We can estimate the population grow rate in the following way: 
(3.5) 
( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )
( ) ( )
( ) ( ) ( ) ( ) ( )( )
( ) ( ) ( ) ( ) ( )
2 2 2
2min min
1 2 1 1 1
2 1 1 2 1 ( )
2 1 1 1 2 1 1 ( )
m r m r m m d
N t N t P P N t P P N t P N t P P N t
N t P N t
N t X t X t X t first model
N t D D
N t X t X t X t second model
M D D
+ = + − − + − − − =
= − =
= − � + = −� �
� � � �� � � �= − − � + = − −� � � �� � � �� �� �� � � �� �� �
where X(t) is the normalized size N(t)/Nmax. This is the quadratic-logistic map of Annunziato and 
Pizzuti(Annunziato and Pizzuti, 2000): 
(3.6) ( ) ( ) ( )( )21 1X t aX t X t+ = −  
They proved that by varying the parameter different chaotic regimes arise. For a<1.7 the behaviour 
is not chaotic, for 1.7<a<2.1 we have chaotic regimes with simple attractors localized in a fixed part 
of the plane of the phases. Theoretically for the first model we expect to obtain a chaotic regime 
that is described by a simple attractor. In the second model the factor (1 – Dmin/D) might reduce the 
influence of the negative feedback in the final part of network training. 
It is possible to demonstrate that during the evolution the population size converges to N(t) = 
0.72 Nmax. In this phase the probability that a unit establishes n connections with the other ones for 
the first model, considering only clusters of n units, is given by: 
(3.7) ( )
max0.72 1
max max
1max max
0.72 0.72
n i nN n
P n n
� � � �
= − =� � � �
� � � �
It has to be pointed out we have subtracted the probability that such n links developed within a 
cluster of more than a n unit. 
The coefficients α  and β  of the power law are considered constant at the end of the training. To 
compute their values, we can take into consideration the cases n=1 and n=0.72N Nmax-1 which 
correspond to the minimum and maximum number of connection at the end of the training. 
(3.8) ( )
max0.72 2
1 0.72 0.72 1
P βα α
= − = =�  
( ) ( )
0.72 2
0.72 1 1
max max
0.72 1 0.72 0.72 0.72 0.72 1
P N N
− = = − −� �
max max
0.72 2
0.72 1 0.72 1
0.72 0.72
� �� =
The distribution tail of the degrees tends to stretch when the maximum size of the population 
increases, it means that in wider networks there are more hubs with a higher degree. 
For the second model we can consider that at the end of the training (1-Dmin/D) ∼ε   
So the probability that a unit establishes n links becomes: 
(3.9) 
max0.72 1
max max
1max max
0.72 0.72
n i nN n
P n n
βε ε α
� � � �
= − =� � � �
� � � �
max max
0.72 2
0.72 1 0.72 1
0.72 0.72
� �� =
and the considerations made for the first model can be therefore extended to the second model. 
4. TRAINING THE NET: SIMULATIONS  
We have compared the performances of our networks with those of a Growing Neural Gas in 
categorizing bidimensional inputs.  
GNG is a self organizing network which thanks to both the competitive hebbian learning 
strategy and the neural gas algorithm can categorize inputs without altering their exact 
dimensionality.For the GNG, the parameters of the model are �����α  = 0.5, β  = 0.0005 and at each λ  = 
300 steps a new unit is inserted. The maximum age of the links is set to 88.  
For the two different ENG models, the parameters are α  = 0.05, β  = 0.0006 and the maximum 
size is set to Nmax = 120. 
As stopping criterion for both the algorithms we have chosen the minimization of the expected 
quantization error that is the average distance between the winners and the corresponding inputs. 
We have considered two different input domains. In the first case inputs are localized within four 
square regions, in the second one inputs are uniformly distributed in a ring region. 
As shown in fig.4.1 after the training, GNG reference vectors are all positioned in the input 
domain. In the Evolutionary Self Organizing Networks (fig.4.2a and fig.4.2b) some units fall 
outside the input domain, but in this way the network remains fully connected. The nodes’ 
distribution statistical analysis confirms what appears to be intuitively patent: the emerging network 
structure is a typical scale-free one, i.e. a structure where few hubs manage the links. 
Fig. 4.1 – Growing Neural Gas simulations. 
Fig. 4.2 a – Evolutionary Self Organizing Network simulations (first model). 
Fig. 4.2 b – Evolutionary Self Organizing Network simulations (second model). 
We trained 30 networks of each type obtaining the average degree distributions reported in fig.4.3-
4.5. In tab. 4.1 – 4.2 are reported the average values of the structural parameters of the two 
networks. 
y = 37,074x
-2,0371
1 2 3 4 5
degree
y = 68,961x
-2,9864
1 2 3 4 5 6
degree
Fig. 4.3 – Average degree distribution in GNG (two different input domains) 
y = 23,75x-1,149
1 2 3 4 5 6 7 8 9
degree
y = 22,069x
-1,1143
1 2 3 4 5 6 7 8
degree
Fig. 4.4 – Average degree distribution in ENG (first model, two different input domains) 
y = 24,415x-1,1411
1 2 3 4 5 6 7 8 9
degree
y = 21,98x
-1,1408
1 2 3 4 5 6 7 8
degree
Fig. 4.5 – Average degree distribution in ENG (second model, two different input domains) 
While GNG have a high value for the average path length and a low clustering coefficient, ENG 
have a short average path length and a high clustering coefficient which along with the power law 
tail of the degree distribution confirm its scale free graph features. 
 Average 
path 
length 
Clustering 
coefficient 
Power 
exponent 
GNG -  0.49 2.04 
ESON (1st) 3.82 0.64 1.15 
ESON(2nd) 3.92 0.63 1.14 
 Tab. 4.1 – Comparison of structural parameters (average values, first input domain) 
 Average 
path 
length 
Clustering 
coefficient 
Power 
exponent 
GNG 6.4 0.42 2.98 
ESON (1st) 3.61 0.58 1.11 
ESON(2nd) 3.67 0.59 1.14 
Tab. 4.2 – Comparison of structural parameters (average values, second input domain) 
Fig. 4.6 – 4.7 shows the population dynamics of the two ENG models. 
The structure shared by the two different ENG models is due to the fact that the winner units tend to 
establish the greatest number of connections. These are the favoured units with which each node try 
to establish a connection. If the probability depends also on the local distortion error as it happens in 
the second model, we obtain a final structure that is more similar to the GNG, which is to say more 
similar to a gas. In point of fact, the conditions to create a new link become more restrictive, 
reducing the interaction among each cluster and the whole network. The structure of connections 
seems to extend more uniformly in the regions where inputs are present as it can be seen in picture 
4.2b (more evident in the circular distribution). 
Picture 4.7 shows the dynamics of the populations in the two different models of ENG. In the first 
model the population size seems to converge to the final value of 0.72Nmax, confirming the 
experimental results of Annunziato and Pizzuti. As it can be noticed in fig. 4.6, since the d value 
gradually diminishes during the training, the influence of the factor (1-Dmin/d) grows reducing the 
effects of the negative feedback which characterizes the quadratic logistic map. This justifies the 
sudden growth of the population at the end of the training in the second model. 
1 3 5 7 9 1
epochs
1 2 3 4 5 6 7 8 9 10 11
epochs
Fig. 4.6 – Network size evolution of the two ENG models (first input domains). 
At the end of training new units connect with the winner units which have a lower d, while the 
subgroups of units become more isolated. Considering the function (X(t),X(t+1)) the attractor 
becomes more marked in the second model. This means that the system tends to converge more 
toward a precise final state with a lower interaction among the groups of units. 
Fig. 4.7 – population dynamics (X(t),X(t+1)) of the two Esonet models (first input domain). 
5. THE ROLE OF INFORMATION IN FUNCTIONAL SPECIALIZATION AND 
INTEGRATION 
We can classify a system as complex when it is made up of different parts heterogeneously 
interacting. In addition, its behaviour and its structure have to be neither completely casual (as it 
happens in a gas) nor too regular (as it happens in a crystal). In Nature we generally observe the co-
existence of functionally highly specialized integrated areas. 
That’s what happens in the brain, where different areas and groups of neurons interact to give 
rise to an integrated and unitary cognitive scenario (G. M. Edelman, G. Tononi, 2000). 
Edelman has introduced the integration, reciprocal information and complexity concepts in 
order to mathematically define the functional organization of the cerebral structures. 
Within a complex system, a subset of elements can be defined an integrated process if – on a 
given temporal scale – the elements interact more strongly with each other than with the system. In 
a neural net or in a self-organizing one it means that the units of an integrated group will tend to 
simultaneously activate themselves. 
When the units in a subset are independent, the system’s entropy reaches its maximum value 
which is the sum of the entropies of the single elements (local entropies). On the contrary, when any 
kind of interaction occurs, the global entropy decreases so becoming lower than the sum of the local 
entropies. The integration measure is, therefore, a natural indicator of the system informational 
“capacity”. 
So the integration of a subset of network units can be calculated by deducting the sum of the 
entropies of each single component ( )ix  from the entropy of the system considered as a whole. If 
each unit can only take two states (activated/not-activated), the amount of the possible activation 
patterns of a subset with N units is N2 . So the system maximum entropy is: 
(5.1) ( ) ( )max 2 2 2
1 2 1
log log 1 log 2
i i NN
i i i
H X H x p N
� � � �= = = = =� �� �
� �� �
� �  
and the integration will be: 
(5.2) ( ) ( ) ( )
I X H x H X
= −�  
for the self-organized net here considered, the integration of a sub-group of units takes the 
following expression: 
(5.3) ( )
I X N P
� �� �
= − � �� �+� � � �
where Pi is the probability for a node to establish i connections. The overall number of the 
system’ states is equal to the total number of possible groups of i+1 units. Groups of units having 
the same dimension (groups of i+1 units) give the same contribution to the entropy of the system. 
If we choose the WTA strategy as activation modality, for each presented input only a single 
unit (the winner) and the 1<i<N-1 i units will activate themselves. All the other ones remain not-
activated. 
The probability for a node to create connections is ruled by the power law βα −= kPi , with α and 
β depending on 1) the network dimension, 2) the local distortion errors (for the second model) and 
3) the particular evolution of the network structure, i.e. the dynamic behaviour of ( )tα  and ( )tβ . 
So the integration of the two self-organizing network here presented is: 
(5.4) ( ) ( )
I X N i
� � � �= − � �� �+ � �� �
The integration can be seen as a measure of the statistic dependency within a subset of units. 
The stronger their interactions are, the higher their integration. 
In order to measure the statistic dependency between a subset and the whole system, Edelman 
introduced the concept of mutual information. Given an n subset made up of k elements ( )knX  and 
its complement in the system ( )knXX − , the mutual information is: 
(5.5) ( ) ( ) ( ) ( );k k k kn n n nIR X X X H X H X X H X− = + − −  
The mutual information is essential to evaluate the differentiation degree of a system, i.e. it is a 
significant index of the system’ “resolution” degree, calculated on the subdividable and distinct 
states.  
In order to measure the information of an integrated activation pattern, we calculate how the 
states of a given subset can differentiate them from the whole system ones. Which thing, following 
Edelman, is equivalent to considering the whole system as the observer of itself. In fact, if entropy 
measures the variability of a system according to an external observer evaluation, the mutual 
information measures the system variability according to an observer ideally placed within the 
system itself. 
The overall measure of the differentiation degree of a complex system is given by the mutual 
information average between each subset and the whole system: 
(5.6) ( ) ( )
C X IR X X X
= −�  
Edelman defined such measure as complexity and its value is high if each subset can averagely take 
many different states which are statistically depending on the whole system’s ones, so it shows how 
the system is differentiated. High complexity values correspond to an optimal synthesis of 
functional specialization and functional integration. Systems whose elements are not integrated 
(such as a gas) or not specialized ( such as an homogeneous crystal) have a minimum complexity. 
In the evolutionary neural gas case, the WTA strategy limits the integration among the 
activation patterns. So the mutual information between any activation pattern and the other possible 
patterns is equal to zero. It justifies the use of the term “gas”, since the patterns behave like isles of 
information weakly interacting each other. 
If there were selected more winner units for the same input signal in the early training phase, we 
could get a given system status characterized by i + 1 activated units not only by the activation of 
just a single winner and its related i units, but also by the activation of more winners. therefore  we 
should also take into consideration all the possible subgroups with j+1 elements. 
The mutual information formula between a subgroup with k activated units and the system is 
given by: 
(5.7) 
( ) ( ) ( )
( ) ( )
k ii jk
n ii ji j
H X i j
= = − −
� �+	 
� � � � � �= +� �� � � � +� �+ + � �� � � �
 � +� �� �+� �� �
( ) ( ) ( )
( ) ( )
k ii jk
n ii ji j
H X X i j
= = − −
� �+	 
� � � � � �− = + +� �� � � � +� �+ + � �� � � �
 � +� �� �+� �� �
( )( ) ( )
( )( ) ( )
1 1 log
kk jj
−−− −
−−−= −
� �	 
� � � � � �+ − − + +� �� �� � � � � �+ � �� � � �� �
 � − +� �� �+� �� �
( ) ( ) ( )
N ii j j
N i i
i j j
i j k
β β βα α α
− − −
+ +� �� � � � � �� �+ + + − ⋅� �� � � � � �� �+ +� � � � � �� �� �
� �  
( ) ( ) ( )
ii j j
i j j
β β βα α α− − −
+ +� �� �� � � �
+ + −� �� �� � � �+� � � �� �� �
( ) ( ) ( )
( ) ( )
N ii j
ii ji j
H X i j
= = − −
� �+	 
� � � � � �= +� �� � � � +� �+ + � �� � � �
 � +� �� �+� �� �
To provide the system with a greater level of complexity, in order to favouring the integration 
among the network unit subgroups, it is, therefore, necessary adopting a strategy different from the 
WTA in the early training phases so as to select more winner units. 
6) CONCLUSIONS AND FUTURE WORKS 
The here presented self-organizing network can be considered as an example of autopoietic 
system which evolves by means of a closed network of interactions and based upon the production 
of components (the categorization units). In the course of the reproductive dynamics, those ones 
produce other components, also belonging to the system (i.e. other categorization units) which 
maintain the system identity over time with respect to the experimental task.  
In particular, it has to be noticed that they are not just the environmental information to lead the 
evolution of the network of connections, but rather the network internal status, which is 
individuated globally by the size that the population has reached and locally by the values of the 
parameters of the units. The latter show the difficulties that the units encounter in modelling the 
presented input, such difficulty is directly proportional to the amount of variations their reference 
vectors are subjected to. 
Learning and the capability to model the system external inputs, therefore, emerges more by 
means of the population internal dynamics than by means of a learning algorithm. 
The appearing of a scale-free structure emerging from the choice of the population dynamics is 
peculiarly significant for the model’s biological plausibility. Which thing describes a quite phase-
transition-like status where cluster “float” as informational “isles” in a “gaseous” configuration. It is 
worthy noticing that the WTA strategy and the environmental noise (probabilistic laws) suffice to 
create a kind of basic informational skeleton around which more interconnected functional 
structures can then aggregate. In the nervous system, it plausibly happens according to an 
essentially genetic design. Such kind of neural dynamics guarantees flexibility and redundancy to 
the informational nuclei which are ready to synchronize and connect through signals. Actually, 
what we tried here to describe is a proto-neural scenario with low integration of clusters which are 
specialized in easy categorization tasks. 
Developing the ENG model requires to investigate different synchronization scenarios among 
clusters and their ensuing functional integration to execute more complex tasks. In particular, it is 
necessary to modify the evolutive dynamics so as to mane the connections among units active. In 
this way, it should be possible to create a dynamic neural topology susceptible of hierarchical 
organization. 
Everything seems to confirm not only the deep reasons for the scale-free structures recurring in 
nature (Z. Toroczkai, K. E. Bassler, 2004), but also the fundamental lesson associating complexity 
with a thin border zone between integration and differentiation among the functional modules of a   
system. 
Acknowledgements: The authors thank Eliano Pessa and Graziano Terenzi for their precious 
suggestions. 
REFERENCES 
Albert R. and Barabasi A.(2000) Topology of evolving networks: Local events and universality. 
Physical Review Letters vol.85, p.5234. 
Annunziato M. and Pizzuti S.(2000), Adaptive Parametrization of Evolutionary Algorithms Driven 
by Reproduction and Competition, in Proceedings of ESIT 2000, Aachen, Germany. 
Chialvo D.R. and Bak P. (1999) Learning from mistakes, in  Neuroscience Vol.90, No.4, pp.1137-
1148. 
Dawkins R. (1971), Selective neurone death as a possible memory mechanism, in Nature n.229, 
pp.118-119. 
Delaunay B. (1934), Bullettin of the Academy of Sciences USSR, vol.7, pp. 793-800. 
Edelman G.M. (1978), Group selection and phasic reentrant signaling: a theory of higher brain 
function, in The Mindful Brain (eds Edelman G.M. and Mountcastle V.), pp. 51-100, MIT, 
Cambridge. 
Edelman G.M. (1987), Neural Darwinism: The Theory of Neuronal Group Selection, Basic Books, 
New York. 
Edelman G.M., Tononi G. (2000), Un universo di coscienza, Biblioteca Einaudi, Torino. 
Fritzke B. (1994), Growing Cell Structures. A Self-Organizing Network for Unsupervised and 
Supervised Learning. in Neural Networks, 7(9), pp. 1441-1460. 
Heylighen F. (1992), Principles of Systems and Cybernetics: an evolutionary perspective, in: 
Cybernetics and Systems ’92, R. Trappl (ed.), World Science, Singapore, pp. 3-10. 
Langton C.G. (1989), Artificial Life: The Proceedings of an Interdisciplinary Workshop on the 
Synthesis and Simulation of Living Systems,  Addison-Wesley. 
Jefferson D. and Taylor C. (1995), Artificial Life as a Tool for Biological Inquiry, in Artificial Life: 
an Overview, edited by C.G. Langton, MIT press, pp.1-10. 
Kohonen T. (1982), Self-Organized Formation of Topologically Correct Feature Maps, in 
Biological Cybernetics, n.43, pp.59-69. 
Martinetz T.M. (1993), Competitive Hebbian Learning Rule Forms Perfectly Topology Preserving 
Maps, in ICANN’93, International Conference on Artificial Neural Networks, Springer, pp. 427-
434. Amsterdam.  
Martinetz T.M. and K.J. Schulten, (1991), A Neural Gas Network Learns Topologies, In Artificial 
Neural Networks, T.Kohonen, K. Makisara, O. Simula, and J. Kangas, eds, , pp. 397-402. North-
Holland, Amsterdam. 
Martinetz T.M. and Schulten K.J. (1994), Topology Representing Networks, in Neural Networks, 
7(3), pp. 507-522. 
Moriarty D.E. and R.Miikkulainen (1998), Forming Neural Networks Through Efficient and 
Adaptive Coevolution, in Evolutionary Computation, 5(4), pp.  373-399. 
Paredis J. (1995), Coevolutionary Computation, in Artificial Life, 2, pp.355-375. 
Rocha L.M. (1997) Evolutionary Systems and Artificial Life, Lecture Notes. Los Alamos, NM 
87545 
Roughgarden J.,(1979), Theory of Population Genetics and Evolutionary Ecology, Prentice-Hall. 
Smith R.E., Forrest S., Perelson A.S. (1993), Searching for Diverse Cooperative Populations with 
Genetic Algorithms, in  Evolutionary Computation, 1(2), 127-149. 
Song J. and Yu J. (1988), Population System Control, Springer-Verlag. 
Steyvers M. and Tenenbaum J., 2001. The Large-Scale structure of Semantic Networks. Working 
draft submitted to Cognitive Science. 
Toroczkai,Z. and Bassler,K.E. (2004), Jamming is Limited in Scale-Free Systems, in Nature, 428 , 
p.716 
Watts D.J., Strogatz S.H. , 1998. Collective dynamics of ‘small-world’ networks. Nature, vol. 393, 
pp. 440-442. 
Young J.Z. (1964), A Model of the Brain, Clarendon, Oxford. 
Young J.Z. , (1966), The Memory System of the Brain, University of California Press, Berkeley.
ABSTRACT
  Despite their claimed biological plausibility, most self organizing networks
have strict topological constraints and consequently they cannot take into
account a wide range of external stimuli. Furthermore their evolution is
conditioned by deterministic laws which often are not correlated with the
structural parameters and the global status of the network, as it should happen
in a real biological system. In nature the environmental inputs are noise
affected and fuzzy. Which thing sets the problem to investigate the possibility
of emergent behaviour in a not strictly constrained net and subjected to
different inputs. It is here presented a new model of Evolutionary Neural Gas
(ENG) with any topological constraints, trained by probabilistic laws depending
on the local distortion errors and the network dimension. The network is
considered as a population of nodes that coexist in an ecosystem sharing local
and global resources. Those particular features allow the network to quickly
adapt to the environment, according to its dimensions. The ENG model analysis
shows that the net evolves as a scale-free graph, and justifies in a deeply
physical sense- the term gas here used.

<|endoftext|><|startoftext|>
X-ray Dichroism and the Pseudogap Phase of Cuprates
S. Di Matteo1, 2 and M. R. Norman3
Laboratori Nazionali di Frascati INFN, via E. Fermi 40, C.P. 13, I-00044 Frascati, Italy
Equipe de physique des surfaces et interfaces, UMR-CNRS 6627 PALMS,
Université de Rennes 1, 35042 Rennes Cedex, France
Materials Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
(Dated: October 26, 2018)
A recent polarized x-ray absorption experiment on the high temperature cuprate superconduc-
tor Bi2Sr2CaCu2O8+x indicates the presence of broken parity symmetry below the temperature,
T*, where a pseudogap appears in photoemission. We critically analyze the x-ray data, and con-
clude that a parity-breaking signal of the kind suggested is unlikely based on the crystal structures
reported in the literature. Possible other origins of the observed dichroism signal are discussed.
We propose x-ray scattering experiments that can be done in order to determine whether such
alternative interpretations are valid or not.
PACS numbers: 78.70.Ck, 78.70.Dm, 75.25.+z, 74.72.Hs
I. INTRODUCTION
Twenty years since the discovery of high-temperature
cuprate superconductivity, there is still no consensus on
its origin. As the field has evolved, more and more at-
tention has been directed to the pseudogap region of
the phase diagram in underdoped compounds and the
possible relation of this phase to the superconducting
one.1 Time-reversal breaking has been predicted to occur
in this pseudogap phase due to the presence of orbital
currents2 and a subsequent experiment3 using angle-
dependent dichroism in photoemission has claimed to de-
tect this. However, this result has been challenged by
others4 and an independent experimental verification of
this would be highly desirable.
Recently, Kubota et al.5 performed Cu K edge circu-
lar and linear dichroism x-ray experiments for under-
doped Bi2Sr2CaCu2O8+x (Bi2212), claiming no time-
reversal breaking of the kind predicted in Ref. 2 exists,
and that, on the contrary, a parity-breaking signal (but
time-reversal even) is present with the same tempera-
ture dependence as the photoemission dichroism signal,
that was interpreted as x-ray natural circular dichroism
(XNCD) as seen in other materials.6
The aim of the present paper is to critically examine
the conclusions of Kubota et al.5 In particular, we find
that the XNCD signal for the average7 Bi2212 crystal
structure should be zero along all three crystallographic
axes, therefore casting doubt on the original interpreta-
tion of Ref. 5. To look into alternate explanations, we
performed detailed numerical simulations aimed at ex-
plaining the observed signal. At the basis of our study
is the simple observation (see, e.g., Ref. 8) that circu-
lar dichroism in absorption can be generated either by
a non-magnetic effect in the electric dipole-quadrupole
(E1-E2) channel (XNCD, a parity-breaking signal), or
by a magnetic signal in the E1-E1 channel (parity-even).
Alternately, such a signal can be due to contamination
from x-ray linear dichroism (XNLD). We propose x-ray
experiments that could be used to investigate these mat-
ters further.
The structure of the paper is as follows. In Sec. II we
show how symmetry constrains any possible XNCD sig-
nal that would be observed in Bi2212. We also perform
numerical simulations for XNCD assuming an alignment
displaced from the c-axis, using several crystal structure
refinements proposed in the literature. We also calculate
the XNLD signal and comment whether XNLD contam-
ination could be responsible for the observed signal. In
Sec. III we calculate the x-ray magnetic circular dichro-
ism (XMCD) signal at both the Cu K and L edges as-
suming magnetic order on either the copper or oxygen
sites. Finally, in Sec. IV, we draw some general conclu-
sions from our work.
II. NON-MAGNETIC CIRCULAR DICHROISM
IN BI2212
The structure of Bi2212 is strongly layered, with insu-
lating BiO blocks intercalated between superconducting
CuO2 planes. Crystal structure refinements reveal the
presence of an incommensurate modulation whose origin
has been the subject of much debate. The presence of
this modulation has complicated the determination of the
average crystal structure. Two different average struc-
tures have been proposed in the literature for Bi2212:
Bbmb9,10 and Bb2b11,12,13,14, where b is the modulation
direction for the superstructure. We follow the general
convention in the cuprate literature and use a rotated
basis compared to those in the International Tables for
Crystallography15 (respectively, Cccm, No. 66, and Ccc2,
No. 37). In this way, the c-crystallographic direction is
orthogonal to the CuO2 planes.
The Bbmb structure is globally centrosymmetric and,
as such, does not admit a non-zero value for the parity-
odd operator ~L · (ǫ̂∗ × ǫ̂)(~Ω · k̂), whose expectation value
determines the XNCD signal (~L, k̂, ǫ̂ and ~Ω are, re-
spectively, the angular momentum operator, the direc-
tion and polarization of the x-ray beam, and the toroidal
http://arxiv.org/abs/0704.0599v2
momentum operator, see, e.g., Refs. 8,16). Therefore, if
the signal measured by Kubota et al.5 were a true XNCD
signal, this would imply a lower crystal symmetry than
Bbmb. We note that most refinements in the literature
suggest Bbmb,9,10 and this crystal structure is also con-
sistent with recent photoemission data17 that indicate
the presence of both a glide plane and a mirror plane
based on dipole selection rules.
The other average structure that has been suggested
by x-ray and neutron diffraction is Bb2b.11,12,13,14 This
space group is not centrosymmetric and, therefore, a par-
ity breaking signal like that of XNCD is in principle al-
lowed. However, not all wave vector directions are com-
patible with the presence of a XNCD signal, as demon-
strated below by symmetry considerations. In the last
part of Section II, we numerically calculate the XNCD
for a geometrical configuration allowing a signal - like
~k‖(101) - by means of the multiple-scattering subroutine
in the FDMNES program.18
In the context of this program, atomic potentials are
generated using a local density approximation with a
Hedin-Lundqvist form for the exchange-correlation en-
ergy. These potentials are then used in a muffin tin
approximation to calculate the resulting XANES sig-
nal by considering multiple scattering of the photo-
electron about the absorbing site within a one-electron
approximation.19 In the future, it would be desirable
to repeat these calculations by using input from self-
consistent band theory, as has recently been done for the
Bbmb structure in regards to angle resolved photoemis-
sion spectra.20
In the Bb2b setting, Cu ions belong to sites of Wyck-
off multiplicity 8d. These eight equivalent copper sites
can be partitioned in two groups of four sites that are
related by the vector (1/2,0,1/2). Within each group
the four sites are related by the symmetry operations
{Ê, Ĉ2y, m̂x, m̂z}, where Ê is the identity, Ĉ2y is a two-
fold axis around the b crystallographic axis, and m̂x(z)
is a mirror-symmetry plane orthogonal to the a(c)-axis.
The absorption at the Cu K edge, expressed in Mbarn,
can be calculated through the equations:
σ(±) =
j , (1)
j = 4π
2αh̄ω
|〈Ψ(j)n |Ô
(±)|Ψ
2δ[h̄ω − (En − E0)]
The operator Ô(±) ≡ ǫ̂(±) ·~r(1+ i
~k ·~r) in Eq. (2) is the
usual matter-radiation interaction operator expanded up
to dipole (E1) and quadrupole (E2) terms, with the pho-
ton polarization ǫ̂ and the wave vector ~k, where we label
left- and right-handed polarization by the superscript ±.
n ) is the ground (excited) state of the crystal,
and E0 (En) its energy. The sum in Eq. (2) is extended
over all the excited states of the system and h̄ω is the
energy of the incoming photon, with α the fine-structure
9.029.019.008.998.98
Energy (keV)
Pet (XNCD x 1000)
Glady (XNCD x 400)
Kan (XNCD x 100)
FIG. 1: (Color online) XANES signal for ~k‖(001) and XNCD
signal for ~k‖(101) at the Cu K edge for three Bb2b crystal
refinements, with a cluster radius of 4.9 Å. The crystal struc-
tures are Pet for Petricek et al.11, Glady for Gladyshevskii
and Flükiger13 , and Kan for Kan and Moss12. The XNCD
signals have been multiplied by the factors indicated. Each
successive set of curves is displaced by 0.4 Mbarn.
constant. Finally, the index j = 1, ..., 8 indicates the
lattice site of the copper photoabsorbing atom in the
unit cell. Eqs. (1) and (2) are the basis of the numer-
ical calculations of the FDMNES program.18 The eight
contributions can be written as the sum of two equal
parts coming from the two subsets of four ions related by
the (1/2,0,1/2) translation. Within each subset, the four
absorption contributions are related to one another by
the symmetry operations σ2 = Ĉ2yσ1, σ3 = m̂xσ1, and
σ4 = m̂zσ1, implying that the total absorption is:
σ = 2(1 + m̂z)(1 + m̂x)σ1 (3)
The group from σ5 to σ8 is equivalent to the first group of
four modulo a translation (this is the reason for the factor
of two in Eq. (3)). Notice that the symmetry operators in
Eq. (3) are meant to operate just on the electronic part
of the operator Ô in Eq. (2).
In the case of circular dichroism, the signal is given by
σ = σ+−σ−. If we suppose that no net magnetization is
present in the material (we shall analyze the possibility
of magnetism in Sec. III), then the dichroism is natural,
i.e., necessarily coming from the interference E1-E2 con-
tribution in Eq. (2). In this case the signal is parity-odd,
which implies that m̂z(x) ≡ ÎĈ2z(2x) → −Ĉ2z(2x) (Î is
the inversion operator). Then Eq. (3) becomes
σ = 2(1− Ĉ2z)(1 − Ĉ2x)σ1 (4)
which implies that, of the possible five second-rank ten-
sors involved in XNCD, only the term T
1 − T
−1 sur-
vives. To arrive at this result, we applied the usual op-
erator rules on spherical tensors:21 Ĉ2xT
m = T
−m and
Ĉ2zT
m = (−)
m . This in turn leads to a zero XNCD
along the three crystallographic axes of the Bb2b crys-
tal structure where, e.g., along the c-axis, the signal is
proportional to T
Therefore, even in the Bb2b crystal structure, the
XNCD is exactly zero by symmetry when the wave vec-
tor is directed along the c-axis (i.e., orthogonal to the
CuO2 planes) as in the experiment of Ref. 5. This has
been further checked by numerical calculations with clus-
ter radii up to 6.5 Å, i.e., 93 atoms, centered on the Cu
ion, based on the average crystal structures reported in
Refs. 11,12,13,14. The only possibility to justify theo-
retically the experimental evidence of circular dichroism
of a non-magnetic nature is either by lowering of the or-
thorhombic Bb2b symmetry, a misalignment of the k̂-
direction with respect to the c-axis, or contamination
from linear dichroism. We checked all of these possi-
bilities.
If we take into account the monoclinic supercell pro-
posed in Ref. 13, with space group Cc, this implies a re-
duction of the symmetry operations, with the loss of the
two-fold screw axis. Nonetheless, the glide plane contain-
ing the normal to the CuO2-planes is still present (m̂x
in Eq. 3), which is responsible for the extinction rule of
the quantity 〈Ψn|~L · (ǫ̂
∗ × ǫ̂)(~Ω · k̂)|Ψn〉. Therefore the
XNCD is again identically zero by symmetry, which we
verified by direct numerical simulation of the supercell.
Note that only a reduction to triclinic symmetry would
allow for XNCD in the direction orthogonal to the CuO2
planes.
We also checked for the possibility of misalignment, as
shown in Fig. 1, by a direct calculation with ~k‖(101),
corresponding to a tilting θ ∼ 10o with respect to the
c-axis (note that the a lattice constant is ∼ 5.4 Å, and
the c one ∼ 30.9 Å). We first remark that the calcu-
lated XANES signal compares well to experiment (the
XANES for ~k‖(001) and ~k‖(101) are identical on the
scale of Fig. 1). Despite this, the energy profile of the
XNCD signal is different from the one reported by Kub-
ota et al.5 The main difference is the energy extension of
the calculated dichroic signal, whose oscillations persist,
though with decreased intensity, more than 50 eV above
the edge itself. This characteristic is present for all four
Bb2b refinements we have looked at (and as well for the
monoclinic supercell, which we do not show) and is at
variance with the experimental results of Ref. 5, where
the dichroic signal is confined to a small energy range
around the edge.
Some comments on the calculations are in order. Each
refinement gives rise to a different XNCD signal, and
-0.05
9.029.019.008.998.98
Energy (keV)
 Pet (XNCD x 1000)
 Kan (XNCD x 100)
 Glady (XNCD x 400)
FIG. 2: (Color online) XNCD signal, as in Fig. 1, but for a
CuO5 cluster.
9.029.019.008.998.98
Energy (keV)
R=2.1
R=3.1
R=4.9
R=4.1
R=5.5
R=6.5
XNCD x 1000
FIG. 3: (Color online) XNCD signal at the Cu K edge for
~k‖(101) as a function of the cluster radius (in Å) for the crys-
tal structure of Petricek et al.11. The signal has been mul-
tiplied by 1000. Each successive curve is displaced by 0.06
Mbarn.
their intensities are quite different as well. They are
found to be strongly dependent on the magnitude of the
deviation of the atoms from their Bbmb positions, which
differs significantly among the various Bb2b refinements.
Moreover, the structure of Kan and Moss leads to a mani-
festly different energy profile. This difference is seen even
for CuO4 and CuO5 clusters (the results for a CuO5 clus-
ter are shown in Fig. 2) and has to do with the large de-
parture of this particular structure from the Bbmb one.
Although there has been some criticism in the literature
concerning this particular refinement,10 the point we wish
to make is that each refinement has a different XNCD
signal, showing how sensitive this signal is to the actual
crystal structure. We note that Fig. 1 was done for a
cluster radius of 4.9 Å, i.e., 37 atoms. In Fig. 3, we show
results for the refinement of Petricek et al.11 up to a ra-
dius of 6.5 Å (93 atoms), showing the development of ad-
ditional structure in the energy profile as more and more
atoms are included in the cluster. In the real system, the
effective cluster radius is limited by the photoelectron
escape depth, which is energy dependent.22
We also remark that the magnitude of the XNCD sig-
nal we calculated for a 10 degree misalignment is com-
parable to that measured in Ref. 5. On the other hand,
the signal goes as sin(2θ), where θ measures the displace-
ment from the c-axis. We note that Kubota et al.5 men-
tion that their signal was insensitive to displacements
from the c-axis of 5 degrees, which would argue against
a misalignment given the strong angular dependence we
predict. Moreover, we note that the size of the signal
only depends on the projection of the k vector onto the
a-c plane, i.e., a signal for ~k‖(111) is equivalent to that
for ~k‖(101).
The above leads us to suspect that neither misalign-
ment, nor symmetry reduction, are the basis of the signal
detected in Ref. 5. We now turn to the third possibility
for a non-magnetic signal, that due to intermixing of lin-
ear dichroism. All x-ray beams at a synchrotron have a
linear polarization component (Kubota et al.5 mention
the possibility of up to 5% of linear admixture). The
resulting linear dichroism, which vanishes for a uniaxial
crystal, can swamp the intrinsic XNCD signal for a biax-
ial crystal where the a and b directions are inequivalent
(like Bi2212). This was shown by Goulon et al. for a
KTiOPO4 crystal with the same point group symmetry
(mm2) as Bi2212.23
To analyze this further, we show the linear dichro-
ism (the XANES signal for ~E‖(010) minus the one for
~E‖(100)), calculated for the Bb2b refinement of Glady-
shevskii et al.,14 for various cluster radii, in Fig. 4. Simi-
lar results have been obtained for the other crystal struc-
tures, including the Bbmb refinement of Miles et al.10
The energy profile, with a positive peak followed by a
negative peak, and its location at the absorption edge, is
very reminiscent of the data. Moreover, the size of the
XNLD signal is large, meaning that only a few percent
admixture is necessary to explain the size of the signal
seen in Ref. 5. One issue is that Kubota et al.5 did report
the existence of an XNLD signal, but also claim that it is
temperature independent. This is somewhat puzzling, as
there are significant changes of the lattice constants with
temperature.24 One obvious question would be why such
an XNLD contamination would only appear below T*,
though it should be remarked that there are anomalies
in the superstructure periodicity near T*.24 A definitive
test would be to rotate the sample under the beam, as
any XNLD signal would vary as cos(2φ) where φ is the in-
plane angle relative to the b-axis. Any circular dichroism
(XNCD or XMCD) is instead φ-independent.
9.029.019.008.998.98
Energy (keV)
 R=3.0
 R=4.1
 R=5.1
XANES
XNLD x 50
FIG. 4: (Color online) XNLD signal at the Cu K edge for
~k‖(001) as a function of the cluster radius (in Å) for the crys-
tal structure of Gladyshevskii et al.14. The XNLD signal has
been multiplied by 50. The XANES curve has been displaced
by 0.05 Mbarn.
A final possibility would be a small energy shift be-
tween the left and right circularly polarized beams. Dif-
ferentiating the absorption edge in Fig. 1 would indeed
lead to a signal similar to that seen in Ref. 5 (but with
an enhanced positive peak relative to the negative peak).
But such an energy shift is difficult to imagine with the
particular experimental set up used.
The observed dichroism signal as a function of energy
is also reminiscent of that typically seen for magnetic
circular dichroism: in this case the signal would be of
E1-E1 origin and its main features are indeed expected
to be at the edge itself. In addition, the nature of the
observed signal (a single sharp positive peak followed by
a single sharp negative peak) is also much like a magnetic
signal where the main features are expected to be more
localized in energy starting from the rising edge of the
absorption. Whether this possibility is realistic or not
can only be determined by a quantitative calculation,
which we offer in the next section.
III. MAGNETIC DICHROISM IN BI2212
Over the years, there have been several claims of pos-
sible magnetic order in the pseudogap phase of cuprate
superconductors. Recently, a magnetic signal at a (101)
Bragg vector has been observed below T* for several un-
derdoped YBa2Cu3O6+x (YBCO) samples by polarized
neutron diffraction.25 The signal, corresponding to a mo-
ment of order 0.05-0.1 µB, is not simple ferromagnetism
as it was not observed at the (002) Bragg vector. Even
more recently, a Kerr rotation below T* has been de-
tected in underdoped YBCO corresponding to a net fer-
9.029.019.008.998.98
Energy (keV)
 XANES Kan
 XMCD Pet
 XMCD Kan
 XMCD Glady
XMCD x 10000
FIG. 5: (Color online) XMCD for ~k‖(001) at the Cu K edge
with a ferromagnetic moment of 0.1 µB along the c-axis at
each copper site. The cluster radius is 4.1 Å. The XMCD
signal has been multiplied by 10000. The crystal structures
are those of Fig. 1.
romagnetic moment of 10−4 µB.
This motivates us to consider the possibility of a mag-
netic origin for the x-ray circular dichroism detected in
Ref. 5. We restate that in absorption (see, e.g., Ref. 8),
circular dichroism can be generated either by a non-
magnetic effect in the E1-E2 channel (XNCD, a parity-
breaking signal), or by a magnetic signal in the E1-E1
channel (XMCD, parity-even). The first possibility, an-
alyzed in the previous section, does not seem to be com-
patible with the experimental results. In order to analyze
the second possibility, we need to provide the lattice with
a magnetic structure that has a net magnetization (oth-
erwise, the XMCD is zero). In what follows, we shall sup-
pose two magnetic distributions, the first with the mag-
netic moments on the Cu sites, the second on the planar
O sites. The numerical calculations are performed with
the relativistic extension of the multiple-scattering pro-
gram in the FDMNES code,18 and provide results that
are an extension of those of the previous section.
The details of the calculations are as follows: we used
again the average crystal structures discussed in Section
II. For each magnetic configuration we employed clus-
ters with radii ranging from 3.1 Å (a CuO5-cluster), to
4.9 Å (37 atoms) around the Cu photoabsorbing ion. In
the first set of calculations, shown in Figs. 5 and 6, we
built the input potential from a magnetic configuration of
4.55 3d↑ electrons and 4.45 3d↓ electrons (i.e., a moment
of 0.1 µB per copper site). In the second set of calcula-
tions, shown in Fig. 7, we built the input potential from
a magnetic configuration of 2.05 2p↑ electrons and 1.95
2p↓ electrons (i.e., a moment of 0.1 µB per planar oxygen
site). The following results are noteworthy:
a) Differently from the XNCD calculations shown in
-0.05
9.029.019.008.998.98
Energy (keV)
 XMCD Pet
 XMCD Kan
 XMCD Glady
x 10000
FIG. 6: (Color online) XMCD as in Fig. 5, but with a cluster
radius of 3.1 Å, i.e., a CuO5 cluster.
9.029.019.008.998.98
Energy (keV)
 XANES Kan
 XMCD Pet
 XMCD Kan
 XMCD Glady
XMCD x 5000
FIG. 7: (Color online) XMCD for ~k‖(001) at the Cu K edge
with a ferromagnetic moment of 0.1 µB at each planar oxygen
site. The XMCD signal has been multiplied by 5000. The
XANES signal has been displaced by 0.02 Mbarn.
Fig. 1, all crystal structures give basically the same
XMCD spectra. The reason for this behavior may be
related to the fact that XMCD, when x-rays are orthog-
onal to the CuO2 planes, mainly depends on the in-plane
magnetisation density and on the in-plane crystal struc-
ture, which is quite similar for the various refinements.
b) There is a more marked dependence on the cluster
radius compared to the XNCD, as shown by the compar-
ison of Fig. 5 and Fig. 6. The calculations with a radius
bigger than 4.1 Å (6 Cu, 9 O, 4 Sr, and 4 Ca), above the
pre-edge energy, are basically equivalent to those shown
in Fig. 5, with a positive peak at the edge energy, fol-
960950940930
Energy (eV)
 XANES Pet
 XMCD Pet
 XMCD Kan
 XMCD Glady
XMCD x 10
FIG. 8: (Color online) XMCD for ~k‖(001) at the Cu L2,3
edges with a ferromagnetic moment of 0.1 µB along the c-
axis at each copper site. The cluster radius is 4.1 Å. The
XMCD signal has been multiplied by 10.
lowed by a double negative peak, the latter at variance
with the experimental results. On the contrary, the en-
ergy shape obtained for a radius of 3.1 Å is very close to
the experimental one, with a single negative peak after
the sharp positive one, with relatively good agreement
in the energy position and width. We could be tempted
to suppose, therefore, that the virtual photoelectron has
a very small mean free path before decaying and it is
sensitive just to the nearest neighbor oxygens. Indeed
we checked that an identical XMCD profile is obtained
with just the in-plane CuO4-cluster. On the other hand,
the size of the signal we calculate is about an order of
magnitude smaller than that seen in Ref. 5. Since the
XMCD signal is proportional to the moment, then we
would need a moment of ∼1 Bohr magneton per copper
to have a comparable signal. Such a huge moment would
have been observed previously by neutron scattering if
it existed. Of course we cannot exclude that spurious
effects, such as strain fields, could have influenced the
measurement.
c) The energy profile in the case of magnetization at
the oxygen sites is not much different from the cop-
per case, except for the deeper negative peak around E
∼ 8.994 keV, as shown in Fig. 7. Also in this case the dif-
ferent crystal structure refinements give basically equiv-
alent results, as again the CuO2 planes are practically
equivalent in the various cases. Note that the relative
intensity is equivalent to the copper case, as 0.1 Bohr
magnetons per planar oxygen corresponds to 0.2 Bohr
magnetons per CuO2 cell (note we multiply by 5000 in
Fig. 7 as compared to 10000 in Fig. 5).
We also performed simulations for the Cu L2,3 edges
for the magnetic configuration corresponding to Fig. 5,
as shown in Fig. 8, which can be compared with future
experimental investigations in order to confirm whether
or not a net magnetization exists in this compound.
Finally, we remark that the dependence of the XMCD
signal on the tilting angle θ (i.e., the displacement of
the photon wave vector from the c-axis) goes like cos(θ)
and therefore the signal is not very sensitive to small
displacements of 5 degrees, as noted by Kubota et al.5
This different angular dependence from the XNCD signal
suggests a relatively easy way to unravel the question ex-
perimentally: it is sufficient to measure the θ (azimuthal)
dependence of the signal, noting that any XNLD contam-
ination would be tested by the φ (polar) dependence of
the signal.
IV. CONCLUSIONS
In our opinion, the experiments of Kubota et al.5 have
raised more questions than they have answered. Al-
though not treated in our paper, we believe that their
results from the measurement of non-reciprocal linear
dichroism are at this stage not conclusive, as only one di-
rection for the toroidal moment has been investigated, of
the two possible suggested by the orbital current pattern
of Varma.2 The analysis performed in Section II showed,
moreover, that the claimed XNCD signal is probably un-
justified. In fact, even though the space group Bb2b
is non-centrosymmetric, XNCD is absent by symmetry
when the x-ray wave vector is chosen orthogonal to the
CuO2 planes, as in the experimental measurement geom-
etry of Ref. 5. The same extinction rule survives for the
monoclinic supercell structure refined in Ref. 13. More-
over, in both cases, it seems hard to mantain the hy-
pothesis of misalignment, due to the experimental local-
ization of the energy profile around the main absorption
edge, which is absent in the calculations. We also note
the difference of XNCD from the photoemission dichro-
ism results of Kaminski et al.3 A direct comparison is
however not immediate, as the former represents a q̂-
integrated version of the latter (here q̂ is the solid angle in
the space of the photoelectron wave-vector, see, e.g., Ref.
8). A more likely explanation is an XNLD contamination
(Fig. 4), but then the challenge is to understand why such
an effect would only exist below T*. We note that an op-
tics experiment for an optimal doped Bi2212 sample has
seen a change in linear birefrigence below Tc, which was
accompanied by a non-zero circular birefringence.28 In
addition, both Bi222329 and the Fe analogue of Bi221230
exhibit supercells with 222 space groups which would al-
low for dichroism. So, it is conceivable that there is a
subtle structural transition associated with T* which we
suggest could be looked for by diffraction experiments.
A final comment about the physical quantities detected
by x-ray circular dichroism, either natural or magnetic,
is in order. At the K edge of transition metal oxides,
XMCD in the E1-E1 channel, at the excitation energy
E = h̄ω − Eedge, gives information on the expectation
value of L̂ · (ǫ̂∗× ǫ̂) for the excited states at the energy E .
The orbital angular momentum is either induced from a
spin moment via spin-orbit coupling (as calculated here)
or directly by an orbital current (as in the scenario advo-
cated in Ref. 231). No direct spin information is available
at the K edge and therefore an XMCD measurement is
not directly related to the ground state magnetic moment
as for the L edge. Moreover, the observed states are
those with p-like angular momentum projection on the
photoabsorbing Cu ion, which are extended and there-
fore mainly sensitive to the influence of the oxygen atoms
surrounding the Cu site. In this case, the main contribu-
tion to the XMCD energy profile is expected in an energy
range of 10-20 eV from the main edge to the first shoul-
der in the XANES spectrum, as found in Ref. 5 and in
our own XMCD calculations.
Although the results of Section III are in principle con-
sistent with Ref. 5, the size of the ferromagnetic moment
necessary to get a signal of the magnitude seen by exper-
iment, ∼1 Bohr magneton, is excessive. If such a large
ferromagnetic moment existed, it would have surely been
seen by neutron scattering. From this point of view, ex-
periments performed at the Cu L edge and O K edge
would be desirable as they are more sensitive to the pres-
ence of a magnetic moment.
Finally, we would like to remark that XNCD along
the c-axis is insensitive to orbital currents. These latter,
confined to the CuO2 planes, develop a parity breaking
characterized by a toroidal moment (~Ω) within the CuO2
planes. The XNCD experiment of Ref. 5 would only be
sensitive to the projection of the toroidal moment out of
this plane (i.e., along the direction of the x-ray wavevec-
tor). Therefore, if performed as stated, it cannot tell us
about any possible orbital current order.
To conclude, we believe that the various interpreta-
tions, XNCD, XNLD, or XMCD, have their drawbacks,
and therefore the origin of the experimental signal of
Ref. 5 is still open. In this sense, further experimen-
tal checks of the energy extension of the dichroic signal
would be highly desirable. Based on our results, the most
stringent experimental test on the physical origin of the
signal would come from the measurement of the depen-
dence on the tilting (azimuthal) angle θ, due to the dif-
ferent dependences of XNCD and XMCD, as well as the
dependence on the in-plane (polar) angle φ, which would
test for any possible XNLD contamination.
V. ACKNOWLEDGMENTS
The authors thank John Freeland, Zahir Islam,
Stephan Rosenkranz, Daniel Haskel and Matti Lindroos
for various discussions. This work is supported by the
U.S. DOE, Office of Science, under Contract No. DE-
AC02-06CH11357. SDM would like to thank the kind
hospitality of the ID20 beamline staff at the ESRF.
1 M. R. Norman, D. Pines and C. Kallin, Adv. Phys. 54,
715 (2005).
2 C. M. Varma, Phys. Rev. B 55, 14554 (1997) and Phys.
Rev. Lett. 83, 3538 (1999); M. E. Simon and C. M. Varma,
ibid 89, 247003 (2002).
3 A. Kaminski, S. Rosenkranz, H. Fretwell, J. C. Cam-
puzano, Z. Li, H. Raffy, W. G. Cullen, H. You, C. G. Olson,
C. M. Varma and H. Hoechst, Nature 416, 610 (2002).
4 S. V. Borisenko, A. A. Kordyuk, A. Koitzsch, T. K. Kim,
K. A. Nenkov, M. Knupfer, J. Fink, C. Grazioli, S. Turchini
and H. Berger, Phys. Rev. Lett. 92, 207001 (2004).
5 M. Kubota, K. Ono, Y. Oohara and H. Eisaki, J. Phys.
Soc. Jpn. 75, 053706 (2006).
6 L. Alagna, T. Prosperi, S. Turchini, J. Goulon, A. Ro-
galev, C. Goulon-Ginet, C. R. Natoli, R. D. Peacock and
B. Stewart, Phys. Rev. Lett. 80, 4799 (1998).
7 The average crystal structure is the base orthorhombic unit
cell before the incommensurate superstructure is taken into
account. As the latter involves a translation operator, it
should not affect the symmetry arguments in this paper.
8 S. Di Matteo and C. R. Natoli, J. Synchr. Rad. 9, 9 (2002).
9 A. Yamamoto, M. Onoda, E. Takayama-Muromachi, F.
Izumi, T. Ishigaki and H. Asano, Phys. Rev. B 42, 4228
(1990); D. Grebille, H. Leligny, A. Ruyter, Ph. Labbé and
B. Raveau, Acta Cryst. B52, 628 (1996); N. Jakubowicz,
D. Grebille, M. Hervieu and H. Leligny, Phys. Rev. B 63,
214511 (2001).
10 P. A. Miles, S. J. Kennedy, G. J. McIntyre, G. D. Gu, G.
J. Russell and N. Koshizuka, Physica C 294, 275 (1998).
11 V. Petricek, Y. Gao, P. Lee and P. Coppens, Phys. Rev. B
42, 387 (1990).
12 X. B. Kan and S. C. Moss, Acta Cryst. B48, 122 (1992).
13 R. E. Gladyshevskii and R. Flükiger, Acta Cryst. B52, 38
(1996).
14 R. E. Gladyshevskii, N. Musolino and R. Flükiger, Phys.
Rev. B 70, 184522 (2004). There is no superstructure for
this Pb doped variant, so the crystal structure used is the
actual one, not the average one.
International Tables for Crystallography, 5th ed., ed. T.
Hahn (Kluwer, Dordrecht, 2002).
16 P. Carra and R. Benoist, Phys. Rev. B 62, R7703 (2000).
17 A. Mans, I. Santoso, Y. Huang, W. K. Siu, S. Tavaddod,
V. Apiainen, M. Lindroos, H. Berger, V. N. Strocov, M.
Shi, L. Patthey and M. S. Golden, Phys. Rev. Lett. 96,
107007 (2006).
18 Y. Joly, Phys. Rev. B 63, 125120 (2001).
This program can be downloaded at
http://www-cristallo.grenoble.cnrs.fr/fdmnes.
19 C. R. Natoli, Ch. Brouder, Ph. Sainctavit, J. Goulon, Ch.
Goulon-Ginet and A. Rogalev, Eur. Phys. J. B 4, 1 (1998).
20 V. Arpiainen and M. Lindroos, Phys. Rev. Lett. 97, 037601
(2006).
21 D. A. Varshalovich, A. N. Moskalev and V. K. Kersonskii,
Quantum Theory of Angular Momentum (World Scientific,
Singapore, 1988).
22 The results presented involve a convolution of the calcu-
http://www-cristallo.grenoble.cnrs.fr/fdmnes
lated spectrum with both a core hole (1.9 eV for the Cu K
edge) and a photoelectron lifetime, with the latter having
a strong energy dependence (ranging up to 15 eV with a
midpoint value at 30 eV above the Fermi energy). But the
calculation has a fixed cluster radius.
23 J. Goulon, C. Goulon-Ginet, A. Rogalev, G. Benayoun, C.
Brouder and C. R. Natoli, J. Synchr. Rad. 7, 182 (2000).
In addition, there is also an intrinsic non XNCD contribu-
tion to the circular dichroism for a biaxial crystal, but this
should vanish in Bi2212 for an x-ray beam aligned along
the c-axis, see J. Goulon, C. Goulon-Ginet, A. Rogalev, V.
Gotte, C. Brouder and C. Malgrange, Eur. Phys. J. B 12,
373 (1999).
24 P. A. Miles, S. J. Kennedy, A. R. Anderson, G. D. Gu,
G. J. Russell and N. Koshizuka, Phys. Rev. B 55, 14632
(1997).
25 B. Fauqué, Y. Sidis, V. Hinkov, S. Pailhes, C. T. Lin, X.
Chaud and P. Bourges, Phys. Rev. Lett. 96, 197001 (2006).
26 A. Kapitulnik, unpublished results.
27 It should be remarked, though, that single-particle based
approaches can miss some features of the spectral weight
transfer between the L2 and L3 edges.
28 J. Kobayashi, T. Asahi, M. Sakurai, M. Takahashi, K.
Okubo and Y. Enomoto, Phys. Rev. B 53, 11784 (1996).
29 E. Giannini, N. Clayton, N. Musolino, A. Piriou, R. Glady-
shevskii and R. Flükiger, IEEE Trans. Appl. Supercond.
15, 3102 (2005).
30 Y. Le Page, W. R. McKinnon, J.-M. Tarascon and P. Bar-
boux, Phys. Rev. B 40, 6810 (1989).
31 X-ray dichroism experiments in regards to the orbital cur-
rent scenario of Varma have been discussed by S. Di Matteo
and C. M. Varma, Phys. Rev. B 67, 134502 (2003).
ABSTRACT
  A recent polarized x-ray absorption experiment on the high temperature
cuprate superconductor Bi2Sr2CaCu2O8 indicates the presence of broken parity
symmetry below the temperature, T*, where a pseudogap appears in photoemission.
We critically analyze the x-ray data, and conclude that a parity-breaking
signal of the kind suggested is unlikely based on the crystal structures
reported in the literature. Possible other origins of the observed dichroism
signal are discussed. We propose x-ray scattering experiments that can be done
in order to determine whether such alternative interpretations are valid or
not.

<|endoftext|><|startoftext|>
Introduction
1.1 Algebraic patterns within subsets of N
We use extensively the notion of “algebraic pattern”. By an algebraic pattern
we mean a solution of a diophantine system of equations. For example, an
arithmetic progression of length k is an algebraic pattern corresponding to
the following diophantine system:
2xi = xi−1 + xi+1, i = 2, 3, . . . , k − 1.
We investigate the problem of finding linear algebraic patterns (these cor-
respond to linear systems) within a family of subsets of natural numbers
satisfying some asymptotic conditions.
For instance, by Szemerédi theorem, subsets of positive upper Banach density
(all S ⊂ N : d∗(S) > 0, where d∗(S) = lim supbn−an→∞
|S∩[an,bn]|
bn−an+1
) contain the
pattern of an arithmetic progression of any finite length (see [12]).
http://arxiv.org/abs/0704.0600v2
On the other hand, Schur patterns, namely triples of the form {x, y, x+ y},
which correspond to solutions of the so-called Schur equation, x+ y = z, do
not necessarily occur in sets of positive upper density. For example, the odd
numbers do not contain this pattern. But if we take a random subset of N
by picking natural numbers with probability 1
independently, then this set
contains the Schur pattern with probability 1.
There is a deterministically defined analog of a random set - a normal set. To
define a normal set we recall the notions of a normal infinite binary sequence
and of a normal number.
An infinite {0, 1}-valued sequence λ is called a normal sequence if every
finite binary word w occurs in λ with frequency 1
, where |w| is the length
of w.
The more familiar notion is that of a normal number x ∈ [0, 1]. If to a number
x ∈ [0, 1] we associate its dyadic expansion x =
with xi ∈ {0, 1},
then x is called a normal number if the sequence (x1, x2, . . . , xn, . . .) is a
normal sequence.
Definition 1.1.1 A set S ⊂ N is called normal if the 0-1 sequence 1S
(1S(n) = 1 ⇔ n ∈ S) is normal.
Normal sets exhibit a non-periodic, “random” behavior. We notice that if S
is a normal set then S − S contains N. Therefore, the equation
z − y = x
is solvable within every normal set. This implies that every normal set con-
tains Schur patterns.
Normal sets are related to a class of dynamical systems displaying maximal
randomness; namely Bernoulli systems. In this work we investigate occur-
rence of linear patterns in sets corresponding to dynamical systems with a
lower degree of randomness, so called weakly mixing dynamical systems. The
sets we obtain will be called WM sets. We will make this precise in the next
section.
In the present paper we treat the following problem:
Give a complete characterization of the linear algebraic patterns which occur
in all WM sets.
Remark 1.1.1 It will follow from our definition of a WM set, that any
normal set is a WM set.
The problem of the solvability of a nonlinear equation or system of equa-
tions is beyond the limits of the technique used in this paper. Nevertheless,
some particular equations might be analyzed. In [3] it is shown that there
exist normal sets in which the multiplicative Schur equation xy = z is not
solvable.
1.2 Generic points and WM sets
For a formal definition of WM sets we need the notions of measure preserving
systems and of generic points.
Definition 1.2.1 Let X be a compact metric space, B the Borel σ-algebra
on X; let T : X → X be a continuous map and µ a probability measure on
B. The quadruple (X,B, µ, T ) is called a measure preserving system if
for every B ∈ B we have µ(T−1B) = µ(B).
For a compact metric space X we denote by C(X) the space of continuous
functions on X with the uniform norm.
Definition 1.2.2 Let (X,B, µ, T ) be a measure preserving system. A point
ξ ∈ X is called generic for the system (X,B, µ, T ) if for any f ∈ C(X) we
f(T nξ) =
f(x)dµ(x). (1.1)
Example: Consider the Bernoulli system: (X = {0, 1}N0,B, µ, T ), where X
is endowed with the Tychonoff topology, B is Borel σ-algebra on X , T is the
shift to the left, µ is the product measure of µi’s where µi(0) = µi(1) =
and N0 = N ∪ {0}. An alternative definition of a normal set which is purely
dynamical is the following.
A set S is normal if and only if the sequence 1S ∈ {0, 1}
N0 is a generic
point of the foregoing Bernoulli
system.
The notion of a WM set generalizes that of a normal set, where the role
played by Bernoulli dynamical system is taken over by dynamical systems of
more general character.
Let ξ(n) be any {0, 1}−valued sequence. There is a natural dynamical system
(Xξ, T ) connected to the sequence ξ:
On the compact space Ω = {0, 1}N0 endowed with the Tychonoff topology,
we define a continuous map T : Ω −→ Ω by (Tω)n = ωn+1. Now for any ξ in
Ω we define
Xξ = {T nξ}n∈N0 ⊂ Ω.
Let A be a subset of N. Choose ξ = 1A and assume that for an appropriate
measure µ, the point ξ is generic for (Xξ,B, µ, T ). We can attach to the set
A dynamical properties associated with the system (Xξ,B, µ, T ).
We recall the notions of ergodicity, total ergodicity and weak-mixing in er-
godic theory:
Definition 1.2.3 A measure preserving system (X,B, µ, T ) is called er-
godic if every A ∈ B which is invariant under T , i.e. T−1(A) = A, satisfies
µ(A) = 0 or 1.
A measure preserving system (X,B, µ, T ) is called totally ergodic if for ev-
ery n ∈ N the system (X,B, µ, T n) is ergodic.
A measure preserving system (X,B, µ, T ) is called weakly mixing if the
system (X ×X,BX×X , µ× µ, T × T ) is ergodic.
In our discussion of WM sets corresponding to weakly mixing systems, we
shall add the proviso that the dynamical system in question not be the trivial
1-point system supported on the point x ≡ 0. This implies that the “density”
of the set in question be positive.
Definition 1.2.4 Let S ⊂ N. If the limit of 1
n=1 1S(n) exists as N → ∞
we call it the density of S and denote by d(S).
Definition 1.2.5 A subset S ⊂ N is called a WM set if 1S is a generic
point of the weakly mixing system (X1S ,B, µ, T ) and d(S) > 0.
1.3 Solvability of linear diophantine systems within
WM sets and normal sets
Our main result is a complete characterization of linear systems of diophan-
tine equations which are solvable within every WM set. The characterization
is given by describing affine subspaces of Qk which intersect Ak, for any WM
set A ⊂ N.
Theorem 1.3.1 An affine subspace of Qk intersects Ak for every WM set
A ⊂ N if and only if it contains a set of the form
{n~a+m~b+ ~f |n,m ∈ N},
where ~a,~b, ~f have the following description:
~a = (a1, a2, . . . , ak)
t, ~b = (b1, b2, . . . , bk)
t ∈ Nk, ~f = (f1, f2, . . . , fk)
t ∈ Zk and
there exists a partition F1, . . . , Fl of {1, 2, . . . , k} such that:
a) for every r ∈ {1, . . . , l} there exist c1,r, c2,r ∈ N, such that for every i ∈ Fr
we have ai = c1,r , bi = c2,r and for every j ∈ {1, . . . , k} \ Fr we have
aj bj
c1,r c2,r
6= 0.
∀r ∈ {1, 2, . . . , l} ∃cr ∈ Z such that ∀i ∈ Fr : fi = cr.
We also classify all affine subspaces of Qk which intersect Ak for any normal
set A ⊂ N.
Theorem 1.3.2 An affine subspace of Qk intersects Ak for every normal set
A ⊂ N if and only if it contains a set of the form
{n~a+m~b+ ~f |n,m ∈ N},
where ~a,~b, ~f have the following description:
~a = (a1, a2, . . . , ak)
t, ~b = (b1, b2, . . . , bk)
t ∈ Nk, ~f = (f1, f2, . . . , fk)
t ∈ Zk
and there exists a partition F1, . . . , Fl of {1, 2, . . . , k} such that for every
r ∈ {1, . . . , l} there exist c1,r, c2,r ∈ N, such that for every i ∈ Fr we have
ai = c1,r , bi = c2,r and for every j ∈ {1, . . . , k} \ Fr we have
aj bj
c1,r c2,r
6= 0.
A family of linear algebraic patterns that has been studied previously are the
“partition regular” patterns. These are patterns which for any finite partition
of N: N = C1∪C2 ∪ . . .∪Cr, the pattern necessarily occurs in some Cj. (For
example by van der Waerden’s theorem, arithmetic progressions are partition
regular and by Schur’s theorem the Schur pattern is also partition regular).
A theorem of Rado gives a complete characterization of such patterns. We
will show in Proposition 4.1 that every linear algebraic pattern which is
partition-regular occurs in every WM set.
It is important to mention that if we weaken the requirement of weak mixing
to total ergodicity, then in the resulting family of sets, Rado’s patterns need
not necessarily occur. For example, for α 6∈ Q the set
n ∈ N|nα (mod 1) ∈
is totally ergodic, i.e., 1S is a generic point for a totally ergodic system and
the density of S is positive, but the equation x+ y = z is not solvable within
In the separate paper [4] we will address the question of solvability of more
general algebraic patterns, not necessarily linear, in totally ergodic and WM
sets.
The structure of the paper is the following. In Section 2 we prove the direction
“⇐” of Theorems 1.3.1 and 1.3.2. In Section 3, by use of a probabilistic
method, we prove the direction “⇒” of Theorems 1.3.1 and 1.3.2. In Section
4 we show that every linear system which is solvable in one of the cells of any
finite partition of N is also solvable within every WM set. The paper ends
with Appendix in which we collected proofs of technical statements which
have been used in Sections 2 and 3.
1.4 Acknowledgments
This paper is a part of the author’s Ph.D. thesis. I thank my advisor Prof.
Hillel Furstenberg for introducing me to ergodic theory and for many useful
ideas which I learned from him. I thank Prof. Vitaly Bergelson for fruitful
discussions and valuable suggestions. Also, I would like to thank an anony-
mous referee for numerous valuable remarks.
2 Proof of Sufficiency
Notation: We introduce the scalar product of two vectors v, w of length N
as follows:
< v,w >N
v(n)w(n).
We denote by L2(N) the (finite-dimensional) Hilbert space of all real vectors
of length N with the aforementioned scalar product.
We define: ‖ w ‖2N
=< w,w >N .
First we state the following proposition which will prove useful in the proof
of the sufficiency of the conditions of Theorem 1.3.1.
Proposition 2.1 Let Ai ⊂ N (1 ≤ i ≤ k) be WM sets. Let
ξi(n)
= 1Ai(n)−d(Ai), where d(Ai) denotes density of Ai. Suppose there are
(a1, b1), (a2, b2), . . . , (ak, bk) ∈ Z
2, such that ai > 0, 1 ≤ i ≤ k, and for every
i 6= j
ai bi
aj bj
6= 0.
Then for every ε > 0 there exists M(ε) ∈ N, such that for every M ≥ M(ε)
there exists N(M, ε) ∈ N, such that for every N ≥ N(M, ε)
where w(n)
m=1 ξ1(a1n+b1m)ξ2(a2n+b2m) . . . ξk(akn+bkm) for every
n = 1, 2, . . . , N .
Since the proof of Proposition 2.1 involves many technical details, first we
show how our main result follows from it. Afterwards we state and prove all
the lemmas necessary for the proof of Proposition 2.1.
We use an easy consequence of Proposition 2.1.
Corollary 2.1 Let A be a WM set. Let k ∈ N, suppose
(a1, b1), (a2, b2), . . . , (ak, bk) ∈ Z
2 satisfy all requirements of Proposition 2.1
and suppose f1, . . . , fk ∈ Z. Then for every δ > 0 there exists M(δ) such
that ∀M ≥ M(δ) there exists N(M, δ) such that ∀N ≥ N(M, δ) we have
∣‖v‖N − d
∣ < δ,
where v(n)
m=1 1A(a1n + b1m + f1)1A(a2n + b2m + f2) . . . 1A(akn +
bkm+ fk) for every n = 1, 2, . . . , N .
Proof. We rewrite v(n) in the following form:
v(n) =
(ξ1(a1n+ b1m) + d(A)) . . . (ξk(akn + bkm) + d(A)),
for every n = 1, 2, . . . , N . We introduce normalized WM sequences ξi(n) =
ξ(n+ fi) (of zero average), where ξ(n) = 1A(n)− d(A). By use of triangular
inequality and Proposition 2.1 it follows that for big enoughM and N (which
depends on M) ‖v‖N is as close as we wish to d
k(A). This finishes the proof.
Proof. (of Theorem 1.3.1, ⇚) Let A ⊂ N be a WM set. Without loss of
generality, we can assume that for every r : 1 ≤ r ≤ l we have r ∈ Fr.
It follows from Corollary 2.1 that the vector v defined by
1A(a1n+ b1m+ f1)1A(a2n + b2m+ f2) . . . 1A(aln + blm+ fl)
for every n = 1, 2, . . . , N , is not identically zero for big enough M and N .
But this is possible only if for some n,m ∈ N we have
(a1n+ b1m+ f1, a2n+ b2m+ f2, . . . , aln+ blm+ fl) ∈ A
The latter implies that Ak intersects the affine subspace.
Proof. (of Theorem 1.3.2, ⇚) For every r : 1 ≤ r ≤ l take all indices which
comprise Fr. Denote this sequence of indices by Ir. Denote cr = mini∈Ir fi.
Let Sr be the set of all non-zero shifts of fi, i ∈ Fr, centered at cr, i.e.,
Sr = {fi − cr | i ∈ Fr, fi > cr}.
For example, if the sequence of fi’s where i ∈ F1 is (−5, 2, 3, 2,−5), then
S1 = {7, 8}.
Let A be a normal set. For every r : 1 ≤ r ≤ l we define sets Ar by
Ar = {n ∈ N ∪ {0} |n ∈ A and n+ s ∈ A, ∀s ∈ Sr}.
Then Ar is no longer a normal set provided that Sr 6= ∅ (d(A) =
21+|Sr |
But, for all r : 1 ≤ r ≤ l the sets Ar’s are WM sets.
Without loss of generality, assume that for every r : 1 ≤ r ≤ l we have
r ∈ Fr.
From Proposition 2.1 it follows that for big enough M and N
1A1(a1n+ b1m)1A(a2n+ b2m) . . . 1A(aln+ blm) ≈
d(Ar).
The latter ensures that there exist m,n ∈ N such that
(a1n+ b1m+ f1, . . . , akn+ bkm+ fk) ∈ A
Now we state and prove all the claims that are required in order to prove
Proposition 2.1.
Definition 2.1 Let ξ be a WM-sequence (ξ is a generic point for a weakly
mixing system (Xξ,BXξ , µ, T )) of zero average. The autocorrelation function
of ξ of length j ∈ N with the shifts ~i = (i1, i2, . . . , ij) ∈ Z
j and r ∈ Z is the
sequence ψ
which is defined by
(n) =
w∈{0,1}j
ξ(n+ r + w ·~i), n ∈ N,
where w ·~i is the usual scalar product in Qj, and
(n) = 0, n ≤ 0.
Lemma 2.1 Let ξ be a WM-sequence of zero average and suppose ε, δ >
0, b ∈ Z \ {0}. Then for every j ≥ 1, (c1, c2, . . . , cj) ∈ (Z \ {0})
j and
(r1, r2, . . . , rj) ∈ Z
j there exist I = I(ε, δ, c1, . . . , cn), a set S ⊂ [−I, I]
density at least 1−δ and N(S, ε) ∈ N, such that for every N ≥ N(S, ε) there
exists L(N, S, ε) such that for every L ≥ L(N, S, ε)
r,(c1i1,...,cjij)
(l + bn)
for every (i1, i2, . . . , ij) ∈ S, where r =
k=1 rk.
Proof. We note that it is sufficient to prove the lemma in the case c1 =
c2 = . . . = cj = 1, since if the average of nonnegative numbers over a whole
lattice is small, then the average over a sublattice of a fixed positive density
is also small.
Recall that ξ ∈ Xξ
= {T nξ}∞n=0 ⊂ supp(ξ)
N0, where T is the usual shift to
the left on the dynamical system supp(ξ)N0, and by the assumption that ξ
is a WM-sequence of zero average it follows that ξ is a generic point of the
weakly mixing system (Xξ,BXξ , µ, T ) and the function f : f(ω)
= ω0 has
zero integral.
Denote ~i = (i1, . . . , ij).
We define functions g
on Xξ by
T r+ǫ·
~i ◦ f,
ǫ∈V ∗
T r+ǫ·
~i ◦ f,
where Vj is the j-dimensional discrete cube {0, 1}
j and V ∗j is the j-dimensional
discrete cube except the zero point.
Notice that
(T nξ) = ψ
We use the following theorem which is a special case of a multiparameter
weakly mixing PET of Bergelson and McCutcheon (theorem A.1 in [2]; it is
also a corollary of Theorem 13.1 of Host and Kra in [9]).
Let (X, µ, T ) be a weakly mixing system. Given an integer k and 2k bounded
functions fǫ on X, ǫ ∈ Vk , the functions
Ni −Mi
n∈[M1,N1)×...[Mk,Nk)
ǫ∈V ∗
T ǫ1n1+...ǫknk ◦ fǫ
converge in L2(µ) to the constant limit
ǫ∈V ∗
when N1 −M1, . . . , Nk −Mk tend to +∞.
From this theorem applied to the weakly mixing system Xξ × Xξ and the
functions fǫ(x) = T
r ◦ f ⊗ T r ◦ f for every ǫ ∈ Vj , we obtain for every Folner
sequence {Fn} in N
j that an average over the multi-index ~i = {i1, . . . , ij} of
on Fn’s converges to zero in L
2(µ) (the integral of T r ◦ f ⊗ T r ◦ f
is zero). Thus
Xξ×Xξ
Ni −Mi
~i∈[M1,N1)×...×[Mj,Nj)
(y)dµ(x)dµ(y) =
Ni −Mi
~i∈[M1,N1)×...×[Mj,Nj)
(x)dµ(x)
as N1 −M1, . . . , Nj −Mj → ∞.
As a result we obtain the following statement:
For every ε > 0, j ∈ N and every fixed (r1, r2, . . . , rj) ∈ N
j, there exists a
subset R ⊂ Nj of lower density equal to one, such that
< ε (2.1)
for every ~i ∈ R, where r =
k=1 rj.
Recall that lower density of a subset R ⊂ Nj is defined to be
d∗(R) = lim inf
N1−M1,...,Nj−Mj→∞
#{R ∩ [M1, N1)× . . .× [Mj , Nj)}
k=1(Nk −Mk)
Recall that ψ
(l + bn) = g
T l+bnξ
The definition of the sequences ψj implies
r1,~i
(l + bn)
= lim
r2,(±i1,...,±ij)
(l ± bn)
for any r1, r2 ∈ Z, where ~i = (i1, . . . , ij).
Therefore, in order to prove Lemma 2.1 it is sufficient to show the following:
For every ε, δ > 0 and for any a priori chosen b ∈ N there exists I(ε, δ) ∈ N,
such that for every I ≥ I(ε, δ) there exists a subset S ⊂ [1, I]j of density
at least 1 − δ (namely, we have
|S∩[1,I)j |
≥ 1 − δ) and N(S, ε) ∈ N, such
that for every N ≥ N(S, ε) there exists L(N, S, ε) ∈ N such that for every
L ≥ L(N, S, ε) the following holds for every ~i ∈ S:
(l + bn)
Let b ∈ N. Continuity of the function
g0,~i and genericity of the point ξ ∈ Xξ yield
(l + bn)
= lim
T bng0,~i
T bng0,~i
dµ. (2.2)
By applying the von Neumann ergodic theorem to the ergodic system
(Xξ,B, µ, T
b) (ergodicity follows from weak-mixing of the original measure
preserving system (Xξ,B, µ, T )) we have
T bng0,~i →
L2(Xξ)
g0,~idµ. (2.3)
From (2.1) there exists I(ε, δ) ∈ N big enough that for every I ≥ I(ε, δ) there
exists a set S ⊂ [1, I]j of density at least 1− δ such that
g0,~idµ
for all ~i ∈ S.
From equation (2.3) it follows that there exists N(S, ε) ∈ N, such that for
every N ≥ N(S, ε) we have
T bng0,~i
for all ~i ∈ S.
Finally, equation (2.2) implies that there exists L(N, S, ε) ∈ N, such that for
every L ≥ L(N, S, ε) we have
(l + bn)
for all ~i ∈ S.
The following lemma is a generalization of the previous lemma to a product
of several autocorrelation functions.
Lemma 2.2 Let ψ
r1,~i
, . . . , ψ
rk,~i
be autocorrelation functions of length j of
WM-sequences ξ1, . . . , ξk of zero average,
{c11, . . . , c
j , . . . , c
1, . . . , c
j} ∈ (Z \ {0})
jk and ε, δ > 0. Suppose
(a1, b1), (a2, b2), . . . , (ak, bk) ∈ Z
2, such that ai > 0 for all i : 1 ≤ i ≤ k and
for every i 6= j
ai bi
aj bj
6= 0.
(If k = 1 assume that b1 6= 0.)
Then there exists I(ε, δ) ∈ N, such that for every I ≥ I(ε, δ) there exist
S ⊂ [−I, I]j of density at least 1− δ, M(S, ε) ∈ N, such that for every M ≥
M(S, ε) there exists X(M,S, ε) ∈ N, such that for every X ≥ X(M,S, ε)
r1,(c
i1,...,c
(a1x+ b1m) . . . ψ
rk,(c
i1,...,c
(akx+ bkm)
for every (i1, i2, . . . , ij) ∈ S.
Proof. The proof is by induction on k.
THE CASE k = 1 (and arbitrary j):
If a1 = 1 then the statement of the lemma follows from Lemma 2.1. If a1 > 1
then by Proposition 5.1 of Appendix for a given ~i = (i1, . . . , ij) ∈ S we have
r1,(c
i1,...,c
(a1x+ b1m)
r1,(c
i1,...,c
(x+ b1m)
(2.4)
(Limits exist by genericity of the point ξ.)
By Lemma 2.1 the right hand side of (2.4) is small for large enough M . So,
for large enough X (depending on M and (i1, . . . , ij)) the statement of the
lemma is true. By finiteness of S we conclude that the statement of the
lemma holds for k = 1.
GENERAL CASE (k > 1):
Suppose that the statement holds for k − 1.
Denote
vm(x)
r1,(c
i1,...,c
(a1x+ b1m) . . . ψ
rk,(c
i1,...,c
(akx+ bkm).
Let ε, δ > 0. We show that there exists I(ε, δ) ∈ N such that for every
I > I(ε, δ) a set S ⊂ [−I, I]j of density at least 1− δ can be chosen satisfying
the following property:
There exists I(ε, S) ∈ N such that for every I > I(ε, S) there exists
M(I) ∈ N such that for all M > M(I) for a set of i’s in {1, 2, . . . , I} of
density at least 1− ε
we have
< vm, vm+i >X
(2.5)
for all (i1, . . . , ij) ∈ S.
The Van der Corput lemma (Lemma 5.1 of Appendix) finishes the proof.
Note that the set of “good” i’s in the interval {1, 2, . . . , I} depends on
(i1, . . . , ij) ∈ S.
Denote
< vm, vm+i >X
1,j+1
r1,(c
i1,...c
ij ,b1i)
(a1x+ b1m) . . . ψ
k,j+1
rk,(c
i1,...,c
ij ,bki)
(akx+ bkm)
Denote y = a1x+ b1m. Assume that (a1, b1) = d. Denote
B̃y,m = ψ
1,j+1
r1,(c
i1,...c
ij ,b1i)
(y) . . . ψ
k,j+1
rk,(c
i1,...,c
ij ,bki)
(a′ky + b
where a′p =
, b′p = bp − a
pb1, 2 ≤ p ≤ k. We rewrite Ã as follows:
y≡dl mod a1
m≡φ(l) mod
B̃y,m
+ δX,M . (2.6)
Here φ is a bijection of Za1
defined by the identity
for every 0 ≤ l ≤ a1
− 1, Y = a1X , a
p as above and δX,M accounts for
the fact that for small y’s and y’s close to Y there is a difference between
elements that are taken in the expression for Ã and in the expression on the
right hand side of equation (2.6). Nevertheless, we have δX,M → 0 if
Denote
C̃y,m = ψ
2,j+1
r2,(c
i1,...,c
ij ,b2i)
(a′2y + b
2m) . . . ψ
k,j+1
rk,(c
i1,...,c
ij ,bki)
(a′ky + b
It will suffice to prove that there exists I(ε, δ) ∈ N such that for every I >
I(ε, δ) we can find S ⊂ [−I, I]j of density at least 1 − δ with the following
property:
There exists I(ε, S) ∈ N such that for every I > I(ε, S) there exists
M(I) ∈ N such that for every M > M(I) we can find X(M) ∈ N such that
for every X > X(M) for a set of i’s in {1, 2, . . . , I} of density at least 1 − ε
we have
y≡dl mod a1
m≡φ(l) mod
C̃y,m
(2.7)
for all 0 ≤ l ≤ a1
− 1, for all (i1, . . . , ij) ∈ S.
Note that it is enough to prove the latter statement for every particular
l : 0 ≤ l ≤ a1
Denote the left hand side of inequality (2.7) for a fixed l by D̃l.
Introduce new variables z and n, such that y = za1+ dl and m = n
+φ(l).
We obtain
D̃l =
2,j+1
t2n,z,l
. . . ψ
k,j+1
tkn,z,l
2,j+1
(a2z + c2n + q2) . . . ψ
k,j+1
(akz + ckn+ qk)
where shp = (rp, (c
1i1, . . . , c
j ij , bpi)),
n,z,l
ap(a1z+dl)+(a1bp−apb1)(
n+φ(l))
, qp =
apld+(a1bp−apb1)φ(l)
a1bp−apb1
6= 0, Z = Y
and N = Md
From the conditions on the function φ it follows that qp ∈ Z, 2 ≤ p ≤ k.
From the conditions of the lemma we obtain for every p 6= q, p, q > 1,
ap cp
aq cq
a1 det
ap bp
aq bq
6= 0.
Therefore, D̃l can be rewritten as
D̃l =
φ2 (a2z + c2n) . . . φk (akz + ckn)
where φℓ = ψ
ℓ,j+1
rℓ+qℓ,(c
i1,...,c
ij ,bℓi)
, 2 ≤ ℓ ≤ k. By the induction hypothesis the
following is true.
There exists Il(ε, δ
′) ∈ N big enough, such that for every Il ≥ Il(ε, δ
′) there
exist a subset Sl ⊂ [−Il, Il]
j+1 of density at least 1 − δ′2 and N(Sl, ε) ∈ N,
such that for every N ≥ N(Sl, ε) there exists Z(N, Sl, ε) ∈ N, such that for
every Z ≥ Z(N, Sl, ε) we have
D̃l <
(2.8)
for all (i1, . . . , ij , i) ∈ Sl.
For every (i1, . . . , ij) ∈ [−Il, Il]
j we denote by Sli1,...,ij the fiber above (i1, . . . , ij):
Sli1,...,ij = {i ∈ [−Il, Il] | (i1, . . . , ij , i) ∈ Sl}.
Then there exists a set Tl ⊂ [−Il, Il]
j of density at least 1− δ′, such that for
every (i1, . . . , ij) ∈ Tl the density of S
i1,...,ij
is at least 1 − δ′. Let ε, δ > 0.
Take δ′ < min ( ε
, δ) and I > max (I ′(ε), Il(ε, δ
′)) (I ′(ε) is taken from the van
der Corput lemma).
Then it follows by (2.8) that there exists M(Tl, ε, δ) ∈ N, such that for
every M ≥ M(Tl, ε, δ) there exists X(M,Tl, ε, δ) ∈ N, such that for every
X ≥ X(M,Tl, ε, δ) the inequality (2.7) holds for every fixed (i1, . . . , ij) ∈ Tl
for a set of i’s within the interval {1, . . . , I} of density at least 1 − ε
. The
lemma follows from the van der Corput lemma.
Proof of Proposition 2.1.
Denote vm(n)
= ξ1(a1n+b1m) . . . ξk(akn+bkm). For every i ∈ N we introduce
Ã defined by
< vm, vm+i >N
0,(b1i)
(a1n+ b1m) . . . ψ
0,(bki)
(akn + bkm)
where the functions ψp,j’s are autocorrelation functions of the ξp’s of length
By Lemma 2.2 it follows that for every ε > 0 there exists I(ε) ∈ N such that
for every I ≥ I(ε) there exist S ⊂ {1, 2, . . . , I} of density at least 1− ε
M(S, ε) such that for every M ≥ M(S, ε) there exists N(M,S, ε) such that
for every N ≥ N(M,S, ε) we have
0,(b2i)
(a2n+ b2m) . . . ψ
0,(bki)
(akn+ bkm)
≤ ε2.
The proposition follows from the van der Corput Lemma 5.1.
3 Probabilistic constructions of WM sets
The goal of this section is to prove the necessity of the conditions of Theorem
1.3.1. The following proposition is the main tool for this task.
Proposition 3.1 Let a, b ∈ N, c ∈ Z such that a 6= b. Then there exists a
normal set A within which the equation
ax = by + c (3.1)
is unsolvable, i.e., for every (x, y) ∈ A2 we have ax 6= by + c.
Remark 3.1 The proposition is a particular case of Theorem 1.3.1. It is a
crucial ingredient in proving the necessity direction of the theorem in general.
Proof. Let S ⊂ N. We construct from S a new set AS within which the
equation ax = by + c is unsolvable.
Without loss of generality, suppose that a < b.
Assume (a, b) = 1 (the general case follows easily). It follows from (a, b) =
1 that (3.1) is solvable. Any solution (x, y) of the equation ax = by +
c has restrictions on x. Namely, x ≡ φ(a, b, c)(mod b), where φ(a, b, c) ∈
{0, 1, . . . , b − 1} is determined uniquely. Let us denote l0
= φ(a, b, c). We
define inductively a sequence {li} ⊂ N ∪ {0}. If a pair (x, y) is a solution of
equation (3.1) and y ∈ biN + li−1 then choose li ∈ {0, 1, . . . , b
i+1 − 1} such
that x ∈ bi+1N+ li.
Note that from (a, b) = 1 it follows that (a, bi+1) = 1. It is clear that
if u, v ∈ N satisfy (u, v) = 1 then for any w ∈ Z there exists a solution
(x, y) ∈ N2 of the equation ux = vy + w. The latter implies that there exist
x ∈ N, y ∈ biN+ li−1 such that ax = by+ c. Any such x should be a member
of bi+1N+ li. Note that li and li−1 are connected by the identity
ali ≡ bli−1 + c ( mod b
i+1). (3.2)
In addition, if x ∈ N is given then the equation
ax ≡ by + c ( mod bi+1)
has at most one solution y ∈ {0, 1, . . . , bi − 1}.
We define sets Hi
= biN+ li−1 ; i ∈ N. We prove that for every i ∈ N, Hi+1 ⊂
Hi. All elements of Hi+1 are in the same class modulo b
i+1, therefore all
elements of Hi+1 are in the same class modulo b
i. So, if we show for some
x ∈ Hi+1 that x ≡ li−1(mod b
i) then we are done. For i = 1 we know
that if y ∈ N then any x ∈ N such that (x, y) is a solution of the equation
(3.1) has to be in H1. Take x ∈ H2 such that there exists y ∈ H1 with
ax = by + c. Then x ∈ H1. Therefore, we have shown that H2 ⊂ H1. For
i > 1 there exists x ∈ Hi+1 such that there exists y ∈ Hi with ax = by + c.
By induction Hi ⊂ Hi−1. Therefore, the latter y is in Hi−1. Therefore, by
construction of li’s we have that x ∈ Hi. This shows Hi+1 ⊂ Hi. We define
sets Bi; 0 ≤ i <∞:
B0 = N \H1,
B1 = H1 \H2
. . .
Bi = Hi \Hi+1
. . .
Clearly we have Bi∩Bj = ∅ , ∀i 6= j and |N\ (∪
i=0Bi)| = | ∩
i=1Hi| ≤ 1. The
latter is because for every i the second element (in the increasing order) of
Hi is ≥ b
We define AS =
i=0Ai, where Ai’s are defined in the following manner:
= S ∩ B0, C0
= B0 \ A0
= B1 \ {x | ax ∈ bB0 + c}, A1
= (B1 ∩ {x | ax ∈ bC0 + c}) ∪ (D1 ∩ S) ,
= B1 \A1
. . .
= Bi \ {x | ax ∈ bBi−1 + c}, Ai = (Bi ∩ {x | ax ∈ bCi−1 + c}) ∪ (Di ∩ S) ,
= Bi \ Ai
. . .
Here it is worthwhile to remark that for every i, Bi = Ai ∪ Ci. Therefore
AS ⊂ ∪
i=0Bi.
If for some i ≥ 1 we have y ∈ Ai ⊂ Bi = Hi \ Hi+1, then any x with
ax = by + c satisfies
ax ≡ bli−1 + c ( mod b
i+1).
From (a, bi+1) = 1 it follows that there exists a unique solution x modulo
bi+1. By identity (3.2) we have
x ≡ li ( mod b
i+1).
Thus x ∈ Hi+1.
If x ∈ Hi+2, then
x ≡ li+1 ( mod b
i+2).
Thus we have
ali+1 ≡ by + c ( mod b
i+2).
By uniqueness of a solution ( y ) modulo bi+1 we get
y ≡ li ( mod b
i+1).
Thus y ∈ Hi+1. We have a contradiction, which shows that x ∈ Hi+1\Hi+2 =
Bi+1.
The same argument works for y ∈ A0 ⊂ B0 and it shows that any x with
ax = by + c satisfies x ∈ B1.
So, if y ∈ Ai (i ≥ 0) then any x with ax = by + c should satisfy x ∈ Bi+1.
By construction of AS, x 6∈ AS. Thus equation (3.1) is not solvable in AS.
We make the following claim:
For almost every subset S of N the set AS is a normal set.
(The probability measure on subsets of N considered here is the product on
{0, 1}∞ of probability measures (1
The tool for proving the claim is the following easy lemma (for a proof see
Appendix, Lemma 5.2).
A subset A of natural numbers is a normal set if and only if for any k ∈
(N ∪ {0}) and any i1 < i2 < . . . < ik we have
χA(n)χA(n + i1) . . . χA(n+ ik) = 0, (3.3)
where χA(n)
= 2 · 1A(n)− 1.
First of all, we denote TN =
n=1 χAS(n)χAS(n + i1) . . . χAS(n + ik). Be-
cause of randomness of S, TN is a random variable. We will prove that
N=1E(T
) <∞ and this will imply by Lemma 5.3 that TN →N→∞ 0 for
almost every S ⊂ N.
E(T 2N) =
n,m=1
E(χAS(n)χAS(n+i1) . . . χAS(n+ik)χAS(m) . . . χAS(m+ik)).
Adding (removing) of a finite set to (from) a normal set does not affect the
normality of the set. The set ∪iBi might differ from N by at most one element
(| ∩∞i=1 Hi| ≤ 1). This possible element does not affect the normality of AS
and we assume without loss of generality that ∩∞i=1Hi = ∅, thus N = ∪
i=0Bi.
For every number n ∈ N we define the chain of n, Ch(n), to be the following
finite sequence:
If n ∈ B0, then Ch(n) = (n).
If n ∈ B1, then two situations are possible. In the first one there exists a
unique y ∈ B0 such that an = by+c. We set Ch(n) = (n, y) = (n, Ch(y)). In
the second situation we can not find such y from B0 and we set Ch(n) = (n).
If n ∈ Bi+1, then again two situations are possible. In the first one there
exists y ∈ Bi such that an = by + c. In this case we set Ch(n) = (n, Ch(y)).
In the second situation there is no such y from Bi. In this case we set
Ch(n) = (n). We define l(n) to be the length of Ch(n).
For every n ∈ N we define the ancestor of n, a(n), to be the last element of
the chain of n (of Ch(n)). To determine whether or not n ∈ AS will depend
on whether a(n) ∈ S. The exact relationship depends on the i for which
n ∈ Bi and on the j for which a(n) ∈ Bj or in other words on the length of
Ch(n): χAS(n) = (−1)
i−jχS(a(n)) = (−1)
l(n)−1χS(a(n)).
We say that n is a descendant of a(n).
It is clear that E(χAS(n1) . . . χAS(nk)) 6= 0 (E(χAS(n1) . . . χAS(nk)) ∈ {0, 1})
if and only if every number a(ni) occurs an even number of times among
numbers a(n1), a(n2), . . . , a(nk).
We bound the number of n,m’s inside the square [1, N ] × [1, N ] such that
E(χAS(n)χAS(n+ i1) . . . χAS(n+ ik)χAS(m)χAS(m+ i1) . . . χAS(m+ ik)) 6= 0.
For a given n ∈ [1, N ] we count all m’s inside [1, N ] such that for the ancestor
of n there will be a chance to have a twin among the ancestors of n+i1, . . . , n+
ik, m,m+ i1, . . . , m+ ik.
First of all it is obvious that in the interval [1, N ] for a given ancestor there
can be at most log b
N + C1 descendants, where C1 is a constant. For all
but a constant number of n’s it is impossible that among n + i1, . . . , n + ik
there is the same ancestor as for n. Therefore we should focus on ancestors
of the set {m,m + i1, . . . , m + ik}. For a given n we might have at most
(k + 1)(log b
N + C1) options for the number m to provide that one of the
elements of {m,m + i1, . . . , m + ik} has the same ancestor as n. Therefore
for most of n ∈ [1, N ] (except maybe a bounded number C2 of n’s which
depends only on {i1, . . . , ik} and doesn’t depend on N) we have at most
(k + 1)(log b
N + C1) possibilities for m’s such that
E(χAS(n)χAS(n+ i1) . . . χAS(n+ ik)χAS(m)χAS(m+ i1) . . . χAS(m+ ik)) 6= 0.
Thus we have
E(T 2N) ≤
(k + 1)(log b
N + C1) + C2N
((k+1) log b
N +C3),
where C3 is a constant. This implies
E(T 2N2) <∞.
Therefore TN2 →N→∞ 0 for almost every S ⊂ N. By Lemma 5.3 it follows
that TN →N→∞ 0 almost surely.
In the general case, where a, b are not relatively prime, if c satisfies (3.1) then
it should be divisible by (a, b). Therefore by dividing the equation (3.1) by
(a, b) we reduce the problem to the previous case.
We use the following notation:
Let W be a subset ofQn. Then for any increasing subsequence I = (i1, . . . , ip) ⊂
{1, 2, . . . , n} we define
ProjIW = WI = {(wi1, . . . , wip) | ∃w = (w1, w2, . . . , wn) ∈ W}.
We recall the notion of a cone.
Definition 3.1 A subset W ⊂ Qn is called a cone if
(a) ∀w1, w2 ∈ W we have w1 + w2 ∈ W
(b) ∀α ∈ Q : α ≥ 0 and ∀w ∈ W we have αw ∈ W .
The next step involves an algebraic statement with a topological proof which
we have to establish.
Lemma 3.1 Let W be a non-trivial cone in Qn which has the property that
for every two vectors ~a = {a1, a2, . . . , an}
t,~b = {b1, b2, . . . , bn}
t ∈ W there
exist two coordinates 1 ≤ i < j ≤ n (depend on the choice of ~a,~b) such that
ai bi
aj bj
There exist two coordinates i < j such that the projection of W on these two
coordinates is of dimension ≤ 1 (dimQ SpanProj(i,j)W ≤ 1).
Proof. First of all W has positive volume in V = SpanW (Volume is Haar
measure which normalized by assigning measure one to a unit cube and W
contains a parallelepiped). Fix an arbitrary non-zero element ~x ∈ W . For
every i, j : 1 ≤ i < j ≤ n we define the subspace
Ui,j = {~v ∈ V |Proj(i,j)~v ∈ SpanProj(i,j)~x}.
From the assumptions of the lemma it follows that
i,j;1≤i<j≤n
(W ∩ Ui,j).
For every i 6= j we obviously have that the volume of Ui,j is either zero or
Ui,j = V . If we assume that the statement of the lemma does not hold then
Ui,j 6= V, ∀i 6= j, and thus the volume of Ui,j , ∀i 6= j is zero. We get a
contradiction because a finite union of sets with zero volume cannot be equal
to a set with positive volume.
Proof. (of Theorem 1.3.1, ⇛)
Assume that an affine subspace A of Qk intersects Ak for any WM set A ⊂ N.
First of all, we shift the affine space to obtain a vector subspace, denote it
by U . The linear space U must contain vectors with all positive coordinates,
since A ∩Ak must be infinite.
Denote by W = {~v ∈ U | 〈~v, ~ei〉 ≥ 0 , ∀ i : 1 ≤ i ≤ k}. W is a non-trivial
cone.
Assume that for every ~a = (a1, . . . , ak)
t,~b = (b1, . . . , bk)
t ∈ W we have that
∃i, j : 1 ≤ i < j ≤ k such that
ai bi
aj bj
Then by Lemma 3.1 we deduce that there exist maximal subsets of coordi-
nates F1, . . . , Fl (one of them, assume F1, should have at least two coordi-
nates) such that for every r ∈ {1, 2 . . . , l} we have VFr
= SpanWFr is one
dimensional.
We fix r : 1 ≤ r ≤ l. We show that the projection on Fr of W + ~f is on a
diagonal, where ~f ∈ Zk is such that U + ~f = A. If the projection of W on
Fr is not on a diagonal then there exist two coordinates i < j from Fr such
that W(i,j) = {(ax, bx) | x ∈ N} for some a 6= b natural numbers. Therefore
the projection of A on (i, j) has the form {(ax+ f1, bx+ f2) | x ∈ N}, where
f1, f2 are integers. From Proposition 3.1 it follows that for any a, b, c, where
a 6= b, there exists a WM set A (even a normal set) such that the equation
ax = by + c is not solvable within A. This proves the existence of a WM set
A0 such that for every x ∈ Z we have (ax+ f1, bx+ f2) 6∈ A
0 (introduce the
new variables z1, z2 by (z1, z2) = (ax1+ f1, bx+ f2) and take a normal set A0
such that the equation az2 = bz1 + (af2 − bf1) is unsolvable within A0).
Thus ∀i, j ∈ Fr : W(i,j) = {(ax, ax) | x ∈ N}.
To prove that a shift is the same for all coordinates in Fr we merely should
know that for any natural number c there exists a WM set Ac such that
inside Ac the equation x− y = c is not solvable. The last statement is easy
to verify.
Let jr ∈ Fr, ∀1 ≤ r ≤ l. Denote I = (j1, . . . , jl). We have proved that there
exist g1, . . . , gl ∈ N, c1, . . . , cl ∈ Z such that
(U + ~f)I = {(g1x1 + c1, . . . , glxl + cl) | x1, . . . , xl ∈ Q}.
It is clear that we can find ~a,~b which satisfy all the requirements of Theorem
1.3.1. This completes the proof.
Remark 3.2 We have proved that if an affine subspace A ⊂ Qk intersects
Ak for any normal set A ⊂ N, then there exist ~a,~b ∈ Nk and a partition
F1, . . . , Fl of {1, 2, . . . , k} such that:
(a) ∀r : 1 ≤ r ≤ l and ∀i ∈ Fr, ∀j 6∈ Fr we have
ai bi
aj bj
6= 0.
(b) ∃~f ∈ Zk such that the set {n~a +m~b+ ~f |n,m ∈ N} is in A.
Thus, we have proved the direction “⇛” of Theorem 1.3.2.
4 Comparison with Rado’s Theorem
We recall that the problem of solvability of a system of linear equations in
one cell of any finite partition of N was solved by Rado in [10]. Such systems
of linear equations are called partition-regular. We show that partition-
regular systems are solvable within every WM set by use of Theorem 1.3.1.
It is important to note that solvability of partition-regular linear systems of
equations within WM sets can be shown directly (without use of Theorem
1.3.1) by use of the technique of Furstenberg and Weiss that was developed
in their dynamical proof of Rado’s theorem (see [8]).
First of all we describe Rado’s regular systems.
Definition 4.1 A rational p × q matrix (aij) is said to be of level l if the
index set {1, 2, . . . , q} can be divided into l disjoint subsets I1, I2, . . . , Il and
rational numbers crj may be found for 1 ≤ r ≤ l and 1 ≤ j ≤ q such that the
following relationships are satisfied:
aij = 0
aij =
c1jaij
. . .
aij =
j∈I1∪I2∪...∪Il−1
cl−1j aij
for i = 1, 2, . . . , p.
Theorem 4.1 (Rado) A system of linear equations is partition-regular if and
only if for some l the matrix (aij) is of level l and it is homogeneous, i.e. a
system of the form
aijxj = 0, i = 1, 2, . . . , p.
The following claim is the main result of this section.
Proposition 4.1 A partition-regular system is solvable in every WM set.
Proof. Let a system
j=1 aijxj = 0, i = 1, 2, . . . , p be partition-regular.
We will use the fact that the system is solvable for any finite partition of N.
First of all, the set of solutions of a partition-regular system is a subspace of
Qq; denote it by V . It is obvious that V contains vectors with all positive
components. If for some 1 ≤ i < j ≤ q we have Proj+i,jV (where Proj
i,jV =
{(x, y)|x, y ≥ 0 & ∃~v ∈ V :< ~v, ~ei >= x , < ~v, ~ej >= y}) is contained
in a line, then Proj+i,jV is diagonal, i.e. it is contained in {(x, x)|x ∈ Q}.
Otherwise, we can generate a partition of N into two disjoint sets S1, S2 such
that no S
1 and no S
2 intersects V :
This partition is constructed by an iterative process. Without loss of gener-
ality we may assume that the line is x = ny, where n ∈ N. The general case
is treated in the simillar way. We start with S1 = S2 = ∅. Let 1 ∈ S1.
We “color” the infinite geometric progression {nm |m ∈ N} (adding elements
to either S1 or S2) in such way that there is no (x, y) on the line from S
1 , S
Then we take a minimal element from N which is still uncolored. Call it a.
Add a to S1. Next, “color” {an
m |m ∈ N}.
Continuing in this fashion, we obtain the desired partition of N.
This contradicts the assumption that the given system is partition-regular.
Let F1, . . . , Fl be a partition of {1, 2, . . . , k} such that for every r ∈ {1, . . . , l}
we have for every i 6= j , i, j ∈ Fr : dim QSpan(Proj
i,jV ) = 1, and for every
r : 1 ≤ r ≤ l, every i ∈ Fr .and for every j 6∈ Fr we have dim QSpan(Proj
i,jV ) =
2. For every r : 1 ≤ r ≤ l we choose arbitrarily one representative index
within Fr and denote it by jr (jr ∈ Fr).
Then there exist g1, . . . , gl ∈ N such that
VI = {(g1x1, . . . , glxl) | x1, . . . , xl ∈ Q}.
The latter ensures that there exist vectors ~a,~b ∈ V which satisfy all the
requirements of Theorem 1.3.1 and, therefore, the system is solvable in every
WM set.
5 Appendix
In this section we prove all technical lemmas and propositions that were used
in the paper.
We start with the key lemma which is a finite modification of Bergelson’s
lemma in [1]. Its origin is in a lemma of van der Corput.
Lemma 5.1 Suppose ε > 0 and {uj}
j=1 is a family of vectors in Hilbert
space, such that ‖uj‖ ≤ 1 (1 ≤ j ≤ ∞). Then there exists I
′(ε) ∈ N, such
that for every I ≥ I ′(ε) there exists J ′(I, ε) ∈ N, such that the following
holds:
For J ≥ J ′(I, ε) for which we obtain
〈uj, uj+i〉
for a set of i’s in the interval {1, . . . , I} of density 1− ε
we have
Proof. For an arbitrary J define uk = 0 for every k < 1 or k > J . The
following is an elementary identity:
uj−i = I
Therefore, the inequality
i=1 ui
i=1 ‖ui‖
yields
≤ (J + I)
= (J + I)
uj−p,
uj−s〉
= (J+I)
‖uj−p‖
+2(J+I)
r,s=1;s<r
〈uj−r, uj−s〉 = (J+I)(Σ1+2Σ2),
where Σ1 = I
by the aforementioned elementary identity and
h=1(I−h)
j=1〈uj, uj+h〉. The last expression is obtained by rewrit-
ing Σ2, where h = r − s. By dividing the foregoing inequality by I
2J2 we
obtain
J + I
J + I
J + I
Choose I ′(ε) ∈ N, such that 12
≤ I ′(ε) ≤ 12
+1. Then for every I ≥ I ′(ε) we
have 1
≤ 11ε
. There exists J ′(I, ε) ∈ N, such that for every J ≥ J ′(I, ε):
. As a result, for every I ≥ I ′(ε) there exists J ′(I, ε), such that for
every J ≥ J ′(I, ε)
The next proposition was used in Section 2.
Proposition 5.1 Let A ⊂ N be a WM-set. Then for every integer a > 0
and every integers b1, b2, . . . , bk
ξ(n+ b1)ξ(n+ b2) . . . ξ(n+ bk) =
ξ(an+ b1)ξ(an+ b2) . . . ξ(an+ bk),
where ξ
= 1A − d(A).
Proof. Consider the weak-mixing measure preserving system (Xξ,B, µ, T ).
The left side of the equation in the proposition is
T b1fT b2f . . . T bkfdµ,
where f(ω)
= ω0 for every infinite sequence inside Xξ. We make use of the
notion of disjointness of measure preserving systems. By [6] we know that
every weak-mixing system is disjoint from any Kronecker system which is
a compact monothethic group with Borel σ-algebra, the Haar probability
measure, and the shift by a chosen element of the group. In particular,
every weak-mixing system is disjoint from the measure preserving system
(Za,BZa , S, ν), where Za = Z/aZ, S(n)
= n+ 1( mod a). The measure and
the σ-algebra of the last system are uniquely determined. Therefore, from
Furstenberg’s theorem (see [6], Theorem I.6) it follows that the point (ξ, 0) ∈
Xξ×Za is a generic point of the product system (Xξ×Za,B×BZa , T×S, µ×ν).
Thus, for every continuous function g on Xξ × Za we obtain
Xξ×Za
g(x,m)dµ(x)dν(m) = lim
g(T nξ, Sn0).
Let g(x,m)
= f(x)10(m), which is obviously continuous on Xξ × Za. Then
genericity of the point (ξ, 0) yields
Xξ×Za
f(x)10(m)dµ(x)dν(m) =
f(x)dµ(x) =
f(T nξ)10(n) = lim
f(T anξ).
Taking instead of the function f the continuous function T b1fT b2f . . . T bkf
in the definition of g finishes the proof.
The next two lemmas are very useful for constructing normal sets with specif-
ical properties.
Lemma 5.2 Let A ⊂ N. Let λ(n) = 21A(n)− 1. Then A is a normal set ⇔
for any k ∈ (N ∪ {0}) and any i1 < i2 < . . . < ik we have
λ(n)λ(n+ i1) . . . λ(n+ ik) = 0.
Proof. “⇒” If A is normal then any finite word w ∈ {−1, 1}∗ has the
“right” frequency 1
inside wA. This guarantees that “half of the time” the
function λ(n)λ(n + i1) . . . λ(n + ik) equals 1 and “half of the time” is equal
to −1. Therefore we get the desired conclusion.
“⇐” Let w be an arbitrary finite word of plus and minus ones: w = a1a2 . . . ak
and we have to prove that w occurs in wA with the frequency 2
−k. For every
n ∈ N the word w occurs in 1A and starting from n if and only if
1A(n) = a1
. . .
1A(n+ k − 1) = ak
The latter is equivalent to the following
λ(n) = 2a1 − 1
. . .
λ(n+ k − 1) = 2ak − 1
The frequency of w within 1A is equal to
λ(n)(2a1 − 1) + 1
. . .
λ(n+ k − 1)(2ak − 1) + 1
The limit is equal to 1
Lemma 5.3 Let {an} be a bounded sequence. Let TN =
n=1 an. Then
TN converges to a limit t ⇔ there exists a sequence of increasing indices {Ni}
such that Ni
→ 1 and TNi →i→∞ t.
References
[1] Bergelson, V. Weakly mixing PET. Ergodic Theory Dynam. Sys-
tems 7 (1987), no. 3, 337–349.
[2] Bergelson, V.; McCutcheon, R. An ergodic IP polynomial Sze-
merédi theorem. Mem. Amer. Math. Soc. 146 (2000), no. 695.
[3] Fish, A. Random Liouville functions and normal sets. Acta Arith.
120 (2005), no. 2, 191–196.
[4] Fish, A. Polynomial largeness of sumsets and totally ergodic sets,
see http://arxiv.org/abs/0711.3201.
[5] Fish, A. Ph.D. thesis, Hebrew University, 2006.
[6] Furstenberg, H. Disjointness in ergodic theory, minimal sets, and
a problem in Diophantine approximation. Math. Systems Theory
1 (1967), 1-49.
[7] Furstenberg, H. Ergodic behavior of diagonal measures and a
theorem of Szemerédi on arithmetic progressions. J. d’ Analys
Math. 31 (1977), 204–256.
[8] Furstenberg, H. Recurrence in Ergodic Theory and Combinatorial
Number Theory. Princeton Univ. Press 1981.
[9] Host, B.; Kra, B. Nonconventional ergodic averages and nilman-
ifolds. Ann. of Math. (2) 161 (2005), no. 1, 397–488.
[10] Rado, R. Note on combinatorial analysis. Proc. London Math.
Soc. 48 (1943), 122–160.
[11] Schur, I. Uber die Kongruenz xm+ym ≡ zm(modp). Jahresbericht
der Deutschen Math.-Ver. 25 (1916), 114–117.
[12] Szemerédi, E. On sets of integers containing no k elements in
arithmetic progression. Collection of articles in memory of Jurǐi
Vladimirovič Linnik. Acta Arith. 27 (1975), 199–245.
Current Address:
Department of Mathematics
University of Wisconsin-Madison
480 Lincoln Dr.
Madison, WI 53706-1388
E-mail: afish@math.wisc.edu
http://arxiv.org/abs/0711.3201
	Introduction
	Algebraic patterns within subsets of N
	Generic points and WM sets
	Solvability of linear diophantine systems within WM sets and normal sets
	Acknowledgments
	Proof of Sufficiency
	Probabilistic constructions of WM sets
	 Comparison with Rado's Theorem
	Appendix
ABSTRACT
  We introduce a new class of "random" subsets of natural numbers, WM sets.
This class contains normal sets (sets whose characteristic function is a normal
binary sequence). We establish necessary and sufficient conditions for
solvability of systems of linear equations within every WM set and within every
normal set. We also show that partition-regular system of linear equations with
integer coefficients is solvable in any WM set.

<|endoftext|><|startoftext|>
arXiv:0704.0601v4  [hep-ph]  24 Oct 2007
D − D̄ mixing and rare D decays in the Littlest Higgs
model with non-unitarity matrix
Chuan-Hung Chen1,2∗, Chao-Qiang Geng3,4† and Tzu-Chiang Yuan3‡
1Department of Physics, National Cheng-Kung University, Tainan 701, Taiwan
2National Center for Theoretical Sciences, Hsinchu 300, Taiwan
3Department of Physics, National Tsing-Hua University, Hsinchu 300, Taiwan
4Theory Group, TRIUMF, 4004 Wesbrook Mall,
Vancouver, B.C. V6T 2A3, Canada
(Dated: October 27, 2018)
Abstract
We study the D − D̄ mixing and rare D decays in the Littlest Higgs model. As the new weak
singlet quark with the electric charge of 2/3 is introduced to cancel the quadratic divergence induced
by the top-quark, the standard unitary 3× 3 Cabibbo-Kobayashi-Maskawa matrix is extended to
a non-unitary 4× 3 matrix in the quark charged currents and Z-mediated flavor changing neutral
currents are generated at tree level. In this model, we show that theD−D̄ mixing parameter can be
as large as the current experimental value and the decay branching ratio (BR) of D → Xuγ is small
but its direct CP asymmetry could be O(10%). In addition, we find that the BRs of D → Xuℓ+ℓ−,
D → Xuνν̄ and D → µ+µ− could be enhanced to be O(10−9), O(10−8) and O(10−9), respectively.
∗ Email: physchen@mail.ncku.edu.tw
† Email: geng@phys.nthu.edu.tw
‡ Email: tcyuan@phys.nthu.edu.tw
http://arxiv.org/abs/0704.0601v4
I. INTRODUCTION
As the observation of the Bs − B̄s mixing in 2006 by CDF [1], all neutral pseudoscalar-
antipseudoscalar oscillations (P − P̄ ) in the down type quark systems have been seen. In
the standard model (SM), the most impressive features of flavor physics are the Glashow-
Iliopoulos-Maiani (GIM) mechanism [2] and the large top quark mass. The former results
in the cancellation between the lowest order short-distance (SD) contributions of the first
two generations to the mass difference ∆mK in the K
0 system, while the latter makes ∆mBq
(q = d, s) in the Bq systems dominated by the SD effects [3]. In addition, these features also
lead to sizable flavor changing neutral currents (FCNCs) from box and penguin diagrams,
which contribute to the rare decays, such as K → πνν̄ and B → K(∗)ℓℓ̄. It is known that
these processes could be good candidates to probe new physics effects [4–6]. However, it is
clear that the new physics signals deviated from the SM predications for the P − P̄ mixings
and rare FCNC decays have to wait for precision measurements on these processes.
Unlike K and Bq systems, the SD contributions to charmed-meson FCNC processes,
such as the D − D̄ mixing [7] and the decays of c → uℓ+ℓ− and D → ℓ+ℓ− [8], are highly
suppressed due to the stronger GIM mechanism and weaker heavy quark mass enhancements
in the loops. On the other hand, it is often claimed that the long-distance (LD) effect for
the D−D̄ mixing should be the dominant contribution in the SM. Nevertheless, because the
nonperturbative hadronic effects are hard to control, the result is still inconclusive [9–12].
Recently, BABAR [13] and BELLE [14, 15] collaborations have reported the evidence for
the D − D̄ mixing with
x′2 = (−0.22± 0.30± 0.21)× 10−3 ,
y′ = (9.7± 4.4± 3.1)× 10−3 , (1)
x ≡ ∆mD
= (0.80± 0.29± 0.17)% ,
y ≡ ∆ΓD
= (0.33± 0.24± 0.15)% ,
yCP = (1.31± 0.32± 0.25)% , (2)
respectively, where x′ = x cos δ + y sin δ and y′ = −x sin δ + y cos δ with the assumption
of CP conservation and δ being the relative strong phase between the amplitudes for the
doubly-Cabbibo-suppressed D → K+π− and Cabbibo-favored D → K−π+ decays [16, 17]
and yCP = τ(D → K−π+)/τ(D → K+K−) − 1. Moreover, no evidence for CP violation is
found. The combined results of Eqs. (1) and (2) at the 68% C.L. are [18]
x = (5.5± 2.2)× 10−3 ,
y = (5.4± 2.0)× 10−3 ,
δ = (−38± 46)0 . (3)
Note that the upper bound of x < 0.015 at 95%C.L. can be extracted from the BELLE
data in Eq. (2) [14, 15]. The evidences of the mixing parameters by BABAR and BELLE
collaborations reveal that the era of the rare charmed physics has arrived. The results in
Eq. (3) can not only test the SU(3) breaking effects for the D− D̄ mixing [10, 12], but also
examine new physics beyond the SM [17–21].
It is known that a straightforward way to enhance the rare D processes is to include some
new heavy quarks within the framework of the SM. For instance, if a new heavy quark with
the electric charge of −1/3 is introduced, it could affect the D system since the extra down
type quark violates the GIM mechanism. However, the constraint on this heavy quark is
quite strong as it could also lead to FCNCs for the down type quark sector at tree level,
which are strictly limited by the well measured rare K and B decays, such as KL → µ+µ−
and B → Xsγ [22]. On the other hand, if the charge of the new heavy quark is 2/3, it could
generate interesting tree level FCNCs for the up type quark sector, for which the constraints
are much weaker. In this paper, we will study D physics based on a new weak singlet upper
quark.
It has been known that in the framework of the Littlest Higgs model [23], there exists
a new SU(2)L singlet vector-like up quark [24], hereafter denoted by T . Since the number
of down type quarks is the same as that in the SM, the standard unitary 3 × 3 Cabibbo-
Kobayashi-Maskawa (CKM) matrix is extended to a non-unitary 4×3 matrix in the charged
currents. Moreover, Z-mediated FCNCs for the up quark sector are generated at tree level.
In Ref. [25], it has been shown that the contributions of this new quark to the rare D
processes are small and cannot reach the sensitivities of future experiments [25, 26]. In this
paper, we will demonstrate that by adopting some plausible scenario, the effects could not
only generate a large D − D̄ oscillation but also marginally reach the sensitivity proposed
by BESIII for the rare D decays [27]. We note that the implication of the new data on the
D − D̄ mixing in the Littlest Higgs model with T-parity has been recently studied in Ref.
[21].
The paper is organized as follows. In Sec. II, we investigate that when a gauge singlet T -
quark is introduced in the Littlest Higgs model, how the non-unitary matrix for the charged
current and the tree level Z-mediated FCNC are formed. By using the leading perturbation,
the mixing matrix elements related to the new parameters in the Littlest Higgs model are
derived. In addition, we study how to get the small mixing matrix element for Vu(c)b, which
describes the b → u(c) decays. In Sec. III, we discuss the implications of the non-unitarity
on the D− D̄ mixing and rare D decays by presenting some numerical analysis. Finally, we
summarize our results in Sec. IV.
II. NON-UNITARY MIXING MATRIX IN THE LITTLEST MODEL
To study the new flavor changing effects in the Littlest Higgs model, we start by writing
the Yukawa interactions for the up quarks to be [24, 25]
λabfǫijkǫxyχaiΣjxΣkyu
b + λ0fTT
c + h.c. , (4)
where χT1 = (d1, u1, 0), χ
2 = (s2, c2, 0), χ
3 = (b3, t3, T ), u
b is the weak singlet and Σ =
eiΠ/fΣ0e
iΠT /f with
112×2
112×2
, Π =
2 h∗/
φ hT/
. (5)
The scale f denotes the global symmetry spontaneously breaking scale, which, as usual,
could be around 1 TeV. Consequently, the 4× 4 up-quark mass matrix is given by [25]
iλijv
− − − − −
0 0 λ33f | λ0f
. (6)
We remark that the quadratic divergences for the Higgs mass from one-loop diagrams in-
volving t and T get exactly cancelled as shown in Ref. [28]. Moreover, for other quarks
other than the top quark, the one-loop quadratic divergent contributions do not necessitate
fine-tuning the Higgs potential as the cutoff is around 10 TeV for f ∼ 1 TeV due to the small
corresponding Yakawa couplings. That is why there is no need to introduce extra singlet
states T [28, 29].
To obtain the quark mass hierarchy of mt ≫ mc ≫ mu, we can choose a basis such that
the up-quark mass matrix is [30]
m̂U 0
hf λ0f
 (7)
where m̂Uij = δijλiv/
2 ≡ mi is diagonal matrix and h = (h1, h2, h3). The hi is related to
λ33 by hi = Ṽ
i3 λ33 and hh
† = |λ33|2, in which Ṽ UR is the unitary transformation for the
right-handed up quarks. We note that mi are not the physical masses and in principle their
magnitudes could be as large as the weak scale. In order to preserve the hierarchy in the
quark masses, one expects that m3 > m2 > m1. Furthermore, in terms of this basis, the
charged and neutral currents, defined by
LC = g√
J−µ W
+µ − g√
2 tan θ
J−µ W
H + h.c. ,
cos θW
3 − sin2 θWJµem
tan θ
3 ZHµ + h.c. , (8)
are expressed by
J−µ = ŪLγµṼ
0aVDL ,
µṼ 0aV Ṽ
0†UL −
µDL , (9)
respectively, where UT = (u1, c2, t3, T ), D
T = (d, s, b), aV = diag(1, 1, 1, 0) and
Ṽ 0 =
(V 0CKM)3×3 0
 (10)
with V 0CKMV
CKM = 113×3. The null entry in aV denotes the new T -quark being a weak
singlet; and without the new T -quark, V 0CKM is just the CKM matrix. Since the down quark
sector is the same as that in the SM, we have set the unitary transformation UDL to be an
identity matrix.
For getting the physical eigenstates, the mass matrix in Eq. (7) can be diagonalized by
unitary matrices V UL,R so that we have
†diag
U = V
ULMUM
L (11)
m̂Um̂
f (|λ33|2 + |λ0|2)f 2
 . (12)
Since (|λ33|2 + |λ0|2)f 2 is much larger than other elements, we can take the leading order
of the perturbation in himi/f (i = 1, 2, 3). According to Eq. (11), the leading expansion is
given by
†diag
U = V
ULMUM
L ≈ (1 + ∆L)MUM †U (1−∆L) . (13)
By looking at the off-diagonal terms (M
†diag
U )i4(4i), we can easily get
∆Li4 ≈ −∆L4i = −
himif
(|λ33|2 + |λ0|2)f 2 −m2i
with i 6= 4. From the diagonal entries, if we set the light quark masses to be mu ≈ mc ≈ 0,
we obtain
0 ≈ m2uj ≈ m
j + 2∆Lj4(MUM
U)4j ,
∆Lj4 ≈ −
with j = 1, 2. To be consistent with Eq. (14), at the leading expansion the relation
2h2j = (|λ33|2 + |λ0|2) (16)
should be satisfied. We emphasize that the choice of Eq. (16) is somewhat fine-tuned in
order to have Eqs. (14) and (15) simultaneously. Since the top-quark is much heavier than
other ordinary quarks, we have 2h23 ≈ (1−m2t/m23)(|λ33|2+ |λ0|2) if f > m3 > mt. Similarly,
one obtains the flavor mixing effects for i 6= j 6= 4 to be
∆Lij =
hihjmimj
m2i −m2j
f 2[2(|λ33|2 + |λ0|2)f 2 − (m2i +m2j )]
(|λ33|2 + |λ0|2)f 2 −m2j
((|λ33|2 + |λ0|2)f 2 −m2i )
. (17)
After diagonalization, the currents become
J−µ = ŪLγµV
ULṼ 0aVDL = ŪLγµV
ULV 0DL ,
3 = ŪLγ
µV ULṼ 0aV Ṽ
0†V †ULUL = ŪLγ
µV ULV 0V 0†V †ULUL , (18)
where UT = (u, c, t, T ),
V 0 = Ṽ 0aV =
(V 0CKM)3×3 0
 (19)
and diag(V 0V 0†) = aV . Since the 4-th component of aV is different from the first 3 ones, it
is obvious that the matrix V ≡ V ULV 0 associated with the charged current does not satisfy
unitarity. In addition, V ULaV V
L, which is associated with the neutral current, is not the
identity matrix. As a result, Z-mediated FCNCs at tree level are induced. According to
Eq. (18), we see that
V V † = V ULaV V
L (20)
which is just the same as the effects of the Z-mediated FCNCs. Due to V being a non-unitary
matrix, one finds
(V V †)ij = Vi4V
j4 . (21)
Consequently, the interesting phenomena arising from non-unitary matrix elements are al-
ways related to Vi4V
j4 = ∆Li4∆j4. We note that as we do not particularly address CP
problem, in most cases, we set the parameters to be real numbers.
It has been known that enormous data give strict bounds on the flavor changing effects.
In particular, the pattern describing the charged current has been fixed quite well. Any
new parametrization should respect these constraints. It should be interesting to see the
relationship with and without the new vector-like T -quark. From Eq. (18), we know that
the new flavor mixing matrix for the charged current is given by V = V ULV 0. At the leading
order perturbation, one gets
V = V ULV 0 ≈ (1 + ∆L)V 0 = V 0 +∆LV 0 . (22)
If V 0tb ∼ 1 is taken, one finds that Vub ≈ V 0ub+∆L13 and Vcb ≈ V 0cb+∆L23. In terms of Eq. (17)
and h1 = h2 ≈ h3, the relations ∆L13 ≈ −m1/m3 and ∆L23 ≈ −m2/m3 are obtained. Hence,
in our approach, we have
Vus ≈ V 0us −
, Vub ≈ V 0ub −
, Vcb ≈ V 0cb −
. (23)
From these results, it is clear that when the T -quark decouples from ordinary quarks,
Vus → V 0us, Vub → V 0ub and Vcb → V 0cb, while m1/m2 → mu/mc, m1/m3 → mu/mt and
m2/m3 → mc/mt, respectively. According to the observations in the decays of b → uℓν̄ℓ and
b → cℓν̄ℓ, the corresponding values have been determined to be |Vub| = 3.96 ± 0.09 × 10−3
and |Vcb| = 42.21+0.10−0.80 × 10−3, respectively [16]. Since V 0ij and mi are free parameters, to
satisfy the experimental limits with interesting phenomena in low energy physics, it is rea-
sonable to set the orders of magnitude for m1/m3 and m2/m3 (m1/m2) to be O(10
−2) and
O(10−1), respectively. Consequently, the non-unitary effects on the rare charmed meson
decays governed by V14V
24 could be as large as ∆14∆24 ∼ O(10−4), which could be one order
of magnitude larger than those in Ref. [25].
III. D − D̄ MIXING AND RARE D DECAYS
A. D − D̄ mixing
It is well known that the GIM mechanism has played an important role in the K − K̄
oscillation in the SM. In addition, due to the top-quark in the box and penguin diagrams,
Bq − B̄q mixings are dominated by the SD effects, which are consistent with the data [16].
On the contrary, for the D− D̄ mixing the GIM cancellation further suppresses the mixing
effect to be ∆mD ∼ O(m4s/m2Wm2c) [7] and the bottom quark contribution actually is a
subleading effect due to the suppression of (VubV
2. In the SM, the SD contribution to the
mixing parameter is O(10−7) [34]. However, the LD contribution to the mixing is believed
to be dominant. Due to the nonperturbative hadronic effects, the result is still uncertain
with the prediction on the mixing parameter ranging from O(10−3) [9] to O(10−2) [10–12].
Nonetheless, the mixing parameters shown in Eq. (3) could arise from the LD contribution.
Thus, it is important to have a better understanding of the LD effect. On the other hand, it
is also possible that the mixings in Eq. (3) could result from new physics. In the following,
we will concentrate on the Littlest Higgs model.
In the quark sector of the Littlest Higgs model due to the introduction of a new weak
singlet, a direct impact on the low energy physics is the FCNCs at tree level. According to
Eq. (18), the most attractive process with |∆C| = 2 via the Z-mediated c−u−Z interaction,
illustrated in Fig. 1, is given by
H(|∆C| = 2) = g
(V14V
ūγµPLc ūγ
µPLc ,
(V14V
ūγµPLc ūγ
µPLc . (24)
In terms of the hadronic matrix element, defined by
〈D̄|(ūc)V−A (ūc)V−A|D〉 =
D , (25)
FIG. 1: Z-mediated flavor diagram with |∆C| = 2.
the mass difference for the D meson is [25]
∆mD ≈
DmDBD|(V14V ∗24)2| . (26)
If we assume no cancellation between new physics and SM contributions, by taking τD =
1/ΓD = 6.232 × 1011 GeV−1, fD
BD = 200 MeV [31, 32] and mD = 1.86 GeV and using
Eq. (3), we obtain
ζ0 ≡ |V14V ∗24| = |∆L14∆L24| = (1.47± 0.29)× 10−4 , (27)
which is in the desirable range. In other words, the result in Eq. (27) demonstrates that the
non-unitarity in the Littlest Higgs model could enhance the D − D̄ mixing at the observed
level. We note that the limit of x < 0.015 (95%C.L.) leads to
ζ0 < 2.5× 10−4. (28)
In addition, we note that cancellation between the LD effect in the SM and the SD one from
new physics could happen. In this case, the values in Eqs. (27) and (28) could be relaxed.
B. D → Xuγ decay
In the SM, the D-meson FCNC related processes are all suppressed since the internal
fermions in the loops are all much lighter than mW . For the decay of D → Xuγ, without
QCD corrections, the branching ratio is O(10−17); and it becomes O(10−12) when one-loop
QCD corrections are included [8]. However, it is found that the two-loop QCD corrections
can boost the BR to be as large as 3.5× 10−8 [33]. It should be interesting to see how large
the non-unitarity effect on c → uγ is in the Littlest Higgs model.
To study the radiative decay of c → uγ, we write the effective Lagrangian to be
Lc→uγ = −
mcūσµνPRcF
µν , (29)
where C7 = C
7 + C
7 and C
7 ≈ −(0.007 + i0.02) = 0.021eiδs with δs = 70.7◦ [33]
being the strong phase induced by the two-loop QCD corrections. In the extension of the
SM by including a weak singlet particle, the flavor mixing matrix in the charged current
is not unitary and the Z-mediated FCNC at tree level is generated as well. For c → uγ,
besides the QED-penguin diagrams induced by the W -boson displayed in Figs. 2a and 2b,
the Z-mediated QED-penguin one in Fig. 2c will also give contributions. We note that the
d, s, b
W d, s, b
u, c, T
(a) (c)(b)
FIG. 2: Flavor diagrams for c → uγ.
contributions from WH and ZH can be ignored as m
and m2Z/m
are much less
than one.
At the first sight, due to the light quarks in the loops, the contributions from Figs. 2a
and 2b could be negligible. However, due to the non-unitarity of (V V †)uc = V14V
24 6= 0,
even in the limits of md, s, b → 0, the contributions from the mass independent terms do not
vanish anymore and can be sizable. In terms of unitary gauge [22], we obtain
CW7 =
(V V †)12
VusV ∗cs
VusV ∗cs
. (30)
Furthermore, if we set mu ≈ mc = 0, the contributions from Fig. 2c are given by
CZ7 = (f
u + f
c + f
T )/VusV
fZc + f
− eu sin2 θW
[4ξ0(0)− 6ξ1(0) + 2ξ2(0)]
+eu sin
2 θW [4ξ0(0)− 4ξ1(0)]
fZT =
eu [2ξ0(yT )− 3ξ1(yT ) + ξ2(yT )] , (31)
where the functions ξn(x) are defined by
ξn(x) ≡
zn+1dz
1 + (x− 1)z
and yT = mT/mZ . Numerically, the total contribution in Fig. 2 is
C7 = C
7 + C
V ∗csVus
24 . (33)
If we regard V14V
24 as an unknown complex parameter, i.e. V14V
24 = ζ0e
iθ with θ being
the CP violating phase, one can study the decay BR and direct CP asymmetry (CPA) of
D → Xuγ defined by
BR(D → Xuγ) =
6αem|C7|2
π|Vcd|2
BR(D → Xdeν̄e) ,
ACP =
BR(c̄ → ūγ)− BR(c → uγ)
BR(c̄ → ūγ) + BR(c → uγ)
, (34)
as functions of ζ0 and θ. In Fig. 3, the BR and CPA as functions of ζ0 are presented,
where the solid, dotted, dashed and dash-dotted lines represent the CP violating phase at
θ = 0, 45◦, 90◦ and 135◦, respectively. From these results, it is interesting to see that
0 1 2 3 4 5 6
0 1 2 3 4 5 6
(a) (b)
FIG. 3: BR (in units of 10−8) and CPA (in units of 10−2) for D → Xuγ as functions of ζ0, where
the solid, dotted, dashed and dash-dotted lines represent the CP violating phase at θ = 0, 45◦, 90◦
and 135◦, respectively.
BR(D → Xuγ) is insensitive to the new physics effects, whereas the direct CPA could be as
large as O(10%). Explicitly, if we take θ = 90◦ and ζ0 = 1.5 × 10−4, the CPA is about 3%.
Note that this CPA vanishes in the SM.
C. D → Xuℓℓ̄ and D0 → ℓ+ℓ− decays
Because the current experimental measurements in K and Bq decays are all consistent
with the SM predictions, it is inevitable that if we want to observe any deviations from the
SM, we have to wait for precision measurements for K and Bq. SuperB factories or LHCb
could provide a hope. However, the situation in D physics is straightforward. As stated
before, unlike K and Bq systems, due to no heavy quark enhancement in the D system, the
rare D-meson decays, such as D → Xuℓℓ̄ (ℓ = e, µ, ν), are always suppressed. Even by
considering the long-distance effects, the related decays, such as D → µ+µ− and D → Xuνν̄,
get small corrections to the SD predictions on the BRs [35]. Therefore, these rare decays
definitely could be good candidates to probe the new physics effects. Since the values in the
SM are hardly reachable at D factories [27], if any exotic event is found, it must be a strong
evidence for new physics. In the following analysis, we are going to discuss the implication
of the Littlest Higgs model on the rare D decays involving di-leptons.
To study these decays, we first write the effective Hamiltonian for c → uℓ+ℓ− (ℓ = e, µ)
H(c → uℓ+ℓ−) = −
GFαem√
V ∗csVus
Ceff9 O9 + C7O7 + C10O10
, (35)
O7 = −
ūiσµνq
νPRcℓ̄γ
O9 = ūγµPLc ℓ̄γ
O10 = ūγµPLc ℓ̄γ
µγ5ℓ , (36)
where the effective Wilson coefficients are given by
Ceff9 =
(V V †)14
V ∗csVus
cℓV + (h(zs, s)− h(zd, s)) (C2(mc) + 3C1(mc)) ,
C10 = −
(V V †)14
V ∗csVus
cℓA , (37)
with s = q2/m2c , zi = mi/mc, c
V = −1/2 + 2 sin2 θW , cℓA = −1/2 and
h(z, s) = −4
ln z +
(2 + x)
|1− x|
1−x+1√
1−x−1
− i π, for x ≡ 4z2/s < 1 ,
2 arctan 1√
x−1 , for x ≡ 4z
2/s > 1
. (38)
Here, we have neglected the small contributions from the penguin and box diagrams. We
note that in the SM, the SD contributions are mainly from the term with h(z, s), induced
by the insertion of O2 = ūLγµqLq̄Lγ
µcL and mixing with O9 at one-loop level [35, 36].
We note that the resonant decays of D → XuV → Xuℓ+ℓ− (V = φ, ρ, ω) would have
large corrections to c → uℓ+ℓ− at the resonant regions. However, in this paper we do
not discuss these contributions as we only concentrate on the SD contributions. Moreover,
these resonance contributions can be removed by imposing proper cuts in the phase space
in dedicated searches.
From Eq. (35), the decay rate for D → Xuℓ+ℓ− as a function of the invariant mass
s = q2/m2c can be found to be
768π5
|VusV ∗cs|2(1− s)2R(s) ,
R(s) =
|Ceff9 |2 + |C10|2
(1 + 2s) + 12Re(C∗7C
9 ) + 4
|C7|2 . (39)
In addition, by utilizing the lepton angular distribution, we can also study the forward-
backward asymmetry (FBA), given by
−1 d cos θdΓ/dsd cos θ sgn(cos θ)
−1 d cos θdΓ/dsd cos θ
Ceff9 +
, (40)
where θ is the angle of ℓ+ related to the momentum of the D meson in the ℓ+ℓ− invariant
mass frame. Since C10 is small in the SM, AFB is negligible. With mc = 1.4 GeV and the
mixing parameter in Eq. (27), we get
BR(D → Xue+e−) = (4.18± 0.91)× 10−10 ,
BR(D → Xuµ+µ−) = (2.51± 0.86)× 10−10 , (41)
comparing with the SM predictions of BR(D → Xue+e−)SM = 2.1 × 10−10 and BR(D →
+µ−)SM = 0.5 × 10−10, respectively. Clearly, if some cancellation occurs between new
physics and SM contributions in the D − D̄ mixing, a larger value of ζ0 can be allowed. In
Fig. 4, we show the tendency of the decay as a function of ζ0, where the negative horizontal
values correspond to -ζ0. In addition, we present the differential decay BR [FBA] of
D → Xue+e− as a function of s = q2/m2c in Fig. 5a [b], where the thick solid, dotted and
dashed lines correspond to ζ0 = 1.5, 3.0 and 5.0, while the thin ones denote the cases for
−ζ0 except ζ0 = 0 for the thin solid line in Fig. 5a. From Fig. 5b, we see that the FBA
-6 -4 -2 0 2 4 6
FIG. 4: BR(in units of 10−9) for D → Xue+e− as a function of ζ0.
0.2 0.4 0.6 0.8
0.2 0.4 0.6 0.8
(a) (b)
FIG. 5: (a)[(b)] Differential BR (in units of 10−9) [FBA] for D → Xue+e− as a function of s, where
the thick solid, dotted and dashed lines correspond to ζ0 = 1.5, 3.0 and 5.0, while the thin ones
denote the cases for −ζ0 except ζ0 = 0 for the thin solid line in (a).
is only at percent level. In the Littlest Higgs model, this is because the Z coupling to the
charged lepton cℓV = −1/2 + 2 sin2 θW appearing in Ceff9 is much smaller than one. This is
quite different from that in b → sℓ+ℓ− where the dominant effect in the SM for the FBA is
from the box and QED-penguin diagrams.
Next, we discuss the decay of D → Xuνν̄. In the SM, the BR for D → Xuνν̄ is estimated
to be O(10−16) − O(10−15) [35], which is vanishing small. In the Littlest Higgs model, by
taking C7 = 0, C
9 = −C10 = −π(V V †)14/(αemVusV ∗cs), the effective Hamiltonian in Eq. (35)
can be directly applied to c → uνν̄. The decay rate forD → Xuνν̄ as a function of s = q2/m2c
can be obtained as
768π5
(1− s)2(1 + 2s)
2π2|(V V †)12|2
, (42)
where the factor of 3 stands for the neutrino species. With ζ0 = 1.5×10−4, we get BR(D →
Xuνν̄) = 1.31 × 10−9. However, if we relax the constraint on V14V †24, the BR as a function
of ζ0 is shown in Fig. 6a. For a larger value of ζ0, BR(D → Xuνν̄) could be as large as
O(10−8).
0 1 2 3 4 5 6
0 1 2 3 4 5 6
(a) (b)
FIG. 6: (a) BR (in units of 10−8) for D → Xuνν̄ and (b) BR (in units of 10−9) for D → µ+µ−.
Finally, we study the decays of D → ℓ+ℓ−. It has been known that, in the SM, the SD
contributions to D → µ+µ− are O(10−18), while the LD ones are O(10−13) [35]. It is clear
that any signal to be observed at the sensitivity of the proposed detector, such as BESIII,
will indicate new physics effects. Since the effective interactions at quark level are the same
as those in Eq. (35), one finds that
BR(D → ℓ+ℓ−) =
τDmDm
|πV14V ∗24|2 . (43)
Here we have used equation of motion for the charged lepton so that ℓ̄/pDℓ = 0. We also
note that operators O7,9 make no contributions. With |V14V ∗24| = ζ0 = 1.5 × 10−4, the
predicted BR for D → µ+µ− is 1.17× 10−10. In Fig. 6b, we present the BR as a function of
ζ0. We see that BR(D → µ+µ−) in the Littlest Higgs model could be as large as O(10−9).
IV. CONCLUSIONS
We have studied the D − D̄ mixing and rare D decays in the Littlest Higgs model. In
the model, as the new weak singlet vector-like quark T with the electric charge of 2/3
is introduced to cancel the quadratic divergence induced by the top-quark, the standard
unitary 3× 3 CKM matrix is extended to a non-unitary 4× 3 matrix in the quark charged
currents and Z-mediated flavor changing neutral currents are generated at tree level. We
have shown that the effects on |∆C| = 2 and |∆C| = 1 processes are all related to V14V ∗24 in
Eq. (21).
To avoid the scenario adopted by Ref. [25], in which λ0 ∼ λ33 ≫ λij was assumed, we
choose the basis such that the effective mass matrix for u1, c2 and t3 is diagonal, while
the corresponding masses m1, m2 and m3 are free parameters and can be as large as the
weak scale v. Since the global symmetry breaking scale f is larger than v, the mixing
matrix relating physical and unphysical states could be extracted by taking the leading
perturbative expansion. Accordingly, by using the approximation of mu ≈ mc ≈ 0, the
explicit expressions for V14 and V24 have been obtained. In terms of the data for Vub and
Vcb, we have found that the natural value for ζ0 ≡ |V14V ∗24| is O(10−4), which agrees with the
observed parameter in the D − D̄ mixing but it is one order of magnitude larger than that
in Ref. [25].
For the rare D decays, due to the non-unitarity effects in the model, BR(D → Xuℓ+ℓ−),
BR(D → Xuνν̄) and BR(D → µ+µ−) could be enhanced to be O(10−9), O(10−8) and
O(10−9), respectively, which could marginally reach the sensitivity proposed by BESIII [27].
Acknowledgments
This work is supported in part by the National Science Council of R.O.C. under Grant
#s: NSC-95-2112-M-006-013-MY2, NSC-95-2112-M-007-001, NSC-95-2112-M-007-059-MY3
and NSC-96-2918-I-007-010.
[1] A. Abulencia et al. [CDF Collaboration], Phys. Rev. Lett. 97, 242003 (2006) [arXiv:hep-
ex/0609040].
[2] S. L. Glashow, J. Iliopoulos and L. Maiani, Phys. Rev. D2, 1285 (1970).
[3] J. S. Hagelin, Nucl. Phys. B193, 123 (1981).
[4] S. R. Choudhury et al., Phys. Lett. B601, 164 (2004); J. Hubisz, S. J. Lee and G. Paz, JHEP
06, 041 (2006).
[5] M. Blanke et al., arXiv:hep-ph/0702136; M. Blanke et al., JHEP 01, 066 (2007); A. J. Buras
et al., JHEP 11, 062 (2006); M. Blanke, JHEP 12, 003 (2006).
[6] C. H. Chen, Phys. Lett. B521, 315 (2001); J. Phys. G28, L33 (2002); C. H. Chen and C.
Q. Geng, Phys. Rev. D66, 014007 (2002); Phys. Rev. D66, 094018 (2002); Phys. Rev. D71,
054012 (2005); Phys. Rev. D71, 115004 (2005); C. H. Chen and H. Hatanaka, Phys. Rev.
D73, 075003 (2006).
[7] K. Niyogi and A. Datta, Phys. Rev. D20, 2441(1979); A. Datta and D. Kumbhakhar, Z. Phys.
C 27, 515 (1985).
[8] G. Burdman, E. Golowich, J. Hewett and S. Pakvasa, Phys. Rev. D 52, 6383 (1995).
[9] H. Georgi, Phys. Lett. B297, 353 (1992); T. Ohl, G. Ricciardi and E. Simmons, Nucl. Phys.
B403, 605 (1993); I. Bigi and N. Uraltsev, Nucl. Phys. B592, 92 (2001).
[10] A. A. Petrov, Int. J. Mod. Phys. A21, 5686 (2006).
[11] J. Donoghue et al., Phys. Rev. D33, 179 (1986); L. Wolfenstein, Phys. Lett. B164, 170 (1985);
P. Colangelo, G. Nardulli and N. Paver, Phys. Lett. B242, 71 (1990); T. A. Kaeding, Phys.
Lett. B357, 151 (1995); A. A. Anselm and Y. I. Azimov, Phys. Lett. B85, 72 (1979).
[12] A. F. Falk et al., Phys. Rev. D65, 054034 (2002); A. F. Falk, et al., Phys. Rev. D69, 114021
(2004).
[13] B. Aubert et al. (Babar Collaboration), Phys. Rev. Lett. 98, 211802 (2007) [arXiv:hep-
ex/0703020].
[14] K. Abe et al. (Belle Collaboration), Phys. Rev. Lett. 98, 211803 (2007) [arXiv:hep-
ex/0703036].
[15] L. M. Zhang et al. (Belle Collaboration), arXiv:0704.1000 [hep-ex].
[16] Particle Data Group, W.M. Yao et al., J. Phys. G 33, 1 (2006).
[17] Y. Nir, arXiv:hep-ph/0703235.
[18] M. Ciuchini et al., arXiv:hep-ph/0703204.
[19] E. Golowich, S. Pakvasa and A. A. Petrov, arXiv:hep-ph/0610039.
[20] P. Ball, arXiv:hep-ph/0703245; arXiv:0704.0786; X. G. He and G. Valencia, arXiv:hep-
ph/0703270.
[21] M. Blanke, A. J. Buras, S. Recksiegel, C. Tarantino and S. Uhlig, arXiv:hep-ph/0703254.
[22] C. H. Chang, D. Chang and W. Y. Keung, Phys. Rev. D61, 053007 (2000).
[23] N. Arkani-Hamed, A. G. Cohen and H. Georgi, Phys. Lett. B513, 232 (2001) N. Arkani-
Hamed, A. G. Cohen, E. Katz and A. E. Nelson, JHEP 07, 034 (2002) [arXiv:hep-ph/0206021].
[24] T. Han et al., Phys. Rev. D67, 095004 (2003).
[25] J. Lee, JHEP 0412, 065 (2004).
[26] S. Fajfer and S. Prelovsek, Phys. Rev. D73, 054026 (2006).
[27] H. B. Li, hep-ex/0605004.
[28] M. Perelstein, M. E. Peskin and A. Pierce, Phys. Rev. D 69, 075002 (2004); see also M. Perel-
stein, Prog. Part. Nucl. Phys. 58, 247 (2007).
[29] M. Schmaltz and D. Tucker-Smith, Ann. Rev. Nucl. Part. Sci. 55, 229 (2005).
[30] H. Fritzsch, Phys. Lett. B73, 317 (1978); Phys. Lett. B166, 423 (1986).
[31] M. Artuso et al. [CLEO Collaboration], Phys. Rev. Lett. 95, 251801 (2005).
[32] H. W. Lin, S. Ohta, A. Soni and N. Yamada, Phys. Rev. D74, 114506 (2006).
[33] C. Greub, T. Hurth, M. Misiak and D. Wyler, Phys. Lett. B382, 415 (1996); S. Prelovsek and
D. Wyler, Phys. Lett. B500, 304 (2001).
[34] E. Golowich and A. A. Petrov, Phys. Lett. B625, 53 (2005).
[35] G. Burman, E. Golowich, J. Hewett and S. Pakvasa, Phys. Rev. D66, 014009 (2002).
[36] S. Fajfer, P. Singer and J. Zupan, Eur. Phys. J. C27, 201 (2003).
ABSTRACT
  We study the $D-\bar D$ mixing and rare D decays in the Littlest Higgs model.
As the new weak singlet quark with the electric charge of 2/3 is introduced to
cancel the quadratic divergence induced by the top-quark, the standard unitary
$3\times 3$ Cabibbo-Kobayashi-Maskawa matrix is extended to a non-unitary
$4\times 3$ matrix in the quark charged currents and Z-mediated flavor changing
neutral currents are generated at tree level. In this model, we show that the
$D-\bar D$ mixing parameter can be as large as the current experimental value
and the decay branching ratio (BR) of $D\to X_u \ga$ is small but its direct CP
asymmetry could be $O(10%)$. In addition, we find that the BRs of $D\to X_u
\ell^{+} \ell^{-}$, $D\to X_u\nu \bar \nu$ and $D\to \mu^{+} \mu^{-}$ could be
enhanced to be $O(10^{-9})$, $O(10^{-8})$ and $O(10^{-9})$, respectively.

<|endoftext|><|startoftext|>
CERN–PH–TH/2007–068
SU–4252–8489
Alternative Large N
Schemes and Chiral Dynamics
Francesco Sannino∗
CERN Theory Division, CH-1211 Geneva 23, Switzerland. and
University of Southern Denmark, Campusvej 55, DK-5230, Odense M., Denmark.
Joseph Schechter†
Department of Physics, Syracuse University, Syracuse, NY 13244-1130, USA.
(Dated: March 2007)
We compare the dependences on the number of colors of the leading ππ scattering amplitudes
using the single index quark field and two index quark fields. These are seen to have different
relationships to the scattering amplitudes suggested by chiral dynamics which can explain the long
puzzling pion pion s wave scattering up to about 1 GeV. This may be interesting for getting a better
understanding of the large Nc approach as well as for application to recently proposed technicolor
models.
BACKGROUND
Gaining control of QCD in its strongly interacting (low
energy) regime constitutes a real challenge. One very at-
tractive approach is based on studying the theory in the
large number of colors (Nc) limit [1, 2]. At the same
time one may obtain more information by requiring the
theory to model the (almost) spontaneous breakdown of
chiral symmetry [3, 4]. A standard test case is pion pion
scattering in the energy range up to about 1 GeV. Some
time ago, an attempt was made [5, 6] to implement this
combined scenario. Since the leading large Nc ampli-
tude contains only tree diagrams involving mesons of the
standard quark-antiquark type, it is expected that the
required amplitude should be gotten by calculating just
the chiral tree diagrams for rho meson exchange together
with the four point pion contact diagram. There are
no unknown parameters in this calculation. The crucial
question is whether the scattering amplitude calculated
in this way will satisfy unitarity. When one compares
the result with experimental data up to about 1 GeV on
the real part of the (most sensitive to unitarity violation)
J=I=0 partial wave, one finds (see Fig.1 of [6]) that the
result violates the partial wave unitarity bound by just
a “little bit”. On the other hand, the pion contact term
by itself violates unitarity much more drastically so one
might argue that the large Nc approach, which suggests
that the tree diagrams of all quark anti-quark resonances
in the relevant energy range be included, is helping a lot.
To make matters more quantitative one might ask the
question: by how much should Nc be increased in or-
der for the amplitude in question to remain within the
unitarity bounds for energies below 1 GeV?
This question was answered in a very simple way in
[7], as we now briefly review. In terms of the con-
ventional amplitude, A(s, t, u) the I = 0 amplitude is
3A(s, t, u) + A(t, s, u) + A(u, t, s). One gets the J = 0
channel by projecting out the correct partial wave. The
current algebra (pion contact diagram) contribution to
the conventional amplitude is
Aca(s, t, u) = 2
s−m2π
, (1)
where the pion decay constant, Fπ depends on Nc as
Fπ(Nc) = 131
3 so that Fπ(3) = 131 MeV. Fur-
thermore mπ = 137 MeV is independent of Nc. The
desired amplitude is obtained by adding to the current
algebra term the following vector meson ρ(770) contribu-
tion:
Aρ(s, t, u) =
g2ρππ
(4m2π − 3s)
g2ρππ
(m2ρ − t)− imρΓρθ(t− 4m2π)
(m2ρ − u)− imρΓρθ(u − 4m2π)
, (2)
where gρππ(Nc) = 8.56
Nc is the ρππ coupling con-
stant. Also, mρ = 771 MeV is independent of Nc and
Γρ(Nc) =
g2ρππ (Nc)
12πm2ρ
. (3)
It should be noted that the first term in Eq. (2), which
is an additional non-resonant contact interaction other
than the current algebra contribution, is required when
we include the ρ vector meson contribution in a chiral
invariant manner. In Fig. 1 we show the real part of the
I = J = 0 amplitude (denoted R0
) due to current alge-
bra plus the ρ contribution for increasing values of Nc.
Since in this channel the vector meson is never on shell
we suppress the contribution of the width in the vector
http://arxiv.org/abs/0704.0602v1
meson propagator in Eq. (2). One observes that the uni-
tarity bound (i.e., |R0
| ≤ 1/2) is satisfied for Nc ≥ 6 till
well beyond the 1 GeV region. However unitarity is still a
problem for 3, 4 and 5 colors. At energy scales larger than
the one associated with the vector meson clearly other
resonances are needed [5] but we shall not be concerned
with that energy range here. It is also interesting to note
that these considerations are essentially unchanged when
the pion mass (i.e. explicit chiral symmetry breaking in
the Lagrangian) is set to zero.
FIG. 1: Real part of the I = J = 0 partial wave amplitude
due to the current algebra +ρ terms, plotted for the follow-
ing increasing values of Nc (from up to down), 3, 4, 5, 6, 7.
The curve with largest violation of the unitarity bound corre-
sponds to Nc = 3 while the ones within the unitarity bound
are for Nc = 6, 7.
Note that essentially we are just using the scaling,
A(s, t, u) =
Ã(s, t, u) . (4)
where Ã(s, t, u) is defined replacing Fπ and gρππ with the
Nc independent quantities F̃π = Fπ
Nc and g̃ρππ =
3. Other authors [8] have found the same
minimum value, Nc=6 for the practical consistency of
the large Nc approximation, by using different methods.
In order to explain low energy ππ scattering for the
physical value Nc = 3 one must go beyond the large Nc
approximation. It is attractive to keep the assumption
of tree diagram dominance involving near by resonances,
however. One easily sees that adding a scalar singlet res-
onance (sigma) at the location where the unitarity bound
on R0
(s) is first violated should restore unitarity. This
is because the real part of a Breit Wigner resonance is
zero at the pole location and negative just above it, so
will bring R0
(s) below the bound, as required. In [7], the
resonance mass was found to be around 550 MeV on this
basis. Such a low value would make it different from a
p-wave quark-antiquark state, which is expected to be in
the 1000-1400 MeV range. We assume then that it is a
four quark state (glueball states are expected to be in the
1.5 GeV range from lattice investigations). Four quark
states of diquark-anti diquark type [9] and meson-meson
type [10] have been discussed in the literature for many
years. Accepting this picture, however, poses a problem
for the accuracy of the large Nc inspired description of
the scattering since four quark states are predicted not
to exist in the large Nc limit of QCD. We shall take the
point of view that a four quark type state is present since
it allows a natural fit to the low energy data. Of course,
it is still necessary to fine tune the parameters and shape
of the sigma resonance to fit the experimental ππ scat-
tering data in detail. In practice, since the parameters
of the pion contact and rho exchange contributions are
fixed, the sigma is the most important one for fitting and
fits may even be achieved [11] if the vector meson piece
is neglected. However the well established, presumably
four quark type, f0(980) resonance must be included to
achieve a fit in the region just around 1 GeV.
There is by now a fairly large recent literature [12]-
[44] on the effect of light “exotic” scalars in low energy
meson meson scattering. There seems to be a consene-
sus, arrived at using rather different approaches (keeping
however, unitarity), that the sigma exists.
TWO INDEX QUARK FIELDS
Now, consider redefining the Nc = 3 quark field with
color index A (and flavor index not written) as
ǫABCq
BC , qBC = −qCB, (5)
so that, for example, q1 = q
23 and similarly for the ad-
joint field, q̄1 = q̄23 etc. This is just a trivial change
of variables and will not change anything for QCD. How-
ever, if a continuation of the theory is made to Nc > 3 the
resulting theory will be different since the two index anti-
symmetric quark representation has Nc(Nc− 1)/2 rather
than Nc color components. As was pointed out by Cor-
rigan and Ramond [45], who were mainly interested in
the problem of the baryons at large Nc, this shows that
the extrapolation of QCD to higher Nc is not unique.
Further investigation of the properties of the alternative
extrapolation model introduced in [45] was carried out
by Kiritsis and Papavassiliou [46]. Here, we shall discuss
the consequences for the low energy ππ scattering dis-
cussed above, of this alternative large Nc extrapolation,
assuming for our purpose, that all the quarks extrapolate
as antisymmetric two index objects.
It may be worthwhile to remark that gauge theo-
ries with two index quarks have recently gotten a great
deal of attention. Armoni, Shifman and Veneziano
[47, 48, 49, 50, 51] have proposed an interesting relation
between certain sectors of the two index antisymmetric
(and symmetric) theories at large number of colors and
sectors of super Yang-Mills (SYM). Using a supersym-
metry inspired effective Lagrangian approach 1/Nc cor-
rections were investigated in [52]. Information on the su-
per Yang-Mills spectrum has been obtained in [53]. On
the validity of the large Nc equivalence between differ-
ent theories and interesting new possible phases we refer
the reader to [54, 55, 56]. The finite temperature phase
transition and its relation with chiral symmetry has been
investigated in [57] while the effects of a nonzero baryon
chemical potential were studied in [58].
When adding flavors the phase diagram as a function
of the number of flavors and colors has been provided
in [59]. The complete phase diagram for fermions in ar-
bitrary representations has been unveiled in [60]. The
study of theories with fermions in a higher dimensional
representation of the gauge group and the knowledge of
the associated conformal window led to the construction
of minimal models of technicolor [59, 61, 62] which are
not ruled out by current precision measurements and lead
to interesting dark matter candidates [63, 64, 65] as well
as to a very high degree of unification of the standard
model gauge couplings [66].
Besides these two limits a third one for massless one-
flavor QCD, which is in between the ’t Hooft and Cor-
rigan Ramond ones, has been very recently proposed in
[67]. Here one first splits the QCD Dirac fermion into the
two elementary Weyl fermions and afterwards assigns one
of them to transform according to a rank-two antisym-
metric tensor while the other remains in the fundamental
representation of the gauge group. For three colors one
reproduces one-flavor QCD and for a generic number of
colors the theory is chiral. The generic Nc is a particular
case of the generalized Georgi-Glashow model [68].
To illustrate the large Nc counting for the ππ scatter-
ing amplitude when quarks are designated to transform
according to the two index antisymmetric representation
of color SU(3) one may employ [1] the mnemonic where
each tensor index of this group is represented by a di-
rected line. Then the quark-quark gluon interaction is
pictured as in Fig. 2. The two index quark is pictured
FIG. 2: Two index fermion - gluon vertex.
as two lines with arrows pointing in the same direction,
as opposed to the gluon which has two lines with arrows
pointing in opposite directions. The coupling constant
representing this vertex is taken to be gt/
Nc, where gt
(the ’t Hooft coupling constant) is to be held constant.
A “one point function”, like the pion decay constant,
Fπ would have as it’s simplest diagram, Fig. 3.
The X represents a pion insertion and is associated
FIG. 3: Diagram for Fπ for the two index quark.
with a normalization factor for the color part of the pion’s
wavefunction,
Nc(Nc − 1)
, (6)
which scales for large Nc as 1/Nc. The two circles each
carry a quark index so their joint factor scales as N2c for
large Nc; more precisely, taking the antisymmetry into
account, the factor is
Nc(Nc − 1)
. (7)
The product of Eqs. (6) and (7) yields the Nc scaling for
F 2π (Nc) =
Nc(Nc − 1)
F 2π (3). (8)
For largeNc, Fπ scales proportionately to Nc rather than
Nc as in the case of the ’t Hooft extrapolation.
Using this scaling together with Eq.(1) suggests that
the ππ scatttering amplitude, A scales as,
A(Nc) =
Nc(Nc − 1)
A(3), (9)
which, for large Nc scales as 1/N
c rather than as 1/Nc
in the ’t Hooft extrapolation. This scaling law for large
Nc may be verified from the mnemonic in Fig. 4, where
there is an N2c factor from the two loops multiplied by
four factors of 1/Nc from the X’s.
FIG. 4: Diagram for the scattering amplitude, A with the 2
index quark.
It is interesting to find the minimum value of Nc for
which the tree amplitude due to the pion and rho meson
terms (given in Eqs.(1) and (2) above) is unitary in this
two antisymmetric index quark extrapolation scheme.
Fig. 1 shows that the the peak value of the partial wave
amplitude, R0
due to these two terms is numerically
about 0.9. This is to be identified with Aca(3) + Aρ(3)
in Eq.(9). Thus the condition that the extrapolated am-
plitude be unitary is,
Nc(Nc − 1)
< 1/2. (10)
Clearly, the extrapolated amplitude is unitary already for
Nc = 4, which indicates better convergence in Nc than
for the ’t Hooft case which became unitary at Nc = 6.
There is still another different feature; consider the
typical ππ scattering diagram with an extra internal (two
index) quark loop, as shown in Fig. 5.
FIG. 5: Diagram for the scattering amplitude, A including an
internal 2 index quark loop.
In this diagram there are four X’s (factor from Eq.(6)),
five index loops (factor from Eq.(7)) and six gauge cou-
pling constants. These combine to give a large Nc scaling
behavior proportional to 1/N2c for the ππ scattering am-
plitude. We see that diagrams with an extra internal
2 index quark loop are not suppressed compared to the
leading diagrams. This is analogous, as pointed out in
[46], to the behavior of diagrams with an extra gluon loop
in the ’t Hooft extrapolation scheme. Now, Fig. 5 is a
diagram which can describe a sigma particle exchange.
Thus in the 2 index quark scheme, “exotic” four quark
resonances can appear at the leading order in addition
to the usual two quark resonances. Given the discussion
we reviewed above, the possibility of a sigma appearing
at leading order means that one can construct a unitary
ππ amplitude already at Nc = 3 in the 2 antisymmetric
index scheme. From the point of view of low energy ππ
scattering, it seems to be unavoidable to say that the 2
index scheme is more realistic than the ’t Hooft scheme
given the existence of a four quark type sigma.
Of course, the usual ’t Hooft extrapolation has a num-
ber of other things to recommend it. These include the
fact that nearly all meson resonances seem to be of the
quark- antiquark type, the OZI rule predicted holds to
a good approximation and baryons emerge in an elegant
way as solitons in the model.
A fair statement would seem to be that each extrap-
olation emphasizes different aspects of the true Nc = 3
QCD. In particular, the usual scheme is not really a re-
placement for the true theory. That appears to be the
meaning of the fact that the continuation to Nc > 3 is
not unique.
QUARKS IN TWO INDEX SYMMETRIC COLOR
REPRESENTATION
Clearly the assignment of quarks to the two index sym-
metric representation of color SU(3) looks very similar.
We may denote the quark fields as,
AB = q
BA , (11)
in contrast to Eq.(5). There will be Nc(Nc + 1)/2 differ-
ent color states for the two index symmetric quarks. This
means that there is no value of Nc for which the sym-
metric theory can be made to correspond to true QCD.
For Nc = 3 there are 6 color states of the quarks and 8
color states of the gluon. If we choose Nc = 2, there are
3 color states of the quarks but unfortunately only three
color states of the gluon. On the other hand, for large Nc
it would seem reasonable to make approximations like,
Asym(Nc) ≈ Aasym(Nc), (12)
for the ππ scattering amplitude.
As far as the large Nc counting goes, the mnemonics
in Figs. 2-5 are still applicable to the case of quarks in
the two index symmetric color representation. For not so
large Nc, the scaling factor for the pion insertion would
Nc(Nc + 1)
, (13)
and the pion decay constant would scale as
F symπ (Nc) ∝
Nc(Nc + 1)
. (14)
With the identification AQCD = Aasym(3), the use of
Eq.(12) enables us to estimate the large Nc scattering
amplitude as,
Asym(Nc) ≈
AQCD. (15)
In applications to recently proposed minimal walking
technicolor theories this formula is useful for making es-
timates involving weak gauge bosons via the Goldstone
boson equivalence theorem [69].
Finally we remark on the large Nc scaling rules for
meson and glueball masses and decays in either the two
index antisymmetric or two index symmetric schemes.
Both meson and glueball masses scale as (Nc)
0. Further-
more, all six reactions of the type
a → b+ c, (16)
where a,b and c can stand for either a meson or a glueball,
scale as 1/Nc. This is illustrated in Fig.6 for the case of a
meson decaying into two glueballs; note that the glueball
insertion scales as 1/Nc and that two interaction vertices
are involved.
FIG. 6: Diagram for meson decay into two glueballs.
SUMMARY
We have investigated the dependences on the number
of colors of the leading ππ scattering amplitudes using
the single and the two index quark fields.
We have seen that in the 2 index quark extension of
QCD, exotic four quark resonances can appear at the
leading order in addition to the usual two quark reso-
nances. From the point of view of low energy ππ scatter-
ing the 2 index scheme is more realistic than the ’t Hooft
one given the existence of a four quark type sigma. This
allows one to explain the long puzzling pion pion s wave
scattering up to about 1 GeV.
Of course, the usual ’t Hooft extrapolation has a num-
ber of other important predictions to recommend it. A
fair statement is that each largeNc extrapolation of QCD
captures different aspects of the physical Nc = 3 case.
We have also related the QCD scattering amplitude
at large Nc with the one featuring two index symmetric
quarks (Similar connections exist for adjoint fermions).
The results are interesting for getting a better under-
standing of the large Nc approach as well as for applica-
tion to recently proposed technicolor models.
Acknowledgments
It is a pleasure to thank A. Abdel Rehim, D. Black,
D.D. Dietrich, A. H. Fariborz, M.T. Frandsen, M.
Harada, S. Moussa, S. Nasri and K. Tuominen for helpful
discussions. The work of F.S. is supported by the Marie
Curie Excellence Grant under contract MEXT-CT-2004-
013510 as well as the Danish Research Agency. The work
of J.S. is supported in part by the U. S. DOE under Con-
tract no. DE-FG-02-85ER 40231.
∗ Electronic address: francesco.sannino@nbi.dk
† Electronic address: schechte@physics.syr.edu
[1] G.’t Hooft, Nucl. Phy. B72, 461 (1974).
[2] E. Witten, Nucl. Phy. B160, 57 (1979).
[3] Y. Nambu and G. Jona-Lasinio, Phys. Rev. 122, 345
(1961).
[4] M. Gell-Mann and M. Levy, Nuovo Cimento 16, 705
(1960).
[5] F. Sannino and J. Schechter, Phys. Rev. D 52, 96 (1995).
[6] M. Harada, F. Sannino and J. Schechter, Phys. Rev. D
54, 1991 (1996).
[7] M. Harada, F. Sannino and J. Schechter, Phys. Rev. D
69, 034005 (2004) [arXiv:hep-ph/0309206].
[8] J. R. Pelaez, Phys. Rev. Lett. 92, 102001 (2004);
97, 242002 (2006). M. Uehara, arXiv:hep-ph/0308241,
/0401037, /0404221.
[9] R. L. Jaffe, Phys. Rev. D 15, 267 (1977); Phys. Rev. D
15, 281 (1977).
[10] J. D. Weinstein and N. Isgur, Phys. Rev. Lett. 48, 659
(1982).
[11] M. Harada, F. Sannino and J. Schechter, Phys. Rev. Lett.
78, 1603 (1997).
[12] See the dedicated conference proceedings, S. Ishida et al
“Possible existence of the sigma meson and its implica-
tion to hadron physics”, KEK Proceedings 2000-4, So-
ryyushiron Kenkyu 102, No. 5, 2001. Additional points
of view are expressed in the proceedings, D. Amelin
and A.M. Zaitsev “Hadron Spectroscopy”, Ninth Inter-
national Conference on Hadron Spectroscopy, Protvino,
Russia(2001).
[13] E. Van Beveren, T. A. Rijken, K. Metzger, C. Dulle-
mond, G. Rupp and J. E. Ribeiro, Z. Phys. C 30, 615
(1986); E. van Beveren and G. Rupp, Eur. Phys. J. C
10, 469 (1999). See also J. J. De Swart, P. M. Maessen
and T. A. Rijken, Talk given at U.S. / Japan Seminar:
The Hyperon - Nucleon Interaction, Maui, HI, 25-28 Oct
1993 [arXiv:nucl-th/9405008].
[14] D. Morgan and M. R. Pennington, Phys. Rev. D 48, 1185
(1993).
[15] A. A. Bolokhov, A. N. Manashov, V. V. Vereshagin and
V. V. Polyakov, Phys. Rev. D 48, 3090 (1993). See also
V. A. Andrianov and A. N. Manashov, Mod. Phys. Lett.
A 8, 2199 (1993). Extension of this string-like approach
to the πK case has been made in V. V. Vereshagin, Phys.
Rev. D 55, 5349 (1997) and in A. V. Vereshagin and
V. V. Vereshagin, Phys. Rev. D 59, 016002 (1999).
[16] N. N. Achasov and G. N. Shestakov, Phys. Rev. D 49,
5779 (1994).
[17] R. Kamı́nski, L. Leśniak and J. P. Maillet, Phys. Rev. D
50, 3145 (1994).
[18] N. A. Tornqvist, Z. Phys. C 68, 647 (1995) and references
therein. In addition see N. A. Törnqvist and M. Roos,
Phys. Rev. Lett. 76, 1575 (1996); N. A. Tornqvist, Talk
given at 7th International Conference on Hadron Spec-
troscopy (Hadron 97), Upton, NY, 25-30 Aug 1997 and
at EuroDaphne Meeting, Barcelona, Spain, 6-9 Nov 1997
[arXiv:hep-ph/9711483]; Phys. Lett. B 426, 105 (1998).
[19] R. Delbourgo and M. D. Scadron, Mod. Phys. Lett. A
10, 251 (1995); See also D. Atkinson, M. Harada and
A. I. Sanda, Phys. Rev. D 46, 3884 (1992).
[20] G. Janssen, B. C. Pearce, K. Holinde and J. Speth, Phys.
mailto:francesco.sannino@nbi.dk
mailto:schechte@physics.syr.edu
http://arxiv.org/abs/hep-ph/0309206
http://arxiv.org/abs/hep-ph/0308241
http://arxiv.org/abs/nucl-th/9405008
http://arxiv.org/abs/hep-ph/9711483
Rev. D 52, 2690 (1995).
[21] M. Svec, Phys. Rev. D 53, 2343 (1996).
[22] S. Ishida, M. Ishida, H. Takahashi, T. Ishida, K. Taka-
matsu and T. Tsuru, Prog. Theor. Phys. 95, 745 (1996);
S. Ishida, M. Ishida, T. Ishida, K. Takamatsu and
T. Tsuru, Prog. Theor. Phys. 98, 621 (1997). See also
M. Ishida and S. Ishida, Talk given at 7th International
Conference on Hadron Spectroscopy (Hadron 97), Upton,
NY, 25-30 Aug 1997 [arXiv:hep-ph/9712231].
[23] D. Black, A. H. Fariborz, F. Sannino and J. Schechter,
Phys. Rev. D 58, 054012 (1998).
[24] D. Black, A. H. Fariborz, F. Sannino and J. Schechter,
Phys. Rev. D 59, 074026 (1999).
[25] L. Maiani, A. Polosa, F. Piccinni and V. Riquier, Phys.
Rev. Lett. 93, 212002 (2004). Here the characteristic
form for a four quark scalar coupling to two pions was
obtained as in [24] above but with the difference that
non-derivative coupling rather than derivative coupling
was used. The derivative coupling appeared in [24] since
the context was that of a non-linear chiral Lagrangian.
[26] J. A. Oller, E. Oset and J. R. Pelaez, Phys. Rev. Lett.
80, 3452 (1998). See also K. Igi and K. I. Hikasa, Phys.
Rev. D 59, 034005 (1999).
[27] A. V. Anisovich and A. V. Sarantsev, Phys. Lett. B 413,
137 (1997).
[28] V. Elias, A. H. Fariborz, F. Shi and T. G. Steele, Nucl.
Phys. A 633, 279 (1998).
[29] V. Dmitrasinovic, Phys. Rev. C 53, 1383 (1996).
[30] P. Minkowski andW. Ochs, Eur. Phys. J. C 9, 283 (1999).
[31] S. Godfrey and J. Napolitano, Rev. Mod. Phys. 71, 1411
(1999).
[32] L. Burakovsky and T. Goldman, Phys. Rev. D 57, 2879
(1998).
[33] A. H. Fariborz and J. Schechter, Phys. Rev D60, 034002
(1999).
[34] T. Hatsuda, T. Kunihiro and H. Shimizu, Phys. Rev.
Lett. 82, 2840 (1999); S. Chiku and T. Hatsuda, Phys.
Rev. D 58, 076001 (1998).
[38] D. Black, A. H. Fariborz and J. Schechter, Phys. Rev. D
61, 074030 (2000). See also V. Bernard, N. Kaiser and
U. G. Meissner, Phys. Rev. D 44, 3698 (1991).
[39] D. Black, A. H. Fariborz and J. Schechter, Phys. Rev. D
61, 074001 (2000).
[37] L. S. Celenza, S. f. Gao, B. Huang, H. Wang and
C. M. Shakin, Phys. Rev. C 61, 035201 (2000).
[38] D. Black, A. H. Fariborz and J. Schechter, Phys. Rev. D
61, 074030 (2000). See also V. Bernard, N. Kaiser and
U. G. Meissner, Phys. Rev. D 44, 3698 (1991).
[39] D. Black, A. H. Fariborz and J. Schechter, Phys. Rev. D
61, 074001 (2000).
[40] D. Black, A. H. Fariborz, S. Moussa, S. Nasri and
J. Schechter, Phys. Rev. D 64, 014031 (2001).
[41] In addition to [39] and [40] above see T. Teshima, I. Ki-
tamura and N. Morisita, J. Phys. G 28, 1391 (2002);
F. Close and N. Tornqvist, ibid. 28, R249 (2002); A.H.
Fariborz, Int. J. Mod. Phys. A 19, 2095 (2004); 5417
(2004); Phys. Rev. D 74, 054030 (2006); F. Giacosa, Th.
Gutsche, V.E. Lyubovitskij and A. Faessler, Phys. Lett.
B 622, 277 (2005); J. Vijande, A. Valcarce, F. Fernandez
and B. Silvestre-Brac, Phys. Rev. D 72, 034025 (2005); S.
Narison, Phys. Rev. D 73, 114024 (2006); L. Maiani, F.
Piccinini, A.D. Polosa and V. Riquer, hep-ph/0604018.
[42] Y. S. Kalashnikova, A. E. Kudryavtsev, A. V. Nefediev,
C. Hanhart and J. Haidenbauer, the Eur. Phys. J. A 24
(2005) 437 [hep-ph/0412340].
[43] The Roy equation for the pion amplitude, S.M. Roy,
Phys. Lett. B 36, 353 (1971), has been used by several au-
thors to obtain information about the f0(600) resonance.
See T. Sawada, page 67 of ref. [12] above, I. Caprini,
G. Colangelo and H. Leutwyler, Phys. Rev. Lett. 96,
132001 (2006). A similar approach has been employed
to study the putative light kappa by S.Descotes-Genon
and B. Moussallam, Eur. Phys. J. C 48, 553 (2006).
[44] Further discussion of the approach in ref. [43] above
is given in D.V. Bugg, J. Phys. G 34, 151 (2007)
[hep-ph/0608081].
[45] E. Corrigan and P. Ramond, Phys. Lett. B 87, 73 (1979).
[46] E. B. Kiritsis and J. Papavassiliou, Phys. Rev. D 42, 4238
(1990).
[47] A. Armoni, M. Shifman and G. Veneziano, Nucl. Phys.
B 667, 170 (2003) [arXiv:hep-th/0302163].
[48] A. Armoni, M. Shifman and G. Veneziano, Phys. Rev.
Lett. 91, 191601 (2003) [arXiv:hep-th/0307097].
[49] A. Armoni, M. Shifman and G. Veneziano, Phys. Rev. D
71, 045015 (2005) [arXiv:hep-th/0412203].
[50] A. Armoni, G. Shore and G. Veneziano, Nucl. Phys. B
740, 23 (2006) [arXiv:hep-ph/0511143].
[51] A. Armoni, M. Shifman and G. Veneziano, Phys. Lett. B
647, 515 (2007) [arXiv:hep-th/0701229].
[52] F. Sannino and M. Shifman, Phys. Rev. D 69, 125004
(2004) [arXiv:hep-th/0309252].
[53] A. Feo, P. Merlatti and F. Sannino, Phys. Rev. D
70, 096004 (2004) [arXiv:hep-th/0408214]. P. Merlatti
and F. Sannino, Phys. Rev. D 70, 065022 (2004)
[arXiv:hep-th/0404251].
[54] M. Unsal, arXiv:hep-th/0703025.
[55] M. Unsal and L. G. Yaffe, Phys. Rev. D 74, 105019 (2006)
[arXiv:hep-th/0608180].
[56] P. Kovtun, M. Unsal and L. G. Yaffe, Phys. Rev. D 72,
105006 (2005) [arXiv:hep-th/0505075].
[57] F. Sannino, Phys. Rev. D 72, 125006 (2005)
[arXiv:hep-th/0507251].
[58] M. T. Frandsen, C. Kouvaris and F. Sannino, Phys. Rev.
D 74, 117503 (2006) [arXiv:hep-ph/0512153].
[59] F. Sannino and K. Tuominen, Phys. Rev. D 71, 051901
(2005) [arXiv:hep-ph/0405209].
[60] D. D. Dietrich and F. Sannino, arXiv:hep-ph/0611341.
To Appear in Phys. Rev. D.
[61] D. K. Hong, S. D. H. Hsu and F. Sannino, Phys. Lett. B
597, 89 (2004) [arXiv:hep-ph/0406200].
[62] D. D. Dietrich, F. Sannino and K. Tuominen, Phys. Rev.
D 72, 055001 (2005) [arXiv:hep-ph/0505059]. Phys. Rev.
D 73, 037701 (2006) [arXiv:hep-ph/0510217].
[63] K. Kainulainen, K. Tuominen and J. Virkajarvi,
arXiv:hep-ph/0612247.
[64] S. B. Gudnason, C. Kouvaris and F. Sannino, Phys. Rev.
D 74, 095008 (2006) [arXiv:hep-ph/0608055].
[65] C. Kouvaris, arXiv:hep-ph/0703266.
[66] S. B. Gudnason, T. A. Ryttov and F. Sannino,
arXiv:hep-ph/0612230.
[67] T. A. Ryttov and F. Sannino, Phys. Rev. D 73, 016002
(2006) [arXiv:hep-th/0509130].
[68] H. Georgi, Nucl. Phys. B 266, 274 (1986).
[69] J.M. Cornwall, D.N. Levin and G. Tiktopoulos, Phys.
Rev. D 10, 1145 (1974); B.W. Lee, C. Quigg and H.B.
Thacker, Phys. Rev. D 16, 1519 (1977); M.S. Chanowitz
and M.K. Galliard, Nucl. Phys. B 261, 379 (1985).
http://arxiv.org/abs/hep-ph/9712231
http://arxiv.org/abs/hep-ph/0604018
http://arxiv.org/abs/hep-ph/0412340
http://arxiv.org/abs/hep-ph/0608081
http://arxiv.org/abs/hep-th/0302163
http://arxiv.org/abs/hep-th/0307097
http://arxiv.org/abs/hep-th/0412203
http://arxiv.org/abs/hep-ph/0511143
http://arxiv.org/abs/hep-th/0701229
http://arxiv.org/abs/hep-th/0309252
http://arxiv.org/abs/hep-th/0408214
http://arxiv.org/abs/hep-th/0404251
http://arxiv.org/abs/hep-th/0703025
http://arxiv.org/abs/hep-th/0608180
http://arxiv.org/abs/hep-th/0505075
http://arxiv.org/abs/hep-th/0507251
http://arxiv.org/abs/hep-ph/0512153
http://arxiv.org/abs/hep-ph/0405209
http://arxiv.org/abs/hep-ph/0611341
http://arxiv.org/abs/hep-ph/0406200
http://arxiv.org/abs/hep-ph/0505059
http://arxiv.org/abs/hep-ph/0510217
http://arxiv.org/abs/hep-ph/0612247
http://arxiv.org/abs/hep-ph/0608055
http://arxiv.org/abs/hep-ph/0703266
http://arxiv.org/abs/hep-ph/0612230
http://arxiv.org/abs/hep-th/0509130
ABSTRACT
  We compare the dependences on the number of colors of the leading pion pion
scattering amplitudes using the single index quark field and two index quark
fields. These are seen to have different relationships to the scattering
amplitudes suggested by chiral dynamics which can explain the long puzzling
pion pion s wave scattering up to about 1 GeV. This may be interesting for
getting a better understanding of the large Nc approach as well as for
application to recently proposed technicolor models.

<|endoftext|><|startoftext|>
Introduction
The correlative analysis proves to be an essential tool in disentangling of causal relations
in the solar atmosphere. Recently, Rutten & Krijger (2003) and Rutten et al. (2004) quanti-
fied the correlation of the reversed granulation observed in the low chromosphere and mid-
photosphere with surface granulation in quest for the nature of internetwork background
brightness patterns in these layers. In agreement with these studies Puschmann et al. (2003)
demonstrated that filtering out of the p-modes is inevitable for studying the convective struc-
tures in the solar photosphere because p-modes mostly reduce the correlation between var-
ious line parameters. Odert et al. (2005) showed that correlation coefficients can fluctuate
strongly in time with amplitudes of over 0.4 due to 5-min oscillations and the amplitudes
are larger for higher formed lines. In case of weak lines the situation worsens even more,
because correlations derived from them are influenced stronger by seeing.
In this paper, we address the dissimilarity between non-magnetic and magnetic region seen
in height variations of the correlation between temperature and line-of-sight velocity. We
compare our results with a similar study by Rodrı́guez Hidalgo et al. (1999). Our analysis
follows on the paper Koza et al. (2006, henceforth Paper I) and we invite the reader to have
the paper at hand for further references.
2 Observational data and inversion procedure
We use a time sequence of spectrograms obtained at the German Vacuum Tower Telescope at
the Observatorio del Teide on April 28, 2000. The inversion code SIR (Ruiz Cobo & del Toro Iniesta
1992) was employed in the analysis of this observation. Thorough descriptions of the obser-
vational data, inversion procedure, and spectral lines are given in Paper I.
http://arxiv.org/abs/0704.0603v1
140 J. Koza et al.: Temperature – velocity correlation in the solar photosphere
Figure 1. The height variation of the correlation between line-of-sight velocity and temperature for
the results of Rodrı́guez Hidalgo et al. (1999) (solid) and for the non-magnetic (dashed) and magnetic
(dotted) region defined in Paper I.
3 Results
Figure 1 shows the height variations of the correlation between temperature and line-of-sight
velocity for three different sets of data. The results of Rodrı́guez Hidalgo et al. (1999) in-
dicate the significant anticorrelation between granules and intergranular lanes reaching the
maximum of about −0.7 at log τ5 = −0.2. The subsequent weakening of this anticorrelation
over the log τ5 ∈ 〈−0.2,−1.0〉 range is followed by a rise of correlation up to 0.28 at the
middle photosphere at log τ5 = −1.75. No significant correlation exists in the upper photo-
sphere. In the lower layers of the non-magnetic region (Paper I) the anticorrelation decreases
to −0.63 at log τ5 = −0.4. However, in the middle photosphere temperature and line-of-sight
velocity are almost uncorrelated with a local peak value of 0.08 at log τ5 = −1.7. Higher up at
log τ5 = −2.9 the anticorrelation of about −0.42 is established again. In the sub-photospheric
layers of the magnetic region the anticorrelation of −0.6 was found at log τ5 = 0.5. An
approximately constant value of anticorrelation −0.2 was obtained in the low and middle
photosphere. In the upper photosphere the anticorrelation reaches again −0.6.
Figure 2 compares temperatures and line-of-sight velocities in the form of scatter corre-
lation plots. Each plotted sample represents temperature and line-of-sight velocity specified
along the x and y axes at a given pixel along the slit at a time within the interval of 15 min.
From the top down, the row panels show correlations of the results of Rodrı́guez Hidalgo et al.
(1999) and our results in the non-magnetic and magnetic region in three selected optical
depths log τ5 = −0.3,−1.3, and −1.8. Plot saturation is avoided by showing sample density
contours rather than individual points, except for the extreme outliers. The total distributions
of temperatures and line-of-sight velocities are shown at the top and the left side of each
panel, respectively. The first-moment curves are aligned at large correlation and become
perpendicular in the absence of any correlation (Rutten & Krijger 2003). The first column
in Fig. 2 shows good agreement of correlation coefficients and positions of maxima of ve-
J. Koza et al.: Temperature – velocity correlation in the solar photosphere 141
Figure 2. Height-dependent scatter correlations of the line-of-sight velocity versus temperature. Top
row: data from Rodrı́guez Hidalgo et al. (1999, see p. 315, Fig. 1). Middle and bottom row: our data for
the non-magnetic and magnetic region (Paper I), respectively. The optical depths log τ5 and correlation
coefficients cc are specified at each panel. Negative velocities indicate upflows. The rescaled total
distributions of temperatures and line-of-sight velocities are shown as solid curves at the top and the
left side of each panel, respectively. The dashed curves show the first moments of the sample density
distributions over temperature and velocity bins.
locity distributions in the non-magnetic region with the results of Rodrı́guez Hidalgo et al.
(1999). However, the temperature distributions are dissimilar both in terms of asymme-
try and also the positions of maxima. Our results indicate predominance of higher tem-
peratures in the sample in contrast with lower temperatures prevailing in the results of
Rodrı́guez Hidalgo et al. (1999). In the magnetic region, weak anticorrelation was found.
The temperature distribution in this region is almost symmetric with maximum at higher
temperatures than in the non-magnetic region. The second column of Fig. 2 corresponds to
the layers where granulation is almost erased. While the temperature distributions in the
non-magnetic region and in the results of Rodrı́guez Hidalgo et al. (1999) are symmetric, in
the magnetic region the asymmetry indicates the abundant higher temperatures. The posi-
tive correlation in the results of Rodrı́guez Hidalgo et al. (1999) shown in the third column
suggests reversed granulation. However, this is not seen in our results. In the magnetic re-
gion the asymmetries of temperature and velocity distributions indicate higher abundance of
relatively hotter pixels with faster upflows.
142 J. Koza et al.: Temperature – velocity correlation in the solar photosphere
4 Discussion
Figures 1 and 2 show dissimilarities both in height variations of correlation and distributions,
although we and Rodrı́guez Hidalgo et al. (1999) used VTT observations and the SIR code.
Because the maximum of anticorrelation found at sub-photospheric layers of the magnetic
region is out of the range of sensitivity of the spectral lines (Paper I), we disregard this fea-
ture. Very low anticorrelation found over log τ5 ∈ 〈0.0,−2.0〉 in the magnetic region (Fig. 1)
suggests that magnetic field is another important decorrelating factor along with 5-min os-
cillations and seeing (Puschmann et al. 2003; Odert et al. 2005). In our results, the middle
layers of the non-magnetic and magnetic region lack signatures of reversed granulations
(Fig. 1). The sinusoidal shape of the correlation coefficient in the non-magnetic region over
the log τ5 ∈ 〈−1.2,−3.5〉 range can be explained as a sum of positive correlation typical for
reversed granulation and negative anticorrelation characteristic for 5-min oscillations.
5 Summary
Using a time sequence of high-resolution spectrograms and the SIR inversion code we have
inferred height variation of correlation between the temperature and line-of-sight velocity
throughout the quiet solar photosphere and a small magnetic region. The most important as-
pect is comparison of the results with the akin study by Rodrı́guez Hidalgo et al. (1999). We
found in agreement with Rodrı́guez Hidalgo et al. (1999) that the maximum anticorrelation
−0.6 between the temperature and line-of-sight velocity in the non-magnetic region occurs at
log τ5 = −0.4. The absence of signatures of reversed granulation in the middle layers of the
non-magnetic region is likely to be due to 5-min oscillations, which negative anticorrelation
prevails over weaker positive correlation typical for reversed granulation. Our results show
that magnetic field is another decorrelating factor along with 5-min oscillations and seeing.
Acknowledgements. The VTT is operated by the Kiepenheuer-Institut für Sonnenphysik, Freiburg,
at the Observatorio del Teide of the Instituto de Astrofı́sica de Canarias. We are grateful to B. Ruiz
Cobo (IAC) for kindly providing of the original data used in Figs. 1 and 2. This research is part of
the European Solar Magnetism Network (EC/RTN contract HPRN-CT-2002-00313). This work was
supported by the Slovak grant agency VEGA (2/6195/26) and by the Deutsche Forschungsgemein-
schaft grant (DFG 436 SLK 113/7). J. Koza’s research is supported by a Marie Curie Intra-European
Fellowships within the 6th European Community Framework Programme.
References
Koza, J., Kučera, A., Rybák, J., & Wöhl, H. 2006, A&A, 458, 941, (Paper I)
Odert, P., Hanslmeier, A., Rybák, J., Kučera, A., & Wöhl, H. 2005, A&A, 444, 257
Puschmann, K., Vázquez, M., Bonet, J. A., Ruiz Cobo, B., & Hanslmeier, A. 2003, A&A, 408, 363
Rodrı́guez Hidalgo, I., Ruiz Cobo, B., Collados, M., & del Toro Iniesta, J. C. 1999, in ASP Conf.
Ser. 173: Stellar Structure: Theory and Test of Connective Energy Transport, ed. A. Giménez, E. F.
Guinan, & B. Montesinos, 313
Ruiz Cobo, B. & del Toro Iniesta, J. C. 1992, ApJ, 398, 375
Rutten, R. J., de Wijn, A. G., & Sütterlin, P. 2004, A&A, 416, 333
Rutten, R. J. & Krijger, J. M. 2003, A&A, 407, 735
	Introduction
	Observational data and inversion procedure
	Results
	Discussion
	Summary
ABSTRACT
  We derive correlation coefficients between temperature and line-of-sight
velocity as a function of optical depth throughout the solar photosphere for
the non-magnetic photosphere and a small area of enhanced magnetic activity.
The maximum anticorrelation of about -0.6 between temperature and line-of-sight
velocity in the non-magnetic photosphere occurs at log tau5 = -0.4. The
magnetic field is another decorrelating factor along with 5-min oscillations
and seeing.

<|endoftext|><|startoftext|>
Magnetism in the high-Tc analogue Cs2AgF4 studied with muon-spin relaxation
T. Lancaster,1, ∗ S.J. Blundell,1 P.J. Baker,1 W. Hayes,1 S.R. Giblin,2 S.E.
McLain,2 F.L. Pratt,2 Z. Salman,1, 2 E.A. Jacobs,3 J.F.C. Turner,3 and T. Barnes3
Clarendon Laboratory, Oxford University Department of Physics, Parks Road, Oxford, OX1 3PU, UK
ISIS Facility, Rutherford Appleton Laboratory, Chilton, Oxfordshire OX11 0QX, UK
Department of Chemistry and Neutron Sciences Consortium,
University of Tennessee, Knoxville, Tennessee 37996, USA
(Dated: November 4, 2018)
We present the results of a muon-spin relaxation study of the high-Tc analogue material Cs2AgF4.
We find unambiguous evidence for magnetic order, intrinsic to the material, below TC = 13.95(3) K.
The ratio of inter- to intraplane coupling is estimated to be |J ′/J | = 1.9 × 10−2, while fits of the
temperature dependence of the order parameter reveal a critical exponent β = 0.292(3), implying an
intermediate character between pure two- and three- dimensional magnetism in the critical regime.
Above TC we observe a signal characteristic of dipolar interactions due to linear F–µ
+–F bonds,
allowing the muon stopping sites in this compound to be characterized.
PACS numbers: 74.25.Ha, 74.72.-h, 75.40.Cx, 76.75.+i
Twenty years after its discovery, high-Tc superconduc-
tivity remains one of the most pressing problems in con-
densed matter physics. High-TC cuprates share a lay-
ered structure of [CuO2] planes with strong antiferro-
magnetic (AFM) interactions between S = 1/2 3d9 Cu2+
ions [1, 2]. However, analogous materials based upon
3d transition metal systems such as manganites [3] and
nickelates [4] share neither the magnetic nor the super-
conducting properties of the high-TC cuprates, leading to
speculation that the spin- 1
character of Cu2+ is unique
in this context. A natural extension to this line of in-
quiry is to explore compounds based on the 4d analogue
of Cu2+, namely S = 1
4d9 Ag2+ [5]; this motivated the
synthesis of the layered fluoride Cs2AgF4 which contains
silver in the unusual divalent oxidation state [6, 7]. This
material possesses several structural similarities with the
superconducting parent compound La2CuO4; it is com-
prised of planes of [AgF2] instead of [CuO2] separated by
FIG. 1: (Color online.) Structure of Cs2AgF4 showing a pos-
sible magnetic structure. Candidate muon sites occur in both
the [CsF] and [AgF2] planes.
planes of [CsF] instead of [LaO] (Fig. 1).
Magnetic measurements [7] suggest that in contrast
to the antiferromagnetism of La2CuO4, Cs2AgF4 is well
modelled as a two-dimensional (2D) Heisenberg ferro-
magnet (described by the Hamiltonian H = J
〈ij〉 Si ·
Sj) with intralayer coupling J/kB = 44.0 K. The ob-
servation of a magnetic transition below TC ≈ 15 K
with no spontaneous magnetization in zero applied field
(ZF) and a small saturation magnetization (∼ 40 mT),
suggests the existence of a weak, AFM interlayer cou-
pling. This behavior is reminiscent of the 2D ferromag-
net K2CuF4 [8], where ferromagnetic (FM) exchange re-
sults from orbital ordering driven by a Jahn-Teller distor-
tion [9, 10]. On this basis, it has been suggested that in
Cs2AgF4 a staggered ordering of dz2−x2 and dz2−y2 hole-
containing orbitals on the Ag2+ ions gives rise to the
FM superexchange [7]. An alternative scenario has also
been advanced on the basis of density functional calcula-
tions in which a d3z2−r2 − p− dx2−y2 orbital interaction
through the Ag–F–Ag bridges causes spin polarization of
the dx2−y2 band [11].
Although inelastic neutron scattering measurements
have been carried out on this material [7], both Cs and
Ag strongly absorb neutrons, resulting in limited resolu-
tion and a poor signal-to-noise ratio. In contrast, spin-
polarized muons, which are very sensitive probes of local
magnetic fields, suffer no such impediments and, as we
shall see, are ideally suited to investigations of the mag-
netism in fluoride materials.
In this paper we present the results of a ZF muon-
spin relaxation (µ+SR) investigation of Cs2AgF4. We
are able to confirm that the material is uniformly ordered
throughout its bulk below TC and show that the critical
behavior associated with the magnetic phase transition
is intermediate in character between 2D and 3D. In ad-
dition, strong coupling between the muon and F− ions
allows us to characterise the muon stopping states in this
http://arxiv.org/abs/0704.0604v1
compound.
ZF µ+SR measurements were made on the MuSR in-
strument at the ISIS facility, using an Oxford Instru-
ments Variox 4He cryostat. In a µ+SR experiment spin-
polarized positive muons are stopped in a target sample,
where the muon usually occupies an interstitial position
in the crystal. The observed property in the experiment
is the time evolution of the muon spin polarization, the
behavior of which depends on the local magnetic field B
at the muon site [12]. Polycrystalline Cs2AgF4 was syn-
thesised as previously reported [7]. Due to its chemical
reactivity, the sample was mounted under an Ar atmo-
sphere in a gold plated Ti sample holder with a cylindri-
cal sample space of diameter 2.5 cm and depth 2 mm.
A 25 µm thick window was screw-clamped onto a gold
o-ring on the main body of the sample holder resulting
in an airtight seal.
Example ZF µ+SR spectra measured on Cs2AgF4 are
shown in Fig. 2(a). Below TC (Fig. 2(c)) we observe
oscillations in the time dependence of the muon polar-
ization (the “asymmetry” A(t) [12]) which are character-
istic of a quasi-static local magnetic field at the muon
stopping site. This local field causes a coherent preces-
sion of the spins of those muons for which a component
of their spin polarization lies perpendicular to this local
field (expected to be 2/3 of the total spin polarization
for a powder sample). The frequency of the oscillations
is given by νi = γµ|Bi|/2π, where γµ is the muon gyro-
magnetic ratio (= 2π × 135.5 MHz T−1) and Bi is the
average magnitude of the local magnetic field at the ith
muon site. Any fluctuation in magnitude of these local
fields will result in a relaxation of the oscillating signal
[13], described by relaxation rates λi.
Maximum entropy analysis (inset, Fig. 2(c)) reveals
two separate frequencies in the spectra measured below
TC, corresponding to two magnetically inequivalent muon
stopping sites in the material. The precession frequen-
cies, which are proportional to the internal magnetic field
experienced by the muon, may be viewed as an effective
order parameter for these systems [12]. In order to ex-
tract the temperature dependence of the frequencies, the
low temperature data were fitted to the function
A(t) =
Ai exp(−λit) cos(2πνit) (1)
+A3 exp(−λ3t) +Abg,
where A1 and A2 are the amplitudes of the precession
signals and A3 accounts for the contribution from those
muons with a spin component parallel to the local mag-
netic field. The term Abg reflects the non-relaxing signal
from those muons which stop in the sample holder or
cryostat tail.
The ratio of the two precession frequencies was found
to be ν2/ν1 = 0.83 across the temperature range T < TC
and this ratio was fixed in the fitting procedure. The
FIG. 2: (Color online.) (a) Temperature evolution of ZF
µ+SR spectra measured on Cs2AgF4 between 1.3 K and
59.7 K. (b) Above TC low frequency oscillations are observed
due to the dipole-dipole coupling of F–µ+–F states. Inset:
The energy level structure allows three transitions, leading
to three observed frequencies. (c) Below TC higher frequency
oscillations are observed due to quasistatic magnetic fields
at the muon sites. Inset: Maximum entropy analysis reveal
two magnetic frequencies corresponding to two magnetically
inequivalent muon sites.
amplitudes Ai were found to be constant across the tem-
perature range and were fixed at values A1 = 1.66%,
A2 = 3.74% and A3 = 5.54%. This shows that the prob-
ability of a muon stopping in a site that gives rise to fre-
quency ν1 is approximately half that of a muon stopping
in a site that corresponds to ν2. We note also that A3 is in
excess of the expected ratio of A3/(A1 +A2) = 1/2. The
unambiguous assignment of amplitudes is made difficult
by the resolution limitations that a pulsed muon source
places on the measurement. The initial muon pulse at
ISIS has FWHM τmp ∼ 80 ns, limiting the response for
frequencies above ∼ τ−1mp [12]. We should expect, there-
fore, slightly reduced amplitudes or increased relaxation
(see below) for the oscillating components in our spectra
for which ν1,2 & 5 MHz. The amplitudes of the oscilla-
tions are large enough, however, for us to conclude that
the magnetic order in this material is an intrinsic prop-
erty of the bulk compound. Moreover, above TC there is
a complete recovery of the total expected muon asymme-
try. This observation, along with the constancy of A1,2,3
below TC, leads us to believe that Cs2AgF4 is completely
ordered throughout its bulk below TC.
Fig. 3(a) shows the evolution of the precession frequen-
11-T/TC
0 2 4 6 8 10 12 14
T (K)
FIG. 3: Results of fitting data measured below TC to Eq. (1).
(a) Evolution of the muon-spin precession frequencies ν1
(closed circles) and ν2 (open circles) with temperature. Solid
lines are fits to the function νi(T ) = νi(0)(1 − T/TC)
described in the text. Inset: Scaling plot of the precession
frequencies with parameters TC = 13.95(3) K and β = 0.292.
(b) Relaxation rates λ1 (closed circles), λ2 (open circles) and
λ3 (closed triangles), as a function of temperature showing a
rapid increase as TC is approached from below.
cies νi, allowing us to investigate the critical behavior as-
sociated with the phase transition. From fits of νi to the
form νi(T ) = νi(0)(1−T/TC)β for T > 10 K, we estimate
TC = 13.95(3) K and β = 0.292(3). In fact, good fits to
νi(T ) = νi(0)(1−T/13.95)0.292 are achieved over the en-
tire measured temperature range (that is, no spin wave
related contribution was evident at low temperatures),
yielding ν1(0) = 6.0(1) MHz and ν2(0) = 4.9(2) MHz
corresponding to local magnetic fields at the two muon
sites of B1 = 44(1) mT and B2 = 36(1) mT. A value
of β = 0.292(3) is less than expected for three dimen-
sional models (β = 0.367 for 3D Heisenberg), but larger
than expected for 2D models (β = 0.23 for 2D XY or
β = 0.125 for 2D Ising) [14, 15]. This suggests that in
the critical regime the behavior is intermediate in char-
acter between 2D and 3D; this contrasts with the mag-
netic properties of K2CuF4 where β = 0.33, typical of a
3D system, is observed in the reduced temperature re-
gion tr ≡ (TC − T )/TC > 7 × 10−2, with a crossover to
more 2D-like behavior at tr < 7 × 10−2, where β = 0.22
[16, 17, 18]. Our measurements probe the behavior of
Cs2AgF4 for tr ≥ 5.5 × 10−3, for which we do not ob-
serve any crossover.
A knowledge of TC and the intraplane coupling J , al-
lows us to estimate the interplane coupling J ′. Recent
studies of layered S = 1/2 Heisenberg ferromagnets us-
ing the spin-rotation invariant Green’s function method
[19], show that the interlayer coupling may be described
by an empirical formula
= exp
with a = 2.414 and b = 2.506. Substituting our value of
TC = 13.95 K and using |J |/kB = 44.0 K [7], we obtain
|J ′|/kB = 0.266 K and |J ′/J | = 1.9× 10−2. The applica-
tion of this procedure to K2CuF4 (for which TC = 6.25 K
and |J |/kB = 20.0 K [8]) results in |J ′|/kB = 0.078 K
and |J ′/J | = 3.9 × 10−3. This suggests that, although
highly anisotropic, the interlayer coupling is stronger
in Cs2AgF4 than in K2CuF4. This may account for
the lack of dimensional crossover in Cs2AgF4 down to
tr = 5.5× 10−3.
Both transverse depolarization rates λ1 and λ2 are seen
to decrease with increasing temperature (Fig. 3(b)) ex-
cept close to TC where they rapidly increase. The large
values of λ1,2 at low temperatures may reflect the re-
duced frequency response of the signal due to the muon
pulse width described above. The large upturn in the
depolarization rate close to TC, which is also seen in
the longitudinal relaxation rate λ3 (which is small and
nearly constant except on approach to TC), may be at-
tributed to the onset of critical fluctuations close to TC.
The component in the spectra with the larger precession
frequency ν1 has the smaller depolarization rate λ1 at all
temperatures. These features provide further evidence
for a magnetic phase transition at TC = 13.95 K.
Above TC the character of the measured spectra
changes considerably (Fig.2(a) and (b)) and we observe
lower frequency oscillations characteristic of the dipole-
dipole interaction of the muon and the 19F nucleus [20].
The Ag2+ electronic moments, which dominate the spec-
tra for T < TC, are no longer ordered in the param-
agnetic regime, and fluctuate very rapidly on the muon
time scale. They are therefore motionally narrowed from
the spectra, leaving the muon sensitive to the quasistatic
nuclear magnetic moments. This interpretation is sup-
ported by µ+SR measurements of K2CuF4 where simi-
lar behavior was observed [21]. In many materials con-
taining fluorine, the muon and two fluorine ions form
a strong hydrogen bond usually separated by approxi-
mately twice the F− ionic radius. The linear F–µ+–F
spin system consists of four distinct energy levels with
three allowed transitions between them (inset, Fig. 2(b))
giving rise to the distinctive three-frequency oscillations
observed. The signal is described by a polarization
function [20] D(ωdt) =
uj cos(ωjt)
, where
u1 = 1, u2 = (1 + 1/
3) and u3 = (1 − 1/
3). The
transition frequencies (shown in Fig. 2(b)) are given by
ωj = 3ujωd/2 where ωd = µ0γµγF/4πr
3, γF is the
nuclear gyromagnetic ratio and r is the µ+–19F separa-
tion. This function accounts for the observed frequencies
very well, leading us to conclude that the F–µ+–F bonds
are highly linear.
A successful fit of our data required the multiplication
of D(ωdt) by an exponential function with a small re-
laxation rate λ4, crudely modelling fluctuations close to
TC. The addition of a further exponential component
A5 exp(−λ5t) was also required in order to account for
those muon sites not strongly dipole coupled to fluorine
nuclei. The data were fitted with the resulting relaxation
function
A(t) = A4D(ωdt) exp(−λ4t) +A5 exp(−λ5t) +Abg, (3)
The frequency ωd was found to be constant at all
measured temperatures, taking the value ωd = 2π ×
0.211(1) MHz, which corresponds to a constant F–µ+
separation of 1.19(1) Å, typical of linear bonds [20]. The
relaxation rates only vary appreciably within 0.2 K of
the magnetic transition, increasing as TC is approached
from above, probably due to the onset of critical fluctua-
tions. This provides further evidence for our assignment
of TC = 13.95 K.
Our determination of νi(0) and observation of the lin-
ear F–µ+–F signal allow us to identify candidate muon
sites in Cs2AgF4. Although the magnetic structure of the
system is not known, magnetic measurements [7] suggest
the existence of loosely coupled FM Ag2+ layers arranged
antiferromagnetically along the c-direction. Dipole fields
were calculated for such a candidate magnetic structure
with Ag2+ moments in the ab planes oriented parallel
(antiparallel) to the a direction for z = 0 (z = 1/2). The
calculation was limited to a sphere containing ≈ 105 Ag
ions with localized moments of 0.8 µB [7]. The above
considerations suggest that the muon sites will be situ-
ated midway between two F− ions. Two sets of candidate
muon sites may be identified in the planes containing the
fluorine ions. Magnetic fields corresponding to ν2(0) are
found in the [CsF] planes (i.e. those with z = 0.145 and
z = 0.355) at the positions (1/4, 1/4, z), (1/4, 3/4, z),
(3/4, 1/4, z) and (3/4, 3/4, z). Sites corresponding to
the frequency ν1(0) are more difficult to assign, but good
candidates are found in the [AgF2] planes (at z = 0, 1/2)
at positions (1/4, 1/2, z), (3/4, 1/2, z), (1/4, 0, z) and
(3/4, 0, z). The candidate sites are shown in Fig. 1.
We note that there are twice as many [CsF] planes in a
unit cell than there are [AgF2] planes in agreement with
our observation that components with frequency ν2 oc-
cur with twice the amplitude of those with ν1. Such an
assignment then implies that the presence of the muon
distorts the surrounding F− ions such that their separa-
tion is ∼ 2.38 Å. This contrasts with the in-plane F–F
separation in the unperturbed material of 4.55 Å ([CsF]
planes) and∼ 3.2 Å ([AgF2] planes) [7]. Thus the two ad-
jacent F− ions in the magnetic [AgF2] planes each shift by
∼ 0.4 Å from their equilibrium positions towards the µ+,
demonstrating that the muon introduces a non-negligible
local distortion; however, the distortion in the Ag2+ ion
positions is expected to be much less significant.
In conclusion, we have shown unambiguous evi-
dence for magnetic order in Cs2AgF4 with an exchange
anisotropy of |J ′/J | ≈ 10−2 and critical behavior inter-
mediate in character between 2D and 3D. The presence
of coherent F–µ+–F states allows a determination of can-
didate muon sites and an estimate of the perturbation
of the system caused by the muon probe. This study
demonstrates that µ+SR is an effective and useful probe
of the Cs2AgF4 system. In order to further explore this
system as an analogue to the high-TC materials it is desir-
able to perform investigations of doped materials based
on the Cs2AgF4 parent compound.
Part of this work was carried out at the ISIS facility,
Rutherford Appleton Laboratory, UK. This work is sup-
ported by the EPSRC (UK). T.L. acknowledges support
from the Royal Commission for the Exhibition of 1851.
J.F.C.T and S.E.M acknowledge the U.S. National Sci-
ence Foundation under awards CAREER-CHE 039010
and OISE 0404938, respectively.
∗ Electronic address: t.lancaster1@physics.ox.ac.uk
[1] P.A. Lee, N. Nagaosa and X.-G. Wen, Rev. Mod. Phys.
78 17 (2006).
[2] M.A. Kastner et al, Rev. Mod. Phys. 70 897 (1998).
[3] E. Dagotta, T. Hotta and A Moreo, Phys. Rep. 344 1
(2001).
[4] R.J. Cava et al., Phys. Rev. B 43 1229 (1991).
[5] W. Grochala and R. Hoffmann, Angew. Chem. Int. Ed.
40 2742 (2001).
[6] R.-H. Odenthal, D. Paus and R. Hoppe, Z. Anorg. Allg.
Chem. 407 144 (1974).
[7] S.E. McLain et al., Nature Mat. 5, 561 (2006).
[8] I. Yamada, J. Phys. Soc. Jpn. 33, 979 (1972).
[9] Y. Ito and J. Akimitsu, J. Phys. Soc. Jpn. 40, 1333
(1976).
[10] D.I. Khomskii and K.I. Kugel, Solid State Commun. 13
763 (1973).
[11] D. Dai et al., Chem. Mater. 18, 3281 (2006).
[12] S.J. Blundell, Contemp. Phys. 40, 175 (1999).
[13] R.S. Hayano et al., Phys. Rev. B 20, 850 (1979).
[14] S.J. Blundell, Magnetism in Condensed Matter (Oxford
University Press, 2001).
[15] S.T. Bramwell and P.C.W. Holdsworth, J. Phys.: Con-
dens. Matter 5 L53 (1993).
[16] K. Hirakawa and H. Ikeda, J. Phys. Soc. Jpn. 35, 1328
(1973).
[17] T. Hashimoto et al., J. Magn. Magn. Mater., 15-18, 1025
(1980).
[18] M. Suzuki and H. Ikeda, J. Phys. Soc. Jpn. 50, 1133
(1981).
[19] D. Schmalfuß, J. Richter and D. Ihle, Phys. Rev. B 72
224405 (2005).
[20] J.H. Brewer et al., Phys. Rev. B 33, 7813 (1986).
[21] C. Mazzoli et al., Physica B 326 427 (2003).
mailto:t.lancaster1@physics.ox.ac.uk
ABSTRACT
  We present the results of a muon-spin relaxation study of the high-Tc
analogue material Cs2AgF4. We find unambiguous evidence for magnetic order,
intrinsic to the material, below T_C=13.95(3) K. The ratio of inter- to
intraplane coupling is estimated to be |J'/J|=1.9 x 10^-2, while fits of the
temperature dependence of the order parameter reveal a critical exponent
beta=0.292(3), implying an intermediate character between pure two- and three-
dimensional magnetism in the critical regime. Above T_C we observe a signal
characteristic of dipolar interactions due to linear F-mu-F bonds, allowing the
muon stopping sites in this compound to be characterized.

<|endoftext|><|startoftext|>
Introduction
	Flattè parametrization
	Flattè analysis: procedure and results
	Discussion
	Summary
	Acknowledgments
	References
ABSTRACT
  We investigate the enhancement in the D^0\bar{D}^0\pi^0 final state with the
mass M=3875.2\pm 0.7^{+0.3}_{-1.6}\pm 0.8 MeV found recently by the Belle
Collaboration in the B\to K D^0\bar{D}^0\pi^0 decay and test the possibility
that this is yet another manifestation of the well-established resonance
X(3872). We perform a combined Flatte analysis of the data for the
D^0\bar{D}^0\pi^0 mode, and for the \pi^+\pi^- J/\psi mode of the X(3872). Only
if the X(3872) is a virtual state in the D^0\bar{D}^{*0} channel, the data on
the new enhancement comply with those on the X(3872). In our fits, the mass
distribution in the D^0\bar{D}^{*0} mode exhibits a peak at 2-3 MeV above the
D^0\bar{D}^{*0} threshold, with a distinctive non-Breit-Wigner shape.

<|endoftext|><|startoftext|>
Introduction
Tremendous interest has been generated recently in the electronic properties of two dimensional
(2D) graphene in both experimental and theoretical arenas [1, 2, 3, 5, 6, 7]. Graphene is a single
atomic layer of carbon atoms forming a dense honeycomb crystal lattice [8]. The massless energy
dispersion relation of electrons and holes with zero (or close to zero) bandgap results in novel
behavior of both single-particle and collective excitations [1, 2, 3]. In addition, the high mobility
of electrons in graphene has generated interest in developing novel high speed devices. Recently,
it has been shown that the frequencies of plasma waves in graphene at moderate carrier densities
( 109−1011 cm−2) are in the terahertz range [3]. Electron-hole decay through plasmon emission
has been recently experimentally observed in graphene [4]. The zero bandgap of graphene leads
to strong damping of the plasma waves (plasmons) at finite temperatures as plasmons can
decay by exciting interband electron-hole pairs [1, 2]. In this paper we show that plasmon
amplification through stimulated emission is possible in population inverted graphene layers.
This process is depicted in Fig.1. We show that plasmons in graphene can have a net gain
at frequencies in the 1-10 THz range even if plasmon losses from electron and hole intraband
scattering are considered. A net gain for the plasmons implies that terahertz amplifiers and
oscillators based on plasmon amplification through stimulated emission are possible. The gain
at terahertz frequencies is possible due to the (almost) zero bandgap of graphene. Although
terahertz gain is also achievable in population inverted subbands in 2D quantum wells [9],
intrasubband plasmons in quantum wells, being longitudinal collective modes, do not couple
with intersubband transitions that require field polarization perpendicular to the plane of the
quantum wells. The electromagnetic energy in the two-dimensional plasmon mode is confined
within very small distances of the graphene layer and therefore waveguiding structures with large
dimensions, such as those required in terahertz quantum cascade lasers [9], are not required
for realizing plasmon based terahertz devices. We also present results for plasmon gain under
different population inversion conditions taking into account both intraband and interband
electronic transitions and carrier scattering.
2 Theoretical Model
In this section we discuss the theoretical model used to obtain the values for the plasmon gain
in graphene. In graphene, the valence and conduction bands resulting from the mixing of the
pz-orbitals are degenerate at the inequivalent K and K
′ points of the Brillouin zone [8]. Near
these points, the conduction and valence band dispersion relations can be written compactly
as [2],
Es,k = sh̄v|k| (1)
where s = ±1 stand for conduction (+1) and valence (−1) bands, respectively, and v is the
“light” velocity of the massless electrons and holes. The wavevector k is measured from the
K(K ′) point. The frequencies ω(q) of the longitudinal plasmon modes of wavevector q are given
by the equation,ǫ(q, ω) = 0, where ǫ(q, ω) is the longitudinal dielectric function of graphene [2].
In the random phase approximation (RPA) ǫ(q, ω) can be written as [10],
ǫ(q, ω) = 1− V (q)Π(q, ω) (2)
Here, V (q) is the bare 2D Coulomb interaction and equals e2/2ǫ∞q. ǫ∞ is the average of
the dielectric constant of the media on either side of the graphene layer. Π(q, ω) is the
electron-hole propagator including both intraband and interband processes and is given by
the expression [1, 2],
Π(q, ω) = 4
s s′ k
| < ψs′,k+q|e
iq.r|ψs,k > |
f(Es,k − Efs)− f(Es′,k+q − Efs′)
h̄ω + Es,k − Es′,k+q + iη
The factor of 4 outside in the above equation comes from the degenerate two spins and the
two valleys at K and K ′. f(E − Ef ) is the Fermi distribution function with Fermi energy Ef .
|ψs,k) > are the Bloch functions for the conduction and valence bands near the K(K
′) point.
The occupancy of electrons in the conduction and valence bands are described by different
Fermi levels to allow for nonequilibrium population inversion. The Bloch functions have the
following matrix elements [8],
| < ψs′,k+q|e
iq.r|ψs,k > |
1 + ss′
|k|+ |q| cos (θ)
|k+ q|
where θ is the angle between the vectors k and q. The condition v|q| < ω(q) must be satisfied
in order to avoid direct intraband absorption of plasmons. Assuming v|q| < ω, and using the
symmetry between conduction and valence bands, the intraband and interband contributions
to the propagator can be approximated as follows,
Πintra(q, ω) ≈
q2K T /πh̄2
ω(ω + i/τ)− v2q2/2
eEf+/KT + 1
e−Ef−/KT + 1
Πinter(q, ω) ≈
[f(h̄ω/2− Ef+)− f(−h̄ω/2− Ef−)]
ω2 − ω2
[f(h̄ω/2− Ef+)− f(−h̄ω/2− Ef−)] (6)
Here, q = |q|. In Equation (5), the intraband contribution to the propagator is written in the
plasmon-pole approximation that satisfies the f-sum rule [10]. This approximation is not valid
for large value of the wavevector q when ω(q) → vq. However, in this paper we will be concerned
with small values of the wavevector for which the plasmons have net gain, and therefore
the approximation used in Equation (5) is adequate. Plasmon energy loss due to intraband
scattering has been included with a scattering time τ in the number-conserving relaxation-time
approximation which assumes that as a result of scattering the carrier distribution relaxes
to the local equilibrium distribution [11]. The real part of the interband contribution to the
propagator modifies the effective dielectric constant and leads to a significant reduction in the
plasmon frequency under population inversion conditions. The imaginary part of the interband
contribution to the propagator incorporates plasmon loss or gain due to stimulated interband
transitions. A necessary condition for plasmon gain from stimulated interband transitions is
that the splitting of the Fermi levels of the conduction and valence electrons exceed the plasmon
energy, i.e. Ef+−Ef− > h̄ω. But the plasmons will gave net gain only if the plasmon gain from
stimulated interband transitions exceed the plasmon loss due to intraband scattering. The real
and imaginary parts of the propagator in Equations (5) and (6) satisfy the Kramers-Kronig
relations. Equations (5) and (6) can be used with Equation (2) to calculate the real and
imaginary parts of the plasmon frequency ω(q) as a function of q. However, from the point
of view of device design, it is more useful to assume that the frequency ω is real and the
propagation vector q(ω), written as a function of ω, is complex. Since the charge density wave
corresponding to plasmons has the form eiq.r−iωt, the imaginary part of the propagation vector
corresponds to net gain or loss. We define the net plasmon energy gain g(ω) as −2Imag{q(ω)}.
3 Results and Discussion
In simulations we use v = 108 cm/s and ǫ∞ = 4.0ǫo (assuming silicon-dioxide on both sides
of the graphene layer) [1]. We assume a nonequilibrium situation, as in a semiconductor
interband laser [12], in which the electron and hole densities are equal and Ef+ = −Ef−.
Such a non-equilibrium situation can be realized experimentally by either carrier injection in
an electrostatically defined graphene pn-junction or through optical pumping [13, 14]. The
value of the scattering time τ (momentum relaxation time) is also critical for calculations of
the net plasmon gain. Value of τ can be estimated from the experimentally reported values
of mobility using the following expression for the graphene conductivity (assuming that only
electrons are present) [17],
e2 τ K T
eEf+/KT + 1
Values of mobility between 20,000 and 60,000 cm2/V-s have been experimentally measured at
low temperatures (T¡77K) in graphene [6, 7, 15]. Assuming a mobility value of 27,000 cm2/V-s
, reported in Ref. [15] for an electron density of 3.4×1012 cm−2 at T=58K, the value of τ comes
out to be approximately 0.6 ps. The phonon scattering time was experimentally determined to
be close to 4 ps at T=300K [15]. Therefore, impurity or defect scattering is expected to be the
0 0.5 1 1.5 2 2.5 3
Wavevector (105 cm−1)
T = 10K
n=p=1, 3.5, 6, 8.5 × 109 cm−2 
increasing
density 
 ω =  v q 
Figure 2: Calculated plasmon dispersion relation in graphene at 10K is plotted for different
electron-hole densities (n = p = 1, 3.5, 6, 8.5×109 cm−2). The condition ω(q) > h̄vq is satisfied
for frequencies that have net gain in the terahertz range. The assumed values of v and τ are
108 cm/s and 0.5 ps, respectively.
dominant momentum relaxation mechanism in graphene, and the scattering time is expected
to be relatively independent of temperature [17]. In the results presented below, unless stated
otherwise, we have used a temperature independent scattering time of 0.5 ps.
Figs. 2-7 show the calculated dispersion relation of the plasmons and the net plasmon gain
at T=10K, 77K, and 300K for different electron-hole densities. At very low frequencies the
losses from intraband scattering dominate. At frequencies ranging from 1 to 15 THz, the
plasmons can have net gain. The values of the net gain are found to be significantly large
reaching 1−4×104 cm−1 for electron-hole densities in the 109 cm−2 range at low temperatures
and 1011 cm−2 range at room temperature. The calculated plasmon dispersions indicate that
ω(q) > vq at all frequencies for which the plasmons have net gain. Therefore, direct intraband
absorption of plasmons is not possible at these frequencies and will not reduce the calculated
gain values. Plasmons acquire net gain for smaller electron-hole densities at lower temperatures.
At higher temperatures the distribution of electrons and holes in energy is broader and the
0 1 2 3 4 5 6
Frequency (THz)
T = 10K
n=p=1, 3.5, 6, 8.5 × 109 cm−2 increasing
density 
Figure 3: Net plasmon gain in graphene at 10K is plotted for different electron-hole densities
(n = p = 1, 3.5, 6, 8.5 × 109 cm−2). The assumed values of v and τ are 108 cm/s and 0.5 ps,
respectively.
gain at any particular frequency is therefore smaller. At T=10K, the plasmons have net gain
for electron-hole densities as small as 2 × 109 cm−2. Almost an order of magnitude larger
electron-hole densities are required to achieve the same net gain values at T=77K compared
to T=10K. The linear energy dependence of the density of states associated with the massless
dispersion relation of electrons and holes in graphene results in the maximum plasmon gain
values to increase with the electron-hole density. The peak gain values shift to higher frequencies
with the increase in the electron-hole density for the same reason.
The fact that plasmons can acquire net gain for relatively small carrier densities suggests
that plasmon gain is relatively robust with respect to intraband scattering losses. Fig. 8 shows
the net gain at T=10K for n = p = 1010 cm−2 and values of the intraband scattering time
τ varying from 0.1 to 0.5 ps. The net gain decreases as the plasmon losses increase with a
decrease in the value of τ and the maximum gain value equals zero for τ = 0.15 ps. However,
it should not be concluded from Fig. 8 that plasmons cannot have net gain for τ less than 0.15
ps since electron-hole density can always be increased to achieve net gain for smaller values of
0 2 4 6 8
Wavevector (105 cm−1)
increasing
density 
 ω =  v q 
T = 77K
n=p=1, 2, 3, 4 × 1010 cm−2 
Figure 4: Calculated plasmon dispersion relation in graphene at 77K is plotted for different
electron-hole densities (n = p = 1, 2, 3, 4 × 1010 cm−2). The condition ω(q) > h̄vq is satisfied
for frequencies that have net gain in the terahertz range. The assumed values of v and τ are
108 cm/s and 0.5 ps, respectively.
0 2 4 6 8 10
Frequency (THz)
T = 77K
n=p=1, 2, 3, 4 × 1010 cm−2 
increasing
density 
Figure 5: Net plasmon gain in graphene at 77K is plotted for different electron-hole densities
(n = p = 1, 2, 3, 4 × 1010 cm−2). The assumed values of v and τ are 108 cm/s and 0.5 ps,
respectively.
0 5 10 15 20
Wavevector (105 cm−1)
T = 300K
n=p=1, 1.5, 2, 2.5 × 1011 cm−2 
increasing
density 
 ω =  v q 
Figure 6: Calculated plasmon dispersion relation in graphene at 300K is plotted for different
electron-hole densities (n = p = 1, 1.5, 2, 2.5×1011 cm−2). The condition ω(q) > h̄vq is satisfied
for frequencies that have net gain in the terahertz range. The assumed values of v and τ are
108 cm/s and 0.5 ps, respectively.
0 5 10 15 20
Frequency (THz)
T = 300K
n=p=1, 1.5, 2, 2.5 × 1011 cm−2 
increasing
density 
Figure 7: Net plasmon gain in graphene at 300K is plotted for different electron-hole densities
(n = p = 1, 1.5, 2, 2.5 × 1011 cm−2). The assumed values of v and τ are 108 cm/s and 0.5 ps,
respectively.
0 2 4 6 8
Frequency (THz)
T = 10K
n = p = 1010 cm−2 
τ = 0.5, 0.4, 0.3, 0.2, 0.15, 0.1 ps
increasing τ 
Figure 8: Net plasmon gain in graphene at 10K is plotted for different intraband scattering times
τ (τ = 0.5, 0.4, 0.3, 0.2, 0.15, 0.1 ps). The assumed value of v is 108 cm/s and the electron-hole
density is 1010 cm−2.
0 2 4 6 8 10 12
Frequency (THz)
increasing τ 
T = 10K
n = p = 3 × 1010 cm−2 
τ = 150, 125, 100, 75 fs
Figure 9: Net plasmon gain in graphene at 10K is plotted for different scattering times τ
(τ = 150, 125, 100, 75 fs). The assumed value of v is 108 cm/s and the electron-hole density is
3× 1010 cm−2.
τ . Fig. 9 shows the net gain at T=10K for n = p = 3× 1010 cm−2 and values of the intraband
scattering time τ varying from 75 to 150 fs. It can be seen that at these larger carrier densities
plasmons have net gain for scattering times that are sub-100 fs.
The exceedingly large values of the net plasmon gain (> 104 cm−1) in graphene implies that
terahertz plasmon oscillators only a few microns long in length could have sufficient gain to
overcome both intrinsic losses and losses associated with external radiation coupling. Plasmon
fields with in-plane wavevector magnitude q decay as e−q |z| away from the graphene layer where
|z| is the distance from the graphene layer. Figs. 2, 4, and 6 show that q has values exceeding
105 cm−1 at terahertz frequencies. Therefore, the electromagnetic energy associated with the
terahertz plasmons is confined within 100 nm of the graphene layer. Strong field confinement
and low plasmon losses at terahertz frequencies are both partly responsible for the high net
gain values in graphene. Recent theoretical predictions for electron-hole recombination rates
in graphene due to Auger scattering indicate that electron-hole recombination times can be
much longer than 1 ps at temperatures ranging from 10K to 300K for electron-hole densities
smaller than 1012 cm−2 [16]. This suggests that population inversion can be experimentally
achieved in graphene via current injection in electrostatically defined pn-junctions or via
optical pumping [13, 14]. It also needs to be pointed out here that graphene monolayers
and multilayers produced from currently available experimental techniques are estimated to
have defect/impurity densities anywhere between 1011 and 1012 cm−2 [17]. Therefore, at low
electron-hole densities (less than 1011 cm−2) graphene is expected to exhibit localized electron
and hole puddles rather than continuous electron and/or hole sheet charge densities [17].
This implies that with the currently available techniques graphene based terahertz plasmon
oscillators might only be realizable with higher electron-hole densities (> 1011 cm−2) for
operation at higher frequencies (> 5 THz).
4 conclusion
In conclusion, we have shown that high gain values for plasmons are possible in population
inverted graphene layers in the 1-10 THz frequency range. The plasmon gain remains positive
even for carrier intraband scattering times shorter than 100 fs. The high gain values and
the strong plasmon field confinement near the graphene layer could enable compact terahertz
amplifiers and oscillators. The authors would like to thank Edwin Kan and Sandip Tiwari for
helpful discussions.
References
[1] X. F. Wang, T. Chakraborty, Phys. Rev. B, 75, 033408 (2007).
[2] E. H. Hwang, S. D. Darma, cond-mat/0610561.
[3] V. Ryzhii, A. Satou, J. Appl. Phys., 101, 024509 (2007).
[4] A. Bostwick, T. Ohta, T. Seyller, Karsten Horn AND E. Rotenberg, Nature, 3, 36, (2007).
[5] K. S. Novoselov et. al., Nature, 438, 197 (2005).
[6] K. S. Novoselov et. al., Science, 306, 666 (2004).
[7] Y. Zhang et. al., Nature, 438, 201 (2005).
[8] R. Saito, G. Dresselhaus, M. S. Dresselhaus, Physical Properties of Carbon Nanotubes,
Imperial College Press, London, UK (1999).
[9] B. Williams, H. Callebaut, S. Kumar, Q. Hu, Appl. Phys. Letts., 82, 1015 (2003).
[10] H. Huag, S. W. Koch, Quantum Theory of the Optical and Electronic Properties of
Semiconductors, World Sientific, NJ (1994).
[11] N. D. Mermin, Phys. Rev. B, 1, 2362 (1969).
[12] L. A. Coldren, S. W. Corzine, Diode Lasers and Photonics Integrated Circuits, Wiley, NY
(1995).
[13] J. R. Williams, L. DiCarlo, C. M. Marcus, cond-mat/0704.3487 (2007).
[14] B. zyilmaz, P. Jarillo-Herrero, D. Efetov, D. A. Abanin, L. S. Levitov, Philip Kim,
cond-mat/0705.3044.
http://arxiv.org/abs/cond-mat/0610561
[15] W. De Heer et. al., Science, 312, 1191 (2006).
[16] F. Rana, cond-mat/0705.1204.
[17] E. H. Hwang, S. Adam, S. Das Sarma, Phys. Rev. Letts., 98, 186806 (2007).
	Introduction
	Theoretical Model
	Results and Discussion
	conclusion
ABSTRACT
  We show that plasmons in two-dimensional graphene can have net gain at
terahertz frequencies. The coupling of the plasmons to interband electron-hole
transitions in population inverted graphene layers can lead to plasmon
amplification through the process of stimulated emission. We calculate plasmon
gain for different electron-hole densities and temperatures and show that the
gain values can exceed $10^{4}$ cm$^{-1}$ in the 1-10 terahertz frequency
range, for electron-hole densities in the $10^{9}$-$10^{11}$ cm$^{-2}$ range,
even when plasmon energy loss due to intraband scattering is considered.
Plasmons are found to exhibit net gain for intraband scattering times shorter
than 100 fs. Such high gain values could allow extremely compact terahertz
amplifiers and oscillators that have dimensions in the 1-10 $\mu$m range.

<|endoftext|><|startoftext|>
Introduction
Let R be a Noetherian ring and f = {f1, . . . , fm} a set of elements of R. Such sets are
the ingredients of rational maps between affine and other spaces. At the cost of losing
some definition, we choose to examine them in the setting of the ideal I they generate.
Specifically, we consider the presentation of the Rees algebra of I
0 → M −→ S = R[T1, . . . , Tm]
ϕ−→ R[It] → 0, Ti 7→ fit.
The context of Rees algebra theory allows for the examination of the syzygies of the fi but
also of the relations of all orders, which are carriers of analytic information.
We set R = R[It] for the Rees algebra of I. The ideal M will be referred to as the
equations of the fj, or by abuse of terminology, of the ideal I. If M is generated by forms
of degree 1, I is said to be of linear type (this is independent of the set of generators). The
Rees algebra R[It] is then the symmetric algebra S = Sym(I) of I. Such is the case when
the fi form a regular sequence, M is then generated by the Koszul forms fiTj − fjTi, i < j.
We will treat mainly almost complete intersections in a Cohen-Macaulay ring R, that is,
ideals of codimension r generated by r + 1 elements. Almost exclusively, I will be an ideal
of finite co-length in a local ring, or in a ring of polynomials over a field.
Our focus on R is shaped by the following fact. The class of ideals I to be considered
will have the property that both its symmetric algebra S and the normalization R′ of R
have amenable properties, for instance, one of them (when not both) is Cohen-Macaulay.
In such case, the diagram
S ։ R ⊂ R′
gives a convenient dual platform from which to examine R.
There are specific motivations for looking at (and for) these equations. In order to
describe our results in some detail, let us indicate their contexts.
(i) Ideals which are almost complete intersections occur in some of the more notable
birational maps and in geometric modelling ([3], [4], [5], [6], [7], [8], [9], [10], [17], [18],
[21]).
(ii) It is possible interpret questions of birationality of certain maps as an interaction
between the Rees algebra of the ideal and its special fiber. The mediation is carried
by the first Chern coefficient of the associated graded ring of I. In the case of almost
complete intersections the analysis is more tractable, including the construction of
suitable algorithms.
(iii) At a recent talk in Luminy ([9]), D. Cox raised several questions about the character of
the equations of Rees algebras in polynomial rings in two variables. They are addressed
in Section 4 as part of a general program of devising algorithms that produce all the
equations of an ideal, or at least some distinguished polynomial (e.g. the ‘elimination
equation’ in it) ([3], [13]).
We now describe our results. Section 2 is an assemblage for the ideals treated here
of basics on symmetric and Rees algebras, and on their Cohen-Macaulayness. We also
introduce the general notion of a Sylvester form in terms of contents and coefficients in a
polynomial ring over a base ring. This is concretely taken up in Section 4 when the base
ring is a polynomial ring in 2 variables over a field.
In Section 3 we examine the connection between typical algebraic invariants and the
geometric background of rational maps and their images. Here, besides the dimension and
the degree of the related algebras, we also consider the Chern number e1(I) of an ideal. In
particular we explain a criterion for a rational map to be birational in terms of an equality
of two such Chern numbers, provided the base locus of the map is empty and defined by an
almost complete intersection ideal.
In Section 4, we discuss the role of irreducible ideals in producing Sylvester forms. Of
a general nature, we describe a method to obtain an irreducible decomposition of ideals of
finite co-length. In rings such as k[s, t], due to a theorem of Serre, irreducible ideals are
complete intersections, a fact that leads to Sylvester forms of low degree.
Turning to the equations of almost complete intersections, we derive several Sylvester
forms over a polynomial ring R = k[s, t], package them into ideals and examine the incident
homological properties of these ideals and the associated algebras. It is a computer-assisted
approach whose role is to produce a set of syzygies that afford hand computation: the
required equations themselves are not generated by computation. Concretely, we model
a generic class of ideals cases to define ‘super-generic’ ideals L in rings with several new
variables
L = (f, g, h1, . . . , hm) ⊂ A.
Using Macaulay2 ([11]), we obtain the free resolutions of L. In degrees ≤ 5, the resolution
has length ≤ 3 (2 when degree = 4)
0 → F3
d3−→ F2
d2−→ F1 −→ F0 −→ L → 0.
It has the property that after specialization the ideals of maximal minors of d3 and d2 have
codimension 5 and ≥ 4, respectively. Standard arguments of the theory of free resolutions
will suffice to show that the specialization of L is a prime ideal.
For ideals in R = k[s, t] generated by forms of degrees ≤ 5, the method succeeds in
describing the full set of equations. In higher degree, in cases of special interest, it predicts
the precise form of the elimination equation.
For a technical reason–due to the character of irreducible ideals–the method is limited
to dimension two. Nevertheless, it is supple enough to apply to non-homogeneous ideals.
This may be exploited elsewhere, along with the treatment of ideals with larger numbers of
generators in a two-dimensional ring.
2 Preliminaries on symmetric and Rees algebras
We will introduce some basic material of Rees algebras ([2], [12], [22]). Since most of
the questions we will consider have a local character, we pick local rings as our setting.
Whenever required, the transition to graded rings will be direct.
Throughout we will consider a Noetherian local ring (R,m) and I an m-primary ideal
(or a graded algebra over a field k, R =
n≥0Rn = R0[R1], R0 = k, and I a homogeneous
ideal of finite colength λ(R/I) < ∞).
We assume that I admits a minimal reduction J generated by n = dimR elements.
This is always possible when k is infinite. The terminology means that for some integer r,
Ir+1 = JIr. This condition in turn means that the inclusion of Rees algebras R[Jt] ⊂ R[It]
is an integral birational extension (birational in the sense that the two algebras have the
same total ring of fractions). The smallest such integer, rJ(I), is called the reduction
number of I relative to J ; the infimum of these numbers over all minimal reductions of I
is the (absolute) reduction number r(I) of I.
For any ideal, not necessarily m-primary, the special fiber of R[It] – or of I by abuse of
terminology – is the algebra F(I) = R[It]⊗R (R/m). The dimension of F(I) is called the
analytic spread of I, and denoted ℓ(I). When I is m-primary, ℓ(I) = dimR. A minimal
reduction J is generated by ℓ(I) elements, and F(J) is a Noether normalization of F(I).
Hilbert polynomials
The Hilbert polynomial of I by (m ≫ 0) is the function ([2]):
λ(R/Im) = e0(I)
m+ n− 1
− e1(I)
m+ n− 2
+ lower terms.
e0(I) is the multiplicity of the ideal I. If R is Cohen-Macaulay, e0(I) = λ(R/J), where J
is a minimal reduction of I (generated by a regular sequence). For such rings, e1(I) ≥ 0.
For instance, if R = k[x1, . . . , xn], m = (x1, . . . , xn) and I = m
λ(R/Im) = λ(R/mmd) =
md+ n− 1
m+ n− 1
− e1(I)
m+ n− 2
+ lower terms
where e1(I) =
(dn − dn−1).
Both coefficients will be the focus of our interest soon.
Cohen-Macaulay Rees algebras
There is broad array of criteria expressing the Cohen-Macaulayness of Rees algebra (see [1],
[14], [19], [23, Chapter 3]). Our needs will be filled by single criterion whose proof is fairly
straightforward. We briefly review its related contents.
Let (R,m) be a Cohen-Macaulay local ring of dimension ≥ 1, and let I be an m-primary
ideal with a minimal reduction J . The Rees algebra R[Jt] is Cohen-Macaulay and serves
as an anchor to derive many properties of R[It]. Here is one that we shall make use of.
Define the Sally module SJ(I) of I relative to J to be the cokernel of the natural inclusion
of finite R[Jt]-modules I R[Jt] ⊂ I R[It]. Thus,
SJ(I) =
It/IJ t−1.
It has a Hilbert function, unlike the algebra R[It], that gives information about the Hilbert
function of I (see [22, Chapter 2]). The module on the left, I ·R[Jt], is a Cohen-Macaulay
R[Jt]-module of depth dimR + 1. The Cohen-Macaulayness of I · R[It] is directly related
to that of R[It]. These considerations lead to the criterion:
Theorem 2.1 If dimR ≥ 2 and the reduction number of I is ≤ 1, that is I2 = JI, then
R[It] is Cohen-Macaulay. The converse holds if dimR = 2.
Symmetric algebras
Throughout R is a Cohen-Macaulay ring and I is an almost complete intersection. The
symmetric algebra Sym(I) will be denoted by S. Hopefully there will be no confusion
between S and the rings of polynomials S = R[T1, . . . , Tn] that we use to give a presentation
of either R or S.
What keeps symmetric algebras of almost complete intersections fairly under control is
the following:
Proposition 2.2 Let (R,m) be a Cohen-Macaulay local ring. If I is an almost complete
intersection and depth R/I ≥ dimR/I − 1, then S is Cohen-Macaulay. In particular, if I
is m-primary then S is Cohen-Macaulay.
Proof. The general assertion follows from [12, Proposition 10.3]; see also [16]. ✷
Let R be a Noetherian ring and let I be an R-ideal with a free presentation
ϕ−→ Rn −→ I → 0.
We assume that I has a regular element. If S = R[T1, . . . , Tn], the symmetric algebra S of
I is defined by the ideal M1 ⊂ S of 1-forms,
M1 = I1([T1, . . . , Tn] · ϕ).
The ideal of definition of the Rees algebra R of I is the ideal M ⊂ S obtained by elimination
(M1 : x
t) = M1 : x
where x is a regular element of I.
Sylvester forms
To get additional elements of M , evading the above calculation, we make use of general
Sylvester forms. Recall how these are obtained. Let f = {f1, . . . , fn} be a set of polynomials
in B = R[x1, . . . , xr] and let a = {a1, . . . , an} ⊂ R. If fi ∈ (a)B for all i, we can write
f = [f1 · · · fn] = [a1 · · · an] ·A = a ·A,
where A is a n×n matrix with entries in B. By an abuse of terminology, we refer to det(A)
as a Sylvester form of f relative to a, in notation
det(f)(a) = det(A).
It is not difficult to show that det(f)(a) is well-defined mod (f). The classical Sylvester forms
are defined relative to sets of monomials (see [9]). We will make use of them in Section 4.
The structure of the matrix A may give rise to finer constructions (lower order Pfaffians,
for example) in exceptional cases (see [20]).
In our approach, the fi are elements of M1, or were obtained in a previous calculation,
and the ideal (a) is derived from the matrix of syzygies ϕ.
3 Algebraic invariants in rational parametrizations
Let f1, . . . , fn+1 ∈ R = k[x1, . . . , xn] be forms of the same degree. They define a rational
Ψ : Pn−1 99K Pn
p → (f1(p) : f2(p) : · · · : fn+1(p)).
Rational maps are defined more generally with any number m of forms of the same degree,
but in this work we only deal with the case where m = n+ 1.
There are two basic ingredients to the algebraic side of rational map theory: the ideal
theoretic and the algebra aspects, both relevant for the nature of Ψ. First the ideal I =
(f1, . . . , fn+1) ⊂ R, which in this context is called the base ideal of the rational map. Then
there is the k-subalgebra k[f1, . . . , fn+1] ⊂ R, which is homogeneous, hence a standard k-
algebra up to degree renormalization. As such it gives the homogeneous coordinate ring of
the (closed) image of Ψ. Finding the irreducible defining equation of the image is known as
elimination or implicitization.
We refer to [21] and [18] (also [20] for an even earlier overview) for the interplay between
the ideal and the algebra, as well as its geometric consequences. In particular, the Rees
algebra R = R[It] plays a fundamental role in the theory. A pleasant side of it is that, since
I is generated by forms of the same degree, one has R⊗R k ≃ k[f1t, . . . , fn+1t] ⊂ R, which
retro-explains the (closed) image of Pn−1 by Ψ as the image of the projection to Pn of the
graph of Ψ. In particular, the fiber cone is reduced and irreducible.
3.1 Elimination degrees and birationality
Although a rational map Pn−1 99K Pn has a unique set of defining forms f1, . . . , fn+1 of the
same degree and unit gcd, two such maps may look “nearly” the same if they happen to
be composite with a birational map of the target Pn - a so-called Cremona transformation.
If this is the case the two maps have the same degree, in particular the final elimination
degrees are the same.
However, it may still be the case that the two maps are composite with a rational map
of the target which is not birational, so that their degrees as maps do not coincide, yet
the degrees of the respective images are the same. In such an event, one would like to
pick among all such maps one with smallest possible degree. This leads us to he notion of
improper and proper rational parametrizations.
Definition 3.1 Let Ψ = (f1 : · · · : fn+1) : Pn−1 99K Pn be a rational map, where
gcd(f1, . . . , fn+1) = 1. We will say that Ψ (or the parametrization defined by f1, . . . , fn+1)
is improper if there exists a rational map
Ψ′ = (f ′1 : · · · : f ′n+1) : Pn−1 99K Pn,
with gcd(f ′1, . . . , f
n+1) = 1, such that:
1. There is an inclusion of k-algebras k[f1, . . . , fn+1] ⊂ k[f ′1, . . . , f ′n+1];
2. There is an isomorphism of k-algebras k[f1, . . . , fn+1] ≃ k[f ′1, . . . , f ′n+1];
3. degΨ′ < degΨ.
We note that if Ψ is improper and Ψ′ is as above then the rational map
(P1 : · · · : Pn+1) : Pn 99K Pn
is not birational, where fj = Pj(f
1, . . . , f
n+1), for 1 ≤ j ≤ n + 1. Of course, the transition
forms Pj = Pj(y1, . . . , yn+1) are not uniquely defined.
Example 3.2 The parametrization given by f1 = x
1, f2 = x
2, f3 = x
2 is improper since
it factors through the parametrization f ′1 = x
2 = x1x2, f
3 = x
2 through either one of
the rational maps (y1 : y2 : y3) 7→ (y21 : y22 : y23) or (y1 : y2 : y3) 7→ (y21 : y1t3 : y23) neither of
which is birational. Moreover, the forms x21, x1x2, x
2 define a birational map onto its image.
We say that a rational map Ψ = (f1 : · · · : fn+1) : Pn−1 99K Pn is proper if it is
not improper. The need for considering proper rational maps will become apparent in the
context. It is also a basic assumption in elimination theory when one is looking for the
elimination degrees (see [9]).
Clearly, if Ψ is birational onto its image then it is proper. The converse does not hold
and one seeks for precise conditions under which Ψ is birational onto its image. This is the
object of the following parts of this subsection.
When the ideal I = (f1, . . . , fn+1) has finite co-length – that is, I is (x1, . . . , xn)-primary
– it is natural to consider another mapping, namely, the corresponding embedding of the
Rees algebra R = R[It] into its integral closure R̃. We will explore the attached Hilbert
functions into the determinations of various degrees, including the elimination degree of the
mapping.
Thus, assume that I has finite co-length. Then we may assume (k is infinite) that
f1, . . . , fn is a regular sequence, hence the multiplicity of J = (f1, . . . , fn) is d
n, the same
as the multiplicity of md. This implies that J is a minimal reduction of I and of md. We
will set up a comparison between R and R′ = R[md], where m = (x1, . . . , xn), through two
relevant exact sequences:
0 → R −→ R′ −→ D → 0, (1)
and its reduction mod m
R̄ −→ R̄′ −→ D̄ → 0. (2)
F = R̄ is the special fiber ofR (or, of I), and since I is generated by forms of the same degree,
one has F ≃ k[f1, . . . , fn+1] as graded k-algebras. By the same token, F ′ = R̄′ ≃ k[md] –
the d-th Veronese subring of R. In particular, since dimF = dimF ′, the leftmost map in the
exact sequence (2) is injective. AlsoD is annihilated by a power of m, hence dimD = dim D̄.
These are the degrees (multiplicities) deg(F) and deg(F ′) of the special fibers. Since F ′
is an integral extension of F , one has
deg(F ′) = deg(F)[F ′ : F ], (3)
where [F ′ : F ] = dimK(F ′ ⊗F K), where K denotes the fraction field of F (see, e.g.,
[21, Proposition 6.1 (b) and Theorem 6.6] for more general formulas). Since F ′ is besides
integrally closed, the latter is also the field extension degree [ k(md) : K ]. Note that
[F ′ : F ] = 1 means that the extension F ⊂ F ′ is birational (equivalently, the rational map
Ψ maps Pn−1 birationally onto its image). As above, set L = md. We next characterize
birationality in terms of both the coefficient e1 and the dimension of the R-module D.
Proposition 3.3 The following conditions are equivalent:
(i) [F ′ : F ] = 1, that is Ψ is birational onto its image;
(ii) deg(F) = dn−1;
(iii) dim D̄ ≤ n− 1;
(iv) dimD ≤ n− 1
(v) e1(L) = e1(I).
Proof. (i) ⇐⇒ (ii) This is clear from (3) since deg(F ′) = dn−1.
(i) ⇐⇒ (iii) Since ℓ(I) = n and F ⊂ F ′ is integral, then F ⊂ F ′ is a birational extension
if and only if its conductor F :F F ′ is nonzero, equivalently, if and only if dim D̄ ≤ n− 1.
(iv) ⇐⇒ (iii) Clearly, dimD ≤ n and in the case of equality its multiplicity is e1(L) −
e1(I) > 0. Therefore, the equivalence of the two statements follows suit. ✷
There is some advantage in examining D̄ since F is a hypersurface ring,
F = k[T1, . . . , Tn+1]/(f) = R[T1, . . . , Tn+1]/(x1, . . . , xn, f)
a complete intersection. Since F ′ is also Cohen-Macaulay, with a well-known presentation,
it affords an understanding of D̄, and sometimes, of D.
3.2 Calculation of e1(I) of the base ideal of a rational map
One objective here is to apply some general formulas for the Chern number e1(I) of an ideal
I to the case of the base ideal of a rational map with source P1 = Proj(k[x1, x2]).
Here is a method put together from scattered facts in the literature of Rees algebras
(see [23, Chapter 2]).
Proposition 3.4 Let (R,m) be a Cohen-Macaulay local ring of dimension d, let I be an
m-primary ideal with a minimal reduction J = (a1, . . . , ad). Set R
′ = R/(a1, . . . , ad−1),
I ′ = IR′. Then
(i) e0(I) = e0(I
′) = λ(R/J), e1(I) = e1(I
(ii) r(I ′) < degR′ ≤ e0(I); in particular, for n ≥ r = r(I ′), one has I ′n+1 = adI ′n
(iii) λ(R′/I ′ r+1) = λ(R′/I ′ r) + λ(I ′ r/adI
′ r) = e0(I)(r + 1)− e1(I)
(iv) e1(I) = −λ(R′/I ′ r) + e0(I)r
It would be desirable to develop a direct method suitable for the ideal I = (a, b, c)
generated by forms of R = k[s, t], of degree n. We may assume that a, b for a regular
sequence (i.e. gcd(a, b) = 1). We already know that e0(I) = n
2. For regular rings, one
knows ([15]) that e1(I) ≤ d−12 e0(I), d = dimR. Nevertheless the steps above already lead
to an efficient calculation for two reasons: the multiplicity e0(I) is known at the outset and
it does not really involve the powers of I. Forms of degree up to 10 are handled well by
Maucalay2 ([11]).
4 Sylvester forms in dimension two
We establish the basic notation to be used throughout. R = k[s, t] is a polynomial ring over
the infinite field k, and I ⊂ R = k[s, t] is a codimension 2 ideal generated by 3 forms of the
same degree n+ 1, with free graded resolution
0 −→ R(−n−1−µ)⊕R(2(−n−1)+µ) ϕ−→ R3(−n−1) −→ I −→ 0, ϕ =
α1 β1 γ1
α2 β2 γ2
Then the symmetric algebra of I is S ≃ R[T1, T2, T3]/(f, g) with
f = α1T1 + β1T2 + γ1T3
g = α2T1 + β2T2 + γ2T3.
Starting out from these 2 forms, the defining equations of S, following [9], we obtain by elim-
ination higher degrees forms in the defining ideal of R(I). It will make use of a computer-
assisted methodology to show that these algorithmically specified sets generate the ideal
of definition M of R(I) in several cases of interest–in particular answering some questions
raised [9]. More precisely, the so-called ideal of moving forms M is given when I is gen-
erated by forms of degree at most 5. In arbitrary degree, the algorithm will provide the
elimination equation in significant cases.
4.1 Basic Sylvester forms in dimension 2
Let R = k[s, t] and let F,G ∈ B = R[s, t, T1, T2, T3]. If F,G ∈ (u, v)B, for some ideal
(u, v) ⊂ R, the form derived from
h = ad− bc = det(F,G)(u,v),
will be called a basic Sylvester form.
To explain their naturalness, even for ideals I not necessarily generated by forms, we
give an approach to irreducible decomposition of certain ideals.
Theorem 4.1 Let (R,m) be a Gorenstein local ring and let I be an m–primary ideal. Let
J ⊂ I be an ideal generated by a system of parameters and let E = (J : I)/J be the canonical
module of R/I. If E = (e1, . . . , er), ei 6= 0, and Ii = ann (ei), then Ii is an irreducible ideal
The statement and its proof will apply to ideals of rings of polynomials over a field.
Proof. The module E is the injective envelope of R/I, and therefore it is a faithful R/I–
module (see [2, Section 3.2] for relevant notions). For each ei, Re1 is a nonzero submodule of
E whose socle is contained in the socle of E (which is isomorphic to R/m) and therefore its
annihilator Ii (as an R-ideal) is irreducible. Since the intersection of the Ii is the annihilator
of E, the asserted equality follows. ✷
Corollary 4.2 Let (R,m) be a regular local ring of dimension two and let I be an m–
primary ideal with a free resolution
0 → Rn−1 ϕ−→ Rn −→ I → 0,

an−1,1 · · · an−1,n−1
an,1 · · · an,n−1

and suppose that the last two maximal minors ∆n−1,∆n of ϕ form a regular sequence. If
e1, . . . , en−1 are as above, then
(∆n−1,∆n) : I = In−2(ξ
′) = (e1, . . . , en−1)
and each ideal (∆n−1,∆n) : ei is a complete intersection of codimension 2.
Proof. The assertion that the irreducible Ii is a complete intersection is a result of Serre,
valid for all two-dimensional regular rings whose projective modules are free. ✷
Remark 4.3 In our applications, I = C(f, g), the content ideal of f, g. In some of these
cases, C(f, g) = (s, t)n, for some n, an ideal which admits the irreducible decomposition
(s, t)n =
(si, tn+1−i).
One can then process f, g through all the pairs {si, tn−i+1}, and collect the determinants for
the next round of elimination. As in the classical Sylvester forms, the inclusion C(f, g) ⊂
(s, t)n may be used anyway to start the process, although without the measure of control
of degrees afforded by the equality of ideals.
4.2 Cohen-Macaulay algebras
We pointed out in Theorem 2.1 that the basic control of Cohen-Macaulayness of a Rees
algebra of an ideal I ⊂ k[s, t] is that its reduction number be at most 1. We next give a
mean of checking this property directly off a free presentation of I.
Theorem 4.4 Let I ⊂ R be an ideal of codimension 2, minimally generated by 3 forms of
the same degree. Let
α1 α2
β1 β2
γ1 γ2
be the Hilbert-Burch presentation matrix of I. Then R is Cohen-Macaulay if and only if
the equalities of ideals of R hold
(α1, β1, γ1) = (α2, β2, γ2) = (u, v),
where u, v are forms.
Proof. Consider the presentation
0 → L −→ S = R[T1, T2, T3]/(f, g) −→ R → 0,
where f, g are the 1-forms [
T1 T2 T3
If R is Cohen-Macaulay, the reduction number of I is 1 by Theorem 2.1, so there must
be a nonzero quadratic form h with coefficients in k in the presentation ideal M of R. In
addition to h, this ideal contains f, g, hence in order to produce such terms its Hilbert-Burch
matrix must be of the form 
p1 p2
q1 q2
where u, v are forms of k[s, t], and the other entries are 1-forms of k[T1, T2, T3]. Since p1, p2
are q1, q2 are pairs of linearly independent 1-forms, the assertion about the ideals defined
by the columns of ϕ follow.
4.3 Base ideals generated in degree 4
This is the case treated by D. Cox in his Luminy lecture ([9]). We accordingly change the
notation to R = k[s, t], I = (f1, f2, f3), forms of degree 4. The field k is infinite, and we
further assume that f1, f2 form a regular sequence so that J = (f1, f2) is a reduction of I
and of (s, t)4. Let
0 → R(−4− µ)⊕R(−8 + µ) ϕ−→ R3(−4) −→ R −→ R/I → 0, ϕ =
α1 α2
β1 β2
γ1 γ2
 (4)
be the Hilbert-Burch presentation of I. We obtain the equations of f1, f2, f3 from this
matrix.
Note that µ is the degree of the first column of ϕ, 4 − µ the other degree. Let us first
consider (as in [9]) the case µ = 2.
Balanced case
We shall now give a computer-assisted treatment of the balanced case, that is when the
resolution (4) of the ideal I has µ = 2 and the content ideal of the syzygies is (s, t)2. Since
k is infinite, it is easy to show that there is a change of variables, T1, T2, T3 → x, y, z, so
that (s2, st, t2) is a syzygy of I. The forms f, g that define the symmetric algebra of I can
then be written
[f g] = [s2 st t2]
where u, v, w are linear forms in x, y, z. Finally, we will assume that the ideal I2
x y z
u v w
has codimension two. Note that this is a generic condition.
We introduce now the equations of I.
• Linear equations f and g:
[f g] = [x y z] ϕ = [x y z]
α1 α2
β1 β2
γ1 γ2
= [s2 st t2]
where u, v, w are linear forms in x, y, z.
• Biforms h1 and h2:
Write Γ1 and Γ2 such that
[f g] = [x y z] ϕ = [ s t2 ] Γ1 = [ s
2 t ] Γ2.
Then h1 = detΓ1 and h2 = detΓ2.
• Implicit equation F = detΘ, where [h1 h2] = [s t] Θ.
Using generic entries for ϕ, in place of the true k-linear forms in old variables x, y, z, we
consider the ideal of k[s, t, x, y, z, u, v, w] defined by
f = s2x+ sty + t2z
g = s2u+ stv + t2w
h1 = −syu− tzu+ sxv + txw
h2 = −szu− tzv + sxw + tyw
F = −z2u2 + yzuv − xzv2 − y2uw + 2xzuw + xyvw − x2w2
Proposition 4.5 If I2
x y z
u v w
specializes to a codimension two ideal of k[x, y, z],
then L = (f, g, h1, h2, F ) ⊂ A = R[x, y, z, u, v, w] specializes to the defining ideal of R.
Proof. Macaulay2 ([11]) gives a resolution
0 → A d2−→ A5 −→ A5 −→ L → 0
where

zv − yw
zu− xw
−yu+ xv

The assumption on I2
x y z
u v w
says that the entries of d2 generate an ideal of
codimension four and thus implies that the specialization LS has projective dimension two
and that it is unmixed. Since LS 6⊂ (s, t)S, there is an element q ∈ (s, t)R that is regular
modulo S/LS. If
LS = Q1 ∩ · · · ∩Qr
is the primary decomposition of LS, the localization LSq has the corresponding decompo-
sition since q is not contained in any of the
Qi. But now Symq = Rq, so LSq = (f, g)u,
as Iq = Rq. ✷
Non-balanced case
We shall now give a similar computer-assisted treatment of the non-balanced case, that is
when the resolution (4) of the ideal I has µ = 3. This implies that the content ideal of the
syzygies is (s, t). Let us first indicate how the proposed algorithm would behave.
• Write the forms f, g as
f = as+ bt
g = cs+ dt,
where
x y z
u v w
• The next form is the Jacobian of f, g with respect to (s, t)
h1 = det(f, g)(s,t) = ad− bc = −bxs2 − byst− bzt2 + aus2 + avst+ awt2.
• The next two generators
h2 = det(f, h1)(s,t) = b
2xs+ b2yt− abzt− abus− abvt+ a2wt
and the elimination equation
h3 = det(f, h2)(s,t) = −b3x+ ab2y − a2bz + ab2u− a2bv + a3w.
Proposition 4.6 L = (f, g, h1, h2, h3) ⊂ A = k[s, t, x, y, z, u, v, w] specializes to the defin-
ing ideal of R.
Proof. Macaulay2 ([11]) gives the following resolution of L
0 → A2 ϕ−→ A6 ψ−→ A5 −→ L → 0,
x + abu −b
y + abz + abv − a
w −bsx− bty + asu + atv −btz + atw −s
x− sty − t
u− stv − t
t −s 0 0 0 0
a b t −s 0 0
0 0 a b t −s
0 0 0 0 a b
The ideal of 2 × 2 minors of ϕ has codimension 4, even after we specialize from A to
S in the natural manner. Since LS has projective dimension two, it will be unmixed. As
LS 6⊂ (s, t), there is an element u ∈ (s, t)R that is regular modulo S/LS. If
LS = Q1 ∩ · · · ∩Qr
is the primary decomposition of LS, the localization LSu has the corresponding decompo-
sition since u is not contained in any of the
Qi. But now Symu = Ru, so LSu = (f, g)u,
as Iu = Ru. ✷
4.4 Degree 5 and above
It may be worthwhile to extend this to arbitrary degree, that is assume that I is defined
by 3 forms of degree n+1 (for convenience in the notation to follow). We first consider the
case µ = 1. Using the procedure above, we would obtain the sequence of polynomials in
A = R[a, b, x1, . . . , xn, y1, . . . , yn]
• Write the forms f, g as
f = as+ bt
g = cs+ dt,
where
x1 · · · xn
y1 · · · yn

sn−2t
stn−2

• The next form is the Jacobian of f, g with respect to (s, t)
h1 = det(f, g)(s,t) = ad− bc
• Successively we would set
hi+1 = det(f, hi)(s,t), 1 < n.
• The polynomial
hn = det(f, hn−1)(s,t)
is the elimination equation.
Proposition 4.7 L = (f, g, h1, . . . , h5) ⊂ A specializes to the defining ideal of R.
In Macaulay2, we checked the degrees 5 and 6 cases. In both cases, the ideal L (which
has one more generator in degree 6) has a projective resolution of length 2 and the ideal of
maximal minors of the last map has codimension four.
Conjecture 4.8 For arbitrary n, L = (f, g, h1, . . . , hn) ⊂ A has projective dimension two
and specializes to the defining ideal of R.
In degree 5, the interesting case is when the Hilbert-Burch matrix φ has degrees 2 and
3. Let us describe the proposed generators. For simplicity, by a change of coordinates, we
assume that the coordinates of the degree 2 column of ϕ are s2, st, t2
f = s2x+ sty + t2z
g = (s3w1 + s
2tw2 + st
2w3 + t
3w4)x+ (s
3w5 + s
2tw6 + st
2w7 + t
3w8)y
+ (s3w9 + s
2tw10 + st
2w11 + t
3w12)z
Let [
x y z
sA sB + tC tD
 = φ
x ys+ zt
sA+ tB stC + t2D
xs+ yt z
s2A+ stB sC + tD
where A,B,C,D are k-linear forms in x, y, z.
h1 = det(B1)
= s2(−yA) + st(xC − yB − zA) + t2(xD − zB)
= s2(−yA) + t(xCs− yBs− zAs+ xDt− zBt)
= s(−yAs+ xCt− yBt− zAt) + t2(xD − zB),
h2 = det(B2)
= s2(xC − zA) + st(xD + yC − zB) + t2(yD)
= s2(xC − zA) + t(xDs+ yCs− zBs+ yDt)
= s(xCs− zAs + xDt+ yCt− zBt) + t2(yD).
x ys+ zt
−yA xCs− yBs− zAs + xDt− zBt
xs+ yt z
−yAs+ xCt− yBt− zAt xD − zB
x ys+ zt
xC − zA xDs+ yCs− zBs+ yDt
xs+ yt z
xCs− zAs+ xDt+ yCt− zBt yD
c1 = det(C1) = x
2(Cs+Dt) + xy(−Bs) + xz(−As−Bt) + yz(At) + y2(As)
c2 = det(C2) = x
2(Ds) + xy(Dt) + xz(−Bs−Ct) + yz(As) + z2(At)
c3 = det(C3) = x
2(Ds) + xy(Dt) + xz(−Bs−Ct) + yz(As) + z2(At)
c4 = det(C4) = xy(Ds) + xz(−Cs−Dt) + yz(−Ct) + z2(As +Bt) + y2(D)
x y z
−yA xC − yB − zA xD − zB
xC − zA xD + yC − zB yD
Then F = −x3D2+x2yCD+xy2(−BD)+x2z(2BD−C2)+xz2(2AC−B2)+xyz(BC−
3AD) + y2z(−AC)+ yz2(AB)+ y3(AD) + z3(−A2), an equation of degree 5. In particular,
the parametrization is birational.
Proposition 4.9 L = (f, g, h1, h2, c1, c2, c4, F ) specializes to the defining ideal of R.
Proof. Using Macaulay2, the ideal L has a resolution:
0 −→ S1 d3−→ S6 d2−→ S12 d1−→ S8 −→ L −→ 0.
d3 = [−z y x − t s 0]t
y z 0 0 0 0
x 0 z 0 0 0
−v 0 0 z 0 x2w4 − xzw7 + xyw8 + xzw12
u 0 0 0 z −xzw3 + xyw4 + z
2w6 − yzw7 + y
2w8 − xzw8 − z
2w11 + yzw12
0 x −y 0 0 0
0 −v 0 −y 0 xzw1 − x
2w3 + yzw5 + z
2w9 − xzw11
0 u 0 0 −y xzw2 − x
2w4 + z
2w10 − xzw12
0 0 u 0 −x xzw1 + yzw5 − xzw6 + x
2w8 + z
0 0 0 u v 0
0 0 v x 0 −xyw1 + x
2w2 − y
2w5 + xyw6 − x
2w7 − yzw9 + xzw10
0 0 0 0 0 −t
0 0 0 0 0 s
The ideals of maximal minors give codim I1(d3) = 5 and codim I5(d2) = 4 after special-
ization. As we have been arguing, this suffices to show that the specialization is a prime
ideal of codimension two. ✷
Elimination forms in higher degree
In degrees greater than 5, the methods above are not very suitable. However, in several
cases they are still supple enough to produce the elimination equation. We have already
seen this when one of the syzygies is of degree 1. Let us describe two other cases.
• Degree n = 2p, f and g both of degree p. We use the decomposition
(s, t)p =
(si, tp+1−i).
For each 1 ≤ i ≤ p, let
hi = det(f, g)(si,tp+1−i).
These are quadratic polynomials with coefficients in (s, t)p−1. We set
[h1, · · · , hp] = [sp−1, · · · , tp−1] ·A,
where A is a p × p matrix whose entries are 2-forms in k[x, y, z]. The Sylvester form of
degree n, F = det(A), is the required elimination equation.
• Degree n = 2p+ 1, f of degree p. We use the decomposition
(s, t)p =
(si, tp+1−i).
For each 1 ≤ i ≤ p, let
hi = det(f, g)(si,tp+1−i).
These are quadratic polynomials with coefficients in (s, t)p. We set
[f, h1, · · · , hp] = [sp, · · · , tp] ·B,
where A is a (p + 1) × (p + 1) matrix with one column whose entries are linear forms and
the remaining columns with entries 2-forms in k[x, y, z]. The Sylvester form F = det(B) is
the required elimination equation.
References
[1] I. M. Aberbach, C. Huneke and N. V. Trung, Reduction numbers, Briançon-Skoda
theorems and depth of Rees algebras, Compositio Math. 97 (1995), 403–434.
[2] W. Bruns and J. Herzog, Cohen-Macaulay Rings, Cambridge University Press, 1993.
[3] L. Busé and J.-P. Jouanolou, On the closed image of a rational map and the implicit-
ization problem, J. Algebra 265 (2003), 312-357.
[4] L. Busé, M. Chardin and J.-P. Jouanolou, Complement to the implicitization of ratio-
nal hypersurfaces by means of approximation complexes, Arxiv, 2006.
[5] L. Busé, D. Cox and C. DAndrea, Implicitization of surfaces in P3 in the presence of
base points, J. Algebra Appl. 2 (2003), 189-214.
[6] D. A. Cox, T. Sederberg, and F. Chen, The moving line ideal basis of planar rational
curves, Comput. Aided Geom. Des. 15 (1998) 803–827.
[7] D. A. Cox, R. N. Goldman, and M. Zhang, On the validity of implicitization by moving
quadrics for rational surfaces with no base points, J. Symbolic Computation 29 (2000)
419–440.
[8] D. A. Cox, Equations of parametric curves and surfaces via syzygies, Contemporary
Mathematics 286 (2001) 1–20.
[9] D. A. Cox, Four conjectures: Two for the moving curve ideal and two for the Bezoutian,
Proceedings of “Commutative Algebra and its Interactions with Algebraic Geometry”,
CIRM, Luminy, France, May 2006 (available in CD media).
[10] C. D’Andrea, Resultants and moving surfaces, J. Symbolic Computation 31 (2001)
585–602.
[11] D. Grayson and M. Stillman, Macaulay2, a software system for research in algebraic
geometry. Available at http://www.math.uiuc.edu/Macaulay2/.
[12] J. Herzog, A. Simis and W. V. Vasconcelos, Koszul homology and blowing-up rings, in
Commutative Algebra, Proceedings: Trento 1981 (S. Greco and G. Valla, Eds.), Lecture
Notes in Pure and Applied Mathematics 84, Marcel Dekker, New York, 1983, 79–169.
[13] J. P. Jouanolou, Formes d’inertie et résultant: un formulaire, Adv. Math. 126 (1997),
119–250.
[14] B. Johnson and D. Katz, Castelnuovo regularity and graded rings associated to an
ideal, Proc. Amer. Math. Soc. 123 (1995), 727-734.
[15] C. Polini, B. Ulrich and W. V. Vasconcelos, Normalization of ideals and Briançon-
Skoda numbers, Math. Research Letters 12 (2005), 827–842.
[16] M. E. Rossi, On symmetric algebras which are Cohen-Macaulay, Manuscripta Math.
34 (1981), 199-210.
http://www.math.uiuc.edu/Macaulay2/
[17] T. Sederberg, R. Goldman and H. Du, Implicitizing rational curves by the method of
moving algebraic curves, J. Symbolic Computation 23 (1997), 153–175.
[18] A. Simis, Cremona transformations and some related algebras, J. Algebra 280 (1)
(2004), 162–179.
[19] A. Simis, B. Ulrich and W. V. Vasconcelos, Cohen-Macaulay Rees algebras and degrees
of polynomial relations, Math. Annalen 301 (1995), 421–444.
[20] A. Simis, B. Ulrich and W. V. Vasconcelos, Jacobian dual fibrations, Amer. J. Math.
115 (1993), 47–75.
[21] A. Simis, B. Ulrich and W. V. Vasconcelos, Codimension, multiplicity and integral
extensions, Math. Proc. Camb. Phil. Soc. 130 (2001), 237–257.
[22] W. V. Vasconcelos, Arithmetic of Blowup Algebras, London Math. Soc., Lecture Note
Series 195, Cambridge University Press, 1994.
[23] W. V. Vasconcelos, Integral Closure, Springer Monographs in Mathematics, New York,
2005.
	Introduction
	Preliminaries on symmetric and Rees algebras
	Algebraic invariants in rational parametrizations
	Elimination degrees and birationality
	Calculation of e1(I) of the base ideal of a rational map
	Sylvester forms in dimension two
	Basic Sylvester forms in dimension 2
	Cohen-Macaulay algebras
	Base ideals generated in degree 4
	Degree 5 and above
ABSTRACT
  We study birational maps with empty base locus defined by almost complete
intersection ideals. Birationality is shown to be expressed by the equality of
two Chern numbers. We provide a relatively effective method of their
calculation in terms of certain Hilbert coefficients. In dimension two the
structure of the irreducible ideals leads naturally to the calculation of
Sylvester determinants via a computer-assisted method. For degree at most 5 we
produce the full set of defining equations of the base ideal. The results
answer affirmatively some questions raised by D. Cox.

<|endoftext|><|startoftext|>
Introduction
We have all become accustomed to sending messages electronically, whether by fax machine,
telephone, computer or other electronic media. Most of these messages contain data that is
already publicly known or at least easily found. Other messages are things we would like to
keep to ourselves, and it would be inconvenient if some third party came across the message.
Still other messages are extremely private and resources, jobs, or even lives(!) might be lost if
the message fell into the wrong hands. A great deal of effort is employed to encrypt the messages
that fall in this last category, sending them with some sort of code in order to prevent any third
party from understanding them even if the messages are intercepted.[1]
However, when a message is sent electronically there is no commonly available technology to
determine if someone has been trying to intercept the message. When sending typed letters,
such a technology does exist, albeit in an imperfect form. We often seal our letters in envelopes.
These envelopes are not secure, that is, they do not prevent anyone from opening the envelope
and reading the letter inside. However, when an envelope is received intact, without any tears
or other indication that it has been tampered with, we have a strong reason to believe that the
message inside has not been seen by anyone since the earlier time when the sender sealed it.
Yet a seal on an envelope is not to be wholly trusted for this task of detecting eavesdroppers.
1 Previous address: Army Research Laboratory, Adelphi, MD
http://arxiv.org/abs/0704.0609v1
A skilled person might be able to examine the contents of the sealed envelope in any number
of ways: by using x-rays or other similar non-destructive testing methods, by steaming the seal
off and re-sealing, or by ripping open the envelope and then placing the letter in a new, forged
envelope that matches the original in every detail.
In this paper we introduce a quantum cryptographic protocol that allows two users to send
and receive a message in a manner that is, in effect, quite similar to the use of a sealed
envelope. The receiver of the message has the opportunity to check if there have been any
active eavesdroppers trying to learn the contents of the message. And similar to a message
sealed in an envelope, the message remains unknown to anyone who is not actively trying to
learn the contents. This protocol has the advantage over sealing letters in envelopes because the
limited types of interactions allowed by quantum mechanics prevent someone from eavesdropping
on the message without leaving signs of the eavesdropping activities.
It is important to make it clear that any messages sent using the protocol introduced here
are not secure. That is, an eavesdropper can always choose to take some action in order to
determine the content of a message sent using this protocol. (We give an example of one such
effective eavesdropping strategy below.) The quality that makes this protocol distinct from other
methods of message transmission is that any such active eavesdropping strategies will cause an
appreciable amount of “noise” that is detectable by the message receiver. The analysis that
a message receiver undertakes to place a bound on what an eavesdropper could have learned
during a particular message transmission is not undertaken here. This analysis can be found
elsewhere.[2]
The goal of this manuscript is to examine a certain class of strategies for eavesdropping on these
sealed messages, and it is divided into four parts: First, the quantum message sealing protocol is
introduced. Following this, we examine a certain class of eavesdropping strategies and describe
what an eavesdropper expects to learn by employing such strategies. Next, we describe the type
and amount of disturbance the eavesdropper will cause by such an activity and work out the
details of an example from this class of eavesdropping strategies. We conclude with a discussion
of this protocol and its similarities and differences to other quantum cryptographic protocols.
2. Message sealing protocol
We describe the protocol where the message sender named Alice transfers a message to the
receiver named Bob. This message will be a single bit b which is either zero or one. The
protocol utilizes a single quantum mechanical system which has two degrees of freedom. The
standard notation for such a system is used, with |0〉 and |1〉 representing vectors that form an
orthonormal basis. The protocol also involves a number of announcements made by the message
sender. These announcements are to be considered as public announcements to which everyone
is assumed to have access.
A process, referred to as a single shot, will be repeated many times and goes as follows:
Step 1 - Bob prepares a quantum system, which we refer to as a particle, in one of four pure
states: |0〉, |1〉, |+〉 ≡ (|0〉+ |1〉)/
2, or |−〉 ≡ (|0〉 − |1〉)/
2. The decision as to which state to
prepare is made at random with equal probability for each state. He records the state he has
prepared and then he sends the particle to Alice.
Step 2 - Alice makes one of two measurements with equal probability. She either makes a
measurement corresponding to σ1 = |+〉〈+|−|−〉〈−| or she makes a measurement corresponding
to σ3 = |0〉〈0| − |1〉〈1|. Each of these two measurements can be said to have a result m that is
either m = +1 or m = −1.
Step 3 - Alice announces whether her measurement corresponded to σ1 or σ3.
Step 4 - Alice makes one of two possible announcements. With probability pa she makes a
bit-announcement (described immediately below) and with probability (1 − pa) she makes a
result-announcement. She also makes it known which of the two types of announcement she is
making.
Bit-Announcement: She announces a bit c that is determined by using the message bit b and
the measurement result. If her measurement yielded the result m = +1 then her announced bit
c will be the same as the message bit b and if her measurement yielded the result m = −1 her
announced bit c will be the opposite of the message bit b.
Result-Announcement: She announces the result of her measurement, m = +1 or m = −1.
When Bob prepares the particle in the state |0〉 or |1〉 and Alice makes a σ3 measurement, or
when Bob prepares the particle in the state |+〉 or |−〉 and Alice makes a σ1 measurement we say
that Alice’s measurement and Bob’s state preparation have a matching basis. They will have a
matching basis on half the shots performed. When this occurs then Bob knows the result of the
measurement without Alice having to announce it, provided that the state of the particle did
not change from when Bob prepared it to when Alice makes the measurement. The correlations
between Bob’s state preparation and Alice’s measurement results allow Bob to both determine
the message bit and check the channel for any disturbances.
When Alice makes a measurement in the basis matching Bob’s state preparation, Bob determines
the message by applying a controlled-bit-flip operation on the announced bit. When the state
in which he prepared the particle is either |0〉 or |+〉 then the message bit b is the same as the
announced bit c and if he prepared |1〉 or |−〉 then the message b is the opposite of the announced
bit c.
From an eavesdropper’s point of view, the probability that the message bit is one value or the
other is determined from the coded bit-announcements. When both values of the measurement
result are equally likely then both values of the message bit are equally likely (for either
bit-announcement). The four possible initial states that Bob prepares and the two possible
measurements were chosen so that either measurement result is equally likely. Moreover, the
only opportunity that an eavesdropper has to change these probabilities is to change the state
of the particle when it is traveling from Bob to Alice. The rules of quantum mechanics allow
for the state of a quantum mechanical system to change in two different ways: by a unitary
evolution or by a measurement. If we want to describe the effects of coupling the quantum
system composed of the particle to another (auxiliary) quantum system and then letting the
state of whole system (particle plus auxiliary) change via unitary evolution of measurement, the
entire process can be described as a quantum operation or a generalized measurement on the
state of the particle subsystem.[3]
In the following sections we examine the case of when an eavesdropper chooses to change the
state of the particle by applying a quantum operation. It is worthwhile to emphasize that while
using this type of eavesdropping activity is not optimal,[2] it provides us with some intuition as
to how this protocol can be expected to work.
3. Information gain from quantum operations
In this section we quantify what an eavesdropper learns by applying a quantum operation to
change the state of the particle as it travels from Bob to Alice. We first describe quantum
operations[3] and then tackle the problem of quantifying an eavesdropper’s gain by using the
Shannon mutual information.[4]
A quantum operation E acting on states in Hilbert space H is described by a set of operators
{E1, . . . , En} subject to the requirement that
iEi = I where I is the identity operator
acting on H. We say that the quantum operation E maps the initial state ρ to final state
E(ρ) =
i . A quantum operation is a convex linear map on the space of mixed states,
which is to say that if ρ = pρ1 + (1 − p)ρ2 with 0 ≤ p ≤ 1, then E(ρ) = pE(ρ1) + (1 − p)E(ρ2).
A special class of quantum operations are the unital quantum operations that map the chaotic
state, which is 1
I where d = dim(H), to itself.
We quantify the amount an eavesdropper learns by using the Shannon mutual information
between two random variables: the random variable B which describes the possible values of
the message and their probabilities, and the random variable C which describes the possible
strings of bit-announcements and their probabilities. These strings result from the fact that
there will be N shots, and an announcement will be made on each shot. On some of the shots
only the result of the measurement will be announced, and this result does not depend on the
message in any way. Therefore, only the bit-announcements will be of any concern to us in
quantifying what the eavesdropper learns.
The possible messages are b = 0 and b = 1 with one-half prior probability each.
On each shot there are four possible bit-announcements — (σ1, 0), (σ1, 1), (σ3, 0), and (σ3, 1) —
and when N shots are made, k of which result in bit-announcements (where 0 ≤ k ≤ N), there
are 4k possible bit-announcement strings. Because of the probabilistic nature of the protocol, the
number of bit-announcements is not fixed. The probability pk of making k bit-announcements
is found using the binomial distribution
p ka (1− pa)N−k .
We use the symbol c to denote a bit-announcement string, and we use the symbol C(k) to
describe the ensemble of all possible bit-announcement strings of length k.
Given that there are k bit-announcements, the Shannon mutual information I(C(k) : B) is
calculated using
I(C(k) : B) =
Pr(c) log
Pr(c)
Pr(c | b) log Pr(c | b)
where the sum over c(k) indicates that this sum is taken over all 4k bit-announcement strings
of length k. This can be used to determine the expected mutual information when taking the
weighted sum over the various possible lengths of bit-announcement strings,
I(C : B) =
pkI(C
(k) : B) . (2)
This can be calculated once the probabilities Pr(c|b) are known for every c and both values of
b. The remainder of this section is devoted to determining these probabilities, which will change
depending upon which quantum operation is applied.
For a given value of the message, the probabilities of the four bit-announcements depend upon
the probability of Alice getting the m = +1 measurement result. That is,
Pr(σi, c = b|b) = Pr(m = +1|σi) Pr(σi) = Pr(m = +1|σi)/2 ,
Pr(σi, c 6= b|b) = Pr(m = −1|σi) Pr(σi) = Pr(m = −1|σi)/2 ,
Table 1. The probabilities for the four results relevant to the bit-announcements, given that
an eavesdropper acts with a quantum operation Eλv that maps the chaotic state to ρ(λv).
Pr(m = +1|σ1, Eλv) = 12(1 + λv1)
Pr(m = −1|σ1, Eλv) = 12(1− λv1)
Pr(m = +1|σ3, Eλv) = 12(1 + λv3)
Pr(m = −1|σ3, Eλv) = 12(1− λv3)
where i = 1, 3 and b = 0, 1. The notation Pr(m = +1|σi), for example, is used to mean that this
is the probability that the result m = +1 will be found when a measurement that corresponds
to σi is made on the particle and Pr(σi) is the probability that the measurement corresponding
to σi will be performed.
Of course, the machinery of quantum mechanics requires us to specify the state of the particle
in order to calculate a probability of a certain measurement result. From an eavesdropper’s
point of view, if she does nothing to the particle then there are four possible states with equal
probability. So Pr(m = ±1|σi) = 14 (Tr(
(I±σi)|0〉〈0|)+Tr(12 (I±σi)|1〉〈1|)+Tr(
(I±σi)|+〉〈+|)+
(I ± σi)|−〉〈−|)) where i = 1, 3. By the linearity of the Trace function, this is equivalent to
Pr(m = ±1|σi) = Tr(12(I ± σi)
I). In this way, it is quite reasonable to say that the state of
the particle, to the eavesdropper’s best description, is the chaotic state ρ = 1
When an eavesdropper applies a quantum operation E to change the state of the particle, it will
in general change each of the four possible initial states differently. By the linearity of the Trace
function and the convex linearity of the quantum operation E , the probability of m = ±1 can be
calculated for the state ρ′ = E(1
I). That is, Pr(m = ±1|σi) = Tr(12 (I ± σi)E(
I)) for i = 1, 3.
Every (generally mixed) state of a two-level quantum system can be described by ρ(λv) =
(I + λ[v1σ1 + v2σ2 + v3σ3]) where v
1 + v
2 + v
3 = 1, σ2 = iσ1σ3, and 0 ≤ λ ≤ 1. This
“Bloch sphere” description of the two-level state can be pictured as a vector λv in a real
three dimensional space. When E(1
I) = 1
(I + λ[v1σ1 + v2σ2 + v3σ3]), the probabilities for
the four possible announcements are shown in Table 1. If an eavesdropper applies the same
quantum operation each time a particle is sent from Bob to Alice, the probabilities for each
bit-announcement string is found by taking the product of the probabilities of each of the four
announcements, with each of the probabilities appearing in the product the same number of
times that that announcement appears in the string.
We can now calculate the mutual information for any quantum operation by calculating the
probabilities for each bit-announcement string and then using Equations (1) and (2).
To summarize this section, we have described how to calculate the mutual information which
quantifies what an eavesdropper expects to learn about the message given a particular quantum
operation used as an eavesdropping strategy. In the next section, we determine the amount of
“noise” that such eavesdropping strategies cause.
4. Disturbance caused by quantum operations
In the previous section we focused on the bit-announcements and ignored the result-
announcements. In this section we will do the opposite. The bit-announcements are used
Table 2. The four events that correspond to mismatches.
Bob prepares the state Alice measures measurement result
|+〉 σ1 m = −1
|−〉 σ1 m = +1
|0〉 σ3 m = −1
|1〉 σ3 m = +1
by both Bob and any eavesdroppers to determine the message, but the result-announcements
are of no use to the eavesdropper and serve Bob’s purpose to check the channel for “noise”.
There are sixteen different event statistics that are kept by Bob relating to the measurement-
announcements: four possible initial states, two possible measurement types, and two possible
measurement results for each measurement. Out of these sixteen, there are four events that
would be the most surprising to Bob, and would each indicate that the state of the particle,
when Alice measured it, was not the same as the one he had prepared. These four types of events
will be referred to as mismatches and are shown in Table 2. The probability of a mismatch, on
a particular shot, is
Pr(mismatch) = Pr(|+〉, σ1,−1) + Pr(|−〉, σ1,+1) + Pr(|0〉, σ3,−1) + Pr(|1〉, σ3,+1)
Pr(σ1,−1||+〉) + Pr(σ1,+1||−〉) + Pr(σ3,−1||0〉) + Pr(σ3,+1||1〉)
Pr(−1||+〉, σ1) + Pr(+1||−〉, σ1) + Pr(−1||0〉, σ3) + Pr(+1||1〉, σ3)
Of course, when Bob analyzes the data, a mismatch can only occur on a particular shot if the
bases are matched up. A factor of 1/2 disappears when we account for this to give the probability
that there will be a mismatch error on a shot when the bases are matched. For a fixed quantum
operation E employed by an eavesdropper, these probabilities are easily calculated. Note that
these probabilities depend upon the final states E(|+〉〈+|), E(|−〉〈−|), E(|0〉〈0|), and E(|1〉〈1|),
and not just on the evolution of the chaotic state. In general, there are many different quantum
operations that have the same effect on the chaotic state. (The exception to this is when the
chaotic state is mapped to a pure state, in which case it is easily seen by the convex linearity of
quantum operations that every initial state must be mapped to that pure state.)
5. An Example
Let us now examine a family of eavesdropping strategies that utilize the quantum operation
Ex, where x is a parameter which falls in the range 0 ≤ x ≤ 1. When x = 0 the strategy
corresponds to the eavesdropper doing nothing (and as we shall see, learning nothing), and
when x = 1 it corresponds to a quantum operation eavesdropping strategy with the greatest
mutual information.
The quantum operation Ex can be achieved by coupling the initial state ρ (from Bob) to an
auxiliary quantum system in the state |φ〉, letting the coupled system evolve unitarily (described
by some unitary operator U that acts on the combined system) and then tracing over the
auxiliary system. The unitary operator acts as follows:
|0〉 ⊗ |φ〉
= |0〉 ⊗ |F 〉 ≡ |Γ0〉
Table 3. The probabilities, from the eavesdropper’s point of view, of the four possible bit-
announcements for a given value of b when the quantum operation Ex, introduced in Section 5,
is applied.
b = 0 b = 1
Pr(σ1, c = 0|b) 1/4 1/4
Pr(σ1, c = 1|b) 1/4 1/4
Pr(σ3, c = 0|b) (1 + x)/4 (1− x)/4
Pr(σ3, c = 1|b) (1− x)/4 (1 + x)/4
|1〉 ⊗ |φ〉
x |0〉 ⊗ |G〉+
1− x |1〉 ⊗ |F 〉 ≡ |Γ1〉 ,
where 〈F |G〉 = 0 and 〈F |F 〉 = 〈G|G〉 = 1. The fact that 〈0|1〉〈φ|φ〉 = 〈Γ0|Γ1〉 is sufficient
to show that such a unitary operator U exists. The action of the quantum operation Ex on
any initial pure state |η〉 is found by tracing over the auxiliary subsystem after performing the
unitary transformation U :
|η〉〈η|
= Traux
|η〉 ⊗ |φ〉
〈η| ⊗ 〈φ|
By the convex linearity of quantum operations we also know the action of Ex on any mixed state
as well. From the preceeding considerations, it is straightforward to show that Ex acts on the
relevant initial states in the following way:
|0〉〈0|
= |0〉〈0|
|1〉〈1|
= x |0〉〈0| + (1− x) |1〉〈1|
|+〉〈+|
(1 + x) |0〉〈0| + (1− x) |1〉〈1| +
|0〉〈1| + |1〉〈0|
|−〉〈−|
(1 + x) |0〉〈0| + (1− x) |1〉〈1| −
|0〉〈1| + |1〉〈0|
from which it is easy to see that
(1 + x) |0〉〈0| + (1− x) |1〉〈1|
(I + xσ3).
The probability of a mismatch, calculated using Equation (3), for this quantum operations is
(1 + x−
1− x).
In order to calculate the mutual information for this quantum operation, we must be able
to determine the values of Pr(c|b, Ex), that is, the probability of a every string of result-
announcements c given each value of b. If a particular string of k result-announcements
c(k, d1, d2, d3, d4) consists of (σ1, c = 0) announced d1 times, (σ1, c = 1) announced d2 times,
(σ3, c=0) announced d3 times, and (σ3, c=1) announced d4 times — in any order — then the
probability for this announcement to occur is
c(k, d1, d2, d3, d4)|b=0, Ex
(1 + x)d3(1− x)d4 ≡ px,k,d3,d4
c(k, d1, d2, d3, d4)|b=1, Ex
(1− x)d3(1 + x)d4 ≡ qx,k,d3,d4 .
0.2 0.4 0.6 0.8 1
I HC : BL
Figure 1. Mutual information as a function
of x, describing the amount an eavesdropper
learns about the message bit given that she
uses the quantum operation Ex on each shot
when Bob sends N = 119 particles and Alice
has probability pa = 0.01 of making a bit-
announcement.
0.2 0.4 0.6 0.8 1
Mismatch Probability
Figure 2. Probability of a mismatch as a
function of x when an eavesdropper uses the
quantum operation Ex on each shot.
This calculation utilizes the probabilities for the single announcements found in Table 3. There
are k!/(d3!d4!(k − d3 − d4)!) different strings of k bit announcements that share this same
probability (for each value of b).
Using these results, we can now calculate the mutual information.
I(C(k) : B) = −
d3!d4!(k−d3−d4)!
px,k,d,d3 + qx,k,d,d3
px,k,d,d3 + qx,k,d,d3
px,k,d,d3 log px,k,d,d3 − qx,k,d,d3 log qx,k,d,d3
If we choose some exemplary values of pa and N , this will give us some numerical results for the
mutual information. Say that Alice sets pa = 0.05 and Bob agrees with Alice to send N = 119
particles in order to communicate the value of a single bit. This choice of pa and N gives them
slightly more than a 95% chance of matching their bases on a shot when a result-announcement is
made. The mutual information I(C : B), when an eavesdropper applies the quantum operation
Ex on every shot, is plotted for all values of 0 ≤ x ≤ 1 in Figure 1. Compare this with the
disturbance caused, quantified by the probability of a mismatch, by applying the same quantum
operation, which is shown in Figure 2.
As a final note, this example demonstrates that a passive eavesdropper learns nothing about
the message. That is, if we describe a passive eavesdropper as someone who is only listening
to the announcements that Alice makes but does not interfere with the particles in any way,[1]
that person’s eavesdropping strategy would correspond to Ex when x = 0. It is easily seen from
the Figures that this strategy causes the eavesdropper to learn nothing and also to causes no
disturbance.
6. Discussion
This protocol represents something new in the field of cryptography. It provides the message
receiver with a way to check if an eavesdropper is attempting to access the message. The analysis
shown here demonstrates both the amount learned by an eavesdropper and the disturbance
caused, measured in the number of mismatches, when an eavesdropper employs a particular
quantum operation.
As shown in the example above, this protocol is not secure against active attacks in which an
eavesdropper interacts with the particles as they travel from the message receiver to the message
sender. However, this example also demonstrates that such attacks cause a disturbance in the
system, which can be quantified by the number of mismatches found by the message receiver. A
more general analysis a message receiver’s bound on the amount of information an eavesdropper
could have learned during a particular transmission is taken up elsewhere.[2]
The protocol discussed here has similarities to other quantum cryptography protocols that
have been introduced and it is worthwhile to examine these similarities, as well as what
makes this current protocol distinct. The three types of quantum cryptographic protocols that
will be discussed here are quantum key distribution (QKD) protocols, quantum secure direct
communication (QSDC) protocols, and quantum seal protocols.
The main distinction between this new protocol and the QKD protocols is that the goal of
QKD is to develop a shared private key between two parties while here it is important that a
particular message gets transmitted. Said in a different way, each party in a QKD setting starts
with nothing and ends up with a random string of bits, but neither one of them cares which
string of bits results from the process, so long as they share the same one. Here, one party starts
with a particular string of bits — the message — and when the process ends the other party will
(hopefully) have the message as well. (There is a tunably small probability that the process will
be unsuccessful.[2]) Of course, in QKD the random string of bits can later be used to encrypt
a message (which can be sent on a classical channel), but the QKD process itself transfers no
information.
It is worthwhile mentioning that this current protocol is very similar, in some ways, to a specific
QKD protocol, called BB84.[5] The two protocols use the same four initial possible states and
the same two measurements. The difference between the two is the classical messages that are
sent and how these messages are used. These two protocols are so similar that if two users have
a system that implements BB84 then they should be able to implement this new protocol with
only minor modifications to the system.
The second type of quantum cryptographic protocol that we will discuss is the so-called
“quantum secure direct communication” (QSDC).[6] The greatest similarity between the QSDC
protocols and the one introduced here is that they both use quantum states of some transferred
system to transmit a message from one party to another, rather than generating a key. Moreover,
this is done without the use of any pre-shared key. However, the goal of QSDC is to transmit the
messages securely (that is, to prevent any eavesdropper from understanding the message), while
the goal of the protocol introduced here is to detect the activity of any active eavesdroppers.
The final comparison we will make is with those quantum cryptographic protocols that have been
called “quantum seal” protocols.[7] These quantum seal protocols are distinct from the current
one. The goal of the quantum seal protocols is for a message sender to prepare a quantum
mechanical system in some initial state so that someone else can determine the message by
making a measurement on that quantum mechanical system. Moreover, the message preparer
also creates correlations between the quantum mechanical system and a second quantum
mechanical system so that a measurement can be made, by the message preparer, on the second
system to determine if anyone has read the message. The major distinction between these
quantum seal protocols and the protocol introduced here is that protocol introduced here has a
preferred message receiver (the person who sends the particles to the message sender) who can
check if anyone else has tried to read the message, while in these earlier quantum seal protocols[7]
all receivers are on equal footing and it is the message sender who can check if someone has
accessed the message.
We conclude this discussion by emphasizing that the protocol introduced here is neither a QKD
protocol, nor a QSDC protocol, nor a quantum seal protocol. It has distinct goals and the
various security (or no-security) proofs that have been applied to these earlier protocols do not
apply here.
Acknowledgments
This work was funded in part by the Disruptive Technology Office (DTO) and by the Army
Research Office (ARO). This research was performed while Paul Lopata held a National Research
Council Research Associateship Award at the Army Research Laboratory.
References
[1] Brassard G Modern Cryptology 1988 (Spring-Verlag New York, Inc.)
[2] Lopata P and Bahder T, manuscript in preparation
[3] Nielsen M and Chuang I 2000 Quantum Computation and Quantum Information (Cambridge University
Press)
[4] Shannon C 1993 Claude Elwood Shannon Collected Papers (IEEE Press) p 84
[5] Bennett C and Brassard G 1984 Proceedings of IEEE International Conference on Computers, Systems and
Signal Processing (IEEE Press) pp 175–179
[6] Boström K and Felbinger T 2002 Physical Review Letters 89 187902
Wójcik A 2003 Physical Review Letters 90 157901
Deng F-G, Long G L, and Liu X-S 2003 Physical Review A 68 042317
Deng F-G and Long G L 2004 Physical Review A 69 052319
Lucamarini M and Mancini S Physical Review Letters 94 140501
and others.
[7] Bechmann-Pasquinucci H 2003 Quantum Seals Preprint quant-ph/0303173
Bechmann-Pasquinucci H, D’Ariano G M, and Macchiavello C 2005 Impossibility of Perfect Sealing of
Classical Information Preprint quant-ph/0501073
Singh S K and Srikanth R 2005 Physica Scripta 71 pp 433–5
He G-P 2005 Physical Review A 71 054304
Chau H F 2006 Physics Letters A 354 pp 31–4
http://arxiv.org/abs/quant-ph/0303173
http://arxiv.org/abs/quant-ph/0501073
ABSTRACT
  A quantum protocol is described which enables a user to send sealed messages
and that allows for the detection of active eavesdroppers. We examine a class
of eavesdropping strategies, those that make use of quantum operations, and we
determine the information gain versus disturbance caused by these strategies.
We demonstrate this tradeoff with an example and we compare this protocol to
quantum key distribution, quantum direct communication, and quantum seal
protocols.

<|endoftext|><|startoftext|>
Shocks in nonlocal media
Neda Ghofraniha,1 Claudio Conti,2,3 Giancarlo Ruocco,3,4 Stefano Trillo5∗
1 Research Center SMC INFM-CNR, Università di Roma “La Sapienza”, P. A. Moro 2, 00185, Roma, Italy
2Centro Studi e Ricerche “Enrico Fermi”, Via Panisperna 89/A, 00184 Rome, Italy
3Research center SOFT INFM-CNR Università di Roma “La Sapienza”, P. A. Moro 2, 00185, Roma, Italy
4Dipartimento di Fisica, Università di Roma “La Sapienza”, P. A. Moro 2, 00185, Roma, Italy
5 Dipartimento di Ingegneria, Università di Ferrara, Via Saragat 1, 44100 Ferrara, Italy
(Dated: October 1, 2018)
We investigate the formation of collisionless shocks along the spatial profile of a gaussian laser
beam propagating in nonlocal nonlinear media. For defocusing nonlinearity the shock survives
the smoothing effect of the nonlocal response, though its dynamics is qualitatively affected by the
latter, whereas for focusing nonlinearity it dominates over filamentation. The patterns observed in
a thermal defocusing medium are interpreted in the framework of our theory.
Shock waves are a general phenomenon thoroughly in-
vestigated in disparate area of physics (fluids and water
waves, plasma physics, gas dynamics, sound propagation,
physics of explosions, etc.), entailing the propagation of
discontinuous solutions typical of hyperbolic PDE mod-
els [1, 2]. They are also expected in (non-hyperbolic)
universal models for dispersive nonlinear media, such as
the Korteweg-De Vries (KdV) and nonlinear Schrödinger
(NLS, or analogous Gross-Pitaevskii) equations, since hy-
drodynamical approximations of such models hold true
in certain regimes (typically, in the weakly dispersive or
strongly nonlinear case) [3, 4, 5]. However, in the lat-
ter cases, no true discontinuous solutions are permitted.
The general scenario, first investigated by Gurevich and
Pitaevskii [3], is that dispersion regularizes the shock, de-
termining the onset of oscillations that appear near wave-
breaking points and expand afterwards. This so-called
collisionless shock has been observed for example in ion-
acoustic waves [6], or wave-breaking of optical pulses in
a normally dispersive fiber [7], and recently in a Bose-
Einstein condensate with positive scattering length [8].
In this Letter we investigate how nonlocality of the
nonlinear response affects the formation of a collisionless
shock in a system ruled by a NLS model. In fact nonlocal-
ity plays a key role in many physical systems due to trans-
port phenomena and finite range interactions (e.g. as in
Bose-Einstein condensation), and can be naively thought
to smooth and eventually wipe out steep fronts character-
istic of shocks. More specifically, we place this problem
in the context of nonlinear optics where nonlocality arises
quite naturally in different media [9, 10, 11, 12], study-
ing the spatial propagation of a fundamental (gaussian
TEM00) laser mode subject to diffraction and nonlocal
focusing/defocusing action (Kerr effect). In a defocus-
ing and ideal (local and lossless) medium, high intensity
portions of the beam diffract more rapidly than the tails
leading, at sufficiently high powers, to overtaking and os-
cillatory wave-breaking similar (in 1D) to what observed
in the temporal case [18]. We find that, while shock is
not hampered by the presence of (even strong) nonlocal-
ity, the mechanism of its formation as well as post-shock
patterns are qualitatively affected by the nonlocality. Ex-
perimental results obtained with a thermal defocusing
nonlinearity are consistent with our theory and shed new
light on the interpretation of the thermal lensing phe-
nomenon.
Importantly, our theory permits also to establish that
nonlocality allows the shock to form also in the focusing
regime where, contrary to the local case, it can prevails
over filamentation or modulational instability (MI).
Theory We start from the paraxial wave equation
obeyed by the envelope A of a monochromatic field
E = ( 2
)1/2A exp(ikZ − iωT ) (|A|2 is the intensity)
+ k0∆nA = −i
A. (1)
where k = k0n =
n is the wave-number, and α0
the intensity loss rate. A sufficiently general nonlocal
model can be obtained by coupling Eq. (1) to an equa-
tion that rules the refractive index change ∆n of nonlin-
ear origin. Introducing the scaled coordinates x, y, z =
X/w0, Y/w0, Z/L, and complex variables ψ = A/
and θ = k0Lnl∆n, where Lnl = (k0|n2|I0)−1 is the non-
linear length scale associated with peak intensity I0 and
a local Kerr coefficient n2 (∆n = n2|A|2), Ld = kw20 is
the characteristic diffraction length associated with the
input spot-size w0, and L ≡
LnlLd, such model can be
conveniently written as follows [12]
ψ + χθψ = −iα
εψ, (2)
−σ2∇2
θ + θ = |ψ|2, (3)
where α = α0L, ∇2⊥ = ∂2x + ∂2y , χ = n2/|n2| = ±1 is
the sign of the nonlinearity, and σ2 is a free parameter
that measures the degree of nonlocality. The peculiar
dimensionless form of Eqs. (2-3) where ε ≡ Lnl/L =
Lnl/Ld is a small quantity, highlights the fact that we
will deal with the weakly diffracting (or strongly non-
linear) regime, such that the local σ = 0 and lossless
α = 0 limit yields a semiclassical Schrödinger equation
with cubic potential (ε and z replace Planck constant
http://arxiv.org/abs/0704.0610v1
and time, respectively). We study Eqs. (2-3) subject
to the axi-symmetric gaussian input ψ0(r) = exp(−r2),
x2 + y2, describing a fundamental laser mode
at its waist. For ε ≪ 1, its evolution can be studied
in the framework of the WKB trasformation ψ(r, z) =
ρ(r, z) exp [iφ(r, z)/ ε] [4]. Substituting in Eqs. (2-3)
and retaining only leading orders in ε, we obtain
(D − 1)
ρu+ (ρu)r
= −αρ; uz + uur − χθr = 0,
θrr +
D − 1
+ θ = ρ. (4)
where u ≡ φr is the phase chirp, and D = 2 is the trans-
verse dimensionality. The 1D case described by Eqs. (4)
with D = 1 and r → x (∂y = 0) illustrates the ba-
sic physics with least complexity. In the defocusing case
(χ = −1) for an ideal medium (σ = α = 0, θ = ρ),
Eqs. (4) are a well known hyperbolic system of conser-
vation laws (Eulero and continuity equations) with real
celerities (or eigenspeeds, i.e. velocities of Riemann in-
variants) v± = u ±
−χρ, which rules gas dynamics (u
and ρ are velocity and mass density of a gas with pres-
sure ∝ ρ2). A gaussian input is known to develop two
symmetric shocks at finite z [4]. Importantly the diffrac-
tion, which is initially of order ε2, starts to play a major
role in the proximity of the overtaking point, and regu-
larize the wave-breaking through the appearance of fast
(wavelength ∼ ε) oscillations which connect the high and
low sides of the front and expand outwards (far from the
beam center) [3]. Such oscillations, characteristic of a col-
lisionless shock, appear simultaneously in intensity and
phase chirp (u) as clearly shown in Fig. 1(a,c).
In the nonlocal case, the index change θ(x) can be
wider than the gaussian mode (for large σ) and the shock
dynamics is essentially driven by the chirp u with ρ adia-
batically following. This can be seen by means of the fol-
lowing approximate solution of Eqs. (4): considering that
the equation for ρ is of lesser order [O(ǫ)], with respect
to those for θ and u [O(1)], we assume ρ = exp(−2x2)
unchanged in z and solve exactly the third of Eq. (4) for
θ(x) (though derived easily, its full expression is quite
cumbersome). Then, applying the theory of characteris-
tics [1], the second of Eqs. (4) is reduced to the following
ODEs, where dot stands for d/dz
ẋ = u ; u̇ = χθx, (5)
equivalent to the motion of a unit mass in the potential
V (x) = −χθ with conserved energy E = u(z)
+ V (x).
The solution of Eqs. (5) with initial condition x(0) =
s, u(0) = 0 yields x(s, z), u(s, z) in parametric form, from
which overtaking is found whenever u(x, z) (obtained
by eliminating s) becomes a multivalued function of x
at finite z = zs. The shock point corresponding to
|du/dx| → ∞ is found from the solution u(x, z) displayed
in Fig. 2(a) [ 2(b)], at positions x = ±xs 6= 0 (defocusing
FIG. 1: (Color online) 1D spatial profiles of phase chirp u(x)
(a-b) and amplitude |ψ(x)| =
ρ(x) (c-d), as obtained from
Eqs. (2-3) with ε = 10−3, α = 0, χ = −1 (defocusing), and
ψ0 = exp(−x
2), for different z as indicated: (a-c) local case,
σ2 = 0; (b-d) nonlocal case, σ2 = 5.
FIG. 2: (Color online) (a) u(x) for different z and χ = 1
(focusing), σ = 1; (b) as in (a) for χ = −1 (defocusing); (c)
shock distance zs (χ = −1 bold solid, χ = 1 thin solid) and
shock position xs in the defocusing case (dashed line).
case) or xs = 0 (focusing case). The shock distance zs
increases with σ in both cases, as shown in Fig. 2(c).
We have tested these predictions by integrating nu-
merically Eqs. (2-3). Simulations with χ = −1 [see Fig.
1(b,d)] show indeed steepening and post-shock oscilla-
tions in the spatial chirp u, which are accompanied by
a steep front in ρ moving outward. The shock location
in x and z is in good agreement with the results of our
approximate analysis summarized in Fig. 2.
Numerical simulations of Eqs. (2-3) validates also
the focusing scenario. The field evolution displayed in
Fig. 3(a) exhibits shock formation at the focus point
(xs = 0, zs ≃ 8, for σ = 5) driven the phase whose chirp
is shown in Fig. 3(b). This is remarkable because, in
the local limit σ = 0, the celerities become imaginary
(the equivalent gas would have pressure decreasing with
increasing density ρ), and no shock could be claimed to
exist. In this limit, the reduced problem (4) is elliptic and
the initial value problem is ill-posed [13], an ultimate con-
sequence of the onset of MI: modes with transverse (nor-
FIG. 3: (Color online) Level plot of the intensity in the
focusing case (χ = 1, ε = 0.01): (a) nonlocal case (σ2 =
25); (b) chirp profile for various z for (a); (c) quasi-local case
(σ2 = 10−5).
malized) wavenumber q < ∆q grow exponentially with z,
with both gain g and bandwidth ∆q scaling as 1/ε. How-
ever, the nonlocal response tends to frustrate MI (see also
Refs. [9, 12]), as shown by standard linear stability anal-
ysis which yields g =
d(2χ− d)/ε2 (we set d ≡ ε2q2/2
and χ ≡ χ/(1 + σ2q2)), in turn implying a strong reduc-
tion of both gain and bandwidth for large σ. In order
to emphasize the difference between the local and non-
local regime, we contrast Fig. 3(a) with the analogous
evolution [see Fig. 3(c)] obtained in the quasi-local limit
(σ2 = 10−5), which appears to be clearly dominated by
filamentation.
Thermal nonlinearity The physics of the defocusing
case can be experimentally tested by exploiting ther-
mal nonlinearities of strongly absorptive bulk samples,
that we show below to fit our model. In this case, the
system relaxes to a steady-state refractive index change
∆n = (dn/dT )∆T , where dn/dT is the thermal coef-
ficient, and ∆T the local temperature change due to
optical absorption. It is well known that this so-called
thermal lens distorts a laser beam propagating in the
medium [14, 15, 16]. However, only perturbative ap-
proaches to the problem have been proposed (ray optics
or Fresnel diffraction theory is applied after the lens pro-
file is worked out from gaussian ansatz [14]), while the
role of shock phenomena was completely overlooked.
FIG. 4: 2D evolution according to Eqs. (2-3) with σ2 = 1,
α = 1: (a) radial phase chirp at different z, as indicated,
showing steepening and shock formation for ε = 10−2; (b) cor-
responding intensity profile |ψ|2 (maximum scaled to unity)
at z = 4.9; (c) transverse intensity profile vs. x (at y = 0) at
z = 1/(4ε) and different values of ε (α0 = 62cm
−1, σ = 0.3).
We assume that the temperature field ∆T =
∆T⊥(X,Y ) obeys the following 2D heat equation
(∂2X + ∂
Y )∆T⊥ − C∆T⊥ = −γ|A|2 (6)
where the source term account for absorption pro-
portional to intensity through the coefficient γ =
α0/(ρ0cpDT ), where ρ0 the material density, cp the spe-
cific heat at constant pressure, and DT is the thermal
diffusivity (see e.g. [16]). Eq. (6) has been already em-
ployed to model a refractive index of thermal origin in
Ref. [10], and in Ref. [11] in the limit C = 0 which
is equivalent to consider the range of nonlocality (mea-
sured by 1/C, see below) to be infinite. Starting from
the 3D heat equation ∇2∆T = −γ|A|2, the latter regime
amounts to assume ∆T (X,Y, Z) = ∆T⊥(X,Y ), which is
justified when longitudinal changes in intensity |A|2 are
negligible as for solitary (invariant in Z) wave-packets
in the presence of low absorption [11]. Viceversa, in the
regime of strong absorption, we need to account for lon-
gitudinal temperature profiles that are known from solu-
tions of the 3D heat equations to be peaked at charac-
teristic distance Ẑ in the middle of sample and decay to
room temperature on the facets [14]. Since highly non-
linear phenomena occurs in the neighborhood of Ẑ where
the index change is maximum, we can use a (longitudi-
nal) parabolic approximation with characteristic width
Leff (∼ L) of the 3D temperature field ∆T (X,Y, Z) =
1− (Z−Ẑ)
∆T⊥(X,Y ) and consequently approximate
∇2∆T ≃ (∂2X + ∂2Y )∆T⊥ − L
eff∆T⊥, so that the 3D
heat equation reduces to Eq. (6) with C = 1/L2eff .
Following this approach, Eq. (6) coupled to Eq. (1)
can be casted in the form of Eqs. (2-3) by posing
θ = k0Lnl|dn/dT |∆T⊥ and σ2 = 1/(Cw20) = L2eff/w20.
The model reproduces the infinite range nonlocality for
negligible losses (Leff → ∞); while for thin samples
[|(∂2X + ∂2Y )∆T⊥| << |L
eff∆T⊥|], Leff can be related
to the Kerr coefficient n2 as
Leff =
γ|dn/dT |
DTρ0cp|n2|
α0|dn/dT |
which establishes a link between the degree of nonlocal-
ity and the strength of the nonlinear response (similarly
to other nonlocal materials [12]).
Having retrieved the model Eqs. (2-3), let us show
next that the scenario illustrated previously applies sub-
stantially unchanged in bulk (2D case) even on account
for the optical power loss (α 6= 0). An example of the
general dynamics is shown in Fig. 4, where we report a
simulation of the full model (2-3), with σ2 = 1 and rela-
tively large loss α = 1. In analogy to the 1D case, Fig.
4(a) clearly shows that the radial chirp u = φr steep-
ens and then develop characteristic oscillations after the
shock point (z ≃ 6, where |∂ru| → ∞). Correspond-
ingly the intensity exhibits also an external front which
is connected to a flat central region with a characteris-
tic overshoot [see Fig. 4(b)] corresponding to a brighter
ring [inset in Fig. 4(c)]. For larger distances this struc-
ture moves outward following the motion of the shock.
In the experiment such motion can be observed, at fixed
physical lenght, by increasing the power, which amounts
to decrease ε while scaling z and α accordingly (z ∝ 1/ε,
α ∝ ε), as displayed in Fig. 4(c) for σ = 0.3.
As a sample of a strongly absorbing medium we choose
a 1 mm long cell filled with an acqueous solution of
Rhodamine B (0.6 mM concentration). Our measure-
ments of the linear and nonlinear properties of the sam-
ple performed by means of the Z-scan technique gives
data consistent with the literature [17], and allows us
to extrapolate at the operating vacuum wavelength of
532 nm, a linear index n = 1.3, a defocusing nonlin-
ear index n2 = 7 × 10−7 cm2W−1, and α0 = 62 cm−1.
For our sample DT = 1.5 × 10−7 m2s−1, ρ0 = 103 kg
m−3, cp = 4 × 103Jkg−1K−1 and |dn/dT | = 10−4 K−1
(γ ∼= 104 K W−1), and exploiting Eq. (7) we estimate
Leff ∼= 10µm (Leff << L because of the strong ab-
sorption that causes strong heating of our sample near
the input facet), and correspondingly the degree of non-
locality σ ∼= 0.3. We operate with an input gaussian
beam with fixed intensity waist w0I = w0/
2 = 20 µm
(Ld ∼= 12 mm) focused onto the input face of the cell.
With these numbers, an input power P = πw20II0 = 200
mW yields a nonlinear length Lnl ∼= 8 µm (L ∼= 0.3 mm),
which allows us to work in the semiclassical regime with
ε ∼= 0.025. The radial intensity profiles together with the
2D patterns imaged by means of a 40×microscope objec-
tive and a recording CCD camera are reported in Fig. 5.
As shown the beam exhibits the formation of the bright
ring whose external front moves outward with increasing
power, consistently with the reported simulations. We
point out that, at higher powers, we observe (both ex-
perimentally and numerically) that the moving intensity
front leaves behind damped oscillations that correspond
to inner rings of lesser brightness, as reported in litera-
ture [15]. This, however, occurs well beyond the shock
point that we have characterized so far.
In summary, the evolution of a gaussian beam in the
strong nonlinear regime is characterized by occurence of
collisionless (i.e., regularized by diffraction) shocks that
survive the smoothing effect of (even strong) nonlocality.
While experimental results support the theoretical sce-
nario in the defocusing case, the remarkable result that
the nonlocality favours shock dynamics over filamenta-
tion requires future investigation.
∗ Electronic address: claudio.conti@phys.uniroma1.it
[1] G. B. Whitman, Linear and Nonlinear Waves (Wiley,
New York, 1974);
[2] L. D. Landau and E. M. Lifshitz, Fluid Mechanics (Perg-
amon, 1995); M. A. Liberman and A. L. Yelikovich,
Physics of Shock Waves in Gases and Plasmas (Springer,
FIG. 5: Radial profiles of intensity observed in the thermal
medium for different input powers. The insets show the cor-
responding 2D output patterns.
Heidelberg, 1986).
[3] A.V. Gurevich and L.P. Pitaevskii, Sov. Phys. JETP 38,
291 (1973); A.V. Gurevich and A. L. Krylov, Sov. Phys.
JETP 65, 944 (1987).
[4] J. C. Bronski and D. W. McLaughlin, in Singular Limits
of Dispersive Waves, NATO ASI Series, Ser. B 320, pp.
21-28 (1994); M. G. Forest and K. T. R. McLaughlin,
J. Nonlinear Science 7, 43 (1998); Y. Kodama, SIAM J.
Appl. Math. 59, 2162 (1999). M. G. Forest, J. N. Kutz,
and K. T. R. McLaughlin, J. Opt. Soc. Am. B 16, 1856
(1999).
[5] A. M. Kamchatnov, R. A. Kraenkel, and B. A. Umarov,
Phys. Rev. E 66, 036609 (2002).
[6] R. J. Taylor, D.R. Baker, and H. Ikezi, Phys. Rev. Lett.
24, 206 (1970).
[7] J. E. Rothenberg and D. Grischkowsky, Phys. Rev. Lett.
62, 531 (1989).
[8] M. A. Hoefer, M. J. Ablowitz, I. Coddington, E. A.
Cornell, P. Engels, and V. Schweikhard, Phys. Rev. A
74, 023623 (2006). V. M. Perez-Garcia, V.V. Konotop,
V.A. Brazhnyi, Phys. Rev. Lett. 92, 220403 (2004); B.
Damski, Phys. Rev. A 69, 043610 (2004);
[9] J. Wyller, W. Krolikowski, O. Bang, J. J. Rasmussen,
Phys. Rev. E 66, 066615 (2002).
[10] A. Yakimenko, Y. Zaliznyak and Y.S. Kivshar, Phys.
Rev. E 71, 065603(R) (2005).
[11] C. Rotschild, O. Cohen, O. Manela, M. Segev and T.
Carmon, Phys. Rev. Lett. 95, 213904 (2005).
[12] C. Conti, M. Peccianti and G. Assanto, Phys. Rev. Lett.
91 073901 (2003); Phys. Rev. Lett. 92 113902 (2004);
C. Conti, G. Ruocco and S. Trillo, Phys. Rev. Lett. 95
183902 (2005).
[13] P. D. Miller and S. Kamvissis, Phys. Lett. A 247, 75
(1998); J. C. Bronski, Physica D 152, 163 (2001).
[14] C. A. Carter and J. M. Harris, Appl. Opt. 23, 476 (1984);
S. Wu and N. J. Dovichi, J. Appl. Phys. 67, 1170 (1990);
F. Jürgensen and W. Schröer, Appl. Opt. 34 41 (1995).
[15] C. J. Wetterer, L. P. Schelonka, and M. A. Kramer, Opt.
Lett. 14, 874 (1989).
mailto:claudio.conti@phys.uniroma1.it
[16] P. Brochard, V. Grolier-Mazza and R. Cabanel, J. Opt.
Soc. Am. B 14, 405 (1997).
[17] S. Sinha, A. Ray, and K. Dasgupta, J. Appl. Phys. 87,
3222 (2000).
[18] paraxial diffraction in defocusing media is well known
to be isomorphus in 1D to propagation in a normally
dispersive focusing medium as considered in Ref. [7]
ABSTRACT
  We investigate the formation of collisionless shocks along the spatial
profile of a gaussian laser beam propagating in nonlocal nonlinear media. For
defocusing nonlinearity the shock survives the smoothing effect of the nonlocal
response, though its dynamics is qualitatively affected by the latter, whereas
for focusing nonlinearity it dominates over filamentation. The patterns
observed in a thermal defocusing medium are interpreted in the framework of our
theory.

<|endoftext|><|startoftext|>
Introduction
	Description of RBFNN
	Non parametric statistical modeling
	Quality of predictor
	Prediction of field evolution
	Time evolution of melt pool
	Optimal value of parameter 
	Optimal number of joint sample pairs
	Choosing the surrounding S
	Optimal prediction of melt pool evolution
	Conclusion
	References
ABSTRACT
  Efficient control of a laser welding process requires the reliable prediction
of process behavior. A statistical method of field modeling, based on
normalized RBFNN, can be successfully used to predict the spatiotemporal
dynamics of surface optical activity in the laser welding process. In this
article we demonstrate how to optimize RBFNN to maximize prediction quality.
Special attention is paid to the structure of sample vectors, which represent
the bridge between the field distributions in the past and future.

<|endoftext|><|startoftext|>
Introduction
Recently our understanding of the linear Balitsky–Fadin–Kuraev–Lipatov (BFKL) [4, 5] and
non-linear Jalilian-Marian–Iancu–McLerran–Weigert–Leonidov–Kovner (JIMWLK) [6–13] and
Balitsky-Kovchegov (BK) [14–18] small-x evolution equations in the Color Glass Condensate
[6–29] has been improved due to the completion of the calculations determining the scale of the
running coupling in the evolution kernel in [1,2,30,31]. The calculations in [1,2] proceeded by
including αsNf corrections into the evolution kernel and by then completing Nf to the complete
one-loop QCD beta-function by replacing Nf → −6πβ2. Calculation of the αsNf corrections
is particularly easy in the s-channel light-cone perturbation theory formalism [32, 33] used to
derive the BK and JIMWLK equations: there αs Nf corrections are solely due to chains of
quark bubbles placed onto the s-channel gluon lines.
The analytical results of [1,2] are not very concise and could not have been guessed without
an explicit calculation. After finding αsNf corrections, the obtained contributions had to be
divided into the running coupling part, which has a form of a running coupling correction
to the leading-order (LO) JIMWLK or BK kernel, and into the ”subtraction piece”, which
would bring in new structures into the kernel. Such separation had to be done both in [1] and
in [2]. Unfortunately, there appears to be no unique way to perform this separation: it is not
surprising, therefore, that it was done differently in both papers [1, 2]. This resulted in two
different running coupling terms, shown below in Eqs. (35) and (36) along with Eqs. (7) and
(8). Such a discrepancy has led to a misconception in the community that the calculations
of [1] and of [2] disagree at some fundamental level.
Indeed to compare the results of [1] and [2] one has to undo the separation into the running
coupling and subtraction terms: combining both terms one should compare full kernels of the
evolution equation obtained in [1, 2]. There is another more physical reason to perform such
comparison: in principle, there is no small parameter making the subtraction term smaller than
the running coupling term and thus justifying neglecting the former compared to the latter.
Even the labeling of one term as “running coupling” piece is somewhat misleading, since it may
give an impression that the neglected subtraction term has no running coupling corrections in
it. As was shown in [31] both terms actually contribute to the running coupling corrections to
the BFKL equation (if one uses the separation of [2] to define the terms).
In this paper we perform numerical analysis of the BK evolution equation with the αsNf
corrections resummed to all orders and with Nf completed to the QCD beta-function, Nf →
−6πβ2, with β2 given in Eq. (20). We first solve the BK equation keeping the running coupling
term only, with the kernels given by Eqs. (7) and (8). Indeed the solutions we find this way
are different from each other. We then evaluate the subtraction terms for both cases and show
that inclusion of subtraction terms puts the results of [1] and [2] in perfect agreement with each
other! We complete our analysis by solving the BK equation with the full kernel including both
the running coupling and subtraction terms.
This work is structured as follows. Section 2 begins with Sect. 2.1 in which we review
the αsNf corrections to the dipole scattering amplitude evolution equation recently derived
in [1, 2] and the subtraction method employed in both works to separate the running coupling
contributions from the subtraction terms. We discuss the scheme dependence of the running
coupling terms introduced by this separation. We proceed in Sect. 2.2 by deriving the explicit
expressions for the subtracted terms. The calculation is based on the results of [2]. Our
analytical results are summarized in Sect. 2.3, where we give the explicit final expression
for the kernel of the subtraction term in Eq. (39), which, combined with Eq. (38) gives us the
subtraction terms (40) and (41) for the subtractions performed in [1] and in [2] correspondingly.
In Sect. 3 we explain the numerical method we use to solve the evolution equations. We
also list the initial conditions used, along with the definition of the saturation scale employed.
Throughout the paper we will avoid the important question of the Landau pole and the contri-
bution of renormalons to small-x evolution. As we explain in Sect. 3, we will simply “freeze”
the running coupling at a constant value in the infrared. For a detailed study of the renormalon
effects in the non-linear evolution we refer the readers to [30].
Our numerical results are presented in Sect. 4. By solving the evolution equations with the
running coupling term only in Sect. 4.1 we show that the resulting dipole amplitude differs
significantly from the fixed coupling case. We also observe that the amplitude obtained by
solving the equation obtained in [2] is very close to the result of solving the BK evolution
with a postulated parent-dipole running of the coupling constant. Both these amplitudes are
quite different from the solution of the equation derived in [1], as one can see from Fig. 4.
In spite of that, all three evolution equations studied (the ones derived in [1], [2] and the
parent-dipole running coupling model) give approximately identical scaling function for the
dipole amplitude at high rapidity, as demonstrated in Fig. 5 in Sect. 4.2. It is worth noting
that, as can be seen from Fig. 6, the anomalous dimension we extracted from our solution
is γ ≈ 0.85, which is different from the fixed coupling anomalous dimension of γ ≈ 0.64.
The former anomalous dimension also appears to disagree with the predictions of analytical
approximations to the behavior of the dipole amplitude with running coupling from [34–38]. In
Sect. 4.3 we numerically evaluate the subtraction terms for both [1] and [2] and show that their
contributions are important, as shown in Fig. 7. However, subtraction terms decrease with
increasing rapidity, such that at high enough rapidities their relative contribution becomes
small (see Fig. 8). In Fig. 9 we show that inclusion of subtraction terms makes the results of [1]
and [2] agree with each other. Finally, the numerical solution of the full (all orders in αs β2)
evolution equation including both the running coupling and subtraction terms is performed in
Sect. 4.4. The results are shown in Fig. 10. All the main features of the evolution with the
running coupling are preserved in the full solution: the growth of the dipole amplitude and
of the saturation scale with rapidity is slowed down (for the latter see Fig. 11). The scaling
function of Fig. 5 is unaltered by the subtraction term, as shown in Fig. 12.
We summarize and discuss our main conclusions in Sect. 5.
2 Scheme dependence
2.1 Inclusion of running coupling corrections: general concepts
The BK evolution equation for the dipole scattering matrix reads
∂S(x0, x1; Y )
d2z K(x0, x1, z) [S(x0, z; Y )S(z, x1; Y )− S(x0, x1; Y )] , (1)
where
K(x0, x1, z) =
r21 r
is the kernel of the evolution. Here transverse two-dimensional vectors x0 and x1 denote the
transverse coordinates of the quark and the anti-quark in the parent dipole, while z is the
position of the gluon produced in one step of evolution [39–42]. We have introduced the
notation r = x0 − x1, r1 = x0 − z, r2 = z − x1 for the sizes of the parent and of the new
(daughter) dipoles created by one step of the evolution. The notation r ≡ |r| for all the 2-
dimensional vectors will be also employed throughout the rest of the paper. Eq. (1) admits
a clear physical interpretation: the original parent dipole, when boosted to higher rapidities,
may emit a new gluon which, in the large-Nc limit, is equivalent to a quark-antiquark pair.
Thus, the original dipole splits into two new dipoles sharing a common transverse coordinate:
the transverse position of the emitted gluon, z. The nonlinear term in the right hand side of
Eq. (1) accounts for either one of the two new dipoles interacting with the target, along with
the possibilities of only one dipole interacting or no interaction at all, while the subtracted
linear term reflects virtual corrections. The kernel of the evolution is just the probability of
one gluon emission calculated at leading logarithmic accuracy in αs ln(1/xB), where xB is the
fraction of momentum carried by the emitted gluon [39–42].
Under the eikonal approximation the dipole scattering matrix off a hadronic target at a
fixed rapidity is given by the average over the hadron field configurations of Wilson lines V
calculated along fixed transverse coordinates (those of the quark and of the antiquark). More
specifically
S(x0, x1; Y ) =
V (x0)V
†(x1)
〉 . (3)
Hence, the integrand of Eq. (1) can be regarded as a three point function in the sense that
the gluon fields of the target are evaluated at three different transverse positions, those of the
original quark and antiquark plus the one of the emitted gluon.
However, the inclusion of higher order corrections to the evolution equation via all order
resummation of αsNf contributions as recently derived in [1, 2] brings in new physical chan-
nels that modify the three point structure of the leading-log equation. The dipole structure
generated under evolution by diagrams like the one depicted in Fig. 1A (for more detailed dis-
cussion of the diagrammatic content of the high order corrections see [2]) is identical to the one
previously discussed for the leading order equation, the only novelty being that the propagator
of the emitted gluon is now dressed with quark loops, modifying the emission probability but
leaving untouched the interaction terms. On the contrary, diagrams like the one in Fig. 1B
in which a quark-antiquark pair (rather than a gluon) is added to the evolved wave function
modify the interaction structure of the evolution equation. The evolution of the parent dipole
scattering matrix driven by these kind of terms is proportional to the scattering matrix of the
two newly created dipoles (the one formed by the original quark and the new antiquark and
vice versa), ∼ S(x0, z1)S(z2, x1). This term depends on four different transverse coordinates,
i.e., it is a four point function and, therefore, its contribution to the evolution equation cannot
be accounted for by a mere modification of the emission kernel of the leading order equation.
To discuss in more detail the modifications introduced by the high order corrections, we
find it useful to rewrite the evolution equation in the following, rather general way
∂S(x0, x1; Y )
= F [S(x0, x1; Y )] (4)
where F is a functional of the dipole scattering matrix which for the original derivation of the
equation is given by the right hand side of Eq. (1). In general it can be decomposed into two
Figure 1: Schematic representation of the diagrams contributing to quark-NLO evolution.
pieces
F [S] = R [S]− S [S] . (5)
The first term, R, which we will call the ’running coupling’ contribution, gathers all the
higher order in αsNf corrections to the evolution that can be recast in a functional form that
looks identical to the leading order one but with a modified kernel, K̃, which includes all the
terms setting the scale for the running coupling:
R [S(x0, x1; Y )] =
d2z K̃(x0, x1, z) [S(x0, z; Y )S(z, x1; Y )− S(x0, x1; Y )] . (6)
The second term, S, henceforth referred to as the ’subtraction’ contribution, encodes those
contributions that depart from the three point structure of the leading-log equation. The
explicit derivation and expressions for this term are presented in the next section. The relative
minus sign between the two terms in Eq. (5) has been introduced for latter convenience.
Importantly, the decomposition of F into running coupling and subtraction contributions,
although constrained by unitarity arguments, is not unique. Two different separation schemes
have been proposed in [1, 2]. They are both based on a similar strategy, sketched in Fig. 2,
that can be summarized as follows. The newly created quark-antiquark pair added to the
wave function in the diagrams Fig. 1B is shrunk to a point, called the subtraction point, by
integrating out one of the coordinates in the dipole-qq̄ wave function, rendering the previously
discussed four point nature of these contributions into a three point one. This integrated
three point contribution is added to the running coupling contribution, whereas the original
four point term minus its integrated version are assigned to the subtraction contribution. The
divergence between the two approaches stems from the choice of the subtraction point. In
the subtraction scheme proposed by Balitsky in [1] the subtraction point is chosen to be the
transverse coordinate of either the quark, z2, or the antiquark, z1. The kernel for the running
coupling functional, Eq. (6), obtained in this way is
K̃Bal(r, r1, r2) =
Nc αs(r
r21 r
. (7)
On the other hand, in the subtraction procedure followed in [2] (which we will refer to as
KW) the zero size quark-antiquark pair is fixed at the transverse coordinate of the gluon,
x 0 x 0
Figure 2: Schematic representation of the subtraction procedure.
z = αz1 + (1 − α)z2, where α is the fraction of the gluon’s longitudinal momentum carried by
the quark, yielding the following expression for the kernel of the running coupling contribution:
K̃KW(r, r1, r2) =
− 2 αs(r
1)αs(r
αs(R2)
r1 · r2
r21 r
+ αs(r
, (8)
where
R2(r, r1, r2) = r1 r2
r1·r2
. (9)
As we shall discuss later, the scheme dependence originated by the choice of the subtraction
point is substantial and has an important effect in the solutions of the evolution equation when
only the running contribution is taken into account.
In our numerical study we will also consider the following ad hoc prescription for the kernel
of the running coupling functional in which the scale for the running of the coupling is set to
be the size of the parent dipole
K̃pd(r, r1, r2) =
Nc αs(r
r21 r
. (10)
This prescription is useful as a benchmark used to compare with previous numerical [3] and
analytical works [34, 35, 43] where this ansatz was used.
2.2 Derivation of the subtraction term
We begin by considering the NLO contribution to the kernel of the JIMWLK and BK evolution
equations with the s-channel gluon splitting into a quark-antiquark pair, which then interacts
with the target, as shown on the left hand side of Fig. 3. The contribution of this diagram has
been calculated in [2]. The resulting JIMWLK kernel is [2]
KNLO1 (x0, x1; z1, z2) = 4Nf
(2π)2
(2π)2
(2π)2
(2π)2
e−iq·(z−x0)+iq
′·(z−x
)−i(k−k′)·z
q2q′2
(1− 2α)2q · k k′ · q′ + q · q′ k · k′ − q · k′ k · q′
k2 + q2α(1− α)
k′2 + q′2α(1− α)
2α (1− α) (1− 2α)
k2 + q2α(1− α)
k′2 + q′2α(1− α)
k · q
k′ · q′
4α2 (1− α)2
k2 + q2α(1− α)
k′2 + q′2α(1− α)
 . (11)
The momentum labels in the above equation are explained on the left hand side of Fig. 3.
If k1 and k2 are the transverse momenta of the quark and of the antiquark in the produced
pair as shown in Fig. 3, then the transverse momentum of the gluon is q = k1 + k2. The
other transverse momentum we use is k = k1(1 − α) − k2α, where α is the fraction the of
gluon’s “plus” momentum carried by the quark, α ≡ k1+/(k1+ + k2+). The prime over the
transverse momentum denotes the momentum of the same particle in the complex conjugate
amplitude. For instance q′ is the momentum of the s-channel gluon in the complex conjugate
amplitude. Finally, z1 and z2 denote the transverse coordinates of the quark and the antiquark.
In Eq. (11) we use z12 = z1−z2 (the transverse separation between the quark and the antiquark)
and z = α z1 + (1− α) z2 (the transverse coordinate of the gluon).
Figure 3: A lowest order leading-Nf NLO correction which gives rise to the subtraction term
is shown on the left. The same diagram with the gluon lines “dressed” by chains of fermion
bubbles, as shown on the right, gives the full (resumming all powers of αµNf ) contribution to
the subtraction term. Calculation of the subtraction term is pictured in Fig. 2.
To obtain the BK kernel from Eq. (11) one should sum over all possible emissions of the
gluon off the quark and antiquark lines in the incoming dipole both in the amplitude and in
the complex conjugate amplitude, which is accomplished by
KNLO1 (x0, x1; z1, z2) = CF
m,n=0
(−1)m+n KNLO1 (xm, xn; z1, z2). (12)
Below we will label the JIMWLK kernel by calligraphic letter K and the corresponding BK
kernel by K.
The contribution of the kernel from Eq. (12) to the right hand side of the NLO version of
Eq. (1) is given by the following term
d2z1 d
1 (x0, x1; z1, z2)S(x0, z1, Y )S(z2, x1, Y ) (13)
with αµ the bare coupling.
As shown in Fig. 2, at the NLO level, the subtraction term introduced in Eq. (5), is then
defined by
SNLO[S] = α2µ
d2z1 d
2z2 K
1 (x0, x1; z1, z2)
× [S(x0, w, Y )S(w, x1, Y )− S(x0, z1, Y )S(z2, x1, Y )], (14)
where w is the point of subtraction in the transverse coordinate space. In [1] it was chosen to
be equal to the transverse coordinate of either the quark or the antiquark,
w = z1 or w = z2, (15)
as both choices lead to the same subtraction term SBalNLO[S]:
SBalNLO[S] =
d2z1 d
2z2 K
1 (x0, x1; z1, z2)
× [S(x0, z1, Y )S(z1, x1, Y )− S(x0, z1, Y )S(z2, x1, Y )] . (16)
In [2] the subtraction point was chosen to be the transverse coordinate of the gluon z,
w = z = α z1 + (1− α) z2. (17)
This leads to the following subtraction term, which we denote SKWNLO[S]:
SKWNLO[S] =
d2z1 d
1 (x0, x1; z1, z2)
× [S(x0, z, Y )S(z, x1, Y )− S(x0, z1, Y )S(z2, x1, Y )] . (18)
Indeed the complete kernel in Eq. (5) is independent of the choice of w. However, since the
subtraction term of Eq. (14) was neglected both in [1] and in [2], different choices of w led to
different expressions for the remaining running coupling part R[S], i.e., to different answers as
far as investigations in [1] and in [2] were concerned. Different choice of w is the main source
of the discrepancy of final answers of [1] and [2], though it does not imply any disagreement in
the full expression (5).
Our goal in this Section is to evaluate KNLO1 (x0, x1; z1, z2) from Eq. (11) including the
running coupling corrections. The s-channel light-cone perturbation theory formalism makes
such inclusion simple [2]: all we have to do is include infinite chains of quark bubbles on the
gluon lines in the amplitude and in the complex conjugate amplitude, as depicted on the right
hand side of Fig. 3. Performing calculations similar to those done in [2] one arrives at
K ❣1 (x0, x1; z1, z2) = 4Nf
(2π)2
(2π)2
(2π)2
(2π)2
e−iq·(z−x0)+iq
′·(z−x
)−i(k−k′)·z
q2q′2
(1− 2α)2q · k k′ · q′ + q · q′ k · k′ − q · k′ k · q′
k2 + q2α(1− α)
k′2 + q′2α(1− α)
2α (1− α) (1− 2α)
k2 + q2α(1− α)
k′2 + q′2α(1− α)
k · q
k′ · q′
4α2 (1− α)2
k2 + q2α(1− α)
k′2 + q′2α(1− α)
1 + αµβ2 ln
q2 e−5/3
1 + αµβ2 ln
q′2 e−5/3
) (19)
where K ❣1 denotes the kernel with the running coupling corrections resummed to all orders.
Just like in [2, 31], here we will use MS renormalization scheme. Inclusion of fermion bubble
chains generated two denominators at the end of Eq. (19), which is its only difference from
Eq. (11). Here
11Nc − 2Nf
. (20)
Now we have to perform the transverse momentum integrals in Eq. (19). First we expand
the denominators at the end of Eq. (19) into a power series and rewrite Eq. (19) as
K ❣1 (x0, x1; z1, z2) = 4Nf
n,m=0
(−αµβ2)n+m
(2π)2
(2π)2
(2π)2
(2π)2
e−iq·(z−x0)+iq
′·(z−x
)−i(k−k′)·z
q2q′2
(1− 2α)2q · k k′ · q′ + q · q′ k · k′ − q · k′ k · q′
k2 + q2α(1− α)
k′2 + q′2α(1− α)
2α (1− α) (1− 2α)
k2 + q2α(1− α)
k′2 + q′2α(1− α)
k · q
k′ · q′
4α2 (1− α)2
k2 + q2α(1− α)
k′2 + q′2α(1− α)
λ=λ′=0
where we have defined µ2 = µ2
e5/3 to make the expressions more compact.
Indeed we can not always expand the denominators of Eq. (19) into a geometric series
employed in Eq. (21), but one has to remember that the summation of bubble chain diagrams
shown on the right side of Fig. 3 gives one the geometric series. Hence the geometric series
come first: later they are absorbed into the denominators shown in Eq. (19), which is an
approximation not valid for all q and q′. Therefore, by keeping the geometric series in Eq. (21)
we are not making any approximations. In general, in what follows we are not going to keep
track of the issues of convergence of perturbation series. The contribution of renormalons to
non-linear small-x evolution was thoroughly investigated in [30] and was found to be significant
at low Q2. We refer the interested reader to [30] for more details on this issue.
Using the following formulas
(2π)2
e−ik·z
k2 + q2
K0(q z) (22)
(2π)2
e−ik·z
k2 + q2
q K1(q z) (23)
we can now perform the k- and k′-integrals in Eq. (21). Integrating over the angles of q and q′
as well yields
K ❣1 (x0, x1; z1, z2) =
(2π)4
n,m=0
(−αµβ2)n+m
dq q dq′ q′
z212 |z − x0| |z − x1|
− 4α ᾱ z12 · (z − x0) z12 · (z − x1) + z212 (z − x0) · (z − x1)
× J1(q |z − x0|)K1(z12 q
α ᾱ) J1(q
′ |z − x1|)K1(z12 q′
α ᾱ) + 2α ᾱ (α− ᾱ)
z12 · (z − x0)
z12 |z − x0|
J1(q |z − x0|)K1(z12 q
α ᾱ) J0(q
′ |z − x1|)K0(z12 q′
α ᾱ)
z12 · (z − x1)
z12 |z − x1|
J0(q |z − x0|)K0(z12 q
α ᾱ) J1(q
′ |z − x1|)K1(z12 q′
α ᾱ)
+ 4α2 ᾱ2 J0(q |z − x0|)K0(z12 q
α ᾱ) J0(q
′ |z − x1|)K0(z12 q′
α ᾱ)
λ=λ′=0
. (24)
We have defined
ᾱ = 1− α (25)
for brevity. Now the integrals over q and q′ can be carried out to give
K ❣1 (x0, x1; z1, z2) =
(2π)4
n,m=0
(−αµβ2)n+m
z212 µ
2 α ᾱ
)λ+λ′
Γ2(1 + λ)
Γ2(1 + λ′)
− 4α ᾱ z12 · (z − x0) z12 · (z − x1) + z212 (z − x0) · (z − x1)
(1 + λ) (1 + λ′)
z812 (α ᾱ)
1 + λ, 2 + λ; 2;−|z − x0|
α ᾱ z212
1 + λ′, 2 + λ′; 2;−|z − x1|
α ᾱ z212
α− ᾱ
z12 · (z − x0)
(1 + λ)F
1 + λ, 2 + λ; 2;−|z − x0|
α ᾱ z212
1 + λ′, 1 + λ′; 1;−|z − x1|
α ᾱ z212
z12 · (z − x1)
1 + λ, 1 + λ; 1;−
|z − x0|2
α ᾱ z212
(1 + λ′)F
1 + λ′, 2 + λ′; 2;−
|z − x1|2
α ᾱ z212
1 + λ, 1 + λ; 1;−
|z − x0|2
α ᾱ z212
1 + λ′, 1 + λ′; 1;−
|z − x1|2
α ᾱ z212
λ=λ′=0
. (26)
Unfortunately further simplification of the expression in Eq. (26) is impossible without approx-
imations. The series resulting from summation over n and m are likely to be divergent due to
renormalons. As we mentioned before, here we neglect the renormalon problem referring the
reader to [30]. Similar to how it was done in [2] we are not going to attempt to resum the series
exactly: instead we will calculate the next-to-leading order terms and assume that with a good
accuracy they give us the scale(s) of the running coupling constant. This procedure is similar
to the well-known prescription due to Brodsky, Lepage and Mackenzie [44].
Using the Taylor-expansions of hypergeometric functions
F (1 + λ, 2 + λ; 2; z) =
− λ 1
1 + ln(1− z) + 1
ln(1− z)
+ o(λ2). (27)
F (1 + λ, 1 + λ; 1; z) =
ln (1− z) + o(λ2) (28)
after some algebra we obtain
K ❣1 (x0, x1; z1, z2) =
[α (z1 − x0)2 + ᾱ (z2 − x0)2] [α (z1 − x1)2 + ᾱ (z2 − x0)2] z412
− 4α ᾱ z12 · (z − x0) z12 · (z − x1) + z212 (z − x0) · (z − x1)
1− αµ β2 ln
R2T (x0)µ
+ o(α2µ)
1− αµ β2 ln
R2T (x1)µ
+ o(α2µ)
+ 2α ᾱ (α− ᾱ) z212
z12 · (z − x0)
1− αµ β2 ln
R2T (x0)µ
+ o(α2µ)
1− αµ β2 ln
R2L(x1)µ
+ o(α2µ)
+ z12 · (z − x1)
1− αµ β2 ln
R2L(x0)µ
+ o(α2µ)
1− αµ β2 ln
R2T (x1)µ
+ o(α2µ)
+4α2 ᾱ2 z412
1− αµ β2 ln
R2L(x0)µ
+ o(α2µ)
1− αµ β2 ln
R2L(x1)µ
+ o(α2µ)
In arriving at Eq. (29) we employed functions RT (x) and RL(x), which have dimensions of
transverse coordinates and are defined by
R2T (x)µ
4 e−2γ−5/3
[α (z1 − x)2 + ᾱ (z2 − x)2]µ2MS
α ᾱ z212
(z − x)2
α (z1 − x)2 + ᾱ (z2 − x)2
α ᾱ z212
R2L(x)µ
4 e−2γ−5/3
[α (z1 − x)2 + ᾱ (z2 − x)2]µ2MS
α (z1 − x)2 + ᾱ (z2 − x)2
α ᾱ z212
The subscripts T and L stand for transverse and longitudinal gluon polarizations which give
rise to the two different functions under the logarithm.
Recombining the series in Eq. (29) into physical running couplings finally yields
α2µK ❣1 (x0, x1; z1, z2) =
[α (z1 − x0)2 + ᾱ (z2 − x0)2] [α (z1 − x1)2 + ᾱ (z2 − x0)2] z412
− 4α ᾱ z12 · (z − x0) z12 · (z − x1) + z212 (z − x0) · (z − x1)
R2T (x0)
R2T (x1)
+ 2α ᾱ (α− ᾱ) z212
z12 · (z − x0) αs
R2T (x0)
R2L(x1)
+z12 · (z − x1)αs
R2L(x0)
R2T (x1)
+ 4α2 ᾱ2 z412 αs
R2L(x0)
R2L(x1)
with the physical running coupling in the MS scheme given by
αs(1/R
1 + αµβ2 ln
R2 µ2
) . (33)
Eq. (32) is the contribution to the JIMWLK evolution kernel of the resummed diagram on
the right hand side of Fig. 3.
2.3 Brief summary of analytical results
Let us briefly summarize our analytical results. The non-linear small-x evolution equation with
the running coupling corrections included reads
∂S(x0, x1; Y )
= R [S]− S [S] . (34)
The first term on the right hand side of Eq. (34) is referred to as the running coupling
contribution. It was calculated independently in [1] and in [2]: the results of those calculations
are given above in Eqs. (7) and (8) correspondingly, which have to be combined with Eq. (6)
to obtain
RBal [S] =
d2z K̃Bal(x0, x1, z) [S(x0, z; Y )S(z, x1; Y )− S(x0, x1; Y )] (35)
RKW [S] =
d2z K̃KW(x0, x1, z) [S(x0, z; Y )S(z, x1; Y )− S(x0, x1; Y )] . (36)
One notices immediately that RBal [S] calculated in [1] is different from RKW [S] calculated
in [2] due to the difference in the kernels K̃Bal and K̃KW in Eqs. (7) and (8). However that
does not imply disagreement between the calculations of [1] and [2]: after all, it is the full
kernel on the right of Eq. (34), R [S]− S [S], that needs to be compared. To do that one has
to calculate the second term on the right hand side of Eq. (34).
The second term on the right hand side of Eq. (34) is referred to as the subtraction contri-
bution. It is given by
S[S] = α2µ
d2z1 d
2z2K ❣1 (x0, x1; z1, z2) [S(x0, w, Y )S(w, x1, Y )− S(x0, z1, Y )S(z2, x1, Y )]
with the resummed BK kernel
K ❣1 (x0, x1; z1, z2) = CF
m,n=0
(−1)m+n K ❣1 (xm, xn; z1, z2). (38)
The resummed JIMWLK kernel K ❣1 (xm, xn; z1, z2) is given by Eq. (32), along with Eqs. (30)
and (31) defining the scales of the running couplings. In the numerical solution below we will
replace Nf → −6πβ2 in its prefactor, obtaining
α2µK ❣1 (x0, x1; z1, z2) = −
[α (z1 − x0)2 + ᾱ (z2 − x0)2] [α (z1 − x1)2 + ᾱ (z2 − x0)2] z412
− 4α ᾱ z12 · (z − x0) z12 · (z − x1) + z212 (z − x0) · (z − x1)
R2T (x0)
R2T (x1)
+ 2α ᾱ (α− ᾱ) z212
z12 · (z − x0) αs
R2T (x0)
R2L(x1)
+z12 · (z − x1)αs
R2L(x0)
R2T (x1)
+ 4α2 ᾱ2 z412 αs
R2L(x0)
R2L(x1)
This substitution is the same as for all other factors ofNf . The same substitution was performed
in [2] to calculate the running coupling term. In fact, as was shown in [31], the linear part of the
subtraction term (calculated using the prescription of [2]) contributes to the running coupling
corrections to the BFKL equation. Therefore, in that case, the factor of Nf in front of Eq. (32)
is definitely a part of the beta-function. Hence the replacement Nf → −6πβ2 is justified even
in the subtraction term. Once again, in the numerical solution below we will use Eq. (39) along
with Eq. (38) in Eq. (37) to calculate the subtraction term S[S].
Substituting w = z1 (or, equivalently, w = z2) in Eq. (37) would yield the subtraction term
SBal[S] =α2µ
d2z1 d
2z2K ❣1 (x0, x1; z1, z2)
× [S(x0, z1, Y )S(z1, x1, Y )− S(x0, z1, Y )S(z2, x1, Y )] (40)
which has to be subtracted from RBal [S] calculated in [1] and given by Eq. (35) to obtain the
complete evolution equation resumming all orders of αsNf in the kernel.
Substituting w = z = α z1 + (1− α) z2 in Eq. (37) yields
SKW[S] =α2µ
d2z1 d
2z2K ❣1 (x0, x1; z1, z2)
× [S(x0, z, Y )S(z, x1, Y )− S(x0, z1, Y )S(z2, x1, Y )] (41)
which has to be subtracted from RKW [S] calculated in [2] and given in Eq. (36) again to obtain
the complete evolution equation resumming all orders of αsNf in the kernel. We checked
explicitly by performing analytic calculations that the two evolution equations obtained this
way agree at the NLO and NNLO. Below we will check the agreement of the two calculations
to all orders by performing a numerical analysis of the solutions of these equations.
The above discussion demonstrates that the separation of the evolution kernel into the
running coupling and subtraction pieces, as done in Eq. (34), is somewhat artificial, and has
no small parameter justifying one or another separation prescription. Therefore, the small-x
evolution equation including all running coupling (or, more precisely, αsNf ) corrections should
combine both terms in Eq. (34). Below we will solve such evolution equation numerically to
obtain the full small-x evolution with the running coupling.
3 Numerical setup and initial conditions
In our numerical study we consider the translational invariant approximation in which the
scattering matrix is independent of the impact parameter of the collision, i.e., S = S(r, Y ).
To solve the integro-differential equations, corresponding to the BK equation with running
coupling we employ a second-order Runge-Kutta method with a step size in rapidity ∆Y = 0.1.
We discretize the variable |r| into 800 points equally separated in logarithmic space between
rmin = 10
−8 and rmax = 50. Throughout this paper, the units of r will be GeV
−1, and those
of Qs will be GeV. All the integrals have been performed using improved adaptative Gaussian
quadrature methods. The accuracy of this numerical method has been checked in [3] to be
better than a 4% in all the r range.
We consider three different initial conditions for the dipole scattering amplitude, N(r, Y ) =
1− S(r, Y ). The first one is taken from the McLerran-Venugopalan (MV) model [22, 23]:
NMV (r, Y = 0) = 1− exp
. (42)
where a constant term has been added to the argument of the logarithm in the exponent in
order to regularize it for large values of r. The other two initial conditions are given by
NAN (r) = 1− exp
−(r Q
, (43)
with γ = 0.6 and γ = 0.8. These two last initial conditions will be referred hereinafter as AN06
and AN08 respectively. The interest in this ansatz, reminiscent of the Golec-Biernat–Wusthoff
model [45], is that the small-r behavior NAN ∝ r2γ corresponds to an anomalous dimension
1− γ of the unintegrated gluon distribution at large transverse momentum. (AN labels initial
conditions with anomalous dimension.) Our choices γ = 0.6 and γ = 0.8 can be motivated
a posteriori by the observation that the anomalous dimension of the evolved BK solution for
running coupling lies in between those two values and the one for the MV initial condition,
γ ≈ 1 (see Section 4.2). Thus, the choice of distinct initial conditions allows us to better
track the onset of the expected asymptotic universal behavior that is eventually reached at
high energies and to study the influence of the pre-asymptotic, non-universal corrections to the
solutions of the evolution equations. To completely determine our initial conditions, we set
Q′s = 1 GeV at Y=0 in Eqs. (42) and (43) and put Λ = 0.2 GeV. Although Q
s is normally
identified with the saturation scale, our definition of the saturation scale through the rest of
the paper will be purely pragmatical and given by the condition
N(r = 1/Qs(Y ), Y ) = κ, (44)
with κ = 0.5. We have checked that this choice of κ, albeit arbitrary, does not affect any of the
major conclusions to be drawn in the rest of the paper.
Finally, in order to avoid the Landau pole and to regularize the running coupling at large
transverse sizes we stick to the following procedure: for small transverse distances r < rfr, with
rfr defined by αs(1/r
fr) = 0.5, the running coupling is given by the one loop expression
αs(1/r
β2 ln
r2 Λ2
) (45)
with Nf = 3 and Λ = 0.2 GeV, whereas for larger sizes, r > rfr, we “freeze” the coupling at a
fixed value αs = 0.5. A detailed study of the role of Landau pole in non-linear small-x evolution
is given in [30].
4 Results
In this Section, we discuss our numerical results and how they compare to previous numerical
work and analytical estimates.
4.1 Running coupling
Fig. 4 shows the solutions of the evolution equation when only the running coupling contribution
is taken into account, i.e., neglecting the subtraction term in Eq. (34), for different initial
conditions and for the three schemes considered in this work: Balitsky’s, given by Eqs. (7) and
(35), KW, given by Eqs. (8) and (36), and the the ad hoc parent dipole implementation of the
running coupling, shown in Eq. (10).
MV init. cond.     Y=0,5,15,30KW
Balitsky
parent dipole
init. cond.
AN08 init. cond.     Y=0,5,15,30
AN06 init. cond.     Y=0,5,15,30
r )-1(GeV
Figure 4: Solutions of the BK equation at rapidities Y=0, 5, 15 and 30 (curves are labeled from
right to left) for the three running coupling schemes considered in this work: KW (solid line),
Balitsky (dashed line) and parent dipole (dashed-dotted lines). The initial conditions are MV
(top), AN08 (middle) and AN06 (bottom).
As previously observed in [3, 46], the most relevant effect of including running coupling
corrections in the evolution equation is a considerable reduction in the speed of the evolution
with respect to the fixed coupling case. This is a common feature of the different running
coupling schemes studied here and of other phenomenological ones considered in the literature
(a detailed comparison between the solutions for fixed coupling evolution and for parent dipole
running coupling can be found e.g. in [3]). This is not a surprising result, since a generic effect
of the running of the coupling is to suppress the emission of small transverse size dipoles, which
is the leading mechanism driving the evolution.
However, despite this common feature of the running coupling solutions, significant dif-
ferences are found between the solutions obtained under different schemes as we infer from
Fig. 4. In particular, the evolution is much faster with the KW prescription than with that of
Balitsky. Equivalently, the KW prescription yields a stronger growth of the saturation scale
with rapidity/energy than Balitsky’s. Moreover, the solutions obtained when the parent dipole
prescription is used lay much closer to those obtained within the KW scheme than to the ones
obtained when Balitsky’s scheme is applied, contrary to what was suggested in [1]. As argued
before, the differences observed in the solutions obtained using the two subtraction schemes
are entirely due to neglecting the subtraction contribution and reflect the arbitrariness of the
separation procedure.
4.2 Geometric scaling
It has been found in previous analytical [34, 36, 47] and numerical studies on the solutions
of the BK equation at leading order [3, 48–50] and for different heuristic implementations of
next-to-leading order corrections [3,46], including the parent dipole prescription for the running
coupling also considered in this work, that the solutions of the evolution equation at high enough
rapidities are no longer a function of two separate variables r and Y , but rather they depend
on a single scaling variable, τ = r Qs(Y ). This feature of the evolution, commonly referred to
as geometric scaling, is an exact property of the solutions for fixed coupling evolution due to
the conformal invariance of the leading-log kernel, and has become one of the key connections
between the saturation based formalisms and the phenomenology of heavy ion collisions and
deep inelastic scattering experiments [51–57].
It can be seen from Fig. 5 that the solutions of the BK equation with the running coupling
terms discussed in the previous section also exhibit the property of scaling, in agreement with
the analytical study carried out in [38], shown by the fact that the rescaled high rapidity
solutions lay on a single curve which is independent of both the running coupling scheme and
of the initial condition. The scaling behavior of the solution is observed in the whole τ range
studied in this work, including the saturation region, τ > 1. The tiny deviations from a pure
scaling behavior observed in Fig. 5 may be attributed to the fact that the full asymptotic
behavior is reached at even larger rapidities (Y & 80, [3]) than those achieved by the numerical
solution performed in this work.
Remarkably, the scaling function for both KW and Balitsky’s scheme coincides with the one
obtained with the parent dipole prescription, up to the above mentioned scaling violations. It
has been observed in [3, 46, 48] that the scaling function differs significantly in the fixed and
running coupling cases. Following that work, and to make a more quantitative study of the
scaling property, we fitted our solutions to the functional form [34]
f(τ) = a τ 2 γ
ln τ 2 + b
, (46)
with a, b and γ free parameters, within a fixed window below the saturation region, τ ∈
[10−5, 0.1]. Noticeably, at large enough rapidities the whole fitting window lays within the
geometric scaling window proposed in [36]: (Λ/Qs(Y )) < τ < 1, where Λ is some initial scale.
-210 -110 1
MV init. cond.   Y=0,40
Balitsky
parent dipole
fixed coupling
init. cond.
τ -110 1 10
AN06 init. cond.   Y=0,40
Figure 5: Solutions of the BK equation at rapidities Y=0 and 40 for KW (solid line), Balitsky
(dashed line) and parent dipole (dashed-dotted lines) schemes plotted versus the scaling variable
τ = rQs(Y ). The asymptotic solution obtained with fixed coupling αs = 0.2 at Y = 40 in [3]
is shown (black dashed-dotted line) for comparison. The initial conditions are MV (left) and
AN06 (right).
The value of γ extracted from the fits at rapidity Y = 40 lays in between γ ∼ 0.8 and γ ∼ 0.9.
This conclusion holds for the three initial conditions used here: the anomalous dimension
seems to converge to some intermediate value, in agreement with the value found in [3], for
asymptotic running coupling solutions (γ ∼ 0.85 at Y = 70). This result for anomalous
dimension is very far away from the value obtained in [3] for fixed coupling solutions (γ ∼ 0.64
at Y = 70) and from the predicted anomalous dimension for both running and fixed coupling
solutions from analytical studies of the equation based on saddle point techniques [34–38],
γc = χ(γc)/χ
′(γc) = 0.6275, where χ is the leading-log BFKL kernel.
It might be argued that the numerical value of the anomalous dimension extracted from our
fits is conditioned by the choice of the fitting function and by the fitting interval. Actually, it
was shown in [3] that the solutions of the evolution could be well fitted by other functional forms,
including the double-leading-log solution of BFKL, within a similar fitting region to the one
considered in this work. On the other hand, several phenomenological parameterizations of the
solution of the evolution have been proposed in [54–57] and have successfully confronted HERA
10 -410
10 -210 -110 1
 2*0.6τ ~ 
Fixed coupling
 2*0.85τ ~ 
Running coupling
Figure 6: Asymptotic solutions (Y=40) of the evolution equation for running coupling (solid
line) and fixed coupling with αs = 0.2 (dashed line). A fit to a power-law function aτ
2γ in the
region τ ∈ [10−6, 10−2] yields γ ≈ 0.85 for the running coupling solution and γ ≈ 0.6 for the
fixed coupling one.
and RHIC experimental data. There, the dipole scattering amplitude at arbitrary rapidity is
assumed to be given by a functional form analogous to our ansatz for the initial condition
Eq. (43), but allowing for geometric scaling violations by replacing γ → γ(r, Y ). The value
of the anomalous dimension at r = 1/Qs and/or for Y → ∞ is fixed to be the BFKL saddle
point, γc ∼ 0.63 (the saddle point value considered in [57] is slightly different, γ ∼ 0.53), while
the value γ = 1 is recovered in the limit r → ∞ at any finite rapidity. The success of these
phenomenological works supports the claim that the anomalous dimension of the solution is
given by the BFKL saddle point, in agreement with the above mentioned analytical predictions.
However, the relevant values of momenta probed at current phenomenological applications are
very distinct from the fitting region considered here. For example, the inclusive structure
function measured in HERA is fitted in [54,56] within the region 0.045 GeV2 < Q2 < 45 GeV2,
whereas charged hadron pt spectra in dAu collisions is well reproduced by [55–57] in the region
1 GeV < pt < 4.5 GeV. Note that, for both sets of data, the measured regions overlap with
the deeply saturated domain of the solution. On the contrary, our fitting region 10−5 < τ <
1 corresponds to values of momenta ∼ 10Qs(Y ) < pt < 105Qs(Y ) (always well above the
saturation scale), with Qs(Y = 40) ∼ 500÷1000 GeV for the different running coupling schemes
considered and, therefore, has no overlap with the kinematic regions measured experimentally,
since we scrutinize a momentum region strongly shifted to the ultraviolet compared to currently
available data. Moreover, it should be noticed that the rapidity interval covered by both sets
of experimental data is ∆Y < 4 in both cases, while we study the solutions of the evolution
(Y=0)]MVSub[N
-410 -310 -210 -110 1
(Y=30)]MVSub[N
(Y=0)]
Sub[N
10 -210 -110 1
(Y=30)]
Sub[N
(Y=0)]
Sub[N
10 -210 -110 1
(Y=30)]
Sub[N
Figure 7: Subtraction contribution calculated in the KW scheme (triangles) and in Balitsky’s
(stars). The trial functions correspond to the solutions of the evolution under Balitsky run-
ning coupling scheme at rapidities Y = 0, 30 for MV (left), AN08 (center) and AN06 initial
conditions.
at asymptotic rapidities, Y ∼ 40. We have checked that shifting our fitting region to larger
values of τ (smaller momentum) would bring the value of γ extracted from our fits closer to the
saddle point BFKL one, since the transition from the ultraviolet region to the deeply saturated
domain of the scaling solution is realized by a locally less steeper function (see Figs. (5) and
(12)). Therefore, there is no contradiction at all between the success of the phenomenological
parameterizations of the solutions and the results reported here.
With the above clarifications we reach the following conclusion: the asymptotic scaling
solutions corresponding to fixed and running coupling evolution are intrinsically different in
the whole r-range. This is emphasized in Fig. 6, where we represent the scaling solutions in
a log scale for τ < 1. It is clear that the tail of the distribution falls off with decreasing τ
much steeper for the running coupling solution than for the fixed coupling one. A fit to a
pure power-law function, f = a τ 2γ , in the region τ ∈ [10−6, 10−2] yields γ ∼ 0.85 for the
running coupling and γ ∼ 0.61 for the fixed coupling solution. The differences between fixed
and running coupling solutions at τ > 1 are evident from Fig. 5. This is a puzzling result that
remains to be understood from purely analytical methods.
(Y=0)]MV[N
10 -210 -110 1
(Y=30)]MV[N
(Y=0)]
10 -210 -110 1
(Y=30)]AN08[N
(Y=0)]
10 -210 -110 1
(Y=30)]AN06[N
Figure 8: Ratio of the subtraction over the running terms, D(r, Y ) = S[N(r, Y )]/R[N(r, Y )],
calculated in both KW (triangles) and Balitsky (stars) schemes for MV (left), AN08 (middle)
and AN06 (right) initial conditions at rapidities Y=0 (top) and Y=30 (bottom).
4.3 Subtraction Term
Before attempting to solve the complete evolution equation, and in order to gain insight in the
nature and structure of the subtraction contribution, we first evaluate the subtraction functional
for both Balitsky, Eq. (40), and KW, Eq. (41), schemes using a set of trial functions for S which
we choose to consist of the solutions of the evolution equation with the running coupling in
Balitsky’s scheme at different rapidities and of the three initial conditions considered above in
this work.
Two main remarks can be made about our results, shown in Fig. 7:
i) For all the trial functions considered in this work, the subtraction contribution is much
larger in the KW scheme than in Balitsky’s. A plausible explanation for this is that Balitsky’s
subtraction contribution, Eq. (40), when expanded in terms of dipole scattering amplitudes,
N = 1 − S, reduces to a sum of non-linear terms, since all the linear terms in the expansion
cancel each other due to the z1 ↔ z2 symmetry of the kernel, whereas in the KW case no such
cancellation happens and the subtraction contribution, Eq. (41), also includes linear terms,
which are dominant over the non-linear ones in the non-saturated domain where N ≪ 1.
ii) The subtraction contribution S has the same sign as the running coupling contribution
R in the whole τ range which, together with the relative minus sign assigned to the subtraction
term in Eq. (34), implies that the proper inclusion of the subtraction term reduces the value
of the functional that governs the evolution, F . In other words: the subtraction contribution
tends to systematically slow down the evolution, as we shall explicitly confirm in the next
subsection.
To better quantify the size of the subtraction contribution, we plot the ratio D(r, Y ) ≡
S[N(r, Y )]/R[N(r, Y )] in Fig. 8. At Y = 0, the relative weight of the subtraction contribution
with respect to the running one within the KW scheme and for a MV initial condition goes
from a D ∼ 0.4 at small τ to D ∼ 1 at τ ∼ 1. The same ratio for the Balitsky scheme takes
significantly smaller values: it goes from D ∼ 0.1 at small τ to D ∼ 0.4 for τ ∼ 1. As the
evolved solutions get closer to the scaling function, i.e. for larger rapidities, the r dependence of
the ratio becomes flatter and its overall normalization goes down to an approximately constant
value D ∼ 0.15 for the KW scheme and D ∼ 0.025 for that of Balitsky. This behavior remains
unaltered when going from rapidity Y = 20 to Y = 30, which suggests that the ratio may
saturate to a fixed value in the asymptotic region.
-210 -110 1
KWRun
BalRun
KW(Run-Sub)
(Run-Sub)
(Y=0)MVN
τ -210 -110 1
(Y=30)MVN
Figure 9: Total kernel F = R−S calculated under Balitsky’s scheme, Eqs (7) and (40), (solid
line) and under the KW scheme, Eqs (8) and (41), (dashed line). The overlap of the two lines
shows the agreement between the two calculations. Triangles stand for the running coupling
term calculated in the KW approach, RKW, while stars stand for the running coupling term
under Balitsky’s scheme, RBal. The trial functions N(r, Y ) correspond to the solution of the
evolution with only running coupling under Balitsky’s scheme at Y=0 (left) and Y=30 (right)
for a MV initial condition.
-110 1
1.2 MV i.c.
Y=0,3,10
KWRun
BalRun
F=Run-Sub
init. cond.
r )-1(GeV
-110 1
scaling function i.c.
Y=0,3,10
r )-1(GeV
Figure 10: Solutions of the complete (all orders in αs β2) evolution equation given in Eq. (34)
(solid lines), and of the equation with Balitsky’s (dashed lines) and KW’s (dashed-dotted)
running coupling schemes at rapidities Y = 0, 5 and 10. Left plot uses MV initial condition.
The right plot employs the initial condition given by the dipole amplitude at rapidity Y = 35
evolved using Balitsky’s running coupling scheme and with r-dependence rescaled down such
that Qs = Q
s = 1 GeV.
Finally, we have checked that combining the subtraction and running coupling contributions
for both schemes adds up to the same result. This is shown in Fig. 9, where we plot the value
of the total functional F = R − S calculated under the KW scheme (Eqs. (8) and (36) for
the running coupling term, R, and Eq. (41) for the subtraction term, S) and under Balitsky’s
scheme (Eqs. (7) and (35) for the running coupling term and Eq. (40) for the subtraction term).
The two results coincide within the estimation of the numerical accuracy previously discussed.
The agreement between the two results is better in the small-τ region, where the two curves lay
almost on top of each other. In the saturation region, τ & 1, the agreement is slightly worse,
although the differences between the values of F calculated in both schemes is still much less
than the differences between the running coupling terms themselves. This slight remaining
disagreement between the Balitsky’s and KW prescriptions may also be due to inaccuracies in
a Fourier transform of a geometric series performed in arriving at Eq. (39). This result serves
as a cross-check of our numerical method and as an additional confirmation of the agreement
of the independent calculations derived in [1, 2].
0 1 2 3 4 5 6 7 8 9
MV i.c.KWRun
BalRun
F=Run-Sub
)2(GeV
1 2 3 4 5 6 7 8 9
scaling function i.c.
Figure 11: Saturation scale corresponding to the solutions plotted in Fig. 10.
4.4 Complete running coupling BK equation
In this section we calculate the solutions of the complete evolution equation, Eq. (34), including
both the running and subtraction terms obtained by the all-orders αs Nf resummation and by
the Nf → −6πβ2 replacement. Since the numerical evaluation of the subtraction contribution
at each point of the grid and each step of the evolution would require an exceedingly large
amount of CPU time consumption, the strategy followed to include it in the evolution equation
consists of calculating such contribution only in a small set of grid points at each step of the
evolution, which we fixed at n = 16, between the points r1 and r2, which are determined at each
step of the evolution by the conditions N(Y, r1) = 10
−9, and N(Y, r2) = 0.99, and then using
power-law interpolation and extrapolation to the other points of the grid. Both the running
and subtracted terms are calculated within Balitsky scheme. This procedure is motivated by
the fact that, as discussed in the previous section, the subtraction contribution can be regarded
as a small perturbation with respect to the running coupling term within Balitsky’s scheme and
by the fact that it is a rather smooth function that can be well fitted by a power-law function in
most of the r-range. The accuracy of this procedure has been checked by doubling the number
of points at which the subtraction contribution is calculated at each step of the evolution, i.e.
by setting n=32. At Y=2, the differences between the solutions obtained with the two above
mentioned choices for n were less than a 8% in the tail of the solution, r < r1, and less than a
3% for r > r1.
The results of the evolution calculated in this way and using MV and rescaled asymptotic
running coupling solution (Y=35) as initial conditions are plotted in Fig. 10. They confirm
-210 -110 1 10
Y=10F=Run-Sub
(Y=35)  MV BALi.c.=N
Figure 12: Rescaled solutions given by the complete αs β2-evolution equation (solid line) and for
KW (dashed-dotted line) and Balitsky’s (dashed line) running coupling schemes at Y = 10. The
initial condition corresponds to the dipole amplitude at rapidity Y = 35 evolved using Balitsky’s
running coupling scheme and with r-dependence rescaled down such that Qs = Q
s = 1 GeV.
the expectations raised in the previous Subsection: the inclusion of the subtraction terms
considerably slows down the evolution with respect to the sole consideration of the running
coupling contributions. Moreover, the reduction in the speed of the wave front is much larger
for the KW scheme than for that of Balitsky one for both initial conditions. However, the closer
the initial condition is to the asymptotic running coupling scaling function, the smaller are the
effects of the subtraction contribution. These features can be better quantified by inspecting
the rapidity dependence of the saturation scale generated by the evolution, plotted in Fig. 11.
At rapidity Y = 10 the ratio of the saturation scale Qs yielded by the KW scheme to Qs given
by the complete αs β2-evolution equation is a factor of ∼ 2.5 for the MV initial condition and
a factor of ∼ 2.1 for the asymptotic running coupling initial condition. At the same rapidity,
the ratio of the saturation scale obtained under Balitsky’s scheme to Qs corresponding to the
complete αs β2-evolution is ∼ 1.25 for the MV initial condition and ∼ 1.15 for the scaling
function initial condition. Thus, in spite of the smallness of the ratio of the subtraction terms
to the running coupling contributions at high rapidity, which is ∼ 0.025 for Balitsky’s and
∼ 0.15 for KW scheme at Y = 30 (see bottom plots in Fig. 8), the proper inclusion of the
subtraction term results in fairly sizable effects in the solutions of the evolution equation.
Finally, we notice that the scaling behavior of the solution is not affected by the subtraction
term. This is seen in figure Fig. 12, where we evolve starting from an initial condition already
close to the running coupling scaling function and plot the solutions of the evolution equation
obtained with just running coupling terms (see Section 4.1) and the solution of the complete
αs β2-evolution at rapidity Y = 10. It is clear that, within the numerical accuracy, no departure
from the scaling behavior is observed. Therefore the main effect of a proper consideration of the
subtraction term is the one of reducing the speed of the evolution. It does not violate or modify
the geometric scaling property of the solutions established in Section 4.2. In our understanding
geometric scaling appears to persist when the running coupling effects are included because,
at high enough rapidity Qs(Y ) ≫ Λ, such that the new (from the LO standpoint) momentum
scale Λ introduced by the running coupling can be safely neglected. Hence the dynamics is
again characterized by a single momentum scale Qs(Y ). At the same time running coupling
does modify the evolution kernel, leading to a different shape of the scaling function.
5 Conclusions
In this paper we have taken into account all corrections to the kernels of the non-linear JIMWLK
and BK evolution equations containing powers of αs Nf . We reiterated the fact that the sep-
aration of the resulting kernel resumming all powers of αs Nf into the running coupling and
subtraction parts, as done in the previous calculations of [1, 2], is not justified parametrically.
We have then performed numerical analysis with the following conclusions.
• First we solved the evolution equations derived in [1] and [2] keeping only the running
coupling part or the evolution kernel and neglecting the subtraction term. Comparing to
the results for fixed coupling obtained in [3] we confirmed the conclusion reached in [3]
that the growth with rapidity is substantially reduced when running coupling corrections
are included. The results for three different initial conditions are shown in Fig. 4. We
observe that the solution of the equation derived in [2] differs significantly from that
derived in [1], but agrees (with good numerical accuracy) with the solution of the BK
evolution equation with the coupling running at the parent dipole size. (The latter is just
a model of the running coupling not resulting from any calculations, which we plot for
illustrative purposes.) We also observe that at sufficiently high rapidity both equations
from [1] and from [2] give us the same scaling function for the dipole amplitude N(r, Y )
as a function of r Qs(Y ), which is also in agreement with the scaling function given by
the parent dipole running, as shown in Fig. 5. The fact that the scaling is preserved when
the running coupling corrections are included was previously established in [3], though
for models of running coupling only. The shape of the scaling function is very different
from that obtained from the fixed coupling evolution equations. In particular, we found
that for dipole sizes below 0.1/Qs the anomalous dimension of the scaling function in the
running coupling case becomes γ ≈ 0.85 (see Fig. 6). This is different from the result of
several analytical estimates [34–38], which expect the anomalous dimension not to change
when running coupling corrections are included and to remain at its fixed coupling value
of γ ≈ 0.63.
• We have then evaluated the subtraction term for both calculations performed in [1] and
[2]. We demonstrated that subtracting the subtraction terms from the running coupling
terms makes the full answer agree for both calculations of [1] and [2], as shown in Fig. 9
for the right hand side of the evolution equation. It turns out that the subtraction
term SBal[S], which has to be subtracted from the result of [1], is systematically smaller
than SKW[S], to be subtracted from the result of [2], over the whole rapidity range
studied here. This implies that the result of [1] should have a smaller correction than
the result of [2] and is thus closer to the full answer. The subtraction terms SBal[S]
and SKW[S] are plotted in Fig. 7 as functions of the dipole size r for different values of
rapidity. Their relative contributions to the evolution kernel are shown in Fig. 8, where
we plotted the subtraction functional divided by the running coupling functional. From
those figures we conclude that both the magnitude of these extra terms and their relative
contribution to the evolution kernel decrease with increasing rapidity. Hence, while at
”moderate” rapidities (the ones closer to realistic experimental values) the subtraction
term is important for both calculations [1, 2], it becomes increasingly less important at
asymptotically large rapidities. The physics is easy to understand: the subtraction terms
are o(α2s), while the running coupling part of the kernel is o(αs). Hence, if we suppose
that the effective value of the coupling is given by its magnitude at the saturation scale
Qs(Y ), then, as rapidity increases, the coupling would decrease, making the subtraction
term much smaller than the running coupling term. Indeed, while at asymptotically high
rapidities the assumption of [1,2] that the subtraction term could be neglected is justified,
making the results of [1] and [2] agree with each other, for rapidities relevant to modern
days experiments the subtraction term is numerically important.
• With the last conclusion in mind we continued by numerically solving the full evolution
equation resumming all powers of αs Nf in the evolution kernel, which now would combine
both the running coupling and the subtraction terms. The five-dimensional integral in
the subtraction term (37) made obtaining this solution rather difficult. The outcome of
the calculation is shown in Fig. 10. All the main conclusions stated above were again
confirmed by the solution of the full equation. At asymptotically high rapidity scaling
regime is recovered, as can be seen from Fig. 12. As the subtraction term is less important
in that regime, the scaling function appears to be the same as in the case of having only
the running coupling term in the kernel. The anomalous dimension again turns out
to be γ ≈ 0.85, in disagreement with the analytical expectations of [34–38]. However,
the scaling of the saturation scale with rapidity appears to be in agreement with the
expectations of analytical work of [34, 35, 38], as shown in Fig. 11.
We conclude by observing that the knowledge of the non-linear small-x evolution equation
with all the running coupling corrections included brings us to an unprecedented level of pre-
cision allowing for a much more detailed comparison with experiments than was ever possible
before.
Acknowledgments
We would like to thank Heribert Weigert for many informative and helpful discussions at
the beginning of this work. A portion of the performed work was motivated by stimulating
discussions with Robi Peschanski, which we gratefully acknowledge.
This research is sponsored in part by the U.S. Department of Energy under Grant No. DE-
FG02-05ER41377. This work was supported in part by an allocation of computing time from
the Ohio Supercomputer Center.
References
[1] I. I. Balitsky, Quark Contribution to the Small-x Evolution of Color Dipole, Phys. Rev. D
75 (2007) 014001, [hep-ph/0609105].
[2] Y. Kovchegov and H. Weigert, Triumvirate of Running Couplings in Small-x Evolution,
Nucl. Phys. A 784 (2007) 188–226, [hep-ph/0609090].
[3] J. L. Albacete, N. Armesto, J. G. Milhano, C. A. Salgado, and U. A. Wiedemann,
Numerical analysis of the Balitsky-Kovchegov equation with running coupling:
Dependence of the saturation scale on nuclear size and rapidity, Phys. Rev. D71 (2005)
014003, [hep-ph/0408216].
[4] E. A. Kuraev, L. N. Lipatov, and V. S. Fadin, The Pomeranchuk singularity in
non-Abelian gauge theories, Sov. Phys. JETP 45 (1977) 199–204.
[5] Y. Y. Balitsky and L. N. Lipatov Sov. J. Nucl. Phys. 28 (1978) 822.
[6] J. Jalilian-Marian, A. Kovner, A. Leonidov, and H. Weigert, The BFKL equation from
the Wilson renormalization group, Nucl. Phys. B504 (1997) 415–431, [hep-ph/9701284].
[7] J. Jalilian-Marian, A. Kovner, A. Leonidov, and H. Weigert, The Wilson renormalization
group for low x physics: Towards the high density regime, Phys. Rev. D59 (1998) 014014,
[hep-ph/9706377].
[8] J. Jalilian-Marian, A. Kovner, and H. Weigert, The Wilson renormalization group for low
x physics: Gluon evolution at finite parton density, Phys. Rev. D59 (1998) 014015,
[hep-ph/9709432].
[9] J. Jalilian-Marian, A. Kovner, A. Leonidov, and H. Weigert, Unitarization of gluon
distribution in the doubly logarithmic regime at high density, Phys. Rev. D59 (1999)
034007, [hep-ph/9807462].
[10] A. Kovner, J. G. Milhano, and H. Weigert, Relating different approaches to nonlinear
QCD evolution at finite gluon density, Phys. Rev. D62 (2000) 114005, [hep-ph/0004014].
[11] H. Weigert, Unitarity at small Bjorken x, Nucl. Phys. A703 (2002) 823–860,
[hep-ph/0004044].
[12] E. Iancu, A. Leonidov, and L. D. McLerran, Nonlinear gluon evolution in the color glass
condensate. I, Nucl. Phys. A692 (2001) 583–645, [hep-ph/0011241].
[13] E. Ferreiro, E. Iancu, A. Leonidov, and L. McLerran, Nonlinear gluon evolution in the
color glass condensate. II, Nucl. Phys. A703 (2002) 489–538, [hep-ph/0109115].
[14] I. Balitsky, Operator expansion for high-energy scattering, Nucl. Phys. B463 (1996)
99–160, [hep-ph/9509348].
[15] I. Balitsky, Operator expansion for diffractive high-energy scattering, hep-ph/9706411.
http://arxiv.org/abs/hep-ph/0609105
http://xxx.lanl.gov/abs/hep-ph/0609105
http://arxiv.org/abs/hep-ph/0609090
http://xxx.lanl.gov/abs/hep-ph/0609090
http://arxiv.org/abs/hep-ph/0408216
http://xxx.lanl.gov/abs/hep-ph/0408216
http://arxiv.org/abs/hep-ph/9701284
http://xxx.lanl.gov/abs/hep-ph/9701284
http://arxiv.org/abs/hep-ph/9706377
http://xxx.lanl.gov/abs/hep-ph/9706377
http://arxiv.org/abs/hep-ph/9709432
http://xxx.lanl.gov/abs/hep-ph/9709432
http://arxiv.org/abs/hep-ph/9807462
http://xxx.lanl.gov/abs/hep-ph/9807462
http://arxiv.org/abs/hep-ph/0004014
http://xxx.lanl.gov/abs/hep-ph/0004014
http://arxiv.org/abs/hep-ph/0004044
http://xxx.lanl.gov/abs/hep-ph/0004044
http://arxiv.org/abs/hep-ph/0011241
http://xxx.lanl.gov/abs/hep-ph/0011241
http://arxiv.org/abs/hep-ph/0109115
http://xxx.lanl.gov/abs/hep-ph/0109115
http://arxiv.org/abs/hep-ph/9509348
http://xxx.lanl.gov/abs/hep-ph/9509348
http://arxiv.org/abs/hep-ph/9706411
http://xxx.lanl.gov/abs/hep-ph/9706411
[16] I. Balitsky, Factorization and high-energy effective action, Phys. Rev. D60 (1999)
014020, [hep-ph/9812311].
[17] Y. V. Kovchegov, Small-x F2 structure function of a nucleus including multiple pomeron
exchanges, Phys. Rev. D60 (1999) 034008, [hep-ph/9901281].
[18] Y. V. Kovchegov, Unitarization of the BFKL pomeron on a nucleus, Phys. Rev. D61
(2000) 074018, [hep-ph/9905214].
[19] L. V. Gribov, E. M. Levin, and M. G. Ryskin, Singlet structure function at small x:
Unitarization of gluon ladders, Nucl. Phys. B188 (1981) 555–576.
[20] A. H. Mueller and J.-w. Qiu, Gluon recombination and shadowing at small values of x,
Nucl. Phys. B268 (1986) 427.
[21] L. D. McLerran and R. Venugopalan, Green’s functions in the color field of a large
nucleus, Phys. Rev. D50 (1994) 2225–2233, [hep-ph/9402335].
[22] L. D. McLerran and R. Venugopalan, Gluon distribution functions for very large nuclei
at small transverse momentum, Phys. Rev. D49 (1994) 3352–3355, [hep-ph/9311205].
[23] L. D. McLerran and R. Venugopalan, Computing quark and gluon distribution functions
for very large nuclei, Phys. Rev. D49 (1994) 2233–2241, [hep-ph/9309289].
[24] Y. V. Kovchegov, Non-Abelian Weizsaecker-Williams field and a two- dimensional
effective color charge density for a very large nucleus, Phys. Rev. D54 (1996) 5463–5469,
[hep-ph/9605446].
[25] Y. V. Kovchegov, Quantum structure of the non-Abelian Weizsaecker-Williams field for a
very large nucleus, Phys. Rev. D55 (1997) 5445–5455, [hep-ph/9701229].
[26] J. Jalilian-Marian, A. Kovner, L. D. McLerran, and H. Weigert, The intrinsic glue
distribution at very small x, Phys. Rev. D55 (1997) 5414–5428, [hep-ph/9606337].
[27] E. Iancu and R. Venugopalan, The color glass condensate and high energy scattering in
QCD, hep-ph/0303204.
[28] H. Weigert, Evolution at small xbj: The Color Glass Condensate, Prog. Part. Nucl. Phys.
55 (2005) 461–565, [hep-ph/0501087].
[29] J. Jalilian-Marian and Y. V. Kovchegov, Saturation physics and deuteron gold collisions
at RHIC, Prog. Part. Nucl. Phys. 56 (2006) 104–231, [hep-ph/0505052].
[30] E. Gardi, J. Kuokkanen, K. Rummukainen, and H. Weigert, Running coupling and power
corrections in nonlinear evolution at the high-energy limit, Nucl. Phys. A784 (2007)
282–340, [hep-ph/0609087].
[31] Y. V. Kovchegov and H. Weigert, Quark loop contribution to BFKL evolution: Running
coupling and leading-N(f) NLO intercept, Nucl. Phys. A789, 260 (2007),
[hep-ph/0612071].
http://arxiv.org/abs/hep-ph/9812311
http://xxx.lanl.gov/abs/hep-ph/9812311
http://arxiv.org/abs/hep-ph/9901281
http://xxx.lanl.gov/abs/hep-ph/9901281
http://arxiv.org/abs/hep-ph/9905214
http://xxx.lanl.gov/abs/hep-ph/9905214
http://arxiv.org/abs/hep-ph/9402335
http://xxx.lanl.gov/abs/hep-ph/9402335
http://arxiv.org/abs/hep-ph/9311205
http://xxx.lanl.gov/abs/hep-ph/9311205
http://arxiv.org/abs/hep-ph/9309289
http://xxx.lanl.gov/abs/hep-ph/9309289
http://arxiv.org/abs/hep-ph/9605446
http://xxx.lanl.gov/abs/hep-ph/9605446
http://arxiv.org/abs/hep-ph/9701229
http://xxx.lanl.gov/abs/hep-ph/9701229
http://arxiv.org/abs/hep-ph/9606337
http://xxx.lanl.gov/abs/hep-ph/9606337
http://arxiv.org/abs/hep-ph/0303204
http://xxx.lanl.gov/abs/hep-ph/0303204
http://arxiv.org/abs/hep-ph/0501087
http://xxx.lanl.gov/abs/hep-ph/0501087
http://arxiv.org/abs/hep-ph/0505052
http://xxx.lanl.gov/abs/hep-ph/0505052
http://arxiv.org/abs/hep-ph/0609087
http://xxx.lanl.gov/abs/hep-ph/0609087
http://arxiv.org/abs/hep-ph/0612071
http://xxx.lanl.gov/abs/hep-ph/0612071
[32] G. P. Lepage and S. J. Brodsky, Exclusive processes in perturbative quantum
chromodynamics, Phys. Rev. D22 (1980) 2157.
[33] S. J. Brodsky, H.-C. Pauli, and S. S. Pinsky, Quantum chromodynamics and other field
theories on the light cone, Phys. Rept. 301 (1998) 299–486, [hep-ph/9705477].
[34] A. H. Mueller and D. N. Triantafyllopoulos, The energy dependence of the saturation
momentum, Nucl. Phys. B640 (2002) 331–350, [hep-ph/0205167].
[35] D. N. Triantafyllopoulos, The energy dependence of the saturation momentum from RG
improved BFKL evolution, Nucl. Phys. B648 (2003) 293–316, [hep-ph/0209121].
[36] E. Iancu, K. Itakura, and L. McLerran, Geometric scaling above the saturation scale,
Nucl. Phys. A708 (2002) 327–352, [hep-ph/0203137].
[37] S. Munier and R. Peschanski, Universality and tree structure of high energy QCD, Phys.
Rev. D70 (2004) 077503, [hep-ph/0401215].
[38] G. Beuf and R. Peschanski, Universality of QCD traveling-waves with running coupling,
hep-ph/0702131.
[39] A. H. Mueller, Soft gluons in the infinite momentum wave function and the BFKL
pomeron, Nucl. Phys. B415 (1994) 373–385.
[40] A. H. Mueller and B. Patel, Single and double BFKL pomeron exchange and a dipole
picture of high-energy hard processes, Nucl. Phys. B425 (1994) 471–488,
[hep-ph/9403256].
[41] A. H. Mueller, Unitarity and the BFKL pomeron, Nucl. Phys. B437 (1995) 107–126,
[hep-ph/9408245].
[42] Z. Chen and A. H. Mueller, The dipole picture of high-energy scattering, the BFKL
equation and many gluon compound states, Nucl. Phys. B451 (1995) 579–604.
[43] S. Munier and R. Peschanski, Traveling wave fronts and the transition to saturation,
Phys. Rev. D69 (2004) 034008, [hep-ph/0310357].
[44] S. J. Brodsky, G. P. Lepage, and P. B. Mackenzie, On the elimination of scale
ambiguities in perturbative quantum chromodynamics, Phys. Rev. D28 (1983) 228.
[45] K. Golec-Biernat and M. Wüsthoff, Saturation effects in deep inelastic scattering at low
Q2 and its implications on diffraction, Phys. Rev. D59 (1998) 014017, [hep-ph/9807513].
[46] M. A. Braun, Pomeron fan diagrams with an infrared cutoff and running coupling, Phys.
Lett. B 576 (2003) 115, [hep-ph/0308320].
[47] S. Munier and R. Peschanski, Geometric scaling as traveling waves, Phys. Rev. Lett. 91
(2003) 232001, [hep-ph/0309177].
http://arxiv.org/abs/hep-ph/9705477
http://xxx.lanl.gov/abs/hep-ph/9705477
http://arxiv.org/abs/hep-ph/0205167
http://xxx.lanl.gov/abs/hep-ph/0205167
http://arxiv.org/abs/hep-ph/0209121
http://xxx.lanl.gov/abs/hep-ph/0209121
http://arxiv.org/abs/hep-ph/0203137
http://xxx.lanl.gov/abs/hep-ph/0203137
http://arxiv.org/abs/hep-ph/0401215
http://xxx.lanl.gov/abs/hep-ph/0401215
http://arxiv.org/abs/hep-ph/0702131
http://xxx.lanl.gov/abs/hep-ph/0702131
http://arxiv.org/abs/hep-ph/9403256
http://xxx.lanl.gov/abs/hep-ph/9403256
http://arxiv.org/abs/hep-ph/9408245
http://xxx.lanl.gov/abs/hep-ph/9408245
http://arxiv.org/abs/hep-ph/0310357
http://xxx.lanl.gov/abs/hep-ph/0310357
http://arxiv.org/abs/hep-ph/9807513
http://xxx.lanl.gov/abs/hep-ph/9807513
http://arxiv.org/abs/hep-ph/0308320
http://xxx.lanl.gov/abs/hep-ph/0308320
http://arxiv.org/abs/hep-ph/0309177
http://xxx.lanl.gov/abs/hep-ph/0309177
[48] J. L. Albacete, N. Armesto, A. Kovner, C. A. Salgado, and U. A. Wiedemann, Energy
dependence of the Cronin effect from non-linear QCD evolution, Phys. Rev. Lett. 92
(2004) 082001, [hep-ph/0307179].
[49] M. Lublinsky, Scaling phenomena from non-linear evolution in high energy DIS, Eur.
Phys. J. C21 (2001) 513–519, [hep-ph/0106112].
[50] N. Armesto and M. A. Braun, Parton densities and dipole cross-sections at small x in
large nuclei, Eur. Phys. J. C20 (2001) 517–522, [hep-ph/0104038].
[51] A. M. Stasto, K. Golec-Biernat, and J. Kwiecinski, Geometric scaling for the total γ∗p
cross-section in the low x region, Phys. Rev. Lett. 86 (2001) 596–599, [hep-ph/0007192].
[52] N. Armesto, C. A. Salgado, and U. A. Wiedemann, Relating high-energy lepton hadron,
proton nucleus and nucleus nucleus collisions through geometric scaling,
hep-ph/0407018.
[53] J. L. Albacete, N. Armesto, J. G. Milhano, C. A. Salgado, and U. A. Wiedemann,
Nuclear size and rapidity dependence of the saturation scale from QCD evolution and
experimental data, Eur. Phys. J. C43 (2005) 353–360, [hep-ph/0502167].
[54] E. Iancu, K. Itakura, and S. Munier, Saturation and BFKL dynamics in the HERA data
at small x, Phys. Lett. B590 (2004) 199–208, [hep-ph/0310338].
[55] A. Dumitru, A. Hayashigaki, and J. Jalilian-Marian, Geometric scaling violations in the
central rapidity region of d + Au collisions at RHIC, Nucl. Phys. A770 (2006) 57–70,
[hep-ph/0512129].
[56] V. P. Goncalves, M. S. Kugeratski, M. V. T. Machado, and F. S. Navarra, Saturation
physics at HERA and RHIC: An unified description, Phys. Lett. B643 (2006) 273–278,
[hep-ph/0608063].
[57] D. Kharzeev, Y. V. Kovchegov, and K. Tuchin, Nuclear modification factor in d + Au
collisions: Onset of suppression in the color glass condensate, Phys. Lett. B599 (2004)
23–31, [hep-ph/0405045].
http://arxiv.org/abs/hep-ph/0307179
http://xxx.lanl.gov/abs/hep-ph/0307179
http://arxiv.org/abs/hep-ph/0106112
http://xxx.lanl.gov/abs/hep-ph/0106112
http://arxiv.org/abs/hep-ph/0104038
http://xxx.lanl.gov/abs/hep-ph/0104038
http://arxiv.org/abs/hep-ph/0007192
http://xxx.lanl.gov/abs/hep-ph/0007192
http://arxiv.org/abs/hep-ph/0407018
http://xxx.lanl.gov/abs/hep-ph/0407018
http://arxiv.org/abs/hep-ph/0502167
http://xxx.lanl.gov/abs/hep-ph/0502167
http://arxiv.org/abs/hep-ph/0310338
http://xxx.lanl.gov/abs/hep-ph/0310338
http://arxiv.org/abs/hep-ph/0512129
http://xxx.lanl.gov/abs/hep-ph/0512129
http://arxiv.org/abs/hep-ph/0608063
http://xxx.lanl.gov/abs/hep-ph/0608063
http://arxiv.org/abs/hep-ph/0405045
http://xxx.lanl.gov/abs/hep-ph/0405045
	Introduction
	Scheme dependence
	Inclusion of running coupling corrections: general concepts
	Derivation of the subtraction term
	Brief summary of analytical results
	Numerical setup and initial conditions
	Results
	Running coupling
	Geometric scaling
	Subtraction Term
	Complete running coupling BK equation
	Conclusions
ABSTRACT
  We study the solution of the nonlinear BK evolution equation with the
recently calculated running coupling corrections [hep-ph/0609105,
hep-ph/0609090]. Performing a numerical solution we confirm the earlier result
of [hep-ph/0408216] that the high energy evolution with the running coupling
leads to a universal scaling behavior for the dipole scattering amplitude. The
running coupling corrections calculated recently significantly change the shape
of the scaling function as compared to the fixed coupling case leading to a
considerable increase in the anomalous dimension and to a slow-down of the
evolution with rapidity. The difference between the two recent calculations is
due to an extra contribution to the evolution kernel, referred to as the
subtraction term, which arises when running coupling corrections are included.
These subtraction terms were neglected in both recent calculations. We evaluate
numerically the subtraction terms for both calculations, and demonstrate that
when the subtraction terms are added back to the evolution kernels obtained in
the two works the resulting dipole amplitudes agree with each other! We then
use the complete running coupling kernel including the subtraction term to find
the numerical solution of the resulting full non-linear evolution equation with
the running coupling corrections. Again the scaling regime is recovered at very
large rapidity.

<|endoftext|><|startoftext|>
Anomalous c-axis transport in layered metals
D. B. Gutman and D. L. Maslov
Department of Physics, University of Florida, Gainesville, FL 32611, USA
(Dated: November 4, 2018)
Transport in metals with strongly anisotropic single-particle spectrum is studied. Coherent band
transport in all directions, described by the standard Boltzmann equation, is shown to withstand
both elastic and inelastic scattering as long as EF τ ≫ 1. A model of phonon-assisted tunneling via
resonant states located in between the layers is suggested to explain a non-monotonic temperature
dependence of the c-axis resistivity observed in experiments.
PACS numbers: 72.10.-d,72.10.Di
Electron transport in layered materials exhibits a num-
ber of unusual properties. The most striking example is
a qualitatively different behavior of the in-plane (ρab)
and out-of-plane (ρc) resistivities: whereas the temper-
ature dependence of ρab is metallic-like, that of ρc is ei-
ther insulating-like or even non-monotonic. At the level
of non-interacting electrons, layered systems are metals
with strongly anisotropic Fermi surfaces. A commonly
used model is free motion along the planes and nearest-
neighbor hopping between the planes:
εk = k
||/2mab + 2J (1− cos k⊥d) , (1)
where k|| and k⊥ are in the in-plane and c-axis com-
ponents of momentum, respectively, mab is the in-plane
mass, and d is lattice constant in the c-axis direction. For
the strongly anisotropic case (J ≪ EF ), the equipoten-
tial surfaces are “corrugated cylinders” (see Fig.1).
If the Hamiltonian consists of the band motion with
spectrum (1) and the interaction of electrons with poten-
tial disorder as well as with inelastic degrees of freedom,
e.g., phonons, the Boltzmann equation predicts that the
conductivities are given by
σBab = e
2ν〈vavbτtr〉, σBc = 4e2νJ2d2〈sin2 (k⊥d) τtr〉, (2)
where 〈. . . 〉 denotes averaging over the Fermi surface and
over the thermal (Fermi) distribution, ν = mab/πd is the
density of states, and τtr is the transport time, resulting
from all scattering processes (we set h̄ = kB = 1). If
τtr decreases with the temperature, both σab and σc are
expected to decrease with T as well. This is not what
the experiment shows.
The c-axis puzzle received a lot of attention in con-
nection to the HTC materials [1], and a non-Fermi-liquid
nature of these materials was suggested to be responsible
for the anomalous c-axis transport [2]. However, other
materials, such as graphite [3], TaS2 [4], Sr2RuO4 [5], or-
ganic metals [6], etc., behave as canonical Fermi liquids
in all aspects but the c-axis transport. This suggests that
the origin of the effect is not related to the specific prop-
erties of HTC compounds but common for all layered
materials. A large number of models were proposed to
explain the c-axis puzzle. Despite this variety, most au-
thors seem to agree on that the coherent band transport
FIG. 1: Fermi surface corresponding to Eq.(1) with Fermi
velocity vectors at two different points.
in the c-axis direction is destroyed. Although there is no
agreement as to what replaces the band transport in the
”incoherent” regime, the most frequently discussed mech-
anisms include incoherent tunneling between the layers,
assisted by either out-of-plane impurities [8, 10, 11, 12] or
by coupling to dissipative environment [13], and polarons
[14, 15].
The message of this Letter is two-fold. First, we ob-
serve that neither elastic or inelastic (electron-phonon)
scattering can destroy band transport even in a strongly
anisotropic metal as long as the familiar parameter EF τ
is large. Nothing happens to the Boltzmann conductivi-
ties in Eq.(2) except for σBc becoming very small at high
temperatures so that other mechanisms, not included in
Eq.(2), dominate transport. This observation is in agree-
ment with recent experiment [7] where a coherent fea-
ture (angle-dependent magnetoresistance) was observed
in a supposedly incoherent regime. Second, we propose
phonon-assisted tunneling through resonant impurities as
the mechanism competing with the band transport. As
such tunneling provides an additional channel for trans-
http://arxiv.org/abs/0704.0613v1
port, the total conductivity is [8]
σc = σ
c + σres, (3)
where σres is the resonant-impurity contribution. Be-
cause σres increases with the temperature, the band chan-
nel is short-circuited by the resonant one at high enough
temperatures[9]. Accordingly, σc goes through a min-
imum at a certain temperature (and ρc = σ
c goes
through a maximum). We consider phonon-assisted tun-
neling through a wide band of resonant levels distributed
uniformly in space. We show that the non-perturbative
(in the electron-phonon coupling) version of this the-
ory is in a quantitative agreement with the experiment
on Sr2RuO4 [5]. Due to a similarity between phonon-
assisted tunneling and other problems, in which inter-
action leads to the formation of a cloud surrounding the
electron (such as polaronic effect and zero bias anomaly),
many ideas put forward earlier [8, 10, 11, 12, 13, 14, 15]
agree with our picture. Nevertheless, we believe that
only a combination of resonant impurities and electron-
phonon interaction solves the puzzle of c-axis resistivity
and provides a microscopic theory for some of the mech-
anisms considered in prior work. We begin with the dis-
cussion of the breakdown (or lack of it thereof) of the
Boltzmann equation.
One may wonder whether the band transport along
the c-axis breaks down because the Anderson localization
transition occurs in the c-direction whereas the in-plane
transport remains metallic. This does not happen, how-
ever, because an electron, encountering an obstacle for
motion along the c-axis, moves quickly to another point
in the plane, where such an obstacle is absent. More
formally, it has been shown the Anderson transition oc-
curs only simultaneously in all directions [16, 17, 18] and
only if J is exponentially smaller than 1/τ . Therefore,
localization cannot explain the observed behavior.
Refs.[19, 20] suggested an idea of the “coherent-
incoherent crossover”. It implies that the coherent band
motion breaks down if electrons are scattered faster than
they tunnel between adjacent layers, i.e., if Jτ ≪ 1. Con-
sequently, the current in the c-direction is carried via in-
coherent hops between conducting layers. It was noted
by a number of authors that the assumption about inco-
herent nature of the transport does not, by itself, explain
the difference in temperature dependences of σab and σc
[20, 21]: due to conservation of the in-plane momentum,
σc is proportional to τ both in the coherent and inco-
herent regimes. Nevertheless, an issue of the “coherent-
incoherent crossover” poses a fundamentally important
question: can scattering destroy band transport only in
some directions, if the spectrum is anisotropic enough
[22]? We argue here that this is not the case.
Since we have already ruled out elastic scattering, this
leaves inelastic one as a potential culprit. We focus on
the case of the electron-phonon interaction as a source of
inelastic scattering. For an isotropic metal, the quantum
kinetic equation is derived from the Keldysh equations of
motion for the Green’s function via the Prange-Kadanoff
procedure [23] for any strength of the electron-phonon in-
teraction. In this Letter, we apply the Prange-Kadanoff
theory to metals with strongly anisotropic Fermi surfaces,
such as the one in Fig. 1. We show that, exactly as in
the isotropic case, the Boltzmann equation holds its stan-
dard form as long as EF τe-ph ≫ 1. Since this form does
not change between coherent (Jτe-ph ≫ 1) and incoher-
ent (Jτe-ph ≪ 1) regimes, it means that the coherent-
incoherent crossover is, in fact, absent.
We adopt the standard Frölich Hamiltonian for
the deformation-potential interaction with longitudinal
acoustic phonons (ωq = sq)
k+qak
Since tunneling matrix elements are much more sensi-
tive to the increase in the inter-plane distance than the
elastic moduli, the anisotropy of phonon spectra in lay-
ered materials, albeit significant, is still weaker than
the anisotropy of electron spectra (see, e.g., Ref. [24]).
Therefore, we treat phonons in the isotropic approxima-
tion, and assume that the magnitude of the Fermi veloc-
ity is larger than the speed of sound s.
For a static and uniform electric field, the Keldysh
component of the electron’s Green function satisfies the
Dyson equation
L̂GK +
[ReΣR,⊗GK ]− + [ΣK ,⊗ReGK ]−
[ΣK ,⊗A]+ − [Γ,⊗GK ]+
. (5)
Here L̂ = (∂t + v · ∇R + eE · ∇k) is the Liouville op-
erator, A = i(GR − GA) is the spectral function, Γ =
ΣR − ΣA
, and ⊗ denotes the convolution in space and
time. Thanks to the Migdal theorem, the self-energy does
not depend on electron’s dispersion ξk ≡ εk − EF , and
Eq.(5) can be integrated over ξk. This results in an equa-
L̂gK +
[ReΣR, gK ]− = 2iΣ
K − 1
[Γ, gK ]+ (6)
for the “distribution function”
gK(ǫ, n̂) =
GK(ǫ, ξk, n̂)dξk , (7)
where n̂ = vk/ |vk| is a local normal to the Fermi surface.
We consider a linear dc response, when the self-energy
is needed only at equilibrium. Within the Migdal theory,
the Matsubara self-energy is given by a single diagram
Σ(ǫ, n̂) = −
g2 (q)G(ǫ− ω,k− q)D(ω, q) ,
where the dressed phonon propagator
D−1 = D−10 − g
is expressed through bare one
D0(ω, q) = −s2q2/
ω2 + s2q2
and polarization operator Π which, for EF > 2J, is given
by its 2D form
Π(ω, q) = −ν
1− |ω|/
v2F q
‖ + ω
We assume that the electron-phonon vertex decays on
some scale kD shorter than Fermi momentum (kD ≪ kF ).
This assumption allows one to linearize the dispersion
ξk−q ≈ ξk − vk · q and simplifies the analysis without
changing the results qualitatively. As long as J ≪ EF ,
we have |vk| ≈ kF /mab ≈ vF , where kF is the radius of
the cylinder in Fig. 1 for J = 0. Despite the fact that
the electron velocity does have a small component along
the c-axis, its in-plane component is large (cf. Fig. 1).
Since it is the magnitude of vk that controls the Migdal’s
approximation, the problem reduces to the interaction of
fast 2D electrons with slow 3D phonons. With these
simplifications, we find
ReΣR(ǫ, n̂) = −1
ǫ; (8a)
ImΣR(ǫ, n̂) = − ζ
12(1− ζ)2
, (8b)
where ζ = νg2 is a dimensionless coupling constant and
ωD = skD. We see that, despite the strong anisotropy,
the self-energy remains local, i.e., independent of ξk.
Vertex renormalization leads to two types of correc-
tions to the self-energy: those that are proportional to
the Migdal’s parameter (s/vF ) and those that are pro-
portional to ms2/ǫ. The second type of corrections inval-
idates the Migdal’s theory for temperatures below ms2,
which is about 1 K in a typical metal. For metals with
anisotropic spectrum the existence of such a scale is po-
tentially dangerous, since it is not obvious which of the
masses (light or heavy) defines this scale. We find that
the in-plane mass (mab) controls the vertex renormaliza-
tion for the nearly cylindrical Fermi surface. This shows
that the Migdal theory for layered metals has the same
range of applicability as for isotropic metals [25].
The rest of the derivation proceeds in the same way as
for the isotropic case [23], and the resulting Boltzmann
equation assumes its standard form. Since no assump-
tion about the relation between τe-ph and the dwell time
(1/J) has been made, the conductivities obtained from
the Boltzmann equation have the same form regardless
of whether Jτe-ph is large or small. In other words, there
is no coherent-incoherent crossover due to inelastic scat-
tering in an anisotropic metal [29].
The situation changes qualitatively if resonant impu-
rities are present in between the layers. Electrons that
tunnel through such impurities are moving with the speed
controlled by the broadening of a resonant level, i.e.,
much slower than speed of sound. For that reason they
can not be treated within the formalism outlined above
and require a separate study.
To evaluate the resonant-impurity contribution to the
conductivity, we assume that the impurities are randomly
distributed in space with density nimp whereas their en-
ergy levels uniformly distributed over an interval Eb. The
tunneling conductance of a bilayer junction is
G = −e2
dǫdǫ′Wǫ,ǫ′
(1 − n′ǫ) +
, (9)
where Wǫ,ǫ′ is a transition probability per unit time and
nǫ is the Fermi function. To calculateWǫ,ǫ′ , we use the re-
sults of Ref.[30, 31] for the probability of phonon-assisted
tunneling through a single impurity
Wǫ,ǫ′ = ΓLΓR
it1(ǫ
dt2dt3e
i(t2−t3)(ǫ−ǭ0)−Γ(t2+t3) (10)
× exp
|αq|2
|1− e−it3 + eit1
e−it2 − 1
|2 coth
e−it3 + eit2 + eit1(e−it2 − 1)(1− eit3)− c.c.
where αq = −iΛq/
ρωq, Λ is the deformation-potential
constant, ΓL and ΓR are tunneling widths of the resonant
level, Γ = ΓL + ΓR, and ǭ0 is the energy of a resonant
level renormalized by the electron-phonon interaction. In
the limit of no electron-phonon interaction, Eq.(10) re-
produces the well-known Breit-Wigner formula. From
now on, we consider a wide band of resonant levels:
Eb ≫ T ≫ Γ. Averaging Eq.(10) over spatial and en-
ergy positions of resonant levels, one obtains
σres=σel
1−coth
sinh2
dteitǫ−λf(t)
f(t)=
(1−cos(ωt)) coth
+i sin(ωt)
.(11)
Here σel is the conductivity due to elastic resonant tun-
neling and λ ≡ Λ2ω2D/ρs5π2 is the dimensionless cou-
pling constant for localized electrons. In the absence
of electron-phonon interaction, σres is temperature inde-
pendent and given by σel ≃ πe2Γ1nimpa0d/Eb[32], where
a0 is the localization radius of a resonant state and
Γ1 ≃ ǫ0e−d/a0 is its typical width. We note that the
electron-phonon interaction is much stronger for localized
electrons than for band ones: λ/ζ ∼ (kFd) (vF /s) ≫ 1.
Since typically ζ ∼ 1, one needs to consider a non-
perturbative regime of phonon-assisted tunneling. In
that case, resonant tunneling is exponentially suppressed
at T = 0: σres(T = 0) = σele
−λ/2. At finite T , we find
σres = σel
e−λ/2
1 + π
, T ≪ ωD√
, T ≫ λωD.
As T increases, σres growth, resembling the zero-bias
anomaly in disordered metals and Mössbauer effect. At
high temperatures (T ≫ λωD) σres approaches the non-
interacting value (σel). The asymptotic regimes in the
interval ωD/
λ ≪ T ≪ λωD can also be studied but we
will not pause for this here. Notice that, in contrast to
the phenomenological model of Ref.[8], there is no simple
relation between the T -dependences of σBc and σres.
To compare our model with the experiment, we extract
σBc from the low-temperature (between 10 and 50 K) c-
axis resistivity of Sr2RuO4 and extrapolate it to higher
temperatures [5]. The resonant part of the conductivity
is calculated numerically using Eq.(11). The fit to the
data for σel = 43 · 103Ω−1 cm−1, ωD = 41 K and λ = 16
is shown in Fig. 2. The agreement between the theory
and experiment is quite good and the values of the fitting
parameters are reasonable. An immediate consequence of
our model is the sample-to-sample variation of the c-axis
conductivity. Among the layered materials, the largest
amount of data is collected for graphite [3]. Even within
the group of samples with comparable in-plane mobili-
ties, the temperature of the maximum in ρc varies from
40K to 300 K [3, 33].
To conclude, we have shown that the Boltzmann
equation and its consequences are no less robust for
anisotropic metals than they are for isotropic ones. The
only condition controlling the validity of the Boltzmann
equation is the large value of EF τ, regardless of whether
τ comes from elastic or inelastic scattering. Out-of-
plane localized states change the c-axis transport rad-
ically while playing only minor role for the in-plane
one. While ρab remains metallic, an interplay between
100 200 300 400
T,     Kelvin
FIG. 2: ρc vs temperature. Solid: experimental data on
Sr2RuO4; dashed: fit into the phonon-assisted tunneling
model in the non-perturbative regime, Eq.(11)
phonon-assisted tunneling and conventional momentum
relaxation causes insulating or non-monotonic depen-
dence of ρc on temperature. This model is in a good
agreement with the experimental data on Sr2RuO4.
This research was supported by NSF-DMR-0308377.
We acknowledge stimulating discussions with B. Alt-
shuler, A. Chubukov, A. Hebard, S. Hill, P. Hirschfeld,
P. Littlewood, D. Khmelnistkii, N. Kumar, Yu. Makhlin,
A. Mirlin, M. Reizer, A. Schofield, S. Tongay, A.A. Var-
lamov, and P. Wölfle. We are indebted to A. Hebard, A.
Mackenzie, and S. Tongay for making their data available
to us.
[1] S. L. Cooper and K. E. Gray, in Physical Properties
of High Temperature Superconductors, edited by D. M.
Ginsberg, (World Scientific, Singapore, 1994), p. 61.
[2] P. W. Anderson, Science 256, 1526 (1990); P. W. An-
derson and Z. Zou, Phys. Rev. Lett. 42, 2642 (1992); D.
G. Clarke, S. P. Strong, and P. W. Anderson, Phys. Rev.
Lett. 72, 3218-3221 (1994).
[3] see N. B. Brandt, S. M. Chudinov, and Ya. G.
Ponomarev, Semimetals: I. Graphite and its com-
pounds, (North-Holland, Amsterdam, 1988) and refer-
ences therein.
[4] W. J. Wattamaniuk, J. P. Tidman, and R. F. Frindt,
Phys. Rev. Lett. 35 62 (1975).
[5] A. W. Tyler, A. P. Mackenzie, S. NishiZaki, and Y.
Maeno, Phys. Rev. B 58, 10107 (R) (1998).
[6] J. Singleton and C. Mielke, Contemp. Phys. 43, 63
(2002).
[7] J. Singleton et al. cond-mat/0610318.
[8] A. Rojo and K. Levin, Phys. Rev. B 48, 16861 (1993).
[9] V. Fleurov, M. Karpovski, M. Molotskii, A. Palevski, A.
Gladkikh, R. Kris Solid State Comm. 97, 543, (1996).
[10] M. J. Graf, M. Palumbo, D. Rainer, and J. A. Sauls,
Phys. Rev. B 52, 10588 (1995).
http://arxiv.org/abs/cond-mat/0610318
[11] P. J. Hirschfeld, S. M. Quinlan, and D. J. Scalapino,
Phys. Rev. B 55, 12742 (1997).
[12] A. A. Abrikosov, Physica C 317-318, 154 (1999).
[13] M. Turlakov and A. J. Leggett, Phys. Rev. B 63, 064518
(2001).
[14] U. Lundin and R. H. McKenzie,Phys. Rev. B 68,
081101(R) (2003).
[15] A. F. Ho and A. J. Schofield, Phys. Rev. B 71, 045101
(2005)
[16] P. Wölfle and R. N. Bhatt, Phys. Rev. B 30, 3542 (1984).
[17] N. Kumar, P. A. Lee, and B. Shapiro, Physica A 168,
447 (1990).
[18] N. Dupuis, Phys. Rev B 56, 9377 (1997).
[19] N. Kumar and A. M. Jayannavar, Phys. Rev. B 45, 5001
(1992).
[20] P. Moses and R. H. McKenzie, Phys. Rev. B 60, 7998
(1999).
[21] L. Ioffe, A. Larkin, A. Varlamov, and L. Yu, Phys. Rev.
B 47, 8936 (1993).
[22] D. G. Clarke, S. P. Strong, P. M. Chaikin, and E. I.
Chashechkina, Science 279, 2071 (1998).
[23] J. Rammer and H. Smith, Rev. Mod. Phys. 58, 323
(1986).
[24] J. Paglione, C. Lupien, W.A. MacFarlane, J.M. Perz, L.
Taillefer, Z.Q. Mao, and Y. Maeno, Phys. Rev. B 65,
220506(R).
[25] The self-energy in Eqs.(8a,8b) diverges at ζ = 1. This di-
vergence –also present for the isotropic case – results from
the renormalization of the sound velocity and is an arte-
fact of the Frölich Hamiltonian. A divergence-free theory
is obtained by applying the adiabatic approximation to
the coupled system of electrons and ions [26, 27, 28].
[26] J. R. Schrieffer, Theory of Superconductivity, (Addison-
Wesley, Redwood City, 1988).
[27] E.G. Brovman and Yu. Kagan, Sov. Phys. JETP 25, 365
(1967).
[28] B. T. Geilikman, Sov. Phys.-Usp. 18, 190 (1975).
[29] Migdal’s theory also rules out models based entirely on
polaronic effects since polarons are stable only if a typical
electron velocity does not exceed the sound one.
[30] L.I. Glazman and R.I. Shekhter, Sov. Phys. JETP 61
163, (1988).
[31] N. S. Wingreen, K. W. Jacobsen, and J. W. Wilkins,
Phys. Rev. Lett. 59, 376 (1987); Phys. Rev. B 40, 11834
(1989).
[32] A.I. Larkin and K. Matveeev, Sov. Phys. JETP 66, 580
(1987).
[33] S. Tongay and A. F. Hebard, private communication.
ABSTRACT
  Transport in metals with strongly anisotropic single-particle spectrum is
studied. Coherent band transport in all directions, described by the standard
Boltzmann equation, is shown to withstand both elastic and inelastic scattering
as long as $E_F\tau\gg 1$. A model of phonon-assisted tunneling via resonant
states located in between the layers is suggested to explain a non-monotonic
temperature dependence of the c-axis resistivity observed in experiments.

<|endoftext|><|startoftext|>
Introduction to complex analytic geometry. Translated from
the Polish by Maciej Klimek, Birkhäuser Verlag, Basel, 1991.
[Nik-Tho-Zwo 2007] N. Nikolov, P. J. Thomas, W. Zwonek, Discontinuity of the Lempert function
and the Kobayashi-Royden metric of the spectral ball, preprint.
[Ran-Whi 1991] T. J. Ransford, M. C. White, Holomorphic self-maps of the spectral unit ball,
Bull. London Math. Soc. 23 (1991), 256–262.
[Ros 2003] J. Rostand, On the automorphisms of the spectral unit ball, Studia Math.
155 (2003), 207–230.
[Rud 1980] W. Rudin, Function theory in the unit ball of Cn (1980), Grundlehren der
Mathematischen Wissenschaften 241 Springer-Verlag, New York-Berlin.
Instytut Matematyki, Uniwersytet Jagielloński, Reymonta 4, 30-059 Kraków,
Poland
E-mail address: Wlodzimierz.Zwonek@im.uj.edu.pl
ABSTRACT
  We prove an Alexander type theorem for the spectral unit ball $\Omega_n$
showing that there are no non-trivial proper holomorphic mappings in
$\Omega_n$, $n\geq 2$.

<|endoftext|><|startoftext|>
Introduction
The (Fitch) parsimony length of a character on a tree equals the minimum number of state
changes (substitutions) required to fit the character onto a tree (Fitch, 1971). We turn this
definition on its head and show how the parsimony length of a character equals the minimum
number of changes in the tree required to fit the tree onto the character. This may be a
back-to-front way to look at parsimony, but it is also a useful one. We detail two applications
of the result.
The first application is that this reformulation of parsimony provides a closer link between
parsimony based analysis and supertree methods. We demonstrate that the maximum parsi-
mony tree can be viewed as a type of median consensus tree, where the median is computed
with respect to the SPR distance (see below). As well, the result shows how to conduct a
parsimony based analysis not just on characters but on trees, without having to recode the
trees as binary character matrices. This opens the way to a hybrid between the consensus
approach and the total evidence approach, where the data is a mix characters, trees, and
subtrees.
The second application of our observation on parsimony is to the analysis of pairs of
characters. We show that the score of the maximum parsimony tree for two characters is a
simple function of the smallest number of recombinations required to explain the incongru-
ence between the characters without homoplasy. This result provides the basis of a highly
efficient test for recombination (Bruen et al., 2006).
Here and throughout the paper we assume that all phylogenetic trees are fully resolved
(bifurcating) and that by ‘parsimony’ we refer to Fitch parsimony, where the character states
are unordered and reversible. Some of the results presented here can be extended to other
forms of parsimony, and possibly to incompletely resolved trees (Bruen, 2006), lie beyond
the scope of this paper.
Note that in this paper we are dealing with unrooted SPR rearrangements, which are those
used in tree searches. There is a related, but distinct, concept of rooted SPR rearrangements,
where the rearrangements are restricted to obey a type of temporal constraint Song (2003). It
is this latter class of rooted SPR rearrangments that are used to model lateral gene transfers
and recombination. It would be a worthwhile, but challenging, goal to investigate whether
any of the results on unrooted SPR rearrangements in this paper can be extended to rooted
SPR rearrangements.
Linking Parsimony with SPR
A subtree-prune and regraft (SPR) rearrangement is an operation on phylogenetic trees
whereby a subtree is removed from one part of the tree and regrafted to another part of
the tree, see Figure 1, (Felsenstein, 2004; Swofford et al., 1996). These SPR rearrangements
are widely used by tree searching software packages like PAUP (Swofford, 1998) and Garli
(Zwickl, 2006). The SPR distance between two trees can be defined as the minimal num-
ber of SPR rearrangements required to transform one tree into the other (Hein, 1990; Allen
and Steel, 2001; Goloboff, 2007). For example, the two trees T1 and T3 in Figure 1 can be
transformed into each other using a minimum of two SPR rearrangements, via the tree T2,
so their SPR distance is two.
Figure 1: Two trees, T1 and T3, separated by two SPR rearrangements via the intermediate
tree T2. A binary character of parsimony length 3 is indicated on tree T1 by the node
colours. The character is compatible with a tree (T3) within SPR distance two, illustrating
Theorem 1..
The parsimony length of a character on a tree is the minimum number of steps required
to fit that character on the tree, as computed by the algorithm of (Fitch, 1971). We will
always assume unordered reversible characters The length of a character Xi on a tree T is
denoted `(Xi, T ). A character with ri states therefore has parsimony length at least (ri−1),
as every state not at the root has to arise at least once. A character is compatible with a
tree if it requires at most (ri − 1) changes on that tree (Felsenstein, 2004).
So far, one thinks of fitting a character onto a tree; we could just as well fit the tree onto
the character. If the character and the tree are compatible then we have a perfect fit. When
there is not a perfect fit we can measure how many SPR rearrangements are required to give
a tree that does make a perfect fit. It turns out that this measure gives an equivalent score
to parsimony length. More formally:
Theorem 1. Let Xi be a character with ri states and let T be a fully resolved phylogenetic
tree. It takes exactly `(Xi, T ) − (ri − 1) SPR rearrangements to transform T into a tree
compatible with Xi. The result still holds if Xi has some missing states.
As an example, consider the character X1 mapping taxa A,C,D,F to one and B,E,G to
zero. The length of this character on tree T1 of Figure 1 is three, and the number of SPR
rearrangements needed to transform T1 onto some tree T3 compatible with with X1 is two.
Note that there could be other trees compatible with X1 are are further than two SPR
rearrangements away: the result only gives the number of rearrangements required to obtain
the closest tree.
Once stated, the theorem is not too difficult to prove. First show that performing an SPR
rearrangement decreases the length by at most one step. Hence it takes at least `(Xi, T )−
(ri − 1) SPR rearrangements to transform T into a tree compatible with the character Xi.
Then show that this is the minimum required. A formal proof is presented in the Appendix.
A restricted (binary character) version of this theorem was proved in (Bryant, 2003).
The theorem captures an issue that is central to the interpretation of incongruence: is
an observed incongruence to be explained by positing homoplasy or by modifying the tree.
Define the SPR distance from a tree T to a character Xi to be the SPR distance from T
to the closest tree T ′ that is compatible with Xi. Theorem 1 then tells us that the SPR
distance from T is equal to the difference between the length `(Xi, T ) of Xi on T and the
minimum possible length of Xi on any tree.
Consensus trees, supertrees and parsimony
In their insightful overview of supertree methods Thorley and Wilkinson (2003) characterise
a family of supertree methods that all minimise a sum of the form
d(T, ti) = d(T, t1) + d(T, t2) + ...+ d(T, tn). (1)
Here t1, t2, . . . , tn are the input trees and d(T, ti) is a measure of the distance between the
input tree ti with the supertree T . There are many choice for the distance measure d,
and it need not be the case that the distance measure satisfies the symmetry condition
d(T, ti) = d(ti, T ). Gordon (1986) was the first to propose this description of supertrees.
Many supertree methods can be described in these terms, including Matrix representation
with parsimony (MRP) (Baum, 1992; Ragan, 1992); Minimum Flip supertrees Chen et al.
(2006); the Median Supertree (Bryant, 1997), Majority Rule Supertree (Cotton and Wilkin-
son, 2007) and the Average Consensus Supertree (Lapointe and Cucumel, 1997).
Let ds(T,Xi) denote the SPR distance from T to the closest fully resolved tree Ti that is
compatible with Xi. By Theorem 1, a maximum parsimony tree for X1, . . . , Xm is one that
minimises the expression
ds(T,Xi) = ds(T,X1) + ds(T,X2) + ...+ ds(T,Xm). (2)
In this way, maximum parsimony is a form of median consensus. The significance of this
observation doesn’t come from the fact that we can write the the parsimony score of T in the
form (2); it is from the close connection with SPR distances, and from the way we will now
use this connection to combine different kinds of data in the same theoretical framework.
An SPR median tree for fully resolved trees t1, . . . , tn on the same leaf set is a tree T that
minimises
ds(T, ti) = ds(T, t1) + ds(T, t2) + ...+ ds(T, tn),
where here d(T, ti) denotes the SPR distance from T to ti (Hill, 2007). We extend this
directly to a supertree method by mimicking the situation for characters. Suppose that ti is
a phylogenetic tree, not necessarily fully resolved, on a subset of the set of leaves. We say that
a fully resolved tree T on the full set of leaves is compatible with ti (equivalently, T displays
ti) if we can obtain ti from T by pruning off leaves and contracting edges. In this general
situation, we let ds(T, ti) denote the SPR distance from T to the closest fully resolved tree
Ti that is compatible with ti. This is equivalent to the more traditional definition whereby
we first prune leaves off T then compute the distance from this pruned tree to ti.
Now suppose that we have both characters and trees in the input. Both types of phylo-
genetic data can be into an SPR median tree T , chosen to minimise the sum
ds(T, ti) +
ds(T,Xi).
We have, then, a way to bring together both the supertree/consensus methodology and the
total evidence methodology. In the case that the data comprises only trees, the tree is a
median supertree; in the case that the data comprises only character data, the tree is the
maximum parsimony tree.
It is important to note the difference between this approach and the MRP method (Baum,
1992; Ragan, 1992), which could be used to combine trees and characters. In MRP, the trees
are broken down into multiple independent characters. This is a problem, since the characters
encoding a tree are nowhere near independent. In contrast, the SPR median tree approach
treats a tree as a single indivisible unit of information.
There is one critical issue that has been side-stepped: computation time. At present,
computational limitations make the construction of SPR median trees infeasible for all but
the smallest data sets: just computing the SPR distance between two trees is an NP-hard
problem (Hickey et al., 2006). In contrast, Total evidence and MRP approaches are possible
for at least 100 taxa. However there are now good heuristics for unrooted SPR distance
Goloboff (2007) and exact special case algorithms Hickey et al. (2006) that could be applied
to the problem. Below we describe a lower bound method for the SPR distance that should
also aid construction of these SPR median trees.
Parsimony on pairs of characters
Another valuable application of Theorem 1 follows when we consider parsimony analysis of
just two unordered and reversible characters. The concept of pairwise character compatibil-
ity was introduced by Le Quesne (1969) (see also Felsenstein (2004)). Two binary characters
with states 0 and 1 are incompatible if and only if all four combinations of 00, 01, 10, and 11
are present as combination of states for the two characters (Le Quesne, 1969). In a standard
setting, character incompatibility is interpreted as implying that at least one of the charac-
ters has undergone convergent or recurrent mutation (homoplasy). In other words, for every
possible phylogeny describing the history of the two characters, at least one homoplasy is
posited for one of the characters. Another interpretation of incompatibility of two characters
is that characters evolved without homoplasy on two different phylogenies, where the phylo-
genies differ by one or more SPR rearrangement (Sneath et al., 1975; Hudson and Kaplan,
1985).
Define the total incongruence score i(X1, X2) for two multi-state unordered characters
X1 and X2 (with r1 and r2 states respectively) as
i(X1, X2) = min
`(X1, T ) + `(X2, T )
− (r1 − 1)− (r2 − 1). (3)
This is the maximum parsimony score of the two characters X1, X2 minus the minimum
number of changes required for each character. Equation (3) generalises the incompatibility
notion for two binary characters. It is also equivalent to the incongruence length difference
statistic applied to only two characters (Farris et al., 1995). Importantly, the total incongru-
ence score can be computed rapidly (Bruen and Bryant, 2006). The following consequence
of Theorem 1 strengthens the connection between incongruence and SPR rearrangements.
Theorem 2. The total incongruence score i(X1, X2) for two characters equals the minimum
SPR distance between a tree T1 and T2 such that X1 is compatible with T1 and X2 is compatible
with T2.
Although the notion of total incongruence for two characters has been considered before
in the context of character selection and weighting (Penny and Hendy, 1986), it has not been
considered in the context of genealogical similarity. Essentially, Theorem 2 shows that the
total incongruence score equals the minimum possible number of SPR rearrangements that
could have occurred between the phylogenetic histories for both characters, assuming that
the characters have different histories with which they are each perfectly compatible.
Indeed, Theorem 2 suggests a natural way to interpret genealogical similarity between
two characters, which we have used to develop a powerful test for recombination (Bruen
et al., 2006). Choosing two characters from two different genes (which have possibly different
histories) gives a simple approach to identify the distinctiveness of the histories of the genes.
We can also apply Theorem 2 to obtain a lower bound on an SPR distance between two
trees. Suppose that we have two trees T1 and T2 and we wish to obtain a lower bound on the
SPR distance d(T1, T2) between the two trees. If we choose any character X1 convex on T1
and any character X2 convex on T2 then, by Theorem 2, we have that i(X1, X2) ≤ d(T1, T2).
By carefully choosing X1 and X2 we can obtain tighter bounds. One natural starting point
for X1 and X2 is the four or five character encodings described by (Semple and Steel, 2002;
Huber et al., 2005).
Discussion and extensions
We have presented a reformulation of parsimony that is, in some way, dual to the standard
definitions. Instead of measuring how well a character fits onto a tree we look at how well the
tree fits onto the character. A consequence of this new perspective is that we can combine
trees and character data using one general SPR framework, and we also obtain new results
connecting incongruence measures and recombination. Nevertheless, it is not immediately
clear how the new reformulation can be interpreted in itself.
Trees compatible
with X1
Trees compatible
with X2
Trees compatible
with X3
Trees compatible
with X4
Trees compatible
with X(m-1)Trees compatible
with Xm
d(T,t1)
Figure 2: Cartoon representation of parsimony in terms of tree rearrangements. Each
characterXi gives a ‘cloud’ of trees containing those trees compatible withXi. The maximum
parsimony tree is then the tree closest to these clouds under the SPR distance.
One aid in this direction is to consider the information a single character, or tree, rep-
resents. Given a single character, we can imagine a cloud of trees comprising exactly those
trees compatible with the character (Figure 2). If we are told that this character evolved
without homoplasy, then we know that the true evolutionary tree must be contained some-
where within the cloud. However as there is only one character there is a lot of uncertainty
regarding the tree, so there are a lot of trees in the clouds. Now suppose we have multiple
characters, each with its own cloud. There may not be a single tree contained in the inter-
section of all of these clouds. Instead, we search for a tree that is close as possible to all of
the clouds. The distance from T to the cloud associated to character Xi is exactly ds(T,Xi),
so by Theorem 1 a tree closest to all of the clouds is a maximum parsimony tree.
Each cloud represents the uncertainty around each piece of data (tree or character).
We note that several of the results in this article can be extended, for details. Firstly,
both Theorems 1 and 2 are both valid if we replace the SPR distance with the tree bisection
and reconnection (TBR) distance. In a TBR rearrangement, a subtree is removed from
the tree and then reattached elsewhere in a tree, the difference with SPR being that we can
reattach using any of the nodes in the subtree (Allen and Steel, 2001; Felsenstein, 2004). The
TBR distance between two trees is the minimum number of TBR rearrangements required
to transform one tree into the other.
That Theorems 1 and 2 hold for the TBR distance might seem surprising, since the TBR
distance between two trees is always less than, or equal to, the SPR distance between the
trees. However the extension follows by a tiny change to the proof of Theorem 1, noting
that a TBR move can still only reduce the parsimony score of a character by at most one.
We have also explored extensions of the result to other distances between trees, notably
the Robinson-Foulds or partition distance and the Nearest Neighbor Interchange distance,
though the connections are not so clear. See Bruen (2006) for details.
Acknowledgements
We would like to thank Mike Steel, Sebastien Böcker, Olaf Bininda Emonds, Pablo Golloboff,
Mark Wilkinson and an anonymous referee for their valuable suggestions. This research was
partially supported by the New Zealand Marsden Fund.
References
Allen, B. and M. Steel. 2001. Subtree transfer operations and their induced metrics on
evolutionary trees. Annals of Combinatorics 5:1–13.
Baum, B. 1992. Combining trees as a way of combining datasets for phylogenetic inference,
and the desirability of combining gene trees. Taxon 41:3–10.
Bruen, T. 2006. Discrete and statistical approaches to genetics. Ph.D. thesis McGill Univer-
sity School of Computer Science.
Bruen, T. and D. Bryant. 2006. A subdivision approach to maximum parsimony. Annals of
Combinatorics In Press.
Bruen, T., H. Philippe, and D. Bryant. 2006. A simple and robust statistical test to detect
the presence of recombination. Genetics 172:1–17.
Bryant, D. 1997. Building trees, hunting for trees and comparing trees. Ph.D. thesis Dept.
Mathematics, University of Canterbury.
Bryant, D. 2003. A classification of consensus methods for phylogenetics. Pages 163–184 in
Bioconsensus vol. 61 of DIMACS. American Math Society, Providence, RI.
Bryant, D. 2004. The splits in the neighborhood of a tree. Annals of Combinatorics 8:1–11.
Chen, D., O. Eulenstein, D. Fernandez-Baca, and M. Sanderson. 2006. Minimum-flip su-
pertrees: Complexity and algorithms. IEEE/ACM Trans. Comput. Biol. Bioinformatics
3:165–173.
Cotton, J. and M. Wilkinson. 2007. Majority-rule supertrees. Systematic Biology 56:445–
Farris, J. S., M. Källersjö, A. G. Kluge, and C. Bult. 1995. Constructing a significance test
for incongruence. Systematic Biology 44:570–572.
Felsenstein, J. 2004. Inferring Phylogenies. Sinauer Associates.
Fitch, W. M. 1971. Towards defining the course of evolution: Minimum change for a specific
tree topology. Systematic Zoology 20:406–416.
Goloboff, P. 2007. Calculating SPR distances between trees. Cladistics Online early access.
Gordon, A. D. 1986. Consensus supertrees: the synthesis of rooted trees containing overlap-
ping sets of labeled leaves. Journal of Classification 3:335–348.
Hein, J. 1990. Reconstructing evolution of sequences subject to recombination using parsi-
mony. Mathematical Biosciences 98:185–200.
Hickey, G., F. Dehne, A. Rau-Chaplin, and C. Blouin. 2006. The computational complexity
of the unrooted subtree prune and regraft distance. Tech. Rep. CS-2006-06 Faculty of
Computer Science, Dalhousie University.
Hill, T. 2007. Development of New Methods for Inferring and Evaluating Phylogenetic Trees.
Ph.D. thesis Uppsala Universitet.
Huber, K. T., V. Moulton, and M. A. Steel. 2005. Four characters suffice to convexly define
a phylogenetic tree. SIAM Journal on Discrete Mathematics 18:835–843.
Hudson, R. R. and N. L. Kaplan. 1985. Statistical properties of the number of recombination
events in the history of a sample of dna sequences. Genetics 111:147–64.
Lapointe, F.-J. and G. Cucumel. 1997. The average consensus procedure: combination of
weighted taxa containing identical or overlapping sets of taxa. Systematic Biology 46:306–
Le Quesne, W. J. 1969. A method of selection of characters in numerical taxonomy. System-
atic Zoology 18:201–205.
Penny, D. and M. Hendy. 1986. Estimating the reliability of evolutionary trees. Molecular
Biology and Evolution 3:403–17.
Ragan, M. A. 1992. Phylogenetic inference based on matrix representations of trees. Molec-
ular Phylogenetics and Evolution 1:53–58.
Semple, C. and M. Steel. 2002. Tree reconstruction from multi-state characters. Advances in
Applied Mathematics 28:169–84.
Semple, C. and M. Steel. 2003. Phylogenetics. Oxford University Press.
Sneath, P., M. Sackin, and R. Ambler. 1975. Detecting evolutionary incompatibilities from
protein sequences. Systematic Zoology 24:311–332.
Song, Y. S. 2003. On the combinatorics of rooted binary phylogenetic trees. Ann. Comb.
7:365–379.
Swofford, D., G. Olsen, P. Waddell, and D. Hillis. 1996. Molecular sytematics chap. Phylo-
genetic Inference, Pages 407–514. Sinauer Associates, Inc.
Swofford, D. L. 1998. PAUP*. Phylogenetic Analysis using Parsimony (*and other methods).
Sinauer Associates, Sunderland, Massachusetts.
Thorley, J. L. and M. Wilkinson. 2003. A view of supertree methods. Pages 185–194 in
Bioconsensus (F. Roberts, ed.) vol. 61 of DIMACS series in discrete mathematics and
theoretical computer science The American Mathematical Society, New York.
Zwickl, D. 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological
sequence datasets under the maximum likelihood criterion. Ph.D. thesis University of
Texas at Austin.
Appendix
Refer to (Semple and Steel, 2003) for a detailed description of the notation.
The first observation is that an TBR rearrangement of a tree increases the length of a
character by at most one. As SPR rearrangements are a special case of TBR rearrangements,
the same result holds for SPR.
Lemma 1. Let T be a fully resolved phylogenetic tree and Xi an unordered reversible charac-
ter. Let T ′ be a phylogenetic tree that differs from T by a single TBR rearrangement. Then
`(χ, T ′) ≤ `(χ, T ) + 1.
Proof. The proof of Lemma 5.1 in (Bryant, 2004) for binary characters applies directly to
the multistate case.
Let dSPR(T, T
′) denote the unrooted SPR distance between two phylogenetic trees T and
Theorem 1 LetXi be a character with ri states and let T be a fully resolved phylogenetic
tree. It takes exactly `(Xi, T ) − (ri − 1) SPR rearrangements to transform T into a tree
compatible with Xi. The result still holds if Xi has some missing states.
Proof. Let T ′ be a fully resolved phylogenetic tree compatible with Xi for which dSPR(T, T
is minimized and let m = dSPR(T, T
′). Then there exists a sequence of trees T ′ = T0, ..., Tm =
T such that every adjacent pair of trees in the sequence differ by exactly one SPR rear-
rangement. By Lemma 1 the existence of this sequence implies that `(T,Xi) − `(T ′, Xi) ≤
dSPR(T, T
′) and since Xi is compatible with Xi we have `(T
′, Xi) = ri − 1, giving
`(T,Xi)− (ri − 1) ≤ dSPR(T, T ′).
For the other direction, we show that we can construct a sequence of `(T,Xi) − (ri −
1) SPR rearrangements that transform T into a tree T ′ compatible with Xi. Firstly, if
`(T,Xi) − (ri − 1) = 0, then T is compatible with Xi so the proof is finished. Otherwise,
let X̂i be an assignment of states to internal nodes that minimises the number of state
changes (that is, a minimum extension of Xi). Then since Xi is not convex on T there
exist three vertices u, v and w, where {u, v} ∈ E(T ), v lies on the path from u to w and
X̂i(u) = X̂i(w) 6= X̂i(v). Perform an SPR rearrangement by removing edge {u, v}, supressing
the v vertex and creating a new edge {u, t} where t is a new vertex on an edge adjacent
to w. Furthermore, set X̂i(t) = X̂i(w). Then the number of edges on which a change has
occurred has decreased by 1 thereby decreasing the parsimony length by 1. This procedure
can be repeated until the parsimony length equals ri − 1, constructing the desired sequence
of trees and completing the proof.
Let T be a maximum parsimony phylogenetic tree for X1 and X2 and let
Theorem 2 The total incongruence score i(X1, X2) for two characters equals the mini-
mum SPR distance between a tree T1 and T2 such that X1 is compatible with T1 and X2 is
compatible with T2.
Proof. Let T1 and T2 be any two trees compatible with X1 and X2 respectively. Then
`(X1, T1) = r1−1 and by Theorem 1, `(X2, T1)− (r2−1) ≤ dSPR(T1, T2). We have then that
i(X1, X2) ≤ `(X1, T1) + `(X2, T1)− (r1 − 1)− (r2 − 1)
≤ dSPR(T1, T2)
and so i(X1, X2) is a lower bound for dSPR(T1, T2).
We show that this bound can be achieved. Let T be a maximum parsimony tree for the
pair of characters X1, X2. By Theorem 1 there exist two trees T1 and T2 such that T1 is
compatible with X1, T2 is compatible with X2 and
dSPR(T1, T ) + dSPR(T2, T ) = i(X1, X2),
implying that dSPR(T1, T2) ≤ dSPR(T1, T ) + dSPR(T2, T ) ≤ i(X1, X2) and hence
dSPR(T1, T2) = i(X1, X2).
ABSTRACT
  The parsimony score of a character on a tree equals the number of state
changes required to fit that character onto the tree. We show that for
unordered, reversible characters this score equals the number of tree
rearrangements required to fit the tree onto the character. We discuss
implications of this connection for the debate over the use of consensus trees
or total evidence, and show how it provides a link between incongruence of
characters and recombination.

<|endoftext|><|startoftext|>
Draft version June 8, 2021
Preprint typeset using LATEX style emulateapj v. 08/22/09
SPECTROPOLARIMETRIC OBSERVATIONS OF THE CA II 8498 Å AND 8542 Å LINES IN THE QUIET SUN
A. Pietarila
, H. Socas-Navarro
High Altitude Observatory, National Center for Atmospheric Research2, 3080 Center Green, Boulder, CO 80301, USA
T. Bogdan
Space Environment Center, National Oceanic and Atmospheric Administration, 325 Broadway, Boulder, CO 80305, USA
Draft version June 8, 2021
ABSTRACT
The Ca II infrared triplet is one of the few magnetically sensitive chromospheric lines available for
ground-based observations. We present spectropolarimetric observations of the 8498 Å and 8542 Å
lines in a quiet Sun region near a decaying active region and compare the results with a simulation
of the lines in a high plasma-β regime. Cluster analysis of Stokes V profile pairs shows that the two
lines, despite arguably being formed fairly close, often do not have similar shapes. In the network, the
local magnetic topology is more important in determining the shapes of the Stokes V profiles than the
phase of the wave, contrary to what our simulations show. We also find that Stokes V asymmetries
are very common in the network, and the histograms of the observed amplitude and area asymmetries
differ significantly from the simulation. Both the network and internetwork show oscillatory behavior
in the Ca II lines. It is stronger in the network, where shocking waves, similar to those in the high-β
simulation, are seen and large self-reversals in the intensity profiles are common.
Subject headings: polarization, Sun: chromosphere, waves
1. INTRODUCTION
Our understanding of solar magnetic fields
outside active regions has increased signifi-
cantly during the last years. This is due to
new and better instrumentation (e.g., THEMIS,
Paletou & Molodij 2001; VSM on SOLIS,
Keller & The Solis Team 2001; Swedish Solar Telescope,
Scharmer, Bjelksjo, Korhonen, Lindberg, & Petterson
2003; Solar Optical Telescope on Hinode, Shimizu
2004; and SPINOR, Socas-Navarro et al. 2006), better
diagnostic techniques (see for example Bellot Rubio
2006 for a review on inversion techniques) and advanced
numerical simulations (Stein & Nordlund 2006 and
references therein). A large portion of the work has
focused on photospheric magnetic fields. Only now we
are starting to have adequate tools for investigating
chromospheric magnetism in more detail. (For a review
of chromospheric magnetic fields see Lagg (2005)). This
is not surprising considering the numerous difficulties in
observing chromospheric magnetic fields, interpreting
the data, and performing realistic MHD simulations.
There are two different sets of lines that are often used
for chromospheric spectropolarimetry, the He I infrared
(IR) triplet at 10830 Å, and the Ca II IR triplet at
8500 Å. Both line sets have their advantages and dis-
advantages. The He I lines are formed over a relatively
thin layer, and therefore observations can be inverted us-
ing a simple Milne-Eddington model. The drawback is
that while the formation range is fairly narrow, the pre-
cise formation height remains uncertain, and the Milne-
Eddington inversions do not give any information on
1 Institute of Theoretical Astrophysics, University of Oslo,
P.O.Box 1029 Blindern, N-0315 Oslo, Norway
2 The National Center for Atmospheric Research (NCAR) is
sponsored by the National Science Foundation.
the atmospheric gradients. The lines are also sensitive
to the Paschen-Back effect, which must be included in
the inversion code (Socas-Navarro et al. 2004). Further-
more, simulating the He I lines is difficult since coronal
irradiation has a non-negligible effect on their formation
(Andretta & Jones 1997). In contrast, the formation of
the Ca II IR lines is fairly well understood (Lites et al.
1982). The broad Ca II lines sample a large region of the
atmosphere, from the photosphere to the lower chromo-
sphere. However, the Ca II lines are formed in nonLTE,
making inversions considerably more cumbersome.
Several investigations using the Ca II IR lines have
studied intensity and velocity oscillations in the quiet
Sun (e.g. Lites et al. 1982; Deubner & Fleck 1990)
or, alternatively, magnetic fields in active regions (e.g.
Socas-Navarro et al. 2000a). In both cases the lines have
proven useful as diagnostics of the solar chromosphere.
In this paper we present results of spectropolarimetric
observations of two of the lines in an enhanced network
region. We have both spatial maps and time series data.
The observations show that the Ca II lines are formed in
a very interesting region, namely the region where the
atmosphere is transforming from a plasma dominated
(β >> 1) to a magnetic field dominated (β << 1) regime
in terms of dynamic force balance. Wave propagation
is clearly seen in the highly dynamic magnetic regions,
whereas the weakly magnetic internetwork is found to
be less variable. Interestingly, the two Ca II lines ex-
hibit significant differences even though in calculations
they are formed fairly close together. The importance of
gradients in the chromospheric network is clearly demon-
strated by the prevalence of asymmetric Stokes V profiles
in the data.
The paper is arranged as follows: in § 2 the data and
their reduction are addressed. Results of analyzing the
data using different approaches are presented in § 3. We
http://arxiv.org/abs/0704.0617v1
performed cluster analyses on the Stokes V profiles to
classify them and to describe spatial patterns seen in the
data. Statistics, such as profile amplitudes and asym-
metries, are presented. The time dependent behavior of
the lines in different network and internetwork regions
is also discussed. In § 4 the observations are compared
to simulations of the lines in a high plasma-β regime
(Pietarila et al. 2006, hereafter P06). Finally, in § 5 the
main results are summarized and discussed.
2. OBSERVATIONS AND DATA REDUCTION
The Spectro-Polarimeter for INfrared and Optical Re-
gions (SPINOR, Socas-Navarro et al. 2006) at the Dunn
Solar telescope, Sacramento Peak Observatory, was used
to observe two of the Ca II infrared triplet lines at 8498
Å and 8542 Å, as well as two photospheric Fe I lines at
8497 Å and 8538 Å. The setup included several other
lines but because of computer problems only data from
the two Ca lines which used the ASP TI TC245 cameras
were recorded fully. The data have 256 points in both
the wavelength and spatial position with a typical noise
level of 6 × 10−4 Ic (1 σ deviation from the mean) and
a spectral sampling of 25 mÅ. The pixel height corre-
sponds to ≈ 0.38 arcseconds on the solar surface along
the slit. We observed a quiet Sun region near disk center
at S17.3 W32.1 on May 19, 2005 at 14:14 UT. An MDI-
magnetogram of the region is shown in Figure 1. The
slit was positioned in the vicinity of a decaying active re-
gion, AR10763, but avoided flux concentrations from the
active region (i.e., plages). A time series consisting of 99
time steps of short scans (3 slit positions), with a spacing
of 0.375 arcseconds each, was acquired during variable
seeing conditions. The cadence is ≈ 10 seconds (i.e., a
given slit position was repeated every 30 s). The time
series was followed by a 63 step raster centered around
the position where the slit was during the time series.
The raster step size was 0.375 arcseconds. Adaptive op-
tics (AO, Rimmele 2000) were used during the observ-
ing sequence but the compromised seeing conditions did
not allow for continuous locking onto granulation. This
caused the slit to jump occasionally, making the longest
period with a stationary slit in the time series 17 time
steps (8.5 min). The spatial resolution varied during the
sequences being at best less than an arc second, but on
average a factor of two worse.
Standard procedures for flat field and bias were used
for the data reduction. Instrumental polarization was re-
moved using the available calibration data, as explained
in Socas-Navarro et al. (2006). No absolute wavelength
calibration was attempted because no suitable telluric
lines are present. Instead a wavelength calibration using
spatial pixels devoid of magnetic field was done by fitting
the average spectrum to the Kitt Peak FTS-spectral at-
las (Neckel & Labs 1984). The FTS atlas was also used
to find the normalization factor for the intensities to the
quiet Sun continuum intensities. Because of detector flat-
field residuals and prefilter shape, the continua in the raw
data from both detectors are tilted. The tilts were re-
moved a posteriori by subtracting a linear fit (y = a+bλ)
obtained by matching the continuum intensity levels to
those of the FTS atlas.
The data were analyzed using both the raster and time
series for statistical purposes. The period when the slit
was stationary on the solar disk was used to study the
time-dependent behavior of the lines. Because of the
short length of this period, we do not present any Fourier
analysis of the data. To make a classification of Stokes
V profile morphologies, we did cluster analyses based
on a Principal Component Analysis (PCA) in a similar
manner as the work of Sánchez Almeida & Lites (2000)
and Khomenko et al. (2003) for photospheric lines. We
computed amplitudes for Stokes I and V profiles. Be-
cause the Ca line intensity profiles often exhibit strong
self-reversals, no proxies for atmospheric velocities, such
as lines’ centers of gravity or bisectors, are adequate.
For those Stokes V profiles with amplitudes greater than
7 × 10−3 Ic, (i.e. ≥ 10σ), amplitude and area asymme-
tries were also calculated.
The amplitude asymmetry of a Stokes V profile is de-
fined by (Mart́ınez Pillet 1997):
ab − ar
ab + ar
, (1)
where ab and ar are the unsigned extrema of the blue
and red lobes of the Stokes V profile.
The area asymmetry of a Stokes V profile is defined by
(Mart́ınez Pillet 1997):
σA = s
V (λ)dλ
|V (λ)|dλ
, (2)
where s is the sign of the blue lobe. Because of the broad,
deep lines and large velocities (compared with the pho-
tosphere) present in the chromosphere, the choice of the
integration range for the area asymmetries is non-trivial
for the Ca lines. We followed the same procedure as
in P06. In the weak field regime the Stokes V profile
is proportional to dI/dλ (strictly true only in the ab-
sence of atmospheric velocity and magnetic gradients).
Inspection of the data showed that most of the observed
Stokes V profiles have roughly the same structures as
the dI/dλ profiles. The intensity in the blue wing (λ0)
of the line profile was matched with a point in the red
wing (λ1) with the same intensity. The signal-to-noise in
the intensity profiles is much higher than in the Stokes V
profiles and also the slope is much steeper. This makes
matching points with the same value more accurate in
the intensity than in the Stokes V profiles. The selec-
tion of a wavelength to start the integration range was
made by choosing a wavelength point that is far enough
from the line core so that self-reversals are not an issue.
In our data this point, λ1, is at 600 mÅ from reference
wavelength of line center. The same value was used in
Magnetograms made from the 63 step scan are shown
in Figure 2. The panels are in order of increasing for-
mation height: Fe I 8497 Å, Fe I 8538 Å, Ca II 8498 Å
and Ca II 8542 Å. The lower part of the slit was located
above a flux concentration along the enhanced network
and the upper part over an internetwork region with very
little magnetic activity. The network becomes wider and
more diffuse with increasing line formation height as de-
scribed by Giovanelli (1980). Not all magnetic flux seen
in the photosphere can be identified in the chromosphere
Fig. 1.— MDI magnetogram showing the position of the slit for the time series and the map (rectangular region). The observed region
was close to the decaying active region, AR10763.
and vice versa. However, interpreting the chromospheric
magnetograms is difficult due to the self-reversed features
in the cores of the Ca line Stokes V profiles.
3. RESULTS
In Figure 3, Stokes I and V spectra of the solar sur-
face under the slit are shown for both Ca II lines as well
as the two photospheric Fe I lines in the Ca lines’ wings
(marked by arrows). Since the Fe I 8497 Å line is blended
in the Ca line’s wing and the Fe I 8538 Å line is very close
to the edge of the detector, no quantitative analysis is
done for them. No signal above the noise was recorded
in Stokes Q and U so they will not be addressed in what
follows. Residual vertical fringing caused by the polar-
ization modulator is visible in the Stokes V images. We
chose not to try to remove the fringing since its’ ampli-
tude is of the same order of magnitude as the noise.
The network, present in the lower part of the slit, is
associated with less absorption in the intensity profiles.
Both Ca lines often show self reversals, which are usually
stronger on the blue side of the line than in the red. The
Stokes V profiles of both Ca lines have large, extended
wings. At times, the profiles may have both polarities
present on the blue side of the core but in almost all
cases the far blue wing of the profile has the same po-
larity (i.e., opposite sign) as the red wing. The Stokes
I and V profiles of the chromospheric lines look distinc-
tively different from the photospheric lines: the Ca lines
have more structure, they are wider and exhibit more
spatial variation than the photospheric Fe lines. Some
differences are seen between the two Ca lines: the 8542
Å line is slightly broader, has more structure in the spec-
Fig. 2.— Magnetograms of the map deduced by using the weak field method (Landi Degl’Innocenti 1992). The Stokes V signal is
measured in units of Gauss. Vertical lines show the position of slit during the time series. First panel: Fe I 8497 Å, second panel: Fe I8538
Å, third panel: Ca II 8498 Å, third panel: Ca II 8542 Å. Location on the solar disk: S32.1, W17.3. The orientation of the magnetograms
is 180 degrees from the MDI magnetogram in Figure 1. The plotted symbols (∗, ✸ and △) on the images show where the pixels discussed
later in the text are located.
tra and also stronger absorption than the 8498 Å line.
The internetwork region, present in the upper part of
the slit, is mostly devoid of Stokes V signal, and Stokes I
is more homogeneous than in the network. Self reversals
are usually not seen in the profiles. A small portion of
the internetwork region has structures in Stokes I that
are similar to those seen in the magnetic region: Stokes I
is brighter than in the surrounding areas and the profiles
show some self reversals. Closer inspection of the images
reveals a visible, albeit a very small amplitude, Stokes V
signal.
The spatial patterns of Stokes I and V amplitudes and
asymmetries in the two Ca lines (Figure 4) are fairly
similar to one another. The network is clearly visible in
the Stokes I and V amplitudes, though it is more diffuse
in the 8542 Å line. There is a structure in the upper
part of the map that is seen best in the 8498 Å intensity
image. Parts of this structure appear also in both lines’
Stokes V amplitude and asymmetry images. The edges
of the network have more asymmetric Stokes V profiles.
This is seen clearly in the 8542 Å amplitude asymmetry.
Photospheric velocities can be estimated from the lo-
cations of the iron lines’ intensity minima. Except for
a nearly constant offset caused by the convective down
flows in the network, the internetwork and network re-
gions have very similar spatial and temporal patterns.
3.1. Classification of the Stokes V profiles
To classify the shapes of the 8498 Å and 8542 Å Stokes
V profile pairs we used PCA, (Rees et al. 2000) and clus-
ter analysis. The cluster analyses were performed sepa-
rately for the map and the time series. Here we present
a summary of the PCA procedure and cluster analysis
for completeness.
With the PCA we are able to reduce the number of
parameters needed to describe a given profile. Each pro-
file, S(λj), j = 1, ..., Nλ (Nλ is the number of wavelength
points in the profile) is composed of a linear combination
of eigenvectors ei(λj), i = 1, ..., n:
S(λj) = Σ
i=1ciei(λj), (3)
where the ci are appropriate constants. The eigenvectors
and constants for a given set of profiles are obtained from
a singular value decomposition (SVD, Rees et al. 2000,
Socas-Navarro et al. 2001) and form an orthonormal ba-
sis with Nλ eigenvectors:
j=1ei(λj)ek(λj) = δik. (4)
Not all eigenvectors contain the critical information
needed to reproduce the profiles, some of the eigenvec-
tors carry information about the noise pattern. We can
therefore truncate the series expansion and use only a
small number of eigenvectors and corresponding coeffi-
cients to reproduce a given profile. The PCA guarantees
that when expansion of Eq. 3 is truncated at a given
order m, the amount of information in the lower orders
is maximized.
We performed the SVD for the two 8498 Å and 8542
Å Stokes V profiles separately. The resulting orthonor-
mal bases, and also the cluster analysis, depend on the
subset of profiles used to construct it. Because of this
we included all Stokes V profiles from pixels where the
8498 Å Stokes V amplitude is above 7 × 10−3Ic, alto-
gether 13671 profiles. Visual inspection of the eigenvec-
tors shows that the first 11 eigenvectors (approximately)
Fig. 3.— Dispersed images of the slit. The arrows mark the locations of the two photospheric iron lines and the horizontal lines in the
intensity images are the hairlines used to spatially coalign the two detectors. Wavelengths are measured from 8498 Å (left) and 8542 Å
(right).
contain relevant information about the actual shape of
the profiles whereas the remainder are associated with
the noise patterns.
The Stokes V profile pairs, now described with 11 × 2
coefficients corresponding to the 11 × 2 eigenvectors in-
stead of 102 (51 for each profile) wavelength points, were
organized into a predefined number of clusters. Before
doing this the vectors consisting of the 22 coefficients
were standardized, i.e., no information of the absolute
Stokes V amplitudes is left, only the relative amplitudes
of the 8498 Å and 8542 Å profiles. Based on the values
of the coefficients, 6 cluster centers were identified using
the k-means method (MacQueen 1967). It starts with k
random clusters, which through iterations are changed to
minimize the variability within a cluster and maximize
it between clusters. Each profile pair is then assigned
to the nearest cluster center in the 22-dimensional Eu-
clidean space. The choice of number of clusters used for
the cluster analysis is non-trivial. Since each data point
is described by 22 numbers we cannot visually distinguish
patterns in the spatial distribution of the points. Instead
the number of clusters was defined by trial and error, i.e.
so that each profile type in the time series or map is rep-
resented and each cluster is still clearly distinct from one
another. For each cluster a profile was constructed using
the eigenvectors and the averaged 2 × 11 coefficients of
all profiles belonging to that cluster.
Cluster analysis of the map shows the shapes of Stokes
V profiles in network regions with different magnetic
topologies, whereas the time series analysis describes how
a set of profiles from a certain magnetic topology changes
with time.
The results for the map are shown in Figure 5. Above
each profile is the percentage of all profiles belonging to
the cluster, the mean distance in the Euclidean space
Fig. 4.— Maps of the 8498 Å and 8542 Å lines’ Stokes I amplitudes, Stokes V amplitudes, area asymmetries and amplitude asymmetries
in the raster scan. The horizontal line seen in the amplitude images is a hairline used to spatially coalign the detectors. The vertical line
shows the position of the slit during the time series. Note that x-axis is stretched compared with y-axis.
of the profiles to the cluster center, and the standard
deviation of the mean. The smaller the distance to the
cluster center, the more compact the cluster is and the
better the cluster describes the profiles. The standard
deviation is proportional to the spread of the distances
in each cluster. In general, clusters with the least number
of profiles belonging to them have larger mean distances.
Three points can be deduced from the figure. First,
asymmetric profiles should be common. In fact, they ap-
pear to be more common than symmetric ones. Second,
even though the two Ca lines are formed fairly close to
one another (the 8498 Å line core optical depth is unity at
about 1 Mm and the 8542 Å 0.2 Mm higher up in the ra-
diation hydrodynamic simulations by Carlsson and Stein
1997), the 8498 Å and 8542 Å profiles in a given cluster
are often clearly different from one another. Third, in
all cluster profiles the far-red wings have the same polar-
ity as the far-blue wings, indicating that the lower parts
along the line-of-sight of the atmosphere, where the wings
are formed, are dominated by a single magnetic polarity.
The clusters differ from one another in several differ-
ent ways: the degree of asymmetry, and distinct relation-
ships between the 8498 Å and 8542 Å line profiles, rel-
ative amplitudes, etc. However, quantitative measures,
such as profile asymmetries, of the clusters do not nec-
essarily represent the members of a given cluster very
well. For example, the variation of Stokes V amplitude
asymmetries within a cluster is large and the mean is
not necessarily the same as that of the cluster profile.
The cluster analysis retrieves qualitative similarities and
gives a basis for morphological classification, rather than
representing quantitative similarities within the data. To
illustrate this point, Figure 6 displays histograms of the
clusters showing the Stokes V amplitudes and asymme-
tries for all profiles belonging to a given cluster. Shown
in Figure 7 is the spatial distribution of the clusters. The
smallest network patches often consist of only cluster 1
and cluster 2 profiles. The middle of the largest network
patch is a mixture of different clusters.
In most cases, the profiles at the edges of the network
patches belong to cluster 1. This is the most common
cluster consisting of 35.6 % of all the profile pairs in
the map. The cluster 1 profiles are asymmetric, 8542
Å more so than 8498 Å, and they also have opposite
signs of amplitude and area asymmetries. The amplitude
histograms of profiles belonging to this cluster show that
they have in general low amplitudes, as one might expect
from profiles located at the edges of the network. The
large amplitude asymmetry in the 8542 Å cluster profile
is not seen in the observed profiles. In fact, only very few
profiles exhibit such large asymmetries and there is only a
slight tendency of the profiles having more often negative
than positive amplitude asymmetries. The cluster area
asymmetries are in better agreement with the observed
profiles belonging to this cluster.
Regions of cluster 2 profiles are often located adjacent
to patches of cluster 1 profiles. The cluster 2 profiles
account for 20.0 % of all profile pairs in the map. The
cluster profiles are fairly antisymmetric. This is seen in
the observed profiles as well: the asymmetry histograms
tend to be narrow and only slightly offset from zero. The
relative amplitudes of the two cluster profiles are very
different: the 8498 Å amplitude is a factor 3 larger. The
disproportionality is not as large in the observed profiles
though the amplitude histograms show that in general
8498 Å has a larger amplitude than 8542 Å. The range of
observed amplitudes is considerably larger than in cluster
Of the profile pairs in the map 14.1 % belong to cluster
3. Also, these profiles are often found in regions close to
the network edges by the patches of cluster 1 profiles.
Both cluster profiles have multiple lobes and are asym-
metric, 8498 Å more in amplitude and 8542 Å in area.
This is also seen in the histograms of the observed asym-
Fig. 5.— Results of cluster analysis of the Stokes V profile pairs in the map. Line on left is 8498 Å and on right 8542 Å. Shown are the
percentages of profile pairs belonging to each cluster, and the mean distance and its standard deviation of the profiles to the cluster center.
metries. There is a strong emission feature on the blue
side of the line in the 8498 Å cluster profile. It is weaker
in the 8542 Å profile. The histograms for cluster 3 are
nearly identical to those of cluster 1. This illustrates how
cluster analysis based on PCA is captures the qualitative
differences in the line profiles.
Cluster 4 consists of 13.5 % of the profile pairs. Most
of the observed profiles belonging to this cluster are near
to the middle of the largest network patches. The 8498 Å
cluster profile is dominated by a strong emission feature
in the blue lobe. This feature is not visible in the 8542 Å
cluster profile. The overlap between the two lines’ ampli-
tude histograms is fairly small. Also the cluster profiles
show this difference in the relative amplitudes: 8542 Å
has a significantly lower amplitude than 8498 Å. Except
for the 8542 Å area asymmetry histogram, all histograms
are centered around zero. The range of area asymmetries
in the 8542 Å line is large and the distribution is skewed
towards negative values. This trend in the 8542 Å area
asymmetries is seen in several of the clusters.
The patches of profiles belonging to the fifth cluster
(9.6 %) are also found in the less homogeneous middle
regions of the network elements. The 8498 Å cluster
Fig. 6.— Stokes V statistics of the map clusters. The histograms are for all profiles belonging to the given cluster and the dotted vertical
lines show the area and amplitude asymmetries for the cluster profiles.
profile has a factor 2 lower amplitude than 8542 Å. This
is not seen in the amplitude histograms but there is a
large overlap between the two histograms. The cluster
profiles are fairly antisymmetric and also the histograms
of observed profile asymmetries are centered around zero.
The 8542 Å area asymmetry is again the exception: it is
centered around a negative value.
Cluster 6 is the smallest cluster with 7.2 % of the pro-
files. Patches of cluster 6 profiles are located in regions
with cluster 4 and 5 profiles. The 8498 Å cluster profile is
very similar to that of cluster 5. Like cluster 5, the 8542
Å cluster 6 profile has a factor 2 larger amplitude and
the amplitude histograms overlap nearly entirely. All the
cluster 6 histograms are very similar to cluster 5. The
major difference between the two is that there is very
little structure in the 8542 Å line profile.
3.2. Time-dependent behavior
The cluster analysis results of the time series are shown
in Figure 8, and the spatio-temporal distribution is cap-
tured in Figure 9. The clusters consist of profiles at
rest with varying degrees of structure, and profiles where
the blue side is in emission. While there are temporal
changes in the clusters, there are no clear periodic pat-
terns visible. Most slit positions have a preferred cluster
or in some cases the slit position is dominated by two
clusters. Positions where more than 2 clusters are domi-
nant are rare.
Because the slit moved occasionally during the time
series, no meaningful power spectra can be made from
this data set. The time series data do however allow
for a qualitative analysis of the time-dependent behav-
ior. Comparing network and internetwork pixels reveals
some interesting features: the network, especially in the
intermediate flux regions, is very dynamic with propagat-
ing shock-like features and large self-reversals appearing
frequently in both Stokes I and V . In comparison, the
internetwork is less dynamic, intensity oscillations are
present but they are much weaker than in the network.
No structures indicating the presence of shocks, are seen
in the internetwork profiles. In agreement with prior ob-
servations of chromospheric lines (e.g., Noyes 1967), any
oscillation periods in the network appear to have a longer
period than in the internetwork.
We now examine three different regions, namely an
internetwork pixel, an intermediate flux network pixel,
and a strong network pixel.
3.2.1. Internetwork
In Figure 10 the time evolution of a typical internet-
work pixel is shown. The location of the pixel is marked
by an asterisk in Figure 2. The data were taken when
the slit was stationary. No Stokes V signal above the
noise level is seen in the pixel. The Stokes I profiles of
both Ca lines change periodically in width and position
of the line center, but no self-reversals are seen. Also,
the line-wing intensity shows some oscillations.
3.2.2. Intermediate flux network
The difference between the internetwork and network
regions with intermediate flux (Fig. 11) is dramatic: the
Fig. 7.— Spatial distribution of the clusters in the map. The black areas (0 cluster) correspond to regions where the Stokes V amplitudes
are below 7× 10−3Ic and where no cluster analysis was performed.
network region is much more dynamic, and highly asym-
metric profiles, in both lines Stokes I and V , are seen.
The time dependent behavior of the photospheric iron
line is quite similar to what is seen in the internetwork.
The Stokes I in both lines has a clearly oscillating be-
havior with bright, very asymmetric episodes followed
by a darker, more symmetric episodes. The period for
the oscillation is about 4 minutes, i.e. below that as-
sociated with the acoustic cutoff frequency (about 5.3
mHz). This may be caused by the presence of inclined
magnetic fields can lower effectively the acoustic cutoff
frequency (Bel & Leroy 1977). The time evolution of the
8542 Å Stokes I has a diagonal structure moving from
blue to red. This indicates the presence of propagating
compressible waves (Carlsson & Stein 1997). The bright
part, which corresponds to a large self-reversal, is clearly
shifted towards the blue. This is seen in the 8498 Å line
profiles as well, although these profiles tend to be more
flat-bottomed. In general, the self-reversals and over all
variation is larger in the blue wing than in the red. This
is true for all slit positions which exhibit strong time-
dependent behavior.
The Stokes V image of the 8498 Å line also shows
strong diagonal structures that coincide in time with the
dark phases of Stokes I. Inspection of individual pro-
files (Fig. 12) reveals a pattern of multiple lobes in the
Stokes V profiles. These lobes are on the blue side of the
line core and their amplitudes and positions vary period-
ically in time resulting in the diagonal structure seen in
the image. The lobes can be identified with the emission
features seen in the Stokes I profiles. The 8542 Å line
Stokes V image shows a pattern of a multi-lobed pro-
files whose amplitudes vary strongly in time. The large
Stokes V amplitude phase coincides with the bright, very
asymmetric phase seen in the intensity profiles. The red
wings always exhibit less structure and variation than
the blue wings.
3.2.3. Strong network
Stokes I and V profiles seen in the strong network re-
gions (Figs. 13 and 14) would appear at first glance to
be a mixture of the less dynamic internetwork and the
highly dynamic intermediate flux region. The Stokes I
profiles exhibit the same pattern of bright (more asym-
metric) and dark (less asymmetric) phases as seen in the
intermediate flux region. The difference between the two
phases is however not as large: the amplitude of the self-
reversals, especially in the 8542 Å intensity profiles, is
much smaller than in the intermediate flux case.
The Stokes V images resemble those of the intermedi-
ate flux region: some diagonal structures are seen, but
they are weaker. The 8542 Å line Stokes V profiles have a
time varying amplitude but the profiles are not as asym-
metric and they are not necessarily multi-lobed. The dif-
Fig. 8.— As Fig. 5 but for the time series.
ference between the time-dependent behavior of the red
and blue lobes of the profile, i.e. the red lobe varies less
in time, is even more clear here than in the intermediate
flux region.
3.3. Statistics
Histograms of the Stokes I amplitude integrated over
250 mÅ around the line core for the two Ca lines are
shown in the top-left panel of Figure 15. These his-
tograms include both the map and time-series profiles.
Because there are almost five times as many profiles in
the time-series as there are in the map, the histograms
are dominated by the time-series profiles. Both lines ex-
hibit a wide range of values. Except for the peaks at
low intensities, the histograms are fairly flat. The dark-
est (i.e. lowest core intensity or most absorption) am-
plitudes, are associated with the internetwork, and the
brightest with the network.
Histograms of the Stokes V amplitudes (top right panel
of Fig. 15) peak at the same value in both lines, 0.003 Ic,
but the 8498 Å histogram tail decays more slowly. Since
the 8498 Å line is formed slightly lower of the two and
the lines are roughly equally sensitive to magnetic fields
(effective Landé g factors are 1.07 and 1.10 for the 8498
Å and the 8542 Å lines, respectively), it is not surprising
that the 8498 Å histogram has the longer tail.
Fig. 9.— The spatio-temporal distributions of the clusters for the first slit position in the time series. The black areas correspond to
regions where the Stokes V amplitudes is ≤ 7×10−3Ic. The vertical lines show the period with the best seeing when the slit was stationary.
Both lines’ Stokes V amplitude asymmetry histograms
(bottom left panel of Fig. 15) have very similar shapes
and similar widths. There are more positive asymme-
tries in both lines: 56% in 8498 Å and 64 % in 8542
Å (Table 1). The mean amplitude asymmetries are also
positive, and the 8542 Å mean asymmetry is two times
larger. There are more negative amplitude asymmetries
in the 8542 Å map than in the time series. Non-zero am-
plitude asymmetries indicate at least one of two things:
the spatial pixels consist in most cases of at least two
atmospheric components that are shifted relative to one
another or that there are velocity and/or magnetic field
gradients present in the atmosphere.
The area asymmetry histograms (bottom right panel
of Fig. 15) of the two calcium lines repeat the pattern al-
ready seen in the cluster profiles: the 8542 Å histograms
is centered around a negative value and the 8498 Å is
centered at roughly zero, though the mean is slightly pos-
itive. The 8542 Å histogram is significantly wider than
the 8498 Å histogram. A multi-component atmosphere
alone cannot produce area asymmetries, so the existence
of non-zero area asymmetries indicates the presence of
velocity and possibly magnetic gradients in the atmo-
sphere.
In the 8542 Å line 66 % of the profiles have negative
area asymmetries whereas in the 8498 Å line the majority
of the profiles, 64 %, have positive area asymmetries (Ta-
ble 1). To better understand why the area asymmetry
histograms of the lines are so different, we need to look
at the components of the area asymmetry separately i.e.
the sign of the blue lobe and the total area of the Stokes
V profile. One possible cause for the difference in the
histograms might be that the distribution of signs of the
blue lobe is different in the two lines. Closer inspection
reveals that this is not the explanation. The vast major-
ity of both lines, over 80 %, have a negative sign. (Here
the sign is defined to be the sign of the local maximum
or minimum amplitude of the blue lobe). A second pos-
sible explanation is that the
V (λ)dλ is different in
the two lines. This is found to be the case. The 8542 Å
line has more profiles with a positive area and the 8498
Å has slightly more profiles with a negative area. (Note
that the sign of the area asymmetry is the product of
the sign of the blue lobe and the area; eq. 2.) The area
of the Stokes V profile is strongly affected by the emis-
sion features. These features, and their amplitudes, are
related to the self-reversals seen in the Stokes I profiles.
The self-reversals are stronger on the blue side of the line
core than on the red. In general, the blue lobes of the
Stokes V profiles have negative amplitudes and the effect
of the emission features is then to reduce the amplitude,
and in some cases, make it positive and this way reduce
the overall negative area.
The effect of the emission features on the amplitude
asymmetries is not as large because the amplitude will
be affected only if the emission feature is located at the
Fig. 10.— Time dependent behavior of Stokes I in an internetwork pixel. Location of the pixel is marked with an asterisk in Figure 2.
same wavelength as the maximum absolute amplitude.
Also if the profile has a wide blue lobe, i.e., the wings
contribute significantly, a local reduction in peak ampli-
tude is counterbalanced by a comparable signal in the
other parts of the blue lobe. The resulting profile will
have nearly the same amplitude in the blue lobe as be-
fore, but the area will be reduced leading to a smaller, or
even negative, area asymmetry. Since the self-reversals
are larger in the 8542 Å line, this scenario is more likely
to apply to it than the 8498 Å line.
Both lines’ area and amplitude asymmetries are found
to be inversely proportional to the Stokes V amplitudes.
The scatter, especially in the 8542 Å line, is fairly large.
PCA also allows us to ensure that the determination of
Stokes V asymmetries is not dominated by noise. Recon-
structing the profiles using only the 11 first eigenvectors
(i.e., essentially noise-free profiles) and then computing
the asymmetries reproduces the Stokes V amplitude and
asymmetry histograms. To test if the negative histogram
peak in the 8542 Å line is an artifact caused by data re-
duction, we computed area asymmetries for the datasets,
but after first removing the fringe pattern caused by the
optics. This did not alter the area asymmetry histogram.
Another artifact that could cause the offset is an incor-
rect subtraction of the tilt caused by the detector in the
continuum intensity. To remove the offset in the his-
tograms by means of changing the tilt causes a clearly
visible lopsidedness in the Stokes I profiles. Lastly, to
make sure that the choice of the integration range is not
the cause of the offset, we used a constant bandwidth
for area asymmetries and it also reproduces the 8542 Å
area histogram offset. (Besides these issues, there are no
other obvious artifacts that would cause the offset.) We
therefore conclude that the offset is not caused by the
fringing or incorrect subtraction of the tilt in the contin-
uum intensity.
4. COMPARISON OF OBSERVATIONS WITH A HIGH-β
SIMULATION
In P06 we synthesized Stokes profiles for the Ca IR
triplet lines in the high-β regime. This was done by
combining a radiation hydrodynamic code (see for exam-
ple Carlsson & Stein 1997) with a weak magnetic field
and using a nLTE Stokes inversion and synthesis code
(Socas-Navarro et al. 2000b) to produce, based on snap-
shots of the simulation, a time series of the lines’ Stokes
vectors. The simulation is driven by a photospheric ve-
locity piston and its dynamics are dominated by upward
propagating acoustic waves in a simple magnetic field
topology. The simulation shows that the radiative trans-
fer is very similar in all the Ca IR triplet lines. The
differences between the line behaviors in the simulation
are mainly due to the lines having slightly different for-
mation heights and thus experiencing a difference in the
amplitudes of the shocking waves: the higher the line is
formed, the larger the amplitude of the passing wave is.
In the simulation there is no feedback from the mag-
netic fields on the dynamics and the waves are purely
acoustic. The observations have limited spatial and tem-
poral resolutions whereas the simulation is much better
resolved.
4.1. Comparison of time dependent behavior
As the acoustic waves in the simulation propagate up-
wards and eventually form shocks, a time-varying pat-
Fig. 11.— Time dependent behavior of Stokes I and V in an intermediate flux pixel. Location of the pixel is marked with a diamond in
Figure 2.
Fig. 12.— Time evolution of individual Stokes I and V profiles in an intermediate flux pixel. Location of the pixel is marked with a
diamond in Figure 2.
Fig. 13.— Time evolution of Stokes I and V in a network pixel. Location of the pixel is marked with a triangle in Figure 2.
Fig. 14.— Time evolution of individual Stokes I and V profiles in a network pixel. Location of the pixel is marked with a triangle in
Figure 2.
Fig. 15.— Histograms of Stokes I and V amplitudes, and Stokes V amplitude and area asymmetries of the map and time-series.
tern of disappearing and reappearing Stokes V lobes is
seen (Fig. 16). The pattern is strongest in the highest
forming line, i.e. 8542 Å. Wave propagation is also seen
in the Stokes I profiles. There are no large self reversals
or brightenings, instead the position of the line minimum
changes periodically and forms a saw-tooth like pattern
where the red shift takes more time than the blue shift
phase.
If we first compare the simulated profiles to the in-
ternetwork observations (Fig. 10), we see that the
strong signatures of shocks seen in the simulation are not
present in the observations. In the simulation the Ca IR
triplet is formed in a region where the waves are just be-
ginning to shock. If the formation height of the lines or
the shocks in the simulation is off, compared to the real
Sun, by a small amount, even 50 km, the lines’ temporal
evolution may look very different. Another possible ex-
planation to why we see no strong indications of shocks
is the temporal and/or spatial resolution: there may be
several components oscillating out of phase relative to
one another in a given resolution element. However, the
photospheric velocities are very similar in the internet-
work and network, but the network profiles show strong
self-reversals. This suggests that spatial and temporal
resolution alone cannot explain the lack of strong signa-
tures of shocks in the internetwork.
Observations of the quiet Sun show varying degrees
of oscillatory power (compare for example Lites et al.
(1993) [Ca II H and K] or UV data of Judge et al. 2003,
McIntosh & Judge 2001 and Wikstøl et al. 2000). This
variation may be related to the local magnetic topology,
especially to the possible existence of a magnetic canopy
(McIntosh et al. 2003; Vecchio et al. 2006). The region
observed here was less oscillatory than average but still
not exceptionally quiet.
Both the simulated profiles and observed network pro-
files (Fig. 13) show time varying patterns where the
Stokes I and V amplitudes change periodically. In the
simulation the wave propagation manifests itself in the
Stokes I profiles most clearly as a shift of the line core
and the saw-tooth shape of the time series. In the ob-
servations, waves cause the lines’ periodically varying
self-reversals that result in alternating bright and dark
phases. There are indications of diagonal structures in
the observed Stokes I images, but they are not nearly
as clear as in the simulation. In the simulation the up-
ward propagating waves cause the blue and red lobes of
the Stokes V profiles to disappear alternately. In con-
trast, the observed time varying pattern in Stokes V
looks more complicated: there is much more structure
in the observed profiles, especially in the line cores, than
in the simulation. This is related to the simulated pro-
files not exhibiting strong self-reversals as seen in the
observations.
In the simulation, because of radiative cooling and ex-
pansion of the falling material, the down flows are in
general cooler than the up flows. In the synthesized pro-
files this manifests itself by the red wings of the Stokes
I profiles showing less variations, though the difference
with the blue wing is quite small. Similar behavior is also
seen in the observations: the self-reversals are in general
larger in the blue wing of the Stokes I profiles and the red
lobes of the Stokes V profiles show clearly less variation.
4.2. Comparison of statistics and Stokes V
morphologies
In the simulation the magnetic field decays exponen-
tially with height and therefore the Ca II Stokes V am-
plitudes are significantly lower than the Fe I 8497 Å am-
plitude. In the observations the Ca and Fe line Stokes V
profiles have roughly the same amplitudes. This may be
explained by the field decaying much slower with height
in the observations, or by the filling factor in the ob-
servations being smaller in the photosphere than in the
chromosphere.
Both Ca II lines’ observed Stokes V profiles have a sig-
nificant amount of signal in the wings. In the simulations
only the 8498 Å line Stokes V has extended wings with
large amplitudes (Fig. 4 in P06). The amount of signal
in the wings depends on the atmospheric magnetic field
Fig. 16.— Time evolution of Stokes I and V profiles in the high-β simulation (P06). The Stokes V signal in wavelength range -1.2 to -0.6
Å in the 8498 Å image is scaled down with factor 7.5 in order to display both the Ca II 8498 Å and Fe I 8497 Å lines in the same panel.
gradient. If there is no gradient the wings of all three
Ca lines have very little signal. Whereas a model atmo-
sphere with a constant field gradient produces profiles
where all lines, 8498 Å the most, have some signal in
the line wings and an exponential field produces profiles
with the largest wings. Depending on where the gradi-
ent is located and how strong the field is, the Ca lines
may or may not have similar Stokes V profiles. Based
on the profile shapes and relative amplitudes, it is ob-
vious that the magnetic topology in the observations is
different from the simulation.
Formation of area and amplitude asymmetries in the
simulation is coupled. The correlation is especially
strong in the 8542 Å line (upper row of Fig. 17). In
the 8498 Å Stokes V profiles the strong wings affect
the asymmetries, and the correlation is weaker. The
observed area and amplitude asymmetries of both lines
show less correlation. This is at least partly because the
observed profiles have more complex shapes than in the
simulation.
The lower panels in Figure 17 show the Stokes V asym-
metry histograms for the simulation. The observed his-
tograms are re-plotted to enable direct comparison. In
the simulation both lines’ amplitude and asymmetry his-
tograms are centered roughly around zero (percentage-
wise there are a couple of percent more negative than
positive asymmetries). This was not the case in the ob-
servations where all the asymmetries, except the 8542 Å
area asymmetry, have clearly more positive than nega-
tive values, i.e. the blue lobe is larger in area/amplitude
than the red lobe.
The observed 8498 Å profiles are more dynamic than
the simulated ones. Consequently the observed 8498 Å
asymmetry histograms are clearly wider than the simu-
lated. Because there is very little signal in the simulated
8542 Å Stokes V profile wings, when an upward propa-
gating wave causes a Stokes V lobe to disappear, there is
no signal in the line wing to contribute to the amplitude.
This leads to the extreme amplitude asymmetries in the
simulations and in the additional lobes at large values
in the simulated 8542 Å line area asymmetry histogram.
Since the observed profiles have a significant amount of
signal in the wings, the extreme amplitude asymmetries
are moderated, and no lobes at large values are seen in
the histogram.
5. CONCLUSIONS AND DISCUSSION
So far most spectropolarimetric studies using the
Ca II IR triplet lines have focused on active regions
(e.g., Socas-Navarro, Trujillo Bueno, & Ruiz Cobo
2000a; López Ariste, Socas-Navarro, & Molodij
2001; Socas-Navarro 2005;
Uitenbroek, Balasubramaniam, & Tritschler 2006).
The observations presented here show that these lines
are also promising candidates for studying the magnetic
chromosphere outside of active regions. Interpreting the
observations, however, is not straight forward.
The main results of the analysis presented here are:
• Classification of Stokes V profile shapes.
Asymmetric line profiles are very common and that
the two lines, despite being formed fairly close in a
geometrical sense, often do not have similar shapes.
Furthermore, the edges of the network patches ex-
hibit profile shapes different from those seen in the
center of the patches. The cluster analysis results,
as expected, in a qualitative, not quantitative, de-
scription of the profile shapes.
• Statistics of the line profiles.
The 8542 Å area asymmetry is predominantly neg-
ative; while the 8498 Å area asymmetry and the
amplitude asymmetries are usually positive.
• Time dependent behavior.
The enhanced network has very different dynamic
behavior compared with the internetwork. It is
more dynamic and the oscillation period, as seen
in both Stokes I and V , is greater than in the in-
ternetwork.
• Comparison with high-β simulation.
Oscillations are present in both the observations
and the simulation. The simulated profiles are
more dynamic than the observed internetwork pro-
files. The opposite is true for network profiles.
In the simulation, the formation of asymmetries is
more tightly coupled than what is seen in the ob-
servations. Except for the 8542 Å amplitude asym-
metry the observed profiles show a wider range of
asymmetries. And lastly, the peculiar negative area
asymmetries seen in the observed 8542 Å line and
the tendency of the other asymmetries to be posi-
tive are not reproduced by the simulation.
The tendency of large Stokes V asymmetries to de-
crease with an increasing signal amplitude has also been
observed in photospheric lines (Grossmann-Doerth et al.
1996). In the photosphere a magnetic canopy is one pos-
sible explanation: the canopy gives rise to asymmetries
in the lines, and as a flux tube diameter increases, the
relative contribution from the canopy to the Stokes V
signal decreases. In the photosphere the scatter in an
amplitude vs. asymmetry plot is significantly larger in
the area than in the amplitude. No large difference is
seen in the area and amplitude asymmetry scatters of
the Ca II lines.
In the quiet Sun photosphere, more positive
than negative Stokes V asymmetries are found
(Grossmann-Doerth et al. 1996). In contrast with 8498
Å line (where there is no large difference in the mean area
and amplitude asymmetries) the photospheric mean area
asymmetries are significantly smaller (4 % in the Fe I
6302 Å line) than the mean amplitude asymmetries (15
% in the Fe I 6302 Å line). The photospheric asymme-
tries are often attributed to multiple atmospheric compo-
nents within a resolution element. In the chromosphere,
however, gradients have to play a dominant role since
the formation of area asymmetries require them. An-
other piece of evidence of the importance of gradients
in the chromosphere is that Milne-Eddington inversions,
which include the Paschen-Back effect of the He I 10830
Å triplet, are not able to reproduce the observed area
asymmetries (Sasso & Solanki 2006).
Fig. 17.— Stokes V asymmetries of the simulated and observed profiles. Upper 4 panels show the correlation of amplitude and area
asymmetries in the simulated and observed Ca lines. The Pearson correlation coefficient for each case is given. The asterisk symbols show
the mean for each 0.1 wide bin and the error bars show the standard deviation. The lower panels are histograms of observed and simulated
amplitude and area asymmetries.
Khomenko et al. (2005) used a 3-dimensional magne-
toconvection model to synthesize photospheric magneti-
cally sensitive lines in the visible and IR. There are more
positive than negative Stokes V asymmetries in their syn-
thetic profiles. They found that reducing the spatial res-
olution increases the number of irregular stokes V pro-
files (though the number of strongly asymmetric profiles
decreases). They conclude that the asymmetries reflect
more inhomogeneities in the horizontal direction than in
the vertical. In the chromosphere large velocity gradients
are more common and variation in the vertical direction
are likely to be more important than variation in the hor-
izontal direction. When these two factors are combined
with the observed area asymmetries, one concludes that
the chromospheric asymmetries mainly reflect the line-of-
sight inhomogeneities, and not variations in the horizon-
tal direction. Despite the apparent similarities between
the photospheric and chromospheric Stokes V profiles,
the underlying mechanism causing the asymmetries does
not appear to be the same. Drawing parallels between
the chromosphere and photosphere is problematic since
the two regions exist in very different physical regimes.
The discrepancy between the Stokes V asymmetry his-
tograms of the observations and the simulation may be
related to the self-reversals. The simulated profiles ex-
hibit only small self-reversals. The observations show
large self-reversals in the Stokes I profiles and accompa-
nying emission features in the Stokes V profiles. These
features are stronger on the blue side of the line cores.
Another effect that contributes to the imbalance is that
that the down flow phase lasts longer. Our observations,
especially with a 5 second exposure time, sample more
profiles with red-shifts and positive asymmetries (since
there will be more emission on the blue side). However,
inspection of Fig. 16 shows the same to be true of the
simulations. If this is the case, why are there not more
positive than negative asymmetries in the simulation as
well?
The sample of these observations is limited because the
majority of the profiles are drawn from the same three
slit positions which sample the same local magnetic field
configuration. It would not be surprising if histograms
made of profiles from a variety of quiet-Sun magnetic
field topologies would have somewhat different shapes.
The complexity of the observed profiles makes the inter-
pretation of the area and amplitude asymmetries diffi-
cult. Because of multiple lobes and the strong signal in
the line wings, the asymmetries are not necessarily good
proxies for the overall complexity of the Stokes V pro-
files. This is especially true if the two asymmetries are
viewed separately.
It is a well known result that the network intensity
oscillations have a longer period than the internetwork
(e.g. Orrall 1966, Lites et al. 1993, Banerjee et al. 2001).
This has also been observed before in the Ca II IR
lines (Deubner & Fleck 1990). Why do the intermedi-
ate flux regions in our observations appear to be more
dynamic than the stronger flux regions? It may be re-
lated to a more complex magnetic topology at the edges
of the network patches. The observations show no sig-
nal above the noise in Stokes Q and U , so we can-
not draw any conclusions of possible horizontal fields.
Any signal would be affected by atomic polarization
(Manso Sainz & Trujillo Bueno 2003) making the inter-
pretation exceedingly complex. The filling factor in the
network is not likely to be very large, and is likely smaller
at the edges than in the center of the network patch. In-
versions by Bellot Rubio et al. (2000) of average Stokes
profiles in a plage region gave a filling factor of 0.5 a
z = 0 km. The filling factor in the photospheric net-
work can safely be assumed to be lower than this. In
fact, in recent inversions by Domı́nguez Cerdeña et al.
(2006), which included a small patch of network, the pho-
tospheric filling factor in the patch center was as small
as 0.1. The network magnetic fields must expand with
height and consequently the chromospheric filling fac-
tor must exceed photospheric values. Results of com-
paring photospheric and chromospheric magnetograms,
however, Zhang & Zhang (2000) suggest that the sizes
of the network magnetic elements are not very different
at the two heights . The chromospheric magnetograms in
the comparison are based on the Hβ line. Its interpreta-
tion is complicated by the magnetically sensitive blends
close to the line core, and the line may suffer from same
problems as the Hα line when used as a proxy for chro-
mospheric magnetic fields, namely that the photospheric
contribution to the polarization signal is not insignificant
(Socas-Navarro & Uitenbroek 2004). Lastly, the size of
network patches is not directly linked with the filling fac-
tor. We see some expansion of the network with height in
the magnetograms of the map (Fig. 2), especially when
comparing the Ca II 8498 Å and 8542 Å magnetograms.
But since the magnetograms were constructed by using
the weak field formula, and the network fields have gra-
dients and are not necessarily weak, the magnetograms
are not accurate. Also the choice of color scaling of the
images affects the comparison. However, the apparent
expansion is not necessarily an artifact, since expansion
of network seen in magnetograms has also been reported
by Giovanelli (1980).
Obviously we need to understand better the topology
of the network magnetic fields. To do this we plan to
perform nLTE inversions of these data in the near fu-
ture. The inversions will help further in understanding
the formation dynamics of the Ca II IR lines in the quiet
Sun, and hopefully reveal how the underlying atmosphere
differs from that used in the simulation. An important
question to answer is why the two Ca lines behave as
differently as they do. Having a time series taken during
good seeing would be helpful. Also in order to expand the
analysis to internetwork regions, better spatial resolution
is required. Another interesting question is how much
variation there is in dynamics in different internetwork
regions, and how well the differences can be explained in
terms of the surrounding magnetic fields as has been sug-
gested by Vecchio et al. (2006) based on imaging data of
Ca II 8542 Å Stokes I. To fully investigate this in detail
high quality data of the full Stokes vector are needed.
TABLE 1
Observed Stokes V asymmetries
8498 Å 8498 Å 8498 Å 8542 Å 8542 Å 8542 Å
< 0 (%) > 0 (%) mean (%) < 0 (%) > 0 (%) mean (%)
σa 43.2 55.7 3.1 36.6 61.4 6.3
σA 35.5 64.5 3.3 69.7 30.3 -6.8
Note. — Percentages of observed Ca II 8498 Å and 8542 Å Stokes V
amplitude and area asymmetries with negative (i.e. red lobe larger) and
positive (i.e. blue lobe larger) signs.
Thanks to Doug Gilliam, Joe Elrod and Mike Bradford for all their invaluable help during the observing run.
REFERENCES
Andretta, V. & Jones, H. P. 1997, ApJ, 489, 375
Banerjee, D., O’Shea, E., Doyle, J. G., & Goossens, M. 2001,
A&A, 371, 1137
Bel,N. & Leroy, B. 1977, A&A, 55, 239
Bellot Rubio, L. R. 2006, ArXiv Astrophysics e-prints
Bellot Rubio, L. R., Ruiz Cobo, B., & Collados, M. 2000, ApJ,
535, 489
Carlsson, M. & Stein, R. F. 1997, ApJ, 481, 500
Deubner, F.-L. & Fleck, B. 1990, A&A, 228, 506
Domı́nguez Cerdeña, I., Almeida, J. S., & Kneer, F. 2006, ApJ,
646, 1421
Giovanelli, R. G. 1980, Solar Phys., 68, 49
Grossmann-Doerth, U., Keller, C. U., & Schuessler, M. 1996,
A&A, 315, 610
Judge, P. G., Carlsson, M., & Stein, R. F. 2003, ApJ, 597, 1158
Keller, C. U. & The Solis Team. 2001, in ASP Conf. Ser. 236:
Advanced Solar Polarimetry – Theory, Observation, and
Instrumentation, ed. M. Sigwarth, 16–+
Khomenko, E. V., Collados, M., Solanki, S. K., Lagg, A., &
Trujillo Bueno, J. 2003, A&A, 408, 1115
Khomenko, E. V., Shelyag, S., Solanki, S. K. & Vögler, A. 2005,
A&A, 442, 1059
Lagg, A. 2005, in ESA SP-596: Chromospheric and Coronal
Magnetic Fields, ed. D. E. Innes, A. Lagg, & S. A. Solanki
Landi Degl’Innocenti,E. 1992, in Solar Observations: Techniques
and Interpretation, First Canary Islands Winter School of
Astrophysics, ed. F. Sanchez, M. Collados & M. Vazquez
Lites, B. W., Chipman, E. G., & White, O. R. 1982, ApJ, 253, 367
Lites, B. W., Rutten, R. J., & Kalkofen, W. 1993, ApJ, 414, 345
López Ariste, A., Socas-Navarro, H., & Molodij, G. 2001, ApJ,
552, 871
MacQueen, J. 1967, in Proceedings Fifth Berkeley Symposium on
Math. Stat. and Prob., ed. L. M. LeCam & J. Neyman, 281–+
Manso Sainz, R. & Trujillo Bueno, J. 2003, Physical Review
Letters, 91, 111102
Mart́ınez Pillet, V., L. B. W. . S. A. 1997, ApJ, 474, 810
McIntosh, S. W., Fleck, B., & Judge, P. G. 2003, A&A, 405, 769
McIntosh, S. W. & Judge, P. G. 2001, ApJ, 561, 420
Neckel, H. & Labs, D. 1984, Solar Phys., 90, 205
Noyes, R. W. 1967, in IAU Symp. 28: Aerodynamic Phenomena
in Stellar Atmospheres, ed. R. N. Thomas, 293–+
Orrall, F. Q. 1966, ApJ, 143, 917
Paletou, F. & Molodij, G. 2001, in ASP Conf. Ser. 236: Advanced
Solar Polarimetry – Theory, Observation, and Instrumentation,
ed. M. Sigwarth, 9–+
Pietarila, A., Socas-Navarro, H., Bogdan, T., Carlsson, M., &
Stein, R. F. 2006, ApJ, 640, 1142
Rees, D. E., López Ariste, A., Thatcher, J., & Semel, M. 2000,
A&A, 355, 759
Rimmele, T. R. 2000, in Proc. SPIE Vol. 4007, p. 218-231,
Adaptive Optical Systems Technology, Peter L. Wizinowich;
Ed., ed. P. L. Wizinowich, 218–231
Sánchez Almeida, J. & Lites, B. W. 2000, ApJ, 532, 1215
Sasso, C., Lagg, A. & Solanki, S. 2006, A&A, 456, 367
Scharmer, G. B., Bjelksjo, K., Korhonen, T. K., Lindberg, B., &
Petterson, B. 2003, in Innovative Telescopes and
Instrumentation for Solar Astrophysics. Edited by Stephen L.
Keil, Sergey V. Avakyan . Proceedings of the SPIE, Volume
4853, pp. 341-350 (2003)., ed. S. L. Keil & S. V. Avakyan,
341–350
Shimizu, T. 2004, in ASP Conf. Ser. 325: The Solar-B Mission and
the Forefront of Solar Physics, ed. T. Sakurai & T. Sekii, 3–+
Socas-Navarro, H. 2005, ApJ, 631, L167
Socas-Navarro, H., López Ariste, A., & Lites, B. W. 2001, ApJ,
553, 949
Socas-Navarro, H., Trujillo Bueno, J., & Landi Degl’Innocenti, E.
2004, ApJ, 612, 1175
Socas-Navarro, H., Trujillo Bueno, J., & Ruiz Cobo, B. 2000a,
Science, 288, 1398
—. 2000b, ApJ, 530, 977
Socas-Navarro, H. & Uitenbroek, H. 2004, ApJ, 603, L129
Socas-Navarro, H. et al. 2006, Solar Physics, 235, 55
Stein, R. F. & Nordlund, Å. 2006, ApJ, 642, 1246
Uitenbroek, H., Balasubramaniam, K. S., & Tritschler, A. 2006,
ApJ, 645, 776
Vecchio, A., Cauzzi, G., Reardon, K. P., Janssen, K. & Rimmele,
T. 2006, astro-ph/0611206
Wikstøl, Ø., Hansteen, V., Carlsson, M., & Judge, P. G. 2000,
Astrophys. J., 531, 1150
Zhang, H. & Zhang, M. 2000, Solar Phys., 196, 269
http://arxiv.org/abs/astro-ph/0611206
ABSTRACT
  The Ca II infrared triplet is one of the few magnetically sensitive
chromospheric lines available for ground-based observations. We present
spectropolarimetric observations of the 8498 A and 8542 A lines in a quiet Sun
region near a decaying active region and compare the results with a simulation
of the lines in a high plasma-beta regime. Cluster analysis of Stokes V profile
pairs shows that the two lines, despite arguably being formed fairly close,
often do not have similar shapes. In the network, the local magnetic topology
is more important in determining the shapes of the Stokes V profiles than the
phase of the wave, contrary to what our simulations show. We also find that
Stokes V asymmetries are very common in the network, and the histograms of the
observed amplitude and area asymmetries differ significantly from the
simulation. Both the network and internetwork show oscillatory behavior in the
Ca II lines. It is stronger in the network, where shocking waves, similar to
those in the high-beta simulation, are seen and large self-reversals in the
intensity profiles are common.

<|endoftext|><|startoftext|>
Introduction
In this paper we compute the number of moduli of certain families of irreducible
plane curves with nodes and cusps as singularities. Let Σnk,d ⊂ P(H
0(P2,OP2(n))) :=
PN , with N =
n(n+3)
, be the closure, in the Zariski’s topology, of the locally closed
set of reduced and irreducible plane curves of degree n with k cusps and d nodes.
Let Σ ⊂ Σnk,d be an irreducible component of the variety Σ
k,d. We denote by
Σ0 the open set of Σ of points [Γ] ∈ Σ such that Σ is smooth at [Γ] and such
that [Γ] corresponds to a reduced and irreducible plane curve of degree n with d
nodes, k cusps and no further singularities. Since the tautological family S0 → Σ0,
parametrized by Σ0, is an equigeneric family of curves, by normalizing the total
space, we get a family
// S0
// P2 × Σ0
of smooth curves of genus g =
−k−d. Because of the functorial properties of
the moduli space Mg of smooth curves of genus g, we get a regular map Σ0 → Mg,
sending every point [Γ] ∈ Σ0 to the isomorphism class of the normalization of the
plane curve Γ corresponding to the point [Γ]. This map extends to a rational map
ΠΣ : Σ 99K Mg.
We say that ΠΣ is the moduli map of Σ and we set
number of moduli of Σ := dim(ΠΣ(Σ)).
Date: 06 September 2005.
1991 Mathematics Subject Classification. 14H15; 14H10; 14B05.
Key words and phrases. families of plane curves, number of moduli, nodes and cusps.
http://arxiv.org/abs/0704.0618v1
2 CONCETTINA GALATI
Notice that, when Σnk,d is reducible, two different irreducible components of Σ
can have different number of moduli. We say that Σ has general moduli if ΠΣ is
dominant. Otherwise, we say that Σ has special moduli.
Definition 1.1. When Σ has the expected dimension equal to 3n+ g − 1 − k and
g ≥ 2, we say that Σ has the expected number of moduli if
dim(ΠΣ(Σ)) = min(dim(Mg), dim(Mg) + ρ− k),
where ρ := ρ(2, g, n) = 3n−2g−6 is the number of Brill-Noether of the linear series
of degree n and dimension 2 on a smooth curve of genus g.
As we shall see in the next section, when g ≥ 2 and when Σ has the expected
dimension equal to 3n+ g − 1 − k, the number of moduli of Σ is at most equal to
the expected one. This happens in particular if k < 3n. If k ≥ 3n, in general we
have not an upper-bound for the dimension of Σ and we cannot provide an upper
bound for the number of moduli of Σ, (see lemma 2.2 and remark 2.3). Moreover,
by classical Brill-Neother theory when ρ is positive and by a well know result of
Sernesi when ρ ≤ 0 (see [18]), we have that Σn0,d, (which is irreducible by [8]), has
the expected number of moduli for every d ≤
. When k > 0 there are known
results giving sufficient conditions for the existence of irreducible components Σ of
Σnk,d with general moduli, (see propositions 2.5 and 2.6 and corollary 2.7). In this
article we construct examples of families of irreducible plane curves with nodes and
cusps with finite and expected number of moduli. A large part of this paper is
obtained working out the main ideas and techniques that Sernesi uses in [18].
In section 2.1 we introduce the varieties Σnk,d and we recall their main properties.
In section 2.2 we discuss on definition 1.1 and we summarize known results on the
number of moduli of families of irreducible plane curves with nodes and cusps. In
theorem 3.5 we prove the existence of plane curves with nodes and cusps as singular-
ities whose singular points are in sufficiently general position to impose independent
linear conditions to a linear system of plane curves of a certain degree. This result
is related to the moduli problem by lemma 3.2, remark 3.4 and proposition 4.1,
where we find sufficient conditions in order that an irreducible component Σ ⊂ Σnk,d
has the expected number of moduli. If Σ verifies the hypotheses of proposition 4.1,
then the Brill-Neother number ρ is not positive and Σ has finite number of moduli.
Moreover, by lemma 4.6 and corollary 4.7, for every k′ ≤ k and d′ ≤ d+k−k′, there
is at least an irreducible component Σ′ ⊂ Σnk′,d′ , such that Σ ⊂ Σ
′ and the general
element [D] ∈ Σ′ corresponds to a plane curve D verifying hypotheses of propo-
sition 4.1 and so having the expected number of moduli. Finally, the main result
of this paper is contained in theorem 4.9, where, by using induction on the degree
n and on the genus g of the general curve of the family, we construct examples of
families of irreducible plane curves with nodes and cusps verifying the hypotheses
of proposition 4.1. In particular, we prove that, if k ≤ 6 and ρ ≤ 0, then Σnk,d has
at least an irreducible component which is not empty and which has the expected
number of moduli. This result may be improved and examples of families of curves
showing that the condition k ≤ 6 is not sharp are given in remark 4.10. Notice
that the previous theorem provides only examples of families of plane curves with
nodes and cusps with expected number of moduli, when ρ is not positive. When the
number of cusps k is very small, we expect it is possible to prove the existence of
irreducible components of Σnk,d with expected number of moduli, for every value of
ρ. For example, from a result of Eisenbud and Harris, it follows that Σn1,d, (which is
irreducible by [16]), has general moduli if ρ ≥ 2, (see corollary 2.7). In theorem 4.11,
by using induction on n we find that Σn1,d has general moduli also when ρ = 1. By
recalling that, by theorem 4.9, Σn1,d has expected number of moduli when ρ ≤ 0, we
conclude that Σn1,d has the expected number of moduli for every ρ or, equivalently,
NUMBER OF MODULI OF IRREDUCIBLE FAMILIES... 3
for every d ≤
− 1. We still don’t know examples of irreducible components of
Σnk,d having number of moduli smaller that the expected.
2. Preliminaries
2.1. On Severi-Enriques varieties. We shall denote by PN = P
n(n+3)
2 the Hilbert
scheme of plane curves of degree n, by [Γ] ∈ PN the point parametrizing a plane
curve Γ ⊂ P2 and by Σnk,d ⊂ P
N the closure, in the Zariski topology, of the locally
closed set parametrizing reduced and irreducible plane curves of degree n with d
nodes and k cusps as singularities. These varieties have been introduced at the
beginning of the last century by Severi and Enriques. In particular, the case k = 0
has been studied first by Severi and for this reason the varieties Σn0,d are usually
called Severi varieties, while for k > 0 the varieties Σnk,d are called Severi-Enriques
varieties. We recall that every irreducible component Σ of Σnk,d has dimension at
least equal to
N − d− 2k = 3n+ g − 1− k,
where g =
−k−d. When the equality holds we say that Σ has expected dimen-
sion. Moreover, it is well known that if k < 3n then every irreducible component
Σ of Σnk,d has expected dimension, (see for example [23] or [25]). On the contrary,
when k ≥ 3n, there exist examples of irreducible components of Σnk,d having di-
mension greater than the expected, (see [25]). Moreover, we recall that Σn0,d is not
empty for every d ≤
and it contains in its closure all points parameterizing
irreducible plane curves of degree n and genus g =
− d, (see [24], [25] and
[1]). Often, we shall denote Σn0,d by Vn,g. While the proof of the existence of Vn,g
is quite elementary and it is due to Severi, the irreducibility of Vn,g remained an
open problem for a long time and it has been proved by Harris only in 1986. Later,
by using the same techniques of Harris, Kang has proved the irreducibility of Σnk,d
with k ≤ 3, see [8] and [16]. However, in general, Σnk,d is reducible and there exist
values of n, d and k such that Σnk,d is empty, (see [25], [12], [20], [11] or chapter 2
of [7] and related references). Finally, we recall that, if Σ ⊂ Σnk,d is a non-empty
irreducible component of the expected dimension equal to 3n + g − 1 − k, then,
for every k′ ≤ k and d′ ≤ d + k − k′, there exists a non-empty irreducible compo-
nent Σ′ ⊂ Σnk′,d′ such that Σ ⊂ Σ
′. This happens in particular if k < 3n. More
precisely, it is true that, if Γ ⊂ P2 is a reduced (possibly reducible) plane curve of
degree n with k < 3n cusps at points q1, . . . , qk, nodes at points p1, . . . , pd and no
further singularities, then, chosen arbitrarily k1 cusps, say q1, . . . , qk1 among the
k cusps of Γ, k2 cusps qk1+1, . . . , qk2 among qk1+1, . . . , qk and d1 nodes p1, . . . , pd1
among the nodes of Γ, there exists a family of reduced plane curves D → B ⊂ PN
of degree n, whose special fibre is D0 = Γ and whose general fibre Dt = D has a
node in a neighborhood of every marked node of Γ, a cusp in a neighborhood of
each point q1, . . . , qk1 , a node in a neighborhood of each point qk1+1, . . . , qk2 and
no further singularities, (see [25], corollary 6.3 of [11] or lemma 3.17 of chapter 2
of [7]). To save space, we shall say that the family D → B is obtained from Γ by
preserving the singularities q1, . . . , qk1 and p1, . . . , pd1 , by deforming in a node each
cusp qk1+1, . . . , qk2 and by smoothing the other singularities.
2.2. Known results on the number of moduli of Σnk,d. In order to explain
the definition 1.1, we need to recall some basics of Brill-Noether theory. Given a
smooth curve C of genus g, the set G2n(C) of linear series g
n on C of dimension 2
and degree n, is a projective variety which verifies the following properties:
(1) G2n(C) is not empty of dimension at least ρ, if ρ(2, n, g) = 3n− 2g − 6 ≥ 0,
(see theorem V.1.1 and proposition IV.4.1 of [4]).
4 CONCETTINA GALATI
(2) Let g2n be a given linear series, letH ∈ g
n be a divisor and letW ⊂ H
0(C,H)
be the three dimensional vector space corresponding to g2n. Denoting by
ωC = OC(KC) the canonical sheaf of C and by
µo,C :W ⊗H
0(C, ωC(−H)) → H
0(C, ωC)
the natural multiplication map, also called the Brill-Noether map of the
pair (C,W ), we have that the dimension of the tangent space to G2n(C) at
the point [g2n],corresponding to g
n, is equal to
dim(T[g2
n(C)) = ρ+ dim(ker(µ0,C)),
(see [2] or proposition IV.4.1 of [4] for a proof).
(3) Moreover, if C is a curve with general moduli (i.e. if [C] varies in an open
set of Mg), the variety G
n(C) is empty if ρ < 0, it consists of a finite
number of points if ρ = 0 and it is reduced, irreducible, smooth and not
empty variety of dimension exactly ρ, when ρ ≥ 1, (see theorem V.1.5 and
theorem V.1.6 of [4]). In the latter case, the general g2n on C defines a local
embedding on C and it maps C to P2 as a nodal curve, (see theorem 3.1 of
[1] or lemma 3.43 of [9]).
From (3), we deduce that, the Severi variety Σn0,d = Vn,g of irreducible plane curves
of genus g =
− d, has general moduli when ρ ≥ 0 and it has special moduli
when ρ < 0. When ρ < 0, and then g ≥ 3, by definition 1.1, we expect that the
image of Vn,g into Mg has codimension exactly −ρ. Equivalently, recalling that, in
this case,
dim(Vn,g) = 3n+ g − 1 = 3g − 3 + ρ+ 8 = dim(Mg) + ρ+ dim(Aut(P
we expect that on the smooth curve C, obtained by normalizing the plane curve
corresponding to the general element of Vn,g, there is only a finite number of g
mapping C to the plane as a nodal curve. This is a well known result proved by
Sernesi in [18].
Theorem 2.1 (Sernesi, [18]). The Severi variety Vn,g = Σ
0,d of irreducible plane
curves of degree n and genus g =
− d has number of moduli equal to
min(dim(Mg), dim(Mg) + ρ).
What can we say about the number of moduli of an irreducible component Σ of
Σnk,d, when k > 0? In this case we need to distinguish the two cases k < 3n and
k ≥ 3n. In the first case we have the following result.
Lemma 2.2. For every not empty irreducible component Σ of Σnk,d, with k < 3n
and g =
− k − d ≥ 2, the number of moduli of Σ is at most equal to
min(dim(Mg), dim(Mg) + ρ− k),
where ρ = 3n − 2g − 6 is the Brill-Neother number of moduli of linear series of
dimension 2 and degree n on a smooth curve of genus g.
Proof. We recall that an ordinary cusp P of a plane curve Γ corresponds to a simple
ramification point p of the normalization map φ : C → Γ, i.e. to a simple zero of
the differential map dφ. If we denote by G2n,k(C) ⊂ G
n(C) the set of g
n on C
defining a birational morphism with k simple ramification points, then G2n,k(C) is
a locally closed subset of G2n(C) and every irreducible component G of G
n,k(C) has
dimension at least equal to ρ− k, if it is not empty. In particular, if F 2n,k(C) is the
variety whose points correspond to the pairs ([g2n], {s0, s1, s2}) where [g
n] ∈ G
n,k(C)
and {s0, s1, s2} is a frame of the three dimensional space associated to the linear
series g2n, then every irreducible component of F
n,k(C) has dimension at least equal
min(8, ρ− k + 8).
NUMBER OF MODULI OF IRREDUCIBLE FAMILIES... 5
Now, let Σ be one of the irreducible components of Σnk,d and let [Γ] be a general
point of Σ. Then, if Γ ⊂ P2 is the corresponding plane curve and φ : C → Γ is the
normalization map, then the fibre over the point [C] ∈ Mg of the moduli map
ΠΣ : Σ 99K Mg
consists of an open set in one or more irreducible components of F 2n,k(C). In partic-
ular, every irreducible component of the general fibre of ΠΣ has dimension at least
equal to min(8, ρ− k + 8). Moreover, if k < 3n then Σ has the expected dimension
equal to N−d−2k = 3n+g−1−k, (see [25] or [23]). Finally, if g =
−k−d ≥ 2,
dim(Σ) = 3n+ g − 1− k = 3g − 3 + ρ− k + 8.
This proves the statement. �
Remark 2.3. The proof of the previous lemma still holds if k ≥ 3n but Σ has the
expected dimension. However in general, when k ≥ 3n, we don’t have a bound for
dim(ΠΣ(Σ)). Indeed, in this case the dimension of the general fibre of the moduli
map of Σ is still at least equal to ρ− k + 8, but Σ may have dimension larger than
3n+ g − 1 − k. Anyhow, by the following proposition, every not empty irreducible
component of Σnk,d has special moduli if k ≥ 3n.
Proposition 2.4 (Arbarello-Cornalba, [1]). Let C be a general curve of genus
g ≥ 2 and let φ : C → P2 be a birational morphism, then the degree of the zero
divisor of the differential map of φ is smaller than ρ. In particular, every irreducible
component of Σnk,d has special moduli if ρ = 3n− 2g − 6 < k.
A sufficient condition for the existence of irreducible families of plane curves with
nodes and cusps with general moduli is given by the following result.
Proposition 2.5 (Kang, [15]). Σnk,d is irreducible, not empty and with general
moduli if n > 2g − 1 + 2k, where g =
− d− k.
Actually, in [15], Kang proves that if n > 2g−1+2k, then Σnk,d is not empty and
irreducible. But from his proof it follows that, under the hypothesis of proposition
2.5, Σnk,d has general moduli because the general element of Σ
k,d corresponds to a
curve which is a projection of an arbitrary smooth curve C of genus g in Pn−g, from
a general (n − 3)-plane intersecting the tangent variety of C in k different points.
Another result which may be used to find examples of families of plane curves with
nodes and cusps having general moduli is the following. Let grn be a linear series on
C associated to a (r+1)-spaceW ⊂ H0(C,L), where L is an invertible sheaf on C,
and let {s0, . . . , sr} be a basis of W , then the ramification sequence of the g
n at p
is the sequence b = (b0, . . . , br) with bi = ordpsi − i. Choosing another basis of W ,
the ramification sequence of grn at p doesn’t change. We say that the ramification
sequence of the grn at p is at least equal to b = (b0, . . . , br) if bi ≤ ordpsi − i, for
every i, and we write (ordps0, . . . , ordpsr − r) ≥ (b0, . . . , br).
Proposition 2.6 (Proposition 1.2 of [10]). Let C be a general curve of genus g,
let p be a general point on C and let b = (b0, . . . , br) be any ramification sequence.
There exists a grn on C having ramification at least b at p if and only if
(bi + g − n+ r)+ ≤ g,
where (−)+ := max(−, 0).
From proposition 2.6, we easily deduce the following result.
Corollary 2.7. Suppose that k ≤ 3 and ρ = 3n− 2g − 6 ≥ 2k. Then Σnk,d is not
empty, irreducible and it has general moduli.
6 CONCETTINA GALATI
Proof. By [16], the variety Σnk,d is irreducible for every k ≤ 3 and d ≤
Moreover, by using classical arguments, one can prove that Σnk,d is not empty if
k ≤ 4 and d ≤
− 4, (see, for example, corollary 3.18 of chapter two of [7]).
Finally, by theorem 1.1 of [21], by using the terminology of proposition 2.6, under
the hypothesis k ≤ 3n − 4, in particular if k ≤ 3, the variety Σnk,d contains every
point of PN corresponding to a plane curve Γ of genus g =
− k − d such that
the normalization morphism of Γ has at least a ramification point with ramification
sequence (b0, b1, b2) ≥ (0, k, k). Then, by proposition 2.6, if ρ ≥ 2k and k ≤ 3, the
moduli map of Σnk,d is surjective. �
3. On the existence of certain families of plane curves with nodes
and cusps in sufficiently general position
As we already observed, we don’t have a complete answer for the existence prob-
lem of Σnk,d. In this section we are interested in a little more specific existence
problem. We shall prove the existence of plane curves with nodes and cusps as
singularities whose singular points are in sufficiently general position to impose
independent linear conditions to a linear system of plane curves of a certain degree.
Definition 3.1. A projective curve C ⊂ Pr is said to be geometrically t-normal if
the linear series cut out on the normalization curve C̃ of C by the pull-back to C̃
of the linear system of hypersurfaces of Pr of degree t is complete.
From a geometric point of view, a projective curve C ⊂ Pr is geometrically t-
normal if and only if the image curve νt,r(C) of C by the Veronese embedding
νt,r : P
r → P(
t ) of degree t, is not a projection of a non-degenerate curve living
in a higher dimensional projective space. We shall say that a curve is geometrically
linearly normal (g.l.n. for short) if it is geometrically 1-normal. Every such a curve
C is not a projection of a curve lying in a projective space of larger dimension.
The following result is proved under more general hypotheses in [5], theorem 2.1.
Lemma 3.2. Let Γ ⊂ P2 be an irreducible and reduced plane curve of degree n and
genus g with at most nodes and cusps as singularities. Let t be an integer such that
n− 3 − t < 0, then Γ is geometrically t-normal if and only if it is smooth. On the
contrary, if n − 3 − t ≥ 0, the plane curve Γ is geometrically t-normal if and only
if its singular points impose independent linear conditions to plane curves of degree
n− 3− t.
We recall the following classical definition.
Definition 3.3. Let Γ ⊂ P2 be a plane curve of degree n with d nodes at p1, . . . , pd
and k cusps at q1, . . . , qk as singularities. Let φ : C → Γ be the normalization of
Γ. The adjoint divisor ∆ of φ is the divisor on C defined by ∆ =
i=1 φ
−1(pi) +
j=1 2φ
−1(qj).
Proof of lemma 3.2. Let Γ be a plane curve as in the statement of the lemma. Then,
Γ is geometrically t-normal if and only if, by definition,
h0(C,OC(t)) = h
0(P2,OP2(t))− h
0(P2, IΓ(t))
where IΓ is the ideal sheaf of Γ in P
2 and OC(t) := OC(tφ
∗(H)), where H is the
general line of P2. By Riemann-Roch theorem, Γ is geometrically t-normal if and
only if
(2) h0(C, ωC(−t))) = −nt+ g − 1 +
(t+ 1)(t+ 2)
− h0(P2, IΓ(t)),
where g is the geometric genus of Γ and ωC is the canonical sheaf of C. On the other
hand, it is well known that H0(C, ωC(−t)) = H
0(C,OC(n− 3− t)(−∆)), where ∆
NUMBER OF MODULI OF IRREDUCIBLE FAMILIES... 7
is the adjoint divisor of φ, (see definition 3.3 and [4], appendix A). If n− 3− t < 0
then h0(C,OC(n− 3− t)) = 0 and Γ is geometrically t-normal if and only if
h0(P2,OP2(t)) − h
0(P2, IΓ(t)) = nt−
n2 − 3n
where δ =
− g = deg(∆)/2. This equality is verified if and only if δ = 0, i.e.
Γ is smooth. If n− 3 ≥ t, h0(P2, IΓ(t)) = 0 and (2) is verified if and only if
h0(C,OC(n− 3− t)(−∆)) = h
0(P2,OP2(n− 3− t))− δ.
On the other hand, if ψ : S → P2 is the blowing-up of the plane at the singular
locus of Γ, denoting by
Ei the pullback of the singular locus of Γ with respect
to ψ and by OS(r) the sheaf OS(rψ
∗(H)), we have that
h0(C,OC(n−3−t)(−∆)) = h
0(S,OS(n−3−t)(−
Ei)) = h
0(P2,OP2(n−3−t)⊗A)
where A is the ideal sheaf of singular points of Γ. �
Remark 3.4. Notice that, if an irreducible and reduced plane curve Γ of degree n
with only nodes and cusps as singularities is geometrically t-normal, with t ≤ n−3,
then it is geometrically r-normal for every r ≤ t. Indeed, if a set of points imposes
independent linear conditions to a linear system S, then it imposes independent
linear conditions to every linear system S′ containing S.
Theorem 3.5. Let Σnk,d be the variety of irreducible and reduced plane curves of
degree n with d nodes and k cusps. Suppose that d, k, n and t are such that
d+ k ≤
n2 − (3 + 2t)n+ 2 + t2 + 3t
= h0(OP2(n− t− 3))(3)
t ≤ n− 3 if k = 0,(4)
k ≤ 6 if t = 1, 2 and(5)
k ≤ 6 + [
], if t = 3,(6)
where [−] is the integer part of −. Then the variety Σnk,d is not empty and there ex-
ists at least an irreducible component W ⊂ Σnk,d whose general element corresponds
to a geometrically t-normal plane curve.
Remark 3.6. As we shall see in the next section, (see proposition 4.1), the geomet-
ric linear normality of the plane curve corresponding to the general element of an
irreducible component Σ of Σnk,d, is related with the number of moduli of Σ. Another
motivation for the previous theorem has been the family of irreducible plane sextics
with six cusps. By [25], we know that Σ66,0 contains at least two irreducible compo-
nents Σ1 and Σ2. The general point of Σ1 corresponds to a sextic with six cusps on
a conic, whereas the general element of Σ2 corresponds to a sextic with six cusps
not on a conic. Note that, by the previous lemma the general element of Σ2 param-
eterizes a geometric linearly normal sextic, unlike the general element of Σ1, which
corresponds to a projection of a canonical curve of genus four. Theorem 3.5, proves
in particular that, under a suitable restriction, (see inequality (3)), on the genus
of the curve corresponding to the general element of the family and, if the number
of the cusps is small, the variety Σnk,d contains a not empty irreducible component
whose general element corresponds to a curve which is not a projection of an other
curve, lying in a projective space of larger dimension. We notice that the inequality
(3) of the previous theorem can’t be improved. Indeed, if g =
− k − d, then
k + d > h0(P2,OP2(n− 3 − t)) if and only if g <
2tn−t2−3t
. On the other hand, by
using the same notation as in theorem (3.5), if g < 2tn−t
, then, by Riemann-
Roch theorem, we have that h0(C,OC(t)) ≥ tn− g+1 >
+1 = h0(P2,OP2(t)).
On the contrary, inequalities (5) and (6) are not sharp, (see example 3.7).
8 CONCETTINA GALATI
In the case of k = 0 and t = 1, theorem 3.5 has been proved by Sernesi in [18],
section 4. The case k = 0 and t ≤ n−3 is already contained in [5]. To show theorem
3.5, we proceed by induction on the degree n and on the number of nodes and cusps
of the curve. The geometric idea at the base of the induction on the degree of the
curve is, mutatis mutandis, the same as that of Sernesi.
Proof of theorem 3.5. Let t be a positive integer such that n−3−t ≥ 0 and letW ⊂
Σnk,d be an irreducible component of Σ
k,d. By standard semicontinuity arguments
it follows that, if there exists a point [C] ∈ W corresponding to a geometrically
t-normal curve with only k cusps and d nodes as singularities, then the general
element of W corresponds to a geometrically t-normal plane curve. Moreover, if the
theorem is true for fixed n, t ≤ n − 3, k as in (5) or in (6) and k + d as in (3),
then the theorem is true for n, t and any k′ ≤ k and d′ ≤ d+ k − k′. Indeed, from
the hypotheses (3), (5) and (6), it follows in particular that k < 3n. By section 2.1,
under this hypothesis, for every k′ ≤ k and for every d′ ≤ d + k − k′, there exists
a family of plane curves C → ∆ of degree n, parametrized by a curve ∆ ⊂ Σnk′,d′ ,
whose special fibre is C0 = C and whose general fibre Cz has d
′ nodes and k′ cusps
as singularities. The statement follows by applying the semicontinuity theorem to
the family C̃ → ∆̃, obtained by normalizing the total space of the pull-back family
of C → ∆ to the normalization curve ∆̃ of ∆. Finally, it’s enough to show the
theorem when the equality holds in (5), (6) and (3).
First of all we consider the case k = 0. We will show the statement for any fixed t
and by induction on n. Let, then t ≥ 1 and n = t+3. In this case the equality holds
in (3) if d = 1 = h0(P2,OP2). Since one point imposes independent linear conditions
to regular functions, by using lemma 3.2, we find that every irreducible plane curve
of degree n = t + 3 with one node and no further singularities is geometrically t-
normal. So, the first step of the induction is proved. Suppose, now, that the theorem
is true for n = t+3+a and let [Γ] ∈ Vn,g be a point corresponding to a geometrically
t-normal curve with a
2+3a+2
nodes. Let D be a line which intersects transversally
Γ and let P1, ..., Pt+1 be t + 1 marked points of Γ ∩ D. If Γ
′ = Γ ∪ D ⊂ P2, then
P1, ..., Pt+1 are nodes for Γ
′. Let C → Γ be the normalization of Γ and C′ → Γ′ the
partial normalization of Γ′, obtained by smoothing all singular points of Γ′, except
P1, ..., Pt+1. We have the following exact sequence of sheaves on C
(7) 0 → OD(t)(−P1 − ...− Pt+1) → OC′(t) → OC(t) → 0,
where OC′(t) := OC′(tH) and H is the pull-back with respect to C
′ → Γ′ of general
line of P2. Since deg(OD(t)(−P1 − ...− Pt+1)) < 0, we get that
h0(D,OD(t)(−P1 − ...− Pt+1)) = 0
and so
(8) h0(C′,OC′(t)) = h
0(C,OC(t)) = h
0(P2,OP2(t)).
Now, by section 2.1, we can obtain Γ′ as the limit of a 1-parameter family of
irreducible plane curves
ψ : C → ∆ ⊂ P
(n+1)(n+4)
of degree n+ 1 = t+ a+ 4 with
a2 + 3a+ 2
+ n− t− 1 =
(a+ 1)2 + 3(a+ 1) + 2
= h0(P2,OP2(n+ 1− t− 3))
nodes specializing to nodes of Γ′ different from the marked points P1, ..., Pt+1. More-
over, one can prove that ∆ is smooth, (see [24] or [25]). Normalizing C, we obtain a
family whose general fibre is smooth and whose special fibre is exactly C′, and we
conclude the inductive step by (8) and by semicontinuity theorem.
Now we consider the case t = 1, 2 or 3 and k as in (5) and in (6). Suppose the
theorem is true for n and let [Γ] ∈ Σnk,d be a general point in one of the irreducible
NUMBER OF MODULI OF IRREDUCIBLE FAMILIES... 9
components of Σnk,d. Then, let D be a smooth plane curve of degree t if t = 1, 2 or
an irreducible cubic with a cusp if t = 3. By the generality of Γ, we may suppose
that D intersects Γ transversally. Let P1, ..., Pt2+1 be t
2+1 fixed points of Γ∩D. If
Γ′ = Γ ∪D, then P1, ..., Pt2+1 are nodes for Γ
′. Let C → Γ be the normalization of
Γ and C′ → Γ′ the partial normalization of Γ′, obtained by smoothing all singular
points except P1, ..., Pt2+1. By using the same notation and by arguing as before,
from the following exact sequence of sheaves on C′
0 → OD(t)(−P1 − ...− Pt2+1) → OC′(t) → OC(t) → 0,
we deduce that
(9) h0(C′,OC′(t)) = h
0(C,OC(t)) = h
0(P2,OP2(t)).
Now, by section 2.1, we can obtain Γ′ as limit of a family of irreducible plane curves
φ : C → ∆
of degree n+ t with d+ nt− t2 − 1 =
(n+t)2+(3+2t)(n+t)+t2+3t+2
nodes specializing
to nodes of Γ′ different to P1, ..., Pt2+1, and k +
2−3t+2
cusps specializing to cusps
of Γ. We conclude by (9) and by semicontinuity, as before. Now we have to show
the first step of the induction. For t = 1 the induction begins with the cases
(n, k) = (4, 1), (5, 3), (6, 6). Trivially, if n = 4 and k = 1 one point imposes
independent conditions to the linear system of regular functions. If n = 5 and
k = 3 we have to show that there are irreducible quintics with three cusps not on
a line. A quintic with three cusps is a projection of the rational normal quintic
C5 ⊂ P
5 from a plane generated by three points lying on three different tangent
lines to C5. By Bezout theorem the three cusps of such a plane curve can’t be
aligned. If n = k = 6, one can repeat the classical argument used by Zariski, see
[24] or example 3.20 of chapter 2 of [7]. For t = 2 we have to show the theorem
for (n, k) = (5, 1), (6, 3), (7, 6), (8, 6), while for t = 3 we have to show the theorem
for (n, k) = (6, 1), (7, 3), (8, 6), (9, 6), (10, 6). The case t = 2 and (n, k) = (5, 1) is
trivial. When t = 2, n = 6 and k = 3 we have that n − 3 − t = 1. To show that
there exists an irreducible sextic with three cusps not on a line, consider a rational
quartic C4 with three cusps, (see corollary 3.18 of chapter 2 of [7] for the existence).
By Bezout theorem, the three double points of C4 can’t be aligned. Then consider
a sextic C6 which is union of C4 and a conic C2 which intersects C4 transversally.
By section 2.1, one can smooth the intersection points of C4 and C2 obtaining a
family of sextics with three cusps not on a line. For t = 2, n = 7 and k = 6 we
argue as in the previous case, by using a sextic C6 with six cusps not on a conic
and a line R with intersects C6 transversally. Similarly for t = 2 , n = 8 and k = 6
and t = 3 and (n, k) = (6, 1), (7, 3), (8, 6), (9, 6), (10, 6). �
Example 3.7. Inequalities (5) and (6) are not sharp. To see this, we can consider
the example of curves of degree 10. We recall that we say that a plane curve is
geometrically linearly normal (g.l.n. for short) if it is geometrically 1-normal. The-
orem 3.5 ensures the existence of g.l.n. irreducible plane curves of degree 10 with
k ≤ 6 cusps and nodes as singularities. But, by using the same ideas as we used
in theorem 3.5, one can prove the existence of g.l.n. plane curves of degree 10 with
nodes and k ≤ 9 cusps. It is enough to consider a sextic Γ6 with six cusps not on a
conic and a rational quartic Γ4 with three cusps intersecting Γ6 transversally. We
choose five points P1, . . . , P5 of Γ4 ∩ Γ6. If Γ
6 and Γ
4 are the normalization curves
of Γ6 and Γ4 respectively and C
′ is the partial normalization of Γ6 ∪Γ4 obtained by
normalizing all its singular points except P1, . . . , P5, by considering the following
exact sequence
0 → OΓ′4(1)(−P1 − · · · − P5) → OC′(1) → OΓ′6(1) → 0
10 CONCETTINA GALATI
we find that h0(C′,OC′(1)) = 3. By using terminology of section 2.1, the statement
follows by smoothing the singular points P1, . . . , P5 of Γ6 ∪Γ4, and by semicontinu-
ity, as in the proof of theorem 3.5. The bound on the number of cusps of theorem
3.5 can be improved also for t = 2 or t = 3. For example, theorem 3.5 ensures
the existence of geometrically 3-normal curves of degree 12 with k ≤ 6 and nodes
as further singularities. But, by considering a geometrically 3-normal curve of de-
gree 8 with six cusps and a quartic with 3 cusps and arguing as before, we can find
geometrically 3-normal irreducible plane curves of degree 12 with nodes and k ≤ 9
cusps.
4. Families of plane curves with nodes and cusps with finite and
expected number of moduli.
Let Σ ⊂ Σnk,d be an irreducible component of Σ
k,d. We want to give sufficient
conditions for Σ to have the expected number of moduli. Let [Γ] ∈ Σ be a general
element, corresponding to a plane curve Γ with normalization map φ : C → Γ. We
shall denote by ωC the canonical sheaf of C and by OC(1) the sheaf associated to
the pullback to C of the divisor cut out on Γ from the general line of P2.
Proposition 4.1. Let Σ ⊂ Σnk,d be an irreducible component of Σ
k,d and let [Γ] ∈ Σ
be a general element, corresponding to a plane curve Γ with normalization map
φ : C → Γ. Suppose that Σ is smooth of the expected dimension equal to 3n+g−1−k
at [Γ]. Moreover, suppose that:
(1) Γ is geometrically linearly normal, i.e. h0(C,OC(1)) = 3,
(2) the Brill-Noether map
µo,C : H
0(C,OC(1))⊗H
0(C, ωC(−1)) → H
0(C, ωC)
is surjective.
Then Σ has the expected number of moduli equal to 3g − 3 + ρ− k.
Proof. The case k = 0 has been proved by Sernesi in [18], section 4. We shall
assume k > 0. Let Γ be a plane curve verifying the hypotheses of the proposition.
By lemma 1.5.(b) of [22], the hypothesis that Σ is smooth of the expected dimension
at [Γ] implies the vanishing H1(C,Nφ) = 0, where Nφ if the normal sheaf of φ. We
recall that, denoting by ΘC and ΘP2 the tangent sheaf of C and P
2 respectively,
then the normal sheaf of φ is defined as the cokernel of the differential map φ∗ of φ
(10) 0 → ΘC
→ φ∗ΘP2 → Nφ → 0
By theorem 3.1 of [13], the vanishing H1(C,Nφ) = 0 is a sufficient condition for the
existence of a universal deformation family
of the normalization map φ, whose parameter space B is smooth at the point 0
corresponding to φ, with tangent space at 0 equal to H0(C, Nφ). On the contrary,
by [3], p. 487, the Severi variety Vn,g = Σ
0,k+d of irreducible plane curves of genus
− d − k is singular at the point [Γ] and the universal deformation space
B of φ is a desingularization of Vn,g at [Γ]. Moreover, by corollary 6.11 of [2],
if Bk = F
−1(Σ) is the locus of points of B corresponding to a morphism with k
ramification points, then the tangent space to Bk at 0 is a subspaceW of H
0(C,Nφ)
NUMBER OF MODULI OF IRREDUCIBLE FAMILIES... 11
of codimension k such that W ∩H0(C,Kφ) = 0, where Kφ is the torsion subsheaf
of Nφ. By [3], p.487, it follows that, if
F : B → Vn,g
is the natural (1 : 1)-map from B to Vn,g, then the differential map
dF : H0(C, Nφ) → T[Γ]Vn,g
restricts to an isomorphism between W and the tangent space T[Γ]Σ to Σ at [Γ].
We can now go back to the number of moduli of Σ. From the exact sequence
(10), by using that H1(C, Nφ) = 0, we get the following long exact sequence
0 → H0(C,ΘC) → H
0(C, φ∗ΘP2) → H
0(C,Nφ)
→ H1(C,ΘC) → H
1(C, φ∗ΘP2) → 0
Recalling that the space H1(C,ΘC) is canonically identified with the tangent space
T[C]Mg to Mg at the point associated to the normalization C of Γ, the coboundary
map δC : H
0(C,Nφ) → H
1(C,ΘC) sends the Horikawa class of an infinitesimal
deformation of φ to the Kodaira-Spencer class of the corresponding infinitesimal
deformation of C. So, δC |W is the differential map at the point 0 ∈ B of the
moduli map ΠΣ ◦ F : Bk = F
−1(Σ) 99K Mg. Since the point [Γ] is general in Σ,
and recalling the isomorphism dF :W
→ T[Γ]Σ, we have that
the number of moduli of Σ = dim(δC(W )).
Now, from the exact sequence (10), we have that
dim(δC(H
0(C,Nφ)) = 3g − 3− h
1(C, φ∗ΘP2).
Moreover, from the pull-back to C of the Euler exact sequence, we deduce the well
known isomorphism
H1(C, φ∗ΘP2) ≃ coker(µ
0,C) ≃ (ker(µ0,C))
and we conclude that
(11) dim(δC(H
0(C,Nφ))) = 3g − 3− dim(ker(µ0,C)).
Notice that the previous equality is always true, even if Γ doesn’t verify the hypoth-
esis (1) or (2) of the statement. Moreover, if Γ is geometrically linearly normal, i.e.
if h0(C,OC(1)) = 3, we have that
ρ = 3n− 2g − 6 = dim(coker(µo,C))− dim(ker(µo,C)).
When µo,C is surjective, ρ = − dim(ker(µo,C)) and
(12) dim(δC(H
0(C,Nφ)) = 3g − 3 + ρ = dim(B)− 8 = dim(Vn,g)− 8.
Since the dimension of the fibre of the moduli map
ΠVn,g ◦ F : B 99K Mg
has dimension at least equal to 8 = dim(Aut(P2)), from (12) we deduce that the
differential map of ΠVn,g ◦F has maximal rank at 0 and, in particular, we have that
dim((ΠVn,g ◦ F )
−1([C])) = 8. Equivalently, there exist only finitely many g2n on C.
It follows that there are only finitely many g2n on C mapping C to the plane as a
curve with k cusps and d nodes. Then,
dim(δc(W )) = dim(ΠΣ(Σ)) = 3g − 3 + ρ− k.
Remark 4.2. Arguing as in the proof of the previous proposition, it has been proved
in [18] that, if Γ is a geometrically linearly normal plane curve with only d nodes
as singularities and the Brill-Noether map µo,C of the normalization morphism of
Γ is injective, then Σ = Σn0,d has general moduli. If Σ ⊂ Σ
k,d and [Γ] ∈ Σ verify
the hypotheses of proposition 4.1 but we assume that µo,C is injective, we may
only conclude that ΠVn,g ◦ F is dominant with surjective differential map at [Γ].
12 CONCETTINA GALATI
So dim(Π−1Vn,g ([C])) = ρ + 8. But this is not useful to compute the dimension of
δC(W ) = δC(T[Γ]Σ). However, in this case we get that
δC(T[Γ]Σ) + δC(H
0(C,Kφ)) = δC(H
0(C,Nφ)) = H
1(C,ΘC).
Then, by using that dim(δC(H
0(C,Kφ))) ≤ k and by recalling that if Σ has the
expected dimension then the number of moduli of Σnk,d is at most the expected one
(see lemma 2.2 and remark 2.3), we find that
3g − 3− k ≤ number of moduli of Σ ≤ 3g − 3 + ρ− k.
Remark 4.3. Notice that, if a plane curve Γ of genus g verifies the hypotheses
(1) and (2) of the previous proposition, then the Brill-Noether number ρ(2, g, n)
is not positive and, in particular, g ≥ 3. We don’t know examples of complete
irreducible families Σ ⊂ Σnk,d with the expected number of moduli whose general
element [Γ] corresponds to a curve Γ of genus g, with ρ(2, g, n) ≤ 0, which doesn’t
verify properties (1) and (2).
Lemma 4.4 ([5], Corollary 3.4). Let Γ be an irreducible plane curve of degree n
with only nodes and cusps as singularities and let φ : C → Γ be the normalization
morphism of Γ. Suppose that Γ is geometrically 2-normal, i.e. h0(C,OC(2)) = 6.
Then the Brill-Noether map
µo,C : H
0(C,OC(1))⊗H
0(C, ωC(−1)) → H
0(C, ωC)
is surjective.
Proof. By lemma 3.2, the curve Γ is geometrically 2-normal if and only if the scheme
N of the singular points of Γ imposes independent linear conditions to the linear
systemH0(P2,OP2(n−5)) of plane curves of degree n−5. SinceH
0(P2,OP2(n−5)) ⊂
H0(P2,OP2(n−4)), N imposes independent linear conditions plane curves of degree
n−4, and, by using lemma 3.2, we get that h0(C,OC(1)) = 3, i.e. Γ is geometrically
linearly normal. Now, denote by IN |P2 the ideal sheaf of N . Notice that the curve
Γ is geometrically 2-normal if and only if the ideal sheaf IN |P2(n− 4) is 0-regular,
(in the sense of Castelnuovo-Mumford). Indeed, since h2(P2, IN |P2(n− 6)) = 0, the
ideal sheaf IN |P2(n− 4) is 0-regular if and only if h
1(P2, IN |P2(n− 5)) = 0. Because
of the 0-regularity of IN |P2(n− 4), we have the surjectivity of the natural map
H0(P2, IN |P2(n− 4))⊗H
0(P2,OP2(1)) → H
0(P2, IN |P2(n− 3)),
(see [17]). Finally, by the geometric linear normality of Γ, the vertical maps of the
following commutative diagram
H0(P2,OP2(1))⊗H
0(P2, IN |P2(n− 4)) //
H0(P2, IN |P2(n− 3))
H0(C,OC(1))⊗H
0(C, ωC(−1))
// H0(C, ωC)
are surjective and, hence, the Brill-Noether map µo,C is surjective too. �
Corollary 4.5. Let Σ ⊂ Σnk,d be an irreducible component of Σ
k,d of dimension
equal to 3n + g − 1 − k, such that the general point [Γ] ∈ Σ corresponds to a
geometrically 2-normal plane curve. Then Σ has the expected number of moduli
equal to 3g − 3 + ρ− k.
Proof. It follows from proposition 4.1 and lemma 4.4. �
In order to produce examples of families of irreducible plane curves with nodes
and cusps with the expected number of moduli, we study how increases the rank of
the Brill-Noether map by smoothing a node or a cusp of the general curve of the
family, (in the sense of section 2.1).
NUMBER OF MODULI OF IRREDUCIBLE FAMILIES... 13
Let Σ ⊂ Σnk,d, with n ≥ 5, be an irreducible component of Σ
k,d, let [Γ] ∈ Σ be a
general point of Σ and let φ : C → Γ be the normalization of Γ. Choose a singular
point P ∈ Γ and denote by φ′ : C′ → Γ the partial normalization of Γ obtained by
smoothing all singular points of Γ, except the point P . If ωC′ is the dualizing sheaf
of C′ and
µo,C′ : H
0(C′,OC′(1))⊗H
0(C′, ωC′(−1)) → H
0(C′, ωC′),
is the natural multiplication map, we have the following result.
Lemma 4.6. If h0(C,OC(1)) = 3 and the geometric genus g of C is such that g >
n−2, with n ≥ 5, then rk(µo,C′) ≥ rk(µo,C)+1. In particular, if h
0(C,OC(1)) = 3,
n ≥ 5 and µo,C is surjective, then µo,C′ is also surjective.
Proof. Let ψ : C → C′ be the normalization map.
We recall that, if we set φ∗(P ) := p1 + p2 when P is a node and φ
∗(P ) = 2φ−1(P )
when P is a cusp, then the dualizing sheaf of C′ is a subsheaf of ψ∗(ωC(φ
∗(P ))),
(see for example [10], p.80). In particular we have the following exact sequence
(13) 0 → ωC′ → ψ∗ωC(φ
∗(P )) → CP → 0
where CP is the skyscraper sheaf on C with support at P . From this exact sequence,
we deduce that
H0(C′, ωC′) ≃ H
0(C, ωC(φ
∗(P ))).
Moreover, tensoring (13) by OC′(−1), we find the exact sequence
(14) 0 → ωC′(−1) → ψ∗ωC(φ
∗(P ))(−1) → CP → 0
from which we get an injective map H0(C′, ωC′(−1)) → H
0(C, ωC(φ
∗(P ))(−1)).
On the other hand
(15) h0(C′, ωC′(−1)) = h
0(C, ωC(φ
∗(P ))(−1)) = g − n+ 3
and so
H0(C′, ωC′(−1)) ≃ H
0(C, ωC(φ
∗(P ))(−1)).
Moreover, from the hypothesis h0(C,OC(1)) = 3, we have that H
0(C,OC(1)) ≃
H0(C′,OC′(1)) ≃ H
0(P2,OP2(1)). Therefore, in the following commutative dia-
H0(C′,OC′(1))⊗H
0(C′, ωC′(−1))
µo,C′
H0(C′, ωC′)
H0(C,OC(1))⊗H
0(C, ωC(−1)(φ
∗(P )))
// H0(C, ωC(φ
∗(P )))
where we denoted by µ′o,C the natural multiplication map, the vertical maps are
isomorphisms. In particular,
rk(µo,C′) = rk(µ
o,C).
In order to compute the rank of µ′o,C , we consider the following commutative dia-
14 CONCETTINA GALATI
H0(C,OC(1))⊗H
0(C, ωC(−1))
H0(C, ωC)
H0(C,OC(1))⊗H
0(C, ωC(−1)(φ
∗(P )))
// H0(C, ωC(φ
∗(P )))
where the vertical maps are injections. Notice that, since we supposed n ≥ 5,
h0(C,OC(1)) = 3 and g > n−2 ≥ 3, the sheaf OC(1) is special. We deduce that C is
not hyperelliptic and, chosen a basis of H0(C, ωC), the associated map C → P
g−1 is
an embedding. On the contrary, the sheaf ωC(φ
∗(P )) does not define an embedding
on C. Choosing a basis of H0(C, ωC(φ
∗(P )) and denoting by Φ : C → Pg the
associated map, this will be an embedding outside φ∗(P ). If P is a node of C
and φ∗(P ) = p1 + p2, the image of C to P
g, with respect to Φ, will have a node
at the image point Q of p1 and p2. If P ∈ Γ is a cusp, then Φ(C) will have a
cusp at the image point Q of φ−1(P ). The hyperplanes of Pg passing through Q
cut out on C the canonical linear series |ωC |. Moreover, if we denote by B ⊂ P
the subspace which is the base locus of the hyperplanes of Pg corresponding to
Im(µ′o,C), then Q /∈ B. Indeed, B intersects the curve C in the image of the base
locus of |OC(1)|+ |ωC(φ
∗(P ))(−1)| := P(Im(µ′0,C)), which coincides with the base
locus of |ωC(φ
∗(P ))(−1)|, since |OC(1)| is base point free. Now, by (15),
h0(ωC(φ
∗(P ))(−1)) = 3 + g − n = h0(C, ωC(−1)) + 1.
Then φ∗(P ) does not belong to the base locus of |ωC(φ
∗(P ))(−1)|, and so
dim(< Q,B >Pg) = dim(B) + 1.
Finally, we find that
rk(µo,C) = rk(Gµo,C) ≤ dim(Im(G) ∩ Im(µ
o,C))
≤ g + 1− dim(< B,Q >Pg)− 1
= g − 1− dim(B)
= rk(µ′o,C)− 1.
Corollary 4.7. Let Σ ⊂ Σnk,d be a non-empty irreducible component of the expected
dimension of Σnk,d, with n ≥ 5. Suppose that Σ has the expected number of moduli
and that the general element [Γ] ∈ Σ corresponds to a g.l.n. plane curve Γ of
geometric genus g such that, if C → Γ is the normalization of Γ, then the map
µo,C is surjective. Then, for every k
′ ≤ k and d′ ≤ d + k − k′, there is at least an
irreducible component Σ′ ⊂ Σnk′,d′ , such that Σ ⊂ Σ
′, the general element [D] ∈ Σ′
corresponds to a g.l.n. plane curve D of geometric genus g′ with normalization
Dν → D and the Brill-Noether map µ0,Dν surjective. In particular, Σ
′ has the
expected number of moduli.
Proof. Let Γ be the curve corresponding to the general element [Γ] of Σ ⊂ Σnk,d.
Since by hypothesis Σ is smooth of the expected dimension at [Γ], by section 2.1,
for every k′ ≤ k and for every d′ ≤ d+ k− k′ there exists an irreducible component
Σ′ of Σnk′,d′ containing Σ. In order to prove the statement, it is enough to show it
under the hypotheses k′ = k− 1 and d′ = d+1, k = k′ and d′ = d− 1 or d = d′ and
k′ = k − 1. If k′ = k − 1 and d′ = d + 1, then the statement follows by standard
semiconinuity arguments. If k = k′ and d′ = d − 1 or d = d′ and k′ = k − 1, the
statement follows by lemma 4.6 and by standard semicontinuity arguments. �
NUMBER OF MODULI OF IRREDUCIBLE FAMILIES... 15
The following lemma has been stated and proved by Sernesi in [18]. Actually,
Sernesi supposes that Γ has only nodes as singularities. But, since his proof works
for plane curves Γ with any type of singularities and, since we need it for curves
with nodes and cusps, we state the lemma in a more general form.
Lemma 4.8. ([18], lemma 2.3) Let Γ be an irreducible and reduced plane curve of
degree n ≥ 5 with any type of singularities. Denote by C the normalization of Γ.
Suppose that h0(C,OC(1)) = 3 and the Brill-Noether map
µo,C : H
0(C,OC(1))⊗H
0(C, ωC(−1)) → H
0(C, ωC),
has maximal rank. Let R be a general line and let P1, P2 and P3 be three fixed points
of Γ ∩ R. We denote by C′ the partial normalization of Γ′ = Γ ∪ R, obtained by
smoothing all the singular points, except P1, P2 and P3. Then h
0(C′,OC′(1)) = 3
and, denoting by ωC′ the dualizing sheaf of C
′, the multiplication map
µo,C′ : H
0(C′,OC′(1))⊗H
0(C′, ωC′(−1)) → H
0(C′, ωC′),
has maximal rank.
Theorem 4.9. Let Σnk,d be the algebraic system of irreducible plane curves of degree
n ≥ 4 with k cusps, d nodes and geometric genus g =
− k − d. Suppose that:
(16) n− 2 ≤ g equivalently k + d ≤ h0(P2,OP2(n− 4))
(17) k ≤ 6 +
if 3n− 9 ≤ g and n ≥ 6,
(18) k ≤ 6 otherwise.
Then Σnk,d has at least an irreducible component Σ which is not empty and such
that, if Γ ⊂ P2 is the curve corresponding to the general element of Σ and C is the
normalization curve of Γ, then h0(C,OC(1)) = 3 and the map µo,C has maximal
rank. In particular, when ρ ≤ 0, the algebraic system Σ has the expected number of
moduli equal to 3g − 3 + ρ− k.
Proof. Suppose that (17) holds. Then, by observing that
g ≥ 3n− 9 if and only if k + d ≤ h0(P2,OP2(n− 6))
and by using theorem 3.5 for t = 3, we have that there exists an irreducible com-
ponent Σ of Σnk,d whose general element is a geometrically 3-normal plane curve Γ.
By remark 3.4, it follows that also the linear systems cut out on C by the conics
and the lines are complete. The statement follows from corollary 4.5.
In order to prove the theorem under the hypothesis (18), we consider the following
subcases:
(1) 2n− 5 ≤ g ≤ 3n− 9, i.e. h0(OP2(n− 6)) ≤ k+ d ≤ h
0(OP2(n− 5)) and
n ≥ 5,
(2) n− 2 ≤ g ≤ 2n− 7 and n ≥ 5,
(3) g = 2n− 6 and n ≥ 4.
Suppose that (1) holds. By theorem 3.5 for t = 2, we know that, under this
hypothesis, there exists a nonempty component Σ ⊂ Σnk,d, whose general element is
geometrically 2-normal. We conclude as in the previous case, by corollary 4.5.
Now, suppose that g and n verify (2). We shall prove the theorem by induction
on n and g. Set g = 2n − 7 − a, with a ≥ 0 fixed. Suppose that the theorem is
true for the pair (n, g), with n ≥ 7. We shall prove the theorem for (n + 1, g + 2),
observing that g + 2 = 2(n + 1) − 7 − a. Let Γ be a g.l.n. irreducible plane curve
16 CONCETTINA GALATI
of degree n and genus g = 2n − 7 − a with k ≤ 6 cusps, d nodes and no more
singularities. Let C be the normalization of Γ. Suppose that the Brill-Noether map
µo,C has maximal rank. Let R ⊂ P
2 be a general line and let P1, P2 and P3 be three
fixed points of Γ∩R. By section 2.1, since k ≤ 6 < 3n, one can smooth the singular
points P1, P2, P3 and preserve the other singularities of Γ ∪ R ⊂ P
2, obtaining a
family of plane curves C → ∆ whose general fibre is irreducible, has degree n+1 and
genus g+2. We conclude by lemma 4.8 and by standard semicontinuity arguments.
Now we prove the first step of the induction for n ≥ 7. If n = 7, we get 0 ≤ a ≤ 2.
Let a = 0, i.e. g = 2n−7−a= 7. Let Γ be a g.l.n. irreducible plane curve of degree
n = 7, of genus g = n = 7 with k ≤ 6 cusps and nodes as singularities, such that
no seven singular points of Γ lie on an irreducible conic. To prove that there exists
such a plane curve, notice that, by applying theorem 3.5 for t = 1, we get that, for
any fixed k ≤ 6, there exists a g.l.n. irreducible sextic D of genus four with k cusps
and d = 6 − k nodes. Let R1, . . . , R6 be the singular points of D. Since the points
R1, . . . , R6 of D impose independent linear conditions to the conics, however we
choose five singular points Ri1 , . . . , Ri5 of D, with I = (i1, . . . , i5) ⊂ (1, . . . , 6), there
exists only one conic CI , passing through these points. Let us set S =
CI ∩D
and let R be a line intersecting D transversally at six points out of S. By Bezout
theorem, no seven singular points of Γ′ = D ∪ R belong to an irreducible conic.
Moreover, if D̃ is the normalization of D, if Q1, . . . , Q4 are four fixed points of
D∩R and D′ is the partial normalization of Γ′ obtained by smoothing the singular
points except Q1, . . . , Q4, then, by the following exact sequence
(19) 0 → OR(1)(−Q1 − · · · −Q4) → OD′(1) → OD̃(1) → 0
we find that h0(D′, OD′(1)) = 3. By section 2.1, one can smooth the singulari-
ties Q1, . . . , Q4 and preserve the other singularities of D ∪ R, getting a family of
irreducible septics G → ∆ whose general fibre Γ is a geometrically linearly normal
irreducible septic with k cusps and 8 − k nodes such that no seven singular point
of Γ belong to an irreducible conic. Let, now, C be the normalization of Γ and let
∆ ⊂ C be the adjoint divisor of the normalization map φ : C → Γ. We shall prove
that ker(µo,C) = 0. Since Γ is geometrically linearly normal, we have that
h0(C, ωC(−1)) = h
0(C,OC(3)(−∆))) = g − n+ 2 = 2.
Then, by the base point free pencil trick, we find that
ker(µo,C) = H
0(C, ω∗C(B)⊗OC(2)),
where B is the base locus of |ωC(−1) = OC(3)(−∆)|. Let F be the pencil of plane
cubics passing through the eight double points P1, . . . , P8 of Γ and let BF be the
base locus of the pencil F . Let Γ3 be the general element of F . Suppose that BF
has dimension one. If BF contains a line l, then, by Bezout theorem, at most three
points among P1, . . . , P8, say P1, . . . , P3 can lie on l and the other points have to be
contained in the base locus of a pencil of conics F ′. Using again Bezout theorem,
we find that the curves of F ′ are reducible and the base locus of F ′ contains a line
l′. But also l′ contains at most three points of P4, . . . , P6. It follows that there is
only one cubic through P1, . . . , P8. This is not possible by construction. Suppose
that BF contains an irreducible conic Γ2. By Bezout theorem, at most seven points
among P1, . . . , P8 may lie on Γ2. On the other hand, since dim(F) = 1, there
are exactly seven points of P1, . . . , P8, say P1, . . . , P7, on Γ2 and the general cubic
Γ3 of F is union of Γ2 and a line passing through P8. Since, by construction, no
seven singular points of Γ lie on an irreducible conic, also in this case we get a
contradiction. So the general element Γ3 of F is irreducible. Using again Bezout
theorem, we find that Γ3 is smooth and F has only one more base point Q. We
consider the following cases:
a) Q doesn’t lie on Γ;
NUMBER OF MODULI OF IRREDUCIBLE FAMILIES... 17
b) Q lies on Γ, but Q 6= P1, . . . , P8;
c) Q is infinitely near to one of the points P1, . . . , P8, say Pî, i.e. the cubics of F
have at P
the same tangent line l, but l is not contained in the tangent cone to Γ
d) Q is like in the case c), but l is contained in the tangent cone to Γ at P
Suppose that the case a) or c) holds. Thus B = 0 and
ker(µo,C) = H
0(C, ω∗C ⊗OC(2)) = H
0(C,OC(−2)(∆)).
By Riemann-Roch theorem, h0(C,OC(−2)(∆)) = h
0(C,OC(6)(−2∆)) − 4. One
sees that h0(C,OC(6)(−2∆)) = 4, by blowing-up the plane at P1, . . . , P8 and by
using some standard exact sequences. Suppose now that the case b) holds. Thus
B = Q and
dim(ker(µo,C)) = h
0(C,OC(−2)(∆ +Q)) = h
0(C,OC(6)(−2∆−Q))− 3.
Also in this case one sees that h0(C,OC(6)(−2∆ − Q)) = 3 by blowing-up at
P1, . . . , P8 and Q and by using standard exact sequences. Finally, we analyze the
case d). Let Φ : S → P2 be the blow-up of the plane at P1, . . . , P8 with exceptional
divisors E1, . . . , E8. Let Q ∈ Eî be the intersection point of Eî and the strict
transform C3 of the general cubic Γ3 of the pencil F . We denote by Φ̃ : S̃ → S the
blow-up of S at Q and by Ψ : S̃ → P2 the composition map of the maps Φ and Φ̃.
We still denote by E1, . . . , E8 their strict transforms on S̃, by C and C3 the strict
transforms of Γ and Γ3 and by EQ the new exceptional divisor of S̃. In this case we
have that Ψ∗(Γ) = C+2
Ei+3EQ, Ψ
∗(Γ3) = C3+
Ei+2EQ. Moreover, the
divisor ∆ is cut out on C by
iEi + EQ and the base locus B of the linear series
|ωC(−1)| coincides with the intersection point of EQ and C. So, we have that
dim(ker(µo,C)) = h
0(C,OC(−2)(
Ei+2EQ)) = h
0(C,OC(6)(−2
Ei−3EQ))−3.
Moreover, from the following exact sequence
0 → O
(−1) → O
(6)(−2
Ei − 3EQ) → OC(6)(−2
Ei − 3EQ) → 0
we find that H0(C,OC(6)(−2
Ei− 3EQ)) = H
0(S̃,O
(6)(−2
Ei− 3EQ)). In
order to show that h0(S̃,O
(6)(−2
Ei − 3EQ)) = 3, we consider the following
exact sequence
0 → O
(3)(−
Ei − EQ) → OS̃(6)(−2
Ei − 3EQ) →(20)
→ OC3(6)(−2
Ei − 3EQ) → 0
By Riemann-Roch theorem, we have that
h0(C3,OC3(6)(−2
Ei − 3EQ)) = 1 and h
1(C3,OC3(6)(−2
Ei − 3EQ) = 0.
Moreover, by Serre duality we have that
H1(S̃,O
(3)(−
Ei − EQ))) = H
1(S̃,O
(−6)(2
Ei + 3EQ))).
From the exact sequence
(21) 0 → O
(−6)(+2
Ei + 3EQ)) → OS̃(1) → OC(1) → 0
18 CONCETTINA GALATI
by using that the map H0(S̃,O
(1)) → H0(C,OC(1)) is surjective and that
h1(S̃,O
(1)) = 0, we find that
H1(S̃,O
(−6)(+2
Ei + 3EQ))) = H
1(S̃,O
(3)(−
Ei − EQ))) = 0.
Then, by (20), h0(S̃,O
(6)(−2
Ei − 3EQ)) = h
0(S̃,O
(3)(−
Ei − EQ)) +
h0(C3,OC3(6)(−2
Ei−3EQ)) = 3 and ker(µo,C) = 0. The first step of induction
for g = n = 7 and k ≤ 6 is proved.
We complete the proof of the first step of the induction, for n and g verifying
(2). When n = 7 and 1 ≤ a ≤ 2, the existence of a g.l.n. plane curve Γ follows
from theorem 3.5. Using the above notation, h0(C, ωC(−1)) = 1 if a = 1 and
h0(C, ωC(−1)) = 0 if a = 2. In any case µo,C is injective. When n ≥ 8 and
a ≤ n − 6 the theorem follows by induction from the case n = 7. For n ≥ 8 and
a = n− 5, we find that g = n− 2, or, equivalently, k + d = h0(P2,OP2(n− 4)). In
theorem 3.5, we proved the existence of geometrically linearly normal plane curves
of degree n ≥ 8 and genus g = n− 2, with nodes and k ≤ 6 cusps. For every such
plane curve Γ, using the above notation, the Brill-Noether map µo,C is injective
since h0(C, ωC(−1)) = 0. The cases n = 5 and n = 6 are similar.
Suppose now that n and g verify (3). First of all we prove the theorem for
(n, g) = (4, 2), (5, 4), (6, 6). For n = 4 and g = 2, we find n = g + 2 and we argue
as in the case n ≥ 8 and g = n − 2. Similarly, for (n, g) = (5, 4). For n = 6
and g = 6 in theorem 3.5 we proved the existence of geometrically linearly normal
plane curves Γ with k ≤ 4 cusps and nodes as singularities. For every such a plane
curve Γ, denoting by C its normalization, we get that h0(C, ωC(−1)) = 2, i.e. the
linear system F of conics passing through the four singular points P1, . . . , P4 of Γ
is a pencil which cuts out on C the complete linear series |ωC(−1)|. We have two
possibilities: either the general element of this pencil is irreducible or it consists of
a line containing exactly three singular points P1, P2, P3 of Γ and a line passing
through P4. In any case the base locus of F intersects Γ only at P1, . . . , P4 and
the linear series |ωC(−1)| has no base points. Then, by the base point free pencil
trick , we find that ker(µo,C) = H
0(C, ω∗C ⊗ O(2)) = H
0(C,OC(−1)(∆)), where
∆ ⊂ C is the adjoint divisor of the normalization map C → Γ. By Riemann-Roch
theorem, we have that h0(C,OC(−1)(∆)) = h
0(C,OC(4)(−2∆))−3. By blowing-up
at P1, . . . , P4, one can see that h
0(C,OC(4)(−2∆)) = 3, as we wanted.
Finally, we show the theorem under the hypothesis (3) for n ≥ 7, by using
induction on n. In order to prove the inductive step we may use lemma 4.8, exactly
as we did in the case (2). We prove the first step of induction. If n = 7 we have
that g = 8. On pages 15 and 16 we proved the existence of geometrically linearly
normal plane curves Γ of degree 7 and genus 7 with k ≤ 6, such that, if P1, . . . , P8
are the singular points of Γ, then no seven points among P1, . . . , P8 lie on a conic.
In particular, we proved that, for every such a plane curve Γ, the general element
of the pencil of cubics passing through P1, . . . , P8 is irreducible and, if φ : C → Γ
is the normalization of Γ, then the Brill-Noether map µo,C is injective. Let C
the partial normalization of Γ which we get by smoothing all the singular points of
Γ except a node, say P8. By using the same notation and by arguing exactly as in
the proof of lemma 4.6, we get the following commutative diagram
H0(C′,OC′(1))⊗H
0(C′, ωC′(−1))
µo,C′
H0(C′, ωC′)
H0(C,OC(1))⊗H
0(C, ωC(−1)(φ
∗(P8)))
// H0(C, ωC(φ
∗(P8)))
where µ′o,C is the multiplication map and the vertical maps are isomorphisms. We
want to prove that the map µo,C′ is surjective. By the previous diagram it is enough
NUMBER OF MODULI OF IRREDUCIBLE FAMILIES... 19
to prove that µ′o,C is surjective. Since h
0(C, ωC(φ
∗(P8))) = 8 and
h0(C,OC(1))h
0(C, ωC(−1)(φ
∗(P8))) = 3(7− 7 + 3) = 9,
we have that dim(ker(µo,C′)) ≥ 1 and µo,C′ is surjective if dim(ker(µo,C′)) = 1. By
recalling that Γ is geometrically linearly normal, we have that, if Z is the scheme of
the points P1, . . . , P7 and IZ|P2 is the ideal sheaf of Z in P
2, then in the following
commutative diagram
H0(C,OC(1))⊗H
0(C, ωC(−1)(φ
∗(P8)))
H0(C, ωC(φ
∗(P8)))
H0(P2,OP2(1))⊗H
0(P2, IZ|P2(3))
// H0(P2, IZ|P2(4))
the vertical maps are isomorphisms. Hence, it is enough to prove that the kernel
of the multiplication map µ has dimension one. Let {f0, f1, f2} be a basis of the
vector space H0(P2, IZ|P2(3)). Since the general cubic passing through P1, . . . , P8
is irreducible, we may assume that f0, f1 and f2 are irreducible. Suppose, by
contradiction, that there exist at least two linearly independent vectors in the kernel
of µ. Then, there exist sections u0, u1, u2 and v0, v1, v2 of H
0(P2,OP2(1)) such that
the sections
ui⊗ fi and
vi⊗ fi are linearly independent in H
0(P2,OP2(1))⊗
H0(P2, IZ|P2(3)) and
i=0 uifi = 0
i=0 vifi = 0.
We can look at (22) as a linear system in the variables f0, f1, f2. The space of
solutions of (22) is generated by the vector
(u1v2 − u2v1, u3v0 − u0v3, u0v1 − u1v0).
In particular, if we set qi = (−1)
1+iuivj − viuj, we find that fjqi = fiqj , for every
i 6= j. But this is not possible since f1, f2 and f3 are irreducible. We deduce that
dim(ker(µ)) = dim(ker(µo,C′)) = 1
and µo,C′ is surjective. The existence of a plane septic of genus 8 with k ≤ 6
cusps and nodes as singularities, with injective Brill-Neother map, follows now by
smoothing the node P8 (in the sense of section 2.1) and by standard semicontinuity
arguments. �
Remark 4.10. Notice that the conditions which we found in theorem 4.9 in order
that Σnk,d has at least an irreducible component with the expected number of moduli,
are not sharp, even if we suppose ρ ≤ 0. To see this, notice that in remark 3.6 we
proved the existence of an irreducible component Σ of Σ129,0 whose general element
corresponds to a 3-normal plane curve. By remark 3.4 and corollary 4.5, we have
that Σ has the expected number of moduli.
Theorem 4.11. Σn1,d has the expected number of moduli, for every d ≤
Proof. First of all, we recall that, by [16], Σn1,d is irreducible for every d ≤
Moreover, from theorem 4.9 and from corollary 2.7, we know that Σn1,d is not empty
and it has the expected number of moduli if either ρ ≤ 0 or ρ ≥ 2. Next we shall
prove that, if ρ = 1, then the algebraic system
Σn1,d = Σ
(n−3)2
has general moduli. Equivalently, we will show that, if [Γ] ∈ Σn1,d is a general point
and g =
− 1 − d = 3n−7
, then, on the normalization curve C of Γ there are
only finitely many linear series g2n with at least a ramification point. Notice that,
20 CONCETTINA GALATI
if g =
− 1 − d = 3n−7
, then n is odd and n ≥ 5. We prove the statement by
induction on n.
If n = 5 then g = 4. Let C ⊂ P3 be the canonical model of a general curve of
genus four and let 2P + Q, with P 6= Q be a divisor in a g13 on C. This divisor is
cut out on C by the tangent line to C at P . The projection of C from Q is a plane
quintic of genus four with a cusp. This proves that Σ51,1 has general moduli.
Now we suppose that the theorem is true for n and we prove the theorem for
n+2. Let Γ ⊂ P2 be the plane curve with a cusp and
(n−3)2
−1 nodes corresponding
to a general point [Γ] ∈ Σn
(n−3)2
and let C2 be an irreducible conic intersecting
Γ transversally. By section 2.1, the point [C2 ∪ Γ] belongs to Σ
(n+2−3)2
particular, however we choose four points P1, . . . , P4 of intersection between Γ and
C2, there exists an analytic branch SP1,..., P4 of Σ
(n−1)2
, passing through [C2∪Γ]
and whose general point corresponds to an irreducible plane curve of degree n + 2
with a cusp in a neighborhood of the cusp of Γ and a node at a neighborhood of
every node of C2 ∪ Γ different from P1, . . . , P4. Moreover, S := SP1,..., P4 is smooth
at the point [C2 ∪ Γ], (see [7], chapter 2). Let
Π : Σn+2
(n−1)2
99K M 3(n+2)−7
be the moduli map of Σn+2
(n−1)2
. In order to prove that Π is dominant it is
sufficient to show that Π(S) = M 3n−1
. By section 2.1, there exist an analytic open
sets Si ⊂ Σn+2
(n−3)2
−1+2n−i
, with i = 1, 2, 3, such that
S0 := S ∩ (P5 × Σn
(n−3)2
) ⊂ S1 ⊂ S2 ⊂ S3 ⊂ S.
Every Si, with i = 1, 2, 3, has
irreducible components, passing through [C2∪Γ]
and intersecting transversally at [C2 ∪Γ], (see [7], chapter 2 or [25]). Moreover, the
general point of every irreducible component of Si, with i = 1, 2, 3, corresponds
to an irreducible plane curve Γi of degree n + 2 with a cusp in a neighborhood of
the cusp of Γ, a node in a neighborhood of every node of C2 ∪ Γ different from
P1, . . . , P4 and 4 − i nodes specializing to 4 − i fixed points among P1, . . . , P4, as
Γi specializes to C2 ∪ Γ. Now, notice that the moduli map Π is not defined at the
point [C2 ∪ Γ], but, if S is sufficiently small, then the restriction of Π to S extends
to a regular function on S. More precisely, let C → ∆ be any family of curves,
parametrized by a projective curve ∆ ⊂ S, passing through the point [C2 ∪ Γ] and
whose general point corresponds to an irreducible plane curve of degree n+2 of genus
3(n+2)−7
with a cusp and nodes as singularities. If we denote by C′ → ∆
the family of curves obtained from C → ∆ by normalizing the total space, we have
that the general fibre of C′ → ∆ is a smooth curve of genus 3n−1
, corresponding
to the normalization of the general fibre of C → ∆, whereas the special fibre C′0 is
the partial normalization of C2∪Γ, obtained by normalizing all the singular points,
except P1, . . . , P4. Then, the map Π|S is defined at [C2 ∪Γ] and it associates to the
point [C2∪Γ] the isomorphism class of C
0. Similarly, if [Γi] is a general point in one
of the irreducible components of Si, with i = 1, 2, 3, then Π|S ([Γi]) is the partial
normalization of Γi obtained by smoothing all the singular points except for the 4−i
nodes of Γi tending to 4−i fixed points among P1, . . . , P4 as Γi specializes to C2∪Γ.
It follows that, if we denote by M
the locus of M 3n−1
parametrizing j-nodal
curves, then ΠS(S
i) ⊆ M4−i3n−1
, for every i = 0, . . . , 4, and ΠS(S
i)  ΠS(S
i+1). In
particular, we find that
dim(Π|S (S)) ≥ dim(Π|S (S
0)) + 4.
NUMBER OF MODULI OF IRREDUCIBLE FAMILIES... 21
In order to compute the dimension of Π|S (S
0) we consider the rational map
F : Π|S (S
0) 99K M 3n−7
forgetting the rational tail. By the hypothesis that Σn
(n−3)2
has general moduli
and hence F is dominant. Moreover, if C is the normalization curve of Γ, by the
generality of [Γ] in Σn
(n−3)2
, we may assume that C is general in M 3n−7
want to show that dim(F−1([C])) = 5. In order to see this, we recall that, by the
hypothesis that Σn
(n−3)2
has general moduli, on C there exist only finitely many
linear series of degree n and dimension two, mapping C to the plane as curve with a
cusp and nodes as singularities. Let g2n be one of these linear series, let {s0, s1, s2}
be a basis of g2n and φ
′ : C → Γ′ ⊂ P2 the associated morphism. If Q1, . . . , Q4
are four general points of Γ′, then the linear system of conics through Q1, . . . , Q4
is a pencil F(Q1, . . . , Q4). Let C2 and D2 be two general conics of F(Q1, . . . , Q4).
We claim that, if η : P1 → C2 and β : P
1 → D2 are isomorphisms between P
1 and
C2 and D2 respectively, then the points η
−1(Q1), . . . , η
−1(Q4) are not projectively
equivalent to the points β−1(Q1), . . . , β
−1(Q4). In order to prove this, it is enough to
prove that there are at least two conics in the pencil F(Q1, . . . , Q4) which verify the
claim. LetD ⊂ P2 be a conic. If we choose two sets of points p1, . . . , p4 and q1, . . . , q4
ofD not projectively equivalent onD, we may always find projective automorphisms
A : P2 → P2 and A′ : P2 → P2 such that A(pi) = Qi and A
′(qi) = (Qi), for every
i. By construction, the conics C2 = A(D) and D2 = A
′(D) belong to the pencil
F (Q1, . . . , Q4) and verify the claim. This implies that the partial normalizations C
and D′ of Γ′∪C2 and Γ
′∪D2, obtained by smoothing all the singular points except
Q1, . . . , Q4, are not isomorphic. Now, let C
2 be a general conic of F(Q1, . . . , Q4)
and let R1, . . . , R4 be four general points of Γ
′, different from Q1, . . . , Q4. If D
is a general conic of the pencil F(R1, . . . , R4), then the partial normalization C
and D′ of Γ′ ∪C′2 and Γ
′ ∪D′2 obtained, respectively, by smoothing all the singular
points except Q1, . . . , Q4 and R1, . . . , R4, are not isomorphic. Indeed, since C is a
general curve of genus 3n−7
≥ 7, the only automorphism of C is the identity. This
proves that dim(F−1([C])) = 5. In particular, we deduce that
dim(Π|S (S
0)) = 3
3n− 7
− 3 + 5
dim(Π|S (S) ≥ 3
3n− 7
− 3 + 9 = 3
3(n+ 2)− 7
Remark 4.12. We expect that it is possible to prove that Σnk,d has expected number
of moduli for every ρ also when k = 2 or k = 3. By corollary 2.7 and theorem 4.9,
Σnk,d is not empty, irreducible and it has expected number of moduli for ρ ≤ 0 and
ρ ≥ 2k. In order to extend theorem 4.11 to the case k = 2 and k = 3 one needs to
consider a finite number of cases.
Acknowledgment. The results of this paper are part of my PhD-thesis. I would
like to express my gratitude to my advisor Prof. C. Ciliberto who initiated me into
the subject of algebraic geometry and who provided me many invaluable suggestions.
I have also enjoyed and benefited from conversation with many people including F.
Flamini, E. Sernesi, L. Chiantini, L. Caporaso and G. Pareschi. Finally, I would
like to thank the referee for useful remarks which allowed me to improve the finale
version of this paper.
22 CONCETTINA GALATI
References
[1] E. Arbarello and M. Cornalba: Su una proprietà notevole dei morfismi di una curva a moduli
generali in uno spazio proiettivo, Rend. Sem. Mat. Univ. Politec. Torino, vol. 38 (1980), no. 2,
87–99 (1981).
[2] E. Arbarello and M. Cornalba: Su una congettura di Petri., Comment. Math. Helv., vol. 56
(1981), no. 1, 1–38.
[3] E. Arbarello and M. Cornalba: A few remarks about the variety of irreducible plane curves of
given degree and genus., Ann. Sci. École Norm. Sup. (4), vol. 16 (1983), 467–488 (1984).
[4] E. Arbarello and M. Cornalba, P.A. Griffiths, J. Harris: Geometry of algebraic curves., vol. 1,
Springer-Verlag.
[5] A. Arsie and C. Galati: Geometric k-normality of curves and applications, Le Matematiche,
Vol. LVIII (2003), Fasc. II, 179–199.
[6] S. Diaz and J. Harris: Ideals associated to deformations of singular plane curves, Transactions
of the American Mathematical Society, vol. 309, n. 2, 433–468 (1988).
[7] C. Galati: Number of moduli of plane curves with nodes and cusps., PhD thesis, Università
degli Studi di Tor Vergata, 2004-2005.
[8] J. Harris: On the Severi problem, Invent. Math., vol. 84 (1986), no. 3, 445–461.
[9] J. Harris and I. Morrison: Moduli of curves, Graduate texts in mathematics, vol. 187, Springer,
New York, 1988.
[10] D. Eisenbud and J. Harris: The Kodaira dimension of the moduli space of curves of genus
≥ 23. Invent. Math. vol. 90 (1987), no. 2, 359–387.
[11] G.M Greuel and U. Karras: Families of varieties with prescribed singularities, Compositio
Math. vol. 69 (1989), no. 1, 83–110.
[12] G.M. Greuel, C. Lossen, and E. Shustin: Castelnuovo function, zero-dimensional schemes
and singular plane curves. J. Algebraic Geom. vol. 9 (2000), no. 4, 663–710.
[13] E. Horikawa: On the deformations of the holomorphic maps I, J. Math. Soc. Japan, vol. 25
(1973), 372–396.
[14] E. Horikawa: On the deformations of the holomorphic maps II, J. Math. Soc. Japan, vol. 26
(1974), 647–667.
[15] P. Kang: A note on the variety of plane curves with nodes and cusps, Proc. Amer. Math.
Soc. vol. 106 (1989), no. 2, 309–312.
[16] P. Kang: On the variety of plane curves of degree d with δ nodes and k cusps, Trans. Amer.
Math. Soc. vol. 316 (1989), no. 1, 165–192.
[17] D. Mumford: Lectures on curves on an algebraic surface. , Princeton University Press, 1966.
[18] E. Sernesi: On the existence of certain families of curves, curves. Invent. Math. vol. 75 (1984),
no. 1, 25–57.
[19] F. Severi: Vorlesungen über algebraische Geometrie, Teuner, Leipzig, 1921.
[20] E. Shustin: Smoothness and irreducibility of varieties of plane curves with nodes and cusps,
Bull. Soc. Math. France vol. 122 (1994), no. 2, 235–253.
[21] E. Shustin: Equiclassical deformations of plane algebraic curves, Singularities (Oberwolfach,
1996), 195–204, Progr. Math., vol. 162, Birkhuser, Basel, 1998.
[22] A. Tannenbaum: On the classical characteristic linear series of plane curves with nodes and
cuspidal points: two examples of Beniamino Segre, Compositio Mathematica vol. 51 (1984),
169–183.
[23] J. Wahl: Deformations of plane curves with nodes and cusps, Amer. J. Math. vol. 96 (1974),
529–577.
[24] O. Zariski: Dimension theoretic characterization of maximal irreducible sistems of plane
nodal curves, Amer. J. Math. vol. 104 (1982), no. 1, 209–226.
[25] O. Zariski: Algebraic surfaces, Classics in mathematics, Springer.
Dipartimento di Matematica, Università degli Studi della Calabria, via P. Bucci,
cubo 30B, Arcavacata di Rende (CS)
E-mail address: galati@mat.unical.it
	1. Introduction
	2. Preliminaries
	2.1. On Severi-Enriques varieties
	2.2. Known results on the number of moduli of nk,d
	3. On the existence of certain families of plane curves with nodes and cusps in sufficiently general position
	4. Families of plane curves with nodes and cusps with finite and expected number of moduli.
	Acknowledgment
	References
ABSTRACT
  Consider the family S of irreducible plane curves of degree n with d nodes
and k cusps as singularities. Let W be an irreducible component of S. We
consider the natural rational map from W to the moduli space of curves of genus
g=(n-1)(n-2)/2-d-k. We define the "number of moduli of W" as the dimension of
the image of W with respect to this map. If W has the expected dimension equal
to 3n+g-1-k, then the number of moduli of W is at most equal to the min(3g-3,
3g-3+\rho-k), dove \rho is the Brill-Neother number of the linear series of
degree n and dimension 2 on a smooth curve of genus g. We say that W has the
expected number of moduli if the equality holds. In this paper we construct
examples of families of irreducible plane curves with nodes and cusps as
singularities having expected number of moduli and with non-positive
Brill-Noether number.

<|endoftext|><|startoftext|>
Introduction
Identifying the mechanism of electroweak symmetry breaking will be one of the main goals of
the LHC. Many possibilities have been studied in the literature, of which the most popular
ones are the Higgs mechanism within the Standard Model (SM) and within the Minimal
Supersymmetric Standard Model (MSSM) [1]. Contrary to the case of the SM, in the MSSM
two Higgs doublets are required. This results in five physical Higgs bosons instead of the
single Higgs boson of the SM. These are the light and heavy CP-even Higgs bosons, h and
H , the CP-odd Higgs boson, A, and the charged Higgs boson, H±.1 The Higgs sector
of the MSSM can be specified at lowest order in terms of the gauge couplings, the ratio
of the two Higgs vacuum expectation values, tan β ≡ v2/v1, and the mass of the CP-odd
Higgs boson, MA. Consequently, the masses of the CP-even neutral Higgs bosons and the
charged Higgs boson are dependent quantities that can be predicted in terms of the Higgs-
sector parameters. Higgs-phenomenology in the MSSM is strongly affected by higher-order
corrections, in particular from the sector of the third generation quarks and squarks, so that
the dependencies on various other MSSM parameters can be important.
After the termination of LEP in the year 2000 (the final LEP results can be found in
Refs. [2, 3]), and the (ongoing) Higgs boson search at the Tevatron [4–6], the search will be
continued at the LHC [7–9] (see also Refs. [10, 11] for recent reviews). The current exclusion
bounds within the MSSM [3–5] and the prospective sensitivities at the LHC are usually dis-
played in terms of the parameters MA and tan β that characterize the MSSM Higgs sector
at lowest order. The other MSSM parameters are conventionally fixed according to certain
benchmark scenarios [12–14]. The most prominent one is the “mmaxh scenario”, which in the
search for the light CP-even Higgs boson allows to obtain conservative bounds on tan β for
fixed values of the top-quark mass and the scale of the supersymmetric particles [15]. Besides
the “no-mixing scenario”, which is similar to the mmaxh scenario, but assumes vanishing mix-
ing in the stop sector, other CP-conserving scenarios that have been studied in LHC analyses
(see e.g. Ref. [11]) are the “gluophobic Higgs scenario” and the “small αeff” scenario [13].
For the interpretation of the exclusion bounds and prospective discovery contours in
the benchmark scenarios it is important to assess how sensitively the results depend on
those parameters that have been fixed according to the benchmark prescriptions. While in
the decoupling limit, which is the region of MSSM parameter space with MA ≫ MZ , the
couplings of the light CP-even Higgs boson approach those of a SM Higgs boson with the
same mass, the couplings of the heavy Higgs bosons of the MSSM can be sizably affected
by higher-order contributions even for large values of MA. The kinematics of the heavy
Higgs-boson production processes, on the other hand, is governed by the parameter MA,
since in the region of large MA the heavy MSSM Higgs bosons are nearly mass-degenerate,
MA ≈ MH ≈ MH±. In Ref. [14] it has been shown that higher-order contributions to the
relation between the bottom-quark mass and the bottom-Yukawa coupling have a dramatic
effect on the exclusion bounds in the MA–tanβ plane obtained from the bb̄φ, φ → bb̄ channel
at the Tevatron.
In this article we investigate how the 5 σ discovery regions in the MA–tanβ plane for the
heavy neutral MSSM Higgs bosons (a corresponding analysis for the charged Higgs-boson
1We focus in this paper on the case without explicit CP-violation in the soft supersymmetry-breaking
terms.
search will be presented elsewhere) obtainable with the CMS experiment at the LHC depend
on the other MSSM parameters. For the experimental sensitivities achievable with CMS we
use up-to-date results based on full simulation studies for 30 or 60 fb−1(depending on the
channel) [9]. This information is combined with precise theory predictions for the Higgs-
boson masses and the involved production and decay processes incorporating higher-order
corrections at the one-loop and two-loop level. In our analysis we investigate the impact on
the discovery reach arising both from higher-order corrections and from possible decays of
the heavy Higgs bosons into supersymmetric particles.2
The search for the heavy neutral MSSM Higgs bosons at the LHC will mainly be pursued
in the b quark associated production with a subsequent decay to τ leptons [7–9]. In the region
of large tanβ this production process benefits from an enhancement factor of tan2 β compared
to the SM case. The main search channels are3 (here and in the following φ denotes the two
heavy neutral MSSM Higgs bosons, φ = H,A):
bb̄φ, φ → τ+τ− → 2 jets (1)
bb̄φ, φ → τ+τ− → µ+ jet (2)
bb̄φ, φ → τ+τ− → e+ jet (3)
bb̄φ, φ → τ+τ− → e+ µ . (4)
For our numerical analysis we use the program FeynHiggs [19–22]. We study in particular
the dependence of the “LHC wedge” region, i.e. the region in which only the light CP-even
MSSM Higgs boson can be detected at the LHC at the 5 σ level, on the variation of the
higgsino mass parameter µ. The dependence on µ enters in two different ways, on the one
hand via higher-order corrections affecting the relation between the bottom mass and the
bottom Yukawa coupling, and on the other hand via the kinematics of Higgs decays into
supersymmetric particles. We analyze both effects separately and discuss the possible impact
of other supersymmetric parameters.
Our results for the discovery reach of the heavy neutral MSSM Higgs bosons extend the
known results in the literature in various ways. In comparison with Refs. [23, 24], where the
prospective 5σ discovery contours for CMS in the MA–tanβ plane of the m
h benchmark
scenario were given for three different values of µ, the results in the present paper are based
on full simulation studies and make use of the most up-to-date CMS tools for triggering
and event reconstruction. Furthermore, in the analysis of Refs. [23, 24] relevant higher-
order corrections, in particular those depending on ∆b (see Sect. 2.2 below), have been
neglected. The effects induced by the ∆b corrections have been investigated in Ref. [14],
where the results were obtained by a simple rescaling of the experimental results given in
Refs. [7, 23–25]. Our present analysis, on the other hand, makes use of the latest CMS studies
and provides a separate treatment of the different τ final states, channels (1)–(4).
As a second step of our analysis we investigate the experimental precision that can be
achieved for the determination of the heavy Higgs-boson masses in the discovery channels (1)–
2We restrict our analysis to the impact of supersymmetric contributions. For a discussion of uncertainties
related to parton distribution functions, see e.g. Ref. [16].
3In our analysis we do not consider diffractive Higgs production, pp → p ⊕ H ⊕ p [17]. For a detailed
discussion of the search reach for the heavy neutral MSSM Higgs bosons in diffractive Higgs production we
refer to Ref. [18].
(4). We discuss the prospective accuracy of the mass measurement in view of the possibility
to experimentally resolve the signals of the heavy neutral MSSM Higgs bosons.
The paper is organized as follows: Sect. 2 introduces our notation and gives a brief sum-
mary of the most relevant supersymmetric radiative corrections to the Higgs-boson masses,
production cross sections and decay widths at the LHC. The relevant benchmark scenarios
are briefly reviewed. In Sect. 3 the experimental analysis is described. The results for the
variation of the 5 σ discovery contours, obtainable at CMS with 30 or 60 fb−1 are given
in Sect. 4, where we also discuss the achievable experimental precision in the Higgs mass
determination. The conclusions can be found in Sect. 5.
2 Phenomenology of the MSSM Higgs sector
2.1 Notation
The MSSM Higgs sector at lowest order is described in terms of two independent parameters
(besides the SM gauge couplings): tan β ≡ v2/v1, the ratio of the two vacuum expectation
values, and MA, the mass of the CP-odd Higgs boson A. Beyond the tree-level, large
radiative corrections can occur from the t/t̃ sector, and for large values of tanβ also from
the b/b̃ sector.
Our notations for the scalar top and scalar bottom sector of the MSSM are as follows:
the mass matrices in the basis of the current eigenstates t̃L, t̃R and b̃L, b̃R are given by
+m2t + cos 2β (
s2w)M
Z mtXt
mtXt M
+m2t +
cos 2β s2wM
, (5)
+m2b + cos 2β (−12 +
s2w)M
Z mbXb
mbXb M
+m2b − 13 cos 2β s
, (6)
where
mtXt = mt(At − µ cotβ ), mb Xb = mb (Ab − µ tanβ ). (7)
Here MQ̃, Mt̃R and Mb̃R are the diagonal soft SUSY-breaking parameters, At denotes the
trilinear Higgs–stop coupling, Ab denotes the Higgs–sbottom coupling, and µ is the higgsino
mass parameter.
For the numerical evaluation, it is often convenient to choose
MQ̃ = Mt̃R = Mb̃R =: MSUSY. (8)
Concerning analyses for the case where Mt̃R 6= MQ̃ 6= Mb̃R , see e.g. Refs. [20, 26, 27]. It
has been shown that the upper bound on the mass of the light CP-even Higgs boson, Mh,
obtained using eq. (8) is the same as for the more general case, provided that MSUSY is
identified with the heaviest mass of MQ̃,Mt̃R ,Mb̃R [20].
Accordingly, the most important parameters entering the Higgs-sector predictions via
higher-order corrections are mt, MSUSY, Xt, Xb and µ (see also the discussion in Sect. 2.2.2
below). The Higgs-sector observables furthermore depend on the SU(2) gaugino mass param-
eter, M2, the U(1) parameter M1 and the gluino mass, mg̃ (the latter enters the predictions
for the Higgs-boson masses only from two-loop order on). In numerical analyses the U(1)
gaugino mass parameter, M1, is often fixed via the GUT relation
M2. (9)
We will briefly comment below on the possible impact of complex phases entering the Higgs-
sector predictions via higher-order contributions.
2.2 Higher-order corrections in the Higgs sector
In the following we briefly summarize the most important higher-order corrections affecting
the observables in the MSSM Higgs-boson sector. As mentioned above, we focus on the
MSSM with real parameters. For our numerical analysis we use the program FeynHiggs [19–
22]4, which incorporates a comprehensive set of higher-order results obtained in the Feynman-
diagrammatic approach [20–22, 28–30].
2.2.1 Higgs-boson propagator corrections
Higher-order corrections to the Higgs-boson masses and the wave function normalization
factors of processes with external Higgs bosons arise from Higgs-boson propagator-type con-
tributions. These corrections furthermore contribute in a universal way to all Higgs-boson
couplings. For the propagator-type corrections in the MSSM the complete one-loop re-
sults [31–34], the bulk of the two-loop contributions [20, 27–29, 35–39] and even leading
three-loop corrections [40] are known. The remaining theoretical uncertainty on the light
CP-even Higgs-boson mass has been estimated to be below ∼ 3 GeV [21, 41]. The by far
dominant contribution is the O(αt) term due to top and stop loops (αt ≡ h2t/(4π), where ht
denotes the top-quark Yukawa coupling). Effects of O(αb) can be important for large values
of tan β.
2.2.2 Corrections to the relation between the bottom-quark mass and the bot-
tom Yukawa coupling
Concerning the corrections from the bottom/sbottom sector, large higher-order effects can
in particular occur in the relation between the bottom-quark mass and the bottom Yukawa
coupling (which controls the interaction between the Higgs bosons and bottom quarks as
well as between the Higgs and scalar bottoms), hb, for large values of tanβ. At lowest order
the relation reads mb = hbv1. Beyond the tree level large radiative corrections proportional
to hbv2 are induced, giving rise to tanβ-enhanced contributions [36–38,42]. At the one-loop
level the leading terms proportional to v2 are generated either by gluino–sbottom one-loop
diagrams of O(αs) or by chargino–stop loops of O(αt).
The leading one-loop contribution ∆b in the limit of MSUSY ≫ mt and tanβ ≫ 1 takes
the simple form [36]
mg̃ µ tanβ × I(mb̃1 , mb̃2 , mg̃) +
At µ tan β × I(mt̃1 , mt̃2 , µ) , (10)
4 The code can be obtained from www.feynhiggs.de .
where the function I is given by
I(a, b, c) =
(a2 − b2)(b2 − c2)(a2 − c2)
a2b2 log
+ b2c2 log
+ c2a2 log
max(a2, b2, c2)
The leading contribution can be resummed to all orders in the perturbative expansion [36–
38]. This leads in particular to the replacement
1 + ∆b
, (12)
where mb denotes the running bottom quark mass including SM QCD corrections. For the
numerical evaluations in this paper we choose mb = mb(mt) ≈ 2.97 GeV.
The ∆b corrections are numerically sizable for large tan β in combination with large values
of the ratios of µmg̃/M
SUSY or µAt/M
SUSY. Negative values of ∆b lead to an enhancement of
the bottom Yukawa coupling as a consequence of eq. (12) (for extreme values of µ and tanβ
the bottom Yukawa coupling can even acquire non-perturbative values when ∆b → −1),
while positive values of ∆b give rise to a suppression of the Yukawa coupling. Since a change
in the sign of µ reverses the sign of ∆b, the bottom Yukawa coupling can exhibit a very
pronounced dependence on the parameter µ.
For large values of tanβ the correction to the production cross sections of the Higgs
bosons H and A induced by ∆b enters approximately like tan
2 β/(1 + ∆b)
2, giving rise
to potentially large numerical effects. In the case of the subsequent Higgs-boson decay
φ → τ+τ−, however, the ∆b corrections in the production and the decay process cancel
each other to a large extent. The residual ∆b dependence of σ(bb̄φ) × BR(φ → τ+τ−) is
approximately given by tan2 β/((1+∆b)
2+9), which has a much weaker ∆b dependence (see
Ref. [14] for a more detailed discussion).
In the numerical analysis below the ∆b corrections, which have been discussed in this
section in terms of simple approximation formulae, will be supplemented by other higher-
order corrections as implemented in the program FeynHiggs (and possible decay modes into
supersymmetric particles are taken into account). Higher-order corrections to Higgs decays
into τ+τ− within the SM and MSSM have been evaluated in Refs. [34, 43].
2.2.3 Corrections to the Higgs production cross sections
For the prediction of Higgs-boson production processes at hadron colliders SM-type QCD
corrections in general play an important role. The SM predictions for the process bb̄ → φ+X
at the LHC are far advanced. In the five-flavor scheme the SM cross section is known at
NNLO in QCD [44]. The cross section in the four-flavor scheme is known at NLO [45, 46].
Results obtained in the two schemes have been shown to be consistent [47–49] (see also
Refs. [48, 50] and Refs. [45, 46] for results with one and two final-state b-quarks at high-pT ,
respectively).
The predictions for the bb̄ → φ + X cross sections in the MSSM have been obtained
with FeynHiggs [19–22]. The FeynHiggs implementation5 is based on the state-of-the-art
5The inclusion of the charged Higgs production cross sections is planned for the near future.
SM prediction, namely the NNLO result in the five-flavor scheme [44] using MRST2002
parton distributions at NNLO [51], with the renormalization scale set equal to MHSM and
the factorization scale set equal to MHSM/4. In order to obtain the MSSM prediction the
SM cross section is rescaled with the ratio of the partial widths in the MSSM and the SM,
Γ(φ → bb̄)MSSM
Γ(φ → bb̄)SM
. (13)
The evaluation of the partial widths incorporates one-loop SM QCD and SUSY QCD correc-
tions, as well as (in the SUSY case) the resummation of all terms of O((αs tanβ)n) [34,37,43]
and the proper normalization of the external Higgs bosons as discussed in Refs. [22, 52]. Since
the approximation of rescaling the SM cross section with the ratio of partial widths does not
take into account the MSSM-specific dynamics of the production processes, the theoretical
uncertainty in the predictions for the cross sections will in general be somewhat larger than
for the decay widths. It should be noted that in comparison with other approaches for treat-
ing the SM and SUSY contributions, for instance the program HQQ [53], sizable deviations
can occur as a consequence of differences in the scale choices and the inclusion of higher-order
corrections.
2.3 The mmaxh and no-mixing benchmark scenarios
While the phenomenology of the production and decay processes of the heavy neutral MSSM
Higgs bosons at the LHC is mainly characterised by the parametersMA and tanβ that govern
the Higgs sector at lowest order, other MSSM parameters enter via higher-order contribu-
tions, as discussed above, and via the kinematics of Higgs-boson decays into supersymmetric
particles. The other MSSM parameters are usually fixed in terms of benchmark scenarios.
The most commonly used scenarios are the “mmaxh ” and “no-mixing” benchmark scenar-
ios [12–14]. According to the definition of Ref. [13] the mmaxh scenario is given by
mmaxh : MSUSY = 1000 GeV, Xt = 2MSUSY, Ab = At,
µ = 200 GeV, M2 = 200 GeV, mg̃ = 0.8MSUSY . (14)
The no-mixing scenario differs from the mmaxh scenario only in that it has vanishing mixing
in the stop sector and a larger value of MSUSY
no-mixing: MSUSY = 2000 GeV, Xt = 0, Ab = At,
µ = 200 GeV, M2 = 200 GeV, mg̃ = 0.8MSUSY . (15)
The value of the top-quark mass in Ref. [13] was chosen according to the experimental central
value at that time. For our numerical analysis below, we use the value, mt = 171.4 GeV [54]
In Ref. [14] it was suggested that in the search for heavy MSSM Higgs bosons the mmaxh
and no-mixing scenarios, which originally were mainly designed for the search for the light
CP-even Higgs boson h, should be extended by several discrete values of µ,
µ = ±200,±500,±1000 GeV . (16)
6 Most recently the central experimental value has shifted to mt = 170.9± 1.8 GeV [55]. This shift has
a negligible impact on our analysis.
As discussed above, the variation of µ in particular has an impact on the correction ∆b,
modifying in this way the bottom Yukawa coupling. For very large values of tan β and
large negative values of µ the bottom Yukawa coupling can be so much enhanced that a
perturbative treatment is no longer possible. We have checked that in our analysis of the
LHC discovery contours the bottom Yukawa coupling stays in the perturbative regime, so
that all values of µ down to µ = −1000 GeV can safely be inserted.
The variation of the parameter µ also modifies the mass spectrum and the couplings in
the chargino and neutralino sector of the MSSM. Besides the small higher-order corrections
induced by loop diagrams involving charginos and neutralinos, a change in the mass spectrum
of the chargino and neutralino sector can have an important effect on Higgs phenomenology
because decay modes of the heavy neutral MSSM Higgs bosons into charginos and neutralinos
open up if the supersymmetric particles are sufficiently light (the mass spectrum in the mmaxh
and no-mixing scenarios respects the limits from direct searches for charginos at LEP [56]
for all values of µ specified in eq. (16)).
Differences between the mmaxh and no-mixing scenarios in the searches for heavy neutral
MSSM Higgs bosons are induced in particular by a difference in the ∆b correction. While in
the mmaxh scenario both the O(αs) and O(αt) contributions to ∆b can be sizable, see eq. (10),
in the no-mixing scenario the O(αt) contribution is very small because At is close to zero in
this case. The larger value of MSUSY in the no-mixing scenario gives rise to an additional
suppression of |∆b| compared to the mmaxh scenario.
3 Experimental analysis
In this section we briefly review the recent CMS analysis of the φ → τ+τ− channel, see
Ref. [9], yielding the number of events needed for a 5 σ discovery (depending on the mass
of the Higgs boson). The analysis was performed with full CMS detector simulation and
reconstruction for the following four final states of di-τ -lepton decays: τ+τ− → jets [57],
τ+τ− → e+ jet [58], τ+τ− → µ+ jet [59] and τ+τ− → e + µ [60].
The Higgs-boson production in association with b quarks, pp → bb̄φ, has been selected
using single b-jet tagging in the experimental analysis. The kinematics of the gg → bb̄φ
production process (2 → 3) was generated with PYTHIA [61]. It has been shown that in
this way the NLO kinematics is better reproduced than using the PYTHIA gb → bφ process
(2 → 2) [62]. The backgrounds considered in the analysis were QCD muli-jet events (for the
ττ → jets mode), tt̄, bb̄, Drell-Yan production of Z, γ∗, W+jet, Wt and ττbb̄. All background
processes were generated using PYTHIA, except for τ+τ−bb̄, which was generated using
CompHEP [63].
The results for the various channels, eqs. (1) – (4), are given in Tabs. 1 – 4. For every
Higgs-boson mass point studied we show the number of signal events needed for 5 σ discovery,
NS, the total experimental selection efficiency, εexp, and the ratio of the di-τ mass resolution
to the Higgs-boson mass, RMφ . The last row in Tabs. 1 – 4 shows the expected precision of
the Higgs-boson mass measurement, evaluated as explained below, for parameter points on
the 5 σ discovery contour. Detector effects, experimental systematics and uncertainties of
the background determination were taken into account in the evaluation of the NS. These
effects reduce the discovery region in the MA–tanβ plane as shown in previous analyses [9]
φ → τ+τ− → jets, 60 fb−1
MA [GeV] 200 500 800
NS 63 35 17
εexp 2.5× 10−4 2.4× 10−3 3.6× 10−3
RMφ 0.176 0.171 0.187
∆Mφ/Mφ [%] 2.2 2.8 4.5
Table 1: Required number of signal events, NS, with L = 60 fb−1 for a 5 σ discovery in
the channel φ → τ+τ− → jets. Furthermore given are the total experimental selection
efficiency, εexp, the ratio of the di-τ mass resolution to the Higgs-boson mass, RMφ, and the
expected precision of the Higgs-boson mass measurement, ∆Mφ/Mφ, obtainable from NS
signal events.
φ → τ+τ− → e+ jet, 30 fb−1
MA [GeV] 200 300 500
NS 72.9 45.5 32.8
εexp 3.0× 10−3 6.4× 10−3 1.0× 10−2
RMφ 0.216 0.214 0.230
∆Mφ/Mφ [%] 2.5 3.2 4.0
Table 2: Required number of signal events, NS, with L = 30 fb−1 for a 5 σ discovery in the
channel φ → τ+τ− → e+ jet. The other quantities are defined as in Tab. 1.
φ → τ+τ− → µ+ jet, 30 fb−1
MA [GeV] 200 500
NS 79 57
εexp 7.0× 10−3 2.0× 10−2
RMφ 0.210 0.200
∆Mφ/Mφ [%] 2.4 2.6
Table 3: Required number of signal events, NS, with L = 30 fb−1 for a 5 σ discovery in the
channel φ → τ+τ− → µ+ jet. The other quantities are defined as in Tab. 1.
φ → τ+τ− → e + µ, 30 fb−1
MA [GeV] 200 250
NS 87.8 136.7
εexp 6.4× 10−3 1.1× 10−2
RMφ 0.262 0.412
∆Mφ/Mφ [%] 2.8 3.5
Table 4: Required number of signal events, NS, with L = 30 fb−1 for a 5 σ discovery in the
channel φ → τ+τ− → e+ µ. The other quantities are defined as in Tab. 1.
(see in particular Fig. 5.6 of Ref. [9] for the τ+τ− → µ+ jet mode).
Now we turn to the evaluation of the expected precision of the Higgs-boson mass mea-
surement. In spite of the escaping neutrinos, the Higgs-boson mass can be reconstructed
in the H,A → ττ channel from the visible τ momenta (τ jets) and the missing transverse
energy, EmissT , using the collinearity approximation for neutrinos from highly boosted τ ’s. In
the investigated region of MA and tanβ the two states A and H are nearly mass-degenerate.
For most values of the other MSSM parameters the mass difference of A and H is much
smaller than the achievable mass resolution. In this case the difference in reconstructing the
A or the H will have no relevant effect on the achievable accuracy in the mass determina-
tion. In some regions of the MSSM parameter space, however, a sizable splitting between
MA and MH can occur even for MA ≫ MZ . We will discuss below the prospects in scenarios
where the splitting between MA and MH is relatively large. The precision ∆Mφ/Mφ shown
in Tabs. 1 – 4 is derived for the border of the parameter space in which a 5 σ discovery
can be claimed, i.e. with NS observed Higgs events. The statistical accuracy of the mass
measurement has been evaluated via
. (17)
A higher precision can be achieved if more than NS events are observed. The corresponding
estimate for the precision is obtained by replacing NS in eq. (17) by the number of observed
signal events, Nev. It should be noted that the prospective accuracy obtained from eq. (17)
does not take into account the uncertainties of the jet and missing ET energy scales. In
the τ+τ− → jets mode these effects can lead to an additional 3% uncertainty in the mass
measurement [57]. A more dedicated procedure of the mass measurement from the signal
plus background data still has to be developed in the experimental analysis. However, we do
not expect that the additional uncertainties will considerably degrade the accuracy of the
Higgs boson mass measurement as calculated with eq. (17).
4 Results
The results quoted in Sect. 3 for the required number of signal events depend only on the
Higgs-boson mass, i.e. the event kinematics, but are independent of any specific MSSM
scenario. In order to determine the 5 σ discovery contours in the MA–tan β plane these
results have to be confronted with the MSSM predictions. The number of signal events, Nev,
for a given parameter point is evaluated via
Nev = L × σbb̄φ × BR(φ → τ+τ−)× BRττ × εexp . (18)
Here L denotes the luminosity collected with the CMS detector, σbb̄φ is the Higgs-boson pro-
duction cross section, BR(φ → τ+τ−) is the branching ratio of the Higgs boson to τ leptons,
BRττ is the product of the branching ratios of the two τ leptons into their respective final
state,
BR(τ → jet +X) ≈ 0.65 , (19)
BR(τ → µ+X) ≈ BR(τ → e+X) ≈ 0.175 , (20)
and εexp denotes the total experimental selection efficiency for the respective process (as
given in Tabs. 1 – 4). The Higgs-boson production cross sections and decay branching ratios
have been evaluated with FeynHiggs as described in Sect. 2.2.
4.1 Discovery reach for heavy neutral MSSM Higgs bosons
The number of signal events, Nev, in the MSSM depends besides the parameters MA and
tan β, which govern the MSSM Higgs sector at lowest order, in principle also on all other
MSSM parameters. In the following we analyze how stable the results for the 5σ discovery
contours in theMA–tanβ plane are with respect to variations of the other MSSM parameters.
We take into account both effects from higher-order corrections, as discussed in Sect. 2.2,
and from decays of the heavy Higgs bosons into supersymmetric particles. As starting point
of our analysis we use the mmaxh and no-mixing benchmark scenarios, where we investigate
in detail the sensitivity of the discovery contours with respect to variations of the parameter
µ. We then discuss the possible impact of varying other MSSM parameters.
We have evaluated Nev in the two benchmark scenarios as a function of MA and tan β.
For fixed MA we have varied tan β such that Nev = NS (as given in Tabs. 1 – 4). This tanβ
value is then identified as the point on the 5 σ discovery contour corresponding to the chosen
value of MA. In this way we have determined the 5 σ discovery contours for the m
h and
the no-mixing scenarios for µ = ±200,±1000 GeV.
In Figs. 1 – 3 we show the 5σ discovery contours obtained from the process bb̄φ, φ → τ+τ−
for the final states τ+τ− → jets, τ+τ− → e + jet and τ+τ− → µ + jet. As can be seen
from Tab. 4, the fourth channel discussed above, τ+τ− → e + µ, contributes for 30 fb−1
only in the region of relatively small MA values and has a lower sensitivity than the other
three channels. We therefore omit this channel in the following discussion. The discovery
contours in Figs. 1 – 3 are given for the mmaxh and no-mixing benchmark scenarios with µ =
±200,±1000 GeV. As explained above, the 5 σ discovery contours are affected by a change
in µ in two ways. Higher-order contributions, in particular the ones associated with ∆b,
,GeV/cAM
100 200 300 400 500 600 700 800
2 = -1000 GeV/cµ
2 = -200 GeV/cµ
2 = 200 GeV/cµ
2 = 1000 GeV/cµ
CMS, 60 fb
 j+j→ ττ → φ bb→pp 
 scenariomaxhm
2 = 1 TeV/cSUSYM
2 = 200 GeV/c2M
SUSY = 0.8 Mgluinom
SUSY = 2 MtStop mix: X
,GeV/cAM
100 200 300 400 500 600 700 800
2 = -1000 GeV/cµ
2 = -200 GeV/cµ
2 = 200 GeV/cµ
2 = 1000 GeV/cµ
CMS, 60 fb
 j+j→ ττ → φ bb→pp 
no mixing scenario
2 = 2 TeV/cSUSYM
2 = 200 GeV/c2M
SUSY = 0.8 Mgluinom
 = 0tStop mix: X
Figure 1: Variation of the 5σ discovery contours obtained from the channel bb̄φ, φ → τ+τ− →
jets in the mmaxh (left) and no-mixing (right) benchmark scenarios for different values of µ.
,GeV/cAM
100 200 300 400 500 600 700 800
2 = -1000 GeV/cµ
2 = -200 GeV/cµ
2 = 200 GeV/cµ
2 = 1000 GeV/cµ
CMS, 30 fb
 e+j→ ττ → φ bb→pp 
 scenariomaxhm
2 = 1 TeV/cSUSYM
2 = 200 GeV/c2M
SUSY = 0.8 Mgluinom
SUSY = 2 MtStop mix: X
,GeV/cAM
100 200 300 400 500 600 700 800
2 = -1000 GeV/cµ
2 = -200 GeV/cµ
2 = 200 GeV/cµ
2 = 1000 GeV/cµ
CMS, 30 fb
 e+j→ ττ → φ bb→pp 
no mixing scenario
2 = 2 TeV/cSUSYM
2 = 200 GeV/c2M
SUSY = 0.8 Mgluinom
 = 0tStop mix: X
Figure 2: Variation of the 5σ discovery contours obtained from the channel bb̄φ, φ → τ+τ− →
e+ jet in the mmaxh (left) and no-mixing (right) benchmark scenarios for different values of µ.
,GeV/cAM
100 200 300 400 500 600 700 800
2 = -1000 GeV/cµ
2 = -200 GeV/cµ
2 = 200 GeV/cµ
2 = 1000 GeV/cµ
CMS, 30 fb
+jµ → ττ → φ bb→pp 
 scenariomaxhm
2 = 1 TeV/cSUSYM
2 = 200 GeV/c2M
SUSY = 0.8 Mgluinom
SUSY = 2 MtStop mix: X
,GeV/cAM
100 200 300 400 500 600 700 800
2 = -1000 GeV/cµ
2 = -200 GeV/cµ
2 = 200 GeV/cµ
2 = 1000 GeV/cµ
CMS, 30 fb
+jµ → ττ → φ bb→pp 
no mixing scenario
2 = 2 TeV/cSUSYM
2 = 200 GeV/c2M
SUSY = 0.8 Mgluinom
 = 0tStop mix: X
Figure 3: Variation of the 5σ discovery contours obtained from the channel bb̄φ, φ → τ+τ− →
µ+ jet in the mmaxh (left) and no-mixing (right) benchmark scenarios for different values of µ.
modify the Higgs-boson production cross sections and decay branching ratios. Furthermore
the mass eigenvalues of the charginos and neutralinos vary with µ, possibly opening up the
decay channels of the Higgs bosons to supersymmetric particles, which reduces the branching
ratio to τ leptons.
The results for the 5 σ discovery contours for the final state τ+τ− → jets are shown in
Fig. 1 for themmaxh (left) and the no-mixing (right) scenario. As expected from the discussion
of the ∆b corrections in Sect. 2.2, the variation of the 5 σ discovery contours with µ is more
pronounced in the mmaxh scenario, where a shift up to ∆ tanβ = 12 can be observed for
MA = 800 GeV. For lowMA values (corresponding also to lower tanβ values on the discovery
contours) the variation stays below ∆ tanβ = 3. In the no-mixing scenario the variation does
not exceed ∆ tan β = 5. The τ+τ− → jets channel has also been discussed in Ref. [14]. Our
results, which are based on the latest CMS studies using full simulation [57], are qualitatively
in good agreement with Ref. [14], in which the earlier CMS studies of Refs. [23, 24] had beed
used. The 5 σ discovery regions are largest for µ = −1000 GeV and pushed to highest tanβ
values for µ = +200 GeV. In the low MA region our discovery contours are very similar
to those obtained in Ref. [14]. In the high MA region, MA ∼ 800 GeV, corresponding to
larger values of tan β on the discovery contours, our improved evaluation of the 5 σ discovery
contours gives rise to a shift towards higher tan β values compared to Ref. [14] of about
∆ tanβ = 8 (mostly due to the up-to-date experimental input). Accordingly, we find a
smaller discovery region compared to Ref. [14] and therefore an enlarged “LHC wedge”
region where only the light CP-even MSSM Higgs boson can be detected at the 5 σ level.
The results for the channel τ+τ− → e+ jet are shown in Fig. 2. Again the mmaxh scenario
shows a stronger variation than the no-mixing scenario. The resulting shift in tan β reaches
up to ∆ tan β = 8 for MA = 500 GeV in the m
h scenario, but stays below ∆ tanβ = 4
for the no-mixing scenario. Finally in Fig. 3 the results for the channel τ+τ− → µ+ jet are
depicted. The level of variation of the 5 σ discovery contours is the same as for the e + jet
final state.7
,GeV/cAM
100 200 300 400 500 600 700 800
2 = -1000 GeV/cµ
2 = -200 GeV/cµ
2 = 200 GeV/cµ
2 = 1000 GeV/cµ
CMS, 60 fb
 j+j→ ττ → φ bb→pp 
)=0χ χ → φ, BR(maxhm
2 = 1 TeV/cSUSYM
2 = 200 GeV/c2M
SUSY = 0.8 Mgluinom
SUSY = 2 MtStop mix: X
,GeV/cAM
100 200 300 400 500 600 700 800
2 = -1000 GeV/cµ
2 = -200 GeV/cµ
2 = 200 GeV/cµ
2 = 1000 GeV/cµ
CMS, 60 fb
 j+j→ ττ → φ bb→pp 
)=0χ χ → φno mixing, BR(
2 = 2 TeV/c
2 = 200 GeV/c2M
 = 0.8 M
gluino
 = 0tStop mix: X
Figure 4: Variation of the 5σ discovery contours obtained from the channel bb̄φ, φ → τ+τ− →
jets in the mmaxh (left) and no-mixing (right) benchmark scenarios for different values of µ in
the case where no decays of the heavy Higgs bosons into supersymmetric particles are taken
into account (see text).
In order to gain a better understanding of how sensitively the discovery contours in the
MA–tan β plane depend on the chosen SUSY scenario, it is useful to separately investigate the
different effects caused by varying the parameter µ. For simplicity, we restrict the following
discussion to the bb̄φ, φ → τ+τ− → jets channel. In Fig. 4 we show the same results as
in Fig. 1, but for the case where no decays of the heavy Higgs bosons into supersymmetric
particles are taken into account. As a consequence, the variation of the 5 σ discovery contours
with µ shown in Fig. 4 is purely an effect of higher-order corrections, predominantly those
entering via ∆b. The difference between Fig. 1 and Fig. 4, on the other hand, is purely an
effect of the change in BR(φ → τ+τ−) caused by the variation of the partial Higgs-boson
decay widths into supersymmetric particles arising from a shift in the masses of the charginos
and neutralinos.
In Fig. 4 the dependence of the 5 σ discovery contours on µ significantly differs from the
case of Fig. 1. While in Fig. 1 the inclusion of decays into supersymmetric particles gives
7Since the results of the experimental simulation for this channel are available only for two MA values,
the interpolation is a straight line. This may result in a slightly larger uncertainty of the results shown in
Fig. 3 compared to the other figures.
rise to the fact that the smallest discovery region is found for small µ values, µ = +200 GeV
(with the exception of the region of very small MA), in Fig. 4 the 5 σ discovery contours
are ordered monotonously in µ: the largest (smallest) 5 σ discovery regions are obtained for
µ = −(+)1000 GeV, i.e. for the largest (smallest) values of the bottom Yukawa coupling.
As expected, the effect of the higher-order corrections is largest in the high tanβ-region
(corresponding to large values of MA on the discovery contours). In this region the variation
of µ shifts the discovery contours by up to ∆ tanβ = 11 for the case of the mmaxh scenario
(left plot of Fig. 4), i.e. the effect is about the same as for the case where decays into
supersymmetric particles are included. For lower values of tanβ (corresponding to smaller
values of MA on the discovery contours), on the other hand, the modification of the Higgs
branching ratio as a consequence of decays into supersymmetric particles yields the dominant
effect on the 5 σ discovery contours. Accordingly, the observed variation with µ in this
region is significantly smaller in Fig. 4 as compared to the full result of Fig. 1. The reduced
sensitivity of the discovery contours on µ can also clearly be seen for the case of the no-
mixing scenario (right plot), where as discussed above the ∆b correction is smaller than in
the mmaxh scenario.
,GeV/cAM
100 200 300 400 500 600 700 800
2 = 200 GeV/c
gluino
2 = 500 GeV/c
gluino
2 = 1000 GeV/c
gluino
2 = 2000 GeV/c
gluino
CMS, 60 fb
 j+j→ ττ → φ bb→pp 
 scenariomaxhm
2 = 1 TeV/cSUSYM
2 = 200 GeV/c2M
2 = 1000 GeV/cµ
SUSY = 2 MtStop mix: X
,GeV/cAM
100 200 300 400 500 600 700 800
2 = 200 GeV/c
gluino
2 = 500 GeV/c
gluino
2 = 1000 GeV/c
gluino
2 = 2000 GeV/c
gluino
CMS, 60 fb
 j+j→ ττ → φ bb→pp 
no mixing scenario
2 = 2 TeV/cSUSYM
2 = 200 GeV/c2M
2 = 1000 GeV/cµ
 = 0tStop mix: X
Figure 5: Variation of the 5σ discovery contours obtained from the channel bb̄φ, φ → τ+τ− →
jets in the mmaxh (left) and no-mixing (right) benchmark scenarios with µ = +1000 GeV for
different values of mg̃.
A parameter affecting the ∆b corrections, see eq. (10), but not the kinematics of the
Higgs-boson decays is the gluino mass, mg̃. We now investigate the impact of varying this
parameter, which is normally fixed to the values mg̃ = 800, 1600 GeV in the m
h and
no-mixing benchmark scenarios, respectively. The results for four different values of the
gluino mass, mg̃ = 200, 500, 1000, 2000 GeV, are shown in Fig. 5. The µ parameter has been
set to µ = +1000 GeV in Fig. 5, such that the Higgs decay channels into charginos and
neutralinos are suppressed. As one can see from eq. (10), the change of mg̃ affects the O(αs)
part of ∆b and corresponds to a monotonous increase of ∆b. As an example, this yields for
µ = 1000 GeV, tan β = 50 in the two scenarios:
mmaxh , mg̃ = 200 GeV : ∆b = 0.50
mmaxh , mg̃ = 2000 GeV : ∆b = 0.94
no-mixing, mg̃ = 200 GeV : ∆b = 0.06
no-mixing, mg̃ = 2000 GeV : ∆b = 0.29 . (21)
In the no-mixing scenario the At value is close to zero, suppressing the mg̃-independent
contribution to ∆b, while the higher SUSY mass scale results in an overall reduction of ∆b in
this scenario. The value of ∆b in the no-mixing scenario would slightly increase if mg̃ were
raised to even larger values, but this effect would not change the qualitative behaviour.
Fig. 5 shows that the results for the discovery reach in the MA–tanβ plane are relatively
stable with respect to variations of the gluino mass. The shift in the discovery contours
remains below about ∆ tanβ = 4 for the mmaxh scenario (left plot) and ∆ tanβ = 1 for the
no-mixing scenario (right plot). For the positive sign of µ chosen in Fig. 5, where the ∆b
correction yields a suppression of the bottom Yukawa coupling, the largest discovery reach
is obtained for small mg̃, while the smallest discovery reach is obtained for large mg̃. This
behaviour would be reversed by a change of sign of µ.
We have also investigated the possible impact of other MSSM parameters (besides µ and
mg̃) on the 5 σ discovery contours in the MA–tan β plane. The ∆b corrections depend also on
the parameters in the stop and sbottom sector, see eq. (10). While the formulas in Sect. 2.2.2
have been given for the region where MSUSY ≫ mt, the qualitative effect of reducing the
stop and sbottom masses can nevertheless be inferred. Sizable ∆b corrections require relative
large values of µ and mg̃. If these parameters are kept large while the stop and sbottom
masses are reduced, the ∆b corrections tend to decrease. It is obvious from eq. (10) that
reducing the absolute value of At decreases the electroweak part of the ∆b correction. As
discussed above, this effect of the ∆b corrections manifests itself in the comparison of the
mmaxh and no-mixing scenarios, see Figs. 1–5. Concerning the possible impact of the ∆b
corrections on the 5 σ discovery contours for the bb̄φ, φ → τ+τ− channel in the MA–tanβ
plane we conclude that larger effects than those shown in Figs. 1–5 (where we have displayed
the discovery contours up to tan β = 50) would only arise if the variation of µ were extended
over an even wider interval than −1000 GeV ≤ µ ≤ +1000 GeV as done in our analysis
above.
We now turn to the possible effects of other higher-order corrections beyond those entering
via ∆b on the 5 σ discovery contours for the bb̄φ, φ → τ+τ− channel. These effects are in
general non-negligible, see the discussions in Sect. 2.2 and in Sect. 4.2 below, but smaller
than those induced by ∆b. As a consequence, the impact on the 5 σ discovery contours in the
MA–tan β plane of other supersymmetric parameters entering via higher-order corrections is
in general much smaller than the effect of varying µ in the high-tanβ region of Fig. 4. As
an example, the difference observed in Figs. 1–5 between the mmaxh and no-mixing scenarios
arising from the different values of At and MSUSY in the two scenarios (see eqs. (14), (15))
is mainly an effect of the ∆b corrections, while the impact of other higher-order corrections
involving At and MSUSY is found to be small.
Also the decays of the heavy neutral MSSM Higgs bosons into supersymmetric particles
are in general affected by other supersymmetric parameters in addition to the dependence
on µ, MA and tan β. The resulting effects on BR(φ → τ+τ−) turn out to be rather small,
however. We find that sizable deviations from the values of BR(φ → τ+τ−) occurring in
the mmaxh and no-mixing scenarios for −1000 GeV ≤ µ ≤ +1000 GeV are only possible in
quite extreme regions of the MSSM parameter space that are already highly constrained by
existing experimental data.
Our discussion above has been given in the context of the MSSM with real parameters.
Since the sensitivity of the 5 σ discovery contours in the MA–tan β plane on the other super-
symmetric parameters can mainly be understood as an effect of higher-order corrections to
the bottom Yukawa coupling and of the kinematics of Higgs-boson decays into supersymmet-
ric particles, no qualitative changes of our results are expected for the case where complex
phases are taken into account.
4.2 Higgs-boson mass precision
The discussion in the previous section shows that the prospective discovery reach of the
bb̄φ, φ → τ+τ− channel in theMA–tanβ plane is rather stable with respect to variations of the
other MSSM parameters. We now turn to the second part of our analysis and investigate the
expected statistical precision of the Higgs-boson mass measurement. The expected statistical
precision is evaluated as described in Sect. 3, see eq. (17). In Figs. 6 – 7 we show the expected
precision for the mass measurement achievable from the channel bb̄φ, φ → τ+τ− using the
final states τ+τ− → jets and τ+τ− → e + jet. Within the 5 σ discovery region we have
indicated contour lines corresponding to different values of the expected precision, ∆M/M .
The results are shown in the mmaxh benchmark scenario for µ = −200 GeV (left plots) and
µ = +200 GeV (right plots).
We find that experimental precisions of ∆Mφ/Mφ of 1–4% are reachable within the dis-
covery region. A better precision is reached for larger tanβ and smaller MA as a consequence
of the higher number of signal events in this region. The other scenarios and other values of
µ discussed above yield qualitatively similar results to those shown in Figs. 6, 7.
As discussed above, for large values of MA the heavy neutral MSSM Higgs bosons are
nearly mass-degenerate, MH ≈ MA. The experimental separation of the two states H and A
(or the corresponding mass eigenstates in the CP-violating case) will therefore be challenging.
The results shown in Figs. 6 – 7 have been obtained using the combined sample of H and
A events. It is important to note, however, that even in the region of large MA the mass
splitting between MH and MH can reach the level of a few %. An example of such a scenario
is (as above, we consider the CP-conserving case, i.e. the MSSM with real parameters; the
corresponding scenario in the case of non-vanishing complex phases has been discussed in
Ref. [22])
MSUSY = 500 GeV, At = Ab = 1000 GeV, µ = 1000 GeV,
M2 = 500 GeV, M1 = 250 GeV, mg̃ = 500 GeV . (22)
In Fig. 8 the mass splitting
|MH −MA|
min(MH ,MA)
is given as a function of Xt for tanβ = 40 and two MA values, MA = 300 GeV (solid
line) and MA = 500 GeV (dashed line). The dot-dashed and dotted parts of the contours for
Figure 6: The statistical precision of the Higgs-boson mass measurement achievable from
the channel bb̄φ, φ → τ+τ− → jets in the mmaxh benchmark scenario for µ = −200 GeV (left)
and µ = +200 GeV (right) is shown together with the 5 σ discovery contour.
Figure 7: The statistical precision of the Higgs-boson mass measurement achievable from
the channel bb̄φ, φ → τ+τ− → e + jet in the mmaxh benchmark scenario for µ = −200 GeV
(left) and µ = +200 GeV (right) is shown together with the 5 σ discovery contour.
-1500 -1000 -500 0 500 1000 1500
 [GeV]
 = 500 GeV, tanβ = 40
 = 300 GeV
 = 300 GeV, LEP excl.
 = 500 GeV
 = 500 GeV, LEP excl.
Figure 8: The mass splitting between the heavy neutral MSSM Higgs bosons, ∆MHA/M ≡
|MH −MA| /min(MH ,MA), is shown as a function ofXt forMA = 300, 500 GeV in a scenario
with MSUSY = 500 GeV, µ = 1000 GeV and tanβ = 40. The other parameters are given in
eq. (22). The dot-dashed (dotted) parts of the contours forMA = 300 GeV (MA = 500 GeV)
indicate parameter combinations that are excluded by the search for the light CP-even Higgs
boson of the MSSM at LEP [3].
MA = 300, 500 GeV, respectively, in the region of small |Xt| indicate parameter combinations
that result in relatively low Mh values that are excluded by the search for the light CP-even
Higgs boson of the MSSM at LEP [3]. One can see in Fig. 8 that the mass splitting between
MH and MA shows a pronounced dependence on Xt in this scenario. Mass differences of up
to 5% are possible for large Xt (while the widths of the Higgs bosons are at the 1–1.5% level
in this parameter region).
The example of Fig. 8 shows that a precise mass measurement at the LHC may in
favourable regions of the MSSM parameter space open the exciting possibility to distin-
guish between the signals of H and A production. In confronting Fig. 8 with the expected
accuracies obtained in Figs. 6 – 7 one of course needs to take into account that a separate
treatment of the H and A channels in Figs. 6 – 7 would reduce the number of signal events by
a factor of 2, resulting in a degradation of the expected accuracies (for the same luminosity)
by a factor of
2. A more detailed analysis of the potential for experimentally resolving two
mass peaks would furthermore have to include effects arising from overlapping Higgs signals.
Such an analysis goes beyond the scope of the present paper.
5 Conclusions
We have analyzed the reach of the CMS experiment with 30 or 60 fb−1 for the heavy neutral
MSSM Higgs bosons, depending on tanβ and the Higgs-boson mass scale, MA. We have
focused on the channel bb̄H/A,H/A → τ+τ− with the τ ’s subsequently decaying to jets
and/or leptons. The experimental analysis, yielding the number of events needed for a
5 σ discovery (depending on the mass of the Higgs boson) was performed with full CMS
detector simulation and reconstruction for the final states of di-τ -lepton decays. The events
were generated with PYTHIA.
The experimental analysis has been combined with predictions for the Higgs-boson masses,
production processes and decay channels obtained with the code FeynHiggs, taking into ac-
count all relevant higher-order corrections as well as possible decays of the heavy Higgs
bosons into supersymmetric particles. We have analyzed the sensitivity of the 5 σ discov-
ery contours in the MA–tanβ plane to variations of the other supersymmetric parameters.
We have shown that the discovery contours are relatively stable with respect to the im-
pact of additional parameters. The biggest effects, resulting from higher-order corrections
to the bottom Yukawa coupling and from the kinematics of Higgs decays into charginos
and neutralinos, are caused by varying the absolute value and the sign of the higgsino mass
parameter µ. The corresponding shift in the 5 σ discovery contours amounts up to about
∆ tanβ = 10. The effects of other contributions to the relation between the bottom-quark
mass and the bottom Yukawa coupling, arising from the gluino mass and the parameters in
the stop and sbottom sector, are in general smaller than the shifts induced by a variation
of µ. The same holds for the impact of higher-order contributions beyond the corrections to
the bottom Yukawa coupling and for the possible effects of other decay modes of the heavy
Higgs bosons into supersymmetric particles. The results of our analysis, which was carried
out in the framework of the CP-conserving MSSM, should not be substantially affected by
the inclusion of complex phases of the soft-breaking parameters.
We have analyzed the prospective accuracy of the mass measurement of the heavy neu-
tral MSSM Higgs bosons in the channel bb̄H/A,H/A → τ+τ−. We find that statistical
experimental precisions of 1–4% are reachable within the discovery region. These results,
obtained from a simple estimate of the prospective accuracies, are not expected to consid-
erably degrade if further uncertainties related to background effects and jet and missing ET
scales are taken into account. We have pointed out that a %-level precision of the mass
measurements could in favourable regions of the MSSM parameter allow to experimentally
resolve the signals of the two heavy MSSM Higgs bosons.
Acknowledgements
S.H. and G.W. thank M. Carena and C.E.M. Wagner for collaboration on some of the
theoretical aspects employed in this analysis.
References
[1] H. Nilles, Phys. Rept. 110 (1984) 1;
H. Haber and G. Kane, Phys. Rept. 117 (1985) 75;
R. Barbieri, Riv. Nuovo Cim. 11 (1988) 1.
[2] [LEP Higgs working group], Phys. Lett. B 565 (2003) 61, hep-ex/0306033.
[3] [LEP Higgs working group], Eur. Phys. J. C 47 (2006) 547, hep-ex/0602042.
[4] V. Abazov et al. [D0 Collaboration], Phys. Rev. Lett. 95 (2005) 151801, hep-ex/0504018;
Phys. Rev. Lett. 97 (2006) 121802, hep-ex/0605009; D0 Note 5331-CONF.
[5] A. Abulencia et al. [CDF Collaboration], Phys. Rev. Lett. 96 (2006) 011802,
hep-ex/0508051; CDF note 8676.
[6] A. Abulencia et al. [CDF Collaboration], Phys. Rev. Lett. 96 (2006) 042003,
hep-ex/0510065;
R. Eusebi, Ph.d. thesis: “Search for charged Higgs in tt̄ decay products from proton-
antiproton collisions at
s = 1.96TeV”, University of Rochester, 2005.
[7] ATLAS Collaboration, Detector and Physics Performance Technical Design Report,
CERN/LHCC/99-15 (1999), see:
atlasinfo.cern.ch/Atlas/GROUPS/PHYSICS/TDR/access.html ;
[8] K. Cranmer, Y. Fang, B. Mellado, S. Paganis, W. Quayle and S. Wu, hep-ph/0401148.
[9] CMS Physics Technical Design Report, Volume 2. CERN/LHCC 2006-021, see:
cmsdoc.cern.ch/cms/cpt/tdr/ .
[10] V. Büscher and K. Jakobs, Int. J. Mod. Phys. A 20 (2005) 2523, hep-ph/0504099.
[11] M. Schumacher, Czech. J. Phys. 54 (2004) A103; hep-ph/0410112.
[12] M. Carena, S. Heinemeyer, C. Wagner and G. Weiglein, hep-ph/9912223.
[13] M. Carena, S. Heinemeyer, C. Wagner and G. Weiglein, Eur. Phys. J. C 26 (2003) 601,
hep-ph/0202167.
[14] M. Carena, S. Heinemeyer, C. Wagner and G. Weiglein, Eur. Phys. J. C 45 (2006) 797,
hep-ph/0511023.
[15] S. Heinemeyer, W. Hollik and G. Weiglein, JHEP 0006 (2000) 009, hep-ph/9909540.
[16] A. Belyaev, J. Pumplin, W. Tung and C. Yuan, JHEP 0601 (2006) 069,
hep-ph/0508222.
http://arxiv.org/abs/hep-ex/0306033
http://arxiv.org/abs/hep-ex/0602042
http://arxiv.org/abs/hep-ex/0504018
http://arxiv.org/abs/hep-ex/0605009
http://arxiv.org/abs/hep-ex/0508051
http://arxiv.org/abs/hep-ex/0510065
http://arxiv.org/abs/hep-ph/0401148
http://arxiv.org/abs/hep-ph/0504099
http://arxiv.org/abs/hep-ph/0410112
http://arxiv.org/abs/hep-ph/9912223
http://arxiv.org/abs/hep-ph/0202167
http://arxiv.org/abs/hep-ph/0511023
http://arxiv.org/abs/hep-ph/9909540
http://arxiv.org/abs/hep-ph/0508222
[17] M. Albrow and A. Rostovtsev, hep-ph/0009336;
V. Khoze, A. Martin and M. Ryskin, Eur. Phys. J. C 23 (2002) 311, hep-ph/0111078;
A. De Roeck, V. Khoze, A. Martin, R. Orava and M. Ryskin, Eur. Phys. J. C 25 (2002)
391, hep-ph/0207042;
B. Cox, AIP Conf. Proc. 753 (2005) 103, hep-ph/0409144;
J. Forshaw, hep-ph/0508274.
[18] S. Heinemeyer, V. Khoze, M. Ryskin, W. Stirling, M. Tasevsky and G. Weiglein, in
preparation.
[19] S. Heinemeyer, W. Hollik and G. Weiglein, Comput. Phys. Commun. 124 (2000) 76,
hep-ph/9812320; hep-ph/0002213; see: www.feynhiggs.de .
[20] S. Heinemeyer, W. Hollik and G. Weiglein, Eur. Phys. J. C 9 (1999) 343,
hep-ph/9812472.
[21] G. Degrassi, S. Heinemeyer, W. Hollik, P. Slavich and G. Weiglein, Eur. Phys. J. C 28
(2003) 133, hep-ph/0212020.
[22] M. Frank, T. Hahn, S. Heinemeyer, W. Hollik, H. Rzehak and G. Weiglein, JHEP 02
(2007) 047, hep-ph/0611326.
[23] S. Abdullin et al., Eur. Phys. J. C 39S2 (2005) 41.
[24] R. Kinnunen and A. Nikitenko, CMS note 2003/006.
[25] J. Thomas, ATL-PHYS-2003-003;
D. Cavalli and D. Negri, ATL-PHYS-2003-009.
[26] M. Carena, P. Chankowski, S. Pokorski and C. Wagner, Phys. Lett. B 441 (1998) 205,
hep-ph/9805349.
[27] J. Espinosa and I. Navarro, Nucl. Phys. B 615 (2001) 82, hep-ph/0104047.
[28] S. Heinemeyer, W. Hollik and G. Weiglein, Phys. Rev. D 58 (1998) 091701,
hep-ph/9803277; Phys. Lett. B 440 (1998) 296, hep-ph/9807423.
[29] G. Degrassi, A. Dedes and P. Slavich, Nucl. Phys. B 672 (2003) 144, hep-ph/0305127.
[30] M. Carena, H. Haber, S. Heinemeyer, W. Hollik, C. Wagner, and G. Weiglein, Nucl.
Phys. B 580 (2000) 29, hep-ph/0001002.
[31] J. Ellis, G. Ridolfi and F. Zwirner, Phys. Lett. B 257 (1991) 83;
Y. Okada, M. Yamaguchi and T. Yanagida, Prog. Theor. Phys. 85 (1991) 1;
H. Haber and R. Hempfling, Phys. Rev. Lett. 66 (1991) 1815.
[32] A. Brignole, Phys. Lett. B 281 (1992) 284.
[33] P. Chankowski, S. Pokorski and J. Rosiek, Phys. Lett. B 286 (1992) 307; Nucl. Phys.
B 423 (1994) 437, hep-ph/9303309.
http://arxiv.org/abs/hep-ph/0009336
http://arxiv.org/abs/hep-ph/0111078
http://arxiv.org/abs/hep-ph/0207042
http://arxiv.org/abs/hep-ph/0409144
http://arxiv.org/abs/hep-ph/0508274
http://arxiv.org/abs/hep-ph/9812320
http://arxiv.org/abs/hep-ph/0002213
http://arxiv.org/abs/hep-ph/9812472
http://arxiv.org/abs/hep-ph/0212020
http://arxiv.org/abs/hep-ph/0611326
http://arxiv.org/abs/hep-ph/9805349
http://arxiv.org/abs/hep-ph/0104047
http://arxiv.org/abs/hep-ph/9803277
http://arxiv.org/abs/hep-ph/9807423
http://arxiv.org/abs/hep-ph/0305127
http://arxiv.org/abs/hep-ph/0001002
http://arxiv.org/abs/hep-ph/9303309
[34] A. Dabelstein, Nucl. Phys. B 456 (1995) 25, hep-ph/9503443; Z. Phys. C 67 (1995)
495, hep-ph/9409375.
[35] R. Hempfling and A. Hoang, Phys. Lett. B 331 (1994) 99, hep-ph/9401219;
J. Casas, J. Espinosa, M. Quirós and A. Riotto, Nucl. Phys. B 436 (1995) 3, E: ibid.
B 439 (1995) 466, hep-ph/9407389;
M. Carena, J. Espinosa, M. Quirós and C. Wagner, Phys. Lett. B 355 (1995) 209,
hep-ph/9504316;
M. Carena, M. Quirós and C. Wagner, Nucl. Phys. B 461 (1996) 407, hep-ph/9508343;
H. Haber, R. Hempfling and A. Hoang, Z. Phys. C 75 (1997) 539, hep-ph/9609331;
R. Zhang, Phys. Lett. B 447 (1999) 89, hep-ph/9808299;
J. Espinosa and R. Zhang, JHEP 0003 (2000) 026, hep-ph/9912236;
G. Degrassi, P. Slavich and F. Zwirner, Nucl. Phys. B 611 (2001) 403, hep-ph/0105096;
J. Espinosa and R. Zhang, Nucl. Phys. B 586 (2000) 3, hep-ph/0003246;
A. Brignole, G. Degrassi, P. Slavich and F. Zwirner, Nucl. Phys. B 631 (2002) 195,
hep-ph/0112177;
A. Brignole, G. Degrassi, P. Slavich and F. Zwirner, Nucl. Phys. B 643 (2002) 79,
hep-ph/0206101;
S. Heinemeyer, W. Hollik, H. Rzehak and G. Weiglein, Eur. Phys. J. C 39 (2005) 465,
hep-ph/0411114; hep-ph/0506254.
[36] R. Hempfling, Phys. Rev. D 49 (1994) 6168;
L. Hall, R. Rattazzi and U. Sarid, Phys. Rev. D 50 (1994) 7048, hep-ph/9306309;
M. Carena, M. Olechowski, S. Pokorski and C. Wagner, Nucl. Phys. B 426 (1994) 269,
hep-ph/9402253.
[37] M. Carena, D. Garcia, U. Nierste and C. Wagner, Nucl. Phys. B 577 (2000) 577,
hep-ph/9912516.
[38] H. Eberl, K. Hidaka, S. Kraml, W. Majerotto and Y. Yamada, Phys. Rev. D 62 (2000)
055006, hep-ph/9912463.
[39] S. Martin, Phys. Rev. D 65 (2002) 116003, hep-ph/0111209; Phys. Rev. D 66 (2002)
096001, hep-ph/0206136; Phys. Rev. D 67 (2003) 095012, hep-ph/0211366; Phys. Rev.
D 68 (2003) 075002, hep-ph/0307101; Phys. Rev.D 70 (2004) 016005, hep-ph/0312092;
Phys. Rev. D 71 (2005) 016012, hep-ph/0405022; Phys. Rev. D 71 (2005) 116004,
hep-ph/0502168;
S. Martin and D. Robertson, Comput. Phys. Commun. 174 (2006) 133, hep-ph/0501132.
[40] S. Martin, hep-ph/0701051.
[41] S. Heinemeyer, W. Hollik and G. Weiglein, Phys. Rept. 425 (2006) 265, hep-ph/0412214.
[42] J. Guasch, P. Häfliger and M. Spira, Phys. Rev. D 68 (2003) 115001, hep-ph/0305101.
[43] S. Gorishny, A. Kataev, S. Larin and L. Surguladze, Mod. Phys. Lett. A 5 (1990) 2703;
Phys. Rev. D 43 (1991) 1633;
A. Kataev and V. Kim, Mod. Phys. Lett. A 9 (1994) 1309;
http://arxiv.org/abs/hep-ph/9503443
http://arxiv.org/abs/hep-ph/9409375
http://arxiv.org/abs/hep-ph/9401219
http://arxiv.org/abs/hep-ph/9407389
http://arxiv.org/abs/hep-ph/9504316
http://arxiv.org/abs/hep-ph/9508343
http://arxiv.org/abs/hep-ph/9609331
http://arxiv.org/abs/hep-ph/9808299
http://arxiv.org/abs/hep-ph/9912236
http://arxiv.org/abs/hep-ph/0105096
http://arxiv.org/abs/hep-ph/0003246
http://arxiv.org/abs/hep-ph/0112177
http://arxiv.org/abs/hep-ph/0206101
http://arxiv.org/abs/hep-ph/0411114
http://arxiv.org/abs/hep-ph/0506254
http://arxiv.org/abs/hep-ph/9306309
http://arxiv.org/abs/hep-ph/9402253
http://arxiv.org/abs/hep-ph/9912516
http://arxiv.org/abs/hep-ph/9912463
http://arxiv.org/abs/hep-ph/0111209
http://arxiv.org/abs/hep-ph/0206136
http://arxiv.org/abs/hep-ph/0211366
http://arxiv.org/abs/hep-ph/0307101
http://arxiv.org/abs/hep-ph/0312092
http://arxiv.org/abs/hep-ph/0405022
http://arxiv.org/abs/hep-ph/0502168
http://arxiv.org/abs/hep-ph/0501132
http://arxiv.org/abs/hep-ph/0701051
http://arxiv.org/abs/hep-ph/0412214
http://arxiv.org/abs/hep-ph/0305101
L. Surguladze, Phys. Lett. B 338 (1994) 229, hep-ph/9406294; Phys. Lett. B 341 (1994)
60, hep-ph/9405325;
K. Chetyrkin, Phys. Lett. B 390 (1997) 309, hep-ph/9608318;
K. Chetyrkin and A. Kwiatkowski, Nucl. Phys. B 461 (1996) 3, hep-ph/9505358;
S. Larin, T. van Ritbergen and J. Vermaseren, Phys. Lett. B 362 (1995) 134,
hep-ph/9506465;
P. Chankowski, S. Pokorski and J. Rosiek, Nucl. Phys. B 423 (1994) 497;
S. Heinemeyer, W. Hollik and G. Weiglein, Eur. Phys. J. C 16 (2000) 139,
hep-ph/0003022.
[44] R. Harlander and W. Kilgore, Phys. Rev. D 68 (2003) 013001, hep-ph/0304035.
[45] S. Dittmaier, M. Kramer and M. Spira, Phys. Rev. D 70 (2004) 074010,
hep-ph/0309204.
[46] S. Dawson, C. Jackson, L. Reina and D. Wackeroth, Phys. Rev. D 69 (2004) 074027,
hep-ph/0311067.
[47] K. Assamagan et al. [Les Houches 2003 Higgs Working Group], hep-ph/0406152.
[48] S. Dawson, C. Jackson, L. Reina and D. Wackeroth, Phys. Rev. Lett. 94 (2005) 031802,
hep-ph/0408077.
[49] S. Dawson, C. Jackson, L. Reina and D. Wackeroth, Mod. Phys. Lett. A 21 (2006) 89,
hep-ph/0508293.
[50] J. Campbell, R. Ellis, F. Maltoni and S. Willenbrock, Phys. Rev. D 67 (2003) 095002,
hep-ph/0204093.
[51] A. Martin, R. Roberts, W. Stirling and R. Thorne, Eur. Phys. J. C 28 (2003) 455,
hep-ph/0211080.
[52] T. Hahn, S. Heinemeyer and G. Weiglein, Nucl. Phys. B 652 (2003) 229,
hep-ph/0211204.
[53] See: people.web.psi.ch/spira/hqq .
[54] E. Brubaker et al. [Tevatron Electroweak Working Group], hep-ex/0608032, see:
tevewwg.fnal.gov/top/ .
[55] [Tevatron Electroweak Working Group], hep-ex/0703034.
[56] G. Abbiendi et al. [OPAL Collaboration], Eur. Phys. J. C 35 (2004) 1, hep-ex/0401026.
[57] S. Gennai, A. Nikitenko and L. Wendland, CMS Note 2006/126.
[58] R. Kinnunen and S. Lehti, CMS Note 2006/075.
[59] A. Kalinowski, M. Konecki and D. Kotlinski, CMS Note 2006/105.
[60] S. Lehti, CMS Note 2006/101.
http://arxiv.org/abs/hep-ph/9406294
http://arxiv.org/abs/hep-ph/9405325
http://arxiv.org/abs/hep-ph/9608318
http://arxiv.org/abs/hep-ph/9505358
http://arxiv.org/abs/hep-ph/9506465
http://arxiv.org/abs/hep-ph/0003022
http://arxiv.org/abs/hep-ph/0304035
http://arxiv.org/abs/hep-ph/0309204
http://arxiv.org/abs/hep-ph/0311067
http://arxiv.org/abs/hep-ph/0406152
http://arxiv.org/abs/hep-ph/0408077
http://arxiv.org/abs/hep-ph/0508293
http://arxiv.org/abs/hep-ph/0204093
http://arxiv.org/abs/hep-ph/0211080
http://arxiv.org/abs/hep-ph/0211204
http://arxiv.org/abs/hep-ex/0608032
http://arxiv.org/abs/hep-ex/0703034
http://arxiv.org/abs/hep-ex/0401026
[61] T. Sjostrand et al., Comput. Phys. Commun. 135 (2001) 238; hep-ph/0010017.
[62] J. Campbell, A. Kalinowski and A. Nikitenko, “Comparison between MCFM and Pythia
for the gb → bh and gg → bb̄h processes at the LHC” in
C. Buttar et al., Les Houches Physics at TeV Colliders 2005, “Standard Model and
Higgs working group: Summary report”, hep-ph/0604120.
[63] E. Boos et al. [CompHEP Collaboration], Nucl. Instrum. Meth. A 534 (2004) 250,
hep-ph/0403113.
http://arxiv.org/abs/hep-ph/0010017
http://arxiv.org/abs/hep-ph/0604120
http://arxiv.org/abs/hep-ph/0403113
	Introduction
	Phenomenology of the MSSM Higgs sector
	Notation
	Higher-order corrections in the Higgs sector
	Higgs-boson propagator corrections
	Corrections to the relation between the bottom-quark mass and the bottom Yukawa coupling
	Corrections to the Higgs production cross sections
	The mhmax and no-mixing benchmark scenarios
	Experimental analysis
	Results
	Discovery reach for heavy neutral MSSM Higgs bosons
	Higgs-boson mass precision
	Conclusions
ABSTRACT
  The search for MSSM Higgs bosons will be an important goal at the LHC. We
analyze the search reach of the CMS experiment for the heavy neutral MSSM Higgs
bosons with an integrated luminosity of 30 or 60 fb^-1. This is done by
combining the latest results for the CMS experimental sensitivities based on
full simulation studies with state-of-the-art theoretical predictions of MSSM
Higgs-boson properties. The results are interpreted in MSSM benchmark scenarios
in terms of the parameters tan_beta and the Higgs-boson mass scale, M_A. We
study the dependence of the 5 sigma discovery contours in the M_A-tan_beta
plane on variations of the other supersymmetric parameters. The largest effects
arise from a change in the higgsino mass parameter mu, which enters both via
higher-order radiative corrections and via the kinematics of Higgs decays into
supersymmetric particles. While the variation of $\mu$ can shift the
prospective discovery reach (and correspondingly the ``LHC wedge'' region) by
about Delta tan_beta = 10, we find that the discovery reach is rather stable
with respect to the impact of other supersymmetric parameters. Within the
discovery region we analyze the accuracy with which the masses of the heavy
neutral Higgs bosons can be determined. We find that an accuracy of 1-4% should
be achievable, which could make it possible in favourable regions of the MSSM
parameter space to experimentally resolve the signals of the two heavy MSSM
Higgs bosons at the LHC.

<|endoftext|><|startoftext|>
Introduction
White Dwarf (WD) mass distributions have been determined us-
ing a variety of different methods. Discrepancies exist between
the different determinations in particular between the photo-
metric and spectroscopic WD masses. Boudreault & Bergeron
(2005) compared the masses derived by fitting the observed
Balmer lines with masses derived from trigonometric parallaxes
and photometry. They found differences of ∼ 50 per cent for cool
(6 500–14 000 K) DA white dwarfs. Spectroscopic masses are
believed to be more accurate, especially for WDs in the temper-
ature range between 15 000 and 40 000 K (Liebert et al. 2005).
Atmospheric models are less well established for stars outside
this range. For hotter WDs the atmospheric structure is modi-
fied by an (often unknown) amount of metals and by non-LTE
effects. For cooler WDs the convection has to be considered and
the models are sensitive to the mixing length and the amount of
helium convected to the surface (Boudreault & Bergeron 2005).
Central stars of planetary nebulae (CSPN) provide a way to
test the mass distributions. CSPNe evolve directly into WDs,
with only very minor mass changes, allowing one to measure
masses of currently forming white dwarfs. However, CSPN mass
distributions have also been uncertain. For example, Napiwotzki
(2006) shows that the very high CSPN masses (close to the
Chandrasekhar limit) derived spectroscopically with state-of-
the-art model atmospheres by Pauldrach et al. (2004) are physi-
cally implausible and masses close to the peak of the CSPN/WD
mass distribution are more likely.
CSPN masses are normally obtained from the luminosi-
ties. But more accurate masses can be derived using the age–
temperature diagram, obtainable from the surrounding planetary
nebula (PN). Gesicki et al. (2006) applied this to a sample of
101 PNe. In this Letter we discuss the resulting mass distribu-
tions for hydrogen-rich and hydrogen-poor CSPNe and compare
with published WD masses.
2. Methods and results
2.1. Models
The method requires the age of the nebula and the temperature
of the central star to be determined. Together these provide the
heating time scale for the star.
We derive the age of the PNe using a combination of line
ratios, diameters (taken from the literature), and new high res-
olution spectra (Gesicki et al. 2006). The diameters and line
ratios are used to fit a spherically symmetric photo-ionization
model. The model assumes a density distribution and finds a
stellar black-body temperature. For each ion, the model finds a
radial emissivity distribution. The observed line profiles for each
ion represent the convolution of the thermal broadening and the
expansion velocity at each radius. Thus, the line profiles for dif-
ferent ions are used to fit a velocity field. An iterative procedure
is used to improve the ionization model. The emissivity distribu-
tions of different ions overlap, and this gives a strong constraint
on the shape of the wings of the line profiles. A genetic algo-
rithm, PIKAIA, is used to arrive at the optimum solution for
ionization model and velocity field. A turbulent component is
added if needed: turbulence is indicated by a Gaussian shape of
the line profiles. The expansion velocities are found to increase
with radius, due to the overpressure of the ionized region.
¿From the velocity field v(r), we derive the mass-weighted
average over the nebula, vav. This parameter has been shown
to be robust against the simplifications. Different models which
provide comparable quality fits give the same vav to within
2 km s−1 (Gesicki et al 2006). Applying this to a radius of 0.8
times the outer radius (equivalent to the mass-averaged radius)
allows us to define a kinematic age t to the nebula. A linear ac-
celeration is assumed to have occurred from the AGB expansion
velocity (10–15 km s−1) to the PN velocity vav (20–25 km s−1).
The derived nebular age and stellar temperature are com-
pared to the the H-burning tracks of Blöcker (1995), which pro-
vide the largest and most uniform collection available. We inter-
2 K. Gesicki and A.A. Zijlstra: White dwarf masses from planetary nebulae
0.565
0.605
0.625
0.696
0.836
2 4 8
Fig. 1. Comparison of the 101 modelled PNe with the evolu-
tionary tracks in the HR diagram. The model black-body tem-
peratures are plotted against the luminosities interpolated from
tracks. Filled circles indicate [WR] stars, open circles are wels
and pluses indicate non-emission-line stars. The dotted lines
show H-burning evolutionary models of Blöcker (1995), labeled
by mass in units of M�. The solid lines are isochrones, labeled
by the time after the nebula ejection, in units of 103 yr.
polate between different tracks to find for each (t,Teff), the CSPN
luminosity and mass.
2.2. Different CSPN types
The CSPNe fall into two broad categories: the hydrogen-rich
O-type stars and the emission-line central stars which are gen-
erally hydrogen-deficient. The second group consists of [WR]-
type stars with strong emission lines and wels (weak emission
line stars). The [WR] are subdivided into hot [WO] and cool
[WC]. [WR] stars are in most cases hydrogen-free (three possi-
ble exceptions are mentioned by Werner & Herwig 2006). The
wels may contain some hydrogen. Gesicki et al. (2006) show
that one group of wels is located in the temperature gap between
[WC] and [WO] stars. The other wels stars form a non-uniform
group, including higher-mass objects where the high luminosity
drives a wind but the star is not necessarily hydrogen-poor. The
hydrogen-rich stars are believed to be related to the DA white
dwarfs, while the [WR] may evolved into DB’s.
2.3. The HR diagram
The full analyzed sample contains 101 PNe, of which about 60
are in the direction of the Galactic Bulge and the remainder are in
the Galactic disk. Foreground confusion among the Bulge PNe
is estimated at 20%. The sample contains 23 [WR]-type, 21 wels
and 57 non-emission-line central stars1. The CSPN classification
was adopted from literature. The last group contains also objects
without any information about their spectrum.
In Fig.1 we show the photoionization temperatures and inter-
polated luminosities, plotted on the HR diagram. The H-burning
tracks of Blöcker (1995) are also shown: the luminosities and
masses of CSPNe fall into a rather restricted range of values.
Isochrones of 1,2,4, and 8 × 103 yr are also shown.
A previous HR diagram of CSPNe presented by Stanghellini
et al. (2002) shows a much broader range of luminosities and,
in consequence, masses. They use Zanstra temperatures and
luminosities. The Zanstra method of locating a CSPN in the
HR diagram was criticized by Schönberner & Tylenda (1990).
1 The data file is available from web page
www.astri.uni.torun.pl/∼gesicki/modelled pne.dat
Table 1. Comparison between our dynamical masses and spec-
troscopic masses from Kudritzki et al. (2006). Observed mass-
loss rates from the same paper are also listed and compared to
values from the model tracks of Blöcker (1995). He 2-108 is
classified as wels, the other three are non-emission-line stars.
Object M [M�] Teff [103 K] log Ṁ [M� yr−1]
dyn. spec. dyn. spec. spec. evol. tracks
Tc 1 0.59 0.81 32 34 −7.46 −7.91
He 2-108 0.57 0.63 32 34 −6.85 −8.16
IC 418 0.61 0.92 37 36 −7.43 −7.82
NGC 3242 0.61 0.63 79 75 −8.08 −7.86
Observationally, the accuracy of the luminosity determinations
is about a factor of 2. On the Schönberner tracks, a CSPN mass
change from, e.g., 0.57 to 0.7 M� corresponds to a factor of 3
in luminosity. The masses determined directly from luminosity
are thus accurate to only 0.1 M�. This is less than the typical
dispersion of masses. In contrast, for the same mass range, the
dynamical time scales differ by a factor of 60. Even for a factor
of 2 uncertainty in the nebular age, the mass changes by only
0.02 M�. Therefore, the dynamical method improves the accu-
racy.
Schönberner & Tylenda (1990) also developed a method to
improve the CSPN mass determination. This method (Tylenda et
al. 1991) results in masses similar to ours.
Table 1 compares, for four objects in common, our dynami-
cal masses with the spectroscopic masses derived by Kudritzki et
al. (2006). The spectroscopic masses are larger, in two cases very
much larger. The lower masses are supported by the kinematical
properties of Tc 1 and He 2-108 (see Fig. 5 of Napiwotzki 2006),
which favour an old thin disk population. Kudritzki et al. also
derive Teff : our photo-ionization values are in good agreement.
Pauldrach et al. (2004) find from a spectroscopic analysis,
five CSPNe with masses close to the Chandrasekhar limit. This
result is implausible, as argued by Napiwotzki (2006). Three of
their objects are also in our sample, and all are found to have
regular masses.
2.4. The mass distributions
In Fig.2 the upper panel presents the mass distribution of our
whole sample of 101 PNe. All CSPNe masses fall into a narrow
range, 0.55 − 0.66 M�, with a mean mass of 0.61 M�. The range
of masses is almost identical to that of Tylenda et al. (1991) but
they obtained a smaller mean mass of 0.593 M� and their distri-
bution peaks at 0.58 M�.
The lower panel of Fig.2 presents masses for the same types
of CSPNe as shown in Fig. 1. The non-emission-line stars show
a Gaussian mass distribution. The hydrogen-deficient emission-
line stars seem to consist of two populations: one sharply peaked,
containing [WR] stars, and the other showing a wider spread,
composed of [WR] and wels. The sharp peak consists, with a
single exception, of hot [WO] stars only.
The presented histograms seem to suggest that hot [WO]
stars form a different group from the combined cooler [WC] and
wels CSPNe.
K. Gesicki and A.A. Zijlstra: White dwarf masses from planetary nebulae 3
Fig. 2. The CSPN mass histograms. Upper panel: the histogram
of all modelled PNe. Lower panel: the histogram of different
subgroups of the 101 PNe. The dashed line indicates [WR] stars,
the dotted line wels and the solid line non-emission-line stars.
3. Comparing CSPNe and WDs
3.1. The histograms
The comparable birth rates of PNe and WDs suggests that most
white dwarfs go through the PN phase (e.g. Liebert et al. 2005).
The mass distribution in both samples should therefore be simi-
Fig.3 presents the histograms of our interpolated O-type
CSPN masses and the masses of DA white dwarfs from recent
surveys. The WD data of Madej et al. (2004), kindly provided
by the authors, contain 1175 new DA WDs extracted from the
Sloan Digital Sky Survey. The data of Liebert et al. (2005) taken
from the electronic version of their article, contain 347 DA WDs
from the Palomar Green Survey. For Fig.3 we selected the ob-
jects with temperatures between 15 000 K and 40 000 K. The two
WDs histograms are not identical, but both peak at similar val-
ues and show extended low- and high-mass tails. We plot the
histograms using narrower bins than usually done for WDs, op-
timized to the mass resolution of our CSPN data. The difference
between the WD and CSPN distributions is striking.
First, the obtained CSPN masses are restricted to a much nar-
rower range of values than WDs, and are also much more sharply
peaked. At face value, this implies that only some of the WDs
have gone through the PN phase, in contrast to the conclusion
from their similar birth rates (Liebert et al. 2005). Second, the
two distributions peak at different masses. Here a systematic er-
ror cannot be excluded, as discussed below.
3.2. Hydrogen-rich vs. hydrogen-deficient
Hansen & Liebert (2003) point to a variety of WD mass distri-
butions with clear differences between hydrogen- and helium-
rich cool stars. Beauchamp et al. (1996) found for hot helium-
atmosphere DB stars a sharp peak lacking almost entirely of low-
and high-mass components. They also found that the DBA stars,
which exhibit traces of atmospheric hydrogen, show a distinctly
different, broad and flat distribution.
The CSPN show an apparent difference between hydrogen-
rich and hydrogen-deficient mass distributions. The hydrogen-
deficient stars show a very narrow mass distribution; it is tempt-
ing to relate this to the helium-rich DB and DBA populations.
We use hydrogen-burning tracks to derive these masses. The
Fig. 3. The mass distribution of non-emission-line O-type
CSPNe (shaded area) is compared to two DA white dwarf distri-
butions of intermediate temperatures: thin line: data from Liebert
et al. (2005); dotted line: data from Madej et al. (2004) which are
more numerous, and are rescaled.
evolution after the thermal pulse leading to helium burners is
very complicated and not well understood (Werner & Herwig
2006). This may not affect the derived masses too much: the ef-
fect of a thermal pulse is to change the temperature of the star,
but as shown in Fig. 1, the isochrones have only a weak depen-
dence on temperature. The resulting offset in time (still very un-
certain) when accounted for can shift those CSPN masses to-
wards higher values.
4. Discussion
4.1. Uncertainties in mass determinations
When comparing the CSPNe and WDs we have to remem-
ber that we compare different spatial distributions. Because of
their faintness the WD observations are restricted to our near-
est neighbourhood while PNe are observed across the whole
Galaxy. Nevertheless we didn’t obtain significantly different dis-
tributions for PNe at different distances.
Our mass determination relies on a single set of evolutionary
tracks. There are two possible sources of errors in the Blöcker
tracks. The first is the early post-AGB evolution where the time
scales depend on how and when the AGB wind terminates. The
Blöcker tracks end this at Teff ∼ 6000 K, (pulsation period of
50 days) to agree with the observations of detached shells around
hotter stars but not around cooler stars. A later termination would
lead to an earlier start of the ionization: in this case we would
systematically overestimate the masses. For a reduction of the
post-AGB transition time by 103 yr, the typical mass would re-
duce by 0.01 M�.
The second uncertainty is the mass-loss rate during the post-
AGB phase. For M ∼ 0.6 M�, the post-AGB mass-loss rate in
the Blöcker models is 0.1 times the nuclear burning rate, but
for high-mass models the mass loss accelerates the evolution
by 50% (Blöcker 1995). A higher post-AGB mass loss than as-
sumed would reduce our masses, but for the typical masses we
find a very large increase would be required. Table 1 compares
the Blöcker mass-loss rates with observed values, where we used
the dynamical mass to calculate the Blöcker rate. For the three
non-emission-line stars, observed rates are higher by up to a fac-
tor of 3. This appears to be in part related to the high luminos-
ity derived by Kudritzki et al: if we compare their rates with
Blöcker tracks at similar luminosity, then the Blöcker rates tend
to be higher. The nuclear burning rate of ṀH ∼ −6.8 exceeds
the observed wind by a factor of four (more for NGC 3242). For
4 K. Gesicki and A.A. Zijlstra: White dwarf masses from planetary nebulae
Table 2. Blöcker track time scales: PN visibility is defined as
between log Teff = 4.4 and either a nebular age t = 104 yr or a
stellar luminosity log L = 3.0, whichever occurs earlier
Mass [M�] tstart [yr] tend [yr] tvisibility [yr]
0.546 90 103 - -
0.565 4 103 10 103 6 103
0.605 1.5 103 7.4 103 5.9 103
0.625 660 3.6 103 2.9 103
0.696 100 880 780
0.836 100 840 740
0.940 12 90 78
this factor, the Blöcker tracks would underestimate the speed of
evolution by only 10 per cent. We conclude that the post-AGB
mass-loss rates have little effect on the derived masses. The ex-
ception is the wels star in the sample, where the wind mass loss
rate is comparable to the nuclear burning rate.
There is also an uncertainty in the dynamical age estimate.
A later acceleration would increase the ages by up to 50 per cent
and shift the mass peak from 0.61 to 0.60 M�.
The WD mass determinations also suffer from simplifica-
tions and model assumptions, in addition to the uncertainties
concerning cool and hot WDs as described in the Introduction.
One uncertainty is in contemporary plasma physics, concerning
the pressure broadening in a very high density plasma (Madej et
al. 2004). The mass-radius relations used depend on the assumed
mass of the hydrogen layer. Napiwotzki et al. (1999) compared
estimates from different studies and concluded that the gravities
obtained from spectroscopic method suffer from systematic er-
rors of up to 0.1 dex in log g. This corresponds to an offset in
masses of about 0.02 M� and could, in principle, explain the dif-
ference in peak masses between WDs and CSPNe. The width of
the peak may also be narrower than derived from the models.
Nevertheless, the wide tails of the mass distribution are not in
doubt.
4.2. Time scales, birth rates and binarity
The derived CSPN mass distribution combines the effects of the
birth rate as function of mass, and the observable life time of
the PN. The latter depends on mass as indicated in Table 2. The
period of visibility is defined here as beginning when the star
reaches Teff = 25 103 K, and ending either when the star enters
the cooling track (defines as log L = 3.00) or when the age of the
nebula is 104 yr, whichever comes earlier. Our histogram should
be corrected for the difference in visibility time. This increases
the number at high CSPN mass only by a factor of up to 10,
and brings the high mass tail in somewhat better agreement. We
may also have a sample bias against high masses, as these are
not expected in the Bulge objects. The de-selection of bipolar
objects may have removed a few higher-mass nebulae in the disk.
CSPNe with M < 0.56 M� would not produce a visible PN,
as the post-AGB transition time becomes too long (’lazy PNe’).
In the sample of Liebert et al. (2005), 30 per cent of white dwarfs
have masses in this range, and 50 per cent in the sample of Madej
et al. (2004). However, the sharp drop in the CSPN mass distri-
bution below 0.60 M� occurs at too high mass to be affected.
Hansen & Liebert (2003) argue that both the high- and low-
mass tails in WDs distribution can be a result of binary evolu-
tion. Merging leads to high-mass WDs while a close compan-
ion stripping the envelope can cause an early termination of the
evolution and produce a low-mass helium WD. Both channels
together may account for some 10 per cent of all WDs (Moe &
de Marco 2006). Therefore the histogram for single WDs could
be narrower. Close binary evolution can affect the PN phase as
well, leading to strongly non-spherical nebulae. Our model anal-
ysis assumes spherical symmetry, and we did not analyze bipo-
lar nebulae. Our selection therefore favours single CSPNe and
rejects low-mass CSPNe in interacting binaries. Thus, the CSPN
histogram (Fig. 3) is biased toward single-star evolution, while
the WD histogram includes binary broadening. This may affect
the tails of the WD histogram but is not expected to affect the
main peak.
Moe & de Marco (2006) predict a number of PNe in the
Galaxy of around 46000. Based on local column densities,
Zijlstra & Pottasch (1991) derive an actual number of 23000,
suggesting that only about half the stars which could produce
a PN, do so. This comparison is limited by our knowledge on
the time a PN remains observable. Moe & de Marco (2006) pre-
dict a birth rate of PNe of 1.1× 10−12 PNe yr−1 pc−3, comparable
to the current, local WD birth rate of 1.0 × 10−12 PNe yr−1 pc−3.
Again assuming only half their predicted number of PNe is actu-
ally observed, the expectation is that half of all WDs have passed
through the PN phase.
5. Conclusions
We show that the mass distribution of CSPNe is sharply peaked
at M = 0.61 M�. The published WD mass distributions show
a much broader distribution peaking at a lower mass of M =
0.59 M�. Part of the difference in the peak may indicate faster
evolution during the early post-AGB phase than assumed in the
Blöcker tracks. CSPN mass-loss rates cannot explain the dif-
ference. However considering the uncertainty of 0.02 M� in the
WD mass estimations both peaks are in reasonable agreement.
About 30 per cent of WDs have too low masses to have passed
through the PN phase.
Acknowledgements. We thank our referee Ralf Napiwotzki for important com-
ments. This project was financially supported by the “Polish State Committee
for Scientific Research” through the grant No. 2.P03D.002.025 and by a NATO
collaborative program grant No. PST.CLG.979726. AAZ and KG gratefully ac-
knowledge hospitality from the SAAO.
References
Beauchamp, A., Wesemael, F. & Bergeron, P. 1996, in: C. S. Jeffery and U. Heber
(eds.) Hydrogen-Deficient Stars, ASP Conference Series, Vol. 96, p.295
Blöcker, T. 1995, A&A 299, 755
Boudreault, S. & Bergeron, P. 2005, ASP Conf. Series, Vol. 334, p.249
Gesicki, K., Zijlstra, A. A., Acker, A., Gorny, S. K., Gozdziewski & K., Walsh,
J. R. 2006, A&A 451, 925
Hansen, B.,M.,S. & Liebert, J. 2003, ARA&A 41, 465
Kudritzki, R. P., Urbaneja, M. A., & Puls, J. 2006, IAU Symposium 234,
Planetary Nebulae in our Galaxy and Beyond, M.J. Barlow and R.H.
Méndez, Eds., (CUP Cambridge). , p. 119
Liebert, J., Bergeron, P. & Holberg, J. B. 2005, ApJS 156, 47
Madej, J., Nalezyty M. & Althaus, L. G. 2004, A&A 419, L5
Moe, M. & de Marco, O. 2006, ApJ, 650, 916
Napiwotzki, R., Green, P. J. & Saffer, R. A., 1999, ApJ, 517, 399
Napiwotzki, R. 2006, A&A, 451, L27
Pauldrach, A. W. A., Hoffmann, T. L. & Mendez, R. H. 2004, A&A 419, 1111
Schönberner, D. & Tylenda, R. 1990, A&A, 234, 439
Stanghellini, L., Villaver, E., Manchado, A. & Guerrero, M. A. 2002, ApJ, 576,
Tylenda, R., Stasińska, G., Acker, A. & Stenholm, B. 1991, A&A, 246, 221
Werner, K. & Herwig, F. 2006, PASP 118, 183
Zijlstra A.A., & Pottasch S.R. 1991, A&A, 243, 478
	Introduction
	Methods and results
	Models
	Different CSPN types
	The HR diagram
	The mass distributions
	Comparing CSPNe and WDs
	The histograms
	Hydrogen-rich vs. hydrogen-deficient
	Discussion
	Uncertainties in mass determinations
	Time scales, birth rates and binarity
	Conclusions
ABSTRACT
  We compare the mass distribution of central stars of planetary nebulae (CSPN)
with those of their progeny, white dwarfs (WD). We use a dynamical method to
measure masses with an uncertainty of 0.02 M$_\odot$. The CSPN mass
distribution is sharply peaked at $0.61 \rm M_\odot$. The WD distribution peaks
at lower masses ($0.58 \rm M_\odot$) and shows a much broader range of masses.
Some of the difference can be explained if the early post-AGB evolution is
faster than predicted by the Bl\"ocker tracks. Between 30 and 50 per cent of WD
may avoid the PN phase because of too low mass. However, the discrepancy cannot
be fully resolved and WD mass distributions may have been broadened by
observational or model uncertainties.

<|endoftext|><|startoftext|>
Introduction
This article discusses uniqueness theorems for Cauchy integrals of complex measures in
the plane. We consider the spaceM =M(C) of finite complex measures µ in C. The Cauchy
integral of a measure fromM is defined in the sense of principal value. First, for any µ ∈M ,
ε > 0 and any z ∈ C consider
Cµε (z) :=
ζ:|ζ−z|>ε
dµ(ζ)
ζ − z
Consequently, the Cauchy integral of µ can be defined as
Cµ(z) := lim
Cµε (z) ,
if the limit exists.
Unlike the Cauchy transform on the line, Cµ can vanish on a set of positive Lebesgue
measure: consider for example µ = dz on a closed curve, whose Cauchy transform is zero
at all points outside the curve. It is natural to ask if Cµ can also vanish on large sets with
respect to µ. If µ = δz is a single point mass, its Cauchy transform will be zero µ-a.e. due
to the above definition of Cµ in the sense of principal value. Examples of infinite discrete
measures with vanishing Cauchy transforms can also be constructed with little effort.
After that one arrives at the following corrected version of the question: Is it true that
any continuous µ ∈ M , such that Cµ(z) = 0 at µ-a.e. point, is trivial? As usual, we call
a measure continuous if it has no point masses. We denote the space of all finite complex
continuous measures by Mc(C).
This problem can also be interpreted in terms of uniqueness. Namely, if f and g are two
functions from L1(|µ|) such that C(f−g)µ = 0, µ-a.e., does it imply that f = g, µ-a.e.? This
way it becomes a problem of injectivity of the planar Cauchy transform.
The first author is supported by grants No. MTM2004-00519 and 2001SGR00431.
The second author is supported by N.S.F. Grant No. 0500852.
The third author is supported by N.S.F. Grant No. 0501067 .
http://arxiv.org/abs/0704.0621v1
2 MARK MELNIKOV, ALEXEI POLTORATSKI, AND ALEXANDER VOLBERG
First significant progress towards the solution of this problem was achieved by X. Tolsa
and J. Verdera in [14]. It was established that the answer is positive in two important
particular cases: when µ is absolutely continuous with respect to Lebesgue measure m2 in
C and when µ is a measure of linear growth with finite Menger curvature. The latter class
of measures is one of the main objects in the study of the planar Cauchy transform, see for
instance [11], [12] or [13].
As to the complete solution to the problem, it seemed for a while that the answer could
be positive for any µ ∈Mc, see for example [14]. However, in Section 5 of the present paper
we show that there exists a large set of continuous measures µ satisfying Cµ(z) = 0, µ-a.e.
Following [2], we call such measures reflectionless. This class seems to be an intriguing new
object in the theory.
On the positive side, we prove that if the maximal function associated with the Cauchy
transform is summable with respect to |µ| then µ cannot be reflectionless, see Theorem 2.1.
This result is sharp in its scale because the simplest examples of reflectionless measures
produce maximal functions that lie in the ”weak” L1(|µ|). We prove this result in Section 2
In view of this fact, we believe that the class of continuous measures with summable Cauchy
maximal functions also deserves attention.
A full description of this class and the (disjoint) class of reflectionless measures remains
an open problem.
Let us mention that if µ is a measure with linear growth and finite Menger curvature then
its Cauchy maximal function belongs to L2(|µ|), see [12, 13], and therefore is summable.
This fact relates Theorem 2.1 to the beforementioned result from [14]. The latter can also
be deduced in a different way, see Section 2.
From the point of view of uniqueness, our results imply that any bounded planar Cauchy
transform is injective, see corollary 2.5. This property is a clear analogue of the uniqueness
results for the Cauchy integral on the line or the unit circle.
In Section 3 we discuss other applications of Theorem 2.2. They involve structural theo-
rems of De Giorgi and his notion of a set of finite perimeter, see [5].
In Section 4 we study asymptotic behavior of the Cauchy transform near its zero set.
The results of this section imply that the Radon derivative of µ with respect to Lebesgue
measure m2 vanishes a.e. on the set {Cµ = 0}. In particular the set {Cµ = 0} must be a
zero set with respect to the variation of the absolutely continuous part of µ which is a slight
generalization of the first result of [14]. It is interesting to note that the most direct analogue
of this corollary on the real line is false: it is easy to construct an absolutely continuous (with
respect to m1 = dx) measure µ ∈M(R) such that |µ|({Cµ = 0}) > 0.
Finally, in Section 5 we attempt a geometric description of the set of reflectionless mea-
sures. We give a partial description of reflectionless measures on the line in terms of so-called
comb-like domains. We also provide tools for the construction of various examples of such
measures. In particular, we show that the harmonic measure on any compact subset (of
positive Lebesgue measure) of R is reflectionless.
Acknowledgments. The authors are grateful to Fedja Nazarov for his invaluable comments
and insights. The second author would also like to thank the administration and staff of
Centre de Recerca Matemática in Barcelona for the hospitality during his visit in the Spring
of 2006.
UNIQUENESS THEOREMS FOR CAUCHY INTEGRALS 3
2. Measures with summable maximal functions
If µ ∈M we denote by Cµ∗ (z) its Cauchy maximal function
Cµ∗ (z) := sup
|Cµε (z)|.
Our first result is the following uniqueness theorem.
Theorem 2.1. Let µ ∈Mc. Assume that Cµ∗ (z) ∈ L1(|µ|) and that Cµ(z) exists and vanishes
µ-a.e. Then µ ≡ 0.
We first prove
Theorem 2.2. If Cµ∗ ∈ L1(|µ|) and Cµ(z) exists µ-a.e. then
µdµ(z) = 2
Cµ(t)dµ(t)
= [Cµ(z)]
for m2-a.e. point z ∈ C . (1)
Proof. Put
F := {z ∈ C :
d|µ|(t)
|t− z|
<∞} .
As |µ| is a finite measure,
m2(C \ F ) = 0 . (2)
Let z ∈ F . Then the integral
|t−ζ|>ε
dµ(t)dµ(ζ)
ζ − z
is absolutely convergent for any ε > 0.
Using the identity
(t− z)(z − ζ)
(z − ζ)(ζ − t)
(ζ − t)(t− z)
we obtain
|t−ζ|>ε
z − ζ
ζ − t
ζ − t
dµ(t)dµ(ζ) =
dµ(ζ)
ζ − z
|t−ζ|>ε
dµ(t)
dµ(t)
|ζ−t|>ε
dµ(ζ)
ζ − t
dµ(t) · Cµε (t) ·
dµ(ζ) · Cµε (ζ) ·
ζ − z
Cµε (t)dµ(t)
E := {z ∈ C :
Cµ∗ (t)d|µ|(t)
|t− z|
<∞} .
By assumption, the numerator Cµ∗ (t)d|µ|(t) is a finite measure. Therefore
m2(C \ E) = 0 . (3)
If z ∈ E then
Cµε (t)dµ(t)
Cµ(t)dµ(t)
. (4)
4 MARK MELNIKOV, ALEXEI POLTORATSKI, AND ALEXANDER VOLBERG
This formula is true as long as Cµ∗ ∈ L1(|µ|) and the principal value Cµ exists µ-a.e. by
the dominated convergence theorem. Thus
I = 2CC
µdµ(z) if z ∈ E . (5)
It is left to show that, since z ∈ F ,
I = [Cµ(z)]2 . (6)
Since z ∈ F , the following integral converges absolutely:
φε(t, z) :=
ζ∈C,|ζ−t|>ε
dµ(ζ)
ζ − z
φε(t, z)
dµ(t) .
Since the point z is fixed in F , we have that 1|ζ−z| ∈ L
1(|µ|), and therefore
|ζ−z|d|µ|(ζ) is
small if |µ|(A) is small. Denoting the disc centered at t and of radius ε by B(t, ε) we notice
1) φε(t, z) =
dµ(ζ)
ζ − z
B(t,ε)
dµ(ζ)
ζ − z
2) lim
|µ|(B(t, ε)) = 0.
uniformly in t. Otherwise µ would have an atom.
We conclude that, as ε → 0, the functions φε(t, z) converge uniformly in t ∈ C to φ(z) =∫
ζ−z . Hence for any z ∈ F and any t ∈ C \ z
φε(t, z)
→ φ(z)
, as ε → 0 .
Since φε(t, z) converge uniformly and z ∈ F ,
dµ(t)φε(t, z)
→ φ(z)
dµ(t)
= [Cµ(z)]2 .
We have verified (6).
Combining (5) and (6) we conclude that for z ∈ E ∩ F (so for m2-a.e. z ∈ C) we have
µdµ(z) = 2
Cµ(t)dµ(t)
= lim
I = [Cµ(z)]2 for m2-a.e. point z ∈ C . (7)
This formula is true as long as Cµ∗ ∈ L1(|µ|) and the principal value Cµ exists µ-a.e.
To deduce Theorem 2.1 suppose that Cµ vanishes µ-a.e. Then the left-hand side in (7) is
zero form2-a.e. point z. The same must hold for [C
µ(z)]2. But if Cµ(z) = 0 for Lebesgue-a.e.
point z ∈ C then µ = 0, see for example [6]. Theorem 2.1 is completely proved.
UNIQUENESS THEOREMS FOR CAUCHY INTEGRALS 5
Remark. In the statement of Theorem 2.2 the condition Cµ∗ ∈ L1(|µ|) can be replaced
with the condition that Cµε converge in L
1(|µ|). The proof would have to be changed as
follows.
Like in the above proof one can show that at Lebesgue-a.e. point z
I = [Cµ(z)]2 . (8)
The relation
I = 2
Cµε (t)dµ(t)
for a.e. z can also be established as before. Since Cµε converge in L
1(|µ|), the last integral
converges to CC
µdµ(z) in the ”weak” L2(dxdy), which concludes the proof.
Hence we arrive at the following version of Theorem 2.1:
Theorem 2.3. Let µ ∈Mc. Assume that Cµε → 0 in L1(|µ|). Then µ ≡ 0.
This version has the following corollary:
corollary 2.4 ([14]). Let µ ∈M be a measure of linear growth and finite Menger curvature.
If Cµ = 0 at µ-a.e. point then µ ≡ 0.
Proof. The conditions on µ imply that the L2(|µ|)-norms of the functions Cµε are uniformly
bounded, see for instance [11]. Since Cµε also converge µ-a.e., they must converge in L
1(|µ|).
Remark As was mentioned in the introduction, Corollary 2.4 also follows from Theo-
rem 2.1. However, the above version of the argument allows one to obtain it without the
additional results of [12, 13] on the maximal function.
We also obtain the following statement on the injectivity of any bounded planar Cauchy
transform. As usual, we say that the Cauchy transform is bounded in L2(µ) if the functions
Cfdµε are uniformly bounded in L
2(µ)-norm for any f ∈ L2(µ). If Cµ is bounded, then Cfdµε
converge µ-a.e as ε → 0 and the image Cfdµ exists in a regular sense as a function in L2(µ),
see [13].
corollary 2.5. Let µ ∈ M be a positive measure. If Cµ is bounded in L2(µ) then it is
injective (has a trivial kernel).
Proof. Suppose that there is f ∈ L2(µ) such that Cfdµ = 0 at µ-a.e. point. Since both f
and Cfdµ∗ are in L
2(µ), Cfdµ∗ is in L
1(|f |dµ). Hence f is a zero-function by Theorem 2.1 �
Remark We have actually obtained a slightly stronger statement: If Cµ is bounded
in L2(µ) then for any f ∈ L2(µ) the functions f and Cfdµ cannot have disjoint essential
supports, i.e. the product fCfdµ cannot equal to 0 at µ-a.e. point.
In the rest of this section we will discuss what other kernels could replace the Cauchy
kernel in the statement of Theorem 2.1.
If K(x) is a complex-valued function in Rn, bounded outside of any neighborhood of the
origin, and µ is a finite measure on Rn, one can define Kµ and Kµ∗ in the same way as C
and Cµ∗ were defined in the introduction.
6 MARK MELNIKOV, ALEXEI POLTORATSKI, AND ALEXANDER VOLBERG
The proof of Theorem 2.2 relied on the fact that the Cauchy kernel K(z) = 1/z is odd,
satisfies the symmetry condition (3), i.e.
K(x− y)K(y − z) +K(y − z)K(z − x) +K(z − x)K(x− y) ≡ 0, (9)
and is summable as a function of z for any t with respect to Lebesgue measure. Any K(x)
having these three properties could be used in Theorem 2.1. Out of these three conditions
the symmetry condition (9) seems to be most unique. However, other symmetry conditions
may result in formulas similar to Theorem 2.2 that could still yield Theorem 2.1.
Here is a different example. It shows that much less symmetry can be required from the
kernel if the measure is positive.
Theorem 2.6. Let µ be a positive measure in Rn. Suppose that the real kernel K(x) satisfies
the following properties:
1) K(−x) = −K(x) for any x ∈ Rn;
2) K(x) > 0 for any x from the half-space Rn+ = {x = (x1, x2, ..., xn) | x1 > 0}.
If Kµ∗ ∈ L1(µ) and Kµ(x) = 0 for µ-a.e. x then µ ≡ 0.
Note that real and imaginary parts of the Cauchy kernel, Riesz kernels in Rn, as well as
many other standard kernels satisfy the conditions of the theorem.
We will need the following
Lemma 2.7. Let K be an odd kernel. and let µ, ν ∈ M . Then
Kµε (z)dν(z) = −
Kνε (z)dµ(z) (10)
for any ε > 0.
Suppose that Kµ∗ ∈ L1(|ν|). If Kµ(z) exists ν-a.e. then
Kµ(z)dν(z) = − lim
Kνε (z)dµ(z).
In particular, suppose that both Kµ∗ ∈ L1(|ν|) and Kν∗ ∈ L1(|µ|). If Kµ(z) exists ν-a.e.
and Kν(z) exists µ-a.e. then
Kµ(z)dν(z) =
Kν(z)dµ(z).
Proof. Since K is odd, the first equation can be obtained simply by changing the order of
integration. The second and third equations now follow from the dominated convergence
theorem. �
Proof of Theorem 2.6. There exists a half-plane {x1 = c} in Rn such that µ({x1 = c}) = 0
but both µ({x1 > c}) and µ({x1 < c}) are non-zero. Denote by ν and η the restrictions of
µ onto {x1 > c} and {x1 < c} respectively. Then∫
Kνε (z)dµ(z) =
Kνε (z)dν(z) +
Kνε (z)dη(z).
The first integral on the right-hand side is 0 because of the oddness of K (apply the first
equation in the last lemma with µ = ν). The second condition on K and the positivity of
the measure imply that the second integral is positive and increases as ε → 0. Therefore∫
Kνε (z)dµ(z) cannot tend to zero. This contradicts the fact that K
µ = 0, ν-a.e. and the
second equation from the last lemma. �
UNIQUENESS THEOREMS FOR CAUCHY INTEGRALS 7
3. Sets of finite perimeter
In this section we give another example of an application of Theorem 2.2. It involves the
notion of a set of finite perimeter introduced by De Giorgi in the 50’s, see [5]. We say that
a set G ⊂ R2 has finite perimeter (in the sense of De Giorgi) if the distributional partial
derivatives of its characteristic function χG are finite measures. Such sets have structural
theorems. For example, if G is such a set then the measure ∇χG is carried by a set E,
rectifiable in the sense of Besicovitch, i. e. a subset of a countable union of C1 curves and an
H1-null set, where H1 is the one-dimensional Hausdorff measure. Also the measure ∇χG is
absolutely continuous with respect to H1 restricted to E and its Radon-Nikodym derivative
is a unit normal vector H1-a.e. (notice that ∇χG is a vector measure). At H1-almost all
points of E the function χG has approximate “one-sided”’ limit. For more details we refer
the reader to [5].
The general question we consider can be formulated as follows: What can be said about µ
if Cµ coincides at µ-a.e. point with a ”good” function f? To avoid certain technical details,
all measures in this section are compactly supported. Furthermore, we will only discuss the
two simplest choices of f . As we will see, even in such elementary situations Theorem 2.2
yields interesting consequences.
As usual, when we say that Cµ = f at µ-a.e. point, we imply that the principal value
exists µ-almost everywhere.
Theorem 3.1. Let µ ∈ Mc be compactly supported. Assume that Cµ(z) = 1, µ-almost
everywhere and Cµ∗ ∈ L1(|µ|). Then µ = ∂̄χG, where G is a set of finite perimeter. In
particular, µ is carried by a set E, H1(E) < ∞, rectifiable in the sense of Besicovitch, and
µ is absolutely continuous with respect to the restriction of H1 to E.
Remark. The most natural example of such a measure is dz on a C1 closed curve. The
theorem says that, by the structural results of De Giorgi, this is basically the full answer.
Proof. By Theorem 2.2 we get that for Lebesgue-almost every point in C
[Cµ(z)]2 = 2Cµ(z) . (11)
In other words for m2-a.e. point z we have C
µ(z) = 0 or = 2. Let G denote the set where
Cµ(z) = 2. Since the Cauchy transform of any compactly supported finite measure must
tend to zero at infinity, this set is bounded. Consider the following equality
χG = C
understood in the sense that the two functions are equal as distributions. Taking distribu-
tional derivatives on both sides we obtain
∂̄χG = µ/2 and ∂χ̄G = µ̄/2.
Hence G has finite perimeter and the rest of the statement follows from the results of [5]. �
We say that a set G has locally finite perimeter (in the sense of De Giorgi) if the distribu-
tional derivatives of χG are locally finite measures. Our second application is the following
8 MARK MELNIKOV, ALEXEI POLTORATSKI, AND ALEXANDER VOLBERG
Theorem 3.2. Let µ ∈ Mc be compactly supported. Assume that Cµ(z) = z, µ-almost
everywhere and Cµ∗ ∈ L1(|µ|). If µ(C) = 0 then µ = 2z∂̄χG, where G is a set with locally
finite perimeter. Whether µ(C) = 0 or not, µ is carried by a set E, H1(E) <∞, which is a
rectifiable set in the sense of Besicovitch, and µ is absolutely continuous with respect to the
restriction of H1 to E.
Remark. The most natural example of such a measure is zdz on a C1 closed curve. Our
statement shows that this is basically one-half of the answer. The other half is given by√
z2 − cdz as will be seen from the proof.
Proof. Again, from Theorem 2.2 we get that for Lebesgue-almost every point in C
[Cµ(z)]2 = 2Cζdµ(ζ)(z) . (12)
Notice that
Cζdµ(ζ)(z) =
ζ − z
dµ(ζ) = µ(C) + zCµ(z)
and we get a quadratic equation
[Cµ(z)]2 = 2zCµ(z)− p ,
where p := −2µ(C).
First case p = 0. Here we get
[Cµ(z)]2 = 2zCµ(z) .
We conclude that Cµ(z) = 0 or z for Lebesgue-a.e. point z ∈ C.
Again a bounded set G appears on which
Cµ = 2zχG(z)
in terms of distributions. Therefore
∂̄χG = dµ/2z ,
and the right hand side is a finite measure on any compact set avoiding the origin. Therefore,
G is a (locally) De Giorgi set.
Let us consider the case p 6= 0. For simplicity we assume p = 1, other p’s are treated in
the same way. Then we have to solve the quadratic equation
Cµ(z)2 − 2zCµ(z) + 1 = 0
for Lebesgue-a.e. point in C. Let us make the slit [−1, 1] and consider two holomorphic
functions in C \ [−1, 1]
r1(z) = z −
z2 − 1, r2(z) = z +
z2 − 1 ,
where the branch of the square root is chosen so that
r1(z) → 0, z → ∞ .
In other words we have the sets E1 and E2 such that m2(C \ E1 ∪ E2) = 0 and
z ∈ E1 ⇒ Cµ(z) = r1(z) ,
z ∈ E2 ⇒ Cµ(z) = r2(z) .
UNIQUENESS THEOREMS FOR CAUCHY INTEGRALS 9
Obviously it is E1 that contains a neighborhood of infinity. The function z −
z2 − 1
outside of [−1, 1] can be written as Cµ0(z) where dµ0(x) = 1π
1− x2dx. Consider ν = µ−µ0.
z ∈ E1 ⇒ Cν(z) = 0 ,
z ∈ E2 ⇒ Cν(z) = 2
z2 − 1 := R(z) .
Therefore,
Cν(z) = R(z)χE2 . (13)
Notice that if R was analytic in an open domain compactly containing E2 we would conclude
from the previous equality that
ν = R(z)∂̄χE2 .
If, in addition, |R| was bounded away from zero on E2, we would obtain that ∂̄χE2 and ∂χE2
are measures of finite variation, and hence E2 is a set of finite perimeter. Notice that our
R(z) = 2
z2 − 1 is analytic in O := C \ [−1, 1] and is nowhere zero. We will conclude that
E2 is a set of locally finite perimeter. More precisely we will establish the following claim:
For every open disk V ⊂ O the set O ∩ E2 has finite perimeter.
Indeed, let W be a disk compactly containing V , W ⊂ O. Let ψ be a smooth function,
supported in W , ψ|V = 1. Multiply (13) by ψ and take a distributional derivative (against
smooth functions supported in V ). Then we get (using the fact that R is holomorphic on V )
ν|V = ∂̄(ψRχE2∩V )|V = ∂̄(RχE2∩V )|V = R∂̄(χE2∩V )|V .
We conclude immediately that E2 ∩ V is a set of finite perimeter. Therefore, E2 ∩D is a set
of finite perimeter, where D is a domain whose closure is contained compactly in O.
Recalling that µ = ν + µ0 we finish the proof. �
Remark 3.3. In is interesting to note that, as follows from the proof, if µ is the measure
from the statement of the theorem then one of the connected components of supp µ must
contain both roots of the equation z2 + 2µ(C) = 0.
We conclude this section with the following examples of measures µ whose Cauchy trans-
form coincides with z at µ-a.e. point
Examples. 1. Let Ω be an open domain with smooth boundary Γ. Suppose that [−1, 1] ⊂ Ω.
Let {Dj}∞j=1 be smoothly bounded disjoint domains in O := Ω \ [−1, 1], γj = ∂Dj . Assume
H1(γj) <∞ . (14)
LetR(z) be an analytic branch of 2
z2 − 1 inO. Consider the measure ν on Γ∪(∪γj)∪[−1, 1]
defined as
ν = R(z)dz|Γ − R(z)dz|∪γj −
1− x2dx|[−1,1].
10 MARK MELNIKOV, ALEXEI POLTORATSKI, AND ALEXANDER VOLBERG
Cν(z) =
0 if z ∈ C \ Ō ,
0 if z ∈ ∪jDj ,
R(z) if z ∈ O \ ∪jD̄j .
Recall that R(z) = z +
z2 − 1 − (z −
z2 − 1) and that Cµ0(z) = z −
z2 − 1 for µ0 =
1− x2dx|[−1,1]. We conclude that for µ = ν + µ0 one has
Cµ(z) =
z2 − 1 if z ∈ C \ Ō ,
z2 − 1 if z ∈ ∪jDj ,
z2 − 1 if z ∈ O \ ∪jD̄j .
2. The second example is exactly the same as the first one but Dj,k = B(xj,k,
), xj,k =
j , 1 ≤ k ≤ j, j = 1, 2, 3.... Here the assumption (14) fails. But ν, defined as above,
will still be a measure of finite variation (and so will be µ): |ν|(C) ≤ C
In both examples Cµ(z) = z for µ-a.e. z.
4. Asymptotic behavior near the zero-set of Cµ
In this section we take a slightly different approach. We study asymptotic properties of
measures near the sets where the Cauchy transform vanishes. Theorem 4.2 below shows
that near the density points of such sets the measure must display a certain ”irregular”
asymptotic behavior.
As was mentioned in the introduction, one of the results of [14] says that an absolutely
continuous planar measure cannot be reflectionless. This result is not implied by our Theorem
2.1 because an absolutely continuous measure may not have a summable Cauchy maximal
function. It is, however, implied by Theorem 4.2, see Corollary 4.4 below.
When estimating Cauchy integrals one often uses an elementary observation that the
difference of any two Cauchy kernels 1/(z − a) − 1/(z − b) can be estimated as O(|z|−2)
near infinity. To obtain higher order of decay one may consider higher order differences.
Here we will utilize the following estimate of that kind, which can be verified through simple
calculations.
Lemma 4.1. If a, b, c ∈ B(0, r) be different points, |a − b| > r. Then there exist constants
A,B ∈ C such that |A|, |B| < 2
z − a
z − b
z − c
∣∣∣∣ <
outside of B(0, 2r).
(Namely, A = b−c
b−a , B =
a−b .)
If µ ∈M consider one of its Riesz transforms in R3, R1µ(x, y, z), defined as
R1µ(x, y, z) =
|(u, v, 0)− (x, y, z)|3
dµ(u+ iv).
UNIQUENESS THEOREMS FOR CAUCHY INTEGRALS 11
This transform is the planar analogue of the Poisson transform. In particular,
R1µ(x, y, z) =
(x+ iy)
for all points w = x+ iy ∈ C where the Radon derivative
(w) = lim
µ(B(w, r))
|B(w, r)|
exists.
For measures on the line or on the circle their Poisson integrals and Radon derivatives
(with respect to the one-dimensional Lebesgue measure) are very much related but not always
equivalent. When the asymptotics of the Poisson integral and the ratio from the definition
of the Radon derivative are different near a certain point it usually means that the measure
is ”irregular” near that point. It is not difficult to show that if µ is absolutely continuous
then at a Lebesgue point of its density function the Radon derivative of µ and the Poisson
integral of |µ| (or R1|µ| if n > 1) behave equivalently. Even for singular measures on the
circle, if a measure possesses a certain symmetry near a point, then the same equivalent
behavior takes place, as follows for instance from [1], Lemma 4.1. In fact, it is not easy to
construct a measure so that its Poisson integral and Radon derivative behaved differently
near a large set of points. The same can be said about the Riesz transform and the Radon
derivative. Thus one may interpret our next result as an evidence that, for a planar measure
µ, most points where Cµ = 0 are ”irregular.”
Theorem 4.2. Let µ ∈ M and let w = x + iy be a point of density (with respect to m2) of
the set E = {Cµ = 0}. Then
µ(B(w, r))
= o (R1|µ|(x, y, r)) as r → 0 + .
In view of the above discussion this implies
corollary 4.3. If w is a point of density of the set E = {Cµ = 0}, such that there exists the
Radon derivative d|µ|/dm2(w) 6= 0, then
µ(B(w, r)) = o (|µ|(B(w, r))) as r → 0+ (16)
and dµ/dm2(w) = 0.
Since m2-almost every point of a set is its density point, we also obtain the following
version of the result from [14]:
corollary 4.4. The set E = {Cµ = 0} has measure zero with respect to the absolutely
continuous component of µ.
Proof of Theorem 4.2. without loss of generality w = 0. Choose a C∞0 test-function φ sup-
ported in B := B(0, r), and such that 0 ≤ φ ≤ D/r2, |∇φ| ≤ A/r3 and
φ dm2 = 1. Denote
the complement of E by Ec. Then
φdµ = 〈φ, ∂̄Cµ〉 = 〈∂̄φ, Cµ〉 = 〈χEc∂̄φ, Cµ〉 =
χEc ∂̄φ dm2(z)
ζ − z
dµ(ζ) (17)
All we need is to show that the last integral is small. Then, since the first integral in
(17) is similar to the right-hand side of (16) we will complete the proof. The main idea for
12 MARK MELNIKOV, ALEXEI POLTORATSKI, AND ALEXANDER VOLBERG
the rest of the proof is to make the function F (ζ) =
χEc ∂̄φ dm2(z)
ζ−z ”small” by subtracting a
linear combination of Cauchy kernels corresponding to points from E, which will not change
its integral with respect to µ.
Namely, let a, b ∈ B(0, r) ∩ E be any two points such that |a − b| > r. By the previous
lemma for any z ∈ B(0, r) there exist constants A = A(z), B = B(z), of modulus at most 2,
such that (15) holds with c = z. Integrating (15) with respect to χEc ∂̄φ dm2(z) we obtain
that ∣∣∣∣
χEc ∂̄φ dm2(z)
ζ − z
ζ − a
ζ − b
∣∣∣∣ < C
ε(r)r
|ζ |3
outside of B(0, 2r) for some constants A∗, B∗, where ε(r) = |B(0, r)∩Ec|/r2 = o(1) as r → 0.
The constants satisfy |A∗|, |B∗| < 2 ε(r)
Notice that if w ∈ E then
ζ−wdµ = 0 by the definition of the set E. Hence, since
a, b ∈ E,
χEc ∂̄φ dm2(z)
ζ − z
dµ(ζ) =
χEc ∂̄φ dm2(z)
ζ − z
ζ − a
ζ − b
dµ(ζ)
B(0,2r)
C\B(0,2r)
= I1 + I2.
For I2 we now have
C\B(0,2r)
χEc ∂̄φ dm2(z)
ζ − z
ζ − a
ζ − b
dµ(ζ)
C\B(0,2r)
ε(r)r
|ζ |3
d|µ|(ζ) ≤ Cε(r)R1|µ|(0, 0, r).
In I1 we estimate each summand separately. First,
B(0,2r)
χEc ∂̄φ dm2(z)
ζ − z
dµ(ζ)
∣∣∣∣ ≤
B(0,2r)
|ζ − z|
χEcdm2(z)d|µ|(ζ)
|µ|(B(0, 2r)) ≤ C
ε(r)R1|µ|(0, 0, r).
To estimate the second and third summands of I1, recall that the only restriction on the
choice of a, b ∈ B(0, r) ∩ E was that |a − b| > r. This condition will be satisfied, for
instance, if a ∈ B1 = B(−56r,
r) and b ∈ B2 = B(56r,
r). If we average the modulus of the
second summand over all choices of a ∈ B1 ∩ E, recalling that A∗ = A∗(a) always satisfies
|A∗| ≤ 2 ε(r)
, we get
|B1 ∩ E|
B(0,2r)
A∗(a)
ζ − a
dµ(ζ)
∣∣∣∣dm2(a) ≤
|B1 ∩ E|
B(0,2r)
|A∗(a)|
|ζ − a|
dm2(a)d|µ|(ζ)
r|µ|(B(0, 2r)) ≤ Cε(r)R1|µ|(0, 0, r).
It is left to choose a ∈ B1 ∩ E for which the modulus is no greater than its average. The
same can be done for b. The proof is finished. �
UNIQUENESS THEOREMS FOR CAUCHY INTEGRALS 13
5. Reflectionless measures and Combs
As was mentioned in the introduction, following [2], we will call a non-trivial continuous
finite measure µ ∈M(C) reflectionless if Cµ(z) = 0 at µ-a.e. point z.
Perhaps the simplest example of a reflectionless measure is the measure µ = 1
(1−x2)−1/2dx
on [−1, 1], the harmonic measure of C \ [−1, 1] corresponding to infinity. The fact that
µ is reflectionless can be verified through routine calculations or via the conformal map
interpretation of the harmonic measure. It will also follow from a more general Theorem 5.4
below.
At the same time, since Cµ∗ ≍ (1 − x2)−1/2 on [−1, 1], this simple example complements
the statement of Theorem 2.1. Since the function (1−x2)−1/2 belongs to the ”weak” L1(|µ|),
the summability condition for the Cauchy maximal function proves to be exact in its scale.
In the rest of this section we discuss further examples and properties of positive reflec-
tionless measures on the line.
Let us recall that functions holomorphic in the upper half plane C+ and mapping it to
itself (having non-negative imaginary part) are called Nevanlinna functions. Let M+(R)
denote the class of finite positive measures compactly supported on R. The function f is a
Nevanlinna function if and only if it has a form
f(z) = az + b+
t2 + 1
]dρ(t) ,
where ρ is a positive measure on R such that
dρ(t)
< ∞, a > 0, b ∈ R are constants.
If the representing measure is from M+(R) and f(∞) = 0, the formula becomes simpler:
f(z) =
dµ(x)
x−z .
Definition. A simply connected domain O is comb-like if it is a subset of a half-strip
{w : ℑw ∈ (0, π),ℜw > q}, for some q ∈ R, contains another half-strip {w : ℑw ∈
(0, π),ℜw > r} for some r ∈ R and has the property that
for any w0 = u0 + iv0 ∈ O the whole ray {w = u+ iv0, u ≥ u0} lies in O . (18)
If in addition H1(∂O∩B(0, R)) <∞ for all finite R, we say that O is a rectifiable comb-like
domain.
Let O be a rectifiable comb-like domain, Γ = ∂O. Then by the Besicovitch theory we
know that for H1-a.e. pont w ∈ Γ there exists an approximate tangent line to Γ, see [3] for
details. We wish to consider rectifiable comb-like domains satisfying the following geometric
property:
for a.e. w ∈ Γ approximate tangent line is either vertical or horizontal. (19)
It is not difficult to verify that for any conformal map F : C+ → O, O is comblike if and
only if F ′ is a Cauchy potential of µ ∈ M+(R): F ′(z) =
dµ(x)
x−z . It is, therefore, natural to
ask the following
Question. Which comb-like domains correspond to reflectionless measures µ ∈M+(R)?
An answer would give a geometric description of reflectionless measures from M+(R). If,
in addition, a comb-like domain is rectifiable, then the answer is given by
14 MARK MELNIKOV, ALEXEI POLTORATSKI, AND ALEXANDER VOLBERG
Theorem 5.1. 1) Rectifiable comb-like domains correspond exactly to those measures
µ ∈M+(R) that are absolutely continuous with respect to dx and satisfy
dµ(x)
∈ H1loc(C+). (20)
2) An absolutely continuous measure satisfying (20) is reflectionless if and only if the corre-
sponding comb-like domain has the property (19).
Remarks.
1) Of course not every comb-like domain gives rise to a reflectionless measure fromM+(R).
Just take any comb-like domain which appears as F (C+), where F =
∫ z ∫ dµ(x)
x−z for a singular
µ ∈M+(R). By a result from [9] singular measures cannot be reflectionless.
2) On the other hand, even if µ = g(x)dx is a reflectionless absolutely continuous measure,
the corresponding conformal map F =
∫ z ∫ dµ(x)
x−z : C+ → O can be onto a non-rectifiable
domain.
3) For non-rectifiable domains we have no criteria to recognize which ones correspond to
reflectionless measures.
4) It is well known, and not difficult to prove, that the antiderivative of a Nevanlinna
function is a conformal map, see for instance [4]. If F =
∫ z ∫ dµ(x)
x−z , µ ∈ M+(R) then ℑF (x)
is an increasing function on R whose derivative in the sense of distributions is µ. The image
F (C+) lies in the strip {ℑw ∈ (0, π‖µ‖)}.
Theorem 5.1 will follow from Theorems 5.2 and 5.3 below.
Theorem 5.2. Let F be a conformal map of C+ on a rectifiable comb-like domain O. Then
F (z) =
∫ z ∫ dµ(x)
x−z , µ ∈ M+(R), µ << dx. Also
dµ(x)
x−z ∈ H
loc(C+). If in addition O satisfies
(19) then µ is reflectionless.
Proof. without loss of generality O ⊂ {ℜz > 0}. Put Φ = eF . Then the image Φ(O) is
the subdomain of the complement of the unit half-disk in C+ which is the union of rays
(R(θ)eiθ,∞). Consider the subdomain of the upper half-disk D := {z : 1/z ∈ Φ(O)}. Define
G as the smallest open domain containing D and its reflection D := {z̄ : z ∈ D}. Then G is
a star-like domain inside the unit disk. The preimage of G ∩ R under Φ is the union of two
Infinite rays R1 = [−∞, a), R2 = (b,∞], a < b. Therefore, by reflection principle C \ [a, b] is
mapped conformally (by the extension of Φ which we will also denote by Φ) onto star-like
Since Φ : C+ → G, where G is star-like, it is well-known that argΦ(x+ iδ) is an increasing
function of x, see [7].
We conclude that the argument of Φ is monotone. Therefore, ℑF (x + iδ) is monotone,
and so ℑf(x+ iδ) is positive, where f = F ′. We see that f = F ′ is a Nevanlinna function.
From the structure of our comb-like domain, we conclude immediately that its representing
measure µ has compact support, so we are in M+(R). Also, let us prove that µ << dx. The
boundary of our comb is locally rectifiable. So f = F ′ belongs locally to the Hardy class
H1(C+), [16]. Since ℑf is the Poisson integral of µ,
ℑf = Pµ =
(x− t)2 + y2
dµ(t),
UNIQUENESS THEOREMS FOR CAUCHY INTEGRALS 15
and f is in H1(C+) locally, we conclude that µ = ℑfdx,ℑf ≥ 0 a.e., [16].
Now suppose that, in addition, O = F (C+) has the property (19). Let us recall that
for a simply connected domain with rectifiable boundary Γ the restriction of the Hausdorff
measure H1|Γ is equivalent to the harmonic measure ν on O. Therefore the tangent lines
to Γ are either vertical or horizontal a.e. with respect to ν. The measure ν is the image
of the harmonic measure λ of C+ which is equivalent to the Lebesgue measure on the line.
We have a conformal map F (a continuous function up to the boundary of C+ because it
is an anti-derivative of an H1loc-function) which pushes forward λ to ν. Call a point w0 ∈ Γ
accessible from O if there exists a ray x0 + iy, 0 < y < 1, such that w0 = limy→0 F (x0 + iy).
Almost every point of Γ (w.r. to ν) is accessible from O. For ν-a.e. accessible w0 ∈ Γ
where the tangent line is vertical (horizontal) we can say that ℜF ′(x0) = 0 (ℑF ′(x) = 0).
So R = E1 ∪ E2 ∪ E3, where |E3| = 0, |E1 ∩ E2| = 0, and E1 = {x ∈ R : ℜF ′(x) = 0},
E2 = {x ∈ R : ℑF ′(x) = 0}. We already know that the measure µ = ℑF ′(x)dx represents
f(z) = F ′(z) =
dµ(t)
t−z . Notice that
R\E2 · =
·. But we also know that boundary
values exist dx-almost everywhere, i.e.
dµ(t)
t− x− iy
= ℜF ′(x) = 0
for a.e. x ∈ E1 and therefore for µ-a.e. x ∈ E1. This means (see [16]) that
dµ(x)
= 0 µ-a.e.
Definition. A simply connected rectifiable comb-like domain O is called a comb if its “left”
boundary consists of countably many horizontal and vertical segments.
A comb is called a straight comb if O = {w : ℑw ∈ (0, π),ℜw > 0} \ S, where the set S
is relatively closed with respect to the strip {w : ℑw ∈ (0, π),ℜw > 0} and is the union of
countably many horizontal intervals Rn = (iyn, ln + iyn]. We require also that
ln <∞ .
Example. Let F be a conformal map of C+ on a comb O. By our last theorem F ′(z) =∫
dµ(x)
x−z , where µ ∈M+(R) is reflectionless: C
µ(x) = 0 for µ-a.e. x.
Definition. Let E be a compact subset of the real line. Let E have positive logarithmic
capacity, so Green’s function G of C \E exists. The domain C \E is called Widom domain
G(c) <∞ ,
where the summation goes over all critical points of G (we assume that G is a Green’s
function with pole at infinity.
Example. Let E be a compact subset of the real line of the positive length. We assume
that every point of E is regular in the sense of Dirichlet for the domain C \ E, and we
also assume that C \ E is not a Widom domain. Such E exist in abundance. We will
see below, that the harmonic measure ω of C \ E (with pole at infinity) is reflectionless.
Consider F (z) =
∫ z ∫ dω(x)
z−x for z ∈ C+. It is easy to see that F (z) = G(z) + iG̃(z) + const,
16 MARK MELNIKOV, ALEXEI POLTORATSKI, AND ALEXANDER VOLBERG
where G̃ is the harmonic conjugate of G. This F is a conformal map (see [4]) of C= onto
a domain D lying in the strip {w : ℑw ∈ (0, π)}. It is easy to see that complementary
intervals of E will be mapped by F onto straight horizontal segments on the boundary of
D. Each finite complementary interval contains exactly one critical point of G, and clearly
the length of the corresponding straight horizontal segment is G(c) (this follows from the
formula F (z) = G(z) + iG̃(z) + const).
As the domain C \ E was not a Widom domain, we have that the sum of lengths of
abovementioned straight horizontal segment is infinite. So domain D is not rectifiable.
Therefore the reflectionless property of µ alone does not say anything about the rectifiability
of the domain, which is the target domain of the conformal map F (z) =
∫ z ∫ dµ(x)
z−x .
Theorem 5.3. Let µ be absolutely continuous positive measure on R and let Cµ ∈ H1loc(C+).
Then F (z) =
∫ z ∫ dµ(x)
x−z is a conformal map of C+ onto a rectifiable comb-like domain O. If
µ is reflectionless then O has the property (19).
Proof. Consider F (z) =
∫ z ∫ dµ(x)
x−z . Since µ is positive, it is a conformal map. If µ is such
that f(z) = Cµ ∈ H1loc(C+) then F (z) =
f maps C+ onto a domain with locally rectifiable
boundary (see [16]).
If, in addition, µ = ℑfdx is reflectionless, then for a.e. point of P := {x ∈ R : ℑf(x) > 0}
we have ℜf(x) = 0. Conformal map F (z) is continuous up to the boundary of C+ and its
boundary values F (x) form a (locally) absolutely continuous function, F ′(x) = f(x) a.e. As
at almost every point we have either ℑF ′(x) = 0 or ℜF ′(x) = 0 we conclude that O = F (C+)
has the property (19).
We also need the following definition.
Definition. A compact subset E in R is called homogeneous if there exist r, δ > 0 such that
for all x ∈ E, |E ∩ (x− h, x+ h)| ≥ δh for all h ∈ (0, r).
Example. Let E ⊂ R be a compact set of positive length. Let µ be a reflectionless
measure supported on E, µ = g(x)dx. Let in addition E be a homogeneous set. Then
F (z) =
∫ z ∫ dµ(x)
x−z is a conformal map from C+ on a rectifiable comb-like domain satisfying
(19).
Proof. The Cauchy integral Cgdx considered in C\E will be in the Hardy class H1(C\E). In
fact the reflectionless property of gdx implies that its limits from C± will be both integrable
with respect to dx|E .
Now we use homogenuity of E and Zinsmeister’s theorem [15] to conclude that f(z) =
Cgdx(z) is in the usual H1loc(C). Then the conformal map F (z) =
f maps C+ onto a
rectifiable subdomain of a strip. We use Theorem 5.3 to get the rest of our example’s
claims. �
The simple example of a reflectionless measure mentioned at the beginning of this section,
as well as many other explicit examples, are given by our next statement.
UNIQUENESS THEOREMS FOR CAUCHY INTEGRALS 17
Theorem 5.4. Let E be a compact set of positive lenght, E ⊂ R. Let ω be a harmonic
measure of C \ E with pole at infinity. Then ω is reflectionless.
Example. The simplest comb is a strip {w : ℑw ∈ (0, π),ℜw > 0}. Consider F (z) =
log(z +
z2 − 1). It maps conformally C+ onto the strip. Its derivative f(z) = 1√z2−1 is
x−z and dµ =
1−x2 is the harmonic measure of C \ [−1, 1].
Proof of Theorem 5.4. We need to show that Cω = 0 at ω-a.e point. From our definitions
it can be seen, that Cω on the line coincides with the Hilbert transform of ω, which in its
turn is asymptotically equivalent to the conjugate Poisson transform Qω. Thus all we need
to establish is that
Qω(x+ ih) =
(x− y)2 + h2
dω(y) = ℜ
dω(y)
x− ih− y
→ 0 as h→ 0+ (21)
for almost every x. Instead, we have that the Green’s function F (x) defined as
F (x) =
log |x− y|dω(y) + C∞,
where C∞ is a real constant (Robin’s constant), is equal to 0 at every density point of
E, see for example [8]. The idea of the proof is to show that Qω(x + iε) behaves like
(F (x+ ε) + F (x− ε))/ε near almost every x. The technical details are as follows.
Introduce
φ(y) :=
|1− y|
|1 + y|
y2 + 1
, (22)
φx,h(y) :=
y − x
The function φ(y) decreases as 1/y2 at infinity, hence it is in L1(R, dx) and so are φx,h(y)
with a uniform bound on the norm. However, these functions are not bounded, which makes
it difficult to use them in our estimates. To finish the proof we will first obtain a bounded
version of φx,h(y) through the following averaging procedure.
Let ω = g(x)dx. Choose x to be a Lebesgue point of g and a density point of E. Fixing
sufficiently small h > 0 we can find the set A(x, h) ⊂ (x−h, x−h/2)∪ (x+h/2, x+h) such
• A(x, h) consists of density points of E,
• |A(x, h)| ≥ h/2,
• A(x, h) is symmetric with respect to x.
Let Tx,h := T := {t ∈ (0, h) : x+ t ∈ A(x, h)}. Then |T | ≥ h/4. Now put
ψx,h(y) :=
φx,t(y) dt .
By (22) one can see immediately that
|ψx,h| ≤
for some M > 0 and |ψx,h(y)| ≤ C
, for |y| > h . (23)
18 MARK MELNIKOV, ALEXEI POLTORATSKI, AND ALEXANDER VOLBERG
Also, since ∫
φ dy = 0 .
we have that ∫
ψx,h dy = 0 .
Therefore,
g(y)ψx,h(y) dy| = |
(g(y)− g(x))ψx,h(y) dy| ≤
|g(y)− g(x)||ψx,h|(y) dy.
Now notice that (23) implies that |ψx,h| is majorated by an approximate unity (for instance,
by a constant multiple of the Poisson kernel corresponding to z = x + ih). Since x is a
Lebesgue point for g(x), this means that the last integral tends to 0 as h→ 0.
Looking at the definitions of Tx,h and ψx,h(y) we can see that
g(y)ψx,h(y) dy =
|Tx,h|
(F (x+ t)− F (x− t))−ℜ
g(y)dy
x− it− y
where F (x) is the Green’s function. As we mentioned before, F is zero at the density points
of E. We conclude that
|Tx,h|
g(y)dy
x− it− y
→ 0, h→ 0 + .
for a.e. x on the Borel support of g. Since the Cauchy integral of g has a limit a.e. we obtain
g(y)dy
x− ih− y
→ 0, h→ 0 + .
Remark. All reflectionless measures on R discussed in this section, including those provided
by Theorem 5.4 are absolutely continuous with respect to Lebesgue measure. One may
wonder if there exist singular reflectionless measures. The answer is negative. More generally,
as follows from a theorem from [9], if principal values of the Hilbert transform exist µ-a.e.
for a continuous µ ∈M(R) then µ << dx .
References
[1] A. B. Alexandrov, J. M. Anderson, A. Nicolau. Inner functions, Bloch spaces and symmetric measures,
Proc. London Math. Soc. (3) 79 (1999), no. 2, 318–352.
[2] E.D. Belokolos, A.I. Bobenko, V.Z. Enol’skii, A.R. Its and V.B. Matveev, Algebro-Geometric Approach
to Nonlinear Integrable Equations, Springer, Berlin (1994).
[3] P. Mattila, Geometry of sets and measures in Euclidean spaces. Fractals and rectifiability. Cambridge
Studies in Advanced Mathematics, 44. Cambridge University Press, Cambridge, 1995. xii+343 pp.
[4] P. L. Duren, Univalent Functions, Grundlehren der Mathematischen Wissenschaften [Fundamental Prin-
ciples of Mathematical Sciences], 259. Springer-Verlag, New York, 1983.
[5] L.C. Evans, R. Gariepy, Measure Theory and Fine Properties of Functions, Studies in Advanced Math-
ematics. CRC Press, Boca Raton, FL, 1992.
[6] T. W. Gamelin, Uniform Algebras, Prentice-Hall, Inc., Englewood Cliffs, N. J., 1969.
UNIQUENESS THEOREMS FOR CAUCHY INTEGRALS 19
[7] G. M. Golusin, Geometric theory of functions. Hochschulbcher fr Mathematik, Bd. 31. VEB Deutscher
Verlag der Wissenschaften, Berlin, 1957. xii+438 pp. 30.0X
[8] W. K. Hayman, P. B. Kennedy, Subharmonic Functions, vol. 1, Academic Press, London-New York,
1976.
[9] P. Jones, A. Poltoratski, Asymptotic growth of Cauchy transforms, Ann. Acad. Sci. Fenn. Math, 2004
[10] M. Krein, A. Nudelman The Markov moment problem and extremal problems. Ideas and problems of P.
L. Čebyšev and A. A. Markov and their further development. Translated from the Russian by D. Louvish.
Translations of Mathematical Monographs, Vol. 50. American Mathematical Society, Providence, R.I.,
1977. v+417 pp.
[11] M. Melnikov, J. Verdera, A geometric proof of the L2 boundedness of the Cauchy integral on Lipschitz
graphs, Internat. Math. Res. Notices 1995, no. 7, 325–331.
[12] F. Nazarov, S. Treil, A. Volberg , Cauchy integral and Calder-Zygmund operators on nonhomogeneous
spaces, Int. Math. Res. Not. 15 (1997) 703726.
[13] X. Tolsa, L2 -boundedness of the Cauchy integral operator for continuous measures, Duke Math. J. 98
(1999), 269-304.
[14] X. Tolsa, J. Verdera, May the Cauchy transform of a non-trivial finite measure vanish on the support
of the measure? Ann. Acad. Sci. Fenn. Math. 31 (2006), no. 2, 479–494.
[15] M. Zinsmeister, Espaces de Hardy et domaines de Denjoy. (French) [Hardy spaces and Denjoy domains]
Ark. Mat. 27 (1989), no. 2, 363–378.
[16] I. Privalov, Boundary properties of analytic functions, Gosudarstv. Izdat. Tehn.-Teor. Lit., Moscow-
Leningrad, 1950. 336 pp.
Department de Matematiques, Uneversitat Autonoma de Barcelona, 08193 Bellaterra,
Barcelona, Spain
E-mail address : mark.melnikov@gmail.com
Texas A& M University, Department of Mathematics, College Station, TX 77843, USA
E-mail address : alexeip@math.tamu.edu
Dept. Math., Michigan State Univ., East Lansing MI 48823, USA, and, School of Math.,
University of Edinburgh, Edinburgh UK EH9 EJ6
E-mail address : sashavolberg@yahoo.com
ABSTRACT
  If $\mu$ is a finite complex measure in the complex plane $\C$ we denote by
$C^\mu$ its Cauchy integral defined in the sense of principal value. The
measure $\mu$ is called reflectionless if it is continuous (has no atoms) and
$C^\mu=0$ at $\mu$-almost every point. We show that if $\mu$ is reflectionless
and its Cauchy maximal function $C^\mu_*$ is summable with respect to $|\mu|$
then $\mu$ is trivial. An example of a reflectionless measure whose maximal
function belongs to the "weak" $L^1$ is also constructed, proving that the
above result is sharp in its scale. We also give a partial geometric
description of the set of reflectionless measures on the line and discuss
connections of our results with the notion of sets of finite perimeter in the
sense of De Giorgi.

<|endoftext|><|startoftext|>
Introduction
Let Σnk,d ⊂ P(H0(P2,OP2(n))) := PN , with N =
n(n+3)
, be the closure, in the
Zariski’s topology, of the locally closed set of reduced and irreducible plane curves
of degree n with k cusps and d nodes. Let Σ ⊂ Σnk,d be an irreducible component
of the variety Σnk,d. Denoting by Mg the moduli space of smooth curves of genus
− k − d, it is naturally defined a rational map
ΠΣ : Σ 99K Mg,
sending the general point [Γ] ∈ Σ0 to the isomorphism class of the normalization
of the curve Γ ⊂ P2 corresponding to [Γ]. We say that ΠΣ is the moduli map of Σ
and we set
number of moduli of Σ := dim(ΠΣ(Σ)).
We say that Σ has general moduli if ΠΣ is dominant. Otherwise, we say that Σ has
special moduli or that Σ has finite number of moduli. By lemma 2.2 of [4], we know
that the dimension of the general fibre of ΠΣ is at least equal to
max(8, 8 + ρ− k),
where ρ := ρ(2, g, n) = 3n− 2g− 6 is the number of Brill-Noether of linear series of
degree n and dimension 2 on a smooth curve of genus g. It follows that, if Σ has
the expected dimension equal to 3n+ g − 1− k and g ≥ 2, then
(1) dim(ΠΣ(Σ)) ≤ min(dim(Mg), dim(Mg) + ρ− k).
Definition 1.1. We say that Σ has the expected number of moduli if equality holds
in (1).
In particular, we expect that, if ρ − k ≤ 0, then on the normalization curve C
of the curve Γ ⊂ P2 corresponding to the general point [Γ] ∈ Σ, there exists only
a finite number of linear series of degree n and dimension 2 mapping C to a plane
curve with nodes and k cusps as singularities and corresponding to a point of Σ,
(see the proof of lemma 2.2 of [4]). For a deeper discussion and a list of known
results about the moduli problem of Σnk,d we refer to sections 1 and 2 of [4] and
related references. In particular, in [4] we have found sufficient conditions in order
Key words and phrases. number of moduli, sextics with six cusps, plane curves, Zariski pairs.
http://arxiv.org/abs/0704.0622v1
2 CONCETTINA GALATI
that an irreducible component Σ of Σnk,d has finite and expected number of moduli.
If Σ verifies these conditions then ρ(2, n, g) ≤ 0. Finally in [4] we constructed
examples of families of plane curves with nodes and cusps with finite and expected
number of moduli. In this paper we consider the particular case of the variety Σ66,0
of irreducible sextics with six cusps.
It was proved by Zariski (see [8]) that Σ66,0 has at least two irreducible com-
ponents. One of them is the parameter space Σ1 of the family of plane curves of
equation
f32 (x0, x1, x2) + f
3 (x0, x1, x2) = 0,
where f2 and f3 are homogeneous polynomials of degree two and three respectively.
The general point of Σ1 corresponds to an irreducible sextic with six cusps on a
conic as singularities. Moreover, Σ66,0 contains at least one irreducible component
Σ2 whose general element corresponds to a sextic with six cusps not on a conic as
singularities and containing in its closure the variety Σ69,0 of elliptic sextics with nine
cusps. Recently, A. Degtyarev has proved that Σ1 and Σ2 are the unique irreducible
components of Σ66,0, (see [1]).
The moduli number of Σ1 and Σ2 can not be calculated by using the result of
[4]. Indeed, in this case ρ(2, 4, 6) = 4 > 0 and then the general element of every
irreducible component of Σ66,0 does not verify the hypotheses of proposition 4.1 of [4].
On the contrary, it is easy to verify that, if Γ ⊂ P2 is the plane curve corresponding
to the general element of one of the irreducible components of Σ66,0 and C is the
normalization curve of Γ, then the map µo,C is injective. But, in contrast with the
nodal case, this information is not useful in order to study the moduli problem of
Σnk,d, (see [6] and remark 4.2 of [4]). In the proposition 2.2 and corollary 2.4, we
prove that Σ2 has the expected number of moduli equal to seven. Moreover, we
show that there exists a stratification
Σ69,0 ⊂ Σ′ ⊂ Σ̃ ⊂ Σ2,
where Σ′ and Σ̃ are respectively irreducible components of Σ68,0 and Σ
7,0 with ex-
pected number of moduli. Finally, in the corollary 2.8, we prove that also Σ1 has
the expected number of moduli by using that every element of Σ1 is the branch
locus of a triple plane.
2. On the number of moduli of complete irreducible families of plane
sextics with six cusps
First of all we want to find sufficient conditions in order that, if an irreducible
component Σ of Σnk,d has the expected number of moduli, then every irreducible
component Σ′ of Σnk′,d′ , containing Σ, has the expected number of moduli. In the
corollary 4.7 of [4] we considered this problem under the hypothesis that Σ has the
expected dimension and ρ(2, n, g) ≤ 0. Now we are interested to the case ρ > 0.
We need the following local result.
= {(a, b, x, y)| y2 = x3 + ax+ b} ⊂ C2 × A2
be the versal deformation family of an ordinary cusp (see [3] for the definition and
properties of the versal deformation family of a plane singularity). We recall that
the general curve of this family is smooth. The locus ∆ of C2 of the pairs (a, b) such
that the corresponding curve is singular, has equation 27b2 = 4a3. For (a, b) ∈ ∆
and (a, b) 6= (0, 0), the corresponding curve has a node and no other singularities,
whereas (0, 0) is the only point parametrizing a cuspidal curve.
ON THE NUMBER OF MODULI OF PLANE SEXTICS WITH SIX CUSPS 3
Lemma 2.1 ([3], page 129.). Let G → C2 be a two parameter family of curves
of genus g ≥ 2, whose general fibre is stable and which is locally given by y2 =
x3 + ax + b, with (a, b) ∈ C2 and let D ⊂ C2 be a curve passing through (0, 0)
and not tangent to the axis b = 0 at (0, 0). Then the j-invariant of the elliptic
tail of the curve which corresponds to the stable limit of G(0,0), with respect to the
curve D, doesn’t depend on D. Otherwise, for every j0 ∈ C, there exists a curve
Dj0 ⊂ C2 passing through (0, 0) and tangent to the axis b = 0 at this point, such that
the elliptic tail of the stable reduction of G(0,0) with respect to Dj0 , has j-invariant
equal to j0.
Proposition 2.2. Let Σ ⊂ Σnk,d, with k < 3n, be an irreducible component of
Σnk,d. Let g be the geometric genus of the plane curve corresponding to the general
element of Σ. Suppose that g ≥ 2, ρ(2, g, n)− k ≤ 0 and Σ has the expected number
of moduli equal to 3g − 3 + ρ− k. Then, every irreducible component Σ′ of Σnk′,d′ ,
with k′ = k− 1 and d = d′ or k = k′ and d′ = d− 1, such that Σ ⊂ Σ′, has expected
number of moduli.
Proof. First we consider the case k′ = k − 1 and d = d′. Let q1, . . . , qk be the
cusps of Γ. It is well known that, since k < 3n then [Γ] ∈ Σnk−1,d. In particular,
for every fixed cusp qi of Γ there exists an irreducible analytic branch Si of Σnk−1,d
passing through the point [Γ] and whose general point corresponds to a plane curve
Γ′ of degree n with d nodes and k − 1 cusps specializing to the singular points
of Γ different from qi, as Γ
′ specializes to Γ. Moreover, it is possible to prove
that every Si is smooth at the point [Γ], see [7] or chapter 2 of [5]. Let Σ′ be
one of the irreducible components of Σnk−1,d containing Σ. Notice that the general
element of Σ′ corresponds to a curves of genus g′ = g + 1. Since ρ(2, g′, n) − k′ =
3n− 2g − 2− 6− k + 1 = ρ(2, g, n)− k − 1 < 0, in order to prove the theorem it is
enough to show that the general fibre of the moduli map
ΠΣ′ : Σ
99K Mg+1
has dimension equal to eight. Let us notice that the map ΠΣ′ is not defined at the
general element [Γ] of Σ. More precisely, let γ ⊂ Si ⊂ Σ′ be a curve passing through
[Γ] and not contained in Σ. Let C → γ be the tautological family of plane curves
parametrized by γ. Let C′ → γ be the family obtained from C → γ by normalizing
the total space. The general fibre of C′ → γ is a smooth curve of genus g+1, while
the special fibre C′0 := Γ′ is the partial normalization of Γ obtained by smoothing
all the singular points of Γ, except the marked cusp qi. If we restrict the moduli
map ΠΣ′ to γ, we get a regular map which associates to [Γ] the point corresponding
to the stable reduction of Γ with respect to the family C′ → γ, which is the union
of the normalization curve C of Γ and an elliptic curve, intersecting at the point
q ∈ C which maps to the cusp qi ∈ Γ. Now, let G ⊂ Σ′ × Mg+1 be the graph
of ΠΣ′ , let π1 : G → Σ′ and π2 : G → Mg+1 be the natural projections and let
U ⊂ Σ be the open set parametrizing curves of degree n and genus g with exactly
k cusps and d nodes as singularities. From what we observed before, if we denote
by ΠΣ′(Σ) the Zariski closure in Mg+1 of π2π−11 (U), then ΠΣ′(Σ) is contained in
the divisor ∆1 ⊂ Mg+1, whose points are isomorphism classes of reducible curves
which are union of a smooth curve of genus g and an elliptic curve, meeting at a
point. Denoting by ΠΣ : Σ → Mg the moduli map of Σ, the rational map
∆1 99K Mg
which forgets the elliptic tail, restricts to a rational dominant map
q : ΠΣ′(Σ) 99K ΠΣ(Σ).
The dimension of the general fibre of q is at most two. Since, by hypothesis, the
dimension of the fibre of the moduli map ΠΣ is eight, there exists only a finite
4 CONCETTINA GALATI
number of g2n on C, ramified at k points, which maps C to a plane curve D such
that the associated point [D] ∈ P
n(n+3)
2 belongs to Σ. In particular, the set of points
x of C such that there is a g2n with k simple ramification points, one of which at
x, is finite. So, the dimension of the general fibre of q is at most one. In order to
decide if the general fibre of q has dimension zero or one, we have to understand
how the j-invariant of the elliptic tail of the stable reduction of Γ′ with respect the
family C′ → γ, depends on γ. If C → C2 is the étale versal deformation family of
the cusp. By versality, for every fixed cusp pi of Γ, there exist étale neighborhoods
n(n+3)
2 of [Γ] in P
n(n+3)
2 , V
v→ C2 of (0, 0) in C2 and Ui of pi in the tautological
family U → P
n(n+3)
2 with a morphism f : U → V such that the family Ui → U is
the pullback, with respect to f , of the restriction to V of the versal family. By the
properties of the étale versal deformation family of a plane singularity, (see [2]), we
have that f−1((0, 0)) is an étale neighborhood of [Γ] in the (smooth) analytic branch
Σn1,0 whose general element corresponds to an irreducible plane curves with only one
cusp at a neighborhood of the cusp qi of Γ. So, dim(f
−1((0, 0))) =
n(n+3)
− 2 and
the map f is surjective. Moreover, if g is the restriction of f at u−1(Σ′), then also
g is surjective. Indeed,
g−1((0, 0)) = f−1((0, 0)) ∩ u−1(Σ′) = u−1((Σ))
and, since k < 3n, then dim(Σ) = 3n + g − 1 − k = dim(Σ′) − 2 and g is sur-
jective. By using lemma 2.1, it follows that the general fibre of the natural map
ΠΣ′(Σ) → ΠΣ(Σ) has dimension exactly equal to one. Therefore, dim(ΠΣ′ (Σ)) =
dim(ΠΣ(Σ)) + 1 = 3g − 3 + ρ(2, g, n)− k + 1 = 3(g + 1)− 3 + ρ(2, g + 1, n)− k By
using that
dim(ΠΣ′ (Σ
′)) ≥ dim(ΠΣ′(Σ)) + 1 = 3(g + 1)− 3 + ρ(2, g + 1, n)− k + 1.
and by recalling that, by lemma 2.2 of [4], it is always true that dim(ΠΣ′(Σ
′)) ≤
3(g + 1)− 3 + ρ(2, g + 1, n)− k + 1, the statement is proved in the case k′ = k − 1
and d′ = d.
Suppose, now, that k = k′ and d′ = d − 1. Also in this case Σ is not contained
in the regularity domain of ΠΣ′ . More precisely, if [Γ] ∈ Σ is general, then ΠΣ′([Γ])
consists of a finite number of points, corresponding to the isomorphism classes
of the partial normalizations of Γ obtained by smoothing all the singular points
of Γ, except for a node. Then ΠΣ′(Σ) is contained in the divisor ∆0 of Mg+1
parametrizing the isomorphism classes of the analytic curves of arithmetic genus
g + 1 with a node and no more singularities. The natural map ∆0 99K Mg sending
the general point [C′] of ∆0 to the isomorphism class of the normalization of C
restricts to a rational dominant map q : ΠΣ′(Σ) 99K ΠΣ(Σ). Since we suppose that
Σ has the expected number of moduli and ρ(2, g, n)−k ≤ 0, if C is the normalization
of the plane curve corresponding to the general element of Σ, then the set S of the
linear series of dimension 2 and degree n on C with k simple ramification points,
mapping C to a plane curve D such that the associated point [D] in the Hilbert
Scheme belongs to Σ, is finite. We deduce that also the set S′ of the pairs of points
(p1, p2) of C, such that there is a g
n ∈ S such that the associated morphism maps
p1 and p2 to the same point of the plane, is finite. So, also q
−1([C]) is finite and
dim(ΠΣ′ (Σ)) = dim(ΠΣ(Σ)). It follows that
dim(ΠΣ′ (Σ
′)) ≥ 3g− 3+ 3n− 2g− 6− k+1 = 3(g+1)− 3+ 3n− 2(g+1)− 6− k.
Remark 2.3. Notice that, the arguments used before to prove lemma 2.2 don’t
work if the dimension of the general fibre of the moduli map of Σ has dimension
bigger than eight. Indeed, in this case, the dimension of the general fibre of the map
ON THE NUMBER OF MODULI OF PLANE SEXTICS WITH SIX CUSPS 5
ΠΣ′(Σ) 99K ΠΣ(Σ) could be bigger than one if k
′ = k − 1 and d = d′, or than zero
if k′ = k and d = d′ − 1.
Corollary 2.4. There exists at least one irreducible component Σ2 of Σ
6,0 having
the expected number of moduli equal to dim(M4) − 2 and whose general element
corresponds to a sextic with six cusps not on a conic.
Remark 2.5. As we already observed in the previous section, Σ2 is the only com-
ponent of Σ66,0 parametrizing sextics with six cusps not on a conic by [1].
Proof. Let Σ69,0 be the variety of elliptic plane curves of degree six with nine cusps
and no more singularities. It is not empty and irreducible, because, by the Plücker
formulas, the family of dual curves is Σ30,0 ≃ P9, which is irreducible and not empty.
Moreover, if we compose an holomorphic map φ : C → P2 from a complex torus C
to a smooth plane cubic with the natural map φ(C) → φ(C)∗, where we denoted by
the dual curve of φ(C), we get a morphism from C to a plane sextic with nine
cusps. Therefore, the number of moduli of Σ69,0 is equal of the number of moduli of
Σ30,0, equal to one. Since 6 < 3n = 18, there is at least one irreducible component
Σ′ of Σ68,0 containing Σ
9,0. Let ΠΣ′ : Σ
99K M2 be the moduli map of Σ′ and let
G ⊂ Σ′ × M2 be its graph. If we denote by π1 : G → Σ′ and π2 : G → M2 the
natural projection, by U the open set of Σ69,0 parametrizing cubics of genus one
with nine cusps and by ΠΣ′(Σ
9,0) the Zariski closure in M2 of π2π−11 (U), then, by
arguing as in the first part of the proof of the lemma 2.2, we have a dominant map
ΠΣ′(Σ
9,0) 99K M1, whose general fibre has dimension one. We conclude that
dim(πΣ′ (Σ
′)) ≥ dim(πΣ′(Σ69,0)) + 1 = 3
and so, the moduli map of Σ′ is dominant, as one expects, because ρ(2, 2, 6)− 8 =
18− 4− 6− 8 = 0. Let D be the plane sextic corresponding to the general point of
Σ′. By Bezout theorem, the height cusps P1, . . . , P8 of D don’t belong to a conic
and, however we choose five cusps of D, no four of them lie on a line. Then, let
C2 be the unique conic containing P1, . . . , P5. There exists at least a cusp, say P6,
which does not belong to C2. Since 8 < 3n = 18, there exists a family of plane
sextics D → ∆, whose special fibre is D and whose general fibre has a cusp at
a neighborhood of every cusp of D different from P7 and no further singularities.
By lemma 2.2, the curve ∆ is contained in an irreducible component of Σ67,0 with
expected number of moduli. By repeating the same argument for the general fibre
of the family D → ∆ we get an irreducible component Σ2 of Σ66,0 with the expected
number of moduli and whose general element parametrizes a sextic with six cusps
not on a conic. �
Now we consider the irreducible component Σ1 of Σ
6,0 parametrizing plane curves
of equation f23 (x0, x1, x2) + f
2 (x0, x1, x2) = 0, where f2 is an homogeneous poly-
nomial of degree two and f3 is an homogeneous polynomial of degree three. The
general element of Σ1 corresponds to an irreducible plane curve of degree six with
six cusps on a conic. We want to show that Σ1 has the expected number of moduli
equal to 12− 3 + ρ(2, 4, 6)− 6 = 7 = dim(M4)− 2. Equivalently, we want to show
that the general fibre of the moduli map
Σ1 99K M4
has dimension equal to eight.
Lemma 2.6. Let Γ2 : f2(x0, x1, x2) = 0 and Γ3 : f3(x0, x1, x2) = 0 be a smooth
conic and a smooth cubic intersecting transversally. Then, the plane curve
Γ : f23 (x0, x1, x2)− f32 (x0, x1, x2) = 0
is an irreducible sextic of genus four with six cusps at the intersection points of Γ2
and Γ3 as singularities. The curve Γ is projection of a canonical curve C ⊂ P3 from
6 CONCETTINA GALATI
a point p ∈ P3 which is contained in six tangent lines to C. Moreover, for every
point q ∈ P3 −C such that the projection plane curve πq(C) of C from q is a sextic
with six cusps on a conic of equation g23(x0, x1, x2) − g32(x0, x1, x2) = 0, where g3
and g2 are two homogeneous polynomials of degree three and two respectively, there
exists a cubic surface S3 ∈ |IC|P3(3)|, containing C, such that the plane curve πq(C)
is the branch locus of the projection πq : S3 → P2.
Remark 2.7. Notice that, by [1], every irreducible sextic with six cusps on a conic
as singularities has equation given by (f2(x0, x1, x2))
3 + (f3(x0, x1, x2))
2 = 0, with
f2 and f3 homogeneous polynomials of degree two and three. In order words, all the
sextics with six cusps on a conic as singularities are parametrized by points of Σ1.
An other proof of this result as been provided to us by G. Pareschi.
Proof of lemma 2.6. Let f(x0, x1, x2) = f
3 (x0, x1, x2) − f32 (x0, x1, x2) = 0 be the
equation of Γ. From the relation f3(x) = ±f2(x)
f2(x), we deduce that
(x) =
±2∂f2
f2(x) and hence
(x) = 2
(x)f3(x)− 3f2(x)2
(x) = −f2(x)2
By using that the conic Γ2 : f2 = 0 is smooth, it follows that, if a point x ∈ Γ is
singular, then x ∈ Γ2 and hence x ∈ Γ3∩Γ2. On the other hand, always from (2), if
x ∈ Γ2∩Γ3, then x is a singular point of Γ. Hence, the singular locus of Γ coincides
with Γ3 ∩ Γ2. Let x be a singular point of Γ. If
p1(x, y) + terms of degree two = 0
q1(x, y) + terms of degree ≥ two = 0
are respectively affine equations of Γ2 and Γ3 at x, then, the affine equation of Γ at
x is given by
q1(x, y)
2 − p1(x, y)3 + terms of degree ≥ four = 0.
Since Γ2 and Γ3 intersect transversally, we have that q1(x, y) does not divide p1(x, y)
and hence Γ has an ordinary cusp at x. Let now φ : C → Γ be the normalization of
Γ. We recall that the cubics passing through the six cusps of Γ cut out on C the
complete canonical series |ωC |. Since the cusps of Γ is contained in the conic Γ2 ⊂ P2
of equation f2 = 0, the lines of P
2 cut out on C a subseries g ⊂ |ωC | of dimension
two of the canonical series. Moreover, if we still denote by C a canonical model of
C in P3, then the linear series g is cut out on C in P3 from a two dimensional family
of hyperplanes passing through a point p ∈ P3−C. If we project C from p we get a
plane curve projectively equivalent to Γ. Since Γ has six cusps as singularities, we
deduce that there are six tangent lines to C passing through p. To see that Γ is the
branch locus of a triple plane, let S3 ⊂ P3 be the cubic surface of equation
F3(x0, . . . , x3) = x
3 − 3f2(x0, x1, x2)x3 + 2f3(x0, x1, x2) = 0.
If p = [0, 0, 0, 1], then, by using Implicit Function Theorem, the ramification locus
of the projection πp : S3 → P2, is given by the intersection of S3 with the quadric
S2 of equation
= x23 − f2(x0, x1, x2) = 0. Now, if x = [x0, x1, x2] ∈ S3 ∩ S2,
then x3 = ±
f2(x0, x1, x2). By substituting in the equation of S3, we find that the
branch locus of the projection πp : S3 → P2 coincides with the plane curve Γ. From
what we proved before, it follows that the ramification locus of the projection map
πp : S3 → P2 is the normalization curve C of Γ. Finally, if q ∈ P3 − C is an other
point such that the plane projection πq(C) is an irreducible sextic with six cusps on a
conic parametrized by a point xq ∈ Σ1 ⊂ P27, then, up to projective motion, we may
always assume that q = [0 : 0 : 0 : 1] and hence, if g23(x0, x1, x2)− g32(x0, x1, x2) = 0
ON THE NUMBER OF MODULI OF PLANE SEXTICS WITH SIX CUSPS 7
is the equation of the plane curve πq(C), then C is the locus of ramification of the
projection from q to the plane of the cubic surface of equation
x33 − 3g2(x0, x1, x2)x3 + 2g3(x0, x1, x2) = 0.
Corollary 2.8. The irreducible component Σ1 of Σ
6,0 parametrizing plane curves of
equation f23 (x0, x1, x2)+ f
2 (x0, x1, x2) = 0, where f2 is an homogeneous polynomial
of degree two and f3 is an homogeneous polynomial of degree three, has the expected
number of moduli equal to 7 = dim(M4) + ρ(2, 4, 6)− 6.
Proof. Let [Γ] ⊂ P2 be a plane sextic of equation f23 (x0, x1, x2) − f32 (x0, x1, x2) =
0, where the conic f2 = 0 and the cubic f3 = 0 are smooth and they intersect
transversally. Let C ⊂ P3 be the normalization curve of Γ and let SC be the set of
points v = [v0 : · · · : v3] ∈ P3 such that there exists a cubic surface S3 ∈ |IC|P3(3)|,
containing C, such that the curve C is the ramification locus of the projection πv :
S3 → P2. By the former lemma, in order to prove that Σ1 has the expected number
of moduli, it is enough to find a point [Γ] of Σ1 corresponding to an irreducible
plane sextic Γ ⊂ P2 with six cusps of a conic such that the set SC is finite. Let Γ2
be the smooth conic of equation f2(x0, x1, x2) = x
1 − x22 = 0 and let Γ3 be the
smooth cubic of equation f3(x0, x1, x2) = x
0+x0x
2−x21x2 = 0. If a1, a2 and a3 are
the three different solutions of the polynomial x3 + x2 + x− 1 = 0, then Γ2 and Γ3
intersect transversally at the points [ai,
ai, 1], [ai,−
ai, 1], with i = 1, 2, 3. By
the former lemma, the plane sextic Γ of equation f32 − f23 = 0 is irreducible and
it has six cusps at the intersection points of Γ2 and Γ3 as singularities. Moreover,
the normalization curve C of Γ is the canonical curve of genus 4 in P3 which is
intersection of the cubic surface S3 ⊂ P3 of equation
F3(x0, x1, x2, x3) = x
3 + (x
0 + x
1 − x22)x3 + x30 + x0x22 − x21x2 = 0
and the quadric S2 of equation
= 3x23 + x
0 + x
1 − x22 = 0.
We want to show that SC is finite. To see this we observe that, since
h0(P3, IC|P3(2)) = 1 and h0(P3, IC|P3(3)) = 5,
the equation of every cubic surface containing C and which is not the union of S2
and an hyperplane is given by
G(x0, . . . , x3;β0, . . . , β3) = F3(x0, x1, x2, x3) +
∂F3(x0, x1, x2, x3)
with βj ∈ C, for i = 0, . . . , 3. Now, a point [v] = [v0, . . . , v3] ∈ SC if and only if
there exist β0, . . . , β3 such that C is contained in the intersection of
G(x0, . . . , x3;β0, . . . , β3) = 0 and
∂G(x0, . . . , x3;β0, . . . , β3)
Still using that h0(P3, IC|P3(2)) = 1, a point [v] ∈ P3 belongs to SC if and only if
∂G(x0, . . . , x3;β0, . . . , β3)
∂F3(x0, . . . , x3)
for some λ ∈ R− 0, or, equivalently,
βjxj)
∂x3∂xi
= (λ−
viβi − v3)
8 CONCETTINA GALATI
The previous equality of polynomials is equivalent to the following bilinear system
of ten equations in the variables v0, . . . , v3 and β0, . . . , β3
(1 + β3)v0 + 3β0v3 = 0 (x0x3)
(1 + β3)v1 + 3β1v3 = 0 (x1x3)
(1 + β3)v2 − 3β2v3 = 0 (x2x3)
β1v0 + β0v1 = 0 (x0x1)
β2v0 + (1− β0)v2 = 0 (x0x2)
(1 − β2)v1 + β1v2 = 0 (x1x2)
2β1v1 − v2 = λ−
j=0 βjvj − v3 (x21)
−v0 + 2β2v2 = λ−
j=0 βjvj − v3 (x22)
(3 + 2β0)v0 = λ−
j=0 βjvj − v3 (x20)
2β3v3 = λ−
j=0 βjvj − v3 (x23)
The points of SC are the solutions v of the previous linear system, as a linear
system whose coefficients depend on β0, . . . , β3. It is easy to prove that it has only
a solution equal to (v0, v1, v2, v3) = (0, 0, 0, λ) if β0 = β1 = β2 = β3 = 0 and it has
not solutions otherwise, (see [5], page 98). By the previous lemma, we conclude
that the point [0 : 0 : 0 : 1] ∈ P3 is the only point which belongs to six tangent lines
to the canonical curve C ⊂ P3 which is intersection of the cubic surface of equation
F3(x0, x1, x2, x3) = x
3 + (x
0 + x
1 − x22)x3 + x30 + x0x22 − x21x2 = 0
and the quadric of equation
= 3x23 + x
0 + x
1 − x22 = 0.
It follows that, on the normalization curve D of the plane curve Γ′ corresponding
to the general point of Σ1 ⊂ Σ66,0 there exists only a finite number of linear series
of dimension two with six ramification points. �
Remark 2.9. By using the notation introduced in the proof of corollary 2.8, we
observe that in this corollary we have proved that if C is a general canonical curve of
genus four such that the set SC is not empty, then SC is finite. Actually, C. Ciliberto
pointed out to our attention that it is possible to show, with a very simple argument,
that for every canonical curve C of genus four such that SC is not empty, we have
that SC is finite. Finally, we observe that, by remark 2.7, for every canonical curve
C of genus four, the set SC coincides with the set of points of P3 which are contained
in six tangent lines to C.
Acknowledgment. The results of this paper are contained in my PhD-thesis. I
would like to thank my advisor C. Ciliberto for introducing me into the subject and
for providing me very useful suggestions. I have also enjoyed and benefited from
conversation with G. Pareschi and my college M. Pacini.
References
[1] A. Degtyarev: On deformations of singular plane sextics, math.AG/0511379, appearing on
Journal of Algebraic Geometry.
[2] S. Diaz and J. Harris: Ideals associated to deformations of singular plane curves, Transactions
of the American Mathematical Society, vol. 309, n. 2, 433–468 (1988).
[3] J. Harris and I. Morrison: Moduli of curves, Graduate texts in mathematics, vol. 187, Springer,
New York, 1988.
[4] C. Galati: Number of moduli of irreducible families of plane curves with nodes and cusps,
Collect. Math. 57,3 (2006), 319-346.
[5] C. Galati: Number of moduli of plane curves with nodes and cusps, PhD the-
sis, Università degli Studi di Tor Vergata, 2004-2005, available on the homepage
http://dspace.uniroma2.it/dspace/items-by-author?author=Galati
http://arxiv.org/abs/math/0511379
http://dspace.uniroma2.it/dspace/items-by-author?author=Galati
ON THE NUMBER OF MODULI OF PLANE SEXTICS WITH SIX CUSPS 9
[6] E. Sernesi: On the existence of certain families of curves, curves. Invent. Math. vol. 75 (1984),
no. 1, 25–57.
[7] O. Zariski: Dimension theoretic characterization of maximal irreducible sistems of plane nodal
curves, Amer. J. Math. vol. 104 (1982), no. 1, 209–226.
[8] O. Zariski: Algebraic surfaces, Classics in mathematics, Springer.
Dipartimento di Matematica, Università della Calabria, Arcavacata di Rende (CS)
E-mail address: galati@mat.unical.it
	1. Introduction
	2. On the number of moduli of complete irreducible families of plane sextics with six cusps
	Acknowledgment
	References
ABSTRACT
  Let S be the variety of irreducible sextics with six cusps as singularities.
Let W be one of irreducible components of W. Denoting by M_4 the space of
moduli of smooth curves of genus 4, the moduli map of W is the rational map
from W to M_4 sending the general point of W, corresponding to a plane curve D,
to the point of M_4 parametrizing the normalization curve of D. The number of
moduli of W is, by definition the dimension of the image of W with respect to
the moduli map. We know that this number is at most equal to seven. In this
paper we prove that both irreducible components of S have number of moduli
exactly equal to seven.

<|endoftext|><|startoftext|>
Introduction
	Angular distribution
	Diffusion on surfaces
	Asynchronous transitions
	Fourier analysis of asynchronous data
	Random walk about a single center
	Discussion
	Angular distributions
	Time distribution
	Acknowledgments
	References
ABSTRACT
  In this paper I describe a specialized algorithm for anisotropic diffusion
determined by a field of transition rates. The algorithm can be used to
describe some interesting forms of diffusion that occur in the study of proton
motion in a network of hydrogen bonds. The algorithm produces data that require
a nonstandard method of spectral analysis which is also developed here.
Finally, I apply the algorithm to a simple specific example.

<|endoftext|><|startoftext|>
Temporal Evolution Of Step-Edge Fluctuations
Under Electromigration Conditions
P.J. Rous∗ and T.W. Bole
Department of Physics,
University of Maryland Baltimore County,
1000 Hilltop Circle,
Baltimore, MD 21250
(Dated: October 31, 2018)
The temporal evolution of step-edge fluctuations under electromigration conditions is analysed
using a continuum Langevin model. If the electromigration driving force acts in the step up/down
direction, and step-edge diffusion is the dominant mass-transport mechanism, we find that significant
deviations from the usual t1/4 scaling of the terrace-width correlation function occurs for a critical
time τ which is dependent upon the three energy scales in the problem: kBT , the step stiffness,
γ, and the bias associated with adatom hopping under the influence of an electromigration force,
±∆U . For (t < τ ), the correlation function evolves as a superposition of t1/4 and t3/4 power laws.
For t ≥ τ a closed form expression can be derived. This behavior is confirmed by a Monte-Carlo
simulation using a discrete model of the step dynamics. It is proposed that the magnitude of the
electromigration force acting upon an atom at a step-edge can by estimated by a careful analysis of
the statistical properties of step-edge fluctuations on the appropriate time-scale.
PACS numbers: 05.40.-a, 66.30.Qa, 68.35.Ja
I. INTRODUCTION
During the past decade, continuum models and dis-
crete lattice simulations have been applied to understand
direct imaging observations of the thermal fluctuations of
step edges in which the step position is monitored as a
function of time [1, 2]. Of particular interest has been
the study of the dynamics of the equilibration of terrace
width distributions where the temporal evolution of step
edge fluctuations are driven by the exchange of atoms
between the step and the adjacent terrace and/or by mo-
tion of adatoms along the step edge itself [3, 4, 5, 6, 7, 8].
It is well known that the position of the step edge, as de-
scribed by its temporal correlation function, has a time
dependence that scales as a power law with an expo-
nent characteristic of specific atomic processes driving
the step fluctuations; tβ . In cases where mass transport
at the step is dominated by diffusion of atoms along the
step edge; β = 1/4. When mass transport proceeds via
exchange of atoms between the step edge and the terrace
β = 1/2. Careful experiments allow a crossover from t1/4
to t1/2 scaling to be observed as a function of the sam-
ple temperature [9]. Further, experimental measurement
of the correlation functions have been used to determine
thermodynamic properties of the steps, such as the step
stiffness [1, 2].
In this paper we investigate how the scaling of the step
edge fluctuations is changed by the presence of an electro-
migration force [10] acting upon atoms diffusing along the
step edges. The primary motivation for this study is the
possibility of using measurements of these changes to ob-
∗Electronic address: rous@umbc.edu
tain information about the electromigration force itself.
In conducting materials, a surface electromigration force
can be generated by passing an electrical current through
the sample [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22].
In terms of a simple discrete model, the presence of the
electromigration force introduces a small bias in the dif-
fusion of atoms at the step edge, parallel to the current
(and field). By convention, this bias can be expressed as
an energy difference between atoms diffusing parallel or
anti-parallel to the current ∆U = Z∗eEa⊥ where E is
the electric field applied to the sample, Z∗ is the effec-
tive valence of adatoms and a⊥ is the lattice spacing per-
pendicular to the step edge. A characteristic property of
surface electromigration is that the electromigration bias,
∆U , is several orders of magnitude smaller than the other
energy scales in the problem: γa⊥, where γ is the step
stiffness and the energy associated with thermal fluctua-
tions; kBT . This suggests that thermal fluctuations will
completely dominate the short-time behavior of the step
fluctuations with the effect of the electromigration bias
emerging only on much longer time scales. Nevertheless,
such time scales (of the order of seconds) are accessible to
experiment offering the possibility that the observation of
step fluctuations under electromigration conditions may
allow us to obtain quantitative information about the
magnitude of the force itself; a quantity that, to date,
has been hidden from experimental study.
This paper is organized as following. In section 2 we
present a continuum theory of step edge fluctuations un-
der electromigration conditions. In section 3 we test the
fidelity of the theory by showing the results of a Monte-
Carlo simulation of the temporal evolution step correla-
tion function. Concluding remarks are contained in sec-
tion 4.
http://arxiv.org/abs/0704.0624v1
mailto:rous@umbc.edu
II. THEORY
In order to describe the dynamics of a step-edge evolv-
ing under electromigration conditions we employ the
usual Langevin formalism where each degree of freedom
diffuses towards lower energy with a velocity that is pro-
portional to the energy gradient, subject to random ther-
mal fluctuations. The position of step edge is described
by it’s edge profile y(x, t) where the x-axis is oriented
parallel to the step edge. y(x, t) is the position of the
step edge at x and at time t.
In this paper we consider the limiting case where the
step motion occurs most easily by adatom diffusion along
the step edge itself. Adatom exchange between the step-
edge and the adjacent terraces, via attachment and de-
tachment, is neglected. It is well known that atomic diffu-
sion along a step edge is driven by the step curvature [23],
which generates a flux
Dsδsγ
∇sκ (1)
where Jc is the curvature-driven flux, Ds = a
||/τh is
the surface diffusion constant, a|| is the atomic diameter,
δs = a⊥ the width of an atom perpendicular to the step,
τh is the average time between hopping events and γ is
the step stiffness. ∇sκ is the gradient of the curvature
along the step edge. Mass conservation determines the
normal velocity of the step edge,
n̂ = −Ω∇s · Jc (2)
where Ω is the area of a single atom and n̂ is normal to
the step edge.
The inclusion of a thermal noise term, η and lineariza-
tion of the above equation leads to the well-known equa-
tion of motion for a step edge [3]
− Γhγ
y(x, t) = η(x, t). (3)
where we have defined
a3||a
. (4)
We model the effects of electromigration by adding to
this equation of motion a term generated by a constant
force, F = Z∗eE, felt by atoms at the step edge, which
arises from the application of a electric field, E, to the
material oriented parallel to the y-axis. Z∗ is the effective
valence of an atom at the step edge. The electromigration
force generates an additional flux,
JEM =
DsδsZ
∗eE||
DsδsZ
1 + y2x
where E|| is component of the electric field along the step
edge. yx indicates an x derivative.
The energy of any step configuration, relative to the
energy of the straight step, is determined by the step stiff-
ness γ and the electromigration force, F , felt by atoms
at the step edge. If the force acts perpendicular to the
step edge (i.e. when F > 0, the force acts in the +y,
or step down, direction) then, for small fluctuations, one
can linearize the stochastic equation of motion for the
step edge to obtain the following Langevin equation for
the step dynamics,
− Γhγ
− ΓhF
kTa||a⊥
y(x, t) = η(x, t) (6)
where a|| and a⊥ are the lattice parameter parallel and
perpendicular to the step edge. The noise term, η, de-
scribes the thermal fluctuations and is correlated be-
cause, in our model, of the random hopping of atoms
occurs only between between adjacent step edge sites.
〈η(x, t)η(x′, t′)〉 = 2Γhδ(t− t′)
δ(x − x′) (7)
To first order, the electromigration force does not al-
ter the correlation properties of the noise so that in the
single-jump regime equation 4 remians valid.
Before proceeding, for notational simplicity we rewrite
eqn. 6 as
− α ∂
∓ αq2c
y(x, t) = η(x, t) (8)
where
α ≡ Γhγ
, qc ≡
a||a⊥
The parameter qc depends only on the magnitude of the
electromigration force and in eqn. 6 the ∓ denotes the
force acting in the step down(-) or step up (+) direction.
In the case where there is no random thermal noise
(i.e. T → 0), eqn. 6 predicts that the amplitude of a
small sinusoidal fluctuation of the step-edge profile, with
wavevector q, evolves according to the following disper-
sion relation,
iω = αq2
q2 ± q2c
This is the same dispersion relation as the one that de-
scribes step flow under growth conditions [24, 25, 26]. If
the electromigration force acts in the step up (+) direc-
tion the step fluctuations are linearly stable. For a force
acting in the step-down direction there exists a linear in-
stability which initiates the well-known phenomenon of
electromigration-induced step-wandering for fluctuations
with wavevector smaller than qc [27, 28, 29]. The critical
wavevector is an important parameter in our Langevin
model that, from eqn. 9, is determined by the ratio of
the electromigration force and the step stiffness.
In order to determine the statistical properties of the
solutions of eqn. 6 we take the usual approach and first
derive the Green’s function for the problem using stan-
dard Fourier transform methods,
G(x, t|x′, t′) = i
e−α(k
k2)(t−t′)eik(x−x
In terms of the Green’s function, the displacement of the
step at time t is:
y(x, t′′) = y(x, t′) +
∫ t′′
G(x, t′′|x′, t)η(x′, t′)dx′dt
We can now compute the time correlation function of the
step edge, G(t), after time t = t′′ − t′ has elapsed,
g(t) ≡< (y(x, t′′)− y(x, t′))2 > (13)
Substituting eqn. 11 into 12 and using the correlation
properties of the noise (eqn. 7) we obtain we find, after
some calculation,
g±(t) =
21/4πα
(k2 ± q2c )
1− e−2α(k
or, using substitution,
g±(t) = t
(kT )
× (15)
(u2 ±
αtq4c )
1− e−2u
When there is no electromigration force (i.e. |F | =
0, qc = 0), the integral in eqn. 16 is clearly time-
independent and we recover the result derived by Bartelt
et al. [3] where the g(t) the step edge fluctuations evolves
with the well-known t1/4 scaling law characteristic of
step-edge limited diffusion.
g±(t) ≡ g0(t) = t1/4
(kT )
It is helpful to re-express eqn. 16 in terms of the average
time, τ0, that it takes for the mean-squared width of an
initially straight step to reach a value equal to g20(τ0) =
a2⊥ (i.e. one square lattice spacing):
g0(t) = a
where
τ0 ≡ τh
2kTa||
4Γ (3/4)
When the electromigration force is finite ( i.e. |F | > 0),
it is apparent from eqn. 16 that this scaling behavior is
modified by the appearance of an explicit time depen-
dence in the integral. This can be seen more clearly by
defining a critical time, τ , and a rescaled time, ζ = t/τ ,
where,
= 2τh
2kTa||
, (19)
Then, eqn. 16 can be rewritten as
g±(t) = a
where
I± (ζ) ≡
21/4Γ(3/4)
× (21)
(u2 ±
1− e−2u
I± is a universal function of the rescaled time, ζ and
is normalized such that I±(t → 0) = 1. The integral
appearing in eqn. 22 is easily evaluated numerically and
is shown in fig. 1 in which we display I± plotted as a
function of the rescaled time ζ (i.e. time is expressed in
units of τ). The solid curves show I± obtained for F > 0
(step up) , F < 0 (step down) electromigration forces
(For F = 0, I±(ζ) = 1).
From fig. 1. it is apparent that g±(t) (eqn. 20) deviates
very significantly from the t1/4 scaling behavior of g0(t)
(eqn. 17) as t approaches τ (ζ → 1). These deviations be-
gin to appear at earlier times, t ∼ 0.1τ , when the effect of
the force on the evolution of the step fluctuations begins
to be felt. This can be seen more clearly in fig. 2 which
displays the correlation function of a step plotted as a
function of time for τ0 = 5s and τ = 10
+4s. The values
for these parameters were chosen so that τ/τ0 ∼ 104, a
ratio typical of accelerated electromigration experiments.
As noted above, deviations from t1/4 scaling start to ap-
pear when t ∼ 0.1τ = 100s. The results shown in figs. 1
and 2 have a simple qualitative interpretation; the short
time behavior of the step fluctuations (t ≪ τ) is com-
pletely dominated by the thermal fluctuations and the
effect of the electromigration bias emerges only at later
FIG. 1: The integral function I(ζ) (eqn. 22) plotted as a
function of the rescaled time ζ = t/τ for no electromigration
force (F = 0), the electromigration force in the step down
direction (F > 0) and in the step-up direction (F < 0).
FIG. 2: The time corrleation function, g(t), of the step-
edge distribution predicted by the continuum Langevin model
(eqns. 20 and 22), plotted as a function of time for τ0 = 5s
and τ = 104s.
times. Such behavior is typical of the dynamics of diffu-
sion driven by weak external forces.
It is instructive to perform a power series expansion of
the integral about ζ = 0 such that eqn. 20 becomes,
g±(t) = a
1∓ a 1
. . .
The expansion coefficents can be obtained analytically,
= 0.3487, a1 =
= 0.1500,
Shown as the dashed lines in fig. 1 are the results of this
series approximation for I±(ζ) (eqn. 22), evaluated up
to, and including the terms linear in time. Clearly, this
truncated expression is a reasonable approximation for
t ≤ 0.4τ .
III. SIMULATION
In order to test the predictions of the continuum
Langevin model described above, we developed a Monte
Carlo simulation of step edge fluctuations in the pres-
ence of an external force. In this model, atomic diffu-
sion was restricted to the step edges with atoms jump-
ing between adjacent step sites only. Only nearest-
neighbor interactions on a square lattice were permit-
ted (a⊥ = a|| = a = 1) and were modelled by a single
bond energy ǫ. In order to describe the electromigration
force, the atomic jump probabilities for motion parallel
and anti-parallel to the force were biased by a small en-
ergy differential ∆U . In terms of the electromigration
force, F , and the lattice spacing perpendicular to the
step edge, a; ∆U = ±Fa.
Simulations were performed for steps of length ℓ =
10000a|| fluctuation on a square lattice. Periodic bound-
ary conditions were employed. The bond energy was set
to ǫ = 2.0kBT and the magnitude of the electromigra-
tion bias was |∆U | = 0.01kBT , a factor of 103 smaller
than the typical binding energy of an atom to the step
edge. This value was chosen to generate statistically sig-
nificant deviations from the (no-force) t1/4 scaling within
reasonable simulation times. If ǫ = 0.1eV and it is as-
sumed that a ∼ 1.5Å (typical of metals) then this bias
corresponds to an electric field with a magnitude of or-
der 1000V cm−1 acting on an atom with effective valence
of Z∗ ∼ ±10e. In actual accelerated electromigration
experiments, a field of 0.1− 1V cm−1 is typical.
Figure 3 shows the results of the simulation where the
correlation function of an isolated step is plotted as a
function of the time measured in Monte-Carlo steps per
step-edge site (MCS). Shown is g(t) obtained when the
electromigration force acts in the step-up and step-down
directions and when ∆U = 0 (i.e. no electromigration
force is acting at the step edge).We define one Monte
FIG. 3: Results of the a kinetic Monte Carlo simulation where
the correlation function of an isolated step is plotted as a
function of the time (measured in Monte-Carlo steps per step-
edge site, the lattice spacing is a = 1). Shown is g(t) obtained
when the electromigration force acts in the step-up and step-
down directions and when ∆U = 0 (i.e. no electromigration
force is acting at the step edge). The curves shown were
obtained by averaging the data from 200 randomly generated
replicas after each was equilibrated for 105 Monte Carlo steps.
Carlo step to be equal to the average time needed for ev-
ery atom at the step edge to attempt a hop. The results
shown in fig. 3 were obtained by averaging the data from
100 randomly generated replicas after each was equili-
brated for 105 Monte Carlo iterations per site. Compar-
ing the simulation results (fig. 3.) to the predictions of
the Langevin theory (fig 4.) it is apparent that the qual-
itative features of the continuum theory are reproduced
by the Monte Carlo simulation. These same data are pre-
sented in the form of a log-log plot in fig. 4. In the ab-
sence of the force (F = 0 , ∆U = 0), a least-squares fit of
the no-force simulation data shows that g0(t) is very well
fit by a tβ power-law where β = 0.25± 0.01. Therefore,
when there is no electromigration force present in the
simulation, the correlation function of the step evolves
according well-known t1/4 scaling, as predicted by the
Langevin analysis (eqn. 17). By least squares fitting the
simulation results to eqn. 17 we obtain a value of τ0 = 5
We now consider the results of the simulation obtained
for a finite electromigration force, also shown in figs. 3
and 4. Equation 22 suggests that the value of τ can be
FIG. 4: A log-log plot of the simulation data shown in fig. 3.
The dashed line shows the best fit of a power law, (t/τ0)
β, to
the no-force (F = 0) data; β = 0.25, τ0 = 5 MCS.
extracted from the simulation results by considering the
scaling of the difference between the correlation functions
for the force in the up and down step directions:
∆(t) =
g+ − g−
= 2a 1
. . . (24)
For the simulation results, this normalized difference is
plotted in fig. 5. The behavior of this quantity is well
fit by the leading order term in eqn. 24 from which we
obtain a value of τ = 62000± 10000 MCS.
For comparison with the fits to the continuum
Langevin model , we can estimate τ from the microscopic
parameters employed in the discrete Monte Carlo model
used to generate the simulation data. Combining eqn. 19
with the step stiffness appropriate for our model
2kTa||
2kBTa||
we obtain an estimate of τ in units of MCS:
τ = 2
2kBTa||
The ratio of the hopping time in the Langevin model ,
τh to the time between hopping attempts in the Monte
Carlo simulation τa can be obtained by monitoring the
FIG. 5: For the simulation results shown in figs. 3 and 4; a
log-log plot of the normalized difference (eqn. 24) plotted as a
function of time (MCS). The behavior of this quantity is well
fit by the leading order term in eqn. 24 (t1/2, dashed curve)
from which we obtain a value of τ = 62000 ± 10000 MCS.
success rate of hops between adjacent lattice sites in the
simulation. We find that τa/τh = 3.6 ± 0.1. Thus our
estimate for the value of τ in the Monte Carlo simu-
lations, used to generate the data shown in figs 3-5, is
τ = 62000±10000 MCS. Clearly, the agreement between
the continuum Langevin theory (τ = 71000 MCS ) and
the microscopic model (τ = 62000± 10000 MCS) is rea-
sonable. Finally we note that in the high temperature
limit (kBT ≫ ǫ) the ratio of the electromigration bias to
the binding energy at a step edge, ǫ is related directly to
τ that would be obtained from experiment:
where τh can be determined from eqn. 18, if the step
stiffness is known.
IV. CONCLUSIONS
The temporal evolution of step-edge fluctuations un-
der electromigration conditions has been analyzed using
a continuum Langevin model for the case where diffu-
sion is limited by mass transport along the step edge.
We find that the presence of the electromigration force,
felt by atoms at the step edge, causes deviations from
the usual t1/4 scaling of the terrace-width distribution
driven by thermal fluctuations alone. We have identified
a critical time τ that is a function of the three energy
scales in the problem: kBT , the step stiffness, γ, and the
bias associated with adatom hopping under the influence
of an electromigration force, ±∆U . For (t < τ), the cor-
relation function evolves, to a good approximation, as a
superposition of t1/4 and t3/4 power laws. For all τ a
closed form expression was derived. This behavior was
confirmed by a Monte-Carlo simulation using a discrete
model of the step dynamics. Finally we propose that the
magnitude of the electromigration force acting upon an
atom at a step-edge could be determined directly by care-
ful measurement and analysis of the statistical properties
of step-edge fluctuations on the appropriate time-scale.
Acknowledgments
We acknowledge helpful discussions with E.D.
Williams.
This work has been supported by the US Department
of Energy grant DE-FG02-01ER45939 and by the NSF-
Materials Research Science and Engineering Center un-
der grant DMR-00-80008.
[1] H.-C. Jeong and E. D. Williams, Surface Science Reports
34, 171 1999.
[2] M. Giesen, Progress in Surface Science 68, 1 2001.
[3] N. C. Bartelt, J. L. Goldberg, T. L. Einstein, and E. D.
Williams, Surf. Sci. 273, 252 1992.
[4] S. V. Khare and T. L. Einstein, Phys. Rev. B 57, 4782
1998.
[5] T. Ihle, C. Misbah, and O. Pierre-Louis, Phys. Rev. B
58, 2289 1998.
[6] H.-C. Jeong and J. D. Weeks, Surf. Sci. 432 , 101 1999.
[7] C. P. Flynn, Phys. Rev. B 66, 155405 2002.
[8] N. C. Bartelt, T. L. Einstein, and E. D. Williams, Surf.
Sci. 521 , L669 2002.
[9] M. Giesen, Surf. Sci. 442 , 543 1999.
[10] R. S. Sorbello, Solid State Physics eds. H. Ehrenreich and
F. Spaepen 51, 159 1999.
[11] D. Schumacher, Surface Scattering Experiments With
Conduction Electrons Springer Tracts in Modern Physics
(Springer, Berlin) 128, 1993.
[12] T. W. Duryea and H. B. Huntington, Surf. Sci. 199, 261
1988.
[13] H. Ishida, Phys. Rev. B 49, 14610 1994.
[14] H. Ishida, Phys. Rev. B 52, 10819 1995.
[15] H. Ishida, Phys. Rev. B 57, 4140 1998.
[16] H. Ishida, Phys. Rev. B 60, 4532 1999.
[17] H. Ishida, Phys. Rev. B 54, 10905 1996.
[18] P. J. Rous, T. L. Einstein, and E. D. Williams, Surf. Sci.
Lett. 315, 995 1994.
[19] P. J. Rous, Phys. Rev. B 61 , 8475 2000.
[20] P. J. Rous, Phys. Rev. B 61 , 8475 2000.
[21] P. J. Rous, J. Appl. Phys. 87, 2780 2000.
[22] H. Yasunaga and A. Natori, Surf. Sci. Rep. 15, 205 1992.
[23] W. W. Mullins, J. Appl. Phys. 28 , 333 1957.
[24] G. Bales and A. Zangwill, Phys. Rev. B 41 , 5500 1990.
[25] O. Pierre-Louis, M. D’Orsogna, and T. Einstein, Phys.
Rev. Lett. 82, 3661 1999.
[26] M. R. Murty and B. Cooper, Phys. Rev. Lett. 83, 352
1999.
[27] J. Krug, Multiscale Modeling of Epitaxial Growthed. ed.
A Voight (Birkhauser) ), 2004.
[28] M. Degawa et al., Surf. Sci. 487 , 171 2001.
[29] T. Zhao, J. D. Weeks, and D. Kandel, Phys. Rev. B 70
, 161303 2004.
ABSTRACT
  The temporal evolution of step-edge fluctuations under electromigration
conditions is analysed using a continuum Langevin model. If the
electromigration driving force acts in the step up/down direction, and
step-edge diffusion is the dominant mass-transport mechanism, we find that
significant deviations from the usual $t^{1/4}$ scaling of the terrace-width
correlation function occurs for a critical time $\tau$ which is dependent upon
the three energy scales in the problem: $k_{B}T$, the step stiffness, $\gamma$,
and the bias associated with adatom hopping under the influence of an
electromigration force, $\pm \Delta U$. For ($t < \tau$), the correlation
function evolves as a superposition of $t^{1/4}$ and $t^{3/4}$ power laws. For
$t \ge \tau$ a closed form expression can be derived. This behavior is
confirmed by a Monte-Carlo simulation using a discrete model of the step
dynamics. It is proposed that the magnitude of the electromigration force
acting upon an atom at a step-edge can by estimated by a careful analysis of
the statistical properties of step-edge fluctuations on the appropriate
time-scale.

<|endoftext|><|startoftext|>
Introduction
	Renormgroup analysis
	The Split Higgsino scenario
	The model Lagrangian and its features
	Split Higgsino as the DM carrier
	SUSY scales and experimental possibilities
	-N scattering and collider signals
	Diffuse gamma spectrum from the Galactic halo
	Conclusions
	References
ABSTRACT
  We present a renormalization group motivation of scale hierarchies in SUSY
SU(5) model. The Split Higgsino scanrio with a high scale of the SUSY breaking
is considered in detail. Its manifestations in experiments are discussed.

<|endoftext|><|startoftext|>
Introduction
	ISW effect and steerable wavelets
	ISW effect
	Wavelets on the sphere
	Steerability and morphological measures
	Analysis procedures
	Data and simulations
	Generic procedure
	Local morphological analysis
	Matched intensity analysis
	Local morphological correlations
	Detections
	Foregrounds and systematics
	Matched intensity correlation
	Detections
	Foregrounds and systematics
	Conclusions
ABSTRACT
  Using local morphological measures on the sphere defined through a steerable
wavelet analysis, we examine the three-year WMAP and the NVSS data for
correlation induced by the integrated Sachs-Wolfe (ISW) effect. The steerable
wavelet constructed from the second derivative of a Gaussian allows one to
define three local morphological measures, namely the signed-intensity,
orientation and elongation of local features. Detections of correlation between
the WMAP and NVSS data are made with each of these morphological measures. The
most significant detection is obtained in the correlation of the
signed-intensity of local features at a significance of 99.9%. By inspecting
signed-intensity sky maps, it is possible for the first time to see the
correlation between the WMAP and NVSS data by eye. Foreground contamination and
instrumental systematics in the WMAP data are ruled out as the source of all
significant detections of correlation. Our results provide new insight on the
ISW effect by probing the morphological nature of the correlation induced
between the cosmic microwave background and large scale structure of the
Universe. Given the current constraints on the flatness of the Universe, our
detection of the ISW effect again provides direct and independent evidence for
dark energy. Moreover, this new morphological analysis may be used in future to
help us to better understand the nature of dark energy.

<|endoftext|><|startoftext|>
Electromagnetic structure and weak decay of meson K in a light-front
QCD-inspired model∗
Fabiano P. Pereira a, J. P. B. C. de Melo b †, T. Frederico c, and Lauro Tomio d
aInstituto de F́ısica, Universidade Federal Fluminense, 24210-900, Niterói, RJ, Brazil
bUniversidade Cruzeiro do Sul, CETEC, 08060-070, São Paulo, SP, Brazil
cInstituto Tecnológico de Aeronáutica, 12228-900, São José dos Campos, SP, Brazil
dInstituto de F́ısica Teórica, UNESP, 01405-900, São Paulo, SP, Brazil
The kaon electromagnetic (e.m.) form factor is reviewed considering a light-front con-
stituent quark model. In this approach, it is discussed the relevance of the quark-antiquark
pair terms for the full covariance of the e.m. current. It is also verified, by considering a
QCD dynamical model, that a good agreement with experimental data can be obtained for
the kaon weak decay constant once a probability of about 80% of the valence component
is taken into account.
1. INTRODUCTION
The kaon, as quark-antiquark bound states, is one appropriate system to study aspects
of QCD at low and intermediate energy regions. By using quantum field theory at the
light-front the subnuclear structure can be more easily studied [ 1, 2, 3]. Within the light-
front framework and an appropriate choice of the frame, it is possible to obtain the pion
electromagnetic form factor at both space- and time-like regimes[ 4]. Using the light-cone
components J+K = J
0+J3 and J−K = J
0−J3 of the kaon electromagnetic current, one can
obtain the corresponding form factors in the light-front formalism, with a pseudoscalar
coupling for the quarks and considering the Breit frame (q+ = 0, ~q⊥ = (qx, 0) 6= 0) [ 5].
In the case of J+K there is no pair term contribution in the Breit frame. However, for the
J−K component of the electromagnetic current, the pair term contribution is different from
zero and necessary to preserve the rotational symmetry of the current.
In the next section, we outline the main equations of the model for the kaon electromag-
netic current, detailed in [ 5], with the corresponding results obtained for the kaon elastic
form factor. In section 3, we briefly review a QCD inspired model, presenting results for
the weak decay pseudoscalar constants compared to data. In section 4 we present our
conclusions.
∗Work partially supported by the Brazilian funding agencies FAPESP and CNPq
†JPBC de Melo thanks Instituto de F́ısica Teórica, UNESP, for supporting facilities
http://arxiv.org/abs/0704.0627v1
2. ELECTROMAGNETIC FORM FACTOR
The initial light-front wave function considered in the present model is given by:
ΦiQ(x, k⊥) =
(1− x)2
−M20 )(m
−M2(mQ, mR))
, (1)
where N is a normalization constant, Q ≡ {q̄, q} is the quark or antiquark index with mQ
is the corresponding quark mass, m2
is the kaon mass, x = k+/P+ is the momentum
fraction, and
M2(mQ, mR) ≡
k2⊥ +m
(P − k)2⊥ +m
− P 2⊥, (2)
with the free quark-mass operator given by M20 = M
2(mq̄, mq). mR is a mass constant
chosen to regularize the triangle diagram. For the corresponding final wave-functions,
q̄ and Φ
q , we just need to exchange P ↔ P
′ in (1) and (2). The relation between the
electromagnetic current Jµ and the space-like kaon electromagnetic form factor FK+(q
is given by 〈P ′|Jµ|P ′〉 = (P ′ + P )µFK+(q
2) . In terms of the initial (Φiq̄) and final (Φ
light-front wave functions, we have
F+q̄ (q
2) = −eq̄
N2g2Nc
4π3P+
d2k⊥dx
N+q̄ θ(x)θ(1− x) Φ
q̄ (x, k⊥)Φ
q̄(x, k⊥) , (3)
F+q (q
2) = [ q ↔ q̄ in F+q̄ (q
2) ] , (4)
where Nc is the color number, g is the coupling constant, eQ is the charge of quark Q, and
N+q̄ = (−1/4)Tr[(/k +mq̄)γ
5(/k − /P ′ +mq)γ
+(/k − /P +mq)γ
. In the light-front
approach, beside the valence contribution, we have also the non-valence contributions
to the currents. In the case of the J+ component, the non-valence component does
not contribute to the corresponding matrix elements [ 5]. The kaon electromagnetic
form factor obtained with J+ is the sum of two contributions from quark and antiquark
currents:
(q2) = F+q (q
2) + F+q̄ (q
2) normalized such that F+
(0) = 1. (5)
In the case that we consider the J− component, to obtain the kaon electromagnetic form
factor, after considering the contribution from the interval 0 < k+ < P+ (interval I), we
need to add a second contribution, which is originated from the pair terms, and non-zero
in the interval P+ < k+ < P ′+ (interval II). The contribution is obtained after a Cauchy
integral in k− is performed in the limit P ′+ → P+ [ 5]. So, instead of (5), we will have:
(q2) =
F−q (q
2) + F−q̄ (q
F−(q2)
, (6)
normalized by the charge conservation to F−
(0) = 1.
The parameters of the model are the constituent quark masses, mq = mu = md = 220
MeV, ms = 419 MeV and the regulator mass mR =946 MeV, adjusted to fit the electro-
magnetic radius of the kaon. The electromagnetic radius is related to the corresponding
form factor, with the mean-square-radius given by
〈r2K+〉 = 6
dFK+(q
. (7)
With the parameters adjusted as given above, we have 〈r2
〉 = 0.354 fm2, which is very
close to the experimental value 〈r2
〉|exp = 0.340 fm
2 [ 6].
Our results for the kaon electromagnetic form factor are presented in Fig. 1, in compar-
ison with available experimental data [ 6]. We observe that the full kaon electromagnetic
form factor is covariant only after the inclusion of the pair terms or non-valence contri-
bution to the J−
component of the electromagnetic current.
0.01 0.10 1.00
 [(GeV/c)
Figure 1. The kaon electromagnetic form factor is obtained with the plus and minus
components of the e.m. current (both cases are shown by the solid-line results) and
compared with experimental data [ 6]. The dashed-line curve shows the form factor
without the pair terms contribution in J−
3. WEAK DECAY CONSTANTS IN A QCD INSPIRED MODEL
Next, we briefly review the calculation of the pseudoscalar constants, in a light-front
QCD-inspired dynamical model. In this case, the constituent quark masses need to be
readjusted in view of the fact that, differently from the approach outlined in section 2,
the wave-function is obtained from an eigenvalue equation, as follows.
The valence wave function is obtained by solving an eigenvalue equation for the effective
square mass operator M2ps [ 7]:
ps ψ(x,
~k⊥) = M
0 (x, k⊥) ψ(x,
~k⊥)−
dx′d~k′⊥θ(x
′)θ(1− x′)
x(1− x)x′(1− x′)
4m1m2
− λpsg(M
0 (x, k⊥))g(M
′, k′⊥))
ψ(x′, ~k′⊥) , (8)
where M20 (x, k⊥) ≡ (
~k2⊥ + m
1)/x + (
~k2⊥ + m
2)/(1 − x) is the free square mass operator
in the meson rest frame, m1,2 are the constituent quark masses, α gives the strength of
the Coulomb-like interaction. g(K) is the model form factor, with λps the strength of the
separable interaction. We consider two expressions for the form factors:
g(a)(K2) =
β(a) +K2
and g(b)(K2) =
, (9)
where the parameters β(a,b) and λps are adjusted to reproduce the experimental val-
ues of the pion electromagnetic radius and mass, mπ. For α = 0.5, we have β
(a) =
−(634.5 MeV)2 and β(b) = (1171 MeV)2. mu = 384 MeV, ms = 508 MeV. In Table 1,
we have the results compared with experimental data [ 8].
Table 1
Results for the kaon and pion weak decay constants, compared with experimental data.
The model is adjusted to reproduce pion radius and mass.
qq f (a)ps (MeV) f
ps (MeV) f
ps (MeV) M
ps (MeV) M
ps (MeV) [ 8]
π+(ud) 110 110 92.4± .07± 0.25 [ 8] 140 140
K+(us) 126 121 113.0± 1.0± 0.31[ 8] 490 494
4. CONCLUSIONS
Considering a light-front model wave-function we have observed a good agreement of
the results for the kaon electromagnetic form factor with experimental data. The electro-
magnetic form factor was obtained using the plus and minus components of the electro-
magnetic current. The inclusion of the non-valence component of the current was shown
to be essential in this approach to obtain covariant results for the calculated matrix ele-
ments. We also show that a good agreement with experimental data is obtained for the
kaon weak decay constants once a probability of the valence component of about 80% is
taken into account.
REFERENCES
1. F. Cardarelli, I. L. Grach, I. M. Narodetsky, E. Pace, G. Salme, S. Simula,
Phys. Rev. D 53 (1996) 6682.
2. J. P. B. C. de Melo, H. W. Naus and T. Frederico, Phy. Rev. C 59 (1999) 2278.
3. B. L. G. Bakker, H.-M. Choi and C.-R. Ji, Phys. Rev. D 63 (2001) 074014.
4. J. P. B. C. de Melo, T. Frederico, E. Pace and G. Salmè, Phy. Rev. D 73 (2006)
074013; J. P. B. C. de Melo, T. Frederico, E. Pace and G. Salmè, Phy. Lett. B 581
(2004) 75.
5. F.P. Pereira, J.P.B.C. de Melo, T. Frederico and L. Tomio, Phys. of Part. and Nucl.
36 (2005) 5217; F.P. Pereira, Fatores de Forma Eletromagnéticos do Ṕıon e do Kaon
na Frente de Luz, Msc Dissertation, IFT, São Paulo, 2005.
6. S. R. Amendolia et al., Phys. Lett. B 178 (1986) 435.
7. T. Frederico and H.-C. Pauli, Phy. Rev. D 64 (2001) 054004; L. A. M. Salcedo, J. P.
B. C. de Melo, D. Hadjmichef and T. Frederico, Eur. Phys. J. A 27 (2006) 213.
8. W.-M. Yao et al., Journal of Physics G 33 (2006) 1.
	INTRODUCTION
	ELECTROMAGNETIC FORM FACTOR
	WEAK DECAY CONSTANTS IN A QCD INSPIRED MODEL
	CONCLUSIONS
ABSTRACT
  The kaon electromagnetic (e.m.) form factor is reviewed considering a
light-front constituent quark model. In this approach, it is discussed the
relevance of the quark-antiquark pair terms for the full covariance of the e.m.
current. It is also verified, by considering a QCD dynamical model, that a good
agreement with experimental data can be obtained for the kaon weak decay
constant once a probability of about 80% of the valence component is taken into
account.

<|endoftext|><|startoftext|>
Black hole puncture initial data with realistic gravitational wave content
B. J. Kelly,1, 2 W. Tichy,3 M. Campanelli,4, 2 and B. F. Whiting5, 2
1Gravitational Astrophysics Laboratory, NASA Goddard Space Flight Center, 8800 Greenbelt Rd., Greenbelt, MD 20771, USA
2Center for Gravitational Wave Astronomy, Department of Physics and Astronomy,
The University of Texas at Brownsville, Brownsville, Texas 78520
3Department of Physics, Florida Atlantic University, Boca Raton Florida 33431-0991
4Center for Computational Relativity and Gravitation,
School of Mathematical Sciences, Rochester Institute of Technology,
78 Lomb Memorial Drive, Rochester, New York 14623
5Department of Physics, University of Florida, Gainsville Florida 32611-8440
(Dated: October 26, 2018)
We present improved post-Newtonian-inspired initial data for non-spinning black-hole binaries,
suitable for numerical evolution with punctures. We revisit the work of Tichy et al. [W. Tichy, B.
Brügmann, M. Campanelli, and P. Diener, Phys. Rev. D 67, 064008 (2003)], explicitly calculating
the remaining integral terms. These terms improve accuracy in the far zone and, for the first
time, include realistic gravitational waves in the initial data. We investigate the behavior of these
data both at the center of mass and in the far zone, demonstrating agreement of the transverse-
traceless parts of the new metric with quadrupole-approximation waveforms. These data can be
used for numerical evolutions, enabling a direct connection between the merger waveforms and the
post-Newtonian inspiral waveforms.
PACS numbers: 04.25.Dm, 04.25.Nx, 04.30.Db, 04.70.Bw
I. INTRODUCTION
Post-Newtonian (PN) methods have played a funda-
mental role in our understanding of the astrophysical im-
plications of Einstein’s theory of general relativity. Most
importantly, they have been used to confirm that the ra-
diation of gravitational waves accounts for energy loss in
known binary pulsar configurations. They have also been
used to create templates for the gravitational waves emit-
ted from compact binaries which might be detected by
ground-based gravitational wave observatories, such as
LIGO [1, 2], and the NASA/ESA planned space-based
mission, LISA [3, 4]. However, PN methods have not
been extensively used to provide initial data for binary
evolution in numerical relativity, nor, until recently (see
[5, 6]), have they been extensively studied so that their
limitations could be well identified and the results of nu-
merical relativity independently confirmed.
Until the end of 2004, the field of numerical relativ-
ity had been struggling to compute even a single or-
bit for a black-hole binary (BHB). Although debate oc-
curred on the advantages of one type of initial data over
another, the primary focus within the numerical rela-
tivity community was on code refinement which would
lead to more stable evolution. Astrophysical realism was
very much a secondary issue. However, this situation
has radically changed in the last few years with the in-
troduction of two essentially independent, but equally
successful techniques: the generalized harmonic gauge
(GHG) method developed by Pretorius [7] and the “mov-
ing puncture” approach, independently developed by the
UTB and NASA Goddard groups [8, 9]. Originally in-
troduced by Brandt & Brügmann [10] in the context of
initial data, the puncture method explicitly factored out
the singular part of the metric. When used in numerical
evolution in which the punctures remained fixed on the
numerical grid, it resulted in distortions of the coordi-
nate system and instabilities in the Baumgarte-Shapiro-
Shibata-Nakamura (BSSN) [11, 12] evolution scheme.
The revolutionary idea behind the moving puncture ap-
proach was precisely, not to factor out the singular part
of the metric, but rather evolve it together with the reg-
ular part, allowing the punctures to move freely across
the grid with a suitable choice of the gauge.
A golden age for numerical relativity is now emerging,
in which multiple groups are using different computer
codes to evolve BHBs for several orbits before plunge and
merger [13, 14, 15, 16, 17, 18, 19, 20, 21]. Comparison of
the numerical results obtained from these various codes
has taken place [22, 23, 24], and comparison with PN
inspiral waveforms has also been carried out with encour-
aging success [5, 6, 25, 26]. The application of successful
numerical relativity tools to study some important as-
trophysical properties (e.g. precession, recoil, spin-orbit
coupling, elliptical orbits, etc) of spinning and/or un-
equal mass-black hole systems is currently producing ex-
tremely interesting new results [27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39, 40, 41, 42]. It now seems that
the primary obstacle to further progress is simply one of
computing power. In this new situation, it is perhaps
time to return to the question of what initial data will
best describe an astrophysical BHB.
To date, the best-motivated description of pre-merger
BHBs has been supplied by PN methods. We might ex-
pect, then, that a PN-based approach would give us the
most astrophysically correct initial data from which to
run full numerical simulations. In practice, PN results
are frequently obtained in a form ill-adapted to numeri-
http://arxiv.org/abs/0704.0628v2
cal evolution. PN analysis often deals with the full four-
metric, in harmonic coordinates; numerical evolutions
frequently use ADM-type coordinates, with a canonical
decomposition of the four-metric into a spatial metric
and extrinsic curvature.
Fortunately, many PN results have been translated
into the language of ADM by Ohta, Damour, Schäfer and
collaborators. Explicit results for 2.5PN BHB data in the
near zone were given by Schäfer [43] and Jaranowski &
Schäfer (JS) [44], and these were implemented numer-
ically by Tichy et al. [45]. Their insight was that the
ADM-transverse-traceless (TT) gauge used by Schäfer
was well-adapted to a puncture approach. To facilitate
comparison with this earlier work [45], we continue to
use the results of Schäfer and co-workers, anticipating
that higher-order PN results should eventually become
available in a useful form.
The initial data provided previously by Tichy et
al. already include PN information. They are accurate
up to order (v/c)5 in the near zone (r ≪ λ), but the
accuracy drops to order (v/c)3 in the far zone (r ≫ λ)
[here λ ∼ π
r312/G(m1 +m2) is the gravitational wave-
length]. These data were incomplete in the sense that
they did not include the correct TT radiative piece in
the metric, and thus did not contain realistic gravita-
tional waves.
In this paper, we revisit the PN data problem in
ADM-TT coordinates, with the aim of supplying Numer-
ical Relativity with initial BHB data that extend as far
as necessary, and contain realistic gravitational waves.
To do this, we have evaluated the “missing pieces” of
Schäfer’s TT metric for the case of two non-spinning par-
ticles. We have analyzed the near- and far-zone behavior
of these data, and incorporated them numerically in the
Cactus [46] framework. In principle, the most accurate
PN metric available could be used at this step, but it is
not currently available in ADM-TT form.
The remainder of this paper is laid out as follows. In
Section II, we summarize the results of Schäfer (1985)
[43], and Jaranowski & Schäfer (1997) [44] and their ap-
plication by Tichy et al. (2003) [45], to the production
of puncture data for numerical evolution. In Section
III, we describe briefly the additional terms necessary
to complete hTT to order (v/c)4, deferring details to the
Appendix. In Section IV, we study the full data both
analytically and numerically. Section V summarizes our
results, and lays the groundwork for numerical evolution
of these data, to be presented in a subsequent article.
II. ADM-TT GAUGE IN POST-NEWTONIAN
The “ADM-TT” gauge [43, 47] is a 3+1 split of data
where the three-metric differs from conformal flatness
precisely by a TT radiative part:
gij =
ηij + h
ij , (1)
πii = 0. (2)
The fields φ, πij and hTTij can all be expanded in a post-
Newtonian series. Solving the constraint equations of
3+1 general relativity in this gauge, [43, 44] obtained ex-
plicit expressions valid up to O(v/c)5 in the near zone,
incorporating an arbitrary number of spinless point par-
ticles, with arbitrary masses mA. For N particles, the
lowest-order contribution to the conformal factor is1:
φ(2) = 4G
, (3)
where rA =
~x− ~xA is the distance from the field point
to the location of particle A.
In principle hTTij is computed from
hTTij = −δTT klij ✷−1retskl, (4)
where ✷−1ret is the (flat space) inverse d’Alembertian (with
a “no-incoming-radiation” condition [48]), skl is a non-
local source term and δTT klij is the TT-projection oper-
ator. In order to compute hTTij we first rewrite Eq. (4)
hTTij = −δTT klij
∆−1 + (✷−1ret −∆−1)
TT (NZ)
ij − δ
TT kl
ij (✷
ret −∆−1)skl. (5)
Note that the near-zone approximation h
TT (NZ)
ij of h
has already been computed in [43] up to order O(v/c)4
(see also Eq. 12 below). The last term in Eq. (5) is diffi-
cult to compute because
skl = 16πG
pAk pAl
δ(x− xA) +
,l (6)
is a non-local source. However, we can approximate skl
s̄kl =
pAk pAl
B 6=A
nABk nABl
×16πGδ(x− xA). (7)
and show that
hTTij,(div) = −δ
TT kl
ij (✷
ret−∆−1)(skl− s̄kl) ∼ O(v/c)5 (8)
1 We explicitly include the gravitational constant G in all expres-
sions here, as the standard convention G = 1 used in Numerical
Relativity differs from the convention 16πG = 1 employed by
[43, 44].
in the near zone. Furthermore, outside the near zone
ij,(div)
∼ 1/r2, so that hTT
ij,(div)
falls off much faster than
rest of hTTij , which falls off like 1/r. Hence
hTTij = h
TT (NZ)
ij − δ
TT kl
ij (✷
ret −∆−1)s̄kl + hTTij,(div), (9)
where hTT
ij,(div)
can be neglected if we only keep terms up
to O(v/c)4 generally, and O(1/r) at infinity.
The full expression for hTTij for N interacting point
particles from Eq. (4.3) of [43] is:
hTTij = h
TT (NZ)
ij + h
ij,(div) + 16πG
d3~k dω dτ
(2 π)4
pAi pAj
B 6=A
nABi nABj
× (ω/k)
~k·(~x−~xA)−i ω (t−τ)
k2 − (ω + i ǫ)2
. (10)
The first term in (10), h
TT (NZ)
ij can be expanded in v/c
TT (NZ)
ij = h
TT (4)
ij + h
TT (5)
ij +O(v/c)
6. (11)
The leading order term at O(v/c)4, is given explicitly by
Eq. (A20) of [44]:
hTT (4)ij =
mA rA
‖ ~pA ‖2 −5 (n̂A · ~pA)2
δij + 2 piA p
3(n̂A · ~pA)2 − 5 ‖ ~pA ‖2
niA n
A + 12(n̂A · ~pA)n
B 6=A
niABn
AB + 2
rA + rB
niA n
rABrA
+ 3rA
rABrA
, (12)
where sAB ≡ rA + rB + rAB. The other two terms
in Eq. (10) can be shown to be small in the near zone
(r ≪ λ, where the characteristic wavelength λ ∼ 100M
for rAB ∼ 10M). However, hTT (NZ)ij is only a valid ap-
proximation to hTTij in the near zone, and becomes highly
inaccurate when used further afield.
Setting aside these far-field issues, Tichy et al. [45] ap-
plied Schäfer’s formulation, in the context of a black-hole
binary system, to construct initial data that are accurate
up to O(v/c)5 in the near zone. They noted that the
ADM-TT decomposition was well-adapted to the use of
a puncture approach to handle black-hole singularities.
This approach is essentially an extension of the method
introduced in [10]. It allows a simple numerical treatment
of the black holes without the need for excision.
The PN-based puncture data of Tichy et al. have
not been used for numerical evolutions. This is in part
because these data, just like standard puncture data [10,
49, 50, 51], do not contain realistic gravitational waves
in the far zone: h
TT (NZ)
ij does not even vaguely agree
with the 2PN approximation to the waveform amplitude
nor with the quadrupole approximation to the waveform
phase for realistic inspiral.
To illustrate this, we restrict to the case of two point
sources, and compute the “plus” and “cross” polariza-
tions of the near-zone approximation for hTTij :
+ = h
TT (NZ)
θ, (13)
× = h
TT (NZ)
φ. (14)
For comparison, the corresponding polarizations of the
quadrupole approximation for the gravitational-wave
strain are given by (paraphrasing Eq. (3.4) of [52]):
(1+cos2 θ)(πGMfGW)2/3cos(ΦGW), (15)
cos θ(πGMfGW)2/3sin(ΦGW), (16)
where M ≡ ν3/5M is the “chirp mass” of the binary,
given in terms of the total PN mass of the system M =
m1 +m2, and the symmetric mass ratio ν = m1m2/M
The angle θ is the “inclination angle of orbital angular
momentum to the line of sight toward the detector”; that
is, just the polar angle to the field point, when the binary
moves in the x-y plane. ΦGW and fGW are the phase and
frequency of the radiation at time t, exactly twice the
orbital phase Φ(t− r) and orbital frequency Ω(t− r)/2π.
The lowest-order PN prediction for radiation-reaction
effects yields a simple inspiral of the binary over time,
with orbital phasing given by
Φ(τ) = Φ(tc)−
Θ5/8, (17)
Ω(τ) =
Θ−3/8, (18)
where Θ ≡ ν (tc − τ)/5GM , M and ν are given below
(16), and tc is a nominal “coalescence time”. To evaluate
(13-14), we need the transverse momentum p correspond-
ing to the desired separation r12. The simplest expression
for this is the classical Keplerian relation, which we give
parameterized by Ω(τ):
r12 = G
1/3M(MΩ)−2/3, (19)
p = Mν(GMΩ)1/3. (20)
In Fig. 1 we compare the plus polarization of the two
waveforms (13) and (15) at a field point r = 100M ,
θ = π/4, φ = 0, for a binary in the x-y plane, with ini-
tial separation r12 = 10M . The orbital frequency of the
binary is related to the separation r12 and momenta p en-
tering (13) by (19-20). To this level of approximation, the
binary has a nominal PN coalescence time tc ≈ 780M .
As might have been anticipated, both phase and ampli-
tude of h
TT (4)
ij are wrong outside the near zone. This
means that the data constructed from h
TT (4)
ij have the
wrong wave content, but nevertheless these data are still
accurate up to order (v/c)3 in the far zone.
It is evident from the present-time dependence of (12)
that it cannot actually contain any of the past history
of an inspiralling binary. We would expect that a cor-
rect “wave-like” contribution should depend rather on
the retarded time of each contributing point source. It
seems evident that the correct behavior must, in fact, be
contained in the as-yet unevaluated parts of (10). The
requisite evaluation is what we undertake in the next sec-
tion.
III. COMPLETING THE EVALUATION OF hTTij
To move forward, we will simplify (10) and (12) to the
case of only two particles. Then (10) reduces to:
hTTij = h
TT (NZ)
ij + 16πG
p1 i p1 j
~k ·(~x−~x1) +
p2 i p2 j
~k ·(~x−~x2) − G
n12i n12j
~k ·(~x−~x1)
n21i n21j
~k ·(~x−~x2)
· (ω/k)
2 e−i ω (t−τ)
k2 − (ω + i ǫ)2
d3~k dω dτ
(2 π)4
+ hTTij,(div) (21)
TT (NZ)
ij +H
+HTT2ij
−HTT1ij
Gm1m2
2 r12
−HTT2ij
Gm1m2
2 r12
+hTTij,(div), (22)
where
HTTAij [~u] := 16πG
d3~k dω
(2 π)4
[ui uj ]
(ω/k)2
k2 − (ω + i ǫ)2
~k·(~x−~xA(τ)) e−i ω (t−τ). (23)
Here, the “TT projection” is effected using the operator
i := δ
i − ki kj/k2. For an arbitrary spatial vector ~u,
[ui uj ]
TT = uc ud (P
Pij P
= ui uj +
ki kj
u(i kj)
. (24)
Details on the evaluation of these terms are presented
in Appendix A. After calculation, we write the result as
a sum of terms evaluated at the present field-point time
t, the retarded time trA defined by
t− trA − rA(trA) = 0, (25)
and integrals between trA and t,
TTA[~u] = H
TTA[~u; t] +H
TTA[~u; t
TTA[~u; t
A → t], (26)
where the three parts are given by:
0 100 200 300 400 500 600 700 800
-0.0015
-0.001
-0.0005
0.0005
0.001
 (quadrupole)
 (NZ)
FIG. 1: Plus polarization of the quadrupole (black/solid) and
near-zone (red/dashed) strains observed at field point r =
100M , θ = π/4, φ = 0. The binary orbits in the x-y plane,
with initial separation r12 = 10M , and a nominal coalescence
time tc ≈ 780M . Both phase and amplitude of h
TT (4)
very wrong outside the near zone.
TTA[~u; t] = −
rA(t)
u2 − 5 (~u · n̂A)2
δi j + 2 ui uj +
3 (~u · n̂A)2 − 5 u2
niA n
+12 (~u · n̂A)u(i nj)A
, (27)
TTA[~u; t
−2 u2 + 2 (~u · n̂A)2
δi j + 4 ui uj +
2 u2 + 2 (~u · n̂A)2
niA n
−8 (~u · n̂A)u(i nj)A
, (28)
TTA[~u; t
A → t] = −G
(t− τ)
rA(τ)3
−5 u2 + 9 (~u · n̂A)2
δi j + 6 ui uj − 12 (~u · n̂A)u(i nj)A
9 u2 − 15 (~u · n̂A)2
niA n
(t− τ)3
rA(τ)5
u2 − 5 (~u · n̂A)2
δi j + 2 ui uj − 20 (~u · n̂A)u(i nj)A
−5 u2 + 35 (~u · n̂A)2
niA n
. (29)
In Fig. 2, we show the retarded times calculated for
each particle, as measured at points along the x axis, for
the same orbit as in Fig. 1. We also show the corre-
sponding retarded times for a binary in an exactly cir-
cular orbit. Since the small-scale oscillatory effect of the
finite orbital radius would be lost by the overall linear
trend, we have multiplied by the orbital radius.
A. Reconciling with Jaranowski & Schäfer’s h
TT (4)
From the derivation above it is clear that hTTij includes
retardation effects, so it will not depend solely on the
present time. We might even expect that all “present-
time” contributions should vanish individually, or should
cancel out. It can be seen easily from (27) that the “t”
part of the second and third terms of Eq. (22) exactly
cancel out the “kinetic” part (first line) of Eq. (12). Thus,
we can simply remove that line in Eq. (12), and use the
500 1000 1500
particle 1 (circular)
particle 1 (inspiral)
particle 2 (circular)
particle 2 (inspiral)
FIG. 2: Retarded times for particles 1 and 2, as measured by
observers along the x axis at the initial time t = 0, for the
binary of Fig. 1. To highlight the oscillatory effect of the
finite-radius orbit on tr, we first divide by the average field
distance r.
“tr” part instead. One may similarly inquire whether the
“t” parts of the fourth and fifth terms of Eq. (22) above,
TT (pot,now)
ij ≡ −H
Gm1m2
2 r12
n̂12; t
−HTT2ij
Gm1m2
2 r12
n̂12; t
, (30)
also cancel the remaining, “potential” parts of Eq. (12).
The answer is “not completely”; expanding in powers of
1/r, we find:
TT (pot,4)
ij + h
TT (pot,now)
G2m1m2 r12
16 r3
(3 + 14W 2 − 25W 4) δi j − 4 (1 + 5W 2)n12i n12j
−5 (1 + 6W 2 − 7W 4)n1i n1j + 2W (7 + 9W 2) (n12i n1j + n12j n1i)
+O(1/r4),(31)
where W ≡ sin θ cos(φ − Φ(t)), and Φ(t) is the orbital
phase of particle 1 at the present time t. That is, the
“new” contribution cancels the 1/r and 1/r2 pieces of
TT (4)
ij entirely. In the far zone the result is thus smaller
than the hTT
ij,(div)
term which we are ignoring everywhere,
since it is small both in the near and the far zone [43].
We note here two general properties of the contribu-
tions to the full hTTij .
1. In the near zone h
TT (4)
ij is the dominant term since
all other terms arise from (✷−1ret − ∆−1)skl. Thus
all other terms must cancel within the accuracy of
the near-zone approximation.
TT (4)
ij is wrong far from the sources; thus, the
new corrections should “cancel” h
TT (4)
ij entirely, far
from sources. Note, however, that while hij =
−✷−1retskl depends only on retarded time, its TT-
projection hTTij = δ
TT kl
ij hkl has a more complicated
causal structure; E.g. the finite time integral comes
from applying the TT-projection. [Proof: Even if
we had a source given exactly by s̄kl, h
TT (4)
ij would
depend only the present time, hij would depend
only on retarded time, and hTTij would (as we have
computed) contain a finite time integral term.]
Additionally, the full hTTij agrees well with quadrupole
predictions, which we demonstrate in Section IV.
IV. NUMERICAL RESULTS AND INVARIANTS
A. Phasing and Post-Keplerian Relations
It has been known for some time (see for example [53])
that gravitational wave phase plays an even more impor-
tant part in source identification than does wave ampli-
tude. In PN work, phase and amplitude are estimated
somewhat separately; the amplitude requires knowledge
of the time-dependent multipoles, used in developing the
the full metric, while the phase can be relatively simply
approximated from the orbital equations of motion, tak-
ing into account the gravitational wave flux at infinity to
evolve the orbital parameters [54].
The quadrupole waveform introduced for the compar-
ison in Fig. 1 had an amplitude accurate to O(v/c)4
and the simplest available time evolution for the phase.
Waveform phase is a direct consequence of orbital phase.
To lowest order, we could have assumed a binary mov-
ing in a circular orbit (of zero eccentricity) since, up to
2PN order, we can have circular orbits, where the linear
momentum, p, of each particle is related to the separa-
tion r12 by, say, Eq. (24) of [45]. Nevertheless, circular
orbits are physically unrealistic – since radiation reac-
tion will lead to inspiral and merger of the particles –
and Eqs. (17-18) already include leading-order radiation-
reaction effects. Moreover, the phase errors that would
accrue from using purely circular orbits would be larger,
the further from the sources we tried to compute them.
The calculations of section III lead to waveform am-
plitudes that are accurate at O(v/c)4 everywhere. How-
ever, we desire that our initial-data wave content already
encode the phase as accurately as possible. Highly ac-
curate phase for our initial data (via hTT), and hence in
the leading edge of the waveforms we would extract from
numerical evolution, is critical for parameter estimation
following a detection.
For demonstrative purposes, in this section, we will
restrict ourselves to the simplest phasing relations con-
sistent with radiation-reaction inspiral as given by Eqs.
(17-18), while using higher-order PN expressions than
Eqs. (19 -20) for relating the orbit to the phase. For ex-
ample, from [55], we have found to second PN (beyond
leading) order:
r12(Ω)
= (GMΩ)−2/3 − (3− ν)
− (18− 81ν − 8ν
(GMΩ)2/3, (32)
= (GMΩ)1/3 +
(15− ν)
(GMΩ)
(441− 324ν − ν2)
(GMΩ)5/3, (33)
and we note that higher-order equivalents of these can
be computed from [56].
In the numerical construction of initial data, the pri-
mary input is the coordinate separation of the holes. In
placing the punctures on the numerical grid, the separa-
tion must be maintained exactly. To ensure this, we in-
vert Eq. (32) to obtain the exact Ωr corresponding to our
desired r12. Then we use Eq. (18) with t = 0 to find the
coalescence time tc that yields this Ωr. Once we have ob-
tained tc, we then find the orbital phase Φ and frequency
Ω at any source time τ directly from Eqs. (17-18), and
the corresponding separation r12 and momentum p from
Eqs. (32-33), or their higher-order equivalents.
In Fig. 3, we show a representative component of the
retarded-time part of hTTij for both circular and leading-
order inspiral orbits. For both orbits, we use the ex-
tended Keplerian relations (32) and (33); otherwise the
orbital configuration is that of Fig. 1. The coalescence
time is now tc ∼ 1100M . We can see that the cumu-
lative wavelength error of the circular-orbit assumption
becomes very large at large distances from the sources.
This demonstrates that using inspiral orbits instead of
circular orbits will significantly enhance the phase accu-
racy of the initial data, even though circular orbits are
in principle sufficient when we include terms only up to
O(v/c)4 as done in this work. From now on we use only
inspiral orbits.
0 500 1000 1500
circular
inspiral
FIG. 3: The xx component of the full hTTij for a binary with
initial separation r12 = 10M in a circular (black/solid) or
inspiralling (red/dashed) orbit. Both fields have been rescaled
by the observer radius r = z to compensate for the leading
1/r fall-off. The orbital configuration is the same as for Fig.
1, apart from the Keplerian relations, where we have used the
higher-order relations (32-33), yielding tc ∼ 1100M . Note the
frequency broadening at more distant field points.
0 100 200 300 400 500 600 700 800
-0.0015
-0.001
-0.0005
0.0005
0.001
 (quadrupole)
 (full)
 (quadrupole)
 (full)
FIG. 4: Plus and cross polarizations of the strain observed at
field point r = 100M , θ = π/4, φ = 0. Both the quadrupole-
approximation waveform (black/solid and green/dot-dashed)
and the full (red/dashed and blue/dotted) waveforms coming
from hTTij are shown. The orbital configuration is the same as
for Fig. 1.
Next, we compare our full waveform hTTij (expressed
as the combinations h+ and h×) at an intermediate-field
position (r = 100M , θ = π/4, φ = 0) to the lowest-order
quadrupole result. In Fig. 4, the orbital configuration
is the same as for Fig. 1. As one can see, both the
+ and × polarizations of our hTTij agree very well with
quadrupole results, as they should. We demonstrate the
near- and intermediate-zone behavior of the new data on
50 100 150 200 250 300 350 400 450 500
-0.05
-0.05
quadrupole
FIG. 5: Plus and cross polarizations of the strain observed at
t = 0 along the z axis. We show the near-zone (solid/black),
the quadrupole (dashed/red) and full (dot-dashed/green)
waveforms. All waveforms have been rescaled by the observer
radius r = z to compensate for the leading 1/r fall-off. The
orbital configuration is the same as for Fig. 1.
the initial spatial slice in Fig. 5. The quadrupole and
full solutions agree very well outside ∼ 100M . However,
the full solution’s phase and amplitude approach the NZ
solution closer to the sources.
B. Numerical Implementation
After having confirmed that we have a PN three-metric
gij that is accurate up to errors of order O(v/c)
5, and
that correctly approaches the quadrupole limit outside
the near zone, we are now ready to construct initial data
for numerical evolutions. In order to do so, we need the
intrinsic curvature Kij , which can be computed as in
Tichy et al. [45] from the conjugate momentum. The
difference is that here we use the full ḣTTij instead of the
near-zone approximation ḣ
TT (4)
ij to obtain the conjugate
momentum [43]. The result is
Kij = −ψ−10PN
ḣTTij + (φ(2)π̃
+O(v/c)6, (34)
where the error term comes from neglecting terms like
ij,(div)
at O(v/c)5 in hTTij , and where ψPN , π̃
φ(2) can be found in Tichy et al. [45]. An additional
difference is that the time derivative of hTTij is evaluated
numerically in this work. Note that the results for gij
are accurate up to O(v/c)4, while the results for Kij are
0 0.5 1 1.5
 = 10M
 = 20M
 = 50M
 = 100M
0 0.5 1 1.5
FIG. 6: Upper panel: Hamiltonian constraint violation along
the y axis of our new data in the near zone, as a function of
binary separation r12. Lower panel: Momentum constraint
(y-component) violation of the same data along the x axis.
The orbital configuration is that of Fig. 3. Distances have
been scaled relative to r12, so that the punctures are initially
at y/r12 = ±0.5.
accurate up O(v/c)5, because Kij contains an additional
time derivative [45, 57, 58].
Next we show the violations of the Hamiltonian and
momentum constraints computed from gij and Kij , as
functions of the binary separation r12. As we can see in
both panels of Fig. 6, the constraints become smaller for
larger separations, because the post-Newtonian approxi-
mation gets better. Note that, as in [45], the constraint
violation remains finite everywhere, and is largest near
each black hole.
C. Curvature Invariants and Asymptotic Flatness
In analysis of both initial and evolved data, it is often
instructive to investigate the behavior of scalar curva-
ture invariants, as these give some idea of the far-field
properties of our solution. We expect, for an asymptoti-
cally flat space-time, that in the far field, the speciality
index S ≡ 27J 2/I3 will be close to unity. This can
be seen from the following arguments. Let us choose a
tetrad such that the Weyl tensor components ψ1 and ψ3
are both zero. Further, we assume that in the far field
ψ0 and ψ4 are both perturbations of order ǫ off a Kerr
background. Then
S ≈ 1− 3ψ0ψ4
+O(ǫ3), (35)
which is indeed close to one. Note however, that this
argument only works if the components of the Weyl ten-
sor obey the peeling theorem, such that ψ2 ∼ O(r−3),
ψ0 ∼ O(r−5) and ψ4 ∼ O(r−1). In particular, if ψ0 falls
off more slowly than O(r−5), S will grow for large r.
Now observe that ψ0 ∼ O(r−5) ∼ M3/r5 is formally of
O(v/c)6. Thus, in order to see the expected behavior of
S ≈ 1 in the far-field we need to go to O(v/c)6. If we only
go to O(v/c)4 (as done in this work) ψ0 consists of un-
controlled remainders only, which should in principle be
dropped. When we numerically compute S we find that
for our data, S deviates further and further from unity
for large distances from the binary. This reflects the fact
that the so-called “incoming” Weyl scalar ψ0 only falls
off as 1/r3, due to uncontrolled remainders at O(v/c)6,
which arise from a mixing of the background with the
TT waveform.
V. DISCUSSION AND FUTURE WORK
Exploring and validating PN inspiral waveforms is cru-
cially important for gravitational-wave detection and for
our theoretical understanding of black-hole binaries. Our
goal has been to provide a step forward in this under-
standing by building a direct interface between the PN
approach and numerical evolution, along the lines ini-
tially outlined in Ref. [45]. In this paper we have
essentially completed the calculation of the transverse-
traceless part of the ADM-TT metric to O(v/c)4 pro-
vided in [45], yielding data that, on the initial Cauchy
slice, will describe the space-time into the far-field.
We have incorporated this formulation into a numerical
initial-data routine adapted to the “puncture” topology
that has been so successful recently, and have explored
these data’s numerical properties on the initial slice.
Our next step is to evolve these data with moving
punctures, and investigate how the explicit incorporation
of post-Newtonian waveforms in the initial data affects
both the ensuing slow binary inspiral of the sources and
the release of radiation from the system. We note es-
pecially that our data are non-conformally flat beyond
O(v/c)3. We expect our data to incorporate smaller
unphysical initial distortions in the black holes than is
possible with conformal flatness, and hence less spurious
gravitational radiation during the numerical evolution.
We see this as a very positive step toward providing fur-
ther validation of numerical relativity results for multiple
orbit simulations, since it permits comparison with PN
results where they are expected to be reliable. Our initial
data will also allow us to fully evaluate the validity of PN
results for merging binaries by enabling comparison with
the most accurate numerical relativity results.
We expect that further development of these data will
certainly involve the use of more accurate orbital phas-
ing information than the leading order given by Eqs. (17-
18). This information is available in radiative coordinates
(see, e.g. Eq. (6.29) of [59]) appropriate for far-field eval-
uation of the gravitational radiative modes; it may be
possible to produce them in ADM-TT coordinates via
a contact transformation, or by direct calculation (see,
e.g. [60]). For initial separations similar to the fiducial
test case of this paper, r12 = 10M , the order necessary
for clean matching of the initial wave content with the
new radiation generated in evolution should not be par-
ticularly high [26]. As noted, the Keplerian relations
Eqs. (32-33) can easily be extended to higher PN order.
The data presented already allow for arbitrary initial
mass ratios ν; this introduces the possibility of significant
gravitational radiation in odd-l multipoles, together with
associated phenomena, such as in-plane recoil “kicks”.
An interesting future development of these data will be
the inclusion of spin angular momenta on the pre-merger
holes. This will open our initial-data prescription to de-
scribing an even richer spectrum of binary radiation.
Acknowledgments
We would like to thank L. Blanchet and G. Schäfer for
generous assistance and helpful discussion.
M.C., B.K. and B.W. gratefully acknowledge the sup-
port of the NASA Center for Gravitational Wave Astron-
omy (NAG5-13396). M.C. and B.K. also acknowledge the
NSF for financial support under grants PHY-0354867 and
PHY-0722315. B.K. also acknowledges support from the
NASA Postdoctoral Program at the Oak Ridge Associ-
ated Universities. The work of W.T. was supported by
NSF grant PHY-0555644. W.T. also acknowledges par-
tial support from the NCSA under Grant PHY-060040T.
The work of B.W. was also supported by NSF grants
PHY-0245024 and PHY-0555484.
APPENDIX A: DETAILS OF INTEGRAL
CALCULATION
Here we present some more details of the calculations
that lead to the three contributions to Eq. (23): Eqs.
(27-29). Inserting Eq. (24) in the general integral (23),
we can write H
TTA[~u] as a combination of scalar and
tensor terms:
HTTAij [~u] = 16πG
ui uj −
Iij A +
[uc ud
IcdA δij
2 uc u(i
Icj)A +
[uc ud
I cdij A
,(A1)
where the “I” integrals are defined as:
d3~k dω
(2 π)4
(ω/k)2 ei k rA cos θ−i ω T
k2 − (ω + i ǫ)2
, (A2)
d3~k dω
(2 π)4
ki kj
× (ω/k)
2 ei k rA cos θ−i ω T
k2 − (ω + i ǫ)2
, (A3)
i j c d
d3~k dω
(2 π)4
ki kj kc kd
× (ω/k)
2 ei k rA cos θ−i ω T
k2 − (ω + i ǫ)2
. (A4)
Here T ≡ t − τ , and ~rA ≡ ~x − ~xA. We have also taken
our integration coordinates such that ~rA lies in the z di-
rection, so that the dummy momentum vector ~k satisfies
~k · ~rA = k rA cos θ, (A5)
d3~k = k2 dk sin θ dθ dφ. (A6)
Define the unit orthogonal vectors n̂A ≡ (0, 0, 1) , ℓ̂ ≡
(cosφ, sinφ, 0). Then we can write
~k = k cos θ n̂A + k sin θ ℓ̂ ⇒ ~k · ~rA = rA ~k · n̂A.
We can also define a projector tensor onto ℓ̂:
Qa b ≡ δa b − na nb ⇒ Qab = δab − na nb
⇒ QacQcb = Qab , Qab nb = 0 , Qab ℓb = ℓa.
1. Angular integration
We will neglect the A subscript for now, until it be-
comes relevant again. To calculate the integrals (A2-A4),
we begin with the φ integration. The only φ dependence
comes from the ~ℓ parts of the ~k terms. It can be seen
from elementary trigonometric integrals that:
dφ ℓa =
dφ ℓa ℓb ℓc = 0,
dφ ℓa ℓb = π Qa b,
dφ ℓa ℓb ℓc ℓd =
Qa bQc d +Qa cQb d +Qa dQb c
We use these to calculate the φ integrals for Ia bA and
Ia b c dA . Define w ≡ cos θ. Then
dφ 1 = 2 π,
ka kb
= 2 π w2 na nb + π (1− w2)Qa b,
ka kb kc kd
= 2 π w4 na nb nc nd
+6 πw2 (1− w2)Q(a b nc nd)
(1− w2)2Q(a bQc d).
So the next integrals will differ in their θ dependence,
contained in the powers of w above. The θ integrals will
contain the following basic types:
g0(a) ≡
dw eaw = 2
sinh a
, (A7)
g2(a) ≡
dw w2 eaw = 2
sinh a
− 4 cosha
sinh a
g4(a) ≡
dw w4 eaw = 2
sinh a
− 8 cosha
sinh a
− 48 cosha
sinh a
. (A9)
Now Ia b and Ia b c d can be written as the linear combi-
nations:
Ia b =
Qa b I
(na nb − 1
Qa b)K
, (A10)
Ia b c d =
na nb nc nd − 3Q(a b nc nd) + 3
Q(a bQc d)
3Q(a b nc nd) −
Q(a bQc d)
Q(a bQc d) I
. (A11)
I here can be expressed in terms of g0(a) above:
(2 π)3
(ω/k)2
k2 − (ω + i ǫ)2 e
i k r cos θ−i ω T
(2 π)3
ω2 e−i ω T
k2 − (ω + i ǫ)2
g0(i k r)
(2 π)3
ω2 e−i ω T J0. (A12)
The 1/2 factor is because we moved to integrating k over
the whole real line instead of the positive half-line (this
is permissible as gn(a) is an even function of a). K and L
are defined analogously to I, but with extra even powers
of cos θ = w:
(2 π)3
(ω/k)2
k2 − (ω + i ǫ)2
ei k r cos θ−i ω T cos2 θ
(2 π)3
ω2 e−i ω T
k2 − (ω + i ǫ)2 g2(i k r)
(2 π)3
ω2 e−i ω T J2, (A13)
(2 π)3
(ω/k)2
k2 − (ω + i ǫ)2
ei k r cos θ−i ω T cos4 θ
(2 π)3
ω2 e−i ω T
k2 − (ω + i ǫ)2
g4(i k r)
(2 π)3
ω2 e−i ω T J4. (A14)
2. Momentum integration
Now we address the k integrals, defined as:
dk fn(k) =
dk f+n (k) +
dk f−n (k),
where we collect the positive exponents in the gn in the
integrand of f+n (k), and the negative exponents in f
n (k):
f+n (k) ≡
g+n (i k r)/2
k2 − (ω + i ǫ)2
, f−n (k) ≡
g−n (i k r)/2
k2 − (ω + i ǫ)2
We calculate this as the sum of contour integrals of the
“plus” and “minus” integrands (necessary, as the oppo-
site signs require different contours). Each of these has
poles at k = 0, k = k+ ≡ ω + i ǫ, and k = k− ≡ −ω − i ǫ
(the first of these is from the gn). We integrate the “plus”
integrands anticlockwise around the contour C1, and the
“minus” integrands anticlockwise around the contour C2
(see Fig. 7); taking the limit |k| → ∞, the contribu-
tion from the curved segments vanishes, and the residue
theorem gives us:
Jn = 2 π iRes[f
n , k+]− 2 π iRes[f−n , k−]
+π iRes[f+n , 0]− π iRes[f−n , 0]. (A15)
Calculating the residues, we find the values of each of
the Jn:
π ei r (ω+i ǫ)
r (ω + i ǫ)2
r (ω + i ǫ)2
, (A16)
π ei r (ω+i ǫ)
r (ω + i ǫ)2
π ei r(ω+i ǫ) [−2 + 2 i r (ω + i ǫ)]
r3 (ω + i ǫ)4
r3 (ω + i ǫ)4
, (A17)
π ei r (ω+i ǫ)
r (ω + i ǫ)2
4π ei r(ω+i ǫ)
r5 (ω + i ǫ)6
[6− 6 i r (ω + i ǫ)
−3 r2 (ω + i ǫ)2 + i r3 (ω + i ǫ)3
− 24 π
r5 (ω + i ǫ)6
. (A18)
3. Frequency integration
Now we perform the ω integration. Inserting the re-
sults (A16-A18) into (A12-A14) respectively, we see that
each of I, K and L contains a delta function, which we
can extract:
4 π r
[δ(T − r) − δ(T )],
4 π r
δ(T − r) + e−r ǫ
(2 π)3
e−iω (T−r) F2a(ω)
(2 π)3
e−i ω T F2b(ω),
4 π r
δ(T − r) + e−r ǫ
(2 π)3
e−iω (T−r) F4a(ω)
(2 π)3
e−i ω T F4b(ω),
where the new terms on the right-hand side come from
the Jn above, grouped by exponential, as that is what
determines the contours chosen during integration (see
Fig. 7):
F2a(ω) =
π ω2 [−2 + 2 i r (ω + i ǫ)]
r3 (ω + i ǫ)4
F2b(ω) =
2 π ω2
r3 (ω + i ǫ)4
F4a(ω) =
r5 (ω + i ǫ)6
[24− 24 i r (ω + i ǫ)
−12 r2 (ω + i ǫ)2 + 4 i r3 (ω + i ǫ)3
F4b(ω) = −
24 π ω2
r5 (ω + i ǫ)6
Now the residues are as follows (taking the ǫ→ 0 limit):
e−iω (T−r) F2a(ω),−i ǫ
2 π i T
e−iω T F2b(ω),−i ǫ
= −2 π i T
e−iω (T−r) F4a(ω),−i ǫ
4 π i T 3
e−iω T F4b(ω),−i ǫ
= −4 π i T
The only pole is at ω = −i ǫ, so if we can close the contour
in the upper half-plane, we will get zero.
• For T < 0, both the “a” and “b” integrals can be
closed in C1. Result: zero contribution.
• For 0 < T < r, the “a” integrals can be closed
in C1, but the “b” integrals must be closed in C2.
Result: “b” contribution.
• For T > r, both the “a” and “b” integrals must be
closed in C2. But then the “a” and “b” residues
cancel out. Result: zero contribution.
Thus the only interesting contribution happens in the
interval 0 < T < r ⇔ t− r(τ) < τ < t. In this case, the
final integrals yield
(2 π)3
e−iω (T−r) F2b(ω) = −
2 π r3
(2 π)3
e−iω (T−r) F4b(ω) = −
−ω − iǫ
ω + iǫ
FIG. 7: Contours needed to complete integration over k (left) and ω (right).
leading to the final result for K and L:
4 π r
δ(T − r)− 1
4 π r
δ(T ),
4 π r
δ(T − r)−Θ(T )Θ(r − T ) T
2 π r3
4 π r
δ(T − r)−Θ(T )Θ(r − T ) T
We use these to calculate the Ii j and Ii j k l:
Ii j =
ni nj
4 π r
δ(T − r)−Θ(T )Θ(r − T ) T
2 π r3
4 π r
δ(T ) + Θ(T )Θ(r − T ) T
2 π r3
,(A19)
Ii j k l =
ni nj nk nl
4 π r
δ(T − r) −Θ(T )Θ(r − T ) T
− 3Q(i j nk nl) Θ(T )Θ(r − T )
2 π r3
Q(i j Qk l)
4 π r
δ(T ) + Θ(T )Θ(r − T )
. (A20)
4. Time integration
The final integrations will be over the source time τ . The “crossing times” for the two Θ functions are τ = t and
τ = tr, where t is the present field time, and tr the corresponding retarded time defined by (25). Now taking a general
function y(τ), we find that
dτ IA y(τ) =
y(trA)
4 π rA(t
− y(t)
4 π rA(t)
A y(τ) =
niA n
4 π rA
4 π rA
3niA n
A − δ
) (t− τ) y(τ)
4 π rA(τ)3
i j k l
A y(τ) =
niA n
4 π rA
4 π rA
−3Q(i jA n
(t− τ)
2 π rA(τ)3
−niA n
A + 3Q
(t− τ)3
π rA(τ)5
y(τ).
These can now be substituted into the general integral (A1). We write the result as a sum of terms at the present
field-point time t, the retarded time trA, and interval terms between them,
TTA[~u] = H
TTA[~u; t] +H
TTA[~u; t
A] +H
TTA[~u; t
A → t],
TTA[~u; t] = −
rA(t)
ui uj − u
[uk ul
Qk lA δ
2 uk u
[uk ul
TTA[~u; t
ui uj − u
niA n
[uk ul
nkA n
2 uk u
[uk ul
niA n
TTA[~u; t
A → t] = −4G
(t− τ)
rA(τ)3
3niA n
A − δ
[uk ul
3nkA n
A − δk l
2 uk u
A − δj) k
[uk ul
(t− τ)3
rA(τ)5
[uk ul
niA n
A − 3Q
[1] R. Vogt, in Sixth Marcel Grossman Meeting on General
Relativity (Proceedings, Kyoto, Japan, 1991), edited by
H. Sato and T. Nakamura (World Scientific, Singapore,
1992), pp. 244–266.
[2] B. Abbott et al. (LIGO Scientific), Nucl. Instrum. Meth.
A517, 154 (2004), gr-qc/0308043.
[3] P. Bender et al., Tech. Rep. MPQ 233, Max-
Planck-Institut für Quantenoptik (1998), URL:
http://www.lisa-science.org/resources/talks-articles/mission/prephasea.pdf.
[4] K. Danzmann and A. Rudiger, Class. Quantum Grav.
20, S1 (2003).
[5] A. Buonanno, G. B. Cook, and F. Pretorius, Phys. Rev.
D 75, 124018 (2007), gr-qc/0610122.
[6] E. Berti, V. Cardoso, J. A. Gonzalez, U. Sperhake,
M. Hannam, S. Husa, and B. Brügmann (2007), arXiv:gr-
qc/0703053.
[7] F. Pretorius, Phys. Rev. Lett. 95, 121101 (2005), gr-
qc/0507014.
[8] M. Campanelli, C. O. Lousto, P. Marronetti, and
Y. Zlochower, Phys. Rev. Lett. 96, 111101 (2006), gr-
qc/0511048.
[9] J. G. Baker, J. Centrella, D.-I. Choi, M. Koppitz, and
J. van Meter, Phys. Rev. Lett. 96, 111102 (2006), gr-
qc/0511103.
[10] S. Brandt and B. Brügmann, Phys. Rev. Lett. 78, 3606
(1997), gr-qc/9703066.
[11] M. Shibata and T. Nakamura, Phys. Rev. D 52, 5428
(1995).
[12] T. Baumgarte and S. Shapiro, Phys. Rev. D 59, 024007
(1999), gr-qc/9810065.
[13] B. Brügmann, W. Tichy, and N. Jansen, Phys. Rev. Lett.
92, 211101 (2004), gr-qc/0312112.
[14] M. Campanelli, C. O. Lousto, and Y. Zlochower, Phys.
Rev. D 73, 061501(R) (2006), gr-qc/0601091.
[15] F. Pretorius, Class. Quantum Grav. 23, S529 (2006), gr-
qc/0602115.
[16] J. G. Baker, J. Centrella, D.-I. Choi, M. Koppitz, and
J. van Meter, Phys. Rev. D 73, 104002 (2006), gr-
qc/0602026.
[17] B. Brügmann et al. (2006), gr-qc/0610128.
[18] M. A. Scheel et al., Phys. Rev. D 74, 104006 (2006),
gr-qc/0607056.
[19] P. Marronetti, W. Tichy, B. Brügmann, J. Gonzalez,
M. Hannam, S. Husa, and U. Sperhake, Class. Quant.
Grav. 24, S43 (2007), gr-qc/0701123.
[20] W. Tichy, Phys. Rev. D 74, 084005 (2006), gr-
qc/0609087.
[21] H. P. Pfeiffer, D. A. Brown, L. E. Kidder, L. Lindblom,
G. Lovelace, and M. A. Scheel, Class. Quant. Grav. pp.
S59–S81 (2007), gr-qc/0702106.
[22] J. G. Baker, M. Campanelli, F. Pretorius, and Y. Zlo-
chower, Class. Quant. Grav. 24, S25 (2007), gr-
qc/0701016.
[23] J. Thornburg, P. Diener, D. Pollney, L. Rezzolla,
http://www.lisa-science.org/resources/talks-articles/mission/prephasea.pdf
E. Schnetter, E. Seidel, and R. Takahashi, Class. Quant.
Grav. 24, 3911 (2007), gr-qc/0701038.
[24] NRwaves home page:
https://gravity.psu.edu/wiki NRwaves.
[25] J. G. Baker, S. T. McWilliams, J. R. van Meter, J. Cen-
trella, D.-I. Choi, B. J. Kelly, and M. Koppitz, Phys.
Rev. D 75, 124024 (2007), gr-qc/0612117.
[26] J. G. Baker, J. R. van Meter, S. T. McWilliams, J. Cen-
trella, and B. J. Kelly (2006), gr-qc/0612024.
[27] M. Campanelli, Class. Quant. Grav. 22, S387 (2005),
astro-ph/0411744.
[28] F. Herrmann, D. Shoemaker, and P. Laguna (2006), gr-
qc/0601026.
[29] J. G. Baker et al., Astrophys. J. 653, L93 (2006), astro-
ph/0603204.
[30] M. Campanelli, C. O. Lousto, and Y. Zlochower, Phys.
Rev. D 74, 041501(R) (2006), gr-qc/0604012.
[31] M. Campanelli, C. O. Lousto, and Y. Zlochower, Phys.
Rev. D 74, 084023 (2006), astro-ph/0608275.
[32] M. Campanelli, C. O. Lousto, Y. Zlochower, B. Krishnan,
and D. Merritt, Phys. Rev. D 75, 064030 (2007), gr-
qc/0612076.
[33] J. A. Gonzalez, U. Sperhake, B. Bruegmann, M. Han-
nam, and S. Husa, Phys. Rev. Lett. 98, 091101 (2007),
gr-qc/0610154.
[34] F. Herrmann, I. Hinder, D. Shoemaker, P. Laguna, and
R. A. Matzner (2007), gr-qc/0701143.
[35] M. Campanelli, C. O. Lousto, Y. Zlochower, and D. Mer-
ritt, 659, L5 (2007), revised version has very different
numbers/formulae, gr-qc/0701164.
[36] M. Koppitz, D. Pollney, C. Reisswig, L. Rezzolla,
J. thornburg, P. Diener, and E. Schnetter, Phys. Rev.
Lett. 99, 041102 (2007), gr-qc/0701163.
[37] J. A. Gonzalez, M. D. Hannam, U. Sperhake,
B. Brügmann, and S. Husa, Phys. Rev. Lett. 98, 231101
(2007), gr-qc/0702052.
[38] D.-I. Choi et al. (2007), gr-qc/0702016.
[39] J. G. Baker et al. (2007), astro-ph/0702390.
[40] F. Pretorius and D. Khurana, Class. Quant. Grav. 24,
S83 (2007), gr-qc/0702084.
[41] M. Campanelli, C. O. Lousto, Y. Zlochower, and D. Mer-
ritt, Phys. Rev. Lett. 98, 231102 (2007), arXiv:gr-
qc/0702133.
[42] W. Tichy and P. Marronetti (2007), gr-qc/0703075.
[43] G. Schäfer, Ann. Phys. 161, 81 (1985).
[44] P. Jaranowski and G. Schäfer, Phys. Rev. D 57, 7274
(1998), errata: Phys. Rev. D 63, 029902(E) (2000), gr-
qc/9712075.
[45] W. Tichy, B. Brügmann, M. Campanelli, and P. Diener,
Phys. Rev. D 67, 064008 (2003), gr-qc/0207011.
[46] Cactus Compuational Toolkit,
http://www.cactuscode.org.
[47] T. Ohta, H. Okamura, T. Kimura, and K. Hiida, Prog.
Theor. Phys. 51, 1598 (1974).
[48] V. A. Fock, The Theory of Space, Time and Gravitation,
2nd ed. (Pergamon Press, 1964).
[49] W. Tichy, B. Brügmann, and P. Laguna, Phys. Rev. D
68, 064008 (2003), gr-qc/0306020.
[50] W. Tichy and B. Brügmann, Phys. Rev. D 69, 024006
(2004), gr-qc/0307027.
[51] M. Ansorg, B. Brügmann, and W. Tichy, Phys. Rev. D
70, 064011 (2004), gr-qc/0404056.
[52] L. S. Finn and D. F. Chernoff, Phys. Rev. D 47, 2198
(1993), gr-qc/9301003.
[53] C. Cutler et al., Phys. Rev. Lett. 70, 2984 (1993), astro-
ph/9208005.
[54] W. Tichy, E. E. Flanagan, and E. Poisson, Phys. Rev. D
61, 104015 (2000), gr-qc/9912075.
[55] G. Schäfer and N. Wex, Phys. Lett. A 174, 196 (1993).
[56] R.-M. Memmesheimer, A. Gopakumar, and G. Schäfer,
Phys. Rev. D 70, 104011 (2004), gr-qc/0407049.
[57] N. Yunes, W. Tichy, B. J. Owen, and B. Brügmann,
Phys. Rev. D 74, 104011 (2006), gr-qc/0503011.
[58] N. Yunes and W. Tichy, Phys. Rev. D 74, 064013 (2006),
gr-qc/0601046.
[59] L. Blanchet, Phys. Rev. D 54, 1417 (1996), gr-
qc/9603048.
[60] T. Damour, A. Gopakumar, and B. R. Iyer, Phys. Rev.
D 70, 064028 (2005), gr-qc/0404128.
http://www.cactuscode.org
ABSTRACT
  We present improved post-Newtonian-inspired initial data for non-spinning
black-hole binaries, suitable for numerical evolution with punctures. We
revisit the work of Tichy et al. [W. Tichy, B. Bruegmann, M. Campanelli, and P.
Diener, Phys. Rev. D 67, 064008 (2003)], explicitly calculating the remaining
integral terms. These terms improve accuracy in the far zone and, for the first
time, include realistic gravitational waves in the initial data. We investigate
the behavior of these data both at the center of mass and in the far zone,
demonstrating agreement of the transverse-traceless parts of the new metric
with quadrupole-approximation waveforms. These data can be used for numerical
evolutions, enabling a direct connection between the merger waveforms and the
post-Newtonian inspiral waveforms.

<|endoftext|><|startoftext|>
CLNS 07/1989
CLEO 07-01
Measurement of the Decay Constant f
using D+
→ ℓ+ν
M. Artuso,1 S. Blusk,1 J. Butt,1 S. Khalil,1 J. Li,1 N. Menaa,1 R. Mountain,1 S. Nisar,1
K. Randrianarivony,1 R. Sia,1 T. Skwarnicki,1 S. Stone,1 J. C. Wang,1 G. Bonvicini,2 D. Cinabro,2
M. Dubrovin,2 A. Lincoln,2 D. M. Asner,3 K. W. Edwards,3 P. Naik,3 R. A. Briere,4 T. Ferguson,4
G. Tatishvili,4 H. Vogel,4 M. E. Watkins,4 J. L. Rosner,5 N. E. Adam,6 J. P. Alexander,6
D. G. Cassel,6 J. E. Duboscq,6 R. Ehrlich,6 L. Fields,6 L. Gibbons,6 R. Gray,6 S. W. Gray,6
D. L. Hartill,6 B. K. Heltsley,6 D. Hertz,6 C. D. Jones,6 J. Kandaswamy,6 D. L. Kreinick,6
V. E. Kuznetsov,6 H. Mahlke-Krüger,6 D. Mohapatra,6 P. U. E. Onyisi,6 J. R. Patterson,6 D. Peterson,6
J. Pivarski,6 D. Riley,6 A. Ryd,6 A. J. Sadoff,6 H. Schwarthoff,6 X. Shi,6 S. Stroiney,6 W. M. Sun,6
T. Wilksen,6 S. B. Athar,7 R. Patel,7 J. Yelton,7 P. Rubin,8 C. Cawlfield,9 B. I. Eisenstein,9
I. Karliner,9 D. Kim,9 N. Lowrey,9 M. Selen,9 E. J. White,9 J. Wiss,9 R. E. Mitchell,10
M. R. Shepherd,10 D. Besson,11 T. K. Pedlar,12 D. Cronin-Hennessy,13 K. Y. Gao,13 J. Hietala,13
Y. Kubota,13 T. Klein,13 B. W. Lang,13 R. Poling,13 A. W. Scott,13 A. Smith,13 P. Zweber,13
S. Dobbs,14 Z. Metreveli,14 K. K. Seth,14 A. Tomaradze,14 J. Ernst,15 K. M. Ecklund,16
H. Severini,17 W. Love,18 V. Savinov,18 O. Aquines,19 A. Lopez,19 S. Mehrabyan,19 H. Mendez,19
J. Ramirez,19 G. S. Huang,20 D. H. Miller,20 V. Pavlunin,20 B. Sanghi,20 I. P. J. Shipsey,20
B. Xin,20 G. S. Adams,21 M. Anderson,21 J. P. Cummings,21 I. Danko,21 D. Hu,21 B. Moziak,21
J. Napolitano,21 Q. He,22 J. Insler,22 H. Muramatsu,22 C. S. Park,22 E. H. Thorndike,22 and F. Yang22
(CLEO Collaboration)
Syracuse University, Syracuse, New York 13244
Wayne State University, Detroit, Michigan 48202
Carleton University, Ottawa, Ontario, Canada K1S 5B6
Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
Enrico Fermi Institute, University of Chicago, Chicago, Illinois 60637
Cornell University, Ithaca, New York 14853
University of Florida, Gainesville, Florida 32611
George Mason University, Fairfax, Virginia 22030
University of Illinois, Urbana-Champaign, Illinois 61801
Indiana University, Bloomington, Indiana 47405
University of Kansas, Lawrence, Kansas 66045
Luther College, Decorah, Iowa 52101
University of Minnesota, Minneapolis, Minnesota 55455
Northwestern University, Evanston, Illinois 60208
State University of New York at Albany, Albany, New York 12222
State University of New York at Buffalo, Buffalo, New York 14260
University of Oklahoma, Norman, Oklahoma 73019
University of Pittsburgh, Pittsburgh, Pennsylvania 15260
University of Puerto Rico, Mayaguez, Puerto Rico 00681
Purdue University, West Lafayette, Indiana 47907
Rensselaer Polytechnic Institute, Troy, New York 12180
University of Rochester, Rochester, New York 14627
(Dated: November 1, 2018)
We measure the decay constant f
using the D+s → ℓ
+ν channel, where the ℓ+ designates either
a µ+ or a τ+, when the τ+ → π+ν. Using both measurements we find f
= 274 ± 13 ± 7 MeV.
Combining with our previous determination of fD+ , we compute the ratio fD+
/fD+ = 1.23±0.11±
0.04. We compare with theoretical estimates.
PACS numbers: 13.20.Fc, 13.66.Bc
To extract precise information on the size of CKM
matrix elements from Bd and Bs mixing measure-
ments the ratio of “decay constants,” that are re-
lated to the heavy and light quark wave-function
overlap at zero separation, must be well known [1].
Recent measurement of B0s mixing by CDF [2] has
shown the urgent need for precise numbers. De-
cay constants have been calculated for both B and
http://arxiv.org/abs/0704.0629v3
D mesons using several methods, including lattice
QCD [3]. Here we present the most precise measure-
ment to date of f
, and combined with our previ-
ous determination of fD+ [4, 5], we find fD+
/fD+ .
In the Standard Model (SM) purely leptonic Ds
decay proceeds via annihilation through a virtual
W+. The decay rate is given by [6]
s → ℓ
|Vcs|
where M
is the D+s mass, mℓ is the lepton mass,
GF is the Fermi constant, and |Vcs| is a CKM matrix
element with a value of 0.9738 [7].
In this Letter we report measurements of both
B(D+s → µ
+ν) and B(D+s → τ
+ν), when τ+ → π+ν
(D+s → π
+νν). More details are given in a compan-
ion paper [8]. The ratio Γ(D+s → τ
+ν)/Γ(D+s →
µ+ν) predicted in the SM via Eq. 1 depends only
on well-known masses, and equals 9.72; any devia-
tion would be a manifestation of new physics as it
would violate lepton universality [9]. New physics
can also affect the expected widths; any undiscov-
ered charged bosons would interfere with the SM
W+ [10].
The CLEO-c detector [11] is equipped to mea-
sure the momenta of charged particles, identify
them using dE/dx and Cherenkov imaging (RICH)
[12], detect photons and determine their directions
and energies. We use 314 pb−1 of data produced
in e+e− collisions using CESR near 4.170 GeV.
Here the cross-section for our analyzed sample,
D∗+s D
s , is ∼1 nb. Other charm produc-
tion totals ∼7 nb [13], and the underlying light-
quark “continuum” is ∼12 nb. We fully reconstruct
oneD−s as a “tag,” and examine the properties of the
D+s . (Charge conjugate decays are used.) Track se-
lection, particle identification, π0, η, and K0S criteria
are the same as those described in Ref. [4], except
that RICH identification now requires a minimum
momentum of 700 MeV/c.
Tag modes are listed in Table I. For resonance de-
cays we select intervals in invariant mass within ±10
MeV of the known mass for η′ → π+π−η, ±10 MeV
for φ → K+K−, ±100 MeV for K∗0 → K−π+, and
±150 MeV for ρ− → π−π0. We require tags to have
momentum consistent with coming from DsD
s pro-
duction. The distribution for the K+K−π− mode
(44% of all the tags) is shown in Fig. 1.
To select tags, we first fit the invariant mass dis-
tributions to the sum of two Gaussians centered
at MDs . The r.m.s. resolution (σ) is defined as
σ ≡ f1σ1 + (1 − f1)σ2, where σ1 and σ2 are the in-
dividual widths and f1 is the fractional area of the
FIG. 1: Invariant mass of K+K−π− candidates after
requiring the total energy to be consistent with the beam
energy. The curve shows a fit to a two-Gaussian signal
function plus a polynomial background.
TABLE I: Tagging modes and numbers of signal and
background events, within cuts, from two-Gaussian fits
to the invariant mass plots, and the number of γ tags in
each mode, within ±2.5σ from a fit to the signal Crys-
tal Ball function (see text) and a 5th order Chebychev
background polynomial and the associated background.
Mode Invariant Mass MM∗2
Signal Bkgrnd Signal Bkgrnd
K+K−π− 13871±262 10850 8053± 211 13538
− 3122±79 1609 1933±88 2224
ηπ− 1609± 112 4666 1024±97 3967
η′π− 1196±46 409 792±69 1052
φρ− 1678±74 1898 1050±113 3991
π+π−π− 3654±199 25208 2300±187 15723
K∗−K∗0 2030±98 4878 1298±130 5672
ηρ− 4142±281 20784 2195±225 17353
Sum 31302 ± 472 70302 18645±426 63520
first Gaussian. We require the invariant masses to be
within ± 2.5σ (±2σ for the ηρ− mode) of MDs . We
have a total of 31302±472 tag candidates. Then we
add a γ candidate that satisfies our shower shape re-
quirement. Regardless of whether or not the γ forms
a D∗s with the tag, for real D
sDs events, the missing
mass squared, MM∗2, recoiling against the γ and the
D−s tag should peak at M
. We calculate
MM∗2 = (ECM − EDs − Eγ)
−→pCM −
−→pDs −
where ECM (
−→pCM) is the center-of-mass energy (mo-
mentum), EDs (
−→pDs) is the energy (momentum) of
the fully reconstructed D−s tag, Eγ (
−→pγ) is the en-
ergy (momentum) of the additional γ. We use a
kinematic fit that constrains the decay products of
the D−s to MDs and conserves overall momentum
and energy. All γ’s in the event are used, except for
those that are decay products of the D−s tag.
The MM∗2 distribution from K+K−π− tags is
shown in Fig. 2. We fit all the modes individually
to determine the number of tag events. This proce-
dure is enhanced by having information on the shape
of the signal function. We use fully reconstructed
D−s D
s events, and examine the signal shape when
one Ds is ignored. The signal is fit to a Crystal Ball
function [14], which determines σ and the shape of
the tail. Though σ varies somewhat between modes,
the tail parameters don’t change, since they depend
on beam radiation and γ energy resolution.
FIG. 2: The MM∗2 distribution from events with a γ
in addition to the K+K−π− tag. The curve is a fit to
the Crystal Ball function and a 5th order Chebychev
background function.
Fits of MM∗2 in each mode when summed show
18645±426 events within a ±2.5σ interval (see Ta-
ble I). There is a small enhancement of (4.8± 1.0)%
in our ability to find tags in µ+ν (or π+νν) events
(tag bias) as compared with generic events. Addi-
tional systematic errors are evaluated by changing
the fitting range, using 4th and 6th order Chebychev
background polynomials, and allowing the parame-
ters of the tail of the fitting function to float, leading
to an overall systematic uncertainty of 5%.
Candidate µ+ν events are required to have only a
single additional track oppositely charged to the tag
with an angle >35.9◦ with respect to the beam line.
We also require that there not be any neutral en-
ergy cluster detected of more than 300 MeV, which
is especially useful to reject D+s → π
+π0 and ηπ+
decays. Since here we are searching for events in
which there is a single missing ν, the missing mass
squared, MM2, should peak at zero:
MM2 = (ECM − EDs − Eγ − Eµ)
−→pCM −
−→pDs −
−→pγ −
where Eµ (
−→pµ) are the energy (momentum) of the
candidate µ+ track.
We also make use of a set of kinematical con-
straints and fit each event to two hypotheses:
(1) the D−s tag is the daughter of a D
s and
(2) the D∗+s decays into γD
s . The kinemati-
cal constraints, in the center-of-mass frame, are
−→pDs +
−→pD∗
= 0, ECM = EDs + ED∗
, ED∗
ECM/2+
−M2Ds
/2ECM or EDs = ECM/2−
−M2Ds
/2ECM, MD∗
− MDs = 143.6 MeV.
In addition, we constrain the invariant mass of the
D−s tag to MDs . This gives a total of 7 constraints.
The missing ν four-vector needs to be determined,
so we are left with a three-constraint fit. We perform
an iterative fit minimizing χ2. To eliminate system-
atic uncertainties that depend on understanding the
absolute scale of the errors, we do not make a χ2 cut
but simply choose the γ and the decay sequence in
each event with the minimum χ2.
We consider three separate cases: (i) the track de-
posits < 300 MeV in the calorimeter, characteristic
of a non-interacting pion or a µ+; (ii) the track de-
posits > 300 MeV in the calorimeter, characteristic
of an interacting pion; or (iii) the track satisfies our
electron selection criteria. The separation between
muons and pions is not complete. Case (i) contains
99% of the muons but also 60% of the pions, while
case (ii) includes 1% of the muons and 40% of the
pions [5]. Case (iii) does not include any signal but
is used for background estimation. For cases (i) and
(ii) we insist that the track not be identified as an
electron or a kaon. Electron candidates have a match
between the momentum measured in the tracking
system and the energy deposited in the CsI calorime-
ter, and dE/dx and RICH measurements consistent
with this hypothesis.
For the µ+ν final state the MM2 distribution is
modeled as the sum of two Gaussians centered at
zero. A Monte Carlo (MC) simulation of the MM2
shows σ=0.025 GeV2 after the fit. We check the
resolution using the D+s → K
K+ mode. We search
for events with at least one additional track identi-
fied as a kaon using the RICH detector, in addition
to a D−s tag. The MM
2 resolution is 0.025 GeV2 in
agreement with the simulation.
In the π+νν final state, the extra missing ν re-
sults in a smeared MM2 distribution that is almost
triangular in shape starting near -0.05 GeV2, peak-
ing near 0.10 GeV2, and ending at 0.75 GeV2.
FIG. 3: The MM2 distributions from data usingD−s tags,
and one additional opposite-sign charged track and no
extra energetic showers, for cases (i), (ii), and (iii).
The MM2 distributions from data are shown in
Fig. 3. The overall signal region is -0.05 < MM2 <
0.20 GeV2. The upper limit is chosen to prevent
background from ηπ+ and K0π+ final states. The
peak in Fig. 3(i) is due to D+s → µ
+ν. Below 0.20
GeV2 in both (i) and (ii) we have π+νν events.
The specific signal regions are: for µ+ν, −0.05 <
MM2 < 0.05 GeV2, corresponding to ±2σ; for π+νν,
in case (i) 0.05 < MM2 < 0.20 GeV2 and in case (ii)
−0.05 < MM2 < 0.20 GeV2. In these regions we
find 92, 31, and 25 events, respectively.
We consider backgrounds from two sources: one
from real D+s decays and the other from the back-
ground under the single-tag signal peaks. For the
latter, we estimate the background from data using
side-bands of the invariant mass, shown in Fig. 1.
For case (i) we find 3.5 (properly normalized) back-
ground events in the µ+ν region and 2.5 back-
grounds in the τ+ν region; for case (ii) we find 3
events. Our total background estimate summing
over all of these cases is 9.0±2.3 events.
The background from real D+s decays is evaluated
by identifying specific sources. For µ+ν the only
possible background is D+s → π
+π0. Using a 195
pb−1 subsample of our data, we limit the branching
fraction as < 1.1 × 10−3 at 90% C.L. [8]. This low
rate coupled with the extra γ veto yields a negligible
contribution. The real D+s backgrounds for π
are listed in Table II. Using the SM expected ratio of
decay rates we calculate a contribution of 7.4 π+νν
events.
TABLE II: Event backgrounds in the π+νν sample from
real D+s decays.
Source B(%) case (i) case (ii) Sum
D+s → Xµ
+ν 8.2 0+1.8
0 0+1.8
D+s → π
+π0π0 1.0 0.03±0.04 0.08±0.03 0.11±0.04
D+s → τ
+ν 6.4
τ+ → π+π0ν 1.5 0.55±0.22 0.64±0.24 1.20±0.33
τ+ → µ+νν 1.0 0.37±0.15 0 0.37±0.15
Sum 1.0+1.8
0.7±0.2 1.7+1.8
The event yield in the signal region, Ndet (92), is
related to the number of tags, Ntag, the branching
fractions, and the background Nbkgrd (3.5) as
Ndet −Nbkgrd = Ntag · ǫ[ǫ
′B(D+s → µ
+ν) (3)
+ǫ′′B(D+s → π
+νν)],
where ǫ (80.1%) includes the efficiencies (77.8%) for
reconstructing the single charged track including fi-
nal state radiation, (98.3)% for not having another
unmatched cluster in the event with energy greater
than 300 MeV, and the correction for the tag bias
(4.8%); ǫ′ (91.4%) is the product of the 99.0% µ+
calorimeter efficiency and the 92.3% acceptance of
the MM2 cut of |MM2| < 0.05 GeV2; ǫ′′ (7.6%) is
the fraction of π+νν events contained in the µ+ν sig-
nal window (13.2%) times the 60% acceptance for a
pion to deposit less than 300 MeV in the calorime-
ter. Using B(τ+ → π+ν) of (10.90±0.07)% [7], the
ratio of the π+νν to µ+ν widths is 1.059; we find:
B(D+s → µ
+ν) = (0.594± 0.066± 0.031)%. (4)
We can also sum the µ+ν and τ+ν contributions
for −0.05 < MM2 < 0.02 GeV2. Equation 3 still ap-
plies. The number of signal and background events
changes to 148 and 10.7, respectively. ǫ′ becomes
96.2%, and ǫ′′ increases to 45.2%. The effective
branching fraction, assuming lepton universality, is
Beff(D+s → µ
+ν) = (0.638± 0.059± 0.033)%. (5)
The systematic errors on these branching fractions
are dominated by the error on the number of tags
(5%). Other errors include: (a) track finding (0.7%),
determined from a detailed comparison of the sim-
ulation with double tag events where one track is
ignored; (b) the error due to the requirement that
the charged track deposit no more than 300 MeV
in the calorimeter (1%), determined using two-body
D0 → K−π+ decays [5]; (c) the γ veto efficiency
(1%), determined by extrapolating measurements on
fully reconstructed events. Systematic errors arising
from the background estimates are negligible. The
total systematic error for Eq. 4 is 5.2%, and is 5.1%
for Eq. 5 as (b) doesn’t apply here.
We also analyze the τ+ν final state independently.
For case (i) we define the signal region to be the in-
terval 0.05<MM2 <0.20 GeV2, while for case (ii)
-0.05<MM2 <0.20 GeV2. The upper limit on MM2
is chosen to avoid background from the tail of the
K0π+ peak. The fractions of the MM2 range ac-
cepted are 32% and 45% for case (i) and (ii), respec-
tively.
We find 31 [25] events in the signal region with a
background of 3.5 [5.1] events for case (i) [(ii)]. The
branching fraction, averaging the two cases is
B(D+s → τ
+ν) = (8.0± 1.3± 0.4)%, (6)
where the systematic error includes a contribution
of 0.06% from the uncertainty on B(τ+ → π+ν).
We measure 13.4 ± 2.6 ± 0.2 for the ratio of τ+ν
to µ+ν rates using Eq. 4. Here the systematic er-
ror is dominated by the uncertainty on the mini-
mum ionization cut. We also set an upper limit of
B(D+s → e
+ν) < 1.3 × 10−4 at 90% C.L. Both of
these results are consistent with SM predictions and
lepton universality.
We perform an overall check of our procedures by
measuring B(D+s → K
K+). We compute the MM2
(Eq. 2) using events with an additional charged track
identified as a kaon. These track candidates have
momenta of approximately 1 GeV/c; here the RICH
has a pion to kaon fake rate of 1.1% with a kaon
detection efficiency of 88.5% [12]. For this study,
we do not veto events with extra charged tracks, or
γ’s, because of the presence of the K0. We deter-
mine B(D+s → K
K+) = (2.90 ± 0.19 ± 0.18)%.
This method gives a result in good agreement with
preliminary CLEO-c results using double tags of
(3.00 ± 0.19 ± 0.10)% [15]; these results are not in-
dependent.
We also performed the entire analysis on a MC
sample that is 4 times larger than the data sam-
ple. The input branching fraction is 0.5% for µ+ν
and 6.57% for τ+ν, while our analysis measured
(0.514±0.027)% for the case (i) µ+ν signal and
(0.521±0.024)% for µ+ν and τ+ν combined.
Using B(D+s → µ
+ν) from Eq. 5, and Eq. 1 with
a Ds lifetime of (500±7)×10
−15 s [7], we extract
= 274± 13± 7 MeV. (7)
We combine with our previous result fD+ =
222.6± 16.7+2.8
−3.4 MeV [4], and find
/fD+ = 1.23± 0.11± 0.04. (8)
Lattice QCD predictions for f
and the ratio
/fD+ have been summarized by Onogi [16]. Our
measurements are consistent with most calculations;
examples are unquenched Lattice that predicts 249±
3± 16 MeV and 1.24± 0.01± 0.07 for the ratio [17],
while a recent quenched prediction gives 266± 10±
18 MeV and 1.13 ± 0.03 ± 0.05 [18]. There is no
evidence yet for any suppression in the ratio due to
the presence of a virtual charged Higgs [10].
The CLEO-c determination of f
is the most
accurate to date and consistent with other measure-
ments [7, 8]. It also does not rely on the indepen-
dent determination of any normalization mode (e.g.
φπ+). (We note that a preliminary CLEO-c result
using D+s → τ
+ν, τ+ → e+νν [19] is consistent with
these results.)
We gratefully acknowledge the effort of the CESR
staff in providing us with excellent luminosity and
running conditions. This work was supported by the
A.P. Sloan Foundation, the National Science Foun-
dation, the U.S. Department of Energy, and the Nat-
ural Sciences and Engineering Research Council of
Canada.
[1] G. Buchalla, A. J. Buras and M. E. Lautenbacher,
Rev. Mod. Phys. 68, 1125 (1996).
[2] A. Abulencia et al. (CDF), Phys. Rev. Lett. 97,
242003 (2006). See also V. Abazov et al. (D0), Phys.
Rev. Lett. 97, 021802 (2006).
[3] C. Davies et al., Phys. Rev. Lett. 92, 022001 (2004).
[4] M. Artuso et al. (CLEO), Phys. Rev. Lett. 95,
251801 (2005).
[5] G. Bonvicini et al. (CLEO) Phys. Rev. D70, 112004
(2004).
[6] D. Silverman and H. Yao, Phys. Rev. D38, 214
(1988).
[7] W.-M. Yao et al., J. Phys. G33, 1 (2006).
[8] T. K. Pedlar et al. (CLEO), arXiv:0704.0437[hep-
ex], submitted to Phys. Rev. D.
[9] J. Hewett, [hep-ph/9505246]; W.-S. Hou, Phys. Rev.
D48, 2342 (1993).
[10] A. G. Akeroyd, Prog. Theor. Phys. 111, 295 (2004).
[11] D. Peterson et al., Nucl. Instrum. and Meth. A478,
142 (2002); Y. Kubota et al., Nucl. Instrum. and
Meth. A320, 66 (1992).
[12] M. Artuso et al., Nucl. Instrum. Meth. A554, 147
(2005).
[13] R. Poling, [hep-ex/0606016].
[14] P. Rubin et al. (CLEO), Phys. Rev. D73, 112005
(2006).
http://arxiv.org/abs/0704.0437
http://arxiv.org/abs/hep-ph/9505246
http://arxiv.org/abs/hep-ex/0606016
[15] N. E. Adam et al. (CLEO), [hep-ex/0607079].
[16] T. Onogi [hep-lat/0610115].
[17] C. Aubin et al., Phys. Rev. Lett. 95, 122002 (2005).
[18] T. W. Chiu et al., Phys. Lett. B624, 31 (2005).
[19] S. Stone [hep-ex/0610026].
http://arxiv.org/abs/hep-ex/0607079
http://arxiv.org/abs/hep-lat/0610115
http://arxiv.org/abs/hep-ex/0610026
ABSTRACT
  We measure the decay constant fDs using the Ds -> l+ nu channel, where the l+
designates either a mu+ or a tau+, when the tau+ -> pi+ nu. Using both
measurements we find fDs = 274 +-13 +- 7 MeV. Combining with our previous
determination of fD+, we compute the ratio fDs/fD+ = 1.23 +- 0.11 +- 0.04. We
compare with theoretical estimates.

<|endoftext|><|startoftext|>
Introduction
	The BABAR detector and dataset
	Event Selection and Kinematic Fit
	The K+ K- +- final state
	Final Selection and Backgrounds
	Selection Efficiency
	Cross Section for e+e-  K+ K- +- 
	Substructure in the K+ K- +-  Final State
	The e+e-  K*0 K  Cross Section
	The (1020)+- Intermediate State
	The (1020) f0(980) Intermediate State
	The K+ K-00 Final State
	Final Selection and Backgrounds
	Selection Efficiency
	Cross Section for e+e-  K+ K- 00
	Substructure in the K+ K- 00 Final State
	The (1020)00 Intermediate State
	The (1020) f0(980) Intermediate State
	The K+ K- K+ K-  Final State
	Final Selection and Background
	Selection Efficiency
	Cross Section for e+e-  K+ K- K+ K-
	The (1020) K+ K-  Intermediate State
	e+e-  f0 Near Threshold
	The Charmonium Region
	Summary
	Acknowledgments
	References
ABSTRACT
  We study the processes $e^+ e^-\to K^+ K^- \pi^+\pi^-\gamma$,
$K^+K^-\pi^0\pi^0\gamma$ and $K^+ K^- K^+ K^-\gamma$, where the photon is
radiated from the initial state. About 34600, 4400 and 2300 fully reconstructed
events, respectively, are selected from 232 \invfb of \babar data. The
invariant mass of the hadronic final state defines the effective \epem
center-of-mass energy, so that the $K^+ K^- \pi^+\pi^-\gamma$ data can be
compared with direct measurements of the $e^+ e^-\to K^+K^- \pipi$ reaction; no
direct measurements exist for the $e^+ e^-\to K^+ K^- \pi^0\pi^0$ or $\epem\to
K^+ K^- K^+ K^-$ reactions. Studying the structure of these events, we find
contributions from a number of intermediate states, and we extract their cross
sections where possible. In particular, we isolate the contribution from $e^+
e^-\to\phi(1020) f_{0}(980)$ and study its structure near threshold. In the
charmonium region, we observe the $J/\psi$ in all three final states and
several intermediate states, as well as the $\psi(2S)$ in some modes, and
measure the corresponding branching fractions. We see no signal for the Y(4260)
and obtain an upper limit of
$\BR_{Y(4260)\to\phi\pi^+\pi^-}\cdot\Gamma^{Y}_{ee}<0.4 \ev$ at 90% C.L.

<|endoftext|><|startoftext|>
d-wave superconductivity from electron-phonon interactions
J.P.Hague
Dept. of Physics and Astronomy, University of Leicester, Leicester, LE1 7RH and
Dept. of Physics, Loughborough University, Loughborough, LE11 3TU
(Dated: 4th May 2005)
I examine electron-phonon mediated superconductivity in the intermediate coupling and phonon
frequency regime of the quasi-2D Holstein model. I use an extended Migdal–Eliashberg theory which
includes vertex corrections and spatial fluctuations. I find a d-wave superconducting state that is
unique close to half-filling. The order parameter undergoes a transition to s-wave superconductivity
on increasing filling. I explain how the inclusion of both vertex corrections and spatial fluctuations is
essential for the prediction of a d-wave order parameter. I then discuss the effects of a large Coulomb
pseudopotential on the superconductivity (such as is found in contemporary superconducting ma-
terials like the cuprates), which results in the destruction of the s-wave states, while leaving the
d-wave states unmodified. Published as: Phys. Rev. B 73, 060503(R) (2006)
PACS numbers: 71.10.-w, 71.38.-k, 74.20.-z
The discovery of high transition temperatures and a
d-wave order parameter in the cuprate superconductors
are remarkable results and have serious implications for
the theory of superconductivity. The presence of large
Coulomb interactions in the cuprates which have the po-
tential to destroy conventional s-wave BCS states has
prompted the search for new mechanisms that can give
rise to superconductivity. However, electron-phonon me-
diated superconductivity is still not well understood, es-
pecialy in lower dimensional systems. In particular, the
electron-phonon problem is particularly difficult at inter-
mediate couplings with large phonon frequency (such as
found in the cuprates) and the electron-phonon mech-
anism cannot be fully ruled out. It is therefore of
paramount importance to develop new theories to under-
stand electron-phonon mediated superconductivity away
from the BCS limit.
The assumption that electron-phonon interactions can-
not lead to high transition temperatures and unusual or-
der parameters was made on the basis of calculations
from BCS theory, which is a very-weak-coupling mean-
field theory (although of course highly successful for
pre-1980s superconductors)1. In the presence of strong
Coulomb interaction, the BCS s-wave transition temper-
ature is vastly reduced. However, the recent measure-
ment of large couplings between electrons and the lat-
tice in the cuprate superconductors means that exten-
sions to the conventional theories of superconductivity
are required2,3,4. In particular, low dimensionality, in-
termediate dimensionless coupling constants of ∼ 1 and
large and active phonon frequencies of ∼ 75meV mean
that BCS or the more advanced Migdal–Eliashberg (ME)
theory cannot be applied. In fact, the large coupling con-
stant and a propensity for strong renormalization in 2D
systems, indicate that the bare unrenormalized phonon
frequency could be several times greater than the mea-
sured 75 meV5.
Here I apply the dynamical cluster approxima-
tion (DCA) to introduce a fully self-consistent
momentum-dependent self-energy to the electron-phonon
problem5,6,7,8. Short ranged spatial fluctuations and low-
est order vertex corrections are included, allowing the
sequence of phonon absorption and emission to be re-
ordered once. In particular, the theory used here is
second order in the effective electron-electron coupling
U = −g2/Mω20, which provides the correct weak coupling
limit from small to large phonon frequencies18. In this
paper, I include symmetry broken states in the anoma-
lous self energy to investigate unconventional order pa-
rameters such as d-wave. No assumptions are made in
advance about the form of the order parameter.
DCA6,8,9 is an extension to the dynamical mean-field
theory for the study of low dimensional systems. To ap-
ply the DCA, the Brillouin zone is divided into NC sub-
zones within which the self-energy is assumed to be mo-
mentum independent, and cluster Green functions are
determined by averaging over the momentum states in
each subzone. This leads to spatial fluctuations with
characteristic range, N
c . In this paper, Nc = 4 is used
throughout. This puts an upper bound on the strength of
the superconductivity, which is expected to be reduced
in larger cluster sizes10. To examine superconducting
states, DCA is extended within the Nambu formalism7,8.
Green functions and self-energies are described by 2 × 2
matrices, with off diagonal terms relating to the super-
conducting states. The self-consistent condition is:
G(K, iωn) =
Di(ǫ)(ζ(Ki, iωn)− ǫ)
|ζ(Ki, iωn)− ǫ|2 + φ(Ki, iωn)2
F (K, iωn) = −
Di(ǫ)φ(Ki, iωn)
|ζ(Ki, iωn)− ǫ|2 + φ(Ki, iωn)2
where ζ(Ki, iωn) = iωn + µ−Σ(Ki, iωn), µ is the chem-
ical potential, ωn are the Fermionic Matsubara frequen-
cies, φ(K, iω) is the anomalous self energy and Σ(K, iω)
is the normal self energy. G(K, iωn) must obey the lat-
tice symmetry. In contrast, it is only |F (K, iωn)| which
is constrained by this condition, since φ is squared in
the denominator of Eqn. 1. Therefore the sign of φ
http://arxiv.org/abs/0704.0633v1
FIG. 1: Diagrammatic representation of the current approx-
imation. Series (a) represents the vertex-neglected theory
which corresponds to the Migdal–Eliashberg approach, valid
when the phonon energy ω0 and electron-phonon coupling
U are small compared to the Fermi energy. Series (b) repre-
sents additional diagrams for the vertex corrected theory. The
phonon self energies are labeled with Π, and Σ denotes the
electron self-energies. Lines represent the full electron Green
function and wavy lines the full phonon Green function.
can change. For instance, if the anomalous self energy
has the rotational symmetry φ(π, 0) = −φ(0, π), the
on-diagonal Green function, which represents the elec-
tron propagation retains the correct lattice symmetry
G(π, 0) = G(0, π). Therefore, only inversion symmetry
is required of the anomalous Green function representing
superconducting pairs and the anomalous self energy.
Here I examine the Holstein model11 of electron-
phonon interactions. It treats phonons as nuclei vibrat-
ing in a time-averaged harmonic potential (representing
the interactions between all nuclei), i.e. only one fre-
quency ω0 is considered. The phonons couple to the local
electron density via a momentum-independent coupling
constant g11.
H = −
<ij>σ tc
iσcjσ +
iσ niσ(gri − µ)
The first term in this Hamiltonian represents hopping of
electrons between neighboring sites and has a dispersion
ǫk = −2t
i=1 cos(ki). The second term couples the lo-
cal ion displacement, ri to the local electron density. The
last term is the bare phonon Hamiltonian, i.e. a sim-
ple harmonic oscillator. The creation and annihilation of
electrons is represented by c
i (ci), pi is the ion momen-
tum and M the ion mass. The effective electron-electron
interaction is,
U(iωs) =
ω2s + ω
where, ωs = 2πsT , s is an integer and U = −g
2/Mω20
represents the magnitude of the effective electron-
electron coupling. D = 2 with t = 0.25, resulting in
a non-interacting band width W = 2. A small interpla-
nar hopping t⊥ = 0.01 is included. This is necessary to
stabilise superconductivity, which is not permitted in a
pure 2D system12.
Perturbation theory in the effective electron-electron
interaction (Fig. 1) is applied to second order in U , us-
ing a skeleton expansion. The electron self-energy has
two terms, ΣME(ω,K) neglects vertex corrections (Fig.
1(a)), and ΣVC(ω,K) corresponds to the vertex corrected
case (Fig. 1(b)). ΠME(ω,K) and ΠVC(ω,K) correspond
to the equivalent phonon self energies. At large phonon
frequencies, all second order diagrams including ΣV C are
essential for the correct description of the weak coupling
limit.
The phonon propagator D(z,K) is calculated from,
D(iωs,K) =
ω2s + ω
0 −Π(iωs,K)
and the Green function from equations 1 and 2. Σ =
ΣME+ΣVC and Π = ΠME+ΠVC. Details of the transla-
tion of the diagrams in Fig. 1 and the iteration procedure
can be found in Ref. 7. Calculations are carried out along
the Matsubara axis, with sufficient Matsubara points for
an accurate calculation. The equations were iterated un-
til the normal and anomalous self-energies converged to
an accuracy of approximately 1 part in 103.
Since the anomalous Green function is proportional
to the anomalous self energy, initializing the problem
with the non-interacting Green function leads to a non-
superconducting (normal) state. A constant supercon-
ducting field with d-wave symmetry was applied to the
system to induce superconductivity. The external field
was then completely removed. Iteration continued with-
out the field until convergence. This solution was then
used to initialize self-consistency for other similar val-
ues of the parameters. The symmetry conditions used in
Refs 5 and 7 have been relaxed to reflect the additional
breaking of the anomalous lattice symmetry in the d-wave
state. This does not affect the normal state Green func-
tion, but does affect the anomalous state Green function.
In Fig. 2, the anomalous self energy is examined for
n = 1.0 (half-filling). The striking feature is that sta-
ble d-wave superconductivity is found. This is mani-
fested through a change in sign of the anomalous self
energy, which is negative at the (π, 0) point and positive
at the (0, π) point. The electron Green function (equa-
tion 1) depends on φ2, so causality and lattice symmetry
are maintained. Since the gap function φ(iωn)/Z(iωn)
is directly proportional to φ(iωn), and Z(iωn,K(π,0)) =
Z(iωn,K(0,π)), then the sign of the order parameter i.e.
the sign of the superconducting gap changes under 90o
rotation. Z(iωn) = 1− Σ(iωn)/iωn.
Figure 3 shows the variation of superconducting pair-
ing across the Brillouin zone. ns(k) = T
n F (iωn,k).
U = 0.6, ω0 = 0.4, n = 1 and T = 0.005. The d-wave
order can be seen very clearly. The largest anomalous
densities are at the (π, 0) and (0, π) points, with a node
situated at the (π/2, π/2) point and a sign change on 90o
rotation. Pairing clearly occurs between electrons close
to the Fermi surface.
So far, the model has been analyzed at half filling.
Figure 4 demonstrates the evolution of the order param-
-0.025
-0.02
-0.015
-0.01
-0.005
 0.005
 0.01
 0.015
 0.02
 0.025
-2 -1.5 -1 -0.5  0  0.5  1  1.5  2
(0,0)
(π,0)
(0,π)
(π,π)
FIG. 2: Anomalous self-energy at half-filling. The anomalous
self energy is real. It is clear that φ(π, 0) = −φ(0, π). This
is characteristic of d-wave order. Similarly, the electron self
energy has the correct lattice symmetry Σ(π, 0) = Σ(0, π),
which was not imposed from the outset. The gap function is
related to the anomalous self energy via φ(iωn)/Z(iωn).
-0.25
 0.25
-0.25
 0.25
ns(k)
FIG. 3: Variation of superconducting (anomalous) pairing
density across the Brillouin zone. ns(k) = T
F (iωn,k).
U = 0.6, ω0 = 0.4, n = 1 and T = 0.005. The d-wave order
can be seen very clearly, with a change in sign on 90o rota-
tion and a node situated at the (π/2, π/2) point. The largest
anomalous (superconducting) densities are at the (π, 0) and
(0, π) points.
eter as the number of holes is first increased, and then
decreased. The total magnitude of the anomalous den-
sity, ns =
|ns(Ki)| is examined. When the number
of holes is increased, stable d-wave order persists to a
filling of n = 1.18, while decreasing monotonically. At
the critical point, there is a spontaneous transition to s-
wave order. Starting from a high filling, and reducing the
number of holes, there is a spontaneous transition from
s to d-wave order at n = 1.04. There is therefore hys-
teresis associated with the self-consistent solution. It is
reassuring that the d-wave state can be induced without
the need for the external field. As previously established,
s-wave order does not exist at half-filling as a mainfesta-
 0.02
 0.04
 0.06
 0.08
 1  1.05  1.1  1.15  1.2  1.25  1.3
External
field
FIG. 4: Hysteresis of the superconducting order parameters.
|ns(Ki)|. Starting from a d-wave state at half-
filling, increasing the chemical potential increases the filling
and decreases the d-wave order. Eventually, at n = 1.18 the
system changes to an s-wave state. On return from large
filling, the s-wave superconductivity is persistent to a low
filling of n = 1.04, before spontaneously reverting to a d-
wave state. The system is highly susceptible to d-wave order,
and application of a very small external superconducting field
to an s-wave state results in a d-wave state. Note that d-
and s-wave channels are coupled in the higher order theory,
so the transition can take place spontaneously, unlike in the
standard gap equations.
tion of Hohenberg’s theorem7, so the computed d-wave
order at half-filling is the ground state of the model. It
is interesting that the d- and s-channels are able to co-
exist, considering that the BCS channels are separate on
a square lattice. This is due to the vertex corrections,
since the self consistent equations are no longer linear in
the gap function (the 1st order gap equation vanishes in
the d-wave case, leaving 2nd order terms as the leading
contribution).
I finish with a brief discussion of Coulomb effects.
In the Eliashberg equations, a Coulomb pseudopotential
may be added to the theory as,
φC = UCT
F (iωn,K) (6)
It is easy to see the effect of d-wave order on this term.
Since the sign of the anomalous Green function is mod-
ulated, the average effect of d-wave order is to nullify
the Coulomb contribution to the anomalous self-energy
(i.e. φCd = 0). This demonstrates that the d-wave
state is stable to Coulomb perturbations, presumably be-
cause the pairs are distance separated. In contrast, the
s-wave state is not stable to Coulomb interaction, with
a corresponding reduction of the transition temperature
(TC = 0 for λ < µC). Thus, such a Coulomb filter selects
the d-wave state (see e.g. Ref. 13). Since large local
Coulomb repulsions are present in the cuprates (and in-
deed most transition metal oxides), then this mechanism
seems the most likely to remove the hysteresis. Without
the Coulomb interactions, it is expected that the s-wave
state will dominate for n > 1.04, since the anomalous
order is larger.
I note that a further consequence of strong Coulomb
repulsion is antiferromagnetism close to half-filling. Typ-
ically magnetic fluctuations act to suppress phonon medi-
ated superconducting order. As such, one might expect a
suppression of superconducting order close to half-filling,
with a maximum away from half filling. The current the-
ory could be extended to include additional anomalous
Green functions related to antiferromagnetic order. This
would lead to a 4x4 Green function matrix. A full anal-
ysis of antiferromagnetism and the free energy will be
carried out at a later date.
a. Summary In this paper I have carried out simu-
lations of the 2D Holstein model in the superconducting
state. Vertex corrections and spatial fluctuations were
included in the approximation for the self-energy. The
anomalous self energy and superconducting order param-
eter were calculated. Remarkably, stable superconduct-
ing states with d-wave order were found at half-filling.
d-wave states persist to n = 1.18, where the symmetry
of the parameter changes to s-wave. Starting in the s-
wave phase and reducing the filling, d-wave states spon-
taneously appear at n = 1.04. The spontaneous appear-
ance of d-wave states in a model of electron-phonon in-
teractions is of particular interest, since it may negate
the need for novel pairing mechanisms in the cuprates19.
The inclusion of vertex corrections and spatial fluctua-
tions was essential to the emergence of the d-wave states
in the Holstein model, which indicates why BCS and ME
calculations do not predict this phenomenon. For very
weak coupling, the off diagonal Eliashberg self-energy
has the form −UT
Q,n F (iωn,Q)D0(iωs − iωn), so it
is clear (for the same reasons as the Coulomb pseudopo-
tential) that this diagram has no contribution in the d-
wave phase (the weak coupling phonon propagator is mo-
mentum independent for the Holstein model). Therefore,
vertex corrections are the leading term in the weak cou-
pling limit. Furthermore, I have discussed the inclusion
of Coulomb states to lowest order, which act to desta-
bilize the s-wave states, while leaving the d-wave states
unchanged. Since the Coulomb pseudopotential has no
effect then it is possible that electron-phonon interactions
are the mechanism inducing d-wave states in real mate-
rials such as the cuprates. The Coulomb filtering mech-
anism works for p-wave symmetry and higher, so it is
possible that electron-phonon interactions could explain
many novel superconductors. Certainly, such a mecha-
nism cannot be ruled out. The doping dependence of the
order qualitatively matches that of La2−xSrxCuO4 (here
order extends to x = 0.18, in the Cuprate to x = 0.3).
Antiferromagnetism is only present in the cuprate very
close to half filling (up to approx x = 0.02), and on a
mean-field level does not interfere with the d-wave su-
perconductivity at larger dopings.
It has been determined experimentally that strong
electron-phonon interactions and high phonon frequen-
cies are clearly visible in the electron and phonon band
structures of the cuprates, and are therefore an essential
part of the physics3,4. Similar effects to those observed
in the cuprates are seen in the electron and phonon band
structures of the 2D Holstein model in the normal phase5.
It is clearly of interest to determine whether other fea-
tures and effects in the cuprate superconductors could be
explained with electron-phonon interactions alone.
b. Acknowledgments I thank the University of Le-
icester for hospitality while carrying out this work. I
thank E.M.L.Chung for useful discussions. I am currently
supported under EPSRC grant no. EP/C518365/1.
1 J.Bardeen, L.N.Cooper, and J.R.Schrieffer, Phys. Rev. B
108, 1175 (1957).
2 G.M.Zhao, M.B.Hunt, H.Keller, and K.A.Müller, Nature
385, 236 (1997).
3 A.Lanzara, P.V.Bogdanov, X.J.Zhou, S.A.Kellar,
D.L.Feng, E.D.Lu, T.Yoshida, H.Eisaki, A.Fujimori,
K.Kishio, et al., Nature 412, 6846 (2001).
4 R.J.McQueeney, Y.Petrov, T.Egami, M.Yethiraj,
G.Shirane, and Y.Endoh, Phys. Rev. Lett. 82, 628
(1999).
5 J.P.Hague, J. Phys. Condens. Matt 15, 2535 (2003).
6 M.H.Hettler, A.N.Tahvildar-Zadeh, M.Jarrell,
T.Pruschke, and H.R.Krishnamurthy, Phys. Rev. B
58, R7475 (1998).
7 J.P.Hague, J. Phys.: Condens. Matter 17, 5663 (2005).
8 T. Maier, M. Jarrell, T. Pruschke, and M. H. Hettler, Rev.
Mod. Phys 77, 1027 (2005).
9 M.H.Hettler, M.Mukherjee, M.Jarrell, and
H.R.Krishnamurthy, Phys. Rev. B 61, 12739 (2000).
10 M.Jarrell, Th.Maier, C.Huscroft, and S.Moukouri, Phys.
Rev. B 64, 195130 (2001).
11 T.Holstein, Ann. Phys. 8, 325 (1959).
12 P.C.Hohenberg, Phys. Rev. 158, 383 (1967).
13 J.F.Annett, Superconductivity, Superfluidity and Conden-
sates (Oxford University Press, 2004).
14 C.Grimaldi, L.Pietronero, and S.Strässler, Phys. Rev. Lett.
75, 1158 (1995).
15 A.A.Abrikosov, Physica C 244, 243 (1995).
16 A.A.Abrikosov, Phys. Rev. B 52, R15738 (1995).
17 R. J. Birgeneau,and G. Shirane, in Physical Properties
of High Temperature Superconductors I, edited by D.
M.Ginsberg (World Scientific, Singapore, 1989).
18 I also note the extensions to Eliashberg theory carried out
by Grimaldi et al.14.
19 On the basis of a screened electron-phonon interaction,
Abrikosov claims to have found stable d-wave states in a
BCS like theory15,16. However with an unscreened Holstein
potential, the transition temperature it the d-wave chan-
nel given by the standard theory is zero. Also, the assumed
order parameter in his work does not clearly have d-wave
symmetry.
ABSTRACT
  I examine electron-phonon mediated superconductivity in the intermediate
coupling and phonon frequency regime of the quasi-2D Holstein model. I use an
extended Migdal-Eliashberg theory which includes vertex corrections and spatial
fluctuations. I find a d-wave superconducting state that is unique close to
half-filling. The order parameter undergoes a transition to s-wave
superconductivity on increasing filling. I explain how the inclusion of both
vertex corrections and spatial fluctuations is essential for the prediction of
a d-wave order parameter. I then discuss the effects of a large Coulomb
pseudopotential on the superconductivity (such as is found in contemporary
superconducting materials like the cuprates), which results in the destruction
of the s-wave states, while leaving the d-wave states unmodified.

<|endoftext|><|startoftext|>
Introduction 
 Equilibrium conformational fluctuations of proteins about their folded, native 
structure play an important role in their biological function.1-4 Three prominent 
approaches used to compute conformational fluctuations of proteins are, in order of 
increasing computational efficiency and decreasing modeling resolution, molecular 
dynamics (MD), all-atom normal mode analysis (NMA), and coarse-grained elastic NMA 
(eNMA). MD attempts to sample the equilibrium distribution of states in the vicinity of 
the native structure via time-integration of Newton’s equations of motion, typically 
modeling solvent explicitly.5 All-atom NMA assumes harmonic fluctuations about the 
native state in solving the free vibration problem for the protein while treating the solvent 
implicitly.2,3,6 Finally, eNMA employs a coarse-grained elastic description of the protein 
in which specific atomic interactions are replaced by a simple network of linear elastic 
springs, typically connecting Cα atoms within an arbitrary cut-off radius.7,8 Successively 
coarser and thus computationally more efficient eNMA descriptions are obtained by 
reducing the total number of interaction sites in the system.9-12 The idea of treating 
proteins as effective elastic media in calculating their normal modes dates back at least to 
Suezaki and Go.13 
 Despite their relative simplicity, elastic coarse-grained models have proven 
remarkably successful in calculating the slow, large length-scale vibrational modes of 
proteins and their supramolecular assemblies. As shown recently by Lu and Ma,14 their 
success may partially be attributed to the fact that biomolecular shape plays a dominant 
role in determining the lowest normal modes of proteins. Indeed, large length-scale 
modes naturally average over heterogeneous interactions present at atomic length-scales, 
thereby rendering elastic descriptions valid in this regime. Global structural averages 
such as backbone fluctuations and inter-residue correlations are in turn also successfully 
predicted because they are dominated by these low frequency modes. 
 The success of eNMA motivates the current work, in which the elastic network 
model for proteins is cast in the framework of the well established Finite Element Method 
(FEM).15,16 In formulating the model, the protein is defined by its mass density, ρ, 
isotropic elastic modulus, E, and solvent-excluded surface (SES), which is obtained by 
rolling a water molecule-size probe-sphere over its van der Waals surface.17-20 As an 
initial exploration of the utility of the FEM in analyzing protein mechanical response, the 
normal modes of a mutant of T4 lysozyme and of F-actin are computed, as well as the 
critical Euler buckling load of F-actin when subject to axial compression. NMA results 
for T4 lysozyme are compared with all-atom NMA, the Rotation Translation Blocks 
(RTB) procedure,21,22 which treats residues as rigid but retains atomic-level interactions 
as modeled by the implicit solvent force-field EEF1,23 and experiment. 
 Similar to eNMA, the proposed FE-based procedure offers several advantages 
over all-atom NMA, including the elimination of costly energy minimization that may 
distort the initial protein structure, direct applicability to x-ray data of proteins with 
unknown atomic structure,24-26 and a significant speed-up of the NMA itself due to a 
drastic reduction in the number of degrees of freedom simulated. 
 Additionally, the FEM offers several distinct advantages over existing elastic 
network models that provide the primary motivation for the current work. Principal 
among these is the suitability of the FEM to calculate the mechanical response of proteins 
and their supramolecular assemblies to applied bending, buckling, and other generalized 
loading scenarios, which is needed to probe the structure-function relation of 
supramolecular assemblies such as viral capsids,27,28 microtubules,29,30 F-actin 
bundles,31,32 and molecular motors.33,34 Moreover, casting the coarse-grained elastic 
model in the framework of the FEM opens two important avenues of model refinement 
that are currently being pursued. First, the atomic Hessian can be projected onto the FE-
space in order to incorporate atomic-level interactions into the model, thereby eliminating 
the a priori assumption of homogeneous isotropic elastic response. This idea is similar to 
the initial version of the Rotation Translation Blocks (RTB) procedure proposed in 
Durand et al.,35, as well as related works in modeling crystals.36,37 The incorporation of 
atomic-level interactions may be particularly important in modeling binding interfaces 
present between constituent monomers in supramolecular assemblies such as F-actin, 
MTs, and viral coat protein subunits, particularly near the onset of mechanical failure. 
Second, the FE-based protein model may be coupled directly to field calculations 
including the Poisson–Boltzmann Equation to model solvent-mediated electrostatic 
interactions38-40 and the Stokes Equations to model solvent-damping in dynamic response 
calculations.41,42 
Methods 
 The FEM is a mature field that is discussed in detail in references such as Bathe15 
and Zienkiewicz and Taylor.16 Accordingly, the focus here is on its application to 
proteins and readers are kindly referred to the above-referenced books for details on its 
theoretical foundations. 
 Generation of the FE model requires three steps: (1) definition and discretization 
of the protein volume; (2) definition of the local effective mass density and constitutive 
behavior of the protein; and (3) application of boundary conditions such as displacement- 
or force-based loading. The protein volume is defined by its bounding SES, which is also 
called the Richards Molecular Surface or simply the Molecular Surface. This surface is 
defined by the closest point of contact of a solvent-sized probe-sphere that is rolled over 
the van der Waals surface of the protein, which defines the molecular volume that is 
never penetrated by any part of the solvent probe-sphere.17-19 The SES is computed using 
MSMS ver. 2.6.1, which generates a high density triangulated approximation (one 
triangular vertex per Å2) to the exact SES.20 The MSMS-discretized SES is subsequently 
decimated to arbitrary prescribed spatial resolution using the surface simplification 
algorithm QSLIM.43-45 The QSLIM algorithm employs iterative vertex-pair contraction 
together with a quadric error metric to retain a near-optimal representation of the original 
surface while reducing the total number of faces by an arbitrary, user-specified amount.43 
The protein volume that is bounded by the closed SES is subsequently discretized with 
3D tetrahedral finite elements via automatic mesh generation using the commercial Finite 
Element program ADINA ver. 8.4 (Watertown, MA, USA). Application of the proposed 
FE-based procedure directly to x-ray data would require definition of the molecular 
volume from the electron density map using Voronoi tessellation or a similar procedure, 
as proposed by Wriggers et al.,46, and performed by Ming et al.24 
 The protein constitutive response is modeled using the standard Hooke’s law, 
which treats the protein as a homogeneous, isotropic, elastic continuum with Young’s 
modulus E and Poisson ratio ν.47 While this is conceptually similar to elastic-network 
based models, it is rigorously distinct: Elastic network models typically connect Cα 
atoms by springs of equal stiffness, which results in general in a locally anisotropic and 
inhomogeneous elastic material with length-scale dependent mechanical properties. In 
contrast, the FE-model defined here treats the protein as strictly homogeneous, with an 
isotropic elastic material response that is length-scale invariant. The mass density of the 
protein is taken to be homogeneous, although it could equally be defined as a spatially-
varying function from the underlying atomic constitution or from electron density data. 
 Finally, arbitrary boundary conditions consisting of displacement- or force-based 
loading may be applied to the molecule, modeling the effects of the protein environment. 
In the current application, the free vibration problem is solved for T4 lysozyme and F-
actin in the absence of any boundary condition and the linearized buckling problem for F-
actin is solved by applying co-axial compressive point loads to the ends of the molecule. 
 Given the protein volume, constitutive behavior, and boundary conditions, the 
FEM uses numerical volume-integration to derive a set of algebraic equations that is 
linear in the finite element nodal displacement degrees of freedom, u , 
 + =Mu Ku R  (1) 
where M is the diagonal mass-matrix, K is the elastic stiffness matrix, and R is a forcing 
vector that results from natural (force-based) boundary conditions.15 In the case of the 
free vibration problem relevant to the NMA of proteins, 0=R . Substitution of the 
oscillatory solution, cos( )tω γ= +u y , into the free-vibration form of Eq. (1) results in 
the generalized eigenvalue problem, 
 2 0ω− =Ky My  (2) 
which after definition of the eigenvalues, 2:λ ω= , may be written in secular form, 
 det 0λ− =K M . (3) 
Various efficient FE procedures exist to obtain the solution to the generalized eigenvalue 
problem, yielding the eigenvalues and eigenvectors, ( , )i iλ y . In the present application an 
accelerated subspace-iteration method15,48 is used for T4 lysozyme and F-actin. The 
substructure synthesis procedure49 commonly available in structural mechanics FE 
programs could also be applied to calculate the normal modes of F-actin, as recently 
proposed by Ming et al.50 
 The eigenvectors corresponding to the FE nodal degrees of freedom are linearly 
interpolated to the Cα positions given by the atomic coordinates that were used to define 
the FE model. Standard equilibrium thermal averages may then be computed in the 
standard way, including the fluctuation of Cα atom i due to mode k, 
2 2 /ik B ik k ir k Ta mλΔ = , the total fluctuation of Cα atom i due to all modes, 
i ikk
r rΔ = Δ∑ , correlations in positional fluctuations of Cα atoms i and j, 
/ij i j i i j jC r r r r r r= Δ ⋅Δ Δ ⋅Δ Δ ⋅Δ , where ( )i j B ik jk k i jkr r k T a a m mλΔ ⋅Δ = ∑ , 
and the overlap, ijR , between normal modes i and j, defined by the inner product of the 
modes, ij i j i jR = ⋅a a a a , where (1 1)ijR≤ ≤ − .
6 As with elastic network models, the 
protein stiffness-scale (E) is unknown. Accordingly, the acoustic wave speed, E ρ , 
which is the relevant physical unit in the free-vibration problem, is adjusted to best-fit the 
pertinent Cα fluctuation data, which is either experimental or that from the all-atom 
NMA. In the case of F-actin, the average mass density, ρ, is set explicitly and the 
Young’s modulus is determined by matching its stretching stiffness to experiment,51 as 
also performed by ben-Avraham and Tirion52 and described in more detail below. The 
Poisson ratio is taken to be 0.3 for T4 lysozyme and F-actin, which is typical of 
crystalline solids. While the choice of 0.3ν =  has, to the best of the author’s knowledge, 
no rigorous justification, it is noted that its precise value does not affect the computed 
results within the range of (0.3 0.5)ν≤ ≤ . This is typical of response calculations such as 
those performed here, in which material compressibility does not play an important role. 
 Two important considerations in generating the FE model are the choice of the 
probe-sphere radius used to define the protein volume and the degree of surface 
simplification performed. Regarding the choice of probe-sphere radius, two approaches 
were deliberated here. In the first, the probe-sphere radius is treated as an adjustable 
parameter, akin to the cut-off radius used in elastic network models. In this case, as the 
radius of the probe-sphere is increased, protein cavities in which solvent would normally 
be present become part of the effective elastic medium constituting the protein. 
Accordingly, the shape of the protein becomes a function of the probe-sphere radius, 
which will affect its mechanical response. In the second approach, the probe-sphere 
radius is treated as a fixed, physically-based parameter that is approximately equal to the 
size of a water molecule, as in electrostatic field calculations.53-55 The homogeneous 
elastic medium of the protein is then strictly applied to those volumetric regions in which 
dense intramolecular packing involving close-ranged van der Waals, hydrogen-bond, and 
bonded interactions are present, and the molecular surface is a well-defined physical 
feature of the protein. The latter approach was taken here in order to retain the physical 
connection to atomic packing in solids. 
 An important theoretical property of the FEM is that it guarantees convergence to 
the exact solution of the underlying mathematical model as the FE mesh is refined, where 
the mathematical model is defined by the protein’s analytical SES, constitutive behavior, 
and boundary conditions.15 Thus, any normal mode or mechanical response calculation 
performed using the proposed FE-based procedure should in principle systematically 
refine the discretized representation in order to ensure convergence of the computed 
model property to its exact result. In practice, however, the permissible degree of surface 
simplification using QSLIM or similar algorithm will depend on the sensitivity of the 
computed observable to details of molecular shape, which must be evaluated on a case-
by-case basis, as addressed below for T4 lysozyme and F-actin. 
T4 lysozyme 
 The initial structure of the 164 residue (18.7 kDa) mutant T4 phage lysozyme is 
taken from Matsumura et al.,56 (Protein Data Bank ID 3LZM).57 CHARMM ver. 33a158 
is used with the implicit solvation model EEF123 to build in coordinates missing in the 
crystal structure and to perform energy minimization and NMA. Steepest descent 
minimization followed by adopted-basis Newton–Raphson minimization is performed in 
the presence of successively reduced harmonic constraints on backbone atoms to achieve 
a final root-mean-square (RMS) energy gradient of 5×10–4 kcal/(mol Å) with 
corresponding RMS deviation between the x-ray and energy-minimized structures of 1.3 
Å (Fig. 1a). All-atom (ATM) and RTB NMA21 are used as implemented in CHARMM,22 
using one-block per residue for the RTB calculations. 
Fig. 1 T4 lysozyme (a) crystal structure (Protein Data Bank ID 3lmz), (b) MSMS-
triangulated SES, and (c) QSLIM-decimated SES used for the FE computation. Atomic 
structure rendered with VMD ver. 1.8.559 and triangulated models rendered with ADINA 
ver. 8.4. 
 To define the FE model, MSMS is used to compute the SES of the energy-
minimized structure of T4 lysozyme using the MSMS-default 1.5 Å radius probe 
ignoring hydrogens. As noted previously, the FE model may be defined directly from the 
atomic structure without initial energy minimization, however, the energy-minimized 
structure is used here to be consistent with the ATM model, which requires minimization. 
MSMS generates a triangulated approximation to the analytical SES that consists of 
17,300 triangular faces (Fig. 1b). This model is decimated using QSLIM to a reduced 
model consisting of 2,000 faces (Fig. 1c). The decimated surface-mesh is read into 
ADINA ver. 8.4 and used as a template to generate 6,843 4-node tetrahedral finite 
elements consisting of 1,627 nodes. Calculation of the 100 lowest non-rigid-body modes 
using an accelerated subspace-iteration method15,48 required 27 MB of RAM and about 
10 seconds on a 2.1 GHz Intel Core2Duo processor. Refining drastically the surface 
representation from 2,000 faces to 17,300 faces (and associated volume discretization) or 
computing more than 100 normal modes did not alter the Cα fluctuations significantly. 
F-actin 
 The atomic structure of F-actin (52 protomers, 2.2 MDa molecular weight) is 
generated using FilaSitus ver. 1.460 based on the Holmes fiber model61 and the structure 
of G-actin:ADP:Ca2+ from the actin-gelsolin segment-1 complex.62,63 This structure of F-
actin-ADP models the filament in its “young” state when the DNase I binding region of 
subdomain 2 of G-actin (residues 40–48) is in its disordered loop conformation as 
opposed to its ordered α-helix conformation.63-65 Importantly, in its disordered loop 
conformation this region forms intramolecular contacts in F-actin that stabilize the 
filament and have direct consequences on its mechanical properties.66-68 
 Calculation of the SES using MSMS and a 3 Å radius probea results in a model 
with 1,248,038 triangular faces, which is subsequently decimated in several seconds 
using QSLIM to a reduced model with 40,000 triangular faces. The decimated surface-
mesh is read into ADINA ver. 8.4 and used as a template to generate 134,883 4-node 
tetrahedral finite elements consisting of 31,881 nodes (Fig. 2b). Planar axial stretching is 
used to determine the effective Young’s modulus of F-actin, E = 2.69 GPa, by fitting its 
computed value to its experimentally-measured value in the absence of tropomyosin, 43.7 
nN.51 The homogeneous mass density, ρ = 1,170 kg/m3, is based on the 42 kDa molecular 
weight of G-actin and the calculated molecular volume of F-actin, which is equal to 
3.1×106 Å3 for the 52-mer considered. Normal mode analysis using the accelerated 
subspace-iteration procedure in ADINA requires 22 MB and less than 10 seconds to 
calculate the lowest 10 modes on a 2.1 GHz Intel Core2Duo processor. To test 
                                                 
a Use of the MSMS-default 1.5 Å radius probe resulted in QSLIM-decimated surface models that were 
poorly formed with multiple intersecting and degenerate triangles due to re-entrant surfaces of F-actin. Use 
of a 3 Å radius probe resolved this problem of SES-representation and is not expected to affect significantly 
the large length-scale normal modes of F-actin, which has relatively large minor and major diameters of 
~40 and 80 Å, respectively. 
convergence of the FE solution to the exact solution, the FE mesh was coarsened 
considerably to a model consisting of only 7,558 4-node tetrahedral volume elements 
(4,000 surface triangles), for which the lowest four eigen-frequencies increased by at 
most 15% with respect to the more detailed model. Further mesh refinement beyond 
40,000 surface elements was precluded by the problematic surface mesh generated by the 
proposed procedure, in which substantial element intersections were present. 
Fig. 2 (a) Atomic structure of the 52-monomer F-actin filament analyzed and (b) the 
triangulated SES used to define the FE model. Atomic structure is rendered with VMD 
ver. 1.8.5 59 and the FEM model rendered using ADINA ver. 8.4. 
Results 
T4 lysozyme 
 Equilibrium thermal fluctuations of Cα atoms aid in understanding protein 
function as mediated by local conformational flexibility and provide a first quantitative 
test for the proposed coarse-grained procedure. Experimental fluctuations are related to 
the experimental temperature- or B-factor by, 2 28 / 3i iB rπ= Δ , where 
irΔ  is the 
mean-squared fluctuation of atom i. While both coarse-grained models capture well the 
overall experimental variation in flexibility of T4 lysozyme (Fig. 3a and Table 1),56 local 
differences are evident in disordered loop regions where conformational flexibility is 
overestimated significantly by both the RTB and FEM procedures (e.g., residue numbers 
35–40). Comparison with the all-atom model indicates that these discrepancies are 
inherent to the protein structure, however, and not artifacts of the RTB and FEM 
procedures (Fig. 3b). Indeed, Cα fluctuations calculated with the RTB and FEM models 
correlate notably better with fluctuations calculated with the all-atom model than with 
experiment (Table 1). 
residue index
RMSF (Å)
0 20 40 60 80 100 120 140 160
residue index
RMSF (Å)
0 20 40 60 80 100 120 140 160
(a) (b)residue index residue index
Fig. 3. Coarse-grained RMSF of Cα atoms in T4 lysozyme compared with (a) experiment 
and (b) all-atom NMA. 100 modes are used to compute the all-atom, RTB, and FEM 
fluctuations. Correlation coefficients provided in Table 1. 
Table 1 Correlation coefficients corresponding to Cα atom RMSF in Figure 3. 
 Experiment ATM 
RTB 0.73 0.95 
FEM 0.68 0.89 
 Inter-residue spatial correlations measured at Cα atoms provide additional insight 
into protein function,69,70 as well as a further test of the proposed coarse-grained 
procedure. Interestingly, the RTB and FEM procedures provide similar information with 
respect to the all-atom model, as measured over either the lowest 10 or 100 modes (Fig. 4 
top and bottom, respectively). The fact that the correlation maps are largely determined 
with as few as ten modes reconfirms numerous previous findings that the lowest modes 
of proteins dominate their free vibration response.14,71,72 The similarity in the FEM and 
ATM correlation maps provides additional evidence that T4 lysozyme behaves 
remarkably similar to a homogeneous isotropic elastic solid in free vibration. 
Fig. 4 T4 lysozyme inter-Cα correlations computed using (top) 10 modes and (bottom) 
100 modes for the (a) ATM, (b) RTB, and (c) FEM models. 
 The lowest four mode shapes computed using the FEM may be projected onto the 
ground-state (energy-minimized) structure of T4 lysozyme to visualize their nature (Fig. 
5). Similar to the native hen egg lysozyme, the lowest mode is a hinge-bending mode,1,73 
whereas the three higher modes are a combination of hinge- and twist-deformations. 
Quantitative comparison between the coarse-grained and all-atom models is made in 
Table 2 for the lowest four mode shapes, and the lowest 200 frequencies in Figure 6. 
Fig. 5 Lowest four eigenmodes computed by the FEM superimposed on the minimized 
structure of T4 lysozyme. Overlap with the all-atom model is given in Table 2. Images 
rendered using VMD ver. 1.8.5 59. 
Table 2 Overlap of coarse-grained model and all-atom normal modes as measured at Cα 
positions. 
 Mode 1 Mode 2 Mode 3 Mode 4 
RTB 0.97 0.93 0.82 0.28 
FEM 0.91 0.86 0.76 0.71 
 The modal frequency distributions provide a final quantitative evaluation of the 
FEM and RTB approach for T4 lysozyme (Fig. 6). While the overall correlation between 
the FEM and all-atom frequencies is reasonable, particularly for low mode-numbers, the 
FEM tends to underestimate the “exact” frequency computed using the all-atom model at 
high mode-numbers. This suggests that the FEM models the protein as overly compliant 
in this regime, which is to be expected because higher modes excite shorter wavelength, 
stiffer degrees of freedom in the all-atom protein resulting from chain connectivity, 
whereas the elastic solid approximation assumes a compliance that is length-scale 
invariant. Backbone Cα fluctuations as well as Cα correlations are apparently unaffected 
by this approximation because the low modes dominate these observables. Interestingly, 
the opposite tendency was observed by Tama et al.,21 for the RTB-approach with 
successively larger blocks. This is also to be expected because the assumption of rigid 
blocks in the protein renders the structure overly stiff on short length scales (high 
frequency modes), and the length-scale at which this deviation from the all-atom model 
becomes significant increases with increasing block-size. 
ωCG [cm−1]
ωATM [cm−1]
0 5 10 15 20 25 30 35 40
ωATM [cm–1]
ωCG [cm–1]
Fig. 6 Correlation between coarse-grained (CG) and all-atom (ATM) model frequencies 
for the lowest 200 modes. 
F-actin 
 F-actin is a highly dynamic biopolymer with a considerable degree of internal 
plasticity in the state of tilt and twist of its constituent protomers, which depends on the 
bound nucleotide-state (ATP/ADP), bound actin-binding protein, and solvent 
conditions.74-78 Additionally, the bending stiffness of F-actin has been shown to increase 
by a factor of two in the presence of phalloidin, by 50% in the F-ADP-P versus F-ADP 
state, and to be regulated by tropomyosin in a Ca2+-dependent fashion.66 Thus, any 
modeling attempt to predict the mechanical properties of F-actin and investigate their 
relation to its detailed internal structure and composition must consider such variations. 
 Modeling attempts to investigate the structure-function relation of F-actin include 
an early study by ben-Avraham and Tirion,68 who treated G-actin monomers as internally 
rigid and connected to their nearest neighbor monomers by compliant springs, a more 
recent study by Ming et al.,79 in which conventional eNMA is used together with 
substructure-synthesis to calculate the large wavelength normal modes of a micron-long 
F-actin molecule, and most recently an all-atom MD study by Chu and Voth,67 who found 
that the loop-helix transition of the DNase I binding region of subdomain 2 of G-actin 
plays a central role in respectively stabilizing-destabilizing F-actin by disrupting inter-
monomer interactions. Chu and Voth67 also calculated the apparent persistence length of 
F-actin and found that the loop-to-helix transition between the ATP- and ADP-bound 
states accounted for the approximately 50% decrease in associated bending stiffness 
observed experimentally.66 
 The normal modes of F-actin (52-mer, 0.14 μm length) computed here in free 
planar-vibration yield four bending modes as the lowest modes (Fig. 7). Association of F-
actin with a homogeneous elastic rod in free vibration80 results in an apparent bending 
stiffness, κ = 6.8×10–26 Nm2, for the lowest mode, which is near the upper limit of 
bending stiffness typically reported experimentally.66,81,82 
 Subjecting F-actin to an axially compressive load and performing a linearized 
buckling analysis yields the lowest critical Euler buckling load, Pcrit = 33 pN. Association 
of the filament again with a homogeneous elastic Euler–Bernoulli beam yields the 
effective bending stiffness, 2 2 26 2/ 6.9 10  NmcritP Lκ π
−= = × , which is similar to the 
bending stiffness calculated from the lowest bending mode because that mode of 
deformation is the same as the lowest Euler buckling mode. 
Mode 1
Mode 4
Mode 3
Mode 2
Fig. 7 Four lowest free vibration modes of F-actin (52-mer, 0.144 μm length) in planar 
deformation. The corresponding angular frequencies are, 0.18×10–2, 0.48×10–2, 0.92×10–
2, and 0.16×10–1 rad/psec. 
 The bending stiffness calculated here for F-actin is consistent with experimental 
measurements of the ATP-bound-state in which the DNase binding region (residues 40–
48) in subdomain 2 of G-actin is in its disordered loop conformation, thereby stabilizing 
inter-monomer interactions.66,67 While this is to be expected given the structure of G-
actin:ADP:Ca2+ employed, in which the DNase binding region is also in its disordered 
loop conformation, a similar coarse-grained analysis of F-actin must be performed in 
which the DNase binding region of G-actin is in its ordered α-helical structure. Only then 
may it be stated definitively whether the observed mechanical behavior is due solely to 
this detailed structural difference or to some other source, such as a lack of modeling 
resolution. 
 While a more detailed investigation of this type is of direct interest in evaluating 
the full utility of the proposed procedure, it is also of interest fundamentally to investigate 
the respective roles of molecular shape versus molecular interactions on determining the 
mechanical properties of supramolecular assemblies such as F-actin, MTs, and viral 
capsids. In particular, an intriguing hypothesis is that mechanical response is determined 
solely by molecular shape, in which case the mechanical properties of supramolecular 
assemblies would be robust to amino acid mutations that do not alter molecular shape. A 
competing hypothesis is that mechanical response is sensitive to both molecular shape 
and detailed molecular interactions, in which case amino acid mutations would be more 
tightly constrained. In either case, investigation of the respective roles of molecular shape 
versus specific interactions on protein mechanics clearly requires that all-atom models be 
considered, either directly or via incorporation into coarse-grained models. Such 
investigations are currently underway and are expected to provide fundamental insight 
into the origin and robustness of the mechanical properties of supramolecular assemblies. 
Concluding discussion 
 A coarse-grained FE-based procedure is proposed to compute the normal modes 
and mechanical response of proteins and their supramolecular assemblies. The procedure 
takes as input the atomic structure to define uniquely the volume associated with the SES, 
mass density, and elastic stiffness of the protein. The initial, high resolution SES 
discretized at atomic resolution is simplified using a quadric simplification algorithm to 
obtain a molecular surface representation of arbitrary prescribed spatial resolution. While 
the proposed procedure is applied to proteins with known atomic structure, the molecular 
volume could equally be defined from electron density data, rendering the procedure 
applicable to a broad class of biomolecules and biomolecular complexes for which only a 
rough approximation to the molecular volume is known. As with existing coarse-grained 
elastic network models, energy minimization is not required prior to the NMA because 
the initial structure is assumed to be the ground-state structure. 
 Ongoing development of the proposed procedure is directed towards three areas 
of improvement. First, the atomic-based Hessian from all-atom force-fields such as 
CHARMM58 will be projected onto the FE-space such that the model optimally 
converges to the “exact” all-atom solution as the FE mesh is refined to atomic length-
scales. Such a procedure will enable the systematic coarsening of protein structure and 
interactions without the a priori assumption of elastic response. Indeed, an intriguing and 
as of yet unresolved question regards the relative effects of molecular shape versus 
specific molecular interactions on the mechanical response of supramolecular assemblies 
such as F-actin, MTs, and viral capsids. Second, the Poisson–Boltzmann equation used to 
model aqueous electrolyte-mediated electrostatic interactions in proteins may be coupled 
directly to the elastic-based FE model, so that it may be included in computations of 
normal modes and mechanical response. Langevin dynamics may also be incorporated 
into the model by coupling the protein-domain to the Stokes equations to model solvent 
damping.41,83 Finally, the proposed surface discretization and simplification procedure 
requires improvement because it often results in surface meshes with intersecting or 
degenerate triangles, as encountered here for F-actin. 
 The utility of the proposed FE-based procedure is explored here for one specific 
globular protein and supramolecular assembly, namely T4 lysozyme and F-actin. Clearly, 
in order to evaluate the utility of the proposed procedure thoroughly, a set of proteins of 
drastically varying structure must be analyzed, as well as additional supramolecular 
assemblies. Additional response variables and the effects of internal structural variations 
of the molecules examined should also be investigated. Notwithstanding these additional 
analyses and the foregoing model improvements, the current communication establishes 
an effective theoretical framework for the computation of the normal modes and 
generalized mechanical response of proteins and their supramolecular assemblies based 
on the elastic medium theory of proteins. 
Acknowledgements 
Discussions with Marco Cecchini, Martin Karplus, Klaus–Jürgen Bathe, and Michael 
Garland are gratefully acknowledged, as is funding from the Alexander von Humboldt 
Foundation in the form of a post-doctoral fellowship. The author additionally thanks 
Michael Sanner for bringing QSLIM to his attention. 
References 
1. Levitt M, Sander C, Stern PS. Protein Normal-mode Dynamics: Trypsin Inhibitor, 
Crambin, Ribonuclease and Lysozyme. Journal of Molecular Biology 
1985;181(3):423-447. 
2. Brooks B, Karplus M. Harmonic dynamics of proteins - Normal Modes and 
fluctuations in bovine pancreatic trypsin inhibitor. Proceedings of the National 
Academy of Sciences of the United States of America 1983;80(21):6571-6575. 
3. Go N, Noguti T, Nishikawa T. Dynamics of a small globular protein in terms of 
low frequency vibrational modes. Proceedings of the National Academy of 
Sciences of the United States of America 1983;80(12):3696-3700. 
4. Bruccoleri RE, Karplus M, McCammon JA. The hinge-bending mode of a 
lysozyme inhibitor complex. Biopolymers 1986;25(9):1767-1802. 
5. Karplus M, McCammon JA. Molecular dynamics simulations of biomolecules. 
Nature Structural Biology 2002;9(9):646-652. 
6. Brooks BR, Janezic D, Karplus M. Harmonic analysis of large systems. 1. 
Methodology. Journal of Computational Chemistry 1995;16(12):1522-1542. 
7. Tirion MM. Large amplitude elastic motions in proteins from a single-parameter, 
atomic analysis. Physical Review Letters 1996;77(9):1905-1908. 
8. Bahar I, Atilgan AR, Erman B. Direct evaluation of thermal fluctuations in 
proteins using a single-parameter harmonic potential. Folding & Design 
1997;2(3):173-181. 
9. Tama F, Brooks CL. Symmetry, form, and shape: Guiding principles for 
robustness in macromolecular machines. Annual Review of Biophysics and 
Biomolecular Structure 2006;35:115-133. 
10. Tozzini V. Coarse-grained models for proteins. Current Opinion In Structural 
Biology 2005;15(2):144-150. 
11. Bahar I, Rader AJ. Coarse-grained normal mode analysis in structural biology. 
Current Opinion in Structural Biology 2005;15(5):586-592. 
12. Ma JP. Usefulness and limitations of normal mode analysis in modeling dynamics 
of biomolecular complexes. Structure 2005;13(3):373-380. 
13. Suezaki Y, Go N. Breathing Mode of Conformational Fluctuations in Globular 
Proteins. International Journal of Peptide and Protein Research 1975;7(4):333-
334. 
14. Lu MY, Ma JP. The role of shape in determining molecular motions. Biophysical 
Journal 2005;89(4):2395-2401. 
15. Bathe KJ. Finite Element Procedures. Upper Saddle River, New Jersey: Prentice-
Hall Inc.; 1996. 
16. Zienkiewicz OC, Taylor RL. The finite element method. Boston: Butterworth–
Heinemann; 2000. 
17. Richards FM. Areas, volumes, packing, and protein-structure. Annual Review of 
Biophysics and Bioengineering 1977;6:151-176. 
18. Connolly ML. Analytical Molecular Surface Calculation. Journal of Applied 
Crystallography 1983;16(OCT):548-558. 
19. Greer J, Bush BL. Macromolecular Shape and Surface Maps by Solvent 
Exclusion. Proceedings of the National Academy of Sciences of the United States 
of America 1978;75(1):303-307. 
20. Sanner MF, Olson AJ, Spehner JC. Reduced surface: An efficient way to compute 
molecular surfaces. Biopolymers 1996;38(3):305-320. 
21. Tama F, Gadea FX, Marques O, Sanejouand YH. Building-block approach for 
determining low frequency normal modes of macromolecules. Proteins: Structure 
Function and Genetics 2000;41(1):1-7. 
22. Li GH, Cui Q. A coarse-grained normal mode approach for macromolecules: An 
efficient implementation and application to Ca2+-ATPase. Biophysical Journal 
2002;83(5):2457-2474. 
23. Lazaridis T, Karplus M. Effective energy function for proteins in solution. 
Proteins: Structure Function and Genetics 1999;35(2):133-152. 
24. Ming D, Kong YF, Lambert MA, Huang Z, Ma JP. How to describe protein 
motion without amino acid sequence and atomic coordinates. Proceedings of the 
National Academy of Sciences of the United States of America 
2002;99(13):8620-8625. 
25. Tama F, Wriggers W, Brooks CL. Exploring global distortions of biological 
macromolecules and assemblies from low-resolution structural information and 
elastic network theory. Journal of Molecular Biology 2002;321(2):297-305. 
26. Chacon P, Tama F, Wriggers W. Mega-Dalton biomolecular motion captured 
from electron microscopy reconstructions. Journal of Molecular Biology 
2003;326(2):485-492. 
27. Michel JP, Ivanovska IL, Gibbons MM, Klug WS, Knobler CM, Wuite GJL, 
Schmidt CF. Nanoindentation studies of full and empty viral capsids and the 
effects of capsid protein mutations on elasticity and strength. Proceedings of the 
National Academy of Sciences of the United States of America 
2006;103(16):6184-6189. 
28. Carrasco C, Carreira A, Schaap IAT, Serena PA, Gomez-Herrero J, Mateu MG, 
Pablo PJ. DNA-mediated anisotropic mechanical reinforcement of a virus. 
Proceedings of the National Academy of Sciences of the United States of America 
2006;103(37):13706-13711. 
29. de Pablo PJ, Schaap IAT, MacKintosh FC, Schmidt CF. Deformation and 
collapse of microtubules on the nanometer scale. Physical Review Letters 
2003;91(9). 
30. Kis A, Kasas S, Babic B, Kulik AJ, Benoit W, Briggs GAD, Schonenberger C, 
Catsicas S, Forro L. Nanomechanics of microtubules. Physical Review Letters 
2002;89(24). 
31. Claessens M, Bathe M, Frey E, Bausch AR. Actin-binding proteins sensitively 
mediate F-actin bundle stiffness. Nature Materials 2006;5(9):748-753. 
32. Bathe M, Heussinger C, Claessens MMAE, Bausch AR, Frey E. Cytoskeletal 
bundle bending, buckling, and stretching behavior. Submitted 2006. 
33. Wriggers W, Zhang Z, Shah M, Sorensen DC. Simulating nanoscale functional 
motions of biomolecules. Molecular Simulation 2006;32(10-11):803-815. 
34. Howard J. Mechanics of Motor Proteins and the Cytoskeleton. Sunderland, MA: 
Sinauer Associates, Inc.; 2001. 
35. Durand P, Trinquier G, Sanejouand YH. New approach for determining low 
frequency Normal-Modes in macromolecules. Biopolymers 1994;34(6):759-771. 
36. Tadmor EB, Ortiz M, Phillips R. Quasicontinuum analysis of defects in solids. 
Philosophical Magazine A: Physics of Condensed Matter Structure Defects and 
Mechanical Properties 1996;73(6):1529-1563. 
37. Abraham FF, Broughton JQ, Bernstein N, Kaxiras E. Spanning the continuum to 
quantum length scales in a dynamic simulation of brittle fracture. Europhysics 
Letters 1998;44(6):783-787. 
38. Baker NA, Sept D, Joseph S, Holst MJ, McCammon JA. Electrostatics of 
nanosystems: Application to microtubules and the ribosome. Proceedings of the 
National Academy of Sciences of the United States of America 
2001;98(18):10037-10041. 
39. Cortis CM, Friesner RA. Numerical solution of the Poisson-Boltzmann equation 
using tetrahedral finite-element meshes. Journal of Computational Chemistry 
1997;18(13):1591-1608. 
40. Holst M, Baker N, Wang F. Adaptive multilevel finite element solution of the 
Poisson-Boltzmann equation I. Algorithms and examples. Journal of 
Computational Chemistry 2000;21(15):1319-1342. 
41. Lamm G, Szabo A. Langevin modes of macromolecules. Journal of Chemical 
Physics 1986;85(12):7334-7348. 
42. Potter MJ, Luty B, Zhou HX, McCammon JA. Time-dependent rate coefficients 
from Brownian Dynamics simulations. Journal of Physical Chemistry 
1996;100(12):5149-5154. 
43. Heckbert PS, Garland M. Optimal triangulation and quadric-based surface 
simplification. Computational Geometry: Theory and Applications 1999;14(1-
3):49-65. 
44. Garland M. Quadric-Based Polygonal Surface Simplification [Ph.D.]: Carnegie 
Mellon University; 1999. 
45. Garland M, Heckbert PS. Surface simplification using quadric error metrics. 1997 
August 1997. p 209-216. 
46. Wriggers W, Agrawal RK, Drew DL, McCammon A, Frank J. Domain motions 
of EF-G bound to the 70S ribosome: Insights from a hand-shaking between multi-
resolution structures. Biophysical Journal 2000;79(3):1670-1678. 
47. Malvern LE. Introduction to the Mechanics of a Continuous Medium. Englewood 
Cliffs, N. J.: Prentice–Hall; 1969. 
48. Bathe KJ, Ramaswamy S. An accelerated Subspace Iteration Method. Computer 
Methods in Applied Mechanics and Engineering 1980;23(3):313-331. 
49. Bathe KJ, Gracewski S. On non-linear dynamic analysis using substructuring and 
mode superposition. Computers & Structures 1981;13(5-6):699-707. 
50. Ming D, Kong YF, Wu YH, Ma JP. Substructure synthesis method for simulating 
large molecular complexes. Proceedings of the National Academy of Sciences of 
the United States of America 2003;100(1):104-109. 
51. Kojima H, Ishijima A, Yanagida T. Direct measurement of stiffness of single 
actin-filaments with and without tropomyosin by in-vitro nanomanipulation. 
Proceedings of the National Academy of Sciences of the United States of America 
1994;91(26):12962-12966. 
52. Benavraham D, Tirion MM. Dynamic and elastic properties of F-actin: A Normal 
Modes Analysis. Biophysical Journal 1995;68(4):1231-1245. 
53. Nicholls A, Honig B. A rapid finite difference algorithm, utilizing successive 
over-relaxation to solve the Poisson-Boltzmann equation. Journal of 
Computational Chemistry 1991;12(4):435-445. 
54. Cortis CM, Friesner RA. An automatic three-dimensional finite element mesh 
generation system for the Poisson-Boltzmann equation. Journal of Computational 
Chemistry 1997;18(13):1570-1590. 
55. Baker N, Holst M, Wang F. Adaptive multilevel finite element solution of the 
Poisson-Boltzmann equation II. Refinement at solvent-accessible surfaces in 
biomolecular systems. Journal of Computational Chemistry 2000;21(15):1343-
1352. 
56. Matsumura M, Wozniak JA, Daopin S, Matthews BW. Structural Studies of 
Mutants of T4 Lysozyme that Alter Hydrophobic Stabilization. Journal of 
Biological Chemistry 1989;264(27):16059-16066. 
57. Bernstein FC, Koetzle TF, Williams GJB, Meyer EF, Brice MD, Rodgers JR, 
Kennard O, Shimanouchi T, Tasumi M. The Protein Data Bank: A Computer-
based Archival File for Macromolecular Structures. Journal of Molecular Biology 
1977;112(3):535-542. 
58. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M. 
CHARMM—A program for macromolecular energy, minimization, and dynamics 
calculations. Journal of Computational Chemistry 1983;4(2):187-217. 
59. Humphrey W, Dalke A, Schulten K. VMD: Visual Molecular Dynamics. Journal 
of Molecular Graphics 1996;14(1):33. 
60. Wriggers W, Milligan RA, McCammon JA. Situs: A package for docking crystal 
structures into low-resolution maps from electron microscopy. Journal of 
Structural Biology 1999;125(2-3):185-195. 
61. Holmes KC, Popp D, Gebhard W, Kabsch W. Atomic model of the actin filament. 
Nature 1990;347(6288):44-49. 
62. McLaughlin PJ, Gooch JT, Mannherz HG, Weeds AG. Structure of Gelsolin 
Segment-1-Actin Complex and the Mechanism of Filament Severing. Nature 
1993;364(6439):685-692. 
63. Wriggers W, Schulten K. Investigating a back door mechanism of actin phosphate 
release by steered molecular dynamics. Proteins: Structure Function and Genetics 
1999;35(2):262-273. 
64. Otterbein LR, Graceffa P, Dominguez R. The crystal structure of uncomplexed 
actin in the ADP state. Science 2001;293(5530):708-711. 
65. Graceffa P, Dominguez R. Crystal structure of monomeric actin in the ATP state - 
Structural basis of nucleotide-dependent actin dynamics. Journal of Biological 
Chemistry 2003;278(36):34172-34180. 
66. Isambert H, Venier P, Maggs AC, Fattoum A, Kassab R, Pantaloni D, Carlier MF. 
Flexibility of actin filaments derived from thermal fluctuations - Effect of bound 
nucleotide, phalloidin, and muscle regulatory proteins. Journal of Biological 
Chemistry 1995;270(19):11437-11444. 
67. Chu JW, Voth GA. Allostery of actin filaments: Molecular dynamics simulations 
and coarse-grained analysis. Proceedings of the National Academy of Sciences of 
the United States of America 2005;102(37):13111-13116. 
68. ben-Avraham D, Tirion MM. Dynamic and elastic properties of F-actin: A 
Normal Modes Analysis. Biophysical Journal 1995;68(4):1231-1245. 
69. Cheng XL, Lu BZ, Grant B, Law RJ, McCammon JA. Channel opening motion of 
alpha 7 nicotinic acetylcholine receptor as suggested by normal mode analysis. 
Journal of Molecular Biology 2006;355(2):310-324. 
70. Lange OF, Grubmuller H. Generalized correlation for biomolecular dynamics. 
Proteins: Structure Function and Bioinformatics 2006;62(4):1053-1061. 
71. Cui Q, Li GH, Ma JP, Karplus M. A normal mode analysis of structural plasticity 
in the biomolecular motor F-1-ATPase. Journal of Molecular Biology 
2004;340(2):345-372. 
72. Nicolay S, Sanejouand YH. Functional modes of proteins are among the most 
robust. Physical Review Letters 2006;96(7):Art. No. 078104. 
73. Brooks B, Karplus M. Normal-Modes for specific motions of macromolecules - 
Application to the hinge-bending mode of lysozyme. Proceedings of the National 
Academy of Sciences of the United States of America 1985;82(15):4995-4999. 
74. Egelman EH, Francis N, Derosier DJ. F-actin is a helix with a random variable 
twist. Nature 1982;298(5870):131-135. 
75. Egelman EH, Orlova A. Allostery, cooperativity, and different structural states in 
F-actin. Journal of Structural Biology 1995;115(2):159-162. 
76. Orlova A, Galkin VE, VanLoock MS, Kim E, Shvetsov A, Reisler E, Egelman 
EH. Probing the structure of F-actin: Cross-links constrain atomic models and 
modify actin dynamics. Journal of Molecular Biology 2001;312(1):95-106. 
77. Orlova A, Shvetsov A, Galkin VE, Kudryashov DS, Rubenstein PA, Egelman 
EH, Reisler E. Actin-destabilizing factors disrupt filaments by means of a time 
reversal of polymerization. Proceedings of the National Academy of Sciences of 
the United States of America 2004;101(51):17664-17668. 
78. McGough A. F-actin-binding proteins. Current Opinion in Structural Biology 
1998;8(2):166-176. 
79. Ming DM, Kong YF, Wu YH, Ma JP. Simulation of F-actin filaments of several 
microns. Biophysical Journal 2003;85(1):27-35. 
80. Den Hartog JP. Mechanical Vibrations. New York: McGraw-Hill; 1956. 
81. LeGoff L, Hallatschek O, Frey E, Amblard F. Tracer studies on F-actin 
fluctuations. Physical Review Letters 2002;89(25):Art. No. 258101-258101. 
82. Gittes F, Mickey B, Nettleton J, Howard J. Flexural rigidity of microtubules and 
actin filaments measured from thermal fluctuations in shape. Journal of Cell 
Biology 1993;120(4):923-934. 
83. Case DA. Normal mode analysis of protein dynamics. Current Opinion in 
Structural Biology 1994;4(2):285-290.
ABSTRACT
  A coarse-grained computational procedure based on the Finite Element Method
is proposed to calculate the normal modes and mechanical response of proteins
and their supramolecular assemblies. Motivated by the elastic network model,
proteins are modeled as homogeneous isotropic elastic solids with volume
defined by their solvent-excluded surface. The discretized Finite Element
representation is obtained using a surface simplification algorithm that
facilitates the generation of models of arbitrary prescribed spatial
resolution. The procedure is applied to compute the normal modes of a mutant of
T4 phage lysozyme and of filamentous actin, as well as the critical Euler
buckling load of the latter when subject to axial compression. Results compare
favorably with all-atom normal mode analysis, the Rotation Translation Blocks
procedure, and experiment. The proposed methodology establishes a computational
framework for the calculation of protein mechanical response that facilitates
the incorporation of specific atomic-level interactions into the model,
including aqueous-electrolyte-mediated electrostatic effects. The procedure is
equally applicable to proteins with known atomic coordinates as it is to
electron density maps of proteins, protein complexes, and supramolecular
assemblies of unknown atomic structure.

<|endoftext|><|startoftext|>
Introduction
	Light element abundance
	Homogeneous BBN
	Parameters and Basic equations
	Theoretical predictions and observations of light elements
	Theoretical predictions and observations of heavy elements (92,94Mo, 96,98Ru)
	Diffusion during BBN
	Summary
	Acknowledgements
	References
ABSTRACT
  This is a reply report to astro-ph/0604264. We studied heavy element
production in high baryon density region in early universe astro-ph/0507439.
However it is claimed in astro-ph/0604264 that small scale but high baryon
density region contradicts observations for the light element abundance or in
order not to contradict to observations high density region must be so small
that it cannot affect the present heavy element abundance.
  In this paper we study big bang nucleosynthesis in high baryon density region
and show that in certain parameter spaces it is possible to produce enough
amount of heavy element without contradiction to CMB and light element
observations.

<|endoftext|><|startoftext|>
Introduction
	Sum-over-states expressions for the time-ordered nonlinear response 
	Quasiparticle expressions for Wannier excitons in semiconductors 
	Connecting the sum-over-states and the quasiparticle pictures
	2D correlation signals
	Discussion
	Acknowledgments
	Exciton representation of the two-band Hamiltonian for fermions 
	The Nonlinear Exciton Equations
	Response functions of quasiparticles
	The exciton scattering-matrix
	SOS expressions for third order techniques.
	Quasiparticle picture for soft-core and hard-core bosons
	References
ABSTRACT
  Two-dimensional correlation spectroscopy (2DCS) based on the nonlinear
optical response of excitons to sequences of ultrafast pulses, has the
potential to provide some unique insights into carrier dynamics in
semiconductors. The most prominent feature of 2DCS, cross peaks, can best be
understood using a sum-over-states picture involving the many-body eigenstates.
However, the optical response of semiconductors is usually calculated by
solving truncated equations of motion for dynamical variables, which result in
a quasiparticle picture. In this work we derive Green's function expressions
for the four wave mixing signals generated in various phase-matching directions
and use them to establish the connection between the two pictures. The formal
connection with Frenkel excitons (hard-core bosons) and vibrational excitons
(soft-core bosons) is pointed out.

<|endoftext|><|startoftext|>
Introduction
Since the solar corona is optically thin, studies based on coronal loop observations must
include some form of subtraction of the background contribution (see e.g., Klimchuk 2000,
Schmelz & Martens 2006, López Fuentes, Klimchuk & Démoulin 2006). In their recent
statistical study based on observations from the Transition Region and Coronal Explorer
(TRACE, see Handy et al. 1999), Aschwanden, Nightingale & Boerner (2007) showed that
the background can be several times brighter than the loops themselves. It is likely that
the background corona is formed by a number of loops that are too faint to produce a large
enough contrast to make them detectable. However, these unobserved structures constitute
a spatially fluctuating background for actual observed loops. Therefore, even for loops with
constant intensity along their length, fluctuations due to the structuring of the background
are expected. The determination of morphological properties of a loop, such as its diameter,
can be affected by the characteristics of the background, and therefore it is important that
the background be taken into account during such analyses.
In a recent paper (López Fuentes, Klimchuk & Démoulin 2006, henceforth LKD06) we
explored the problem of the apparent constant width of coronal loops. Since loops are the
trace of magnetic flux tubes rooted in the photosphere, we might expect on the basis of
simple force-free magnetic field models that most loops would expand with height. However,
observations show that this is not the case; both X-ray loops (Klimchuk 2000) observed
with the Soft X-ray Telescope (SXT, see Tsuneta et al. 1991) aboard Yohkoh, and EUV
loops (Watko & Klimchuk 2000, and LKD06) observed with TRACE, seem to correspond to
constant cross-sections.
In LKD06, we compared a number of observed TRACE loops with corresponding model
flux tubes obtained from force-free extrapolations of magnetogram data from the Michelson
Doppler Imager (MDI, see Scherrer et al. 1995) aboard the Solar and Heliospheric Ob-
servatory (SOHO). To quantify the expansion of the loops and flux tubes, we defined the
expansion factor Γ as the ratio between the widths averaged over the middle and footpoint
sections. We found that the mean expansion factor of the model flux tubes is about twice
that of the corresponding observed loops. Another important result is that the cross sec-
tion is much more asymmetric (from footpoint to footpoint) for the model flux tubes than
for observed loops. We suggest that the origin of this asymmetry lies is the complexity of
the magnetic connectivity of the solar atmosphere. In LKD06 we proposed a mechanism to
explain the observed symmetry of real loops.
Although the measured widths of observed loops have very little global variation, there
are short distance fluctuations as large as 25% of the average width. In LKD06 the loop
background was subtracted by linearly interpolating between the intensities on either side
– 3 –
of the loop. Since the background intensity can be as much as three to five times the
intrinsic intensity of the loop, we might expect the width determination to be less reliable at
positions where the ratio of loop to background intensities is smaller. If so, then measured
width variations might be partly or largely an artifact of imperfect background subtraction.
It is also possible that the width fluctuations are indicative of real structural properties
of the loops. For instance, loops may be bundles of thinner unresolved magnetic strands
that wrap around each other. If there are only a few such strands, then we might expect the
width of the bundle to fluctuate on top of a global trend. Furthermore, the width should
be anti-correlated with the intensity, since the bundle will be thinner and brighter in places
where the strands are lined up along the line of sight, and it will be fatter and fainter in
places where the stands are side-by-side across the plane of the sky.
The filamentary nature of coronal loops, and the solar corona in general, has been
progressively evident from the combination of models and observations (for a review, see
e.g., Klimchuk 2006). Our ability to discern the internal structure of loops is limited by the
instrument resolution. It can be seen from TRACE images that structures many times wider
than the instrument Point Spread Function (PSF) are clearly made of thinner strands. It is
not surprising then that recognizable “individual” loops are no thicker than a few times the
PSF width. Since identifiable individual loops are close to the resolution limit, it has been
suggested that the apparent constant width may just be an artifact of the resolution (see the
recent paper by DeForest 2007). If loops are everywhere much smaller than the PSF, then
they will appear to have a constant width equal to that of the PSF, even if the true width is
varying greatly. We have carefully accounted for the PSF in our earlier studies and concluded
that this is not a viable explanation for the observed constant widths. What we have not
addressed in as much detail is the possible role of imperfect background subtraction. This
paper describes a study that addresses both the background subtraction and finite resolution
and the extent to which they influence the measured widths of loops.
Our approach is to produce synthetic loops with constant and variable cross-sections,
and place them on real TRACE backgrounds to simulate loop observations. We then process
the synthetic data following the same procedure used in LKD06, so we can compare them
with actual TRACE loops. This allows us to determine whether the procedure followed in
LKD06 is able to distinguish expanding loops from constant cross-section loops. We will
answer the question of whether the lack of global expansion in observed loops is real or
simply an observational artifact, as suggest by DeForest (2007). We will also investigate the
reliability of the shorter length scale fluctuations that are often observed.
In Section 2 we describe the main properties of the set of loops studied in LKD06. We
explain the synthetic loop construction in Section 3. In Section 4 we compare synthetic and
– 4 –
observed loops, and we discuss and conclude in Section 5.
2. Observed TRACE loops
In LKD06 we studied a set of 20 loops from TRACE images in the 171 Å passband.
To determine the width of the loops we followed a procedure based on the measurement
of the second moment (the standard deviation) of the cross-axis intensity profile at each
position along the loop. The measurement is done on a straightened version of the loop,
as described in LKD06. Assuming circular cross-sections and uniform emissivity, the cross-
section diameter (that we refer to as the width) will be 4 times the standard deviation of
the profile. The same procedure had been used in previous studies (see Klimchuk et al.
1992, Klimchuk 2000, Watko & Klimchuk 2000). The actual background is estimated by
linear interpolation of the background pixels at both sides of the loop. The obtained width
is corrected for instrumental resolution (i.e. the combined PSF due to telescope smearing
and detector pixelation).
The typical length of the studied loops is around 150 TRACE pixels or 54 Mm, though
loops as long as 300 pixels (108 Mm) are included in the set. The average width for all loops
in the set is 4.2 pixels or 1.5 Mm. Figure 1 shows a typical case having the average width
and length. The upper panel shows the loop as observed in the TRACE image, and the
lower panel is the “straightened” version.
For the resolution correction we use a conversion curve (see Figure 4 in LKD06) to
transform each measured standard deviation value to width. The curve has been obtained
assuming a Gaussian PSF with a full width at half maximum of 2.25 pixels and loop cross-
sections that are circular and uniformly filled. The chosen PSF width is an upper limit for
values obtained in different studies (Golub et al. 1999, Gburek, Sylwester & Martens 2006).
The resolution correction curve plays a two-fold role. First, it allows us to obtain a more
realistic width value from a measured quantity like the standard deviation. Second, it is a
filter for measurements that are clearly unreliable. Standard deviation measurements smaller
than a minimum value equal to the standard deviation of the PSF itself (where the conversion
curve crosses the abscissa axis, see LKD06 Figure 4) are considered untrustworthy, and the
corresponding width is set to zero (i.e., rejected). Problems resulting from significant errors
in the background subtraction can also be identified in this way. It is worth remarking that
our approach is quite cautious in that the PSF we have assumed is wider than the most
recent estimates (see Gburek, Sylwester & Martens 2006). Some of the measurements we
reject as being unresolved may in fact be valid.
– 5 –
Figure 2 is a plot of width (asterisks) versus position along the loop shown in Figure 1.
The horizontal line corresponds to the average width. It is nearly identical to the mean
width of all the loops in the set. The three “zero width” values that lie on the abscissa
axis correspond to standard deviation measurements that were below the resolution limit as
explained above.
To quantify the expansion of loops from footpoint to top we defined expansion factors
as follows:
Γm/se =
Ws +We
, Γm/s =
and Γm/e =
, (1)
where Wm, Ws and We are the average width of portions that cover 15% of the loop length
at the middle, start footpoint, and end footpoint, respectively. Start and end refer to the
magnetic field line traces used to define the magnetic flux tubes in the extrapolation models.
The model flux tubes expand much more than the corresponding observed loops (LKD06).
Their expansion factors are 1.5 to 2 times larger.
As explained in Section 1, the loop width fluctuates as much as 25% over short dis-
tances (see e.g., Figure 2). We tried alternate measures of the loop width (full width at
half maximum and equivalent width of the intensity profile), and the same fluctuations are
present. Our conclusion is that the fluctuations are most likely due to the influence of the
background (see below). Since these fluctuations have a short length scale and vary quasi
randomly around a global trend, they do not significantly affect the measured expansion
factors.
3. Synthetic loops
In this study we create a set of synthetic loops with similar characteristics to the TRACE
loops studied in LKD06, and we overlay them on real TRACE backgrounds. The axis of the
loop is linear and its cross-section is circular. To analyze the possibility that the apparent
constant width is due to a resolution effect we create loops of two kinds: loops with constant
diameter along their length, and loops that are wider in the middle than at the footpoints.
For the second class of synthetic loops we use an expansion factor that is typical of the
model flux tubes obtained in LKD06. We set the diameter of the loop at the mid point to be
twice the diameter at the ends, and we assume that the diameter varies quadratically with
position. Since the expansion factor defined in Equation (1) involves averages along 15%
sections of the loop, Γm/se = 1.57 rather than 2.0.
– 6 –
We have chosen two kinds of background for the synthetic loops. Background I, shown
in the top-left panel in Figure 3, corresponds to a typical TRACE loop background: it has
similar intensity magnitude and fluctuations, and it contains moss (see Berger et al. 1999,
Martens et al. 2000) and other intense features. Background II, shown in the top-right
panel, is fainter and fluctuates less than Background I. Although it does not correspond
very well to real loop backgrounds, we consider it interesting to study how this kind of
background affects the width determination. The average intensities of backgrounds I and II
are approximately 70 and 30 DN (Data Numbers), respectively. For comparison, the typical
intrinsic (background subtracted) intensity of observed loops is between 20 and 40 DN. Both
background areas have been extracted from a TRACE image in the 171 Å band obtained
at 01:45 UT on July 30, 2002.
To create a simulated TRACE image containing the synthetic loop we proceed as follows.
We create an image of the loop without background. The maximum intensity of the loop (at
the axis) is set proportional to the average intensity of the background image on which it
will be later superposed. The constant of proportionality is referred to as the intensity factor
Φ. Since the background intensity tends to be higher than the intrinsic loop intensity, Φ is
generally smaller than 1. To simulate the finite resolution, we smooth the image of the loop
using a gaussian profile with a full width at half maximum of 2.25 pixels corresponding to
the instrument PSF. The resulting loop is then placed on the previously selected background
(I or II) from the TRACE image.
The images in Figure 3 have been created as described above. Both panels in each
row use the same synthetic loop placed in one case on background I (left) and the other
case on Background II (right). The four loops differ in the following ways. The loop in the
second row (panels a and b) has a constant diameter of 4 pixels, corresponding to Γm/se = 1.
The loop in the third row (panels c and d) has a diameter that expands from 2.5 pixels
at the ends to 5 pixels at the center, corresponding to Γm/se = 1.57. The loop in fourth
row (panels e and f) has a constant diameter of 3 pixels. Finally, the loop in the bottom
row (panels g and h) expands from 2 pixels at the ends to 4 pixels in the middle. Notice
that the ends of this last loop are narrower than the PSF. The intensity factor Φ has been
adjusted so that the resulting loops look similar, by eye, to typical TRACE loops. We used
Φ = 0.5 for Background I and Φ = 0.7 for Background II. Considering the average intensities
of Backgrounds I and II, this gives intrinsic loop intensities of around 35 DN and 25 DN,
respectively. These values are consistent with the intrinsic intensities of observed loops.
The photon statistical noise associated with TRACE data is given by
N , where N is
the number of photon counts per pixel (Handy et al. 1999). Since 1 DN corresponds to 12
photon counts, the photon noise as a percentage of the signal is:
– 7 –
PN% = 100
, (2)
where I is the intensity of the signal in DN/pix. The synthetic loop data constructed here
includes the photon noise present in the TRACE image used for the background. As we
now demonstrate, this contribution dominates the noise from the loop itself, so we can safely
ignore the loop contribution. Let us first consider the extreme case of low background and
loop intensities, namely: Ib = 30 DN and Il = 10 DN. According to Equation (2) the photon
noise of the total signal is PN% = 30/
40 or 4.7%. On the other hand, for our synthetic
images (photon noise from the background only) it is PN% = 30/
30 or 4.1%, meaning
a difference of 0.6%. For a more typical case of Ib = 70 DN and Il = 25 DN, the same
percentages are 3.1% and 2.6% respectively, implying a difference of 0.5% of the total signal.
These differences are minor and will have a negligible effect on the results of the following
sections.
4. Results
4.1. Can we detect expanding loops?
From the set of loop images, we measured the width following the same procedure used
in LKD06 for real loops and described in Section 2. We used the conversion curve (Figure
4 in LKD06) to correct for the instrument PSF. The non-linearity of the curve increases
the dispersion of the resulting widths at smaller values approaching the width of the PSF.
In Figure 4 we plot the “measured” width (asterisks) versus position along the loop for the
eight cases in Figure 3. The format of the figures is the same. For comparison, we also plot
as continuous lines the actual diameters used to construct the images. It can be seen that,
despite the fluctuations, expanding and constant width loops are clearly distinguishable. This
is true for loops that are relatively wide (top two rows) and loops that are relatively narrow
(bottom two rows). This demonstrates convincingly that, if loops expanded as expected
from standard force-free extrapolation models, then it would be noticeable from observations
even when they are very close to the resolution limit (last row). Since that it not the case,
this may imply that actual magnetic fields have more complexity than is present in the
standard models. We know, for example, that the field is comprised of many thin flux
strands (elemental kilogauss tubes) that are tangled by photospheric convection. We believe
this can explain the symmetry of observed loops with respect to their summit (see discussion
in LKD06 and Klimchuk 2006), but whether it can also explain the lack of a general expansion
with height is unclear.
– 8 –
It is interesting to note from the plots in Figure 4 that the measured width is systemat-
ically smaller than the width set in the construction of the loops. This tendency appears in
all loops and is very likely due to an underestimation of the real width in the measurement
procedure. The procedure requires a subjective selection of the loop edges for the purpose
of defining the background and computing the standard deviation. During this step one
can miss the faint tail of the cross-axis intensity profile that blends in with the background.
We have verified that there is a tendency to define the loop edges to be slightly inside the
actual edges. This causes the measured width to be artificially small, both because the tail
of the profile is missing from the standard deviation computation and because too strong a
background is subtracted from the loop. We expect the effect to be greatest for loops that
are especially faint or especially narrow, as discussed below. If this explanation is correct,
we can conclude that the TRACE loops studied in LKD06 are actually slightly wider than
our measurements seem to indicate.
The fact that the measured width is a lower bound for the real width gives further
support to our assertion that the analyzed loops are instrumentally resolved. In LDK06, we
estimated the width uncertainties associated with background subtraction by repeating each
measurement using different choices for the loop edges. We concluded that rule-of-thumb
error bars range from 10% below to 20% above the measured best value. It now appears
that the actual error bars may be somewhat larger. However, we stress that this does not
impact our ability to distinguish expanding loops from non-expanding loops, as is readily
apparent from Figure 4.
To quantify this claim, we computed the expansion factors (Γm/se in Equation 1) of all
the synthetic loops shown in Figure 4. These are listed in Table 1. The upper and lower
limits that define the error bars are the expansion factors Γm/s and Γm/e. For comparison, in
the case of the observed loop of Figures 1 and 2, the expansion factor computed in the same
way is 1.03±0.04. The values given in Table 1 clearly confirm our conclusion that loops with
constant and expanding cross section can be easily distinguished. It is interesting to note
that the expansion factors for the same loops placed in different backgrounds can be notably
different. The same is true for loops of the same kind (expanding or not) but with different
characteristic size (wide or narrow). Compare row 1 with row 3, and row 2 with row 4 in
Figure 4 and the table. The error bars are also different in all cases. Part of these differences
may be due to the subjective part of the analysis procedure (the selection of the loop edges).
However, repeating the width measurements we obtain approximately the same expansion
factors. Therefore, the distribution of the background emission and the characteristic size of
the loop both play a role in determining the precise value of the expansion factors and the
error bars. In particular, Backgrounds I and II tend to give an under and over estimation of
the expansion factors, respectively (compared to the values set during the loop construction).
– 9 –
Nevertheless, we want to stress that the measured expansion factors of the expanding and
non-expanding synthetic loops are clearly clustered around the actual values, implying that
loops with constant and expanding cross section are readily distinguishable.
Next, we study how the observed loop expansion is affected by the relative intensity of
the loop compared to the background. To test this, we created synthetic data in the way
described in Section 3, for different values of the loop-to-background intensity ratio Φ. In
Figure 5 we plot the expansion factor Γm/se (Equation 1) versus Φ for 4 narrow synthetic
loops with similar characteristics to those shown in panels e) to h) of Figures 3 and 4.
The difference is that the loops of Figure 5 are 300 pixels long, instead of 200 pixels. The
definition of Γ is not affected by the change of length. On the other hand, longer loops
provide more measurements and better statistics for studying how loop expansion depends
on the loop-to-background intensity ratio. We chose thin loops for Figure 5 because their
expansion is more challenging to measure and they are more affected by the background.
If the intensity ratio Φ is too small, it is difficult to detect a loop above the background,
much less measure its width. Our previous studies of observed loops have therefore avoided
such cases. We subjectively define a lower limit for loop visibility of around Φ = 0.3. Below
that, the width determination is unreliable. The upper value Φ = 1.5 is extreme for most
TRACE loops, but it is interesting for analysis and may be appropriate to other datasets.
The intermediate Φ values are 0.5, 0.7 and 1.0. Figure 5 provides strong additional support
for our claim that the expansion factors of expanding and non-expanding loops can be clearly
distinguished, even for the most critical cases of very low intensities and narrow widths. In
no case do the error bars of expanding and non-expanding loops overlap.
It is interesting to note that the synthetic loops used for Figure 5 overlap with more of
the background image than do the shorter synthetic loops used for Figures 3 and 4. The
footpoint and middle sections therefore combine with different portions of the background.
Since the expansion factors are qualitatively similar in the corresponding cases, we can be
confident that our results are not an artifact of the particular loop-background combinations.
So far we have not considered loops that are completely below the resolution limit. In
Figure 6 we show two cases of unresolved synthetic loops. Both have a constant diameter of
0.5 pix, and both use Background I (Figure 3). The loops differ only in the intensity ratio
Φ, which is set to 1 for case (a) and 3 for case (b). Note, however, that because the loops
occupy only a fraction of a pixel, the “observed” intensity ratios are much smaller: around
0.25 for the Φ=1 loop and 0.5 for the Φ=3 loop.
Figure 7 shows the widths of the loops as measured in the usual way, including correction
for instrument resolution. A majority of the measurements are equal to zero, meaning that
– 10 –
the computed standard deviation is below that of the PSF. This is especially true for the
fainter loop of case (a). We can understand this behavior as follows. Due to the influence
of the variable background, we expect some measurements to be too large and others to be
too small. However, because of the systematic effects associated with loop edge selection,
discussed above, we expect more of the measurements to be too small.
The conversion from standard deviation to width is very sensitive at small values, where
the conversion curve is nonlinear, and it only takes small errors in the standard deviation to
produce a zero width value. The solid line in Figure 7 indicates the actual loop width of 0.5
pix, while the dashed line indicates the full width at half maximum of the PSF. The most
important conclusion to draw from the figure is that our measurement technique can easily
detect when loops are unresolved, i.e., when they are thinner than the PSF. As we stated
before, the loops analyzed in LKD06 and previous works are all wider than the PSF (see
also Section 5 below).
Finally, in Figure 8 we plot width versus position along a synthetic loop with a footpoint
width of 2.5 pixels and a model expansion factor of Γm/se = 2.2. Our measurement procedure
tracks the loop expansion very well. The expansion factor computed from the observed width
as in Table 1 gives Γm/se = 2.1± 0.2. Therefore, the loop can be readily distinguished from
the Γm/se = 1.57 loop having the same footpoint size in Figure 4, panel (d).
4.2. Short length scale width fluctuations
As discussed in Section 1 the measured widths of observed loops fluctuate as much as
25% over short length scales. It is important to know whether these variations are real or
an artifact of the background. Comparison of panels a) and b) in Figure 4 with Figure 2
shows that synthetic and observed loops with similar characteristic width exhibit similar
width fluctuations. For the observed loop of Figure 2, the amplitude of the fluctuations
computed as the ratio of the standard deviation of the measured width to its average is
18%. The corresponding ratios for the synthetic loops of panels (a) and (b) in Figure 4,
are 17% and 25%, respectively. This suggests that the fluctuations are not real and argues
against loops being comprised of a small number of braided strands (the possibility that
they are bundles of many tangled strands is not affected). This is not a firm conclusion,
however, since the fluctuations are somewhat more coherent for the observed loop than for
the synthetic loop. We return to this issue below. Narrower loops (e.g., panels (e) and (f)
in Figure 4) show larger amplitude fluctuations (21% and 38%, respectively) mostly because
of the non-linearity of the resolution correction curve (LKD06 Figure 4), which exaggerates
differences at smaller widths.
– 11 –
To study how the background fluctuations affect the width determination, we analyze
the relationships between the width and the loop and background intensities. In Figure 9 we
plot as a function of position along the loop: the intensity of the background pixels at either
side of the loop (from which the loop background is linearly interpolated; continuous lines),
the loop width (dotted), the loop intensity (maximum intensity of the background-subtracted
profile; dashed), and the absolute value of the difference between the two background pixel
intensities (dot-dashed). The loop width is given in pixels and multiplied by 10 for easier
comparison with the intensities. The upper panel corresponds to the observed loop example
of Figures 1 and 2, and the lower panel correspond to the synthetic loop of Figures 3 and 4,
panels (a).
Figure 9 shows that our synthetic loop data share the main qualitative characteristics of
real loops. The fluctuations of the background intensity and its difference at the sides of the
loop, and the loop intrinsic intensity and its fluctuations, are similar in both cases. There are
obvious differences due to the spatial structure unique to each background that can easily
be identified in the images. For example, the bumps between 30 and 60 for the observed
loop, and between positions 0 and 40 and between 90 and 130 for the synthetic loop, can
be traced to patches of enhanced emission in Figures 1 and 3. Another difference is the
global variation in the intensity of the observed and synthetic loops. The measured intensity
of the synthetic loop is nearly constant because the loop was constructed that way (small
fluctuations come entirely from imperfect background subtraction). The measured intensity
of the observed loop, on the other hand, tends to diminish systematically toward the right
end. This is likely to be real and not an artifact of the background subtraction. Despite
of these expected differences, the comparison shows that the synthetic loops reproduce the
main properties of the observed cases.
We have suggested that small-scale fluctuations of the measured intensities and widths
of loops are due to imperfect background subtraction. To further assess this, we look for
statistical correlations between these quantities. In the upper panels of Figure 10 we plot
width versus intensity for all positions along the observed and synthetic loops, respectively.
We find that there is a small direct correlation between the width and intensity in both
cases: wider sections of the loops tend to be brighter. The lines in the scatter plots are least-
squares fits, which have the indicated slopes and intercepts. The correlation between width
and intensity can be explained by the tendency, during the interactive analysis procedure,
to miss the wings of the intensity profile and define the loop edges to be inside the true
edges. As described earlier, this causes an over estimation of the background intensity and
produces artificially narrow loop widths and artificially faint loop intensities. We expect the
magnitude of this effect to vary depending on the brightness of the background relative to
the loop. It will be stronger (i.e., the underestimates of width and intensity will be greater)
– 12 –
when the background is relatively bright. This is confirmed in the second row of Figure
10. It shows an inverse correlation between the measured width and the background-to-loop
intensity ratio for both the observed and synthetic loops. The background intensity used
here is the average of the sloping background subtracted during the analysis (i.e., the average
of the values on either side of the loop). Notice also that for the synthetic loop, the measured
width tends to be smaller than the model width (4 pixels) when the relative intensity of the
background is larger.
This effect is almost certainly responsible for the width-intensity correlation of the
synthetic loop and seems a likely explanation for the observed loop, as well. Whether it
is strong enough to allow the possibility that loops are bundles of a few (3-5) intertwined
strands is unclear. Recall that such loops would exhibit an inverse correlation between
width and intensity if the measurements were perfect. Are measurement errors large enough
to negate this inverse correlation and produce a small direct correlation, as observed? Only
more involved modeling can answer this question.
It seems plausible that cross-loop gradients in the background could also have an effect
on the measured width. Certainly small scale inhomogeneities are more difficult to subtract
than a flat background. In the upper panels of Figure 11, we plot width versus the absolute
value of the background intensity difference on the two sides of the loop. No correlation is
apparent for either the observed or synthetic loops. We confirmed a lack of correlation using
a non-parametric statistical analysis. We also find no correlation between the intrinsic loop
intensity and the background intensity difference.
The right bottom panel of Figure 11 indicates how the known error in width measure-
ment for the synthetic loop correlates with the background intensity gradient. The ordinate
is the absolute value of the difference between the measured width and the width used dur-
ing the loop construction. The abscissa is the absolute value of the background intensity
difference on the two sides, normalized by the loop intensity. The normalization is meant
to compensate for the fact that background gradients should have a lesser impact on bright
loops. The left bottom panel of Figure 11 is a corresponding scatter plot for the observed
loop. Since the actual width is not known, the ordinate is replaced by the absolute value of
the deviation of the measured width from its mean. In neither case is there a correlation, as
confirmed by statistical analysis. We conclude that the magnitude of the background has a
bigger effect on the width measurements than does the difference in the background on the
two sides of the loop (the cross loop gradient).
– 13 –
5. Discussion and conclusion
In this paper we study the effect of the background and the instrument PSF in the
determination of the apparent width of EUV coronal loops observed by TRACE. Our main
motivation is to extend the results obtained in our previous work: López Fuentes, Klimchuk
& Démoulin (2006; LKD06). There, we compared a set of observed TRACE loops with
corresponding force-free model flux-tubes, and we found that observed loops do not expand
with height as expected from the extrapolation model. Here, we construct artificial loops
with expansion factors similar to those of the studied loops and the model flux-tubes, and
we overlay them on real TRACE backgrounds. We repeat on these synthetic loops the
same procedure followed in LKD06, and compare the results back with real loops. We find
that even for loops close to the resolution limit the procedure followed in LKD06 discerns
expanding and non-expanding cross-sections. The method includes a resolution correction
that identifies measurements that are below the resolution limit and therefore unreliable
(see explanation in Section 2). We used a gaussian Point Spread Function (PSF) for the
instrument with a FWHM of 2.25 pixels, which is an upper bound for values found by
different authors (Golub et al. 1999, Gburek et al. 2006).
In a recent paper, DeForest (2007) has proposed the interesting idea that most thin
individual loops observed by TRACE are actually extremely bright structures well under the
resolution limit. In this scenario, the loop apparent width would be given by the instrument
PSF. In this way, loops may actually expand, but their size both at the top and the footpoints
would be unresolved and would appear the same. The motivations for this conjecture are the
apparent constant width of loops, and the observation that TRACE loops have an intensity
scale height that is considerably larger than expected for static equilibrium (Winebarger et
al. 2003, Aschwanden et al. 2001) or steady flow (Patsourakos & Klimchuk 2004). More
precisely, for expanding loops that are everywhere unresolved, the density gradient present
in the corona is larger than inferred from the observations under the assumption of constant
cross section.
According to the above explanation, we should expect all individual TRACE loops to
have a true width less than that of the PSF and an apparent width roughly equal to that of
the PSF. However, observations do not support this. The mean width of the loops studied in
LKD06 is 4.2 pix after correction for the instrument resolution (see also Watko & Klimchuk
2000). As shown in Figure 7 and discussed in Section 4.1, our method can easily identify
loops that are intrinsically more narrow than the PSF. The loops selected for our studies are
clearly not of this type.
As we discussed in Section 1, coronal structures that are many times wider than the
TRACE PSF are observed to be formed by thinner individual loops. Therefore, there is an
– 14 –
intermediate range of widths – let us say between one and three PSF widths – for which the
profiles produced by unresolved threads could overlap to form apparently wider loops. This,
together with the effect of a fluctuating and intense background, are the arguments provided
by DeForest (2007) to explain loops wider than the PSF. A key point in this discussion is
that unresolved neighbor threads might be expected to separate from each other with height
for the same reasons that individual strands might be expected to expand with height (e.g.,
if the field behaves like simple force-free extrapolation models predict). In this respect, a
structure formed by diverging threads does not differ from the expanding loops studied in
Section 3. As we discussed there, the plots in Figures 4 and 5 and the Γ factors given in
Table 1 show clearly that our procedure for the width determination would be able to detect
the expansion if it existed, even for loops near the resolution limit.
It is interesting to compare the synthetic images in Figure 2 in DeForest’s article with
our Figure 3. There, he claims that synthetic loops made from a single unresolved thread
of constant width and from two diverging threads are indistinguishable from each other and
from actual TRACE loops. In our Figure 3 it is also very difficult, by eyeball, to determine
which loops have expanding widths or constant widths. However, the plots in Figure 4 show
that a careful examination through a quantitative measurement provides the answer.
One of DeForest’s main arguments is that it is difficult to measure the width of features
that are at or near the instrument resolution due to effects such as the smearing from the
telescope, pixilation from the detector, and the presence of background emission. We agree,
but these claims need to be quantified. It is not sufficient to make eye-ball comparisons of
features. Quantitative measures must be used. We have adopted the standard deviation
of the loops cross-axis intensity profile as one such measure. We have been very careful in
our work to indicate when the measurements are reliable and when they are not. Measured
widths that are very close to the instrument resolution have very large error bars that we
show (see LKD06) and that we take into account.
We have paid particularly careful attention to the effects of the combined PSF, which
accounts for both smearing and pixilation. DeForest is correct that measurements of very
thin features depend critically on the PSF. We have therefore adopted a conservative value
for the PSF width that is greater than the estimates determined by the instrument teams
and others. Furthermore, features as narrow as our assumed PSF are routinely observed,
which would not be possible if the actual PSF were wider.
DeForest is also correct that background emission can be important and may lead to
spurious results. It is therefore vital to subtract the background before making measure-
ments, as we have done. In LKD06, we have avoided loops where the background is especially
bright or complicated. We attempted in our earlier studies to estimate the uncertainties as-
– 15 –
sociated with imperfect background subtraction, but this was not as careful as our treatment
of resolution effects. The main purpose of this paper was to rigorously evaluate the effects
of background on the measurement of loop widths.
Regarding the importance of quantitative measurements versus visual inspection, we
concur with DeForest that the visual determination of the edge of a feature is subjective
and largely based on the intensity gradient across the feature. This can lead to erroneous
conclusions about width variations if there is a systematic variation of intensity along the
feature, such as decreasing intensity with height. Our quantitative measure of width based
on the standard deviation of the intensity profile is by construction moderating such bias.
The positive correlation found between the loop width and the maximum intensity (top
panels in Figure 10) could be a remnant of this effect or an intrinsic property of the loops.
DeForest correctly points out that, with optically-thin coronal emission, the observed
intensity scale height of a hydrostatic structure is larger for an expanding loop than for a
constant cross section loop, especially if the loop is unresolved. In fact, for the 1-2 MK
model examples he shows (Figures 5 and 6), the intensity actually increases with height by
a factors of 2-3 to a maximum brightness at altitudes near 7 × 109 cm). Whether actual
TRACE loops have this property is unknown and should be investigated. The variation of
temperature with height combined with the transmission properties of the filter used will
complicate the interpretation.
We note that the observation of super-hydrostatic scale heights is different from the
observation of excess densities in TRACE loops. For most TRACE loops, the density inferred
from the observed emission measure and diameter is much larger than that expected from
static equilibrium theory, given the observed temperature and loop length (Aschwanden et
al. 2001, Winebarger et al. 2003). DeForest’s idea of unresolved loops would make this
discrepancy even worse, since a higher density is required to produce the same emission
measure from a smaller volume.
The loops identified and measured by DeForest are qualitatively much different from
the loops identified and measured in our studies. We chose cases that are not obviously
composed of a few resolved or quasi-resolved strands (although we believe that our loops
may be composed of large numbers of elemental strands that are far below the resolution
limit). The only one of his loops with no apparent internal structure (Loop 6 in his Figure 8)
would have not been selected by us, because it is barely discernable above the background.
On the other hand, some of the thinner structures within DeForest’s loop bundles (e.g., at
the bottom edge of his Loop 3) are not unlike the loops we have investigated. In this regard,
we must clarify a comment attributed to one of us at the end of Section 5 in his paper. We
suggest that researchers seeking to study monolithic-looking loops will tend to select cases
– 16 –
that are only a few resolution elements across. Significantly wider loops (e.g, all except Loop
6 in DeForest’s sample) usually show evidence of internal structure and will be rejected.
We agree fully with DeForest that collections of loops (loop bundles) expand appreciably
with height. However, we stand by our claim that individual loops that are clearly discernable
within a bundle have a much more uniform width. This is not an artifact of the resolution.
A hare and hounds exercise, as currently planned, is one useful way to clarify any remaining
differences of opinion.
An important topic of the present study has been the analysis of how the properties of
the background affect the loop width determination. We searched for correlations between
the width and: the loop intrinsic intensity, the background intensity to loop intensity ratios,
and the absolute value of the background difference. The background intensity is computed
as the average between the pixels at both sides of the loop, which is used for the estimation of
the actual loop background, while the background difference is the difference between those
pixels. We found a direct correlation between the width and the maximum intensity of the
loop profile (see Figure 10, upper panels). This is probably due to the fact that we tend to
miss the “tails” of the loop profile at positions where the loop is less intense with respect to
the background, and therefore, the measured profile tends to be narrower. This is confirmed
by the inverse correlation found between the width and the ratio of background intensity to
loop intensity (see Figure 10, bottom panels). It can be seen from the plots that the width
tends to be abnormally narrower, and the points more disperse, for larger background to
loop intensity ratios.
We found no evidence of correlation between the width and the the background differ-
ence. This shows that the background gradients are less important in the determination of
the width than the background relative intensity. We stress, however, that this does not
affect our ability to determine the global expansion properties of loops, and that despite of
the background contribution we are readily able to distinguish constant width loops from
loops that expand as predicted from simple force-free magnetic models.
The results presented here are extremely intriguing and provide clues and new questions
to guide future investigations. However, it is expected that definitive answers will come from
improved observations using new generations of solar instruments with higher resolution.
We acknowledge the Transition Region and Coronal Explorer (TRACE ) team. We wish
to thank Craig DeForest for fruitful discussions about the nature of observed loops. We also
thank our anonymous referee for his/her valuable suggestions and comments. The authors
acknowledge financial support from CNRS (France) and CONICET (Argentina) through
their cooperative science program (N0 20326). MLF thanks the Secretary of Science and
– 17 –
Technology of Argentina, through its RAICES program, for travel support. This work was
partially funded by NASA and the Office of Naval Research.
REFERENCES
Aschwanden,M.J., Nightingale,R.W., & Boerner,P. 2007, ApJ, 656, 577
Aschwanden, M. J., Schrijver, C. J., & Alexander, D. 2001, ApJ, 550, 1036
Berger, T. E., De Pontieu, B., Fletcher, L., Schrijver, C. J., Tarbell, T. D., & Title, A. M.
1999, Sol. Phys., 190, 409
DeForest, C. E. 2007, ApJ, 661, 532
Gburek, S., Sylwester, J., & Martens, P. 2006, Sol. Phys., 239, 531
Golub, L., et al. 1999, Physics of Plasmas, 6, 2205
Handy, B. N., et al. 1999, Sol. Phys., 187, 229
Klimchuk, J. A. 2000, Sol. Phys., 193, 53
Klimchuk, J. A., Antiochos, S. K., & Norton, D. 2000, ApJ, 542, 504
Klimchuk, J. A. 2006, Sol. Phys., 234, 41
Klimchuk, J. A., Lemen, J. R., Feldman, U., Tsuneta, S., & Uchida, Y. 1992, PASJ, 44,
López Fuentes, M. C., Klimchuk, J. A., & Démoulin, P. 2006, ApJ, 639, 459 (LKD06)
Martens, P. C. H., Kankelborg, C. C., & Berger, T. E. 2000, ApJ, 537, 471
Patsourakos, S. & Klimchuk, J. A. 2004, ApJ, 603, 322
Scherrer, P. H. et al. 1995, Sol. Phys., 162, 129
Schmelz, J. T., & Martens, P. C. H. 2006, ApJ, 636, L49
Watko, J. A. & Klimchuk, J. A. 2000, Sol. Phys., 193, 77
Winebarger, A. R., Warren, H. P., & Mariska, J. T. 2003, ApJ, 587, 439
This preprint was prepared with the AAS LATEX macros v5.2.
– 18 –
Table 1: Expansion factors Γm/se (Equation 1) for the synthetic loops shown in Figures 3
and 4 (see detailed explanation in Section 4.1).
Synthetic loop Imposed Background I Background II
Const. width (4 pix) 1 0.85± .02 0.95± .04
Variable width (2.5-5 pix) 1.57 1.38± .25 1.76± .01
Const. width (3 pix) 1 0.82± .05 1.03± .03
Variable width (2-4 pix) 1.57 1.59± .28 2.11± .20
– 19 –
Fig. 1.— The loop shown is an example of the TRACE loops studied in LKD06. The lower
panel shows the straightened version of the loop that is used for the width determination.
– 20 –
Fig. 2.— Measured width versus position along the loop of Figure 1. Background subtraction
and PSF correction have been applied, as described in Section 2. The horizontal line shows
the mean value.
– 21 –
Fig. 3.— Synthetic data created by superposing loops with different specified properties on
real TRACE backgrounds. The top panels show the background used in each column. The
ends (footpoints) and middle of the loops have an imposed diameter (in pixels) of: (a,b) 4-4,
(c,d) 2.5-5, (e,f) 3-3, (g,h) 2-4. This provides both constant and expanding (by a factor 2)
synthetic loops close to the spatial resolution. For a detailed description of the panels see
Section 3.
Fig. 4.— Measured width corrected for instrument resolution (asterisks) versus position
along the synthetic loops shown in panels a) to h) of Figure 3. Continuous lines indicate the
model width.
– 23 –
Fig. 5.— Expansion factor Γm/se versus loop-to-background intensity ratio Φ, for expanding
and non-expanding synthetic narrow loops on backgrounds I and II (similar to loops in panels
e-h of Figure 4). The error bars are defined by the expansion factors Γm/s and Γm/e (see
Section 4.1). The horizontal line indicates the expansion factor of the model.
– 24 –
Fig. 6.— Two examples of unresolved synthetic loops constructed with a constant width of
0.5 pixels. The loops differ only in the loop-to-background intensity ratio Φ, which is set to
1 for case (a) and 3 for case (b) (see Section 4.1).
– 25 –
Fig. 7.— Width measurements for the two synthetic loops in Figure 6, corrected for the
instrument resolution. The solid line indicates the actual model loop width, and the dashed
line indicates the PSF full width at half maximum.
– 26 –
Fig. 8.— Width versus position along a synthetic loop with a model expansion factor of 2.2.
The width has been corrected for instrument resolution.
– 27 –
Fig. 9.— Different loop and background properties versus position along the loop. Top
panel: example loop from Figure 1; botton panel: synthetic loop from Figure 3, panel a).
For a detailed description see Section 4.2.
Fig. 10.— Scatter plots of measured quantities for the observed loop in Figure 1 (left
column) and for the synthetic loop of Figures 3 and 4, panels (a) (right column). Top:
width versus on-axis loop intensity. Bottom: width versus background intensity to loop
intensity ratio (see Section 4.2). Continuous lines correspond to least-squares fits of the
data. Slopes and intercepts are given in the respective panels.
Fig. 11.— Scatter plots of measured quantities for the observed loop in Figure 1 (left
column) and for the synthetic loop of Figures 3 and 4, panels (a) (right column). Top:
width versus absolute value of the background intensity difference across the loop. Bottom
left: width deviation from the mean versus the background intensity difference normalized
by the loop intensity (see Section 4.2). Bottom right: same kind of plot, but the deviation
is relative to the model width.
	Introduction
	Observed TRACE loops
	Synthetic loops
	Results
	Can we detect expanding loops?
	Short length scale width fluctuations
	Discussion and conclusion
ABSTRACT
  We study the effect of the coronal background in the determination of the
diameter of EUV loops, and we analyze the suitability of the procedure followed
in a previous paper (L\'opez Fuentes, Klimchuk & D\'emoulin 2006) for
characterizing their expansion properties. For the analysis we create different
synthetic loops and we place them on real backgrounds from data obtained with
the Transition Region and Coronal Explorer (\textit{TRACE}). We apply to these
loops the same procedure followed in our previous works, and we compare the
results with real loop observations. We demonstrate that the procedure allows
us to distinguish constant width loops from loops that expand appreciably with
height, as predicted by simple force-free field models. This holds even for
loops near the resolution limit. The procedure can easily determine when loops
are below resolution limit and therefore not reliably measured. We find that
small-scale variations in the measured loop width are likely due to
imperfections in the background subtraction. The greatest errors occur in
especially narrow loops and in places where the background is especially bright
relative to the loop. We stress, however, that these effects do not impact the
ability to measure large-scale variations. The result that observed loops do
not expand systematically with height is robust.

<|endoftext|><|startoftext|>
Polarizations of J/ψ and ψ(2S) Mesons Produced in pp Collisions at
s = 1.96 TeV
A. Abulencia,24 J. Adelman,13 T. Affolder,10 T. Akimoto,55 M.G. Albrow,17 S. Amerio,43 D. Amidei,35
A. Anastassov,52 K. Anikeev,17 A. Annovi,19 J. Antos,14 M. Aoki,55 G. Apollinari,17 T. Arisawa,57 A. Artikov,15
W. Ashmanskas,17 A. Attal,3 A. Aurisano,42 F. Azfar,42 P. Azzi-Bacchetta,43 P. Azzurri,46 N. Bacchetta,43
W. Badgett,17 A. Barbaro-Galtieri,29 V.E. Barnes,48 B.A. Barnett,25 S. Baroiant,7 V. Bartsch,31 G. Bauer,33
P.-H. Beauchemin,34 F. Bedeschi,46 S. Behari,25 G. Bellettini,46 J. Bellinger,59 A. Belloni,33 D. Benjamin,16
A. Beretvas,17 J. Beringer,29 T. Berry,30 A. Bhatti,50 M. Binkley,17 D. Bisello,43 I. Bizjak,31 R.E. Blair,2
C. Blocker,6 B. Blumenfeld,25 A. Bocci,16 A. Bodek,49 V. Boisvert,49 G. Bolla,48 A. Bolshov,33 D. Bortoletto,48
J. Boudreau,47 A. Boveia,10 B. Brau,10 L. Brigliadori,5 C. Bromberg,36 E. Brubaker,13 J. Budagov,15 H.S. Budd,49
S. Budd,24 K. Burkett,17 G. Busetto,43 P. Bussey,21 A. Buzatu,34 K. L. Byrum,2 S. Cabreraq,16 M. Campanelli,20
M. Campbell,35 F. Canelli,17 A. Canepa,45 S. Carilloi,18 D. Carlsmith,59 R. Carosi,46 S. Carron,34 B. Casal,11
M. Casarsa,54 A. Castro,5 P. Catastini,46 D. Cauz,54 M. Cavalli-Sforza,3 A. Cerri,29 L. Cerritom,31 S.H. Chang,28
Y.C. Chen,1 M. Chertok,7 G. Chiarelli,46 G. Chlachidze,17 F. Chlebana,17 I. Cho,28 K. Cho,28 D. Chokheli,15
J.P. Chou,22 G. Choudalakis,33 S.H. Chuang,52 K. Chung,12 W.H. Chung,59 Y.S. Chung,49 M. Cilijak,46
C.I. Ciobanu,24 M.A. Ciocci,46 A. Clark,20 D. Clark,6 M. Coca,16 G. Compostella,43 M.E. Convery,50 J. Conway,7
B. Cooper,31 K. Copic,35 M. Cordelli,19 G. Cortiana,43 F. Crescioli,46 C. Cuenca Almenarq,7 J. Cuevasl,11
R. Culbertson,17 J.C. Cully,35 S. DaRonco,43 M. Datta,17 S. D’Auria,21 T. Davies,21 D. Dagenhart,17
P. de Barbaro,49 S. De Cecco,51 A. Deisher,29 G. De Lentdeckerc,49 G. De Lorenzo,3 M. Dell’Orso,46 F. Delli Paoli,43
L. Demortier,50 J. Deng,16 M. Deninno,5 D. De Pedis,51 P.F. Derwent,17 G.P. Di Giovanni,44 C. Dionisi,51
B. Di Ruzza,54 J.R. Dittmann,4 M. D’Onofrio,3 C. Dörr,26 S. Donati,46 P. Dong,8 J. Donini,43 T. Dorigo,43
S. Dube,52 J. Efron,39 R. Erbacher,7 D. Errede,24 S. Errede,24 R. Eusebi,17 H.C. Fang,29 S. Farrington,30
I. Fedorko,46 W.T. Fedorko,13 R.G. Feild,60 M. Feindt,26 J.P. Fernandez,32 R. Field,18 G. Flanagan,48 R. Forrest,7
S. Forrester,7 M. Franklin,22 J.C. Freeman,29 I. Furic,13 M. Gallinaro,50 J. Galyardt,12 J.E. Garcia,46
F. Garberson,10 A.F. Garfinkel,48 C. Gay,60 H. Gerberich,24 D. Gerdes,35 S. Giagu,51 P. Giannetti,46 K. Gibson,47
J.L. Gimmell,49 C. Ginsburg,17 N. Giokarisa,15 M. Giordani,54 P. Giromini,19 M. Giunta,46 G. Giurgiu,25
V. Glagolev,15 D. Glenzinski,17 M. Gold,37 N. Goldschmidt,18 J. Goldsteinb,42 A. Golossanov,17 G. Gomez,11
G. Gomez-Ceballos,33 M. Goncharov,53 O. González,32 I. Gorelov,37 A.T. Goshaw,16 K. Goulianos,50 A. Gresele,43
S. Grinstein,22 C. Grosso-Pilcher,13 R.C. Group,17 U. Grundler,24 J. Guimaraes da Costa,22 Z. Gunay-Unalan,36
C. Haber,29 K. Hahn,33 S.R. Hahn,17 E. Halkiadakis,52 A. Hamilton,20 B.-Y. Han,49 J.Y. Han,49 R. Handler,59
F. Happacher,19 K. Hara,55 D. Hare,52 M. Hare,56 S. Harper,42 R.F. Harr,58 R.M. Harris,17 M. Hartz,47
K. Hatakeyama,50 J. Hauser,8 C. Hays,42 M. Heck,26 A. Heijboer,45 B. Heinemann,29 J. Heinrich,45 C. Henderson,33
M. Herndon,59 J. Heuser,26 D. Hidas,16 C.S. Hillb,10 D. Hirschbuehl,26 A. Hocker,17 A. Holloway,22 S. Hou,1
M. Houlden,30 S.-C. Hsu,9 B.T. Huffman,42 R.E. Hughes,39 U. Husemann,60 J. Huston,36 J. Incandela,10
G. Introzzi,46 M. Iori,51 A. Ivanov,7 B. Iyutin,33 E. James,17 D. Jang,52 B. Jayatilaka,16 D. Jeans,51 E.J. Jeon,28
S. Jindariani,18 W. Johnson,7 M. Jones,48 K.K. Joo,28 S.Y. Jun,12 J.E. Jung,28 T.R. Junk,24 T. Kamon,53
P.E. Karchin,58 Y. Kato,41 Y. Kemp,26 R. Kephart,17 U. Kerzel,26 V. Khotilovich,53 B. Kilminster,39 D.H. Kim,28
H.S. Kim,28 J.E. Kim,28 M.J. Kim,17 S.B. Kim,28 S.H. Kim,55 Y.K. Kim,13 N. Kimura,55 L. Kirsch,6 S. Klimenko,18
M. Klute,33 B. Knuteson,33 B.R. Ko,16 K. Kondo,57 D.J. Kong,28 J. Konigsberg,18 A. Korytov,18 A.V. Kotwal,16
A.C. Kraan,45 J. Kraus,24 M. Kreps,26 J. Kroll,45 N. Krumnack,4 M. Kruse,16 V. Krutelyov,10 T. Kubo,55
S. E. Kuhlmann,2 T. Kuhr,26 N.P. Kulkarni,58 Y. Kusakabe,57 S. Kwang,13 A.T. Laasanen,48 S. Lai,34 S. Lami,46
S. Lammel,17 M. Lancaster,31 R.L. Lander,7 K. Lannon,39 A. Lath,52 G. Latino,46 I. Lazzizzera,43 T. LeCompte,2
J. Lee,49 J. Lee,28 Y.J. Lee,28 S.W. Leeo,53 R. Lefèvre,20 N. Leonardo,33 S. Leone,46 S. Levy,13 J.D. Lewis,17
C. Lin,60 C.S. Lin,17 M. Lindgren,17 E. Lipeles,9 A. Lister,7 D.O. Litvintsev,17 T. Liu,17 N.S. Lockyer,45
A. Loginov,60 M. Loreti,43 R.-S. Lu,1 D. Lucchesi,43 P. Lujan,29 P. Lukens,17 G. Lungu,18 L. Lyons,42 J. Lys,29
R. Lysak,14 E. Lytken,48 P. Mack,26 D. MacQueen,34 R. Madrak,17 K. Maeshima,17 K. Makhoul,33 T. Maki,23
P. Maksimovic,25 S. Malde,42 S. Malik,31 G. Manca,30 F. Margaroli,5 R. Marginean,17 C. Marino,26
C.P. Marino,24 A. Martin,60 M. Martin,25 V. Marting,21 M. Mart́ınez,3 R. Mart́ınez-Ballaŕın,32 T. Maruyama,55
P. Mastrandrea,51 T. Masubuchi,55 H. Matsunaga,55 M.E. Mattson,58 R. Mazini,34 P. Mazzanti,5 K.S. McFarland,49
P. McIntyre,53 R. McNultyf ,30 A. Mehta,30 P. Mehtala,23 S. Menzemerh,11 A. Menzione,46 P. Merkel,48
C. Mesropian,50 A. Messina,36 T. Miao,17 N. Miladinovic,6 J. Miles,33 R. Miller,36 C. Mills,10 M. Milnik,26
A. Mitra,1 G. Mitselmakher,18 A. Miyamoto,27 S. Moed,20 N. Moggi,5 B. Mohr,8 C.S. Moon,28 R. Moore,17
http://arxiv.org/abs/0704.0638v2
M. Morello,46 P. Movilla Fernandez,29 J. Mülmenstädt,29 A. Mukherjee,17 Th. Muller,26 R. Mumford,25
P. Murat,17 M. Mussini,5 J. Nachtman,17 A. Nagano,55 J. Naganoma,57 K. Nakamura,55 I. Nakano,40 A. Napier,56
V. Necula,16 C. Neu,45 M.S. Neubauer,9 J. Nielsenn,29 L. Nodulman,2 O. Norniella,3 E. Nurse,31 S.H. Oh,16
Y.D. Oh,28 I. Oksuzian,18 T. Okusawa,41 R. Oldeman,30 R. Orava,23 K. Osterberg,23 C. Pagliarone,46
E. Palencia,11 V. Papadimitriou,17 A. Papaikonomou,26 A.A. Paramonov,13 B. Parks,39 S. Pashapour,34
J. Patrick,17 G. Pauletta,54 M. Paulini,12 C. Paus,33 D.E. Pellett,7 A. Penzo,54 T.J. Phillips,16 G. Piacentino,46
J. Piedra,44 L. Pinera,18 K. Pitts,24 C. Plager,8 L. Pondrom,59 X. Portell,3 O. Poukhov,15 N. Pounder,42
F. Prakoshyn,15 A. Pronko,17 J. Proudfoot,2 F. Ptohose,19 G. Punzi,46 J. Pursley,25 J. Rademackerb,42
A. Rahaman,47 V. Ramakrishnan,59 N. Ranjan,48 I. Redondo,32 B. Reisert,17 V. Rekovic,37 P. Renton,42
M. Rescigno,51 S. Richter,26 F. Rimondi,5 L. Ristori,46 A. Robson,21 T. Rodrigo,11 E. Rogers,24 S. Rolli,56
R. Roser,17 M. Rossi,54 R. Rossin,10 P. Roy,34 A. Ruiz,11 J. Russ,12 V. Rusu,13 H. Saarikko,23 A. Safonov,53
W.K. Sakumoto,49 G. Salamanna,51 O. Saltó,3 L. Santi,54 S. Sarkar,51 L. Sartori,46 K. Sato,17 P. Savard,34
A. Savoy-Navarro,44 T. Scheidle,26 P. Schlabach,17 E.E. Schmidt,17 M.P. Schmidt,60 M. Schmitt,38 T. Schwarz,7
L. Scodellaro,11 A.L. Scott,10 A. Scribano,46 F. Scuri,46 A. Sedov,48 S. Seidel,37 Y. Seiya,41 A. Semenov,15
L. Sexton-Kennedy,17 A. Sfyrla,20 S.Z. Shalhout,58 M.D. Shapiro,29 T. Shears,30 P.F. Shepard,47 D. Sherman,22
M. Shimojimak,55 M. Shochet,13 Y. Shon,59 I. Shreyber,20 A. Sidoti,46 P. Sinervo,34 A. Sisakyan,15 A.J. Slaughter,17
J. Slaunwhite,39 K. Sliwa,56 J.R. Smith,7 F.D. Snider,17 R. Snihur,34 M. Soderberg,35 A. Soha,7 S. Somalwar,52
V. Sorin,36 J. Spalding,17 F. Spinella,46 T. Spreitzer,34 P. Squillacioti,46 M. Stanitzki,60 A. Staveris-Polykalas,46
R. St. Denis,21 B. Stelzer,8 O. Stelzer-Chilton,42 D. Stentz,38 J. Strologas,37 D. Stuart,10 J.S. Suh,28 A. Sukhanov,18
H. Sun,56 I. Suslov,15 T. Suzuki,55 A. Taffardp,24 R. Takashima,40 Y. Takeuchi,55 R. Tanaka,40 M. Tecchio,35
P.K. Teng,1 K. Terashi,50 J. Thomd,17 A.S. Thompson,21 E. Thomson,45 P. Tipton,60 V. Tiwari,12 S. Tkaczyk,17
D. Toback,53 S. Tokar,14 K. Tollefson,36 T. Tomura,55 D. Tonelli,46 S. Torre,19 D. Torretta,17 S. Tourneur,44
W. Trischuk,34 R. Tsuchiya,57 S. Tsuno,40 Y. Tu,45 N. Turini,46 F. Ukegawa,55 S. Uozumi,55 S. Vallecorsa,20
N. van Remortel,23 A. Varganov,35 E. Vataga,37 F. Vazquezi,18 G. Velev,17 G. Veramendi,24 V. Veszpremi,48
M. Vidal,32 R. Vidal,17 I. Vila,11 R. Vilar,11 T. Vine,31 I. Vollrath,34 I. Volobouevo,29 G. Volpi,46 F. Würthwein,9
P. Wagner,53 R.G. Wagner,2 R.L. Wagner,17 J. Wagner,26 W. Wagner,26 R. Wallny,8 S.M. Wang,1 A. Warburton,34
D. Waters,31 M. Weinberger,53 W.C. Wester III,17 B. Whitehouse,56 D. Whiteson,45 A.B. Wicklund,2
E. Wicklund,17 G. Williams,34 H.H. Williams,45 P. Wilson,17 B.L. Winer,39 P. Wittichd,17 S. Wolbers,17
C. Wolfe,13 T. Wright,35 X. Wu,20 S.M. Wynne,30 A. Yagil,9 K. Yamamoto,41 J. Yamaoka,52 T. Yamashita,40
C. Yang,60 U.K. Yangj,13 Y.C. Yang,28 W.M. Yao,29 G.P. Yeh,17 J. Yoh,17 K. Yorita,13 T. Yoshida,41 G.B. Yu,49
I. Yu,28 S.S. Yu,17 J.C. Yun,17 L. Zanello,51 A. Zanetti,54 I. Zaw,22 X. Zhang,24 J. Zhou,52 and S. Zucchelli5
(CDF Collaboration∗)
1Institute of Physics, Academia Sinica, Taipei, Taiwan 11529, Republic of China
2Argonne National Laboratory, Argonne, Illinois 60439
3Institut de Fisica d’Altes Energies, Universitat Autonoma de Barcelona, E-08193, Bellaterra (Barcelona), Spain
4Baylor University, Waco, Texas 76798
5Istituto Nazionale di Fisica Nucleare, University of Bologna, I-40127 Bologna, Italy
6Brandeis University, Waltham, Massachusetts 02254
7University of California, Davis, Davis, California 95616
8University of California, Los Angeles, Los Angeles, California 90024
9University of California, San Diego, La Jolla, California 92093
10University of California, Santa Barbara, Santa Barbara, California 93106
11Instituto de Fisica de Cantabria, CSIC-University of Cantabria, 39005 Santander, Spain
12Carnegie Mellon University, Pittsburgh, PA 15213
13Enrico Fermi Institute, University of Chicago, Chicago, Illinois 60637
14Comenius University, 842 48 Bratislava, Slovakia; Institute of Experimental Physics, 040 01 Kosice, Slovakia
15Joint Institute for Nuclear Research, RU-141980 Dubna, Russia
16Duke University, Durham, North Carolina 27708
∗ With visitors from aUniversity of Athens, bUniversity of Bristol, cUniversity Libre de Bruxelles, dCornell University, eUniversity
of Cyprus, fUniversity of Dublin, gUniversity of Edinburgh, hUniversity of Heidelberg, iUniversidad Iberoamericana, jUniversity of
Manchester, kNagasaki Institute of Applied Science, lUniversity de Oviedo, mUniversity of London, Queen Mary College, nUniversity
of California Santa Cruz, oTexas Tech University, pUniversity of California Irvine, and qIFIC(CSIC-Universitat de Valencia).
17Fermi National Accelerator Laboratory, Batavia, Illinois 60510
18University of Florida, Gainesville, Florida 32611
19Laboratori Nazionali di Frascati, Istituto Nazionale di Fisica Nucleare, I-00044 Frascati, Italy
20University of Geneva, CH-1211 Geneva 4, Switzerland
21Glasgow University, Glasgow G12 8QQ, United Kingdom
22Harvard University, Cambridge, Massachusetts 02138
23Division of High Energy Physics, Department of Physics,
University of Helsinki and Helsinki Institute of Physics, FIN-00014, Helsinki, Finland
24University of Illinois, Urbana, Illinois 61801
25The Johns Hopkins University, Baltimore, Maryland 21218
26Institut für Experimentelle Kernphysik, Universität Karlsruhe, 76128 Karlsruhe, Germany
27High Energy Accelerator Research Organization (KEK), Tsukuba, Ibaraki 305, Japan
28Center for High Energy Physics: Kyungpook National University,
Taegu 702-701, Korea; Seoul National University, Seoul 151-742,
Korea; SungKyunKwan University, Suwon 440-746, Korea
29Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, California 94720
30University of Liverpool, Liverpool L69 7ZE, United Kingdom
31University College London, London WC1E 6BT, United Kingdom
32Centro de Investigaciones Energeticas Medioambientales y Tecnologicas, E-28040 Madrid, Spain
33Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
34Institute of Particle Physics: McGill University, Montréal,
Canada H3A 2T8; and University of Toronto, Toronto, Canada M5S 1A7
35University of Michigan, Ann Arbor, Michigan 48109
36Michigan State University, East Lansing, Michigan 48824
37University of New Mexico, Albuquerque, New Mexico 87131
38Northwestern University, Evanston, Illinois 60208
39The Ohio State University, Columbus, Ohio 43210
40Okayama University, Okayama 700-8530, Japan
41Osaka City University, Osaka 588, Japan
42University of Oxford, Oxford OX1 3RH, United Kingdom
43University of Padova, Istituto Nazionale di Fisica Nucleare,
Sezione di Padova-Trento, I-35131 Padova, Italy
44LPNHE, Universite Pierre et Marie Curie/IN2P3-CNRS, UMR7585, Paris, F-75252 France
45University of Pennsylvania, Philadelphia, Pennsylvania 19104
46Istituto Nazionale di Fisica Nucleare Pisa, Universities of Pisa,
Siena and Scuola Normale Superiore, I-56127 Pisa, Italy
47University of Pittsburgh, Pittsburgh, Pennsylvania 15260
48Purdue University, West Lafayette, Indiana 47907
49University of Rochester, Rochester, New York 14627
50The Rockefeller University, New York, New York 10021
51Istituto Nazionale di Fisica Nucleare, Sezione di Roma 1,
University of Rome “La Sapienza,” I-00185 Roma, Italy
52Rutgers University, Piscataway, New Jersey 08855
53Texas A&M University, College Station, Texas 77843
54Istituto Nazionale di Fisica Nucleare, University of Trieste/ Udine, Italy
55University of Tsukuba, Tsukuba, Ibaraki 305, Japan
56Tufts University, Medford, Massachusetts 02155
57Waseda University, Tokyo 169, Japan
58Wayne State University, Detroit, Michigan 48201
59University of Wisconsin, Madison, Wisconsin 53706
60Yale University, New Haven, Connecticut 06520
We have measured the polarizations of J/ψ and ψ(2S) mesons as functions of their transverse
momentum pT when they are produced promptly in the rapidity range |y| < 0.6 with pT ≥ 5 GeV/c.
The analysis is performed using a data sample with an integrated luminosity of about 800 pb−1
collected by the CDF II detector. For both vector mesons, we find that the polarizations become
increasingly longitudinal as pT increases from 5 to 30 GeV/c. These results are compared to the pre-
dictions of nonrelativistic quantum chromodynamics and other contemporary models. The effective
polarizations of J/ψ and ψ(2S) mesons from B-hadron decays are also reported.
PACS numbers: 13.88.+e, 13.20.Gd, 14.40.Lb
An effective field theory, nonrelativistic quantum chromodynamics (NRQCD) [1], provides a rigorous formalism for
calculating the production rates of charmonium (cc) states. NRQCD explains the direct production cross sections
for J/ψ and ψ(2S) mesons observed at the Tevatron [2, 3] and predicts their increasingly transverse polarizations
as pT increases, where pT is the meson’s momentum component perpendicular to the colliding beam direction [4].
The first polarization measurements at the Tevatron [5] did not show such a trend. This Letter reports on J/ψ and
ψ(2S) polarization measurements with a larger data sample than previously available. This allows the extension of
the measurement to a higher pT region and makes a more stringent test of the NRQCD prediction.
The NRQCD cross section calculation for cc production separates the long-distance nonperturbative contributions
from the short-distance perturbative behavior. The former is treated as an expansion of the matrix elements in powers
of the nonrelativistic charm-quark velocity. This expansion can be computed by lattice simulations, but currently the
expansion coefficients are treated as universal parameters, which are adjusted to match the cross section measurements
at the Tevatron [2, 3]. The calculation also applies to cc production in ep collisions, but HERA measurements of J/ψ
polarization tend to disagree with the NRQCD prediction [6]. These difficulties have led some authors to explore
alternative power expansions of the long-distance interactions for the cc system [7]. There are also new QCD-inspired
models, the gluon tower model [8] and the kT -factorization model [9], that accomodate vector-meson cross sections at
both HERA and the Tevatron and predict the vector-meson polarizations as functions of pT . These authors emphasize
that measuring the vector-meson polarizations as functions of pT is a crucial test of NRQCD.
The CDF II detector is described in detail elsewhere [3, 10]. In this analysis, the essential features are a muon system
covering the central region of pseudorapidity, |η| < 0.6, and the tracking system, immersed in the 1.4 T solenoidal
magnetic field and composed of a silicon microstrip detector and a cylindrical drift chamber called the central outer
tracker (COT). The data used here correspond to an integrated luminosity of about 800 pb−1 and were recorded
between June 2004 and February 2006 by a dimuon trigger, which requires two opposite-charge muon candidates,
each having pT > 1.5 GeV/c.
Decays of vector mesons V (either J/ψ or ψ(2S)) → µ+µ− are selected from dimuon events for which each track
has segments reconstructed in both the COT and the silicon microstrip detector. The pT of each muon is required
to exceed 1.75 GeV/c in order to guarantee a well-measured trigger efficiency. The muon track pair is required to
be consistent with originating from a common vertex and to have an invariant mass M within the range 2.8 (3.4) <
M < 3.4 (3.9) GeV/c2 to be considered as a J/ψ (ψ(2S)) candidate. To have a reasonable polarization sensitivity,
the vector-meson candidates are required to have pT ≥ 5 GeV/c in the rapidity range |y (≡ 12 ln
E+p||
E−p||
)| < 0.6, where
E is the energy and p|| is the momentum parallel to the beam direction of the dimuon system. Events are separated
into a signal region and sideband regions, as indicated in Fig. 1. The fit to the data uses a double (single) Gaussian
for the J/ψ (ψ(2S)) signal and a linear background shape. The fits are used only to define signal and background
regions. The signal regions are within 3σV of the fitted mass peaks MV , where σV is the width obtained in the fit
to the invariant mass distribution. Both the background distribution and the quantity of background events under
the signal peak are estimated by events from the lower and upper mass sidebands. The sideband regions are 7σJ/ψ
(4σψ(2S)) away from the signal region for J/ψ (ψ(2S)).
For each candidate, we compute ct =MLxy/pT , where t is the proper decay time and Lxy is the transverse distance
between the beam line and the decay vertex in the plane normal to the beam direction. The ct distributions of the
selected dimuon events are shown in Fig. 2. The ct distribution of prompt events is a Gaussian distribution centered
at zero due to finite tracking resolution. For J/ψ, the prompt events are due to direct production or the decays of
heavier charmonium states such as χc and ψ(2S); for ψ(2S), the prompt events are almost entirely due to direct
production since heavier charmonium states rarely decay to ψ(2S) [11]. Both the J/ψ and the ψ(2S) samples contain
significant numbers of events originating from long-lived B-hadron decays, as can be seen from the event excess at
positive ct. We have measured the fraction of B → J/ψ + X events in the J/ψ sample and found agreement with
other results [3]. We select prompt events by requiring the sum of the squared impact parameter significances of the
positively and negatively charged muon tracks S ≡ ( d
)2 + (
)2 ≤ 8. The impact parameter d0 is the distance
of closest approach of the track to the beam line in the transverse plane. Vector-meson candidates from B-hadron
decays are selected by requiring S > 16 and ct > 0.03 cm. This requirement retains a negligible fraction of prompt
events in the B sample.
To measure the polarizations of prompt J/ψ and ψ(2S) mesons as functions of pT , the J/ψ events are analyzed in
six pT bins and the ψ(2S) events in three bins, shown in Table I. We determine the fraction of B-decay background
remaining in prompt samples fbkd by subtracting the number of negative ct events from the number of positive ct
events. Only a negligible fraction (< 0.2%) of B decays produce vector-meson events with negative ct. For both
vector mesons, fbkd increases with pT , as listed in Table I. The prompt polarization from the fitting algorithm is
corrected for this contamination.
M (GeV/c
2.8 2.9 3 3.1 3.2 3.3 3.4
M (GeV/c
3.4 3.5 3.6 3.7 3.8 3.9
FIG. 1: Invariant mass distributions for (a) J/ψ and (b) ψ(2S) candidates. The curves are fits to the data. The solid (dashed)
lines indicate the signal (sideband) regions.
ct (cm)
-0.3 -0.2 -0.1 0 0.1 0.2 0.3
510 (a)
ct (cm)
-0.2 -0.1 0 0.1 0.2
410 (b)
FIG. 2: Sideband-subtracted ct distributions for (a) J/ψ and (b) ψ(2S) events. The prompt Gaussian peak, positive excess
from B-hadron decays, and negative tail from mismeasured events are shown. The dotted line is the reflection of the negative
ct histogram about zero.
The polarization information is contained in the distribution of the muon decay angle θ∗, the angle of the µ+ in the
rest frame of vector meson with respect to the vector-meson boost direction in the laboratory system. The decay angle
distribution depends on the polarization parameter α:
d cos θ∗
∝ 1 + α cos2θ∗ (−1 ≤ α ≤ 1). For fully transverse
(longitudinal) polarization, α = +1 (−1). Intermediate values of α indicate a mixture of transverse and longitudinal
polarization.
A template method is used to account for acceptance and efficiency. Two sets of cos θ∗ distributions for fully
polarized decays of J/ψ and ψ(2S) events, one longitudinal (L) and the other transverse (T ), are produced with the
CDF simulation program using the efficiency-corrected pT spectra measured from data [3, 12]. We use the muon
trigger efficiency measured using data as a function of track parameters (pT , η, φ) to account for detector non-
uniformities. The parametrized efficiency is used as a filter on all simulated muons. Events that pass reconstruction
represent the behavior of fully polarized vector-meson decays in the detector.
The fitting algorithm [5] uses two binned cos θ∗ distributions for each pT bin, one made by NS events from the
signal region (signal plus background) and the other made by NB events from the sideband regions (background).
The χ2 minimization is done simultaneously for both cos θ∗ distributions. The fitting algorithm includes an individual
background term for each cos θ∗ bin, normalized to NB. Simulation shows that the cos θ
∗ resolution at all decay
angles over the entire pT range is much smaller than the bin width of 0.05 (0.10 for ψ(2S)) used here. The data, fit,
and template distributions for the worst fit (9% probability) in the J/ψ data are shown in Fig. 3.
*θcos 
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1
 (GeV/c) <  9T p≤ 7 
FIG. 3: cos θ∗ distribution of data (points) and polarization fit for the worst χ2 probability bin in the J/ψ data. The dotted
(dashed) line is the template for fully L (T) polarization. The fit describes the overall trend of the data well.
All systematic uncertainties are much smaller than the statistical uncertainties. Varying the pT spectrum used in
the simulation by 1σ changed the polarization parameter for J/ψ at most by 0.002. A systematic uncertainty of 0.007
was estimated by the change in the polarization parameter when a modification was made on all trigger efficiencies
by ±1σ. For ψ(2S), the dominant systematic uncertainty came from the yield estimate because of the radiative tail
and the large background. The total systematic uncertainties shown in Table I were taken to be the quadrature
sum of these individual uncertainties. Other possible sources of systematic uncertainties - signal definition and cos θ∗
binning - were determined to be negligible. Corrections to prompt polarization from B-decay contamination were
small, so that uncertainties on B-decay polarization measurements also had negligible effect. No φ-dependence of the
polarizations was observed.
The polarization of J/ψ mesons from inclusive Bu and Bd decays was measured by the BABAR collaboration [13].
In this analysis, the B-hadron direction is unknown, so we define θ∗ with respect to the J/ψ direction in the laboratory
system. The resulting polarization is somewhat diluted. As discussed in Ref. [3], CDF uses a Monte Carlo procedure to
adapt the BABAR measurement to predict the effective J/ψ polarization parameter. For the J/ψ events with 5 ≤ pT <
30 GeV/c, the CDF model for Bu and Bd decays gives αeff = −0.145± 0.009, independent of pT . We have measured
the polarization of vector mesons from B-hadron decays. For J/ψ, we find αeff = −0.106±0.033 (stat)±0.007 (syst).
At this level of accuracy, a polarization contribution by J/ψ mesons from Bs and b-baryon decays cannot be separated
from the effective polarization due to those from Bu and Bd decays. We also report the first measurement of the
ψ(2S) polarization from B-hadron decays: αeff = 0.36± 0.25 (stat)± 0.03 (syst).
The polarization parameters for both prompt vector mesons corrected for fbkd using our experimental results on αeff
are listed as functions of pT in Table I and are plotted in Fig. 4. The polarization parameters for J/ψ are negative
over the entire pT range of measurement and become increasingly negative (favoring longitudinal polarization) as
pT increases. For ψ(2S), the central value of the polarization parameter is positive at small pT , but, given the
uncertainties, its behavior is consistent with the trend shown in the measurement of the J/ψ polarization.
The polarization behavior measured previously with 110 pb−1 [5] is not consistent with the results presented here.
This is a differential measurement, and the muon efficiencies in this analysis are true dimuon efficiencies. In Ref. [5],
they are the product of independent single muon efficiencies. The efficiency for muons with pT < 4 GeV/c is crucial for
good polarization sensitivity. In this analysis, the muon efficiency varies smoothly from 99% to 97% over this range.
In the analysis of Ref. [5], it varied from 93% to 40% with significant jumps between individual data points. Data
from periods of drift chamber aging were omitted from this analysis because the polarization results were inconsistent
with the remainder of the data. Studies such as this were not done in the analysis of Ref. [5]. The systematics of the
polarization measurement are much better understood in this analysis.
These polarization measurements for the charmed vector mesons extend to a pT regime where perturbative QCD
should be applicable. The results are compared to the predictions of NRQCD and the kT -factorization model in Fig. 4.
The prediction of the kT -factorization model is presented for pT < 20 GeV/c and does not include the contribution
pT (GeV/c) <pT > (GeV/c) fbkd(%) α χ
2/d.o.f
J/ψ 5−6 5.5 2.8± 0.2 −0.004± 0.029± 0.009 15.5/21
6−7 6.5 3.4± 0.2 −0.015± 0.028± 0.010 24.1/23
7−9 7.8 4.1± 0.2 −0.077± 0.023± 0.013 35.1/25
9−12 10.1 5.7± 0.3 −0.094± 0.028± 0.007 34.0/29
12−17 13.7 6.7± 0.6 −0.140± 0.043± 0.007 35.0/31
17−30 20.0 13.6± 1.4 −0.187± 0.090± 0.007 33.9/35
ψ(2S) 5−7 5.9 1.6± 0.9 +0.314± 0.242± 0.028 13.1/11
7−10 8.2 4.9± 1.2 −0.013± 0.201± 0.035 18.5/13
10−30 12.6 8.6± 1.8 −0.374± 0.222± 0.062 26.9/17
TABLE I: Polarization parameter α for prompt production in each pT bin. The first (second) uncertainty is statistical (sys-
tematic). <pT > is the average transverse momentum.
 (GeV/c)Tp
5 10 15 20 25 30
CDF Data
NRQCD
-factorizationTk
 (GeV/c)Tp
5 10 15 20 25 30
CDF Data
NRQCD
-factorizationTk
FIG. 4: Prompt polarizations as functions of pT : (a) J/ψ and (b) ψ(2S). The band (line) is the prediction from NRQCD [4]
(the kT -factorization model [9]).
from the decays of heavier charmonium states for J/ψ production. The polarizations for prompt production of
both vector mesons become increasingly longitudinal as pT increases beyond 10 GeV/c. This behavior is in strong
disagreement with the NRQCD prediction of large transverse polarization at high pT . It is striking that the NRQCD
calculation and the other models reproduce the measured J/ψ and ψ(2S) cross sections at the Tevatron, but fail to
describe the polarization at high pT . This indicates that there is some important aspect of the production mechanism
that is not yet understood.
We thank the Fermilab staff and the technical staffs of the participating institutions for their vital contributions.
This work was supported by the U.S. Department of Energy and National Science Foundation; the Italian Istituto
Nazionale di Fisica Nucleare; the Ministry of Education, Culture, Sports, Science and Technology of Japan; the
Natural Sciences and Engineering Research Council of Canada; the National Science Council of the Republic of
China; the Swiss National Science Foundation; the A.P. Sloan Foundation; the Bundesministerium für Bildung und
Forschung, Germany; the Korean Science and Engineering Foundation and the Korean Research Foundation; the
Particle Physics and Astronomy Research Council and the Royal Society, UK; the Institut National de Physique
Nucleaire et Physique des Particules/CNRS; the Russian Foundation for Basic Research; the Comisión Interministerial
de Ciencia y Tecnoloǵıa, Spain; the European Community’s Human Potential Programme; the Slovak R&D Agency;
and the Academy of Finland.
[1] G. T. Bodwin, E. Braaten, and G. P. Lepage, Phys. Rev. D 51, 1125 (1995); Erratum, ibid. Phys. Rev. D 55, 5853 (1997);
E. Braaten and S. Fleming, Phys. Rev. Lett. 74, 3327 (1995).
[2] F. Abe et al. (CDF Collaboration), Phys. Rev. Lett. 79, 572 (1997).
[3] D. Acosta et al. (CDF Collaboration), Phys. Rev. D 71, 032001 (2005).
[4] P. Cho and M. Wise, Phys. Lett. B 346, 129 (1995); M. Beneke and I. Z. Rothstein, Phys. Lett. B 372, 157 (1996);
Erratum, ibid. Phys. Lett. B 389, 769 (1996); E. Braaten, B. A. Kniehl, and J. Lee, Phys. Rev. D 62, 094005 (2000).
[5] T. Affolder et al. (CDF Collaboration), Phys. Rev. Lett. 85, 2886 (2000).
[6] C. Adloff et al. (H1 Collaboration), Eur. Phys. J. C 25, 41 (2002); S. Chekanov et al. (ZEUS Collaboration), Eur. Phys.
J. C 44, 13 (2005).
[7] S. Fleming, I. Z. Rothstein, and A. K. Leibovich, Phys. Rev. D 64, 036002 (2001).
[8] V. A. Khoze, A. D. Martin, M. G. Ryskin, and W. J. Stirling, Eur. Phys. J. C 39, 163 (2005).
[9] S. P. Baranov, Phys. Rev. D 66, 114003 (2002).
[10] The CDF coordinate system has ẑ along the proton direction, x̂ horizontal pointing outward from the Tevatron ring,
and ŷ vertical. θ (φ) is the polar (azimuthal) angle measured with respect to ẑ, and η is the pseudorapidity defined as
−ln (tan (θ/2)). The transverse momentum of a particle is denoted as pT = p sin θ.
[11] W.-M. Yao et al. (Particle Data Group), J. Phys. G 33, 1 (2006).
[12] A Letter on ψ(2S) cross section measurement is in preparation.
[13] B. Aubert et al. (BABAR Collaboration), Phys. Rev. D 67, 032002 (2003).
	References
ABSTRACT
  We have measured the polarizations of $\jpsi$ and $\psiprime$ mesons as
functions of their transverse momentum $\pt$ when they are produced promptly in
the rapidity range $|y|<0.6$ with $\pt \geq 5 \pgev$. The analysis is performed
using a data sample with an integrated luminosity of about $800 \ipb$ collected
by the CDF II detector. For both vector mesons, we find that the polarizations
become increasingly longitudinal as $\pt$ increases from 5 to $30 \pgev$. These
results are compared to the predictions of nonrelativistic quantum
chromodynamics and other contemporary models. The effective polarizations of
$\jpsi$ and $\psiprime$ mesons from $B$-hadron decays are also reported.

<|endoftext|><|startoftext|>
A measure of the non-Gaussian character of a quantum state
Marco G. Genoni,1 Matteo G. A. Paris,1, 2, ∗ and Konrad Banaszek3
1Dipartimento di Fisica dell’Università di Milano, I-20133, Milano, Italia.
2Institute for Scientific Interchange, I-10133 Torino, Italia
3Institute of Physics, Nicolaus Copernicus University, PL-87-100 Toruń, Poland
(Dated: November 3, 2018)
We address the issue of quantifying the non-Gaussian character of a bosonic quantum state and introduce
a non-Gaussianity measure based on the Hilbert-Schmidt distance between the state under examination and a
reference Gaussian state. We analyze in details the properties of the proposed measure and exploit it to evaluate
the non-Gaussianity of some relevant single- and multi-mode quantum states. The evolution of non-Gaussianity
is also analyzed for quantum states undergoing the processes of Gaussification by loss and de-Gaussification by
photon-subtraction. The suggested measure is easily computable for any state of a bosonic system and allows
to define a corresponding measure for the non-Gaussian character of a quantum operation.
PACS numbers: 03.67.-a, 03.65.Bz, 42.50.Dv
I. INTRODUCTION
Gaussian states play a crucial role in quantum information
processing with continuous variables. This is especially true
for quantum optical implementations since radiation at ther-
mal equilibrium, including the vacuum state, is itself a Gaus-
sian state and most of the Hamiltonians achievable within the
current technology are at most bilinear in the field operators,
i.e. preserve the Gaussian character [1, 2, 3]. As a matter
of fact, using single-mode and entangled Gaussian states, lin-
ear optical circuits and Gaussian operations, like homodyne
detection, several quantum information protocols have been
implemented, including teleportation, dense coding and quan-
tum cloning [4].
On the other hand quantum information protocols required
for long distance communication, as for example entangle-
ment distillation and entanglement swapping, rely on non-
Gaussian operations. In addition, it has been demonstrated
that teleportation [5, 6, 7] and cloning [8] of quantum states
may be improved by using non-Gaussian states and non-
Gaussian operations. Indeed, de-Gaussification protocols for
single-mode and two-mode states have been proposed [5, 6, 7]
and realized [9]. It should be also noticed that any strongly
superadditive function is minimized, at fixed covariance ma-
trix, by Gaussian states. This is crucial to prove extremality
of Gaussian states and Gaussian operations [10, 11] for what
concerns various quantities as channel capacities [12], multi-
partite entanglement measures [13] and distillable secret key
in quantum key distribution protocols. Since in most cases
these quantities can be computed only for Gaussian states, a
non-Gaussianity measure may serve as a guideline to quan-
tify them for the class of non-Gaussian states. Overall, non-
Gaussianity is revealing itself as a resource for continuous
variable quantum information, and thus we urge a measure
able to quantify the non-Gaussian character of a quantum
state.
In this paper we introduce a novel quantity, the non-
∗Electronic address: matteo.paris@fisica.unimi.it
Gaussianity δ[̺] of a quantum state, which quantifies how
much a state fails to be Gaussian. Our measure, which is based
on the Hilbert-Schmidt distance between the state itself and a
reference Gaussian state, can be easily computed for any state,
either single-mode or multi-mode.
The paper is structured as follows. In the next Section we
introduce notation and review the basic properties of Gaussian
states. Then, in Section III we introduce the formal definition
of δ[̺] and study its properties in details. In Section IV we
evaluate non-Gaussianity of relevant quantum states whereas
in Section V we analyze the evolution of non-Gaussianity for
known Gaussification and de-Gaussification maps. Section VI
closes the paper with some concluding remarks.
II. GAUSSIAN STATES
For concreteness, we will use here the quantum optical ter-
minology of modes carrying photons, but our theory applies
to general bosonic systems. Let us consider a system of n
modes described by mode operators ak, k = 1 . . . n, satis-
fying the commutation relations [ak, a
j ] = δkj . A quantum
state ̺ of the n modes is fully described by its characteristic
function [14]
χ[̺](λ) = Tr[̺D(λ)]
where D(λ) =
k=1Dk(λk) is the n-mode displacement
operator, with λ = (λ1, . . . , λn)
T , λk ∈ C, and where
Dk(λk) = exp{λka†k − λ
is the single-mode displacement operator. The canonical op-
erators are given by:
(ak + a
(ak − a†k)
with commutation relations given by [qj , pk] = iδjk. Upon in-
troducing the real vector R = (q1, p1, . . . , qn, pn)
T , the com-
http://arxiv.org/abs/0704.0639v4
mailto:matteo.paris@fisica.unimi.it
mutation relations rewrite as
[Rk, Rj ] = iΩkj
where Ωkj are the elements of the symplectic matrix Ω =
k=1 σ2, σ2 being the y-Pauli matrix. The covariance ma-
trix σ ≡ σ[̺] and the vector of mean values X ≡ X[̺] of a
quantum state ̺ are defined as
Xj = 〈Rj〉
σkj =
〈{Rk, Rj}〉 − 〈Rj〉〈Rk〉
where {A,B} = AB+BA denotes the anti-commutator, and
〈O〉 = Tr[̺ O] is the expectation value of the operator O.
A quantum state ̺G is referred to as a Gaussian state if its
characteristic function has the Gaussian form
χ[̺G](Λ) = exp
σΛ+XTΩΛ
where Λ is the real vector Λ =
(Reλ1, Imλ1, . . . ,Reλn, Imλn)
T . Of course, once the
covariance matrix and the vector of mean values are given, a
Gaussian state is fully determined. For a single-mode system
the most general Gaussian state can be written as
̺G = D(α)S(ζ)ν(nt)S
†(ζ)D†(α),
D(α) being the displacement operator, S(ζ) =
exp[ 1
ζ(a†)2 − 1
ζ∗a2] the squeezing operator, α, ζ ∈ C, and
ν(nt) = (1 + nt)
−1[nt/(1 + nt)]
a†a a thermal state with nt
average number of photons.
III. A MEASURE OF THE NON-GAUSSIAN CHARACTER
OF A QUANTUM STATE
In order to quantify the non-Gaussian character of a quan-
tum state ̺ we use a quantity based on the distance between
̺ and a reference Gaussian state τ , which itself depends on ̺.
Specifically, we define the non-Gaussianity δ[̺] of the state ̺
δ[̺] =
D2HS [̺, τ ]
where DHS [̺, τ ] denotes the Hilbert-Schmidt distance be-
tween ̺ and τ
D2HS [̺, τ ] =
Tr[(̺− τ)2] = µ[̺] + µ[τ ] − 2κ[̺, τ ]
, (3)
with µ[̺] = Tr[̺2] and κ[̺, τ ] = Tr[̺τ ] denoting the purity of
̺ and the overlap between ̺ and τ respectively. The Gaussian
reference τ is the Gaussian state such that
X[̺] = X[τ ]
σ[̺] = σ[τ ]
i.e. τ is the Gaussian state with the same covariance matrix σ
and the same vector X of the state ̺.
The relevant properties of δ[̺], which confirm that it repre-
sents a good measure of the non-Gaussian character of ̺, are
summarized by the following Lemmas:
Lemma 1: δ[̺] = 0 iff ̺ is a Gaussian state.
Proof: If δ[̺] = 0 then ̺ = τ and thus it is a Gaussian state.
If ̺ is a Gaussian state, then it is uniquely identified by its first
and second moments and thus the reference Gaussian state τ
is given by τ = ̺, which, in turn, leads to DHS [̺, τ ] = 0 and
thus to δ[̺] = 0.
Lemma 2: If U is a unitary map corresponding to a symplec-
tic transformation in the phase space, i.e. if U = exp{−iH}
with hermitianH that is at most bilinear in the field operators,
then δ[U̺U †] = δ[̺]. This property ensures that displace-
ment and squeezing operations do not change the Gaussian
character of a quantum state.
Proof: Let us consider ̺′ = U̺U †. Then the covariance ma-
trix transforms as σ[̺′] = Σσ[̺]ΣT , Σ being the symplectic
transformation associated to U . At the same time the vector of
mean values simply translates to X ′ = X +X0, where X0
is the displacement generated by U . Since any Gaussian state
is fully characterized by its first and second moments, then the
reference state must necessarily transform as τ ′ = UτU †, i.e.
with the same unitary transformation U . Since the Hilbert-
Schmidt distance and the purity of a quantum state are invari-
ant under unitary transformations the lemma is proved.
Lemma 3: δ[̺] is proportional to the squared L2(Cn) dis-
tance between the characteristic functions of ̺ and of the ref-
erence Gaussian state τ . In formula:
δ[̺] ∝
d2nλ [χ[̺](λ)− χ[τ ](λ)]2 . (4)
Since the notion of Gaussianity of a quantum state is de-
fined through the shape of its characteristic function, and since
the characteristic function of a quantum state belongs to the
L2(Cn) space [14], we address L2(C) distance to as a good
indicator for the non Gaussian character of ̺.
Proof: Since characteristic functions of self-adjoint operators
are even functions of λ and by means of the identity
Tr[O1O2] =
χ[O1](λ)χ[O2](−λ) ,
we obtain
D2HS [̺, τ ] =
[χ[̺](λ)− χ[τ ](λ)]2 .
Lemma 4: Consider a bipartite state ̺ = ̺A ⊗ ̺G. If ̺G is a
Gaussian state then δ[̺] = δ[̺A].
Proof: we have
µ[̺] = µ[̺A]µ[̺G]
µ[τ ] = µ[τA]µ[τG]
κ[̺, τ ] = κ[̺A, τA]κ[̺G, ̺G] .
Therefore, since κ[̺G, ̺G] = µ[̺G] we arrive at
δ[̺] =
µ[̺A]µ[̺G] + µ[τA]µ[̺G]− 2κ[̺A, τA]κ[̺G, ̺G]
2µ[̺A]µ[̺G]
= δ[̺A] (5)
The four properties illustrated by the above lemmas are the
natural properties required for a good measure of the non-
Gaussian character of a quantum state. Notice that by using
the trace distanceDT [̺, τ ] =
Tr|̺−τ | instead of the Hilbert-
Schmidt distance we would lose Lemmas 3 and 4, and that the
invariance expressed by Lemma 4 holds thanks to the renor-
malization of the Hilbert-Schmidt distance through the purity
µ[̺]. We stress the fact that our measure of non-Gaussianity
is a computable one: It may be evaluated for any quantum
state of n modes by the calculation of the first two moments
of the state, followed by the evaluation of the overlap with the
corresponding Gaussian state.
Notice that δ[̺] is not additive (nor multiplicative) with re-
spect to the tensor product. If we consider a (separable) multi-
partite quantum state in the product form ̺ = ⊗nk=1̺k, the
non-Gaussianity is given by
δ[̺] =
k=1 µ[̺k] +
k=1 µ[τk]− 2
k=1 κ[̺k, τk]
k=1 µ[̺k]
where τk is the Gaussian state with the same moments of ̺k.
In fact, since the state ̺ is factorisable, we have that the cor-
responding Gaussian τ is a factorisable state too.
IV. NON-GAUSSIANITY OF RELEVANT QUANTUM
STATES
Let us now exploit the definition (2) to evaluate the non-
Gaussianity of some relevant quantum states. At first we con-
sider Fock number states |p〉 of a single mode as well as mul-
timode factorisable states |p〉⊗n made of n copies of a num-
ber state. The reference Gaussian states are a thermal state
τp = ν(p) with average photon number p and a factorisable
thermal state τN = [ν(p)]
⊗n with average photon number p
in each mode [15]. Non-Gaussianity may be analytically eval-
uated, leading to
δ[|p〉〈p|] = 1
2p+ 1
δ[(|p〉〈p|)⊗n] = 1
2p+ 1
In the multimode case of |p〉⊗n, we seek for the number of
copies that maximizes the non-Gaussianity. In Fig. 1 we
show both δp ≡ δ[|p〉〈p|] and δ̄p = maxn δ[(|p〉〈p|)⊗n] as a
function of p. As it is apparent from the plot non-Gaussianity
of Fock states |p〉 increases monotonically with the number
of photon p with the limiting value δp = 1/2 obtained for
p → ∞. Upon considering multi-mode copies of Fock states
we obtain larger value of non-Gaussianity: δ̄p is a decreasing
function of p, approaching δ̄ = 1/2 from above. The value
1 5 10 15 20 25 30
FIG. 1: (Top): Non-Gaussianity of single mode Fock states (gray)
|p〉 and of multi-mode Fock states |p〉⊗n (black) as a function of p.
Non-Gaussianity for multi-mode states has been maximized over the
number of copies n. (Bottom): Non-Gaussianity, as a function of the
parameter φ, for the two-mode superpositions |Φ〉〉 (dashed gray),
|Ψ〉〉 (solid gray), and for the single-mode superposition of coherent
states |ψS〉 for α = 0.5 (solid black) and α = 5 (dashed black).
of δ̄p corresponds to n = 3 for p < 26 and to n = 2 for
27 ≤ p . 250.
Another example is the superposition of coherent states
|ψS〉 = N−1/2 (cosφ|α〉 + sinφ| − α〉) (7)
with normalization N = 1 + sin(2φ) exp{−2α2} which for
φ = ±π/4 reduces to the so-called Schrödinger cat states,
and whose reference Gaussian state is a displaced squeezed
thermal state τS = D(C)S(r)ν(N)S
†(r)D†(C), where the
real parameters C, r, and N are analytical functions of φ and
α. Finally we evaluate the non-Gaussianity of the two-mode
Bell-like superpositions of Fock states
|Φ〉〉 = cosφ|0, 0〉+ sinφ|1, 1〉
|Ψ〉〉 = cosφ|0, 1〉+ sinφ|1, 0〉,
which for φ = ±π/4 reduces to the Bell states |Φ±〉
and |Ψ±〉. The corresponding reference Gaussian states
are respectively a two mode squeezed thermal state τΦ =
S2(ξ)[ν(N) ⊗ ν(N)]S†2(ξ), where S2(ξ) = exp(ξa
ξ∗ab) denotes the two-mode squeezing operator, and τΨ =
R(θ)[ν(N1)⊗ν(N2)]R†(θ), namely the correlated two-mode
state obtained by mixing a single-mode thermal state with
the vacuum at a beam splitter of transmissivity cos2 θ, i.e.
R(θ) = exp[iθ(a
1a2+a
2a1)]. All the parameters involved in
these reference Gaussian states are analytical functions of the
superposition parameter φ. Non-Gaussianities are thus evalu-
ated by means of (2) and are reported in Fig. 1 as a function
of the parameter φ. As it is apparent from the plot, the non-
Gaussianity of single-mode states does not surpass the value
δ = 1/2, and this fact is confirmed by other examples not
reported here.
As concern the cat-like states, we notice that for small val-
ues of α the non-Gaussianity of the superposition |ψS〉 shows
a different behavior for positive and negative values of the pa-
rameter φ: for φ > 0 and α = 0.5 we have almost zero δ,
while higher values are achieved for φ < 0. For higher val-
ues of α (α = 5 in Fig. 1), non-Gaussianity becomes an even
function of φ. This different behavior can be understood by
looking at the Wigner functions of even and odd Schrödinger
cat states for different values of α: for small values of α the
even cat’s Wigner function is similar to a Gaussian function,
while the odd cat’s Wigner function shows a non-Gaussian
hole in the origin of the phase space; increasing the value of α
the Wigner functions of the two kind of states become similar
and deviate from a Gaussian function.
We have also done a numerical analysis of non-Gaussianity
of single-mode quantum states represented by finite superpo-
sition of Fock states
n,k=0
̺nk|n〉〈k| . (8)
To this aim we generate randomly quantum states in a finite
dimensional subspaces, dim(H) ≡ d+ 1 ≤ 21, following the
algorithm proposed by Zyczkowski et al [16, 17], i.e. by gen-
erating a random diagonal state (i.e. a point on the simplex)
and a random unitary matrix according to the Haar measure.
In Fig. 2 we report the distribution of non-Gaussianity δ[̺d],
as evaluated for 105 random quantum states, for three different
value of the maximum number of photons d. As it is apparent
from the plots the distribution of δ[̺d] becomes Gaussian-like
for increasing d. In the fourth panel of Fig. 2 we thus re-
port the mean values and variances of the the distributions as
a function of the maximum number of photons d. The mean
value increases with the dimension whereas the variance is a
monotonically decreasing function of d.
Also for finite superpositions simulations we did not ob-
serve non-Gaussianity higher than 1/2. Therefore, although
we have no proof, we conjecture that δ = 1/2 is a limiting
value for the non-Gaussianity of a single-mode state. Higher
values are achievable for two-mode or multi-mode quantum
states (e.g. δ = 2/3 for the Bell states |Ψ±〉〉).
V. GAUSSIFICATION AND DE-GAUSSIFICATION
PROCESSES
We have also studied the evolution of non-Gaussianity
of quantum states undergoing either Gaussification or de-
Gaussification processes. First we have considered the Gaus-
sification of Fock states due do the interaction of the system
FIG. 2: Distribution of non-Gaussianity δ[̺d] as evaluated for 10
random quantum states, for three different value of the maximum
number of photons d. Top: d = 2 (left), d = 10 (right); Bottom:
d = 20 (left). (Bottom-right): Mean values and variances of the non-
Gaussianities evaluated for 105 random quantum states, as a function
of the maximum number of photons d.
with a bath of oscillators at zero temperature. This is per-
haps the simplest example of a Gaussification protocol. In
fact the interaction drives asymptotically any quantum state to
the vacuum state of the harmonic system, which, in turn, is a
Gaussian state. The evolution of the system is governed by the
Lindblad Master equation ˙̺ = γ
L[a]̺, where ˙̺ denotes time
derivative, γ is the damping factor and the Lindblad superop-
erator acts as follows L[a]̺ = 2a†̺a − a†a̺ − ̺a†a. Upon
writing η = e−γt the solution of the Master equation can be
written as
̺(η) =
Vm ̺ V
m (9)
Vm = [(1− η)m/m!]
2 am η
(a†a−m) ,
where ̺ is the initial state. In particular for the system ini-
tially prepared in a Fock state ̺p = |p〉〈p|, we obtain, after
evolution, the mixed state
̺p(η) =
Vm̺pV
αl,p(η)|l〉〈l| (10)
with αl,p(η) =
(1−η)p−lηl. The reference Gaussian state
corresponding to ̺p(η) is a thermal state τp(η) = ν(pη) with
average photon number pη. Non-Gaussianity of ̺p(η) can be
evaluated analytically, we have
δpη ≡ δ[̺p(η)]
2(1− η)2m 2F1
−m,−m, 1; η2
(η−1)2
(1− η)2m 2F1
−m,−m, 1; η
(η − 1)2
+ (1 + 2mη)−1 − 2(1 + (m− 1)η)
(1 +mη)m+1
2F1(a, b, c;x) being a hypergeometric function. We show the
behavior of δpη in Fig. 3 as a function of 1 − η for different
values of p. As it is apparent from the plot δpη is a monotoni-
cally decreasing function of 1 − η as well as a monotonically
increasing function of p. That is, at fixed time t the higher
is the initial photon number p, the larger is the resulting non-
Gaussianity.
0.2 0.4 0.6 0.8 1
1 - Η
0.2 0.4 0.6 0.8 1
FIG. 3: (Left): Non-Gaussianity of Fock states |p〉 undergoing Gaus-
sification by loss mechanism due to the interaction with a bath of os-
cillators at zero temperature. We show δηp as a function of 1 − η
for different values of p: from bottom to top p = 1, 10, 100, 1000.
(Right): Non-Gaussianity of ̺IPS as a function of T for r = 0.5 and
for different values of ǫ = 0.2, 0.4, 0.6, 0.8 (from bottom to top).
δIPS results to be a monotonous increasing function of T , while ǫ
only slightly changes the non-Gaussian character of the state.
Let us now consider the de-Gaussification protocol ob-
tained by the process of photon subtraction. Inconclusive Pho-
ton Subtraction (IPS) has been introduced for single-mode and
two-mode states in [6, 7, 18] and experimentally realized in
[9]. In the IPS protocol an input state ̺(in) is mixed with
the vacuum at a beam splitter (BS) with transmissivity T and
then, on/off photodetection with quantum efficiency ǫ is per-
formed on the reflected beam. The process can be thus charac-
terized by two parameters: the transmissivity T and the detec-
tor efficiency ǫ. Since the detector can only discriminate the
presence from the absence of light, this measurement is in-
conclusive, namely it does not resolve the number of detected
photons. When the detector clicks, an unknown number of
photons is subtracted from the initial state and we obtain the
conditional IPS state ̺IPS . The conditional map induced by
the measurement is non-Gaussian [7], and the output state is
de-Gaussified. Upon applying the IPS protocol to the (Gaus-
sian) single-mode squeezed vacuum S(r)|0〉 (r ∈ R), where
S(r) is the real squeezing operation we obtain [18] the con-
ditional state ̺IPS , whose characteristic function χ[̺IPS ](λ)
is a sum of two Gaussian functions and therefore is no longer
Gaussian. The corresponding Gaussian reference state is a
squeezed thermal state τIPS = S(ξIPS)ν(NIPS)S
†(ξIPS)
where the parameters ξIPS andNIPS are analytic functions of
r, T and ǫ. Non-Gaussianity δIPS = δIPS(T, ǫ, r) has been
evaluated, and in Fig. 3 (right) we report δIPS for r = 0.5
as a function of the transmittivity T for different values of
the quantum efficiency ǫ. As it is apparent from the plot the
IPS protocol indeed de-Gaussifies the input state, i.e. nonzero
values of the non-Gaussianity are obtained. We found that
δIPS is an increasing function of the transmissivity T which
is the relevant parameter, while the quantum efficiency ǫ only
slightly affects the non-Gaussian character of the output state.
The highest value of non-Gaussianity is achieved in the limit
of unit transmissivity and unit quantum efficiency
T,η→1
δIPS = δ[|1〉〈1|] = δ[S(r)|1〉〈1|S†(r)],
where the last equality is derived from Lemma 2. This result
is in agreement with the fact that a squeezed vacuum state
undergoing the IPS protocol is driven towards the target state
S(r)|1〉 in the limit of T, ǫ → 1 [18]. Finally, we notice that
for T, ǫ 6= 1 and for r → ∞ the non-Gaussianity vanishes. In
turn, this corresponds to the fact that one of the coefficients
of the two Gaussians of χ[̺IPS ](λ) vanishes, i.e. the output
state is again a Gaussian one.
VI. CONCLUSION AND OUTLOOKS
Having at disposal a good measure of non-Gaussianity for
quantum state allows us to define a measure of the non-
Gaussian character of a quantum operation. Let us denote
by G the whole set of Gaussian states. A convenient defi-
nition for the non-Gaussianity of a map E reads as follows
δ[E ] = max̺∈G δ[E(̺)], where E(̺) denotes the quantum
state obtained after the evolution imposed by the map. Indeed,
for a Gaussian map Eg , which transforms any input Gaussian
state into a Gaussian state, we have δ[Eg] = 0. Work along
this line is in progress and results will be reported elsewhere.
In conclusion, we have proposed a measure of the non-
Gaussian character of a CV quantum state. We have shown
that our measure satisfies the natural properties expected from
a good measure of non-Gaussianity, and have evaluated the
non-Gaussianity of some relevant states, in particular of states
undergoing Gaussification and de-Gaussification protocols.
Using our measure an analogue non-Gaussianity measure for
quantum operations may be introduced.
Acknowledgments
This work has been supported by MIUR project
PRIN2005024254-002, the EC Integrated Project QAP (Con-
tract No. 015848) and Polish MNiSW grant 1 P03B 011 29.
APPENDIX A: GAUSSIAN REFERENCE WITH
UNCONSTRAINED MEAN VALUE
As we have seen from the above examples δ[̺] of Eq. (2)
represents a good measure of the non-Gaussian character of a
quantum state. A question arises on whether different choices
for the reference Gaussian state τ may lead to alternative,
valid, definitions. As for example (for single-mode states) we
may define
δ′[̺] = min
D2HS [̺, τ ]/µ[̺], (A1)
where τ = D(C)S(ξ)ν(N)S†(ξ)D†(C) is a Gaussian state
with the same covariance matrix of ̺ and unconstrained vec-
tor of mean values X = (ReC, ImC) used to minimize the
Hilbert-Schmidt distance. Here we report few examples of
the comparison between the results already obtained using (2)
with that coming from (A1). As we will see either the two
definitions coincide or δ′ and δ are monotone functions of
each other. Since the definition (2) corresponds to an easily
computable measure we conclude that it represents the most
convenient choice.
Let us first consider the Fock state ̺ = |p〉〈p|. According
to (A1), the reference Gaussian state is given by a displaced
thermal states τ ′ = D(C)νpD
†(C). The overlap between ̺
and τ ′ is given by
κ[|p〉〈p|, τ ′] = 1
1 + p
1 + p
1 + p
p(1 + p)
The maximum of (A2) is achieved forC = 0, which coincides
with the assumptions C = Tr[a|p〉〈p|].
Let us consider the quantum state (10) obtained as the so-
lution of the loss Master Equation for an initial Fock state
|p〉〈p|. The unconstrained Gaussian reference is again a dis-
placed thermal state τ ′ = D(C)νpηD
†(C), and the overlap is
given by
κ[̺p(η), τ
′] = Tr[τ̺p(η)] =
(1 + η(p− 1))p
(1 + pη)p+1
η|C|2
(1 + pη)(η(1 − p)− 1)
Again, since the overlap is maximum for C = Tr[a̺p(η)] =
0, both definitions give the same results for the non-
Gaussianity.
Let us now consider the Schrödinger cat-like states of (7).
The reference Gaussian state is a displaced squeezed thermal
state, with squeezing and thermal photons as calculated be-
fore. The optimization over the free parameterC may be done
numerically. In Fig. 4 we show the non-Gaussianitiy, both as
resulting from (A1) and by choosing C = Tr[a̺S ] as in (2),
as a function of ǫ. The two curves are almost the same, with
no qualitative differences.
[1] A. Ferraro, S. Olivares and M. G. A. Paris, Gaussian States in
Quantum Information, (Bibliopolis, Napoli, 2005)
[2] J. Eisert, M. B. Plenio, Int. J. Quant. Inf. 1, 479 (2003)
[3] F. Dell’Anno et al., Phys. Rep. 428, 53 (2006).
[4] S. L. Braunstein, P. van Loock, Rev. Mod. Phys 77, 513 (2005).
[5] T. Opatrny et al., Phys.Rev. A 61, 032302 (2000).
[6] P. T. Cochrane et al., Phys Rev. A 65, 062306 (2002).
[7] S. Olivares et al., Phys. Rev. A 67, 032314 (2003); S. Olivares,
M. G. A. Paris, Las. Phys. 16, 1533 (2006).
[8] N. J. Cerf et al., Phys. Rev. Lett. 95, 070501 (2005).
[9] J. Wenger et al., Phys. Rev. Lett. 92, 153601 (2004); A. Our-
joumtsev et al., Science 312, 83 (2006).
[10] M. M. Wolf et al., Phys. Rev. Lett. 96, 080502 (2006).
[11] M. M. Wolf et al, Phys. Rev. Lett. 98, 130501 (2007). 0
[12] A. S. Holevo, R. F. Werner, Phys. Rev. A 63, 032312 (2001).
[13] L. M. Duan at al, Phys. Rev. Lett. 84, 4002 (2000); R. F. Werner,
M. M. Wolf, Phys. Rev. Lett. 86, 3658 (2001).
[14] K. E. Cahill and R. J. Glauber, Phys. Rev. 177, 1882 (1969).
[15] P. Marian, T. Marian, Phys. Rev. A 47, 4474 (1993).
[16] K. Zyczkowski and M. Kus, J. Phys. A 27, 4235 (1994).
[17] K. Zyczkowski, P. Horodecki, A. Sanpera and M. Lewenstein,
Phys. Rev. A 58, 883 (1994).
[18] S. Olivares and M. G. A. Paris, J. Opt. B, 7, S392 (2005).
FIG. 4: Non-Gaussianity of a Schrödinger cat-like state as a func-
tion of the superposition parameter φ, with either C obtained by nu-
merical minimization (solid) or with C = Tr[a̺] (dotted). (Left):
α = 0.5; (Right): α = 5.
ABSTRACT
  We address the issue of quantifying the non-Gaussian character of a bosonic
quantum state and introduce a non-Gaussianity measure based on the
Hilbert-Schmidt distance between the state under examination and a reference
Gaussian state. We analyze in details the properties of the proposed measure
and exploit it to evaluate the non-Gaussianity of some relevant single- and
multi-mode quantum states. The evolution of non-Gaussianity is also analyzed
for quantum states undergoing the processes of Gaussification by loss and
de-Gaussification by photon-subtraction. The suggested measure is easily
computable for any state of a bosonic system and allows to define a
corresponding measure for the non-Gaussian character of a quantum operation.

<|endoftext|><|startoftext|>
Introduction
Recall that a Hadamard matrix A of orderm is a {±1}-matrix of size
m×m such that AAT = mIm, where T denotes the transpose and Im
the identity matrix. A skew-Hadamard matrix is a Hadamard matrix
A such that A − Im is a skew-symmetric matrix. We refer the reader
to [1] for the survey of known results about skew-Hadamard matrices.
The construction of skew-Hadamard matrices is lagging considerably
behind that for arbitrary Hadamard matrices. Our previous four notes,
written more than 13 years ago, were motivated by the desire to im-
prove this situation. We constructed skew-Hadamard matrices of order
m = 4n for the following 24 odd integers n:
[2]: 37, 43;
[3]: 67, 113, 127, 157, 163, 181, 241;
[4]: 39, 49, 65, 93, 121, 129, 133, 217, 219, 267;
[6]: 81, 103, 151, 169, 463.
At the time of publication, such matrices of these orders were not
known to exist. Due to the manifold increase in computing power
since that time, one can now make further progress.
In [6], we listed 45 odd integers n < 300 for which no skew-Hadamard
matrix of order 4n was known at that time. (In the first edition of [1],
Table 24.31 was incomplete.) The smallest of these n’s was 47. The
next one, 59, has been removed recently by Fletcher, Koukouvinos and
The author was supported by an NSERC Discovery Grant.
http://arxiv.org/abs/0704.0640v2
2 D.Ž. D– OKOVIĆ
Seberry [7]. In this note we shall remove the integers 47 and 97 from the
mentioned list by constructing examples of skew-Hadamard matrices
of orders 4 · 47 = 188 and 4 · 97 = 388. (We have constructed a bunch
of examples but we have saved and will present only a few of them.)
Consequently, the revised list now consists of the 42 integers:
69, 89, 101, 107, 109, 119, 145, 149, 153, 167, 177, 179, 191, 193,
201, 205, 209, 213, 223, 225, 229, 233, 235, 239, 245, 247, 249, 251,
253, 257, 259, 261, 265, 269, 275, 277, 283, 285, 287, 289, 295, 299.
We construct our examples of skew-Hadamard matrices of orders
188 and 388 by constructing first suitable supplementary difference
sets, and then we use these sets to build four circulant blocks, which
one should plug into the Goethals–Seidel array. The procedure used to
find these supplementary difference sets is not new. I have used it in
several papers during the last 15 years. It is described in my note [5].
2. The case n = 47
We denote the additive group of integers modulo n by Zn. In this
section we set n = 47. In the literature on Hadamard matrices it is
customary to refer to difference families (DF) as supplementary differ-
ence sets (SDS) and to employ more elaborate and more informative
notation by listing the order v of the underlying abelian group, the
number of sets in the family as well as their cardinals, and also the
parameter λ.
We have constructed four suitable difference families in Zn. The first
two are the following.
SKEW-HADAMARD MATRICES 3
Proposition 2.1. Define six subsets of Z47:
X1 = {2, 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 17, 18, 19, 20, 21, 22, 25, 27, 30,
31, 33, 35, 37, 38, 39, 40, 42, 43, 44},
X2 = {1, 3, 6, 7, 8, 11, 13, 14, 15, 19, 20, 21, 24, 27, 30, 33, 39, 41, 43,
44, 45, 46},
X3 = {3, 6, 8, 10, 11, 12, 14, 20, 21, 23, 24, 25, 26, 27, 30, 31, 32, 34, 35,
41, 42, 45},
Y1 = {1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 15, 17, 18, 19, 21, 23, 24, 25, 27,
28, 29, 30, 31, 35, 38, 41, 43, 44, 46},
Y2 = {3, 6, 7, 8, 10, 11, 12, 16, 22, 25, 26, 31, 32, 33, 34, 37, 39, 41, 42,
43, 44, 46},
Y3 = {3, 7, 12, 13, 15, 16, 18, 20, 21, 23, 25, 26, 27, 28, 32, 35, 38, 39, 42,
44, 45, 46}.
The triples {X1, X2, X3} and {Y1, Y2, Y3} are difference families, i.e.,
they are 3 − (47; 30, 22, 22; 39) supplementary difference sets in Z47.
The two families are not equivalent.
Proof. Use the computer to verify the claims. Note that the cardinals
nk = |Xk| = |Yk| are indeed n1 = 30 and n2 = n3 = 22. The parameter
λ is 39, i.e., each nonzero integer in Zn occurs 39 times in the list of
differences created from the sets Xk and also from the Yk.
The second claim can be verified in several ways. We used the fol-
lowing ad hoc method. We compare the list of differences generated
by the sets X1 and Y1. Each nonzero integer i ∈ Zn occurs in one of
these lists say µi times. The µi’s take only three values: 18, 19 or 20.
But the number of µi’s equal to 18, 19 and 20 is 12, 26 and 8 for X1
and 14, 22 and 10 for Y1. Hence X1 and Y1 are not equivalent under
translations and automorphisms of the additive group Zn. �
For any subset X ⊆ Zn let
aX = (a0, a1, . . . , an−1)
be the {±1}-row vector such that ai = −1 iff i ∈ X . We denote by AX
the n× n circulant matrix having aX as its first row.
Let X0 ⊆ Zn be the Paley difference set (the set of nonzero squares
in the finite field Zn). Recall that X0 is of skew type, i.e., for nonzero
i ∈ Zn we have i ∈ X0 iff −i /∈ X0. Its cardinal is n0 = |X0| = 23.
For simplicity, write Ak instead of AXk for k = 0, 1, 2, 3. We can now
plug our matrices Ak into the Goethals–Seidel template to construct a
4 D.Ž. D– OKOVIĆ
skew-Hadamard matrix of order 188:
A0 A1R A2R A3R
−A1R A0 −A
−A2R A
R A0 −A
−A3R −A
As usual, R denotes the matrix having ones on the back-diagonal and
all other entries zero.
Clearly, we can use the second difference family to construct another
skew-Hadamard matrix of order 188. Both solutions have the same
associated decomposition of 4n as sum of four squares:
4n = 188 = 132 + 32 + 32 + 12
(n− 2nk)
The remaining two difference families have different parameters from
the first two.
Proposition 2.2. Define six subsets of Z47:
P1 = {0, 2, 4, 5, 9, 10, 12, 16, 17, 19, 21, 22, 23, 25, 27, 28, 35, 36, 37,
43, 46},
P2 = {0, 1, 2, 6, 8, 9, 11, 15, 16, 19, 25, 32, 33, 35, 36, 37, 38, 40, 44},
P3 = {1, 2, 3, 4, 5, 6, 7, 10, 11, 16, 18, 22, 24, 28, 31, 35, 38, 40, 43},
Q1 = {4, 5, 6, 8, 11, 12, 15, 20, 21, 23, 25, 26, 28, 29, 30, 31, 32, 36,
39, 41, 43},
Q2 = {1, 2, 5, 7, 13, 14, 21, 22, 24, 26, 31, 32, 35, 36, 37, 39, 40, 42, 46},
Q3 = {1, 2, 3, 4, 5, 9, 12, 18, 20, 21, 24, 25, 32, 34, 38, 39, 43, 44, 46}.
The triples {P1, P2, P3} and {Q1, Q2, Q3} are difference families, i.e.,
they are 3 − (47; 21, 19, 19; 24) supplementary difference sets in Z47.
These two families are not equivalent to each other or the ones above.
Just as the first two families, {P1, P2, P3} and {Q1, Q2, Q3} can be
used to construct two more skew-Hadamard matrices of order 188. The
associated decomposition into sum of four squares is now different:
188 = 92 + 92 + 52 + 12.
3. The case n = 97
For the remainder of this note we set n = 97. Let G be the multi-
plicative group of the nonzero elements of Zn, a cyclic group of order
n − 1 = 96, and let H = 〈35〉 = {1, 35, 61} be its subgroup of order
SKEW-HADAMARD MATRICES 5
3. We use the same enumeration of the 32 cosets αi, 0 ≤ i ≤ 31, of H
in G as in our computer program. Thus we impose the condition that
α2i+1 = −1 · α2i for 0 ≤ i ≤ 15. For even indices we have
α0 = H, α2 = 2H, α4 = 3H, α6 = 4H, α8 = 5H,
α10 = 6H, α12 = 7H, α14 = 9H, α16 = 10H, α18 = 12H,
α20 = 13H, α22 = 15H, α24 = 18H, α26 = 20H, α28 = 23H,
α30 = 26H.
Next define four index sets:
J0 = {1, 2, 4, 6, 9, 11, 13, 14, 17, 18, 21, 23, 25, 27, 29, 30},
J1 = {1, 2, 6, 7, 8, 9, 10, 11, 12, 13, 23, 27, 29},
J2 = {0, 1, 2, 5, 6, 12, 13, 15, 16, 20, 24, 25, 26, 29, 30, 31},
J3 = {0, 2, 3, 4, 7, 8, 9, 11, 12, 13, 15, 16, 17, 18, 23, 28, 29}
and introduce the following four subsets of Zn:
αi, k = 0, 1, 2, 3.
Their cardinals nk = |Uk| = 3|Jk| are:
n0 = n2 = 48, n1 = 39, n3 = 51
and we set
λ = n0 + n1 + n2 + n3 − n = 89.
Observe that U0 is of skew type, i.e., we have
U0 ∩ (−U0) = ∅, U0 ∪ (−U0) = Zn \ {0}.
Proposition 3.1. The four subsets U0, U1, U2, U3 ⊂ Zn form a dif-
ference family, i.e., they are 4 − (97; 48, 39, 48, 51; 89) supplementary
difference sets in Z97.
Proof. For r ∈ {1, 2, . . . , 96} let λk(r) denote the number of solutions
of the congruence i − j ≡ r (mod 97) with {i, j} ⊆ Uk. It is easy to
verify (by using a computer) that
λ1(r) + λ2(r) + λ3(r) + λ4(r) = λ
is valid for all such r. Hence the sets U1, U2, U3, U4 form a difference
family in Zn. �
Let Ak now denote the n × n circulant matrices AYk . The SDS-
property implies that the {±1}-matrices A0, . . . , A3 satisfy the identity
= 4nIn.
6 D.Ž. D– OKOVIĆ
One can now plug the matrices Ak into the Goethals–Seidel template
to obtain a Hadamard matrix A of order 4n = 388. Since U1 is of skew
type, A is also skew-Hadamard.
Our second example, B, is constructed in the same way by using the
index sets:
K0 = {0, 3, 4, 7, 9, 11, 12, 14, 17, 19, 20, 22, 24, 27, 28, 30},
K1 = {4, 7, 8, 10, 12, 13, 14, 15, 17, 18, 20, 26, 27},
K2 = {0, 1, 2, 3, 6, 7, 8, 11, 12, 14, 20, 23, 24, 25, 28, 31},
K3 = {1, 2, 4, 7, 8, 9, 10, 12, 13, 19, 21, 23, 24, 25, 26, 27, 31},
with the corresponding subsets of Zn:
αi, k = 0, 1, 2, 3,
with V0 of skew type.
Proposition 3.2. The four subsets V0, V1, V2, V3 ⊂ Zn form a dif-
ference family, i.e., they are 4 − (97; 48, 39, 48, 51; 89) supplementary
difference sets in Z97.
The two SDS’s that we used to construct A and B are not equivalent.
For instance, the sets U1 and V1 are not equivalent under translations
and group automorphisms of Zn.
Since the two SDS’s have the same parameters, they share the same
decomposition of 4n into sum of four squares:
4n = 388 = 192 + 52 + 12 + 12
(n− 2nk)
References
[1] C.J. Colbourn and J.H. Dinitz, Handbook of Combinatorial Designs, 2nd Edi-
tion, CRC Press, New York, 2006.
[2] D.Ž.D– oković, Skew Hadamard matrices of order 4×37 and 4×43, J. Combinat.
Theory, Series A, 61 (1992), 319–321.
[3] , Construction of some new Hadamard matrices, Bull. Austral. Math.
Soc. 45 (1992), 327–332.
[4] , Ten new orders for Hadamard matrices of skew type, Univ. Beograd,
Publ. Elektrotehn. Fak. Ser. Mat. 3 (1992), 47–59.
[5] , Good matrices of orders 33, 35 and 127 exist, J. Comb. Math. Comb.
Comp. 14 (1993), 145–152.
[6] , Five new orders for Hadamard matrices of skew type, Australasian J.
Comb. 10 (1994), 259–264.
SKEW-HADAMARD MATRICES 7
[7] R.J. Fletcher, C. Koukouvinos and J. Seberry, New skew-Hadamard matrices
of order 4 · 59 and new D-optimal designs of order 2 · 59, Discrete Math. 286
(2004), 251–253.
Department of Pure Mathematics, University of Waterloo, Water-
loo, Ontario, N2L 3G1, Canada
E-mail address : djokovic@uwaterloo.ca
	1. Introduction
	2. The case n=47
	3. The case n=97
	References
ABSTRACT
  We construct several difference families on cyclic groups of orders 47 and
97, and use them to construct skew-Hadamard matrices of orders 188 and 388.
Such difference families and matrices are constructed here for the first time.
The matrices are constructed by using the Goethals-Seidel array.

<|endoftext|><|startoftext|>
Quantum engineering of photon states with entangled atomic ensembles
D. Porras and J. I. Cirac
Max-Planck Institut für Quantenoptik, Hans-Kopfermann-Str. 1, Garching, D-85748, Germany
(Dated: November 4, 2018)
We propose and analyze a new method to produce single and entangled photons which does not
require cavities. It relies on the collective enhancement of light emission as a consequence of the
presence of entanglement in atomic ensembles. Light emission is triggered by a laser pulse, and
therefore our scheme is deterministic. Furthermore, it allows one to produce a variety of photonic
entangled states by first preparing certain atomic states using simple sequences of quantum gates.
We analyze the feasibility of our scheme, and particularize it to: ions in linear traps, atoms in optical
lattices, and in cells at room temperature.
PACS numbers: PACS
The deterministic generation of collimated single and
entangled photons is of crucial importance in Quantum
Information, like in quantum cryptography [1], quantum
computation [2], quantum lithography [3] or quantum
interferometry [4, 5]. Most of the methods tested so far
require high-Q cavities, something which is very demand-
ing in practice [6, 7, 8, 9]. The engineering of quantum
states in atomic systems is now possible thanks to the
experimental progress experienced by the field of Atomic
Physics during the last years. In fact, with trapped ions
it has been already possible to create so–called W [10]
and GHZ [11] states of up to 8 ions. At the same time,
scientists have been able to produce other kinds of entan-
gled states [12] with atoms in optical lattices. Further-
more, with the advent of Rydberg techniques [13] it will
soon be possible to create W–like states in that system or
in atomic ensembles at room temperature. Apart from
their fundamental interest, some of those states may have
applications in precision spectroscopy [14, 15].
In this work we show that the ability of creating those
atomic states may have a strong impact in different sub-
fields of quantum information, as it may lead to a very
efficient way of creating certain kind of entangled pho-
tonic states which are required in various applications.
The main idea is to use a laser and an internal level con-
figuration such that we can map the atomic state onto
photonic states corresponding to modes propagating in a
well defined direction. Our scheme uses the well known
fact [16, 17] that, under certain circumstances, light scat-
tering takes place predominantly in the forward direction
due to an interference effect. In fact, this effect is the ba-
sis of one of the building blocks of the repeater scheme
proposed in [18], and has been recently demonstrated in
a series of experiments [19, 20, 21]. There, a single ex-
citation is created in an atomic ensemble by detecting a
photon emission in a certain direction. Then, the excita-
tion is released in the forward direction by using a laser.
Building on this fact, we propose to create certain kind
of excitations by using quantum gates or atomic interac-
tions, which give rise to the desired entangled states when
they are released using a laser, and which propagate in
the desired direction due to the mentioned interference
effect.
Let us consider a set of N atoms with (ground) hype-
fine levels |g〉 and |sa,b〉 (see Fig. 1 (a)). We consider
states of the form
|k(na)a ,k
b 〉 =
na! nb!
)na (
|0〉, (1)
and linear combinations thereof. Here, |0〉 = |g〉1 . . . |g〉N ,
e−ikxr
x,j , x = a, b, (2)
where σ
x,j excites an atom from |g〉j to |sx〉j , and r0j are
the equilibrium position of the atoms. In the limit nx ≪
N , Eq. (1) defines a set of orthonormal collective states
with nx atoms excited in |sx〉 and linear momentum kx.
Those states can be indeed readily created using trapped
ions or Rydberg techniques (see Appendix A1,3).
In order to release the photons, one sends a laser pulse
of wavevector kL which couples level |sx〉 to some elec-
tronically excited ones |ex〉, respectively. The large pop-
ulation of level |g〉 together with the initial entangle-
ment (coherences) between the atoms, will now stimu-
late the emission of photons from the excited states to
the level |g〉, which overall will produce the mapping be-
tween these states and the photonic states,
|k(na)a ,k
b 〉 → |na〉ka+kL,σa |nb〉kb+kL,σb ; (3)
that is, (1) is mapped to a Fock state of nx photons with
momenta kx + kL and polarization σx, where σx is the
polarization of the light in each decay channel. More-
over, due to the linearity of this process, superpositions
of states of the form (1) will be mapped onto superpo-
sitions of photonic states (3). For example, the atomic
state
|k(1),q(1)〉+ |q(1),k(1)〉
2 will emit a pair of
entangled photons in different directions. The mapping
(3) is strictly valid under ideal conditions, and in the limit
N → ∞, and the directionality in the photon emission is
directly connected to the momentum conservation which,
in turn, is a consequence of the constructive interference
in the field emitted by each atom. Thus, the crucial is-
sue in our scheme is to determine how this mapping is
http://arxiv.org/abs/0704.0641v2
modified in finite atomic ensembles under nonideal con-
ditions. In the following we analyze such questions in
detail, concentrating in the simplest case in which we
have a single excitation with momentum |k0〉 in |sa〉 (i.e.
our initial state is a W-like state) and thus we produce
a single photon. We determine a function f(Ω), which is
proportional to the probability density that the photon
is emitted in the direction Ω. In general, f = fcoh + finc;
that is, it is the sum of a coherent contribution and an
incoherent one. The later appears whenever the posi-
tions of the particles fluctuate. fcoh contains the forward
scattering contribution, which is emitted in a cone with
a width ∆Ω that decreases with the number of particles.
finc, on the contrary, describes isotropic light emission,
thus, even when the light emitted in ∆Ω is collected, the
contribution finc leads to a limitation in the efficiency of
the setup. To quantify the error probability, we define
dΩ finc(Ω)
dΩ f(Ω)
. (4)
As long as the number of excited atoms is small nx ≪ N ,
this analysis can be easily generalized to the emission of
states with many photons (1). One obtaines that the
overall error is bounded by 1− (1− E)na+nb .
The emission pattern can be obtained by studying the
Heisenberg equations of motion of the field operators.
The calculation inolves the study of the decay of the
atomic state under collective effects (see Appendix C).
To simplify our analysis we ignore the dipole pattern, in
which case we get:
f(Ω) =
i,j=1
〈e−i(kLnΩ−kL)(ri−rj)〉eik0(r
). (5)
rj are the coordinate operators of the atoms, and thus Eq.
(5) allows us to describe fluctuations in the position of the
particles during the emission of light. In the following,
we will show three different experimental set–ups where
our scheme can be implemented. In order to analyze
the performance in each of them, we first particularize
the above formula to three different situations which are
directly connected with those set–ups. We will focus on
the angular width of the forward–scattering cone, ∆Ω,
which measures the collimation of the emitted photons,
and the error probability, E , as figures of merit. Then
we will introduce the possible implementations and will
use those formulas to specify the conditions for them to
correctly operate.
(i) Fixed atomic positions. In the case of a square
lattice of particles trapped in 3D (see Fig. 1 (b)), the
emission pattern is given by
f0(Ω) =
α=x,y,z
sin2((kLn
Ω − (kL+k0)α)d0Nα/2)
sin2((kLn
Ω − (kL+k0)α)d0/2)
, (6)
with Nα the number of atoms in each direction. f0(Ω)
has a series of diffraction peaks, which are reduced to a
single one if d0 < λ/2. In this regime, the emission is cen-
tered in a cone with nΩ in the direction of kL+k0. Note
that for simultaneous energy and momentum conserva-
tion condition |kL + k0| = kL has to be fulfilled. Since
the positions of the atoms do not fluctuate, f0 has only
a coherent contribution (E = 0), and the only limitation
for the effiency of the setup is the width of the emission
cone, which scales in 3D like ∆θ3D ≈ 1/(N1/3kLd0). In
the case of a chain of atoms (1D) momentum is conserved
only along the direction of the chain. Photon emission
can be still directed efficiently along the axis of the chain,
in a cone whose width scales like ∆θ1D ≈ 1/
NkLd0.
(ii) Fluctuating atomic positions. Let us consider a
lattice of atoms at temperature T , trapped by indepen-
dent harmonic potentials. The emission pattern is now
the sum the of two contributions,
fcoh(Ω) = f0(Ω)gT (Ω), finc(Ω) = 1− gT (Ω). (7)
gT (Ω) = e
−((kLnΩ−kL)ξT )
, and ξT is the vector whose
components are the size of the position fluctuations in
each spatial direction, (ξαT )
2 = x20(1 + 2n
T ), with x
0 the
size of the ground state in harmonic potential, and nαT
the number of motion quanta at T . Light scattered into
finc represents an important fraction whenever ξ
T ≫ d0.
In this case, the emission of light is centered around kL,
since the uncertainty in the position of the particles av-
erages out the intial linear momentum k0. The scaling of
E in this regime strongly depends on the dimensionality
of the system. In particular, in the case of a chain of
atoms, E1D = d0/λ, whereas in the square 3D lattice, we
get E3D ≈ 12.6(d0/λ)2N−1/3.
(iii) Statistical distribution of particles. Consider an
ensemble of atoms (see Fig. 1 (c)), which move inside
a square box of size L, such that their motion is faster
than their radiative decay, that is, their average velocity
v is such that vL ≫ Γ, with Γ the emission rate. This
situation can be described by assuming that the atoms
are in a statistical distribution with equal probability to
be at any point in the box. The situation is thus similar
to that of a thermal state,
fcoh (Ω) = Ngbox(Ω), finc (Ω) = 1− gbox(Ω), (8)
where
gbox(Ω) =
α=x,y,z
sinc2 (L(kLn
Ω − kαL)) . (9)
Defining the average distance between particles like d0 =
LN−1/3, we find the same scalings of ∆θ as in case (i),
and of E , as in case (ii). Trapping schemes for atomic
ensembles are simpler to realize but face the difficulty
that conditions for the directionality of photon emission
are more stringent. In the case of a lattice of particles at
fixed positions, forward–scattering is ensured whenever
condition d0 < λ/2 is fulfilled. On the contrary, in the
case of atomic ensembles, the incoherent contribution finc
has to be small enough such that E ≪ 1, which implies
FIG. 1: (a) Level configuration for the release of atomic en-
tangled states in photonic channels. (b) Release of a collec-
tive state with linear momentum k0, that has been gener-
ated in a lattice of atoms. (c) Emission of photons from an
atomic ensemble, which consists of an incoherent contribu-
tion (isotropic), and a coherent one in the forward–scattering
direction.
d0 ≪ λ in 1D, or, alternatively, a number of particles
large enough in 3D.
Now we introduce three experimental set–ups where
our scheme can be implemented. In the Appendix we
show how to create the atomic states that we are consid-
ering here.
Trapped ions. This system is ideally suited to create
collective states like (1), as was demonstrated recently
in ref. [10]. Most usually ions are arranged in chains,
such that we deal with the 1D situation discussed above.
Even though trapped ions are not equally spaced, under
the condition d̄0 < λ/2, with d̄0 the average distance,
we still get light emission in the forward–scattering cone
only, see Fig. 2. Considering two different internal levels,
which can correspond to different states in an hyperfine
multiplet, states such as those defined by Eq. (1) can be
created by a number of quantum operations that scales
linearly with the number of ions N (see A1). For exam-
ple, the state 1/
2 (|0, 2kL〉+ |2kL,0〉), would emit two
photons in the forward and backward directions along the
chain axis, entangled in polarization. The main difficulty
for the implementation of this idea with ions lies on the
fact that ion–ion distances are usually in the range of a
few µm, and thus condition d0 < λ/2 is not fulfilled when
considering optical wavelengths. A way out of this prob-
lem is to use optical transitions which lie in the range of
λ & 5µm, which can be found in ions such as Hg+, Ba+,
or Yb+ [22].
Cold atoms in optical lattices. By using optical lat-
tices we fulfill the need of placing atoms at interparticle
distances comparable to optical wavelengths, since po-
tential wells in a standing–wave are indeed separated by
d0 = λsw/2, with λsw, the wavelength of the counterprop-
agating lasers. By using an optical transition such that
λ > λsw, we are in the regime in which light emission
FIG. 2: Probability of photon emission from an ion chain with
N = 30 ions initially in a W –state. The blue line corresponds
to a chain with equally spaced ions with two diffraction peaks.
Black and red lines corresponds to an ion Coulomb chain, in
which ions are in an overall trapping potential and thus are
not equally spaced. However, in the case that the average dis-
tance, d̄0 is small enough, light is also preferentially emitted
in the forward–scattering direction.
is focused into a single Bragg peak. Although one could
think of peforming quantum gates between ultracold neu-
tral atoms to generate collective atomic states [23], this
procedure faces the difficulties of quantum computation
in this system, like for example how to achieve single
atom addressability. More efficiently, one could avoid
the use of quantum gates by using the dipole–blockade
mechanism with Rydberg atoms, which allows us to gen-
erate W-states, as well as states which emit Fock states
with a number M of photons [13] (see Appendix 3).
Atomic ensembles at room temperature. The very same
techniques which can be applied to Rydberg atoms in an
optical lattice can also be used in the case of hot en-
sembles. On the one hand, this setup has the advantage
that atoms do not need to be cooled and placed in an
optical lattice. On the other hand, it can be described
by a statistical distribution of particles, and thus suffers
from the fact that high efficiency in the release of pho-
tons is achieved under more severe conditions of particle
density and atom number, as discussed above. However,
densities which are high enough to fulfill the requirement
E ≪ 1 have been recently reported in [24].
In conclusion, we have proposed to use current tech-
niques for quantum engineering to generate atomic multi-
partite entangled states which can be efficiently mapped
into photonic states. Our proposal relies on the release of
spin–wave like excitations into a given spatial direction
by means of interference effect, and can be implemented
with trapped ions, atoms in optical lattices, and atomic
ensembles at room temperature.
This work was supported by E.U. projects (SCALA
and CONQUEST), and the Deutsche Forschungsgemein-
schaft.
APPENDIX A: CREATION OF COLLECTIVE
ATOMIC STATES IN A CHAIN OF ATOMS
Entangled states of the form (1) and their linear com-
binations can be generated in a chain of particles, for
example, of trapped ions, by means of a limited number
of quantum operations. To demonstrate this, we first
show that they can be written as Matrix Product States
with a small bond dimension D, i.e. they can be written
|Ψ〉 =
i1,...,iN
〈ΦF|V iN[N ] . . . V
|ΦI〉|i1〉 . . . |iN 〉, (A1)
In (A1), the indices ij = g, sa, sb, and V
are D × D
matrices acting on an auxiliary D–dimensional Hilbert
space. D is given by the number of states which appear
in the singular value decomposition (s.v.d.) of |Ψ〉 at
any site in the chain [25]. As it is shown in [26], the state
(A1) can be prepared by performing N gates which act
on [log2 D]+1 qubits. Thus, as long as D is independent
of N , the number of gates to be applied scales linearly
with the total number of atoms.
To evaluate D, consider first the case of a state like (1)
with atoms excited in level sa only, and a partition of the
chain in two parts L and R. We get na + 1 states in the
s.v.d. with respect to this partition, which correspond to
states with a number of excited atoms in part L, ranging
from 0 to na. This result is easily generalized to a linear
combination ofM states of the form (1), in which case we
get D = M(na + 1)(nb + 1). For example, an entangled
state of the form:
|Ψ〉 = 1√
|kna=1,qnb=1〉+ |qna=1,knb=1〉
, (A2)
has D = 8.
APPENDIX B: QUANTUM STATE
ENGINEERING WITH RYDBERG BLOCKADE
Interactions between excited atomic states, like those
that take place in Rydberg atoms, can be used to the cre-
ate the states defined by Eq. (A2). This can be achieved
in a single experimental step, without the need for quan-
tum gates, if the proper configuration of atomic inter-
actions is chosen. As an example, consider the 3 level
configuration shown in Fig. 1 (a), and interactions be-
tween excited states such that atoms in levels |sa〉, |sb〉,
interact strongly only if they are in the same excited
state, that is, Uaa = Ubb = U , but Uab = 0. We ap-
ply two lasers with wavectors k1,2 and Rabi frequencies
Ω1,2, detuned with respect to the |g〉 – |sa,b〉 transition,
such that ∆1 = −∆2 = ∆. If condition ∆1,2 ≫ Ω1,2
is fulfilled, then the lasers induce a two–photon transi-
tion with Rabi frequency Ωeff = Ω1Ω2/∆. Furthermore,
if Ωeff ≪ U , states with two atoms in the same excited
state are not populated. Under these conditions there
are two possible excitation channels, depicted in Fig. 3,
which give rise to the linear combination (A2).
FIG. 3: Lasers and level configuration for the creation of
atomic entangled states which emit pairs of photons entan-
gled in polarization.
APPENDIX C: CALCULATION OF THE
PHOTON DISTRIBUTION
We consider for simplicity the lambda configuration
depicted in Fig. 1 (a), considering a single excited state
|s〉, and a single auxiliary level |e〉. The interaction of
the quantized electromagnetic field with the ensemble of
atoms, after the adiabatic elimination of level |e〉, is de-
scribed by
j,k,λ
jak,λe
i(k−kL)rj+iωLt + h.c.
gk,λ =
(ǫkλ · dge) , (C1)
σj refers to the |g〉 – |s〉 atomic transition, ΩL and kL
are the Rabi frequency and wave–vector of the classical
field, respectively, ωk is the photon energy, ǫk,λ are the
polarization vectors, and dge is the dipole matrix element
for the |g〉 – |e〉 transition.
The probability of photon emission is proportional to
the the diagonal elements of the one–photon density ma-
trix, which are obtained from the Heisenberg equation of
motion for the field operators,
ak〉 =
dτ1dτ2e
−i(ωk−ωL)(τ1−τ2)
〈e−i(k−kL)(ri−rj)〉〈σ†i (τ1)σj(τ2)〉. (C2)
Since we are interested in the conditions for momentum
conservation due to interference effects, we consider the
following atomic initial state,
|k0〉 = σ†k0 |0〉, σ
e−ik0r
j . (C3)
The emission pattern depends thus on the two–time
atomic correlation function, which in turns can be eval-
uated by means of a master equation which describes
the decay of the atomic levels. In the case of the ini-
tial atomic state k0 (C3), fixed atom positions, and ne-
glecting boundary effects, this correlation function can
be evaluated exactly,
〈σ†i (τ1)σj(τ2)〉 = e
−Γk(τ1+τ2)/2eik0(r
), (C4)
where we have neglected an energy shift due to dipole–
dipole interactions. Integrating (C2) over the absolute
value of k yields the probability of photon emission,
I(Ω) = Ī(Ω)f(Ω). (C5)
Ī(Ω) is the dipole pattern,
Ī(Ω) =
1− (negnΩ)2
, (C6)
where Γ is the single atom radiative decay rate, Γk0 is
the collective decay rate, neg is the unit vector of the
atomic transition, and nΩ is a unit vector in the direc-
tion defined by the solid angle Ω. The factor f(Ω) in I(Ω)
describes the interference between the emission from dif-
ferent atoms, and is given by Eq. (5).
Below we deduce the master equation which leads to
(C4) and we sketch its solution in the case of collective
states with a single excited atom.
APPENDIX D: MASTER EQUATION
The master equation for the reduced density matrix
of the internal levels, which describes the ratiative decay
of a set of atoms under the coupling to the quantized
radiation field given by Eq. (C1), is
∂tρ =
2 σiρσ
j − σ
i σjρ− ρσ
Gij [σ
i σj , ρ], (D1)
where the coupling constants depend on
Jij =
dτgij(τ)e
−iωLτ =
ei(ωk−ωL)τ+i(k−kL)r,(D2)
in the following way:
ℜ(Jij) =
Γij ,
ℑ(Jij) =
Gij . (D3)
The master equation (D1) can be solved for the particular
case of an initial state (C3) by noticing that the evolu-
tion of ρ is closed within the subspace spanned by the
states |k0〉, |0〉. This fact can be easily proved by direct
substitution of ρ(0) = |k0〉〈k0| in Eq. (D1), which yields
the following evolution for the atomic density matrix:
ρ(t) = e−Γk0 t|k0〉〈k0|+
1− e−Γk0 t
|0〉〈0|, (D4)
where the collective decay rate Γk0 is just the Fourier
transform of the coupling constants in the master equa-
tion,
Γk0 =
Γi,je
ik0(r
j). (D5)
A similar result holds for nondiagonal elements of ρ(t),
which together with the quantum regression theorem
yields the evolution of the atomic correlation function
(C4).
[1] Nicolas Gisin, Grégoire Ribordy, Wolfgang Tittel, and
Hugo Zbinden, Rev. Mod. Phys. 74, 145 (2002).
[2] E. Knill, R. Laflamme, and G. J. Milburn. Nature 409,
46 (2001).
[3] A.N. Boto et al.. 5to Beat the Diffraction Limit. Phys.
Rev. Lett. 85, 2733 (2000).
[4] J.J. Bollinger, W.M. Itano, D.J. Wineland, and D.J.
Heinzen, Phys. Rev. A 54, R4649 (1996).
[5] V. Giovannetti, S. Lloyd, and L. Maccone. Science 306
1330 (2004).
[6] P. Michler et al.. Science 290, 2282–2285 (2000).
[7] A. Kuhn, M. Heinrich, and G. Rempe. Phys. Rev. Lett.
89, 067901 (2002).
[8] K. Keller et al.. Nature 431, 1075–1078 (2004).
[9] J. McKeever et al.. Science 303, 1992 (2004).
[10] H. Häffner et al.. Nature 438, 643 (2004).
[11] D. Leibfried et al.. Nature 438, 639 (2004).
[12] O. Mandel et al.. Nature 425, 937 (2003).
[13] M. D. Lukin et al.. Phys. Rev. Lett. 87, 037901 (2001).
[14] D.J. Wineland et al.. Phys. Rev. A 46, R6797 (1992).
[15] C.F. Roos et al.. Nature 443, 316 (2006).
[16] J.D. Jackson. Classical Electrodynamics. Wiley, New
York (1962).
[17] M.O. Scully, E.S. Fry, C. H. Raymond Ooi, and K. Wd-
kiewicz. Phys. Rev. Lett. 96, 010501 (2006).
[18] L–M. Duan, M.D. Lukin, J.I. Cirac, and P. Zoller, Nature
41, 413 (2001).
[19] C.W. Chou et al.. Nature 438, 828 (2005).
[20] Nature 438, 833 (2005).
[21] M. D. Eisaman et al.. Nature 438, 837 (2005).
[22] For example λ(2D3/2 -
2P1/2) = 10.8 µm in Hg
+, or
λ(2D3/2 -
2D5/2) = 12.5 µm in Ba
[23] D. Jaksch et al.. Phys. Rev. Lett. 85, 2208 (2000).
[24] R. Heidemann et al.. Preprint at
<http://arxiv.org/abs/quant-ph/0701120> (2007).
[25] G. Vidal. Phys. Rev. Lett. 91, 147902 (2003).
[26] C. Schön et al.. Phys. Rev. Lett. 95, 110503 (2005).
http://arxiv.org/abs/quant-ph/0701120
ABSTRACT
  We propose and analyze a new method to produce single and entangled photons
which does not require cavities. It relies on the collective enhancement of
light emission as a consequence of the presence of entanglement in atomic
ensembles. Light emission is triggered by a laser pulse, and therefore our
scheme is deterministic. Furthermore, it allows one to produce a variety of
photonic entangled states by first preparing certain atomic states using simple
sequences of quantum gates. We analyze the feasibility of our scheme, and
particularize it to: ions in linear traps, atoms in optical lattices, and in
cells at room temperature.

<|endoftext|><|startoftext|>
Direct photons and dileptons via color dipoles
B. Z. Kopeliovich,1, 2 A. H. Rezaeian,1 H. J. Pirner,3 and Iván Schmidt1
Departamento de F́ısica y Centro de Estudios Subatómicos,
Universidad Técnica Federico Santa Maŕıa, Casilla 110-V, Valparáıso, Chile
Joint Institute for Nuclear Research, Dubna, Russia
Institute for Theoretical Physics, University of Heidelberg,
Philosophenweg 19, D-69120 Heidelberg, Germany
(Dated: November 4, 2018)
Drell-Yan dilepton pair production and inclusive direct photon production can be described within
a unified framework in the color dipole approach. The inclusion of non-perturbative primordial
transverse momenta and DGLAP evolution is studied. We successfully describe data for dilepton
spectra from 800-GeV pp collisions, inclusive direct photon spectra for pp collisions at RHIC energies√
s = 200 GeV, and for pp̄ collisions at Tevatron energies
s = 1.8 TeV, in a formalism that is free
from any extra parameters.
PACS numbers: 13.85.QK,13.60.Hb,13.85.Lg
I. INTRODUCTION
Massive lepton pair production and inclusive direct
photon production in hadronic collisions have histori-
cally provided an important tool to gain access to parton
distributions in hadrons. Moreover, direct photons, i.e.
photons not from hadronic decay, can be also a powerful
probe of the initial state of matter created in heavy ion
collisions, since they interact with the medium only elec-
tromagnetically and therefore provide a baseline for the
interpretation of jet-quenching models.
In the parton model, the Feynman diagrams for par-
tonic subprocesses that are present in Drell-Yan (DY)
lepton pair production and in inclusive direct photon pro-
duction are different, and the connection between both
production mechanisms within a unique approximation
scheme is not obvious. Since in the target rest frame the
DY process looks like bremsstrahlung of a virtual photon
decaying into a lepton pair, we will show that the color
dipole formalism defined in this frame is well suited to de-
scribe both production processes in a unified framework
free of parameters. As an illustrative example, we study
dilepton spectra in 800-GeV pp collisions from the E866
experiment [1], inclusive direct-photon spectra in pp at√
s = 200 GeV from the PHENIX experiment [2], and pp̄
collisions at
s = 1.8 TeV from the CDF experiment [3].
There have been already some attempts to describe
the DY transverse momentum distribution in the color
dipole approach [4], but unfortunately the experimen-
tal data that was used for comparison is not fully kine-
matically in the range of validity of the model. Here
we confront the dipole approach with experimental data
that is in a region where the model is supposed to be
at work. Furthermore, we will also study the inclusion
of both non-perturbative primordial transverse momenta
and DGLAP evolution.
Despite many years of intense studies, a satisfactory
description of all existing inclusive direct photon pro-
duction data in hadronic collision, based on perturba-
tive QCD (pQCD) calculations, seems to be evasive [5].
This letter is an alternative attempt. We shows that the
color dipole approach can successfully describe inclusive
photon production in hadron-hadron collisions.
II. COLOR DIPOLE FORMALISM
The color dipole formalism, developed in [6] for the
case of the total and diffractive cross sections, can be
also applied to radiation [7]. Although in the process
of electromagnetic bremsstrahlung by a quark no dipole
participates, the cross section can be expressed via the
more elementary cross section σqq̄ of interaction of a q̄q
dipole. Nevertheless, this is a fake, or effective dipole.
Similar to a real dipole, where color screening is provided
by interactions with either the quark or the antiquark, in
the case of radiation the two amplitudes for radiation
prior or after the interaction screen each other, leading
to cancellation of the infra-red divergences [7].
The transverse momentum pT distribution of photon
bremsstrahlung in quark-nucleon interactions, integrated
over the final quark transverse momentum, was derived
in [8] in terms of the dipole formalism,
dσqN (q → qγ)
d(lnα)d2~pT
(2π)2
d2~r1d
2~r2e
i~pT .(~r1−~r2)
× φ⋆T,Lγq (α,~r1)φT,Lγq (α,~r2)Σγ(x,~r1, ~r2, α),
where
Σγ(x,~r1, ~r2, α) =
{σqq̄(x, αr1) + σqq̄(x, αr2)}
σqq̄(x, α(~r1 − ~r2)). (2)
and ~r1 and ~r2 are the quark-photon transverse separa-
tions in the two radiation amplitudes contributing to the
cross section, Eq. (1), which correspondingly contains
double-Fourier transformations. The parameter α is the
http://arxiv.org/abs/0704.0642v2
relative fraction of the quark momentum carried by the
photon, and is the same in both amplitudes, since the
interaction does not change the sharing of longitudinal
momentum. The transverse displacement between the
initial and final quarks is αr1 and αr2 respectively. Since
the amplitude of quark interaction has a phase factor
exp(i~b · ~pT ), where ~b is the impact parameter of collision,
the transverse displacement between the initial and final
quarks leads to the color screening factor 1−exp(iα~r·~pT ).
In Eq. (1) T stands for transverse and L for longitudinal
photons. The energy dependence of the dipole cross sec-
tion, which comes via the variable x = 2p1 · q/s, where
p1 is the projectile four-momentum and q is the four-
momentum of the dilepton, is generated by additional
radiation of gluons which can be resummed in the lead-
ing ln(1/x) approximation.
In Eq. (1) the light-cone (LC) wavefunction of the pro-
jectile quark γq fluctuation has been decomposed into
transverse φTγq(α,~r) and longitudinal φ
γq(α,~r) compo-
nents, and an average over the initial quark polarization
and sum over all final polarization states of quark and
photon is performed. These LC wavefunction compo-
nents φT,Lγq (α,~r) can be represented at the lowest order
φT⋆γq (α,~r1)φ
γq(α,~r2) =
4K0(ǫr1)K0(ǫr2)
[1 + (1− α)2]ǫ2~r1.~r2
K1(ǫr1)K1(ǫr2),
φL⋆γq (α,~r1)φ
γq(α,~r2) =
M2(1 − α)2
×K0(ǫr1)K0(ǫr2), (3)
in terms of transverse separation ~r between photon γ
and quark q and the relative fraction α of the quark mo-
mentum carried by the photon. Here K0,1(x) denotes the
modified Bessel function of the second kind. We have also
introduced the auxiliary variable ǫ2 = α2m2q+(1−α)M2,
where M denotes the mass of dilepton and mq is an ef-
fective quark mass which can be conceived as a cutoff
regularization. This quark mass has less influence on
dilepton production in pp collisions, albeit it will be a
numerically important parameter for direct photon pro-
duction, when M = 0. In general the quark mass mq
should not be considered an extra parameter. Indeed,
depending on the kinematical variable M , the Feynman
variable xF and the square of the center of mass energy
of the colliding hadrons s, there always exists a range of
values ofmq where the result does not depend on the spe-
cific mq value. For direct photon M = 0, mq cannot be
zero since the wave function becomes divergent. In this
paper, as in Refs. [8, 9], we take mq = 0.2 GeV for both
dilepton and direct photon production. Notice also that
mq is a more important parameter in proton-nucleus col-
lisions where a value of mq = 0.2 GeV is needed in order
to describe the nuclear shadowing effect [10].
In order to obtain the hadron cross section from the
elementary partonic cross section Eq. (1), one should
sum up the contributions from quarks and antiquarks
weighted with the corresponding parton distribution
functions (PDFs) [8, 9],
dσDY (pp → γ⋆X)
dM2dxF d2~pT
x1 + x2
Z2f{qf (
) + q̄f (
qN (q → qγ⋆)
d(lnα)d2~pT
3πM2(x1 + x2)
dσqN (q → qγ⋆)
d(lnα)d2~pT
dσγ(pp → γX)
dxF d2~pT
x1 + x2
qN (q → qγ)
d(lnα)d2~pT
. (5)
The PDFs of the projectile enter in a combination which
can be written in terms of proton structure function
2 . Notice that with our definitions the fractional quark
charge Zf is not included in the LC wave function of
Eq. (3), and that the factor αem
in Eq. (4) accounts
for the decay of the photon into the lepton pair. We
use the standard notation for the kinematical variables,
x1 = (
x2F + 4τ+xF )/2 denotes the momentum fraction
that the photon carries away from the projectile hadron
in the target frame, we define x2 = x1−xF , xF = 2pL/
is the Feynman variable and τ =
M2+p2
, where pL and
pT denote the longitudinal and transverse momentum
components of the photon in the hadron-hadron center
of mass frame, s is the center of mass energy squared of
the colliding protons and M is the dilepton mass. We
also need to identify the scale Q entering in the pro-
ton structure function in Eq. (5), and relate the energy
scale x of the dipole cross section entered in Eq. (2) to
measurable variables. From our previous definition, and
following previous works [9, 11] we have that x = x2.
At zero transverse momentum, the dominant term in the
LC wavefunction Eq. (3) is the one that contains the
modified Bessel function K1(ǫr). This function decay ex-
ponentially at large values of the argument, so that the
mean distances which numerically contribute are of order
1/ǫ. On the other hand, the minimal value of α is x1, and
therefore the virtuality Q2 which enters into the problem
at zero transverse momentum is ∼ (1−x1)M2. Thus the
hard scale at which the projectile parton distribution is
probed turns out to be Q2 = p2T + (1 − x1)M2. Notice
that in the previous studies, M2 [9] and (1− x1)M2 [11]
were used for the scale Q2. Nevertheless, these different
choices for Q2 bring less than about a 20% effect at small
x2 values.
The dipole cross section is theoretically unknown, al-
though several parametrizations have been proposed in
the literature. For our purposes, here we consider two
parametrizations, the saturation model of Golec-Biernat
and Wüsthoff (GBW) [12] and the modified GBW cou-
pled to DGLAP evolution (GBW-DGLAP) [13].
A. GBW model parametrization
In the GBW model [12] the dipole cross section is
parametrized as,
σqq̄(x,~r) = σ0
1− e−r
, (6)
where the parameters, fitted to DIS HERA data at small
x, are given by σ0 = 23.03 mb, R0 = 0.4fm ×(x/x0)0.144,
where x0 = 3.04 × 10−4. This parametrization gives a
quite good description of DIS data at x < 0.01. A salient
feature of the model is that for decreasing x, the dipole
cross section saturates for smaller dipole sizes, and that
at small r, as perturbative QCD implies, σ ∼ r2 vanishes.
This is the so-called color transparency phenomenon [6].
One of the obvious shortcoming of the GBW model is
that it does not match with QCD evolution (DGLAP) at
large values of Q2. This failure can be clearly seen in the
energy dependence of σ
tot for Q
2 > 20 GeV2, where the
the model predictions are below the data [12, 13].
B. GBW couple to DGLAP equation and dipole
evolution
A modification of the GWB dipole parametrization
model, Eq. (6), was proposed in Ref. [13]
σqq̄(x,~r) = σ0
1− exp
2r2αs(µ
2)xg(x, µ2)
where the scale µ2 is related to the dipole size by
+ µ20. (8)
Here the gluon density g(x, µ2) is evolved to the scale
µ2 with the leading order (LO) DGLAP equation [14].
Moreover, the quark contribution to the gluon density is
neglected in the small x limit, and therefore
∂xg(x, µ2)
∂ lnµ2
dzPgg(z)
, µ2). (9)
where Pgg(z) and αs(µ
2) denote the QCD splitting func-
tion and coupling, respectively. The initial gluon density
is taken at the scale Q20 = 1GeV
2 in the form
xg(x, µ2) = Agx
−λg (1− x)5.6, (10)
where the parameters C = 0.26, µ20 = 0.52GeV
2, Ag =
1.20 and λg = 0.28 are fixed from a fit to the DIS data for
x < 0.01 and in a range of Q2 between 0.1 and 500 GeV2
[13]. We use the LO formula for the running of the strong
coupling αs, with three flavors and for ΛQCD = 0.2 GeV.
The dipole size determines the evolution scale µ2 through
Eq. (8). The evolution of the gluon density is performed
numerically for every dipole size r during the integration
of Eq. (1). Therefore, the DGLAP equation is now cou-
pled to our master equations (4,5). It is important to
stress that the GBW-DGLAP model preserves the suc-
cesses of the GBW model at low Q2 and its saturation
property for large dipole sizes, while incorporating the
evolution of the gluon density by modifying the small-r
behaviour of the dipole size.
The proton structure function in Eqs. (4,5) is
parametrized as
2 (x,Q) = A(x)
[ ln(Q2/Λ2)
ln(Q20/Λ
]B(x)
, (11)
with Q20 = 20GeV
2, Λ = 0.25 GeV, and the functions
A(x), B(x) and C(x) are parametrized in terms of 17 pa-
rameters fitted to different experiments, and whose func-
tional forms can be found in the Appendix of Ref. [15].
This parametrization is only valid in the kinematic range
of the data sets which cover correlated regions in the
ranges 3.5× 10−5 < x < 0.85 and 0.2 < Q2 < 5000GeV2.
III. NUMERICAL RESULTS
Before we proceed to present the results in the color
dipole approach, some words regarding the validity of this
formulation are in order. Although both valence and sea
quarks in the projectile are taken into account through
the proton structure function Eqs. (4, 5), the color dipole
picture accounts only for Pomeron exchange from the tar-
get, while ignoring its valence content. In terms of Regge
phenomenology, this means that Reggeons are not taken
into account, and as a consequence, the dipole approach
predicts the same cross sections for both particle and
antiparticle induced DY reactions. Therefore, in princi-
ple this approach is well suited for high-energy processes,
i.e. small x2. The exact range of validity of the dipole
approach is of course not known a priori, but there is
evidence [9, 11] in its favor for values of x2 < 0.1. In our
case, however, we use a parametrization of the dipole
cross section fitted to DIS data for Bjorken-x < 0.01 and
for energy scales Q2 < 500. Given these restrictions, at
present there are not many data for DY cross section at
low x2. Notice also that some data are integrated over xF
and M , and are therefore contaminated by contributions
not included into the color dipole approach.
We compare the present approach to data for 800-GeV
pp collisions from E886 [1], which are not integrated over
xF and M , and correspond to the lowest x2 values, i.e.
lightest M and highest xF . We selected a xF bin where
0.55 < xF < 0.8, with an average value 〈xF 〉 = 0.63.
Within this bin we selected two bins with the lightest
average values for M , one for 4.20 < Mµ+µ− < 5.20,
with an average value of 〈Mµ+µ−〉 = 4.80 GeV, and the
other for 5.20 < Mµ+µ− < 6.20, with an average value
〈Mµ+µ−〉 = 5.70 GeV. The experimental data are plot-
ted, with errors, in Figs. 1 and 2. In our calculations we
have taken the experimental average values for xF and
0 1 2 3 4 5
(GeV)
E866 Data
GBW-DGLAP
GBW-DGLAP-Primordial
=0.63, M=5.70 GeV
FIG. 1: The Dilepton spectrum in 800-GeV pp collisions at
xF = 0.63 and M = 5.7GeV. We show the result of the GBW
dipole model (dashed line) and the GBW-DGLAP model
(dotted line). We also show the result when a constant pri-
mordial momentum 〈k20〉 = 0.4GeV2 is incorporated within
the GBW-DGLAP dipole model (solid line). Experimental
data are from Ref. [1]. The E866 error bars are the linear
sum of the statistical and systematic uncertainties. An addi-
tional ±6.5% uncertainty in the experimental data points due
to the normalization is also common to all points.
0 1 2 3 4 5
(GeV)
E866 Data
GBW-DGLAP
GBW-DGLAP-Primordial
=0.63, M=4.8 GeV
FIG. 2: The same as Fig. 1, except for M = 4.8.
In Figs. 1, 2 we show the result obtained by the dipole
approach, for both the GBW and the GBW-DGLAP
dipole models. At low transverse momentum pT < 2 GeV
both model predictions are almost identical, but at higher
pT the dipole parametrization improved by DGLAP evo-
lution bends down towards the experimental points im-
proving the result. This is more obvious for higher val-
ues of M . Notice that for the case of a smaller value of
x2 with a lighter M , Fig. 2, where the dipole approach
is better suited, the GBW model without inclusion of
the DGLAP evolution already provides a good descrip-
tion of the data. We stress that the theoretical curves
in Figs. 1, 2 are the results of a parameter free calcula-
tion. As we already pointed out, varying the quark mass
mq leaves the numerical results almost unaffected. Notice
also that in contrast to the LO parton model, noK-factor
was introduced, since the dipole parametrization fitted to
DIS data already includes contributions from higher or-
der perturbative corrections as well as non-perturbative
effects contained in DIS data.
One of the data point which surprisingly is left out
from our theoretical computation curves for both values
of M , is the one at the lowest pT . In the dipole ap-
proach the DY cross section is finite at pT = 0 due to the
saturation of the dipole cross section, which is in strik-
ing contrast to the LO pQCD correction to the parton
model, where one needs to resume the large logarithms
Ln(p2T /M
2) from soft gluon radiation in order to obtain a
physically sensible results at pT = 0 [16]. One of the pos-
sible reasons behind the lack of agreement between our
result and the experimental data at p → 0 may be due
to a soft non-perturbative primordial transverse momen-
tum distribution of the partons in the colliding protons.
Such a primordial transverse momentum may have var-
ious non-perturbative origins, e. g. finite size effects of
the hadron, instanton effects, pion-cloud contributions.
Moreover, in the parton model it has been shown that
even within the next-to-leading order (NLO) pQCD cor-
rection, experimental data of heavy quark pair produc-
tion [17], direct photon production [18] and DY lepton
pair production [19] can be only described if an aver-
age primordial momentum as large as 1 GeV is included
(see also Ref. [20]). Such a large value for the initial
transverse momentum strongly indicates its perturbative
origin in the parton model, and in principle must have
been already included in the pQCD correction. There-
fore, in the pQCD approach, it is still an open question
how to separate what is truly intrinsic and what is pQCD
generated transverse momentum. However, in the dipole
approach all perturbative and non-perturbative contribu-
tions, apart from the finite-size effect of hadrons, are al-
ready encoded into the cross section via fitting the dipole
parameters to DIS data. Therefore, we expect that in the
dipole approach the primordial momentum should have
a purely non-perturbative origin, and to be considerably
less than in the parton model. One may introduce an
intrinsic momentum contribution in the following factor-
ized form
F(pT ) →
d2kTF(pT − kT )GN (kT ), (12)
where the function F denotes the cross section defined
in Eqs. (4,5). We assume that the initial pT distribution
GN (pT ) has a Gaussian form,
GN (kT ) =
π〈k2T 〉N
〉N , (13)
where 〈k2T 〉N is the square of the two-dimensional width
of the pT -distribution for an incoming quark, and also
that 〈k2T 〉N is a constant independent of the hard scale
Q, since the pQCD radiation-generating transverse mo-
menta are already taken into account in our approach.
The differential cross section convoluted with the pri-
mordial momentum distribution in the GBW-DGLAP
dipole model are shown in Figs. 1 and 2 with curves de-
noted with GBW-DGLAP-Primordial. A value around
〈k2T 〉N = 0.4 GeV
2 can describe the experimental points
at low pT for both sets of data plotted in Figs. 1 and 2.
This value, as we expected, is lower than the primordial
momentum which has been used in the parton model.
The experimental data points for pT → 0 should be
taken with some precaution, since there exists some dis-
agreement between different experiments for DY lep-
ton pair production at low pT . Indeed, although the
E772 and E866 measurements [1] have good agreement
among them over a wide range of values, they disagree
at pT → 0. Therefore, the discrepancy between our the-
oretical results and experimental data at pT → 0 might
be just in fact an artifact of the experiments.
Next we calculate the inclusive direct photon spectra
within the same framework. For direct photon we have
M = 0, and we assume again a quark mass mq = 0.2
GeV. As illustrative examples we compare our results
with the PHENIX and CDF experiments. Notice that di-
rect photon problem (withM = 0), compared to the mas-
sive virtual photon case (with M as big as ∼ 5 GeV), is
numerically more involved since the integrand in Eq. (1)
is divergent when mq → 0.
In Fig. 3 we show the differential cross section ob-
tained from the GBW and the GBW-DGLAP dipole
models at midrapidities, for pp collisions at RICH en-
ergies
s = 200 GeV. The experimental data are from
the PHENIX measurements for inclusive direct photon
production at y = 0 [2]. We have also checked out
that the effect of the incorporation of the same trans-
verse primordial momentum 〈k2T 〉N = 0.4 GeV
2 which
can describe the dilepton spectra at low pT , will be in
this case too small to improve the results at the range
of pT of the experimental data. Without a physically
sound guiding principle, however, the introduction of a
higher value of intrinsic momentum is somehow unsat-
isfactory and will not be further discussed here. Notice
also that, in contrast to the parton model, we have not
included any photon fragmentation function [21, 22, 23]
for computing the cross section, since the dipole formula-
tion already incorporates all perturbative (via Pomeron
exchange) and non-perturbative radiation contributions.
It has been shown that the NLO pQCD prediction [21, 23]
] RHIC data
GBW-DGLAP
0 2 4 6 8 10 12 14 16 18 20 22 24
 (GeV)
FIG. 3: Inclusive direct photon spectra obtained from the
GBW and the GBW-DGLAP dipole models for midrapidity
η = 0 at RHIC energy
s = 200 GeV. Experimental data are
from Ref. [2]. In the down panel we use the GBW-DGLAP
dipole model result for the theory. The error bars are the
linear sum of the statistical and systematic uncertainties.
are also consistent with the RHIC data within the uncer-
tainties [2].
In Fig. 4 we show the dipole models predictions for in-
clusive prompt-photon production at midrapidities, and
for CDF energies
s = 1.8 TeV. The experimental points
are taken from CDF data for inclusive isolated-photon,
averaged over |η| < 0.9 [3]. At lower transverse momen-
tum pT < 30 GeV the GBW dipole model can reproduce
rather fairly the experimental data, and at higher pT val-
ues DGLAP evolution significantly improves the results.
In the collider experiments at the Tevatron, in order to
reject the overwhelming background of secondary pho-
tons which come from the decays of pions, isolation cuts
are imposed [3]. These cuts affect the direct-photon cross
section, in particular by reducing the fragmentation ef-
fects. Isolation conditions are not imposed in our cal-
culation, although the experimental data is for isolated
photon. However, it has been shown that the cross sec-
tion does not vary by more than 10% under CDF isola-
tion conditions and kinematics [24]. Therefore, the main
source of uncertainty in our approach is due to the fact
that the experimental points are averaged over rapidity
and contaminated by Reggeon contributions which are
ignored in the dipole approach. One should also notice
that the parametrizations of the dipole cross section and
proton structure function employed in our computation
have been fitted to data at considerably lower Q2 values
(see previous section). The NLO pQCD calculation for
direct photon production at the Tevatron energy was per-
formed in Ref. [25]. New independent measurement of di-
] CDF data
GBW-DGLAP
NLO QCD 
0 20 40 60 80 100 120
 (GeV)
CTEQ5M, µ = p
FIG. 4: Inclusive direct photon spectra obtained from the
GBW and GBW-DGLAP dipole models for midrapidity at
CDF energy
s = 1.8 TeV. Experimental data are for inclu-
sive isolated photon from CDF experiment for pp̄ collision at
CDF energy and |η| < 0.9 [3]. The NLO QCD curve is from
the authors of reference [25] (given in table 3 of Ref. [26])
and use the CTEQ5M parton distribution functions with the
all scales set to the pT . In the down panel we use the GBW-
DGLAP dipole model result for the theory. The error bars are
the linear sum of the statistical and systematic uncertainties.
rect photon at the Tevatron energy which is in agreement
with previous CDF measurement [3], provided further ev-
idence that the shape of the cross section as function of
pT cannot be fully described by the available NLO pQCD
computation [26].
In this letter, we showed that both direct photon pro-
duction and DY dilepton pair production processes can
be described within the same color dipole approach with-
out any free parameters. In contrast to the parton model,
in the dipole approach there is no ambiguity in defin-
ing the intrinsic transverse momentum. Such a purely
non-perturbative primordial momentum improves the re-
sults in the case of dilepton pair production, but does not
play a significant role for direct photon production at the
given experimental range of pT . We also showed that the
color dipole formulation coupled to the DGLAP evolution
provides a better description of data at large transverse
momentum compared to the GBW dipole model.
Acknowledgments
The authors would like to thank T. Isobe for provid-
ing the experimental data in Ref. [2]. This work was
supported in part by Fondecyt (Chile) grants 1070517
and 1050519 and by DFG (Germany) grant PI182/3-1.
[1] J. C. Webb, FERMILAB-THESIS-2002-56,
hep-ex/0301031.
[2] PHENIX Collaboration, Phys. Rev. Lett. 98, 012002
(2007).
[3] CDF Collaboration, Phys. Rev. Lett. 73, 2662 (1994);
74,1891 (1995).
[4] M. A. Betemps, M. B. Gay Ducati, M. V. T. Machado,
J. Raufeisen, Phys. Rev. D67, 114008 (2003).
[5] P. Aurenche, M. Fontannaz, J. Ph. Guillet, B. Kniehl, E.
Pilon and M. Werlen, Eur. Phys. J. C9, 107 (1999).
[6] A. B. Zamolodchikov, B. Z. Kopeliovich and L. I.
Lapidus, JETP Lett. 33, 595 (1981).
[7] B.Z. Kopeliovich, proc. of the workshop Hirschegg
’95: Dynamical Properties of Hadrons in Nuclear Mat-
ter, Hirschegg January 16-21, 1995, ed. by H. Feld-
meyer and W. Nörenberg, Darmstadt, 1995, p. 102
(hep-ph/9609385).
[8] B. Z. Kopeliovich, A. Schaefer and A. V. Tarasov, Phys.
Rev. C59, 1609 (1999).
[9] B. Z. Kopeliovich, J. Raufeisen and A. V. Tarasov, Phys.
Lett. B503, 91 (2001).
[10] B. Z. Kopeliovich, J. Raufeisen and A. V. Tarasov, Phys.
Rev. C62, 035204 (2000).
[11] J. Raufeisen, J.-C. Peng and G. C. Nayak, Phys. Rev.
D66, 034024 (2002).
[12] K. Golec-Biernat and M. Wusthoff, Phys. Rev. D59,
014017 (1999); D60, 114023 (1999).
[13] J. Bartels, K. Golec-Biernat and H. Kowalski, Phys. Rev.
D66, 014001 (2002).
[14] M. A. J. Botje, QCDNUM16: A fast QCD evolution,
ZEUS Note 97-006, 1997.
[15] SMC Collaboration, Phys. Rev. D58, 112001 (1998).
[16] P. Chiappetta and H. J. Pirner, Nucl. Phys. B291, 765
(1987).
[17] M. N. Mangano, P. Nason, and G. Ridolfi, Nucl. Phys.
B373, 295 (1992).
[18] Fermilab E706 Collaboration, Phys. Rev. Lett. 81, 2642
(1998); L. Apanasevich et al., Phys. Rev. D59, 074007
(1999).
[19] D. C. Hom et al., Phys. Rev. Lett. 37, 1374 (1976); D.
M. Kaplan et al., Phys. Rev. Lett.40, 435 (1978).
[20] X.-N. Wang, Phys. Rev. C61, 064910 (2000).
[21] L. E. Gordon and W. Vogelsang, Phys. Rev. D48, 3136
(1993); D50, 1901 (1994).
[22] E. L. Berger and J.-W. Qiu, Phys. Rev. D44, 2002
(1991).
[23] P. Aurenche et al., Phys. Lett. B140, 87 (1984); Nucl.
Phys. B297, 661 (1988); H. Baer et al., Phys. Rev. D42
61 (1990); Phys. Lett. B234 127 (1990).
[24] S. Catani, M. Fontannaz, J. Ph. Guillet and E. Pilon,
JHEP 0205, 028 (2002).
[25] M. Gluck, L. E. Gordon, E. Reya and W. Vogelsang,
http://arxiv.org/abs/hep-ex/0301031
http://arxiv.org/abs/hep-ph/9609385
Phys. Rev. Lett. 73, 388 (1994).
[26] CDF Collaboration, Phys. Rev. D70, 074008 (2004).
ABSTRACT
  Drell-Yan dilepton pair production and inclusive direct photon production can
be described within a unified framework in the color dipole approach. The
inclusion of non-perturbative primordial transverse momenta and DGLAP evolution
is studied. We successfully describe data for dilepton spectra from 800-GeV pp
collisions, inclusive direct photon spectra for pp collisions at RHIC energies
$\sqrt{s}=200$ GeV, and for $p\bar{p}$ collisions at Tevatron energies
$\sqrt{s}=1.8$ TeV, in a formalism that is free from any extra parameters.

<|endoftext|><|startoftext|>
Submitted to The Astrophysical Journal
Preprint typeset using LATEX style emulateapj v. 08/22/09
MAPPING THE YOUNGEST GALAXIES TO REDSHIFT ONE1,2
Yuko Kakazu,
Lennox L. Cowie,
Esther M. Hu,
Submitted to The Astrophysical Journal
ABSTRACT
We describe the results of a narrow band search for ultra-strong emission line galaxies (USELs)
with EW(Hβ) ≥ 30 Å. 542 candidate galaxies are found in a half square degree survey using two
∼ 100Å filters centered at 8150Å and 9140Å with Subaru/SuprimeCam. Followup spectroscopy has
been obtained for randomly selected objects in the candidate sample with KeckII/DEIMOS and has
shown that they consist of [O III]λ5007, [O II]λ3727, and Hα selected strong-emission line galaxies at
intermediate redshifts (z < 1), and Lyα emitting galaxies at high-redshift (z >> 5). We determine
the Hβ luminosity functions and the star formation density of the USELs, which is 5-10% of the
value found from ultraviolet continuum objects at z = 0 − 1, suggesting that they correspond to
a major epoch in the galaxy formation process at these redshifts. Many of the USELs show the
temperature-sensitive [O III]λ4363 auroral lines and about a dozen have oxygen abundances satisfying
the criteria of eXtremely Metal Poor Galaxies (XMPGs). These XMPGs are the most distant known
today and our high yield rate of XMPGs suggests that narrowband method is powerful in finding
such populations. Moreover, the lowest metallicity measured in our sample is 12+log(O/H)=7.06
(6.78−7.44), which is close to the minimum metallicity found in local galaxies, though we need deeper
spectra to minimize the errors. HST/ACS images of several USELs exhibit widespread morphologies
from relatively compact high surface brightness objects to very diffuse low surface brightness ones. The
luminosities, metallicities and star formation rates of USELs are consistent with the strong emitters
being start-up intermediate mass galaxies which will evolve into more normal galaxies and suggest
that galaxies are still forming in relatively chemically pristine sites at z < 1.
Subject headings: cosmology: observations — galaxies: distances and redshifts — galaxies: abundances
— galaxies: evolution — galaxies: starburst
1. INTRODUCTION
The study of low-metallicity galaxies is of consider-
able interest for the clues that it can provide about the
first stages of galaxy formation and chemical enrichment.
We would also like to know if there are any genuinely
young galaxies undergoing their first episodes of star for-
mation at low redshifts. To date, the most metal-poor
systems studied have been the blue compact emission-
line galaxies found in the local Universe, with systems
such as I Zw 18 and SBS 0335-052W defining the low
metallicity boundary with measured 12+log (O/H) of
∼ 7.1 − 7.2 (Sargent & Searle 1970; Thuan & Izotov
2005; Izotov et al. 2005). More recently, the Sloan Digi-
tal Sky Survey (SDSS) has yielded additional extremely
metal-poor galaxies (XMPGs) (12+log (O/H) < 7.65 or
Z < Z⊙/12; Kniazev et al. 2003; Izotov et al. 2006a).
Despite enormous efforts, only a few dozen such XMPGs
are known, all at redshift z < 0.05 (e.g., Oey 2006; Izotov
2006b).
Historically, objective prism surveys have been used
to select emission-line galaxies for low-metallicity stud-
ies. (e.g. the Hamburg QSO Survey (Popescu et al.
1996) and its HSS sequel (Ugryumov et al. 1999) that
1 Based in part on data obtained at the Subaru Telescope, which
is operated by the National Astronomical Observatory of Japan.
2 Based in part on data obtained at the W. M. Keck Observa-
tory, which is operated as a scientific partnership among the the
California Institute of Technology, the University of California, and
NASA and was made possible by the generous financial support of
the W. M. Keck Foundation.
3 Institute for Astronomy, University of Hawaii, 2680 Woodlawn
Drive, Honolulu, HI 96822.
discovered HS 2134+0400 (Pustilnik et al. 2006) and
the Kitt Peak International Spectroscopy Survey (KISS;
Salzer et al. 2000; Melbourne & Salzer 2002)). The ad-
vantage of using the objective prism technique rather
than the continuum selection, employed with the SDSS
(Kniazev et al. 2003) or DEEP2 surveys (Hoyos et al.
2005), is that they have a higher efficiency and
provide a more uniform selection. By comparison,
continuum/broad-band surveys have a very low yield rate
(8 new XMPGs and 4 recovered XMPGS from an anal-
ysis of 250,000 spectra over ∼3000 deg2 for the SDSS
(Kniazev et al. 2003), since low-metallicity populations
in their first outburst have weak continuua and strong
emission lines.
An alternative method of discovering strong emission-
line, low-metallicity galaxies is to use narrowband
surveys. Strong emission-line galaxies have histori-
cally been picked up in high-z Lyman alpha searches
(e.g., Stockton & Ridgway 1998; Hu et al. 1998, 2004;
Stern et al. 2000; Tran et al. 2004; Ajiki et al. 2006)
where they have been considered contaminants. How-
ever, the low redshift emission line galaxies seen in these
surveys are of great interest in their own right as we shall
show in the present paper. While some spectroscopic
studies have been carried out for low-redshift galaxies se-
lected from narrowband surveys (e.g., Maier et al. 2006;
Ly et al. 2007), the small sample sizes have precluded
any detailed investigation of metallicity and identifica-
tion of a low-metallicity population.
The narrowband method probes to much deeper limits
than the objective prism surveys. This enables prob-
http://arxiv.org/abs/0704.0643v1
2 Kakazu et al.
ing star-forming populations out to near redshift z ∼ 1
where the cosmic star formation rates are considerably
higher. Furthermore the narrow-band emission-line se-
lection can allow us to assemble very large samples of
strong-emission line objects, with a clean selection of dif-
ferent line types for the construction of luminosity func-
tions.
Such a sample allows us to address such questions
as whether there are substantial populations of strong
star-forming galaxies with low metallicities among more
massive galaxies. There has been considerable contro-
versy about the interpretation of the low metallicity mea-
surements in the blue compact galaxy samples where
the ease with which gas may be ejected in these dwarf
galaxies has complicated the picture (e.g., Corbin et al.
2006) or, at least, resulted in identifying low metal-
licity systems which are not forming their first gen-
eration of stars. The identification of low metallicity
galaxies – at the level of the XMPGs – among mas-
sive galaxies can provides less ambiguous examples of
galaxies that are genuinely ‘young’ and caught in the
initial stages of star formation. Current efforts to iden-
tify low metallicity galaxies from continuum selected
surveys (e.g., Kobulnicky et al. 2003; Lilly et al. 2003;
Kobulnicky & Kewley 2004; Hoyos et al. 2005) have low-
metallicity thresholds that are higher than this – about
one-third solar (in O/H). With a narrow-band selec-
tion criterion much larger emission-line samples includ-
ing such low metallicity galaxies can be identified. With
these large samples it is also possible to determine
whether there is an observed lower metallicity threshold
for such galaxies, and to estimate what the contribution
of such strong star-formers might be at these epochs.
In the present work we use a number of deep, narrow-
band images obtained with the SuprimeCam mosaic
CCD camera (Miyazaki et al. 2002) on the Subaru 8.2-
m telescope to find a large sample of extreme emission-
line galaxies. We first (§2) outline the selection criteria
(magnitude and flux thresholds) for the target fields re-
sulting in a sample of 542 galaxies, which we call USELs
(Ultra-Strong Emission Liners). We then describe (§3)
the spectroscopic followups for 161 of these galaxies us-
ing multi-object masks with the DEIMOS spectrograph
(Faber et al. 2003) on the 10-m Keck II telescope. Sam-
ple spectra for each class of object are shown. Flux
calibration and equivalent width distributions are pre-
sented in §4, and the resulting measured line ratios are
discussed. In §5 luminosity functions are constructed and
star formation rates are estimated for the sample. These
galaxies are estimated to contribute roughly 10% to the
measured star-formation rate (without extinction correc-
tions) at this epoch. Analysis of the metallicities is given
in §6. Their morphologies and dynamical masses are dis-
cussed in §7 and a final summary discussion is given in
§8. We use a standard H0 = 70 km s
−1 Mpc−1, Ωb =
0.3, ΩΛ = 0.7 cosmology throughout.
2. THE NARROW BAND SELECTION
The emission line sample was chosen from a set of
narrow band images obtained with the SuprimeCam
camera on the Subaru 8.2-m telescope. The data were
obtained in a number of runs between 2001 and 2005
under photometric or near photometric conditions. We
used two ∼120 Å (FWHM) filters centered at nominal
5000 6000 7000 8000 9000 10000
WAVELENGTH (A)
Fig. 1.— Schematic illustration of the selection process and a
typical spectrum of the galaxies we find. The objects are chosen
based on their excess light in one of two narrow band filters at 8160
Å and 9140Å. The present case corresponds to an Hα emission
line object found in the 9140Å filter (shown with the narrow solid
curve). Also illustrated are the broad band V (dash-dot), R (solid),
I (dashed), and z′ (dotted) filters use to measure the continuum.
The spectrum shown corresponds to object 205 in Table 3 and is an
Hα emitter at z = 0.3983. The easily visible lines are the Balmer
series and the [O III] lines at λλ5007, 4959, and 4363Å.
wavelengths of 8150 Å and 9140 Å in regions of low
sky background between the OH bands. The nominal
specifications for the Subaru filters may be found at
http://www.naoj.org/Observing/Instruments/SCam/sensitivity.html
and are also described in Ajiki et al. (2003). We shall
refer to these filters as NB816 and NB912.
About 5 hour exposures were obtained with NB816 and
∼10 hour exposures with NB912 yielding 5 sigma limits
fainter than 26 mags in both bands. Deep exposures in
B, V , R, I and z′ were also taken for the fields. The data
were taken as a sequence of dithered background-limited
exposures and successive mosaic sequences were rotated
by 90 degrees. Only the central uniformly covered areas
of the images were used. Corresponding continuum expo-
sures were always obtained in the same observing run as
the narrowband exposures to avoid false identifications
of transients such as high-z supernovae, or Kuiper belt
objects, as emission-line candidates. A detailed descrip-
tion of the full reduction procedure for images is given in
Capak et al. (2004). All magnitudes are given in the AB
system (Oke 1990). These were measured in 3′′ diameter
apertures, and had average aperture corrections applied
to give total magnitudes.
The primary purpose of the program was to study Lyα
emitters at redshifts of z ∼ 5.7 and z ∼ 6.6 (Hu et al.
2004, 2007) but the narrow band imaging is also ideal
for selecting lower redshift emission line galaxies and it
is for this purpose that we use these data in the present
paper. The fields which we use and the area covered (ap-
proximately a half square degree in each bandpass) are
summarized in Table 1. These are distributed over the
sky to deal with cosmic variance. We selected galaxies in
the narrowband NB816 filter using the Cousins I band
filter as a reference continuum bandpass and including
all galaxies with NB816 < 25 and I − NB816 greater
than 0.8. We selected galaxies in the NB912 filter with
the z filter as the reference continuum bandpass and in-
http://www.naoj.org/Observing/Instruments/ SCam/sensitivity.html
THE YOUNGEST GALAXIES 3
TABLE 1
Narrowband Survey Area Coverage
Field RA (J2000) Dec (J2000) (lII, bII) EB−V
a NB816 NB912
(arcmin2) (arcmin2)
SSA22 22:17:57.00 +00:14:54.5 ( 63.1,−44.1) 0.07 674 591
SSA22 new 22:18:24.67 +00:36:53.4 ( 63.6,−43.9) 0.06 278 278
A370 new 02:41:16.27 −01:34:25.1 (173.4,−53.3) 0.03 278 278
HDF-N 12:36:49.57 +62:12:54.0 (125.9,+54.8) 0.01 710 528
Total 1940 1675
Note. — An adjacent field to A370 new is a site of a gravitational lensing cluster at
z ∼ 0.375, and was omitted from the suvey.
a estimated using http://irsa.ipac.caltech.edu/applications/DUST/ based on
Schlegel et al. (1998)
20 21 22 23 24
N(AB)
20 21 22 23 24
N(AB)
WITH SPECTRA
Fig. 2.— Continuum – Narrow band magnitude versus narrow
band magitude for all objects with narrow band magnitude brighter
than 24. The diamonds show the narrowband excess emission mag-
nitude of the NB912 sample and the squares the NB816 sample.
Galaxies which would be included in an R < 24 continuum selected
sample are shown with solid symbols. The upper panel shows the
complete sample while the lower panel shows the subsample which
has been spectroscopically identified.
cluded all galaxies with NB912 < 25 and z − NB912
greater than 1. The selection process is illustrated for
a galaxy found in the NB912 filter in Figure 1. Both
selections correspond roughly to choosing objects with
emission lines with rest frame equivalent widths greater
than 100 Å. The exact equivalent width limit depends
on the precise position of the emission line in the filter
and the redshift of the galaxy which in turn depends on
which emission line is producing the excess light in the
narrow band.
The final USEL sample consists of 542 galaxies (267
in the NB816 filter and 275 in the NB912 filter). Tabu-
lated coordinates, multi-color magnitudes, and redshifts
(where measured) for these objects are summarized in
Table 2. Very few of these objects would be included
in continuum-selected spectroscopic samples. Figure 2
shows the narrow band excess as a function of narrow-
band (NAB) magnitude for objects with narrow band
magnitudes brighter than 24. The open symbols show
the present sample while the solid symbols show ob-
jects which would be included in an R < 24 continuum-
selected sample.
3. SPECTRA
Spectroscopic observations were obtained for 161
USELs from the sample using the Deep Extragalac-
tic Imaging Multi-Object Spectrograph (DEIMOS;
Faber et al. 2003) on Keck II in a series of runs between
2003 and 2006. The emission line objects were included
in masks designed to observe high-z Lyα candidates and,
as can be seen in the lower panel of Figure 2, constitute
an essentially random sample of the emission line galax-
The observations were primarily made with the G830
ℓ/mm grating blazed at 8640 Å and used 1′′ wide slitlets.
In this configuration, the resolution is 3.3 Å, which is
sufficient to distinguish the [O II]λ3727 doublet struc-
ture. This allows us to easily identify [O II]λ3727 emit-
ters where often the [O II]λ3727 doublet is the only emis-
sion feature. The spectra cover a wavelength range of
approximately 4000 Å and were centered at an average
wavelength of 7800 Å, though the exact wavelength range
for each spectrum depends on the slit position with re-
spect to the center of the mask along the dispersion di-
rection. The G830 grating used with the OG550 blocker
gives a throughput greater than 20% for most of this
range, and ∼ 28% at 8150 Å. The observations were
not generally taken at the parallactic angle, since this
was determined by the mask orientation, so considerable
care must be taken in measuring line fluxes as we dis-
cuss below. Each ∼ 1 hr exposure was broken into three
subsets, with the objects stepped along the slit by 1.5′′
in each direction. Some USELs were observed multiple
times, resulting in total exposure times for these galaxies
of 2−3 hours. The two-dimensional spectra were reduced
following the procedure described in Cowie et al. (1996)
and the final one-dimensional spectra were extracted us-
ing a profile weighting based on the strongest emission
http://irsa.ipac.caltech.edu/applications/DUST/
4 Kakazu et al.
4000 4500 5000 5500 6000 6500 7000
REST WAVELENGTH (A)
Fig. 3.— Spectrum of an Hα emission galaxy selected in the NB912 filter. In the upper plot we have decreased the scale of the vertical
axis by a factor of 10 to show the continuum and the weaker lines. The more important emission line features are labelled and marked
with the dotted lines.
3500 4000 4500 5000 5500
REST WAVELENGTH (A)
Fig. 4.— Spectrum of an [O III] galaxy in the NB816 selected sample. The lower plot shows the relative strengths of the very strong
emission lines in the spectrum. In the upper plot we have decreased the scale of the vertical axis by a factor of 10 to show the continuum
and the weaker lines. The more important emission line features are labelled and marked with the dotted lines.
THE YOUNGEST GALAXIES 5
3500 4000 4500 5000 5500
REST WAVELENGTH (A)
Fig. 5.— Spectrum of an [O III] galaxy selected in the NB912 filter. The lower plot shows the relative strengths of the very strong
emission lines in the spectrum. In the upper plot we have decreased the scale of the vertical axis by a factor of 10 to show the continuum
and the weaker lines. The more important emission line features are labelled and marked with the dotted lines.
3000 3200 3400 3600 3800 4000 4200 4400
REST WAVELENGTH (A)
Fig. 6.— Spectrum of an [O II] galaxy selected in the NB816 filter. The plot shows the relative strengths of the very strong emission
lines in the spectrum. The more important emission line features are labelled and marked with the dotted lines.
6 Kakazu et al.
0.0 0.5 1.0 1.5
REDSHIFT
[OIII]
Hα [OII]
0.0 0.5 1.0 1.5
REDSHIFT
10-18
10-17
10-16
10-15
Fig. 7.— (a) Distribution of redshifts for the spectroscopically
identified sources. [O III] λ5007 emitters are the most common.
Since the focus of this paper is on intermediate-redshift (z . 1)
strong emission line galaxies, we did not plot high redshift Lyα
galaxies (z >> 5) in our NEO sample. High-z Lyα emitters are
discussed in Hu et al. (2004, 2007). (b) Flux versus redhift for the
spectroscopically identified sample. Squares are Hα, diamonds are
[O III] λ5007, and triangles are [O II] λ3727. The solid line shows
the flux limit corresponding to the narrow band magnitude limit of
N(AB)=25 for an emitter with very large equivalent width. Some
objects with lower equivalent widths fall below this limit.
line in the spectrum. A small number of the spectra
were obtained with the ZD600 ℓ/mm grating giving a
correspondingly lower resolution but a wider wavelength
coverage. These observations were centered at 7200 Å.
Essentially all of the emission line candidates which
were observed were identified, though two of the objects
in the NB816 sample are stars where the absorption line
structure mimics emission in the band. Sample spec-
tra are shown in Figures 3, 4, 5, and 6. The measured
redshifts are given in Tables 2 and 3. The narrow band
emission line selection produces a mixture of objects cor-
responding to Hα, [O III]λ5007, and [O II]λ3727 and, at
the faintest magnitudes (> 24), high redshift Lyα emit-
ters. The number of objects seen in each line and the
redshifts where they are found are shown in Figure 7.
The spectroscopically identified sample from both bands
contains 13 Hα, 92 [O III]λ5007, and 23 [O II]λ3727 emit-
ters. In the remainder of the paper we shall focus primar-
ily on the Hα and [O III]λ5007 selected galaxies which lie
between redshifts zero and one.
Since only 30% of the USELs are spectroscopically
identified we must apply a substantial incompleteness
correction in computing the line luminosity function
and the universal star formation histories. Because
the type mix may vary as a function of magnitude we
have adopted a magnitude dependent weighting for each
galaxy equal to the total number of galaxies at this mag-
nitude divided by the number of spectroscopically iden-
tified galaxies. However, the analysis is not particularly
sensitive to the adopted scheme since the fraction of iden-
tified galaxies is relatively constant with magnitude.
4. FLUX CALIBRATIONS
Generally our spectra were not obtained at the par-
allactic angle since this is determined by the DEIMOS
mask orientation. Therefore flux calibration using stan-
dard stars is problematic due to atmospheric refraction
effects, and special care must be taken for the flux cali-
bration. We thus employed three independent methods
for the flux calibration. In §4.1 we introduce “primary
fluxes” which are absolute fluxes of the emission lines
used to select the galaxies. Primary fluxes are computed
from the SuprimeCam broadband and narrowband mag-
nitudes. We use these fluxes to derive luminosity func-
tions of Hα and [O III]λ5007 emitters at z < 1 (§5.1).
In §4.2 we measure line fluxes from the spectra. Rela-
tive line fluxes can be measured from the spectra without
flux calibration as long as we restrict the line measur-
ments to over short wavelength range where the DEIMOS
response is essentially constant. For example, one can as-
sume the response of neighboring lines (e.g. [O III]λ4949
and [O III]λ5007) are the same and therefore one can
measure the flux ratio without calibration. For bright
galaxies, we can absolutely calibrate the fluxes by inte-
grating spectra and equating them to Subaru broadband
fluxes. These line fluxes derived from the spectra are
used as a check of the primary fluxes. We show that
the ratio of [O III]λ5007/[O III]λ4959 is indeed close to
1/3, and that the fluxes computed from the spectra are
highly consistent with the primary fluxes measured in
§4.1. In §4.3, we show Balmer flux ratios f(Hβ)/f(Hα) of
bright Hα emitters are close to the Case B conditions,
suggesting very little reddening.
Metallicity measurements by the direct method re-
quire four emission lines that are widely displaced
over the spectral wavelength range ([O III]λλ4959, 5007,
[O III]λ4363, and [O II]λ3727). To calibrate these lines,
we used neighboring Balmer lines with the assumption
of Case B conditions, and this is described in §4.4.
4.1. Narrow Band Fluxes − Primary Fluxes
For the emission lines used to select each galaxy we
may compute the equivalent widths and absolute fluxes
directly from the narrow band magnitudes (N) and
the corresponding continuum magnitudes (C) from our
SuprimeCam imaging data. For example, in the case of
the NB816 selected emission-line galaxies, N corresponds
to the NB816 magnitude and C is the I band magnitude.
We shall refer to the values calculated in this way as the
primary fluxes and use this quantity to compute the lu-
minosity functions for each emitter in §5.1.
Defining the quantity
R = 10−0.4∗(N−C)
THE YOUNGEST GALAXIES 7
1.5 2.0 2.5 3.0
LOG REST FRAME EW ([OIII]5007)
1.5 2.0 2.5 3.0
LOG REST FRAME EW (Hα)
Fig. 8.— (a) Distribution of the rest frame equivalent widths
determined from the narrow band magnitudes for the spectroscop-
ically identified [O III] λ5007 sources. (b) Distribution of the rest
frame equivalent widths for the Hα selected sample.
the observed frame equivalent width becomes
where φ is the narrow band filter response normalized
such that the integral over wavelength is unity and ∆λ is
the effective width of the continuum filter. The narrow
band filter is often assumed to be rectangular in which
case φ becomes 1/δλ where δλ is the width of the narrow
band but as can be seen from Figure 1 this is not a very
good approximation in the present case. For very high
equivalent width objects the denominator in this equa-
tion becomes uncertain unless the broad band data are
very deep, and this can result in a large scatter in the
very highest equivalent widths where the continuum is
poorly determined.
In the case of the [O III]λ5007 line we must include the
second member of the doublet which also lies within the
narrow band filter. We have computed these cases as-
suming the flux of the [O III]λ4959 line is 0.34 times that
of the [O III]λ5007 line. Then φ = φ1+0.34×φ2 where φ1
is the filter response at the redshifted 5007 Å wavelength
and φ2 is the filter response at redshifted 4959 Å.
The distribution of the rest frame equivalent widths for
the Hα and [O III]λ5007 samples is shown in Figure 8.
The [O III]λ5007 sample selects objects with rest frame
equivalent widths above about 100Å while the lower red-
shift Hα sample selects objects with rest frame equivalent
widths above about 150Å. Since the [O III]λ5007 lines are
also generally stronger than the Hα lines the [O III] selec-
tion chooses less extreme objects than the Hα selection
and will include a larger fraction of galaxies at the given
10-18 10-17 10-16 10-15
f(OIII 5007) erg cm-2 s-1
Fig. 9.— The ratio of the [O III] λ4959 line to [O III] λ5007. The
errors are plus and minus 1 sigma. The median ratio is 0.338 and
the scatter around this value is consistent with that expected from
the statistical errors.
redshift.
The high observed frame equivalent widths make the
line fluxes insensitive to the continuum determination
and these may simply be found from
f = A
10−0.4N − 10−0.4C
where A is the AB zeropoint at the narrow band wave-
length in units of erg cm−2 s−1 Å−1. The flux depends
on the filter response at the emission line wavelength and
correspondingly is most uncertain at the edges of the fil-
ters where this quantity changes rapidly. Primary fluxes
defined here are measured by using narrowband (N) and
broadband (C) magnitudes from Subaru imaging data
with the object redshift information from Keck spectra
for the filter response at the exact location of emission
line wavelength (φ). We use these primary fluxes to con-
struct the luminosity functions of Hα and [O III]λ5007
selected emitters as we discuss in §5.1.
4.2. Line Fluxes from the Spectra
For the short wavelength range where DEIMOS re-
sponse is essentially constant, we may also compute the
relative line fluxes from the spectra without calibration.
For each spectrum we fitted a standard set of lines. For
the stronger lines we used a full Gaussian fit together
with a linear fit to the continuum baseline. For weaker
lines we held the full width constant using the value mea-
sured in the stronger lines and set the central wavelength
to the nominal redshifted value. We also measured the
noise as a function of wavelength by fitting to random
positions in the spectrum and computing the dispersion
in the results.
These fits should provide accurate relative fluxes over
short wavelength intervals where the DEIMOS response
is similar, but may be expected to be poorer over longer
wavelength intervals where the true calibration can vary
from the adopted value. We tested the relative flux cali-
bration for neighboring lines and the noise measurement
by measuring the ratio of the [O III]λ4959/ [O III]λ5007
lines. This is expected to have a value of approximately
0.34. The ratio is shown as a function of the [O III]λ5007
8 Kakazu et al.
10-17 10-16 10-15
FLUX (erg cm-2 s-1)
Fig. 10.— Ratio of fluxes computed from the spectra and the
broad band magnitudes versus those from the narrow band mag-
nitudes. Hα lines are shown as solid boxes, [O III] λ5007 lines as
diamonds and [O II] λ3727 lines as crosses.
flux in Figure 9. The measured values scatter around the
expected value and the dipsersion is consistent with the
noise determination. This result supports our assump-
tion of [O III]λ4959/[O III]λ5007 = 0.34 in the primary
fluxes measurements described in §4.1.
The brighter objects may be absolutely calibrated us-
ing the continuum magnitudes obtained from our Sub-
aru data. We integrated the spectrum convolved with
each SuprimeCam filter response and set this equal to
the broad band flux to normalize the spectrum in each
of the filters. We then used the Gaussian fits to obtain
the spectral line fluxes for lines lying within that broad
band. This procedure only works for sources with well
determined continuum magnitudes (C < 24.5) where the
sky subtraction can be well determined in the spectra.
For these objects the spectrally determined fluxes are
shown versus the primary fluxes in Figure 10 where we
plot the ratio of the spectral to the primary flux versus
the primary line flux. The values agree extremely well
though the measured spectral line fluxes are on average
about 80 − 90% of the primary flux values. This may
reflect a selection bias in the choice of the objects or the
narrow band filters could be slightly narrower than the
nominal profiles.
4.3. Balmer Ratios
We now measured the Balmer ratios for the sample of
objects selected in Hα and where the continuum magni-
tudes were bright enough to absolutely flux calibrate the
spectra. The ratio of f(Hβ)/f(Hα) is shown in Figure 11.
The values average closely to the Case B ratio which is
shown as the solid line and at brighter fluxes the indi-
vidual values also closely match to this value suggesting
that the galaxies have very little reddening. However,
at fainter fluxes the scatter about the average value is
considerably higher than the statistical errors. At the
faintest fluxes it appears that the systematic uncertainty
can be as high as a multiplicative factor of two.
4.4. Final Flux Calibration for Metallicity Analysis
For the metallicity analysis we adopted the pro-
cedure of normalizing the longer wavelength lines
([O III]λλ4959, 5007, [O III]λ4363) to their nearest
10-18 10-17 10-16 10-15
f(Hα) erg cm-2 s-1
Fig. 11.— The ratio of the Hβ/Hα fluxes versus Hα flux. The
values average to the unreddened Balmer decrement shown by the
solid line but at the lower fluxes the scatter is larger than expected
from the statistical errors reflecting the calibration uncertainties for
the fainter sources. The figure shows the ten objects detected in the
Hα line with continuum magnitudes above 24.5 in the bandpasses
corresponding to the lines.
Balmer line to determine the unreddened fluxes. For ex-
ample, in the case of the Hα emission selected galaxies,
we can measure Hα absolute fluxes by the primary fluxes
method described in §4.1. We can then derive Hβ and Hγ
fluxes from Hα fluxes by assuming Case B recombination
[e.g., f(Hα) = 2.85 × f(Hβ), f(Hγ) = 0.469 × f(Hβ) at T
= 104 K and Ne ∼ 10
2−104cm−3; Osterbrock 1989]. As
Hβ and [O III]λλ4959, 5007 have very similar DEIMOS
response, the relative fluxes should remain the same with
or without the flux calibration and this can be expressed
by a simple equation:
f0(Hβ)
f0([O III]λ4959, λ5007)
f(Hβ)
f([O III]λ4959, λ5007)
where f0(Hβ) and f0([O III]λ4959, λ5007) are the flux
counts in the un-calibrated, reddened DEIMOS spec-
tra, while f(Hβ) and f([O III]λ4959, λ5007) are absolute,
unreddened fluxes. Since we know f(Hβ) from f(Hα)
with the Case B assumption and f0(Hβ)/f0([O III]λ4959,
λ5007) from the DEIMOS spectra, we can derive
f([O III]λ 4959, λ5007) using this simple formula. Sim-
ilary, we can absolutely calibrate [O III]λ4363 lines by
using its neighboring Balmer line, Hγ:
f0(Hγ)
f0([O III]λ4363)
f(Hγ)
f([O III]λ4363)
where f0(Hγ) and f0([O III]λ4363) are again the counts
in flux uncalibrated, reddened DEIMOS spectra, and
f(Hγ) and f([O III]λ4363) are absolute fluxes.
In the case of the [O III] selected emitters, we first de-
rive [O III]λλ4959, 5007 absolute fluxes using the primary
fluxes method (§4.1), and then use the above formula to
get absolute fluxes of Hβ, then Hγ (by the Case B ratio),
and finally [O III]λ4363.
This flux calibration technique using neighboring
Balmer line should work well for the [O III]λλ4959, 5007,
λ4363 lines and the [N II] lines which all lie close to
Balmer lines but may be slightly more approximate for
the [S II] lines. The higher order Balmer lines are too un-
certain to apply this procedure due to inadequate S/N of
THE YOUNGEST GALAXIES 9
40 41 42 43
L(Hα) (erg s-1)
1 z=0.24
40 41 42 43
L(Hα) (erg s-1)
1 z=0.39
Fig. 12.— The luminosity function of Hα at z = 0.24 (top panel)
and at z = 0.39 (bottom panel). In each case the open boxes
show the luminosity functions determined from the spectroscopic
sample alone while the solid boxes show the function corrected for
the incompleteness in the spectroscopic identification. The errors
are plus and minus 1 sigma and at the highest luminosity we show
the 1 sigma upper limit.
the lines, and we have used the continuum flux calibrated
values with no reddening for the [O II]λ3727 and [Ne III]
lines. These values will have correspondingly higher flux
uncertainties. Fortunately the [O II]λ3727 line is very
weak in most of the objects and the uncertainty has lit-
tle effect on the metallicity determinations. However,
ionization analyses based on the [Ne III] line should be
undertaken with caution.
5. STAR FORMATION HISTORY
5.1. Hα and [O III]λ5007 Luminosity Functions
Because of the high observed frame equivalent widths
the primary fluxes are insensitive to the continuum de-
termination. However, they do depend on the filter re-
sponse at the emission line wavelength so we first restrict
ourselves to redshifts where the nominal filter response
is greater than 70% of the peak value. This also has
the advantage of providing a uniform selection and we
assume the window function is flat over the defined red-
shift range. Now the volume is simply defined by the se-
lected redshift range for all objects above the minimum
luminosity which we take as corresponding to a flux of
1.5×10−17 erg cm−2 s−1 (Figure 7). The luminosity func-
tion is now obtained by dividing the number of objects
40 41 42 43
L(OIII) (erg s-1)
1 z=0.63
40 41 42 43
L(OIII) (erg s-1)
1 z=0.83
Fig. 13.— The luminosity function of [O III] λ5007 emitters at
z = 0.63 (top panel) and at z = 0.83 (bottom panel). In each case
the open boxes show the luminosity functions determined from the
spectroscopic sample alone while the solid boxes show the function
corrected for the incompleteness in the spectroscopic identification.
The errors are plus and minus 1 sigma and at the highest luminosity
we show the 1 sigma upper limit.
in each luminosity bin by the volume. The incomplete-
ness corrected luminosity function is obtained from the
sum of the weights in each luminosity bin divided by the
volume. The 1 sigma errors shown are calculated from
the Poissonian errors based on the number of objects in
the bin. The calculated Hα luminosity function is shown
for the two redshift ranges corresponding to the NB816
and NB912 selections in Figure 12 and the corresponding
[O III]λ5007 luminosity functions in Figure 13.
5.2. Star Formation Rates
The individual objects have Hα luminosities stretching
up to about 3×1041 erg s−1 where, at the higher redshifts,
we use the Hβ luminosity to infer the Hα value assuming
there is no reddening. For a steady formation this would
require a star formation rate of a few solar masses per
year if we adopt the Kennicutt (1998) conversion rate for
his Salpeter mass function.
Since the objects are more probably caused by star-
bursts the true star formation rate will depend on the
evolutionary history. However, the Hα luminosity den-
sity should give a reasonable estimate of the universal
star formation density of the objects provided only that
most of the ionizing photons are absorbed in the galax-
10 Kakazu et al.
Fig. 14.— The star formation history inferred from the Hα or
Hβ luminosity density as a function of redshift. The data from our
sample are shown in red. The open squares show the total rate
from the entire sample while the solid squares show the values for
objects with Hα rest frame equivalent widths in excess of 200Å or
Hβ rest frame equivalent widths in excess of 70Å. The diamonds
show the UV star formation rates (uncorrected for extinction) from
the ground based work of Wilson et al. (2002) and the triangles
the Galex results of Wyder et al. (2005) and Schiminovich et al.
(2005). Hα measurements from the literature as summarized in
Ly et al. (2007) are shown with filled circles. In all cases the errors
are ±1σ.
ies. We first formed the total Hα or Hβ luminosity den-
sity of the galaxies by summing over the incompleteness
weighted luminosities in each redshift interval. We only
included detected objects and did not attempt to extrap-
olate to lower luminosities but the result are not partic-
ularly sensitive to this because the luminosity functions
are relatively flat at the lower luminosities (Figures 13
and 14). We then converted these to star formation rates
with the Kennicutt (1998) conversion.
The results are shown in Figure 14. We first plot
the rate for the total samples at each redshifts shown
by the open squares. We have shown star formation
rates for UV continuum samples for comparison and the
present sample of strong emitters gives star formation
rates which are about 10% of the UV values over the
redshift interval. For comparison, we also show the star
formation rates from Hα selected samples reported in the
literature and summarized in Ly et al. (2007). In order
to better understand the evolution we have also restricted
our own sample to objects with rest frame equivalent
widths of Hα in excess of 200Å at low redshifts and Hβ
equivalent widths in excess of 70Å at the higher redshifts.
The star formation rates for this sample are shown with
the solid squares. This provides a more uniform selection
with redshift and gives a slower increase than the total
inferred star formation rate. For this sample the SFR is
evolving roughly as (1+z)3 broadly similar to other UV
and optically determined formation rates in this redshift
interval. These more restricted objects comprise about
5% relative to the UV star formation rates at the higher
redshifts.
6. GALAXY METALLICITIES
6.1. [O III] emitters
The spectra are of variable quality and, in order to
measure the metallicities, we need very high signal to
noise observations. It is also important that Balmer lines
are well detected since our flux calibrations are done us-
ing the neighboring Balmer lines (§4.4). We therefore
restricted ourselves to [O III] emitters whose Hβ fluxes
are detected above 15 sigma. Among 92 [O III] emitters
in our total spectroscopic sample, 8 such [O III] emitters
were chosen in the NB912 sample, and 10 in the NB816.
These emitters have Hγ detected above 4 sigma. Tables 4
and 5 give the oxygen line fluxes normalized by their Hβ
fluxes for the NB816 and NB912 selected emitters, re-
spectively. 1σ upper limits are listed when the measured
flux is below 1σ.
The most direct method to estimate the gas-phase
oxygen abundance is to use the electron temperature of
the HII region. Higher gas metallicity increases nebular
cooling, leading to lower electron temperatures. There-
fore electron temperature is a good indicator of the
gas metallicity. The electron temperature can be de-
rived from the ratio of the [O III]λ4363 auroral line to
[O III]λλ5007,4959. This procedure is often referred to
as the ‘direct’ method or Te method (e.g., Seaton 1975;
Pagel et al. 1992; Pilyugin & Thuan. 2005; Izotov et al.
2006c). One well-known problem with the direct method,
however, is that [O III]λ4363 is generally very weak even
in the low-metallicity galaxies. For higher metallicity
systems, the far-IR lines become the dominant coolant
and therefore the optical auroral line is essentially not
detectable. However, the majority of our sample exhibit
[O III]λ4363, already suggesting that these are metal-
deficient systems. To derive Te[O III] and the oxygen
abundances, we used the Pagel et al. (1992) calibrations
with the Te[O II]−Te[O III] relations derived by Garnett
(1992). The results are shown in Table 4 (for NB816
selected [O III] emitters) and Table 5 (for NB912 se-
lected [O III] emitters). The Izotov et al. (2006c) for-
mula, which was developed with the latest atomic data
and photoionization models, gives consistent abundances
within 0.1 dex. The [S II]λλ6717, 6731 lines that are usu-
ally used for the determination of the electron number
density, are beyond the Keck/DEIMOS wavelength cov-
erage for our [O III] emitters. Therefore we assumed ne
= 100 cm−3. However the choice of electron density has
little effect as electron temperature is insensitive to the
electron density; we get the same results even when we
use ne = 1000 cm
The 1σ upper and lower limits on Te[O III] and the
oxygen abundances are also shown in the tables. Because
the [O III]λ4363 flux is weak (< 4σ), the range of our
metallicity measurements are quite wide. However, of
18 [O III] emitters, even the upper metallicity limits on 6
emitters satisfy the definition of XMPGs [12 + log(O/H)
< 7.65; Kunth & Östlin 2000]. All our emitters, except
the ones that only have lower limits on metallicties such
as ID31 in Table 4, have very low metallicities – even the
upper metallicity limits are about 0.02 - 0.2 Z⊙. A few
emitters may even have metallicities that are comparable
to the currently known most metal-poor galaxies [I Zw
18 and SBS0335−052W; 12 + log(O/H) ∼ 7.1 − 7.2].
However our current metallicity errors are too large to
measure the baseline metallicity accurately and higher
S/N spectra will be necessary for this purpose.
Our discovery rate of XMPGs appers to be signif-
icantly higher than those of other surveys. Only 14
new XMPGs have been discovered from the analysis of
THE YOUNGEST GALAXIES 11
Fig. 15.— [O III]λ4959+λ5007/[O III]λ4363 versus
[O II]λ3727/[O III]λ5007 for the [O III] and Hα selected emitters
in Table 4 and 5. The electron temperature of the HII region is
also shown.
∼530,000 galaxy spectra in the SDSS and they are all
located nearby (z < 0.005) (SDSS DR3: Kniazev et al.
2003, SDSS DR4: Izotov et al. 2006a). At higher red-
shift, 17 metal-poor (7.8 < 12 + log(O/H)< 8.3) galax-
ies have been found at z ∼ 0.7 in the initial phase of the
DEEP2 survey of 3,900 galaxies and the Team Keck Red-
shift Survey of 1,536 galaxies (Hoyos et al. 2005). But
none of these galaxies satisfies the condition of XMPGs.
The present sample may be the first XMPGs at inter-
mediate redshift (z ∼ 1) whose oxygen abundances are
securely measured by the direct method. The narrow-
band method is very powerful for finding not only high-
redshift (z >> 5) galaxies, but also strong emission-line,
extremely metal-deficient galaxies at intermediate red-
shifts (z < 1).
Figure 15 shows the electron temperature sensi-
tive line ratio, [O III](λ4959+λ5007)/[OIII]λ4363 ver-
sus [O II]λ3727/[O III]λ5007. If we have an estimate of
the metallicity, as in the present case, we can use the
[O II]λ3727/[O III]λ5007 ratio to estimate the ionization
parameter q. The ionization parameter q is defined as
the number of hydrogen ionizing photons passing through
a unit area per second per unit hydrogen number den-
sity (Kewley & Dopita 2002). For the low metallicity
systems with strong [O III]λ4363 lines, we can see from
Figure 15 that [O II]λ3727 is very weak compared to
[O III]λ5007 with values ranging downwards from 0.3.
Assuming the metallicity is less than 0.2 Z⊙ this would
place a lower bound of q = 108 cm s−1 on the ioniza-
tion parameter based on the Kewley & Dopita (2002)
model. The higher metallicity objects have stronger
[O II]λ3727/[O III]λ5007 which, while in part due to the
metallicity, also requires these objects to have lower ion-
ization parameters suggesting we are seeing an evolution-
ary sequence.
6.2. Hα emitters
Among 13 Hα emitters in our spectroscopic sample,
only 3 NB912 selected emitters have Hβ fluxes ade-
quate (> 15σ) for the purpose of metallicity measure-
ments. Their Te[O III] and oxygen abundances were mea-
sured using the direct method described above, and are
shown in the Table 5 together with the data for the
[O III] emitters. The [O II]λ3727 line of ID266 is out-
side the Keck/DEIMOS wavelength coverage. In order
to derive an upper-limit on the metallicity, we assumed
[O II]λ3727/[O III]λ5007 = 1.0, which is the maximum
value in our sample (see, Fig. 15). All our Hα emitters
are metal poor (Zupper < 0.45 Z⊙), but none of them are
XMPGs.
6.3. Composite Spectrum
As can be seen in Figure 15 the objects with low
[O II]λ3727/[O III]λ5007 have relatively uniform values
of ([O III]λ5007+λ4959)/[OIII]λ4363 and similar metal-
licities. Given the relatively low signal to noise of the
individual spectra it therefore seems of interest to form a
composite spectrum. Such a spectrum will have weight-
ings on the lines which depend on the individual ion-
ization parameters and metallicity but will give a rough
estimate of the average metallicity and temperature of
the sample.
In Figure 16 we show the composite spectrum of
the 8 objects with [O II]λ3727/[OIII]λ5007 less than
0.1. The [O III]λ4363 is now strongly detected with a
value of 16.7 ± 2.1 or eight sigma. The mean temper-
ature is 19, 500 ± 1, 500 K and the average abundance
12+log(O/H)=7.5± 0.1 and the mean rest frame equiv-
alent width of Hβ is 57Å. The results are similar to the
values obtained by averaging the individual analyses of
the eight objects.
7. MORPHOLOGIES
The morphology of the USELs may give us a clue to the
mechanism of their high star formation activity (SFR ∼
a few M⊙/yr) and star formation history; what has trig-
gered the star formation −mergers, gas infall, or galactic
winds? High resolution HST/ACS images are available
for GOODS-North (GOODS-N) region (Giavalisco et al.
2004) which is one of our survey fields. There are 17
NB816 selected USELs in the GOODS-N, and 16 in the
NB912 sample. Figures 17 and 18 show thumbnails of
the NB816 and NB912 selected USELs in the GOODS-N
field (respectively) with each thumbnail 12.′′5 on a side.
The white dashes point to the largest galaxy. We used
continuum broadband images to show underlying stellar
populations: HST/ACS B, V, z′-band images were used
for NB816 emitters, and B, V, I-band for NB912 emit-
ters. High-redshift Lyα emitters (z >> 5) are very red
because of the continuum absorption below Lyα emission
caused by neutral hydrogen in the intergalactic medium.
We do not have spectra for most of the USELs in the
GOODS-N field yet, and none of the USELs in GOODS-
N have metallicity measurements either due to lack of
spectra or low spectral S/N. But we can qualitatively
argue that the USELs at intermediate redshift (z < 1)
exhibit widespread morphologies from relatively compact
high surface brightness objects to very diffuse low surface
brightness ones.
8. DISCUSSION
The present emitters differ from the local dwarf HII
galaxies in a large number of ways though they appear
much more similar to the XMPGs found in the SDSS
samples. They are much more luminous, have large
[O III]/[O II] ratios, and they are a relatively high frac-
tion (about 10% by number from Figure 13) of other
12 Kakazu et al.
3600 3800 4000 4200 4400 4600 4800 5000 5200
REST WAVELENGTH (A)
Fig. 16.— Composite spectrum of the 8 emitters with [O II]λ3727/[O III]λ5007 less than 0.1. The eight spectra have simply been summed
without weighting. The lower panel shows the stronger lines and the upper the continuum and the weaker lines. The stronger emission
lines are labelled and marked with the vertical dotted lines.
Fig. 19.— The oxygen abundance versus the absolute rest frame B magnitude for the [O III] selected samples (red squares). One sigma
errors are shown for the oxygen abundances and one sigma lower limits are shown with upward pointing arrows. The solid line shows the
(Skillman et al. 1989) relation for the nearby dwarf irregulars. As with the local XMPGs (filled circles, Kniazev et al. 2003; Kewley et al.
2007 and GRB hosts (open squares, Stanek et al. 2006; Kewley et al. 2007, the present galaxies are much more luminous at a given
metallicity than the local irregulars. Metal-poor luminous galaxies (but not XMPGs) at z ∼ 1 from Hoyos et al. 2005. are shown as
triangles. A few of our emitters may have oxygen abundances comparable to the most metal-deficient galaxis, I Zw 18 (12 + log O/H =
7.17± 0.01, Thuan & Izotov 2005) and SBS 0335-052W (12 + log O/H = 7.12 ± 0.03, Izotov et al. 2005).
THE YOUNGEST GALAXIES 13
galaxy populations at these redshifts. Taken together
this suggests that we are seeing much more massive
galaxies in the early stages of formation and, since we
need these to have relatively long lifetimes in order to
understand their frequency, that we are seeing objects
undergoing continuous star formation rather than single
starbursts. For the case of constant star formation with a
standard Salpeter IMF a forming galaxy can have equiv-
alent widths above 30Å for 109 yr (Leitherer et al. 1999)
which would allows us to understand the observed num-
ber density of strong emitters relative to the total galaxy
population.
In this type of model we would expect the metal-
licity to grow with time and that higher metallicity
galaxies would have higher continuum magnitudes and
lower equivalent widths in Hβ. We plot the absolute
rest frame B magnitudes versus the Oxygen abundance
in Figure 19. As with the case for the local XMPGs
found in the SDSS (filled circles, Kniazev et al. 2003;
Kewley et al. 2007) and the metal-poor galaxies (7.8 <
12 + log O/H < 8.3) at z ∼ 1 (triangles, Hoyos et al.
2005), the present emitters (red squares) are much more
luminous at a given metallicity than is found for the
local dIrrs (solid line, Skillman et al. 1989). Further-
more there does indeed seem to be a trend to higher
continuum luminosities at higher metallicity consistent
with ongoing star formation raising the luminosity. Re-
cently Kewley et al. (2007) reported the similarity be-
tween XMPGs and long duration GRB hosts ; they share
similar SFRs, extinction levels, and both lie in a similar
region of the luminosity-metallicity diagram. Our sam-
ple metal-deficient galaxies, which also lie in the region of
GRB hosts, may be additional support of the connection
between XMPGs and GRB hosts.
Of the six galaxies with continuum magnitudes
brighter than -18 all but one have metallicities or
lower limits which would place them near or above
12+log(O/H)=8 while the lower luminosity galaxies pri-
marily have 12+log(O/H) in the range 7.1 − 7.8. If we
assumed that the metallicity were a simple linear func-
tion of the age then the more luminous galaxies would be
several times older than the less luminous ones which is
not quite enough to account for the luminosity increase
(see e.g. Leitherer et al. 1999) suggesting that the en-
richment process may be more complex. However, the
accuracy of our current metallicity measurements may
be inadequate for measuring the lowest metallicities in
the sample and we could be underestimating the amount
of metallicity evolution.
The relation between the metallicity and the Hβ equiv-
alent width is shown in Figure 20. There clearly is a large
scatter in metallicity at all equivalent widths suggesting
that the star formation may be episodic with periods in
which bursts of star formation enhances the Hβ equiva-
lent widths in objects where previous star formation has
raised the metallicity.
With better spectra and more accurate metal estimates
we should be able to refine these tests and also determine
whether the number density of objects versus metallic-
ity is consistent with that expected in a simple growth
model. Perhaps even more importantly as larger spectro-
scopic samples are obtained we should be able to deter-
mine if there is a floor on the metallicity corresponding
10 100 1000
EW (H Beta)
Fig. 20.— The oxygen abundance versus the rest frame Hβ equiv-
alent width for the [O III] selected samples. One sigma errors are
shown for the oxygen abundances and one sigma lower limits are
shown with upward pointing arrows. The dotted line shows the
metallicity of 1Zw-18.
to the enrichment in the intergalactic gas. Within the er-
rors we have yet to find an object with lower metallicity
than the lowest metallicity local galaxies but this could
easily change as the observations are improved.
9. SUMMARY
We have described the results of spectrscopic observa-
tions of a narrowband selected sample of extreme emis-
sion line objects. The results show that such objects are
common in the z = 0 − 1 redshift interval and produce
about 5-10% of the star formation seen in ultraviolet or
emission line measurements at these redshifts. A very
large fraction of the strong emitters are detected in the
[O III]λ4363 line and oxygen abundances can be mea-
sured using the direct method. The abundance primar-
ily lie in the 12+log(O/H) range of 7−8 characteristic of
XMPGs.
The results suggest that we are seeing early chemical
enrichment of startup galaxies at these redshifts which
are forming in relatively chemically regions. As we de-
velop larger samples of these objects and improve the
accuracy of their abundance estimates we should be able
to test this model and to determine if there is a floor on
the metallicity of the galaxies.
We are indebted to the staff of the Subaru and Keck
observatories for their excellent assistance with the obser-
vations. We acknowledge Subaru/SuprimeCam support
astronomer, Hisanori Furusawa, for his help over several
years with the observations. Y. Kakazu acknowledges
invaluable advice from Lisa J. Kewley and Roberto Ter-
levich on metallicity measurements. This work was sup-
ported in part by NSF grants AST04-07374 (LLC) and
AST06-87850 (EMH), and Spitzer grant JPL 1289080
(EMH).
14 Kakazu et al.
REFERENCES
Ajiki, M. et al. 2003. AJ, 126, 2091, astro-ph/0307325
Ajiki, M., et al. 2006, PASJ, 58, 113, astro-ph/0511471
Capak, P. et al. 2004, AJ, 127, 180, astro-ph/0312635
Corbin, M. R., Vacca, W. D., Cid Fernandes, R., Hibbard, J. E.,
Somerville, R. S., & Windhorst, R. A. 2006, ApJ, 651, 861,
astro-ph/0607280
Cowie, L. L., Songaila, A., Hu, E. M., & Cohen, J. G. 1996, AJ,
112, 839, astro-ph/9606079
Faber, S. M. et al. 2003, Proc. SPIE, 4841, 1657
Garnett, D. R. 1992, AJ, 103, 1330
Giavalisco, M. et al. 2004, ApJ, 600, L93, astro-ph/0309105
Hoyos, C., Koo, D. C., Phillips, A. C., Willmer, C. N. A., &
Guhathakurta, P. 2005, ApJ, 635, L21, astro-ph/0511066
Hu, E. M., Cowie, L. L., & McMahon, R. G. 1998, ApJ, 502, L99,
astro-ph/9801003
Hu, E. M., Cowie, L. L., Capak, P., Hayashino, T., & Komiyama,
Y. 2004, AJ, 127, 563, astro-ph/0311528
Hu, E. M., Cowie, L. L., Kakazu, Y., & Capak. P. 2007, in
preparation
Izotov, Y. I., Thuan, T. X., & Guseva, N. G., 2005, ApJ, 415, 87,
astro-ph/0506498
Izotov, Y. I., Papaderos, P., Guseva, N. G., Fricke, K. J., &
Thuan, T. X. 2006a, A&A, 454, 137, astro-ph/0604234
Izotov, Y. I. 2006b, in ASP Conf. Ser. 353, Stellar Evolution at
Low Metallicity: Mass Loss, Explosions, Cosmology, ed.
H. J. G. L. M. Lamers, N. Langer, T. Nugis, & K. Annuk (San
Francisco: ASP), 349
Izotov, Y. I., Stasińska, G., Meynet, G., Guseva, N. G., & Thuan,
T. X. 2006c, A&A, 448, 955, astro-ph/0511644
Kennicutt, R. C., Jr. 1998, ARA&A, 36, 189, astro-ph/9807187
Kewley, L. J., & Dopita, M. A. 2002, ApJS, 142, 35,
astro-ph/0206495
Kewley, L. J., Brown W. R., Geller, M. J., Kenyon, S. J., &
Kurtz, M. J. 2007, ApJ, in press, astro-ph/0609246
Kniazev, A. Y., Grebel, E. K., Hao, L., Strauss, M. A.,
Brinkmann, J., & Fukugita, M. 2003, ApJ, 593, L73,
astro-ph/0307401
Kobulnicky, H. A., & Kewley, L. J. 2004, ApJ, 617, 240,
astro-ph/0408128
Kobulnicky, H. A., et al. 2003, ApJ, 599, 1006, astro-ph/0310346
Kunth, D., & Östlin, G. 2000, A&A Rev., 10, 1, astro-ph/9911094
Leitherer, C. et al. 1999, ApJS, 123, 3, astro-ph/9902334
Lilly, S. J., Carollo, C. M., & Stockton, A. N. 2003, ApJ, 597,
730, astro-ph/0307300
Ly, C. et al. 2007, ApJ, 657, 738, astro-ph/0610846
Maier, C., Lilly, S. J., Carollo, C. M., Meisenheimer, K.,
Hippelein, H., & Stockton, A. 2006, ApJ, 639, 858,
astro-ph/0511255
Melbourne, J., & Salzer, J. J. 2002, AJ, 123, 2302,
astro-ph/0202301
Miyazaki, S., et al. 2002, PASJ, 54, 833, astro-ph/0211006
Oey, M. S. 2006, in ASP Conf. Ser. 353, Stellar Evolution at Low
Metallicity: Mass Loss, Explosions, Cosmology, ed.
H. J. G. L. M. Lamers, N. Langer, T. Nugis, & K. Annuk (San
Francisco: ASP), 253
Oke, J. B. 1990, AJ, 99, 1621
Osterbrock, D. E. 1989, Astrophysics of gaseous nebulae and
active galactic nuclei (Mill Valley, CA: University Science
Books)
Pagel, B.E.J., Simonson, E. A., Terlevich, R. J., & Edmunds, M.
G. 1992, MNRAS, 255, 325
Papovich, C. et al. 2004, ApJ, 600, L111, astro-ph/0310888
Pilyugin, L. S. & Thuan, T. X. 2005, ApJ, 631, 231
Popescu, C. C., Hopp, U., Hagen, H. J., & Elsaesser, H. 1996,
A&AS, 116, 43, astro-ph/9510127
Pustilnik, S. A., Engels, D., Kniazev, A. Y., Pramskij, A. G.,
Ugryumov, A. V., & Hagen, H.-J. 2006, Astronomy Letters, 32,
228, astro-ph/0508255
Salzer, J. J., et al. 2000, AJ, 120, 80, astro-ph/0004074
Sargent, W. L. W., & Searle, L. 1970, ApJ, 162, L155
Schiminovich, D., et al. 2005, ApJ, 619, L47, astro-ph/0411424
Schlegel, D. J., Finkbeiner, D. P., & Davis, M. 1998, ApJ, 500,
525, astro-ph/9710327, used to estimate dust extinctions from
source location by the NASA/IPAC Infrared Science Archive
Dust Reddening and Extinction Tool
(http://irsa.ipac.caltech.edu/applications/DUST/)
Seaton M. J. 1975, MNRAS, 170, 475
Skillman, E. D., Kennicutt, R. C., & Hodge, P. W. 1989, ApJ,
347, 875
Stanek, K. Z. et al. 2006, Acta Astron., 56, 333, astro-ph/0604113
Stasińska, G., Schaerer, D., & Leitherer, C. 2002, Ap&SS, 281,
335 McMahon, R. G. 2003, MNRAS, 342, 439,
astro-ph/0302212
Stern, D., Bunker, A., Spinrad, H., & Dey, A. 2000, ApJ, 537, 73,
astro-ph/0002239
Stockton, A., & Ridgway, S. E. 1998, AJ, 115, 1340,
astro-ph/9801056
Thuan, T. X., Izotov, Y. I. 2005, ApJS, 161, 240,
astro-ph/0507209
Tran, K.-V. H., Lilly, S. J., Crampton, D., & Brodwin, M. 2004,
ApJ, 612, L89, astro-ph/0407648
Turnshek, D. A., Bohlin, R. C., Williamson II, R. L., Lupie,
O. L., Koornneef, J., & Morgan, D.H. 1990, AJ, 99, 1243
Ugryumov, A. V., et al. 1999, A&AS, 135, 511
Wilson, G., Cowie, L. L., Barger, A. J., & Burke, D. J. 2002, AJ,
124, 1258, astro-ph/0203168
Wyder, T. K., et al. 2005, ApJ, 619, L15, astro-ph/0411364
http://arxiv.org/abs/astro-ph/0307325
http://arxiv.org/abs/astro-ph/0511471
http://arxiv.org/abs/astro-ph/0312635
http://arxiv.org/abs/astro-ph/0607280
http://arxiv.org/abs/astro-ph/9606079
http://arxiv.org/abs/astro-ph/0309105
http://arxiv.org/abs/astro-ph/0511066
http://arxiv.org/abs/astro-ph/9801003
http://arxiv.org/abs/astro-ph/0311528
http://arxiv.org/abs/astro-ph/0506498
http://arxiv.org/abs/astro-ph/0604234
http://arxiv.org/abs/astro-ph/0511644
http://arxiv.org/abs/astro-ph/9807187
http://arxiv.org/abs/astro-ph/0206495
http://arxiv.org/abs/astro-ph/0609246
http://arxiv.org/abs/astro-ph/0307401
http://arxiv.org/abs/astro-ph/0408128
http://arxiv.org/abs/astro-ph/0310346
http://arxiv.org/abs/astro-ph/9911094
http://arxiv.org/abs/astro-ph/9902334
http://arxiv.org/abs/astro-ph/0307300
http://arxiv.org/abs/astro-ph/0610846
http://arxiv.org/abs/astro-ph/0511255
http://arxiv.org/abs/astro-ph/0202301
http://arxiv.org/abs/astro-ph/0211006
http://arxiv.org/abs/astro-ph/0310888
http://arxiv.org/abs/astro-ph/9510127
http://arxiv.org/abs/astro-ph/0508255
http://arxiv.org/abs/astro-ph/0004074
http://arxiv.org/abs/astro-ph/0411424
http://arxiv.org/abs/astro-ph/9710327
http://irsa.ipac.caltech.edu/applications/DUST/
http://arxiv.org/abs/astro-ph/0604113
http://arxiv.org/abs/astro-ph/0302212
http://arxiv.org/abs/astro-ph/0002239
http://arxiv.org/abs/astro-ph/9801056
http://arxiv.org/abs/astro-ph/0507209
http://arxiv.org/abs/astro-ph/0407648
http://arxiv.org/abs/astro-ph/0203168
http://arxiv.org/abs/astro-ph/0411364
THE YOUNGEST GALAXIES 15
TABLE 2
NB816 Emission-Line Sample
No. RA(2000) Dec(2000) N(AB) Z(AB) I R V B z
1 40.115555 −1.694722 23.92 25.28 24.84 25.44 −99.00 −99.00 −1.0000
2 40.116665 −1.617361 24.26 25.88 25.52 25.74 −99.00 −99.00 −1.0000
3 40.138332 −1.405639 23.49 24.94 24.70 25.22 −99.00 −99.00 0.6343
4 40.174721 −1.704750 24.27 25.46 25.35 25.82 −99.00 −99.00 0.6355
5 40.183056 −1.495417 24.58 25.28 25.85 26.86 −99.00 −99.00 5.6886
6 40.216946 −1.494805 24.80 26.08 26.31 26.14 −99.00 −99.00 0.2416
7 40.250832 −1.744639 23.51 24.35 24.33 24.54 −99.00 −99.00 −1.0000
8 40.276112 −1.518139 24.72 24.78 25.53 25.75 −99.00 −99.00 −1.0000
9 40.276390 −1.623250 24.55 25.36 25.36 25.76 −99.00 −99.00 −1.0000
10 40.284168 −1.453583 21.82 23.20 23.02 22.56 −99.00 −99.00 0.2480
11 40.298889 −1.447389 24.32 25.95 25.42 25.71 −99.00 −99.00 −1.0000
12 40.304165 −1.391694 20.88 22.35 22.06 22.32 −99.00 −99.00 −1.0000
13 40.306946 −1.638500 24.91 25.67 25.72 26.02 −99.00 −99.00 −1.0000
14 40.311111 −1.535111 24.08 25.93 25.49 25.71 −99.00 −99.00 0.6240
15 40.318333 −1.548222 24.60 25.03 25.50 25.96 −99.00 −99.00 −1.0000
16 40.318890 −1.430889 23.37 23.77 24.32 24.42 −99.00 −99.00 1.1804
17 40.319168 −1.446333 23.60 23.85 24.43 24.56 −99.00 −99.00 1.1873
18 40.320835 −1.778028 20.70 21.57 21.60 21.86 −99.00 −99.00 −1.0000
19 40.324165 −1.409972 24.29 24.38 25.18 25.45 −99.00 −99.00 −1.0000
20 40.326111 −1.709805 23.24 24.75 24.62 24.96 −99.00 −99.00 −1.0000
21 40.336388 −1.570194 24.26 24.73 25.06 25.33 −99.00 −99.00 −1.0000
22 40.337223 −1.388194 24.81 27.37 26.71 26.77 −99.00 −99.00 −1.0000
23 40.337502 −1.658306 24.89 25.20 25.76 26.05 −99.00 −99.00 −1.0000
24 40.340279 −1.689472 24.55 26.37 26.07 26.51 −99.00 −99.00 −1.0000
25 40.340557 −1.551889 24.99 25.70 25.93 26.35 −99.00 −99.00 −1.0000
26 40.340832 −1.371222 22.42 23.43 23.29 23.16 −99.00 −99.00 −1.0000
27 40.340832 −1.516250 24.89 25.06 25.69 25.78 −99.00 −99.00 −1.0000
28 40.341110 −1.493500 23.37 24.62 24.55 24.23 −99.00 −99.00 −1.0000
29 40.341389 −1.484139 24.48 25.46 25.49 25.73 −99.00 −99.00 −1.0000
30 40.342777 −1.599528 24.65 25.31 25.50 25.85 −99.00 −99.00 −1.0000
31 40.343056 −1.438833 23.17 24.60 24.54 24.54 −99.00 −99.00 0.6226
32 40.347500 −1.403833 24.86 27.45 26.16 26.35 −99.00 −99.00 0.6324
33 40.349724 −1.598472 23.11 23.87 24.11 24.46 −99.00 −99.00 1.1956
34 40.356388 −1.515722 24.94 26.19 26.18 26.42 −99.00 −99.00 −5.0000
35 40.372223 −1.390361 23.85 24.68 24.72 24.62 −99.00 −99.00 −1.0000
36 40.373611 −1.722528 24.93 25.85 25.74 25.88 −99.00 −99.00 −1.0000
37 40.377777 −1.701889 23.45 23.82 24.26 24.70 −99.00 −99.00 −1.0000
38 40.388054 −1.697361 23.96 24.32 24.79 24.90 −99.00 −99.00 −1.0000
39 40.388889 −1.573361 24.79 25.13 25.65 26.07 −99.00 −99.00 −1.0000
40 40.394444 −1.521389 22.73 23.69 23.84 24.06 −99.00 −99.00 0.6292
Note. — Magnitudes are measured in 3′′ diameter apertures. An entry of ‘−99’ indicates that
no excess flux was measured. −1.0000 in the redshift column means no spectroscopic data were
obtained for the object. This is a sample table showing the first entries of the electronic version
of the table that will accompany the published paper.
16 Kakazu et al.
TABLE 3
NB912 Emission-Line Sample
No. R.A. (J2000.0) Decl. (J2000.0) N(AB) z′(AB) I R V B zspec
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10)
1 40.131668 −1.408361 23.86 25.08 25.89 25.97 −99.00 −99.00 0.8371
2 40.133888 −1.575222 24.76 25.92 26.13 25.84 −99.00 −99.00 1.4498
3 40.148056 −1.593555 23.21 24.62 25.53 26.11 −99.00 −99.00 0.8207
4 40.148335 −1.725417 24.76 25.97 26.98 26.39 −99.00 −99.00 0.8111
5 40.150833 −1.737556 23.31 24.35 24.88 25.32 −99.00 −99.00 0.8269
6 40.153332 −1.536833 23.64 25.00 26.18 26.87 −99.00 −99.00 0.8301
7 40.156387 −1.765833 21.93 23.05 23.51 23.74 −99.00 −99.00 −1.0000
8 40.165833 −1.580056 24.94 26.20 25.92 25.84 −99.00 −99.00 1.4482
9 40.183334 −1.389583 21.77 22.79 23.33 23.52 −99.00 −99.00 0.8325
10 40.184444 −1.596444 23.09 24.50 25.30 25.55 −99.00 −99.00 0.8293
11 40.193611 −1.690083 24.32 25.40 25.83 25.69 −99.00 −99.00 0.8266
12 40.194168 −1.373722 23.93 24.93 25.21 25.35 −99.00 −99.00 −1.0000
13 40.194721 −1.373917 24.87 25.90 26.30 26.47 −99.00 −99.00 −1.0000
14 40.196667 −1.378333 24.07 25.42 26.05 26.18 −99.00 −99.00 −1.0000
15 40.202221 −1.584472 24.22 25.30 25.81 26.19 −99.00 −99.00 0.8289
16 40.203335 −1.471861 24.77 26.20 25.78 25.54 −99.00 −99.00 0.3965
17 40.214722 −1.519917 23.14 24.40 25.38 25.94 −99.00 −99.00 0.8288
18 40.220276 −1.753778 24.34 25.41 26.40 26.29 −99.00 −99.00 −1.0000
19 40.220833 −1.388556 23.24 24.36 25.13 24.99 −99.00 −99.00 −1.0000
20 40.226944 −1.542111 23.02 24.47 25.77 25.48 −99.00 −99.00 0.8208
21 40.229168 −1.720889 24.99 27.85 27.11 · · · −99.00 −99.00 6.4800
22 40.229443 −1.376472 23.75 24.98 25.72 24.66 −99.00 −99.00 −1.0000
23 40.245834 −1.578972 24.61 25.82 27.22 26.91 −99.00 −99.00 0.8285
24 40.280556 −1.421056 24.82 25.86 26.13 25.91 −99.00 −99.00 −1.0000
25 40.290833 −1.746361 23.89 25.16 25.81 25.03 −99.00 −99.00 0.0000
26 40.323891 −1.697667 24.73 25.80 26.47 25.71 −99.00 −99.00 −1.0000
27 40.330833 −1.612389 23.03 24.68 26.06 25.47 −99.00 −99.00 0.0000
28 40.339722 −1.395361 23.87 25.54 26.98 25.45 −99.00 −99.00 0.3889
29 40.346668 −1.448305 23.93 24.98 25.37 25.48 −99.00 −99.00 0.8274
30 40.382500 −1.554056 23.52 24.87 25.60 25.08 −99.00 −99.00 0.3930
31 40.393055 −1.713694 24.94 26.83 26.70 26.50 −99.00 −99.00 −1.0000
32 40.398888 −1.466417 23.87 24.91 24.83 24.64 −99.00 −99.00 1.4590
33 40.403889 −1.530167 24.88 26.08 27.44 27.30 −99.00 −99.00 0.8223
34 40.409443 −1.369222 21.52 23.44 24.47 24.04 −99.00 −99.00 −1.0000
35 40.411667 −1.691417 24.41 25.50 25.68 25.55 −99.00 −99.00 −1.0000
36 40.424999 −1.454028 21.91 22.93 23.42 23.41 −99.00 −99.00 −1.0000
37 40.430000 −1.501111 23.13 24.21 24.76 24.92 −99.00 −99.00 0.8267
38 40.446110 −1.676139 24.91 26.09 26.37 26.04 −99.00 −99.00 −1.0000
39 40.478889 −1.534278 24.87 26.06 25.79 25.56 −99.00 −99.00 0.3861
40 40.506111 −1.755111 22.73 24.27 25.24 25.19 −99.00 −99.00 −1.0000
41 40.511665 −1.596944 24.73 25.96 25.55 25.67 −99.00 −99.00 −1.0000
42 40.518055 −1.666139 22.47 23.73 24.37 24.45 −99.00 −99.00 −1.0000
Note. — Magnitudes are measured in 3′′ diameter apertures. An entry of ‘−99’ indicates that no
excess flux was measured. −1.0000 in the redshift column means no spectroscopic data were obtained for
the object. This is a sample table showing the first entries of the electronic version of the table that will
accompany the published paper.
TABLE 4
Line fluxes and Oxygen Abundance for L816 selected emitters
Object f([OIII]5007) f([OIII]4959) f([OIII]4363) f([OII]3727) Te[OIII] 12+log(O/H)
[OIII] emitters
31 513.6 ± 24.0 222.3 ± 11.4 < 6.60 54.4 ± 4.62 < 1.19 > 8.09
40 577.9 ± 21.6 191.3 ± 8.05 9.40 ± 3.40 140.9 ± 6.22 1.14 < 1.34 < 1.53 7.86 < 8.03 < 8.25
51 401.5 ± 12.6 146.9 ± 5.52 9.40 ± 4.50 < 2.39 1.19 < 1.55 < 1.90 7.51 < 7.62 < 7.94
76 464.4 ± 10.5 191.3 ± 4.86 < 2.90 344.5 ± 8.07 < 0.95 > 8.55
118 492.6 ± 29.7 193.9 ± 12.9 34.4 ± 12.8 11.3 ± 2.61 2.16 < 3.08 < 4.32 6.93 < 7.16 < 7.44
195 335.0 ± 21.4 129.5 ± 10.2 24.0 ± 10.9 97.1 ± 9.71 2.02 < 3.17 < 4.86 6.78 < 7.06 < 7.44
206 597.1 ± 19.5 204.1 ± 7.41 21.6 ± 9.20 < 1.72 1.48 < 1.97 < 2.48 7.42 < 7.55 < 7.84
208 658.0 ± 30.9 249.8 ± 12.3 15.6 ± 8.00 < 22.1 1.16 < 1.56 < 1.93 7.67 < 7.85 < 8.22
223 242.9 ± 15.3 83.3 ± 7.53 23.7 ± 18.9 < 10.1 1.45 < 4.64 < 19.62 6.14 < 6.61 < 7.53
252 466.8 ± 9.32 157.9 ± 3.57 9.20 ± 4.00 139.0 ± 3.71 1.16 < 1.45 < 1.72 7.68 < 7.87 < 8.14
Note. — Only emitters with > 15σ Hβ fluxes are listed. All fluxes are normalized by their f(Hβ) and multiplied by
100. 1σ upper limits are listed for [OII]3727 flux below 3σ and [OIII]4363 below 1σ. The units of Te[OIII] are 10
−4 [K].
THE YOUNGEST GALAXIES 17
Fig. 17.— HST/ACS (B, V, z’) composite images of NB816 emitters in the GOODS-N field with overlaid object IDs from Table 2 and
redshifts, where known. Fields are 12.′′5 on a side.
TABLE 5
Line Fluxes and Oxygen Abundances for L912 selected emitters
Object f([OIII]5007) f([OIII]4959) f([OIII]4363) f([OII]3727) Te[OIII] 12+log(O/H)
[OIII] emitters
3 550.9 ± 12.9 187.9 ± 4.91 23.9 ± 7.90 8.6 ± 2.5 1.74 < 2.20 < 2.71 7.26 < 7.43 < 7.65
6 588.1 ± 35.1 216.0 ± 14.2 18.4 ± 11.0 52.0 ± 8.6 1.20 < 1.79 < 2.39 7.40 < 7.68 < 8.14
9 442.3 ± 15.3 154.7 ± 6.42 < 12.3 157.2 ± 6.88 < 1.70 > 7.70
10 490.0 ± 11.9 178.7 ± 5.14 13.7 ± 4.40 < 2.95 1.42 < 1.69 < 1.97 7.55 < 7.61 < 7.82
17 342.5 ± 20.0 129.7 ± 9.29 15.9 ± 9.40 < 7.72 1.41 < 2.26 < 3.28 7.03 < 7.22 < 7.70
20 418.7 ± 17.4 135.1 ± 6.96 16.8 ± 5.50 24.5 ± 2.45 1.69 < 2.11 < 2.57 7.18 < 7.36 < 7.58
239 202.4 ± 10.2 75.6 ± 6.40 < 8.20 190.6 ± 10.2 < 2.08 > 7.34
270 351.7 ± 15.1 149.7 ± 7.73 12.4 ± 3.40 30.7 ± 2.72 1.59 < 1.87 < 2.16 7.28 < 7.43 < 7.61
Hα emitters
52 589.1 ± 10.0 179.2 ± 3.42 18.3 ± 1.59 < 1.56 1.75 < 1.83 < 1.92 7.62 < 7.67 < 7.72
60 619.1 ± 33.5 206.7 ± 12.5 10.7 ± 7.77 48.4 ± 8.7 0.90 < 1.37 < 1.77 7.67 < 7.96 < 8.57
266 682.8 ± 10.3 217.7 ± 3.57 14.7 ± 2.42 ... 1.40 < 1.52 < 1.63 < 8.3
Note. — Same as Table 4 but for NB912 emitters. The [OII]λ3727 of ID266 is beyond the DEIMOS wavelength
coverage and thus was not being measured.
18 Kakazu et al.
Fig. 18.— HST/ACS (B, V, I) composite images of NB912 emitters in the GOODS-N field with overlaid object IDs from Table 3 and
redshifts, where known. Fields are 12.′′5 on a side.
ABSTRACT
  We describe results of a narrow band search for ultra-strong emission line
galaxies (USELs) with EW(H beta) > 30 A. 542 candidate galaxies are found in a
half square degree survey using two ~100 Angstrom 8150 A and 9140 A filters
with Subaru/SuprimeCam. Followup spectroscopy for randomly selected objects in
the sample with KeckII/DEIMOS shows they consist of [OIII] 5007, [OII] 3727,
and H alpha selected strong-emission line galaxies at intermediate redshifts (z
< 1), and Ly alpha emitting galaxies at high-redshift (z >> 5). We determine
the H beta luminosity functions and the star formation density of the USELs,
which is 5-10% of the value found from ultraviolet continuum objects at z=0-1,
suggesting they correspond to a major epoch in galaxy formation at these
redshifts. Many USELs show the temperature-sensitive [OIII] 4363 auroral lines
and about a dozen have oxygen abundances characteristic of eXtremely Metal Poor
Galaxies (XMPGs). These XMPGs are the most distant known today. Our high yield
rate of XMPGs suggests this is a powerful method for finding such populations.
The lowest metallicity measured in our sample is 12+log(O/H) = 7.06
(6.78-7.44), close to the minimum metallicity found in local galaxies. The
luminosities, metallicities and star formation rates of USELs are consistent
with the strong emitters being start-up intermediate mass galaxies and suggest
that galaxies are still forming in relatively chemically pristine sites at z <
1.

<|endoftext|><|startoftext|>
Submitted to ApJ Letters December 18, 2006; Accepted April 04, 2007
Preprint typeset using LATEX style emulateapj v. 08/22/09
DISCOVERY OF EXTREME ASYMMETRY IN THE DEBRIS DISK SURROUNDING HD 15115
Paul Kalas
, Michael P. Fitzgerald
, James R. Graham
Submitted to ApJ Letters December 18, 2006; Accepted April 04, 2007
ABSTRACT
We report the first scattered light detection of a dusty debris disk surrounding the F2V star HD
15115 using the Hubble Space Telescope in the optical, and Keck adaptive optics in the near-infrared.
The most remarkable property of the HD 15115 disk relative to other debris disks is its extreme length
asymmetry. The east side of the disk is detected to ∼315 AU radius, whereas the west side of the disk
has radius >550 AU. We find a blue optical to near-infrared scattered light color relative to the star
that indicates grain scattering properties similar to the AU Mic debris disk. The existence of a large
debris disk surrounding HD 15115 adds further evidence for membership in the β Pic moving group,
which was previously argued based on kinematics alone. Here we hypothesize that the extreme disk
asymmetry is due to dynamical perturbations from HIP 12545, an M star 0.5◦ (0.38 pc) east of HD
15115 that shares a common proper motion vector, heliocentric distance, galactic space velocity, and
Subject headings: stars: individual(HD 15115) - circumstellar matter
1. INTRODUCTION
Volume-limited, far-infrared surveys of the solar neigh-
borhood suggest that ∼15% of main sequence stars have
excess thermal emission indicative of circumstellar grains
(Aumann 1985; Backman & Paresce 1993; Meyer et al.
2007). Direct imaging of dust scattered light reveals the
geometry of the grain population relative to the star,
which further elucidates the origin of dust. In some
cases, a circumstellar nebulosity may be amorphous with
asymmetric striated features produced when stellar ra-
diation pressure deflects interstellar dust (Kalas et al.
2002). In other cases, such as β Pictoris and Fomal-
haut, the geometry of dust is consistent with a circum-
stellar disk or belt related to the formation of planetesi-
mals (Smith & Terrile 1984; Kalas et al. 2005). Though
larger bodies such as comets and asteroids are not di-
rectly observed, they most likely exist as a reservoir for
injecting fresh debris into the system over the lifetime
of the star. Furthermore, circumstellar debris disks dis-
play significant structure and asymmetry that may be
linked, in principle, to dynamical perturbations from
a planetary system (Roques et al. 1994; Liou & Zook
1999; Moro-Mart́ın & Malhotra 2002). Unfortunately,
only ∼10% of stars with excess thermal emission have
detected scattered light disks due to the high contrast
between the host star and the low surface brightness neb-
ulosity at optical and near-infrared wavelengths. Fortu-
nately, the observational capabilities have improved in
recent years due to instrument upgrades on the Hubble
Space Telescope (HST) and the implementation of adap-
tive optics (AO) on large, ground-based telescopes.
Here we show new scattered light images of a debris
disk surrounding HD 15115, an F2 star at 45 pc (Table
1), first reported as a source of thermal excess emission
by Silverstone (2000). The spectral energy distribution
1 Astronomy Department and Radio Astronomy Laboratory, 601
Campbell Hall, Berkeley, CA 94720
2 National Science Foundation Center for Adaptive Optics, Uni-
versity of California, Santa Cruz, CA 95064
is consistent with a single temperature dust belt at ∼35
AU radius with an estimated dust mass of 0.047 M⊕
(Zuckerman & Song 2004; Williams & Andrews 2006).
Recently, Moór et al. (2006) identified HD 15115 as a
candidate for membership in the 12 Myr-old β Pic mov-
ing group (BPMG), based on new radial velocity mea-
surements that resulted in galactic kinematics similar to
those of the BPMG.
2. OBSERVATIONS & DATA ANALYSIS
We first detected the HD 15115 disk in scattered light
using the HST ACS High Resolution Camera (HRC) on
2006 July 17. We used the F606W broadband filter and
the 1.8′′ diameter occulting spot to artificially eclipse the
star. Three flatfielded frames of 700 seconds each from
standard pipeline processing of the HST data archive
were median combined for cosmic ray rejection. The
point spread function was then subtracted iteratively us-
ing five other stars of similar spectral type obtained from
the HST archive. The relative intensity scaling between
images was iteratively adjusted until the residual image
showed a mean radial profile equal to zero intensity per-
pendicular to the circumstellar disk. The images were
then corrected for geometric distortion, giving a 25 mas
pixel−1 scale.
The resulting optical images revealed a needle-like fea-
ture projecting westward from the star to the edge of
the field, but with almost no counterpart to the east.
Given the high degree of asymmetry that could conceiv-
ably arise from instrumental scattering, we endeavored
to confirm the disk using the Keck II telescope with
AO on 2006 October 07 and 2007 January 26. Utilizing
the near-infrared camera NIRC2, a 0.4′′ radius occulting
spot and a 10 mas pixel scale, we confirmed the exis-
tence of the disk in J (1.2 µm), H (1.6 µm), and K′ (2.2
µm). PSF subtraction is accomplished by allowing the
sky to rotate relative to the detector, thereby separat-
ing the stellar PSF from the disk. The observing pro-
cedure and data reduction procedure are fully described
in Fitzgerald et al. (2007). Due to poor observing condi-
http://arxiv.org/abs/0704.0645v1
2 Kalas, Fitzgerald, Graham.
tions in October, including intermittent cirrus clouds, we
used only the best fraction of data by visually selecting
frames of relatively constant intensity and PSF sharp-
ness. The resulting effective integration times are 450 s,
980 s, and 600 s for J, H, and K′, respectively. Stan-
dard star observations were obtained under similar, non-
photometric conditions and processed in a similar man-
ner. In January 2007 we re-observed HD 15115 (1930
s cumulative integration time) and two standard stars
under photometric conditions from Keck using the same
instrumentation with the H broadband filter. However,
the observations were made after meridian transit and
the limited rotation of the sky relative to the instrumen-
tal PSF causes disk emission to be included in the PSF
estimate, resulting in disk self-subtraction at small radii.
Our analysis of the 2007 January data therefore yields a
detection of the west ansa in the region 1.3′′ − 3.3′′ ra-
dius. The photometry in this second epoch agrees well
with that of the first epoch (on average, the 2007 Jan-
uary disk photometry is 0.13 mag arcsec−2 fainter than
2006 October), suggesting that our frame selection tech-
nique for the first epoch of cloudy conditions effectively
filtered out non-photometric data.
3. RESULTS
Fig. 1 shows the PSF-subtracted images of HD 15115
with HST and with Keck. The west side of the disk in
the optical HST data has PA = 278◦.5 ± 0.5 and is de-
tected from the edge of the occulting spot at 1.5′′ (67
AU) radius to the edge of the field at 12.38′′ (554 AU)
radius. The east midplane is detected as far as ∼7′′ (315
AU) radius. At this radius the east midplane begins to
intersect the outer portion of the coronagraph’s 3.0′′ oc-
culting spot. Further east, past the spot and to the edge
of the field, no nebulosity is detected 9.0′′−14.9′′ radius.
The appearance of the disk is more symmetric in the
2006 October Keck data, which show the disk between
0.7′′ (31 AU) - 2.5′′ (112 AU).
Optical surface brightness contours (Fig. 2) re-
veal a sharp midplane morphology for the west ex-
tension that indicates an edge-on orientation to the
line of sight. The west midplane is qualitatively simi-
lar to that of β Pic’s northeast midplane, including a
characteristic width asymmetry (Kalas & Jewitt 1995;
Golimowski et al. 2006). The northern side of the west
midplane is more vertically extended than the southern
side. For example, the full-width at half-maximum across
the disk midplane at 2′′ radius is 0.19′′ ± 0.10′′ in both
the optical and NIR data. However, the vertical cuts are
not symmetric about the midplane when measuring the
half-width at quarter-maximum (HWQM). The HWQM
north of the west midplane is 1.6 ± 0.1 times greater
than that of the HWQM south of the west midplane.
This width asymmetry is confirmed in the Keck data.
If the width asymmetry is found to be in the opposite
direction in the opposite midplane, then Kalas & Jewitt
(1995) refer to such a feature as the butterfly asymmetry.
The butterfly asymmetry is evident in the morphology of
β Pic, that Golimowski et al. (2006) recently related to
the presence of a second disk midplane tilted relative to
the main disk midplane. However, our detection of HD
15115’s east midplane has insufficient signal to noise to
confirm the presence of a width asymmetry here.
We note that none of the surface brightness profiles
show evidence for significant flattening inward toward
the star (Fig. 3). All four surface brightness profiles
are well-represented by a single power law decrease with
radius. If there is an inner dust depletion, then it resides
within 40 AU radius. This constraint is consistent with
model fits of the spectral energy distribution that place
the dominant emitting dust component at∼35 AU radius
(Zuckerman & Song 2004; Williams & Andrews 2006).
The color of the disk may be estimated in the 2.0′′ −
3.3′′ region where the H-band and V-band data overlap
(Fig. 3). At face value, ΣV − ΣH ≈ −0.6 mag arcsec
at 2′′ radius, increasing to −1.9 mag arcsec−2 at 3.3′′
radius for the West disk extension. The east ansa has
similar blue scattering at 2′′ radius, but the V-band sur-
face brightness profile is steeper in the east than in the
west, giving a roughly constant blue color with increasing
radius in the east.
In a future paper we will present a detailed model of
dust scattering and thermal emission properties, which
requires a more complicated treatment of the obvious
disk asymmetry. However, for isotropically scattering
grains in an edge-on disk, an analytic approach shows
that the grain number density distribution as a function
of radius within the disk midplane follows a power-law
with index equivalent to one minus the sky-projected ra-
dial midplane power-law index. From the Keck data in
Fig. 3, we estimate that the disk number density dis-
tribution decreases with disk radius as r−3 in the inner
region up to ∼3.3′′ radius for both sides of the disk. At
> 3.3′′ radius, the optical data show that this profile
continues for the east extension, but that the disk num-
ber density profiles flattens for the west extension, as
described in Fig. 3. A precise measurement of the color
and polarization of the disk scattered light is necessary
to further constrain the grain size distribution, the corre-
sponding scattering phase function and albedo, and the
effect on the disk number density profile.
4. DISCUSSION
Asymmetric disk structure is evident in the majority of
debris disks, and most authors invoke planetary pertur-
bations as the likely origin. Secular perturbations may
offset the center of global disk symmetry from the lo-
cation of the star, though this effect may also be pro-
duced by an external perturber (Wyatt et al. 1999). The
edge-on debris disk surrounding β Pic displays a vari-
ety of radial and vertical asymmetries on large scales
(Kalas & Jewitt 1995) that may be most relevant to the
study of HD 15115. In the deepest optical images of the
β Pic disk, the northeast and southwest disk midplanes
are traced to 1835 AU and 1450 AU, respectively, giving
a ratio of 1.27 (Larwood & Kalas 2001). In the case of
HD 15115 the corresponding ratio is >1.75. This ratio
is a lower limit given that the 550 AU extent of the west
midplane is limited only by our field of view.
A single stellar flyby, or a periodic flyby by a bound
companion on an eccentric orbit, has been studied as
a potential mechanism for producing β Pic’s large-scale
asymmetry (Larwood & Kalas 2001). However, in a
kinematic study of Hipparcos-detected stars with pub-
lished radial velocities, Kalas et al. (2001) did not find
any perturbers that approached closer than 0.6 pc of β
Pic, though the sample was estimated as only 20% com-
plete. In the case of HD 15115, Moór et al. (2006) noted
Extreme Debris Disk Asymmetries 3
that another β Pic moving group member, HIP 12545, is
located relatively nearby in sky position.
Table 1 summarizes the observed properties of both
stars. Their projected separation is 0.51◦, which trans-
lates to 0.38 pc at a mean heliocentric distance of 43
pc. Within the uncertainties, the proper motion vectors,
the (U, V ) galactic space motions, and heliocentric dis-
tance are identical. Furthermore, the eastward location
of HIP 12545 is in the direction of the truncated side of
the HD 15115 debris disk. This geometry is consistent
with the dynamical simulation of a disk disrupted by a
stellar flyby in Larwood & Kalas (2001). Specifically, in
their Fig. 18, the long end of a highly perturbed disk
is located in the direction of periastron. The perturber
follows a parabolic trajectory such that in a later epoch
it is located in the direction opposite of periastron, or
in the direction of the truncated side of the disk. Peri-
astron in these models is ∼700 AU, with an initial disk
radius of ∼500 AU. Overall, the ensemble of evidence fa-
vors further consideration of HD 15115 and HIP 12545 as
a possible wide-separation multiple system with a highly
eccentric orbit (e > 0.95).
If the heliocentric distances are in fact nearly equiva-
lent, then the projected sky separation approximates the
true separation. Kalas et al. (2001) discuss the Roche ra-
dius, at, of a star as containing the volume within which
the stellar potential dominates the Galactic tidal field.
Using their Eq. 2 and the stellar mass estimates in our
Table 1, we find at = 1.1 pc and at= 0.7 pc for HD 15115
and HIP 12545, respectively. Therefore, for a small body
gravitationally bound to HD 15115, the potential well
of HIP 12545 exerts a more significant perturbing force
than the Galactic tidal field at the current epoch. This is
not the case if we take the Hipparcos parallaxes at face
value. These give a line-of-sight separation between the
stars of ∼4 pc, and we derive a 3-D separation of 5.1±2.8
pc. We further calculate that closest approach will occur
∼1 Myr in the future. Therefore, improving the par-
allax measurements for both stars is a critical task for
future work that would examine their possible physical
association.
A prediction of the Larwood & Kalas (2001) model is
that the perturber may capture disk material, and dis-
play a tenuous and highly asymmetric tail of escaping
material pointing away from the mother disk. To test
the hypothesis that HD 15115 suffered a close encounter
with HIP 12545, high-contrast observations of HIP 12545
should reveal circumstellar nebulosity due to captured
material. Since this is captured material, the nebulosity
may not resemble a disk, and any tail should point away
from the mother disk (eastward).
To futher investigate this hypothesis, we examined
ACS/HRC coronagraphic observations of HIP 12545 ob-
tained by program GO-10487 (Principal Investigator
David Ardila). The observing technique is similar to
that described here for HD 15115. After PSF subtrac-
tion, we do not detect nebulosity in the vicinity of HIP
12545. Therefore the possibility that the extreme disk
asymmetry of HD 15115 is created by dynamical inter-
actions with HIP 12545 does not have further supporting
evidence at the present time.
Finally, we note that among the four debris disks
imaged in scattered light in the BPMG, the dust ap-
pears depleted for HD 15115. The values of dust opti-
cal depth in 10−4 units are given as 24.3±1.1, 4.9±0.4,
29.3±1.6 and 4.0±0.3 for β Pic (A5V), HD 15115 (F2V),
HD 181327 (F5.5V) and AU Mic (M2V), respectively
(Moór et al. 2006). The factor of ∼five smaller opti-
cal depth for HD 15115 compared to β Pic and HD
181327 suggests a different evolutionary path for the
disk. Though a stellar flyby is one possibility, mi-
gration and dynamical instabilities within a hypotheti-
cal planetary system may also play a role in the rapid
diminution of dust parent bodies around HD 15115 (e.g.
Morbidelli & Valsecchi 1997).
5. SUMMARY
Optical and near-infrared coronagraphic images of the
F2 star HD 15115 reveal a highly asymmetric debris disk
with an edge-on orientation. We describe the morpho-
logical and photometric properties of the disk, deferring
a detailed model of scattering and thermal emission of
grains to future work. The blue scattered light color may
indicate grain properties most similar to those of the AU
Mic debris disk, where ΣV − ΣH ≈ −1 mag arcsec
relative to the star (Fitzgerald et al. 2007), and less like
those of β Pic, which is predominantly red scattering
(Golimowski et al. 2006). A key follow-up measurement
would be polarization, which in the case of AU Mic re-
vealed highly porous macroscopic grains (Graham et al.
2007).
With outer optical radius >550 AU, HD 15115 pos-
sesses the second largest debris disk next to β Pic. How-
ever, the length asymmetry between its west and east
midplanes greatly exceeds that of β Pic and other disks.
HD 15115 is now the fourth debris disk discovered in
scattered light among the β Pic moving group mem-
bers. Future work should test our hypothesis that ex-
treme asymmetries are due to dynamical perturbations
from the nearby M star HIP 12545.
Acknowledgements: Support for GO-10896 was
provided by NASA through a grant from STScI under
NASA contract NAS5-26555.
REFERENCES
Augereau, J. C., Lagrange, A. M., Mouillet, D., Papaloizou, J. C.
B. & Grorod, P.A. 1999, A&A, 348, 557
Aumann, H.H. 1985, PASP, 97, 885
Backman, D. E. & Paresce, F. 1993, in Protostars and Planets
III, eds. E. H. Levy & J. I. Lunine, (Univ. Arizona Press,
Tucson), p. 1253
Fitzgerald, M. P., Kalas, P., Duchene, G., Pinte, C. and Graham,
J. R. 2007, ApJ, submitted.
Golimowski, D.A., Ardila, D.R., Krist, J.R., et al., AJ, 131, 3109.
Graham. J.R., Kalas, P. & Matthews, B. 2007, ApJ, 654, 595.
Kalas, P. & Jewitt, D. 1995, AJ,110, 794
Kalas, P. , Deltorn, J.-M. and Larwood, J. 2001, ApJ, 533, 410
Kalas, P., Graham, J.R., Beckwith, S.V.W., Jewitt, D.C. &
Lloyd, J.P. , 2002, ApJ, 567, 999
Kalas, P., Graham. J.R. & Clampin, M.C. 2005, Nature, 435, 1067
Larwood, J. D. & Kalas, P. 2001, MNRAS, 323, 402
Liou, J. -C. & Zook, H. A. 1999, AJ, 118, 580
Meyer, M.R., Backman, D.E., Weinberger, A. J. & Wyatt, M.
2007, in Protostars and Planets V, in press.
Moór, A., Abraham, P., Derekas, A., et al. 2006, ApJ, 644, 525
4 Kalas, Fitzgerald, Graham.
Morbidelli, A. & Valsecchi, G. B. 1997, Icarus, 128, 464
Moro-Mart́ın, A. & Malhotra, R. 2002, AJ, 124, 2305
Roques, F. , Scholl, H., Sicardy, B. & Smith, B.A. 1994, Icarus,
108, 37
Silverstone, M.D., Ph.D. thesis
Song, I., Zuckerman, B., and Bessel, M.S. 2003, ApJ, 599, 342
Smith, B.A. & Terrile, R. J. 1984, Science, 226, 1421
Strubbe, L. E. & Chiang, E. I. 2005, ApJ, 648, 652
Williams, J. P. & Andrews, S. M. 2006, ApJ, 653, 1480
Wyatt, M. C., Dermott, S.F., Telesco, C.M., et al. 1999, ApJ,
527, 918
Zuckerman, B. & Song, I. 2004, ApJ, 603, 738
Extreme Debris Disk Asymmetries 5
Fig. 1.— False-color, log scale images of the HD 15115 disk as originally discovered using the ACS/HRC F606W (λc= 591 nm, ∆λ =
234 nm) [LEFT] and confirmed in H-band with Keck II adaptive optics [RIGHT; 2006 October 26 data]. North is up, east is left and the
scale bars span 5′′. In the HST image we use gray fields over the occulting bar and 3.0′′ occulting spot located to the left of HD 15115,
as well as a gray disk covering PSF-subtraction artifacts surrounding HD 15115 itself. If the HD 15115 disk were a symmetric structure,
then the east side of the disk would have been detected within the rectangular box, shown below the ACS/HRC occulting finger. The NIR
data [RIGHT] show a more symmetric disk within 2′′ radius, with asymmetry becoming more apparent beyond 2′′ radius. Due to poor
observing conditions, the field is contaminated by residual noise due to the diffraction pattern of the telescope (e.g. at 2 o’clock and 7
o’clock relative to HD 15115). However, whereas the residual diffraction pattern noise of the telescope rotates relative to the sky orientation
over a series of exposures, the image of the disk remains fixed and it is confirmed as real.
Fig. 2.— Surface brightness isocontours for the HD 15115 debris disk converted from F606W to Johnson V-band (derived using
STSDAS/SYNPHOT with a Kurucz model atmosphere and the appropriate observatory parameters). The disk was rotate by 8◦ clockwise
such that the midplane lies along a horizontal line. The bottom frame is the east extension, transposed across the vertical axis, and the
gray region marks the area occupied by the ACS/HRC occulting finger and 3.0′′ occulting spot. The left edge of the frame corresponds
to 2′′ radius from the star. The innermost contour (bold) is 19.0 mag arcsec−2 and the outermost contour represents 23.0 mag arcsec−2,
with a contour interval of 0.5 mag arcsec−2.
6 Kalas, Fitzgerald, Graham.
Fig. 3.— Radial surface brightness (mag arcsec−2) distribution along the west and east midplanes of HD 15115. We plot the difference
between the measured disk surface brightness and the stellar magnitudes of H=5.86 and V=6.80. Disk photometry was extracted from
boxes 0.25′′ × 0.25′′ centered on the midplane. We plot a representative sample of error bars that gives the standard deviation of the
background residuals as a function of radius. The aperture corrections derived from point source photometry are 0.48 and 0.57 for the
H-band data in the 2006 October and 2007 January observations, respectively, and 0.70 for the V band data. The H-band radial profiles
between 0.7′′ and 2.3′′ radius may be described by power-laws with indices -3.7 and -4.4 for the west and east disk extensions, respectively.
In the V band, the east midplane profile may be fit by a power-law with index -4.0 between 2.0′′ and 6.0′′ radius. Thus, our data do not
show a significant color gradient as a function of radius for the east ansa. The west midplane profile in V band may be fit by a single
power-law with index -3.0 between 2.0′′ and 10.0′′ radius. This profile is significantly shallower than the H band profile, resulting in a blue
color gradient as a function of radius for the west ansa.
Extreme Debris Disk Asymmetries 7
TABLE 1
Stellar Properties
HD 15115 HIP 12545 Ref.
Spectral Type F2 M0 Hipparcos
mV (mag) 6.79 10.28 Hipparcos
Mass (M⊙) 1.6 0.5 Astrophys. Quant.
Distance (pc) 44.78+2.22
−2.01
40.54+4.38
−3.61
Hipparcos
RA (ICRS) 02 26 16.2447 02 41 25.89 Hipparcos
DEC (ICRS) +06 17 33.188 +05 59 18.41 Hipparcos
µα(mas/yr) 86.09 ± 1.09 82.32± 4.46 Hipparcos
µδ (mas/yr) −50.13 ± 0.71 −55.13± 2.45 Hipparcos
µα(mas/yr) 87.1 ± 1.2 82.3± 4.3 Tycho-2
µδ (mas/yr) −50.9± 1.2 −55.1± 2.7 Tycho-2
U (km / s) −13.2± 1.9 −14.0± 0.5 a
V (km / s) −17.8± 1.2 −16.7± 0.9 a
W (km / s) −6.0± 2.3 −10.0± 0.5 a
a Galactic kinematics for HD 15115 and HIP 12545 from Moór et al. (2006)
and Song et al. (2003), respectively
ABSTRACT
  We report the first scattered light detection of a dusty debris disk
surrounding the F2V star HD 15115 using the Hubble Space Telescope in the
optical, and Keck adaptive optics in the near-infrared. The most remarkable
property of the HD 15115 disk relative to other debris disks is its extreme
length asymmetry. The east side of the disk is detected to ~315 AU radius,
whereas the west side of the disk has radius >550 AU. We find a blue optical to
near-infrared scattered light color relative to the star that indicates grain
scattering properties similar to the AU Mic debris disk. The existence of a
large debris disk surrounding HD 15115 adds further evidence for membership in
the Beta Pic moving group, which was previously argued based on kinematics
alone. Here we hypothesize that the extreme disk asymmetry is due to dynamical
perturbations from HIP 12545, an M star 0.5 degrees (0.38 pc) east of HD 15115
that shares a common proper motion vector, heliocentric distance, galactic
space velocity, and age.

<|endoftext|><|startoftext|>
Introduction to Kolmogorov
Complexity and Its Applications (Springer: Berlin, 1997)
[46] E. Borel, Rend. Circ. Mat. Paleremo, 26, 247 (1909)
[47] K. L. Chung, A Course in Probability Theory (New
York: Academic, 1974)
[48] P. C. W. Davies 1990, in Complexity, Entropy, and
Physical Information, ed. W. H. Zurek (Addison-
Wesley: Redwood City), p61
[49] M. Tegmark, Found. Phys. Lett., 9, 25 (1996)
[50] H. D. Zeh, The Physical Basis of the Direction of Time,
4th Ed. (Springer: Berlin, 2002)
[51] A. Albrecht and L. Sorbo, PRD, 70, 063528 (2004)
[52] S. M. Carroll and J. Chen, Gen.Rel.Grav., 37, 1671
(2005)
[53] R. M. Wald, gr-qc/0507094 (2005)
[54] D. N. Page, hep-th/0612137 (2006)
[55] A. Vilenkin, JHEP, 701, 92 (2007)
[56] L. Boltzmann, Nature, 51, 413 (1895)
[57] A. Guth, PRD, 23, 347 (1981)
[58] A. Vilenkin, PRD, 27, 2848 (1983)
[59] A. A. Starobinsky, Fundamental Interactions (MGPI
Press, Moscow: p.55, 1984)
[60] A. D. Linde, Particle Physics and Inflationary Cosmol-
ogy (Harwood: Switzerland, 1990)
[61] A. H. Guth, hep-th/0702178 (2007)
[62] R. Penrose, N.Y.Acad.Sci., 571, 249 (1989)
[63] S. Hollands and R. M. Wald, Gen.Rel.& Grav., 34, 2043
(2002)
[64] L. Kofman, A. Linde, and V. Mukhanov, JHEP, 10, 57
(2002)
[65] D. Giulini, E. Joos, C. Kiefer, J. Kupsch, I. O. Sta-
matescu, and H. D. Zeh, Decoherence and the Appear-
ance of a Classical World in Quantum Theory (Berlin:
Springer, 1996)
[66] D. Polarski and A. A. Starobinsky, Class. Quant. Grav.,
13, 377 (1996)
[67] K. Kiefer and D. Polarski, Ann. Phys., 7, 137 (1998)
[68] M. Tegmark, JCAP, 2005-4, 1 (2005)
[69] R. Easther, E. A. Lim, and M. R. Martin, JCAP, 0603,
16 (2006)
[70] R. Bousso, PRL, 97, 191302 (2006)
[71] A. Vilenkin, hep-th/0609193 (2006)
[72] A. Aguirre, S. Gratton, and M. C. Johnson,
hep-th/0611221 (2006)
[73] J. Garriga and A. Vilenkin, PRD, 64, 043511 (2001)
[74] D. Deutsch, The fabric of reality (Allen Lane: New York,
1997)
[75] A. D. Linde, hep-th/0211048 (2002)
[76] G. F. R. Ellis, U. Kirchner, and W. R. Stoeger, MN-
RAS, 347, 921 (2004)
[77] W. R. Stoeger, G. F. R. Ellis, and U. Kirchner,
astro-ph/0407329 (2004)
[78] R. D. Holder, God, the Multiverse, and Everything:
Modern Cosmology and the Argument from Design
(Ashgate: Burlington, 2004)
[79] S. Weinberg, hep-th/0511037 (2005)
[80] S. M. Carroll, Nature, 440, 1132 (2006)
[81] D. N. Page, hep-th/0610101 (2006)
[82] P. Davies 2007, in Universe or Multiverse?, ed. B. Carr
(Cambridge: Cambridge Univ. Press)
[83] M. Kaku, Parallel Worlds: A Journey Through Cre-
ation, Higher Dimensions, and the Future of the Cos-
mos (Anchor: New York, 2006)
[84] A. Vilenkin, Many Worlds in One: The Search for Other
Universes (Hill and Wang: New York, 2006)
[85] R. Bousso and J. Polchinski, JHEP, 6, 6 (2000)
[86] J. L. Feng, J. March-Russell, S. Sethi, and Wilczek F,
Nucl. Phys. B, 602, 307 (2001)
[87] S. Kachru, R. Kallosh, A. Linde, and S. P. Trivedi,
PRD, 68, 046005 (2003)
[88] L. Susskind, hep-th/0302219 (2003)
[89] S. Ashok and M. R. Douglas, JHEP, 401, 60 (2004)
[90] S. Feferman, In the Light of Logic, chapter 14 (Oxford
Univ. Press: Oxford, 1998)
[91] R. Hersh, What Is Mathematics, Really? (Oxford
Univ. Press: Oxford, 1999)
[92] D. Lewis, On the Plurality of Worlds (Blackwell: Oxford,
1986)
[93] S. Hawking, A Brief History of Time (Touchstone: New
York, 1993)
[94] G. F. R. Ellis, Class.Quant.Grav., 16, A37 (1999)
[95] C. Schmidhuber, hep-th/0011065 (2000)
[96] C. J. Hogan, Rev.Mod.Phys., 72, 1149 (2000)
[97] P. Benioff, PRA, 63, 032305 (2001)
[98] G. F. R. Ellis, Int.J.Mod.Phys. A, 17, 2667 (2002)
[99] N. Bostrom, Anthropic Bias: Observation Selection Ef-
fects in Science and Philosophy (Routledge: New York,
2002)
[100] P. Benioff, Found.Phys., 32, 989 (2002)
[101] P. Benioff, quant-ph/0303086 (2003)
[102] M. M. Circovic, Found.Phys., 33, 467 (2003)
[103] R. Vaas, physics/0408111 (2004)
[104] A. Aguirre and M. Tegmark, hep-th/0409072 (2004)
[105] P. Benioff, Found.Phys., 35, 1825 (2004)
[106] G. McCabe, http://philsci-archive.pitt.edu
/archive/00002218 (2005)
[107] P. Hut, M. Alford, and M. Tegmark, Found. Phys., 36,
765 (2006, physics/0510188)
[108] B. Vorhees, C. Luxford, and A. Rhyan, Int. J. Uncon-
ventional Computing, 1, 69 (2005)
[109] G. F. R. Ellis, astro-ph/0602280 (2006)
[110] W. R. Stoeger, astro-ph/0602356 (2006)
[111] R. Hedrich, physics/0604171 (2006)
[112] K. E. Drexler, Engines of Creation: The Coming Era of
Nanotechnology (Forth Estate: London, 1985)
[113] N. Bostrom, Int. Journal of Futures Studies, 2, 1 (1998)
[114] R. Kurzweil, The Age of Spiritual Machines: When com-
puters exceed human intelligence (Viking: New York,
1999)
[115] H. Moravec, Robot: Mere Machine to Transcendent
Mind (Oxford Univ. Press: Oxford, 1999)
[116] F. J. Tipler, The Physics of Immortality (Doubleday:
New York, 1994)
[117] N. Bostrom, Philosophical Quarterly, 53, 243 (2003)
[118] G. McCabe, Stud. Hist. Philos. Mod. Phys., 36, 591,
physics/0511116 (2005)
[119] R. Penrose, The Emperor’s New Mind (Oxford Univ.
Press: Oxford, 1989)
[120] R. Penrose 1997, in The Large, the Small and the Hu-
man Mind, ed. M. Longair (Cambridge Univ. Press:
Cambridge)
[121] T. Hafting, Nature, 436, 801 (2005)
[122] R. Gambini, R. Porto, and J. Pullin, New J. Phys., 6,
45 (2004)
[123] G. Egan, Permutation City (Harper: New York, 1995)
[124] R. K. Standish, Found. Phys. Lett., 17, (2004)
[125] M. Davis, Computability and Unsolvability, Dover, New
http://arXiv.org/abs/gr-qc/0507094
http://arXiv.org/abs/hep-th/0612137
http://arXiv.org/abs/hep-th/0702178
http://arXiv.org/abs/hep-th/0609193
http://arXiv.org/abs/hep-th/0611221
http://arXiv.org/abs/hep-th/0211048
http://arXiv.org/abs/astro-ph/0407329
http://arXiv.org/abs/hep-th/0511037
http://arXiv.org/abs/hep-th/0610101
http://arXiv.org/abs/hep-th/0302219
http://arXiv.org/abs/hep-th/0011065
http://arXiv.org/abs/quant-ph/0303086
http://arXiv.org/abs/physics/0408111
http://arXiv.org/abs/hep-th/0409072
http://philsci-archive.pitt.edu
http://arXiv.org/abs/physics/0510188
http://arXiv.org/abs/astro-ph/0602280
http://arXiv.org/abs/astro-ph/0602356
http://arXiv.org/abs/physics/0604171
http://arXiv.org/abs/physics/0511116
York (1982)
[126] D. Hilbert and P. Bernays, Grundlagen der Matematik
(Springer: Berlin, 1934)
[127] K. Gödel, I. Monatshefte f. Mathematik und Physik, 38,
173 (1931)
[128] S. G. Simpson, Journal of Symbolic Logic, 53, 349,
http://www.math.psu.edu/simpson/papers/hilbert.pdf
(1988)
[129] J. W. Dawson, 21st Annual IEEE Symposium on Logic
in Computer Science, IEEE, p339 (2006)
[130] A. Church, Am. J. Math., 58, 345 (1936)
[131] A. Turing, Proc. London Math. Soc., 42, 230 (1936)
[132] R. L. Goodstein, Constructive formalism; essays on the
foundations of mathematics (Leister Univ. College: Le-
icester, 1951)
[133] G. McCabe, Stud.Hist.Philos.Mod.Phys., 36, 591
(2005, physics/0511116)
[134] X. Wen, Prog.Theor.Phys.Suppl., 160, 351 (2006,
cond-mat/0508020)
[135] M. Levin and X. Wen, hep-th/0507118 (2005)
[136] J. D. Barrow and F. J. Tipler, The Anthropic Cosmo-
logical Principle (Clarendon Press: Oxford, 1986)
[137] A. D. Linde 1987, in 300 Years of Gravitation, ed. S.
Hawking and W. Israel (Cambridge University Press:
Cambridge)
[138] S. Weinberg, PRL, 59, 2607 (1987)
[139] A. D. Linde, PLB, 201, 437 (1988)
[140] M. Tegmark, A. Vilenkin, and L. Pogosian,
astro-ph/0304536 (2003)
[141] L. Pogosian, A. Vilenkin, and M. Tegmark, JCAP, 407,
5 (2004)
[142] R. Jones, Philosophy of Science, 58, 185 (1991)
[143] O. Pooley 2007, in The Structural Foundations of Quan-
tum Gravity, ed. D. P. Rickles and S. R. D. French (Ox-
ford Univ. Press: Oxford)
[144] T. A. Larsson, math-ph/0103013v3 (2001)
http://www.math.psu.edu/simpson/papers/hilbert.pdf
http://arXiv.org/abs/physics/0511116
http://arXiv.org/abs/cond-mat/0508020
http://arXiv.org/abs/hep-th/0507118
http://arXiv.org/abs/astro-ph/0304536
http://arXiv.org/abs/math-ph/0103013
ABSTRACT
  I explore physics implications of the External Reality Hypothesis (ERH) that
there exists an external physical reality completely independent of us humans.
I argue that with a sufficiently broad definition of mathematics, it implies
the Mathematical Universe Hypothesis (MUH) that our physical world is an
abstract mathematical structure. I discuss various implications of the ERH and
MUH, ranging from standard physics topics like symmetries, irreducible
representations, units, free parameters, randomness and initial conditions to
broader issues like consciousness, parallel universes and Godel incompleteness.
I hypothesize that only computable and decidable (in Godel's sense) structures
exist, which alleviates the cosmological measure problem and help explain why
our physical laws appear so simple. I also comment on the intimate relation
between mathematical structures, computations, simulations and physical
systems.

<|endoftext|><|startoftext|>
Introduction
Cosmological observations may provide several interesting ways of testing string theory, which is
important for its further development. For example, a discovery of the cosmological acceleration
corresponding to the existence of the cosmological constant Λ ∼ 10−120 (in Planck unitsMp = 1,
where Mp = 2.435× 1018 GeV) initially was viewed as a problem for string theory. For a while
is was not known how to describe an accelerating 4D universe in a vacuum state with a positive
energy density. Eventually the problem was resolved by the KKLT construction [1] (developing
on [2]), which allowed to explain acceleration in a metastable vacuum state. Earlier and further
investigation of these issue [3], combined with the ideas of eternal inflation [4, 5], resulted in
the development of the idea of inflationary multiverse [5, 6] and string landscape scenario [7],
which may have important implications for the general methodology of theoretical physics.
There are some other ways in which cosmology can be used for testing string theory. Much
attention of string theory and cosmology communities during the recent few years, starting with
[8], was dedicated to the possible future detection of cosmic strings produced after inflation
[9, 10]. It is viewed as a possible window of a string theory into the real world. If detected,
cosmic strings in the sky may test various ideas in string theory and cosmology.
One may also try to check which versions of string theory lead to the best description of in-
flation, in agreement with the existing measurements of the anisotropy of the cosmic microwave
background radiation produced by scalar perturbations of metric [11]. These measurements
provide an important information about the structure of the inflaton potential [12, 13, 14, 15].
In particular, observational constraints on the amplitude of scalar perturbations, in the slow
roll approximation, imply that
V 3/2
≃ 5× 10−4 , (1.1)
whereas the spectral index of the scalar perturbations is given by
ns = 1− 3
≈ 0.95± 0.02 (1.2)
if the ratio of tensor perturbations to scalar perturbations is sufficiently small, r ≪ 0.1. For
larger values of r, e.g. for r ∼ 0.2, ns = 0.98± 0.02.
However, these data give rather indirect information about V : One can reduce the overall
scale of energy density by many orders of magnitude, change its shape, and still obtain scalar
perturbations with the same properties.
In this sense, a measurement of the tensor perturbations (gravitational waves) [16], or of
the tensor-to scalar ratio r = T/S, would be especially informative, since it is directly related
to the value of the inflationary potential and the Hubble constant during inflation [12],
r = 8
≈ 3× 107 V ∼ 108 H2. (1.3)
The last part of this equation follows from Eg. (1.1) and from the Einstein equation H2 = V/3.
The purpose of this note is to address the issues of string cosmology in view of the possibility
that tensor modes in primordial spectrum may be detected. We will argue here that the possible
detection of tensor modes from inflation may have dramatic consequences for string theory and
for fundamental physics in general. The current limit on the ratio of tensor to scalar fluctuations
is r < 0.3. During the next few years one expects to probe tensor modes with r ∼ 0.1 and
gradually reach the level of r ∼ 0.01. It is believed that probing below r ∼ 10−2 − 10−3 will be
“formidably difficult” [17]. However, the interval between r = 0.3 and r ∼ 10−3 is quite large,
and it can be probed by the cosmological observations.
Expected amplitude of tensor perturbations in stringy inflation appears to be very low,
r ≪ 10−3, see in particular [18, 19]. In Section 2 we will briefly review their results, as well
as some other recent results concerning string theory inflation [20]. In Section 3 we give some
independent arguments using the relation between the maximal value of the Hubble constant
during inflation and the gravitino mass [21], which suggest that in the superstring models based
on generic KKLT construction the amplitude of tensor perturbations in string theory inflation
with m3/2 <∼ 1 TeV should be extremely small, r <∼ 10−24.
One could argue therefore that the experimental detection of tensor modes would be in a
contradiction with the existing models of string cosmology. Let us remember, however, that
many of us did not expect the discovery of the tiny cosmological constant Λ ∼ 10−120, and
that it took some time before we learned how to describe acceleration of the universe in the
context of string theory. Since there exists a class of rather simple non-stringy inflationary
models predicting r in the interval 0.3 <∼ r <∼ 10−3 [22, 23, 24, 28, 25, 26], it makes a lot of
sense to look for tensor perturbations using the CMB experiments. It is important to think,
therefore, what will happen if the cosmological observations will discover tensor perturbations
in the range 10−3 < r < 0.3. As we will see, this result would not necessarily contradict string
theory, but it may have important implications for the models of string theory inflation, as well
as for particle phenomenology based on string theory.
2 Tensor modes in the simplest inflationary models
Before discussing the amplitude of tensor modes in string theory, we will briefly mention what
happens in general non-stringy inflationary models.
The predicted value of r depends on the exact number of e-foldings N which happened after
the time when the structure was formed on the scale of the present horizon. This number,
in turn, depends on the mechanism of reheating and other details of the post-inflationary
evolution.
For N ∼ 60, one should have r ∼ 0.14 for the simplest chaotic inflation model m2φ2/2, and
r ∼ 0.28 for the model λφ4/4. In the slow-roll approximation, one would have r = 8/N for the
model m2φ2/2 and 16/N for the model λφ4/4 [12].
If one considers the standard spontaneous symmetry breaking model with the potential
V = −
(φ2 − v2)2 , (2.1)
with v = m/
λ, it leads to chaotic inflation with the tensor to scalar ratio which can take any
value in the interval 10−2 <∼ r <∼ 0.3, for N ∼ 60. The value of r depends on the scale of the
spontaneous symmetry breaking v [23, 24], see Fig. 1. The situation in the so-called natural
inflation model [25] is very similar [26], except for the upper branch of the curve above the green
star (the first star from below) shown in Fig. 1, which does not appear in natural inflation.
Figure 1: Possible values of r and ns in the theory
(φ2 − v2)2 for different initial conditions
and different v, for N = 60. In the small v limit, the model has the same predictions as the
theory λφ4/4. In the large v limit it has the same predictions as the theory m2φ2. The branch
above the green star (the first star from below) corresponds to inflation which occurs while the
field rolls down from large φ, as in the simplest models of chaotic inflation. The lower branch
corresponds to the motion from φ = 0, as in new inflation.
If one considers chaotic inflation with the potential including terms φ2, φ3 and φ4, one can
considerably alter the properties of inflationary perturbations [27] and cover almost all parts of
the area in the (r, ns) plane allowed by the latest observational data [28].
However, in all of these models the value of r is large because the change of the inflation
field during the last 60 e-folds of inflation is greater than Mp = 1 [29], which is not the case in
many other inflationary models, such as new inflation [30] and hybrid inflation [31], see [29, 32]
for a discussion of this issue. Therefore the bet for the possibility of the observational discovery
of tensor modes in non-stringy inflationary models would be a bet for the triumph of simplicity
over majority.
3 Existing models of string theory inflation do not pre-
dict a detectable level of tensor modes
String theory at present has produced two classes of models of inflation: brane inflation and
modular inflation, see [10, 20, 33] for recent reviews. The possibility of a significant level of
tensor modes in known brane inflation models was carefully investigated by several authors.
The following conclusion has been drawn from our analysis of the work performed by Bean,
Shandera, Tye, and Xu [19]. They compared the brane inflationary model to recent cosmological
data, including WMAP 3-year cosmic microwave background (CMB) results, Sloan Digital Sky
Survey luminous red galaxies (SDSS LRG) power spectrum data and Supernovae Legacy Survey
(SNLS) Type 1a supernovae distance measures. When they used the bound on the distance in
the warped throat geonetry derived by Baumann and McAllister [18], it became clear that in all
currently known models of brane inflation (including DBI models [34]) the resulting primordial
spectrum could not simultaneously have significant deviation from the slow roll behavior and
satisfy the bound [18]. Moreover the slow roll inflation models that satisfy the bound have
very low tensors not measurable by current or even upcoming experiments. The known models
of brane inflation include the motion of a D3 brane down a single throat in the framework of
the KKLMMT scenario [9]. In short, the bound on an inflaton field, which is interpreted as a
distance between branes, does not permit fields with vev’s of Planckian scale or larger, which
would lead to tensor modes. A work on the improved derivation of the bound including the
breathing mode of the internal geometry is in progress [35].
At present, there is still a hope that it may be possible to go beyond the simplest models of
brane inflation and evade the constraint on the field range. However, this still has to be done
before one can claim that string theory has a reliable class of brane inflation models predicting
tensor modes, or, on the contrary, that brane inflation predicts a non-detectable level of tensor
modes.
All known models of modular inflation in string theory (no branes) do not predict a de-
tectable level of gravity waves [33], [20]. The only string theory inspired version of assisted
inflation model [36], N-flation [37], would predict a significant level of tensors, as in chaotic and
natural inflation [22, 25, 26], if some assumptions underlying the derivation of this model would
be realized. The main assumption is that in the effective supergravity model with numerous
complex moduli, tn =
+ iM2R2n, all moduli R
n quickly go to their minima. Then only the
axions φn
remain to drive inflation. The reason for this assumption is that the Kähler potential
depends only on the volume modulus of all two-cycles, R2n = − i2M2 (tn − t̄n), but is does not
depend on the axions φn
(tn + t̄n), so one could expect that the axion directions in the
first approximation remain flat. Recently this issue was re-examined in [20], and it was found
that in all presently available models this assumption is not satisfied. The search for models in
various regions of the string theory landscape which would support assumptions of N-flation is
in progress [38].
Thus at present we are unaware of any string inflation models predicting the detectable
level of gravitational waves. However, a search for such models continues. We should mention
here possible generalizations on N-flation, new types of brane inflation listed in Sec. 5 of [19]
and some work in progress on DBI models in a more general setting [39].
We may also try to find a string theory generalization of a class of inflationary models in
N = 1 d = 4 supergravity, which has shift symmetry and predict large tensor modes. One
model is a supergravity version [40] of chaotic inflation, describing fields Φ and X with
(Φ + Φ̄)2 +XX̄ , W = mΦX . (3.1)
This model effectively reproduces the simplest version of chaotic inflation with V = 1
m2φ2,
where the inflaton field is φ = i(Φ− Φ̄). Here the prediction for r, depending on the number of
e-foldings, is 0.14 <∼ r <∼ 0.20.
Another model is a supergravity version [20] of natural inflation [25].
(Φ + Φ̄)2 , W = w0 +Be
−bΦ . (3.2)
This model has an axion valley potential in which the radial part of the complex field quickly
reaches the minimum. Therefore this model effectively reproduces natural inflation with the
axion playing the role of the inflaton with potential V = V0(1 − cos(bφ)) where φ = i(Φ − Φ̄).
Here the possible range of r, depending on the number of e-foldings and the axion decay constant
2 b)−1, is approximately 5× 10−3 <∼ r <∼ 0.20 [26].
Both models have one feature in common. They require shift symmetry of the canonical
Kähler potential K = 1
(Φ + Φ̄)2,
Φ → Φ + iδ , δ = δ̄ . (3.3)
The inflaton potential appears because this shift symmetry is slightly broken by the superpo-
tential.
If supersymmetry will be discovered in future, one would expect that inflationary potential
should be represented by a supergravity potential, or even better, by the supergravity effective
potential derivable from string theory. It is gratifying that at least some supergravity models
capable of prediction of large amplitude of tensor perturbations from inflation are available.
So far, neither of the supergravity models in (3.1), (3.2) with detectable level of gravity waves
was derived from string theory.1 It would be most important to study all possible corners of the
landscape in a search of models which may eventually predict detectable tensor fluctuations, or
prove that it is not possible. The future data on r will make a final judgment on the theories
discussed above.
If some models in string cosmology with r > 10−3 will be found, one can use the detection of
gravity waves for testing models of moduli stabilization in string theories, and in this way relate
cosmology to particle physics. The main point here is that the value of the Hubble constant
during inflation is directly measurable in case that gravity waves are detected.
4 Scale of SUSY breaking, the gravitino mass, and the
amplitude of the gravitational waves in string theory
inflation
So far, we did not discuss relation of the new class of models with particle phenomenology. This
relation is rather unexpected and may impose strong constraints on particle phenomenology
1There is a difference between arbitrary N = 1, d = 4 supergravity model of the general type and models
derived from string theory where various fields in effective supergravity theory have some higher-dimensional
interpretation, like volumes of cycles, distance between branes etc. However, there are situations in string theory
when the actual value of the Kähler potential is not known and therefore models like (3.1), (3.2) are not a priori
excluded.
and on inflationary models: In simplest models based on the KKLT mechanism the Hubble
constant H should be smaller than the present value of the gravitino mass [21],
H <∼ m3/2 . (4.1)
The reason for this bound is that the mass of gravitino at the supersymmetric KKLT minimum
with DW = 0 before the uplifting is given by 3m23/2 = |VAdS|. Uplifting of the AdS minimum
to the present nearly Minkowski vacuum is achieved by adding to the potential a term of the
type of C/σn, where σ is the volume modulus and n = 3 for generic compactification and n = 2
for the highly warped throat geometry. Since the uplifting is less significant at large σ, the
barrier created by the uplifting generically is a bit smaller than |VAdS|. Adding the energy of
the inflaton field leads to an additional uplifting. Since it is also proportional to an inverse
power of the volume modulus, it is greater at the minimum of the KKLT potential than at the
top of the barrier. Therefore adding a large vacuum energy density to the KKLT potential,
which is required for inflation, may uplift the minimum to the height greater than the height
of the barrier, and destabilize it, see Fig. 2. This leads to the bound (4.1).
100 150 200 250 Σ
Figure 2: The lowest curve with dS minimum is the potential of the KKLT model. The
second one shows what happens to the volume modulus potential when the inflaton potential
Vinfl =
V (φ)
added to the KKLT potential. The top curve shows that when the inflaton potential
becomes too large, the barrier disappears, and the internal space decompactifies. This explains
the constraint H <∼ m3/2.
One should note that an exact form of this bound is a bit more complicated than (4.1),
containing additional factors which depend logarithmically on certain parameters of the KKLT
potential. However, unless these parameters are exponentially large or exponentially small, one
can use the simple form of this bound, H <∼ m3/2.
Therefore if one believes in the standard SUSY phenomenology with m3/2 <∼ O(1) TeV, one
should find a realistic particle physics model where the nonperturbative string theory dynamics
occurs at the LHC scale (the mass of the volume modulus is not much greater than the gravitino
mass), and inflation occurs at a density at least 30 orders of magnitude below the Planck energy
density. Such models are possible, but their parameters should be substantially different from
the parameters used in all presently existing models of string theory inflation.
An interesting observational consequence of this result is that the amplitude of the grav-
itational waves in all string inflation models of this type should be extremely small. Indeed,
according to Eq. (1.3), one has r ≈ 3× 107 V ≈ 108 H2, which implies that
r <∼ 10
8 m23/2 , (4.2)
in Planck units. In particular, for m3/2 <∼ 1 TeV ∼ 4 × 10−16 Mp, which is in the range most
often discussed by SUSY phenomenology, one has
r <∼ 10
−24 . (4.3)
If CMB experiments find that r >∼ 10−2, then this will imply, in the class of theories described
above, that
m3/2 >∼ 10
−5 Mp ∼ 2.4× 1013 GeV , (4.4)
which is 10 orders of magnitude greater than the standard gravitino mass range discussed by
particle phenomenologists.
There are several different ways to address this problem. First of all, one may consider
KKLT models with the racetrack superpotential containing at least two exponents and find
such parameters that the supersymmetric minimum of the potential even before the uplifting
occurs at zero energy density [21], which would mean m3/2 = 0. Then, by a slight change of
parameters one can get the gravitino mass squared much smaller than the height of the barrier,
which removes the constraint H <∼ m3/2.
If we want to increase the upper bound on H from 1 TeV up to 1013 GeV for m3/2 ∼ 1
TeV, we would need to fine-tune the parameters of the model of Ref. [21] with a very high
accuracy. Therefore it does not seem easy to increase the measurable value of r in the model
of [21] from 10−24 up to 10−3. However, this issue requires a more detailed analysis, since this
model is rather special: In its limiting form, it describes a supersymmetric Minkowski vacuum
without any need of uplifting, and it has certain advantages with respect to vacuum stability
being protected by supersymmetry were discussed in [41]. Therefore it might happen that this
model occupies a special place in the landscape which allows a natural way towards large r.
We will discuss now several other models of moduli stabilization in string theory to see
whether one can overcome the bound (4.2).
A new class of moduli stabilization in M-theory was recently developed in [42]. In particular
cases studied numerically, the height of the barrier after the uplifting is about Vbarrier ≈ 50m23/2,
in some other cases, Vbarrier ≤ O(500) m23/2 [43]. It seems plausible that for this class of
models, just as in the simplest KKLT models, the condition that Vbarrier ≥ 3H2 is required for
stabilization of moduli during inflation. Since the gravitino mass in this model is in the range
from 1 TeV to 100 TeV, the amplitude of the tensor modes is expected to be negligibly small.
Another possibility is to consider the large volume compactification models with stringy
α′ corrections taken into account [44]. At first glance, this also does not seem to help. The
AdS minimum at which moduli are stabilized before the uplifting is not supersymmetric, which
means that generically in AdS minimum 3m23/2 = |V |AdS + eK |DW |2 ≥ |V |AdS. Upon uplift-
ing, generically the height of the barrier is not much different from the absolute value of the
potential in the AdS minimum, Vbarrier ∼ |V |AdS. As the result, the situation with the destabi-
lization during inflation may seem even more difficult than in the simplest KKLT models: the
extra term due to broken supersymmetry eK |DW |2 6= 0 tends to increase the gravitino mass
squared as compared to |V |AdS. This decreases the ratio of the height of the barrier after the
uplifting to the gravitino mass squared. However, a more detailed investigation of this model
is required to verify this conjecture. As we already mentioned, an important assumption in
the derivation of the constraint H <∼ m3/2 in the simplest version of the KKLT model is the
absence of exponentially large parameters. Meanwhile the volume of compactification in [44] is
exponentially large. One should check whether this can help to keep the vacuum stabilized for
large H .
But this class of models offers another possible way to address the low-H problem: In
the phenomenological models based on [44] the gravitino mass can be extremely large. Phe-
nomenological models with superheavy gravitinos were also considered in [45, 46]. In particu-
lar, certain versions of the split supersymmetry models allow gravitino masses in the range of
1013 − 1014 GeV [46]. Therefore in such models the constraint H <∼ m3/2 is quite consistent
with the possibility of the discovery of tensor modes with 10−3 <∼ r <∼ 0.3 if the problems with
constructing the corresponding inflationary models discussed in the previous section will be
resolved.
We would like to stress that we presented here only a first scan of possibilities available
in string cosmology with regard to detectability of the tensor modes, and so far the result
is negative. More studies are required to have a better prediction of r in string cosmology.
It would be most important either to construct a reliable inflationary model in string theory
predicting tensors with 10−3 <∼ r <∼ 0.3, or prove a no-go theorem. If tensor modes will not
be detected, this issue will disappear; the attention will move to more precise values of the tilt
of the spectrum ns, non-gaussianity, cosmic strings and other issues which will be clarified by
observations in the next few years.
However, a possible discovery of tensor modes may force us to reconsider several basic
assumptions of string cosmology and particle phenomenology. In particular, it may imply
that the gravitino must be superheavy. Thus, investigation of gravitational waves produced
during inflation may serve as a unique source of information about string theory and about the
fundamental physics in general.
Acknowledgments
We are grateful to D. Baumann, R. Bean, S.E. Church, G. Efstathiou, S. Kachru, L. Kofman,
D. Lyth, L. McAllister, V. Mukhanov, S. Shenker, E. Silverstein and H. Tye for very stimulating
discussions. This work was supported by NSF grant PHY-0244728.
References
[1] S. Kachru, R. Kallosh, A. Linde and S. P. Trivedi, “De Sitter vacua in string theory,”
Phys. Rev. D 68, 046005 (2003) [arXiv:hep-th/0301240].
[2] S. B. Giddings, S. Kachru and J. Polchinski, “Hierarchies from fluxes in string com-
pactifications,” Phys. Rev. D66, 106006 (2002) [arXiv:hep-th/0105097]; E. Silverstein,
“(A)dS backgrounds from asymmetric orientifolds,” arXiv:hep-th/0106209; A. Mal-
oney, E. Silverstein and A. Strominger, “De Sitter space in noncritical string theory,”
arXiv:hep-th/0205316.
[3] W. Lerche, D. Lüst and A. N. Schellekens, “Chiral Four-Dimensional Heterotic Strings
From Selfdual Lattices,” Nucl. Phys. B 287, 477 (1987); R. Bousso and J. Polchin-
ski, “Quantization of four-form fluxes and dynamical neutralization of the cosmologi-
cal constant,” JHEP 0006, 006 (2000) [arXiv:hep-th/0004134]; M. R. Douglas, “The
statistics of string / M theory vacua,” JHEP 0305 046 (2003) [arXiv:hep-th/0303194];
F. Denef and M. R. Douglas, “Distributions of flux vacua,” JHEP 0405, 072
(2004) [arXiv:hep-th/0404116]; M. R. Douglas and S. Kachru, “Flux compactification,”
arXiv:hep-th/0610102.
[4] A. Vilenkin, “The Birth Of Inflationary Universes,” Phys. Rev. D 27, 2848 (1983).
[5] A. D. Linde, “Eternally Existing Self-reproducing Chaotic Inflationary Universe,” Phys.
Lett. B 175, 395 (1986); A. D. Linde, D. A. Linde and A. Mezhlumian, “From the
Big Bang theory to the theory of a stationary universe,” Phys. Rev. D 49, 1783 (1994)
[arXiv:gr-qc/9306035].
[6] A.D. Linde, Particle Physics and Inflationary Cosmology (Harwood, Chur, Switzerland,
1990) [arXiv:hep-th/0503203]; A. Linde, “Inflation, quantum cosmology and the anthropic
principle,” in Science and Ultimate Reality: From Quantum to Cosmos, (eds. J.D. Barrow,
P.C.W. Davies, & C.L. Harper, Cambridge University Press, 2003) [arXiv:hep-th/0211048].
[7] L. Susskind, “The anthropic landscape of string theory,” arXiv:hep-th/0302219.
[8] E. J. Copeland, R. C. Myers and J. Polchinski, “Cosmic F- and D-strings,” JHEP 0406,
013 (2004) [arXiv:hep-th/0312067].
[9] S. Kachru, R. Kallosh, A. Linde, J. M. Maldacena, L. McAllister and S. P. Trivedi, “To-
wards inflation in string theory,” JCAP 0310, 013 (2003) [arXiv:hep-th/0308055].
[10] S. H. Henry Tye, “Brane inflation: String theory viewed from the cosmos,”
arXiv:hep-th/0610221.
[11] V. F. Mukhanov and G. V. Chibisov, “Quantum Fluctuation And ‘Nonsingular’ Universe,”
JETP Lett. 33, 532 (1981) [Pisma Zh. Eksp. Teor. Fiz. 33, 549 (1981)]; S. W. Hawking,
“The Development Of Irregularities In A Single Bubble Inflationary Universe,” Phys. Lett.
B 115, 295 (1982); A. A. Starobinsky, “Dynamics Of Phase Transition In The New In-
flationary Universe Scenario And Generation Of Perturbations,” Phys. Lett. B 117, 175
http://arxiv.org/abs/hep-th/0301240
http://arxiv.org/abs/hep-th/0105097
http://arxiv.org/abs/hep-th/0106209
http://arxiv.org/abs/hep-th/0205316
http://arxiv.org/abs/hep-th/0004134
http://arxiv.org/abs/hep-th/0303194
http://arxiv.org/abs/hep-th/0404116
http://arxiv.org/abs/hep-th/0610102
http://arxiv.org/abs/gr-qc/9306035
http://arxiv.org/abs/hep-th/0503203
http://arxiv.org/abs/hep-th/0211048
http://arxiv.org/abs/hep-th/0302219
http://arxiv.org/abs/hep-th/0312067
http://arxiv.org/abs/hep-th/0308055
http://arxiv.org/abs/hep-th/0610221
(1982); A. H. Guth and S. Y. Pi, “Fluctuations In The New Inflationary Universe,” Phys.
Rev. Lett. 49, 1110 (1982); J. M. Bardeen, P. J. Steinhardt and M. S. Turner, “Sponta-
neous Creation Of Almost Scale - Free Density Perturbations In An Inflationary Universe,”
Phys. Rev. D 28, 679 (1983); V. F. Mukhanov, “Gravitational Instability Of The Universe
Filled With A Scalar Field,” JETP Lett. 41, 493 (1985) [Pisma Zh. Eksp. Teor. Fiz. 41,
402 (1985)]; V. F. Mukhanov, Physical Foundations of Cosmology, Cambridge University
Press, 2005.
[12] A.R. Liddle and D. H. Lyth, Cosmological Inflation and Large-Scale Structure, (Cambridge
University Press, Cambridge 2000).
[13] H. V. Peiris et al., “First year Wilkinson Microwave Anisotropy Probe (WMAP)
observations: Implications for inflation,” Astrophys. J. Suppl. 148, 213 (2003)
[arXiv:astro-ph/0302225].
[14] M. Tegmark et al., “Cosmological Constraints from the SDSS Luminous Red Galaxies,”
Phys. Rev. D 74, 123507 (2006) [arXiv:astro-ph/0608632].
[15] C. L. Kuo et al., “Improved Measurements of the CMB Power Spectrum with ACBAR,”
arXiv:astro-ph/0611198.
[16] A. A. Starobinsky, “Spectrum Of Relict Gravitational Radiation And The Early State Of
The Universe,” JETP Lett. 30, 682 (1979) [Pisma Zh. Eksp. Teor. Fiz. 30, 719 (1979)];
A. A. Starobinsky, “Cosmic Background Anisotropy Induced by Isotropic Flat-Spectrum
Gravitational-Wave Perturbations,” Sov. Astron. Lett. 11, 133 (1985).
[17] J. A. Peacock, P. Schneider, G. Efstathiou, J. R. Ellis, B. Leibundgut, S. J. Lilly and
Y. Mellier, “Report by the ESA-ESO Working Group on Fundamental Cosmology,”
arXiv:astro-ph/0610906.
[18] D. Baumann and L. McAllister, “A microscopic limit on gravitational waves from D-brane
inflation,” arXiv:hep-th/0610285.
[19] R. Bean, S. E. Shandera, S. H. Henry Tye and J. Xu, “Comparing brane inflation to
WMAP,” arXiv:hep-th/0702107.
[20] R. Kallosh, “On inflation in string theory,” arXiv:hep-th/0702059.
[21] R. Kallosh and A. Linde, “Landscape, the scale of SUSY breaking, and inflation,” JHEP
0412, 004 (2004) [arXiv:hep-th/0411011].
[22] A. D. Linde, “Chaotic Inflation,” Phys. Lett. B 129, 177 (1983).
[23] H. J. de Vega and N. G. Sanchez, “Predictions of single field inflation for the tensor/scalar
ratio and the running spectral index,” Phys. Rev. D 74, 063519 (2006).
[24] A. Linde, “Inflationary cosmology,” a contribution to the proceedings of the conference
“Inflation + 25,” in preparation.
http://arxiv.org/abs/astro-ph/0302225
http://arxiv.org/abs/astro-ph/0608632
http://arxiv.org/abs/astro-ph/0611198
http://arxiv.org/abs/astro-ph/0610906
http://arxiv.org/abs/hep-th/0610285
http://arxiv.org/abs/hep-th/0702107
http://arxiv.org/abs/hep-th/0702059
http://arxiv.org/abs/hep-th/0411011
[25] K. Freese, J. A. Frieman and A. V. Olinto, “Natural inflation with pseudo - Nambu-
Goldstone bosons,” Phys. Rev. Lett. 65, 3233 (1990).
[26] C. Savage, K. Freese and W. H. Kinney, “Natural inflation: Status after WMAP 3-year
data,” Phys. Rev. D 74, 123511 (2006) [arXiv:hep-ph/0609144].
[27] H. M. Hodges, G. R. Blumenthal, L. A. Kofman and J. R. Primack, “Nonstandard primor-
dial fluctuations from a polynomial inflaton potential,” Nucl. Phys. B 335, 197 (1990).
[28] C. Destri, H. J. de Vega and N. G. Sanchez, “MCMC analysis of WMAP3 data points to
broken symmetry inflaton potentials and provides a lower bound on the tensor to scalar
ratio,” arXiv:astro-ph/0703417.
[29] D. H. Lyth, “What would we learn by detecting a gravitational wave signal
in the cosmic microwave background anisotropy?,” Phys. Rev. Lett. 78, 1861
(1997) [arXiv:hep-ph/9606387]; D. H. Lyth, “Particle physics models of inflation,”
arXiv:hep-th/0702128.
[30] A. D. Linde, “A New Inflationary Universe Scenario: A Possible Solution Of The Horizon,
Flatness, Homogeneity, Isotropy And Primordial Monopole Problems,” Phys. Lett. B 108,
389 (1982); A. Albrecht and P. J. Steinhardt, “Cosmology For Grand Unified Theories With
Radiatively Induced Symmetry Breaking,” Phys. Rev. Lett. 48, 1220 (1982); A. D. Linde,
“Coleman-Weinberg Theory And A New Inflationary Universe Scenario,” Phys. Lett. B
114, 431 (1982); A. D. Linde, “Temperature Dependence Of Coupling Constants And
The Phase Transition In The Coleman-Weinberg Theory,” Phys. Lett. B 116, 340 (1982);
A. D. Linde, “Scalar Field Fluctuations In Expanding Universe And The New Inflationary
Universe Scenario,” Phys. Lett. B 116, 335 (1982).
[31] A. D. Linde, “Axions in inflationary cosmology,” Phys. Lett. B 259, 38 (1991); A. D. Linde,
“Hybrid inflation,” Phys. Rev. D 49, 748 (1994) [astro-ph/9307002].
[32] S. Chongchitnan and G. Efstathiou, “Prospects for direct detection of primordial gravi-
tational waves,” Phys. Rev. D 73, 083511 (2006) [arXiv:astro-ph/0602594]; G. Efstathiou
and S. Chongchitnan, “The search for primordial tensor modes,” Prog. Theor. Phys. Suppl.
163, 204 (2006) [arXiv:astro-ph/0603118].
[33] J. M. Cline, “String cosmology,” arXiv:hep-th/0612129.
[34] M. Alishahiha, E. Silverstein and D. Tong, “DBI in the sky,” Phys. Rev. D 70, 123505
(2004) [arXiv:hep-th/0404084].
[35] D. Baumann and L. McAllister, private communication.
[36] A. R. Liddle, A. Mazumdar and F. E. Schunck, “Assisted inflation,” Phys. Rev. D 58,
061301 (1998) [arXiv:astro-ph/9804177].
[37] S. Dimopoulos, S. Kachru, J. McGreevy and J. G. Wacker, “N-flation,”
arXiv:hep-th/0507205.
http://arxiv.org/abs/hep-ph/0609144
http://arxiv.org/abs/astro-ph/0703417
http://arxiv.org/abs/hep-ph/9606387
http://arxiv.org/abs/hep-th/0702128
http://arxiv.org/abs/astro-ph/9307002
http://arxiv.org/abs/astro-ph/0602594
http://arxiv.org/abs/astro-ph/0603118
http://arxiv.org/abs/hep-th/0612129
http://arxiv.org/abs/hep-th/0404084
http://arxiv.org/abs/astro-ph/9804177
http://arxiv.org/abs/hep-th/0507205
[38] R. Kallosh, N. Sivanandam, M. Soroush, “Looking for Axion Valley in the Landscape”,
work in progress.
[39] E. Silverstein, work in progress.
[40] M. Kawasaki, M. Yamaguchi and T. Yanagida, “Natural chaotic inflation in supergravity,”
Phys. Rev. Lett. 85, 3572 (2000) [arXiv:hep-ph/0004243].
[41] J. J. Blanco-Pillado, R. Kallosh and A. Linde, “Supersymmetry and stability of flux vacua,”
JHEP 0605, 053 (2006) [arXiv:hep-th/0511042].
[42] B. S. Acharya, K. Bobkov, G. L. Kane, P. Kumar and J. Shao, “Explaining the electroweak
scale and stabilizing moduli in M theory,” arXiv:hep-th/0701034.
[43] We are grateful to the authors of [42] for providing us with this information.
[44] V. Balasubramanian, P. Berglund, J. P. Conlon and F. Quevedo, “Systematics of
moduli stabilisation in Calabi-Yau flux compactifications,” JHEP 0503, 007 (2005)
[arXiv:hep-th/0502058].
[45] O. DeWolfe and S. B. Giddings, “Scales and hierarchies in warped compactifications and
brane worlds,” Phys. Rev. D 67, 066008 (2003) [arXiv:hep-th/0208123].
[46] N. Arkani-Hamed and S. Dimopoulos, “Supersymmetric unification without low en-
ergy supersymmetry and signatures for fine-tuning at the LHC,” JHEP 0506, 073
(2005) [arXiv:hep-th/0405159]; N. Arkani-Hamed, S. Dimopoulos, G. F. Giudice and
A. Romanino, “Aspects of split supersymmetry,” Nucl. Phys. B 709, 3 (2005)
[arXiv:hep-ph/0409232].
http://arxiv.org/abs/hep-ph/0004243
http://arxiv.org/abs/hep-th/0511042
http://arxiv.org/abs/hep-th/0701034
http://arxiv.org/abs/hep-th/0502058
http://arxiv.org/abs/hep-th/0208123
http://arxiv.org/abs/hep-th/0405159
http://arxiv.org/abs/hep-ph/0409232
	Introduction
	Tensor modes in the simplest inflationary models
	Existing models of string theory inflation do not predict a detectable level of tensor modes
	Scale of SUSY breaking, the gravitino mass, and the amplitude of the gravitational waves in string theory inflation
ABSTRACT
  Future detection/non-detection of tensor modes from inflation in CMB
observations presents a unique way to test certain features of string theory.
Current limit on the ratio of tensor to scalar perturbations, r=T/S, is r <
0.3, future detection may take place for r > 10^{-2}-10^{-3}. At present all
known string theory inflation models predict tensor modes well below the level
of detection. Therefore a possible experimental discovery of tensor modes may
present a challenge to string cosmology.
  The strongest bound on r in string inflation follows from the observation
that in most of the models based on the KKLT construction, the value of the
Hubble constant H during inflation must be smaller than the gravitino mass. For
the gravitino mass in the usual range, m_{3/2} < O(1) TeV, this leads to an
extremely strong bound r < 10^{-24}. A discovery of tensor perturbations with r
> 10^{-3} would imply that the gravitinos in this class of models are
superheavy, m_{3/2} > 10^{13} GeV. This would have important implications for
particle phenomenology based on string theory.

<|endoftext|><|startoftext|>
Introduction 
     Quick behavioral response to strong aversive stimuli (such as threat from a predator or 
an imminent danger of being hurt) is a key to survival throughout the animal kingdom. 
Network models of animal behavior have been elaborately discussed in (Schmajuk 1997). 
A neural network model of rats’ anxiety behavior had been studied by Salum and 
colleagues (Salum et al. 2000), but they didn’t take into account the functioning of the 
nerve cells in the brain during the task performance. Recently a neurodynamical model 
for conditional visuomotor association task has been proposed (Loh & Deco 2005). In 
this model a trial and error paradigm has been assumed in a stochastic decision space. An 
integrate and fire neuronal network model has been proposed to realize the paradigm. A 
neural network model of brain or cognitive state machine (CSM) to study decision 
making in a competitive environment has also been proposed (Rabinovich et al. 2006). 
The dynamics of making a choice from among multiple conflicting options has been 
formulated by Lotka-Volterra type of equations. It is evident that intelligent decisions in a 
sequential behavior have to be stable against noise and reproducible to allow 
memorization and reuse of successful decision sequences in the future. On the other 
hand, it also has to be sensitive to new information from the environment. These two 
fundamentally contradictory requirements have been taken care of in (Rabinovich et al. 
2006). 
     In this paper a neurodynamical model of response behavior of a neural network to 
strong aversive stimuli has been presented. The assumption is that the network will 
behave in a manner to avoid repeating the negative experiences of the past. No specific 
neuronal network model has been assumed. In this approach information has been 
extracted (at least theoretically) from a behaving neuronal network by FFT on spike 
trains of all the neurons in the network, which should work across all networks of 
neurons irrespective of their architecture. The neurodynamical system has a unique 
representation in  the information space where the Fourier coefficients of a spike train are 
arranged as a vector and uniquely represent the spike train. All the calculations have been 
carried out in that space. The model depends on past memory, synaptic plasticity and 
intensity of feeling. Interestingly, a short latency (120 – 160 ms) had been reported before 
response to aversive stimuli in the right prefrontal cortex of a human subject. No such 
latency was observed in case of pleasant or neutral stimuli (Kawasaki et al. 2001). The 
present model can offer a possible explanation for this apparently perplexing 
phenomenon. The following assumptions have been made: 
1) Structure: Any behavior involves a network in the nervous system which if 
represented as a directed multigraph will have neurons as vertices and synapses as 
(directed) edges. 
2) Function: The functional behavior of this network is manifested by the collective 
behavior of the neurons present in the network. 
3) Memory: Memory of experiences of past behaviors of this network is stored 
entirely within the network and nowhere out side of it. 
4) Plasticity: Plasticity of each synapse is a time dependent function (called synaptic 
weight) and the total plasticity of the network between any two behaviors is also a 
time dependent function called network plasticity. 
5) Feeling: Feeling is associated with each behavior as a specific mathematical 
function which controls how experience associated with this behavior will mediate 
the intensity of effect of this behavior on any other behavior. 
6) Interaction: Network plasticity mediates the interaction of the ongoing behavior 
with the memory of past behaviors stored in the network. 
In section 2 each of the above points will be explained briefly and will be represented 
with appropriate mathematical expressions and equations. In section 3 with the help of an 
analogy from classical electrostatics the behavior dynamics of the network in terms of 
those expressions and equations will be formulated. Notion of a force like and a potential 
energy like expressions have been introduced. Despite computational difficulties 
involved with the model, in section 4 it has been related to reality first by offering an 
explanation to an hitherto unexplained observation in the human brain, second relating it 
with a successful decision making model and third relating this electrophysiological 
model to hemodynamical activity of the brain. The paper will be concluded with 
discussions and future directions. 
2. Modeling of the parts 
A) Structure 
     The closest computational analog of a neuronal circuit is a directed multigraph, whose 
each node will represent a neuron and each edge a synapse (Majumdar & Kozma, 2006). 
It can be represented as a three dimensional array ],,[ kjia . If there are p  different 
synapses joining the neuron i  with the neuron j  (assuming that all neurons in the brain 
have been numbered) then ],,[ kjia  will give the weight (the gain in synaptic 
transmission) of the kth  synapse joining the neuron i  with the neuron j , where 
pk ≤≤1 . If the neuron i  and neuron j  are excitatory then ],,[ kjia  will be positive, if 
they are inhibitory then ],,[ kjia  will be negative. For given values of ji,  and k  ],,[ kji  
uniquely determines a synapse and ],,[ kjia  represents the synaptic weight. In general 
],,[ kjia  will be a function of time and may be written as )](,,[ tkjia  or )(taijk , where t  
denotes time. 
B) Function 
     Computations in the central nervous system (CNS) have been viewed from three 
different angles – (a) synaptic computation (Abbott & Regehr 2004), (b) dendritic 
computation (London & Hausser 2005) and (c) neuronal computation (Koch 1999; 
Borisyuk & Rinzel 2005). In a neuronal network synapses are activating the neurons and 
neurons are activating the synapses. In this sense the aggregate behavior of the neurons in 
a network is equivalent to aggregate behavior of the synapses in that network. On an 
average every neuron receives input from about 
10  synapses and therefore from the 
computational modeling purpose it would be more convenient to consider the aggregate 
behavior of neurons rather than the aggregate behavior of synapses in a network. The 
behavior of a neuron during a particular epoch of time is completely represented by the 
spike train it generates during that period of time. A neuronal spike train carries 
information in the following manner (Kandel, et al. 2000).  
(1) The number of action potentials (spikes); and 
(2) The time intervals between them. 
(Although this argument simplifies the description of brain functions to a great extent it 
ignores the reality of occurrence of change in the neural circuit without changing the 
firing patterns of the neurons. This has been discussed in Conclusion). The duration of a 
spike is typically 1 to 2 milliseconds (ms, depending on the temperature) (Koch 1999). 
When the sample frequency is high (1000 Hz or more) by FFT reliable information out of 
a neuronal spike train can be extracted. The FFT produces the vector 
bababaa ....
22110
where 
     
=na (nth  Fourier coefficient + conjugate of nth  Fourier coefficient),                (2.1) 
     
= (nth  Fourier coefficient – conjugate of nth  Fourier coefficient)                (2.2) 
1−=i . For convenience it can be rewritten as )12().....1( +ree , where 
     
− oddisnifbne
evenisnifane
,                                                                                       (2.3) 
in uniform symbol. More conventionally the vector kv  associated with the spike train of 
the kth  neuron in the network can be written as  
     
kkk reev ))12(),...,1(( += .                                                                                       (2.4) 
The suffix T  stands for transpose. If the duration of the spike train is p  seconds then 
pr 500=  (assuming sample frequency is 1000 Hz). Clearly the vector kv  uniquely 
represents the spike train of the kth  neuron in the network, assuming all the neurons in 
the network have been uniquely numbered. Since the Fourier series is convergent 
0)12( →+rek  as ∞→r . 
     Let there are N  neurons in the network. The behavior iB  of the network for a 
duration of p  seconds is represented by the cluster of N vectors { }N
kv 1= , where 
     
k reev ))12(),...,1(( += ,                                                                                       (2.5) 
     p
= ,                                                                                                                    (2.6) 
f  is sample frequency. It is important to note that even when a neuron is oscillating 
below threshold for spike initiation, it can still release neurotransmitter and shape the 
final circuit output (Harris-Warrick & Marder 1991). But for simplicity of modeling I 
shall ignore this fact in this paper. 
C) Memory 
     Memory is no single concept and there is no universally agreed upon definition of it. 
However the starting point for virtually any scientific analysis of memory involves a 
decomposition into processes of encoding, storage and retrieval  (Schacter 2004). Wilder 
Penfield explored the cortical surface in more than a thousand epileptic patients. On rare 
occasions (about 8% of all the subjects he tried) he found that electrical stimulation in the 
temporal lobes produced a coherent recollection of an earlier experience (Kandel et al. 
2000). A similar phenomenon has been observed by stimulating the inferotemporal (IT) 
cortex of macaque monkeys (Afraz et al. 2006). By mild stimulation to the IT in the 
macaque brain (previously trained to distinguish between face and non-face) impression 
of seeing a face was created in the mind of the animals where there was actually no face. 
An opposite phenomenon has been reported for the human brain (Quian Quiroga et al. 
2005). Visualization of images of interesting objects including faces of celebrities can 
make particular neurons to fire in the sub-region of the medial temporal lobes (MTL) of 
the human brain consisting of hippocampus, amygdala, entorhinal cortex and 
parahippocampal gyrus. This supports the hypothesis – object cognition activates a 
network in the brain and artificially activating the network to the appropriate degree will 
create the impression of perceiving the object in the brain even when it is not present in 
the environment. The following hypothesis seems to hold. 
A particular cognition involves a particular network in the brain where the memory of the 
cognition remains stored. The higher processing areas of the network takes increasingly 
greater part in the cognition and also greater part in storing and retrieving the memory. 
For the purpose of this paper it would be enough to be able to express memory in terms 
of behavior of a network. If the behavior of the network is iB  the memory associated 
with it will be { }N
ki uM 1== . The relationship between 
ku  and 
kv  is determined by long 
term potentiation (LTP) and long term depression (LTD) if iM  is residing in the network 
long after iB  has happened. It may be appropriate to emphasize at this point that the 
collective firing pattern of neurons (i.e., the collective spike trains of neurons belonging 
to the network) to evoke iM  is preserved by the synaptic connections in the network, for 
they control input to the neurons in response to the stimuli and therefore iM  is stored in 
the synaptic strengths of the network. A metric between iB  and iM  can be defined in the 
following manner 
     { }∑ ∑
MBd ii ,                                                          (2.7) 
where N  is the number of neurons in the network. Although virtually impossible to 
compute in this form d  can be a measure of plasticity of the whole network with respect 
to iB  and iM . Note that the time duration has been taken care of in p  at the time of 
determining )( je i
 and )( je i
 (neuronal firing patterns may not be identical at the time 
of a behavior and recalling it later both with respect to identical set of stimuli). A single 
network can mediate multiple behaviors (Harris-Warrick & Marder 1991) and therefore 
can store multiple memories. 
     Memory associated with a behavior may either be positive (appetitive) or negative 
(aversive) or neutral or a combination of them. The sense positive, negative and neutral 
are totally subjective and no attempt will be made here to define them. The meaning will 
become clear in the contexts in which they occur. A simple behavior is one with which 
only one type of memory (i.e., either positive or negative or neutral) remains associated. 
A behavior with combined types of memory can be called complex behavior and let us 
assume any behavior can be decomposed into simple behaviors. A network therefore can 
be thought to have memories of simple behaviors only. When the difference between iB  
and iM  is small i.e., in (2.7) ),( ii MBd  becomes small, consolidation of iM  is good. 
),( ii MBd  is intimately related to synaptic plasticity. 
D) Plasticity 
     (2.7) gives us an immediate measure (no matter however difficult it is or may even be 
impossible to implement) of plasticity of a neuronal network. The measure of combined 
long term plasticity )(tW  as a combination of long term potentiation (LTP) and long 
term depression (LTD) can be given by the following formula 
     ∑
ii MBd
),( ,                                                                                        (2.8) 
where p is the duration of recording of the neural response when iB  is happening and 
when iM  is being retrieved both in response to identical set of stimuli and t  is the time 
difference between end of happening of iB  and start of retrieving iM . Usually p  
remains fixed and therefore )(),( tWtpW = . m  is the total number of past behaviors 
whose memory is still preserved in the network. When the gap between occurrence of iB  
and recalling of iM  is long (30 minutes or more according to some estimate (Koch 
1999)) (2.8) gives the long term plasticity and when it is shorter (2.8) gives short term 
plasticity. The following notion will also be useful 
     ),()( iii MBdtW = ,                                                                                                   (2.9) 
where )(iWi  is the network plasticity between behavior iB  and memory iM  after time t . 
E) Feeling 
     From a modeling or computational point of view the feelings may be taken as 
mediating the intensity of a behavioral response. In this sense if iB  is a simple behavior 
the feeling associated with it determines how intensely negative or intensely positive will 
be the memory of it. Ideally iM  should have an ‘intensity distribution function’ similar 
to a normal distribution function, which is decayed as the distance from the mean 
position is increased. iM  is not a single point, but a cluster of points. So if the intensity 
distribution function is to be modeled after the normal distribution function the most 
appropriate candidate for the mean point turns out to be the mean point of iM  
     
M ii 
+= ∑∑
)12(),.......,1(
,                                                           (2.10) 
where T  stands for transpose. The feeling function RRF
:  associated with iM  is 
to be defined as 
     
2/12/)12(
i MXMXXF
,                               (2.11) 
where Σ  is the covariance matrix and Σ  is the determinant of Σ . Since X  has 12 +r  
independent component variables only diagonal entries of Σ  are nonzero and each of 
them is the variance of a component variable in X . Σ  is the parameter which controls 
the intensity of )(XFi  beyond iM . The hormones and neuromodulators responsible for 
mediating feelings act by controlling the entries of Σ . 
F) Interaction 
     Interaction means here the part played by synapses in the network in mediating the 
effects of sM i '  on an ongoing behavior B . In mathematical term it can be put as 
     ))(())(()(
tpWfttpWftI iii −−+−= ,                                                                 (2.12) 
where )(
tI i  denotes the interaction and t  is any instant during occurrence of B . if  is a 
continuous function which is almost everywhere differentiable. When B  takes place the 
plasticity of the network changes and so also the interaction. Let us normalize interaction 
by the following formula 
     
)( ,                                                                                                   (2.13) 
where T  is the duration of happening of B . 
3. Integrative dynamics 
     Now let us briefly consider a phenomenon in electrostatics. Let there be seven charged 
particles on a plane and they are all in arbitrary but fixed positions. Four of them are 
positively charged and three are negative. A new negative charge is introduced, which is 
allowed to move freely in the plane, that is, it has two degrees of freedom along the X 
and Y axes. Assume apart from sign all the charges are quantitatively equal. 
     Let the locus of the introduced negative charge be ),( yx . The position of four positive 
charges be { }4
),( =iii yx  and that of the three negative charges be { }
),( =iii yx . The 
Coulomb force ),( yxF  acting on the free charge is given by 
     ∑∑
== −+−
)()()()(
i iii ii yyxx
yxf ,                          (3.1) 
where C  is a constant. (3.1) will govern the dynamics of the whole system. Now what 
should be the condition to keep the introduced charge fixed within a bounded region 
enclosing the seven fixed positioned charge so that the potential energy on the introduced 
particle becomes minimum? 
     On line of this analogy the dynamics of a new behavior B  should follow the 
governing expression )(BG  given by the following equation 
     
( ) ( )
= +−=
smi i
)( ,                                                    (3.2) 
where m  is the number of simple behaviors whose memories are stored in the network 
under consideration. It has been assumed that B  will be attracted towards s  behaviors 
with positive experience and will be repulsed by the remaining sm −  behaviors with 
negative experience. Like (3.1) there is no apparent reason why inverse square law 
should hold for (3.2) also. If the inverse square law does not hold then the denominators 
on the right side of (3.2) should be replaced by a general polynomial in the metric d , for 
any continuous function can be approximated to any desired degree within a compact 
interval by a suitable polynomial. Like the electrostatic situation for a given iM  )( iMG  
must have a unique pole at the iM . This means in general (3.2) can have the form 
      
( ) ( )
= +−=
1 1 ),(
)( ,                                                   (3.3) 
for some positive n . However in this paper I shall adhere to (3.2), for the model here is 
essentially electrophysiological and therefore Coulomb interaction seems to be probable. 
In summary the dynamics of the new behavior B  is described by (2.7), (2.11), (2.13) and 
(3.2). 
     In analogy with the electrostatic system (3.2) is a ‘force like’ expression with which a 
‘potential energy like’ expression needs to be associated, which for the sake of stability 
of the system (i.e., if perturbed infinitesimally will come back to the original state) needs 
to be minimized. Let the potential function associated with B  due to iM  be 
iφ  given by 
     ∑
))12(),......,1((φφ .                                                                           (3.4) 
The potential function associated with the emerging new behavior B  due to all previous 
behaviors iB  ( iM  is memory of iB ) be φ , which is given by 
     ∑
φφ .                                                                                                                (3.5) 
To derive the potential energy classically from the force field the force must have to be 
conservative, i.e., independent of time (Goldstein 1950). In (3.2) the expression )(tI i  is a 
function in time. However in case of a very strong aversive stimulus like, locating a 
predator dangerously close all of a sudden or being on the verge of falling down deep 
underneath from a very high roof top after a sudden slip, an extremely fast behavioral 
response must have to be shown. Within this duration synaptic plasticity does not get 
much chance to act and the feeling must have to be very strong (such as intense fear) to 
compensate for that as is evident from (3.2). This gives us an opportunity to treat )(tI  as 
a fixed quantity which makes )(BG  in (3.2) time independent for the duration of B . 
Then in analogy with the electrostatic system )(BG  must satisfy the following relation 
     ∑ ∑∑
.                                                                       (3.6) 
The value of )(BG  has only been considered and not its direction. (3.2) and (3.6) 
together give 
     
( ) ( )
∑ ∑∑ ∑∑
= +−== =
smi i
jeN 1 1
.                 (3.7) 
(3.7) describes the dynamics of the behavior of B  in the 
12 +r
R  under the assumption that 
)(tI i  is constant (otherwise the ‘force field’ would not have been conserved and only 
space dependent potential expressions could not have been brought in the dynamics). 
Also the time scale is very small. φ  as given by (3.5) will have to be minimized 
(minimization of energy is an important criterion for neural computation (Laughlin 
2004)), which means 
     )}12(,.....,1{,0
.                                                                          (3.8) 
Combining (3.4) and (3.5) the expression for φ  becomes 
     ∑∑
k ree
))12(),......,1((φφ .                                                                        (3.9) 
If (3.9) is to give a global minimum it should not only be true when φ  is given by (3.9), 
but also it must hold when φ  is given by 
     ∑ ∑
kki reeyx
))12(),......,1((φφ ,                                                                (3.10) 
where )1,1(, ccyx ki +−∈  for some small 0>c . This is because a global minimum is a 
very stable position and therefore under a small perturbation the system always comes 
back to it. Let ikki zyx = . Then combining (3.8) and (3.10) the following is obtained 
     
+++ 0
.....
.....
.....
.....
.....
.....
.....
12,,12,2,112,1,1
2,,2,2,12,1,1
1,,1,2,11,1,1
NmrNmrr
,                                                (3.11) 
where 
))12(),......,1((
. Note that each kiz ,  or ikz  can take 
uncountably infinitely many values from some (small) open interval and for all of them 
(3.11) holds under the perturbation principle. Therefore the mNr ×+ )12(  matrix on the 
left of (3.11) must represent a null linear transformation and it must be a null matrix, 
which implies 
     jki
jki ,,,0
))12(),......,1((
,, ∀=
.                                                      (3.12) 
(3.7) and (3.12) together imply 
     
( ) ( )
= +−=
smi i
.                                                                   (3.13) 
(3.13) says, “Negative and positive experiences in a neuronal network must counter 
balance each other for a stable (which will not be altered due to presence of some amount 
of noise or distraction) unsupervised learning from the experience gained through a new 
behavior.” 
4. Application 
     What do we get from the principle enunciated at the end of the last section in response 
to a strong aversive stimulus? Its mathematical formulation (3.13) says, in the face of 
strong aversive stimuli (such as an imminent danger) the ensuing behavior B  must avoid 
repeating the past behaviors sBi '  with negative experience sM i ' . Note that the stimuli 
can invoke an iM  if and only if iB  is at least partially activated by the stimuli. (3.13) 
says the ensuing behavior B  will have to be such that none of the sBi '  with negative 
experience is repeated. This means in the information space 
12 +r
R  B  will have to sit 
away from each 
iM , which denotes memory of a negative behavior. B  will have its 
own positive and negative parts which will later be stored as new positive and negative 
memories in the network. B  cannot be neutral. This will be the subject of a future work. 
A) Aversion response latency 
     The prefrontal cortex participates in linking perception of stimuli to the guidance of 
behavior including the flexible execution of strategies for obtaining rewards and avoiding 
punishments as an organism interacts with its environment. Recording from neurons 
within healthy tissue in the ventral sites of the right prefrontal cortex short latency (120–
160ms) responses selective for aversive visual stimuli have been observed. No such 
latency was observed for pleasant or neutral stimuli (Kawasaki et al. 2001). (3.13) can 
offer us an explanation of this phenomenon. Aversive stimuli do evoke negative (or 
aversive) memory (that is how the stimuli are identified as aversive, even a novel 
aversive stimulus will have to be decomposable into known aversive features) and 
therefore the ensuing behavior B  must ensure that it has minimum overlap with the 
negative memories in the space 
12 +r
R . This in turn makes sure that B  acts on the 
network as less as possible to repeat the behaviors associated with the negative memories. 
     When there will be only aversive stimuli (no pleasant stimulus) the new behavior will 
have to be organized to avoid the stimuli particularly when the stimuli are strongly 
aversive (like an immediate threat). Clearly a strongly aversive stimulus must invoke the 
memory  
jM  of at least one simple negative behavior with which a strong feeling is to 
be associated. In other words amygdale make sure that the Σ  in (2.11) have larger 
entries. Then there must be the memory 
jM  of a simple positive behavior which can 
take appropriate action in response to the behavior (sensation) corresponding to 
jM . 
Even a new born is hardwired to express displeasure by crying in response to an aversive 
stimulus so that some one else (possibly the mother) is alerted and come in help to avoid 
the stimulus. In order to avoid the aversive stimuli the dynamics of B  will be such that 
(3.13) can take the following form 
     
( ) ( )∑∑ == +
,                                                                          (4.1) 
where )(XF j  is the feeling associated with 
jM  and )(tI j  is the network interaction 
between B  and 
jM , k  is the number of simple positive behavior recalled. This will 
counterbalance the aversive effect of the stimuli and make sure that (3.13) holds. 
     Whereas in case of pleasant or neutral stimuli (without the presence of aversive 
stimuli) avoidance is not necessary, the brain is free to repeat the behaviors with pleasant 
memory. In that case no global minimization of the potential function will be necessary 
and left side of (3.7) does not need to be zero. Therefore equality in (4.1) will also not be 
necessary. The new behavior can sit anywhere and no ‘optimum positioning’ (making the 
potential function globally minimum) will be necessary and this will require less time for 
a response and therefore there will be no latency in the response to pleasant stimuli. 
     When there will be both positive and negative stimuli (3.13) will hold. Thus it appears 
that whenever aversive (negative) stimuli are present either (3.13) or (4.1) needs to be 
calculated. Time taken by this ‘calculation’ in the brain is the reason for latency when 
there are aversive stimuli. Whereas no such latency is necessary for positive or neutral 
stimuli. Probably the brain does not calculate the way shown in this paper. Had it done so 
it would have taken a much longer latency. But the reasoning here is compelling enough 
to conclude that calculations responsible for the observed latency do take place in the 
brain in one form or the other. 
B) Sequential decision making 
     A brain always has to make choices i.e., a behavior is a series of switching or decision 
making procedures. Rabinovich and colleagues have considered the dynamics of decision 
making (DM) by the brain or cognitive state machine (CSM) at the psychological level 
(Rabinovich et al. 2006). Whereas the focus of this paper is on the dynamics of neural 
substrate of such processes. At each DM instant of a CSM the underlying neural 
dynamics of the DM generates the psychological process of DM in the CSM. 
     The dynamics of the CSM is governed by the Lotka-Volterra type equation 
(Rabinovich et al. 2006) 
     )(),( taatIaa i
jijiiii ηρσ +
+−= ∑
& ,                                                             (4.2) 
where )(tai  is the state of the CSM, ),( tIiσ  is a control function which controls the 
dynamics given by (4.3), I  is the input (environmental stimulus), ijρ  is a coupling 
constant between ia  and ja  based on genetic and memorized information (very similar 
to the interaction )(tI i  (2.13) between the memories of the old behavior and the ensuing 
new behavior, which in this paper has been taken to be constant), N  the total number of 
states and )(tiη  is the external noise. Note that in (4.2) the meaning of iσ  and N  are 
different than anywhere else in the earlier part of this paper. 
     
& ,                                                                                                  (4.3) 
where iU  is a potential function and τ  is characteristic time which is very small. 
Difference in states occur at the it  where a decision needs to be made as shown in Figure 
1. ia  can be chosen at an instant jt , where a decision has to be made, as many ways as 
there are minima of iU . 
                             
Figure 1: A sequence of cognitive states. Thin lines are possible paths and the decision 
path has been shown by thick lines. 9643 ,,, tttt  are the instances to make a decision. 
Adopted from (Rabinovich et al. 2006). 
     (3.13) is concerned about the dynamics of neural substrate of each single decision 
making (DM) in the sequence of DM’s shown in Figure 1. (3.13) is also derived by 
minimizing the potential function (3.9). Since the dynamics given by (3.7) is concerned 
about a local decision making in Figure 1, it was possible to take global optimum of the 
potential function given by (3.9) (global optimum of local potential function) and its 
stability ensures that the local dynamics is not perturbed by a small amount of noise. It is 
very important for stable cognition or stable behavior. According to this model the 
alteration in behavior (to make it dynamic) should be introduced by changing feeling 
(given by (2.11)) and synaptic plasticity (given by (2.13)), which happens from state to 
state in a CSM. Therefore despite the global minimum of the local dynamics the CSM 
manages to move on from one state to the next till it reaches the end of life which may be 
signified by lack of emotional impetus. On the other hand at the sequential DM level in a 
CSM (Figure 1) introduction of noise plays a significant role in dynamically altering the 
behavior (equation (4.2)). In both the dynamical systems of this paper and that of 
Rabinovich et al. the time length has been taken to be short. The duration has been taken 
so short that the dynamics of ),( tIiσ  (in (4.3)) and )(tI i  (in (2.13)) do not change 
within that period. In this scenario comparing (3.13) and (4.2) it appears that the external 
noise )(tiη  may take an important role in mediating feeling (emotional distraction or 
attraction with respect to an external stimulus). 
     (3.13) and (4.1) describe how new behavior is formed depending on synaptic 
plasticity, memory and feeling. However there is no deterministic way to compute the 
new behavior from (3.13) or (4.1) even if all the information are available. Also (3.13) or 
(4.1) will be extremely difficult (if not impossible) to compute except for cases like a 
simple behavior (such as inking or gill withdrawal) of a simple animal (such as the sea 
snail Aplysia californica). This is in conformity with the fact that closer is the model to 
the neuronal network level of the brain the more difficult it would be to implement. On 
the other hand the CSM dynamics given by (4.2) is very convenient to compute. 
C) Potential function 
     The potential function introduced by (3.9) in the neuronal network dynamics of the 
brain can be related to metabolic energy consumption during activation of the network to 
execute a behavioral or cognitive task. (3.9) is based on electrophysiology and 
metabolism is related to hemodynamics. Lot of research is going on to find the 
relationship between these two activities of the brain (Logothetis & Wandell 2004). 
Recently a correlation study between stimulus based ERP and fMRI in different parts of 
the human brain has been reported (Gore et al. 2006). Despite convincing evidence of 
their interdependence, such as in the form of dependence of fMRI on ERP or EEG, 
precise knowledge about the nature of dependence is still lacking. A linear transform 
model had been proposed based on the hypothesis that fMRI responses are proportional 
to local average neural activity averaged over a period of time (Boynton et al. 1996). It 
has been reported that in the visual cortex of macaque monkey fMRI response depends 
closely on the local field potential (LFP) (Logothetis et al. 2001). 
     Note that the potential function given by (3.5) can be obtained from (3.7) when the 
time course is short and the quantities on the right side are known. From (3.7) it is clear 
that the potential function φ  (which is defined on the Fourier coefficients of neural spike 
trains (3.9)) depends on past memory, synaptic plasticity and feelings. In case of a 
potential function φ  sum of local components will give the function over the whole 
space, and therefore no matter how large and distributed the network is, local component 
of φ  at a point x  on the cortex (let be denoted by xφ ) can be represented by (3.7) where 
only the neurons in a small neighborhood of x , denoted by )(xn , will participate in the 
computation of xφ . Since xφ  defined on the Fourier coefficients of spike trains of the 
neurons in )(xn  and the value of xφ  depends on past memories, plasticity of synapses 
within )(xn  and part of feeling arising within )(xn  it is supposed to have a role in BOLD 
fMRI response at x  with respect to a stimulus. xφ  may have an affine or linear relation 
with stimulus driven difference in local BOLD signal at x  as modeled in (Logothetis & 
Wandell 2004). But this needs experimentations to establish. 
Conclusion 
     In this paper a system level model of the brain functions for behavior has been 
proposed. The model needs input from individual neurons under the assumption that a 
brain circuit responsible for a behavior can be understood by the behavior of neurons 
alone. This assumption has limitations, for significant change in a neural network may 
occur at the spiking sub-threshold level in the neurons and therefore without changing the 
spike trains (Harris-Warrick & Marder 1991). Note that FFT based information retrieveal 
technique followed in this paper will hold equally good for subthreshold signals, whereas 
spike detection or prediction algorithms may not work for those signals. To account for a 
fuller neural computation synaptic computation (Abbott & Regehr 2004) and dendritic 
computation (London & Hausser 2005) will also have to be incorporated. A close 
investigation into the nature of )(tI  (equation (2.13)) will inevitably call attention to 
synaptic and dendritic computations. The model equation (3.2) will be valid in all 
generality, but the important consequence (3.13) cannot be drawn as easily as shown in 
this paper. 
     The effectiveness of the model dynamics is only for a very short period of time and 
therefore only behaviors in response to strong aversive stimuli have been considered, for 
such response behaviors must have to be very quick. The time duration is so short that it 
has been assumed – the average change in the collection of synapses in a network 
remains fixed within that time. This is an important constraint for introducing potential 
energy function whose global minimization gives stability to the behavioral response in 
the sense that it remains unperturbed in a noisy environment. 
     Even without incorporating synaptic and dendritic computations the model stands 
extremely difficult to implement (for example, calculating iB  and then calculating iM  
will be very challenging). However it can be implemented in case of a simple behavior by 
a simple animal. In invertebrates like the sea snail Aplysia lesser number of neurons will 
have to be monitored (only a few in case of simple behaviors like gill withdrawal or 
inking) and single cell recordings will be less challenging than the mammals. This will 
make the verification of the system possible. In case of human brain functions if the 
behavior of the circuit involved can be monitored to a good extent by recording signals 
only from a few neurons verification of the model may be feasible for some very simple 
behavior like the following. Consider the experiment of tapping the leg (Kandel et al. 
2000). If there is a sharp edge such that dragging the leg too much backward will make it 
bump on the sharp edge which is another aversive stimulus. The final position of the leg 
will be away from the source of tapping as well as the sharp edge. 
     If a relationship between the potential function φ  and the metabolic or hemodynamic 
activity of the brain can be established, study of the system will become easier. 
Introduction of an electrophysiological potential function in the neural computation is a 
significant outcome of the FFT based information extraction method followed in this 
paper. Apart from the prospect of relating it to metabolic energy requirements of the 
brain φ  gives the opportunity of deriving interesting theoretical results as shown in this 
paper. 
     Since the expression of feeling as given by (2.11) is independent of time here the 
detail of feeling did not have to be considered. But feeling is likely to be dependent on 
time in the long run and in that case the entries of Σ  (which is a diagonal matrix) will be 
function of time and the dynamics will be much more complicated. How the eigen values 
of Σ  are controlled by the limbic system of the brain is an open question. No attempt has 
been made in this paper to answer the question. 
Acknowledgement 
     The author thankfully acknowledges the Institute of Mathematical Sciences in 
Chennai, India for a postdoctoral fellowship under which this work has been carried out. 
Some comments by Peter Dayan on a preliminary version of this manuscript are also 
being acknowledged. 
References 
Abbott, L. A. and Regehr, W. G. (2004) Synaptic computation. Nature 431:796–803. 
Afraz, S. R., Kiani, R. and Esteky, H. (2006) Microstimulation of inferotemporal cortex influences face 
categorization. Nature 442: 692–695. 
Borisyuk, R. and Rinzel, J. (2005) Understanding neuronal dynamics by geometrical dissections of minimal 
models. In: Chow., C, Gutkin, B, Hansel, D, Meunier, C and Dalibard, J, (eds.) Models and Methods in 
Neurophysics, Proc Les Houches Summer School 2003, (Session LXXX) Elsevier:19–72. 
Boynton, G. M., Engel, S. A., Glover, G. H. and Heeger, D. J. (1996) Linear systems analysis of functional 
magnetic resonance imaging in human V1. J. Neurosci. 16(13): 4207–4221. 
Engel, A.K., Fries, P. and Singer, W. (2001) Dynamic predictions: oscillations and synchrony in top-down 
processing. Nat. Rev. Neurosci. 2: 704–716. 
Goldstein, H., (1950) Classical Mechanics, Reading, MA, Addison-Wesley.  
Harris-Warrick, R. M. and Marder, E. (1991) Modulation of neural networks for behavior. Annu. Rev. 
Neurosci. 14: 39–57. 
Gore, J. C., Horovitz, S. G., Cannistraci, C. J. and Skudlarski, P. (2006) Integration of fMRI, NIROT and 
ERP for studies of human brain function. Magnetic Resonance Imaging 24: 507–513. 
Kandel, E. R., Schwartz, J. H. and Jessell, T. M. (2000). Principles of Neural Science, New York, McGraw 
Hill. 
Kawasaki, H., Adolphs, R., Kaufman, O., Damasio, H., Damasio, A. R., Granner, M., Bakken, H., Hori, T. 
and Howard III, M. A. (2001) Single-neuron responses to emotional visual stimuli recorded in human 
ventral prefrontal cortex. Nature Neuroscience 4: 15–16. 
Koch, C. (1999) Biophysics of Computation: Information Processing in Single Neuron. Oxford University 
Press, New York. 
Laughlin, S. B. (2004) The implications of metabolic energy requirements for the representation of 
information in neurons. In: Gazzaniga, M. S. (ed.) The Cognitive Neurosciences, MIT Press, Cambridge, 
MA, 187–196. 
Logothetis, N. K., Pauls, J., Augath, M., Trinath, T. and Oeltermann, A. (2001) Neurophysiological 
investigation of the basis of the fMRI signal. Nature 412: 150–157. 
Logothetis, N. K. and Wandell, B. A. (2004) Interpreting the BOLD signal, Annu. Rev. Physol. 66: 735–
769. 
Loh, M. and Deco, G. (2005) Cognitive flexibility and decision-making in a model of conditional 
visuomotor associations. Euro. J. Neurosci. 22: 2927–2936. 
London, M. and Hausser, M. (2005) Dendritic computation. Annu. Rev. Neurosci. 28:503–532. 
Majumdar, K. K. and Kozma, R. (2006). Studies on sparse array cortical modeling and memory cognition 
duality, Proc. IJCNN’06, Vancouver, Canada:4954–4957. 
Quian Quiroga, R., Reddy, L., Kreiman, G., Koch, C. and Fried, I. (2005) Invariant visual representation by 
single neurons in the human brain. Nature 435: 1102–1107. 
Rabinovich, M. I., Huerta, R. and Afraimovich, V. (2006) Dynamics of sequential decision making. Phys. 
Rev. Lett. 97: 188103–1–4. 
Salum, C., Morato, S. and Roque-da-Silva, A. C. (2000) Anxiety-like behavior in rats: a computational 
model. Neural Networks 13: 21–29. 
Schacter, D. L. (2004) Introduction (to Part VI: Memory). In: Gazzaniga, M. S. (ed.) The Cognitive 
Neurosciences, MIT Press, Cambridge, MA, 643–645. 
Schmajuk, N. A. (1997) Proteus Caught in A (Neural) Net. Animal Learning and Cognition: A Neural 
Network Approach, Cambridge University Press, Cambridge.
ABSTRACT
  In this paper a theoretical model of functioning of a neural circuit during a
behavioral response has been proposed. A neural circuit can be thought of as a
directed multigraph whose each vertex is a neuron and each edge is a synapse.
It has been assumed in this paper that the behavior of such circuits is
manifested through the collective behavior of neurons belonging to that
circuit. Behavioral information of each neuron is contained in the coefficients
of the fast Fourier transform (FFT) over the output spike train. Those
coefficients form a vector in a multidimensional vector space. Behavioral
dynamics of a neuronal network in response to strong aversive stimuli has been
studied in a vector space in which a suitable pseudometric has been defined.
The neurodynamical model of network behavior has been formulated in terms of
existing memory, synaptic plasticity and feelings. The model has an analogy in
classical electrostatics, by which the notion of force and potential energy has
been introduced. Since the model takes input from each neuron in a network and
produces a behavior as the output, it would be extremely difficult or may even
be impossible to implement. But with the help of the model a possible
explanation for an hitherto unexplained neurological observation in human brain
has been offered. The model is compatible with a recent model of sequential
behavioral dynamics. The model is based on electrophysiology, but its relevance
to hemodynamics has been outlined.

<|endoftext|><|startoftext|>
Introduction 1
2. Quivers and path algebras 5
3. Potentials and their Jacobian ideals 8
4. Quivers with potentials 12
5. Mutations of quivers with potentials 20
6. Some mutation invariants 25
7. Nondegenerate QPs 29
8. Rigid QPs 32
9. Relation to cluster-tilted algebras 36
10. Decorated representations and their mutations 39
11. Some three-vertex examples 50
12. Some open problems 54
13. Appendix. Proof of Lemma 4.12 55
Acknowledgments 57
References 57
1. Introduction
The main objects of study in this paper are quivers with potentials (QPs for short).
Roughly speaking, a QP is a quiver Q together with an element S of the path algebra
of Q such that S is a linear combination of cyclic paths. We associate to S the two-
sided ideal J(S) in the path algebra generated by the (noncommutative) partial
derivatives of S with respect to the arrows of Q. We refer to J(S) as the Jacobian
Date: April 18, 2007; revised July 10, 2007 and March 12, 2008.
2000 Mathematics Subject Classification. Primary 16G10, Secondary 16G20, 16S38.
Research of H. D. supported by the NSF grant DMS-0349019. Research of J. W. supported by
the NSF grant DMS-0600229. Research of A. Z. supported by the NSF grant DMS-0500534 and
by a Humboldt Research Award.
http://arxiv.org/abs/0704.0649v4
2 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
ideal, and to the quotient of the path algebra modulo J(S) as the Jacobian algebra.
They appeared in physicists’ work on superpotentials in the context of the Seiberg
duality in mirror symmetry (see e.g., [17, 2, 6]). Since in some of their work the
superpotentials are required to satisfy some form of Serre duality, we prefer not to
use this terminology, and just refer to S as a potential ; another reason for this is
that we are working with the completed path algebra, so our potentials are possibly
infinite linear combinations of cyclic paths. The Jacobian algebras also play an
important role in the recent work on Calabi-Yau algebras [4, 25, 26, 27].
In this paper we introduce and study mutations for QPs and their (decorated)
representations. In the context of Calabi-Yau algebras, the mutations were discussed
in [26] but our approach is much more elementary and down-to-earth. Namely, we
develop the setup that directly extends to QPs the Bernstein-Gelfand-Ponomarev
reflection functors [3] and their “decorated” version [28].
The original motivation for our study comes from the theory of cluster algebras
introduced and studied in a series of papers [18, 19, 1, 20]. In this paper, we deal only
with the underlying combinatorics of this theory embodied in skew-symmetrizable
integer matrices and their mutations. Furthermore, we restrict our attention to skew-
symmetric integer matrices. Such matrices can be encoded by quivers without loops
and oriented 2-cycles. Namely, a skew-symmetric integer n × n matrix B = (bi,j)
corresponds to a quiver Q(B) with vertices 1, . . . , n, and bi,j arrows from j to i
whenever bi,j > 0. For every vertex k, the mutation at k transforms B into another
skew-symmetric integer n × n matrix µk(B) = B = (bi,j). The formula for bi,j is
given below in (7.4). It is well-known (see Proposition 7.1 below) that the quiver
Q(B) can be obtained from Q(B) by the following three-step procedure:
Step 1. For every incoming arrow a : j → k and every outgoing arrow b : k → i,
create a “composite” arrow [ba] : j → i; thus, whenever bi,k, bk,j > 0, we
create bi,kbk,j new arrows from j to i.
Step 2. Reverse all arrows at k; that is, replace each arrow a : j → k with a⋆ : k → j,
and b : k → i with b⋆ : i→ k.
Step 3. Remove any maximal disjoint collection of oriented 2-cycles (that can appear
as a result of creating new arrows in Step 1).
In the case where k is a source or a sink of Q(B), the first and last steps of the
above procedure are not applicable, so Q(B) is obtained from Q(B) by just reversing
all the arrows at k. In this situation, J.Bernstein, I. Gelfand, and V. Ponomarev [3]
introduced the reflection functor at k sending representations of a quiver Q(B) (with-
out relations) into representations of Q(B). A modification of these functors acting
on decorated representations was introduced in [28] to establish a link between clus-
ter algebras and quiver representations (the definition of decorated representations
for general QPs is given below in Section 10).
The elementary approach of [28] has not been further pursued until now, giving
way to a more sophisticated approach via cluster categories and cluster-tilted alge-
bras developed in [7, 8, 9, 10, 12, 13, 14, 15] and many other publications. Most of
the results in these papers are for the quivers obtained by mutations from hereditary
algebras (i.e., quivers without oriented cycles and without relations). In this paper
we return to the more elementary point of view of [28] and propose an alternative
QUIVERS WITH POTENTIALS I 3
approach (which is in fact more general, since we do not impose any restrictions
on quivers in question). In this approach, the mutations at arbitrary vertices (not
just sources or sinks) are defined for QPs and their decorated representations. The
construction for QPs is carried out in Section 5, and for their representations in Sec-
tion 10. It turns out to be rather delicate and requires a lot of technical preparation.
The first two steps of the above mutation procedure extend to QPs in a relatively
straightforward way, but Step 3 presents a real challenge: we need to accompany
the removal of oriented 2-cycles from a quiver with a suitable modification of the
potential, leaving the corresponding Jacobian algebra unchanged. Our main device
in dealing with this difficulty is Theorem 4.6, which is the crucial technical result
of the paper. Roughly speaking, Theorem 4.6 asserts that every potential S can be
transformed by an automorphism of the path algebra into the sum of two potentials
Striv and Sred on the disjoint sets of arrows, where the trivial part Striv is a linear
combination of cyclic 2-paths, while the reduced part Sred involves only cyclic d-paths
with d ≥ 3. Furthermore, the Jacobian algebra of Sred is isomorphic to that of S.
Several comments on this result are in order. First, our arguments heavily de-
pend on the setup using completed path algebras, thus allowing potentials to involve
infinite sums of cyclic paths. Second, the reduction S 7→ Sred is not given by a
canonical procedure. As a consequence, our construction of mutations for QPs and
their representations is not functorial in any obvious sense. On the positive side, we
prove that every mutation is a well-defined transformation on the right-equivalence
classes of QPs (and their representations), where, roughly speaking, two QPs are
right-equivalent if they can be obtained from each other by an automorphism of the
path algebra (for more precise definitions see Definitions 4.2 and 10.2).
Finally, it is important to keep in mind that, even with the help of Theorem 4.6,
in order to get rid of all oriented 2-cycles in the mutated QP, one needs to impose
some “genericity” conditions on the initial potential S. These conditions are studied
in Section 7. They are not very explicit in general, but we introduce an important
class or rigid QPs (see Definitions 6.7 and 6.10) for which the absence of oriented
2-cycles after any sequence of QP mutation is guaranteed.
We now describe the contents of the paper in more detail. In Section 2 we introduce
an algebraic setup for dealing with quivers and their path algebras. We fix a base field
K, and encode a quiver with the vertex set Q0 and the arrow set Q1 by its vertex span
R = KQ0 and arrow span A = KQ1 . Thus, R is a finite-dimensional commutative
K-algebra, and A is a finite-dimensional R-bimodule. We then introduce the path
algebra
R〈A〉 =
and, more importantly for our purposes, the complete path algebra
R〈〈A〉〉 =
here Ad stands for the d-fold tensor power of A as an R-bimodule. We view R〈〈A〉〉
as a topological algebra via the m-adic topology, where m is the two-sided ideal
generated by A.
4 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
In Section 3 we introduce some of our main objects of study: potentials and
their Jacobian ideals. It is natural to view potentials as elements of the trace space
R〈〈A〉〉/{R〈〈A〉〉, R〈〈A〉〉}, where {R〈〈A〉〉, R〈〈A〉〉} is the closure of the vector sub-
space in R〈〈A〉〉 spanned by all commutators. It is more convenient for us to define
a potential S as an element of the cyclic part of R〈〈A〉〉; for all practical purposes,
S can be replaced by a cyclically equivalent potential, that is, the one with the same
image in the trace space. To define the Jacobian ideal J(S) and derive its basic
properties, we develop the formalism of cyclic derivatives, in particular, establishing
“cyclic” versions of the Leibniz rule and the chain rule. The main result of Section 3
is Proposition 3.7 that asserts that any isomorphism ϕ of path algebras sends J(S)
to J(ϕ(S)). Note that cyclic derivatives for general non-commutative algebras were
introduced in [29], and the results we present can be easily deduced from those in
loc.cit. For the convenience of the reader, we present complete independent proofs.
Victor Ginzburg informed us that in the context of path algebras of quivers, cyclic
derivatives were introduced and studied in [5, 24], and that Proposition 3.7 is a
consequence of the geometric interpretation of J(S) given in [25, Definition 5.1.1,
Lemma 5.1.3].
In Section 4 we introduce quivers with potentials (QPs) and define the right-
equivalence relation on them, which plays an important role in the paper. We then
state and prove the key technical result of the paper: Splitting Theorem 4.6, already
discussed above. The proof is elementary but pretty involved; it uses in an essential
way the topology in a complete path algebra. In order not to interrupt the argument,
we move to the Appendix our treatment of the topological properties needed for the
proof of one of the technical lemmas.
In Section 5 we finally introduce the mutations of QPs. Using Theorem 4.6, we
prove that the mutation at an arbitrary vertex is a well-defined involution on the set
of right-equivalence classes of reduced QPs (Theorem 5.7).
In Section 6, we study some mutation invariants of QPs. In particular, we show
that mutations preserve the class of QPs with finite-dimensional Jacobian algebras
(Corollary 6.6). Another important property of QPs preserved by mutations is rigid-
ity (Corollary 6.11), which was already mentioned above. For the precise definition
of rigid QPs see Definitions 6.7 and 6.10 below; intuitively, a QP is rigid if its right-
equivalence class is invariant under infinitesimal deformations.
In Section 7, we introduce and study nondegenerate QPs, that is, those to which
one can apply an arbitrary sequence of mutations without creating oriented 2-cycles.
In Corollary 7.4 we show that nondegeneracy is guaranteed by non-vanishing of
countably many nonzero polynomial functions on the space of potentials. In par-
ticular, if the base field K is uncountable, a nondegenerate QP exists for every
underlying quiver.
Section 8 contains some examples of rigid and non-rigid potentials and some further
results illustrating the importance of rigidity. A simple but important Proposition 8.1
asserts that rigid QPs have no oriented 2-cycles. Combining this with the fact that
rigidity is preserved by mutations, we conclude that every rigid QP is nondegenerate.
Using a result by Keller-Reiten [27], we show in Example 8.7 that the class of rigid
QPs (as well as the class of QPs with finite-dimensional Jacobian algebras) is strictly
greater than the class of QPs mutation-equivalent to acyclic ones. On the other hand,
QUIVERS WITH POTENTIALS I 5
Example 8.6 exhibits an underlying quiver without oriented 2-cycles that does not
admit a rigid QP; thus, the class of nondegenerate QPs is strictly greater than the
class of rigid ones.
In Section 9, we consider quivers that are mutation-equivalent to a Dynkin quiver.
For every such underlying quiver, we compute explicitly the corresponding rigid QP
(Proposition 9.1). Comparing this result with the description of cluster-tilted alge-
bras obtained in [13, 9], we conclude in Corollary 9.3 that in the case in question,
every cluster-tilted algebra can be identified with the Jacobian algebra of the cor-
responding rigid QP. Thus, Jacobian algebras can be viewed as generalizations of
cluster-tilted algebras.
In Section 10 we introduce decorated representations of QPs (QP-representations,
for short) and their right-equivalence (Definitions 10.1 and 10.2). We then present a
representation-theoretic extension of Splitting Theorem 4.6 by defining the reduced
part of a QP-representation M (Definition 10.4) and proving that, up to right-
equivalence, it is determined by the right-equivalence class of M (Proposition 10.5).
We use this result to introduce mutations of QP-representations and to prove a
representation-theoretic extension of Theorem 5.7: the mutation at every vertex is
an involution on the set of right-equivalence classes of reduced QP-representations
(Theorem 10.13).
Some examples of QP-representations and their mutations are given in Section 11.
All these examples treat quivers with three vertices. In particular, we describe the
effect of mutations on a special family of band representations coming from the theory
of string algebras [11, 21].
The concluding Section 12 contains some open problems about QPs and their
representations that we find essential for better understanding of the theory.
In the forthcoming continuation of this paper, we plan to discuss applications of
QP-representations and their mutations to the structure of the corresponding cluster
algebras.
2. Quivers and path algebras
A quiver Q = (Q0, Q1, h, t) consists of a pair of finite sets Q0 (vertices) and Q1
(arrows) supplied with two maps h : Q1 → Q0 (head) and t : Q1 → Q0 (tail). It
is represented as a directed graph with the set of vertices Q0 and directed edges
a : ta → ha for a ∈ Q1. Note that this definition allows the underlying graph to
have multiple edges and (multiple) loops.
We fix a field K, and associate to a quiver Q two vector spaces R = KQ0 and
A = KQ1 consisting of K-valued functions on Q0 and Q1, respectively. We will
sometimes refer to R as the vertex span of Q, and to A as the arrow span of Q. The
space R is a commutative algebra under the pointwise multiplication of functions.
The space A is an R-bimodule, with the bimodule structure defined as follows: if
e ∈ R and f ∈ A then (e ·f)(a) = e(ha)f(a) and (f ·e)(a) = f(a)e(ta) for all a ∈ Q1.
We denote by Q⋆ the dual or opposite quiver Q⋆ obtained by reversing the arrows
in Q (i.e., replacing Q = (Q0, Q1, h, t) with Q
⋆ = (Q0, Q1, t, h)). The corresponding
arrow span is naturally identified with the dual bimodule A⋆ (the dual vector space
of A with the standard R-bimodule structure).
6 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
For a given vertex set Q0 with the vertex span R, every finite-dimensional R-
bimodule B is the arrow span of some quiver on Q0. To see this, consider the
elements ei ∈ R for i ∈ Q0 given by ei(j) = δi,j (the Kronecker delta symbol).
They form a basis of idempotents of R, hence every R-bimodule B has a direct sum
decomposition
i,j∈Q0
Bi,j ,
where Bi,j = eiBej ⊆ B for every i, j ∈ Q0. If B is finite-dimensional, we can
identify the (finite) set of arrows Q1 with a K-basis in B which is the union of bases
in all components Bi,j; under this identification, every a ∈ Q1 ∩ Bi,j has h(a) = i
and t(a) = j.
It is convenient to represent an R-bimodule B by a matrix of vector spaces (Bi,j)
whose rows and columns are labeled by vertices. In this model, the left (resp. right)
action of an element c =
i ciei ∈ R is given by the left (resp. right) multiplication
by the diagonal matrix with diagonal entries ci. And the tensor product over R is
given by a usual matrix multiplication: if B =
i,j Bi,j and C =
i,j Ci,j, then
(B ⊗R C)i,j =
(Bi,k ⊗ Ck,j).
Returning to a quiver Q with the arrow span A, for each nonnegative integer d,
let Ad denote the R-bimodule
Ad = A⊗R A⊗R · · · ⊗R A︸ ︷︷ ︸
with the convention A0 = R.
Definition 2.1. The path algebra of Q is defined as the (graded) tensor algebra
R〈A〉 =
For each i, j ∈ Q0, the component R〈A〉i,j = eiR〈A〉ej is called the space of paths
from j to i.
As above, we identify the set of arrows Q1 with some basis of A consisting of
homogeneous elements, that is, each a ∈ Q1 belongs to some component Ai,j. Then
for every d ≥ 1, the products a1 · · · ad such that all ak belong to Q1, and t(ak) =
h(ak+1) for 1 ≤ k < d, form a K-basis of A
d. We call this basis the path basis of Ad
associated to Q1. For d = 0, we call {ei | i ∈ Q0} the path basis of A
0 = R. We
refer to the union of path bases for all d as the path basis of R〈A〉. The elements of
the path basis will be sometimes referred to simply as paths. We depict a1 · · · ad as
a path of length d starting in the vertex t(ad) and ending in h(a1). Note that the
product (a1 · · · ad)(ad+1 · · ·ad+k) of two paths is 0 unless t(ad) = h(ad+1), in which
case the product is given by concatenation of paths. This description implies the
following:
(2.1) If 0 6= p ∈ Akei and 0 6= q ∈ eiA
ℓ for some vertex i then pq 6= 0.
QUIVERS WITH POTENTIALS I 7
Definition 2.2. The complete path algebra of Q is defined as
R〈〈A〉〉 =
Thus, the elements of R〈〈A〉〉 are (possibly infinite) K-linear combinations of the
elements of a path basis in R〈A〉; and the multiplication in R〈〈A〉〉 naturally extends
the multiplication in R〈A〉.
Note that, if the quiver Q is acyclic (that is, has no oriented cycles), then Ad = {0}
for d≫ 0, hence in this case R〈〈A〉〉 = R〈A〉, and this algebra is finite-dimensional.
Example 2.3. Consider the quiver Q = (Q0, Q1) with Q0 = {1} and Q1 = {a} with
a : 1 → 1. This is the loop quiver:
1a 88 .
In this case R = KQ0 = K, and A = KQ1 = Ka. We have R〈A〉 = K[a], and
R〈〈A〉〉 = K[[a]], the algebra of formal power series.
Let m = m(A) denote the (two-sided) ideal of R〈〈A〉〉 given by
(2.2) m = m(A) =
Thus the powers of m are given by
We view R〈〈A〉〉 as a topologicalK-algebra via the m-adic topology having the powers
of m as a basic system of open neighborhoods of 0. Thus, the closure of any subset
U ⊆ R〈〈A〉〉 is given by
(2.3) U =
(U +mn).
It is clear that R〈A〉 is a dense subalgebra of R〈〈A〉〉.
In dealing with R〈〈A〉〉, the following fact is quite useful: every (non-commutative)
formal power series over R in a finite number of variables can be evaluated at arbi-
trary elements of m to obtain a well-defined element of R〈〈A〉〉. To illustrate, let us
show that m is the unique maximal two-sided ideal of R〈〈A〉〉 having zero intersec-
tion with R = A0. Indeed, it is enough to show that any element x ∈ R〈〈A〉〉 − m
generates an ideal having nonzero intersection with R. Let x = c + y with c a
nonzero element of R, and y ∈ m. Multiplying x on both sides by suitable elements
of R, we can assume that c = ei for some i ∈ Q0, and eiy = yei = y. But then
z = ei − y + y
2 − y3 + · · · is a well-defined element of R〈〈A〉〉, and we have xz = ei,
proving our claim.
This characterization of m implies that it is invariant under any algebra automor-
phism ϕ of R〈〈A〉〉 such that ϕ|R is the identity. Thus, ϕ is continuous, i.e., is an
automorphism of R〈〈A〉〉 as a topological algebra.
8 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
The same argument shows that, more generally, if A and A′ are finite-dimensional
R-bimodules then any algebra homomorphism ϕ : R〈〈A〉〉 → R〈〈A′〉〉 such that
ϕ|R = id, sends m(A) intom(A
′), hence is continuous. Thus, ϕ is uniquely determined
by its restriction to A1 = A, which is a R-bimodule homomorphism A → m(A′) =
A′ ⊕m(A′)2. We write ϕ|A = (ϕ
(1), ϕ(2)), where ϕ(1) : A→ A′ and ϕ(2) : A→ m(A′)2
are R-bimodule homomorphisms.
Proposition 2.4. Any pair (ϕ(1), ϕ(2)) of R-bimodule homomorphisms ϕ(1) : A→ A′
and ϕ(2) : A → m(A′)2 gives rise to a unique homomorphism of topological algebras
ϕ : R〈〈A〉〉 → R〈〈A′〉〉 such that ϕ|R = id, and ϕ|A = (ϕ
(1), ϕ(2)). Furthermore, ϕ is
an isomorphism if and only if ϕ(1) is a R-bimodule isomorphism A→ A′.
Proof. The uniqueness of ϕ is clear. To show the existence, let Q1 = {a1, . . . , aN} ⊂
A = A1 be the set of arrows in A. Writing an element x ∈ R〈〈A〉〉 as an infinite
K-linear combination of the elements of the corresponding path basis in R〈A〉, we
express x as a (non-commutative) formal power series F (a1, . . . , aN ). Therefore,
a desired algebra homomorphism can be obtained by setting ϕ(x) = F (ϕ(1)(a1) +
ϕ(2)(a1), . . . , ϕ
(1)(aN) + ϕ
(2)(aN)).
If ϕ is an isomorphism then ϕ(1) : A → A′ is clearly an isomorphism of R-
bimodules. To show the converse implication, we can identify A and A′ with the
help of ϕ(1), and so assume that ϕ(1) is the identity automorphism of A. Then the
(infinite) matrix that expresses ϕ as a K-linear map in the path basis of R〈〈A〉〉
is lower-triangular with all the diagonal entries equal to 1 (we order the basis ele-
ments so that their degrees weakly increase). Clearly, such a matrix is invertible,
completing the proof of Proposition 2.4. �
Definition 2.5. Let ϕ be the automorphism of R〈〈A〉〉 corresponding to a pair
(ϕ(1), ϕ(2)) as in Proposition 2.4. If ϕ(2) = 0, then we call ϕ a change of arrows. If ϕ(1)
is the identity automorphism of A, we say that ϕ is a unitriangular automorphism;
furthermore, we say that ϕ is of depth d ≥ 1, if ϕ(2)(A) ⊂ md+1.
The following property of unitriangular automorphisms is immediate from the
definitions:
If ϕ is an unitriangular automorphism of R〈〈A〉〉 of depth d,(2.4)
then ϕ(u)− u ∈ mn+d for u ∈ mn.
3. Potentials and their Jacobian ideals
In this section we introduce some of our main objects of study: potentials and
their Jacobian ideals in the complete path algebra R〈〈A〉〉 given by Definition 2.2.
We fix a path basis in R〈A〉; recall that it consists of the elements ei ∈ R = A
together with the products a1 · · · ad (paths) such that all ak belong to Q1, and t(ak) =
h(ak+1) for 1 ≤ k < d. Then each space A
d has a direct R-bimodule decomposition
i,j∈Q0
Adi,j, where the component A
i,j is spanned by the paths a1 · · · ad with
h(a1) = i and t(ad) = j.
QUIVERS WITH POTENTIALS I 9
Definition 3.1.
• For each d ≥ 1, we define the cyclic part of Ad as the sub-R-bimodule Adcyc =⊕
Adi,i. Thus, A
cyc is the span of all paths a1 · · · ad with h(a1) = t(ad);
we call such paths cyclic.
• We define a closed vector subspace R〈〈A〉〉cyc ⊆ R〈〈A〉〉 by setting
R〈〈A〉〉cyc =
Adcyc,
and call the elements of R〈〈A〉〉cyc potentials.
• For every ξ ∈ A⋆, we define the cyclic derivative ∂ξ as the continuous K-linear
map R〈〈A〉〉cyc → R〈〈A〉〉 acting on paths by
(3.1) ∂ξ(a1 · · · ad) =
ξ(ak)ak+1 · · · ada1 · · · ak−1.
• For every potential S, we define its Jacobian ideal J(S) as the closure of the
(two-sided) ideal in R〈〈A〉〉 generated by the elements ∂ξ(S) for all ξ ∈ A
(see (2.3)); clearly, J(S) is a two-sided ideal in R〈〈A〉〉.
• We call the quotient R〈〈A〉〉/J(S) the Jacobian algebra of S, and denote it
by P(Q, S) or P(A, S).
An easy check shows that a cyclic derivative ∂ξ : R〈〈A〉〉cyc → R〈〈A〉〉 does not de-
pend on the choice of a path basis. Furthermore, cyclic derivatives do not distinguish
between the potentials that are equivalent in the following sense.
Definition 3.2. Two potentials S and S ′ are cyclically equivalent if S−S ′ lies in the
closure of the span of all elements of the form a1 · · ·ad − a2 · · · ada1, where a1 · · · ad
is a cyclic path.
The following proposition is immediate from (3.1).
Proposition 3.3. If two potentials S and S ′ are cyclically equivalent, then ∂ξ(S) =
′) for all ξ ∈ A⋆, hence J(S) = J(S ′) and P(A, S) = P(A, S ′).
It is easy to see that the definition of cyclical equivalence does not depend on the
choice of a path basis. In fact, it can be given in more invariant terms as follows.
Definition 3.4. For any topological K-algebra U , its trace space Tr(U) is defined as
Tr(U) = U/{U, U}, where {U, U} is the closure of the vector subspace in U spanned
by all commutators. We denote by π = πU : U → Tr(U) the canonical projection.
The following proposition is a direct consequence of the definitions.
Proposition 3.5. Two potentials S and S ′ are cyclically equivalent if and only if
πR〈〈A〉〉(S) = πR〈〈A〉〉(S
′). Thus, the Jacobian ideal and the Jacobian algebra of a
potential S depend only on the image of S in Tr(R〈〈A〉〉).
Recall that we identify the set of arrows Q1 with a K-basis in A = A
1. For a ∈ Q1,
we will use the notation ∂a for the cyclic derivative ∂a⋆ , where Q
1 = {a
⋆ | a ∈ Q1} is
the dual basis of Q1 in A
10 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
Example 3.6. Consider the quiver Q = (Q0, Q1) with Q0 = {1, 2} and Q1 = {a, b},
where a : 1 → 2 and b : 2 → 1:
The vertex and arrow spans of Q are given by R = KQ0 = Ke1 ⊕ Ke2, and A =
KQ1 = Ka⊕Kb. The paths in R〈〈A〉〉 are e1, e2 and all products of the generators
a and b in which the factors a and b alternate. The potentials are (possibly infinite)
linear combinations of the elements (ab)n and (ba)n for all n ≥ 1. Using (3.1), we
obtain
∂a((ab)
n) = ∂a((ba)
n) = nb(ab)n−1, ∂b((ab)
n) = ∂b((ba)
n) = na(ba)n−1 (n ≥ 1).
Up to cyclical equivalence, every potential can be written in the form
αn(ab)
n (αn ∈ K).
Returning to the general theory, it is clear that every algebra homomorphism
ϕ : R〈〈A〉〉 → R〈〈A′〉〉
such that ϕ|R = id, sends potentials to potentials.
Proposition 3.7. Every algebra isomorphism
ϕ : R〈〈A〉〉 → R〈〈A′〉〉
such that ϕ|R = id, sends J(S) onto J(ϕ(S)), inducing an isomorphism of Jacobian
algebras P(A, S) → P(A′, ϕ(S)).
Proof. We start by developing some “differential calculus” for cyclic derivatives. We
need a few pieces of notation. We set
R〈〈A〉〉⊗̂R〈〈A〉〉 =
d,e≥0
(Ad ⊗ Ae)
(the tensor product on the right is over the base field K), and view this space as a
topological vector space with a basic system of open neighborhoods of 0 formed by
the sets
d+e≥n(A
d⊗Ae) for all n ≥ 0; thus, R〈A〉⊗R〈A〉 is dense inR〈〈A〉〉⊗̂R〈〈A〉〉.
Now, for every ξ ∈ A⋆, we define a continuous K-linear map
∆ξ : R〈〈A〉〉 → R〈〈A〉〉⊗̂R〈〈A〉〉
by setting ∆ξ(e) = 0 for e ∈ R = A
0, and
(3.2) ∆ξ(a1 · · · ad) =
ξ(ak)a1 · · · ak−1 ⊗ ak+1 · · · ad
for any path a1 · · · ad of length d ≥ 1. Note that ∆ξ does not depend on the choice of
a path basis. We will use the same convention as for cyclic derivatives: for a ∈ Q1,
we write ∆a instead of ∆a⋆ . For instance, in the situation of Example 3.6, we have
∆a((ab)
(ab)k−1 ⊗ b(ab)n−k, ∆b((ab)
(ab)k−1a⊗ (ab)n−k.
QUIVERS WITH POTENTIALS I 11
Next, we denote by (f, g) 7→ f�g a continuousK-bilinear map (R〈〈A〉〉⊗̂R〈〈A〉〉)×
R〈〈A〉〉 → R〈〈A〉〉 given by
(3.3) (u⊗ v)�g = vgu
for u, v ∈ R〈A〉. We are now ready to state the Leibniz rule.
Lemma 3.8 (Cyclic Leibniz rule). Let f ∈ R〈〈A〉〉i,j and g ∈ R〈〈A〉〉j,i for some
vertices i and j. Then for every ξ ∈ A⋆, we have
(3.4) ∂ξ(fg) = ∆ξ(f)�g +∆ξ(g)�f.
More generally, for any finite sequence of vertices i1, . . . , id, id+1 = i1 and for any
f1, . . . fd such that fk ∈ R〈〈A〉〉ik,ik+1, we have
(3.5) ∂ξ(f1 · · ·fd) =
∆ξ(fk)�(fk+1 · · · fdf1 · · · fk−1).
Proof. It is enough to check (3.4) in the case where f = a1 · · · ad and g = ad+1 · · · ad+s
are two paths such that t(ad) = h(ad+1) and t(ad+s) = h(a1). Using (3.1), we obtain
∂ξ(fg) =
ξ(ak)ak+1 · · · ad+sa1 · · · ak−1.
Comparing this expression with (3.2) and (3.3), we see that the part of the last sum
where k runs from 1 to d (resp. from d + 1 to d + s) is equal to ∆ξ(f)�g (resp. to
∆ξ(g)�f), proving (3.4). The identity (3.5) follows from (3.4) by induction on d. �
Lemma 3.9 (Cyclic chain rule). Suppose that ϕ : R〈〈A〉〉 → R〈〈A′〉〉 is an algebra
homomorphism as in Proposition 2.4. Then, for every potential S ∈ R〈〈A〉〉)cyc and
ξ ∈ A′⋆, we have:
(3.6) ∂ξ(ϕ(S)) =
∆ξ(ϕ(a))�ϕ(∂a(S)).
Proof. It suffices to treat the case where S = a1 · · ·ad is a cyclic path. Applying
(3.5) and (3.1), we obtain
∂ξ(ϕ(S)) =
∆ξ(ϕ(ak))�(ϕ(ak+1 · · · ada1 · · · ak−1))
∆ξ(ϕ(a))�ϕ(
k:ak=a
ak+1 · · ·ada1 · · · ak−1)
∆ξ(ϕ(a))�ϕ(∂a(S)),
as desired. �
Now we are ready to prove Proposition 3.7. By Lemma 3.9, for every ξ ∈ A′⋆, the
element ∂ξ(ϕ(S)) lies in the ideal generated by the elements ϕ(∂a(S)) for a ∈ Q1,
hence, it lies in ϕ(J(S)). Thus, we have the inclusion
J(ϕ(S)) ⊆ ϕ(J(S)).
12 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
We can also apply this to the inverse isomorphism ϕ−1 and the potential ϕ(S):
J(S) = J(ϕ−1(ϕ(S))) ⊆ ϕ−1(J(ϕ(S)).
Applying ϕ to both sides yields
ϕ(J(S)) ⊆ J(ϕ(S)),
completing the proof. �
4. Quivers with potentials
We now introduce our main objects of study.
Definition 4.1. Suppose Q is a quiver with the arrow span A, and S ∈ R〈〈A〉〉cyc
is a potential. We say that a pair (Q, S) (or (A, S)) is a quiver with potential (QP
for short) if it satisfies the following two conditions:
(4.1) The quiver Q has no loops, i.e., Ai,i = 0 for all i ∈ Q0.
(4.2) No two cyclically equivalent cyclic paths appear in the decomposition of S.
In view of (4.1), every potential S belongs to m(A)2; and condition (4.2) excludes,
for instance, any non-zero potential S cyclically equivalent to 0.
Definition 4.2. Let (A, S) and (A′, S ′) be QPs on the same vertex set Q0. By
a right-equivalence between (A, S) and (A′, S ′) we mean an algebra isomorphism
ϕ : R〈〈A〉〉 → R〈〈A′〉〉 such that ϕ|R = id, and ϕ(S) is cyclically equivalent to S
(see Definition 3.2).
In view of Proposition 3.5, any algebra homomorphism R〈〈A〉〉 → R〈〈A′〉〉 such
that ϕ|R = id, sends cyclically equivalent potentials to cyclically equivalent ones. It
follows that right-equivalences of QPs have the expected properties: the composition
of two right-equivalences, as well as the inverse of a right-equivalence, is again a
right-equivalence. Note also that an isomorphism ϕ : R〈〈A〉〉 → R〈〈A′〉〉 induces
an isomorphism of R-bimodules A and A′ (cf. Proposition 2.4), so in dealing with
right-equivalent QPs we can assume without loss of generality that A = A′.
In view of Propositions 3.3 and 3.7, any right-equivalence of QPs (A, S) ∼= (A′, S ′)
induces an isomorphism of the Jacobian ideals J(S) ∼= J(S ′) and of the Jacobian
algebras P(A, S) ∼= P(A′, S ′).
For every two QPs (A, S) and (A′, S ′) (on the same set of vertices Q0), we can
form their direct sum (A, S) ⊕ (A′, S ′) = (A ⊕ A′, S + S ′); it is well-defined since
both complete path algebras R〈〈A〉〉 and R〈〈A′〉〉 have canonical embeddings into
R〈〈A⊕ A′〉〉 as closed R-subalgebras.
We start our analysis of QPs with the case S ∈ A2. In this case, J(S) is the
closure of the ideal generated by the subspace
(4.3) ∂S = {∂ξ(S) | ξ ∈ A
⋆} ⊆ A.
Definition 4.3. We say that a QP (A, S) is trivial if S ∈ A2, and ∂S = A, or,
equivalently, P(A, S) = R.
The following description of trivial QPs is seen by standard linear algebra.
QUIVERS WITH POTENTIALS I 13
Proposition 4.4. A QP (A, S) with S ∈ A2 is trivial if and only if the set of arrows
Q1 consists of 2N distinct arrows a1, b1, . . . , aN , bN such that each akbk is a cyclic
2-path, and there is a change of arrows ϕ (see Definition 2.5) such that ϕ(S) is
cyclically equivalent to a1b1 + · · ·+ aNbN .
Returning to general QPs, we now show that taking direct sums with trivial ones
does not affect the Jacobian algebra.
Proposition 4.5. If (A, S) is an arbitrary QP, and (C, T ) is a trivial one, then
the canonical embedding R〈〈A〉〉 → R〈〈A⊕C〉〉 induces an isomorphism of Jacobian
algebras P(A, S) → P(A⊕ C, S + T ).
Proof. Let L denote the closure of the two-sided ideal in R〈〈A ⊕ C〉〉 generated by
C; thus, L is the set of all (possibly infinite) linear combinations of paths, each of
which contains at least one arrow from C. The definitions readily imply that
R〈〈A⊕ C〉〉 = R〈〈A〉〉 ⊕ L,
J(S + T ) = J(S)⊕ L
(in the last equality, J(S) is understood as the Jacobian ideal of S in R〈〈A〉〉).
Therefore,
P(A⊕ C, S + T ) = R〈〈A⊕ C〉〉/J(S + T ) = (R〈〈A〉〉 ⊕ L)/(J(S)⊕ L)
∼= R〈〈A〉〉/J(S) = P(A, S),
as desired. �
For an arbitrary QP (A, S), we denote by S(2) ∈ A2 the degree 2 homogeneous
component of S. We call (A, S) reduced if S(2) = 0, i.e., S ∈ m(A)3. We define
the trivial and reduced arrow spans of (A, S) as the finite-dimensional R-bimodules
given by
(4.4) Atriv = Atriv(S) = ∂S
(2), Ared = Ared(S) = A/∂S
(2) .
(see (4.3)).
The following statement will play a crucial role in later sections.
Theorem 4.6 (Splitting Theorem). For every QP (A, S) with the trivial arrow
span Atriv and the reduced arrow span Ared, there exist a trivial QP (Atriv, Striv)
and a reduced QP (Ared, Sred) such that (A, S) is right-equivalent to the direct sum
(Atriv, Striv)⊕(Ared, Sred). Furthermore, the right-equivalence class of each of the QPs
(Atriv, Striv) and (Ared, Sred) is determined by the right-equivalence class of (A, S).
Let us first prove the existence of a desired right-equivalence
(4.5) (A, S) ∼= (Atriv, Striv)⊕ (Ared, Sred).
There is nothing to prove if (A, S) is reduced, so let us assume that S(2) 6= 0. Using
Proposition 4.4 and replacing S if necessary by a cyclically equivalent potential, we
can assume that S is of the form
(4.6) S =
(akbk + akuk + vkbk) + S
14 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
where each akbk is a cyclic 2-path, the arrows a1, b1, . . . , aN , bN form a basis of Atriv,
the elements uk and vk belong tom
2, and the potential S ′ ∈ m3 is a linear combination
of cyclic paths containing none of the arrows ak or bk. The existence of a right-
equivalence (4.5) becomes a consequence of the following lemma.
Lemma 4.7. For every potential S of the form (4.6), there exists a unitriangular
automorphism ϕ of R〈〈A〉〉 such that ϕ(S) is cyclically equivalent to a potential of
the form (4.6) with uk = vk = 0 for all k.
We say that a potential S is d-split if it is of the form (4.6) with uk, vk ∈ m
d+1 for
all k. To prove Lemma 4.7, we first show the following.
Lemma 4.8. Suppose a potential S is d-split for some d ≥ 1. There exists a unitri-
angular automorphism ϕ of R〈〈A〉〉 having depth d and such that ϕ(S) is cyclically
equivalent to a 2d-split potential S̃ with ϕ(S)− S̃ ∈ m2d+2.
Proof. Let us write S in the form (4.6) with uk, vk ∈ m
d+1. Let ϕ be the unitriangular
automorphism of R〈〈A〉〉 acting on arrows as follows:
ϕ(ak) = ak − vk, ϕ(bk) = bk − uk, ϕ(c) = c (c ∈ Q1 − {a1, b1, . . . , aN , bN}).
Then ϕ is of depth d, so by (2.4), for each k, we have
ϕ(uk) = uk + u
k, ϕ(vk) = vk + v
k ∈ m
2d+1).
Therefore, we obtain
ϕ(S) =
((ak − vk)(bk − uk) + (ak − vk)(uk + u
k) + (vk + v
k)(bk − uk)) + S
(akbk + aku
k + v
kbk) + S1 + S
where
S1 = −
(vkuk + vku
k + v
kuk) ∈ m
2d+2.
In view of Definition 3.2, S1 is cyclically equivalent to a potential of the form∑
k(aku
k + v
kbk) + S
′′, where u′′k, v
k ∈ m
2d+1, and S ′′ is a linear combination of
cyclic paths containing none of the ak or bk. Furthermore, we have
S1 − S
k + v
kbk) ∈ m
2d+2.
We see that the desired potential S̃ can be chosen as
(akbk + ak(u
k + u
k) + (v
k + v
k)bk) + S
′ + S ′′,
completing the proof of Lemma 4.8. �
Proof of Lemma 4.7. Starting with a potential S of the form (4.6) and using repeat-
edly Lemma 4.8, we construct a sequence of potentials S1, S2, . . . , and a sequence of
unitriangular automorphisms ϕ1, ϕ2, . . . , with the following properties:
(1) S1 = S.
(2) Sd is 2
d−1-split.
(3) ϕd is of depth 2
QUIVERS WITH POTENTIALS I 15
(4) ϕd(Sd) is cyclically equivalent to Sd+1, and ϕd(Sd)− Sd+1 ∈ m
2d+2.
By property (3), setting
(4.7) ϕ = lim
ϕnϕn−1 · · ·ϕ1,
we obtain a well defined unitriangular automorphism ϕ of R〈〈A〉〉; indeed, in view of
(2.4), for any u ∈ R〈〈A〉〉, if we write ϕnϕn−1 · · ·ϕ1(u) as
d=0 u
n with u
n ∈ A
then each homogeneous component u
n stabilizes as n→ ∞.
We claim that this automorphism ϕ satisfies the required properties in Lemma 4.7.
To see this, for d ≥ 1, denote Cd = ϕd(Sd)− Sd+1. By (4), Cd ∈ {R〈〈A〉〉, R〈〈A〉〉} ∩
2d+2 (recall from Definition 3.4 that {R〈〈A〉〉, R〈〈A〉〉} denotes the closure of the
vector subspace in R〈〈A〉〉 spanned by all commutators). Using (1), it is easy to see
ϕnϕn−1 · · ·ϕ1(S) = Sn+1 +
ϕnϕn−1 · · ·ϕd+1(Cd)
for every n ≥ 1; passing to the limit as n→ ∞ yields
ϕ(S) = lim
Sn + ϕ(
(ϕd · · ·ϕ1)
−1(Cd))
(the convergence of the series on the right is clear since any automorphism of R〈〈A〉〉
preserves the powers of m). We conclude that ϕ(S) is cyclically equivalent to
limn→∞ Sn. In view of (2), the latter element is of the form (4.6) with uk = vk = 0
for all k. This completes the proofs of Lemma 4.7 and of the existence of a right-
equivalence (4.5). �
The above argument makes it clear that the right-equivalence class of (Atriv, Striv)
is determined by the right-equivalence class of (A, S) . To prove Theorem 4.6, it
remains to show that the same is true for (Ared, Sred). Changing notation a little bit,
we need to prove the following.
Proposition 4.9. Let (A, S) and (A, S ′) be reduced QPs, and (C, T ) a trivial QP.
If (A⊕C, S+T ) is right-equivalent to (A⊕C, S ′+T ) then (A, S) is right-equivalent
to (A, S ′).
We deduce Proposition 4.9 from the following result of independent interest.
Proposition 4.10. Let (A, S) and (A, S ′) be reduced QPs such that S ′−S ∈ J(S)2.
Then we have:
(1) J(S ′) = J(S).
(2) (A, S) is right-equivalent to (A, S ′). More precisely, there exists an algebra
automorphism ϕ of R〈〈A〉〉 such that ϕ(S) is cyclically equivalent to S ′, and
ϕ(u)− u ∈ J(S) for all u ∈ R〈〈A〉〉.
Proof. (1) Since (A, S) is reduced, we have J(S) ⊆ m2. As an easy consequence of
the cyclic Leibniz rule (3.4), we see that
∂ξ(J(S)
2)cyc ⊆ mJ(S) + J(S)m
16 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
for any ξ ∈ A⋆. It follows that
(4.8) ∂ξS
′ − ∂ξS ∈ mJ(S) + J(S)m,
implying that J(S ′) ⊆ J(S).
To show the reverse inclusion, note that (4.8) also implies that
J(S) ⊆ J(S ′) + (mJ(S) + J(S)m).
Applying the same inclusion to each of the terms J(S) on the right, we obtain
J(S) ⊆ J(S ′) + (m2J(S) +mJ(S)m+ J(S)m2).
Continuing in the same way, we get
J(S) ⊆ J(S ′) +
kJ(S)mn−k ⊆ J(S ′) +mn+2
for any n ≥ 1. Remembering the definition of topology in R〈〈A〉〉 (see (2.3)) and the
fact that J(S ′) is closed, we conclude that J(S) ⊆ J(S ′), finishing the proof of part
(1) of Proposition 4.10.
(2) Let Q1 = {a1, . . . , aN} be the set of arrows (that is, a basis of A). Then
a unitriangular automorphism ϕ of R〈〈A〉〉 is specified by a N -tuple of elements
b1, . . . , bN ∈ m
2 such that
(4.9) ϕ(ak) = ak + bk (k = 1, . . . , N).
Lemma 4.11. Let (A, S) be a reduced QP, and let ϕ be a unitriangular automor-
phism of R〈〈A〉〉 given by (4.9). Then the potential ϕ(S) − S −
k=1 bk∂akS is
cyclically equivalent to an element of mI2, where I is the closure of the ideal in
R〈〈A〉〉 generated by b1, . . . , bN .
Proof. First consider the case where S = ak1 · · · akd is a cyclic path of length d ≥ 3.
Then ϕ(S) = (ak1 +bk1) · · · (akd +bkd). Expanding this product, we see that the term
that contains no factors bkj is equal to S, while the sum of the terms that contain ex-
actly one factor bkj is easily seen to be cyclically equivalent to
k=1 bk∂akS (cf.(3.1)),
and the rest of the terms are cyclically equivalent to elements of
k=1m(m
d−1∩I)bk.
Writing a general potential S ∈ m3 as a linear combination of cyclic paths,we see
that ϕ(S)− S −
bk∂akS is cyclically equivalent to
ckbk, where each ck is
of the form
with c
kℓ ∈ m
d−1 ∩ I. Since I is closed, each ck is a well-defined element of mI,
implying the assertion of Lemma 4.11. �
We will also need one more lemma whose proof will be given in Section 13.
Lemma 4.12. Let I be a closed ideal of R〈〈A〉〉, and J be the closure of an ideal
generated by finitely many elements f1, f2, . . . , fN , which are bi-homogeneous with
respect to the vertex bigrading. Then every potential belonging to the ideal IJ is
cyclically equivalent to an element of the form
k=1 bkfk, where all bk belong to I.
QUIVERS WITH POTENTIALS I 17
To prove part (2) of Proposition 4.10, we construct a sequence of N -tuples
(b1n, . . . , bNn) (n ≥ 1)
of elements of m2 and the corresponding unitriangular automorphisms ϕn of R〈〈A〉〉
(so that ϕn(ak) = ak + bkn for k = 1, . . . , N) such that, for all n ≥ 1, we have
(1) bkn ∈ m
n+1 ∩ J(S) for k = 1, . . . , N .
(2) S ′ is cyclically equivalent to ϕ0ϕ1 · · ·ϕn−1(S +
k=1 bkn∂akS) (with the con-
vention that ϕ0 is the identity automorphism).
We proceed by induction on n. In the basic case n = 1, the existence of an
N -tuple (b11, . . . , bN1) with desired properties follows from Lemma 4.12 applied to
I = J = J(S) and fk = ∂akS (note that J(S) ⊆ m
2, since (A, S) is assumed to be
reduced).
Now assume that, for some n ≥ 1, we have already defined the elements bkℓ for
k = 1, . . . , N and ℓ = 1, . . . , n, satisfying (1) and (2). Applying Lemma 4.11 to
bk = bkn (so that ϕ = ϕn), we obtain that ϕn(S)− (S +
k=1 bkn∂akS) is cyclically
equivalent to an element of m(mn+1 ∩ J(S))2. We have
m(mn+1 ∩ J(S))2 ⊆ (mn+2 ∩ J(S))J(S).
This implies in particular that ϕn(S) − S is cyclically equivalent to an element of
J(S)2. Combining Proposition 3.7 with the already proved part (1) of Proposi-
tion 4.10, we conclude that ϕn(J(S)) = J(ϕn(S)) = J(S). It follows that ϕn(S) −
k=1 bkn∂akS) is cyclically equivalent to an element of ϕn((m
n+2 ∩ J(S))J(S)).
Applying Lemma 4.12 to I = mn+2 ∩ J(S), J = J(S) and fk = ∂akS, we see that
every potential in (mn+2 ∩ J(S))J(S) is cyclically equivalent to a potential of the
bk,n+1∂akS
for some bk,n+1 ∈ m
n+2∩J(S). It follows that S+
k=1 bkn∂akS is cyclically equivalent
to ϕn(S+
k=1 bk,n+1∂akS). Thus, conditions (1) and (2) get satisfied with n replaced
by n+ 1, completing our inductive step.
In view of condition (1), limn→∞ ϕ1 · · ·ϕn is a well-defined automorphism ϕ of
R〈〈A〉〉 such that ϕ(u)− u ∈ J(S) for all u ∈ R〈〈A〉〉. Passing to the limit n → ∞
in condition (2), we conclude that S ′ is cyclically equivalent to ϕ(S), completing the
proof of part (2) of Proposition 4.10. �
Proof of Proposition 4.9. We abbreviate J = J(S) and J ′ = J(S ′) (understood as
the Jacobian ideals of S and S ′ in R〈〈A〉〉). As in Proposition 4.5, let L denote the
closure of the two-sided ideal in R〈〈A⊕ C〉〉 generated by C. Then we have
(4.10) R〈〈A⊕ C〉〉 = R〈〈A〉〉 ⊕ L, J(S + T ) = J ⊕ L, J(S ′ + T ) = J ′ ⊕ L.
Let ϕ be an automorphism ofR〈〈A⊕C〉〉, such that ϕ(S+T ) is cyclically equivalent
to S ′ + T . In view of (4.10) and Proposition 3.7, we have
(4.11) ϕ(J ⊕ L) = J ′ ⊕ L.
18 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
Let ψ : R〈〈A〉〉 → R〈〈A〉〉 denote the restriction to R〈〈A〉〉 of the composition
pϕ, where p is the projection of R〈〈A ⊕ C〉〉 onto R〈〈A〉〉 along L. In view of
Proposition 4.10, it suffices to show the following:
ψ is an automorphism of R〈〈A〉〉 such that(4.12)
S ′ − ψ(S) is cyclically equivalent to an element of ψ(J2)
(indeed, assuming (4.12) and using Proposition 3.7, we see that ψ(J2) = J(ψ(S))2,
hence one can apply Proposition 4.10 to potentials S ′ and ψ(S)).
Clearly, ψ is an algebra homomorphism, so can be represented by a pair (ψ(1), ψ(2))
as in Proposition 2.4. To show that ψ is an automorphism of R〈〈A〉〉, it suffices to
show that ψ(1) is an R-bimodule automorphism of A. By the definition, if we write
the R-bimodule automorphism ϕ(1) of A⊕ C as a matrix
ϕAA ϕAC
ϕCA ϕCC
then ψ(1) = ϕAA. Since
ϕ(C) ⊂ ϕ(J ⊕ L) = J ′ ⊕ L ⊆ m(A)2 ⊕ L,
it follows that ϕAC = 0, implying that ψ
(1) = ϕAA is an R-bimodule automorphism
of A, and that ψ is an automorphism of R〈〈A〉〉.
Since S ′+T is cyclically equivalent to ϕ(S+T ), the same is true for the potentials
obtained from them by applying the projection p; it follows that S ′−ψ(S) is cyclically
equivalent to pϕ(T ). Since T ∈ C2, the claim that S ′ − ψ(S) is cyclically equivalent
to an element of ψ(J2) follows from the fact that pϕ(L) ⊆ ψ(J), or, equivalently,
that ϕ(L) ⊆ ϕ(J) + L. Applying the inverse automorphism ϕ−1 to both sides, it
suffices to show that L ⊆ J + ϕ−1(L). Using the obvious symmetry between J and
J ′, it is enough to show the inclusion L ⊆ J ′ + ϕ(L).
Let us abbreviate M = m(A⊕ C), and I = J ′ + ϕ(L). Since ϕ(J) ⊆ J ′ ⊕ L, and
J ⊆ m(A)2, it follows that ϕ(J) ⊆ J ′ ⊕ (L ∩M2) = J ′ +ML + LM . Therefore, we
L ⊆ J ′ + L = ϕ(J) + ϕ(L) ⊆ I +ML+ LM.
Substituting this upper bound for L into its right hand side, we deduce the inclusion
L ⊆ I +M2L+MLM + LM2.
Continuing in the same way, for every n > 0, we have the inclusion
L ⊆ I +
MkLMn−k ⊆ I +Mn+1.
In view of (2.3), it follows that L is contained in I, the closure of I in R〈〈A⊕ C〉〉.
However, it is easy to see that I = J ′ + ϕ(L) is closed in R〈〈A ⊕ C〉〉 (indeed, the
closedness of I is equivalent to that of ϕ−1(I) = ϕ−1(J ′)+L, and so, by symmetry, it
is enough to show that ϕ(J)+L is closed; but this is clear since ϕ(J)+L = p−1(ψ(J))
is the inverse image of the closed ideal ψ(J) of R〈〈A〉〉). This completes the proofs
of Proposition 4.9 and Theorem 4.6. �
QUIVERS WITH POTENTIALS I 19
Definition 4.13. We call the component (Ared, Sred) in the decomposition (4.5)
the reduced part of a QP (A, S) (by Theorem 4.6, it is determined by (A, S) up to
right-equivalence).
Definition 4.14. We call a quiver Q (as well as its arrow span A) 2-acyclic if it has
no oriented 2-cycles, i.e., satisfies the following condition:
(4.13) For every pair of vertices i 6= j, either Ai,j = {0} or Aj,i = {0}.
In the rest of this section we study the conditions on a QP (A, S) guaranteeing
that its reduced part is 2-acyclic. We need some preparation.
For a quiver Q with the arrow span A, let C = C(A) denote the set of cyclic
paths on A up to cyclical equivalence. Thus, C is either empty (if Q has no oriented
cycles at all), or countable. The space of potentials up to cyclical equivalence is
naturally identified with KC. We say that a K-valued function on KC is polynomial
if it depends on finitely many components of a potential S and can be expressed as a
polynomial in these components. For a nonzero polynomial function F , we denote by
U(F ) ⊂ KC the set of all potentials S such that F (S) 6= 0. By a regular function on
U(F ) we mean a ratio of two polynomial functions on KC such that the denominator
vanishes nowhere on U(F ); in particular, any function of the form G/F n, where G
is a polynomial, is regular on U(F ). If A′ is the arrow span of another quiver Q′, we
say that a map KC(A) → KC(A
′) is polynomial if its every component is a polynomial
function; similarly, a map U(F ) → KC(A
′) is regular if its every component is a
regular function on U(F ).
Now suppose that the arrow span A satisfies (4.1), and let {a1, b1, . . . , aN , bN} be
any maximal collection of distinct arrows in Q such that bkak is a cyclic 2-path for
k = 1, . . . , N . Then the quiver obtained from Q by removing this collection of arrows
is clearly 2-acyclic. To such a collection we associate a nonzero polynomial function
on KC(A) given by
(4.14) Db1,...,bNa1,...,aN (S) = det(xbqap)p,q=1,...,N ,
where xbqap is the sum of the coefficients of bqap and apbq in a potential S, with the
convention that xbqap = 0 unless bqap is a cyclic 2-path.
Proposition 4.15. The reduced part (Ared, Sred) of a QP (A, S) is 2-acyclic if and
only if Db1,...,bNa1,...,aN (S) 6= 0 for some collection of arrows as above. Furthermore, if A
the arrow span of the quiver obtained from Q by removing all arrows a1, b1, . . . , aN , bN ,
then there exists a regular map H : U(Db1,...,bNa1,...,aN ) → K
C(A′) such that, for any QP
(A, S) with S ∈ U(Db1,...,bNa1,...,aN ), the reduced part (Ared, Sred) is right-equivalent to
(A′, H(S)).
The proof of Proposition 4.15 follows by tracing the construction of (Ared, Sred)
given in the proof of Lemma 4.7. Note that we use the following convention. If A is
2-acyclic from the start then the only collection {a1, b1, . . . , aN , bN} as above is the
empty set; in this case, the function Db1,...,bNa1,...,aN is understood to be equal to 1, and H
is just the identity mapping.
20 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
5. Mutations of quivers with potentials
Let (A, S) be a QP. Suppose that a vertex k ∈ Q0 does not belong to an oriented
2-cycle. In other words, k satisfies the following condition:
(5.1) For every vertex i, either Ai,k or Ak,i is zero.
Replacing S if necessary with a cyclically equivalent potential, we can also assume
(5.2) No cyclic path occurring in the expansion of S starts (and ends) at k.
Under these conditions, we associate to (A, S) a QP µ̃k(A, S) = (Ã, S̃) on the same
set of vertices Q0. We define the homogeneous components Ãi,j as follows:
(5.3) Ãi,j =
(Aj,i)
⋆ if i = k or j = k;
Ai,j ⊕Ai,kAk,j otherwise;
here the product Ai,kAk,j is understood as a subspace of A
2 ⊆ R〈〈A〉〉. Thus, the
R-bimodule Ã is given by
(5.4) Ã = ekAek ⊕ AekA⊕ (ekA)
⋆ ⊕ (Aek)
where we use the notation
(5.5) ek = 1− ek =
i∈Q0−{k}
We associate to Q1 the set of arrows Q̃1 in the following way:
• Take all the arrows c ∈ Q1 not incident to k.
• For each incoming arrow a and outgoing arrow b at k, create a “composite”
arrow [ba] corresponding to the product ba ∈∈ AekA.
• Replace each incoming arrow a (resp. each outgoing arrow b) at k by the
corresponding arrow a⋆ (resp. b⋆) oriented in the opposite way.
More formally, for i = k or j = k, we set
(5.6) Q̃1 ∩ Ãi,j = {a
⋆ | a ∈ Q1 ∩Aj,i}
(the dual basis); and for i and j different from k, we define
(5.7) Q̃1 ∩ Ãi,j = (Q1 ∩ Ai,j)
{[ba] | b ∈ Q1 ∩ Ai,k, a ∈ Q1 ∩ Ak,j},
where [ba] ∈ Q̃1 ∩Ai,kAk,j denotes the arrow in Q̃1 associated with the product ba.
We now associate to S the potential µ̃k(S) = S̃ ∈ R〈〈Ã〉〉 given by
(5.8) S̃ = [S] + ∆k,
where
(5.9) ∆k = ∆k(A) =
a,b∈Q1: h(a)=t(b)=k
[ba]a⋆b⋆,
and [S] is obtained by substituting [apap+1] for each factor apap+1 with t(ap) =
h(ap+1) = k of any cyclic path a1 · · · ad occurring in the expansion of S (recall that
none of these cyclic paths starts at k). It is easy to see that both [S] and ∆k do not
depend on the choice of a basis Q1 of A.
QUIVERS WITH POTENTIALS I 21
The following proposition is immediate from the definitions.
Proposition 5.1. Suppose a QP (A, S) satisfies (5.1) and (5.2), and a QP (A′, S ′)
is such that ekA
′ = A′ek = {0}. Then we have
(5.10) µ̃k(A⊕ A
′, S + S ′) = µ̃k(A, S)⊕ (A
′, S ′).
Theorem 5.2. The right-equivalence class of the QP (Ã, S̃) = µ̃k(A, S) is deter-
mined by the right-equivalence class of (A, S).
Proof. Let Â be the finite-dimensional R-bimodule given by
(5.11) Â = A⊕ (ekA)
⋆ ⊕ (Aek)
The natural embedding A→ Â identifies R〈〈A〉〉 with a closed subalgebra in R〈〈Â〉〉.
We also have a natural embedding Ã → R〈〈Â〉〉 (sending each arrow [ba] to the
product ba). This allows us to identify R〈〈Ã〉〉 with another closed subalgebra in
R〈〈Â〉〉, namely, with the closure of the linear span of the paths â1 · · · âd such that
â1 /∈ ekA and âd /∈ Aek. Under this identification, the potential S̃ given by (5.8) and
viewed as an element of R〈〈Â〉〉 is cyclically equivalent to the potential
S + (
b∈Q1∩Aek
b⋆b)(
a∈Q1∩ekA
aa⋆).
Taking this into account, we see that Theorem 5.2 becomes a consequence of the
following lemma.
Lemma 5.3. Every automorphism ϕ of R〈〈A〉〉 can be extended to an automorphism
ϕ̂ of R〈〈Â〉〉 satisfying
(5.12) ϕ̂(R〈〈Ã〉〉) = R〈〈Ã〉〉,
(5.13) ϕ̂(
a∈Q1∩ekA
aa⋆) =
a∈Q1∩ekA
aa⋆, ϕ̂(
b∈Q1∩Aek
b⋆b) =
b∈Q1∩Aek
In order to extend ϕ to an automorphism ϕ̂ of R〈〈Â〉〉, we need only to define the
elements ϕ̂(a⋆) and ϕ̂(b⋆) for all arrows a ∈ Q1 ∩ ekA and b ∈ Q1 ∩ Aek.
We first deal with ϕ̂(a⋆). Let Q1 ∩ ekA = {a1, . . . , as}. In view of Proposition 2.4,
the action of ϕ on these arrows is given by
(5.14)
ϕ(a1) ϕ(a2) · · · ϕ(as)
a1 a2 · · · as
(C0 + C1),
where:
• C0 is an invertible s × s matrix with entries in K such that its (p, q)-entry
is 0 unless t(ap) = t(aq);
• C1 is a s× s matrix whose (p, q)-entry belongs to m(A)t(ap),t(aq).
Note that C0 +C1 is invertible, and its inverse is of the same form: indeed, we have
(C0 + C1)
−1 = (I + C−10 C1)
−1C−10 = (I +
(−1)n(C−10 C1)
n)C−10 .
22 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
Now we define the elements ϕ̂(a⋆p) by setting
ϕ̂(a⋆1)
ϕ̂(a⋆2)
ϕ̂(a⋆s)
 = (C0 + C1)
 .
It follows that
ϕ̂(a1) ϕ̂(a2) · · · ϕ̂(as)
ϕ̂(a⋆1)
ϕ̂(a⋆2)
ϕ̂(a⋆s)
a1 a2 · · · as
 =
For b ∈ Q1 ∩ Aek, we define ϕ̂(b
⋆) in a similar way. Namely, let Q1 ∩ Aek =
{b1, . . . , bt}. As above, the action of ϕ on these arrows is given by
(5.15)
ϕ(b1)
ϕ(b2)
ϕ(bt)
 = (D0 +D1)
 ,
where:
• D0 is an invertible t × t matrix with entries in K such that its (p, q)-entry
is 0 unless h(bp) = h(bq);
• D1 is a t× t matrix whose (p, q)-entry belongs to m(A)h(bp),h(bq).
As above, we see that D0+D1 is invertible, and its inverse is of the same form. Now
we define the elements ϕ̂(b⋆q) by setting
ϕ̂(b⋆1) ϕ̂(b
2) · · · ϕ̂(b
b⋆1 b
2 · · · b
(D0 +D1)
It follows that
b⋆qbq) =
ϕ̂(b1)
⋆ ϕ̂(b⋆2) · · · ϕ̂(b
ϕ̂(b1)
ϕ̂(b2)
ϕ̂(bt)
b⋆1 b
2 · · · b
 =
b⋆qbq.
The condition (5.13) is then clearly satisfied; the construction also makes clear that
the automorphism ϕ̂ of R〈〈Â〉〉 preserves the subalgebra R〈〈Ã〉〉. As a consequence
QUIVERS WITH POTENTIALS I 23
of Proposition 2.4, ϕ̂ restricts to an automorphism of R〈〈Ã〉〉, verifying (5.12) and
completing the proofs of Lemma 5.3 and Theorem 5.2. �
Note that even if a QP (A, S) is assumed to be reduced, the QP µ̃k(A, S) = (Ã, S̃)
is not necessarily reduced because the component [S](2) ∈ Ã2 may be non-zero.
Combining Theorems 4.6 and 5.2, we obtain the following corollary.
Corollary 5.4. Suppose a QP (A, S) satisfies (5.1) and (5.2), and let µ̃k(A, S) =
(Ã, S̃). Let (A, S) be a reduced QP such that
(5.16) (Ã, S̃) ∼= (Ãtriv, S̃
(2))⊕ (A, S)
(see (4.5)). Then the right-equivalence class of (A, S) is determined by the right-
equivalence class of (A, S).
Definition 5.5. In the situation of Corollary 5.4, we use the notation µk(A, S) =
(A, S) and call the correspondence (A, S) 7→ µk(A, S) the mutation at vertex k.
Note that if a QP (A, S) satisfies (5.1) then the same is true for µ̃k(A, S) and
for µk(A, S). Thus, the mutation µk is a well-defined transformation on the set
of right-equivalence classes of reduced QPs satisfying (5.1). (With some abuse of
notation, we sometimes denote a right-equivalence class by the same symbol as any
of its representatives.)
Example 5.6. Consider the quiver Q with vertices {1, 2, 3, 4} and arrows a : 1 → 2,
b : 2 → 3, c : 3 → 4 and d : 4 → 1:
Let S = dcba. Let us perform the mutation at vertex 2. The arrow a is replaced by
e := a⋆ : 2 → 1, and b is replaced by f := b⋆ : 3 → 2. We also have a new arrow
g := [ba] : 1 → 3. So µ̃2(A) corresponds to the quiver with vertices {1, 2, 3, 4} and
arrows c, d, e, f, g:
g=[ba]
The potential µ̃2(S) = S̃ is given by
S̃ = dcg + gef ;
thus, µ̃2(A, S) is reduced, and we have µ̃2(A, S) = µ2(A, S).
Note that S̃ does not satisfy condition (5.2) with respect to vertex k = 3 since the
path gef starts and ends at 3. But we can fix this condition by replacing S̃ with
a cyclically equivalent potential, say S ′ = dcg + efg. Now let us mutate (Ã, S ′) at
24 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
vertex 3. The arrows c, f, g are replaced by c⋆ : 4 → 3, f ⋆ : 2 → 3 and g⋆ : 3 → 1,
respectively. We also add new arrows [cg] : 1 → 4 and [fg] : 1 → 2. Thus, µ̃3(Ã, S
has arrows {d, e, c⋆, f ⋆, g⋆, [cg], [fg]}:
c⋆ //
The potential µ̃3(S
′) is given by
′) = d[cg] + e[fg] + [fg]g⋆f ⋆ + [cg]g⋆c⋆.
It is not reduced, so to obtain the reduced QP µ3(Ã, S
′), we need to remove the
trivial part of µ̃3(Ã, S
′). The resulting quiver is as follows:
c⋆ // 3
Since it is acyclic (that is, has no oriented cycles), the corresponding potential is 0.
Our next result is that every mutation is an involution.
Theorem 5.7. The correspondence µk : (A, S) → (A, S) acts as an involution on
the set of right-equivalence classes of reduced QPs satisfying (5.1), that is, µ2k(A, S)
is right-equivalent to (A, S).
Proof. Let (A, S) be a reduced QP satisfying (5.1) and (5.2). Let µ̃k(A, S) = (Ã, S̃)
and µ̃2k(A, S) = µ̃k(Ã, S̃) = (
S). In view of Theorem 4.6 and Proposition 5.1, it is
enough to show that
(5.17) (
S) is right-equivalent to (A, S)⊕ (C, T ), where (C, T ) is a trivial QP.
Using (5.4) twice, and identifying (ekA)
⋆ with A⋆ek, and (Aek)
⋆ with ekA
⋆, where
A⋆ is the dual R-bimodule of A, we conclude that
(5.18)
A = A⊕ AekA⊕A
Furthermore, the basis of arrows in
A consists of the original set of arrows Q1 in A
together with the arrows [ba] ∈ AekA and [a
⋆b⋆] ∈ A⋆ekA
⋆ for a ∈ Q1 ∩ ekA and
b ∈ Q1 ∩Aek.
Remembering (5.8) and (5.9), we see that the potential
S is given by
(5.19)
S = [[S]] + [∆k(A)] + ∆k(Ã) = [S] +
a,b∈Q1: h(a)=t(b)=k
([ba][a⋆b⋆] + [a⋆b⋆]ba),
QUIVERS WITH POTENTIALS I 25
hence it is cyclically equivalent to
(5.20) S1 = [S] +
a,b∈Q1: h(a)=t(b)=k
([ba] + ba)[a⋆b⋆]
(recall that [S] is obtained by substituting [apap+1] for each factor apap+1 with t(ap) =
h(ap+1) = k of any cyclic path a1 · · · ad occurring in the path expansion of S). Let
us abbreviate
(C, T ) = (AekA⊕ A
a,b∈Q1: h(a)=t(b)=k
[ba][a⋆b⋆]).
This is a trivial QP (cf. Proposition 4.4); therefore to prove Theorem 5.7 it suffices
to show that the QP (
A, S1) given by (5.18) and (5.20) is right-equivalent to (A, S)⊕
(C, T ). We proceed in several steps.
Step 1: Let ϕ1 be the change of arrows automorphism of R〈〈
A〉〉 (see Defini-
tion 2.5) multiplying each arrow b ∈ Q1 ∩ Aek by −1, and fixing the rest of the
arrows in
A. Then the potential S2 = ϕ1(S1) is given by
S2 = [S] +
a,b∈Q1: h(a)=t(b)=k
([ba]− ba)[a⋆b⋆].
Step 2: Let ϕ2 be the unitriangular automorphism of R〈〈
A〉〉 (see Definition 2.5)
sending each arrow [ba] ∈ AekA to [ba] + ba, and fixing the rest of the arrows in
A. Remembering the definition of [S], it is easy to see that the potential ϕ2(S2) is
cyclically equivalent to a potential of the form
S3 = S +
a,b∈Q1: h(a)=t(b)=k
[ba]([a⋆b⋆] + f(a, b))
for some elements f(a, b) ∈ m(A⊕ AekA)
Step 3: Let ϕ3 be the unitriangular automorphism of R〈〈
A〉〉 sending each arrow
[a⋆b⋆] ∈ A⋆ekA
⋆ to [a⋆b⋆] − f(a, b), and fixing the rest of the arrows in
A. Then we
have ϕ3(S3) = S + T .
Combining these three steps, we conclude that the QP (
A, S1) is right-equivalent
A, S + T ) = (A, S)⊕ (C, T ), finishing the proof of Theorem 5.7. �
6. Some mutation invariants
In this section we fix a vertex k and study the effect of the mutation µk on the
Jacobian algebra P(A, S). We will use the following notation: for an R-bimodule B,
denote
(6.1) Bk̂,k̂ = ekBek =
i,j 6=k
(see (5.5)). Note that if B is a (topological) algebra then B
k̂,k̂
is a (closed) subalgebra
of B.
26 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
Proposition 6.1. Suppose a QP (A, S) satisfies (5.1) and (5.2), and let (Ã, S̃) =
µ̃k(A, S) be given by (5.4) and (5.8). Then the algebras P(A, S)k̂,k̂ and P(Ã, S̃)k̂,k̂
are isomorphic to each other.
Proof. In view of (5.4), we have
(6.2) Ãk̂,k̂ = Ak̂,k̂ ⊕ AekA.
Thus, the algebra R〈〈Ãk̂,k̂〉〉 is generated by the arrows c ∈ Q1 ∩ Ak̂,k̂ and [ba] for
a ∈ Q1∩ekA and b ∈ Q1∩Aek. The following fact is immediate from the definitions.
Lemma 6.2. The correspondence sending each c ∈ Q1 ∩ Ak̂,k̂ to itself, and each
generator [ba] to ba extends to an algebra isomorphism
R〈〈Ã
k̂,k̂
〉〉 → R〈〈A〉〉
k̂,k̂
Let u 7→ [u] denote the isomorphism R〈〈A〉〉k̂,k̂ → R〈〈Ãk̂,k̂〉〉 inverse of that in
Lemma 6.2. It acts in the same way as the correspondence S 7→ [S] in (5.8): [u] is
obtained by substituting [apap+1] for each factor apap+1 with t(ap) = h(ap+1) = k of
any path a1 · · ·ad occurring in the path expansion of u.
Lemma 6.3. The correspondence u 7→ [u] induces an algebra epimorphism
P(A, S)
k̂,k̂
→ P(Ã, S̃)
k̂,k̂
Proof. It is enough to prove the following two facts:
(6.3) R〈〈Ã〉〉k̂,k̂ = R〈〈Ãk̂,k̂〉〉+ J(S̃)k̂,k̂;
(6.4) [J(S)k̂,k̂] ⊆ R〈〈Ãk̂,k̂〉〉 ∩ J(S̃)k̂,k̂.
To show (6.3), we note that if a path ã1 · · · ãd ∈ R〈〈Ã〉〉k̂,k̂ does not belong to
R〈〈Ã
k̂,k̂
〉〉 then it must contain one or more factors of the form a⋆b⋆ with h(a) =
t(b) = k. In view of (5.8) and (5.9), we have
(6.5) a⋆b⋆ = ∂[ba]S̃ − ∂[ba][S].
Substituting this expression for each factor a⋆b⋆, we see that ã1 · · · ãd ∈ R〈〈Ãk̂,k̂〉〉+
J(S̃)k̂,k̂, as desired.
To show (6.4), we note that J(S)k̂,k̂ is easily seen to be the closure of the ideal
in R〈〈A〉〉
k̂,k̂
generated by the elements ∂cS for all arrows c ∈ Q1 with t(c) 6= k and
h(c) 6= k, together with the elements (∂aS)a
′ for a, a′ ∈ Q1 ∩ ekA , and b
′(∂bS) for
b, b′ ∈ Q1 ∩ Aek. Let us apply the map u 7→ [u] to these generators. First, we have:
(6.6) [∂cS] = ∂cS̃.
QUIVERS WITH POTENTIALS I 27
With a little bit more work (using (6.5)), we obtain
[(∂aS)a
t(b)=k
(∂[ba][S])[ba
t(b)=k
(∂[ba]S̃ − a
⋆b⋆)[ba′](6.7)
t(b)=k
(∂[ba]S̃)[ba
′]− a⋆∂a′⋆S̃,
[b′(∂bS)] =
h(a)=k
[b′a](∂[ba][S])
h(a)=k
[b′a](∂[ba]S̃ − a
⋆b⋆)(6.8)
h(a)=k
[b′a](∂[ba]S̃)− (∂b′⋆S̃)b
This implies the desired inclusion in (6.4). �
To finish the proof of Proposition 6.1, it is enough to show that the epimorphism
in Lemma 6.3 (let us denote it by α) is in fact an isomorphism. To do this, we
construct the left inverse algebra homomorphism β : P(Ã, S̃)
k̂,k̂
→ P(A, S)
k̂,k̂
that βα is the identity map on P(A, S)k̂,k̂). We define β as the composition of three
maps. First, we apply the epimorphism P(Ã, S̃)
k̂,k̂
k̂,k̂
defined in the same
way as α. Remembering the proof of Theorem 5.7 and using the notation introduced
there, we then apply the isomorphism P(
S)k̂,k̂ → P(A ⊕ C, S + T )k̂,k̂ induced
by the automorphism ϕ3ϕ2ϕ1 of R〈〈A ⊕ C〉〉. Finally, we apply the isomorphism
P(A⊕ C, S + T )k̂,k̂ → P(A, S)k̂,k̂ given in Proposition 4.5.
Since all the maps involved are algebra homomorphisms, it is enough to check
that βα fixes the generators p(c) and p(ba) of P(A, S)k̂,k̂, where p is the projection
R〈〈A〉〉 → P(A, S), and a, b, c have the same meaning as above. This is done by
direct tracing of the definitions. �
Proposition 6.4. In the situation of Proposition 6.1, if the Jacobian algebra P(A, S)
is finite-dimensional then so is P(Ã, S̃).
Proof. We start by showing that finite dimensionality of P(A, S) follows from a
seemingly weaker condition.
Lemma 6.5. Let J ⊆ m(A) be a closed ideal in R〈〈A〉〉. Then the quotient alge-
bra R〈〈A〉〉/J is finite dimensional provided the subalgebra R〈〈A〉〉k̂,k̂/Jk̂,k̂ is finite
dimensional. In particular, the Jacobian algebra P(A, S) is finite-dimensional if and
only if so is the subalgebra P(A, S)
k̂,k̂
Proof. Similarly to (6.1), for an R-bimodule B, we denote
Bk,k̂ = ekBek =
j 6=k
Bk,j, Bk̂,k = ekBek =
i 6=k
Bi,k.
28 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
We need to show that if R〈〈A〉〉k̂,k̂/Jk̂,k̂ is finite dimensional then so is each of the
spaces R〈〈A〉〉
, R〈〈A〉〉
andR〈〈A〉〉k,k/Jk,k. Let us treatR〈〈A〉〉k,k/Jk,k;
the other two cases are done similarly (and a little simpler).
Q1 ∩Ak,k̂ = {a1, . . . , as}, Q1 ∩ Ak̂,k = {b1, . . . , bt}.
We have
R〈〈A〉〉k,k = Kek ⊕
aℓR〈〈A〉〉k̂,k̂bm.
It follows that there is a surjective map α : K×Mats×t(R〈〈A〉〉k̂,k̂) → R〈〈A〉〉k,k/Jk,k
given by
α(c, C) = p(cek +
a1 a2 · · · as
),
where Mats×t(B) stands for the space of s× t matrices with entries in B, and p is the
projection R〈〈A〉〉 → R〈〈A〉〉/J . The kernel of α contains the space Mats×t(Jk̂,k̂),
hence R〈〈A〉〉k,k/Jk,k is isomorphic to a quotient of the finite-dimensional space K ×
Mats×t(R〈〈A〉〉k̂,k̂/Jk̂,k̂). Thus, R〈〈A〉〉k,k/Jk,k is finite dimensional, as desired. �
To finish the proof of Proposition 6.4, suppose that P(A, S) is finite dimensional.
Then P(Ã, S̃)k̂,k̂ is finite dimensional by Proposition 6.1. Applying Lemma 6.5 to
the QP (Ã, S̃), we conclude that P(Ã, S̃) is finite dimensional, as desired. �
Remembering (5.16) and using Proposition 4.5, we see that Propositions 6.1 and
6.4 have the following corollary.
Corollary 6.6. Suppose (A, S) is a reduced QP satisfying (5.1), and let (A, S) =
µk(A, S) be a reduced QP obtained from (A, S) by the mutation at k. Then the
algebras P(A, S)
k̂,k̂
and P(A, S)
k̂,k̂
are isomorphic to each other, and P(A, S) is
finite-dimensional if and only if so is P(A, S).
We see that the class of QPs with finite dimensional Jacobian algebras is invariant
under mutations. Let us now present another such class.
Definition 6.7. For every QP (A, S), we define its deformation space Def(A, S) by
(6.9) Def(A, S) = Tr(P(A, S))/R
(see Definitions 3.1 and 3.4).
Remark 6.8. Definition 6.7 can be motivated as follows (we keep the following
arguments informal although with some work they can be made rigorous). Let G =
Aut(R〈〈A〉〉) be the group of algebra automorphisms ofR〈〈A〉〉 (acting as the identity
on R). Using Proposition 2.4, we can think of G as an infinite dimensional algebraic
group. In view of Definition 3.4, G acts naturally on the trace space Tr(R〈〈A〉〉).
Remembering Definition 4.2, it is natural to think of the deformation space of a QP
(A, S) as the normal space at π(S) of the orbit G·π(S) in the ambient space π(m(A)2)
(recall that π stands for the natural projection R〈〈A〉〉 → Tr(R〈〈A〉〉)). Arguing as
QUIVERS WITH POTENTIALS I 29
in Lemma 4.11, we conclude that the infinitesimal action of (the Lie algebra of) G
on π(m(A)2) is by the transformations
π(u) 7→ π(
bk∂aku),
where Q1 = {a1, . . . , aN} is the set of arrows, and bk ∈ m(A)h(ak),t(ak) (this is well
defined in view of Proposition 3.3). This makes it natural to identify the tangent
space at π(S) of G · π(S) with π(J(S)), hence to identify the corresponding normal
space with π(m(A))/π(J(S)), or equivalently, with the space Def(S) given by (6.9).
Proposition 6.9. In the situation of Proposition 6.1, deformation spaces Def(Ã, S̃)
and Def(A, S) are isomorphic to each other.
Proof. In view of Proposition 3.5, Def(A, S) is isomorphic to Tr(P(A, S)k̂,k̂)/Rk̂,k̂.
Therefore, our assertion is immediate from Proposition 6.1. �
Definition 6.10. We call a QP (A, S) rigid if Def(A, S) = {0}, i.e., if Tr(P(A, S)) =
Combining Propositions 4.5 and 6.9, we obtain the following corollary.
Corollary 6.11. If a reduced QP (A, S) satisfies (5.1) and is rigid, then the QP
(A, S) = µk(A, S) is also rigid.
Some examples of rigid and non-rigid QPs will be given in Section 8.
7. Nondegenerate QPs
If we wish to be able to apply to a reduced QP (A, S) the mutation at every vertex
of Q0, the R-bimodule A must satisfy (5.1) at all vertices. Thus, the arrow span A
must be 2-acyclic (see Definition 4.14). Such an arrow span A can be encoded by a
skew-symmetric integer matrix B = B(A) = (bi,j) with rows and columns labeled by
Q0, by setting
(7.1) bi,j = dimAi,j − dimAj,i.
Indeed, the dimensions of the components Ai,j are recovered from B by
(7.2) dimAi,j = [bi,j ]+,
where we use the notation
(7.3) [x]+ = max(x, 0).
Proposition 7.1. Let (A, S) be a 2-acyclic reduced QP, and suppose that the reduced
QP µk(A, S) = (A, S) obtained from (A, S) by the mutation at some vertex k (see
Definition 5.5) is also 2-acyclic. Let B(A) = (bi,j) and B(A) = (bi,j) be the skew-
symmetric integer matrices given by (7.1). Then we have
(7.4) bi,j =
−bi,j if i = k or j = k;
bi,j + [bi,k]+ [bk,j ]+ − [−bi,k]+ [−bk,j ]+ otherwise.
30 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
Proof. First we note that by Proposition 4.4, if (C, T ) is a trivial QP then dimCi,j =
dimCj,i for all i, j. In view of (5.16), this implies that
(7.5) bi,j = dimAi,j − dimAj,i = dim Ãi,j − dim Ãj,i,
where (Ã, S̃) = µ̃k(A, S). Using (5.3), we obtain
dim Ãi,j =
dimAj,i if i = k or j = k;
dimAi,j + dimAi,k dimAk,j otherwise.
To obtain (7.4), it remains to substitute these expressions into (7.5) and use (7.2). �
An easy calculation using the obvious identity x = [x]+ − [−x]+ shows that the
second case in (7.4) can be rewritten in several equivalent ways as follows:
bi,j = bi,j + sgn(bi,k) [bi,kbk,j ]+
= bi,j + [−bi,k]+ bk,j + bi,k[bk,j]+
= bi,j +
|bi,k|bk,j + bi,k|bk,j|
It follows that the transformation B 7→ B given by (7.4) coincides with the matrix
mutation at k which plays a crucial part in the theory of cluster algebras (cf. [18,
(4.3)], [20, (2.2), (2.5)]).
We see that the mutations of 2-acyclic QPs provide a natural framework for matrix
mutations. With some abuse of notation, we denote by µk(A) the 2-acyclic R-
bimodule such that the skew-symmetric matrix B(µk(A)) is obtained from B(A) by
the mutation at k; thus, µk(A) is determined by A up to an isomorphism.
Note that the matrix mutations at arbitrary vertices can be iterated indefinitely,
while the 2-acyclicity condition (4.13) can be destroyed by a QP mutation. We will
study QPs for which this does not happen.
Definition 7.2. Let k1, . . . , kℓ ∈ Q0 be a finite sequence of vertices such that kp 6=
kp+1 for p = 1, . . . , ℓ − 1. We say that a QP (A, S) is (kℓ, · · · , k1)-nondegenerate if
all the QPs (A, S), µk1(A, S), µk2µk1(A, S), . . . , µkℓ · · ·µk1(A, S) are 2-acyclic (hence
well-defined). We say that (A, S) is nondegenerate if it is (kℓ, . . . , k1)-nondegenerate
for every sequence of vertices as above.
To state our next result recall the terminology introduced before Proposition 4.15.
In particular, for a given quiver with the arrow span A, the QPs on A are identified
with the elements of KC(A).
Proposition 7.3. Suppose that the base field K is infinite, Q is a 2-acyclic quiver
with the arrow span A, a sequence of vertices k1, . . . , kℓ is as in Definition 7.2, and
A′ = µkℓ · · ·µk1(A). Then there exist a non-zero polynomial function F : K
C(A) →
K and a regular map G : U(F ) → KC(A
′) such that every QP (A, S) with S ∈
U(F ) is (kℓ, . . . , k1)-nondegenerate, and, for any QP (A, S) with S ∈ U(F ), the QP
µkℓ · · ·µk1(A, S) is right-equivalent to (A
′, G(S)).
Proof. We proceed by induction on ℓ. First let us deal with the case ℓ = 1, that
is, with a single mutation µk. Recall that µk(A, S) = (A, S) is the reduced part of
the QP µ̃k(A, S) = (Ã, S̃) given by (5.3) and (5.8). It is clear from the definition
QUIVERS WITH POTENTIALS I 31
that S̃ = G̃(S) for a polynomial map G̃ : KC(A) → KC(
eA). Now let us apply
Proposition 4.15 to the quiver with the arrow span Ã. We see that there exists
a polynomial function of the form Dd1,...,dNc1,...,cN on K
C( eA) (see (4.14), where we have
changed the notation for the arrows to avoid the notation conflict with Section 5) such
that the reduced part (A, S) of a QP (Ã, S̃) is 2-acyclic whenever S̃ ∈ U(Dd1,...,dNc1,...,cN ).
Furthermore, for S̃ ∈ U(Dd1,...,dNc1,...,cN ), the QP (A, S) is right-equivalent to (A
′, H(S̃))
for some regular map H : U(Dd1,...,dNc1,...,cN ) → K
C(A′), where A′ = µk(A). We now define
a polynomial function F : KC(A) → K and a regular map G : U(F ) → KC(A
′) by
setting
(7.6) F = Dd1,...,dNc1,...,cN ◦ G̃, G = H ◦ G̃.
To finish the argument for ℓ = 1, it remains to show that F is not identically equal
to zero. But this is clear from the definitions (4.14) and (5.8), since the oriented 2-
cycles in Ã (up to cyclical equivalence) are of the form c[ba] and so are in a bijection
with the oriented 3-cycles cba in A that pass through k.
Now assume that ℓ ≥ 2, and that our assertion holds if we replace ℓ by ℓ− 1. Let
A1 = µk1(A), so A
′ = µkℓ · · ·µk2(A1). By the inductive assumption, there exist a non-
zero polynomial function F ′ : KC(A1) → K and a regular map G′ : U(F ′) → KC(A
such that, for any QP (A1, S1) with S1 ∈ U(F
′), the QP µkℓ · · ·µk2(A1, S1) is right-
equivalent to (A′, G′(S1)). Also by the already established case ℓ = 1, there exists a
non-zero polynomial function F ′′ : KC(A1) → K such that, for any QP (A1, S1) with
S1 ∈ U(F
′′), the QP µk1(A1, S1) is 2-acyclic, hence is right-equivalent to some QP
on A. Since the base field K is assumed to be infinite, we have U(F ′) ∩ U(F ′′) 6= ∅.
Choose S
1 ∈ U(F
′) ∩ U(F ′′), and let (A, S0) = µk(A1, S
1 ). By Theorem 5.7, we
have µk(A, S0) = (A1, S
1 ). By the above argument for ℓ = 1, there exist a nonzero
polynomial function F1 : K
C(A) → K and a regular map G1 : U(F1) → K
C(A1)
(of the type (7.6)) such that µk(A, S) = (A1, G1(S)) for S ∈ U(F1). In particular,
we have G1(S0) = S
1 implying that F
′ ◦ G1 is a nonzero polynomial function on
KC(A). It follows that the nonzero polynomial function F (S) = F1(S)F
′(G1(S)) and
the regular map G = G′ ◦ G1 : U(F ) → K
C(A′) are well-defined and satisfy all the
required conditions. This completes the proof of Proposition 7.3. �
Corollary 7.4. For every 2-acyclic arrow span A, there exists a countable family F
of nonzero polynomial functions on KC(A) such that the QP (A, S) is nondegenerate
whenever S ∈
F∈F U(F ). In particular, if the base field K is uncountable, then
there exists a nondegenerate QP on A.
Proof. By Proposition 7.3, for every sequence k1, . . . , kℓ as in Definition 7.2, there
exists a nonzero polynomial function Fk1,...,kℓ on K
C(A) such that a QP (A, S) is
(kℓ, . . . , k1)-nondegenerate for S ∈ U(Fk1,...,kℓ). These functions form a desired count-
able family F .
It remains to prove that
F∈F U(F ) 6= ∅ provided K is uncountable. If A is
acyclic, i.e., has no oriented cycles, then KC(A) = {0}, and each of the functions in
F is just a nonzero constant, so there is nothing to prove; no assumption on K is
needed here. If A has at least one oriented cycle then the set C(A) is countable (recall
32 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
that it consists of cyclic paths up to cyclical equivalence). Thus, we can realize KC(A)
as the polynomial ring K[X1, X2, . . . ] in countably many indeterminates. Since K is
uncountable, we can choose x1 so that F (x1) 6= 0 for all F ∈ F ∩K[X1]. Then we
choose x2 so that F (x1, x2) 6= 0 for all F ∈ F ∩K[X1, X2]. Continuing like this, we
find a sequence x1, x2, . . . such that F (x1, x2, . . . ) 6= 0 for all F ∈ F . �
8. Rigid QPs
Proposition 8.1. Every rigid reduced QP (A, S) is 2-acyclic.
Proof. First note that the definition of rigidity can be conveniently restated as fol-
lows:
a QP (A, S) is rigid if and only if every potential(8.1)
on A is cyclically equivalent to an element of J(S).
Now suppose for the sake of contradiction that for some i 6= j both components Ai,j
and Aj,i are non-zero. Choose non-zero elements a ∈ Ai,j and b ∈ Aj,i. Remembering
the definition of the Jacobian ideal (see Definition 3.1), it is easy to see that the cyclic
part of J(S) is contained in m(A)3. It follows that ab is not cyclically equivalent to
an element of J(S), in contradiction with (8.1). �
Combining Proposition 8.1 with Corollary 6.11, we obtain the following result.
Corollary 8.2. Any rigid QP is nondegenerate.
Let us now give some examples.
Example 8.3. Recall that a skew-symmetric integer matrix B is acyclic if the corre-
sponding directed graph (with an arrow i→ j associated with each entry bi,j > 0) has
no oriented cycles. If the matrix B(A) given by (7.1) is acyclic, then R〈〈A〉〉cyc = {0},
and so the only QP associated with A is (A, 0), which is clearly rigid.
Now suppose that A is 2-acyclic, and that B(A) is not necessarily acyclic but
is mutation equivalent to an acyclic matrix (i.e., can be transformed to an acyclic
matrix by a sequence of mutations). As a consequence of Corollary 6.11 and Theo-
rem 5.7, there exists a potential S such that (A, S) is a rigid reduced QP; moreover,
(A, S) is unique up to right-equivalences.
Example 8.4. For A arbitrary, the deformation space of a QP (A, 0) is naturally
identified with the space of potentials modulo cyclical equivalence, hence it is infinite-
dimensional provided A has at least one oriented cycle.
Example 8.5 (Cyclic triangle). Let Q be the quiver with three vertices 1, 2, 3 and
three arrows a : 1 → 2, b : 2 → 3 and c : 3 → 1:
An arbitrary potential S is cyclically equivalent to the one of the form S = F (cba),
where F ∈ K[[t]] is a formal power series. The deformation space Def(A, S) is
naturally isomorphic to the quotient space of tK[[t]] modulo the ideal generated by
QUIVERS WITH POTENTIALS I 33
tdF/dt. If charK = 0, and n ≥ 1 is the smallest exponent such that tn appears in F ,
then dimDef(A, S) = n− 1. In particular, (A, S) is rigid if and only if n = 1.
Now let us consider the QP (Ã, S̃) = µ̃2(A, S); in view of (5.6), (5.7) and (5.8), Ã
has four arrows a⋆, b⋆, c, [ba], and
S̃ = F (c[ba]) + [ba]a⋆b⋆.
Thus, if n ≥ 2 then (Ã, S̃) is reduced and so is equal to µ2(A, S) = (A, S). Since
µ2(A, S) has an oriented 2-cycle formed by the arrows c and [ba], the mutations at
vertices 1 and 3 cannot be applied. We see that the QP (A, F (cba)) is degenerate
for n ≥ 2.
Example 8.6 (Double cyclic triangle). Now consider the quiver with three vertices
1, 2, 3 and six arrows a1, a2 : 1 → 2, b1, b2 : 2 → 3 and c1, c2 : 3 → 1:
b2 ��=
Any potential S on A is cyclically equivalent to the one whose degree 3 component
belongs to the 8-dimensional space A31,1 = A1,3A3,2A2,1. It is known that the diagonal
action of the group GL32 on K
2 ⊗ K2 ⊗K2 has seven orbits, see e.g., [23, Chapter
14, Example 4.5]. Thus, by performing a change of arrows automorphism, we can
assume that the degree 3 component of S is one of the representatives of these orbits.
An easy case-by-case analysis shows that no potential can give rise to a rigid QP
on A.
For instance, let
(8.2) S = c1b1a1 + c2b2a2.
Then J(S) is the closure of the ideal in R〈〈A〉〉 generated by six elements
c1b1, b1a1, a1c1, c2b2, b2a2, a2c2.
One checks easily that the cyclic path c1b2a1c2b1a2 is not cyclically equivalent to an
element of J(S), hence (A, S) is not rigid.
Now let us compute µ2(A, S). Again setting (Ã, S̃) = µ̃2(A, S), we see that Ã has
ten arrows
a⋆1, a
2, c1, c2, [b1a1], [b1a2], [b2a1], [b2a2],
S̃ = c1[b1a1] + c2[b2a2] +
i,j=1
[biaj ]a
To obtain the splitting (4.5) of (Ã, S̃), we apply the automorphism ϕ of R〈〈Ã〉〉 fixing
all arrows except c1 and c2, and such that ϕ(ci) = ci−a
i . An easy check shows that
µ2(A, S) = (A, S) can be described as follows: A is 6-dimensional with the arrows
a⋆1, a
2, [b1a2], [b2a1], and
S = [b1a2]a
1 + [b2a1]a
34 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
Thus, the mutated QP (A, S) can be obtained from the initial QP (A, S) by a
renumbering of the vertices. This implies that one can apply to (A, S) unlimited
mutations at arbitrary vertices, so (A, S) is a non-rigid, nondegenerate QP.
Example 8.7. For each n ≥ 0, let us consider the following quiver Q(n), which we
refer to as the triangular grid of order n. The vertex set of Q(n) is
Q(n)0 = {(p, q, r) ∈ Z
≥0 | p+ q + r = n};
and there is a single arrow (p1, q1, r1) → (p2, q2, r2) if and only if (p2, q2, r2)−(p1, q1, r1)
is one of the three vectors (−1, 1, 0), (0,−1, 1), (1, 0,−1). Thus, the vertices of Q(n)
form a regular triangular grid with n2 cyclically oriented unit triangles. For example,
the quiver Q(4) is
<<yyyyyyyy
031oo
<<yyyyyyyy
121oo
<<yyyyyyyy
022oo
<<yyyyyyyy
211oo
<<yyyyyyyy
112oo
<<yyyyyyyy
013oo
<<yyyyyyyy
301oo
<<yyyyyyyy
202oo
<<yyyyyyyy
103oo
<<yyyyyyyy
004oo
Let A = A(n) be the arrow span of Q(n), and let a ∈ A (resp. b ∈ A, c ∈ A) denote
the sum of all arrows of Q(n) that are parallel translates of (−1, 1, 0) (resp. (0,−1, 1),
(1, 0,−1)). Thus, every interior vertex i has three incoming arrows eia, eib, eic and
three outgoing arrows aei, bei, cei. Every path of length d can be uniquely written
as ad · · · a2a1ej , where each as is one of the elements a, b, c, and j is a vertex; this
expression is non-zero if and only if the polygonal line obtained by attaching to the
vertex j the vectors corresponding to a1, a2, . . . , ad (in this order) is contained in our
grid.
Define a potential S ∈ A3 by setting
S = cba− bca.
Then the Jacobian ideal J(S) is generated by the elements
(cb− bc)ej , (ac− ca)ej , (ba− ab)ej
for all vertices j. It follows that the image of the path ad · · · a2a1ej in the Jacobian
algebra P(A, S) does not change under any permutation of the factors a1, . . . , ad.
In particular, we see that P(A, S) is spanned by the images of the paths ckbℓamej
for all vertices j and all k, ℓ,m such that 0 ≤ k, ℓ,m ≤ n; hence P(A, S) is finite-
dimensional. By a similar argument, it is easy to see that (A, S) is rigid. Indeed,
every potential on A is cyclically equivalent to an element of the closure of the span
of the elements (cba)mej for all vertices j and allm ≥ 0. Denoting by p the projection
QUIVERS WITH POTENTIALS I 35
R〈〈A〉〉 → Tr(P(A, S)), we see that the rigidity of (A, S) follows from the fact that
p(cbaej) = 0 for all j. Now if aej 6= 0 and h(aej) = i then we have
p(cbaej) = p(acbei) = p(cabei) = p(cbaei).
Continuing in the same way, we obtain that, for any m ≥ 1 such that amej 6= 0, we
have p(cbaej) = p(cbaek), where k is the end-point of the path a
mej. Taking m the
largest such that amej 6= 0, we conclude that p(cbaej) = 0, as desired.
As shown in [27], the quiver Q(3) in Example 8.7 is not mutation-equivalent to an
acyclic one. So there exist QPs with finite-dimensional Jacobian algebras (and also
rigid QPs), which are not mutation-equivalent to acyclic ones.
We now describe a procedure to obtain new QPs with finite-dimensional Jacobian
algebras (and new rigid QPs) from old ones.
Definition 8.8. For a QP (A, S) and a subset I of the vertex set Q0, we define the
restriction of (A, S) to I as the QP (A|I , S|I) given by
A|I =
i,j∈I
S|I = ψI(S),
where ψI : R〈〈A〉〉 → R〈〈A|I〉〉 is the algebra homomorphism such that ψI(a) = a
for a ∈ A|I , and ψI(b) = 0 for any arrow b not belonging to A|I .
Proposition 8.9. The homomorphism ψI induces an epimorphism of Jacobian alge-
bras P(A, S) → P(A|I , S|I) and an epimorphism of deformation spaces Def(A, S) →
Def(A|I , S|I). Therefore, if P(A, S) is finite-dimensional, or if (A, S) is rigid, then
the same is true for (A|I , S|I).
Proof. Remembering (3.1), it is easy to see that ψI(∂aS) = ∂aψI(S) for any arrow
a ∈ A|I , and ψI(∂bS) = 0 for any arrow b not belonging to A|I . Therefore, we have
ψI(J(S)) = J(ψI(S)), implying all the assertions. �
Corollary 8.10. Suppose that A and A′ are 2-acyclic, and there is a rigid QP (A, S)
on A. Let B(A) and B(A′) be the corresponding skew-symmetric integer matrices
given by (7.1). Suppose that B(A′) can be obtained by a simultaneous permutation
of rows and columns from some principal submatrix of a matrix mutation-equivalent
to B(A). Then there exists a rigid QP (A′, S ′) on A′.
Proof. In view of (7.1), the matrix B(A|I) is the principal submatrix of B(A) in-
volving rows and columns from I. Therefore, our assertion follows by combining
Proposition 8.9 with Corollary 6.11 and Proposition 7.1. �
We conclude this section with a combinatorial application of Corollary 8.10.
Corollary 8.11. Let B = B(A(n)) be the matrix associated with the triangular grid
of some order n (see Example 8.7). Then none of the matrices mutation-equivalent
to B contains
0 2 −2
−2 0 2
2 −2 0
36 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
as a principal submatrix.
Proof. Note that B′ = B(A′), where A′ is the quiver in Example 8.6. We have seen
that there exists a rigid QP on A, but not on A′. Thus our statement is a special
case of Corollary 8.10. �
Remark 8.12. Using the results in [1, Section 2.6] (cf. also [22, Theorem 1]), it
is easy to see that the quiver Q(n) in Example 8.7 is naturally associated with the
cluster algebra structure in the coordinate ring of the base affine space of the group
SLn+3. We have been informed by J. Schröer that, following his suggestion, A. Seven
has shown that a skew-symmetric matrix B′ associated with an arbitrary tree appears
as a principal submatrix in some matrix mutation-equivalent to the matrix B(A(n))
for some n. J. Schröer also informed us that Corollary 8.11 has been also proved by
B. Keller (using a different method).
9. Relation to cluster-tilted algebras
Let Q be a quiver with the vertex span R and the arrow span A. Assume that
Q is 2-acyclic. Let B(A) be the corresponding skew-symmetric integer matrix given
by (7.1). As shown in [19], the matrix B(A) gives rise to a cluster algebra of finite
type if and only if Q is mutation-equivalent to a Dynkin quiver (that is, an arbitrary
orientation of a Dynkin diagram of one of the types An, Dn, E6, E7, or E8). In
particular, as we have seen in Example 8.3, there is a rigid reduced QP (A, S), and
it is unique up to a right-equivalence; in fact, (the right-equivalence class of) (A, S)
is obtained by a sequence of mutations from a QP (A0, 0), where A0 is associated to
a Dynkin quiver. We now present an explicit choice of such a potential S. To do
this, we need some preparation.
First note that, according to [19, Theorem 1.8], every quiver mutation-equivalent
to a Dynkin quiver has no multiple edges, that is |bi,j| ≤ 1 for every entry of B(A).
In other words, we have:
(9.1) dimAi,j ≤ 1 for all i and j.
Therefore, we can unambiguously denote by ai,j the only arrow in a non-zero subspace
Ai,j . We will also use the convention that ai,j = 0 whenever Ai,j = {0}.
Second, we use the following terminology: a chordless cycle in (the underlying
graph of) Q is a d-subset of vertices I ⊆ Q0 that can be bijectively labeled by the
elements of Z/dZ so that the edges between them are precisely {i, i+1} for i ∈ Z/dZ.
According to [19, Proposition 9.7], if Q is mutation-equivalent to a Dynkin quiver
then the arrows of every chordless cycle in Q must be cyclically oriented. In terms
of the arrow span A, this condition can be stated as follows:
For any chordless d-cycle I, there exists a bijection ν : Z/dZ → I such(9.2)
that the set of arrows in the restriction A|I is {aν(i),ν(i+1) | i ∈ Z/dZ}.
(see Definition 8.8). Note that the choice of a bijection ν in (9.2) is unique up to a
cyclic shift.
We call a potential S on A primitive if it has the form
(9.3) S =
xIaν(1),ν(2) · · · aν(d−1),ν(d)aν(d),ν(1),
QUIVERS WITH POTENTIALS I 37
where the (finite) sum is over all chordless cycles I in Q, for each I there is a
bijection ν chosen as in (9.2), and the coefficients xI are some non-zero elements of
the base field K.
Proposition 9.1. If a quiver Q with the arrow span A is mutation-equivalent to a
Dynkin quiver, and S is a primitive potential on A, then the QP (A, S) is rigid.
To prove Proposition 9.1, we impose on Q a weaker assumption that its arrow
span A satisfies (4.13), (9.1) and (9.2). Choose some vertex k, and (as in Section 7)
let µk(A) denote the 2-acyclic R-bimodule such that the skew-symmetric matrix
B(µk(A)) is obtained from B(A) by the mutation at k. It is easy to see that µk(A)
satisfies (9.1) but not necessarily (9.2). In view of Corollary 6.11, the assertion of
Proposition 9.1 is a consequence of the following lemma.
Lemma 9.2. Suppose that the arrow span A of a quiver Q satisfies (4.13), (9.1) and
(9.2), and that µk(A) also satisfies (9.2) for some vertex k. Let S be a primitive
potential on A. Then the QP (A, S) = µk(A, S) is right-equivalent to a QP on µk(A)
with a primitive potential.
Proof. Let µ̃k(A, S) = (Ã, S̃) be the QP given by (5.6), (5.7) and (5.8). Denote by
In(k) (resp. Out(k)) the set of vertices j (resp. i) such that dimAk,j = 1 (resp.
dimAi,k = 1). In view of (9.2), every arrow a with both ends in In(k) ∪Out(k) has
h(a) ∈ In(k) and t(a) ∈ Out(k). We denote the set of these arrows by Q′1. The
arrows of Ã can be unambiguously denoted as follows:
• ãi,j = ai,j for all i, j different from k and such that ai,j 6= 0.
• ãi,j = [ai,kak,j] for all i ∈ Out(k), j ∈ In(k).
• ãj,k = a
k,j for j ∈ In(k).
• ãk,i = a
i,k for i ∈ Out(k).
We can split S into the sum of four terms
(9.4) S = S1 + S2 + S3 + S4,
where
• S1 involves (oriented) 3-cycles ai,kak,jaj,i.
• S2 involves chordless d-cycles through k with d ≥ 4;
• S3 involves chordless cycles having an arrow aj,i ∈ Q
1 but not passing
through k;
• S4 involves chordless cycles having no arrows with both ends in In(k)∪{k}∪
Out(k).
Using (9.2), it is easy to see that every chordless cycle I involved in S2 or S3 has
exactly one common point with each of the sets In(k) and Out(k). Also note that
every term in S1 or S3 contains exactly one arrow from Q
1, while none of the terms
in S2 or S4 contain such arrows. Remembering (5.8), we write the potential S̃ as
follows:
(9.5) S̃ = [S1] + [S2] + [S3] + [S4] + ∆k.
We have
[S1] =
aj,i∈Q
x{i,j,k}ãi,jaj,i,
38 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
and this is the degree 2 component S̃(2) of S̃. In view of (5.16), the arrows in A are
obtained from those in Ã by removing all arrows aj,i ∈ Q
1 and their opposites ãi,j.
Inspecting the other terms in (9.5), it is easy to see that
[S2] = S3, [S4] = S4,
∆k = S1 +
aj,i∈Q
ãi,jãj,kãk,i,
[S3] =
aj,i∈Q
fi,jaj,i,
where fi,j = ∂aj,iS3, and the terms S1, S3 and S4 have the same meaning as in (9.4)
with A replaced by A. Let ϕ be the automorphism of R〈〈Ã〉〉 acting on arrows as
follows:
ϕ(aj,i) = x
{i,j,k}
(aj,i − ãj,kãk,i), ϕ(ãi,j) = ãi,j − x
{i,j,k}
for every aj,i ∈ Q
1, and ϕ fixes the rest of the arrows. Then we have
ϕ(S̃) = S1 + S3 + S4 + ϕ(
aj,i∈Q
(x{i,j,k}ãi,jaj,i + ãi,j ãj,kãk,i + fi,jaj,i))
= S1 + S3 + S4 +
aj,i∈Q
(ãi,jaj,i − x
{i,j,k}
fi,jãj,kãk,i).
We see that the degree 2 component of ϕ(S̃) is
ϕ(S̃)(2) =
aj,i∈Q
ãi,jaj,i,
and a moment’s reflection shows that
aj,i∈Q
{i,j,k}
fi,jãj,kãk,i
can be viewed as the component S2 of a primitive potential on A. We conclude that
µk(A, S) is right-equivalent to (A, S1 + S2 + S3 + S4), finishing the proof. �
We conclude this section by briefly discussing a connection between Jacobian al-
gebras of rigid QPs and cluster-tilted algebras introduced in [8]. We refer to [8] for
precise definitions; roughly speaking, cluster-tilted algebras are endomorphism rings
of tilting objects in cluster categories. One can associate such an algebra to any
quiver Q which is mutation-equivalent to a Dynkin quiver. As shown in [13, Theo-
rem 4.1] (for type A) and [9, Theorems 4.1, 4.2], the cluster-tilted algebra associated
to Q is isomorphic to the path algebra of Q modulo an explicitly described ideal of
relations. By inspection, this ideal is exactly the Jacobian ideal of a primitive poten-
tial S given by (9.3). Thus, Proposition 9.1 has the following corollary, which shows
that the Jacobian algebras of QPs can be viewed as generalizations of cluster-tilted
algebras.
Corollary 9.3. If a quiver Q with the arrow span A is mutation-equivalent to a
Dynkin quiver, then the Jacobian algebra P(A, S) of a rigid QP on A (explicitly
given by (9.3)) is isomorphic to the cluster-tilted algebra associated to Q.
QUIVERS WITH POTENTIALS I 39
10. Decorated representations and their mutations
The following definition is inspired by [28].
Definition 10.1. A decorated representation of a QP (A, S) is a pair M = (M,V ),
where V is a finite-dimensional (left) R-module, and M is a finite-dimensional
R〈〈A〉〉-module annihilated by J(S).
Equivalently, M is a finite-dimensional P(A, S)-module. We will sometimes write
M = (A, S,M, V ) and refer to such a quadruple as a QP-representation.
We have M =
Mi and V =
Vi, where Mi = eiM and Vi = eiV . With
some abuse of notation, for u ∈ R〈〈A〉〉 or u ∈ P(A, S), we denote the multiplication
operator m 7→ um on M simply as u : M → M ; we write u = uM if we need
to emphasize the dependence of this operator on M . In particular, for each arrow
a ∈ A, we have a :Mt(a) → Mh(a), and a|Mi = 0 for i 6= t(a).
Note that every finite-dimensional R〈〈A〉〉-module M is nilpotent, i.e., M is anni-
hilated by mn for n≫ 0. We thank Bill Crawley-Boevey for pointing this out to us;
the following argument was also suggested by him. The above claim is a special case
of the following more general fact: if a ring U with unit is complete in the I-adic
topology for some two-sided ideal I, and M is a U -module of finite length n, then
InM = {0}. Indeed, the element 1 + x is invertible in U for any x ∈ I, since it has
inverse 1 − x + x2 − x3 + · · · . Thus I is contained in the Jacobson radical J (since
J is the set of x ∈ U such that 1 + xy is invertible for all y ∈ U). Thus IS = {0}
for any simple U -module S (since J is the intersection of annihilators of all simple
modules). Now if M has composition series
{0} =M0 ⊂M1 ⊂ · · · ⊂Mn =M,
then for all i ≥ 1, we have I(Mi/Mi−1) = {0}, so IMi ⊆ Mi−1. It follows that
InM = {0}, as claimed.
The aim of this section is to extend the definition of QP-mutations in Corollary 5.4
and Definition 5.5 to the level of QP-representations, and to prove a representation-
theoretic extension of Theorem 5.7. To do this, we first introduce right-equivalence
for QP-representations.
Definition 10.2. Let (A, S) and (A′, S ′) be QPs on the same set of vertices, and let
M = (M,V ) (resp. M′ = (M ′, V ′)) be a decorated representation of (A, S) (resp.
of (A′, S ′)). A right-equivalence between M and M′ is a triple (ϕ, ψ, η), where:
• ϕ : R〈〈A〉〉 → R〈〈A′〉〉 is a right-equivalence between (A, S) and (A′, S ′) (see
Definition 4.2);
• ψ : M → M ′ is a vector space isomorphism such that ψ ◦ uM = ϕ(u)M ′ ◦ ψ
for u ∈ R〈〈A〉〉;
• η : V → V ′ is an isomorphism of R-modules.
Remark 10.3. The usual notion of isomorphism of decorated representations M =
(M,V ) and M′ = (M ′, V ′) of the same QP (A, S) (namely, that M and M ′ are iso-
morphic P(A, S)-modules, and V and V ′ are isomorphic R-modules) is a special case
of right-equivalence with the ϕ-component being the identity. The right-equivalence
seems to be more relevant for applications to cluster algebras. To illustrate, consider
the Kronecker quiver Q with two vertices 1 and 2, and two arrows a and b from
40 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
1 to 2. Since Q has no oriented cycles, it has only one QP (A, S): the one with
S = 0. Thus, a decorated representation M = (M,V ) with V = 0 is just a usual
representation of the quiver Q, i.e., it consists of two vector spaces M1 and M2, and
two linear maps a and b from M1 to M2. Such representations were classified by
Kronecker. In particular, he proved that, for every n ≥ 1, the isomorphism classes of
indecomposable Q-representations with dimM1 = dimM2 = n are naturally param-
eterized by the projective line. However all these representations are right-equivalent
to each other. This is more in sync with the fact that, as discussed in [16], all these
representations give rise to the same element of the corresponding cluster algebra.
We now present a representation-theoretic extension of Theorem 4.6. Let M =
(A, S,M, V ) be a QP-representation, and let ϕ : R〈〈Ared⊕C〉〉 → R〈〈A〉〉 be a right-
equivalence of the QPs (Ared, Sred)⊕(C, T ) and (A, S), where (Ared, Sred) is a reduced
QP, and (C, T ) is a trivial QP, see Theorem 4.6. We define a R〈〈Ared〉〉-module M
by setting M ′ =M as a K-vector space with the action of R〈〈Ared〉〉 given by uM ′ =
ϕ(u)M . In view of Proposition 4.5, this makes a quadruple Mred = (Ared, Sred,M
′, V )
a well-defined QP-representation.
Definition 10.4. We call the QP-representation Mred given by the above construc-
tion the reduced part of M.
This terminology is justified by the following.
Proposition 10.5. The right-equivalence class of Mred is determined by the right-
equivalence class of M.
Proof. To get more in sync with the notation in Proposition 4.9, we change our
notation a little bit and assume that M is a decorated representation of a QP
(A ⊕ C, S + T ), where (A, S) is a reduced QP, and (C, T ) a trivial one. Let ϕ
be an auto-right-equivalence of (A ⊕ C, S + T ), that is, an algebra automorphism
of R〈〈A ⊕ C〉〉 such that ϕ(S + T ) is cyclically equivalent to S + T . To prove
Proposition 10.5, it suffices to show the following:
there exists an algebra automorphism ϕ′ of R〈〈A〉〉 such that(10.1)
ϕ′(S) is cyclically equivalent to S, and ϕ′(u)M = ϕ(u)M for u ∈ R〈〈A〉〉.
As in the proof of Proposition 4.9, let L denote the closure of the two-sided ideal
in R〈〈A⊕ C〉〉 generated by C. Recall from (4.10) that we have
R〈〈A⊕ C〉〉 = R〈〈A〉〉 ⊕ L,
J(S + T ) = J(S)⊕ L
(in the last equality, J(S) is understood as the Jacobian ideal of S in R〈〈A〉〉). In
particular, we have uM = 0 for u ∈ L.
Let ψ : R〈〈A〉〉 → R〈〈A〉〉 denote the restriction to R〈〈A〉〉 of the composition
p ◦ ϕ, where p is the projection of R〈〈A⊕ C〉〉 onto R〈〈A〉〉 along L. Then we have
ψ(u)M = ϕ(u)M for u ∈ R〈〈A〉〉.
According to (4.12), ψ is an algebra automorphism of R〈〈A〉〉. Furthermore, using
(4.12) in conjunction with Proposition 4.10, we conclude that J(ψ(S)) = J(S), and
QUIVERS WITH POTENTIALS I 41
that there exists an algebra automorphism η of R〈〈A〉〉 such that η(ψ(S)) is cyclically
equivalent to S, and η(u)− u ∈ J(S) for u ∈ R〈〈A〉〉. Taking ϕ′ = η ◦ψ, we see that
ϕ′(u)M = η(ψ(u))M = ψ(u)M = ϕ(u)M
for u ∈ R〈〈A〉〉. Thus ϕ′ satisfies all the conditions in (10.1), and we are done. �
We turn to the definition of mutations for a QP-representation M = (A, S,M, V ).
We fix a vertex k satisfying (5.1), and let a1, . . . , as be all arrows with h(ap) = k,
and b1, . . . , bt be all arrows with t(bq) = k. We denote
(10.2) Min =
Mt(ap), Mout =
Mh(bq).
Let α = αM : Min → Mk and β = βM : Mk → Mout be K-linear maps given in the
matrix form by
(10.3) α =
a1 a2 · · · as
, β =
 .
We also introduce a K-linear map γ = γM : Mout → Min as follows. Replacing S
if necessary by a cyclically equivalent potential, we may assume that S ∈ R〈〈A〉〉k̂,k̂
(see (6.1)). We identify R〈〈A〉〉
k̂,k̂
with R〈〈Ã
k̂,k̂
〉〉 as in Lemma 6.2. This allows us
to define the component γp,q :Mh(bq) →Mt(ap) of γ by setting
(10.4) γp,q = ∂[bqap]S.
Thus, we have constructed the following triangle of linear maps:
(10.5) Mk
<<zzzzzzzz
Lemma 10.6. We have αγ = 0 and γβ = 0.
Proof. The q-th component of αγ is equal to
ap∂[bqap]S = ∂bqS;
therefore, αγ = 0, since M is a representation of P(A, S). Similarly, the p-th com-
ponent of γβ is equal to ∑
(∂[bqap]S)bq = ∂apS,
implying that γβ = 0. �
Now let (Ã, S̃) be given by (5.4) and (5.8). We associate to a QP-representation
M = (A, S,M, V ) the QP-representation µ̃k(M) = (Ã, S̃,M, V ) defined as follows.
First, we set
(10.6) M i =Mi, V i = Vi (i 6= k).
42 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
We define Mk and V k by
(10.7) Mk =
ker γ
⊕ im γ ⊕
⊕ Vk, V k =
ker β
ker β ∩ im α
We now define the action on M of all arrows in Ã (recall that they are given
by (5.6) and (5.7)). Thus, for every such arrow c, we need to define a linear map
cM :M t(c) → Mh(c).
First, we set
cM = cM
for every arrow c not incident to k, and
[bqap]M = (bqap)M
for all p and q.
To define the action of the remaining arrows a⋆p and b
q , we assemble them into the
operators
b⋆1 b
2 · · · b
in the same way as in (10.3). Thus, we need to define linear maps
α :Mout =M in →Mk, β :Mk → Mout =Min.
We will use the following notational convention: whenever we have a pair U1 ⊆ U2
of vector spaces, denote by ι : U1 → U2 the inclusion map, and by π : U2 → U2/U1
the natural projection. We now introduce the following splitting data:
Choose a linear map ρ :Mout → ker γ such that ρι = idker γ.(10.8)
Choose a linear map σ : kerα/im γ → kerα such that πσ = idkerα/im γ .(10.9)
Then we define:
(10.10) α =
 , β =
0 ι ισ 0
Having defined the action of all arrows in Ã on M , we can view M as a module
over the path algebra R〈Ã〉. The property thatM is annihilated by m(A)n for n≫ 0
implies thatM is annihilated by Ãn for n≫ 0. This allows us to viewM as a module
over the completed path algebra R〈〈Ã〉〉.
Proposition 10.7. The above definitions make µ̃k(M) = (M,V ) a decorated repre-
sentation of (Ã, S̃).
QUIVERS WITH POTENTIALS I 43
Proof. We only need to show that (∂cS̃)M = 0 for every arrow c in Ã. If c is not
incident to k, the desired statement follows from (6.6). If c is one of the arrows [bqap],
then, in view of (6.5) and (10.4), it is enough to show that
q + γp,q = 0
for all p and q. In other words, we need to show that βα = −γ; But this follows at
once by multiplying the row and column given by (10.10).
It remains to show that (∂a⋆p S̃)M = 0 and (∂b⋆q S̃)M = 0 for all p and q. We first
deal with the first equality. Remembering (5.8) and (5.9), we see that
(∂a⋆p S̃)M = (
b⋆q[bqap])M = (
(b⋆q)M(bq)M)(ap)M .
Thus it suffices to show that
(b⋆q)M(bq)M = 0,
or equivalently, αβ = 0. In view of (10.10), we have
 = 0,
as desired (the equality πρβ = 0 is immediate from the definitions, while γβ = 0 by
Lemma 10.6).
The remaining equality (∂b⋆ν S̃)M = 0 is proved in a similar way. We have
(∂b⋆q S̃)M = (
[bqap]a
p)M = (bq)M
(ap)M(a
p)M .
Thus, it suffices to observe that
0 αι αισ 0
Example 10.8. Let us illustrate the definition of the operation µ̃k by a special case
where the vertex k is a sink or a source. First suppose k is a sink, that is, there are
no arrows b with t(b) = k. Then we have Mout = {0}, hence β = 0 and γ = 0. Thus,
(10.7) simplifies to
Mk = kerα⊕ Vk, V k = coker α .
The arrow span Ã is obtained from A by reversing all arrows a with h(a) = k, that
is, replacing every such arrow a with a⋆. Thus, k becomes a source for Ã, hence
M in = {0} and α = 0. The choice of splitting data (10.8) and (10.9) becomes
immaterial, and the second equality in (10.10) simplifies to
Note that we have Ã
k̂,k̂
k̂,k̂
, and the potential S̃ ∈ R〈〈Ã
k̂,k̂
〉〉 is naturally identified
with S.
44 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
The case where k is a source is completely similar. In this case we haveMin = {0},
hence α = 0 and γ = 0. It follows that
Mk = coker β ⊕ Vk, V k = ker β ,
and the map α :M in =Mout →Mk is given by
In both cases, µ̃k coincides with the “extended reflection functor” introduced in
[28]; furthermore, if we ignore the decorations (and the potentials), it becomes the
classical Bernstein-Gelfand-Ponomarev reflection functor at k, see [3].
Now we return to the case of an arbitrary vertex k.
Proposition 10.9. The isomorphism class of the decorated representation µ̃k(M)
does not depend on the choice of the splitting data (10.8) – (10.9).
Proof. We have the following freedom in choosing the splitting data: one can replace
ρ :Mout → ker γ with ρ
′ = ρ+ ξγ for some linear map ξ : im γ → ker γ, and replace
σ : kerα/im γ → kerα by σ′ = σ + η for some linear map η : kerα/im γ → im γ.
Let α′ and β
be the maps obtained by replacing ρ with ρ′, and σ with σ′ in (10.10).
It is enough to show that ψα = α′ and β
ψ = β for some linear automorphism
ψ : Mk → Mk. Decomposing Mk as in (10.7), we define ψ as the block-triangular
matrix
I πξ 0 0
0 I −η 0
0 0 I 0
0 0 0 I
 ,
where I stands for the identity transformation. The invertibility of ψ is obvious,
and the desired equalities ψα = α′ and β
ψ = β are checked by direct matrix
multiplication. �
Proposition 10.10. The right-equivalence class of the representation µ̃k(M) is de-
termined by the right-equivalence class of M.
Proof. Let ϕ be an automorphism of R〈〈A〉〉, and let M′ = (A,ϕ(S),M ′, V ′) be the
QP-representation defined as follows: V ′ = V and M ′ =M as R-modules, while the
R〈〈A〉〉-actions in M and M ′ are related by
(10.11) uM = ϕ(u)M ′ (u ∈ R〈〈A〉〉)
(note that (10.11) indeed defines a representation of (A,ϕ(S)) in view of Proposi-
tion 3.7). To prove Proposition 10.10, it suffices to show that the representations
µ̃k(M) and µ̃k(M
′) are right-equivalent.
We retain all the above notation related to M and µ̃k(M); in particular, α, β
and γ stand for the linear maps in the triangle (10.5). Let α′, β ′ and γ′ denote the
corresponding maps for the representation M′. Our first order of business is to relate
these maps to α, β and γ.
We can write the action of ϕ on the arrows a1, . . . , as as follows:
(10.12)
ϕ(a1) ϕ(a2) · · · ϕ(as)
a1 a2 · · · as
QUIVERS WITH POTENTIALS I 45
where C = C0 +C1 is an invertible s× s matrix as in (5.14). Similarly, the action of
ϕ on the arrows b1, . . . , bs can be written as
(10.13)
ϕ(b1)
ϕ(b2)
ϕ(bt)
 = D
 ,
where D = D0 +D1 is an invertible t× t matrix as in (5.15). Therefore we have
a1 a2 · · · as
ϕ(a1) ϕ(a2) · · · ϕ(as)
(10.14)
a1 a2 · · · as
C)M ′ = α
′CM ′ ,
and similarly,
(10.15) β = DM ′β
here CM ′ (resp. DM ′) is understood as an R-module automorphism of M
in = Min
(resp. of M ′out =Mout).
Turning to the maps γ and γ′, we claim that they are related by
(10.16) γ′ = CM ′γDM ′.
To see this, we use (10.4) and (3.6) to write
γ′p,q = (∂[bqap]ϕ(S))M ′(10.17)
∆[bqap](ϕ(c))�ϕ(∂cS))M ′,
where the sum is over all arrows c in Ãk̂,k̂. If c is one of the arrows in A then by
(10.11) we have
ϕ(∂cS)M ′ = (∂cS)M = 0;
remembering the definition (3.3), we see that c does not contribute to (10.17). Thus,
we have
(10.18) γ′p,q = (
p′,q′
∆[bqap](ϕ(bq′ap′))�ϕ(∂[bq′ap′ ]S))M ′.
Remembering (3.2), and using (10.12) and (10.13), we see that the summand with
(p′, q′) = (p, q) in (10.18) contains among its terms the (p, q)-entry of the matrix
(Cϕ(∂[bqap]S)D)M ′ = CM ′γDM ′.
Thus, to prove (10.16), it remains to show that the rest of the terms in (10.18) add
up to 0. Again using the definitions (3.2) and (3.3), we can rewrite the rest of the
sum in (10.18) as S1 + S2, where
S1 = (
p′,q′
∆[bqap](ϕ(bq′))�ϕ(ap′ · ∂[bq′ap′ ]S))M ′,
S2 = (
p′,q′
∆[bqap](ϕ(ap′))�ϕ(∂[bq′ap′ ]S · bq′))M ′.
46 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
It remains to observe that
S1 = (
∆[bqap](ϕ(bq′))�ϕ(∂bq′S))M ′ = 0
since ϕ(∂bq′S)M ′ = (∂bq′S)M = 0; and similarly,
S2 = (
∆[bqap](ϕ(ap′))�ϕ(∂ap′S))M ′ = 0.
In view of (10.14), (10.15) and (10.16), we have
kerα = C−1M ′(kerα
′), im α = im α′,
ker β = ker β ′, im β = DM ′(im β
′),(10.19)
ker γ = DM ′(ker γ
′), im γ = C−1M ′(im γ
Recall that the spaces M and V in the decorated representation µ̃k(M) = (M,V )
of (Ã, S̃) are given by (10.6) and (10.7). We express the decorated representation
µ̃k(M
′) = (M ′, V ′) of (Ã, ϕ̃(S)) in the same way, with the maps α, β and γ replaced
by α′, β ′ and γ′. In particular, we have V ′ = V , and M ′i = M i = Mi for i 6= k.
To specify the actions of R〈〈A〉〉 in M and M ′, we need to choose the splitting data
(ρ, σ) and (ρ′, σ′) as in (10.8) and (10.9). Note that, in view of (10.19), we can choose
(10.20) ρ′ = D−1M ′ρDM ′, σ
′ = CM ′σC
here with some abuse of notation we use the same notation C−1M ′ for the isomorphism
kerα′ → kerα and the induced isomorphism kerα′/im γ′ → kerα/im γ.
Everything is now in place for defining the desired right-equivalence (ϕ̂, ψ, η) be-
tween µ̃k(M) and µ̃k(M
′) (see Definition 10.2). First of all, we define ϕ̂ : R〈〈Ã〉〉 →
R〈〈Ã〉〉 as the right-equivalence between (Ã, S̃) and (Ã, ϕ̃(S) constructed in the proof
of Lemma 5.3. In particular, we have
ϕ̂(a⋆1)
ϕ̂(a⋆2)
ϕ̂(a⋆s)
 = C
 ,
ϕ̂(b⋆1) ϕ̂(b
2) · · · ϕ̂(b
b⋆1 b
2 · · · b
Next we define ψ :M →M ′ as the identity map on ⊕i 6=kM i = ⊕i 6=kMi = ⊕i 6=kM ′i,
and the restriction ψ|Mk :Mk →M
k given by the block-diagonal matrix
(10.21) ψ|Mk =
D−1M ′ 0 0 0
0 CM ′ 0 0
0 0 CM ′ 0
0 0 0 I
(this is well-defined in view of (10.19)). Finally, we define η : V → V ′ simply as the
identity map.
The only thing to check is the equality ψ ◦ cM = ϕ̂(c)M ′ ◦ ψ for any arrow c ∈ Ã.
And the only case that may require some consideration is when c is one of the arrows
a⋆p or b
q . Unraveling the definitions, it suffices to show that
β = C−1M ′β
′ ◦ ψ|Mk , ψ|Mk ◦ α = α
′D−1M ′.
QUIVERS WITH POTENTIALS I 47
But this is an immediate consequence of the definitions (10.21) and (10.10) (we also
need an analogue of (10.10) for the maps β ′ and α′, using the splitting data (10.20)).
This completes the proof of Proposition 10.10. �
Note that in the above treatment of the operation M 7→ µ̃k(M) for a QP-
representation M = (A, S,M, V ), the QP (A, S) was not assumed to be reduced.
Recall from Proposition 10.5 that we have a well-defined operation M 7→ Mred on
(right-equivalence classes of) QP-representations. The following property is imme-
diate from definitions.
Proposition 10.11. Let (A, S) be a QP satisfying (5.1). Then, for every represen-
tation M of (A, S), the representation µ̃k(M)red is right-equivalent to µ̃k(Mred)red.
Recall that, according to Corollary 5.4 and Definition 5.5, the correspondence µ̃k :
(A, S) 7→ µ̃k(A, S) = (Ã, S̃) gives rise to the mutation (A, S) 7→ µk(A, S) = (A, S),
which is a well-defined bijective transformation on the set of right-equivalence classes
of reduced QPs satisfying (5.1). Here (A, S) is the reduced part of (Ã, S̃). Now for
every QP-representation M = (A, S,M, V ) of a reduced QP (A, S) we define
(10.22) µk(M) = µ̃k(M)red;
thus, µk(M) is a decorated representation (A, S,M, V ) of a reduced QP (A, S).
Combining Propositions 10.5 and 10.10, we obtain the following important corollary.
Corollary 10.12. The correspondence M 7→ µk(M) is a well-defined transforma-
tion on the set of right-equivalence classes of decorated representations of reduced
QPs satisfying (5.1).
We refer to the transformation M 7→ µk(M) in Corollary 10.12 as the mutation at
vertex k. With some abuse of terminology, we will talk about mutations of decorated
representations (rather than their right-equivalence classes).
The following result naturally extends Theorem 5.7.
Theorem 10.13. The mutation µk of decorated representations is an involution; that
is, for every decorated representation M of a reduced QP (A, S) satisfying (5.1), the
decorated representation µ2k(M) of a QP µ
k(A, S) is right-equivalent to M.
Proof. In view of Proposition 10.11, µ2k(M) is right-equivalent to µ̃
k(M)red. There-
fore, is suffices to show that µ̃2k(M)red is right-equivalent to M.
We write the QP-representation µ̃2k(M) as µ̃
k(M) = (
S,M, V ). The QP (
is given by (5.18) and (5.19). In particular, there is a natural embedding of A into
A identifying A with the reduced part
Ared. Furthermore, as shown in the proof of
Theorem 5.7, an automorphism of R〈〈
A〉〉 that establishes the right-equivalence in
(5.17) can be chosen so that it restricts to an automorphism ϕ0 : R〈〈A〉〉 → R〈〈A〉〉
acting as follows:
ϕ0 multiplies each of the arrows b1, . . . , bt by −1,(10.23)
and fixes the rest of the arrows in A.
48 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
In view of Definition 10.4, the QP-representation µ2k(M) = µ̃
k(M)red can be realized
as (A, S,M ′, V ′), where M ′ = M and V ′ = V as vector spaces, and the action of
R〈〈A〉〉 in M ′ is given by
(10.24) uM ′ = ϕ0(u)M (u ∈ R〈〈A〉〉).
To prove Theorem 10.13, it suffices to show that the decorated representation (M ′, V ′)
of (A, S) is isomorphic to (M,V ).
We first compute M ′ = M and V ′ = V as vector spaces. According to (10.6), we
M ′i =M i =M i =Mi, V
i = V i = V i = Vi
for all i 6= k. As for the spaces M ′k and V
k , they are given as in (10.7), with the maps
α, β, and γ replaced by α, β, and γ, respectively. Recall that α and β are given by
(10.10). As for γ, by applying the definition (10.4) to the potential S̃ given by (5.8)
and (5.9), we see that
(10.25) γ = βα.
As a direct consequence of the definitions, we conclude that
kerα = im β, im α = ker γ
⊕ im γ ⊕ {0} ⊕ {0},
ker β = ker γ
⊕ {0} ⊕ {0} ⊕ Vk, im β = kerα,(10.26)
ker γ = ker(βα), im γ = im (βα).
It follows that
V ′k = V k =
ker β
ker β ∩ im α
= Vk,
and so V ′ = V , as desired. We also have
M ′k =
ker(βα)
⊕ im (βα)⊕
im (βα)
ker β
ker β ∩ im α
We now make the following easy observations:
• the map α induces an isomorphism ker(βα)/ kerα → ker β ∩ im α;
• the map β induces an isomorphism im α/(ker β ∩ im α) → im (βα);
• the map β induces an isomorphism Mk/(ker β + im α) → im β/im (βα).
• there is a natural isomorphism ker β/(ker β ∩ im α) → (ker β + im α)/im α.
Using these isomorphisms, we can identify M ′k with the space
(10.27) M ′k = (ker β ∩ im α)⊕
ker β ∩ im α
ker β + im α
ker β + im α
To describe the action of R〈〈A〉〉 in M ′, we need only to describe the maps α′ :
Min → M
k and β
′ : M ′k → Mout constructed in the same way as in (10.3). As in
(10.10), the definition of α′ and β ′ involves splitting data (10.8) - (10.9). In view of
(10.26), in the current situation the splitting data take the following form:
Choose a linear map ρ :Min → ker(βα) such that ρι = idker(βα).
Choose a linear map σ : im β/im (βα) → im β such that πσ = idim β/im (βα).
QUIVERS WITH POTENTIALS I 49
Adapting (10.10) to the current situation (in particular, realizing M ′k as in (10.27)),
we see that the maps α′ and β ′ take the following form:
(10.28) α′ =
 , β
0 −β −ισβ 0
here with some abuse of notation we denote by the same symbol β the two maps
im α/ kerβ ∩ im α → Mout and Mk/(ker β + im α) → im β/im (βα) induced by β.
Note that the appearance of the minus sign in β ′ is caused by the minus sign in
(10.23).
To complete the proof of Theorem 10.13, it remains to construct an isomorphism
of vector spaces ψ :M ′k →Mk such that
(10.29) ψα′ = α, βψ = β ′.
To do this, notice that the four direct summands in (10.27) are the factors in the
filtration
{0} ⊆ ker β ∩ im α ⊆ im α ⊆ ker β + im α ⊆Mk.
Choose some sections
σ1 : im α/(ker β ∩ im α) → im α,
σ2 : (ker β + im α)/im α→ (ker β + im α),
σ3 :Mk/(ker β + im α) →Mk
for the three factors of this filtration, so that they satisfy:
im σ1 = α(ker ρ), im σ2 ⊆ ker β, im (βσ3) = im σ.
Now define an an isomorphism ψ :M ′k →Mk by setting
−ι −ισ1 −ισ3 −ισ2
The equalities (10.29) are checked by a direct inspection, finishing the proof. �
Note that there is an obvious way to define direct sums for decorated repre-
sentations of a given QP (A, S). Hence we can talk about indecomposable QP-
representations. Clearly, the right-equivalence relation respects direct sums and in-
decomposability. It is also immediate from the definitions that any mutation µk of
QP-representations sends direct sums to direct sums. Combining this with Theo-
rem 10.13, we obtain the following corollary.
Corollary 10.14. Any mutation µk is an involution on the set of right-equivalence
classes of indecomposable decorated representations of reduced QPs satisfying (5.1).
We call a QP-representation (A, S,M, V ) positive if V = {0}. Thus, indecompos-
able positive representations are just indecomposable P(A, S)-modules. In partic-
ular, for every vertex k, the simple representation Sk(A, S) is the indecomposable
positive representation of (A, S) such that dimMi = δi,k. We denote by S
k (A, S) the
indecomposable representation (M,V ) of (A, S) such thatM = {0} and dim Vi = δi,k.
We refer to S−k (A, S) as the negative simple representation at k. The following propo-
sition is immediate from the definitions.
50 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
Proposition 10.15. Any indecomposable QP-representation is either positive, or
negative simple. If µk(A, S) = (A, S) then we have
(10.30) µk(Sk(A, S)) = S
k (A, S), µk(S
k (A, S)) = Sk(A, S);
and this is the only mutation that interchanges positive and negative indecomposable
representations.
11. Some three-vertex examples
In this section we illustrate the action of mutations on QP-representations by
some examples dealing with three-vertex quivers. All the representations (M,V )
considered below will be positive, i.e., V = {0}.
Example 11.1. Let Q be the quiver with three vertices 1, 2, 3 and two arrows a :
1 → 2 and b : 2 → 3:
Since Q is acyclic, the only QP on it is (A, 0). We have µ2(A, 0) = (A, S), where A
is the arrow span of the quiver Q given by
and S = b⋆[ba]a⋆. Thus, positive representations of (A, 0) are the representations of
the quiver Q, while positive representations of (A, S) are the representations of the
quiver Q satisfying the relations
(11.1) b⋆[ba] = [ba]a⋆ = a⋆b⋆ = 0.
In view of Corollary 10.14 and Proposition 10.15, the mutation µ2 establishes a
bijection between the set of right-equivalence classes of indecomposable positive rep-
resentations of (A, 0) different from the simple representation S2, and the same set for
(A, S). Since Q is a Dynkin quiver of type A3, by Gabriel’s theorem, an indecompos-
able positive representationM of (A, 0) is uniquely up to an isomorphism determined
by its dimension vector dimM = (dimM1, dimM2, dimM3), and these dimension
vectors are the positive roots of type A3 (note that in this case, the right-equivalence
classes are the same as isomorphism classes). Computing the images of these rep-
resentations under µ2, we obtain the correspondence between the dimension vectors
given in Table 1. We conclude that an indecomposable positive representation of
dimM (1,0,0) (0,0,1) (1,1,0) (0,1,1) (1,1,1)
dim µ2(M) (1,1,0) (0,1,1) (1,0,0) (0,0,1) (1,0,1)
Table 1. Indecomposable representations for A3 and the cyclic triangle.
QUIVERS WITH POTENTIALS I 51
(A, S) is uniquely up to right-equivalence determined by its dimension vector, and
these dimension vectors are given in the second line of Table 1, with the exception
of dimS2 = (0, 1, 0).
Example 11.2. Now let Q be the quiver with three vertices 1, 2, 3 and three arrows
a : 1 → 2, b : 2 → 3, and c : 1 → 3:
Again, the only QP on Q is (A, 0). We have µ2(A, 0) = (A, S), where A is the arrow
span of the quiver Q given by
and S = b⋆[ba]a⋆. Again, positive representations of (A, 0) are the representations of
the quiver Q, while positive representations of (A, S) are the representations of the
quiver Q satisfying the relations (11.1).
We consider indecomposable positive representations of (A, 0) with the dimension
vector (n, n, n) for some n ≥ 1. Assume that K is algebraically closed. Since Q is
an extended Dynkin quiver of type A
2 , and (n, n, n) is an isotropic imaginary root,
by Kac’s extension of Gabriel’s theorem, the isomorphism classes of indecomposable
Q-representations of this dimension form a 1-parametric family. An easy check shows
that these representations break into three right-equivalence classes. Their represen-
tatives can be described as follows. For each of them we have M1 =M2 =M3 = K
and two of the maps aM , bM , cM are equal to the identity map I, while the third one
is the nilpotent Jordan block N . If aM = N (resp. bM = N , cM = N) then we
denote the corresponding Q-representation by M(a) (resp. M(b), M(c)). In view of
(10.7), ifM is one of these representations then µ2(M) =M is positive, and we have
M 2 = coker bM ⊕ ker aM
(note that since S = 0, we have γ = 0). It follows that M(c) has dimension vector
(n, 0, n), with the maps [ba]
: Kn → KN given by [ba]
= I, c
= N .
Also both representations M(a) and M(b) have dimension vector (n, 1, n). In each
of them, the arrows [ba] and c act as [ba] = N, c = I. We also have b⋆
= 0, while
the map a⋆
: K → Kn has im a⋆
= kerN ; similarly, a⋆
= 0, while the map
: Kn → K has ker b⋆
= im N .
Example 11.3. Our last example deals with the QP (A, S) from Example 8.6.
Thus, the quiver in question has three vertices 1, 2, 3 and six arrows a1, a2 : 1 → 2,
52 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
b1, b2 : 2 → 3 and c1, c2 : 3 → 1; and the potential S is given by (8.2).
b2 ��=
To specify a positive representationM of (A, S), we need to define three vector spaces
M1,M2,M3, and six linear maps (a1)M , (a2)M :M1 → M2, (b1)M , (b2)M : M2 → M3,
and (c1)M , (c2)M :M3 → M1. In our case, J(S) is the closure of the ideal in R〈〈A〉〉
generated by six elements
c1b1, b1a1, a1c1, c2b2, b2a2, a2c2.
Thus, all the compositions (c1)M(b1)M , . . . , (a2)M(c2)M must be equal to 0.
We first consider the indecomposable positive representation M of (A, S) given
(11.2) M1 =M2 = K, M3 = 0; (a1)M = (a2)M = 1.
Let us compute µ2(M) = (M,V ). First of all, the QP µ2(A, S) = (A, S) was
computed in Example 8.6: recall that the arrows in A are a⋆1, a
2, [b1a2], [b2a1],
and the potential S is given by
S = [b1a2]a
1 + [b2a1]a
To compute M and V , we apply (10.6) and (10.7) to the triangle (10.5) given by
Min = K
2, Mk =M2 = K, Mout = {0}, α =
(so we have β = 0 and γ = 0). It follows that V = {0}, i.e., µ2(M) is positive; we
also have M 1 =M1 = K, M 3 =M3 = {0}, and
M 2 = kerα = K ·
(this is the third term in the decomposition of Mk in (10.7)). Since M3 = 0, the
arrows b⋆1, b
2, [b1a2], [b2a1] act as 0 in M . As for a
1 and a
2, their action is given by
the second equality in (10.10) (note that the choice of a splitting (10.9) is immaterial
here). Namely, identifying M 2 with K via choosing
as the standard basis
vector, we obtain
(a⋆1)M = 1, (a
2)M = −1
as maps M2 = K → K =M 1.
Note that the resulting representation µ2(M) can be conveniently described as
follows: by renumbering the vertices of our quiver via
(11.3) 1′ = 2, 2′ = 1, 3′ = 3,
and setting
(11.4) a′1 = −a
2 = a
1 = [b1a2], b
2 = [b2a1], c
1 = −b
2 = b
the representation µ2(M) gets identified with the initial representation M of the
initial QP (A, S).
QUIVERS WITH POTENTIALS I 53
The mutation µ1(M) can be computed in a similar way. But since we have already
computed the QP µ2(A, S) = (A, S), we find it more convenient to renumber the
vertices via 1′ = 3, 2′ = 1, 3′ = 2, so that µ1(M) gets identified with µ2(M
′), where
M ′ is given by:
(11.5) M ′1 = 0, M
3 = K, ; (b1)M ′ = (b2)M ′ = 1.
Now the triangle (10.5) is given by
M ′in = {0}, M
2 = K, M
out = K
2, β =
(so we have α = 0 and γ = 0). It follows that µ2(M
′) is positive, and we have
M ′1 =M
1 = {0}, M
3 = K, and
M ′2 = K
2/K ·
(this is the first term in the decomposition of Mk in (10.7)). Since M
1 = 0, the
arrows a⋆1, a
2, [b1a2], [b2a1] act as 0 in M
′. As for b⋆1 and b
2, their action is given by
the first equality in (10.10) (note that the choice of a splitting (10.8) is immaterial
here). Namely, identifying M ′2 with K via choosing π(
) = −π(
) as the
standard basis vector, we obtain
(b⋆1)M ′ = −1, (b
2)M ′ = 1
as maps M ′3 = K → K =M ′2.
As above, by renumbering the vertices of our quiver via
(11.6) 1′ = 1, 2′ = 3, 3′ = 2,
and setting
(11.7) a′1 = [b2a1], a
2 = [b1a2], b
1 = b
2 = −b
1 = a
2 = −a
the resulting representation µ2(M
′) gets identified with the initial representationM ′.
We now include the representations M and M ′ given by (11.2) and (11.5) into
a family of positive representations of (A, S) defined as follows: for every pair of
nonnegative integers (m,n) 6= (0, 0), we define the positive representation M =
M(m,n) of (A, S) by setting
(11.8) M1 = K
m, M2 = K
m+n, M3 = K
(a1)M =
, (a2)M =
, (b1)M =
,(11.9)
(b2)M =
, (c1)M = 0, (c2)M = 0,
where In is the n× n identity matrix.
We refer to the representations M(m,n) as well as those obtained from them by
renumbering the vertices as band representations ; they are a special case of band
modules studied in [11, 21] in the context of string algebras. Note that both repre-
sentations M and M ′ treated above are indeed special cases of band representations:
54 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
we have M = M(1, 0), M ′ = M(0, 1). By a direct generalization of the above
computations, we obtain the following proposition.
Proposition 11.4.
(1) If m ≥ n then after renumbering of vertices as in (11.3) and the change of
arrows as in (11.4), the representation µ2(M(m,n)) can be identified with
M(m− n, n).
(2) If m ≤ n then after renumbering of vertices as in (11.6) and the change of
arrows as in (11.7), the representation µ2(M(m,n)) can be identified with
M(m,n−m).
Remembering Theorem 10.13, we obtain the following corollary.
Corollary 11.5.
(1) After renumbering of vertices as in (11.3), the representation µ1(M(m,n))
becomes right-equivalent to M(m+ n, n).
(2) After renumbering of vertices as in (11.6), the representation µ3(M(m,n))
becomes right-equivalent to M(m,m+ n).
Corollary 11.6. The class of band representations is closed under mutations.
Note that if we iterate the mutations in Proposition 11.4, the pair (m,n) gets
transformed according to the Euclid algorithm for finding gcd(m,n). Thus, after a
sequence of mutations (and appropriate renumberings of vertices), every M(m,n)
can be transformed into M(gcd(m,n), 0). Since M(d, 0) is obviously isomorphic to
the direct sum of d copies of M(1, 0), by backtracking this sequence of mutations,
we obtain the following well-known corollary.
Corollary 11.7. The representation M(m,n) is indecomposable if and only if m
and n are relatively prime. Furthermore, if gcd(m,n) = d then M(m,n) is right-
equivalent to the direct sum of d copies of M(m/d, n/d).
Remark 11.8. By the same methods as above, one can compute all the mutations for
another family of representations of the QP (A, S) in Example 11.3: string modules
introduced and studied in [11, 21].
12. Some open problems
Here we collect some natural questions that we find important for better under-
standing of QPs and their representations. In what follows, suppose that (A, S) is a
reduced QP with the Jacobian algebra P(A, S). Let M(A, S) denote the category of
finite dimensional P(A, S)-modules. Suppose also that k ∈ Q0 is a vertex satisfying
(5.1), so that the mutated reduced QP µk(A, S) is well-defined.
Question 12.1. Is the isomorphism class of P(A, S) determined by the equivalence
class of the category M(A, S)?
Question 12.2. Is the isomorphism class of P(µk(A, S)) determined by the isomor-
phism class of P(A, S)?
Question 12.3. Is the category M(µk(A, S)) determined up to equivalence by
M(A, S)?
QUIVERS WITH POTENTIALS I 55
Note that the right-equivalence class of (A, S) is not determined by the isomor-
phism class of the Jacobian algebra P(A, S). In fact, we can construct a QP (A, S)
which is not right-equivalent to (A, cS) for some nonzero c ∈ K, while we obviously
have P(A, S) = P(A, cS) (the possibility of such an example was brought to our
attention by Bill Crawley-Boevey).
We conclude with the following intriguing question.
Question 12.4. Is there a proper analogue of the cluster category for a non-acyclic
quiver with potential?
13. Appendix. Proof of Lemma 4.12
We include Lemma 4.12 into a more general setup. We call a K-vector space V a
C-space (for the lack of a better term) if V has an increasing filtration {0} = V0 ⊆
V1 ⊆ · · · such that all Vn are finite dimensional, and V =
n≥0 Vn. (Equivalently, V
is either finite dimensional, or it has countable dimension.) The class of C-spaces is
clearly closed under taking subspaces, quotient spaces, finite direct sums, and finite
tensor products. We always consider C-spaces equipped with discrete topology; in
particular, this applies to the base field K.
We refer to the dual space V ⋆ of a C-space V as a D-space (the dual is understood
as the space of all linear forms V → K). Most of the properties ofD-spaces discussed
below are undoubtedly well-known; for the convenience of the reader, we provide a
self-contained treatment.
Example 13.1. The complete path algebra R〈〈A〉〉 can be naturally viewed as a
D-space V ⋆, corresponding to the C-space V = ⊕∞d=0(A
, and the filtration (Vn)
given by
Vn = ⊕
d=0(A
(n ≥ 1).
For a subspace W of V , we denote by W⊥ ⊂ V ⋆ its orthogonal complement, that
W⊥ = {f ∈ V ⋆ | f(W ) = 0}.
We make V ⋆ into a topological vector space by taking the sets V ⊥n for all n ≥ 0
as a basic system of open neighborhoods of 0. In particular, in Example 13.1, we
have V ⊥n = m(A)
n, so the D-space topology on R〈〈A〉〉 coincides with the topology
introduced in Section 2.
Since every v ∈ V belongs to some Vn, a sequence f1, f2, . . . converges in V
⋆ if and
only if, for every v ∈ V , the sequence (fk(v)) stabilizes as k → ∞. This implies in
particular that W⊥ is a closed subspace of V ⋆ for every subspace W of V . In fact,
the converse is also true.
Lemma 13.2. A vector subspace Z of V ⋆ is closed if and only if Z =W⊥ for some
subspace W of V .
Proof. Let Z be a vector subspace of V ⋆. Let
W = {v ∈ V | f(v) = 0 for f ∈ Z}.
It suffices to show that W⊥ is contained in the closure Z of Z. Let f ∈ W⊥.
Restricting f to each finite-dimensional subspace Vn of V , we conclude that f |Vn=
56 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
hn |Vn for some hn ∈ Z. Thus, the sequence h1, h2, . . . , converges to f , implying that
f ∈ Z, as required. �
In view of Lemma 13.2, for every closed subspace Z of V ⋆, the spaces Z and V ⋆/Z
can be naturally viewed as D-spaces: indeed, we have
Z =W⊥ = (V/W )⋆, V ⋆/Z = V ⋆/W⊥ = W ⋆
for some subspace W of V . The following lemma is immediate from the definitions.
Lemma 13.3. For every closed subspace Z ⊆ V ⋆, the D-space topologies on Z and
V ⋆/Z coincide with the topologies induced from V ⋆. In particular, the embedding
Z → V ⋆ and the projection V ⋆ → V ⋆/Z are continuous.
Lemma 13.4. If Z1 and Z2 are closed subspaces of V
⋆, then Z1 + Z2 is a closed
subspace of V ⋆ as well.
Proof. By Lemma 13.2, Z1 = W
1 and Z2 = W
2 for some subspaces W1 and W2
of V . Choosing some direct complements of W1 ∩W2 in W1 and W2, and a direct
complement of W1 +W2 in V , it is easy to see that
Z1 + Z2 = W
2 = (W1 ∩W2)
proving that Z1 + Z2 is closed. �
Lemma 13.5. Let U and V be C-spaces, and U⋆ and V ⋆ be the corresponding D-
spaces. A linear map α : U⋆ → V ⋆ is continuous if and only if α = β⋆ for some
linear map β : V → U .
Proof. First let us show that α = β⋆ is continuous. By the definition, it is enough to
show that, for every n, there exists an index k such that U⊥k ⊂ α
−1(V ⊥n ). Since the
subspace β(Vn) ⊂ U is finite dimensional, it is contained in some Uk, implying the
desired inclusion U⊥k ⊂ α
−1(V ⊥n ).
Conversely, suppose α : U⋆ → V ⋆ is a continuous linear map. Let v ∈ V . Then
the linear form f 7→ α(f)(v) is a continuous linear map U⋆ → K, and so its kernel is
a closed subspace of U⋆. Using Lemma 13.2, we conclude that there exists a unique
u ∈ U such that α(f)(v) = f(u) for all f ∈ U⋆. The correspondence v 7→ u is the
desired linear map β : V → U such that α = β⋆. �
Lemma 13.6. Any continuous linear map of D-spaces α : U⋆ → V ⋆ sends closed
vector subspaces of U⋆ to closed vector subspaces of V ⋆.
Proof. Let Z ⊆ U⋆ be a closed vector subspace. By Lemma 13.2, Z =W⊥ for some
vector subspace W ⊂ U . Also by Lemma 13.5, we have α = β⋆ for a linear map
β : V → U . The definitions imply that α(Z) = β⋆(W⊥) = (β−1(W ))⊥, hence α(Z)
is a closed subspace of V ⋆, as claimed. �
We will call a D-space V ⋆ a D-algebra if it has a structure of an associative
K-algebra such that V ⊥m V
n ⊂ V
m+n for all m,n ≥ 0. In particular, R〈〈A〉〉 is a
D-algebra.
Lemma 13.7. If I1, . . . , IN are closed subspaces in a D-algebra V
⋆, then the subspace
I1f1+ · · ·+INfN is closed for every f1, . . . , fN ∈ V
⋆. In particular, finitely generated
left ideals in V ⋆ are closed.
QUIVERS WITH POTENTIALS I 57
Proof. By the definition of a D-algebra, the operator of right multiplication with any
f ∈ V ⋆ is continuous. Thus each subspace Ikfk is closed by Lemma 13.6, and our
assertion follows from Lemma 13.4. �
Recall from Definition 3.4, that the trace space of a D-algebra V ⋆ is the quo-
tient Tr(V ⋆) = V ⋆/{V ⋆, V ⋆}, where {V ⋆, V ⋆} is the closure of the vector subspace
in V ⋆ spanned by all commutators. We denote by π : V ⋆ → Tr(V ⋆) the canonical
projection. By Lemma 13.3, π is continuous with respect to the D-space topologies.
In view of Proposition 3.5, the assertion of Lemma 4.12 is a special case of the
following.
Lemma 13.8. Let I be a closed (two-sided) ideal of a D-algebra V ⋆, and J be
the closure of an ideal generated by finitely many elements f1, f2, . . . , fN . Then the
subspace π(IJ) ⊆ Tr(V ⋆) is equal to π(If1 + · · ·+ IfN).
Proof. Let J0 be the ideal generated by f1, f2, . . . , fN , that is, the linear span of
elements of the form ufkv with u, v ∈ V
⋆ and k = 1, . . . , N . Thus the ideal IJ0
is the linear span of elements of the form gufkv with g ∈ I. By the definition, we
have π(gufkv) = π(vgufk), and so π(IJ
0) = π(If1 + · · ·+ IfN). Since IJ
0 is dense
in IJ , it follows that π(IJ0) is dense in π(IJ). On the other hand, the subspace
π(If1 + · · ·+ IfN) ⊆ Tr(V
⋆) is closed by Lemmas 13.7 and 13.6. We conclude that
π(IJ0) = π(If1 + · · ·+ IfN) = π(IJ), as required. �
Acknowledgments
We thank Victor Ginzburg for helpful comments and bibliographic guidance, Bill
Crawley-Boevey for valuable remarks, Daniel Labardini Fragoso for careful reading
of the manuscript and many useful suggestions and Christopher Herzog for making
us aware of and discussing the use of quivers with potentials in theoretical physics.
After this paper was completed, we have been informed by Maxim Kontsevich that
he has rediscovered some of the results in Sections 6 and 7 in the context of A∞-
categories.
References
[1] A. Berenstein, S. Fomin and A. Zelevinsky, Cluster algebras III: Upper bounds and double
Bruhat cells, Duke Math. J. 126 (2005), 1–52.
[2] D. Berenstein and M.R. Douglas, Seiberg Duality for Quiver Gauge Theories,
hep-th/0207027.
[3] J. Bernstein, I. Gelfand and V. Ponomarev, Coxeter functors and Gabriel’s theorem, Uspehi
Mat. Nauk 28 (1973), no. 2(170), 19-33.
[4] R. Bocklandt, Graded Calabi Yau Algebras of dimension 3, math.RA/0603558.
[5] R. Bocklandt and L. Le Bruyn, Necklace Lie algebras and noncommutative symplectic geom-
etry, Math. Z. 240 (2002), no. 1, 141–167.
[6] V. Braun, On Berenstein-Douglas-Seiberg Duality, J. High Energy Phys. 2003, no. 1, 082, 21
[7] A. B. Buan, R. Marsh, M. Reineke, I. Reiten and G. Todorov, Tilting theory and cluster
combinatorics, Adv. Math. 204 (2006), no. 2, 572-618.
[8] A. B. Buan, R. Marsh and I. Reiten, Cluster-tilted algebras, Tran. Amer. Math. Soc. 359
(2007), no. 1, 323-332.
[9] A. B. Buan, R. Marsh and I. Reiten Cluster-tilted algebras of finite representation type, J.
of Algebra. 306 (2006), no. 2, 412-431.
58 HARM DERKSEN, JERZY WEYMAN, AND ANDREI ZELEVINSKY
[10] A. B. Buan, R. Marsh and I. Reiten, Cluster mutation via quiver representations,
math.RT/0412077.
[11] C.R. Butler and C.M. Ringel, Auslander-Reiten sequences with few middle terms and appli-
cations to string algebras, Comm. Algebra 15 (1987), 269–290.
[12] P. Caldero and F. Chapoton, Cluster algebras as Hall algebras of quiver representations,
Comment. Math. Helv. 81 (2006), no. 3, 595-616.
[13] P. Caldero, F. Chapoton and R. Schiffler, Quivers with relations and cluster tilted algebras,
Algebras and Rep. Theory 9 (2006), 359–376.
[14] P. Caldero and B. Keller, From triangulated categories to cluster algebras, math.RT/0506018.
[15] P. Caldero and B. Keller, From triangulated categories to cluster algebras II,
math.RT/0510251.
[16] P. Caldero, A. Zelevinsky, Laurent expansions in cluster algebras via quiver representations,
Moscow Math. J. 6 (2006), no. 3, 411–429.
[17] M.R. Douglas and G. Moore, D-branes, Quivers and ALE Instantons, hep-th/9603167.
[18] S. Fomin and A. Zelevinsky, Cluster algebras I: Foundations, J. Amer. Math. Soc. 15 (2002),
497–529.
[19] S. Fomin and A. Zelevinsky, Cluster algebras II: Finite type classification, Invent. Math. 154
(2003), 63–121.
[20] S. Fomin and A. Zelevinsky, Cluster algebras IV: Coefficients, Comp. Math. 143 (2007),
112–164.
[21] K.R. Fuller, Biserial Rings, Lecture Notes in Mathematics, 734, Springer, Berlin, 1979, 64–90.
[22] C. Geiss, B. Leclerc and J. Schröer, Auslander algebras and initial seeds for cluster algebras,
math.RT/0506405.
[23] I. Gelfand, M. Kapranov and A. Zelevinsky, Discriminants, Resultants and Multidimensional
Determinants, Birkhäuser Boston, 1994.
[24] V. Ginzburg, Non-commutative symplectic geometry, quiver varieties, and operads, Math.
Res. Lett. 8 (2001), no. 3, 377–400.
[25] V. Ginzburg, Calabi-Yau algebras, math.AG/0612139.
[26] O. Iyama and I. Reiten, Fomin-Zelevinsky mutation and tilting modules over Calabi-Yau
algebras, math.RT/0605136.
[27] B. Keller and I. Reiten, Acyclic Calabi-Yau categories, math.RT/0610594.
[28] R. Marsh, M. Reineke and A. Zelevinsky, Generalized associahedra via quiver representations.
Trans. Amer. Math. Soc. 355 (2003), no. 10, 4171–4186.
[29] G.-C. Rota, B. Sagan and P. Stein, A cyclic derivative in noncommutative algebra. J. Algebra
64 (1980), no. 1, 54–75.
Department of Mathematics, University of Michigan, Ann Arbor, MI 48109, USA
E-mail address : hderksen@umich.edu
Department of Mathematics, Northeastern University, Boston, MA 02115
E-mail address : j.weyman@neu.edu
Department of Mathematics, Northeastern University, Boston, MA 02115
E-mail address : andrei@neu.edu
	1. Introduction
	2. Quivers and path algebras
	3. Potentials and their Jacobian ideals
	4. Quivers with potentials
	5. Mutations of quivers with potentials
	6. Some mutation invariants
	7. Nondegenerate QPs
	8. Rigid QPs
	9. Relation to cluster-tilted algebras
	10. Decorated representations and their mutations
	11. Some three-vertex examples
	12. Some open problems
	13. Appendix. Proof of Lemma 4.12
	Acknowledgments
	References
ABSTRACT
  We study quivers with relations given by non-commutative analogs of Jacobian
ideals in the complete path algebra. This framework allows us to give a
representation-theoretic interpretation of quiver mutations at arbitrary
vertices. This gives a far-reaching generalization of
Bernstein-Gelfand-Ponomarev reflection functors. The motivations for this work
come from several sources: superpotentials in physics, Calabi-Yau algebras,
cluster algebras.

<|endoftext|><|startoftext|>
Coherent macroscopic quantum tunneling in boson-fermion mixtures
D. Mozyrsky, I. Martin, and E. Timmermans
Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545
(Dated: August 11, 2021)
We show that the cold atom systems of simultaneously trapped Bose-Einstein condensates (BEC’s)
and quantum degenerate fermionic atoms provide promising laboratories for the study of macroscopic
quantum tunneling. Our theoretical studies reveal that the spatial extent of a small trapped BEC
immersed in a Fermi sea can tunnel and coherently oscillate between the values of the separated
and mixed configurations (the phases of the phase separation transition of BEC-fermion systems).
We evaluate the period, amplitude and dissipation rate for 23Na and 40K-atoms and we discuss the
experimental prospects for observing this phenomenon.
PACS numbers: 05.30.Jp, 03.75.Kk, 32.80.Pj, 67.90.+z
The tunneling of a macroscopic (or collective) vari-
able of a many-body system through a classically forbid-
den region, macroscopic quantum tunneling (MQT), is a
phenomenon of fundamental interest [1] and a recurring
theme in a variety of fields ranging from nuclear (fission)
and condensed matter physics (e.g. quantummagnets [2],
SQUIDs) to quantum optics (macroscopic Schrodinger
cat states [3] and beyond-standard limit measurements).
Nevertheless, stringent tests under well-understood and
controlled conditions remain an experimental challenge.
Cold atom gases, arguably the cleanest and best under-
stood mesoscopic systems which, furthermore, offer un-
precedented control knobs such as the ability to vary the
inter-particle interactions [4], now provide an intriguing
candidate laboratory for the study of MQT.
The first cold atom MQT proposals [5] suggested ob-
serving the collapse of a trapped dilute gas Bose-Einstein
condensate (BEC) of mutually attracting bosons. How-
ever, the experimental results [6, 7] were either too sensi-
tive to particle number to distinguish MQT from classi-
cal collapse [6], or the analysis was complicated by more
complex dynamics (such as ’clumping’) [7]. Evidence of
coherence (of the many-body system taking on a linear
superposition of states that correspond to the macro-
scopic variable residing on either side of the barrier) is
even more difficult to gather. Such coherence would be
more readily observable in the MQT between long-lived
states, in which case one could set up a coherent pop-
ulation oscillation between the many-body states. Such
long-lived states naturally occur in (zero-temperature)
first order phase transitions in which the order parame-
ter, which provides the macroscopic variable, can tunnel
through the barrier of its Landau-Ginzburg potential. In
the infinite system limit, the coupling between the two
states rigorously vanishes, but finite- size cold atom sys-
tems of moderate particle numbers provide, once again,
a promising candidate to observe the MQT coherence be-
tween states of different phases, as we show below.
An earlier proposal to observe MQT between states in
which the components of a BEC-mixture arrange them-
selves differently in space, involved a very low coupling
on account of the small spatial overlap between the sin-
gle component densities in the different states [8]. In
this paper, we propose that MQT can be realized and
its coherence, perhaps, observed in trapped gas mixtures
of a single-component fermion system and a BEC. Such
mixtures are currently created [9] e.g. in the sympa-
thetic cooling scheme in which the colder BEC cools the
fermions. The tunneling and coherent oscillations that
we target would occur between states of the mixed and
separated phases in the phase separation transition of
the fermion-BEC mixture [10]. Such transitions could be
accessed by varying the scattering length of the boson-
fermion interaction [11].
We consider NB atomic bosons confined in a spheri-
cally symmetric harmonic trap (of frequency ωT ) inter-
acting with a much larger system of atomic fermions.
For simplicity we assume the fermions to occupy an in-
finite volume. The Hamiltonian of the bosons is de-
scribed by the standard Gross-Pitaevskii (GP) form [1],
i.e., with inter-particle interactions described by a con-
tact potential (∝ λBBδ(r − r
′)), which we choose to be
repulsive (λBB > 0) We assume that the interaction of
bosons with fermions is also contact-like, contributing
λBF |ΨB|
2|ΨF |
2 to the Hamiltonian density, where λBF
is the fermion-boson coupling constant. Furthermore,
all fermions occupy in the same spin state so that the
short-range inter-fermion interactions do not contribute
by virtue of the Pauli exclusion principle.
We are interested in the dynamics of the reduced sys-
tem of bosons described by the functional
S = SBEC +Tr log
~∂τ −
− µF + λBF |ΨB|
where S0 is the action of the bosons alone, SBEC =
d3rΨ̇BΨ
B − HBEC), and the second term is
a contribution due to the interaction of bosons with
fermions; µF is the chemical potential of the fermions.
Here and throughout the paper we will be utilizing
the imaginary time (Matsubara) representation, unless
stated otherwise. An explicit evaluation of the second
term is a challenging task. However, here we are inter-
ested in the dynamics of the slow breathing mode of the
BEC Ψ0B, which can be treated in the self-similar density
http://arxiv.org/abs/0704.0650v1
approximation. This dynamics describes the longitudinal
expansions (and contractions) of the condensate. Finite
size effects such as the appearance of a non-vanishing ex-
citation energy (gap) can decouple this mode from other
excitation modes. Hence, Ψ0B peaks at small frequen-
cies (ω) and small wavevectors (q), giving a Ψ0B that is
a slowly varying function of spacial and temporal coor-
dinates. In such case the Tr log[...] in Eq. (1) can be
evaluated within the Thomas-Fermi approximation. A
straightforward zero-temperature calculation yields
δTr log[...]
λBF |ΨB(r)|
ΨB, (2)
where kF is the Fermi wavevector. Eq. (2) represents an
additional term in the Gross-Pitaevskii (GP) equation,
δSGP /δΨ
B = 0, resulting from interaction with fermions.
In order to analyze the physical meaning of Eq. (2) let
us expand it in powers of ΨB. The first nontrivial con-
tribution is a term −2λ′|ΨB|
2ΨB, λ
′ = (λ2BF k
F /4π
2µF ),
which corresponds to the attraction between bosons me-
diated by interaction with fermions. For nonzero, but
small ω and q there is an additional term (of the or-
der of λ2BF ) related to the dissipation of the condensate
due to the Landau damping, as we discuss below. The
next order yields η|ΨB|
4ΨB, η = (k
BF /8π
2µ2F ). Un-
like the previous term this one is positive, and represents
reduction in the effective boson-boson attraction due to
depletion of fermions in the regions of high density of
the bosons. The next order terms (in λBF ) prove to
be unimportant as can be verified directly from Eq. (2).
Therefore we will replace the potential energy contribu-
tion in GP equation given by Eq. (2) by the two terms
discussed above [12].
To analyze the dynamics of the slow (breathing) mode
described by the Hamiltonian
|∇ΨB|
2 (3)
(λBB − λ
′)|ΨB|
we apply the time dependent variational principle. Since
we are interested in ground state properties of Eq. (5) we
use a spherically symmetric Gaussian trial wavefunction
Ψ0B(r) =
4 (xR0)
2(xR0)2
, (4)
parameterized by a dimensionless parameter x that char-
acterizes the BEC’s spatial width in units of the zero
point motion amplitude R0 = (~/2mBωT )
1/2. Substitu-
tion of this wavefunction into Eq. (5) yields the following
dependence of the ground-state energy E0 on x:
E0(x) =
3NB~ωT
, (5)
where α = NB(λ
′ − λBB)/[(2π)
3/2R30~ωT ] and β =
4N2Bη/(3
5/2π3R60~ωT ). For positive but relatively small
FIG. 1: Density profiles of the bosons (solid lines 1 and 2) and
corresponding density profiles of the fermions (dashed lines
1’ and 2’) in two metastable states: (1) with fermions having
zero density at the center of the trap (“separated phase”) and
(2) with nonzero density of fermions (“mixed phase”). The
dotted line represents schematically an effective potential for
the breathing mode of the bosons.
α, i.e., for α < αcr = 32(2/5)
1/4/15 ≃ 1.69, E0 may de-
velop two competing minima, depending on the value of
the β-parameter. The energy barrier separating the min-
ima is caused by the same effect as the barrier appearing
in the description of a BEC with attractive interactions:
it arises due to the competition between the kinetic and
the interaction energies, i.e, the first and the third terms
in the right-hand side (rhs) of Eq. (5). In the absence of
the last term in the rhs of Eq. (5) the state in this well
would have been metastable - the energy would tend to
−∞ at x → 0. The 1/x6 term stabilizes the system: for
small x this term rapidly increases, giving rise to another
minimum of E0(x), now due to the competition between
the last two terms in the rhs of Eq. (5).
At certain values of α and β the two minima of E0(x)
will have the same energy, and the ground state of the
system becomes degenerate. Since our system is finite,
this degeneracy will be lifted by the quantum tunnel-
ing transition between the two states. Such mechanism
has been suggested to be the dominant decay process for
condensates with attractive interactions between parti-
cles [5]. The tunneling corresponds to the low energy ex-
citations of the breathing mode, described by the wave-
function in Eq. (4). It has been shown in [5] that by
accounting for the superfluid motion of the condensate
(which can be done by introducing a phase-factor eiφ for
the wavefunction in Eq. (4) and requiring the superfluid
velocity vs = (~/mB)∇φ to satisfy the continuity equa-
tion) one obtains an effective action for the breathing
mode of the condensate
S0[x(τ)] =
+ E0(x)
, (6)
where E0(x) is given by Eq. (5) and m0 = 3mBNBR
Thus the dynamics of the ground state wavefunction of
the condensate is that of a quantum particle of mass m0
moving in the potential E0(x).
A direct analysis of the Shrodinger equation corre-
sponding to Eq. (6), however, is quite cumbersome since
the two wells are generally quite asymmetric. Instead
we choose an alterative route: we compute the ground
state energy and obtain the tunneling rate by numerically
solving the time-independent GP equation, δH/δΨB =
EΨB, where H is given by Eq. (3). The latter approach
also serves as an independent justification of the varia-
tional method and confirms that macroscopic quantum
tunneling, QMT, is the mechanism that causes the tran-
sition between the two states of the condensate. Upon
substitution ΨB ∼ φ/r, the time-independent GP equa-
tion can be cast in the form
φ = µφ, (7)
where the φ(x)-function is normalized to unity, a =
(π/2)1/2α, b = 35/2πβ/16, and x = r/R0, µ = E/~ωT .
We find the ground state numerically by replacing the
rhs of Eq. (7) by −∂τφ and propagating φ in the imag-
inary time τ until it converges to the ground state φ0
(or Ψ0B). We then evaluate the ground state energy ac-
cording to Eq. (3) and present the results in Fig. 2(a)
as a function of the b-parameter for different values of a.
Fig. 2(b) shows the dispersion of the ground state width,
(1/NB)
d3r|Ψ0B(r)|
2, as a function of those same pa-
rameters. For a < acr = 1.83 the ground state energy
and dispersion undergo a sharp crossover between the
state with compressed and expanded BEC wavefunctions
(corresponding to the phase separated and mixed states)
as functions of b. Note that the value acr corresponds
to the value of αcr = 1.46, which is quite close to the
above critical value of 1.69 obtained from the variational
approach. The dependence of ground state energy near
the critical value of a is shown in the inset of Fig. 2(a).
Clearly the ground state energy exhibits avoided level
crossing, which is in accordance with the above conjec-
ture (e.g., Eq. (6)) of macroscopic quantum tunneling
between the two local energy minima.
The value of the tunneling matrix element ∆ between
two local “ground” states ǫ1 and ǫ2 can be deduced by
fitting the calculated energy curves in Fig. 1 with the
standard expression [13], ǫ = (ǫ1+ ǫ2)/2− [(ǫ1− ǫ2)
∆2]1/2 and assuming that in the vicinity of the point
of crossover both ǫ1 and ǫ2 are linear functions of pa-
rameter b. For a = 1.81 one finds ∆ ∼ 10−4 × ~ωT ,
while for a = 1.82, ∆ ∼ 10−2 × ~ωT . Assuming that
the ground state wavefunctions have Gaussian shape,
∼ R−3
exp(−r2/R2
), from Fig. 2 one finds
that R̄ = (R1 + R2)/2 ≃ 0.85R0 for both a = 1.81 and
a = 1.82, and δR = |R1 − R2| ≃ 0.09R0 for a = 1.81
and δR ≃ 0.03R0 for a = 1.82. For a typical value
of the trapping frequency νT = 10
2Hz (ωT = 2πνT ),
the two tunneling rates are ∆1.81/~ = 10
−2 s−1 and
∆1.82/~ = 10
2 s−1. Since the value of R0 for most
trapped atomic BEC’s is of the order of a few microns,
FIG. 2: (a) Dependence of the ground state energy of the BEC
(per particle, in units of ~ωT ) as a function of parameters a
and b; (b) Dispersion of the ground state spatial extent as a
function of the same parameters.
the difference between the radii of the two condensate
states δR is submicron. Such small variation may be
difficult to observe in situ by optical means. However,
the expansion process that takes place in time-of-flight
measurements after the trap potential is shut off and the
expanding atoms are observed, has successfully magnified
small distance features in other experiments.
Role of dissipation: The above analysis determines
the tunneling rate, but does not address the question
whether the tunneling process is quantum coherent. Will
the probability of the system to occupy one of the two
macroscopic states oscillate in time as cos2 (∆ t/~)? The
fermions not only provide the BEC with the effective in-
teraction, they also cause fluctuations which can destroy
the macroscopic quantum coherence. To evaluate the
effect of fluctuations, it is sufficient to consider the first
non-vanishing frequency-dependent contribution into the
effective action of the bosons coming from the perturba-
tive expansion of the Tr log [...] term in Eq. (1):
(2π)3
χ0(q, ω)|ρB(ω,q)|
2. (8)
Here ρB(q, ω) is the Fourier transform of ρB(r, t) and
χ0 is the response function of the non-interacting
fermions. In the small frequency domain χ0 =
(1/4π)[~2k3F /(πµF ) + m
F |ω|/(~
2q)]. The frequency-
independent part of χ0 has already been incorporated
in the effective interaction between bosons, i.e., λ′|ΨB|
term in Eq. (3). The second term in χ0 is responsible
for damping. To quantify its role we employ a two-
state approximation in describing the tunneling dynam-
ics. In this representation the tunneling is described by
the Hamiltonian Htun = ∆σ̂x, where σ̂x is a Pauli ma-
trix with non-zero off-diagonal elements, and the posi-
tion operator, i.e. the spatial width of the ground-state
BEC wavefunction, is given by R̂ = R̄+(δR/2)σ̂z, where
σ̂z is the diagonal Pauli matrix (with ±1 along the di-
agonal). The dissipative part of the action for Htun
can be derived from Eq. (8) by substituting a Gaus-
sian ansatz, ρB(r, t) = NB/[π
3/2R3(t)] exp [−r2/R2(t)],
where R(t) = R̄ + (δR/2)σz(t), σz = ±1, into Eq. (8).
For δR ≪ R̄ one obtains
Sdiss = γ~
dτdτ ′σz(τ)σz(τ
′)(τ − τ ′)−2, (9)
where γ = N2Bλ
2/[2(2π~)4R̄4]. Eq. (9), to-
gether with Htun defined above, describes dissipative dy-
namics of a two-state system. Such dynamics has been
extensively studied in connection with macroscopic quan-
tum tunneling of a superconducting phase in Josephson
junctions, and is known to depend critically on the value
the parameter γ. Specifically, for γ > 1 the two-state os-
cillation is always overdamped and at zero temperature
it exhibits localization as a result of quantum fluctua-
tions [14]. It is therefore instructive to evaluate γ for our
situation. For estimates we consider an atomic mixture
of 23Na (bosons) and 40K (fermions), which have natural
scattering lengthes aBB ≃ 1nm (λBB = 4π~
2aBB/mB)
and aBF ≃ 4nm (λBF = 2π~
2aBF [(1/mB) + (1/mF )]).
For these data we obtain a critical value of N crB ≃ 12400
(again for νT = 10
2Hz) and the fermion density ncrF ≃
7.4×1015cm−3. Then, for a = 1.81 we obtain γ1.81 ≃ 1.1,
which corresponds to the localized case (at T = 0),
whereas for a = 1.82 one gets γ1.82 ≃ 0.1. In the high
temperature limit (for kBT > ∆) the relaxation rate Γ
can be expressed in terms of γ as ~Γ = πγkBT [14],
and therefore coherent (underdamped) oscillations can
be observed for T ≪ ∆1.82/(γ1.82kB) = 0.5nK. The
situation can be improved, however, if one utilizes a Fes-
hbach resonance [4] to increase the aBF scattering length.
For example, for aBF = 80nm one finds N
B ≃ 25 and
ncrF ≃ 2.6 × 10
11cm−3, and γ1.82 ≃ 2.5 × 10
−4. For
such parameters coherent oscillations can be observed for
T ≪ 0.2µK, which is easily observable. A low particle
number also reduces the uncertainty of an atomic count-
ing measurement that can be carried out in the time-of-
flight procedure [15].
In summary we argue that a trapped boson-fermion
mixture can exhibit MQT tunneling and coherent oscil-
lations. Our studies indicate that MQT can be observed
in 23Na and 40K atomic mixtures of sufficiently low tem-
peratures.
We thank M. Boshier and S. A. Gurvitz for valuable
discussions. The work is supported by the US DOE.
[1] A. J. Leggett et al., Rev. Mod. Phys., 59, 1 (1998)
[2] L. Gunther and B. Barbera, Eds. ’Quantum tunneling
of magnetization - QTM’94 (Kluwer, Dordrecht, Nethre-
lands, 1995).
[3] J. I. Cirac et al., Phys. Rev A, 57, 1208 (1998).
[4] E. Timmermans et al, Phys. Rep., 315, 199 (1999).
[5] H. T. C. Stoof, J. Stat. Phys. 87, 1353 (1997); M. Ueda
and A. J. Leggett, Phys. Rev. Lett., 80, 1576 (1998); J.
A. Freire and D. P. Arovas, Phys. Rev. A 59, 1461 (1999);
C. Huepe, S. Metes, G. Dewel, P. Borckmans, and M. E.
Brachet, Phys. Rev. Lett., 82, 1616 (1999).
[6] C. A. Sackett et al, Phys. Rev. Lett. 82, 876 (1999); J.
M. Gerton et al, Nature;408, 692 (2000).
[7] E. A. Donley et al., Nature; 412, 295 (2001); J. L.
Roberts et al. Phys. Rev. Lett.; 86, 4211 (2001).
[8] K. Kasamatsu et al., Phys. Rev. A, 64, 053605 (2001).
[9] A. G. Truscott et al., Science 291, 2570 (2001); F.
Schreck et al., Phys. Rev. Lett. 87 080403 (2001); G.
Modugno et al., Science, 297 2240 (2002); M. W. Zwier-
lein et al., Phys. Rev. Lett. 92, 120403 (2004); T. Bour-
del et al., Phys. Rev. Lett. 93 050401 (2004); Stan et al.,
Phys. Rev. Lett. 93, 143001 (2004).
[10] The spatial arrangements in the fermion-BEC mixtures
were first discussed in K. Molmer, Phys. Rev. Lett., 80,
1804 (1998), and the infinite system phase separation
transition was described in L. Viverit et al. Phys. Rev.
A, 61, 053605 (2000).
[11] A Simoni et al., Phys. Rev. Lett., 90, 163202 (2003).
[12] A quantitaive analysis of the GP equation for the poten-
tial given by Eq. (2) for will be presented elsewhere.
[13] QuantumMechanics, by L. D. Landau and E. M. Lifshits,
Pergamon Press (1965).
[14] A. J. Leggett et al., Rev. Mod. Phys. 59, 1 (1987).
[15] Low particle numbers can be measured very accurately
with resonance fluorescence, for instance, see D. Frese
et al., Phys. Rev. Lett., 85, 3777 (2000); recent work
also illustrated subPoissonian counting (∆N <
N) for
larger numbers, T. Campey et al., Phys. Rev. A, 74,
043612 (2006).
ABSTRACT
  We show that the cold atom systems of simultaneously trapped Bose-Einstein
condensates (BEC's) and quantum degenerate fermionic atoms provide promising
laboratories for the study of macroscopic quantum tunneling. Our theoretical
studies reveal that the spatial extent of a small trapped BEC immersed in a
Fermi sea can tunnel and coherently oscillate between the values of the
separated and mixed configurations (the phases of the phase separation
transition of BEC-fermion systems). We evaluate the period, amplitude and
dissipation rate for $^{23}$Na and $^{40}$K-atoms and we discuss the
experimental prospects for observing this phenomenon.

<|endoftext|><|startoftext|>
Efficiency of thin film photocells
D. Mozyrsky and I. Martin
Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
(Dated: Printed November 4, 2018)
We propose a new concept for the design of high-efficiency photocells based on ultra-thin (sub-
micron) semiconductor films of controlled thickness. Using a microscopic model of a thin dielectric
layer interacting with incident electromagnetic radiation we evaluate the efficiency of conversion
of solar radiation into the electric power. We determine the optimal range of parameters which
maximize the efficiency of such photovoltaic element.
Improvement of efficiency of semiconductor photo-
voltaic elements (solar cells) has been an important tech-
nological challenge for several decades. The maximum
possible efficiency obtains when every incident photon
generates an electron-hole pair, which then separates into
electron flowing to cathode and hole flowing to anode[1].
The limitations that reduce the efficiency of the practi-
cal solar cells relative to the ideal are 1) light reflection
at the interfaces, 2) incomplete absorption of light en-
tering the device due to finite thickness, 3) electron-hole
relaxation inside the absorbing medium during diffusion
to the leads[2, 3]. Interplay between two latter mecha-
nisms leads to an existence of an optimal device thick-
ness, typically a few optical wavelengths. Here we show
that the interface reflection, commonly considered a com-
pletely independent loss mechanism, shows an interest-
ing interplay with absorption in ultra-thin film devices.
This opens a possibility for a new generation of ultra-thin
(sub-wavelength) photovoltaic elements with efficiencies
rivaling the best conventional devices.
A “working body” of a solar cell is typically a semicon-
ductor with relatively high absorption index at frequen-
cies corresponding to those of the sun quanta h̄ωsun ∼
kBTsun, Tsun ≃ 6000 K. Such semiconductors, however,
process a rather high refraction index n at these frequen-
cies. As a consequence, a fraction (n − 1)2/(n + 1)2 of
the incident light is reflected from the surface of the think
device. To reduce this loss, often anti-reflective coating
are applied to the surface of the device. On the other
hand, for sub-wavelength thin films, the reflection can
be significantly smaller (for a reason similar to why even
metallic films are transparent when thin enough). Thus
reducing the film thickness one should reach an optimum
where reflection is reduced but the absorption is still sig-
nificant.
Also, in such thin devices the carrier recombination
is naturally reduced. Electron-hole recombination which
prevents efficient charge separation in the photocell is a
major limiting factor in device operation. There are nu-
merous mechanisms which lead to the charge relaxation
in a bulk of a semiconductor. These mechanisms include
spontaneous emission as well as phonon or impurity in-
duced relaxation. While it is difficult to control these
processes in the bulk of a semiconductor, it is clear that
their contribution can be significantly reduced if diffu-
sion length of electrons and holes is large compared to
the width of the semiconducting layer.
radiation
(a) (b)
(d)(c)
FIG. 1: Insets (a) and (b): Schematics of the device. Insets
(c) and (d): Band structure of the device without and with
external load.
Diffusion length for most semiconductors used in pho-
tocells is of the order of a few microns. Since this distance
is comparable to a typical wavelength of the sunlight,
one could expect that the specific absorption (the ratio
of the absorbed power to the incident power of radiation)
in such a thin semiconducting layer is insufficient for any
practical use. In order to see whether this is the case,
it is instructive to look at the absorption of radiation
in a layer of thickness d, e.g. Fig. 1(a,b). For simplicity
we assume that the radiation is incident perpendicular to
the surface of the layer and is monochromatic with wave-
length λ. The layer has a dielectric constant whose real
and imaginary parts are ǫRe and ǫIm respectively. The
specific absorption of the dielectric layer can be easily
evaluated by solving the wave equation (n/c2)Ä = ∂2zA
(A is radiation field vector-potential, n = ǫ1/2 is the re-
fraction index, and c is the speed of light) in the three
regions, i.g., Fig. 1(b), and taking into account the conti-
nuity conditions at the boundaries of the dielectric layer,
A1 = A2, A2 = A3, ∂zA1 = ∂zA2, etc. After straight-
forward algebra one finds that the specific absorption is
http://arxiv.org/abs/0704.0651v1
Pabs = 1−|t|2−|r|2, where the amplitudes of transmitted
and reflected waves are
4n exp (id/λ)
(n+ 1)2 exp (−ind/λ)− (n− 1)2 exp (ind/λ)
, (1a)
(n2 − 1)[exp (ind/λ)− exp (−ind/λ)]
(n+ 1)2 exp (−ind/λ)− (n− 1)2 exp (ind/λ) .(1b)
The specific absorption evaluated according to the above
equations is presented in Fig. 2 for a GaAs slab as a
function of its thickness d. GaAs has a relatively nar-
row (∼ 1.4 eV) bandgap and therefore is widely used
in high-efficiency photocells. Since dielectric function
of GaAs is strongly frequency dependent [4], in Fig. 2
we plot the specific absorption for several energies typ-
ical to the quanta of solar radiation. The 2 eV curve
corresponds to the relatively low imaginary part of the
dielectric constant and thus saturates slowly exhibiting
several oscillations due to the interference between re-
flected and transmitted components. The 3 eV and 4 eV
curves correspond to much higher absorption (for exam-
ple ǫGaAsIm (3 eV ) ≃ 17) and saturate much faster. Prior
to saturation both curves exhibit a peak (again due to
the interference) at roughly d ≃ λ/|ǫ|. Remarkably the
value of the specific absorption at the peak (∼ 0.42 at
d ≃ λ/|ǫ| ≃ 20 nm for 3 eV curve) is nearly the same as
its saturation value (0.51 at d ≫ λ). Thus we conclude
that the solar radiation can be absorbed by a semicon-
ducting layer of submicron thickness almost as efficiently
as by an infinitely thick slab.
In this paper, following the simple above considera-
tions, we propose a new concept for the design of pho-
tovoltaic elements based on thin semiconductor films of
controlled thickness. To put our arguments on more rig-
orous footings in the following we consider a detailed
microscopic model of a dielectric layer interacting with
solar radiation. Effective Hamiltonian can be derived
from standard quantum-mechanical interaction between
matter and radiation, (e/mc)A(r)·p + (e2/2mc2)A2(r),
where here and in the following we assume Coulomb
gauge for the electromagnetic field. The radiation in-
duces transitions between valence and conduction band
of the semiconductor. We assume that the temperature
of the semiconductor is 0, and therefore these are the
only possible transitions in the system (the valence band
is completely full and the conduction band is empty). De-
noting the Bloch states for the valence(conduction) bands
(r) = exp (−ikr)uv(c)
(r), H0ψ
(H0 is a Hamiltonian of the crystal in the absence of
coupling to radiation field) one can rewrite the radiation-
matter interaction Hamiltonian in terms of single particle
states in the semiconductor as
Hint =
〈uc0|pα|uv0α〉 c
dk−q⊥αA
+H.c. . (2)
We make the following assumptions: (1) While the
electromagnetic field does not significantly vary with
distance inside the film, the electronic wave-functions
are effectively 3-dimensional - we assume that λsun ∼
2πh̄c/(kBTsun) ≫ d ≫ h̄/(m∗Eg)1/2, where d is the
thickness of the film, m∗ is exciton effective mass,
(m∗)−1 = m−1v +m
c , mv(c) are effective masses in va-
lence and conduction bands, and Eg is the band-gap (in
this paper we assume that bands have extremuma at zero
momentum). This assumption allows one to carry out an
analytic calculation with rather simple and transparent
results. We will discuss the validity of this approxima-
tion at the end of the paper; (2) The bands have different
symmetry, say s and p, so index α denotes angular mo-
mentum of an electron in p band. Since wave-vectors
of the incident radiation are nearly perpendicular to the
surface of the film and the film is assumed infinite in
x − y dimension, only in-plane components of the an-
gular momentum (α = x, y) are relevant in Eq. (2); (3)
The coupling in Eq. (2) is isotropic and the coupling con-
stant tα = 〈uc0|epα/(mc)|uv0α〉 ≃ Egp/(cS1/2), where p is
the effective dipole moment per unit cell of the film and
S is the surface area of the film. (4) Due to external
electric load the bands have effectively different chem-
ical potentials, µn and µp, e.g., Figs. 1(c) and 1(d).
That is, once electron is promoted from valence to con-
duction band, it immediately “rolls over” to the left lead,
which corresponds to the infinite transition rate between
the semiconductor and the metallic lead (an infinitely
thin Shotkey barrier). Clearly, were the rate comparable
or slower than the electron-hole relaxation rate, the effi-
ciency of the cell would have decreased. (5) The p − n
junction is prepared (doped) as shown in Fig. 1(c) - the
top of the valence band in the p-doped area of the junc-
tion lies just below the bottom of the of the conduction
band in the n-doped area. Therefore the maximum volt-
age the photo-cell can sustain is equal to Eg/e, which
corresponds to the assumption of maximum efficiency of
Shockley and Quasser[1], i.e., each electronic transition
from valence to conduction band generates energy Eg in
the circuit. The photocurrent is defined as the rate of
the charge transfer between the valence and conduction
band, Îph = [Hint,
ck]. To the lowest non-vanishing
order it can be expressed as
Iph =
(Eck − Evk−q⊥α, z = z
′ = 0) . (3)
The photocurrent of Eq. (3) is independent of the
voltage across the cell as far as it does not exceed
the bandgap of the semiconductor, e.g., Fig. 1(d).
For larger bias, within our assumptions, a reverse
current begins to flow. In Eq. (3) D<αβq⊥(ω, z, z
dtd2r⊥ exp i(ωt+ r⊥ · q⊥) 〈TKA−α (t, r⊥, z)A+β (0,0, z′)〉
is the “lesser” Green’s function of the electromagnetic
field defined along the Keldysh contour, where super-
scripts ± denote forward and return branches of the
contour [5]. The Green’s function is inhomogeneous
0 0.5 1 1.5
4 eV 
3 eV 
2 eV 
FIG. 2: Specific absorption of GaAs slab as a function of its
thickness. Different curves correspond to different frequencies
(energies) of incident radiation.
along z-direction, i.e., perpendicular to the film surface.
In order to incorporate effects of absorption and reflec-
tion from the film, it is necessary to include renormaliza-
tion of the photon Green’s function due to interactions
with the film according to Eq. (2). These effects can be
treated by means of Dyson equation, which, for the case
of two-dimensional film reads
D̂αβ(ω,q⊥, z, z
′) = D̂0αβ(ω,q⊥, z − z′)
+D̂0αγ(ω,q⊥, z)Σ̂γδ(ω,q⊥)D̂δβ(ω,q⊥, 0, z
′) , (4)
where hats denote the standard 2 × 2 matrix struc-
ture of the non-equilibrium Green’s functions. The
self-energy Σ̂γδ in Eq. (4) is quasi-two-dimensional.
Since l ≫ h̄/(m∗Eg)1/2, effects related to the finite
width of the semiconducting slab can be neglected and
Σ̂γδ(ω,q⊥, 0) =
Σ̂γδ(ω,q), where Σ̂γδ(ω,q) is the
self-energy defined for the bulk of the semiconductor.
Moreover, due to x− y symmetry Σ̂γδ reduces to a δγδΣ̂,
where Σ̂ depends only on |q⊥|. Therefore we obtain a
closed form equation for D̂αβq⊥(ω, z = z
′ = 0):
D̂αβ(0, 0) = D̂0αβ(0) + D̂0αγ(0)Σ̂D̂γβ(0, 0) , (5)
where components of the bare Green’s function D̂0αβ for
solar radiation are:
0αβ (ω,q⊥, 0) =
4π(δαβ − qαqβ/q2)
ω2 − ω2q ± iδ
, (6a)
D<0αβ(ω,q⊥, 0) =
(δαβ − qαqβ/q2)
δ(ω − ωq)ñq − δ(ω + ωq)(1 + ñ−q)
. (6b)
In Eq. (6) ωq = h̄c|q| and the ñq is the distribution
function of solar radiation. We assume that the incident
radiation wavevectors are uniformly distributed within a
cone with an opening angle 2φ (φ ≪ 1). Moreover, in
order for the incident power to be maximum, we assume
that the surface of the cell is perpendicular to the cone’s
axis. Then ñq = n
q θ(qz)θ(φqz −|q⊥|), where nBq is Bose
distribution function with temperature Tsun.
Furthermore due to the homogeneity of the Green’s
function D̂αβq⊥ in x− y plane one can seek for solution
of Eq. (5) in the form D̂αβ = D̂1δαβ + D̂2 qαqβ/q
2, where
D̂1(2) are 2×2 matrices in the Keldysh space, but depend
only on the absolute value of the wavevector q. Substi-
tution of this ansatz into Eq. (5) yields two independent
equations for D̂1 and for D̂3 = D̂1 + D̂2. After solving
those equations one finds
(DR01(3))
01(3)
(DA01(3))
−1 +Σ<
01(3)
)−1 − ΣR][(DA
01(3)
)−1 − ΣA]
, (7)
where D̂01 is the diagonal part of D̂0αβ , e.g., Eqs. (6),
and D̂03 = D̂01 + D̂02, where D̂02 is the transverse part
of D̂0αβ . Σ
R(A) and Σ< are retarded(advanced) and
“lesser” parts of photon self-energy. Also we find a stan-
dard expression for the retarded(advanced) Green’s func-
tions of the radiation:
01(3)
)−1 − ΣR(A)
. (8)
We can now evaluate the photo-current in Eq. (3) in
terms of the Green’s function of Eq. (7). The first contri-
bution is due to absorption of incident radiation accom-
panied by transfer of electrons from valence to conduction
band. It comes from the first term in the numerator in
the RHS of Eq. (7). This term can also lead to the reverse
current due to spontaneous and stimulated emission, i.e.,
transitions from conduction band to the valence band ac-
companied by creation of a real photon. It arises due to
the δ(ω+ωq) term in Eq. (6b). This process is, however,
not allowed while the energy gap Eg exceeds the applied
voltage µn − µp, e.g., Fig. 1(c,d). In this situation the
cell becomes a light-emitting diode, and, as was stated
above, we are not interested in such case in this paper.
The second contribution comes from the Σ< term in the
RHS of Eq. (7). It corresponds to the emission of vir-
tual quanta of radiation by one electron-hole pair and
their subsequent re-absorbtion by another pair, resulting
in an incoherent simultaneous transfer of two electrons
from conduction to valence band. This process gives a
reverse contribution to the current which, once again, is
non-zero only in the “light-emitting” regime (or at high
temperature).
The self-energies in Eqs. (7,8) can be evaluated in
terms of the electronic Green’s functions. The leading
contribution comes from the conventional polarization di-
agram, i.e., a convolution of two electronic Green’s func-
tions. For non-equilibrium situation one obtains
ΣR(ω, 0) =
GKvk(ω + ω
′)GAck(ω
+GRvk(ω + ω
′)GKck(ω
′) + (v ↔ c)
. (9)
and ΣA = (ΣR)∗. In Eq. (9) G
are retarded (ad-
vanced) Green’s functions of valence(conduction) elec-
trons and GK
is the Keldysh Green’s function. Note
that in Eq. (9) we evaluated the self-energy at zero
wavevector, since photon wavevectors are small com-
pared to those of electrons, and therefore the self-energy
Σ̂ is weakly dependent on q for direct bandgap materials.
The self-energies Σ(R)A can be easily evaluated for non-
interacting electrons. ThenG
v(c)k
(ω) = (ω−Ev(c)
±iδ)−1
and GKv(c)k(ω) = (1− 2n
n(p)k)δ(ω −E
), where nFn(p)k
are Fermi filling factors. The valence and conduction
electrons are assumed to have chemical potentials corre-
sponding to those of the two leads, µn and µp respec-
tively. For these Green’s functions the imaginary part of
the self-energy yields:
Im (ω, 0) = ±
Θ(ω − Eg)
ω − Eg , (10)
where we have introduced a dimensionless parameter
a = p2dE
∗)3/2/(21/2h̄4c). A similar calculation
for the real part of ΣR(A) yields an ultraviolet diver-
gence. This, however, is an artifact of our approxima-
tion of the infinitely thin absorbing layer (similar prob-
lem occurs in the quantum electrodynamics treatment
of electron-photon interaction). In a proper microscopic
theory this divergence in the real part of the self-energy
is exactly cancelled by the e2A2(r)/(2mc2) term in the
radiation-matter interaction Hamiltonian (the frequency
sum rule). Taking into account this cancellation is equiv-
alent to performing the Kramers-Krönig transformation
on the dielectric function [Re ǫ(ω) = 4πc2ReΣ(ω)/ω2],
rather than self-energy:
ΣRRe(ω, 0) =
ΣRIm(ω, 0)dω
ω′2(ω′ − ω)
|ω + Eg| −
|ω − Eg|) , (11)
Note that in Eqs. (10,11) we used the three-dimensional
expression for the self-energy and therefore we can re-
express parameter a in terms of the conventional zero
frequency dielectric constant ǫ0, a = 2lEg(ǫ0 − 1)/(h̄c).
From Eqs. (6,7,10,11) one can evaluate the photocur-
rent Iph given by Eq. (3). In this paper we are interested
in the maximum efficiency of the photocell, which can
be defined as ηmax = (IV )max/Pin, where (IV )max is the
power that is dissipated in the circuit assuming an opti-
mal load and Pin is the power of incident solar radiation.
0 2 4 6
FIG. 3: Dependence of photovoltaic efficiency on dimension-
less parameters a and b.
Since the photo-current is only weakly dependent on volt-
age for V < Eg and becomes negative due to spontaneous
and stimulated emission emission for V > Eg, we have
(IV )max ≃ IphEg/e. The incident power per unit area
Pin = cu, where energy density u = (2/v)
h̄ωqñq,
where v is the mode quantization volume and factor 2 is
due to two polarizations of the light waves. Carrying out
the calculation we obtain the following closed form ex-
pression for the maximum photovoltaic efficiency of the
cell:
ηmax =
30ab4
exp (bx)− 1
(x+ a
x− 1)2 + a2(2−
x+ 1−
x− 1)2
, (12)
where we introduced another dimensionless parameter
b = Eg/Tsun. Function ηmax(a, b) shown in Figure 3 has
a pronounced maximum at a ≃ 2.0 and b ≃ 2.4. This
maximum corresponds to the first maximum of the spe-
cific absorption curves in Fig. 2. Note that since our
microscopic theory is valid only for thin dialectic layers
(d ≪ λsun), it does not account for the saturation of
specific absorption at larger thickness, i.e., when d > λ.
However, since ηmax reaches maximum at d ∼ λsun/ǫ0,
our theory is fully self-consistent for semiconductors with
sufficiently high value of the dielectric constant.
The optimal value of parameter b corresponds to the
bandgap energy Eoptg ≃ 1.2 eV . Since this value is close
to the bandgap energy in GaAs (EGaAsg ≃ 1.4eV ), we
conclude that GaAs is a good candidate for the prac-
tical realization of thin film photocells. According to
Eq. (12) the optimal thickness of such GaAs layer is
dopt ≃ 1.2h̄c/[Eoptg (ǫGaAs0 − 1)] ≃ 15 nm, which is within
the fabrication capabilities of contemporary molecular
beam epitaxy technology. Another promising material
is amorphous Silicone which, unlike crystalline Si, has
large imaginary part of dielectric constant[6]. Moreover,
in ultra-thin devices considered here, the thermal equi-
libration of photo-generated carriers may occur on the
time scales longer than the charge separation timescale.
Thus the hot-carrier physics[7] may lead to further en-
hancement of the efficiency.
We thank D. Smith and A. Findikoglu for useful dis-
cussions. The work was supported by the US DOE.
[1] W. Shockley, H.J. Queisser, J. Appl. Phys. 32 (1961) 510-
[2] P. Wurfel, Physics of Solar Cells, Wiley-VCH, 2005.
[3] J. Nelson, The Physics of Solar Cells, Imperial College
Press, 2003.
[4] J.S. Blakemore, J. Appl. Phys. 53 (1982) 520-531.
[5] J. Rammer, H. Smith, Rev. Mod. Phys. 58 (1986) 323-359.
[6] K.C. Kao, R.D. McLeod, C.H. Leung, H.C. Card, H.
Watanabe, J. Phys. D: Appl. Phys. 16 (1983) 1801-1811.
[7] R.T. Ross, A.J. Nozik, J. Appl. Phys. 53 (1982) 3813-3818.
ABSTRACT
  We propose a new concept for the design of high-efficiency photocells based
on ultra-thin (submicron) semiconductor films of controlled thickness. Using a
microscopic model of a thin dielectric layer interacting with incident
electromagnetic radiation we evaluate the efficiency of conversion of solar
radiation into the electric power. We determine the optimal range of parameters
which maximize the efficiency of such photovoltaic element.

<|endoftext|><|startoftext|>
arXiv:0704.0652v1  [astro-ph]  4 Apr 2007
Draft version November 28, 2021
Preprint typeset using LATEX style emulateapj v. 11/26/04
GALACTIC WIND SIGNATURES AROUND HIGH REDSHIFT GALAXIES
Daisuke Kawata
and Michael Rauch
Draft version November 28, 2021
ABSTRACT
We carry out cosmological chemodynamical simulations with different strengths of supernova (SN)
feedback and study how galactic winds from star-forming galaxies affect the features of hydrogen (HI)
and metal (CIV and OVI) absorption systems in the intergalactic medium at high redshift. We find
that the outflows tend to escape to low density regions, and hardly affect the dense filaments visible
in HI absorption. As a result, the strength of HI absorption near galaxies is not reduced by galactic
winds, but even slightly increases. We also find that a lack of HI absorption for lines of sight (LOS)
close to galaxies, as found by Adelberger et al., can be created by hot gas around the galaxies induced
by accretion shock heating. In contrast to HI, metal absorption systems are sensitive to the presence
of winds. The models without feedback can produce the strong CIV and OVI absorption lines in
LOS within 50 kpc from galaxies, while strong SN feedback is capable of creating strong CIV and
OVI lines out to about twice that distance. We also analyze the mean transmissivity of HI, CIV, and
OVI within 1 h−1 Mpc from star-forming galaxies. The probability distribution of the transmissivity
of HI is independent of the strength of SN feedback, but strong feedback produces LOS with lower
transmissivity of metal lines. Additionally, strong feedback can produce strong OVI lines even in cases
where HI absorption is weak. We conclude that OVI is probably the best tracer for galactic winds at
high redshift.
Subject headings: galaxies: kinematics and dynamics —galaxies: formation —galaxies: stellar content
1. INTRODUCTION
Supernova (SN) explosions are thought to be capa-
ble of ejecting part of the interstellar medium (ISM)
from galaxies. Such outflows are often called “galactic
winds” (Johnson & Axford 1971; Mathews & Baker 1971;
Veilleux et al. 2005). Galactic winds are believed to be
an important mechanism for enriching the intergalactic
medium (IGM) (Ikeuchi 1977; Aguirre et al. 2001; Madau
et al. 2001; Scannapieco et al. 2002; Cen et al. 2005) and
are thought to play a crucial role in shaping the mass-
metallicity relation of galaxies (e.g. Larson 1974; Dekel &
Silk 1986; Arimoto & Yoshii 1987; Gibson 1997; Kawata
& Gibson 2003), and in heating the IGM (Ikeuchi & Os-
triker 1986). Galactic winds have been observed in lo-
cal star-forming galaxies (e.g. Lynds & Sandage 1963;
Martin 1998; Ohyama et al. 2002), where their outflow
morphology and kinematics have been extensively stud-
ied (e.g. Martin 2005; Rupke et al. 2005). Some local
galaxies show outflows to about 20 kpc (Veilleux et al.
2003). Such outflows are expected to be more common at
high redshift, where star formation is more active (z > 1)
(Madau et al. 1996). Galactic winds are also believed to
terminate star formation in ellipticals (Mathews & Baker
1971; Kawata & Gibson 2003) and have been invoked
at high redshift to explain the high age of their stel-
lar populations (Kodama & Arimoto 1997; Labbé et al.
2005; Kriek et al. 2006). Theoretical studies suggest that
for progenitors of disk galaxies the gas outflow due to
SN heating at high redshift can effectively suppress star
formation, leading to a less dense stellar halo (Brook
et al. 2004) and a larger disk (Sommer-Larsen et al.
1 The Observatories of the Carnegie Institution of Washington,
813 Santa Barbara Street, Pasadena, CA 91101
2 Swinburne University of Technology, Hawthorn VIC 3122,
Australia
2003; Robertson et al. 2004; Governato et al. 2006) at
z = 0. Therefore, observing galactic wind signatures at
high redshift may elucidate an important ingredient in
the formation of galaxies.
The observational studies at high redshift have uncov-
ered evidence that outflows from star-forming galaxies
at high redshift may be common, as has been shown for
Lyman break selected galaxies (e.g. Pettini et al. 1998;
Ohyama et al. 2003; Shapley et al. 2003). These results
have contributed to the debate about how the outflows
from star-forming galaxies may affect the intergalactic
medium (IGM). Rauch et al. (2001a) uncovered evidence
for repeated injection of kinetic energy into higher den-
sity, CIV absorbing gas, possibly driven by recent galac-
tic winds. In a study of the lower density, general Lyman
alpha forest, Rauch et al. (2001b) found that most that
of the HI absorption systems lack signs for being dis-
turbed by winds, and derived upper limits on the filling
factor of wind bubbles. Simcoe et al. (2002) surveyed
the properties of strong OVI absorption systems at high
redshift and proposed that the apparent temperatures
and the kinematics of the OVI gas as well as their rate
of incidence could be explained if massive Lyman break
galaxies are driving winds out to 50 proper kpc.
Adelberger et al. (2003, 2005a) studied the absorption
line features in the spectra of background QSOs whose
line of sight (LOS) passes close to Lyman-break selected
star-forming galaxies. They found a deficit of neutral hy-
drogen near these galaxies out to 0.5 h−1 comoving Mpc,
accompanied by a surplus of HI beyond that radius, and
suggested that most Lyman break galaxies may reside
bubbles where superwinds have depleted the HI in the
interior and piled up more neutral gas beyond the hot
bubble. The more recent, more statistically significant
of these studies Adelberger et al. (2005a) however, does
not support this claim, but still appears to show a sig-
http://arxiv.org/abs/0704.0652v1
2 Kawata and Rauch
nificant fraction (about 7 out of 24) of the LOS exhibit-
ing weak or no absorption within 1 h−1 comoving Mpc.
Numerical simulations of the IGM without SN feedback
predict a much lower fraction of such weak absorption
systems near the galaxies (Kollmeier et al. 2003), a fact
that may conceivably be explained if at least some galax-
ies have outflows destroying HI in the IGM in their vicin-
ity. Adelberger et al. (2003, 2005a) also show that there
is a correlation between the spatial distribution of CIV
absorption lines and the star-forming galaxies, with the
strongest CIV absorption lines being observed at the LOS
closest to the galaxies (∼ 80 proper kpc). This result may
indicate that outflows related to recent star formation ac-
tivity have enriched the IGM locally. Other evidence for
the association of CIV with galaxies has been reported
(Pieri et al. 2006; Simcoe et al. 2006). Scannapieco et al.
(2006b) compared the observed LOS correlation func-
tions of CIV and SiIV with analytic outflow models and
concluded that the observed correlation can be formally
explained if there are outflows with a scale of about 2
comoving Mpc from large galaxies whose stellar mass is
about 1012 M⊙. However, the large range of the indi-
vidual wind bubbles required to explain the CIV- galaxy
correlation is puzzling. Theoretical arguments suggest
that the clustering of CIV with Lyman break galaxies is
not necessarily proof that those same galaxies produced
the metal enrichment (Porciani & Madau 2005; Scanna-
pieco 2005), and the relation between metals in the IGM
and galaxies clearly needs further study.
Cosmological numerical simulations have proven a use-
ful tool for understanding metal absorption lines (Rauch
et al. 1997). Comparisons between the observations and
the statistics of metal absorption lines derived from nu-
merical simulations have mostly been concerned with
measurements of the metallicity and ionization state of
the IGM (e.g. Davé et al. 1998; Aguirre et al. 2002;
Schaye et al. 2003; Aguirre et al. 2004). The relation
between star-forming galaxies and absorption line fea-
tures have recently been studied by a number of authors
(Croft et al. 2002; Kollmeier et al. 2003; Bruscoli et al.
2003; Kollmeier et al. 2006; Tasker & Bryan 2006), with
several papers (e.g. Croft et al. 2002; Kollmeier et al.
2006; Theuns et al. 2002) considering observable effects
of galactic outflows on the Lyman alpha forest absorption
lines. These studies generally have concluded that such
outflows hardly affect the strength of the HI absorption
lines, because the winds tend to escape into less dense
regions and do not impact the IGM where the density is
high enough to produce HI absorption lines.
So far, most studies based on full-blown cosmologi-
cal numerical simulations have focused on HI absorp-
tion lines. Although there have been a number of pa-
pers discussing the origin and properties of metal ab-
sorption lines (e.g. Rauch et al. 1997; Theuns et al. 2002;
Aguirre et al. 2005; Tasker & Bryan 2006; Oppenheimer
& Davé 2006), the relation between metal absorption
line systems and outflows from the coeval galaxy pop-
ulation has remained largely unclear, in particular as
the effect of winds on the physical properties of metal
lines it is not well known. Nevertheless, the fact that
the metallicity is expected to be increased by galactic
outflows, and the availability of multiple transitions and
several heavy elements with a different enrichment his-
tory should make metal lines a potentially much more
Fig. 1.— Number of SN and chemical yields as a function of the
age of a star particle with a mass of 1 M⊙. The upper panel shows
the total number of SN. The middle and lower panels present the
total ejected carbon and oxygen masses, respectively. The thin
solid, thick solid, and gray thick dotted lines indicate the history
of a star particle with the metallicity of Z = 10−4, 0.02, and 0.04,
respectively.
useful tracer of galactic winds than neutral hydrogen.
The current paper studies the properties of both HI
and metal absorption lines, and looks for observational
signatures of galactic winds. To this end, we run cosmo-
logical simulations with the original version of the Galac-
tic Chemodynamics Code, GCD+ (Kawata & Gibson 2003)
which is capable of tracing the chemical evolution of the
IGM and galaxies self-consistently. We carry out simula-
tions with different strengths of SN feedback, and com-
pare the features of the HI and metal absorption lines
between the simulations, attempting to identify features
sensitive to the presence of galactic winds.
The following section will explain our method including
the description of the numerical simulations with GCD+
and analysis of absorption lines. Unlike previous studies,
we follow different heavy elements separately, and take
into account the abundance evolution for the different
elements when creating fake QSO spectra from the sim-
ulation. This is important because the different elements
come from different types of SNe or are due to mass loss
from inter-mediate mass stars with different life-times.
Section 3 shows our results. First, Section 3.1 focuses
on one galaxy in the simulation volume, and compares
the absorption line features around the galaxy among
models with different SN feedback strengths. Then, in
Section 3.2 we discuss the results more quantitatively us-
ing artificial QSO spectra in 1000 random LOS. Section
4 summarizes our conclusions.
Galactic Wind Signatures 3
Fig. 2.— Projected gas surface density (upper) and temperature (middle) map and star particle distribution (lower) for models NF (left),
SF (middle), and ESF (right).
Fig. 3.— The history of the star formation rate down to z = 2.43 for models NF (left), SF (middle), and ESF (right). The time of first
star formation is different among the three models, although there should not be any difference before feedback from stars happens. This
difference is because different models are carried out on different computers, and the star formation model in GCD+ uses the random number
generator (see Kawata & Gibson 2003, for details) whose sequences are different for different simulations.
2. METHOD
2.1. Numerical Simulations
The simulations were carried out using the Galactic
Chemodynamics Code GCD+ (Kawata & Gibson 2003).
GCD+ is a three-dimensional tree N -body/smoothed par-
ticle hydrodynamics (SPH) code that incorporates self-
gravity, hydrodynamics, radiative cooling, star forma-
tion, SN feedback, and metal enrichment. GCD+ takes
into account chemical enrichment by both Type II
(SNe II) (Woosley & Weaver 1995) and Type Ia (SNe Ia)
(Iwamoto et al. 1999; Kobayashi et al. 2000) SNe and
mass loss from intermediate-mass stars (van den Hoek
& Groenewegen 1997), and follows the chemical enrich-
ment history of both the stellar and gas components of
the system. Figure 1 shows the total number of SN
4 Kawata and Rauch
Fig. 4.— Overdensity, temperature, metallicity, and [C/O] map (from left to right) for model NF at Y= +50 (upper), 0 (middle) −50
(lower) proper kpc, where the biggest galaxy is set to be at the center.
TABLE 1
Properties of the cental galaxy at z = 2.43
Model ESN Mvir
a rvir
b Mgas,vir MDM,vir Mstar,vir M200
c Tvir
Name (erg) (M⊙) (kpc) (M⊙) (M⊙) (M⊙) (M⊙) (K)
NF 0 2.1× 1011 57 1.7× 1010 1.7× 1011 2.5× 1010 2.0× 1011 3.8× 105
SF 3× 1051 2.0× 1011 57 1.5× 1010 1.7× 1011 1.6× 1010 1.9× 1011 3.7× 105
ESF 5× 1051 1.7× 1011 56 1.2× 1010 1.6× 1011 1.7× 109 1.6× 1011 3.2× 105
aVirial Mass in the definition of Kitayama & Suto (1996)
bVirial radius in the definition of Kitayama & Suto (1996)
cMass within a radius which is the radius of a sphere containing a mean density of 200 times the critical
density at z = 2.43
bVirial temperature in the definition of Kitayama & Suto (1996)
(both SN II and SN Ia) and the total amount of car-
bon and oxygen ejected from a star particle with the
mass of 1 Msun as a function of its age. Initially, SNe
II go off, and they continue until the 8 Msun star dies
(∼0.04 Gyr in the case of Z = 0.02). There is no SN
until SNe Ia start to occur around 0.7 Gyr. A star par-
ticle with Z = 10−4 does not lead to SN Ia, because the
adopted SNIa model restricts the metallicity range for
progenitors of SN Ia to logZ/Z⊙ > −1.1 (see Kobayashi
et al. 2000, for details). Oxygen is produced mainly
by SN II. After SN II ceases, the continuous ejection
of oxygen and carbon is mainly due to the contribution
from intermediate-mass stars. Although oxygen yield is
mainly from the pre-enriched ejecta, carbon is newly pro-
cessed in intermediate-mass stars, which explains the sig-
nificant yield in the low metallicity case.
The adopted version of the code also includes non-
equilibrium chemical reactions of hydrogen and helium
species (H, H+, He, He+, He++, H2, H
2 , H
−) and their
cooling processes, following the method of Abel et al.
Galactic Wind Signatures 5
Fig. 5.— Same as Fig. 4, but for model SF.
(1997); Anninos et al. (1997); Galli & Palla (1998). The
details of the non-equilibrium chemical reactions are de-
scribed in the Appendix of Kawata et al. (2006). We
have made the following update from the code used in
Kawata et al. (2006). We adopt a density threshold for
star formation, and permit star formation from gas whose
hydrogen number density (nH = fHρg/mp, where fH, ρg
and mp are the hydrogen mass fraction, density, and pro-
ton mass for each gas particle) is higher than 0.01 cm−3
(Schaye 2004). It is also crucial to take into account the
effect of the UV background radiation when studying the
properties of the IGM. We use the UV background spec-
trum suggested by Haardt & Madau (2001). The code
follows non-equilibrium chemical reactions of hydrogen
and helium species subjected to the UV background. In
addition, radiative cooling and heating due to heavy el-
ements are taken into account based on the Raymond-
Smith code (Raymond & Smith 1977), used in Cen et al.
(1995). The simulation starts at z = 29.7, and initial
temperature and the fractions of hydrogen and helium
species are calculated by RECFAST (Seager et al. 1999,
2000). We turn on the UV background radiation at z = 6
(Becker et al. 2001; Fan et al. 2001).
The cosmological simulation adopts a Λ-dominated
cold dark matter (ΛCDM) cosmology (Ω0=0.24,
Λ0=0.76, Ωb=0.042, h = 0.73, σ8 = 0.74, and ns = 0.95)
consistent with the measured parameters from three-year
Wilkinson Microwave Anisotropy Probe data (Spergel
et al. 2006). We use a multi-resolution technique to
achieve high-resolution in the regions of interest, includ-
ing the tidal forces from neighboring large-scale struc-
tures. The initial conditions for the simulations are con-
structed using the public software LINGER and GRAFIC2
(Bertschinger 2001). Gas dynamics and star formation
are included only within the relevant high-resolution re-
gion (∼6 Mpc at z=0); the surrounding low-resolution
region (∼55 Mpc) contributes to the high-resolution re-
gion only through gravity. Consequently, the initial con-
dition consists of a total of 1350380 dark matter parti-
cles and 255232 gas particles. The mass and softening
lengths of individual gas (dark matter) particles in the
high-resolution region are 7.61 × 106 (3.59 × 107) M⊙
and 1.15 (1.93) kpc, respectively. The high-resolution
region is chosen as the region within 8 times the virial
radius of a small group scale halo with the total mass of
Mtot = 3 × 10
12M⊙ and the virial radius of rvir = 380
kpc at z = 0.
We simulate the following three models with these ini-
tial conditions to investigate the effect of SN feedback.
Model NF is a ”no-SN-feedback” model: although the
model follows the chemical evolution due to SNe and
mass loss from stars, we ignore the effect of energy feed-
6 Kawata and Rauch
Fig. 6.— Same as Fig. 4, but for model ESF.
back by SNe. Model SF is a ”strong feedback” model,
where each SN yields the thermal energy of 3× 1051 erg.
This model produces a feedback effect noticeable in a
number of observables. In the final model, ESF, for ”ex-
tremely strong feedback” model, a thermal energy release
of 5 × 1051 erg per SN is assumed. We found that this
model causes too strong effects of feedback, and produces
too few stars. Therefore, the model is obviously unreal-
istic. However, we retain the model for this extreme case
to help put the other models in perspective.
We analyze the properties of the IGM for all the mod-
els at z = 2.43. As mentioned above, we adopt the
multi-resolution technique. We extract a spherical vol-
ume within the radius of rp = 800 kpc (in physical scale
at z = 2.43) from a galaxy in the high-resolution region.
The central galaxy is the biggest galaxy in the simulation
volume, and the radius is chosen to avoid contamination
from the low-resolution particles. In this paper, we use
the coordinate system where the central galaxy resides at
(x,y,z)=(0,0,0). Figure 2 shows the gas density, temper-
ature, and stellar density map of the central 800 × 800
proper kpc2 region of this volume analyzed for all the
models. Figure demonstrates that the stronger feedback
affects the gas density distribution, and suppresses star
formation more dramatically.
2.2. The Artificial QSO Spectra
The aim of this paper is to search for signatures of
galactic winds among the absorption lines in the back-
ground QSO spectrum. To this end, we construct artifi-
cial QSO spectra, with lines of sight through the simula-
tion volume from various orientation and projected po-
sitions, and compare the absorption line features among
models with different strengths of SN feedback. For a
given line of sight we identify the gas particles whose
projected distance is smaller than their SPH smoothing
length. In this paper, we focus on three absorption fea-
tures, HI1216, CIV1548, and OVI1032, and we call them
HI, CIV, and OVI hereafter. The ionization fractions for
HI, CIV, and OVI for each gas particle are derived as fol-
lows. The HI fraction is self-consistently calculated in our
simulations, because GCD+ follows the non-equilibrium
chemical reactions of the hydrogen and helium ions. The
CIV and OVI fractions for each gas particle are analyzed
with version 6.02b of CLOUDY, described by Ferland
et al. (1998), assuming the condition of optical thin and
ionization equilibrium. Here, we put in the density, tem-
perature, and the abundances of different elements for
the gas particles in the simulation, and run CLOUDY
adopting the same UV background radiation as used in
the simulations (Haardt & Madau 2001). Unfortunately,
we realized that even our (unrealistically) strong feed-
Galactic Wind Signatures 7
Fig. 7.— Density of HI, CIV, and OVI and OVI weighted temperature map (from left to right) for model NF at Y= +50 (upper), 0
(middle) −50 (lower) proper kpc, where the biggest galaxy is set to be at the center. The arrows represent the velocity field weighted by
HI, CIV, OVI, and OVI in the panels from left to right, respectively. The size of arrow corresponds to the amount of velocity, as indicated
in the upper right panel.
back model cannot enrich the lower density regions in
the IGM as much as what is observed (Cowie & Songaila
1998; Schaye et al. 2003; Aguirre et al. 2004). For ex-
ample, some filaments are not enriched at all even in
models ESF as will be seen in Figure 6, although quanti-
tative comparisons with the observational data (see also
Oppenheimer & Davé 2006) will be pursued in a future
paper. This is likely because the limited resolution of our
simulations is unable to resolve the formation of smaller
galaxies which form at a higher redshift and could enrich
the IGM. Thus, to mimic pre-enrichment for the low den-
sity regions we add metals at the level of [C/H]= −3 and
[O/H]= −2.5 to all gas particles. Once the ionization
fractions are obtained, the column densities at the LOS
for each species are analyzed for each gas particle, using
the two-dimensional version of the SPH kernel. The op-
tical depth τ(v) profiles along the LOS are calculated by
the sum of the Voigt-absorption profiles for each particle,
taking into account their temperature and LOS velocity,
vi,LOS, which is the sum of Hubble expansion and pecu-
liar velocity. The final spectra are constructed, assum-
ing an overall signal-to-noise ratio of 50 per 0.04 Å pixel,
the read out signal-to-noise ratio 500, and FWHM=6.7
km s−1. We stress that we take into account the dif-
ference between carbon and oxygen abundances in our
chemodynamical simulations, when obtaining CIV and
OVI fraction.
Note that the simulation volume analyzed is only 1.6
proper Mpc scale at z = 2.43. The Hubble expansion at
z = 2.43 is 236 km s−1 Mpc−1 in the adopted cosmol-
ogy, and 1.6 proper Mpc corresponds to 378 km s−1. In
addition, since the volume is overdensity region, the ex-
pansion velocity is smaller than the Hubble expansion. In
Section 3.2, we analyze the mean transmissivity within 1
h−1 comoving Mpc from the galaxies. The velocity that
corresponds to 1 h−1 comoving Mpc is 94.2 km s−1 at
z = 2.43, which is well within the range of the volume.
However, in the real Universe, if there are absorbers out-
side the volume at the LOS and their peculiar velocity is
large, they can contribute to absorptions in the velocity
range we focus on (see Kollmeier et al. 2003, for more
detailed discussion of such effect). Therefore, we ignore
the contamination from such absorptions, and consider
the ideal absorption systems only by the absorbers which
are spatially close.
3. RESULTS
3.1. Absorption features around a galaxy
To study how the gas outflows from galaxies affect the
absorption line features, we generate two sets of QSO
8 Kawata and Rauch
Fig. 8.— Same as Fig. 7, but for model SF.
spectra for all the models. In this section, we analyze the
spectra whose LOS are chosen as the 5 × 5 grid points
each separated by 50 kpc (in physical scale at z = 2.43) as
projected onto the plane of the sky. The grid is centered
on a galaxy. In the next section, we generate 1000 ran-
dom LOS spectra, and compare them among the three
models. The properties of the central galaxy chosen for
the present section for different feedback models are sum-
marized in Table 1. The total mass of the central galaxy
is slightly smaller than the estimated mass of the BX
galaxies (M200 ∼ 6.3 × 10
11 − 1.6 × 1012 M⊙ in Adel-
berger et al. 2005b) from the observational studies of the
IGM-galaxies connection around z = 1.9 − 2.6 (Adel-
berger et al. 2003, 2005a; Simcoe et al. 2006). However,
the accurate mass range for such rest-frame UV-selected
galaxies are still unknown. In this paper, we simply as-
sume that the central galaxy is a typical UV-selected
star-forming galaxies, and our simulation volume is a
typical environment for such galaxies. This assumption
will be tested in our future studies. Figure 3 shows the
history of the total star formation rate for the star parti-
cles within r = 5 proper kpc at z = 2.43. The extremely
strong feedback in model ESF terminates star formation
in the system. We confirmed that SNe Ia continuously
heat the ISM, which keeps maintaining an outflow in
model ESF.
We chose the X–Y plane in Figure 2 as the projected
plane of the sky, and define the Z-axis as the LOS di-
rection. Figures 4-6 demonstrate the overdensity, tem-
perature, metallicity, and abundance ratio of carbon to
oxygen, [C/O], in the X–Z plane at three different posi-
tion of Y= ±50 and 0 proper kpc for models NF, SF, and
ESF, respectively. In the same planes, Figures 7-9 give
the density distributions of HI, CIV, and OVI, and the
OVI weighted temperature map. In these figures, the
velocity field of the gas component is also shown with
tangential arrows.
In model NF the central galaxy ends up surrounded by
a hot gaseous halo with a radius of about 100 kpc (Fig.
4). This is due to infalling gas being shock-heated to
the virial temperature (Table 1). This situation appears
common for high redshift galaxies (Rauch et al. 1997)
and corresponds to the hot accretion mode of (Kereš
et al. 2005; Dekel & Birnboim 2006)
Because of the inflow, the metallicity of the hot gas
is low. On the other hand, the high density filaments
are cold, and part of the gas keeps accreting through the
filaments onto the galaxy, i.e., the cold accretion mode
described by Kereš et al. (2005) and Dekel & Birnboim
(2006) (see also the velocity field in Fig. 7). As a result,
the HI density is higher along the filament, and gets sig-
nificantly higher in the collapsed region near the central
Galactic Wind Signatures 9
Fig. 9.— Same as Fig. 7, but for model ESF.
Fig. 10.— Composite image of overdensity (blue contour) and
metallicity (red contour) distribution at Y= 0 for model SF. The
edges of contours for overdenisty and metallicity correspond to
log ρg/ < rhog >= −1 and logZ/Z⊙ = −2.5, respectively.
galaxy and the neighboring galaxies. The spatial dis-
tribution of CIV and OVI also traces the filaments, and
their densities are high in the region close to the galaxies.
Model SF produces a more extended hot gas region
than model NF, and the hot gas is now metal-enriched
(Fig. 5), compared to the NF case. Here the hot gas is
dominated by a galactic wind induced by strong SN feed-
back. The temperature map of Figure 5, the arrows in
Figure 8, and Figure 10 demonstrate that the enriched
gas tends to escape toward the lower density regions.
The filament remains unaffected by the wind. As a re-
sult, the cold accretion is still maintained through the
filaments, which continues funneling gas to the galaxy.
This is similar to what was observed in previous numeri-
cal simulations with strong feedback effect (e.g., Theuns
et al. 2002; Kollmeier et al. 2006). Consequently, the dis-
tribution of HI density is not significantly different from
model NF. On the other hand, higher density CIV and
OVI gas extends over a much larger region. This is be-
cause the enriched gas is blown out from the galaxy, and
helps to raise the abundance of carbon and oxygen in the
Interestingly, we find that the gas whose OVI density is
high in model SF has a low temperature which is consis-
tent with photoionization equilibrium for OVI (typical
logarithmic temperatures are around Log(T (K))= 4.2)
as opposed to collisional ionization temperature (typical
Log(T (K))= 5.5). Comparison between the 3rd and 4th
columns in Figure 8 demonstrates that the region where
the density of OVI is high has temperatures around
Log(T (K))< 4.5. This is gas that has been blown out
of the galaxy, and cooled down by radiative cooling after
colliding with the ambient IGM. CIV in low density halo
10 Kawata and Rauch
Fig. 11.— Mosaic of 5× 5 lines of sight spectra of HI, CIV, and OVI lines around a galaxy at z = 2.43 for model NF. The central panel
corresponds to the position of the galaxy, and each line of sight is separated by 50 kpc in physical scale. The numbers at the upper left
corner of each panel present the (x,y) coordinate (in kpc) for each LOS. In the case of CIV and OVI only the stronger line of each doublet
is shown. The LOS velocity is adjusted so that the LOS velocity of the galaxy equals zero.
Fig. 12.— Same as Fig. 11, but for model SF.
Galactic Wind Signatures 11
Fig. 13.— Same as Fig. 11, but for model ESF.
Fig. 14.— The mean transmissivity of HI (upper), CIV (middle), and OVI (lower) for the pixels within 1 h−1 comoving Mpc of the
galaxies as a function of the impact parameter for models NF (left), SF (middle), and ESF (right). The gray lines indicate median and
25th and 75th percentile. Note that the scale of the y-axis, i.e., f1Mpc is different in each panel.
12 Kawata and Rauch
Fig. 15.— The probability of the flux decrement 1 − f1Mpc of HI (upper), CIV (middle), and OVI (lower), where f1Mpc is the
mean transmissivity for the pixels within 1 h−1 comoving Mpc of the galaxies. The left/middle/right panel shows the results of model
NF/SF/ESF. The dotted histogram in the upper panels present the observational results of Adelberger et al. (2005a).
Fig. 16.— The mean transmissivity of the galaxies of HI, f1Mpc,HI for the pixels within 1 h
−1 comoving Mpc as a function of the mean
transmissivity of OVI, f1Mpc,OVI, for models NF (left), SF (middle), and ESF (right). The lines indicate median (black line) and 25th and
75th percentile (grey lines).
and void regions is in a similar thermal state.
Figure 6 shows that the extremely strong SN feedback
in model ESF develops a much stronger galactic wind
and a larger hot gas bubble. Feedback is now strong
enough to affect the filaments. Gas accretion through
the filaments is suppressed, so that star formation in the
central galaxy ceases (Fig. 3). However, even here the
denser regions of the filaments survive, and the density
distribution of HI is similar to the one seen in models NF
and SF. Figure 9 reveals that in model ESF there is more
collisionally ionized OVI especially in the region close to
the galaxy (r ≤ 40 kpc) where the OVI weighted tem-
perature is around Log(T (K))=5.5 and the OVI density
peaks.
Figures 11-13 represent the spectra whose LOS are cho-
sen as the 5× 5 grid points each separated by 50 proper
kpc projected on the sky i.e., in the X–Y plane. In the
figures, the middle panels correspond to the LOS through
the center of the galaxy. First, we compare HI absorp-
tion lines between the three models. At the LOS through
the central galaxy, the HI lines are heavily saturated,
i.e., they produce a damped Lyman alpha line, except in
model ESF. In model ESF, not enough cold gas can sur-
vive in the galaxy due to the extremely strong feedback.
In all the models, the HI absorption becomes weaker with
the projected distance from the galaxy. General features
of HI absorption lines are similar among the three mod-
els. Thus, the signature of a galactic wind seems to be
difficult to see in HI absorption lines, confirming the con-
clusions of previous studies (Theuns et al. 2002; Croft
et al. 2002; Bruscoli et al. 2003; Kollmeier et al. 2006).
However, a more detailed comparison of HI absorption
line features between models NF and SF (Figs. 11 and
12, respectively) shows that the HI absorption lines tend
Galactic Wind Signatures 13
Fig. 17.— The mean transmissivity of CIV (uppera), and OVI (lower) for the pixels within 1 h−1 comoving Mpc of the galaxies as a
function of the impact parameter. The left panel shows the resul of model SF. The middle panel presents the result of model SF, when the
IGM is assumed to be homogeneously enriched at the level of [C/H]=-2.5 and [O/H]=-3. The right panel demonstrates the result of model
SF, when the QSO only UV background radiation suggested by Haardt & Madau (2001) is adopted. The gray lines indicate median and
25th and 75th percentile.
to be stronger in model SF than those in model NF,
even close to the galaxy. This is a counter-intuitive re-
sult, because it seems natural that a strong wind should
predominantly be destroying HI clouds, as suggested by
Adelberger et al. (2003) and Bertone & White (2006).
However, as shown in Figures 2 and 5, strong feedback
redistributes the high density gas in the galaxy to the
surrounding region, and the filaments become broader.
This leads to the stronger HI lines in model SF, espe-
cially in the outer regions of the galaxy. A similar effect
is seen in model ESF (Fig.13). Therefore, we suggest
that stronger SN feedback actually increases the HI ab-
sorption. We will test this more quantitatively in the
next section.
Conversely, we do see a lack of neutral hydrogen along
some LOS close to the galaxy. However, they do not
seem to have anything to do with winds. In model NF,
the LOS at (X,Y) = (−100,−50) and (−100,−100) have
very weak HI absorption lines, although their projected
distance is smaller than ∼120 proper kpc (∼ 300 h−1
comoving kpc). Adelberger et al. (2005a) studied HI ab-
sorption lines around star-forming galaxies at 2 ≤ z ≤ 3,
and found that some LOS which are within 1 h−1 comov-
ing Mpc from the galaxies show weak or absent HI ab-
sorption. They argue that such a lack of absorption may
be caused by a galactic superwind destroying the neutral
hydrogen. However, our model NF does not include any
SN feedback. The LOS at (X,Y) = (−100,−50) corre-
sponds to the line at X= −100 in the lower panels of
Figures 4 and 7. This LOS passes through hot accretion-
shocked gas which cannot accommodate HI, and misses
the more HI-rich filaments. This example demonstrates
that it is possible to have LOS close to galaxies which do
not show any strong HI absorption, without the need for
a galactic wind.
It is also worth mentioning that there are some LOS
which show double HI absorption lines, especially in
model NF, e.g., the LOS at (X,Y) = (50, 0). We find
that this is due to symmetric gas infall from the fila-
ments. The LOS at (X,Y) = (50, 0) in model NF can be
seen at X= 50 kpc at the middle panels of Figure 7. This
LOS passes through two filaments at Z∼ −120 and ∼ 75
kpc, and the velocity map shows the filament at Z∼ −100
(∼ 75) has positive (negative) LOS velocity. As a result,
these two filaments appear as double absorption compo-
nents. Such double HI absorption line features become
less obvious in the cases of stronger feedback, because
outflow from the galaxy makes the velocity field more
chaotic, and fills the gap between the components in ve-
locity space.
We also compare CIV and OVI lines among the models.
In model NF, there are almost no CIV or OVI lines at
R≥ 50 proper kpc. The LOS at (X,Y) = (50,−50) shows
strong CIV lines. However, this is due to the next closest
galaxy, as seen in Figure 7. In contrast, the stronger
feedback in models SF and ESF creates CIV and OVI
lines further away from the galaxy. We also investigated
the absorption lines with a much finer grid of LOS, i.e.,
smaller separations, and found that strong CIV or OVI
lines are rare at projected radii R ≥ 100 kpc even in
models SF and ESF, unless there is another galaxy close
to the LOS. This can also be seen in Figures 8 and 9,
where dense CIV and OVI regions extent to about 100
kpc. OVI lines are very rare in model NF, and only exist
where the HI absorption is saturated. On the other hand,
strong feedback models produce more OVI lines, that
are sometimes stronger than CIV lines. These figures
demonstrate that the OVI lines are the most sensitive
signature of a galactic wind in absorption. We study
this possibility more quantitatively in the next section.
3.2. The mean transmissivity of HI, CIV, and OVI
In this section, we analyze the artificial QSO spectra
in 1000 random LOS. Since our simulation volume is a
spherical volume (Sec. 2.2), we cannot use the LOS at
too large an impact parameter. Therefore, we generate
1000 spectra for the random LOS at the projected radius
of R < 400 proper kpc. We also change the angle of
projection randomly for each LOS.
Within the three dimensional radius of r = 400 proper
kpc, the high-resolution volume of the simulations con-
tains two galaxies whose virial mass is more than 1011
14 Kawata and Rauch
M⊙. Since the virial mass of the observed UV-selected
galaxies is not well known, as mentioned above (see also
Erb et al. 2006), we assume these two galaxies are such
galaxies, and apply a similar analysis to the one done
for the observed UV-selected galaxies (Adelberger et al.
2003, 2005a). Adelberger et al. (2005a) measured the
mean transmissivity of all HI pixel in their QSO spectra
that lie within 1 h−1 comoving Mpc from the galaxies as
determined from the projected distance and the LOS ve-
locity. In this paper we indicate the mean transmissivity
as f1Mpc. We analyze f1Mpc not only for HI (f1Mpc,HI)
but also for CIV (f1Mpc,CIV) and OVI (f1Mpc,OVI) for all
the LOS spectra for our two galaxies. In reality, it is diffi-
cult to measure f1Mpc for metal lines. For example, OVI
lines are often contaminated by interloper Lyman alpha
lines. However, we carry out this theoretical exercize
to understand the effect of galactic winds on absorption
lines quantitatively.
Figure 14 shows the mean transmissivity of HI, CIV,
and OVI as a function of the projected distance from the
galaxy, i.e., the impact parameter, b. Figure 15 shows the
histogram of the probability of the decrement which is de-
fined as 1−f1Mpc from Figure 14. The top panels of Fig-
ures 14 and 15 correspond to the lower panel of Figures
13 and 15 of Adelberger et al. (2005a), respectively. The
left-top panel of Figure 15 shows results similar to those
from previous numerical simulation studies with without
feedback (Kollmeier et al. 2003; Tasker & Bryan 2006).
Although the observational data (dotted histogram) of
Adelberger et al. (2005a) agree as far as strong absorp-
tion is concerned, their data show a much higher proba-
bility for the very weak absorption (1 − f1Mpc < 0.2).
Adelberger et al. (2005a) claim that this may be be-
cause in the real universe galactic winds turn moder-
ate absorption into weak absorption, but do not affect
the strong absorption systems. However, as seen in the
top panels of Figures 14 and 15, our simulations predict
that the existence of the strong galactic wind does not
change the mean flux transmissivity of the HI lines. In-
terestingly, if we compare the transmissivity at the LOS
close to the galaxy (b ≤ 0.4h−1 comoving Mpc) in Figure
14, the stronger feedback leads to slightly stronger mean
absorption, i.e., smaller median f1Mpc,HI, although the
difference is subtle. Therefore, again, we conclude that
HI absorption lines generally are unaffected by galactic
winds. This leaves an inconsistency between the obser-
vations and numerical simulations. Unfortunately, the
current number of the observational sample is not satis-
factory (31 systems in Adelberger et al. 2005a) to reach
firm conclusions. On the theory side, the implementation
of feedback has been one of the more uncertain ingredi-
ents in current numerical simulations (e.g. Okamoto et al.
2005; Kobayashi et al. 2006; Scannapieco et al. 2006a).
We adopt the simplest implementation, but different im-
plementations may lead to different conclusions. Clearly,
further observational studies and numerical simulations
are required to address this problem.
Figures 14 and 15 also show the results for CIV and
OVI lines, which appear to be more sensitive to the ef-
fect of SN feedback. Model NF shows low f1Mpc,CIV to be
almost independently of the impact parameter. On the
other hand, models SF and ESF show that significantly
more LOS have a lower f1Mpc,CIV at b <∼ 0.2h
−1 co-
moving Mpc. The mean transmissivity of OVI lines also
shows a similar trend. However, for OVI the absorption
becomes noticeably stronger i.e., f1Mpc,OVI decreases, as
the impact parameter decreases below b <∼ 0.4h−1 co-
moving Mpc. Figure 15 reveals that model NF barely
shows a decrement 1 − f1Mpc higher than 0.2 for both
CIV and OVI, but models SF and ESF can produce such
strong absorption lines. However, note that y-axis of the
figure is the logarithm of probability, and it represents
only ∼1 % of f1Mpc that show such strong absorption in
models SF and ESF.
In the previous section, we suggested that OVI is a
good tracer for a galactic wind, and in model NF OVI
lines are only observable where the HI lines are satu-
rated. On the other hand, models SF and ESF produce
OVI lines even where the HI is not saturated. Figure 16
plot the f1Mpc,HI against f1Mpc,OVI. Figure clearly shows
that models SF and ESF show the significant fraction of
the spectra with relatively weak HI (f1Mpc,HI > 0) and
stronger OVI (f1Mpc,OVI < 0.95). We have calculated
the probability of low f1Mpc,OVI for the spectra with
f1Mpc,HI > 0.2. Model NF has only 14 % of the spec-
tra with f1Mpc,OVI < 0.95, while models SF and ESF
have 24 and 28 %. Although the difference is small, the
stronger feedback seems to produce more such HI weak
OVI strong lines.
Finally, we briefly mention how our results are sensi-
tive to the distribution of metals and the assumed UV
background radiation. Since re-simulations changing the
metal yields and the UV background radiation are com-
putationally too expensive, we analyzed the results of
model SF, assuming different metal distributions and UV
background radiations. Figure 17 shows the results of
the same analysis as Figure 14 in the cases when the the
IGM is assumed to be homogeneously enriched at the
level of [C/H]= −2.5 and [O/H]= −3 (model SFhZ) and
when the QSO-only UV background radiation suggested
by Haardt & Madau (2001) is adopted (model SFQ),
while model SF includes radiation from both QSO and
galaxies. Model SFhZ demonstrates that if the heavy el-
ements are homogeneously distributed with the assumed
metallicity, there is very little correlation between the
strength of metal lines and the impact parameter. There-
fore, it is important to follow the metal distribution in the
IGM self-consistently. Model SFQ shows that f1Mpc,CIV
and f1Mpc,OVI differ little between the QSO and galaxy
radiation and QSO-only UV background radiation cases,
except for very subtle decrease in CIV absorptions and
increase in OVI absorptions (see also Aguirre et al. 2005).
4. CONCLUSIONS
We have analyzed the QSO absorption features ob-
tained from cosmological numerical simulations with dif-
ferent strengths of SN feedback. Our simulations self-
consistently follow the metal exchange histories among
the IGM, ISM, and stellar components. We investigate
not only the neutral hydrogen absorption lines but also
the ionization lines for heavy elements, keeping track of
the abundance history of the elements.
We have paid particular attention to the properties of
the IGM around high-redshift (z = 2.43) galaxies with
Mvir ∼ 10
11 M⊙. We found that a model without me-
chanical feedback creates hot gas halos around galaxies
due to shock heating, with radii up to 100 proper kpc.
We found that such hot gas can lead to a lack of HI
Galactic Wind Signatures 15
absorption (Fig. 11) even for LOS close to galaxies, as
found by Adelberger et al. (2005a), without having to
invoke a galactic wind.
In our strong feedback models, outflows induced by
SN feedback produce larger hot bubbles around galax-
ies (Figs. 5 and 6). However, such outflows tends to
escape to lower density regions, and hardly affect the
dense filaments producing HI absorption systems so that
the transmissivity of HI Lyman alpha is virtually inde-
pendent of the strength of SN feedback. If anything the
absorption by neutral hydrogen slightly increases in the
presence of a wind. We conclude that the presence or
absence of HI absorption lines is not a good indicator of
the presence or absence of a galactic wind.
On the other hand, we found that the metal lines, es-
pecially OVI, are sensitive to the existence of outflows.
Without feedback, it is difficult to enrich the IGM enough
to produce strong OVI lines further away from galaxies
(Fig. 4), unless there are nearby satellite galaxies inter-
sected by the LOS by chance. We also found that, in
the no-feedback model, strong OVI lines are almost al-
ways associated with saturated HI lines. On the other
hand, the strong feedback model can produce strong OVI
lines even where HI lines are unsaturated, because strong
feedback can re-distribute the enriched gas to relatively
low density regions. We have confirmed this by looking
for the spectra whose OVI flux is less than 0.8 over more
than 5 pixels and whose mean HI flux within ±50 km
s−1 from the HI velocity corresponding to the OVI lines
are higher than 0.2 from 1000 spectra with random LOS.
The no-feedback model has no such spectra, while strong
feedback (model SF) has 12 of such spectra. We point
out that Figure 9 of Simcoe et al. (2004) shows an OVI
line where the HI is not saturated. Our results suggest
that this is likely to be a region where the effect of a
galactic wind is significant. Analyzing the transmissivity
of OVI lines we found that strong feedback creates more
LOS with lower transmissivity, i.e. stronger OVI ab-
sorption, near the star-forming galaxies. The statistical
analysis of transmissivity also shows that there are more
LOS where stronger OVI is associated with weaker HI,
in the presence of galactic winds. We expect that the
pixel-optical depth analysis of OVI against HI (Schaye
et al. 2000) would be sensitive to the presence of a galac-
tic wind, and we will test this idea in a future paper. In
conclusion, OVI appears a theoretically good tracer of
galactic winds that merits further attention.
DK thanks the financial support of the JSPS, through
Postdoctoral Fellowship for research abroad. We ac-
knowledge the Center for Computational Astrophysics of
the National Astronomical Observatory, Japan (project
ID: imn33a), the Institute of Space and Astronautical
Science of Japan Aerospace Exploration Agency, and
the Australian and Victorian Partnerships for Advanced
Computing, where the numerical computations for this
paper were performed. MR is grateful to the NSF for
support under grant AST-05-06845.
REFERENCES
Abel, T., Anninos, P., Zhang, Y., & Norman, M. L. 1997, New
Astronomy, 2, 181
Adelberger, K. L., Shapley, A. E., Steidel, C. C., Pettini, M., Erb,
D. K., & Reddy, N. A. 2005a, ApJ, 629, 636
Adelberger, K. L., Steidel, C. C., Pettini, M., Shapley, A. E.,
Reddy, N. A., & Erb, D. K. 2005b, ApJ, 619, 697
Adelberger, K. L., Steidel, C. C., Shapley, A. E., & Pettini, M.
2003, ApJ, 584, 45
Aguirre, A., Hernquist, L., Schaye, J., Weinberg, D. H., Katz, N.,
& Gardner, J. 2001, ApJ, 560, 599
Aguirre, A., Schaye, J., Hernquist, L., Kay, S., Springel, V., &
Theuns, T. 2005, ApJ, 620, L13
Aguirre, A., Schaye, J., Kim, T.-S., Theuns, T., Rauch, M., &
Sargent, W. L. W. 2004, ApJ, 602, 38
Aguirre, A., Schaye, J., & Theuns, T. 2002, ApJ, 576, 1
Anninos, P., Zhang, Y., Abel, T., & Norman, M. L. 1997, New
Astronomy, 2, 209
Arimoto, N., & Yoshii, Y. 1987, A&A, 173, 23
Becker, R. H., Fan, X., White, R. L., Strauss, M. A., Narayanan,
V. K., Lupton, R. H., Gunn, J. E., Annis, J., Bahcall, N. A.,
Brinkmann, J., Connolly, A. J., Csabai, I., Czarapata, P. C., Doi,
M., Heckman, T. M., Hennessy, G. S., Ivezić, Ž., Knapp, G. R.,
Lamb, D. Q., McKay, T. A., Munn, J. A., Nash, T., Nichol, R.,
Pier, J. R., Richards, G. T., Schneider, D. P., Stoughton, C.,
Szalay, A. S., Thakar, A. R., & York, D. G. 2001, AJ, 122, 2850
Bertone, S., & White, S. D. M. 2006, MNRAS, 367, 247
Bertschinger, E. 2001, ApJS, 137, 1
Brook, C. B., Kawata, D., Gibson, B. K., & Flynn, C. 2004,
MNRAS, 349, 52
Bruscoli, M., Ferrara, A., Marri, S., Schneider, R., Maselli, A.,
Rollinde, E., & Aracil, B. 2003, MNRAS, 343, L41
Cen, R., Kang, H., Ostriker, J. P., & Ryu, D. 1995, ApJ, 451, 436
Cen, R., Nagamine, K., & Ostriker, J. P. 2005, ApJ, 635, 86
Cowie, L. L., & Songaila, A. 1998, Nature, 394, 44
Croft, R. A. C., Hernquist, L., Springel, V., Westover, M., &White,
M. 2002, ApJ, 580, 634
Davé, R., Hellsten, U., Hernquist, L., Katz, N., & Weinberg, D. H.
1998, ApJ, 509, 661
Dekel, A., & Birnboim, Y. 2006, MNRAS, 368, 2
Dekel, A., & Silk, J. 1986, ApJ, 303, 39
Erb, D. K., Steidel, C. C., Shapley, A. E., Pettini, M., Reddy,
N. A., & Adelberger, K. L. 2006, ApJ, 646, 107
Fan, X., Narayanan, V. K., Lupton, R. H., Strauss, M. A., Knapp,
G. R., Becker, R. H., White, R. L., Pentericci, L., Leggett, S. K.,
Haiman, Z., Gunn, J. E., Ivezić, Ž., Schneider, D. P., Anderson,
S. F., Brinkmann, J., Bahcall, N. A., Connolly, A. J., Csabai, I.,
Doi, M., Fukugita, M., Geballe, T., Grebel, E. K., Harbeck, D.,
Hennessy, G., Lamb, D. Q., Miknaitis, G., Munn, J. A., Nichol,
R., Okamura, S., Pier, J. R., Prada, F., Richards, G. T., Szalay,
A., & York, D. G. 2001, AJ, 122, 2833
Ferland, G. J., Korista, K. T., Verner, D. A., Ferguson, J. W.,
Kingdon, J. B., & Verner, E. M. 1998, PASP, 110, 761
Galli, D., & Palla, F. 1998, A&A, 335, 403
Gibson, B. K. 1997, MNRAS, 290, 471
Governato, F., Willman, B., Mayer, L., Brooks, A., Stinson,
G., Valenzuela, O., Wadsley, J., & Quinn, T. 2006, ArXiv
Astrophysics e-prints
Haardt, F., & Madau, P. 2001, in Clusters of Galaxies and the
High Redshift Universe Observed in X-rays, ed. D. M. Neumann
& J. T. V. Tran
Ikeuchi, S. 1977, Progress of Theoretical Physics, 58, 1742
Ikeuchi, S., & Ostriker, J. P. 1986, ApJ, 301, 522
Iwamoto, K., Brachwitz, F., Nomoto, K., Kishimoto, N., Umeda,
H., Hix, W. R., & Thielemann, F. 1999, ApJS, 125, 439
Johnson, H. E., & Axford, W. I. 1971, ApJ, 165, 381
Kawata, D., Arimoto, N., Cen, R., & Gibson, B. K. 2006, ApJ,
641, 785
Kawata, D., & Gibson, B. K. 2003, MNRAS, 340, 908
Kereš, D., Katz, N., Weinberg, D. H., & Davé, R. 2005, MNRAS,
363, 2
Kitayama, T., & Suto, Y. 1996, ApJ, 469, 480
Kobayashi, C., Springel, V., & White, S. D. M. 2006, ArXiv
Astrophysics e-prints
Kobayashi, C., Tsujimoto, T., & Nomoto, K. 2000, ApJ, 539, 26
Kodama, T., & Arimoto, N. 1997, A&A, 320, 41
Kollmeier, J. A., Miralda-Escudé, J., Cen, R., & Ostriker, J. P.
2006, ApJ, 638, 52
16 Kawata and Rauch
Kollmeier, J. A., Weinberg, D. H., Davé, R., & Katz, N. 2003, ApJ,
594, 75
Kriek, M., van Dokkum, P., Franx, M., Quadri, R., Gawiser, E.,
Herrera, D., Illingworth, G., Labbe, I., Lira, P., Marchesini, D.,
Rix, H.-W., Rudnick, G., Taylor, E., Toft, S., Urry, M., & Wuyts,
S. 2006, ArXiv Astrophysics e-prints
Labbé, I., Huang, J., Franx, M., Rudnick, G., Barmby, P., Daddi,
E., van Dokkum, P. G., Fazio, G. G., Schreiber, N. M. F.,
Moorwood, A. F. M., Rix, H.-W., Röttgering, H., Trujillo, I.,
& van der Werf, P. 2005, ApJ, 624, L81
Larson, R. B. 1974, MNRAS, 169, 229
Lynds, C. R., & Sandage, A. R. 1963, ApJ, 137, 1005
Madau, P., Ferguson, H. C., Dickinson, M. E., Giavalisco, M.,
Steidel, C. C., & Fruchter, A. 1996, MNRAS, 283, 1388
Madau, P., Ferrara, A., & Rees, M. J. 2001, ApJ, 555, 92
Martin, C. L. 1998, ApJ, 506, 222
—. 2005, ApJ, 621, 227
Mathews, W. G., & Baker, J. C. 1971, ApJ, 170, 241
Ohyama, Y., Taniguchi, Y., Iye, M., Yoshida, M., Sekiguchi, K.,
Takata, T., Saito, Y., Kawabata, K. S., Kashikawa, N., Aoki,
K., Sasaki, T., Kosugi, G., Okita, K., Shimizu, Y., Inata, M.,
Ebizuka, N., Ozawa, T., Yadoumaru, Y., Taguchi, H., & Asai,
R. 2002, PASJ, 54, 891
Ohyama, Y., Taniguchi, Y., Kawabata, K. S., Shioya, Y.,
Murayama, T., Nagao, T., Takata, T., Iye, M., & Yoshida, M.
2003, ApJ, 591, L9
Okamoto, T., Eke, V. R., Frenk, C. S., & Jenkins, A. 2005,
MNRAS, 363, 1299
Oppenheimer, B. D., & Davé, R. 2006, MNRAS, 373, 1265
Pettini, M., Kellogg, M., Steidel, C. C., Dickinson, M., Adelberger,
K. L., & Giavalisco, M. 1998, ApJ, 508, 539
Pieri, M. M., Schaye, J., & Aguirre, A. 2006, ApJ, 638, 45
Porciani, C., & Madau, P. 2005, ApJ, 625, L43
Rauch, M., Haehnelt, M. G., & Steinmetz, M. 1997, ApJ, 481, 601
Rauch, M., Sargent, W. L. W., & Barlow, T. A. 2001a, ApJ, 554,
Rauch, M., Sargent, W. L. W., Barlow, T. A., & Carswell, R. F.
2001b, ApJ, 562, 76
Raymond, J. C., & Smith, B. W. 1977, ApJS, 35, 419
Robertson, B., Yoshida, N., Springel, V., & Hernquist, L. 2004,
ApJ, 606, 32
Rupke, D. S., Veilleux, S., & Sanders, D. B. 2005, ApJS, 160, 115
Scannapieco, C., Tissera, P. B., White, S. D. M., & Springel, V.
2006a, MNRAS, 371, 1125
Scannapieco, E. 2005, ApJ, 624, L1
Scannapieco, E., Ferrara, A., & Madau, P. 2002, ApJ, 574, 590
Scannapieco, E., Pichon, C., Aracil, B., Petitjean, P., Thacker,
R. J., Pogosyan, D., Bergeron, J., & Couchman, H. M. P. 2006b,
MNRAS, 365, 615
Schaye, J. 2004, ApJ, 609, 667
Schaye, J., Aguirre, A., Kim, T.-S., Theuns, T., Rauch, M., &
Sargent, W. L. W. 2003, ApJ, 596, 768
Schaye, J., Rauch, M., Sargent, W. L. W., & Kim, T.-S. 2000, ApJ,
541, L1
Seager, S., Sasselov, D. D., & Scott, D. 1999, ApJ, 523, L1
—. 2000, ApJS, 128, 407
Shapley, A. E., Steidel, C. C., Pettini, M., & Adelberger, K. L.
2003, ApJ, 588, 65
Simcoe, R. A., Sargent, W. L. W., & Rauch, M. 2002, ApJ, 578,
—. 2004, ApJ, 606, 92
Simcoe, R. A., Sargent, W. L. W., Rauch, M., & Becker, G. 2006,
ApJ, 637, 648
Sommer-Larsen, J., Götz, M., & Portinari, L. 2003, ApJ, 596, 47
Spergel, D. N., Bean, R., Dore’, O., Nolta, M. R., Bennett, C. L.,
Hinshaw, G., Jarosik, N., Komatsu, E., Page, L., Peiris, H. V.,
Verde, L., Barnes, C., Halpern, M., Hill, R. S., Kogut, A., Limon,
M., Meyer, S. S., Odegard, N., Tucker, G. S., Weiland, J. L.,
Wollack, E., & Wright, E. L. 2006, ArXiv Astrophysics e-prints
Tasker, E. J., & Bryan, G. L. 2006, ApJ, 642, L5
Theuns, T., Viel, M., Kay, S., Schaye, J., Carswell, R. F., &
Tzanavaris, P. 2002, ApJ, 578, L5
van den Hoek, L. B., & Groenewegen, M. A. T. 1997, A&AS, 123,
Veilleux, S., Cecil, G., & Bland-Hawthorn, J. 2005, ARA&A, 43,
Veilleux, S., Shopbell, P. L., Rupke, D. S., Bland-Hawthorn, J., &
Cecil, G. 2003, AJ, 126, 2185
Woosley, S. E., & Weaver, T. A. 1995, ApJS, 101, 181
ABSTRACT
  We carry out cosmological chemodynamical simulations with different strengths
of supernova (SN) feedback and study how galactic winds from star-forming
galaxies affect the features of hydrogen (HI) and metal (CIV and OVI)
absorption systems in the intergalactic medium at high redshift. We find that
the outflows tend to escape to low density regions, and hardly affect the dense
filaments visible in HI absorption. As a result, the strength of HI absorption
near galaxies is not reduced by galactic winds, but even slightly increases. We
also find that a lack of HI absorption for lines of sight (LOS) close to
galaxies, as found by Adelberger et al., can be created by hot gas around the
galaxies induced by accretion shock heating. In contrast to HI, metal
absorption systems are sensitive to the presence of winds. The models without
feedback can produce the strong CIV and OVI absorption lines in LOS within 50
kpc from galaxies, while strong SN feedback is capable of creating strong CIV
and OVI lines out to about twice that distance. We also analyze the mean
transmissivity of HI, CIV, and OVI within 1 h$^{-1}$ Mpc from star-forming
galaxies. The probability distribution of the transmissivity of HI is
independent of the strength of SN feedback, but strong feedback produces LOS
with lower transmissivity of metal lines. Additionally, strong feedback can
produce strong OVI lines even in cases where HI absorption is weak. We conclude
that OVI is probably the best tracer for galactic winds at high redshift.

<|endoftext|><|startoftext|>
cond-mat/somewhere
‘Stückelberg interferometry’ with ultracold molecules
M. Mark,1 T. Kraemer,1 P. Waldburger,1 J. Herbig,1 C. Chin,1,3 H.-C. Nägerl,1 R. Grimm1,2
Institut für Experimentalphysik und Forschungszentrum für Quantenphysik, Universität Innsbruck, 6020 Innsbruck, Austria
Institut für Quantenoptik und Quanteninformation,
Österreichische Akademie der Wissenschaften, 6020 Innsbruck, Austria
Physics Department and James Franck Institute, University of Chicago, Chicago, IL 60637, USA
(Dated: August 18, 2021)
We report on the realization of a time-domain ‘Stückelberg interferometer’, which is based on
the internal state structure of ultracold Feshbach molecules. Two subsequent passages through a
weak avoided crossing between two different orbital angular momentum states in combination with
a variable hold time lead to high-contrast population oscillations. This allows for a precise determi-
nation of the energy difference between the two molecular states. We demonstrate a high degree of
control over the interferometer dynamics. The interferometric scheme provides new possibilities for
precision measurements with ultracold molecules.
PACS numbers: 34.50.-s, 05.30.Jp, 32.80.Pj, 67.40.Hf
The creation of molecules on Feshbach resonances in
atomic quantum gases has opened up a new chapter in
the field of ultracold matter [1]. Molecular quantum gases
are now readily available in the lab for various applica-
tions. Prominent examples are given by the creation of
strongly interacting many-body systems based on molec-
ular Bose-Einstein condensates [2], experiments on few-
body collision physics [3], the realization of molecular
matter-wave optics [4], and by the demonstration of ex-
otic pairs in optical lattices [5]. Recent experimental
progress has shown that full control over all degress of
freedom can be expected for such molecules [6, 7, 8]. Ul-
tracold molecular samples with very low thermal spread
and long interaction times could greatly increase the sen-
sitivity in measurements of fundamental physical prop-
erties such as the existence of an electron dipole moment
[9] and a possible time-variation of the fine-structure con-
stant [10, 11].
Most of today’s most accurate and precise measure-
ments rely on interferometric techniques applied to ultra-
cold atomic systems. For example, long coherence times
in atomic fountains or in optical lattices allow ultrapre-
cise frequency metrology [12, 13]. Molecules, given their
rich internal structure, greatly extend the scope of possi-
ble precision measurements. Molecular clocks, for exam-
ple, may provide novel access to fundamental constants
and interaction effects, different from atomic clocks. The
fast progress in preparing cold molecular samples thus
opens up fascinating perspectives for precision interfer-
ometry. Recently, the technique of Stark deceleration has
allowed a demonstration of Ramsey interferometry with a
cold and slowed molecular beam [10]. Ultracold trapped
molecular ensembles are expected to further enhance the
range of possible measurements.
In this Letter, we report on the realization of an inter-
nal-state interferometer with ultracold Cs2 molecules. A
weak avoided crossing is used as a ‘beam splitter’ for
molecular states as a result of partial Landau-Zener tun-
10 12 14 16 18 20
magnetic field (G)
FIG. 1: (color online). (a) Molecular energy structure below
the dissociation threshold showing all molecular states up ℓ=
8. The relevant states for the present experiment (solid lines)
are labeled |g〉, |g′〉, and |l〉. Molecules in state |g〉 or |l〉
are detected upon dissociation as shown in (b) and (c). The
crossing used for the interferometer is the one between |g′〉 and
|l〉 near 11.4G. Initially, ultracold molecules are generated in
state |g〉 on the Feshbach resonance at 19.8G.
neling when it is traversed by means of an appropriately
chosen magnetic field ramp. Using the avoided crossing
twice, first for splitting, and then for recombination of
molecular states, leads to the well-known ‘Stückelberg
oscillations’ [14]. We thus call our scheme a ‘Stückelberg
interferometer’. Our realization of this interferometer al-
lows full control over the interferometer dynamics. In
particular, the hold time between splitting and recombi-
nation can be freely chosen. In analogy to the well-known
Ramsey interferometer [15] the acquired interferometer
phase is mapped onto the relative populations of the two
output states that can be well discriminated upon molec-
ular dissociation. To demonstrate the performance of the
Stückelberg interferometer we use it for precision molec-
ular spectroscopy to determine the position and coupling
strength of the avoided crossing.
http://arxiv.org/abs/0704.0653v1
hold time 
FIG. 2: (a) Scheme of the ‘Stückelberg interferometer’. By
ramping the magnetic field over the avoided crossing at Bc
at a rate near the critical ramp rate Rc the population in the
initial molecular state is coherently split. ∆E is the bind-
ing energy difference at the given hold field B0. After the
hold time τ a reverse ramp coherently recombines the two
populations. The populations in the two ‘output ports’ are
then determined as a function of acquired phase difference
φ ∝ ∆E × τ . (b) Corresponding magnetic field ramp.
The energy structure of weakly bound Cs2 dimers in
the relevant range of low magnetic field strength is shown
in Fig. 1 [16]. Zero binding energy corresponds to the
threshold of dissociation into two free Cs atoms in the
lowest hyperfine sublevel |F = 3,mF = 3〉 and thus to
the zero-energy collision limit of two atoms. The states
relevant for this work are labelled by |g〉, |g′〉, and |l〉 [17].
While |g〉 and |g′〉 are g-wave states with orbital angular
momentum ℓ=4, the state |l〉 is an l-wave state with a
high orbital angular momentum of ℓ= 8 [18]. Coupling
with ∆ℓ=4 between s-wave atoms and g-wave molecules
and between g- and l-wave states is a result of the strong
indirect spin-spin interaction between two Cs atoms [16].
The starting point for our experiments is a Bose-
Einstein condensate (BEC) with ∼ 2 × 105 Cs atoms in
the |F = 3,mF = 3〉 ground state confined in a crossed-
beam dipole trap generated by a broad-band fiber laser
with a wavelength near 1064nm [19, 20]. The BEC al-
lows us to efficiently produce molecules on a narrow Fesh-
bach resonance at 19.84 G [21] in an optimized scheme
as described in Ref. [22]. With an efficiency of typically
25% we produce a pure molecular ensemble with up to
2.5×104 ultracold molecules all in state |g〉, initially close
to quantum degeneracy [21]. The following experiments
are performed on the molecular ensemble in free fall.
During the initial expansion to a 1/e-diameter of about
28µm along the radial and about 46µm along the axial
direction the peak density is reduced to 1× 1011cm−3 so
that molecule-molecule interactions [3] can be neglected
on the timescale of the experiment.
The molecules can now be transferred to any one of
the molecular states shown in Fig. 1 with near 100%
efficiency by controlled ‘jumping’ or adiabatic following
at the various crossings [23]. When the magnetic field
strength is decreased, the molecules first encounter the
crossing at 13.6G. At all ramp rates used in our present
experiments the passage through this crossing takes place
in a fully adiabatic way. The molecules are thus trans-
ferred from |g〉 to |g′〉 along the upper branch of the cross-
ing. They then encounter the next crossing at a magnetic
field of Bc ≈ 11.4G. We accidentally found this weak
crossing in our previous magnetic moment measurements
[3, 23]. This allowed the identification of the l-wave state
|l〉 [18].
This crossing between |g′〉 and |l〉 plays a central role
in the present experiment. It can be used as a tunable
‘beam splitter’, which allows adiabatic transfer, coherent
splitting, as will be shown below, or diabatic transfer for
the molecular states involved, depending on the chosen
magnetic ramp rate near the crossing. We find that a crit-
ical ramp rate of Rc ≈ 14G/ms leads to a 50/50-splitting
into |g′〉 and |l〉 [23]. Using the well-known Landau-Zener
formula and an estimate for the difference in magnetic
moment for states |g′〉 and |l〉 [18] we determine the cou-
pling strength V between |g′〉 and |l〉 to ∼h×15 kHz.
We state-selectively detect the molecules by ramp-
ing up the magnetic field to bring the molecules above
the threshold for dissociation. There the quasi-bound
molecules decay into the atomic scattering continuum.
For state |g〉 dissociation is observed for magnetic fields
above the 19.84G position of the corresponding Feshbach
resonance. Fig. 1 (b) shows a typical absorption image of
the resulting atom cloud [21]. For state |l〉 dissociation is
observed above 16.5G. The molecular states can thus be
easily discriminated by the different magnetic field values
needed for dissociation. Moreover, the expansion pattern
is qualitatively different from the one connected to state
|g〉. The absorption image in Fig. 1 (c) shows an ex-
panding ‘bubble’ of atoms with a relatively large kinetic
energy of about kB×20µK per atom. Here, kB is Boltz-
mann’s constant. We find that significant dissociation
occurs only when the state |l〉 couples to a quasi-bound
g-wave state about h×0.7 MHz above threshold [24]. The
resulting bubble is not fully spherically symmetric, which
indicates higher partial-wave contributions [25]. The dif-
ferent absorption patterns allow us to clearly distinguish
between the two different dissociation channels in a sin-
gle absorption picture when the magnetic field is ramped
up to ∼ 22 G. These dissociation channels serve as the
interferometer ‘output ports’.
The interferometer is based on two subsequent pas-
sages through the crossing following the scheme illus-
trated in Fig. 2. For an initial magnetic field above the
crossing a downward magnetic-field ramp brings the ini-
tial molecular state into a coherent superposition of |g′〉
and |l〉. After the ramp the field is kept constant at a
FIG. 3: (color online). Series of dissociation patterns showing about one oscillation period with ∆E/h = 155 kHz at a hold field
of 11.19G. From one picture to the next the hold time τ is increased by 1µs. The first and the last of the absorption images
mainly show dissociation of l-wave molecules, whereas the central image shows predominant dissociation of g-wave molecules.
130 140 150 160 170 180
=11.24 G
94(1) kHz
=11.20 G
132(1) kHz
=11.17 G
169(1) kHz
 hold time (µs)
FIG. 4: Interferometer fringes for magnetic hold fields B0 be-
low the crossing of |g′〉 and |l〉. The g-wave molecular fraction
is plotted as a function of the hold time τ . Sinusoidal fits give
the oscillation frequency as indicated.
hold field B0 below the crossing for a variable hold time
τ . A differential phase φ is then accumulated between the
two components, which linearly increases with the prod-
uct of the binding energy difference ∆E and the hold
time τ . The magnetic field is then ramped back up, and
the second passage creates a new superposition state de-
pending on φ. For a 50/50-splitting ratio, this can lead to
complete destructive or constructive interference in the
two output ports and thus to high-contrast fringes as a
function of τ or ∆E. These fringes, resulting from two
passages through the same crossing, are analogous to the
well-known Stückelberg oscillations in collision physics
[14, 26] or in the physics with Rydberg atoms [27, 28].
Note that our realization of a ‘Stückelberg interferome-
ter’ gives full control over the interferometer dynamics
by appropriate choice of ramp rates and magnetic offset
fields.
A typical ramp sequence, as shown in Fig. 2 (b), starts
with a sample of |g′〉 molecules at a magnetic field of
11.6G about 250mG above the crossing. At the critical
ramp rate Rc we ramp the magnetic field within about
50µs to a hold field B0 below the crossing. After the vari-
able hold time τ we reverse the ramp and transverse the
crossing a second time at the critical ramp rate. The out-
put of the interferometer is detected by rapidly ramping
the magnetic field up to 22G and by imaging the pattern
of dissociating |l〉 and |g〉 molecules.
For one period of oscillation the dependence of the dis-
sociation pattern on the hold time τ is demonstrated by
the series of absorption images shown in Fig. 3. The hold
time is increased in steps of 1µs while the entire prepa-
ration, ramping, and detection procedure is repeated for
each experimental cycle, lasting about 20 s. The molecu-
lar population oscillates from being predominantly l-wave
to predominatly g-wave and back. For a quantitative
analysis of the molecular population in each output port
we fit the images with a simple model function [29] and
extract the fraction of molecules in each of the two out-
put ports. Fig. 4 shows the g-wave molecular population
as a function of hold time τ for various hold fields B0
corresponding to different ∆E. The existence of these
Stückelberg oscillations confirms that coherence is pre-
served by the molecular beam splitter. Their high con-
trast ratio shows that near 50/50-splitting is achieved.
Sinusoidal fits to the data allow for an accurate determi-
nation of the oscillation frequency and hence of ∆E.
Fig. 5 shows ∆E as a function of magnetic field
strength near the avoided crossing. For magnetic fields
below the crossing we obtain ∆E as described before. For
magnetic fields above the crossing, we invert the interfer-
ometric scheme. Molecules are first transferred from |g′〉
into |l〉 using a slow adiabatic ramp. The field is then
ramped up above the crossing with a rate near Rc, kept
constant for the variable time τ at the hold field B0 and
then ramped down to close the interferometer. An adia-
batic ramp through the crossing maps population in |g′〉
onto |l〉 and vice versa. Detection then proceeds as be-
fore.
For both realizations of the interferometer we obtain
high-contrast fringes even when it is not operated in the
Landau-Zener regime and the fast ramps are stopped
right at the crossing (see inset to Fig. 5). This allows
us to measure ∆E in the crossing region. A fit to the
data with a hyperbolic fit function according to the stan-
dard Landau-Zener model yields Bc = 11.339(1) G for
the position of the crossing, ∆µ = 0.730(6)µB for the
difference in magnetic moment of the two states involved
(µB is Bohr’s magneton), and V = h× 14(1) kHz for
the coupling strength. While the measured ∆µ agrees
reasonably well with the result from an advanced theo-
retical model of the Cs2 dimer [18], Bc and V cannot be
obtained from these calculations with sufficient accuracy.
The present interferometer allows us to observe up to
100 oscillations at 200 kHz. Shot-to-shot fluctuations in-
11.1 11.2 11.3 11.4 11.5 11.6
50 75 100 125 150 175
magnetic field (G)
 hold time  (µs)
FIG. 5: Interferometrically measured binding energy differ-
ence ∆E in the region of the crossing between states |g′〉 and
|l〉 as a function of magnetic field. Solid circles: Standard
ramp sequence of the interferometer. Open circles: Inverted
scheme for field values above the crossing. The one-sigma
statistical error from the sinusoidal fit is less than the size of
the symbols. The solid curve is a hyperbolic fit to the experi-
mental data. Inset: Oscillation at 26.6(3) kHz for a hold field
B0 = 11.34 G right on the crossing.
creasingly scramble the phase of the oscillations for longer
hold times until the phase appears to be fully randomized
while large amplitude variations for the molecular pop-
ulations persist. The peak-to-peak amplitude of these
fluctuations decays slowly and is still 50% of the initial
contrast after 1 ms. We attribute this phase scram-
bling to magnetic field noise that causes shot-to-shot
variations of ∆E, the same, however, for each molecule.
The large amplitude of these fluctuations indicates that
phase coherence is preserved within the molecular sam-
ple. We attribute the gradual loss of peak-to-peak am-
plitude to spatial magnetic field inhomogeneities. We
expect that straightforward technical improvements re-
garding the magnetic field stability and homogeneity and
applying the interferometer to trapped molecular samples
will allow us extend the hold times far into the millisec-
ond range. It will then be possible to measure ultraweak
crossings with coupling strengths well below h×1 kHz.
We have demonstrated a molecular Stückelberg inter-
ferometer with full control over the interferometer dy-
namics. The interferometer can be used as a spectros-
copic tool as it allows precise measurements of binding
energy differences of molecular states. In particular, the
technique can be employed to measure feeble interactions
between molecular states, such as parity non-conserving
interactions [30]. The sensitivity to detect ultraweak level
crossings, combined with long storage times in optical
molecule traps [3] or lattices [5, 6, 7], may allow to de-
tect interaction phenomena on the h× 1 Hz scale. In
view of the rapid progress in various preparation methods
for cold molecular samples, new tools for precision mea-
surements on molecular samples, such as our Stückelberg
interferometer, will open up exciting avenues for future
research.
We thank E. Tiesinga for discussions and for the-
oretical support. We acknowledge financial support
by the Austrian Science Fund (FWF) within SFB 15
(project part 16) and by the EU within the Cold
Molecules TMR Network under contract No. HPRN-CT-
2002-00290. M.M. and C.C. acknowledge support by
DOC [PhD-Program of the Austrian Academy of Science]
and the FWF Lise-Meitner program, respectively.
[1] For a review, see T. Köhler, K. Góral, and P.S. Julienne,
Rev. Mod. Phys. 78, 1311 (2006).
[2] See e.g. Ultracold Fermi Gases, Proceedings of the In-
ternational School of Physics “Enrico Fermi”, Course
CLXIV, Varenna, 20 - 30 June 2006, edited by M. In-
guscio, W. Ketterle, and C. Salomon, in press.
[3] C. Chin et al., Phys. Rev. Lett. 94, 123201 (2005).
[4] J.R. Abo-Shaeer et al. Phys. Rev. Lett. 94, 040405
(2005).
[5] K. Winkler et al., Nature 441, 853 (2006).
[6] G. Thalhammer et al., Phys. Rev. Lett. 96, 050402
(2006).
[7] T. Volz et al., Nature Physics 2, 692 (2006).
[8] K. Winkler et al., Phys. Rev. Lett. 98, 043201 (2007).
[9] J. J. Hudson, B. E. Sauer, M. R. Tarbutt, and E. A.
Hinds, Phys. Rev. Lett. 89, 023003 (2002).
[10] E. R. Hudson, H. J. Lewandowski, B. C. Sawyer, and J.
Ye, Phys. Rev. Lett. 96, 143004 (2006).
[11] C. Chin and V. V. Flambaum, Phys. Rev. Lett. 96,
230801 (2006).
[12] S. Bize et al., J. Phys. B 38, S449 (2005).
[13] M. M Boyd et al., Phys. Rev. Lett. 98, 083002 (2007).
[14] E.C.G. Stückelberg, Helv. Phys. Acta 5, 369 (1932).
[15] N.F. Ramsey, Molecular Beams (Oxford University Press,
London, 1956).
[16] C. Chin et al., Phys. Rev. A 70, 032701 (2004).
[17] Using the notation |f , mf ; l, ml〉 of Ref. [16], the
three states |g〉, |g′〉, and |l〉 correspond to |4, 4; 4, 2〉,
|6, 6; 4, 0〉, and |6, 3; 8, 3〉, respectively.
[18] E. Tiesinga (private communication).
[19] T. Kraemer et al., Appl. Phys. B 79, 1013 (2004).
[20] T. Weber et al., Science 299, 232 (2003).
[21] J. Herbig et al., Science 301, 1510 (2003).
[22] M. Mark et al., Europhys. Lett. 69, 706 (2005).
[23] M. Mark et al., manuscript in preparation.
[24] S. Knoop et al., manuscript in preparation.
[25] S. Dürr, T. Volz, and G. Rempe, Phys. Rev. A 70,
031601(R) (2004).
[26] E. E. Nikitin and S. Ya. Umanskii, Theory of Slow Atomic
Collisions (Springer, Berlin, 1984).
[27] M. C. Baruch and T. F. Gallagher, Phys. Rev. Lett. 68,
3515 (1992).
[28] S. Yoakum, L. Sirko, and P. M. Koch, Phys. Rev. Lett.
68, 1919 (1992).
[29] We model the dissociation pattern with appropriately
chosen spherical harmonic functions to account for the
angular distribution [25].
[30] E. D. Commins, Adv. At. Mol. Opt. Phys. 40, 1 (1999).
ABSTRACT
  We report on the realization of a time-domain `St\"uckelberg interferometer',
which is based on the internal state structure of ultracold Feshbach molecules.
Two subsequent passages through a weak avoided crossing between two different
orbital angular momentum states in combination with a variable hold time lead
to high-contrast population oscillations. This allows for a precise
determination of the energy difference between the two molecular states. We
demonstrate a high degree of control over the interferometer dynamics. The
interferometric scheme provides new possibilities for precision measurements
with ultracold molecules.

<|endoftext|><|startoftext|>
Introduction
Galaxy counterparts to Damped Lyman-α systems (DLAs) seen
in quasar (QSO) spectra have been suggested to be (proto)-disk
galaxies with line of sight clouds of neutral gas with column
densities N(H i) > 2 × 1020 cm−2 (Wolfe et al. 1986). Analyses
of absorption line profiles have indicated that rotational compo-
nents with velocities of ∼200 km −1 can be involved in these
systems suggesting that DLAs reside in large disk galaxies
(Prochaska & Wolfe 1997; Ledoux et al. 1998a). On the other
hand, numerical simulations show that in a hierarchical forma-
tion scenario merging proto-galactic clumps can also give rise to
the observed line profiles (Haehnelt et al. 1998).
A large fraction of the neutral hydrogen present at z > 2 is
contained in high column density DLA systems (Lanzetta et al.
1995; Storrie-Lombardi & Wolfe 2000; Péroux et al. 2001). In
addition to the classical DLAs, clouds with column densities
1019 < N(H i) < 2 × 1020 cm−2 also show some de-
gree of damping wings, which is characteristic of DLA sys-
tems. It is suggested that sub-DLA systems contain a signifi-
cant fraction of the neutral matter in the Universe (Péroux et al.
2003). Metallicity studies have shown that the properties of
the sub-DLA systems are similar to those of DLA systems
(Dessauges-Zavadsky et al. 2003), apart from the latter category
having large ionisation corrections (Prochaska & Herbert-Fort
2004).
Send offprint requests to: L. Christensen
⋆ Based on observations collected at the Centro Astronómico
Hispano Alemán (CAHA), operated by the Max-Planck Institut für
Astronomie and the Instituto Astrofisica de Andalucia (CSIC).
The association of DLAs with galaxies has been a subject of
much study. Originally, either space-based or ground-based deep
images were obtained to identify objects near the line of sight to
the QSOs (Steidel et al. 1995; Le Brun et al. 1997; Warren et al.
2001). To confirm nearby objects as galaxies that are respon-
sible for the DLA lines in the QSO spectra, follow-up spectra
are needed to find the galaxy redshifts. At z < 1, confirmations
of 14 systems exist to date (Rao et al. 2003; Chen & Lanzetta
2003; Lacy et al. 2003; Chen et al. 2005, and references therein),
while at z & 2 only 6 DLA galaxies are confirmed through spec-
troscopic observations of Lyα emission from the DLA galax-
ies (Møller & Warren 1993; Djorgovski et al. 1996; Møller et al.
1998; Leibundgut & Robertson 1999; Møller et al. 2002, 2004),
three of which are located at the same redshifts as the QSOs
themselves. Other techniques to identify DLA galaxies involve
narrow-band imaging (e.g. Fynbo et al. 1999, 2000) or Fabry-
Perot imaging. A Fabry-Perot imaging study of several QSO
fields did not result in detections of emission from DLA galax-
ies (Lowenthal et al. 1995), while recently the same method was
used to identify a few emission line candidates (Kulkarni et al.
2006).
Integral field spectroscopy (IFS) presents an alternative that
provides images and spectra at each point on the sky simultane-
ously. This technique can be used to look for emission line ob-
jects at known wavelengths, but unknown spatial location. This
technique is ideally suited to look for Lyα emission lines from
the galaxies responsible for strong QSO absorption lines. At the
Lyα wavelength corresponding to the redshift of the DLA sys-
tem, the QSO emission has been absorbed, enabling us to lo-
cate emission line objects very near to the QSO line of sight.
Because of the large column densities in DLAs and the res-
http://arxiv.org/abs/0704.0654v1
2 L. Christensen et al.: An IFS survey for high-z DLA galaxies
onant nature of Lyα photons the corresponding emission line
may be offset in velocity space relative to the DLA line (e.g.
Leibundgut & Robertson 1999), but such an offset is not always
observed (e.g. Møller et al. 2004).
IFS searches for emission from DLA galaxies towards
two QSOs have resulted in upper limits for their fluxes
(Petitjean et al. 1996; Ledoux et al. 1998b), while a sub-DLA
galaxy previously known to be a Lyα emitter was confirmed
with IFS (Christensen et al. 2004). Here we present a survey us-
ing IFS to look for Lyα emitting DLA galaxies. Section 2 de-
scribes the sample of QSOs included in the survey, which are
known previously to have DLAs and sub-DLAs in their spec-
tra. Section 3 describes the observations and data reduction. In
Section 4 the method of detecting emission line candidates is
described. Section 5 presents the results and comments on each
object. Properties of the Lyα emission candidates detected in
the survey in relation to the six previously known Lyα emitting
DLA galaxies are presented in Section 6. Section 7 summarises
our findings. A flat cosmology with H0 = 70 km s
−1 Mpc−1,
Ωm = 0.3, and ΩΛ = 0.7 is used throughout.
This study, as well as previous ones that try to identify the
host galaxies of DLA systems, can be biased since the galaxy
observed at the right redshift likely belongs to the brightest emis-
sion line object close to the line of sight. In the case that the
host galaxy is a much fainter galaxy in a group, it will not be
identified correctly. In the remaining part of the paper, an ‘iden-
tified’ DLA galaxy refers to observations that show (line) emis-
sion from independent observations, while the ‘candidates’ are
only reported in these IFS observations. Although extensive tests
are done on the data to distinguish the candidates from potential
artifacts, independent observations are needed to prove them as
Lyα emitters connected with the DLAs.
2. Sample selection
We selected a number of DLA systems without previous de-
tections of associated Lyα emission. The selected QSOs with
known DLAs were chosen based on the following criteria
1. N(H i)> 2 × 1020 cm−2
2. DLA redshift (2 < z < 4)
3. Northern hemisphere object
The first criterion includes only classic DLA systems. Many
QSO spectra show additional sub-DLA systems. Although their
relationship to DLAs is still debated, these systems were in-
cluded in the survey because of their probable physical associa-
tion with galaxies.
To increase the sample size with a minimum number of
pointings we preferentially selected QSOs with multiple DLAs.
IFS covers a range of wavelengths, and correspondingly Lyα
emission at a large range of redshifts in the line of sight for each
QSO. However, in retrospect, this can affect the emission line
detections, because extinction in foreground DLAs could ob-
scure emission from background ones when the galaxies lie in
the same line of sight. Hence, upper limits on detections of the
higher redshift systems can be biased.
From the list of DLA systems compiled by S. Curran1,
we found 66 QSOs matching these criteria in 2003. More re-
cently, detections of DLAs in the Sloan Digitized Sky Survey
QSO spectra have greatly increased the number of known DLAs
(Prochaska & Herbert-Fort 2004; Prochaska et al. 2005). A sys-
tematic survey of all 66 objects would require a large amount
1 http://www.phys.unsw.edu.au/∼sjc/dla/
of time with present instruments, so we selected a few systems
based on their observability during the allocated observing runs.
We avoided DLAs with Lyα absorption lines close to sky emis-
sion lines.
The total sample consists of 9 QSOs with a total number of
14 DLA systems as listed in Table 1. These QSOs have an ad-
ditional 8 sub-DLAs which are included in the survey. Because
of the small number of DLAs involved in the survey, a proper
statistical study is not the aim of this paper. Instead we focus
on a few systems to exploit the benefits of IFS for this kind of
investigation.
To study the applicability of IFS in identifying DLA galax-
ies we initially observed DLA galaxies where Lyα emis-
sion had been reported previously in the literature. Two of
these systems could be observed during our runs; Q2233+131
and PHL 1222, originally identified by Steidel et al. (1995),
Djorgovski et al. (1996) and Møller et al. (1998). Both objects
are reported to have extended Lyα emission (Fynbo et al. 1999;
Christensen et al. 2004). Table 1 includes these two previously
known Lyα emitting DLA and sub-DLA galaxies, although the
criteria listed above are not satisfied. The absorption system to-
wards Q2233+131 has a column density that classifies it as a
sub-DLA. Unless otherwise noted, these two objects are kept
separate from the detection of candidate emission line objects in
the remainder of the paper.
Most of the DLAs in the IFS study lie towards bright QSOs
(R < 19). This ensured that the PSF variations as a function of
wavelength could be determined, which was necessary for the
subtraction of the QSO emission. Bright QSOs had larger resid-
uals from the subtraction of the continuum emission, which po-
tentially affected out ability to recover emission line objects that
were offset in velocity space and located closer than 1′′ to the
QSO line of sight. However, tests with artificial objects showed
that this was a minor problem for the data set (see Sect. 5.3).
3. Observations and data reduction
Using the Potsdam Multi Aperture Spectrophotometer (PMAS)
mounted on the 3.5m telescope at Calar Alto we observed the
objects listed in Table 2 during several runs from 2002–2004.
The PMAS integral field unit (IFU) was used in the standard
configuration where 256 fibres are coupled to a 16×16 element
lens array. During the observations each fibre covered 0.′′5×0.′′5
on the sky giving a field of view of 8′′×8′′. Each fibre resulted in
a spatial element (spaxel) represented by a single spectrum. The
256 spectra were recorded on a 2k×4k CCD which was read out
in a 2×2 binned mode. With a separation of 7 pixels between in-
dividual spectra, the fibre to fibre cross-talk was negligible (less
than 0.4% for an extraction of all 7 pixels). Detailed overviews
of the PMAS instrument and capabilities are given in Roth et al.
(2000, 2005).
For individual exposures a maximum time of 1800s was used
because of the large number of pixels affected by cosmic ray hits.
Furthermore, because of varying conditions such as the atmo-
spheric transmission and seeing, the total exposure time for each
object was adjusted, or sometimes an observation was repeated
under better conditions. The photometric conditions during ob-
servations were monitored in real time with the PMAS acquisi-
tion and guiding camera (A&G camera) which is equipped with
a 1k×1k CCD. Seeing values listed in Table 2 refer to the see-
ing FWHM measured in the A&G camera images. Determining
actual spectrophotometric conditions requires monitoring of the
extinction coefficients which cannot be determined from the
http://www.phys.unsw.edu.au/~sjc/dla/
L. Christensen et al.: An IFS survey for high-z DLA galaxies 3
Coordinate name Alt. name zem zabs logN(H i) [Fe/H] [Si/H] Ref.
Q0151+048A PHL 1222 1.93 1.934 20.36±0.10 (1)
Q0953+4749 PC 0953+4749 4.457 3.404 21.15±0.15 >–2.178 >–2.09 (2,3)
3.891 21.20±0.10 >–1.712 >–1.60
4.244 20.90±0.15 –2.50±0.17 –2.23±0.15
Q1347+112 2.679 2.471 20.3 (4,5,6)
2.05 20.3† (7)
Q1425+606 SBS 1425+606 3.163 2.827 20.30±0.04 –1.33±0.04 >–1.03 (8,9,10,21)
Q1451+1223 B1451+123 3.246 2.469 20.39±0.10 –2.54±0.12 –1.95±0.16 (11,24)
3.171 19.70±0.15 –1.87±0.16 –1.62±0.15 (19)
2.254 20.30±0.15 –1.47±0.17 >–0.40 (6,12,19)
Q1759+7539 GB2 1759+756 3.05 2.625 20.76±0.10 –1.21±0.10 –0.82±0.10 (13,18,20)
2.91 19.8 –1.65±0.01 –1.26±0.01 (18)
Q1802+5616 PSS J1802+5616 4.158 3.391 20.30±0.10 –1.54±0.11 >–1.55 (23)
3.554 20.50±0.10 –1.93±0.12 >–1.82
3.762 20.55±0.15 –1.82±0.26 >–1.74
3.811 20.35±0.20 –2.19±0.23 –2.04±0.22
Q2155+1358 PSS J2155+1358 4.256 3.316 20.55±0.15 >–1.68 –1.26±0.17 (3)
3.142 19.94±0.10 –2.21±0.21 –1.85±0.12 (14,19)
3.565 19.37±0.15 <–2.40 –1.27±0.16 (14,19)
4.212 19.61±0.15 –2.18±0.25 –1.92±0.11 (14,19)
Q2233+131 3.295 3.153 20.0 >–1.4±0.1 >–1.04 (15,16,17)
2.551 20.0 (12,16)
Table 1. List of the observed DLA and sub-DLA systems with column densities and metallicities taken from the litera-
ture. † denotes a system where the reported N(H i) needs to be confirmed through high resolution spectroscopy. References
for either DLA redshifts or metallicities: (1) Møller et al. (1998), (2) Schneider et al. (1991), (3) Prochaska et al. (2003d), (4)
Smith et al. (1986), (5) Wolfe et al. (1986), (6) Turnshek et al. (1989), (7) Wolfe et al. (1995), (8) Chaffee et al. (1994), (9)
Stepanian et al. (1996), (10) Prochaska et al. (2002a), (11) Bechtold (1994), (12) Lanzetta et al. (1991), (13) Prochaska et al. (2001),
(14) Péroux et al. (2003), (15) Steidel et al. (1995), (16) Lu et al. (1997), (17) Prochaska et al. (2003a), (18) Outram et al. (1999),
(19) Dessauges-Zavadsky et al. (2003), (20) Prochaska et al. (2002b), (21) Lu et al. (1996), (22) Prochaska et al. (2003c), (23)
Prochaska et al. (2003b), (24) Petitjean et al. (2000).
A&G camera images. In Table 2 ‘stable’ means that the pho-
tometry of the guiding star was constant within 1% during the
observations.
The data were obtained using 2 gratings; one with 300 lines
mm−1 and one with 600 lines mm−1, set at a chosen tilt to cover
a selected wavelength range. The FWHM of the sky lines were
measured to be 6.4 and 3.2 Å, respectively. Observations of spec-
trophotometric standard stars were carried out at the beginning
and end of each night at the grating position used for the obser-
vations.
Data reduction was done by first subtracting an average bias
frame. Before extracting the 256 spectra most cosmic ray hits
were removed by the routine described in Pych (2004). A high
threshold was chosen such that not all cosmic rays were re-
moved, because a low threshold would also remove bright sky
emission lines from some spectra. Remaining cosmic rays were
removed from the extracted spectra using the program L.A.
Cosmic (van Dokkum 2001).
The locations of the spectra on the CCD were found from
exposures of a continuum lamp, taken either before or after the
science exposures, using a tracing algorithm developed for the
IDL based PMAS data reduction package P3D (Becker 2002).
The spectral extraction was done in two ways; a ‘simple extrac-
tion’ that added all flux from each spectrum on the CCD (i.e. an
extraction width of 7 pixels), and another method that took into
account the profile of the spectrum on the CCD. This second
method assumed that the spectral profiles were represented by
Gaussian functions (Gaussian extraction) where the widths were
allowed to vary with wavelength. Widths were determined by fits
to each of the 256 spectra as a function of the wavelength, and
the extraction used these width in combination with the centre
found from the tracing algorithm. The Gaussian profile is an ap-
proximation because the profiles are not strictly Gaussian. The
second method increased the signal-to-noise ratio by >10% for
faint objects and therefore unless otherwise noted, the results
from the ‘Gaussian extraction’ data cubes will be reported (see
also Sánchez 2006).
After extraction, the spectra were wavelength calibrated us-
ing exposures of emission line lamps taken just before or after
the observations. The wavelength calibration was done using the
P3D reduction tool. Comparisons with sky emission lines indi-
cated an accuracy of the wavelength calibration of about 10% of
the spectral resolution.
The spectra show a wavelength dependent fibre to fibre trans-
mission. To correct for this effect, we extracted sky spectra ob-
tained from twilight sky observations in the same way as the sci-
ence observations. A one dimensional average sky spectrum was
calculated. Each of the 256 spectra were divided by this average
spectrum, and the fraction was fit by a polynomial function to re-
duce noise. These polynomials were used to flat field the science
spectra.
Before combining individual frames, the extracted spectra
were arranged into data cubes. Each data cube was corrected
for extinction using an average extinction curve for Calar Alto
(Hopp & Fernandez 2002). The data cube combination took into
account a correction for the differential atmospheric refraction
4 L. Christensen et al.: An IFS survey for high-z DLA galaxies
QSO date exp.time grating λ coverage seeing conditions
(s) (Å)
Q0151+048A 2003-08-27 5×1800 V600 3500–5080 0.8–1.2 stable
Q0953+4749 2004-04-16 4×1800 V300 3630–6980 0.9 stable
2004-04-21 5×1800 V300 3630–6980 1.0 non-phot.
Q1347+112 2004-04-20 7×1800 V300 3630–6980 0.6 non-phot.
Q1425+606 2004-04-19 6×1800 V300 3630–6750 1.0 stable
Q1451+1223 2004-04-17 7×1800 V300 3630–6980 0.8 non-phot.
Q1759+7539 2004-04-21 7×1800 V300 3630–6980 1.0–1.5 non-phot.
Q1802+5616 2003-06-18 2×1800 V600 5100–6650 1.0 non-phot.
2003-06-20 3×1800 V600 5100–6650 1.0 non-phot.
2003-06-21 4×1800 V600 5100–6650 1.8 non-phot.
2003-06-22 6×1800 V600 5100–6650 0.9 stable
Q2155+1358 2003-08-26 7×1800 V600 4015–5610 0.7 stable
2003-08-27 4×1800 V600 4015–5610 0.8 non-phot.
Q2233+131 2002-09-03† 4×1800 V300 3930–7250 1.0–1.3
2003-08-24 6×1800 V600 4000–5600 0.6 stable
2003-08-25 4×1800 V600 4000–5600 0.7 non-phot.
Table 2. Log of the observations. The last two columns show the average seeing during the integrations and the photometric
conditions derived from the A&G camera images. † Results from these observations are published in Christensen et al. (2004).
using a theoretical prediction (Filippenko 1982). Relative spa-
tial shifts between individual data cubes were determined from
a two-dimensional Gaussian fit to the QSO PSF at a wavelength
close to the strong DLA absorption lines.
Subtraction of the sky background was an iterative process
because the locations of the emission line objects of interest were
not known beforehand. PMAS, in the configuration used, does
not have specifically allocated sky fibres. Instead, we selected
10–20 fibres uncontaminated by the QSO emission and the av-
erage spectrum was subtracted from all 256 spectra. Different
spaxel selections were examined visually to select an appropri-
ate sky spectrum which had no emission line or noisy pixels in
the spectral region of interest.
Flux calibration was done in the standard way using obser-
vations of spectrophotometric standard stars. A one-dimensional
spectrum of the standard star was constructed by co-adding flux
from all 256 spaxels. This was used to create a sensitivity func-
tion that could be applied to each of the 256 spectra in the science
exposures. For non-photometric nights the flux calibrated spec-
tra were compared with QSO spectra from the literature to esti-
mate photometric errors. However, no correction factor was ap-
plied to our spectra, because an intrinsic variability of the QSOs
would make such scaling uncertain. For some cases we note in
Sect. 5.5 that there are differences which could be caused by ei-
ther non-photometric conditions or intrinsic variability.
For reference we show spectra of the target QSOs in Fig. 1.
Where present, metal absorption lines corresponding to the
highest column density DLAs are indicated. For QSOs with
multiple DLAs lines only the DLA lines and their redshifts
are indicated since the wavelength coverage does not include
lines redwards of the QSO Lyα line. A detailed analysis of
metal absorption lines requires higher resolution spectroscopy as
presented elsewhere (e.g. Prochaska et al. 2003c; Péroux et al.
2003; Dessauges-Zavadsky et al. 2003). DLA redshifts derived
from the metal absorption lines were consistent with those re-
ported in the literature within the accuracy of the wavelength
calibration of the data cubes.
4. Search for DLA optical counterparts
The observations covered the wavelengths of Lyα for all but one
of the strong absorption systems listed in Table 1. Only the high-
est redshift sub-DLA system towards Q2155+1358 was not cov-
ered, i.e. the total number of systems included in this analysis is
21 DLA and sub-DLA systems.
4.1. Expected sizes
For this project we are only interested in small wavelength re-
gions corresponding to Lyα at the DLA redshifts, and thus the
search for candidate galaxies could be carried out using cus-
tomised narrow-band filters. IFS, on the other hand, has the ad-
vantage that the widths of the narrow-band filters can be ad-
justed to match those of the emission lines. Typically customised
narrow-band filters have a larger transmission FWHM than the
spectral resolution of IFS data. Hence, IFS allows detection of
emission line objects with a higher signal-to-noise ratio than
that possible with narrow-band filters. A disadvantage is the rela-
tively small field of view of current IFUs, but this is not a serious
concern. One can estimate the expected sizes of DLA galaxies
(see Wolfe et al. 1986). Using a Schechter luminosity function
and a power-law relation between the disk luminosity and gas
radius given by the Holmberg relation R/R∗ = (L/L∗)β, one can
calculate the expected impact parameter. Combining β = 0.26
found for DLA galaxies at z < 1 (Chen et al. 2005) with the
luminosity function in for z = 3 galaxies (Poli et al. 2003) one
finds R∗ ≈ 30 kpc. If DLA galaxies are similar to or fainter than
L∗ galaxies this implies that DLA galaxies at z > 2 are expected
to lie closer than ∼4′′ from the QSO line of sight. The small
field of view of IFUs is therefore well suited to search for Lyα
emission from DLA galaxies.
The estimated galaxy sizes are highly dependent on the
parameters of the DLA galaxy luminosity and slope β. Most
probably, high redshift DLA galaxies are not regular disks like
those in the local universe. Numerical models of DLAs predict
that the galaxies are mostly smaller than 10 kpc, while obser-
vations that give limits on the star-formation rates associated
with DLAs suggest that DLAs are located in neutral gas around
Lyman break galaxies (Wolfe & Chen 2006). As DLA galaxies
hristensen
survey
high-z
galaxies
fλ  [10
−16 erg s−1cm−2 Å−1]
Lyα(1215.67)
SiII(1260.42)
OI(1302.17)
CII(1334.53)
SiIV(1393.76)
SiIV(1402.77)
SII(1526.71)
SII(1533.43)
CIV(1549)
Si IV
fλ  [10
−16 erg s−1cm−2 Å−1]
Lyα(z=3.404)
Lyα(z=3.891)
Lyα(z=4.244)
Lyα(QSO)
fλ  [10
−16 erg s−1cm−2 Å−1]
Lyα(1215.67)
SiII(1260.42)
OI(1302.17)
CII(1334.53)
SiIV(1393.76)
SiIV(1402.77)
SII(1526.71)
CIV(1549)
FeII(1608.45)
AlII(1670.79)
AlIII(1854.7)
AlIII(1862.8)
Si IV
fλ  [10
−16 erg s−1cm−2 Å−1]
Lyβ(1026)
Lyα(1215.67)
SiII(1260.42)
OI(1302.17)
CII(1334.53)
SiIV(1393.76)
SiIV(1402.77)
SII(1526.71)
SII(1533.43)
CIV(1549)
FeII(1608.45)
AlII(1670.79)
Si IV
0 2 4 6 8
fλ  [10
−16 erg s−1cm−2 Å−1]
Lyα(1215.67)
SiII(1260.42)
OI(1302.17)
CII(1334.53)
SiIV(1393.76)
SiIV(1402.77)
SII(1526.71)
CIV(1549)
Si IV
fλ  [10
−16 erg s−1cm−2 Å−1]
Lyβ(1026)
Lyα(1215.67)
SiII(1260.42)
OI(1302.17)
CII(1334.53)
SiIV(1393.76)
SiIV(1402.77)
SII(1526.71)
CIV(1549)
FeII(1608.45)
AlII(1670.79)
AlIII(1854.7)
AlIII(1862.8)
Si IV
fλ  [10
−16 erg s−1cm−2 Å−1]
Lyα(z= 3.391)
Lyα(z= 3.554)
Lyα(z= 3.762)
Lyα(z= 3.811)
Lyα(QSO)
fλ  [10
−16 erg s−1cm−2 Å−1]
| Lyα(1215.67)
0 2 4 6 8
fλ  [10
−16 erg s−1cm−2 Å−1]
Lyβ(1026)
Lyα(1215.67)
SiII(1260.42)
OI(1302.17)
CII(1334.53)
ig.1.
ectra
tracted
rrectio
alactic
ciated
icated
icated
catio
ritten
ectra
issio
6 L. Christensen et al.: An IFS survey for high-z DLA galaxies
at z > 2 are generally found to be fainter than an L∗ galaxy
(Colbert & Malkan 2002), we choose to consider only objects
with impact parameters smaller than 30 kpc for a more detailed
analysis.
The impact parameters that we measure in the data corre-
spond to the radially projected distances so the real distances to
the absorber can be larger. Two candidates are found at impact
parameters larger than 30 kpc, and they are likely not associated
directly with the absorbers themselves.
4.2. Candidate selection
Some Lyα emission lines from DLA galaxies are offset from the
QSO- DLA line by ∼ 200 km s−1 (Møller et al. 2002), whereas
Lyα emission from high redshift galaxies can have even larger
offsets from the galaxy systemic redshift (Shapley et al. 2003).
We therefore chose to focus on regions in the data cubes with
velocities ranging from approximately –1000 to +1000 km s−1
from the DLA lines.
First, the reduced data cubes were stacked in a two-
dimensional frame and inspected visually around the DLA lines
for emission line objects. When the spatial offset from the QSO
is larger than the seeing, or alternatively when the QSO is very
faint, emission line objects can be identified directly because
of the ordering of the spectra in the stacked spectrum. Where
no objects could be detected visually further sampling of the
data cubes was necessary to increase the signal-to-noise ratio
to detect candidate emission line objects. Inspections of the data
cubes was done using the Euro3D visualisation tool (Sánchez
2004).
From the reduced, sky-subtracted and combined data cubes,
narrow-band images were created with an initial width of 10–
15 Å depending on the spectral resolution of the observations. A
set of images was created offset by –10 to +10 Å from the DLA
line to allow for possible velocity shifts of the Lyα emission line,
and inspected visually for objects brighter than the background.
If detected, spectra from these brighter regions were co-added
and inspected for emission lines at the wavelength chosen in the
narrow-band image. This step was necessary to discriminate be-
tween emission lines and individual noisy spectra. It is known
that three blocks of 16 fibres, i.e. 48 fibres in an area of 1.′′5 to the
west in the field of view, have lower than average transmission.
The effect of correcting for the total throughput was that these
spectra had lower signal-to-noise ratios. When narrow-band im-
ages were created from the cubes, the higher variance in these
spaxels could result in extreme values, seemingly inconsistent
with the neighboring spaxels. Only by looking at the spectrum
associated with a bright spaxel could it be determined if an emis-
sion line was present, or if the spectrum was just noisy. If an
emission line was seen, a second pass narrow-band image was
created using the value of the emission line width to increase the
signal of the detection. A second pass one-dimensional spectrum
was created after inspecting the narrow-band image for more
bright spaxels surrounding the emission line candidate. This pro-
cedure was iterated until the signal in either narrow-band images
or spectra did not increase. We found that an interactive visual
identification of faint emission lines was more effective than an
automatic routine.
To allow a better visual detection of emission line objects,
the narrow-band images were interpolated to pixel scales 0.′′2
pixel−1 as shown in Fig. 2. In all panels the images are 8′′ by
8′′, with orientation north up and east left. The left panels show
interpolated images of the QSO at wavelengths near to the DLA
line. Contours correspond to an image centered on the visually
detected emission feature close to the DLA redshift. In the mid-
dle panels in Fig. 2 the plots are reversed, such that the image
shows the emission line object and the contours correspond to
the QSO narrow-band image. Here, the innermost contour cor-
responds to the seeing FWHM. To enhance the visibility of the
candidates the QSO emission was subtracted from the data cubes
before creating the images. This subtraction of the QSO emis-
sion was done using a simple approach (see Christensen et al.
2006). A scale factor was determined for each spaxel by dividing
each spectrum by the extracted one-dimensional QSO spectrum.
Using this scale factor, the QSO emission was subtracted, a pro-
cess which retains objects with spectral characteristics different
from the QSO in the data cube.
The spectra of the candidates are shown in the right hand
column in Fig. 2. These are created by co-adding spectra from
between 4 and 10 spaxels. The dotted line corresponds to the
1σ noise level determined from a statistical analysis of the pixel
values in the data cube, while the lower sub-panels show the
background noise spectra in the data cubes, obtained from 4-10
background spaxels.
Properties of the candidate objects corresponding to those
with spectra in Fig. 2 are listed in Table 3. Offsets in RA, DEC
from the QSO and the corresponding projected distance at the
DLA redshift are listed in columns 2, 3, and 4. Emission lines
were fit using ngaussfit in IRAF, redshifts listed in column 5
are derived. Fluxes in column 6 are derived from the Gaussian
fits, and errors in the peak intensity, line width, and contin-
uum placement are propagated to calculate the uncertainties. The
fluxes have not been corrected for the Galactic extinction. The
flux measurements and the associated errors indicate that most
of the candidates are detected with a signal-to-noise ratio < 4σ.
Column 7 gives the velocity difference between the systemic
DLA redshift and the candidate Lyα emission lines. We integrate
the signal-to-noise estimate over the emission line (Column 8),
S/Nint = f /(
N × σ), where f is the line flux, N is the number
of pixels the emission line covers and σ is the noise in adjacent
wavelength intervals. Column 9 gives the observed emission line
FWHM after correcting for the instrumental resolution. Finally
column 10 gives the significance classes of the candidate detec-
tion, which is explained in Section 5.1.
Columns 3 and 4 in Table 4 list the values of Galactic redden-
ing towards each QSO (Schlegel et al. 1998), and the correction
factors to be applied to the candidate fluxes for a Milky Way
extinction curve (Fitzpatrick 1999).
5. Results
This section describes the classification of the emission line can-
didates. We estimate the contamination from spurious detections
and from interlopers. Notes on each observed object are pre-
sented as well.
5.1. Candidate significance class
To estimate how reliable the candidate detection was, various
tests were applied to the data cubes. The candidates were as-
signed a significance class: 1, 2, 3, and 4 according to how many
of the following tests were passed.
1. Instead of co-adding all data-cubes, two independent subsets
of exposures were created, and the emission line candidate
was visible in both sub-set combinations. In Sect. 5.5 these
will be referred to as subcombinations.
L. Christensen et al.: An IFS survey for high-z DLA galaxies 7
DLA Q0151+048A
3500 3550 3600 3650
Wavelength [Å]
Sky noise
DLA (z=3.404) Q0953+4749
5200 5300 5400 5500
Wavelength [Å]
Sky noise
DLA (z=3.891) Q0953+4749
5800 5900 6000 6100
Wavelength [Å]
Sky noise
DLA (z=4.244) Q0953+4749       no candidate
6200 6300 6400 6500
Wavelength [Å]
Sky noise
       
DLA Q1347+112 (z=2.484)
4100 4200 4300 4400
Wavelength [Å]
Sky noise
Fig. 2. Left panels: narrow-band images of the QSOs with overlayed contours of narrow-band images centered on the Lyα wave-
lengths of the DLAs. The images are 8′′ square. Contour levels correspond to 2, 3, 4σ levels above the background noise. Middle
panels: the reverse, where the contours are arbitrary apart from the central one that shows the QSO seeing FWHM. Right hand
panels: Spectra of candidate emission line objects created from co-adding spectra from spaxels associated with the emission line
candidates. The width of the grey bars over the emission lines correspond to the wavelength ranges of the narrow band images. The
lines below the spectra show the background sky noise spectra determined from background spaxels.
8 L. Christensen et al.: An IFS survey for high-z DLA galaxies
      
DLA Q1347+112 (z=2.057)
3650 3700 3750 3800 3850 3900
Wavelength [Å]
Sky noise
10 DLA  Q1425+606
4500 4600 4700 4800
Wavelength [Å]
Sky noise
DLA Q1451+1223
4100 4200 4300 4400
Wavelength [Å]
Sky noise
sub−DLA Q1451+1223
4900 5000 5100 5200
Wavelength [Å]
Sky noise
Fig. 2. Plots of candidates– continued. No candidate is found for the z = 2.254 DLA towards Q1451+1223.
L. Christensen et al.: An IFS survey for high-z DLA galaxies 9
DLA  Q1759+7539
4300 4400 4500 4600
Wavelength [Å]
Sky noise
10 sub−DLA  Q1759+7539
4600 4700 4800 4900
Wavelength [Å]
Sky noise
DLA Q1802+5616 (z=3.391)
5200 5300 5400 5500
Wavelength [Å]
Sky noise
10 DLA Q1802+5616 (z=3.762)
5600 5700 5800 5900
Wavelength [Å]
Sky noise
Fig. 2. Plots of candidates– continued. No candidate is found for the z = 3.554 DLA towards Q1802+5616.
10 L. Christensen et al.: An IFS survey for high-z DLA galaxies
10 sub−DLA Q2155+1358 (z=3.146)
4900 5000 5100 5200
Wavelength [Å]
Sky noise
DLA Q2155+1358
5100 5200 5300 5400
Wavelength [Å]
Sky noise
subDLA Q2233+131 (z=3.153)
4900 5000 5100 5200
Wavelength [Å]
Sky noise
Fig. 2. Plots of candidates– continued. No candidates are found for the z = 3.811 DLA towards Q1802+5616, or the z = 3.565
sub-DLA towards Q2155+1358.
L. Christensen et al.: An IFS survey for high-z DLA galaxies 11
Fig. 2. Plots of candidates– continued. No candidate is found for the z = 2.51 sub-DLA towards Q2233+131.
QSO ∆RA ∆DEC b z fλ ∆ v S/Nint FWHM significance
(′′) (′′) (kpc) (km s−1) (km s−1) class
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10)
Q0151+048A 2.5:0.4 –2.5: 1.7 3.3:25.4 1.9363 (150±70) +225 7 280±220 conf.
Q0953+4749 –1.2 –0.2 9.0 3.4041 (6.6±2.9) 0 15 290±260 2
–0.5 1.8 11.1 3.9029 (4.9±2.1) +730 16 570±230 3
Q1347+112 –2.0 1.5 20.2 2.4835 (3.5±1.9) +1080 8 190±370 3
–4.1 0.4 34.3 2.0568 (4.2±3.0) +670 7 610±210 1
Q1425+606 –4.1 0.4 32.3 2.8280 (8.5±3.1) +80 14 590±220 3
Q1451+1223 –3.0 3.8 39.2 2.4764 (5.8±2.6) +640 10 320±260 3
–1.0 2.8 22.5 3.1739 (3.1±2.0) +210 7 <100 3
Q1759+7539 –1.4 3.5 30.0 2.6377 (5.8±3.0) +1050 10 290±220 2
0.1 –1.2 9.4 2.9090 (6.0±2.9) –80 21 240±260 1
Q1802+5616 –0.2 –2.0 14.9 3.3820 (3.5±0.9) –610 10 180±90 4
0.2 1.8 12.9 3.7652 (4.6±1.9) +200 7 <100 2
Q2155+1358 0.7 1.2 10.2 3.3174 (9.4±3.0) +100 9 780±210 3
1.7 1.9 19.4 3.1461 (4.1±2.4) +290 7 260±220 3
Q2233+131 0.6 2.3 18.0 3.1543 (9.6±2.5) +90 9 230±110 conf.
Table 3. Properties of candidate Lyα emission lines. Column 2, 3, and 4 list the offsets of the candidate in RA and DEC and in pro-
jected kpc at the Lyα emission redshifts (in col. 5), respectively. Column 6 lists the integrated Lyα flux in units of 10−17 erg cm−2 s−1,
and column 7 the velocity offset from the DLA redshift. Column 8 lists the integrated signal-to-noise ratio of the Lyα emission line,
and column 9 gives the line width of the emission lines. Fluxes have not been corrected for Galactic extinction. Column 10 lists the
significance class of the detections as described in Sect. 5.5. ‘Conf.’ implies candidates that were confirmed previously (Møller et al.
1998; Djorgovski et al. 1996).
2. The emission line candidate was visible in the ‘simple ex-
tractions’, i.e. where a Gaussian profile was not assumed.
3. Emission line candidates were visible in the narrow-band im-
ages when the QSO spectrum was subtracted from the data
cube.
4. Emission line objects that were directly visible in individual
or combined data cubes, or in the stacked spectra.
In all cases, for a candidate to be considered further it was re-
quired to be detected above 3σ in both narrow-band images and
the associated spectra. The significance classes of the candidates
are listed in column 10 in Table 3, and comments for each object
are described in Section 5.5. Since the candidates were found
from visual inspections of the data cubes, this classification was
done to describe the candidates in a more qualitative manner. As
the classes involve various tests on the data sets, this classifica-
tion goes beyond the simple statistical significance in terms of to
what σ level the object is detected.
5.2. Non-detections
In the data cubes where no candidates were found, we esti-
mated the upper limits for the emission line fluxes. Spectra from
spaxels within one seeing element (e.g. 4 spaxels corresponding
to a seeing of 1′′) were co-added to create a one-dimensional
spectrum. Artificial emission lines with varying line fluxes were
added to this spectrum at the DLA wavelength, and Gaussian
profile fits to these lines were used to estimate the detection level.
The results are listed in Table 5. The varying limits are due to the
wavelength dependent noise in the data cubes and in particular
the presence of residuals from nearby sky emission lines.
12 L. Christensen et al.: An IFS survey for high-z DLA galaxies
QSO EB−V ffrac
Q0151+048A 0.044 1.216
Q0953+4749 (z =3.4041) 0.011 1.032
Q0953+4749 (z =3.9028) 1.028
Q1347+112 (z =2.4835) 0.035 1.145
Q1347+112 (z =2.0568) 1.163
Q1425+606 (z =2.827) 0.012 1.043
Q1451+1223 (z =2.4764) 0.031 1.128
Q1451+1223 (z =3.1739) 1.102
Q1759+7539 (z =2.6377) 0.053 1.220
Q1759+7539 (z =2.91) 1.199
Q1802+5616 (z =3.3820) 0.052 1.164
Q1802+5616 (z =3.7652) 1.145
Q2155+1358 (z =3.3174) 0.067 1.222
Q2155+1358 (z =3.1461) 0.067 1.237
Q2233+131 0.068 1.240
Table 4. Column 2 and 3 give values of the Galactic redden-
ing and the corresponding correction factor to be applied to the
emission line candidates.
QSO zabs flim(3σ)
Q0953+4749 4.244 2.5
Q1451+1223 2.256 4.0
Q1802+5616 3.554 4.0
Q1802+5616 3.811 2.2
Q2155+1358 3.565 3.5
Q2233+131 2.551 4.8
Table 5. DLA and sub-DLA systems where no candidate emis-
sion lines are found and 3σ upper limits for the line fluxes.
Fluxes are in units of 10−17 erg cm−2 s−1.
5.3. Experiments with artificial objects
To investigate how the efficiency of the visual inspection de-
pended on object properties, several experiments with artificial
data cubes were made. Similar to artificial experiments for one-
and two-dimensional data sets, artificial emission line objects
were added to the data cubes. These objects were described by
the location in RA and DEC, central wavelength, peak emission
intensity, and the widths in RA, DEC and wavelength. For sim-
plicity we assumed that an emission line object seen as a point
source in the data cube could be represented by a Gaussian pro-
file in each direction, i.e. described by a Gaussian ellipsoid in
the data cube.
We first tested completely simulated data cubes with statisti-
cal noise levels corresponding to the typical noise level in the
combined data cubes. An emission line object with a flux of
5 × 10−17 erg cm−2 s−1, a width of 800 km s−1, and spatial
FWHM of 1′′ was placed at a previously known wavelength. In
the stacked spectra no objects could be seen immediately. The
emission line was only identified after inspecting the data cube
in the visualisation tool, and it was extracted and analysed in the
same way as the real data. Similar tests were made by adding an
emission line to a real data cube, where the background noise in-
cluded the systematic noise as well as the pure Poissonian noise.
These tests produced similar results for the faint emission lines
with Lyα flux f ∼ 5 × 10−17 erg cm−2 s−1, i.e. 1) the emission
line flux could be recovered within uncertainties, 2) even at very
small impact parameters the object could be found 3) the recon-
structed PSF of the emission line object was irregular as in any
of the images in Fig. 2.
We also tested an automatic routine where the re-detection
of the artificial objects was done with no visual intervention. A
set of narrow-band images were created in wavelength ranges
around the artificial line. For the detection of an emission line
the location was constrained to be within ±10 Å of the input
central wavelengths. These images were smoothed and a two-
dimensional Gaussian profile was fit to the images. When an ob-
ject was detected above a certain threshold, spaxels around the
centre within the seeing FWHM were co-added. A series of tests
showed that the recovered flux was consistent within 1σ errors
for fluxes down to f = 5 × 10−17 erg cm−2 s−1. In a typical data
cube this was also the detection limit where 50% of the objects
were re-identified, while the fraction of re-identified emission
lines at this flux level from a visual inspection was larger.
Tests on the frequency of false detections in data cubes where
no objects were present showed that simultaneous detections
of objects in narrow-band images and associated spectra with
S/N > 3 occurred at a rate of less than 5% in a series of ex-
periments. Therefore false detections cannot explain the large
number of candidate objects.
5.4. Field Lyα emitters
We estimate here whether the detected candidates are likely to
be field Lyα emitters having no association with the DLAs.
Observations of high redshift objects have partly focused on
detecting Lyα emission from galaxies to determine the global
comoving star-formation rates (e.g. Hu et al. 1998, 2004). The
density of Lyα emitters at z ∼ 3 is estimated to be 15000 deg−2
∆z−1 with line fluxes brighter than a mean of f = 1.5 × 10−17
erg cm−2 s−1 (Hu et al. 1998; Kudritzki et al. 2000). From the
luminosity function at z ≈ 3 (van Breukelen et al. 2005), the ex-
pected number of field Lyα emitters at a flux limit of 5 × 10−17
erg cm−2 s−1 is 1.7×10−4 arcsec−2 ∆z−1. In our survey, the 9 data
cubes sample a total redshift interval of ∆z = 21.55 around z ≈ 3.
Statistically, it is expected that there are 0.2 field Lyα emitters in
the whole sample presented here. Because these very faint lines
are difficult to locate when the approximate wavelength is not
known in advance, we did not look for field emission objects.
The negligible number of expected field emitters furthermore
shows that the emission candidates, if proved to be real, are un-
likely to be interloping field Lyα emitters. They are much more
likely to be associated with the DLA galaxies.
5.5. Notes on individual objects
This section explains the significance of the candidates for each
individual QSO.
Q0151+045A. – This is a zem ≈ zabs system at z ≈ 1.93. After
flux calibration, the QSO spectrum is 2 magnitudes brighter than
that presented in Møller et al. (1998). The low instrument sensi-
tivity at 3560 Å combined with a variable extinction coefficient
at Calar Alto makes the calibration uncertain.
Extended Lyα emission was observed in a region of 3′′×6′′
around the QSO mostly to the east of the QSO (Fynbo et al.
1999). Long slit spectroscopy along the long axis revealed ve-
locity structures of 400 km s−1 that could be interpreted as a
rotation curve (Møller 1999). In the IFS data extended emission
is detected to some degree in Fig. 2, but not with the same detail
as in the higher spatial resolution and larger field of view data
in Fynbo et al. (1999). This is the only case in the sample where
extended emission is found, but the signal is not strong enough
to determine the velocity structure over the extended region. The
L. Christensen et al.: An IFS survey for high-z DLA galaxies 13
spectrum shown in Fig. 2 is the total spectrum co-added from the
whole nebula.
Q0953+4749. – This zem = 4.457 QSO has three DLAs at
zabs = 3.407, 3.891, and 4.244 (Bunker et al. 2003). A candidate
associated with the lowest redshift DLA is visible in the narrow-
band image in Fig. 2. Independent subcombinations, the simple
extraction, and the corresponding spectra show a faint emission
line. This emission line coincides with a sky emission line 1.6 Å
away and could be due to sky subtraction errors, so we only as-
sign this candidate a significance class of 2. A Lyα emission
line from the DLA galaxy has been reported (A. Bunker, pri-
vate comm.) but its line flux is below our detection limit. For the
second DLA system at z = 3.891 the object is present in sub-
combinations, the simple-extracted images, and in the extracted
spectra. This candidate is assigned significance class 3. No can-
didate is found for the highest redshift DLA to the detection limit
reported in Table 5.
The locations of the candidates are compared to WFPC2 im-
ages obtained from the HST archive, but no continuum counter-
part could be identified.
Q1347+112. – This zem = 2.679 QSO has a DLA at zabs =
2.471 and another possible one at zabs = 2.05, which needs con-
firmation from spectroscopy at higher spectral resolution. An
emission candidate for the z = 2.471 DLA is visible in the sub-
combinations and the extracted one-dimensional spectra. In the
simple extraction, the spectrum has a low signal-to-noise ratio
and the emission feature in the spectrum is faint. We assign a
significance class of 3 to this candidate. For the z = 2.0568 DLA
system we detect a candidate emission line object, but note an
increase in the background noise shortwards of 3750 Å. The ob-
ject is not seen in one of the subcombinations, nor the extracted
spectra, and therefore the candidate is assigned a significance
class of 1.
A snapshot WFPC F555W image (Bahcall et al. 1992) ob-
tained from the HST archive has a 5σ limiting magnitude of
24.4 mag arcsec−2, but no continuum counterpart can be seen at
the location of the candidate.
Q1425+606. – This zem = 3.163 QSO has a DLA at zabs =
2.827. Because this QSO is very bright, strong residuals within
1′′ from the QSO centre are present in the narrow-band image
where the QSO emission is subtracted. A faint object offset by
∼4′′ to the west is visible in the narrow-band image in Fig. 2. The
candidate is present in subcombinations and in the constructed
spectra and is assigned a significance class of 3. In PMAS data
cubes, spaxels in the west region are more noisy than the average
due to an overall lower transmission. Note that the tests suggest
a good candidate, but the impact parameter (> 30 kpc) is large.
Q1451+1223. – This zem = 3.246 QSO has two DLAs at
zabs = 2.469 and zabs = 2.254 and a sub-DLA at zabs = 3.171.
For the DLA system at z = 2.469 an object appears after the QSO
subtraction close to the centre. It is caused by residuals, since the
spectrum has no emission lines at the expected wavelength. For
the same DLA, a region ∼4′′ to the north west appears in both
narrow-band imaging, subcombinations, simple extractions, and
the constructed spectra. We therefore assign a significance class
of 3 to it, but note that is has a large impact parameter (39 kpc).
For the z = 3.171 sub-DLA an object is detected to the north.
Narrow-band images from subcombinations, and simple extrac-
tions show the emission line candidate, but the corresponding
spectra have emission lines with very low signals. The candidate
is assigned a significance class of 3. No candidate is found for
the z = 2.254 DLA.
A deep optical broad-band image of the field surrounding
this QSO was obtained by Steidel et al. (1995), who found no
obvious candidates to the absorbers. Warren et al. (2001) found
one candidate offset by 3.′′9 to the south-west of the QSO in a
NICMOS image, but this object is outside the field of view of
the IFS data. An HST/STIS archive image shows that the emis-
sion line candidates lie in regions where no continuum emitting
counterpart is found.
Q1759+7539. – This zem = 3.05 QSO has a DLA at zabs =
2.625 and a sub-DLA at zabs = 2.91. The candidate detected
for the DLA system in Fig. 2 lies near the northern edge of the
field of view and can be affected by flat field errors. Although it
is bright, the candidate is not visible in both subcombinations,
and is therefore assigned a significance class of 2. A bright area
1.′′8 south west of the QSO appears after the QSO emission is
subtracted but it is likely due to residuals. It has no emission lines
at the expected wavelength and is not considered further. The
higher redshift sub-DLA system has an emission line candidate
which is visible only after the QSO PSF has been subtracted
from the final cube. However, the candidate is only visible in
one out of two subcombinations and we assign this candidate a
low significance class of 1.
A NICMOS snapshot image showed no bright galaxies
near the QSO to a limit corresponding to an L∗ galaxy
(Colbert & Malkan 2002).
Q1802+5616. – This zem = 4.158 QSO has four DLAs at
zabs = 3.391, 3.554, 3.762, and 3.811. The candidate for the low-
est redshift DLA system is directly visible in the reduced and
combined data cube when looking at the stacked spectra. The
candidate can also be identified in individual subcombinations
and in the simple extracted spectrum. Therefore this candidate is
assigned the class 4. In a narrow-band image at the wavelength
of Lyα at z = 3.7652 there is an emission region to the south (see
Fig. 2) and the corresponding spectrum shows an emission fea-
ture. However, this line is coincident with a faint sky emission
line, so this candidate is assigned the class 2. No candidates are
found for the other two DLA systems.
Q2155+1358. – This zem = 4.256 QSO has a DLA at zabs =
3.316 and three sub-DLAs at zabs = 3.142, 3.565, and 4.212. The
observations only cover Lyα for the three lower redshift systems.
IFS covering the highest redshift system has revealed a possi-
ble faint candidate emission line object (Francis & McDonnell
2006). The candidate Lyα emission line associated with the DLA
system is visible in independent subcombinations and in the
simple extraction and is therefore assigned a high value of 3.
Because of the partial spatial overlap with the QSO, the emis-
sion from the QSO is subtracted to give a cleaned emission line
object and the associated spectrum shown in Fig. 2. A candidate
is found to the south for the z = 3.142 sub-DLA system. This
object is visible in the simple extraction, and subcombinations,
but only one associated spectrum shows a clearly detected emis-
sion line. We assign a significance class of 3 to this candidate.
No candidate is found for the z = 3.565 sub-DLA.
Q2233+131. – This zem = 3.295 QSO has two sub-DLAs at
zabs = 3.153 and zabs = 2.551. The galaxy responsible for the
z = 3.153 DLA was found by Steidel et al. (1995), and follow-
up spectroscopy confirmed this by the detection of Lyα emission
(Djorgovski et al. 1996). Previous IFS of this object suggested
that the Lyα emission was extended (Christensen et al. 2004).
This is not confirmed by the higher spectral resolution data in-
cluded in this paper, although there appears to be some faint
emission to the east of the object in Fig. 2. The new data and im-
proved data reduction which optimises the signal-to-noise ratio,
confirm the line flux Lyα line flux reported in Djorgovski et al.
14 L. Christensen et al.: An IFS survey for high-z DLA galaxies
(1996). No candidate was found for the z = 2.551 sub-DLA sys-
tem, consistent with the upper limit from a deeper Fabry-Perot
imaging analysis (Kulkarni et al. 2006).
6. Properties of candidate DLA counterparts
We proceed with a more detailed analysis of the properties of
the detected candidate Lyα emission lines. Only those candi-
dates assigned values 3 and 4 are included. Of the eight good
candidates, we reject two due to their large impact parameters
(> 30 kpc). However, since they fulfill the criteria for good can-
didates, they could instead belong to a brighter component in a
group. The average redshift of all the DLAs in the whole sam-
ple is z̄sample = 3.13 while that of the six remaining candidates
is z̄cand = 3.23, hence we find no preference for detection of ei-
ther lower or higher redshift candidates. We emphasise that the
candidates emission lines have fluxes that are detected at the 3σ
level, but with this in mind we compare their properties with
those of confirmed Lyα emission lines from DLA galaxies.
6.1. Line fluxes
Fig. 3 shows the inferred line fluxes of the candidates as a func-
tion of redshift. The triangles denote our candidates and square
symbols indicate already confirmed objects from the literature
(Møller & Warren 1993; Djorgovski et al. 1996; Møller et al.
1998; Leibundgut & Robertson 1999; Møller et al. 2002, 2004).
This figure shows that the line fluxes for the candidates are simi-
lar to those for the previously confirmed ones, which have deeper
observations and detection levels of 5–10σ.
Fabry-Perot imaging studies of QSOs with DLAs have man-
aged to reach similar or lower flux limits than our IFS survey
(Lowenthal et al. 1995; Kulkarni et al. 2006). With their detec-
tion limit some objects should have been detected if the Lyα
fluxes of DLA galaxies are around the level we find for the
candidates and the confirmed objects. IFS is useful to look for
emission lines as it allows us to adjust a posteriori the cen-
tral wavelength, whereas in Fabry-Perot images, the emission
line could fall at the wings of the filter where the transmis-
sion is lower. Another advantage of IFS observations is the
knowledge of the spatial QSO PSF as a function of wave-
length which allows a modeling and subtraction of the QSO
emission (Wisotzki et al. 2003; Sánchez et al. 2004). This al-
lows detection of emission lines even when they are superim-
posed on the QSO. Nevertheless we do not detect emission line
candidates closer than about 1′′ from the QSO possibly due to
subtraction residuals. The fact that the confirmed objects are
found at smaller impact parameters compared to the candidates
(Sect. 6.3) could indicate a bias.
6.2. Velocity differences
An anticorrelation is expected between the Lyα luminosity and
the velocity difference between the Lyα emission line and optical
emission lines (Weatherley et al. 2005). The resonant nature of
Lyα causes a shift of the emission line towards slightly longer
wavelengths where the photons can escape absorption. When a
larger fraction of the blue part of the line profile is absorbed,
the remaining emission line of lower luminosity will be more
shifted in velocity compared to brighter ones. This explanation
is supported by the study of Lyα emission lines from Lyman
break galaxies (LBGs) (Shapley et al. 2003).
1.5 2.0 2.5 3.0 3.5 4.0
redshift
10−18
10−17
10−16
10−15
10−14
L= 1043 erg s−1
L= 1042 erg s−1
confirmed
candidates
Fig. 3. Line fluxes of Lyα emission objects as function of red-
shift, where square symbols represent already confirmed objects
and triangles the candidates. The solid and dashed lines corre-
spond to Lyα luminosities of 1042 and 1043 erg s−1, respectively.
Fig. 4 shows the velocity differences between Lyα emission
lines and the DLA redshifts for the candidates as a function of
the Lyα luminosity. There is no evidence for a correlation for the
candidates. We note that the only candidate that shows a negative
velocity offset is the best candidate in the sample; the z = 3.391
DLA towards Q1802+5616. For the candidates we find an aver-
age velocity difference of 300±580 km s−1, which is similar to
the velocity differences measured for LBGs; Pettini et al. (2001)
find 560±410 km s−1 while a larger sample has ∆v = 650 km s−1
between the Lyα emission line and low-ionisation absorption
lines (Shapley et al. 2003). In the case that DLAs are associated
with bright galaxies we would expect to see large velocity off-
sets too. Furthermore, as the line of sight towards the emission
line object and the QSOs differ by 10–30 kpc, a larger velocity
offset can be expected due to differences in kinematics within
the host and its environment. Instead, if the DLA galaxy resides
in a group, the velocity offset will reflect the velocity disper-
sion in the group instead of being related to the host. In support
of this idea, it has been shown that bright Lyman break galax-
ies at z > 2 are surrounded by gas extending to large distances
(Adelberger et al. 2005). Correlation studies have revealed that
DLAs cluster on almost the same scale as LBGs (Cooke et al.
2006), indicating that a similar amount of gas is present in their
environments.
Like flux-limited surveys, this IFS study selects the brightest
emission component, and it is possible that the real absorbing
galaxy is a fainter component in a group.
In the case that DLA galaxies are related to rotating large
disks, it can be expected that the velocity difference increases
with impact parameter but Fig. 5 shows no clear correlation. In
three of the confirmed DLA galaxies optical emission lines have
velocity differences between –200 and 30 km s−1 relative to Lyα
(Weatherley et al. 2005). Some candidates have larger offsets,
possibly affected more strongly by resonant scattering.
6.3. H i extension
The average impact parameter of 16 kpc derived for the can-
didates is larger than that expected by numerical simulations
which favor impact parameters of b = 3 kpc for DLA galax-
L. Christensen et al.: An IFS survey for high-z DLA galaxies 15
42.2 42.4 42.6 42.8 43.0
log (L
confirmed
candidates
Fig. 4. Velocity differences between the Lyα emission lines and
DLA redshifts as a function of the Lyα luminosity. An average
error bar for the Lyα emission candidates is shown in the lower
right corner.
0 10 20 30
b (kpc)
confirmed
candidates
Fig. 5. Velocity differences between the Lyα emission lines and
DLA redshifts as a function of the impact parameter.
ies, and have fewer than 25% with b > 10 kpc at all redshifts
(Haehnelt et al. 2000; Okoshi & Nagashima 2005; Hou et al.
2005). Larger DLA galaxy sizes of 10–15 kpc at 2 < z < 4
are inferred by other simulations (Gardner et al. 2001). A pos-
sibility for the difference between observations and simulations
is that the simulations assume a single disk scenario, while DLA
galaxies could exist in groups (Hou et al. 2005). The real absorb-
ing galaxy could be fainter and lie closer to the QSO line of sight
than the detected candidate galaxy.
An anticorrelation between N(H i) and the distance to the
nearest galaxy is found in simulations (Gardner et al. 2001), but
no analysis of this effect for observed DLA galaxies has been at-
tempted. A trend for larger column density absorbers at smaller
impact parameters was observed in a sample of DLA galax-
ies at z < 1 (Rao et al. 2003). At lower column densities in
the Lyα forest such an anticorrelation has been shown to ex-
ist (Chen et al. 1998, 2001). Observations of the galaxies giving
rise to Mg ii absorption lines showed an anticorrelation between
the impact parameters and column densities of both Mg ii and
2.0 2.5 3.0 3.5 4.0
redshift
confirmed
candidates
Fig. 6. Angular impact parameter vs. redshift. The solid line
corresponds to the size-luminosity relation for an L∗B galaxy
(Chen & Lanzetta 2003), while the dotted and dashed lines cor-
respond to galaxies with B band luminosities LB = 10%L
B and
LB = 1%L
B, respectively. Squares are objects from the literature
and triangles the candidates from this survey.
H i (Churchill et al. 2000), but recent observations of a larger
sample have indicated that the correlation is not always present
(Churchill et al. 2005).
We here investigate whether the candidates show a similar
anticorrelation using the impact parameters for the candidates as
a proxy for the sizes of neutral gas envelopes around proto galax-
ies. Assuming such a correlation is necessarily a rough approx-
imation because large morphological differences between indi-
vidual systems are expected (Rao et al. 2003; Chen & Lanzetta
2003). Specifically, the possible presence of sub-clumps is ne-
glected.
6.3.1. DLA galaxy sizes and luminosities
To analyse the sizes of DLA galaxies at z < 1, Chen & Lanzetta
(2003) describe the extension of the neutral gas cloud associated
with DLA galaxies as
, (1)
assuming that DLA galaxies follow a Holmberg relation be-
tween galaxy sizes and luminosities. Their fit to the observed
DLA galaxies gives R∗ = 30h−1 kpc, t = 0.26+0.24−0.06, where L
B cor-
responds to a galaxy with M∗B = −19.6. Based on the morpholo-
gies and impact parameters Chen et al. (2005) argued that dwarf
galaxies alone cannot represent the DLA galaxy population at
z < 1. In the case that the DLA galaxy population evolves from
low luminosity objects at high redshifts to higher luminosity at
lower redshifts, this will affect the expected impact parameters.
In Fig. 6 the impact parameters of candidates and confirmed ob-
jects are shown as a function of their redshifts. Overlayed on this
figure are curves for the size-luminosity relation derived for low
redshift DLA galaxies. If DLA galaxies at high redshift follow
the low redshift scaling relation, they comprise a mix of galaxy
luminosities.
16 L. Christensen et al.: An IFS survey for high-z DLA galaxies
0 10 20 30
b (kpc)
confirmed
candidates
Fig. 7. Column density of neutral hydrogen as a function of
impact parameters for the candidates and previously confirmed
objects. The symbols are similar to the previous figures. The
solid and dashed lines are fits to the power-law relation b/b∗ =
(N/N∗)β for the candidates and confirmed objects, respectively.
6.3.2. Powerlaw profiles
Using similar arguments as above one could expect that there is a
relation between the impact parameter b, and the column density
measured for the DLA. Fig. 7 shows the N(H i) measured for the
DLA as a function of the impact parameters in kpc. We assume
a similar scaling relation as in Eq. (1) for the impact parameter
and N(H i), i.e.
( N(H i)
N(H i)∗
We set log N(H i)∗ = 20.3 and the error on the measured
impact parameter is given by the fibre size of 0.′′5, which corre-
sponds to ∼4 kpc. A fit of the observed impact parameters for
the candidates gives b∗ = 15.9 ± 1.4 kpc, and β = −0.23 ± 0.08
as shown by the solid line, while for the confirmed objects, we
find b∗ = 12.0± 3.7 kpc and β = −0.36± 0.14 as represented by
the dashed line. The fits to the candidates and confirmed objects
are consistent within 1σ uncertainties.
6.3.3. Exponential profiles
Radio observations of the 21 cm emission from H i disks in the
local Universe have shown that an exponential profile is a poor
representation in the central part of disk galaxies, where the
21 cm flux density either stays constant or decreases towards
the centre (e.g. Verheijen & Sancisi 2001). However, at optical
wavelengths disk galaxies are well represented by exponential
profiles. Here we fit the impact parameter distribution by the re-
lation
N(H i) = N(H i)0 exp(−b/h) (3)
where h is the scale length, N(H i)0 the central column density
of a simple exponential disk, and N(H i) is the column density
measured for the DLAs. The resulting fit to all the candidates
is shown by the solid line in Fig. 8, which has log N(H i)0 =
21.7± 1.1 cm−2 and h = 5.1+2.5−1.3 kpc. This result is similar within
the uncertainties to a fit to the confirmed objects only (log N0 =
21.7 ± 1.1 cm−2 and h = 4.5+3.6−1.4 kpc).
0 5 10 15 20 25 30
b (kpc)
confirmed
candidates
Fig. 8. Column density of neutral hydrogen as a function of im-
pact parameter for the candidates and previously confirmed ob-
jects. The solid and dashed lines show the fit to the exponential
relation N(H i) = N(H i)0 exp(−b/h) for the candidates and con-
firmed objects, respectively.
Local disk galaxies have optical scale lengths ranging from
∼2 kpc to ∼6 kpc, and observations of the H i profile in low sur-
face brightness galaxies have indicated scale lengths > 10 kpc
(Matthews et al. 2001). In contrast, higher redshift spiral galax-
ies in the HST Ultra Deep field have smaller optical scale lengths
of 1.5–3 kpc (Elmegreen et al. 2005), possibly biased towards
smaller values due to the easier detection of high surface bright-
ness, high star formation rate regions. The question is how ex-
tended the gaseous envelopes are around these young galaxies.
The impact parameters of the candidates suggest that high red-
shift DLAs reside far from the host galaxy, if not in a regular
proto-galactic disk (Wolfe et al. 1986), then in a region of the
same physical scale, possibly in merging clumps of gas sur-
rounding the actual proto galaxy. In this picture, DLAs can be
found far from the center of the parent galaxy.
The two objects that were originally discarded as candidates
due to their large impact parameters (> 30 kpc) do not follow
the relations for either the exponential or power-law profiles.
Including them would make the scatter around the fit substan-
tial.
6.4. Metallicity effects on Lyα emission
Using a space based imaging survey and follow up long-slit
spectroscopic observations, Møller et al. (2004) found indica-
tions for a positive metallicity–Lyα luminosity relation, such
that Lyα emission was preferentially observed in higher metal-
licity systems. They argued that this positive correlation could
over-power the negative dust–Lyα luminosity effects which
are expected to be strong in high metallicity environments
(Charlot & Fall 1993). Studies of Lyα emission from nearby
star-forming galaxies have not revealed any correlations between
metallicity and Lyα emission strength (Keel 2005). Differences
in the velocity-metallicity relation between high and low redshift
DLAs could be explained by higher redshift DLAs residing in
lower mass galaxies (Ledoux et al. 2006), that have fainter Lyα
emission.
In this context we investigate the distribution of metallici-
ties for the emission line candidates in comparison to the total
L. Christensen et al.: An IFS survey for high-z DLA galaxies 17
-2.5 -2.0 -1.5 -1.0 -0.5
[Si/H]
good candidates
no candidates
Fig. 9. Cumulative distribution of Si metallicities of the six good
emission line candidates compared to the remaining part of the
sample. The probability that the two distributions are similar is
38% estimated from a Kolmogorov-Smirnov test.
sample. We compare the cumulative distributions of metallici-
ties ([Si/H]) for the DLAs that have candidate Lyα detections
with the metallicities of the remaining objects that have either
no Lyα candidates or rejected candidates. The distributions in
Fig. 9 show the fraction of DLAs with metallicities larger than a
given value. Table 1 has several lower limits on [Si/H] and Fig. 9
treats the limits as actual detections.
A two sided Kolmogorov Smirnov (KS) test gives a prob-
ability of 38% that the two samples have the same underlying
distributions. A similar analysis for [Fe/H] gives the same re-
sult. Hence, none of the tests give clear statistical evidence for a
difference between the two populations. Only a small number of
DLAs are included in this survey. For the two samples with N1
and N2 being the number of objects in each sample respectively
we find N1N2/(N1 + N2) = 4. For the KS test to be statistically
valid a value larger than 4 is required (Press et al. 1992), hence
a few more objects are needed to make the test more statistically
significant.
6.5. Metallicity gradients
In local galaxies the metallicities of H ii regions are shown to de-
crease with increasing radial distance in the disk (Zaritsky et al.
1994). If DLAs arise in disks lower metallicities are expected
at larger linear separations between the QSO line of sight and
the galaxy centers. A comparison of absorption metallicities for
3 DLAs at z < 0.6 with abundances derived from strong emis-
sion line diagnostics from the galaxy spectra revealed that gra-
dients are likely present at at level –0.041±0.012 dex kpc−1
(Chen et al. 2005). Uncertainties in the gradients arise due to
an unknown correction for dust depletion and also the inclina-
tion of the galaxy plays an important role due to projection ef-
fects. Metallicity differences due to differential depletion within
a singe DLA galaxy can be strong as demonstrated by observa-
tions of a lensed QSO (Lopez et al. 2005).
Observations of nebular emission lines from seven galax-
ies at 2.0 < z < 2.5 indicate an average solar metallicity
(Shapley et al. 2004), with evidence for the presence of metal-
licity gradients (Förster Schreiber et al. 2006). If metallicity gra-
dients exist for high redshift DLA galaxies, we would expect
0 10 20 30
b (kpc)
confirmed
candidates
Fig. 10. DLA metallicities as a function of the impact pa-
rameters, where symbols shapes have similar meanings as
in the previous figures. The dotted line shows a fit to all
the objects excluding the limits. This line has a gradient of
−0.024 ± 0.015 dex kpc−1. Data for the confirmed objects are
either [Si/H] of [Zn/H] taken from Møller et al. (2004).
to see a tendency for higher metallicities for the DLA galax-
ies detected at smaller impact parameters. DLA galaxies are
on the average fainter than LBGs detected in flux limited sur-
veys (Møller et al. 2002). Combining this with a high redshift
luminosity-metallicity relation can imply low DLA metallicities
without involving metallicity gradients.
Fig. 10 shows the DLA metallicities as a function of the im-
pact parameters. The line shows a fit to all objects ignoring those
with lower limits on [Si/H]. This gradient is –0.024±0.015, i.e.
is consistent with zero at the 2σ level. The large scatter in the
plot could be real and unrelated to gradients. Different DLAs
exhibit a large range of star formation histories (Erni et al. 2006;
Herbert-Fort et al. 2006), which makes it unreasonable to expect
a smooth relation between metallicities and impact parameters
for a sample of DLAs. Clearly more data are needed to deter-
mine the reality of any relation.
7. Conclusions
We have presented an integral field spectroscopic survey of 9
high redshift QSOs, which have a total of 14 DLA systems and
8 sub-DLA systems. We detect eight good candidates for Lyα
emission lines from DLA and sub-DLA galaxies. Two of these
are found at impact parameters larger than 30 kpc, and are not
likely associated directly with the absorbing galaxy, but could
be associated with galaxy groups in which the real absorbing
galaxy resides. All candidates are detected at a statistically sig-
nificant level in reconstructed narrow-band images as well as
in the co-added one-dimensional spectra. Further observations
will be useful to independently confirm the candidates at an even
higher signal to noise ratio. We compare the properties inferred
from the IFS data with those for previously spectroscopically
confirmed Lyα emission lines from DLA galaxies reported in
the literature. We find that line luminosities are similar to those
of previously confirmed objects, that the average impact param-
eters are larger by a factor of ∼2, and that some candidates have
larger velocity offsets between the Lyα emission line and the
systemic redshift of the DLA system.
18 L. Christensen et al.: An IFS survey for high-z DLA galaxies
We analyse the distribution of DLA column densities as a
function of impact parameters. Assuming that the average DLA
galaxy is similar to a disk galaxy with an exponential profile, we
show that it has a scale length of 5 kpc. Such a scale length is
similar to disk scale lengths found for local spiral galaxies. This
could imply that DLAs belong to large disks even at high red-
shifts as originally suggested by Wolfe et al. (1986). However,
it is probably too simplistic to expect that high redshift DLAs
reside in regular disks with similar structure to large local disks.
DLA systems are generally not associated with luminous galax-
ies (e.g. Colbert & Malkan 2002; Møller et al. 2002). The large
impact parameters found for the candidates indicate that the dis-
tribution of H i clouds in DLA galaxies extends significantly be-
yond the optical sizes of fainter dwarf galaxies.
Furthermore, Wolfe & Chen (2006) showed that high red-
shift DLAs do not reside in extended disks that follow the lo-
cal Schmidt-Kennicutt law for star formation. The IFS results
presented here suggest that the Lyα emission is generally not ex-
tended, and that star formation takes place at a distance of several
kpc from the DLA. Hence, we may speculate that DLAs arise in
the outskirts of proto-galaxies, for example in clouds of neutral
gas around LBGs. In this case one would expect a significant
scatter in the relation between the impact parameter and column
density of the DLA since neutral clouds could be distributed ir-
regularly around the galaxy. Contrary to expectations, the ob-
jects show a small scatter around the relations in Figs. 7 and 8.
Regardless of the distribution of neutral gas in DLA galaxies we
conclude that there is a tendency to find a lower column density
DLA with increasing impact parameter. Extending the investiga-
tion to include DLAs and sub-DLAs with N(H i)> 1019.6 cm−2
this tendency emerges for both the candidates and the confirmed
objects.
The velocity offsets between the Lyα emission lines and the
systemic redshifts of the DLAs are larger for half of the can-
didates compared to the confirmed objects. This could indicate
an origin in groups of galaxies, where the DLA resides in a less
luminous component than the galaxy detected in Lyα. To deter-
mine whether resonant scattering affects the candidate Lyα lines
more strongly and gives rise to larger velocity offsets than for
the confirmed objects, observations of the corresponding opti-
cal emission lines which are shifted to the near-IR are required
(e.g. Weatherley et al. 2005). Optical emission lines furthermore
have the advantage of being less affected by dust absorption
and therefore are better for estimating the star-formation rates.
Alternatively non-resonance UV lines such as C iv could be stud-
This survey was carried out with IFS on a 4-m class tele-
scope, and the signals were generally near the detection limit. To
verify this IFS method, it is necessary to get independent, higher
signal-to-noise ratio spectra with a larger aperture telescope to
confirm the existence of the emission lines.
Acknowledgements. This study was supported by the German
Verbundforschung associated with the ULTROS project, grant no.
05AE2BAA/4. S.F. Sánchez acknowledges the support from the Euro3D
Research Training Network, grant no. HPRN-CT2002-00305. K. Jahnke
acknowledges support from DLR project No. 50 OR 0404. We thank the referee
for useful suggestions that clarified the paper.
References
Adelberger, K. L., Shapley, A. E., Steidel, C. C., et al. 2005, ApJ, 629, 636
Bahcall, J. N., Maoz, D., Doxsey, R., et al. 1992, ApJ, 387, 56
Bechtold, J. 1994, ApJS, 91, 1
Becker, T. 2002, PhD thesis, Astrophysikalisches Institut Potsdam, Germany
Bunker, A., Smith, J., Spinrad, H., Stern, D., & Warren, S. 2003, Ap&SS, 284,
Chaffee, F. H., Stepanian, J. A., Chavushian, V. A., Foltz, C. B., & Green, R. F.
1994, Bulletin of the American Astronomical Society, 26, 1338
Charlot, S. & Fall, S. M. 1993, ApJ, 415, 580
Chen, H.-W., Kennicutt, R. C., & Rauch, M. 2005, ApJ, 620, 703
Chen, H.-W. & Lanzetta, K. M. 2003, ApJ, 597, 706
Chen, H.-W., Lanzetta, K. M., Webb, J. K., & Barcons, X. 1998, ApJ, 498, 77
Chen, H.-W., Lanzetta, K. M., Webb, J. K., & Barcons, X. 2001, ApJ, 559, 654
Christensen, L., Jahnke, K., Wisotzki, L., & Sánchez, S. F. 2006, A&A, 459, 717
Christensen, L., Sánchez, S. F., Jahnke, K., et al. 2004, A&A, 417, 487
Churchill, C. W., Kacprzak, G. G., & Steidel, C. C. 2005, In Probing Galaxies
through Quasar Absorption Lines
Churchill, C. W., Mellon, R. R., Charlton, J. C., et al. 2000, ApJ, 543, 577
Colbert, J. W. & Malkan, M. A. 2002, ApJ, 566, 51
Cooke, J., Wolfe, A. M., Gawiser, E., & Prochaska, J. X. 2006, ApJ, 652, 994
Dessauges-Zavadsky, M., Péroux, C., Kim, T.-S., D’Odorico, S., & McMahon,
R. G. 2003, MNRAS, 345, 447
Djorgovski, S. G., Pahre, M. A., Bechtold, J., & Elston, R. 1996, Nature, 382,
Elmegreen, B. G., Elmegreen, D. M., Vollbach, D. R., Foster, E. R., & Ferguson,
T. E. 2005, ApJ, 634, 101
Erni, P., Richter, P., Ledoux, C., & Petitjean, P. 2006, A&A, 451, 19
Filippenko, A. V. 1982, PASP, 94, 715
Fitzpatrick, E. L. 1999, PASP, 111, 63
Förster Schreiber, N. M., Genzel, R., Lehnert, M. D., et al. 2006, ApJ, 645, 1062
Francis, P. J. & McDonnell, S. 2006, MNRAS, 656
Fynbo, J. U., Burud, I., & Møller, P. 2000, A&A, 358, 88
Fynbo, J. U., Møller, P., & Warren, S. J. 1999, MNRAS, 305, 849
Gardner, J. P., Katz, N., Hernquist, L., & Weinberg, D. H. 2001, ApJ, 559, 131
Haehnelt, M. G., Steinmetz, M., & Rauch, M. 1998, ApJ, 495, 647
Haehnelt, M. G., Steinmetz, M., & Rauch, M. 2000, ApJ, 534, 594
Herbert-Fort, S., Prochaska, J. X., Dessauges-Zavadsky, M., et al. 2006, PASP,
118, 1077
Hopp, U. & Fernandez, M. 2002, Calar Alto Newsletter No.4,
http://www.caha.es/newsletter/news02a/hopp/paper.pdf
Hou, J. L., Shu, C. G., Shen, S. Y., et al. 2005, ApJ, 624, 561
Hu, E. M., Cowie, L. L., Capak, P., et al. 2004, AJ, 127, 563
Hu, E. M., Cowie, L. L., & McMahon, R. G. 1998, ApJ, 502, L99
Keel, W. C. 2005, AJ, 129, 1863
Kudritzki, R.-P., Méndez, R. H., Feldmeier, J. J., et al. 2000, ApJ, 536, 19
Kulkarni, V. P., Woodgate, B. E., York, D. G., et al. 2006, ApJ, 636, 30
Lacy, M., Becker, R. H., Storrie-Lombardi, L. J., et al. 2003, AJ, 126, 2230
Lanzetta, K. M., McMahon, R. G., Wolfe, A. M., et al. 1991, ApJS, 77, 1
Lanzetta, K. M., Wolfe, A. M., & Turnshek, D. A. 1995, ApJ, 440, 435
Le Brun, V., Bergeron, J., Boisse, P., & Deharveng, J. M. 1997, A&A, 321, 733
Ledoux, C., Petitjean, P., Bergeron, J., Wampler, E. J., & Srianand, R. 1998a,
A&A, 337, 51
Ledoux, C., Petitjean, P., Fynbo, J. P. U., Møller, P., & Srianand, R. 2006, A&A,
457, 71
Ledoux, C., Theodore, B., Petitjean, P., et al. 1998b, A&A, 339, L77
Leibundgut, B. & Robertson, J. G. 1999, MNRAS, 303, 711
Lopez, S., Reimers, D., Gregg, M. D., et al. 2005, ApJ, 626, 767
Lowenthal, J. D., Hogan, C. J., Green, R. F., et al. 1995, ApJ, 451, 484
Lu, L., Sargent, W. L. W., & Barlow, T. A. 1997, ApJ, 484, 131
Lu, L., Sargent, W. L. W., Barlow, T. A., Churchill, C. W., & Vogt, S. S. 1996,
ApJS, 107, 475
Matthews, L. D., van Driel, W., & Monnier-Ragaigne, D. 2001, A&A, 365, 1
Møller, P. 1999, in Astrophysics with the NOT, ed. H. Karttunen & V. Piirola
(University of Turku), 80
Møller, P., Fynbo, J. U., & Fall, S. M. 2004, A&A, 422, L33
Møller, P. & Warren, S. J. 1993, A&A, 270, 43
Møller, P., Warren, S. J., Fall, S. M., Fynbo, J. U., & Jakobsen, P. 2002, ApJ,
574, 51
Møller, P., Warren, S. J., & Fynbo, J. U. 1998, A&A, 330, 19
Okoshi, K. & Nagashima, M. 2005, ApJ, 623, 99
Outram, P. J., Chaffee, F. H., & Carswell, R. F. 1999, MNRAS, 310, 289
Péroux, C., Dessauges-Zavadsky, M., D’Odorico, S., Kim, T., & McMahon,
R. G. 2003, MNRAS, 345, 480
Péroux, C., Storrie-Lombardi, L. J., McMahon, R. G., Irwin, M., & Hook, I. M.
2001, AJ, 121, 1799
Petitjean, P., Pecontal, E., Valls-Gabaud, D., & Charlot, S. 1996, Nature, 380,
Petitjean, P., Srianand, R., & Ledoux, C. 2000, A&A, 364, L26
Pettini, M., Shapley, A. E., Steidel, C. C., et al. 2001, ApJ, 554, 981
Poli, F., Giallongo, E., Fontana, A., et al. 2003, ApJ, 593, L1
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P.
1992, Numerical recipes in FORTRAN. The art of scientific computing
L. Christensen et al.: An IFS survey for high-z DLA galaxies 19
(Cambridge: University Press, 1992, 2nd ed.)
Prochaska, J. X., Castro, S., & Djorgovski, S. G. 2003a, ApJS, 148, 317
Prochaska, J. X., Castro, S., & Djorgovski, S. G. 2003b, ApJS, 148, 317
Prochaska, J. X., Gawiser, E., Wolfe, A. M., Castro, S., & Djorgovski, S. G.
2003c, ApJL, 595, L9
Prochaska, J. X., Gawiser, E., Wolfe, A. M., Cooke, J., & Gelino, D. 2003d,
ApJS, 147, 227
Prochaska, J. X., Henry, R. B. C., O’Meara, J. M., et al. 2002a, PASP, 114, 933
Prochaska, J. X. & Herbert-Fort, S. 2004, PASP, 116, 622
Prochaska, J. X., Herbert-Fort, S., & Wolfe, A. M. 2005, ApJ, 635, 123
Prochaska, J. X., Howk, J. C., O’Meara, J. M., et al. 2002b, ApJ, 571, 693
Prochaska, J. X. & Wolfe, A. M. 1997, ApJ, 487, 73
Prochaska, J. X., Wolfe, A. M., Tytler, D., et al. 2001, ApJS, 137, 21
Pych, W. 2004, PASP, 116, 148
Rao, S. M., Nestor, D. B., Turnshek, D. A., et al. 2003, ApJ, 595, 94
Roth, M. M., Bauer, S., Dionies, F., et al. 2000, in Proc. SPIE, Vol. 4008, 277–
Roth, M. M., Kelz, A., Fechner, T., et al. 2005, PASP, 117, 620
Sánchez, S. F. 2004, AN, 325, 167
Sánchez, S. F. 2006, AN, 327, 850
Sánchez, S. F., Garcia-Lorenzo, B., Mediavilla, E., González-Serrano, J. I., &
Christensen, L. 2004, ApJ, 615, 156
Schlegel, D. J., Finkbeiner, D. P., & Davis, M. 1998, ApJ, 500, 525
Schneider, D. P., Schmidt, M., & Gunn, J. E. 1991, AJ, 101, 2004
Shapley, A. E., Erb, D. K., Pettini, M., Steidel, C. C., & Adelberger, K. L. 2004,
ApJ, 612, 108
Shapley, A. E., Steidel, C. C., Pettini, M., & Adelberger, K. L. 2003, ApJ, 588,
Smith, H. E., Cohen, R. D., & Bradley, S. E. 1986, ApJ, 310, 583
Steidel, C. C., Pettini, M., & Hamilton, D. 1995, AJ, 110, 2519
Stepanian, J. A., Chavushian, V. H., Chaffee, F. H., Foltz, C. B., & Green, R. F.
1996, A&A, 309, 702
Storrie-Lombardi, L. J. & Wolfe, A. M. 2000, ApJ, 543, 552
Turnshek, D. A., Wolfe, A. M., Lanzetta, K. M., et al. 1989, ApJ, 344, 567
van Breukelen, C., Jarvis, M. J., & Venemans, B. P. 2005, MNRAS, 359, 895
van Dokkum, P. G. 2001, PASP, 113, 1420
Verheijen, M. A. W. & Sancisi, R. 2001, A&A, 370, 765
Warren, S. J., Møller, P., Fall, S. M., & Jakobsen, P. 2001, MNRAS, 326, 759
Weatherley, S. J., Warren, S. J., Møller, P., et al. 2005, MNRAS, 358, 985
Wisotzki, L., Becker, T., Christensen, L., et al. 2003, A&A, 408, 455
Wolfe, A. M. & Chen, H.-W. 2006, ApJ, 652, 981
Wolfe, A. M., Lanzetta, K. M., Foltz, C. B., & Chaffee, F. H. 1995, ApJ, 454,
Wolfe, A. M., Turnshek, D. A., Smith, H. E., & Cohen, R. D. 1986, ApJS, 61,
Zaritsky, D., Kennicutt, R. C., & Huchra, J. P. 1994, ApJ, 420, 87
	Introduction
	Sample selection
	Observations and data reduction
	Search for DLA optical counterparts
	Expected sizes
	Candidate selection
	Results
	Candidate significance class
	Non-detections
	Experiments with artificial objects
	Field Ly emitters
	Notes on individual objects
	Properties of candidate DLA counterparts
	Line fluxes
	Velocity differences
	Hi extension
	DLA galaxy sizes and luminosities
	Powerlaw profiles
	Exponential profiles
	Metallicity effects on Ly emission
	Metallicity gradients
	Conclusions
ABSTRACT
  We search for galaxy counterparts to damped Lyman-alpha absorbers (DLAs) at
z>2 towards nine quasars, which have 14 DLAs and 8 sub-DLAs in their spectra.
We use integral field spectroscopy to search for Ly-alpha emission line objects
at the redshifts of the absorption systems. Besides recovering two previously
confirmed objects, we find six statistically significant candidate Ly-alpha
emission line objects. The candidates are identified as having wavelengths
close to the DLA line where the background quasar emission is absorbed. In
comparison with the six currently known Ly-alpha emitting DLA galaxies the
candidates have similar line fluxes and line widths, while velocity offsets
between the emission lines and systemic DLA redshifts are larger. The impact
parameters are larger than 10 kpc, and lower column density systems are found
at larger impact parameters. Assuming that a single gas cloud extends from the
QSO line of sight to the location of the candidate emission line, we find that
the average candidate DLA galaxy is surrounded by neutral gas with an
exponential scale length of ~5 kpc.

<|endoftext|><|startoftext|>
Introduction
Variability is an important phenomenon in astrophysical studies of structure and evolu-
tion, both stellar and galactic. Some variable stars, such as RR Lyrae, are an excellent tool
for studying the Galaxy. Being nearly standard candles (thus making distance determina-
tion relatively straightforward) and being intrinsically bright, they are a particularly suitable
– 3 –
tracer of Galactic structure. In extragalactic astronomy, the optical continuum variability of
quasars is utilized as an efficient method for their discovery (van den Bergh, Herbst & Pritchet
1973; Hawkins 1983; Koo, Kron & Cudworth 1986; Hawkins & Veron 1995), and is also fre-
quently used to constrain the origin of their emission (Kawaguchi et al. 1998; Trevese et al.
2001; Martini & Schneider 2003).
Despite the importance of variability, the variable optical sky remains largely unex-
plored and poorly quantified, especially at the faint end. To what degree different variable
populations contribute to the overall variability, how they are distributed in magnitude and
color, what the characteristic time-scales and the dominant mechanisms of variability are,
are just some of the questions that still remain to be answered. To address these questions,
several contemporary projects aimed at regular monitoring of the optical sky were started.
Some of the more prominent surveys in terms of the sky coverage, depth, and cadence are:
• The Faint Sky Variability Survey (Groot et al. 2003) is a very deep (17 < V < 24)
BV I survey of 23 deg2 of sky, containing about 80,000 sources sampled at timescales
ranging from minutes to years.
• The QUEST Survey (Vivas et al. 2001) monitors 700 deg2 of sky from V = 13.5 to a
limit of V = 21.
• ROTSE-I (Akerlof et al. 2000) monitors the entire observable sky twice a night from
V = 10 to a limit of V = 15.5. The Northern Sky Variability Survey (Woźniak et al.
2004) is based on ROTSE-I data.
• OGLE (most recently OGLE III; Udalski et al. 2002) monitors ∼ 100 deg2 towards the
Galactic bulge from I = 11.5 to a limit of I = 20. Due to the very high stellar density
towards the bulge, OGLE II has detected about 270,000 variable stars (Woźniak et al.
2002; Żebruń et al. 2002).
• The MACHO Project monitored the brightness of ∼ 60 million stars in ∼ 90 deg2 of
sky toward the Magellanic Clouds and the Galactic bulge for ∼ 7 years to a limit of
V ∼ 24 (Alcock et al. 2001).
A comprehensive review of past and ongoing variability surveys can be found in Becker et al.
(2004).
Recognizing the outstanding importance of variable objects, the last Decadal Survey Re-
port (Astronomy and Astrophysics Survey Committee, Board on Physics and Astronomy, Space Studies Board, National Research Council
2001) highly recommended a major new initiative for studying the variable sky, the Large
– 4 –
Synoptic Survey Telescope (LSST; Tyson et al. 2002; Walker 2003). The LSST1 will offer an
unprecedented view of the faint variable sky: according to the current designs it will scan
the entire accessible sky every three nights to a limit of V ∼ 25 with two observations per
night in two different bands (selected from a set of six). One of the LSST science goals2 will
be the exploration of the transient optical sky: the discovery and analysis of rare and exotic
objects (e.g. neutron star and black hole binaries), gamma-ray bursts, X-ray flashes, and of
new classes of transients, such as binary mergers and stellar disruptions by black holes. The
observed volume of space, and the requirement to recognize and monitor these events — in
real time — on a “normally” variable sky, will present a challenge to the project.
Since LSST will utilize3 the Sloan Digital Sky Survey (SDSS; York et al. 2000) photo-
metric system (ugriz, Fukugita et al. 1996), multiple photometric observations obtained by
the SDSS represent an excellent dataset for a pre-LSST study that characterizes the faint
variable sky and quantifies the variable population and its distribution in magnitude-color-
variability space. Here we present such a study of unresolved sources in a region that has
been imaged multiple times by the SDSS.
In Section 2 we give a brief overview of the SDSS imaging survey and repeated scans
of a ∼ 290 deg2 region called Stripe 82. In Section 3, we describe methods used to select
candidate variable sources from the SDSS Stripe 82 data assembled, averaged and recal-
ibrated by Ivezić et al. (2007), and present tests that show the robustness of the adopted
selection criteria. In the same section, we discuss the distribution of selected variable sources
in magnitude-color-variability space. The Milky Way halo structure traced by selected can-
didate RR Lyrae stars is discussed in Section 4, and in Section 5 we estimate the fraction of
variable quasars. Implications for surveys such as the LSST are discussed in Section 6, and
our main results are summarized in Section 7.
2. Overview of the SDSS Imaging and Stripe 82 Data
The quality of photometry and astrometry, as well as the large area covered by the
survey, make the SDSS stand out among available optical sky surveys (Sesar et al. 2006).
The SDSS is providing homogeneous and deep (r < 22.5) photometry in five bandpasses
1See [HREF]http://www.lsst.org
2For more details see [HREF]http://www.lsst.org/Science/science goals.shtml
3LSST will also use the Y band at ∼ 1 µm. For more details see the LSST Science Requirement Document
at [HREF]http://www.lsst.org/Science/lsst baseline.shtml
– 5 –
(u, g, r, i, and z, Gunn et al. 1998; Hogg et al. 2002; Smith et al. 2002; Gunn et al. 2006;
Tucker et al. 2006) accurate to 0.02 mag (root-mean-square scatter, hereafter rms) for un-
resolved sources not limited by photon statistics (Scranton et al. 2002; Ivezić et al. 2003a),
and with a zeropoint uncertainty of 0.02 mag (Ivezić et al. 2004a). The survey sky coverage
of 10,000 deg2 in the northern Galactic cap, and 300 deg2 in the southern Galactic cap will
result in photometric measurements for well over 100 million stars and a similar number
of galaxies (Stoughton et al. 2002). The recent Data Release 5 (Adelman-McCarthy et al.
2007)4 lists photometric data for 215 million unique objects observed in 8000 deg2 of sky as
part of the “SDSS-I” phase that ran through June 2005. Astrometric positions are accu-
rate to better than 0.1′′ per coordinate (rms) for sources with r < 20.5 (Pier et al. 2003),
and the morphological information from the images allows reliable star-galaxy separation to
r ∼ 21.5 (Lupton et al. 2002). In addition, the 5-band SDSS photometry can be used for
very detailed source classification; e.g. separation of quasars and stars (Richards et al. 2002),
spectral classification of stars to within 1-2 spectral subtypes (Lenz et al. 1998; Finlator 2000;
Hawley et al. 2002), and even remarkably efficient color selection of the horizontal branch and
RR Lyrae stars (Yanny et al. 2000; Sirko et al. 2004; Ivezić et al. 2005) and low-metallicity
G and K giants (Helmi et al. 2003).
The equatorial Stripe 82 region (22h 24m < αJ2000 < 04h 08m, −1.27
◦ < δJ2000 <
+1.27◦, ∼ 290 deg2), observed in the southern Galactic cap, presents a valuable data source
for variability studies. The region was repeatedly observed (65 imaging runs by July 2005,
but not all cover the entire region), and it is the largest source of multi-epoch data in the
SDSS-I phase. Another source of the large number of scans is the SDSS-II Supernova Survey
(Frieman et al. 2007). By averaging the repeated observations of Stripe 82 sources, more
accurate photometry than the nominal 0.02 mag single-scan accuracy can be achieved. This
motivated Ivezić et al. (2007) to produce a catalog of recalibrated Stripe 82 observations.
The catalog lists 58 million photometric observations for 1.4 million unresolved sources that
were observed at least 4 times in each of the gri bands (with a median of 10 observations
obtained over ∼ 5 years). The random photometric errors for PSF (point spread function)
magnitudes are below 0.01 mag for stars brighter than 19.5, 20.5, 20.5, 20, 18.5 in ugriz,
respectively (about twice as accurate for individual SDSS runs), and the spatial variation of
photometric zeropoints is not larger than ∼0.01 mag (rms). Following Ivezić et al. (2007),
we use PSF magnitudes because they go deeper at a given signal-to-noise ratio than aperture
magnitudes, and have more accurate photometric error estimates than model magnitudes.
In addition, various low-order statistics such as root-mean-square scatter (Σ), χ2 per degree
of freedom (χ2), lightcurve skewness (γ), minimum and maximum PSF magnitude, were
4Please see [HREF]http://www.sdss.org/dr5
– 6 –
computed for each ugriz band and each source. We compute χ2 per degree of freedom as
(xi − 〈x〉)
and lightcurve skewness γ as5
(n− 1)(n− 2)
(xi − 〈x〉)
3 (3)
(xi − 〈x〉)2 (4)
where n is the number of detections, xi is the magnitude, 〈x〉 is the mean magnitude, and
ξi is the photometric error.
Separation of quasars and stars, as well as efficient color selection of horizontal branch
and RR Lyrae stars, depend on accurate u band photometry. To ensure this, we select
748,084 unresolved sources from the Ivezić et al. (2007) catalog with at least 4 detections in
the u band. A catalog of variable sources selected from this sample is analyzed in Section 3
below.
3. Analysis of Stripe 82 Catalog of Variable Sources
In this section we describe methods for selecting candidate variable sources, and present
tests that show the robustness of the adopted selection criteria. The distribution of selected
variable sources in magnitude-color-variability space is also presented.
3.1. Methods and Selection Criteria
Due to a relatively small number of observations per source and random sampling, we
do not perform lightcurve fitting, but instead use low order statistics to select candidate
variables and study their properties. There are four parameters (median PSF magnitude,
root-mean-square scatter Σ, χ2, and lightcurve skewness γ) measured in five photometric
5We use equations from [HREF]http://www.xycoon.com/skewness small sample test 1.htm.
– 7 –
bands (u, g, r, i, and z), for a total of 20 parameters. In the analysis presented here, we
utilize eight of them:
• median PSF magnitudes in the ugr bands (corrected for interstellar extinction using
the map from Schlegel, Finkbeiner & Davis 1998) because the g − r vs. u − g color-
color diagram has the most classification power (e.g. Smolčić et al. 2004 and references
therein).
• Σ and χ2 in the g and r bands, and
• lightcurve skewness γ(g) (the g band combines a high signal-to-noise ratio and large
variability amplitude for the majority of variable sources).
The observed root-mean-square scatter Σ includes both the intrinsic variability σ and
the mean photometric error 〈ξ(m)〉 as a function of magnitude. The dependence of Σ on
magnitude in the ugriz bands, is shown in Figure 1. For sources brighter than 18, 19.5, 19.5,
19, and 17.5 mag in the ugriz, respectively, the SDSS delivers 2% photometry with little or
no dependence on magnitude. We determine 〈ξ(m)〉 by fitting a fourth degree polynomial
to median Σ values in 0.5 mag wide bins (here we assume that the majority of sources are
not variable). The theoretically expected 〈ξ(m)〉 function (Strateva et al. 2001)
〈ξ(m)〉 = a+ b100.4m + c100.8m (5)
provides equally good fits. We define the intrinsic variability σ (hereafter rms scatter σ) as
σ = (Σ2 − 〈ξ(m)〉2)1/2 (6)
for Σ > 〈ξ(m)〉, and σ = 0 otherwise.
As the first variability selection criterion, we adopt σ(g) > 0.05 mag and σ(r) > 0.05
mag (hereafter written as σ(g, r) > 0.05 mag). At the bright end, this criterion is equivalent
to selecting sources with rms scatter greater than 2.5σ0, where σ0 = 0.02 mag is the mea-
surement noise. Selection cuts are applied simultaneously in the g and r bands to reduce
the number of “false positives” (intrinsically non-variable sources selected as candidate vari-
able sources due to measurement noise). About 6% of sources pass the σ cut in each band
separately, and ∼ 3% of sources pass the cut in both bands simultaneously. By selecting
sources with σ(g, r) > 0.05 mag, we also select faint sources that have large σ due to large
photometric errors at the faint end. To only select faint sources with statistically significant
rms scatter, we apply the χ2 test as the second selection cut.
– 8 –
In the χ2 test, the value of χ2 per degree of freedom (calculated with respect to a
weighted mean magnitude and using errors computed by the photometric pipeline) deter-
mines whether the observed lightcurve is consistent with the Gaussian distribution of er-
rors. Large χ2 values show that the rms scatter is inconsistent with random fluctuations.
Ivezić et al. (2003a, 2007) used multi-epoch SDSS observations to show that the photometric
error distribution in the SDSS roughly follows a Gaussian distribution. A comparison of χ2
distributions in the g and r bands with a reference Gaussian χ2 distribution is shown in
Figure 2. As evident, χ2 distributions in both bands roughly follow the reference Gaussian
χ2 distribution for χ2 < 1, demonstrating that median photometric errors are correctly de-
termined. The discrepancy for larger χ2 is due to variable sources rather than non-Gaussian
error distributions, as we demonstrate below.
The second selection cut, χ2(g) > 3 and χ2(r) > 3 (hereafter written as χ2(g, r) > 3),
selects ∼ 90% of σ(g, r) > 0.05 mag sources, as shown in Figure 2 (middle panels). The
effectiveness of the χ2 test is demonstrated in the bottom panel of Figure 2. For magnitudes
fainter than g = 20.5, the fraction of candidate variables decreases as photometric errors
increase. The selection is relatively uniform for sources brighter than g = 20.5, and we
adopt this value as the flux limit for the selected variable sample.
There are 662,195 sources brighter than g = 20.5 in the full sample. Using σ(g, r) > 0.05
mag and χ2(g, r) > 3 as the selection criteria, we select 13,051 candidate variable sources6.
Therefore, at least 2% of unresolved optical sources brighter than g = 20.5 appear variable
at the > 0.05 mag level (rms) simultaneously in the g and r bands. The fraction of selected
variable sources is not a strong function of the minimum required number of observations,
but it does depend on the stellar density because the number of stars increases at lower
Galactic latitudes (see Fig. 5 in Ivezić et al. 2007) while the quasar count remains the same.
3.2. The Counts of Variable Sources
In this section we estimate the completeness and efficiency of the candidate variable
sample, and discuss the dependence of counts, rms scatter, σ(g)/σ(r) ratio, and the lightcurve
skewness γ(g) on the position in the g − r vs. u− g color-color diagram.
6A list of candidate variable sources and their data from Ivezić et al. (2007) are publicly available from
[HREF]http://www.sdss.org/dr5/products/value added/index.html
– 9 –
3.2.1. Completeness
The selection completeness, defined as the fraction of true variable sources recovered by
the algorithm, depends on the lightcurve shape and amplitudes. Due to a fairly large number
of observations (median of 10), and small σ(g, r) cutoff compared to typical amplitudes of
variable sources (e.g. most RR Lyrae stars and quasars have peak-to-peak amplitudes ∼ 1
mag), we expect the completeness to be fairly high for RR Lyrae stars (& 95%, see Section 4)
and quasars (∼ 90%, see Section 5). The completeness for other types of variable sources,
such as flares and eclipsing binaries, is hard to estimate, but is probably low due to sparse
sampling.
3.2.2. Efficiency
The selection efficiency, defined as the fraction of true variable sources in the candidate
variable sample, determines the robustness of the selection algorithm. The main diagnostic
for the robustness of the adopted selection criteria is the distribution of selected candidates
in the SDSS color-magnitude and color-color diagrams. The position of a source in these
diagrams is a good proxy for its spectral classification (Lenz et al. 1998; Fan 1999; Finlator
2000; Smolčić et al. 2004).
Figure 3 compares the distribution of candidate variable sources to that of all sources
in the g − r vs. u − g color-color diagram. Were the selection a random process, the se-
lected candidates would have the same distribution as the full sample. The distributions of
candidate variables and of the full sample are remarkably different, demonstrating that the
candidate variables are not randomly selected from the parent sample.
The three dominant classes of variable objects are quasars, RR Lyrae stars, and stars
from the main stellar locus. The most obvious difference between the variable and the full
sample distributions is a much higher fraction of low-redshift quasars (< 2.2, recognized by
their UV excess, u − g < 0.7, see Richards et al. 2002) and RR Lyrae stars (u − g ∼ 1.15,
g − r < 0.3, see Ivezić et al. 2005) in the variable sample, and vividly shown in the bottom
panel of Figure 3.
Another interesting feature visible in this panel is a gradient in the fraction of variable
main stellar locus stars (perpendicular to the main stellar locus). We investigate this gradient
by first defining principal colors
P1 = 0.91u− 0.495g − 0.415r − 1.28 (7)
– 10 –
s = −0.249u+ 0.794g − 0.555r + 0.234 (8)
where P1 and s are principal axis parallel and perpendicular to the main stellar locus,
respectively (Ivezić et al. 2004a). The s color is a measure of metallicity (Lenz et al. 1998),
and s > 0.05 stars are expected to be metal poor (Helmi et al. 2003). Sources with r < 19
and 0 < P1 < 0.9 are selected and binned in four s bins. For each bin we calculate the
fraction of source with σ(g) > 0.05 mag, the fraction of variable sources (selected with
σ(g, r) > 0.05 mag and χ2(g, r) > 3), median σ(g), and the total number of sources in the
bin (see Table 2). A greater fraction of variable sources in the last bin (s > 0.06) indicates
that, on average, metal-poor main stellar locus stars are more variable than the metal-rich
stars. This could be because this sample of metal-poor stars is expected to have a high
fraction of giants.
In order to quantify the differences between the full and the variable sample, we follow
Sesar et al. (2006) and divide the g − r vs. u− g color-color diagram into six characteristic
regions, each dominated by a particular type of source, as shown in Figure 4. The fractions
and counts of variable and all sources in each region are listed in Table 1 for g < 19, g < 20.5,
and g < 22 flux-limited samples. Notably, in the adopted g < 20.5 flux limit, the fraction
of Region II sources (dominated by low-redshift quasars) in the variable sample is 63%, or
∼ 30 times greater than the fraction of Region II sources in the full sample (∼ 2%). The
fraction of Region IV sources (which include RR Lyrae stars) in the variable sample is also
high when compared to the full sample (∼ 6 times higher).
As shown in Table 1, in the g = 20.5 flux-limited sample, we find that low-redshift
quasars and RR Lyrae stars (i.e. Regions II and IV) make 70% of the variable population,
while representing only 3% of all sources. Quasars alone account for 63% of the variable
population. Stars from the main stellar locus represent 95% of all sources and 25% of the
variable sample: about 0.5% of the stars from the locus are variable at the > 0.05 mag level.
3.3. The Properties of Variable Sources
Various lightcurve properties, such as shape and amplitude, are expected to be correlated
with stellar types. In this section we study the distribution of the rms scatter in the u and g
bands, and σ(g)/σ(r) ratio as a function of the u− g and g− r colors. To emphasize trends,
we bin sources and present median values for each bin.
The distribution of the median σ(u) and σ(g) values in the g − r vs. u − g color-color
diagram is shown in the top two panels of Figure 5. RR Lyrae stars show larger rms scatter
– 11 –
(& 0.3 mag) in the u and g bands, than low-redshift quasars or stars from the main stellar
locus. Quasars also show slightly larger rms scatter in the u band (∼ 0.1 mag) than in
the g band (∼ 0.07 mag), as discussed by Kinney et al. (1991),Ivezić et al. (2004b), and
Vanden Berk et al. (2004). If we define the degree of variability as the root-mean-square
scatter in the g band, then on average RR Lyrae stars show the greatest variability, followed
by quasars and the main stellar locus stars.
Another distinctive characteristic of variable sources is the ratio of flux changes in
different bandpasses. This property can be used to select different types of variable sources.
For example, RR Lyrae stars are bluer when brighter, a behavior used by Ivezić et al. (2000)
to select RR Lyrae using 2-epoch SDSS data. Here we define a new parameter, σ(g)/σ(r), to
express the ratio of flux changes in the g and r bands, and study its distribution in the g− r
vs. u − g color-color diagram. In particular, we examine this distribution and its median
values for three dominant classes of variable sources: quasars, RR Lyrae stars, and stars
from the main stellar locus.
The bottom left panel in Figure 5 shows the distribution of median σ(g)/σ(r) values as
a function of u− g and g − r colors. Using Fig. 5 we note that on average:
• RR Lyrae stars have σ(g)/σ(r) ∼ 1.4
• Main stellar locus stars have σ(g)/σ(r) ∼ 1, and
• Quasars show a σ(g)/σ(r) gradient in the g − r vs. u− g color-color diagram.
The average value of σ(g)/σ(r) ∼ 1.4 in Region IV indicates that RR Lyrae stars
dominate the variable source count in this region. The ratio of 1.4 for RR Lyrae stars was
also previously found by Ivezić et al. (2000). While Figure 5 only presents median values of
the rms scatter, Figure 6 shows how the rms scatter in the g and r bands correlates with the
u − g color for individual sources. Variable sources that follow the σ(g) = 1.4σ(r) relation
also correlate with the u− g color, and have u− g ∼ 1, as expected for RR Lyrae stars.
The average ratio of σ(g)/σ(r) ∼ 1 (i.e. gray flux variations) for stars in the main stellar
locus suggests that the variability could be caused by eclipsing systems. The distribution of
γ(g) for main stellar locus stars further strengthens this possibility, as discussed in Section 3.4
below.
The gradient in the σ(g)/σ(r) ratio observed for low-redshift quasars in the g−r vs. u−g
color-color diagram suggests that the variability correlation between the g and r bands is
more complex than in the case of RR Lyrae or main stellar locus stars. Wilhite et al. (2006)
– 12 –
show that the photometric color changes for quasars depend on the combined effects of con-
tinuum changes, emission-line changes, redshift, and the selection of photometric bandpasses.
They note that due to the lack of variability of the lines, measured photometric color is not
always bluer in brighter phases, but depends on redshift and the filters used. To verify the
dependence of broad-band photometric variability on redshift, we plot σ(g)/σ(r) vs. redshift
for all spectroscopically confirmed unresolved quasars from Schneider et al. (2005) which are
in Stripe 82, as shown in Figure 7. We confirm that the broad-band photometric variability
depends on the redshift, and that the σ(g)/σ(r) gradient in the g − r vs. u − g color-color
diagram can be explained by the increase in σ(g)/σ(r) from ∼ 1 to ∼ 1.6 in the 1.0 to 1.6
redshift range. This effect is due to the Mg II emission line (more stable in flux than the con-
tinuum) moving through the r band filter over this redshift range. The implied correlation
of the u−g and g− r colors with redshift is consistent with the discussion by Richards et al.
(2002). The lack of noticeable correlation of σ(g) with redshift is due to the combined effects
of the dependence of σ(g) on the rest-frame wavelength and time which cancel out (for a
detailed model see Ivezić et al. 2004b).
3.4. Skewness as a Proxy for Dominant Variability Mechanism
Lightcurve skewness, a measure of the lightcurve asymmetry, provides additional in-
formation on the type of variability. Negatively skewed, asymmetric lightcurves indicate
variable sources that spend more time fainter than (mmin+mmax)/2, where mmin and mmax
are magnitudes at the minimum and maximum. Type ab RR Lyrae stars, for example,
have negatively skewed lightcurves (γ ∼ −0.5, Wils, Lloyd & Bernhard 2006). Positively
skewed, asymmetric lightcurves indicate variable sources that spend more time brighter
than (mmin + mmax)/2 (e.g. eclipsing systems). Sources with symmetric lightcurves will
have γ ∼ 0.
The bottom right panel in Figure 5 shows the distribution of the median γ(g) as a
function of the position in the g − r vs. u − g color-color diagram. On average, quasars
and c type RR Lyrae stars (u − g ∼ 1.15, g − r < 0.15) have γ(g) ∼ 0, ab type RR Lyrae
(u − g ∼ 1.15, g − r > 0.15) have negative skewness (γ(g) ∼ −0.5), and stars in the main
stellar locus have positive skewness.
Figure 8 shows the distribution of the lightcurve skewness in the ugi bands for spec-
troscopically confirmed unresolved quasars from Schneider et al. (2005) which are in Stripe
82, candidate RR Lyrae stars (selection details are discussed in Section 4 below), and main
stellar locus stars from our variable sample. Stars in the main stellar locus show a bimodal
γ(g) distribution. This distribution suggests at least two, and perhaps more, different popu-
– 13 –
lations of variables. Indeed, when spectroscopically confirmed M dwarfs are selected, a third
peak appears at γ(g) −2.5, possibly associated with flaring M dwarfs (Kowalski et al. 2007).
The bimodality similar to the one in the g band is also discernible in the r band, while it is
less pronounced in the i band and not detected in the u and z bands (the r and z data are
not shown).
A comparison of the u− g and g − r color distributions for variable main stellar locus
stars brighter than g = 19 and a subset with highly asymmetric lightcurves (γ(g) > 2.5)
is shown in Figure 9. The subset with asymmetric lightcurves has an increased fraction of
stars with colors u− g ∼ 2.5 and g− r ∼ 1.4, that correspond to M stars. This may indicate
that M stars have a higher probability of being associated with an eclipsing companion than
stars with earlier spectral types. However, the selection effects are probably important since
a companion is easier to detect (due to the low luminosity of M dwarfs). Kowalski et al.
(2007) examine these issues using lightcurve data on a sample of spectroscopically confirmed
M dwarfs. Finally, quasars have symmetric lightcurves (γ ∼ 0) and their distribution of
skewness does not change between bands.
4. The Milky Way Halo Structure Traced by Candidate RR Lyrae Stars
Studies of substructures in the Galactic halo, such as clumps and streams, can constrain
the formation history of the Milky Way. One of the best tracers to study the outer halo
are RR Lyrae stars because they are nearly standard candles, are sufficiently bright to be
detected at large distances (5− 100 kpc for 14 < r < 20.7), and are sufficiently numerous to
trace the halo substructure with a high spatial resolution. The General Catalog of Variable
Stars (GCVS; Kholopov et al. 1988) lists7 RR Lyrae stars as RR Lyrae type ab (RRab)
and type c (RRc) stars. RRab stars have asymmetric lightcurves, periods from 0.3 to 1.2
days, and amplitudes from V ∼ 0.5 to V ∼ 2. RRc stars have nearly symmetric, sometimes
sinusoidal, lightcurves, with periods from 0.2 to 0.5 days, and amplitudes not greater than
V ∼ 0.8. In this work we assume MV = 0.7 as the absolute V band magnitude of RRab and
RRc stars. A comprehensive review of RR Lyrae stars can be found in Smith (1995).
In this section we fine tune criteria for selecting candidate RR Lyrae stars, and estimate
the selection completeness and efficiency. Using selected candidate RR Lyrae stars, we
recover a known halo clump associated with the Sgr dwarf tidal stream, and find several new
halo substructures.
7A list of GCVS variability types can be found at [HREF]http://www.sai.msu.su/groups/cluster/gcvs/gcvs/iii/vartype.txt
– 14 –
4.1. Criteria for Selecting RR Lyrae Stars
Figures 3, 4, and 5 show that RR Lyrae stars occupy a well-defined region (Region IV)
in the g− r vs. u−g color-color diagram, and Figure 6 shows how RR Lyrae stars follow the
σ(g) = 1.4σ(r) relation. Motivated by these results, we introduce color and σ(g)/σ(r) cuts
to specifically select candidate RR Lyrae stars from the variable sample, and study their
distribution in the rms scatter-color-lightcurve skewness parameter space.
RR Lyrae stars have distinctive colors and can be selected with the following criteria
(Ivezić et al. 2005):
0.98 < u− g < 1.30 (9)
− 0.05 < Dug < 0.35 (10)
0.06 < Dgr < 0.55 (11)
− 0.15 < r − i < 0.22 (12)
− 0.21 < i− z < 0.25 (13)
where
Dug = (u− g) + 0.67(g − r)− 1.07 (14)
Dgr = 0.45(u− g)− (g − r)− 0.12. (15)
We apply these cuts to our sample of candidate variables and select 846 sources. It is
implied by Ivezić et al. (2005) that RR Lyrae should always stay within these color bound-
aries, even though their colors change as a function of phase. Their distribution in the g− r
vs. u − g color-color diagram and rms scatter in the g band are shown in Figure 10 (top
left panel). The distribution of sources in the RR Lyrae region is inhomogeneous. Sources
with large rms scatter in the g band (& 0.2 mag) are centered around u− g ∼ 1.15, and are
separated by g − r ∼ 0.12 into two groups. A comparison with Figure 3 from Ivezić et al.
(2005) suggests that these large rms scatter sources might be RR Lyrae type ab (RRab,
g − r > 0.12) and type c stars (RRc, g − r < 0.12). Small rms scatter sources (. 0.1 mag)
have a fairly uniform distribution, and are slightly bluer with u− g . 1.1.
The distribution of sources from the RR Lyrae region in the σ(r) vs. σ(g) diagram is
presented in the top right panel of Figure 10. The majority of large rms scatter sources
follow the σ(g) = 1.4σ(r) relation, as expected for RR Lyrae stars. Since RR Lyrae stars
are bluer when brighter, or equivalently, have greater rms scatter in the g band than in the
r band, we require 1 < σ(g)/σ(r) 6 2.5 and select 683 candidate RR Lyrae stars.
– 15 –
A comparison of u − g color distributions for candidate RR Lyrae stars and of sources
with RR Lyrae colors, but not tagged as RR Lyrae stars, presented in the bottom left panel
of Figure 10, demonstrates the robustness of the RR Lyrae selection. The two distributions
are very different (the probability that they are the same is 10−4, as given by the KS test),
with the candidate RR Lyrae distribution peaking at u−g ∼ 1.15, as expected for RR Lyrae
stars.
One property that distinguishes RRab from RRc stars is the shape (or skewness) of their
lightcurves (in addition to lightcurve amplitude and period). RRab stars have asymmetric
lightcurves, while RRc lightcurves are symmetric. In the top left panel of Figure 10, we
noted that g − r ∼ 0.12 seemingly separates high rms scatter sources into two groups. If
g − r ∼ 0.12 is the boundary between the RRab and RRc stars, then the same boundary
should show up in the distribution of lightcurve skewness as a function of the g − r color.
As shown in Figure 10 (bottom left panel), this is indeed the case. On average, sources with
g− r < 0.12 have γ(g) ∼ 0 (symmetric lightcurves), as RRc stars, while g− r > 0.12 sources
have γ(g) ∼ −0.5 (asymmetric lightcurves) typical of RRab stars.
We show in Section 4.2 that candidate RR Lyrae stars with γ(g) > 1 are contaminated
by eclipsing variables. Therefore, to reduce the contamination by eclipsing variables, we also
require γ(g) 6 1, and select 634 sources as our final sample of candidate RR Lyrae stars.
4.2. Completeness and efficiency
The selection completeness, defined as the fraction of recovered RR Lyrae stars, will
depend on the color cuts, σ(g, r) cutoff, and the number of observations. The color cuts
(Eqs. 9 to 15) applied in Section 4.1 were chosen to minimize contamination by sources
other than RR Lyrae stars while maintaining an almost 100% completeness (Ivezić et al.
2005). With the σ(g, r) cutoff at 0.05 mag (small compared to the ∼ 1 mag typical peak-
to-peak amplitudes of RR Lyrae stars), and a fairly large number of observations per source
(median of 10), we estimate the RR Lyrae selection completeness to be & 95% (see Appendix
in Ivezić et al. 2000).
To determine the selection efficiency, defined as the fraction of true RR Lyrae stars in the
RR Lyrae candidate sample, we positionally match 683 candidate RR Lyrae stars selected
by 1 < σ(g)/σ(r) 6 2.5 to a sample of RR Lyrae sources selected from the SDSS Light-
Motion-Curve Catalog (LMCC; Bramich et al. 2007). This catalog covers the same region of
the sky as the one discussed here, but includes more recent SDSS-II observations that allow
the construction of lightcurves. We match 613 candidates, while 70 candidate RR Lyrae
– 16 –
stars from our sample, for some reason, do not have a match in the LMCC (De Lee, private
communication). Following the classification based on phased lightcurves by De Lee et al.
(2007), we find that 71% of sources in our candidate RR Lyrae sample are classified as RRab
and RRc, 28% are classified as variable non-RR Lyrae stars, and only 1% are spurious, non-
variable sources. The most significant contamination comes from a population of variable
sources bluer than u−g ∼ 1.1 (dotted line, bottom left panel Figure 11), possibly Population
II δ Scuti stars, also known as SX Phoenicis stars (Hoffmeister, Richter & Wenzel 1985).
The top left and the bottom right panels in Figure 11, show that RRab and RRc-
dominated regions are separated by g−r ∼ 0.12, as already hinted in Figure 10. Also, variable
non-RR Lyrae sources with γ(g) > 1 are classified by De Lee et al. (2007) as eclipsing
variables, justifying our γ(g) 6 1 cut.
To summarize, using color criteria and criteria based on σ(g), σ(r), and γ(g) RR Lyrae
stars are selected with & 95% completeness and ∼ 70% efficiency.
4.3. The Spatial Distribution of Candidate RR Lyrae Stars
Using the selection criteria from Section 4.1 we isolate 634 RR Lyrae candidates. The
magnitude-position diagram for these candidates within 2.5◦ from the Celestial Equator is
shown in Figure 12.
As discussed by Ivezić et al. (2005), an advantage of the data representation utilized
in Figure 12 (magnitude–right ascension diagram) is its simplicity – only “raw” data are
shown, without any post-processing. However, the magnitude scale is logarithmic and thus
the spatial extent of structures is heavily distorted. In order to avoid these shortcomings,
we have applied a Bayesian method for estimating continuous spatial density distribution
developed by Ivezić et al. (2005) (see their Appendix B). The resulting density map is shown
in the right panel in Figure 13. The advantage of that representation is that it better conveys
the significance of various local overdensities. For comparison, we also show a map of the
northern part of the equatorial strip constructed using 2-epoch data discussed by Ivezić et al.
(2000).
We detect several new halo substructures at & 3σ significance (compared to expected
Poissonian fluctuations) and present their approximate locations and properties in Table 3.
The most distant clump is at 100 kpc from the Galactic center. The strongest clump in
the left wedge belongs to the Sgr dwarf tidal stream as does the clump marked by C in
the right wedge (Ivezić et al. 2003a). We note that the apparent “clumpiness” of the candi-
date RR Lyrae distribution increases with increasing radius, similar to CDM predictions by
– 17 –
Bullock, Kravtsov & Weinberg (2001). A detailed comparison of their models with the data
presented here will be discussed elsewhere (Sesar et al., in prep).
5. Are All Quasars Variable?
The optical continuum variability of quasars has been recognized since their first op-
tical identification (Matthews & Sandage 1963), and it has been proposed and utilized as
an efficient method for their discovery (van den Bergh, Herbst & Pritchet 1973; Hawkins
1983; Koo, Kron & Cudworth 1986; Hawkins & Veron 1995). The observed characteristics
of the variability of quasars are frequently used to constrain the origin of their emission
(e.g. Kawaguchi et al. 1998 and references therein; Martini & Schneider 2003; Pereyra et al.
2006). Recently, significant progress in the description of quasar variability has been made by
employing the SDSS data (de Vries, Becker & White 2003; Ivezić et al. 2004b; Vanden Berk et al.
2004; de Vries et al. 2005; Sesar et al. 2006). Here we expand these studies by quantifying
the efficiency of quasar discovery using variability.
A preliminary comparison of color and variability based methods for selecting quasars
using SDSS data was presented by Ivezić et al. (2003b). They found that 47% of spectroscop-
ically confirmed unresolved quasars with UV excess have the g band magnitude difference
between two observations obtained two years apart larger than 0.15 mag. We can improve on
their analysis because now there are significantly more observations obtained over a longer
time period. Since quasars vary erratically and the rms scatter of their variability (the so-
called structure function) increases with time (e.g. Vanden Berk et al. 2004 and references
therein), the variability selection completeness is expected to be higher than ∼ 50% obtained
by Ivezić et al. (2003b).
First, although the adopted variability selection criteria discussed above are fairly con-
servative, we find that at least 63% of low-redshift quasars are variable at the > 0.05 mag
level (simultaneously in the g and r bands over observer’s time scales of several years) in
the g < 20.5 flux-limited sample. Second, even this estimate is only a lower limit: given the
spectroscopic confirmation for a large flux-limited sample of quasars, it is possible to relax
the adopted variability selection cutoff without a prohibitive contamination by non-variable
sources.
There are 2,492 unresolved quasars in the catalog of spectroscopically confirmed SDSS
quasars (Schneider et al. 2005) from Stripe 82. The fraction of these objects that vary more
than σ in the g and r bands, as a function of σ, is shown in Figure 14. We also show the
analogous fraction for stars from the stellar locus. About 93% of quasars vary with σ > 0.03
– 18 –
mag. For a small fraction of these objects the measured rms scatter is due to photometric
noise, and the stellar data limit this fraction to be at most 3%. Conservatively assuming that
none of these 3% of stars is intrinsically variable, we estimate that at least 90% of quasars
are variable at the 0.03 mag level on time scales up to several years.
6. Implications for Surveys such as LSST
The Large Synoptic Survey Telescope (LSST) is a proposed imaging survey that aims
to obtain repeated multi-band imaging to faint limiting magnitudes over a large fraction of
the sky. The LSST Science Requirement Document8 calls for ∼ 1000 repeated observations
of a solid angle of ∼ 20, 000 deg2 distributed over the six ugrizY photometric bandpasses
and over 10 years. The results presented here can be extrapolated to estimate the lower
limit on the number of variable sources that the LSST would discover.
The single-epoch LSST images will have a 5σ detection limit9 at r ∼ 24.7. Hence, 2%
accurate photometry, comparable to the subsample with g < 20.5 discussed here, will be
available for stars with r . 22. The USNO-B catalog (Monet et al. 2003) shows that there
are about 109 stars with r < 21 across the entire sky. About half of these stars are in the
parts of the sky to be surveyed by the LSST. The simulations based on contemporary Milky
Way models, such as those developed by Robin et al. (2003) and Jurić et al. (2007), predict
that there are about twice as many stars with r < 22 than with r < 21 across the whole
sky. Hence, it is expected that the LSST will detect about a billion stars with r < 22. This
estimate is uncertain to within a factor of two or so due to unknown details in the spatial
distribution of dust in the Galactic plane and towards the Galactic center.
We found that at least 0.5% of stars from the main stellar locus can be detected as
variable with photometry accurate to ∼ 2%. This is only a lower limit because a much
larger number of LSST observations obtained over a longer timespan than the SDSS data
discussed here would increase this fraction. Hence, our results imply that the LSST will
discover at least 50 million variable stars (without accounting for the fact that stellar counts
greatly increase closer to the Galactic plane). Unlike the SDSS sample, where RR Lyrae
stars account for ∼ 25% of all variable stars, the number of RR Lyrae stars in the LSST
sample will be negligible compared to other types of variable stars.
As estimated by Jurić et al. (2007) using deeper coadded SDSS photometry, there are
8 Available at [HREF]http://www.lsst.org/Science/lsst baseline.shtml
9An LSST Exposure Time Calculator is available at [HREF]www.lsst.org
– 19 –
about 100 deg−2 low-redshift quasars with r < 22 (see also Beck-Winchatz & Anderson 2007
and references therein). Therefore, with a sky coverage of ∼ 20, 000 deg2, the LSST will
obtain well-sampled accurate multi-color lightcurves for ∼ 2 million low-redshift quasars.
Even at the redshift limit of ∼ 2, this sample will be complete to Mr ∼ −24, that is,
almost to the formal quasar luminosity cutoff, and will represent an unprecedented sample
for studying quasar physics.
7. Conclusions and Discussion
We have designed and tested algorithms for selecting candidate variable sources from
a catalog based on multiple SDSS imaging observations. Using a sample of 13,051 selected
candidate variable sources in the adopted g < 20.5 flux-limited sample, we find that at least
2% of unresolved optical sources appear variable at the > 0.05 mag level, simultaneously
in the g and r bands. A similar fraction of variable sources (∼ 1%) was also found by
Sesar et al. (2006) using recalibrated photometric POSS and SDSS measurements, and by
Morales-Rueda et al. (2006) using the Faint Sky Variability Survey data (∼ 1%).
Thanks to the multi-color nature of the SDSS photometry, and especially to the u band
data, we can obtain robust classification of selected variable sources. The majority (2/3)
of variable sources are low-redshift (< 2) quasars, although they represent only 2% of all
sources in the adopted g < 20.5 flux-limited sample. We find that about 1/4 of variable stars
are RR Lyrae stars, and that only 0.5% of stars from the main stellar locus are variable at
the 0.05 mag level.
The distribution of γ(g) for main stellar locus stars is bimodal, suggesting at least two,
and perhaps more, different populations of variables. About a third of variable stars from
the stellar locus show gray flux variations in the g and r bands (σ(g)/σ(r) ∼ 1), and positive
lightcurve skewness, suggesting variability caused by eclipsing systems. This population has
an increased fraction of M type stars.
RR Lyrae stars show the largest rms scatter in the u and g bands, followed by low-
redshift quasars. The ratio of rms scatter in the g and r bands for RR Lyrae is ∼ 1.4, in agree-
ment with Ivezić et al. (2000) results based on 2-epoch photometry. The mean lightcurve
skewness for RR Lyrae stars is ∼ −0.5, in agreement with Wils, Lloyd & Bernhard (2006).
We selected a sample of 634 candidate RR Lyrae stars, with an estimated & 95% complete-
ness and ∼ 70% efficiency. Using these stars, we detected rich halo substructure out to
distances of 100 kpc. The apparent “clumpiness” of the candidate RR Lyrae distribution in-
creases with increasing radius, similar to CDM predictions by Bullock, Kravtsov & Weinberg
– 20 –
(2001).
Low-redshift quasars show a dependence of σ(g)/σ(r) on redshift, consistent with dis-
cussions in Richards et al. (2002) and Wilhite et al. (2006). The lightcurve skewness dis-
tribution for quasars is centered on zero in all photometric bands. We find that at least
90% of quasars are variable at the 0.03 mag level (rms) on time scales up to several years.
This confirms that variability is as a good a method for finding low-redshift quasars at high
(|b| > 30) Galactic latitudes as is the UV excess color selection. The fraction of variable
quasars at the > 0.1 mag level obtained here (30%, see Figure 14) is comparable to 36%
found by Rengstorf, Brunner & Wilhite (2006).
The multiple photometric observations obtained by the SDSS represent an excellent
dataset for estimating the impact of surveys such as the LSST on studies of the variable
sky. Our results indicate that the LSST will obtain well-sampled 2% accurate multi-color
lightcurves for ∼ 2 million low-redshift quasars, and will discover at least 50 million variable
stars. The number of variable stars discovered by the LSST will be of the same order as the
number of all stars detected by the SDSS. With about 1000 data points in six photometric
bands, it will be possible to recognize and classify variable objects using lightcurve mo-
ments of higher order than skewness discussed here, including lightcurve folding for periodic
variables.
We acknowledge support by NSF grant AST-0551161 to the LSST for design and de-
velopment activity.
Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation,
the Participating Institutions, the National Science Foundation, the U.S. Department of
Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho,
the Max Planck Society, and the Higher Education Funding Council for England. The SDSS
Web Site is http://www.sdss.org/.
The SDSS is managed by the Astrophysical Research Consortium for the Participating
Institutions. The Participating Institutions are the American Museum of Natural History,
Astrophysical Institute Potsdam, University of Basel, University of Cambridge, Case Western
Reserve University, University of Chicago, Drexel University, Fermilab, the Institute for Ad-
vanced Study, the Japan Participation Group, Johns Hopkins University, the Joint Institute
for Nuclear Astrophysics, the Kavli Institute for Particle Astrophysics and Cosmology, the
Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los Alamos National
Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for
Astrophysics (MPA), New Mexico State University, Ohio State University, University of
Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Obser-
http://www.sdss.org/
– 21 –
vatory, and the University of Washington.
REFERENCES
Adelman-McCarthy, J. K., Agüeros, M. A., Allam, S. S. et al., 2007, submitted
Akerlof, C., Amrose, S., Balsano, R. et al., 2000, AJ, 119, 1901
Alcock, C., Allsman, R. A., Alves, D. R. et al., 2001, ApJS, 136, 439
Astronomy and Astrophysics Survey Committee, Board on Physics and Astronomy, Space
Studies Board, National Research Council, 2001, Astronomy and Astrophysics in the
New Millennium (The National Academies Press)
Becker, A. C., Wittman, D. M., Boeshaar, P. C. et al., 2004, ApJ, 611, 418
Beck-Winchatz, B., & Anderson, S. F., 2007, MNRAS, 374, 1506
Bramich, D. et al., 2007, in prep
Bullock, J. S., Kravtsov, A. V., & Weinberg D. H., 2001, ApJ, 548, 33
De Lee, N. et al., 2007, in prep
de Vries, W. H., Becker, R. H., & White, R. L., 2003, AJ, 126, 1217
de Vries, W. H., Becker, R. H., White, R. L., & Loomis, C., 2005, AJ, 129, 615
Fan, X. 1999, AJ, 117, 2528
Finlator, K., Ivezić, Ž., Fan, X. et al., 2000, AJ, 120, 2615
Frieman, J. A. et al., 2007, in prep
Fukugita, M., Ichikawa, T., Gunn, J. E. et al., 1996, AJ, 111, 1748
Groot, P. J., Vreeswijk, P. M., Huber, M. E. et al., 2003, MNRAS, 339, 427
Gunn, J. E., Carr, M., Rockosi, C. et al., 1998, AJ, 116, 3040
Gunn, J. E., Siegmund, W. A., Mannery, E. J. et al., 2006, AJ, 131, 2332
Hawkins, M. R. S., 1983, MNRAS, 202, 571
Hawkins, M. R. S., & Veron, P., 1995, MNRAS, 275, 1102
– 22 –
Hawley, S. L., Covey, K. R., Knapp, G. R. et al., 2002, AJ, 123, 3409
Helmi, A., Ivezić, Ž., Prada, F. et al., 2003, ApJ, 586, 195
Hoffmeister, C., Richter, G., & Wenzel, W., 1985, Variable Stars (New York:Springer)
Hogg, D. W., Finkbeiner, D. P., Schlegel, D. J. & Gunn, J.E., 2002, AJ, 122, 2129
Ivezić, Ž., Goldston, J., Finlator, K. et al., 2000, AJ, 120, 963
Ivezić, Ž., Lupton, R. H., Anderson, S. et al., 2003a, Proceedings of the Workshop Variability
with Wide Field Imagers, Mem. Soc. Astron. Italiana, 74, 978 (also astro-ph/0301400)
Ivezić, Ž., Lupton, R. H., Johnston, D. E. et al., 2003b, Proceedings of “AGN Physics with
the SDSS”, Richards, G.T. & Hall, P.B., eds., in press (also astro-ph/0310566)
Ivezić, Ž., Lupton, R. H., Schlegel, D. et al., 2004a, Astronomische Nachrichten, 325, No.
6-8, 583-589 (also astro-ph/0410195)
Ivezić, Ž., Lupton, R. H., Jurić, M. et al., 2004b, Proceedings of “The Interplay among Black
Holes, Stars, and ISM in Galactic Nuclei”, IAU Symposium No. 222, Bergmann, Th.
S., Ho, L. C., & Schmitt, H. R., eds., p. 525 (also astro-ph/0404487)
Ivezić, Ž., Vivas, A. K., Lupton, R. H. & Zinn, R., 2005, AJ, 129, 1096
Ivezić, Ž., Smith, J. A., Miknaitis, G. et al., 2007, submitted to AJ (also astro-ph/0703157)
Jurić, M., Ivezić, Ž., Brooks, A. et al., 2007, submitted to AJ (also astro-ph/0510520)
Kawaguchi, T., Mineshige, S., Umemura, M., & Turner, E. L., 1998, ApJ, 504, 671
Kholopov, P. N., Samus, N. N., Frolov, M. S. et al., 1988, General Catalog of Variable Stars,
4th Ed. (Moscow:Nauka)
Kinney, A. L., Bohlin, R. C., Blades, J. C., & York, D. G., 1991, ApJS, 75, 645
Koo, D. C., Kron, R. G., & Cudworth, K. M., 1986, PASP, 98, 285
Kowalski, A. et al, 2007, in prep
Lenz, D. D., Newberg, J., Rosner, R., Richards, G. T., & Stoughton, C., 1998, ApJS, 119,
Lupton, R. H., Ivezić, Ž., Gunn, J. E., Knapp, G. R., Strauss, M. A. & Yasuda, N., 2002, in
“Survey and Other Telescope Technologies and Discoveries”, Tyson, J. A. & Wolff,
S., eds. Proc. SPIE, 4836, 350
http://arxiv.org/abs/astro-ph/0301400
http://arxiv.org/abs/astro-ph/0310566
http://arxiv.org/abs/astro-ph/0410195
http://arxiv.org/abs/astro-ph/0404487
http://arxiv.org/abs/astro-ph/0703157
http://arxiv.org/abs/astro-ph/0510520
– 23 –
Martini, P., & Schneider, D. P., 2003, ApJ, 597, L109
Matthews, T. A., & Sandage, A. R., 1963, ApJ, 138, 30
McMillan, J. D., & Herbst, W., 1991, AJ, 101, 1788
Monet, D. G., Levine, S. E., Canzian, B. et al., 2003, AJ, 125, 984
Morales-Rueda, L., Groot, P. J., Augusteijn, T. et al., 2006, MNRAS, 371, 1681
Pier, J. R., Munn, J. A., Hindsley, R. B., Hennesy, G. S., Kent, S. M., Lupton, R. H. &
Ivezić, Ž., 2003, AJ, 125, 1559
Pereyra, N. A., Vanden Berk, D. E., Turnshek, D. A., 2006, ApJ, 642, 87
Richards, G. T., Fan, X., Newberg, H. J. et al., 2002, AJ, 123, 2945
Rengstorf, A. W., Mufson, S. L., Andrews, P. et al., 2004, AJ, 617, 184
Rengstorf, A. W., Brunner, R. J., & Wilhite, B. C., 2006, AJ, 131, 1923
Robin, A. C., Reylé, C., Derriére, S., & Picaud, S., 2003, A&A, 409, 523
Schlegel, D., Finkbeiner, D. P., & Davis, M., 1998, ApJ500, 525
Schneider, D. P., Hall, P. B., Richards, G. T. et al, 2005, AJ, 367
Scranton, R., Scranton, R., Johnston, D. et al., 2002, ApJ, 579, 48
Sesar, B., Svilković, D., Ivezić, Ž. et al., 2006, AJ, 131, 2801
Sirko, E., Goodman, J., Knapp, G. et al., 2004, AJ, 127, 899S
Smith, H. A., 1995, RR Lyrae Stars (Cambridge University Press)
Smith, J. A., Tucker, D. L., Kent, S. M. et al., 2002, AJ, 123, 2121
Smolčić, V., Ivezić, Ž., Knapp, G. R. et al., 2004, ApJ, 615L, 141
Strateva, I., Ivezić, Ž., Knapp, G. R. et al., 2001, 122, 1861
Tonry, J. L., Howell, S. B., Everett, M. E. et al., 2005, PASP, 117, 281
Stoughton, C., Lupton, R. H., Bernardi, M. et al., 2002, AJ, 123, 485
Trevese, D., Kron, R. G., & Bunone, A., 2001, ApJ, 551, 103
– 24 –
Tucker, D., Kent, S., Richmond, M. W. et al., 2006, Astronomische Nachrichten, 327, 821
Tyson, J.A., 2002, in “Survey and Other Telescope Technologies and Discoveries”, Tyson, J.
A. & Wolff, S., eds. Proc. SPIE, 4836, 10
Udalski, A., Zebrun, K., Szymanski, M. et al., 2002, Acta Astron., 52, 115
van den Bergh, S., Herbst, E., & Pritchet, C. 1973, AJ, 78, 375
Vanden Berk, D. E., Wilhite, B. C., Kron, R. G. et al., 2004, ApJ, 601, 692
Vivas, A. K., Zinn, R., Andrews, P. et al., 2001, ApJ, 554, L33
Walker, A.R., 2003, Proceedings of the Workshop Variability with Wide Field Imagers,
Mem. Soc. Astron. Italiana, 74, 999 (also astro-ph/0303012)
Wilhite, B. C., Vanden Berk, D. E., Kron, R. G. et al., 2006, ApJ, 633, 638
Wils, P., Lloyd, C., & Bernhard, K., 2006, MNRAS, 368, 1757
Woźniak, P. R., Udalski, A., Szymanski, M. et al., 2002, Acta Astron., 52, 129
Woźniak, P. R. et al., 2004, AJ127,2436
Yanny, B., Newberg, H. J., Kent, S. et al., 2000, ApJ, 540, 825
York, D. G., Adelman, J., Anderson, S. et al., 2000, AJ, 120, 1579
Żebruń, K., Soszyński, I., Woźniak, P. R. et al., 2002, Acta Astron., 51, 317
This preprint was prepared with the AAS LATEX macros v5.2.
http://arxiv.org/abs/astro-ph/0303012
Table 1. The distribution of candidate variable sources in the g − r vs u− g diagram
g < 19 g < 20.5 g < 22
Regiona Nameb % allc % vard var/alle Nvar/N
% allc % vard var/alle Nvar/N
% allc % vard var/alle Nvar/N
I white dwarfs 0.14 0.59 4.25 3.50 0.24 0.40 1.69 3.34 0.28 0.45 1.64 4.51
II low-redshift QSOs 0.45 30.88 68.83 56.58 1.90 62.90 33.03 65.10 4.07 70.01 17.22 47.30
III dM/WD pairs 0.08 0.53 6.54 5.37 0.83 2.08 2.50 4.92 1.21 3.79 3.13 8.61
IV RR Lyrae stars 1.28 16.81 13.11 10.78 1.33 7.95 5.99 11.81 1.48 6.41 4.33 11.90
V stellar locus stars 96.27 48.77 0.51 0.42 94.49 25.15 0.27 0.52 91.89 18.33 0.20 0.55
VI high-redshift QSOs 1.78 2.42 1.36 1.12 1.21 1.52 1.26 2.48 1.07 1.01 0.95 2.60
total count 411,667 3,384 662,195 13,051 748,067 20,553
aThese regions are defined in the g − r vs. u− g color-color diagram, with their boundaries shown in Fig. 4
bAn approximate description of the dominant source type found in the region
cThe fraction of all sources in a magnitude-limited sample found in this color region, with the magnitude limits listed on top.
dThe number of candidate variables from the region, expressed as a fraction of all variable sources
eThe ratio of values listed in columns d) and c)
fThe number of candidate variables from the region, expressed as a fraction of all sources in that region
Table 2. The fraction of variable main stellar locus stars as a function of the s color
Bin % σ(g) > 0.05a % varb 〈σ(g)〉c Countsd
s < −0.02 3.23 0.36 0.017 46,836
−0.02 < s < 0.02 2.92 0.28 0.017 136,910
0.02 < s < 0.06 4.61 1.18 0.019 29,106
s > 0.06 11.50 4.10 0.027 4,547
aFraction of sources with σ(g) > 0.05 mag
bFraction of variable sources (selected using σ(g, r) > 0.05 mag and
χ2(g, r) > 3)
cMedian σ(g)
dNumber of sources in the bin
Table 3. Approximate locations and properties of detected overdensities
Labela Nb 〈R.A.〉c 〈d〉d 〈r〉e 〈u− g〉f 〈g − rg〉 〈γ(g)〉h Nb/N
A 84 330.95 21 17.02 1.14 0.18 -0.50 0.62
B 144 309.47 22 16.76 1.12 0.16 -0.57 0.64
C 54 33.69 25 17.61 1.13 0.20 -0.68 0.29
D 8 347.91 29 18.02 1.14 0.23 -0.50 0.38
E 11 314.06 43 18.84 1.09 0.20 -0.41 0.75
F 11 330.26 48 19.16 1.07 0.20 -0.46 0.38
G 10 354.81 55 19.46 1.10 0.22 -0.69 0.38
H 7 43.57 57 19.32 1.05 0.04 0.06 1.34
I 4 311.34 72 19.98 1.08 0.11 -0.10 2.0
J 26 353.58 81 20.21 1.11 0.20 -0.27 0.58
K 8 28.39 84 20.35 1.10 0.20 0.14 0.44
L 3 339.01 92 20.45 1.06 0.16 0.08 0.67
M 5 39.45 102 20.73 1.07 0.11 0.36 1.67
aOverdensity’s label from Fig. 13
bNumber of candidate RR Lyrae in the overdensity
cMedian Right Ascension
dMedian distance (in kpc)
eMedian r band magnitude
fMedian u− g color
gMedian g − r color
hMedian γ(g)
iThe number ratio of candidate RR Lyrae with g − r < 0.12 and g − r > 0.12
– 28 –
Fig. 1.— The dependence of the median root-mean-square (rms) scatter Σ in SDSS ugrz
bands on magnitude (symbols). The vertical bars show the rms scatter of Σ in each bin
(not the error of the median). The dependence of Σ in the i band is similar to the r band
dependence. In each band, a fourth-degree polynomial is fitted through medians and shown
by the solid line.
– 29 –
Fig. 2.— Top: The cumulative distribution of χ2 g and r values for all sources (solid line)
and a reference Gaussian χ2 distribution with 9 degrees of freedom (dashed line). Vertical
dashed lines show adopted selection cuts on χ2(g) and χ2(r) values. Middle: The fraction
of σ(g, r) > 0.05 mag sources with χ2 per degree of freedom greater than χ2 (only in the
g or r band: solid line, both in the g and r bands: dashed line). Bottom: The fraction of
σ(g, r) > 0.05 mag sources with χ2(m) > 2 (dashed line) or χ2(m) > 3 (solid line) as a
function of magnitude for m = g, r bands, respectively.
– 30 –
Fig. 3.— The distribution of counts for the full sample (top), candidate variable sample
(middle), and the ratio of two counts (bottom) in the g− r vs. u− g color-color diagram for
sources brighter than g = 20.5, binned in 0.05 mag bins. Contours outline distributions of
unbinned counts. Note the remarkable difference between the distribution of all sources and
that of the variable sample, which demonstrates that the latter are robustly selected.
– 31 –
Fig. 4.— The distribution of 18,329 candidate variable sources brighter than g = 21 in
representative SDSS color-magnitude and color-color diagrams. Candidate variables are
color-coded by their rms scatter in the g band (0.05-0.2, see the legend, red if larger or equal
than 0.2). Only sources brighter than g = 20 are plotted in color-color diagrams. Note
how RR Lyrae stars (u − g ∼ 1.15, red dots, σ(g) & 0.2 mag) and low-redshift quasars
(u− g 6 0.7, green dots, σ(g) & 0.1 mag) stand out as highly variable sources. The regions
marked in the top right panel are used for quantitative comparison of the overall and variable
source distributions (see Table 1).
– 32 –
Fig. 5.— The distribution of the rms scatter σ(u) (top left), rms scatter σ(g) (top right),
σ(g)/σ(r) ratio (bottom left), and γ(g) (bottom right) for the variable sample in the g − r
vs. u − g color-color diagram. Sources are binned in 0.05 mag wide bins and the median
values are color-coded. Color ranges are given at the top of each panel, going from blue to
red, where the green color is in the mid-range. Values outside the range saturate in blue or
red. Contours outline the count distributions on a linear scale in steps of 15%. Flux limit is
g < 20.5, with an additional u < 20.5 limit in the top left panel. Bottom left: On average,
RR Lyrae stars have σ(g)/σ(r) ∼ 1.4, main stellar locus stars have σ(g)/σ(r) ∼ 1, while
low-redshift quasars show a gradient of σ(g)/σ(r) values. Bottom right: On average, quasars
and c type RR Lyrae stars (u − g ∼ 1.15, g − r < 0.15) have γ(g) ∼ 0, ab type RR Lyrae
(u− g ∼ 1.15, g− r > 0.15) have negative skewness, and stars in the main stellar locus have
positive skewness.
– 33 –
Fig. 6.— The distribution of candidate variable sources in the g < 20.5 flux-limited sample
is shown by linearly-spaced contours, and by symbols color-coded by the u − g color for
sources with σ(g) > 0.05 mag and σ(r) > 0.05 mag. The dotted lines show the adopted
σ(g, r) selection cut. The thick solid line shows σ(g) = σ(r), while the dashed line shows
σ(g) = 1.4σ(r) relation representative of RR Lyrae stars. Note that sources following the
σ(g) = 1.4σ(r) relation tend to have u−g ∼ 1, as expected for RR Lyrae stars. The greyscale
background shows the fraction of χ2(g, r) > 3 sources which also have σ(g) > x and σ(r) > y
and demonstrates that large χ2 sources also have large σ.
– 34 –
0 0.5 1 1.5 2 2.5 3 3.5
0 0.5 1 1.5 2 2.5 3 3.5
0 0.5 1 1.5 2 2.5 3 3.5
Fig. 7.— The dependence of σ(g)/σ(r) (top), g − r (middle), and σ(g) on redshift for
a sample of spectroscopically confirmed unresolved quasars from Schneider et al. (2005).
The σ(g)/σ(r) gradient shown in Fig. 5 (bottom left panel) can be explained by the local
maximum of σ(g)/σ(r) in the 1.0 to 1.6 redshift range.
– 35 –
Fig. 8.— The lightcurve skewness distribution in the ugi bands for spectroscopically con-
firmed unresolved quasars (dotted line), candidate RR Lyrae stars (dashed line), and variable
main stellar locus stars (solid line, Region V, see Fig. 4 for the definition). The distribution
of the skewness in the r band is similar to the g band distribution, and the distribution of
skewness in the z band is similar to the u band distribution (therefore the r and z data
are not shown). Stars in the main stellar locus show bimodality in the γ(g) suggesting at
least two, and perhaps more, different populations of variables. Similar bimodality is also
discernible in the r band, while it is less pronounced in the i band and not detected in the u
and z bands. Quasars have symmetric lightcurves (γ ∼ 0) and their distribution of skewness
does not change between bands.
– 36 –
0.5 1 1.5 2 2.5 3 3.5
0.5 1 1.5 2
Fig. 9.— A comparison of the u−g (left) and g−r (right) color distributions for variable main
stellar locus stars brighter than g = 19 (dashed lines), and a subset with highly asymmetric
lightcurves (γ(g) > 2.5, solid lines). The subset with highly asymmetric lightcurves has an
increased fraction of stars with colors u− g ∼ 2.5 and g− r ∼ 1.5, characteristic of M stars.
– 37 –
0.9 1 1.1 1.2 1.3
0 0.1 0.2 0.3 0.4 0.5
0.9 1 1.1 1.2 1.3
-0.1 0 0.1 0.2 0.3
Fig. 10.— Top left: The distribution of 846 candidate variable sources from the RR Lyrae
region (dashed lines, see Fig. 3 in Ivezić et al. 2005) in the g−r vs. u−g color-color diagram.
The symbols mark the time-averaged values and are color-coded by σ(g) (0.05 to 0.2, blue to
red). The dotted horizontal line shows the boundary between the RRab and RRc-dominated
regions. Top right: Sources from the top left panel divided into 3 groups according to their
σ(g)/σ(r) values: candidate RR Lyrae stars with 1 < σ(g)/σ(r) 6 2.5 (large dots), sources
with σ(g)/σ(r) 6 1 (triangles), and sources with σ(g)/σ(r) > 2.5 (squares). Small dots show
sources with RR Lyrae colors that fail the variability criteria. The dashed lines show the
σ(g) = σ(r) and σ(g) = 2.5σ(r) relations, while the dotted line shows the σ(g) = 1.4σ(r)
relation. Bottom left: A comparison of the u− g color distributions for candidate RR Lyrae
stars (solid line) and sources with RR Lyrae colors but not tagged as RR Lyrae stars (dashed
line). Bottom right: The dependence of γ(g) on the g−r color for candidate RR Lyrae stars.
The boundary g − r = 0.12 (vertical dotted line) separates candidate RR Lyrae stars into
those with asymmetric (γ(g) ∼ −0.5) and symmetric (γ(g) ∼ 0) lightcurves, corresponding
to RRab and RRc stars, respectively. The condition γ(g) 6 1 (horizontal dashed line) is
used to reduce the contamination of the RR Lyrae sample by eclipsing variables.
– 38 –
0.9 1 1.1 1.2 1.3
0 0.1 0.2 0.3 0.4 0.5
0.9 1 1.1 1.2 1.3
-0.1 0 0.1 0.2 0.3
Fig. 11.— The distribution of candidate RR Lyrae stars selected with 1 < σ(g)/σ(r) 6 2.5
and classified by De Lee et al. (2007), shown in diagrams similar to Fig. 10. Symbols show
RRab stars (red dots), RRc stars (blue dots), variable non-RR Lyrae stars (green dots),
and non-variable sources (open squares, only four sources). A comparison of the u − g
color distribution for RRab (solid line), RRc (dashed line), and variable non-RR Lyrae stars
(dotted line) is shown in the bottom left panel.
– 39 –
60 50 40 30 20 10 0 -10 -20 -30 -40 -50
634 Stripe 82 RR Lyrae candidates
25 kpc
50 kpc
75 kpc
100 kpc
JK LM
Fig. 12.— The magnitude-position distribution of 634 Stripe 82 RR Lyrae candidates within
−55◦ < R.A. < 60◦ and |Dec| 6 1.27◦. Approximate distance (shown on the right y-axis)
is calculated assuming Mr = 0.7 mag for RR Lyrae stars. Dashed lines show where sample
completeness decreases from approximately 99% to 60% due to the χ2 cut (see the bottom
right panel in Fig. 2). Closed curves are remapped ellipses and circles from Fig. 13 that
mark halo substructure.
– 40 –
25       
50       
75       
100            kpc
Fig. 13.— Left: The spatial distribution of candidate RR Lyrae stars discovered by SDSS
along the Celestial Equator. Distance is calculated assuming Eq. 3 from Ivezić et al. (2005)
and MV = 0.7 mag as the absolute magnitude of RR Lyrae in the V band. The right
wedge corresponds to candidate RR Lyrae selected in this work (634 candidates, shown in
Fig. 12) and the left wedge is based on the sample from Ivezić et al. (2000) (296 candidates).
Right: The number density distribution of candidate RR Lyrae stars shown in the left panel,
computed using an adaptive Bayesian density estimator developed by Ivezić et al. (2005).
The color scheme represents the number density multiplied by the cube of the galactocentric
radius, and displayed on a logarithmic scale with a dynamic range of 300 (from light blue to
red). The green color corresponds to the mean density – both wedges with the data would
have this color if the halo number density distribution followed a perfectly smooth r−3 power-
law. The purple color marks the regions with no data. The yellow regions are formally ∼ 3σ
significant overdensities, and orange/red regions have an even higher significance (using
only the counts variance). The strongest clump in the left wedge belongs to the Sgr dwarf
tidal stream as does the clump marked by C in the right wedge (Ivezić et al. 2003a). An
approximate location and properties of labeled overdensities are listed in Table 3. The
Ivezić et al. (2000) sample is based on only 2 epochs and thus has a much lower completeness
(∼ 56%) resulting in a lower density contrast.
– 41 –
Fig. 14.— The fraction of spectroscopically confirmed unresolved QSOs (fQSO, solid line)
and the fraction of sources from the stellar locus (floc, dashed line) brighter than g = 19.5 and
r = 19.5 that have rms scatter larger than σ in the g and r bands. The ratio fQSO/(1+ floc)
(dotted line), which corresponds to the implied fraction of variable QSOs, peaks at a level
of 90% for σ = 0.03 mag.
	Introduction
	Overview of the SDSS Imaging and Stripe 82 Data
	Analysis of Stripe 82 Catalog of Variable Sources
	Methods and Selection Criteria
	The Counts of Variable Sources
	Completeness
	Efficiency
	The Properties of Variable Sources
	Skewness as a Proxy for Dominant Variability Mechanism
	The Milky Way Halo Structure Traced by Candidate RR Lyrae Stars 
	Criteria for Selecting RR Lyrae Stars
	Completeness and efficiency
	The Spatial Distribution of Candidate RR Lyrae Stars
	Are All Quasars Variable?
	Implications for Surveys such as LSST
	Conclusions and Discussion
ABSTRACT
  We quantify the variability of faint unresolved optical sources using a
catalog based on multiple SDSS imaging observations. The catalog covers SDSS
Stripe 82, and contains 58 million photometric observations in the SDSS ugriz
system for 1.4 million unresolved sources. In each photometric bandpass we
compute various low-order lightcurve statistics and use them to select and
study variable sources. We find that 2% of unresolved optical sources brighter
than g=20.5 appear variable at the 0.05 mag level (rms) simultaneously in the g
and r bands. The majority (2/3) of these variable sources are low-redshift (<2)
quasars, although they represent only 2% of all sources in the adopted
flux-limited sample. We find that at least 90% of quasars are variable at the
0.03 mag level (rms) and confirm that variability is as good a method for
finding low-redshift quasars as is the UV excess color selection (at high
Galactic latitudes). We analyze the distribution of lightcurve skewness for
quasars and find that is centered on zero. We find that about 1/4 of the
variable stars are RR Lyrae stars, and that only 0.5% of stars from the main
stellar locus are variable at the 0.05 mag level. The distribution of
lightcurve skewness in the g-r vs. u-g color-color diagram on the main stellar
locus is found to be bimodal (with one mode consistent with Algol-like
behavior). Using over six hundred RR Lyrae stars, we demonstrate rich halo
substructure out to distances of 100 kpc. We extrapolate these results to
expected performance by the Large Synoptic Survey Telescope and estimate that
it will obtain well-sampled 2% accurate, multi-color lightcurves for ~2 million
low-redshift quasars, and will discover at least 50 million variable stars.

<|endoftext|><|startoftext|>
Introduction
The theory of time scales is a relatively new area, that unify and generalize
difference and differential equations [5]. It was initiated by Stefan Hilger in the
nineties of the XX century [7, 8], and is now subject of strong current research in
many different fields in which dynamic processes can be described with discrete
or continuous models [1].
The calculus of variations on time scales was introduced by Bohner [4] and by
Hilscher and Zeidan [9], and appears to have many opportunities for application
in economics [2]. In all those works, necessary optimality conditions are only
obtained for the basic (simplest) problem of the calculus of variations on time
scales: in [2, 4] for the basic problem with fixed endpoints, in [9] for the basic
problem with general (jointly varying) endpoints. Having in mind the classical
∗This work is part of the first author’s PhD project.
http://arxiv.org/abs/0704.0656v1
setting (situation when the time scale T is either R or Z – see e.g. [6, 14] and
[10, 11], respectively), one suspects that the Euler-Lagrange equations in [2, 4, 9]
are easily generalized for problems with higher-order delta derivatives. This is
not exactly the case, even beginning with the formulation of the problem.
The basic problem of the calculus of variations on time scales is defined (cf.
[4, 9], see §2 below for the meaning of the ∆-derivative and ∆-integral) as
L[y(·)] =
L(t, yσ(t), y∆(t))∆t −→ min, (y(a) = ya) , (y(b) = yb) , (1)
with L : T×Rn×Rn → R, (y, u) → L(t, y, u) a C2-function for each t, and where
we are using parentheses around the endpoint conditions as a notation to mean
that the conditions may or may not be present: the case with fixed boundary
conditions y(a) = ya and y(b) = yb is studied in [4], for admissible functions y(·)
belonging to C1rd (T;R
n) (rd-continuously ∆-differentiable functions); general
boundary conditions of the type f(y(a), y(b)) = 0, which include the case y(a) or
y(b) free, and over admissible functions in the wider class C1prd (T;R
n) (piecewise
rd-continuously ∆-differentiable functions), are considered in [9]. One question
immediately comes to mind. Why is the basic problem on time scales defined
as (1) and not as
L[y(·)] =
L(t, y(t), y∆(t))∆t −→ min, (y(a) = ya) , (y(b) = yb) . (2)
The answer is simple: compared with (2), definition (1) simplifies the Euler-
Lagrange equation, in the sense that makes it similar to the classical context.
The reader is invited to compare the Euler-Lagrange condition (6) of problem
(1) and the Euler-Lagrange condition (13) of problem (2), with the classical
expression (on the time scale T = R):
Ly′(t, y∗(t), y
(t)) = Ly(t, y∗(t), y
(t)), t ∈ [a, b] .
It turns out that problems (1) and (2) are equivalent: as far as we are assuming
y(·) to be ∆-differentiable, then y(t) = yσ(t) − µ(t)y∆(t) and (i) any problem
(1) can be written in the form (2), (ii) any problem (2) can be written in the
form (1). We claim, however, that the formulation (2) we are promoting here is
more natural and convenient. An advantage of our formulation (2) with respect
to (1) is that it makes clear how to generalize the basic problem on time scales
to the case of a Lagrangian L containing delta derivatives of y(·) up to an order
r, r ≥ 1. The higher-order problem will be naturally defined as
L[y(·)] =
r−1(b)
L(t, y(t), y∆(t), . . . , y∆
(t))∆t −→ min,
y(a) = y0a
r−1(b)
= y0b
, (3)
(a) = yr−1a
r−1 (
ρr−1(b)
= yr−1
where y∆
(t) ∈ Rn, i ∈ {0, . . . , r}, y∆
= y, and n, r ∈ N (assumptions on
the data of the problem will be specified later, in Section 3). One of the new
results in this paper is a necessary optimality condition in delta integral form
for problem (3) (Theorem 4). It is obtained using the interplay of problems (1)
and (2) in order to deal with more general optimal control problems (16).
The paper is organized as follows. In Section 2 we give a brief introduction
to time scales and recall the main results of the calculus of variations on this
general setting. Our contributions are found in Section 3. We start in §3.1
by proving the Euler-Lagrange equation and transversality conditions (natural
boundary conditions – y(a) or/and y(b) free) for the basic problem (2) (The-
orem 2). As a corollary, the Euler-Lagrange equation in [4] and [9] for (1) is
obtained. Regarding the natural boundary conditions, the one which appears
when y(a) is free turns out to be simpler and more close in aspect to the classical
condition Ly′(a, y∗(a), y
(a)) = 0 for problem (1) than to (2)—compare condi-
tion (9) for problem (2) with the correspondent condition (14) for problem (1);
but the inverse situation happens when y(b) is free—compare condition (15) for
problem (1) with the correspondent condition (10) for (2), this last being simpler
and more close in aspect to the classical expression Ly′(b, y∗(b), y
(b)) = 0 valid
on the time scale T = R. In §3.2 we formulate a more general optimal control
problem (16) on time scales, proving respective necessary optimality conditions
in Hamiltonian form (Theorem 3). As corollaries, we obtain a Lagrange multi-
plier rule on time-scales (Corollary 2), and in §3.3 the Euler-Lagrange equation
for the problem of the calculus of variations with higher order delta derivatives
(Theorem 4). Finally, as an illustrative example, we consider in §4 a discrete
time scale and obtain the well-known Euler-Lagrange equation in delta differ-
entiated form.
All the results obtained in this paper can be extended: (i) to nabla deriva-
tives (see [5, §8.4]) with the appropriate modifications and as done in [2] for the
simplest functional; (ii) to more general classes of admissible functions and to
problems with more general boundary conditions, as done in [9] for the simplest
functional of the calculus of variations on time scales.
2 Time scales and previous results
We begin by recalling the main definitions and properties of time scales (cf.
[1, 5, 7, 8] and references therein).
A nonempty closed subset of R is called a Time Scale and is denoted by T.
The forward jump operator σ : T → T is defined by
σ(t) = inf {s ∈ T : s > t}, for all t ∈ T,
while the backward jump operator ρ : T → T is defined by
ρ(t) = sup {s ∈ T : s < t}, for all t ∈ T,
with inf ∅ = supT (i.e., σ(M) = M if T has a maximum M) and sup ∅ = inf T
(i.e., ρ(m) = m if T has a minimum m).
A point t ∈ T is called right-dense, right-scattered, left-dense and left-
scattered if σ(t) = t, σ(t) > t, ρ(t) = t and ρ(t) < t, respectively.
Throughout the text we let T = [a, b] ∩ T0 with a < b and T0 a time scale.
We define Tk = T\(ρ(b), b], Tk
and more generally Tk
for n ∈ N. The following standard notation is used for σ (and ρ): σ0(t) = t,
σn(t) = (σ ◦ σn−1)(t), n ∈ N.
The graininess function µ : T → [0,∞) is defined by
µ(t) = σ(t)− t, for all t ∈ T.
We say that a function f : T → R is delta differentiable at t ∈ Tk if there is
a number f∆(t) such that for all ε > 0 there exists a neighborhood U of t (i.e.,
U = (t− δ, t+ δ) ∩ T for some δ > 0) such that
|f(σ(t)) − f(s)− f∆(t)(σ(t) − s)| ≤ ε|σ(t)− s|, for all s ∈ U.
We call f∆(t) the delta derivative of f at t.
Now, we define the rth−delta derivative (r ∈ N) of f to be the function
→ R, provided f∆
is delta differentiable on Tk
For delta differentiable f and g, the next formulas hold:
fσ(t) = f(t) + µ(t)f∆(t) (4)
(fg)∆(t) = f∆(t)gσ(t) + f(t)g∆(t)
= f∆(t)g(t) + fσ(t)g∆(t),
where we abbreviate f ◦ σ by fσ.
Next, a function f : T → R is called rd-continuous if it is continuous at
right-dense points and if its left-sided limit exists at left-dense points. We
denote the set of all rd-continuous functions by Crd or Crd[T] and the set of all
delta differentiable functions with rd-continuous derivative by C1rd or C
rd[T].
It is known that rd-continuous functions possess an antiderivative, i.e., there
exists a function F with F∆ = f , and in this case an integral is defined by
f(t)∆t = F (b)− F (a). It satisfies
∫ σ(t)
f(τ)∆τ = µ(t)f(t) . (5)
We now present some useful properties of the delta integral:
Lemma 1. If a, b ∈ T and f, g ∈Crd, then
f(σ(t))g∆(t)∆t = [(fg)(t)]
t=a −
f∆(t)g(t)∆t.
f(t)g∆(t)∆t = [(fg)(t)]
t=a −
f∆(t)g(σ(t))∆t.
The main result of the calculus of variations on time scales is given by the
following necessary optimality condition for problem (1).
Theorem 1 ([4]). If y∗ is a weak local minimizer (cf. §3) of the problem
L[y(·)] =
L(t, yσ(t), y∆(t))∆t −→ min
y(·) ∈ C1
y(a) = ya, y(b) = yb,
then the Euler-Lagrange equation
L∆y∆(t, y
(t), y∆
(t)) = Lyσ(t, y
(t), y∆
(t)), t ∈ Tk
holds.
Main ingredients to prove Theorem 1 are item 1 of Lemma 1 and the Dubois-
Reymond lemma:
Lemma 2 ([4]). Let g ∈ Crd, g : [a, b]
k → Rn. Then,
g(t) · η∆(t)∆t = 0 for all η ∈ C1
with η(a) = η(b) = 0
if and only if
g(t) = c on [a, b]k for some c ∈ Rn.
3 Main results
Assume that the Lagrangian L(t, u0(t), u1(t), . . . , ur(t)) (r ≥ 1) is a C
r+1 func-
tion of (u0(t), u1(t), . . . , ur(t)) for each t ∈ T. Let y ∈ C
rd[T], where
Crrd[T] =
y : Tk
→ Rn : y∆
is rd-continuous on Tk
We want to minimize the functional L of problem (3). For this, we say that
y∗ ∈ C
rd[T] is a weak local minimizer for the variational problem (3) provided
there exists δ > 0 such that L[y∗] ≤ L[y] for all y ∈ C
rd[T] satisfying the
constraints in (3) and ‖y − y∗‖r,∞ < δ, where
||y||r,∞ :=
with y∆
= y and ||y||∞ := supt∈Tkr |y(t)|.
3.1 The basic problem on time scales
We start by proving the necessary optimality condition for the simplest varia-
tional problem (r = 1):
L[y(·)] =
L(t, y(t), y∆(t))∆t −→ min
y(·) ∈ C1rd[T]
(y(a) = ya) , (y(b) = yb) .
Remark 1. We are assuming in problem (7) that the time scale T has at least
3 points. Indeed, for the delta-integral to be defined we need at least 2 points.
Assume that the time scale has only two points: T = {a, b}, with b = σ(a).
Then,
∫ σ(a)
L(t, y(t), y∆(t))∆t = µ(a)L(a, y(a), y∆(a)). In the case both y(a)
and y(σ(a)) are fixed, since y∆(a) =
y(σ(a))−y(a)
, then L[y(·)] would be a con-
stant for every admissible function y(·) (there would be nothing to minimize
and problem (7) would be trivial). Similarly, for (3) we assume the time scale
to have at least 2r + 1 points (see Remark 15).
Theorem 2. If y∗ is a weak local minimizer of (7) (problem (3) with r = 1),
then the Euler-Lagrange equation in ∆-integral form
Ly∆(t, y∗(t), y
(t)) =
∫ σ(t)
Ly(ξ, y∗(ξ), y
(ξ))∆ξ + c (8)
holds ∀t ∈ Tk and some c ∈ Rn. Moreover, if the initial condition y(a) = ya is
not present (y(a) is free), then the supplementary condition
Ly∆(a, y∗(a), y
(a))− µ(a)Ly(a, y∗(a), y
(a)) = 0 (9)
holds; if y(b) = yb is not present (y(b) is free), then
Ly∆(ρ(b), y∗(ρ(b)), y
(ρ(b))) = 0 . (10)
Remark 2. For the time scale T = R equalities (9) and (10) give, respec-
tively, the well-known natural boundary conditions Ly′(a, y∗(a), y
(a)) = 0 and
Ly′(b, y∗(b), y
(b)) = 0.
Proof. Suppose that y∗(·) is a weak local minimizer of L[·]. Let η(·) ∈C
rd and
define Φ : R → R by
Φ(ε) = L[y∗(·) + εη(·)].
This function has a minimum at ε = 0, so we must have Φ′(0) = 0. Applying
the delta-integral properties and the integration by parts formula 2 (second item
in Lemma 1), we have
0 = Φ′(0)
[Ly(t, y∗(t), y
(t)) · η(t) + Ly∆(t, y∗(t), y
(t)) · η∆(t)]∆t
Ly(t, y∗(t), y
(t)∆t · η(t)
∫ σ(t)
Ly(ξ, y∗(ξ), y
(ξ))∆ξ · η∆(t)− Ly∆(t, y∗(t), y
(t)) · η∆(t)
Let us limit the set of all delta-differentiable functions η(·) with rd-continuous
derivatives to those which satisfy the condition η(a) = η(b) = 0 (this condition
is satisfied by all the admissible variations η(·) in the case both y(a) = ya and
y(b) = yb are fixed). For these functions we have
Ly∆(t, y∗(t), y
(t)) −
∫ σ(t)
Ly(ξ, y∗(ξ), y
(ξ))∆ξ
· η∆(t)∆t = 0 .
Therefore, by the lemma of Dubois-Reymond (Lemma 2), there exists a constant
c ∈ Rn such that (8) holds:
Ly∆(t, y∗(t), y
(t))−
∫ σ(t)
Ly(ξ, y∗(ξ), y
(ξ))∆ξ = c , (12)
for all t ∈ Tk. Because of (12), condition (11) simplifies to
Ly(t, y∗(t), y
(t)∆t · η(t)
+ c · η(t)|
t=a = 0,
for any admissible η(·). If y(a) = ya is not present in problem (7) (so that η(a)
need not to be zero), taking η(t) = t− b we find that c = 0; if y(b) = yb is not
present, taking η(t) = t − a we find that
Ly(t, y∗(t), y
(t) = 0. Applying
these two conditions to (12) and having in mind formula (5), we may state that
Ly∆(a, y∗(a), y
(a))−
∫ σ(a)
Ly(ξ, y∗(ξ), y
(ξ))∆ξ = 0
⇔ Ly∆(a, y∗(a), y
(a))− µ(a)Ly(a, y∗(a), y
(a)) = 0,
and (note that σ(ρ(b)) = b)
Ly∆(ρ(b), y∗(ρ(b)), y
(ρ(b)))−
Ly(ξ, y∗(ξ), y
(ξ))∆ξ = 0
⇔ Ly∆(ρ(b), y∗(ρ(b)), y
(ρ(b))) = 0.
Remark 3. Since σ(t) ≥ t, ∀t ∈ T, we must have
Ly∆(t, y∗(t), y
(t))−
∫ σ(t)
Ly(ξ, y∗(ξ), y
(ξ))∆ξ = c
⇔ Ly∆(t, y∗(t), y
(t)) − µ(t)Ly(t, y∗(t), y
Ly(ξ, y∗(ξ), y
(ξ))∆ξ + c,
by formula (5). Delta differentiating both sides, we obtain
Ly∆(t, y∗(t), y
(t))− µ(t)Ly(t, y∗(t), y
= Ly(t, y∗(t), y
(t)), t ∈ Tk
. (13)
Note that we can’t expand the left hand side of this last equation, because we
are not assuming that µ(t) is delta differentiable. In fact, generally µ(t) is not
delta differentiable (see example 1.55, page 21 of [5]). We say that (13) is the
Euler-Lagrange equation for problem (7) in the delta differentiated form.
As mentioned in the introduction, the formulations of the problems of the cal-
culus of variations on time scales with “
t, yσ(t), y∆(t)
” and with “
t, y(t), y∆(t)
are equivalent. It is trivial to derive previous Euler-Lagrange equation (6) from
our equation (13) and the other way around (one can derive (13) directly from
(6)).
Corollary 1. If y∗ ∈ C
[T] is a weak local minimizer of
L[y(·)] =
L(t, yσ(t), y∆(t))∆t , (y(a) = ya), (y(b) = yb),
then the Euler-Lagrange equation (6) holds. If y(a) is free, then the extra
transversality condition (natural boundary condition)
Ly∆(a, y
(a), y∆
(a)) = 0 (14)
holds; if y(b) is free, then
Lyσ(ρ(b), y
(ρ(b)), y∆
(ρ(b)))µ(ρ(b)) + Ly∆(ρ(b), y
(ρ(b)), y∆
(ρ(b))) = 0 . (15)
Proof. Since y(t) is delta differentiable, then (4) holds. This permits us to write
L(t, yσ(t), y∆(t)) = L(t, y(t) + µ(t)y∆(t), y∆(t)) = F (t, y(t), y∆(t)).
Applying equation (13) to the functional F we obtain
Fy∆(t, y(t), y
∆(t)) − µ(t)Fy(t, y(t), y
∆(t))
= Fy(t, y(t), y
∆(t)).
Fy(t, y(t), y
∆(t)) = Lyσ(t, y
σ(t), y∆(t)) ,
Fy∆(t, y(t), y
∆(t)) = Lyσ(t, y
σ(t), y∆(t))µ(t) + Ly∆(t, y
σ(t), y∆(t)) ,
and the result follows.
3.2 The Lagrange problem on time scales
Now we consider a more general variational problem with delta-differential side
conditions:
J [y(·), u(·)] =
L(t, y(t), u(t))∆t −→ min ,
y∆(t) = ϕ(t, y(t), u(t)) ,
(y(a) = ya) , (y(b) = yb) ,
where y(·) ∈ C1rd[T], u(·) ∈ Crd[T], y(t) ∈ R
n and u(t) ∈ Rm for all t ∈ T,
and m ≤ n. We assume L : T × Rn × Rm → R and ϕ : T × Rn × Rm → Rn
to be C1-functions of y and u for each t; and that for each control function
u(·) ∈ Crd[T;R
m] there exists a correspondent y(·) ∈ C1rd[T;R
n] solution of
the ∆-differential equation y∆(t) = ϕ(t, y(t), u(t)). We remark that conditions
for existence or uniqueness are available for O∆E’s from the very beginning
of the theory of time scales (see [8, Theorem 8]). Roughly speaking, forward
solutions exist, while existence of backward solutions needs extra assumptions
(e.g. regressivity). In control theory, however, one usually needs only forward
solutions, so we do not need to impose such extra assumptions [3].
We are interested to find necessary conditions for a pair (y∗, u∗) to be a weak
local minimizer of J .
Definition 1. Take an admissible pair (y∗, u∗). We say that (y∗, u∗) is a weak
local minimizer for (16) if there exist δ > 0 such that J [y∗, u∗] ≤ J [y, u] for all
admissible pairs (y, u) satisfying ‖y − y∗‖1,∞ + ‖u− u∗‖∞ < δ.
Remark 4. Problem (16) is very general and includes: (i) problem (7) (this
is the particular case where m = n and ϕ(t, y, u) = u), (ii) the problem of
the calculus of variations with higher-order delta derivatives (3) (such problem
receive special attention in Section 3.3 below), (iii) isoperimetric problems on
time scales. Suppose that the isoperimetric condition
I[y(·), u(·)] =
g (t, y(t), u(t))∆t = β ,
β a given constant, is prescribed. We can introduce a new state variable yn+1
defined by
yn+1(t) =
g(ξ, y(ξ), u(ξ))∆ξ, t ∈ T,
with boundary conditions yn+1(a) = 0 and yn+1(b) = β. Then
y∆n+1(t) = g (t, y(t), u(t)) , t ∈ T
and we can always recast an isoperimetric problem as a Lagrange problem (16).
To establish necessary optimality conditions for (16) is more complicated
than for the basic problem of the calculus of variations on time scales (1) or (2),
owing to the possibility of existence of abnormal extremals (Definition 2). The
abnormal case never occurs for the basic problem (Proposition 2).
Theorem 3 (The weak maximum principle on time scales). If (y∗(·), u∗(·)) is
a weak local minimizer of problem (16), then there exists a set of multipliers
(ψ0∗ , ψ∗(·)) 6= 0, where ψ0∗ is a nonnegative constant and ψ∗(·) : T → R
n is a
delta differentiable function on Tk, such that (y∗(·), u∗(·), ψ0∗ , ψ∗(·)) satisfy
(t) = Hψσ (t, y∗(t), u∗(t), ψ0∗ , ψ
(t)) , (∆-dynamic equation for y) (17)
(t) = −Hy(t, y∗(t), u∗(t), ψ0∗ , ψ
(t)) , (∆-dynamic equation for ψ) (18)
Hu(t, y∗(t), u∗(t), ψ0∗ , ψ
(t)) = 0 , (∆-stationary condition) (19)
for all t ∈ Tk, where the Hamiltonian function H is defined by
H(t, y, u, ψ0, ψ
σ) = ψ0L(t, y, u) + ψ
σ · ϕ(t, y, u) . (20)
If y(a) is free in (16), then
ψ∗(a) = 0 ; (21)
if y(b) is free in (16), then
ψ∗(b) = 0 . (22)
Remark 5. From the definition (20) of H , it follows immediately that (17) holds
true for any admissible pair (y(·), u(·)) of problem (16). Indeed, condition (17)
is nothing more than the control system y∆
(t) = ϕ(t, y∗(t), u∗(t)).
Remark 6. For the time scale T = Z, (17)-(19) reduce to well-known conditions
in discrete time (see e.g. [13, Ch. 8]): the ∆-dynamic equation for y takes the
form y(k + 1) − y(k) = Hψ (k, y(k), u(k), ψ0, ψ(k + 1)); the ∆-dynamic equa-
tion for ψ gives ψ(k + 1) − ψ(k) = −Hy (k, y(k), u(k), ψ0, ψ(k + 1)); and the
∆-stationary condition reads as Hu (k, y(k), u(k), ψ0, ψ(k + 1)) = 0; with the
Hamiltonian H = ψ0L(k, y(k), u(k)) + ψ(k + 1) · ϕ(k, y(k), u(k)). For T = R,
Theorem 3 is known in the literature as Hestenes necessary condition, which is
a particular case of the Pontryagin Maximum Principle [12].
Corollary 2 (Lagrange multiplier rule on time scales). If (y∗(·), u∗(·)) is a
weak local minimizer of problem (16), then there exists a collection of multipli-
ers (ψ0∗ , ψ∗(·)), ψ0∗ a nonnegative constant and ψ∗(·) : T → R
n a delta dif-
ferentiable function on Tk, not all vanishing, such that (y∗(·), u∗(·), ψ0∗ , ψ∗(·))
satisfy the Euler-Lagrange equation of the augmented functional J∗:
J∗[y(·),u(·), ψ(·)] =
t, y(t), u(t), ψσ(t), y∆(t)
ψ0L(t, y(t), u(t)) + ψ
σ(t) ·
ϕ(t, y(t), u(t)) − y∆(t)
[H(t, y(t), u(t), ψ0, ψ
σ(t))− ψσ(t) · y∆(t)]∆t .
Proof. The Euler-Lagrange equations (13) and (6) applied to (23) give
y∆ − µ(t)L
= L∗y ,
(−µ(t)L∗u)
= L∗u , L
ψσ = 0 ,
that is,
(ψσ(t) + µ(t) ·Hy)
= −Hy , (24)
(−µ(t)Hu)
∆ = Hu , (25)
∆(t) = Hψσ ,
where the partial derivatives ofH are evaluated at (t, y(t), u(t), ψ0, ψ
σ(t)). Obvi-
ously, from (19) we obtain (25). It remains to prove that (18) implies (24) along
(y∗(·), u∗(·), ψ0∗ , ψ∗(·)). Indeed, from (18) we can write µ(t)ψ
∆(t) = −µ(t)Hy,
which is equivalent to ψ(t) = ψσ(t) + µ(t)Hy.
Remark 7. Condition (18) or (24) imply that along the minimizer
ψσ(t) = −
∫ σ(t)
Hy(ξ, y(ξ), u(ξ), ψ0, ψ
σ(ξ))∆ξ − c (26)
for some c ∈ Rn.
Remark 8. The assertion in Theorem 3 that the multipliers cannot be all zero is
crucial. Indeed, without this requirement, for any admissible pair (y(·), u(·)) of
(16) there would always exist a set of multipliers satisfying (18)-(19) (namely,
ψ0 = 0 and ψ(t) ≡ 0).
Remark 9. Along all the work we consider ψ as a row-vector.
Remark 10. If the multipliers (ψ0, ψ(·)) satisfy the conditions of Theorem 3,
then (γψ0, γψ(·)) also do, for any γ > 0. This simple observation allow us to
conclude that it is enough to consider two cases: ψ0 = 0 or ψ0 = 1.
Definition 2. An admissible quadruple (y(·), u(·), ψ0, ψ(·)) satisfying condi-
tions (17)-(19) (also (21) or (22) if y(a) or y(b) are, respectively, free) is called
an extremal for problem (16). An extremal is said to be normal if ψ0 = 1 and
abnormal if ψ0 = 0.
So, Theorem 3 asserts that every minimizer is an extremal.
Proposition 1. The Lagrange problem on time scales (16) has no abnormal
extremals (in particular, all the minimizers are normal) when at least one of the
boundary conditions y(a) or y(b) is absent (when y(a) or y(b) is free).
Proof. Without loss of generality, let us consider y(b) free. We want to prove
that the nonnegative constant ψ0 is nonzero. The fact that ψ0 6= 0 follows from
Theorem 3. Indeed, the multipliers ψ0 and ψ(t) cannot vanish simultaneously
at any point of t ∈ T. As far as y(b) is free, the solution to the problem must
satisfy the condition ψ(b) = 0. The condition ψ(b) = 0 requires a nonzero value
for ψ0 at t = b. But since ψ0 is a nonnegative constant, we conclude that ψ0 is
positive, and we can normalize it (Remark 10) to unity.
Remark 11. In the general situation abnormal extremals may occur. More
precisely (see proof of Theorem 3), abnormality is characterized by the existence
of a nontrivial solution ψ(t) for the system ψ∆(t) + ψσ(t) · ϕy = 0.
Proposition 2. There are no abnormal extremals for problem (7), even in the
case y(a) and y(b) are both fixed (y(a) = ya, y(b) = yb).
Proof. Problem (7) is the particular case of (16) with y∆(t) = u(t). If ψ0 = 0,
then the Hamiltonian (20) takes the form H = ψσ · u. From Theorem 3, ψ∆ = 0
and ψσ = 0, for all t ∈ Tk. Since ψσ = ψ + µ(t)ψ∆, this means that ψ0 and ψ
would be both zero, which is not a possibility.
Corollary 3. For problem (7), Theorem 3 gives Theorem 2.
Proof. For problem (7) we have ϕ(t, y, u) = u. From Proposition 2, the Hamil-
tonian becomes H(t, y, u, ψ0, ψ
σ) = L(t, y, u) + ψσ · u. By the ∆-stationary
condition (19) we may write Lu(t, y, u) + ψ
σ = 0. Now apply (26) and the
result follows.
To prove Theorem 3 we need the following result:
Lemma 3 (Fundamental lemma of the calculus of variations on time scales).
Let g ∈ Crd, g : T
k → Rn. Then,
g(t) · η(t)∆t = 0 for all η ∈ Crd
if and only if
g(t) = 0 on Tk .
Proof. If g(t) = 0 on Tk, then obviously
g(t) · η(t)∆t = 0, for all η ∈ Crd.
Now, suppose (without loss of generality) that g(t0) > 0 for some t0 ∈ T
We will divide the proof in two steps:
Step 1: Assume that t0 is right scattered. Define in T
η(t) =
1 if t = t0;
0 if t 6= t0.
Then η is rd-continuous and
g(t)η(t)∆t =
∫ σ(t0)
g(t)η(t)∆t = µ(t0)g(t0) > 0,
which is a contradiction.
Step 2: Suppose that t0 is right dense. Since g is rd-continuous, then it is
continuous at t0. So there exist δ > 0 such that for all t ∈ (t0 − δ, t0 + δ) ∩ T
we have g(t) > 0.
If t0 is left-dense, define in T
η(t) =
(t− t0 + δ)
2(t− t0 − δ)
2 if t ∈ (t0 − δ, t0 + δ);
0 otherwise.
It follows that η is rd-continuous and
g(t)η(t)∆t =
∫ t0−δ
g(t)η(t)∆t +
∫ t0+δ
g(t)η(t)∆t +
g(t)η(t)∆t > 0,
which is a contradiction.
If t0 is left-scattered, define in T
η(t) =
(t− t0 − δ)
2 if t ∈ [t0, t0 + δ̃);
0 otherwise,
where 0 < δ̃ < min{µ(ρ(t0), δ)}. We have: η is rd-continuous and
g(t)η(t)∆t =
∫ t0+δ̃
g(t)η(t)∆t > 0,
that again leads us to a contradiction.
Proof. (of Theorem 3) We begin by noting that u(t) = (u1(t), . . . , um(t)) in
problem (16), t ∈ Tk, are arbitrarily specified functions (controls). Once fixed
u(·) ∈ Crd[T;R
m], then y(t) = (y1(t), . . . , yn(t)), t ∈ T
k, is determined from
the system of delta-differential equations y∆(t) = ϕ(t, y(t), u(t)) (and boundary
conditions, if present). As far as u(·) is an arbitrary function, variations ω(·) ∈
Crd[T;R
m] for u(·) can also be considered arbitrary. This is not true, however,
for the variations η(·) ∈C1rd[T;R
n] of y(·). Suppose that (y∗(·), u∗(·)) is a weak
local minimizer of J [·, ·]. Let ε ∈ (−δ, δ) be a small real parameter and yε(t) =
y∗(t)+εη(t) (with η(a) = 0 if y(a) = ya is given; η(b) = 0 if y(b) = yb is given) be
the trajectory generated by the control uε(t) = u∗(t)+εω(t), ω(·) ∈ Crd[T;R
ε (t) = ϕ(t, yε(t), uε(t)) , (27)
t ∈ Tk, (yε(a) = ya), (yε(b) = yb). We define the following function:
Φ(ε) = J [yε(·), uε(·)] = J [y∗(·) + εη(·), u∗(·) + εω(·)]
L (t, y∗(t) + εη(t), u∗(t) + εω(t))∆t .
It follows that Φ : (−δ, δ) → R has a minimum for ε = 0, so we must have
Φ′(0) = 0. From this condition we can write that
[ψ0Ly (t, y∗(t), u∗(t)) · η(t) + ψ0Lu (t, y∗(t), u∗(t)) · ω(t)]∆t = 0 (28)
for any real constant ψ0. Differentiating (27) with respect to ε, we get
η∆(t) = ϕy(t, yε(t), uε(t)) · η(t) + ϕu(t, yε(t), uε(t)) · ω(t) .
In particular, with ε = 0,
η∆(t) = ϕy(t, y∗(t), u∗(t)) · η(t) + ϕu(t, y∗(t), u∗(t)) · ω(t) . (29)
Let ψ(·) ∈C1rd[T;R
n] be (yet) an unspecified function. Multiplying (29) by
ψσ(t) = [ψσ1 (t), . . . , ψ
n(t)], and delta-integrating the result with respect to t
from a to b, we get that
ψσ(t) · η∆(t)∆t =
[ψσ(t) · ϕy · η(t) + ψ
σ(t) · ϕu · ω(t)]∆t (30)
for any ψ(·) ∈ C1rd[T;R
n]. Integrating by parts (see Lemma 1, formula 1),
σ(t) · η∆(t)∆t = ψ(t) · η(t)|
∆(t) · η(t)∆t , (31)
and we can write from (28), (30) and (31) that
ψ∆(t) + ψ0Ly + ψ
σ(t) · ϕy
· η(t)
+ (ψ0Lu + ψ
σ(t) · ϕu) · ω(t)
∆t− ψ(t) · η(t)|
= 0 (32)
hold for any ψ(t). Using the definition (20) of H , we can rewrite (32) as
ψ∆(t) +Hy
· η(t) +Hu · ω(t)
∆t− ψ(t) · η(t)|
= 0 . (33)
It is, however, not possible to employ (yet) Lemma 3 due to the fact that
the variations η(t) are not arbitrary. Now choose ψ(t) = ψ∗(t) so that the
coefficient of η(t) in (33) vanishes: ψ∆
(t) = −Hy (and ψ∗(a) = 0 if y(a) is
free, i.e. η(a) 6= 0; ψ∗(b) = 0 if y(b) is free, i.e. η(b) 6= 0). In the normal case
ψ∗(t) is determined by (y∗(·), u∗(·)), and we choose ψ0∗ = 1. The abnormal case
is characterized by the existence of a non-trivial solution ψ∗(t) for the system
(t) + ψσ
(t) · ϕy = 0: in that case we choose ψ0∗ = 0 in order to the first
coefficient of η(t) in (32) or (33) to vanish. Given this choice of the multipliers,
the necessary optimality condition (33) takes the form
Hu · ω(t)∆t = 0 .
Since ω(t) can be arbitrarily assigned for all t ∈ Tk, it follows from Lemma 3
that Hu = 0.
3.3 The higher-order problem on time scales
As a corollary of Theorem 3 we obtain the Euler-Lagrange equation for problem
(3). We first introduce some notation:
y0(t) = y(t),
y1(t) = y∆(t),
yr−1(t) = y∆
u(t) = y∆
Theorem 4. If y∗ ∈ C
rd[T] is a weak local minimizer for the higher-order
problem (3), then
(σ(t)) = −Lu(t, x∗(t), u∗(t)) (34)
holds for all t ∈ Tk
, where x∗(t) =
y∗(t), y
(t), . . . , y∆
and ψr−1
(σ(t))
is defined recursively by
(σ(t)) = −
∫ σ(t)
Ly0(ξ, x∗(ξ), u∗(ξ))∆ξ + c0 , (35)
(σ(t)) = −
∫ σ(t)
Lyi(ξ, x∗(ξ), u∗(ξ)) + ψ
(σ(ξ))
∆ξ + ci, i = 1, . . . , r − 1 ,
with cj, j = 0, . . . , r − 1, constants. If y
∆i(α) is free in (3) for some i ∈
{0, . . . , r − 1}, α ∈ {a, ρr−1(b)}, then the correspondent condition ψi
(α) = 0
holds.
Remark 12. From (34), (35) and (36) it follows that
(−1)r−i
· · ·
Lyi + [ci]r−i−1 = 0, (37)
where [ci]r−i−1 means that the constant is free from the composition of the r− i
integrals when i = r − 1 (for simplicity, we have omitted the arguments in Lu
and Lyi).
Remark 13. If we delta differentiate (37) r times, we obtain the delta differen-
tiated equation for the problem of the calculus of variations with higher order
delta derivatives. However, as observed in Remark 3, one can only expand
formula (37) under suitable conditions of delta differentiability of µ(t).
Remark 14. For the particular case with ϕ(t, y, u) = u, equation (8) is (37) with
r = 1.
Proposition 3. The higher-order problem on time scales (3) does not admit ab-
normal extremals, even when the boundary conditions y∆
(a) and y∆
(ρr−1(b)),
i = 0, . . . , r − 1, are all fixed.
Remark 15. We require the time scale T to have at least 2r + 1 points. Let
us consider problem (3) with all the boundary conditions fixed. Due to the
fact that we have r delta derivatives, the boundary conditions y∆
(a) = yia and
(ρr−1(b)) = yib for all i ∈ {0, . . . , r− 1}, imply that we must have at least 2r
points in order to have the problem well defined. If we had only this number of
points, then the time scale could be written as T = {a, σ(a), . . . , σ2r−1(a)} and
r−1(b)
L(t, y(t),y∆(t), . . . , y∆
(t))∆t
i+1(a)
σi(a)
L(t, y(t), y∆(t), . . . , y∆
(t))∆t
L(σi(a), y(σi(a)), y∆(σi(a)), . . . , y∆
(σi(a))),
where we have used the fact that ρr−1(σ2r−1(a)) = σr(a). Now, having in mind
the boundary conditions and the formula
f∆(t) =
f(σ(t))− f(t)
we can conclude that the sum in (38) would be constant for every admissible
function y(·) and we wouldn’t have nothing to minimize.
The following technical result is used in the proof of Proposition 3.
Lemma 4. Suppose that a function f : T → R is such that fσ(t) = 0 for all
t ∈ Tk. Then, f(t) = 0 for all t ∈ T\{a} if a is right-scattered.
Proof. First note that, since fσ(t) = 0, then fσ(t) is delta differentiable, hence
continuous for all t ∈ Tk. Now, if t is right-dense, the result is obvious. Suppose
that t is right-scattered. We will analyze two cases: (i) if t is left-scattered,
then t 6= a and by hypothesis 0 = fσ(ρ(t)) = f(t); (ii) if t is left-dense, then
f(t) = lims→t− f
σ(s) = fσ(t), by the continuity of fσ. The proof is done.
Proof. (of Proposition 3) Suppose that ψ0 = 0. With the notation (40) intro-
duced below, the higher order problem (3) would have the abnormal Hamilto-
nian given by
H(t, y0, . . . , yr−1, u, ψ0, . . . , ψr−1) =
ψi(σ(t)) · yi+1(t) + ψr−1(σ(t)) · u(t)
(compare with the normal Hamiltonian (41)). From Theorem 3, we can write
the system of equations:
ψ̂0(t) = 0
ψ̂1(t) = −ψ0(σ(t))
ψ̂r−1(t) = −ψr−2(σ(t))
ψr−1(σ(t)) = 0,
for all t ∈ Tk
, where we are using the notation ψ̂i(t) = ψi
(t), i = 0, . . . , r− 1.
From the last equation, and in view of Lemma 4, we have ψ(t) = 0, ∀t ∈
\{a} if a is right-scattered. This implies that ψ̂r−1(t) = 0, ∀t ∈ Tk
and consequently ψr−2(σ(t)) = 0, ∀t ∈ Tk
\{a}. Like we did before, ψr−2(t) =
0, ∀t ∈ Tk
\{a, σ(a)} if σ(a) is right-scattered. Repeating this procedure, we
will finally have ψ̂1(t) = 0, ∀t ∈ Tk
\{a, . . . , σr−2(a)} if σi(a) is right-scattered
for all i ∈ {0, . . . , r− 2}. Now, the first and second equations in the system (39)
imply that ∀t ∈ A = Tk
\{a, . . . , σr−2(a)}
0 = ψ̂1(t) = −ψ0(σ(t)) = ψ0(t) + µ(t)ψ∆(t) = ψ0(t) .
We pick again the first equation to point out that ψ0(t) = c, ∀t ∈ Tk
some constant c. Since the time scale has at least 2r + 1 points (Remark 15),
the set A is nonempty and therefore ψ0(t) = 0, ∀t ∈ Tk
. Substituting this
in the second equation, we get ψ̂1(t) = 0, ∀t ∈ Tk
. As before, it follows that
ψ1(t) = d, ∀t ∈ Tk
and some constant d. But we have seen that there exists
some t0 such that ψ
1(t0) = 0, hence ψ
1(t) = 0, ∀t ∈ Tk
. Repeating this
procedure, we conclude that for all i ∈ {0, . . . , r−1}, ψi(t) = 0 at t ∈ Tk
. This
is in contradiction with Theorem 3 and we conclude that ψ0 6= 0.
Proof. (of Theorem 4) Denoting ŷ(t) = y∆(t), then problem (3) takes the fol-
lowing form:
L[y(·)] =
r−1(b)
L(t, y0(t), y1(t), . . . , yr−1(t), u(t))∆t −→ min,
ŷ0 = y1
ŷ1 = y2
ŷr−2 = yr−1
ŷr−1 = u
yi(a) = yia
ρr−1(b)
= yib
, i = 0, . . . , r − 1, yia and y
b ∈ R
System (40) can be written in the form y∆ = Ay +Bu, where
y0, y1, . . . , yr−1
y01 , . . . , y
1 , . . . , y
n, . . . , y
∈ Rnr
and the matrices A (nr by nr) and B (nr by n) are
0 I 0 · · · 0
0 0 I · · · 0
. . .
0 0 0 · · · I
0 0 0 · · · 0
, B = col{0, . . . , 0, I}
in which I denotes the n by n identity matrix, and 0 the n by n zero matrix.
From Proposition 3 we can fix ψ0 = 1: problem (40) is a particular case of (16)
with the Hamiltonian given by
H(t, y0, . . . , yr−1, u, ψ0, . . . , ψr−1)
= L(t, y0, . . . , yr−1, u) +
ψi(σ(·)) · yi+1 + ψr−1(σ(·)) · u. (41)
From (26) and (19), we obtain
ψi(σ(t)) = −
∫ σ(t)
Hyi(ξ, x(ξ), u(ξ), ψ
σ(ξ))∆ξ + ci, i ∈ {0, . . . , r − 1} (42)
0 = Hu(t, x(t), u(t), ψ
σ(t)), (43)
respectively. Equation (43) is equivalent to (34), and from (42) we get (35)-
(36).
4 An example
We end with an application of our higher-order Euler-Lagrange equation (37)
to the time scale T = [a, b] ∩ Z, that leads us to the usual and well-known
discrete-time Euler-Lagrange equation (in delta differentiated form) – see e.g.
[11]. Note that ∀t ∈ T we have σ(t) = t + 1 and µ(t) = σ(t) − t = 1. In
particular, we conclude immediately that µ(t) is r times delta differentiable.
Also for any function g, g∆ exists ∀t ∈ Tk (see Theorem 1.16 (ii) of [5]) and
g∆(t) = g(t+1)− g(t) = ∆g is the usual forward difference operator (obviously
exists ∀t ∈ Tk
and more generally g∆
exists ∀t ∈ Tk
, r ∈ N).
Now, for any function f : T → R and for any j ∈ N we have
∫ σ(t)
· · ·
︸ ︷︷ ︸
j−i integrals
, i ∈ {0, . . . , j − 1} , (44)
where f∆
(t) stands for f∆
(σj−i(t)). To see this we proceed by induction.
For j = 1
∫ σ(t)
f(ξ)∆ξ =
∫ t+1
f(ξ)∆ξ =
f(ξ)∆ξ +
∫ t+1
f(ξ)∆ξ
f(ξ)∆ξ + f(t),
and then
∫ σ(t)
f(ξ)∆ξ
= f(t) + f∆(t) = fσ. Assuming that (44) is true for
all j = 1, . . . , k, then
∫ σ(t)
· · ·
︸ ︷︷ ︸
k+1−i integrals
· · ·
︸ ︷︷ ︸
k+1−i
f∆τ +
∫ σ(t)
· · ·
︸ ︷︷ ︸
∫ σ(t)
· · ·
︸ ︷︷ ︸
∫ σ(t)
· · ·
︸ ︷︷ ︸
∆iσk−i
k+1−i
Delta differentiating r times both sides of equation (37) and in view of (44),
we obtain the Euler-Lagrange equation in delta differentiated form (remember
that y0 = y, . . ., yr−1 = y∆
= u):
r (t, y, y∆, . . . , y∆
(−1)r−iL∆
i (t, y, y
, . . . , y
∆r) = 0.
5 Conclusion
We introduce a new perspective to the calculus of variations on time scales. In all
the previous works [2, 4, 9] on the subject, it is not mentioned the motivation for
having yσ (or yρ) in the formulation of problem (1). We claim the formulation
(2) without σ (or ρ) to be more natural and convenient. One advantage of
the approach we are promoting is that it becomes clear how to generalize the
simplest functional of the calculus of variations on time scales to problems with
higher-order delta derivatives. We also note that the Euler-Lagrange equation
in ∆-integral form (8), for a Lagrangian L with y instead of yσ, follows close
the classical condition. Main results of the paper include: necessary optimality
conditions for the Lagrange problem of the calculus of variations on time scales,
covering both normal and abnormal minimizers; necessary optimality conditions
for problems with higher-order delta derivatives. Much remains to be done in
the calculus of variations and optimal control on time scales. We trust that our
perspective provides interesting insights and opens new possibilities for further
investigations.
Acknowledgments
This work was partially supported by the Portuguese Foundation for Science
and Technology (FCT), through the Control Theory Group (cotg) of the Centre
for Research on Optimization and Control (CEOC – http://ceoc.mat.ua.pt).
The authors are grateful to M. Bohner and S. Hilger for useful and stimulating
comments, and for them to have shared their expertise on time scales.
References
[1] R. Agarwal, M. Bohner, D. O’Regan, A. Peterson. Dynamic equations on time
scales: a survey, J. Comput. Appl. Math. 141 (2002), no. 1-2, 1–26.
[2] F. M. Atici, D. C. Biles, A. Lebedinsky. An application of time scales to economics,
Math. Comput. Modelling 43 (2006), no. 7-8, 718–726.
[3] Z. Bartosiewicz, E. Paw luszewicz. Realizations of linear control systems on time
scales, Control Cybernet. 35 (2006), no. 4 (in press)
[4] M. Bohner. Calculus of variations on time scales, Dynam. Systems Appl. 13
(2004), no. 3-4, 339–349.
[5] M. Bohner, A. C. Peterson. Dynamic equations on time scales: an introduction
with applications, Birkhäuser Boston, Inc., Boston, MA, 2001.
[6] I. M. Gelfand, S. V. Fomin. Calculus of variations, Dover, New York, 2000.
[7] S. Hilger. Analysis on measure chains—a unified approach to continuous and dis-
crete calculus, Results Math. 35 (1990), 18–56.
[8] S. Hilger. Differential and difference calculus—unified!, Proceedings of the Second
World Congress of Nonlinear Analysts, Part 5 (Athens, 1996). Nonlinear Anal.
30 (1997), no. 5, 2683–2694.
[9] R. Hilscher, V. Zeidan. Calculus of variations on time scales: weak local piecewise
rd solutions with variable endpoints, J. Math. Anal. Appl. 289 (2004), no. 1,
143–166.
[10] R. Hilscher, V. Zeidan. Nonnegativity and positivity of quadratic functionals in
discrete calculus of variations: survey, J. Difference Equ. Appl. 11 (2005), no. 9,
857–875.
[11] J. D. Logan. Higher dimensional problems in the discrete calculus of variations,
Internat. J. Control (1) 17 (1973), 315–320.
[12] L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze, E. F. Mishchenko. The
mathematical theory of optimal processes. Translated from the Russian by K. N.
Trirogoff; edited by L. W. Neustadt Interscience Publishers John Wiley & Sons,
Inc. New York-London, 1962.
[13] S. P. Sethi, G. L. Thompson. Optimal control theory, Second edition, Kluwer
Acad. Publ., Boston, MA, 2000.
[14] B. van Brunt. The calculus of variations, Universitext, Springer-Verlag, New York,
2004.
	Introduction
	Time scales and previous results
	Main results
	The basic problem on time scales
	The Lagrange problem on time scales
	The higher-order problem on time scales
	An example
	Conclusion
ABSTRACT
  We study more general variational problems on time scales. Previous results
are generalized by proving necessary optimality conditions for (i) variational
problems involving delta derivatives of more than the first order, and (ii)
problems of the calculus of variations with delta-differential side conditions
(Lagrange problem of the calculus of variations on time scales).

<|endoftext|><|startoftext|>
Introduction
The solar wind evacuates a cavity in the interstellar medium
(ISM) known as the heliosphere, from which interstellar ions
are excluded. In contrast, neutral interstellar gas flows through
the heliosphere until destroyed by charge-transfer with the solar
wind and photoionization. These neutrals form the parent popu-
lation of pickup ions (PUI) and anomalous cosmic rays (ACR)
observed inside of the heliosphere. The properties of the sur-
rounding interstellar medium set the boundary conditions of the
heliosphere and determine its configuration and evolution. An
ionization gradient is expected in the cloud feeding ISM into the
heliosphere because of the hardness of the interstellar radiation
field and the low opacity of the ISM (Cheng & Bruhweiler 1990;
Vallerga 1996; Slavin & Frisch 2002, hereafter SF02). Because
of this ionization gradient, the densities of partially ionized
species in the local interstellar cloud (LIC) will differ from the
values in the circumheliosphere interstellar medium (CHISM)
that forms the boundary conditions of the heliosphere. Hence
we have undertaken a series of studies to determine the bound-
ary conditions of the heliosphere based on both astronomical and
heliospheric data. In turn our results provide tighter constraints
on the heliosphere models used to calculate the filtration factors
Send offprint requests to: J. Slavin
for neutrals that then permit comparisons between ISM inside
and outside of the heliosphere.
The distribution and velocity of interstellar H0 inside
of the heliosphere were first determined over 30 years ago
from the florescence of solar Lyα radiation from these
atoms (Thomas & Krassa 1971; Bertaux & Blamont 1971;
Adams & Frisch 1977). Similar observations of solar 584Å flo-
rescence from interstellar He0 showed that H/He ∼ 6 for in-
terstellar gas inside of the heliosphere, in contrast to the cos-
mic value H/He = 10 (Ajello 1978; Weller & Meier 1981).
More recent measurements of n(H0) compared to n(He0) in-
side of the heliosphere find a similar ratio of H/He ∼ 6 − 7
(Richardson et al. 2004; Gloeckler & Geiss 2004; Witte 2004;
Möbius et al. 2004). This difference can be attributed to two ef-
fects: the loss of 40–60% of interstellar H0 due to charge-transfer
with protons in the heliosheath region, a process denoted “filtra-
tion” (Ripken & Fahr 1983), and the hardness of the interstellar
radiation field at the Sun that ionizes more He than H (§4).
The Copernicus satellite first showed that the local inter-
stellar cloud (LIC) surrounding the Sun is low density, ∼ 0.1
atoms cm−3, partially ionized (n(H+) ∼ n(H0), York 1974;
McClintock et al. 1975), and warm (temperature < 104 K, e.g.
McClintock et al. 1978). Copernicus, FUSE and HST data have
shown that the cluster of local interstellar clouds (CLIC, low
density clouds within ∼ 30 pc), has low column densities, N(H i)
< 1018.7, and N ionization levels of > 30% (e.g. Lehner et al.
http://arxiv.org/abs/0704.0657v2
2 Slavin and Frisch: Boundary Conditions of the Heliosphere
2003; Wood et al. 2005), indicating partially ionized gas because
H and N ionization are coupled by charge-transfer. The low col-
umn densities of the LIC itself, N(H i) < 1018 cm−2, indicate
it is partially opaque to H ionizing radiation but not to He ion-
izing radiation. Cloud opacities of unity are reached for N(H i)
∼ 1017.2 cm−2 for photons close to the ionization threshold of
hydrogen (13.6 eV), and N(H i) ∼ 1017.7 cm−2 for photons at the
He0 ionization edge (24.6 eV). The result is that the LIC is par-
tially ionized with a significant ionization gradient between the
edge and center. Because of this radiative transfer effects are im-
portant and need to be modeled carefully in order to determine
the boundary conditions of the heliosphere.
The LIC belongs to a flow of low density ISM embedded
in the very low density and apparently hot (T ∼ 106 K) Local
Bubble (Frisch 1981; Frisch & York 1986; McCammon et al.
1983; Snowden et al. 1990). The bulk motion of the CLIC
through the local standard of rest1 corresponds to a velocity of
−17.0 to −19.4 km s−1 from the direction ℓ ∼ 331◦, b ∼ −5◦
(Frisch & Slavin 2006). This upwind direction is near the center
of the Loop I superbubble and the center of the “ring” shadow
that has been attributed to the merging of Loop I and the Local
Bubble (Frisch & York 1986; Frisch 2007; Egger & Aschenbach
1995). Individual cloudlets with distinct velocities are identified
in this flow (Lallement et al. 1986; Frisch et al. 2002). The ve-
locity of the cloud feeding gas into the heliosphere has been de-
termined by the velocity of interstellar He0 in the heliosphere
measured by the GAS detector on Ulysses, −26.3 km s−1 (Witte
2004).
Curiously, absorption lines at the LIC velocity are not ob-
served in the nearest interstellar gas in the upwind hemisphere,
such as towards the closest star α Cen or towards 36 Oph lo-
cated ∼ 5 pc beyond the heliosphere nose (Adams & Frisch
1977; Landsman et al. 1984; Linsky & Wood 1996; Wood et al.
2000a,b). This lack of an absorption component at the LIC ve-
locity in the closest stars in the upstream direction indicates that
the Sun is near the edge of the LIC, so the CHISM may vary
over short distances (and hence timescales).
We are able to test for possible past variations by compar-
ing the first interplanetary H0 Lyα glow spectrum obtained by
Copernicus in 1975 with Hubble Space Telescope observations
of the Lyα spectra obtained during the mid-1990’s solar min-
imum conditions. The observed H0 velocity and intensity has
not varied to within uncertainties over the twenty-year period
separating these two sets of observations, so that the CHISM
velocity field is relatively smooth over spatial scales of ∼ 120
AU in the downwind direction (Frisch & Slavin 2005). The 1975
Copernicus data were acquired in the direction corresponding
to ecliptic longitudes of λ = 264.3◦, β = +15.0◦, or ∼ 13.3◦
from the most recent upwind direction derived from SOHO H i
Lyα data (Quémerais et al. 2006b,a, Q06). The Copernicus look
direction was just outside of the “groove” expected in the Lyα
glow in the ecliptic during solar minimum. The groove is caused
by increased charge-transfer in the solar wind current sheet,
which has a small tilt during minimum conditions (Bzowski
2003). The velocity of the Lyα profile observed by Copernicus
corresponds to −24.8±2.6 km s−1, after correction to the SOHO
upwind direction. Since Q06 measured H0 upwind velocities
during the solar minimum years of 1996 and 1997 of −25.7±0.2
km s−1 and −25.3 ± 0.2 km s−1, which is consistent with the
Copernicus results, we must arrive at the conclusion that the flow
of interstellar H0 into the heliosphere was relatively constant be-
1 Heliocentric motions are converted to the local standard of rest,
LSR, using the Standard solar apex motion.
tween the years of 1975 and 1997, so that any observations of
variations in the interplanetary Lyα glow properties must be due
to solar activity properties alone. The thermal broadening of the
Copernicus spectrum corresponded to a temperature of ∼ 5400
K, however measurement uncertainties allowed temperatures of
up to 20, 000 K.
In this paper we present new photoionization models of the
LIC and show they reproduce both the densities of the ISM at the
heliosphere and column densities in the LIC component towards
the star ǫ CMa. In earlier papers, photoionization models of both
the “LIC” and “Blue Cloud” components observed towards ǫ
CMa were used to constrain the models (SF02, Frisch & Slavin
2003). SF02 grouped the properties of the LIC and Blue Cloud,
both of which are with 2.7 pc of the Sun because they are also
observed towards Sirius, ∼ 12◦ from the ǫ CMa sightline. The
LIC velocity and density are sampled by observations of inter-
stellar He i inside of the heliosphere (−26.3 km s−1, Witte 2004),
and Gry & Jenkins (2001) show that the properties of the LIC
and Blue Cloud differ somewhat. The present study therefore fo-
cuses on obtaining the best model of heliosphere boundary con-
ditions by using only data on the LIC inside of the heliosphere,
and the LIC component towards ǫ CMa. The present study also
benefits from new atomic rates for the critical Mg ii→Mg i di-
electronic recombination coefficient, improved cooling rates in
the radiative transfer code Cloudy, recent values for the pickup
ion densities at the termination shock, and recent values for solar
abundances.
The most truly unique quality of the LIC is that we are inside
of it and therefore have the ability to sample the cloud directly
via in situ observations carried out by a variety of spacecraft
within the Solar System. Of all the available measurements of
LIC gas flowing into the Solar System, the observations of the
density and temperature of neutral He are apparently the most
robust. Helium, unlike hydrogen, undergoes little ionization or
heating in traversing the heliosheath regions, no deflection due
to radiation pressure, and is destroyed by photoionization and
electron-impact ionization within ∼ 1 AU of the Sun, so we ex-
pect that the density and temperature of He0 derived from the
observations in the Solar System are truly representative of the
values in the LIC (Möbius et al. 2004). In this paper we put a
special emphasis on matching these He i data by determining
model parameters that yield close agreement with the n(He0) and
T (He0) data simultaneously.
2. Photoionization Model Constraints and
Assumptions
The primary data constraints on our photoionization mod-
els are the LIC component column densities towards ǫ CMa,
(Gry & Jenkins 2001, hereafter GJ01), in situ observations
of He0 (Witte 2004; Möbius et al. 2004), pickup ions (PUI,
Gloeckler & Fisk 2007), and anomalous cosmic rays (ACR,
Cummings et al. 2002). The astronomical and in situ observa-
tional constraints are summarized in Table 1.
2.1. Astronomical Constraints – The LIC towards ǫ CMa
The data toward ǫ CMa of GJ01 show four separate veloc-
ity components detected in several different ions including C ii,
Si iii, Si ii, and Mg ii. One of these velocity components, with
a heliocentric velocity of ∼ 17 km s−1, is identified as the LIC.
Another is identified with a second local cloud, the “Blue Cloud”
(BC) at ∼ 10 km s−1. In our previous study of LIC ioniza-
Slavin and Frisch: Boundary Conditions of the Heliosphere 3
Table 1. Observational Constraints
Observed Observeda Notesb
Quantity Value
N(C ii) (cm−2) 1.4 − 2.1 × 1014 1, S
N(C ii∗) (cm−2) 1.3 ± 0.2 × 1012 1
N(C iv) (cm−2) 1.2 ± 0.3 × 1012 1
N(N i) (cm−2) 1.70 ± 0.05 × 1013 1
N(O i) (cm−2) 1.4+0.5
−0.2 × 10
14 1, S
N(Mg i) (cm−2) 7 ± 2 × 109 1
N(Mg ii) (cm−2) 3.1 ± 0.1 × 1012 1
N(Si ii) (cm−2) 4.52 ± 0.2 × 1012 1
N(Si iii) (cm−2) 2.3 ± 0.2 × 1012 1
N(S ii) (cm−2) 8.6 ± 2.1 × 1012 1
N(Fe ii) (cm−2) 1.35 ± 0.05 × 1012 1
N(H i)/N(He i) 14 ± 0.4c 2
T (K) 6300 ± 340 4
n(He0) (cm−3) 0.015 ± 0.003 4, f = 1d
n(N0)e (cm−3) 5.47 ± 1.37 × 10−6 3, f = 0.68 − 0.95d
n(O0)e (cm−3) 4.82 ± 0.53 × 10−5 3, f = 0.64 − 0.99d
n(Ne0)e (cm−3) 5.82 ± 1.16 × 10−6 3, f = 0.84 − 0.95d
n(Ar0)e (cm−3) 1.63 ± 0.73 × 10−7 3, f = 0.53 − 0.95d
a N(C ii∗), N(N i), N(O i), N(Mg ii), N(Si ii), N(S ii) and N(Fe ii) are
used to constrain the input abundances of the models.
b The first number indicates the reference (below), and “S” indi-
cates that the column density is based on a saturated line. Filtration
factors ( f , §2.2) are listed for neutrals that have been observed as
pickup ions.
c The uncertainty given is only that due to uncertainties listed in
Dupuis et al. (1995) for the observed H i and He i column densities
with the implicit assumption that the ratio is the same on all lines of
sight. Given the substantial intrinsic variation in this ratio, however,
the quoted uncertainty must be regarded as a lower limit to the true
uncertainty.
d We assume in this paper that for He fHe = 1, and f is not allowed
to exceed 1. Heliosphere models predict the range fHe = 0.92 − 1
for He and fH = 0.50 − 0.74 for H (see filtration factors in
Cummings et al. 2002; Müller & Zank 2004a; Müller et al. 2007;
Izmodenov et al. 2004).
e Note that the pickup ion ratios are for values at the termination
shock. N0, O0, Ne0, and Ar0 densities need to be corrected for fil-
tration in the heliosheath regions (2.2).
References: (1) Gry & Jenkins (2001) Note that the values shown are
for the LIC component towards ǫ CMa.; (2) Dupuis et al. (1995); (3)
Gloeckler & Fisk (2007); (4) Witte (2004).
tion (SF02) we noted that since the BC is close to the LIC,
and both are detected towards Sirius at 2.7 pc (Lallement et al.
1994; Hébrard et al. 1999), perhaps actually abutting the LIC,
we should treat the two as a single cloud for this line of sight.
Counter arguments to this line of reasoning include the fact that
the BC apparently has different properties than the LIC, though
whether it is colder or hotter is unclear (see Hébrard et al. 1999,
GJ01). For this reason and because we wish to find the ioniza-
tion models that best predict interstellar neutral densities inside
and outside of the heliosphere, we assume in this paper that the
LIC and BC are truly separate clouds and select only the LIC
components towards ǫ CMa to constrain the models.
In the context of these models, the best astronomical con-
straints on the neutral ISM component are N i, the saturated O i
line, and to some extent Mg i (though the line is weak). The elec-
tron density can be deduced from the ratio Mg ii/Mg i, which is
determined by photoionization and both dielectronic and radia-
tive recombination, and the excitation of the C ii fine-structure
lines, C ii/C ii∗. However, the heavy saturation of the C ii 1335
Å line in the ISM limits the accuracy of the determination of
N(C ii), so in this paper we have de-emphasized C ii/C ii∗ as an
ionization diagnostic. For a discussion of the use of C ii/C ii∗ and
other observations in diagnosing the C abundance and C/S ratio
see Slavin & Frisch (2006). Elements with first ionization po-
tentials (FIP) < 13.6 eV (e.g. Mg, Si, S, and Fe) are generally
almost entirely singly ionized in the LIC and thus the column
densities of these ions are close to the total column densities for
the elements.
2.2. In situ Constraints – He0, Pickup Ions, and Anomalous
Cosmic Rays
Neutral atoms in the LIC penetrate the outer heliosphere regions,
and become ionized primarily by charge-transfer with the so-
lar wind ions, photoionization, and electron impact ionization
(Rucinski et al. 1996). The composition of this neutral popula-
tion reflects the partially ionized state of the LIC, rather than
indicating a pure FIP effect. Thus the observed abundances of
neutrals representing high FIP elements, such as He, Ne, and Ar,
as well as H, N, and O, do not reflect their elemental abundances
directly. As a result, these neutrals provide an interesting and
unique constraint on the photoionization models. Once ionized,
these interstellar wind particles form a population of ions with
a distinct velocity distribution that are “picked up” by the solar
wind and convected outwards, where they are measured by var-
ious spacecraft (Möbius et al. 2004; Gloeckler & Geiss 2004).
These pickup ions (PUI) are accelerated in the heliosheath re-
gion and form a population of cosmic rays with an anomalous
composition reflective of their origin as interstellar neutrals in
a partially ionized gas (Cummings et al. 2002). In situ observa-
tions of these byproducts of the ISM interaction with the helio-
sphere, H, He, N, O, Ar, and Ne, provide a unique opportunity
to constrain theoretical models of an interstellar cloud using the
combination of sightline-integrated data, and data from a “sin-
gle” spatial location, the heliosphere.
We adopt the Ulysses He measurements as the best set of
constraints on the ISM inside of the heliosphere. The Ulysses
satellite provides direct measurements of interstellar He0 at
high ecliptic latitudes throughout the solar cycle (the GAS de-
tector, Witte 2004) and also measurements of the He pickup
ion component (the SWICS detector Gloeckler & Geiss 2004;
Gloeckler & Geiss 2007). Although interstellar He0 close to the
Sun is detected through the resonant scattering of solar 584 Å
radiation, geocoronal contamination of the interstellar signal is
present so we prefer the Ulysses data (Möbius et al. 2004). We
adopt the Ulysses GAS and SWICS results, n(He0) = 0.0151 ±
0.0015 cm−3, T (He0) = 6, 300 ± 340 K.
Data on the density of neutral N, O, Ne, and Ar in the sur-
rounding ISM are provided by PUI and ACR data. The densities
of interstellar N0, O0, Ne0, and Ar0 at the termination shock are
listed in Table 1. These densities must be corrected by the fil-
tration factors, which correspond to the ratios of the densities
at the termination shock to those in the LIC. Filtration occurs
when neutral interstellar atoms are removed from the inflow by
charge-transfer with interstellar protons as the atom crosses the
heliosheath regions. Filtration values are listed in Table 1, based
on values in Cummings et al. (2002); Müller & Zank (2004b);
Izmodenov et al. (2004). Filtration values larger than 1 are not
considered, although some models suggest possible net creation
of O0 through charge-transfer between O+ and H0 in the he-
liosheath regions (Müller & Zank 2004a). We adopt fHe = 1 for
He. Our models must be compared to the interstellar densities
4 Slavin and Frisch: Boundary Conditions of the Heliosphere
Fig. 1. The modeled interstellar radiation field at the Sun (model
26) is shown as a function of wavelength (bottom X-axis) and
energy (top X-axis). The black histogram is the modeled hot
gas (i.e. Local Bubble) spectrum while the gray histogram is
the cloud boundary contribution. The other line is the stellar
EUV/FUV background. The list of elements at the top of the
plot identifies the ionization potentials for neutrals and ions of
interest. The energy/wavelength at which an optical depth of 1 is
reached for several different H i column densities is shown along
the bottom of the plot. Observed flux levels of the soft X-ray dif-
fuse background in the Wisconsin Be and B bands are plotted as
lines (Bloch et al. 1986; McCammon et al. 1983).
obtained by correcting densities at the termination shock by fil-
tration factors.
2.3. Interstellar Radiation Field at the Cloud Surface
The spectrum and flux of the cosmic radiation field control
the ionization of the very local warm partially ionized medium
(WPIM). The local interstellar radiation field (ISRF) that ionizes
the LIC and other nearby cloudlets is determined by the location
of the Sun in the interior of a hole in the neutral interstellar gas
and dust referred to as the Local Bubble. The clustering tendency
of hot O and B stars, and attenuation of radiation by interstellar
dust and gas, yield the well known spatial variation of the in-
tensity and spectrum of the ISRF. We ignore possible temporal
variations of the radiation field (e.g. Parravano et al. 2003), and
model the ISRF throughout the LIC based on the observations
of the present-day radiation field.
The radiation field we use in our models is based on obser-
vations of the ISRF at the Sun, supplemented by theoretical cal-
culations of the spectra in the EUV and soft X-rays where lack
of sensitivity and/or spectral resolution require the use of mod-
els to create a realistic spectrum. The directly observed radiation
field includes the far ultraviolet (FUV) field created primarily
by B stars, the extreme ultraviolet (EUV) field from two B stars
(ǫ CMa and β CMa) and hot nearby white-dwarf stars, and the
diffuse soft X-ray background (SXRB). We show the radiation
field at the cloud surface in Fig. 1. Because the interstellar opac-
ity for radiation with E > 13.6 eV is vastly greater than for
E < 13.6 eV, the EUV part of the ISRF originates in nearby
regions with N(H i) <∼ 10
18 cm−2 while the FUV comes from a
much larger volume.
We use the EUV field of Vallerga (1998), which is based
on data collected by the Extreme Ultraviolet Explorer (EUVE)
satellite. The EUVE spectrometers were sensitive over the wave-
length range of 504 – 730 Å and showed that the stellar part of
the EUV background is dominated by ǫ CMa and β CMa with
substantial contributions from nearby hot white dwarfs at shorter
wavelengths. Vallerga extrapolated those measurements to the
H0 ionization edge at 912 Å using a total interstellar H0 density
towards ǫ CMa of N(H i) = 9 × 1017 cm−2. This value for N(H i)
appears somewhat high based on observations of Gry & Jenkins
(2001) which, though dependent on assumptions for gas phase
abundances, indicate a value for N(H i) of ∼ 7× 1017 cm−2. This
uncertainty in the total N(H i) affects only the extrapolated por-
tion of the spectrum in our models since the other portion of the
spectrum is derived by de-absorbing the observed spectrum by
the value of N(H i) assumed just for the LIC. For the results pre-
sented here we have assumed that the total N(H i) towards ǫ CMa
is 7×1017 cm−2. Assuming a larger value would increase the flux
in the extrapolated region, though our calculations indicate that
the overall affect on our results is only at the ∼ 2% level at most.
The FUV field is important because it sets the ionization rate
of Mg0, which has a first ionization potential of 7.65 eV (1621
Å). The radiation field shortwards of 1600 Å is heavily domi-
nated by O and B stars in Gould’s Belt, particularly those oc-
cupying the unattenuated regions of the third and fourth galac-
tic quadrants. A pronounced spatial asymmetry in the 1565 Å
radiation field has been observed by the TD-1 satellite S2/68
telescope survey of the interstellar radiation field, and we use
those data and the extrapolation down to 912 Å from the 1564 Å
measurements as calculated by Gondhalekar et al. (1980). The
asymmetries in the TD-1 1565 Å radiation field are reproduced
by diffuse interstellar radiation field models (Henry 2002).
The diffuse soft X-ray background (SXRB) has been ob-
served over the entire sky at relatively low spatial and spectral
resolution by ROSAT (Snowden et al. 1997) and proportional
counters flown on sounding rockets by the Wisconsin group
(e.g., McCammon et al. 1983). The broadband count rates in
the low energy bands, particularly the B and C bands (130–
188 eV and 160–284 eV respectively) have been modeled as
coming from an optically thin, hot plasma at a temperature of
∼ 106 K that occupies the low density cavity extending to ∼ 50–
200 pc from the Sun in all directions (Snowden et al. 1990).
Refinements to this picture have been required by ROSAT data
showing absorption by relatively distant clouds (e.g. MBM12,
Snowden et al. 1993). Snowden et al. (1998) propose a picture
in which the emission is divided between a LB component (un-
absorbed except for the LIC) and a distant absorbed component,
mainly in the Galactic halo. More recently there has been grow-
ing evidence that some, possibly large, fraction of the SXRB is
generated within the heliosphere from charge exchange between
solar wind ions and neutral atoms (SWCX). We discuss this fur-
ther in §5.6.
We model the spectrum measured by the broad-band soft X-
ray observations by assuming that the SXR background consists
of a local (unabsorbed) component and a distant (halo) com-
ponent absorbed by an H i column density of 1019 cm−2. The
emission measure or intrinsic intensity of the local and distant
components are assumed to be equal. The spectrum is calculated
using the Raymond & Smith (1977, updated) plasma emission
code assuming a hot, optically thin plasma in collisional ion-
ization equilibrium. We explore temperatures for the hot gas of
log Th = 5.9, 6.0 and 6.1. The total flux, scaled by the emission
measure, is fixed so that the B band flux matches the all-sky av-
erage from McCammon et al. (1983).
Slavin and Frisch: Boundary Conditions of the Heliosphere 5
The boundary region between the warm LIC and the adja-
cent Local Bubble plasma may be another significant source of
EUV radiation and we include this flux in our models 1–30. We
model this transition region as a conductive interface between
the LIC and Local Bubble plasma in the same way as described
in Slavin & Frisch (2002). The cloud is assumed to be steadily
evaporating into the surrounding hot gas. The partially ionized
gas of the LIC is heated and ionized as it flows into the Local
Bubble. The ionization falls out of equilibrium, with low ion-
ization stages persisting into the hot gas. The non-equilibrium
ionization is followed and the emission in the boundary is calcu-
lated again using the Raymond & Smith (1977) code. Ions in the
outflow are typically excited several times before being ionized
and the boundary region radiates strongly in the 13.6 − 54.4 eV
band. The contribution of the interface emission to the total B
band flux is taken into account in the calculation of the emission
measure for the hot gas of the Local Bubble so that the B band
flux still matches the all-sky average.
We note that no attempt is made to make this model con-
sistent with the size of the local cavity, which is proposed to
contain the hot gas in the standard model for the SXRB (e.g.,
Snowden et al. 1998). The pressure in the hot gas is not adjusted
to fit such models. Rather the total pressure in the hot gas for an
evaporating cloud model is dictated by the assumed density, tem-
perature and magnetic field in the cloud. In fact the pressures in
our models come out far too low for the standard model to pro-
duce the SXRB within the confines of the Local Cavity as de-
duced, e.g., from Na i observations (Lallement et al. 2003). This
in turn means that if the thermal pressure in the Local Bubble
turns out to be much lower than was assumed in those models
because a substantial fraction of the SXRB comes from SWCX
then the emission from the cloud boundary would be unaffected.
An example of the profiles of the hydrodynamic variables
for two different models is shown in Fig. 2. The magnetic field
strength in these calculations is assumed to be proportional to
the density at every point in the outflow. The treatment here is
the same as in Slavin (1989) which contains a more thorough
discussion of the issues involved in this sort of calculation. The
effect of the field on the conductivity is parametrized in a simple
way by a constant reduction factor of 0.5. The importance of the
magnetic field for our calculations lies in the way it affects the
thermal pressure in the layer. Since the density drops sharply in
the outflow and |B| ∼ n, any magnetic pressure (∼ B2) drops
even more sharply. The total pressure is roughly constant in the
boundary, so the thermal pressure necessarily rises to make up
for the decreasing magnetic pressure. Since all our models have
nearly the same thermal pressure in the cloud, the primary effect
of the magnetic field is to help determine the thermal pressure
in the interface and thus radiative flux from the boundary. Since
the total soft X-ray flux is fixed by requiring a match with the
Wisconsin B band all-sky average count rate (McCammon et al.
1983), a larger assumed magnetic field increases only the EUV
flux which is not constrained by the B-band data. The affect on
the cloud of a larger EUV flux is to increase its temperature and
ionization.
A secondary effect of the magnetic field can be seen in the
temperature profiles in Figure 2. In evaporating clouds in the
ISM, if the temperature gradient is large enough and the thermal
pressure is low enough, the conduction becomes “saturated”,
which means that the heat flux expected from the gas temper-
ature and temperature gradient exceeds the flux that can be car-
ried by the electrons (Cowie & McKee 1977). Saturation leads
to a steepening of the temperature gradient and a relative slow-
ing of the mass loss rate. In the two cases shown in Figure 2,
Fig. 2. Profiles of hydrodynamic variables in two different cloud
boundary calculations. The solid lines are for model 6 which has
nH = 0.273 cm
−3, log Th = 6.0, B0 = 2 µG and NHI = 4.5 × 10
cm−2, while the dashed lines are for model 8, which differs from
model 6 only in the strength of the magnetic field, B0 = 5 µG.
Note that in the upper right panel, the plot of thermal pressure,
the radial scale is much smaller in order to show the variation in
thermal pressure. The higher magnetic pressure inside the cloud
for model 8 leads to the higher external thermal pressure for that
case. The temperature profile differs in the two cases because the
degree of heat flux saturation is reduced for the higher thermal
pressure of model 8, which in turn leads to a shallower tempera-
ture gradient.
the lower magnetic field case (B0 = 2 µG) is moderately satu-
rated (in terms of Cowie & McKee’s parameter, σ0 = 3) while
the other case (B0 = 5 µG), because of the higher magnetic pres-
sure, is less saturated.
3. Photoionization Models
The photoionization models of the LIC are developed follow-
ing the same underlying procedure as in SF02, but selecting
only the LIC absorption component towards ǫ CMa for com-
parison. Improvements include using updated values for in situ
ISM observations and using a recent version of the Cloudy
radiative transfer/thermal equilibrium code (version 06.02.09a,
Ferland et al. 1998). We run Cloudy with the assumption of a
plane-parallel cloud geometry and format our calculated pho-
toionizing spectrum to be used as input. The selected options
include utilizing recent calculations for the dielectronic recom-
bination rates for Mg+ → Mg0 (e.g. Altun et al. 2006, and the
commands “set dielectronic recombination Badnell” and “set ra-
diative recombination Badnell”) , the assumption of constant
pressure, and the inclusion of interstellar dust grains at 50%
abundance compared to a standard ISM value. As we discuss
below, the only role that dust plays is net heating of the gas since
there is far too little column for extinction to be important (ex-
pected E(B − V) ∼ 10−4.2). The fraction of the heating provided
by dust is ∼ 4% of the total heating, while dust provides ∼ 2% of
the cooling (mainly by capture of electrons onto grain surfaces).
A cosmic ray ionization rate at the default background level of
2.5×1017 s−1 (for H ionization) is included. The LIC is assumed
to be in ionization and thermal equilibrium, and Cloudy calcu-
lates the detailed transfer of radiation, including absorption and
6 Slavin and Frisch: Boundary Conditions of the Heliosphere
Table 2. Model Input Parameter Values
Input Parameter
nH log Th B0 NHI
Model No. (cm−3) (K) (µG) (1017 cm−2)
1 0.273 5.9 2.0 3.0
2 0.273 5.9 2.0 4.5
3 0.273 5.9 5.0 3.0
4 0.273 5.9 5.0 4.5
5 0.273 6.0 2.0 3.0
6 0.273 6.0 2.0 4.5
7 0.273 6.0 5.0 3.0
8 0.273 6.0 5.0 4.5
9 0.273 6.1 2.0 3.0
10 0.273 6.1 2.0 4.5
11 0.273 6.1 5.0 3.0
12 0.273 6.1 5.0 4.5
13 0.218 5.9 2.0 3.0
14 0.218 5.9 2.0 4.5
15 0.218 5.9 5.0 3.0
16 0.218 5.9 5.0 4.5
17 0.218 6.0 2.0 3.0
18 0.218 6.0 2.0 4.5
19 0.218 6.0 5.0 3.0
20 0.218 6.0 5.0 4.5
21 0.218 6.1 2.0 3.0
22 0.218 6.1 2.0 4.5
23 0.218 6.1 5.0 3.0
24 0.218 6.1 5.0 4.5
25 0.226 5.9 4.7 3.0
26 0.213 5.9 2.5 4.0
27 0.226 6.0 3.8 3.0
28 0.216 6.0 2.1 4.0
29 0.232 6.1 3.4 3.0
30 0.223 6.1 0.05 4.0
42 0.218 6.1 – 4.5
scattering of the radiation incident on the cloud surface, as well
as the diffuse continuum and emission lines generated within the
cloud.
The procedure for creating a model begins with generating
the incident radiation field at the cloud surface (§2.3). The ra-
diative transfer model is then run, and the output predictions of
the model are compared to observations of interstellar absorp-
tion lines in the LIC towards ǫ CMa (N(C ii∗), N(N i), N(O i),
N(Mg ii), N(S ii), N(Si ii), N(Fe ii)) and in situ observations of
n(He0) by spacecraft inside of the solar system. The abundances
of C, N, O, Mg, Si, S and Fe are then adjusted to be consistent
with the observed column densities. With these new abundances,
the Cloudy run is repeated and this process is continued until no
adjustment of the abundances is needed. Because the abundances
do have an impact on the emission from the cloud boundary, the
cloud evaporation model is then re-run with the new abundances
as well to re-generate the input ionizing spectrum. The iterative
process of generating the spectrum and doing the Cloudy pho-
toionization runs generally requires only a couple runs of the
cloud evaporation program and a few runs of Cloudy.
4. Model Results
To investigate the dependence of the results on the input pa-
rameters we calculate a grid of 24 models. We explore total H
density, nH = 0.273, 0.218; Local Bubble hot gas temperature,
log Th = 5.9, 6.0, 6.1; magnetic field strength, B0 = 2, 5 µG; and
H i column density, NH I = 3 × 10
17, 4.5 × 1017 cm−2. The val-
Fig. 3. Model results for the He0 density and temperature in the
ISM just outside the heliosphere. The squares, circles, and trian-
gles are for models that are part of the initial grid of 24 models,
while the stars are for models 25 − 30 for which the magnetic
field, B, and the total H density, nH, were varied to match the ob-
served n(He0) and T . For the grid models, the empty symbols are
for models with N(H i)= 3 × 1017 cm−2, while the filled symbol
models have N(H i)= 4.5 × 1017 cm−2. As the legend shows, the
color (black vs. gray) indicates the magnetic field strength and
the symbol shape indicates the temperature assumed for the hot
gas of the Local Bubble. For identical kinds of points, the one to
the left is for a model with n(H0) = 0.218 cm−3 and the one to
the right has n(H0) = 0.273 cm−3. The ellipse is the error range
around the observed values for n(He0) and T .
ues for T and n(He i) at the solar location, the endpoint of the
calculations, are shown in Fig. 3 for this set of models. We then
explore another six models in which we do the calculations over
a grid in log Th and NH I but vary the values of nH and B0 in or-
der to match the observed T and n(He i). We employed a multiple
linear regression to assist in narrowing down the search region
for the values of nH and B0 needed to match the observations.
Since the dependencies of the results on the parameters really
are quite non-linear, this procedure could not work to predict
exactly correct values for the required parameters, but was use-
ful for getting close to the correct values. Based on the results
for the initial grid of models, we use log Th = 5.9, 6.0, 6.1 and
NH I = 3 × 10
17, 4 × 1017 cm−2 for this smaller grid, models
25 − 30. We chose to use 4 × 1017 cm−2 rather than 4.5 × 1017
cm−2 because the higher column density models, for the most
part, produced temperatures that were too high. As can be seen
from Fig. 3, all of these models (25–30), plotted as stars, are con-
sistent with the observed T and n(He i), indicated by the ellipse
in the figure. Table 2 gives the input parameters for each model.
Model predictions for the ǫ CMa sightline integrated through the
LIC are presented in Table 3. Model predictions for the CHISM
(i.e. at the solar location) are shown in Table 4.
In SF02 we tested models with no emission from the cloud
boundary. This amounts to assuming that the boundary is a sharp
transition from the hot gas of the Local Bubble to the warm gas
of the LIC. We have again explored such models using the LIC
data as constraints in the same way as for the models discussed
above with a conductive interface. When there is no evaporative
boundary, our models do not depend on the magnetic field in the
cloud since in that case the ionizing flux consists only of dif-
fuse emission from the hot gas of the Local Bubble and EUV
Slavin and Frisch: Boundary Conditions of the Heliosphere 7
Table 3. Model Column Density Results
Modela log N(Htot) log N(Ar i) log N(Ar ii) log N(Si iii)
N(Mg II)
N(Mg I)
N(C II)
N(C II∗)
N(H I)
N(He I)
Obs.b – – – 12.40 443+197
−110 93 − 430
c 12 − 16
1 17.58 11.48 11.76 8.895 706.3 213.0 11.36
2 17.78 11.67 11.95 9.641 325.3 234.1 11.40
3 17.65 11.33 11.84 9.844 246.1 182.1 11.44
4 17.82 11.55 12.00 10.12 136.5 198.0 11.80
5 17.58 11.49 11.75 8.961 717.8 230.0 11.62
6 17.78 11.67 11.94 9.711 286.7 244.9 11.62
7 17.64 11.34 11.81 9.909 207.4 194.8 12.47
8 17.82 11.56 11.97 10.17 119.7 208.2 12.66
9 17.59 11.46 11.75 9.289 553.0 239.5 12.27
10 17.78 11.66 11.93 9.861 210.3 248.1 12.14
11 17.64 11.33 11.78 9.993 171.2 201.7 13.44
12 17.82 11.54 11.95 10.23 104.3 210.6 13.52
13 17.60 11.44 11.78 9.100 768.9 255.1 11.52
14 17.79 11.64 11.96 9.738 336.4 274.4 11.47
15 17.67 11.30 11.85 9.944 251.2 215.9 11.65
16 17.84 11.51 12.01 10.21 140.6 232.1 12.01
17 17.58 11.47 11.76 8.926 840.1 262.5 11.65
18 17.79 11.65 11.95 9.792 308.9 286.3 11.64
19 17.66 11.30 11.82 10.02 208.5 229.2 12.76
20 17.84 11.52 11.98 10.26 123.5 242.1 12.95
21 17.59 11.43 11.76 9.318 633.1 280.8 12.41
22 17.79 11.63 11.94 9.926 224.9 288.4 12.25
23 17.66 11.29 11.79 10.10 175.0 235.5 13.81
24 17.84 11.50 11.96 10.31 110.2 245.5 13.85
25 17.66 11.32 11.84 9.887 271.3 213.6 11.71
26 17.74 11.56 11.91 9.676 383.9 270.1 11.70
27 17.63 11.36 11.79 9.751 331.1 242.0 12.51
28 17.73 11.59 11.90 9.625 411.6 284.5 11.79
29 17.62 11.38 11.77 9.720 329.3 251.6 13.01
30 17.72 11.60 11.88 9.642 386.6 295.1 12.08
42 17.77 11.70 11.92 9.555 482.9 317.9 11.51
a The best models, consistent with all observational data (see §4), are indicated by bold face.
b Observational results from Gry & Jenkins (2001) (see table 1). The values listed for N(H i)/N(He i) are the range of values observed excluding
Feige 24 which is one of the most distant stars observed by Dupuis et al. (1995) and has unusually large N(H i) and ratio values.
c The upper limit on N(C ii) is not well determined observationally because of the saturation of the line. Gry & Jenkins (2001) define it by
assuming a solar C/S abundance ratio and using the observed S ii column density. We find (see Slavin & Frisch 2006) that the abundance of C
required to match the N(C ii∗) observations is supersolar, with C/S∼ 36 − 44, which results in a much higher upper limit on N(C ii).
emission from nearby hot stars and the spectrum is not related
to the properties of the cloud. For these models our model grid
consists of total H density, nH = 0.273, 0.218, Local Bubble hot
gas temperature, log Th = 5.9, 6.0, 6.1, and H i column density,
NH I = 3×10
17, 4.5×1017 cm−2, for a total of twelve models that
we label as models 31–42. Of these models, however, only two
resulted in ionizing fluxes sufficient to heat the cloud to a tem-
perature ∼ 6000 K at the same time as matching the constraints
on the ion column densities. These models were the ones with
log Th = 6.1 and NH I = 4.5 × 10
17 cm−2 (models 36 and 42).
For the other cases, either at the surface or deeper into the cloud,
there are insufficient photons to provide the heating to balance
the cooling and the cloud temperature drops sharply to < 1000
K. The model with nH = 0.218 (model 42) is consistent with the
observed value of n(He0) and with the column density data as
well.
From these 42 models we have selected those that provide
acceptable results for the observational constraints, according to
prioritized requirements. The first requirement is that the model
predict the density and temperature of He i observed inside of
the solar system. Models 14 (marginally), 15 and 25–30 and 42
predict a He density and temperature consistent with the ob-
served values within the reported errors. They also predict the
PUI Ne densities, for an assumed Ne/H=123 ppm. Models 26
and 28 successfully match the PUI data for Ne/H as low as
∼ 100 ppm. These models are also required to match the ob-
served Mg ii/Mg i ratio in the LIC towards ǫ CMa. Models 14,
27, and 29 marginally fit this criterion, while 26, 28, 30, and 42
successfully fit this criterion. Note that the new models now pro-
vide acceptable predictions for the CHISM temperature, which
was not the case for the best models in SF02. These models are
also consistent with the observed C ii/C ii∗ ratios in the LIC com-
ponent towards ǫ CMa though this constraint is weak because of
the large uncertainties in N(C ii). Based on these comparisons,
and given the uncertainties in both data and models, Models 14,
26-30, and 42 are plausible models, but Models 26 and 28 appear
to best match the observational constraints.
Based on the constraints and assumptions presented in the
previous section, we select models 14, 26–30 and 42 as the best
models for the LIC ionization, with models 26 and 28 favored
by the PUI Ne data provided that Ne/H > 100 ppm. We believe
that these seven models then give a realistic range for the uncer-
tainties in the boundary conditions of the heliosphere, providing
that the underlying assumptions implicit in the Cloudy code, e.g.
8 Slavin and Frisch: Boundary Conditions of the Heliosphere
Table 4. Model Results for Solar Location
Modela X(H) X(He) n(H0) n(He0) n(N0) n(O0) n(Ne0) n(Ar0) ne np T
Obs.b 0.015 5.5 × 10−6 4.8 × 10−5 5.8 × 10−6 1.6 × 10−7 6300
1 0.176 0.299 0.318 0.0269 1.84 × 10−5 1.48 × 10−4 1.39 × 10−5 3.47 × 10−7 0.0800 0.0678 4310
2 0.196 0.352 0.251 0.0201 9.63 × 10−6 7.71 × 10−5 8.51 × 10−6 2.69 × 10−7 0.0724 0.0611 6450
3 0.286 0.404 0.230 0.0191 1.34 × 10−5 1.07 × 10−4 7.81 × 10−6 1.82 × 10−7 0.106 0.0922 6280
4 0.259 0.428 0.229 0.0175 8.86 × 10−6 6.95 × 10−5 6.45 × 10−6 1.92 × 10−7 0.0937 0.0799 7560
5 0.174 0.310 0.300 0.0249 1.72 × 10−5 1.39 × 10−4 1.13 × 10−5 3.30 × 10−7 0.0750 0.0631 4610
6 0.197 0.363 0.242 0.0190 9.13 × 10−6 7.44 × 10−5 7.08 × 10−6 2.61 × 10−7 0.0707 0.0593 6770
7 0.278 0.441 0.225 0.0173 1.33 × 10−5 1.04 × 10−4 5.75 × 10−6 1.78 × 10−7 0.101 0.0866 6670
8 0.257 0.459 0.223 0.0161 8.65 × 10−6 6.79 × 10−5 4.92 × 10−6 1.89 × 10−7 0.0918 0.0773 7860
9 0.193 0.357 0.265 0.0209 1.53 × 10−5 1.23 × 10−4 7.52 × 10−6 2.71 × 10−7 0.0761 0.0634 5410
10 0.203 0.392 0.235 0.0177 9.06 × 10−6 7.24 × 10−5 5.52 × 10−6 2.43 × 10−7 0.0723 0.0600 7230
11 0.278 0.474 0.220 0.0158 1.29 × 10−5 1.02 × 10−4 4.42 × 10−6 1.69 × 10−7 0.0999 0.0847 7020
12 0.262 0.490 0.219 0.0148 8.41 × 10−6 6.66 × 10−5 3.83 × 10−6 1.78 × 10−7 0.0928 0.0776 8160
13 0.201 0.333 0.230 0.0191 1.34 × 10−5 1.08 × 10−4 8.94 × 10−6 2.32 × 10−7 0.0681 0.0580 4720
14 0.215 0.376 0.193 0.0152 7.29 × 10−6 5.95 × 10−5 5.99 × 10−6 1.94 × 10−7 0.0625 0.0528 6650
15 0.310 0.436 0.176 0.0143 1.02 × 10−5 8.18 × 10−5 5.35 × 10−6 1.28 × 10−7 0.0905 0.0789 6470
16 0.284 0.461 0.175 0.0131 6.76 × 10−6 5.36 × 10−5 4.40 × 10−6 1.35 × 10−7 0.0812 0.0695 7750
17 0.178 0.315 0.255 0.0211 1.47 × 10−5 1.19 × 10−4 9.51 × 10−6 2.75 × 10−7 0.0656 0.0553 4360
18 0.214 0.383 0.188 0.0146 7.16 × 10−6 5.81 × 10−5 5.07 × 10−6 1.91 × 10−7 0.0610 0.0513 6900
19 0.302 0.472 0.173 0.0129 1.01 × 10−5 8.03 × 10−5 3.93 × 10−6 1.26 × 10−7 0.0872 0.0748 6860
20 0.282 0.491 0.172 0.0120 6.61 × 10−6 5.26 × 10−5 3.37 × 10−6 1.33 × 10−7 0.0802 0.0677 8050
21 0.204 0.374 0.211 0.0164 1.20 × 10−5 9.85 × 10−5 5.54 × 10−6 2.04 × 10−7 0.0648 0.0541 5390
22 0.221 0.414 0.184 0.0136 6.96 × 10−6 5.69 × 10−5 3.90 × 10−6 1.78 × 10−7 0.0628 0.0523 7350
23 0.305 0.506 0.168 0.0117 9.91 × 10−6 7.82 × 10−5 2.97 × 10−6 1.18 × 10−7 0.0868 0.0737 7220
24 0.286 0.520 0.169 0.0111 6.57 × 10−6 5.16 × 10−5 2.60 × 10−6 1.25 × 10−7 0.0808 0.0676 8310
25 0.297 0.427 0.187 0.0151 1.09 × 10−5 8.67 × 10−5 5.75 × 10−6 1.41 × 10−7 0.0907 0.0789 6360
26 0.224 0.385 0.192 0.0151 8.32 × 10−6 6.66 × 10−5 5.96 × 10−6 1.83 × 10−7 0.0653 0.0554 6320
27 0.258 0.426 0.195 0.0149 1.13 × 10−5 8.97 × 10−5 5.03 × 10−6 1.63 × 10−7 0.0796 0.0677 6240
28 0.212 0.376 0.194 0.0152 8.24 × 10−6 6.73 × 10−5 5.49 × 10−6 1.95 × 10−7 0.0622 0.0523 6350
29 0.242 0.429 0.202 0.0150 1.17 × 10−5 9.30 × 10−5 4.53 × 10−6 1.74 × 10−7 0.0769 0.0646 6300
30 0.203 0.379 0.197 0.0151 8.51 × 10−6 6.81 × 10−5 4.78 × 10−6 2.01 × 10−7 0.0605 0.0502 6500
42 0.189 0.355 0.186 0.0145 6.91 × 10−6 5.71 × 10−5 4.60 × 10−6 2.04 × 10−7 0.0522 0.0433 6590
a The best models (see §4) are indicated by bold face.
b Observational results from Gloeckler & Geiss (2004), Gloeckler (2005, private communication) and Witte et al. (1996) (see table 1 for uncer-
tainties).
photoionization equilibrium, are correct. The predictions of the
best models provide excellent matches to the observational con-
straints. Based on these models, we find the boundary conditions
of the heliosphere to be describable as n(H0) = 0.19−0.20 cm−3,
ne = 0.05−0.08 cm
−3, and X(H) ≡ H+/(H0+H+) = 0.19−0.26,
X(He) ≡ He+/(He0 + He+) = 0.36 − 0.43. For these models, we
find abundances of O/H = 295 − 437 ppm, C/H = 589 − 813
ppm, and N/H = 40.7 − 64.6 ppm (Table 7). The total LIC den-
sity is n(H)0 = 0.213 − 0.232 cm
−3, while the strength of the
interstellar magnetic in the cloud varies between 0 and 3.8 µG.
The Ne PUI data further favor densities of n(H0) ≈ 0.19 cm−3
and ne = 0.06 − 0.07 cm
In this analysis we have assumed negligible filtration for He
in the heliosheath regions. Modeling of the He filtration factor
however allows values as small as fHe = 0.92, which yields
n(He i)= 0.0164 cm−3 for the CHISM (Table 1). The predictions
of Model 21 agree with this value for n(He i), as well as with
the ratios Mg ii/Mg i and C ii/C ii∗ towards ǫ CMa and the pickup
ion Ne and Ar data (Table 3). The predicted cloud temperature
is low by ∼ 1000 K. The density and ionization for Model 21 are
n(H0) = 0.21 cm−3 and ne= 0.06 cm
−3. We therefore conclude
that our best models listed above are robust in the sense that they
predict consistent values for the n(H0) to within 5%, and electron
densities to within 25%.
The radiation field incident on the LIC for Model 26 is shown
in Fig. 1 and the spectral characteristics of the field for each
model are listed in Table 5. The wavelengths regions λ ≤ 912 Å
and λ ∼ 1500 Å are of primary importance for the photoioniza-
tion of the cloud, the former because it determines H0 and He0
ionization, and the latter because it determines Mg0 ionization.
The ionization parameter is defined as U ≡ Φ/n(H)c, where
Φ is the H ionizing photon flux, n(H) is the total (neutral + ion-
ized) hydrogen density at the cloud surface and c is the speed
of light. The total ionizing photon fluxes at the cloud surface
(photons cm−2 s−1) for the three bands 13.6–24.6, 24.6–54.4, and
54.4–100 eV, are given by ΦH, ΦHe0 , and ΦHe+ , respectively. The
ratio of the total number of H0 and He0 ionizing photons in the
incident radiation field is given by Q(He0)/Q(H0). The quantity
〈E〉 (eV) is the mean energy of an ionizing photon, equal to the
integrated energy flux from 13.6 to 100 eV divided by the inte-
grated photon flux over the same energy range.
In Figure 3 we show n(He0) and temperature of the CHISM
for our model calculations. In Figure 4 we show N ii/N i vs.
Mg ii/Mg i, illustrating an anti-correlation of the ratios caused
by the fact that Mg ii/Mg i decreases with electron density while
N ii/N i indicates the ionization level in the cloud. The ionization
of the CHISM is listed in Table 6 for Model 26, where com-
monly observed elements are listed. The abundances of He, Ne,
Slavin and Frisch: Boundary Conditions of the Heliosphere 9
Table 5. Characteristics of the Model Radiation Field
Modela U φH φHe0 φHe+ Q(He
0)/Q(H0) 〈E〉
photons cm−2 s−1 photons cm−2 s−1 photons cm−2 s−1 eV
1 2.0 × 10−6 4.6 × 103 7.5 × 103 2.8 × 103 0.46 74.6
2 2.0 × 10−6 5.3 × 103 7.3 × 103 2.7 × 103 0.44 72.4
3 3.1 × 10−6 9.0 × 103 1.3 × 104 2.7 × 103 0.49 57.7
4 3.0 × 10−6 8.6 × 103 1.2 × 104 2.6 × 103 0.49 58.6
5 2.0 × 10−6 4.0 × 103 7.0 × 103 3.8 × 103 0.43 79.0
6 2.1 × 10−6 4.9 × 103 6.9 × 103 3.8 × 103 0.41 76.2
7 3.2 × 10−6 7.4 × 103 1.4 × 104 3.8 × 103 0.53 61.3
8 3.1 × 10−6 7.4 × 103 1.3 × 104 3.7 × 103 0.52 61.5
9 2.2 × 10−6 3.8 × 103 7.4 × 103 5.6 × 103 0.40 80.0
10 2.3 × 10−6 4.8 × 103 7.2 × 103 5.6 × 103 0.38 77.3
11 3.5 × 10−6 6.6 × 103 1.5 × 104 5.6 × 103 0.53 63.4
12 3.4 × 10−6 6.9 × 103 1.5 × 104 5.5 × 103 0.52 63.2
13 2.3 × 10−6 4.3 × 103 7.0 × 103 2.8 × 103 0.46 76.9
14 2.4 × 10−6 5.1 × 103 6.8 × 103 2.7 × 103 0.43 74.3
15 3.7 × 10−6 8.5 × 103 1.2 × 104 2.7 × 103 0.49 58.8
16 3.6 × 10−6 8.2 × 103 1.2 × 104 2.6 × 103 0.49 59.5
17 2.3 × 10−6 3.7 × 103 6.4 × 103 3.8 × 103 0.42 81.8
18 2.4 × 10−6 4.7 × 103 6.2 × 103 3.8 × 103 0.39 78.4
19 3.9 × 10−6 7.1 × 103 1.4 × 104 3.8 × 103 0.53 62.2
20 3.8 × 10−6 7.2 × 103 1.3 × 104 3.7 × 103 0.52 62.2
21 2.6 × 10−6 3.6 × 103 6.8 × 103 5.6 × 103 0.39 82.0
22 2.7 × 10−6 4.6 × 103 6.6 × 103 5.6 × 103 0.36 78.9
23 4.2 × 10−6 6.3 × 103 1.5 × 104 5.6 × 103 0.52 64.4
24 4.2 × 10−6 6.7 × 103 1.4 × 104 5.5 × 103 0.51 64.1
25 3.5 × 10−6 7.9 × 103 1.2 × 104 2.7 × 103 0.49 60.3
26 2.5 × 10−6 5.0 × 103 7.5 × 103 2.7 × 103 0.45 73.2
27 3.1 × 10−6 5.4 × 103 1.0 × 104 3.8 × 103 0.50 68.9
28 2.4 × 10−6 4.3 × 103 6.4 × 103 3.8 × 103 0.40 79.5
29 3.0 × 10−6 4.6 × 103 9.7 × 103 5.6 × 103 0.45 73.6
30 2.4 × 10−6 3.8 × 103 5.5 × 103 5.6 × 103 0.33 84.6
42 2.2 × 10−6 3.6 × 103 3.6 × 103 5.6 × 103 0.25 91.1
a The best models (see §4) are indicated by bold face.
Fig. 4. Model results for N(N ii)/N(N i) versus Mg ii/Mg i. The
symbols have the same meaning as in Figure 3. In this case the
models with higher n(H0) lie below and slightly to the left of
those with lower density. N(N ii)/N(N i) is an indicator of cloud
ionization fraction while Mg ii/Mg i goes as 1/ne. The dotted line
is the observed value for Mg ii/Mg i and the dashed lines indi-
cate the 1 − σ error range for the value. We see that models that
match the observed ratio all correspond to relatively low ioniza-
tion, X(H) ∼ 0.20 − 0.27 in the CHISM.
Na, Al, P, Ar, and Ca were assumed, based on solar abundances,
and were not adjusted in the modeling process. The abundances
of C, N, O, Mg, Si, S and Fe were adjusted for each model to
match observed column densities towards ǫ CMa (see §2). The
elemental abundances that have been assumed, with the excep-
tion of that for He, are not expected to have any significant im-
pact on the model results.
5. Discussion
5.1. Heliosphere Boundary Conditions
As discussed above, the best models of those we calculated
are determined by the match to the CHISM He0 density and
temperature found by the in situ Ulysses measurements (§2.2),
combined with the matching the LIC component column den-
sity ratios Mg ii/Mg i and C ii/C ii∗. These models span a fairly
large range in the model parameters: nH = 0.213 − 0.232
cm−3, log Th = 5.9 − 6.1 K, B0 = 0.05 − 3.8 µG, and N(H i)
= 3.0 × 1017 − 4.5 × 1017 cm−2. Despite this, the predicted val-
ues for neutral H density and electron density in the CHISM lie
within a remarkably small range: n(H0) = 0.19 − 0.20 cm−3,
ne = 0.05 − 0.08 cm
−3, for models that include the conductive
boundary. For the one model without evaporation that is consis-
tent with the data we find n(H0) = 0.186 cm−3, ne = 0.052 cm
Including the PUI Ne data as a constraint narrows the density re-
sults to n(H0) ≈ 0.19 cm−3 and ne = 0.06− 0.07 cm
−3. For these
10 Slavin and Frisch: Boundary Conditions of the Heliosphere
Table 6. Model 26 Results for Ionization Fractionsa
Element PPM I II III IV
H 106 0.776 0.224 – –
He 105 0.611 0.385 4.36(-3) –
C 661 2.68(-4) 0.975 0.0244 0.000
N 46.8 0.720 0.280 8.52(-5) 0.000
O 331 0.814 0.186 4.71(-5) 0.000
Ne 123 0.196 0.652 0.152 2.79(-6)
Na 2.04 1.47(-3) 0.843 0.155 6.34(-6)
Mg 6.61 1.98(-3) 0.850 0.148 0.000
Al 0.0794 5.37(-5) 0.976 0.0118 0.0123
Si 8.13 4.21(-5) 0.999 8.02(-4) 3.10(-5)
P 0.219 1.35(-4) 0.977 0.0232 9.29(-5)
S 15.8 6.47(-5) 0.971 0.0288 1.95(-6)
Ar 2.82 0.263 0.500 0.238 2.83(-6)
Ca 4.07(-4) 9.21(-6) 0.0155 0.984 1.87(-4)
Fe 2.51 7.01(-5) 0.975 0.0245 5.75(-6)
a Numbers less than 10−3 are written as x(y) where y is the exponent
and x is the mantissa (or significand).
densities to stray out of this range would appear to require sig-
nificant errors in the underlying comparison data, e.g. n(He0) in
the CHISM, or substantial non-equilibrium ionization effects in
the LIC. The variation in the interstellar radiation field between
the different models that match the data give us some degree of
confidence that these densities are not highly sensitive to the de-
tails of the radiation field. The range ne = 0.05 − 0.08 cm
corresponds to an electron plasma frequency of 2.0–2.5 kHz,
which is the frequency of the mysterious weak radio emission
detected beyond the termination shock in the outer heliosphere
(Gurnett & Kurth 2005; Mitchell et al. 2004).
5.2. Hydrogen Filtration Factor
Tracers of H0 inside of the termination shock, after filtration, in-
clude the H i Lyα backscattered radiation, H pickup ions, and
the slowdown of the solar wind at distances beyond 5 AU from
mass-loading by H PUIs. The range of n(H0) found above to
best fit the combined heliospheric n(He0) and LIC data towards
ǫ CMa, n(H0) = 0.19− 0.20 cm−3, represents the density of neu-
tral interstellar H atoms outside of the heliosphere, and removed
from heliospheric influences. The hydrogen filtration factor, fH,
can be obtained from comparisons between these models and in-
terstellar H0 densities at the termination shock as inferred from
in situ observations of interstellar H inside of the heliosphere.
The accompanying papers in this special section provide es-
timates of the interstellar H0 density at the termination shock.
The solar wind slows down due to massloading by interstel-
lar H, yielding n(H0)= 0.09 ± 0.01 cm−3 at the termination
shock (Richardson et al. 2007). The density of H pickup ions ob-
served by Ulysses is inferred at the termination shock, yielding
n(H0)= 0.11 ± 0.01 cm−3; models of H atoms traversing the he-
liosheath regions then yield for the CHISM n(H0)= 0.20 ± 0.02
cm−3 and np= 0.04±0.02 cm
−3, or ne∼ 0.05 cm
−3 (Bzowski et al.
2007). The radial variation in the response of the interplanetary
Lyα 1215 Å backscattered radiation to the solar rotational mod-
ulation of the Lyα “beam” that excites the florescence yields
n(H0)∼ 0.085−0.095 cm−3, depending on the heliosphere model
(Pryor et al. 2007). From these n(H0) values at the termination
shock, we estimate that 43%–58% of the H-atoms successfully
traverse the heliosheath region, or fH ∼ 0.43−0.58. Müller et al.
(2007) evaluate filtration using five different plasma-neutral
Table 7. Elemental Gas Phase Abundances (ppm)
Element
Model No. C N O Mg Si S Fe
14 589 40.7 295 5.89 7.24 14.1 2.24
21 955 60.3 447 9.77 11.5 22.9 3.55
25 631 66.1 437 7.76 10.0 19.5 3.09
26 661 46.8 331 6.61 8.13 15.8 2.51
27 759 64.6 437 8.71 10.7 20.9 3.31
28 708 45.7 331 7.08 8.32 16.6 2.57
29 813 64.6 437 9.33 11.0 21.9 3.39
30 741 46.8 331 7.41 8.51 17.0 2.63
42 724 39.8 295 6.76 7.76 15.1 2.34
models, and find a range of fH = 0.52 − 0.74. A hydrogen filtra-
tion of fH = 0.55 ± 0.03 is consistent with both in situ data and
radiative transfer models.
5.3. Gas-Phase Abundances
The LIC photoionization models are forced to match the ob-
served set of column densities (Tables 1 and 3). The gas-phase
abundances of most elements are treated as free parameters that
can be varied in order to match observed column densities, so
that the successful models yield elemental abundances for the
LIC that are automatically corrected for unobserved H+ (Table
7). The exceptions are that He, Ne, and Ar abundances, being
unconstrained by the observations toward ǫ CMa, are not ad-
justed but are assumed to be 105 ppm, 123 ppm and 2.82 ppm,
respectively. In the models, N(C ii∗) is a constraint on both the
C abundance (in place of the heavily saturated C ii 1335Å line)
and ne, such that the product of the abundance and ne is more
tightly limited than either quantity individually. The requirement
to match both N(C ii∗) and n(He0) effectively restricts the ioniza-
tion fraction of H, which in turn limits O and N ionization whose
ionization fractions are tied by charge-transfer to the H ioniza-
tion at LIC temperatures.
Early studies showed that the abundances of refractory el-
ements in the very local ISM are enhanced compared to abun-
dances in cold disk gas (Marschall & Hobbs 1972; Stokes 1978;
Frisch 1981). Throughout warm and cold disk gas, the un-
derabundances of refractory elements compared to solar abun-
dances (by factors of 10−1 − 10−4) are taken to represent deple-
tion onto interstellar dust grains (e.g. Savage & Sembach 1996).
This view is supported by the correlation found between elemen-
tal depletions and the temperature characteristic of condensa-
tion at solar pressure and composition (Ebel 2000), and assumes
that there is a reference abundance pattern that characterizes the
cloud, and remains constant over the cloud lifetime as atoms are
exchanged between the gas and dust phases. Below (§5.4) we
compare solar abundances with observed gas-phase abundances
to predict the gas-to-dust mass ratios for the LIC, based on the
assumption that LIC gas and dust have remained coupled over
the cloud lifetime.
An important question is whether the LIC has solar abun-
dances. Isotopes of 18O and 22Ne isotopes measured in the ACR
population suggest that this is so. ACRs are characterized by
a rising particle flux for energies below 10–50 MeV/nucleon,
and this characteristic spectral signature is seen for 16O, 18O,
20Ne, and 22Ne. Ratios of 16O/18O ∼ 500 and 20Ne/22Ne ∼
13.7 are found for both the ACRs and solar material, indicat-
ing that the CHISM and solar material have similar composi-
Slavin and Frisch: Boundary Conditions of the Heliosphere 11
Table 8. Solar Abundances (ppm)
Grevesse Holweger Lodders Grevesse
Sauval et al.
(1998) (2001) (2003) (2007)a
C 334 ± 46 391(+110,−86) 290 ± 27 275 ± 34
N 84 ± 12 85.3+25
−19 82 ± 20 67 ± 10
O 683 ± 94 545+107
−90 579 ± 66 513 ± 63
Ne 121 ± 17 100+17
−15 91 ± 21 77 ± 12
Mg 38 ± 4 34.5+5.1
−4.5 41 ± 2 38 ± 9
Si 35 ± 4 34.4+4.1
−3.7 41 ± 2 36 ± 4
S 22 ± 5 - 18 ± 2 15 ± 2
Ar 2.54 ± 0.35 - 4.24 ± 0.77 1.70 ± 0.34
Fe 32 ± 4 28.15.84.8 35 ± 2 32 ± 4
a Protosolar abundances are obtained by increasing the photospheric
abundances by 0.05 dex for elements heavier than He, as suggested
by Grevesse et al. (2007)
tions (Leske et al. 2000; Leske 2000). We therefore adopt solar
abundances as the underlying reference abundance pattern for
the LIC.
Unfortunately a prominent uncertainty exists in the correct
solar abundances of volatile elements such as O and S, which
have low condensation temperatures (Tcond, 180 K and 700 K
respectively), and noble elements such as Ne and Ar. Solar
abundances are determined from photospheric data (C, N, O,
Mg, Si, S, Fe), the solar wind (Ar, Ne), solar active regions
(Ne), and helioseismic data (He); abundances of non-volatile el-
ements are also found from meteoritic data (Grevesse & Sauval
1998; Holweger 2001; Lodders 2003; Grevesse et al. 2007).
Solar abundances from these studies are listed in Table 8. Our
results, discussed below, indicate that if the LIC has a solar abun-
dance composition, as indicated by the 18O and 22Ne data, then
the lower abundances found by Grevesse et al. (2007) are pre-
ferred by our models.
Ne: In these models we have assumed the Ne abundance
is 123 ppm (Anders & Grevesse 1989), which is based on a
combination of photospheric and interstellar data. Solar sys-
tem Ne abundances are difficult to measure because of FIP
effects, however values include 77 ppm (Grevesse et al. 2007,
after adding 0.05 dex to account for gravitational settling of
the elements), and ∼ 41 ppm for solar wind in coronal holes
(Gloeckler & Geiss 2007). The predicted densities of n(Ne0) in
the CHISM for models 26 and 28 (Table 4) are in agreement with
the most recent PUI results for Ne (Table 1), and Ne densities as
low as ∼ 100 ppm are allowed when filtration is included.
The CHISM Ne abundances indicated by these results ap-
pear to be consistent with Ne abundances and ionization levels
in the global ISM. The Ne abundance in the Orion nebula is 100
ppm (Simpson et al. 2004). Takei et al. (2002) measured X-ray
absorption edges formed by Ne and O in the interstellar gas and
dust towards Cyg X-2, and found abundances of Ne/H ∼ 92 ppm
and O/H ∼ 579 ppm when both atomic and compound forms
in the sightline were included. Juett et al. (2006) observed the
X-ray absorption edges of Ne and O towards nine X-ray bina-
ries which sampled both neutral and ionized warm material, and
found that the ionized states formed in the ionized material have
the ratio Ne iii/Ne ii ∼ 0.23. This value is identical to the pre-
dictions of model 26, Ne iii/Ne ii ∼ 0.23, a fortuitous agreement
that may indicate that the EUV radiation field in the CHISM is
similar to the generic galactic EUV field in the solar vicinity.
Ar: Solar Ar abundance determinations range between Ar/H
∼ 1.4 − 5.0 ppm (Table 8); we have assumed Ar/H = 2.82 ppm.
The predicted Ar density at the Sun is within the uncertainties
of the PUI data, although the range of possible filtration factors
(0.64 – 0.95, see §2.2) also allow considerable leeway.
O: In the warm ISM such as the LIC, the ionization of oxy-
gen and hydrogen is tightly coupled over timescales of ∼100
years by charge transfer (Field & Steigman 1971), so that the
assumed N(H i) combined with N(O i) measurements act to con-
strain the deduced O abundance in the gas. The two best models
(26, 28) correspond to O/H = 331 ppm, however the O column
density measurements are based on the saturated 1302Å line, and
have ∼ 35% uncertainty. The modeled LIC value of O/H ∼ 331
ppm indicates that ∼ 35% of the O atoms are depleted onto dust
grains. An oxygen filtration factor of fO ∼ 0.75 is required by
the PUI data and Model 26.
These models yield gas-phase O abundances that are con-
sistent with observations of more distant interstellar sightlines.
The ratio N(O i)/N(H i) is measured in both low and high extinc-
tion clouds. Oliveira et al. (2003) used unsaturated O i lines in
the 910–1100Å interval and found O i/H i = 317 ± 19 ppm for
∼ 30 sightlines that included both types of material. Sightlines
with detected H2, N(H) > 10
20.5 cm−2 and 〈nH〉 = 0.1−3.3 cm
yield O/H = 319 ± 14 ppm (Meyer et al. 1997). A survey of 19
stars with an average distance of 2.6 kpc by André et al. (2003)
found O i/H i = 408± 14 ppm, where the long sightline and high
average value N(H)/E(B− V) = 6.3 × 1021 cm−2 mag−1 indicate
a bias towards sightlines containing many clouds that individu-
ally have low extinctions. A larger sample of 56 sightlines for
a range of extinctions and distances show that sightlines with
higher average mean densities, 〈nH〉, show O/H = 284±12 ppm,
versus O/H = 390 ± 10 ppm for stars with low values of 〈nH〉
(Cartledge et al. 2004). For comparison, solar abundance stud-
ies yield a range of ∼ 450 − 780 ppm.
C: The best Models 26 and 28 yield a gas-phase abundance
of C of C/H = 661 and 708 compared to solar abundances
of ∼ 240 − 500 ppm, which is consistent with our earlier re-
sults (Slavin & Frisch 2006) indicating an overabundance of C
in the LIC. We speculate that shock destruction of carbonaceous
grains, perhaps combined with some local spatial decoupling be-
tween carbonaceous and silicate grains, may explain these find-
ings.
Singly-ionized carbon is an important coolant in the LIC
(§5.5), so the C overabundance is required to maintain the tem-
perature of the CHISM at the observed value. The carbon abun-
dance obtained here indirectly depends on the Mg ii→Mg i di-
electronic recombination coefficient that determines the ratio
Mg ii/Mg i, since that ratio is used as a criteria for the best mod-
els. The same ionization correction that gives the C abundance
also successfully predicts Ne ionization in global WPIM and the
S abundances in the LIC, although this may be a fortuitous co-
incidence. In the adjacent sightline towards Sirius the LIC has
N(C ii)/N(H i) = 1, 050 ppm (Hebrard et al. 1999). An ionization
correction of 300% is required to make this value consistent with
solar abundances, and such a large ionization correction is not
consistent with the ionization levels of X(H) ∼ 20 − 26% found
here. In contrast, sightlines with cold ISM show C abundances
on the order of 135 ± 46 ppm (Sofia et al. 1997; Sembach et al.
2000).
N: The best models (26 and 28) find N/H = 46–47 ppm, com-
pared to solar values of ∼ 57−110 ppm. These results are consis-
tent with the PUI results, N/H ∼ 19−47 ppm, after filtration fac-
tor uncertainties are included. The N and O results and favor an
12 Slavin and Frisch: Boundary Conditions of the Heliosphere
ISM abundance pattern for volatiles similar to the Grevesse et al.
(2007) photospheric abundances.
S: The best models predict S/H = 16–17, compared to solar
values of ∼ 13−27 (including uncertainties, see Table 8). Sulfur
is found to have little or no depletion onto dust grains in warm
diffuse ISM (e.g. Welty et al. 1999).
Mg, Si, Fe: These refractory elements are observed in the
LIC gas with abundances far below solar (factors of 3–15).
Approximately 92%, 82%, and 77% of the Fe, Mg, and Si, re-
spectively, are presumably depleted onto interstellar dust grains.
If (Grevesse et al. 2007) abundances are assumed for the LIC,
then to within the uncertainties the LIC dust has the relative com-
position of Fe:Mg:Si:O = 1:1:1:4, as is consistent with amor-
phous olivines MgFeSiO4. Fe and Si are dominantly singly ion-
ized, while Mg has a significant fraction (∼ 15%) that is twice
ionized. The gas-phase abundances of these refractory elements
are highly subsolar, even after ionization corrections are made,
indicating that these elements are substantially depleted onto in-
terstellar dust grains. In contrast to C, however, the silicate dust
in the LIC that carries the missing Mg, Si, and Fe has experi-
enced far less destruction than the carbonaceous grains.
Ca ii, Na i: Weak lines of the trace ionization species Ca ii
and Na i are common diagnostics of ionization and abundance
for interstellar clouds, including the partially ionized LIC; Na i
is also frequently used as a diagnostic of the H column density.
We note that our models show that the ratios N(Ca ii)/N(Na i),
N(Na i)/N(H), and N(Na i)/N(H i) vary by 30%, 77%, and 93%,
respectively, between the best models (Models 26–30, 42). As
trace ionization species, the densities of Na i and Ca ii are highly
sensitive to volume density, n(Na0), n(Ca+) ∝ n(H)ne. We there-
fore conclude that Na i and Ca ii are imprecise diagnostics of
ionization levels, H density, and abundances in warm partially
ionized clouds.
5.4. Gas-to-Dust Mass Ratio
Because the abundances are automatically corrected for unob-
served H+, we use the model results to infer the total mass of the
interstellar dust, providing that the gas and dust in the LIC form
a coupled and closed system that evolves together as the cloud
moves through the LSR. The LIC LSR velocity is 16–21 pc/Myr,
so that a LIC origin related to the Loop I or Scorpius-Centaurus
superbubble would require that the LIC gas and dust remained
a closed system over timescales of 4–5 Myr (Frisch & Slavin
2006; Frisch 1981). Gas-to-dust mass ratios calculated from the
best models (26 and 28) using the missing-mass argument2 are
in the range RG/D= 149 − 217, depending on solar abundances.
The detailed information about RG/D for different assumptions
and the different models is listed in table 9.
For comparison, RG/D determined from comparisons of in
situ observations of interstellar dust inside of the solar system,
compared to the gas densities of these models, yield RG/D = 115–
125 (Table 9, Landgraf et al. 2000; Altobelli et al. 2004). The in
situ RG/D is an upper limit, since the smallest interstellar dust
grains (radii ≤ 0.15 µm) with large charge-to-mass ratios (and
thus small Larmor radii) are excluded from the heliosphere by
the interstellar magnetic field which is draped over the helio-
sphere.
For all the models, the RG/D determined from comparing
in situ dust measurements with the CHISM gas mass flux is
2 This argument assumes that the ISM reference abundances, in this
case solar abundances, represent the sum of the atoms in the gas plus
the dust (Frisch et al. 1999).
Table 9. Gas-to-Dust Mass Ratios from Models and In Situ
Observations
Modela
Source 14 26 27 28 29 30 42
GS98 137 149 196 149 197 150 138
Lodders 158 174 238 174 239 175 160
Grevesse 194 217 321 217 323 218 196
In Situ 115 116 123 116 125 116 107
a The source of the comparison solar abundances is listed in column
1 (Table 8. The in situ dust flux is from Landgraf et al. (2000),
corrected downwards by 20% as recommended by Altobelli et al.
(2004) to account for side-wall impacts.
Table 10. Major Heat Sources in LIC Gasa
Sourceb Fraction of Heating
H i 0.657
He i 0.248
dust 0.055
He ii 0.016
cosmic rays 0.010
a Results for model 26. Other models are qualitatively the same,
though there are some quantitative variations.
b For lines with ion names, the source here denotes the ion that is
photoionized. Dust heating comes from photoelectric ejection by
photons of the background FUV radiation field. Cosmic ray heating
comes from electron impact ionization of the gas and direct heating
of the electrons in the LIC plasma by the cosmic ray electrons.
lower than that determined by assuming solar abundances and
using the gas phase abundances we determine to find RG/D.
This suggests that somehow the dust flowing into the helio-
sphere is concentrated relative to the gas, compared to the over-
all LIC sightline towards ǫ CMa. The lower solar abundances
of Grevesse et al. (2007) result in lower required depletions, and
produces stronger disagreements with RG/D determined from in
situ data. We do not understand this result, which we have found
previously (Frisch et al. 1999). Since RG/D is sensitive to the
mass of Fe in the dust grains (Frisch & Slavin 2003), we sug-
gest that this difference may indicate inhomogeneous mixing of
the gas and silicate dust over the ∼ 0.64 pc extent of the LIC.
5.5. Heating and Cooling Rates
The heating and cooling rates for Model 26 are listed in Tables
10 and 11. The primary heat sources are photoelectrons from
the ionization of H0 and He0, with dust and cosmic ray heating
contributing less than 7% of the heating. The dominant source
of cooling is the [C ii] 157.6 µm fine-structure line, making up
43% of the total. This is more than twice the contribution of
any other coolant. Nearly all the cooling is due to optical and
infrared forbidden lines with many lines contributing at the ∼
1% level. H recombination, free-free emission and dust, through
the capture of electrons onto grain surfaces, also contribute at
about a 2% level. The importance of C ii as both a constraint
on the C abundance in our models as well as a major coolant
means that any model that aims to reduce the abundance of C
to a solar level faces severe difficulties. The models with LIC
temperatures in the THe0 = 6 300 ± 340 K range indicated by
Slavin and Frisch: Boundary Conditions of the Heliosphere 13
Table 11. Major Coolants in LIC Gasa
Ion/Line Fraction of Cooling
[C ii] 157.6 µm 0.428
[S ii] 6731 Å 0.145
Fe ii (total) 0.074
[Si ii] 34.8 µm 0.065
[Ne ii] 12.8 µm 0.035
[O i] 63.2 µm 0.028
H recomb. 0.024
dust 0.024
[Ne iii] 15.6 µm 0.020
[N ii] 6584 Å 0.018
net free-free 0.018
[O i] 6300 Å 0.017
[O ii] 3727 Å 0.011
[Ar ii] 6.98 µm 0.011
a Results for model 26. Other models show similiar results.
the in situ He0 data all require supersolar abundances of C. The
total heating/cooling rate for the LIC at the Sun for this model is
3.55 × 10−26 ergs cm−3 s−1.
5.6. Radiation Field
Recently it has been proposed that a significant portion
of the SXRB can be attributed to charge-transfer (a.k.a.
charge exchange) between the solar wind ions (e.g., O+7 and
O+8) and interstellar neutrals (Cravens 2000; Snowden et al.
2004; Wargelin et al. 2004; Smith et al. 2005; Koutroumpa et al.
2006). While it seems at present that some fraction of the low
energy X-rays are from this mechanism, it is unclear how large
that fraction is. We note that basing the properties of the local
hot plasma in the galactic plane on SXRB emission at energies
E > 0.3 keV is problematical. (Bellm & Vaillancourt 2005) have
compared the Wisconsin B and Be band data with the ROSAT
R12 data, and concluded that the observed anticorrelation be-
tween R12 and N(H i) indicates that more than 34% of the SXRB
generated in the Galactic disk must come from the Local Bubble.
They also concluded that a heavily depleted plasma with log
T ∼ 5.8 is consistent both with the McCammon et al. (2002);
Sanders et al. (2001) X-ray spectral data, and the upper limits
set on the EUV emission by CHIPS (Hurwitz et al. 2004). When
the Robertson & Cravens (2003) models of SXRB production
by charge-transfer with the solar wind are considered, then only
half of the SXRB in the plane is required to arise from a hot local
plasma. We also note that the atomic physics for the calculation
of the low energy part of the emission is still quite uncertain
(V. Kharchenko, private communication). At this point we take
the simple approach of ignoring the charge-transfer emission,
though we plan to consider its possible impact in future work by
reducing the assumed SXRB flux from hot gas. As noted previ-
ously, a lower SXRB flux due to a lower pressure in the hot gas
does not necessarily have any impact on our calculated flux from
the evaporative cloud boundary.
5.7. LIC Pressure
The strength of the interstellar magnetic field in the LIC is un-
known, though modeling of its effects on the heliosphere sets
some constraints. Our best models (26 and 28) presented here
have a thermal pressure of ∼ 2100 cm−3 K for the LIC. If the
thermal and magnetic pressures are equal, this indicates a mag-
netic field strength B ∼ 2.7 µG, in agreement with field strengths
for these models. As noted is §2.3, the main effect of the field
strength in the models is to regulate the pressure in the evapo-
rative cloud boundary, which in turn affects the flux of diffuse
EUV radiation incident on the cloud. The amount of EUV flux
helps determine the temperature in the cloud, which is how the
observational constraints fix the magnetic field strength in the
context of our modeling. Thus we do not explicitly fix the mag-
netic field strength with the goal of achieving equipartition and
indeed some of our successful models have lower or higher field
strengths. It is probably coincidental that the field strength re-
quired to match the in situ He0 temperature for our best models
is also close to the equipartition field strength, but it is at least
encouraging that this field strength is consistent with our pho-
toionization models. We note that if thermal, cosmic ray, and
magnetic pressures are approximately equal the LIC has a pres-
sure of ∼ 6300 cm−3 K.
5.8. Comparisons with other LISM Sightlines
There have been a number of efforts to understand the
LISM ionization and abundances (Frisch et al. 1986;
Cheng & Bruhweiler 1990; Lallement & Bertin 1992;
Vallerga 1996; Lallement & Ferlet 1997; Holberg et al. 1999;
Kimura et al. 2003). The studies that attempt to derive gas
phase elemental abundances find a range of results, generally
fairly consistent with ours. A point of particular interest is the
abundance of carbon that is surprisingly overabundant in our
results. As an example, Kimura et al. (2003) find (based on four
sightlines and excluding the ǫ CMa and β CMa sightlines), a
subsolar C abundance in contradiction with our results. Results
for thirteen sightlines from Redfield & Linsky (2004) with
velocity components consistent with the LIC velocity vector
show that N(C ii)/N(O i) > 1 for 8 of them, especially those
at lower column density indicating a solar or supersolar C
abundance. For our best models N(C ii)/N(O i) ≈ 2. Our series
of studies are unique in that we model the radiation field incident
on the cloud, include radiative transfer effects, and calculate
the thermal equilibrium within the cloud. The ionization varies
through the cloud as does the temperature and density (slightly)
and we compare observations within the heliosphere with the
physical conditions at that point in the cloud rather than basing
the model on line-of-sight averages.
Our present results indicate that n(N ii)/n(N i)∼ 0.32 − 0.50
at the solar location, with N becoming more ionized as the
sightline approaches the cloud surface. The column density ra-
tio is thus higher, ranging from 0.38 − 0.62. Observed values
of N(N ii)/N(N i) toward other nearby stars are 0.58+0.56
−0.77 toward
Capella (Wood et al. 2002), 1.29±0.23 toward HZ43 (Kruk et al.
2002), 1.91+0.87
−0.69 towards WD1634-573 (Lehner et al. 2003), and
1.13 ± 0.24 towards η UMa (Frisch et al. in preparation). The
total H i column density towards each of these stars is greater
than the N(H i)∼ 4 × 1017 cm−2 found for the best models here.
The nearest of these stars, Capella, has an ionization compara-
ble to that of the LIC. The two high-latitude stars HZ43 and η
UMa appear to sample low opacity regions where the ionization
is larger than at the Sun, as does the WD1634-573 sightline that
appears to cross the nearby diffuse H ii region seen towards λ
Sco (York 1983). As we have noted the ǫ CMa line of sight is
special because that star is the dominant source of stellar EUV
photons for the LIC. Thus for sightlines at a large angle from
the ǫ CMa sightline, if the H i column between points along the
sightline and ǫ CMa is small the apparently high column points
14 Slavin and Frisch: Boundary Conditions of the Heliosphere
are subject to a strong EUV field. Such geometry dependent ion-
ization effects can be important for non-spherical clouds subject
to a strongly spatially variable ionizing radiation source.
The variation in the fractional ionization of the CLIC gas
has a direct impact on our understanding of the distribution and
physical properties of low column density clouds for several rea-
sons. (1) Abundances of elements with FIP < 13.6 eV must al-
ways be calculated with respect to N(H i)+ N(H ii) for very low
column density clouds. (2) Cloud geometry affects the opacity
of observed sightlines so that the opacity to ionizing radiation
is not directly traced by the observed value of N(H i). For lines
of sight other than that towards ǫ CMa, this could require more
complex radiative transfer models in which the difference be-
tween the line of sight toward the star and that toward one of the
primary sources of ionizing flux, ǫ CMa, at each point is taken
into account.
6. Conclusions
There are many uncertainties regarding the detailed properties of
the ionizing interstellar radiation field incident on the LIC. The
data we have on the LIC, both from absorption line studies and
in situ measurements by spacecraft in the heliosphere, provide us
with strong constraints on the ionization and composition of the
LIC and particularly the CHISM. By exploring a range of models
for the ISRF we find that while a fairly broad range of radiation
fields can produce photoionization consistent with the data, other
outputs from the models fall within a relatively narrow range of
values. Our results for the models explored in this paper in which
we require our models to be consistent with the LIC component
of the absorption lines observed towards ǫ CMa include:
1. For a range of assumptions regarding the H i column density
of the LIC, N(H i) = (3.0−4.5)×1017 cm−2, and temperature
of the hot gas of the Local Bubble, log Th = 5.9, 6.0 and
6.1, we are able to find model parameters that allow a match
of the model results with best observed quantities, n(He0),
T (He0) and N(Mg ii)/N(Mg i). For these models we assume
that the cloud is evaporating because of thermal conduction
between the hot Local Bubble gas and the warm LIC gas and
include the emission from the cloud boundary.
2. For the best models in terms of fits to data, the required input
parameters are: initial (i.e. at the outer edge of the cloud)
total H density, n(H) ≈ 0.21−0.23 cm−3; and cloud magnetic
field, B0 ≤ 3.8 µG.
3. If we assume that the magnetic field configuration reduces
thermal conductivity at the boundary enough to prevent
evaporation and ignore any radiation from the cloud bound-
ary, we find that for most cases the radiation field does not
cause sufficient heating to maintain the LIC at the temper-
ature observed, T = 6, 300 ± 340 K. One set of parameter
choices, though, yields a successful model. These parame-
ters are N(H i) = 4.5 × 1017 cm−2 and log Th = 6.1.
4. Despite the wide range of possible input parameters, the out-
put values for quantities important for shaping the helio-
sphere are confined to a fairly small range: n(H0) = 0.19 −
0.20 cm−3, and ne = 0.05 − 0.08 cm
5. A H filtration factor of fH = 0.55 ± 0.03 yields good agree-
ment between the radiative transfer model predictions for
n(H i) in the CHISM, and n(H i) at the termination shock as
found from observations of PUIs, the H i Lyα glow, and the
solar wind slow-down in the outer heliosphere. This filtra-
tion value is also consistent with heliosphere models of the
ionization of interstellar H atoms traversing the heliosheath
regions.
6. Elements with ionization potentials 13.6− 25 eV, e.g. H, He,
N, O, Ne, and Ar, are partially ionized with ionization frac-
tions of ∼ 0.2 − 0.7.
7. By requiring that the models match the column densities de-
rived from absorption line data we are able to determine the
necessary elemental abundances for several elements. We
find that the abundances of N and O may be somewhat sub-
solar. Sulfur is roughly solar, and C is substantially super-
solar. Mg, Si and Fe are all sub-solar by factors of 3 − 15.
The depletions of Fe, Mg, Si and O in the LIC are consis-
tent with a dust population consisting of amorphous silicate
olivines MgFeSiO4, though other compositions for the dust
are possible as well. We conclude that any carbonaceous dust
in the LIC must have been destroyed, while silicate dust has
persisted. Except for the gas-to-dust mass ratio, these results
are in better agreement with the lower solar abundances of
Grevesse et al. (2007). However we note that the O and Ne
abundances of Lodders (2003) are in better agreement with
other astronomical data such as the X-ray absorption edges.
8. The gas-to-dust mass ratio derived from missing mass in the
gas-phase for our best models depends strongly on the as-
sumed reference abundance set and range from 137 − 323.
Our two best models, nos. 26 and 28, give a range of
149−217. For these same models RG/D = 115−125 based on
the observed flux of dust into the heliosphere. The discrep-
ancy of these values is minimized, in fact leading to con-
sistency within the errors, if one assumes an abundance set
such as that of GS98 which has large abundances of the met-
als. The GS98 abundances lead to substantial O depletion,
however, which is not easily explained and conflict with the
S abundances found for models 26 and 28.
9. These models also show that the densities of the trace ion-
ization species Ca ii and Na i are extremely sensitive to den-
sity and ionization. Therefore the ratios N(Ca ii)/N(Na i),
N(Na i)/N(H i), and N(Na i)/N(H) are, by themselves, inade-
quate diagnostics of warm low density diffuse gas.
Acknowledgements. We would like to thank George Gloeckler for sharing data
with us prior to publication, and Alan Cummings for pointing out that the
ACR isotopic data indicate that the LIC abundances are solar. We also thank
the International Space Science Institute in Bern, Switzerland for hosting the
working group on “Interstellar Hydrogen in the Heliosphere.” This research was
supported by NASA Solar and Heliospheric Program grants NNG05GD36G
and NNG06GE33G to the University of Chicago, and by the NASA grant
NNG05EC85C to SWRI.
References
Adams, T. F. & Frisch, P. C. 1977, ApJ, 212, 300
Ajello, J. M. 1978, ApJ, 222, 1068
Altobelli, N., Krüger, H., Moissl, R., Landgraf, M., & Grün, E. 2004,
Planet. Space Sci., 52, 1287
Altun, Z., Yumak, A., Badnell, N. R., Loch, S. D., & Pindzola, M. S. 2006, A&A,
447, 1165
Anders, E. & Grevesse, N. 1989, Geochim. Cosmochim. Acta, 53, 197
André, M. K., Oliveira, C. M., Howk, J. C., et al. 2003, ApJ, 591, 1000
Bellm, E. C. & Vaillancourt, J. E. 2005, ApJ, 622, 959
Bertaux, J. L. & Blamont, J. E. 1971, A&A, 11, 200
Bloch, J. J., Jahoda, K., Juda, M., et al. 1986, ApJ, 308, L59
Bzowski, M. 2003, A&A, 408, 1155
Bzowski, M., Gloeckler, G., Tarnopolski, S., Izmodenov, V., & Moebius, E.
2007, A&A, in press,
Cartledge, S. I. B., Lauroesch, J. T., Meyer, D. M., & Sofia, U. J. 2004, ApJ, 613,
Cheng, K. & Bruhweiler, F. C. 1990, ApJ, 364, 573
Cowie, L. L. & McKee, C. F. 1977, ApJ, 211, 135
Cravens, T. E. 2000, ApJ, 532, L153
Slavin and Frisch: Boundary Conditions of the Heliosphere 15
Cummings, A. C., Stone, E. C., & Steenberg, C. D. 2002, ApJ, 578, 194
Dupuis, J., Vennes, S., Bowyer, S., Pradhan, A. K., & Thejll, P. 1995, ApJ, 455,
Ebel, D. S. 2000, J. Geophys. Res., 105, 10363
Egger, R. J. & Aschenbach, B. 1995, A&A, 294, L25
Ferland, G. J., Korista, K. T., Verner, D. A., et al. 1998, PASP, 110, 761
Field, G. B. & Steigman, G. 1971, ApJ, 166, 59
Frisch, P. & York, D. G. 1986, in The Galaxy and the Solar System (University
of Arizona Press), 83–100
Frisch, P. C. 1981, Nature, 293, 377
Frisch, P. C. 2007, “Composition of Matter”, Space Sciences Series of ISSI, publ.
Springer, 27, 00
Frisch, P. C., Dorschner, J. M., Geiss, J., et al. 1999, ApJ, 525, 492
Frisch, P. C., Grodnicki, L., & Welty, D. E. 2002, ApJ, 574, 834
Frisch, P. C. & Slavin, J. D. 2003, ApJ, 594, 844
Frisch, P. C. & Slavin, J. D. 2005, Advances in Space Research, 35, 2048
Frisch, P. C. & Slavin, J. D. 2006, Short Term Variations in the Galactic
Environment of the Sun, in it Solar Journey: The Significance of Our Galactic
Environment for the Heliosphere and Earth, Ed. P. Frisch (Springer), 133–193
Frisch, P. C., York, D. G., & Fowler, J. R. 1986, in ESA Special Publication, Vol.
263, New Insights in Astrophysics. Eight Years of UV Astronomy with IUE,
ed. E. J. Rolfe, 491–492
Gloeckler, G. & Fisk, L. 2007, “Composition of Matter”, Space Sciences Series
of ISSI, publ. Springer, 27, 00
Gloeckler, G. & Geiss, J. 2004, Advances in Space Research, 34, 53
Gloeckler, G. & Geiss, J. 2007, A&A, this volume
Gloeckler, G. & Geiss, J. 2007, Space Science Reviews, 116
Gondhalekar, P. M., Phillips, A. P., & Wilson, R. 1980, A&A, 85, 272
Grevesse, N., Asplund, M., & Sauval, A. J. 2007, Space Science Reviews, 105
Grevesse, N. & Sauval, A. J. 1998, Space Science Reviews, 85, 161
Gry, C. & Jenkins, E. B. 2001, A&A, 367, 617
Gurnett, D. A. & Kurth, W. S. 2005, Science, 309, 2025
Hébrard, G., Mallouris, C., Ferlet, R., et al. 1999, A&A, 350, 643
Hebrard, G., Mallouris, C., Ferlet, R., et al. 1999, A&A, 350, 643
Henry, R. C. 2002, ApJ, 570, 697
Holberg, J. B., Bruhweiler, F. C., Barstow, M. A., & Dobbie, P. D. 1999, ApJ,
517, 841
Holweger, H. 2001, in AIP Conf. Proc. 598: Joint SOHO/ACE workshop ”Solar
and Galactic Composition”, 23–+
Hurwitz, M., Sasseen, T. P., & Sirk, M. M. 2004, ArXiv Astrophysics e-prints
Izmodenov, V., Malama, Y., Gloeckler, G., & Geiss, J. 2004, A&A, 414, L29
Juett, A. M., Schulz, N. S., Chakrabarty, D., & Gorczyca, T. W. 2006, ApJ, 648,
Kimura, H., Mann, I., & Jessberger, E. K. 2003, ApJ, 582, 846
Koutroumpa, D., Lallement, R., Kharchenko, V., et al. 2006, A&A, 460, 289
Kruk, J. W., Howk, J. C., André, M., et al. 2002, ApJS, 140, 19
Lallement, R. & Bertin, P. 1992, A&A, 266, 479
Lallement, R., Bertin, P., Ferlet, R., Vidal-Madjar, A., & Bertaux, J. L. 1994,
A&A, 286, 898
Lallement, R. & Ferlet, R. 1997, A&A, 324, 1105
Lallement, R., Vidal-Madjar, A., & Ferlet, R. 1986, A&A, 168, 225
Lallement, R., Welsh, B. Y., Vergely, J. L., Crifo, F., & Sfeir, D. 2003, A&A,
411, 447
Landgraf, M., Baggaley, W. J., Grün, E., Krüger, H., & Linkert, G. 2000,
J. Geophys. Res., 105, 10343
Landsman, W. B., Henry, R. C., Moos, H. W., & Linsky, J. L. 1984, ApJ, 285,
Lehner, N., Jenkins, E., Gry, C., et al. 2003, ApJ, 595, 858
Leske, R. A. 2000, in AIP Conf. Proc. 516: 26th International Cosmic Ray
Conference, ICRC XXVI, ed. B. L. Dingus, D. B. Kieda, & M. H. Salamon,
274–+
Leske, R. A., Mewaldt, R. A., Christian, E. R., et al. 2000, in AIP Conf.
Proc. 528: Acceleration and Transport of Energetic Particles Observed in the
Heliosphere, ed. R. A. Mewaldt, J. R. Jokipii, M. A. Lee, E. Möbius, & T. H.
Zurbuchen, 293–+
Linsky, J. L. & Wood, B. E. 1996, ApJ, 463, 254
Lodders, K. 2003, ApJ, 591, 1220
Möbius, E., Bzowski, M., Chalov, S., et al. 2004, A&A, 426, 897
Müller, H.-R., Florinski, V., Heerikhuisen, J., et al. 2007, A&A, in press, sub-
mitted
Müller, H.-R. & Zank, G. P. 2004a, J. Geophys. Res., 7104
Müller, H.-R. & Zank, G. P. 2004b, Journal of Geophysical Research (Space
Physics), 7104
Marschall, L. A. & Hobbs, L. M. 1972, ApJ, 173, 43
McCammon, D., Almy, R., Apodaca, E., et al. 2002, ApJ, 576, 188
McCammon, D., Burrows, D. N., Sanders, W. T., & Kraushaar, W. L. 1983, ApJ,
269, 107
McClintock, W., Henry, R. C., Linsky, J. L., & Moos, H. W. 1978, ApJ, 225, 465
McClintock, W., Linsky, J. L., Henry, R. C., & Moos, H. W. 1975, ApJ, 202, 733
Meyer, D. M., Cardelli, J. A., & Sofia, U. J. 1997, ApJ, 490, L103
Mitchell, J. J., Cairns, I. H., & Robinson, P. A. 2004, Journal of Geophysical
Research (Space Physics), 109, 6108
Oliveira, C. M., Hébrard, G., Howk, J. C., et al. 2003, ApJ, 587, 235
Parravano, A., Hollenbach, D. J., & McKee, C. F. 2003, ApJ, 584, 797
Pryor, W., Gangopadhyay, P., Sandel, W., et al. 2007, A&A, in press
Quémerais, E., Lallement, R., Bertaux, J.-L., et al. 2006a, A&A, 455, 1135
Quémerais, E., Lallement, R., Ferron, S., et al. 2006b, Journal of Geophysical
Research (Space Physics), 111, 9114
Raymond, J. C. & Smith, B. W. 1977, ApJS, 35, 419
Redfield, S. & Linsky, J. L. 2004, ApJ, 602, 776
Richardson, J. D., Liu, Y., Wang, C., & McComas, D. J. 2007, A&A, in press
Richardson, J. D., Wang, C., & Burlaga, L. F. 2004, Advances in Space Research,
34, 150
Ripken, H. W. & Fahr, H. J. 1983, A&A, 122, 181
Robertson, I. P. & Cravens, T. E. 2003, Journal of Geophysical Research (Space
Physics), 108, 6
Rucinski, D., Cummings, A. C., Gloeckler, G., et al. 1996, Space Science
Reviews, 78, 73
Sanders, W. T., Edgar, R. J., Kraushaar, W. L., McCammon, D., & Morgenthaler,
J. P. 2001, ApJ, 554, 694
Savage, B. D. & Sembach, K. R. 1996, ApJ, 470, 893
Sembach, K. R., Howk, J. C., Ryans, R. S. I., & Keenan, F. P. 2000, ApJ, 528,
Simpson, J. P., Rubin, R. H., Colgan, S. W. J., Erickson, E. F., & Haas, M. R.
2004, ApJ, 611, 338
Slavin, J. D. 1989, ApJ, 346, 718
Slavin, J. D. & Frisch, P. C. 2002, ApJ, 565, 364
Slavin, J. D. & Frisch, P. C. 2006, ApJ, 651, L37
Smith, R. K., Edgar, R. J., Plucinsky, P. P., et al. 2005, ApJ, 623, 225
Snowden, S. L., Collier, M. R., & Kuntz, K. D. 2004, ApJ, 610, 1182
Snowden, S. L., Cox, D. P., McCammon, D., & Sanders, W. T. 1990, ApJ, 354,
Snowden, S. L., Egger, R., Finkbeiner, D. P., Freyberg, M. J., & Plucinsky, P. P.
1998, ApJ, 493, 715
Snowden, S. L., Egger, R., Freyberg, M. J., et al. 1997, ApJ, 485, 125
Snowden, S. L., McCammon, D., & Verter, F. 1993, ApJ, 409, L21
Sofia, U. J., Cardelli, J. A., Guerin, K. P., & Meyer, D. M. 1997, ApJ, 482, L105
Stokes, G. M. 1978, ApJS, 36, 115
Takei, Y., Fujimoto, R., Mitsuda, K., & Onaka, T. 2002, ApJ, 581, 307
Thomas, G. E. & Krassa, R. F. 1971, A&A, 11, 218
Vallerga, J. 1996, Space Sci. Rev., 78, 277
Vallerga, J. 1998, ApJ, 497, 921
Wargelin, B. J., Markevitch, M., Juda, M., et al. 2004, ApJ, 607, 596
Weller, C. S. & Meier, R. R. 1981, ApJ, 246, 386
Welty, D. E., Hobbs, L. M., Lauroesch, J. T., et al. 1999, ApJS, 124, 465
Witte, M. 2004, A&A, 426, 835
Witte, M., Banaszkiewicz, M., & Rosenbauer, H. 1996, Space Science Reviews,
78, 289
Wood, B. E., Linsky, J. L., & Zank, G. P. 2000a, ApJ, 537, 304
Wood, B. E., Linsky, J. L., & Zank, G. P. 2000b, ApJ, 537, 304
Wood, B. E., Redfield, S., Linsky, J. L., Müller, H.-R., & Zank, G. P. 2005, ApJS,
159, 118
Wood, B. E., Redfield, S., Linsky, J. L., & Sahu, M. S. 2002, ApJ, 581, 1168
York, D. G. 1974, ApJ, 193, L127
York, D. G. 1983, ApJ, 264, 172
List of Objects
‘ǫ CMa’ on page 2
‘Sirius’ on page 3
	Introduction
	Photoionization Model Constraints and Assumptions 
	Astronomical Constraints – The LIC towards  CMa  
	In situ Constraints – He0, Pickup Ions, and Anomalous Cosmic Rays 
	Interstellar Radiation Field at the Cloud Surface 
	Photoionization Models
	Model Results 
	Discussion
	Heliosphere Boundary Conditions
	Hydrogen Filtration Factor
	Gas-Phase Abundances 
	Gas-to-Dust Mass Ratio 
	Heating and Cooling Rates 
	Radiation Field
	LIC Pressure
	Comparisons with other LISM Sightlines
	Conclusions
ABSTRACT
  The boundary conditions of the heliosphere are set by the ionization, density
and composition of inflowing interstellar matter. Constraining the properties
of the Local Interstellar Cloud (LIC) at the heliosphere requires radiative
transfer ionization models. We model the background interstellar radiation
field using observed stellar FUV and EUV emission and the diffuse soft X-ray
background. We also model the emission from the boundary between the LIC and
the hot Local Bubble (LB) plasma, assuming that the cloud is evaporating
because of thermal conduction. We create a grid of models covering a plausible
range of LIC and LB properties, and use the modeled radiation field as input to
radiative transfer/thermal equilibrium calculations using the Cloudy code. Data
from in situ observations of He^O, pickup ions and anomalous cosmic rays in the
heliosphere, and absorption line measurements towards epsilon CMa were used to
constrain the input parameters. A restricted range of assumed LIC HI column
densities and LB plasma temperatures produce models that match all the
observational constraints. The relative weakness of the constraints on N(HI)
and T_h contrast with the narrow limits predicted for the H^O and electron
density in the LIC at the Sun, n(H^0) = 0.19 - 0.20 cm^-3, and n(e) = 0.07 +/-
0.01 cm^-3. Derived abundances are mostly typical for low density gas, with
sub-solar Mg, Si and Fe, possibly subsolar O and N, and S about solar; however
C is supersolar. The interstellar gas at the Sun is warm, low density, and
partially ionized, with n(H) = 0.23 - 0.27 cm^-3, T = 6300 K, X(H^+) ~ 0.2, and
X(He^+) ~ 0.4. These results appear to be robust since acceptable models are
found for substantially different input radiation fields. Our results favor low
values for the reference solar abundances for the LIC composition.

<|endoftext|><|startoftext|>
Introduction
	Model Hamiltonian
	Single-orbital model. Spin-selective hybridization.
	Description of the employed simplifications
	Interaction Hamiltonian in excitonic representation
	Numerical estimates of the energy parameters
	Collective spin-flip states. negative g2DEG-factor
	Secular equation
	Spectrum of the localized states
	Delocalized impurity-related excitations.
	Positive g2DEG-factor. Pinning of the QHF spin
	Skyrmionic states created by magnetic impurities
	Phase diagram of QHF ground state at g2DEG*>0 
	Discussion
	acknowledgments
	EXCITONIC REPRESENTATION
	SPIN OPERATORS
	References
ABSTRACT
  A theory of collective states in a magnetically quantized two-dimensional
electron gas (2DEG) with half-filled Landau level (quantized Hall ferromagnet)
in the presence of magnetic 3d impurities is developed. The spectrum of bound
and delocalized spin-excitons as well as the renormalization of Zeeman
splitting of the impurity 3d levels due to the indirect exchange interaction
with the 2DEG are studied for the specific case of n-type GaAs doped with Mn
where the Lande` g-factors of impurity and 2DEG have opposite signs. If the
sign of the 2DEG g-factor is changed due to external influences, then impurity
related transitions to new ground state phases, presenting various spin-flip
and skyrmion-like textures, are possible. Conditions for existence of these
phases are discussed. PACS: 73.43.Lp, 73.21.Fg, 72.15.Rn

<|endoftext|><|startoftext|>
Introduction
About a year after the discovery of the first optical af-
terglow of a Gamma-Ray Burst (GRB) by van Paradĳs
et al. (1997), two of van Paradĳs’ students discov-
ered the first supernova associated with a long-duration
E.P.J.van den Heuvel and S.-C. Yoon
Astronomical Instiutute “Anton Pannekoek” & Center for High
Energy Astrophysics, University of Amsterdam, The Netherlands
and Kavli Institute for Theoretical Physics, University of Califor-
nia, Santa Barbara
GRB: SN 1998bw/GRB980425 (Galama, Vreeswĳk et
al. 1998). This supernova appeared to be highly pecu-
liar and energetic. It is of class Ic, which means that
it has no H or He in its spectrum. Its outflow veloci-
ties of > 30000 km/s were very much larger than the
10000 km/s seen in “ordinary” Type Ic supernovae and
the total kinetic energy in SN1998bw was > 1052 ergs:
at least an order of magnitude larger than in other
supernovae. Theoretical modeling by Iwamoto at al.
(1998) showed that the exploding star must have been
a Carbon-Oxygen star with a mass in the range 6 to 13
M�, which had a collapsing core > 3 M�. The latter
is too large to leave a neutron star, implying that this
was the first-ever observed birth event of a stellar-mass
black hole (Iwamoto et al. 1998). The discovery of
SN1998bw was a beautiful confirmation of the “collap-
sar” (“hypernova”) model proposed by Woosley (1993).
According to this model the collapse of the rapidly ro-
tating core of a massive star to a black hole will leave
behind a rapidly rotating torus of extremely hot nuclear
matter around the black hole. Internal friction in this
keplerian torus causes its matter to spiral in towards the
black hole within a few minutes, generating so much
heat in this process that part of the matter is blown
away in directions perpendicular to the plane of the
torus with relativistic velocities. Woosley speculated
that these relativistic “jets” of matter might produce a
GRB. SN 1998bw appeared to confirm the predictions
of Woosley‘s “collapsar” (“hypernova”) model. Al-
though GRB980425 was, as a GRB, intrinsically quite
faint and nearby (z=0.0085), which at first cast some
doubt on the idea that genuine long-duration GRBs
would in general be the birth events of stellar black
holes, the discovery of the association of the really “cos-
mological” gamma-ray burst GRB 030329 (z= 0.17)
with a supernova with a spectrum and lightcurve almost
identical to those of SN1998bw (e.g. Hjorth et al. 2003)
confirmed beyond any reasonable doubt the association
of long GRBs (abbreviated further as LGRB) with the
death events of very massive stars and the formation of
black holes. Indeed, while the lightcurves of the opti-
cal transients (OTs) associated with LGRBs are often
dominated by the radiation from the relativistic outflow
of the GRB, numerous LGRBs have shown late-time
“bumps” consistent with the presence of underlying su-
pernovae (e.g. Bloom et al. 1999; Galama et al. 1999;
Levan et al. 2005). For a review see Woosley and Bloom
(2006). These discoveries have given strong credence to
Woosley‘s (1993) model as the “standard” model for
the production of the LGRBs, and this model has been
worked out in more detail by Woosley and collabora-
tors (e.g. MacFadyen and Woosley 1999; Woosley and
Heger, 2006). To distinguish these very energetic and
peculiar Ic “supernovae” associated LGRBs from the
more ordinary Ibc supernovae, we will in this paper call
them “hypernovae”. In order to finish with a pure CO-
core of mass > 6M�, a star must have started out on
the main sequence with a mass > 30M�, which implies
that the LGRBs are associated with the most massive
stars. Here we will discuss further evidence linking in-
deed the LGRBs with such stars, and examine under
which circumstances a star could lose its entire H- en
He-rich envelope before collapsing to a black hole. It
appears that the removal of the envelope by a binary
companion might be an attractive possibility.
2 Host Galaxy Characteristics: further
evidence for an association of the LGRBs
with the most massive stars.
In a very important recent paper, Fruchter et al. (2006)
reported that the environments of LGRBs are strik-
ingly different from those of the “ordinary” core col-
lapse supernovae of types Ib,c and II. Using Hubble
Space Telescope imaging of the host galaxies of LGRBs
and core-collapse supenovae they found that the GRB
are far more concentrated on the very brightest re-
gions of their host galaxies than are the supernovae.
Furthermore, they found that the host galaxies of the
GRBs are significantly fainter and more irregular than
the hosts of the supernovae. Theoretical work (Fryer,
2004, 2006) shows that stars which started out on the
main sequence with masses between 8 and 20 M� leave
neutron stars as remnants, while the cores of stars more
massive than about 20M� collapse to black holes. Fig-
ure 1, after Fryer (2006) shows that this happens irre-
spective of initial metallicity, although the black holes
produced at lower metallicity tend to be much more
massive than those from higher metallicity stars. In
view of the slope of the IMF, some 75 per cent of the
Fig. 1.— Mass of collapsed remnant as a function of
initial main-sequence progenitor mass from the analysis
by Fryer (2006), for both the Limongi & Chieffi (2006)
and Woosley et al. (2002) stellar progenitors. The lines
are derived from the Woosley et al. (2002) progenitors:
dotted line refers to solar metallicity, solid line refers to
very low metallicity. The points are derived from the
Limogni and Chieffi (2006) models: circle -solar, square
0.2 solar, triangle - zero, metallicities. Around 20 solar
masses the outcome depends sensitively on the stellar
evolution code used. Credit: C.L.Fryer (2006)
deaths of stars >8 M� arise from the mass range 8-
20M�, and only some 25 per cent from masses > 20M�.
Therefore, the bulk of the core collapse supernovae will
be neutron-star forming events. It thus appears that
the neutron-star forming events follow the normal light
distribution of their host galaxies, whereas the LGRBs
are concentrated strongly on the brightest parts of these
galaxies. Another striking difference is that while half
of the hosts of the “normal” core collapse supernovae
are Grand Design (GD) spiral galaxies, only one out of
the 42 hosts of the LGRBs is a GD spiral, the other
41 being smaller and more irregular galaxies. [In the
case of the one GD spiral it is still very well possible
that the real host is a small SMC- or LMC-like satellite
of this spiral galaxy, which at this distance cannot be
separately recognized].
The brightest patches of the irregular and small host
galaxies of LGRBs are “clumps” of massive stars. This
follows from the fact that these hosts are generally
found to be very blue ( Fruchter et al. 1999; Sokolov et
Long Gamma-Ray Burst Progenitors: Boundary Conditions and Binary Models 3
al. 2001) and have strong emission lines (Bloom et al.
1998; Vreeswĳk et al. 2001), suggesting a significant
abundance of young massive stars. At the large red-
shifts of the GRB hosts it is impossible to distinguish
the stellar content of the bright emission line spots (the
entire HST image of a host is often smaller than an
arcsec), but nearby small irregular starforming (“star-
burst”) galaxies serve as a good example of what is
going on in these small GRB hosts. A nearby exam-
ple of such a galaxy is NGC 3125 which was studied
by Hadfield and Crowther (2006). These authors find
that the bright spots of this galaxy consist of large con-
centrations of O- and WR-type stars, which number of
order 10 000 in this galaxy. The galaxy has a metal-
licity like that of the SMC/LMC (between 0.2 and 0.5
solar) and its brightest clump has at least four dense
star clusters of > 200 000 solar masses, each with some
600 O-stars. A few of the hosts of relatively nearby
LGRBs associated with hypernovae show similar char-
acteristics. The host of SN1998bw is an LMC-size star-
forming galaxy; the host of GRB060218 is SMC size;
the host of GRB030329 is a z=0.17 undetectable, indi-
cating that its size must be smaller than that of the
SMC, and the host of GRB970228 at z=0.67 is not
larger than the LMC.
Recently Wolf and Podsiadlowski (2006), statis-
tically studying part of the host galaxy sample of
Fruchter et al. (2006), concluded that the typical
LGRB host galaxy is of LMC size. They found, on
the basis of the metallicity-luminosity relation for star-
forming galaxies, that LGRB models that require a
sharp metallicity cut-off below 0.5 solar metallicity are
effectively ruled out as they would require fainter host
galaxies than are observed. They therefore conclude
that metallicities up to 0.5 solar must be allowed by
models for LBRBs/hypernovae. As, however, in these
irregular galaxies the metallicity may vary wildly from
place to place, it is not clear to us whether not the
LGRBs might arise from areas in the hosts of much
lower metallicity, while the average metallicity of the
host might still be up to of order 0.5 solar.
3 Possible reasons why small “starburst-like”
galaxies are the prime sources of LGRBs
These reasons can be divided into two broad categories:
(1) Metallicity-related, (2) Starburst-related.
As to Category (1): the wind mass-loss rates from
massive stars are known to be metallicity-related: Mok-
iem at al.(2006) find from observations of O- and B-
supergiants in the Local Group galaxies that he wind
mass-loss rates scale roughly as Ṁw ∝ Z0.78, where Z
is the abundance of the elements heavier than helium.
This implies that at lower metallicities, such as in the
SMC and LMC (0.2 and 0.5 solar, respectively) massive
stars lose (much) less mass during their evolution than
in our galaxy. Therefore, they are more likely to finish
as a black hole. Indeed, one observes that in the LMC
half of the four known persistent High Mass X-ray Bina-
ries (HMXB) harbour a black hole while in our Galaxy
only one out of the over 20 known persistent HMXBs
harbours a black hole (Cygnus X-1). It thus appears
that at low Z, black-hole production is more efficient.
In addition, a requirement for producing a “hypernova”
is that at the time of the core collapse, the star is still
rotating sufficiently rapidly to enable the formation of
a disk or torus around the black hole (MacFadyen and
Woosley 1999). Lower wind mass-loss rates imply also
lower angular momentum loss rates, which will increase
the probability of having still a sufficiently rapidly ro-
tating stellar core at the time of the collapse.
As to Category (2): It is well-known that during a
starburst massive dense star clusters form with many
hundreds, if not thousands, of massive OB stars. For
example, many such massive young globular clusters are
observed in the pair of Antennae Galaxies. In massive
young globular clusters a variety of dynamical interac-
tions take place between massive stars, massive binaries
and stellar remnants (black holes, neutron stars) rang-
ing from direct collisions to companion exchanges in
binary systems, and to the formation of so-called In-
termediate Mass Black Holes (IMBHs) with masses of
order 100 to 1000 solar masses (Portegies Zwart et al.,
2002, 2004, 2006). These can be unique events, which
do not occur in any other stellar environment. Kulkarni
(2006) suggested that LGRBs might be related to such
unique events that can occur only in starburst galax-
ies. This interesting idea merits to be further worked
out, but at present not much further can be said about
it. For this reason we will here only concentrate on the
possible relation between LGRBs and metallicity. In
order to make a hypernova such as the ones observed
to coincide with the LGRBs, the two following condi-
tions should be fulfilled:
(1) the star must have lost its H- and He-rich outer lay-
(2) At the time of core collapse, the core should have
specific angular momentum in the range
J(CO − core) = (3− 20)× 1016[cm2/s] (1)
In order to fulfill these two conditions, two possible sce-
narios have been proposed:
(i) Completely-mixed single-star evolution of a rapidly-
rotating low-metallicity star (Yoon and Langer 2005;
Woosley and Heger 2006).
(ii) Binary mass exchange, where the star achieves and
maintains its rapid rotation due to tidal synchroniza-
tion in a close binary (Izzard et al. 2004; Podsiadlowski
et al. 2004).
We now separately discuss these two possible scenarios.
4 Completely mixed single-star models of low
metallicity
In this case the rapid rotation of the star keeps it com-
pletely mixed by meridional circulation during its en-
tire H-burning evolution. The low metallicity causes
the wind mass- and angular-momentum-loss rates to
be small such that the star keeps rotating rapidly until
the end. The complete mixing makes that by the end
of hydrogen burning the star has become a complete
helium star (the weak wind has by that time carried off
the thin hydrogen envelope that still surrounded the
helium core). Yoon and Langer (2005) calculated such
an evolution for a star which started out with M= 40
M� and Z= 10−5 and find that it evolves into a rapidly
rotating pure helium star of 32 M�, which after 600
years of C-burning undergoes core-collapse to a black
hole with sufficient angular momentum to make a hy-
pernova. They find that this type of evolution follows if
the star starts out with an equatorial rotation velocity
of 0.5 times the critical one. Later calculations by these
authors suggest that up to Z =0.2 solar the stars still
follow this evolutionary path. Woosley and Heger find
that it would still work up to Z = 0.33 solar. For higher
Z this single star model no longer works. If the conclu-
sion of Wolf and Podsiadlowski (2006) mentioned in
section 2 would strictly hold, i.e. if models should work
up to Z =0.5 solar, these single star models would be
ruled out. However, as mentioned at the end of section
2, due to the patchy distribution of metallicity in irregu-
lar starburst galaxies, there could easily be patches with
SMC-like (Z=0.2) metallicities in the irregular hosts
and therefore certainly these completely mixed single
star models cannot be ruled out. In the calculations of
Yoon and Langer (2005) these stars still have a helium-
rich envelope, which would lead to a Type Ib supernova,
but later calculated models (Yoon, Langer and Norman
2006) and also some of the Woosley and Heger (2006)
models lose this envelope by wind such that they would
produce a Type Ic supernova.
5 Binary Models; can LGRBs be the formation
events of Black-Hole X-ray Binaries?
5.1 Introduction
The first ones to consider binary models for making
LGRBs were Fryer and Woosley (1998). Their model
was, however, not a core-collapse model, but one in
which an already existing black hole in an X-ray bi-
nary spiraled down into the helium core of its massive
companion, as a result of a Common-Envelope phase.
Although interesting, we will not consider such models
here and only concentrate on “hypernova” models in
which the LGRB coincides with the core-collapse event
in which a black hole is formed.
Izzard et al. (2004) and Podsiadlowski et al. (2004)
were the first to consider the role that binary systems
might play in producing such “hypernova” events. At
present some twenty close X-ray binaries are known
that consist of a black hole and a low-mass companion
star (see McClintock and Remillard, 2006). The black
hole in such systems typically has a mass between 3
and 20 M�, and the companion is a Roche-lobe filling
star with a mass < 2M�. The orbital periods are in
general less than a few days, and in many cases less
than 0.5 day. In the system of X-ray-Nova Sco 1994
(J1655-40) the F-type companion of the 7 M� black
hole has an overabundance of alpha-type elements such
as S, Mg and Si of more than one order of magnitude
(Israelian et al. 1999). This is just what one expects if
the outer layers of the star of which the core collapsed
to the black hole were ejected in a supernova-like event
and polluted the outer layers of the F-type companion.
It thus appears that in this black-hole X-ray binary a
hypernova-like event took place. Podsiadlowski et al.
(2004) propose that in all of these low-mass black hole
X-ray binaries the formation event of the black hole
produced a LGRB. The formation of these BH-LMXBs
requires a preceding Common-Envelope (CE) phase of
an initially wide binary system consisting of the massive
progenitor star of the black hole together with a dis-
tant low-mass companion star (e.g. see van den Heuvel
and Habets 1984; Brown et al. 1996; Nelemans and
van den Heuvel, 2001). During this CE phase the low-
mass companion spiraled down deeply into the envelope
of the massive companion resulting in a very close bi-
nary system consisting of the helium core of the mas-
sive star together with its low mass-mass main-sequence
companion (< 2M�). Izzard et al. (2004) and Podsi-
adlowski et al. (2004) suggested that tidal forces in
this close binary keep the helium star in synchronous
(=rapid) rotation, allowing it to have sufficient angular
momentum at the time of its core collapse to produce a
Long Gamma-Ray Burst Progenitors: Boundary Conditions and Binary Models 5
hypernova. These authors, however, did not calculate
the timescales on which tidal synchronization in such
binaries can be achieved. In order to see whether such
a model can work, one has to calculate these timescales
as well as the timescales on which the rotation of the
contracting stellar core is synchronized with the outer
envelope of the star. These two problems we will con-
sider here.
5.2 Timescales for synchronization of helium stars in
close binaries with a main-sequence companion.
We consider helium stars of 8 and 16 M�, which are
probably representative for the progenitors of the black
holes in LGRBs. Helium-burning helium stars with
such masses are almost completely convective. In 8
and 16 M� helium stars the convective cores have radii
of about 60 and 70 per cent, respectively, of the stellar
radii, and occupy most of the stellar mass (Paczynski
1971).
According to Zahn (1975, 1977) the tidal synchro-
nization timescale for a star with a convective core and
a radiative envelope is given by:
1/tsync = 52
)1/2 MR2
q2(1 + q)5/6E2
where q = M2/M is the mass ratio of the compan-
ion (M2) and of the star to be synchronized (M), and
gs, R and I are the surface gravity, radius and mo-
ment of intertia, respectively, of the latter star, a is the
orbital radius and E2 is the tidal torque constant for
stars with a radiative envelope and a convective core.
E2 is proportional to (Rc/R)6, where Rc is the radius
of the convective core (Zahn, 1975, 1977). Zahn (1975)
calculated the values of E2 for main-sequence stars of
various masses. For such stars in the mass range 7 to
15M� he found E2 to be around 10−4. In order to cor-
rect for the much larger relative radius of the convec-
tive cores in helium stars, one has to multiply the E2-
values for main-sequence stars of similar masses with
(RcHe/Rcms)6 , where Rcms is the relative radius if the
convective core of the main-sequence star, and RcHe is
the one of the helium star. To this end we used for the
8M� helium star (Rc = 0.7R) the E2 value of Zahn’s
10M� main-sequence star (Rc = 0.27R) and for the
16M� Helium star (Rc = 0.8R) we used the E2 value of
Zahn‘s 15M� main-sequence star (Rc = 0.30R). This
yields E2 = 4.4 × 10−4 for the 8M� helium star and
E2 = 1.7× 10−2 for the 16M� helium star.
In order to get the shortest possible orbital periods,
we now assume that after the CE phase the low-mass
main-sequence companion of the helium star fills its
Roche lobe.
We then find for the 8M� helium star that with
Roche-lobe-filling companions of 1, 2 and 4 M�, re-
spectively, the orbital periods are 8.78, 10.45 and 12.43
hours, respectively; using equation (2) we then find
that with these three companion masses the tidal syn-
chronization timescales of these three systems are 1800,
1400 and 1130 years, respectively. For a 16M� helium
star the orbital periods with these three main-sequence
companion masses are exactly the same and the tidal
synchronization timescales are 440, 400 and 370 years,
respectively. The lifetimes of helium stars of 8 and 16
M�, respectively, are of order 5 × 105 yrs (Paczynski
1971). Thus one expects, as already assumed by Izzard
et al. (2004) and Podsiadlowski et al. (2004), that these
helium stars will be fully synchronized with their or-
bital motion throughout their core-helium-burning evo-
lution. Could after the end of helium burning the con-
tracting Carbon-Oxygen core of the helium star keep
the angular momentum which it obtained in its state
of synchronized helium star and maintain that angular
momentum until core collapse? As we will now show,
it is unlikely that it will be able to take this barrier.
Fig. 2.— Solid curve: specific angular momentum as
a function of mass in a synchronized 8 solar mass he-
lium star with a 0.8 solar mass Roche-lobe filling main-
sequence companion. Dotted curve: specific angular
momentum distribution required for the formation of a
hypernova in case the mass interior toMr collapses to a
Schwarzschild black hole; dash-dotted curve: the same
for the case of a Kerr black hole
5.3 Timescales for core-envelope coupling
The fully drawn curve in Figure 2 shows the specific an-
gular momentum distribution in a synchronized helium
star of 8M� in a close binary with a 0.8M� Roche-lobe
filling companion (Porb = 7.17h), compared with the
minimum specific angular momentum required to form
an accretion disk around a Schwarzschild and a Kerr
black hole, as a function of the black hole mass. One
observes from this figure that if the inner part of the
helium star can maintain its specific angular momen-
tum also when it becomes a contracting CO-core (which
then will spin much faster than its helium envelope)
then indeed the inner parts of such helium stars would
be able to produce a hypernova/GRB if the black hole
is of the Kerr type. However, whether the contracting
CO-core can maintain its specific angular momentum
which it had as a helium star, depends on the timescale
of core-envelope coupling. It is expected that this cou-
pling in a convective differentially rotating star will be
due to magnetic fields generated in this star, and Spruit
(2002) has derived the order of magnitude timescale for
this coupling. Yoon (2006) calculated the evolution of
rotating helium stars with masses between 8 and 40M�
using Spruit’s (2002) mechanism for core-envelope cou-
pling. He found that the inner 3M� of the CO-cores of
these stars at the moment of core collapse have retained
a fraction f of their initial specific angular momentum
which they had as a helium star in solid-body rotation:
For MHe = 8-16 M�: f = 0.2; 20M�: f = 0.4; 25M�: f
= 0.6; 30M�: f = 0.65; 40 M�: f = 0.75.
Using these values for 8-16 M� stars in Figure 2 one
sees that the specific angular momentum in the central
parts of the 8 M� helium star (the fully drawn curve)
moves downwards by a factor 5 and thus falls below
the Kerr as well as the Schwarzschild curves. The same
holds for the 16M� helium star. This means that while
a Helium star in a close binary with a Roche-lobe fill-
ing low-mass main-sequence star has achieved tidal syn-
chronization during core-helium burning, still its core
at the time of its collapse will be unable to produce a
hypernova/LGRB. We thus see that the progenitors of
the black holes in the Black-Hole X-ray Binaries with
low-mass companion stars in all likelyhood did not pro-
duce a hypernova/LGRB.
5.4 Timescales for synchronization of helium stars in
close binaries with a compact companion
Such binaries will form by the spiral-in of a neutron-
star or black-hole companion of a massive star in a wide
High-Mass X-ray Binary (HMXB). Recently, with IN-
TEGRAL such a wide system was discovered, consist-
ing of a blue supergiant and a compact star in a 330
day orbit (Sidoli et al. 2006). [In HMXBs with orbital
periods shorter than about 100 days, the compact star
is expected to spiral into the core of its companion such
that no binary will be left (e.g. Taam 1996)]. Presently
three close X-ray binaries consisting of a helium star
(Wolf-Rayet star) and a compact object are known:
Cygnus X-3 (Porb = 4.8 h; van Kerkwĳk et al. 1992),
and the extragalactic sources IC10 X-1 (Porb = 34.8
h, Prestwich et al. 2007; ATel 955) and NGC 300 X-
1 (Porb = 32.8 h; Carpano et al. 2007). The short-
est possible orbital periods of helium star plus compact
star binaries will occur if the helium star fills its Roche
lobe. For helium stars of 8M� and 16M� these short-
est possible orbital periods are 2.046 and 2.466 hours,
respectively, independent of the mass of the compact
companion. Using equation (2) one finds that the syn-
chronization timescales in these systems are extremely
short, of the order of years to decades at most, such
that they will remain synchronized throughout their
core-helium- burning evolution. The specific angular
momentum is here 3.7×1017 and 6.0×1017 cgs, respec-
tively. As mentioned above, the cores of these stars
can maintain some 20 per cent of this up till core col-
lapse. Equation (1) shows that this is sufficient to make
a LGRB/hypernova. Thus the post-in-spiral remnants
of HMXBs are suitable for producing Long GRBs.
Some example progenitor HMXBs that might pro-
duce a LGRB: We use Webbink‘s (1984) equation to
calculate the ratio of the final and initial orbital radius
in the case of Common-Envelope evolution (e.g. see
also van den Heuvel 1994). We will assume that the
product αλ = 1, where α is the efficiency parameter
for the ejection of the envelope, and λ is a parameter
characterizing the density structure of the star. Our
first example is Cygnus X-1, for which we adopt a mass
of 35M�, with a 14M� helium core for the supergiant
and a mass of 15M� for the black hole (e.g. Gies and
Bolton, 1982, 1986). The initial orbital period of 5.6
days of this system then results into a final orbital pe-
riod of 2.4 hours for the 14M� helium star plus the
15M� black hole. In this case the helium star will be
very close to filling is Roche lobe, so we expect the final
product of the Cygnus X-1 system to be able to pro-
duce a hypernova/LGRB when the core of the helium
star collapses to a black hole.
A second example is the system of 4U 1223-62/Wray
977, which consists of a neutron star and a blue hyper-
giant (B1.5Ia0) in an eccentric orbit with P = 41.5 days
(e.g. see Kaper et al. 2006). The hypergiant is likely
to have a mass 35M�, so we again we assume here
a helium core of 14M�. For the neutron star we as-
sume a mass of 1.8M� (like in the system of Vela X-1,
which also is a very massive X-ray binary). Assuming
the same values for alpha and lambda as in the first
case, we find that the final orbital period after spiral-in
is 2.1 hours, such that again the helium star just fits in-
side its Roche lobe. So also here the core of the helium
Long Gamma-Ray Burst Progenitors: Boundary Conditions and Binary Models 7
star at the time of collapse will have enough angular
momentum to make a hypernova/LGRB.
6 Discussion and Conclusions
We saw in section 4 that completely rotationally mixed
single star evolution at relatively low metallicities (Z ≤
0.33 solar) may well provide a viable model for the
production of LGRBs/hypernovae. As to binary mod-
els: the results from section 5 show that, assuming
Zahn’s (1975, 1977) model for the tidal synchroniza-
tion of helium stars in close binaries, massive helium
stars with main-sequence companions will be quickly
synchronized, within a few centuries to millennia, with
their orbital motion. However, we find that as a conse-
quence of efficient core-envelope coupling in the post-
helium burning phase it is unlikely that these stars by
the time of core collapse will have sufficient core angu-
lar momentum to produce a hypernova/GRB. On the
other hand, if the companion of the helium star is a
compact object and the helium star is close to filling its
Roche lobe (implying a very short orbital period, of the
order of a few hours) we find that by the time of core
collapse the core can still have sufficient angular mo-
mentum to produce a hypernova/GRB. The fact that
we already know two potential progenitors of close he-
lium star plus compact star companion binaries among
the HMXBs within 3.5 kpc distance from the sun im-
plies that there must be several dozens such progenitor
systems in our galaxy. Assuming a lifetime of some
50000 years for the HMXB phase, and 25 such systems
in the Galaxy, one would expect one hypernova/LGRB
from such systems every 2000 years. This is about 5
per cent of the SN rate in our galaxy. Assuming that
the GRBs are beamed within a cone of opening half-
angle 5 degrees (Frail et al. 2001), we would expect
to observe one LGRB from such binary systems per 2
million years from a Galaxy like our own.
We note that although this binary model appears
viable, it remains puzzling why LGRBs have such
a strong preference for the small irregular starburst
galaxies. A possible explanation might be that at
low metallicity a much larger fraction of the massive
stars collapses to black holes. In such galaxies one
would already expect most of the persistent “standard”
HMXBs (that is: the ones with massive blue supergiant
donor stars) to harbour black holes, while then also
the donor stars in such systems are likely to collapse
to black holes. This would imply that, if indeed the
LGRBs originate from binary systems, a considerable
fraction of the hypernovae/LGRBs will be the forma-
tion events of close double black hole systems. We note
that also Tutukov and Cherepaschuck (2004) have pro-
posed that LGRBs are later evolutionary products of
HMXBs. They assumed (but did not calculate) that
the helium star plus compact star remnants from such
systems would be synchronized and also assumed that
the collapsing cores would have retained the angular
momentum from the time a synchronized helium star.
We have shown here quantitatively that this is indeed
the case.
This research was supported in part by the National
Science Foundation under Grant No. PHY99-07949.
The first author thanks the Mount Stromlo Observa-
tory for its hospitality during the conference and the
Netherlands research School for Astronomy NOVA for
financial support for participation in this meeting.
References:
Bloom, J.S., Djorgovski, S.G., Kulkarni, S.R. and Frail,
D.A., 1998, Ap.J. 507, L25-L28
Bloom, J.S. et al., 1999, Nature, 401, 253- 456.
Brown, G.E., Weingartner, J.C. and Wĳers, R.A.M.J.,
1996, Ap. J. 463, 297-304.
Carpano, S., Pollock, A.M.T., Prestwich, A., Crowther,
P., Wilms, J., Yungelson, L. and Ehle, M., 2007, astro-
ph/0703270 (accepted as a Letter in Astron. & Ap.).
Fryer, C.L. (editor), 2004, “Stellar Collapse”, Kluwer
Acad. Publishers, Dordrecht, 406 pp.
Fryer, C.L., 2006, New Astron. Rev. 50, 492.
Fryer, C.L. and Woosley, S.E., 1998, Ap.J. 502, L9-L12.
Frail, D.A. et al., 2001, Ap.J. L55.
Galama, T.J., Vreeswĳk, P.M., et al., 1998, Nature 395,
Galama, T.J. et al., 2000, Ap.J. 536, 185-194.
Fruchter, A.S. et al., 1999, Ap.J. 519, 13-16.
Fruchter, A.S., Levan, A.J., Strolger, L., Vreeswĳk,
P.M., Thorsett, S.E., Bersier,D., Burud, I, and 26 co-
authors, 2006, Nature 441, 463.
Hadfield, L.J. and Crowther, P.A., 2006, MMRAS 368,
1822-1832.
Hjorth, J. et al., 2003, Nature 423, 847-850.
Israelian, G., Rebolo, R., Basri, G., Casares, J. And
Martin. E.L., 1999, Nature 401, 142-144.
Iwamoto, K. et al. 1998, Nature 395, 672.
Izzard, R.G., Ramirez-Ruiz, E. And Tout, C.A., 2004,
MNRAS 348, 1215.
Kaper, L., van der Meer, A., van Kerkwĳk, M.H and
van den Heuvel, E.P.J., 2006, Astron. Ap. 457, 595-
Kulkarni, S.R. 2006, talk presented at Kavli Institute
for Theoretical Physics, March 2006.
http://arxiv.org/abs/astro-ph/0703270
http://arxiv.org/abs/astro-ph/0703270
Levan, A. et al., 2005, Ap.J. 624, 880-888.
Limongi, M. and Chieffi, A. 2006, Ap.J. 647, 483.
MacFadyen, A.I. and Woosley, S.E., 1999, Ap.J. 524,
McClintock, J.E. and Remillard, R.A., 2006 in “Com-
pact Stellar X-ray Sources” (editors W.H.G.Lewin and
M. van der Klis), Cambridge Univ. Press, p. 157-213.
Mokiem, R., de Koter, A., et al. 2006, astro-
ph/0606403.
Nelemans, G. And van den Heuvel, E.P.J., 2001, As-
tron.Ap. 376, 950.
Paczynski, B. 1971, Acta Astron. 21, 1-14.
Podsiadlowski, P., Mazzali, P.A., Nomoto, K., Lazzati,
D. And Cappellaro, E., 2004, Ap.J. 607, L17-L20.
Portegies Zwart, S.F. and McMillan, S.L.W. 2002,
Ap.J. 576, 899.
Portegies Zwart, S.F., Baumgardt, H., Hut, P., Makino,
J. And McMillan, S.L.W., 2004, Nature, 428, 724.
Portegies Zwart, S.F., Baumgardt, H., McMillan,
S.L.W., Makino, J., Hut, P., and Ebisuzaki, T. 2006,
Ap.J. 641, 319.
Prestwich, A. et al, 2007, ATel Nr. 955.
Sidoli, L., Paizis, A. and Mereghetti, S., 2006, Astro-
ph/10890S.
Sokolov, V.V. et al., 2001, Astron. Ap. 372, 438-455.
Spruit, H.C., 2002, Astron. Ap. 331, 923-932.
Taam, R.E., 1996, in “Compact Stars in Binaries”
(editors J. van Paradĳs, E.P.J.van den Heuvel and
E.Kuulkers), Kluwer Acad. Publishers, Dordrecht, p.
3-15.
Tutukov, A.V. and Cherepaschuk, A.M. 2004, Astron-
omy Reports 48(1), 39-44.
Van den Heuvel, E.P.J., 1994, in “Interacting Binaries”
(eds. H.Nussbaumer and A.Orr), Springer, Heidelberg,
p. 263ff.
Van den Heuvel, E.P.J. and Habets, G.M.H.J., 1984,
Nature 309, 598-600.
Van Paradĳs, J, Groot, P.J., Galama, T., Kouveliotou,
C. et al., 1997, Nature 386, 686-689.
Vreeswĳk, P.M. et al., 2001, Ap.J. 546, 672-680.
Webbink, R.F., 1984, Ap.J. 277, 355-360.
Van Kerkwĳk, M.H., Charles, P.A., Geballe, T.R.,
King, D.L., et al. 1992, Nature 355, 703.
Wolf, C. and Podsiadlowski, P., 2006,
astro-ph/0606725v3
Woosley, S.E., 1993, Ap.J. 405, 273.
Woosley, S.E., Heger, A. and Weaver, T.A., 2002, Rev.
Mod. Phys. 74, 1015.
Woosley, S.E. and Bloom, J.S. 2006, Ann. Rev. As-
tron. Ap. 44, 507-556.
Woosley, S.E. and Heger, A. 2006, Ap.J. 637, 914-921.
Yoon, S.-C. and Langer, N., 2005, Astron. Ap. 443,
643-648.
Yoon,S.-C., Langer, N. and Norman, C. 2006, Astron.
Ap. 460, 199-208.
Zahn, J.P., 1975, Astron. Ap. 41, 329.
Zahn, J.P. 1977, Astron. Ap. 57, 383-394.
http://arxiv.org/abs/astro-ph/0606403
http://arxiv.org/abs/astro-ph/0606403
http://arxiv.org/abs/astro-ph/0606725
	Introduction
	Host Galaxy Characteristics: further evidence for an association of the LGRBs with the most massive stars. 
	Possible reasons why small ``starburst-like'' galaxies are the prime sources of LGRBs
	Completely mixed single-star models of low metallicity
	Binary Models; can LGRBs be the formation events of Black-Hole X-ray Binaries?
	Introduction
	Timescales for synchronization of helium stars in close binaries with a main-sequence companion.
	 Timescales for core-envelope coupling
	Timescales for synchronization of helium stars in close binaries with a compact companion
	Discussion and Conclusions
ABSTRACT
  The observed association of Long Gamma-Ray Bursts (LGRBs) with peculiar Type
Ic supernovae gives support to Woosley`s collapsar/hypernova model, in which
the GRB is produced by the collapse of the rapidly rotating core of a massive
star to a black hole. The association of LGRBs with small star-forming galaxies
suggests low-metallicity to be a condition for a massive star to evolve to the
collapsar stage. Both completely-mixed single star models and binary star
models are possible. In binary models the progenitor of the GRB is a massive
helium star with a close companion. We find that tidal synchronization during
core-helium burning is reached on a short timescale (less than a few
millennia). However, the strong core-envelope coupling in the subsequent
evolutionary stages is likely to rule out helium stars with main-sequence
companions as progenitors of hypernovae/GRBs. On the other hand, helium stars
in close binaries with a neutron-star or black-hole companion can, despite the
strong core-envelope coupling in the post-helium burning phase, retain
sufficient core angular momentum to produce a hypernova/GRB.

<|endoftext|><|startoftext|>
Multiscale model of electronic behavior and localization in stretched dry DNA
Ryan L. Barnett1,
, Paul Maragakis2,
, Ari Turner1, Maria Fyta1, and Efthimios Kaxiras1,3
Department of Physics, Harvard University, Cambridge, MA 02138
Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138
School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138
When the DNA double helix is subjected to external forces it can stretch elastically to elongations
reaching 100% of its natural length. These distortions, imposed at the mesoscopic or macroscopic
scales, have a dramatic effect on electronic properties at the atomic scale and on electrical transport
along DNA. Accordingly, a multiscale approach is necessary to capture the electronic behavior of the
stretched DNA helix. To construct such a model, we begin with accurate density-functional-theory
calculations for electronic states in DNA bases and base pairs in various relative configurations
encountered in the equilibrium and stretched forms. These results are complemented by semi-
empirical quantum mechanical calculations for the states of a small size [18 base pair poly(CG)-
poly(CG)] dry, neutral DNA sequence, using previously published models for stretched DNA. The
calculated electronic states are then used to parametrize an effective tight-binding model that can
describe electron hopping in the presence of environmental effects, such as the presence of stray water
molecules on the backbone or structural features of the substrate. These effects introduce disorder
in the model hamiltonian which leads to electron localization. The localization length is smaller by
several orders of magnitude in stretched DNA relative to that in the unstretched structure.
I. INTRODUCTION
Soon after Watson and Crick’s discovery of the DNA
double-helix structure [1], Eley and Spivey [2] introduced
the notion of efficient charge transport along the stacked
π orbitals of the bases. The mechanism of charge trans-
port has been the subject of numerous studies in the
intervening years, with renewed interest fuelled recently
by both biological and technological considerations. Over
a decade ago, Barton and co-workers observed distance-
independent charge transfer between DNA-intercalated
transition-metal complexes [3] and argued that it would
be relevant for biology and biotechnology. More recent
electron transport experiments on DNA have yielded
widely varying results, showing alternatively insulating
behavior [4, 5, 6, 7, 8], semiconducting behavior [9],
Ohmic conductivity [10, 11, 12, 13], and proximity in-
duced superconductivity [14]. The large number of rel-
evant variables endemic to such experiments, like the
DNA-electrode contact, and the rich variety of structures
that DNA can assume, are the causes of variability in the
experimental measurements (for a recent review of trans-
port theory and experiments see Ref. [15]).
Specifically, there is a large diversity of the DNA forms
in terms of its composition, length, and structure. Exper-
iments done long ago, suggested that DNA substantially
longer than its natural length (also referred to as “over-
stretched DNA”) can undergo a transition to an elon-
gated structure up to twice the length of relaxed DNA
[16]. This was also confirmed by recent single molecule
∗Present address: Department of Physics, California Institute of
Technology, Pasadena, CA 91125.
†Present address: D.E. Shaw Group, 120 West Forty-Fifth St., New
York, NY 10036.
stretching experiments [17, 18, 19], which showed that
the molecule can be reversibly stretched up to 90% of its
natural length. Such important deformations of the dou-
ble helix may occur in biological environments. Stretch-
ing of DNA is also related to cellular processes, such as
transcription and replication. For example, proteins of-
ten induce important local distortions in the double helix
while they diffuse along the molecule in search of their
target sequences. The electronic and transport proper-
ties of DNA are directly influenced by its different con-
formations as well as by environmental factors, such as
counterions, impurities or temperature. A full account of
these effects based on a realistic, atomic scale description
of the structure and the electronic properties challenges
the capabilities of theoretical models.
Theoretical efforts to understand the electronic behav-
ior and transport in DNA can be divided into two general
categories:
(i) Model calculations that use effective hamiltonians and
master equations to describe the dynamics of electrons
and holes in DNA (see, for instance, Refs. [20, 21, 22,
23]). Recent results [24] have led to considerable in-
sights concerning the sequence-independent delocaliza-
tion of electronic states in DNA. The main limitation
of such approaches lies in the difficulty of determining
accurate values for the parameters in the effective hamil-
tonians.
(ii) Ab initio calculations that can provide an accurate
and detailed description of the electronic features [25,
26, 27]. These approaches are typically limited to a small
number of atoms due to computational costs, and cannot
readily handle the full complexity of DNA molecules in
various conformations. In particular, stretching of DNA
can induce a very significant deviation from the B form
which is stable under normal conditions in aqueous so-
lution. Such structural distortions are bound to have a
profound effect on the electronic behavior. A realistic
http://arxiv.org/abs/0704.0660v1
description of these effects makes it necessary to handle
both the atomic scale features and the overall state of
the macromolecule.
In the present work, we address the problem of DNA
stretching effects on the electronic states and the electron
localization by providing a bridge between the two ex-
tremes of the length scale; a similar methodology was re-
cently used to study hole transfer in DNA [28]. Theoreti-
cally, there are different ways of pulling the opposite ends
of the DNA strands, leading to different stretched DNA
forms, which are determined largely by base pair reorien-
tations. Here, we use the poly(CG)-poly(CG) structures
obtained in the pioneering study of Lebrun and Lavery
[29] as the representative structure for stretching effects.
This study modeled the adiabatic elongation of selected
DNA molecules in two modes of stretching, correspond-
ing to pulling on opposite 3’-3’ ends or 5’-5’ ends of the
molecule: In the 3’-3’ stretching mode, the DNA helix is
unwound leading to a ribbon-like structure, while in the
5’-5’ stretching mode the DNA helix contracts.
We begin with a set of detailed calculations for the
electronic structure of DNA bases (A,T,C,G) and repre-
sentative base pairs (AT-AT, CG-CG, AT-CG, CG-GC)
in various relative configurations, as they are likely to
appear in the stretched forms, These calculations are
based on density-functional theory [30, 31] and serve
to set the stage for more extensive calculations which
employ successive levels of approximations necessary to
handle the computational demands. Specifically, we ex-
tract the salient features of electronic structure of the
individual DNA bases and base pairs from the ab ini-
tio calculations; these are compared to an efficient and
realistic semi-empirical model [32], in order to establish
the validity of the latter approach. At this intermediate
scale, we consider an 18 base pair poly(CG)-poly(CG)
DNA sequence which has been stretched by 30%, 60%
and 90% relative to the natural length of the unstretched
B form. The atomic structure of these forms has been
established by Lebrun and Lavery [29], using empirical
interatomic potentials. We next use the information from
this approximate description to build an effective hamil-
tonian for the electronic behavior at much larger scales.
This allows us to describe electron localization, due to
the combined effects of stretching and environmental fac-
tors, over mesoscopic to macroscopic length scales. The
essence of the approach and the different scales involved
are shown schematically in Fig. 1. We emphasize that we
address here issues related only to dry and neutral DNA
structures, where the negatively charged groups on the
backbone are passivated by protons, conditions that are
relevant to the experiments we consider for comparison
to our theoretical results; water molecules or counterions
(such as Na+) are not considered in our calculations.
II. THEORETICAL METHODS
A. Ab initio calculations
As our first step toward establishing the electronic be-
havior of dry, neutral DNA, we study the nature of elec-
tronic states in individual bases and in base pairs. For
these calculations we used three different implementa-
tions of density-functional theory [30]: a method that
uses atomic-like orbitals as the basis [33], one that uses
plane waves [34] and a third that uses a real-space grid
[35]. In all three approaches, we used the same exchange-
correlation functional in the local-density approxima-
tion [31], for consistency and simplicity. More elabo-
rate approximations to exchange-correlation effects, such
as the generalized gradient approximation [36], do not
provide any improvement in describing the physics of
these weakly interacting units. In each method we used
pseudopotentials to represent the atomic cores, of the
Trouiller-Martins type [37] in SIESTA, the Vanderbilt
ultrasoft type [38] in VASP and the Hammann-Schluter-
Chiang type [39] in HARES, with computational pa-
rameters (number of orbitals in basis, plane-wave ki-
netic energy cutoff and grid spacing) that ensure a high
level of convergence. These calculations provide a thor-
ough check on the consistency of various computational
schemes to reproduce the electronic features of interest.
The results are in excellent agreement across the three
approaches. Since in these calculations there are no ad-
justable parameters, we refer to them in the following as
ab initio results.
B. Construction of semi-empirical model
The stretched forms contain a large number of atoms,
typically beyond what can be efficiently treated with
the ab initio methods used for the DNA bases and base
pairs. Accordingly, for the electronic structure calcula-
tions of these structures we use an efficient semi-empirical
quantum-mechanical approach which employs a minimal
basis set [32]. The consistency of this approach is then
verified against the ab-initio calculations. Within the
semi-empirical scheme, the electronic eigenfunctions are
expressed as
|ψ(n)〉 =
c(n)ν |ϕν〉 (1)
where the basis set |ϕn〉 includes the s and p atomic or-
bitals for each atom in the system. The coefficients c
are numerical constants, with |c(n)ν |2 giving the weight of
orbital |ϕn〉 to the electronic wavefunction. This method
uses a second order expansion in the electronic den-
sity to obtain the total energy and takes into account
self-consistently charge transfer effects which are impor-
tant for biological systems. The method gives results
for the band gaps that are in excellent agreement with
those of the ab initio approaches described above (see
Refs. [5, 40]).
The highest occupied and lowest unoccupied molecu-
lar orbitals (HOMO and LUMO, respectively, also re-
ferred to collectively as “frontier states” in the following)
are extended over the entire structure in Bloch-like wave
functions. In order to describe electron hopping and lo-
calization, we need to express these in terms of a basis
of Wannier-like states that are localized on the individ-
ual bases. To this end, we construct maximally localized
states on single base pairs by taking linear combinations
of the HOMO and LUMO states from the wavefunctions
of Eq. (1). The maximally localized states will then be
used to calculate the hopping parameters in the effective
1D hamiltonian. Using the extended electronic states
|ψ(n)〉 of the frontier states, with corresponding ener-
gies ε(n), we define the maximally localized states |ψ̃(i)〉
through the unitary transformation
|ψ̃(i)〉 =
〈ψ(n)|ψ̃(i)〉|ψ(n)〉 (2)
which minimizes the sum of the variances
〈ψ̃(i)|ẑ2|ψ̃(i)〉 − 〈ψ̃(i)|ẑ|ψ̃(i)〉2
under the constraint 〈ψ̃(i)|ψ̃(j)〉 = δij where z is the po-
sition along the helical axis. Similar and more general
methodologies have been developed in the past for ob-
taining maximally localized states from extended ones
[41, 42]. Due to the invariance of the trace, the first term
in Eq. (3) is independent of the unitary transformation
and the problem is simplified to one of maximizing the
second term on the right-hand side with the same or-
thonormality constraint. Carrying out the minimization,
we arrive at the equation
〈ψ̃(n)|ẑ|ψ̃(m)〉(zn − zm) = 0 (4)
where
zn = 〈ψ̃(n)|ẑ|ψ̃(n)〉. (5)
By inspection, we see that ζ is maximized when zn = zm
for all m and n, corresponding to maximally delocalized
states. On the other hand, ζ is minimized when the states
|ψ̃(n)〉 are the eigenfunctions of the position operator ẑ
within the HOMO or LUMO subspace. Therefore, the
problem is further reduced to constructing and diagonal-
izing the matrix
Mnm = 〈ψ(n)|ẑ|ψ(m)〉 (6)
which has the eigenvectors 〈ψ(n)|ψ̃(i)〉 that provide the
desired transformation given in Eq. (2). The eigenvalues
zn are the positions of the localized states. To evaluate
the matrix elements we use the approximation
〈ψ(n)|ẑ|ψ(m)〉 =
c(n)∗µ c
ν 〈ϕµ|ẑ|ϕν〉
c(n)∗µ c
ν Sµνzµν (7)
where Sµν = 〈ϕµ|ϕν〉 is the overlap matrix between the
two atomic orbitals and zµν =
zµ+zν
is the average z-
value for the atoms located at sites given by the labels
µ and ν. Once the localized states are constructed, the
hopping parameters can be computed as
tij = 〈ψ̃(i)|H|ψ̃(j)〉 =
ε(n)〈ψ̃(i)|ψ(n)〉〈ψ(n)|ψ̃(j)〉 (8)
recalling that the quantities 〈ψ(n)|ψ̃(i)〉 are determined
from the transformation described above.
Having defined the maximally localized states in terms
of the electronic wavefunctions from the all-atom calcula-
tions, we next produce an effective tight-binding hamilto-
nian, which allows us to study electron hopping along the
DNA double helix. This approach has also been used in a
recent study on functionalized carbon nanotubes [43]. In
our effective hamiltonian, we consider hopping between
first and second neighbors along the helix, and denote
the hopping matrix elements according to the scheme
shown in Fig. 2 for the HOMO state of the poly(CG)-
poly(CG) structure (all other frontier states involve ex-
actly the same type of hopping matrix elements):
H = ε
c†ncn + t1
n even
c†ncn+1 + c
n+1cn
n odd
c†ncn+1 + c
n+1cn
c†ncn+2 + c
n+2cn
where n represents the nth base pair along the helical
axis and we have neglected spin indices because they are
unimportant for our analysis. Note that there is a dif-
ference between hopping elements connecting even and
odd sites to their neighbors (t1 and t2 terms in the ef-
fective hamiltonian of Eq. (9)), due to the asymmetry in
the structure illustrated in Fig. 2. Performing a Fourier
transform on the electron creation and annihilation op-
erators
e−ikncn (10)
gives a hamiltonian which has coupling between momenta
k and k + π/a. By doubling the unit cell (and reducing
the Brillouin Zone by a factor of two), this can finally be
diagonalized to obtain the eigenvalues
E±k = ε+ 2t3 cos(2k)±
t 21 + t
2 + 2 t1t2 cos(2k)(11)
with the momentum sum carried out over the reduced
Brillouin Zone. With these expressions for the band
structure energies, the density of states (DOS)
g(ω) =
δ(ω − E(n)k ) (12)
can be readily obtained. These quantities are essential
in describing electron localization along the DNA double
helix under different conditions.
C. Disorder and Localization length
In order to quantify the amount of localization that is
expected in stretched DNA forms, we add a term to the
hamiltonian in Eq. (9) of the form
Hdis =
ncn (13)
which is meant to emulate disorder arising from a variety
of sources such as interaction of the DNA bases with
stray water molecules and ions, or interaction with the
substrate. Un are uncorrelated random energy variations
chosen according to a Gaussian distribution of zero mean
and width γ
P (U) =
. (14)
Once the disorder hamiltonian is constructed with a spe-
cific set of random on-site energies, by direct diagonaliza-
tion we find the eigenstates |Ψ(i)〉 ofH+Hdis (we use cap-
ital symbols to denote the new wavefunctions from the
hamiltonian that includes the disorder term) and then
calculate the localization length defined as
〈Ψ(i)|n̂2|Ψ(i)〉 − 〈Ψ(i)|n̂|Ψ(i)〉2
where
nc†ncn. (16)
For a single-hopping model with weak disorder, the lo-
calization length scales as L ∼ (t/γ)2 for electrons near
the middle of the band [44], with t the hopping matrix
element which determines the band width. The more
complicated effective hamiltonian considered here is not
amenable to simple analytic treatment.
III. RESULTS AND DISCUSSION
We begin our discussion with an overview of electronic
states in single bases and isolated base pairs. The struc-
ture of the base pairs is shown in Fig. 3 with the atoms
in each base labeled for future reference. These calcula-
tions will set the stage for a proper interpretation of the
behavior in the stretched and unstretched dry, neutral
DNA helix.
A. Frontier states
The frontier states in the base pairs are related to only
one component of the pair for both AT and CG. This is
shown in Fig. 4. Specifically, the HOMO state of the AT
pair is exactly the same as that of the HOMO state of
the isolated A, and the LUMO state of AT the same as
that of the isolated T. Similarly, the HOMO state of CG
is identified with that of the isolated G and the LUMO
state with that of the isolated C. Thus, the purines (A or
G) give rise to the HOMO state, while the pyrimidines
(T or C) are responsible for the LUMO states of each
pair. It is clear from the same figure, that essentially all
atomic pz orbitals which belong to a purine or pyrimidine
contribute to the respective HOMO or LUMO π state of
the base pair. This is in agreement with calculations on
the optical absorption spectra of DNA bases and base
pairs [45]. A closer inspection of Fig. 4 shows that the
molecular frontier states of both AT and CG can be iden-
tified as similar contributions (up to sign changes) from
specific groups of carbon and nitrogen atoms. Specif-
ically, in the purines (A and G) three distinct groups
of atoms are mainly involved in forming the HOMO or-
bital and include atoms (C8-N7), (C2-N3) and (N1-C6-
C5-C4-N9), respectively. In the pyrimidines (T and C)
the main groups involved in forming the LUMO orbital
are two, (C4-C5-N1) and (N3-N7-C6). In both base pairs
the atoms that are less involved in the frontier molecu-
lar states are the carbon atoms that form a double bond
with an oxygen atom, such as C2 of A and C and the
four-fold bonded C7 atom of A.
The frontier states are very little affected when the
two components of the base-pair are separated along the
direction in which they are hydrogen-bonded. To demon-
strate this, we show in Fig. 5 the change in the eigenval-
ues of the frontier states in AT and CG as a function of
the distance between the two atoms that are bonded to
the two backbones (we call this the backbone distance).
For both base pairs the nitrogen atoms labeled N1 and
N9, are the ones attached to the backbone (see Fig. 3).
In order to obtain realistic structures, for each value of
the backbone distance we hold the atoms of each base
that are bonded to the backbone fixed and allow all other
atoms to relax fully. These calculations were performed
with the SIESTA code [33] and the relaxed configurations
were used as input to calculate the electronic structure
with the other two methodologies [34, 35]. In Fig. 5 we
show complete results from the SIESTA calculations and
selected results from one of the other two approaches.
The results of Fig. 5 show clearly that only in the re-
gion where the backbone distance becomes significantly
smaller than the equilibrium value, interaction between
the two bases shifts the eigenvalues of the electronic
states appreciably, but even then the shifts are relatively
small for the frontier states. It is also noteworthy that
the band gap of the AT pair is significantly larger (∼ 3
eV) than that of the CG pair (∼ 2 eV) and that the fron-
tier states of CG lie within the band gap of the AT pair.
This observation is important because it indicates that
in an arbitrary sequence of base pairs, the frontier states
will be associated with those of the CG pairs. This state-
ment is verified by calculations of electronic states in the
AT-AT, CG-CG and AT-CG base pair combinations, to
which we turn next.
For more detailed comparisons, we collect in Table I
min HOMO LUMO gap
Backbone distance
AT 8.67 Å −1.63 1.60 3.23
CG 8.73 Å −0.80 1.31 2.11
Axial distance
AT-AT 3.67 Å −1.33 1.37 2.70
CG-CG 3.52 Å −0.46 0.95 1.41
AT-CG 3.36 Å −0.71 1.00 1.71
Rotation angle
36o −1.48 1.58 3.06
AT-AT 108o −1.45 1.68 3.13
180o −1.55 1.63 3.18
36o −0.52 1.22 1.74
CG-CG 108o −0.64 1.54 2.18
180o −0.94 1.60 2.54
36o −0.86 1.51 2.37
CG-GC 108o −0.66 1.43 2.09
180o −0.60 1.12 1.72
36o −0.73 1.38 2.11
AT-CG 108o −0.59 1.27 1.86
180o −0.81 1.25 2.06
TABLE I: Eigenvalues (in eV) of the frontier states for the
DNA base pairs and the base-pair combinations, at the equi-
librium configurations for the backbone distance, the axial
distance (at zero relative angle of rotation) and the angle of
rotation (at the equilibrium axial distance). The column la-
beled “min” gives the values of the distances and the angle at
the equilibrium configurations. Due to symmetry the values
for the minima at rotation angles larger than 180o are similar
to those given here and are not shown.
the eigenvalues of the frontier states for the DNA pairs
and the pair combinations, at different equilibrium con-
figurations in the three relevant variables, the backbone
distance, the axial distance and the rotation angle. Some
results on the CG-GC base pair combination are also
shown, to allow for comparison to the poly(C)-poly(G)
sequence.
When two base pairs are stacked on top of each other,
there are two degrees of freedom for motion of one relative
to the other: a separation along the helical axis, which
we will call axial distance, and a relative rotation around
the helical axis. We take the helical axis to be that which
corresponds to stacking of successive base pairs in the B
form of the DNA double helix. According to the notation
of Fig. 3, the helical axis for both base-pairs is normal
to the line connecting atoms C4 and C6 and is closer
(about one third of their distance) to the purine atom
C6. For each configuration we fix the atoms that are
bonded to the backbone at a given relative position and
allow all other atoms to relax, as was done in the calcu-
lations involving the backbone distance discussed above.
In Fig. 6 we show the behavior of electronic eigenval-
ues as a function of the axial distance and the rotation
HOMO LUMO
ε (eV) 3.12 −0.09
t1 (meV) 14.0 −0.29
t2 (meV) 2.60 0.04
t3 (meV) 0.09 0.26
TABLE II: Parameters for the on-site (ε) and hopping matrix
elements (ti, i = 1, 2, 3), for the HOMO and LUMO states of
unstretched poly(CG)-poly(CG) DNA.
angle. As above, the eigenvalues show little dependence
on these two variables, except for rather small values of
the axial distance which correspond to unphysically small
separation between the two base pairs.
What is also remarkable in the above results, is that
in the AT-CG combination, the frontier states are clearly
identified with those corresponding to the CG pair ex-
clusively, which has the smaller band gap (see Fig. 6).
Moreover, we note that the band gap of the poly(C)-
poly(G) sequence, as calculated by the semi-empirical
method based on a minimal atomic orbital basis [32] is
in excellent agreement with the value obtained from the
SIESTA calculation (2.0 eV and 2.1 eV, respectively).
The band gap is expected to be significantly smaller in
the case of wet DNA and in the presence of counterions,
as shown in Ref. [46], for a Z-DNA helix. The band gaps
between all three ab initio methods are identical within
the accuracy of these methods. The nature of electronic
wavefunctions obtained by the different methods is also
in good qualitative agreement. Accordingly, in the rest
of this paper we focus our attention to electron local-
ization in the dry, neutral poly(CG)-poly(CG) sequence,
and employ the results of the semi-empirical electronic
structure method.
B. Hopping electrons
In Fig. 7, we show the unstretched and the three
stretched forms of the poly(CG)-poly(CG) sequences at
30%, 60%, 90% elongation, along with the features of
the frontier states. For visualization purposes, we repre-
sent the calculated wavefunction magnitude of the fron-
tier states by blue (HOMO) and red (LUMO) spheres,
centered at the sites where the atomic orbitals are lo-
cated. The radius of the sphere centered on a particu-
lar atom is proportional to the magnitude of the dom-
inant coefficient |c(n)ν |2 at this site (see Eq. (1)), which
is essentially proportional to the local electronic density.
It is evident from this figure that the nature of the or-
bitals themselves, represented by the radii of the colored
spheres, does not change much in the different stretched
DNA forms, but the overlap between orbitals at neigh-
boring bases is affected greatly by the amount of stretch-
ing. For the poly(CG)-poly(CG) sequence shown, the
HOMO orbitals are always associated with the G sites
for all the stretching modes, while the LUMO orbitals
are related to the C sites. However, as the DNA be-
comes more elongated, the orbitals overlap even less and
become localized for high stretching modes. The elon-
gation to the overstretched form is achieved by changing
the dihedral angle configuration of the DNA backbone,
which leaves the local part of the orbitals essentially in-
tact. Note how the orbitals rotate and spread out as the
structure is being ovestretched, following the rotation of
bases.
We now turn to a discussion of the results for the hop-
ping matrix elements of Eq. (9). Our discussion here
is relevant to what happens when the occupation of a
frontier state is changed from complete filling (for the
HOMO) or complete depletion (for the LUMO), that is,
the physics of small amounts of hole or electron doping.
In Table II we give the values for ε, t1, t2, t3 (see Fig. 2)
for the two frontier states of the unstretched poly(CG)-
poly(CG) DNA form. The hopping matrix elements for
the HOMO state involve only the G sites; those for the
LUMO state involve only the C sites. As a consistency
check, we have also calculated matrix elements for farther
neighbors and found those to be much smaller in magni-
tude. We have calculated the values of t1, t2, t3 by repeat-
ing the same procedure as above for the stretched forms
of the poly(CG)-poly(CG) DNA sequence. We note that
if t2 = t3 = 0 electrons will not be able to migrate along
the DNA molecule even if t1 is quite large, because at
least one of the other two hops is necessary for migra-
tion (see Fig. 2). From this simple picture, it is evident
that the conductivity will be determined by which ma-
trix element dominates. Quantitatively, the “bottleneck”
hopping matrix element is given by
t = max (min(|t1|, |t2|), |t3|) . (17)
In Fig. 8 we show the value of the “bottleneck” hop-
ping matrix element calculated as a function of stretch-
ing. This indicates that hopping conductivity will dra-
matically decrease by several orders of magnitude upon
stretching the molecule and that the hopping will de-
crease more from stretching in the 3’-3’ mode than in
the 5’-5’ mode. This is due to the conformational changes
induced by the different stretching modes, described ear-
lier.
C. Localization length
The significant dropping of the hopping matrix ele-
ments upon stretching as described in the previous sec-
tion is indicative of electron localization with a weak
amount of disorder. To investigate this possibility in
detail, we focus on effects of stretching in the 3’-3’
mode. The evolution of the density of HOMO states
upon stretching is shown in Fig. 9; similar behavior is
observed for the LUMO states. The dramatic narrowing
of the DOS width (equivalent to reduced dispersion in a
band-structure picture) is strongly suggestive of electron
localization [47], in this case induced by stretching. This
localization length is controlled by the hopping elements
t, since ε is the same at each site.
For a more quantitative description, we show in Fig. 9
the localization length Li for each eigenstate for a 1500
base-pair DNA strand under different amounts of stretch-
ing. The value of L(i) for each state is obtained from
Eq.(15), with disorder strength γ = 0.3 meV, which de-
termines the width of the gaussian given in Eq. (14). This
disorder strength is much smaller than the band width
of the unstretched DNA, but becomes comparable to the
band width as the molecule is stretched. The magni-
tude of such variations in on-site energies is consistent
with those produced by the dipole potential terms, for
instance, due to the presence of a stray water molecule
situated on the substrate roughly 15 Å away from the
DNA bases. We find that changing the value of γ by an
order of magnitude (either smaller or larger) does not af-
fect the qualitative picture presented here. Note that the
localization length is not a strict function of the energy,
as it depends on the disorder near where a given state
happens to be localized. As the molecule is stretched,
the localization length dramatically decreases until, for
60% stretching, the eigenstates are completely localized
on single base pairs.
The charge localization length as a function of DNA
stretching has been recently studied in the experiment
of Heim et al. [48]. This study focuses on λ-DNA which
has an irregular sequence of base pairs, and can be com-
pared to our theoretical results for poly(CG)-poly(CG)
recalling that the frontier states even for a random se-
quence are associated with those of the CG base-pairs.
In the experiment, ropes of λ-DNA on a substrate are
overstretched by a receding meniscus technique. The
DNA ropes in this experimental setup are slightly pos-
itively charged, corresponding to a depletion of a few
electrons per 1000 base pairs. We suggest that this situa-
tion is approximated by the structures of dry and neutral
DNA that we considered above. Electrons were injected
into the DNA and the resulting localization length was
measured by an electron force microscope. For the un-
stretched DNA, the charge was found to delocalize across
the entire molecule, extending over a length of several
microns. On the other hand, the charge injected into
the overstretched DNA is localized, extending over a few
hundred nanometers only. This is qualitatively consistent
with the picture that emerges from our theoretical analy-
sis, and is even in reasonable quantitative agreement: the
degree of localization in experiment, measured by the ra-
tio of length scales going from unstretched to stretched
DNA structures, is approximately two orders of magni-
tude, while the same quantity in our calculations, going
from unstretched to 60% stretched DNA is ∼ 103.
IV. SUMMARY
We have described and implemented a multiscale
method to derive effective hamiltonian models that are
able to capture the dynamics of conduction and valence
electrons in stretched DNA, starting from ab initio, all-
atom quantum mechanical calculations. The ab initio
simulations revealed that the frontier states in the base
pairs are related to only one component of the pair. The
purines were found to be associated with the HOMO
states while the pyrimidines with the LUMO states. In
the AT-CG combination the frontier states are identified
with those of the CG pair. For all combinations of bases
and base pairs studied here, the nature of these states was
not affected by separation of the bases or base pairs along
different directions or rotation along the helical axis.
Turning to the next length scale and the semi-empirical
calculations, we have calculated the “bottleneck” matrix
elements for electron hopping along the DNA molecule,
as a function of stretching. These show a significant
decrease with elongation of DNA, which is stronger for
stretching in the 3’-3’ mode than in the 5’-5’ mode. We
were able to show quantitatively that stretching of DNA
dramatically narrows the DOS width of frontier states.
A small amount of disorder produced by environmental
factors will naturally lead to localization of the electrons
along the DNA. Our estimate for the degree of localiza-
tion, based on a reasonable (and quite small) amount of
disorder in the on-site energies for the electron states,
is in very good agreement with recent experimental ob-
servations. This provides direct validation for the con-
sistency and completeness of the multiscale method pre-
sented here.
Acknowledgements: The authors are grateful to
Richard Lavery for providing the overstretched struc-
tures. MF acknowledges support by Harvard’s Nanoscale
Science and Engineering Center, funded by the National
Science Foundation, Award Number PHY-0117795.
[1] J. D. WATSON and F. H. C. CRICK, Nature 171 (1953)
737 .
[2] D. D. ELEY and D. I. SPIVEY, Trans. Faraday Soc. 58
(1961) 411.
[3] C. J. MURPHY, M. R. ARKIN, Y. JENKINS, N. D.
GHATLIA, S. H. BOSSMANN, N. J. TURRO, and J. K.
BARTON, Science 262 (1993) 1025.
[4] E. BRAUN, Y. EICHEN, U. SIVAN, and G. BEN-
YOSEPH, Nature 391 (1998) 775.
[5] P. DE PABLO, F. MORENO-HERRERO,
J. COLCHERO, J. HERRERO, P. HERRERO,
A. BARO, P. ORDEJÓN, J. SOLER, and E. ARTA-
CHO, Phys. Rev. Lett. 85 (2000) 4992.
[6] A. J. STORM, J. VAN NOORT, S. DE VRIES, and
C. DEKKER, Appl. Phys. Lett. 79 (2001) 3881.
[7] Y. ZHANG, R. H. AUSTIN, J. KRAEFT, E. C. COX,
and N. P. ONG, Phys. Rev. Lett. 89 (2002) 198102.
[8] Y. BABA, T. SEKIGUCHI, I. SHIMOYAMA, N. HI-
RAO, and K. .G. NATH, Phys. Rev. B 74 (2006) 205433.
[9] D. PORATH, A. BEZRYADIN, S. DE VRIES, and
C. DEKKER, Nature 403 (2000) 635.
[10] H.-W. FINK and C. SCHONENBERGER, Nature 398
(1999) 407.
[11] L. CAI, H. TABATA, and T. KAWAI, Appl. Phys. Lett.
77 (2000) 3105.
[12] P. TRAN, B. ALAVI, and G. GRUNER, Phys. Rev. Lett.
85 (2000) 1564.
[13] H. COHEN, C. NOGUES, R. NAAMAN, and D. PO-
RATH, Proc. Natl. Acad. Sci., 102 (2005) 11589.
[14] A. KASUMOV, M. KOCIAK, S. GUERON,
B. REULET, V. VOLKOV, D. KLINOV, and
H. BOUCHIAT, Science 291 (2001) 280.
[15] R. G. ENDRES, D. L. COX, and R. R. P. SINGH, Rev.
Mod. Phys. 76 (2004) 195.
[16] M.H.F. WILKINS, R.G. GOSLING, and W.E. SEEDS,
Nature (London) 167 (1951) 759.
[17] S. B. SMITH, Y. J. CUI, and C. BUSTAMANTE, Sci-
ence 271 (1996) 795.
[18] P. CLUZEL, A. LEBRUN, C. HELLER, R. LAVERY,
J. L. VIOVY, D. CHATENAY, and F. CARON, Science
271 (1996) 792.
[19] T. R. STRICK, J. F. ALLEMAND, D. BENSIMON,
and V. CROQUETTE, Annu. Rev. Biophys. Biomolec.
Struct. 29 (2000) 523.
[20] Y. YAMADA, Int. J. Mod. Phys. B, 18 (2004) 1697.
[21] K. IGUCHI, Int. J. Mod. Phys. B, 18 (2004) 1845.
[22] J. Y. YI, Phys. Rev. B 68 (2003) 193103.
[23] C. M. CHANG, A. H. C. NETO, and A. R. BISHOP,
Chem. Phys. 303 (2004) 189.
[24] R. A. CAETANO and P. A. SCHULZ, Phys. Rev. Lett.,
95 (2005) 126601.
[25] R. N. BARNETT, C. L. CLEVELAND, A. JOY,
U. LANDMAND, G. B. SCHUSTER, Science, 294
(2001) 567.
[26] E. ARTACHO, M. MACHADO, S. SÁNCHEZ-
PORTAL, P. ORDEJÓN, J. M. SOLER, Molec. Phys.
101 (2003) 1587.
[27] S. S. ALEXANDRE, E. ARTACHO, J. M. SOLER, Phys.
Rev. Lett. 91 (2003) 108105.
[28] K. SENTHILKUMAR et al, J. Am. Chem. Soc. 127
(2005) 14894.
[29] A. LEBRUN and R. LAVERY, Nucl. Ac. Res. 24 (1996)
2260.
[30] P. HOHENBERG and W. KOHN, Phys. Rev. 136 (1964)
B864; W. KOHN and L. J. SHAM, Phys. Rev. 140 (1965)
A1133.
[31] J. P. PERDEW and A. ZUNGER, Phys. Rev. B 23
(1981) 5048.
[32] M. ELSTNER, D. POREZAG, G. JUNGNICKEL, J. EL-
SNER, M. HAUGK, T. FRAUENHEIM, S. SUHAI, and
G. SEIFERT, Phys. Rev. B 58 (1998) 7260.
[33] J. M. SOLER, E. ARTACHO, J. D. GALE, A. GARCÍA,
J. JUNGUERA, P. ORDEJÓN, and D. SÁNCHEZ-
PORTAL, J. Phys.: Condens. Matter, 14 (2002) 2745.
[34] G. KRESSE and J. FURTHMÜLLER, Phys. Rev. B, 54
(1996) 11169.
[35] U. V. WAGHMARE, H. KIM, I. J. PARK, N. MO-
DINE, P. MARAGAKIS, E. KAXIRAS, Computer Phys.
Comm., 137, (2001) 341.
[36] J. P. PERDEW and Y. WANG, Phys. Rev. B 45 (1992)
13244; J. P. PERDEW, K. BURKE, and M. ERNZER-
HOF, Phys. Rev. Lett. 77 (1996) 3865.
[37] N. TROUILLER and J. L. MARTINS, Phys. Rev. B, 43
(1991) 8861.
[38] D. VANDERBILT, Phys. Rev. B, 41 (1990) 7892.
[39] D. R. HAMANN, M. SCHLUTER and C. CHIANG,
Phys. Rev. Lett. 43 (1979) 1494.
[40] P. MARAGAKIS, R. L. BARNETT, E. KAXIRAS,
M. ELSTNER, and T. FRAUENHEIM, Phys. Rev. B
66 (2002) 241104(R).
[41] N. MARZARI and D. VANDERBILT, Phys. Rev. B 56
(1997) 12847.
[42] C. SGIAROVELLO,M. PERESSI, and R. RESTA, Phys.
Rev. B 64 (2001) 115202
[43] Y-S. LEE, M. BUONGIORNO NARDELLI, and N.
MARZARI, Phys. Rev. Lett. 95 (2005) 076804.
[44] D. J. THOULESS, J. Phys. C 5 (1972) 77.
[45] D. VARSANO, R.I. FELICE, M. A. .L. MARQUES, and
A. RUBIO, J. Phys. Chem. B 110 (2006) 7129.
[46] F. L. GERVASIO, P. CARLONI, M. PARRINELLO,
Phys. Rev. Lett. 89 (2002) 108102.
[47] P. W. ANDERSON, Phys. Rev. 109 (1958) 1492.
[48] T. HEIM, T. MELIN, D. DERESMES, D. VUILLAUME,
Appl. Phys. Lett. 85 (2004) 2637.
[49] R. R. SINDEN, in “DNA Structure and Function”, (Aca-
demic Press, London, 1994).
density
functional theory
ab−initio
structure
electronic
semi−empirical
1500 b
hamiltonian
tight−binding
effective
1130 atoms94000
FIG. 1: Schematic illustration of the different scales included
in the current multiscale model: The two pictures on the left
are atomistic systems simulated with different computational
approaches (ab initio density functional theory and semi-
empirical electronic structure, resprectively). The picture on
the right represents a rope composed of DNA molecules, as in
experiments [48], which is treated by an effective tight-binding
hamiltonian constructed from the atomistic scale calculations.
FIG. 2: Schematic depiction of electron hopping in poly(CG)-
poly(CG) DNA for the HOMO state. The hopping matrix
elements ti are denoted by the indices (i) = (1), (2), (3). Elec-
trons are localized on the G bases. For the LUMO state, the
hopping is similar with electrons localized on the C bases.
C2 C2
FIG. 3: The DNA base pairs AT (top) and CG (bottom), with
the atoms labeled. The purines (A, G) are on the right, the
pyrimidines (T, C) on the left. Atom labeling follows standard
notation convention [49]. All rotations were performed with
respect to the helical axis denoted by the black circle (see
text).
FIG. 4: The frontier states in the base pairs and their identifi-
cation with corresponding orbitals in the isolated bases. The
middle figure in each panel shows the total charge density on
the plane of the base pair, with higher values of the charge
density in red and lower values in blue. The figure on the left
shows the HOMO state and the figure on the right shows the
LUMO state, where red and blue isosurfaces correspond to
positive and negative values of the wavefunctions. The labels
on the left denote the type of bases and base pairs.
7 8 9 10 11
 backbone distance (Α
7 8 9 10 11
 backbone distance (Α
AT CG
T−LUMO C−LUMO
G−HOMO
A−HOMO
FIG. 5: Eigenvalues of states in the AT and CG base pairs
as a function of backbone distance. In each case three states
are included above and below the band gap. Lines are results
from SIESTA calculations, points are results from HARES
calculations (see text). The frontier orbitals in both pairs
are related to one component of the pair as indicated by the
labels. The equilibrium backbone distance is denoted by a
vertical dashed line.
2.9 3.4 3.9 4.4 4.9 5.4
axial distance (Α
0 60 120 180 240 300 360
angle (deg.)
AT−AT
CG−CG
T−LUMO
A−HOMO
C−LUMO
G−HOMO
C−LUMO
G−HOMO
AT−CG
FIG. 6: Eigenvalues of states in the AT-AT, CG-CG and AT-
CG base pair combinations as a function of the distance along
the helical axis (at zero angle of rotation) and the rotation an-
gle around the helical axis (at the equilibrium axial distance).
Lines are results from SIESTA calculations, points are results
from VASP calculations (see text). In each case three states
are included above and below the band gap. The value of the
distance or the rotation angle that correspond to equilibrium
configurations are indicated by vertical dashed lines (there are
five almost equivalent local minima in rotation). As in Fig.
5, frontier orbitals are identified as the corresponding orbital
of one base only.
3’−3’ 5’−5’
FIG. 7: The DNA structures for the unstretched (top) and the
different amounts of stretching in the 3’-3’ and the 5’-5’ modes
with features of the frontier orbitals described by the blue
(HOMO) and red (LUMO) spheres (see text for details). For
both modes the amount of stretching is (a) 30%, (b) 60%, and
(c) 90% relative to the unstretched structure, which is the B-
DNA form. The 3’-5’ orientations of the poly(CG)-poly(CG)
sequence are shown in the left panel at 90% stretching, where
these the structure is easier to visualize.
0 30 60 90
% stretching
HOMO 3’−3’
HOMO 5’−5’
LUMO 3’−3’
LUMO 5’−5’
(2) (2)
(2) (3)
FIG. 8: The frontier state “bottleneck” hopping matrix el-
ements as given by Eq. (17) for the different types (3’-3’ or
5’-5’) and amounts of stretching of poly(CG)-poly(CG) DNA.
At each value of stretching, the dominant hopping process is
indicated in parenthesis.
−20 −10 0 10 20
energy (meV)
FIG. 9: (bottom) The density of electronic states for the
HOMO state stretched in the 3’-3’ mode. For comparison,
the on-site energy parameter, ε, has been set to zero. (top)
The localization length Li, defined in Eq. (15), is computed
for each eigenstate with disorder strength γ = 0.3 meV.
ABSTRACT
  When the DNA double helix is subjected to external forces it can stretch
elastically to elongations reaching 100% of its natural length. These
distortions, imposed at the mesoscopic or macroscopic scales, have a dramatic
effect on electronic properties at the atomic scale and on electrical transport
along DNA. Accordingly, a multiscale approach is necessary to capture the
electronic behavior of the stretched DNA helix. To construct such a model, we
begin with accurate density-functional-theory calculations for electronic
states in DNA bases and base pairs in various relative configurations
encountered in the equilibrium and stretched forms. These results are
complemented by semi-empirical quantum mechanical calculations for the states
of a small size [18 base pair poly(CG)-poly(CG)] dry, neutral DNA sequence,
using previously published models for stretched DNA. The calculated electronic
states are then used to parametrize an effective tight-binding model that can
describe electron hopping in the presence of environmental effects, such as the
presence of stray water molecules on the backbone or structural features of the
substrate. These effects introduce disorder in the model hamiltonian which
leads to electron localization. The localization length is smaller by several
orders of magnitude in stretched DNA relative to that in the unstretched
structure.

<|endoftext|><|startoftext|>
Introduction
Both from a biregular and a birational standpoint, the geometry of algebraic varieties
is often studied in terms of their fibrations. Given a smooth complex projective variety
X, there are two parallel theories able to detect on X fibrations whose fibers are varieties
of negative Kodaira dimension. The first one, initiated by Mori, associates a morphism
X → S, whose general positive dimensional fiber is a Fano variety, to any KX-negative
extremal ray of the cone of effective curves of X. The second theory, introduced in works
of Campana and of Kollár, Miyaoka and Mori, produces a rational map X 99K S, with
proper and rationally chain connected fibers within its domain of definition, from any
family of rational curves on X, the most studied case being the one in which the family
of all rational curves of X is considered.
The purpose of this paper is to study extension properties of fibrations of the types
described above, when these fibrations are defined on a smooth subvariety Y of X with
ample normal bundle: our goal is to determine additional conditions that guarantee that
given fiber structures on Y extend to analogous structures on X, and to compare the
corresponding fibrations. Following the terminology of [13], we will refer to such a Y as
an “ample subvariety” of X. We remark that, in the special case of codimension one, this
setting is more general than the classical setting of ample divisors.
Our first result is an extension property for rationally connected fibrations. The pre-
cise formulation requires some additional notation, and is given in Theorems 3.1 and 3.6.
1991 Mathematics Subject Classification. Primary: 14D06, 14J10. Secondary: 14C05, 14J40, 14N30.
Key words and phrases. Ample subvarieties, rationally connected fibrations, families of rational curves,
special varieties, extension of maps, Mori contractions.
All authors acknowledge support by MIUR National Research Project “Geometry on Algebraic Va-
rieties” (Cofin 2004). The research of the second author was partially supported by NSF grants DMS
0111298 and DMS 0548325. The third author acknowledges partial support by the University of Milan
(FIRST 2003).
http://arxiv.org/abs/0704.0661v3
2 M.C. BELTRAMETTI, T. DE FERNEX, AND A. LANTERI
Roughly speaking, Theorem 3.1 says that if Y is a submanifold with ample normal bundle,
the inclusion inducing a surjection N1(X) → N1(Y ), and V is a family of rational curves
on X inducing a covering family VY on Y , then there is a commutative diagram
Y//VY
// X//V ,
where φ is the rational map associated to the family V , π is the map associated to VY , and
δ is a surjective morphism of normal varieties; here X//V and Y//VY denote the respective
rational quotients.
With the second theorem, we determine sufficient conditions to ensure that the
morphism δ in the above diagram is generically finite; this is one of the core results of the
paper (see Theorem 3.6).
Theorem A. Assuming that the normal bundle of Y in X is ample and that the inclusion
of Y induces a surjection N1(X) ։ N1(Y ), the morphism δ is generically finite if one of
the following two conditions is satisfied:
(a) V is an unsplit family; or
(b) codimX Y < dimY − dimY//VY .
When the rational quotient Y//VY is one-dimensional, it turns out from a general
fact that π is a morphism (see Proposition 3.12); using this fact, the previous result can
be improved in this case, and we obtain a commutative diagram as the one above where
now the vertical arrows are morphisms and δ is a finite morphism between smooth curves
(see Corollary 3.13). The situation when Y is the zero scheme of a regular section of an
ample vector bundle on X and π is the MRC-fibrations over a base of positive geometric
genus was also treated in [25] (see Remark 3.10).
Next, we address the problem of extending extremal Mori contractions of fiber-type.
Using the previous result, we prove the following theorem (see Theorem 4.1).
Theorem B. Let Y be a submanifold of a projective manifold X with ample normal
bundle, and assume that the inclusion induces an isomorphism N1(X) ∼= N1(Y ). Let
π : Y → Z be an extremal Mori contraction of fiber-type, and let W be an irreducible
covering family of rational curves on Y that are contracted by π. If these curves do not
break in X under deformation (that is, if the family they generate in X is unsplit), then π
extends to an extremal Mori contraction φ : X → S, and there is a commutative diagram
// S,
where δ is a finite surjective morphism.
In the special case when Y is defined by a regular section of an ample vector bundle
on X, related results were obtained in [1, 9, 25] (see Remark 4.3).
AMPLE SUBVARIETIES AND RATIONALLY CONNECTED FIBRATIONS 3
In the last section of the paper, the above results are finally applied to classify, un-
der suitable conditions, projective manifolds containing “ample subvarieties” that admit
a structure of projective bundles or quadric fibrations. Beginning with classical results on
hyperplane sections, there has been given evidence to the fact that projective manifolds
are, so to speak, at least as “special” as their ample divisors (cf. [26]). The study of pro-
jective bundles and quadric fibrations embedded in projective manifolds as zero schemes
of regular sections of ample vector bundles was already undertaken in [18, 19, 20, 8, 25],
where classification results were obtained when the base of the fibration, if positive dimen-
sional, has positive geometric genus. Under the additional assumption of a polarization
on the ambient variety inducing a relatively linear polarization on the fibration, classifi-
cation results were obtained in [7, 1, 21]. With some dimensional restrictions on the fiber
structure of the subvariety in terms of its codimension, we extend such results in two ways.
In our first generalization, we consider fibrations over curves and drop the hypothesis
that the subvariety is defined by a regular section of an ample vector bundle, only assuming
that the normal bundle is ample. In this case, since the positivity condition is local near Y ,
we additionally require that the inclusion induces an isomorphism on the Picard groups. In
the second generalization, we allow the base of the projective fibration to have arbitrary
dimension. In either case, we do not put any additional restriction on the base of the
fibration and do not require a priori any global polarization (such a polarization will turn
out to exist a posteriori). We state here the extension property that we obtain in the case
of projective bundles.
Theorem C. Let Y be a submanifold of a projective manifold X with ample normal
bundle, of codimension
(1) codimX Y < dimY − dimZ.
Assume that Y admits a projective bundle structure π : Y → Z over a smooth projective
variety Z, and that one of the following two situations occurs:
(a) Z is a curve and the inclusion of Y in X induces an isomorphism Pic(X) ∼= Pic(Y );
(b) Y is the zero scheme of a regular section of an ample vector bundle on X.
Then π extends to a projective bundle structure π̃ : X → Z on X, giving a commutative
diagram
eπ~~~~
and the fibers of π are linearly embedded in the fibers of π̃.
A similar statement is also proven in the case of quadric fibrations with irreducible
reduced fibers and relative Picard number one. For the full result, see Theorem 5.8.
The proofs of the above results rely on various properties concerning families of
rational curves on projective manifolds and their associated rational quotients, which are
collected in Section 2. General facts from deformation theory of rational curves needed in
the sequel are recalled in this section, and several new notions are introduced and studied.
4 M.C. BELTRAMETTI, T. DE FERNEX, AND A. LANTERI
Among other things, we study numerical properties of families of rational curves, intro-
ducing in particular the notion of numerically covering family (see Definition 2.18), which
turns out to be very useful in our context. Appropriate notions for the extension and re-
striction of families of rational curves are also introduced in relation to given submanifolds
of the ambient variety.
Acknowledgements. We would like to thank C. Araujo, L. Bădescu, C. Casagrande and
P. Ionescu for useful discussions, and the referee for many precious remarks and sugges-
tions. We are grateful to the Istituto Nazionale di Alta Matematica and to the University
of Milan (FIRST 2003) for making this collaboration possible.
1. Conventions and basic notation
We work over the complex numbers, and use standard notation in algebraic geome-
try, although tensor product between line bundles is often denoted additively. For n ≥ 1,
we denote by Qn the smooth quadric hypersurface of Pn+1. We use the word general to
mean that the choice is made outside a properly contained (Zariski) closed subset, and
the word very general to mean that the choice is made outside a countable union of such
subsets. For any projective algebraic variety X, we denote by N1(X) := (Pic(X)/ ≡)⊗R
the Néron–Severi space of X, and by N1(X) := (Z1(X)/ ≡) ⊗ R the space generated by
numerical classes of curves on X. The dimension of these spaces, which is equal to the
Picard number of X, is denoted by ρ(X). For a morphism of algebraic varieties X → S,
we set N1(X/S) := (Z1(X/S)/ ≡) ⊗ R, and we denote by ρ(X/S) the dimension of this
space. The numerical class of a line bundle L (resp., of a curve C) on X is denoted by [L]
(resp., [C]). We denote by NE(X) the closure of the cone in N1(X) spanned by numer-
ical classes of effective curves, and by Nef(X) its dual cone, namely, the cone in N1(X)
spanned by classes of nef divisors. In the case X is a projective variety with Pic(X) ∼= Z,
we will denote by OX(1) the ample generator of this group.
2. Families of rational curves on varieties
Throughout this section we consider morphisms from the projective line P1 to a
smooth projective variety X; for convenience, we fix once for all two distinct points 0 and
∞ in P1. We will use notation as in [15], some of which we recall below. For general facts
on the theory of deformation of rational curves on varieties, we refer to [15, 6].
Let Hom(P1,X) be the scheme parameterizing morphisms from P1 to X. We will
denote a morphism f : P1 → X by [f ] whenever we think of it as a point of Hom(P1,X).
If Σ ⊆ P1 and Θ ⊆ X are closed subschemes, then we denote by
Hom(P1,X; Σ → Θ)
the subscheme of Hom(P1,X) parameterizing those morphisms f such that f(Σ) ⊆ Θ (the
image f(Σ) of Σ is here intended scheme-theoretically). In the particular case Σ = {0}
and Θ = {x} for some x ∈ X, we also use the notation Hom(P1,X; 0 7→ x). Note that,
for any closed subscheme j : Σ →֒ P1, there is a natural morphism
j∗ : Hom(P1,X) → Hom(Σ,X)
defined by j∗([f ]) := [f ◦ j].
AMPLE SUBVARIETIES AND RATIONALLY CONNECTED FIBRATIONS 5
Definition 2.1. A morphism f : P1 → X is free if it is non-constant and f∗TX is nef.
We will use the following well-known properties (see [15, Theorem II.1.7]).
Proposition 2.2. Let f : P1 → X be a non-constant morphism, and let x = f(0). Let
m0 ⊂ OP1 be the maximal ideal sheaf of 0 ∈ P
(a) The Zariski tangent space of Hom(P1,X) at [f ] is isomorphic to H0(P1, f∗TX),
and Hom(P1,X) is smooth at [f ] if h1(P1, f∗TX) = 0.
(b) Similarly, the Zariski tangent space of Hom(P1,X; 0 7→ x) at [f ] is isomorphic to
H0(P1, f∗TX⊗m0), and Hom(P
1,X; 0 7→ x) is smooth at [f ] if h1(P1, f∗TX⊗m0) =
In particular, if f is free, then Hom(P1,X) and Hom(P1,X; 0 7→ x) are smooth at [f ].
For closed subschemes Θ ⊆ X and Σ ⊆ P1, and any subset S ⊆ Hom(P1,X), we
denote
S(Σ → Θ) := S ∩Hom(P1,X; Σ → Θ),
and, in a similar way, we define S(0 7→ x) for any point x ∈ X.
Definition 2.3. Given a subset S ⊆ Hom(P1,X), the image of the universal map P1×S →
X is called locus of S, and is denoted by Locus(S). Analogous definitions are given for
the loci of S(Σ → Θ) and S(0 7→ x), which are respectively denoted by Locus(S; Σ → Θ)
and Locus(S; 0 7→ x).
We now restrict our attention to the open subscheme Hombir(P
1,X) of Hom(P1,X)
parameterizing morphisms that are birational to their images. The previous notation is
adapted to Hombir(P
1,X) in the obvious way.
Definition 2.4. Any union V = ∪α∈AVα of irreducible components Vα of Hombir(P
is said to be a family of (parameterized) rational curves on X. If V consists of only one
irreducible component of Hombir(P
1,X), then V is said to be an irreducible family. If
Locus(Vα) is dense in X for every α ∈ A, then V is said to be a covering family.
Remark 2.5. The scheme Hombir(P
1,X) has at most countably many irreducible compo-
nents. In particular, the same holds for every family of rational curves on X.
Proposition 2.6. Let V be an irreducible family of rational curves on X. Then the locus
Locus(V ) of V is dense in X if and only if f is free for a general [f ] ∈ V .
Proof. Locus(V ) is dense in X if and only if the differential of the universal map P1×V →
X has rank equal to dimX at a general point (p, [f ]) ∈ P1×V . By [15, Proposition II.3.10],
this occurs if and only if f is free for a general [f ] ∈ V . �
Definition 2.7. Given any closed subset S ⊆ Hombir(P
1,X), we denote by 〈S〉 the union
of all irreducible components of Hombir(P
1,X) that contain at least one irreducible com-
ponent of S, and by 〉S〈 the union of all irreducible components of Hombir(P
1,X) that
are irreducible components of S. We call 〈S〉 the minimal family generated by S, and 〉S〈
the maximal family contained in S.
Analogous notation and definitions will be adopted for S ⊆ Hom(P1,X).
6 M.C. BELTRAMETTI, T. DE FERNEX, AND A. LANTERI
Definition 2.8. Let V be a family of rational curves on X. We say that a curve C ⊂ X
is a V -curve if C = f(P1) for some [f ] ∈ V . A V -chain of length ℓ is the union of ℓ
distinct curves fi(P
1) ⊂ X (1 ≤ i ≤ ℓ) parameterized by elements [fi] ∈ V such that
fi+1(0) = fi(∞) for every 1 ≤ i ≤ ℓ− 1.
Consider a family V = ∪Vα of rational curves on X. Let V
α be the irreducible
components of the normalization Homnbir(P
1,X) of Hombir(P
1,X) that map to Vα, and let
V be the closure in Chow1(X) of the image of ∪V
α via the natural morphism
(2) Homnbir(P
1,X) → Chow1(X)
(see [15, Comment II.2.7]). By [15, Theorem I.5.10], V is proper. Associated to V (hence
to V ), one can define a proper proalgebraic relation on X, as explained in [15, Section IV.4]
(see in particular [15, Example IV.4.10]). We denote this relation by RCV . We have that
(x, y) ∈ RCV for two very general points x, y ∈ X if and only if there is a V -chain
containing x and y.
Definition 2.9. X is said to be RCV -connected if there is only one RCV -equivalence
class. We say that X is V -chain connected if every two points of X lie on a V -chain. More
generally, a closed subset T ⊂ X is said to be V -chain connected if any two points of T
lie on a V -chain supported on T .
Clearly if X is V -chain connected, then it is RCV -connected.
Definition 2.10. A family of rational curves V is said to be an unsplit family if it is
irreducible and, after normalization, its image in Chow1(X) via the map (2) is a proper
scheme.
Remark 2.11. If V is an irreducible family of rational curves on X, and there is an ample
vector bundle E onX such that detE·C < 2 rkE for a V -curve C, then V is unsplit. Indeed,
i=1[Ci] is any degeneration of the cycle [C], then detE · C =
i=1 detE · Ci ≥ k rkE
by the ampleness of E. Thus k = 1 and [C1] lies in the image of the normalization of V in
Chow1(X) via the map (2).
Proposition 2.12. If X is RCV -connected by a family V = ∪Vα with each Vα unsplit,
then it is V -chain connected and N1(X) is generated by numerical classes of V -curves. In
particular, if V is unsplit, then X has Picard number ρ(X) = 1.
Proof. Fix ℓ≫ 0 such that every two very general points of X are connected by a V -chain
of length ℓ, and let S ⊂ Chow1(X) be the subset parameterizing connected 1-cycles with
ℓ components (counting multiplicities) supported on V -curves. Since each component Vα
of V is unsplit, it follows that S is proper. By the choice of ℓ, there is an irreducible
component T of S such that u(2) : U ×T U → X × X is dominant, where U → T is the
universal family and u : U → X is the cycle map. Note that T is a closed subvariety of
Chow1(X), and in particular it is proper. Therefore the image of u
(2) is proper, and thus
u(2) is surjective. This implies that X is V -chain connected. The second assertion follows
from [15, Proposition IV.3.13.3]. �
If V = Hombir(P
1,X), then it is a result of Campana [5] and Kollár–Miyaoka–Mori
[16] that one can “pass to the quotient”. The natural generalization of this result when V is
AMPLE SUBVARIETIES AND RATIONALLY CONNECTED FIBRATIONS 7
an arbitrary family of rational curves was later given by Kollár (see [15, Theorem IV.4.17]),
and can be stated as follows.
Theorem 2.13. Let V be a family of rational curves on X. Then there is an open set
X◦ ⊆ X, a normal projective variety X//V , and a dominant morphism X
◦ → X//V with
connected fibers and proper over the image, whose very general fibers are RCV -equivalence
classes in X.
Proof. We refer the reader to [15, Theorem IV.4.17] for the existence of a proper morphism
X◦ → Y ◦ onto a variety Y ◦ with connected fibers and whose very general fibers are RCV -
equivalence classes in X. Since X◦ is normal and the fibers are connected, Y ◦ is normal.
It follows by construction that Y ◦ is quasi-projective. Then we can take X//V to be the
normalization of the closure of Y ◦ in some projective embedding. �
Definition 2.14. With the notation as in the previous theorem, the resulting rational
map X 99K X//V is called the RCV -fibration of X, and X//V is the RCV -quotient. We also
say that X//V (resp., X 99K X//V ) is the rational quotient (resp., the rational fibration)
defined by V . We denote by
dimRCV := dimX − dimX//V
the dimension of a very general RCV -equivalence class (which is the same as the dimension
of a general fiber of X 99K X//V ).
Remark 2.15. We have dimRCV > 0 if and only if V is a covering family. We would
like to stress that X//V is well defined only up to birational equivalence; throughout the
paper, we will often make suitable choices of rational quotients. We remark that, after
possibly shrinking further X◦, one can always take a smooth projective model for the
rational quotient X//V .
Remark 2.16. A case of particular interest is when V = Hombir(P
1,X). In this case X//V
is called the MRC-quotient (maximal rational quotient) of X. It follows from a result of
Graber, Harris and Starr [12] that in this situation X//V is not uniruled.
Every connected subset S ⊆ Hombir(P
1,X) determines in a natural way a numerical
class [S] ∈ N1(X), by taking the class of the curve f(P
1) for an arbitrary [f ] ∈ S. More
generally, we give the following definition.
Definition 2.17. For any subset S ⊆ Hombir(P
1,X), we denote
R≥0[S] :=
[f ]∈S
R≥0[f(P
1)] ⊂ N1(X).
We call R≥0[S] the cone numerically spanned by S. If S is connected, then we call [S] the
numerical class of S.
Definition 2.18. A family of rational curves V = ∪α∈AVα is said to be numerically
covering if there is a subset A′ ⊆ A such that
R≥0[V ] =
R≥0[Vα]
8 M.C. BELTRAMETTI, T. DE FERNEX, AND A. LANTERI
and Locus(Vα) is dense in X for every α ∈ A
′. If V is a numerically covering family, then
we denote by Vcov the subfamily of V consisting of all the covering irreducible components
of V .
Remark 2.19. Note that by definition R≥0[Vcov] = R≥0[V ] for every numerically covering
family V of rational curves on X. Clearly, in the notation of the definition, we have
∪α∈A′Vα ⊆ Vcov.
Proposition 2.20. Let V = ∪α∈AVα and W = ∪β∈BWβ be two numerically covering
families of rational curves on X, and suppose that
R≥0[V ] ⊆ R≥0[W ].
Then the RCW -fibration ψ : X 99K X//W factors through the RCV -quotient of X, i.e., we
have a commutative diagram
X//V //___ X//W .
In particular, if R≥0[V ] = R≥0[W ], then V and W define the same rational fibration.
Proof. Consider a resolution of the indeterminacies of ψ
ψ ""F
X//W ,
and let E ⊂ X ′ be the exceptional locus of σ. Note that the locus of indeterminacies of
ψ is contained in X \X◦, and hence, by Theorem 2.13, it does not dominate X//W , since
it does not meet the very general RCV -equivalence class. Therefore we can assume that
ψ′(E) is a proper closed subset of X//W .
Lemma 2.21. If C ⊂ X is a V -curve not contained in σ(E), then its proper transform
C ′ ⊂ X ′ is mapped to a point in X//W by ψ
Proof of the lemma. Let B′ ⊂ B be the subset parameterizing the irreducible components
of W that are covering, so that Wcov = ∪β∈B′Wβ. Since
R≥0[V ] ⊆ R≥0[W ] = R≥0[Wcov],
we can pick general elements [gβ] ∈ Wβ for β ∈ B
′, and numbers λβ ≥ 0, such that,
denoting Γβ = gβ(P
1), we have
[C] =
λβ[Γβ] in N1(X).
By the definition of ψ and Theorem 2.13 and the fact that the Wβ are covering families
for every β ∈ B′, we can assume that Γβ ∩ σ(E) = ∅ for every β. Let C
′ and Γ′β be the
proper transforms of C and Γβ on X
′, and let D ⊂ X ′ be the pull-back, via ψ′, of an
AMPLE SUBVARIETIES AND RATIONALLY CONNECTED FIBRATIONS 9
ample divisor on X//W . Then ψ
β ] = 0, so we have σ∗D ·Γβ = D ·Γ
β = 0 for all β, and
D · C ′ ≤ σ∗D · C =
λβ(σ∗D · Γβ) = 0.
Since D is the pull-back of an ample divisor on X//W , this implies that ψ
′] = 0. �
We can now conclude the proof of the proposition. We consider a very general fiber
G of φ, and fix two very general points x, y ∈ G that are not contained in σ(E). By
the generality of the choices, we can assume that both x and y are outside the locus of
indeterminacies of ψ, hence ψ(x), ψ(y) 6∈ ψ′(E). Furthermore, we can assume that x and
y are connected by a chain of V -curves. By Lemma 2.21, every irreducible component
of this chain that is not contained in σ(E) is mapped to a point by ψ. Therefore, since
ψ(x) 6∈ ψ′(E), we deduce that this chain is in fact disjoint from σ(E). Thus we can apply
the lemma to each one of its components. Since the chain is connected, this implies
ψ(x) = ψ(y).
In view of the generality of the choice of x and y in G, we conclude that there exists a
natural rational map X//V 99K X//W commuting with the respective projections. �
Definition 2.22. Let V ⊆ Hombir(P
1,X) be any family of rational curves, and let a > 0
be an integer. We denote by ‖V ‖ the largest family of rational curves on X such that
R≥0[‖V ‖] = R≥0[V ]. We call ‖V ‖ the family numerically generated by V .
Remark 2.23. Quite obviously, V is a subfamily of ‖V ‖, and if V is a numerically covering
family then so is ‖V ‖.
The previous proposition implies the following useful property.
Corollary 2.24. Let V be a numerically covering family of rational curves on X. Then
the families V , Vcov, and ‖V ‖ define the same rational fibration.
Proof. Since R≥0[‖V ‖] = R≥0[V ] = R≥0[Vcov], it follows from Proposition 2.20. �
An important theorem, due to Kollár, Miyaoka and Mori, says that a smooth pro-
jective manifold is maximally rationally chain connected if and only if it is maximally
rationally connected (see [15, Theorem IV.3.10]). The proof of this property can be
adapted to the situation in which an arbitrary family of rational curves V is considered.
More precisely, one can prove that if X is V -chain connected, then every two very general
points of X are connected by a ‖V ‖-curve. In the sequel, we will need the following slightly
different version of this property (the proofs of the two properties are almost the same).
Proposition 2.25. Let V be a family of rational curves on X, and assume that X is
V -chain connected. Let y ∈ X be any point that is connected by a V -chain to a very
general point of X. Then Locus(‖V ‖, 0 7→ y) is dense in X.
Proof. Fix a very general point x ∈ X, and let C = C1+ · · ·+Cn be a V -chain connecting
x to y, with Ci = fi(P
1). Let p0 = x, pn = y, and pi = fi(∞) = fi+1(0) for 1 ≤
i ≤ n − 1. From here, we follow step by step the proof of [15, Complement IV.3.10.1],
proving inductively that there is a free rational curve gi : P
1 → X which connects pi−1
to pi; the resulting chain of free rational curves is then smoothed into a free rational
10 M.C. BELTRAMETTI, T. DE FERNEX, AND A. LANTERI
curve h : P1 → X connecting x to y. The construction shows that each gi+1 is obtained
by smoothing a comb on fi+1 with teeth assembled out of deformations of gi (see [15,
Definition II.7.7]). It follows that h is a ‖V ‖-curve. �
Consider now a submanifold Y of X. The inclusion i : Y →֒ X naturally induces an
injective morphism
i∗ : Hom(P
1, Y ) →֒ Hom(P1,X),
which is defined by i∗([g]) := [i ◦ g]. Similar notation will be used for the injection
Hombir(P
1, Y ) →֒ Hombir(P
1,X). Statements analogous to the following two propositions
also hold for Hombir( ) in place of Hom( ).
Proposition 2.26. With the above notation, assume that the normal bundle of Y in X
is ample. Then, for every [g] ∈ Hom(P1, Y ) with g free, the morphism f := i ◦ g is free on
X, and the schemes Hom(P1,X) and Hom(P1,X; 0 7→ g(0)) are smooth at [f ].
Proof. The bundle g∗TY is nef, since [g] is free. Thus, by the ampleness of NY/X and the
exact sequence
0 → g∗TY → f
∗TX → g
∗NY/X → 0,
we conclude that f is free. Then the last two assertions follow from Proposition 2.2. �
We will need the following generalization of Proposition 2.2.
Proposition 2.27. Let Y be a submanifold of a smooth projective variety X and let
i : Y →֒ X be the inclusion. Let g : P1 → Y be a free morphism, and let f := i ◦ g.
Then the Zariski tangent space of Hom(P1,X; {0} → Y ) at [f ] sits naturally in an exact
sequence
0 → H0(P1, g∗TY ) → T[f ]Hom(P
1,X; {0} → Y ) → H0(P1, g∗NY/X ⊗m0).
Moreover, if the normal bundle of Y in X is ample, then the sequence completes to a short
exact sequence
0 → H0(P1, g∗TY ) → T[f ]Hom(P
1,X; {0} → Y ) → H0(P1, g∗NY/X ⊗m0) → 0
and Hom(P1,X; {0} → Y ) is smooth at [f ].
Proof. The natural maps f∗TX → g
∗NY/X → g
∗NY/X |{0}, passing to cohomology and
taking into account the isomorphism T[f ]Hom(P
1,X) ∼= H0(P1, f∗TX) given by Proposi-
tion 2.2, yield a natural homomorphism
r : T[f ]Hom(P
1,X) → (g∗NY/X)|{0}.
We have a Cartesian square
Hom(P1,X; {0} → Y )
// Hom(P1,X)
Hom({0}, Y ) // Hom({0},X).
Note that there are natural identifications Hom({0}, Y ) = Y and Hom({0},X) = X.
The Zariski tangent space of Hom(P1,X; {0} → Y ) at [f ], viewed as a subspace of
AMPLE SUBVARIETIES AND RATIONALLY CONNECTED FIBRATIONS 11
T[f ]Hom(P
1,X), is contained in ker(r). This simply follows from the fact that the mor-
phism Hom(P1,X; {0} → Y ) → Hom({0},X) factors through Hom({0}, Y ), by taking
tangent spaces and recalling the above natural identifications. Now, the freeness of g
implies that H1(P1, g∗TY ) = 0, hence we get the exact sequence
0 → H0(P1, g∗TY ) → T[f ]Hom(P
1,X) → H0(P1, g∗NY/X) → 0.
This sequence restricts to an exact sequence
(3) 0 → H0(P1, g∗TY ) → ker(r) → H
0(P1, g∗NY/X ⊗m0) → 0,
and the first assertion follows by observing that T[f ]Hom(P
1,X; {0} → Y ), viewed as a
subspace of ker(r), contains the image of H0(P1, g∗TY ).
Suppose now that NY/X is ample. By Proposition 2.26, Hom(P
1,X; 0 7→ g(0)) is
smooth, hence of dimension h0(P1, f∗TX⊗m0), at [f ]. Let V be the irreducible component
of Hom(P1,X) containing [f ]. Note that T[f ]Hom(P
1,X; {0} → Y ) = T[f ]V ({0} → Y ).
Moreover, there is a dominant morphism V ({0} → Y ) → Y , defined by [h] 7→ h(0), whose
fiber over a point y ∈ Y is V (0 7→ y). This implies that dimV ({0} → Y ) = dimV (0 7→
y) + dimY for a general y ∈ Y , and therefore
(4) dimV ({0} → Y ) = h0(P1, f∗TX ⊗m0) + dimY
by Proposition 2.2,(b). On the other hand, we have
h0(P1, g∗NY/X ⊗m0) = h
0(P1, f∗TX ⊗m0)− h
0(P1, g∗TY ⊗m0)
= h0(P1, f∗TX ⊗m0)− h
0(P1, g∗TY ) + h
0(g∗TY |{0}).
Therefore, by (3) and h0(g∗TY |{0}) = dimY , we get
(5) dimker(r) = h0(P1, f∗TX ⊗m0) + dimY.
Then, comparing (5) with (4), we conclude at once that T[f ]Hom(P
1,X; {0} → Y ) = ker(r)
and that V ({0} → Y ) is smooth at [f ]. �
Definition 2.28. If W ⊆ Hombir(P
1, Y ) is a family of rational curves on Y , then we
call 〈i∗(W )〉 ⊆ Hombir(P
1,X) the extension of W to X. Conversely, for every family of
rational curves V on X, we call 〉i−1∗ (V )〈 ⊆ Hombir(P
1, Y ) the restriction of V to Y .
Remark 2.29. If V is an irreducible family on X, then 〉i−1∗ (V )〈 needs not be irreducible.
In fact, the example of the restriction to a smooth quadric Q2 of the family of lines
in P3 shows that, in general, different elements in 〉i−1∗ (V )〈 may even define linearly
independent numerical classes in N1(Y ). In particular, if V is an unsplit family on X,
then its restriction to Y may not be an unsplit family. Although we do not have examples,
it seems likely that i−1∗ (V ) might fail to be a family on Y even if V is a family on X.
We close this section with the following “relative” version of Proposition 2.12. The
proof, which can be found within the proof of [4, Lemma 1.4.5], uses a “non-breaking
lemma” due to Wísniewski [28].
Proposition 2.30. Let V be an unsplit family on X. Let Y ⊂ X be a subvariety, and
assume that Locus(V, {0} → Y ) is dense in X. Then, for every curve Γ ⊂ X, we have
[Γ] = a[ΓY ] + b[C] in N1(X),
12 M.C. BELTRAMETTI, T. DE FERNEX, AND A. LANTERI
where C is a V -curve, ΓY is a curve contained in Y , and a ≥ 0.
3. Extension of rationally connected fibrations
In this section we study the relationship between rational connected fibrations of a
projective manifold X and those of an ample submanifold Y ⊂ X. Our main interest is in
the case when it is given on X an irreducible family of rational curves V whose restriction
VY to Y contains a covering component. Given this situation, the goal is to compare
the associated rationally connected fibrations X 99K X//V and Y 99K Y//VY . However,
for technical reasons that will be evident in the proof of Theorem 3.6, it is convenient
to consider from the beginning a more general setting, allowing V to be reducible. The
conditions given in items (i) and (ii) in the following two theorems capture the essential
properties of V needed in our arguments.
Theorem 3.1. Let X be a smooth projective variety, and assume that Y ⊂ X is a smooth
subvariety with ample normal bundle. Let i : Y →֒ X be the inclusion, and suppose that the
induced map i∗ : N1(X) → N1(Y ) is surjective. Let V = ∪α∈AVα be a family of rational
curves on X, and assume that there is a subset B ⊆ A such that
(i) R≥0[V ] =
β∈B R≥0[Vβ ], and
(ii) Locus
i−1∗ (Vβ)
is dense in Y for every β ∈ B.
Let VY := 〉i
∗ (V )〈 be the restriction of V to Y . Then both V and VY are numerically
covering families (respectively, on X and on Y ), and, for suitable choices of the rational
quotients, there is a commutative diagram
Y//VY
// X//V ,
where π and φ are the projections to the respective rational quotients and δ is a surjective
morphism.
Remark 3.2. If one assumes that V is irreducible, then the hypothesis that i∗ : N1(X) →
N1(Y ) be surjective is unnecessary.
Remark 3.3. An analogous property has been independently observed to hold in the case
V = Hombir(P
1,X) by A.J. de Jong and J. Starr.
The proof of Theorem 3.1 is based upon the following two lemmas.
Lemma 3.4. With the same notation and assumptions as in Theorem 3.1,
Locus(Vβ; {0} → Y ) is dense in X for every β ∈ B.
Proof. To keep the notation light, we suppose throughout the proof that V is irreducible,
so that Vβ = V . By hypothesis, Locus(VY ) is dense in Y . Fix a general element [g] of
an irreducible component of VY with dense locus in Y , and let f := i ◦ g. Note that
[f ] ∈ V ({0} → Y ). Since the chosen component of VY has dense locus in Y , we can
assume that y := f(0) is a general point of Y . We know that g is free by Proposition 2.6.
AMPLE SUBVARIETIES AND RATIONALLY CONNECTED FIBRATIONS 13
Thus, since V is an irreducible component of Hom(P1,X) containing [f ], Proposition 2.27
applies to say that V ({0} → Y ) is smooth at [f ], with Zariski tangent space
T[f ]V ({0} → Y )
0(P1, g∗TY )⊕H
0(P1, g∗NY/X ⊗m0)).
We can view the right hand side as a vector subspace Λ ⊆ H0(P1, f∗TX) via the iso-
morphism (3) given in the proof of Proposition 2.27. It follows from the ampleness of
g∗NY/X that f
∗TX is spanned by the sections in Λ at every point q ∈ P
1 \ {0}. By [15,
Proposition I.2.19], we conclude that the differential of the universal map
1 × V ({0} → Y ) → X
has rank equal to dimX at (q, [f ]) for every q ∈ P1 \ {0}. Since V ({0} → Y ) is smooth at
[f ], this implies that its locus in X is dense. �
Lemma 3.5. With the same notation and assumptions as in Theorem 3.1, for every
β ∈ B the family Vβ is covering (in X) and every irreducible component of i
∗ (Vβ) with
dense locus in Y is a family (on Y ).
Proof. By the hypothesis (ii) of Theorem 3.1, we can fix an arbitrary irreducible component
T of i−1∗ (Vβ) with dense locus in Y . Let [g] ∈ T be a general element. We know by
Proposition 2.6 that g is free. Then by Proposition 2.26, f := i ◦ g is free and Hom(P1,X)
is smooth at [f ], and thus, in particular, Vβ is a covering family by Proposition 2.6 again.
Moreover, we deduce by the smoothness of Hom(P1,X) at [f ] that for every irreducible
one-parameter family [gt] in Hombir(P
1, Y ) specializing to [g], the corresponding family
[ft] := [i ◦ gt] is contained in V . Therefore [gt] ∈ i
∗ (V ), and in fact [gt] ∈ T by the
generality of the choice of [g] in the component T . We conclude that T is an irreducible
component of Hombir(P
1, Y ), and hence, in particular, of VY . �
Proof of Theorem 3.1. It follows from Lemma 3.5 and the hypothesis (i) that V is a nu-
merically covering family onX and similarly, using the fact that the i∗ : N1(Y ) → N1(X) is
injective, it follows that VY is a numerically covering family on Y . Let φ : X 99K Z//V and
π : Y 99K Y//VY be the rational quotients, as in Theorem 2.13. Let X
◦ ⊆ X, S◦ ⊆ X//V ,
Y ◦ ⊆ Y , and Z◦ ⊆ Y//VY be open subsets such that the maps φ and π restrict to proper
surjective morphisms
φ◦ : X◦ → S◦ and π◦ : Y ◦ → Z◦.
Let G be a very general fiber of φ◦, and let x be a general point of G. By Lemma 3.4, we
can assume that x ∈ Locus(V ; {0} → Y ), and hence we can find a point y ∈ Y such that
x ∈ Locus(V ; 0 7→ y) by Theorem 2.13. In particular, x and y are V -chain connected. This
implies that y ∈ G, and therefore that G ∩ Y 6= ∅. Since G is a general fiber of φ◦, this
means that φ◦(Y ∩X◦) = S◦. Moreover, by the generality of G, we can in fact assume that
G has non-empty intersection with a very general fiber of π◦. On the other hand, recalling
that π is defined by the restriction VY of the family V defining φ, we see that if F is a very
general fiber of π◦ meeting G, then necessarily F ⊆ G. In conclusion, after possibly further
14 M.C. BELTRAMETTI, T. DE FERNEX, AND A. LANTERI
shrinking Z◦ (and consequently Y ◦), we may assume to have a commutative diagram
// X◦
// S◦,
where δ◦ is a dominant morphism.
We need to show that, after possibly changing the birational model for Y//VY , the
map δ◦ extends to a surjective morphism
δ : Y//VY → X//V .
We first take the normalization Z of a projective compactification of Z◦. Note that δ◦
extends to a rational map Z 99K X//V , which is defined by some linear system H on Z.
Then we take Z ′ to be the normalization of the blow-up of the base scheme of H. Since the
proper transform of H on Z ′ is base-point free, the rational map δ◦ lifts to a well defined
morphism δ : Z ′ → X//V . This is a morphism of projective varieties, and δ
◦ is dominant,
so δ is surjective. Moreover, the morphism Z ′ → Z is an isomorphism over Z◦, and thus
we can identify the latter with a subset of Z ′. Then we take Z ′ as the VY -quotient Y//VY
of Y . This proves the theorem. �
Theorem 3.1 can be viewed as a general property of rationally connected fibrations.
Aiming also at applications in extension problems of specific fibrations, that will be ad-
dressed in the following sections, we are interested in determining sufficient conditions to
ensure that the map δ, whose existence was proven in the previous theorem, is generically
finite. This is the content of the next theorem, which is the main result of this section.
Theorem 3.6. Let X be a smooth projective variety, and assume that Y ⊂ X is a smooth
subvariety with ample normal bundle. Let i : Y →֒ X be the inclusion, and suppose that the
induced map i∗ : N1(X) → N1(Y ) is surjective. Let V = ∪α∈AVα be a family of rational
curves on X, and assume that there is a subset B ⊆ A such that
(i) R≥0[V ] =
β∈B R≥0[Vβ ], and
(ii) Locus
i−1∗ (Vβ)
is dense in Y for every β ∈ B.
Let VY := 〉i
∗ (V )〈 be the restriction of V to Y , and let
Y//VY
// X//V ,
be the commutative diagram given by Theorem 3.1. Suppose that one of the following two
conditions holds:
(a) V is an unsplit family; or
(b) codimX Y < dimRCVY .
Then the morphism δ is generically finite.
AMPLE SUBVARIETIES AND RATIONALLY CONNECTED FIBRATIONS 15
Proof. We first prove that δ is generically finite when the condition (a) of the statement is
satisfied. If Y//VY is a point, then the statement is obvious, so we can suppose that Y//VY
is positive dimensional. We suppose by contradiction that δ is not generically finite.
Keeping the notation introduced in the proof of Theorem 3.1, let G be a very
general fiber of φ◦, and consider YG := Y ∩ G. Note that G is smooth and rationally
connected by the restriction VG of V to G in the sense of Definition 2.28. So, since we
are assuming that V is unsplit, every irreducible component of VG is an unsplit family on
G, and hence N1(G) is generated by classes of VG-curves by Proposition 2.12. Since such
curves are all numerically equivalent in X, it follows that the image of the restriction map
Pic(X) → Pic(G) has rank 1. Consider the inclusions jX : G →֒ X, jY : YG →֒ Y and
iG : YG →֒ G. In order to reach a contradiction, we consider the commutative diagram
Pic(X)
// Pic(G)
Pic(Y )
// Pic(YG).
The plan is to give a lower-bound for the rank of the map Pic(X) → Pic(YG) by factoring
it through Pic(Y ).
Note that Y ◦G := Y
◦ ∩G is a non-empty open subset of YG. Since we are supposing
that δ◦ is not generically finite, we have dim
π◦(Y ◦G)
≥ 1. Therefore we can find a curve
Γ ⊆ YG such that Γ ∩ Y
G 6= ∅ and dim
π◦(Γ ∩ Y ◦G)
≥ 1. Since Locus(VY ) is dense in Y ,
every fiber of π◦ has positive dimension and, since these fibers are proper, we can fix a
curve C ⊂ YG lying inside a fiber of π
◦|Y ◦
Now, fix a projective model Y//VY for the rational quotient of Y containing Z
◦ as an
open subset, and let L ⊂ Y//VY be a general hyperplane section passing through a point
in π◦(Γ ∩ Y ◦G) but not containing the point π
◦(C). Let L◦ := L|Z◦ be the restriction of
L to Z◦, and let D ⊂ Y be the closure (in Y ) of (π◦)−1(L◦). If U ⊂ Z◦ \ L◦ is an open
neighborhood of π◦(C), then (π◦)−1(U) is an open neighborhood of C in X◦ (hence in
X) and is disjoint from (π◦)−1(L◦); this shows that D ∩ C = ∅. Let then DG and HG be
the restrictions of D and of a general hyperplane section H of Y to YG. By construction,
we have DG · Γ > 0 and DG · C = 0, whereas HG is ample; in particular, DG and HG
induce numerically independent elements of Pic(YG). Since both of them are restrictions
of divisors on Y , this implies that rk Im(j∗Y ) ≥ 2. Therefore, observing that the cokernel of
i∗ : Pic(X) → Pic(Y ) is torsion due to the surjectivity of N1(X) → N1(Y ), we conclude
rk Im(j∗Y ◦ i
∗) ≥ 2.
On the other hand, we have rk Im(i∗G ◦ j
X) ≤ 1, and therefore we have a contradiction
by the commutativity of the above diagram. This proves that δ is generically finite if the
condition (a) of the statement of the theorem is satisfied.
It remains to prove that δ is generically finite under the hypothesis (b). Let G be
a very general (smooth and connected) fiber of φ◦, and let F be a very general fiber of π
among those that are contained in G. Choosing G sufficiently general, we can also ensure
that F is general among the fibers of π◦. Then, to conclude the proof of the theorem, we
16 M.C. BELTRAMETTI, T. DE FERNEX, AND A. LANTERI
need to show that
(6) dimG = dimF + codimX Y.
Note indeed that, since dimG = dimRCV and dimF = dimRCVY , (6) implies that δ is
generically finite.
In order to prove (6), we fix a very general point y ∈ F . By Lemma 3.4,
Locus(V, {0} 7→ Y ) is dense in X. Moreover, we know that Locus(V, 0 7→ y) is con-
tained in G by Theorem 2.13. Thus Locus(V, 0 7→ y) sweeps a dense subset of G, if we let
y vary in F . Therefore, by our choices of F and y, we can assume that Locus(V, 0 7→ y)
contains a very general point of G. Note that any point of G is connected by a VG-chain
to the point y, where VG denotes the restriction of V to G in the sense of Definition 2.28.
Therefore Proposition 2.25 applied to G and VG implies that Locus
‖VG‖, 0 7→ y
is dense
in G. Note that the two families ‖V ‖(0 7→ y) and ‖VG‖(0 7→ y) are the same since G is
an RC‖V ‖ equivalence class. In particular, we have
dimG = dimLocus
‖V ‖, 0 7→ y
At this point the idea is to replace V with ‖V ‖, so that, by the above argument,
we can directly assume, without loss of generality, that the two sets Locus(V, 0 7→ y) and
G have the same dimension (and, in fact, that the first set is dense in the second one).
In order to make this step, we need to show, first of all, that ‖V ‖ satisfies hypotheses
analogous to (i) and (ii) imposed to V in the statement of the theorem, and moreover
that replacing V with ‖V ‖ does not affect the quotient maps φ and π and the respective
rational quotients, which implies condition (b) for ‖V ‖.
Lemma 3.7. The family ‖V ‖ satisfies the hypotheses (i) and (ii) imposed to V in Theo-
rem 3.6. Moreover, the families VY = 〉i
∗ (V )〈 and ‖V ‖Y := 〉i
∗ (‖V ‖)〈 define the same
rational fibration on Y . In particular, codimX Y < dimRC‖V ‖Y .
Proof of the lemma. First recall that VY is a numerically covering family by Theorem 3.1,
hence so is ‖VY ‖. We compare ‖V ‖ with
W := 〈i∗(‖VY ‖cov)〉.
Note that ‖VY ‖cov is non-empty and W is a subfamily of ‖V ‖. We claim that
(7) R≥0[‖V ‖] = R≥0[W ] in N1(X).
By the definition of ‖V ‖, this is equivalent to R≥0[V ] = R≥0[W ]. The inclusion R≥0[V ] ⊇
R≥0[W ] is obvious, so we need to show that the reverse inclusion holds. In fact, by the
hypothesis (i) on V , it suffices to show that R≥0[Vβ ] ⊆ R≥0[W ] for every β ∈ B. Since
Locus(i−1∗ (Vβ)) is dense in Y , at least one of the irreducible components of 〉i
∗ (Vβ)〈 is a
covering family on Y , hence a subfamily of ‖VY ‖cov. This implies that R≥0[Vβ] ⊆ R≥0[W ],
hence (7) is proven.
On the other hand, the restriction to Y of any irreducible component ofW is a union
of irreducible components of ‖VY ‖cov, which are covering by Definition 2.18. Therefore
their loci are dense in Y . Combining this with (7) proves the first assertion.
We are currently assuming that V satisfies the condition (b) in Theorem 3.6.
Therefore we get the desired inequality codimX Y < dimRC‖V ‖Y as soon as we show
AMPLE SUBVARIETIES AND RATIONALLY CONNECTED FIBRATIONS 17
that the families VY and ‖V ‖Y define the same rational quotient of Y , because then
dimRCVY = dimRC‖V ‖Y . By using Proposition 2.20 it is enough to show that
(8) R≥0[‖V ‖Y ] = R≥0[VY ] in N1(Y ).
The inclusion R≥0[‖V ‖Y ] ⊇ R≥0[VY ] is obvious. To prove the other, note that the embed-
ding of Y in X induces an inclusion
ι : N1(Y ) →֒ N1(X),
which follows from the hypothesis that i∗ : N1(X) → N1(Y ) is surjective made in Theo-
rem 3.6. Observe that
R≥0[‖V ‖Y ]
⊆ R≥0[‖V ‖] = R≥0[V ] =
R≥0[Vβ] = ι
R≥0[VY ]
in N1(X).
Since ι is injective, the inclusion holds inN1(Y ), before taking images. This proves equality
(8) and completes the proof of the lemma. �
We now come back to the proof of the theorem. By the previous lemma, we are
allowed to replace V with ‖V ‖. Therefore we can directly assume without loss of generality
(9) dimLocus(V, 0 7→ y) = dimG.
Then, in order to show (6) and hence to conclude the proof, it suffices to show that
dimLocus(V, 0 7→ y) = dimX − dimY//VY .
For short, let d = dimX and k = dimY//VY . Since y is a general point of Y , we can
assume that VY (0 7→ y) contains a sufficiently general point [g] of a covering component
of VY , so that g is free and C := g(P
1) is a smooth curve (the smoothness of C is not
essential for the proof, but simplifies the notation).
We observe that NY/X |F is an ample vector bundle over F , and that rkNY/X |F <
dimF by condition (b), and therefore H1((NY/X |F )
∗) = 0 by the Le Potier vanishing
theorem [23]. This implies that the short exact sequence
0 → NF/Y → NF/X → NY/X |F → 0
splits since NF/Y
F , and therefore we get a surjection NF/X ։ NF/Y . Restricting to
C and composing with the natural surjections TX |C ։ NC/X and NC/X ։ NF/X |C , we
obtain a chain of surjections
TX |C ։ NC/X ։ NF/X |C ։ NF/Y |C
Since g is free, so is f := i ◦ g by Proposition 2.26, and therefore TX |C is nef. Writing
TX |C ∼= ⊕iOP1(bi), we have bi ≥ 0, and thus the existence of a quotient of TX |C isomorphic
to O⊕k
implies that
(10) #{bi | bi = 0} ≥ k.
Note that (V, 0 7→ y) is smooth at [f ] by Proposition 2.2. Let u : P1×(V, 0 7→ y) → X
be the universal map, and let q ∈ P1 be a point different from 0. By [15, Proposition II.3.10]
and (10), we have
rkdu(q, [f ]) ≤ d− k.
18 M.C. BELTRAMETTI, T. DE FERNEX, AND A. LANTERI
Since (V, 0 7→ y) is smooth at [f ], we obtain dimLocus(V, 0 7→ y) ≤ d − k. On the
other hand, we have dimG ≥ d − k by Theorem 3.1. Thus by (9) we conclude that
dimLocus(V, 0 7→ y) = d− k, which proves (6). This concludes the proof of the theorem.
Remark 3.8. It is interesting to compare the result of Theorem 3.6 when the condition
in case (b) is satisfied with a result of Sommese [26, Proposition III] (see also [3, Theo-
rem (5.3.1)] for a more general version) in which the case of an ample divisor endowed with
a fibration is considered. Indeed, one notices that if codimX Y = 1, then the inequality
in (b) reduces exactly to the hypothesis imposed in the theorem of Sommese.
Remark 3.9. Hartshorne’s conjecture [13, Conjecture 4.5] on the intersection of ample
submanifolds with complementary dimensions, if true, would imply that the morphism δ
in Theorem 3.6 is in fact an isomorphism whenever the condition given in (b) is satisfied.
Indeed, let G be a general fiber of φ. Then Y ∩ G is a disjoint union of deg δ fibers Fj
of π, and by using standard exact sequences one easily sees that NY ∩G/G
∼= NY/X |Y ∩G.
In particular, we have NFj/G
∼= NY ∩G/G|Fj
∼= NY/X |Fj , which is ample. Note that con-
dition (b) implies that 2 dimFj > dimG. Then, assuming that deg δ ≥ 2, Hartshorne’s
conjecture would give the contradiction F1 ∩ F2 6= ∅; therefore δ can only be birational.
In fact, modulo suitable choices of the rational quotients, one could even assume that δ
is an isomorphism. We recall that this conjecture of Hartshorne is known to be true, for
instance, for homogeneous spaces [24]; we will use this fact in the proof of Theorem 5.8.
Remark 3.10. The previous results are related to a result of Occhetta [25, Proposition 4],
where an analogous extension property is obtained under more restrictive hypotheses. In
the setting of [25], Y is supposed to be the zero locus of a regular section of an ample vector
bundle on X, π is the MRC-fibration of Y , and the rational quotient of Y is assumed to
have positive geometric genus.
In applications, it might be useful to start with families of rational curves on Y ,
rather than only considering restrictions of families on X. The following elementary
property will be useful. Let X and Y be as above, with the inclusion inducing a surjection
N1(X) → N1(Y ), and consider a numerically covering family W of rational curves on
Y . Let then V = 〈i∗(W )〉 be the family on X obtained as the extension of W , and let
VY =〉i
∗ (V )〈 be its restriction back to Y . Note that VY is numerically covering since W
is so. We clearly have W ⊆ VY , but in general the inclusion may be strict (one can take
as an example the case of a quadric Y = Q2 in X = P3, taking W to be the family of
lines on Y corresponding to one of the two rulings). By this inclusion and Theorem 3.1,
we have a commutative diagram
// Y//VY
// X//V ,
for a suitable choice of the rational quotients, and some surjective morphisms γ and δ.
AMPLE SUBVARIETIES AND RATIONALLY CONNECTED FIBRATIONS 19
Proposition 3.11. Keeping the notation introduced above, both families W and VY define
the same rational quotient (i.e., Y//W = Y//VY and τ = π) if either one of the following
two situations occurs:
(a) dimY//W = dimX//V , or
(b) i∗ : N1(X) → N1(Y ) is surjective.
Proof. We observe that the general fiber of γ is rationally connected, since it is dominated
by the general fiber of π, and in particular it is connected. Then the conclusion in case (a)
follows immediately from the commutativity of the above diagram. We now focus on
case (b), and thus assume that N1(X) → N1(Y ) is surjective. By duality, we have an
injection N1(Y ) →֒ N1(X) and, by the construction of VY , we deduce that R≥0[VY ] =
R≥0[W ]. Then the assertion follows from Proposition 2.20. �
The case when the rational quotient is one-dimensional is particularly well suited
because of the following well-known property. We are grateful to the referee for suggesting
this proof, which simplifies our original argument.
Proposition 3.12. Let X be a normal projective variety, let φ◦ : X◦ → C be a dominant
morphism from a non-empty open subset X◦ ⊆ X to a smooth projective curve C, and
assume that φ◦ is proper over its image. Then φ◦ extends to a (surjective) morphism
φ : X → C.
Proof. We consider a projection C → P1. Composing with φ◦, we have a rational map
X 99K P1 which is defined by some linear pencil. Because of the properness of φ◦, this
pencil is free, hence the map X → P1 is regular and we can lift it via Stein factorization
to a regular morphism X → C which extends φ◦. �
Coming back to the extension of rationally connected fibrations, we immediately
obtain the following result.
Corollary 3.13. With the same notation and assumptions as in Theorem 3.6, suppose
furthermore that dimY//VY = 1. Then the RCVY -fibration π of Y and the RCV -fibration
φ of X are surjective morphisms fitting in a commutative diagram
Y//VY
// X//V ,
where δ is a finite morphism of smooth curves.
Proof. It follows from Theorems 3.1 and 3.6, and Proposition 3.12. �
4. Extension of Mori contractions
In this section we consider the problem to extend Mori contractions. Let Y be a
smooth projective variety with Picard number ρ := ρ(Y ) ≥ 2, and suppose that R ⊂
N1(Y ) is a KY -negative extremal ray of NE(Y ) such that the associated Mori contraction
π : Y → Z
is of fiber type, namely, that dimZ < dimY .
20 M.C. BELTRAMETTI, T. DE FERNEX, AND A. LANTERI
We fix a rational curve C in a general fiber of π passing through a general point
of Y , whose numerical class [C] generates R. Then we fix a normalization morphism
f : P1 → C ⊂ Y , and let
W = 〈[f ]〉 ⊂ Hombir(P
1, Y )
be the minimal family of rational curves on Y generated by [f ]. Note that, by construction,
R≥0[W ] = R≥0[C] = R in N1(Y ),
and W is a covering family on Y , since C passes through a general point of Y . Moreover,
π can be considered as the RCW -fibration of Y .
Theorem 4.1. With the above notation, assume that Y is a smooth subvariety, with
ample normal bundle, of a smooth projective variety X, such that the inclusion i : Y →֒ X
induces an isomorphism i∗ : N1(X) → N1(Y ). Let
V := 〈i∗(W )〉 ⊂ Hombir(P
be the extension of the family W to X, and assume that V is an unsplit family on X.
Then R≥0[V ] is a KX -negative extremal ray of X. Moreover, denoting by φ : X → S
the Mori contraction of this ray, there is a finite surjective morphism δ : Z → S giving a
commutative diagram
// S.
Proof. By the hypothesis and duality, the inclusion i : Y →֒ X induces a natural isomor-
phism ι : N1(Y ) ∼= N1(X). Let RX := ι(R). This is a ray in N1(X), and is contained in
NE(X), because of the inclusion ι(NE(Y )) ⊆ NE(X). Moreover, if C is a curve as above,
then adjunction formula gives
KX · C = (KY − detNY/X) · C < 0,
and thus RX is a KX -negative ray. The main point here is to prove that RX is an extremal
ray of NE(X).
We start by observing two things. First of all, we have RX = R≥0[V ] in N1(X),
and we can consider φ as the RCV -fibration of X. Moreover, if VY := 〉i
∗ (V )〈 is the
restriction of V to Y , then we have
R≥0[VY ] = ι
−1(R≥0[V ]) = R≥0[W ] in N1(Y ),
by the isomorphism ι : N1(Y ) ∼= N1(X). Therefore VY and W define the same rational
quotient, by Proposition 2.20.
Note that V satisfies the conditions (i) and (ii) of Theorem 3.1. Then, by Lemma 3.4
and the fact that V is unsplit, we have
(11) Locus(V, {0} → Y ) = X.
Indeed the lemma implies that Locus(V, {0} → Y ) is dense in X, and the unsplitness of
V implies that such locus is proper, hence closed in X.
AMPLE SUBVARIETIES AND RATIONALLY CONNECTED FIBRATIONS 21
Let D be any divisor on X whose restriction D|Y is a good supporting divisor for
R. Note that D · C = 0. Let Γ be any curve on X. Then, by Proposition 2.30, there is a
curve ΓY ⊂ Y such that [Γ] = a[ΓY ] + b[C] in N1(X), with a ≥ 0. This implies that
D · Γ = aD · ΓY + bD · C = aD|Y · ΓY ≥ 0,
since D · C = 0 and D|Y is nef. This proves that D is nef.
Since R is an extremal ray of NE(Y ), by duality it corresponds to it an extremal face
of Nef(Y ) of maximal dimension ρ − 1, and therefore we can find ρ− 1 good supporting
divisors of R whose numerical classes are linearly independent in N1(Y ). By the previous
argument, this implies that such good supporting divisors, up to numerically equivalence,
extend to divisors on X that are nef and trivial on RX , and whose numerical classes are
linearly independent in N1(X). Since there are ρ − 1 of them, and ρ = ρ(X) by the
isomorphism N1(X) ∼= N1(Y ), this implies that RX is an extremal ray of NE(X).
So, let φ : X → S be the Mori contraction of RX . By taking the Stein factorization
of the restricted morphism φ|Y : Y → S, we obtain a commutative diagram
// S,
where T is normal, ψ is surjective with connected fibers, and δ is finite. We observe that
an irreducible curve on Y is contracted by ψ if and only if it is numerically proportional
(in X, and hence in Y ) to C, and therefore if and only if it is contracted by π. This
implies that T = Z and ψ = π. Moreover, by (11), we see that Y meets every fiber of φ,
and therefore δ is surjective. This completes the proof of the theorem. �
Remark 4.2. Theorem 4.1 can be generalized, with minor changes, to the case of contrac-
tions of KY -negative extremal faces of NE(Y ).
Remark 4.3. A result analogous to Theorem 4.1 is obtained in [25, Proposition 5] in a
more restrictive context (in the setting of [25], Y is assumed to be the zero locus of a
regular section of an ample vector bundle on X, with dimX ≥ 4, and δ is proven to
be an isomorphism). Let us note that the proof of [25, Proposition 5] makes use in an
essential way of the fact that the divisor D (in our and in his notation as well) is an
adjoint divisor, i.e., a divisor of the form D = KX + A for some ample line bundle A on
X. This circumstance seems not to be true in general. The gap was pointed out by Paltin
Ionescu. Our arguments fill up that gap. A similar discussion is given in [2] in the case
Y is an ample divisor on X. In connection with Theorem 4.1 we should also mention [1,
Theorem 3.4] and [9, Lemma (1.4)].
We close this section with the following elementary property on extensions of relative
polarizations, which applies in particular to the setting of Theorem 4.1.
Proposition 4.4. Let
Z // S
22 M.C. BELTRAMETTI, T. DE FERNEX, AND A. LANTERI
be a commutative diagram of morphisms of projective varieties, and assume that i is an
embedding, that π is not a finite morphism, and that ρ(X/S) = 1. Let M be a line bundle
on X whose restriction M |Y to Y is π-ample. Then there exists an ample line bundle H
on X such that H|F ∼=M |F for every fiber F of π.
Proof. Since the numerical class of a curve in a positive dimensional fiber of π spans
N1(X/S), it follows from the relative Kleiman criterion of ampleness (see, e.g., [17, Theo-
rem 1.4]) that M is φ-ample. Then the line bundle H := M + φ∗(kA) is ample if A is an
ample line bundle on S and k ≫ 0 (see, e.g., [22, Proposition 1.7.10]), and H|F ∼= M |F
for every fiber of π. �
5. Projective bundles and quadric fibrations as ample subvarieties
The results obtained in the previous sections are here applied to classify, under
suitable conditions, projective manifolds containing “ample subvarieties” that admit a
structure of projective bundles or quadric fibrations.
Here we will adopt the following definitions of projective bundles and quadric fi-
brations. We remark that there are in fact several slightly different notions of quadric
fibrations in the literature, and the one adopted here is a little more restrictive than
others (see Remark 5.2 below).
Definition 5.1. A surjective morphism π : Y → Z between smooth projective varieties
is said to be a Pm-bundle (resp., a Qm-fibration) if π is an extremal Mori contraction and
there is a line bundle L on Y such that every fiber F of π is mapped isomorphically to
Pm (resp., is embedded as an irreducible and reduced quadric hypersurface of Pm+1) via
the complete linear system |L|F |.
Remark 5.2. With the above definition, a morphism π : Y → Z is a Pm-bundle if and
only if Y ∼= PZ(F) for some vector bundle F of rank m + 1 on Z; we can take as L the
tautological line bundle of F on Y . On the other hand, not all scrolls (resp., quadric
fibrations) in the adjunction theoretic sense (cf. [3, Sections 3.3, 14.1, 14.2]) are Pm-
bundles (resp., Qm-fibrations) in our sense. In fact, our definition of Qm-fibration does
not include conic bundles with singular fibers, since we require all fibers to be irreducible;
moreover, the assumption that π is an extremal Mori contraction implies that ρ(Y/Z) =
1, which excludes (as we will see in the proof of Lemma 5.3) those fibrations with all
fibers isomorphic to Q2 for which the fundamental group of the base Z acts with trivial
monodromy. Note that the general fiber of a Qm-fibration is isomorphic to Qm.
Lemma 5.3. Let π : Y → Z be either a Pm-bundle or a Qm-fibration, and let W be the
family of rational curves on Y generated by the lines in the fibers of π. Then W is an
irreducible family.
Proof. The assertion is clear in all cases except when π is a Q2-fibration. In this case, a
straightforward extension of the arguments in the proof of [8, Proposition (1.3.1)] shows
two things. First of all, ρ(Y/Z) 6= 1 if and only if all fibers of π are isomorphic to Q2
and the fundamental group of Z acts with trivial monodromy. Moreover, in all remaining
cases, the lines in the fibers generate an irreducible family. The fact that the base is
one-dimensional in the setting considered in [8] is not essential in the arguments. �
AMPLE SUBVARIETIES AND RATIONALLY CONNECTED FIBRATIONS 23
Definition 5.4. An embedding Pm →֒ Pn is said to be linear if the image of Pm is a
linear subspace of Pn. Similarly, an embedding of irreducible varieties F ⊂ G, with F
isomorphic to a quadric of Pm+1 and G isomorphic either to Pn or to a quadric in Pn+1,
is said to be linear if OG(1)|F ∼= OF (1).
For the reminder of this section, we consider the following setting. LetX be a smooth
projective variety, and assume that Y ⊂ X is a positive dimensional submanifold with
ample normal bundle NY/X . An easy application of a well-known theorem of Kobayashi
and Ochiai gives us the following property, in which the case when Y is a projective space
or a smooth quadric of dimension ≥ 3 is considered.
Proposition 5.5. With the above notation, assume that the inclusion i : Y →֒ X induces
an isomorphism i∗ : Pic(X) → Pic(Y ), and denote n := dimX. If Y ∼= Pm for some
m ≥ 1, then X ∼= Pn and Y is linearly embedded in X. If Y ∼= Qm for some m ≥ 3, then
either X ∼= Pn or X ∼= Qn, and Y is linearly embedded in X.
Proof. Suppose first that Y ∼= Pm. By the assumption, we have ρ(X) = 1, and the ample
generator OX(1) of Pic(X) satisfies OX(1)|Y ∼= OY (1). Observe that
d := degNY/X ≥ rkNY/X = n−m.
By adjunction, we have OX(−KX) ∼= OX(m+ 1+ d), and therefore X is a Fano manifold
of index ≥ n + 1. Then the assertion follows from Kobayashi and Ochiai’s theorem (see
e.g., [3, (3.1.6)]). The argument for the case when Y ∼= Qm with m ≥ 3 is analogous. �
Remark 5.6. The case when Y ∼= Q2 is harder to deal with under such weak hypotheses.
However, if we additionally assume the existence of an ample line bundle H on X such
that H|Y ∼= OP1×P1(1, 1), then this case cannot occur. If at the same time we relax the
hypothesis on the Picard groups, and only assume that i∗ : Pic(X) → Pic(Y ) is injective,
then it is easy to see that necessarily ρ(X) = 1, either X ∼= Pn or X ∼= Qn, and Y
is linearly embedded in X. Indeed, to exclude the case ρ(X) = 2, it suffices to apply
Theorem 4.1 with the family W being either one of the two rulings of Q2. In this way one
obtains two distinct Mori contractions of X whose relative dimensions, added together,
exceed the dimension of X, but this is impossible.
Let us now recall the following extension of the Lefschetz hyperplane theorem, es-
sentially due to Sommese [27, Proposition 1.16] (see also [19, Theorem 1.1]), which will
be used in the proof of case (b) of Theorem 5.8 below.
Proposition 5.7. Let X be a projective manifold, and suppose that Y ⊂ X is a submani-
fold defined by a regular section of an ample vector bundle on X. Suppose that dimY ≥ 3.
Then the inclusion i : Y →֒ X induces an isomorphism i∗ : Pic(X) → Pic(Y ).
We now address the case when Y is a Pm-bundle or a Qm-fibration over a smooth
projective variety. The following theorem (which includes Proposition 5.5 as a very special
case) is the main result of this section.
Theorem 5.8. Let Y be a submanifold of a projective manifold X with ample normal
bundle. Assume that Y admits either a Pm-bundle or a Qm-fibration structure π : Y → Z
over a smooth projective variety Z (see Definition 5.1), and that Y has codimension
(12) codimX Y < dimY − dimZ.
24 M.C. BELTRAMETTI, T. DE FERNEX, AND A. LANTERI
Further suppose that one of the following two situations occurs:
(a) Z is a curve and the inclusion i : Y →֒ X induces an isomorphism Pic(X) ∼=
Pic(Y ); or
(b) Y is the zero scheme of a regular section of an ample vector bundle E on X.
Then π extends to a morphism π̃ : X → Z, that is, there is a commutative diagram
eπ~~}}
Moreover, letting n := m+ codimX Y , the following holds:
• if π is a Pm-bundle, then π̃ is a Pn-bundle; and
• if π is a Qm-fibration, then π̃ is either a Pn-bundle or a Qn-fibration.
In either case, the fibers of π are linearly embedded in the fibers of π̃.
Proof. For clarity of exposition, we discuss case (a) and case (b) separately, as certain
steps require different arguments.
Case (a). Let W be the family of rational curves on Y generated by the lines in
the fibers of π. Note that W is an irreducible family by Lemma 5.3, and that π is the
RCW -fibration. By Propositions 2.6 and 2.26, Hombir(P
1,X) is smooth at the generic
point of i∗(W ). Let V = 〈i∗(W )〉 (note that V is irreducible). By Proposition 2.20, the
rational fibration defined by the restriction VY of V to Y coincides with π. Then, by
Corollary 3.13, we obtain the commutative diagram
// S,
where φ is the RCV -fibration and δ is a finite morphism of smooth projective curves.
By definition, there is a line bundle L on Y such that L|F ∼= OF (1) for every fiber
F of π. Since L extends to a line bundle on X and ρ(X/S) = 1, Proposition 4.4 implies
that there exists an ample line bundle H on X such that H|F ∼= OF (1) for every F .
If C is a line in a fiber of π and we set b = n if Y is a Pm-bundle and b = n − 1 if
Y is a Qm-fibration, then we have
(KX + bH) · C = (KY + bH|Y ) · C − detNY/X · C < 0.
It thus follows from the classification of polarized manifolds with large nef-value due to
Fujita and Ionescu (see [11, Section 1] or [14, Section 1]) that φ : X → S is either a
Pn-bundle or a fibration with fibers isomorphic to hyperquadrics in Pn+1 and relative
Picard number 1, and in fact, since dimX ≥ 3 and dimS = 1, in the second case X is a
Qn-fibration.
To conclude, observing that the general fiber of φ is a homogeneous space, we deduce
from Remark 3.9 that δ is in fact an isomorphism, and therefore we get a diagram as in
the statement by setting π̃ := δ−1 ◦ φ.
AMPLE SUBVARIETIES AND RATIONALLY CONNECTED FIBRATIONS 25
Case (b). The first step is to prove the existence of a diagram as in the statement.
We will use different arguments according to the codimension of Y . For short, let r := rkE.
Note that codimX Y = r = n−m.
If r = 1, then (12) implies that π has relative dimension m ≥ 2. Therefore in this
case we can apply a general result of Sommese [26, Proposition III], which says that π
extends to a morphism π̃ : X → Z.
We now assume that r ≥ 2. Note that in this casem ≥ 3 by (12). To extend π in this
situation, we will use the results from the previous sections. As in the proof of case (a), let
W be the irreducible family of rational curves on Y generated by the lines in the fibers of
π, and let V be the unique irreducible component of Hombir(P
1,X) containing the generic
point of i∗(W ). By Proposition 2.20, the rational fibration defined by the restriction VY
of V to Y coincides with the one defined by W , that is, with π. Then, by Theorems 3.1
and 3.6, we obtain a commutative diagram
//___ S,
where φ is the rational quotient defined by V and δ is a dominant and generically finite
map. Let G be a general fiber of φ, let F be one of the fibers of π that is contained in G,
and fix a line C ⊆ F (we can assume that G and F are smooth). Note that dimG = n.
Lemma 5.9. Assuming r ≥ 2, the family V is unsplit.
Proof of the lemma. Since the section of E defining Y in X restricts to a regular section of
E|G whose zero scheme is F , we can apply Proposition 5.7 to this setting. This implies that
the inclusion of F in G induces an isomorphism Pic(G) ∼= Pic(F ). Note that ρ(F ) = 1,
since F is either a projective space or a quadric of dimension m ≥ 3. Therefore we have
ρ(G) = 1. Let L be a line bundle on Y such that L|F ∼= OF (1), and let L̃ be the extension
of L to X. We observe that L̃|G is ample and −KG ≡ aL̃|G for some integer a, which
is easily seen to be positive. Then, since L̃|G · C = 1, we see that G is a Fano manifold
of index a. In particular, we have −KG · C = a ≤ n + 1. Note that, if X
◦ ⊆ X is as in
Theorem 2.13, then we can assume that G is a fiber of the morphism X◦ → Z, and hence
KX |G = KX◦ |G = KG. Therefore
detE · C = KY · C −KX · C = KF · C −KG · C ≤ −m+ n+ 1 = r + 1.
Since we are assuming that r ≥ 2, this implies that detE · C < 2r. We conclude that the
family V is unsplit by Remark 2.11. �
We come back to the proof of the theorem, still assuming for the moment that r ≥ 2.
Since V is unsplit by Lemma 5.9, we can apply Theorem 4.1. This implies that, in the
above diagram, both φ and δ are morphisms and δ is finite, and moreover ρ(X/S) = 1.
Note in particular that rk Im
Pic(X) → Pic(G)
= 1. We have dimG = n. We set b = n
if F ∼= Pm, and b = n − 1 if F ∼= Qm. Denoting by C a line in a general fiber of π, we
obtain
(KG + bH|G) · C = (KF + bH|F ) · C − detNF/G · C < 0,
26 M.C. BELTRAMETTI, T. DE FERNEX, AND A. LANTERI
since NF/G
∼= NY/X |F is ample of rank n−m. By applying again [11, 14] and taking into
account that ρ(X/S) = 1, we deduce that either (G,H|G) ∼= (P
n,OPn(1)) or (G,H|G) ∼=
(Qn,OQn(1)), with the second case occurring only if F ∼= Q
m. Since the general fiber G of
π̃ is a homogeneous space, we deduce from Remark 3.9 that δ is in fact an isomorphism.1
We then define π̃ := δ−1 ◦ φ.
At this point we have a diagram as in the statement for all values of r. We claim that
π̃ is equidimensional with irreducible and reduced fibers. To see this, let Gz := π̃
−1(z) for
an arbitrary z ∈ Z, and let ∆ be any component of Gz. Note that ∆∩Y is contained in the
fiber Fz := π
−1(z) of π, which is irreducible and reduced of dimension m; in particular, we
have dim(∆∩Y ) ≤ m. Since dim∆ ≥ n and ∆∩Y is defined by a section of E|∆, we have
dim(∆ ∩ Y ) ≥ dim∆− r. We conclude that dim∆ = n so that π̃ is equidimensional, and
Fz ⊂ ∆. Recalling that Fz is irreducible and reduced, and cut out on Y by Gz (scheme
theoretically), we see that Gz is also irreducible and reduced. Since the fibers of π̃ are
irreducible and reduced, we can apply the semi-continuity of the ∆-genus [10, Section 5
and (2.1)]. We conclude that π̃ is either a Pn-bundle or a Qn-fibration over Z. �
We note that all cases in Theorem 5.8 are effective.
Remark 5.10. We already listed in the introduction a series of works related to Theo-
rem 5.8. The classical setting when Y is an ample divisor is widely studied in the litera-
ture; for a survey and complete references we refer to [3, Chapter 5] and [2]. In the case
when Y has codimension r ≥ 2, we are not aware of other results of this type in which
the base of the fibration is allowed to be arbitrary.
References
[1] M. Andreatta and G. Occhetta, Ample vector bundles with sections vanishing on special varieties.
Internat. J. Math. 10 (1999), 677–696.
[2] M.C. Beltrametti and P. Ionescu, A view on extending morphisms from ample divisors. In preparation.
[3] M.C. Beltrametti and A.J. Sommese, The Adjunction Theory of Complex Projective Varieties. Expo-
sitions in Mathematics, 16, W. de Gruyter, Berlin, (1995).
[4] M.C. Beltrametti, A.J. Sommese and J.A. Wísniewski, Results on varieties with many lines and their
applications to adjunction theory. Complex Algebraic Varieties (K. Hulek et al. eds.), Proceedings,
Bayreuth, 1990. 16–38, Lecture Notes in Math., 1507, Springer-Verlag, Berlin, 1992.
[5] F. Campana, Connexité rationnelle des variétés de Fano. Ann. Sci. Ec. Norm. Sup. 25 (1992), 539–
[6] O. Debarre, Higher-Dimensional Algebraic Geometry. Universitext, Springer-Verlag, Berlin, 2001.
[7] T. de Fernex, Ample vector bundles with sections vanishing along conic fibrations over curves. Collect.
Math. 49 (1998), 67–79.
[8] T. de Fernex, Ample vector bundles and intrinsic quadric fibrations over irrational curves.Matematiche
(Catania) 55 (2000), 205–222.
[9] T. de Fernex and A. Lanteri, Ample vector bundles and Del Pezzo manifolds. Kodai Math. J. 22
(1999), 83–98.
[10] T. Fujita, On the structure of polarized varieties with ∆-genera zero. J. Fac. Sci. Univ. Tokyo Sect.
IA Math. 22 (1975), 103–115.
[11] T. Fujita, On polarized manifolds whose adjoint bundles are not semipositive. Algebraic Geometry,
Sendai 1985, Adv. Stud. Pure Math. 10 (1987), 167–178.
1Alternatively, one can deduce this fact from [27, Proposition 1.16]. Indeed this result, applied to G∩Y
once this is thought of as a subvariety of G defined by a regular section of E|G, gives the isomorphism
0(G ∩ Y,Z) ∼= H
0(G,Z), and this implies that G ∩ Y is connected.
AMPLE SUBVARIETIES AND RATIONALLY CONNECTED FIBRATIONS 27
[12] T. Graber, J. Harris and J.M. Starr, Families of rationally connected varieties. J. Amer. Math. Soc.
16 (2003), 57–67.
[13] R. Hartshorne, Ample Subvarieties of Algebraic Varieties. Lecture Notes in Math. 156, Springer-
Verlag, Berlin, 1970.
[14] P. Ionescu, Generalized adjunction and applications. Math. Proc. Camb. Philos. Soc. 99 (1986), 457–
[15] J. Kollár, Rational Curves on Algebraic Varieties. Ergeb. Math. Grenzgeb. (3) 32, Springer-Verlag,
Berlin, 1996.
[16] J. Kollár, Y. Miyaoka and S. Mori, Rationally connected varieties. J. Algebraic Geom. 1 (1992),
429–448.
[17] J. Kollár and S. Mori, Birational Geometry of Algebraic Varieties. Cambridge Tracts in Mathematics,
Cambridge University Press, Cambridge, 1998.
[18] A. Lanteri and H. Maeda, Ample vector bundles with sections vanishing on projective spaces or
quadrics. Internat. J. Math. 6 (1995), 587–600.
[19] A. Lanteri and H. Maeda, Ample vector bundle characterizations of projective bundles and quadric
fibrations over curves. In Andreatta et al. (eds.), Proceedings of the international conference: Higher
Dimensional Complex Varieties, Trento, Italy, June 15–24, 1994. 247–259, W. de Gruyter, Berlin,
1996.
[20] A. Lanteri and H. Maeda, Geometrically ruled surfaces as zero loci of ample vector bundles. Forum
Math. 9 (1997), 1–15.
[21] A. Lanteri and H. Maeda, Special varieties in adjunction theory and ample vector bundles. Math.
Proc. Camb. Philos. Soc. 130 (2001), 61–75.
[22] R. Lazarsfeld, Positivity in Algebraic Geometry. I – Classical Setting: Line Bundles and Linear Series.
Ergeb. Math. Grenzgeb. (3) 48, Springer-Verlag, Berlin, 2004.
[23] J. Le Potier, Annullation de la cohomologie à valeurs dans un fibré vectoriel holomorphe positif de
rang quelconque. Math. Ann. 218 (1975), 35–53.
[24] M. Lübke, Beweis einer Vermutung von Hartshorne für den Fall homogener Mannigfaltigkeiten. J.
Reine Angew. Math. 316 (1980), 215–220.
[25] G. Occhetta, Extending rationally connected fibrations. Forum Math. 18 (2006), 853–867.
[26] A.J. Sommese, On manifolds that cannot be ample divisors. Math. Ann. 221 (1976), 55–72.
[27] A.J. Sommese, Submanifolds of Abelian varieties. Math. Ann. 233 (1978), 229–256.
[28] J.A. Wísniewski, On a conjecture of Mukai, Manuscripta Math. 68 (1990), 135–141.
Dipartimento di Matematica, Università di Genova, Via Dodecaneso 35, I-16146 Gen-
ova, Italy
E-mail address: beltrame@dima.unige.it
Department of Mathematics, University of Utah, 155 South 1400 East, Salt Lake City,
UT 84112, USA
E-mail address: defernex@math.utah.edu
Dipartimento di Matematica “F. Enriques”, Università di Milano, Via C. Saldini 50,
I-20133 Milano, Italy
E-mail address: lanteri@mat.unimi.it
	Introduction
	1. Conventions and basic notation
	2. Families of rational curves on varieties
	3. Extension of rationally connected fibrations
	4. Extension of Mori contractions
	5. Projective bundles and quadric fibrations as ample subvarieties
	References
ABSTRACT
  Under some positivity assumptions, extension properties of rationally
connected fibrations from a submanifold to its ambient variety are studied.
Given a family of rational curves on a complex projective manifold X inducing a
covering family on a submanifold Y with ample normal bundle in X, the main
results relate, under suitable conditions, the associated rational connected
fiber structures on X and on Y. Applications of these results include an
extension theorem for Mori contractions of fiber type and a classification
theorem in the case Y has a structure of projective bundle or quadric
fibration.

<|endoftext|><|startoftext|>
Introduction
Let G∞ be a semisimple real Lie group with unitary dual Ĝ∞. The goal of this note is to produce
new upper bounds for the multiplicities with which representations π ∈ Ĝ∞ of cohomological type
appear in certain spaces of cusp forms on G∞.
More precisely, we suppose that G∞ := G(R ⊗Q F ) for some connected semisimple linear
algebraic group G over a number field F . Let K∞ be a maximal compact subgroup of G∞. We
fix an embedding G →֒ GLN for some N , and for any ideal q of OF , we let G(q) denote the
intersection of G∞ with the congruence subgroup of GLN (OF ) of full level q. We also fix an
arithmetic lattice Γ in G∞ (i.e. a subgroup commensurable with the congruence subgroups G(q))
and write Γ(q) := Γ
G(q). For any π ∈ Ĝ∞, let m(π,Γ(q)) denote the multiplicity with which
π occurs in the decomposition of the regular representation of G∞ on L
cusp(Γ(q)\G∞). Let V (q)
denote the volume of the arithmetic quotient Γ(q)\G∞.
In terms of this notation, we may state our main results.
Theorem 1.1. Let p be a prime ideal in OF . Let π ∈ Ĝ∞ be of cohomological type. Suppose
either that G∞ does not admit discrete series, or, if G∞ admits discrete series, that π contributes
to cohomology in degrees other than
dim(G∞/K∞). Then
m(π,Γ(pk)) ≪ V (pk)1−1/ dim(G∞)
as k → ∞, with the implied constant depending on Γ and p.
Theorem 1.2. Let p be a prime ideal in OF . Let W be a finite-dimensional representation of G∞,
and let VW,k denote the local system on Γ(p
k)\G∞ induced by W (assuming that k is taken large
enough for Γ(pk) to be torsion free). Let n ≥ 0, and if G∞ admits discrete series, then suppose
furthermore that n 6=
dim(G∞/K∞). Then
dimHn
Γ(pk)\G∞,VW,k
≪ V (pk)1−1/ dim(G∞)
as k → ∞, with the implied constant depending on Γ and p.
These two theorems are evidently closely related, in light of the results of [5], which show that
Γ(pk)\G∞,VW,k
may be computed in terms of automorphic forms.
http://arxiv.org/abs/0704.0662v1
In the remainder of this introduction, we discuss the relation of Theorem 1.1 to prior results in
this direction before briefly describing the main ingredients in the proof of the two theorems.
DeGeorge and Wallach [3] established general upper bounds for m(π,Γ) in the case where Γ is
cocompact. In particular (ibid, Corollary 3.2), they showed that
m(π,Γ) ≤
|φ(g)|2dg
vol(Γ\G∞), (1.1)
where φ(g) = 〈π(g)v, v〉 is a matrix coefficient, and B is the preimage in Γ\G∞ of a ball in
Γ\G∞/K∞ of radius equal to the injectivity radius of Γ\G∞/K∞. Suppose, however, that π is
not a discrete series. In particular, the corresponding matrix coefficients of π are then not square
integrable. If Γ(q) denotes the mod q congruence subgroup of Γ, then inj.rad(Γ(q)\G∞) → ∞ as
NF/Q(q) → ∞, and thus, the formula of DeGeorge–Wallach implies that
NF/Q(q)→∞
V (q)−1 ·m(π,Γ(q)) = 0. (1.2)
For non-cocompact Γ, an analogous result was established by Savin [10].
It is natural to try to improve this result so as to obtain an estimate on the rate of decay in (1.2)
as NF/Q(q) → ∞. If π is non-tempered, then (1.1) itself implies an estimate of the form
m(π,Γ(q)) ≪ V (q)1−µ (1.3)
for some µ > 0. (See [9], Lemma 1 and displayed equation (6).) In fact Sarnak and Xue in [9] have
conjectured an inequality of the following form (in the case of cocompact Γ):
Conjecture 1.3 (Sarnak–Xue). For π ∈ Ĝ∞ fixed,
m(π,Γ(q)) ≪ V (q)(2/p(π))+ǫ, for all ǫ > 0,
where p(π) is the infimum over p ≥ 2 such that the K-finite matrix coefficients of π are in Lp(G∞).
Sarnak and Xue proved their conjecture for arithmetic lattices in SL2(R) and SL2(C), obtaining
partial results in the direction of this conjecture for SU(2, 1). Note, however, that their conjecture
is non trivial only for non-tempered representations, since for tempered representations, p(π) = 2.
In particular, in the tempered but non-discrete series case, Conjecture 1.3 is weaker than the known
result (1.2).
In Theorem 1.1, we restrict our attention to congruence covers of the form Γ(pk) for the fixed
prime p. For such covers we obtain a quantitative improvement of (1.2) even in the case of tempered
representations (at least for those of cohomological type; note that non-discrete series tempered
representations of cohomological type exist precisely when G∞ admits no discrete series – see [1],
Thm. 5.1, p. 101). For such representations, our result provides the first general bound of the
form (1.3) for any µ > 0.
As we already noted, our two main theorems are closely related. Indeed, Theorem 1.1 is an
easy corollary of Theorem 1.2 (see the end of Section 3 below), and most of our efforts will be
concentrated on establishing the latter result.
When studying the Betti numbers of arithmetic quotients of symmetric spaces, it is natural to
try to use tools such as Euler characteristics and the Lefschetz trace formula. When applied to
analyzing contributions from the discrete series, such methods tend to be very powerful; for example,
the (g, k)-cohomology of a discrete series representation is concentrated in a single dimension [1],
and so no cancellations occur when taking alternating sums. However, in other situations, these
methods can be useless. For example, if π is tempered but not discrete series, then the Euler
characteristic of its (g, k)-cohomology vanishes [1]. Similarly, in situations where the symmetric
space is a real manifold of odd dimension n, Poincare duality leads to cancellations in the natural
(−1)k dim(Hk). One is thus forced to find different techniques. The proof of Theorem 1.2
takes as input the inequality (1.2) of [3] and [10] and a spectral sequence from [4], proceeding via
a bootstrapping argument relying on non-commutative Iwasawa theory.
Acknowledgments. The second author would like to thank Peter Sarnak for a very stimulating
conversation on the subject of this note.
2 Iwasawa Theory
Let G ⊆ GLN (Zp) be an analytic pro-p group. Let Gk = G ∩ (1 + p
kMN (Zp)) ⊆ GLN (Zp). The
subgroups Gk form a fundamental set of open neighbourhoods of the identity in G, and moreover,
for large k, there exists a constant c such that [G : Gk] = c · p
dk, where d = dim(G).
Fix a finite extension E of Qp with ring of integers OE . Write Λ = OE [[G]] and ΛE = E⊗OE Λ.
The module theory of Λ falls under the rubric of Iwasawa theory. A fundamental result of Lazard [7]
states that Λ is Noetherian; the same is thus true of the ring ΛE . The rings Λ and ΛE are non-
commutative domains admitting a common field of fractions which we will denote by L . Thus,
L is a division ring which contains Λ and ΛE and is flat over each of them (on both sides). If
M is a finitely generated left Λ-module (resp. ΛE-module), then L ⊗Λ M (resp. L ⊗ΛE M) is
a finite-dimensional left L -vector space; we define the rank of M to be the L -dimension of this
vector space. Note that rank is additive in short exact sequences of finitely generated Λ-modules
(resp. ΛE-modules), by virtue of the flatness of L over Λ and ΛE .
Recall that a continuous representation of G on an E-Banach space V is called admissible if its
topological E-dual V ′ (which is naturally a ΛE-module) is finitely generated over ΛE . (See [11]; a
key point is that since ΛE is Noetherian, the category of admissible continuous G-representations
is abelian. Indeed, passing to topological duals yields an anti-equivalence with the abelian category
of finitely generated ΛE-modules.) We define the corank of an admissible G-representation to be
the rank of the finitely generated ΛE-module V
A coadmissible G-representation V is not determined by the collection of subspaces of invariants
V Gk (r ≥ 1). However, the following result (Theorem 1.10 of Harris [6]) shows that its corank is
so determined.
Theorem 2.1 (Harris). Let V be an E-Banach space equipped with an admissible continuous G-
representation and let d = dim(G). Then as k → ∞,
dimE V
Gk = r · [G : Gk] +O(p
(d−1)k) = r · c · pdk +O(p(d−1)k),
where r is the corank of V and c depends only on G.
Using this result, we may obtain bounds on the dimensions of the continuous cohomology groups
H i(Gk, V ) in terms of k for admissible continuous G-representations V . (Let us remark that the
continuous Gk-cohomology on the category of admissible continuous G-representations may also be
computed as the right derived functors of the functor of Gk-invariants; see Prop. 1.1.3 of [4].)
Lemma 2.2. Let V be an admissible continuous G-representation. For each i ≥ 1,
dimE H
i(Gk, V ) ≪ p
(d−1)k,
as k → ∞.
Proof. Let C := C(G,E) denote the Banach space of continuous E-valued functions on G, equipped
with the right regular G-action. The module C has corank one (indeed, it is cofree – its topological
dual is free of rank one over ΛE). Moreover, C is injective in the abelian category of admissible
G-representations and is therefore acyclic. If V is an admissible continuous G-representation, then
there exists an exact sequence
0 −→ V −→ Cn −→ W −→ 0
of admissible continuous G-representations for some integer n ≥ 0. Since C is acyclic, from the long
exact sequence of cohomology we obtain the following:
0 −→ V Gk −→ (CGk)n −→ WGk −→ H1(Gk, V ) −→ 0, (2.1)
H i(Gk, V ) ≃ H
i−1(Gk,W ), i ≥ 2. (2.2)
The lemma for i = 1 follows from a consideration of (2.1), taking into account Theorem 2.1 and the
fact that corank of W is equal to n minus the corank of V (since corank is additive in short exact
sequences). We now proceed by induction on i. Assume the result for i ≤ m and all admissible
continuous representations, in particular for W . The result for i = m+1 then follows directly from
the isomorphism (2.2). This completes the proof.
3 Cohomology of Arithmetic Quotients of Symmetric Spaces
We now return to the situation considered in the introduction and use the notation introduced there.
In particular, we fix a connected semisimple linear group G over F , an embedding G →֒ GLN over
F , an arithmetic lattice Γ of the associated real group G∞, and a prime p of F .
If we write G := lim
Γ/Γ(pk), then G is a compact open subgroup of the p-adic Lie group G(Fp)
(where Fp denotes the completion of F at p); alternatively, we may define G to be the closure of Γ in
G(Fp). If we replace Γ by Γ(p
k) for some sufficiently large value of k (i.e. discarding finitely many
initial terms in the descending sequence of lattices Γ(pk)), then G will be pro-p and hence, will be
an analytic pro-p-group. Note that Γ is a dense subgroup of G. Let e and f denote respectively
the ramification and inertial indices of p in F (so that [Fp : Qp] = ef). For each k ≥ 0, write Gk to
denote the closure of Γ(pek) in G. Alternatively, if we consider the embedding
G(Fp) →֒ GLN (Fp) →֒ GLefN (Qp),
then Gk = G ∩ (1 + p
kMefN (Zp)); thus, our notation is compatible with that of the preceding
section. We let d denote the dimension of G; note that d =
[F :Q]
· dim(G∞).
For each k ≥ 0, we write
Yk := Γ(p
ek)\G∞/K∞.
There is a natural action of G on Yk through its quotient G/Gk (∼= Γ/Γ(p
ek)), which is compatible
with the projections Yk → Yk′ for 0 ≤ k
′ ≤ k.
Fix a finite-dimensional representation W of G over E, and let W0 denote a G-invariant OE-
lattice in W . Let Vk denote the local system of free finite rank OE-modules on Yk associated to
W0, and denote by Vk the pull-back of V0 to Yk for any k ≥ 0. If 0 ≤ k
′ ≤ k, then the sheaf Vk on
Yk is naturally isomorphic to the pull-back of the sheaf Vk′ on Yk′ under the projection Yk → Yk′ .
In particular the sheaf Vk is G/Gk-equivariant.
Recall the following definitions from from [4], p. 21:
H̃n(V) := lim
Hj(Yk,Vk/p
s), H̃n(V)E := E ⊗OE H̃
n(V).
Each H̃n(V) is a p-adically completeOE-module, equipped with a leftG-action in a natural way, and
hence, each H̃n(V)E has a natural structure of E-Banach space and is equipped with a continuous
left G-action. In fact, they are admissible continuous representations of G ([4], Thm. 2.1.5 (i)), and
in particular, Theorem 2.1 and Lemma 2.2 apply to them. (Note that the results of [4] are stated
in the adèlic language. We leave it to the reader to make the easy translation to the more classical
language we are using in this paper.)
The following result, which is Theorem 2.1.5 (ii) of [4], p. 22, is a “control theorem” relating
Gk invariants in H̃
E to the classical cohomology classes H
j(Yk,Vk)⊗ E.
Theorem 3.1. Fix an integer k. There is a spectral sequence
2 (Yk) = H
i(Gk, H̃
j(V)E) =⇒ H
i+j(Yk,Vk)E .
One should view this spectral sequence as a version of the Hochschild-Serre spectral sequence
“compatible in the G-tower.”
Theorem 3.2. For any n ≥ 0, if rn denotes the corank of H̃
n(V), then
dimE H
n(Yk,Vk)E = rn · c · p
dk +O(p(d−1)k)
as k → ∞. (Here c denotes the constant appearing in the statement of Theorem 2.1; it depends
only on G.)
Proof. For each i, j ≥ 0 and l ≥ 2, let E
l (Yk) denote the terms in the spectral sequence of
Theorem 3.1. Since H̃j(V) is admissible, Lemma 2.2 implies that dimE H
i(Gk, H̃
j(V)E) ≪ p
(d−1)k,
and thus, dimE E
l (Yk) ≪ p
(d−1)k, as k → ∞ (since E
l (Yk) is a subquotient of E
2 (Yk) :=
H i(Gk, H̃
j(V)E) for l ≥ 2). Theorem 2.1 shows that
dimE E
2 (Yk) = dimE H̃
E = rn · c · p
dk +O(p(d−1)k).
On the other hand, since the spectral sequence of Theorem 3.1 is an upper right quadrant exact
sequence, E
∞ is obtained by taking finitely many successive kernels of differentials dl to E
i+l,j−l+1
which all have order ≪ p(d−1)k by the first part of our argument. Thus,
dimE E
= rn · c · p
dk +O(p(d−1)k).
Since Hn(Yk,Vk)E admits a finite length filtration whose associated graded pieces are isomorphic
∞ for i+ j = n, we conclude that dimE H
n(Yk,Vk)E = rn · c · p
dk +O(p(d−1)k), as claimed.
The following lemma quantifies the precise relationship between multiplicities and the dimen-
sions of cohomology groups that we will require to deduce Theorem 1.1 from Theorem 1.2.
Lemma 3.3. Fix a cohomological degree n, and let S denote the set of isomorphism classes [π] of
π ∈ Ĝ∞ that contribute to cohomology with coefficients in V in degree n. Then
[π]∈S
m(π,Gk) ≍ dimE H
cusp(Yk,Vk)E .
Proof. Since the set S is finite, there is an integer d ≥ 1 so that
1 ≤ dimHn(g, k;π ⊗W ) ≤ d
for each isomorphism class [π] ∈ S. This implies that
[π]∈S
m(π,Gk) ≤ dimE H
cusp(Yk,Vk)E ≤ d
[π]∈S
m(π,Gk)
for each k ≥ 0.
We can now prove our main result.
Theorem 3.4. Let n ≥ 0, and suppose that either G∞ does not admit discrete series or else that
dim(G∞/K∞). Then
dimE H
n(Yk,Vk)E ≪ p
(d−1)k
as k → ∞, for all n ≥ 0.
Proof. In the case when G∞ admits discrete series, recall that these contribute to cohomology
only in the dimension
dim(G∞/K∞) ([1], Thm. 5.1, p. 101). Thus, under the assumptions
of the theorem, there is no contribution from the discrete series to Hncusp(Yk,Vk). The inequal-
ity (1.2) of [3] and [10], together with Lemma 3.3 and the main result of [8] (which states that
Hn(Yk,Vk)/H
cusp(Yk,Vk)
= o(pdk)), thus shows that dimE H
n(Yk,Vk)E = o(p
dk) as k → ∞,
for all n ≥ 0. From Theorem 3.2, we then infer that each H̃n(V) has corank 0. Another application
of the same theorem now gives our result.
Note that V (pek) ∼ [G : Gk] ∼ c · p
dk; thus Theorem 3.4 implies Theorem 1.2 since
dim(G) ≤ dim(G∞). (3.1)
Theorem 1.2 and Lemma 3.3 together imply Theorem 1.1.
Remark 3.5. We have equality in (3.1) precisely when p is the unique prime lying over p in OF . If
there is more than one prime lying over p, then dim(G) is strictly less than dim(G∞), and we obtain
a corresponding improvement in the bounds of Theorems 1.1 and 1.2, namely (in the notation of
their statements), that
m(π,Γ(pk)) and dimE H
i(Y (pk),VW,k)E ≪ V (p
n)1−1/ dim(G)
(where, as we noted above, dim(G) =
[F :Q]
· dim(G∞), with e and f being the ramification and
inertial index of p respectively).
Example/Question 3.6. Let F/Q be an imaginary quadratic field, and let G = SL2/F . The
corresponding symmetric space G∞/K∞ = SL2(C)/SU(2) is a real hyperbolic three space H,
and the quotients Y are commensurable with the Bianchi manifolds H/PGL2(OK). Choose a
local system V0 associated to some finite-dimensional representation W of G∞ = GL2(C) and a
congruence subgroup Γ. Assume that p = pp splits in OF , and apply Theorem 3.4 to the p-power
tower. We obtain the inequality
H1cusp(Yk,Vk) ≪ p
as k → ∞. It is natural to ask how tight this inequality is.
The main result of Calegari–Dunfield [2] shows that there exists at least one (F,Γ, p) for which
H1cusp(Yk,C) = 0
for all k. On the other hand, if there exists at least one newform on Γ(pk) for some k, then a
consideration of the associated oldforms shows that
H1cusp(Yk,Vk) ≫ p
as k → ∞. Are there situations in which this lower bound gives the true rate of growth?
Remark 3.7. Our results are most interesting in the case when G∞ does not admit any discrete
series, since, as we noted in the introduction, in this case (and only in this case), G∞ admits
(non-discrete series) tempered representations of cohomological type.
On the other hand, Theorem 3.2 does have a consequence in the case when G∞ admits discrete
series which may be of some interest. Recall the following result from [10] (established in [3] in the
cocompact case): if π ∈ Ĝ∞ lies in the discrete series, then
m(Γ(πk), π) = d(π)V (pk) + o(V (pk)) (3.2)
as k → ∞. Fix a finite-dimensional representation W of G∞, and let Ĝ∞(W )d denote the subset
of Ĝ∞ consisting of discrete series representations that contribute to cohomology with coefficients
in W . Summing over all π ∈ Ĝ∞(W )d, we obtain the formula
|Ĝ∞(W )d|
π∈ bG∞(W )d
m(Γ(πk), π) = d(π)V (pk) + o(V (pk)). (3.3)
(A result first proved in [8].) The following result provides an improvement in the error term
of (3.3).
Theorem 3.8. There exists µ > 0 such that
|Ĝ∞(W )d|
π∈ bG∞(W )d
m(Γ(πk), π) = d(π)V (pk) +O(V (pk)1−µ).
Proof. Let n =
dim(G∞/K∞). As already noted, it follows from [1], (Thm. 5.1, p. 101), that
all non-discrete series contributions to Hncusp(Yk,Vk) are non-tempered. The same result shows
that each discrete series has one-dimensional (g, k)-cohomology in dimension n. As we recalled in
the introduction, the multiplicity of any non-tempered representations is bounded by V (pk)1−µ for
some µ > 0 [9], and thus, Theorem 3.2 and (the proof of) Lemma 3.3 show that
|Ĝ∞(W )d|
π∈ bG∞(W )d
m(Γ(πk), π) = C · V (pk) +O(V (pk)1−µ).
Comparing this formula with (3.3) yields the theorem.
Question 3.9. Does the result of Theorem 3.8 hold term-by-term? That is, does (3.2) admit an
improvement of the form
m(Γ(πk), π)
= d(π)V (pk) +O(V (pk)1−µ)
for some µ > 0?
References
[1] Borel, A.; Wallach, N. Continuous cohomology, discrete subgroups, and representations of reductive
groups. Annals of Mathematics Studies, 94. Princeton University Press, Princeton, N.J.; University of
Tokyo Press, Tokyo, 1980. xvii+388 pp.
[2] Calegari, F.; Dunfield, N. Automorphic forms and rational homology 3-spheres. Geom. Topol. 10 (2006),
295–329.
[3] DeGeorge, D.; Wallach, N. Limit formulas for multiplicities in L2(Γ\G). Ann. Math. (2) 107 (1978),
no. 1, 133–150.
[4] Emerton, M. On the interpolation of systems of eigenvalues attached to automorphic Hecke eigenforms.
Invent. Math. 164 (2006), no. 1, 1–84.
[5] Franke, J. Harmonic Analysis in Weighted L2-spaces. Ann. Sci. cole Norm. Sup. (4) 31 (1998), no. 2,
181–279.
[6] Harris, M. Correction to: “p-adic representations arising from descent on abelian varieties” [Compositio
Math. 39 (1979), no. 2, 177–245]. Compositio Math. 121 (2000), no. 1, 105–108.
[7] Lazard, M. Groupes analytiques p-adiques, Publ. Math. IHES 26 (1965).
[8] Rohlfs, J.; Speh, B. On limit multiplicities of representations with cohomology in the cuspidal spectrum.
Duke Math. J. 55 (1987), no. 1, 199–211.
[9] Sarnak, P.; Xue, X. Bounds for multiplicities of automorphic representations. Duke Math. J. 64 (1991),
no. 1, 207–227.
[10] Savin, G. Limit multiplicities of cusp forms. Invent. Math. 95 (1989), no. 1, 149–159.
[11] Schneider, P.; Teitelbaum, J. Banach space representations and Iwasawa theory, Israel. J. Math. 127
(2002), 359–380.
	Introduction
	Iwasawa Theory
	Cohomology of Arithmetic Quotients of Symmetric Spaces
ABSTRACT
  Let $\Goo$ be a semisimple real Lie group with unitary dual $\Ghat$. The goal
of this note is to produce new upper bounds for the multiplicities with which
representations $\pi \in \Ghat$ of cohomological type appear in certain spaces
of cusp forms on $\Goo$.

<|endoftext|><|startoftext|>
Decoherence of Quantum-Enhanced Timing Accuracy
Mankei Tsang
Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125
(Dated: August 6, 2021)
Quantum enhancement of optical pulse timing accuracy is investigated in the Heisenberg picture. Effects of
optical loss, group-velocity dispersion, and Kerr nonlinearity on the position and momentum of an optical pulse
are studied via Heisenberg equations of motion. Using the developed formalism, the impact of decoherence by
optical loss on the use of adiabatic soliton control for beating the timing standard quantum limit [Tsang, Phys.
Rev. Lett. 97, 023902 (2006)] is analyzed theoretically and numerically. The analysis shows that an appreciable
enhancement can be achieved using current technology, despite an increase in timing jitter mainly due to the
Gordon-Haus effect. The decoherence effect of optical loss on the transmission of quantum-enhanced timing
information is also studied, in order to identify situations in which the enhancement is able to survive.
PACS numbers: 42.50.Dv, 42.65.Tg, 42.81.Dp
I. INTRODUCTION
It has been suggested that the use of correlated photons is
able to enhance the position accuracy of an optical pulse be-
yond the standard quantum limit, and the enhancement can be
useful for positioning and clock synchronization applications
[1]. Generation of two photons with the requisite correlation
has been demonstrated experimentally by Kuzucu et al. [2],
but in practice it is more desirable to produce as many corre-
lated photons as possible in order to obtain a higher accuracy.
To achieve quantum enhancement for a large number of pho-
tons, a scheme of adiabatically manipulating optical fiber soli-
tons has recently been proposed [3], opening up a viable route
of applying quantum enhancement to practical situations. The
analysis in Ref. [3] assumes that the optical fibers are loss-
less, so the Heisenberg limit [4] can be reached in principle.
In reality, however, the quantum noise associated with optical
loss increases the soliton timing jitter and limits the achievable
enhancement. Compared with the use of solitons for quadra-
ture squeezing [5], the adiabatic soliton control scheme poten-
tially suffers more from decoherence, because the soliton must
propagate for a longer distance to satisfy the adiabatic approx-
imation. The effect of loss on a similar scheme of soliton mo-
mentum squeezing has been studied by Fini and Hagelstein
[6], although they did not study the timing jitter evolution rel-
evant to the scheme in Ref. [3], and did not take into account
possible departure from the adiabatic approximation.
In this paper, the decoherence effect of optical loss on the
timing accuracy enhancement scheme proposed in Ref. [3] is
investigated in depth, in order to evaluate the performance of
the scheme in practice. Instead of approaching the problem in
the Schrödinger picture like prior work [1, 3, 6, 7], this paper
primarily utilizes Heisenberg equations of motion, since they
are able to account for dissipation and fluctuation in a more el-
egant way. For simplicity, scalar solitons, as opposed to vector
solitons studied in Ref. [3], are considered here. The theoret-
ical and numerical analyses show that, despite an increase in
timing jitter due to quantum noise and deviation from the adi-
abatic approximation, an appreciable enhancement can still be
achieved using a realistic setup.
The developed formalism is also used to study the propa-
gation of an optical pulse with quantum-enhanced timing ac-
curacy in a lossy, dispersive, and nonlinear medium, such as
an optical fiber, in order to identify situations in which the
enhancement can still survive. The effect of loss on many
correlated photons sent in as many channels has been inves-
tigated by Giovannetti et al. [1], but their analysis focuses on
a relatively small number of correlated photons and does not
include the effects of dispersion and nonlinearity.
This paper is organized as follows: Section II defines the
general theoretical framework, and derives the standard quan-
tum limits and Heisenberg limits on the variances of the pulse
position and momentum operators. Section III studies the evo-
lution of such operators in the presence of loss, group-velocity
dispersion, and Kerr nonlinearity, and determines the effect of
dissipation and fluctuation on the position and momentum un-
certainties. Section IV investigates theoretically and numeri-
cally the impact of optical loss on the adiabatic soliton con-
trol scheme using realistic parameters, while Sec. V studies
the decoherence effect on the transmission of the quantum-
enhanced timing information in various linear and nonlinear
systems.
II. THEORETICAL FRAMEWORK
A. Definition of pulse position and momentum operators
The positive-frequency electric field of a waveguide mode
at a certain longitudinal position can be defined as [8]
Ê(+)(t) = i
4πε0cn2S
ĉ(ω)e−iωt , (1)
where n is the refractive index, η is the real part of n, S is the
transverse area of the waveguide mode, and ĉ(ω) is the photon
annihilation operator. The annihilation operator is related to
the corresponding creation operator via the commutator [8],
[ĉ(ω), ĉ†(ω ′)] = δ (ω −ω ′). (2)
For a pulse with a slowly-varying envelope compared with the
optical frequency, the coefficient in front of the annihilation
operator can be assumed to be independent of frequency and
http://arxiv.org/abs/0704.0663v1
can be evaluated at the carrier frequency ω0, so that the elec-
tric field is proportional to the temporal envelope annihilation
operator Â(t),
Ê(+)(t) ∝ Â(t)e−iω0t , (3)
Â(t)≡ 1√
dω â(ω)e−iωt , (4)
â(ω)≡ ĉ(ω +ω0). (5)
The temporal envelope operator Â(t) and the spectral operator
â(ω) evidently also satisfy the following commutation rela-
tions with their corresponding creation operators,
[Â(t), Â†(t ′)] = δ (t − t ′), (6)
[â(ω), â†(ω ′)] = δ (ω −ω ′). (7)
The total photon number operator can be defined as
dt Â†(t)Â(t), (8)
and the pulse center position operator as [9]
T̂ ≡ 1
dt tÂ†(t)Â(t), (9)
where
is the average photon number. This definition uses 1/N as
the normalization coefficient, instead of the inverse photon
number operator N̂−1 used by Lai and Haus [10], in order
to express the position operator in terms of normally ordered
optical field operators that are easier to handle, as well as to
avoid the potential problem of applying N̂−1 on the vacuum
state. As long as the photon-number fluctuation is small, the
position operator naturally corresponds to the measurement of
the center position of the pulse intensity profile. An average
longitudinal momentum operator can be similarly defined,
Ω̂ ≡ 1
dω ω â†(ω)â(ω)
dt Â†(t)
Â(t). (11)
If the quantum state is close to a large-photon-number coher-
ent state, Â can be approximated as
+δ Â, with O(δ Â)≪
O(Â). Equations (9) and (11) then become the approximate
position and momentum operators defined by Haus and Lai
for solitons in a linearized approach [11]. The linearized ex-
pressions also describe how they can be accurately measured
in practice using balanced homodyne detection [11].
For simplicity, we shall hereafter assume that
= 0 and
= 0 [9]. In the systems considered in this paper, these
two quantities remain constant throughout propagation, if t
is regarded as the retarded time in the moving frame of the
optical pulse.
The commutator between the position and momentum op-
erators is
[T̂ ,Ω̂] =
. (12)
By the Heisenberg uncertainty principle,
[T̂ ,Ω̂]
. (13)
B. Derivation of standard quantum limits
The standard quantum limits and Heisenberg limits on
should be expressed in terms of the pulse
width ∆t, defined as
dt t2Â†(t)Â(t)
, (14)
and the bandwidth ∆ω ,
dω ω2â†(ω)â(ω)
dt Â†(t)
Â(t)
. (15)
To calculate the standard quantum limit on the position uncer-
tainty, consider the expansion
dω ω â†â
dω ′ ω ′â′†â′
, (16)
where we have written â= â(ω) and â′ = â(ω ′) as shorthands.
Rearranging the operators,
dω ω2â†â
dω ′ ωω ′â†â′†ââ′
. (17)
The first term on the right-hand side of Eq. (17) is proportional
to ∆ω2, while the second term contains a normally ordered
cross-spectral density. To derive the standard quantum limit,
we shall assume that the cross-spectral density satisfies the
factorization condition:
â†â′†ââ′
â†â
â′†â′
. (18)
This condition is always satisfied by any pure or mixed state
with only one excited optical mode, such as a coherent state
[12, 13]. The second term on the right-hand side of Eq. (17)
becomes
dω ′ ωω ′
â†â′†ââ′
, (19)
which is assumed to be zero, as per the convention of this
paper. Thus, the variance of Ω̂ is
, (20)
where the subscript “coh” denotes statistics of coherent fields
[12, 13] given by Eq. (18). By virtue of the Heisenberg uncer-
tainty principle given by Eq. (13), the standard quantum limit
on the position variance is hence
4N∆ω2
. (21)
This limit is applicable to any pure or mixed state, and is con-
sistent with the one suggested by Giovannetti et al. for Fock
states [1]. A very similar derivation of the limit for Fock states
and coherent states is also performed by Vaughan et al. [9].
Owing to Fourier duality of position and momentum in the
slowly-varying envelope regime, the standard quantum limit
on the momentum can be derived in the same way. The vari-
ance of T̂ , assuming coherent-field statistics, is
, (22)
and the standard quantum limit on the momentum variance is
4N∆t2
. (23)
C. Derivation of Heisenberg limits
To derive the Heisenberg limit on the position uncertainty,
one needs an absolute upper bound on the momentum uncer-
tainty
. Consider the following non-negative quantity
proportional to the coherence bandwidth squared,
dω ′ (ω −ω ′)2
â†â′†ââ′
≥ 0. (24)
This quantity is non-negative because (ω − ω ′)2 is non-
negative and
â†â′†ââ′
is also non-negative [13]. It can be
rewritten as
dω ′ (ω −ω ′)2
â†â′†ââ′
dω ′ (ω −ω ′)2
â†ââ′†â′
, (25)
and expanded as
dω ′ (ω2 +ω ′2 − 2ωω ′)â†ââ′†â′
dω ω2â†â
− 2N2
≥ 0. (26)
Here we shall approximate N̂ with N, and neglect any photon-
number fluctuation. This approximation is exact for Fock
states, and acceptable for any quantum state with a small
photon-number fluctuation, such as a large-photon-number
coherent state. We then obtain the following approximate in-
equality,
≤ ∆ω2. (27)
With the Heisenberg uncertainty principle given by Eq. (13)
and the upper bound on
given by Eq. (27), one can then
obtain the Heisenberg limit on the uncertainty of T̂ :
4N2∆ω2
. (28)
Equation (28) is again consistent with the Heisenberg limit
suggested by Giovannetti et al. [1], although the derivation
here shows that it is not only valid for Fock states but also
correct to the first order for any quantum state with a small
photon-number fluctuation.
The Heisenberg limit on
is similar,
4N2∆t2
. (29)
A more exact derivation of the Heisenberg limits is given in
Appendix A, where the inverse photon-number operator N̂−1
is used instead of 1/N in the definitions of T̂ , Ω̂, ∆ω , and
∆t. The difference between the approximate Heisenberg limits
derived here and the exact Heisenberg limits in Appendix A is
negligible for small photon-number fluctuations.
III. OPTICAL PULSE PROPAGATION IN THE
HEISENBERG PICTURE
The classical nonlinear Schrödinger equation that describes
the propagation of pulses in a lossy, dispersive, and nonlinear
medium, such as an optical fiber, is given by [14]
−κ |A|2A−
A, (30)
where t is the retarded time coordinate in the frame of the
moving pulse, β is the group-velocity dispersion coefficient,
κ is the normalized Kerr coefficient, and α is the loss coeffi-
cient, all of which may depend on z. The phenomenological
quantized version that preserves the commutator between Â
and Â† is [5]
∂ 2Â
−κÂ†ÂÂ−
Â+ iŝ. (31)
Â ≡ Â(z, t) is the pulse envelope annihilation operator in the
Heisenberg picture, and ŝ is the Langevin noise operator, sat-
isfying the commutation relation
[ŝ(z, t), ŝ†(z′, t ′)] = αδ (z− z′)δ (t − t ′). (32)
Rewriting the position and momentum operators in Eqs. (9)
and (11) in the Heisenberg picture as T̂ (z) and Ω̂(z) in terms of
Â(z, t), differenting them with respect to z, and using Eq. (31),
their equations of motion can be derived,
dT̂ (z)
= β (z)Ω̂(z)+ ŜT (z), (33)
dΩ̂(z)
= ŜΩ(z), (34)
where ŜT and ŜΩ are position and momentum noise operators
defined as
ŜT (z)≡
dt tŝ†(z, t)Â(z, t)+H. c., (35)
ŜΩ(z)≡
dt ŝ†(z, t)
Â(z, t)+H. c., (36)
and H. c. denotes Hermitian conjugate. If the noise reservoir
is assumed to be in the vacuum state, the noise operators have
the following statistical properties, as shown in Appendix B,
ŜT (z)
ŜΩ(z)
= 0, (37)
ŜT (z)ŜT (z
α(z)∆t2(z)
δ (z− z′), (38)
ŜΩ(z)ŜΩ(z
α(z)∆ω2(z)
δ (z− z′), (39)
ŜT (z)ŜΩ(z
′)+ ŜΩ(z)ŜT (z
α(z)C(z)
δ (z− z′), (40)
where C(z) is the pulse chirp factor, defined as
C(z)≡
dt Â†(z, t)
Â(z, t)
The average position 〈T̂ (z)〉 and average momentum 〈Ω̂(z)〉
are constant and assumed to be zero throughout propagation.
The variance of Ω̂ is then
Ω̂2(z)
Ω̂2(0)
α(z′)∆ω2(z′)
N(z′)
, (42)
while the variance of T̂ is more complicated due to the pres-
ence of dispersion,
T̂ 2(z)
T̂ 2(0)
T̂ (0)Ω̂(0)+ Ω̂(0)T̂ (0)
dz′β (z′)
Ω̂2(0)
dz′β (z′)
α(z′)∆t2(z′)
N(z′)
dz′β (z′)
α(z′′)C(z′′)
N(z′′)
dz′β (z′)
dz′′β (z′′)
∫ z′′
dz′′′
α(z′′′)∆ω2(z′′′)
N(z′′′)
. (43)
Equation (43) is the central result of this paper. It is similar
to that derived by Haus for optical solitons using a linearized
approach [15], but Eq. (43) is valid for arbitrary loss, arbi-
trary dispersion profile β (z), and arbitrary evolution of pulse
width ∆t(z), chirp C(z), and bandwidth ∆ω(z), so that it is
able to describe the effect of loss on the quantum enhance-
ment scheme proposed in Ref. [3]. The first term on the right-
hand side of Eq. (43) is the initial quantum fluctuation, while
the second and third term on the right-hand side describe the
quantum dispersion effect [16]. In an ideal scenario described
in Ref. [3],
T̂ 2(z)
remains constant if the net dispersion
′β (z′) is zero and quantum dispersion is compensated.
With loss, however, noise introduces a diffusive jitter given
by the fourth term on the right-hand side of Eq. (43),
T̂ 2(z)
α(z′)∆t2(z′)
N(z′)
, (44)
a less well-known chirp-induced jitter given by the fifth term,
T̂ 2(z)
dz′β (z′)
α(z′′)C(z′′)
N(z′′)
, (45)
and also the Gordon-Haus timing jitter [17] given by the sixth
term,
T̂ 2(z)
dz′β (z′)
dz′′β (z′′)
∫ z′′
dz′′′
α(z′′′)∆ω2(z′′′)
N(z′′′)
. (46)
In most cases considered here, N ≫ 1,
≪ ∆t2,
≪ ∆ω2, so one can use the classical nonlinear
Schrödinger equation, Eq. (30), to predict the evolution of
∆t(z), C(z), and ∆ω(z) accurately. The evolution of
T̂ 2(z)
can subsequently be calculated analytically or numerically us-
ing Eq. (43) and the classical evolution of ∆t(z), C(z), and
∆ω(z), analogous to the linearized approach [11, 15].
It is worth noting that the chirp-induced jitter, Eq. (45), de-
pends on the cross-correlation between the position and mo-
mentum noise in Eq. (40), so it can be positive as well as neg-
ative, but the sum of the three sources of jitter must obviously
remain positive.
IV. EFFECT OF LOSS ON ADIABATIC SOLITON
CONTROL
A. Review of the ideal case
Consider the scheme proposed in Ref. [3] and depicted in
Fig. 1. Assume that the dispersion coefficient of the first fiber
β (z) is negative and its magnitude increases along the fiber
slowly compared with the soliton period. The classical soliton
solution of Eq. (30), assuming adiabatic change in parameters
FIG. 1: (Color online) Schematic (not-to-scale) of the adiabatic soli-
ton control scheme. An optical pulse is coupled into a dispersion-
increasing fiber of length L with a negative dispersion coefficient β ,
followed by a much shorter dispersion-compensating fiber of length
L′ with a positive dispersion coefficient β ′.
β (z) and N(z), is [18]
A(z, t) = A0(z)sech
dz′|A0(z′)|2
, (47)
A0(z) =
2τ(z)
, τ(z) =
2|β (z)|
κN(z)
. (48)
The adiabatic approximation is satisfied when
β (z)
dβ (z)/dz
dN(z)/dz
≪ Λ, (49)
where Λ is the soliton period,
Λ(z)≡
τ2(z)
|β (z)|
. (50)
The root-mean-square pulse width ∆t(z) and bandwidth
∆ω(z) then become
∆t(z) =
τ(z) =
|β (z)|
κN(z)
, (51)
∆ω(z) =
3τ(z)
κN(z)
|β (z)| . (52)
The bandwidth ∆ω(z) is thus reduced in the first fiber. If the
second fiber has a positive dispersion coefficient β ′ so that
the net dispersion is zero (
0 dzβ (z)+β
′L′ = 0), the quantum
dispersion effect given by the second and third term on the
right-hand side of Eq. (43) can be eliminated. Furthermore, if
β ′ has a very large magnitude compared with β (z) so that the
second fiber can be very short compared with the first fiber, the
effective nonlinearity experienced by the pulse in the second
fiber can be neglected, and ∆ω(z) remains essentially constant
in the second fiber. In the lossless case, the final timing jitter
T̂ 2(L+L′)
is therefore the same as the input
T̂ 2(0)
, but
∆ω(L+L′) has been reduced and the standard quantum limit
T̂ 2(L+L′)
, Eq. (21), is raised. Provided that the initial
timing jitter of a laser pulse obeys the coherent-field statistics
given by Eq. (22), the final timing jitter is
T̂ 2(L+L′)
T̂ 2(0)
∆t2(0)
β 2(0)
, (53)
while the final standard quantum limit is
T̂ 2(L+L′)
4N∆ω2(L+L′)
3β 2(L)
. (54)
A timing jitter squeezing ratio, analagous to the squeezing ra-
tio defined by Haus and Lai [11], can be defined as
T̂ 2(L+L′)
T̂ 2(L+L′)
β 2(0)
β 2(L)
. (55)
The factor of π2/9 arises because the initial jitter for a sech
pulse shape is slightly higher than the standard quantum limit
given by Eq. (21) in terms of the bandwidth. As long as β (L)
at the end of the first fiber is significantly larger than the initial
value, the timing jitter becomes lower than the raised standard
quantum limit, R becomes smaller than 1, and quantum en-
hancement of position accuracy is accomplished. This semi-
classical analysis is valid in all practical cases, where N ≫ 1,
R ≫ 1/N,
≪ ∆t2,
≪ ∆ω2, and is consistent with
the analysis of exact quantum soliton theory in Ref. [3]. R
is related to the quantum enhancement factor γ defined in
Ref. [3] by R = 1/γ2. The semiclassical analysis is no longer
valid when R is close to the Heisenberg limit 1/N, but as the
next sections will show, owing to decoherence effects, it is ex-
tremely difficult for the enhancement to get anywhere close to
the Heisenberg limit.
B. Numerical analysis of a realistic case
To investigate the impact of noise and the validity of the
adiabatic approximation in practice, a numerical evaluation of
∆t(z), C(z), ∆ω(z), and
T̂ 2(z)
, using Eqs. (30) and (43) and
realistic parameters, is necessary. β (z) is assumed to have the
following profile used in Ref. [19],
β (z) =
−12.75 ps2/km
1+(L− z)/Lβ
. (56)
Lβ = 1 km is used here instead of the Lβ = 1/12 km used
in Ref. [19], in order to satisfy the adiabatic approximation
for a longer pulse in this example. Other fiber parameters are
α = 0.4 dB/km, n2 = 2.6×10−16 cm2/W, Aeff = 30 µm2 [19],
λ0 = 1550 nm, ω0 = 2πc/λ0, so that κ = h̄ω0(ω0n2/cAeff).
L is assumed to be 2 km. A dispersion-compensating fiber
with β ′ = 127.5 ps2/km, α = 0.4 dB/km, n2 = 2.7× 10−16
cm2/W, Aeff = 15 µm2 [20], and L′ = 110 m is used in the
numerical analysis as the second fiber. The classical nonlin-
ear Schrödinger equation, Eq. (30), is numerically solved us-
ing the Fourier split-step method [14]. An initial sech soliton
pulse with τ(0) = 1 ps, N(0) = 1.9×107, and an initial energy
of 2.4 pJ is assumed.
Figure 2 plots the numerical evolution of pulse intensity and
spectrum in the two fibers. As expected, the bandwidth is nar-
rowed in the first fiber and remains approximately constant
in the second (z > 2000 m), owing to the latter’s relative short
length. Figure 3 plots the evolution of pulse width ∆t(z), chirp
−8 −6
−4 −2 0 2 4 6 8
z (m)
|A(z, t)|2
t (ps)
1.5 0
z (m)
|a(z, ω)|2
ω (THz)
FIG. 2: (Color online) Numerical evolution of pulse intensity (top)
and spectrum (bottom). The denser plots for z > 2000 m indicate
pulse propagation in the second fiber. The color codes are in the
same arbitrary units as the heights of the plots.
C(z), and bandwidth ∆ω(z), compared with the adiabatic ap-
proximation, Eqs. (51) and (52). The adiabatic approximation
is evidently not exact, and the pulse acquires a chirp due to
excess dispersion in the first fiber, leading to slight refocusing
in the second fiber. The bandwidth is reduced by a factor of
2.2, as opposed to the ideal factor of 3.6.
Figure 4 plots the evolution of the diffusive jitter given by
Eq. (44), the chirp-induced jitter given by Eq. (45), and the
Gordon-Haus jitter given by Eq. (46). It can be seen that al-
though the Gordon-Haus jitter increases much more quickly
than the other jitter components in the first fiber, the for-
mer drops abruptly in the second fiber (z > 2000 m) due
to the opposite dispersion. This kind of Gordon-Haus jit-
ter reduction by dispersion management is well known [21].
The chirp-induced jitter component drops below zero in the
second fiber, but as noted before, the total noise jitter re-
mains positive. The final jitter values are numerically deter-
mined to be
T̂ 2(L+L′)
= 0.71
T̂ 2(0)
T̂ 2(L+L′)
−0.93
T̂ 2(0)
, and
T̂ 2(L+L′)
= 1.42
T̂ 2(0)
, result-
ing in a total jitter of
T̂ 2(L+L′)
T̂ 2(0)
T̂ 2(L+L′)
T̂ 2(L+L′)
T̂ 2(L+L′)
= 2.19
T̂ 2(0)
. (57)
The final squeezing ratio is hence
0 200 400 600 800 1000 1200 1400 1600 1800 2000
Pulse Width ∆t(z)
Numerical
Adiabatic Approximation
0 200 400 600 800 1000 1200 1400 1600 1800 2000
Chirp C(z)
0 200 400 600 800 1000 1200 1400 1600 1800 2000
z (m)
Bandwidth ∆ω(z)
FIG. 3: (Color online) Evolution of pulse width ∆t(z) (top), chirp
C(z) (center), and bandwidth ∆ω (bottom), compared with the adia-
batic approximation (dash). Plots of ∆t and ∆ω are normalized with
respect to their initial values, respectively.
T̂ 2(L+L′)
T̂ 2(L+L′)
T̂ 2(L+L′)
T̂ 2(0)
N(L+L′)
∆ω2(L+L′)
∆ω2(0)
= 0.42 =−3.8 dB. (58)
Despite taking into account the increased timing jitter and the
non-ideal bandwidth narrowing, a squeezing ratio of −3.8 dB
is predicted by the numerical analysis, suggesting that one
should be able to observe the quantum enhancement experi-
mentally using current technology.
C. Potential improvements
As shown in the previous section, the Gordon-Haus effect
contributes the largest amount of noise in the soliton control
scheme, despite its partial reduction by dispersion manage-
ment. Its magnitude at the end of the first fiber can be esti-
mated roughly as
T̂ 2(L)
T̂ 2(L)
(αL). (59)
As the length of the first fiber must be at least a few times
longer than the soliton period Λ for the adiabatic approxima-
0 200 400 600 800 1000 1200 1400 1600 1800 2000
Diffusive Jitter
0 200 400 600 800 1000 1200 1400 1600 1800 2000
Chirp-Induced Jitter
0 200 400 600 800 1000 1200 1400 1600 1800 2000
z (m)
Gordon-Haus Jitter
FIG. 4: (Color online) Evolution of diffusive jitter (top), chirp-
induced jitter (center), and Gordon-Haus jitter (bottom). All plots
are normalized with respect to the initial jitter
T̂ 2(0)
tion to hold and for the bandwidth to be significantly reduced,
L/Λ is approximately fixed, and the Gordon-Haus jitter can
be reduced only if a figure of merit,
FOM ≡ 1
, (60)
is enhanced. Since this is a rough order-of-magnitude esti-
mate, a representative value of Λ, say at z = L, can be used.
The figure of merit suggests that the performance of the soli-
ton control scheme can be improved by reducing the pulse
width, increasing the overall dispersion coefficient, or reduc-
ing the loss coefficient.
Reducing the pulse width is the most convenient way of
obtaining better enhancement, as the adiabatic bandwidth re-
duction can be achieved over a shorter distance with less loss
of photons. For example, using τ(0) = 500 fs, L = 1 km,
Lβ = 0.3 km, L
′ = 44 m, and otherwise the same parame-
ters as in Sec. IV B, the squeezing ratio becomes −6.0 dB,
while using τ(0) = 200 fs, L = 500 m, Lβ = 1/12 km, and
L′ = 16.2 m gives a squeezing ratio of −7.3 dB. The shorter
pulse width, however, significantly enhances higher-order dis-
persive and nonlinear effects. Raman scattering, in particular,
contributes additional quantum noise because of coupling to
optical phonons [22]. It is beyond the scope of this paper to
investigate these higher-order effects, so a more conservative
pulse width of 1 ps is used in the preceding section. A larger
overall dispersion coefficient, on the other hand, means that
more photons or a higher nonlinearity are required for a soli-
ton to form, so the Raman effect may also become more sig-
nificant with a larger dispersion coefficient. The Raman effect
can be reduced by cooling the fiber and reducing the number
of thermal phonons [22], if it becomes a significant problem.
Further advance in optical fiber technology should be able
to increase the figure of merit by reducing loss, since the spe-
cialty fibers assumed in Sec. IV B have a higher loss than usual
transmission fibers by a factor of two. Using α = 0.2 dB/km
instead of 0.4 dB/km in Sec. IV B, for instance, reduces the
squeezing ratio to −4.7 dB. Spectral filtering or frequency-
dependent gain [23] provides another way of controlling the
Gordon-Haus effect, although it adds another level of com-
plexity to the experimental setup, and it is beyond the scope
of this paper to investigate how the frequency-dependentdissi-
pation or amplification might help the quantum enhancement
scheme. Finally, the design of the setup assumed in Sec. IV B
is not fully optimized, and further optimization of parame-
ters, fiber dispersion profiles, and bandwidth narrowing strat-
egy should be able to improve the enhancement.
V. EFFECT OF LOSS ON THE TRANSMISSION OF
QUANTUM-ENHANCED TIMING INFORMATION
Provided that quantum enhancement of pulse position ac-
curacy can be achieved, the information still needs to be
transmitted through unavoidably lossy channels. It is hence
an important question to ask how loss affects the quantum-
enhanced information in optical information transmission sys-
tems. Equation (43) governs the general evolution of the tim-
ing jitter under the effects of loss, dispersion, and nonlinearity,
but in order to estimate the relative magnitude of the decoher-
ence effects and gain more insight into the decoherence pro-
cesses, in this section Eq. (43) is explicitly solved for various
systems and compared with the standard quantum limit.
A. Linear non-dispersive systems
Without dispersion, the timing jitter increases only due to
the diffusive component
T̂ 2(z)
. An analytic expression for
T̂ 2(z)
can then be derived from Eq. (43), as ∆t(z) and ∆ω(z)
remain constant,
T̂ 2(z)
T̂ 2(0)
∆t2(0)
(1− e−αz). (61)
If the initial variance obeys coherent-field statistics, that is,
T̂ 2(0)
= ∆t2(0)/N(0) according to Eq. (22), the subsequent
jitter is
T̂ 2(z)
∆t2(0)
, (62)
and obeys the same coherent-field statistics but for the re-
duced photon number N(z). This is consistent with intu-
ition. On the other hand, in the high loss limit (αz ≫ 1), the
term ∆t2(0)/N(z) is likely to dominate over the initial jitter
T̂ 2(0)
, so in most cases the position of a significantly atten-
uated pulse relaxes to coherent-field statistics independent of
its initial fluctuation. This justifies the assumption in Sec. IV
that a laser pulse exiting a laser cavity has such statistics, re-
gardless of the quantum properties of the pulse inside the cav-
Equation (61) can be renormalized as
R(z)≡
T̂ 2(z)
T̂ 2(z)
= R(0)e−αz + 4∆t2(0)∆ω2(0)(1− e−αz). (63)
≪ ∆t2 and
≪ ∆ω2, classical theory pre-
dicts that 4∆t2∆ω2 ≈ 1. Equation (63) then suggests that the
relative increase in timing jitter is independent of the initial
squeezing ratio R(0). This is nevertheless not true in general,
as ∆t may depend on both ∆ω and R when the classical theory
fails. In Appendix C, the exact dependence of ∆t on ∆ω and R
is calculated for a specific multiphoton state with a Gaussian
pulse shape called the jointly Guassian state. The expression
4∆t2∆ω2 is given by
4∆t2∆ω2 =
(1− 1/N)2
1− 1/(NR)
, (64)
which results in the following exact expression for an initial
jointly Gaussian state,
R(z) = R(0)e−αz +
(1− 1/N)2
1− 1/(NR)
(1− e−αz). (65)
For a large photon number (N ≫ 1) and moderate enhance-
ment (1 ≥ R ≫ 1/N), 4∆t2∆ω2 ≈ 1, as classical theory would
predict for a Gaussian pulse. In this regime, the quantum-
enhanced information is just as sensitive to loss as standard-
quantum-limited information. When R gets close to the
Heisenberg limit 1/N, however, ∆t∆ω approaches infinity.
This is because maximal coincident-frequency correlations
are required to achieve the Heisenberg limit [1], but heuris-
tically speaking, if the photons have exactly the same mo-
mentum, they must have infinite uncertainties in their rela-
tive positions, leading to an infinite pulse width ∆t. Owing to
the abrupt increase in 4∆t2∆ω2 when R approaches 1/N, the
quantum enhancement becomes much more sensitive to loss.
In the Heisenberg limit of R → 1/N, ∆t → ∞, any loss com-
pletely detroys the timing accuracy and leads to an infinite
jitter, according to Eq. (65).
B. Linear dispersive systems
If the system is lossy, dispersive, but linear, it is not difficult
to show that
∆t2(z) = ∆t2(0)+C(0)
dz′β (z′)
+∆ω2(0)
dz′β (z′)
, (66)
C(z) =C(0)+ 2∆ω2(0)
dz′β (z′), (67)
∆ω2(z) = ∆ω2(0). (68)
The following result can then be obtained from Eq. (43) after
some algebra,
T̂ 2(z)
T̂ 2(0)
T̂ (0)Ω̂(0)+ Ω̂(0)T̂ (0)
dz′β (z′)
Ω̂2(0)
dz′β (z′)
∆t2(z)
(1− e−αz).
This result is similar to that in the previous section, except for
the presence of quantum dispersion and the dispersive spread
of the pulse width ∆t(z) that leads to increased jitter. With ini-
tially coherent-field statistics,
T̂ 2(0)
Ω̂2(0)
are given
by Eqs. (22) and (20), respectively, while by similar argu-
ments, the coherent-field statistics for
T̂ Ω̂+ Ω̂T̂
T̂ Ω̂+ Ω̂T̂
. (70)
This leads to the following position variance for a pulse with
initially coherent-field statistics,
T̂ 2(z)
∆t2(z)
, (71)
which still maintains the coherent-field statistics for the dis-
persed pulse width and the reduced photon number. In the
high loss limit (αz ≫ 1), the coherent-field statistics is again
approached regardless of the initial conditions.
For an initial jointly Gaussian quantum state, on the other
hand, the normalized version of Eq. (69) is
R(z) =
R(0)+
4∆t2(0)∆ω2(0)+ 4ζ 2
(1− e−αz), (72)
where ζ is the normalized effective propagation distance,
ζ ≡ ∆ω2(0)
dz′β (z′), (73)
and 4∆t2(0)∆ω2(0) is given by Eq. (64) evaluated at z= 0. As
long as the loss is moderate so that e−αz ≫ 1−e−αz, quantum
dispersion, given by the term proportional ζ 2/R(0), becomes
the dominant effect and overwhelms the initial enhancement
when ζ exceeds R(0)/2.
If the net dispersion
′β (z′) is zero, both quantum and
classical dispersion are eliminated, and the jitter growth be-
comes identical to that in a non-dispersive and linear system
given by Eq. (61).
C. Soliton-like systems
The previous sections show that coherent-field statistics is
maintained in a linear system, but as Sec. IV clearly shows,
non-trivial statistics can arise from the quantum dynamics of
a nonlinear system. The complex evolution of ∆t(z), C(z), and
∆ω(z) in general prevents one from solving Eq. (43) explic-
itly, except for special cases such as solitons.
If the dispersion is constant and the pulse propagates in the
fiber as a soliton, C(z) is zero, while ∆t(z) and ∆ω(z) can be
regarded as constant if
≪∆t2 and
≪∆ω2 through-
out propagation. Equation (43) can then be solved explicitly,
T̂ 2(z)
T̂ 2(0)
Ω̂2(0)
β 2z2 +
∆t2(0)
(eαz − 1)
2β 2∆ω2(0)
eαz − 1
, (74)
where
T̂ (0)Ω̂(0)+ Ω̂(0)T̂ (0)
is asssumed to be zero for
simplicity. If 4
T̂ 2(0)
Ω̂2(0)
= (π2/9)[1/N2(0)] is also
assumed for a soliton pulse for simplicity, Eq. (74) can be
normalized to give
R(z) = R(0)e−αz +
e−αz +
(eαz − 1)
eαz − 1
. (75)
In the low loss regime with αΛ ≪ 1 and αz ≪ 1, Eq. (75) can
be further simplified,
R(z)≈ R(0)+
(αz). (76)
Quantum dispersion is again the dominant effect in this
regime, while decoherence effects are much smaller, by a fac-
tor of αz approximately.
Even if the net dispersion is zero and quantum dispersion is
compensated, the Gordon-Haus effect cannot be completely
eliminated by dispersion management in the presence of non-
linearity and may become significant, as the numerical anal-
ysis in Sec. IV B shows. An order-of-magnitude estimate of
Gordon-Haus jitter can be performed by considering soliton
propagation in a constant negative dispersion fiber, just as
in the previous case, followed by a dispersion-compensating
fiber of length L′ with positive dispersion coefficient β ′. If
L′ is short, the effective nonlinearity experienced by the pulse
in the second fiber can be neglected, and ∆ω(z) can be re-
garded as constant. Assuming that β L+β ′L′ = 0, the integral
in Eq. (46) can be solved to give the Gordon-Haus jitter,
T̂ 2(L+L′)
≈ α∆ω
6N(0)
β 2L2(L+L′)
≈ α∆ω
6N(0)
β 2L3. (77)
The normalized contribution to the squeezing ratio is therefore
T̂ 2(L+L′)
T̂ 2(L+L′)
(αL). (78)
Compared with the Gordon-Haus jitter at the end of the first
fiber given by the last term of Eq. (76), dispersion manage-
ment cuts the jitter by half, but the expression maintains its
functional dependence on the parameters of the first fiber.
This estimate also justifies the use of Eq. (59) to estimate the
Gordon-Haus jitter at the end of the two fibers in Sec. IV C. To
minimize the impact of Gordon-Haus jitter on the quantum-
enhanced timing accuracy in a dispersion-managed soliton
system, the condition
L3 ≪ 54
R (79)
is required.
VI. CONCLUSION
In conclusion, the decoherence effect by optical loss on
adiabatic soliton control and on the transmission of quantum-
enhanced timing information has been extensively studied. It
is found that an appreciable enhancement can still be achieved
by the soliton scheme using current technology, despite an in-
crease of timing jitter due to the presence of loss. It is also
found that the quantum-enhanced timing accuracy should be
much lower than the Heisenberg-limited accuracy to avoid in-
creased sensitivity to photon loss during transmission, and the
net dispersion in the transmission system should be minimized
in order to reduce quantum dispersion and the Gordon-Haus
effect.
Although the most important pulse propagation effects have
been considered in this analysis, higher-order effects, such as
third-order dispersion, self-steepening, and Raman scattering
[14] might provide further adverse impact on the quantum en-
hancement if the optical pulse is ultrashort. In particular, the
inelastic Raman scattering process is expected to be a signifi-
cant source of decoherence for ultrashort pulses [22]. It is be-
yond the scope of this paper to investigate these higher-order
effects, but they should be of minor importance for picosec-
ond pulses and the propagation distances considered in this
paper.
Finally, it is worth noting that while this paper focuses on
optical pulses, the developed formalism is equally valid for
describing the transverse position and momentum of optical
beams [24] and the center-of-mass variables of Bose-Einstein
condensates [9]. Decoherence by loss of particles in those
systems can be studied using the formalism developed in this
paper and parameters specific to those systems.
VII. ACKNOWLEDGMENTS
This work is financially supported by the DARPA Center
for Optofluidic Integration and the National Science Founda-
tion through the Center for the Science and Engineering of
Materials (DMR-0520565).
APPENDIX A: DERIVATION OF EXACT HEISENBERG
LIMITS
An exact Heisenberg limit can be derived if the inverse
photon-number operator N̂−1 is used instead of 1/N in the
definitions of T̂ , Ω̂, ∆t, and ∆ω in Eqs. (9), (11), (14), and
(15), just as in Refs. [9] and [10],
T̂ ′ ≡ N̂−1
dt tÂ†Â, (A1)
Ω̂′ ≡ N̂−1
dω ω â†â, (A2)
∆t ′ ≡
dt t2Â†Â
, (A3)
∆ω ′ ≡
dω ω2â†â
. (A4)
These operators are well defined as long as the quantum state
has zero vacuum-state component (〈0|ρ̂|0〉 = 0). Starting
from the Heisenberg uncertainty principle for T̂ ′ and Ω̂′,
T̂ ′2
, (A5)
and the inequality
dω ′ (ω −ω ′)2â†â′†ââ′
≥ 0, (A6)
one can obtain the exact inequality
≤ ∆ω ′2, (A7)
and the exact Heisenberg limit for the new position operator,
T̂ ′2
4∆ω ′2
. (A8)
The difference between Eqs. (28) and (A8) is negligible for
small photon-number fluctuations. The exact Heisenberg limit
is similar.
APPENDIX B: NOISE STATISTICS
In this section the expression
ŜT (z)ŜT (z
in Eq. (38) is
calculated. The derivations of
ŜΩ(z)ŜΩ(z
in Eq. (39) and
ŜT (z)ŜΩ(z
′)+ ŜΩ(z)ŜT (z
in Eq. (40) are similar. Substi-
tuting Eq. (35) into Eq. (38) gives
ŜT (z)ŜT (z
dt ′ tt ′
ŝ†Âŝ′†Â′
ŝ†ÂÂ′†ŝ′
Â†ŝÂ′†ŝ′
Â†ŝŝ′†Â′
, (B1)
where N = N(z), N′ = N(z′), ŝ = ŝ(z, t), Â = Â(z, t), ŝ′ =
ŝ(z′, t ′), and Â′ = Â(z′, t ′). If the noise reservoir is in the vac-
uum state, ŝ|0reservoir〉= 〈0reservoir|ŝ† = 0, so only the last term
in Eq. (B1) is non-zero,
ŜT (z)ŜT (z
dt ′ tt ′
Â†ŝŝ′†Â′
dt ′ tt ′
Â†Â′
αδ (z− z′)δ (t − t ′)+
Â†ŝ′† ŝÂ′
δ (z− z′)
dt ′ tt ′
Â†ŝ′†ŝÂ′
. (B2)
The first term on the right-hand side of Eq. (B2) is the desired
result, while the second term can be rewritten as
dt ′ tt ′
Â†ŝ′†ŝÂ′
dt ′ tt ′
Â†, ŝ′†
ŝ, Â′
. (B3)
If the system is linear, the commutator between ŝ and Â is
always zero [13], but because ŝ does not commute with Â†
and Â is coupled to Â† by the nonlinear term in Eq. (31), ŝ
may fail to commute with Â. That said, it can be argued that
the optical field operator must always commute with future
noise operators due to causality and the infinitesimally short
memory of ŝ,
Â†, ŝ′†
= 0 if z < z′, (B4)
ŝ, Â′
= 0 if z > z′, (B5)
so Eq. (B3) can be non-zero only at z = z′. The commutator
between ŝ and Â at z = z′ due to the parametric coupling of Â
and Â† can be estimated by a perturbative technique. Consider
an integral form of Eq. (31) with the nonlinear term and the
Langevin noise term only,
Â(z+∆z) = Â(z)+
∫ z+∆z
iκÂ†(z′)Â(z′)Â(z′)+ ŝ(z′)
and Â†(z′) given by the Hermitian conjugate of Eq. (B6),
Â†(z′) = Â†(z)+
−iκÂ†(z′′)Â†(z′′)Â(z′′)+ ŝ†(z′′)
The commutator between ŝ and Â at z+∆z becomes
[ŝ(z+∆z), Â(z+∆z)]
∫ z+∆z
ŝ(z+∆z), Â†(z′)
Â(z′)Â(z′). (B8)
ŝ(z+∆z) commutes with Â(z′) because z+∆z > z′, while it
fails to commute with Â†(z′) because Â†(z′) given by Eq. (B7)
depends explicitly on ŝ†. Thus, in the leading order of ∆z,
[ŝ(z+∆z), Â(z+∆z)]
∫ z+∆z
ŝ(z+∆z), ŝ†(z′′)
Â(z′)Â(z′), (B9)
which approaches 0 in the limit of ∆z → 0. Hence ŝ commutes
with Â at z= z′, and the position noise is given only by the first
term on the right-hand side of Eq. (B2), resulting in Eq. (38).
APPENDIX C: THE JOINTLY GAUSSIAN STATE
A Fock state can be expressed as [13, 25]
dω1 . . .
dωN φ(ω1, . . . ,ωN)|ω1, . . . ,ωN〉,
dt1 . . .
dtN ψ(t1, . . . , tN)|t1, . . . , tN〉, (C1)
where the spectral and temporal eigenstates are given by
|ω1, . . . ,ωN〉 ≡
â†(ω1) . . . â†(ωN)|0〉, (C2)
|t1, . . . , tN〉 ≡
Â†(t1) . . . Â
†(tN)|0〉. (C3)
Theses states are eigenstates of the following operators rele-
vant to our purpose,
Ω̂|ω1, . . . ,ωN〉=
|ω1, . . . ,ωN〉,
T̂ |t1, . . . , tN〉=
|t1, . . . , tN〉, (C5)
dω ω2â†â|ω1, . . . ,ωN〉=
|ω1, . . . ,ωN〉,
dt t2Â†Â|t1, . . . , tN〉=
|t1, . . . , tN〉. (C7)
φ(ω1, . . . ,ωN) is the spectral multiphoton probability ampli-
tude, and it is related to the temporal probability amplitude
ψ(t1, . . . , tN) by the N-dimensional Fourier transform in the
slowly-varying envelope regime. Both amplitudes should also
satisfy normalization and boson symmetry. To study temporal
quantum enhancement, it is convenient to define the probabil-
ity amplitude as a jointly Gaussian function [25],
φ(ω1, . . . ,ωN) =C exp
, (C8)
ψ(t1, . . . , tN) =C′ exp
−N2B2
, (C9)
where B and b are arbitrary and real constants, and C and C′
are normalization constants. Explicit expressions for
, ∆ω2, and ∆t2 can be obtained using Eqs. (C4)-(C7) and
Appendix B of Ref. [25],
= B2, (C10)
4N2B2
, (C11)
∆ω2 = B2 +
b2, (C12)
∆t2 =
4N2B2
. (C13)
In the limit of b → 0,
reaches the Heisenberg limit,
4N2∆ω2
, (C14)
and the quantum state can be written as a state of photons with
maximal coincident-frequency correlations,
|N〉 ∝
dω exp
|ω , . . . ,ω〉. (C15)
On the other hand, when B2 = b2/N,
is at the standard
quantum limit,
4N∆ω2
, (C16)
the quantum state has only one excited Gaussian mode [25],
|N〉 ∝
dω1 . . .
|ω1, . . . ,ωN〉
dω exp
â†(ω)
|0〉, (C17)
and therefore also satisfies the coherent-field statistics [12,
13]. These limits and the corresponding quantum states are
consistent with those suggested in Ref. [1]. With Eqs. (C12)
and (C13), the pulse width ∆t can be determined explicitly in
terms of ∆ω and the squeezing ratio R = ∆ω2/(NB2),
∆t2 =
(1− 1/N)2
1− 1/(NR)
. (C18)
[1] V. Giovannetti, S. Lloyd, and L. Maccone, Nature (London)
412, 417 (2001); V. Giovannetti, S. Lloyd, and L. Maccone,
Phys. Rev. A 65, 022309 (2002).
[2] V. Giovannetti, L. Maccone, J. H. Shapiro, and F. N. C. Wong,
Phys. Rev. Lett. 88, 183602 (2002); O. Kuzucu, M. Fiorentino,
M. A. Albota, F. N. C. Wong, and F. X. Kärtner, Phys. Rev. Lett.
94, 083601 (2005).
[3] M. Tsang, Phys. Rev. Lett. 97, 023902 (2006).
[4] V. Giovannetti, S. Lloyd, and L. Maccone, Phys. Rev. Lett. 96,
010401 (2006); V. Giovannetti, S. Lloyd, and L. Maccone, Sci-
ence 306, 1330 (2004).
[5] S. J. Carter, P. D. Drummond, M. D. Reid, and R. M. Shelby,
Phys. Rev. Lett. 58, 1841 (1987); P. D. Drummond and S. J.
Carter, J. Opt. Soc. Am. B 4, 1565 (1987); M. J. Potasek and
B. Yurke, Phys. Rev. A 35, 3974 (1987); M. J. Potasek and
B. Yurke, ibid. 38, 1335 (1988); H. A. Haus, Electromagnetic
Noise and Quantum Optical Measurements (Springer, Berlin,
2000).
[6] J. M. Fini and P. L. Hagelstein, Phys. Rev. A 66, 033818 (2002).
[7] P. L. Hagelstein, Phys. Rev. A 54, 2426 (1996); J. M. Fini, P. L.
Hagelstein, and H. A. Haus, ibid. 60, 2442 (1999).
[8] B. Huttner and S. M. Barnett, Phys. Rev. A 46, 4306 (1992);
R. Matloob, R. Loudon, S. M. Barnett, and J. Jeffers, ibid. 52,
4823 (1995).
[9] T. Vaughan, P. Drummond, and G. Leuchs, Phys. Rev. A 75,
033617 (2007).
[10] Y. Lai and H. A. Haus, Phys. Rev. A 40, 844 (1989).
[11] H. A. Haus and Y. Lai, J. Opt. Soc. Am. B 7, 386 (1990).
[12] U. M. Titulaer and R. J. Glauber, Phys. Rev. 140, 676 (1965);
U. M. Titulaer and R. J. Glauber, ibid. 145 1041 (1966).
[13] L. Mandel and E. Wolf, Optical Coherence and Quantum Op-
tics (Cambridge University Press, Cambridge, 1995).
[14] G. P. Agrawal, Nonlinear Fiber Optics (Academic Press, San
Diego, 2001).
[15] H. A. Haus, J. Opt. Soc. Am. B 8, 1122 (1991).
[16] Y. Lai and H. A. Haus, Phys. Rev. A 40, 854 (1989).
[17] J. P. Gordon and H. A. Haus, Opt. Lett. 11, 665 (1986).
[18] H. H. Kuehl, Opt. Lett. 5, 709 (1988).
[19] V. A. Bogatyrev, M. M. Bubnov, E. M. Dianov, A. S. Kurkov,
P. V. Mamyshev, A. M. Prokhorov, S. D. Rumyantsev, V. A.
Semenov, S. L. Semenov, A. A. Sysoliatin, S. V. Chernikov, A.
N. Gur’yanov, G. G. Devyatykh, and S. I. Miroshnichenko, J.
Lightwave Technol. 9, 561 (1991).
[20] L. Grüner-Nielsen, M. Wandel, P. Kristensen, C. Jørgensen,
L. V. Jørgensen, B. Edvold, B. Pálsdóttir, and D. Jakobsen, J.
Lightwave Technol. 23, 3566 (2005).
[21] N. J. Smith, W. Forysiak, and N. J. Doran, Electron. Lett. 32,
2085 (1996).
[22] F. X. Kärtner, D. J. Dougherty, H. A. Haus, and E. P. Ippen,
J. Opt. Soc. Am. B 11, 1267 (1994); J. F. Corney and P. D.
Drummond, ibid. 18, 153 (2001).
[23] A. Mecozzi, J. D. Moores, H. A. Haus, and Y. Lai, Opt. Lett.
16, 1841 (1991); Y. Kodama and A. Hasegawa, Opt. Lett. 17,
31 (1992).
[24] S. M. Barnett, C. Fabre, and A. Maı̂tre, Eur. Phys. J. D 22,
513 (2003); N. Treps, U. Andersen, B. Buchler, P. K. Lam, A.
Maı̂tre, H.-A. Bachor, and C. Fabre, Phys. Rev. Lett. 88, 203601
(2002); N. Treps, N. Grosse, W. P. Bowen, C. Fabre, H.-A. Ba-
chor, P. K. Lam, Science 301, 940 (2003).
[25] M. Tsang, “Relationship between resolution enhancement and
multiphoton absorption rate in quantum lithography,” e-print
quant-ph/0607114 (to appear in Phys. Rev. A).
http://arxiv.org/abs/quant-ph/0607114
ABSTRACT
  Quantum enhancement of optical pulse timing accuracy is investigated in the
Heisenberg picture. Effects of optical loss, group-velocity dispersion, and
Kerr nonlinearity on the position and momentum of an optical pulse are studied
via Heisenberg equations of motion. Using the developed formalism, the impact
of decoherence by optical loss on the use of adiabatic soliton control for
beating the timing standard quantum limit [Tsang, Phys. Rev. Lett. 97, 023902
(2006)] is analyzed theoretically and numerically. The analysis shows that an
appreciable enhancement can be achieved using current technology, despite an
increase in timing jitter mainly due to the Gordon-Haus effect. The decoherence
effect of optical loss on the transmission of quantum-enhanced timing
information is also studied, in order to identify situations in which the
enhancement is able to survive.

<|endoftext|><|startoftext|>
Stock market return distributions: from past
to present
S. Drożdż1,2, M. Forczek1, J. Kwapień1, P. Oświȩcimka1, R. Rak2
1Institute of Nuclear Physics, Polish Academy of Sciences, PL–31-342 Kraków,
Poland
2 Institute of Physics, University of Rzeszów, PL–35-959 Rzeszów, Poland
Abstract
We show that recent stock market fluctuations are characterized by the cumu-
lative distributions whose tails on short, minute time scales exhibit power scaling
with the scaling index α > 3 and this index tends to increase quickly with de-
creasing sampling frequency. Our study is based on high-frequency recordings of
the S&P500, DAX and WIG20 indices over the interval May 2004 - May 2006. Our
findings suggest that dynamics of the contemporary market may differ from the one
observed in the past. This effect indicates a constantly increasing efficiency of world
markets.
Key words: Financial markets, Inverse cubic power law, q-Gaussian distributions,
Multifractality
PACS: 89.20.-a, 89.65.Gh, 89.75.-k
The so-called financial stylized facts are among the central issues of econo-
physics research. Much effort has been devoted on both the empirical and the
theoretical level to such phenomena like fat-tailed distributions of financial
fluctuations, persistent correlations in volatility, multifractal properties of re-
turns etc. Specifically, the interest in the return distributions can be traced
back to an early work of Mandelbrot [1] in which he proposed a Lévy process
as the one governing the logarithmic price fluctuations. Much later this issue
was revisited in [2] based on data with much better statistics and a new model
of exponentially-truncated Lévy flights was introduced. Then, in an extensive
systematic study of the largest American stock markets [3] the distribution
tails for both the prices and the indices were shown to be power-law with the
scaling exponent α ≃ 3. The most striking outcome of that study was that de-
spite the fact that the tails were well outside the Lévy-stable regime (α ≤ 2),
they were apparently stable under time aggregation up to several days for in-
dices and up to a month for stocks. The existence of return distributions with
Preprint submitted to Elsevier 11 October 2018
http://arxiv.org/abs/0704.0664v1
scaling tails was also reported in other markets like e.g. London [4], Frank-
furt [5], Paris [6], Oslo [7], Tokyo [3], and Hong Kong [3,8] but sometimes with
a slightly different value of the scaling index. This empirical property of price
and index returns led to the formulation of the so-called inverse cubic power
law [3], which was soon followed by an attempt of formulating its theoretical
foundation [6] (see also [4]).
Subsequent related study [9] revealed that, opposite to the earlier outcomes
of [3], the tail shape of the return distributions might no longer be so stable
along time axis. After comparison of the results obtained from the American
stock market data in years 1994-95 and in 1998-99, it turned out that in
more recent data the scaling tails with α ≃ 3 for individual companies are
preserved up to the time scales (sampling intervals) ∆t of less than one hour
instead of one month. This earlier crossover for 1998-99 data can easily be seen
in Fig. 1. This result was obtained by extending our previous analysis [9] over
a set of 1000 largest American companies [10] in order to enable more direct
comparison with outcomes of ref. [3] based on the same number of stocks.
The inverse cubic scaling is still evident in Fig. 1 for short time scales up to
∆t = 4 min, but the scaling index starts rising already for data with ∆t = 16
min and for longer time scales the tail behaviour is clearly governed by the
Central Limit Theorem. The difference of the results for the same market
but for different time intervals might suggest that the scaling behaviour is
not stable and depends on some crucial factors as, for example, the speed of
information processing which constantly increases from past to present.
This possibility can further be examined by considering even more recent
data from the American market. First, we look at the S&P500 index which
already was the subject of an analysis in [3]. Our data is a time series of
1 min returns covering the period May 2004 - May 2006 (in [3] the period
was 1984-1996). The c.d.f. for this data is presented in Fig. 2(a) for several
time scales up to 120 min. The most interesting feature is the lack of inverse
cubic scaling even for the shortest one-minute returns: in this case the actual
scaling index is slightly above 4 and it systematically increases with decreasing
sampling frequency (see Table 1). We leave open here the question what factor
underlies the evident absence of the α ≃ 3 type of scaling: the dynamics of
S&P500 returns could have changed sufficiently significantly since earlier half
of 1990s and the inverse cubic scaling no longer exists or it still exists but
is restricted to time scales shorter than 1 minute. That our observation is
more universal and can be made for other markets as well one may infer from
Fig. 2(b) and Fig. 2(c) presenting c.d.f. for the German index DAX and for
the Polish index WIG20, respectively, for the same period of time. While the
returns of DAX did not exactly comply with the inverse cubic scaling also in
the period 1998-99 [9], the ones of WIG20 indeed used to display this kind of
behaviour in the past as documented in [11]. However, nowadays WIG20 also
develops much thinner tails with α > 4 for ∆t = 1 min (Table 1).
abs. norm. returns
n 1 min
4 min
16 min
60 min
120 min
240 min
1 day
2 days
4 days
8 days
1000 stocks
α = 3.0
Gaussian
Fig. 1. Cumulative distributions of normalized stock returns averaged over 1000
highly-capitalized American companies in time interval Dec 1997 - Dec 1999 for
several different time scales from 1 min to 8 days. Gaussian distribution and inverse
cubic scaling are also shown for comparison. Best-fit power index α calculated by
means of log-log regression assumes the following values: 3.08± 0.05 (∆t = 1 min),
3.34 ± 0.05 (4 min), 4.00 ± 0.04 (16 min), 4.60 ± 0.05 (60 min), 4.95 ± 0.06 (120
min), 4.81± 0.15 (240 min), 5.90± 0.08 (1 day), 7.18± 0.28 (2 days), 9.17± 0.22 (4
days), and 8.32 ± 0.40 (8 days).
∆t 1 min 4 min 16 min 32 min 60 min
S&P500 4.12± 0.12 4.21± 0.08 5.18 ± 0.21 5.53 ± 0.20 6.10 ± 0.35
DAX 3.56± 0.12 3.76± 0.05 4.44 ± 0.16 5.14 ± 0.26 5.16 ± 0.66
WIG20 4.28± 0.16 5.24± 0.27 5.81 ± 0.78 5.61 ± 0.45 6.30 ± 0.72
Table 1
Values of the scaling index α obtained with a log-log regression fit for three different
market indices (S&P500, DAX, and WIG20) from the interval May 2004 - May 2006.
Recent studies [11,12] showed that in a wide range of returns the financial
return distributions can be approximated by a family of q-Gaussians [13] with
the parameter q depending on the sampling interval ∆t. The q-Gaussian dis-
tributions follow naturally from the nonextensive statistical mechanics [13,12]
and their c.d.f. can be written as [11]
P (X > x) = Nq
(3− q)β)
2Γ(β)
± (x− µ̄q)2F1(α, β; γ; δ)
, (1)
Fig. 2. Cumulative distributions of index returns for American S&P500 (top row),
German DAX (middle row) and Polish WIG20 (bottom row) recorded over the in-
terval May 2004 - May 2006. Data points for sampling intervals from 1 min to 120
min are denoted by different symbols. (Left column) Distributions for normalized
returns are shown together with Gaussian distribution and lines corresponding to
inverse cubic scaling and, approximately, the actual scaling. (Right column) Exper-
imental distributions are best-fitted by q-Gaussians with a free parameter q (fitted
values displayed) separately for negative and positive returns. Note the small asym-
metry between left and right tails.
0,2 0,3 0,4 0,5 0,6 0,7 0,8
WIG20
Fig. 3. Singularity spectra for 1 min index returns: S&P500 (top), DAX (middle),
and WIG20 (bottom). In each case, the actual data (solid line) is accompanied by
its randomized version averaged over 10 independent realizations (full symbols).
where α = 1
, β = 1/(q− 1), γ = 3
, δ = −Bq(q− 1)(µ̄q − x)2, 2F1(α, β; γ; δ) =
δk(α)k(β)k
k!(γ)k
is the Gauss hypergeometric function and Nq, µ̄q are, respec-
tively, the normalization factor and q-mean of the q-Gaussian p.d.f. [13].
Fig. 2(d)-2(f) exhibit cumulative distributions of returns for the same three
indices with the corresponding best fits in terms of Eq.(1). The theoretical
curves are in satisfactory agreement with the data for all the considered val-
ues of ∆t. It is noteworthy that, consistently with the left-hand side panels
of Fig. 2, the largest values of q are well below q = 3/2, which corresponds
to the inverse cubic scaling, and decrease with decreasing sampling frequency
towards the classic Gaussian distribution with q = 1.
Finally, we look at the singularity spectra f(α) (for numerical details of the
method see e.g. [15]) of our time series under study. Two cases are considered:
original data comprising the full variety of nonlinear temporal correlations
(solid lines in Fig. 3) and the randomized data in which all the correlations
are removed by shuffling the data points (full symbols in Fig. 3). Since both
the nonlinear dependencies and the fat-tailed distributions can be potential
sources of multifractality, the data shuffling can give some information of how
rich is the multiscaling behaviour due to each of these sources [14,15]. In the
present context the most interesting feature is that the Gaussian distribution
of uncorrelated data is associated with a monofractal f(α) spectrum. Fig. 3
shows the singularity spectra for S&P500, DAX and WIG20 (top to bot-
tom) together with their randomized-data counterparts. In all three cases, the
original data clearly represent multifractal processes with DAX and S&P500
showing richer multifractality than WIG20. This picture changes completely if
we look at the randomized data: we cannot detect sufficiently significant trace
of multifractality (f(α) is almost point-like). This is in agreement with the
observation that the distribution tails for the contemporary data tend to be
thinner than before. This result gives also an additional argument in favour of
the statement that the principal (here even the unique) source of multifractal
properties of the stock market data are the nonlinear correlations.
In this paper we have shown that c.d.f. for the most recent stock market data
represented by the index returns develops tails whose scaling index rises above
the value of 3 even for short, minute sampling intervals. This means that con-
temporary market dynamics significantly differs from the one observed 20 or
even 10 years ago and described by the inverse cubic power law [3]. That these
changes are a continuous process rather that a sudden transition we infer from
the existence of intermediate stages in which the inverse cubic scaling was ob-
served up to medium time scales of tens of minutes but was absent in daily
data [9,11]. This effect suggests a scenario of constantly increasing market
efficiency due to an acceleration of information processing in the world mar-
kets [9,16]. The related compression of the range of potential time correlations
between consecutive returns in the present analysis finds evidence in a faster
convergence towards a Gaussian distribution for aggregated returns. Such a
faster convergence indicates weeker time-correlations between returns. Further
argument in favour of an increasing stock market efficiency comes from the
autocorrelation analysis. The autocorrelation functions calculated exlicitely
from the 1 min returns for the three indices considered above drop down to
the noise level already for time-lags as small as 1-2 min. This is to be com-
pared to ∼ 5 min in the period 1998-99 [9], and to ∼ 20 min which according
to ref. [3] was characteristic for the period 1994-95.
References
[1] B. Mandelbrot, J. Business 36, 294 (1963)
[2] R.N. Mantegna, H.E. Stanley, Nature 376 (1995) 46
[3] P. Gopikrishnan, V. Plerou, L.A. Nunes Amaral, M. Meyer, H.E. Stanley, Phys.
Rev. E 60 (1999) 5305; V. Plerou, P. Gopikrishnan, L.A.N. Amaral, M. Meyer,
H.E. Stanley, Phys. Rev. E 60 (1999) 6519
[4] J.D. Farmer, F. Lillo, Quant. Finance 4 (2004) C7; J.D. Farmer, L. Gillemot,
F. Lillo, S. Mike, A. Sen, Quant. Finance 4 (2004) 383
[5] T. Lux, Appl. Financial Economics 6 (1996) 463
[6] X. Gabaix, P. Gopikrishnan, V. Plerou, H.E. Stanley, Nature 423 (2003) 267
[7] J.A. Skjeltorp, Physica A 283 (2000) 486
[8] Z.F. Huang, Physica A 287 (2000) 405
[9] S. Drożdż, J. Kwapień, F. Grümmer, F. Ruf, J. Speth, Acta Phys. Pol. B 34
(2003) 4293, cond-mat/0208240
[10] See http://www.taq.com
[11] R. Rak, S. Drożdż, J. Kwapień, Physica A 374 (2007) 315
[12] C. Tsallis, C. Anteneodo, L. Borland, R. Osorio, Physica A 324 (2003) 89
[13] C. Tsallis, R.S. Mendes, A.R. Plastino, Physica A 261 (1998) 534
[14] K. Matia, Y. Ashkenazy, H.E. Stanley, Europhys. Lett. 61 (2003) 422
[15] J. Kwapień, P. Oświȩcimka, S. Drożdż, Physica A 350 (2005) 466
[16] J. Kwapień, S. Drożdż, J. Speth, Physica A 337 (2004) 231
http://arxiv.org/abs/cond-mat/0208240
http://www.taq.com
	References
ABSTRACT
  We show that recent stock market fluctuations are characterized by the
cumulative distributions whose tails on short, minute time scales exhibit power
scaling with the scaling index alpha > 3 and this index tends to increase
quickly with decreasing sampling frequency. Our study is based on
high-frequency recordings of the S&P500, DAX and WIG20 indices over the
interval May 2004 - May 2006. Our findings suggest that dynamics of the
contemporary market may differ from the one observed in the past. This effect
indicates a constantly increasing efficiency of world markets.

<|endoftext|><|startoftext|>
Introduction
In this paper, we study the Cauchy problem for the Hartree equation
iut +∆u = f(u), in R
n × R, n ≥ 5,
u(0) = ϕ(x), in Rn.
(1.1)
Here f(u) =
V ∗ |u|2
u is a nonlinear function of Hartree type for V (x) = |x|−γ , 0 <
γ < n, where ∗ denotes the convolution in Rn. In practice, we use the integral formula
of (1.1)
u(t) = U(t)ϕ− i
U(t− s)f(u(s))ds, (1.2)
where U(t) = eit∆.
http://arxiv.org/abs/0704.0665v2
If the solution u of (1.1) has sufficient smoothness and decay at infinity, it satisfies
two conservation laws :
M(u(t)) =
∥u(t)
E(u(t)) =
∥∇u(t)
|x− y|γ
|u(t, x)|2|u(t, y)|2 dxdy = E(ϕ).
(1.3)
As explained in [6], the energy is also conserved for the energy solutions u ∈ C0t (R,H
From the viewpoint of the fractional integral, we rewrite the equation (1.1) as
iut +∆u =
(−∆)−
2 |u|2
For dimension n ≥ 5, the exponent γ = 4 is the unique exponent which is energy critical
in the sense that the natural scale transformation
uλ(t, x) = λ
2 u(λ2t, λx),
leaves the energy invariant, in other words, the energy E(u) is a dimensionless quantity.
The Cauchy problem of the Hartree equation has been intensively studied ([4-10],
[15, 16, 18, 19]. With regard to the global well-posedness and scattering results, they
all dealt with the Ḣ1-subcritical case
2 < γ < min(4, n)
in the energy space or some
weighted spaces. In [16], we obtained the small data scattering result for the Ḣ1-critical
case in the energy space. For the large initial data for the Ḣ1-critical case
γ = 4, n ≥ 5
in the energy space , the argument in [16] can not yield the global well-posedness, even
with the conservation of the energy (1.3), because the time of existence given by the
local theory depends on the profile of the data as well as on the energy.
Concerning the Ḣ1-subcritical case
2 < γ < min(4, n)
, using the method of
Morawetz and Strauss [17], J. Ginibre and G. Velo [6] developed the scattering the-
ory in the energy space, where they exploited the properties of ∆ and obtain the usual
Morawetz estimate
∣u(t, x)
|x− y|γ
∇|u(t, y)|2dydxdt . CE(u).
Later, K. Nakanishi [18] exploited the properties of i∂t + ∆ and used a certain related
Sobolev-type inequality to obtain a new Morawetz estimate
|t|1+ν |u(t, x)|
(|t|+ |x|)2+ν
dxdt ≤ C(E, ν), for any ν > 0,
which was independent of the nonlinearity.
In this paper, we deal with the Cauchy problem of the Hartree equation with the large
data for the Ḣ1-critical case
γ = 4, n ≥ 5
. Inspired by the approach of Bourgain [1]
and Tao [22] in the case of the Ḣ1-critical Schrödinger equation with the local nonlinear
term, we obtain the global well-posedness and scattering results for the Hartree equation
for the large radial data in Ḣ1. The new ingredient is that we take advantage of the
following localized estimate for the first time
|x|≤A|I|1/2
|u(t, x)|2∆
dxdt = (n− 3)
|x|≤A|I|1/2
|u(t, x)|2
dxdt ≤ A|I|1/2C(E)
to rule out the possibility of energy concentration, instead of the classical Morawetz
estimate
∇V (x− y)|u(x)|2|u(y)|2dydxdt . C(E)
due to the nonlinear term .
Our main result is the following global well-posedness result in the energy space.
Theorem 1.1. Let n ≥ 5, and ϕ ∈ Ḣ1 be radial. then there exists a unique global
solution u ∈ C0t (Ḣ
(iut +∆u)(t, x) =
uV ∗ |u|2
(t, x), in Rn × R,
u(0) = ϕ(x), in Rn.
(1.4)
where V (x) = |x|−4 and on each compact time interval [t−, t+], we have
x ([t−,t+]×Rn)
). (1.5)
As the right hand side of (1.5) is independent of t−, t+, we can obtain the global
spacetime estimate. As a direct consequence of the global L6tL
x estimate, we have
scattering, asymptotic completeness, and uniform regularity.
Corollary 1.1. Let ϕ be radial and have finite energy. Then there exists finite energy
solutions u±(t, x) to the free Schrödinger equation iut +∆u = 0 such that
∥u±(t)− u(t)
→ 0 as t → ±∞.
Furthermore, the maps ϕ 7→ u±(0) are homeomorphisms from Ḣ
1(Rn) to Ḣ1(Rn). Fi-
nally, if ϕ ∈ Hs for some s > 1, then u(t) ∈ Hs for all time t, and one has the uniform
bounds
∥u(t)
≤ C(E(ϕ), s)
The paper is organized as follows.
In Section 2, we introduce notations and the basic estimates; In Section 3, we derive
the local mass conservation and Morawetz inequality; In Section 4, we discuss the local
theory for (1.4); In Section 5, we obtain the perturbation theory; Finally, we prove the
main theorem in Section 6.
2 Notations and basic estimates
We will often use the notations a . b and a = O(b) to denote the estimate a ≤ Cb for
some C. The derivative operator ∇ refers to the space variable only. We also occasionally
use subscripts to denote the spatial derivatives and use the summation convention over
repeated indices.
We define 〈a, b〉 = Re(ab), ∂ = (∂t,∇), D = (−
,∇); For 1 ≤ p ≤ ∞, we denote by
p′ the dual exponent, that is, 1
For any time interval I, we use L
x(I×R
n) to denote the mixed spacetime Lebesgue
x(I×R
Lr(Rn)
with the usual modifications when q = ∞. When q = r, we abbreviate L
x by L
We use U(t) = eit∆ to denote the free group generated by the free Schrödinger
equation iut +∆u = 0. It can commute with derivatives, and obeys the inequality
∥eit∆f
Lp(Rn)
. |t|
−n( 1
(2.1)
for t 6= 0, 2 ≤ p ≤ ∞.
We say that a pair (q, r) is admissible if
2 ≤ r
≤ ∞, n = 1;
< ∞, n = 2;
, n ≥ 3.
For a spacetime slab I ×Rn, we define the Strichartz norm Ṡ0(I) by
Ṡ0(I)
:= sup
(q,r) admissible
x(I×R
and define Ṡ1(I) by
Ṡ1(I)
Ṡ0(I)
When n ≥ 3, the spaces
Ṡ0(I), ‖ · ‖
Ṡ0(I)
Ṡ1(I), ‖ · ‖
Ṡ1(I)
are Banach spaces,
respectively.
Based on the above notations, we have the following Strichartz inequalities
Lemma 2.1. [11], [21] Let u be an Ṡ0 solution to the Schrödinger equation (1.1). Then
∥u(t0)
∥f(u)
x (I×R
for any t0 ∈ I and any admissible pairs (q, r). The implicit constant is independent of
the choice of interval I.
From Sobolev embedding, we have
Lemma 2.2. For any function u on I × Rn, we have
L∞t L
L∞t L
where all spacetime norms are on I × Rn.
For convenience, we introduce two abbreviated notations. For a time interval I, we
denote
x (I×R
W (I)
x (I×R
Lemma 2.3. Let f(u)(t, x) =
uV ∗ |u|2
(t, x), where V (x) = |x|−4. For any time
interval I and t0 ∈ I, we have
ei(t−s)∆f(u)(s, x)ds
Ṡ1(I)
W (I)
Proof: By Strichartz estimates, Hardy-Littlewood-Sobolev inequality and Hölder
inequality, we have
ei(t−s)∆f(u)(s, x)ds
Ṡ1(I)
. ‖∇f(u)(t, x)‖
x (I×R
. ‖∇uV ∗ |u|2‖
x (I×Rn)
+ ‖uV ∗ (u∇u)‖
x (I×Rn)
. ‖∇u‖
x (I×Rn)
‖V ∗ |u|2‖
x (I×Rn)
+ ‖u‖
x (I×Rn)
‖V ∗ (u∇u)‖
x (I×R
W (I)
3 Local mass conservation and Morawetz inequality
In this section, we will prove two useful estimates. One is a local mass conservation
estimate and the other is a Morawetz inequality, which appears in Morawetz identity.
The local mass conservation estimate is used to control the flow of mass through a region
of space, and the Morawetz inequality is used to prevent concentration.
3.1 Local mass conservation
We recall a local mass conservation law that has appeared in [1], [13] and [22]. For
completeness, we give the sketch of the proof. Let χ be a bump function supported on
the ball B(0, 1) that equals 1 on the ball B(0, 1/2). Observe that if u is a finite energy
solution of (1.4), then
∣u(t, x)
= −2∇ · Im(u∇u(t, x)).
We define
Mass(u(t), B(x0, R)) :=
(x− x0
u(t, x)
Differentiating the above quantity with respect to time, we obtain by the integration by
parts
∂tMass(u(t), B(x0, R)) =
(x− x0
∣u(t, x)
(x− x0
∇ · Im(u∇u)dx
(x− x0
(x− x0
Im(u∇u)dx
∥∇u(t)
Mass(u(t), B(x0, R))
hence, we have
Mass(u(t1), B(x0, R))
1/2 −Mass(u(t2), B(x0, R))
∣t1 − t2
∣. (3.1)
This implies that if the local mass Mass(u(t), B(x0, R)) is large for some time t, then it
can also be shown to be similarly large for nearly time t, by increasing the radius R if
necessary to reduce the rate of change of the mass.
On the other hand, from Sobolev and Hölder inequalities, we have
Mass(u(t), B(x0, R)) ≤
(x− x0
. (3.2)
This gives the control of mass in small volumes.
3.2 A Morawetz inequality
To prevent the concentration of the energy, we need a Morawetz estimate. The Morawetz
estimate is based on some integral identity derived by variation of the lagrangian.
We define ℓ(u) by
2ℓ(u) = 〈iut, u〉+ |∇u|
|u|2(V ∗ |u|2)
ℓ(u) is the lagrangian density associated to the equation (1.1).
From the definition of the variation of the functional ℓ, we have
δvℓ(u) : = lim
ℓ(u+ ǫv)− ℓ(u)
= 〈iut +∆u− u(V ∗ |u|
2), v〉 + ∂ · 〈Du, v〉.
Using this identity together with h =
, q =
Re(D·h) =
and Mu = h ·Du+qu,
we obtain the following formula:
〈iut +∆u− u(V ∗ |u|
2),Mu〉 =− ∂ · 〈Du,Mu〉 +D ·
hℓ(u) +
〈Du, ∂hαDαu〉 −
D · ∂q − (V ∗ ∇|u|2) ·
|u|2.
As a consequence of the above dilation identity, we have the following Morawetz
estimate, which plays an important role in our proof.
Proposition 3.1 (Morawetz estimate). Let u be a solution to (1.4) on a spacetime slab
I × Rn. Then for any A ≥ 1, we have
|x|≤A|I|1/2
dxdt−
∇V (x− y)|u(x)|2|u(y)|2dydxdt
. A|I|1/2E,
where Ω =
(x, y) ∈ Rn ×Rn; |x| ≤ A|I|1/2; |y| ≤ A|I|1/2
Remark 3.1. Since
∇V (x− y) = 4
|x||y| − x · y
|x− y|6
we have
∇V (x− y)|u(x)|2|u(y)|2dydxdt ≥ 0.
Proof: We define V a0 (t) =
a(x)|u(t, x)|2dx, then
Ma0 (t) =: ∂tV
0 (t) = 2Im
ajujudx.
0 (t) = −2Im
ajjutudx− 4Im
ajujutdx
△△a|u|2dx+ 4Re
ajkujukdx
− 2Re
∇a(x)∇V (x− y)|u(y)|2|u(x)|2dxdy
△△a|u|2dx+ 4Re
ajkujukdx
∇a(x)−∇a(y)
∇V (x− y)|u(y)|2|u(x)|2dxdy
where we use the symmetry of a(x) and V (x). Let R > 0 and let η be a bump function
adapted to the ball |x| ≤ R which equals 1 on the ball |x| ≤ R/2. We set a(x) := |x|η(x).
For |x| ≤ R/2, we have
; ajk =
; △a =
; −△△a =
(n− 1)(n − 3)
and for R/2 ≤ |x| ≤ R, we have bounds
aj = O(1); ajk = O(R
−1); △△a = O(R−3).
Thus we have
0 (t) = (n− 1)(n − 3)
|x|≤R/2
dx+ 4
|x|≤R/2
|∇u|2 − |∂ru|
∇V (x− y)|u(x)|2|u(y)|2dydx
|x|∼R
( |u|2
|∇u|2
aj(x)− aj(y)
) xj − yj
|x− y|γ+2
|u(x)|2|u(y)|2dydx
where γ = 4,
(x, y) ∈ Rn × Rn; |x| ≤ R/2, |y| ≤ R/2
(x, y) ∈ Rn × Rn; |x| ∼ R
(x, y) ∈ Rn × Rn; |y| ∼ R
Meanwhile
|x|∼R
( |u|2
|∇u|2
dx . R−1E,
aj(x)− aj(y)
) xj − yj
|x− y|γ+2
|u(x)|2|u(y)|2dydx
Ω2: |x−y|≤R/4
aj(x)− aj(y)
) xj − yj
|x− y|γ+2
|u(x)|2|u(y)|2dydx
Ω2: |x−y|≥R/4
aj(x)− aj(y)
) xj − yj
|x− y|γ+2
|u(x)|2|u(y)|2dydx
. R−1
|x− y|γ
|u(x)|2|u(y)|2dydx
. R−1E.
Moreover, from Sobolev and Hölder inequalities, we have
Ma0 (t) .
|x|.R
|u||∇u| . ‖u‖
‖∇u‖L2x
|x|.R
. RE.
So if we integrate by parts on a time interval I and take R = 2A|I|1/2, we obtain
|x|≤A|I|1/2
dxdt−
∇V (x− y)|u(x)|2|u(y)|2dydxdt
. A|I|1/2E
for n ≥ 4. The proof is completed.
4 Local theory
In this section, we develop a local well-posedness and blow-up criterion for the Ḣ1-critical
Hartree equation. First, we have
Proposition 4.1 (Local well-posedness). Let u(t0) ∈ Ḣ
1, and I be a compact time
interval that contains t0 such that
∥U(t− t0)u(t0)
for a sufficiently small absolute constant η > 0. Then there exists a unique strong solution
to (1.4) on I × Rn such that
∥u(t0)
Proof: The proof of this proposition is standard and based on the contraction
mapping arguments. We define the solution map to be
Φ(u)(t) := U(t− t0)u(t0)− i
U(t− s)f(u(s))ds,
then Φ is a map from
B = {u : ‖u‖X(I) ≤ 2η, ‖u‖W (I) ≤ 2C‖u(t0)‖Ḣ1}
with the metric
‖u‖B = ‖u‖X(I) + ‖u‖W (I)
onto itself because
‖Φ(u)‖X(I) ≤ ‖U(t− t0)u(t0)‖X(I) + C‖u‖
X(I)‖u‖W (I) ≤ η + 8Cη
2‖u(t0)‖Ḣ1 ≤ 2η;
‖Φ(u)‖W (I) ≤ C‖u(t0)‖Ḣ1+C‖u‖
X(I)‖u‖W (I) ≤ C‖u(t0)‖Ḣ1+8Cη
2‖u(t0)‖Ḣ1 ≤ 2C‖u(t0)‖Ḣ1 .
It suffices to prove Φ is a contraction map. Let u, v ∈ B, then
‖Φ(u)− Φ(v)‖W (I) ≤
U(t− s)(V ∗ (ū− v̄)u)u(s, x)ds
W (I)
U(t− s)(V ∗ v̄(u− v))u(s, x)ds
W (I)
U(t− s)(V ∗ v̄v)(u− v)(s, x)ds
W (I)
By Lemma 2.3, we have
‖Φ(u)− Φ(v)‖W (I) ≤ ‖u− v‖X(I)
‖u‖W (I)‖u‖X(I) + ‖u‖W (I)‖v‖X(I) + ‖v‖W (I)‖v‖X(I)
+ ‖u− v‖W (I)
‖u‖X(I)‖u‖X(I) + ‖u‖X(I)‖v‖X(I) + ‖v‖X(I)‖v‖X(I)
≤ 12Cη‖u(t0)‖Ḣ1‖u− v‖X(I) + 12η
2‖u− v‖W (I)
(‖u− v‖X(I) + ‖u− v‖W (I))
In the same way, we have
‖Φ(u)− Φ(v)‖X(I) ≤ 12Cη‖u(t0)‖Ḣ1‖u− v‖X(I) + 12η
2‖u− v‖W (I)
(‖u− v‖X(I) + ‖u− v‖W (I))
as long as η is chosen sufficiently small. Then the contraction mapping theorem implies
the existence of the unique solution to (1.4) on I.
Next, we give the blow-up criterion of the solutions for (1.4). The usual form is
similar to those in [2], [12], which is in the form of a maximal interval of existence. For
convenience, we obtain
Proposition 4.2 (Blow-up criterion). Let ϕ ∈ Ḣ1, and let u be a strong solution to
(1.4) on the slab [0, T ) × Rn such that
X([0,T ))
Then there exists δ > 0 such that the solution u extends to a strong solution to (1.4) on
the slab [0, T + δ] × Rn.
Proof: By the absolute continuity of integrals, there exists a t0 ∈ [0, T ), such that
‖u‖X([t0,T )) ≤ η/4,
then by Lemma 2.3, we have
‖u‖W ([t0,T )) . ‖u(t0)‖Ḣ1 + ‖u‖
X([t0,T ))
‖u‖W ([t0,T )),
therefore
‖u‖W ([t0,T )) . ‖u(t0)‖Ḣ1 .
Now we write
U(t− t0)u(t0) = u(t) + i
U(t− s)(V ∗ |u|2)u(s, x)ds,
‖U(t−t0)u(t0)‖X([t0,T )) ≤ ‖u‖X([t0,T ))+C‖u‖
X([t0,T ))
‖u‖W ([t0,T )) ≤
+Cη2‖u(t0)‖Ḣ1 ≤
By the absolute continuity of integrals again, there exists a δ, such that
‖U(t− t0)u(t0)‖X([t0,T+δ)) ≤ η.
Thus we may apply Proposition 4.1 on the interval [t0, T + δ] to complete the proof.
In other words, this lemma asserts that if [t0, T
∗) is the maximal interval of existence
and T ∗ < ∞, then
‖u‖X([t0,T ∗)) = ∞.
5 Perturbation result
In this section, we obtain the perturbation for Hartree equation, which shows that the
solution can not be large if the linear part of the solution is not large. This is an
analogue of Lemma 3.2 in [22], and later, Killip, Visan and Zhang [13] gave the similar
perturbation result for the Schrödinger equation with the quadric potentials.
Lemma 5.1 (Perturbation lemma). Let u be a solution to (1.4) on I = [t1, t2] such that
where η is sufficiently small constant depending on the norm of the initial data, then
Ṡ1(I)
where uk(t) = U(t− tk)u(tk) for k = 1, 2.
Proof: From Strichartz estimate and Lemma 2.3, we obtain
Ṡ1(I)
∥u(t1)
W (I)
∥u(t1)
Ṡ1(I)
∥u(t1)
Ṡ1(I)
If η is sufficiently small, we have the first claim
Ṡ1(I)
As for the second claim, we give the proof for k = 1, the case k = 2 is similar. Using
Strichartz estimate and Lemma 2.3 again, we have
∥u− u1
Ṡ1(I)
. η2,
therefore, the second claim follows by the triangle inequality and choosing η sufficiently
small.
6 Global well-posedness
In this section, we give the proof of Theorem 1.1. The new ingredient is that we first
take advantage of the the estimate of the term
|x|≤A|I|1/2
dxdt in the localized
Morawetz identity to rule out the possibility of energy concentration, which is indepen-
dent of the nonlinear term. For the Schrödinger equation, Tao [22] used the classical
Morawetz estimate, which depends on the nonlinearity, to prevent the concentration.
For readability, we first take some constants
C1 = 6n; C2 = 3; C3 = 18n. (6.1)
which come from several constraints in the rest of this section. All implicit constants in
this section are permitted to depend on the dimension n and the energy.
Fix E, [t−, t+], u. We may assume that the energy is large, E > c > 0, otherwise the
claim follows from the small energy theory [16]. From the boundedness of energy and
Sobolev embedding, we can obtain
∥u(t)
∥u(t)
. 1 (6.2)
for all t ∈ [t−, t+].
Assume that the solution u already exists on [t−, t+]. By Lemma 4.2, it suffices to
obtain a priori estimate
X([t−,t+])
≤ O(1), (6.3)
where O(1) is independent of t−, t+.
We may assume that
X([t−,t+])
≥ 2η,
otherwise it is trivial. We divide [t−, t+] into J subintervals Ij = [tj , tj+1] for some J ≥ 2
such that
X(Ij)
≤ η, (6.4)
where η is a small constant depending on the dimension n and the energy. As a conse-
quence, it suffices to estimate the number J .
Now let u± = U(t − t±)u(t±). By Sobolev embedding and Strichartz estimates, we
X([t−,t+])
. 1. (6.5)
We adapt the following definition of Tao [22].
Definition 6.1. We call Ij exceptional if
X(Ij)
> ηC3
for at least one sign ±. Otherwise, we call Ij unexceptional.
From (6.5), we obtain the upper bound on the number of exceptional intervals,
O(η−6C3). We may assume that there exist unexceptional intervals, otherwise the claim
would follow from this bound and (6.4). Therefore, it suffices to compute the number of
unexceptional intervals.
We first prove the existence of a bubble of mass concentration in each unexceptional
interval.
Proposition 6.1 (Existence of a bubble). Let Ij be an unexceptional interval. Then
there exists xj ∈ R
n such that
Mass(u(t), B(xj , η
−C1 |Ij |
1/2)) & ηC1 |Ij|
for all t ∈ Ij .
Proof: By time translation invariance and scale invariance, we may assume that
Ij = [0, 1]. We subdivide Ij further into [0,
] and [1
, 1]. By (6.4) and the pigeonhole
principle and time reflection symmetry if necessary, we may assume that
X([ 1
Thus by Lemma 5.1, we have
X([ 1
. (6.6)
By Duhamel formula, we have
) = U(t− t−)u(t−)− i
U(t− s)f(u(s))ds
U(t− s)f(u(s))ds.
(6.7)
Since [0, 1] is unexceptional interval, we have
∥U(t− t−)u(t−)
X([ 1
∥u−(t)
X([ 1
≤ ηC3 .
On the other hand, by (6.4), Lemma 2.2 , Lemma 2.3 and Lemma 5.1, we have
U(t− s)f(u(s))ds
X([ 1
X([ 1
W ([ 1
Ṡ1([ 1
. η2.
Thus the triangle inequality implies that
U(t− s)f(u(s))ds
X([ 1
provided η is chosen sufficiently small. Hence, if we define
v(t) :=
U(t− s)f(u(s))ds,
then we have
X([ 1
η. (6.8)
Next, we estimate the upper bound on v. We have by (6.7) and the triangle inequality
Ṡ1([ 1
Ṡ1([ 1
∥U(t− t−)u(t−)
Ṡ1([ 1
U(t− s)f(u(s))ds
Ṡ1([ 1
X([0, 1
W ([0, 1
X([0, 1
Ṡ1([0, 1
(6.9)
where we use Strichartz estimate, (6.4) and Lemma 5.1.
We shall need some additional regularity control on v. For any h ∈ Rn, let u(h)
denote the translation of u by h, i.e. u(h)(t, x) = u(t, x− h).
Lemma 6.1. Let χ be a bump function supported on the ball B(0, 1) of total mass one,
and define
vav(t, x) =
χ(y)v(t, x+ ηC2y)dy,
then we have
∥v − vav
X([ 1
. ηC2 .
Proof: By the chain rule, Hölder inequality and Sobolev embedding, we have
∥∇f(u)(s)
∥(V ∗ |u|2)∇u
∥u(V ∗ ∇|u|2)
∥V ∗ |u|2
∥V ∗ ∇|u|2
∥|u|2
∥∇|u|2
it follows by (2.1)
L∞t L
,1]×Rn)
≤ sup
t∈[ 1
|t− s|2
∥∇f(u)(s)
ds . 1.
From (6.9) and interpolation, we have
L∞t L
,1]×Rn)
L∞t L
,1]×Rn)
L∞t L
,1]×Rn)
From the fundamental theorem of calculus, we have
∥v − v(h)
L∞t L
,1]×Rn)
. |h|.
This implies
∥v − vav
L∞t L
,1]×Rn)
∥v(t, x+ ηC2y)− v(x)
L∞t L
,1]×Rn)
χ(y)|ηC2y|dy
. ηC2 .
Hence from Hölder inequality, we obtain
∥v − vav
X([ 1
∥v − vav
L∞t L
,1]×Rn)
. ηC2 .
This completes the proof of Lemma.
Now we return to the proof of Proposition 6.1. By Lemma 6.1 and (6.8), we have
X([ 1
& η. (6.10)
On the other hand, by Hölder inequality, Young inequalities and (6.9), we have
2(3n−8)
,1]×Rn)
L∞t L
,1]×Rn)
L∞t L
,1]×Rn)
Interpolating with (6.10) gives
L∞t,x([
,1]×Rn))
X([ 1
− 3n−8
2(3n−8)
,1]×Rn)
Thus there exists (sj, xj) ∈ [
, 1] × Rn such that
∣vav(sj, xj)
∣ & η
Hence, by Cauchy-Schwarz inequality, we have
∣vav(sj , xj)
χ(y)v(sj , xj + η
C2y)dy
= η−nC2
x− xj
)v(sj , x)dx
. η−nC2η
C2Mass(v(sj), B(xj , η
C2))1/2,
that is
Mass(v(sj), B(xj , η
C2)) & η3n−6+nC2 & ηC1 . (6.11)
Observe that (3.1) also holds for v. If we take R = η−C1 and choose η sufficiently
small, we have
Mass(v(t), B(xj , η
−C1)) &
Mass(v(sj), B(xj , η
−C1))1/2 −
& (Mass(v(sj), B(xj , η
C2))1/2 − ηC1)2
& ηC1
(6.12)
for all t ∈ [0, 1].
The last step is to show that this mass concentration holds for u. We first show mass
concentration for u at time 0.
Since [0, 1] is unexceptional interval, by the pigeonhole principle, there is a τj ∈ [0, 1]
such that
∥u−(τj)
. ηC3 ,
and so by Hölder inequality,
Mass(u−(τj), B(xj , η
−C1)) .
(x− xj
∥u−(τj)
C1+2C3 . η2C1 .
From (3.1), we have
Mass(u−(0), B(xj , η
−C1)) . η2C1 . (6.13)
Recall that u(0) = u−(0) − iv(0). Combing (6.12) and (6.13) with the triangle
inequality, we obtain
Mass(u(0), B(xj , η
−C1)) & ηC1 . (6.14)
Using (3.1) again, we obtain the result.
Next, we use the radial assumption to show that the bubble of mass concentration
must occur at the spatial origin. In the forthcoming paper, we shall use the interaction
Morawetz estimate with the frequency localized L2 almost-conservation law to rule out
the possibility of the energy concentration at any place and deal with the non-radial
data. The corresponding results for the Schrödinger equation with local nonlinearity,
please see [3], [20] and [23].
Corollary 6.1 (Bubble at the origin). Let Ij be an unexceptional interval. Then
Mass(u(t), B(0, η−3C1 |Ij|
1/2)) & ηC1 |Ij |
for all t ∈ Ij .
Proof: If xj in Proposition 6.1 is within
η−3C1 |Ij |
1/2 of the origin, then the result
follows immediately. Otherwise by the radial assumption, there would be at least
((η−3C1 |Ij |
1/2)n−1
(η−C1 |Ij|1/2)n−1
η−2(n−1)C1
many distinct balls each containing at least ηC1 |Ij | amount of mass. By Hölder inequality,
this implies
η−2(n−1)C1 × ηC1 |Ij | .
(η−3C1−η−C1 )|Ij |
1/2≤|x|≤(η−3C1+η−C1 )|Ij |
|u(t, x)|2dx
(η−3C1−η−C1 )|Ij |
1/2≤|x|≤(η−3C1+η−C1 )|Ij |
η−3C1 |Ij |
× η−C1 |Ij |
that is
2n2−9n+4
Because 2n2 − 9n+ 4 > 0 for n ≥ 5, this contradicts the boundedness on the energy
of (6.2). This completes the proof.
Next, we use Proposition 3.1 to show that if there are many unexceptional intervals,
they must form a cascade and must concentrate at some time t∗.
Corollary 6.2. Assume that the solution u is spherically symmetric. For any interval
I ⊆ [t−, t+] and I be a union of consecutive unexceptional intervals Ij. Then
. η−13C1
and moreover, there exists a j such that
∣ & η26C1
Proof: For any unexceptional interval Ij , from Hölder inequality and Corollary 6.1,
we have
∣ . Mass
u(t), B(0, η−3C1 |Ij |
η−3C1 |Ij |1/2
2η−3C1 |Ij |1/2
2 |u(t, x)|2
η−3C1 |Ij|
|x|≤2η−3C1 |Ij |
|u(t, x)|2
therefore
|x|≤2η−3C1 |Ij|
|u(t, x)|2
dx & η10C1
We integrate this over each unexceptional interval Ij and sum over j,
η10C1
|x|≤2η−3C1 |Ij|
|u(t, x)|2
|x|≤2η−3C1 |I|1/2
|u(t, x)|2
|x|≤2η−3C1 |I|1/2
|u(t, x)|2
. η−3C1 |I|1/2.
The second claim follows from the first and the fact that
)−1/2
This completes the proof.
Proposition 6.2 (Interval cascade). Let I be an interval tiled by finitely many intervals
I1, · · · , IN . Suppose that for any continuous family
Ij : j ∈ J
of the unexceptional
intervals, there exists j∗ ∈ J such that
∣ ≥ a
∣ (6.15)
for some small a > 0. Then there exist K ≥ log(N)/ log(2a−1) distinct indices j1, · · · , jK
such that
∣ ≥ 2
∣ ≥ · · · ≥ 2K−1
and for any t∗ ∈ IjK ,
dist(Ijk , t∗) .
hold for 1 ≤ k ≤ K.
I(1) 
I(2) 
I(k) 
I(K−1) 
I(K) 
(t) t+ 
exceptional interval 
at most O(η−6C3)
exceptional interval 
at most 2a−1−1 
Figure 1: Iteration process in Proposition 6.2.
Proof: Here we use an algorithm in [1] and [22] to assign a generation to each Ij .
By hypothesis, I contains at least one interval of length a|I|. All intervals with length
larger than a|I|/2 belong to the first generation. By the total measure, we see that there
are at most 2a−1 − 1 intervals in the first generation. Removing there intervals from I
leaves at most 2a−1 gaps, which are tiled by intervals Ij .
By (6.15) and the contradiction argument, we know that there is not gap with length
larger than |I|/2.
We now apply this argument recursively to all gaps generated by the previous itera-
tion until every Ij has been labeled with a generation number.
Each iteration of the algorithm removes at most 2a−1−1 many intervals and produces
at most 2a−1 gaps. Suppose that there areN consecutive unexceptional intervals initially,
and we perform at most K times iterations. Then the number K obeys
N ≤ (2a−1 − 1) + (2a−1 − 1)2a−1 + · · ·+ (2a−1 − 1)(2a−1)K−1
≤ (2a−1)K ,
which leads to the claim K ≥ log(N)/ log(2a−1).
Let I(K) be the interval obtained after K − 1 iterations and IjK be any interval in
I(K). For 1 ≤ i ≤ K − 1, let I(i) be the (i − 1)-generation gap which contains the IjK ,
and assign the Iji be any ith-generation interval which is contained in I
(i) (see Figure
1). By the construction, for any t∗ ∈ IjK , we have
dist(t∗, Ijk) ≤ |I
(k)| ≤ 2a−1
for all 1 ≤ k ≤ K.
Proposition 6.3 (Energy non-evacuation). Let Ij1 , · · · , IjK be a disjoint family of un-
exceptional intervals obeying
∣ ≥ 2
∣ ≥ · · · ≥ 2K−1
∣ (6.16)
and for any t∗ ∈ IjK ,
dist(Ijk , t∗) . η
−26C1
hold for 1 ≤ k ≤ K. Then
K ≤ η−100C1 .
Proof: By Corollary 6.1,
Mass(u(t), B(0, η−3C1 |Ijk |
1/2)) & ηC1 |Ijk |
for all t ∈ Ijk . By (3.1), we have
Mass(u(t∗), B(0, η
−27C1 |Ijk |
1/2)) &
ηC1 |Ijk |
dist(t∗, Ijk)
η−27C1 |Ijk |
& ηC1 |Ijk |.
On the other hand, from (3.2), we have
Mass(u(t∗), B(0, 2η
C1 |Ijk |
1/2)) . η2C1 |Ijk |.
Define
A(k) =
x : ηC1 |Ijk |
1/2 ≤ |x| ≤ η−27C1 |Ijk |
then we have
∣u(t∗, x)
dx & Mass(u(t), B(0, η−27C1 |Ijk |
1/2))−Mass(u(t), B(0, 2ηC1 |Ijk |
1/2))
& ηC1 |Ijk |.
By Hölder inequality, we have
∣u(t∗, x)
n−2 dx &
ηC1 |Ijk |
η−27C1 |Ijk |
)− 2n
& η95C1
Choosing M = −56C1 log η, then we obtain by (6.16)
η−27C1 |IjM+1 |
1/2 ≤ ηC1 |Ij1 |
η−27C1 |Ij2M+1 |
1/2 ≤ ηC1 |IjM+1 |
· · ·
Hence the annuli A(k) associated to k = 1,M +1, 2M +1, · · · , are disjoint. The number
of such annuli is O(K/M).
Therefore from (6.2), we obtain
η95C1 .
∣u(t∗, x)
n−2 dx . 1.
That is
K . Mη−95C1 . η−100C1 .
We now return to the proof of Theorem 1.1. As explained at the beginning of this
section, it suffices to bound the number of the unexceptional intervals.
Note that the number of exceptional interval is at most O(η−6C3). We first bound
the number N of unexceptional intervals that can occur consecutively.
Let us denote the union of these consecutive unexceptional intervals by I. By Corol-
lary 6.2, the hypotheses of Proposition 6.2 are satisfied with a = η26C1 and so we can
find a cascade of K intervals and they satisfied the hypotheses of Proposition 6.3. The
bound on K implies the bound on N , namely,
N . (2a−26C1)K ≈ (2η−26C1)η
−100C1
At last, since there are at most O(η−6C3) exceptional intervals, the total number of
intervals is
J . η−6C3 + η−6C3N . eη
−200C1
This completes the proof of Theorem 1.1.
Acknowledgements: The authors were partly supported by the NNSF of China.
G. Xu wish to thank Xiaoyi Zhang for providing the paper [13] and some discussions.
References
[1] J. Bourgain, Scattering in the energy space and below for 3D NLS. J. Anal. Math.
75(1998), 267-297.
[2] T. Cazenave, Semilinear Schrödinger equations. Courant Lecture Notes in Mathe-
matics, vol. 10. New York: New York University Courant Institute of Mathematical
Sciences, 2003.
[3] J. Colliander, M. Keel, G. Staffilani, H. Takaoka, and T. Tao, Global well-posedness
and scattering for the energy-cirtical nonlinear Schrödinger equation in R3. to appear
Ann. of Math..
[4] J. Ginibre and T. Ozawa, Long range scattering for nonlinear Schrödinger and
Hartree equations in space dimension n ≥ 2. Comm. Math. Phys., 151(1993), 619-
[5] J. Ginibre and G. Velo, On a class of nonlinear Schrödinger equations with nonlocal
interactions, Math. Z., 170(1980), 109-136.
[6] J. Ginibre and G. Velo, Scattering theory in the energy space for a class of
Hartree equations, Nonlinear wave equations (Providence, RI, 1998), 29-60, Con-
temp. Math., 263, Amer. Math. Soc., Providence, RI, 2000.
[7] J. Ginibre and G. Velo, Long range scattering and modified wave operators for some
Hartree type equations. Rev. Math. Phys., 12, No. 3, 361-429 (2000).
[8] J. Ginibre and G. Velo, Long range scattering and modified wave operators for some
Hartree type equations II. Ann. Henri Poincaré 1, No.4, 753-800 (2000).
[9] J. Ginibre and G. Velo, Long range scattering and modified wave operators for some
Hartree type equations. III: Gevrey spaces and low dimensions. J. Differ. Equations.
175, No.2, 415-501 (2001).
[10] N. Hayashi and Y. Tsutsumi, Scattering theory for the Hartree equations. Ann.
Inst. H. Poincaré Phys. Theorique 61(1987), 187-213.
[11] M. Keel and T. Tao, Endpoint Strichartz estimates. Amer. J. Math. 120:5(1998),
955-980.
[12] C. E. Kenig and F. Merle, Global well-posedness, scattering and blow-up for the
energy-critical, focusing, non-linear Schrödinger equation in the radial case. Invent.
Math., 166(2006), 645-675.
[13] R. Killip, M. Visan and X. Zhang, Energy-critical NLS with quadratic potentials.
arXiv:math.AP/0611394.
[14] K. Kurata and T. Ogawa, Remarks on blowing-up of solutions for some nonlinear
Schrödinger equations. Tokyo J. Math., 13:2(1990), 399-419.
[15] C. Miao, Hm-modified wave operator for nonlinear Hartree equation in the space
dimensions n ≥ 2. Acta Mathematica Sinica, 13:2(1997), 247-268.
[16] C. Miao, G. Xu and L. Zhao, The Cauchy problem of the Hartree equation. preprint.
[17] C. Morawetz and W. A. Strauss, Decay and scattering of solutions of a nonlinear
relativistic wave equation, Comm. Pure Appl. Math., 25(1972), 1-31.
[18] K. Nakanishi, Energy scattering for Hartree equations, Math. Res. Lett., 6(1999),
107-118.
http://arxiv.org/abs/math/0611394
[19] H. Nawa and T. Ozawa, Nonlinear scattering with nonlocal interactions, Comm.
Math. Phys. 146(1992), 259-275.
[20] E. Ryckman and M. Visan, Global well-posedness and scattering for the defocusing
energy-critical nonlinear Schrödinger equation in R1+4. Amer. J. Math., 129(2007),
1-60.
[21] R. S. Strichartz, Restriction of Fourier tranform to quadratic surfaces and decay of
solutions of wave equations. Duke Math. J., 44(1977), 705-714.
[22] T. Tao, Global well-posedness and scattering for the higher-dimensional energy-
critical nonlinear Schrödinger equation for radial data. New York Journal of Math-
ematics, 11(2005), 57-80.
[23] M. Visan, The defocusing energy-critical nonlinear Schrödinger equation in higher
dimensions. to appear Duke Math. J..
[24] M. I. Weinstein, Nonlinear Schrödinger equations and sharp interpolation estimates.
Comm. Math. Phys., 87(1983), 567-576.
	Introduction
	Notations and basic estimates
	Local mass conservation and Morawetz inequality
	Local mass conservation
	A Morawetz inequality
	Local theory
	Perturbation result
	Global well-posedness
	References
ABSTRACT
  We consider the defocusing, $\dot{H}^1$-critical Hartree equation for the
radial data in all dimensions $(n\geq 5)$. We show the global well-posedness
and scattering results in the energy space. The new ingredient in this paper is
that we first take advantage of the term $\displaystyle - \int_{I}\int_{|x|\leq
A|I|^{1/2}}|u|^{2}\Delta \Big(\frac{1}{|x|}\Big)dxdt$ in the localized Morawetz
identity to rule out the possibility of energy concentration, instead of the
classical Morawetz estimate dependent of the nonlinearity.

<|endoftext|><|startoftext|>
Introduction
	The model
	COSMOLOGICAL CONSEQUENCES
	Special case with =0 
	General case with =0 
	Conclusions and Discussions
	Acknowledgments
	References
ABSTRACT
  We investigate the effect of the bulk content in the general Gauss-Bonnet
braneworld on the evolution of the universe. We find that the Gauss-Bonnet term
and the combination of the dark radiation and the matter content of the bulk
play a crucial role in the universe evolution. We show that our model can
describe the super-acceleration of our universe with the equation of state of
the effective dark energy in agreement with observations.

<|endoftext|><|startoftext|>
Introduction
The theory of free probability and free entropy was developed by Voiculescu from 1990s. It
played a crucial role in the recent study of finite von Neumann algebras (see [1], [3], [4], [5], [6],
[7], [8], [11], [14], [15], [23], [24], [25]). The analogue of free entropy dimension in C∗ algebra
context, the notion of topological free entropy dimension of of n−tuple of elements in a unital
C∗ algebra, was also introduced by Voiculescu in [26].
After introducing the concept of topological free entropy dimension of n-tuple of elements
in a unital C∗ algebra, Voiculescu discussed some of its properties including subadditivity and
change of variables in [26]. In this paper, we will add one basic property into the list: topological
free entropy dimension of one variable. More specifically, suppose x is a self-adjoint element in a
unital C∗ algebra A and σ(x) is the spectrum of x in A. Then topological free entropy dimension
of x is equal to 1− 1
where n is the cardinality of the set σ(x) (see Theorem 4.1).
In [26], Voiculescu showed that (i) if x1, . . . , xn is a family of free semicircular elements
in a unital C∗ algebra with a tracial state, then δtop(x1, . . . , xn) = n, where δtop(x1, . . . , xn) is
the topological free entropy dimension of x1, . . . , xn; (ii) if x1, . . . , xn is the universal n-tuple
of self-adjoint contractions, then δtop(x1, . . . , xn) = n. Except in these two cases, very few has
been known on the values of topological free entropy dimensions in other C∗ algebras. Using the
inequality between topological free entropy dimension and Voiculescu’s free dimension capacity,
we are able to obtain an estimation of upper-bound of topological free entropy dimension for a
1The second author is supported by an NSF grant.
http://arxiv.org/abs/0704.0667v3
unital C∗ algebra with a unique tracial state (see Theorem 5.1). The lower-bound of topological
free entropy dimension is also obtained for infinite dimensional simple unital C∗ algebra with a
unique tracial state (see Theorem 5.2). As a corollary, we know that the topological free entropy
dimension of any family of self-adjoint generators of an irrational rotation C∗ algebra or UHF
algebra or C∗red(F2)⊗minC∗red(F2) is equal to 1 (see Theorem 5.3, 5.4, 5.5). For these C∗ algebras,
the value of the topological free entropy dimension is independent of the choice of generators.
The rest of the paper is devoted to study another invariant associated to n-tuple of elements
in C∗ algebras. This invariant, called topological free orbit dimension, is an analogue of free
orbit dimension in finite von Neumann algebras (see [11]). We show that the topological free
orbit dimension of a self-adjoint element in a unital C∗ algebra is equal to, according to some
measurement, the packing dimension of the spectrum of x (see Theorem 7.1).
The organization of the paper is as follows. In the section 2, we recall the definition of
topological free entropy dimension. Some technical lemmas are proved in section 3. In section
4, we compute the topological free entropy dimension of one self-adjoint element in a unital
C∗ algebra. In section 5, we study the relationship between topological free entropy dimension
and free capacity dimension of a unital C∗ algebra. Then we show that topological free entropy
dimension of of any family of generators of an infinite dimensional simple unital C∗ algebra with
a unique tracial state is always greater than or equal to 1. The concept of topological free orbit
dimension of n-tuple of elements in a C∗ algebra is introduced in section 6. Its value for one
variable is computed in section 7.
2. Definitions and preliminary
In this section, we are going to recall Voiculescu’s definition of topological free entropy
dimension of n-tuple of elements in a unital C∗ algebra.
2.1. A Covering of a set in a metric space. Suppose (X, d) is a metric space and K is
a subset of X . A family of balls in X is called a covering of K if the union of these balls covers
K and the centers of these balls lie in K.
2.2. Covering numbers in complex matrix algebra (Mk(C))n. Let Mk(C) be the
k × k full matrix algebra with entries in C, and τk be the normalized trace on Mk(C), i.e.,
Tr, where Tr is the usual trace on Mk(C). Let U(k) denote the group of all unitary
matrices in Mk(C). Let Mk(C)n denote the direct sum of n copies of Mk(C). Let Ms.ak (C) be
the subalgebra of Mk(C) consisting of all self-adjoint matrices of Mk(C). Let (Ms.ak (C))n be
the direct sum of n copies of Ms.ak (C). Let ‖ · ‖ be an operator norm on Mk(C)n defined by
‖(A1, . . . , An)‖ = max{‖A1‖, . . . , ‖An‖}
for all (A1, . . . , An) in Mk(C)n. Let ‖ · ‖2 denote the trace norm induced by τk on Mk(C)n, i.e.,
‖(A1, . . . , An)‖2 =
1A1) + . . .+ τk(A
for all (A1, . . . , An) in Mk(C)n.
For every ω > 0, we define the ω-‖ · ‖-ball Ball(B1, . . . , Bn;ω, ‖ · ‖) centered at (B1, . . . , Bn)
in Mk(C)n to be the subset of Mk(C)n consisting of all (A1, . . . , An) in Mk(C)n such that
‖(A1, . . . , An)− (B1, . . . , Bn)‖ < ω.
Definition 2.1. Suppose that Σ is a subset of Mk(C)n. We define the covering number ν∞(Σ, ω)
to be the minimal number of ω-‖ · ‖-balls that consist a covering of Σ in Mk(C)n.
For every ω > 0, we define the ω-‖·‖2-ball Ball(B1, . . . , Bn;ω, ‖·‖2) centered at (B1, . . . , Bn)
in Mk(C)n to be the subset of Mk(C)n consisting of all (A1, . . . , An) in Mk(C)n such that
‖(A1, . . . , An)− (B1, . . . , Bn)‖2 < ω.
Definition 2.2. Suppose that Σ is a subset of Mk(C)n. We define the covering number ν2(Σ, ω)
to be the minimal number of ω-‖ · ‖2-balls that consist a covering of Σ in Mk(C)n.
2.3. Noncommutative polynomials. In this article, we always assume that A is a unital
C∗-algebra. Let x1, . . . , xn, y1, . . . , ym be self-adjoint elements inA. Let C〈X1, . . . , Xn, Y1, . . . , Ym〉
be the unital noncommutative polynomials in the indeterminates X1, . . . , Xn, Y1, . . . , Ym. Let
{Pr}∞r=1 be the collection of all noncommutative polynomials in C〈X1, . . . , Xn, Y1, . . . , Ym〉 with
rational complex coefficients. (Here “rational complex coefficients” means that the real and
imaginary parts of all coefficients of Pr are rational numbers).
Remark 2.1. We alsways assume that 1 ∈ C〈X1, . . . , Xn, Y1, . . . , Ym〉.
2.4. Voiculescu’s Norm-microstates Space. For all integers r, k ≥ 1, real numbers
R, ǫ > 0 and noncommutative polynomials P1, . . . , Pr, we define
(top)
R (x1, . . . , xn, y1, . . . , ym; k, ǫ, P1, . . . , Pr)
to be the subset of (Ms.ak (C))n+m consisting of all these
(A1, . . . , An, B1, . . . , Bm) ∈ (Ms.ak (C))n+m
satisfying
max{‖A1‖, . . . , ‖An‖, ‖B1‖, . . . , ‖Bm‖} ≤ R
|‖Pj(A1, . . . , An, B1, . . . , Bm)‖ − ‖Pj(x1, . . . , xn, y1, . . . , ym)‖| ≤ ǫ, ∀ 1 ≤ j ≤ r.
Remark 2.2. In the definition of norm-microstates space, we use the following assumption. If
Pj(x1, . . . , xn, y1, . . . , ym) = α0 · IA +
1≤i1,...,is≤n+m
αi1···iszi1 · · · zis
where z1, . . . , zn+m denotes x1, . . . , xn, y1, . . . , ym and α0, αi1···is are in C, then
Pj(A1, . . . , An, B1, . . . , Bm) = α0 · Ik +
1≤i1,...,is≤n+m
αi1···isZi1 · · ·Zis
where Z1, . . . , Zn+m denotes A1, . . . , An, B1, . . . , Bm and Ik is the identity matrix in Mk(C).
Remark 2.3. In the original definition of norm-microstates space in [26], the parameter R was
not introduced. Note the following observation: Let R > max{‖x1‖, . . . , ‖xn‖, ‖y1‖, . . . , ‖ym‖}.
When r is large enough so that
{X1, . . . , Xn, Y1, . . . , Ym} ⊂ {P1, . . . , Pr}
and 0 < ǫ < R−max{‖x1‖, . . . , ‖xn‖, ‖y1‖, . . . , ‖ym‖}, we have
(top)
R (x1, . . . , xn, y1, . . . , ym; k, ǫ, P1, . . . , Pr) = Γtop(x1, . . . , xn, y1, . . . , ym; k, ǫ, P1, . . . , Pr)
for all k ≥ 1, where Γ(top)(x1, . . . , xn, y1, . . . , ym; k, ǫ, P1, . . . , Pr) is the norm-microstates space
defined in [26]. Thus our definition agrees with the one in [26] for large R, r and small ǫ.
In the later sections, we need to construct the ultraproduct of some matrix algebras, it will
be convenient for us to include the parameter “R” in the definition of norm-microstate space.
Define the norm-microstates space of x1, . . . , xn in the presence of y1, . . . , ym, denoted by
(top)
R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr)
as the projection of Γ
(top)
R (x1, . . . , xn, y1, . . . , ym; k, ǫ, P1, . . . , Pr) onto the space (Ms.ak (C))n via
the mapping
(A1, . . . , An, B1, . . . , Bm) → (A1, . . . , An).
2.5. Voiculescu’s topological free entropy dimension (see [26]). Define
(top)
R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr), ω)
to be the covering number of the set Γ
(top)
R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr) by ω-‖ · ‖-balls
in the metric space (Ms.ak (C))n equipped with operator norm.
Definition 2.3. Define
δtop(x1, . . . , xn : y1, . . . , ym;ω)
= sup
ǫ>0,r∈N
lim sup
log(ν∞(Γ
(top)
R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr), ω))
−k2 log ω .
The topological entropy dimension of x1, . . . , xn in the presence of y1, . . . , ym is defined by
δtop(x1, . . . , xn : y1, . . . , ym) = lim sup
δtop(x1, . . . , xn : y1, . . . , ym;ω).
Remark 2.4. Let M > max{‖x1‖, . . . , ‖xn‖, ‖y1‖, . . . , ‖ym‖} be some positive number. By
Remark 2.3, we know
δtop(x1, . . . , xn : y1, . . . , ym)
= lim sup
ǫ>0,r∈N
lim sup
log(ν∞(Γ
(top)
M (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr), ω))
−k2 logω .
2.6. C∗ algebra ultraproduct and von Neumann algebra ultraproduct. Suppose
{Mkm(C)}∞m=1 is a sequence of complex matrix algebras where km goes to infinity when m
approaches infinity. Let γ be a free ultrafilter in β(N)\N. We can introduce a unital C∗ algebra
m=1Mkm(C) as follows:
Mkm(C) = {(Ym)∞m=1 | ∀ m ≥ 1, Ym ∈ Mkm(C) and sup
‖Ym‖ <∞}.
We can also introduce the norm closed two sided ideals I∞ and I2 as follows.
I∞ = {(Ym)∞m=1 ∈
Mkm(C) | lim
‖Ym‖ = 0}
I2 = {(Ym)∞m=1 ∈
Mkm(C) | lim
‖Ym‖2 = 0}
Definition 2.4. The C∗ algebra ultraproduct of {Mkm(C)}∞m=1 along the ultrfilter γ, denoted
m=1Mkm(C), is defined to be the quotient algebra of
m=1Mkm(C) by the ideal I∞. The
image of (Ym)
m=1 ∈
m=1Mkm(C) in the quotient algebra is denoted by [(Ym)m].
Definition 2.5. The von Neumann algebra ultraproduct of {Mkm(C)}∞m=1 along the ultrfilter
γ, also denoted by
m=1Mkm(C) if no confusion arises, is defined to be the quotient algebra of
m=1Mkm(C) by the ideal I2. The image of (Ym)∞m=1 ∈
m=1Mkm(C) in the quotient algebra
is denoted by [(Ym)m].
Remark 2.5. The von Neumann algebra ultraproduct
m=1Mkm(C) is a finite factor (see
[16]).
2.7. Topological free entropy dimension of elements in a non-unital C∗ algebra.
Topological free entropy dimension can also be defined for n-tuple of elements in a non-unital
C∗ algebra. Suppose that A is a non-unital C∗-algebra. Let x1, . . . , xn, y1, . . . , ym be self-adjoint
elements in A. Let C〈X1, . . . , Xn, Y1, . . . , Ym〉 ⊖ C be the noncommutative polynomials in the
indeterminates X1, . . . , Xn, Y1, . . . , Ym without constant terms. Let {Pr}∞r=1 be the collection
of all noncommutative polynomials in C〈X1, . . . , Xn, Y1, . . . , Ym〉 ⊖ C with rational complex
coefficients. Then norm-mocrostate space
(top)
R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr)
can be defined similarly as in section 2.4. So topological free entropy dimension
δtop(x1, . . . , xn : y1, . . . , ym)
can also be defined similarly as in section 2.5.
In the paper, we will focus on the case when A is a unital C∗ algebra.
3. Some technical lemmas
3.1. Suppose x is a self-adjoint element in a unital C∗ algebra A. Let σ(x) be the spectrum
of x in A.
Theorem 3.1. Let R > ‖x‖. For any ω > 0, we have the following.
(1) There are some integer n ≥ 1 and distinct real numbers λ1, λ2, · · · , λn in σ(x) satisfying
(i) |λi−λj | ≥ ω for all 1 ≤ i 6= j ≤ n; and (ii) for any λ in σ(x), there is some λj with
1 ≤ j ≤ n such that |λ− λj| ≤ ω.
(2) There are some r0 > 0 and ǫ0 > 0 such that the following holds: when r > r0, ǫ < ǫ0,
for any A in Γ
(top)
R (x; k, ǫ, P1, . . . , Pr), there are positive integers 1 ≤ k1, . . . , kn ≤ k with
k1 + k2 + · · ·+ kn = k and some unitary matrix U in Mk(C) satisfying
‖U∗AU −
λ1Ik1 0 · · · 0
0 λ2Ik2 · · · 0
· · · · · · . . . · · ·
0 0 · · · λnIkn
‖ ≤ 2ω,
where Ikj is the kj × kj identity matrix in Mkj (C) for 1 ≤ j ≤ n.
Proof. The proof of part (1) is trivial. We will only prove part (2). Assume that the result
in (2) does not hold. Then there is some ω > 0 so that the following holds: for all m ≥ 1, there
are km ≥ 1 and some Am in Γ(top)R (x; km, 1m , P1, . . . , Pm) such that
‖U∗AmU −
λ1Is1 0 · · · 0
0 λ2Is2 · · · 0
· · · · · · . . . · · ·
0 0 · · · λnIsn
‖ > 2ω, (∗)
for every 1 ≤ s1, . . . , sn ≤ km with s1 + · · ·+ sn = km and every unitary matrix U in Mkm(C).
Let γ be a free ultrafilter in β(N)\N. Let B =
m=1 Mkm(C) be the C∗ algebra ultraproduct
of {Mkm(C)}∞m=1 along the ultrafilter α, i.e.
m=1 Mkm(C) is the quotient algebra of the
C∗ algebra
m=1Mkm(C) by I∞, the 0-ideal of the norm ‖ · ‖, where I∞ = {(Am)∞m=1 ∈
Mkm(C) | limm→γ ‖Am‖ = 0}. Let a = [(U∗AmU)∞m=1] be a self-adjoint element in B. By
mapping x to a, there is a unital ∗-isomorphism from the C∗ subalgebra generated by {IA, x}
in A onto the C∗ subalgebra generated by {IB, a} in B. Thus σ(x) = σ(a). It is not hard to see
that Hausdorff-dist(σ(U∗AmU), σ(a)) → 0 as m goes to γ, which contradicts with the results in
part (1) and (∗).
3.2. In this subsection, we will use the following notation.
(i) Let n,m be some positive integers with n ≥ m.
(ii) Let δ, θ be some positive numbers.
(iii) Let {λ1, λ2, . . . , λm} ∪ {λm+1, . . . , λn} be a family of real numbers such that
|λi − λj | ≥ θ for all 1 ≤ i < j ≤ m.
(iv) Let k be a positive integer such that k − (n−m) is divided by m. We let
k − n +m
(v) We let
B = diag(λm+1, . . . , λn)
be a diagonal matrix in Mn−m(C) and
A = diag(λ1It, λ2It, . . . , λmIt, B)
be a block-diagonal matrix in Mk(C), where It is the identity matrix in Mt(C).
(vi) We let A be defined as above and
Ω(A) = {U∗AU | U is in U(k)}.
(vii) Assume that {eij}ki,j=1 is a canonical basis of Mk(C). We let
V1 = span{eij | |λ[ i
]+1 − λ[ j
]+1| ≥ θ, with 1 ≤ i, j < mt}; and
V2 = Mk(C)⊖ V1,
where [ i
], or [ j
], denotes the largest integer ≤ [ i
], or [ j
] respectively.
Lemma 3.1. We follow the notation as above. Suppose ‖U1AU∗1 − U2AU∗2‖2 ≤ δ for some
unitary matrices U1 and U2 in U(k). Then the following hold.
(1) There exists some S ∈ V2 such that ‖S‖2 ≤ 1 and
‖U1 − U2S‖2 ≤
(2) If n = m, then there is a unitary matrix W in V2 such that
‖U1 − U2W‖2 ≤
Proof. Assume that
U∗2U1 =
U11 U12 · · · U1,m+1
U21 U22 · · · U2,m+1
· · · · · · · · · · · ·
Um+1,1 Um+1,2 · · · Um+1,m+1
where Ui,j is a t × t matrix, Ui,m+1 a t × (n − m) matrix, Um+1,j a (n − m) × t matrix for
1 ≤ i, j ≤ m and Um+1,m+1 is a (n−m)× (n−m) matrix.
(1) Let
U11 0 · · · 0 U1,m+1
0 U22 · · · 0 U2,m+1
· · · · · · . . . · · · · · ·
0 0 · · · Um,m Um,m+1
Um+1,1 Um+1,2 · · · Um+1,m Um+1,m+1
It is easy to see that S is in V2, ‖S‖2 ≤ 1 and
δ2 ≥ ‖U1AU∗1 − U2AU∗2‖22 =
Tr((U∗2U1A− AU∗2U1)(U∗2U1A− AU∗2U1)∗)
1≤i 6=j≤m
Tr(|λi − λj|2UijU∗ij)
1≤i 6=j≤m
Tr(UijU
Hence
‖U1 − U2S‖22 = ‖U∗2U1 − S‖22 =
1≤i 6=j≤m
Tr(UijU
ij) ≤
It follows that
‖U1 − U2S‖2 ≤
(2) If n = m, then
V2 = Mt(C)⊕Mt(C)⊕ · · · ⊕Mt(C).
By the construction of S, we can assume S =WH is a polar decomposition of S in V2 for some
unitary matrix W and positive matrix H in V2. Again by the construction of S, we know that
‖S‖ ≤ 1, whence ‖H‖ ≤ 1. From the proven fact that ‖U∗2U1 − S‖2 ≤ δθ , we know that
‖H2 − I‖2 = ‖S∗S − I‖2 ≤
‖H − I‖2 ≤ ‖H2 − I‖2 ≤
It follows that
‖U1 − U2W‖2 ≤ ‖U1 − U2WH‖2 + ‖U2WH − U2W‖2 = ‖U1 − U2S‖2 + ‖H − I‖2 ≤
Lemma 3.2. We have the following results.
(1) For every U ∈ U(k), let
Σ(U) = {W ∈ U(k) | ∃ S ∈ V2 such that ‖S‖2 ≤ 1 and ‖W − US‖2 ≤
Then the volume of Σ(U) is bounded above by
µ(Σ(U)) ≤ (C1 · 4δ/θ)k
)2mt2+4m(n−m)t+2(n−m)2
where µ is the normalized Haar measure on the unitary group U(k) and C,C1 are some
constants independent of k, δ, θ.
(2) When n = m, for every U ∈ U(k), let
Σ̃(U) = {W ∈ U(k) | ∃ a unitary matrix W1 in V2 such that ‖W − UW1‖2 ≤
µ(Σ̃(U)) ≤ (C1 · 8δ/θ)k
Proof. (1) By computing the covering number of the set {S | S ∈ V2, such that ‖S‖2 ≤ 1}
by δ/θ-‖ · ‖2-balls in Mk(C), we know
ν2({S | S ∈ V2, ‖S‖2 ≤ 1},
)real dimension of of V2
)2mt2+4m(n−m)t+2(n−m)2
where C is a universal constant. Thus the covering number of the set Σ(U) by the 4δ/θ-‖·‖2-balls
in Mk(C) is bounded by
ν2(Σ(U),
) ≤ ν2({S | S ∈ V2, ‖S‖2 ≤ 1},
)2mt2+4m(n−m)t+2(n−m)2
But the ball of radius 4δ/θ in U(k) has the volume bounded by
µ(ball of radius 4δ/θ) ≤ (C1 · 4δ/θ)k
where C1 is a universal constant. Thus
µ(Σ(U)) ≤ (C1 · 4δ/θ)k
)2mt2+4m(n−m)t+2(n−m)2
(2) A slight adaption of the proof of part (1) gives us the proof of part (2). �
Lemma 3.3. Let Ω(A) be defined as in (vi) at the beginning of this subsection.
(1) The covering number of Ω(A) by the 1
δ-‖ · ‖2-balls in Mk(C) is bounded below by
ν2(Ω(A),
δ) ≥ (C1 · 4δ/θ)−k
)−(2mt2+4m(n−m)t+2(n−m)2)
(2) If n = m, then
ν2(Ω(A),
δ) ≥ (C1 · 8δ/θ)−k
)−mt2
Proof. (1) For every U ∈ U(k), define
Σ(U) = {W ∈ U(k) | ∃ S = S∗ ∈ V1, such that ‖S‖2 ≤ 1, ‖W − US‖2 ≤
By preceding lemma, we have
µ(Σ(U)) ≤ (C1 · 4δ/θ)k
)mt2+4m(n−m)t+2(n−m)2
A “parking” (or exhausting) argument will show the existence of a family of unitary elements
{Ui}Ni=1 ⊂ U(k) such that
N ≥ (C1 · 4δ/θ)−k
)−(mt2+4m(n−m)t+2(n−m)2)
Ui is not contained in ∪i−1j=1 Σ(Uj), ∀ i = 1, . . . , N.
From the definition of each Σ(Uj), it follows that
‖Ui − UjS‖2 >
, ∀ S ∈ V2, with ‖S‖2 ≤ 1, ∀1 ≤ j < i ≤ N.
By Lemma 3.1, we know that
‖UiAU∗i − UjAU∗j ‖2 > δ, ∀1 ≤ j < i ≤ N,
which implies that
ν2(Ω(A),
δ) ≥ N ≥ (C1 · 4δ/θ)−k
)−(mt2+4m(n−m)t+2(n−m)2)
(2) is similar as (1). �
3.3. We have following theorem.
Theorem 3.2. Let n ≥ m, δ, θ > 0 and {λ1, λ2, . . . , λm} ∪ {λm+1, . . . , λn} be a family of real
numbers such that
|λi − λj| ≥ θ
for all 1 ≤ i < j ≤ m. Let k be a positive integer such that k − (n−m) is divided by m and
k − n +m
B = diag(λm+1, . . . , λn)
be a diagonal matrix in Mn−m(C) and
A = diag(λ1It, λ2It, . . . , λmIt, B)
be a block-diagonal matrix in Mk(C), where It is the identity matrix in Mt(C). We let
Ω(A) = {U∗AU | U is in U(k)}.
Then the covering number of Ω(A) by the 1
δ-‖ · ‖-balls in Mk(C) is bounded below by
ν∞(Ω(A),
δ) ≥ (C1 · 4δ/θ)−k
)−(2mt2+4m(n−m)t+2(n−m)2)
where C,C1 are some universal constants.
When n = m, we have
ν∞(Ω(A),
δ) ≥ (C1 · 8δ/θ)−k
)−mt2
Proof. Note that
ν∞(Ω(A),
) ≥ ν2(Ω(A),
), ∀ δ > 0.
The result follows directly from preceding lemma. �
3.4. The following proposition, whose proof is skipped, is an easy extension of Lemma 3.3.
Proposition 3.1. Let m, k be some positive integers and θ, δ be some positive numbers. Let
T1, T2, . . . , Tm+1 is a partition of the set {1, 2, . . . , k}, i.e. ∪m+1i=1 Ti = {1, 2, . . . , k} and Ti∩Tj = ∅
for 1 ≤ i 6= j ≤ m+ 1. Let λ1, . . . , λk be some real numbers such that, if 1 ≤ j1 6= j2 ≤ m then
|λi1 − λi2 | > θ, ∀ i1 ∈ Tj1, i2 ∈ Tj2 .
Let A = diag(λ1, λ2, . . . , λk) be a self-adjoint matrix in Mk(C) and
Ω(A) = {U∗AU | U ∈ U(k)}
be a subset of Mk(C).
Let sj be the cardinality of the set Tj for 1 ≤ j ≤ m+ 1. Then the covering number of Ω(A)
by the 1
δ-‖ · ‖2-balls in Mk(C) is bounded below by
ν2(Ω(A),
δ) ≥ (C1 · 4δ/θ)−k
)−2s2
−···−2s2m+1−4(s1+···+sm)sm+1
where C,C1 are some universal constants.
4. Topological free entropy dimension of one variable
Suppose x is a self-adjoint element of a unital C∗ algebra A. In this section, we are going to
compute the topological entropy dimension of x.
4.1. Upperbound.
Proposition 4.1. Suppose x in A is a self-adjoint element with the spectrum σ(x). Then
δtop(x) ≤ 1−
where n is the cardinality of σ(x). Here we assume that 1
Proof. By [26], we know that the inequality always holds when n is infinity. We need only
to show that
δtop(x) ≤ 1−
when n <∞.
Assume that λ1, . . . , λn are in the spectrum of x in A.
Let R > ‖x‖. By Theorem 3.1, for every ω > 0, there are r0 > 0 and ǫ0 > 0 such that, for
all r > r0, ǫ < ǫ0,
A ∈ Γ(top)R (x; k, ǫ, P1, . . . , Pr),
there are some 1 ≤ k1, . . . , kn ≤ k, with k1 + · · · + kn = k and a unitary matrix U in Mk(C)
satisfying
λ1Ik1 0 · · · 0
0 λ2Ik2 · · · 0
0 0 · · · λnIkn
≤ 2ω. (∗∗)
Ω(k1, . . . , kn) =
λ1Ik1 0 · · · 0 0
0 λ2Ik2 · · · 0 0
0 0 · · · λn−1Ikn−1 0
0 0 · · · 0 λnIkn
U∗ | U is in Uk
By Corollary 12 in [21] or Theorem 3 in [2], the covering number of Ω(k1, . . . , kn−1, kn) by
ω-‖ · ‖-balls in Mk(C) is upperbounded by
ν∞(Ω(k1, . . . , kn−1, kn), ω) ≤
where C2 is a constant which does not depend on k, k1, . . . , kn (may depend on n and ‖x‖).
Let I be the set consisting of all these (k1, . . . , kn) in Zn such that 1 ≤ k1, . . . , kn ≤ k and
k1 + · · ·+ kn = k. Then the cardinality of the set I is equal to
(k − 1)!
(n− 1)!(k − n)! .
Note that
k2i ≥ k2/n
for all 1 ≤ k1, . . . , kn ≤ k with k1 + · · ·+ kn = k; and by (∗∗)
(top)
R (x; k, ǫ, P1, . . . , Pr)
is contained in 2ω-neighborhood of the set
(k1,...,kn)∈I
Ω(k1, . . . , kn).
It follows that the covering number of the set
(top)
R (x; k, ǫ, P1, . . . , Pr)
by 3ω-‖ · ‖-balls in Mk(C) is upperbounded by
(top)
R (x; k, ǫ, P1, . . . , Pr), 3ω) ≤
(k − 1)!
(n− 1)!(k − n)! ·
)k2−k2/n
δtop(x) ≤ lim sup
lim sup
(k−1)!
(n−1)!(k−n)!
)k2−k2/n
−k2 log(3ω) = 1−
4.2. Lower-bound. We follow the notation from last subsection.
Proposition 4.2. Suppose that x is a self-adjoint element with the finite spectrum σ(x) in A.
δtop(x) ≥ 1−
where n is the cardinality of the set σ(x).
Proof. Suppose that λ1, . . . , λn are distinct spectrum of x. There is some positive number
θ such that
|λi − λj| > θ, ∀ 1 ≤ i 6= j ≤ n.
Assume k = nt for some positive integer t. Let
Ak = diag(λ1It, . . . , λnIt)
be a diagonal matrix in Mk(C) where It is the t× t identity matrix. It is easy to see that, for
all R > ‖x‖, r ≥ 1 and ǫ > 0, we have
Ak ∈ Γ(top)R (x; k, ǫ, P1, . . . , Pr).
For any ω > 0, applying Theorem 3.2 for n = m and δ = 1
ω, we have
(top)
R (x; k, ǫ, P1, . . . , Pr), ω) ≥ (C1 · 8δ/θ)−k
)−mt2
= (16C1ω/θ)
−k2 ·
)−mt2
Note that k = nt = mt and θ is some fixed number. A quick computation shows that
δtop(x) ≥ 1−
Proposition 4.3. Suppose that x is a self-adjoint element in A with infinite spectrum. Then
δtop(x) ≥ 1.
Proof. For any 0 < θ < 1, there are λ1, . . . , λm in the spectrum of x, σ(x), satisfying (i)
|λi − λj | ≥ θ;
and (ii) for any λ in σ(x), there is some λj with |λ − λj| ≤ θ. By functional calculus, for any
R > ‖x‖, r ≥ 1 and ǫ > 0, there are some positive integer n ≥ m and real numbers λm+1, . . . , λn
in σ(x) satisfying: for every t ≥ 1 the matrix
A = diag(λ1It, λ2It, . . . , λmIt, λm+1, . . . , λn)
is in
(top)
R (x; k, ǫ, P1, . . . , Pr),
where we assume that k = mt + n−m. For any ω > 0, let δ = 1
ω. By Theorem 3.2, we know
(top)
R (x; k, ǫ, P1, . . . , Pr), ω) ≥ (C1 · 4δ/θ)−k
)−(2mt2+4m(n−m)t+2(n−m)2)
lim sup
log(ν∞(Γ
(top)
R (x; k, ǫ, P1, . . . , Pr), ω))
−k2 logω ≥
log(4C1
)− log θ
+ 1 +
log(2C) + log θ
log ω
Then,
δtop(x) ≥ 1−
When θ goes to 0, m goes to infinity as σ(x) has infinitely many elements. Therefore,
δtop(x) ≥ 1.
4.3. Topological free entropy dimension in one variable case. By Proposition 4.1,
Proposition 4.2 and Proposition 4.3, we have the following result.
Theorem 4.1. Suppose x is a self-adjoint element in a unital C∗ algebra A. Then
δtop(x) = 1−
where n is the cardinality of the set σ(x) and σ(x) is the set of spectrum of x in A. Here we
assume that 1
5. Topological free entropy dimension of n-tuple in unital C∗ algebras
5.1. An equivalent definition of topological free entropy dimension. Suppose that
A is a unital C∗ algebra and x1, . . . , xn, y1, . . . , ym are self-adjoint elements in A. For every
R, ǫ > 0 and positive integers r, k, let
(top)
R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr)
be Voiculescu’s norm-microstate space defined in section 2.4.
Define
(top)
R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr), ω)
to be the covering number of the set Γ
(top)
R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr) by ω-‖·‖2-balls
in the metric space (Ms.ak (C))n equipped with trace norm (see Definition 2.2).
Definition 5.1. Define
δ̃top(x1, . . . , xn : y1, . . . , ym;ω)
= sup
ǫ>0,r∈N
lim sup
log(ν2(Γ
(top)
R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr), ω))
−k2 logω
δ̃top(x1, . . . , xn : y1, . . . , ym) = lim sup
δ̃top(x1, . . . , xn : y1, . . . , ym;ω)
The following proposition was pointed out by Voiculescu in [26]. For the sake of complete-
ness, we also include a proof here.
Proposition 5.1. Suppose that A is a unital C∗ algebra and x1, . . . , xn, y1, . . . , ym are self-
adjoint elements in A. Then
δ̃top(x1, . . . , xn : y1, . . . , ym) = δtop(x1, . . . , xn : y1, . . . , ym),
where δtop(x1, . . . , xn : y1, . . . , ym) is the topological free entropy dimension of x1, . . . , xn in
presence of y1, . . . , ym.
Proof. This is an easy consequence of Lemma 1 in [21]. Let λ be the Lebesgue measure
on (Ms.ak (C))n. Let, for every ω > 0,
B∞(ω) = {(A1, . . . , An) ∈ (Ms.ak (C))n | ‖(A1, . . . , An)‖ ≤ ω}
B2(ω) = {(A1, . . . , An) ∈ (Ms.ak (C))n | ‖(A1, . . . , An)‖2 ≤ ω}
It follows from the results in [21] or Theorem 8 in [2] that, for some M1,M2 independent of k, ω
such that
λ(B∞(1)) ≤ λ(B∞(ω/4))
λ(B2(2
nω)) ≤ λ(B2(1)). (5.1.1)
For every ω > 0 and any subset set K of (Ms.ak (C))n , let
K(ω, ‖ · ‖) = {(A1, . . . , An) ∈ (Ms.ak (C))n | ‖(A1, . . . , An)− (D1, . . . , Dn)‖ ≤ ω
for some (D1, . . . , Dn) ∈ K}
K(ω, ‖ · ‖2) = {(A1, . . . , An) ∈ (Ms.ak (C))n | ‖(A1, . . . , An)− (D1, . . . , Dn)‖2 ≤ ω
for some (D1, . . . , Dn) ∈ K}
Note the following fact:
‖(A1, . . . , An)‖2 ≤
n‖(A1, . . . , An)‖, ∀ (A1, . . . , An) ∈ (Ms.ak (C))n.
It follows from Lemma 1 in [21] that
ν∞(K,ω) ≤
λ(K(ω, ‖ · ‖))
λ(B∞(ω/4))
nω, ‖ · ‖2))
λ(B2(2
≤ ν2(K(
nω, ‖ · ‖2), 2
Combining with the equalities (5.1.1), we get
ν∞(K,ω) ≤
λ(K(ω, ‖ · ‖))
λ(B∞(ω/4))
λ(K(ω, ‖ · ‖))
λ(B∞(1))
nω, ‖ · ‖2))
λ(B∞(1))
nω, ‖ · ‖2))
λ(B2(1))
≤ λ(K(
nω, ‖ · ‖2))
λ(B2(2
≤ ν2(K(
nω, ‖·‖2), 2
nω) ≤ ν2(K,
Therefore, we have
ν2(K,
nω) ≤ ν∞(K,ω) ≤
λ(B2(1))
λ(B∞(1))
ν2(K,
It is a well-known fact (for example see Theorem 8 in [2]) that
λ(B2(1))
λ(B∞(1))
≤ Cnk23
for some universal constant C3 > 0. Hence
ν2(K,
nω) ≤ ν∞(K,ω) ≤
nM1C3
ν2(K,
Let K be Γ
(top)
R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr). By the definitions of δ̃top and δtop, we
δ̃top(x1, . . . , xn : y1, . . . , ym) = δtop(x1, . . . , xn : y1, . . . , ym).
5.2. Upper-bound of topological free entropy dimension in a unital C∗ algebra.
Let us recall Voiculescu’s definition of free dimension capacity in [26].
Definition 5.2. Suppose that A is a unital C∗ algebra with a family of self-adjoint generators
x1, . . . , xn. Suppose that TS(A) is the set consisting of all tracial states of A. If TS(A) 6= ∅,
define Voiculescu’s free dimension capacity κδ(x1, . . . , xn) of x1, . . . , xn as follows,
κδ(x1, . . . , xn) = sup
τ∈TS(A)
δ0(x1, . . . , xn : τ),
where δ0(x1, . . . , xn : τ) is Voiculescu’s (von Neumann algebra) free entropy dimension of
x1, . . . , xn in 〈A, τ〉.
The relationship between topological free entropy dimension of a unital C∗ algebra with a
unique tracial state and its free dimension capacity is indicated by the following result.
Theorem 5.1. Suppose that A is a unital C∗ algebra with a family of self-adjoint generators
x1, . . . , xn. Suppose that TS(A) is the set consisting of all tracial states of A. If TS(A) is a set
with a single element, then
δtop(x1, . . . , xn) ≤ κδ(x1, . . . , xn).
To prove the preceding theorem, we need the following lemma.
Sublemma 5.2.1. Suppose that A is a unital C∗ algebra with a family of self-adjoint generators
x1, . . . , xn. Suppose that TS(A) 6= ∅ is the set consisting of all tracial states of A. Let R >
max{‖x1‖, . . . , ‖xn‖} be some positive number. Then for any m ≥ 1, there is some rm ∈ N such
(top)
R (x1, . . . , xn; k,
, P1, . . . , Prm) ⊆ ∪τ∈TS(A)ΓR(x1, . . . , xn; k,m,
; τ), ∀ k ≥ 1
where ΓR(x1, . . . , xn; k,m,
; τ) is microstate space of x1, . . . , xn with respect to τ (see [23]).
Proof of Sublemma 5.2.1: We will prove the result by contradiction. Suppose, to the
contrary, there is some m0 ≥ 1 so that following holds: for any r ∈ N, there are some kr ≥ 1
and some
1 , A
2 , . . . , A
n ) ∈ Γ
(top)
R (x1, . . . , xn; kr,
, P1, . . . , Pr)
satisfying
1 , A
2 , . . . , A
n ) /∈ ∪τ∈TS(A)ΓR(x1, . . . , xn; kr, m,
; τ). (5.2.1)
Let α be a free ultrafilter in β(N)\N. Let N =
r=1Mkr(C) be the von Neumann algebra ultra-
product of {Mkr(C)}∞r=1 along the ultrafilter α, i.e.
r=1Mkr(C) is the quotient algebra of the
C∗ algebra
r=1Mkr(C) by I2, the 0-ideal of the trace τα, where τα((Ar)∞r=1) = limr→α
Tr(Ar)
Let, for each 1 ≤ j ≤ n, aj = [(A(j)r )∞r=1] be a self-adjoint element in N . By mapping xj to aj ,
there is a unital ∗-homomorphism ψ from the C∗ algebra A onto the C∗ subalgebra generated
by {a1, . . . , an} in N .
Let τ0 be the tracial state on A which is induced by τα on ψ(A), i.e.
τ0(x) = τα(ψ(x)), ∀ x ∈ A.
It follows that when r is large enough,
1 , A
2 , . . . , A
n ) ∈ ΓR(x1, . . . , xn; kr, m,
; τ0),
which contradicts with the inequality (5.2.1). This complete the proof. �
Proof of Theorem 5.1: Let R > max{‖x1‖, . . . , ‖xn‖}. Let τ be the unique trace of
A. By Sublemma 5.2.1, for any m ≥ 1, there is r ∈ N such that
(top)
R (x1, . . . , xn; k,
, P1, . . . , Pr) ⊆ ΓR(x1, . . . , xn; k,m,
; τ), ∀ k ≥ 1.
Therefore, for any 1 > ω > 0, we have
(top)
R (x1, . . . , xn; k,
, P1, . . . , Pr), ω) ≤ ν2(ΓR(x1, . . . , xn; k,m,
; τ), ω), ∀ k ≥ 1.
Now it is easy to check that
δ̃top(x1, . . . , xn) ≤ δ(x1, . . . , xn; τ) = κδ(x1, . . . , xn).
By Proposition 5.1, we know that
δtop(x1, . . . , xn) ≤ κδ(x1, . . . , xn).
Remark 5.1. Combining Theorem 5.1 with the results in [11] or [14], we will be able to compute
the upper-bound of topological free entropy dimension for a large class of unital C∗ algebras. For
example, δtop(x1, . . . , xn) ≤ 1 if x1, . . . , xn is a family of self-adjoint operators that generates an
irrational rotation algebra A.
5.3. Lower-bound of topological free entropy dimension in a unital C∗ algebra.
In this subsection, we assume that A is a finitely generated, infinite dimensional, unital simple
C∗ algebra with a unique tracial state τ . Assume that x1, . . . , xn is a family of self-adjoint
generators of A. Let H be the Hilbert space L2(A, τ). Without loss of generality, we might
assume that A is faithfully represented on the Hilbert space H . Let M be the von Neumann
algebra generated by A on H . It is not hard to see that M is a diffuse von Neumann algebra
with a tracial state τ .
For each positive integer m, there is a family of mutually orthogonal projections p1, . . . , pm
in M such that τ(pj) = 1/m for 1 ≤ j ≤ m. Let
ym = 1 · p1 + 2 · p2 + · · ·+m · pm =
j · pj ∈ M.
Let {Pr(x1, . . . , xn)}∞r=1 be defined as in section 2.3. Thus {Pr(x1, . . . , xn)}∞r=1 is dense in M
with respect to the strong operator topology. Hence, for each m ≥ 1, there is some self-adjoint
element Prm(x1, . . . , xn) in A such that
‖ym − Prm(x1, . . . , xn)‖2 ≤
where ‖a‖2 =
τ(a∗a) for all a ∈ M.
Lemma 5.1. Let A be finitely generated, infinite dimensional, unital simple C∗ algebra with a
unique tracial state τ . Assume that x1, . . . , xn is a family of self-adjoint generators of A. Let
H, M be defined as above. For each m ≥ 1, let ym and Prm(x1, . . . , xn) be chosen as above.
δtop(x1, . . . , xn) ≥ δtop(Prm(x1, . . . , xn) : x1, . . . , xn).
Proof. Let R > max{‖Prm(x1, . . . , xn)‖, ‖x1‖, . . . , ‖xn‖}. There exists a positive constant
D > 1 such that
‖Prm(A1, . . . , An)− Prm(B1, . . . , Bm)‖ ≤ D‖(A1, . . . , An)− (B1, . . . , Bm)‖
for all A1, . . . , An, B1, . . . , Bn in Mk(C) satisfying 0 ≤ ‖A1‖, . . . , ‖An‖, ‖B1‖, . . . , ‖Bn‖ ≤ R.
Then it is not hard to verify that, for ω > 0,
(top)
R (Prm(x1, . . . , xn) : x1, . . . , xn; k, ǫ, P1, . . . , Pr), ω)
≤ ν∞(Γ(top)R (x1, . . . , xn; k, ǫ, P1, . . . , Pr),
for each r ≥ rm and ǫ < ω4 . By definition of δtop and Remark 2.3, we have
δtop(Prm(x1, . . . , xn) : x1, . . . , xn) ≤ δtop(x1, . . . , xn).
Definition 5.3. Suppose A is a unital C∗ algebra and x1, . . . , xn is a family of self-adjoint ele-
ments of A that generates A as a C∗ algebra. If for any R > max{‖x1‖, . . . , ‖xn‖, ‖y1‖, . . . , ‖ym‖},
r > 0, ǫ > 0, there is a sequence of positive integers k1 < k2 < · · · such that
(top)
R (x1, . . . , xn : y1, . . . , ym; ks, ǫ, P1, . . . , Pr) 6= ∅, ∀ s ≥ 1
then A is called having approximation property.
Lemma 5.2. Let A be a finitely generated, infinite dimensional, unital simple C∗ algebra with a
unique tracial state τ . Assume that A has approximation property. Assume that x1, . . . , xn is a
family of self-adjoint generators of A. Let H, M be defined as above. Let m be a positive integer.
Let ym and Prm(x1, . . . , xn) be chosen as above. Let R > max{‖Prm(x1, . . . , xn)‖, ‖x1‖, . . . , ‖xn‖}.
Then there is some positive integer r > rm so that the following hold: ∀ k ≥ 1, if
(B,A1, . . . , An) ∈ Γ(top)R (Prm(x1, . . . , xn), x1, . . . , xn; k,
, P1, . . . , Pr),
then there are some 1 ≤ k1, . . . , km ≤ k with 1m −
for each 1 ≤ j ≤ m and
k1 + · · ·+ km = k, and a unitary matrix U in U(k) satisfying
‖B − U
1 · Ik1 0 · · · 0
0 2 · Ik2 · · · 0
· · · · · · . . . · · ·
0 0 · · · m · Ikm
U∗‖2 ≤
Proof. We will prove the result by contradiction. Assume, to the contrary, for all r ≥ rm
there are some kr ≥ 1 and some
(B(r), A
1 , . . . , A
n ) ∈ Γ
(top)
R (Prm(x1, . . . , xn), x1, . . . , xn; kr,
, P1, . . . , Pr),
satisfying
‖B(r) − U
1 · Is1 0 · · · 0
0 2 · Is2 · · · 0
· · · · · · . . . · · ·
0 0 · · · m · Ism
U∗‖2 >
, (5.3.1)
for all 1 ≤ s1, . . . , sm ≤ kr with 1m −
for each 1 ≤ j ≤ n and s1+ · · ·+ sm = kr,
and all unitary matrix U in U(k).
Let α be a free ultrafilter in β(N) \N. Let N =
r=1Mkr(C) be the von Neumann algebra
ultraproduct of {Mkr(C)}∞r=1 along the ultrafilter α, i.e.
r=1Mkr(C) is the quotient of the
C∗ algebra
r=1Mkr(C) by I2, the 0-ideal of the trace τα, where τα((Ar)∞r=1) = limr→α
Tr(Ar)
Let, for each 1 ≤ j ≤ n, aj = [(A(j)r )∞r=1] be a self-adjoint element in N . By mapping xj to aj ,
there is a unital ∗-homomorphism ψ from the C∗ algebra A onto the C∗ subalgebra generated
by {a1, . . . , an} in N . Since A is a simple C∗ algebra and ψ(IA) = IN , ψ actually is a ∗-
isomorphism. Since A has a unique trace τ , ψ induces a ∗-isomorphism (still denoted by ψ)
from M onto the von Neumann subalgebra generated by a1, . . . , an in N . Therefore,
‖ym − Prm(x1, . . . , xn)‖2 = ‖ψ(ym)− Prm(a1, . . . , an)‖2,τα ≤
This contradicts with the definition of ym and inequality (5.3.1). �
The following lemma is well-known (for example, see Lemma 4.1 in [23]).
Lemma 5.3. Suppose A, or B, is a self-adjoint matrix in Ms.a.k (C) with a list of eigenvalues
λ1 ≤ λ2 ≤ · · · ≤ λk, or µ1 ≤ µ2 ≤ · · · ≤ µk respectively. Then
|λj − µj |2 ≤ Tr((A− UBU∗)2),
where U is any unitary matrix in U(k).
Lemma 5.4. Let r,m be some positive integer with 4 < m < r. Suppose k1, . . . , km is a family
of positive integers such that 1
for all 1 ≤ j ≤ k and k1 + · · ·+ km = k. If A
is a self-adjoint matrix in Mk(C) such that, for some unitary matrix U in U(k),
‖A− U
1 · Ik1 0 · · · 0
0 2 · Ik2 · · · 0
· · · · · · . . . · · ·
0 0 · · · m · Ikm
U∗‖2 ≤
then, for any ω > 0 we have
ν2(Ω(A), ω) ≥ (8C1ω)−k
for some constants C1, C > 1 independent of k, ω, where
Ω(A) = {W ∗AW | W ∈ U(k)}.
Proof. Suppose that λ1 ≤ λ2 ≤ . . . ≤ λk are the eigenvalues of A. For each 1 ≤ j ≤ m, let
Tj = {i ∈ N | (
kt) + 1 ≤ i ≤
kt and |λi − j| ≤
T̂j = {(
kt) + 1, (
kt) + 2, · · · ,
kt} \ Tj ,
here we assume that k0 = 0. Let B = diag(1 · Ik1, · · · , m · Ikm) be a diagonal matrix in Mk(C).
By Lemma 5.3, we have
≥ Tr((A− UBU∗)2) ≥
i ∈T̂j
|λi − j|2 ≥
card(T̂j),
where card(T̂j) is the cardinality of the set T̂j . Thus
card(T̂j) ≤
, for 1 ≤ j ≤ m.
Let sj = card(Tj) for 1 ≤ j ≤ m, whence
≥ kj ≥ sj = kj − card(T̃j) ≥ kj −
, ∀ 1 ≤ j ≤ m.
Tm+1 = {1, 2, . . . , k} \ (∪nj=1Tj)
and sm+1 be the cardinality of the set Tm+1. Thus
sm+1 = k − s1 − · · · − sm =
card(T̂j) ≤
It is not hard to see that T1, . . . , Tm+1 is a partition of the set {1, 2, . . . , k}. Moreover, if
1 ≤ j1 6= j2 ≤ m then for any
i1 ∈ Tj1 , and i2 ∈ Tj2
we have
|λi1 − λi2| ≥ |j2 − j1| − |λi2 − j2| − |λi1 − j1| ≥ 1−
Applying Proposition 3.1 for such T1, . . . , Tm, Tm+1, θ = 1/2 and ω = δ/2, we have
ν2(Ω(A), ω) ≥ (8C1ω)−k
)−2s2
−···−2s2m−2s
−4(s1+···+sm)sm+1
≥ (8C1ω)−k
)−2(k2
+···+k2m+(
)2+2k· 4k
≥ (8C1ω)−k
)−2(( k
)2+···+( k
)2+ 16k
≥ (8C1ω)−k
for some constants C,C1 > 1 independent of k, ω.
Lemma 5.5. Let A be a finitely generated, infinite dimensional, simple unital C∗ algebra with a
unique tracial state τ . Assume that A has approximation property. Assume that x1, . . . , xn is a
family of self-adjoint generators of A. Let H, M be defined as above. Let m be a positive integer.
Let ym and Prm(x1, . . . , xn) be chosen as above. Let R > max{‖Prm(x1, . . . , xn)‖, ‖x1‖, . . . , ‖xn‖}.
When r is large enough and ǫ is small enough, for any ω > 0, we have
(top)
R (Prm(x1, . . . , xn) : x1, . . . , xn; k, ǫ, P1, . . . , Pr), ω) ≥ (8C1ω)−k
Proof. By Lemma 5.2, when r is large enough and ǫ is small enough, the following hold:
∀ k ≥ 1, if
(B,A1, . . . , An) ∈ Γ(top)R (Prm(x1, . . . , xn), x1, . . . , xn; k, ǫ, P1, . . . , Pr),
then there are some 1 ≤ k1, . . . , km ≤ k with 1m −
for each 1 ≤ j ≤ m and
k1 + · · ·+ km = k, and a unitary matrix U in U(k) satisfying
‖B − U
1 · Ik1 0 · · · 0
0 2 · Ik2 · · · 0
· · · · · · . . . · · ·
0 0 · · · m · Ikm
U∗‖2 ≤
Combining with Lemma 5.4, we know that if
B ∈ Γ(top)R (Prm(x1, . . . , xn) : x1, . . . , xn; k, ǫ, P1, . . . , Pr)
then, for any ω > 0,
ν2(Ω(B), ω) ≥ (8C1ω)−k
where
Ω(B) = {W ∗BW | W ∈ U(k)}.
Note that Ω(B) ⊂ Γ(top)R (Prm(x1, . . . , xn) : x1, . . . , xn; k, r, ǫ). It follows that, for any ω > 0,
(top)
R (Prm(x1, . . . , xn) : x1, . . . , xn; k, r, ǫ), ω) ≥ (8C1ω)−k
−56k2
Now we have the following result.
Theorem 5.2. Let A be a finitely generated, infinite dimensional, simple unital C∗ algebra with
a unique tracial state τ . Assume that x1, . . . , xn is a family of self-adjoint generators of A. If
A has approximation property, then δtop(x1, . . . , xn) ≥ 1.
Proof. Let H be the Hilbert space L2(A, τ). Without loss of generality, we might assume
that A is faithfully represented on the Hilbert space H . Let M be the von Neumann algebra
generated by A on H . It is not hard to see that M is a diffuse von Neumann algebra with a
tracial state τ . For each positive integer m, there is a family of mutually orthogonal projections
p1, . . . , pm in M such that τ(pj) = 1/m for 1 ≤ j ≤ m. Let
ym = 1 · p1 + 2 · p2 + · · ·+m · pm =
j · pj.
Let {Pr(x1, . . . , xn)}∞r=1 be defined as in section 2.3. Thus {Pr(x1, . . . , xn)}∞r=1 is dense in M
with respect to the strong operator topology. Hence, for each m ≥ 1, there is some self-adjoint
element Prm(x1, . . . , xn) in A such that
‖ym − Prm(x1, . . . , xn)‖2 ≤
By Lemma 5.5, for any ω > 0, when r is large enough and ǫ is small enough, we have for
some constants C1, C > 1 independent of k, ω
(top)
R (Prm(x1, . . . , xn) : x1, . . . , xn; k, ǫ, P1, . . . , Pr), ω) ≥ (8C1ω)−k
Therefore,
δ̃top(Prm(x1, . . . , xn) : x1, . . . , xn) ≥ 1−
By Proposition 5.1, we get
δtop(Prm(x1, . . . , xn) : x1, . . . , xn) ≥ 1−
By Lemma 5.1,
δtop(x1, . . . , xn) ≥ 1−
Since m is an arbitrary positive integer, we obtain
δtop(x1, . . . , xn) ≥ 1.
5.4. Values of topological free entropy dimensions in some unital C∗ algebras. In
this subsection, we are going to compute the values of topological free entropy dimensions in
some unital C∗ algebras by using the results from preceding subsection.
Theorem 5.3. Let Aθ be an irrational rotation C∗ algebra. Then
δtop(x1, . . . , xn) = 1
where x1, . . . , xn is a family of self-adjoint operators that generates Aθ.
Proof. Note that Aθ is an infinite dimensional, unital simple C∗ algebra with a unique
tracial state τ . By [24] or [11] and Theorem 5.1, we know that
δtop(x1, . . . , xn) ≤ 1.
It follows from [18] that Aθ has approximation property. Therefore
δtop(x1, . . . , xn) ≥ 1.
Hence
δtop(x1, . . . , xn) = 1.
Theorem 5.4. Let A be a UHF algebra (uniformly hyperfinite C∗ algebra). Then
δtop(x1, . . . , xn) = 1
where x1, . . . , xn is a family of self-adjoint operators that generates A.
Proof. By [17], we know that A is generated by two self-adjoint elements. It is not hard
to see that A is an infinite dimensional, unital simple C∗ algebra with a unique tracial state τ .
By [24] or [11] and Theorem 5.1, we know that
δtop(x1, . . . , xn) ≤ 1.
It is easy to check that A has approximation property. Therefore
δtop(x1, . . . , xn) ≥ 1.
Hence
δtop(x1, . . . , xn) = 1.
Recall that for any sequence (Am)∞m=1 of C∗ algebras,we can introduce two C∗ algebras
Am = {(am)∞m=1 | am ∈ Am, sup
‖am‖ <∞}
Am = {(am)∞m=1 | am ∈ Am, lim
‖am‖ = 0}
The norm in the quotient C∗ algebra
m Am/
mAm is given by
‖ρ((am)∞m=1)‖ = lim sup
‖xm‖,
where ρ is the quotient map from
mAm onto
If A is an exact C∗ algebra, then the sequence
0 → A⊗min
Mm(C) → A⊗min
Mm(C) → A⊗min (
Mm(C)/
Mm(C)) → 0
is exact. Therefore, we have the following natural identification
A⊗min (
Mm(C)/
Mm(C)) = (A⊗min
Mm(C))/(A⊗min
Mm(C)).
On the other hand, we have the following natural embedding
A⊗min
Mm(C) ⊆
Mm(A)
and the identification
A⊗min
Mm(C) =
Mm(A)
Thus we have for any exact C∗ algebra A a natural embedding
ψ : A⊗min (
Mm(C)/
Mm(C) ⊆
Mm(A)/
Mm(A).
Lemma 5.6. Suppose that A and B are unital C∗ algebras and ρ is an unital embedding
ρ : A →
Mm(B)/
Mm(B).
Suppose that x1, . . . , xn is a family of elements in A. Suppose r is a positive integer and
{Pj(x1, . . . , xn)}rj=1 is a family of noncommutative polynomials of x1, . . . , xn. Then there are
some k ∈ N and a(k)1 , . . . , a
n in Mk(B) so that
|‖Pj(a(k)1 , . . . , a(k)n )‖ − ‖Pj(x1, . . . , xn)‖| ≤
, ∀ 1 ≤ j ≤ r.
Proof. We might assume that
ρ(xi) = [(x
i )m] ∈
Mm(B)/
Mm(B), ∀ 1 ≤ i ≤ n.
By the definition of
mMm(B)/
mMm(B), there are some positive integers m1 ≤ m2 such
|( sup
m1≤l≤m2
‖Pj(x(l)1 , . . . , x(l)n )‖)− ‖Pj(x1, . . . , xn)‖| ≤
, ∀ 1 ≤ j ≤ r.
Let k =
j and
i = ⊕m2l=m1x
i ∈ Mk(B), ∀ 1 ≤ i ≤ n.
Then, it is not hard to check that
|‖Pj(a(k)1 , . . . , a(k)n )‖ − ‖Pj(x1, . . . , xn)‖| ≤
, ∀ 1 ≤ j ≤ r.
Theorem 5.5. Let p ≥ 2 be a positive integer and Fp be the free group on p generators. Let
C∗red(Fp)⊗minC∗red(Fp) be a minimal tensor product of two reduced C∗ algebras of free groups Fp.
δtop(x1, . . . , xn) = 1,
where x1, . . . , xn is any family of self-adjoint generators of C
red(Fp)⊗min C∗red(Fp).
Proof. Note that C∗red(Fp)⊗minC∗red(Fp) is an infinite dimensional, unital simple C∗ algebra
with a unique tracial state. By the result from [5] or [11] and Theorem 5.1, Theorem 5.2, to
show δtop(x1, . . . , xn) = 1, we need only to show that C
red(Fp)⊗min C∗red(Fp) has approximation
property. Therefore, it suffices to show the following: Let R > max{‖x1‖, . . . , ‖xn‖}. For any
r ≥ 1, there is some k ∈ N so that
(top)
R (x1, . . . , xn; k,
, P1, . . . , Pr) 6= ∅.
By the result from [9], we know there is a unital embedding
φ1 : C
red(Fp) →
Mm(C)/
Mm(C),
which induce a unital embedding
φ2 : C
red(Fp)⊗min C∗red(Fp) → C∗red(Fp)⊗min (
Mm(C)/
Mm(C))
Note that C∗red(Fp) is an exact C
∗ algebra. From the explanation preceding the theorem it
follows that there is a unital embedding
φ3 : C
red(Fp)⊗min C∗red(Fp) →
Mm(C∗red(Fp))/
Mm(C∗red(Fp)).
By Lemma 5.6, for a family of elements x1, . . . , xn in C
red(Fp) ⊗min C∗red(Fp) and r ≥ 1, there
are some m ∈ N and some a(m)1 , . . . , a
n in Mm(C∗red(Fp)) so that max{‖a1‖, . . . , ‖an‖} < R
|‖Pj(a(m)1 , . . . , a(m)n )‖ − ‖Pj(x1, . . . , xn)‖| ≤
, ∀ 0 ≤ j ≤ r.
On the other hand, by the existence of embedding
φ1 : C
red(Fp) →
Mm′(C)/
Mm′(C),
it follows that there is a unital embedding
φ4 : Mm(C∗red(Fp)) = Mm(C)⊗min C∗red(Fp) → Mm(C)⊗min (
Mm′(C)/
Mm′(C))
Mm(C)⊗min (
Mm′(C)/
Mm′(C)) =
Mm′m(C)/
Mm′m(C).
Hence for such a
1 , . . . , a
n in Mm(C∗red(Fp)) and r ≥ 1, by Lemma 5.6, there are some k ∈ N
and A1, . . . , An in Mk(C) so that max{‖A1‖, . . . , ‖An‖} < R and
|‖Pj(a(m)1 , . . . , a(m)n )‖ − ‖Pj(A1, . . . , An)‖| ≤
, ∀ 0 ≤ j ≤ r.
Altogether, we have
|‖Pj(x1, . . . , xn)‖ − ‖Pj(A1, . . . , An)‖| ≤
, ∀ 0 ≤ j ≤ r,
which implies that C∗red(Fp)⊗min C∗red(Fp) has approximation property.
Hence
δtop(x1, . . . , xn) = 1,
for any family of self-adjoint elements x1, . . . , xn that generates C
red(Fp)⊗min C∗red(Fp). �
Theorem 5.6. Suppose that K be the C∗ algebra consisting of all compact operators on a sepa-
rable Hilbert space H. Suppose A = C
K is the unitization of K. If x1, . . . , xn is a family of
self-adjoint elements that generate A as a C∗ algebra, then
δtop(x1, . . . , xn) = 0.
Proof. By [17], we know that unital C∗ algebra A is generated by two self-adjoint elements
in A. Note that A has a unique trace τ , which is defined by
τ((λ, x)) = λ, ∀ (λ, x) ∈ A.
By Theorem 5.1, it is not hard to see that
δtop(x1, . . . , xn) = 0,
where x1, . . . , xn is a family of self-adjoint generators of A.
6. Topological free orbit dimension of C∗ algebras
Assume that A is a unital C∗-algebra. Let x1, . . . , xn, y1, . . . , ym be self-adjoint elements
in A. Let C〈X1, . . . , Xn, Y1, . . . , Ym〉 be the noncommutative polynomials in the indeterminates
X1, . . . , Xn, Y1, . . . , Ym. Let {Pr}∞r=1 be the collection of all noncommutative polynomials in
C〈X1, . . . , Xn, Y1, . . . , Ym〉 with rational coefficients.
6.1. Unitary orbits of balls in Mk(C)n. We let Mk(C) be the k× k full matrix algebra
with entries in C, and U(k) be the group of all unitary matrices in Mk(C). Let Mk(C)n denote
the direct sum of n copies of Mk(C). Let Ms.ak (C) be the subalgebra of Mk(C) consisting of
all self-adjoint matrices of Mk(C). Let (Ms.ak (C))n be the direct sum of n copies of Ms.ak (C).
For every ω > 0, we define the ω-orbit-‖ · ‖-ball U(B1, . . . , Bn;ω) centered at (B1, . . . , Bn) in
Mk(C)n to be the subset of Mk(C)n consisting of all (A1, . . . , An) in Mk(C)n such that there
exists some unitary matrix W in U(k) satisfying
‖(A1, . . . , An)− (WB1W ∗, . . . ,WBnW ∗)‖ < ω.
6.2. Norm-microstate space. For all integers r, k ≥ 1, real numbers R, ǫ > 0 and non-
commutative polynomials P1, . . . , Pr, we let
(top)
R (x1, . . . , xn : y1, . . . , ym; k, ǫ, P1, . . . , Pr)
be as defined as in section 2.4.
6.3. Topological free orbit dimension.
Definition 6.1. For ω > 0, we define the covering number
(top)
R (x1, . . . , xn : y1, . . . , yp; k, ǫ, P1, . . . , Pr), ω)
to be the minimal number of ω-orbit–‖·‖-balls that cover Γ(top)R (x1, . . . , xn : y1, . . . , yp; k, ǫ, P1, . . . , Pr)
with the centers of these ω-orbit-‖ · ‖-balls in Γ(top)R (x1, . . . , xn : y1, . . . , yp; k, ǫ, P1, . . . , Pr)
For each function f : N× N× R+ → R, we define,
kf(x1, . . . , xn : y1, . . . , yp;ω,R)
= inf
r∈N,ǫ>0
lim sup
f(o∞(Γ
(top)
R (x1, . . . , xn : y1, . . . , yp; k, ǫ, P1, . . . , Pr), ω), k, ω)
kf(x1, . . . , xn : y1, . . . , yp;ω) = sup
kf(x1, . . . , xn : y1, . . . , yp;ω,R)
kf(x1, . . . , xn : y1, . . . , yp) = lim sup
kf(x1, . . . , xn : y1, . . . , yp;ω),
where kf(x1, , . . . , xn : y1, . . . , yp) is called the topological f(·)-free-orbit-dimension of x1, . . . , xn
in the presence of y1, . . . , yp.
6.4. Topological free entropy dimension and topological free orbit dimension.
The following result follows directly from the definitions of topological free entropy dimension
and topological free orbit dimension of n-tuple of self-adjoint elements in a C∗ algebra.
Theorem 6.1. Suppose that A is a unital C∗ algebra and x1, . . . , xn is a family of self-adjoint
elements of A. Let f : N× N× R+ → R be defined by
f(s, k, ω) =
log s
−k2 logω
for s, k ∈ N, ω > 0. Then
δtop(x1, . . . , xn) ≤ kf(x1, . . . , xn) + 1.
7. Topological free orbit dimension of one variable
We recall the packing number of a set in a metric space as follows.
Definition 7.1. Suppose that X is a metric space with a metric distance d. (i) The packing
number of a set K by ω-balls in X, denoted by P (K,ω), is the maximal cardinality of the subsets
F in K satisfying for all a, b in F either a = b or d(a, b) ≥ ω. (ii) The packing dimension of
the set K in X, denoted by d(K), is defined by
d(K) = lim sup
log(P (K,ω))
− log ω .
7.1. Upper-bound of the topological free orbit dimension of one variable. Suppose
that x = x∗ is a self-adjoint element in a unital C∗ algebra A and σ(x) is the spectrum of x in
For any ω > 0, let m = P (K,ω) be the packing number of σ(x) in R. Thus there exists a
family of elements λ1, . . . , λm in σ(x) such that (i) |λi − λj | ≥ ω for all 1 ≤ i 6= j ≤ m; and (ii)
for any λ in σ(x), there is some λj with 1 ≤ j ≤ m satisfying |λ− λj | ≤ ω.
Lemma 7.1. For any given R > ‖x‖, when r is large enough and ǫ is small enough, we have
lim sup
log o∞(Γ
(top)
R (x; k, ǫ, P1, . . . , Pr), 3ω)
log k
Proof. By Theorem 3.1, there exist some r0 ≥ 1 and ǫ0 > 0 such that the following
holds: when r > r0, ǫ < ǫ0, for any A in Γ
(top)
R (x; k, ǫ, P1, . . . , Pr), there are positive integers
1 ≤ k1, . . . , km ≤ k with k1 + · · ·+ km = k and some unitary matrix U in Mk(C) satisfying
‖U∗AU −
λ1Ik1 0 · · · 0
0 λ2Ik2 · · · 0
· · · · · · . . . · · ·
0 0 · · · λmIkm
‖ ≤ 2ω,
where Ikj is the kj × kj identity matrix for 1 ≤ j ≤ m.
Ω(k1, . . . , km) =
λ1Ik1 0 · · · 0
0 λ2Ik2 · · · 0
· · · · · · . . . · · ·
0 0 · · · λmIkm
U | U is in Uk
Let J be the set consisting of all these (k1, . . . , km) ∈ Nm with k1 + · · ·+ km = k. Then the
cardinality of the set J is equal to
(k − 1)!
(m− 1)!(k −m)! .
(top)
R (x; k, ǫ, P1, . . . , Pr)
is contained in 2ω-neighborhood of the set
(k1,...,km)∈J
Ω(k1, . . . , km).
It follows that
(top)
R (x; k, ǫ, P1, . . . , Pr), 3ω) ≤ o∞(
(k1,...,km)∈J
Ω(k1, . . . , km), ω) ≤ |J | =
(k − 1)!
(m− 1)!(k −m)! .
Therefore,
lim sup
log o∞(Γ
(top)
R (x; k, ǫ, P1, . . . , Pr), 3ω)
log k
= lim sup
log o∞(Γ
(top)
R (x; k, ǫ, P1, . . . , Pr), ω)
log k
≤ lim sup
(k−1)!
(m−1)!(k−m)!
log k
= m− 1.
7.2. Lower-bound. Suppose that x = x∗ is a self-adjoint element in a unital C∗ algebra A
and σ(x) is the spectrum of x in A.
Lemma 7.2. We have
lim sup
log o∞(Γ
(top)
R (x; k, ǫ, P1, . . . , Pr),
log k
≥ m− 1.
Proof. For any ω > 0, let m = P (K,ω) be the packing number of σ(x) in R. Thus there
exists a family of elements λ1, . . . , λm in σ(x) such that (i) |λi − λj | ≥ ω for all 1 ≤ i 6= j ≤ m;
and (ii) for any λ in σ(x), there is some λj with 1 ≤ j ≤ m satisfying |λ− λj| ≤ ω.
For any R > ‖x‖, r ≥ 1 and ǫ > 0, by functional calculus, there are λm+1, . . . , λn in σ(x)
such that for every 1 ≤ t1, . . . , tm ≤ k − n with 2nt1 + . . .+ 2ntm = k − n, the matrix
A = diag(λ1I2nt1 ,λ2I2nt2 , . . . , λmI2ntm , λ1, . . . , λm, . . . , λn)
is in Γ
(top)
R (x; k, ǫ, P1, . . . , Pr), (7.2.1)
where we assume that 2n|(k − n).
Let J be the set consisting of all these (t1, . . . , tm) ∈ Nm with 2nt1 + . . . + 2ntm = k − n.
Then the cardinality of the set J is equal to
!(m− 1)!
By Weyl’s theorem in [27] on the distance of unitary orbits of two self-adjoint matrices, for
any two distinct elements
(s1, . . . , sm) and (t1, . . . , tm)
in J and any W in U(k), we have
‖A1 −WA2W ∗‖ ≥ ω,
where
A1 = diag(λ1I2nt1 , λ2I2nt2 , . . . , λmI2ntm , λ1, . . . , λm, . . . , λn)
A2 = diag(λ1I2ns1, λ2I2ns2 , . . . , λmI2nsm, λ1, . . . , λm, . . . , λn)
are two diagonal self-adjoint matrices in Mk(C). Combining with (7.2.1), we have
(top)
R (x; k, ǫ, P1, . . . , Pr),
) ≥ |J | ≥
!(m− 1)!
Hence
lim sup
log o∞(Γ
(top)
R (x; k, ǫ, P1, . . . , Pr),
log k
≥ lim sup
( k−n2n −1)!
( k−n2n −m)!(m−1)!
log k
= m− 1.
7.3. Topological free orbit dimension of one self-adjoint element.
Theorem 7.1. Suppose that x = x∗ is a self-adjoint element in a unital C∗ algebra A and σ(x)
is the spectrum of x in A. Let d(σ(x)) be the packing dimension of the set σ(x) in R. Let
f : N× N× R+ → R be defined by
f(s, k, ω) =
log s
log k
− log ω
for s, k ∈ N, ω > 0. Then
kf(x) = d(σ(x)).
Proof. The result follows directly from Lemma 7.1, Lemma 7.2 and Definition 7.1. �
Theorem 7.2. Suppose that x = x∗ is a self-adjoint element in a unital C∗ algebra A. Let
f : N× N× R+ → R be defined by
f(s, k, ω) =
log s
−k2 logω
for s, k ∈ N, ω > 0. Then
kf(x) = 0.
Proof. The result follows directly from Lemma 7.1 and Definition 7.1. �
References
[1] N. Brown, K. Dykema, K. Jung, “Free Entropy Dimension in Amalgamated Free Products,”
math.OA/0609080.
[2] M. Dostál, D. Hadwin, “An alternative to free entropy for free group factors,” International Workshop
on Operator Algebra and Operator Theory (Linfen, 2001). Acta Math. Sin. (Engl. Ser.) 19 (2003), no. 3,
419–472.
[3] K. Dykema, “Two applications of free entropy,” Math. Ann. 308 (1997), no. 3, 547–558.
[4] L. Ge, “Applications of free entropy to finite von Neumann algebras,” Amer. J. Math. 119 (1997), no. 2,
467–485.
http://arxiv.org/abs/math/0609080
[5] L. Ge, “Applications of free entropy to finite von Neumann algebras,” II. Ann. of Math. (2) 147 (1998), no.
1, 143–157.
[6] L. Ge, S. Popa, “On some decomposition properties for factors of type II1,” Duke Math. J. 94 (1998), no.
1, 79–101.
[7] L. Ge, J. Shen, “Free entropy and property T factors,” Proc. Natl. Acad. Sci. USA 97 (2000), no. 18,
9881–9885 (electronic).
[8] L. Ge, J. Shen, “On free entropy dimension of finite von Neumann algebras,” Geom. Funct. Anal. 12 (2002),
no. 3, 546–566.
[9] U. Haagerup, S. Thorbjrnsen, “A new application of random matrices: Ext(C∗
(F2)) is not a group,” Ann.
of Math. (2) 162 (2005), no. 2, 711–775.
[10] D. Hadwin, “Free entropy and approximate equivalence in von Neumann algebras”, Operator algebras and
operator theory (Shanghai, 1997), 111–131, Contemp. Math., 228, Amer. Math. Soc., Providence, RI, 1998.
[11] D. Hadwin, J. Shen, “Free orbit diension of finite von Neumann algebras”, Journal of Functional Analysis
249 (2007) 75-91.
[12] K. Jung, “The free entropy dimension of hyperfinite von Neumann algebras,” Trans. Amer. Math. Soc. 355
(2003), no. 12, 5053–5089 (electronic).
[13] K. Jung, “A free entropy dimension lemma,” Pacific J. Math. 211 (2003), no. 2, 265–271.
[14] K. Jung, “Strongly 1-bounded von Neumann algebras,” Math arKiv: math.OA/0510576.
[15] K. Jung, D. Shlyakhtenko, “All generating sets of all property T von Neumann algebras have free entropy
dimension ≤ 1,” Math arKiv: math.OA/0603669.
[16] D. McDuff, “Central sequences and the hyperfinite factor,” Proc. London Math. Soc. (3) 21 1970 443–461.
[17] C. Olsen and W. Zame, “Some C∗ algebras with a single generator,” Trans. of A.M.S. 215 (1976), 205-217.
[18] M. Pimsner, D. Voiculescu, “Imbedding the irrational rotation C∗-algebra into an AF-algebra,” J. Operator
Theory 4 (1980), no. 2, 201–210.
[19] M. Stefan, “Indecomposability of free group factors over nonprime subfactors and abelian subalgebras,”
Pacific J. Math. 219 (2005), no. 2, 365–390.
[20] M. Stefan, “The primality of subfactors of finite index in the interpolated free group factors,” Proc. Amer.
Math. Soc. 126 (1998), no. 8, 2299–2307.
[21] S. Szarek, “Metric entropy of homogeneous spaces,” Quantum probability, 395–410, Banach Center Publ.,
43, Polish Acad. Sci., Warsaw, 1998.
[22] D. Voiculescu, “Circular and semicircular systems and free product factors,” Operator algebras, unitary
representations, enveloping algebras, and invariant theory (Paris, 1989), 45–60, Progr. Math., 92, Birkhauser
Boston, MA, 1990.
[23] D. Voiculescu, “The analogues of entropy and of Fisher’s information measure in free probability theory II,”
Invent. Math., 118 (1994), 411-440.
[24] D. Voiculescu, “The analogues of entropy and of Fisher’s information measure in free probability theory III:
The absence of Cartan subalgebras,” Geom. Funct. Anal. 6 (1996) 172–199.
[25] D. Voiculescu, “Free entropy dimension ≤ 1 for some generators of property T factors of type II1,” J. Reine
Angew. Math. 514 (1999), 113–118.
[26] D. Voiculescu, “The topological version of free entropy,” Lett. Math. Phys. 62 (2002), no. 1, 71–82.
[27] H. Weyl, “Das asymptotische Verteilungsgesetz der Eigenwerte linearer partieller Differentialgleichungen
(mit einer Anwendung auf die Theorie der Hohlraumstrahlung,” (German) Math. Ann. 71 (1912), no. 4,
441–479.
http://arxiv.org/abs/math/0510576
http://arxiv.org/abs/math/0603669
	1. Introduction
	2. Definitions and preliminary
	3. Some technical lemmas
	4. Topological free entropy dimension of one variable
	5. Topological free entropy dimension of n-tuple in unital C* algebras 
	6. Topological free orbit dimension of C* algebras
	7. Topological free orbit dimension of one variable
	References
ABSTRACT
  The notion of topological free entropy dimension of $n-$tuples of elements in
a unital C$^*$ algebra was introduced by Voiculescu. In the paper, we compute
topological free entropy dimension of one self-adjoint element and topological
orbit dimension of one self-adjoint element in a unital C$^*$ algebra.
Moreover, we calculate the values of topological free entropy dimensions of
families of generators of some unital C$^*$ algebras (for example: irrational
rotation C$^*$ algebras or minimal tensor product of two reduced C$^*$ algebras
of free groups).

<|endoftext|><|startoftext|>
Introduction
A hot deconfined medium favors the dissociation of J/ψ since enough hard gluons
can overcome the large energy gap between the J/ψ and a continuum state of cc̄ [1].
Models based on perturbative QCD have shown that a dense partonic system can be pro-
duced in central Au-Au collisions at RHIC and LHC energies [2-6] and then evolve toward
thermal equilibrium and likely chemical equilibrium [7-11]. Such parton plasmas will be
searched for soon in experiments at Brookhaven National Laboratory Relativistic Heavy
Ion Collider (RHIC). The J/ψ suppression has been taken as a thermometer to identify
the evolution history of a parton plasma by showing transverse momentum dependence
of the survival probability in the central rapidity region [12].
Charmonium melting inside a hot medium, which leads to J/ψ suppression, was
proposed by Matsui and Satz to probe the existence of the quark-gluon plasma [13].
Before the complete formation of charmonium is achieved, a pre-resonant cc̄ is expanding
from a collision point. Dominance of the color octet plus a collinear gluon configuration
in the pre-resonance state [14] may account for the same suppression of ψ′ and J/ψ
production in proton-nucleus collisions [15]. The growth of the color octet configuration
and its interaction with nucleons along its trajectory in a nucleus are essential ingredients
in explaining measured J/ψ production cross sections. In addition, the importance of color
octet configurations has been verified in pp̄ collisions at center-of-mass energy
s = 1.8
TeV with the CDF detector at Fermilab [16]. Theoretically, the color-octet production
at short distances and its evolution into physical resonances has been well formulated in
nonrelativistic QCD [17]. At the collision energies of RHIC and LHC we can reasonably
expect considerable contributions from the color octet mechanism.
The evolution of ultrarelativistic nucleus-nucleus collisions, e.g. central Au-Au colli-
sions at both RHIC and LHC energies, has been divided into three stages in Refs. [9,18,19]:
(a) an initial collision where a parton gas is produced; (b) a prethermal stage where elas-
tic scatterings among partons lead to local momentum isotropy [20]; (c) a thermal stage
where parton numbers increase until freeze-out. The term ’partonic system’ refers to
the assembly of partons in the prethermal and thermal stages. The parton plasma only
denotes the assembly of partons in the thermal stage. The cc̄ pairs are produced in the
initial collision, prethermal and thermal stages but disintegrate in the latter two stages. In
order to understand and make predictions for J/ψ yields of RHIC and LHC experiments,
the following physical processes are taken into account. (a) In the initial collision, cc̄ pairs
are produced in hard and semihard scatterings between partons from incoming nuclei by
2 → 2 processes which start at order α3s through the partonic channels ab→ cc̄[2S+1LJ ]x.
In the prethermal and thermal stages, cc̄ pairs can also be produced in 2 → 1 collisions
which start at order α2s via the partonic channels ab → cc̄[2S+1LJ ] since partons in the
deconfined medium have large transverse momenta. (b) The cc̄ produced at short distance
is in a color singlet (cc̄)1 or color octet (cc̄)8 configuration which has a certain probability
to evolve nonperturbatively into a color singlet state. This J/ψ production process is
formulated in nonrelativistic QCD. (c) Since the color octet to singlet transition of (cc̄)8
takes time, gluons in the partonic medium couple to the color octet state and destroy this
transition process. Normally, dissociation cross sections for g+ cc̄→ (cc̄)8 depend on the
pair size. Expansion of the cc̄ from a collision point to a full J/ψ size has to be taken into
account.
A physical resonance formed by a cc̄ pair may be one of J/ψ, χcJ , ψ
′ and others.
Since the radiative transition from a higher charmonium state to the J/ψ takes a much
longer time, the transition of such a state with nonzero pT takes place outside the partonic
system. Since the Fermilab Tevatron experiments have been able to separate direct J/ψ’s
from those produced in radiative χcJ decays [16], in this work we assume that the direct
J/ψ production can also be extracted in heavy ion measurements. If χcJ and ψ
′ are
considered, suppression factors for χcJ and ψ
′ in a deconfined medium are included in
prompt J/ψ production. The identification of J/ψ suppression in the medium becomes
impossible for any prompt J/ψ production data. Therefore, no contributions from higher
charmonium states are taken into account in this work.
The purpose of this work is to study the dependence of the J/ψ survival probability
and number distributions produced in central Au-Au collisions at RHIC and LHC energies
on the transverse momentum and also rapidity which will be measured in RHIC experi-
ments [21]. The J/ψ number distributions corresponding to production of cc̄ in the initial
collision are given in Section 2. Since nuclear shadowing has been shown to influence J/ψ
production in proton-nucleus collisions [22], the nuclear modification of parton distribu-
tions is considered. The J/ψ number distributions due to cc̄ production in the prethermal
and thermal stages are given in Sections 3 and 4. Section 5 contains dissociation cross
sections for gluon-(cc̄)1 and gluon-(cc̄)8. Numerical results for nucleon-cc̄ cross sections,
J/ψ number distributions and four ratios including survival probability are presented in
Section 6. Conclusions are summarized in the final section.
2. Initial production of cc̄
Intrinsic transverse momenta of partons inside a nucleon result in the production of
J/ψ with typical momenta comparable to the QCD scale via 2 → 1 partonic scattering
processes [23]. Since we want to study J/ψ productions with pT > 2 GeV, contributions
from 2 → 1 partonic reactions are not considered in the initial nucleon-nucleon colli-
sion. The effect of intrinsic transverse momentum smearing is rather modest for large
transverse momentum J/ψ data from the Tevatron [24]. Upon omission of the intrinsic
transverse momentum, differential cross section for J/ψ production in nucleon-nucleon
collision resulting only from 2 → 2 partonic processes is given as
dydyxdp⊥
= 2p⊥
xaxbfa/N (xa)fb/N(xb)
(ab→ cc̄[2S+1L(1)J ]x → J/ψ)
(ab → cc̄[2S+1L(8)J ]x → J/ψ)] (1)
where the summation
abx is over partons labeled by a, b, x,
(1) for all possible color-
singlet states and
(8) for all possible color-octet states. Here
denotes the partonic
differential cross section for producing a cc̄[2S+1LJ ] and evolving to a J/ψ with spectro-
scopic notation for quantum numbers and superscripts for singlet and octet [23, 25], and
fa/N is the parton distribution function of the species a in a free nucleon. The longitudinal
momentum fractions carried by initial partons, xa and xb, are related to rapidities of cc̄
and x, y and yx, by
y + p⊥e
yx), xb =
−y + p⊥e
where
s, p⊥ and m⊥ are the center-of-mass energy of nucleon-nucleon collision, trans-
verse momentum and transverse mass of the J/ψ. The conditions xa < 1 and xb < 1
restrict yx to a region of
s−m⊥e−y
< yx < ln
s−m⊥ey
These 2 → 2 processes at order α3s , gg → cc̄[2S+1LJ ]g, qq̄ → cc̄[2S+1LJ ]g, gq → cc̄[2S+1LJ ]q
and gq̄ → cc̄[2S+1LJ ]q̄, start in initial nucleus-nucleus collisions and proceed with the
expansion of the heavy pair. While a cc̄ propagates inside a prethermal or thermal partonic
system, gluons hit and excite it to continuum states. Let σ
gcc̄[13S
be the cross section for
g+(cc̄)[13S
1 ] → (cc̄)8, σgcc̄[S(8)] for g+(cc̄)[S(8)] → (cc̄)8 and σgcc̄[P (8)] for g+(cc̄)[P (8)] →
(cc̄)8 respectively. The cross sections are calculated in Section 5. The probability for
dissociation of a small-size cc̄ into a free state relies on the relative velocity between the
gluon and cc̄, vrel, and gluon number densities in the prethermal and thermal stages, ng(x)
and ng(τ), respectively. Here the variables x and τ are individually space-time coordinates
and proper time. In the prethermal stage, parton distributions depend on the correlation
between momentum and space-time coordinates [18, 19]. The dependence of the gluon
number density ng(x) on x characterizes the partonic system in nonequilibrium. In the
thermal stage, thermal parton distributions can be approximated by Jüttner distributions
where the temperature and parton fugacities depend only on the proper time [9, 18, 20]. As
a consequence, the gluon number density is only a function of τ . Including cc̄ suppression
in the partonic system, the finally-formed number distribution of J/ψ resulting from cc̄
pairs produced in the initial central A+B collision is given by
dN2→2ini
dyd2p⊥
s−m⊥e
s−m⊥e
xafa/A(xa, m
⊥, ~r)xbfb/B(xb, m
⊥,−~r)
(ab→ cc̄[3S(1)1 ]x → J/ψ)
exp[−
∫ τiso
dτ ′ng(x
′) < vrelσgcc̄[13S(1)
(k · u) >pre θ(d− VT∆t)
dτ ′ng(τ
′) < vrelσgcc̄[13S(1)1 ]
(k · u) >the θ(d− VT∆t)]
(ab→ cc̄[3S(8)1 ]x → J/ψ)
exp[−
∫ τiso
dτ ′ng(x
′) < vrelσgcc̄[S(8)](k · u) >pre θ(d− VT∆t)
dτ ′ng(τ
′) < vrelσgcc̄[S(8)](k · u) >the θ(d− VT∆t)]
(ab→ cc̄[1S(8)0 ]x → J/ψ)
exp[−
∫ τiso
dτ ′ng(x
′) < vrelσgcc̄[S(8)](k · u) >pre θ(d− VT∆t)
dτ ′ng(τ
′) < vrelσgcc̄[S(8)](k · u) >the θ(d− VT∆t)]
(ab→ cc̄[3P (8)J ]x → J/ψ)
exp[−
∫ τiso
dτ ′ng(x
′) < vrelσgcc̄[P (8)](k · u) >pre θ(d− VT∆t)
dτ ′ng(τ
′) < vrelσgcc̄[P (8)](k · u) >the θ(d− VT∆t)]}
where fa/A is the parton distribution function of a nucleus,
fa/A(x,Q
2, ~r) = TA(~r)Sa/A(x,~r)fa/N (x,Q
2) (3)
with the thickness function TA and nuclear parton shadowing factor Sa/A. Here, RA
is the nuclear radius. The symbols < · · · >pre and < · · · >the denote averages over
gluon distributions in the prethermal and thermal stages, respectively. Along the track of
nucleus-nucleus collisions, a deconfined partonic gas is produced from scatterings among
primary partons at τ0, then reaches thermalization at τiso and finally freezes out at τf .
Here, d is the shortest distance which a cc̄ travels from a production point ~r to the surface
of the partonic medium with transverse velocity VT [12]. Suppose a cc̄ is produced at a
proper time τ and a spatial rapidity η. The time ∆t for the partonic system to evolve to
another proper time τ ′ is
(V‖ sinh η − cosh η)τ +
(sinh η − V‖ cosh η)2τ 2 + (1− V 2‖ )τ ′2
1− V 2‖
where V‖ is the longitudinal component of the cc̄ velocity. The disappearance of medium
interactions on the cc̄ is ensured by the step function θ while this pair escapes from the
partonic medium.
3. Production of cc̄ in the prethermal stage
To order α2s, a cc̄ in a color singlet state is produced only through gluon fusion
gg → cc̄[2S+1L(1)J ]. For the cc̄[3S
1 ], this fusion does not occur. In contrast, color octet
states result from both channels gg → cc̄[2S+1L(8)J ] and qq̄ → cc̄[2S+1L
J ]. Nevertheless,
the number densities of quarks and antiquarks are so small that they are neglected in es-
timating the production of cc̄ in the prethermal stage where gluons dominate the partonic
system. Four momenta of the two initial partons and final cc̄ are denoted by k1 = (ω1, ~k1),
k2 = (ω2, ~k2) and p = (E, ~p) = (m⊥ cosh y, ~p⊥, m⊥ sinh y). The differential production rate
for gg → cc̄[2S+1L(8)J ] → J/ψ in the prethermal stage is
d3A2→1pre
8(2π)5
δ(4)(k1 + k2 − p)
g2Gfg(k1, x)fg(k2, x)
| M(gg → cc̄[2S+1L(8)J ] → J/ψ) |2
where gG is the degeneracy factor for gluons and the fg(k, x) is the correlated phase-space
distribution function given in Ref. [18]. The squared amplitudes | M |2 for cc̄ in color
singlet and color octet are calculated individually in Refs. [23, 25]. To order α2s, the
allowed color octet states are 1S
0 and
0,2 through the gluon fusion channel. Taking
into account the suppression of cc̄ in the prethermal and thermal stages, the finally-formed
number distribution of J/ψ resulting from cc̄ pairs produced through 2 → 1 processes in
the prethermal stage is given by
dN2→1pre
dyd2p⊥
16(2π)5
∫ τiso
τdτdηdφk1dyk1
g2Gfg(k1, x)fg(k2, x)
{| M(gg → cc̄[1S(8)0 ] → J/ψ) |2
exp[−
∫ τiso
dτ ′ng(x
′) < vrelσgcc̄[S(8)](k · u) >pre θ(d− VT∆t)
dτ ′ng(τ
′) < vrelσgcc̄[S(8)](k · u) >the θ(d− VT∆t)]
+ | M(gg → cc̄[3P (8)J ] → J/ψ) |2
exp[−
∫ τiso
dτ ′ng(x
′) < vrelσgcc̄[P (8)](k · u) >pre θ(d− VT∆t)
dτ ′ng(τ
′) < vrelσgcc̄[P (8)](k · u) >the θ(d− VT∆t)]}
where φki is the angle between
~k⊥i and ~p⊥ for i = 1, 2 and mc is the charm quark mass.
The kinematic variables k⊥1, k⊥2, φk2 and yk2 are expressed in terms of
k⊥1 =
m⊥ cosh(y − yk1)− p⊥ cosφk1
k⊥2 =
p2⊥ + k
⊥1 − 2p⊥k⊥1 cosφk1
sin φk2 = −
sinφk1
sinh yk2 =
(m⊥ sinh y − k⊥1 sinh yk1)
To order α3s, the differential production rate gets contributions from the processes
gg → cc̄[2S+1L(1,8)J ]g in the prethermal stage,
d3A2→2pre
16(2π)8
δ(4)(k1 + k2 − p− px)
g2Gfg(k1, x)fg(k2, x){
| M(gg → cc̄[2S+1L(1)J ]x → J/ψ) |2
| M(gg → cc̄[2S+1L(8)J ]x → J/ψ) |2} (7)
where px = (Ex, ~px) = (p⊥x cosh yx, p⊥x cosφx, p⊥x sin φx, p⊥x sinh yx) is the four momen-
tum of the massless parton x. Taking into account the suppression of cc̄ in the prethermal
and thermal stages, the finally-formed number distribution of J/ψ resulting from cc̄ pairs
produced through 2 → 2 processes in the prethermal stage is given by
dN2→2pre
dyd2p⊥
16(2π)8
∫ τiso
τdτdηp⊥xdp⊥xdφxdyxdφk1dyk1
2k2⊥1
g2Gfg(k1, x)fg(k2, x)
{| M(gg → cc̄[3S(1)1 ]x → J/ψ) |2
exp[−
∫ τiso
dτ ′ng(x
′) < vrelσgcc̄[13S(1)1 ]
(k · u) >pre θ(d− VT∆t)
dτ ′ng(τ
′) < vrelσgcc̄[13S(1)1 ]
(k · u) >the θ(d− VT∆t)]
+ | M(gg → cc̄[3S(8)1 ]x → J/ψ) |2
exp[−
∫ τiso
dτ ′ng(x
′) < vrelσgcc̄[S(8)](k · u) >pre θ(d− VT∆t)
dτ ′ng(τ
′) < vrelσgcc̄[S(8)](k · u) >the θ(d− VT∆t)]
+ | M(gg → cc̄[1S(8)0 ]x → J/ψ) |2
exp[−
∫ τiso
dτ ′ng(x
′) < vrelσgcc̄[S(8)](k · u) >pre θ(d− VT∆t)
dτ ′ng(τ
′) < vrelσgcc̄[S(8)](k · u) >the θ(d− VT∆t)]
+ | M(gg → cc̄[3P (8)J ]x → J/ψ) |2
exp[−
∫ τiso
dτ ′ng(x
′) < vrelσgcc̄[P (8)](k · u) >pre θ(d− VT∆t)
dτ ′ng(τ
′) < vrelσgcc̄[P (8)](k · u) >the θ(d− VT∆t)]}
where ŝ = (k1 + k2)
2 and some kinematic variables are given by
k⊥1 = {4m2c + 2m⊥p⊥x cosh(y − yx)− 2p⊥p⊥x cosφx}/
{2[m⊥ cosh(y − yk1) + p⊥x cosh(yx − yk1)− p⊥ cosφk1 − p⊥x cos(φx − φk1)]}
k2⊥2 = m
⊥ + p
⊥x + 2m⊥p⊥x cosh(y − yx) + k2⊥1
−2k⊥1[m⊥ cosh(y − yk1) + p⊥x cosh(yx − yk1)]
sinh yk2 =
[m⊥ sinh y + p⊥x sinh yx − k⊥1 sinh yk1]
The J/ψ number distribution resulting from cc̄ pairs produced in the prethermal
stage becomes
dNpre
dyd2p⊥
dN2→1pre
dyd2p⊥
dN2→2pre
dyd2p⊥
4. Production of cc̄ in the thermal stage
In the thermal stage, parton distributions are approximated by thermal phase-space
distributions fi(k;T, λi) in which the temperature T and nonequilibrium fugacities λi are
functions of the proper time τ [9, 18]. While the partonic system evolves, quark and
antiquark number densities increase. To order α2s, both gg → cc̄[2S+1L
J ] → J/ψ and
qq̄ → cc̄[2S+1L(8)J ] → J/ψ contribute to the J/ψ number distribution in the thermal stage
dN2→1the
dyd2p⊥
16(2π)5
τdτdηdφk1dyk1
g2Gfg(k1;T, λg)fg(k2;T, λg) | M(gg → cc̄[1S
0 ] → J/ψ) |2
exp[−
dτ ′ng(τ
′) < vrelσgcc̄[S(8)](k · u) >the θ(d− VT∆t)]
g2Gfg(k1;T, λg)fg(k2;T, λg) | M(gg → cc̄[3P
J ] → J/ψ) |2
exp[−
dτ ′ng(τ
′) < vrelσgcc̄[P (8)](k · u) >the θ(d− VT∆t)]
+gqgq̄fq(k1;T, λq)fq̄(k2;T, λq̄) | M(qq̄ → cc̄[3S(8)1 ] → J/ψ) |2
exp[−
dτ ′ng(τ
′) < vrelσgcc̄[S(8)](k · u) >the θ(d− VT∆t)]}
where gq and gq̄ are the degeneracy factors for quarks and antiquarks, respectively. In the
channel of quark-antiquark annihilation, only the squared amplitude for 3S
1 does not
vanish.
All lowest-order 2 → 2 reactions gg → cc̄[2S+1LJ ]g, qq̄ → cc̄[2S+1LJ ]g, gq →
cc̄[2S+1LJ ]q and gq̄ → cc̄[2S+1LJ ]q̄ contribute to the J/ψ number distribution in the ther-
mal stage
dN2→2the
dyd2p⊥
16(2π)8
τdτdηp⊥xdp⊥xdφxdyxdφk1dyk1
2k2⊥1
fa(k1;T, λa)fb(k2;T, λb)gagb
| M(ab→ cc̄[2S+1L(1)J ]x → J/ψ) |2
exp[−
dτ ′ng(τ
′) < vrelσgcc̄[12S+1L(1)
(k · u) >the θ(d− VT∆t)]
| M(ab→ cc̄[2S+1L(8)J ]x → J/ψ) |2
exp[−
dτ ′ng(τ
′) < vrelσgcc̄[L(8)](k · u) >the θ(d− VT∆t)]}
The J/ψ number distribution resulting from cc̄ pairs produced in the thermal stage
becomes
dNthe
dyd2p⊥
dN2→1the
dyd2p⊥
dN2→2the
dyd2p⊥
5. Gluon-cc̄ dissociation cross sections
A dissociation cross section of a full-size J/ψ induced by a gluon is given in Refs. [1,
26]. Since an initially-created cc̄ has a radius of about r0 =
and proceeds by expanding
to a full-size object, the dissociation cross section of cc̄ by a gluon has a size dependence.
By this we mean the dissociation of cc̄ into free states via this process g + cc̄ → (cc̄)8.
Cross sections are calculated with chromoelectric dipole coupling between gluon and cc̄ in
the procedure for gluon-J/ψ dissociation in Ref. [26]. The wave function of an expanding
cc̄ is needed for this purpose, but it has not been investigated in the partonic medium even
though some attempts have been made in studies of the color transparency phenomenon
[27]. We proceed with the construction of wave functions in a simple one-gluon-exchange
potential model.
In a parton plasma, the internal motion of J/ψ is obtained [12] from the attractive
Coulomb potential, V1 = −gs2/3πr. The quantum-mechanical interpretation of the cc̄
radius is
< r2 >, the square root of the radius-square expectation value of the relative-
motion wave function. For the 1S color singlet, its wave function in momentum space
normalized to the radius of cc̄[13S
1 ] is
[~rψ1s](~k) = 32
πa2.50
(1 + (ka0)2)3
where the variable a0 =
< r2 > /3 is the Bohr radius for a full-size J/ψ. The velocity-
square expectation value of the J/ψ wave function is < v2 >= 0.428. Then the radius
of cc̄ is assumed to expand according to
< r2 > =
< v2 >t+ r0. The gluon-cc̄[1
dissociation cross section is
gcc̄[13S
128gs
2m2.5c a
0 − ǫ0)1.5Q0
9[mca
0 − ǫ0) + 1]6
where Q0 is the gluon energy, ǫ0 the binding energy of J/ψ and gs the strong coupling
constant.
While the cc̄ is in a color octet state, it is not a bound state but rather a scattering
state. Its relative-motion wave function is determined by the repulsive potential V8 =
g2s/24πr. The radial part of the S wave function is
SR(r) + iSI(r) = e
iqrF (1 + iη, 2,−2iqr) (15)
and the radial part of the P wave function is
PR(r) + iPI(r) = qre
iqrF (2 + iη, 4,−2iqr) (16)
with q = mc
< v2 > and η = g2s/24π
< v2 >. The function F is the confluent hyper-
geometric function. Wave functions in momentum space are obtained by performing a
Fourier transform of the wave functions in space coordinates. Normalization constants of
the momentum-space wave functions, CS and CP , are determined by fitting the cc̄ radius.
Dissociation cross sections of the S-wave and P -wave color-octet states by a gluon are
σgcc̄[S(8)] =
g2s (mcQ
0)1.5
dr1dr2r
mcQ0r1)j1(
mcQ0r2)[SR(r1)SR(r2) + SI(r1)SI(r2)]
σgcc̄[P (8)] =
g2s (mcQ
0)1.5
dr1dr2
mcQ0r1)j0(
mcQ0r2) + 2j2(
mcQ0r1)j2(
mcQ0r2)]
[PR(r1)PR(r2) + PI(r1)PI(r2)]
where the j0, j1 and j2 are spherical Bessel functions. The b is determined so that the
square root of the r2 expectation value of the relative wave function in Eq. (15) or (16) is
the color-octet radius. Relations b = 1.435
< r2 > for S-wave and b = 1.3
< r2 > for
P -wave approximately hold for color-octet size less than normal hadron size.
6. Numerical results and discussions
Results for five aspects are presented in the following subsections. The first aspect
is the nucleon-cc̄ dissociation cross sections shown in the next subsection. The second one
in Subsection 6.2 is J/ψ number distributions versus transverse momentum at y = 0 and
rapidity at pT = 4 GeV with nuclear effect on parton distributions and cc̄ dissociation
in the partonic system. The third one in Subsection 6.3 is to define and calculate four
ratios including survival probability with y = 0 or pT = 4 GeV at both RHIC and LHC
energies. The fourth one is given in Subsection 6.4 to show J/ψ number distributions
without nuclear effect on parton distributions and cc̄ dissociation in the partonic system.
The fifth one concerns some uncertainties on the above results.
6.1. Nucleon-cc̄ dissociation cross sections
In the parton model of the nucleon, the gluon is a dominant ingredient. Whereas
the cross section for cc̄ dissociated directly by a real gluon is of order αs, the cross section
for the quark-cc̄ dissociation through a virtual gluon is of order α2s. With the gluon-(cc̄)1
cross section given in the last section, the nucleon-cc̄[13S
1 ] cross section driven mainly
by the gluon ingredient becomes
Ncc̄[13S
dxfg/N(x,Q
gcc̄[13S
where x
min =
with pN being the proton momentum in the rest frame of the J/ψ. The
gluon distribution function fg/N is that Glück-Reya-Vogt (GRV) result at leading order
in Ref. [28]. The cross section is drawn in Fig. 1 to show the energy and renormalization-
scale dependence while the cc̄ radius is the J/ψ radius in the attractive Coulomb poten-
tial, rJ/ψ = 0.348 fm. In Fig. 1, gluon field operators are renormalized at three scales
2ǫ0, Q
0, respectively. The coupling constant has the value αs =
correspond-
ing to the scale ǫ0 while it varies for the other two scales. Values of the cross section
s = 10 GeV are a little lower than the nucleon-J/ψ dissociation cross section ob-
tained by the subtraction of quasi-elastic cross section in Ref. [29] from the total cross
section given in Ref. [30]. In high-temperature hadronic matter or J/ψ photoproduction
reaction, a typical value of the center-of-mass energy for nucleon-J/ψ (or preresonance)
dissociation is around
s = 6 GeV [1]. At this energy, Fig. 2 is drawn to show the size
dependence of σ
Ncc̄[13S
, with fg/N (x,Q
2) at Q2 = ǫ20.
For the S-wave color octet the nucleon-cc̄[S(8)] cross section is
σNcc̄[S(8)] =
dxfg/N(x, (q +Q
0)2)σgcc̄[S(8)] (20)
where x
min =
. For the P -wave color octet the nucleon-cc̄[P (8)] cross section is
σNcc̄[P (8)] =
dxfg/N(x, (q +Q
0)2)σgcc̄[P (8)] (21)
Since the gluon momentum in a confining medium is bigger than the QCD scale [14], the
lowest value of x is set by ΛQCD = 0.2322 GeV used in the leading order GRV parton
distribution functions. Dependences of σNcc̄[S(8)] and σNcc̄[P (8)] on the center-of-mass energy√
s are depicted in Fig. 3 while the size of (cc̄)8 is the full size of J/ψ. The dot-dashed
line is obtained with the nucleon-J/ψ cross section given by Eq. (24) in Ref. [1] where
another gluon distribution function evaluated at Q2 = ǫ20 is used. While the (cc̄)8 has
small momentum in a nucleus, the cross section for nucleon-(cc̄)8 production is lower than
the absorption cross section determined by Gerschel and Hüfner [31] or the two-gluon
exchange result [32]. It was proposed by Kharzeev and Satz that the color octet plus a
gluon configuration is a dominant component produced in the proton-nucleus collisions
[14]. In fact, the present cross section for a nucleon and a bare (cc̄)8 is one part of the
nucleon-g(cc̄)8 cross section.
A (cc̄)8 pair produced at a collision point expands before becoming color singlet to
a size which may be larger or smaller than the full size of J/ψ. We then show in Fig. 4
the dependence of the nucleon-(cc̄)8 cross section on the color-octet pair radius.
We do not want to address proton-nucleus collisions in terms of nucleon-cc̄ cross
sections [33, 34] since only the gluon-cc̄ cross sections are needed to study cc̄ suppression
in the prethermal and thermal stages. In proton-nucleus collisions, once a color-octet
(cc̄)8 pair is produced, it picks up a collinear gluon to form a colorless configuration [14].
However, in central Au-Au collisions at RHIC and LHC energies, the accompanying gluon
scatters with other hard gluons in the dense partonic system and is driven away. Therefore
the bare cc̄ is the object that we want to study in the partonic system.
6.2. J/ψ number distributions with suppression
J/ψ number distributions versus transverse momentum at y = 0 and rapidity at
pT = 4 GeV for central Au-Au collisions at RHIC energy
s = 200AGeV are calculated
with respect to the initial collision, prethermal and thermal stages. Initial productions
of cc̄ are calculated with GRV parton distribution functions at renormalization scale µ =
p2⊥ + 4m
c . Evolution of color-octet states
1S0 and
3PJ toward the J/ψ is specified
by nonperturbative matrix elements < OJ/ψ8 (3S1) >, < O
1S0) > and < OJ/ψ8 (3P0) >
in nonrelativistic QCD [17]. In the nonperturbative evolution, a gluon from the partonic
system hits and prevents the color octet from color neutralizing via g + (cc̄)8 → (cc̄)8.
This medium effect has been expressed by exponentials in Eqs. (2), (6), (8), (10) and
(11). Therefore, the nonperturbative matrix elements are assumed to be invariant while
the medium effect is factorized into exponential forms. Values of these matrix elements
are well determined by fitting the CDF measurements for pp̄ collisions at
s = 1.8 TeV
in Ref. [35],
< OJ/ψ8 (3S1) >= (1.12± 0.14)× 10−2GeV3 (22)
< OJ/ψ8 (1S0) > +
< OJ/ψ8 (3P0) >= (3.90± 1.14)× 10−2GeV3 (23)
In pp̄ collisions, differential cross sections of direct J/ψ production depend on the com-
bination of < OJ/ψ8 (1S0) > and < O
3P0) >. However, since the dissociation cross
section for the S-wave color-octet state is different from that for the P -wave color-octet
state, such a dependence on the combination is destroyed. In calculations, values are
taken as follows,
< OJ/ψ8 (1S0) >= 4× 10−2GeV3, < O
3P0) >= −
× 10−2GeV3 (24)
The value of < OJ/ψ8 (3P0) > is positive at tree level and negative after renormalization
[36]. Eq. (23) is still satisfied by the values in Eq. (24). These values of nonperturbative
matrix elements are supposed to be universal for any center-of-mass energy
Various contributions to the J/ψ number distributions including initial collisions,
prethermal and thermal stages, 2 → 1 and 2 → 2 collisions, are drawn separately in Figs.
5 and 6. The dashed curve resulting from cc̄ production in the initial collision is obtained
by calculating Eq. (2) where the nuclear parton shadowing factor is given in Ref. [4]
throughout this subsection. The upper and lower dot-dashed curves resulting from cc̄
production in the prethermal stage are obtained by individually calculating Eq. (6) for
2 → 1 collisions and Eq. (8) for 2 → 2 collisions. The upper and lower dotted curves
resulting from cc̄ production in the thermal stage are obtained by calculating Eq. (10)
for 2 → 1 collisions and Eq. (11) for 2 → 2 collisions, respectively. To exclude the effect
of intrinsic transverse momentum smearing, only the region pT > 2 GeV is considered.
Consequently, no 2 → 1 collisions contribute in the initial collision. The J/ψ number
distribution resulting from the initial collision shown by the dashed line has a plateau
similar to that in proton-proton collision [37]. Both Figs. 5 and 6 show that cc̄ pairs
produced from the thermal stage can be neglected compared to the initial production, but
the contributions from the prethermal stage are important in the transverse momentum
region 2GeV < pT < 8GeV and rapidity region 0 < y < 1.2. Productions of cc̄ in the
prethermal and thermal stages bulge up the J/ψ number distribution shown by the solid
line in this rapidity region. Nevertheless, the dot-dashed and dotted lines fall rapidly as
the rapidity gets large. This bulging characterizes the formation of a deconfined medium
because the medium has average momentum limited but big enough to produce extra cc̄
and thus J/ψ. The 2 → 1 collisions in the partonic system have bigger contributions than
the 2 → 2 collisions.
Each of Figs. 7 and 8 contains two sets of lines to show contributions from the
color-singlet and color-octet pairs produced at short distance. Any set has a dashed line
obtained from Eq. (2) for the initial collision, a dot-dashed line from Eq. (9) for the
prethermal stage and a dotted line from Eq. (12) for the thermal stage, respectively. A
line in the upper (lower) set for the color-octet (color-singlet) contributions stems from
the terms for (cc̄)8 ((cc̄)1) states. The color octet states dominate productions of J/ψ at
RHIC energy. However, the ratio of color-octet to color-singlet contributions shown by the
two solid lines at pT = 6 GeV is reduced from about 70 at CDF collider energy
s = 1.8
TeV to about 40 at RHIC energy. Both contributions of color-singlet and color-octet
states have similar dependence on transverse momentum and rapidity.
Figs. 9 and 10 show transverse momentum and rapidity dependence of J/ψ number
distributions for central Au-Au collisions at LHC energy
s = 5.5ATeV. A prominent
feature is that the J/ψ number produced from the thermal stage is comparable to that
from the prethermal stage. Compared to the initial production, cc̄ and J/ψ produced
through 2 → 2 reactions may be neglected. A bulge is observed on the plateau in the
rapidity region 0 < y < 1.5. Such a bulge can be taken as a signature for the existence
of a parton plasma at the LHC energy. Figs. 11 and 12 depict contributions from the
color singlet and color octet at LHC energy. The ratio of color-octet to color-singlet
contributions shown by the two solid lines at pT = 6GeV reaches about 150. This indicates
the color octet states become more important with the increase of
6.3. Ratios including J/ψ survival probability
Nuclear shadowing results in a modification of gluon distribution functions inside
a nucleus [38] and such a nuclear effect is represented by the shadowing factor Sa/A in
Eq. (3). If the Sa/A = 1 for no shadowing, the dN
ini /dyd
2p⊥ is proportional to the
product of atomic masses of the two colliding nuclei. If Sa/A 6= 1 and depends on the
longitudinal momentum fraction x, the production of cc̄ is reduced in the shadowing region
and enhanced for the anti-shadowing region. Irrespective of interactions of J/ψ with the
partonic system, J/ψ number distributions produced in the initial central A+B collision
is obtained by putting all exponentials equal to 1 in Eq. (2),
dN2→20
dyd2p⊥
(Sa/A) = 2
s−m⊥e
s−m⊥e
xafa/A(xa, m
⊥, ~r)xbfb/B(xb, m
⊥,−~r)
(ab→ cc̄[3S(1)1 ]x → J/ψ)
(ab→ cc̄[3S(8)1 ]x → J/ψ)
(ab→ cc̄[1S(8)0 ]x → J/ψ)
(ab→ cc̄[3P (8)J ]x → J/ψ)}
To characterize the influence of nuclear parton shadowing on the J/ψ production from
the initial collision, a ratio is defined as
Rini =
dN2→20
dyd2p⊥
(Sa/A 6= 1)/
dN2→20
dyd2p⊥
(Sa/A = 1) (26)
Here the Sa/A 6= 1 from Ref. [4] applies throughout this subsection.
The initially produced J/ψ originates from the cc̄ pairs produced in the initial colli-
sion. Its dependence on the transverse momentum and rapidity is obtained by calculating
Eq. (25). Some cc̄ pairs produced in the initial collision may dissociate by gluons from the
partonic system. As a consequence, the J/ψ number is reduced. The survival probability
for the cc̄ transiting into a J/ψ is defined as the ratio
Splasma =
dN2→2ini
dyd2p⊥
(Sa/A 6= 1)/
dN2→20
dyd2p⊥
(Sa/A 6= 1) (27)
We have calculated the J/ψ number distributions produced in the prethermal and
thermal stages in Subsection 6.2. The J/ψ yield may be bigger than the reduced amount
of initially produced J/ψ due to the cc̄ dissociation by gluons in the partonic system. The
partonic system has two roles. One is to produce cc̄ pairs and another is to dissociate cc̄
pairs. To see the roles, a ratio is defined by
Rplasma = (
dN2→2ini
dyd2p⊥
(Sa/A 6= 1) +
dNpre
dyd2p⊥
dNthe
dyd2p⊥
dN2→20
dyd2p⊥
(Sa/A 6= 1) (28)
To understand the nuclear effect on parton distributions and the roles of the partonic
system, we need to compare J/ψ production in the central A+B collision with that in the
nucleon-nucleon collision. To this end, a ratio is defined as
R = (
dN2→2ini
dyd2p⊥
(Sa/A 6= 1) +
dNpre
dyd2p⊥
dNthe
dyd2p⊥
dN2→20
dyd2p⊥
(Sa/A = 1) (29)
which is also written as
R = RiniRplasma (30)
The ratios Rini, Splasma, Rplasma and R versus transverse momentum and rapidity
are depicted as dashed, dotted, dot-dashed and solid lines, respectively, in Figs. 13 and 14
for the RHIC energy and Figs. 15 and 16 for the LHC energy. In contrast to Rini < 1, the
value of Rplasma is larger than 1 for all transverse momenta in Fig. 13 and 0 < y < 1.5 in
Fig. 14 and 0.5 < y < 1.5 in Fig. 16. This results in prominent bulges on the solid lines
of R in Figs. 14 and 16. In contrast, the survival probability Splasma shown by the dotted
lines has no such bulge. Therefore, the bulges are present in Figs. 6 and 10 when the J/ψ
yield resulting from cc̄ pairs produced in the partonic system overwhelms the reduced
amount of initially produced J/ψ. We conclude that in the rapidity region 0 < y < 1.5
a bulge observed in the ratio R is an indicator for the existence of the partonic system.
For Rini < 1 and Splasma < 1, J/ψ suppression arises from the nuclear parton shadowing
found in HIJING and cc̄ dissociation in the partonic system.
6.4. J/ψ number distributions with no suppression
In Subsection 6.2, J/ψ number distributions have been presented while the cc̄ re-
duction due to the nuclear parton shadowing in the initial collision and cc̄ dissociation in
the partonic system are taken into account. In this subsection, the suppression including
both the reduction and dissociation is omitted in calculations of J/ψ number distributions
by setting to 1 all exponentials in Eqs. (2), (6), (8), (10) and (11). Figs. 17-20 depict
these distributions versus transverse momentum at y = 0 and rapidity at pT = 4 GeV at
both RHIC and LHC energies. We are now ready to explain the dip within y < 1 in Fig.
10. This dip disappears in Fig. 20 where suppression is not considered. Since Rini shown
by the dashed line in Fig. 16 is flat with respect to the rapidity y < 2 and Splasma shown
by the dotted line has a steep rise in 0.5 < y < 1, the dip phenomenon is solely due to the
cc̄ dissociation in the partonic system. Such a dip phenomenon is not obvious but still
can be observed in the prethermal and thermal stages when the J/ψ number distributions
with suppression are compared to those without suppression. The comparison is indicated
in Fig. 21 for the prethermal stage and Fig. 22 for the thermal stage. The solid and
dot-dashed lines for no suppression begin to fall from y = 0.5 to y = 1, but change to
rising as shown by the dashed and dotted lines when the cc̄ dissociation is switched on.
This change occurs because of the steep rise of Splasma. A relatively weak dependence of
Splasma on pT is shown by the dotted line in Fig. 15. The dip phenomenon thus cannot
be observed in the pT dependence of J/ψ number distributions. Referring back to Eqs.
(2), (6), (8), (10) and (11), exponentials there have sensitive dependence on the rapidity
in 0 < y < 1.5. The dip is more obvious in the color singlet channel as shown by the
lower solid, dashed, dot-dashed and dotted lines in Fig. 12. This is so because the cross
section for the 13S
1 -state dissociation has a narrower peak with respect to the incident
gluon energy [12] than the color octet states.
6.5. Uncertainties
Since gluon shadowing in nuclei has not been studied experimentally, theoretical
estimates of the nuclear gluon shadowing factor involve uncertainties. The nuclear parton
shadowing factor found in HIJING [4] is a result of the assumption that there is no Q-
dependence on the shadowing factor and the shadowing effect for gluons and quarks is the
same. Nevertheless, the shadowing factor has been shown by Eskola et al. to evolve with
momentum Q [39]. The difference between the latter and the former indicates uncertainty.
The ratio Rini defined in Eq. (26) is calculated with Eskola et al.’s parametrization [39]
and results are depicted in Fig. 23 showing momentum dependence at y = 0 and Fig.
24 showing rapidity dependence at pT = 4 GeV. Compared to the dashed lines in Figs.
13-16, the change of Rini at RHIC energy greater than 1 is prominent. This implies that
the anti-shadowing effect of Eskola et al.’s parametrization is quite important at RHIC
energy. Measurements on Rini in RHIC experiments are needed to confirm this nuclear
enhancement [21].
The ratio Rini is always flat within the rapidity region -1.5< y < 1.5 for parametriza-
tions given in HIJING and by Eskola et al., and the flatness seems to be independent of
parametrizations. If the partonic system does not come into being, the ratio R is flat, too,
since R = Rini. If the partonic system dissociates cc̄ pairs, the solid curve of R undergoes
bulging, dipping and then bulging from y = −1.5 to y = 1.5. Any such twist of R in
−1.5 < y < 1.5 observed in experiments is nontrivial, because only a deconfined medium
generates it.
Upon inclusion of uncertainties on the formation and evolution of parton plasma
arising from other factors, for instance, the dependence on the coupling constant αs
[40] and transverse flow [41], the J/ψ number distribution and the four ratios including
survival probability will change. In the partonic system considered here gluons dominate
the evolution and gluon-cc̄ interactions break the pairs. In a system where quarks and
antiquarks are abundant, interactions between quarks (antiquarks) and cc̄ may account
for a suppression of J/ψ [42]. Additional suppression caused by energy loss of the initial
state has not been considered since there is a controversy on the influence of the energy
loss [43, 44]. Some uncertainties are expected to be fixed by upcoming experiments at
RHIC.
7. Conclusions
We have studied J/ψ production through both color-singlet and color-octet cc̄ chan-
nels with various stages of central Au-Au collisions at both RHIC and LHC energies. In
addition to the scattering processes ab → cc̄[2S+1LJ ]x, contributions of the reactions
ab→ cc̄[2S+1LJ ] are also calculated in the prethermal stage and thermal stage. The effect
of the medium on an expanding cc̄ involves a gluon interacting with the cc̄ to prevent it
from a transition into a color singlet. Cross sections for g + cc̄ → (cc̄)8 are calculated
with internal wave functions of (cc̄)1 in an attractive potential and (cc̄)8 in a repulsive
potential. Furthermore, nucleon-cc̄ cross sections for color singlet, S- and P - wave color
octets as a function of
s or cc̄ radius are evaluated by assuming that the nucleon domi-
nantly contains gluons. Momentum and rapidity dependence of J/ψ number distribution
with various contributions are calculated for central Au-Au collisions at both RHIC and
LHC energies. Color octet contributions are one order of magnitude larger than the color
singlet contributions. Yields of cc̄ are large in the prethermal stage at RHIC energy and
through the 2 → 1 collisions ab → cc̄[2S+1LJ ] at LHC energy. Since the partonic system
offers fairly large amounts of cc̄, a bulge in 0 < y < 1.5 at RHIC energy and 0.5 < y < 1.5
at LHC energy can be observed in the rapidity dependence of the J/ψ number distribu-
tion and the ratio R of J/ψ number distributions for Au-Au collisions to nucleon-nucleon
collisions. Such a bulge is a signature for the existence of a deconfined partonic medium.
We suggest that RHIC and LHC experiments measure J/ψ number distributions and
the ratio R in the rapidity region 0 < y < 3 to observe a bulge. While the yield of cc̄
from the medium is larger than the reduced amount of initial production in the medium,
the ratio Rplasma is larger than 1. The competition between production and suppression
determines the values of Rplasma, which relies on the evolution of parton number density
and temperature of the partonic system [12]. A dip in the rapidity dependence of the J/ψ
number distributions at LHC energy may exist and this amounts to a suppression effect
of cc̄ in the partonic system. So far, we have obtained results and conclusions for positive
rapidity. It is stressed that the same contents for negative rapidity can be obtained from
the positive region by symmetry.
Acknowledgements
I thank the [Department of Energy’s] Institute for Nuclear Theory at the University
of Washington for its hospitality and the Department of Energy for partial support during
the completion of this work. I thank the Nuclear Theory Group at LBNL Berkeley for
their hospitality during my visit. I also thank X.-N. Wang, C.-Y. Wong and M. Asakawa
for discussions, K. J. Eskola for offering Fortran codes of nuclear parton shadowing factors,
H. J. Weber for careful reading through the manuscript. This work was also supported in
part by the project KJ951-A1-410 of the Chinese Academy of Sciences and the Education
Bureau of Chinese Academy of Sciences.
References
[1]D. Kharzeev and H. Satz, Phys. Lett. B334(1994)155.
[2]R. C. Hwa and K. Kajantie, Phys. Rev. Lett. 56(1986)696;
J. P. Blaizot and A. H. Mueller, Nucl. Phys. B289(1987)847.
[3]K. Kajantie, P. V. Landshoff, and J. Lindfors, Phys. Rev. Lett. 59
(1987)2517;
K. J. Eskola, K. Kajantie, and J. Lindfors, Nucl. Phys. B323(1989)37;
Phys. Lett. B214(1991)613.
[4]X.-N. Wang and M. Gyulassy, Phys. Rev. D44(1991)3501; Comput.
Phys. Commun. 83(1994)307.
X.-N. Wang, Phys. Rep. 280(1997)287.
[5]K. Geiger and B. Müller, Nucl. Phys. B369(1992)600;
K. Geiger, Phys. Rev. D47(1993)133.
[6]H. J. Moehring and J. Ranft, Z. Phys. C52(1991)643;
P. Aurenche et al., Phys. Rev. D45(1992)92;
P. Aurenche et al., Comput. Phys. Commun. 83(1994)107.
[7]E. Shuryak, Phys. Rev. Lett. 68(1992)3270;
L. Xiong and E.Shuryak, Phys. Rev. C49(1994)2203.
[8]K. Geiger and J. I. Kapusta, Phys. Rev. D47(1993)4905.
[9]T. S. Biró, E. van Doorn, B. Müller, M. H. Thoma, and X.-N. Wang,
Phys. Rev. C48(1993)1275.
[10]J. Alam, S. Raha and B. Sinha, Phys. Rev. Lett. 73(1994)1895.
[11]H. Heiselberg and X.-N. Wang, Phys. Rev. C53(1996)1892.
[12]X.-M. Xu, D. Kharzeev, H. Satz and X.-N. Wang, Phys. Rev. C53
(1996)3051.
[13]T. Matsui and H. Satz, Phys. Lett. B178(1986)416.
[14]D. Kharzeev and H. Satz, Phys. Lett. B366(1996)316.
[15]D. M. Alde et al., Phys. Rev. Lett. 66(1991)133;
D. M. Alde et al., Phys. Rev. Lett. 66(1991)2285;
L. Antoniazzi et al., Phys. Rev. Lett. 70(1993)383;
M. H. Schub et al., Phys. Rev. D52(1995)1307;
T. Alexopoulos et al., Phys. Rev. D55(1997)3927
[16]F. Abe et al., CDF Collaboration, Phys. Rev. Lett. 79(1997)572,578
[17]W. E. Caswell and G. P. Lepage, Phys. Lett. B167(1986)437;
G. P. Lepage, L. Magnea, C. Nakhleh, U. Magnea and K. Hornbostel,
Phys. Rev. D46(1992)4052;
G. T. Bodwin, E. Braaten and G. P. Lepage, Phys. Rev. D51(1995)1125.
[18]P. Lévai, B. Müller and X.-N. Wang, Phys. Rev. C51(1995)3326.
[19]Z. Lin and M. Gyulassy, Phys. Rev. C51(1995)2177.
[20]K. J. Eskola and X.-N. Wang, Phys. Rev. D49(1994)1284.
[21]Y. Akiba, in Proc. of Charmonium Production in Relativistic Nuclear
Collisions, INT, Seattle, 1998, eds. B. Jacak and X.-N. Wang (World
Scientific, Singapore,1998);
M. Rosati, in Proc. of Charmonium Production in Relativistic Nuclear
Collisions, INT, Seattle, 1998, eds. B. Jacak and X.-N. Wang (World
Scientific, Singapore,1998)
[22]S. Gupta and H. Satz, Z. Phys. C55(1992)391.
R. C. Hwa and L. Leśniak, Phys. Lett. B295(1992)11.
R. Vogt, S. J. Brodsky and P. Hoyer, Nucl Phys. B360(1991)67;
K. Boreskov, A. Capella, A. Kaidalov and J. Tran Thanh Van, Phys.
Rev. D47(1993)919.
M. A. Braun, C. Pajares, C. A. Salgado, N. Armesto and A. Capella,
Nucl. Phys. B509(1998)357.
[23]P. Cho and A. K. Leibovich, Phys. Rev. D53(1996)150,6203.
[24]K. Sridhar, A. D. Martin and W. J. Stirling, Phys. Lett. B438(1998)211.
[25]R. Baier and R. Rückl, Z. Phys. C19(1983)251;
R. Gastmans, W. Troost and T. T. Wu, Nucl. Phys. B291(1987)731.
[26]M. E. Peskin, Nucl. Phys. B156(1979)365;
G. Bhanot and M. E. Peskin, Nucl. Phys. B156(1979)391.
[27]B. Z. Kopeliovich and B. G. Zakharov, Phys. Rev. D44(1991)3466;
L. Frankfurt, G. A. Miller and M. Strikman, Phys. Lett. B304(1993)1;
L. Gerland, L. Frankfurt, M. Strikman, H. Stöcker and W. Greiner,
Phys. Rev. Lett. 81(1998)762.
P. Jain, B. Pire and J. P. Ralston, Phys. Rep. 271(1996)67.
[28]M. Glück, E. Reya and A. Vogt, Z. Phys. C67(1995)433.
[29]R. L. Anderson, SLAC-Pub 1741(1976).
[30]J. Hüfner and B. Z. Kopeliovich, Phys. Lett. B426(1998)154.
[31]C. Gerschel and J. Hüfner, Z. Phys. C56(1992)171.
[32]J. Dolej̆si and J. Hüfner, Z. Phys. C54(1992)489.
C. W. Wong, Phys. Rev. D54(1996)R4199.
[33]C.-Y. Wong and C. W. Wong, Phys. Rev. D57(1998)1838.
[34]W. Cassing and E. L. Bratkovskaya, Nucl. Phys. A623(1997)570.
[35]M. Beneke and M. Krämer, Phys. Rev. D55(1997)R5269.
[36]J. Amundson, S. Fleming and I. Maksymyk, Phys. Rev. D56(1997)5844;
T. Mehen, Phys. Rev. D55(1997)4338.
[37]R. Gavai et al., Int. J. Mod. Phys. A10(1995)3043.
[38]A. H. Mueller and J. Qiu, Nucl. Phys. B268(1986)427;
K. J. Eskola, J. Qiu and X.-N. Wang, Phys. Rev. Lett. 72(1994)36;
M. Arneodo, Phys. Rep. 240(1994)301.
[39]K. J. Eskola, V. J. Kolhinen and C. A. Salgado, JYFL-8/98,
US-FT/14-98, hep-ph/9807297.
K. J. Eskola, V. J. Kolhinen and P. V. Ruuskanen, CERN-TH/97-345,
JYFL-2/98, hep-ph/9802350.
[40]S. M. H. Wong, Phys. Rev. C56(1997)1075.
[41]D. K. Srivastava, M. G. Mustafa and B. Müller, Phys. Rev. C56(1997)1064.
[42]R. Wittmann and U. Heinz, Z. Phys. C59(1993)77.
[43]S. Gavin and J. Milana, Phys. Rev. Lett. 68(1992)1834;
E. Quack and T. Kodama, Phys. Lett. B302(1993)495;
R. C. Hwa, J. Pĭsút and N. Pĭsútová, Phys. Rev. C56(1997)432.
[44]S. J. Brodsky and P. Hoyer, Phys. Lett. B298(1993)165.
http://arxiv.org/abs/hep-ph/9807297
http://arxiv.org/abs/hep-ph/9802350
Figure 1: Solid, dashed and dot-dashed lines are nucleon-cc̄[13S
1 ] cross sections for
fg/N (x,Q
2) evaluated at Q2 = ǫ20, 2ǫ
0, (Q
0)2, respectively. The (cc̄)1 has the same size as
Figure 2: Cross section for nucleon-cc̄[13S
1 ] at
s = 6 GeV as a function of the cc̄ radius
is calculated with fg/N (x,Q
2) evaluated at Q2 = ǫ20.
Figure 3: The solid and dashed lines are cross sections for nucleon-cc̄[S(8)] and nucleon-
cc̄[P (8)] collisions as a function of
s, respectively. The dot-dashed line is the nucleon-J/ψ
cross section calculated with Eq. (24) in Ref. [1]. The corresponding (cc̄)8 and J/ψ have
the same radius.
Figure 4: The solid and dashed lines show radius dependence of cross sections for nucleon-
(cc̄)8[S
(8)] and nucleon-(cc̄)8[P
(8)] collisions, respectively.
Figure 5: J/ψ number distributions versus transverse momentum at y = 0 and RHIC
energy with suppression. The dashed curve corresponds to cc̄ production in the initial
collision. The upper and lower dot-dashed (dotted) curves correspond to cc̄ produced
through 2 → 1 and 2 → 2 reactions in the prethermal (thermal) stage, respectively. The
solid curve is the sum of all contributions.
Figure 6: The same as Fig. 5, except for rapidity distribution at pT = 4 GeV.
Figure 7: J/ψ number distributions versus transverse momentum at y = 0 and RHIC
energy with suppression. The upper and lower dashed (dot-dashed, dotted and solid)
lines correspond to cc̄ in color octet and color singlet, respectively, produced in the initial
collision(prethermal stage, thermal stage and the all three stages).
Figure 8: The same as Fig. 7, except for rapidity distribution at pT = 4 GeV.
Figure 9: J/ψ number distributions versus transverse momentum at y = 0 and LHC
energy with suppression. The dashed curve corresponds to cc̄ productions in the initial
collision. The upper and lower dot-dashed (dotted) curves correspond to cc̄ produced
through 2 → 1 and 2 → 2 reactions in the prethermal (thermal) stage, respectively. The
solid curve is the sum of all contributions.
Figure 10: The same as Fig. 9, except for rapidity distribution at pT = 4 GeV.
Figure 11: J/ψ number distributions versus transverse momentum at y = 0 and LHC
energy with suppression. The upper and lower dashed (dot-dashed, dotted and solid)
lines correspond to cc̄ in color octet and color singlet, respectively, produced in the initial
collision (prethermal stage, thermal stage and all three stages).
Figure 12: The same as Fig. 11, except for rapidity distribution at pT = 4 GeV.
Figure 13: Ratios versus transverse momentum at y = 0 and RHIC energy. The solid,
dashed, dot-dashed and dotted lines are R, Rini, Rplasma and Splasma, respectively.
Figure 14: The same as Fig. 13, except for rapidity distribution at pT = 4 GeV
Figure 15: Ratios versus transverse momentum at y = 0 and LHC energy. The solid,
dashed, dot-dashed and dotted lines are R, Rini, Rplasma and Splasma, respectively.
Figure 16: The same as Fig. 15, except for rapidity distribution at pT = 4 GeV.
Figure 17: J/ψ number distributions versus transverse momentum at y = 0 and RHIC
energy without suppression. The dashed curve corresponds to cc̄ productions in the initial
collision. The upper and lower dot-dashed (dotted) curves correspond to cc̄ produced
through 2 → 1 and 2 → 2 reactions in the prethermal (thermal) stage, respectively. The
solid curve is the sum of all contributions.
Figure 18: The same as Fig. 17, except for rapidity distribution at pT = 4 GeV.
Figure 19: J/ψ number distributions versus transverse momentum at y = 0 and LHC
energy without suppression. The dashed curve corresponds to cc̄ productions in the initial
collision. The upper and lower dot-dashed (dotted) curves correspond to cc̄ produced
through 2 → 1 and 2 → 2 reactions in the prethermal (thermal) stage, respectively. The
solid curve is the sum of all contributions.
Figure 20: The same as Fig. 19, except for rapidity distribution at pT = 4 GeV.
Figure 21: J/ψ number distributions versus rapidity at pT = 4 GeV in the prethermal
stage of LHC energy. The dashed and solid lines individually correspond to cc̄ produced
through 2 → 1 reactions with and without suppression. The dotted and dot-dashed lines
through 2 → 2 reactions with and without suppression, respectively.
Figure 22: The same as Fig. 21, except for the thermal stage.
Figure 23: Ratio Rini versus transverse momentum at y = 0 is calculated with Eskola’s
parametrization.
Figure 24: The same as Fig. 23, except for rapidity distribution at pT = 4 GeV.
ABSTRACT
  Any color singlet or octet ccbar pair is created at short distances and then
expands to a full size of J/psi. Such a dynamical evolution process is included
here in calculations for the J/psi number distribution as a function of
transverse momentum and rapidity in central Au-Au collisions at both RHIC and
LHC energies. The ccbar pairs are produced in the initial collision and in the
partonic system during the prethermal and thermal stages through the partonic
channels ab to ccbar [{2S+1}L_J] and ab to ccbar [{2S+1}L_J]x, and then they
dissociate in the latter two stages. Dissociation of ccbar in the medium occurs
via two reactions: (a) color singlet ccbar plus a gluon turns to color octet
ccbar, (b) color octet ccbar plus a gluon persists as color octet. There are
modest yields of ccbar in the prethermal stage at RHIC energy and through the
reactions ab to ccbar [{2S+1}L_J] at LHC energy for partons with large average
momentum in the prethermal stage at both collider energies and in the thermal
stage at LHC energy. Production from the partonic system competes with the
suppression of the initial yield in the deconfined medium. Consequently, a
bulge within -1.5<y<1.5 has been found for the J/psi number distribution and
the ratio of J/psi number distributions for Au-Au collisions to nucleon-nucleon
collisions. This bulge is caused by the partonic system and is thus an
indicator of a deconfined partonic medium. Based on this result we suggest the
rapidity region worth measuring in future experiments at RHIC and LHC to be
-3<y<3.

<|endoftext|><|startoftext|>
Introduction. 2
2 Toy model of the weak coupling limit. 4
2.1 Dilations of contractive semigroups. . . . . . . . . . . . . . . . . . . . . . 4
2.2 “Toy quadratic noises”. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Weak coupling limit for Friedrichs operators. . . . . . . . . . . . . . . . . 8
3 Completely positive maps and semigroups. 10
3.1 Completely positive maps. . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2 Completely positive semigroups. . . . . . . . . . . . . . . . . . . . . . . . 12
3.3 Classical Markov semigroups. . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.4 Invariant c.p semigroups. . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.5 Detailed Balance Condition. . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4 Bosonic reservoirs. 18
4.1 Second quantization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2 Coupling to a bosonic reservoir. . . . . . . . . . . . . . . . . . . . . . . . . 19
The paper is in final form and no version of it will be published elsewhere.
http://arxiv.org/abs/0704.0669v2
2 J. DEREZIŃSKI AND W. DE ROECK
4.3 Thermal reservoirs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5 Quantum Langevin dynamics. 21
5.1 Linear noises. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
5.2 Quadratic noises. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.3 Total energy operator. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6 Weak coupling limit for Pauli-Fierz operators. 25
6.1 Reduced weak coupling limit. . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.2 Energy of the reservoir in the weak coupling limit. . . . . . . . . . . . . . 27
6.3 Extended weak coupling limit. . . . . . . . . . . . . . . . . . . . . . . . . 27
1. Introduction. Physicists often describe quantum systems by completely positive (c.p.)
semigroups [Haa, AL, Al2]. It is generally believed that this approach is only a phe-
nomenological approximation to a more fundamental description. One usually assumes
that on the fundamental level the dynamics of quantum systems is unitary, more precisely,
is of the form t 7→ eitH · e−itH for some self-adjoint H .
One of justifications for the use of c.p. semigroups in quantum physics is based on
the so-called weak coupling limit for the reduced dynamics [VH, Da1], which we will call
the reduced weak coupling limit. One assumes that a small system is coupled to a large
reservoir and the dynamics of the full system is unitary. The interaction between the
small system and the reservoir is multiplied by a small coupling constant λ. Often the
reservoir is described by a free Bose gas.
The basic steps of the reduced weak coupling limit are:
• Reduce the dynamics to the small system.
• Rescale time as t
• Subtract the dynamics of the small system.
• Consider the weak coupling λ→ 0.
In the limit one obtains a dynamics given by a c.p. Markov semigroup.
Another possible justification of c.p. dynamics goes as follows. One considers the
tensor product of the small system and an appropriate bosonic reservoir. On this enlarged
space one constructs a certain unitary dynamics whose reduction to the small system is
a c.p. semigroup. We will call it a quantum Langevin dynamics. Another name used in
this context in the literature is a quantum stochastic dynamics. Its construction has a
long history, let us mention [AFLe, HP, Fr, Maa].
Thus one can obtain a Markov c.p. semigroup by reducing a single unitary dynamics,
without invoking a family of dynamics and taking its limit. However, the generator of a
quantum Langevin dynamics equals i[Z, ·] where Z is a self-adjoint operator that does
not look like a physically realistic Hamiltonian. In particular, it is unbounded from both
below and above. Thus one can question the physical relevance of this construction.
REDUCED AND EXTENDED WEAK COUPLING LIMIT 3
It turns out, however, that one can extend the weak coupling limit in such a way, that
it involves not only the small system, but also the reservoir. As a result of this approach
one can obtain a quantum Langevin dynamics. One can argue that this approach gives
a physical justification of quantum Langevin dynamics.
The above idea was first implemented in [AFL] by Accardi, Frigerio and Lu under
the name of the stochastic limit (see also [ALV]). Recently we presented our version of
this approach, under the name of the extended weak coupling limit [DD1, DD2], which
we believe is simpler and more natural that of [AFL]. The basic steps of the extended
weak coupling limit are:
• Introduce the so-called asymptotic space — the tensor product of the space of the
small system and of the asymptotic reservoir.
• Introduce an identification operator that maps the asymptotic reservoir into the
physical reservoir and rescales its energy by λ2 around the Bohr frequencies.
• Rescale time as t
• Subtract the “fast degrees of freedom”.
• Consider the weak coupling λ→ 0.
In the limit one obtains a quantum Langevin dynamics on the asymptotic space. Note
that the asymptotic reservoir is given by a bosonic Fock space (just as the physical
reservoir). Its states are however different – correspond only to those physical bosons
whose energies differ from the Bohr frequencies by at most O(λ2). Only such bosons
survive the weak coupling limit.
Let us mention yet another scheme of deriving quantum Langevin equations that has
received attention in the literature, namely the ‘repeated interaction models’ where the
reservoir is continuously refreshed, see [AtP, AtJ].
In this article we review various aspects of the weak coupling limit, reduced and, es-
pecially, extended, mostly following our papers [DD1, DD2]. We also describe some back-
ground material, especially related to completely positive semigroups, quantum Langevin
dynamics and the Detailed Balance Condition.
The plan of our article is as follows. In Section 2 we describe both kinds of the weak
coupling limit on a class of toy-examples – the so-called Friedrichs Hamiltonians and
their dilations. They are less relevant physically than the main model treated in our
article – the one based on Pauli-Fierz operators. Nevertheless, they illustrate some of the
main ideas of this limit in a simple and mathematically instructive context. This section
is based on [DD1].
In Sections 3 we recall some facts about completely positive maps and semigroups,
sketching proofs of the Stinespring dilation theorem [St] and of the so-called Lindblad
form of the generator of a c.p. semigroup [Li, GKS]. In particular, we discuss the freedom
of choosing various terms in the Lindblad form. This question, which we have not seen
discussed in the literature, is relevant for the construction of quantum Langevin dynamics
and the weak coupling limit.
4 J. DEREZIŃSKI AND W. DE ROECK
C.p. semigroups that arise in the weak coupling limit have an additional property –
they commute with the unitary dynamics generated by the Hamiltonian K of the small
system – for shortness we say that they are K-invariant. If in addition the reservoir is
thermal, they satisfy another special property – the so-called Detailed Balance Condition
(DBC) [DF1, Ag, FKGV, Al1]. We devote a large part of Sect. 3 to an analysis of the
K-invariance and the DBC. We show that the generator of a c.p. semigroup with these
properties has some features that curiously resemble the Tomita-Takesaki theory and the
KMS condition. Let us note that in our article the DBC is considered jointly with the
K-invariance, because c.p. semigroups obtained in the weak coupling limit always have
the latter property.
In Section 4 we describe the terminology and notation that we use to describe second-
quantized bosonic reservoirs interacting with a small system. In particular, we introduce
Pauli-Fierz operators [DJ1] – used often (also under other names) in the physics literature
to describe physically realistic systems. In Subsect. 4.3 we discuss thermal reservoirs. In
our definition of a thermal reservoir at inverse temperature β one needs to check a simple
condition for the interaction without explicitly invoking the concept of a KMS state on
an operator algebra, or of a thermal Araki-Woods representation of the CCR [DF1, DJ1].
Of course, this condition is closely related to the KMS property.
In Subsection 5 we describe a construction of quantum Langevin dynamics. We include
a discussion of the so-called quadratic noises, even though they are still not used in our
version of the extended weak coupling limit. (See however [Go] for some partial results
in the context of the formalism of [AFL]).
In Section 6 we describe the two kinds of the weak coupling limit for Pauli-Fierz
operators: reduced and extended. Most of this section is based on [DD2].
2. Toy model of the weak coupling limit. This section is somewhat independent of
the remaining part of the article. It explains the (reduced and extended) weak coupling
limit in the setting of contractive semigroups on a Hilbert space and their unitary dila-
tions. It gives us an opportunity to explain some of the main ideas of the weak coupling
limit in a relatively simple setting. It is based on [DD1].
It is possible to construct physically interesting models based on the material of this
section (e.g. by considering quadratic Hamiltonians obtained by second quantization).
We will not discuss this possibility further, since in the remaining part of the article we
will analyze more interesting and more realistic models.
2.1. Dilations of contractive semigroups. First let us recall the well known concept of
a unitary dilation of a contractive semigroup. Let K be a Hilbert space and e−itΥ a
strongly continuous contractive semigroup on K. This implies that −iΥ is dissipative:
−iΥ + iΥ∗ ≤ 0.
Let Z be a Hilbert space containing K, IK the embedding of K in Z and Ut a unitary
group on Z. We say that (Z, IK, Ut) is a dilation of e−itΥ iff
I∗KUtIK = e
−itΥ, t ≥ 0.
It is well known that every weakly continuous contractive semigroup possesses a
REDUCED AND EXTENDED WEAK COUPLING LIMIT 5
unitary dilation (which is unique up to the unitary equivalence if we additionally demand
its minimality). The original and well known construction of a unitary dilation is due
to Foias and Nagy and can be found in [NF] (see also [EL]). Below we present another
construction, which looks different from that of Foias-Nagy. Its main idea is to view the
generator of a unitary dilation as a kind of a singular Friedrichs operator. (See the next
section, where Friedrichs operators are introduced). Such a definition is well adapted
to the extended weak coupling limit. The construction that we present seems to be
less known in the mathematics literature than that of Foias-Nagy. Nevertheless, similar
constructions are scattered in the literature, especially in physics papers.
Let h be an auxiliary space and ν ∈ B(K, h) satisfy
(Υ −Υ∗) = −ν∗ν. (2.1)
Note that such h and ν always exist. One of possible choices is to take h := K and
i(Υ−Υ∗).
If φ is a vector, then |φ) will denote the operator C ∋ λ 7→ |φ)λ := λφ. Similarly, (φ|
will denote its adjoint: f 7→ (φ|f := (φ|f) ∈ C.
We will use a similar notation also for unbounded functionals. For instance, (1| will
denote the (unbounded) linear functional on L2(R) given by
(1|f =
f(x)dx (2.2)
with the domain L2(R) ∩ L1(R). |1) will denote the hermitian conjugate of (1| in the
sense of sesquilinear forms: if f ∈ L2(R) ∩ L1(R), then
(f |1) :=
f̄(x)dx.
Let ZR be the operator of multiplication on L
2(R) by the variable x.
Introduce the Hilbert spaces ZR; = h⊗L2(R) andZ := K ⊕ZR. Clearly,K is contained
in Z, so we have the obvious embedding IK : K → Z. We also have the embedding
IR : ZR → Z.
For t ≥ 0, consider the following sesquilinear form on K ⊕ (h⊗ (L2(R) ∩ L1(R))):
Ut = IRe
−itZRI∗R + IKe
−itΥI∗K (2.3)
−i(2π)− 12 IK
du e−i(t−u)Υν∗ ⊗ (1|e−iuZRI∗R
−i(2π)− 12 IR
du e−i(t−u)ZRν ⊗ |1)e−iuΥI∗K
−(2π)−1IR
0≤u1,u2, u1+u2≤t
du1du2 e
−iu2ZRν ⊗ |1)e−i(t−u2−u1)Υν∗ ⊗ (1|e−iu1ZRI∗R.
By a straightforward computation we obtain [DD1]
Theorem 2.1. The form Ut extends to a strongly continuous unitary group and
I∗KUtIK = e
−itΥ, t ≥ 0.
Thus (Z, IK, Ut) is a dilation of e−itΥ.
6 J. DEREZIŃSKI AND W. DE ROECK
Let −iZ denote the generator of Ut, so that Ut = e−itZ . Z is a self-adjoint operator
with a number of interesting properties. It is not easy to describe it with a well-defined
formula. Formally it is given by the sesquilinear form
(Υ + Υ∗) (2π)−
2 ν∗ ⊗ (1|
(2π)−
2 ν ⊗ |1) ZR
. (2.4)
Note that (2.4) looks like a special case of a Friedrichs operator (see Subsection 2.3
and [DF2]). As it stands, however, (2.4) does not define a unique self-adjoint operator.
Nevertheless, we will sometimes use the expression (2.4) when referring to Z.
Note that it is possible to give a compact formula for the resolvent of Z, (which is
another possible method of defining Z). For z ∈ C+,
(z − Z)−1 := IR(z − ZR)−1I∗R + IK(z −Υ)−1I∗K
+(2π)−
2 IK(z −Υ)−1ν∗ ⊗ (1|(z − ZR)−1I∗R
+(2π)−
2 IR(z − ZR)−1ν ⊗ |1)(z −Υ)−1I∗K
+(2π)−1IR (z − ZR)−1ν ⊗ |1) (z −Υ)−1 ν∗ ⊗ (1|(z − ZR)−1 I∗R.
Yet another approach that allows to define Z involves a “cut-off procedure”. In fact,
Z is the norm resolvent limit for r → ∞ of the following regularized operators:
Zr :=
(Υ + Υ∗) (2π)−
2 ν∗ ⊗ (1|1[−r,r](ZR)
(2π)−
2 ν ⊗ 1[−r,r](ZR)|1) 1[−r,r](ZR)ZR
Note that it is important to remove the cut-off in a symmetric way. If we replace [−r, r]
with [−r, ar] we usually obtain a different operator. The convergence of Zr to Z is the
reason why we can treat (2.4) as the formal expression for Z.
Next, let us mention a certain invariance property of Z. For λ ∈ R, introduce the
following unitary operator on Z
jλu = u, u ∈ K; jλg(y) := λ−1g(λ−2y), g ∈ ZR.
Note that
j∗λZRjλ = λ
2ZR, j
λ|1) = λ|1).
Therefore, the operator Z is invariant with respect to the following scaling:
Z = λ−2j∗λ
(Υ + Υ∗) λ(2π)−
2 ν∗ ⊗ (1|
λ(2π)−
2 ν ⊗ |1) ZR
jλ. (2.5)
(2.5) will play an important role in the extended weak coupling limit.
Note that in the weak coupling limit it is convenient to use the representation of
ZR as a multiplication operator. Another natural possibility is to represent it as the
differentiation operator. Let us describe this alternative version of the dilation.
The (unitary) Fourier transformation on h⊗L2(R) will be denoted as follows:
Ff(τ) := (2π)−1/2
f(x)e−iτxdx. (2.6)
REDUCED AND EXTENDED WEAK COUPLING LIMIT 7
We will use τ as the generic variable after the application of F . The operator Z trans-
formed by 1K ⊕F will be denoted
Ẑ := (1K ⊕ F)Z(1K ⊕F∗). (2.7)
Introduce
Dτ :=
∂τ . (2.8)
Let (δ0| have the meaning of an (unbounded) linear functional on L2(R) with the domain,
say, the first Sobolev space H1(R), such that
(δ0|f) := f(0). (2.9)
Similarly, |δ0) let be its hermitian adjoint in the sense of forms. By applying the Fourier
transform to (2.4), we can write
(Υ + Υ∗) ν∗ ⊗ (δ0|
ν ⊗ |δ0) Dτ
. (2.10)
Clearly, e−itẐ is also a dilation of e−itΥ.
The operator Ẑ (or Z) and the unitary group it generates has a number of curious
and confusing properties. Let us describe one of them. Consider the space D := K ⊕
(h⊗H1(R)). Clearly, it is a dense subspace of Z. Let us define the following quadratic
form on D:
Ẑ+ :=
Υ ν∗ ⊗ (δ0|
ν ⊗ |δ0) Dτ
. (2.11)
Then, for ψ, ψ′ ∈ D,
(ψ|(e−itẐ − 1)ψ′) = −i(ψ|Ẑ+ψ′). (2.12)
One could think that Ẑ+ = Ẑ. But Ẑ+ is in general non-self-adjoint, which is incompat-
ible with the fact that e−itẐ is a unitary group.
To explain this paradox we notice that (ψ|e−itẐψ′) is in general not differentiable at
zero: its right and left derivatives exist but are different. Hence D is not contained in the
domain of the generator of Ẑ. We will call Ẑ+ the false form of the generator of eitẐ .
In order to make an even closer contact with the usual form of the quantum Langevin
equation [HP, At, Fa, Bar, Me], define the cocycle unitary
Ŵ (t) := eitDτ e−itẐ . (2.13)
Then for t > 0, or for t = 0 and the right derivative, we have the “toy Langevin
(stochastic) equation” which holds in the sense of quadratic forms on D,
Ŵ (t) = (Υ + ν ⊗ |δt))Ŵ (t) + ν∗⊗(δt|. (2.14)
2.2. “Toy quadratic noises”. The formula for Ẑ or for Ẑ+ has one interesting feature:
it involves a perturbation that is localized just at τ = 0. One can ask whether one
can consider other dilations with more general perturbations localized at τ = 0. In this
subsection we will describe such dilations. This construction will not be needed in the
present version of the weak coupling limit. We believe it is an interesting “toy version”
8 J. DEREZIŃSKI AND W. DE ROECK
of “quadratic noises”, which we will discuss in Subsect 5.2. We also expect to extend the
results of [DD1] to “toy quadratic noises”.
Clearly, for any unitary operator U on h⊗L2(R), (1K ⊕U)eitZ(1K ⊕U∗) is a dilation
of e−itΥ. Let us choose a special U , which will lead to a perturbation localized at τ = 0.
Let S be a unitary operator on h. For ψ ∈ h⊗L2(R) ≃ L2(R, h) we set
γ(S)ψ(τ) :=
Sψ(τ), τ > 0,
ψ(τ), τ ≤ 0. (2.15)
Then γ(S) is a unitary operator on h⊗L2(R). Set
ẐS := (1K ⊕ γ(S)∗)Ẑ(1K ⊕ γ(S)).
Clearly, eitẐS is a dilation of e−itΥ. It is awkward to write down a formula for ẐS in the
matrix form, even just formally. It is more natural to write down the “false form of ZS”:
Ẑ+S := (1K ⊕ γ(S)
∗)Ẑ+(1K ⊕ γ(S))
Υ ν∗S ⊗ (δ0|
ν ⊗ |δ0) Dτ + i(1− S)⊗|δ0)(δ0|
For ψ, ψ′ ∈ D we have
(ψ|(e−itẐS − 1)ψ′) = −i(ψ|Ẑ+S ψ
′), (2.16)
Again, as in (2.14), one can extend this formula to derivatives at t > 0. Let
ŴS(t) := e
itDτ e−itẐS , (2.17)
then, in the sense of quadratic forms on D,
ŴS(t) = (Υ + ν ⊗ |δt))ŴS(t) + ν∗S⊗(δt|+ i(1 − S)⊗|δt)(δt|. (2.18)
2.3. Weak coupling limit for Friedrichs operators. Let H := K⊕HR be a Hilbert space,
where K is finite dimensional. Let IK be the embedding of K in H. Let K be a self-adjoint
operator on K and HR be a self-adjoint operator on HR. Let V be a linear operator from
K to HR. The following class of operators will be called Friedrichs operators:
Hλ :=
K λV ∗
λV HR
Assume that
‖V ∗e−itHRV ‖dt < ∞. Then we can define the following operator,
sometimes called the Level Shift Operator, since it describes the shift of eigenvalues of
Hλ in perturbation theory at the 2nd order in λ:
k∈spK
1k(K)V
∗e−it(HR−k)V 1k(K)dt, (2.19)
where 1k(K) denotes the spectral projection of K onto the eigenvalue k; spK denotes
the spectrum of K. Note that ΥK = KΥ.
The following theorem is a special case of a result of Davies [Da1, Da2, Da3], see also
[DD1]:
REDUCED AND EXTENDED WEAK COUPLING LIMIT 9
Theorem 2.2 (Reduced weak coupling limit for Friedrichs operators).
eitK/λ
−itHλ/λ
IK = e
−itΥ.
In order to study the extended weak coupling limit for Friedrichs operators we need
to make additional assumptions. They are perhaps a little complicated to state, but they
are satisfied in many concrete situations.
Assumption 2.3. We suppose that for any k ∈ spK there exists an open Ik ⊂ R and a
Hilbert space hk such that k ∈ Ik,
Ran1Ik(HR) ≃ hk ⊗ L2(Ik, dx),
1Ik(HR)HR is the multiplication operator by the variable x ∈ Ik and
1Ik(HR)V ≃
v(x)dx.
We assume that Ik are disjoint for distinct k and the measurable function Ik ∋ x 7→
v(x) ∈ B(K, hk) is continuous at k.
In other words, we assume that the reservoir Hamiltonian HR and the interaction V
are “nice” around the spectrum of K. In fact, in the extended weak coupling limit only
a vicinity of spK matters.
We set h := ⊕
hk, ZR := h⊗L2(R) and Z := K ⊕ ZR. ZR and Z are the so-called
asymptotic spaces, which are in general different from the physical spaces HR and H.
Next, let us describe the asymptotic dynamics. Let ν : K → h be defined as
ν := (2π)
v(k)1k(K).
Note that it satisfies (2.1) with Υ defined by (2.19). This follows by extending the inte-
gration in (2.19) to R and using the inverse Fourier transform. As before, we set ZR to
be the multiplication by x on L2(R) and we define e−itZ by (2.3), so that (Z, IK, e−itZ)
is a dilation of e−itΥ.
Finally, we need an identification operator that maps the asymptotic space into the
physical space. This is the least canonical part of the construction. In fact, there is some
arbitrariness in its definition for the frequencies away from spK. For λ > 0, we define
the family of partial isometries Jλ,k : L
2(R, hk) → L2(Ik, hk) ⊂ H:
(Jλ,kgk)(y) =
), if y ∈ Ik;
0, if y ∈ R\Ik.
We set Jλ : Z → H, defined for g = (gk) ∈ ZR by
Jλg :=
Jλ,kgk,
and on K equal to the identity. Note that Jλ are partial isometries and
s− lim
J∗λJλ = 1.
The following result is proven in [DD1]:
Theorem 2.4 (Extended weak coupling limit for Friedrichs operators).
s∗ − lim
iλ−2tH0eλ
−2(t−t0)Hλeiλ
−2t0H0Jλ = e
itZRe−i(t−t0)Zeit0ZR .
10 J. DEREZIŃSKI AND W. DE ROECK
Here we used the strong* limit: s∗ − limλց0 Aλ = A means that for any vector ψ we
have limλց0Aλψ = Aψ, limλց0 A
λψ = A
Note that in the extended weak coupling limit for Friedrichs operators the asymptotic
space is a direct sum of parts belonging to various eigenvalues of K that “do not talk to
one another”–have independent asymptotic dynamics.
3. Completely positive maps and semigroups. This section presents basic material
about completely positive maps and semigroups. In particular, we describe a construction
of the Stinespring dilation [St] and of the so-called Lindblad form of the generator of a c.p.
semigroup [Li, GKS]. These beautiful classic results are described in many places in the
literature. Nevertheless, some of their aspects, mostly concerning the freedom of choice
of various terms in the Lindblad form, are difficult to find in the literature. Therefore,
we describe this material at length, including sketches of proofs.
In Subsect. 3.3 we recall the usual concept of a (classical) Markov semigroups (on a
finite state space). When discussing c.p. (quantum) Markov semigroups, it is useful to
compare it to their classical analogs, which are usually much simpler.
In Subsect 3.4 we discuss c.p. semigroups invariant with respect to a certain unitary
dynamics. Such c.p. semigroups arise in the weak coupling limit – therefore, one can
argue that they are “more physical than others”.
Finally, in Subsect. 3.5 we analyze the Detailed Balance Condition, which singles out
c.p. dynamics obtained from a thermal reservoir.
3.1. Completely positive maps. Let K1,K2 be Hilbert spaces. We say that a map Ξ :
B(K1) → B(K2) is positive iff A ≥ 0 implies Ξ(A) ≥ 0. We say that Ξ is Markov iff
Ξ(1) = 1. We say that a map Ξ is n-positive iff
Ξ⊗ id : B(K1 ⊗ Cn) → B(K2 ⊗ Cn)
is positive. (id denotes the identity). We say that Ξ is completely positive, or c.p. for
short, iff it is n-positive for any n.
It is easy to see that if h be a Hilbert space and ν ∈ B(K2,K1 ⊗ h). Then
Ξ(A) := ν∗ A⊗1 ν (3.1)
is c.p. The following theorem says that the above representation of a c.p. map is universal.
2) means that this representation is unique up to a unitary isomorphism.
Theorem 3.1 (Stinespring). Assume that K1,K2 are finite dimensional.
1) If Ξ is c.p. from B(K1) to B(K2), then there exist a Hilbert space h and ν ∈
B(K2,K1 ⊗ h) such that (3.1) is true and
{(φ|⊗1h ν ψ : φ ∈ K1, ψ ∈ K2} = h. (3.2)
2) If in addition to the h′ and ν′ also satisfy the above properties, then there exists a
unique unitary operator U from h to h′ such that ν′ = 1K1 ⊗ U ν.
The right hand side of (3.1) is called a Stinespring dilation of a c.p. map Ξ. If the
condition (3.2) holds, then it is called a minimal.
REDUCED AND EXTENDED WEAK COUPLING LIMIT 11
Remark 3.2. If we choose a basis in h, so that we identify h with Cn, then we can
identify ν with ν1, . . . , νn ∈ B(K2,K1). Then we can rewrite (3.1) as
Ξ(A) =
ν∗jAνj . (3.3)
In the literature, (3.3) is called a Kraus decomposition, even though the work of Stine-
spring is much earlier than that of Kraus.
Note that physically the space h can be interpreted as a part of the reservoir that
directly interacts with the small system.
Proof of Theorem 3.1. Let us prove 1). We equip the algebraic tensor product H0 :=
B(K1)⊗K2 with the following scalar product: for
Xi ⊗ vi, w̃ =
Yi ⊗ wi ∈ H0
we set
(ṽ|w̃) =
(vi|Ξ(X∗i Yj)wj).
By the complete positivity, it is positive definite.
Next we note that there exists a unique linear map π0 : B(K1) → B(H0) satisfying
π0(A)ṽ :=
AXi ⊗ vi.
We check that
(π0(A)ṽ|π0(A)ṽ) ≤ ‖A‖2(ṽ|ṽ), π0(AB) = π0(A)π0(B), π0(A∗) = π0(A)∗.
Let N be the set of ṽ ∈ H0 with (ṽ|ṽ) = 0. Then the completion of H := H0/N is a
Hilbert space. There exists a nondegenerate ∗-representation π of B(K1) in B(H) such
π(A)(ṽ +N ) = π0(A)ṽ.
Using the fact that all our spaces are finite dimensional we see that for some Hilbert
space h we can identify H with K1 ⊗ h and π(A) = A⊗ 1.
We set
νv := 1⊗ v +N .
We check that
Ξ(A) = ν∗A⊗1 ν.
This ends the proof of the existence of the Stinespring dilation.
Let us now prove 2). If h′, ν′ is another pair that gives a Stinespring dilation, we
check that
Xi ⊗ 1h ν vi
Xi ⊗ 1h′ ν′ vi
Therefore, there exists a unitary U0 : K2 ⊗ h → K2 ⊗ h′ such that U0ν = ν′. We check
that U0 A ⊗ 1h = A ⊗ 1h′ U0. Therefore, there exists a unitary U : h → h′ such that
U0 = 1⊗ U .
12 J. DEREZIŃSKI AND W. DE ROECK
We will need the following inequality for c.p. maps:
Theorem 3.3 (Kadison-Schwarz inequality for c.p. maps.). If Ξ is 2-positive
and Ξ(1) is invertible, then
Ξ(A)∗Ξ(1)−1Ξ(A) ≤ Ξ(A∗A). (3.4)
Proof. Let z ∈ C.
A∗A zA∗
z̄A |z|2
≥ 0 implies
Ξ(A∗A) zΞ(A∗)
z̄Ξ(A) |z|2Ξ(1)
≥ 0. Hence, for
φ, ψ ∈ K,
(φ|Ξ(A∗A)φ) + 2Rez̄(ψ|Ξ(1)−1/2Ξ(A)φ) + |z|2(ψ|ψ) ≥ 0. (3.5)
Therefore,
(φ|Ξ(A∗A)|φ)(ψ|ψ) ≥ |(ψ|Ξ(1)−1/2Ξ(A)φ)|2, (3.6)
which implies (3.4).
3.2. Completely positive semigroups. Let K be a finite dimensional Hilbert space. Let
us consider a c.p. semigroup on B(K). We will always assume the semigroup to be
continuous, so that it can be written as etM for a bounded operatorM on B(K). We will
call etM Markov if it preserves the identity.
C.p. Markov semigroups appear in the literature under various names. Among them
let us mention quantum Markov semigroups and quantum dynamical semigroups.
IfM1,M2 are the generators of (Markov) c.p. semigroups and c1, c2 ≥ 0, then c1M1+
c2M2 is the generator of a (Markov) c.p. semigroup. This follows by the Trotter formula.
Here are two classes of examples of c.p. semigroups:
1) Let Υ = Θ+ i∆ be an operator on K, with Θ,∆ self-adjoint. Then
M(A) := iΥA− iAΥ∗ = i[Θ, A]− [∆, A]+
is the generator of a c.p. semigroup and
etM (A) = eitΥAe−itΥ
2) Let Ξ be a c.p. map on B(K). Then it is the generator of a c.p. semigroup and
etΞ(A) =
Ξj(A).
Let Θ, ∆ be self-adjoint operators on K. Let h be an auxiliary Hilbert space and
ν ∈ B(K,K ⊗ h). Then it follows from what we wrote above that
M(S) = i[Θ, A]− [∆, A]+ + ν∗ A⊗1 ν, A ∈ B(K), (3.7)
is the generator of a c.p. semigroup. etM is Markov iff 2∆ = ν∗ν.
The following theorem gives a complete characterization of generators of c.p. semi-
groups on a finite dimensional space [Li, GKS].
Theorem 3.4 (Lindblad, Gorini-Kossakowski-Sudarshan). 1) Let etM be a
c.p. semigroup on B(K) for a finite dimensional Hilbert space K. Then there exist
self-adjoint operators Θ, ∆ on K, an auxiliary Hilbert space h and an operator
ν ∈ B(K,K ⊗ h) such that M can be written in the form (3.7) and
{(φ|⊗1 ν ψ : φ, ψ ∈ K} = h. (3.8)
REDUCED AND EXTENDED WEAK COUPLING LIMIT 13
2) We can always choose Θ and ν so that
TrΘ = 0, Tr ν = 0.
(Above, we take the trace of ν on the space K obtaining a vector in h). If this is the
case, then Θ and ∆ are determined uniquely, and ν is determined uniquely up to
the unitary equivalence.
We will say that a c.p. semigroup is purely dissipative if Θ = 0. We will call (3.7) a
Lindblad form of M . We will say that it is minimal iff (3.8) holds.
Remark 3.5. If we identify h with Cn, then we can write
ν∗ A⊗1 ν =
ν∗jAνj .
Then Tr ν = 0 means Tr νj = 0, j = 1, . . . , n.
Proof of Theorem 3.4. Let us prove 1). The unitary group onK, denoted U(K), is compact.
Therefore, there exists the Haar measure on U(K), which we denote dU . Note that
UXU∗dU = TrX.
Define
iΘ−∆0 :=
M(U∗)UdU,
where Θ and ∆0 are self-adjoint.
Let us show that
M(XU∗)UdU = (iΘ−∆0)X. (3.9)
First check this identity for unitary X , which follows by the invariance of the measure
dU . But every operator is a linear combination of unitaries. So (3.9) follows in general.
We can apply the Kadison-Schwarz inequality to the semigroup etM :
etM (X)∗etM (1)−1etM (X) ≤ etM (X∗X). (3.10)
Differentiating (3.10) at t = 0 yields
M(X∗X) +X∗M(1)X −M(X∗)X −X∗M(X) ≥ 0. (3.11)
Replacing X with UX , where U is unitary, we obtain
M(X∗X) +X∗U∗M(1)UX −M(X∗U∗)UX −X∗U∗M(UX) ≥ 0. (3.12)
Integrating (3.12) over U(K) we get
M(X∗X) +X∗X TrM(1)− (iΘ−∆0)X∗X −X∗X(−iΘ−∆0)∗ ≥ 0. (3.13)
Define
∆1 := ∆0 +
TrM(1),
Ξ(A) := M(A)− (iΘ −∆1)A−A(−iΘ −∆1).
Using (3.13) we see that Ξ is positive. A straightforward extension of the above argument
shows that Ξ is also completely positive. Hence, by Theorem 3.1 1), it can be written as
Ξ(A) = ν∗1 A⊗1 ν1,
14 J. DEREZIŃSKI AND W. DE ROECK
for some auxiliary Hilbert space h and a map ν1 : K → K⊗h.
Finally, let us prove 2). The operator Θ has trace zero, because
i TrΘ− Tr∆0 =
U1M(U
∗)UU∗1 dUdU1
U2UM(U
∗)U∗2 dUdU2
= −i TrΘ− Tr∆0.
Let w be an arbitrary vector in h and
∆ := ∆1 + ν
∗1⊗|w) + 1
(w|w),
ν := ν1 + 1⊗|w).
Then the same generator of a c.p. semigroup can be written in two Lindblad forms:
(iΘ−∆1)A+A(−iΘ−∆1) + ν∗1Aν1,
= (iΘ−∆)A+A(−iΘ −∆) + ν∗Aν.
In particular, choosing w := −Tr ν1, we can make sure that Tr ν = 0.
3.3. Classical Markov semigroups. It is instructive to compare c.p. Markov semigroups
with usual (classical) Markov semigroups.
Consider the space Cn. For u = (u1, . . . , un) ∈ Cn we will write u ≥ 0 iff u1, . . . , un ≥
0. We define 1 := (1, . . . , 1). We say that a linear map T is pointwise positive iff u ≥ 0
implies Tu ≥ 0. We say that it is Markov iff T1 = 1.
A one-parameter semigroup Rt 7→ Tt ∈ B(Cn) will be called a (classical) Markov
semigroup iff Tt is pointwise positive and Markov for any t ≥ 0.
Every continuous one-parameter semigroup on Cn is of the form R+ ∋ t 7→ etm for
some n × n matrix m. Clearly, the transformations etm are pointwise positive for any
t ≥ 0 iff mij ≥ 0, i 6= j. They are Markov for any t ≥ 0 iff in addition
jmij = 0.
Markov c.p. semigroups often lead to classical Markov semigroups, as described in
the following easy fact:
Theorem 3.6. Let P1, . . . , Pn ∈ B(K) satisfy P ∗j = Pj and PjPk = δjkPj. Let P be the
(commutative) ∗-algebra generated by P1, . . . , Pn. Clearly, P is naturally isomorphic to
n. Let etM be a Markov c.p. semigroup on B(K) that preserves the algebra P. Then
is a classical Markov semigroup.
Conversely, from a classical Markov semigroup one can construct c.p. Markov semi-
groups:
Theorem 3.7. Let etm be a classical Markov semigroup on Cn. Let e1, . . . , en denote
the canonical basis of Cn and Eij := |ei)(ej |. Let θ1, . . . θn be real numbers and set
Θ := θ1E11 + · · ·+ θnEnn. For A ∈ B(Cn) define
M(A) := i[Θ, A]− 1
mjj [Ejj , A]+ +
mijEijAEji. (3.14)
Then M is the generator of a Markov c.p. semigroup on B(Cn). The algebra P gener-
ated by E11, . . . , Enn is preserved by e
tM and naturally isomorphic to Cn. Under this
REDUCED AND EXTENDED WEAK COUPLING LIMIT 15
identification, M
equals m.
3.4. Invariant c.p semigroups. Let K be a self-adjoint operator on K. Let M be the
generator of a c.p. semigroup on K. We say that M is K-invariant iff
M(A) = e−itKM(eitKAe−itK)eitK , t ∈ R. (3.15)
We will see later on that c.p. semigroups obtained in the weak coupling limit are always
K-invariant with respect the Hamiltonian of the small system.
Note that M can be split in a canonical way into M = i[Θ, ·] +Md, where Md is its
purely dissipative part. M is K-invariant iff [Θ,K] = 0 and Md is K-invariant. Thus in
what follows it is enough to restrict ourselves to the purely dissipative case.
The following two theorems extend Theorem 3.6 and 3.7.
Theorem 3.8. Consider the set-up of Theorem 3.6. Suppose in addition that K is a
self-adjoint operator on K with the eigenvalues k1, . . . , kn and Pj = 1kj (K). Let M
be K-invariant. Then the algebra P is preserved by etM (and hence the conclusion of
Theorem 3.6 holds).
Theorem 3.9. Consider the set-up of Theorem 3.7. If k1, . . . , kn are real and K :=
k1E11 + · · ·+ knEnn, then M is K-invariant.
The following theorem describes the K-invariance on the level of a Lindblad form.
We restrict ourselves to the Markov case.
Theorem 3.10. Let ν ∈ B(K,K⊗h) and let Y be a self-adjoint operator on h such that
M(A) = −1
[ν∗ν,A]+ + ν
∗A⊗ 1ν, (3.16)
νK = (K⊗1 + 1⊗Y )ν. (3.17)
Then M is the generator of a K-invariant purely dissipative Markov c.p. semigroup.
Proof. We check that ν∗ν commutes with K. Then it is enough to verify that A 7→
ν∗A⊗ 1 ν is K-invariant.
There exists a partial converse of Theorem 3.10.
Theorem 3.11. Let M be the generator of a K-invariant purely dissipative Markov c.p.
semigroup. Let h, ν realize its minimal Lindblad form (3.16). Then there exists a self-
adjoint operator Y on h such that (3.17) is true.
Proof. By the uniqueness part of Theorem 3.1 there exists a unique unitary operator Ut
on h such that eitK⊗Ut ν e−itK = ν. We easily check the Ut is a continuous 1-parameter
unitary group so that Ut can be written as e
itY for some self-adjoint Y .
Note that Theorems 3.10 and 3.11 have a clear physical meaning. The operator ν is
responsible for “quantum jumps”. The operator Y describes the energy of the reservoir
(or actually of the part of the reservoir “directly seen” by the interaction). The equation
(3.17) describes the energetic balance in each quantum jump.
3.5. Detailed Balance Condition. In the literature the name Detailed Balance Condition
(DBC) is given to several related but non-equivalent concepts. In this subsection we
discuss some of the versions of the DBC relevant in the weak coupling limit.
16 J. DEREZIŃSKI AND W. DE ROECK
Some of the definitions of the DBC (both for classical and quantum systems) involve
the time reversal [Ag, Ma, MaSt]. In the weak coupling limit one does not need to
introduce the time reversal, hence we will only discuss versions of the DBC that do not
involve this operation. (See however [DM] for a discussion of time-reversal in semigroups
obtained in the weak coupling limit.)
Let us first recall the definition of the classical Detailed Balance Condition. Let p =
(p1, . . . , pn) ∈ Cn be a vector with p1, . . . , pn > 0. Introduce the scalar product on Cn:
(u|u′)p :=
jpj . (3.18)
Let etm be a classical Markov semigroup on Cn. We say that m satisfies the Detailed
Balance Condition for p iff m is self-adjoint for (·|·)p.
Let us now consider the quantum case. Let ρ be a nondegenerate density matrix. As
usual, we assume that K is finite dimensional. On B(K) we introduce the scalar product
(A|B)ρ := Tr ρ1/2A∗ρ1/2B. (3.19)
Let M be the generator of a c.p. semigroup on B(K). Recall that it can be uniquely
represented as
M = i[Θ, ·] +Md,
where Md is its purely dissipative part and i[Θ, ·] its Hamiltonian part. We say that M
satisfies the Detailed Balance Condition (or DBC) for ρ iff Md is self-adjoint and i[Θ, ·]
is anti-self-adjoint for (·|·)ρ.
Note that M satisfies the DBC for ρ iff [Θ, ρ] = 0 and Md satisfies the DBC for ρ.
Therefore, in our further analysis we will often restrict ourselves to the purely dissipative
case.
We believe that in the quantum finite dimensional case the above definition of the
DBC is the most natural. It was used e.g. in [DF1] under the name of the standard
Detailed Balance Condition.
A similar but different definition of the DBC can be found in [FKGV, Al1]. Its only
difference is the replacement of the scalar product (·|·)ρ given in (3.19) with
Tr ρA∗B. (3.20)
Note that if M is K-invariant and ρ is a function of K, then both definitions are equiv-
alent.
The weak coupling limit applied to a small system with a Hamiltonian K interact-
ing with a thermal reservoir at some fixed temperature β always yields a Markov c.p.
semigroup that is K-invariant and satisfies the DBC for ρ = e−βK/Tr eβK ; see e.g.
[LeSp, DF1] and Subsect 4.3.
There exists a close relationship between the classical and quantum DBC.
Theorem 3.12. Consider the set-up of Theorem 3.6. Let ρ be a density matrix on K with
the eigenvalues p1, . . . , pn and let Pj equal the spectral projections of ρ for the eigenvalue
pj. If M satisfies the DBC for ρ, either in the sense of (3.19) or in the sense of (3.20),
then the classical Markov semigroup etM
satisfies the DBC for p = (p1, . . . , pn).
REDUCED AND EXTENDED WEAK COUPLING LIMIT 17
Theorem 3.13. Consider the set-up of Theorem 3.7. Let etm satisfy the classical DBC
for p = (p1, . . . , pn). Then M defined by (3.14) satisfies both quantum versions of the
DBC for ρ := p1E11 + · · ·+ pnEnn.
The following theorem describes the DBC for K-invariant generators on the level of
their Lindblad form. It is an extension of Theorem 3.10. (Note that (3.21), (3.22) are
identical to (3.16), (3.17) of Theorem 3.10).
Theorem 3.14. Let ν ∈ B(K,K⊗h) and Y a self-adjoint operator on h such that
M(A) = −1
[ν∗ν,A]+ + ν
∗A⊗ 1ν, (3.21)
νK = (K⊗1 + 1⊗Y ) ν, (3.22)
Trh νAν
∗ = ν∗ A⊗e−βY ν. (3.23)
Then M is the generator of a K-invariant purely dissipative Markov c.p. semigroup
satisfying the DBC for ρ := e−βK/Tr e−βK.
Proof. It follows from (3.17) that ν∗ν commutes with e−βK/2. Hence [ν∗ν, ·]+ is self-
adjoint for (·|·)ρ.
If M is a map on B(K), then M∗ρ will denote the adjoint for this scalar product. Let
M1(A) = ν
∗ A⊗1 ν. We compute:
1 (A) = Trh e
βK/2⊗1 ν e−βK/2Ae−βK/2 ν∗ eβK/2⊗1
= Trh e
βK/2ν∗ (e−βK/2Ae−βK/2⊗e−βY ) ν eβK/2 (3.24)
= ν∗ A⊗1ν =M1(A). (3.25)
In (3.24) and (3.25) we used (3.23) and (3.22) respectively.
It is possible to replace the condition (3.23) with a different condition (3.26). Note
that whereas (3.23) is quadratic in ν, (3.26) is linear in ν.
Theorem 3.15. Suppose that ǫ is an antiunitary operator on h such that
(φ⊗w|νψ) = (νφ|ψ ⊗ e−βY/2ǫw), φ, ψ ∈ K, w ∈ h. (3.26)
Then (3.23) holds.
Proof. It is sufficient to assume that A = |ψ)(ψ| for some ψ ∈ K. Let φ ∈ K. Let
{wi | i ∈ I} be an orthonormal basis in h. Then
Trh(φ|νAν∗φ) =
(φ⊗wi|νψ)(νψ|φ⊗wi)
(νφ|ψ⊗e−βY/2ǫwi)(ψ⊗e−βY/2ǫwi|νφ)
|ψ)(ψ|⊗e−βY νφ
= (φ|ν∗ A⊗e−βY νφ).
There exists an extension of Theorem 3.11 to the Detailed Balance Condition. It can
be viewed as a partial converse of Theorems 3.14 and 3.15:
18 J. DEREZIŃSKI AND W. DE ROECK
Theorem 3.16. Let M be the generator of a K-invariant purely dissipative Markov c.p.
semigroup satisfying the DBC for e−βK/Tr e−βK . Let h, ν realize its minimal Lindblad
form (3.21). Let a self-adjoint operator Y on h satisfy (3.22). Then (3.23) is true and
there exists a unique antiunitary operator ǫ on h such that (3.26) holds. Besides, ǫY ǫ =
−Y and ǫ2 = 1.
Proof. Step 1. By the proof of Theorem 3.14, the DBC for e−βK/Tr e−βK together with
(3.22) imply (3.23).
Step 2. The next step is to prove that (3.23) and (3.8) imply the existence of an antiu-
nitary ǫ on h satisfying (3.26).
Identify h with Cn, so that we have a complex conjugation w 7→ w in h. We can
assume that Y is diagonal, so that Y w = Y w, w ∈ h. Define ν⋆ by
(φ⊗w|νψ) = (ν⋆φ|ψ ⊗ w̄), φ, ψ ∈ K, w ∈ h. (3.27)
(Note that ⋆ is a different star from ∗ denoting the Hermitian conjugation, see [DF1]).
We can rewrite (3.23) as
ν⋆∗ A⊗1 ν⋆ = ν∗ 1⊗e−βY/2 (A⊗1) 1⊗e−βY/2 ν. (3.28)
(3.28) defines a c.p. map. By the uniqueness part of Theorem 3.1 and (3.8), we
obtain the existence of a unitary map U on h such that ν⋆ = 1⊗Ue−βY/2 ν. Now we set
ǫw = U∗w̄.
Step 3. We apply (3.23) twice:
(φ⊗w|νψ) = (νφ|ψ⊗e−βY/2ǫw) = (φ⊗(e−βY/2ǫ)2w|νψ).
Using (3.8) we obtain w = (e−βY/2ǫ)2w.
Step 4. Finally applying (3.23) together with (3.22) twice we obtain
(φ⊗w|νψ) = (νeβK/2φ|e−βK/2ψ⊗ǫw) = (φ⊗ǫ2w|νψ).
Thus with help of (3.8) we get w = ǫ2w.
Note that the above results show that for c.p. Markov semigroups that areK-invariant
and satisfy the DBC for e−βK/Tr e−βK we naturally obtain a certain algebraic structure
on the “restricted reservoir” h that resembles closely the famous Tomita-Takesaki theory.
The properties of e−βY and ǫ are paralel to those of the modular operator and the modular
conjugations – the basic objects of the Tomita-Takesaki formalism. (See also Subsection
4.3).
4. Bosonic reservoirs. In this section we recall basic terminology related to second
quantization, see e.g. [De0]. We also introduce Pauli-Fierz operators – a class of models
(known in the literature under various names) that are often used to describe realistic
physical systems, see e.g. [DJ1, DJP].
4.1. Second quantization. Let HR be a Hilbert space describing 1-particle states. The
corresponding bosonic Fock space is defined as
Γs(HR) :=
⊗ns HR.
REDUCED AND EXTENDED WEAK COUPLING LIMIT 19
The vacuum vector is Ω = 1 ∈ ⊗0sHR = C.
If z ∈ HR, then
a(z)Ψ :=
n(z|⊗1(n−1)⊗Ψ ∈ ⊗n−1s HR, Ψ ∈ ⊗ns HR,
is called the annihilation operator of z and a∗(z) := a(z)∗ is the corresponding creation
operator. They are closable operators on Γs(HR).
For an operator q on HR we define the operator Γ(q) on Γs(HR) by
= q ⊗ · · · ⊗ q. (4.1)
For an operator h on HR we define the operator dΓ(h) on Γs(HR) by
dΓ(h)
= h⊗ 1(n−1)⊗ + · · · 1(n−1)⊗ ⊗ h.
Note the identity Γ(eith) = eitdΓ(h).
4.2. Coupling to a bosonic reservoir. Let K be a finite dimensional Hilbert space. We
imagine that it describes a small quantum system interacting with a bosonic reservoir
described by the Fock space Γs(HR). The coupled system is described by the Hilbert
space H := K⊗ Γs(HR).
Let V ∈ B(K,K ⊗HR). For Ψ ∈ K ⊗⊗ns HR we set
a(V )Ψ :=
nV ∗⊗1(n−1)⊗Ψ ∈ K ⊗⊗n−1s HR.
a(V ) is called the annihilation operator of V and a∗(V ) := a(V )∗ the corresponding
creation operator. They are closable operators on K⊗Γs(HR). Note in particular that if
V is written in the form
j Vj⊗|bj) (which is always possible), then
a∗(V ) =
Vj ⊗ a∗(bj), a(V ) = V ∗j ⊗ a(bj),
where a∗(bj), a(bj) are the usual creation/annihilation operators introduced in the pre-
vious subsection.
The following class of operators plays the central role in our article:
Hλ = K ⊗ 1 + 1⊗ dΓ(HR) + λ(a∗(V ) + a(V )). (4.2)
HereK is a self-adjoint operator describing the free dynamics of the small system, dΓ(HR)
describes the free dynamics of the reservoir and a∗(V )/a(V ), for some V ∈ B(K,K⊗HR),
describe the interaction. Operators of the form 4.2 will be called Pauli-Fierz operators.
Note that operators of the form (4.2) or similar are very common in the physics lit-
erature and are believed to give an approximate description of realistic physical systems
in many circumstances (e.g. an atom interacting with radiation in the dipole approxima-
tion), see e.g. [DJ1].
4.3. Thermal reservoirs. In this subsection we will discuss thermal reservoirs. We fix a
positive number β having the interpretation of the inverse temperature.
20 J. DEREZIŃSKI AND W. DE ROECK
Recall that the free Hamiltonian is H0 := K⊗1 + 1⊗dΓ(HR). To have a simpler
formula for the Gibbs state of the small system we assume that Tr e−βK = 1. We set
τt(C) := e
itH0Ce−itH0 ,
ωβ(C) := Tr e
−βK⊗|Ω)(Ω| C, C ∈ B(H).
Theorem 4.1. The following are equivalent:
1) For any D1, D2, D
2 ∈ B(K) and
Bj := Dj⊗1 (a∗(V ) + a(V )) D′j ⊗ 1, j = 1, 2,
and for any t ∈ R we have
ωβ(τt(B1)B2) = ωβ (B2τt+iβ(B1)) . (4.3)
2) For any function f on the spectrum of spHR and A ∈ B(K), we have
TrHR 1⊗f̄(−HR) V A V ∗ = V ∗ A⊗e−βHRf(HR) V. (4.4)
Proof. The left hand side of (4.3) equals
Tr e−βK+itKD1V
∗(D′1e
−itKD2⊗e−itHR)V D′2.
The right hand side of (4.3) equals
TrD2V
∗(D′2e
−βK+itKD1⊗e(−β+it)HR)V D′1e−itK .
Now we set A1 := D
−βK+itKD1, A2 := D
−itKD2, and use the cyclicity of the trace.
We obtain
TrA2⊗e−itHR V A1 V ∗ = TrA2V ∗A1⊗e−βHR+itHR V.
By the Fourier transformation we get
TrA2⊗f̄(−HR) V A1 V ∗ = TrA2V ∗A1⊗e−βHRf(HR) V.
This implies (4.4).
We will say that the reservoir is thermal at the inverse temperature β iff the conditions
of Theorem 4.1 are true.
(4.3) is just the β-KMS condition for the state ωβ , the dynamics τ and appropriate
operators. Note that (4.3) is satisfied for Pauli-Fierz semi-Liouvilleans constructed with
help of the Araki-Woods representations of the CCR, where we use the terminology of
[DJP, De0]. Theorem 4.1 describes a substitute of the KMS condition without invoking
explicitly operator algebras.
The KMS condition is closely related to the Tomita-Takesaki theory. One of the
objects introduced in this theory is the modular conjugation. It turns out that the set-up
of Theorem 4.1 is sufficient to introduce a substitute for the modular conjugation without
talking about operator algebras.
Define
HR̃ := {(φ|⊗f(HR) V ψ : φ, ψ ∈ K, f ∈ Cc(R)}
(cl denotes the closure). Clearly, HR̃ is a subspace of HR invariant with respect to the
1-particle reservoir Liouvillean HR. It describes the part of HR that is coupled to the
small system. Let HR̃ denote the operator HR restricted to the space HR̃.
REDUCED AND EXTENDED WEAK COUPLING LIMIT 21
Theorem 4.2. Suppose that the reservoir is thermal at inverse temperature β. Then
there exist a unique antiunitary operator ǫR̃ on HR̃ such that
(φ⊗w|V ψ) = (V φ|ψ⊗e−βHR̃ǫR̃w). (4.5)
It satisfies ǫ2
= 1 and ǫR̃HR̃ǫR̃ = −HR̃.
Proof. For f ∈ Cc(R), φ, ψ ∈ K, we set
(φ|⊗e−βHR̃/2f(HR̃) V ψ
:= (ψ|⊗f̄(−HR̃)V φ.
(4.4) implies that ǫR̃ is a well defined antiunitary map.
5. Quantum Langevin dynamics. Suppose that we are given a c.p. Markov semigroup
etM on B(K). We will describe a certain class of self-adjoint operators Z on a larger
Hilbert space such that e−itZ · eitZ is a dilation on etM . We will use the name quantum
Langevin (or stochastic) dynamics for e−itZ · eitZ . The unitary group e−itZ will be called
a Langevin (or stochastic) Schrödinger dynamics.
In Subsection 5.1 we will restrict ourselves to a subclass of quantum Langevin dy-
namics involving only the so-called linear noises. Actually, at present our results on the
extended weak coupling limit are limited only to them.
In Subsection 5.2 we will describe a more general class of quantum Langevin dynamics,
which also involve quadratic noises. Our construction involving quadratic noises is related
to the operator-theoretic approach of Chebotarev [Ch, ChR], and especially of Gregoratti
[Gr].
We expect that our approach to the extended weak coupling limit can be improved
to cover also this larger class. Within the approach of [AFL] there exist partial results in
this direction [Go].
The history of the discovery of quantum Langevin dynamics is quite involved. The
construction can be traced back to [AFLe], and especially [HP] where the quantum
stochastic calculus was introduced. But apparently only in [Fr] and [Maa] it was inde-
pendently realized that this leads to a dilation of Markov c.p. semigroups. Let us also
mention [At, Me, Fa] for more recent presentations of the quantum stochastic calculus.
5.1. Linear noises. Apart from a c.p. Markov semigroup etM let us fix some additional
data. More precisely, we fix an operator Υ, an auxiliary Hilbert space h and an operator
ν from K to K ⊗ h such that
−iΥ + iΥ∗ = −ν∗ν
and M is given by
M(A) = −i(ΥA−AΥ∗) + ν∗ A⊗1 ν, A ∈ B(K).
In other words, we fix a concrete Lindblad form of M .
Introduce the Hilbert space ZR := h ⊗ L2(R). The enlarged Hilbert space is Z :=
K ⊗ Γs(ZR).
Let ZR be the operator of multiplication by the variable x on L
2(R). Let (1|, |1) be
defined as in (2.2).
22 J. DEREZIŃSKI AND W. DE ROECK
We choose a basis (bj) in h, so that we can write
νj ⊗ |bj). (5.1)
(Note that at the end the construction will not depend on the choice of a basis). Set
ν+j = νj , ν
j = ν
For t ≥ 0 we define the quadratic form
Ut := e
−itdΓ(ZR)
t≥tn≥···≥t1≥0
dtn · · · dt1
×(2π)−n2
j1,...,jn
ǫ1,...,ǫn∈{+,−}
×(−i)ne−i(t−tn)Υνǫnjn e
−i(tn−tn−1)Υ · · · νǫ1j1 e
−i(t1−0)Υ
k=1,...,n: ǫk=+
a∗(eitkZRbjk ⊗ |1))
k′=1,...,n: ǫk′=−
a(eitk′ZRbjk′ ⊗ |1));
U−t := U
We will denote by IK the embedding of K ≃ K ⊗ Ω in Z.
Theorem 5.1. Ut extends to a strongly continuous unitary group on Z such that
I∗KUtIK = e
−itΥ,
I∗KUt A⊗ 1 U−tIK = etM (A).
Thus Ut is a unitary dilation of e
−itΥ, and Ut · U∗t is a dilation of etM .
As every strongly continuous unitary group, Ut can be written as e
−itZ for a certain
self-adjoint operator Z. Note that formally (and also rigorously with an appropriate
regularization)
(Υ + Υ∗) + dΓ(ZR)
+(2π)−
2 a∗ (ν ⊗ |1)) + (2π)− 12 a (ν ⊗ |1)) .
Thus Z has the form of a Pauli-Fierz operator with a rather singular interaction.
Let us present an alternative variation of the above construction, which is actually
closer to what can be found in the literature. Let F be the Fourier transformation on
ZR = h ⊗ L2(R) defined as in (2.6). The operator Z transformed by 1K⊗Γ(F) will be
denoted by
Ẑ := 1K⊗Γ(F) Z 1K⊗Γ(F∗). (5.2)
It equals
(Υ + Υ∗)⊗1 + 1⊗dΓ(Dτ )
+a (ν ⊗ |δ0)) + a∗ (ν ⊗ |δ0)) ,
REDUCED AND EXTENDED WEAK COUPLING LIMIT 23
where δ0, Dτ are defined as in (2.8), (2.9).
Similarly to the operator of Section 2.1 denoted with the same symbol, the operator
Ẑ (as well as Z) has a number of intriguing properties. Let us describe one of them.
Let D0 := h ⊗ H1(R). (Recall that H1(R) is the first Sobolev space). Let
Γs(D0),
denote the corresponding algebraic Fock space and D1 := K ⊗
Γs(D0). Introduce the
(non-self-adjoint) sesquilinear form
Ẑ+ = Υ⊗1 + 1⊗dΓ(Dτ ) (5.3)
+a (ν ⊗ |δ0)) + a∗ (ν ⊗ |δ0)) .
Let ψ, ψ′ ∈ D1. Then
(ψ|(e−itẐ − 1)ψ′) = −i(ψ|Ẑ+ψ′). (5.4)
Thus it seems that Ẑ+ = Ẑ, which is true only if Υ is self-adjoint and hence there are
no off-diagonal terms in Z. Clearly, the explanation of the above paradox is similar as
in Subsect. 2.1: (ψ|e−itẐψ′) is not differentiable at zero. This is related to the fact that
ψ, ψ′ do not belong to DomZ. Thus Ẑ+ can again be called a false form.
In the literature, the Langevin Schrödinger dynamics e−itẐ is usually introduced
through the so-called Langevin (or stochastic) Schrödinger equation satisfied by
Ŵ (t) := eitdΓ(Dτ )e−itẐ . (5.5)
To write this equation recall the decomposition (5.1) and note that. Then, in the sense
of quadratic forms on D1, we have
Ŵ (t) =
Υ⊗1 + a∗ (ν ⊗ |δt))
Ŵ (t) +
ν∗j Ŵ (t)a (bj ⊗ |δt)) . (5.6)
Note that a (ν ⊗ |δ0)) and a∗ (ν ⊗ |δ0)) appearing in Ẑ and Ẑ+ are quantum analogs
of a classical white noise. They are “localized” at τ = 0. Besides, they are (formally)
given by a linear expression in terms of creation/annihilation operators. Therefore, they
are often called linear quantum noises.
5.2. Quadratic noises. This subsection is outside of the main line of this article. It is
closely related to Subsect. 2.2. It is not needed for the description of the weak coupling
limit, as given in the next section.
Clearly, Ψ ∈ K⊗
⊗ns h⊗ L2(R)
⊗ns L2(R, h)
can be identified with a function
Ψ(τ1, . . . , τn) with values in K⊗ (⊗nh) and the arguments satisfying τ1 < · · · < τn.
Let S be a unitary operator on K ⊗ h. Let S(j) be this operator acting on K ⊗ ⊗nh,
where it is applied to the j’th “leg” of the tensor product ⊗nh. We define an operator
Λ(S) on K ⊗
⊗ns L2(R, h)
as follows: If τ1 < · · · < τk < 0 < τk+1 < · · · < τn, then
(Λ(S)Ψ) (τ1, . . . , τn) := S(k+1) · · ·S(n)Ψ(τ1, . . . , τn).
Clearly, Λ(S) is a unitary operator. If K = C, then it coincides with Γ(γ(S)), where γ(S)
was defined in (2.15) and Γ is the functor of the second quantization defined in (4.1).
24 J. DEREZIŃSKI AND W. DE ROECK
Introduce the operator ẐS,0 on K⊗Γs(ZR) by
ẐS,0 := Υ + Λ(S)
∗1⊗dΓ(Dτ )Λ(S). (5.7)
The operator (5.7) is very singular and contains a “delta interaction at τ = 0”.
Let us now define the dynamics ÛS,t that generalizes Ût. Let Sij ∈ B(K) be defined
Sij⊗|bi)(bj |. (5.8)
ν+S,j = νj , ν
S,j =
ν∗i Sij .
Then we introduce the quadratic form
ÛS,t :=
t≥tn≥···≥t1≥0
dtn · · · dt1
j1,...,jn
ǫ1,...,ǫn∈{+,−}
×(−i)n
k=1,...,n: ǫk=+
a∗(bjk ⊗ |δtk−t))
e−i(t−tn)ẐS,0νǫnS,jn⊗1e
−i(tn−tn−1)ẐS,0 · · · νǫ1S,j1⊗1e
−i(t1−0)ẐS,0
k′=1,...,n: ǫk′=−
a(bjk′ ⊗ |δtk′ ));
ÛS,−t := Û
One can check that ÛS,t extends to a strongly continuous unitary group. Therefore, one
can define a self-adjoint operator ẐS such that ÛS,t = e
−itẐS . It satisfies
I∗KÛS,tIK = e
−itΥ,
I∗KÛS,t A⊗ 1 ÛS,−tIK = etM (A).
It is awkward to write a formula for ẐS in terms of creation/annihilation operators,
even formally. There exists however and alternative formalism that is commonly used in
the literature to define the group e−itẐS . Let ψ, ψ′ ∈ D1. Introduce the cocycle
ŴS(t) := e
itdΓ(Dτ )e−itẐS . (5.9)
Then, in the sense of a quadratic form on D1, the cocycle satisfies the differential equation
ŴS(t) =
Υ⊗1 + a∗(ν⊗|δt))
ŴS(t) (5.10)
i(1 − Sij)⊗a∗(bi ⊗ |δt)) ŴS(t) a(bj⊗|δt)) (5.11)
ν−S,j ŴS(t)a(bj⊗|δt)). (5.12)
REDUCED AND EXTENDED WEAK COUPLING LIMIT 25
This formula is the quantum Langevin (stochastic) equation for the cocycle ŴS(t) in the
sense of [HP, Fa, Pa, At, Maa, Fr, Bar, Me], which includes all three kinds of noises. In
the literature, the dilation e−itẐS is usually introduced through a version of (5.12).
5.3. Total energy operator. Let us analyze the impact of the invariance of a c.p. semigroup
on its quantum Langevin dynamics.
Suppose now that K is a self-adjoint operator on K and Y a self-adjoint operator on
h. Assume that they satisfy
ν K = (K⊗1 + 1⊗Y )ν,
(Υ + Υ∗),K
= 0. (5.13)
This implies in particular that M is K-invariant. Define the self-adjoint operator on Z
E := K⊗1 + 1⊗dΓ(Y⊗1). (5.14)
Then it is easy to see that the quantum Langevin dynamics commutes with this operator:
[E, e−itZ ] = 0. (5.15)
E will be called the total energy operator, which is a name suggested by the physical
interpretation that we attach to E.
Next we discuss the implications of the DBC of a c.p. semigroup on its quantum
Langevin dynamics. We set
σt(C) := e
itECe−itE ,
ωβ(C) := Tr e
−βK⊗|Ω)(Ω| C/Tr e−βK , C ∈ B(Z).
We will see that the DBC for e−βK/Tr e−βK is related to a version of the β-KMS con-
dition for the dynamics σt and the state ωβ .
Theorem 5.2. Assume (5.14). Then the following statements are equivalent:
1) For any D1, D2, D
2 ∈ B(K), f1, f2 ∈ L2(R) and
Bj := Dj⊗1
a∗(ν⊗|fj)) + a(ν⊗|fj))
D′j⊗1, j = 1, 2.
and for any t ∈ R we have
ωβ(σt(B1)B2) = ωβ (B2σt+iβ(B1)) . (5.16)
Trh νAν
∗ = ν∗ A⊗e−βY ν, (5.17)
(This implies in particular that M satisfies the DBC for e−βK/Tr e−βK).
6. Weak coupling limit for Pauli-Fierz operators. In this section we describe the
main results of this article. They are devoted to a rather large class of Pauli-Fierz oper-
ators in the weak coupling limit. In the first subsection we recall the well known results
about the reduced dynamics, which go back to Davies [Da1, Da2, Da3]. In the second
subsection we describe our results that include the reservoir [DD2]. They are inspired by
[AFL]. Finally, we discuss the case of thermal reservoirs.
26 J. DEREZIŃSKI AND W. DE ROECK
6.1. Reduced weak coupling limit. We consider a Pauli-Fierz operator
Hλ = K ⊗ 1 + 1⊗ dΓ(HR) + λ(a∗(V ) + a(V )).
We assume that K is finite dimensional and for any A ∈ B(K) we have
‖V ∗A ⊗
1 e−itH0V ‖dt < ∞. The following theorem is essentially a special case of a result of
Davies [Da1, Da2, Da3], see also [DD2].
Theorem 6.1 (Reduced weak coupling limit for Pauli-Fierz operators). There
exists a K-invariant Markov c.p. semigroup etM on B(K) such that
e−itK/λ
itHλ/λ
A⊗ 1 e−itHλ/λ
itK/λ2 = etM (A),
and a contractive semigroup e−itΥ on K such that [Υ,K] = 0 and
eitK/λ
−itHλ/λ
IK = e
−itΥ.
If the reservoir is at inverse temperature β, then M satisfies the DBC for the state
e−βK/Tr e−βK .
The operator Υ ∈ B(K) arising in the weak coupling limit equals
Υ := −i
k−k′=ω
1k(K)V
∗1k′(K)e
−it(HR−ω)V 1k(K)dt.
In order to write an explicit formula forM it is convenient to introduce an additional
assumption, which anyway will be useful later on in the extended weak coupling limit.
Assumption 6.2. Suppose that for any ω ∈ spK − spK there exist an open Iω ⊂ R and
a Hilbert space hω such that ω ∈ Iω and
Ran1Iω (HR) ≃ hω ⊗ L2(Iω , dx),
1Iω (HR)HR is the multiplication operator by the variable x ∈ Iω and, for ψ ∈ K,
1Iω(HR)V ψ ≃
v(x)ψdx.
Assume that Iω are disjoint for distinct ω and x 7→ v(x) ∈ B(K,K⊗hω) is continuous at
Thus we assume that the reservoir 1-body Hamiltonian HR and the interaction V are
well behaved around the Bohr frequencies – differences of eigenvalues of K.
Let h := ⊕
hω. We define νω ∈ B(K,K⊗hω) by
νω := (2π)
ω=k−k′
1k(K)v(ω)1k′(K)
and ν ∈ B(K,K⊗h) by
REDUCED AND EXTENDED WEAK COUPLING LIMIT 27
Note that
iΥ− iΥ∗ =
k−k′=ω
1k(K)V
∗1k′(K)e
−it(HR−ω)V 1k(K)dt
k−k′=ω
1k(K)v
∗(ω)1k′(K)v(ω) 1k(K)
= ν∗ν.
The generator of a c.p. Markov semigroup that arises in the reduced weak coupling limit,
called sometimes the Davies generator, is
M(A) = −i(ΥA−AΥ∗) + ν∗ A⊗1 ν (6.1)
[A, ν∗ν]+ + ν
∗A⊗1 ν, A ∈ B(K).
6.2. Energy of the reservoir in the weak coupling limit. Introduce the operator Y on h
by setting
Y = ω on hω. (6.2)
The operator Y has the interpretation of the asymptotic energy of the restricted reservoir.
Theorem 6.3. 1) The operator ν constructed in the weak coupling limit satisfies
ν K = (K⊗1 + 1⊗Y )ν. (6.3)
This implies in particular that M is K-invariant.
2) If the reservoir is at inverse temperature β, then ν satisfies
Trh νAν
∗ = ν∗ A⊗e−βY ν, (6.4)
This implies in particular that M satisfies the DBC for e−βK/Tr e−βK .
6.3. Extended weak coupling limit. Recall that given (Υ, ν, h) we can define the space
ZR and the Langevin Schrödinger dynamics e−itZ on the space Z := K ⊗ Γs(ZR), as in
Subsect. 5.1.
For λ > 0, we define the family of partial isometries Jλ,ω : hω⊗L2(R) → hω⊗L2(Iω) ⊂
(Jλ,ωgω)(y) =
), if y ∈ Iω ;
0, if y ∈ R\Iω.
We set Jλ : ZR → HR, defined for g = (gω) by
Jλg :=
Jλ,ωgω.
Note that Jλ are partial isometries and s− limλց0 J∗λJλ = 1.
Set Z0 := dΓ(ZR). The following theorem [DD2] was inspired by [AFL]:
Theorem 6.4 (Extended weak coupling limit for Pauli-Fierz operators).
s∗ − lim
Γ(J∗λ)e
iλ−2tH0e−iλ
−2(t−t0)Hλeiλ
−2t0H0Γ(Jλ)
= eitZ0e−i(t−t0)Ze−it0Z0 .
28 J. DEREZIŃSKI AND W. DE ROECK
The extended weak coupling limit can be used to describe interesting physical prop-
erties of non-equilibrium quantum systems, see e.g. [DM]. The following corollary, which
generalizes the results of [Du], describes the asymptotics of correlation functions for ob-
servables of the form Γ(Jλ)AΓ(J
λ), where A are observables on the asymptotic space.
Corollary 6.5 (Asymptotics of correlation functions). Suppose that
Aℓ, . . . , A1 ∈ B(Z) and t, tℓ, . . . , t1, t0 ∈ R. Then
s∗ − lim
iλ−2tH0e−iλ
−2(t−tℓ)Hλe−iλ
−2tℓH0Γ(Jλ)AℓΓ(J
· · ·Γ(Jλ)A1Γ(J∗λ)eiλ
−2t1H0e−iλ
−2(t1−t0)Hλe−iλ
−2t0H0IK
= I∗Ke
itZ0e−i(t−tℓ)Ze−itℓZ0Aℓ
· · ·A1eit1Z0e−i(t1−t0)Ze−it0Z0IK.
The following corollary is interesting since it describes how reservoir Hamiltonians
converge to operators whose dynamics under the quantum Langevin dynamics U−t · Ut
is well-studied, see e.g. [Bar].
Corollary 6.6 (Asymptotic reservoir energies). Consider the operator Y : h 7→
h defined in (6.2). The operator E := K⊗1 + 1⊗dΓ(Y⊗1) plays the role of “asymptotic
total energy operator”, i.e.
[E, eitZ ] = 0. (6.5)
Besides, for κ1, . . . , κℓ ∈ R,
s∗ − lim
iλ−2tH0e−iλ
−2(t−tℓ)Hλe−iλ
−2tℓH0 eiκℓdΓ(HR)
· · · eiκ1dΓ(HR) eiλ
−2t1H0e−iλ
−2(t1−t0)Hλe−iλ
−2t0H0IK
= I∗Ke
itZ0e−i(t−tℓ)Ze−itℓZ0 eiκℓdΓ(Y⊗1)
· · · eiκ1dΓ(Y⊗1) eit1Z0e−i(t1−t0)Ze−it0Z0IK.
REDUCED AND EXTENDED WEAK COUPLING LIMIT 29
References
[AFLe] Accardi, L., Frigerio, A., Lewis, J.T.: Quantum stochastic processes, Publ. RIMS 18
(1982) 97-133
[AFL] L. Accardi, A. Frigerio, Y.G. Lu: Weak coupling limit as a quantum functional central
limit theorem, Comm. Math. Phys. 131, 537–570 (1990).
[ALV] L. Accardi and Y. G. Lu and I. V. Volovich: Quantum Theory and Its Stochastic Limit,
Springer, New York, 2002
[Ag] G. S. Agarwal: Open quantum Markovian systems and microreversibility, Z. Physik, 258,
(1973) 409–422
[Al1] R. Alicki: On the detailed balance condition for non-Hamiltonian systems, Rep. Math.
Phys., 10, (1976) 249-258
[Al2] R. Alicki: Invitation to quantum dynamical semigroups, eds P. Garbaczewski and R.
Olkiewicz, ” Dynamics of Dissipation ” Lecture Notes in Physics, Springer, 2002
[AL] Alicki, R., Lendi, K.: Quantum dynamical semigroups and applications, Lecture Notes in
Physics no 286, Springer 1991
[At] S. Attal: Quantum noises, “Quantum Open Systems II: The Markovian approach”, eds S.
Attal and A. Joye and C.-A. Pillet, Lecture Notes in Mathematics 1881, Springer, 2006
[AtP] S. Attal and Y. Pautrat: From repeated to continuous quantum interactions: to appear
in Annales Henri Poincaré” 7 2006
[AtJ] S. Attal and A. Joye: The Langevin Equation for a Quantum Heat Bath: math-ph/0612055
[Bar] A. Barchielli: Continual Measurements in Quantum Mechanics, Open Quantum Systems
III. Recent developments eds S. Attal and A. Joye and C.-A. Pillet, Lecture Notes in Math-
ematics 1882, Springer 2006, pp 207-292
[Ch] A. M. Chebotarev: Symmetric form of the Hudson-Parthasarathy stochastic equation, Mat.
Zametki [Math. Notes] 60 (1996) 726-750
[ChR] A. M. Chebotarev and G. V. Ryzhakov: On the Strong Resolvent Convergence of the
Schrodinger Evolution to Quantum Stochastics, Mathematical Notes 74 (2003) 717-733
[Da1] E. B. Davies: Markovian master equations, Comm. Math. Phys. 39, 91 (1974).
[Da2] Davies, E. B.: Markovian master equations II. Math. Ann. 219, 147 (1976).
[Da3] Davies, E. B.: One parameter semigroups, Academic Press 1980
[De0] J. Dereziński: Introduction to Representations of Canonical Commutation and Anticom-
mutation Relations, Large Coulomb Systems, Lecture Notes in Physics 695, eds J. Derezinski
and H. Siedentop, Springer, 2006
[DD1] J. Dereziński, W. De Roeck: Extended weak coupling limit for Friedrichs Hamiltonians,
Journ. Math. Phys. 48 (2007), 012103
[DD2] J. Dereziński, W. De Roeck: Extended weak coupling limit for Pauli-Fierz operators, to
appear in Commun. Math. Phys., preprint math-ph/0610054
[DF1] J. Dereziński, R. Früboes: Fermi Golden Rule and open quantum systems, ”Open Quan-
tum Systems III Recent Developments” Lecture Notes in Mathematics 1882 eds S. Attal,
A. Joye, C.-A. Pillet 2006, pp 67-116
[DF2] J. Dereziński, R. Früboes: Renormalization of Friedrichs Hamiltonians, Reports on Math.
Phys. 50, 433–438 (2002)
[DJ1] J. Dereziński and V. Jaksic: Spectral theory of Pauli-Fierz operators, J. Func. Anal. 180
(2001) ”243–327”,
[DJP] Dereziński, J., Jakšić, V., Pillet, C. A.: Perturbation theory of W ∗-dynamics, Liouvilleans
and KMS-states, to appear in Rev. Math. Phys
http://arxiv.org/abs/math-ph/0612055
http://arxiv.org/abs/math-ph/0610054
30 J. DEREZIŃSKI AND W. DE ROECK
[DM] W. De Roeck and C. Maes: Fluctuations of the dissipated heat in a quantum stochastic
model, Rev. Math. Phys. 18 (2006) ”619–653”,
[Du] R. Dümcke: Convergence of multitime correlation functions in the weak and singular cou-
pling limits, J. Math. Phys. 24 (19983) 311-315
[EL] D.E. Evans and J.T. Lewis: Dilations of irreversible evolutions in algebraic quantum theory,
ed. Dublin Institute for Advanced Studies, 1977
[Fa] Fagnola,F.: Quantum stochastic differential equations and dilation of completely positive
semigroups, ”Open Quantum Systems III Recent Developments” Lecture Notes in Mathe-
matics 1882 eds S. Attal, A. Joye, C.-A. Pillet 2006, pp 183-220
[Fr] Frigerio, A.: Covariant Markov dilations of quantum dynamical semigroups, Publ. RIMS
Kyoto Univ. 21 (1985) 657-675
[FKGV] A. Frigerio and A. Kossakowski and V. Gorini and M. Verri: Quantum detailed balance
and KMS condition, CMP, 57 (1977) 97–110
[GKS] Gorini, V., Kossakowski, A., Sudarshan, E.C.G. Journ. Math. Phys. 17 (1976) 821
[Go] J. Gough: Quantum Flows as Markovian Limit of Emission, Absorption and Scattering
Interactions, CMP 254 (2005) 489–512
[Gr] Gregoratti, M.: The Hamiltonina operator associated with some quantum stochastic evo-
lutions, Comm. Math. Phys. 222 (2001) 181-200; Erratum, Comm. Math. Phys. 264 (2006)
563-564
[Haa] Haake, F.: Statistical treatment of open systems by generalized master equation. Springer
Tracts in Modern Physics 66, Springer-Verlag, Berlin, 1973.
[HP] R. L. Hudson, K. R. Parthasaraty: Quantum Ito’s formula and stochastic evolutions,
Comm. Math. Phys. 93 no. 3, 301–323 (1984).
[Ku] B. Kümmerer, W. Schröder: A new construction of unitary dilations: singular coupling to
white noise, in Quantum Probabilty and Applications, eds L. Accardi and W. von Walden-
fels, (1984)
[LeSp] Lebowitz, J., Spohn, H.: Irreversible thermodynamics for quantum systems weakly cou-
pled to thermal reservoirs. Adv. Chem. Phys. 39, 109 (1978).
[Li] G. Lindblad: On the generators of quantum dynamical semigroups, Comm. Math. Phys.
48 (1976) 119-130
[Maa] Maasen, H.: Quantum Markov processes on Fock space described by integral kernels, in:
Quantum probability and applications II, LNM 1136 Springer Berlin 1985, pp 361-374
[Ma] Majewski, W. A.: Journ. Math. Phys. The detailed balance condition in quantum statistical
mechanics 25 (1984) 614
[MaSt] Majewski, W. A., Streater, R. F.: Detailed balance and quantum dynamical maps Journ.
Phys. A: Math. Gen. 31 (1998) 7981-7995
[Me] Meyer, P.-A.: Quantum probability for probabilists, 2nd edition, L.N.M. 1538, Springer,
Berlin 1995
[NF] B. Sz. Nagy and C. Foias: Harmonic Analysis of Operators in Hilbert Space, North-Holland,
New York (1970)
[Pa] Parthasarathy, K.R.: An introduction to quantum stochastic calculus, Birkhäuser, Basel-
Boston-Berlin 1992
[St] W. F. Stinespring: Positive functions on C-algebras, Proc. Amer. Math. Soc., 6, (1955)
211-216.
[VH] L. Van Hove: Quantum-mechanical perturbations giving rise to a statistical transport
equation. Physica 21, 517 (1955).
	Introduction.
	Toy model of the weak coupling limit.
	Dilations of contractive semigroups.
	``Toy quadratic noises''.
	Weak coupling limit for Friedrichs operators.
	Completely positive maps and semigroups.
	Completely positive maps.
	Completely positive semigroups.
	Classical Markov semigroups.
	Invariant c.p semigroups.
	Detailed Balance Condition.
	Bosonic reservoirs.
	Second quantization.
	Coupling to a bosonic reservoir.
	Thermal reservoirs.
	Quantum Langevin dynamics.
	Linear noises.
	Quadratic noises.
	Total energy operator.
	Weak coupling limit for Pauli-Fierz operators.
	Reduced weak coupling limit.
	Energy of the reservoir in the weak coupling limit.
	Extended weak coupling limit.
ABSTRACT
  We give an extended review of recent work on the extended weak coupling
limit. Background material on completely positive semigroups and their unitary
dilations is given, as well as a particularly easy construction of `quadratic
noises'.

<|endoftext|><|startoftext|>
arXiv:0704.0670v1  [nucl-ex]  5 Apr 2007
Typeset with jpsj2.cls <ver.1.2> Letter
Complete Set of Polarization Transfer Observables
for the 12C(p, n) Reaction at 296 MeV and 0◦
Masanori Dozono∗, Tomotsugu Wakasa, Ema Ihara, Shun Asaji, Kunihiro Fujita1, Kichiji Hatanaka1,
Takashi Ishida2, Takaaki Kaneda1, Hiroaki Matsubara1, Yuji Nagasue, Tetsuo Noro, Yasuhiro Sakemi3,
Yohei Shimizu1, Hidemitsu Takeda, Yuji Tameshige1, Atsushi Tamii1 and Yukiko Yamada
Department of Physics, Kyushu University, Fukuoka 812-8581
Research Center for Nuclear Physics, Osaka University, Osaka 567-0047
Laboratory of Nuclear Science, Tohoku University, Sendai 982-0826
Cyclotron and Radioisotope Center, Tohoku University, Sendai 980-8578
A complete set of polarization transfer observables has been measured for the 12C(p, n)
reaction at Tp = 296 MeV and θlab = 0
◦. The total spin transfer Σ(0◦) and the observable
f1 deduced from the measured polarization transfer observables indicate that the spin–dipole
resonance at Ex ≃ 7 MeV has greater 2
− strength than 1− strength, which is consistent
with recent experimental and theoretical studies. The results also indicate a predominance of
the spin-flip and unnatural-parity transition strength in the continuum. The exchange tensor
interaction at a large momentum transfer of Q ≃ 3.6 fm−1 is discussed.
KEYWORDS: complete set of polarization transfer observables, spin–dipole resonance, exchange tensor
interaction
The charge exchange reaction at intermediate ener-
gies (T & 100 MeV/A) is one of the best probes to
study spin–isospin excitations in nuclei, such as spin–
dipole (SD) excitations characterized by ∆L = 1, ∆S =
1, and ∆Jπ = 0−, 1−, and 2−. In previous (p, n)
and (n, p) experiments on 12C,1, 2 spin–dipole resonances
(SDRs) were found at Ex ≃ 4 and 7 MeV. Analysis of the
angular distributions of the SDRs at Ex ≃ 4 and 7 MeV
indicate that they consist of mainly 2− and 1− compo-
nents, respectively. However, a recent 12C(~d, 2He)12B ex-
periment3 suggested that the SDR at Ex ≃ 7 MeV in
has more 2− components than 1− components. This sug-
gestion is supported by a 12C(12C, 12N)12B experiment4
and by theoretical calculations including tensor correla-
tions.5 Thus the spin-parity assignment of the SDR at
Ex ≃ 7 MeV for the A = 12 system is still controversial.
A complete set of polarization transfer (PT) observ-
ables at 0◦ is a powerful tool for investigating the spin-
parity Jπ of an excited state. The total spin transfer
Σ(0◦) deduced from such a set gives information on the
transferred spin ∆S, which is independent of theoreti-
cal models.6 Furthermore, information can be obtained
on the parity from the observable f1.
7 On the other
hand, each PT observable is sensitive to the effective
nucleon–nucleon (NN) interaction. The PT observables
for ∆Jπ = 1+ transitions have been used to study the
exchange tensor interaction at large momentum trans-
fers.8, 9
In this Letter, we present measurements of a com-
plete set of PT observables for the 12C(p, n) reaction
at Tp = 296 MeV and θlab = 0
◦. We have deduced
the total spin transfer Σ and the observable f1 using
the measured PT observables in order to investigate the
spin-parity structure in both the SDR and continuum
regions. We also compare the PT observables for the
∗E-mail address: dozono@kutl.kyushu-u.ac.jp
12C(p, n)12N(g.s.; 1+) reaction with distorted-wave im-
pulse approximation (DWIA) calculations employing the
effective NN interaction in order to assess the effective
tensor interaction at a large exchange momentum trans-
fer of Q ≃ 3.6 fm−1.
Measurements were carried out at the neutron time-
of-flight facility10 at the Research Center for Nuclear
Physics (RCNP), Osaka University. The proton beam
energy was 296 MeV and the typical current and polar-
ization were 500 nA and 0.70, respectively. The neutron
energy and polarization were measured by the neutron
detector/polarimeter NPOL3.11 We used a natural car-
bon (98.9% 12C) target with a thickness of 89 mg/cm2.
The measured cross sections were normalized to the
0◦ 7Li(p, n)7Be(g.s. + 0.43 MeV) reaction, which has
a center of mass (c.m.) cross section of σc.m.(0
27.0±0.8 mb/sr at this incident energy.12 The systematic
uncertainties of the data were estimated to be 4–6%.
Asymmetries of the 1H(~n, p)n and 12C(~n, p)X reac-
tions in NPOL3 were used to deduce the neutron polar-
ization. The effective analyzing power Ay;eff of NPOL3
was calibrated by using polarized neutrons from the
12C(~p, ~n)12N(g.s.;1+) reaction at 296 MeV and 0◦. A
detailed description of the calibration can be found in
Ref. 11. The resulting Ay;eff was 0.151 ± 0.007 ± 0.004,
where the first and second uncertainties are statistical
and systematic, respectively.
Figure 1 shows the double differential cross section and
a complete set of PT observables Dii (i = S, N, and L)
at 0◦ as a function of excitation energy Ex. The labo-
ratory coordinates at 0◦ are defined so that the normal
(N̂ ) direction is the same as N̂ at finite angles (nor-
mal to the reaction plane), the longitudinal (L̂) direc-
tion is along the momentum transfer, and the sideways
(Ŝ) direction is given by Ŝ = N̂ × L̂. The data of the
cross section in Fig. 1 have been sorted into 0.25-MeV
http://arxiv.org/abs/0704.0670v1
2 J. Phys. Soc. Jpn. Letter Author Name
Fig. 1. Double differential cross section (top panel) and a com-
plete set of polarization transfer observables (bottom three pan-
els) for the 12C(p, n) reaction at Tp = 296 MeV and θlab = 0
The error bars represent statistical uncertainties only.
bins, while the data of Dii(0
◦) have been sorted into
1-MeV bins to reduce statistical fluctuations. A high en-
ergy resolution of 500 keV full width at half maximum
(FWHM) was realized by NPOL3, which enabled us to
observe clearly two SDR peaks at Ex ≃ 4 and 7 MeV.
It should be noted that the DNN(0
◦) value should be
equal to the corresponding DSS(0
◦) value because the
N̂ direction is identical to the Ŝ direction at 0◦. The ex-
perimental DNN (0
◦) and DSS(0
◦) values are consistent
with each other within statistical uncertainties over the
entire range of Ex, demonstrating the reliability of our
measurements.
Figure 2 shows the total spin transfer Σ(0◦) and the
observable f1 defined as
Σ(0◦) =
3− [2DNN(0
◦) +DLL(0
1− 2DNN(0
◦) +DLL(0
2[1 +DLL(0◦)]
as a function of excitation energy Ex. The Σ(0
◦) value is
either 0 or 1 depending on whether ∆S = 0 or ∆S = 1,
which is independent of theoretical models.6 The f1
value is either 0 or 1 depending on the natural-parity
or unnatural-parity transition if a single ∆Jπ transition
is dominant.7 The Σ(0◦) and f1 values of the spin-flip
unnatural-parity 1+ and 2− states at Ex = 0 and 4 MeV,
respectively, are almost unity, which is consistent with
theoretical predictions. The continuum Σ(0◦) values are
almost independent of Ex and take values larger than
0.88 up to Ex = 50 MeV, indicating the predominance
of the spin-flip strength. The solid line in the top panel
of Fig. 2 represents the free NN values of Σ(0◦) for the
Fig. 2. Total spin transfer Σ (top panel) and observable f1 (bot-
tom panel) for the 12C(p, n) reaction at Tp = 296 MeV and
θlab = 0
◦. The error bars represent statistical uncertainties only.
The solid line shows the values of Σ for free NN scattering.
corresponding kinematical condition.13 Enhancement of
Σ(0◦) relative to the free NN values means enhancement
of the ∆S = 1 response relative to the ∆S = 0 response
in nuclei at small momentum transfers, which is consis-
tent with previous studies of (p, p′) scattering.14, 15 The
large values of f1 ≥ 0.72 up to Ex = 50 MeV indicate a
predominance of the unnatural-parity transition strength
in the continuum, consistent with the 90Zr(p, n) result at
295 MeV.7
The top panel of Fig. 3 shows the spin-flip (σΣ) and
non-spin-flip (σ(1−Σ)) cross sections as filled and open
circles, respectively, as functions of Ex. The bottom panel
shows the unnatural-parity dominant (σf1) and natural-
parity dominant (σ(1 − f1)) components of the cross
section as filled and open circles, respectively. The solid
lines are the results of peak fitting of the spectra with
Gaussian peaks and a continuum. The continuum was as-
sumed to be the quasi-free scattering contribution, and
its shape was given by the formula given in Ref. 16.
It should be noted that the spin-flip unnatural-parity
1+ and 2− states at Ex = 0 and 4 MeV, respectively,
form peaks only in the σΣ and σf1 spectra. It is found
that the prominent peak at Ex ≃ 7 MeV is the spin-flip
unnatural-parity component with a Jπ value estimated
to be 2− because the Dii(0
◦) values are consistent with
the theoretical prediction for Jπ = 2−.17 In the σ(1−f1)
spectrum, possible evidence for SD 1− peaks is seen at
Ex ≃ 7, 10, and 14 MeV. The top and bottom panels of
Fig. 4 show theoretical calculations for the unnatural-
parity and natural-parity SD strengths, respectively.5
Experimentally extracted peaks in the σf1 and σ(1−f1)
spectra are also shown. Concentration of the SD 2−
strength at three peaks at Ex ≃ 4, 8, and 13 MeV has
been predicted. Our data agree with this prediction qual-
itatively, but give slightly different excitation energies of
Ex ≃ 4, 7, and 11 MeV. On the other hand, the SD
1− strength has been predicted to be quenched and frag-
mented due to tensor correlations.5 The experimental re-
J. Phys. Soc. Jpn. Letter Author Name 3
Fig. 3. Cross sections separated by Σ (top panel) and f1 (bottom
panel) for the 12C(p, n) reaction at Tp = 296 MeV and θlab = 0
The solid lines show peak fitting of the spectra with Gaussian
peaks and a continuum.
sults are spread over a wide region of Ex ≃ 5–16 MeV
and exhibit similar cross sections, which supports frag-
mentation of the SD 1− strength.
Effective tensor interactions at q ≃ 1–3 fm−1 have
mainly been studied using high spin stretched states.18, 19
The present Dii(0
◦) data can give information on the ex-
change tensor interaction at an extremely large exchange
momentum transfer of Q ≃ 3.6 fm−1. In the Kerman–
McNanus–Thaler (KMT) representation,20 the NN scat-
tering amplitude is represented as
M(q) =A+ 1
(B + E + F )σ1 · σ2 + C(σ1 + σ2) · n̂
(E −B)S12(q̂) +
(F −B)S12(Q̂),
where S12 is the tensor operator, q̂ and Q̂ are direct
and exchange momentum transfers, respectively, and
n̂ = Q̂ × q̂. In a plane-wave impulse approximation
(PWIA), the PT observables for the Gamow–Teller (GT)
transition at 0◦ are simply expressed using parameters
A–F as17
DNN(0
◦) = DSS(0
2B2 + F 2
DLL(0
−2B2 + F 2
2B2 + F 2
If there is no exchange tensor S12(Q̂) interaction (i.e.,
F = B), then Dii(0
◦) = −1/3.
The measured PT observables Dii(0
◦) for the GT
12C(~p, ~n)12N(g.s.;1+) transition are listed in Table I,
where the listed uncertainties are statistical only. The
presentDNN (0
◦) and DSS(0
◦) values are consistent with
each other, as expected, and the present DNN (0
◦) value
agrees with the previously measured DNN (0
◦) value at
the same energy.9 The experimental values deviated from
−1/3, which indicates that there are contributions from
both the exchange tensor interaction at Q ≃ 3.6 fm−1
and nuclear distortion effects.
Fig. 4. SD strengths for unnatural-parity (top panel) and
natural-parity (bottom panel) taken from Ref. 5. The solid lines
represent peaks obtained by fitting σf1 (top panel) and σ(1−f1)
(bottom panel) spectra.
In order to assess these effects quantitatively, we per-
formed microscopic DWIA calculations using the com-
puter code dw81.21 The transition amplitudes were cal-
culated from the Cohen–Kurath wave functions22 assum-
ing Woods–Saxon radial dependence.23 Distorted waves
were generated using the optical model potential (OMP)
for proton elastic scattering data on 12C at 318 MeV.24
We used the effective NN interaction parameterized by
Franey and Love (FL) at 270 or 325 MeV.25
First, we examined the sensitivity of the DWIA results
to the OMPs by using two different parameters.24, 26 The
OMP dependence of Dii(0
◦) was found to be less than
0.01. This insensitivity allows us to useDii(0
◦) as a probe
to study the effective NN interaction. Table I shows the
DWIA results for Dii(0
◦) with the NN interaction at
270 and 325 MeV. It is found that the Dii(0
◦) values,
and DLL(0
◦) in particular, are sensitive to the choice of
the NN interaction. These differences are mainly due to
the exchange tensor interaction S12(Q) at Q ≃ 3.6 fm
The real part of S12(Q) for the FL 325 MeV interaction
is about twice as large as that for the FL 270 MeV in-
teraction at Q ≃ 3.6 fm−1 (see Fig. 3 of Ref. 9). The
experimental Dii(0
◦) values support the DWIA results
with the FL 270 MeV interaction, which indicates that
the exchange tensor part of the FL 270 MeV interaction
has an appropriate strength at Q ≃ 3.6 fm−1. This con-
clusion has already been reported for DNN(0
◦) data,9
however, the present data make the conclusion more rig-
orous because of the high sensitivity of DLL(0
◦) to the
exchange tensor interaction.
In summary, a complete set of PT observables for the
12C(p, n) reaction at Tp = 296 MeV and θlab = 0
◦ has
been measured. The total spin transfer Σ(0◦) and the
observable f1 are deduced in order to study the spin-
parity structure in both the SDR and continuum re-
gions. The Σ(0◦) and f1 values show that the SDR at
4 J. Phys. Soc. Jpn. Letter Author Name
DNN (0
◦) DSS(0
◦) DLL(0
This work −0.216 ± 0.019 −0.210± 0.039 −0.554± 0.023
ref. 9 −0.215 ± 0.019 – –
FL 270 MeV −0.225 −0.225 −0.550
FL 325 MeV −0.191 −0.191 −0.619
Table I. PT observables Dii(0
◦) for the GT 12C(~p, ~n)12N(g.s.;1+) transition at 296 MeV and 0◦ compared with theoretical calculations.
Ex ≃ 7 MeV has greater 2
− strength than 1− strength,
which agrees with the recent theoretical prediction. In
the continuum up to Ex ≃ 50 MeV, a predominance
of the spin-flip and unnatural-parity transition strength
is also found. We have compared the PT observables of
the 12C(p, n)12N(g.s.;1+) reaction with DWIA calcula-
tions employing the FL interaction. The exchange tensor
interaction of the FL 270 MeV interaction is found to
be more appropriate at Q ≃ 3.6 fm−1 than that of the
FL 325 MeV interaction. Thus a complete set of PT ob-
servables provides rigorous information not only on the
spin-parity structure in nuclei but also on the effective
NN interaction.
Acknowledgment
We are grateful to the RCNP cyclotron crew for pro-
viding a good quality beam for our experiments. We also
thank H. Tanabe for his help during the experiments.
This work was supported in part by the Grants-in-Aid
for Scientific Research Nos. 14702005 and 16654064 of
the Ministry of Education, Culture, Sports, Science, and
Technology of Japan.
1) X. Yang, L. Wang, J. Rapaport, C. D. Goodman, C. Foster,
Y. Wang, W. Unkelbach, E. Sugarbaker, D. Marchlenski, S. de
Lucia, B. Luther, J. L. Ullmann, A. G. Ling, B. K. Park, D.
S. Sorenson, L. Rybarcyk, T. N. Taddeucci, C. R. Howell and
W. Tornow: Phy. Rev. C 48 (1993) 1158.
2) B. D. Anderson, L. A. C. Garcia, D. J. Millener, D. M. Manley,
A. R. Baldwin, A. Fazely, R. Madey, N. Tamimi, J. W.Watson
and C. C. Foster: Phys. Rev. C 54 (1996) 237.
3) H. Okamura, T. Uesaka, K. Suda, H. Kumasaka, R. Suzuki,
A. Tamii, N. Sakamoto and H. Sakai: Phys. Rev. C 66 (2002)
054602.
4) T. Ichihara, M. Ishihara, H.Ohnuma, T.Niizeki, T.Yamamoto,
K. Katoh, T. Yamashita, Y. Fuchi, S. Kubono, M. H. Tanaka,
H.Okamura, S. Ishida and T.Uesaka: Nucl.Phys.A 577 (1994)
5) T. Suzuki and H. Sagawa: Nucl. Phys. A 637 (1998) 547.
6) T. Suzuki: Prog. Theor. Phys. 103 (2000) 859.
7) T. Wakasa, H. Sakai, H. Okamura, H. Otsu, T. Nonaka, T.
Ohnishi, K. Yako, K. Sekiguchi, S. Fujita, T. Uesaka, Y. Satou,
S. Ishida, N. Sakamoto, M. B. Greenfield and K. Hatanaka: J.
Phys. Soc. Jpn. 73 (2004) 1611.
8) D.J.Mercer, T.N.Taddeucci, L.J.Rybarcyk, X.Y.Chen, D.L.
Prout, R. C. Byrd, J. B. McClelland, W. C. Sailor, S. DeLucia,
B. Luther, D. G. Marchlenski, E. Sugarbaker, E. Gülmez, C.
A. Whitten, Jr., C. D. Goodman and J. Rapaport: Phys. Rev.
Lett. 71 (1993) 684.
9) T. Wakasa, H. Sakai, H. Okamura, H. Otsu, S. Ishida, N.
Sakamoto, T. Uesaka, Y. Satou, M. B. Greenfield, N. Koori,
A. Okihana and K. Hatanaka: Phys. Rev. C 51 (1995) R2871.
10) H. Sakai, H. Okamura, H. Otsu, T. Wakasa, S. Ishida, N.
Sakamoto, T. Uesaka, Y. Satou, S. Fujita and K. Hatanaka:
Nucl. Instrum. Methods Phys. Res., Sect. A 369 (1996) 120.
11) T. Wakasa, Y. Hagihara, M. Sasano, S. Asaji, K. Fujita, K.
Hatanaka, T. Ishida, T. Kawabata, H. Kuboki, Y. Maeda, T.
Noro, T. Saito, H. Sakai, Y. Sakemi, K. Sekiguchi, Y. Shimizu,
A. Tamii, Y. Tameshige and K. Yako: Nucl. Instrum. Methods
Phys. Res., Sect. A 547 (2005) 569.
12) T. N. Taddeucci, W. P. Alford, M. Barlett, R. C. Byrd, T. A.
Carey, D. E. Ciskowski, C. C. Foster, C. Gaarde, C. D. Good-
man, C. A. Goulding, E. Gülmez, W. Huang, D. J. Horen, J.
Larsen, D. Marchlenski, J. B. McClelland, D. Prout, J. Rapa-
port, L. J. Rybarcyk, W. C. Sailor, E. Sugarbaker and C. A.
Whitten, Jr.: Phys. Rev. C 41 (1990) 2548.
13) R.A.Arndt, W.J.Briscoe, R.L.Workman and I. I. Strakovsky:
computer code said http://gwdac.phys.gwu.edu.
14) C.Glashausser, K. Jones, F.T.Baker, L.Bimbot, H.Esbensen,
R. W. Fergerson, A. Green, S. Nanda and R. D. Smith: Phys.
Rev. Lett. 58 (1987) 2404.
15) F. T. Baker, L. Bimbot, B. Castel, R. W. Fergerson, C.
Clashausser, A. Green, O. Hausser, K. Hicks, K. Jones, C. A.
Miller, S. K. Nanda, R. D. Smith, M. Vetterli, J. Wambach,
R. Abegg, D. Beatty, V. Cupps, C. Djalari, R. Henderson, K.
P. Jackson, R. Jeppeson, J. Lisantti, M. Morlet, R. Sawafta,
W. Unkelbach, A. Willis and S. Yen: Phys. Lett. B 237 (1990)
16) A. Erell, J. Alster, J. Lichtenstadt, M. A. Moinester, J. D.
Bowman, M. D. Cooper, F. Irom, H. S. Matis, E. Piasetzky
and U. Sennhause: Phys. Rev. C 34 (1986) 1822.
17) J. M. Moss: Phys. Rev. C 26 (1982) 727.
18) Edward J. Stephenson and Jeffrey A. Tostevin, in Spin and
Isospin in Nuclear Interactions, Proceedings of the Interna-
tional Conference, Telluride, Colorado, 11–15 March 1991,
edited by Scott W.Wissink, Charles D.Goodman, and George
E. Walker (Plenum, New York, 1991), p.281; N. M. Hintz, A.
Sethi, and A. M. Lallena, ibid., p.287.
19) N. M. Hintz, A. M. Lallena and A. Sethi: Phys. Rev. C 45
(1992) 1098.
20) A. K. Kerman, H. McManus and R. M. Thaler: Ann. Phys.
(N.Y.) 8 (1959) 551.
21) Program dwba70, R. Schaeffer and J. Raynal (unpublished);
Extended version dw81 by J. R. Comfort (unpublished).
22) S. Cohen and D. Kurath: Nucl. Phys. 73 (1965) 1.
23) B. L. Clausen, R. J. Peterson and R. A. Lindgren: Phys. Rev.
C 38 (1988) 589.
24) F. T. Baker, D. Beatty, L. Bimbot, V. Cupps, C. Djalali, R.
W. Fergerson, C. Glashausser, G. Graw, A. Green, K. Jones,
M. Morlet, S. K. Nanda, A. Sethi, B. H. Storm, W. Unkelbach
and A. Willis: Phys. Rev. C 48 (1993) 1106.
25) M. A. Franey and W. G. Love: Phys. Rev. C 31 (1985) 488.
26) H. O. Meyer, P. Schwandt, H. P. Gubler, W. P. Lee, W. T. H.
van Oers, R. Abegg, D. A. Hutcheon, C. A. Miller, R. Helmer,
K. P. Jackson, C. Broude and W. Bauhoff: Phys. Rev. C 31
(1985) 1569.
ABSTRACT
  A complete set of polarization transfer observables has been measured for the
$^{12}{\rm C}(p,n)$ reaction at $T_p=296 {\rm MeV}$ and $\theta_{\rm
lab}=0^{\circ}$. The total spin transfer $\Sigma(0^{\circ})$ and the observable
$f_1$ deduced from the measured polarization transfer observables indicate that
the spin--dipole resonance at $E_x \simeq 7 {\rm MeV}$ has greater $2^-$
strength than $1^-$ strength, which is consistent with recent experimental and
theoretical studies. The results also indicate a predominance of the spin-flip
and unnatural-parity transition strength in the continuum. The exchange tensor
interaction at a large momentum transfer of $Q \simeq 3.6 {\rm fm}^{-1}$ is
discussed.

<|endoftext|><|startoftext|>
Introduction and problem statement
	Related work
	Assumptions
	The results
	Example: nonparametric regression
	Discussion and future work
	Appendix
	References
ABSTRACT
  The problem of statistical learning is to construct a predictor of a random
variable $Y$ as a function of a related random variable $X$ on the basis of an
i.i.d. training sample from the joint distribution of $(X,Y)$. Allowable
predictors are drawn from some specified class, and the goal is to approach
asymptotically the performance (expected loss) of the best predictor in the
class. We consider the setting in which one has perfect observation of the
$X$-part of the sample, while the $Y$-part has to be communicated at some
finite bit rate. The encoding of the $Y$-values is allowed to depend on the
$X$-values. Under suitable regularity conditions on the admissible predictors,
the underlying family of probability distributions and the loss function, we
give an information-theoretic characterization of achievable predictor
performance in terms of conditional distortion-rate functions. The ideas are
illustrated on the example of nonparametric regression in Gaussian noise.

<|endoftext|><|startoftext|>
Hamiltonian formalism in Friedmann cosmology and its quantization
Jie Ren1,∗ Xin-He Meng2,3, and Liu Zhao2
Theoretical Physics Division, Chern Institute of Mathematics, Nankai University, Tianjin 300071, China
Department of physics, Nankai University, Tianjin 300071, China and
BK21 Division of Advanced Research and Education in physics, Hanyang University, Seoul 133-791, Korea
(Dated: October 15, 2018)
We propose a Hamiltonian formalism for a generalized Friedmann-Roberson-Walker cosmology
model in the presence of both a variable equation of state (EOS) parameter w(a) and a variable
cosmological constant Λ(a), where a is the scale factor. This Hamiltonian system containing 1
degree of freedom and without constraint, gives Friedmann equations as the equation of motion,
which describes a mechanical system with a variable mass object moving in a potential field. After
an appropriate transformation of the scale factor, this system can be further simplified to an object
with constant mass moving in an effective potential field. In this framework, the Λ cold dark matter
model as the current standard model of cosmology corresponds to a harmonic oscillator. We further
generalize this formalism to take into account the bulk viscosity and other cases. The Hamiltonian
can be quantized straightforwardly, but this is different from the approach of the Wheeler-DeWitt
equation in quantum cosmology.
PACS numbers: 98.80.Jk,45.20.Jj,03.50.-z
I. INTRODUCTION
Since the current accelerating expansion of our Uni-
verse was discovered [1] around 1998 and 1999, theoret-
ical physicists have devoted increasingly more attention
to the Friedmann-Roberson-Walker (FRW) model as a
standard framework in cosmology study. The Λ cold dark
matter (ΛCDM) model as the standard model of cosmol-
ogy so far fits well with observational data whereas it
has had some serious theoretical problems. To make a
comparison to the ΛCDM model, physicists have built
many cosmological models that are able to give out the
effective Friedmann equations with variable cosmological
constant. To quantize the Friedmann equations, the com-
monly used theory is the Wheeler-DeWitt equation [2],
which has been studied and applied widely in quantum
cosmology [3]. Starting from the Hilbert-Einstein action
with the Roberson-Walker (RW) metric, the correspond-
ing HamiltonianH can be obtained. Then the Friedmann
equation plays the role as the constraint H = 0, which
leads to the Wheeler-DeWitt equation. In the present
work, we consider the Friedmann equations as basic equa-
tions and find a Hamiltonian system that gives Fried-
mann equations as classical equations of motion without
constraint.
Many Ansätze of the variable cosmological constant
have been studied in the literature, such as Refs. [4, 5,
6, 7]. Moreover, some models motivated from the string
theory give an effective cosmological term when reduced
to the FRW framework. We assume that the equation
of state (EOS) parameter w ≡ p/ρ can also be variable,
which means that the contents of the Universe, except
the cosmological term, are generalized to a nonperfect
∗Electronic address: jrenphysics@hotmail.com
fluid, or perfect fluid as a special case. In observational
cosmology, the redshift z is regarded as an observable
quantity and related to the scale factor a by z ≡ a0/a−1.
Therefore, we investigate a general case that both the
EOS parameter w and the cosmological constant Λ can
be functions of the scale factor a, and take into account
the bulk viscosity.
As an extension of the problem, we construct a Hamil-
tonian formalism for a system described by the following
equation:
q̈ = f1(q)q̇
2 + ηq̇ + f2(q),
where f1(q) and f2(q) are arbitrary functions, and η is
constant. Also it can be regarded as a generalization
of the damping harmonic oscillator. The corresponding
Hamiltonian describes an object with variable mass mov-
ing in a potential field. After an appropriate canonical
transformation, this system can be further simplified to
an object with constant mass moving in an effective po-
tential field. Thus, differential models in the FRW frame-
work are characterized by their effective potentials. This
is a general formalism and it can be applied to many cos-
mological models, for example, that the ΛCDM model
corresponds to a harmonic oscillator. Since the quanti-
zation of Friedmann equations can provide an insight to
quantum cosmology as a glimpse of quantum gravity, we
also make some remarks on the quantum case, which pro-
vides a correspondence between cosmology and quantum
mechanics.
The paper is organized as follows. In Sec. II we present
a generalized FRW model and the corresponding Hamil-
tonian to describe the Friedmann equations. Then we
find a canonical transformation to further simplify the
problem, and give some examples and special cases. In
Sec. III we show that our framework can also be applied
in the dissipative case with bulk viscosity. In Sec. IV
we turn our attention to the relation to the observable
http://arxiv.org/abs/0704.0672v3
mailto:jrenphysics@hotmail.com
quantities and review some issues of the Bianchi identity.
In Sec. V we make some remarks on quantum cosmology
from our approach. In the last section we present the
conclusion and discuss some future subjects.
II. HAMILTONIAN FORMALISM
A. Hamiltonian description of the Friedmann
equations
We consider the RW metric in the flat space geometry
(k=0) as the case favored by current cosmic observational
data:
ds2 = −dt2 + a(t)2(dr2 + r2dΩ2), (1)
where a(t) is the scale factor. The energy-momentum
tensor for the cosmic fluid can be written as
T̃µν = (ρ+ p)UµUν + (p+ ρΛ)gµν , (2)
where ρΛ = Λ/(8πG) is the energy density of the cos-
mological constant. Thus, Einstein’s equation Rµν −
gµνR = 8πGT̃µν contains two independent equations:
, (3a)
= −4πG
(ρ+ 3p) +
. (3b)
The EOS of the matter (cosmic fluid except the cosmo-
logical constant) is commonly assumed to be
p = (γ − 1)ρ. (4)
Cosmologists usually call Eq. (3a) as the Friedmann
equation and Eq. (3b) as the acceleration equation in the
literature, whereas for simplicity we name both Eqs. (3a)
and (3b) Friedmann equations here. For generality, we
assume that both γ and Λ are functions of the scale factor
a, thus we call it the generalized FRWmodel. Combining
the Friedmann equations with the EOS, we obtain
= −3γ(a)− 2
γ(a)Λ(a)
, (5)
which determines the evolution of the scale factor.
We regard Eq. (5) as a basic starting point; therefore, if
the dynamical equation for the scale factor can be written
as that form, the present framework can be valid. If
the Newton constant G is constant and the cosmological
constant Λ is variable, the energy-momentum tensor for
the matter cannot individually conserved [5, 6], which
implies an interaction between the matter and vacuum
energy. In the following, we assume G to be constant
until Sec. IV.
Our aim is to find a Hamiltonian description of Eq. (5)
as the classical equation of motion. We start from the
following Lagrangian
L(q, q̇) = 1
M(q)q̇2 − V (q), (6)
and the corresponding Hamiltonian thus is
H(q, p) = p
2M(q)
+ V (q), (7)
with the canonical Poisson bracket {q, p} = 1. One can
check that the equation of motion for Eq. (6) or (7) is
∂ lnM
∂ ln q
. (8)
This equation possesses the same form as Eq. (5). There-
fore, by comparing Eq. (5) with Eq. (8), we can take a as
the general coordinate and solve the functions M(a) and
V (a). Then the Lagrangian L = 1
M(a)q̇2 − V (a) with
M = exp
3γ − 2
, V = −1
MγΛada, (9)
gives Eq. (5) as the equation of motion. For some spec-
ified functions γ = γ(a) and Λ = Λ(a), the above in-
tegrations can be evaluated out to give M(a) and V (a)
explicitly.
Now we can see that the generalized FRWmodel essen-
tially corresponds to an object with variable mass M(a)
moving in a potential field V (a). In the following, we
will show that this picture can be further simplified as
an object with constant mass moving in an effective po-
tential field Ṽ (φ), after an appropriate transformation of
the scale factor.
B. Canonical transformation
The above problem can be generalized as the Hamilto-
nian description of the nonlinear equation
q̈ = f1(q)q̇
2 + f2(q), (10)
where f1(q) and f2(q) are two specified functions.
This equation can be derived by the Lagrangian L =
M(q)q̇2 − V (q) with
M = exp
f1(q)dq
, V = −
Mf2(q)dq. (11)
We define a new variable φ as (see Appendix)
f1(q)dq
dq. (12)
This transformation can eliminate the q̇2 term and gives
the equation for the variable φ as in
φ̈ = f2(q) exp
f1(q)dq
, (13)
where q → φ denotes using Eq. (12) to change the vari-
able q to φ. Since there is no φ̇2 term in Eq. (13), this can
be regarded as a partial linearization. Therefore, the sys-
tem of Eq. (10) transformed to Eq. (13) can be described
by the Lagrangian
L(φ, φ̇) = 1
φ̇2 − Ṽ (φ), (14)
with the potential as
Ṽ (φ) = −
f2(q) exp
f1(q)dq
f2(q) exp
f1(q)dq
.(15)
The simplification of the problem by Eq. (12) is essen-
tially the canonical transformation
q → φ, pq → pφ, H(q, pq) → H(φ, pφ), (16)
where pq = M(q)q̇, pφ = φ̇, and H(φ, pφ) = 12p
φ + Ṽ (φ).
Therefore, the classical and quantum properties of differ-
ent models are characterized by the effective potentials.
For Eq. (5) as a special case, the new variable φ is
given by
3γ − 2
da. (17)
C. Some examples
We will give some special cases of the above general
framework to show some applications. If both γ and Λ
are constant for a simple case, the integrations in Eq. (17)
can be evaluated out as
a3γ/2, γ 6= 0, (18a)
= ln a, γ = 0. (18b)
Now we consider γ 6= 0 for example. The special case γ =
1 corresponds to the ΛCDM model. The equation for φ
can be obtained as φ̈− 3
γ2Λφ = 0, and the corresponding
Lagrangian is
L = 1
φ̇2 +
γ2Λφ2. (19)
We can see that the simplest model in cosmology just
corresponds to a harmonic oscillator after linearization.
In particular, this is a upside-down harmonic oscillator
for the asymptotic de Sitter Universe.
We can add the curvature effect to the ΛCDM model,
which is described by the special case m = 2 of the fol-
lowing equation:
= −3γ − 2
. (20)
Here the parameters γ, Λ, and m are all constants. This
equation possesses the same form of Eq. (10). By defining
φ as Eq. (12) and using Eq. (15), we obtain the effective
potential as
Ṽ (φ) = −3
γ2Λφ2 +
3γ −m
)2−2m/3γ
, (21)
for γ 6= 0 and m 6= 3γ.
Another example is the Friedmann equations during
the inflation era. In the study of inflation, we usually
use the conformal time τ instead of the comoving cosmic
time t. Here we assume that a constant term −p0 is in
the EOS during inflation. Thus the Friedmann equations
combined with the EOS p = −ρ− p0 yield
p0, (22)
where the prime denotes a derivative with respect to τ ,
and κ2 = 8πG. By defining φ = −1/a, the equation for
φ is φ′′φ+ (κ2/2)p0 = 0. The effective potential is thus
Ṽ (φ) =
p0 ln |φ|. (23)
Moreover, if we add the curvature term in this case, it
corresponds to a φ2 potential.
III. BULK VISCOSITY
We assume that the cosmic fluid possesses some dis-
sipation effects. Since the sheer tensor σµν = 0 for RW
metric, the sheer viscosity does not contribute to the evo-
lution in Friedmann cosmology. The energy-momentum
tensor for nonperfect fluid concerning bulk viscosity in
the right-hand side of Einstein’s equation is given by [8, 9]
Tµν = ρUµUν + (p− ζ0θ)hµν , (24)
where hµν ≡ gµν + UµUν is the projection operator,
θ ≡ Uα;α = 3ȧ/a is the scalar expansion, and ζ is the
bulk viscosity coefficient. Consequently, Eq. (5) should
be modified as
= −3γ(a)− 2
+ 12πGζ0
γ(a)Λ(a)
. (25)
where both γ and Λ can be functions of a for generality,
and ζ0 is constant. We also find a Hamiltonian
H(a, pa, t) =
2M(a, t)
+ V (a, t), (26)
with the Poisson bracket {a, pa} = 1 to give Eq. (25) as
the classical equation of motion. The functions in this
Hamiltonian are given by
M = exp
3γ − 2
da− 12πGζ0t
, (27a)
V = −1
MγΛada. (27b)
Although a dissipative system cannot be described by
a conservative Hamiltonian generally, one can directly
check that the classical equation of motion for the Hamil-
tonian Eq. (26) is Eq. (25). As a special case, the equa-
tion for a damping harmonic oscillator can be derived by
the Caldirora-Kani (CK) Hamiltonian [10].
The above problem can be generalized to construct a
Hamiltonian system for the equation
q̈ = f1(q)q̇
2 + ηq̇ + f2(q), (28)
where η is constant. It can be derived by the Hamiltonian
H(q, p, t) = 1
M(q, t)−1p2 + V (q, t) with
M = exp
f1(q)dq − ηt
, V = −
Mf2(q)dq.
Similarly, by using the new variable φ defined by
Eq. (12), the equation for φ is
φ̈ = ηφ̇+ f2(q) exp
f1(q)dq
. (30)
Now we consider a very special case that both γ and Λ
are constant; then φ defined by Eq. (18a) satisfies
φ̈− 12πGζ0φ̇−
γ2Λφ = 0, (31)
which describes a damping harmonic oscillator.
The damping harmonic oscillator
Mq̈ = −ηq̇(t)−
∂V (q)
, (32)
has been studied in quantum mechanics. The CK Hamil-
tonian
e−ηt/Mp2 +
Mω2eηt/M q2, (33)
with the commutation relation [q, p] = i~, can yield the
dissipation equation (32) through the Heisenberg equa-
tion [10]. Our work can be regarded as a generalization
to the case of variable mass. It is the variable mass that
generates a nonlinear term in the equation of motion that
describes the generalized FRW model.
In our previous work [9], we have proposed an EOS as
p = (γ − 1)ρ−
κ2T 22
where the parameters γ, T1 and T2 are constants. Com-
bining the Friedmann equations with this more practical
EOS, we obtain the dynamical evolution equation for the
scale factor as
= −3γ − 2
. (35)
This model possesses a large variety of properties, such as
that we have found a scalar field model which is equiva-
lent to the above EOS. For related works on the modified
EOS, see Ref. [9, 11, 12, 13]. The present work can also
be regarded as a generalization of the EOS to γ = γ(a)
and T2 = T2(a). And the corresponding Hamiltonian
formalism for this system can be constructed similarly.
IV. RELATIONS TO THE OBSERVABLE
QUANTITIES
The observations of the supernovae (SNe) Ia have
provided the direct evidence for the cosmic accelerat-
ing expansion of our current Universe [1]. A bridge
between the cosmological theory and the observation
data is the H-z relation, where H ≡ ȧ/a is the Hub-
ble parameter and z is the redshift. For example, the
ΛCDM model in cosmology can be described mainly as
H2(z) = H20 [Ωm(1+z)
3+1−Ωm], where Ωm is the mat-
ter energy density. This model fits the observational data
well and provides the cosmological constant as the sim-
plest candidate for dark energy. In a sense, the different
cosmological models are characterized by the correspond-
ing H-z relations.
There is also a systematic way to construct the Hamil-
tonian starting from the general model
H2 = f(a), (36)
where f(a) is a specific function of the scale factor a,
according to the model. By differentiating Eq. (36), we
obtain that it is a solution of the following equation:
3γ − 2
3γf(a)
af ′(a)
, (37)
which possesses the same form of Eq. (5) or (10). The
corresponding coefficients are given by
f1(a) = −
3γ − 2
, f2(a) =
3γaf(a)
a2f ′(a)
. (38)
Then by applying Eq. (11) we can obtain the correspond-
ing Hamiltonian. Therefore, even if the EOS for a cosmo-
logical model is not explicitly linear in ρ, the Hamiltonian
formalism in the present work can also be applied if the
effective Friedmann equation H2 = f(a) can be given out
for that model.
Many approaches such as modified gravity [14] can
be reduced to effective Friedmann equations in the form
H2 = f(a). Since ΛCDM model fits the SNe Ia data well,
the reasonable cosmological models should be reduced to
Friedmann cosmology in an effective way and give out the
right H-z relation, in order to make a comparison with
the ΛCDM model. In our case, the Friedmann equations
in terms of the Hubble parameter can be written as
= −3γ
H2 + Λ̃(a). (39)
Here γ is assumed to be constant for simplicity. This
equation is linear in H2 and the effective term Λ̃(a) is
an inhomogeneous term. The solution in terms of H(z)
concerning the initial condition H(0) = H0 is given out
H(z)2 = H20 (1 + z)
Λ̃(z′)(1 + z′)−3γ−1dz′
In the power-law ΛCDM model, the contributions of dif-
ferent components are separated in H2, such as a con-
stant for the cosmological constant, and a (1 + z)2 fac-
tor for the curvature term. But in the general case, the
contribution of the matter cannot be separated from the
above solution. This problem is related to the conserva-
tion law of the matter, which has been investigated in
Refs. [5, 6].
The Bianchi identity for the energy-momentum tensor
Eq. (2) gives
ρ̇Λ + ρ̇+ 3H(ρ+ p) = 0, (41)
which implies that energy transfer will exist between the
matter and the vacuum energy. An intuitive idea has
been proposed that if both G and Λ are variable, the
ordinary energy-momentum tensor can be individually
conserved, i.e., ρ̇+3H(ρ+p) = 0 [6]. This is achieved by
combining the Bianchi identity for the variable G and Λ
model
[G(ρΛ + ρ)] + 3GH(ρ+ p) = 0, (42)
with the following constraint:
(ρ+ ρΛ)Ġ+Gρ̇Λ = 0. (43)
The authors of Ref. [6] assume that both the Newton con-
stant G and the cosmological constant Λ are functions of
a scale parameter µ and apply the renormalization group
approach to cosmology. If G(µ) evolves by a logarithmic
law and ρΛ(µ) evolves quadratically with µ, then this
picture can explain the evolution of the Universe, and at
the same time, the variable G can explain the flat rota-
tion curves of the galaxies without introducing the dark
matter hypothesis.
V. REMARKS ON QUANTUM COSMOLOGY
We have obtained a classical Hamiltonian formalism
of the Friedmann equations. Generally, once a Hamilto-
nian is obtained, the system can be quantized straightfor-
wardly by replacing the Poisson bracket with the commu-
tation relation [q, p] = i. However, we need to take into
account the ambiguity in the ordering of noncommuting
operators q and p. For simplicity, we ignore the order-
ing ambiguity here. In terms of the new variable φ, the
corresponding Schrödinger’s equation can be written as
H(φ, p̂φ)Ψ(φ) = EΨ(φ), (44)
where p̂φ = −i∂φ. To make a comparison between our ap-
proach and the Wheeler-DeWitt equation, we only take
the ΛCDM model as a very special case for an illustrative
example. The corresponding Hamiltonian for Eq. (19) in
the case γ = 1 is
H = 1
p2 − 1
Λa3, (45)
where p = aȧ. In the approach of the Wheeler-DeWitt
equation, H = 0 is a constraint [2, 15], thus the quanti-
zation gives (∂2a +
a4)Ψ(a) = 0. This is an anharmonic
oscillator with zero energy eigenvalue. In our case, the
Hamiltonian is nonzero and proportional to the matter
energy density, which we show in the following. The so-
lution of Eq. (39) with Λ̃(a) = Λ/2 is
H20 −
a−3 +
= H20 [Ωma
−3 + 1− Ωm], (46)
where Ωm ≡ 1 − Λ/(3H20 ). Therefore, the Hamiltonian
can be calculated as
H = a
H20Ωm. (47)
After a canonical transformation by Eq. (16), the
Schrödinger’s equation in terms of φ becomes
Ψ(φ) = EΨ(φ). (48)
Thus, for the asymptotic de Sitter Universe, the ΛCDM
model corresponds to an upside-down harmonic oscillator
in our formalism. Such an oscillator also appears in the
matrix description of de Sitter gravity [16].
We can transform the de Sitter Universe to the dual
anti-de Sitter Universe by employing the scale factor du-
ality [17], which has been found that a → a−1 gives
H → −H and other consequences. The duality for
Eq. (5) is given by
a → a−1, γ → −γ, Λ → −Λ, φ → −φ. (49)
It can be checked easily that Eq. (5) is invariant under
these transformations. If we use the dual scale factor a−1,
the corresponding potential becomes Ṽ (φ) = + 3
γ2Λφ2.
In fact, quantization in de Sitter spacetime is one of the
major difficulties of string theory at one time (though
this picture has changed a little bit after Kachru-Kallosh-
Linde- Trivedi theory appeared). It seems that quantiz-
ing de Sitter cosmology is no difference, since the time
variable used is the same, and it is known that there is no
global timelike coordinates in de Sitter spacetime. Some
quantum effects of a scalar field in de Sitter background
can be found in Ref. [18].
VI. CONCLUSION AND DISCUSSION
We have proposed a systematic scheme to describe the
Friedmann equations through a Hamiltonian formalism.
The generalized FRW model accompanied by both vari-
able EOS parameter and variable cosmological constant
admits a Hamiltonian description without constraint. Af-
ter an appropriate canonical transformation, the system
can be significantly simplified to an object moving in
an effective potential field. The bulk viscosity can also
be taken into account by a time-dependent Hamiltonian.
Some examples are given explicitly, such as the ΛCDM
model, the curvature term effect, and the inflation period.
The quantization of the system provides a new approach
to study the potential quantum cosmology, which is an
intriguing topic in theoretical physics research.
We shall discuss some possible future developments of
our work. As we have claimed, the formalism in this work
can be applied to a large variety of cosmological models.
By solving the Schrödinger equation H(φ, p̂φ)Ψ = EΨ,
the cosmological wave function can be obtained for a
specific model. Here we consider the curvature effect,
for example, which is described by the potential Eq. (21)
with parameters Λ = 0, γ = 1, and m = 2. The cor-
responding Schrödinger equation can be solved in terms
of the biconfluent Heun equation (BHE) [19]. We can
also start from the effective Lagrangian and study the
observational effects when we modify the potential. We
believe that our formalism would give a new perspective
to the potential study of quantum cosmology physics.
ACKNOWLEDGMENTS
J.R. thanks Prof. M.L. Ge for helpful discussions on
Hamiltonian systems. X.H.M. is supported by NSFC un-
der No. 10675062 and BK21 Foundation. L.Z. is sup-
ported by NSFC under No. 90403014.
APPENDIX A: MATHEMATICAL NOTES
A more general correspondence between a Hamiltonian
and its equation of motion is given in Ref. [19]. The
equation of motion of the Hamiltonian
H(q, p, t) =
P0(q, t)p
2 + P1(q, t)p+ P2(q, t)
is given out by
∂ lnP0
∂ ln f
− ∂ lnP0
− 2∂V
. (A2)
In the mathematical aspect, Eq. (28) can be further gen-
eralized to the following equation:
q̈ = F1(q, t)q̇
2 + F2(q, t)q̇ + F3(q, t), (A3)
however, here the coefficients F1 and F2 are not com-
pletely independent. Comparing with Eq. (A2), we can
see that the condition 2∂tF1(q, t) = ∂qF2(q, t) must be
satisfied for consistency. In the present work, both f1(q)
and η have safely satisfied this condition.
We shall explain why we choose the transformation as
in Eq. (12). Starting from the following equation
q̈ = f1(q)q̇
2 + ηq̇ + f2(q), (A4)
we expect that after an appropriate change of variable
φ(q), the above equation can be transformed as
φ̈ = ηφ̇+ g(φ). (A5)
By differentiating φ(q), we obtain φ̇ = φ′q̇, and φ̈ =
φ′′q̇2 + φ′q̈, where the prime denotes a derivative with
respect to q. Substituting φ, φ̇, and φ̈ into Eq. (A5), we
obtain
q̈ = −φ
q̇2 + ηq̇ +
Now it turns out that by defining −φ′′/φ′ = f1(q), which
can be solved as the form Eq. (12), the q̇2 term can be
eliminated.
[1] A.G. Riess et al., Astron. J. 116, 1009 (1998); N. Bahcall,
J.P. Ostriker, S. Perlmutter, and P.J. Steinhardt, Science
284, 1481 (1999); D.N. Spergel et al., astro-ph/0603449;
A.G. Riess et al., astro-ph/0611572.
[2] B.S. DeWitt, Phys. Rev. 160, 1113 (1967).
[3] E.M. Barboza Jr. and N.A. Lemos, Gen. Rel. Grav.
38, 1609 (2006); G.A. Monerat, E.V. Corrêa Silva, G.
Oliveira-Neto, L.G. Ferreira Filho, and N.A. Lemos,
Phys. Rev. D 73, 044022 (2006); Braz. J. Phys. 35, 1106
(2005); M.P. Da̧browski, C. Kiefer, and B. Sandhöfer,
Phys. Rev. D 74, 044022 (2006); N. Pinto-Neto, E. Ser-
gio Santini, and F.T. Falciano, Phys. Lett. A 344, 131
(2005); V. Husain and O. Winkler, Phys. Rev. D 69,
084016 (2004); C. Wang, Class. Quant. Grav. 20, 3151
(2003); A.M. Khvedelidze and Yu.G. Palii, Class. Quant.
Grav. 18, 1767 (2001); S.A. Gogilidze, A.M. Khvedelidze,
V.V. Papoyan, Yu.G. Palii, and V.N. Pervushin, Grav.
Cosmol. 3, 17 (1997); A.M. Khvedelidze, V.V. Papoyan,
Yu.G. Palii, and V.N. Pervushin, Phys. Lett. B 402, 263
(1997); H.C. Rosu and J. Socorro, Phys. Lett. A 223, 28
(1996); N.A. Lemos, J. Math. Phys. 37, 1449 (1996); L.A.
Glinka, gr-qc/0612079; V.V. Kuzmichev, gr-qc/0002029.
[4] J.M. Overduin and F.I. Cooperstock, Phys. Rev. D 58,
043506 (1998); R.G. Vishwakarma, Class. Quant. Grav.
18, 1159 (2001).
[5] J. Solà and H. Štefančić, Mod. Phys. Lett. A 21, 479
(2006); Phys. Lett. B 624, 147 (2005); B. Guberina, R.
Horvat, and H. Nikolić, Phys. Lett. B 636, 80 (2006);
[6] I.L. Shapiro, J. Solà, and H. Štefančić, JCAP 0501, 012
(2005).
http://arxiv.org/abs/astro-ph/0603449
http://arxiv.org/abs/astro-ph/0611572
http://arxiv.org/abs/gr-qc/0612079
http://arxiv.org/abs/gr-qc/0002029
[7] P. Wang and X.H. Meng, Class. Quant. Grav. 22, 283
(2005).
[8] I. Brevik, Phys. Rev. D 65, 127302 (2002); I. Brevik and
O. Gorbunova, Gen. Rel. Grav. 37, 2039 (2005).
[9] J. Ren and X.H. Meng, Phys. Lett. B 633, 1 (2006); 636,
5 (2006); astro-ph/0605010, to appear in IJMPD; X.H.
Meng, J. Ren, and M.G.Hu, Commun. Theor. Phys. 47,
379 (2007).
[10] P. Caldirola, Nuovo Cimento 18, 393 (1941); E. Kanai,
Prog. Theor. Phys. 3, 440 (1948); L.H. Yu and C.P. Sun,
Phys. Rev. A 49, 592 (1994); C.P. Sun and L.H. Yu,
Phys. Rev. A 51, 1845 (1995).
[11] R. Holman and S. Naidu, astro-ph/0408102; E. Babichev,
V. Dokuchaev, and Y. Eroshenko, Class. Quant. Grav.
22, 143 (2005).
[12] S. Capozziello, S. Nojiri, and S.D. Odintsov, Phys. Lett.
B 634, 93 (2006); Phys. Lett. B 632, 597 (2006); S.
Capozziello, V.F. Cardone, E. Elizalde, S. Nojiri, and
S.D. Odintsov, Phys. Rev. D 73, 043512 (2006); S. No-
jiri, and S.D. Odintsov, Gen. Rel. Grav. 38, 1285 (2006);
Phys. Rev. D 72, 023003 (2005); I. Brevik, O.G. Gor-
bunova, and A.V. Timoshkin, gr-qc/0702089.
[13] I.L. Shapiro and J. Sola, JHEP 0202 006 (2002);
astro-ph/0401015.
[14] X.H. Meng and P. Wang, Class. Quant. Grav. 20, 4949
(2003); 21, 951 (2004); 22, 23 (2005); ibid, Phys. Lett.
B 584, 1 (2004) for example.
[15] A. Vilenkin, Phys. Rev. D 50, 2581 (1994).
[16] Y.H. Gao, hep-th/0107067.
[17] G. Veneziano, Phys. Lett. B 265, 287 (1991); M.C. Bento
and O. Bertolami, Class. Quant. Grav. 12, 1919 (1995).
[18] V.K. Onemli and R.P. Woodard, Phys. Rev. D
70, 107301 (2004); E.O. Kahya and V.K. Onemli,
gr-qc/0612026.
[19] S.Yu. Slavyanov and W. Lay, Special Functions: A Uni-
fied Theory Based on Singularities (Oxford University
Press, New York, 2000).
http://arxiv.org/abs/astro-ph/0605010
http://arxiv.org/abs/astro-ph/0408102
http://arxiv.org/abs/gr-qc/0702089
http://arxiv.org/abs/astro-ph/0401015
http://arxiv.org/abs/hep-th/0107067
http://arxiv.org/abs/gr-qc/0612026
ABSTRACT
  We propose a Hamiltonian formalism for a generalized
Friedmann-Roberson-Walker cosmology model in the presence of both a variable
equation of state (EOS) parameter $w(a)$ and a variable cosmological constant
$\Lambda(a)$, where $a$ is the scale factor. This Hamiltonian system containing
1 degree of freedom and without constraint, gives Friedmann equations as the
equation of motion, which describes a mechanical system with a variable mass
object moving in a potential field. After an appropriate transformation of the
scale factor, this system can be further simplified to an object with constant
mass moving in an effective potential field. In this framework, the $\Lambda$
cold dark matter model as the current standard model of cosmology corresponds
to a harmonic oscillator. We further generalize this formalism to take into
account the bulk viscosity and other cases. The Hamiltonian can be quantized
straightforwardly, but this is different from the approach of the
Wheeler-DeWitt equation in quantum cosmology.

<|endoftext|><|startoftext|>
Introduction
	Information Theoretic Definitions
	SSR Model
	Describing SSR Using a Single PDF, fQ()
	fQ() as the PDF of the Average Transfer Function
	Mutual Information in Terms of fQ()
	Entropy of the random variable, Q
	Examples of the PDF fQ()
	Large N SSR: Literature Review and Outline of This Paper
	A General Expression for the SSR Channel Capacity for Large N
	A Sufficient Condition for Optimality
	Optimizing the Signal Distribution
	Example: Uniform Noise
	Gaussian Noise
	Optimizing the Noise Distribution
	Example: Uniform Signal
	Consequences of Optimizing the Large N Channel Capacity
	Optimal Fisher Information
	The Optimal PDF fQ()
	Output Entropy at Channel Capacity
	The Optimal Output PMF is Beta-Binomial
	Analytical Expression for the Mutual Information
	A Note on the Output Entropy
	Channel Capacity for Large N and `Matched' Signal and Noise
	Improvements to Previous Large N Approximations
	SSR for Large N and =1
	Uniform Signal and Noise
	Gaussian Signal and Noise
	Acknowledgments
	Derivations
	Mutual Information for Large N and Arbitrary 
	Conditional Output Entropy
	Output Distribution and entropy
	Mutual Information
	Proof that fS(x) is a PDF
	H(y|X) for large N and =1
	References
ABSTRACT
  Suprathreshold stochastic resonance (SSR) is a form of noise enhanced signal
transmission that occurs in a parallel array of independently noisy identical
threshold nonlinearities, including model neurons. Unlike most forms of
stochastic resonance, the output response to suprathreshold random input
signals of arbitrary magnitude is improved by the presence of even small
amounts of noise. In this paper the information transmission performance of SSR
in the limit of a large array size is considered. Using a relationship between
Shannon's mutual information and Fisher information, a sufficient condition for
optimality, i.e. channel capacity, is derived. It is shown that capacity is
achieved when the signal distribution is Jeffrey's prior, as formed from the
noise distribution, or when the noise distribution depends on the signal
distribution via a cosine relationship. These results provide theoretical
verification and justification for previous work in both computational
neuroscience and electronics.

<|endoftext|><|startoftext|>
Draft version October 31, 2018
Preprint typeset using LATEX style emulateapj v. 11/26/04
THREE DIFFERENT TYPES OF GALAXY ALIGNMENT WITHIN DARK MATTER HALOS
A. Faltenbacher
, Cheng Li
, Shude Mao
, Frank C. van den Bosch
, Xiaohu Yang
, Y.P. Jing
, Anna Pasquali
and H.J. Mo
Draft version October 31, 2018
ABSTRACT
Using a large galaxy group catalogue based on the Sloan Digital Sky Survey Data Release 4 we
measure three different types of intrinsic galaxy alignment within groups: halo alignment between the
orientation of the brightest group galaxies (BGG) and the distribution of its satellite galaxies, radial
alignment between the orientation of a satellite galaxy and the direction towards its BGG, and direct
alignment between the orientation of the BGG and that of its satellites. In agreement with previous
studies we find that satellite galaxies are preferentially located along the major axis. In addition, on
scales r < 0.7Rvir we find that red satellites are preferentially aligned radially with the direction to
the BGG. The orientations of blue satellites, however, are perfectly consistent with being isotropic.
Finally, on scales r < 0.1Rvir, we find a weak but significant indication for direct alignment between
satellites and BGGs. We briefly discuss the implications for weak lensing measurements.
Subject headings: galaxies: clusters: general — galaxies: kinematics and dynamics — surveys
1. INTRODUCTION
A precise assessment of galaxy alignments is im-
portant for two main reasons: it contains information
regarding the impact of environment on the formation
and evolution of galaxies, and it can be an important
source of contamination for weak lensing measurements.
In theory, the large scale-tidal field is expected to
induce large-scale correlations between galaxy spins and
galaxy shapes (e.g., Pen et al. 2000; Croft & Metzler
2000; Heavens et al. 2000; Catelan et al. 2001;
Crittenden et al. 2001; Porciani et al. 2002b; Jing
2002). In addition, the preferred accretion of new
material along filaments tends to cause alignment with
the large scale filamentary structure in which dark
matter halos and galaxies are embedded (e.g., Jing 2002;
Faltenbacher et al. 2005; Bailin & Steinmetz 2005). On
small scales, however, inside virialized dark matter
haloes, any primordial alignment is likely to have been
significantly weakened due to non-linear effects such
as violent relaxation and (impulsive) encounters (e.g.,
Porciani et al. 2002a). On the other hand, tidal forces
from the host halo may also induce new alignments,
similar to the tidal locking mechanism that affects
the Earth-Moon system (e.g., Ciotti & Dutta 1994;
Usami & Fujimoto 1997; Fleck & Kuhn 2003).
Observationally, the search for galaxy alignments has
a rich and often confusing history. To some extent this
owes to the fact that numerous different forms of align-
ment have been discussed in the literature: the align-
ment between neighbouring clusters (Binggeli 1982; West
1989; Plionis 1994), between brightest cluster galaxies
(BCGs) and their parent clusters (Carter & Metcalfe
1980; Binggeli 1982; Struble 1990), between the orienta-
1 Shanghai Astronomical Observatory, Nandan Road 80, Shang-
hai 200030, China
2 University of Manchester, Jodrell Bank Observatory, Maccles-
field, Cheshire SK11 9DL, UK
3 Max-Planck-Institute for Astronomy, Königstuhl 17, D-69117
Heidelberg, Germany
4 Department of Astronomy, University of Massachusetts,
Amherst MA 01003-9305
tion of satellite galaxies and the orientation of the cluster
(Dekel 1985; Plionis et al. 2003), and between the ori-
entation of satellite galaxies and the orientation of the
BCG (Struble 1990). Obviously, several of these align-
ments are correlated with each other, but independent
measurements are difficult to compare since they are of-
ten based on very different data sets.
With large galaxy redshift surveys, such as the
two-degree Field Galaxy Redshift Survey (2dFGRS,
Colless, M., et al. 2001) and the Sloan Digital Sky Sur-
vey (SDSS, York, D. G., et al. 2000), it has become pos-
sible to investigate alignments using large and homoge-
neous samples. This has resulted in robust detections of
various alignments: Brainerd (2005), Yang et al. (2006)
and Azzaro et al. (2007) all found that satellite galax-
ies are preferentially distributed along the major axes
of their host galaxies, Trujillo et al. (2006) found that
spiral galaxies located on the shells of large voids have
rotation axes that lie preferentially on the void surface,
and Pereira & Kuhn (2005) and Agustsson & Brainerd
(2006b) noticed that satellite galaxies tend to be prefer-
entially oriented towards the galaxy at the center of the
halo.
In this Letter we use a large galaxy group catalogue
constructed from the SDSS to study galaxy alignments
on small scales within dark matter haloes that span a
wide range in masses. The unique aspect of this study
is that we investigate three different types of alignment
using exactly the same data set consisting of over 60000
galaxies. In addition, by using a carefully selected galaxy
group catalogue, we can discriminate between central
galaxies and satellites, and study their mutual alignment.
The latter is particularly important for galaxy-galaxy
lensing, where it can be a significant source of contami-
nation. Finally, exploiting the large number of galaxies
in our sample, we also investigate how the alignment sig-
nal depends on the colors of the galaxies. Throughout we
adopt Ωm = 0.3 and ΩΛ = 0.7 and a Hubble parameter
h = H0/100 km s
−1Mpc−1.
2. DATA & METHODOLOGY
http://arxiv.org/abs/0704.0674v2
2 Galaxy alignment within dark matter halos
satellite
Fig. 1.— Illustration of the three angles θ, φ and ξ, which
are used to test for halo alignment, radial alignment and direct
alignment, respectively. The three angles are not independent: if
ordered by size α ≥ β ≥ γ then α = min[β + γ, 180◦ − β − γ].
We apply our analysis to the SDSS galaxy group cat-
alogue of Yang et al. (2007, in prep.). This cata-
logue is constructed using the halo-based group finder
of Yang et al. (2005) and applied to the New York Uni-
versity Value Added Galaxy Catalog (NYU-VAGC) 5
that is based on the SDSS Data Release Four (DR4;
Adelman-McCarthy et al. 2006). This group finder uses
the general properties of CDM halos (i.e. virial ra-
dius, velocity dispersion, etc.) to determine the mem-
berships of groups (cf. Weinmann et al. 2006). In this
study we only use those groups with redshifts in the
range 0.01 ≤ z ≤ 0.2 and with halo masses between
5 × 1012 h−1M⊙ and 5 × 10
14 h−1M⊙. In addition, we
only focus on group members with 0.1M
−5 logh ≤ −19.
Throughout this paper all magnitudes are k+e corrected
to z = 0.1 following Blanton et al. (2003). Using the
method of Li et al. (2006) we split our galaxies in three
color bins. In short, we divide the full NYU-VAGC sam-
ple in 282 subsamples according to the r-band luminosity,
and fit the 0.1(g−r) color distribution for each subsample
with a double-Gaussian. Galaxies in between the centers
of the two Gaussians are classified as ‘green’, while those
with higher and lower values for the 0.1(g − r) color are
classified as ‘red’ and ‘blue’, respectively. The final sam-
ple, on which our analysis is based, consists of 18576
groups with a total of 60724 galaxies, of which 29780 are
red, 20604 are green, and 10340 are blue.
In what follows, we use these groups to examine (i)
halo alignment between the orientation of the brightest
group galaxies (BGG) and the distribution of its satel-
lite galaxies, (ii) radial alignment between the orientation
of a satellite galaxy and the direction towards its BGG,
and (iii) direct alignment between the orientation of the
BGG and that of its satellites. In particular, we define
the angles θ, φ and ξ as illustrated in Fig. 1, and in-
vestigate whether their distributions are consistent with
isotropy, or whether they indicate a preferred alignment.
Following Brainerd (2005) and Yang et al. (2006), the
orientation of each galaxy is defined by the major axis
position angle (PA) of its 25-magn arcsec−2 isophote in
the r-band.
For each satellite galaxy we compute its projected dis-
tance, r, to the BGG, normalized by the virial radius,
Rvir, of its group (as derived from the group mass). For
each of 5 radial bins, equally spaced in r/Rvir, we then
compute 〈θ〉, 〈φ〉 and 〈ξ〉, where 〈.〉 indicates the average
over all BGG-satellite pairs in a given radial bin. Next
we construct 100 random samples in which the positions
5 http://wassup.physics.nyu.edu/vagc/
Fig. 2.— Mean angle, θ, between the PA of the BGG and the line
connecting the BGG with a satellite galaxy, as function of r/Rvir.
Different line styles indicate (sub)samples determined according to
the satellites’ color. The shaded areas mark the parameter space
between the 16th and 84th percentiles of the distributions obtained
from the 100 random samples. A signal outside this shaded region
means that it is inconsistent with no alignment (i.e., with isotropy)
at more than 68 percent confidence.
of the galaxies are kept fixed, but their PAs are random-
ized. For each of these random samples we compute 〈θ〉,
〈φ〉 and 〈ξ〉 as function of r/Rvir, which we use to com-
pute the significance of any detected alignment signal.
3. RESULTS
3.1. Halo alignment
Fig. 2 shows the results thus obtained for the angle θ
between the orientation of the BGG and the line con-
necting the BGG with the satellite galaxy. Clearly, for
all four samples shown (all, red, green and blue, where
the color refers to that of the satellite galaxy, not that of
the BGG) we obtain 〈θ〉 < 45◦ at all 5 radial bins and at
high significance6. This indicates that satellite galaxies
are preferentially distributed along the major axis of the
BGG, in good agreement with the findings of Brainerd
(2005), Yang et al. (2006) and Azzaro et al. (2007), but
opposite to the old Holmberg (1969) effect. Note that
there is a clear indication that the distribution of red
satellites is more strongly aligned with the orientation
of the BGG than that of blue satellites, again in good
agreement with previous studies (cf. Yang et al. 2006;
Azzaro et al. 2007)
3.2. Radial alignment
Hawley & Peebles (1975) were the first to report a
possible detection of radial alignment in the Coma
cluster, which has subsequently been confirmed by
Thompson (1976) and Djorgovski (1983). However,
in a more systematic study based on the 2dFGRS,
Bernstein & Norberg (2002) were unable to detect any
significant radial alignment of satellite galaxies around
isolated host galaxies. On the other hand, using a very
similar selection of hosts and satellites, but applied to
the SDSS, Agustsson & Brainerd (2006b) found signifi-
cant evidence for radial alignment on scales. 70 h−1kpc.
In addition, Pereira & Kuhn (2005) found a statistically
robust tendency toward radial alignment in a large sam-
ple of 85 X-ray selected clusters.
Fig. 3 shows the results obtained from our group cat-
alogue. It shows, as function of r/Rvir, the mean angle
6 More than 99 percent, except for the 0.3Rvir bin for the blue
and the 0.9Rvir bin for the green satellites.
Faltenbacher et al. 3
Fig. 3.— Same as Fig. 2, but for the angle φ (see Fig. 1).
φ between the PA of the satellite and the line connect-
ing the satellite with its BGG. As in Fig. 2 results are
shown for all four different samples, together with the
16th and 84th percentiles obtained from the random sam-
ples. There is a clear and very significant indication that
the major axes of red satellites point towards the BGG
(i.e., 〈φ〉 < 45◦), at least for projected radii r . 0.7Rvir.
The signal for the green satellites is significantly weaker,
but still reveals a preference for radial alignment on small
scales: in fact, for the 3 radial bins with r ≤ 0.5Rvir the
null-hypothesis of no radial alignment can be rejected at
more than 95 percent confidence level. In contrast, for
the blue galaxies the data is perfectly consistent with
no radial alignment. Since the 2dFGRS is more biased
towards blue galaxies than the SDSS, this may at least
partially explain why Bernstein & Norberg (2002) were
unable to detect significant radial alignment.
3.3. Direct alignment
The search for direct alignment has mainly been
restricted to galaxy clusters (e.g., Plionis et al. 2003;
Strazzullo et al. 2005; Torlina et al. 2007), mostly result-
ing in no or very weak indications for alignment be-
tween the orientations of BCG and satellite galaxies.
Agustsson & Brainerd (2006b) extended the search for
direct alignment to a samples of 4289 host-satellites pairs
selected from the SDSS DR4, finding a weak but signif-
icant signal on scales . 35 h−1kpc. On larger scales,
however, no significant alignment was found, in agree-
ment with Mandelbaum et al. (2006).
Fig. 4 displays our results for the direct alignment,
based on the angle ξ between the orientations of a
satellite galaxy and that of its BGG. With the ex-
ception of the central bin (r/Rvir = 0.1) the null-
hypothesis of a random distribution cannot be rejected
at more than 1σ confidence level. Our study, based
on over 40000 BGG-satellite pairs, therefore agrees with
Agustsson & Brainerd (2006b) that there is a weak indi-
cation for direct alignment, but only on relatively small
scales: for the average group mass in our sample, M =
3.6×1013 h−1M⊙, a radius of r = 0.1Rvir corresponds to
70 h−1kpc. However, at least for the red satellites there
is a systematic trend towards angles < 45◦ which may
be caused by the group tidal field (cf. Lee et al. 2005).
3.4. Dependence on selection criteria
The sample used above is based on galaxies with
−5 log h ≤ −19. Typically, including fainter galax-
ies improves the number statistics but not necessarily
the signal-to-noise since the PAs of fainter galaxies carry
Fig. 4.— Same as Fig. 2, but for the angle ξ (see Fig. 1).
larger errors. To test the sensitivity of our results, we
repeated the above analysis using magnitude limits of
−17, −18, and −20. This resulted in alignment signals
that were only marginally different. We have also tested
the sentitivity of our results to the range of group masses
considered. Changing the lower limit to 1012 h−1M⊙ or
1013 h−1M⊙, or imposing no upper mass limit, all yields
very similar alignment signals. These tests assure that
our selection criteria lead to representative results.
4. DISCUSSION
The origin of the halo alignment described in § 3.1
has been studied by Agustsson & Brainerd (2006a) and
Kang et al. (2007) using semi-analytical models of galaxy
formation combined with large N -body simulations.
Since dark matter haloes are in general flattened, and
satellite galaxies are a reasonably fair tracer of the dark
matter mass distribution, 〈θ〉 will be smaller than 45◦ as
long as the BGG is aligned with its dark matter halo.
In particular, Kang et al. (2007) were able to accurately
reproduce the data of Yang et al. (2006) under the as-
sumption that the minor axis of the BGG is perfectly
aligned with the spin axis of its dark matter halo.
Kang et al. (2007) also showed that the color depen-
dence of the halo alignment has a natural explanation in
the framework of hierarchical structure formation: red
satellites are typically associated with subhaloes that
were more massive at their time of accretion. Since the
orientation of a halo is correlated with the direction along
which it accreted most of its matter (e.g., Wang et al.
2005; Libeskind et al. 2005), red satellites are a more ac-
curate tracer of the halo orientation than blue satellites.
The origin of the radial alignment is less clear. One
possibility is that it reflects a left-over from large-scale
alignments introduced by the large scale tidal field and
the preferred accretion of matter along filaments. Such
alignment, however, is unlikely to survive for more than
a few orbits within the halo of the BGG, so that the
observed alignment must be mainly due to the satellite
galaxies that were accreted most recently. Since these
satellites typically reside at relatively large halo-centric
radii, this picture predicts a stronger radial alignment at
larger radii, clearly opposite to what we find.
A more likely explanation, therefore, is that radial
alignment has been created locally by the group tidal
field. As shown by Ciotti & Dutta (1994), the timescale
on which a prolate galaxy can adjust its orientation to
the tidal field of a cluster is much shorter than the Hub-
ble time, but longer than its intrinsic dynamical time.
Consequently, prolate galaxies have a tendency to orient
4 Galaxy alignment within dark matter halos
themselves towards the cluster center. The fact that the
observed signal increases towards the group center sup-
ports this interpretation. In particular, satellites that
were accreted early not only are more likely to be red,
they also are more likely to reside at small group-centric
radii and to have relatively low group-centric velocities
(e.g., Mathews et al. 2004). This will enhance their ten-
dency to align themselves along the gradient in the clus-
ter’s gravitational potential, and they may well be the
major contributors to the pronounced signal on small
scales. In the case of disk galaxies, the conservation of
intrinsic angular momentum prevents the disk from re-
adjusting to the tidal field, which may explain why blue
satellites show no sign of radial alignment. Finally, the
tidal field of the parent halo also results in tidal strip-
ping, and the tidal debris may influence the inferred ori-
entation of the satellite galaxy (cf. Johnston et al. 2001;
Fardal et al. 2006). Detailed studies are required to in-
vestigate the interplay between intrinsic satellite orien-
tations and the groups tidal field.
In order to understand the direct alignment results,
first realize that the angles θ, φ and ξ are not indepen-
dent (see Fig. 1). However, the equation given in the cap-
tion is only applicable for single cases not for the mean
angles. Our results indicate that satellite galaxies are
more likely to be aligned ‘radially’ with the direction to-
wards the BGG, than ‘directly’ with the orientation of
the BGG. Since there is no clear theoretical prediction
for direct alignment, at least not one that can survive
for several orbital periods in a dark matter halo, while
radial alignment can be understood as originating from
the halo’s tidal field, we consider the relative weakness
of direct alignment to be consistent with expectations.
In recent years galaxy-galaxy (GG) lensing has
emerged as a primary tool for constraining the masses of
dark matter halos around galaxies (e.g., Brainerd 2004).
If satellite galaxies are falsely identified as sources lensed
by the BGG, which is likely to happen in the absence of
redshift information, the radial alignment detected here
will dilute the tangential GG lensing signal induced by
the dark matter halo associated with the BGG, thus re-
sulting in an underestimate of the halo mass. In agree-
ment with Agustsson & Brainerd (2006b), our findings
therefore emphasize the importance of an accurate rejec-
tion of satellite galaxies to achieve precision constraints
on dark matter halo masses from GG lensing measure-
ments. Similarly, the weak but significant detection of
direct alignment may contaminate the cosmic shear mea-
surements. Since we only detected a weak signal on small
scales, one can easily avoid this contamination by sim-
ply removing or down-weighting close pairs of galaxies in
projection (King & Schneider 2002; Heymans & Heavens
2003).
ACKNOWLEDGMENTS
This work is supported by NSFC (10533030,
0742961001, 0742951001) and the Knowledge Innova-
tion Program of the Chinese Academy of Sciences, grant
KJCX2-YW-T05. AF and CL are supported by the
Joint Program in Astrophysical Cosmology of the Max
Planck Institute for Astrophysics and the Shanghai As-
trophysical Observatory. YPJ is partially supported by
Shanghai Key Projects in Basic research (04JC14079 and
05XD14019).
REFERENCES
Adelman-McCarthy, J. K., et al. 2006, ApJS, 162, 38
Agustsson, I. & Brainerd, T. G. 2006a, ApJ, 650, 550
—. 2006b, ApJ, 644, L25
Azzaro, M., Patiri, S. G., Prada, F., & Zentner, A. R. 2007,
MNRAS, 376, L43
Bailin, J. & Steinmetz, M. 2005, ApJ, 627, 647
Bernstein, G. M. & Norberg, P. 2002, AJ, 124, 733
Binggeli, B. 1982, A&A, 107, 338
Blanton, M. R., et al. 2003, ApJ, 592, 819
Brainerd, T. G. 2004, in AIP Conf. Proc. 743: The New Cosmology:
Conference on Strings and Cosmology, ed. R. E. Allen, D. V.
Nanopoulos, & C. N. Pope, 129–156
Brainerd, T. G. 2005, ApJ, 628, L101
Carter, D. & Metcalfe, N. 1980, MNRAS, 191, 325
Catelan, P., Kamionkowski, M., & Blandford, R. D. 2001, MNRAS,
320, L7
Ciotti, L. & Dutta, S. N. 1994, MNRAS, 270, 390
Colless, M., et al. 2001, MNRAS, 328, 1039
Crittenden, R. G., Natarajan, P., Pen, U.-L., & Theuns, T. 2001,
ApJ, 559, 552
Croft, R. A. C. & Metzler, C. A. 2000, ApJ, 545, 561
Dekel, A. 1985, ApJ, 298, 461
Djorgovski, S. 1983, ApJ, 274, L7
Faltenbacher, A., Allgood, B., Gottlöber, S., Yepes, G., & Hoffman,
Y. 2005, MNRAS, 362, 1099
Fardal, M. A., Babul, A., Geehan, J. J., & Guhathakurta, P. 2006,
MNRAS, 366, 1012
Fleck, J.-J. & Kuhn, J. R. 2003, ApJ, 592, 147
Hawley, D. L. & Peebles, P. J. E. 1975, AJ, 80, 477
Heavens, A., Refregier, A., & Heymans, C. 2000, MNRAS, 319, 649
Heymans, C. & Heavens, A. 2003, MNRAS, 339, 711
Holmberg, E. 1969, Arkiv for Astronomi, 5, 305
Jing, Y. P. 2002, MNRAS, 335, L89
Johnston, K. V., Sackett, P. D., & Bullock, J. S. 2001, ApJ, 557,
Kang, X., van den Bosch, F. C., Yang, X., Mao, S., Mo, H. J., Li,
C., & Jing, Y. P. 2007, MNRAS, in press (astro-ph/0701130)
King, L. & Schneider, P. 2002, A&A, 396, 411
Lee, J., Kang, X., & Jing, Y. P. 2005, ApJ, 629, L5
Li, C., Kauffmann, G., Jing, Y. P., White, S. D. M., Börner, G., &
Cheng, F. Z. 2006, MNRAS, 368, 21
Libeskind, N. I., Frenk, C. S., Cole, S., Helly, J. C., Jenkins, A.,
Navarro, J. F., & Power, C. 2005, MNRAS, 363, 146
Mandelbaum, R., Hirata, C. M., Ishak, M., Seljak, U., &
Brinkmann, J. 2006, MNRAS, 367, 611
Mathews, W. G., Chomiuk, L., Brighenti, F., & Buote, D. A. 2004,
ApJ, 616, 745
Pen, U.-L., Lee, J., & Seljak, U. 2000, ApJ, 543, L107
Pereira, M. J. & Kuhn, J. R. 2005, ApJ, 627, L21
Plionis, M. 1994, ApJS, 95, 401
Plionis, M., Benoist, C., Maurogordato, S., Ferrari, C., & Basilakos,
S. 2003, ApJ, 594, 144
Porciani, C., Dekel, A., & Hoffman, Y. 2002a, MNRAS, 332, 325
—. 2002b, MNRAS, 332, 339
Strazzullo, V., Paolillo, M., Longo, G., Puddu, E., Djorgovski,
S. G., De Carvalho, R. R., & Gal, R. R. 2005, MNRAS, 359,
Struble, M. F. 1990, AJ, 99, 743
Thompson, L. A. 1976, ApJ, 209, 22
Torlina, L., De Propris, R., & West, M. J. 2007, ArXiv Astrophysics
e-prints
Trujillo, I., Carretero, C., & Patiri, S. G. 2006, ApJ, 640, L111
Usami, M. & Fujimoto, M. 1997, ApJ, 487, 489
Wang, H. Y., Jing, Y. P., Mao, S., & Kang, X. 2005, MNRAS, 364,
Weinmann, S. M., van den Bosch, F. C., Yang, X., & Mo, H. J.
2006, MNRAS, 366, 2
West, M. J. 1989, ApJ, 344, 535
Yang, X., Mo, H. J., van den Bosch, F. C., & Jing, Y. P. 2005,
MNRAS, 356, 1293
http://arxiv.org/abs/astro-ph/0701130
Faltenbacher et al. 5
Yang, X., van den Bosch, F. C., Mo, H. J., Mao, S., Kang, X.,
Weinmann, S. M., Guo, Y., & Jing, Y. P. 2006, MNRAS, 369,
York, D. G., et al. 2000, AJ, 120, 1579
ABSTRACT
  Using a large galaxy group catalogue based on the Sloan Digital Sky Survey
Data Release 4 we measure three different types of intrinsic galaxy alignment
within groups: halo alignment between the orientation of the brightest group
galaxies (BGG) and the distribution of its satellite galaxies, radial alignment
between the orientation of a satellite galaxy and the direction towards its
BGG, and direct alignment between the orientation of the BGG and that of its
satellites. In agreement with previous studies we find that satellite galaxies
are preferentially located along the major axis. In addition, on scales r < 0.7
Rvir we find that red satellites are preferentially aligned radially with the
direction to the BGG. The orientations of blue satellites, however, are
perfectly consistent with being isotropic. Finally, on scales r < 0.1 \Rvir, we
find a weak but significant indication for direct alignment between satellites
and BGGs. We briefly discuss the implications for weak lensing measurements.

<|endoftext|><|startoftext|>
Proto-Neutron Star Winds, Magnetar
Birth, and Gamma-Ray Bursts
Brian D. Metzger∗,†, Todd A. Thompson∗∗ and Eliot Quataert∗
∗Astronomy Department and Theoretical Astrophysics Center, 601 Campbell Hall,
Berkeley, CA 94720; bmetzger@astro.berkeley.edu, eliot@astro.berkeley.edu
†Department of Physics, 366 LeConte Hall, University of California, Berkeley, CA 94720
∗∗Department of Astrophysical Sciences, Peyton Hall-Ivy Lane, Princeton University,
Princeton, NJ 08544; thomp@astro.princeton.edu
Abstract. We begin by reviewing the theory of thermal, neutrino-driven proto-neutron
star (PNS) winds. Including the effects of magnetic fields and rotation, we then derive
the mass and energy loss from magnetically-driven PNS winds for both relativistic and
non-relativistic outflows, including important multi-dimensional considerations. With these
simple analytic scalings we argue that proto-magnetars born with ∼ millisecond rotation
periods produce relativistic winds just a few seconds after core collapse with luminosi-
ties, timescales, mass-loading, and internal shock efficiencies favorable for producing long-
duration gamma-ray bursts.
Keywords: neutron stars, stellar winds, supernovae, gamma ray bursts, magnetic fields
PACS: 97.60.Bw, 97.60.Gb; 97.10.Me
1. NEUTRINO-DRIVEN PNS WINDS
After a successful core-collapse supernova (SN), a hot proto-neutron star (PNS)
cools and deleptonizes, releasing the majority of its gravitational binding energy
(∼ 3×1053 ergs) in neutrinos. With initial core temperature T > 10 MeV, a PNS
is born optically-thick to neutrinos of all flavors because the relevant neutrino-
matter cross sections scale as σνn ∝ ǫ
ν ∝ T
2, where ǫν is a typical neutrino energy.
Indeed, because neutrinos are trapped, a PNS’s neutrino luminosity Lν remains
substantial and quasi-thermal for a time after bounce τKH ∼ 10−100 s, as roughly
verified by the 19 neutrinos detected from SN1987A 20 years ago [1],[2]. Although
this Kelvin-Helmholtz (KH) cooling epoch is short compared to the time required
for the shock, once successful and moving outward at ∼ 104 km/s, to traverse
the progenitor stellar mantle, τKH is still significantly longer than the time over
which the initial explosion must be successful. While the specific shock launching
mechanism is presently unknown, it must occur in a time t < 1 s ≪ τKH after
bounce for the PNS to avoid accreting too much matter.
Thus, even after the SN shock has cleared a cavity of relatively low density mate-
rial around the PNS, Lν remains substantial. Detailed PNS cooling calculations [3]
show that the electron neutrino(antineutrino) luminosity Lνe(ν̄e) is ∼ 10
52 erg/s at
t∼ 1 s and declines as∝ t−1 until t≃ τKH, after which Lνe(ν̄e) decreases exponentially
as the PNS becomes optically thin. This persistent neutrino flux Fνe(ν̄e) continues
to heat the PNS atmosphere, primarily through electron neutrino(antineutrino)
http://arxiv.org/abs/0704.0675v1
absorption on nuclei (νe+n → p+ e
− and ν̄e+ p → n+ e
+). Because the inverse,
pair capture rates dominate the cooling, which declines rapidly with temperature
(q̇− ∝ T 6) and hence with spherical radius r, a region of significant net positive
heating (q̇ ≡ q̇+− q̇− > 0) develops above the neutrinosphere radius Rν . This heat-
ing drives mass-loss from the PNS in the form of a thermally-driven wind [4]. To
estimate the dependence of the resultant mass-loss rate (Ṁth) on the PNS proper-
ties explicitly, consider that in steady state the change in gravitational potential
required for a unit mass element to escape the PNS (GM/Rν) must be provided
by the total heating it receives accelerating outwards from the PNS surface:
, (1)
where M is the PNS mass, vr is the outward wind velocity, and q̇ is per unit mass.
Because q̇ is quickly dominated by heating from neutrino absorption, which scales
as q̇+ ∝ Fνeσnν ∝ Lνeǫ
/4πr2, we see that equation (1) implies that
ρdr ≈
ρνHν , (2)
where we have used Ṁth = 4πρr
2vr for a spherical wind, ρ is the mass density, H
is the PNS’s density scale height, ǫνe crudely defines a mean electron neutrino or
antineutrino energy, and a subscript “ν” denotes evaluation near Rν . Neglecting
rotational support and assuming that the thermal pressure P is dominated by
photons and relativistic pairs (which also becomes an excellent approximation
as the density plummets abruptly above the PNS surface), we have that Hν ∼
Pν/ρνgν ∝ T
ν/Mρν , where gν ∝ M/R
ν is the PNS surface gravity and Tν ∝
(Lνeǫ
/R2ν)
1/6 is the PNS surface temperature. Tν is set by the balance between
heating and cooling at the PNS surface (T 6ν ∝ q̇
− = q̇+ ∝Lνeǫ
/R2ν). Inserting these
results into equation (2) and including the correct normalization from the relevant
weak cross sections, one finds the expression for Ṁth first obtained by ref [4]:
Ṁth ≈ 10
10 M⊙/s, (3)
where L52 ≡Lνe×10
52 erg/s, ǫ10 ≡ 10ǫνeMeV, Rν ≡ 10R10 km, andM ≡ 1.4M1.4M⊙.
Endowed with an enormous gravitational binding energy and a means, through
this neutrino-driven outflow, for communicating a fraction of this energy to the
outgoing shock, a newly-born PNS seems capable of affecting the properties of
the SN that we observe. However, a purely thermal, neutrino-driven PNS wind
is only accelerated to an asymptotic speed of order the surface sound speed:
v∞th ∼ cs,ν ≈
2kTν/mp ≈ 0.1L
10 c. Thus, the efficiency η relating wind
power Ėth ≈ Ṁth(v
2/2 to total neutrino luminosity (Lν ∼ 6Lνe) is quite low:
∼ 10−5L
1.4 . (4)
In particular, although neutrino energy deposited in a similar manner may be
responsible for initiating the SN explosion itself at early times (i.e., the neutrino
SN mechanism [5]), η drops rapidly as the PNS cools. Quasi-spherical winds of this
type are therefore not expected to affect the SN’s nucleosynthesis or morphology
(although the wind itself is considered a promising r-process source [4]).
2. MAGNETICALLY-DRIVEN PNS WINDS
Some PNSs may possess a more readily extractable form of energy in rotation.
A PNS born with a period P = Pms ms is endowed with a rotational energy
Erot ≃ 2×10
52P−2ms R
10M1.4 ergs, which, for P < 4 ms, exceeds the energy of a typical
SN shock (∼ 1051 ergs). Given a mass loss rate Ṁ and torquing lever arm ωτ , a wind
extracts angular momentum J from the PNS at a rate J̇ ≃Ωω2τṀ , where Ω= 2π/P
is the PNS rotation rate. With the PNS’s radius Rν as a lever arm and the modest
thermally-driven mass-loss rate given by equation (3), the timescale for removal of
the PNS’s rotational energy, τJ ≡ J/J̇ ∼MR
ν/Ṁω
τ ∼M/Ṁth, is much longer than
τKH. However, if the PNS is rapidly rotating and possesses a dynamically-important
poloidal magnetic field Bp (through either flux-freezing or generated via dynamo
action [6]), then both Ṁ and ωτ can be substantially increased; this reduces τJ ,
allowing efficient extraction of Erot.
For magnetized winds ωτ is the Alfvén radius ωA, defined as the cylindrical
radius where ρv2r/2 first exceeds B
p/8π [7]. The magnetosphere of a PNS is
most likely dominated by its dipole component, with a total (positive-definite)
surface magnetic flux given by ΦB = 2πBνR
ν , where Bν is the polar surface field.
To estimate ωA for magnetized PNS outflows recognize that mass and angular
momentum are primarily extracted from a PNS along open magnetic flux. For
an axisymmetric dipole rotator this represents only a fraction ≈ 2(πθ2LCFL)/4π ≃
Rν/2ωY of ΦB, where θLCFL ≈
Rν/ωY is the latitude (measured from the pole)
at the PNS surface of the last closed field line (LCFL), ωY is the radius where the
LCFL intersects the equator (the “Y point”), and we have assumed that ωY ≫Rν
(θLCFL ≪ 1). Plasma necessarily threads a PNS’s closed magnetosphere and cannot
be forced to corotate superluminally; thus ωY cannot exceed the light cylinder
radius ωL ≡ c/Ω = 48Pms km, making it useful to write the PNS magnetosphere’s
total open magnetic flux as ΦB,open ≈ πBνR
ν(Rν/ωL)(ωY/ωL)
−1. Now, the overall
latitudinal structure of a PNS magnetosphere (i.e., the allocation of open and closed
magnetic flux, and the value of ωY/ωL) is primarily dominated by the dipolar
closed zone. However, recent numerical simulations [8] show that where the field
is open it behaves as a “split monopole”. In this case the poloidal field scales as
Bp ∼ ΦB,open/r
2 ≈ 0.2BνP
msR10(ωY/ωL)
−1(Rν/r)
2, rather than the dipole scaling
∝ (Rν/r)
3. The constant of proportionality is chosen to assure that Bp(Rν)→Bν in
the limit of vanishing closed zone (ωL,ωY →Rν) and is in agreement with numerical
results (see eq. [28] of ref [8]).
2.1. Non-Relativistic Winds and Asymmetric Supernovae
Non-relativistic (NR) magnetically-driven winds reach an equipartition between
kinetic and magnetic energy outside ωA such that the kinetic energy flux at ωA
(Ṁvr(ωA)
2/2) carries a sizeable fraction of the rotational energy loss extracted by
the wind’s surface torque Ėrot = J̇Ω = ṀΩ
2ω2A; thus, we have that vr(ωA)∼ ΩωA.
Combining this with the modified monopole scaling for Bp motivated above and
mass conservation ṀΩ ≡ ρr
2vr (ṀΩ is the mass flux per solid angle) we find that:
ωA/Rν ≃B
ms Ṁ
Ω,−4R
10 (ωY/ωL)
−1, (5)
where ṀΩ ≡ ṀΩ,−4×10
−4M⊙s
−1sr−1, Bν ≡B15×10
15 G, and we have concentrated
on the open magnetic flux that emerges nearest the closed zone (polar latitude
≈ θLCFL) and which thereby dominates the spin-down torque.
From equation (5) we see that winds from rapidly rotating PNSs with
surface magnetic fields typical of Galactic “magnetars” (Bν ∼ 10
14 − 1015
G) possess enhanced lever arms for extracting rotational energy [9]. Fur-
thermore, their total outflow power ĖNRmag ≈ Ėrot ≈ 2πθ
LCFLṀΩΩ
2ω2A ≈
1049B
−13/3
ms Ṁ
Ω,−4R
10 (ωY/ωL)
−3 ergs/s dominates thermal acceleration
(ĖNRmag > Ėth) for B15 > 0.4P
23/24
23/12
−11/3
1.4 (ωY/ωL)
9/4. This condi-
tion becomes easier to satisfy as the PNS cools, allowing magnetized winds to
dominate later stages of the KH epoch for PNSs with even relatively modest Bν
and Ω. NR magnetically-driven winds, in addition to being more powerful than
spherical, thermally-driven outflows, are efficiently hoop-stress collimated along
the PNS rotation axis [8]. The power they deposit along the poles may produce
asymmetry in SN ejecta distinct from the shock-launching process itself.
Strong magnetic fields and rapid rotation can also increase the out-
flow’s power through enhanced mass-loss because ĖNRmag ∝ Ṁ
Ω . When
the PNS’s hydrostatic atmosphere is forced to co-rotate to the outflow’s
sonic radius ωs = (GM sin[θLCFL]/Ω
2)1/3 then ṀΩ is enhanced by a factor
φcf ∼ exp[(vφ,ν/cs,ν)
2] over Ṁth/4π due to centrifugal (“cf”) slinging [9], where
vφ,ν ≈ RνΩsin[θLCFL] ≈ RνΩ
Rν/ωY is the PNS rotation speed at the base
of the open flux. Using our estimate for cs,ν from § 1, we see that enhanced
mass loss becomes important for Pms < Pcf,ms ≡ L
−1/18
10 (ωY/ωL)
(i.e., only for PNSs with considerable rotational energy Erot > 10
52 ergs).
Fully enhanced mass loss (ṀΩ = Ṁthφcf/4π) requires ωA > ωs, which
in turn requires that B15 > Bcf,15 ≡ P
−13/4
10 Ṁ
Ω,−4M
1.4 (ωY/ωL)
5/4 ≃
0.3P 7/4ms L
1.4 R
−29/12
10 exp[0.5(P/Pcf)
−3](ωY/ωL)
5/4, where we have taken
Ṁth from § 1. For cases with Bν < Bcf but P < Pcf , ṀΩ lies somewhere between
Ṁth/4π and φcfṀth/4π (see [10] for numerical results). Millisecond proto-magnetars
generally attain φcf , except perhaps at early times when the PNS is quite hot.
2.2. Relativistic Winds and Gamma-Ray Bursts
As the PNS cools, eventually ωA → ωL and the PNS outflow becomes relativistic
(REL). This transition occurs after τKH for most PNSs (they become pulsars), but
rapidly rotating proto-magnetar winds become relativistic during the KH epoch
itself. Similar to normal pulsars, PNSs of this type lose energy at the force-free,
“vacuum dipole” rate: ĖRELmag ≈ 6×10
49B215P
10(ωY/ωL)
−2 ergs/s (again modulo
corrections for excess open magnetic flux ĖRELmag ∝ Φ
B,open ∝ (ωY/ωL)
−2 [8]), which
gives a familiar spin-down timescale τJ =Erot/Ė
mag ≈ 300B
10 M1.4(ωY/ωL)
s. On the other hand, the mass loading on a PNS’s open magnetic flux is set by
neutrino heating, a process totally different from the way that matter is extracted
from a normal pulsar’s surface. In fact, a proto-magnetar outflow’s energy-to-mass
ratio σ is given by
ĖRELmag
2πθ2LCFLṀΩc
≈ 3B215P
−10/3
1.4 exp
From equation (6) we see that because a PNS’s mass-loss rate drops so precipitously
as it cools, σ ∝ L−5/3νe ǫ
−10/3
rises rapidly with time, easily reaching ∼ 10− 1000
during the KH epoch for typical magnetar parameters [9],[10]. Detailed evolution
calculations indicate that Erot is extracted roughly uniformly in log(σ) [10].
To conclude with a concrete example, consider a proto-magnetar with Bν = 10
G and Pms = 3 at t= 10 seconds after core collapse. From the cooling calculations
of ref [3] we have L52(10 s)≈ 0.1 and ǫ10(10 s)≈ 1 (see Figs. [14] and [18]) and so,
under the conservative estimate that ωY =ωL, equation (6) gives σ≈ 500. Because σ
represents the potential Lorentz factor of the outflow (assuming efficient conversion
of magnetic to kinetic energy), we observe that millisecond proto-magnetar birth
provides the right mass-loading to explain gamma-ray bursts (GRBs). Further, the
power at t = 10 s is still ĖRELmag ≈ 10
50 erg/s with a spin-down time τJ ≈ 30 s, both
reasonable values to explain typical luminosities and durations, respectively, of
long-duration GRBs. Lastly, because σ rises so rapidly with time as the PNS cools,
in the context of GRB internal shock models a cooling proto-magnetar outflow’s
kinetic-to-γ-ray efficiency can be quite high; our calculations indicate that values
of 10−50% are plausible. We conclude that magnetar birth accompanied by rapid
rotation (but requiring less angular momentum than collapsar models) represents
a viable long-duration GRB central engine.
REFERENCES
1. Bionta, R. M., Blewitt, G., Bratton, C. B., Caspere, D., & Ciocio, A. 1987, Physical Review
Letters, 58, 1494
2. Hirata, K. S., et al. 1988, Phys. Rev. D, 38, 448
3. Pons, J. A., Reddy, S., Prakash, M., Lattimer, J. M., & Miralles, J. A. 1999, ApJ, 513, 780
4. Qian, Y.-Z., & Woosley, S. E. 1996, ApJ, 471, 331
5. Bethe, H. A., & Wilson, J. R. 1985, ApJ, 295, 14
6. Thompson, C., & Duncan, R. C. 1993, ApJ, 408, 194
7. Weber, E. J., & Davis, L. J. 1967, ApJ, 148, 217
8. Bucciantini, N., Thompson, T. A., Arons, J., Quataert, E., & Del Zanna, L. 2006, MNRAS,
368, 1717
9. Thompson, T. A., Chang, P., & Quataert, E. 2004, ApJ, 611, 380
10. Metzger, B. D., Thompson, T.A., & Quataert, E. 2007, ApJ in press
	Neutrino-Driven PNS Winds
	Magnetically-Driven PNS Winds
	Non-Relativistic Winds and Asymmetric Supernovae
	Relativistic Winds and Gamma-Ray Bursts
ABSTRACT
  We begin by reviewing the theory of thermal, neutrino-driven proto-neutron
star (PNS) winds. Including the effects of magnetic fields and rotation, we
then derive the mass and energy loss from magnetically-driven PNS winds for
both relativistic and non-relativistic outflows, including important
multi-dimensional considerations. With these simple analytic scalings we argue
that proto-magnetars born with ~ millisecond rotation periods produce
relativistic winds just a few seconds after core collapse with luminosities,
timescales, mass-loading, and internal shock efficiencies favorable for
producing long-duration gamma-ray bursts.

<|endoftext|><|startoftext|>
Introduction
Current theories for the formation of massive stars stress the
importance of the dense cluster environment in which most
of them, if not all, form (Bonnell et al. 2007). Dynamical in-
teractions at the centers of massive star forming regions lead
to captures forming binary systems, ejections, mass segrega-
tion, and possibly coalescence. A remarkable byproduct of the
dynamical interactions in dense clusters of massive stars is
the relatively large abundance of runaway O-type stars, which
amount to almost ∼ 10% of the known O-type stars in the so-
lar vicinity (see Maı́z-Apellániz et al. (2004) for a recent cen-
sus). Runaway stars, characterized by their large spatial veloc-
ities, can form either by dynamical ejection from a dense clus-
ter (Poveda et al. 1967, Leonard & Duncan 1988, 1990) or by
the explosion as supernova of a member of a close massive
binary (Blaauw 1961, van Rensbergen et al. 1996, de Donder
et al. 1997). Evidence for actual examples resulting from both
mechanisms exists (Hoogerwerf et al. 2000, 2001), and both are
a consequence of the special conditions in which massive star
formation takes place. On the one hand, the high stellar density
of the parental cluster facilitates the dynamical ejection scenario.
On the other hand, the supernova scenario is favored by the high
frequency of binaries with high mass ratios among massive stars
(Garmany et al. 1982, Preibisch et al. 1999), which may be a
Send offprint requests to: F. Comerón
⋆ Based on observations collected at the Centro Astronómico
Hispano-Alemán (CAHA) at Calar Alto, operated jointly by the Max-
Planck Institut für Astronomie and the Instituto de Astrofı́sica de
Andalucı́a (CSIC).
consequence of dynamical capture followed by accretion and or-
bital evolution (Bate et al. 2003).
Cygnus OB2, the most massive OB association of the solar
neighbourhood (Knödlseder 2000, 2003, Comerón et al. 2002,
and references therein), should be the source of numerous run-
away stars given its rich content in massive stars, which in-
cludes the massive multiple system Cyg OB2 8 near its center.
Unfortunately, few studies to the date have addressed its possi-
ble runaway population, with the exception of the recent radial
velocity survey of Kiminki et al. (2007) in which no runaway
candidate has been identified until now. Comerón et al. (1994,
1998) pointed out the existence of large-scale kinematical pe-
culiarities in the Cygnus region, most likely related to the pres-
ence of Cygnus OB2, as shown by Hipparcos proper motions.
Although they interpreted their results in terms of triggered star
formation (Elmegreen 1998), at least some of the stars that they
identified as moving away from Cygnus OB2 might be actual
runaways formed by either of the two mechanisms listed above.
In this paper we report the identification of a very high
mass runaway star, BD+43◦ 3654, very probably ejected from
Cygnus OB2. The star had been already identified as a likely
runaway by van Buren & McCray (1988) based on the existence
of an apparent bow shock in IRAS images, caused by the inter-
action of its stellar wind with the local interstellar medium. Here
we present the first spectroscopic observations of the star, which
show it to be a very early Of-type supergiant. We also present
proper motion data and higher resolution MSX images leading
to a more detailed analysis, which strongly supports an origin at
the core of Cygnus OB2.
http://arxiv.org/abs/0704.0676v1
2 F. Comerón and A. Pasquali: A very massive runaway star from Cygnus OB2
BD+43  3654
IS IS
Fig. 1. Spectrum of BD+43◦ 3654 showing the main absorption
lines used for spectral classification and the prominent emission
of NIII and HeII. The prominent, unlabeled features are Hγ, Hδ,
and Hǫ. Interstellar absorption features are indicated by dotted
lines. The locations where one might expect to detect NIV and
SiIV transitions are also indicated.
2. Observations
The spectrum presented here was obtained in the course of
a project aimed at producing spectral classifications of pre-
viously unknown, photometrically selected new OB stars in
the surroundings of Cygnus OB2. The photometry in the
BRJHKS bands was taken from the Naval Observatory Merged
Astrometric Dataset (NOMAD) catalog (Zacharias et al. 2004),
which combines astrometry and photometry from the Hipparcos,
Tycho-2, UCAC2, USNO-B1.0, and 2MASS catalogs. The spec-
troscopic observations were carried out with the 2.2m tele-
scope at the German-Spanish Astronomical Center on Calar Alto
(Spain) using the CAFOS visible imager and spectrograph. A
1”5 slit combined with the B-100 grism, providing a resolution
λ/∆λ = 800 in the blue part of the visible spectrum, were used.
The exposure time was 900 s. The spectrum was reduced, ex-
tracted, and wavelength calibrated using standard IRAF1 tasks
under the ONEDSPEC package, and it was ratioed by a sixth-
degree polynomial fit to the continuum in order to remove the
steep slope due to the strong extinction towards the star.
3. Results
3.1. Stellar classification, properties, and kinematics
Although the identification of BD+43◦ 3654 as a likely runaway
star dates back to van Buren & McCray (1988), no spectral clas-
sification is available in that work. Subsequent papers by van
Buren et al. (1995) and Noriega-Crespo et al. (1997) refer to
the star as a unspecified B-type but do not report dedicated ob-
servations, and no other spectroscopic classification appears to
be available in the literature apart from a generic classification
as ’OB reddened’ in the LS catalog (Hardorp et al. 1964). The
spectrum presented here is thus the first one allowing an accu-
rate spectral classification of BD+43◦ 3654 and the estimate of
its physical parameters.
1 IRAF is distributed by NOAO, which is operated by the Association
of Universities for Research in Astronomy, Inc., under contract to the
National Science Foundation.
Fig. 2. Comparison between the position of BD+43◦ 3654 and
the evolutionary tracks for very massive stars of Meynet et
al. (1994).
The most obvious spectroscopic feature of BD+43◦ 3654 is
the presence of intense emission in the NIII and HeII lines, and
possibly also in NIV and SiIV, clearly indicating that it is a Of
star. HeII lines are also prominent in absorption, and together
with the absence of HeI lines indicates a spectral type earlier
than O5. Absorption bands due to interstellar absorption, CaII
and diffuse interstellar bands, are also strong due to the high ex-
tinction towards the star. Based on comparison with the atlas
of Walborn & Kirkpatrick (1990), we classify the star as O4If.
Using intrinsic colors of early-type stars from Tokunaga (2000)
and the 2MASS HKS photometry from the NOMAD catalog, we
estimate a K-band extinction AK = 0.57 mag.
A summary of previous distance determinations to
Cygnus OB2 has been presented by Hanson (2003). Based on
her results, we adopt her favored distance modulus DM = 10.8
corresponding to a distance of 1450 pc, with an estimated un-
certainty of ±0.4 based on the results of previous determinations
summarized in that work. Assuming that BD+43◦ 3654 is ap-
proximately at the same distance from the Sun as Cygnus OB2,
we derive its absolute magnitude as
MV = KS − AK + (V − KS )0 − DM = −6.3 ± 0.5 (1)
where the 0.5 mag uncertainty includes as the dominant source
the quoted uncertainty in the distance modulus and the contribu-
tion of error in the derivation of the extinction. We estimate the
latter to be 0.2 mag based on the quality of the fit of a reddened
O4-type spectral energy distribution to the measured BRJHKS
photometry. The contribution of the errors in the broad-band
photometry is negligible as compared to those other two sources.
Different calibrations of the stellar parameters of O stars
can be found in literature to estimate the mass and the age of
BD+43◦ 3654. These calibrations are based on a different treat-
ment of the stellar model atmospheres, depending on whether
non-LTE conditions, line-blanketing effects and stellar winds
are taken into account. For an O4 supergiant in the Milky Way
(i.e. of solar metallicity), Martins et al. (2005) provide an ef-
fective temperature Te f f = 40702 K; Repolust et al. (2004) es-
timate a colder Te f f = 39000 K, while Vacca et al. (1996) give
Te f f = 47690 K. We have adopted the average of those three cal-
ibrations, 42464 K, as the temperature for BD+43◦ 3654, con-
sidering as the uncertainty the range of temperatures spanned by
F. Comerón and A. Pasquali: A very massive runaway star from Cygnus OB2 3
those models. This uncertainty is larger than the temperature dif-
ference between the subtypes O4 and O5, and between types O4I
and O4V, for any given calibration (see e.g. Martins et al. 2005).
The same is true for the effects of metallicity, which are hardly
noticeable even when metal abundances change by a factor of
10. This is clearly shown in Fig. 15 of Heap et al. (2006), where
the temperature-spectral type relationships from different cali-
brations involving both galactic and Small Magellanic Cloud O
stars are compared. Therefore, plausible uncertainties in either
our spectral classification or in our assumption of solar metal-
licity for BD+43◦ 3654 do not significantly alter the size of the
error bars in Fig. 2. The absolute magnitudes MV obtained by all
three models are very similar, with an average of MV = −6.36
and individual determinations deviating by less than 0.05 mag
from that value. This is remarkably close to the value that we ob-
tain from the photometry of BD+43◦ 3654 and the assumption
that its distance modulus is the same obtained by Hanson (2003)
for Cygnus OB2, thus supporting our choice of that distance for
the star.
The position of BD+43◦ 3654 on the Herzsprung-Russell
(HR) diagram is shown in Fig. 2, together with the isochrones
computed by Meynet et al. (1994) for solar metallicity and with
enhanced stellar mass loss. These evolutionary tracks are prefer-
able because they better reproduce the low-luminosity observed
for some Wolf-Rayet stars, the surface chemical composition of
WC and WO stars and the ratio of blue to red supergiants in the
star clusters of the Magellanic Clouds. The isochrones plotted
in Fig. 2 refer to a stellar mass of the progenitor on the main
sequence of Mi = 20, 25, 40 M⊙ (in grey) and Mi = 60, 85 M⊙
(in black). The comparison between the observed properties of
BD+43◦ 3654 and the isochrones allows us to estimate an initial
mass Mi ≃ (70 ± 15) M⊙ and an approximate age of 1.6 Myr.
The isochrones do not take into account stellar rotation, which
many studies in the past decade have found to greatly affect mix-
ing and mass loss, and to be an important ingredient for stellar
evolution (Meynet & Maeder 1997, Langer et al. 1998, Heger &
Langer 2000, Meynet & Maeder 2000, and references therein).
As shown by Meynet & Maeder (2000), for an initial rotational
velocity of 200-300 km s−1 and solar metallicity isochrones be-
come brighter by a few tenths of a magnitude and the lifetime
in the H-burning phase increases by 20-30%. Given the obser-
vational errors on BD+43◦ 3654, these changes do no affect sig-
nificantly our estimates of the initial mass and age of the star.
Proper motions for BD+43◦ 3654 are available from the
NOMAD catalog, based on measurements by the Hipparcos
satellite in the Tycho catalog further refined with previous
ground-based observations. The values listed are µα cos δ =
(−0.4 ± 0.7) mas yr−1 and µδ = (+1.3 ± 1.0) mas yr
−1. The
corresponding values expressed in galactic coordinates, which
are more convenient to derive the spatial velocity of the star
with respect to its local interstellar medium, are µl cos b =
(+0.8 ± 0.9) mas yr−1, µb = (+1.1 ± 0.8) mas yr
Assuming that the local interestellar medium in the sur-
roundings of BD+43◦ 3654 moves in a circular orbit around
the galactic center, its proper motion (µl cos b)0, (µb)0 can be
described by the first-order approximation to the local galactic
velocity field; see e.g. Scheffler & Elsässer (1987):
(µl cos b)0 = 0.211[A cos 2l cos b + B cos b
sin l −
cos l] (2a)
(µb)0 = 0.211[−A sin 2l sin b cos b
Fig. 3. Image obtained in the Midcourse Space Experiment
(MSX) galactic plane survey in the D medium-infrared band
(13.5 µm - 15.9 µm). The position of BD+43◦ 3654 is marked
with a grey circle. The galactic North is up and the direction of
growing galactic longitude to the left.
cos l sin b +
sin l sin b −
cos b] (2b)
where A and B are the Oort constants in units of km s−1 kpc−1;
U, V , and W are the components of the solar peculiar mo-
tion in the directions toward the galactic center, the direction
of circular galactic rotation, and the North galactic pole re-
spectively, in km s−1, and D is the distance to the Sun in
kpc. We have adopted A = −B = 12.5 km s−1 kpc−1, cor-
responding to a flat rotation curve with an angular velocity of
25 km s−1 kpc−1 and (U,V,W) = (7, 14, 7) km s−1. The proper
motion of BD+43◦ 3654 with respect to its local interstellar
medium is then
∆µl cos b = µl cos b − (µl cos b)0 = (5.3 ± 1.1) mas yr
−1 (3a)
∆µb = µb − (µb)0 = (2.0 ± 0.9) mas yr
−1 (3b)
where the uncertainty allows for an error of 2 km s−1 kpc−1 in
each of A, B, and 2 km s−1 in each of U, V , and W. The position
angle θ of the residual proper motion with respect to the North
galactic pole, counted as positive in the direction of increasing
galactic latitude, is then
θ = tan−1
∆µl cos b
= 69◦3 ± 9◦4 (4)
The component of the spatial velocity on the plane of the sky
that we derive from the residual proper motion at the adopted
distance of 1450 pc is (39.8±9.8) km s−1, which is several times
the sound speed in a warm neutral interstellar medium at a tem-
perature of ∼ 8000 K, as expected from the fact that a clear bow
shock is observed ahead of the star in the direction of its motion.
3.2. The bow shock
The original identification of a possible bow shock associated
to BD+43◦ 3654 was reported by van Buren & McCray (1988)
4 F. Comerón and A. Pasquali: A very massive runaway star from Cygnus OB2
based on 60 µm IRAS maps, and further details were given
by van Buren et al. (1995) and Noriega-Crespo et al. (1997).
While indeed suggestive of a bow shock, the resolution of the
IRAS 60 µm images presented by van Buren et al. (1995) is not
sufficient to accurately determine the shape of the bow shock and
its position with respect to the star.
Much improved images of the region around BD+43◦ 3654
have been provided by the Midcourse Space Experiment (MSX)
satellite (Price et al. 2001). The BD+43◦ 3654 bow shock ap-
pears in them as a neat, well defined arc-shaped nebula in the
D (13.5-15.9 µm; see Fig. 3) and E (18.2-25.1 µm) bands, and
is absent in the A (6.8-10.8µm) and C (11.1-13.2 µm) bands.
The position of the apsis of the bow shock with respect to
BD+43◦ 3654 can be well determined from those images, be-
ing located at 3.4 arcmin from the star in a direction that forms
an angle of 62◦5±10◦ with the direction towards the north galac-
tic pole. This position angle is in very good agreement with the
position angle of the residual velocity vector of the star (Eq. (4)).
The position of the bow shock with respect to the star allows
us to estimate the density of the interstellar medium through
which BD+43◦ 3654 is moving. The apsis of the bow shock is
approximately located at the stagnation radius, which is the dis-
tance from the star where the ram pressure of the interstellar gas
equals that of the stellar wind, given by (e.g. Wilkin 1996):
Ṁwvw
4πρav2∗
where Ṁw and vw are respectively the mass loss rate and termi-
nal wind velocity of the star, ρa is the ambient gas density, and
v∗ is the spatial velocity of the star. The distance given in Eq. (5)
assumes that the bow shock is bound by shock fronts on both
sides. In reality, the non-zero cooling time of the shocked stel-
lar wind builds up a thick layer of low-density, high-temperature
gas between the reverse shock on the stellar wind and the bow
shock. The existence of this thick layer moves the position of the
apsis of bow shock to a greater distance from the star than that
given by Eq. (5). This expression actually gives the position of
the reverse shock ahead of the star, as shown in numerical sim-
ulations by Comerón & Kaper (1998), and the actual position of
the bow shock is normally ∼ 1.5R0, the precise distance depend-
ing on the quantities entering the right-hand side of Eq. (5) and
the cooling curve of the stellar wind gas. Concerning the stellar
wind, we have adopted Ṁ = 10−5 M⊙ yr
−1 and v∗ = 2300 km s
as typical values derived by Markova et al. (2004) and Repolust
et al. (2004) for the O4I stars in their samples. Finally, we use
v∗ = 39.8 km s
−1 as derived in the previous Section, assum-
ing for simplicity that most of the velocity of BD+43◦ 3654 is
on the plane of the sky and that there are no projection effects
on the position of the bow shock. Introducing these values in
Eq. (5), we obtain a number density of the local interstellar gas
nH ≃ 6 cm
−3. It must be kept in mind that this is only a rough es-
timate of the density, mainly due to the large uncertainties in the
values adopted for the different quantities intervening in Eq. (5)
and the assumption that the residual motion of the star is in the
plane of the sky. In particular, we note that Bouret et al. (2005)
find mass loss rates smaller by a factor of ∼ 3 for the galactic
O4If+ star HD190429A when taking into account wind clump-
ing with respect to the homogeneous wind case, which may im-
ply an overestimate of nH by a similar factor due to our adopted
values. In any case, the estimated density clearly indicates that
the star is moving in a tenuous medium whose density matches
well that typical of the warm HI gas in the vicinity of the galactic
midplane (e.g. Dickey & Lockman 1990).
4. Discussion: the origin of BD+43◦ 3654
The spectral type and estimated mass of BD+43◦ 3654 places it
among the three most massive runaway stars known to date. The
only other two comparable stars are ζ Pup and λ Cep (spectral
types O4I(n)f and O6I(n)fp, respectively; Maı́z Apellániz 2004),
whose masses (65-70 M⊙, as estimated by Hoogerwerf et
al. (2001) from evolutionary models by Vanbeveren et al. (1998))
are similar to the one that we estimate for BD+43◦ 3654.
Although currently placed near the boundary separating
Cygnus OB1 and OB9 (to the extent that this boundary may
be real; see Schneider et al. (2007)), the proper motion of
BD+43◦ 3654 points away from the core of Cygnus OB2, which
is approximately marked by the location of the multiple system
of O stars Cyg OB2 8A-D. Other early O-type stars, most no-
tably Cyg OB2 22A (O3If*), Cyg OB2 22B (O6V((f))), and
Cyg OB2 9 (O5If+), are also within few arcminutes of that lo-
cation. The position angle of BD+43◦ 3654 with respect to this
system is 58◦84, very similar to the position angle of its resid-
ual proper motion vector (Sect. 3.1) and of the axis of the bow
shock (Sect. 3.2). In view of the high density of very massive
OB stars found in the central regions of Cygnus OB2 (Massey
& Thompson 1991), we thus consider as a very likely possibility
that BD+43◦ 3654 was formed there and subsequently expelled.
Assuming that BD+43◦ 3654 was born in the close vicinity
of Cyg OB2 8, of which is currently separated by an angular
distance δ = 2◦67, the travel time to its current position is τ =
(∆µl cos b)2 + (∆µb)2 = 1.7 ± 0.4 Myr, which is close to the
age of the star inferred from the evolutionary tracks and the posi-
tion in the H-R diagram (Sect. 3.1). The coincidence between the
age and the travel time supports dynamical ejection early in its
life as the cause for its runaway velocity, since there would have
been no time for a hypothetical massive companion to evolve, go
through different mass transfer episodes (Vanbeveren et al. 1998)
and then explode as supernova. The spatial velocities of the other
two massive runaways noted above are probably higher, unless
the radial velocity of BD+43◦ 3654 exceeds the projected veloc-
ity on the plane of the sky: Hoogerwerf et al. (2001) measure
a velocity of 62.4 km s−1 for ζ Pup, and 74.0 for λ Cep. High
spatial velocities may be the signature of an origin by supernova
ejection, since high ejection velocities by dynamical interaction
become increasingly unlikely as the mass of the ejected star in-
creases. The observed mass-velocity relationship for runaway
stars (Gies & Bolton 1986) clearly shows this trend. Hoogerwerf
et al. (2001) favor a supernova scenario for λ Cep on the basis
of the difference between its age and that of the likeliest parental
association. The birthplace of ζ Pup is more uncertain according
to Hoogerwerf et al. (2001), but van Rensbergen et al. (1996)
also favor a supernova scenario. We note however that the ve-
locities of all three stars is well below the upper limit for the
ejection of very massive stars in encounters with massive bina-
ries (Leonard 1991). We thus consider the similarity between the
estimated age of BD+43◦ 3654, and its kinematic age if it was
born near the center of Cygnus OB2, as the strongest argument in
support of a dynamical ejection, possibly from an original clus-
ter containing in addition Cyg OB2 8, 9, and 22.
BD+43◦ 3654 is the first runaway star from Cygnus OB2
identified thus far, but most likely it is not the only one in such a
rich association. If the fraction of runaways among O-type stars
is the same for Cygnus OB2 as for the more nearby population
of O stars, we estimate that about ten more Cygnus OB2 run-
aways may remain to be discovered, having the potential of pro-
viding new information on their formation environments and on
the mechanisms leading to the runaway ejection.
F. Comerón and A. Pasquali: A very massive runaway star from Cygnus OB2 5
Acknowledgements. It is as always a pleasure to acknowledge the support of the
staff of the Calar Alto observatory during the execution of our observations. We
also thank the detailed and constructive comments of the referee, Dr. Dave van
Buren. This research has made use of the SIMBAD database operated at CDS,
Strasbourg, France. It also makes use of data products from the Two Micron
All Sky Survey, which is a joint project of the University of Massachusetts and
the Infrared Processing and Analysis Center/California Institute of Technology,
funded by the National Aeronautics and Space Administration and the National
Science Foundation, as well as of data products from the Midcourse Space
Experiment (MSX). Processing of the MSX data was funded by the Ballistic
Missile Defense Organization with additional support from NASA Office of
Space Science. This research has also made use of the NASA/ IPAC Infrared
Science Archive, which is operated by the Jet Propulsion Laboratory, California
Institute of Technology, under contract with the National Aeronautics and Space
Administration.
References
Bate, M.R., Bonnell, I.A., Bromm, V., 2003, MNRAS, 336, 705.
Blaauw, A., 1961, Bull. Astron. Inst. Netherlands, 15, 265.
Bonnell, I.A., Larson, R.B., Zinnecker, H., 2007, in ”Protostars and Planets V”,
eds. B. Reipurth, D. Jewitt, K. Keil, Univ. of Arizona Press.
Bouret, J.-C., Lanz, T., Hillier, D.J., 2005, A&A, 438, 301.
Comerón, F., Kaper, L., 1998, A&A, 338, 273.
Comerón, F., Torra, J., 1994, ApJ, 423, 652.
Comerón, F., Torra, J., Gómez, A.E., 1998, A&A, 330, 975.
Comerón, F., Pasquali, A., Rodighiero, G., Stanishev, V., De Filippis, E., López
Martı́, B., Gálvez Ortiz, M.C., Stankov, A., Gredel, R., 2002, A&A, 389,
de Donder, E., Vanbeveren, D., van Bever, J., 1997, A&A, 318, 812.
Dickey, J.M., Lockman, F.J., 1990, ARA&A, 28, 215.
Elmegreen, B.G., 1998, in Origins, eds. C.E. Woodward, J.M. Shull, H.A.
Thronson, ASP Conf. Ser. 148.
Garmany, C.D., , 1980, Conti, P.S., Massey, P., 1982, ApJ, 242, 1063.
Gies, D.R., Bolton, C.T., 1986, ApJS, 61, 419.
Hanson, M.M., 2003, ApJ, 597, 957.
Hardorp, J., Theile, I., Voigt, H.H., 1964, Luminous Stars in the Northern Milky
Way (LS), Vol. 3., Hamburger Sternwarte and Warner & Swasey Obs.
Heap, S.R., Lanz, T., Hubeny, I., 2006, ApJ, 638, 409.
Heger, A., Langer, N., 2000, ApJ, 544, 1016.
Hoogerwerf, R., de Bruijne, J.H.J., de Zeeuw, P.T., 2000, ApJ, 544, L133.
Hoogerwerf, R., de Bruijne, J.H.J., de Zeeuw, P.T., 2001, A&A, 365, 49.
Kiminki, D.C., Kobulnicky, H.A., Kinemuchi, K., Irwin, J.S., Fryer, C.L.,
Berrington, R.C., Uzpen, B., Monson, A.J., Pierce, M.J., Woosley, S.E.,
2007, ApJ, in press.
Knödlseder, J., 2000, A&A, 360, 539.
Knödlseder, J., 2003, in A Massive Star Odyssey: From Main Sequence to
Supernova, eds. K. van der Hucht, A. Herrero, C. Esteban, ASP Conf. Ser.
Langer, N., Heger, A., Fliegner, J., 1998, in Fundamental Stellar Properties: The
Interaction between Observation and Theory, IAU Symp. 189, eds. T.R.
Bedding, A.J. Booth, J. Davis, Kluwer Acad. Publ.
Leonard, P.J.T., 1991, AJ, 101, 562.
Leonard, P.J.T., Duncan, M.J., 1988, AJ, 96, 222.
Leonard, P.J.T., Duncan, M.J., 1990, AJ, 99, 608.
Maı́z Apellániz, J., Walborn, N.R., Galué, H.Á., Wei, L.H., 2004, ApJS, 151,
Markova, N., Puls, J., Repolust, T., Markov, H., 2004, A&A, 413, 693.
Martins, F., Schaerer, D., Hillier, D.J., 2005, A&A, 436, 1049.
Massey, P., Thompson, A.B., 1991, AJ, 101, 1408.
Meynet, G., Maeder, A., 1997, A&A, 321, 465.
Meynet, G., Maeder, A., 2000, AR&A, 38, 143.
Meynet, G., Maeder, A., Schaller, D., Charbonnel, C., 1994, A&AS, 103, 97.
Noriega-Crespo, A., van Buren, D., Dgani, R., 1997, AJ, 113, 780.
Poveda, A., Ruiz, J., Allen, C., 1967, Bol. Obs. Tonantzintla y Tacubaya, 4, 86.
Preibisch, T., Balega, Y., Hofman, K.-H., Weigelt, G., Zinnecker, H., 1999, New
Astr., 4, 531.
Price, S.D., Egan, M.P., Carey, S.J., Mizuno, D.R., Kuchar, T.A., 2001, AJ, 121,
2819.
Repolust, T., Puls, J., Herrero, A., 2004, A&A, 415, 349.
Scheffler, H., Elsässer, H., 1987, Physics of the Galaxy and the Interstellar
Medium, Springer Verlag.
Schneider, N., Simon, R., Bontemps, S., Motte, F., 2007, A&A, submitted.
Tokunaga, A.T., 2000, in Allen’s Astrophysical Quantities, ed. A. Cox, AIP
Press.
Vacca, W.D., Garmany, C.D., Shull, J.M., 1996, ApJ, 460, 914.
Vanbeveren, D., De Loore, C., Van Rensbergen, W., 1998, A&A Rev., 9, 63.
van Buren, D., McCray, R., 1988, ApJ, 329, L93.
van Buren, D., Noriega-Crespo, A., Dgani, R., 1995, AJ, 110, 2914.
van Rensbergen, W., Vanbeveren, D., de Loore, C., 1996, A&A, 305, 825.
Walborn, N.R., Fitzpatrick, E.L., 1990, PASP, 102, 379.
Wilkin, F.P., 1996, ApJ, 459, L31.
Zacharias, N., Monet, D.G., Levine, S.E., Urban, S.E., Gaume, R., Wycoff, G.L.,
2004, AAS, 205, 4815.
List of Objects
‘Cygnus OB2’ on page 1
‘BD+43◦ 3654’ on page 1
‘HD190429A’ on page 4
‘ζ Pup’ on page 4
‘λ Cep’ on page 4
‘Cyg OB2 8A-D’ on page 4
‘Cyg OB2 22A’ on page 4
‘Cyg OB2 22B’ on page 4
‘Cyg OB2 9’ on page 4
	Introduction
	Observations
	Results
	Stellar classification, properties, and kinematics
	The bow shock
	Discussion: the origin of BD+43 3654
ABSTRACT
  Aims: We analyze the available information on the star BD+43 3654 to
investigate the possibility that it may have had its origin in the massive OB
association Cygnus OB2.
  Methods: We present new spectroscopic observations allowing a reliable
spectral classification of the star, and discuss existing MSX observations of
its associated bow shock and astrometric information not previously studied.
  Results: Our observations reveal that BD+43 3654 is a very early and luminous
star of spectral type O4If, with an estimated mass of (70 +/- 15) solar masses
and an age of about 1.6 Myr. The high spatial resolution of the MSX
observations allows us to determine its direction of motion in the plane of the
sky by means of the symmetry axis of the well-defined bow shock, which matches
well the orientation expected from the proper motion. Tracing back its path
across the sky we find that BD+43 3654 was located near the central, densest
region of Cygnus OB2 at a time in the past similar to its estimated age.
  Conclusions: BD+43 3654 turns out to be one of the three most massive runaway
stars known, and it most likely formed in the central region of Cygnus OB2. A
runaway formation mechanism by means of dynamical ejection is consistent with
our results.

<|endoftext|><|startoftext|>
Introduction
The star HD56126 is observed in the post-AGB phase of its evolution. While undergoing this short-lived
stage (according to Blöcker [1], this phase lasts for ∆T ≈ 103 ÷ 104 years), the star passes to the stage of a
planetary nebula and therefore post-AGB stars are also commonly referred to as “protoplanetary nebulae”
(PPNe). In the Hertzsprung–Russell diagram post-AGB stars move at almost constant luminosity leftward
of the AGB and become increasingly hotter. These objects, which are descendants of AGB-stars, can be used
to trace the physical and chemical parameters of the interstellar matter due to a change in the energy source,
which is accompanied by a change of the star structure, ejection of the envelope, and mixing of matter.
The main task of our research is to reveal the chemical composition anomalies that are due to the nuclear
synthesis of chemical elements in the interiors of low- and intermediate-mass stars (less than 8–9M⊙) and
subsequent dredge-up of the products of synthesis to the surface layers of stellar atmospheres. We use our
high-precision spectroscopic data not only to study the chemical composition, but also to perform a detailed
analysis of the velocity field in the atmospheres of these stars, which constitutes a separate astrophysical
problem. In addition, the high quality observational data allowed us to produce an atlas of the spectrum of a
typical post-AGB star over a wide wavelength interval. To this task, we chose the supergiant star HD 56126
(Sp=F5Iab), which is the optical component of the IR source IRAS07134+1005 with a double-peaked
spectral energy distribution (SED) typical of PPN. The star HD56126 is located outside the galactic plane,
its galactic coordinates are l=206.◦75, b=+9.◦99. Note that HD56126 is a generally recognized canonical
object in the phase of transition from the asymptotic giant branch to a planetary nebula. In addition to
the anomalous SED mentioned above, which is due to the circumstellar dust, this star exhibits other, highly
conspicuous, features specific for this class of objects [2]: the optical component of the PPN is an F5Iab-
type supergiant at a high galactic latitude; the central star is surrounded by an extended nebula, which,
according to HST observations [3], has the largest angular size β > 4′′ among PPN objects of this type; the
optical spectrum exhibits variable complex emission–absorption profile of the Hα and shows spectral features
that are indicative of the current mass outflow. Based on their high-resolution spectroscopy (R=860000,
FWHM=0.35km/s) of HD56126, Crawford and Barlow [4] revealed the multicomponent structure of the
Ki and C2 features, which is indicative of repeated episodes of mass ejection from the star.
⋆ The full version of the Atlas is available in electronic form from: http://www.sao.ru/hq/ssl/Atlas/Atlas.html
http://arxiv.org/abs/0704.0677v1
http://www.sao.ru/hq/ssl/Atlas/Atlas.html
2 Klochkova et al.: Spectroscopy of HD56126
Subsequent studies of HD56126 and of the associated IR source revealed a number of properties, which
allowed the object to acquire the canonical status in its class. First, an analysis of the spectra obtained with
the echelle spectrograph of the 6-m telescope, allowed Klochkova [5] to conclude that HD56126 is a metal-
poor star with [Fe/H ]⊙=−1.0 and high excess of carbon and s-process elements. Second, IRAS 07134+1005
was found to belong to the group of PPNe whose IR-spectra exhibit an emission feature at λ=21µm. Objects
of this small subgroup were found to exhibit a correlation between the presence of this 21µm–feature and
the manifestation of products of stellar nucleosynthesis in the outer atmospheric layers: overabundance of
carbon and heavy metals of the s-process. This so far unexplained correlation has been found independently
by Klochkova [6] and a group of other authors [7]. Thus HD56126 possesses the complete set of features
peculiar to the entire family of PPNe, and this fact determines the importance of the detailed spectroscopy
of this object and preparation of an atlas of its optical spectrum over a wide spectral region. This task is
facilitated by HD56126 being the brightest (B=9.m11, V=8.m27) star among carbon-rich PPNe and hence
the most accessible star for high-resolution spectroscopy among the objects of this type.
Section 2 gives a brief description of the methods of observation and data reduction employed in this
paper. Section 3 presents the peculiar features of the spectrum of HD56126, and section 4 describes the
field of radial velocities Vr in the atmosphere and envelope of the star. We also briefly discuss the radial-
velocity variability and the variability of selected spectral-line profiles. Section 5 describes the spectroscopic
atlas, identification of spectral features, and compares the spectrum of HD56126 with that of the standard
supergiant αPer (Sp = F5Iab).
2. Observations and reduction of spectra
We performed spectroscopic observations of HD 56126 and αPer with the 6-m telescope of the Special
Astrophysical Observatory. We obtained all spectra with NES [8, 9] and Lynx [10, 11] echelle spectrographs
operating in the Nasmyth focus. A 2048×2048 CCD and image slicer [12] with the NES spectrograph allows
taking spectra with a resolution of R≈ 60000, whereas the Lynx spectrograph equipped with a 1K×1K CCD
yields a resolution of R≈ 25000. The Table 1 gives the dates of observations and the spectral region recorded.
We use the modified ECHELLE context of MIDAS to extract data from two-dimensional echelle spectra
(see [13] for details). Cosmic-hit removal was performed via median averaging of two successive spectra.
Wavelength calibration was made using Th-Ar-lamps. We use DECH20 [14] code to perform spectropho-
tometric and position measurements. In particular, we determine the radial velocities from individual lines
and their components by superimposing the direct and mirror-reflected profiles. We determine the position
zero point for each spectrogram by referring it to the positions of ionospheric emission features of the night
sky and to those of telluric absorptions, which show up against the spectrum of the object. The accuracy
of single line velocity measurements in the spectra obtained is better than 1.0 and 1.5 km/s, for NES and
Lynx spectrographs, respectively.
3. Peculiarities of the optical spectrum of HD56126
Optical spectra of PPNe differ from the spectra of classical supergiants by the anomalous profiles of spectral
lines (Hi, Nai, Hei), and primarily, by the anomalous Hα profiles. Hα lines in the spectra of typical PPNe have
complex emission and absorption profiles with asymmetric cores, PCyg- or inverse PCyg-type profiles, and
profiles with two emission components. PPNe often exhibit a combination of several such features. Emission
in Hα may be due to mass outflow and/or pulsations and hence we must observe sporadic stellar wind in
many PPNe. The Doppler shift of the core is usually smaller than the escape velocity, i.e., we have evidence
only for motions at the wind base. The spectra of individual objects owe the great variety of their profiles to
the differences in the dynamical processes in their extended atmospheres: spherically symmetric outflow with
constant or height-dependent velocity, mass infall onto the photosphere, and pulsations. A two-component
emission profile is indicative of a nonspherical envelope, e.g., the presence of a circumstellar disk.
The peculiarity of the optical spectra of PPNe often shows up not only in specific Hi profiles, but also in
the distortions of the spectral features of the F −K-type supergiant due to chemical composition anomalies
and the presence of molecular features along with atomic and ion lines.
HD 56126 exhibits all these spectral peculiarities that distinguish PPN from a normal supergiant of the
same spectral type. As is evident from Fig. 1, the Hα line has a complex profile with absorption and emission
components, which are absent in the spectrum of the comparison star αPer. Figure 1 also shows well-defined
photospheric wings of the Hα line in the spectrum of HD56126. These wings are almost as extended as in
the spectrum of αPer. Figure 2, which shows all the data now available, demonstrates date-to-date variations
Klochkova et al.: Spectroscopy of HD56126 3
Table 1. Log of observations and results of Vr measurements. Column 4 gives the mean Vr averaged over weak
lines (r→ 1). For Feii(42), Hα and D lines of Nai we give the velocities inferred from the positions of the strongest
line components. The numbers in parentheses give the velocities inferred from weaker components. Slanted font in
column 5 indicates the velocities inferred from the IR oxygen triplet Oiλ 7773 Å. Semicolumn indicates uncertain
Date Spectro- ∆λ, Å Vr
graph r→ 1 Feii(42) Hβ Hα DNai C2 interstellar
1 2 3 4 5 6 7 8 9 10 11 12
HD56126
12.01.93 Lynx 5560–8790 88.8 91 – 78 (100:) 77 – – – –
10.03.93 Lynx 5560–8790 89.0 93 – 71 (43:) 75: – – – –
04.03.99 Lynx 5050–6640 85.9 77 – 76 (43:) 78 77.1 – – –
20.11.02 NES 4560–5995 89.6 95 (80:) 89 – 75 (89) 77.2 12.0 23.5 30.8
21.02.03 NES 5150–6660 88.8 96: – 88 (112:) 75 (89) 77.1 12 24 31
12.04.03 NES 5270–6760 88.4 – – 82 (103:) 75 (89:) – 13 23 30.5
14.11.03 NES 4518–6000 85.3 96 (87:) 97 – 75 (87:) 76.9 12.5 – –
10.01.04 NES 5270–6760 86.7 – – 54: 76 (86:) – 13.0 23.5 31
09.03.04 NES 5275–6767 89.8 – – 58 (74:) 76 (89) – 13 24 31
12.11.05 NES 4010–5460 82.5 97 (77:) 98 – – 77.5 – – –
04.03.99 Lynx 5050–6620 −1.2 −1 – −2 – – –
02.08.01 NES 3500–5000 −1.8 −1 : −2 – – – –
11.11.05 NES 4010–5460 −2.0 −2 −2 – – – –
12.11.05 NES 4560–6010 −1.9 −2 −2 – – – –
6540 6550 6560 6570
Figure 1. Fragment of the atlas containing the Hα profile in the spectra of HD 56126 (top) and αPer
(bottom). The y-axis gives the residual intensities, the continuum level is set equal to 100.
of the central part of the Hα profile. Earlier, Oudmaijer and Bakker [15] performed spectral monitoring of
HD56126 and also found the Hα to be highly variable over a two-months time scale. Hα-line variability can
be naturally explained in the case of post-AGB stars with signs of binarity (e.g., in the case of HR 4049 [16]),
however, it also shows up in post-AGB objects, which exhibit no regular radial-velocity or light variations
(the case of HD133656 [17]). Photometric variability would allow us (like in the case of RVTau type stars)
to invoke the mechanism where a shock wave stimulates mass outflow. Based on an extensive set of good
quality spectroscopic observations of HD56126, Barthès et al. [18] found that not only the profile of Hα but
4 Klochkova et al.: Spectroscopy of HD56126
6562 6564 6566
12.01.93
10.03.93
04.03.99
21.02.03
12.04.03
10.01.04
09.03.04
Wavelength, А
Figure 2. Variations of the Hα-line profile in the spectra of HD 56126 taken on different days. The y-axis
shows the residual intensities, the continuum level of the lower spectrum is set to 100 and every next spectrum
is shifted by 100 gradations with respect to the previous spectrum.
also that of Hβ to be variable. The above authors analyzed the variations of the profiles of both these lines
and concluded that no periodic component is present that could be associated with the radial-velocity and
photometric variations of the star.
The profiles of strong Feii lines (first and foremost, those of the members of the 42-nd multiplet), Baii,
and other elements in the spectrum of HD56126 are also variable. However, whereas either the blue or
red wing may have lower slopes in the absorption core of Hα, nonhydrogen absorptions preserve their
Klochkova et al.: Spectroscopy of HD56126 5
5160 5170 5180 5190
Figure 3. Same as Fig. 1, but for the spectral region containing the 5165 Å band of the Swan system of C2
molecule and the Feii(42) 5169 Å line.
asymmetry pattern unchanged: the blue wing is always more extended than the red wing. The profile of the
Feii (42)λ 5169 Å line in Fig.3 is a typical example.
Figures 3 and 4 show fragments of the spectra that may illustrate the differences between line intensities
of HD 56126 and αPer. Feii absorptions in the spectrum of the former star are much weaker than in those
of the latter star, and the ratio of the central depths of the same line in the spectra of the two stars depends
on the line intensity: it increases from 1.5 to 4 as one passes from weak to strong lines. Fei lines are also
depressed, on the average, by 0.1. On the contrary, Ci absorptions as well as those of Yii, Zrii, and other
s-process products are deeper by 0.1–0.2 in the spectra of HD 56126 compared to the corresponding features
in the spectrum of αPer.
Let us now analyze the molecular component of the spectrum of HD56126. Bakker et al. [19] were the
first to identify the Swan absorption bands of C2 molecule and of the red system of CN molecule in the
spectrum of the star. Later, Bakker et al. [20] used high-resolution spectra with R=50000 to perform a
detailed analysis of molecular bands in the spectra of HD 56126 and 16 other PPNe selected based on the
presence of carbon molecules C2, CN, CH
+ in their shells. Judging by the velocity corresponding to the
position of these bands, the molecular spectrum forms in a limited region of the shell close to the star [20].
Our spectra exhibit several bands of the Swan system (see Fig. 3).
Here it is pertinent to recall that the spectra of several PPNe were found to exhibit emission Swan bands
[19, 6]. However, the spectra of HD 56126 taken in different years show no signs of emission in these bands. D
lines of Nai neither show any signs of emission. This fact is consistent with the rather simple elliptical shape
of the nebula surrounding HD56126. Emissions in the Swan bands or in Nai D lines appear to show up only
in the spectra of PPNe with bright circumstellar nebulae with well-defined asymmetry. The spectroscopy of
the following PPNe corroborates this hypothesis: IRAS 04296+3429 [21], IRAS 23304+6147 [22], AFGL 2688
[23], IRAS 08005−2356 [24], IRAS 20056+1834 [25], and IRAS 20508+2011 [26]. On HST images [3], the
nebulae surrounding these PPNe are usually asymmetric and have a bipolar structure. Note also that most
of the objects listed above belong to type “1” according to the classification of Trammell et al. [27] — i.e.,
they are PPNe with polarized optical radiation.
Figure 5 shows the behavior of the D2 Nai line profile in the spectrum of HD56126, and here, like in
Table 1, to reveal the fine structure of lines, we analyze only the spectra taken with the highest resolution. The
positions of the three short-wavelength components remain constant within the errors. This stability confirms
that the components form in the interstellar medium. The wavelength shift of the deepest component agrees
6 Klochkova et al.: Spectroscopy of HD56126
5840 5850 5860 5870
Figure 4. Same as Fig. 1, but for the spectral region with the Baii λ 5853 Å line.
with that of the Swan bands (columns 8 and 9 in Table 1), and this fact indicates that the component in
question forms in the circumstellar shell. Finally, the longest-wavelength component forms in the photosphere:
its temporal behavior agrees with that of other photospheric absorptions (columns 4 and 8 in Table 1).
4. Radial velocities pattern
Much attention has been paid to the radial-velocity variations of HD 56126 and to study the differences
between the radial velocities inferred from lines of different types. Bujarrabal et al. [28] used CO millimeter-
wave observations to find Vr = 86.1 km/s, which is natural to adopt as the systemic radial velocity of
HD56126. Based on an extensive collection of spectrograms with high temporal resolution and high S/N
ratio, Oudmaijer and Bakker [15 analyzed the behavior of Vr and concluded that it is variable on a time
scale of several months with a small amplitude (Vr=84÷87± 2 km/s). The above authors demonstrated the
absence of variations on time scales ranging from several minutes to several hours. The variability of the radial
velocity of HD 56126 also showed up when the radial-velocity measurements made with the 6-m telescope were
compared to published data [5]. Lèbre et al. [29] performed a detailed spectroscopic monitoring of HD 56126.
Fourier analysis of the available radial-velocity and photometric data led the above researchers to conclude
that the dynamical state of the atmosphere of HD 56126 is similar to that of the atmospheres of RVTau-type
variables. The above authors interpreted the variability of Hα in terms of shock propagation. Later, Lèbre
et al. [30] analyzed the variability of two lines, Hα and Hβ. They obtained additional spectroscopic data and
determined the period of radial pulsations to be P = 36.8 days.
Barthès et al. [18] analyzed all the available reliable radial-velocity measurements made for HD56126 (89
measurements over eight years) and concluded that radial velocity of this star varies with a half-amplitude
of 2.7 km/s and the main period of P = 36.8 ±0.2d. The period of photometric variability is the same and
photometric amplitude is very small, 0.m02. However, the above authors found the variability pattern of
the star to differ significantly from pulsations observed in RVTau-type stars. Judging by its temperature
[5], HD 56126 evolved further beyond the stage of RVTau-type stars. The photometric and radial-velocity
variations of HD 56126 may be due to first-overtone radial pulsations driven by shocks that generate complex
asynchronous motions in the upper hydrogen layers of the star.
Table 1 presents the radial-velocity data we obtained for HD 56126. Given that a velocity gradient is very
likely in the upper layers of the star’s atmosphere, we report here the Vr values for individual lines and
groups of lines. As is evident from Table 1, velocity variations inferred from weak absorptions (their residual
Klochkova et al.: Spectroscopy of HD56126 7
5888 5889 5890 5891
20.11.02
21.02.03
14.11.03
12.04.03
10.01.04
09.03.04
Wavelength, А
Figure 5. Spectral fragment with the D2 Nai line on different observing dates.
intensities approach 1) are within the variability limits established by Barthès et al. [18]. The positions of
the circumstellar Nai D lines and Swan bands of the C2 molecule agree well with each other and match the
expansion velocity of the shell, Vexp ≈11 km/s. Note that the position of the wind component of the Hα line
is inconsistent with those of the wind components of other lines, whereas the variations of the Hβ profile is
synchronized with those of Feii(42) lines.
When intercomparing our data and comparing it with those of other authors one must control the zero
points of the corresponding radial-velocity systems. We used interstellar and circumstellar lines to control
the radial-velocity zero points. Three blueshifted interstellar components of Nai lines in the spectrum of
HD56126, which are barely visible in Fig. 5, yield Vr values listed in Column 10–12 of Table 1. The fourth
weak component with Vr ≈46km/s is barely visible in at least three our spectra. For each component, all our
Vr estimates agree with each other and with those of Bakker et al. [19] within the quoted errors. Furthermore,
as is evident from Fig. 5, the blend consisting of the three main components has sharp boundaries, which
allow the velocity of this entire feature to be confidently measured. Its mean velocity as inferred from our
data is equal to Vr = 20.3 ± 0.3 km/s and agrees with the velocity of 20±2km/s measured by Lèbre et
al. [29] from lower-resolution spectra. Crawford and Barlow [4] showed that when observed with superhigh
resolution, the circumstellar C2 and Ki features exhibit components that are about 1 km/s apart. These
components yield the same set of velocity values, but have different intensities. This effect may cause minor
systematic differences (also on the order of 1 km/s) between the velocities inferred from circumstellar atomic
and molecular lines in lower-resolution spectra. Our measurements reveal no variations of these velocities
with time, and their mean values, 77.2±0.5 and 75.4±0.3km/s for C2 and Nai, respectively, do not disagree
systematically with the radial velocities obtained by Lèbre et al. [29], Bakker et al. [19, 20], and Crawford
and Barlow [4]: 77.3÷77.6 and 75.3÷76.8km/s for C2 and Nai, Ki, respectively.
However, when comparing our Vr values with published data, one must take into account not only method-
ological effects, but also the spectroscopic peculiarity of the object itself. Line profiles in the spectrum of
HD56126 are asymmetric and their shape varies both with time and with line intensity. We plan to under-
take a detailed analysis of the velocity field at different depths in the atmosphere and in the circumstellar
envelope of HD 56126 in a separate paper.
8 Klochkova et al.: Spectroscopy of HD56126
As for the possible binarity of HD56126, it has neither been confirmed or finally disproved. In this
connection, of certain interest is a comment by Barthès et al. [18] who pointed out a weak trend in the star’s
radial velocity over several years of observations. This trend maybe indicative of a second companion in the
system with an orbital period longer than 16 years. Our eight spectra, which we took on more recent dates,
fail to clarify the situation. It would be therefore important to follow the behavior of Vr over a several-years
period taking one to two spectra every month on a regular basis.
The radial-velocity variability of HD 56126 is not a unique phenomenon. Part of candidate PPNe also
demonstrate radial-velocity variations on a time scale of several hundred days, which may be indicative
of the binary nature of these objects. Evidence for orbital motion has been obtained for several optically
bright objects with IR excess. For example, authors of papers [31, 32] and [33] proved the binary nature,
determined the orbital elements, and proposed models for the high-latitude supergiants 89Her and HR4049,
respectively. Van Winckel et al. [34] showed that HR4049, HD44179, and HD52961 to be spectral binaries
with the orbital periods of about one to two years. The above authors concluded that all extremely metal-
deficient PPNe studied so far (HR4049, HD44179, HD52961, HD46703, and BD+39o4926) are binary stars.
The observed correlation between the binarity and the presence of a hot dust envelope indicates that binarity
promotes the formation of an envelope. Bakker et al. [16] use high-resolution spectra of HR4049 to analyze
the variations of complex emission–absorption profiles of Na D lines and Hα lines over the orbital period.
Individual components of these lines may form under different conditions: in the atmosphere of the main
star; in the disk where both components of the binary are immersed, or in the interstellar medium. For such
binaries, of fundamental importance is the determination of the systemic velocity from radio spectroscopy.
The nature of the companions of suspected binary post-AGB stars remains unclear, because we see no
direct manifestations of these companions either in the continuum or in spectral lines (all known binary
post-AGB objects belong to type SB1). These companions may be either very hot object, or main-sequence
stars of very low luminosity. For example, Bakker et al. [16] believe that the secondary companion in HR 4049
is a cold (Te = 3500K) MS star with a mass of M=0.56M⊙, although it may also be a white dwarf, like in
the case of Ba-stars [35].
Unfortunately, because of the short history of PPN studies we are so far unable to make any definitive
conclusions concerning the cause of radial-velocity variability of a representative sample of these objects.
Moreover, the observed pattern of radial-velocity variations is often complicated by differential motions in
the extended atmospheres of these objects. A detailed analysis of radial velocities based on data taken with
high spectral and temporal resolution for selected — the brightest — PPNe reveals differences in the behavior
of radial velocities inferred from lines of different excitation, which form at different depths in the atmosphere
of the star. For example, Bakker et al. [36] analyzed the spectrum of the IRAS source identified with the
peculiar supergiant HD101584 and found eight categories of spectral lines with fundamentally different
temporal behavior of profiles, half-widths, and shifts (and hence of Vr values). In particular, the highest-
excitation absorption features, which form near the star’s photosphere, exhibit radial-velocity variations due
to the orbital motion in the binary system. At the same time, low-excitation lines with PCyg-type profiles
form in the stellar-wind region and are indicative of mass outflow. The systemic velocity has been confidently
determined from radio emissions of CO and OH molecules.
5. Spectral atlas
Table 2. Spectra used to create this atlas
αPer HD56126
∆λ, Date Spect- Date Spect-
Å rograph rograph
4010–5460 11.11.05 NES 12.11.05 NES
5460–6010 12.11.05 NES 9.03.04 NES
6010–6640 4.03.99 Lynx 9.03.04 NES
6640–8790 10.03.93 Lynx
Our comparative atlas of the spectra of HD56126 and αPer includes 94 plots representing 40 Å-long
spectral fragments. Some of them are shown in Figs. 1, 3, 4 and 6 to illustrate the differences between the
Klochkova et al.: Spectroscopy of HD56126 9
6310 6320 6330 6340 6350
Figure 6. Same as Fig. 1, but for spectral fragment containing the Siii λ 6347 Å line.
intensities and profiles of lines in the spectra of two stars of similar temperature and luminosity. The full
version of the atlas is available at http://www.sao.ru/hq/ssl/Atlas/Atlas.html.
In the wavelength interval 4010–6640ÅÅ the atlas gives the complete spectra of both objects. However,
in the more long-wavelength portion, up to 8790 Å, part of the spectrum has been lost in gaps between
echelle orders and the remaining portions are overcrowded by telluric lines. The atlas therefore gives only
the most informative fragments for this part of the spectrum of HD56126.
The spectrum of HD56126 is variable — line profiles, differential line shifts, and radial velocities vary
from date to date, and therefore we performed no averaging of any kind — different spectral intervals are
represented in the atlas by different spectra indicated in Table 2. For each wavelength interval, we selected
from among the available spectra the one with the highest resolution and signal-to-noise ratio.
We supplement graphic data in the atlas with tables. In Table 3 column 1 gives the results of identification
of spectral features; column 2, the laboratory wavelengths used to measure the radial velocities; columns
3 and 5, the central residual line intensities “r”, and columns 4 and 6, the heliocentric radial velocities Vr
measured from the line cores.
To identify atomic and molecular lines in the spectrum of HD56126, we use the atlases and tables of
solar spectrum [37, 38, 39, 40], the Moore tables for multiplets [41, 42], and electronic tables to the paper
by Bakker et al. [19]. We also use VALD database [43]. We supplement the standard identification criteria
(wavelength, relative line intensity, specific line profile) by two additional ones. One of these new criteria
uses the chemical composition anomalies of HD 56126 mentioned above and the spectrum of the comparison
star. The second new criterion can be applied only to sufficiently strong lines (r < 0.5), which in some of
our spectra exhibit a sharp variation of radial velocity with line depth. Several rather strong lines remained
unidentified in the spectrum of HD56126. Some of them can be seen in the fragments of the atlas presented
here: the λ 6550 Å line in Fig.1, the λ 5845 and 5852 Å lines in Fig. 4, and the λ 6347 Å line in Fig. 6.
Compared to the lines in the spectrum of αPer, those in the spectrum of HD56126 are less blended,
because they are narrower and, in addition, many of these lines are weaker due to low metallicity of the
star. However, by no means all absorptions can be used for reliable measurement of radial velocities. Table 3
lists about one and half thousand identifications for both stars and only 940 Vr measurements for HD 56126
obtained mostly from lines with minimal blending or from lines with the strongest difference of intensity in
the spectra of two stars.
http://www.sao.ru/hq/ssl/Atlas/Atlas.html
10 Klochkova et al.: Spectroscopy of HD56126
Conclusions
We use numerous high-resolution observations (R=25000 and 60000) made with the echelle spectrographs
of the 6-m telescope to perform a detailed analysis of the optical spectrum of the post-AGB star HD56126
identified with the IR source IRAS 07134+1005. We identified numerous absorptions of neutral atoms and
ions in the wavelength interval from 4010 to 8790 ÅÅ and measured their depths and the corresponding
radial velocities. We identified absorption bands of the C2, CN, and CH molecules, and interstellar bands
(DIB). In addition to the known variability of the profile of the Hα line, we found variations of the profiles
of a number of Feii and Baii lines. We produced an atlas of the spectra of HD56126 and its comparison star
αPer.
An analysis of our radial velocities determined from all spectra of our collection leads us to conclude
that:
– the accuracy of our radial-velocity data for HD 56126 allows them to be combined with the most accurate
of earlier published measurements;
– we found the behavior of Vr values to differ for lines of different excitation degree, which form at different
depth in the stellar atmosphere. The half-amplitude of the variations of radial velocities measured from
weak absorptions (r→ 1) is equal to 2–3 km/s;
– we confirm the stability of the expansion velocity of the circumstellar envelope of HD56126 as measured
from C2 and Nai lines;
– we reveal the complex and variable shape of the profiles of strong lines (not only hydrogen lines, but
also absorption features of Feii, Yii, Baii, and other elements), which form in the expanding atmosphere
(wind base) of the star. To study the kinematic state of the atmosphere, one needs measurements of
radial velocities for individual details of these profiles;
– we demonstrate the necessity of high and even superhigh spectral resolution for studying stellar and
circumstellar lines, respectively, in the spectrum of HD56126.
Acknowledgments
This work is supported by the Russian Foundation for Basic Research (project code 05–07–90087), the
Presidium of the Russian Academy of Sciences (program “Origin and Evolution of Stars and Galaxies“,
the Branch of Physical Sciences of the Russian Academy of Sciences (program “Extended Objects in the
Universe”). This publication is based on work supported by Award No. RUP1–2687–NA–05 of the U.S.
Civilian Research and Development Foundation (CRDF).
This work makes use of the SIMBAD, NASA ADS, and VALD astronomical databases.
Klochkova et al.: Spectroscopy of HD56126 11
References
1. T. Blöcker, Astrophys. J. 299, 755 (1995).
2. S. Kwok, Annu. Rev. Astron. & Astrophys. 31, 63 (1993).
3. T. Ueta, M. Meixner, and M. Bobrowsky, Astrophys. J. 528, 861 (2000).
4. I.A. Crawford and M.J. Barlow, MNRAS 311, 370 (2000).
5. V.G. Klochkova, MNRAS 272, 710 (1995).
6. V.G. Klochkova, Bull. Spec. Astrophys. Observ. 44, 5 (1998).
7. L. Decin, H. van Winckel, C. Waelkens, and E. Bakker, Astron. & Astrophys. 332, 928 (1998).
8. V.E. Panchuk, V.G. Klochkova, I.D. Najdenov, Preprint of the Special Astrophysical Observatory of the Russian
Academy of Sciences No. 135 (1999).
9. V.E. Panchuk, N.E. PIskunov, V.G. Klochkova, et al., Preprint of the Special Astrophysical Observatory of the
Russian Academy of Sciences No. 169 (2002).
10. V.G. Klochkova, S.V. Ermakov, V.E. Panchuk, et al., Preprint of the Special Astrophysical Observatory of the
Russian Academy of Sciences No. 137 (1999).
11. V.E. Panchuk, V.G. Klochkova, I.D. Najdenov, et al., Preprint of the Special Astrophysical Observatory of the
Russian Academy of Sciences No. 139 (1999).
12. V.E. Panchuk, M.V. Yushkin, I.D. Najdenov, Preprint of the Special Astrophysical Observatory of the Russian
Academy of Sciences No. 179 (2003).
13. M.V. Yushkin, V.G. Klochkova, Preprint of the Special Astrophysical Observatory of the Russian Academy of
Sciences, No. 206 (2005).
14. G.A. Galazutdinov, Preprint of the Special Astrophysical Observatory of the Russian Academy of Sciences No. 192
(1992).
15. R. Oudmaijer and E.J. Bakker, MNRAS 271, 615 (1994).
16. E.J. Bakker, D.L. Lambert, H. van Winckel, et al., Astron. & Astrophys. 336, 263 (1998).
17. H. van Winckel, R. Oudmaijer, and N.R. Trams., Astron. & Astrophys. 312, 553 (1996).
18. D. Barthès, A. Lèbre, D. Gillet, and N. Mauron, Astron. & Astrophys. 359, 168 (2000).
19. E.J. Bakker, L.B.F.M. Waters, H.J.G.L.M. Lamers, et.al., Astron. & Astrophys. 310, 893 (1996).
20. E.J. Bakker, E.F. Dishoeck, L.B.F.M. Waters, and T. Schoenmaker, Astron. & Astrophys. 323, 469 (1997).
21. V.G. Klochkova, R. Szczerba, V.E. Panchuk, and K. Volk, Astron. & Astrophys. 345, 905 (1998).
22. V.G. Klochkova, R. Szczerba, V.E. Panchuk, Astron. Lett. 26, 88 (2000).
23. V.G. Klochkova, R. Szczerba, V.E. Panchuk, Astron. Lett. 26, 439 (2000)
24. V.G. Klochkova, E.L. Chentsov, Astron. Rep. 48, 301 (2004)
25. N. Kameswara Rao, A. Goswami, and D.L. Lambert, MNRAS 334, 129 (2002).
26. V.G. Klochkova, V.E. Panchuk, N.S. Tavolganskaya, G. Zhao, Astron. Rep. 83, 265 (2006).
27. S. Trammell, H.L. Dinerstein, and R.W. Goodrich, Astrophys. J. 108, 984 (1994).
28. V. Bujarrabal, J. Alcolea, and P. Planesas, Astron. & Astrophys. 257, 701 (1992).
29. A. Lèbre, N. Mauron, D. Gillet, and D. Barthes, Astron. & Astrophys. 310, 923 (1996).
30. A. Lèbre, A. B. Fokin, D. Barthes, D. Gillet, and N. Mauron, Astrophys. Space Sci. 265, 105 (2001).
31. A.A. Ferro, PASP 96, 641 (1984).
32. L.B.F.M. Waters, C. Waelkens, M. Mayor, and N.R. Trams, Astron. & Astrophys. 269, 242 (1993).
33. C. Waelkens, H.J.G.L.M. Lamers, L.B.F.M. Waters, et al., Astron. & Astrophys. 242, 433 (1991).
34. H. van Winckel, C. Waelkens, and L.B.F.M. Waters, Astron. & Astrophys. 293, L25 (1995).
35. R.D. McClure, MNRAS 96, 117 (1984).
36. E. J. Bakker, H.J.G.L.M. Lamers, L.B.F.M. Waters, et al., Astron. & Astrophys. 307, 869 (1996).
37. R.L. Kurucz, I. Furenlid, and J.T.L. Brault, Nat. Solar Observ. Atlas, New Mexico: National Solar Observatory
(1984).
38. L. Wallace, K. Hinkle, and W. Livingston, Nat. Solar Obs. Techn. Rep. No.00–001, Tucson (2000).
39. A.K. Pierce and J.B. Breckinridge, Contr. Kitt Peak Nat. Obs. No. 559 (1973).
40. J.W. Swensson, W.S. Benedict, L. Delbouille, and G. Roland, “The solar spectrum from λ 7498 to λ 12016. A
table of measures and identifications”, Mem. Soc. Roy. Sci. Liege, Vol.hors ser. No.5 (1970).
41. C.E. Moore, Contrib. Princeton University Observ. No.20, part I (1945).
42. C.E. Moore, Contrib. Princeton University Observ. No.20, part II (1945).
43. N.E. Piskunov, F. Kupka, and T.A. Ryabchikova, A&AS 112, 525 (1995).
12 Klochkova et al.: Spectroscopy of HD56126
Table 3. List of lines identified in the spectra of HD 56126 and αPer. Columns 3 and 5 list the central
residual intensities of the lines (the continuum level is set to 1), and columns 4 and 6, the heliocentric
velocities Vr.
αPer HD56126
Element λ Å r Vr r Vr
TiII(11) 4012.39 0.13: −2.2 0.05 81.7
FeI(557) 4013.64
FeI(485) 4013.82 0.53:
ScII(8) 4014.53 0.32: −3.5 0.49
CeII(157) 4014.90 0.56
NiII(12) 4015.47 0.50: 0.64 84.8:
CeII(256) 4015.88 0.74: 83.0:
FeI(560) 4016.42 0.74: −1.0
FeI(279) 4017.10
FeI(527) 4017.15 0.50: −1.1 0.85:
NiI(171) 4017.47 0.7: 0.9:
MnI(5) 4018.10 0.47:
FeI(560) 4018.27
ZrII(54) 4018.38 0.53 82.5
NdII(19) 4018.83 0.89:
VII(201) 4019.05 0.86: −3.0:
FeI(556) 4020.07
NdII(19) 4020.87 0.80 82.4:
CoI(16) 4020.90 0.86: −2.3
NdII(36) 4021.33 0.79 82.0:
FeI(120) 4021.61
FeI(278) 4021.87 0.45: −3.9
FeI(654) 4022.74
NdII 4023.01 0.75: 81.2:
VII(32) 4023.38 0.47: −2.5: 0.56 82.5
FeI(277) 4024.10
ZrII(54) 4024.45 0.33: 85.1:
TiI(12) 4024.58 0.37: −1.5:
TiII(11) 4025.13 0.36: −2.3 0.42: 83.5:
TiII(87) 4028.34 0.29: −1.7 0.28: 84.8:
FeI(556) 4029.63 0.43: −0.6:
ZrII(41) 4029.68 0.28 83.4:
NdII(32) 4030.47 0.60:
FeI(560) 4030.50
MnI(2) 4030.76 0.18: 0.63 81.0:
CeII(108) 4031.34 0.52
FeII(151) 4031.44
LaII(40) 4031.68 0.42 84.6:
MnI 4031.78 0.46: −2.0
FeI(44) 4032.63 0.52:
FeII(126) 4032.95 0.63
MnI(2) 4033.06 0.26: −3.3:
ZrII(42) 4034.08 0.52: 83.7
MnI(2) 4034.48 0.39: −2.6:
Klochkova et al.: Spectroscopy of HD56126 13
αPer HD56126
Element λ Å r Vr r Vr
ZrII(70) 4034.84 0.85: 83.1:
VII(32) 4035.60 0.35: 0.61 85.1
MnI(5) 4035.72
VII(9) 4036.76 0.68: −2.0 0.81 82.2:
GdII(49) 4037.39 0.90: 0.83 84.1:
CeII(218) 4037.67 0.81: 82.7:
GdII(49) 4037.90 0.85: 0.81:
NdII(31) 4038.12 0.83:
VII(32) 4039.56 0.82: −2.0 0.89:
FeI 4040.09 0.76:
ZrII(54) 4040.24 0.58 83.5
FeI(655) 4040.64 0.52:
CeII(138) 4040.76 0.52 82.6:
NdII(30) 4040.80
MnI(5) 4041.35 0.48: −2.5 0.84:
CeII(140) 4042.58 0.60:
SmII(4) 4042.72 0.59
SmII(9) 4042.90 0.53 83.1:
FeI(276) 4043.90 0.56
FeII(172) 4044.01 0.89 81.2:
FeI(359) 4044.61 0.60 −3.3:
GdII(49) 4045.15 0.72:
ZrII(30) 4045.63
FeI(43) 4045.81 0.12 −4.6 0.22:
FeI(557) 4046.06
VII(177) 4046.27
CeII(81) 4046.34 0.72 80.5
ZrII(43) 4048.68 0.29 84.2
MnI(5) 4048.75 0.39 −4.3
ZrII(43) 4050.32 0.66 0.34 82.6:
VII(32) 4051.04
NdII(66) 4051.15 0.69 0.75
FeI(700) 4051.31
CrII(19) 4051.97 0.54 0.64:
TiII(87) 4053.82 0.33 0.39 83.2
FeI(698) 4054.82 0.51
CeII(82) 4054.99 0.70 81.2:
MnI(5) 4055.54 0.68 −3.1
TiII(11) 4056.19 0.60 −0.6 0.79 84.7
FeII(212) 4057.46 0.78 85:
MgI(16) 4057.50 0.48 −4.9
CoI(16) 4058.22 0.75 −4.2: 0.89
FeI(120) 4058.76 0.66 0.91
MnI(5) 4058.92
FeI(767) 4059.79 0.82 −3.3:
NdII(63) 4059.96 0.88
NdII(10) 4061.08 0.75 −2.4 0.66 81.0:
FeII(189) 4061.79 0.84:
CeII(34) 4062.23 0.79 82.3
14 Klochkova et al.: Spectroscopy of HD56126
αPer HD56126
Element λ Å r Vr r Vr
FeI(359) 4062.44 0.56 −1.0
FeI(43) 4063.59 0.19 −4.0: 0.40 82.9
TiII(106) 4064.37 0.67 −1.5: 0.82 81.1:
VII(215) 4065.09 0.74 0.78
FeI(698) 4065.38
NiII(11) 4067.03 0.38 0.59 83.8
CeII(22) 4067.28 0.64 86:
FeI(559) 4067.98 0.54 −3.1 0.87 83.0
CeII(82) 4068.84 0.75 81.8
FeI(558) 4070.78 0.60
ZrII(54) 4071.09 0.59 82.3:
FeI(43) 4071.74 0.23 −3.8 0.46 82.8
CrII(26) 4072.56 0.68 −3.1 0.77 85.5:
CeII(109) 4072.92 0.78 83.8:
CeII(4) 4073.48 0.61 83.4:
FeI(558) 4073.76 0.64
GdII(44) 4073.76 0.70:
FeI(524) 4074.79 0.61 −2.8 0.95:
NdII(62) 4075.12 0.85:
CeII(57) 4075.70
SmII(51) 4075.85 0.53 −2.3: 0.51
FeI(558) 4076.63
LaII(11) 4076.71 0.37 −3.3: 0.74:
ZrII(54) 4077.05 0.53:
CrII(19) 4077.50
SrII(1) 4077.71 0.09 −2.8 0.10
DyII 4077.96
CeII(19) 4078.32 0.67:
FeI(217) 4078.35 0.56: −0.7:
MnI(5) 4079.3: 0.58 0.95:
FeI(359) 4079.84 0.69 −2.1:
FeI(558) 4080.21 0.74
CeII(44) 4080.44 0.72 82.4:
CrII(165) 4081.21 0.80: 0.74 82.4:
CrII(165) 4082.29 0.77
CeII(60) 4083.23 0.61 82.2
MnI(5) 4083.63 0.52 −1.8:
FeI(698) 4084.49 0.62 −1.9: 0.88
CeII(172) 4085.23 0.51 0.70 85.6
FeI(559) 4085.30
VII(214) 4085.67 0.61
ZrII(54) 4085.68 0.61 84.8:
CrII(26) 4086.13 0.70 −3.3: 0.85 84.0:
LaII(10) 4086.71 0.62 −2.1: 0.47 83.0
FeII(28) 4087.28 0.65 0.72 84.6:
FeII(39) 4088.75 0.74 −2.9:
CrII(19) 4088.90 0.76
ZrII(29) 4090.52 0.70 −3.7 0.43 82.4
CoI(29) 4092.39 0.62:
Klochkova et al.: Spectroscopy of HD56126 15
αPer HD56126
Element λ Å r Vr r Vr
HfII(6) 4093.16 0.84: −3.2: 0.65: 82.5
CeII(160) 4093.96 0.83: 82.0:
CaI(25) 4094.93 −1.1:
FeI(217) 4095.98
ZrII(15) 4096.63 0.50 83.6
FeI(558) 4097.08 −3.2:
Hδ 4101.74 0.08 −2.0 0.06 96.0:
SiI(2) 4102.94 −2.9:
DyII 4103.31 81.2:
FeI(356) 4104.12 −1.6:
CeII(156) 4105.00 0.67:
CeII(160) 4106.13 0.79:
FeI(217) 4106.26 0.64:
CeII(139) 4106.88 0.78:
SmII(50) 4107.39 0.63: 85.1:
FeI(354) 4107.49 0.47 −3.4
CaI(39) 4108.53 0.76:
FeI(558) 4109.06 0.67:
NdII(17) 4109.07 0.72: 83.3
NdII(10) 4109.46 0.61: 82.0
MgII(21) 4109.54
FeI(357) 4109.80 0.51:
ZrII(30) 4110.05 0.62: 82.4
CeII(29) 4110.39 0.74: 83.6:
CrII(18) 4110.99 0.52 −3.5 0.64 83.0:
CeII 4111.39 0.75:
FeII(188) 4111.90 0.75: 0.86 82.3
FeI(695) 4112.32 0.80:
CrII(18) 4112.55 0.91 82.5:
FeI(1103) 4112.96 0.62:
CrII(18) 4113.22 0.86 83.6
CeII(137) 4113.73 0.86 0.80 81.8:
FeI(357) 4114.45 0.69 −2.7 0.95: 82.5:
KII(2) 4114.99
CeII(22) 4115.37 0.79 81.6
CrII(181) 4116.66 0.86
CeII(35) 4117.01 0.79 82.4:
CeII(77) 4117.29 0.83: 82:
FeI(700) 4117.85
CeII(11) 4118.15 0.83: 0.69 81.7
FeI(801) 4118.54 0.38
SmII(51) 4118.55 0.72
CeII(89) 4119.01 0.83 80.7:
FeII(21) 4119.51 0.65: −1.6 0.77:
CeII(22) 4119.79 0.64 85:
FeI(423) 4120.21 0.72:
CeII(112) 4120.83 0.87 0.71 81.0
CoI(28) 4121.32 0.67: −1.6: 0.95:
FeI(356) 4121.81 0.70: −2.3 0.92
16 Klochkova et al.: Spectroscopy of HD56126
αPer HD56126
Element λ Å r Vr r Vr
FeII(28) 4122.66 0.37 −4.6 0.50 82.5
LaII(41) 4123.23 0.59 0.46 83.8:
FeI(217) 4123.75 0.62
CeII(60) 4123.87 0.62 81.5
FeII(22) 4124.78
YII(14) 4124.91 0.52 0.46
FeI(1103) 4125.62
FeI(354) 4125.88 0.71: 0.95:
FeI(695) 4126.18 0.69: −3.8: 0.97 85:
CeII(4) 4127.37 0.65 82.4
FeI(357) 4127.61
FeI(558) 4127.80 0.42: 0.68:
SiII(8) 4128.07 0.41: −1.5: 0.50 84.2
FeII(27) 4128.74 0.49 0.63 82.7
CeII(227) 4129.18 0.61: 0.75
EuII(1) 4129.72 0.60 0.85 81.1:
BaII(4) 4130.65 0.44
CeII(209) 4130.71 0.56
FeI(43) 4132.06 0.26 −1.5 0.58 83.4:
FeI(357) 4132.90 0.60 −4.5:
CeII(4) 4133.80 0.61 −2.0 0.58 82.0
FeI(357) 4134.68 0.46 0.83 83.6
NdII 4135.33 0.83 0.73
CeII(188) 4135.44
FeI(726) 4137.00 0.61 −3.0
CeII(2) 4137.65 0.67 0.57: 80.7:
FeII(150) 4138.21 0.78:
FeII(39) 4138.40 0.70 0.73:
FeI(18) 4139.93 0.82 −4.1
FeI(695) 4140.41 0.86 −2.1
LaII(40) 4141.73 0.72 81.2
HfII(87) 4141.84 0.77:
CeII(10) 4142.40 0.68 0.69 81.8
VII(226) 4142.90 0.79
FeI(523) 4143.42 0.41: 0.78 82.7
FeI(43) 4143.87 0.29 −2.7 0.59 82.1
CeII(3) 4144.49 0.86 −1.8: 0.76 83.6
CeII(9) 4145.00 0.80 0.70 81.7
CrII(162) 4145.76 0.82
CeII(203) 4146.23 0.68 82.2
FeI(42) 4147.67 0.56 −3.7: 0.93 84.0:
ZrII(41) 4149.20 0.37 0.37
FeI(694) 4149.36
CeII 4149.94 0.72 0.61 81.0:
FeI(695) 4150.25 0.76
ZrII(42) 4150.97 0.67 −3.2 0.43 82.8
CeII(2) 4151.97 0.55 82.1
FeI(18) 4152.17 0.45
FeI(695) 4153.90 0.49 −0.2: 0.85 82.3
Klochkova et al.: Spectroscopy of HD56126 17
αPer HD56126
Element λ Å r Vr r Vr
FeI(355) 4154.50 0.43 0.88 83.2
FeI(694) 4154.81 0.88 83.5
ZrII(29) 4156.25 0.43 −4.2: 0.38 80.9:
FeI(354) 4156.80 0.46
FeI(695) 4157.78 0.58 −3.2
FeI(695) 4158.80
HfII(41) 4158.90 0.64
CeII(246) 4159.03 0.80 80.6:
ZrII(42) 4161.21 0.35 84.2:
TiII(21) 4161.52 0.35 0.41 81.2:
SrII(3) 4161.80 0.49
TiII(105) 4163.64 0.33 −2.4 0.38 82.6
CeII(10) 4165.59 0.72 0.64 83.7
BaII(4) 4166.00 0.75 82.0
MgI(15) 4167.27 0.53 −2.1 0.79 83.2
CeII(29) 4167.80 0.74 0.85 82.7
CeII(173) 4169.88 0.76 0.71
FeI(482) 4170.91 0.52
TiII(105) 4171.90 0.41 84.3
FeI(650) 4171.91 0.28
FeI(689) 4172.64 0.85 81.8:
FeI(19) 4172.75 0.46
FeII(27) 4173.46 0.22 −1.2 0.34 84.4
TiII(105) 4174.07 0.49 −2.2 0.74 84.2:
FeI(19) 4174.91 0.63 −2.9
FeI(354) 4175.64 0.53 −2.0 0.78 83.2
FeI(689) 4176.57 0.60 −0.6 0.82
FeI(18) 4177.59 0.22
FeII(21) 4177.68 0.29: 83.0:
FeII(28) 4178.85 0.31 −1.5 0.38 84.5
CrII(26) 4179.43 0.49 0.60 83.9:
ZrII(99) 4179.81 0.49 82.4
FeI(354) 4181.75 0.38 −1.0: 0.78 83.4
FeI(476) 4182.39 0.71: −2.1
VII(37) 4183.45 0.60 0.82 81.1:
TiII(21) 4184.31 0.49 0.62 82.0:
FeI(355) 4184.89 0.59 −1.8 0.87 85.1:
CeII(124) 4185.33 0.93: 0.84 82.0
CeII(1) 4186.61 0.43 85.8:
FeI(152) 4187.04 0.41 −2.4 0.71 82.8:
FeII(152) 4187.80 0.36 −4.5 0.69 82.9
FeI(1116) 4188.73 0.68 −1.4 0.96 83.6
PrII(8) 4189.52 0.90: 83.1
FeI(940) 4189.56 0.84
TiII(21) 4190.29 0.67 −2.4 0.89 82.3
VII(25) 4190.40
CeII(169) 4190.63 0.87 80.8:
GdII(34) 4191.07 83.2:
FeI(152) 4191.43 0.38
18 Klochkova et al.: Spectroscopy of HD56126
αPer HD56126
Element λ Å r Vr r Vr
ZrII(108) 4191.50 0.57
LaII(78) 4192.35 0.87:
CeII(79) 4193.10 0.87 0.72 84.0:
CeII(85) 4193.87 0.89 0.84 81.5
FeI(693) 4195.33 0.50 0.87 83.7:
CrII(161) 4195.41
FeI(693) 4196.21 0.58
LaII(41) 4196.55 0.60:
CeII(136) 4197.67 0.81
CeII(209) 4198.00 0.79
FeI(152) 4198.30 0.30 0.68
CeII(207) 4198.43
CeII(7) 4198.67 0.64 81.1:
FeI(522) 4199.09 0.40
FeII(141) 4199.09
NdII(15) 4199.10
YII(5) 4199.27 0.60
FeI(3) 4199.99 0.85
FeI(689) 4200.93 0.73 −2.6:
FeI(42) 4202.03 0.28 0.58 82.9
VII(25) 4202.34 0.80 82.6
CeII(186) 4202.94 0.78 0.71 82.6
LaII(53) 4204.03 0.52 −2.5: 0.80 81.9:
YII(1) 4204.75 0.50 80.8
VII(37) 4205.07 0.41 −3.0 0.65
MnII(2) 4205.39 0.51 0.79:
ZrII(133) 4205.91 0.84 81.1
HfII(74) 4206.59 0.85 83.7:
FeI(3) 4206.70 0.67 −1.2
MnII(2) 4207.23 0.66 −3.2:
CrII(26) 4207.35 0.92 83.1
FeI(689) 4208.61 0.66:
ZrII(41) 4208.98 0.56 −2.3
CrII(162) 4209.02 0.42 82.6:
CeII(3) 4209.41
VII(25) 4209.74 0.73 0.88 83.5:
FeI(152) 4210.35 0.47 −1.0
ZrII(97) 4210.62 0.64 82.3
ZrII(15) 4211.89 0.57 −2.6 0.39 83.5
CeII(169) 4213.04 0.88
FeI(355) 4213.65 0.71 −2.5 0.95:
CeII(203) 4214.03 0.84 82.5
FeI(274) 4215.43
SrII(1) 4215.52 0.15 −0.6 0.16 96.9
FeI(3) 4216.18 0.51 −2.9
CdII(49) 4217.20 0.88 82.5:
FeI(693) 4217.55 0.61 −2.6 0.75 83.3
FeI(800) 4219.36 0.51 −2.1 0.85 83.7
CaII(16) 4220.13 0.88:
Klochkova et al.: Spectroscopy of HD56126 19
αPer HD56126
Element λ Å r Vr r Vr
NdII(32) 4220.26 0.68
FeI(482) 4220.35
FeII(152) 4222.21 0.48
ZrII(80) 4222.41
CeII(36) 4222.60 0.52:
FeI(689) 4224.17
ZrII(29) 4224.28 0.50: 0.66 81.0
FeI(689) 4224.52
CrII(162) 4224.85 0.83 83.6
VII(37) 4225.22
PrII(8) 4225.33 0.73:
FeI(693) 4225.45 0.44
FeI(521) 4225.95 0.74:
CaI(2) 4226.72 0.21 −3.6 0.43 80.7:
FeI(693) 4227.43 0.32 −4.7 0.59:
NdII(19) 4227.72 0.73: 82.6
NdII(36) 4228.20
CI 4228.32 0.86 0.77 81.9
SmII(4) 4229.70 0.76 0.92:
FeI(41) 4229.76
LaII(83) 4230.95 0.92 83.2:
NiI(136) 4231.03 0.91
ZrII(99) 4231.64 0.52 83.5
HfII(72) 4232.43 0.86: 0.78 82.2
FeII(27) 4233.17 0.23 0.34 86.5
FeI(152) 4233.60 0.40:
NdII(20) 4234.20 0.83 81.9
VII(24) 4234.22 0.83
MnI(23) 4235.14
MnI(23) 4235.29 0.73
VII(5) 4235.74 0.46
FeI(152) 4235.94 0.31
ZrII(110) 4236.54 0.84: 0.64 83.0
LaII(41) 4238.38 0.63 82.3
FeI(693) 4238.81 0.52 −2.6 0.79 81.6
FeI(18) 4239.85 0.51 −3.8 0.67
CeII(2) 4239.91
FeI(764) 4240.38
CaI(38) 4240.45 0.77
CrII(31) 4242.37 0.47 −1.2 0.53 83.3
NiII(9) 4244.80 0.84 0.88 81.0:
FeI(352) 4245.26 0.61 0.94
FeI(691) 4245.35
HfII(72) 4245.84 0.72
CeII(158) 4245.98
FeI(906) 4246.09 0.69
ScII(7) 4246.83 0.25 −2.1 0.39 88.7:
NdII(14) 4246.88
FeI(693) 4247.42 0.47 −2.7
20 Klochkova et al.: Spectroscopy of HD56126
αPer HD56126
Element λ Å r Vr r Vr
FeI(482) 4248.23 0.72
CeII(1) 4248.68 0.82 −2.8: 0.69 81.8
FeI(152) 4250.12 0.41 −1.7 0.69 82.6
FeI(42) 4250.79 0.33 −2.2 0.58 81.8
GdII(15) 4251.74 0.84 0.80 82.1
CrII(31) 4252.63 0.64 −3.0 0.74 82.4
CeII(77) 4253.36 0.81 82.5
CrI(1) 4254.34 0.35 −1.3 0.68 82.8
CeII(81) 4255.78 0.76 83.0
CeII(172) 4256.16 0.83:
NdII(59) 4256.24 0.81 −3.1 0.84:
CeII(123) 4257.12 0.92 81.0:
ZrII(15) 4258.05
FeII(28) 4258.15 0.34 −0.5 0.36
FeI(3) 4258.32
FeI(476) 4260.13
FeI(152) 4260.47 0.31 −1.5 0.57 83.2
CeII(19) 4261.16 0.95 81.2
CrII(31) 4261.92 0.49 −2.2 0.55 83.5
SmII(37) 4262.68 0.94 81.5:
CeII(254) 4263.43 0.78
FeI(692) 4264.21 0.84 −2.3
CeII(239) 4264.37 0.91 81.2
FeI(993) 4264.74 0.90
YII(71) 4264.88 0.78
ZrII(98) 4264.92
FeI(993) 4265.26 0.87
MnI(23) 4265.92 0.91 −2.5
ZrII(8) 4266.72 0.86 82.9
FeI(273) 4266.97 0.78 −3.8
FeI(482) 4267.83 0.73 −1.9 0.93 84:
CrII(192) 4268.93 0.75:
CI(16) 4269.02 0.68
CrII(31) 4269.29 0.65 0.69:
CeII(204) 4270.19 0.90 −2.8 0.77 81.5
CeII(21) 4270.72 0.82 81.0:
FeI(152) 4271.16 0.40 −2.7 0.70 82.0
FeI(42) 4271.76 0.25 −1.3 0.44 82.8
FeII(27) 4273.32 0.44 0.43:
ZrII(28) 4273.52
FeI(478) 4273.88 0.90:
CrI(1) 4274.79 0.37 −2.1 0.72 82.7
CrII(31) 4275.56 0.54 −1.7 0.57 83.6
FeI(597) 4276.68 0.87
ZrII(40) 4277.37 0.84 0.70 82.4
FeII(32) 4278.16 0.59 −1.6 0.71 82.8
VII(225) 4278.89 0.90 0.83 82.4:
SmII(27) 4279.68 0.92 82.7:
FeI(351) 4279.87 0.82
Klochkova et al.: Spectroscopy of HD56126 21
αPer HD56126
Element λ Å r Vr r Vr
CeII(225) 4280.14 0.88 82.2:
GdII(15) 4280.48 0.82 0.89 82.4
SmII(46) 4280.79 0.84:
SmII 4281.01
CrII(17) 4281.03 0.79
MnI(23) 4281.10
ZrII(182) 4282.21 0.58
FeI(71) 4282.40 0.41 −1.7
CaI(5) 4283.01 0.58 −2.3 0.92 81.7
CrII(31) 4284.20 0.60 −1.2 0.69 83.1
MnII(6) 4284.43 0.82: 83.0:
NdII(10) 4284.51
CeII(11) 4285.37 0.82 82.8
FeI(597) 4285.44 0.71 −2.3
TiI(44) 4286.01 0.84
FeI(414) 4286.47 0.84 −1.9
ZrII(69) 4286.51 0.63 82.6
LaII(75) 4286.97 0.76 −3.3 0.75 81.7
TiII(20) 4287.88 0.36 0.1 0.51 82.9
CeII(135) 4289.45 0.79 82.0:
CrI(1) 4289.72 0.33:
TiII(41) 4290.21 0.24 −2.0: 0.36 85.0:
TiI(44) 4290.94 0.79 −3.5:
FeI(3) 4291.46 0.76 −1.4
MnII(6) 4292.25 0.82
CeII(205) 4292.77 0.88 83.0:
ZrII(110) 4293.14 0.91 0.63 82.0
TiII(20) 4294.10 0.28 −2.2 0.37 83.9
FeI(41) 4294.12
ScII(15) 4294.78 0.55 −2.3 0.73 82.5
LaII(53) 4296.05 0.73 0.65 83.1:
FeII(28) 4296.57 0.35 −0.4 0.37
CeII(2) 4296.68
PrII(7) 4297.76 0.92 84.3:
FeII(520) 4298.04 0.78
FeI(152) 4299.23 0.31 0.60
CeII(47) 4299.36
TiII(41) 4300.05 0.24 −0.7 0.31 93.0:
TiI(44) 4301.09 0.74:
ZrII(109) 4301.81
TiII(41) 4301.92 0.31 −0.8 0.40 81.5
CaI(5) 4302.53 0.46 −3.0: 0.79 83.2
FeII(27) 4303.17 0.34 −1.7 0.41 84.2:
NdII(10) 4303.59
FeI(414) 4304.55 0.86 −3.4
FeI(476) 4305.45 0.49 83.5:
ScII(15) 4305.71 0.37
CeII(1) 4306.72 0.81 0.75 82.3
CaI(5) 4307.75
22 Klochkova et al.: Spectroscopy of HD56126
αPer HD56126
Element λ Å r Vr r Vr
TiII(41) 4307.89 0.21 −3.7: 0.40 82.4
FeI(42) 4307.90
ZrII(88) 4308.94 0.70 82.4
FeI(849) 4309.03 0.72:
YII(5) 4309.63 0.38 0.40 84.8:
CeII(126) 4309.74
CeII(133) 4310.70 0.91 82.1
ZrII(99) 4312.23 0.84 82.9
TiII(41) 4312.86 0.33 −2.3 0.41 83.5
ScII(15) 4314.09 0.22 0.36:
FeII(32) 4314.30
TiII(41) 4314.98 0.25 −1.1 0.43 83.8
GdII(43) 4316.05 0.97: 0.95 82.0:
TiII(94) 4316.80 0.58 −1.6 0.69 82.7
ZrII(40) 4317.32 0.82: 0.56 81.9
CaI(5) 4318.65 0.56 −0.1 0.90 83.4:
ScII(15) 4320.74 0.22 0.38
TiII(41) 4320.95
FeI 4321.79 0.87 −1.9
LaII(25) 4322.50 0.88 −0.9 0.75 82.7
ZrII(141) 4323.62 0.91 81.0
FeI(70) 4325.00
ScII(15) 4325.01 0.33 −2.6 0.46 81.9:
FeI(42) 4325.76 0.25 −2.4 0.43 81.4:
BaII(7) 4326.74 0.73 0.90
MnII(6) 4326.76
FeI(761) 4327.10
GdII(15) 4327.13 0.75 0.92 82.5
FeI(597) 4327.91 0.83 −1.6 0.89 83.0
SmII(15) 4329.03 0.89 −2.8 0.92
TiII(94) 4330.24 0.51 0.64
CeII(82) 4330.45
TiII(41) 4330.70 0.44 −3.5 0.59 81.8:
NiI(52) 4331.65 0.77 −2.7
VII(23) 4331.79 0.87
ZrII(132) 4333.28 0.59: 82.1
LaII(24) 4333.76 0.59 −0.7 0.53: 82.4
LaII(77) 4334.96 0.80: 81.7:
CaII(89) 4336.26 0.69:
FeI(41) 4337.05 0.36:
TiII(94) 4337.33
CeII(82) 4337.78 0.38:
TiII(20) 4337.92 0.23: −3.7: 0.33
NdII(68) 4338.70 0.49: 0.52 82.5:
Hγ 4340.47 0.09 −2.1 0.08 97.0:
TiII(32) 4341.36 0.30:
FeI(645) 4343.26 0.61:
TiII(20) 4344.29 0.30: 0.47: 82.1
CeII(251) 4345.96 0.80:
Klochkova et al.: Spectroscopy of HD56126 23
αPer HD56126
Element λ Å r Vr r Vr
FeI(598) 4346.56 0.74:
FeI(828) 4347.84 0.78
FeI(414) 4348.94 0.85: −1.8:
CeII(59) 4349.79 0.85 0.78 81.2
VII(36) 4349.97
TiII(94) 4350.84 0.52 0.71 83.2
FeII(27) 4351.77 0.23 −1.1 0.36: 85.6:
MgI(14) 4351.91
FeI(71) 4352.73 0.53 −2.5
CeII(220) 4352.73 0.76
FeII(213) 4354.36
LaII(58) 4354.40 0.70 83.3:
ScII(14) 4354.61 0.58 −5.7:
CaI(37) 4355.19 0.81 −3.3:
FeII 4357.57 0.90 83.8:
NdII(10) 4358.17 0.80 81.6
NdII(57) 4358.70
YII(5) 4358.72 0.58 −5.6 0.48 83.7
NiI(86) 4359.63
GdII(67) 4359.64 0.57
ZrII(79) 4359.74 0.43 82.8
FeII 4361.25 0.93 −1.7 0.92:
CeII(157) 4361.66 0.92:
SmII(45) 4362.04
NiII(9) 4362.09 0.80 −2.0 0.86 82.5
LaII(133) 4363.05 0.93 −1.9 0.95:
MoII(3) 4363.64 0.92: 81.5:
GdII(33) 4364.14 0.91: 81.9:
YII(70) 4364.17 0.96
CeII(135) 4364.66 0.88 −1.9 0.74 81.4
LaII(53) 4344.66
FeI(415) 4365.90 0.92
FeI(414) 4367.58
TiII(104) 4367.66 0.41 −0.6: 0.56 82.6
FeI(41) 4367.91
CeII(227) 4368.23 0.82: 0.79 83.0:
FeII 4368.26
NdII(11) 4368.63 0.93: 0.91: 82.3:
FeII(28) 4369.40 0.53 0.67 83.1
FeI(518) 4369.77 0.66: 0.90:
ZrII(79) 4370.96 0.73: 0.48 81.9
CI(14) 4371.37 0.69: 0.70 83.2:
FeII(33) 4372.22 0.92 0.92
FeI(214) 4373.57 0.83
CeII(202) 4373.82 0.83 81.5:
ScII(14) 4374.46 0.30 0.43
TiII(93) 4374.82 0.28 0.44: 81.6:
YII(13) 4374.94 0.29: 94.7:
NdII(8) 4375.04
24 Klochkova et al.: Spectroscopy of HD56126
αPer HD56126
Element λ Å r Vr r Vr
FeI(2) 4375.93 0.49 −2.1 0.78 81.1
FeI(471) 4376.78 0.91 −1.5
MoII(3) 4377.77 0.94: −3.1: 0.92 82.3:
LaII(77) 4378.10 0.93
VI(22) 4379.23 0.82 −1.2:
ZrII(88) 4379.78 0.70 −3.3
CeII(155) 4380.06 0.43 83.0
CdII(68) 4380.64 0.91 0.98:
CeII(2) 4382.17 0.87 −2.9 0.73: 81.2:
FeI(799) 4382.78 0.82 −1.7:
ZrII(97) 4383.10 0.85:
FeI(41) 4383.54 0.25 −2.2 0.40 82.6
FeII(32) 4384.32 0.50 0.61 83.9
ScII(14) 4384.81 0.46 0.74
FeII(27) 4385.38 0.36 −2.2 0.44 84.0
NdII(50) 4385.66
CeII(57) 4386.84
TiII(104) 4386.85 0.51 0.57 81.0:
FeI(476) 4387.90 0.79
CeII(5) 4388.01 0.87 81.0:
FeI(830) 4388.41 0.72 −2.2
ZrII(140) 4388.50 0.90 80.0:
FeI(2) 4389.25 0.90 −3.1
VI(22) 4389.99 0.87 −2.0:
MgII(10) 4390.56 0.76
FeI(414) 4390.96
TiII(61) 4391.02 0.48 0.67 82.2
CeII(81) 4391.66 0.76 0.66 82.1
TiII(51) 4394.04 0.44 −2.6 0.54 83.2
TiII(19) 4395.03 0.25 −1.5 0.36 87.7
TiII(61) 4395.84 0.48 −2.5 0.58 83.0
YII(5) 4398.02 0.49 0.44 84.1
TiII(61) 4398.29
CeII(81) 4398.79 0.89 80.4:
CeII(81) 4399.20 0.75 82.4
TiII(51) 4399.77 0.34 −2.3 0.44 83.1
ScII(14) 4400.40 0.36 0.50 83.3
NdII(10) 4400.83 0.80
FeII(828) 4401.29 0.55
ZrII(68) 4401.35 0.84 83.6:
ZrII(79) 4403.35 0.76 0.60 81.9
FeI(41) 4404.75 0.28 −1.6 0.49 82.5
VI(22) 4406.65 0.90 −3.3
GdII(103) 4406.67 0.94 81.9:
CeII(64) 4407.28 0.90 81.1:
FeI(68) 4407.70 0.56 −2.6 0.79 81.5
FeI(68) 4408.42 0.58 −2.2 0.92 83.0:
PrII(4) 4408.84 0.85 82.0
TiII(61) 4409.24 0.50 0.77:
Klochkova et al.: Spectroscopy of HD56126 25
αPer HD56126
Element λ Å r Vr r Vr
TiII(61) 4409.52 0.75:
NiI(88) 4410.52
CeII(33) 4410.64 0.79:
TiII(115) 4411.08 0.58 −2.8 0.62 82.3
TiII(61) 4411.93 0.64 −1.1 0.78 84.2:
NdII(9) 4412.27 0.93
FeII(32) 4413.59 0.69 0.3: 0.79 84.5:
PrII(26) 4413.77
ZrII(79) 4414.54 0.82: 0.55 82.5
FeI(41) 4415.12 0.29 −0.7: 0.57 83.4
ScII(14) 4415.56 0.38: 0.54 82.3
FeII(27) 4416.82 0.38 −2.0 0.43 83.4
TiII(40) 4417.72 0.34 −2.2 0.41 82.9
TiII(51) 4418.34 0.48 −2.1 0.59 82.2
CeII(2) 4418.78 0.85: 0.70 81.6
SmII(32) 4420.53 0.87
ScII(14) 4420.67 0.79
SmII(37) 4421.13 0.94 0.88 82.3
TiII(93) 4421.94 0.58 −1.9 0.68 82.9
FeI(350) 4422.57 0.55 −1.4
YII(5) 4422.59 0.46 82.7
FeI(412) 4423.14 0.76
TiII(61) 4423.22 0.92 82.9
CeII(21) 4423.68 0.89 81.2
FeI(830) 4423.84 0.88
SmII(45) 4424.34 0.82 −4.2: 0.83 81.4
CaI(4) 4425.44 0.62 −1.5 0.93 81.3
FeI(2) 4427.31 0.46 0.87: 81.7:
TiII(61) 4427.90 0.82 −2.2: 0.79 82.5
CeII(19) 4429.27 0.86 −3.1 0.73 83.0
LaII(38) 4429.92 0.68 0.60 81.3
FeI(68) 4430.62 0.63 −2.9 0.95 81.3:
ScII(14) 4431.37 0.75 −2.7 0.92 81.3
TiII(51) 4432.10 0.79 0.94 83.6
LaII(11) 4432.95 0.94
FeI(830) 4433.22 0.77 −2.0
GdII(82) 4433.64 0.93
SmII(41) 4433.89 0.76 0.84 84.5:
SmII(36) 4434.32 0.86 80.7
CaI(4) 4434.96 0.49 0.0 0.89 82.2
FeI(2) 4435.15
EuII(4) 4435.58 0.89 83.9:
CaI(4) 4435.68 0.57
GdII(117) 4436.23 0.89 0.95
FeI(516) 4436.93 0.88 −1.4
CeII(169) 4437.61 0.95: −1.0: 0.93 81.7
GdII(67) 4438.13 0.97:
GdII(44) 4438.27
FeI(828) 4438.35 0.91 −2.7
26 Klochkova et al.: Spectroscopy of HD56126
αPer HD56126
Element λ Å r Vr r Vr
FeII(32) 4439.16 0.95 −3.8 0.95 83.0:
FeI 4439.89 0.95
ZrII(79) 4440.45 0.81 −1.3: 0.57 83.4
CeII(238) 4440.88 0.89 0.89:
TiII(40) 4441.73 0.55 −2.0 0.69 82.8
FeI(68) 4442.34 0.52 −0.9
ZrII(53) 4442.50 0.72 80.8:
ZrII(88) 4443.00 0.53 0.43 83.1
FeI(350) 4443.20
TiII(19) 4443.80 0.29 −1.5 0.42 86.8
LaII(133) 4443.94
TiII(31) 4444.56 0.46 −2.9 0.56 83.3
ZrII(96) 4445.88 0.87 84.8
NdII(49) 4446.39 0.86 −2.3: 0.84 80.3
FeI(68) 4447.72 0.56 −1.6 0.89 83.1
CeII(202) 4449.33 0.81 0.74 82.4
FeII(222) 4449.66 0.88 0.85 82.0:
TiII(19) 4450.48 0.34 −1.4 0.47 84.7
FeII 4451.55 0.73 82.9
MnI(22) 4451.59 0.71 −2.9:
NdII(6) 4451.98 0.93:
SmII(26) 4452.73 0.91 0.91 81.0:
TiI(113) 4453.32 0.87
VII(199) 4453.35
FeII(350) 4454.39
SmII(49) 4454.63
CaI(4) 4454.78 0.41 −1.5 0.52 84.4
FeII 4455.26 0.89
LaII(53) 4455.79 0.87
CaI(4) 4455.89 0.63 −3.2
NdII(50) 4456.39 0.90 82.9
CaI(4) 4456.62 0.78 0.91 83.8:
ZrII(79) 4457.42 0.62 82.4
TiI(113) 4457.44 0.75
FeI(992) 4458.08
MnI(28) 4458.25 0.78
SmII(7) 4458.52 0.93 81.4
FeI(68) 4459.12 0.46 −1.8 0.85 82.1
CeII(2) 4460.21 0.77 −0.7 0.67 82.0
ZrII(67) 4461.22 0.52 82.9:
FeI(2) 4461.65 0.43
FeI(471) 4462.00
NdII(54) 4462.41 0.85
NdII(50) 4462.98 0.88 0.82 82.1
CeII(20) 4463.41 0.83 81.2
TiII(40) 4464.45 0.39 0.52 83.4:
MnI(22) 4464.68
HfII(72) 4466.41
CI 4466.48 0.76 84.3
Klochkova et al.: Spectroscopy of HD56126 27
αPer HD56126
Element λ Å r Vr r Vr
FeI(350) 4466.55 0.52 −2.1
SmII(53) 4467.34 0.88 1.2 0.82
TiII(31) 4468.49 0.29 −1.8 0.41 86.9
TiII(18) 4469.16 0.73 82.7
FeI(830) 4469.37 0.46
TiII(40) 4470.85 0.50 −1.7 0.62 84.0:
CeII(8) 4471.24 0.68 81.6:
FeI(595) 4472.72
FeII(37) 4472.92 0.52 0.64 82.9
FeII(17) 4474.19 0.96 0.98 81.0:
VII(199) 4475.70 0.98:
FeI(350) 4476.02 0.52 −1.1 0.88 84.0
YI(14) 4477.46 0.91 −1.8
CI 4477.47 0.85 82.5
CI 4478.59
SmII 4478.66 0.85 −1.2 0.81: 81.5
GdII(15) 4478.80
CI 4478.83 0.85:
CeII(203) 4479.36 0.77 81.7:
CeII(124) 4479.43
FeI(828) 4479.61 0.80:
FeI(515) 4480.14 0.82
MgII(4) 4481.22 0.28 −2.2 0.26 82.5
ZrII(131) 4482.04
FeI(2) 4482.17 0.48 0.86:
FeI(68) 4482.26
TiI(113) 4482.74 0.85 −2.0:
GdII(62) 4483.33 0.97 0.93 84.4
CeII(3) 4483.90 0.79 81.7
FeI(828) 4484.23 0.73
4484.80 0.92
ZrII(79) 4485.44 0.83 84.4
FeI(830) 4485.68 0.81 −3.6
HfII(23) 4486.14 0.97 83.4
CeII(57) 4486.91 0.85 −1.0 0.74 81.7
TiII(115) 4488.33 0.52 −3.9 0.60 82.7
FeII(37) 4489.17 0.42 −2.7 0.53 83.2
FeI(2) 4489.74 0.73:
FeI(973) 4490.77 0.83 −3.0: 0.97
FeII(37) 4491.40 0.42 −1.5 0.51 83.2
TiII(18) 4493.52 0.67 −1.5 0.81 81.7
ZrII(130) 4494.41 0.54 84.0:
FeI(68) 4494.56 0.47 −3.6
CeII(154) 4495.39
ZrII(79) 4495.44 0.82 0.76 82.7
TiII(40) 4495.46
FeII(147) 4495.52
TiI(146) 4496.15 0.88
TiI(8) 4496.25 0.90 82.5:
28 Klochkova et al.: Spectroscopy of HD56126
αPer HD56126
Element λ Å r Vr r Vr
CrI(10) 4496.86
ZrII(40) 4496.96 0.56 0.45 84.3
CeII(19) 4497.84 0.90 0.89 82.9
MnI(22) 4498.90 0.88
CrI(150) 4500.28
TiII(18) 4500.32 0.77 0.88
TiII(31) 4501.27 0.30 −2.1 0.44 86.0
MnI(22) 4502.22 0.90: −1.6:
CrII(16) 4504.52
FeI(555) 4504.83 0.93
NdII(7) 4506.58 0.93
TiII(30) 4506.74 0.83 −2.5
GdII(13) 4506.93
CrII(16) 4507.19
FeII(38) 4508.28 0.37 −2.3 0.46 83.8
CrII(191) 4511.82 0.90 −0.8
TiI(42) 4512.74 0.90 −1.7 0.95
FeII(37) 4515.35 0.37 −2.3 0.46 83.6
LaII 4516.38 0.92:
CrII(191) 4516.56 0.94
FeI(472) 4517.53 0.88 −1.7
TiI(42) 4518.03
TiII(18) 4518.30 0.60 −1.3: 0.77
VII(212) 4518.38
SmII(49) 4519.63 0.93: 0.87 80.5
FeII(37) 4520.22 0.38 −2.5 0.46 83.3
GdII(44) 4521.30 0.97:
FeII(38) 4522.63 0.30 −2.8 0.41 84.1:
TiI(42) 4522.80
CeII(2) 4523.08
BaII(3) 4524.94 0.72 81.5
LaII(50) 4526.12 0.82 80.8
FeI(969) 4526.45 0.73 −1.1
CeII(108) 4527.35 0.79: −2.0: 0.72 82.3
VII(56) 4528.50 0.61
FeI(68) 4528.61 0.36 −2.8
TiII(82) 4529.48 0.46 −0.1: 0.63 83.3
FeI(39) 4531.15 0.57 0.95 81.4
TiI(42) 4533.24 0.91 81.6:
TiII(50) 4533.96 0.23 0.34
FeII(37) 4534.16
TiI(42) 4534.78 0.78 −1.9 0.95
SmII(45) 4537.95 0.94 0.92
CrII(39) 4539.62 0.69
CeII(108) 4539.77 0.66
FeII(38) 4541.52 0.46 −2.9 0.55 82.5
TiII(60) 4544.02 0.64 −2.9 0.79 81.8
TiII(30) 4545.14 0.57 −2.4 0.74 81.3:
CrI(10) 4545.96 0.83 −2.8
Klochkova et al.: Spectroscopy of HD56126 29
αPer HD56126
Element λ Å r Vr r Vr
FeI(755) 4547.85 0.79 −1.9 0.97
FeII(38) 4549.47
TiII(82) 4549.62 0.15 0.28 90.5:
CeII(229) 4551.30 0.93: 0.86 80.0:
TiII(30) 4552.30 0.61: 0.86 82.2
BaII(1) 4554.03 0.29 −2.1 0.31 88.7
CrII(44) 4554.99 0.58 −2.2 0.64 83.0
FeII(37) 4555.89 0.33 0.45 84.0
CrII(44) 4558.65 0.38 −2.6 0.45 82.1
CeII(8) 4560.28 0.78 82.2
CeII(2) 4560.96 0.90 −1.8: 0.83 82.2
CeII(1) 4562.36 0.81 −2.0 0.69 81.9
TiII(50) 4563.76 0.32 −2.2 0.41 84.7
ZrII(116) 4565.41 0.82 84.2:
CrII(39) 4565.77 0.60: 0.71 83.0
TiII(60) 4568.32 0.73 −1.1 0.88 82.6
HfII(86) 4570.70 0.89 80.3:
MgI(1) 4571.10 0.82 −1.1
TiII(82) 4571.98 0.27 −1.5 0.36 91.3:
ZrII(139) 4574.48 0.69 84.6:
LaII(23) 4574.87 0.83: 0.78 80.6:
FeII(38) 4576.34 0.47 −2.1 0.54 83.0
FeII(26) 4580.06 0.59 0.70 82.7
CeII(7) 4582.50 0.75:
FeII(37) 4582.83 0.51 −2.2 0.58 82.6
TiII(39) 4583.41
FeII(38) 4583.84 0.28 −2.3 0.41 87.5
CrII(44) 4588.20 0.45 −2.0 0.49 83.1
CrII(44) 4589.94 0.43 −1.8 0.53 83.4
CrII(44) 4592.05 0.58 −1.7: 0.65 82.8
CeII(6) 4593.92 0.74: 0.72 81.1
FeI(820) 4596.06 0.87 80.1:
NdII(51) 4597.01 0.93 82.9
VII(56) 4600.19 0.67 0.90 82.8
FeII(43) 4601.36 0.84: −2.8 0.91 83.6
ZrII(138) 4601.97 0.86: −0.4: 0.89 82.6
FeI(39) 4602.94 0.63 −2.2 0.94 81.8
CeII(6) 4606.40 0.87: 0.83 82.1
TiII(39) 4609.27 0.85 −2.8 0.94 82.5
FeI(826) 4611.28 0.73 −2.4
ZrII(67) 4613.95 0.87 −1.2: 0.69 82.6
CrII(44) 4616.62 0.61 −2.0 0.68 83.3
CrII(44) 4618.82 0.49 −2.2 0.55 82.3
LaII(76) 4619.87 0.85 82.3
FeII(38) 4620.51 0.55 −1.7 0.63 83.4
CeII(27) 4624.90 0.77 83.7:
FeI(554) 4625.05 0.76
CeII(1) 4628.16 0.83 −1.9 0.67 82.6
ZrII(139) 4629.07
30 Klochkova et al.: Spectroscopy of HD56126
αPer HD56126
Element λ Å r Vr r Vr
FeII(37) 4629.33 0.38 −2.7 0.44 82.7:
CrII(34) 4634.07 0.54 −2.1 0.59 83.0
FeII(186) 4635.31 0.79 −1.6 0.79 82.9
TiII(38) 4636.33 0.81 −2.4 0.91 82.9
FeI(822) 4638.02 0.80: 0.95 82.1
SmII(36) 4642.24 0.96 −2.7 0.93 82.5
LaII(8) 4645.28 0.95: 0.93 80.9:
CrI(21) 4646.17 0.71 −1.4 0.95
FeI(409) 4647.44 0.72 −2.6 0.91 80.8:
FeII(25) 4648.93 0.91 82.4:
CrI(21) 4651.29 0.86 −1.7
CrI(21) 4652.16 0.78 −1.7
CeII(154) 4654.29 0.77 83.4:
LaII(75) 4655.49 0.82 82.0
FeII(43) 4656.98 0.43 0.64
TiII(59) 4657.20
ZrII(129) 4661.78 0.74 82.3:
LaII(8) 4662.51 0.80 82.3:
FeII(44) 4663.71 0.69 −1.6 0.77 83.4
FeII(37) 4666.75 0.51 −2.0 0.61 83.1
LaII(76) 4668.91 0.91 82.1
FeII(25) 4670.19
ScII(24) 4670.40 0.43 0.60 82.1:
LaII(80) 4671.82 0.92 82.7:
CeII(18) 4680.13 0.92 81.6
YII(12) 4682.34 0.73 0.61 81.4
CeII(228) 4684.61 0.89 81.1:
ZrII(129) 4685.19 0.86 0.73 82.7
LaII(23) 4691.17 0.95 82.4:
FeI(409) 4691.42 0.75 −1.2:
LaII(75) 4692.50 0.91 81.4
TiII(59) 4698.67 0.93 81.9
MgI(11) 4702.99 0.52 −2.2
ZrII(138) 4703.03 0.63 81.9
NdII(3) 4706.55 0.92 −1.1: 0.89 82.2
TiII(49) 4708.66 0.59 0.76 82.5
C2 (1;0)R1(16) 4712.96 0.92:
C2 (1;0)R2(15) 4713.12 0.92: 77.2:
C2 (1;0)R1(15) 4714.38 0.91 77.6
C2 (1;0)R2(14) 4714.54 0.91: 78.0:
C2 (1;0)R3(13) 4714.72 0.93: 77.7
NdII(49) 4715.60 0.91:
C2 (1;0)R3(12) 4716.15 0.96 78.4:
LaII(52) 4716.44 0.92 82.2:
C2 (1;0)R1(13) 4717.08 0.93 77.4
C2 (1;0)R2(12) 4717.29 0.94 78.1
C2 (1;0)R3(11) 4717.48 78.0
C2 (1;0)R1(12) 4718.38 0.92 78.2
C2 (1;0)R2(11) 4718.60 0.93 77.4
Klochkova et al.: Spectroscopy of HD56126 31
αPer HD56126
Element λ Å r Vr r Vr
C2 (1;0)R3(10) 4718.84 0.95 77.8
C2 (1;0)R1(11) 4719.61 0.87: 78.3:
C2 (1;0)R1(10) 4720.81 0.93 77.7
C2 (1;0)R2(09) 4721.09 0.94 78.7
C2 (1;0)R3(08) 4721.36 0.95 77.4
C2 (1;0)R1(09) 4721.94 0.95 78.3
C2 (1;0)R2(08) 4722.27 0.91 78.8:
C2 (1;0)R3(07) 4722.53 0.94 77.8
C2 (1;0)R1(08) 4723.04 0.93 78.0
C2 (1;0)R2(07) 4723.44 0.90 77.2
C2 (1;0)R3(06) 4723.74 0.94 77.6
C2 (1;0)R1(07) 4724.08 0.93 77.7
C2 (1;0)R3(05) 4724.83 0.92 77.9
C2 (1;0)R1(06) 4725.07 0.88: 78.3:
C2 (1;0)R2(05) 4725.57 0.91 79.6
C2 (1;0)R1(05) 4725.99 0.86: 78.5:
C2 (1;0)R2(04) 4726.60 0.89 77.5
C2 (1;0)R1(02) 4728.47 0.77 77.9:
C2 (1;0)P1(34) 4730.77 0.90 78.0
FeII(43) 4731.47 0.51 −3.1 0.59 81.3
C2 (1;0)P2(03) 4732.81 0.89 78.3:
C2 (1;0)P2(04) 4733.40 0.84 78.6:
C2 (1;0)P2(05) 4733.93 0.83
FeI(554) 4736.77 0.63 −0.3 0.73 84.0:
LaII(8) 4740.27 0.84 0.78 81.9
LaII(75) 4743.08 0.92 0.85 82.7
PrII(3) 4744.93 0.95 82.2
CeII 4747.14 0.97 0.92 82.4
LaII(65) 4748.73 0.93 −0.9: 0.83 81.5
FeI(634) 4757.58 0.91 −0.8
CeII 4757.84 0.94 81.4
MnI(21) 4761.53 0.85 −2.5:
ZrII(107) 4761.67 0.63 82.1
CI(6) 4762.31
CI(6) 4762.54 0.63:
TiII 4763.90 0.58 −2.4 0.75 81.8:
TiII(48) 4764.53 0.64 −0.5: 0.83
MnI(21) 4766.43 0.76
CI(6) 4766.68 0.83 82.2
CI(6) 4770.03 0.88 −3.6 0.76 82.3
CI(6) 4771.75 0.54 81.9
CeII(17) 4773.94 0.93 −3.9 0.90 80.6
CI(6) 4775.91 0.84 −2.2 0.71 82.0
TiII(92) 4779.99 0.52 −2.7 0.60 82.3
MnI(16) 4783.42 0.70 −2.0 0.96 82.4
YII(22) 4786.58 0.59 82.5
TiII(17) 4798.53 0.64 −3.0 0.81 82.8
LaII(37) 4804.04 0.96 −1.1: 0.91 82.0
TiII(92) 4805.09 0.45 −1.5 0.53 83.2
32 Klochkova et al.: Spectroscopy of HD56126
αPer HD56126
Element λ Å r Vr r Vr
LaII(37) 4809.00 0.89 82.4
NdII(3) 4811.35 0.93 0.91 82.0
CrII(30) 4812.35 0.73 −2.6 0.81 82.5
CI(5) 4812.92 0.95 83.0:
ZrII(66) 4816.47 0.97 0.87 82.5
CI(5) 4817.37 0.95: −1.5: 0.90 81.7
NdII(47) 4820.34 0.91 81.9
YII(22) 4823.31 0.53 82.6
MnI(16) 4823.51
CrII(30) 4824.14 0.47 −2.9 0.54 82.0
NdII(3) 4825.48 0.82: 0.86 82.0
CI(5) 4826.80 0.95 −2.1 0.91 82.7:
NiI(111) 4831.18 0.83 −1.9 0.95 81.4
FeII(30) 4833.19 0.85 −1.8 0.94 82.9
CrII(30) 4836.23 0.67 0.78 82.2
LaII(37) 4840.02 0.92 81.4:
ZrII(138) 4841.98 0.92 82.6
SmII(26) 4844.21 0.95 82.2:
CeII(17) 4846.57 0.96 81.1:
CrII(30) 4848.25 0.52 −2.0 0.59 82.9
YII(22) 4854.87 −2.1: 83.0
FeI(318) 4859.74 0.80
Hβ 4861.33 0.11 −1.7 0.13 98.2
CrII(30) 4864.32 −1.7: 83.3:
TiII(29) 4865.62 −2.6: 81.7:
FeI(318) 4871.32 0.50: −2.6: 0.75 81.2
FeI(318) 4872.14 0.55: −3.1: 0.83 81.9
TiII(114) 4874.01 0.65: −2.2: 0.70 82.5
CrII(30) 4876.40 0.61
CrII(30) 4876.48 0.51
CaI(35) 4878.14
FeI(318) 4878.22 0.56 0.90 81.0:
YII(12) 4881.44 0.92 81.1
CeII 4882.46 0.85 81.3
YII(22) 4883.69 0.51 −2.8 0.39 84.8
CrII(30) 4884.60 0.75 −0.9: 0.84 83.2
FeI(318) 4890.76 0.53 −2.2 0.78 80.8
FeI(318) 4891.49 0.49 −2.2 0.74 82.7
FeII(36) 4893.82 0.79 −1.6 0.86
ZrII(107) 4894.43 0.89 82.0
BaII(3) 4899.94
YII(22) 4900.12 0.46 0.38
FeI(318) 4903.31 0.66 −1.9 0.87 81.0
ZrII(145) 4908.67 0.97 80.8:
TiII(114) 4911.19 0.59 −1.1: 0.63 83.7:
ZrII(3) 4911.66 0.82 81.8:
FeI(318) 4918.99 0.52 −2.8 0.81 82.9
FeI(318) 4920.50 0.42 −1.1 0.64 85.0:
LaII(7) 4920.98 0.63 81.9:
Klochkova et al.: Spectroscopy of HD56126 33
αPer HD56126
Element λ Å r Vr r Vr
LaII(7) 4921.80 0.79 0.67 82.3
FeII(42) 4923.92 0.29 −1.8 0.55: 78.0:
0.39 93.9
CI(13) 4932.05 0.81 −3.4 0.64 81.3
BaII(1) 4934.08 0.32 −2.9 0.35 85.5
FeI(687) 4946.39 0.79 −1.9
LaII(36) 4946.47 0.92 80.8
FeII(168) 4953.98 0.93 −1.6: 0.94 83.3:
FeI(318) 4957.30
FeI(318) 4957.60 0.31 0.65 81.8:
NdII(1) 4959.13 0.92 −1.1 0.92 81.4
NdII(22) 4961.40 0.96 0.93 82.4
SrI(4) 4962.29 0.78 83.3
FeI(687) 4966.09 0.71 −2.3 0.94 83.2
OI(14) 4967.88 0.95
FeI(1067) 4967.90 0.83 −1.9
OI(14) 4968.79 0.96: 82.1:
LaII(37) 4970.39 0.85 81.7
CeII 4971.48 0.88 82.2
FeI(984) 4973.11 0.82 −2.0
TiII(71) 4981.35 0.94 80.8
TiII(38) 4981.74 0.72 −1.2
YII(20) 4982.13 0.69 82.4
LaII(22) 4986.82 0.93 0.86 82.8
NdII 4989.96 0.95 −2.0: 0.90 81.4:
FeII(36) 4993.35 0.65 −1.1: 0.75 82.9
FeI(16) 4994.14 0.73 −2.4: 0.93 82.0:
LaII(37) 4999.46 0.78 82.6
TiI(38) 4999.49 0.75 −2.1
TiII(71) 5005.17 0.78: 0.92 82.2
FeI(984) 5005.72
FeI(318) 5006.12 0.54: 0.88 82.1
TiII(113) 5010.21 0.77 −2.9 0.87 82.8
BaII(10) 5013.00 0.92 80.1
TiII(71) 5013.69 0.64 −2.7 0.81 82.0
CI 5017.09 0.91: 82.8:
FeII(42) 5018.44 0.28 −2.2 0.55: 77.0:
0.34 96.4
CaII(15) 5019.98 0.72 −3.0: 0.82 82.8:
TiI(38) 5020.03
FeI(965) 5022.24 0.74:
CeII(16) 5022.87 0.90 81.3:
CI 5023.85 0.94: 0.87 82.7
TiI(38) 5024.85 0.90:
CI 5024.92 0.94 81.8
ScII(23) 5031.02 0.53 −2.6 0.63 82.5
CI(4) 5039.07 0.76 82.3
CI 5040.13 0.90 82.4:
SiII(5) 5041.03 0.77 80.5:
34 Klochkova et al.: Spectroscopy of HD56126
αPer HD56126
Element λ Å r Vr r Vr
CeII(16) 5044.01 0.89 81.9
FeI(318) 5044.22 0.86
FeI(114) 5049.82 0.64 −2.1 0.93 81.8
FeI(16) 5051.64 0.62 −0.9:
CI(12) 5052.17 0.74: 0.54 81.7
CI 5053.52 0.95 83.6
SiII(5) 5055.98 0.83 −1.1: 0.75 83.3:
SiII(5) 5056.31
FeI(1094) 5065.02 0.66 −1.7: 0.93: 81.3:
FeI(383) 5068.77 0.68 0.95:
TiII(113) 5069.09 0.93 83.0:
FeI(1089) 5272.08
TiII(113) 5072.29 0.63 0.79 82.4
FeI(1094) 5074.75 0.71 −1.7 0.94 81.6
CeII(15) 5079.68 0.80 81.6
FeI(16) 5079.75 0.74
NiI(143) 5080.54 0.74 −2.1 0.94 81.4
FeI(16) 5083.35 0.72 −2.1 0.97
YII(20) 5087.42 0.62 −2.4 0.51 84.3
FeI(1090) 5090.78 0.82 −2.0 0.94 82.9:
NdII(48) 5092.80 0.94 −1.2 0.92 82.0
C2 (0;0)R1(33) 5095.15 0.98 77.5:
C2 (0;0)R1(32) 5098.13 0.98 77.3:
C2 (0;0)R3(30) 5098.30 0.98: 78.3:
FeII 5100.74 0.70 0.81 82.3:
C2 (0;0)R1(29) 5106.36 0.95 77.5
FeI(16) 5107.45
LaII(164) 5107.54 0.57 0.90 83.0:
FeI(36) 5107.65
FeI(1) 5110.42 0.68 −2.1
ZrII(95) 5112.27 0.89 −1.9 0.62 82.7
LaII(36) 5114.55 0.91: 0.82 81.6:
C2 (0;0)R1(25) 5116.66 0.93 77.7
C2 (0;0)R3(23) 5116.89 0.95 77.6
CeII 5117.18 0.97 0.90 81.4:
YII(20) 5119.12 0.88 −2.1 0.69 80.9:
FeII(35) 5120.34 0.80 −1.0: 0.91 82.7
C2 (0;0)R1(23) 5121.44 0.93 77.3
C2 (0;0)R3(21) 5121.69 0.94 77.7
LaII(36) 5123.00
YII(21) 5123.22 0.70 0.53
C2 (0;0)R1(22) 5123.79 77.3:
C2 (0;0)R3(20) 5124.04 77.3:
C2 (0;0)R1(21) 5125.98 0.91 77.5
C2 (0;0)R3(19) 5126.26 0.93 77.5
C2 (0;0)R3(20) 5128.19 0.90 77.3
C2 (0;0)R3(18) 5128.49 0.93 77.9
TiII(86) 5129.16 0.50
C2 (0;0)R1(19) 5130.27 0.89 77.9
Klochkova et al.: Spectroscopy of HD56126 35
αPer HD56126
Element λ Å r Vr r Vr
C2 (0;0)R1(18) 5132.36 0.86 77.7
FeII(35) 5132.66 0.77 −2.6: 82.2:
FeI(1092) 5133.69 0.64 −2.0 0.88 82.9
C2 (0;0)R1(17) 5134.32 0.89 77.1
C2 (0;0)R3(15) 5134.67 0.91 77.4
C2 (0;0)R1(16) 5136.27 0.89 77.6
C2 (0;0)R2(15) 5136.44 0.94 77.7
C2 (0;0)R3(14) 5136.66 0.89 77.7
C2 (0;0)R1(15) 5138.11 0.89 77.3
C2 (0;0)R2(14) 5138.32 0.93 77.1
C2 (0;0)R3(13) 5138.51 0.90 77.6
C2 (0;0)R1(14) 5139.93 0.88 77.5
C2 (0;0)R2(13) 5140.14 0.92 77.8
C2 (0;0)R3(12) 5140.38 0.89 77.6
C2 (0;0)R1(13) 5141.65 0.87 77.2
C2 (0;0)R2(12) 5141.90 0.89 77.1
C2 (0;0)R3(11) 5142.11 0.89 77.7
C2 (0;0)R1(12) 5143.33 0.86 77.6
C2 (0;0)R2(11) 5143.60 0.89 77.4
C2 (0;0)R3(10) 5143.86 0.88 77.7
C2 (0;0)R1(11) 5144.92 0.85 77.5
C2 (0;0)R2(10) 5145.23 0.87 77.6
C2 (0;0)R3(09) 5145.48 0.87 77.5
FeII(35) 5146.11 0.73: 0.86 83.1:
C2 (0;0)R1(10) 5146.46 0.83 77.5
C2 (0;0)R2(09) 5146.81 0.88 77.6
C2 (0;0)R3(08) 5146.12 0.88 77.2
C2 (0;0)R1(09) 5147.93 0.84 77.4
C2 (0;0)R2(08) 5148.33 0.83 77.0
C2 (0;0)R3(07) 5148.61 0.84 77.3
C2 (0;0)R1(08) 5149.33 0.84 77.8
C2 (0;0)R2(07) 5149.79 0.85 77.1
C2 (0;0)R3(06) 5150.14 0.86 77.6
FeI(16) 5150.85 0.70 −0.9:
C2 (0;0)R2(06) 5151.17 0.83 77.3
C2 (0;0)R2(05) 5152.52 0.81 77.0
C2 (0;0)R2(04) 5153.77 0.74 77.2
TiII(70) 5154.07 0.51 0.57:
C2 (0;0)R2(03) 5154.99 0.82 77.4
C2 (0;0)R1(02) 5156.11 0.77 77.2
C2 (0;0)R2(01) 5157.16 0.86 77.8
C2 (0;0)P2(04) 5161.98 0.75 76.6
FeI(1089) 5162.27 0.66 −1.6
C2 (0;0)P2(05) 5162.58 0.66 77.3
C2 (0;0)P2(07) 5163.13 0.75 77.4
C2 (0;0)P1(14) 5165.03 0.56 78.0
C2 (0;0) head 5165.24 0.72:
MgI(2) 5167.32 0.30 0.55
FeI(37) 5167.49
36 Klochkova et al.: Spectroscopy of HD56126
αPer HD56126
Element λ Å r Vr r Vr
FeII(42) 5169.03 0.27 −2.7 0.56: 77.0:
0.33 97.4
FeI(36) 5171.60 0.56 −1.7 0.88 82.6
MgI(2) 5172.68 0.37 −1.4 0.51 82.7
NdII 5179.78 0.94 81.6
FeI(1166) 5180.07 0.92
MgI(2) 5183.60 0.32 −2.1 0.45 81.6
TiII(86) 5185.91 0.54 −2.1 0.69 82.0
CeII(46) 5187.46 0.93: 0.81 81.4
TiII(70) 5188.69 0.41 0.57
CaI(49) 5188.85
FeI(383) 5191.45 0.56
FeII(52) 5191.58 0.58 82.8:
FeI(383) 5192.35 0.57 0.83:
NdII(75) 5192.61 0.81:
YII(28) 5196.43 0.75 81.8
FeII(49) 5197.58 0.46 −2.4 0.59 83.3
YII(20) 5200.41 0.70 −2.8: 0.61 83.7:
FeI(66) 5202.34 0.67 0.94 81.4
YII(20) 5205.73 0.55 84.8:
CrI(7) 5206.04 0.42
TiII(103) 5211.54 0.72 −2.7 0.83 82.3
NdII(44) 5212.37 0.96 −2.0: 0.95 81.1
FeII 5216.85 0.94 82.2
PrII(35) 5220.11 0.94 82.0
TiII(70) 5226.54
FeI(383) 5226.87 0.37: −1.9: 0.60 83.6:
FeI(37) 5227.19 0.79 81.6
FeI(1091) 5228.38 0.89
NdII(46) 5228.43
FeI(553) 5229.85 0.74 −1.3 0.95 82.8
FeII(49) 5234.63 0.45 −2.3 0.57 83.8
CrII(43) 5237.32 0.60 −1.9 0.68 82.7:
ScII(26) 5239.82 0.66 −1.8 0.76 82.4
CrII(23) 5246.77 0.80: 0.91 83.1
FeII(49) 5254.93 0.62 −1.3 0.74 82.0
NdII(43) 5255.51 0.91 81.6:
FeII(41) 5256.93 0.80 −1.7 0.89 83.2
FeII 5260.26 0.92: 0.90 81.9:
TiII(70) 5262.11 0.33: 0.81
CaI(22) 5262.24
FeII(48) 5264.81 0.60 −2.0 0.56 87.4:
FeI(383) 5266.55 0.58 −1.8 0.88 82.3
TiII(103) 5268.62 0.73: 0.86: 83.0:
FeI(15) 5269.54 0.42 −2.0 0.70 82.1
FeII(185) 5272.39 0.86 0.91 83.3:
CeII(15) 5274.23 0.92: 0.80 81.9
CrII(43) 5274.98 0.60: 0.71 82.9
FeII(49) 5276.00 0.42 −3.0 0.56 84.8
Klochkova et al.: Spectroscopy of HD56126 37
αPer HD56126
Element λ Å r Vr r Vr
FeI(383) 5281.79 0.70 −2.1 0.95 81.3
FeII(41) 5284.10 0.53: 0.68 83.1
YII(20) 5289.82 0.95 −2.3 0.87 81.2
LaII(6) 5290.83 0.96 −1.8: 0.93 81.1:
NdII(75) 5293.16 0.90 −1.4 0.84 82.2
HfII(49) 5298.06 0.94 82.7:
CrII(24) 5305.86 0.76 −1.5 0.84 82.9
CrII(43) 5308.43 0.79 −2.1 0.85 82.7
CrII(43) 5310.69 0.84: 0.92 82.7
CrII(43) 5313.58 0.70 −1.6 0.78 82.9
FeII(49,48) 5316.66 0.34 −1.3: 0.48 88.0:
ScII(22) 5318.35 0.85 −1.4 0.95 83.2:
NdII(75) 5319.82 0.92: 0.85 81.5
YII(20) 5320.78 0.94 82.0
FeI(553) 5324.18 0.58 −2.4 0.81 82.6
FeII(49) 5325.56 0.65 −2.5 0.73 82.6
FeI(15) 5328.04 0.37: 0.74 82.4
OI(12) 5328.69 0.93 83.3
CrII(43) 5334.87 0.72 −2.1 0.80 82.9
TiII(69) 5336.79 0.55 −2.2 0.71 81.9
FeII(48) 5337.73 0.71
CrII(43) 5337.79 0.81
FeI(37) 5341.02 0.61 −1.6 0.94 81.5
ZrII(115) 5350.09
ZrII(115) 5350.35 0.87: 0.64
FeI(1062) 5353.38 0.79:
CeII(15) 5353.53 0.79 81.3
FeII(48) 5362.86 0.51 −1.9 0.63 83.5
FeI(1146) 5364.87 0.72 0.93 82.6
FeI(1146) 5367.47 0.70 −2.1 0.90 82.4
CrII(29) 5369.35 0.96 82.2
FeI(1146) 5369.96 0.65 −2.1 0.88 82.9
FeI(15) 5371.49 0.46 −2.2 0.82 81.8
NdII(79) 5371.92 0.92 81.5:
LaII(95) 5377.05 0.94 82.3
CI(11) 5380.34 0.85: −1.9: 0.68 81.9
TiII(69) 5381.03 0.63 −2.8 0.78 81.8
FeI(1146) 5383.37 0.63 −2.2 0.86 82.6
FeI(553) 5393.17 0.71
CeII(24) 5393.39 0.81
FeI(553) 5393.17 0.71
CeII(24) 5393.39 0.81
FeI(15) 5397.13 0.52 −2.1 0.89 81.6
YII(35) 5402.78 0.85 −2.1 0.59 81.9
FeI(1145) 5404.14 0.59 0.87 82.2
FeI(15) 5405.77 0.50 −1.9 0.89 81.4
CeII(23) 5409.22 0.90 81.4
FeI(1165) 5410.91 0.70 −1.5 0.91 82.8:
FeII(48) 5414.07 0.74 −2.2 0.84 82.5
38 Klochkova et al.: Spectroscopy of HD56126
αPer HD56126
Element λ Å r Vr r Vr
FeI(1165) 5415.20 0.64 −2.1 0.87 82.2
ZrII(94) 5418.01 0.96 80.6:
TiII(69) 5418.77 0.67 −2.0 0.82 82.4
CrII(23) 5420.93 0.79 0.90 83.0
FeI(1146) 5424.07 0.60 −1.8 0.84 82.7
FeII(49) 5425.25 0.67 −2.0 0.75 82.7
FeII 5427.80 0.94 −1.4 0.96 81.4:
FeI(15) 5434.52 0.56 −1.8 0.91 81.8
OI(11) 5435.18 0.97
OI(11) 5435.78 0.97 82.4
OI(11) 5436.86 0.95:
NdII(76) 5442.29 0.97 81.0:
FeI(1163) 5445.04 0.74 −1.9 0.95 82.1
FeI(15) 5446.92 0.48 0.89 81.0:
NdII 5451.12 0.97: 0.96 81.5:
CeII(24) 5468.38 0.96 0.92 89.5
CeII(24) 5472.30 0.90 89.0
YII(27) 5473.39 0.87: 0.66 90.0
ZrII(115) 5477.79 0.89 89.7
CrII(50) 5478.37 0.76 −1.7 0.83 89.3
YII(27) 5480.74 0.83: 0.66 90.0
NdII(79) 5485.71 0.97 −1.3: 0.92 90.3:
TiII(68) 5490.68 0.82 −1.2: 0.93 90.4:
YII(27) 5497.41 0.62 0.51 90.5
FeI(15) 5497.52
FeI(15) 5501.47 0.72
CrII(50) 5502.08 0.80 −2.7: 0.87 90.2
FeI(15) 5506.79 0.67 −0.9: 0.96 90.3:
CrII(50) 5508.62 0.82 −2.9: 0.89 89.0:
YII(19) 5509.90 0.80 −1.4 0.65 90.9:
CrII(23) 5510.71 0.83 −1.9 0.90 90.7:
YII(27) 5521.56 0.92 −2.1 0.69 90.3
ScII(31) 5526.81 0.55 −2.1 0.60 90.4
MgI(9) 5528.41 0.58 −2.4 0.76 88.7
FeII(55) 5534.84 0.57 −1.8 0.63 90.0
FeI(926) 5543.19 0.90 −2.1 0.97 89.8
FeI(1062) 5543.94 0.89 −2.4 0.96 89.0:
YII(27) 5544.61 0.89: 0.65 91.0
CI 5545.07 0.92: 0.82: 88.0:
YII(27) 5546.01 0.90: 0.66 90.1
CI 5547.27 0.95 89.5
CI 5551.03 0.95 87.8:
CI 5551.59 0.92 88.5
FeI(1183) 5554.90 0.84 −2.7 0.95 90.8:
FeI(686) 5569.62 0.71 −1.8 0.93 89.0
FeI(686) 5572.84 0.63 0.89 90.0
FeI(686) 5586.76 0.61 −1.4 0.84 89.2
CaI(21) 5588.76 0.65 −2.0 0.90 88.8
CaI(21) 5594.47 0.66: 0.89 88.4
Klochkova et al.: Spectroscopy of HD56126 39
αPer HD56126
Element λ Å r Vr r Vr
CaI(21) 5601.28 0.80 −2.9 0.97 88.7
CeII(26) 5610.26 0.97:
YII(19) 5610.36 0.93
FeI(686) 5615.64 0.57 0.82 89.5
NdII(86) 5620.65 0.93 88.5
FeI(686) 5624.54 0.75 −2.4 0.93 90.2:
FeII(57) 5627.49 0.87 −1.6 0.93 90.3
CI 5629.93 0.98: −1.5: 0.96 88.5
FeI(1314) 5633.95 0.89 −2.0
FeI(1087) 5638.27 0.87 −2.3
ScII(29) 5640.98 0.73 0.82 90.6
ScII(29) 5657.87 0.56: 0.72 90.3
ScII(29) 5658.34 0.83: 88.5
FeI(1087) 5662.52
YII(38) 5662.94 0.66: 0.42 90.6
ScII(29) 5667.15 0.77 −1.3 0.87 90.0
CI 5668.95 0.72 90.4
ScII(29) 5669.03 0.72 −2.9
LaII(95) 5671.54 0.96 91.2:
NaI(6) 5682.64 0.79: 0.96 88.8:
ScII(29) 5684.19 0.70: 0.83 90.5
NaI(6) 5688.21 0.72 −0.9: 0.92 91.2:
NdII(79) 5688.54 0.91 89.6:
CI 5693.11 0.98: 0.94 89.5
MgI(8) 5711.09 0.85 −1.6 0.98:
NiI(231) 5715.09 0.90 −1.6
CI 5720.78 0.98 89.5:
YII(34) 5728.89 0.94 −1.9 0.75 90.3
FeI(1087) 5731.77 0.92 −2.1 0.98
FeII(57) 5732.72 0.94 0.97 90.0
FeI(1180) 5752.04 0.93 −2.2
FeI(1107) 5763.00 0.80 −3.0: 0.96: 89.3:
LaII(70) 5769.06 0.96 −1.2: 0.86 88.9
SiI(17) 5772.15 0.91 −1.8 0.98:
FeI(1087) 5775.08 0.93 −1.7
YII(34) 5781.69 0.84 90.5
CI 5793.12 0.89 0.87 88.9
CI 5794.46 0.97 90.6:
LaII(4) 5797.59 0.87 89.9
SiI(9) 5797.86 0.90
CI 5800.59 0.95: 0.91 89.2
NdII(79) 5804.02 0.93 90.5
CI 5805.19 0.94: 0.94 90.5
LaII(4) 5805.78 0.94 −2.7: 0.86 89.9
FeI(1180) 5806.73 0.93 −2.2 0.98 90.2
LaII(4) 5808.31 0.98: 0.97 91.5:
FeI(982) 5809.22 0.95 −1.3 0.98:
VII(99) 5819.93 0.95 −1.4 0.98: 90.0:
FeII(164) 5823.15 0.96 −2.5: 0.97 89.6:
40 Klochkova et al.: Spectroscopy of HD56126
αPer HD56126
Element λ Å r Vr r Vr
NdII(86) 5842.39 0.99: 0.96 88.3:
FeI(1178) 5852.22 0.96 −1.8
BaII(2) 5853.68 0.63 −2.1 0.53 90.8
CaI(47) 5857.45 0.72 0.94 88.9
FeI(1180) 5862.36 0.85 −1.9 0.96 88.8
LaII(62) 5863.70 0.98 0.97 89.8
LaII(35) 5880.63 0.96 90.7:
NaI(1) 5889.95 0.45: −1.2: 0.21 13.0
0.05 +2.2 0.32 24.0
0.37 31.4
0.10 75.7
0.40 89.0
NaI(1) 5895.92 0.50: −0.9: 0.38 13.2
0.06 +2.3 0.51 24.7
0.48 32.0
0.16 76.3
0.45 89.8
FeI(982) 5934.66 0.89 −2.2
SiI(16) 5948.54 0.83 −3.0 0.97:
SiII(4) 5957.56 0.93 88.0:
CeII(80) 5975.83 0.94 89.9
SiII(4) 5978.93 0.88 87.5
FeI(1175) 5983.68 0.89 −0.7 0.95: 89.0:
FeI(1260) 5984.82 0.86 −1.9 0.96: 88.0:
FeII(46) 5991.37 0.75 −1.7 0.84 90.3
CI 6001.13 0.98: 0.92 89.5
CI 6002.98 0.95 89.7:
FeI(959) 6003.02 0.88 −2.4
CI 6006.03 0.96: 0.86 89.5
CI 6010.68 0.98 −1.2 0.92 88.8
CI 6012.24 0.98: −1.5: 0.94 90.3:
CI 6013.32 0.80 87.7:
MnI(27) 6013.49 0.89:
CI 6016.45 0.93 89.2:
MnI(27) 6016.64 0.91: −2.0:
MnI(27) 6021.80 0.90 −2.0:
FeI(1178) 6024.07 0.82 −2.2 0.94 88.8
ZrII(136) 6028.64 0.97: 0.91 89.1
CeII(30) 6034.20 0.98 −1.7 0.96 90.8:
CeII(30) 6043.40 0.99 −1.5: 0.94 89.7
FeI(207) 6065.49 0.81 −1.4 0.98:
FeII(46) 6084.10 0.83 −1.5 0.90
ZrII(106) 6106.47 0.97: 0.93 90.9
FeII(46) 6113.32 0.88 −1.6 0.93 88.6
ZrII(93) 6114.78 0.98 −1.5: 0.92 92.2:
CI 6120.82 0.99: 0.97 89.8
CaI(3) 6122.22 0.69 −0.9 0.93 89.5
FeI(169) 6136.62 0.75 0.95 89.0:
FeI(207) 6137.70 0.78 −2.0 0.96 89.2:
Klochkova et al.: Spectroscopy of HD56126 41
αPer HD56126
Element λ Å r Vr r Vr
BaII(2) 6141.72 0.56 −1.4 0.37 92.3
FeII(74) 6147.74 0.74 −1.4 0.77 90.2
FeII(74) 6149.25 0.75 −1.5 0.77 90.0
SiI(29) 6155.98 0.89
OI(10) 6155.98 0.95: 0.82 87.0:
OI(10) 6156.17 0.95: 0.82 87.1
OI(10) 6158.18 0.79 88.0
FeI(1260) 6170.51 0.90 −1.5 0.97 89.7
FeII(200) 6175.14 0.95 90.2
LuII(2) 6221.88 0.98: 0.90 89.9:
FeII(74) 6238.39 0.74 −1.7 0.76 90.0
SeII(28) 6245.62 0.80: −0.5: 0.90 90.0:
FeII(74) 6247.56 0.66 −2.2 0.70 89.9:
FeII 6248.90 0.94 −1.4 0.94 89.3
FeI(169) 6252.56 0.81 −1.9 0.97 89.5
LaII(33) 6262.30 0.96 −0.6: 0.89 90.2
FeII 6317.99 0.81 −1.0: 0.86 88.8:
FeI(168) 6318.03
SiII(2) 6347.10 0.65 −2.0 0.51 89.7
FeII(40) 6369.46 0.84 −1.6 0.91 90.5
SiII(2) 6371.36 0.72 −1.8 0.57 89.5
FeII 6383.72 0.91 −1.6 0.92 89.8
FeII 6385.46 0.93 −2.0 0.94 88.4:
LaII(33) 6390.49 0.97 0.89 90.3
FeI(168) 6393.61 0.79 −2.5 0.96 90.5:
CI 6397.98 0.98: 0.94 89.2
FeII(74) 6416.92 0.75 −1.0 0.80 90.2
FeI(62) 6430.85 0.83 −1.6 0.97 90.6:
FeII(40) 6432.68 0.75 −1.4 0.81 90.0
CaI(18) 6439.08 0.69 −1.6 0.89 90.0
FeII 6442.95 0.95 −1.2 0.95 88.9
FeII(199) 6446.41 0.95 −1.2 0.95 91.1:
OI(9) 6453.60 0.97: 0.94 87.9
OI(9) 6454.45 0.98: −2.0: 0.94 88.0
FeII(74) 6456.39 0.60 −0.2: 0.66 91.0:
TiII(91) 6491.57 0.78: 0.91 90.7:
BaII(2) 6496.91 0.55 0.52
FeII(40) 6516.08 0.66 −2.0 0.77 90.2
LaII(33) 6526.99 0.91 90.7:
MgII(23) 6545.97 0.93 89.8
FeI(268) 6546.25 0.92:
Hα 6562.81 0.19 −2.1 0.32 58.0
0.40: 74.0:
0.89 82.8:
CI(22) 6587.61 0.89 −1.7 0.72 89.5
SeII(19) 6604.59 0.89 0.94 91.1:
TiII(91) 6606.95 0.95 −2.2: 0.96 89.5
CI 6611.35 0.96 90.6:
YII 6613.75 0.91 0.73 90.0
42 Klochkova et al.: Spectroscopy of HD56126
αPer HD56126
Element λ Å r Vr r Vr
CI 6654.61 0.94 89.5:
CI 6655.51 0.91 90.5
YII(26) 6795.42 0.88 90.0
CI 7087.83 0.90 89.3
CI 7100.12 0.91 89.7
SiI(23) 7415.95 0.95: 87.5:
SiI(23) 7423.50 0.92 91.0:
EuII(8) 7426.57 0.93: 85.0:
CI 7476.18 0.89: 86.0:
CI 7483.44 0.90 85.0:
LaII(1) 7483.48
FeI(1077) 7511.03 0.97: 87.0
KI(1) 7664.91 0.83 77.5
OI(1) 7771.94 0.34 93.5
OI(1) 7774.17 0.35 94.8:
OI(1) 7775.39 0.42:
CI 7860.89 0.91: 88.0:
MgII(8) 7877.05 0.89 90.5:
YII(32) 7881.90 0.89 90.5:
MgII(8) 7896.37 0.80 88.7:
H(P27) 8306.12 0.91: 91.0:
H(P25) 8323.43 0.91: 91.0:
H(P20) 8392.40 0.64 87.0:
H(P19) 8413.32 0.62 88.0:
H(P18) 8437.96 0.60 92.0:
OI(4) 8446.5: 0.26 94.0:
H(P17) 8467.25 0.58 93.0:
CaII(2) 8542.11 0.40 85.0:
H(P15) 8545.38 0.53 94.0:
NI(8) 8567.74 0.90 89.0:
NI(8) 8594.01 90.0:
H(P14) 8598.39 0.49 93.0:
NI(8) 8629.24 0.78: 88.0:
NI(1) 8703.25 0.85: 90.0:
NI(1) 8711.70 0.83 87.0:
NI(1) 8718.83 0.84 89.0:
H(P12) 8750.47 0.43 98.0:
	Introduction
	Observations and reduction of spectra
	Peculiarities of the optical spectrum of HD56126
	Radial velocities pattern
	Spectral atlas
ABSTRACT
  We studied in detail the optical spectrum of the post-AGB star HD56126
(IRAS07134+1005). We use high resolution spectra (R=25000 and 60000) obtained
with the echelle spectrographs of the 6-m telescope. About one and a half
thousand absorptions of neutral atoms and ions, absorption bands of C_2, CN,
and CH molecules, and interstellar bands (DIBs) are identified in the 4010 to
8790 AA wavelength region, and the depths and radial velocities of these
spectral features are measured. Differences are revealed between the variations
of the radial velocities measured from spectral features of different
excitation. In addition to the well-known variability of the Halpha profile, we
found variations in the profiles of a number of FeII, YII, and BaII lines. We
also produce an atlas of the spectrum of HD56126 and its comparison staralpha
Per. The full version of the atlas is available in electronic form from
Web-address: http://www.sao.ru/hq/ssl/Atlas/Atlas.html

<|endoftext|><|startoftext|>
Generation of Large Number-Path Entanglement Using Linear Optics and
Feed-Forward
Hugo Cable∗ and Jonathan P. Dowling
Horace C. Hearne Jr. Institute for Theoretical Physics,
Department of Physics and Astronomy, Louisiana State University, Baton Rouge LA70803.
(Dated: April 4, 2007)
We show how an idealised measurement procedure can condense photons from two modes into
one, and how, by feeding forward the results of the measurement, it is possible to generate efficiently
superpositions of components for which only one mode is populated, commonly called “N00N states”.
For the basic procedure, sources of number states leak onto a beam splitter, and the output ports
are monitored by photodetectors. We find that detecting a fixed fraction of the input at one output
port suffices to direct the remainder to the same port with high probability, however large the initial
state. When instead photons are detected at both ports, Schrödinger cat states are produced. We
describe a circuit for making the components of such a state orthogonal, and another for subsequent
conversion to a N00N state. Our approach scales exponentially better than existing proposals.
Important applications include quantum imaging and metrology.
The fundamental limits to optical detection for metrol-
ogy and imaging are quantum mechanical [1]. Of partic-
ular interest for reaching such quantum limits are path-
entangled states of photons of the form |N0〉+ eiφ|0N〉,
in a basis of photon-number states, commonly referred to
as “N00N” states. A variety of applications have been
suggested [2]. For lithography [3] and microscopy [4],
N00N state light would be used together with multi-
photon absorbers to achieve enhanced resolution. This is
because the de Broglie wavelength for an N -photon state
is a factor 1/N smaller than the wavelength associated
with the single photon, and the absorption rate scales
linearly with the incident intensity, rather than as the
N th power. Regarding applications to precision metrol-
ogy, whereby an interferometric setup is used to mea-
sure small phase shifts, N00N states achieve the Heisen-
berg limit, for which the phase uncertainty scales as 1/N
[5, 6, 7], and entanglement is a fundamental requirement
for achieving this limit. It has been rigorously demon-
strated that the cost of improving sensitivity (without
using entanglement) is higher intensities or longer coher-
ence times [8]. Classically the shot-noise limit applies,
attained for example by laser light, for which the uncer-
tainty scales as 1/
N , already a restraint in applications
such as magnetometry [9] and gyroscopy [10].
However, building a source of N00N states beyond two
photons is challenging. Three, four and six photon ex-
periments have been reported [11, 12, 13], but only in the
first two references were N00N states generated. In the-
ory a source could be made using a nonlinear crystal [14].
However, the required optical nonlinearity is not readily
available. An alternative is a non-deterministic approach
using linear optics, wherein the desired state is generated
on condition of a specific outcome at photodetectors. A
variety of schemes have been suggested which typically
rely on conditional destructive interference [15, 16, 17].
Electronic address: hcable@lsu.edu
However, so far none of these scales efficiently, that is
they all share the feature that exponentially decreasing
success probabilities outweigh the possible gains. Noting
that quantum algorithms, exhibiting polynomial and ex-
ponential speedups over their classical counterparts, may
be implemented scalably in a linear-optics approach [18],
we expect that it should be possible to do better. In this
Article we address this challenge by adapting a concep-
tually simple measurement procedure. Our method is as
follows. First, we aim to minimise the negative effects of
back-action in a sequence of detections, whereby earlier
measurements affect the outcomes of later ones. Next,
we engineer output states that closely approximate the
ideal case. Finally, we exploit feed-forward, for which cir-
cuits are actively switched in response to previous pho-
todetections. Feed-forward is an essential ingredient of
linear-optics based quantum computing, but is not used
in previous proposals for engineering N00N states.
We begin by considering the thought experiment de-
picted in Fig. 1(a). Here, cavity modes labeled A and
B are assumed to start with a well-defined photon num-
ber N . They are each coupled to an external mode by
a weakly transmissive mirror, and these modes are com-
bined at a 50:50 beam splitter, and then subject to par-
tial photodetection. The beam splitter acts to make the
origin of the photons indistinguishable. When a photon
is registered at the left or right photodetector labeled
DL or DR, the transformation is given by the Kraus
operators L̂ = (â − b̂)/
2 or R̂ = (â + b̂)/
2 respec-
tively (where â and b̂ are the annihilation operators for
modes A and B). To obtain the corresponding prob-
abilities it is necessary to normalise by the total pho-
ton number prior to detection. We suppose now that a
string of detections occur only at Dr, say (by adjusting
the path length difference of the cavities between detec-
tions, with a phase shifter, the same state for the cav-
ity modes can be obtained in every case). After this,
the detectors are removed and the system evolves to a
final state with all the remaining population at the out-
put ports. Denoting by |ψAB〉 the state of modes A
http://arxiv.org/abs/0704.0678v1
mailto:hcable@lsu.edu
FIG. 1: Two measurement-based procedures, inducing rela-
tive phase correlations between the principal modes, each one
with N photons at the start. In (a), population leaks from
cavity modes A and B into external modes, which are com-
bined at a beam splitter. Photons are detected one at a time
by photodetectors Dl and Dr. In (b), all modes are propagat-
ing. Beam splitters couple some fraction f from the principal
modes one and two into ancillae modes three and four, which
are initially the vacuum. The ancillae are subsequently com-
bined at a beam splitter, and subjected to number-resolving
photodetection at Dl and Dr.
and B after r initial detections, we find that |ψAB〉 =
R̂r |N〉 |N〉 /
〈N | 〈N |
|N〉 |N〉, normalising
to unity. The probability Pcond of finding all the remain-
ing photons “condensed” at the right output port (and
none at the left output port) is as follows,
Pcond = 〈ψAB|
|ψAB〉 /S!
(rCk)
SCN−k
where S ≡ 2N − r denotes the total remaining photon
number, C denotes a binomial coefficient, and we assume
that r < N . Evaluating the value of Pcond numerically
for initial states of increasing size, we find that its value
is determined asymptotically by the proportion of the
input that is measured. For example, setting r either as
one quarter or one third of 2N suffices for Pcond > 0.6 or
Pcond > 0.7, respectively.
We have found for our thought experiment, that later
detections tend to strongly reinforce earlier ones. Hence,
the effect of measurement back-action here is useful for
state engineering, and in what follows we adapt the mea-
surement process for N00N state generation. We di-
vide our analysis into three stages. First, we translate
our thought experiment into a mathematically equivalent
procedure based purely on linear optics, and consider the
general case for which photons are detected at both pho-
todetectors. The localisation phenomena resulting from
this measurement process, have been studied extensively
in the context of the debate over the existence of abso-
lute optical coherence in common quantum-optical exper-
iments [19]. It has been demonstrated that well-defined
correlations in the relative optical phase evolve, for the
remaining population, and play a central role in the ongo-
ing dynamics. Hence, in the second stage of our analysis,
we investigate simple procedures for manipulating phase
correlations, and relate states with well-defined correla-
tions in the relative phase and N00N states. Finally, we
identify a method based on feed-forward to enable N00N
states to be generated efficiently.
First, we translate our thought experiment into a
mathematically-equivalent procedure, based on linear op-
tics, as depicted in Fig. 1(b). We label this optical cir-
cuit, Circuit I. Here all modes are propagating, and a
source is assumed to supply dual Fock states |N〉|N〉 to
the principal modes one and two. Beam splitters of re-
flectance f couple modes one and two to ancillae modes
three and four, which are combined at a 50:50 beam
splitter. They are then measured by number-resolving
photodetectors labeled Dl and Dr, where on average a
fraction f of the input photons are registered. We now
consider the state |ψl,r〉 generated in modes one and two
after l photons are registered at Dl and r at Dr. Fol-
lowing Ref. [20], it is convenient to adopt a representa-
tion in terms of coherent states, which are of the form,
|α〉 ≡ ||α| exp(iθ)〉 ∝
|α|2k/k! exp (ikθ) |k〉 in a
basis of Fock states. It has been shown that,
|ψl,r〉∝
dθavd∆exp (−iSθav)×
G(∆−∆0)+exp (iσ)G(∆+∆0)
|α1〉|α2〉,(1)
where S ≡ 2N − l − r, αj = |αj | exp(iθj), θav =
(θ1 + θ2)/2 and ∆ ≡ θ2 − θ1. The superposition phase
σ takes the value lπ, and hence the measurement record
must be known exactly. The scalar function G(X) is
given to good approximation by the Gaussian expression
−(l + r)X2/4
. The total photon number is equal
to S. There are well-defined correlations in the relative-
optical phase parameter ∆ at values plus and minus ∆0,
determined only by the ratio of l to r. These correla-
tions are multi-valued whenever photons are registered
at both photodetectors, and a Schrödinger cat state is
generated. We can see that cats are generated as a re-
sult of the symmetry of the setup. Specifically, L̂ and
R̂ are invariant under an exchange of the labeling of the
modes, a transformation which reverses the sign of the
relative phase. The generation of cat states therefore
also requires precise phase stability between the modes.
Turning to the source, we see that the state of the input
can be a mixture of the form
N PN |N〉|N〉〈N |〈N |, since
∆0 is independent of N , and standard linear optical ele-
ments obey a superselection rule for the photon number.
Several two-mode squeezing processes strongly suppress
relative number fluctuations, and hence might serve as
practical sources of light described by these mixed states.
For the second stage in our analysis, we identify the
outputs of Circuit I as examples of quantum reference
frames — reference frames for a classically defined pa-
rameter composed of finite quantum resources. Quantum
reference frames are subject to depletion and degradation
as they are used, and are currently of interest for proto-
cols in the field of quantum information, in which they
are regarded as a resource [21]. By making an analogy
to classical phase references we can now identify simple
ways in which states of the form Eq. (1) can be manip-
ulated. For the current purposes we can assume that
a large number of detections have been performed and
define,
|ψ∞ (∆0)〉 ∝
dθ exp (−iSθ) |α〉|α exp(i∆0)〉, (2)
where α = |α|eiθ, for a state with a total photon num-
ber S and a relative phase of ∆0 (assumed to be nor-
malized). Relative phase correlations between more than
two modes are transitive and are transformed additively
by phase shifters. A phase reference can be extended to
additional modes by combining it with the vacuum at
a beam splitter. As an example, a 50:50 beam split-
ter, which we denote here by Ubs, beating light in a
Fock state with S photons against the vacuum yields,
Ubs|S〉|0〉 ∝ |ψ∞(0)〉, and Ubs|0〉|S〉 ∝ |ψ∞(π)〉. There-
fore, we see that a simple circuit, consisting only of a
beam splitter and a phase shifter, can convert a cat state
generated by Circuit I to aN00N state, whenever the rel-
ative phase correlations differ by π. This happens when
l = r and ∆0 = π/2. We label this circuit, Circuit III
(anticipating an intermediate process modifying the cat
states for the general case).
Before proceeding to the final stage of our analy-
sis, we consider a simple N00N -state generator, that
attempts to convert every cat state generated by Cir-
cuit I using Circuit III. This method might be expected
to yield close approximations to N00N states, when-
ever the the relative phases of the cat state are close
to plus and minus π/2. The situation is summarized
in Fig. 2(a). To measure the quality of the output
state we adopt the fidelity, denoting it by F . For
the measurement-induced condensation, considered at
the start, F takes the same same value as Pcond. For
schemes generating N00N states, it is necessary to ac-
count for the phase of the superposition, and we define
F = maxφ
〈0S| + exp (−iφ) 〈S0|
|ψoutput〉
/2, where
S is the total photon number of the state |ψoutput〉.
Evaluating F for our N00N -state generator, when Cir-
cuit I generates a cat state with relative phase compo-
nents at ±∆0 and total photon number S, we find to
first approximation that F ∼ cos2S [(∆0 − π/2) /2]. As
with other proposals, this scheme in fact scales exponen-
tially poorly whenever the relative phase correlations are
less than π apart, as is typically the case. Inspecting
the overlap for different relative phase components, as
in Eq. (2) with total photon number S, we find that
〈ψ∞(∆1)|ψ∞(∆2)〉
cos [(∆2 −∆1) /2]
. The poor
scaling can be attributed to the non-orthogonality of the
cat state components.
We now proceed to the final stage of our analysis. Our
previousN00N -state generator is effective when Circuit I
FIG. 2: (a) and (b) illustrate complete N00N state generators
in outline. Circuit I produces Schrödinger cat states with two
relative phase components non-deterministically (green). The
generators are terminated by a fixed unitary (yellow) circuit
consisting of beam splitters and phase shifters. In (b) the cat
states are corrected, using an additional measurement process
conditioned on the previous detection outcomes.
generates cat states with components which are orthog-
onal. However, this occurs with low probability. It is
not possible to improve the situation with any combi-
nation of (idealised) beam splitters and phase shifters,
since these implement unitary transformations. Hence,
we now devise a circuit, labeled Circuit II, to make input
cat components orthogonal, using additional processes of
measurement and feed-forward. This more sophisticated
scenario is depicted in Fig. 2(b). To identify a suitable
circuit, we investigate how a beam splitter transforms
phase references, starting with two classical fields. Here
the field in each mode is represented by a complex num-
ber, with the square amplitude corresponding to the in-
tensity, and the phase to the optical one. A 50:50 beam
splitter, configured so as not to impart additional phase
shifts to the modes, outputs two classical fields described
by the sum and difference of the values for the inputs (al-
tering both the square amplitudes and the phases). If the
input has a relative phase of 0 or π, and equal intensities
for each mode, the population is transferred entirely into
one mode. On the other hand, if the input has a relative
phase of plus or minus π/2, and equal intensities in each
mode, the relative phase and intensities are preserved.
Moving to the quantum case, we consider the action of
the beam splitter for a state, defined as in Eq. (2), with
a relative phase of ∆0, a total photon number S, and an
intensity S/2 in each mode. Computing the final state
explicitly, we find a scenario similar to the classical case,
Ubs|ψ∞〉 ∝
dθ exp (−iSθ)×
I1 exp (iθ)〉|
I2 exp [i (θ ± π/2)]〉 . (3)
This has an intensity SI1/ (I1 + I2) = [1− cos(∆0)] /2
in mode one and SI2/ (I1 + I2) = S [1 + cos(∆0)] /2 in
mode two, and a relative phase of plus π/2 when 0 <
∆0 ≤ π/2 and of minus π/2 when −π/2 ≤ ∆0 < 0
(we consider cases for which the intensity is increased in
favour of mode two).
The symmetry of the beam splitter transformation
makes it useful for altering the cats generated by Circuit
I, so that the relative phases are different by π. How-
ever, it creates a difference in the intensities between the
modes. To correct this, we propose beating mode two
against the vacuum, so as to to move the difference of
the intensities to an ancillary mode, which can be re-
moved by a photodetection. This method depends criti-
cally on feeding forward the result of the detections per-
formed by Circuit I, so that a variable beam splitter can
be set according to the value of ∆0. A variable beam
splitter can be implemented with 50:50 beam splitters
and variable phase shifters. The cost of correction is a
decrease in the total photon number, which varies non-
deterministically. As can be seen from Eq. (3), a fraction
of cos (∆0) of the photons are lost on average. Overall,
Circuits I through III constitute a complete N00N -state
generator. Additional mathematical analysis is given in
the supplementary online text. Runs for which Circuit
I fails to generate a Schrödinger cat state, or too many
photons are lost in the detection process are discarded.
The fidelities at the output are, on average, 0.87, 0.94 or
0.98, when a fraction of one third, one half, or two third
respectively of the input photons are detected by Circuit
I. Higher fidelities are possible when the photon number
at the input is small. If allowance is made for sufficient
input photons to be detected by Circuit I, and a further
half to be detected in Circuit II, the probability of failure
is not too large.
Finally, we suggest some possibilities for experimen-
tal implementation. For the source, we propose an op-
tical parametric oscillator setup for which the two mode
squeezed output of an optical parametric amplifier is en-
hanced by a cavity [22]. Note, however, that the cur-
rent purposes require twin beams of a much lower inten-
sity than is typical in many experiments, and that the
beams must be rendered frequency degenerate. Tech-
niques of feed-forward and photodetection are being de-
veloped with a view to quantum information technolo-
gies [23, 24]. For the source, an important problem is
imperfect correlation between the modes. If, for exam-
ple, two independent lasers of equal intensity provide the
input, the scheme generates the intended relative phase
correlations, but no entanglement [25]. Photodetectors
are subject to loss and dark counts. Losses will act to
degrade the source, reducing the relative number corre-
lation and increasing the uncertainty in the total photon
number. Dark counts are more problematic, mixing over
the phase for the superposition in Eq. (1). An alter-
native suggestion is using trapped bosonic atoms. One
possibility might be to work in a regime for which the
atomic wave-packets are much longer than the typical
scattering length, as proposed in Ref. [26]. Another is
to use Bose-Einstein condensates, for which a variety of
coherent operations have been demonstrated. Number-
resolved condensates might be obtained from the Mott
Insulator phase, while relative-number squeezing can be
achieved by different techniques.
In conclusion, we have proposed for the first time a
linear-optics based scheme that generates large N00N
states efficiently, the photon number at the output scal-
ing with that of the source — all the while maintaining
high fidelities, high success probabilities and a fixed num-
ber of circuit components. As well as being of immediate
interest for a range of applications, our results have con-
nections with other topics. For example, the scaling we
derive for our measurement-induced condensation pro-
cedure is of relevance to the study of the interference of
light from independent sources and localizing relative op-
tical phase, phenomena with analogs in different physical
systems [27]. We have left as an open question the extent
to which the scaling can be attributed to Bose statistics.
Regarding our N00N -state generators, the creation of
macroscopic entangled states is of interest for exploring
the quantum-classical transition. Finally, our study of
Schrödinger cat states may have application to quantum
computing, where Schrödinger cat states, defined for one
mode only, have been proposed to encode qubits, which
may be manipulated using standard experimental tech-
niques [28].
Acknowledgements
The authors would like to acknowledge support from the
Hearne Institute, the Army Research Office, and the Dis-
ruptive Technologies Office. H. C. would like to thank
Terry Rudolph, Ryan Glasser, Sonja Daffer and Yuan
Liang Lim for helpful discussions.
[1] Giovannetti, V., Lloyd, S. & Maccone, L. Quantum-
enhanced measurements: beating the standard quantum
limit. Science 306, 1330 (2004).
[2] Kapale, K. T., Didomenico, L. D., Lee, H., Kok, P. &
Dowling, J. P. Quantum interferometric sensors. Con-
cepts of Physics II, 225 (2005).
[3] Boto, A. N. et al. Quantum interferometric optical lithog-
raphy: exploiting entanglement to beat the diffraction
limit. Phys. Rev. Lett. 85, 2733 (2000).
[4] Teich, M. C. & Saleh, B. E. A. Microscopy with quantum-
entangled photons. Českloslovenský časopis pro fyziku
47, 3 (1997). English translation.
[5] Bollinger, J. J., Itano, W. M., Wineland, D. J. & Heinzen,
D. J. Optimal frequency measurements with maximally
correlated states. Phys. Rev. A 54, R4649 (1996).
[6] Ou, Z. Y. Fundamental quantum limit in precision phase
measurement. Phys. Rev. A 55, 2598 (1997).
[7] Boixo, S., Flammia, S. T., Caves, C. M. & Geremia,
J. M. Generalized limits for single-parameter quantum
estimation. Phys. Rev. Lett. 98, 090401 (2007).
[8] Giovannetti, V., Lloyd, S. & Maccone, L. Quantum
metrology. Phys. Rev. Lett. 96, 010401 (2006).
[9] Kominis, I. K., Kornack, T. W., Allred, J. C. & Romalis,
M. V. A subfemtotesla multichannel atomic magnetome-
ter. Nature 422, 596 (2003).
[10] Dowling, J. P. Correlated input-port, matter-wave inter-
ferometer: quantum-noise limits to the atom-laser gyro-
scope. Phys. Rev. A 57, 4736 (1998).
[11] Mitchell, M. W., Lundeen, J. S. & Steinberg, A. M.
Super-resolving phase measurements with a multi-photon
entangled state. Nature 429, 161 (2004).
[12] Walther, P. et al. De broglie wavelength of a non-local
four-photon state. Nature 429, 158 (2004).
[13] Resch, K. J. et al. Time-reversal and super-resolving
phase measurements. quant-ph/0511214 (2005).
[14] Sanders, B. C. Quantum dynamics of the nonlinear rota-
tor and the effects of continual spin measurement. Phys.
Rev. A 40, 2417 (1989).
[15] Fiurášek, J. Conditional generation of n-photon entan-
gled states of light. Phys. Rev. A 65, 053818 (2002).
[16] Zou, X., Pahlke, K. & Mathis, W. Generation of entan-
gled photon states by using linear optical elements. Phys.
Rev. A 66, 014102 (2002).
[17] Kok, P., Lee, H. & Dowling, J. P. Creation of large-
photon-number path entanglement conditioned on pho-
todetection. Phys. Rev. A 65, 052104 (2002).
[18] Knill, E., Laflamme, R. & Milburn, G. J. A scheme for
efficient quantum computation with linear optics. Nature
409, 46 (2001).
[19] Mølmer, K. Optical coherence: A convenient fiction.
Phys. Rev. A 55, 3195 (1997).
[20] Sanders, B. C., Bartlett, S. D., Rudolph, T. & Knight,
P. L. Photon-number superselection and the entangled
coherent-state representation. Phys. Rev. A 68, 042329
(2003).
[21] Bartlett, S. D., Rudolph, T. & Spekkens, R. W. Refer-
ence frames, superselection rules, and quantum informa-
tion. quant-ph/0610030v2 (2006).
[22] Zhang, Y., Kasai, K. & Watanabe, M. Investigation of
the photon-number statistics of twin beams by direct de-
tection. Opt. Lett. 27, 1244 (2002).
[23] Kok, P. et al. Linear optical quantum computing with
photonic qubits. Rev. Mod. Phys. 79, 135 (2007).
[24] Prevedel, R. et al. High-speed linear optics quantum
computing using active feed-forward. Nature 445, 65
(2007).
[25] Cable, H., Knight, P. L. & Rudolph, T. Measurement-
induced localization of relative degrees of freedom. Phys.
Rev. A 71, 042107 (2005).
[26] Popescu, S. KLM quantum computation with bosonic
atoms. quant-ph/0610043 (2006).
[27] Rau, A. V., Dunningham, J. A. & Burnett, K.
Measurement-induced relative-position localization
through entanglement. Science 301, 1081 (2003).
[28] Jeong, H. & Ralph, T. C. Schrodinger cat states for quan-
tum information processing. quant-ph/0509137 (2005).
Supplementary Material: Methods
In these supplementary notes, we provide further anal-
ysis of our N00N -state generator, consisting of Circuits
I, II and III, as depicted in outline in Fig. 2(b). First,
we specify notation for beam splitters, phase shifters and
states with well-defined relative phase correlations. For
the lossless beam splitter, we choose a notation which
makes explicit the “rotation” performed by such a device.
A beam splitter with transmittance τ and reflectance
(1− τ) acts to transform the annihilation operators ôj
for modes labeled j, according to the relations,
cos(γ) − sin(γ)
sin(γ) cos(γ)
with angular parameter γ, where τ = cos2 (γ) and 0 ≤
γ ≤ π/2. We denote this transformation Ubs (γ), and we
include, where necessary, phase shifts of χ at the input
port and −χ at the output port of the first mode, so that
Ubs (γ, χ) ≡ exp
γ exp(iχ)ô1ô
2 − γ exp(−iχ)ô
. For
example, Ubs (arccos (
τ ), π/2) corresponds to a sym-
metric beam splitter. We denote a phase shift trans-
formation on mode j, exp
†ôjχ
, by Ups (χ). For a
state defined, as in Eq. (2), with a total photon number S
and relative phase ∆0, it is helpful to incorporate a phase
factor exp (−iS∆0/2) into the normalisation (making the
definition symmetric between the modes). We then adopt
the following notation for a normalised Schrödinger cat
state,
|ψcat (∆0,Λ)〉 ∝ |ψ∞ (∆0)〉+ exp (iΛ) |ψ∞ (−∆0)〉 ,
having components with relative phases plus and minus
∆0, a phase for the superposition Λ (with the overall
normalisation constant assumed positive).
Next, we elaborate on the sequence of operations per-
formed by our N00N -state generator. We assume the
final state should have at least P photons, and that the
correlations in the relative phase are ideal. For the first
step, Circuit I, depicted in Fig. 1(b), implements the
transformation,
|l, r〉〈l, r|
3,4 Ubs
conditioned on the detection of l photons in mode 3 and
r photons in mode 4. A dual Fock state from the source
evolves to a cat state according to,
Circuit I :|N〉1|N〉2|0〉3|0〉4 −→ |ψcat (∆0, lπ)〉1,2 .
This cat state has relative phase components with values
plus and minus ∆0 ≡ 2 arccos
r/(l + r)
, and total
photon number 2N − l − r. Runs for which l or r are
zero must be discarded. It has been shown that values
for the relative phase are generated with approximately
equal frequency across the range [25], and hence these
failure events do not affect the scaling of the generator.
Next, Circuit II acts to transform the relative phase
correlations, to plus and minus π/2, in every case. When
r ≥ l, the relative phase correlations lie in the range
[−π/2, π/2], and a 50:50 beam splitter acting on the prin-
cipal modes corrects the relative phase correlations, while
increasing the intensity in mode two (and decreasing it in
mode one). To achieve the same outcome when l < r, we
suppose that a phase shift of π is applied in advance (on
either mode). This transforms the cat state generated by
Circuit I as,
Ups (π) |ψcat (∆0(l, r), lπ)〉 ∝ |ψcat (∆0(r, l), rπ)〉 .
Next, a beam splitter, with transmittance
[1− cos (∆0)] / [1 + cos (∆0)], transfers the differ-
ence of the intensities to the ancillary mode five. A
circuit for implementing the variable beam splitter is
given by the relation,
Ubs (γ)2,5 ≡
Ubs (π/4, π/2)2,5 Ups (γ)5 Ups (−γ)2 Ubs (π/4,−π/2)2,5 .
A photodetector measures Q photons in mode 5. Overall,
Circuit II implements the transformation,
|Q〉 〈Q|
Ubs {arccos [tan (∆0/2)]}2,5 Ubs (π/4)1,2 .
The cat state evolves as,
Circuit II :
|ψcat (∆0, lπ)〉−→|ψcat [π/2, lπ + (2N − l− r −Q)π/2]〉 .
We derived a full probability distribution for the out-
comes of the photodetection performed by Circuit II,
Prob(Q = 0, · · · , S − 1) = 1
1 + (−1)l cosS (∆0)
SCQ [1−cos (∆0)]S−Q cosQ (∆0)
Prob(Q = S) =
1 + (−1)l cosS (∆0)
1 + (−1)l
cosS (∆0) ,
where S = 2N − l − r is the total photon number prior
to detection, and C denotes a binomial coefficient. This
probability distribution is approximately binomial, and
the expected number of detections is S cos (∆0). If too
many photons are lost in Circuits I and II the run must
be aborted. The probability of this can be made small
by taking P ≃ N(1−f). In principle, excess photons can
be removed by an additional process, similar to Circuit
Finally, Circuit III implements the unitary transfor-
mation,
Ubs (π/4, π)1,2 Ups (π/2)2 .
The corrected cat state evolves as,
Circuit III :
|ψcat (π/2, lπ+(2N−l−r−Q)π/2)〉−→|P, 0〉+(−1)l |0, P 〉 ,
yielding the desired N00N state, with P = 2N−l−r−Q.
It may be noted that the superposition phase for the
N00N -state at the output depends on the measurement
record at the photodetectors. When l < r, the additional
phase shift in Circuit II causes this phase to be rπ rather
than lπ.
Next, we estimate the fidelities of the states produced
by our N00N state generator, and clarify its behaviour
for large photon number. To do this, we first compute
the fidelity for one component of a cat state generated
by Circuit I, which we denote by |ψG(∆0)〉. We assume,
as in Eq. (1), that the function G(X), describing the
localisation of the relative phase, assumes its Gaussian
asymptotic form. Note that the rate of localisation is
faster when phase correlations evolve at more than one
value. We assume that the state at the input is the dual
Fock state |N〉1|N〉2, and that a total of D = l + r de-
tections have occurred. Then,
〈ψG(∆0)|ψ∞ (∆0)〉
∫ π/2
d∆cosS
∫ π/2
∫ π/2
d∆d∆′ cosS
∆′2 +∆2
where S = 2N − D is the total photon number. This
result was derived assuming that S and D are not small.
The value for the fidelity depends only on the ratio of
detections to input photons. For example, when D/2N
is one half, F = 0.94, and when D/2N is two thirds,
F = 0.98. To verify this result, we computed numerically
exact values for the fidelity,
〈ψl,r|ψcat (∆0(l, r), lπ)〉
for a range of states |ψl,r〉 generated by Circuit I. For
input state |3〉1|3〉2, the fidelity is 0.94 for (l, r) = (1, 2)
and (2, 1), and anomalously it is 1 for (l, r) = (1, 1). For
input state |5〉1|5〉2 the values are 0.94 and 0.96 when
l + r = 5, and range from 0.96 to 1 when l + r = 6,
while for input state |15〉1|15〉2 the values range from
0.92 to 0.96 when l+ r = 15, and from 0.96 to 0.99 when
l + r = 20.
Finally, we performed a complete numerical simulation
of the N00N -state generator, to verify that Circuits I, II
and III work together as predicted. In particular, it was
necessary to check that Circuits II and Circuit III func-
tion as expected when the relative phase correlations for
the cat states are not perfectly well-defined. The results
are shown in Fig. 3. Each point in the plots corresponds
to a particular choice of input state, and measurement by
Circuit I. The height corresponds to the expected pho-
ton number for the output N00N state, and the color
to its fidelity. Averages are taken over all possible out-
comes to the third photodetection performed by Circuit
II. For comparison, the mesh shows the predictions of
the preceding analysis. Good agreement is seen between
these analytical predictions and the numerical results.
However, inspection of individual outcomes in Circuit II
reveals that the high fidelities are not maintained in ev-
ery case. Roughly speaking, improbable outcomes were
often found to have low fidelity.
0.88 0.9 0.92 0.94 0.96 0.98 1
Fidelity
FIG. 3: Fidelities (color) and photon number (vertical axis)
are displayed for outputs of our N00N states generator. Each
point corresponds to a possible outcome to Circuit I, for which
D photons are detected. Input states |N〉|N〉2 are considered
for N up to 15. Going from left to right, D/2N is one third,
one half and two thirds.
ABSTRACT
  We show how an idealised measurement procedure can condense photons from two
modes into one, and how, by feeding forward the results of the measurement, it
is possible to generate efficiently superpositions of components for which only
one mode is populated, commonly called ``N00N states''. For the basic
procedure, sources of number states leak onto a beam splitter, and the output
ports are monitored by photodetectors. We find that detecting a fixed fraction
of the input at one output port suffices to direct the remainder to the same
port with high probability, however large the initial state. When instead
photons are detected at both ports, Schr\"{o}dinger cat states are produced. We
describe a circuit for making the components of such a state orthogonal, and
another for subsequent conversion to a N00N state. Our approach scales
exponentially better than existing proposals. Important applications include
quantum imaging and metrology.

<|endoftext|><|startoftext|>
Introduction
We are interested in a finite branch local solution to the sixth Painlevé equation around a fixed
singular point. We show that every such solution is in fact an algebraic branch solution (see
Definition 1.1 for the terminology). In particular a global solution is an algebraic solution if and
only if it is finitely many-valued globally. Although the problem under study is local in nature,
our solution to it relies on an effective combination of some global technologies and some local
tools. The former includes the algebraic geometry of the sixth Painlevé equation, Riemann-
Hilbert correspondence, geometry and dynamics on cubic surfaces, Kleinian singularities and
their minimal resolutions [15, 16, 17, 18, 20], while the latter includes the power geometry
of algebraic differential equation [5, 6, 7], which is a method of constructing formal solutions
by means of Newton polygons, and the theory of nonlinear differential equations of “regular
singular type” [10, 11], which discusses the convergence of formal solutions.
Let us describe our main results in more detail. First we recall that the sixth Painlevé
equation PVI(κ) is a Hamiltonian system of nonlinear differential equations
∂H(κ)
= −∂H(κ)
, (1)
∗Mathematics Subject Classification: 34M55, 37F10.
†E-mail address: iwasaki@math.kyushu-u.ac.jp
http://arxiv.org/abs/0704.0679v1
with time variable z ∈ Z := P1 − {0, 1,∞} and unknown functions q = q(z) and p = p(z),
depending on complex parameters κ = (κ0, κ1, κ2, κ3, κ4) in the 4-dimensional affine space
K := { κ = (κ0, κ1, κ2, κ3, κ4) ∈ C5 : 2κ0 + κ1 + κ2 + κ3 + κ4 = 1 }, (2)
where the Hamiltonian H(κ) = H(q, p, z; κ) is given by
z(z − 1)H(κ) = (q0q1qz)p2 − {κ1q1qz + (κ2 − 1)q0q1 + κ3q0qz}p+ κ0(κ0 + κ4)qz,
with qν := q − ν for ν ∈ {0, 1, z}. Each of the points 0, 1, ∞ is called a fixed singular point.
It is well known that equation (1) has the analytic Painlevé property, that is, any meromor-
phic solution germ at a base point z ∈ Z can be continued meromorphically along any path in
Z emanating from z. Thus a solution can branch only around a fixed singular point. We are
interested in finite branch solutions around it, by which we mean the following.
Definition 1.1 A finite branch solution to equation (1), say, around z = 0 is a local solution
(q(z), p(z)) on a punctured disk D× = D − {0} centered at z = 0 such that its lift (q̃(z̃), p̃(z̃))
along some finite branched covering ϕ : (D̃, 0̃) → (D, 0), z̃ 7→ z = z̃n around z = 0 is a single-
valued meromorphic function on D̃× = D̃ − {0̃}. Such a solution is said to be an algebraic
branch solution if it can be represented by a convergent Puiseux-Laurent expansion
q(z) =
i/n, p(z) =
i/n, (3)
with ai = bi = 0 for all sufficiently small i ≪ 0, namely, if the lift (q̃(z̃), p̃(z̃)) is a single-valued
meromorphic function on D̃ with at most pole at the origin z̃ = 0̃.
Problem 1.2 Is any finite branch solution to PVI(κ) an algebraic branch solution ?
In this article we settle this problem in the affirmative as is stated in the following.
Theorem 1.3 Any finite branch solution to Painlevé VI around a fixed singular point is an
algebraic branch solution. In particular a global solution is an algebraic solution if and only if
it is finitely many-valued globally. These results are valid for all parameters κ ∈ K.
It is an interesting problem to consider algebraic solutions to Painlevé VI. Many algebraic
solutions have been constructed in [1, 2, 3, 8, 13, 14, 24, 25], but a complete classification seems
to be outstanding. We hope that Theorem 1.3 will play an important part in discussing this
issue. The following remark explains what Theorem 1.3 signifies and why it is remarkable.
Remark 1.4 Logically, according to Definition 1.1, a finite branch solution (q(z), p(z)) around
z = 0 may have a very transcendental singularity at z = 0, to the effect that its lift (q̃(z̃), p̃(z̃))
may have infinitely many poles in D̃× accumulating to the origin z̃ = 0̃, or even if such an
accumulation phenomenon does not occur, it may have an essential singularity at z̃ = 0̃.
Rather surprisingly, however, Theorem 1.3 excludes the possibility for a finite branch solution
to admit such transcendental phenomena. This result becomes more intriguing if we recall that
wild behaviors of a generic solution to Painlevé VI have been observed in [9, 12, 22, 31, 32] and
examples of solutions with infintely many poles accumulating to z̃ = 0 are given in [12, 31];
such a distribution of poles may be expected for a generic solution, though it is not rigorously
verified yet to the author’s knowledge. Thus we can think that a finite branch solution is quite
distinguished from generic solutions, necessarily being an algebraic branch solution.
I II III
{all algebraic branch solutions} i→֒ {all finite branch solutions} j→֒ {all solutions}
{some algebraic branch solutions} i
→֒ {all finite branch solutions}
I ′ II
Figure 1: Main idea for the proof of Theorem 1.3
The main idea for the proof of Theorem 1.3 is presented in Figure 1. We have natural
inclusions i and j in the top line of Figure 1 and we wish to show that the injection i is in fact
a surjection. Our strategy consists of the “upper bound part” and the “lower bound part”.
(1) Upper bound part: In this part we investigate the inclusion j : II →֒ III in Figure 1,
considering how the locus of finite branch solutions is included in the moduli space of
all solutions. In other words, we make a confinement of the locus II in the entire space
III. What we shall really do is not an upper bound estimation of this locus but rather a
pinpoint identification of it. This is the main part of the article and we use the algebraic
geometry of Painlevé VI, Riemann-Hilbert correspondence, geometry and dynamics on
cubic surfaces, and minimal resolutions of Kleinian singularities [15, 16, 17, 18, 19, 20].
(2) Lower bound part: In this part we fill in the diagram of Figure 1 by adding the bottom
line to the top one. We try to construct as many algebraic branch solutions as possible
in order to make the set I ′ as large as possible. The construction is based on the power
geometry technique developed in [5, 6, 7] and the convergence arguments in [10, 11].
We are done if the set I ′ is large enough to show that the injection i′ : I ′ →֒ II is in
fact a surjection. This does not mean that we verify the equality I ′ = II directly. (If
such a direct approach were feasible, then our problem would not be difficult from the
beginning!) Instead, we prove it very indirectly based on the following idea.
(3) Key trick: Suppose that a component A of I ′ injects into a component B of II. If the
cardinalities of A and B are finite and the same, then the injection i′ : A →֒ B is in
fact a surjection. If A and B are biholomorphic to C and the injection i′ : A →֒ B is
holomorphic, then it must be a surjection because any holomorphic injection C →֒ C is a
surjection (use Casorati-Weierstrass or Picard’s little theorem). The same argument holds
true if C is replaced by C×, since any holomorphic injection C× →֒ C× is a surjection (lift
it to the universal covering C →֒ C). These tricks enable us to identify the component
A ⊂ I ′ with the component B ⊂ II. We show that each component involved is either of
the three types mentioned above. Then we make this kind of argument componentwise
to get an identification I ′ = II, which leads to the desired coincidence I ′ = I = II.
In view of the way in which Theorem 1.3 is established, the power geometry technique
provides us with an efficient method of identifying all finite branch solutions (up to Bäcklund
transformations), which have now turned out to be algebraic branch solutions, by determining
the leading terms of their Puiseux-Laurent expansions.
In some sense this article is a counterpart of the previous paper [20] where an ergodic study
of Painlevé VI is developed (see also the survey [21]). Put z1 = 0, z2 = 1, z3 = ∞. For each
γ2 γ3
Figure 2: Three basic loops γ1, γ2, γ3 in Z = P
1 − {0, 1,∞}
{i, j, k} = {1, 2, 3}, let γi be a loop in Z surrounding zi once anti-clockwise and leaving zj and
zk outside as in Figure 2. Then the fundamental group π1(Z, z) is represented as
π1(Z, z) = 〈 γ1, γ2, γ3 | γ1γ2γ3 = 1 〉. (4)
A loop γ ∈ π1(Z, z) is said to be elementary if it is conjugate to γmi for some i ∈ {1, 2, 3} and
m ∈ Z; otherwise, it is said to be non-elementary. The main theme of [20] is the dynamics of
the nonlinear monodromy of PVI(κ) along a given loop γ. It is shown there that, along every
non-elementary loop, the nonlinear monodromy is chaotic and the number of its periodic points
grows exponentially as the period tends to infinity. On the other hand, it is Liouville integrable
along an elementary loop, in the sense that it preserves a Lagrangian fibration. Now we notice
that from the dynamical point of view the main problem of this article is nothing other than
discussing the periodic points of the nonlinear monodromy along the basic loop γi, which is
of course an elementary loop. In view of its integrable character, one may doubt if there is
something very deep with this issue. As Theorem 1.3 and Remark 1.4 show, however, this issue
is actually quite interesting from the function-theoretical point of view.
The plan of this article is as follows. In §2 the phase space of Painlevé VI is introduced as a
moduli space of stable parabolic connections. In §3 the Riemann-Hilbert correspondence from
the moduli space to an affine cubic surface is formulated and its character as an analytic minimal
resolution of Kleininan singularities is stated. In §4 the dynamical system on the cubic surface
representing the nonlinear monodromy of Painlevé VI is formulated and some preliminary
properties of it are given. In §5 we briefly review Bäcklund transformations and their relation
to the Riemann-Hilbert correspondence. In §6 fixed points and periodic points of the dynamical
system are discussed. A stratification of the parameter space K is also introduced in order to
describe the singularities of the cubic surfaces. In §7 a case-by-case study of fixed points and
periodic points is made according to the stratification, thereby a pinpoint identification of
finite branch solutions is made on each stratum. In §8 power geometry of algebraic differential
equations is applied to Painlevé VI in order to construct as many algebraic branch solutions as
possible. In §9 we consider the inclusion of those solutions constructed in §8 into the moduli
space of all finite branch solutions. After some preliminaries on Riccati solutions, we show that
this inclusion is in fact a surjection, thereby complete the proof of Theorem 1.3.
singularities t1 = 0 t2 = z t3 = 1 t4 = ∞
first exponent −λ1 −λ2 −λ3 −λ4
second exponent λ1 λ2 λ3 λ4 − 1
difference κ1 κ2 κ3 κ4
Table 1: Riemann scheme: κi is the difference of the second exponent from the first.
2 Phase Space
Equation (1) is only a fragmentary appearance of a more intrinsic object constructed algebro-
geometrically [16, 17, 18]. We review this construction following the expositions of [20, 21]. The
sixth Painlevé dynamical system PVI(κ) is formulated as a holomorphic, uniform, transversal
foliation on a fibration of certain smooth quasi-projective rational surfaces
πκ : M(κ) → Z := P1 − {0, 1,∞},
whose fiber Mz(κ) := π−1κ (z) over z ∈ Z, called the space of initial conditions at time z, is
realized as a moduli space of stable parabolic connections. The total space M(κ) is called
the phase space of PVI(κ). In this formulation, the uniformity of the Painlevé foliation, in
other words, the geometric Painlevé property of it is a natural consequence of a solution to
the Riemann-Hilbert problem (see Theorem 3.5), especially of the properness of the Riemann-
Hilbert correspondence [16]. Then equation (1) is just a coordinate expression of the foliation
on an affine open subset of M(κ) and the analytic Painlevé property for equation (1) is an
immediate consequence of the geometric Painlevé property for the foliation and the algebraicity
of the phase space M(κ). Moreover there exists a natural compactification Mz(κ) →֒ Mz(κ)
of the moduli space Mz(κ) into a moduli space Mz(κ) of stable parabolic phi-connections.
Here we include a very sketchy explanation of the terminology used in the last paragraph.
A stable parabolic connection is a Fuchsian connection equipped with a parabolic structure on
a (rank 2) vector bundle over P1 having a Riemann scheme as in Table 1, where the parabolic
structure corresponds to the first exponents, which satisfies a sort of stability condition in
geometric invariant theory. Here the parameter κi stands for the difference of the second
exponent from the first one at the regular singular point ti. On the other hand, a stable
parabolic phi-connection is a variant of stable parabolic connection allowing a “matrix-valued
Planck constant” called a phi-operator φ such that the generalized Leibniz rule
∇(fs) = df ⊗ φ(s) + f∇(s)
is satisfied, where the key point here is that the field φ may be degenerate or simi-classical.
Then the moduli space Mz(κ) can be compactified by adding some semi-classical objects, that
is, some stable parabolic phi-connections with degenerate phi-operator φ. There is the following
characterization of our moduli spaces (see Figure 3).
Theorem 2.1 ([16, 17, 18])
(1) The compactified moduli space Mz(κ) is isomorphic to an 8-point blow-up of the Hirze-
bruch surface Σ2 → P1 of degree 2.
Mz(κ)
Yz(κ) : vertical leaves
E1 E2 E3 E4
Figure 3: Nonlinear monodromy γ∗ : Mz(κ) 	 along a loop γ ∈ π1(Z, z)
(2) Mz(κ) has a unique effective anti-canonical divisor Yz(κ), which is given by
Yz(κ) = 2E0 + E1 + E2 + E3 + E4, (5)
where E0 is the strict transform of the section at infinity and Ei (i = 1, 2, 3, 4) is the strict
transform of the fiber over the point ti ∈ P1 of the Hirzebruch surface Σ2 → P1.
(3) The support of the divisor Yz(κ) is exactly the locus where the phi-operator φ is degenerate,
with the coefficients of formula (5) being the ranks of degeneracy of φ. In particular,
Mz(κ) = Mz(κ)− Yz(κ).
This theorem implies that Mz(κ) is a moduli-theoretical realization of the space of initial
conditions for PVI(κ) constructed “by hands” in [26], Mz(κ) is a generalized Halphen surface
of type D
4 in [30] and (Mz(κ),Yz(κ)) is an Okamoto-Painlevé pair of type D̃4 in [28].
Since the Painlevé foliation has the geometric Painlevé property [16], each loop γ ∈ π1(Z, z)
admits global horizontal lifts along the foliation and induces an automorphism
γ∗ : Mz(κ) → Mz(κ), Q 7→ Q′, (6)
called the nonlinear monodromy along the loop γ (see Figure 3). Note that a fixed point or a
periodic point of the map γ∗ : Mz(κ) 	 can be identified with a solution germ at z which is
single-valued or finitely many-valued along the loop γ, respectively.
3 Riemann-Hilbert Correspondence
Generally speaking, a Riemann-Hilbert correspondence is the map from a moduli space of
flat connections to a moduli space of monodromy representations, sending a connection to its
C1 C2 C3
0 z 1
Figure 4: Four loops in P1 − {0, z, 1,∞}
monodromy. In our situation an appropriate Riemann-Hilbert correspondence
RHz,κ : Mz(κ) → Rz(a), Q 7→ ρ, (7)
is formulated in [16, 17, 18]. For each a = (a1, a2, a3, a4) ∈ A := C4, let Rz(a) denote the
moduli space of Jordan equivalence classes of linear monodromy representations
ρ : π1(P
1 − {0, z, 1,∞}, ∗) → SL2(C),
with the prescribed local monodromy data Tr ρ(Ci) = ai (i = 1, 2, 3, 4), where Ci is a loop as in
Figure 4. Any stable parabolic connection Q ∈ Mz(κ), restricted to P1 − {0, z, 1,∞}, induces
a flat connection and determines the Jordan equivalence class ρ ∈ Rz(a) of its monodromy
representations, where the correspondence of parameters κ 7→ a is described as follows. If
−1πκi) (i = 0, 1, 2, 3),
− exp(
−1πκ4) (i = 4),
then b = (b0, b1, b2, b3, b4) belongs to the multiplicative space
B := { b = (b0, b1, b2, b3, b4) ∈ (C×)5 : b20b1b2b3b4 = 1 }.
The Riemann scheme in Table 1 then implies that the monodromy matrix ρ(Ci) has an eigen-
value bi for each i = 1, 2, 3, 4. Since ρ(Ci) ∈ SL2(C), its trace ai = Tr ρ(Ci) is given by
ai = bi + b
i (i = 1, 2, 3, 4). (9)
Given any θ = (θ1, θ2, θ3, θ4) ∈ Θ := C4θ, consider the affine cubic surface
S(θ) = {x ∈ C3x : f(x, θ) := x1x2x3 + x21 + x22 + x23 − θ1x1 − θ2x2 − θ3x3 + θ4 = 0}.
Then there exists an isomorphism of affine algebraic surfaces
Rz(a) → S(θ), ρ 7→ x = (x1, x2, x3), with xi = Tr ρ(CjCk)
for {i, j, k} = {1, 2, 3}, where the correspondence of parameters a 7→ θ is given by
aia4 + ajak ({i, j, k} = {1, 2, 3}),
a1a2a3a4 + a
1 + a
2 + a
3 + a
4 − 4 (i = 4).
w1 w2
w3 w4

−2 1 1 1 1
1 −2 0 0 0
1 0 −2 0 0
1 0 0 −2 0
1 0 0 0 −2

Figure 5: Dynkin diagram and Cartan matrix of type D
The composition of the sequence κ 7→ b 7→ a 7→ θ of the three maps (8), (9) and (10) is referred
to as the Riemann-Hilbert correspondence in the parameter level [16] and is denoted by
rh : K → Θ. (11)
Then the Riemann-Hilbert correspondence (7) is reformulated as a holomorphic map
RHz,κ : Mz(κ) → S(θ) with θ = rh(κ). (12)
The map (11) admits a remarkable affine Weyl group structure [16, 19], from which the
Bäcklund transformations of Painlevé VI emerge [15]. In view of formula (2) the affine space
K can be identified with the linear space C4 by the forgetful isomorphism K → C4, κ =
(κ0, κ1, κ2, κ3, κ4) 7→ (κ1, κ2, κ3, κ4), where the latter space C4 is equipped with the standard
(complex) Euclidean inner product. For each i ∈ {0, 1, 2, 3, 4}, let wi : K → K, κ 7→ κ′, be the
orthogonal reflection in the hyperplane { κ ∈ K : κi = 0}, which is explicitly represented as
κ′j = κj + κicij (i, j ∈ {0, 1, 2, 3, 4}), (13)
where C = (cij) is the Cartan matrix of type D
4 given in Figure 5. Then the group generated
by w0, w1, w2, w3, w4 is an affine Weyl group of type D
4 ) = 〈w0, w1, w2, w3, w4〉 y K.
corresponding to the Dynkin diagram in Figure 5. The reflecting hyperplanes of all reflections
in the group W (D
4 ) are given by affine linear relations
κi = m, κ1 ± κ2 ± κ3 ± κ4 = 2m+ 1 (i ∈ {1, 2, 3, 4}, m ∈ Z),
where the signs ± may be chosen arbitrarily. Let Wall be the union of all these hyperplanes.
Then the affine Weyl group structure on (11) is stated as follows [16] (see Figure 6).
Lemma 3.1 In terms of b ∈ B, the discriminant ∆(θ) of the cubic surfaces S(θ) factors as
∆(θ) =
(bl − b−1l )2
ε∈{±1}4
(bε − 1), (14)
where we put bε = bε11 b
4 for each quadruple sign ε = (ε1, ε2, ε3, ε4) ∈ {±1}4. The
Riemann-Hilbert correspondence in the parameter level (11) is a branched W (D
4 )-covering
ramifying along Wall and mapping Wall onto the discriminant locus ∆(θ) = 0 in Θ.
∆(θ) = 0
K-space Θ-spaceWall
Figure 6: Riemann-Hilbert correspondence in the parameter level
The singularity structure of the cubic surfaces S(θ) can be described in terms of the strati-
fication of K by proper Dynkin subdiagrams, which we now define.
Definition 3.2 Let I be the set of all proper subsets of {0, 1, 2, 3, 4} including the empty set
∅. For each element I ∈ I, we put
KI = the W (D(1)4 )-translates of the set { κ ∈ K : κi = 0 (i ∈ I) },
DI = the Dynkin subdiagram of D
4 that has nodes • exactly in I.
Let KI be the set obtained from KI by removing the sets KJ with #J = #I + 1. Then it
turns out that we have either KI = KI′ or KI ∩ KI′ = ∅ for any distinct subsets I, I ′ ∈ I
(see Remark 3.3). So we can think of the stratification of K by the subsets KI (I ∈ I), called
the W (D
4 )-stratification, where each KI is referred to as a W (D
4 )-stratum. For example, if
I = ∅, one has the big open K∅ = K −Wall. Other examples of W (D()14 )-strata are given in
Figure 7. The diagram DI encodes not only its underlying abstract Dynkin type but also the
inclusion pattern DI →֒ D(1)4 , a kind of marking. The abstact Dynkin type of DI is denoted by
Dynk(I). All the feasible abstract Dynkin types are tabulated in Table 2.
There is a mistake in the definition of KI in [16, Definition 9.3] and [21], which is now
corrected in Definition 3.2. (As for [16], correction may be possible before it is published.)
Remark 3.3 Let I and I ′ be distinct elements of I. If Dynk(I) 6= Dynk(I ′), then KI∩KI′ = ∅.
On the other hand, if KI = KI′ then Dynk(I) = Dynk(I ′) must be of abstract type A1 or A2.
D4 A⊕41 A3
I = {0, 1, 2, 3} I = {1, 2, 3, 4} I = {0, 1, 2}
Figure 7: Examples of W (D
4 )-strata
number of nodes 4 3 2 1 0
abstract D4 A3 A2 A1 ∅
Dynkin type A⊕41 A
1 − −
Table 2: Feasible abstract Dynkin types
(1) There is a unique W (D
4 )-stratum of abstract type ∅, or A1, or A2, or A⊕41 .
(2) There are six W (D
4 )-strata of abstract type A
1 or A3.
(3) There are four W (D
4 )-strata of abstract type A
1 or D4.
Example 3.4 We consider the W (D
4 )-strata of abstract types A
1 and D4.
(1) The unique W (D
4 )-stratum of abstract type A
1 exactly corresponds to the value θ =
(0, 0, 0,−4). A parameter κ ∈ K lies in this stratum if and only if either
(a) κ1, κ2, κ3, κ4 ∈ Z, κ1 + κ2 + κ3 + κ4 ∈ 2Z; or
(b) κ1, κ2, κ3, κ4 ∈ Z+ 1/2.
(2) The four W (D
4 )-strata of abstract type D4 exactly correspond to the values θ =
(8ε1, 8ε2, 8ε3, 28), where ε = (ε1, ε2, ε3) ∈ {±}3 ranges over all triple signs such that
ε1ε2ε3 = 1. A parameter κ ∈ K lies in the union of these W (D(1)4 )-strata if and only if
κ1, κ2, κ3, κ4 ∈ Z, κ1 + κ2 + κ3 + κ4 ∈ 2Z+ 1.
With this stratification, we have a very neat solution to the Riemann-Hilbert problem.
Theorem 3.5 ([16, 17, 18]) Given any κ ∈ K, put θ = rh(κ) ∈ Θ. Then,
(1) if κ ∈ KI then S(θ) has Kleinian singularities of Dynkin type DI ,
(2) the Riemann-Hilbert correspondence (12) is a proper surjective map that is an analytic
minimal resolution of Kleinian singularities.
If κ ∈ K −Wall then the surface S(θ) is smooth and RHz,κ is a biholomorphism, while if
κ ∈ Wall, it is not a biholomorphism but only gives a resolution of singularities (proper and
surjective, but not injective). For example, see Figure 8 for the case κ = (0, 0, 0, 0, 1) where
a singularity of type D4 occurs. In the latter case, however, if we take a standard algebraic
minimal resolution of Kleinian singularities as constructed by Brieskorn [4] and others,
ϕ : S̃(θ) → S(θ) (15)
moduli space cubic surface
RHz,κ
resolution of
singularity
Mz(κ)
D4Ez(κ)
Figure 8: Resolution of singularities by Riemann-Hilbert correspondence
then we can lift the Riemann-Hilbert correspondence (12) to have a commutative diagram
Mz(κ)
gRHz,κ−−−→ S̃(θ)
Mz(κ)
RHz,κ−−−→ S(θ).
The lifted Riemann-Hilbert correspondence R̃Hz,κ is a biholomorphism and hence gives a strict
conjugacy between the nonlinear monodromy (6) of PVI(κ) and a certain automorphism
g̃ : S̃(θ) → S̃(θ). (17)
This latter map will be described explicitly in Section 4 (see Theorem 4.1).
The singularity structure of the affine cubic surface S(θ) is closely related to the Riccati
solutions to PVI(κ) [16], where a Riccati solution is a particular solution that arises from the
Riccati equation associated to a Gauss hypergeometric equation. Let Ez(κ) ⊂ Mz(κ) be the
exceptional set of the resolution of singularities by the Riemann-Hilbert correspondence (12).
Similarly, let E(θ) ⊂ S̃(θ) be the exceptional set of the algebraic resolution of singularities (15).
Theorem 3.6 ([16, 29]) Equation PVI(κ) admits Riccati solutions if and only if κ ∈ Wall.
All Riccati solution germs at time z ∈ Z are parametrized by the exceptional set Ez(κ) ⊂ Mz(κ),
which precisely corresponds to the exceptional set E(θ) ⊂ S̃(θ) through the lifted Riemann-Hilbert
correspondence (16).
Fot this reason we may refer to Ez(κ) and M◦z(κ) := Mz(κ) − Ez(κ) as the Riccati locus and
non-Riccati locus of Mz(κ) respectively. They are invariant under the action of the nonlinear
monodromy (6). Corresponding to them, let Sing(θ) and S◦(θ) := S(θ)−Sing(θ) be the singular
locus and the smooth locus of the cubic surface S(θ) respectively.
Remark 3.7 Two remarks are in order at this stage.
(1) By Theorem 3.5 the Riemann-Hilbert correspondence (12) restricts to a biholomorphism
RH◦z,κ : M◦z(κ) → S◦(θ) (18)
between the non-Riccati locus of Mz(κ) and the smooth locus of S(θ), while it collapses
the Riccati locus Ez(κ) to the singular locus Sing(θ). In order to resolve this degeneracy
and obtain an isomorphism, we had to take the lifted Riemann-Hibert correspondence
(16), which induces an isomorphism between the exceptional sets Ez(κ) and E(θ).
(2) For the Riccati solutions the main problem of this article is trivial; if a Riccati solution
is a finite branch solution around a fixed singular point, then it is an algebraic branch
solution, because the Riccati solution is (essentially) the logarithmic derivative of a Gauss
hypergeometric function. Thus we may restrict our attention to the non-Riccati locus.
4 Dynamics on Cubic Surface
We shall describe the strict conjugacy (17) of the nonlinear monodromy (6). For a cyclic
permutation (i, j, k) of (1, 2, 3) we define an isomorphism gi : S(θ) → S(θ′), (x, θ) 7→ (x′, θ′) by
gi : (x
j , x
j , θ
k) = (θj − xj − xkxi, xi, xk, θj , θi, θk). (19)
Through the resolution of singularities (15), the map gi is uniquely lifted to an isomorphism
g̃i : S̃(θ) → S̃(θ′), (i = 1, 2, 3).
We remark that the square g2i is an automorphism of S(θ) with g̃2i being its lift to S̃(θ).
Theorem 4.1 ([16]) For each i ∈ {1, 2, 3} the nonlinear monodromy γi∗ : Mz(κ) 	 along the
i-th basic loop γi is strictly conjugated to the automorphism g̃
i : S̃(θ) 	 via the lifted Riemann-
Hilbert correspondence (16). More generally, if γ ∈ π1(Z, z) is represented by γ = γε1i1 γ
· · ·γεnin
with (i1, . . . , in) ∈ {1, 2, 3}n and (ε1, . . . , εn) ∈ {±1}n, then the map (17) is given by
g̃ = g̃2ε1i1 g̃
· · · g̃2εnin .
Let F̃ixj(θ) be the set of all fixed points of the transformation g̃
j : S̃(θ) 	. Moreover,
for any integer n > 1, let P̃erj(θ;n) be the set of all periodic points of prime period n of the
transformation g̃2j : S̃(θ) 	. Theorem 4.1 then implies that all single-valued solution germs and
all n-branch solution germs to PVI(κ) around the fixed singular point zj are parametrized by
the sets F̃ixj(θ) and P̃erj(θ;n) respectively. By Remark 3.7, considering F̃ixj(θ) and P̃erj(θ;n)
upstairs is the same thing as considering Fixj(θ) and Perj(θ;n) downstairs, except for the ex-
ceptional locus upstairs and the singular locus downstairs. Here Fixj(θ) and Perj(θ;n) denote
the set of all fixed points and the set of all periodic points of prime period n of the transfor-
mation g2j : S(θ) 	 downstairs. In order to make the situation more transparent, we begin by
investigating simultaneous fixed points of g21, g
3 downstairs.
Theorem 4.2 If Fix(θ) is the set of all simultaneous fixed points of g21, g
3 : S(θ) 	, then
Fix(θ) = Sing(θ). (20)
Proof. A point x ∈ S(θ) is a singular point of the surface S(θ) if and only if its gradient vector
field y(x, θ) = (y1(x, θ), y2(x, θ), y3(x, θ)) vanishes at the point x, where
yi(x, θ) :=
(x, θ) = 2xi + xjxk − θi (21)
On the other hand, an inspection of formula (19) readily shows that x ∈ S(θ) is a simultaneous
fixed point of g21, g
3 if and only if x is a common root of equations
f(x, θ) = y1(x, θ) = y2(x, θ) = y3(x, θ) = 0. (22)
Then the equality (20) immediately follows from these observations. ✷
As is announced in [16], this theorem yields a characterization of the rational solutions.
Corollary 4.3 Any single-valued global solution to PVI(κ) is a rational Riccati solution.
Proof. If a single-valued solution Q ∈ Mz(κ) belongs to the non-Riccati locus Q ∈ M◦z(κ),
then the Riemann-Hilbert correspondence (18) sends Q to a smooth point x ∈ S◦(θ). Since the
single-valued solution Q is a simultaneous fixed point of the nonlinear monodromies γ1∗, γ2∗,
γ3∗, the corresponding point x must lie in Fix(θ). Then Theorem 4.2 implies that x ∈ Sing(θ),
which contradicts the fact that x ∈ S◦(θ). Hence any single-valued solution is a Riccati solution.
Since any Riccati solution is (essentially) the logarithmic derivative of a Gauss hypergeometric
function, any single-valued Riccati solution must be a rational solution. ✷
All the rational solutions to Painlevé VI are classified in [25]. We come back to our discussion
downstairs and give a simple characterization of the sets Fixj(θ) and Perj(θ;n).
Lemma 4.4 Let x = (x1, x2, x3) ∈ S(θ) be any point and let n be any integer > 1.
(1) x ∈ Fixj(θ) if and only if x is a root of equations
f(x, θ) = yj(x, θ) = yk(x, θ) = 0. (23)
(2) x ∈ Perj(θ;n) if and only if there exists an integer 0 < m < n coprime to n such that
f(x, θ) = 0, xi = 2 cos(πm/n). (24)
Proof. We put (x′, θ′) = gj(x, θ) and y
′ = y(x′, θ′). Then formula (19) yields
y′i = yi − xjyk, y′j = −yk, y′k = yj − xiyk. (25)
For each integer n ∈ Z we write (x(n), θ(n)) = gni (x, θ) and y(n) = y(x(n), θ(n)). From formulas
(19) and (25), we can easily obtain three recurrence relations
(n+2)
j + xi y
(n+1)
j + y
j = 0, (26)
(n+2)
j − x
j = y
(n+2)
j , (27)
(n+1)
k = x
j . (28)
The characteristic equation of the recurrence relation (26) is the quadratic equation
λ2 + xi λ+ 1 = 0, (29)
the roots of which are denoted by α and β = α−1. Since αβ = 1, we may and shall assume
that |α| ≥ 1 ≥ |β| > 0 in the sequel. The discussion is divided into two cases.
Case xi ∈ C − {±2}: In this case, the roots α and β are distinct and different from ±1
and the recurrence relation (26) is settled as
βn(αyj + yk)− αn(βyj + yk)
α− β .
Then it follows from (27) and (28) that the sequences x
j and x
k are determined as
j = x
(2n+1)
k = p α
2n + q β2n + r1,
(2n+1)
j = x
k = p α
2n+1 + q β2n+1 + r2,
where the constants p, q, r1 and r2 are given by
p = − α
2(βyj + yk)
(α − β)(α2 − 1) , q =
β2(αyj + yk)
(α− β)(β2 − 1) ,
r1 = xj − p− q, r2 = x′j − αp− βq.
Notice that p = q = 0 if and only if x satisfies equations (23). Indeed, the condition p = q = 0
is equivalent to αyj + yk = βyj + yk = 0, which is equivalent to the condition yj = yk = 0,
because the roots α and β are distinct.
Now we assume that x is a root of equations (23). Then (30) implies that the sequence x(n)
is periodic of period two, that is, x is a fixed point of g2j . Next we assume that x is not a root
of equations (23). If x is a periodic point of g2j of prime period n ≥ 1, then (30) yields
j − xj = (α2n − 1)(p− qβ2n) = 0,
k − xk = (α2n − 1)(pα− qβ2n+1) = 0.
Here it cannot happen that p− qβ2n = pα− qβ2n+1 = 0. Indeed, otherwise, we have p = qβ2n
and q(1 − β2) = 0. Since at least one of p and q is nonzero, we have β ∈ {±1} and hence
xi ∈ {±2}, which contradicts the assumption that xi 6∈ {±2}. Therefore, α2n = 1, that is, α
is a primitive 2n-th root of unity. Note that n ≥ 2 since α 6∈ {±1}. Thus there is an integer
0 < m < n comprime to n such that α = exp(πim/n) and so xi = α + α
−1 = 2 cos(πm/n),
which leads to condition (24). Conversely, if condition (24) is satisfied, then it is easy to see
that x is a periodic point of g2j of prime period n.
Case xi ∈ {±2}: In this case we have xi = −2ε for some sign ε ∈ {±1} and hence
equation (29) has a double root α = β = ε. Then the recurrence equation (26) is settled as
j = ε
n{yj − n(yj + εyk)}. If the sequence x(n) is periodic, then so is the sequence y(n)j . This
is the case if and only if yj + εyk = 0. Conversely, if this condition is satisfied, then we have
j = ε
nyj. Substituting this equation into (27) yields
j = x
(2n+1)
k = xj + nyj,
(2n+1)
j = x
(2n+2)
k = x
j + εnyj.
Hence the sequence x(2n) is periodic if and only if yj = yk = 0, namely, if and only if x is a root
of (23). In this case x is a fixed point of g2j . ✷
In order to give the relation between the fixed points upstairs and those downstairs, we put
Fix◦j(θ) := Fixj(θ)− Sing(θ), F̃ix
j (θ) := F̃ixj(θ)− E(θ), F̃ix
j(θ) := F̃ixj(θ) ∩ E(θ).
For the periodic points of prime period n > 1, we define Per◦j(θ;n), P̃er
j(θ;n) and P̃er
j(θ;n) in
a similar manner. Then there exist direct sum decompositions
F̃ixj(θ) = F̃ix
j (θ)∐ F̃ix
j(θ), P̃erj(θ;n) = P̃er
j (θ;n)∐ P̃er
j(θ;n),
where the exceptional components F̃ix
j(θ) and P̃er
j(θ;n) parametrize the single-valued Riccati
solutions and the n-branched Riccati solutions around the fixed singular point zj respectively.
Lemma 4.5 The minimal resolution (15) induces an isomorphism
ϕ : F̃ix
j (θ) → Fix◦j (θ). (32)
For any n > 1 we have Per(θ;n)∩Sing(θ) = ∅, that is, Per◦j(θ;n) = Perj(θ;n), and the minimal
resolution (15) induces an isomorphism
ϕ : P̃er
j (θ;n) → Perj(θ;n) (n > 1). (33)
Proof. The isomorphism (32) is trivial from the definition. The assertion Per(θ;n)∩Sing(θ) = ∅
follows from (20). Then the isomorphism (33) is again trivial from the definition. ✷
The fixed point set and the periodic point set, upstairs or downstairs, will be investigated
more closely in §6. For this purpose it is convenient to consider the symmetric group S4 of
degree 4 acting on K by permuting the entries κ1, κ2, κ3, κ4 of κ ∈ K and fixing κ0. Through
the Riemann-Hilbert correspondence in the parameter level, rh : K → Θ, the action S4 y K
induces an action of S3 ⋉ Kl on Θ, where Kl is Klein’s 4-group realized as the group of even
triple signs, Kl = {ε = (ε1, ε2, ε3) ∈ {±1}3 : ε1ε2ε3 = 1}, acting on Θ by the sign changes
(θ1, θ2, θ3, θ4) 7→ (ε1θ1, ε2θ2, ε3θ3, θ4), while S3 acts on Θ by permuting the entries θ1, θ2, θ3 of
θ ∈ Θ and fixing θ4. This construction defines an isomorphism of groups
S4 ∼= S3 ⋉Kl, σ 7→ (τ, ε), (34)
with respect to which the map rh : K → Θ becomes S4-equivariant. Viewed as a subgroup of
S4, Klein’s 4-group is the permutation group Kl = {1, (14)(23), (24)(31), (34)(12)}.
Let σ ∈ S4 act on x = (x1, x2, x3) in the same manner as it does on (θ1, θ2, θ3). Then the
polynomial f(x, θ) is σ-invariant and hence σ induces an isomorphism of algebraic surfaces,
σ : S(θ) → S(σ(θ)). As for the action g2j : S(θ) 	, we have the commutative diagram
j−−−→ S(θ)
S(σ(θ)) −−−→
S(σ(θ)),
(A3)i (A
Figure 9: W̃ (D
4 )-strata (A3)i and (A
for any element σ ∈ S4 with τ ∈ S3 determined by (34). It induces isomorphisms
σ : Fixj(θ) → Fixτ(j)(σ(θ)), σ : Perj(θ;n) → Perτ(j)(σ(θ);n),
which, via the minimal resolution (15), lift up to isomorphisms
σ̃ : F̃ixj(θ) → F̃ixτ(j)(σ(θ)), σ̃ : P̃erj(θ;n) → P̃erτ(j)(σ(θ);n).
The action of the symmetric group S4 on K mentioned above is just induced from its action
on the index set {0, 1, 2, 3, 4} fixing the element 0, namely, from the realization of S4 as the
automorphism group of the Dynkin diagram D
4 . By taking the semi-direct product by the
symmetric group S4 or by Klein’s 4-group Kl, we can enlarge the affine Weyl group W (D
to the affine Weyl group of type F
4 or to the extended affine Weyl group of type D
4 ) = S4 ⋉W (D
4 ) ⊃ W̃ (D
4 ) = Kl⋉W (D
Definition 4.6 Replacing the group W (D
4 ) with W (F
4 ) in Definition 3.2, we can define
a coarser stratification of K than the W (D(1)4 )-stratification, called the W (F
4 )-stratification.
Moreover, replacingW (D
4 ) with W̃ (D
4), we can also think of a stratification of K intermediate
between these two stratifications, called the W̃ (D
4 )-stratification.
The following is the classification of the W (F
4 )-strata and W̃ (D
4 )-strata.
Lemma 4.7 For each abstract Dynkin type ∗ in Table 2, there is a unique W (F (1)4 )-stratum of
type ∗. As for the W̃ (D(1)4 )-strata, we have the following classification (see also Figure 9).
(1) For ∗ ∈ {D4, A⊕41 , A⊕31 , A2, A1, ∅}, there is a unique W̃ (D
4 )-stratum of abstract type ∗
and this unique stratum is denoted by the same symbol ∗.
(2) For ∗ ∈ {A3, A⊕21 }, there are exactly three W̃ (D
4 )-strata of abstract type ∗;
(a) for ∗ = A3, the stratum (A3)i represented by I = {0, j, k} with {i, j, k} = {1, 2, 3};
(b) for ∗ = A⊕21 , the stratum (A⊕21 )i represented by I = {j, k} with {i, j, k} = {1, 2, 3}.
(A⊕21 )i
(A3)i
Figure 10: Adjacency relations among W̃ (D
4 )-strata (i = 1, 2, 3)
If something about the transformation g2j is discussed for a fixed index j, the relevant
stratification is the W̃ (D
4 )-stratification. Namely we may discuss the issue on each W̃ (D
stratum, choosing any representative of each W̃ (D
4 )-orbit, since in the commutative diagram
(35) we have τ(j) = j and hence g2
= g2j for every σ ∈ Kl (see also Remark 5.1). For two
W̃ (D
4 )-strata, say ∗ and ∗∗, we write ∗ → ∗∗ if the stratum ∗∗ lies on the boundary of the
stratum ∗. All the possible adjacency relations ∗ → ∗∗ are depicted in Figure 10. Note that
there are no adjacency relations between (A⊕21 )i and (A3)j for any distinct i, j ∈ {1, 2, 3}.
5 Bäcklund Transformations
In this section we briefly discuss Bäcklund transformations, especially the characterization of
them in terms of Riemann-Hilbert correspondence [15, 16]. This topic is included here in order
to confirm that our problem may be treated modulo Bäcklund transformations.
For each σ ∈ S4 we define the isomorphism of affine cubic surfaces
σ : S(θ) → S(σ(θ)), (x1, x2, x3) 7→ (ετ(1)xτ(1), ετ(2)xτ(2), ετ(3)xτ(3)),
where σ ∈ S4 is identified with (τ, ε) ∈ S3 ⋉ Kl via the isomorphism (34). Consider the
natural homomorphism W (F
4 ) = S4 ⋉ W (D
4 ) → S4, w 7→ σ. Since the Riemann-Hilbert
correspondence (12) is an analytic minimal resolution of singularities, for each w ∈ W (F (1)4 ),
there exists an analytic isomorphism w : Mz(κ) → Mz(w(κ)) such that the diagram
Mz(κ)
w−−−→ Mz(w(κ))
RHz,κ
yRHz,w(κ)
S(θ) −−−→
S(σ(θ))
is commutative, for any fixed κ ∈ K with θ = rh(κ) ∈ Θ.
The commutative diagram (36) characterizes the Bäcklund transformations of Painlevé VI.
Namely the map w : Mz(κ) → Mz(w(κ)) turns out to be algebraic and there are suitable affine
coordinates on Mz(κ) and Mz(w(κ)) in terms of which the map w can be represented by the
usual formula for Bäcklund transformations known as birational canonical transforamtions [27]
(see [15, 16] for the precise statement). In other words the Riemann-Hilbert correspondence is
equivariant under the Bäcklund transformations and so is our main problem.
Remark 5.1 The S4-factor of W (F
4 ) = S4 ⋉W (D
4 ) or more strictly the S3-factor of S4 =
S3⋉Kl permutes the three fixed singular points 0, 1 and ∞, while they are fixed by W̃ (D(1)4 ) =
Kl ⋉ W (D
4 ). Hence we may consider our problem only around the origin z = 0 and, upon
restricting our attention to z = 0, we may discuss it modulo the Bäcklund action of W̃ (D
6 Fixed Points and Periodic Points
We shall more closely investigate the fixed point set Fixj(θ), or rather its subset Fix
j (θ)
j (θ) of smooth fixed points, by solving the system of equations (23). In view of (21) the last
two equations in (23) are expressed as a linear system for the unknowns (xj , xk),
2xj + xixk = θj ,
xixj + 2xk = θk,
If its determinant 4− x2i is nonzero, then system (37) is uniquely settled as
2θj − xiθk
4− x2i
, xk =
2θk − xiθj
4− x2i
. (38)
Substituting (38) into equation f(x, θ) = 0 yields a quartic equation for the unknown xi,
x4i − θix3i + (θ4 − 4)x2i + (4θi − θjθk)xi + θ2j + θ2k − 4θ4 = 0. (39)
Conversely, if xi is a root of equation (39) with nonzero x
i − 4, then subsituting this into
formula (38) yields a root of system (23). The four roots of quartic equation (39) are given by
F (bi, b4; bj , bk), F (bi, b
4 ; bj, bk), F (bj, bk; bi, b4), F (bj , b
k ; bi, b4),
counted with multiplicities, where F (bi, b4; bj , bk) is defined by
F (bi, b4; bj , bk) = bib4 + b
4 . (40)
We pick up the root xi = F (bi, b4; bj, bk). Note that F (bi, b4; bj, bk)
2 − 4 is nonzero precisely
when b2i b
4 6= 1. If this is the case, then substituting xi = F (bi, b4; bj , bk) into formula (38) yields
xj = G(bi, b4; bj , bk) and xk = G(bi, b4; bk, bj), where G(bi, b4; bj , bk) is defined by
G(bi, b4; bj , bk) =
(bi + b4)(bj + bk)(bjbk + 1)
2(bib4 + 1)bjbk
(bi − b4)(bj − bk)(bjbk − 1)
2(bib4 − 1)bjbk
. (41)
Therefore, if P (bi, b4; bj, bk) denotes the point defined by
xi = F (bi, b4; bj, bk), xj = G(bi, b4; bj , bk), xk = G(bi, b4; bk, bj),
then x = P (bi, b4; bj , bk) gives a root of system (23) with nonzero x
i −4 provided that b2i b24 6= 1.
If x is at this root, then yi(x, θ) admits the following nice factorization
yi(x, θ) =
(bi − b−1i )(b4 − b−14 )
(b2i b
4 − 1)2
(εj ,εk)∈{±1}
k b4 − 1)
= (bib4 − b−1i b−14 )−2 {F (bi, b4; bj, bk)− F (bi, b−14 ; bj, bk)}
{F (bi, b4; bj, bk)− F (bj , bk ; bi, b4)}
{F (bi, b4; bj, bk)− F (bj , b−1k ; bi, b4)}.
label fixed point existence smoothness condition
1 P (bi, b4; bj , bk) κi + κ4 6∈ Z κi 6∈ Z, κ4 6∈ Z, κi + κ4 ± κj ± κk 6∈ 2Z+ 1
2 P (bi, b
4 ; bj, bk) κi − κ4 6∈ Z κi 6∈ Z, κ4 6∈ Z, κi − κ4 ± κj ± κk 6∈ 2Z+ 1
3 P (bj, bk; bi, b4) κj + κk 6∈ Z κj 6∈ Z, κk 6∈ Z, κj + κk ± κi ± κ4 6∈ 2Z+ 1
4 P (bj, b
k ; bi, b4) κj − κk 6∈ Z κj 6∈ Z, κk 6∈ Z, κj − κk ± κi ± κ4 6∈ 2Z+ 1
Table 3: Smooth fixed points x ∈ Fix◦j(θ) with nonzero x2i − 4
Hence P (bi, b4; bj, bk) is a smooth point of S(θ) if and only if F (bi, b4; bj , bk) is a simple root of
equation (39). In terms of κ ∈ K, the existence and smoothness conditions for P (bi, b4; bj , bk)
are given by κi + κ4 6∈ Z and κi 6∈ Z, κ4 6∈ Z, κi + κ4 ± κj ± κk 6∈ 2Z+ 1, respectively.
Lemma 6.1 The smooth fixed points x ∈ Fix◦j (θ) with nonzero x2i −4 are precisely those points
in Table 3 which satisfy the existence and smoothness conditions mentioned there.
The fixed points in Table 3 is closely related to the configuration of lines on the affine cubic
surface S(θ) or on its compactification S(θ) by the standard embedding
S(θ) →֒ S(θ) ⊂ P3, x = (x1, x2, x3) 7→ [1 : x1 : x2 : x3],
where the projective cubic surface S(θ) is defined by the homogeneous equation
F (X, θ) := X1X2X3 +X0(X
3 )−X20 (θ1X1 + θ2X2 + θ3X3) + θ4X30 = 0.
It is obtained from the affine surface S(θ) by adding three lines at infinity
Li = {X ∈ P3 : X0 = Xi = 0 } (i = 1, 2, 3),
whose union L = L1 ∪ L2 ∪ L3 is called the tritangent lines at infinity.
It is well known that a smooth projective cubic surface has exactly 27 lines on it. We
describe them in the current situation [20]. Let Li(bi, b4; bj, bk) be the line in P
3 defined by
Xi = (bib4 + b
4 )X0, Xj + (bib4)Xk = {bi(bk + b−1k )}+ b4(bj + b−1j )X0. (43)
1 L+i1 = Li(bi, b4; bj, bk) L
i1 = Li(b
i , b
4 ; bj, bk)
2 L+i2 = Li(bi, b
4 ; bj, bk) L
i2 = Li(b
i , b4; bj, bk)
3 L+i3 = Li(bj , bk; bi, b4) L
i3 = Li(b
j , b
k ; bi, b4)
4 L+i4 = Li(bj , b
k ; bi, b4) L
i4 = Li(b
j , bk; bi, b4)
Table 4: Eight lines intersecting the line Li at infinity, divided into four pairs
L−24L
L+11 L
L−12 L
Figure 11: The 27 lines on a smooth cubic surface viewed from the tritangent lines at infinity
For each i ∈ {1, 2, 3} the eight lines in Table 4 are the only lines on S(θ) that intersect the i-th
line Li at infinity, but they do not intersect the remaining two lines Lj and Lk at infinity. These
lines are divided into four pairs as in Table 4. The surface S(θ) is always smooth at infinity
[20] and hence, if κ ∈ K−Wall, then S(θ) is smooth everywhere. In this case, the two lines in
the same pair intersect, while two lines from different pairs do not. The intersection point of
the i-th pair is exactly the i-th fixed point in Table 3. See Figure 11 for a total image of these
situations. Caution: for a pair of distinct indices i and j, the intersection relations between L±iµ
and L±jν are not depicted in the Figure 11. We also remark that in some degenerate cases the
lines L+iµ and L
iµ may meet in a point on the line Li at infinity.
Next we consider the case where the determinant 4 − x2i of system (37) vanishes. In other
words we ask when the fixed point set Fixj(θ) contains points x such that xi ∈ {±2}.
Lemma 6.2 Fixj(θ) contains a point x such that xi = 2δ with δ ∈ {±1} if and only if either
(1) bib4 = bib
4 = δ; or
(2) bjbk = bjb
k = δ; or
(3) bib
4 = bjb
k = δ for some double sign (εk, ε4) ∈ {±1}2.
If this is the case, then θk = δθj and all such poins x are exactly those points on the line
ℓδj := { xi = 2δ, xj + δxk = θj/2 }. (44)
In particular ℓδj ⊂ Fixj(θ) precisely when xi = 2δ is a multiple root of the quartic equation (39).
xi ∈ {±2} multiplicity component remark
no simple smooth point intersection point of L±iµ
no multiple singular point Riccati locus
yes multiple line ℓ+j or ℓ
j line contains singular points
yes simple empty L±iµ intersects at infinity
Table 5: The roots of quartic equation (39) and the components of Fixj(θ)
Proof. If xi = 2δ with δ ∈ {±1} then system (37) is linearly dependent, so that θk − δθj = 0.
However, since θk − δθj = (bibj)−1(bib4 − δ)(bib−14 − δ)(bjbk − δ)(bjb−1k − δ), we have either
4 = δ for some sign ε4 ∈ {±1} or bjb
k = δ for some sign εk ∈ {±1}. Taking the equation
xj + δxk = θj/2 into account, we observe that f(x, θ) factors as
f(x, θ) =
−(2bibj)−2(bib−ε44 − δ)2(bjbk − δ)2(bjb−1k − δ)2 (if bib
4 = δ),
−(2bibj)−2(bjb−εkk − δ)2(bib4 − δ)2(bib−14 − δ)2 (if bjb
k = δ).
If bib
4 = δ then equation f(x, θ) = 0 yields either bib
4 = δ or bjb
k = δ for some sign
εk ∈ {±1}; the former case falls into case (1) while the latter falls into case (3). In a similar
manner the other case bjb
k = δ falls into case (2) or case (3).
Next, if Fixj(θ) contains the line ℓ
j , then what we have just proved implies that
F (bi, b4; bj, bk) = F (bi, b
4 ; bj, bk) = 2δ if condition (1) is satisfied;
F (bj, bk; bi, b4) = F (bj , b
k ; bi, b4) = 2δ if condition (2) is satisfied;
F (bi, b
4 ; bj, bk) = F (bj , b
k ; bi, b4) = 2δ if condition (3) is satisfied.
Hence xi = 2δ is a multiple root of the quartic equation (39). Conversely, if xi = 2δ is a
multiple root of (39), then we can trace the argument backwards to conclude that the system
(23) admits the line solution ℓδj , that is, Fixj(θ) contains ℓ
j . ✷
Summarizing the arguments so far yields a classification of the irreducible components of
the algebraic set Fixj(θ) in terms of certain roots of quartic equation (39).
Theorem 6.3 Any irreducible component of Fixj(θ) is just a single point or a single affine line;
the former is called a point component and the latter is called a line component respectively. The
irreducible components of Fixj(θ) are in one-to-one correspondence with those roots of quartic
equation (39) which are not a simple root x = (x1, x2, x3) such that xi ∈ {±2}.
(1) A simple root with xi 6∈ {±2} corresponds to a point component that is a smooth point of
the surface S(θ) and is given in Table 3.
(2) A multiple root with xi 6∈ {±2} corresponds to a point component that is a singular point
of the surface S(θ) and is associated with Riccati solutions.
(3) A multiple root with xi ∈ {±2} corresponds to a line component; either ℓ+j or ℓ−j .
(4) A simple root with xi ∈ {±2} corresponds to no component of Fixj(θ).
A summary of Theorem 6.3 is given in Table 5 and the following remark may be helpful.
Remark 6.4 The assertions (3) and (4) of Theorem 6.3 may be well understood through the
degeneration of line configration on the projective surface S(θ) as the parameter θ = rh(κ)
tends to a special position. For a generic value of θ the lines L±iµ intersect in a single (smooth)
point on the affine part S(θ) of S(θ). If the parameter θ tends to a special position so that
a corresponding root xi of quartic equation (39) approaches {±2}, then the two line L±iµ are
getting “parallel” and eventually either coincide completely or meet in a point at infinity. The
former case falls into assertion (3) and the latter case falls into assertion (4) respectively.
Let us investigate more closely the case where Fixj(θ) contains line components.
Lemma 6.5 Let θ = rh(κ) with κ ∈ K and (i, j, k) be any cyclic permutation of (1, 2, 3).
(1) Fixj(θ) contains either ℓ
j or ℓ
j but not both of them if and only if κ lies in a W̃ (D
stratum appearing in the following adjacency diagram (see also Figure 10) :
(A⊕21 )i −−−→ A⊕31y
(A3)i −−−→ D4
(2) Fixj(θ) contains both ℓ
j and ℓ
j if and only if θ = (0, 0, 0,−4), that is, precisely when κ
is in the W̃ (D
4 )-stratum of type A
1 . In this case one has Fixj(θ) = ℓ
j ∐ ℓ−j .
Proof. Lemma 6.2 implies that Fixj(θ) contains at least one of ℓ
j and ℓ
j if and only if either (a)
bib4 = bib
4 ∈ {±1}; or (b) bjbk = bjb−1k ∈ {±1}; or (c) bib
4 = bjb
k ∈ {±1} for some double
sign (εk, ε4) ∈ {±1}2. This property is invariant under the action of W̃ (D(1)4 ) = Kl⋉W (D
on K. Using this action we can reduce conditions (a) and (c) to condition (b). First, observe
that the permutation (i, j)(k4) ∈ Kl induces the map (b0, bi, bj , bk, b4) 7→ (b0, bj , bi,−b4,−bk),
which reduces condition (a) to condition (b). Next, formula (13) implies that the reflection wi
induces the multiplicative transformation wi : B → B, b 7→ b′, where
b′j =
−bjbciji (i = 4, j = 0),
i (otherwise).
Applying w4 or wk if necessary, we may assume from the beginning that ε4 = 1 and εk = −1
in condition (c). Then using w0 there yields b
0bib4 = bjb
k ∈ {±1}. But since b20bib4bjbk = 1,
we have bjbk = bjb
k ∈ {±1}, that is, condition (b). Note that condition (b) means κj, κk ∈ Z.
On the other hand, the extended affine Weyl group W̃ (D
4 ) contains shifts
(κ0, κi, κj, κk, κ4) 7→ (κ0, κi − 1, κj + 1, κk, κ4),
(κ0, κi, κj, κk, κ4) 7→ (κ0, κi, κj, κk + 1, κ4 − 1).
Repeated applications of these operations and their inverses can shift κj and κk independently
by arbitrary integers. Thus the condition κj, κk ∈ Z can further be reduced to κj = κk = 0.
Thus we have shown that if Fixj(θ) contains at least one of ℓ
j and ℓ
j , then κ must lie in
the W̃ (D
4 )-stratum of type (A
1 )i or on its boundary strata of types (A3)i, A
1 , D4, A
Moreover, it is easy to see that the converse is also true.
For a sign δ ∈ {±1} the conditions (1), (2), (3) in Lemma 6.2 are denoted by (1δ), (2δ), (3δ),
respectively. Now we assume that Fixj(θ) contains both ℓ
j and ℓ
j . Then there exists a pair
of conditions, one from {(1+), (2+), (3+)} and the other from {(1−), (2−), (3−)}, that are valid
at the same time. Such a pair can be consistent only if it is either (1−) + (2−); or (2+) + (1−);
or (3+) + (3−) where if the sign for (3+) is (εk, ε4) then the sign for (3
−) must be its antipode
(−εk,−ε4). The first and second pairs lead to b21 = b22 = b23 = b24 = 1 and to b1b2b3b4 = −1,
while the third pair yields b21 = b
2 = b
3 = b
4 = −1. These are nothing but the conditions (a)
and (b) in Example 3.4. (1). Therefore κ must lie in the stratum of type A⊕41 . Combining this
with the discussion in the last paragraph establishes the assertion (1), as well as a large part
of the assertion (2). The only thing yet to be proved is the assertion that if Fixj(θ) contains
both ℓ+j and ℓ
j , then Fixj(θ) = ℓ
j ∐ ℓ−j . For this, the last part of Lemma 6.2 implies that both
xi = 2 and xi = −2 are multiple roots of the quartic equation (39), so that there are no other
roots of the equation (39). Thus Fixj(θ) has no elements other than those in ℓ
j ∐ ℓ−j . ✷
Now we turn our attention to periodic points and investigate the set P̃er
j(θ;n) of periodic
points of prime period n > 1 on the non-Riccati locus.
Lemma 6.6 For any integer n > 1 the set P̃er
j(θ;n) is biholomorphic to the disjoint union of
ϕ(n) copies of C×, where ϕ(n) denotes the number of integers 0 < m < n coprime to n.
Proof. By Lemma 4.5 we can identify P̃er
j(θ;n) with Perj(θ;n) and hence may work downstairs.
For any integer 0 < m < n coprime to n, we consider the projective curve Cm in P
3 defined by
{4 cos2(πm/n)− 2θi cos(πm/n) + θ4}X20 −X0(θjXj + θkXk)
+X2j +X
k + 2 cos(πm/n)XjXk = 0, (47)
Xi − 2 cos(πm/n)X0 = 0, (48)
where (47) is obtained from F (X, θ) = 0 by substituting (48) and factoring X0 out of it. It
follows from −2 < 2 cos(πm/n) < 2 that Cm is an irreducible smooth conic curve. By equations
(24) of Lemma 4.4 the closure Perj(θ;n) of Perj(θ;n) in S(θ) is the union of these ϕ(n) curves
Cm. The curve Cm intersects the lines L = Li ∪ Lj ∪ Lk at infinity in the two points
P±m : [X0 : Xi : Xj : Xk] = [0 : 0 : −1 : exp(±π
−1m/n)] ∈ Li.
If Cm := Cm−{P+m , P−m}, then Cm is biholomorphic to C×, since Cm ∼= P1. So Perj(θ;n) is the
disjoint union of the ϕ(n) curves Cm with 0 < m < n, (m,n) = 1, and hence biholomorphic to
the disjoint union of ϕ(n) copies of C×. ✷
7 Case-by-Case Study
We make case-by-case studies of F̃ixj(θ) and P̃er
j(θ;n) according to the adjacency diagram in
Figure 10. Now we need to introduce some notation. Recall that we have the resolution of
singularities (15) which restricts to an isomorphism ϕ : S̃◦(θ) → S◦(θ) and that the smooth
fixed points Fix◦j(θ) in S◦(θ) are listed in Table 3. For each P ∈ Fix◦j(θ) let P̃ ∈ F̃ix
j (θ)
denote its lift through the isomorphism ϕ. For example, P̃ (bi, b4; bj , bk) denotes the lift of
P (bi, b4; bj, bk). If {· · · } is a set of expressions P̃ (bi, b±14 ; bj, bk), P̃ (bj , b±1k ; bi, b4), then we denote
by {{· · · }} its subset obtained by discarding those expressions which do not satisfy either the
existence condition or the smoothness condition of Table 3. An example is given in (49) below.
Example 7.1 (∅) Consider the W̃ (D
4 )-stratum of type ∅, namely, the big open K −Wall.
F̃ixj(θ) = {{ P̃ (bi, b4; bj , bk), P̃ (bi, b−14 ; bj, bk), P̃ (bj , bk; bi, b4), P̃ (bj , b−1k ; bi, b4) }}. (49)
Here we have only to care the existence condition, as we are in the big open where the smooth-
ness condition is fulfilled by hypothesis. If a finer stratification of K attached to the W (F (1)4 )-
action on K is introduced, then a more precise description of (49) is feasible, detecting how
many and which elements are there in (49), but the details are omitted. We only remark that
F̃ixj(θ) consists of four distinct points in the most generic case where none of κi±κ4 and κj±κk
are integers. As for the periodic points, since there is no Riccati locus, we have
P̃erj(θ;n) = P̃er
j (θ;n), P̃er
j(θ;n) = ∅, (n > 1).
Example 7.2 (A1) Consider the W̃ (D
4 )-stratum of type A1. We may assume that κ0 = 0 so
that b0 = 1 and bibjbkb4 = 1. Note that none of b
i , b
j , b
4 equals 1. We claim that b
k 6= 1.
Otherwise, we would have κj + κk ∈ Z. Applying a shift as in (46) to κ repeatedly, we may
assume that κj + κk = 0 while keeping the condition κ0 = 0. Then the transformation w0wj
sends κ to κ′ with κ′j = 0 and κ
k = κj + κk = 0, so that one has κ ∈ K{j,k}, namely, κ lies in
the closure of the stratum of type (A⊕21 )i. This contradicts the assumption that we are in the
stratum of type A1. In this case, the surface S(θ) has a unique singular point of type A1 at
(xi, xj , xk) = (bib4 + b
4 , bjb4 + b
4 , bkb4 + b
Blow up S(θ) at this point to obtain a minimal resolution (15). Write the blowing-up as
(xi, xj, xk) = (uiuj + bib4 + b
4 , uj + bjb4 + b
4 , ukuj + bkb4 + b
in terms of coordinates (ui, uj, uk). The exceptional set e is the irreducible quadratic curve
uj = bibjbk + (1 + b
j )bkui + (1 + b
j )biuk + bibjbk(u
i + u
k) + (1 + b
k)bjuiuk = 0,
which can be paramatrized as uj = 0 and
b2i b
k − 1)3t
bj{bk(b2i − 1)(b2j − 1) + bi(b2jb2k − 1)t}{(b2k − 1)(b24 − 1) + bibkb24(b2jb2k − 1)t}
{b2jbk(b2i − 1)(b2k − 1)− bi(b2jb2k − 1)t}{bk(b2j − 1)(1− b24) + bib24(b2jb2k − 1)t}
bj{bk(b2i − 1)(b2j − 1) + bi(b2jb2k − 1)t}{(b2k − 1)(b24 − 1) + bibkb24(b2jb2k − 1)t}
In terms of this parametrization, the lifted transformation g̃2j acts on the exceptional curve
e ≃ P1 by the multiplication t 7→ b2jb2kt. Since b2jb2k 6= 1, the set F̃ix
j(θ) consists of the two
p0 p+
S̃(θ)
t = 0 t = ∞ s = t = 0 s = ∞
t = ∞
S̃(θ)
Figure 12: Surface of types A1 (left) and A2 (right)
points, say p and q, corresponding to t = 0 and t = ∞ (see Figure 12, left). On the other hand,
the possible candidates for the smooth fixed points F̃ix
j (θ) are only the points of labels 2 and
4 in Table 3, since those of labels 1 and 3 do not satisfy the smoothness condition. Thus,
F̃ixj(θ) = {{P̃ (bi, b−14 ; bj , bk), P̃ (bj , b−1k ; bi, b4)}} ∐ {p, q}. (50)
As for the Riccati periodic points P̃er
j(θ;n), the discussion above implies that for any n > 1,
j(θ;n) =
e (if bjbk is a primitive 2n-th root of unity),
∅ (otherwise).
Example 7.3 (A2) Consider the W̃ (D
4 )-stratum of type A2. We may assume that κ0 =
κi = 0 so that b0 = bi = 1. Then the surface S(θ) has a unique singular point of type A2 at
(xi, xj , xk) = (b4 + b
4 , bjb4 + b
4 , bkb4 + b
Blow up S(θ) at this point to obtain a minimal resolution (15). Write the blowing-up as
(xi, xj , xk) = (uiuj + b4 + b
4 , uj + bjb4 + b
4 , ukuj + bkb4 + b
in terms of coordinates (ui, uj, uk). The exceptional set e is the union of two lines
e+ : uj = bkui + uk + bjbk = 0, e
− : uj = b
k ui + uk + b
k = 0,
intersecting in a point. These lines are parametrized as
e+ : (ui, uj, uk) =
k − 1
bj(1− b2k) + (b2jb2k − 1)s
bk(1− b2j ) + bjbk(1− b2jb2k)s
bj(1− b2k) + (b2jb2k − 1)s
e− : (ui, uj, uk) =
k − 1
bj(1− b2k) + (b2jb2k − 1)t
bk(1− b2j ) + b−1j b−1k (1− b2jb2k)t
bj(1− b2k) + (b2jb2k − 1)t
with the intersection point corresponding to s = t = 0. In terms of these parametrizations, the
lifted transformation g̃2j acts on e
+ and e− by the multiplications s 7→ b−2j b−2k s and t 7→ b2jb2kt,
ℓ̃+j C
ei e4
S̃(θ)
Figure 13: Surface of type (A⊕21 )i
which are rewritten as s 7→ b24s and t 7→ b−24 t, since bjbkb4 = 1. Note that b24 6= 1, for otherwise
κ would be in the closure of the W̃ (D
4 )-stratum of type (A3)i. So g̃
j has exactly two fixed
points p0 and p+ on e
+ corresponding to s = 0 and s = ∞. Similarly g̃2j has exactly two fixed
points p0 and p− on e
− corresponding to t = 0 and t = ∞, where p0 is the intersection point of
e+ and e− (see Figure 12, right). Thus we have F̃ix
j(θ) = {p0, p+, p−}. Next we consider the
smooth fixed point of g̃2j on S̃(θ). Since we are assuming that κi = κj + κk + κ4 = 1, the points
of labels 1, 2, 3 in Table 3 do not satisfy the smoothness condition and that of label 4 is the
only smooth fixed point. Thus F̃ix
j (θ) = {P̃ (bj , b−1k ; bi, b4)} and hence
F̃ixj(θ) = {P̃ (bj , b−1k ; bi, b4), p0, p+, p−}. (51)
In the remaining cases presented below, Fixj(θ) contains at least one line component.
Example 7.4 (A
) First we consider F̃ixj(θ) and P̃er
j(θ) on the W̃ (D
4 )-stratum of type
(A⊕21 )i. We may assume that κj = κk = 0 so that bj = bk = 1. Since our stratum is not of type
(A3)i nor of type D4, we have (bib4−1)(bib−14 −1) 6= 0 or equivalently bi+b−1i 6= b4+b−14 . In this
case Fixj(θ) contains the line ℓ
j but does not the line ℓ
j and the surface S(θ) has two singular
points of type A1 at (xi, xj , xk) = (2, bi + b
i , b4 + b
4 ) and (xi, xj , xk) = (2, b4 + b
4 , bi + b
We denote the former singularity by qi and the latter by q4 respectively; both singularities lie on
the line ℓ+j . Blow up S(θ) at these points to obtain a minimal resolution as in (15). Let ℓ̃+j be
the strict transform of ℓ+j , and let ei and e4 be the exceptional curves over qi and q4 respectively.
Moreover let pi be the intersection point of ℓ̃
j and ei. Similarly let p4 be the intersection point
of ℓ̃+j and e4 (see Figure 13). Then the blowing-up at the point qi is represented as
(xi, xj, xk) = (uiuj + 2, uj + bi + b
i , ukuj + b4 + b
in terms of coordinates (ui, uj, uk) around (0, 0, 0). The strict transform ℓ̃
j and the exceptional
curve ei are given by ui = uk + 1 = 0 and
uj = (bib4)(u
i + u
k) + (b
i + 1)b4(uiuk) + bi(b
4 + 1)ui + 2(bib4)uk + (bib4) = 0.
The exceptional curve ei admits a parametrization
(bib4 − 1)(bib−14 − 1)
(t+ bi)(bit + 1)
, uj = 0, uk = −
bi(t+ b4)(b4t+ 1)
b4(t+ bi)(bit+ 1)
, (52)
where the intersection point pi has coordinates (ui, uj, uk) = (0, 0,−1), which corresponds to
t = ∞. The lifted transformation g̃2j acts on ei as a Möbius transformation fixing pi. Some
computations show that in terms of the variable t this transformation is just the shift
t 7→ t+ (bi + b−1i )− (b4 + b−14 ).
and hence a parabolic transformation. Thus g̃2j has no periodic points on ei other than the
fixed point pi. By symmetry, g̃
j also acts on e4 as a parabolic Möbius transformation fixing p4
only. Summarizing the arguments, we conclude that on the W̃ (D
4 )-stratum of type (A
1 )i,
F̃ixj(θ) = ℓ̃
j ∐ { P̃ (bi, b4; bj , bk), P̃ (bi, b−14 ; bj, bk) }, P̃er
j(θ;n) = ∅ (n > 1). (53)
Next we consider F̃ixi(θ) on the W̃ (D
4 )-stratum of type (A
1 )i. Some calculations show
that there are parametrizations of ei and e4 such that g̃
i acts on ei and e4 as the multiplications
t 7→ b24t and t 7→ b2i t respectively. (Modify (52) to get such parametrization.) Since b24 6= 1 and
b2i 6= 1, the transformation g̃2i has exactly two fixed points, say pii and qii, on ei, and exactly
two fixed points, say pi4 and qi4, on e4. There are no smooth fixed points F̃ix
i (θ), because the
smoothness condition of Table 3 with (i, j, k) replaced by (k, i, j) is not satisfied for any labels
there. Thus we have F̃ixi(θ) = F̃ix
i (θ) = {pii, qii, pi4, qi4} and F̃ix
i (θ) = ∅. By symmetry
there is a similar characterization of F̃ixk(θ). By permuting the indices (i, j, k), we have
F̃ixj(θ) = F̃ix
j(θ) = {four points}, F̃ix
j(θ) = ∅, (54)
on the W̃ (D
4 )-strata of types (A
1 )j and (A
1 )k. A slightly further consideration yields
j(θ;n) =


ej ∐ e4 (if bj and b4 are primitive 2n-th roots of unity),
ej (if b4 is a primitive 2n-th root of unity, but bj is not),
e4 (if bj is a primitive 2n-th root of unity, but b4 is not),
∅ (otherwise).
on the stratum (A⊕21 )j and a similar characterization of it on the stratum (A
1 )k.
Example 7.5 (A3) First we consider F̃ixj(θ) and P̃er
j(θ) on the W̃ (D
4 )-stratum of type
(A3)i. We may assume that κ0 = κj = κk = 0 and κi+κ4 = 1 so that bj = bk = 1 and bib4 = 1.
But we have bib
4 6∈ {±1}, since our stratum is not of type D4. In this case Fixj(θ) contains
the line ℓ+j but does not the line ℓ
j . The surface S(θ) has only one singular point of type A3
at (xi, xj , xk) = (2, b4 + b
4 , b4 + b
4 ), which lies on the line ℓ
j . Blow up the singular point.
This blowing-up is expressed as (xi, xj , xk) = (uiuj + 2, uj + b4 + b
4 , ukuj + b4 + b
4 ) in terms
of coordinates (ui, uj, uk) around (0, 0, 0). The strict transform of the surface S(θ) is given by
uj = b4uiujuk + b4u
i ++b4u
k + (b
4 + 1)uiuk + (b
4 + 1)ui + 2b4uk + b4 = 0,
which has yet one singular point, say q. The exceptional curve consists of two line components
uj = ui+b4uk+b4 = 0 and uj = b4ui+uk+1 = 0, whose intersection point (ui, uj, uk) = (0, 0,−1)
is exactly the singular point q. The strict transform of ℓ+j is now given by ui = uk+1 = 0, which
ℓ̃+jej ek
pj p0 pk
S̃(θ)
Figure 14: Surface of type (A3)i
also passes through q. Blow up again the singular point q. Let e0 be the exceptional curve and
let ej, ek, ℓ̃
j be the strict transforms of the lines uj = ui+ b4uk+ b4 = 0, uj = b4ui+uk+1 = 0,
ui = uk + 1 = 0 respectively. If we express this blowing-up as (ui, uj, uk) = (vi, vivj , vivk − 1),
then the exceptional curve e0 is given by vi = b4 − b4vj + (b24 + 1)vk + b4v2k = 0; ej is given by
vj = 1 + b4vk = 0; and ek is given by vj = b4 + vk = 0. The intersection point of e0 and ej is
(vi, vj , vk) = (0, 0,−bi) and that of e0 and ek is (vi, vj , vk) = (0, 0,−b4). If ej is parametrized as
(vi, vj , vk) = ((t+bi)
−1, 0,−bi), then the transformation g̃2j acts on ej as the shift t 7→ t+b4−bi.
Similarly, if ek is parametrized as (vi, vj, vk) = ((t + b4)
−1, 0,−b4), then g̃2j acts on ek as the
shift t 7→ t + b4 − bi. Hence g̃2j acts on ej and ek as parabolic Möbius transformations fixing
only pj and qj . Then g̃
j acts on e0 as the identity, because it also fixes the intersection point
p0 of e0 and ℓ̃
j . Summarizing the arguments we see that on the stratum of type (A3)i,
F̃ixj(θ) = ℓ̃
e0 ∐ { P̃ (bi, b−14 ; bj , bk) }, P̃er
j(θ;n) = ∅ (n > 1), (55)
where ℓ̃+j ∪
e0 indicates that the curves ℓ̃
j and e0 meet in the point p0.
Next we consider F̃ixi(θ) and P̃er
i (θ;n) on the W̃ (D
4 )-stratum of type (A3)i. If we take a
parametrization of e0 such that t = 0 and t = ∞ correspond to the points pj and pk respectively,
then a simple check shows that the transformation g̃2i on ej is expressed as t 7→ b−24 t. There
is a parametrization of ej such that t = 0 corresponds to pj and g̃
i is given by t 7→ b24t. Since
b24 6= 1, the transformation g̃2i has exactly two fixed points on ej , one of which is just pj and the
other is denoted by pij . Similarly, there is a parametrization of ek such that t = 0 corresponds
to pk and g̃
i is given by t 7→ b−24 t, and hence g̃2i has exactly two fixed points on ek, one of
which is just pk and the other is denoted by pik. There are no smooth fixed points F̃ix
i (θ),
because the smoothness condition of Table 3 with (i, j, k) replaced by (k, i, j) is not satisfied
for any labels there. So we have F̃ixi(θ) = F̃ix
i (θ) = {pj, pij , pk, pik} and F̃ix
i (θ) = ∅ on the
W̃ (D
4 )-stratum of type (A3)i. By symmetry there is a similar characterization of F̃ixk(θ) on
the same stratum. By permuting the indices (i, j, k), we have
F̃ixj(θ) = F̃ix
j(θ) = {four points}, F̃ix
j(θ) = ∅, (56)
ej ek
pj pk
r0 r∞
S̃(θ)
Figure 15: Surface of type A⊕31
on the W̃ (D
4 )-strata of types (A3)j and (A3)k. A slightly further consideration yields
j(θ;n) =
ei (if b4 is a primitive 2n-th root of unity),
∅ (otherwise),
on the stratum (A3)j and a similar characterization of it on the stratum (A3)k.
Example 7.6 (A
) Consider the W̃ (D
4 )-stratum of type A
1 . We may assume that κi =
κj = κk = 0 so that bi = bj = bk = 1. But we have b4 6∈ {±1} since our stratum is not of
type D4 nor of type A
1 . In this case the surface S(θ) has three singular points of type A1 at
(xi, xj, xk) = (b4+b
4 , 2, 2), (2, b4+b
4 , 2), (2, 2, b4+b
4 ), which are called qi, qj , qk respectively.
Note that the two points qj and qk lie on the line ℓ
j but qi does not lie on the union ℓ
j ∐ ℓ−j .
The minimal resolution (15) is obtained by blowing up these three points (see Figure 15). First,
consider the blowing-up at qk and represent it by (xi, xj , xk) = (uiuj+2, uj+2, ukuj+b4+b
Then the strict transform ℓ̃+j of the line ℓ
j is given by ui = uk + 1 = 0, while the exceptional
curve ek is given by b4(ui + uk + 1)
2 + (b4 − 1)2ui = 0. The curves ℓ̃+j and ek intersect in the
point (ui, uj, uk) = (0, 0,−1); this point is called pk. If we parametrize the curve ek as
ui = −
(b4 − 1)2t2
, uj = 0, uk = −
{(b4 − 1)t+ 1}{(b4 − 1)t− b4}
(b4 − 1)2t2
(t ∈ P1),
where t = ∞ corresponds to the point pk, then the lifted transformation g̃2j induces the shift
t 7→ t+1 and hence acts on ek as a parabolic Möbius transformation fixing pk only. In a similar
manner g̃2j acts on the exceptional curve ej over qj as a parabolic Möbius transformation fixing
only the intersection point pj of ℓ̃
j and ej . Next we consider the blowing-up at qi and represent
it by (xi, xj , xk) = (uiuj + b4 + b
4 , uj + 2, ukuj + 2). Then the exceptional curve ei is given by
b4(ui + uk + 1)
2 + (b4 − 1)2uk = 0, which can be parametrized as
ui = −
(b4 + 1)
(b4t + 1)2
, uj = 0, uk = −
b4(t− 1)2
(b4t+ 1)2
(t ∈ P1).
ℓ̃+kℓ̃
ejei ek
qi qj qk
S̃(θ)
Figure 16: Surface of type D4
In terms of this parametrization, the transformation g̃2j restricts to the map t 7→ b24t on the
exceptional curve ei. Let r0 and r∞ be the points on ei corresponding to t = 0 and t = ∞ respec-
tively. Since b24 6= 1, the map g̃2j acts on ei as a Möbius transformation with exactly two fixed
points r0 and r∞. Hence the set F̃ixj(θ) contains the line component ℓ̃
j and the Riccati compo-
nent {r0, r∞}, but has no smooth-point component, since F (bi, b4; bj , bk) = F (bi, b−14 ; bj, bk) =
b4 + b
4 6∈ {±2} is a double root of the quartic equation (39) (see Theorem 6.3). Thus we have
F̃ixj(θ) = ℓ̃
j ∐ {r0, r∞}. (57)
As for the Riccati periodic points, since g̃2j acts on ei as t 7→ b24t, we have for any n > 0,
j(θ;n) =
ei (if b4 is a primitive 2n-th root of unity),
∅ (otherwise).
Example 7.7 (D4) Consider the W̃ (D
4 )-stratum of type D4, say, the W (D
4 )-stratum with
value θ = (8, 8, 8, 28). In this case the surface S(θ) has only one singular point of type D4 at
(xi, xj, xk) = (2, 2, 2). The minimal resolution (15) is obtained by successive blowing-ups: Blow
up the singular point. If we express the blowing-up as (xi, xj , xk) = (uiuj + 2, uj + 2, ukuj + 2)
in terms of coordinates (ui, uj, uk), then the strict transform of the surface S(θ) is represented
as uiujuk + (ui + uk + 1)
2 = 0. The exceptional curve e is given by uj = ui + uk + 1 = 0. The
strict transforms of ℓ+i and ℓ
j are given by ui+1 = uk = 0 and ui = uk+1 = 0, while the strict
transform of ℓ+k is at infinity and not expressible in terms of the coordinates (ui, uj, uk). The
blow-up surface has three singularities, all of which are of type A1 and located at the points
in which the exceptional curve e intersects the strict transforms of ℓ+i , ℓ
j , ℓ
k . The lifts of the
transformations g2i , g
j , g
k fix the curve e pointwise, since they fix the three singular points on
it. Again blow up these points. Then we obtain a minimal resolution (15) of the surface S(θ)
as depicted in Figure 16, where ei, ej , ek are the exceptional curves over the singular points and
e++−e+−+
e−−− e−++
S̃(θ) P
Figure 17: Surface of type A⊕41
e0, ℓ̃
i , ℓ̃
j , ℓ̃
k are the strict transforms of e, ℓ
i , ℓ
j , ℓ
k , respectively. Being the strict transform
of e, the exceptional curve e0 is fixed pointwise by the lifts g̃
i , g̃
j , g̃
k of g
i , g
j , g
k, and hence
carries rational solutions. Moreover the lift g̃2j fixes ej pointwise. This can be seen without
computation. Since g̃2j is area-preserving and fixes ℓ̃
j pointwise, it has derivative 1 at pj along
the curve ej . So the Möbius transformation on ej induced by g̃
j is either identity or a map
of parabolic type. But the latter is impossible because it has at least two fixed points at pj
and qj (see Figure 16). Hence g̃
j acts on ej as the identity. Next we shall observe that g̃
acts on ei as a parabolic Möbius transformation fixing qi only. If we express the blowing-up
at (ui, uj, uk) = (−1, 0, 0) as (ui, uj, uk) = (vivk − 1, vjvk, vk), then the exceptional curve ei is
given by vk = vj − (vi + 1)2 = 0. Parametrize ei as (vi, vj , vk) = (−(t + 1)/t, t−2, 0), where
qi corresponds to t = ∞. Then g̃2j acts on ei by the shift t 7→ t + 1. Similarly g̃2j acts on ek
as a parabolic transformation fixing qk only. By symmetry, g̃
i and g̃
k act on ej as parabolic
transformations fixing qj only. Notice that the exceptional curve e0 carries rational Riccati
solutions, while ej − {qj} carries Riccati solutions of infinte period. Thus we have
F̃ixj(θ) = ℓ̃
e0, P̃er
j(θ;n) = ∅ (n > 1). (58)
Example 7.8 (A
) Consider the W̃ (D
4 )-stratum of type A
1 , where θ = (0, 0, 0,−4) and
Fixj(θ) = ℓ
j ∐ℓ−j . In this case the surface S(θ) has four singularities of type A1 at (xi, xj, xk) =
(2εi, 2εj, 2εk) ∈ {±2}3 with εiεjεk = −1. Blow up at these points to obtain a minimal resolution
as in (15). Let eεiεjεk be the exceptional line over (xi, xj, xk) = (2εi, 2εj, 2εk) and ℓ̃
j be the
strict transform of ℓ
j . Moreover let p
εiεjεk denote the intersection point of the lines eεiεjεk
and ℓ̃
j (see Figure 17). Then the lifted transformation g̃
j : S̃(θ) 	 acts on the exceptional
line eεiεjεk ∼= P1 as a Möbius transformation. It is a parabolic transformation with the only
fixed point pεiεjεk . Let us check this for (εi, εj, εk) = (−1,−1,−1). The blowing-up of C3 at
(xi, xj, xk) = (−2,−2,−2) is described by xi = uiuj − 2, xj = uj − 2, xk = ujuk − 2, in terms
of coordinates (ui, uj, uk) around (0, 0, 0). Then the exceptional line e
−−− is represented by the
equations uj = 0 and (ui − uk)2 − 2(ui + uk) + 1 = 0 and hence it is parametrized as
, uj = 0, uk =
where the fixed point p−−− corresponds to t = ∞. Then we can check that g̃2j acts on the line
e−−− as the translation t 7→ t+4, as desired. Thus the only fixed points of g̃2j on the exceptional
set E(θ) are the four points pεiεjεk with εiεjεk = −1 and there are no periodic points, so that
F̃ixj(θ) = ℓ̃
j ∐ ℓ̃−j , P̃er
j(θ;n) = ∅ (n > 1). (59)
8 Power Geometry
We apply the method of power geometry [5, 6, 7] to construct as many algebraic branch solutions
to PVI(κ) as possible around each fixed singular point. Basically we can follow the arguments
of [7]. However, while the attention of [7] is restricted to generic parameters, we require a
thorough treatment of all parameters, where much ampler varieties of patterns are present.
Moreover, the way in [7] of representing the parameters of Painlevé VI is not convenient for our
purpose. So we have to redevelop the necessary arguments on power geometry from scratch.
In view of Remark 5.1, it is sufficient to work around the origin z = 0. In order to apply
the method in [5, 6, 7], we reduce the system (1) into a single second-order equation. If
(q, p) = (q(z), p(z)) is a solution to system (1) such that q 6≡ 0, 1, z, ∞, then we solve the first
equation of system (1) with respect to p = p(z) to obtain
z(z − 1)q′ + κ1q1qz + (κ2 − 1)q0q1 + κ3q0qz
2q0q1qz
. (60)
Substituting this into the second equation yields the single second-order equation
z − 1 +
q0q1qz
2z2(z − 1)2
κ24 − κ21
+ κ23
z − 1
+ (1− κ22)
z(z − 1)
Multiply equation (61) by 2z2(z − 1)2q0q1qz and move its right-hand side to the left to obtain
P (z, q) = 0, (62)
where P (z, q) is a polynomial of (z, q, q′, q′′), that is, a differential sum of (z, q), whose explicit
formula is omitted here but can be found in [7]. Therefore system (1) is equivalent to equation
(62) together with (60) except for the possible solutions such that q ≡ 0, 1, z, ∞. A simple
check shows that the Newton polygon of equation (62) is given as in Figure 18, where there are
four patterns according as the parameters κ1 and κ4 are zero or not.
First we search for a holomorphic solution germ q = q(z) to equation (62) around z = 0.
We have only to construct formal power series solutions of the form
q = czr + (higher order terms), (r, c) ∈ Z× C×, (63)
0 (3, 0)
(0, 3)
(0, 6)
(3, 3)
Case κ1 6= 0, κ4 6= 0.
(0, 3)
(0, 6)
(3, 3)
(3, 2)(1, 2)
Case κ1 = 0, κ4 6= 0.
0 (3, 0)
(0, 3)
(3, 3)Γ0
(0, 4) (2, 4)
Case κ1 6= 0, κ4 = 0.
(0, 3)
(3, 3)Γ0
(0, 4) (2, 4)
(3, 2)(1, 2)
Case κ1 = 0, κ4 = 0.
Figure 18: Newton polygon for Painlevé VI
0 (4, 0)
(0, 3)
(0, 6)
(2, 1)
(6, 0) 0 (3, 0)
(0, 1)
(0, 6)
(3, 3)
(1, 0)
Figure 19: Newton polygons for Lemma 8.1 (left) and Lemma 8.2 (right)
since any formal power series solution to equation (62) is convergent [10, 11]. Then it follows
from (60) that the associated formal Laurent series for p = p(z) is also convergent.
In order to construct formal solutions (63), we consider the truncations along the edges
Γ1 and Γ0 of the Newton polygons in Figure 18. We see that Γ1 and Γ0 have outer normal
vectors (p1, p2) = (−1,−1) and (p1, p2) = (−1, 0), whose slopes are p2/p1 = 1 and p2/p1 = 0
respectively. Thus the edges Γ1 and Γ0 correspond to the exponents r = 1 and r = 0 respectively.
The truncation of P = P (z, q) along the edge Γ1 is given by
P1 = −2zq2q′ + 2z2q(q′)2 − 2z2q2q′′ + (κ21 − κ22 + 1)zq2 − z3(q′)2 + 2z3qq′′
−2κ21z2q + κ21z3.
Substituting q = cz into equation P1 = 0 yields c = κ1/(κ1 + εκ2) with any sign ε ∈ {±1}.
Similarly, the truncation of P = P (z, q) along the edge Γ0 is given by
P0 = −2zq2q′ + 2z2q(q′)2 − 2z2q2q′′ + (κ23 − κ24)q4 + 2zq3q′ − 3z2q2(q′)2
+2z2q3q′′ + 2κ24q
5 − κ24q6.
Substituting q = c into equation P0 = 0 yields c = (κ4 + εκ3)/κ4 with any sign ε ∈ {±1}.
Lemma 8.1 If κ1 + κ2 6∈ Z, then there exists a holomorphic solution around the origin z = 0,
κ1 + κ2
+ κ1κ2
ak,+(κ) z
k, p = κ0(κ0 + κ4)
bk,+(κ) z
k, (64)
depending holomorphically on κ ∈ K with κ1 + κ2 6∈ Z. Similarly, if κ1 − κ2 6∈ Z, then there
exists a meromorphic solution around the origin z = 0,
κ1 − κ2
+ κ1κ2
ak,−(κ) z
k, p =
κ1 − κ2
bk,−(κ) z
k, (65)
depending holomorphically on κ ∈ K with κ1 − κ2 6∈ Z.
Proof. Substituting q = κ1z(κ1 + εκ2)
−1 + κ1κ2Q with ε ∈ {±1} into equation (62) yields
(κ1 + εκ2)
z, κ1z(κ1 + εκ2)
−1 + κ1κ2Q
= κ21κ
2 p(z, Q),
where p(z;Q) is a differential sum of (z, Q) with coefficients in C[κ] whose Newton polygon is
given as in Figure 19 (left). The vertex (2, 1) carries the linear differential expression
LεQ = 2ε(κ1 + εκ2)4x2{x2Q′′ − xQ′ − (κ1 + εκ2 + 1)(κ1 + εκ2 − 1)Q},
while the vertex (4, 0) carries the monomial (κ1 + εκ2)
2{(κ1 + εκ2)2 + κ23 − κ24 − 1}x4. The
corresponding characteristic polynomial is given by
vε(k) = 2ε(κ1 + εκ2)
4(k − 1− κ1 − εκ2)(k − 1 + κ1 + εκ2).
Hence 1 + |κ1 + εκ2| is the unique critical value of the problem. If it is not an integer, then
the coefficients ak,ε(κ) of the expansions (64) and (65) are determined uniquely and recursively.
By substituting the resulting power series q = q(z) into equation (60), the Laurent series for
p = p(z) is uniquely determined as in (64) and (65). As is mentioned earlier, the formal
solutions (64) and (65) so obtained are convergent. ✷
Lemma 8.2 Assume that κ4 is nonzero. If κ4 + κ3 6∈ Z, then there exists a holomorphic
solution germ around the origin z = 0,
κ4 + κ3
ak,+(κ) z
k, p = −κ4κ0
bk,+(κ) z
k, (66)
depending holomorphically on κ ∈ K with κ4 6= 0 and κ4 + κ3 6∈ Z. Similarly, if κ4 − κ3 6∈ Z
then there exists a holomorphic solution germ around the origin z = 0,
κ4 − κ3
ak,−(κ) z
k, p = −κ4(κ0 + κ4)
bk,−(κ) z
k, (67)
depending holomorphically on κ ∈ K with κ4 6= 0 and κ4 − κ3 6∈ Z.
Proof. Substituting q = κ−14 (κ4 + εκ3) + κ
4 κ3Q into equation (62) yields
z, κ−14 (κ4 + εκ3) + κ
4 κ3Q
= κ23 p(z;Q),
where p(z;Q) is a differential sum of (z, Q) with coefficients in C[κ]. The vertex (0, 1) carries
the linear differential expression LεQ = 2(κ4 + εκ3)2{x2Q′′ + xQ′ − (κ4 + εκ3)2Q}, while the
vertex (1, 0) carries the monomial (κ4 + εκ3)
2{1 + κ21 − κ22 − (κ4 + εκ3)2}x. The corresponding
characteristic polynomial is given by vε(k) = 2(κ4 + εκ3)
2{k − (κ4 + εκ3)}{k + (κ4 + εκ3)}.
Hence |κ4 + εκ3| is the unique critical value of the problem. If it is not an integer, then the
coefficients ak,ε(κ) of expansions (66) and (67) are determined uniquely and recursively. Then
substituting the resulting series for q = q(z) into equation (60) yields the Laurent series for
p = p(z) as in (66) and (67). The formal solutions so obtained are convergent. ✷
The solutions in Lemmas 8.1 and 8.2 are essentially constructed in [7, 23]. We construct
more particular solutions for the parameters on various strata of higher codimensions.
0 (4, 0)
(0, 3)
(0, 6)
(2, 1)
Figure 20: Newton polygon for Lemma 8.3
Lemma 8.3 (A
and A
) If κ1 = κ2 = 0, then there exists a 1-parameter family of holo-
morphic solution around the origin z = 0,
t+ (1− t)(1− z)κ4 + t(1− t)κ0(κ0 + κ3)
ak(t; κ) z
p = κ0(κ0 + κ4)
bk(t; κ) z
depending on t ∈ C, where a2(t; κ) = 2, b0(t; κ) = 1 and the remaining coefficients ak(t; κ),
k ≥ 3, and bk(t; κ), k ≥ 1, are polynomials of (t, κ3, κ4) determined uniquely and recursively.
Proof. We put R(z; t) = tz{t+(1−t)(1−z)κ4}−1. Substituting q = R(z; t)+t(1−t)κ0(κ0+κ3)Q
into equation (62) and multiplying the result by {t+ (1− t)(1− z)κ4}4 yield
p(z, Q; t) := {t+ (1− t)(1− z)κ4}4P (z, R(z; t) + t(1− t)κ0(κ0 + κ3)Q)
= 2t2(1− t)2κ0(κ0 + κ3){LQ+ g(z, Q; t) + h(z)} = 0,
where LQ = z2{z2Q′′ − zQ′ + Q} and h(z) = −2z4(1 − z)2κ4+1. The Newton polygon of (69)
is given as in Figure 20, where the terms LQ and h(z) correspond to the vertex (2, 1) and the
horizontal infinite edge emanating from the vertex (4, 0) respectively, and the remaining term
g(z, Q; t) corresponds to the remaining part of the polygon. Since the characteristic equation
of LQ is (k − 1)2 = 0 having the unique root k = 1, the coefficients ak(t; κ), k ≥ 2, in (68)
are determined uniquely and recursively. Here the leading coefficient a2(t; κ) is found to be
a2(t; κ) = 2 by substituting Q = a2(t; κ)z
2 into the truncation LQ − 2z4 = 0 of equation
(69) along the edge connecting the vertices (2, 1) and (4, 0). Substituting the resulting series
q = q(z) into (60) we have p = p(z) as in (68). ✷
(0, 6)
(0, 1)
(2, 0)
Figure 21: Newton polygon for Lemma 8.4
Lemma 8.4 (A3) If κ0 = κ1 = κ2 = 0, κ3 + κ4 = 1, then there exists a 1-parameter family of
holomorphic solutions around the origin z = 0,
1− (1− z)κ4 + tκ3z + tκ3κ4
ak(t; κ) z
k, p = −tκ34
bk(t; κ) z
k, (70)
depending on t ∈ C, where a2(t; κ) = tκ3, b1(t; κ) = 1 and the coeffcients ak(t; κ), k ≥ 2, and
bk(t; κ), k ≥ 1, are polynomials of (t, κ4) determined uniquely and recursively.
Proof. Put R(z) = z{1− (1− z)κ4}−1. Substituting q = R(z) + tκ3z + tκ3κ4Q into (62) yields
P (z, R(z) + tκ3z + tκ3κ4Q) = tκ
4R(x)
4 p(z, Q; t),
where p(z, Q; t) is a differential sum of (z, Q) with coefficients in C[t, κ4] whose Newton polygon
is given as in Figure 21. Especially the vertex (0, 1) carries the linear differential expression
LQ = 2(z2Q′′ + zQ′ −Q), whose characteristic polynomial is 2(k − 1)(k + 1), while the vertex
(2, 0) carries the monomial −6tκ3z2. Since the critical values k = ±1 are smaller than 2, the
coefficients ak(t; κ), k ≥ 2, in (70) are determined uniquely and recursively, where the leading
coefficient a2(t; κ) is found to be tκ3. The rest of the proof is similar to that in Lemma 8.3. ✷
Lemma 8.5 (D4) If κ0 = κ1 = κ2 = κ3 = 0 and κ4 = 1, then there exists a 1-parameter
family of holomorphic solution germs around the origin z = 0,
q = 1 + t
ak(t) z
k, p =
(1− z) log(1− z) + t
bk(t) z
k, (71)
depending on a parameter t ∈ C, where a1(t) = b1(t) = 1 and the remaining coeffieicents ak(t)
and bk(t), k ≥ 2, are polynomials of t determined uniquely and recursively.
0 (3, 0)
(0, 2)
(0, 6)
(1, 1)
(6, 0) 0
(2, 1)(0, 1)
(0, 4)
(2, 3)
(1, 0)
Figure 22: Newton polygons for Lemma 8.5 (left) and Lemma 8.6 (right)
Proof. Substituting q = 1+ tz + tQ into equation (62) yields P (z, 1+ tQ) = t2p(z, Q; t), where
p(z, Q; t) is a differential sum of (z, Q) with coefficients in C[t] whose Newton polygon is given
as in Figure 22 (left). Consider the edge connecting (1, 1) and (3, 0). The vertex (1, 1) carries
the differential monomial LQ = 2z3Q′′, whose characteristic polynomial is k(k − 1), while the
vertex (3, 0) carries the monomial −4tz3. Put a1(t) = 1. Since the critical values k = 0, 1 are
smaller than 2, the coefficients ak(t), k ≥ 2, in (71) are determined uniquely and recursively.
Substituting the resulting series q = q(z) into (60) we have p = p(z) as in (71). Here the term
z/{(1 − z) log(1 − z)} is singled out from p = p(z), because putting t = 0 yields the special
solution q ≡ 1 and p = z/{(1− z) log(1− z)} (see also Lemma 9.4). ✷
Lemma 8.6 (A
) Let κ0 = 1/2 and κ1 = κ2 = κ3 = κ4 = 0. Then there exists a 1-parameter
family of holomorphic solutions around the origin z = 0 depending on a parameter t ∈ C,
q = tz + t(1− t)
ak(t) z
k, p =
bk(t) z
k, (72)
where the coefficients ak(t) and bk(t) are polynomials of t beginning with a2(t) = 1/2 and
b0(t) = 1/4. Moreover there is another 1-parameter family of solutions around z = 0,
ck(t) z
k, p = t
dk(t) z
k, (73)
depending on t ∈ C, where ck(t) and dk(t) are polynomials of t beginning with c1(t) = 1/2 and
d0(t) = −1/2. For t = 0, formula (73) represents the solution such that q ≡ ∞ and p ≡ 0.
Proof. We only derive (73), as (72) is derived in a similar manner. Substituting q = t−1 +
t−1(t−1)Q into (62) yields P (z, t−1+t−1(t−1)Q) = t−4(t−1)2z−1p(z, Q; t), where p(z, Q; t) is a
differential sum of (z, Q) with coefficients in C[t] whose Newton polygon is given as in Figure 22
(right). Consider the edge connecting (0, 1) and (1, 0). The vertex (0, 1) carries the differential
sum LQ = −2(zQ′ + z2Q′′), whose characteristic polynomial is −2k2, while the vertex (1, 0)
carries the monomial z. Since the critical value k = 0 is smaller than 1, the coefficients ck(t),
k ≥ 1, in (73) are determined uniquely and recursively, where the leading term is c1(t) = 1/2.
Substituting the resulting series q = q(z) into (60) we have p = p(z) as in (73). ✷
So far we have considered the edges Γ1 and Γ0 of the Newton polygon in Figure 18 and
constructed meromorphic solutions around the origin z = 0. Now let us consider the vertex (0, 3)
of the polygon, which gives rise to algebraic branch solutions around z = 0. The truncation of
the differential sum P = P (z, q) at the vertex (0, 3) is given by
P3 = −2zq2q′ + 2z2q(q′)2 − 2z2q2q′′.
Its charactersistic polynomial χ(r) is defined by substituting q = zr into P3(z, q) and dividing
the result by q3 = z3r. In the present situation we see that χ(r) is identically zero; χ(r) ≡ 0.
The normal cone of the vertex (0, 3) is U3 = { (p1, p2) ∈ R2 : p1 < 0, 0 < r = p2/p1 < 1 }.
Thus the truncated solutions at the vertex (0, 3) are q = tzr for an arbitrary 0 < r < 1 and
t ∈ C×. The Fréchet derivative with respect to q at the truncated solution q = tzr is given by
L3Q = −2t2z2r{z2Q′′ + (1− 2r)zQ′ + r2Q}.
The corresponding characteristic equation is given by v3(k) := −2t2(k − r)2 = 0, which has
the only root k = r. Let n be any integer greater than 1. In order to search for an algebraic
n-branch solution around z = 0, we take r = m/n for any integer 0 < m < n coprime to n,
and consider a formal Puiseux series solution of the form
q = tzm/n +
ν=m+1
aν(t) z
Since the characteristic equation v3(k) = 0 has no roots such that k > m/n, the coefficients
aν = aν(t) can be determined uniquely and recursively for any given initial coefficient t ∈ C×.
The convergence of the formal solution and its holomorphic dependence on parameters follow
easily if we rewrite the equation (62) in terms of the new independent variable ζ = z1/n and
apply the convergence arguments in [10, 11]. Thus we have established the following lemma.
Lemma 8.7 For any integer n > 1, there exist ϕ(n) mutually disjoint 1-parameter families of
n-branch solution germs to PVI(κ) around the origin z = 0,
q(z) = tzm/n +
ν=m+1
aν(n,m, t; κ) z
p(z) =
m+ n(κ1 + κ2 − 1)
z−m/n +
ν=−m+1
bν(n,m, t; κ) z
where the discrete parameter m ranges over all integers 0 < m < n coprime to n and the
continuous parameter t takes any value of the punctured complex line C×.
Remark 8.8 The family (74) contains no Riccati solutions, even if κ ∈ Wall. This will be
shown in the proof of Lemma 9.14.
9 Injection Implies Surjection
We establish Theorem 1.3 based on the main idea described in §1. Due to the S3-symmetry
permuting the three fixed singular points, it suffices to work around z = 0 (see Remark 5.1).
As a preliminary we begin by constructing some Riccati solutions to equation (1). Assume
that κ0 = 0 so that κ1 + κ2 + κ3 + κ4 = 1. Then the second equation of system (1) has the null
solution p(z) ≡ 0. Substituting this into the first equation yields the Riccati equation
z(z − 1)q′ + κ1q1qz + (κ2 − 1)q0q1 + κ3q0qz = 0.
If κ4 is nonzero, then the change of dependent variable
z(1 − z)
log{(1− z)−κ4f}
transfers the Riccati equation to the Gauss hypergeometric equation
z(1− z)f ′′ + {(1− κ3 − κ4)− (κ2 − κ4 + 1)z}f ′ + κ2κ4f = 0. (75)
Next assume that κ0 = κ1 = 0 so that κ2 + κ3 + κ4 = 1. In this case there is another type
of Riccati solution to the system (1). The first equation of the system (1) has the null solution
q(z) ≡ 0. Substituting this into the second equation yields the Riccati equation
z(z − 1)p′ + zp2 + (κ2 − 1 + κ3z)p = 0.
Then change of independent variable p = (z − 1) d
log g takes it to the linear equation
z(1 − z)g′′ + {(1− κ2)− (κ3 + 1)z}g′ = 0. (76)
Lemma 9.1 (A1) Assume that κ0 = 0, κ1 + κ2 + κ3 + κ4 = 1, κ1 + κ2 6∈ Z, κ3 + κ4 6∈ Z and
κ1κ4 6= 0. Then system (1) has two single-valued Riccati solutions around the origin z = 0,
(a) q =
κ1 + κ2
+O(κ1z
2), p ≡ 0,
(b) q =
κ3 + κ4
+O(κ3z), p ≡ 0.
Proof. The solutions (a) and (b) are obtained from two linearly independent solutions
2F1(κ2,−κ4, 1− κ3 − κ4; z), zκ3+κ42F1(κ2 + κ3 + κ4, κ3, κ3 + κ4 + 1; z),
of equation (75) repsectively. They also come from solutions (64) and (66) respectively. ✷
Lemma 9.2 (A2) Assume that κ0 = κ1 = 0, κ2 + κ3 + κ4 = 1, κ2 6∈ Z, κ3 6= 1 and κ4 6= 0.
Then system (1) has three single-valued Riccati solutions around the origin z = 0,
(a) q ≡ 0, p ≡ 0,
(b) q =
κ3 + κ4
+O(z), p ≡ 0,
(c) q ≡ 0, p = −κ2(κ3 − 1)
κ2 + 1
+O(z).
Proof. The solutions (a) and (b) just come from (a) and (b) of Lemma 9.1 respectively, while
solution (c) is obtained from the solution zκ22F1(κ2, κ2 + κ3, κ2 + 1; z) of equation (76). ✷
Lemma 9.3 (A3) Assume that κ0 = κ1 = κ2 = 0, κ3 + κ4 = 1 and κ4 6= 0. Then system (1)
admits a 1-parameter family of single-valued Riccati solutions around the origin z = 0,
q(z; s) =
s0 + s1(1− z)κ4
, p(z; s) ≡ 0, (s = [s0 : s1] ∈ P1). (77)
Proof. The hypergeometric equation (75) becomes (1 − z)f ′′ − κ3f ′ = 0, whose nontrivial
solutions are given by f(z) = s0 + s1(1− z)κ4 with (s0, s1) ∈ C2 − {(0, 0)}. The corresponding
Riccati solutions are the 1-parameter family of single-valued solutions q = q(z; s) as in (77). ✷
Lemma 9.4 (D4) Assume that κ0 = κ1 = κ2 = κ3 = 0 and κ4 = 1.
(1) System (1) admits a 1-parameter family of rational Riccati solutions
q(z; s) =
s0 + s1(1− z)
, p(z; s) ≡ 0, (s = [s0 : s1] ∈ P1). (78)
(2) System (1) admits a 1-parameter family of single-valued Riccati solutions around z = 0,
q(z; t) ≡ 1, p(z; t) = t0z
(1− z){t0 log(1− z) + t1}
, (t = [t0 : t1] ∈ P1). (79)
Proof. In this case (77) gives the 1-parameter family of rational solutions (78). Moreover the
first equation of system (1) admits a constant solution q(z) ≡ 1. Substituting this into the
second equation yelds the Riccati equation z(z−1)p′+(1− z)p2+ p = 0. Change of dependent
variable p = −zf ′/f takes this into the linear equation (1 − z)f ′′ − f ′ = 0, whose nontrivial
solutions are given by f = t0 log(1 − z) + t1 with (t0, t1) ∈ C2 − {(0, 0)}. Thus the Riccati
equation has the 1-parameter family of single-valued solutions p = p(z; t) as in (79). ✷
Now we proceed to the proof of Theorem 1.3. From now on we fix the indices as (i, j, k) =
(3, 1, 2) in accordance with the choice of indices in §8. First we treat the fixed point case.
Lemma 9.5 The set F̃ixj(θ) is exhausted by meromorphic solutions around z = 0.
Proof. Case-by-case check based on the “injection-implies-surjection” principle described in §1.
Example 9.6 (∅) We combine the results of Example 7.1, Lemmas 8.1 and 8.2. A key ob-
servation is that Lemmas 8.1 and 8.2 give us as many meromorphic solutions around z = 0
as the cardinality of the set F̃ixj(θ) in (49). For example, if P̃ (bi, b4; bj , bk) ∈ F̃ixj(θ), then
the existing and smoothness conditions for it (see Table 3) makes it possible to apply Lemma
8.2 to conclude that the meromorphic solution (66) exists corresponding to the fixed point
P̃ (bi, b4; bj, bk). Thus the set F̃ixj(θ) is exhausted by meromorphic solutions around z = 0.
Example 9.7 (A1) We combine the results of Example 7.2, Lemmas 8.1, 8.2 and 9.1. First we
notice that the two single-valued Riccati solutions in Lemma 9.1 correspond to the two Riccati
fixed points F̃ix
j(θ) = {p, q} in (50). On the other hand, for the same reason as in Example 9.6,
formulas (65) and (67) in Lemmas 8.1 and 8.2 give us as many meromorphic solutions around
z = 0 as the cardinality of smooth fixed points F̃ix
j(θ) = {{P̃ (bi, b−14 ; bj , bk), P̃ (bj , b−1k ; bi, b4)}}
in (50). Thus the set F̃ixj(θ) is exhausted by meromorphic solutions around z = 0.
Example 9.8 (A2) We combine the results of Example 7.3, Lemmas 8.2 and 9.2. As (51)
shows, the set F̃ixj(θ) consists of the four points p0, p+, p− and P̃ (bj , b
k ; bi, b4). On the other
hand, we have the three single-valued Riccati solutions of Lemma 9.2 and one non-Riccati
holomorphic solution (67). Clearly, the three Riccati solutions correspond to the points p0,
p+ and p−, while the non-Riccati solution corresponds to the remaining point P̃ (bj , b
k ; bi, b4).
Thus any single-valued solution around z = 0 is a meromorphic solution.
Example 9.9 (A
) We combine the results of Example 7.4, Lemmas 8.2 and 8.3. First we
consider the W̃ (D
4 )-stratum of type (A
1 )i. The C-parameter family (68) of holomorphic
solutions injects into the line ℓ̃+j ≃ C in (53), so that we have an injection C →֒ C. Since this
injection is holomorphic, it must be a surjection. Thus the line ℓ̃+j is exhausted by the family
(68). Moreover we have the two holomorphic solutions (66) and (67), which do not lie in the
family (68). Thus they must correspond to the points P̃ (bi, b4; bj, bk) and P̃ (bi, b
4 ; bj , bk) in
(53). So on the stratum of type (A⊕21 )i the set F̃ixj(θ) is exhausted by meromorphic solutions
around z = 0. Next we consider the W̃ (D
4 )-strata of types (A
1 )j and (A
1 )k. On these
strata the equality F̃ixj(θ) = F̃ix
j(θ) in (54) implies that any single-valued solution around
z = 0 is a Riccati and hence meromorphic solution.
Example 9.10 (A3) We combine the results of Example 7.5, Lemmas 8.2, 8.4 and 9.3. First
we consider the W̃ (D
4 )-stratum of type (A
1 )i. The P
1-family (77) of single-valued Riccati
solutions exactly corresponds to the exceptional curve e0 ≃ P1 in (55). So the C-family (70) of
holomorphic solutions must inject into the line ℓ̃+j ≃ C in (55). Since this injection C →֒ C is
holomorphic, it must be a surjection. Hence ℓ̃+j is exhausted by the family (70). Moreover there
is the holomorphic solution (67), which must correspond to the point P̃ (bi, b
4 ; bj, bk) in (55).
Thus on the stratum of type (A⊕21 )i the set F̃ixj(θ) is exhausted by meromorphic solutions
around z = 0. Next we consider the W̃ (D
4 )-strata of types (A
1 )j and (A
1 )k. On these
strata the equality F̃ixj(θ) = F̃ix
j(θ) in (56) implies that any single-valued solution around
z = 0 is a Riccati and hence meromorphic solution.
Example 9.11 (A
) We combine the results of Example 7.6 and Lemma 8.3. As (57) shows,
F̃ixj(θ) has only one line component ℓ̃
j ≃ C. Hence the C-family (68) of holomorphic solu-
tions must inject into this line, so that we have an inclusion C →֒ C. Since this injection is
holomorphic, it must be a surjection. Thus ℓ̃+j is exhausted by the family (68).
Example 9.12 (D4) We combine the results of Example 7.7, Lemmas 8.5 and 9.4. The P
family (78) of rational Riccati solutions corresponds to the exceptional curve e0 in (58), while
the P1-family (79) of single-valued Riccati solutions corresponds to the exceptional curve ej
there. Hence C-family (71) of holomorphic solutions, which is different from (78) and (79),
must inject into the line ℓ̃+j ≃ C in (58). Since this injection C →֒ C is holomorphic, it must
be a surjection. Thus ℓ̃+j is exhausted by the family (71).
Example 9.13 (A
) We combine the results of Example 7.8 and Lemma 8.6. In view of
F̃ixj(θ) = ℓ̃
j ∐ ℓ̃−j , the C-family (72) of holomorphic solutions injects into the line ℓ̃εj ≃ C for
some sign ε ∈ {±1}. So we have an injection C →֒ C. Since this injection is holomorphic, it
must be a surjection, so that ℓ̃εj is exhausted by the family (72). Then the other C-family (73)
of holomorphic solutions injects into the remaining line ℓ̃−εj ≃ C. So we have another injection
C →֒ C. Since this injection is holomorphic, it must be a surjection. Hence F̃ixj(θ) = ℓ̃+j ∐ ℓ̃−j
is exhausted by the families (72) and (73). The proof of Lemma 9.5 is now complete. ✷
Finally we argue the periodic point case using the “injection-implies-surjection” principle.
Lemma 9.14 For any n > 1 the set P̃erj(θ;n) is exhausted by algebraic n-branch solutions
around z = 0.
Proof. We combine Lemmas 6.6 and 8.7. First we consider the generic case where κ ∈ K−Wall,
namely, where θ = rh(κ) is such that ∆(θ) 6= 0. In this case there is no Riccati locus and hence
P̃erj(θ;n) = P̃er
j (θ;n), which is biholomorphic to the disjoint union of ϕ(n) copies of C
Lemma 6.6. On the other hand, by Lemma 8.7, there are ϕ(n) mutually disjoint C×-parameter
families of algebraic n-branch solutions around z = 0 as in (74). Number these families from
1 to ϕ(n). The first family injects into a (unique) connected component (≃ C×) of P̃erj(θ;n),
which we call the first component, and we have an injection C× →֒ C×. Since this injection is
holomorphic, it must be a surjection and hence the first component is exhausted by the first
family. Consider the second family of solutions and the corresponding second component of
P̃erj(θ;n). Notice that the second component is different from the first one, because the first
component is already occupied by the first family and so it cannot contain the second family.
For the same reason as above, the second component is exhausted by the second family. Since
the families and the components have the same cardinality ϕ(n), we can repeat this argument
to conclude that P̃erj(θ;n) is exhausted by the ϕ(n) families of algebraic n-branch solutions.
Next we consider the case where κ ∈ Wall, namely, where the Riccati part P̃er
j(θ;n) may
appear. Since the lemma is trivial for the Riccati part, we have only to consider the non-Riccati
part P̃er
j(θ;n). The argument proceeds just in the same manner as in the last paragraph, once
we show that the family of solutions in (74) contains no Riccati solutions (see Remark 8.8). To
see this, we consider the family S̃ → Θ of surfaces S̃(θ) parametrized by θ ∈ Θ and put
j (n) =
j (θ;n), E =
E(θ),
where E(θ) is the exceptional set in S̃(θ). (Precisely speaking, the parameter space Θ should
be replaced by a finite covering of it to get a simultaneous minimal resolution.) Then P̃er
j (n)
and E are closed subsets of S̃ which are disjoint by Lemma 4.5. Now we look at the family
of solutions in (74). It depends continuously on κ ∈ K. Take any point κ∗ ∈ Wall and let
K − Wall ∋ κ → κ∗. For any κ ∈ K −Wall, the family at κ is contained in P̃er
j(θ;n) with
θ = rh(κ) and hence in P̃er
j(n). Taking the limit κ → κ∗, we see that the family at κ∗ is
contained in P̃er
j (n), hence in P̃er
∗;n) with θ∗ = rh(κ∗). Since P̃er
∗;n) is disjoint from
E(θ∗), the family at κ∗ contains no Riccati solutions. Therefore the proof is complete. ✷
Now the local statement of Theorem 1.3 around a fixed singular point, say z = 0, is an
immediate consequence of Lemmas 9.5 and 9.14. At the same time all the finite branch solutions
around z = 0 have been classified up to Bäcklund transformations. The global statement about
algebraic solutions follows readily from the local statements around z = 0, 1, ∞, together with
the analytic Painlevé property on Z = P1 − {0, 1,∞}. The proof of Theorem 1.3 is complete.
References
[1] F.V. Andreev and A.V. Kitaev, Transformations RS24(3) of the ranks ≤ 4 and algebraic
solutions of the sixth Painlevé equation, Comm. Math. Phys. 228 (2002), 151–176.
[2] P. Boalch, From Klein to Painlevé via Fourier, Laplace and Jimbo, Proc. London Math.
Soc. (3) 90 (2005), 167–208.
[3] P. Boalch, The fifty-two icosahedral solutions to Painlevé VI, J. Reine Angew. Math. 596
(2006), 183–214.
[4] E. Brieskorn, Über die Auslösung gewisser Singularitäten von holomorphen Abbildungen,
Math. Ann. 166 (1966), 76–102.
[5] A.D. Bruno, Power asymptotics of solutions to an ordinary differential equation, Dokl.
Math., 68 (2003), no. 2, 199-203.
[6] A.D. Bruno, Power-logarithmic expansion of solutions to an ordinary differential equation,
Dokl. Math., 68 (2003), no. 2, 221-226.
[7] A.D. Bruno and I.V. Goryuchkina, Expansions of solutions of the sixth Painlevé equation,
Dokl. Math., 69 (2004) no. 2, 268–272.
[8] B. Dubrovin and M. Mazzocco, Monodromy of certain Painlevé-VI transcendents and
reflection groups, Invent. Math. 141 (2000), no. 1, 55–147.
[9] R. Garnier, Étude de l’intégrale générale de l’équation VI de M. Painlevé dans le voisinage
de ses singularités transcendentes, Ann. Sci. École Norm. Sup. 34 (1917), 239–353.
[10] R. Gérard, Une classe d’équations différentielles non lineaires à singularité régulière, Funk-
cial. Ekvac. 29 (1986), no. 1, 55–76.
[11] R. Gérard and Y. Sibuya, Etude de certains systèmes de Pfaff avec singularité, Lecture
Notes in Math. 712, Springer, Berlin, 1979, 131–288.
[12] D. Guzzetti, The elliptic representation of the general Painlevé VI equation, Comm. Pure
Appl. Math. 55 (2002), 1280–1363.
[13] N. Hitchin, Poncelet polygons and the Painlevé equations, Geometry and analysis, Bombay,
1992, Tata Inst. Fund. Res., Bombay (1995), 151–185.
[14] N. Hitchin, A lecture on the octahedron, Bull. London Math. Soc. 35 (2003), no. 5, 577–
[15] M. Inaba, K. Iwasaki and M.-H. Saito, Bäcklund transformations of the sixth Painlevé
equation in terms of Riemann-Hilbert correspondence, Internat. Math. Res. Notices 2004:1
(2004), 1–30.
[16] M. Inaba, K. Iwasaki and M.-H. Saito, Dynamics of the sixth Painlevé equation, to appear
in: Théorie asymptotique et équations de Painlevé (Angers, juin 2004), M. Loday and
E. Delabaere (Éd.), Séminaires et Congrès, Soc. Math. France.
[17] M. Inaba, K. Iwasaki and M.-H. Saito, Moduli of stable parabolic connections, Riemann-
Hilbert correspondence and geometry of Painlevé equation of type V I. Part I, Publ. Res.
Inst. Math. Sci. 42 (2006), no. 4, 987–1089.
[18] M. Inaba, K. Iwasaki and M.-H. Saito, Moduli of stable parabolic connections, Riemann-
Hilbert correspondence and geometry of Painlevé equation of type V I. Part II, Adv. Stud.
Pure Math. 45 (2006), 387–432.
[19] K. Iwasaki, An area-preserving action of the modular group on cubic surfaces and the
Painlevé VI equation, Comm. Math. Phys., 242 (2003), no. 1-2, 185–219.
[20] K. Iwasaki and T. Uehara, An ergodic study of Painlevé VI, Math. Ann. (in press). Online
First DOI: 10.1007/s00208-006-0077-8. arXiv: math.AG/0604582.
[21] K. Iwasaki and T. Uehara, Chaos in the sixth Painlevé equation, RIMS Kôkyûroku
Bessatsu B2 (2007), 73–88.
[22] M. Jimbo, Monodromy problem and the boundary condition for some Painlevé equations,
Publ. Res. Inst. Math. Sci., 18 (1982), no.3, 1137–1161.
[23] K. Kaneko, Painlevé VI transcendents which are meromorphic at a fixed singularity, Proc.
Japan Acad. 82, Ser. A (2006), no. 5, 71–76.
[24] A.V. Kitaev, Grothendieck’s dessins d’enfants, their deformations, and algebraic solutions
of the sixth Painlevé and Gauss hypergeometric equations, Algebra i Analiz 17 (2005), no.
1, 224–275.
[25] M. Mazzocco, Rational solutions of the Painlevé VI equation, J. Phys. A: Math. Gen. 34
(2001), 2281–2294.
[26] K. Okamoto, Sur les feuilletages associés aux équations du second ordre à points critiques
fixes de P. Painlevé, Espaces des conditions initiales, Japan. J. Math. 5 (1979), 1–79.
[27] K. Okamoto, Study of the Painlevé equations I, sixth Painlevé equation PVI, Ann. Math.
Pura Appl. (4) 146 (1987), 337–381.
[28] M.-H. Saito, T. Takebe and H. Terajima, Deformation of Okamoto-Painlevé pairs and
Painlevé equations, J. Algebraic. Geom. 11 (2002), no. 2, 311–362.
http://arxiv.org/abs/math/0604582
[29] M.-H. Saito and H. Terajima, Nodal curves and Riccati solutions of Painlevé equations, J.
Math. Kyoto Univ. 44 (2004), no. 3, 529–568.
[30] H. Sakai, Rational surfaces associated with affine root systems and geometry of the Painlevé
equations, Comm. Math. Phys. 220 (2001), 165–229.
[31] S. Shimomura, A family of solutions of a nonlinear ordinary differential equation and its
application to Painlevé equations (III), (V) and (VI), J. Math. Soc. Japan 39 (1987), no.
4, 649–662.
[32] K. Takano, Reduction for Painlevé equations at the fixed singular points of the first kind,
Funkcial. Ekvac. 29 (1986), no. 1, 99–119.
	Introduction
	Phase Space
	Riemann-Hilbert Correspondence
	Dynamics on Cubic Surface
	Bäcklund Transformations
	Fixed Points and Periodic Points
	Case-by-Case Study
	Power Geometry
	Injection Implies Surjection
ABSTRACT
  Every finite branch solutions to the sixth Painleve equation around a fixed
singular point is an algebraic branch solution. In particular a global solution
is an algebraic solution if and only if it is finitely many-valued globally.
The proof of this result relies on algebraic geometry of Painleve VI,
Riemann-Hilbert correspondence, geometry and dynamics on cubic surfaces,
resolutions of Kleinian singularities, and power geometry of algebraic
differential equations. In the course of the proof we are also able to classify
all finite branch solutions up to Backlund transformations.

<|endoftext|><|startoftext|>
September 15, 2021 0:45 WSPC/INSTRUCTION FILE paper˙rv
Modern Physics Letters A
c© World Scientific Publishing Company
An Inverse f(R) Gravitation for Cosmic Speed up, and Dark Energy
Equivalent
Sohrab Rahvar
Department of Physics, Sharif University of Technology, P.O.Box 11365-9161, Tehran, Iran.∗
Yousef Sobouti
Institute for Advanced Studies in Basic Sciences,
P.O.Box 45195-1159, Zanjan, Iran
Received (Day Month Year)
Revised (Day Month Year)
To explain the cosmic speed up, brought to light by the recent SNIa and CMB ob-
servations, we propose the following: a) In a spacetime endowed with a FRW metric,
we choose an empirical scale factor that best explains the observations. b) We assume
a modified gravity, generated by an unspecified field lagrangian, f(R). c) We use the
adopted empirical scale factor to work back retroactively to obtain f(R), hence the term
‘Inverse f(R)’. d) Next we consider the classic GR and a conventional FRW universe
that, in addition to its known baryonic content, possesses a hypothetical ‘Dark Energy’
component. We compare the two scenarios, and find the density, the pressure, and the
equation of the state of the Dark Energy required to make up for the differences between
the conventional and the modified GR models.
Keywords: Cosmology; Dark Energy; Modified Gravity.
95.36.+x, 98.80.Jk, 98.80.Es
∗rahvar@sharif.edu
http://arxiv.org/abs/0704.0680v2
September 15, 2021 0:45 WSPC/INSTRUCTION FILE paper˙rv
2 Rahvar & Sobouti
As cosmological standard candles, supernovae type Ia (SNIa) appear dimmer
than what one expects from a Cold Dark Matter (CDM) model of the universe
1,2,3. This observation and other evidences from the Cosmic Microwave Background
(CMB) measurements indicate that the universe is in an acceleration phase of its
expansion 4,5,6. A conventional CDM scenario does not explain this speed up.
Some authors have stipulated a dark energy component to make up for whatever
dynamical effects that the known energy momentum content of the model leaves
unaccounted for 7,8,9,10,11,12,13,14,15,16,17,18,19,20,21. Others have entertained
alternatives to Einstein’s gravitation 22,23,24,25,26,27,28,29,30. Yet a third school
have resorted to inhomogeneous FRW universes to explain the dilemma 31. Since
all these approaches attempt to answer the same question, all should be equivalent,
and there should be a way to translate one language to the other.
Here, we are concerned with the ’dark energy’ and ’alternative gravitation’ sce-
narios. We suggest to begin with a Freidman- Robertson- Walker (FRW) universe,
to choose its scale factor, a(t), in a way that best explains the available observa-
tions, and to work out the dynamics of the spacetime. Next, to write down the
field equations for a modified f(R) gravitation, and knowing a(t), to solve for f(R).
Finally, to attribute whatever deviations from the conventional FRW results there
is, to a dark energy field, and to obtain its density, pressure, equation of state, etc.
The choice of the scale factor
In a FRW metric, ds2 = −dt2 + a(t)2dx2, single term scale factors of the form
a ∝ tβ lead to constant deceleration parameters, q = −äa/ȧ2 = (1 − β)/β, and do
not serve the purpose. We propose the following two-term ansatz
a(t) =
1 + p
(t/t0)
1 + p(t/t0)
, (1)
where t0 is the age of the universe, and α and p are the free parameters of the model,
to be adjusted to ensure compatibility of the emerging results with observations.
The factor (1+p)−1 is introduced to have a(t0) = 1. By letting either α or p tend to
zero, one recovers the standard CDM universe. Hereafter, for economy in writing,
we will use the time parameter τ = p(t/t0)
2α/3 instead of the conventional time t.
From Eq. (1) one finds
[1− (1 + α)(2α− 1)τ ] [1 + τ ] [1 + (1 + α)τ ]−2 (2)
)−3/2α [1 + (1 + α)τ ] [1 + τ ]
, (3)
R = t02R = 6H2(1− q) =
)−3/α (4)
× [1 + (2 + 5α+ 2α2)τ + (1 + 5α+ 4α2)τ2] [1 + τ ]−2 .
For α > 1/2, q can become negative and remain nonsingular throughout. Transition
from a decelerated phase of expansion to an accelerated one takes place at τtrans =
September 15, 2021 0:45 WSPC/INSTRUCTION FILE paper˙rv
An Inverse f(R) Gravitation for Cosmic Speed up, and Dark Energy Equivalent 3
[(α+ 1)(2α− 1)]−1 or ttrans = [(α+ 1)(2α− 1)p]
−3/2α
t0 < t0. For −1 < α < 1/2,
q is always positive and nonsingular. For α < −1, q can become negative but also
singular. The last two possibilities are discarded. For all values of α and p,
limit q(t → 0) =
limit q(t → ∞) = (1− 2α)/2(1 + α). (5)
In Fig.(1) we have plotted q(t) versus the redshift, z = a(t0)/a(t) − 1, for several
values of α and p = 1/3.
0 1 2
Fig. 1. Plot of q(t) versus the redshift, z = a(t0)/a(t) − 1, for α = 0, 1, 2, 3, 4; and p = 1/3. The
case α = 0 gives the classic value, q = 2/3. As α increases from 1 to 4, transition to the accelerated
phase of expansion moves from later to earlier epochs, from smaller z’s to larger one.
As to H and R, both remain positive for all times. Both tend to ∞ as τ → 0
and decease monotonically to 0 as τ → ∞. They exhibit a normal behavior in the
neighborhood of τtrans = [(1 + α)(2α − 1)]−1.
Equation (3), written for the present epoch, reveals a constraint on α and p,
that should be observed in the final design of the model. Thus,
[1 + (1 + α)p]/[1 + p] =
H0t0 ≈
. (6)
September 15, 2021 0:45 WSPC/INSTRUCTION FILE paper˙rv
4 Rahvar & Sobouti
0 0.5 1 1.5
Fig. 2. The distance modulus of the SNIa Gold’s sample versus redshifts, black circles; And the
plot of Eq. (7) with the scale factor of Eq. (1), solid line. Parameters of the model are α = 2 and
p = 1/3.
Empirical values of α & p
The distance modulus (corrected for the reddening) and the (dimensionless) lumi-
nosity distance, DL(z;α, p), of supernovae are related as
µ = m−M = 5 logDL(z;α, p) + 25, (7)
DL(z;α, p) = (1 + z)
dzH(z;α, p)−1.
In Fig. (2), the observed distance modulus of 157 SNIa of Gold’s sample are plot-
ted versus the redshifts. The solid curve is the plot of equation (7), in which, in
compliance with the constraint of Eq. (6), we have chosen
α = 2, p = 1/3. (8)
The fit to the data points is adequate for our purpose, though the parameters can
be refined to optimize the fit. These numbers give ttrans = 0.43t0 and ztrans = 1.14.
On adopting H0 ≈ 70km sec−1 Mpc−1, one obtains an age of t0 ≈ 12.4Gyr for the
universe.
An inverse f(R) way out
We begin with a modified field equation generated by an, as yet, unspecified field
lagrangian, f(R),
Rµν −
Rgµν =
(f −RF )gµν (9)
(∇µ∇ν − gµν∇λ∇λ)F −
T (M)µν ,
September 15, 2021 0:45 WSPC/INSTRUCTION FILE paper˙rv
An Inverse f(R) Gravitation for Cosmic Speed up, and Dark Energy Equivalent 5
where F (R) = df(R)/dR, and κ = 8πG. For a universe of FRW type, filled with a
perfect fluid of density ρm and pressure pm, Eq. (9) and the equation of continuity
reduce to
3HḞ + 3H2F −
(RF − f)− κρm = 0, (10)
F̈ −HḞ + 2ḢF + κ(ρm + pm) = 0, (11)
ρ̇m + 3H(ρm + pm) = 0, (12)
We further neglect the pressure and integrate Eq. (12) to obtain ρm(t) = ρ0a(t)
Next we substitute for ρm in Eq. (11), assume α = 2, change the time variable to
τ = p(t/t0)
4/3, and find
(1 + τ)3τ2F ′′ −
(1 + τ)2(1 + 5τ)τF ′ (13)
(1 + τ)(1 +
τ + 3τ2)F +
where the ′′ ′ now stands for d/dτ , and we have, arbitrarily, put the dimensionless
constant (3/4)3(1+p)3t0
2κρ0, that appear in the course of mathematical manipula-
tions equal to one. We will shortly discuss the numerical solution of Eq. (13). Some
general remarks on its asymptotic behavior, however, are instructive. As τ → 0 we
F (τ) =
τ2 + . . .
+ c1 τ
−0.44P1(τ) + c2 τ
1.7P2(τ), τ → 0.
where P1 and P2 are calculable polynomials in τ , and begin with term 1; c1 and c2 are
constants of integration to be obtained from boundary conditions. The exponents,
−0.44 and 1.7 are approximate solutions of the indicial equation, s2− 5
= 0. As
τ → 0, F diverges to ∞ or converge to 0 on account of one or the other term. This
feature makes the solutions sensitive to a CDM type boundary conditions of the
form, F (τinitial) = 1 and F
′(τinitial) = 0. Presently we have no basis, observational
or otherwise, to make an intelligent guess as to what the appropriate boundary
conditions are. For the sake of argument, however, we have adopt c1 = c2 = 0, and
kept only the proper solution of Eq. (12). With F (τ) known, Eq.(10) becomes an
algebraic equation to calculate f(τ). The numerically calculated solutions of F (τ),
f(τ) and R = t02R(τ) are plotted in Fig. (3). Elimination of τ in favor R provides
F (R) and f(R)
It is instructive to examine the asymptotic behavior of F (R) and f(R) analyti-
cally. In the limit of small and large τ ’s, corresponding to large and small R’s, one
finds
F (R) =
6pR−2/3 + · · ·
R → ∞ (15)
R → 0. (16)
September 15, 2021 0:45 WSPC/INSTRUCTION FILE paper˙rv
6 Rahvar & Sobouti
0.1 0.2 0.3 0.4 0.5
100* F( )
Fig. 3. F (τ), proper solution of Eq. (13) (×100), dot- dashed line; f(τ), dashed line; and R(τ),
solid line; α = 2, p = 1/3.
Note that F (R) is a dimensionless scalar as it should be. With F (R) = df/dR
known, it is a matter of simple integration to obtain f(R). Thus,
f(R) =
6pR−2/3 + · · ·
, R → ∞, (17)
, R → 0. (18)
At early epochs (large R’s), the spacetime approaches the conventional FRW uni-
verse with the classical GR while for the later times we have a positive acceleration
universe. Another point about this action is that for R → 0 in the solar system
scales, f(0) = 0 and f ′(0) = 0. In this case we will have standard GR equation
and f(R) evades from the solar system test. Recently this type of models have been
studied by introducing action in the form of f(R) = R+ f1(R), where for R = 0 in
the solar system, f(0) = 0 and for larger R’s, in cosmological scales and inside the
large scale structures the action reduces to f(R) = R− Λ 32,33.
In the remaining range of R, integration is done numerically and the results are
plotted in Fig. (4).
Dark Energy equivalent
Instead of the modified gravitation considered above, let us assume a classic FRW
universe. That is, let f(R) = R and F = 1. Let this universe, however, have a
September 15, 2021 0:45 WSPC/INSTRUCTION FILE paper˙rv
An Inverse f(R) Gravitation for Cosmic Speed up, and Dark Energy Equivalent 7
10 20 30 40
Fig. 4. Numerical plot of f(R) versus R. τ is eliminated between f(τ) and R(τ); α = 2, p = 1/3.
’Dark Energy’ component, in addition to its conventional baryonic content. The
counterparts of Eqs. (10), and (11) will be
3H2 − κ (ρde + ρm) = 0, (19)
2Ḣ + κ (ρde + ρm) + κ (pde + pm) = 0. (20)
Subtracting Eq. (10) from (19), and Eq. (11) from (20) gives
κρde = 3H
2(1− F )− 3HḞ +
(RF − f), (21)
κ(ρde + pde) = F̈ −HḞ − 2Ḣ(1− F ), or
κpde = F̈ + 2HḞ −H2(1− 2q)(1− F )
(RF − f). (22)
The equation of state for the dark energy is obtained by eliminating τ , implicit in
F and H , between Eqs. (21) and (22). This is done numerically and w = pde/ρde
as a function of the redshift is plotted in Fig. (5).
Concluding remarks
The algorithm of Fig. 6 summarizes the path we have followed in this letter. We
have resorted to the SNIa observations to design an empirical FRW metric that
allows the model universe to transit from a phase of decelerated expansion at early
September 15, 2021 0:45 WSPC/INSTRUCTION FILE paper˙rv
8 Rahvar & Sobouti
0 1 2 3
-0.685
-0.68
-0.675
-0.67
Fig. 5. Equation of State of the Dark Energy: Numerical plot of w = pde/ρde versus the redshift
epochs to an accelerated one at later times. The spacetime is almost a CDM model
and the gravity is almost the classic GR at very early times, but evolves away in
course of time. Next we have maintained that the so-designed spacetime is deducible
from a modified non- Hilbert- Einstein field lagrangian, f(R) . Knowing the metric,
we have solved the modified field equations retroactively for the sought-after f(R).
Finally, we have compared our results with those of a conventional FRW model
and have attributed the differences between the two to a dark energy component.
Eventually, we have extracted the density, the pressure, and the equation of state
of this stipulated energy.
We note that our choice of the scale factor and the adjustment of its free parameters,
to comply with the available cosmological observations, is, by no means, unique.
The goal is simply to demonstrate that the use of the observations at the outset,
to deduce the rudiments of what seems reasonable, facilitates the access to possible
formal underlying theories, the action based f(R) gravity in our case. With the
availability of more extensive and more accurate data in future one may come back
and revise the model. See also 34 for a similar emphasis.
One of us (YS)35 has followed the same path to propose a modified gravitation for
galactic environments and to explain the flat rotation curves and the Tully-Fisher
relation in spiral galaxies without recourse to hypothetical dark matters.
References
1. A. G. Riess et al., Astron. J. 116, 1009 (1998).
September 15, 2021 0:45 WSPC/INSTRUCTION FILE paper˙rv
An Inverse f(R) Gravitation for Cosmic Speed up, and Dark Energy Equivalent 9
boundary
condition for F( )
differential
equation for F( )
equation of
f=f(F, F’, )
R =R( )
Eliminating
results in f=f(R)
Emprical a( )
from observation
Fig. 6. Algorithm of inverse f(R): Choose a scale factor; Solve field equations first for F = df/dR
and next for f as functions of the time parameter τ ; Eliminate τ between R(τ) f(τ) to arrive at
f(R).
2. S. Perlmutter et al., Astrophys. J. 517, 565 (1999).
3. A. G. Riess et al., Astrophys. J. 607, 665 (2004).
4. C. L. Bennett et al., Astrophys. J. Suppl. 148, 1 (2003).
5. H. V. Peiris et al., Astrophys. J. Suppl. Ser. 148, 213 (2003).
6. D. N. Spergel, L. Verde, H. V. Peiris et al., Astrophys. J. 148, 175 (2003).
7. E. J. Copeland, M. Sami, S. Tsujikawa, Int. J. Mod. Phys. D 15, 1753 (2006)
8. C. Wetterich, Nucl. Phys. B 302, 668 (1988)
9. P. J. E. Peebles and B. Ratra, Astrophys. J. 325, L17 (1988)
10. B. Ratra and P. J. E. Peebles, Phys. Rev. D 37, 3406 (1988).
11. J. A. Frieman, C. T. Hill, A. Stebbins, and I. Waga, Phys. Rev. Lett. 75, 2077 (1995)
12. M. S. Turner and M. White, Phys. Rev. D 56, R4439 (1997)
13. R. R. Caldwell, R. Dave, and P. J. Steinhardt, Phys. Rev. Lett. 80, 1582 (1998).
14. V. Sahni and A. Starobinsky, Int. J. Mod. Phys. D 9, 373 (2000).
15. A. R. Liddle and R. J. Scherrer, Phys. Rev. D 59, 023509 (1999).
16. I. Zlatev, L.Wang, and P. J. Steinhardt, Phys. Rev. Lett. 82, 896 (1999).
17. P. J. Steinhardt, L. Wang, and I. Zlatev, Phys. Rev. D 59, 123504 (1999).
18. D. F. Torres, Phys. Rev. D 66, 043522 (2002).
19. M. S.; S. Arbabi Bidgoli, M. S. Movahed, and S. Rahvar, Int. J. Mod. Phys. D 15,
1455 (2006).
20. Movahed and S. Rahvar, Phys. Rev. D 73, 083518, (2006).
21. S. Rahvar and M. S. Movahed, Phys. Rev. D 75, 023512, (2007).
22. T. Clifton and J. Barrow, Phys. Rev. D 72, 103005 (2005).
23. S. Nojiri and S. D. Odintsov, Phys. Rev. D 68, 123512 (2003).
24. S. Nojiri and S. D. Odintsov, Phys. Lett. B 562, 147 (2003).
September 15, 2021 0:45 WSPC/INSTRUCTION FILE paper˙rv
10 Rahvar & Sobouti
25. C. Deffayet, G. R. Dvali, and G. Gabadadze, Phys. Rev. D 65, 044023 (2002).
26. K. Freese and M. Lewis, Phys. Lett. B 540, 1 (2002).
27. M. Ahmed, S. Dodelson, P. B. Greene, and R. Sorkin, Phys. Rev. D 69, 103523 (2004).
28. G. R. Dvali, G. Gabadadze, and M. Porrati, Phys. Lett. B 484, 112 (2000).
29. S. Baghram, M. Farhang, and S. Rahvar, Phys. Rev. D 75, 044024 (2007).
30. M. S. Movahed, S. Baghram and S. Rahvar, Phys. Rev. D 76, 044008 (2007).
31. M. N. Celerier, New Advances in Physics 1, 29 (2007)
32. A. A. Starobinsky, JETPL 86, 157 (2007).
33. W. Hu, I. Sawicki, Phys. Rev. D 76, 064004 (2007).
34. A. Shafieloo, U. Alam, V. Sahni and A. A. Starobinsky, MNRAS 366, 1081 (2006)
35. Y. Sobouti, A&A 464, 921 (2007)
ABSTRACT
  To explain the cosmic speed up, brought to light by the recent SNIa and CMB
observations, we propose the following: a) In a spacetime endowed with a FRW
metric, we choose an empirical scale factor that best explains the
observations. b) We assume a modified gravity, generated by an unspecified
field lagrangian, $f(R)$. c) We use the adopted empirical scale factor to work
back retroactively to obtain $f(R)$, hence the term `Inverse $f(R)$'. d) Next
we consider the classic GR and a conventional FRW universe that, in addition to
its known baryonic content, possesses a hypothetical `Dark Energy' component.
We compare the two scenarios, and find the density, the pressure, and the
equation of the state of the Dark Energy required to make up for the
differences between the conventional and the modified GR models.

<|endoftext|><|startoftext|>
Introduction
	II Experiment
	III Imaging local dynamics in a drying paint film
	IV Time resolved correlation
	V Conclusions
	 Acknowledgments
	 References
ABSTRACT
  We report on a new type of experiment that enables us to monitor spatially
and temporally heterogeneous dynamic properties in complex fluids. Our approach
is based on the analysis of near-field speckles produced by light diffusely
reflected from the superficial volume of a strongly scattering medium. By
periodic modulation of an incident speckle beam we obtain pixel-wise ensemble
averages of the structure function coefficient, a measure of the dynamic
activity. To illustrate the application of our approach we follow the different
stages in the drying process of a colloidal thin film. We show that we can
access ensemble averaged dynamic properties on length scales as small as ten
micrometers over the full field of view.

<|endoftext|><|startoftext|>
Introduction
Along the Asymptotic Giant Branch phase, many stars exhibit
maser amplification in different molecular lines. In oxygen rich
stars, [O]/[C]>1, O-bearing compounds are mainly formed and
maser emission is presented in SiO, H2O and OH. The study
of these different molecules provides information of the overall
envelope, from the inner layers dominated by the stellar pulsa-
tion (SiO masers) to the outermost regions where the circum-
stellar material is expanding at constant velocity (OH masers).
The very long baseline interferometry is a unique technique
to study the compact and very bright SiO emission, and there-
fore, it is particularly helpful in understanding the different and
complex processes occuring in these inner regions of the enve-
lope. On the other hand, the current models of pumping, either
collisional or radiative, do not reproduce some characteristics
of the SiO masers that have been observed, as for example,
their relative location in the envelope. To test these models and
constrain the physical parameters of these inner shells better, it
is very useful to perform simultaneous observations of several
maser transitions. For this reason, we have carried out multi-
line and multi-epoch observations in a sample of AGB stars.
Send offprint requests to: R. Soria-Ruiz
e-mail: soria@jive.nl
We present in this paper the latest results for the Mira-type vari-
able R Leo.
2. VLBA observations
The observations were made with the NRAO1 Very Long
Baseline Array (VLBA) on 2002 december 7. Nearly simul-
taneous observations of different 43 GHz and 86 GHz 28SiO
and 29SiO maser transitions were performed in the variable star
R Leo. The 86 GHz lines were observed in between the two
43 GHz scans, and, therefore, we can assume for our purpouses
that the observations were simultaneous.
The data correlation was done at the VLBA correlator lo-
cated in Socorro (New Mexico). Left and right circular polar-
izations (LCP & RCP) were measured for the 28SiO v=1 and
v=2 J=1–0 lines, whereas only the LCP was observed in the
other transitions. Since no significant difference was found be-
tween the maps, less than 5%, the final image is the average of
both polarizations.
Standard procedures for spectral line VLBI data reduction
were followed in the calibration and production of the maps.
1 The National Radio Astronomy Observatory is a facility of the
National Science Foundation operated under cooperative agreement
by Associated Universities, Inc.
http://arxiv.org/abs/0704.0682v1
2 R. Soria-Ruiz et al.: Mapping the circumstellar SiO maser emission in R Leo
Table 1. Observed maser transitions and results of the fits.
specie transition restoring R̄ △R center
beam (mas2) (mas) (mas) (Xc,Yc)
28SiO v=1 J=1–0 0.78×0.50 29.24 6.42 (26.9,–21.3)
v=1 J=2–1 0.50×0.50 33.84 4.20 (–35.2,–4.7)
v=2 J=1–0 0.50×0.50 25.92 6.88 (25.5,–16.3)
29SiO v=0 J=1–0 0.78×0.22 one spot — —
v=0 J=2–1 non–det. — — —
The amplitude calibration was done using the system temper-
atures and antenna gain corrections for the 86 GHz and 29SiO
data, and the template method for the other 43 GHz data. The
phase errors were removed in a two-step process: first, the
single-band delay corrections were derived from the continuum
calibrators, OJ287 and 3C273. Second, the fringe-rates were
estimated by selecting a bright and simple-structured channel;
the corrections found were subsequently applied to the maser
source. The maps were produced using the CLEAN deconvo-
lution algorithm.
3. Results and Data Fits
The results are presented as follows (Figs. 1–2): for each
observed line, we show the integrated emission map in
Jy beam−1 km s−1 units (center panel), the spectrum of the
cross-correlated emission for the different maser components
(numbered panels), the total power spectrum (AC) of one of the
VLBA antennas and of the emission in the map (XC) (upper-
left panel), and the ratio of these two magnitudes (upper-right
panel). We have also estimated the size of the total masing re-
gions by fitting our data to rings. Only those components with
SNR≥6 have been included in the fits. The results derived from
the calculations are summarized in Table 1: characteristic ring
radius (R̄), ring width (△R) and center of the ring (Xc,Yc) (see
details on the fitting process in Soria-Ruiz et al., 2005). In par-
ticular, the angular sizes derived for the 28SiO v=1 and v=2
J=1–0 regions (Table 1) are compatible with previous obser-
vations performed in R Leo by Cotton et al. (2004). Among the
six transitions observed only the 29SiO v=0 J=2–1 has not been
detected. A more detailed description of the maps is given in
the subsequent section.
4. Relative spatial distribution and pumping
mechanisms
Our maps show that the spatial distribution of the v=1 and v=2
maser spots is similar although not all the components appear
in both transitions (Fig. 1). Concerning the relative location of
the 43 GHz 28SiO maser layers, the v=2 emission is produced
in a closer region of the envelope, assuming that the centroids
of all the emissions are coincident. This is also consistent with
previously reported results in other oxygen rich envelopes (see
e.g. Desmurs et al., 2000; Cotton et al., 2004; Soria-Ruiz et al.,
2004, 2005). In contrast to the 43 GHz regions, this first map of
the 28SiO v=1 J=2–1 emission in R Leo reveals that the com-
ponents of this maser line are situated in a significantly outer
region of the envelope, with a very different spot distribution.
Since the J=2–1 emission has been imaged only in a very few
sources, this result in R Leo is particularly important to test
the proposed SiO maser mechanisms. Finally, the 29SiO v=0
J=1–0 map consisted of one maser spot, thus making difficult
to derive any spatial information. The total power and recov-
ered emission are shown in Figure 2.
Current pumping models, either radiative (Bujarrabal,
1994a,b) or collisional (Humphreys et al., 2002), predict that
the different rotational maser lines within the same vibra-
tional state are produced under similar conditions and there-
fore are expected to be located in the same region of the en-
velope. As previously mentioned, we find a contradiction be-
tween these theoretical predictions and our observational re-
sults. This discrepancy has also been observed in other oxygen-
rich stars; IRC+10011 (Soria-Ruiz et al., 2005) and TX Cam
(Soria-Ruiz et al., 2006). Further calculations of the excita-
tion of the SiO molecule in AGB stars have shown that the
conditions under which the different maser transitions occur
change drastically when the line overlap between infrared
lines of H2O and
28SiO is introduced in the pumping models
(Bujarrabal et al., 1996; Soria-Ruiz et al., 2004); such a mech-
anism could explain the lack of coincidence between the spots
of different J–transitions within a vibrational state.
Nevertheless, although these new maps support the rele-
vance of line overlaps in the SiO maser pumping in O–rich
shells, we think that similar studies should be performed in a
larger number of evolved stars. In particular, it would be nec-
essary to have data on all types of long-period variable stars,
namely, Mira-type, semirregular and irregular variables, as well
as supergiant stars.
Acknowledgements. This work has been financially supported by the
Spanish DGI (MCYT) under projects AYA2000-0927 and AYA2003-
7584. All plots have been made using the GILDAS software package
(http://www.iram.fr/IRAMFR/GILDAS).
References
Bujarrabal, V. 1994a, A&A, 285, 953
Bujarrabal, V. 1994b, A&A, 285, 971
Bujarrabal, V., Alcolea, J., Sánchez Contreras, C., & Colomer,
F. 1996, A&A, 314, 883
Bujarrabal, V., Gómez-González, J., & Planesas, P. 1989,
A&A, 219, 256
Cotton, W. D., Mennesson, B., Diamond, P. J., et al. 2004,
A&A, 414, 275
Desmurs, J.-F., Bujarrabal, V., Colomer, F., & Alcolea, J. 2000,
A&A, 360, 189
Humphreys, E. M. L., Gray, M. D., Yates, J. A., et al. 2002,
A&A, 386, 256
Soria-Ruiz, R., Alcolea, J., Colomer, F., et al. 2004, A&A, 426,
Soria-Ruiz, R., Colomer, F., Alcolea, J., et al. 2005, A&A, 432,
Soria-Ruiz, R., Colomer, F., Alcolea, J., et al. 2006,
Proceedings of the 8th EVN Symposium
http://www.iram.fr/IRAMFR/GILDAS
R. Soria-Ruiz et al.: Mapping the circumstellar SiO maser emission in R Leo 3
Fig. 1. The 28SiO v=1 (upper panel) and v=2 (lower panel) J=1–0 maser emission in R Leo. Each figure shows the integrated
intensity map in Jy beam−1 km s−1 units, the spectra of the individual maser components, the total power spectrum and the
emission in the map, and their ratio. For some maser components, the intensity has been divided by a factor of 2 or 3 to ease
the comparison with the other spectra. The vertical dashed lines indicate the systemic velocity of the source, VLSR= –0.5 km s
(Bujarrabal et al., 1989). Circles represent the fits for the masing regions (dashed: mean radius R̄, continuous: Rout and Rin defined
as R̄± 12△R) (see Section 3). The peak intensity is 28.45 Jy beam
−1 km s−1 (v=1) and 53.5 Jy beam−1 km s−1 (v=2), and the shown
contour is equivalent to the 5σ level, with σ= 0.4 Jy beam−1 km s−1 (v=1) and σ= 0.7 Jy beam−1 km s−1 (v=2).
4 R. Soria-Ruiz et al.: Mapping the circumstellar SiO maser emission in R Leo
Fig. 2. Same as Figure 1 for the 28SiO v=1 J=2–1 (upper panel) and 29SiO v=0 J=1–0 (lower panel) maser emission in R Leo.
The peak intensity is 6.03 Jy beam−1 km s−1 (28SiO) and 0.26 Jy beam−1 km s−1 (29SiO), and the shown contour is equivalent to
the 5σ level, with σ= 0.08 Jy beam−1 km s−1 (28SiO) and σ= 0.01 Jy beam−1 km s−1 (29SiO).
	Introduction
	VLBA observations
	Results and Data Fits
	Relative spatial distribution and pumping mechanisms
ABSTRACT
  The study of the innermost circumstellar layers around AGB stars is crucial
to understand how these envelopes are formed and evolve. The SiO maser emission
occurs at a few stellar radii from the central star, providing direct
information on the stellar pulsation and on the chemical and physical
properties of these regions. Our data also shed light on several aspects of the
SiO maser pumping theory that are not well understood yet. We aim to determine}
the relative spatial distribution of the 43 GHz and 86 GHz SiO maser lines in
the oxygen-rich evolved star R Leo. We have imaged with milliarcsecond
resolution, by means of Very Long Baseline Interferometry, the 43 GHz (28SiO
v=1, 2 J=1-0 and 29SiO v=0 J=1-0) and 86 GHz (28SiO v=1 J=2-1 and 29SiO v=0
J=2-1) masing regions. We confirm previous results obtained in other
oxygen-rich envelopes. In particular, when comparing the 43 GHz emitting
regions, the 28SiO v=2 transition is produced in an inner layer, closer to the
central star. On the other hand, the 86 GHz line arises in a clearly farther
shell. We have also mapped for the first time the 29SiO v=0 J=1-0 emission in R
Leo. The already reported discrepancy between the observed distributions of the
different maser lines and the theoretical predictions is also found in R Leo.

<|endoftext|><|startoftext|>
Graph1
Partially disordered state near ferromagnetic transition in MnSi
S.V.Maleyev∗ and S. V. Grigoriev
Petersburg Nuclear Physics Institute, Gatchina, Leningrad District 188300, Russia
(Dated: November 4, 2018)
The polarized neutron scattering in helimagnetic MnSi at low T reveals existence of a partially
disordered chiral state at ambient pressure in the magnetic field applied along 〈111〉 axis below
the first order transition to the non-chiral ferromagnetic state. This unexpected phenomenon is
explained by the analysis of the spin-wave spectrum. We demonstrate that the square of the spin-
wave gap becomes negative under magnetic field applied along 〈111〉 and 〈110〉 but not along the
〈100〉 direction. It is a result of competition between the spin-wave interaction and cubic anisotropy.
This negative sign means an instability of the spin wave spectrum for the helix and leads to a
destruction of the helical order, giving rise to the partially disordered state below the first order
ferromagnetic transition.
PACS numbers: 75.25.+z,61.12.Ex
Non-centrosymmetric cubic helimagnets such as MnSi, FeGe, FeCoSi are the subject of the intensive experi-
mental and theoretical studies for the last several decades. Their single-handed helical structure was explained by
Dzyaloshinskii1. The full set of interactions responsible for observed helical structure (Bak-Jensen model) was es-
tablished later in2,3 in agreement with existing experimental data (see for example4 and references therein). The
renascence in this field began with a discovery of the quantum phase transition to a disordered (partially ordered)
state in MnSi at high pressure5 and6. The following properties of this state attract the main attention: i) non-Fermi-
liquid conductivity, ii) spherical neutron scattering surface with the weak maxima along the 〈110〉 axes7,8, whereas
at ambient pressure Bragg reflections are observed along 〈111〉4. These features and the structure of the partially
ordered state were discussed in several theoretical papers (see9,10,11,12 and references therein). It should be noted also
that the spherical scattering surface with maxima along 〈111〉 was observed at ambient pressure just above critical
temperature Tc ≃ 29K. This experiment was explained using the Bak-Jensen model13.
These studies shadowed an important problem of the helix structure evolution in the external magnetic field H . In
particular, simple phenomenological14 and microscopical15 theories predict the smooth second order transition from
the conical to the ferromagnetic state. The spin component of the cone parallel to the applied field is proportional
to the magnetization and it increases as (H/HC) up to its saturated value. This prediction is in agreement with
experiment6. The perpendicular, rotating spin components fade away with the field and according to the plain theory
the helical Bragg reflections must decrease as (H2C − H2)/H2C where HC is the critical field for the ferromagnetic
transition. This, however, contradicts to the experimental facts if H ‖ [111] (See16 and Fig.1). The experiment shows
that the transition is of the first order and the Bragg intensity does not follow the law given above.
In this paper we demonstrate that although from a general expression for the ground state energy one can expect
the second order transition at the critical field HC , this is true for H ‖ [100] only but the situation changes if H ‖ [111]
or H ‖ [110]. In the last cases the spin-wave spectrum is unstable in the field range H1 < H < HC due to the cubic
anisotropy as the square of the spin-wave gap becomes negative and the helical long-range order decays in this field
interval. The ferromagnetic state occurs above HC . Hence we have a region below HC where the partially disordered
state coexists with the almost saturated magnetization. It is a reason why this state has not been noticed in the
earlier macroscopic measurements5,6.
Let us consider the conical helix with the lattice spin SR = S
ζ̂R + S
η̂R + S
ξ̂R where
ζ̂R = ĉ sinα+ (Ae
ik·R + c.h.) cosα;
η̂R = i(Ae
ik·R − c.h.);
ξ̂R = ĉ cosα− (Aeik·R + c.h.) sinα,
whereA = (â−ib̂)/2 and the unit vectors ζ̂, η̂ and ξ̂ form the right-handed frame. If α = 0 we have a plain helix15. The
spin operators are given by the well known expressions: S
= S−(a+a)R, SηR = −i(S/2)1/2[aR−a
−(a+a2)R/(2S)]
and S
= (S/2)1/2[aR + a
− (a+a2)R/(2S)].
http://arxiv.org/abs/0704.0683v1
Similar to in15, we use the following Hamiltonian
{−JqSq · S−q + 2iDq[Sq × S−q]
l=x,y,z
Sq,lS−q,lq
l }+K
l=x,y,z
Sq,lSp,lSh,lSf ,l
+N1/2H · S0,
where the first term is the ferromagnetic exchange interaction, the second term is the Dzyaloshinskii interaction (DI)
responsible for the helix structure1. The following terms are the anisotropic exchange, the cubic anisotropy and the
Zeeman energy. They determine the orientation of the helix vector k in the magnetic field (see15,16 and17). The
following hierarchy of interactions holds2,3: J0 >> D0/a >> F0/a
2 ∼ K where a is the lattice constant. Replacing
SR → Sζ̂R one gets the classical energy15
Ecl = S
2L(k)
−SD0(k · [â× b̂])
cos2α+ SH‖ sinα+ Ecub
where A = S(J0 − Jk)/k2 is the spin-wave stiffness at q >> k, L(k) =
k̂2l (â
l + b̂
l ) and H‖ is the field component
parallel to k.
Using (3) and (4) and taking into account that the single ion contribution Ecub does not depend on k we get
kl = k(D0/|D0|)[ĉl − SF0(â2l + b̂2l )/(2A)]
Ecl = −(SAk2/2)[1− SF0L(ĉ)/(2A)] cos2 α+H‖ sinα+ Ecub,
where k = S|D0/A| and L(ĉ) is a cubic invariant. For D0 > 0 or D0 < 0 we have the right or the left helix,
respectively3. For F0 < (>)0 the helix vector k is oriented along the 〈111〉 (〈100〉) axes as the invariant L has two
extrema 2/3(0)3.
Neglecting Ecub in Eq.(4) one obtains the conical state with sinα = −H‖/HC if H‖ < HC where HC = Ak2(1 −
SF0L/(2A)] ≃ Ak2 and the ferromagnetic state for H‖ > HC15,19. It can be shown that Ecub gives a negligible
contribution to HC and to sinα. The principal parameters of the magnetic structure for MnSi are: A ≃ 52meV Å
k ≃ 0.038Å and HC ≃ 0.6T ≃ Ak2 in agreement with the theory (see15). Another energy F0k2 ∼ 0.01meV ≃ 0.1T
was estimated from the anisotropy of the critical neutron scattering13 and from the reorientation of the helix axis in
the magnetic field16. The value of K will be estimated below.
The classical energy depends on H‖ only. The general expression for the ground state energy, derived in
15. It
contains yet another term of purely quantum nature, which depends on the spin-wave gap ∆ as a parameter. As shown
in15 if H⊥ > ∆
2 the helix wave vector k is directed along the magnetic field. For MnSi ∆ ≃ 12µeV ≃ 0.1T ≪ HC16.
In a linear spin-wave theory the gap appears due to the cubic anisotropy but it equals to zero at H‖ = 0
20. There
is yet another contribution to the gap, which is a result of the spin-wave interaction. We begin with the former. For
evaluation of ∆2 one has to consider the uniform part of the bilinear Hamiltonian, which is given by
H0 = E0a
0 a0 +B0(a
0 + a
0 )/2, (5)
and ∆20 = E
0 −B20 . If one neglects the cubic anisotropy at H‖ < HC then one has E0 = B0 = (HC/2) cos2 α and the
gap is zero15,18. Taking into account Hcub, one obtains after simple but rather tedious calculations
E0 = (HC cos
2 α)/2 + E1 + E2;B0 = (HC cos
2 α)/2− E2
E1 = −4Λ[(1− L) sin4 α+ 3L sin2 α cos2 α
+ (3/8)(2− L) cos4 α];
E2 = (3Λ/4)[4L sin
2 α+ (2− L) cos2 α];
∆2Cub = (HC cos
2 α+ E1)(E1 + 2E2),
where Λ = S3K. These expressions hold at H‖ < HC and, indeed, ∆
Cub = 0 in zero field only
For H ≥ HC we have cosα = 0 and if without cubic anisotropy E0 = H−HC , B0 = 0 and the gap ∆ = H−HC15,
while if taking it into account the cubic anisotropy we get
∆2Cub = [H −HC − 4Λ(1− L)][H −HC + Λ(10L− 4)]. (7)
For the two principal directions L111 = 2/3 and L001 = 0 and we obtain:
∆2Cub,[1,1,1] = (H −HC − 4Λ/3)(H −HC + 8Λ/3);
∆2Cub,[1,0,0] = (H −HC − 4Λ)
Thus, one comes to the important conclusion: ∆2
Cub,[1,1,1]
is negative at H = HC for both signs of K
21. This
circumstance is decisive for the stability of the system if one takes into account that the contribution to ∆2, stem
from the spin-wave interaction, is proportional to cos4 α and disappears at H close to HC .
Let’s consider the spin wave interaction to the gap ∆2Int. As the DI breaks the total spin conservation law it must
lead to the spin-wave gap. However, in cubic crystals this interaction is very soft [see Eq.(2)] and the gap appears as
a result of the spin-wave interaction only similar to the case of pseudo-dipolar interaction in antiferromagnets22,23.
In the one-loop approximation it consists of both the Hartree-Fock (HF) part evaluated in15 at H = 0 and the
second order contribution from three-point spin-wave interaction, which appears due to the helical structure. It was
ignored in15. The diagrams for both contributions are shown in Fig.2 where lines correspond to Green functions
Gq = − < Taq, a+q > and Fq = F+q = − < Taq, a−q >, which in ω-representation are given by15:
Gq(iω) =
Eq + iω
(iω)2 − ǫ2q
; Fq(iω) = −
(iω)2 − ǫ2q
, (9)
where Eq = S(Mo −Mq) +Bq;Bq = (S/2)(Mq − Jq) cos2 α;Mq = J+q + 2Dq(k · ĉ), J±q = (Jq+k ± Jq−k)/2 and the
spin-wave energy ǫq = (E
q − B2q)1/2. Although at q ≪ 1/a these expressions give the same result as obtained in15:
Eq = Aq
2 + Bq; Bq = (Ak
2/2) cos2 α and ǫq = Aq(k
2 cos2 α + q2)1/2. However for the present consideration we
need them for all q as the formulae for ∆2Int contains the sums, which saturate at q ∼ 1/a.
The contribution of the forth-point interaction to ∆2 was analyzed in15 at H = 0 and T = 0. At an arbitrary H
the gap consists of two terms
V1 = (1/4N)
(M1 +M2 −M1−3 −M2−3)a+1 a
2 a3a4;
V2 = (1/4N)
(M1 − J1)[2(a+a)1(a+a)−1 sin2 α
− (a+a2)1(a1 + a+−1) cos2 α],
where 1 = q1 etc. At small momenta we have in parentheses −2A(3 · 4)/S and Ak2/S for V1 and V2, respectively.
Now one has to consider the V1 interaction and the second part of V2 interaction together. They give the principal
contribution to the gap ∆2Int
∆2Int =
SAk2 cos4 α
(Mq − Jq) =
(Ak2)2 cos4 α
, (11)
where in r.h.s we take into account that
J+q =
Jq = 0 and according to Eq.(4) (k · ĉ) = Ak2/(SD0). This
contribution is T -independent.
The contribution of the first term in V2 is more complicated one. Its T -independent part is proportional to (ka)
and may be neglected [for MnSi (ka)2 ≈ 0.03]. The T -dependent contribution consists of two parts. The excitations
with q ≫ k have a quadratic dispersion ǫq ≈ Aq215 and at T ≫ Ak2 they are responsible for the first part, which
has the form ∆22,1 = (1/2)(Ak
2)2ζ(3/2)(ka)3[T/(2πAk2)]3/2 sin2 α cos2 α. In spite of small factor (ka)3 it may be
important at sufficiently high T .
The second part is not so trivial. According to15 (cf. also11) at q . k and H < HC the spin-wave spectrum
becomes strongly anisotropic due to umklapp processes connecting excitations with q and q± k. In zero field taking
into account the gap we have
ǫq = [A
2(k2q2‖ + 3q
⊥/8) + ∆
2]1/2. (12)
As the field increases this anisotropy becomes weaker since the term in the expression for ǫq appears to be proportional
to q2⊥. In the ferromagnetic state (H > HC) this anisotropy vanishes. In a very weak field we can use Eq.(12) and
then the second part is given by
∆22,2 = (1/π)Ak
3/2 ln(Ak2/∆) sin2 α cos4 α. (13)
This expression diverges if ∆ → 0. Similar divergences were discussed in25. However, we can neglect this contribution
due to the small factor (ka)3 and decreasing of logarithm when H increases.
The three-point interaction is given by V3 = V− cosα+V+ sinα cosα where V± = (2S/N)
+a)−q(a−q±
a+q ) and
C(−)q = [J
q +Dq(q · ĉ)] ∼ (A/S)(q · k)(qa)2
C(+)q = [(Jq − J (+)q )/2−Dq(k · ĉ)] ≃ −Ak2/(2S),
where the r.h.s. expressions are derived from Eq.(4) at q ≪ 1/a. The corresponding contributions to ∆2 were
evaluated as in22. The T -independent term is proportional to (ka)2 and the second one is equal to −2∆2,2/3. Both
of them are small and can be neglected. Finally we have
∆2 = ∆2Int +∆
Cub. (15)
The field dependence of the ratio ∆2(H)/∆2(0) for the three directions 〈111〉, 〈100〉 and 〈110〉 are shown in Fig.3
with HC = 0.565T, ∆(0) = 0.1 T in agreement with the experiment. The cubic anisotropy Λ was chosen to be
equal to −0.05 T. In case of H ‖ [100] ∆2 remains positive and the spin-wave spectrum is stable at all H . In this
case at H > HC along with ferromagnetic spin configuration the spin-wave components of the lattice spins has to
remain rotating [see Eq.(1)] as was discussed in15. For 〈111〉 direction ∆2 is negative at H > H1 ≃ 0.72HC and the
spin-wave spectrum becomes unstable. Hence the long-range helical order demolishes and the corresponding Bragg
peaks disappear and the scattering has to be spread around them. Such decrease of the intensity along [111] is shown
in Fig.1 in qualitative agreement with the theory. However, more detailed measurements are needed. A similar
instability has to be along [110] also contrary to the expectations for H ‖ [100].
The width of this scattering may be estimated from the condition ǫ2q = 0, as for larger q the spin-wave spectrum
can not feel the disorder. Near H1 we have ǫ
q ∼ (Aqk)2 + ∆2 and the inverse correlation length of the disorder
κ = k(|∆|/Ak2) ≪ k. Close to HC ǫq ∼ Aq215 and κ = k(|∆|/Ak2)1/2.
This disordered state have a strong chirality, which is demonstrated by a constant polarization of the scattered
neutrons in the whole field range below HC (see inset in Fig.1). This polarization is determined as P = (I− −
I+)/(I− + I+) and according to the general theory
26 the ratio P/P0, where P0 is the initial neutron polarization, is
the chirality at given q. A strong drop of the polarization at HC is a signature of the first order transition to the
uniform ferromagnetic state with weak chiral fluctuations.
In conclusion, we analyzed thoroughly the field behavior of the spin wave gap in the spin-wave spectrum of cubic
helimagnets. It is shown that if the field is applied parallel to both [111] and [110] directions a partially disordered
state has to take place at H1 < HC . We demonstrated that this state appears when the square of the spin-wave gap
becomes negative and the spin-wave spectrum unstable. We presented the first observation in MnSi of this partially
disordered chiral state in the magnetic field.
The work is supported in part by the RFBR (projects No 05-02-19889, 06-02-16702 and 07-02-01318) and the
Russian State Programs ”Quantum Macrophysics” and ”Strongly correlated electrons in Semiconductors, Metals,
Superconductors and magnetic Materials” and Russian State Program ”Neutron research of solids”.
∗ Electronic address: maleyev@sm8283.spb.edu
1 I.E.Dzyaloshinskii, Zh.Eksp. Teor.Fiz.46, 1420 (1964)[Sov.Phys.JETP19, 960 (1964)].
2 O.Nakanishi, A.Yanase, A.Hasegava, M.Kataoka, Solid State Commun.35, 995 (1980).
3 P.Bak, M.Jensen, J.Phys. C 13, L881 (1980).
4 M.Ishida, Y.Endoh, S.Mitsuda, Y.Ishilawa, M Tanaka,J.Phys.Soc.Jpn. 54 2975 (1975).
5 C.Pfleiderer, G.J.MacMillan, S.R.Julian, G.G.Lonzarich, Phys.Rev. B 55, 8330 (1997).
6 K.Koyama, T.Goto, T.Kanomata, R.Note, Phys.Rev. B 62,986 (2000).
7 C.Pfleiderer, S.R.Julian, G.G.Lonzarich, Nature(London) 414, 427 (2001).
8 C.Pfleiderer, D.Reznik, L.Pintschovius, H.v.Löhneysen, M.Garst, A.Rosh, Nature (London)427, 227 (2004).
9 S.Tewaru, D.Belitz, T.R.Kirpatrick, Phys.Rev.Lett.96, 047207 (2006).
10 U.K.Rössler, A.N.Bogdanov, C.Pfleiderer, Nature(London) 442, 797 (2006).
11 D.Belitz, T.R.Kirpatrick, Phys.Rev. B 73, 054431 (2006).
12 B.Binz, A.Vishwanath, Phys.Rev. B 74, 214408 (2006).
13 S.V.Grigoriev, S.V.Maleyev, A.I.Okorokov, Yu.O.Chetverikov, R.Georgii, P.Böni, D.Lamago, H.Eckerlebe, K.Pranzas,
Phys.Rev. B 72,134420 (2005).
14 M.L.Plumer, M.B.Walker, J.Phys.C: Solid State Phys. 14, 4689 (1981).
15 S.V.Maleyev, Phys.Rev. B 73,174402 (2006).
16 S.V.Grigoriev, S.V.Maleyev, A.I.Okorokov, Yu.O.Chetverikov, P.Böni, R.Georgii, D.Lamago, H.Eckerlebe, K.Pranzas,
Phys.Rev. B 74, 214414 (2006).
mailto:maleyev@sm8283.spb.edu
350 400 450 500 550 600 650
350 400 450 500 550 600 650
H (mT)
H (mT)
FIG. 1: The intensity of the Bragg reflection in MnSi at T = 15 K as function of the field at H ‖ [111]. The full line is the
theoretical prediction (see the text). Inset: the spin chirality as a function of the field measured by polarized neutrons(see
text).
(a) (b)
FIG. 2: Hartree-Fock (a) and three-point diagrams for the spin-wave gap (b).
17 S.V.Grigoriev, S.V.Maleyev, A.I.Okorokov, Yu.O.Chetverikov, H.Eckerlebe, J. Phys.: Condens. Matter 19 145286 (2007).
18 There are misprints in Eqs. (13) and (21,22) in Ref.15: the F terms has to have negative sign.
19 Due to demagnetization HC has additional term depending on the sample form
20 Eq.(53) in15 is erroneous.
21 The same trues in [1, 1, 1] direction where L = 1/2
22 D.Petitgrand, S.V.Maleyev, Ph.Bourges, A.S.Ivanov, Phys.Rev B 59, 1079 (1999).
23 Results presented below were obtained as in22 using modification of the Belyaev’s technique24 adjusted to non-Hermitian
spin-wave interaction in Dayson-Maleyev representation.
24 A.A.Abrikosov, L.P.Gor’kov, I.E.Dzyaloshinskii, Quantum field theoretical Methods in Statistical Physics, (Pergamon, New
York, 1965).
25 T.R.Kirpatrick D.Belitz, Phys.Rev.Lett. 97, 267205 (2006).
26 S.V.Maleyev, Phys. Usp.,45, 569 (2002); Physica B 350, 26 (2004).
0,0 0,2 0,4 0,6 0,8 1,0 1,2
 H || (100)
 H || (111)
 H || (110)
 = - 0.5
FIG. 3: The magnetic field dependence of the ratio ∆2(H)/∆2(0). Parameters are given in the text.
	References
ABSTRACT
  The polarized neutron scattering in helimagnetic MnSi at low $T$ reveals
existence of a partially disordered chiral state at ambient pressure in the
magnetic field applied along $<111>$ axis below the first order transition to
the non-chiral ferromagnetic state. This unexpected phenomenon is explained by
the analysis of the spin-wave spectrum. We demonstrate that the square of the
spin-wave gap becomes negative under magnetic field applied along $<111>$ and
$<110>$ but not along the $<100>$ direction. It is a result of competition
between the spin-wave interaction and cubic anisotropy. This negative sign
means an instability of the spin wave spectrum for the helix and leads to a
destruction of the helical order, giving rise to the partially disordered state
below the first order ferromagnetic transition.

<|endoftext|><|startoftext|>
Fluctuations in glassy systems
Claudio Chamon1 and Leticia F. Cugliandolo2
1 Physics Department, Boston University,
590 Commonwealth Avenue, Boston, MA 02215, USA
2Laboratoire de Physique Théorique et Hautes Énergies, Jussieu,
5ème étage, Tour 25, 4 Place Jussieu, 75252 Paris Cedex 05, France
chamon@bu.edu, leticia@lpt.ens.fr
Abstract. We summarize a theoretical framework based on global time-
reparametrization invariance that explains the origin of dynamic fluctuations in glassy
systems. We introduce the main ideas without getting into much technical details.
We describe a number of consequences arising from this scenario that can be tested
numerically and experimentally distinguishing those that can also be explained by other
mechanisms from the ones that we believe, are special to our proposal. We support our
claims by presenting some numerical checks performed on the 3d Edwards-Anderson
spin-glass. Finally, we discuss up to which extent these ideas apply to super-cooled
liquids that have been studied in much more detail up to present.
http://arxiv.org/abs/0704.0684v1
CONTENTS 2
Contents
1 Why glasses? vs. universality in glassy dynamics 2
2 Time reparametrization invariance 5
2.1 Mean-field models – dynamic equations . . . . . . . . . . . . . . . . . . . 5
2.2 Structural glasses: the p ≥ 3 cases . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Short-ranged models – dynamic action . . . . . . . . . . . . . . . . . . . 10
2.4 Turning a nuisance into something useful - symmetry as a guideline . . . 11
2.5 The spherical p = 2 case or mean-field domain growth . . . . . . . . . . . 13
2.6 Quantum problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3 Consequences and tests 14
3.1 Two-time correlation length . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Scaling of the pdf of local two-time functions . . . . . . . . . . . . . . . 17
3.3 Effective action for local ages . . . . . . . . . . . . . . . . . . . . . . . . 19
3.4 Two-time scaling of local functions . . . . . . . . . . . . . . . . . . . . . 21
3.5 Multi-time scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.6 Local fluctuation-dissipation relation . . . . . . . . . . . . . . . . . . . . 25
3.7 Infinite susceptibilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4 Discussion 27
1. Why glasses? vs. universality in glassy dynamics
It is common to encounter in nature systems that resist equilibration with their
environments and display what is called glassiness. The name is derived from what
we normally know as glasses, an irregular array of silicon and oxygen atoms without
crystalline order, much as a liquid, but as hard as a solid. The molecular diffusion
within glasses is extremely sluggish, slowing down by over 10 orders of magnitude as
the temperature is slightly decreased near the operationally defined glass transition
temperature. Hence, the term glassiness became generically associated with very slow
dynamics [1].
So, why glasses? The answer to this question has been the focus of much research
effort for long. It is certainly a rather difficult question, and there have been a number
of ideas lined up for trying to understand how material systems become glassy. It is not
even clear whether in many systems there is a thermodynamic phase one can call glass,
or there is simply a dynamical crossover at low enough temperatures.
Do we need to fully answer why glasses? before we really further our understanding
of glassy dynamics? We take the point of view that, by starting from the fact that glassy
systems exist (as nature presents us with concrete examples), we can then attempt to
characterize whatever possible universal properties there are in glassy dynamics.
CONTENTS 3
To make this statement clearer, let us turn to a question that Anderson poses in
the introduction of his Concepts in solids text [2]: why solids? This question, again,
is a rather complex one, and it is not untwined from the question why glasses? if we
focus on why a regular array of atoms, as opposed to a random packing, forms in the
first place. Even if one assumes that a crystalline structure forms at low temperatures,
a detailed quantitative analysis of the energetics remains to be done so as to determine
if the packing is hexagonal, cubic, body or face centered, etc. Nonetheless, if one starts
from the existence (as observed in nature) of a state with broken translational symmetry,
one can construct theories of lattice vibrations (and quantize it) and of electronic band
structure (and discern between insulators, metals and semiconductors). Solid state
physics starts from the solid.
The approach we review in this paper relies on a similar philosophy: we do not
claim any theory of the glass transition, and we do not attempt to answer why glasses?
with this particular approach. Our theory does not allow us to make non-universal
predictions, such as what the glass transition temperature is (if one can be defined) for
a certain material, or whether the material displays glassy behavior at all. We aim at
understanding if there is a set of principles, guided by symmetry considerations, that
can be used to understand certain universal aspects of glassy dynamics once the glass
state is presented. For example, glasses age [3]. We thus expect that there exists a
unified approach to describe aging phenomena, and we seek some guiding principles,
based on dynamical symmetries, that could allow us to understand universal properties
in the aging regime, including the scaling of spatial heterogeneities.
We propose that the symmetry that captures the universal aging dynamics
of glassy systems is the invariance of an effective dynamical action under uniform
reparametrizations of the time scales [4]-[9]. Such type of invariance had been known
to exist since the early days in which the mean-field equilibrium dynamics of spin-
glasses was tried to be understood [10, 11] and it was later encountered in the better
formulated out of equilibrium dynamic formalism applied to the same systems [12, 13].
The invariance means that in the asymptotic regime of very long times a family of
solutions to the equations of motion is found. This ‘annoyance’, we claim, has actually
a physical meaning and implications in the fluctuating dynamics of real glasses. In
order to make this statement concrete, it is necessary to show that the invariance
also exists in finite dimensional glassy models. With this aim we showed that global
time reparametrization invariance emerges in the long times action of short-range spin-
glasses assuming causality and a separation of time scales [4]-[6]. Basically, the second
assumption amounts to start from a glass state, where one may claim there is a
separation between, roughly, a time regime where relaxation is fast and another where
relaxation is slow. The invariance can then be used to describe dynamic fluctuations in
spin-glasses and, we conjectured, in other glassy systems as well.
Physically, the emergence of time reparametrization invariance can be though of
in the following way. The out of equilibrium relaxation of glassy systems is well
characterized by two-time functions, either correlations or linear responses. In the slow
CONTENTS 4
and aging regime they depend on two times and time-translation invariance is lost. The
proper measure of ‘time’ inside the system is the value of the correlation itself, and not
the ‘wall clock’ in the laboratory. For instance, in a spin-glass the proper measure of
sample age is the spin-spin overlap or in particle systems it is the incoherent correlation
function. Age measures can fluctuate from point to point in the sample, what we called
heterogeneous aging, with younger and older pieces (lower and higher values of the
correlation) coexisting at the same values of the two laboratory times. The fact that
the effective dynamical action becomes invariant under global time reparametrizations,
t → h(t), everywhere in the sample, means that the action weights the fluctuations
of the proper ages, C(~r; t1, t2), directly, and the times t1 and t2 in the action are just
integrated over as dummy variables. To draw an analogy, in theories of quantum gravity
the space-time variables Xµ(τ, σ) are the proper variables, and the action is invariant
under conformal transformations of the world-sheet parameters τ and σ.
So what does global time-reparametrization invariance symmetry concretely teach
us about observables in glasses? So far we discussed a global symmetry or invariance
with respect to uniform time-reparametrizations. By looking at spatially heterogeneous
reparametrizations, we can predict the behavior of local correlations and linear
susceptibilities and the relations between them. For example, we predict that, after
a convenient normalization that we explain in the main text, the triangular relation
between the local coarse-grained correlations, C(~r; t1, t2), C(~r; t2, t3) and C(~r; t1, t3), as
a function of the intermediate time t2, t3 < t2 < t1, at all spatial points, ~r, should be
identical to the global triangular relation, in the asymptotic limit of very long absolute
times and delays between them and very large coarse-graining linear length [9]. Different
sites can be retarded or advanced with respect to the global behaviour but they should all
have the same overall type of decay. Similarly, the relation between local susceptibilities
and their associated correlations should be identical all over the sample [5, 6] leading to
a uniform effective temperature [14].
The purpose of this article is to describe our current understanding of dynamic
fluctuations (heterogeneities) in the non-equilibrium relaxation of glassy systems arising
from the time-reparametrization invariance scenario [4]-[9],[15]. We illustrate it by
presenting critical tests.
The structure of this review is the following. In Sect. 2 we explain the origin of
the time-reparametrization invariance scenario. We do not present detailed derivations
that were already published but we aim at highlighting the main ideas behind the scene.
In Sect. 3 we list several measurable consequences of the theory. We discuss how they
have been examined numerically in different glassy models. The discuss here which
consequences could also be explained by other approaches and which, we believe, are
unique to our scenario. Finally, in Sect. 4 we discuss the scenario. Since this research
project is not closed yet, we present some proposals for numeric and experimental tests
as well as some ideas about analytic calculations that could help us understanding the
limitations of our proposal.
CONTENTS 5
2. Time reparametrization invariance
In this Section we explain how global time-reparametrization invariance develops
asymptotically in the aging regime of glassy models. For the sake of simplicity we focus
on the classical formalism and at the end of this section we mention the modifications
introduced by quantum fluctuations.
2.1. Mean-field models – dynamic equations
Schematic models of spin-glasses, structural glasses and ferromagnetic clean coarsening
are encoded in the family of p-spin models defined by [16]
H = −
i1i2...ip
Ji1i2...ipsi1si2 . . . sip (1)
with quenched disordered exchanges distributed with the Gaussian law P (Ji1i2...ip) ∝
−p!J2
i1i2...ip
/(2Np−1)
. The dynamic variables si, i = 1, . . . , N , are of Ising type, si = ±1,
or satisfy a global spherical constraint
i=1 s
i = N . p is an integer parameter: with
p = 2 and Ising variables one mimics spin-glasses, with p > 2 and Ising or spherical
spins the phenomenology of structural glasses is recovered, and with p = 2 and spherical
spins one describes ferromagnetic domain-growth in clean systems. The sum runs over
all p-uplets; for this reason these models are ‘mean-field’ in the sense that the saddle-
point evaluation of the partition function or the dynamic generating functional is exact
in the thermodynamic limit, N → ∞. The Hamiltonian (1) also represents the potential
energy of a particle with position ~s = (s1, . . . , sN) on an infinite dimensional hypercube
(si = ±1) or hypersphere (
i=1 s
i = N).
Dynamics is introduced with a Langevin equation that represents the coupling of
the spins to an equilibrated thermal environment. Ising spins are then replaced by
soft variables by introducing a double-well potential energy,
i=1 V (si), with V (si) =
a(s2i − 1)
2. The initial condition is usually chosen to be random thus mimicking a rapid
quench from infinite temperature to the working temperature T . In the limit N → ∞
exact Schwinger-Dyson equations couple the global correlation and instantaneous linear
response
NC(t, tw) =
si(t)si(tw) , NR(t, tw) =
δsi(t)
δhi(tw)
i=1 si(t) = 0 for all t). The field hi couples linearly to the spin variables and, in
general, we are interested in perturbing fields that are uncorrelated with the equilibrium
states of the systems. It is not necessary to average over thermal noise or quenched
disorder since these quantities do not fluctuate in the out of equilibrium regime reached
when the infinite volume limit (N → ∞) has been taken at the outset. Here are in
what follows times are measured from an origin that corresponds to the quench to the
working temperature. Our study applies then in the order of limits
CONTENTS 6
in which the exact causal Schwinger-Dyson equations for spherical models at times t ≥ tw
read [12, 19]
(∂t − zt)C(t, tw) =
dt′ Σ(t, t′)C(t′, tw) +
dt′ D(t, t′)R(tw, t
′) , (4)
(∂t − zt)R(t, tw) =
dt′ Σ(t, t′)R(t′, tw) + δ(t− tw) , (5)
where the vertex, D, and self-energy, Σ, are functions of C and R
D(t, tw) =
Cp−1(t, tw) , Σ(t, tw) =
p(p− 1)
Cp−2(t, tw)R(t, tw) . (6)
The Lagrange multiplier zt is fixed by requiring C(t, t) = 1. For Ising problems the soft
spin constraint can be treated in the mode-coupling approximation [17] or else, one can
apply a T − Tc expansion taking advantage of the fact that the phase transition is of
second order for p = 2 (Sherrington-Kirkpatrick or SK model) [13].
In such fully-connected models all higher order correlations and linear responses
factorize and can be written as functions of these two-time functions. Self-consistent
approximate treatments of interacting particle models with realistic potentials, like the
mode-coupling approach, yield similar equations with the addition of a wave-vector
dependence that in the present context can also be taken into account by considering
models of d-dimensional random manifolds embedded in N → ∞ dimensional
spaces [18, 19].
2.2. Structural glasses: the p ≥ 3 cases
We now focus on the p ≥ 3 cases that mimic the structural glass problem. We mention
at the end of this subsection the Ising p = 2 (SK) spin-glass case that is not conceptually
different but just technically more involved. In Sect. 2.5 we discuss the p = 2 spherical
problem that yields a mean-field description of coarsening phenomena and is rather
different from the point of view of time-reparametrization transformations.
Equations (4) and (5 are causal and be solved numerically by constructing C(t, tw)
and R(t, tw) from the initial instant t = tw = 0. An analytic solution is possible in the
asymptotic limit, as we discuss below. We first present the main features of C and R
and we later explain how these are obtained from the asymptotic analytic solution.
Equations (4) and (5) have a dynamic transition at a critical temperature Td(p). At
T > Td the dynamics occurs in equilibrium and close to Td the decay of the correlations
slows down as in super-cooled liquids with the α relaxation time diverging as a power law
of T − Td [16, 20]. Below Td eqs. (4) and (5) admit a unique solution [12, 13] that is no
longer stationary. The behaviour of the low-temperature correlation and susceptibility
is sketched in Fig. 1.
The low-temperature solution presents a separation of time-scales. In the long
waiting-time limit the self-correlation and integrated linear response or susceptibility,
χ(t, tw) ≡
dt′R(t, t′), can be written as
C(t, tw) = Cst(t− tw) + Cag(t, tw) , (7)
CONTENTS 7
1e+00
1e-01
1e-02
1e+051e+031e+011e-01
rapid & stationary (C st )
aging &
(Cag)
1e+00
1e-01
1e+051e+031e+011e-01
rapid & stationary (χ st)
aging & slow (χag)
Figure 1. Sketch of the relaxation of the self-correlation and susceptibility in the
glassy regime. The separation of time-scales is clear in the figure. The Edwards-
Anderson parameter, qea the corresponding susceptibility χea and the asymptotic value
limt→∞ χ(t, tw) are indicated with horizontal lines.
χ(t, tw) = χst(t− tw) + χag(t, tw) . (8)
The first terms in the right-hand-side describe the stationary regime at short time-
differences in which the correlation and susceptibility relatively rapidly approach a
plateau at limt−tw→∞ limtw→∞C(t, tw) = qea and limt−tw→∞ limtw→∞ χ(t, tw) = (1 −
qea)/T ≡ χea. The second terms are the aging relaxation of C towards zero (in
the absence of an external field), and the aging response of the system towards the
asymptotic value χea + qea/Teff with Teff a parameter with the interpretation of an
effective temperature [14].
The stationary and aging relaxation are fast and slow in the sense that
∂tCst(t, tw) ∼ Cst(t, tw) C > qea , (9)
∂tCag(t, tw) ≪ Cag(t, tw) C < qea , (10)
∂tχst(t, tw) ∼ χst(t, tw) χ < χea , (11)
∂tχag(t, tw) ≪ χag(t, tw) χ > χea . (12)
The aging self-correlation and susceptibility scale as
Cag(t, tw) ≈ qea fc
R(tw)
, χag(t, tw) ≈ qea fχ
R(tw)
. (13)
The scaling functions satisfy the limit conditions fc(1) = 1, fc(∞) = 0, fχ(1) = 0
and fχ(∞) = 1/Teff . Using mathematical properties of monotonic two-time functions
one can show that such a scaling holds asymptotically in each (two) time-scale of the
evolution [13]. While in a system undergoing finite dimensional coarsening R(t) has
a natural interpretation as the typical domain radius, in mean-field models there is
no immediate understanding of the ‘clock’ R(t) that, in a sense, sets the macroscopic
time-scale. The numerical solution suggests that R(t) is just a power of time.
CONTENTS 8
In the asymptotic limit in which the additive separation of time-scales with the
scaling form (13) holds it is convenient to use a parametric description of the dynamics
in which times do not appear explicitly. More precisely, the approach to the asymptotic
scaling, and fc and fχ, can be put to the test by constructing ‘triangular relations’
between correlations and susceptibilities, respectively. For generic three long times
t1 ≥ t2 ≥ t3 ≫ t0 one computes the correlations C(tµ, tν), µ > ν = 1, 2, 3. If the times
are such that the ratios R(tµ)/R(tν) remain finite in the asymptotic limit, that is to
say C(tµ, tν) = Cag(tµ, tν), one has
C(t1, t3) = qea fc
f−1c [C(t1, t2)/qea] f
c [C(t2, t3)/qea]
. (14)
If, instead, t1 = t2 + τ with τ > 0 finite, and R(t2)/R(t3) finite in such a way that
C(t1, t2) = qea + Cst(t1 − t2) and C(t2, t3) = Cag(t2, t3) one has
C(t1, t3) = min [C(t1, t2), C(t2, t3) ] (15)
asymptotically. This form goes under the name of dynamic ultrametricity. In the
opposite case t3 = t2 − τ and R(t1)/R(t2) finite dynamic ultrametricity also holds.
These relations follow immediately from the additive separation of time-scales (8) and
the scaling (13) but they can be shown without assuming dynamic scaling, just by
using the monotonicity properties of temporal correlations [13]. The simplest way to
see these relations at work is to display C(t1, t2) against C(t2, t3), for a chosen value of
C(t1, t3) < qea, in a parametric plot in which t2 varies from t3 to t1. In the asymptotic
limit t3 → ∞ the construction reaches a stable master curve as displayed in Fig. 2-
left. The vertical and horizontal parts correspond to t2 such that C(t1, t2) > qea and
C(t2, t3) > qea, respectively, and dynamic ultrametricity holds. The curved part is for
t2 such that all correlations are in the aging regime and its functional form is fully
determined by fc. The ‘clock’ R yields the speed at which the data-point moves on the
parametric curve. A similar construction can be done for the susceptibilities.
The stationary correlation, Cst, and susceptibility, χst, are linked by the equilibrium
fluctuation dissipation theorem (fdt), χst = (1− Cst)/T . In the aging regime, instead,
there is a non-trivial relation between χ and C: χag = (qea − Cag)/Teff that yields
fχ(x) = (1−fc(x))/Teff . This relation is also better appreciated if shown in a parametric
construction in which times do not appear explicitly. In the long waiting-time limit the
plot χ(t, tw) against C(t, tw) with t the parameter running from tw to infinity approaches
a broken line form with the slopes −1/T (for C > qea) and −1/Teff (for C < qea). Again,
the ‘clock’ R yields the speed at which this curve is constructed upon increasing t.
An analytic solution to the Schwinger-Dyson equations was derived in the limit
of long waiting-time in which the separation of time-scales, that is to say the plateaus
in C and χ, are fully established. In the aging regime, one uses the fact that the
variation of the correlation and linear susceptibility are negligible with respect to all
terms in the right-hand-side of the equations and can thus be dropped. Furthermore,
one approximates the integrals by separating the contributions from the stationary and
CONTENTS 9
10.80.60.40.20
10.80.40.20
Figure 2. Sketch of the parametric representation of the correlation and susceptibility.
Left: triangular relation between the correlation function in the asymptotic limit
t3 → ∞. The three curves correspond to different t1’s such that C(t1, t3) takes three
values. The breaking points lie at qea. The arrow indicates the sense of the evolution
when t2 increases from t3 to t1. Right: susceptibility, χ(t2, t3) against correlation,
C(t2, t3) at fixed t3 using t2 ≥ t3 as a parameter in the long t3 limit. The breaking point
at (qea, χea) separates the stationary regime where the equilibrium fdt is satisfied from
the aging regime where it is modified. The slopes are −1/T and −1/Teff , respectively.
The arrow also indicates here the sense of the evolution when t2 increases from t3.
aging regimes [19]. Equation (5) becomes
µ∞Rag(t, tw) ∼
p(p− 1)
dt′ Cp−2ag (t, t
′)Rag(t, t
′)Rag(t
′, tw) (16)
(µ∞ is a constant with contributions from limt→∞ zt and border terms in the integrals).
The companion eq. (4) takes a similar form within the same approximation. Now, the
surprise is that the approximate equations are invariant under the transformation
t→ ht ≡ h(t) ,
Cag(t, tw) → Cag(ht, htw) ,
Rag(t, tw) → ḣtw Rag(ht, htw) ,
with ht positive and monotonic and ḣtw ≡ dhtw/dtw. While the functions fc and
fχ and their fd relation are fixed by the remaining approximate equation (16) and
its companion, the time-reparametrization invariance does not allow one to compute,
analytically, the clock R(t). This problem is similar to the velocity selection problem
present for instance in Fisher differential equation describing front propagation and the
like [21]. The exact Schwinger-Dyson equations do have a unique solution with a special
function R(t) that is selected by the short-time difference effect of the time-derivatives.
However, as time increases and time-differences increase too the effect of the time-
derivatives diminishes. In the approximate analytic solution we take advantage of this
fact to solve the equations asymptotically but we introduce in this way a symmetry that
does not allow us to fix R(t). We obtain, instead, a family of solutions parametrized by
ht. It is important to reckon that the parametric constructions in Fig. 2 are independent
of the clock and thus are fully determined by the approximate treatment.
CONTENTS 10
The case p = 2 with Ising spins or Sherrington-Kirkpatrick model has a more
complicated scaling form with a sequence of two-time scales leading to dynamic
ultrametricity for all C < qea asymptotically [13]. This behaviour is technically more
involved but, as far as the symmetry properties are concerned, it is similar to the case
treated above. The full dynamic equations have a unique solution but the approximate
ones acquire time-reparametrization invariance. We shall not discuss these cases further
in the rest of this review.
2.3. Short-ranged models – dynamic action
In a series of papers [4]-[9] we claimed that the time-reparametrization invariance thus
far introduced via the asymptotic solution to the dynamic equations in mean-field glassy
problems is indeed an asymptotic property of the dynamics of glassy systems, mean-field
and finite dimensional. The separation of time-scales stationary-aging has been observed
in a variety of glassy systems with numerical simulations and experiments [3, 22]. The
slowness of the decay in the aging regime, eqs. (10) and (12) and a weak long-term
memory of the kind (13), are the hallmark of glassy relaxation. The idea is then that
time-reparametrization invariance is the symmetry associated to the dominant dynamic
fluctuations in these sytems.
In order to pursue this idea forward one has to first prove that the symmetry of
the saddle-point equations is also a symmetry of the action in the dynamic generating
functional not only of fully-connected spin models of the mean-field type but also of finite
dimensional glassy systems. In [4] we derived and studied the symmetry properties of
the dynamic action – the so-called Martin-Siggia-Rose action associated to Langevin
stochastic dynamics – of the disorder averaged soft-spin 3d Edwards-Anderson (ea)
model of spin-glasses. (H =
〈ij〉 Jijsisj with Jij Gaussian random variables with zero
mean and taking non-zero value only on nearest neighbours on the d-dimensional lattice
and si = ±1.) In our analysis we took a number of steps that we briefly recall here.
First, we introduced four fluctuating two-time fields defined on the lattice sites,
Qabi (t, tw) , with i = 1, . . . , N and a, b = 0, 1 . (18)
Their thermal averages are the expected values of the local two-time self-correlation
(a = b = 0), the retarded linear response (a = 0, b = 1), the advanced linear response
(a = 1, b = 0), and a fourth observable (a = b = 1) that vanishes if causality is preserved.
Second, we assumed that a separation of time-scales fast-slow, of the type described in
the previous subsection, applies to these fluctuating fields too. Third, we determined the
long-time action by using a Renormalization Group (RG) scheme in the time variables.
This allowed us to write the full action as a sum of two contributions: one from the fast
regime holding at short time-differences, another one from the slow regime valid at long
time-differences. The coupling between these two vanishes asymptotically. Fourth, we
analyzed the surviving terms in the action, that are just the slow contribution. Using
advanced and retarded scaling dimensions that are just the labels a and b of the fields,
CONTENTS 11
one finds that the global time-reparametrization,
t→ ht ≡ h(t) , Q̃
i (t, tw) = (ḣt)
a(ḣtw)
b Qabi (ht, htw) , (19)
leaves all surviving terms unmodified. This step is concisely carried out as follows. Take
a generic term in the action. Under (19) it transforms as
dtν · · · →
dtν (ḣtν )
∆ν · · ·
where N is the number of time integrals, the dots represent a product of the fields
Qabi and ∆ν is the sum of all factors ḣtν arising from the transformation of the fields.
Interestingly enough, one finds that all the ∆ν equal one. Thus, with a simple change
of variables one absorbs each factor ḣtν in the corresponding integration variable and
dtν · · · →
dhν · · · (20)
Note that in order to prove invariance under the simpler and more common scale
transformation, ht = λt, it is enough to have
ν=1∆ν = N . Scale invariance is included
in the larger global time-reparametrization symmetry but it is, clearly, more restrictive.
Finally, we showed that the measure in the functional integral is also global time-
reparametrization invariant, completing the program. We refer the reader to Refs. [4]
and [6] for the technical details leading to these results.
In the disordered 3d ea model studied we carried out the disorder average. The
presence of quenched disorder gave us an analytic control of the theory but this does
not necessarily mean that such a symmetry develops only for the long time regime
of models with quenched disorder. It appears that if the essential assumptions are
causality and unitarity, and a separation of time scales that takes the action to a non-
trivial asymptotic state (some glassy state), then one expects the symmetry to exist
for systems without quenched disorder but that are glassy nevertheless. In order to
check the development of time-reparametrization invariance in problems of particles in
interaction in finite dimensions one should first obtain the relevant action to work with.
A good candidate for a starting point is the Dean-Kawasaki stochastic equation for the
evolution of the local density [23]. The idea is then to write its dynamic generating
functional and study the symmetry properties of the effective action assuming that a
separation of time-scales exists. We are currently carrying out this study.
2.4. Turning a nuisance into something useful - symmetry as a guideline
The global time-reparametrization invariance implies that the action describing the long
time slow dynamics of a spin-glass is basically a “geometric” random surface theory, with
the Q’s themselves as the natural coordinates. The original two times parametrize the
surface. Physical quantities, as the bulk integrated response χ(t1, t2) =
dt′R(t′, t2)
and correlation C(t1, t2) have scaling dimension zero under t → h(t) as well as their
local counterparts. The emergence of this gauge-like symmetry, which may appear first
CONTENTS 12
as a nuisance that relates too many solutions for just one problem. However, it may
provide a simple way to understand spatial fluctuations in systems that possess this
global time-reparametrization symmetry.
Here, a simple analogy with the problem of a ferromagnet may elucidate the point
we want to make. In a ferromagnet the action is invariant under uniform rotations of
the magnetization vector ~m. In the ordered phase, rotation symmetry is spontaneously
broken, and a certain magnetization direction ~m0 is picked. Typically, a vanishingly
small pinning field selects this direction. The low action excitations are the spin waves,
fluctuations of the uniform, symmetry broken, state. These spin waves can be described
in terms of smooth spatially fluctuating rotations around the uniform magnetization
state. The spin waves, generated by using slowly varying local rotations, are the
Goldstone modes of the ferromagnet.
Similarly, in the aging regime of the glassy systems, the action has a global
symmetry, under uniform time-reparametrizations, t → h(t), with the fields
transforming as in eq. (19). The probability weight of having certain local two-
time correlation and response, the observables, should be independent of this
reparametrization. After coarse-graining over a linear length ℓ the non-vanishing
fluctuating fields are the local coarse-grained correlation (a = b = 0) and linear response
(a = 0, b = 1),
C(~r; t, tw) =
j∈V~r
sj(t)sj(tw) , R(~r; t, tw) =
j∈V~r
δsj(t)
δhj(tw)
with the sum carried over the spins in the volume V~r = ℓ
d centered at ~r, and a is the
lattice spacing. The transformation (17) is now restated as
t→ ht ≡ h(t) ,
Cag(~r; t, tw) → Cag(~r; ht, htw) ,
Rag(~r; t, tw) → ḣtw Rag(~r; ht, htw)
, (21)
and it is an asymptotic symmetry of the action for the slow coarse-grained degrees of
freedom. Indeed, the symmetry breaking terms, that have their origin in the short-
time dynamics and short-time difference dynamics, are not identical to zero but become
vanishing small asymptotically. The particular scaling function R(t) selected by the
system is determined by matching the fast and the slow dynamics. It depends on several
details – the existence of external forcing, the nature of the microscopic interactions, etc.
In other words, the fast modes which are absent in the slow dynamics act as symmetry
breaking fields for the slow modes.
In analogy with the spin-wave fluctuations in magnetic systems, that are dictated by
the rotational symmetry, we proposed that the smooth fluctuations in the glassy phase
can be obtained by studying the slow varying, position dependent reparametrizations
t→ h(~r, t) around the one reparametrization R(t) selected by the short-time dynamics.
In other words, we basically proposed that there are Goldstone modes for the
glassy action which can be written as slowly varying, spatially inhomogeneous time
reparametrizations. This suggests that the slow part of the coarse-grained local
CONTENTS 13
correlations and susceptibilities should scale as
Cag(~r; t, tw) ≈ qea fc
h(~r, t)
h(~r, tw)
, χag(~r; t, tw) ∼ fχ
h(~r, t)
h(~r, tw)
, (22)
with fc and fχ the same functions describing the global correlation and susceptibility,
respectively [eqs. (13) and (23)] and the same function h(~r, t) scaling the two-time
correlation and susceptibility on each site ~r [4, 5, 6]. The sum rules Cag(t, tw) =
ddr Cag(r; t, tw) and χag(t, tw) = V
ddr χag(r; t, tw) apply.
The reason for this proposal is that the global reparametrization invariance in time
of the dynamic action in this two-time regime leads to low action excitations (Goldstone
modes) for smoothly varying spatial fluctuations in the reparametrization of time, but
not in the external form of the scaling functions. As in a sigma model (for example, to
describe the ferromagnet), the external functions fc and fχ fix the manifold of states,
and the local time reparametrizations correspond to fluctuations restricted to this fixed
manifold of states (in the ferromagnet, the tilting of direction but not the magnitude of
the magnetization vector).
2.5. The spherical p = 2 case or mean-field domain growth
The spherical SK model (p = 2) can be solved exactly by analyzing the Langevin
equation in the basis of eigenvectors of the random matrix Jij [24]. While the correlation
has a very similar behaviour to the one of the p ≥ 3 cases (see Fig. 1-left), the
susceptibility is quite different. One can mention that the stationary and aging regimes
in the linear response are not so sharply separated in this case. If one uses an additive
separation as in (8) the aging contribution to the integrated linear response, χag(t, tw),
vanishes asymptotically. More precisely,
χag(t, tw) ∼ t
. (23)
Importantly enough, even though the inequalities (10) and (12) are still valid, a
careful inspection of all terms in the Schwinger-Dyson equations shows that they are of
the same order asymptotically. Moreover, the stationary contribution to the equations
in the aging regime is not just a constant: the corrections associated to the asymptotic
approach to the plateau in the correlation cannot be neglected. As a result one cannot
simply drop the time derivatives and the Schwinger-Dyson equations in the aging regime
are not time-reparametrization invariant but just scale invariant, that is to say, they are
unchanged by the transformation t→ h(t) = λt, with C and R transforming as in (17)
and λ a positive constant. A similar mechanism, though even harder to prove, applies
to the dynamic equations for the correlation and linear response of the non-conserved
dynamics of the O(N) ferromagnetic model in the large N limit [8].
In line with what we explained above, the effective action for the slow degrees
of freedom of the p = 2 spherical model and, more generally, the d-dimensional
ferromagnetic O(N) model in the large N limit, are not invariant under global time-
reparametrizations but only under global rescaling of time, t → λt. This marks
CONTENTS 14
an important difference between models with a finite aging response and these quasi
quadratic models with a vanishing aging response [8].
This result is important for a number of reasons: first, it suggests that the
susceptibility, or even the effective temperature, might be intimately related to the
symmetry properties of the dynamics and consequently of the fluctuations; second, it
suggests that the mechanism for fluctuations in coarsening systems might be different
from the one of other glassy problems with finite and well-defined effective temperatures.
In order to justify the latter statement it remains to be checked whether the reduction of
time-reparametrization invariance to just scale invariance also holds in finite dimensional
non-field coarsening.
2.6. Quantum problems
In the case of quantum models one introduces the effect of dissipation by coupling the
system to an environment represented, typically, by an infinite ensemble of quantum
harmonic oscillators. One then uses the Schwinger-Keldysh formalism to write a
generating functional and from it, in the fully-connected or infinite dimensional cases,
one derives Schwinger-Dyson equations similar to the ones above. The asymptotic
analysis of these equations follows the same steps as in the classical limit [25, 17] (at
least in the case of a weak coupling to the bath [26]) and the time-reparametrization
invariance also applies.
The appearance of an asymptotic invariance under time-reparatrizations in the
mean-field dynamic equations was related to the reparametrization invariance of the
replica treatment of the statics of the same models [10, 27]. The latter remains
rather abstract. Brézin and de Dominicis [27] studied the consequences of twisting the
reparametrizations in the replica approach. Interestingly enough, this can be simply
done in a dynamic treatment either by applying shear forces or by applying heat-
baths with different inherent dynamics to different parts of the system. More precisely,
using a model with open boundary conditions one could apply a thermal bath with
a characteristic time-scale on one end and a different thermal bath with a different
characteristic time-scale on the opposite end and see how a time-reparametrization
‘flow’ establishes in the model.
3. Consequences and tests
In this Section we discuss how one can put these ideas to the test by presenting a number
of consequences of global time-reparametrization invariance that are directly measurable
numerically and experimentally. The properties that we discuss explicitly are:
1. A growing dynamical correlation length.
2. Scaling of the pdf of local two-time functions.
3. Functional form of the pdf of local two-time functions.
4. Triangular relations between two-time functions.
CONTENTS 15
5. Scaling relations for general multi-time functions.
6. Local fluctuation-dissipation relations.
7. Infinite susceptibilities.
In considering these predictions, we shall separate them into two distinct
classes. The first class contains predictions that are consistent with other theoretical
scenarios as well as with the presence of reparametrization invariance. Hence, while
reparametrization invariance leads to these predictions, this class alone cannot be used
to argue for the role of the symmetry over other mechanisms. The second class, on the
other hand, contains predictions that are not natural within other frameworks, and to
the date of this report have no obvious explanation within other frameworks. Properties
1 and 2 belong to the first class, properties 3-7 to the second. We shall present these
properties in detail below.
3.1. Two-time correlation length
In equilibrium statistical models one defines the static correlation length, ξeq, from the
spatial decay of the correlation between the fluctuations of the order parameter measured
at two space points, 〈 [φ(~r) − 〈 φ(~r) 〉][φ(~r′) − 〈 φ(~r′) 〉] 〉
|~r−~r′|=∆
∼ ∆−d+2−η e−∆/ξeq ,
where the angular brackets denote an average over the Gibbs-Boltzmann measure.
ξeq depends on temperature and, in second order phase transitions, it diverges at Tc
leaving only an algebraically decaying correlation. A dynamic equilibrium correlation
length characterizes the spatial decay of equal-time correlations in the equilibrium
relaxation of ‘usual’ systems. Similarly to the static case one defines ξ(t) via
〈 [φ(~r, t) − 〈 φ(~r, t) 〉][φ(~r′, t) − 〈 φ(~r′, t) 〉] 〉
|~r−~r′|=∆
∼ e−∆/ξ(t); the angular brackets
indicate here an average over thermal histories and, for simplicity, we omitted the
algebraic correction to the exponential decay. The average over thermal noise can
be traded for an integration over the reference space-point ~r and one then obtains
ξ(t) from V −1
ddr δφ(~r, t)δφ(~r′, t) |
|~r−~r′|=∆
∼ e−∆/ξ(t) where δφ(~r, t) ≡ φ(~r, t) −
ddr′′ φ(~r′′, t). The correlation length, ξ(t), depends now on temperature and
total time.
In systems with slow dynamics in which the order parameter is a two-time entity,
a two-time correlation length can be defined in analogy to what we described in the
previous paragraph:
[φ(~r, t)φ(~r, tw)− 〈 φ(~r, t)φ(~r, tw) 〉]
× [φ(~r′, t)φ(~r′, tw)− 〈 φ(~r′, t)φ(~r′, tw) 〉]
|~r−~r′|=∆
∼ e−∆/ξ(t,tw) . (24)
Once again by trading the thermal average by a spatial average 〈 φ(~r, t)φ(~r, tw) 〉 becomes
the global correlation C(t, tw) and ξ(t, tw) is derived from the four-point correlation
C4(∆; t, tw) ≡
ddr δ[φ(~r, t)φ(~r, tw)]δ[φ(~r′, t)φ(~r′, tw)]
|~r−~r′|=∆
CONTENTS 16
1e+081e+061e+041e+021e+00
tw=1k
tw=10k
tw=100k  0
 0  0.2  0.4  0.6  0.8  1
tw=1k
tw=10k
tw=100k  0
 0  0.2  0.4  0.6  0.8  1
tw=1k
tw=10k
tw=100k
Figure 3. The correlation length in the 3d EA model at T = 0.6 < Tc and L = 100.
(a) As a function of t − tw; (b) as a function of 1 − C; (c) in the scaling form t
against 1 − C. These results are taken from [9]. The vertical arrow in panels (b) and
(c) indicates the value of qea.
with δ[φ(~r, t)φ(~r, tw)] ≡ φ(~r, t)φ(~r, tw) − C(t, tw). The quantity C4(∆; t, tw) measures
the probability that a fluctuation of the two-time composite field φ(~r, t)φ(~r, tw) with
respect to its global average C(t, tw) in the spatial position ~r affects a fluctuation of the
same composite field at a different site ~r′ located at a distance ∆ from ~r and averaged
over the whole ensemble of reference points ~r in the sample.
The numerical analysis of C4 in the low temperature out of equilibrium dynamics
of the 3d ea model [6, 9] (see Fig. 3), soft sphere [28] and Lennard-Jones [29] mixtures
yield
ξ(t, tw) ∼
ξst(t− tw) C > qea
ξag(t, tw) C < qea
ξag(t, tw) ∼ t
w g(C) and a a small power (26)
(a logarithmic growth is also possible). g(C) is a monotonically decreasing function of C.
This is a two-time monotonically growing function even at time-lags that are longer than
the waiting-time dependent α relaxation time. Note the difference with what is observed
in the super-cooled phase where a number of numerical and experimental measurements
point at a dynamic correlation length that reaches a maximum at the α relaxation time
and later diminishes to zero [30]. The divergence has a clear interpretation within the
global time-reparametrization scenario: it is due to the generation and development of
the zero mode. The global reparametrization invariance symmetry develops only in the
limit of very long times, so that at intermediate times the irrelevant terms that are scale
down to zero are still manifest. These irrelevant terms, symmetry breaking ones, give
a finite length scale (or a finite ‘mass’) to the soft reparametrization modes. Because
they are irrelevant, the correlation length increases or, equivalently, the mass decreases
asymptotically.
Even though the numerically accessible times are sufficiently long so as to see
the separation of time-scales in the relaxation of the global correlation, the correlation
length is still very short. In numerical simulations ξ reaches of the order of 4 lattice
spacings or inter-particle distances in the spin-glass problem or soft sphere and Lennard-
Jones system, respectively. ξ just increases by, say, a factor 4 when the waiting-time is
CONTENTS 17
increased by nearly 4 orders of magnitude. The expected limit ξ(t, tw) → ∞ is thus far
from being attained.
As we already mentioned in the introduction to this Section, the growing length
scale is consistent with the development of time-reparametrization invariance, but it
is also consistent with other mechanisms. Hence, it alone cannot be used to argue
unambiguously in favor of the symmetry-based approach. For example, theories based
on the mode-coupling approach or random first order scenario [33] and its refinement
including the effect of entropic droplets [34], dynamical criticality controlled by a zero-
temperature critical point [35] and frustration limited domains [36] are used to justify
the growth of a dynamic length-scale, at least in the super-cooled liquid. Hence, the
existence of a growing length scale belongs to the first class of predictions we mentioned
in the introduction to this section.
Finally, let us note that the fluctuations in the susceptibility, or in multi-time
correlations, see eq. (35), and associated susceptibilities, can be used to derive other
correlation lengths. It would be interesting to check whether all these behave in the
same manner.
3.2. Scaling of the pdf of local two-time functions
The most direct way of testing the mere existence of local fluctuations is to measure the
probability distribution function (pdf) of local coarse-grained correlators, C(~r; t, tw),
and linear susceptibilities, χ(~r; t, tw), at different pairs of times, t and tw. In such
a measurement one is forced to use finite coarse-graining lengths. ℓ then becomes a
parameters that has to be taken into account in the scaling analysis of the results.
In Ref. [9] we showed that, quite generally, the pdf of local coarse-grained
correlators can be scaled onto universal curves as long as the global correlation, C(t, tw),
is the same, and the ratio of the coarse graining length over the dynamical correlation
length, ℓ/ξ(t, tw), is held fixed (see [31] for a similar discussion applied to the super-
cooled liquid). Such scaling can be easily understood as follows. At fixed temperature
the pdf ρ[Cr; t, tw, ℓ, L] depends on four parameters: two times, t and tw, and two
lengths, the coarse-graining length, ℓ, and the size of the system, L. As in the aging
regime C(t, tw) is a monotonic function of the two times and ξ(t, tw) ∼ t
wg(C), one
can trade the two times by C and ξ in complete generality. The next step is a scaling
assumption: that in the long times limit the pdfs depend on the coarse graining length
ℓ, the total size L and the scale ξ only through the ratios ℓ/ξ and ξ/L. This last
step, we should stress, is really a scaling assumption, and not a trivial requirement
from dimensional analysis. The lengths ℓ, L and ξ are already dimensionless as they
are measured in units of the lattice spacing. The end result from the rewriting of the
parameters in terms of the global correlation and the scaling hypothesis is that the
pdfs characterizing the heterogeneous constant temperature aging of the system can be
written as
ρ[Cr;C(t, tw), ℓ/ξ(t, tw), ξ(t, tw)/L] . (27)
CONTENTS 18
 0  0.2  0.4  0.6  0.8  1
tw=1k
tw=10k
tw=100k
 0  0.2  0.4  0.6  0.8  1
tw=1k
tw=10k
tw=100k
Figure 4. pdf of local coarse-grained correlations Cr at different times t and tw in
the 3d Edwards-Anderson model with L = 100 at T = 0.6 < Tc. The waiting-times
are given in the key and the global correlation is fixed to C = 0.4 < qea. (a) The
coarse-graining boxes have linear size ℓ = 9 in all cases. The curves do not collapse, a
slow drift with increasing tw is clear in the figure. (b) Variable coarse-graining length
ℓ chosen so as to held ℓ/ξ approximately constant. The collapse improves considerably
with respect to panel (a). These results are taken from [9].
In Fig. 4 we show the scaling of the distribution of local coarse-grained correlations
in the 3d ea model and the effect of the scaling variable ℓ/ξ (the size of the system, L,
is sufficiently large so that ξ/L vanishes in practice). It is noteworthy that a reasonable
scaling with the global correlation held fixed and not taking into account the effect of
the second scaling variable has been already achieved, approximately, in the Edwards-
Anderson model [5, 6], as well as in the kinetically constrained models studied in [7]
and Lennard-Jones systems [32]. This is justified by the fact that ξ varies very slowly
with tw. However, it is clear that at long though finite times one has to hold the ratio
ℓ/ξ constant to obtain a full collapse in all cases.
The scaling variable ℓ/ξ allows one to study the change in the functional form of the
pdfs upon modifying the coarse-graining volume. Indeed, the pdfs should crossover
from a non-trivial form to a simple Gaussian when ℓ goes through the value ξ. In
summary, for finite ξ one identifies three ℓ-dependent regimes with different functional
forms of the pdfs:
• ℓ ≪ ξ. For finite ξ this means ℓ of the order of the lattice spacing, ℓ ∼ a. In this
case the pdfs do not have any particular structure.
• ℓ ∼ ξ. For finite but large ξ this case is non-trivial and indeed the one that is
accessed with numerical simulations and experiments. One finds that the statistics
is non-Gaussian for all C. The skewness decreases from zero at C = 1 to
reach a minimum and then increase again at small values of C. As regards the
functional form, one finds that a Gumbel-like functional form, characterized by a
real parameter that depends on C and ℓ/ξ describes the data rather well for, say,
qea/2
∼ qea.
• ℓ ≫ ξ. In this limit one matches the central-limit theorem conditions and the
statistics becomes Gaussian for all C.
CONTENTS 19
The analysis of the ℓ-dependent pdfs thus provides an independent way to estimate ξ.
3.3. Effective action for local ages
The argument leading to eq. (27) is a scaling hypothesis and it does not rely on time-
reparametrization invariance. Indeed, there exist models in which the scaling (28) for
the pdf of local coarse-grained correlations is found, e.g. the O(N) model in the large
N limit [8], and global time-reparametrization does not apply.
The implications of time-reparametrization invariance appear later, as a prediction
for the functional form of the asymptotic limit
ρ∞(Cr;C) ≡ lim
t,tw→∞
C(t,tw)=C
ρ[Cr;C(t, tw), ℓ/ξ(t, tw), ξ(t, tw)/L] . (28)
In order to study the functional form that ρ∞ can take let us now use the symmetry
argument to analyse the statistics of the fluctuations of the local correlations. So far
we have not yet determined how much the h(~r, t) vary in space and time. To this end
we need to derive an effective action for these functions that will tell us how costly it
is to deviate from the average clock R(t). Ideally, one would like to derive this action
from the microscopic one. This should be possible in quasi mean-field models such as
the p-spin model with Kac long (but finite) range interactions [37]. For the moment
we have just proposed the simplest action that serves our purposes in the ideal limit in
which the zero mode is fully developed ξ → ∞ and the local quantities are measured in
the infinite coarse-graining volume limit with ℓ/ξ → 0 [7]. Otherwise the parameter ℓ
should be taken into account.
To start with we worked with the more convenient transformed variable h(~r, t) ≡
e−ϕ(~r,t) that implies
Cag(r; t, tw) ≈ qea fc
h(~r, t)
h(~r, tw)
= qea fc
dt′∂t′ ϕ(~r,t
and we searched for the simplest action that satisfies the constraints due to the
symmetries. These are:
i. The action must be invariant under a global time reparametrization t→ h(t).
ii. If our interest is in short-ranged problems, the action must be written using local
terms. The action can thus contain products evaluated at a single time and point in
space of terms such as ϕ(~r, t), ∂tϕ(~r, t), ∇ϕ(~r, t), ∇∂tϕ(~r, t), and similar derivatives.
iii. The scaling form in eq. (29) is invariant under ϕ(~r, t) → ϕ(~r, t) + Φ(~r), with Φ(~r)
independent of time. Thus, the action must also have this symmetry.
iv. The action must be positive definite.
These requirements largely restrict the possible actions. The one with the smallest
number of spatial derivatives (most relevant terms) is
S[ϕ] =
(∇∂tϕ(~r, t))
∂tϕ(~r, t)
, (30)
CONTENTS 20
with K a stiffness. A term M ∂tϕ(~r, t) is also allowed by symmetry but since its space-
time integral is constant we drop it. The action solely depends on the time derivatives
∂tϕ(~r, t) and it is simple to check that it satisfies all the four constraints enumerated
above (the last requirement follows from the fact that h(~r, t) are monotonically
increasing functions of time) [7].
Due to the simple form (30) the ∂tϕ(~r, t) are uncorrelated at any two different times
t1 and t2. Thus the expression ∆ϕ~r|
dt′ ∂t′ϕ(~r, t
′) entering the exponential in
the scaling form in eq. (29) is a sum of uncorrelated random variables in time. One
can interpret such expression as the displacement of a random walker with position
dependent velocities. Alternatively, one can think of the space-dependent differences
∆ϕ~r|
as the net space-dependent height (labeled by t) of a stack of spatially fluctuating
layers dt ∂tϕ(~r, t). The action for the fluctuating surfaces of each layer is given by
eq. (30).
The statistics of the ∆ϕ~r|
are completely determined as follows. The action
(30) transforms into one of a Gaussian surface after the introduction of a ‘proper’ time
τ ≡ lnR(t), and the change of variables, ψ2(~r, t) = ∂τϕ(~r, τ). Indeed,
Cag(~r; t, tw) ≈ fc
R lnR(t)
lnR(tw)
dτ ′ ψ2(~r,τ ′)
S[ψ] = K
dτ ′ [∇ψ(~r, τ ′)]2 . (32)
Due to the Gaussian statistics of the ψ, it is simple to show that connected N -point
correlations of ∆
varphi~r1 |
satisfy
〈∆ϕr1|
∆ϕr2 |
· · ·∆ϕrN |
〉c = [τ(t)− τ(tw)] F(~r1, ~r2, . . . , ~rN) , (33)
where the function F can be obtained from Wick’s theorem, summing over all graphs
that visit all sites (connected) with two lines (because of ψ2) for each vertex i
corresponding to a position ri. The reparametrized times appear only in the prefactor
τ(t) − τ(tw) = ln[R(t)/R(tw)]. The probabilistic features of the fluctuations of local
correlations C(~r, t, tw) depend on times only through R(t)/R(tw), and hence only
through the global correlation itself C(t, tw). In consequence, the action (32) implies
the scaling (28). The fact that the time-dependencies of the statistical properties of the
two-time local coarse-grained correlations are fully determined by the global correlation
is a very welcome property of action (30) since it was not obvious a priori.
Having the forms in eqs. (31) and (32) allows us to make some quantitative
predictions about the form of the pdfs. With some algebraic manipulations one shows
the following generic features [7]:
• The distribution is non-Gaussian for all C.
• For C
∼ qea the pdf is negatively skewed and once put into normal form it is
very close to the distribution of the global equilibrium magnetization in the 2d xy
model [38]. It can then be approximately described by a generalized Gumbel form
with real parameter.
CONTENTS 21
• In the opposite limit C
∼ 0 the pdf is positively skewed and it does not take any
recognizable form.
If one is interested in testing the action further one can simply use eq. (32) to generate,
numerically, the ψ(~r, t), construct the C(~r, t, tw) from these functions, and then compare
the pdfs thus obtained to the ones measured, say, in a numerical simulation of a given
problem. Note that the scaling function fc also plays a role in the functional form of
the pdf of local correlations. The same argument applies to the susceptibilities.
The local coarse-grained correlations are, by construction, sums of correlated
random variables (unless ℓ ≫ ξ). With numerical simulations of the 3d ea model [9]
and kinetically facilitated lattice gases [7] we found that the pdfs of correlations coarse-
grained over finite lengths ℓ have a functional form that resembles a generalized Gumbel
distribution characterized by a continuous parameter that depends on ℓ/ξ and the value
of the global correlation, C, when C
∼ qea (see also [38]). This fact is consistent with the
discussion above and also with the observation of Bertin and Clusel that Gumbel-like
pdfs with real parameter characterize the statistics of sums of random variables with
particular correlations between the elements [39]. The fact that we obtain Gumbel-like
distributions then means that the correlations between the terms in the sum are of the
form needed to get this type of pdf.
In short, the time-reparametrization scenario predicts, in its simplest setting, that
eqs. (31) and (32) fully characterise the fluctuation of the local correlations in the large
times and coarse-graining volume limits.
3.4. Two-time scaling of local functions
As we argued in Sect. 2.4, the global time-reparametrization invariance suggests that
in the ideal asymptotic limit the slow part of the coarse-grained local correlations and
susceptibilities should scale as in eq. (22) in the ideal limit a ≪ ℓ ≪ ξ. In practice
the ideal limit is not reached and one is forced to work with finite correlation lengths
and thus finite coarse-graining lengths too. The finite ℓ will then play a role and has
to be taken into account. We now present some tests of eq. (22) that are based on the
parametric representation of the dynamics explained in Sect. 2 and take into account
the finite value of ℓ.
Let us then imagine that we compute three local coarse-grained two-time
correlations, C~r, at three space points ~r1, ~r2 and ~r3, using a given coarse-graining length,
a≪ ℓ, and that we obtain functional forms that are characterized by eq. (22) with, say,
h(~r1, t) = ln (t/t0), h(~r2, t) = t/t0, and h(~r3, t) = e
ln2(t/t0), in the aging regime. In Fig. 5
we sketch the decay of these correlations for the same tw as a function of time-delay.
The plateau is at the same height since qea as well as the the full stationary decay are
not expected to fluctuate. The external function fc is the same in all curves. It is clear
that the decay of the three correlations follows a different pace, the one at ~r3 is the
fastest while the relaxation at ~r2 is the slowest.
The simplest way to put the proposal (22) to the test is to analyze the implications
CONTENTS 22
1e+00
1e-01
1e-02
1e+061e+041e+021e+00
h3  0
 0.25
 0.75
 0  0.25  0.5  0.75  1
Figure 5. Left: sketch of the decay of the correlation with the same stationary decay
to qea – shown with a horizontal dashed line – and three choices of the scaling function
h(r1, t) = ln(t/t0), h(r2, t) = t/t0, and h(r3, t) = e
lna(t/t0). The waiting-time is the
same in all curves. Right: the relation between the integrated linear response against
the correlation. With a solid line, the parametric plot for fixed and long tw, using t
as a parameter that increases from tw at C = 1 to ∞ at C = 0. With symbols, the
three pairs (Cj(t, tw), χj(t, tw)) for the same tw, a fixed value of t and hj(t) as in the
left panel.
of eq. (22) on local triangular relations. In Sect. 2 we showed that two-time functions
with a separation of time-scales as in eq. (8) and an aging scaling as in eq. (13) are
related in a parametric way in which times disappear, see the sketch in Fig. 2-left.
Equation (22) implies that the local (fluctuating) two-time functions should verify the
same relation
Cag(~r; t1, t3) = qea fc
f−1c [Cag(~r; t1, t2)/qea] f
c [Cag(~r; t2, t3)/qea]
. (34)
This is a result of the fact that time-reparametrization invariance restricts the
fluctuations to appear only in the local functions h(~r, t) while the function fc is locked
to be the global one everywhere in the sample.
A pictorial inspection of this relation should take into account the fact that while
the stationary decay is not expected to fluctuate, the full aging relaxation and, in
particular, the minimal value of the local two-time functions, C(~r; t1, t3), are indeed
fluctuating quantities. The parametric construction on different spatial regions should
yield ‘parallel translated’ curves with respect to the global one, as displayed in Fig. 2-
left. Fluctuations in the function fc would yield different functional forms in the
curved part of the parametric construction. A more quantitative analysis can be
done by using the knowledge of fc that can be extracted from the global correlation
decay. Indeed, if fc is known, the parametric plot f
c (C~r12/qea)/
f−1c (C~r13/qea) against
f−1c (C~r23/qea)/
f−1c (C~r13/qea) should yield a master curve identical to the global one
with different sites just being advanced or retarded with respect to the global value.
This is another way of stating that the sample ages in a heterogeneous manner, with
some regions being younger (other older) than the global average. (For simplicity
we used a chort-hand notation, C~rµν = C(~r; tµ, tν) with µ, ν = 1, 2, 3.) If the time-
CONTENTS 23
 0  0.2  0.4  0.6  0.8  1
Ccg(t3,t2)
 0  0.2  0.4  0.6  0.8  1
Cr 23 (Cg 13/Cr 13)
 0  0.2  0.4  0.6  0.8  1
Cr 23 (Cg 13/Cr 13)
 0  0.2  0.4  0.6  0.8  1
Cr 23 (Cg 13/Cr 13)
Figure 6. The triangular relation in the 3d ea model. Upper left panel: the thick
(black) line represents the global C(t1, t2) against C(t2, t3) using t2 as a parameter
varying between t3 = 5 × 10
4 MCs and t1 = 9 × 10
6 MCs, C(t1, t3) ∼ 0.35 and
qea ∼ 0.8. The curved part is well represented by the hyperbolic form C(t1, t2) ∼
qeaC(t1, t3)/C(t2, t3) ∼ 0.79 × 0.35/C(t2, t3) that corresponds to fc(x) ∼ x
−b. With
different points joined with thin lines we show the triangular relations between the
local coarse-grained correlations on five randomly chosen sites on the lattice (ℓ = 30).
Upper right panel and lower left and right panels: 2d projection of the joint probability
density of C(r; t1, t2)
C(t1, t3)/C(r; t1, t3) and C(r; t2, t3)
C(t1, t3)/C(r; t1, t3) at
fixed three values of the intermediate time, t2 = 1.5 × 10
5 MCs, 8 × 105 MCs and
5× 106 MCs, respectively and ℓ = 10. The global C(t1, t2) against C(t2, t3) using t2 as
a parameter is shown with a thick green line. The green points indicate the location of
C(t1, t2) and C(t2, t3) for the chosen t2’s. Each point in the scatter plot corresponds
to a site, r. The lines indicated the boundary surrounding 25%, 50% and 75% of the
probability density. The cloud extends mostly along the global relation as predicted
by time-reparametrization invariance. These results are taken from [9].
CONTENTS 24
reparametrization mode is indeed flat the local values should lie all along this master
curve in the aging regime.
The conclusions drawn above apply in the strict a ≪ ℓ ≪ ξ limit. In simulations
and experiments ξ is finite and even rather short. Thus, ℓ is forced to also be a rather
small parameter, in which case ‘finite size’ fluctuations in fc are also expected to exist.
The claim is that the latter should scale down to zero faster (in ℓ) than the fluctuations
that are related to the zero mode.
We have tested these claims in the non-equilibrium dynamics of the 3d ea spin-
glass [9]. The results are shown in Fig. 6. In the upper left panel we show the global
triangular relation (thick black line) as well as the local one on four chosen sites. The
separation of time-scales is clear in the plot. The aging part is rather well described by
fc(x) ∼ x
−b and the local curves are quite parallel indeed. In the remaining panels in
Fig. 6 we show the 2dprojection of the joint probability density of the site fluctuations
in the local coarse-grained correlations at different chosen times t2, t2 = 1.5× 10
5 MCs
(upper right), t2 = 8 × 10
5 MCs (lower left), and t2 = 5 × 10
6 MCs (lower right).
Taking advantage of the fact that fc(x) ∼ x
−b we use a very convenient normalization
in which we multiply the horizontal and vertical axes by [C13/C~r13]
1/2. Global time-
reparametrization invariance, expressed in eq. (34), implies that the data points should
spread along the global curve indicated with a thick green line in the figure. Some sites
could be advanced, others retarded, with respect to the global value – shown with a
point on the green curve – but all should lie mainly on the same master curve. This is
indeed quite well reproduced by the simulation data in the three cases, C(t1, t2) close to
C(t1, t3) (upper right panel), C(t1, t2) close to qea (lower right panel), and C(t1, t2) far
from both (lower left panel). Most of the data points tend to follow the master curve
though some fall away from it. The reason for this is that eq. (34) should be strictly
satisfied only in the very large coarse-graining volume limit (ℓ≫ a) with ℓ/ξ ≪ 1 while
we are here using ℓ = 10a ∼ ξ, see the discussion in Sect. 3.1.
The triangular relation can be used to test the fluctuations of the local
susceptibilities too. Indeed, if the separation of time scales (8) and the scaling (13)
apply to the global susceptibility, the local ones, after the convenient normalization
by the maximum value, should follow another master curve, identical to the global one.
Note that time-reparametrization invariance as we discuss it here implies that both local
correlations and susceptibilities should be fluctuating quantities.
Finally, notice that, in contrast to the growing correlation length scale, there is
no blatantly obvious explanation of these triangular relations within other theoretical
scenarios. These relations are perhaps the most direct consequence of the time
reparametrization symmetry arguments, and so this prediction falls within the second
class we discussed in the introduction to this section.
CONTENTS 25
3.5. Multi-time scaling
In general muti-time correlations are non-trivially related to two-time ones. One can
take as an example a generic coarse-grained connected four-time correlation. If this
function is monotonic with respect to all times, and the two-point correlations scale as
in (13) for all pairs (tµ, tν) with µ, ν = 1, 2, 3, 4, the four-time correlation should behave
C(~r; t1, t2, t3, t4) = g
h(~r, t1)
h(~r, t2)
h(~r, t2)
h(~r, t3)
h(~r, t3)
h(~r, t4)
with the same external function g for all r, in the asymptotic limit in which all times are
widely separated and the corresponding two-time correlations fall below qea. Parametric
constructions could be envisaged to test this relation.
3.6. Local fluctuation-dissipation relation
The asymptotic relation between the global correlation and susceptibility
tw→∞,C(t,tw)=C
χ(t, tw) = χ̂(C) (36)
was first obtained in mean-field disordered models [12, 13] and later observed in
simulations of many realistic systems (spins and particles in interaction on finite
dimensional spaces). −(dCχ̂(C))
−1 defines an effective temperature [14]. In the aging
regime, that is to say for C < qea, three behaviours have been observed in mean-field
systems: in structural glass models χ̂(C) is linear in C (solid line in the right-panel in
Fig. 5); in spin-glass models χ̂(C) is a non-linear function of C; in coarsening systems
χ̂(C) is a constant equal to (1− qea)/T .
The scaling in eq. (22) implies that the parametric construction ‘local susceptibility
against local correlation’ should fall on the master curve for the global quantities but
can be advanced or retarded with respect to the global value; again in the theoretical
limit a ≪ ℓ ≪ ξ. This behaviour is sketched in Fig. 5-right for the three sites
whose correlations are displayed in the left panel. The restricted relation between local
susceptibility and local correlation in eq. (22) arise from the fact that the fluctuations
are due to local reparametrizations alone and not to changes in the external functions
fc and fχ (much as in transverse vs. longitudinal fluctuations in a non-linear σ-model).
An important property of the interpretation of the fluctuation dissipation relation
in terms of effective temperatures is that one expects all observables evolving in the same
time-scale to equilibrate and hence have the same value of the effective temperature [14]
in an asymptotic regime with slow dynamics and small entropy production. Within
the time-reparametrization invariance approach the local effective temperature, defined
from the slope of the χ̂(C) plot, is automatically the same in the whole sample within
a correlation scale, just because the functions fc and fχ do not fluctuate.
In Fig. 7 we show the joint pdf of local correlations and susceptibilities of the
3d ea spin-glass in its glassy phase; the accord with the analytic prediction is very
CONTENTS 26
satisfactory [5, 6] with the additional spreading away from the master curve ascribed to
the fact that ℓ is finite and not very different from ξ.
+++ Bulk
0 0.5 1
Figure 7. (a) The joint pdf ρ(Cr , χr) at two times (tw, t) such that the global
correlation is C(t, tw) = 0.7 < qea in the 3d ea model. (b) Projection of three contour
levels. The crosses are the parametric construction χ̂(C) for several values of the total
time t larger than tw. The dotted straight line is fdt at the working temperature T .
These results are taken from [6].
3.7. Infinite susceptibilities
Zero modes are intimately related to infinite susceptibilities. Indeed, systems with
continuous symmetries are sensitive to arbitrarily weak perturbations. In the present
context the approximate global time-reparametrization invariance implies that one
can easily change the ‘clock’ R(t) characterizing the scaling of the global correlation
and linear response by applying infinitely weak perturbations that couple to the zero
mode. An illustration of this property is the fact that the aging relaxation dynamics
of glassy systems is rendered stationary by a weak perturbing force that does not
derive from a potential while the χ̂(C) relation in the slow regime is not modified [40].
One could envisage more refined tests such as applying a perturbation that imposes
different scalings on two macroscopic borders of the system and see how the time-
reparametrization wave develops in the bulk.
3.8. Conclusions
In conclusion, the global time-reparametrization invariance scenario gives a mechanism
for the divergence of the correlation length ξ though others have also been proposed in
the literature. There are a number of predictions, as the parametric relations between
local coarse-grained correlations measured at different times and the local fluctuation-
dissipation relations that, to our knowledge, are not explained by other scenarios. As
regards to the easier to measure pdfs of local correlations the framework is not only
CONTENTS 27
consistent with the scaling (28) – that also arises from simple scaling assumptions –
but it also justifies the non-Gaussian and Gumbel-like functional form of the pdfs
that follows from the proposed effective action for local ‘ages’. Moreover, systems with
global time-reparametrization invariance should have as important fluctuations in the
local susceptibilities as in the local coarse-grained responses.
4. Discussion
We presented a summary of studies of dynamical fluctuations in glassy systems that
are based on the idea that, in the long time regime, a global time-reparametrization
invariance emerges in the effective action describing the aging dynamics. We discussed
how this symmetry concretely appears in mean-field spin models, and how it can
be shown to emerge at the level of the action for short-ranged spin glass models
with quenched disorder. Two assumptions are used to prove the global time-
reparametrization of the action for the short-range spin glass model i) causality and
unitarity, and ii) a separation of time scales between a fast (or stationary) and a slow
(or aging) relaxation, where in the latter time translation invariance is broken.
That the dynamical action is symmetric under uniform, i.e., spatially independent
reparametrizations of time variables (t → h(t)) suggests that the dynamic fluctuations
that cost little action should be describable in terms of position dependent, long
wavelength, reparametrizations of the form t→ h(~r, t). These should be the Goldstone
modes associated with breaking time-reparametrization invariance symmetry.
We presented predictions of this theoretical framework and tests that we performed
in the 3d Edwards-Anderson model to falsify these predictions. Among the consequences
of our theoretical framework are those listed in Sect. 3. Some of them find an
explanation within other theories as well; others are particularly related to the time
reparametrization invariance scenario. For example, a correlation length that grows
in time is associated with the asymptotic approach to the long-time regime in which
the symmetry is fully developped, and the long wavelength modes eventually become
massless. The existence of a growing length is also predicted by other models. The
functional form of the triangular relations relating local two-time correlations between
three different times is, instead, particular to our framework.
In this review we showed tests of the predictions of the global time-
reparametrization invariance scenario performed on a finite dimensional spin-glass
model [4]-[9]. In the past we also studied the dynamics of kinetically facilitated models,
without quenched disorder, along the same lines [7]. We believe that, if aging dynamics
is a universal property of glassy systems, then these ideas should also apply to interacting
particle systems without quenched disorder. The rationale is that the two assumptions
leading to the global time-reparametrization invariance of the dynamical action, namely
causality and unitarity, and a separation of time scales, should also hold for other glassy
systems. The former assumption we can take as a fact. The second is, in a way, the
assumption that a glassy phase exists, even though we say nothing as of why it does. As
CONTENTS 28
we stated in the introduction, we do not aim at the question why glasses?, but instead we
focus on the possible universal dynamical properties once the glassy state is presented by
nature. Some of the consequences of the symmetry have already been tested numerically
in Lennard-Jones systems [29, 32] but there is still much room for more detailed studies,
including the analysis of local triangular relation between correlation and susceptibilities
and tests of their joint behaviour.
We thus propose that the asymptotic global time-reparametrization invariance,
and the associated low action long wavelength local reparametrizations, constitute the
mechanism by which dynamic fluctuations, that is to say heterogeneities, are generated
in the glassy state. Moreover, this mechanism may also apply, in an approximate form,
to the super-cooled liquid regime. It should just be an approximation because in super-
cooled liquids the symmetry is not fully developed and there is then a cut-off setting
the limit of the spatial and temporal extent of the heterogeneities, in sharp constrast
to the low temperature glassy regime in which the symmetry is realized asymptotically
and fluctuations of all sizes exist. The growth and divergence of the (two-time) dynamic
correlation length defined from the study of the space correlation of the two-time order
parameter is a manifestation of the growth and divergence of these fluctuations in the
glassy state; in contrast, such growth is interrupted in the super-cooled liquid.
To better understand the distiction between the glassy state with its asymptotic
symmetry and the super-cooled state without the fully developing symmetry, consider
the phenomenology of dynamic fluctuations as a function of temperature. Dynamic
heterogeneities in the super-cooled liquid phase have been identified numerically and
experimentally [52]. These are in a number of ways more important than what is
observed in a simple liquid or a solid. In the super-cooled liquid phase while the
full relaxation is stationary, there is still a time-scale separation with the correlations
decaying as a function of time-difference first rapidly to a temperature-independent
plateau and then slowly towards zero. The latter is the structural or α-relaxation. The
α relaxation time, tα, is finite in the super-cooled liquid regime and it increases upon
decreasing temperature. The global parametrization chosen by the symmetry breaking
terms in the slow regime is R(t) ∝ e−t/tα and C(t, tw) = qeafc(e
−(t−tw)/tα) in this case.
The mode-coupling approach to super-cooled liquids is based on approximate
dynamic equations for the relevant correlators of realistic systems that are very similar
to the p-spin Schwinger-Dyson equations in the high temperature phase [16, 18]. In
these equations the correlators are already expressed as functions of the time-difference,
τ ≡ t − tw. Close to the critical temperature the separation of time-scales develops in
the mode-coupling equations. The approximate analysis of the α relaxation predicted
by these equations also relies on dropping the τ derivatives and approximating the
integrals by assuming a sharp time-scale separation. The remaining asymptotic (large
τ) equations are invariant under reparametrizations of τ .
We then expect the local coarse-grained correlations and integrated linear responses
in the super-cooled liquid to be, to a first approximation, stationary (after a sufficiently
long waiting-time that goes beyond the equilibration time) but with different finite
CONTENTS 29
structural relaxation times, fluctuating about the value that characterizes the decay
of the global correlations. This is consistent with the experimental observation that
dynamic heterogeneities in supercooled liquids seem to have a lifetime of the order of
the relaxation time. Deviations from stationarity are not completely excluded for finite
ℓ but they should be less important than in the aging low-temperature regime.
There is, however, an important difference with respect to the aging regime, in
which the equilibration time diverges and local relaxation times or, better stated, local
ages can fluctuate without limit when tw → ∞. At high temperatures one does
not expect to find heterogeneities with arbitrary long relaxation time. Furthermore,
heterogeneities have a finite spatial extent and one can then suppress them by using
sufficiently large coarse-graining volumes. The correlation length is stationary, ξ(τ),
and it saturates in the limit of long-time differences, τ → ∞. The saturation value,
though, increases for decreasing temperature. From a theoretical point of view, this
picture is, in a sense, similar to the one that describes the equilibrium paramagnetic
phase in the O(N) model, just above the ordering transition temperature.
When lowering the temperature the size and life-time of the heterogeneities
increases. A p-spin or mode-coupling approach predicts that their typical size and thus
limτ→∞ ξ(τ) diverge at the mode coupling transition temperature [33]. In real systems
the divergence at Tc is rendered smooth and ξ does not strictly diverge asymptotically.
At still lower temperatures the bulk quantities age and we expect then to observe
heterogeneous aging dynamics of the kind described in this review, with a two-time
dependent correlation length for the local fluctuations. The heterogeneities age as well,
in a ‘dynamic’ way. By this we mean that if a region looks older than another one when
observed on a given time-window, it can reverse its status and look younger than the
same other region when observed on a different time-window.
The infinite susceptibility with respect to perturbations that couple to the zero
mode are illustrated by the fact that the clock of the bulk quantities that is selected
dynamically is very easy to modify with external perturbations. A small force that does
not derive from a potential and is applied on every spin in the model renders an aging p
spin model stationary [40] while the model maintains a separation of time-scales in which
the fast scale follows the temperature of the bath, T , while the slow scale is controlled
by an an effective temperature, Teff > T . In this case, the aging system selects a
time-reparametrization R(t) = t while in the perturbed model R(t) = e−t/tα . Similarly,
the aging of a Lennard-Jones mixture is stopped by an homogeneous shear [41]. A
different way to modify the time-reparametrization that characterizes the decay of the
correlations is by using complex thermal baths [42].
The picture that we have described applies to long times but not as long as to enter
the activated regime that we still do not know how to characterize theoretically, not
even at the bulk level. This regime corresponds to times that scale with the system size.
The success of mean-field models, or the mode-coupling approach, in describing the
bulk dynamics of glassy systems, at least not to close to the crossover glass temperature
and at a qualitative level, allows us to claim that these extremely long times scales are
CONTENTS 30
unrealistic if not too close to the glass transition Tg even as far as dynamic fluctuations
are concerned.
The ideas discussed in this paper should not only apply to systems that relax
in a non-equilibrium manner as glasses but also to systems with slow dynamics and a
separation of time-scales that are kept out of equilibrium with a (weak) external forcing.
Recently, there has been much interest in the appearance of shear localization, in the
form of shear bands, in the rheology of complex fluids. Along the lines here described
it would be very interesting to analyze the fluctuations in the local reparametrizations
in the fluidized shear band and the ‘jammed’ glassy band.
The analytic treatment of mean-field quantum glassy systems follows similar steps
to the ones presented here. The Schwinger-Keldysh approach replaces the Martin-
Siggia-Rose one but these formalisms are very similar indeed. The analytic solution to
the dynamic equations in the limit in which the coupling to the environment is weak
also uses the fact that the dynamics in the aging regime is very slow. The approximate
equations then become time-reparametrization invariant. One can then expect that
similar dynamic fluctuations arise in glassy problems in which quantum fluctuations
are important. Moreover the proof of global time-reparametrization invariance for spin-
glasses has been presented directly in the quantum formalism. Novel experimental
techniques may be apt to study dynamic heterogeneities in glasses when quantum
fluctuations are important.
We expect to find similar fluctuations using finite size systems and examining the
behaviour of the mesoscopic run-to-run fluctuations of the global correlations. These
may be easier to measure experimentally using mesoscopic systems.
Importantly enough, global time reparametrization invariance does not develop in
all models with slow and aging correlation functions. The O(N) ferromagnetic model in
the large N limit is a case in which global time-reparametrization invariance is reduced
to just scale invariance [8].
Last, but not least, the approach based on reparametrization invariance suggests
that it may be possible to search for universality in glassiness. A Ginzburg-Landau
theory for phase transitions captures universal properties that are independent of the
details of the material. It is symmetry that defines the universality classes. For example,
one requires rotational invariance of the Ginzburg-Landau action when describing
ferromagnets. Time reparametrization invariance may be the underlying symmetry that
must be satisfied by the Ginzburg-Landau action of all glasses. What would determine
if a system is glassy or not? We are tempted to say the answer is if the symmetry
is generated or not at long times. Knowing how to describe the universal behavior
may tell us all the common properties of glasses, but surely it will not allow us to
make non-universal predictions, such as what is the glass transition temperature for a
certain material, or whether the material displays glassy behavior at all. This quest for
universality is a very interesting theoretical scenario that needs to be confronted.
We have tried to state as clearly as possible the implications of our proposal. The
CONTENTS 31
full project is not yet complete since several questions about its limitations remain open.
A number of issues should be addressed are:
(i) From a phenomenological point of view, to perform strong tests of the global
time-reparametrization invariance scenario in molecular dynamic simulations [28, 44, 45]
of realistic glassy systems and experiments [46]-[51]. More precisely, the triangular
relations between local coarse-grained correlations and the local fluctuation dissipation
relations should be analysed to give support or else falsify this conjecture.
(ii) From an analytic point of view, to derive the effective action for local
reparametrizations for glassy models with and without quenched disorder. We are
currently working on this project in collaboration with S. Franz. One idea is to study
the p spin disordered model with Kac long-range interaction. ANothe one is to study
the symmetries of the dynamic action associated with the Dean-Kawasaki equation for
the evolution of the density of a system of particles in interaction.
(iv) From a mixed analytic and numerical point of view to analyse fluctuations
in models with global aging dynamics of different type. To this end, one can study
dynamic fluctuations in simple coarsening systems in finite dimensions. In these cases
the morphology of domains can be characterized in great detail [43]. We could, in
principle, understand the fluctuations in the local correlations and linear responses from
a microscopic point of view. Whether these are similar or different to the ones in other
glassy problems is still to be established and the outcome of this study could clarify the
relevance of the value of the effective temperature in determining the characteristics of
the dynamic fluctuations.
(v) In the same line as the above, the analysis of fluctuations of elastic lines in the
presence of impurities could help us understanding the coarsening phenomenon but also
the role of diffusion that superimposes in these cases to the aging dynamics [15].
These are just a few questions posed by this proposal that are still not answered.
Acknowledgments
We thank our collaborators J. J. Arenzon, C. Aron, A. J. Bray, S. Bustingorry,
H. E. Castillo, P. Charbonneau, D. Domı́nguez, J. L. Iguain, L. D. C. Jaubert, M. P.
Kennett, M. Picco, D. R. Reichman, M. Sellitto, A. Sicilia and H. Yoshino.
We also wish to especially thank L. Berthier, G. Biroli, J-P Bouchaud, D. S. Dean,
G. Fabricius, T. Grigera, E. Fradkin, S. Franz, J. Kurchan, H. Makse, D. Stariolo and L.
Valluzzi for very helpful discussions. We acknowledge financial support from NSF-CNRS
INT-0128922, NSF DMR-0305482, DMR 0403997 and PICS 3172. LFC is a member of
Institut Universitaire de France. LFC thanks the Newton Institute at the University of
Cambridge, ICTP at Trieste, and Universidad Nacional de Mar del Plata, Argentina, CC
the LPTHE at Jussieu, Paris, France, and LFC and CC the Aspen Center for Physics
CONTENTS 32
for hospitality where part of this work has been carried out.
[1] M. D. Ediger, C. A. Angell, and S. R. Nagel; J. Phys. Chem. 100, 13 200 (1996). Glassy Materials
and disordered solids, K. Binder and W. Kob (World Scientific, 2005).
[2] P. W. Anderson, Concepts in solids, (World Scientific, 1997).
[3] Several reviews and book summarize the aging properties of different types of glasses. Aging in
polymer glasses is described in L. C. D. Struik, Physical aging in amorphous polymers and
other materials (Elsevier, Houston, 1978). Aging in spin-glasses is reviewed in E. Vincent et
al, Slow dynamics and aging in spin-glasses, cond-mat/9607224. Aging in soft glassy materials
is summarized in L. Cipelletti and L. Ramos, J. Phys. C 17, R253 (2005). or Viasnoff and
Lequeux. Aging in orientational glasses in F. Alberici-Kious, J-P Bouchaud, L. F. Cugliandolo,
P. Doussineau and A. Levelut, Phys. Rev. B 62, 14766 (2000)
[4] C. Chamon, M. P. Kennett, H. E. Castillo, and L. F. Cugliandolo Phys. Rev. Lett. 89, 217201
(2002).
[5] H. E. Castillo, C. Chamon, L. F. Cugliandolo, and M. P. Kennett, Phys. Rev. Lett. 88, 237201
(2002).
[6] H. E. Castillo, C. Chamon, L. F. Cugliandolo, J. L. Iguain, and M. P. Kennett, Phys. Rev. B 68,
134442 (2003).
[7] C. Chamon, P. Charbonneau, L. F. Cugliandolo, D. R. Reichman, and M. Sellitto, J. of Chem.
Phys. 121, 10120 (2004).
[8] C. Chamon, L. F. Cugliandolo, H. Yoshino, J. Stat. Mech (2006) P01006.
[9] L. D. C. Jaubert, C. Chamon, L. F. Cugliandolo, and M. Picco, cond-mat/0701116, to appear in
JSTAT.
[10] H. Sompolinsky, Phys. Rev. Lett. 47, 935 (1981).
[11] S. L. Ginzburg, Zh. Eksp. Teor. Fiz. 90, 754 (1986) [Sov. Phys. JETP 63, 439 (1986)]. L. B. Ioffe,
Phys. Rev. B 38, 5181 (1988).
[12] L. F. Cugliandolo and J. Kurchan, Phys. Rev. Lett. 71, 173 (1993).
[13] L. F. Cugliandolo and J. Kurchan, J. Phys. A 27, 5749 (1994).
[14] L. F. Cugliandolo, J. Kurchan, and L. Peliti, Phys. Rev. E 55, 3898 (1997).
[15] S. Bustingorry, J. L. Iguain, C. Chamon, L. F. Cugliandolo, and D. Domı́nguez, Europhys. Lett.
76, 856 (2006).
[16] T. R. Kirkpatrick and D. Thirumalai, Phys. Rev. Lett. 58, 2091 (1987); Phys. Rev. B 36, 5388
(1987). T. R. Kirkpatrick and P. Wolynes, Phys. Rev. B 36, 8552 (1987).
[17] C. Chamon and M. P. Kennett, Phys. Rev. Lett. 86, 1622 (2001).
[18] J-P Bouchaud, L. F. Cugliandolo, J. Kurchan, and M. Mézard, Physica A 226, 243 (1996).
[19] L. F. Cugliandolo, Lecture notes in Slow Relaxation and non equilibrium dynamics in condensed
matter, Les Houches Session 77 July 2002, J-L Barrat, J Dalibard, J Kurchan, M V Feigel’man
eds. (Springer-Verlag, 2003); cond-mat/0210312.
[20] W. Götze and L. Sjögren, Rep. Prog. Phys. 55, 241 (1992). W. Götze, Condensed Matter Physics
1, 873 (1998).
[21] R. A. Fisher, Ann. Eugenetics, 7, 355 (1937).
[22] V. Viasnoff and F. Lequeux, Phys. Rev. Lett. 89, 065701 (2002).
[23] D. S. Dean, J. Phys. A 29, L613 (1996). K. Kawazaki and T. Koga, Physica A 201, 115 (1993).
[24] L. F. Cugliandolo and D. S. Dean, J. Phys. A 28, 4213 (1996).
[25] L. F. Cugliandolo and G. S. Lozano, Phys. Rev. Lett. 80, 4979 (1998).
[26] L. F. Cugliandolo, D. R. Grempel, G. L. Lozano, H. Lozza, and C. A. da Silva Santos, Phys. Rev.
B 66, 014444 (2002).
[27] C. De Dominicis and E. Brézin, Eur. Phys. J. B 30, 71 (2002)
http://arxiv.org/abs/cond-mat/9607224
http://arxiv.org/abs/cond-mat/0701116
http://arxiv.org/abs/cond-mat/0210312
CONTENTS 33
[28] G. Parisi, J. Phys. Chem. B 103, 4128 (1999).
[29] A. Parsaeian and H. E. Castillo, cond-mat/0610789.
[30] see e.g. N. Lačević, F. W. Starr, T. B. Sch/oder, and S. C. Glotzer, J. Chem. Phys. 119, 7372
(2003) and references therein.
[31] L. Berthier, Phys. Rev. E 69, 020201(R) (2004).
[32] H. E. Castillo and A. Parsaeian, cond-mat/0610857.
[33] S. Franz and G. Parisi, J. Phys. I 5, 1401 (1995), Phys. Rev. Lett. 79, 2486 (1997). C. Donati, S.
Franz, G. Parisi, and S. C. Glotzer, Phil. Mag. B 79, 1827 (1999). G. Biroli and J-P Bouchaud,
Europhys. Lett. 67, 21 (2004). G. Biroli, J-P Bouchaud, K. Miyazaki, and D. R. Reichman,
cond-mat/0605733.
[34] T. R. Kirkpatrick and P. Wolynes, Phys. Rev. B 36, 8552 (1987). T. R. Kirkpatrick, D. Thirumalai
and P. G. Wolynes, Phys. Rev. A 40, 1045 (1989). P. G. Wolynes, Jour. Res. NIST 102, 187
(1997). J-P Bouchaud and G. Biroli, J. Chem. Phys. 121, 7347 (2004).
[35] J. P. Garrahan and D. Chandler, Proc. Natl. Acad. Sci. USA 100, 9710 (2003). S. Whitelam, L.
Berthier, and J. P. Garrahan Phys. Rev. Lett. 92, 185705 (2004). A. C. Pan, J. P. Garrahan,
and D. Chandler Phys. Rev. E 72, 041106 (2005). R. L. Jack, L. Berthier, and J. P. Garrahan,
Phys. Rev. E 72, 016103 (2005). R. L. Jack and J. P. Garrahan, J. Chem. Phys. 123, 164508
(2005),
[36] G. Tarjus, S. A. Kivelson, Z. Nussinov, and P. Viot, The frustration-based approach of supercooled
liquids and the glass transition: a review and critical assessment, cond-mat/0509127 and
references therein.
[37] C. Chamon, L. F. Cugliandolo and S. Franz, in preparation.
[38] S. Bramwell, P. C. W. Holdsworth and J-F Pinton, Nature 396, 552 (1998).
[39] E. Bertin and M. Clusel, J. Phys. A 39, 7607 (2006).
[40] L. F. Cugliandolo, J. Kurchan, P. Le Doussal, and L. Peliti, Phys. Rev. Lett. 78, 350 (1997). L.
Berthier, J.-L. Barrat, and J. Kurchan, Phys. Rev. E 61, 5464 (2000).
[41] L. Berthier and J.-L. Barrat, Phys. Rev. Lett. 89, 095702 (2002); J. Chem. Phys. 116, 6228 (2002).
[42] L. F. Cugliandolo and J. Kurchan, Physica A 263 242 (1999).
[43] J. J. Arenzon, A. J. Bray, L. F. Cugliandolo, and A. Sicilia, cond-mat/0608270, to appear in Phys.
Rev. Lett.
[44] L. Valluzzi et al., in preparation.
[45] K. Vollmayr-Lee, W. Kob, K. Binder, and A. Zippelius, J. Chem. Phys. 116, 5158 (2002).
[46] L. Buisson, L. Bellon, and S. Ciliberto, cond-mat/0210490, to appear in Proceedings of “III
Workshop on Non-Equilibrium Phenomena” (Pisa 2002).
[47] W. K. Kegel and A. V. Blaaderen, Science 287, 290 (2000).
[48] K. S. Sinnathamby, H. Oukris, N. E. Israeloff, Phys. Rev. Lett. 95, 67205 (2005). Crider and N.
E. Israeloff, Nano Letters 6, 887 (2006).
[49] L. Cipelletti, H. Bissig, V. Trappe, P. Ballestat, and S. Mazoyer, J. Phys.: Condens. Matter 15,
S257 (2003); A. Duri and L. Cipeletti, cond-mat/0606051
[50] R. E. Courtland and E. R. Weeks, J. Phys.: Condens. Matter 15, S359 (2003). G. C. Cianci,
R. E. Courtland, E. R. Weeks, cond-mat/0512698. E. R. Weeks, J. C. Crocker, D. A. Weitz,
cond-mat/0610195.
[51] P. Wang, C. Song, and H. A. Makse, Nature Physics 2, 526 (2006).
[52] H. Sillescu, J. Non-Crystal. Solids 243, 81 (1999); M. D. Ediger, Annu. Rev. Phys. Chem. 51, 99
(2000).
http://arxiv.org/abs/cond-mat/0610789
http://arxiv.org/abs/cond-mat/0610857
http://arxiv.org/abs/cond-mat/0605733
http://arxiv.org/abs/cond-mat/0509127
http://arxiv.org/abs/cond-mat/0608270
http://arxiv.org/abs/cond-mat/0210490
http://arxiv.org/abs/cond-mat/0606051
http://arxiv.org/abs/cond-mat/0512698
http://arxiv.org/abs/cond-mat/0610195
	Why glasses? vs. universality in glassy dynamics
	Time reparametrization invariance
	Mean-field models – dynamic equations
	Structural glasses: the p3 cases
	Short-ranged models – dynamic action
	Turning a nuisance into something useful - symmetry as a guideline
	The spherical p=2 case or mean-field domain growth
	Quantum problems
	Consequences and tests
	Two-time correlation length
	Scaling of the pdf of local two-time functions
	Effective action for local ages
	Two-time scaling of local functions
	Multi-time scaling
	Local fluctuation-dissipation relation
	Infinite susceptibilities
	Conclusions
	Discussion
ABSTRACT
  We summarize a theoretical framework based on global time-reparametrization
invariance that explains the origin of dynamic fluctuations in glassy systems.
We introduce the main ideas without getting into much technical details. We
describe a number of consequences arising from this scenario that can be tested
numerically and experimentally distinguishing those that can also be explained
by other mechanisms from the ones that we believe, are special to our proposal.
We support our claims by presenting some numerical checks performed on the 3d
Edwards-Anderson spin-glass. Finally, we discuss up to which extent these ideas
apply to super-cooled liquids that have been studied in much more detail up to
present.

<|endoftext|><|startoftext|>
A generalization of Chebyshev polynomials and non rooted
posets
Masaya Tomie
tomie@math.tsukuba.ac.jp
In this paper we give a generalization of Chebyshev polynomials and using this we describe
the Mobius function of the generalized subword order derived from a poset {a1, · · · as, c | ai <
c for i = 1, · · · s}, which contains an affirmative answer for the conjecture by Björner, Sagan and
Vatter.(cf,[5] [10])
1 INTRODUCTION
Björner was the first to determine the Möbius functions of factor orders and subword orders. To
determine the Möbius functions, he used involutions, shellability, and generating functions. [2][3][4]
Björner and Stanley found an interesting relation among the subword order derived from a two
point set {a, b} , symmetric groups and composition orders. [6]
Factor orders, subword orders, and generalized subword orders were studied in the context of
Möbius functions derived from word orders.
In [10] Sagan and Vatter gave a description of the Möbius function of the generalized subword
order derived from positive integers in two ways, namely the sign reversing involution and the dis-
crete Morse theory. More generally they gave a combinatorial description of the Möbius functions
derived from rooted forests. And in [5][10] they gave a very interesting conjecture which connects
with the relation between a non-rooted forest P2 as in Notation.1 and Chebyshev polynomials.
Conjecture 1 ([5][10])
We put P := {a, b, c, | a < c, b < c }, and consider the poset P ∗ consisting of finite words of P
with its generalized subword order. Let µ be a Möbius function of P ∗. Suppose 0 ≤ i ≤ j.
Then µ(ai, cj) is the coefficient of Xj−i in Ti+j(X).
Now we call {Tn(X) | n ∈ N} Chebyshev polynomials of first kind.
A series of Chebyshev polynomials {T (X) | n ∈ N} is a system of orthogonal polynomials and
induces a special case of hypergeometric functions as a generalization of a binomial series. And
this polynomial series is an example of the best approximation polynomials.
Not only in analysis, but in combinatorics, Chebyshev polynomials appear in permutation
pattern avoidances [7] and Chebyshev posets, Chebyshev transformations defined by Hetyei which
are related to cd-indeices, f -vectors and h-vectors respectively.
In this paper we give a natural generalization of Chebyshev polynomials in the following way.
Definition 1 (generalized Chebyshev polynomials)
We define the polynomial T sk (X) for s, k ∈ N as follows:
(1) T s0 (X) = 1, T
1 (X) = (s− 1)X ,
(2) T sk+2(X) + T
k (X) = sX · T
k+1(X).
Now the T 2n(X) are Chebyshev polynomials of first kind. And notice deg(T
k (X)) = k.
Then, using generalized Chebyshev polynomials, we generalize the conjecture as follows.
http://arxiv.org/abs/0704.0685v2
Theorem 1
Let Ps be a poset as Notation.1 and µ be the Möbius function of P
s . Then for 0 ≤ m ≤ n ,
µ(am1 , c
n) is the coefficient of Xn−m in T sm+n(x).
2 PRELIMINERIES
In this section, we give some basic definitions and notations used in this paper. For the basic
definitions of posets and Möbius functions, see [12] and for the definitions of subword orders and
generalized subword orders, see [2] [3] [4] [5] [10]. First we recall a path in a poset P .[12]
Definition 2 ([12])
Let P be a poset. An arranged sequence (θ1, · · · , θr) with θi ∈ P and θ1 < · · · < θr, is called a
path of length r − 1 . We denote it by (θ1 → · · · → θr)p
Proposition 1 ([12])
Let P be a locally finite poset, µ be a Möbius function of P and Ck be the number of paths of P
of length k respectively. Then we have
µ(u, v) = Σk(−1)
kCk, for all u ≤ v ∈ P .
Definition 3 ([10])
Let P ∗ be the poset with the subword order derived from a poset P . We take p1, · · · pk and q1 · · · ql
from P ∗. If p1 · · · pk ≤ q1 · · · ql as a subword order, we call S(j1, · · · , jk) (j1 < · · · < jk), an
embedding of p1 · · · pk into q1 · · · ql if pi ≤ qji for 1 ≤ i ≤ k
And an embedding S(j1, · · · , jk) is called the right most embedding of p1 · · · pk into q1 · · · ql, if
for any embedding S′(j′1, · · · , j
k) , we have j
i ≤ ji for all 1 ≤ i ≤ k.
Notation 1
In this paper we fix a poset Ps for s ∈ N as follows.
Ps := {a1, · · ·as, c | ai < c, for i = 1, · · · s}}
Definition 4
We define as follows.
Let P ∗s be a poset with the generalized subword order derived from a poset Ps as in Notation
1 and let X be a set of the paths of P .
Put Mob(X) := Σk≥1Ck, where Ck is the number of paths in X whose length is k. Also we
define
{akcl} := {p1 · · · pk+l | ♯{p1, · · · , pk+l} ∩ {a1, · · · as} = k, ♯{p1, · · · , pk+l} ∩ {c} = l} for k, l ∈
N ∪ {0},
< p1 · · · pk, {a
lcm} >:= {q1 · · · ql+m ∈ {a
lcm} | p1 · · · pk ≤ q1 · · · ql+m} for k, l ∈ N ∪ {0}, and
Pat{p1 · · · pk, q1 · · · ql} := {(p1 · · · pk → θ1 →, · · · → θr → q1 · · · ql)p | p1 · · · pk < θ1 < · · · <
θr < q1 · · · ql, |θi| = l} respectively. Here |θ| is the number of letters of θ.
Definition 5 (generalized Chebyshev polynomial)
We define the polynomial T sk (X) s, k ∈ N as follows:
(1) T s0 (X) = 1, T
1 (X) = (s− 1)X ,
(2) T sk+2(X) + T
k (X) = sX · T
k+1(X).
Now the {T 2n(X) | n ∈ N} are Chebyshev polynomials of first kind. And notice deg(T
k (X)) = k.
Here we give a simple expression of the generalized Chebyshev polynomials.
Proposition 2
For s, n ∈ N, we have
T sn(X) = Σm≤n, n−m:even(−1)
(n−m)/2
(n+m)/2
(n−m)/2
) · sm − (
(n+m)/2− 1
(n−m)/2
) · sm−1
PROOF
We show the above formula by induction. It is easy to see T s0 (X) = 1, T
1 (X) = (s−1)X . Now
we have
T sn + T
= Σm≤n, n−m:even(−1)
(n−m)/2
(n+m)/2
(n−m)/2
) · sm − (
(n+m)/2− 1
(n−m)/2
) · sm−1
+Σm≤n+2, n+2−m:even(−1)
(n+2−m)/2
(n+ 2 +m)/2
(n+ 2−m)/2
) · sm − (
(n+ 2 +m)/2− 1
(n+ 2−m)/2
) · sm−1
) · sn+2 − (
) · sn+1
·Xn+2
+Σm≤n, n−m:even(−1)
(n−m)/2+1
(n+m)/2
(n−m)/2 + 1
) · sm − (
(n+m)/2− 1
(n−m)/2 + 1
) · sm−1
= sn+1(s− 1)Xn+2
+Σm≤n−1, n−1−m:even(−1)
(n−m−1)/2+1
(n+m+ 1)/2
(n−m− 1)/2 + 1
) · sm+1 − (
(n+m+ 1)/2− 1
(n−m− 1)/2 + 1
) · sm
·Xm+1
sn(s− 1)Xn+1
+Σm≤n−1, n−1−m:even(−1)
(n+1−m)/2
(n+ 1 +m)/2
(n+ 1−m)/2
) · sm − (
(n+ 1 +m)/2− 1
(n+ 1−m)/2
) · sm−1
= sX · T sn+1.
Hence we obtain the derived result. ✷
3 MAIN RESULTS
In this section, we give a proof of Theorem 1
Lemma 1
Let P be a finite poset and we take an element x ∈ P . We put as follows:
P≤x := {y | y ≤ x} , P̂≤x := {(θ1 → · · · → θr−1 → x)p | θi ∈ P} , P≥x := {y | y ≥ x} , P̂≥x :=
{x → θ1 → · · · → θr−1)p | θi ∈ P} and Px := {(· · · → τr → x → σ1 → · · ·) | τi ≤ x, σi ≥ x }.
Now a path (x) ∈ P̂≤x, P̂≥x, Px. Then we have MobPx = MobP̂≤x MobP̂≥x.
PROOF
Notice that a path which passes through x splits into the two paths, one starts from x and the
other one ends x. From that we obtain the derived result. ✷
Lemma 2
For m,n, p, q ∈ N∪ {0} such that 0 ≤ m ≤ n, 0 ≤ p ≤ m, 0 ≤ q ≤ n, we take p1 · · · pm, p̃1 · · · p̃m ∈
{am−pcp}. Then we have
♯ < p1 · · · pm, {a
n−qcq} >= ♯ < p̃1 · · · p̃m, {a
n−qcq} >.
PROOF
Claim 1 We have ♯ < p1 · · ·
i−th︷︸︸︷
ax · · · pm, {a
n−qcq} >= ♯ < p1 · · ·
i−th︷︸︸︷
ay · · · pm, {a
n−qcq} >.
(Proof of claim1)
We take ∀q1 · · · qn ∈< p1 · · ·
i−th︷︸︸︷
ax · · · pm, {a
n−qcq} >. And we consider the right most embedding
into q1 · · · qn. Notice that the right most embedding is unique. Here we put S(j1, j2, · · · , jm) as
the right most embedding p1 · · ·
i−th︷︸︸︷
ax · · · pm into q1 · · · qn .
Now we define the map Φ as follows.
Φ < p1 · · ·
i−th︷︸︸︷
ax · · · pm, {a
n−qcq} >−→< p1 · · ·
i−th︷︸︸︷
ay · · · pm, {a
n−qcq} >
Φ(q1q2, · · ·
ji−th︷︸︸︷
ax ax1 · · · axt
ji+1−th︷︸︸︷
pi+1 · · · qn) = q1q2 · · ·
ji−th︷︸︸︷
ay ay1 · · · ayt
ji+1−th︷︸︸︷
pi+1 · · · qn,
Here we put ayk = axk+y−x, and ak+s = ak.
It is easy to see the right most embedding of p1 · · ·
i−th︷︸︸︷
ay · · · pm into Φ(q1 · · · qn) is S(j1, j2, · · · , jm).
And by the construction of Φ, we can easily define the inverse map of Φ. Hence we prove this
claim.
Claim 2 We have ♯ < p1 · · ·
i−th︷︸︸︷
(i+1)−th
c · · · pm, {a
n−qcq} >= ♯ < p1 · · ·
i−th︷︸︸︷
(i+1)−th
ax · · · pm, {a
n−qcq} >.
(Proof of claim2)
We take ∀q1 · · · qn ∈< p1 · · ·
i−th︷︸︸︷
(i+1)−th
c · · · pm, {a
n−qcq} > and put S(j1, j2, · · · , jm) as the
right most embedding p1 · · ·
i−th︷︸︸︷
(i+1)−th
c · · · pm into q1 · · · qn.
Now we define the map Φ as follows.
Φ < p1 · · ·
i−th︷︸︸︷
(i+1)−th
c · · · pm, {a
n−qcq} >−→< p1 · · ·
i−th︷︸︸︷
(i+1)−th
ax · · · pm, {a
n−qcq} >,
Φ(q1 · · ·
=ax︷︸︸︷
qji · · ·︸ ︷︷ ︸
=c︷︸︸︷
qji+1 · · ·︸ ︷︷ ︸
qji+2 · · · qn) = q1 · · ·
=c︷︸︸︷
qji+1 · · ·︸ ︷︷ ︸
=ax︷︸︸︷
qji · · ·︸ ︷︷ ︸
qji+2 · · · qn.
Here the right most embedding of p1 · · ·
i−th︷︸︸︷
(i+1)−th
ax · · · pm into q1 · · ·
=c︷︸︸︷
qji+1 · · ·︸ ︷︷ ︸
=a︷︸︸︷
qji · · ·︸ ︷︷ ︸
qji+2 · · · qn
is S(j1 · · · ji, ji+2 + ji − ji+1, ji+2 · · · jm). By the construction, all of the elements of
< p1 · · ·
i−th︷︸︸︷
(i+1)−th
c · · · pm, {a
n−qcq} > whose right most embedding are S(j1, j2, · · · , jm), have
one to one correspondence to the elements of < p1 · · ·
i−th︷︸︸︷
(i+1)−th
ax · · · pm, {a
n−qcq} > whose right
most embedding are S(j1 · · · ji, ji+2 + ji − ji+1, ji+2 · · · jm). Hence the Φ is bijeciton. Therefore
we have this claim2. By these claims we obtain the derived result. ✷
From Lemma 2 , if p1 · · · pm ∈ {a
m−pcp}, then
♯ < p1 · · · pm, {a
n−qcq} >= ♯ < a1 · · · a1︸ ︷︷ ︸
(m−p)times
c · · · c︸ ︷︷ ︸
p−times
, {an−qcq} >. So we denote the number as
M((m, p), (n, q)) for all 0 ≤ m ≤ n, 0 ≤ p ≤ m, 0 ≤ q ≤ n.
Lemma 3
Let k, l ∈ N∪ {0}, 0 ≤ k ≤ l and p1 · · · pl ∈ {al−kck}, then we have [p1 · · · pl, cl] ≃ Bl−k. Now Bl−k
is a Boolean algebra of rank l − k.
Lemma 4
For m,n, p, k ∈ N ∪ {0} such that 0 ≤ m ≤ n, 0 ≤ p ≤ m, we take p1 · · · pm ∈ {am−pcp}, then
the number of paths in Pat{p1 · · · pm, c
n} whose length are k equals to the number of paths in
Pat{ a1 · · · a1︸ ︷︷ ︸
(m−p)times
c · · · c︸ ︷︷ ︸
p−times
, cn}.
PROOF
Notice that if we take q1 · · · qn ∈ {a
n−qcq}, then the number of length l paths from q1 · · · qn to
cn equals to the number of length l paths from a1 · · ·a1︸ ︷︷ ︸
(n−q)times
c · · · c︸ ︷︷ ︸
q−times
to cn.
Hence we have
♯{p1 · · · pm → θ1 → · · · → θk = c
n | |θi| = n for i = 1, · · · k}
= Σp≤r≤nM((m, p), (n, r))♯{ a1 · · · a1︸ ︷︷ ︸
(n−r)times
c · · · c︸ ︷︷ ︸
r−times
→ τ1 → · · · → τk−1 = c
n}. (By Lemma 3)
Hence we obtain the derived result.
Lemma 5
For m,n, p ∈ N ∪ {0}, such that 0 ≤ m ≤ n, 0 ≤ p ≤ m, we take p1, · · · pm ∈ {am−pcp}. Then we
MobPat{p1 · · · pm, c
i=0 (−1)
n−p−iM((m, p, ), (n, p+ i)) ifm < n
(−1)n−m ifm = n
PROOF
From Lemma 4 we have MobPat{p1 · · · pm, c
n} = MobPat{ a1 · · · a1︸ ︷︷ ︸
(m−p)times
c · · · c︸ ︷︷ ︸
p−times
, cn}.
Then we have
MobPat{ a1 · · · a1︸ ︷︷ ︸
(m−p)times
c · · · c︸ ︷︷ ︸
p−times
, cn}
= (−1)Σ
i=0 M((m, p), (n, p+ i))µ( a1 · · ·a1︸ ︷︷ ︸
(n−p−i)times
c · · · c︸ ︷︷ ︸
(p+i)times
, cn) (By Proposition 1)
= (−1)Σ
i=0 M((m, p), (n, p+ i))(−1)
n−p−i.
Hence we obtain the derived result. ✷
Lemma 6
For m,n, p, q ∈ N ∪ {0} such that 1 ≤ m ≤ n, 1 ≤ p ≤ m, 1 ≤ q ≤ n, 1 ≤ p ≤ q, we have
M((m, p), (n, q)) = Σn−mi=0 M((m− 1, p− 1), (n− 1− i, q − 1)) · s
PROOF
We have
the left hand side
= ♯{x1 · · ·xn ∈ {a
n−qcq} | a
p ≤ x1 · · ·xn} (By Lemma 2)
= ♯{x1 · · ·xn ∈ {a
n−qcq} | a
p ≤ x1 · · ·xn, xn = c}
+♯{x1 · · ·xn ∈ {a
n−qcq} | a
p ≤ x1 · · ·xn, xn−1 = c, xn 6= c}
· · ·
+♯{x1 · · ·xn ∈ {a
n−qcq} | a
p ≤ x1 · · ·xn, xn−i = c, xn−i+k 6= c(1 ≤ k ≤ i)}
+♯{x1 · · ·xn ∈ {a
n−qcq} | a
p ≤ x1 · · ·xn, xm = c, xm+k 6= c(1 ≤ k ≤ n−m)}
= ♯{ · · ·︸︷︷︸
p−1 ≤ A A ∈ {an−qcq−1}}
+♯{ · · ·︸︷︷︸
c︸︷︷︸
p−1 ≤ A A ∈ {an−q−1cq−1}} · s1
+♯{ · · ·︸︷︷︸
c︸︷︷︸
p−1 ≤ A A ∈ {an−q−icq−1}} · si
+♯{ · · ·︸︷︷︸
c︸︷︷︸
p−1 ≤ A A ∈ {am−qcq−1}} · sn−m
(In case of n− q − i < 0 , we recognize ♯{ · · ·︸︷︷︸
c︸︷︷︸
| am−pcp−1 ≤ A A ∈ an−q−icq−1} as 0.)
= M((m− 1, p− 1), (n− 1, q − 1)) · s0 +M((m− 1, p− 1), (n− 2, q − 1)) · s1 + · · ·
+M((m− 1, p− 1), (n− 1− i, q − 1)) · si · · ·M((m− 1, p− 1), (m− 1, q − 1)) · sn−m
Hence we obtain the derived result. ✷
Lemma 7
For m,n, i ∈ N ∪ {0} such that i ≤ m ≤ n, i ≤ p ≤ q, i ≤ p ≤ m, i ≤ q ≤ n, we have
M((m, p), (n, q)) = Σn−mk=0 M((m− i, p− i), (n− i− k, q − i)) · s
k · (
i+ k − 1
PROOF
In case of i = 1, it is shown by Lemma 6.
We show the above formula by induction. We suppose this lemma holds for i− 1. Now we see
M((m, p), (n, q)) = Σn−mk=0 M((m− i+ 1, p− i+ 1), (n− i− k + 1, q − i+ 1)) · s
i+ k − 2
= Σn−mk=0 (Σ
l=0 M((m− i, p− i), (n− i− k − l, q − i)) · 2
l) · sk(
i+ k − 2
= Σn−mk,l=0M((m− i, p− i), (n− i− (k + l), q − i)) · s
i+ k − 2
= Σn−ik+l=0M((m− i, p− i), (n− i− (k + l), q − i)) · s
i+ k − 2
= Σn−iα=0M((m− i, p− i), (n− i− α, q − i))(Σ
i+ j − 2
)) · sα.
Now we remark the following formula.
Σαx=0(
) = (
i+ α+ 1
hence we have
= Σn−iα=0M((m− i, p− i), (n− i− α, q − i))(Σ
i+ j − 2
)) · sα
= Σn−iα=0M((m− i, p− i), (n− i− α, q − i)) · s
n+ α− 1
Lemma 8
For m,n, p, q ∈ N, 1 ≤ m ≤ n, 1 ≤ p ≤ q, 1 ≤ p ≤ m, 1 ≤ q ≤ n, we have
M((m, p), (n, q)) = Σn−mk=0 M((m− p, 0), (n− p− k, q − p)) · s
k · (
p+ k − 1
Lemma 9
For 1 ≤ α ≤ β, we have
i=0M((α, 0), (β, i)) · (−1)
i = 0.
PROOF
We give a combinatorial proof. We put M̂i :=< a1 · · · a1︸ ︷︷ ︸
α−times
, {aβ−ici} >, M̂ :=
0≤i≤β M̂i,
M̂ev :=
0≤i≤β i; even M̂i and M̂odd :=
0≤i≤β i; odd M̂i.
Then we have Σ
i=0M((α, 0), (β, i)) · (−1)
i = ♯M̂ev − ♯M̂odd.
We consider the map Ψ as follows.
Ψ M̂ −→ M̂
Ψ( · · ·︸︷︷︸
a1ax1 · · ·axt) = · · ·︸︷︷︸
cax1 · · ·axt)
Ψ( · · ·︸︷︷︸
cax1 · · · axt) = · · ·︸︷︷︸
a1ax1 · · ·axt
Ψ( · · ·︸︷︷︸
a1) = · · ·︸︷︷︸
Ψ( · · ·︸︷︷︸
c) = · · ·︸︷︷︸
Here Ψ changes a1 into c and c into a1 which appears right most position of each elements.
Since α not being 0, each element of M̂ contains a1 or c. From that the map Ψ is well-defined.
Therefore obviously Ψ−1 = Ψ and Ψ(M̂ev) = (M̂odd) Ψ(M̂odd) = M̂ev. Hence Ψ is a bijection and
♯M̂ev = ♯M̂odd. Hence we obtain the derived result. ✷
Lemma 10
For m,n, p ∈ N ∪ {0}, 0 ≤ m < n, 0 ≤ p < m, we have
p1 · · · pm ∈ {a
m−pcp} =⇒ MobPat{p1 · · · pm, c
n} = 0.
PROOF
We have
MobPat{p1 · · · pm, c
n} = −Σ
i=0 M((m, p), (n, p+ i)) · (−1)
n−p−i
i=0 (−1)
n−p−iΣn−mk=0 M((m− p, 0), (n− p− k, i)) · s
p+ k − 1
= (−1)n−p−1Σ
i=0 Σ
k=0 M((m− p, 0), (n− p− k, i))(−1)
i · sk(
p+ k − 1
= (−1)n−p−1Σn−mk=0 {Σ
n−p−k
i=0 M((m− p, 0), (n− p− k, i))(−1)
︸ ︷︷ ︸
} · sk(
p+ k − 1
= 0. Hence we obtain the derived result. ✷
Lemma 11
For m,n ∈ N, 1 ≤ m ≤ n, we put
P := {am1 → τ1 → · · · → c
m → θk11 · · · → c
k1 → θk21 · · · → c
k2 → · · · → ckr | m < k1 < · · · <
kr = n, |τi| = m, |θ
j | = ki }, and p1 · · · pm ∈ {a
Then we have µ(p1 · · · pm, c
n) = µ(am1 , c
n) = Mob(P ) = (−1)mµ(cm, cn).
PROOF
We have
µ(p1 · · · pm, c
n) = Mob({(p1 · · · pm → θ1 → · · · θr → c
n) | p1 · · · pm < θ1 < · · · θr < c
Now we put
X l1l2
:= {(p1 · · · pm → · · · θr → τ1 → · · · τs → c
l2 → σk11 → · · · → c
k1 → σk21 → · · · → c
k2 · · · →
cn) | |θr| = l1, θr 6= c
l1 , |τ1|, · · · |τs| = l2, |σ
j | = ki}.
Then we have
{(p1 · · · pm → θ1 → · · · θr → c
n) | p1 · · · pm < θ1 < · · · θr < c
m≤l1<l2≤n
X l1l2⊎
{(p1 · · · pm → · · · → c
m → σk11 → · · · → c
k1 → σk21 → · · · → c
k2 · · · → cn) | |σkij | = ki}.
Now for m ≤ l1 < l2 ≤ n,
Mob(X l1l2 ) =
Σq1···ql1 6=c
l1Mob({(p1 · · · pm → θ1 → · · · → θr → q1 · · · ql1) | p1 · · · pm < θ1 < · · · < q1 · · · ql1})·
Mob({(q1 · · · ql1 → τ1 → · · · → τs → c
l2) | |τi| = l2})·
Mob({(cl2 → σk11 → · · · → c
k1 → σk21 → · · · → c
k2 → · · · → cn) | |σkij | = ki}).
By Lemma 10 , we have
Mob({(q1 · · · ql1 → τ1 → · · · → τs → c
l2) | |τi| = l2}) = 0.
Hence we have Mob(X l1l2 ) = 0
Therefore we have µ(p1, · · · pm, c
Mob({(p1, · · · pm → · · · → c
m → σk11 → · · · → c
k1 → σk21 → · · · → c
k2 · · · → cn) | |σkij | = ki})
= Mob({(p1 · · · pm → θ1 · · · θr → c
m) | |θi| = m})·
Mob({(cm → σk11 → · · · → c
k1 → σk21 → · · · → c
k2 · · · → cn) | |σkij | = ki})
= (−1)m · µ(cm, cn).
Hence we obtain the derived result. ✷
Lemma 12
We have
µ(φ, c) = s− 1, µ(ai, c) = −1, µ(c, c
2) = 2s− 1, µ(ai, c
2) = −2s+ 1.
Now we put T (k, n) := MobPat{ck → θ1 → · · · → θr = c
n | |θi| = n}, T (n, k) = 0 for
0 ≤ k < n, and T (n, n) := −1 for 0 ≤ n .
Lemma 13
For 0 ≤ k ≤ n, we have
T (k, n) = −Σni=k(
) · sn−i · (−1)n−i.
PROOF
In case when k = n, it is trivial.
In case when 0 ≤ k < n, we have
T (k, n) = −Σni=kM((k, k), (n, i)) · (−1)
Now we have
M((k, k), (n, i)) = (
) · sn−i.
Hence we obtain the derived result.
Lemma 14
For 0 ≤ k ≤ l, we have T (k, l)− T (k − 1, l − 1) = −sT (k, l− 1).
PROOF
In case when k = l, it is trivial.
It is enough to show the case of k < l. Then we have
T (k, l)− T (k − 1, l − 1)
= −Σli=k(
) · sl−i · (−1)l−i +Σl−1i=k−1(
l − 1
) · sl−i−1 · (−1)l−i−1
= −Σl−1i=k(
) · sl−i · (−1)l−i − 1 + Σl−2i=k−1(
l − 1
) · sl−i−1 · (−1)l−i−1 + 1
= −Σl−1i=k{(
l − 1
)} · sl−i · (−1)l−i
= −s · Σl−1i=k(
) · sl−i−1 · (−1)l−i−1 = −sT (k, l− 1).
Hence we obtain the derived result. ✷
Lemma 15
Suppose 1 ≤ m < n, then we have
µ(cm, cn) = Σn−1k=mµ(c
m, ck)T (k, n).
PROOF
We heve µ(cm, cn)
= Mob({(cm → σm11 → · · · → c
m1 → σm21 → · · · → c
m2 → · · · cmr−1 → σ
1 · · · → c
cn) | |σmij | = mi}).
In case of 2 ≤ r, we put k = mr−1 and in case of r = 1, we put k = m. By Lemma 1, we obtain
the derived result.
Lemma 16
For 1 ≤ m < n, we have
µ(cm, cn)− µ(cm−1, cn−1) = sµ(cm, cn−1).
PROOF
In case of n = 2, we see µ(c, c2) − µ(φ, c) = (2s − 1) − (s − 1) = s, µ(c, c) = 1 . Hence this
lemma holds for n = 2. holds.
We prove by induction on n. Suppose that the relation holds for n− 1. Then we have
µ(cm, cn)− µ(cm−1, cn−1)
= Σn−1k=mµ(c
m, ck)T (k, n)− Σn−2k=m−1µ(c
m−1, ck)T (k, n− 1)
= Σn−1k=mµ(c
m, ck)T (k, n)− µ(cm, ck)T (k − 1, n− 1) + µ(cm, ck)T (k − 1, n− 1)
−µ(cm−1, ck−1)T (k − 1, n− 1)
= Σn−1k=mµ(c
m, ck)(−s)T (k, n− 1) + sµ(cm, ck−1)T (k − 1, n− 1)
= Σn−1k=mµ(c
m, ck)(−s)T (k, n− 1) + sΣn−1k=m+1µ(c
m, ck−1)T (k − 1, n− 1)
= (−s)µ(cm, cn−1)T (n− 1, n− 1)
= sµ(cm, cn−1).
Lemma 17
For 1 ≤ m ≤ n, we have
µ(am1 , c
n) + µ(am−11 , c
n−1) = sµ(am1 , c
n−1).
PROOF
In case of m < n, we have by Lemma 16. In case of m = n, from am1 � c
n−1, µ(am1 , c
n) = (−1)m
and µ(am−11 , c
n−1) = (−1)m−1, therefore the right hand side = 0. Hence we obtan the derived
result.
Lemma 18
For 1 ≤ m ≤ n, µ(am1 , c
n) is coefficient of Xn−m in T sm+n(X).
PROOF
If m + n = 2, i.e m = n = 1, we have T s2 (X) = s(s − 1)X
2 − 1, µ(a1, c) = −1. Hence this
lemma holds.
If 3 ≤ m+ n, by the relation T sk+2(X) + T
k (X) = sX · T
k+1(X) and Lemma 17 we obtain the
derived result.
Lemma 19
For n ∈ N we heve µ(φ, cn) = sn−1(s− 1).
PROOF
If n = 1, our claim follows from Lemma 12. We show by induction. We suppose that µ(φ, ck) =
sk−1(s− 1) when k ≤ n− 1
Now we have
µ(φ, cn)
= Σnk=1MobPat(φ, c
k)µ(ck, cn) (by Lemma10)
= Σnk=1
−Σki=0s
iM((0, 0), (k, k − i))(−1)i
µ(ck, cn)
= Σnk=1
−Σki=0s
)(−1)i
µ(ck, cn)
= Σnk=1 − (1 − s)
kµ(ck, cn).
And we have
µ(φ, cn)− sµ(φ, cn−1)
= Σnk=1 − (1 − s)
kµ(ck, cn)− Σn−1k=1 − (1− s)
kµ(ck, cn−1)
= Σn−1k=1 − (1− s)
µ(ck, cn)− sµ(ck, cn−1)
− (1− s)n
= Σn−1k=1 − (1− s)
kµ(ck−1, cn−1)− (1− s)n (by Lemma15)
= −(1− s)
Σn−1k=1 (1 − s)
k−1µ(ck−1, cn−1)
− (1 − s)n
= −(1− s)
Σn−2k=0 (1 − s)
kµ(ck, cn−1)
− (1− s)n
= −(1− s)
−µ(φ, cn−1)− µ(φ, cn−1)− (1− s)n−1)
− (1− s)n
So we have µ(φ, cn) = sµ(φ, cn−1). Hence we obtain the derived result. ✷
Lemma 20
For n ∈ N, µ(φ, cn) is the coefficient of Xn in T sn(X).
Therefore we have the following theorem.
Theorem 2
For 0 ≤ m ≤ n , µ(am1 , c
n) is the coefficient of Xn−m in T sm+n(x).
Corollary 1
Conjecture 1 is true.
ACKNOWLEDGEMENT
The author wishes to thank Professor Jun Morita for his valuable advice. And he is also grateful
to Professor Daisuke Sagaki, Sho Matsumoto for their helpful comments.
REFERENCE
[1] Björner,A. Shellable andCohen-Macaulaypartially ordered sets. Trans. Amer. Math. Soc. 260,
1 (1980), 159-183.
[2] Björner,A. TheMobius function of subwordorder. In Invariant theory and tableaux (Min-
neapolis, MN, 1988), vol. 19 of IMA Vol. Math. Appl. Springer, New York, 1990
[3] Björner,A. TheMobius function of factor order. Theoret. Comput. Sci. 117, 1-2 (1993) 91-98
[4] Björner,A.Reutenauer, C.Rationality of theMobius function of subwordorder. Theoret. Com-
put. Sci. 98, 1 (1992), 53-63.
[5] Björner,A. Sagan,B, E.Rationality of theMobius function of the compositionposet. Theoret.
Comput. Sci. 359 (2006), no.1-3, 282-298.
[6] Björner,A. Stanley, R, P.An analogue for compositions. arXiv:math.CO/0508043.
[7] Chow,T.West, J. Forbidden subsequences andChebyshev polynomials. Discrete Math. 204, 1-
3,(1990),119-128.
[8] Ehrenbourg,R.Readdy,M.TheChebyshev transformsof the first and secondkinds.
arXiv:math.CO/0412124.
[9] Hetyei,G.Chebyshev posets.DiscreteComput.Geom.32,4 (2004), 493-520.
[10] Sagan,B, E.Vatter, V.TheMobius function of the composition poset. J. Algebraic, Combin. 24
(2006), no.2,117-136.
[11] Stanley,R, P. Flag f -vectorsand the cd-index. Math. Z. 216,(1994),483-499
[12] Stanley,R, P. Enumerative combinatorics. Vol. 1, vol. 49 of Cambridge Studies in Advanced
Mathematics. Cambridge University Press Cambridge, 1997.
http://arxiv.org/abs/math/0508043
http://arxiv.org/abs/math/0412124
	INTRODUCTION
	PRELIMINERIES
	MAIN RESULTS
ABSTRACT
  In this paper we give a generalization of Chebyshev polynomials and using
this we describe the M\"obius function of the generalized subword order from a
poset {a1,...as,c |ai<c}, which contains an affirmative answer for the
conjecture by Bj\"orner, Sagan, Vatter.[5,10]

<|endoftext|><|startoftext|>
Introduction 
Complex systems may also [1] emerge from a large number of interdependent and 
interacting elements. Networks have proven to be effective models of natural or man-
made complex systems, where the elements are represented by the nodes and their 
interactions by the links. Typical well known examples include communication and 
transportation networks, social networks, biological networks [2, 3, 4, 5].  
Although the statistical analysis of the underlying topological structure has been very 
fruitful [2, 3, 4, 5] it was limited due to the fact that in real networks the links may 
have different capacities or intensities or flows of information or strengths. For 
example, weighted links can be used for the Internet, to represent the amount of data 
exchanged between two hosts in the network. For scientific collaboration networks 
the weight depends on the number of coauthored papers between two authors. For 
airport networks, it’s either the number of available seats on direct flight connections 
between airports i and j or the actual number of passengers that travel from airport i 
to j. For neural networks the weight is the number of junctions between neurons and 
for transportation networks it’s the Euclidean distance between two destinations.  
The diversification of the links is described in terms of weights on the links. Therefore, 
the statistical analysis has to be extended from graphs to weighted complex 
networks. If all links are of equal weight, the statistical parameters used for 
unweighted graphs are sufficient for the statistical characterization of the network. 
Therefore, the statistical parameters of the weighted graphs should reduce to the 
corresponding parameters of the conventional graphs if all weights are put equal to 
unity.  
Complex graphs are characterized by three main statistical parameters, namely the 
degree distribution, the average path length and the clustering coefficient. We shall 
briefly mention the definitions for clarity and for a better understanding of the 
proposed extensions of these parameters for weighted graphs. 
The structure of a network with N nodes is represented by a NxN binary matrix 
ijA {a }= , known as adjacency matrix, whose element ija  equals 1, when there is a 
link joining node i to node j and 0 otherwise (i, j=1,2,…,N). 
In the case of undirected networks with no loops, the adjacency matrix is symmetric 
( ij jia a= ) and all elements of the main diagonal equal 0 ( iia 0= ). 
The degree ik  of a node i is defined as the number of its neighbours, i.e. the number 
of links incident to node i: 
j (i)
= ∑      (1) 
where ija  the elements of the adjacency matrix A and (i)Π  the neighborhood of 
node i. 
The degree distribution is the probability that some node has k connections to other 
nodes and it is usually described by a power law P(k) ~ k−γ , with 2 3≤ γ ≤ . 
The characteristic path length of a network is defined as the average of the 
shortest path lengths between any two nodes: 
N(N 1)
− ∑     (2) 
where ijd  is the shortest path length between i and j, defined as the minimum 
number of links traversed to get from node i to node j. 
In many real networks it is found that the existence of a link between nodes i and j 
and between nodes i and k enhances the probability that node j will also be 
connected to node k. This tendency of the neighbours of any node i to connect to 
each other, is called clustering and is quantified by the clustering coefficient iC , 
which is the fraction of triangles in which node i participates, to the maximum 
possible number of such triangles: 
( ) ( )
ij jk ki
i i i i
a a a
k k 1 k k 1
, ik 0,1≠     (3) 
where i ij jk ki
n a a a
= ∑  is the actual number of triangles in which node i 
participates i.e. the actual number of links between the neighbours of node i, and 
( )i ik k 1 / 2−  is the maximum possible number of links, when the subgraph of 
neighbours of node i is completely connected. 
The clustering coefficient iC  equals 1, if node i is the center of a fully interconnected 
cluster and equals 0, if the neighbours of node i are not connected to each other. 
Ιn order to characterize the network as a whole, we usually consider the average 
clustering coefficient C over all the nodes. We may also consider the average 
clustering coefficient C(k) over the node degree k. 
Studies of real complex networks have shown that their connection topology is 
neither completely random nor completely regular, but lies between these extreme 
cases. Many real networks share features of both extreme cases. For example, the 
short average path length, typical of random networks, comes along with large 
clustering coefficient, typical of regular lattices. The coexistence of these attributes 
defines a distinct class of networks, interpolating between regular lattices and 
random networks, known today as small world networks [3, 4, 5, 6]. Another class of 
networks emerges when the degree distribution is a power law (scale free) 
distribution, which signifies the presence of a non negligible number of highly 
connected nodes, known as hubs. These nodes, with very large degree k compared 
to the average degree <k>, are critical for the network’s robustness and vulnerability. 
These networks are known today as scale free networks [2, 3, 4, 7]. 
The purpose of this paper is to assess the statistical characterization of weighted 
networks in terms of proper generalizations of the relevant parameters, namely 
average path length, degree distribution and clustering coefficient. After reviewing 
the definitions of the weighted average path length, weighted degree distribution and 
weighted clustering coefficient in section 2, we compare them in section 3. Although 
the degree distribution and the average path length admit straightforward 
generalizations, for the clustering coefficient several different definitions have been 
proposed. In order to elucidate the significance of different definitions of the weighted 
clustering coefficient, we studied their dependence on the weights of the connections 
in section 4, where we introduce the relative perturbation norm as an index to assess 
the weight distribution. This study revealed new interesting statistical regularities in 
terms of the relative perturbation norm useful for the statistical characterization of 
weighted graphs. 
2. Statistical parameters of weighted networks 
The weights of the links between nodes are described by a NxN matrix ijW {w }= .  
The weight ijw  is 0 if the nodes i and j are not linked. We will consider the case of 
symmetric positive weights ( ij jiw w 0= ≥ ), with no loops ( iiw 0= ). 
In order to compare different networks or different kinds of weights, we usually 
normalize the weights in the interval [0,1], by dividing all weights by the maximum 
weight. Τhe normalized weights are ij
max(w )
The statistical parameters for weighted networks are defined as follows. 
The node degree i ij
j (i)
= ∑ , which is the number of links attached to node i, is 
extended directly to the strength or weighted degree, which is the sum of the 
weights of all links attached to node i: 
j (i)
= ∑      (4) 
The strength of a node takes into account both the connectivity as well as the 
weights of the links. 
The degree distribution is also extended for the weighted networks to the strength 
distribution P(s), which is the probability that some node’s strength equals s. 
Recent studies indicate power law aP(s) ~ s−  [8, 9, 10]. 
There are two different generalizations of the characteristic path length in the 
literature, applicable to transportation and communication networks. In the case of 
transportation networks the weighted shortest path length ijd  between i and j, is 
defined as the smallest sum of the weights of the links throughout all possible paths 
from node i to node j [11, 12]: 
ij ij
d min w= ∑      (5) 
The weight describes physical distances and/or cost usually involved in 
transportation networks. The capacity/intensity/strength/efficiency of the connection 
is inversely proportional to the weight.  
However, this definition is not suitable for communication networks, where the 
efficiency of the communication channel between two nodes is proportional to the 
weight. The shortest path length in case of communication networks is defined as: 
i, j ij
d min
= ∑      (6) 
To our knowledge, the latter definition has been used by Latora and Marchiori [13, 
14] for the definition of the “efficiency” of the network, as inversely proportional to the 
shortest path length ijd . 
The weighted characteristic path length for both cases is the average of all shortest 
path lengths and it is calculated by formula (2). 
We found in the literature six proposals for the definitions of the weighted clustering 
coefficient, which we shall review. 
 Zhang et. al. (2005) [15] definition: 
ij jk ki
w,i 2
ij ij
w w w
    (7) 
The weights in this definition are normalized. The idea of the generalization is the 
substitution of the elements of the adjacency matrix by the weights in the nominator 
of formula (3), as for the denominator the upper limit of the nominator is obtained in 
order to normalize the coefficient between 0 and 1. The definition originated from 
gene co-expression networks.  
As shown by Kalna et. al. (2006) [16] an alternative formula that may apply for this 
definition is  
ij jk ki
ij ik
j k j
w w w
 Lopez-Fernandez et. al. (2004) [17] definition: 
j,k i i i
k (k 1)∈Π
∑     (8) 
The weights in this definition are not normalized. The idea of the generalization is the 
substitution of the number of links that exist between the neighbours of node i in 
formula (3) by the weight of the link between the neighbours j and k. The definition 
originated from an affiliation network for committers (or modules) of free, open 
source software projects. 
 Onnela et. al. (2005) [18] definition:  
ij jk ki
w w w
k (k 1)
    (9) 
The weights in this definition are normalized. The quantity ( )
ij jk kiI(g) w w w=  is 
called “intensity” of the triangle ijk. The concept for this generalization is to substitute 
the total number of the triangles in which node i participates, by the intensity of the 
triangle, which is geometric mean of the links’ weights.  
 Barrat et. al. (2004) [8] definition: 
ij ikB
w,i ij jk ki
j,ki i
C a a a
s (k 1) 2
− ∑     (10) 
The weights in this definition are not normalized. The idea of the generalization is the 
substitution of the elements of the adjacency matrix in formula (3), by the average of 
the weights of the links between node i and its neighbours j and k with respect to 
normalization factor i is (k 1)−  which ensures that 
w,i0 C 1≤ ≤ . This definition was 
used for airport and scientific collaboration networks. 
 Serrano et. al. (2006) [19] definition 
ij ik kj
w,i 2
w w a
s (1 Y )
    (11) 
where 
= ⎜ ⎟
∑  has been named “disparity”. 
The weights in this definition are not normalized. This formula is used for the 
generalization of the average clustering coefficient with degree k, which has a 
probabilistic interpretation just as the unweighted clustering coefficient. 
 Holme et. al. [20] definition:  
ij jk ki
ij ij ik
j k j
w w w
max(w ) w w
     (12) 
The only difference between formulas (7) and (12) is that (12) is divided by ijmax(w ) . 
We shall not discuss this definition in the comparison because the essence of the 
comparison is already addressed by definition (7). 
 Li et. al. (2005) [21] definition of the weighted clustering coefficient, is another 
version of the Lopez-Fernandez proposal (8). 
3. The relation between the different weighted clustering coefficients 
1. All definitions reduce to the clustering coefficient (3), when the weights ijw  are 
replaced by the adjacency matrix elements.  
2. All weighted clustering coefficients reduce to 0 when there are no links between 
the neighbours of node i, that is when jk jka w 0= = . 
3. In the other extreme, all weighted clustering coefficient take the value 1 when all 
neighbours of node i are connected to each other. Formulas (7) and (8) take the 
value 1 if the weights between the neighbours of the node i are 1, independently 
of the weights of the other links. Formula (9) takes the value 1, if and only if all 
the weights are equal to 1. Formulas (10) and (11) take the value 1 for all fully 
connected graphs, independently of all the weights.  
These calculations are presented in Appendix A. 
4. We calculated the values of the weighted clustering coefficients of node i 
participating in a fully connected triangle. Formulas (7) and (8) take the value 
jkw  of the weight of the link between neighbours j and k, of node i. Formula (9) 
becomes equal to the intensity of the triangle O 1/3w,i ij jk kiC (w w w )=  for all nodes of 
the triangle. Formulas (10) and (11) take the value 1 for all fully connected 
graphs, independently of all weights.  
These calculations are presented in Appendix B. 
4. The dependence of the weighted clustering coefficients on the weights 
In order to understand the meaning of the different proposals-definitions (7), (8), (9) 
(10) and (11) of the weighted clustering coefficient we shall examine their 
dependence on the weights, without alteration of the topology of the graph. We 
simply examine the values of these definitions for different distributions of weights, 
substituting the nonzero elements of the adjacency matrix A by weights normalized 
between 0 and 1. 
A way to distinguish and compare different weight distributions over the same graph, 
is in terms of the relative perturbation norm 
, which gives the percentage of 
the perturbation of the adjacency matrix introduced by the weights. For simplicity, we 
considered the L2 norm. 
We shall examine now the dependence of the weighted clustering coefficient with 
respect to the relative perturbation norms for several different weight distributions as 
well as for different graphs. We have examined many networks from 20 up to 300 
nodes with different topologies that were generated by the networks software PAJEK 
[22]. The weights examined are randomly generated numbers following uniform or 
normal distributions with several parameter values, so that the percentages of the 
perturbations scale from 0-90% increasing by 10% at each perturbation. All 
simulations gave rise to the same results, figs. 3 and 4, representing the typical 
trends of random and scale free networks, figs.1 and 2. It is remarkable to 
emphasize again that in all cases the same trends appear demonstrating a clear 
dependence on the relative perturbation norm only and no dependence on the values 
of weights on specific links.  
Figure 1. The random network (Erdos-Renyi model) examined consists of 100 nodes and was 
generated by the networks software PAJEK [22]. The clustering coefficient for the unweighted 
network is 0.3615. 
Figure 2. The scale-free network (Barabasi-Albert extended model) examined consists of 100 
nodes and was generated by the networks software PAJEK [22]. The clustering coefficient for 
the unweighted network is 0.6561. 
Figure 3. The values of all five weighted clustering coefficients Zhang et. al. Zw,iC ( ) , Lopez-
Fernandez et. al. Lw,iC ( ) , Onnela et. al. 
w,iC ( ) , Barrat et. al. 
w,iC ( )  and  Serrano et.al. 
w,iC ( ) , in terms of the relative perturbation norm for the random network (Erdos-Renyi 
model) with 100 nodes.  
(A). The weights are randomly generated numbers following the uniform distribution.  
(B). The weights are randomly generated numbers following the normal distribution. 
Figure 4. The values of all five weighted clustering coefficients Zhang et. al. Zw,iC ( ) , Lopez-
Fernandez et. al. Lw,iC ( ) , Onnela et. al. 
w,iC ( ) , Barrat et. al. 
w,iC ( )  and  Serrano et.al. 
w,iC ( ) , in terms of the relative perturbation norm for the scale free network (Barabasi-Albert 
extended model) with 100 nodes.  
(A). The weights are randomly generated numbers following the uniform distribution.  
(B). The weights are randomly generated numbers following the normal distribution. 
We observe in all cases a clear trend dependence of the values of all five weighted 
clustering coefficients, in terms of the relative perturbation norm of the weighted 
network. This demonstrates clearly first of all that the relative perturbation norm is a 
reliable index of the weight distribution. The Zhang et. al. (7), Lopez-Fernandez et. al. 
(8) and Onnela et. al. (9), weighted clustering coefficients follow the same trend, 
decaying smoothly as the relative perturbation norm increases. More specifically the 
trends of Zhang et. al. (7) and Lopez-Fernandez et. al. (8) almost coincide, while the 
trend of Onnela et. al. (10) varies slightly from the other two.  
The weighted clustering coefficients of Barrat et. al. (10) and Serrano et.al. (11) do 
not change (variations appear after the first two decimal digits), regardless of the size 
of the network or the distribution of the weights. As mentioned in section 3, these 
coefficients are independent of the weights when the graph is completely connected. 
We notice here however, that weighted clustering coefficients (10) and (11) are 
independent of the weights for any graph.  
5. Concluding remarks  
1. The clear trend dependence of the values of all five weighted clustering 
coefficients in terms of the relative perturbation norm shows that the proposed 
relative perturbation norm is a reliable index of the weight distribution. The 
meaning of the decaying trend of weighted clustering coefficients Zhang et. al. (7), 
Lopez-Fernandez et. al. (8) and Onnela et. al. (9), with respect to the increase of 
the relative perturbation norm is quite natural. The clustering decreases almost 
linearly as the weights “decrease”.  
2. We presented in Appendices A and B the calculations demonstrating that all 
definitions reduce to the clustering coefficient (3), when the weights ijw  are 
replaced by the adjacency matrix elements. The values of the weighted clustering 
coefficients of node i participating in a fully connected triangle are presented for 
completeness because we did not found them in the literature. 
3. The results presented in figures 3 and 4 were necessary to obtain in order to have 
a minimal understanding of the statistical analysis of weighted networks, in order 
to proceed to applications on real networks.  
Acknowledgements 
We would like to thank Prof. Kandylis D. from the Medical School of Aristotle 
University of Thessaloniki who showed to us the significance of weighted networks in 
cognitive processes. We also thank Drs. Serrano M. A., Boguñá M. and Pastor-
Satorras R. who pointed out their work to us. 
APPENDIX A. Calculations on the weighted clustering coefficient 
The definitions (7)-(11) reduce to the clustering coefficient (3), when the weights ijw  
are replaced by the adjacency matrix elements. 
1. Zhang et. al. (2005) 
ij jk ki
w,i 2
ij ij
w w w
The proof is presented by the authors. 
For example, for a fully connected network with four nodes  
4 4 4
21 31 411j jk k1 1j j2 j3 j4
j 1 k 1 j 1
2 2 2 2 24 4
12 13 14 12 13 142
1j 1j
j 1 j 1
12 23 31 12 24 41 13 32 21 13 34 41 14 42 21 14 43 31
12 13 13 14 12 14
w w w w w w w w w w
w w w w w w
w w w w w w w w w w w w w w w w w w
      
w w w w w w
≠ ≠ ≠
⎜ ⎟⎜ ⎟
= = =
+ + − − −
12 23 31 13 34 41 12 24 41
12 13 13 14 12 14
w w w w w w w w w
w w w w w w
w,1C 1=  when 23 34 24w w w 1= = =  
2. Lopez-Fernandez et. al. (2004) 
j,k i i i
k (k 1)∈Π
this formula can be expressed as 
jk ij ik
w a a
k (k 1)
It is obvious that the formula reduces to the unweighted (3) when jkw  are substituted 
by jka . 
3. Onnela et. al. (2005) 
ij jk ki
w w w
k (k 1)
reduces to the unweighted definition (3) when jkw  are substituted by jka . 
ij ij(a ) a= , hence ( ) ( )
ij jk ki ij jk ki ij jk kiw w w a a a a a a= =  
ij jk ki ij jk ki
j k j kO
i i i i
a a a a a a
k (k 1) k (k 1)
∑∑ ∑∑
4. Barrat et. al. (2004) 
ij ikB
w,i ij jk ki
j,ki i
C a a a
s (k 1) 2
− ∑  
reduces to the unweighted definition (3) when ijw  and ikw  are substituted by the 
adjacency matrix elements. 
 i ij ij i
j (i) j (i)
s w a k
∈Π ∈Π
= = =∑ ∑  and  2ij ija a= . 
ij ik ij ij jk ki ik ij jk kiB
w,i ij jk ki
j,k j,ki i i i
ij jk ki ij jk ki ij jk ki ij jk ki
j,k j,ki i i i
ij jk ki
j,ki i
a a a a a a a a a a1 1
C a a a
k (k 1) 2 k (k 1) 2
a a a a a a a a a a a a1 1
k (k 1) 2 k (k 1) 2
a a a
k (k 1)
= = =
= = =
5. Serrano et. al. (2006) formula can be expressed as 
ij ik kj ij ik kj ij ik kj ij ik kj
j k j k j k j kS
w,i 2 2 22
i i i ij2 2ij2
ji ij2i
jij i
ij ik kj
ij ij
w w a w w a w w a w w a
s (1 Y ) s w1w s 1 ws 1 ss
w w a
= = = = =
− −⎛ ⎞ ⎛ ⎞⎛ ⎞ −⎜ ⎟ ⎜ ⎟− ⎜ ⎟⎜ ⎟ ⎝ ⎠⎝ ⎠⎝ ⎠
∑∑ ∑∑ ∑∑ ∑∑
It is obvious that the formula reduces to the unweighted (3) when jkw  are substituted 
by jka . 
APPENDIX B. The values of the weighted clustering coefficients of some node i 
participating in a fully connected triangle. 
We calculate the weighted clustering coefficient of node 1. 
1. Zhang et. al. (2005) 
3 3 3
1j jk k1 1j j2 21 j3 31
j 1 k 1 j 1Z
w,1 2 2 2 23 3
12 13 12 132
1j 1j
j 1 j 1
12 23 31 13 32 21 12 23 31
12 13 12 13
w w w w w w w w
w w w w
w w w w w w 2w w w
         w
2w w 2w w
≠ ≠ ≠
= = =
+ − −⎛ ⎞
= = =
∑ ∑  
2. Lopez-Fernandez et. al. (2004) 
3 3 3
jk 2k 3k
k 1 j 1L 23 32k 1
w,1 23
w w w
k (k 1) 2(2 1) 2
≠ ≠ ≠
= = = =
3. Onnela et. al. (2005) 
( ) ( ) ( )
( ) ( ) ( )
( ) ( ) ( ) ( )
( ) ( ) ( )
1 1 1
3 3 3
1j jk k1 1j jk k1
j k j kO
1 1 1
3 3 3
1j j2 21 j3 31
1 1 1 1
3 3 3 3
12 23 31 13 32 21
1 1 1
3 3 3
12 23 31 13 32 21 12 23 31
w w w w w w
k (k 1) 2(2 1)
         w w w w w
w w w w w w
w w w w w w w w w 1
= = =
= + =⎜ ⎟
= + =⎢ ⎥⎣ ⎦
= + = ≤⎢ ⎥⎣ ⎦
∑∑ ∑ ∑
O O O 3
w,1 w,2 w,3 12 23 31C  C C w w w 1= = = ≤  
4. Barrat et. al. (2004) 
ij ikB
w,i ij jk ki
j,ki i
C a a a
s (k 1) 2
− ∑  
 Degree of node 1: 1 1j
j (1)
k a 2
= =∑  
 Strength of node 1: 1 1j 12 13
j (1)
s w w w
= = +∑  
1j 1kB
w,1 1j jk k1
j,k1 1
1j 12 1j 13
1j j2 21 1j j3 31
13 12 12 13
13 32 21 12 23 31
12 13 12 23 31 12 23 31
12 13
C a a a
s (k 1) 2
w w w w1
a a a a a a
s (2 1) 2 2
w w w w1
a a a a a a
s 2 2
w w a a a a a a 1
+ +⎛ ⎞
= + =⎜ ⎟− ⎝ ⎠
+ +⎛ ⎞
= + =⎜ ⎟
= + = =
since 12 23 31a a a 1= = =  
We also prove that Barrat et. al. definition for the weighted clustering coefficient is 
independent of all weights for all fully connected networks. 
ij ih ij ihB
w,i ij jh hi ij jh hi
j,h j hi i i i
ik ihi1 ih i2 ih
i1 1h hi i2 2h hi ik k h hi
i1 i1 i2 i1
i1 11 1i i2 21 1i
w w w w1 1
C a a a a a a
s (k 1) 2 s (k 1) 2
w ww w w w1
a a a a a a ... a a a
s (k 1) 2 2 2
w w w w1
a a a a a a .
s (k 1) 2 2
= = =
+⎛ ⎞+ +
= + + + =⎜ ⎟
− ⎝ ⎠
= + +
i i i i
i i i i i i
ik i1
ik k 1 1i
ik i2i1 i2 i2 i2
i1 12 2i i2 22 2i ik k 2 2i
i1 ik i2 ik ik ik
i1 1k k i i2 2k k i ik k
.. a a a
w ww w w w
                a a a a a a ... a a a ...
2 2 2
w w w w w w
               a a a a a a ... a a
2 2 2
⎡ +⎛ ⎞
+ +⎢⎜ ⎟
+⎛ ⎞+ +
+ + + + + +⎜ ⎟
+ + +
+ + + +
i ik k i
For a fully connected network: ij ia 1, i, j 1, 2,..., k= ∀ =  and iia 0= , so 
ik i1B i2 i1
ik i2i1 i2
i1 ik i2 ik
w ww w1
C        0        ...
s (k 1) 2 2
w ww w
                                  0      ... ...
w w w w
                          ...       0  
⎡ +⎛ ⎞+
= + + + +⎢⎜ ⎟
− ⎝ ⎠⎣
+⎛ ⎞+
+ + + + + +⎜ ⎟
+ + + +
i1 i2 ik i1
i1 i2 ik i2
i1 i2 ik ik
     
w w ... w w1
             k 2
s (k 1) 2 2
w w ... w w
                              k 2 ...
w w ... w w
                              k 2
=⎥⎜ ⎟
⎡ + + +⎛ ⎞
= + − +⎢⎜ ⎟
− ⎝ ⎠⎣
+ + +⎛ ⎞
+ + − + +⎜ ⎟
+ + +
+ + −
i1 i2 ik i1 i2 ik
i1 i2 ik i1 i2 ik
i i i
w w ... w w w ... w1
        k   k 2
s (k 1) 2 2
w w ... w w w ... w1
        2k 2 1
s (k 1) 2 s
=⎥⎜ ⎟
+ + + + + +⎛ ⎞
= + − =⎜ ⎟
− ⎝ ⎠
+ + + + + +
= − = =
since 
ii i1 i2 ik
s w w ... w= + + +  
5. Serrano et. al. (2006) 
3 3 3
1j jk k1 1j j2 21 j3 31
j 1 k 1 j 1S
w,1 2 2 2 23 3
12 13 12 132
1j 1j
j 1 j 1
12 23 31 13 32 21 12 23 31
12 13 12 13
w a w w a w a w
w w w w
w a w w a w 2w a w
         1
2w w 2w w
≠ ≠ ≠
= = =
+ − −⎛ ⎞
= = =
References 
[1] Prigogine I., (1980). From being to becoming, Freeman, New York. 
[2] Barabasi, A.-L. (2002). Linked: The New Science of Networks, Perseus, 
Cambridge, MA. 
[3] Dorogovtsev S.N., Mendes J.F.F. (2002). Evolution of networks, Advances in 
Physics 51, 1079. 
[4] Newman, M.E.J. (2003). The structure and function of complex networks, SIAM 
Review 45, 167-256. 
[5] Watts, D.J. (2003). Six Degrees: The Science of a Connected Age, Norton, New 
York. 
[6] Watts, D.J., Strogatz, S.H. (1998). Collective dynamics of ‘small-world’ networks, 
Nature 393, 440–442. 
[7] Barabasi, A.-L., Albert, R. (1999). Emergence of scaling in random networks, 
Science 286, 509-512. 
[8] Barrat, A., Barthelemy, M., Pastor-Satorras, R., Vespignani, A. (2004). The 
architecture of complex weighted networks, Proceedings of the National 
Academy of Sciences (USA) 101, 3747- 3752. 
[9] Barthelemy, M., Barrat, A., Pastor-Satorras, R., Vespignani, A. (2005). 
Characterization and modeling of weighted networks, Physica A 346, 34–43.  
[10] Meiss, M.  Menczer, F.  Vespignani, A. (2005). On the Lack of Typical Behavior 
in the Global Web Traffic Network, Proceedings of the 14th International World 
Wide Web Conference, Japan, ACM 1595930469/05/0005. 
[11] Thadakamalla, H.P., Kumara, S.R.T., Albert R. (2006). Search in weighted 
complex networks, cond-mat/0511476v2. 
[12] Xu, X.-J., Wu, Z.-X., Wang Y.-H. (2005). Properties of weighted complex 
networks, cond-mat/0504294v3. 
[13] Latora, V., Marchiori, M. (2001). Efficient behavior of small-world networks, 
Physical Review Letters 87, 198701. 
[14] Latora, V., Marchiori, M. (2003). Economic small-world behavior in weighted 
networks, The European Physical Journal B 32, 249-263. 
[15] Zhang, B., Horvath, S. (2005). A general framework for weighted gene co-
expression network analysis, Statistical Applications in Genetics and Molecular 
Biology, 4. 
[16] Kalna, G., Higham, D.J. (2006). Clustering coefficients for weighted networks, 
University of Strathclyde Mathematics Research report 3. 
[17] Lopez-Fernandez, L., Robles, G., Gonzalez-Barahona, J.M. (2004). Applying 
social network analysis to the information in CVS repositories. In Proc. of the 1st 
Intl. Workshop on Mining Software Repositories (MSR2004), 101-105. 
[18] Onnela, J.-P., Saramäki, J., Kertész, J., Kaski, K. (2005). Intensity and 
coherence of motifs in weighted complex networks, Physical Review E, 71 (6), 
065103. 
[19] Serrano M. A., Boguñá M., Pastor-Satorras R. (2006). Correlations in weighted 
networks, Physical Review E 74, 055101 (R) 
[20] Holme P., Park S.M., Kim B.J., Edling C.R. (2004). Korean university life in a 
network perspective: Dynamics of a large affiliation network, cond-
mat/0411634v1. 
[21] Li, M., Fan, Y., Chen, J., Gao, L., Di, Z., Wu J. (2005). Weighted networks of 
scientific communication: the measurement and topological role of weight, 
Physica A 350, 643–656. 
[22] Batagelj V., Mrvar A.: Pajek. http://vlado.fmf.uni-lj.si/pub/networks/pajek/
ABSTRACT
  The purpose of this paper is to assess the statistical characterization of
weighted networks in terms of the generalization of the relevant parameters,
namely average path length, degree distribution and clustering coefficient.
Although the degree distribution and the average path length admit
straightforward generalizations, for the clustering coefficient several
different definitions have been proposed in the literature. We examined the
different definitions and identified the similarities and differences between
them. In order to elucidate the significance of different definitions of the
weighted clustering coefficient, we studied their dependence on the weights of
the connections. For this purpose, we introduce the relative perturbation norm
of the weights as an index to assess the weight distribution. This study
revealed new interesting statistical regularities in terms of the relative
perturbation norm useful for the statistical characterization of weighted
graphs.

<|endoftext|><|startoftext|>
Introduction
	Mathematical setting of the problem
	A priori estimates
	Determining modes
	Determining nodes
	Hausdorff and fractal dimension of attractor
	Conclusions
	Bibliography
ABSTRACT
  This paper is devoted to describe the finite-dimensionality of a
two-dimensional micropolar fluid flow with periodic boundary conditions. We
define the notions of determining modes and nodes and estimate the number of
them, we also estimate the dimension of the global attractor. Finally we
compare our results with analogous results for Navier-Stokes equation.

<|endoftext|><|startoftext|>
Introduction
Rotor-router walk is a deterministic analogue of random walk, first studied
by Priezzhev et al. [18] under the name “Eulerian walkers.” At each site
∗supported by an NSF Graduate Research Fellowship, and NSF grant DMS-0605166;
levine(at)math.berkeley.edu
†partially supported by NSF grant DMS-0605166; peres(at)stat.berkeley.edu
Keywords: abelian sandpile, asymptotic shape, discrete Laplacian, divisible sand-
pile, growth model, internal diffusion limited aggregation, rotor-router model
2000 Mathematics Subject Classifications: Primary 60G50; Secondary 35R35
Figure 1: Rotor-router aggregate of one million particles in Z2. Each site is
colored according to the direction of its rotor.
in the integer lattice Z2 is a rotor pointing north, south, east or west. A
particle starts at the origin; during each time step, the rotor at the particle’s
current location is rotated clockwise by 90 degrees, and the particle takes a
step in the direction of the newly rotated rotor. In rotor-router aggregation,
introduced by Jim Propp, we start with n particles at the origin; each par-
ticle in turn performs rotor-router walk until it reaches a site not occupied
by any other particles. Let An denote the resulting region of n occupied
sites. For example, if all rotors initially point north, the sequence will begin
A1 = {(0, 0)}, A2 = {(0, 0), (1, 0)}, A3 = {(0, 0), (1, 0), (0,−1)}. The region
A1,000,000 is pictured in Figure 1. In higher dimensions, the model can be
defined analogously by repeatedly cycling the rotors through an ordering of
the 2d cardinal directions in Zd.
Jim Propp observed from simulations in two dimensions that the regions
An are extraordinarily close to circular, and asked why this was so [7, 19].
Despite the impressive empirical evidence for circularity, the best result
known until now [14] says only that if An is rescaled to have unit volume,
the volume of the symmetric difference of An with a ball of unit volume tends
to zero as a power of n, as n ↑ ∞. The main outline of the argument is
summarized in [15]. Fey and Redig [5] also show that An contains a diamond.
In particular, these results do not rule out the possibility of “holes” in An
far from the boundary or of long tendrils extending far beyond the boundary
of the ball, provided the volume of these features is negligible compared to
Our main result is the following, which rules out the possibility of holes
far from the boundary or of long tendrils in the rotor-router shape. For
r ≥ 0 let
Br = {x ∈ Zd : |x| < r}.
Theorem 1.1. Let An be the region formed by rotor-router aggregation in
Zd starting from n particles at the origin and any initial rotor state. There
exist constants c, c′ depending only on d, such that
Br−c log r ⊂ An ⊂ Br(1+c′r−1/d log r)
where r = (n/ωd)1/d, and ωd is the volume of the unit ball in Rd.
We remark that the same result holds when the rotors evolve according
to stacks of bounded discrepancy; see the remark following Lemma 5.1.
Internal diffusion limited aggregation (“internal DLA”) is an analogous
growth model defined using random walks instead of rotor-router walks.
Starting with n particles at the origin, each particle in turn performs simple
random walk until it reaches an unoccupied site. Lawler, Bramson and
Griffeath [10] showed that for internal DLA in Zd, the occupied region An,
rescaled by a factor of n1/d, converges with probability one to a Euclidean
ball in Rd as n→∞. Lawler [11] estimated the rate of convergence. By way
of comparison with Theorem 1.1, if In is the internal DLA region formed
from n particles started at the origin, the best known bounds [11] are (up
to logarithmic factors)
Br−r1/3 ⊂ In ⊂ Br+r1/3
for all sufficiently large n, with probability one.
We also study another model which is slightly more difficult to define,
but much easier to analyze. In the divisible sandpile, each site x ∈ Zd
starts with a quantity of “mass” ν0(x) ∈ R≥0. A site topples by keeping up
Figure 2: Classical abelian sandpile aggregate of one million particles in Z2.
Colors represent the number of grains at each site.
to mass 1 for itself, and distributing the excess (if any) equally among its
neighbors. Thus if x has mass m > 1, then each of the 2d neighboring sites
gains mass (m−1)/2d when we topple x, and x is left with mass 1; if m ≤ 1,
then no mass moves when we topple x.
Note that individual topplings do not commute; however, the divisible
sandpile is “abelian” in the following sense.
Proposition 1.2. Let x1, x2, . . . ∈ Zd be a sequence with the property that
for any x ∈ Zd there are infinitely many terms xk = x. Let
uk(x) = total mass emitted by x after toppling x1, . . . , xk;
νk(x) = amount of mass present at x after toppling x1, . . . , xk.
Then uk ↑ u and νk → ν ≤ 1. Moreover, the limits u and ν are independent
of the sequence {xk}.
The abelian property can be generalized as follows: after performing
some topplings, we can add some additional mass and then continue top-
pling. The resulting limits u and ν will be the same as in the case when all
mass was initially present. For a further generalization, see [16].
The limiting function u in Proposition 1.2 is the odometer function for
the divisible sandpile. This function plays a central role in our analysis. The
limit ν represents the final mass distribution. Sites x ∈ Zd with ν(x) = 1
are called fully occupied. Proposition 1.2 is proved in section 3, along with
the following.
Theorem 1.3. For m ≥ 0 let Dm ⊂ Zd be the domain of fully occupied
sites for the divisible sandpile formed from a pile of mass m at the origin.
There exist constants c, c′ depending only on d, such that
Br−c ⊂ Dm ⊂ Br+c′ ,
where r = (m/ωd)1/d and ωd is the volume of the unit ball in Rd.
The divisible sandpile is similar to the “oil game” studied by Van den
Heuvel [22]. In the terminology of [5], it also corresponds to the h → −∞
limit of the classical abelian sandpile (defined below), that is, the abelian
sandpile started from the initial condition in which every site has a very
deep “hole.”
In the classical abelian sandpile model [1], each site in Zd has an integer
number of grains of sand; if a site has at least 2d grains, it topples, sending
one grain to each neighbor. If n grains of sand are started at the origin in
Zd, write Sn for the set of sites that are visited during the toppling process;
in particular, although a site may be empty in the final state, we include it
in Sn if it was occupied at any time during the evolution to the final state.
Until now the best known constraints on the shape of Sn in two dimen-
sions were due to Le Borgne and Rossin [12], who proved that
{x ∈ Z2 |x1 + x2 ≤
n/12− 1} ⊂ Sn ⊂ {x ∈ Z2 |x1, x2 ≤
n/2}.
Fey and Redig [5] proved analogous bounds in higher dimensions, and ex-
tended these bounds to arbitrary values of the height parameter h. This
parameter is discussed in section 4.
The methods used to prove the near-perfect circularity of the divisible
sandpile shape in Theorem 1.3 can be used to give constraints on the shape
of the classical abelian sandpile, improving on the bounds of [5] and [12].
Theorem 1.4. Let Sn be the set of sites that are visited by the classical
abelian sandpile model in Zd, starting from n particles at the origin. Write
n = ωdrd. Then for any � > 0 we have
Bc1r−c2 ⊂ Sn ⊂ Bc′1r+c′2
Figure 3: Known bounds on the shape of the classical abelian sandpile in
Z2. The inner diamond and outer square are due to Le Borgne and Rossin
[12]; the inner and outer circles are those in Theorem 1.4.
where
c1 = (2d− 1)−1/d, c′1 = (d− �)
−1/d.
The constant c2 depends only on d, while c′2 depends only on d and �.
Note that Theorem 1.4 does not settle the question of the asymptotic
shape of Sn, and indeed it is not clear from simulations whether the asymp-
totic shape in two dimensions is a disc or perhaps a polygon (Figure 2). To
our knowledge even the existence of an asymptotic shape is not known.
The rest of the paper is organized as follows. In section 2, we derive the
basic Green’s function estimates that are used in the proofs of Theorems 1.1,
1.3 and 1.4. In section 3 we prove Proposition 1.2 and Theorem 1.3 for the
divisible sandpile. In section 4 we adapt the methods of the previous section
to prove Theorem 1.4 for the classical abelian sandpile model. Section 5 is
devoted to the proof of Theorem 1.1.
2 Basic Estimate
Write (Xk)k≥0 for simple random walk in Zd, and for d ≥ 3 denote by
g(x) = Eo#{k|Xk = x}
the expected number of visits to x by simple random walk started at the
origin. This is the discrete harmonic Green’s function in Zd; it satisfies
∆g(x) = 0 for x 6= o, and ∆g(o) = −1, where ∆ is the discrete Laplacian
∆g(x) =
g(y)− g(x).
The sum is over the 2d lattice neighbors y of x. In dimension d = 2, simple
random walk is recurrent, so the expectation defining g(x) is infinite. Here
we define the potential kernel
g(x) = lim
gn(x)− gn(o) (1)
where
gn(x) = Eo#{k ≤ n|Xk = x}.
The limit defining g(x) in (1) is finite [9, 20], and g(x) has Laplacian ∆g(x) =
0 for x 6= o, and ∆g(o) = −1. Note that (1) is the negative of the usual
definition of the potential kernel; we have chosen this sign convention so
that g has the same Laplacian in dimension two as in higher dimensions.
Fix a real number m > 0 and consider the function on Zd
γ̃d(x) = |x|2 +mg(x). (2)
Let r be such that m = ωdrd, and let
γd(x) = γ̃d(x)− γ̃d(brce1) (3)
where e1 is the first standard basis vector in Zd. The function γd plays a
central role in our analysis. To see where it comes from, recall the divisible
sandpile odometer function of Proposition 1.2
u(x) = total mass emitted from x.
Let Dm ⊂ Zd be the domain of fully occupied sites for the divisible sandpile
formed from a pile of mass m at the origin. For x ∈ Dm, since each neighbor
y of x emits an equal amount of mass to each of its 2d neighbors, we have
∆u(x) =
u(y)− u(x)
= mass received by x−mass emitted by x
= 1−mδox.
Moreover, u = 0 on ∂Dm. By construction, the function γd obeys the same
Laplacian condition: ∆γd = 1 −mδo; and as we will see shortly, γd ≈ 0 on
∂Br. Since we expect the domain Dm to be close to the ball Br, we should
expect that u ≈ γd. In fact, we will first show that u is close to γd, and then
use this to conclude that Dm is close to Br.
We will use the following estimates for the Green’s function [6, 21]; see
also [9, Theorems 1.5.4 and 1.6.2].
g(x) =
log |x|+ κ+O(|x|−2), d = 2
ad|x|2−d +O(|x|−d), d ≥ 3.
Here ad =
(d−2)ωd
, where ωd is the volume of the unit ball in Rd, and κ is
a constant whose value we will not need to know. For x ∈ Zd we use |x| to
denote the Euclidean norm of x. Here and throughout the paper, constants
in error terms denoted O(·) depend only on d.
We will need an estimate for γd near the boundary of the ball Br. We
first consider dimension d = 2. From (4) we have
γ̃2(x) = φ(x)− κm+O(m|x|−2), (5)
where
φ(x) = |x|2 −
log |x|.
In the Taylor expansion of φ about |x| = r
φ(x) = φ(r)− φ′(r)(r − |x|) +
φ′′(t)(r − |x|)2 (6)
the linear term vanishes, leaving
γ2(x) =
(r − |x|)2 +O(m|x|−2) (7)
for some t between |x| and r.
In dimensions d ≥ 3, from (4) we have
γ̃d(x) = |x|2 + adm|x|2−d +O(m|x|−d).
Setting φ(x) = |x|2 +adm|x|2−d, the linear term in the Taylor expansion (6)
of φ about |x| = r again vanishes, yielding
γd(x) =
1 + (d− 1)(r/t)d
(r − |x|)2 +O(m|x|−d)
for t between |x| and r. Together with (7), this yields the following estimates
in all dimensions d ≥ 2.
Lemma 2.1. Let γd be given by (3). For all x ∈ Zd we have
γd(x) ≥ (r − |x|)2 +O
. (8)
Lemma 2.2. Let γd be given by (3). Then uniformly in r,
γd(x) = O(1), x ∈ Br+1 −Br−1.
The following lemma is useful for x near the origin, where the error term
in (8) blows up.
Lemma 2.3. Let γd be given by (3). Then for sufficiently large r, we have
γd(x) >
, ∀x ∈ Br/3.
Proof. Since γd(x) − |x|2 is superharmonic, it attains its minimum in Br/3
at a point z on the boundary. Thus for any x ∈ Br/3
γd(x)− |x|2 ≥ γd(z)− |z|2,
hence by Lemma 2.1
γd(x) ≥ (2r/3)2 − (r/3)2 +O(1) >
Lemmas 2.1 and 2.3 together imply the following.
Lemma 2.4. Let γd be given by (3). There is a constant a depending only
on d, such that γd ≥ −a everywhere.
3 Divisible Sandpile
Let µ0 be a nonnegative function on Zd with finite support. We start with
mass µ0(y) at each site y. The operation of toppling a vertex x yields the
mass distribution
Txµ0 = µ0 + α(x)∆δx
where α(x) = max(µ0(x)−1, 0) and ∆ is the discrete Laplacian on Zd. Thus
if µ0(x) ≤ 1 then Txµ0 = µ0 and no mass topples; if µ0(x) > 1 then the
mass in excess of 1 is distributed equally among the neighbors of x.
Let x1, x2, . . . ∈ Zd be a sequence with the property that for any x ∈ Zd
there are infinitely many terms xk = x. Let
µk(y) = Txk . . . Tx1µ0(y).
be the amount of mass present at y after toppling the sites x1, . . . , xk in
succession. The total mass emitted from y during this process is
uk(y) :=
j≤k:xj=y
µj−1(y)− µj(y) =
j≤k:xj=y
αj(y) (9)
where αj(y) = max(µj(y)− 1, 0).
Lemma 3.1. As k ↑ ∞ the functions uk and µk tend to limits uk ↑ u and
µk → µ. Moreover, these limits satisfy
µ = µ0 + ∆u ≤ 1.
Proof. Write M =
x µ0(x) for the total starting mass, and let B ⊂ Z
be a ball centered at the origin containing all points within L1-distance M
of the support of µ0. Note that if µk(x) > 0 and µ0(x) = 0, then x must
have received its mass from a neighbor, so µk(y) ≥ 1 for some y ∼ x. Since∑
x µk(x) = M , it follows that µk is supported on B. Let R be the radius
of B, and consider the quadratic weight
µk(x)|x|2 ≤MR2.
Since µk(xk)−µk−1(xk) = −αk(xk) and for y ∼ xk we have µk(y)−µk−1(y) =
αk(xk), we obtain
Qk −Qk−1 = αk(xk)
|y|2 − |xk|2
= αk(xk).
Summing over k we obtain from (9)
Qk = Q0 +
uk(x).
Fixing x, the sequence uk(x) is thus increasing and bounded above, hence
convergent.
Given neighboring vertices x ∼ y, since y emits an equal amount of mass
to each of its 2d neighbors, it emits mass uk(y)/2d to x up to time k. Thus
x receives a total mass of 1
y∼x uk(y) from its neighbors up to time k.
Comparing the amount of mass present at x before and after toppling, we
obtain
µk(x) = µ0(x) + ∆uk(x).
Since uk ↑ u we infer that µk → µ := µ0 + ∆u. Note that if xk = x, then
µk(x) ≤ 1. Since for each x ∈ Zd this holds for infinitely many values of k,
the limit satisfies µ ≤ 1.
A function s on Zd is superharmonic if ∆s ≤ 0. Given a function γ on
Zd the least superharmonic majorant of γ is the function
s(x) = inf{f(x) | f is superharmonic and f ≥ γ}.
The study of the least superharmonic majorant is a classical topic in analysis
and PDE; see, for example, [8]. Note that if f is superharmonic and f ≥ γ
f(x) ≥
f(y) ≥
s(y).
Taking the infimum on the left side we obtain that s is superharmonic.
Lemma 3.2. The limit u in Lemma 3.1 is given by u = s+ γ, where
γ(x) = |x|2 +
g(x− y)µ0(y)
and s is the least superharmonic majorant of −γ.
Proof. By Lemma 3.1 we have
∆u = µ− µ0 ≤ 1− µ0.
Since ∆γ = 1−µ0, the difference u−γ is superharmonic. As u is nonnegative,
it follows that u − γ ≥ s. For the reverse inequality, note that s + γ − u is
superharmonic on the domain D = {x | µ(x) = 1} of fully occupied sites
and is nonnegative outside D, hence nonnegative inside D as well.
As a corollary of Lemmas 3.1 and 3.2, we obtain the abelian property of
the divisible sandpile, Proposition 1.2, which was stated in the introduction.
We now turn to the case of a point source mass m started at the origin:
µ0 = mδo. More general starting distributions are treated in [16], where
we identify the scaling limit of the divisible sandpile model and show that
it coincides with that of internal DLA and of the rotor-router model. In
the case of a point source of mass m, the natural question is to identify
the shape of the resulting domain Dm of fully occupied sites, i.e. sites x for
which µ(x) = 1. According to Theorem 3.3, Dm is extremely close to a ball
of volume m; in fact, the error in the radius is a constant independent of m.
As before, for r ≥ 0 we write
Br = {x ∈ Zd : |x| < r}
for the lattice ball of radius r centered at the origin.
Theorem 3.3. For m ≥ 0 let Dm ⊂ Zd be the domain of fully occupied sites
for the divisible sandpile formed from a pile of size m at the origin. There
exist constants c, c′ depending only on d, such that
Br−c ⊂ Dm ⊂ Br+c′ ,
where r = (m/ωd)1/d and ωd is the volume of the unit ball in Rd.
The idea of the proof is to use Lemma 3.2 along with the basic estimates
on γ, Lemmas 2.1 and 2.2, to obtain estimates on the odometer function
u(x) = total mass emitted from x.
We will need the following simple observation.
Lemma 3.4. For every point x ∈ Dm − {o} there is a path x = x0 ∼ x1 ∼
. . . ∼ xk = o in Dm with u(xi+1) ≥ u(xi) + 1.
Proof. If xi ∈ Dm − {o}, let xi+1 be a neighbor of xi maximizing u(xi+1).
Then xi+1 ∈ Dm and
u(xi+1) ≥
= u(xi) + ∆u(xi)
= u(xi) + 1,
where in the last step we have used the fact that xi ∈ Dm.
Proof of Theorem 3.3. We first treat the inner estimate. Let γd be given by
(3). By Lemma 3.2 the function u−γd is superharmonic, so its minimum in
the ball Br is attained on the boundary. Since u ≥ 0, we have by Lemma 2.2
u(x)− γd(x) ≥ −C, x ∈ ∂Br
for a constant C depending only on d. Hence by Lemma 2.1,
u(x) ≥ (r − |x|)2 − C ′rd/|x|d, x ∈ Br. (10)
for a constant C ′ depending only on d. It follows that there is a constant c,
depending only on d, such that u(x) > 0 whenever r/3 ≤ |x| < r − c. Thus
Br−c−Br/3 ⊂ Dm. For x ∈ Br/3, by Lemma 2.3 we have u(x) ≥ r2/4−C >
0, hence Br/3 ⊂ Dm.
For the outer estimate, note that u−γd is harmonic onDm. By Lemma 2.4
we have γd ≥ −a everywhere, where a depends only on d. Since u vanishes
on ∂Dm it follows that u − γd ≤ a on Dm. Now for any x ∈ Dm with
r − 1 < |x| ≤ r, we have by Lemma 2.2
u(x) ≤ γd(x) + a ≤ c′
for a constant c′ depending only on d. Lemma 3.4 now implies that Dm ⊂
Br+c′+1.
4 Classical Sandpile
We consider a generalization of the classical abelian sandpile, proposed by
Fey and Redig [5]. Each site in Zd begins with a “hole” of depth H. Thus,
each site absorbs the first H grains it receives, and thereafter functions
normally, toppling once for each additional 2d grains it receives. If H is
negative, we can interpret this as saying that every site starts with h = −H
grains of sand already present. Aggregation is only well-defined in the regime
h ≤ 2d− 2, since for h = 2d− 1 the addition of a single grain already causes
every site in Zd to topple infinitely often.
Let Sn,H be the set of sites that are visited if n particles start at the
origin in Zd. Fey and Redig [5, Theorem 4.7] prove that
lim sup
# (Sn,H 4BH−1/dr) = 0,
where n = ωdrd, and 4 denotes symmetric difference. The following theo-
rem strengthens this result.
Theorem 4.1. Fix an integer H ≥ 2−2d. Let Sn = Sn,H be the set of sites
that are visited by the classical abelian sandpile model in Zd, starting from
n particles at the origin, if every lattice site begins with a hole of depth H.
Write n = ωdrd. Then
Bc1r−c2 ⊂ Sn,H
where
c1 = (2d− 1 +H)−1/d
and c2 is a constant depending only on d. Moreover if H ≥ 1− d, then for
any � > 0 we have
Sn,H ⊂ Bc′1r+c′2
where
c′1 = (d− �+H)
and c′2 is independent of n but may depend on d, H and �.
Note that the ratio c1/c′1 ↑ 1 as H ↑ ∞. Thus, the classical abelian
sandpile run from an initial state in which each lattice site starts with a
deep hole yields a shape very close to a ball. Intuitively, one can think of the
classical sandpile with deep holes as approximating the divisible sandpile,
whose limiting shape is a ball by Theorem 3.3. Following this intuition,
we can adapt the proof of Theorem 3.3 to prove Theorem 4.1; just one
additional averaging trick is needed, which we explain below.
Consider the odometer function for the abelian sandpile
u(x) = total number of grains emitted from x.
Let Tn = {x|u(x) > 0} be the set of sites which topple at least once. Then
Tn ⊂ Sn ⊂ Tn ∪ ∂Tn.
In the final state, each site which has toppled retains between 0 and 2d− 1
grains, in addition to the H that it absorbed. Hence
H ≤ ∆u(x) + nδox ≤ 2d− 1 +H, x ∈ Tn. (11)
We can improve the lower bound by averaging over a small box. For x ∈ Zd
Qk(x) = {y ∈ Zd : ||x− y||∞ ≤ k}
be the box of side length 2k + 1 centered at x, and let
u(k)(x) = (2k + 1)−d
y∈Qk(x)
u(y).
Write
T (k)n = {x |Qk(x) ⊂ Tn}.
Le Borgne and Rossin [12] observe that if T is a set of sites all of which
topple, the number of grains remaining in T is at least the number of edges
internal to T : indeed, for each internal edge, the endpoint that topples last
sends the other a grain which never moves again. Since the box Qk(x) has
2dk(2k + 1)d−1 internal edges, we have
∆u(k)(x) ≥
2k + 1
d+H −
(2k + 1)d
1Qk(o)(x), x ∈ T
n . (12)
The following lemma is analogous to Lemma 3.4.
Lemma 4.2. For every point x ∈ Tn adjacent to ∂Tn there is a path x =
x0 ∼ x1 ∼ . . . ∼ xm = o in Tn with u(xi+1) ≥ u(xi) + 1.
Proof. By (11) we have
u(y) ≥ u(xi).
Since u(xi−1) < u(xi), some term u(y) in the sum above must exceed u(xi).
Let xi+1 = y.
Proof of Theorem 4.1. Let
ξ̃d(x) = (2d− 1 +H)|x|2 + ng(x),
and let
ξd(x) = ξ̃d(x)− ξ̃d(bc1rce1).
Taking m = n/(2d− 1 +H) in Lemma 2.2, we have
u(x)− ξd(x) ≥ −ξd(x) ≥ −C(2d− 1 +H), x ∈ ∂Bc1r (13)
for a constant C depending only on d. By (11), u− ξd is superharmonic, so
u − ξd ≥ −C(2d − 1 + H) in all of Bc1r. Hence by Lemma 2.1 we have for
x ∈ Bc1r
u(x) ≥ (2d− 1 +H)
(c1r − |x|)2 − C ′(c1r)d/|x|d
, (14)
where C ′ depends only on d. It follows that u is positive on Bc1r−c2−Bc1r/3
for a suitable constant c2 depending only on d. For x ∈ Bc1r/3, by Lemma 2.3
we have u(x) > (2d− 1 +H)(c21r
2/4− C) > 0. Thus Bc1r−c2 ⊂ Tn ⊂ Sn.
For the outer estimate, let
ψ̂d(x) = (d− �+H)|x|2 + ng(x).
Choose k large enough so that 2k
d ≥ d− �, and define
ψ̃d(x) = (2k + 1)
y∈Qk(x)
ψ̂d(y).
Finally, let
ψd(x) = ψ̃d(x)− ψ̃d(bc′1rce1).
By (12), u(k) − ψd is subharmonic on T
n . Taking m = n/(d − � + H)
in Lemma 2.4, there is a constant a depending only on d, such that ψd ≥
−a(d+H) everywhere. Since u(k) ≤ (2d+H)(d+1)k on ∂T (k)n it follows that
u(k) − ψd ≤ a(d + H) + (2d + H)(d+1)k on T
n . Now for any x ∈ Sn with
c′1r − 1 < |x| ≤ c
1r we have by Lemma 2.2
u(k)(x) ≤ ψd(x) + a(d+H) + (2d+H)(d+1)k ≤ c̃2
for a constant c̃2 depending only on d, H and �. Then u(x) ≤ c′2 := (2k +
1)dc̃2. Lemma 4.2 now implies that Tn ⊂ Bc′1r+c′2 , and hence
Sn ⊂ Tn ∪ ∂Tn ⊂ Bc′1r+c′2+1.
We remark that the crude bound of (2d + H)(d+1)k used in the proof
of the outer estimate can be improved to a bound of order k2H, and the
final factor of (2k + 1)d can be replaced by a constant factor independent
of k and H, using the fact that a nonnegative function on Zd with bounded
Laplacian cannot grow faster than quadratically; see [16].
5 Rotor-Router Model
Given a function f on Zd, for a directed edge (x, y) write
∇f(x, y) = f(y)− f(x).
Given a function s on directed edges in Zd, write
div s(x) =
s(x, y).
The discrete Laplacian of f is then given by
∆f(x) = div∇f =
f(y)− f(x).
5.1 Inner Estimate
Fixing n ≥ 1, consider the odometer function for rotor-router aggregation
u(x) = total number exits from x by the first n particles.
We learned the idea of using the odometer function to study the rotor-router
shape from Matt Cook [2].
Lemma 5.1. For a directed edge (x, y) in Zd, denote by κ(x, y) the net
number of crossings from x to y performed by the first n particles in rotor-
router aggregation. Then
∇u(x, y) = −2dκ(x, y) +R(x, y) (15)
for some edge function R which satisfies
|R(x, y)| ≤ 4d− 2
for all edges (x, y).
Remark. In the more general setting of rotor stacks of bounded discrepancy,
the 4d− 2 will be replaced by a different constant here.
Proof. Writing N(x, y) for the number of particles routed from x to y, we
u(x)− 2d+ 1
≤ N(x, y) ≤
u(x) + 2d− 1
hence
|∇u(x, y) + 2dκ(x, y)| = |u(y)− u(x) + 2dN(x, y)− 2dN(y, x)|
≤ 4d− 2.
In what follows, C0, C1, . . . denote constants depending only on d.
Lemma 5.2. Let Ω ⊂ Zd − {o} with 2 ≤ #Ω <∞. Then∑
|y|1−d ≤ C0 Diam(Ω).
Proof. For each positive integer k, let
Sk = {y ∈ Zd : k ≤ |y| < k + 1}.
Then ∑
|y|1−d ≤ k1−d#Sk ≤ C ′0
for a constant C ′0 depending only on d. Since Ω can intersect at most
Diam(Ω) + 1 ≤ 2 Diam(Ω) distinct sets Sk, taking C0 = 2C ′0 the proof is
complete.
Lemma 5.3. Let G = GBr be the Green’s function for simple random walk
in Zd stopped on exiting Br. For any ρ ≥ 1 and x ∈ Br,∑
|x−y|≤ρ
|G(x, y)−G(x, z)| ≤ C1ρ. (16)
Proof. Let (Xt)t≥0 denote simple random walk in Zd, and let T be the first
exit time from Br. For fixed y, the function
A(x) = g(x− y)− Exg(XT − y) (17)
has Laplacian ∆A(x) = −δxy in Br and vanishes on ∂Br, hence A(x) =
G(x, y).
Let x, y ∈ Br and z ∼ y. From (4) we have
|g(x− y)− g(x− z)| ≤
|x− y|d−1
, y, z 6= x.
Using the triangle inequality together with (17), we obtain
|G(x, y)−G(x, z)| ≤ |g(x− y)− g(x− z)|+ Ex|g(XT − y)− g(XT − z)|
|x− y|d−1
w∈∂Br
Hx(w)
|w − y|d−1
where Hx(w) = Px(XT = w).
Write D = {y ∈ Br : |x− y| ≤}. Then∑
y 6=x
z 6=x
|G(x, y)−G(x, z)| ≤ C3ρ+ C2
w∈∂Br
Hx(w)
|w − y|1−d. (18)
Figure 4: Diagram for the Proof of Lemma 5.4.
Taking Ω = w − D in Lemma 5.2, the inner sum on the right is at most
C0Diam(D) ≤ 2C0ρ, so the right side of (18) is bounded above by C1ρ for
a suitable C1.
Finally, the terms in which y or z coincides with x make a negligible
contribution to the sum in (16), since for y ∼ x ∈ Zd
|G(x, x)−G(x, y)| ≤ |g(o)−g(x−y)|+Ex|g(XT −x)−g(XT −y)| ≤ C4.
Lemma 5.4. Let H1, H2 be linear half-spaces in Zd, not necessarily parallel
to the coordinate axes. Let Ti be the first hitting time of Hi. If x /∈ H1∪H2,
Px(T1 > T2) ≤
h1 + 1
where hi is the distance from x to Hi.
Proof. If one of H1, H2 contains the other, the result is vacuous. Otherwise,
let H̃i be the half-space shifted parallel to Hci by distance 2h2 in the direction
of x, and let T̃i be the first hitting time of Hi∪H̃i. Let (Xt)t≥0 denote simple
random walk in Zd, and write Mt for the (signed) distance from Xt to the
hyperplane defining the boundary of H1, with M0 = h1. Then Mt is a
martingale with bounded increments. Since ExT̃1 < ∞, we obtain from
optional stopping
h1 = ExMeT1 ≥ 2h2 Px (XeT1 ∈ H̃1)− Px(XeT1 ∈ H1),
hence
Px (XeT1 ∈ H̃1) ≤ h1 + 12h2 . (19)
Likewise, dM2t − t is a martingale with bounded increments, giving
ExT̃1 ≤ d ExM2eT1
≤ d(2h2 + 1)2 Px (XeT1 ∈ H̃1)
≤ d(h1 + 1)(2h2 + 1)
. (20)
Let T = min(T̃1, T̃2). Denoting by Dt the distance from Xt to the
hyperplane defining the boundary of H2, the quantity
D2t + (2h2 −Dt)
is a martingale. Writing p = Px(T = T̃2) we have
dh22 = EN0 = ENT ≥ p
(2h2)
2 + (1− p)dh22 − ExT
≥ (1 + p)dh22 − ExT
hence by (20)
h1 + 1
Finally by (19)
P(T1 > T2) ≤ p+ P(XeT1 ∈ H̃1) ≤ 52
h1 + 1
Lemma 5.5. Let x ∈ Br and let ρ = r + 1− |x|. Let
S∗k = {y ∈ Br : 2
kρ < |x− y| ≤ 2k+1ρ}. (21)
Let τk be the first hitting time of S∗k , and T the first exit time from Br. Then
Px(τk < T ) ≤ C22−k.
Figure 5: Diagram for the proof of Lemma 5.5.
Proof. Let H be the outer half-space tangent to Br at the point z ∈ ∂Br
closest to x. Let Q be the cube of side length 2kρ/
d centered at x. Then
Q is disjoint from S∗k , hence
Px(τk < T ) ≤ Px(T∂Q < T ) ≤ Px(T∂Q < TH)
where T∂Q and TH are the first hitting times of ∂Q and H. Let H1, . . . ,H2d
be the outer half-spaces defining the faces of Q, so that Q = Hc1 ∩ . . .∩H
By Lemma 5.4 we have
Px(T∂Q < TH) ≤
Px(THi < TH)
dist(x,H) + 1
dist(x,Hi)
2 dist(x,Hi)
Since dist(x,H) = |x − z| ≤ ρ and dist(x,Hi) = 2k−1ρ/
d, and ρ ≥ 1,
taking C2 = 20 d3/2(1 +
d)2 completes the proof.
Lemma 5.6. Let G = GBr be the Green’s function for random walk stopped
on exiting Br. Let x ∈ Br and let ρ = r + 1− |x|. Then∑
|G(x, y)−G(x, z)| ≤ C3ρ log
Proof. Let S∗k be given by (21), and let
W = {w ∈ ∂(S∗k ∪ ∂S
k) : |w − x| < 2
be the portion of the boundary of the enlarged spherical shell S∗k ∪∂S
k lying
closer to x. Let τW be the first hitting time of W , and T the first exit time
from Br. For w ∈W let
Hx(w) = Px(XτW∧T = w).
For any y ∈ S∗k and z ∼ y, simple random walk started at x must hit W
before hitting either y or z, hence
|G(x, y)−G(x, z)| ≤
Hx(w)|G(w, y)−G(w, z)|.
For any y ∈ S∗k and any w ∈W we have
|y − w| ≤ |y − x|+ |w − x| ≤ 3 · 2kρ.
Lemma 5.3 yields∑
|G(x, y)−G(x, z)| ≤ 3C12kρ
Hx(w).
By Lemma 5.5 we have
w∈W Hx(w) ≤ C22
−k, so the above sum is at most
3C1C2ρ. Since the union of shells S∗0 ,S
1 , . . . ,S
dlog2(r/ρ)e
covers all of Br ex-
cept for those points y within distance ρ of x, and
|y−x|≤ρ
z∼y |G(x, y)−
G(x, z)| ≤ C1ρ by Lemma 5.3, the result follows.
Proof of Theorem 1.1, Inner Estimate. Let κ and R be defined as in Lemma
5.1. Since the net number of particles to enter a site x 6= o is at most one, we
have 2d div κ(x) ≥ −1. Likewise 2d div κ(o) = n−1. Taking the divergence
in (15), we obtain
∆u(x) ≤ 1 + divR(x), x 6= o; (22)
∆u(o) = 1− n+ divR(o). (23)
Let T be the first exit time from Br, and define
f(x) = Exu(XT )− ExT + n Ex #{j < T |Xj = 0}.
Then ∆f(x) = 1 for x ∈ Br − {o} and ∆f(o) = 1 − n. Moreover f ≥ 0 on
∂Br. It follows from Lemma 2.2 with m = n that f ≥ γ − C4 on Br for a
suitable constant C4.
We have
u(x)− Exu(XT ) =
u(Xk∧T )− u(X(k+1)∧T )
Each summand on the right side is zero on the event {T ≤ k}, hence
u(Xk∧T )− u(X(k+1)∧T ) | Fk∧T
= −∆u(Xk)1{T>k}.
Taking expectations and using (22) and (23), we obtain
u(x)− Exu(XT ) ≥
1{T>k}(n1{Xk=o} − 1− divR(Xk))
= n Ex #{k < T |Xk = o} − ExT −
1{T>k}divR(Xk)
hence
u(x)− f(x) ≥ −
1{T>k} ∑
R(Xk, z)
 . (24)
Since random walk exits Br with probability at least 12d every time it reaches
a site adjacent to the boundary ∂Br, the expected time spent adjacent to
the boundary before time T is at most 2d. Since |R| ≤ 4d, the terms in (24)
with z ∈ ∂Br contribute at most 16d3 to the sum. Thus
u(x)− f(x) ≥ −
 ∑
y,z∈Br
1{T>k}∩{Xk=y}R(y, z)
− 8d2.
For y ∈ Br we have {Xk = y} ∩ {T > k} = {Xk∧T = y}, hence
u(x)− f(x) ≥ −
y,z∈Br
Px(Xk∧T = y)R(y, z)− 8d2. (25)
Write pk(y) = Px(Xk∧T = y). Note that since ∇u and κ are antisymmet-
ric, R is antisymmetric. Thus∑
y,z∈Br
pk(y)R(y, z) = −
y,z∈Br
pk(z)R(y, z)
y,z∈Br
pk(y)− pk(z)
R(y, z).
Summing over k and using the fact that |R| ≤ 4d, we conclude from (25)
u(x) ≥ f(x)−
y,z∈Br
|G(x, y)−G(x, z)| − 8d2,
where G = GBr is the Green’s function for simple random walk stopped on
exiting Br. By Lemma 5.6 we obtain
u(x) ≥ f(x)− C3(r + 1− |x|) log
r + 1− |x|
− 8d2.
Using the fact that f ≥ γ − C4, we obtain from Lemma 2.1
u(x) ≥ (r − |x|)2 − C3(r + 1− |x|) log
r + 1− |x|
The right side is positive provided r/3 ≤ |x| < r−C5 log r. For x ∈ Br/3, by
Lemma 2.3 we have u(x) > r2/4−C3r log 32 > 0, hence Br−C5 log r ⊂ An.
5.2 Outer Estimate
The following result is due to Holroyd and Propp (unpublished); we include
a proof for the sake of completeness. Notice that the bound in (26) does not
depend on the number of particles.
Proposition 5.7. Let Γ be a finite connected graph, and let Y ⊂ Z be sub-
sets of the vertex set of Γ. Let s be a nonnegative integer-valued function on
the vertices of Γ. Let Hw(s, Y ) be the expected number of particles stopping
in Y if s(x) particles start at each vertex x and perform independent sim-
ple random walks stopped on first hitting Z. Let Hr(s, Y ) be the number of
particles stopping in Y if s(x) particles start at each vertex x and perform
rotor-router walks stopped on first hitting Z. Let H(x) = Hw(1x, Y ). Then
|Hr(s, Y )−Hw(s, Y )| ≤
|H(u)−H(v)| (26)
independent of s and the initial positions of the rotors.
Proof. For each vertex u /∈ Z, arbitrarily choose a neighbor η(u). Or-
der the neighbors η(u) = v1, v2, . . . , vd of u so that the rotor at u points
to vi+1 immediately after pointing to vi (indices mod d). We assign weight
w(u, η(u)) = 0 to a rotor pointing from u to η(u), and weight w(u, vi) =
H(u) − H(vi) + w(u, vi−1) to a rotor pointing from u to vi. These assign-
ments are consistent since H is a harmonic function:
i(H(u)−H(vi)) = 0.
Figure 6: Diagram for the proof of Lemma 5.8.
We also assign weight H(u) to a particle located at u. The sum of rotor
and particle weights in any configuration is invariant under the operation of
routing a particle and rotating the corresponding rotor. Initially, the sum of
all particle weights is Hw(s, Y ). After all particles have stopped, the sum of
the particle weights is Hr(s, Y ). Their difference is thus at most the change
in rotor weights, which is bounded above by the sum in (26).
For ρ ∈ Z let
Sρ = {x ∈ Zd : ρ ≤ |x| < ρ+ 1}. (27)
Bρ = {x ∈ Zd : |x| < ρ} = S0 ∪ . . . ∪ Sρ−1.
Note that for simple random walk started in Bρ, the first exit time of Bρ
and first hitting time of Sρ coincide. Our next result is a modification of
Lemma 5(b) of [10].
Lemma 5.8. Fix ρ ≥ 1 and y ∈ Sρ. For x ∈ Bρ let H(x) = Px(XT = y),
where T is the first hitting time of Sρ. Then
H(x) ≤
|x− y|d−1
for a constant J depending only on d.
Proof. We induct on the distance |x − y|, assuming the result holds for all
x′ with |x′ − y| ≤ 1
|x− y|; the base case can be made trivial by choosing J
sufficiently large. By Lemma 5(b) of [10], we can choose J large enough so
that the result holds provided |y| − |x| ≥ 2−d−3|x − y|. Otherwise, let H1
be the outer half-space tangent to Sρ at the point of Sρ closest to x, and let
H2 be the inner half-space tangent to the ball S̃ of radius 12 |x− y| about y,
at the point of S̃ closest to x. By Lemma 5.4 applied to these half-spaces,
the probability that random walk started at x reaches S̃ before hitting Sρ
is at most 21−d. Writing T̃ for the first hitting time of S̃ ∪ Sρ, we have
H(x) ≤
x′∈eS
Px(XeT = x′)H(x′) ≤ 21−dJ ·
|x− y|
where we have used the inductive hypothesis to bound H(x′).
The lazy random walk in Zd stays in place with probability 1
, and moves
to each of the 2d neighbors with probability 1
. We will need the following
standard result, which can be derived e.g. from the estimates in [17], section
II.12; we include a proof for the sake of completeness.
Lemma 5.9. Given u ∼ v ∈ Zd, lazy random walks started at u and v can
be coupled with probability 1−C/R before either reaches distance R from u,
where C depends only on d.
Proof. Let i be the coordinate such that ui 6= vi. To define a step of the
coupling, choose one of the d coordinates uniformly at random. If the chosen
coordinate is different from i, let the two walks take the same lazy step so
that they still agree in this coordinate. If the chosen coordinate is i, let
one walk take a step while the other stays in place. With probability 1
walks will then be coupled. Otherwise, they are located at points u′, v′ with
|u′− v′| = 2. Moreover, P
|u−u′| ≥ R
for a constant C ′ depending
only on d. From now on, whenever coordinate i is chosen, let the two walks
take lazy steps in opposite directions.
∣∣∣xi = u′i + v′i2
be the hyperplane bisecting the segment [u′, v′]. Since the steps of one walk
are reflections in H1 of the steps of the other, the walks couple when they
hit H1. Let Q be the cube of side length R/
d+2 centered at u, and let H2
be a hyperplane defining one of the faces of Q. By Lemma 5.4 with h1 = 1
and h2 = R/4
d, the probability that one of the walks exits Q before the
walks couple is at most 2d · 5
1 + 1
≤ 40 d3/2
1 + 2
Lemma 5.10. With H defined as in Lemma 5.8, we have∑
|H(u)−H(v)| ≤ J ′ log ρ
for a constant J ′ depending only on d.
Proof. Given u ∈ Bρ and v ∼ u, by Lemma 5.9, lazy random walks started
at u and v can be coupled with probability 1 − 2C/|u − y| before either
reaches distance |u − y|/2 from u. If the walks reach this distance without
coupling, by Lemma 5.8 each has still has probability at most J/|u− y|d−1
of exiting Bρ at y. By the strong Markov property it follows that
|H(u)−H(v)| ≤
|u− y|d
Summing in spherical shells about y, we obtain
|H(u)−H(v)| ≤
d−1 2CJ
≤ J ′ log ρ.
We remark that Lemma 5.10 could also be inferred from Lemma 5.8
using [9, Thm. 1.7.1] in a ball of radius |u− y|/2 about u.
To prove the outer estimate of Theorem 1.1, we will make use of the
abelian property of rotor-router aggregation. Fix a finite set Γ ⊂ Zd con-
taining the origin. Starting with n particles at the origin, at each time step,
choose a site x ∈ Γ with more than one particle, rotate the rotor at x, and
move one particle from x to the neighbor the rotor points to. After a finite
number of such choices, each site in Γ will have at most one particle, and all
particles that exited Γ will be on the boundary ∂Γ. The abelian property
says that the final configuration of particles and the final configuration of
rotors do not depend on the choices. For a proof, see [4, Prop. 4.1].
In our application, we will fix ρ ≥ r and stop each particle in rotor-router
aggregation either when it reaches an unoccupied site or when it reaches the
spherical shell Sρ. Let Nρ be the number of particles that reach Sρ during
this process. Note that at some sites in Sρ, more than one particle may have
stopped. If we let each of these extra particles in turn continue performing
rotor-router walk, stopping either when it reaches an unoccupied site or
when it hits the larger shell Sρ+h, then by the abelian property, the number
of particles that reach Sρ+h will be Nρ+h. We will show that when h is order
r1−1/d, a constant fraction of the particles that reach Sρ find unoccupied sites
before reaching Sρ+h.
Proof of Theorem 1.1, Outer Estimate. Fix integers ρ ≥ r and h ≥ 1. In
the setting of Proposition 5.7, let Γ be the lattice ball Bρ+h+1, and let
Z = Sρ+h. Fix y ∈ Sρ+h and let Y = {y}. For x ∈ Sρ, let s(x) be the
number of particles stopped at x if each particle in rotor-router aggregation
is stopped either when it reaches an unoccupied site or when it reaches Sρ.
Write
H(x) = Px(XT = y)
where T is the first hitting time of Sρ+h. By Lemma 5.8 we have
Hw(s, y) =
s(x)H(x) ≤
where
is the number of particles that ever visit the shell Sρ.
By Lemma 5.10 the sum in (26) is at most J ′ log h, hence from Proposi-
ton 5.7 and (29) we have
Hr(s, y) ≤
+ J ′ log h. (30)
Let ρ(0) = r, and define ρ(i) inductively by
ρ(i+ 1) = min
ρ(i) +N2/(2d−1)
, min{ρ > ρ(i)|Nρ ≤ Nρ(i)/2}
. (31)
Fixing h < ρ(i+ 1)− ρ(i), we have
hd−1 log h ≤ N
logNρ(i) ≤ Nρ(i);
so (30) with ρ = ρ(i) simplifies to
Hr(s, y) ≤
CNρ(i)
where C = J + J ′.
Since all particles that visit Sρ(i)+h during rotor-router aggregation must
pass through Sρ(i), we have by the abelian property
Nρ(i)+h ≤
y∈Sρ(i)+h
Hr(s, y). (33)
Let Mk = #(An∩Sk). There are at most Mρ(i)+h nonzero terms in the sum
on the right side of (33), and each term is bounded above by (32), hence
Mρ(i)+h ≥ Nρ(i)+h
CNρ(i)
where the second inequality follows from Nρ(i)+h ≥ Nρ(i)/2. Summing over
h, we obtain
ρ(i+1)−1∑
ρ=ρ(i)+1
(ρ(i+ 1)− ρ(i)− 1)d. (34)
The left side is at most Nρ(i), hence
ρ(i+ 1)− ρ(i) ≤ (2dCNρ(i))
1/d ≤ N2/(2d−1)
providedNρ(i) ≥ C ′ := (2dC)2d−1. Thus the minimum in (31) is not attained
by its first argument. It follows that Nρ(i+1) ≤ Nρ(i)/2, hence Nρ(a log r) < C ′
for a sufficiently large constant a.
By the inner estimate, since the ball Br−c log r is entirely occupied, we
have ∑
Mρ ≤ ωdrd − ωd(r − c log r)d
≤ cdωdrd−1 log r.
Write xi = ρ(i+ 1)− ρ(i)− 1; by (34) we have
a log r∑
xdi ≤ cdωdr
d−1 log r,
By Jensen’s inequality, subject to this constraint,
xi is maximized when
all xi are equal, in which case xi ≤ C ′′r1−1/d and
ρ(a log r) = r +
xi ≤ r + C ′′r1−1/d log r. (35)
Since Nρ(a log r) < C
′ we have Nρ(a log r)+C′ = 0; that is, no particles reach
the shell Sρ(a log r)+C′ . Taking c′ = C ′ + C ′′, we obtain from (35)
An ⊂ Br(1+c′r−1/d log r).
Figure 7: Image of the rotor-router aggregate of one million particles under
the map z 7→ 1/z2. The colors represent the rotor directions. The white
disc in the center is the image of the complement of the occupied region.
6 Concluding Remarks
A number of intriguing questions remain unanswered. Although we have
shown that the asymptotic shape of the rotor-router model is a ball, the
near perfect circularity found in Figure 1 remains a mystery. In particular,
we do not know whether an analogue of Theorem 1.3 holds for the rotor-
router model, with constant error in the radius as the number of particles
grows.
Equally mysterious are the patterns in the rotor directions evident in
Figure 1. The rotor directions can be viewed as values of the odometer
function mod 2d, but our control of the odometer is not fine enough to
provide useful information about the patterns. If the rescaled occupied
region
π/nAn is viewed as a subset of the complex plane, it appears that
the monochromatic regions visible in Figure 1, in which all rotors point
in the same direction, occur near points of the form (1 + 2z)−1/2, where
z = a + bi is a Gaussian integer (i.e. a, b ∈ Z). We do not even have
a heuristic explanation for this phenomenon. Figure 7 shows the image
of A1,000,000 under the map z 7→ 1/z2; the monochromatic patches in the
transformed region occur at lattice points.
László Lovász (personal communication) has asked whether the occu-
pied region An is simply connected, i.e. whether its complement is con-
nected. While Theorem 1.1 shows that An cannot have any holes far from
the boundary, we cannot answer his question at present.
A final question is whether our methods could be adapted to internal
DLA to show that if n = ωdrd, then with high probability Br−c log r ⊂ In,
where In is the internal DLA cluster of n particles. The current best bound is
due to Lawler [11], who proves that with high probability Br−r1/3(log r)2 ⊂ In.
Acknowledgments
The authors thank Jim Propp for bringing the rotor-router model to our
attention and for many fruitful discussions. We also had useful discussions
with Oded Schramm, Scott Sheffield and Misha Sodin. We thank Wilfried
Huss for pointing out an error in an earlier draft. Yelena Shvets helped draw
some of the figures.
References
[1] P. Bak, C. Tang and K. Wiesenfeld, Self-organized criticality: an ex-
planation of the 1/f noise, Phys. Rev. Lett. 59, no. 4 (1987), 381–384.
[2] M. Cook, The tent metaphor, available at http://paradise.caltech.
edu/~cook/Warehouse/ForPropp/.
[3] J. N. Cooper and J. Spencer, Simulating a random walk with con-
stant error, Combin. Probab. Comput. 15 (2006) 815–822. http://www.
arxiv.org/abs/math.CO/0402323.
[4] P. Diaconis and W. Fulton, A growth model, a game, an algebra, La-
grange inversion, and characteristic classes, Rend. Sem. Mat. Univ. Pol.
Torino 49 (1991) no. 1, 95–119.
http://paradise.caltech.edu/~cook/Warehouse/ForPropp/
http://paradise.caltech.edu/~cook/Warehouse/ForPropp/
http://www.arxiv.org/abs/math.CO/0402323
http://www.arxiv.org/abs/math.CO/0402323
[5] A. Fey and F. Redig, Limiting shapes for deterministic centrally seeded
growth models, J. Stat. Phys. 130 (2008), no. 3, 579–597. http://
arxiv.org/abs/math.PR/0702450.
[6] Y. Fukai and K. Uchiyama, Potential kernel for two-dimensional ran-
dom walk. Ann. Probab. 24 (1996), no. 4, 1979–1992.
[7] M. Kleber, Goldbug variations, Math. Intelligencer 27 (2005), no. 1,
55–63.
[8] P. Koosis, La plus petite majorante surharmonique et son rapport avec
l’existence des fonctions entières de type exponentiel jouant le rôle de
multiplicateurs, Ann. Inst. Fourier (Grenoble) 33 (1983), fasc. 1, 67–
[9] G. Lawler, Intersections of Random Walks, Birkhäuser, 1996.
[10] G. Lawler, M. Bramson and D. Griffeath, Internal diffusion limited
aggregation, Ann. Probab. 20, no. 4 (1992), 2117–2140.
[11] G. Lawler, Subdiffusive fluctuations for internal diffusion limited aggre-
gation, Ann. Probab. 23 (1995) no. 1, 71–86.
[12] Y. Le Borgne and D. Rossin, On the identity of the sandpile group,
Discrete Math. 256 (2002) 775–790.
[13] L. Levine, The rotor-router model, Harvard University senior thesis
(2002), http://arxiv.org/abs/math/0409407.
[14] L. Levine and Y. Peres, Spherical asymptotics for the rotor-router
model in Zd, Indiana Univ. Math. J. 57 (2008), no. 1, 431–450.
http://arxiv.org/abs/math/0503251
[15] L. Levine and Y. Peres, The rotor-router shape is spherical, Math.
Intelligencer 27 (2005), no. 3, 9–11.
[16] L. Levine and Y. Peres, Scaling limits for internal aggregation models
with multiple sources. http://arxiv.org/abs/0712.3378.
[17] T. Lindvall, Lectures on the Coupling Method, Wiley, 1992.
[18] V. B. Priezzhev, D. Dhar, A. Dhar, and S. Krishnamurthy, Eulerian
walkers as a model of self-organised criticality, Phys. Rev. Lett. 77
(1996) 5079–82.
http://arxiv.org/abs/math.PR/0702450
http://arxiv.org/abs/math.PR/0702450
http://arxiv.org/abs/math/0409407
http://arxiv.org/abs/math/0503251
http://arxiv.org/abs/0712.3378
[19] J. Propp, Three lectures on quasirandomness, available at http://
faculty.uml.edu/jpropp/berkeley.html.
[20] F. Spitzer, Principles of Random Walk, Springer, 1976.
[21] K. Uchiyama, Green’s functions for random walks on ZN , Proc. London
Math. Soc. 77 (1998), no. 1, 215–240.
[22] J. Van den Heuvel, Algorithmic aspects of a chip-firing game, Combin.
Probab. Comput. 10, no. 6 (2001), 505–529.
http://faculty.uml.edu/jpropp/berkeley.html
http://faculty.uml.edu/jpropp/berkeley.html
	Introduction
	Basic Estimate
	Divisible Sandpile
	Classical Sandpile
	Rotor-Router Model
	Inner Estimate
	Outer Estimate
	Concluding Remarks
ABSTRACT
  The rotor-router model is a deterministic analogue of random walk. It can be
used to define a deterministic growth model analogous to internal DLA. We prove
that the asymptotic shape of this model is a Euclidean ball, in a sense which
is stronger than our earlier work. For the shape consisting of $n=\omega_d r^d$
sites, where $\omega_d$ is the volume of the unit ball in $\R^d$, we show that
the inradius of the set of occupied sites is at least $r-O(\log r)$, while the
outradius is at most $r+O(r^\alpha)$ for any $\alpha > 1-1/d$. For a related
model, the divisible sandpile, we show that the domain of occupied sites is a
Euclidean ball with error in the radius a constant independent of the total
mass. For the classical abelian sandpile model in two dimensions, with $n=\pi
r^2$ particles, we show that the inradius is at least $r/\sqrt{3}$, and the
outradius is at most $(r+o(r))/\sqrt{2}$. This improves on bounds of Le Borgne
and Rossin. Similar bounds apply in higher dimensions.

<|endoftext|><|startoftext|>
Introduction
The kilohertz quasi-periodic oscillations (kHz QPOs) were firstly discov-
ered in Sco X-1, a luminous Z source in neutron star (NS) low-mass X-ray
binaries (LMXBs) (e.g. van der Klis et al. 1996), and now they have been
detected in twenty more sources (e.g. van der Klis 2000, 2006, for reviews).
Usually, these kHz QPOs appear in pairs, the upper kHz QPO frequency (ν2,
hereafter the upper-frequency) and the lower kHz QPO frequency (ν1, here-
after the lower-frequency), which are discovered in three classes of sources,
i.e. accretion powered millisecond pulsars, bright Z sources and less luminous
Atoll sources (e.g., Hasinger & van der Klis 1989).
The kHz QPO peak separation, ∆ν = ν2 − ν1, in a given source generally
decreases with frequency, except the recently detected kHz QPOs in Cir X-
1, in which the peak separation increases with frequency (Boutloukos et al.
Accepted for publication in Advances of Space Research
http://arxiv.org/abs/0704.0689v1
2006). In addition, the variable peak separations are not equal to the NS spin
frequencies. However, the averaged peak separation is found to be either close
to the spin frequency or to half of it (e.g., van der Klis 2006; Linares et al.
2005).
The above observations offer strong evidence against the simple beat-
frequency model, in which the lower-frequency is the beat between the upper-
frequency ν2 and the NS spin frequency νs (e.g. Strohmayer et al. 1996; Zhang
et al. 1997; Miller et al. 1998), i.e. ν1 = ν2 − νs. Furthermore, with the dis-
covery of pairs of 30–450 Hz QPOs from a few black-hole candidates with
the frequency ratios 3:2 (e.g., van der Klis 2006), Abramowicz et al. (2003)
reported that the ratios of twin kHz QPOs in Sco X-1 tend to cluster around
a value about 3:2, and they argued this fact to be a promising link with the
black hole high-frequency QPOs (e.g. van der Klis 2006).
For the all Z and Atoll sources, the data plots of the upper-frequency
versus the lower-frequency can be fitted by a power law function (e.g., Zhang
et al. 2006a), and also roughly fitted by a linear function (Belloni et al. 2005).
However, for the individual kHz QPO source, for instance Sco X-1, its kHz
QPOs can be well fitted by a power law function (e.g. Psaltis et al. 1998; Yin
et al. 2005).
In this paper, to investigate the twin kHz QPO correlation for the individ-
ual Z or Atoll source, we fitted the data with a power-law and a linear function
for four typical Z sources and four typical Atoll sources, and a comparison of
both fittings by χ2-tests is discussed in section 2, where comparisons with the
models are discussed. The conclusions and consequences are given in section
2 Correlations between twin kHz QPOs
Until now, twin kHz QPOs have been detected in 21 LMXBs, including 2
accretion powered millisecond X-ray pulsars, 8 Z sources and 11 Atoll sources,
as listed in Tab. 1. In Fig. 1 and Fig. 2, we plotted twin kHz QPO data for
the Z sources and Atoll sources, showing the correlations of ν1 vs. ν2, ∆ν vs.
ν2 and ν2/ν1 vs. ν2, where the power-law and linear fitting lines for the eight
Z and Atoll sources are presented. The results of the fittings and χ2 -tests are
listed in Tab. 2.
2.1 A power law fitting
The power-law function is chosen as
ν1 = a
1000 Hz
Hz (1)
to fit twin kHz QPO data points of all Atoll (Z) sources, as well as 4 individual
Atoll (Z) sources, separately. It is noted that a same function was applied to
the fitting of kHz QPOs of Sco X-1 by Psaltis et al. (1998) with a smaller set
of kHz QPO data points. The fitting results of the normalization coefficient a,
the power-law index b and χ2/d.o.f. for various cases are listed in Tab. 2, which
correspond to the fitting curves as presented in Fig. 1. We find that the power-
law index for the fitting of all Z sources (see Tab. 2) is 1.87, obviously bigger
than that of the fitting for all Atoll sources (1.61). Then, for the individual
case, the power-law index for Z source is generally bigger than that in Atoll
source, except GX 17+2.
2.2 A linear fitting
For the same data sets, the linear fitting function is chosen as,
ν2 = Aν1 +B Hz , (2)
which was exploited by Belloni et al. (2005) to discuss the kHz QPO fitting in
Sco X-1, 4U 1608-52, 4U 1636-53, 4U 1728-34 and 4U 1820-30. By means of the
χ2 -tests, as shown in Tab. 2, we find that the linear fitting concordes with the
data well in some cases, and there is no much systematic difference between
the linear slope parameters of the Atoll sources and those of Z sources.
2.3 Comparison between the power-law and the linear correlation
As a comparison between models and the data, it is remarked that the
relativistic precession model (e.g. Stella & Vietri 1999) and the Alfvén wave
oscillation model (e.g. Zhang 2004; Li & Zhang 2005) both can lead to power-
law relations approximately, and then the beat-frequency model (e.g. Miller et
al. 1998) and the 3:2 resonance model (e.g. Abramowicz et al. 2003, this model
is successfully applied to black hole candidates) predicted the linear relations
between twin kHz QPO frequencies in the lowest approximation (Abramowicz
et al. 2005). In Tab. 2, we can see that the χ2/d.o.f. of the power-law relation
is usually less than the linear one for the same source, except the two Atoll
sources 4U 0614+09 and 4U 1636-53. And a linear function cannot give a
firstly increasing and then decreasing tendency of all Z data as shown in Fig.
2b. But a power-law one would fit it well as shown in Fig. 1b. So, these maybe
mean that a power-law correlation is better than a linear one.
2.4 Testing the constant peak separation ∆ν = 300 Hz
Since the discovery of kHz QPOs, it is known that the peak separation
for Sco X-1 (van der Klis et al 1997; Méndez & van der Klis 2000) is a not
constant, and the same is true for the other Z sources, e.g., GX 17+2 (Homan
et al. 2002) and Cir X-1 (Boutloukos 2006). As for the Atoll sources, the peak
separation of 4U 1728–34 (Migliari, van der Klis, & Fender 2003; Méndez
& van der Klis 1999) is always significantly lower than the burst oscillation
frequency, and the peak separation of 4U 1636–53 (Jonker, Méndez, & van der
Klis 2002b; Méndez, van der Klis, & van Paradijs 1998) is varying between
being lower and higher than half the spin frequency. In addition, 4U 1608-52
( Méndez et al. 1998) and 4U 1735-44 (Ford et al. 1998) are found to share
the varied peak separations.
In Fig. 1b or Fig. 2b, we show that the peak separations in all Z sources
decrease (increase) systematically with the upper frequency if the upper fre-
quency is larger (less) than ∼700 Hz (e.g. van der Klis 2000, 2006; Boutloukos
et al. 2006). But this firstly increasing and then decreasing with frequency is
not clearly found for the kHz QPO data of all Atoll sources, as shown in Fig.
1e and Fig. 2e, which perhaps is on account of the less amount of data in the
low kHz QPO frequencies in Atoll sources. From Fig. 1 and Fig. 2, we find
that the peak separations are scattered in a wide range of frequency for each
source. Therefore, a constant peak separation, i.e. ∆ν = 300 Hz, cannot fit
for these data.
Fig. 3 shows the results of χ2-tests against a general constant peak sep-
aration of the twin kHz QPOs in the 8 individual sources. The minimum
χ2/d.o.f. are all with values >> 1 , which means that any constant peak
separation model cannot fit for these data anymore.
2.5 Testing the constant peak ratio ν2/ν1 = 3/2
From Fig. 1c (Fig. 1f) or Fig. 2c (Fig. 2f), we find that twin frequency
ratios distribute in a wide range from 1.2 to 4.2, with the averaged value 1.73
(1.50) for all known Z (Atoll) data. Obviously, a constant ratio ν2/ν1 = 3/2,
which can be applicable to some black hole QPO sources, is not consistent with
the observed NS/LMXB data. In the ν2/ν1 vs. ν2 plots of Fig. 1 and Fig. 2, the
frequency ratios systematically decrease with the upper-frequency for both Z
and Atoll sources. In detail, the incompatible 3:2 ratio peak distribution has
been also studied by Belloni et al. (2005) in several sources, who showed that
the distribution of QPO frequencies in Sco X-1, 4U 1608–52, 4U 1636–53, 4U
1728–34, and 4U 1820–30 is multi-peaked, with the peaks occurring at the
different ν2/ν1 ratios.
3 Conclusions
In this paper, the updated data sets of twin kHz QPO frequencies simul-
taneously detected in NS LMXBs are analyzed, and the power-law and linear
fittings are studied for the individual Z/Atoll and all Z/Atoll sources, respec-
tively. Our main conclusions are presented as follows. (1)In Fig.1 and Fig.2,
we can notice that a simple constant peak separation model, such as the beat-
frequency model (e.g. Strohmayer et al. 1996; Zhang et al. 1997; Miller et
al. 1998), or a constant peak ratio assumption, as in a naive extrapolation
of the observed resonant frequency ratio from the black hole sources to the
neutron stars (e.g. Abramowicz et al. 2003), cannot fit the observed data.
Namely, any simple constant peak separation and constant peak ratio models
are generally inconsistent with the data. The peak separations in all Z sources
tend to increase (decrease) with the upper-frequency if the upper-frequency
is less (larger) than ∼700 Hz. But this tendency does not appear in all Atoll
sources because of less amount of data at low kHz QPO frequency. Statis-
tically, the twin frequency ratios tend to decrease with the upper-frequency
in both Z and Atoll sources. (2)Our results show that the index of the fitted
power-law relation of Z source is generally bigger than that of Atoll source,
except GX 17+2. On the consideration of model, this different index value in
Z and Atoll sources might be related to the diversity in their luminosity or
magnetic field. However the linear correlations do not show any systematic
differences between Z and Atoll sources. (3)The power-law fitting is somewhat
better than the linear one for most of the sources, because the χ2/d.o.f. value
of the power-law correlation is generally less than that of linear one, and a lin-
ear correlation cannot give the firstly increasing and then decreasing tendency
of peak separations in all Z data. As a comparison with model’s prediction,
we mention the Relativistic Precession model (e.g. Stella & Vietri 1999) and
the Alfvén Wave Oscillation model (Zhang 2004), since both models can give
an approximated power-law correlation between the twin kHz QPOs, however
none of them can distinguish the influences of the luminosity of Z and Atoll
sources.
As a summary, if the future data still support the conclusions obtained
in the paper, they will pose the meaningful constraints on the models for
explaining kHz QPOs.
4 Acknowledgements
We are grateful for T. Belloni, M. Méndez, D. Psaltis and J. Homan for
providing the QPO data, and thank C.M. Zhang for discussions. We highly
appreciate the anonymous reviewers for their helpful comments.
5 References
Abramowicz, M.A., Bulik, T., Bursa, M., et al. Evidence for a 2:3 resonance
in Sco X-1 kHz QPOs. A&A404, L21–L24. 2003.
Abramowicz, M.A., Barret, D., Bursa, M., et al. AN, 326, 864–866. 2005.
Belloni,T., Psaltis, D., & van der Klis, M. A Unified Description of the Timing
Features of Accreting X-Ray Binaries. ApJ 572, 392–406. 2002.
Belloni, T., Méndez, M., & Homan, J. The distribution of kHz QPO frequen-
cies in bright low mass X-ray binaries. A&A437, 209–216. 2005.
Boutloukos, S., van der Klis, M., Altamirano, D., et al. Discovery of twin
kHz QPOs in the peculiar X-ray binary Circinus X-1. ApJ in press 2006.
(astro-ph/0608089)
Di Salvo, T., Méndez, M., & van der Klis, M. On the correlated spectral and
timing properties of 4U 1636-53: An atoll source at high accretion rates.
A&A406, 177–192. 2003.
Hasinger, G., & van der Klis, M. Two patterns of correlated X-ray timing
and spectral behaviour in low-mass X-ray binaries.A&A225, 79–96. 1989.
Homan, J., van der Klis, M., Jonker, P.G., et al. RXTE Observations of
the Neutron Star Low-Mass X-Ray Binary GX 17+2: Correlated X-Ray
Spectral and Timing Behavior. ApJ 568, 878–900. 2002.
Jonker, P.G., van der Klis, M., Wijnands, et al. The Power Spectral Properties
of the Z Source GX 340+0. ApJ 537, 374–386. 2000.
Jonker, P.G., van der Klis, M., Homan, J., et al. Low- and high-frequency
variability as a function of spectral properties in the bright X-ray binary
GX 5-1. MNRAS 333, 665–678. 2002a.
http://arxiv.org/abs/astro-ph/0608089
Jonker, P.G., Méndez, M., & van der Klis, M. Kilohertz quasi-periodic oscil-
lations difference frequency exceeds inferred spin frequency in 4U 1636-53.
MNRAS 336, L1–L5. 2002b.
Li, X.D., & Zhang, C.M. A Model for Twin Kilohertz Quasi-periodic Os-
cillations in Neutron Star Low-Mass X-Ray Binaries. ApJ 635, L57–L60.
2005.
Linares, M., van der Klis, M., Altamilano, D. et al. Discovery of Kilohertz
Quasi-periodic Oscillations and Shifted Frequency Correlations in the Ac-
creting Millisecond Pulsar XTE J1807-294. ApJ 634, 1250–1260. 2005.
Markwardt, C.B., Strohmayer, T.E., & Swank, J.H. Observation of Kilohertz
Quasi-periodic Oscillations from the Atoll Source 4U 1702-429 by the Rossi
X-Ray Timing Explorer. ApJ 512, L125–L129. 1999.
Méndez, M., van der Klis, M., & van Paradijs, J. Difference Frequency of
Kilohertz QPOs Not Equal to Half the Burst Oscillation Frequency in 4U
1636-53. ApJ 506, L117–L119. 1998.
Méndez, M., van der Klis, M., Wijnands, R., et al. Kilohertz Quasi-periodic
Oscillation Peak Separation Is Not Constant in the Atoll Source 4U 1608-
52. ApJ 505, L23–L26. 1998.
Méndez, M., & van der Klis, M. Precise Measurements of the Kilohertz Quasi-
periodic Oscillations in 4U 1728-34. ApJ 517, L5–L54. 1999.
Méndez, M., van der Klis, M. The harmonic and sideband structure of the
kilohertz quasi-periodic oscillations in Sco X-1. MNRAS 318, 938–942.
2000.
Migliari, S., van der Klis, M., & Fender, R. Evidence of a decrease of kHz
quasi-periodic oscillation peak separation towards low frequencies in 4U
1728-34 (GX 354-0). MNRAS 345, L35–L39. 2003.
Miller, M.C., Lamb, F.K., & Psaltis, D. . Sonic-Point Model of Kilohertz
Quasi-periodic Brightness Oscillations in Low-Mass X-Ray Binaries.ApJ
508, 791–830. 1998.
O’Neill, P.M., Kuulkers, E., Sood, R. K., et al. The X-ray fast-time variability
of Sco X-2 (GX 349+2) with RXTE. MNRAS 336, 217–232. 2002.
Psaltis, D., Méndez, M., Wijnands, R., et al. The Beat-Frequency Interpre-
tation of Kilohertz Quasi-periodic Oscillations in Neutron Star Low-Mass
X-Ray Binaries. ApJ 501, L95–L99. 1998.
Psaltis, D., Wijnands, R., Homan, J., Jonker, et al. On the Magnetospheric
Beat-Frequency and Lense-Thirring Interpretations of the Horizontal-Branch
Oscillation in the Z Sources. ApJ 520, 763–775. 1999a.
Psaltis, D., Belloni, T. & van der Klis, M. Correlations in Quasi-periodic
Oscillation and Noise Frequencies among Neutron Star and Black Hole
X-Ray Binaries. ApJ 520, 262–270. 1999b.
Stella, L., Vietri, M. & Morsink, S. Correlations in the Quasi-periodic Os-
cillation Frequencies of Low-Mass X-Ray Binaries and the Relativistic
Precession Model. ApJ 524, L63–L66. 1999.
Strohmayer, T., Zhang, W., Smale, A., et al. Millisecond X-Ray Variability
from an Accreting Neutron Star System. ApJ 469, L9–L12. 1996.
van der Klis, M., Swank, J.H., Zhang, W., et al. Discovery of Submillisecond
Quasi-periodic Oscillations in the X-Ray Flux of Scorpius X-1. ApJ 469,
L1–L4. 1996.
van der Klis, M., Wijnands, R., Horne, D. et al. Kilohertz Quasi-Periodic
Oscillation Peak Separation Is Not Constant in Scorpius X-1. ApJ 481,
L97–L100. 1997.
van der Klis, M. Millisecond Oscillations in X-ray Binaries. ARA&A38, 717–
760. 2000.
van der Klis, M. Rapid X-Ray Variability. in Compact stellar X-ray sources,
W.H.G. Lewin & M. van der Klis (eds.), Cambridge University Press, p.39.
2006. (astro-ph/0410551)
van Straaten, S., Ford, E.C., van der Klis, M., et al. Relations between Timing
Features and Colors in the X-Ray Binary 4U 0614+09. ApJ 540, 1049–
1061. 2000.
van Straaten, S., van der Klis, M., Di Salvo, T., et al. A Multi-Lorentzian
Timing Study of the Atoll Sources 4U 0614+09 and 4U 1728-34. ApJ 568,
912–930. 2002.
van Straaten, S., van der Klis, M., & Méndez, M. The Atoll Source States of
4U 1608-52. ApJ 596, 1155–1176. 2003.
Wijnands, R., van der Klis, M., Homan, J., et al. Quasi-periodic X-ray bright-
ness fluctuations in an accreting millisecond pulsar. Nature 424, 44–47.
2003.
Yin, H.X., Zhang, C.M., Zhao, Y.H., et al. A Study on the Correlations
between the Twin kHz QPO frequencies in Sco X-1. ChJAA5, 595–600.
2005.
http://arxiv.org/abs/astro-ph/0410551
Zhang, C.M. The MHD Alfven wave oscillation model of kHz Quasi Periodic
Oscillations of Accreting X-ray binaries. A&A423, 401–404. 2004.
Zhang, C.M., Yin, H.X., Zhao, Y.H., et al. The correlations between the
twin kHz quasi-periodic oscillation frequencies of low-mass X-ray binaries.
MNRAS 366, 1373–1377. 2006a.
Zhang, F., Qu, J.L., Zhang, C.M., et al. Timing Features of the Accretion-
driven Millisecond X-Ray Pulsar XTE J1807-294 in the 2003 March Out-
burst. ApJ 646, 1116–1124. 2006b.
Zhang, W., Strohmayer, T.E., & Swank, J.H. Neutron Star Masses and Radii
as Inferred from Kilohertz Quasi-periodic Oscillations. ApJ 482, L167–
L170. 1997.
0 200 400 600 800 1000 1200 1400
 (Hz)
 GX 5-1
 GX 17+2
 GX 340+0
 Sco X-1
 Other Z data
 Sco X-1
 GX 5-1
 GX 340+0
 GX 17+2
 All Z data
 = 300 Hz
0 200 400 600 800 1000 1200 1400
 4U 1728-34
 4U 0614+09
 4U 1608-52
 4U 1636-53
 Other data
 4U 1728-34
 4U 0614+09
 4U 1608-52
 4U 1636-53
 All Atoll data
 =300Hz
 =3/2
 (Hz)
Fig. 1. Plots of a and d ν1 vs. ν2, b and e ∆ν vs. ν2 and c and f ν2/ν1 vs.
ν2 for Z sources and Atoll sources. Power-law fitting lines and the two
reference lines (ν2/ν1 = 3/2, and ∆ν = 300 Hz) are presented also.
0 200 400 600 800 1000 1200 1400
 (Hz)
 GX 5-1
 GX 17+2
 GX 340+0
 Sco X-1
 Other Z data
 Sco X-1
 GX 5-1
 GX 17+2
 GX 340+0
 All Z data
 =300 Hz
 =3/2
0 200 400 600 800 1000 1200 1400
 4U 1728-34
 4U 0614+09
 4U 1608-52
 4U 1636-53
 Other Atoll data
 =3/2
 =300Hz
 4U 1728-34
 4U 0614+09
 4U 1608-52
 4U 1636-53
 All Atoll data
 (Hz)
Fig. 2. Plots of a and d ν1 vs. ν2, b and e ∆ν vs. ν2 and c and f ν2/ν1 vs. ν2
for Z sources and Atoll sources. Linear fitting lines and the two reference
lines (ν2/ν1 = 3/2, and ∆ν = 300 Hz) are presented also.
200 300 400
10000
 GX 5-1
 GX 17+2
 GX 340+0
 Sco X-1
 4U 1728-34
 4U 1636-53
 4U 1608-52
 4U 0614+09
 (Hz)
Fig. 3. χ2−tests for a constant peak separation of the 8 individual sources
list in Tab. 2.
Table 1
List of LMXBs with the simultaneously detected twin kHz QPOs.
Sources ν
(3) ν2/ν
1 References
(Hz) (Hz) (Hz)
Millisecond pulsar (2)
XTE J1807-294 127-360 353-587 179-247 1.51-2.78 1,2,
SAX J1808.4-3658 499 694 195 1.39 3
Z source (8)
Cir X-1 56-226 229-505 173-340 2.23-4.19 4
Sco X-1 544-852 844-1086 223-312 1.26-1.57 B,M,K
GX 340+0 197-565 535-840 275-413 1.49-2.72 B,K,P,5
XTE J1701-462 620 909 289 1.47 6
GX 349+2 712-715 978-985 266-270 1.37-1.38 B,K,7
GX 5-1 156-634 478-880 232-363 1.38-3.06 B,K,P,8
GX 17+2 475-830 759-1078 233-308 1.28-1.60 B,K,P,9
Cyg X-2 532 856.6 324 1.61 B,K,P
Atoll source (11)
4U 0614+09 153-823 449-1162 238-382 1.38-2.93 B,K,P,10,11
4U 1608-52 476-876 802-1099 224-327 1.26-1.69 M,B,K,12
4U 1636-53 644-921 971-1192 217-329 1.24-1.51 B,K,P,13,14
4U 1702-43 722 1055 333 1.46 K,P,15
4U 1705-44 776 1074 298 1.38 B,K,P
4U 1728-34 308-894 582-1183 271-359 1.31-1.89 B,K,P,11,16
KS 1731-260 903 1169 266 1.29 B,K,P
4U 1735-44 640-728 982-1026 296-341 1.41-1.53 B,K,P
4U 1820-30 790 1064 273 1.35 B,K,P
4U 1915-05 224-707 514-1055 290-353 1.49-2.3 B,K,P
XTEJ2123-058 849-871 1110-1140 261-270 1.31-1.31 B,K,P
(1):the range of ν1;
(2): the range of ν2;
(3): the range of ∆ν; (4): the range of
ν2/ν1. K: van der Klis 2000, van der Klis 2006; M: Méndez et al. 1998ab,
Méndez & van der Klis 1999, 2000; B: Belloni et al. 2002, Belloni et al.
2005; P: Psaltis et al. 1999ab. 1: Linares 2005; 2: Zhang et al. 2006b; 3:
Wijnands et al. 2003; 4: Boutloukos et al. 2006; 5: Jonker et al. 2000; 6:
Homan 2006 (personal communication); 7: O’Neill et al. 2002; 8: Jonker
et al. 2002a; 9: Homan et al. 2002; 10: van Straaten et al. 2002; 11: van
Straaten et al. 2000; 12: van Straaten et al. 2003; 13: Di Salvo et al.
2003; 14: Jonker et al. 2002b; 15: Markwardt et al. 1999; 16: Migliari et
al. 2003.
Table 2
List of the results of fittings and χ2-tests.
ν1 = a(ν2/(1000 Hz))
b Hz ν2 = Aν1+B Hz
Source∗
a b χ2/d.o.f. A B χ2/d.o.f.
Z source
Sco X-1 721.95 ± 0.69 1.85 ± 0.01 33.9/87 0.765 ± 0.007 445.84 ± 4.73 54.8/87
GX 340+0 763.85 ±38.03 2.12 ± 0.15 21.3/17 0.884 ± 0.067 374.96 ±24.41 27.2/17
GX 5-1 833.02 ±26.08 2.26 ± 0.10 29.7/27 0.828 ± 0.039 379.54 ±15.10 43.7/27
GX 17+2 723.40 ± 5.53 1.56 ± 0.06 11.2/19 0.906 ± 0.038 341.13 ±24.27 12.0/19
All Z 725.38 ± 2.50 1.87 ± 0.02 205.8/169 0.924 ± 0.012 336.22 ± 6.81 380.1/169
Atoll source
4U 0614+09 673.74 ± 4.77 1.49 ± 0.06 80.1/40 1.002 ± 0.033 320.75 ±20.37 68.4/40
4U 1608-52 717.36 ± 4.89 1.81 ± 0.08 7.8/16 0.781 ± 0.036 436.60 ±25.12 9.1/16
4U 1636-53 685.16 ±15.44 1.72 ± 0.16 10.7/11 0.737 ± 0.064 505.18 ±53.51 10.1/11
4U 1728-34 667.86 ± 5.59 1.51 ± 0.07 20.9/23 0.997 ± 0.046 329.33 ±32.09 25.5/23
All Atoll 683.48 ± 3.01 1.61 ± 0.04 293.3/109 0.912 ± 0.020 371.08 ±13.91 308.7/109
	Introduction
	Correlations between twin kHz QPOs
	A power law fitting
	A linear fitting
	Comparison between the power-law and the linear correlation
	Testing the constant peak separation =300 Hz
	Testing the constant peak ratio 2/1=3/2
	Conclusions
	Acknowledgements
	References
ABSTRACT
  The recently updated data of the twin kilohertz quasi-periodic oscillations
(kHz QPOs) in the neutron star low-mass X-ray binaries are analyzed. The
power-law fitting $\nu_{1}=a(\nu_{2}/1000)^{b}$ and linear fitting
$\nu_{2}=A\nu_{1}+B$ are applied, individually, to the data points of four Z
sources (GX 17+2, GX 340+0, GX 5-1 and Sco X-1) and four Atoll sources (4U
0614+09, 4U 1608-52, 4U 1636-53 and 4U 1728-34). The $\chi^{2}$-tests show that
the power-law correlation and linear correlation both can fit data well.
Moreover, the comparisons between the data and the theoretical models for kHz
QPOs are discussed.

<|endoftext|><|startoftext|>
Introduction 4
2 Fuzzball solutions on T 4 9
2.1 Chiral null models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 The IIA F1-NS5 system . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Dualizing further to the D1-D5 system . . . . . . . . . . . . . . . . . . . . . 12
3 Fuzzball solutions on K3 13
3.1 Heterotic chiral model in 10 dimensions . . . . . . . . . . . . . . . . . . . . 13
3.2 Compactification on T 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3 String-string duality to P-NS5 (IIA) on K3 . . . . . . . . . . . . . . . . . . 16
3.4 T-duality to F1-NS5 (IIB) on K3 . . . . . . . . . . . . . . . . . . . . . . . . 17
3.5 S-duality to D1-D5 on K3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4 D1-D5 fuzzball solutions 21
5 Vevs for the fuzzball solutions 26
5.1 Holographic relations for vevs . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.2 Application to the fuzzball solutions . . . . . . . . . . . . . . . . . . . . . . 30
6 Properties of fuzzball solutions 33
6.1 Dual field theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.2 Correspondence between geometries and ground states . . . . . . . . . . . . 34
6.3 Matching with the holographic vevs . . . . . . . . . . . . . . . . . . . . . . 35
6.4 A simple example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.5 Selection rules for curve frequencies . . . . . . . . . . . . . . . . . . . . . . . 40
6.6 Fuzzballs with no transverse excitations . . . . . . . . . . . . . . . . . . . . 41
7 Implications for the fuzzball program 43
A Conventions 46
A.1 Field equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
A.2 Duality rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
B Reduction of type IIB solutions on K3 49
B.1 S-duality in 6 dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
B.2 Basis change matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
C Properties of spherical harmonics 54
D Interpretation of winding modes 55
E Density of ground states with fixed R charges 59
1 Introduction
Over the last few years an interesting new proposal for the gravitational nature of black hole
microstates has emerged [1, 2, 3]; see also [4, 5, 6], and [7]. According to this proposal there
should exist non-singular horizon-free geometries associated with the black hole microstates.
These so-called fuzzball geometries should asymptotically approach the original black hole
geometry and should generically differ from each other around the horizon scale. In this
scenario the black hole provides only an average statistical description of the physics and
thus longstanding issues such as the information loss paradox would be resolved. The
underlying physics of the black hole would not be conceptually different from that of a
star, with the temperature and entropy being of statistical origin. Given the importance
of understanding black hole physics and its implications for quantum gravity, this proposal
should be developed, explored and tested where possible.
Many issues need to be addressed to implement the fuzzball proposal at a quantitative
and precise level. The proposal requires the existence of exponential numbers of horizon-
free non-singular solutions for each black hole. So the most basic of questions is whether
one can find such a number of solutions with the required properties and moreover what
precisely are the required properties for any given black hole. Moreover one would like to
show quantitatively how black hole properties emerge upon coarse-graining; for this one
needs to know the precise relationship between geometries and microstates.
Much of the recent work on this proposal has been focused on constructing fuzzball ge-
ometries for certain supersymmetric black holes with macroscopic horizons; for a summary
of progress in this direction see [8]. The method of construction here uses crucially super-
symmetry and known classifications of supersymmetric solutions: one looks for non-singular
horizon-free supersymmetric solutions with the correct charges to match those of the black
hole.
This method however has a number of limitations. One is that the supersymmetric
classifications are not sufficiently restrictive for cases of interest and thus one needs a specific
ansatz to make progress. To date many of the fuzzball geometries constructed are rather
atypical; for example, they have angular momenta much larger than those of a typical black
hole microstate. Whilst families of typical geometries are presumably contained in the
supersymmetric classification, finding an ansatz to construct families rather than isolated
examples is not easy.
Another key issue is that one does not know precisely what is the relationship between
a given geometry and the black hole microstates. This in turn means that one does not
know whether one has constructed the correct geometries to describe the black hole. Nor
does one know whether one has enough geometries to account for the black hole entropy
upon geometric quantization, using the methods of [9]. For example, in cases where the
dual theory has distinct Higgs and Coulomb branches, one needs to determine whether a
given fuzzball geometry describes Higgs or Coulomb branch physics. More importantly,
one would like to see explicitly how black hole properties emerge upon coarse graining; to
understand how to do such a computation properly the precise relation between the fuzzball
geometries and microstates is crucial.
To address this issue, we have advocated and developed the use of AdS/CFT methods.
That is, the supersymmetric black holes of interest admit a dual CFT description and
the fuzzball geometries therefore have a decoupling limit which is asymptotically AdS.
One can therefore use well-developed techniques of AdS/CFT, in particular Kaluza-Klein
holography [10], to extract field theory data from the geometry and diagnose precisely what
the geometry describes.
It is worth emphasizing at this point that the AdS/CFT correspondence both motivates
and supports the fuzzball picture. The gravity/gauge theory dictionary relates a given
asymptotically AdS geometry to either a deformation of the CFT or the CFT in a non-trivial
vacuum characterized by the expectation values of gauge invariant operators. Conversely,
one expects that for any stable state of the CFT (such as the BPS states) there exists
an asymptotically AdS solution, whose asymptotics encode the vevs of gauge invariant
operators in that state. If the field theory is in a pure state, there is no entropy and
one does not expect the corresponding geometry to have a horizon, and hence entropy.
AdS/CFT thus implies that the field theory in a given pure stable (black hole) state should
have a geometric dual with no horizon; there is however no guarantee that the geometry
should be well-described by supergravity alone, i.e. weakly curved everywhere1.
In our recent papers [11, 12], we have discussed in some detail the case of the D1-D5
system, for which (some) fuzzball geometries were constructed in [1]. Since this is a 2-charge
system, there is no macroscopic horizon: the naive geometry is singular, with the horizon
believed to form on taking into account α′ corrections. Whilst this is not a macroscopic
black hole system, there are a number of reasons to explore this case fully before moving
on to supersymmetric macroscopic black holes.
1There are additional subtleties in low dimensional quantum field theories due to the strong infrared
fluctuations. More properly one should view a given fuzzball geometry as dual to a wavefunction on the
Higgs branch of the field theory, but it seems in any case likely that such wavefunctions would be localized
around specific regions in the large N limit and thus that this issue does not play a key role at infinite N .
Firstly, one can obtain all fuzzball geometries in this system by dualities from known
solitonic solutions of F1-P systems. Thus one should be able to account for all the entropy,
and show how the average black hole description emerges. Moreover, the dual description
of this black hole is the simplest and best understood: the black hole entropy arises from
the degeneracy of the Ramond ground states of the dual (4, 4) CFT. This is an ideal system
in which to address the question of what is the precise correspondence between geome-
tries and microstates, and moreover how the properties of given microstates determine and
characterize the fuzzball geometries.
In the original work of [1], only a subset of the 2-charge fuzzball geometries were con-
structed using dualities from F1-P solutions. Recall that the D1-D5 system on T 4 is related
by dualities to the type II F1-P system, also on T 4, whilst the D1-D5 system on K3 is re-
lated to the heterotic F1-P system on T 4; the exact duality chains needed will be reviewed in
sections 2 and 3. Now the solution for a fundamental string carrying momentum in type II
is characterized by 12 arbitrary curves, eight associated with transverse bosonic excitations
and four associated with the bosonization of eight fermionic excitations on the string [13].
The corresponding heterotic string solution is characterized by 24 arbitrary curves, eight
associated with transverse bosonic excitations and 16 associated with charge waves on the
string.
In the work of [1], the duality chain was carried out for type II F1-P solutions on T 4 for
which only bosonic excitations in the transverse R4 are excited. That is, the solutions are
characterized by only four arbitrary curves; in the dual D1-D5 solutions these four curves
characterize the blow-up of the branes, which in the naive solutions are sitting in the origin
of the transverse R4, into a supertube. In this paper we carry out the dualities for generic
F1-P solutions in both the T 4 and K3 cases, to obtain generic 2-charge fuzzball solutions
with internal excitations. Note that partial results for the T 4 case were previously given
in the appendix of [3]; we will comment on the relation between our solutions and theirs
in section 2. The general solutions are then characterized by arbitrary curves capturing
excitations along the compact manifold M4, along with the four curves describing the
blow-up in R4. They describe a bound state of D1 and D5-branes, wrapped on the compact
manifold M4, blown up into a rotating supertube in R4 and with excitations along the part
of the D5-branes wrapping the M4.
The duality chain that uses string-string duality from heterotic on T 4 to type II on K3
provides a route for obtaining fuzzball solutions that has not been fully explored. One of
the results in this paper is to make explicit all steps in this duality route. In particular, we
work out the reduction of type IIB on K3 and show how S-duality acts in six dimensions.
These results may be useful in obtaining fuzzball solution with more charges.
In our previous work [11, 12], we made a precise proposal for the relationship between
the 2-charge fuzzball geometries characterized by four curves F i(v) and superpositions of R
ground states: a given geometry characterized by F i(v) is dual to a specific superposition of
R vacua with the superposition determined by the Fourier coefficients of the curves F i(v).
In particular, note that only geometries associated with circular curves are dual to a single
R ground state (in the usual basis, where the states are eigenstates of the R-charge). This
proposal has a straightforward extension to generic 2-charge geometries, which we will spell
out in section 6, and the extended proposal passes all kinematical and accessible dynamical
tests, just as in [11, 12].
In particular, we extract one point functions for chiral primaries from the asymptotically
AdS region of the fuzzball solutions. We find that chiral primaries associated with the middle
cohomology of M4 acquire vevs when there are both internal and transverse excitations;
these vevs hence characterize the internal excitations. Moreover, there are selection rules
for these vevs, in that the internal and transverse curves must have common frequencies.
These properties of the holographic vevs follow directly from the proposed dual super-
positions of ground states. The vevs in these ground states can be derived from three point
functions between chiral primaries at the conformal point. Selection rules for the latter,
namely charge conservation and conservation of the number of operators associated with
each middle cohomology cycle, lead to precisely the features of the vevs found holographi-
cally.
To test the actual values of the kinematically allowed vevs would require information
about the three point functions of all chiral primaries which is not currently known and
is inaccessible in supergravity. However, as in [12], these vevs are reproduced surprisingly
well by simple approximations for the three point functions, which follow from treating
the operators as harmonic oscillators. This suggests that the structure of the chiral ring
may simplify considerably in the large N limit, and it would be interesting to explore this
question further.
An interesting feature of the solutions is that they collapse to the naive geometry when
there are internal but no transverse excitations. One can understand this as follows. Ge-
ometries with only internal excitations are dual to superpositions of R ground states built
from operators associated with the middle cohomology of M4. Such operators account for
a finite fraction of the entropy, but have zero R charges with respect to the SO(4) R sym-
metry group. This means that they can only be characterized by the vevs of SO(4) singlet
operators but the only such operators visible in supergravity are kinematically prevented
from acquiring vevs. Thus it is consistent that in supergravity one could not distinguish
between such solutions: one would need to go beyond supergravity to resolve them (by, for
instance, considering vevs of singlet operators dual to string states).
This brings us to a recurring question in the fuzzball program: can it be implemented
consistently within supergravity? As already mentioned, rigorously testing the proposed
correspondence between geometries and superpositions of microstates requires information
beyond supergravity. Furthermore, the geometric duals of superpositions with very small or
zero R charges are not well-described in supergravity. Even if one has geometries which are
smooth supergravity geometries, these may not be distinguishable from each other within
supergravity: for example, their vevs may differ only by terms of order 1/N , which cannot
be reliably computed in supergravity.
The question of whether the fuzzball program can be implemented in supergravity could
first be phrased in the following way. Can one find a complete basis of fuzzball geometries,
each of which is well-described everywhere by supergravity, which are distinguishable from
each other within supergravity and which together span the black hole microstates? On
general grounds one would expect this not to be possible since many of the microstates
carry small quantum numbers. We quantify this discussion in the last section of this paper
in the context of both 2-charge and 3-charge systems.
To make progress within supergravity, however, it would suffice to sample the black hole
microstates in a controlled way. I.e. one could try to find a basis of geometries which are
well-described and distinguishable in supergravity and which span the black hole microstates
but for which each basis element is assigned a measure. In this approach, one would deal
with the fact that many geometries are too similar to be distinguished in supergravity
by picking representative geometries with appropriate measures. In constructing such a
representative basis, the detailed matching between geometries and black hole microstates
would be crucial, to correctly assign measures and to show that the basis indeed spans all
the black hole microstates.
The plan of this paper is as follows. In section 2 we determine the fuzzball geometries
for D1-D5 on T 4 from dualizing type II F1-P solutions whilst in section 3 we obtain fuzzball
geometries for D1-D5 on K3 from dualizing heterotic F1-P solutions. The resulting so-
lutions are of the same form and are summarized in section 4; readers interested only in
the solutions may skip sections 2 and 3. In section 5 we extract from the asymptotically
AdS regions the dual field theory data, one point functions for chiral primaries. In section
6 we discuss the correspondence between geometries and R vacua, extending the proposal
of [11, 12] and using the holographic vevs to test this proposal. In section 7 we discuss
more generally the implications of our results for the fuzzball proposal. Finally there are a
number of appendices. In appendix A we state our conventions for the field equations and
duality rules, in appendix B we discuss in detail the reduction of type IIB on K3 and ap-
pendix C summarizes relevant properties of spherical harmonics. In appendix D we discuss
fundamental string solutions with winding along the torus, and the corresponding duals
in the D1-D5 system. In appendix E we derive the density of ground states with fixed R
charges.
2 Fuzzball solutions on T 4
In this section we will obtain general 2-charge solutions for the D1-D5 system on T 4 from
type II F1-P solutions.
2.1 Chiral null models
Let us begin with a general chiral null model of ten-dimensional supergravity, written in
the form
ds2 = H−1(x, v)dv(−du +K(x, v)dv + 2AI(x, v)dxI) + dxIdxI ; (2.1)
e−2Φ = H(x, v); B(2)uv =
(H(x, v)−1 − 1); B(2)vI = H(x, v)
−1AI(x, v).
The conventions for the supergravity field equations are given in the appendix A.1. The
above is a solution of the equations of motion provided that the defining functions are
harmonic in the transverse directions, labeled by xI :
�H(x, v) = �K(x, v) = �AI(x, v) = (∂IA
I(x, v) − ∂vH(x, v)) = 0. (2.2)
Solutions of these equations appropriate for describing solitonic fundamental strings carry-
ing momentum were given in [14, 15]:
H = 1 +
|x− F (v)|6 , AI = −
QḞI(v)
|x− F (v)|6 , K =
Q2Ḟ (v)2
Q|x− F (v)|6 , (2.3)
where F I(v) is an arbitrary null curve describing the transverse location of the string, and
Ḟ I denotes ∂vF
I(v). More general solutions appropriate for describing solitonic strings with
fermionic condensates were discussed in [13]. Here we will dualise without using the explicit
forms of the functions, thus the resulting dual supergravity solutions are applicable for all
choices of harmonic functions.
The F1-P solutions described by such chiral null models can be dualised to give cor-
responding solutions for the D1-D5 system as follows. Compactify four of the transverse
directions on a torus, such that xi with i = 1, · · · , 4 are coordinates on R4 and xρ with
ρ = 5, · · · , 8 are coordinates on T 4. Then let v = (t−y) and u = (t+y) with the coordinate
y being periodic with length Ly ≡ 2πRy, and smear all harmonic functions over both this
circle and over the T 4, so that they satisfy
�R4H(x) = �R4K(x) = �R4AI(x) = 0, ∂iA
i = 0. (2.4)
Thus the harmonic functions appropriate for describing strings with only bosonic conden-
sates are
H = 1 +
|x− F (v)|2 ; Ai = −
dvḞi(v)
|x− F (v)|2 ; (2.5)
Aρ = −
dvḞρ(v)
|x− F (v)|2 ; K =
dv(Ḟi(v)
2 + Ḟρ(v)
|x− F (v)|2 .
Here |x−F (v)|2 denotes
i(xi−Fi(v))2. Note that neither Ḟi(v) nor Ḟρ(v) have zero modes;
the asymptotic expansions of AI at large |x| therefore begin at order 1/|x|3. Closure of the
curve in R4 automatically implies that Ḟi(v) has no zero modes. The question of whether
Ḟρ(v) has zero modes is more subtle: since the torus coordinate x
ρ is periodic, the curve
Fρ(v) could have winding modes. As we will discuss in appendix D, however, such winding
modes are possible only when the worldsheet theory is deformed by constant B fields. The
corresponding supergravity solutions, and those obtained from them by dualities, should
thus not be included in describing BPS states in the original 2-charge systems.
The appropriate chain of dualities to the D1−D5 system is
T5678→
D5y5678
NS5y5678
NS5y5678
 , (2.6)
to map to the type IIA NS5-F1 system. The subsequent dualities
NS5y5678
NS5y5678
D5y5678
 (2.7)
result in a D1-D5 system. Here the subscripts of Dpa1···ap denote the spatial directions
wrapped by the brane. In carrying out these dualities we use the rules reviewed in appendix
A.2. We will give details of the intermediate solution in the type IIA NS5-F1 system since
it differs from that obtained in [3].
2.2 The IIA F1-NS5 system
By dualizing the chiral null model from the F1-P system in IIB to F1-NS5 in IIA one obtains
the solution
ds2 = K̃−1[−(dt−Aidxi)2 + (dy −Bidxi)2] +Hdxidxi + dxρdxρ
e2Φ = K̃−1H, B
ty = K̃
−1 − 1, (2.8)
µ̄i = K̃
−1Bµ̄i , B
ij = −cij + 2K̃
−1A[iBj]
C(1)ρ = H
−1Aρ, C
tyρ = (HK̃)
−1Aρ, C
µ̄iρ = (HK̃)
−1Bµ̄i Aρ,
ijρ = (λρ)ij + 2(HK̃)
−1AρA[iBj], C
ρστ = ǫρστπH
−1Aπ,
where
K̃ = 1 +K −H−1AρAρ, dc = − ∗4 dH, dB = − ∗4 dA, (2.9)
dλρ = ∗4dAρ, Bµ̄i = (−Bi, Ai),
with µ̄ = (t, y). Here the transverse and torus directions are denoted by (i, j) and (ρ, σ)
respectively and ∗4 denotes the Hodge dual in the flat metric on R4, with ǫρστπ denoting the
Hodge dual in flat T 4 metric. The defining functions satisfy the equations given in (2.4).
The RR field strengths corresponding to the above potentials are
iρ = ∂i(H
−1Aρ), F
tyiρ = K̃
−1∂i(H
−1Aρ),
µ̄ijρ = 2K̃
−1Bµ̄
∂j](H
−1Aρ), F
iρστ = ǫρστπ∂i(H
−1Aπ), (2.10)
= K̃−1
6A[iBj∂k](H
−1Aρ) +Hǫijkl∂
l(H−1Aρ)
Thus the solution describes NS5-branes wrapping the y circle and the T 4, bound to funda-
mental strings delocalized on the T 4 and wrapping the y circle, with additional excitations
on the T 4. These excitations break the T 4 symmetry by singling out a direction within
the torus, and source multipole moments of the RR fluxes; the solution however has no net
D-brane charges.
Now let us briefly comment on the relation between this solution and that presented in
appendix B of [3]2. The NS-NS sector fields agree, but the RR fields are different; in [3] they
are given as 1, 3 and 5-form potentials. The relation of these potentials to field strengths
(and the corresponding field equations) is not given in [3]. As reviewed in appendix A.2, in
the presence of both electric and magnetic sources it is rather natural to use the so-called
democratic formalisms of supergravity [16], in which one includes p-form field strengths with
2We thank Samir Mathur for discussions on this issue.
p > 5 along with constraints relating higher and lower form field strengths. Any solution
written in the democratic formalism can be rewritten in terms of the standard formalism,
appropriately eliminating the higher form field strengths. If one interprets the RR forms
of [3] in this way, one does not however obtain a supergravity solution in the democratic
formalism; the Hodge duality constraints between higher and lower form field strengths are
not satisfied. Furthermore, one would not obtain from the RR fields of [3] the solution
written here in the standard formalism, after eliminating the higher forms.
2.3 Dualizing further to the D1-D5 system
The final steps in the duality chain are T-duality along a torus direction, followed by S-
duality. When T-dualizing further along a torus direction to a F1-NS5 solution in IIB, the
excitations along the torus mean that the dual solution depends explicitly on the chosen
T-duality cycle in the torus. We will discuss the physical interpretation of the distinguished
direction in section 4. In the following the T-duality is taken along the x8 direction, resulting
in the following D1-D5 system:
ds2 =
5 f̃1
[−(dt−Aidxi)2 + (dy −Bidxi)2] + f1/21 f
5 dxidx
i + f
5 dxρdx
e2Φ =
f5f̃1
f5f̃1
µ̄i =
ABµ̄i
f5f̃1
, (2.11)
ij = λij +
2AA[iBj]
f5f̃1
αβ = −ǫαβγf
5 Aγ , B
α8 = f
5 Aα,
C(0) = −f−11 A, C
ty = 1− f̃−11 , C
µ̄i = −f̃
ij = cij − 2f̃
1 A[iBj], C
tyij = λij +
f5f̃1
(cij + 2A[iBj]),
µ̄ijk
f5f̃1
cjk], C
= −ǫαβγf−15 A
γ , C
tyα8 = f
5 Aα,
αβγ8 = ǫαβγf
5 A, C
ijα8 = (λα)ij + f
5 Aαcij , C
ijαβ = −ǫαβγ(λ
ij + f
γcij),
where
f5 ≡ H, f̃1 = 1 +K −H−1(AαAα + (A)2), f1 = f̃1 +H−1(A)2,
dc = − ∗4 dH, dB = − ∗4 dA, Bµ̄i = (−Bi, Ai), (2.12)
dλα = ∗4dAα, dλ = ∗4dA.
Here µ̄ = (t, y) and we denote A8 as A with the remaining Aρ being denoted by Aα where
the index α runs over only 5, 6, 7. The Hodge dual over these coordinates is denoted by
ǫαβγ . Explicit expressions for these defining harmonic functions in terms of variables of the
D1-D5 system will be given in section 4.
The forms with components along the torus directions can be written more compactly
as follows. Introduce a basis of self-dual and anti-self dual 2-forms on the torus such that
ωα± =
(dx4+α± ∧ dx8 ± ∗T 4(dx4+α± ∧ dx8)), (2.13)
with α± = 1, 2, 3. These forms are normalized such that
ωα± ∧ ωβ± = ±(2π)4V δα±β± , (2.14)
where (2π)4V is the volume of the torus. Then the potentials wrapping the torus directions
can be expressed as
B(2)ρσ = C
tyρσ =
2f−15 A
α−ωα−ρσ , (2.15)
ijρσ =
(λij)
α− + f−15 A
α−cij
ωα−ρσ ,
C(4)ρστπ = ǫρστπf
with ǫρστπ being the Hodge dual in the flat metric on T
4. Note that these fields are
expanded only in the anti-self dual two-forms, with neither the self dual two-forms nor the
odd-dimensional forms on the torus being switched on anywhere in the solution. As we
will discuss later, this means the corresponding six-dimensional solution can be described
in chiral N = 4b six-dimensional supergravity. The components of forms associated with
the odd cohomology of T 4 reduce to gauge fields in six dimensions which are contained in
the full N = 8 six-dimensional supergravity, but not its truncation to N = 4b.
3 Fuzzball solutions on K3
In this section we will obtain general 2-charge solutions for the D1-D5 system on K3 from
F1-P solutions of the heterotic string.
3.1 Heterotic chiral model in 10 dimensions
The chiral model for the charged heterotic F1-P system in 10 dimensions is:
ds2 = H−1(−dudv + (K − 2α′H−1N (c)N (c))dv2 + 2AIdxIdv) + dxIdxI
B̂(2)uv =
(H−1 − 1), B̂(2)vI = H
−1AI , (3.1)
Φ̂ = −1
lnH, V̂ (c)v = H
−1N (c),
where I = 1, · · · , 8 labels the transverse directions and V̂ (c)m are Abelian gauge fields, with
((c) = 1, · · · , 16) labeling the elements of the Cartan of the gauge group. The fields are
denoted with hats to distinguish them from the six-dimensional fields used in the next
subsection. The equations of motion for the heterotic string are given in appendix A.1; here
again the defining functions satisfy
�H(x, v) = �K(x, v) = �AI(x, v) = (∂IA
I(x, v)− ∂vH(x, v)) = �N (c) = 0. (3.2)
For the solution to correspond to a solitonic charged heterotic string, one takes the following
solutions
H = 1 +
|x− F (v)|6 , AI = −
QḞI(v)
|x− F (v)|6 , N
(c) =
q(c)(v)
|x− F (v)|6 ,
Q2Ḟ (v)2 + 2α′q(c)q(c)(v)
Q|x− F (v)|6 , (3.3)
where F I(v) is an arbitrary null curve in R8; q(c)(v) is an arbitrary charge wave and ḞI(v)
denotes ∂vFI(v). Such solutions were first discussed in [14, 15], although the above has
a more generic charge wave, lying in U(1)16 rather than U(1). In what follows it will be
convenient to set α′ = 1
These solutions can be related by a duality chain to fuzzball solutions in the D1-D5
system compactified on K3. The chain of dualities is the following:
Het,T 4
NS5ty,K3
NS5ty,K3
D5ty,K3
(3.4)
The first step in the duality is string-string duality between the heterotic theory on T 4 and
type IIA on K3. Again the subscripts of Dpa1···ap denote the spatial directions wrapped by
the brane. To use this chain of dualities on the charged solitonic strings given above, the
solutions must be smeared over the T 4 and over v, so that the harmonic functions satisfy
�R4H = �R4K = �R4AI = �R4N
(c) = ∂iA
i = 0 (3.5)
where i = 1, · · · , 4 labels the transverse R4 directions. Note that although the chain of
dualities is shorter than in the previous case there are various subtleties associated with it,
related to the K3 compactification, which will be discussed below.
3.2 Compactification on T 4
Compactification of the heterotic theory on T 4 is straightforward, see [18, 19] and the review
[20]. The 10-dimensional metric is reduced as
Ĝmn =
gMN +GρσV
(1) ρ
(1) σ
(1) ρ
M Gρσ
(1) σ
N Gρσ Gρσ
 , (3.6)
where V
(1) ρ
M with ρ = 1, · · · 4, are KK gauge fields. (Recall that the ten-dimensional quan-
tities are denoted with hats to distinguish them from six-dimensional quantities.) The
reduced theory contains the following bosonic fields: the graviton gMN , the six-dimensional
dilaton Φ6, 24 Abelian gauge fields V
M ≡ (V
(1) ρ
M , V
M ρ, V
(3) (c)
M ), a two form BMN and an
O(4, 20) matrix of scalars M . Note that the index (a), (b) for the SO(4, 20) vector runs
from (1, · · · , 24). These six-dimensional fields are related to the ten-dimensional fields as
Φ6 = Φ̂−
ln detGρσ ;
M ρ = B̂
Mρ + B̂
(1) σ
V̂ (c)ρ V
(3) (c)
M ; (3.7)
(3) (c)
M = V̂
M − V̂
(1) ρ
HMNP = 3(∂[M B̂
L(a)(b)F (V )
with the metric gMN and V
(1) ρ
M defined in (3.6). The matrix L is given by
0 −I20
 , (3.8)
where In denotes the n× n identity matrix. The scalar moduli are defined via
M = ΩT1
G−1 −G−1C −G−1V T
−CTG−1 G+ CTG−1C + V TV CTG−1V T + V T
−V G−1 V G−1C + V I16 + V G−1V T
Ω1, (3.9)
where G ≡ [Ĝρσ ], C ≡ [12 V̂
σ + B̂
ρσ ] and V ≡ [V̂ (c)ρ ] are defined in terms of the
components of the 10-dimensional fields along the torus. The constant O(4, 20) matrix Ω1
is given by
I4 I4 0
−I4 I4 0
. (3.10)
This matrix arises in (3.9) as follows. In [18, 20] the matrix L was chosen to be off-diagonal,
but for our purposes it is useful for L to be diagonal. An off-diagonal choice is associated
with an off-diagonal intersection matrix for the self-dual and anti-self-dual forms of K3,
but this is an unnatural choice for our solutions, in which only anti-self-dual forms are
active. Thus relative to the conventions of [18, 20] we take L → ΩT1 LΩ1, which induces
M → ΩT1MΩ1 and F → ΩT1 F . The definitions of this and other constant matrices used
throughout the paper are summarized in appendix B.2.
These fields satisfy the equations of motion following from the action
−ge−2Φ6 [R+ 4(∂Φ6)2 −
H23 −
F (V )
MN (LML)(a)(b)F (V )
(b)MN
tr(∂MML∂
MML)], (3.11)
where α′ has been set to 1/4 and κ26 = κ
10/V4 with V4 the volume of the torus.
The reduction of the heterotic solution to six dimensions is then
ds2 = H−1
−dudv +
K −H−1(1
(N (c))2 + (Aρ)
dv2 + 2Aidx
+ dxidx
Buv =
(H−1 − 1), Bvi = H−1Ai, Φ6 = −12 lnH (3.12)
V (a)v =
2H−1Aρ,H
−1N (c)
, M = I24,
where i = 1, · · · , 4 runs over the transverse R4 directions and ρ = 5, · · · , 8 runs over the
internal directions of the T 4. Thus the six-dimensional solution has only one non-trivial
scalar field, the dilaton, with all other scalar fields being constant.
3.3 String-string duality to P-NS5 (IIA) on K3
Given the six-dimensional heterotic solution, the corresponding IIA solution in six dimen-
sions can be obtained as follows. Compactification of type IIA on K3 leads to the following
six-dimensional theory [17]:
6 [R′ + 4(∂Φ′6)
2 − 1
tr(∂MM
′L∂MM ′L)] (3.13)
F ′(V )
MN (LM
′L)(a)(b)F
′(V )
(b)MN
B′2 ∧ F ′2(V )(a) ∧ F ′2(V )(b)L(a)(b).
The field content is the same as for the heterotic theory in (3.11); note that in contrast to
(3.7) there is no Chern-Simons term in the definition of the 3-form field strength, that is,
H ′MNP = 3∂[MB
The rules for string-string duality are [17]:
Φ′6 = −Φ6, g′MN = e−2Φ6gMN , M ′ =M, V
M = V
H ′3 = e
−2Φ6 ∗6 H3; (3.14)
these transform the equations of motion derived from (3.11) into ones derived from the
action (3.13).
Acting with this string-string duality on the heterotic solutions (3.12) yields, dropping
the primes on IIA fields:
ds2 = −dudv + (K −H−1((N (c))2/2 + (Aρ)2))dv2 + 2Aidxidv +Hdxidxi,
Hvij = −ǫijkl∂kAl, Hijk = ǫijkl∂lH, Φ6 =
lnH, (3.15)
V (a)v =
2H−1Aρ,H
−1N (c)
, M = I24,
with ǫijkl denoting the dual in the flat R
4 metric. This describes NS5-branes on type IIA,
wrapped on K3 and on the circle direction y, carrying momentum along the circle direction.
3.4 T-duality to F1-NS5 (IIB) on K3
The next step in the duality chain is T-duality on the circle direction y to give an NS5-F1
solution of type IIB on K3. It is most convenient to carry out this step directly in six
dimensions, using the results of [22] on T-duality of type II theories on K3× S1.
Recall that type IIB compactified on K3 gives d = 6, N = 4b supergravity coupled to 21
tensor multiplets, constructed by Romans in [23]. The bosonic field content of this theory
is the graviton gMN , 5 self-dual and 21 anti-self dual tensor fields and an O(5,21) matrix of
scalars M which can be written in terms of a vielbein M−1 = V TV . Following the notation
of [30] the bosonic field equations may be written as
RMN = 2P
PQ +HrMPQH
∇MPnrM = QMnmPmrM +QMrsPnsM +
HnMNPHrMNP , (3.16)
along with Hodge duality conditions on the 3-forms
∗6Hn3 = Hn3 , ∗6Hr3 = −Hr3 , (3.17)
In these equations (m,n) are SO(5) vector indices running from 1 to 5 whilst (r, s) are
SO(21) vector indices running from 6 to 26. The 3-form field strengths are given by
Hn = GAV nA ; H
r = GAV rA, (3.18)
where A ≡ {n, r} = 1, · · · , 26; GA = dbA are closed and the vielbein on the coset space
SO(5, 21)/(SO(5) × SO(21)) satisfies
V T ηV = η, V =
 , η =
0 −I21
 . (3.19)
The associated connection is
dV V −1 =
2P rn Qrs
 , (3.20)
where Qmn and Qrs are antisymmetric and the off-diagonal block matrices Pms and P rn
are transposed to each other. Note also that there is a freedom in choosing the vielbein;
SO(5) × SO(21) transformations acting on H3 and V as
V → OV, H3 → OH3, (3.21)
leave G3 and M−1 unchanged. Note that the field equations (3.16) can also be derived from
the SO(5, 21) invariant Einstein frame pseudo-action [21]
tr(∂M−1∂M)− 1
GAMNPM−1ABG
, (3.22)
with the Hodge duality conditions (3.17) being imposed independently.
Now let us consider the T-duality relating a six-dimensional IIB solution to a six-
dimensional IIA solution of (3.13); the corresponding rules were derived in [22]. Given
that the six-dimensional IIA supergravity has only an SO(4, 20) symmetry, relating IIB to
IIA requires explicitly breaking the SO(5, 21) symmetry of the IIB action down to SO(4, 20).
That is, one defines a conformal frame in which only an SO(4, 20) subgroup is manifest and
in which the action reads
R+ 4(∂Φ)2 +
tr(∂M−1∂M)
∂l(a)M−1
(a)(b)
∂l(b)
GAMNPM−1ABG
. (3.23)
The SO(5, 21) matrix M−1 has now been split up into the dilaton Φ, an SO(4, 20) vector
l(a) and an SO(4, 20) matrix M−1
(a)(b)
, and we have chosen the parametrization
M−1AB = Ω
e−2Φ + lTM−1l + 1
e2Φl4 −1
e2Φl2 (lTM−1)(b) +
e2Φl2(lTL)(b)
e2Φl2 e2Φ −e2Φ(lTL)(b)
(M−1l)(a) +
e2Φl2(Ll)(a) −e2Φ(Ll)(a) M−1(a)(b) + e
2Φ(Ll)(a)(l
TL)(b)
(3.24)
where l2 = l(a)l(b)L(a)(b), L(a)(b) was defined in (3.8) and Ω3 is a constant matrix defined in
appendix B.2.
The fields Φ, l(a) and M−1 and half of the 3-forms can now be related to the IIA fields
of section 3.3 by the following T-duality rules (given in terms of the 2-form potentials bA)
[22]:
g̃yy = g
yy , b̃
yM + b̃
g−1yy gyM , (3.25)
g̃yM = g
yy ByM , b̃
MN + b̃
g−1yy (BMN + 2(gy[MBN ]y)),
g̃MN = gMN − g−1yy (gyMgyN −ByMByN ), l̃(a) = V (a)y ,
Φ̃ = Φ− 1
log |gyy|, M̃−1(a)(b) =M
(a)(b)
(a)+1
M − g
y gyM ), (1 ≤ (a) ≤ 24),
Here y is the T-duality circle, the six-dimensional index M excludes y and IIB fields are
denoted by tildes to distinguish them from IIA fields. The other half of the tensor fields,
that is
(b̃1yM − b̃26yM ), (b̃1MN − b̃26MN ), b̃
(a)+1
MN , b̃
(a)+1
, can then be determined using the Hodge
duality constraints (3.17).
We now have all the ingredients to obtain the T-dual of the IIA solution (3.15) along
y ≡ 1
(u−v). The IIA solution is expressed in terms of harmonic functions which also depend
on the null coordinate v, and thus one needs to smear the solutions before dualizing. Note
that it is the harmonic functions (H,K,AI , N (c)) which must be smeared over v, rather than
the six-dimensional fields given in (3.15), since it is the former that satisfy linear equations
and can therefore be superimposed.
The Einstein frame metric and three forms are given by
ds2 =
[−(dt−Aidxi)2 + (dy −Bidxi)2)] +
HK̃dxidx
GAtyi = ∂i
, GAµ̄ij = −2∂[i
, (3.26)
GAijk = ǫijkl∂
lnA + 6∂[i
AjBk]
where
(H +K + 1, 04) , n
−2Aρ,−
2N (c),H −K − 1
, (3.27)
K̃ = 1 +K −H−1(1
(N (c))2 + (Aρ)
2), dB = − ∗4 dA, Bµi = (−Bi, Ai).
Recall that n = 1, · · · , 5 and r = 6, · · · , 26 and ∗4 denotes the dual on flat R4; µ̄ = (t, y).
The SO(4, 20) scalars are given by
, l(a) =
2H−1Aρ,H
−1N (c)
, M = I24. (3.28)
The SO(5, 21) scalar matrix M−1 = V TV in (3.24) can then conveniently be expressed in
terms of the vielbein
V = ΩT3
H−1K̃ 0 0
H3K̃)−1(A2ρ +
(N (c))2)
HK̃−1 −
HK̃−1l(b)
l(a) 0 I24
Ω3. (3.29)
3.5 S-duality to D1-D5 on K3
One further step in the duality chain is required to obtain the D1-D5 solution in type IIB,
namely S duality. However, in the previous section the type II solutions have been given in
six rather than ten dimensions. To carry out S duality one needs to specify the relationship
between six and ten dimensional fields. Whilst the ten-dimensional SL(2, R) symmetry is
part of the six-dimensional symmetry group, its embedding into the full six-dimensional
symmetry group is only defined once one specifies the uplift to ten dimensions. The details
of the dimensional reduction are given in appendix B, with the six-dimensional S duality
rules being given in (B.16); the S duality leaves the Einstein frame metric invariant, and
acts as a constant rotation and similarity transformation on the three forms GA and the
matrix of scalars M respectively. The S-dual solution is thus
ds2 =
f5f̃1
[−(dt−Aidxi)2 + (dy −Bidxi)2)] +
f5f̃1dxidx
i, (3.30)
GAtyi = ∂i
f5f̃1
, GAµ̄ij = −2∂[i
f5f̃1
GAijk = ǫijkl∂
lmA + 6∂[i
f5f̃1
AjBk]
(f5 + F1)
, (3.31)
(f5 − F1),−2Aα,−
2N (c), 2A5
((f5 − F1),−2Aα− , 2A) .
Here the index α = 6, 7, 8. Note that the specific reduction used here, see appendix B,
distinguished A5 from the other Aρ and N
(c). A different embedding would single out
a different harmonic function, and hence a different vector, and it is thus convenient to
introduce (A,Aα−) to denote the choice of splitting more abstractly. Also as in (2.12) it is
convenient to introduce the following combinations of harmonic functions:
f5 = H, f̃1 = 1 +K −H−1(A2 +Aα−Aα−), (3.32)
F1 = 1 +K, f1 = f̃1 +H
−1A2.
The vielbein of scalars is given by
V = ΩT4
f−11 f̃1 0 0 0 0
f̃−11 f1 −GAF1 (
f1f̃1)
−1A −GA kγ
−FA 0
f−15 f1 0 0
FA 0 −1
f−15 F (k
1 −Fkγ
0 0 f−15 k
γ 0 I22
Ω4, (3.33)
where to simplify notation quantities (F,G) are defined as
F = (f1f5)
−1/2, G = (f1f̃1f
−1/2. (3.34)
We also define the 22-dimensional vector kγ as
kγ = (03,
2Aα−). (3.35)
Here γ = 1, · · · , b2 where the second Betti number is b2 = 22 for K3. Using the reduction
formulae (B.13) and (B.14), the six-dimensional solution (3.30), (3.33) can be lifted to ten
dimensions, resulting in a solution with an analogous form to the T 4 case (2.11). We will
thus summarize the solution for both cases in the following section.
4 D1-D5 fuzzball solutions
In this section we will summarize the D1-D5 fuzzball solutions with internal excitations, for
both the K3 and T 4 cases. In both cases the solutions can be written as
ds2 =
[−(dt−Aidxi)2 + (dy −Bidxi)2] + f1/21 f
5 dxidx
i + f
e2Φ =
f5f̃1
f5f̃1
µ̄i =
ABµ̄i
f5f̃1
, (4.1)
ij = λij +
2AA[iBj]
f5f̃1
, B(2)ρσ = f
γωγρσ, C
(0) = −f−11 A,
ty = 1− f̃−11 , C
µ̄i = −f̃
i , C
ij = cij − 2f̃
1 A[iBj],
tyij = λij +
f5f̃1
(cij + 2A[iBj]), C
µ̄ijk =
f5f̃1
cjk],
tyρσ = f
γωγρσ, C
ijρσ = (λ
ij + f
γcij)ω
ρσ, C
ρστπ = f
5 Aǫρστπ,
where we introduce a basis of self-dual and anti-self-dual 2-forms ωγ ≡ (ωα+ , ωα−) with
γ = 1, · · · , b2 on the compact manifold M4. For both T 4 and K3 the self-dual forms are
labeled by α+ = 1, 2, 3 whilst the anti-self-dual forms are labeled by α− = 1, 2, 3 for T
4 and
α− = 1, · · · 19 for K3. The intersections and normalizations of these forms are defined in
(2.13), (2.14) and (B.4). The solutions are expressed in terms of the following combinations
of harmonic functions (H,K,Ai,A,Aα−)
f5 = H; f̃1 = 1 +K −H−1(A2 +Aα−Aα−); f1 = f̃1 +H−1A2;
kγ = (03,
2Aα−); dB = − ∗4 dA; dc = − ∗4 df5; (4.2)
dλγ = ∗4dkγ ; dλ = ∗4dA; Bµ̄i = (−Bi, Ai),
where µ̄ = (t, y) and the Hodge dual ∗4 is defined over (flat) R4, with the Hodge dual
in the Ricci flat metric on the compact manifold being denoted by ǫρστπ. The constant
term in C
ty is chosen so that the potential vanishes at asymptotically flat infinity. The
corresponding RR field strengths are
i = −∂i
f−11 A
tyi = (f1f̃1f
f25∂if̃1 + f5A∂iA−A2∂if5
µ̄ij = (f
5 f1f̃1)
(f5∂j]f̃1 + f5A∂j]A−A2∂j]f5) + 2f̃1f25∂[iB
ijk = −ǫijkl(∂
lf5 − f−11 A∂lA)− 6f
1 ∂[i(AjBk]) (4.3)
+(f25f1f̃1)
6A[iBj(f5∂k]f̃1 + f5A∂k]A−A2∂k]f5)
iρσ = f
1 A∂i(f
γ)ωγρσ,
iρστπ = ǫρστπ∂i(f
5 A), F
tyijk = ǫijklf̃
1 f5∂
l(f−15 A),
µ̄ijkl = −ǫijklf5f̃
m(f−15 A),
tyiρσ = f̃
1 ∂i(k
γ/f5)ω
ρσ, F
µ̄ijρσ = 2f̃
∂j](f
γ)ωγρσ,
ijkρσ =
6f̃−11 A[iBj∂k](f
γ) + ǫijklf5∂
l(f−15 k
ωγρσ.
It has been explicitly checked that this is a solution of the ten-dimensional field equations
for any choices of harmonic functions (H,K,Ai,A,Aα−) with ∂iAi = 0. Note that in the
case of K3 one needs the identity (B.15) for the harmonic forms to check the components
of the Einstein equation along K3.
We are interested in solutions for which the defining harmonic functions are given by
H = 1 +
|x− F (v)|2 ; Ai = −
dvḞi(v)
|x− F (v)|2 , (4.4)
A = −Q5
dvḞ(v)
|x− F (v)|2 ; A
α− = −Q5
dvḞα−(v)
|x− F (v)|2 ,
dv(Ḟ (v)2 + Ḟ(v)2 + Ḟα−(v)2)
|x− F (v)|2 .
In these expressions Q5 is the 5-brane charge and L is the length of the defining curve in
the D1-D5 system, given by
L = 2πQ5/R, (4.5)
where R is the radius of the y circle. Note that Q5 has dimensions of length squared and is
related to the integral charge via
Q5 = α
′n5 (4.6)
(where gs has been set to one). Assuming that the curves (Ḟ(v), Ḟα− (v)) do not have zero
modes, the D1-brane charge Q1 is given by
dv(Ḟ (v)2 + Ḟ(v)2 + Ḟα−(v)2), (4.7)
and the corresponding integral charge is given by
, (4.8)
where (2π)4V is the volume of the compact manifold. The mapping of the parameters from
the original F1-P systems to the D1-D5 systems was discussed in [1] and is unchanged here.
The fact that the solutions take exactly the same form, regardless of whether the compact
manifold is T 4 or K3, is unsurprising given that only zero modes of the compact manifold
are excited.
The solutions defined in terms of the harmonic functions (4.4) describe the complete
set of two-charge fuzzballs for the D1-D5 system on K3. In the case of T 4, these describe
fuzzballs with only bosonic excitations; the most general solution would include fermionic
excitations and thus more general harmonic functions of the type discussed in [13]. Solutions
involving harmonic functions with disconnected sources would be appropriate for describing
Coulomb branch physics. Note that, whilst the solutions obtained by dualities from super-
symmetric F1-P solutions are guaranteed to be supersymmetric, one would need to check
supersymmetry explicitly for solutions involving other choices of harmonic functions.
In the final solutions one of the harmonic functions A describing internal excitations is
singled out from the others. In the original F1-P system, the solutions pick out a direction
in the internal space. For the type II system on T 4, the choice of Aρ singles out a direction
in the torus whilst in the heterotic solution the choice of (Aρ, N
(c)) singles out a direction
in the 20d internal space. Both duality chains, however, also distinguish directions in the
internal space. In the T 4 case one had to choose a direction in the torus, whilst in the
K3 case the choice is implicitly made when one uplifts type IIB solutions from six to ten
dimensions. In particular, the uplift splits the 21 anti-self-dual six-dimensional 3-forms into
19 + 1 + 1 associated with the ten-dimensional (F (5), F (3),H(3)) respectively.
When there are no internal excitations, the final solutions must be independent of the
choice of direction made in the duality chains but this does not remain true when the original
solution breaks the rotational symmetry in the internal space. A is the component of the
original vector along the direction distinguished in the duality chain, whilst Aα− are the
components orthogonal to this direction. When there are no excitations along the direction
picked out by the duality, i.e. A = 0, the solution considerably simplifies, becoming
ds2 =
(f1f5)1/2
[−(dt−Aidxi)2 + (dy −Bidxi)2] + f1/21 f
5 dxidx
i + f
e2Φ =
, B(2)ρσ = f
γωγρσ, C
ty = 1− f−11 , C
µ̄i = −f
ij = cij − 2f
1 A[iBj], C
tyρσ = f
γωγρσ, C
ijρσ = (λ
ij + f
γcij)ω
In this solution the internal excitations induce fluxes of the NS 3-form and RR 5-form along
anti-self dual cycles in the compact manifold (but no net 3-form or 5-form charges). By
contrast the excitations parallel to the duality direction induce a field strength for the RR
axion, NS 3-form field strength in the non-compact directions and RR 5-form field strength
along the compact manifold (but again no net charges).
Let us also comment on the M4 moduli in our solutions. The solutions are expressed in
terms of a Ricci flat metric on M4 and anti-self dual harmonic two forms. The forms satisfy
ωγρσω
δρσ = Dǫ δdγǫ ≡ δγδ , (4.9)
where the intersection matrix dδγ and the matrix D
δ relating the basis of forms and dual
forms are defined in (B.4) and (B.6) respectively. The latter condition on D
ǫ arose from
the duality chain, and followed from the fact that in the original F1-P solutions the internal
manifold had a flat square metric. Thus, the final solutions are expressed at a specific point
in the moduli space of M4 because the original F1-P solutions have specific fixed moduli.
It is straightforward to extend the solutions to general moduli: one needs to change
f̃1 = 1 +K −H−1(A2 +Aα−Aα−) → 1 +K −H−1(A2 + 12k
γkδDǫ δdγǫ), (4.10)
with kγ as defined in (4.2), to obtain the solution for more general D
Given a generic fuzzball solution, one would like to check whether the geometry is indeed
smooth and horizon-free. For the fuzzballs with no internal excitations this question was
discussed in [3], the conclusion being that the solutions are non-singular unless the defining
curve F i(v) is non-generic and self-intersects. In the appendix of [3], the smoothness of
fuzzballs with internal excitations was also discussed. However, their D1-D5 solutions were
incomplete: only the metric was given, and this was effectively given in the form (3.30)
rather than (4.1). Nonetheless, their conclusion remains unchanged: following the same
discussion as in [3] one can show that a generic fuzzball solution with internal excitations is
non-singular provided that the defining curve F i(v) does not self-intersect and Ḟi(v) only
has isolated zeroes. In particular, if there are no transverse excitations, F i(v) = 0, the
solution will be singular as discussed in section 6.6.
One can show that there are no horizons as follows. The harmonic function f5 is clearly
positive definite, by its definition. The functions (f1, f̃1) are also positive definite, since
they can be rewritten as a sum of positive terms as
f5f̃1 =
|x− F |2
dvḞ 2
|x− F |2
(4.11)
dv(Ḟ (v))2 + (Ḟα−(v))2
|x− F |2
dvdv′
(Ḟ(v)− Ḟ(v′))2 + (Ḟα−(v)− Ḟα−(v′))2
|x− F (v)|2 |x− F (v′)|2
and a corresponding expression for f5f1. Note that in the decoupling limit only the terms
proportional to Q25 remain, and these are also manifestly positive definite. Given that the
defining functions have no zeroes anywhere, the geometry therefore has no horizons.
Now let us consider the conserved charges. From the asymptotics one can see that the
fuzzball solutions have the same mass and D1-brane, D5-brane charges as the naive solution;
the latter are given in (4.6) and (4.8) whilst the ADM mass is
(Q1 +Q5), (4.12)
where Ly = 2πR, Ω3 = 2π
2 is the volume of a unit 3-sphere, and 2κ26 = (2κ
2)/(V (2π)4)
with 2κ2 = (2π)7(α′)4 in our conventions. The fuzzball solutions have in addition angular
momenta, given by
J ij =
dv(F iḞ j − F jḞ i). (4.13)
These are the only charges; the fields F (1) and F (5) fall off too quickly at infinity for the
corresponding charges to be non-zero. One can compute from the harmonic expansions
of the fields dipole and more generally multipole moments of the charge distributions. A
generic solution breaks completely the SO(4) rotational invariance in R4, and this symmetry
breaking is captured by these multipole moments.
However, the multipole moments computed at asymptotically flat infinity do not have a
direct interpretation in the dual field theory. In contrast, the asymptotics of the solutions in
the decoupling limit do give field theory information: one-point functions of chiral primaries
are expressed in terms of the asymptotic expansions (and hence multipole moments) near
the AdS3 ×S3 boundary. Thus it is more useful to compute in detail the latter, as we shall
do in the next section.
5 Vevs for the fuzzball solutions
In the decoupling limit all of the fuzzball solutions are asymptotic to AdS3 × S3 × M4,
where M4 is T 4 or K3. Therefore one can use AdS/CFT methods to extract holographic
data from the geometries. In particular, the asymptotics of the six-dimensional solutions
near the AdS3 × S3 boundary encode the vevs of chiral primary operators in the dual field
theory.
The precise relationship between asymptotics and vevs is however rather subtle. A
systematic method for extracting vevs from asymptotically AdS ×X solutions (with X an
arbitrary compact manifold) was only recently constructed, in [10], building on earlier work
[24, 25, 26, 27, 28], see also the review [29]. This method of Kaluza-Klein holography was
then applied to the case of asymptotically AdS3×S3 solutions of d = 6, N = 4b supergravity
coupled to nt tensor multiplets in [11, 12] and in what follows we will make use of many of
the results derived there.
For fuzzball solutions on K3, the relevant solution of six-dimensional N = 4b super-
gravity coupled to 21 tensor multiplets was given explicitly in (3.30). For the case of T 4,
we obtained the solution in ten dimensions, but there is a corresponding six-dimensional
solution of N = 4b supergravity coupled to 5 tensor multiplets. This solution is of exactly
the same form as the K3 solution given in (3.30), but with the index α− = 1, 2, 3. Thus
in what follows we will analyze both cases simultaneously. As mentioned earlier, the T 4
solution reduces to a solution of d = 6, N = 4b supergravity rather than a solution of
d = 6, N = 8 supergravity because forms associated with the odd cohomology of T 4 (and
hence six-dimensional vectors) are not present in our solutions.
5.1 Holographic relations for vevs
Consider an AdS3 × S3 solution of the six-dimensional field equations (3.16), such that
ds26 =
(−dt2 + dy2 + dz2) + dΩ23
; (5.1)
G5 = H5 ≡ go5 =
Q1Q5(rdr ∧ dt ∧ dy + dΩ3),
with the vielbein being diagonal and all other three forms (both self-dual and anti-self dual)
vanishing. In what follows it is convenient to absorb the curvature radius
Q1Q5 into an
overall prefactor in the action, and work with the unit radius AdS3 × S3. Now express the
perturbations of the six-dimensional supergravity fields relative to the AdS3×S3 background
gMN = g
MN + hMN ; G
A = goA + gA; (5.2)
V nA = δ
A + φ
nrδrA +
φnrφmrδmA ;
V rA = δ
A + φ
nrδnA +
φnrφnsδsA.
These fluctuations can then be expanded in spherical harmonics as follows:
hµν =
hIµν(x)Y
I(y), (5.3)
hµa =
(hIvµ (x)Y
a (y) + h
(s)µ(x)DaY
I(y)),
h(ab) =
(ρIt(x)Y It
(y) + ρIv
(x)DaY
b (y) + ρ
(s)(x)D(aDb)Y
I(y)),
haa =
πI(x)Y I(y),
gAµνρ =
3D[µb
(x)Y I(y),
gAµνa =
(b(A)Iµν (x)DaY
I(y) + 2D[µZ
(A)Iv
(x)Y Iva (y));
gAµab =
(A)I(x)ǫabcD
cY I(y) + 2Z(A)Ivµ D[bY
gAabc =
(−ǫabcΛIU (A)I(x)Y I(y));
φmr =
φ(mr)I(x)Y I(y),
Here (µ, ν) are AdS indices and (a, b) are S3 indices, with x denoting AdS coordinates and
y denoting sphere coordinates. The subscript (ab) denotes symmetrization of indices a and
b with the trace removed. Relevant properties of the spherical harmonics are reviewed in
appendix C. We will often use a notation where we replace the index I by the degree of the
harmonic k or by a pair of indices (k, I) where k is the degree of the harmonic and I now
parametrizes their degeneracy, and similarly for Iv, It.
Imposing the de Donder gauge condition DAhaM = 0 on the metric fluctuations re-
moves the fields with subscripts (s, v). In deriving the spectrum and computing correlation
functions, this is therefore a convenient choice. The de Donder gauge choice is however not
always a convenient choice for the asymptotic expansion of solutions; indeed the natural
coordinate choice in our application takes us outside de Donder gauge. As discussed in [10]
this issue is straightforwardly dealt with by working with gauge invariant combinations of
the fluctuations.
Next let us briefly review the linearized spectrum derived in [30], focusing on fields
dual to chiral primaries. Consider first the scalars. It is useful to introduce the following
combinations which diagonalize the linearized equations of motion:
4(k + 1)
(5r)k
I + 2(k + 2)U
I ), (5.4)
σkI =
12(k + 1)
(6(k + 2)Û
I − π̂
The fields s(r)k and σk correspond to scalar chiral primaries, with the masses of the scalar
fields being
s(r)k
= k(k − 2), (5.5)
The index r spans 6 · · · 5 + nt with nt = 5, 21 respectively for T 4 and K3. Note also that
k ≥ 1 for s(r)k; k ≥ 2 for σk. The hats (Û (5)kI , π̂kI ) denote the following. As discussed in
[10], the equations of motion for the gauge invariant fields are precisely the same as those
in de Donder gauge, provided one replaces all fields with the corresponding gauge invariant
field. The hat thus denotes the appropriate gauge invariant field, which reduces to the de
Donder gauge field when one sets to zero all fields with subscripts (s, v). For our purposes
we will need these gauge invariant quantities only to leading order in the fluctuations, with
the appropriate combinations being
I = πI2 +Λ
2ρI2(s); (5.6)
2 = U
2 − 12ρ
2(s);
ĥ0µν = h
h1±αµ h
Next consider the vector fields. It is useful to introduce the following combinations which
diagonalize the equations of motion:
h±µIv =
(C±µIv −A
(C±µIv +A
). (5.7)
For general k the equations of motion are Proca-Chern-Simons equations which couple
(A±µ , C
µ ) via a first order constraint [30]. The three dynamical fields at each degree k have
masses (k − 1, k + 1, k + 3), corresponding to dual operators of dimensions (k, k + 2, k + 4)
respectively; the operators of dimension k are vector chiral primaries. The lowest dimension
operators are the R symmetry currents, which couple to the k = 1 A±αµ bulk fields. The
latter satisfy the Chern-Simons equation
Fµν(A
±α) = 0, (5.8)
where Fµν(A
±α) is the curvature of the connection and the index α = 1, 2, 3 is an SU(2)
adjoint index. We will here only discuss the vevs of these vector chiral primaries.
Finally there is a tower of KK gravitons with m2 = k(k + 2) but only the massless
graviton, dual to the stress energy tensor, will play a role here. Note that it is the com-
bination Ĥµν = ĥ
µν + π
0goµν which satisfies the Einstein equation; moreover one needs the
appropriate gauge covariant combination ĥ0µν given in (5.6).
Let us denote by (O
) the chiral primary operators dual to the fields (s
I , σ
respectively. The vevs of the scalar operators with dimension two or less can then be
expressed in terms of the coefficients in the asymptotic expansion as
i ]1;
I ]2; (5.9)
2[σ2I ]2 −
2aIij
i ]1[s
Here [ψ]n denotes the coefficient of the z
n term in the asymptotic expansion of the field
ψ. The coefficient aIij refers to the triple overlap between spherical harmonics, defined in
(C.5). Note that dimension one scalar spherical harmonics have degeneracy four, and are
thus labeled by i = 1, · · · 4.
Now consider the stress energy tensor and the R symmetry currents. The three dimen-
sional metric and the Chern-Simons gauge fields admit the following asymptotic expansions
ds23 =
g(0)µ̄ν̄ + z
g(2)µ̄ν̄ + log(z
2)h(2)µ̄ν̄ + (log(z
2))2h̃(2)µ̄ν̄
+ · · ·
dxµ̄dxν̄ ;
A±α = A±α + z2A±α
+ · · · (5.10)
The vevs of the R symmetry currents J±αu are then given in terms of terms in the asymptotic
expansion of A±αµ as
J±αµ̄
g(0)µ̄ν̄ ± ǫµ̄ν̄
A±αν̄ . (5.11)
The vev of the stress energy tensor Tµ̄ν̄ is given by
〈Tµ̄ν̄〉 =
g(2)µ̄ν̄ +
Rg(0)µ̄ν̄ + 8
1g(0)µ̄ν̄ +
(5.12)
where parentheses denote the symmetrized traceless combination of indices.
This summarizes the expressions for the vevs of chiral primaries with dimension two or
less which were derived in [12]. Note that these operators correspond to supergravity fields
which are at the bottom of each Kaluza-Klein tower. The supergravity solution of course
also captures the vevs of operators dual to the other fields in each tower. Expressions for
these vevs were not derived in [12], the obstruction being the non-linear terms: in general
the vev of a dimension p operator will include contributions from terms involving up to p
supergravity fields. Computing these in turn requires the field equations (along with gauge
invariant combinations, KK reduction maps etc) up to pth order in the fluctuations.
Now (apart from the stress energy tensor) none of the operators whose vevs are given
above is an SO(4) (R symmetry) singlet. For later purposes it will be useful to review
which other operators are SO(4) singlets. The computation of the linearized spectrum in
[30] picks out the following as SO(4) singlets:
τ0 ≡ 1
π0; t(r)0 ≡ 1
φ5(r)0, (5.13)
along with φ0i(r) with i = 1, · · · , 4. Recall ψ0 denotes the projection of the field ψ onto
the degree zero harmonic. The fields (τ0, t(r)0) are dual to operators of dimension four,
whilst the fields φ0i(r) are dual to dimension two (marginal) operators. The former lie in
the same tower as (σ2, s(r)2) respectively, whilst the latter are in the same tower as s(r)1. In
total there are (nt + 1) SO(4) singlet irrelevant operators and 4nt SO(4) singlet marginal
operators, where nt = 5, 21 for T
4 and K3 respectively.
Consider the SO(4) singlet marginal operators dual to the supergravity fields φi(r).
These operators have been discussed previously in the context of marginal deformations
of the CFT, see the review [32] and references therein. Suppose one introduces a free
field realization for the T 4 theory, with bosonic and fermionic fields (xiI(z), ψ
I(z)) where
I = 1, · · · , N . Then some of the marginal operators can be explicitly realized in the
untwisted sector as bosonic bilinears
∂xiI(z)∂̄x
I(z̄); (5.14)
there are sixteen such operators, in correspondence with sixteen of the supergravity fields.
The remaining four marginal operators are realized in the twisted sector, and are associated
with deformation from the orbifold point.
5.2 Application to the fuzzball solutions
The six-dimensional metric of (3.30) in the decoupling limit manifestly asymptotes to
ds2 =
(−dt2 + dy2) +
+ dΩ23
. (5.15)
where
dv(Ḟ (v)2 + Ḟ(v)2 + Ḟα−(v)2). (5.16)
Note that the vielbein (3.33) is asymptotically constant
V o = ΩT4
I2 0 0 0
Q1/Q5 0 0
Q5/Q1 0
0 0 0 I22
Ω4, (5.17)
but it does not asymptote to the identity matrix. Thus one needs the constant SO(5, 21)
transformation
V → V (V o)−1, G3 → V oG3. (5.18)
to bring the background into the form assumed in (5.1).
The fields are expanded about the background values, by expanding the harmonic func-
tions defining the solution in spherical harmonics as
f5kIY
k (θ3)
, K =
f1kIY
k (θ3)
, (5.19)
k≥1,I
(AkI)iY
k (θ3)
, A =
k≥1,I
(AkI)Y Ik (θ3)
Aα− =
k≥1,I
Y Ik (θ3)
The polar coordinates here are denoted by (r, θ3) and Y
k (θ3) are (normalized) spherical
harmonics of degree k on S3 with I labeling the degeneracy. Note that the restriction k ≥ 1
in the last three lines is due to the vanishing zero mode, see [12]. As in [12], the coefficients
in the expansion can be expressed as
f5kI =
L(k + 1)
dv(CIi1···ikF
i1 · · ·F ik), (5.20)
f1kI =
L(k + 1)Q1
Ḟ 2 + Ḟ2 + (Ḟα−)2
CIi1···ikF
i1 · · ·F ik ,
(AkI)i = −
L(k + 1)
dvḞiC
i1···ikF
i1 · · ·F ik ,
(AkI) = −
Q1L(k + 1)
dvḞCIi1···ikF
i1 · · ·F ik ,
Aα−kI = −
Q1L(k + 1)
dvḞα−CIi1···ikF
i1 · · ·F ik .
Here the CIi1···ik are orthogonal symmetric traceless rank k tensors on R
4 which are in one-
to-one correspondence with the (normalized) spherical harmonics Y Ik (θ3) of degree k on S
Fixing the center of mass of the whole system implies that
(f11i + f
1i) = 0. (5.21)
The leading term in the asymptotic expansion of the transverse gauge field Ai can be written
in terms of degree one vector harmonics as
(A1j)iY
(aα−Y α−1 + a
α+Y α+1 ), (5.22)
where (Y α−1 , Y
1 ) with α = 1, 2, 3 form a basis for the k = 1 vector harmonics and we have
defined
aα± =
e±αij(A1j)i, (5.23)
where the spherical harmonic triple overlap e±αij is defined in C.6. The dual field is given by
B = −
(aα−Y α−1 − aα+Y
1 ). (5.24)
Now given these asymptotic expansions of the harmonic functions one can proceed to expand
all the supergravity fields, and extract the appropriate combinations required for computing
the vevs defined in (5.9), (5.11) and (5.12). Since the details of the computation are very
similar to those in [12], we will simply summarize the results as follows. Firstly the vevs of
the stress energy tensor and of the R symmetry currents are the same as in [12], namely
〈Tµ̄ν̄〉 = 0; (5.25)
aα±(dy ± dt). (5.26)
The vanishing of the stress energy tensor is as anticipated, since these solutions should be
dual to R vacua. As in [12], however, the cancellation is very non-trivial. The vevs of the
scalar operators dual to the fields (s
I , σ
I ) are also unchanged from [12]:
2f51i); (5.27)
6(f12I − f52I));
2(−(f12I + f52I) + 8aα−aβ+fIαβ).
The internal excitations of the new fuzzball solutions are therefore captured by the vevs of
operators dual to the fields s
I with r > 6:
(5+nt)1
2(A1i);
(6+α−)1
2Aα−1i ; (5.28)
(5+nt)2
6(A2I);
(6+α−)2
6Aα−2I .
Here nt = 5, 21 for T
4 and K3 respectively, with α− = 1, · · · , b2− with b2− = 3, 19 respec-
tively. Thus each curve (F(v),Fα− (v)) induces corresponding vevs of operators associated
with the middle cohomology of M4. Note the sign difference for the vevs of operators which
are related to the distinguished harmonic function F(v).
6 Properties of fuzzball solutions
In this section we will discuss various properties of the fuzzball solutions, including the
interpretation of the vevs computed in the previous section.
6.1 Dual field theory
Let us start by briefly reviewing aspects of the dual CFT and the ground states of the R
sector; a more detailed review of the issues relevant here is contained in [12]. Consider
the dual CFT at the orbifold point; there is a family of chiral primaries in the NS sector
associated with the cohomology of the internal manifold, T 4 or K3. For our discussions
only the chiral primaries associated with the even cohomology are relevant; let these be
labeled as O(p,q)n where n is the twist and (p, q) labels the associated cohomology class. The
degeneracy of the operators associated with the (1, 1) cohomology is h1,1. The complete set
of chiral primaries associated with the even cohomology is then built from products of the
(Opl,qlnl )
nlml = N, (6.1)
where symmetrization over the N copies of the CFT is implicit. The correspondence between
(scalar) supergravity fields and chiral primaries is 3
σn ↔ O(2,2)(n−1), n ≥ 2; (6.2)
s(6)n ↔ O
(0,0)
(n+1)
, s(6+α̃)n ↔ O
(1,1)
(n)α̃
, α̃ = 1, · · · h1,1, n ≥ 1.
Spectral flow maps these chiral primaries in the NS sector to R ground states, where
hR = hNS − jNS3 +
jR3 = j
, (6.3)
where c is the central charge. Each of the operators in (6.1) is mapped by spectral flow to
a (ground state) operator of definite R-charge
(O(pl,ql)nl )
(OR(pl,ql)nl )
ml , (6.4)
jR3 =
(pl − 1)ml, j̄R3 = 12
(ql − 1)ml.
Note that R operators which are obtained from spectral flow of those associated with the
(1, 1) cohomology have zero R charge.
3As discussed in [12], the dictionary between (σn, s
n ) and (O
(2,2)
(n−1)
(0,0)
(n+1)
) may be more complicated,
since their quantum numbers are indistinguishable, but this subtlety will not play a role here.
6.2 Correspondence between geometries and ground states
In [11, 12] we discussed the correspondence between fuzzball geometries characterized by a
curve F i(v) and R ground states (6.4) with (pl, ql) = 1± 1. The latter are related to chiral
primaries in the NS sector built from the cohomology common to both T 4 and K3, namely
the (0, 0), (2, 0), (0, 2) and (2, 2) cohomology.
The following proposal was made in [11, 12] for the precise correspondence between
geometries and ground states; see also [33]. Given a curve F i(v) we construct the corre-
sponding coherent state in the FP system and then find which Fock states in this coherent
state have excitation number NL equal to nw, where n is the momentum and w is the wind-
ing. Applying a map between FP oscillators and R operators then yields the superposition
of R ground states that is proposed to be dual to the D1-D5 geometry.
This proposal can be straightforwardly extended to the new geometries, which are char-
acterized by the curve F i(v) along with h1,1 additional functions (F(v),Fα− (v)). Consider
first the T 4 system, for which the four additional functions are F ρ(v). Then the eight
functions F I(v) ≡ (F i(v), F ρ(v)) can be expanded in harmonics as
F I(v) =
(αIne
−inσ+ + (αIn)
∗einσ
), (6.5)
where σ+ = v/wR9. The corresponding coherent state in the FP system is
, (6.6)
where
is a coherent state of the left moving oscillator âIn, satisfying â
= αIn
Contained in this coherent state are Fock states, such that
(âInI )
mI |0〉 , N =
nImI . (6.7)
Now retain only the terms in the coherent state involving these Fock states, and map the
FP oscillators to CFT R operators via the dictionary
(â1n ± iâ2n) ↔ OR(±1+1),(±1+1)n ; (6.8)
(â3n ± iâ4n) ↔ OR(±1+1),(∓1+1)n ;
âρn ↔ O
R(1,1)
(ρ−4)n.
The dictionary for the case of K3 is analogous. Here one has four curves F i(v) describing
the transverse oscillations and twenty curves F α̃(v) describing the internal excitations. The
oscillators associated with the former are mapped to operators associated with the universal
cohomology as in (6.8) whilst the oscillators associated with the latter are mapped to
operators associated with the (1, 1) cohomology as
âα̃n ↔ O
R(1,1)
α̃n . (6.9)
This completely defines the proposed superposition of R ground states to which a given
geometry corresponds. Note that below we will suggest that a slight refinement of this
dictionary may be necessary, taking into account that one of the internal curves is dis-
tinguished by the duality chain. For the distinguished curve the mapping may include a
negative sign, namely ân ↔ −OR(1,1)n ; this mapping would explain the relative sign between
the vevs found in (5.28) associated with the distinguished curve F and the remaining curves
Fα respectively.
Note that there is a direct correspondence between the frequency of the harmonic on
the curve and the twist label of the CFT operator. The latter is strictly positive, n ≥ 1,
and thus in the dictionary (6.8) there are no candidate CFT operators to correspond to
winding modes of the curves (F(v),Fα− (v)). In the case of T 4 such candidates might be
provided by the additional chiral primaries associated with the extra T 4 in the target space
of the sigma model, discussed in [34]. However the latter is related to the degeneracy of
the right-moving ground states in the dual F1-P system, rather than to winding modes.
For K3 all chiral primaries have been included (except for the additional primaries which
appear at specific points in the K3 moduli space). Thus one confirms that winding modes
of the curves (F(v),Fα− (v)) should not be included in constructing geometries dual to the
R ground states. As discussed in appendix D these winding modes may describe geometric
duals of states in deformations of the CFT.
6.3 Matching with the holographic vevs
In this section we will see how the general structure of the vevs given in (5.28) can be
reproduced using the proposed dictionary. The holographic vevs take the form
O(1,1)α̃kI
dvḞ α̃CIi1···ikF
i1 · · ·F ik . (6.10)
Thus the vevs of the operators O(1,1)α̃kI are zero unless the curve F α̃(v) is non-vanishing and
at least one of the F i(v) is non-vanishing. Moreover, the dimension one operators will
not acquire a vev unless the transverse and internal curves have excitations with the same
frequency. Analogous selection rules for frequencies of curve harmonics apply for the vevs
of higher dimension operators.
These properties of the vevs follow directly from the proposed superpositions, along
with selection rules for three point functions of chiral primaries. The superposition dual to
a given set of curves is built from the R ground states
ORI |0〉 =
(OR(pl,qq)nl )ml |0〉, (6.11)
l nlml = N and I labeling the degeneracy of the ground states. So this superposition
can be denoted abstractly as |Ψ) =
I aIORI |0〉 with certain coefficients aI . In particular,
if the curve F α̃(v) = 0 the superposition does not contain any R ground states built from
OR(1,1)α̃n operators. Moreover, if there are no transverse excitations, the superposition will
contain only states with zero R charge.
Now consider evaluating the vev of a dimension k operator O(1,1)α̃k in such a superposition.
This is determined by three point functions between this operator and the chiral primary
operators occurring in the superposition. More explicitly, the operator vev is related to
three point functions via
(ΨNS |O(1,1)α̃k |ΨNS) =
a∗IaJ 〈(OI)†(∞)O
(1,1)
α̃k (µ)(O
J )(0)〉. (6.12)
Here OI is the NS sector operator which flows to ORI in the R sector and |ΨNS) is the flow
of the superposition back to the NS sector, namely
I aIOI |0〉. The quantity µ is a mass
scale. Note we are evaluating the relevant three point function in the NS sector, and have
hence flowed the ground states back to NS sector chiral primaries. We would get the same
answer by flowing the operator whose vev we wish to compute, O(1,1)
, into the Ramond
sector and computing the three point function there. Recall that the R charges of these
operators are related by the spectral flow formula (6.3) as jNS3 = j
N . In particular,
NS sector chiral primaries built only from operators associated with the middle cohomology
all have the same R charges, namely 1
There are two basic selection rules for the three point functions (6.12). Firstly, as usual
one has to impose conservation of the R charges. Secondly, a basic property of such three
point functions is that they are only non-zero when the total number of operators O(1,1)α̃ with
a given index α̃ in the correlation function is even 4. From a supergravity perspective one can
see this selection rule arising as follows. One computes n-point correlation functions using
n-point couplings in the three dimensional supergravity action, with the latter following
from the reduction of the ten-dimensional action on S3×M4. Since a (1, 1) form integrates
4Note that this selection rule was used for the computation of three point functions of single particle
operators in the orbifold CFT in [35].
to zero over M4, the three dimensional action only contains terms with an even number of
fields sα̃ associated with a given (1, 1) cycle α̃ on M4. Therefore non-zero n-point functions
must contain an even number of operators O(1,1)α̃ , and so do corresponding multi-particle
3-point functions obtained by taking coincident limits.
Expressed in terms of cohomology, allowed three point functions contain an even number
of (1, 1)α̃ cycles labeled by α̃. Thus in single particle correlators one can have processes
such as O(0,0) +O(1,1)α̃ → O
(1,1)
α̃ and O
(1,1)
α̃ +O
(1,1)
α̃ → O(2,2), but processes such as O(0,0) +
O(1,1)α̃ → O(0,0) which involve an odd number of α̃ cycles are kinematically forbidden.
This kinematical selection rule for (1, 1) cycle conservation immediately explains why the
operator O(1,1)α̃k can only acquire a vev when the curve F α̃(v) is non-vanishing: only then
does the ground state superposition contain operators OR(1,1)α̃ such that the selection rule
can be satisfied.
One can also easily see why the operator only acquires a vev if there are transverse
excitations as well. All Ramond ground states associated with the middle cohomology have
zero R charge, with the corresponding chiral primaries in the NS sector having the same
charge jNS3 =
N . Thus a superposition involving only O(1,1) operators has a definite R
charge, and a charged operator cannot acquire a vev. Including transverse excitations means
that the superposition of Ramond ground states contains charged operators, associated with
the universal cohomology, and does not have definite R charge. Therefore a charged operator
can acquire a vev.
Thus, to summarize, the proposed map between curves and superpositions of R ground
states indeed reproduces the principal features of the holographic vevs. Using basic selection
rules for three point functions we have explained why the operators O(1,1)
acquire vevs only
when the curve F α̃(v) is non-zero and when there are excitations in R4. We will see below
that using reasonable assumptions for the three point functions we can also reproduce the
selection rules for vevs relating to frequencies on the curves. Before discussing the general
case, however, it will be instructive to consider a particular example.
6.4 A simple example
Consider a fuzzball geometry characterized by a circular curve in the transverse R4 and one
additional internal curve, with only one harmonic of the same frequency:
F 1(v) =
cos(2πn
); F 2(v) =
sin(2πn
); F(v) = µB
cos(2πn
), (6.13)
where µ =
Q1Q5/R and the D1-brane charge constraint (5.16) enforces
(A2 + 1
B2) = 1. (6.14)
The corresponding dual superposition of R ground states is then given by
|Ψ) =
Cl(OR(2,2)n )l(O
R(1,1)
−l |0〉 , (6.15)
− l)!l!
with the operators being orthonormal in the large N limit. In the case that either A or B
are zero the superposition manifestly collapses to a single term. In the general case, this
superposition gives the following for the expectation values of the R charges:
Ψ|jR3 |Ψ
Ψ|j̄R3 |Ψ
C2l l; (6.16)
N/n−1
− 1)!
− (l + 1))!
A2(l+1)(
−(l+1)) =
Evaluating (5.26) for (6.13) gives
A2(dy ± dt), (6.17)
and thus the integrated R charges defined in our conventions as
〈j3〉 =
; 〈j̄3〉 =
, (6.18)
agree with those of the superposition of R ground states.
The kinematical properties also match between the geometry and the proposed super-
position. In particular, when B 6= 0 the SO(2) symmetry in the 1-2 plane is broken: the
harmonic functions (K,A) depend explicitly on the angle φ in this plane. The asymptotic
expansions of these functions involve charged harmonics, and therefore charged operators
acquire vevs characterizing the symmetry breaking. More explicitly, the relevant terms in
(5.20) are
f1kI ∝
dv(A2 +B2 sin2(
))CIi1···ikF
i1 · · ·F ik ; (6.19)
AkI ∝
dvB sin(
))CIi1···ikF
i1 · · ·F ik .
Now the symmetric tensor of rank k and SO(2) charge in the 1-2 plane of ±m behaves as
((F 1)2 + (F 2)2)k−m(F 1 ± iF 2)m = (µA
)ke±2πinm
L . (6.20)
Note that m is related to (j3, j̄3) via m = j3 + j̄3. Thus, when B 6= 0, harmonics in the
expansion of f1 with charges |m| = 2 are excited, and terms with |m| = 1 are excited in the
expansion of A. Following (6.10) the latter implies that the dimension k operators O(1,1)
1(km)
only acquire vevs when their SO(2) charge m in the 1-2 plane is ±1. In particular using
(5.28) the vevs of the dimension one operators are
〈O(1,1)
1(1±1)〉 = ∓i
µAB, (6.21)
where the normalized degree one symmetric traceless tensors are
2(F 1 ± iF 2).
These properties are implied by the superposition (6.15). The latter is a superposition
of states with different R charge, and therefore it does break the SO(2) symmetry, with
the symmetry breaking being characterized by the vevs of charged operators. Moreover
following (6.12) the vev of O(1,1)
1(km)
is given by
C∗l Cl′〈(O(2,2)n )l(O
(1,1)
−l|O(1,1)
1(km)
(µ)|(O(2,2)n )l
(O(1,1)1n )
−l′〉. (6.22)
For the dimension one operators, charge conservation reduces this to
C∗l±1Cl〈(O(2,2)n )l±1(O
(1,1)
∓1−l|O(1,1)
1(1±1)(µ)|(O
(2,2)
l(O(1,1)1n )
−l〉. (6.23)
Thus there are contributions only from neighboring terms in the superposition. Computing
the actual values of these vevs is beyond current technology: one would need to know three
point functions for single and multiple particle chiral primaries at the conformal point.
However, as in [12], the behavior of the vevs as functions of the curve radii (A,B) can be
captured by remarkably simple approximations for the correlators, motivated by harmonic
oscillators. Suppose one treats the operators as harmonic oscillators, with the operator
O(1,1)
1(11)
destroying one O(1,1)1n and creating one O
(2,2)
n . For harmonic oscillators such that
[â, â†] = 1 the normalized state with p quanta is given by |p〉 = (â†)p/
p!|0〉 and therefore
â†|p〉 =
p+ 1|p+ 1〉. Using harmonic oscillator algebra for the operators gives
〈(O(2,2)n )l+1(O
(1,1)
−1−l|O(1,1)
1(11)
(µ)|(O(2,2)n )l(O
(1,1)
−l〉 ≈ µ
− l)(l + 1). (6.24)
Then the corresponding vev in the superposition |Ψ) is
〈O(1,1)
1(11)
〉Ψ = µ
N/n−1
c∗l+1cl
− l)(l + 1) = µN
AB, (6.25)
which has exactly the structure of (6.21). Given that such simple approximations (and
factorizations) of the correlators reproduce the structure of the vevs so well, it would be
interesting to explore whether this relates to simplifications in the structure of the chiral
ring in the large N limit.
Next consider the vevs of dimension k operators. Using charge conservation and (1, 1)
cycle conservation in (6.22) implies that only operators with m odd can acquire a vev. To
reproduce the holographic result, that vevs are non-zero only when m = ±1, requires the
assumption that only nearest neighbor terms in the superposition contribute to one point
functions. This would follow from a stronger selection rule for (1, 1) cycle conservation, that
the number of (1, 1) cycles in the in and out states differ by at most one. In particular,
multi-particle processes such as (O(1,1)ãn )3 + O
(1,1)
α̃n → (O
(2,2)
3 would be forbidden. The
selection rules for holographic vevs suggest that there is indeed such cycle conservation,
and it would be interesting to explore this issue further.
Let us now return to the comment made below (6.9), that one may need to include a
minus sign in the dictionary for the distinguished curve. Such a minus sign would intro-
duce factors of (−1)N/n−l into the superposition (6.15), and thence an overall sign in the
vevs of the associated operators O(1,1)
1(kI)
. This would naturally account for the relative sign
difference between the vevs associated with the distinguished curve and those associated
with the remaining curves. It is not conclusive that one needs such a minus sign without
knowing the exact three point functions and hence vevs. However such a sign change for
oscillators associated with the direction distinguished by the duality would not be surpris-
ing. Recall that under T-duality of closed strings right moving oscillators associated with
the duality direction switch sign, whilst the left moving oscillators and oscillators associated
with orthogonal directions do not.
6.5 Selection rules for curve frequencies
Selection rules for charge and (1, 1) cycles are sufficient to reproduce the general structure
of the vevs. In the particular example discussed above, these rules also implied the selection
rules for the curve frequencies: operators acquire vevs only when the transverse and internal
curves have related frequencies.
Here we will note how, with reasonable assumptions, one can motivate the selection rules
for frequencies in the general case. Consider the computation of the vev of a dimension one
operator O(1,1)α̃1 for a general superposition |Ψ) using (6.12). Using the selection rules for
charge and (1, 1) cycles, the contributions to (6.12) involve only certain pairs of operators
(OI ,OJ ). Their SO(2) charges must differ by (±1/2,±1/2) and they must differ by an
odd number of O(1,1)α̃ operators.
Now let us make the further assumption that there are contributions to (6.12) only from
pairs of operators (OI ,OJ ) which differ by only one term, the relevant operators taking
the form
OJ = O(p,q)n OJ̃ , (6.26)
with OJ̃ being the same for in and out states, but the single operator O(p,q)n differing
between in and out states. Thus we are assuming that the relevant three point functions
factorize, with the non-trivial part of the correlator arising from a single particle process.
This is indeed the structure of the three point functions arising in our example. Only
nearest neighbor terms in the superposition contribute in the computation of the vev of the
dimension one operator in (6.23). Moreover the m = ±1 charge selection rule for the vevs of
higher dimension operators immediately follows from restricting to nearest neighbor terms
in the three-point functions. Note further that this factorization structure is present in the
orbifold CFT computation of the three point functions. The operator O(1,1)α̃1 ≡ O
(1,1)
α̃1 I
is the identity operator in (N −1) copies of the CFT and thus only acts non-trivially in one
copy of the CFT.
Consider the case of the vev of the operator with SO(2) charges (1/2, 1/2); it would
take the form
I,J ,Ĩ
a∗IaJNĨ
〈(O(2,2)n )†(∞)O
(1,1)
α̃1 (µ)(O
(1,1)
α̃n )(0)〉 (6.27)
+〈(O(1,1)n )†(∞)O
(1,1)
α̃1 (µ)(O
(0,0)
α̃n )(0)〉
where NĨ is the norm of OĨ . Analogous expressions would hold for the dimension one
operators with other charge assignments. Such a factorization would immediately explain
the frequency selection rule found in the holographic vevs obtained from supergravity (6.10).
The superposition contains operators of the form (6.26) with both (p, q) = (1, 1) and (p, q) 6=
(1, 1) only when the internal curve and the transverse curves share a frequency. Extending
these arguments to vevs of higher dimension operators would be straightforward, and would
imply selection rules for curve frequencies.
6.6 Fuzzballs with no transverse excitations
Consider the case where the fuzzball geometry has only internal excitations, F i(v) = 0.
Then the corresponding dual superposition of ground states can involve only states built
from the operators OR(1,1)αn . Any such state will be a zero eigenstate of both jR3 and j̄R3 .
Furthermore, such ground states associated with the middle cohomology account for a finite
fraction of the entropy of the D1-D5 system. In the case of K3 the total entropy behaves
S = 2π
, (6.28)
with c = 24N . The ground states associated with the middle cohomology account for a
central charge c = 20N . In the case of T 4 the entropy behaves as (6.28) with c = 12N . The
states associated with the universal cohomology account for c = 4N , the odd cohomology
accounts for another c = 4N and the middle cohomology accounts for the final c = 4N .
Now let us consider the properties of the corresponding fuzzball geometry. When there
are no transverse excitations and no winding modes of the internal curves, the SO(4)
symmetry in R4 is unbroken, and the defining harmonic functions (4.4) reduce to
H = 1 +
; K =
; (6.29)
with Ai = 0 and where Q1 is defined in (5.16). The solutions manifestly all collapse to the
standard (singular) D1-D5 solution and so, whilst one would need an exponential number
of geometries (upon quantization) to account for dual ground states build from operators
associated with the middle cohomology, one has only one singular geometry. Therefore the
relevant fuzzball solutions are not visible in supergravity: one needs to take into account
higher order corrections.
One can understand this from several perspectives. Firstly, as discussed above, R ground
states associated with the middle cohomology have zero R charge; they do not break the
SO(4) symmetry. A geometry which is asymptotically AdS3 × S3 for which the SO(4)
symmetry is exact can be characterized by the vevs of SO(4) singlet operators. The only
such operators in supergravity are the stress energy tensor, and the scalar operators listed
in (5.13). Since the vev of the stress energy tensor must be zero for the D1-D5 ground
states, the geometry would have to be distinguished by the vevs of the singlet operators
given in (5.13).
Our results imply that these operators do not acquire vevs, and therefore within su-
pergravity (without higher order corrections) geometries dual to different R ground states
associated with the middle cohomology cannot be distinguished. The reason is the follow-
ing. The SO(4) singlet operators dual to supergravity fields are related to chiral primaries
by the action of supercharge raising operators; they are the top components of the multi-
plets. Thus these SO(4) singlet operators cannot acquire vevs in states built from the chiral
primaries. SO(4) singlet operators associated with stringy excitations would be needed to
characterize the different ground states.
A heuristic argument based on the supertube picture also indicates that geometries
dual to these ground states are not to be found in the supergravity approximation. The
geometries with transverse excitations in R4 can be viewed as a bound state of D1-D5
branes, blown up by their angular momentum in the R4. Indeed, the characteristic size of
the fuzzball geometry is directly related to this angular momentum. The simplest example,
related to a circular supertube, is to take a geometry characterized by a circular curve; this
is obtained by setting B = 0 in (6.13). The characteristic scale of the geometry is
rc ∼ gsµ/n, (6.30)
where gs is the string coupling and µ has dimensions of length, whilst the (dimensionless)
angular momentum behaves as j12 = N/n, and thus rc ∼ gsµ(j12/N). Hence the size of
the D1-D5 bound state increases linearly with the angular momentum. A general fuzzball
geometry will of course not be as symmetric but nonetheless the characteristic scale averaged
over the R4 is still related to the total angular momentum. In our previous paper [12] we
noted that fuzzball geometries dual to vacua for which the R charge is very small are not
well described by supergravity. Here we have found that this implies that an exponential
number of geometries dual to a finite fraction of the Ramond ground states, with strictly
zero R charge, cannot be described at all in the supergravity approximation.
7 Implications for the fuzzball program
In this section we will consider the implications of our results for the fuzzball program,
focusing in particular on whether one can find a set of smooth weakly curved supergravity
geometries which span the black hole microstates.
We have seen in the previous sections that the geometric duals of superpositions of R
vacua with small or zero R charge are not well-described in supergravity. The natural basis
for R ground states (6.4) uses states of definite R-charges, and it is therefore straightforward
to work out the density of ground states with given R-charges, dN,j3,j̄3 , with the total number
of ground states being given by dN =
N,j3,j̄3
dN,j3,j̄3 . This computation is discussed in
appendix E with the resulting density in the large N limit being
dN,j1,j2
4(N + 1− j)31/4
2π(2N − j)√
N + 1− j
cosh2(
N+1−j ) cosh
N+1−j )
, (7.1)
where j1 = (j3 + j̄3) and j
2 = (j3 − j̄3) and j = |j1| + |j2|. The key feature is that the
number of states with zero R charge differs from the total number of R ground states given
in (E.16) only by a polynomial factor:
dN,0,0 ∼= dN/N. (7.2)
The geometries dual to such ground states are unlikely to be well-described in supergravity,
and therefore the basis of black hole microstates labeled by R charges is not a good basis
for the geometric duals. This argument reinforces the discussion of [12], where we showed
in detail that the geometric duals of specific states (in this basis) must be characterized by
very small vevs which cannot be reliably distinguished in supergravity; they are comparable
in magnitude to higher order corrections.
The geometries that are smooth in supergravity correspond to specific superpositions
of the R charge eigenstates, for which some vevs are atypically large. The natural basis
for the field theory description of the microstates is thus not the natural basis for the
geometric duals. This issue is likely to persist in other black hole systems. For example, the
microstates of the D1-D5-P system are also most naturally described as (j3, j̄3) eigenstates,
with a relation analogous to (7.2) holding, so the number of states with zero R-charge is
suppressed only polynomially compared to the total number of black hole microstates. Just
as in the 2-charge system discussed here, the geometric duals are related to supertubes
whose radii depend on the R-charges. States or superpositions of states which have small
or zero R-charges are unlikely to be well-described by supergravity solutions. Thus a given
smooth supergravity geometry should be described by a specific superposition of the black
hole microstates. Identifying the specific superpositions for known 3-charge geometries is
an open and important question.
The issue is whether there exist enough such geometries, well-described and distinguish-
able in supergravity, to span the entire set of black hole microstates. It seems unlikely
that a basis exists which simultaneously satisfies all three requirements. Firstly, on general
grounds microstates with small quantum numbers will not be well-described in supergrav-
ity. Even when considering superpositions that are well described by supergravity, to span
the entire basis, one will have to include superpositions which can only be distinguished by
these small vevs. I.e. in choosing a basis of geometries for which some vevs are sufficiently
large for the supergravity description to be valid one will find that some of these geometries
cannot be distinguished among themselves in supergravity.
We have already seen several examples of this problem in the 2-charge system. Let us
parameterize the curves as
F i(v) = µ
(αine
2πinv/L + (αin)
∗e−2πinv/L); (7.3)
F β̃(v) = µ
(αβ̃ne
2πinv/L + (αβ̃n)
∗e−2πinv/L),
where µ =
Q1Q5/R and β̃ runs from 1 to h
1,1(M4). The D1-brane charge constraint (4.7)
limits the total amplitude of these curves as
) = 1. (7.4)
Thus in general increasing the amplitude in one mode, to make certain quantum numbers
large, decreases the amplitudes in the others. Moreover, the amplitude in a given mode is
bounded via |αn|2 ≤ 1/n2, and is thus is intrinsically very small for high frequency modes,
which sample vacua with large twist labels in the CFT. Note also that the vevs of R-charges
are given in terms of
jij = iN
n(αin(α
∗ − αjn(αin)∗) (7.5)
As we have seen, to be describable in supergravity, geometries must have transverse R4
excitations, and thus some large R-charges, requiring jij ≫ 1. Combining (7.5) and (7.4) one
sees that this restricts the amplitudes of the internal excitations, and thus of the sampling
of the black hole microstates associated with the middle cohomology of M4.
Another way to understand the limitations of supergravity is to go back to the F1-P
system where the corresponding state is the coherent state |{αin}, {α
m}). These states form
a complete basis of states, so we know that there is an F1-P geometry corresponding to
every 1/2 BPS state. However, only when all αin, α
m are large are the geometries well-
described and distinguishable within supergravity. Indeed, the amplitudes αin, α
m are also
the root mean deviations of the distribution around the mean (which is described by the
classical curve), so only for large αin, α
m is the classical string that sources the supergravity
solution a good approximation of the quantum state. Putting it differently, when some of the
amplitudes are small the difference in the solutions for different amplitudes is comparable
with the error in the solutions due to the approximation of the source by a classical string,
so one cannot reliably distinguish them within this approximation.
If one could not find a basis of distinguishable supergravity geometries spanning the mi-
crostates, one might ask whether a sufficiently representative basis exists. That is, suppose
one chooses a single representative of the indistinguishable geometries, and assigns a mea-
sure to this geometry. Then is the corresponding basis of weighted geometries sufficiently
representative to obtain the black hole properties? In the 2-charge system, the now com-
plete set of fuzzball geometries along with the precise mapping between these geometries
and R vacua allows these questions to be addressed at a quantitative level and we hope to
return to this issue elsewhere.
Acknowledgments
The authors are supported by NWO, KS via the Vernieuwingsimpuls grant “Quantum
gravity and particle physics” and IK, MMT via the Vidi grant “Holography, duality and
time dependence in string theory”. This work was also supported in part by the EU contract
MRTN-CT-2004-512194. KS and MMT would like to thank both the 2006 SimonsWorkshop
and the theoretical physics group at the University of Crete, where some of this work was
completed. The computer algebra package GRTensor was used to verify that our solutions
satisfy the supergravity field equations.
A Conventions
The following table summarises the indices used throughout the paper. In some cases an
index is used more than once, with different meanings, in separate sections of the paper.
Index Range Usage
(m,n) 0, · · · , 9 10d sugra fields
(M,N) 0, · · · , 5 6d sugra fields
(µ, ν) 0, 1, 2 3d fields
(a, b) 1, 2, 3 S3 indices
(i, j) 1, 2, 3, 4 R4 indices
(ρ, σ) 1, 2, 3, 4 M4 indices
(µ̄, ν̄) 0, 1 2d fields
(α, β) 1, 2, 3 SU(2) vector index
(γ, δ) 1, · · · , b2 H2(M4)
(α̃, β̃) 1, · · · , h1,1 H1,1(M4)
(I, J) 1, · · · , 8 SO(8) vector
((c), (d)) 1, · · · , 16 heterotic vector fields
((a), (b)) 1, · · · , 24 SO(4, 20) vector
(A,B) 1, · · · , 26 SO(5, 21) vector
(m,n) 1, · · · , 5 SO(5) vector
(r, s) 6, · · · , (nt + 1) SO(nt) vector
A.1 Field equations
The equations of motion for IIA supergravity are:
e−2Φ(Rmn + 2∇m∇nΦ−
H(3)mpqH
(3)pq
F (2)mpF
2 · 3!F
mpqrF
(4)pqr
(F (2))2 +
(F (4))2) = 0, (A.1)
4∇2Φ− 4(∇Φ)2 +R− 1
(H(3))2 = 0,
dH(3) = 0, dF (2) = 0, ∇mF (2)mn −
H(3)pqrF
(4)npqr = 0,
∇m(e−2ΦH(3)mnp)−
F (2)qr F
(4)qrnp − 1
2 · (4!)2 ǫ
npm1···m4n1···n4F
m1···m4F
n1···n4 = 0,
dF (4) = H(3) ∧ F (2), ∇mF (4)mnpq −
3! · 4!ǫ
npqm1···m3n1···n4H
m1···m3F
n1···n4 = 0.
The corresponding equations for type IIB are:
e−2Φ(Rmn + 2∇m∇nΦ−
H(3)mpqH
(3)pq
F (1)m F
F (3)mpqF
(3)pq
4 · 4!F
mpqrsFn
(5)pqrs
Gmn((F
(1))2 +
(F (3))2) = 0,
4∇2Φ− 4(∇Φ)2 +R− 1
(H(3))2 = 0,
dH(3) = 0, ∇m(e−2ΦH(3)mnp)− F (1)m F (3)mnp −
F (3)mqrF
(5)mqrnp = 0, (A.2)
dF (1) = 0, ∇mF (1)m +
H(3)pqrF
(3)pqr = 0,
dF (3) = H(3) ∧ F (1), ∇mF (3)mnp +
H(3)mqrF
(5)mqrnp = 0,
dF (5) = d(∗F (5)) = H(3) ∧ F (3),
where the Hodge dual of a p-form ωp in d dimensions is given by
(∗ωp)i1···id−p =
ǫi1···id−pj1···jpω
j1···jp
p , (A.3)
with ǫ01···d−1 =
√−g. The RR field strengths are defined as
F (p+1) = dC(p) −H(3) ∧C(p−2). (A.4)
The equations of motion for the heterotic theory are:
4∇2Φ− 4 (∇Φ)2 +R− 1
(H(3))2 − α′(F (c))2 = 0,
e−2ΦH(3)mnr
Rmn + 2∇m∇nΦ− 1
H(3)mrsH(3)nrs − 2α′F (c)mrF (c)nr = 0,
e−2ΦF (c)mn
e−2ΦH(3)nrsF (c)rs = 0.
mn with (c) = 1, · · · 16 are the field strengths of Abelian gauge fields V (c)m ; we consider
here only supergravity backgrounds with Abelian gauge fields. This restriction means that
the gauge field part of the Chern-Simons form in H3,
H(3) = dB(2) − 2α′ω3(V ) + · · · , (A.5)
does not play a role in the supergravity solutions, nor does the Lorentz Chern-Simons term
denoted by the ellipses.
A.2 Duality rules
The T-duality rules for RR fields were derived in [36] by reducing type IIA and type IIB
supergravities on a circle and relating the respective RR potentials in the 9-dimensional
theory. However, for calculations involving magnetic sources, it is more convenient to work
with T-duality rules for RR field strengths, since potentials can only be defined locally. In
the following we will rederive the T-duality rules in terms of RR field strengths.
It is slightly easier although not necessary to use the democratic formalism of IIA and
IIB supergravity introduced in [16]. In this formalism one includes p-form field strengths
for p > 5 with Hodge dualities relating higher and lower-form field strengths being imposed
in the field equations. This formalism is natural when both magnetic and electric sources
are present; moreover there is no need for Chern-Simons terms in the field equations. The
RR part of the (pseudo)-action is simply
SRR = −
2κ210
(F (q))2, (A.6)
where q = 2, 4, 6, 8 is even in the IIA case and q = 1, 3, 5, 7, 9 is odd in the IIB case. The
field strengths are defined as F (q) = dC(q−1) − H(3) ∧ C(q−3) for q ≥ 3 and Fq = dC(q−1)
for q < 3. The Hodge duality relation between higher and lower form field strengths in our
conventions is
∗F (q) = (−1)⌊
⌋F (10−q), (A.7)
where ⌊n⌋ denotes the largest integer less or equal to n.
Now to compactify on a circle the ten-dimensional metric can be parameterized as
ds2 = e2ψ(dy −Aµdxµ)2 + ĝµνdxµdxν , (A.8)
where y denotes the compact direction, and 9-dimensional quantities will be denoted as
hatted. An economic way to derive the T-duality rules for the field strengths is the following.
Choose the vielbein to be
ey = eψ(dy −Aµdxµ); eµ = êµ, (A.9)
where µ denotes a tangent space index, and êµ is the 9-dimensional vielbein. Now reduce
the field strengths (in the tangent frame) as
F̂ (q)µ
= F (q)µ
, F̂ (q−1)µ
= F (q)µ
y. (A.10)
The corresponding 9-dimensional action for the field strengths is given by
SRR = −
2κ210
eψF̂ 2q . (A.11)
Since ψIIA = −ψIIB under T-duality, one can read from this action the transformation
rules for field strengths in 10d:
(q+1)
y = e
, (A.12)
F̃ (q+1)µ
= eψF (q+2)µ
Here q even defines IIB fields in terms of IIA fields and q odd defines IIA in terms of IIB.
Note that the field strengths on both sides are in the tangent frame. Given the T-duality
rules for NSNS fields
eψ̃ = e−ψ, Ãµ = B
yµ , B̃
ym = Am, (A.13)
B̃(2)mn = B
mn + 2A[mB
, Φ̃ = Φ− ψ,
with the metric gmn invariant, one can easily convert (A.12) back into
F (q)m1...mq = F
(q+1)
m1...mqy
− q(−1)qB(2)
(q−1)
m2...mq]
+ q(q − 1)B(2)
(q−1)
m3...mq]y
F (q)m1...mq−1y = F
(q−1)
m1...mq−1
− (q − 1)(−1)qA[m1F
(q−1)
m2...mq−1]y
. (A.14)
Strictly speaking, this gives the duality rules in the democratic formalism. However we can
obtain the usual rules by simply dropping the (p > 5)-form field strengths as long as we
make sure to self-dualise F (5) in each IIB solution.
The S duality rules for type IIB are
τ̃ = −1
, B̃(2) = C(2), C̃(2) = −B(2),
F̃ (5) = F (5), G̃mn = |τ |Gmn, (A.15)
where τ = C(0) + ie−Φ.
B Reduction of type IIB solutions on K3
The reduction of type IIB on K3 is very similar to the reduction of type IIA, which was
discussed in some detail in [37]. In the following we will use the reduction of the NS-NS
sector fields given in [37], and derive the reduction of the type IIB RR fields. Let us first
review the reduction of the NS-NS sector. Starting from the ten-dimensional action
SNS =
2κ210
e−2Φ̂(R̂+ 4(∂Φ̂)2 − 1
Ĥ23 )
, (B.1)
where ten-dimensional fields are denoted by hats, the corresponding six-dimensional field
equations can be derived from the action [37]
−ge−2Φ
R+ 4(∂Φ)2 − 1
H23 +
tr(∂M−1∂M)
, (B.2)
where the six-dimensional fields are defined as follows. Firstly the 10-dimensional 2-form
potential is reduced as
B̂(2)(x, y) = B2(x) + b
γ(x)ω
2 (y), (B.3)
where (x, y) are six-dimensional and K3 coordinates respectively and the two forms ω
2 with
γ = 1, · · · 22 span the cohomology H2(K3,R). The 2-forms ωγ2 transform under an O(3, 19)
symmetry, with a metric defined by the 22-dimensional intersection matrix
dγδ =
(2π)4V
2 ∧ ω
2, (B.4)
where (2π)4V is the volume of K3. A natural choice for dγδ is
dγδ =
0 −I19
 , (B.5)
corresponding to a diagonal basis for the 3 self-dual and 19 anti-self dual two forms of K3.
Furthermore, there is a matrix Dδ γ defined by the action of the Hodge operator
∗K34 ω
2 = ω
γ , (B.6)
which is dependent on the K3 metric and satisfies
ǫ = δ
δdǫζD
γ = dδζ . (B.7)
The SO(4, 20) matrix of scalars M−1
(a)(b)
was derived in [37] to be
M−1 = ΩT2
e−ρ + bγbδdγǫD
eρb4 1
eρb2 1
eρb2bγdγδ + b
γdγǫD
eρb2 eρ eρbγdγδ
eρb2bγdγδ + b
γdγǫD
ρbγdγδ e
ρbǫdǫγb
ζdζδ + dγǫD
Ω2, (B.8)
with b2 ≡ bγbδdγδ. Here ρ is the breathing mode of K3, e−ρ = 1(2π)4V
∗41. The six-
dimensional dilaton is related to the 10-dimensional dilaton via Φ = Φ̂ + ρ/2.
The dimensional reduction of the NS sector makes manifest only an SO(4, 20) subgroup
of the full SO(5, 21) symmetry. Including the reduction of the RR sector should thus give
the equations of motion following from the six-dimensional string frame action, which for
IIB was given in (3.23)
R+ 4(∂Φ)2 +
tr(∂M−1∂M)
∂l(a)M−1
(a)(b)
∂l(b)
GAMNPM−1ABG
and in which only an SO(4, 20) subgroup of the total SO(5, 21) symmetry is manifest; recall
that M−1AB here is an SO(5, 21) matrix, with M
(a)(b)
being SO(4, 20). Note that the six-
dimensional coupling is related to the ten-dimensional coupling via (2π)4V (2κ26) = 2κ
where (2π)4V is the volume of K3.
Following the same steps as [37] the RR potentials can be reduced as
Ĉ(0)(x, y) = C0(x), Ĉ
(2)(x, y) = C2(x) + c
(0,2)
2 (y), (B.9)
Ĉ(4)(x, y) = C4(x) + c
(2,4)
(x) ∧ ωγ2 (y) + c(0,4)(x)(e
ρ ∗K3 1)(y),
where ∗K3 denotes the Hodge dual in the K3 metric and the corresponding field strengths
F̂ (1)(x, y) = F1(x), (B.10)
F̂ (3)(x, y) = dC2(x)− C0(x)H3(x) +
(0,2)
(x)− C0(x)dbγ(x)
ω2(y) ≡ F3 +Kγ1 ∧ ω
Ĥ(3)(x, y) = dB2(x) + db
γ(x) ∧ ωγ2 (y) ≡ H3 + db
γ ∧ ωγ2 ,
F̂ (5)(x, y) = dC4(x)− C2(x) ∧H3(x) +
(2,4)
(x)− C2(x)dbγ(x)− cγ(0,2)(x)H3(x)
∧ ωγ2 (y)
dc(0,4)(x)− cγ0,2(x)dbδ(x)dγδ
∧ (eρ(x) ∗K3 1)(y)
≡ F5 +Kγ3 ∧ ω
2 + F̃1 ∧ eρ ∗K3 1.
The reduction of the potentials thus gives two three form field strengths H3 and F3, 3 self-
dual and 19 anti-self dual three form field strengths K
3 and 46 scalars b
γ , c
(0,2)
, c(0,4) and
C0. After splitting the three forms H3 and F3 into their self-dual and anti-self-dual parts,
we obtain 5 self-dual and 21 anti-self dual tensors in total, as described in [38].
It is then straightforward to obtain the map relating six and ten-dimensional fields by
inserting the expressions (B.9) and (B.10) into the ten-dimensional field equations (A.2).
The additional RR scalars are contained in
l(a) = ΩT2
c̃(0,4)
(0,2)
, (B.11)
with Ω2 defined in the appendix B.2 and the shifted fields defined as
(0,2)
(0,2)
− C0bγ , (B.12)
c̃(0,4) = c(0,4) − bγcδ(0,2)dγδ +
b2C0.
The fields Φ, l(a) and the SO(4, 20) matrix M−1 given in (B.8) can be recombined into
the SO(5, 21) matrix M−1 = V TV , with the latter conveniently expressed in terms of the
vielbein
V = ΩT4
e−Φ 0 0 0 0
−eΦ(C0c(0,4) − 12c
(0,2)
) eΦ −eΦc̃(0,4) −eΦC0 eΦc̃γ(0,2)dγδ
e−ρ/2C0 0 e
−ρ/2 0 0
eρ/2c(0,4) 0
eρ/2b2 eρ/2 eρ/2bγdγδ
Ṽδγc
(0,2)
0 Ṽδγb
γ 0 Ṽγδ
Ω4. (B.13)
Here the SO(3, 19) vielbein Ṽαβ is defined by dαβD
γ = ṼαβṼβγ , c
(0,2)
(0,2)
(0,2)
dγδ and
the matrix Ω4 is defined in the appendix B.2. The six-dimensional tensor fields are related
to the ten-dimensional fields as
H13 =
(1 + ∗6)H3, Hα++13 = −
(Ṽ K3)
α+ , (B.14)
H53 = −
e−ρ/2
(1 + ∗6)F3, H63 = −
e−ρ/2
(1− ∗6)F3,
3 = −
(Ṽ K3)
α− , H263 =
(1− ∗6)H3.
Here α+ = 1, 2, 3 and α− = 4, · · · 22, labeling the self dual and anti-self dual forms respec-
tively. Note that using formulas (B.13) and (B.14) to lift a six-dimensional solution to ten
dimensions requires a specific choice of six-dimensional vielbein.
The solutions we find have D
= dγδ ; this implies the identity
2 )ρσ(ω
α−β− , (B.15)
where (ρ, τ) are K3 coordinates and gρτ is the K3 metric. As discussed in [39], a choice of
ǫ fixes the complex structure completely and implies (ω
2 )ρσ(ω
ρσ = Dǫ δdγǫ. Varying
this identity with respect to the metric results in (B.15).
B.1 S-duality in 6 dimensions
Given the map between 10-dimensional and 6-dimensional fields, we can now obtain the
action of S-duality on 6-dimensional fields as part of the SO(5, 21) symmetry:
G3 → OSG3, M−1 → OSM−1OTS , (B.16)
where
(OS)ij =
0 0 −1
0 I3 0
1 0 0
, (OS)rs =
0 0 1
0 I19 0
−1 0 0
, (B.17)
Moreover one can perform an SO(5) × SO(21) transformation to bring the vielbein of the
S-dual solution back to the form used by the 10-dimensional lift. Including this transfor-
mation, H3 and V transform as
H3 → OGH3, V → OGV OTS , (B.18)
(OG)ij =
C0 0 −eΦ̂
0 I3 0
eΦ̂ 0 C0
, (OG)rs =
C0 0 −eΦ̂
0 I19 0
eΦ̂ 0 C0
, (B.19)
where τ = C0 + ie
−Φ̂, Φ̂ = Φ− ρ/2 is the 10-dimensional dilaton and the fields C0 and eΦ̂
are the original ones taken before the S-duality.
B.2 Basis change matrices
In defining six-dimensional supergravities there are implicit choices of constant SO(p, q)
matrices. When discussing the compactification from the ten to six dimensions, the most
convenient choices for these matrices are certain off-diagonal forms, see for example [15,
17, 19, 20, 21, 22]. When one is interested in specific solutions of the six-dimensional
supergravity equations, such as AdS3 × S3 solutions, and deriving the spectrum in such
backgrounds, it is rather more convenient to use diagonal choices for these matrices, see for
example [30, 31]. In this paper we both compactify from ten to six dimensions, and expand
six-dimensional solutions about a given background. We therefore find it most convenient to
use diagonal choices for the constant matrices. To use previous results on compactification
and T-duality, we need to apply certain similarity transformations. For the most part these
may be implicitly written in terms of basis change matrices, so that compactification and
duality formulas remain as simple as possible. Thus let us define matrices Ω1 and Ω2 for
SO(4, 20), and Ω3 and Ω4 for SO(5, 21) via:
(vρ −wρ)
(vρ +wρ)
, ΩT3
(v − w)
(v + w)
,(B.20)
(v − w)
(v + w)
, ΩT4
(v1 − w1)
(v2 − w2)
(v2 + w2)
(v1 + w1)
where ρ = 1, · · · 4, (c) = 1, · · · 16, (a) = 1, · · · 24, α = 1, 2, 3 and α− = 1, · · · 19. These satisfy
the conditions:
0 −I4 0
−I4 0 0
0 0 −I16
ΩT1 =
0 −I20
 , (B.21)
σ1 0 0
0 I3 0
0 0 −I19
ΩT2 =
0 −I20
σ1 0 0
0 I4 0
0 0 −I20
ΩT3 =
0 −I21
σ1 0 0 0
0 σ1 0 0
0 0 I3 0
0 0 o −I19
ΩT4 =
0 −I21
Here σ1 is the Pauli matrix
C Properties of spherical harmonics
Scalar, vector and tensor spherical harmonics satisfy the following equations
�Y I = −ΛkY I , (C.1)
�Y Iva = (1− Λk)Y Iva , DaY Iva = 0,
�Y It
= (2− Λk)Y It(ab), D
aY It
k(ab)
where Λk = k(k+2) and the tensor harmonic is traceless. It will often be useful to explicitly
indicate the degree k of the harmonic; we will do this by an additional subscript k, e.g.
degree k spherical harmonics will also be denoted by Y Ik , etc. � denotes the d’Alambertian
along the three sphere. The vector spherical harmonics are the direct sum of two irreducible
representations of SU(2)L × SU(2)R which are characterized by
ǫabcD
bY cIv± = ±(k + 1)Y Iv±a ≡ λkY Iv±a . (C.2)
The degeneracy of the degree k representation is
dk,ǫ = (k + 1)
2 − ǫ, (C.3)
where ǫ = 0, 1, 2 respectively for scalar, vector and tensor harmonics. For degree one vector
harmonics Iv is an adjoint index of SU(2) and will be denoted by α. We use normalized
spherical harmonics such that
Y I1Y J1 = Ω3δ
I1J1 ;
Y aIvY Jva = Ω3δ
IvJv ;
Y (ab)ItY Jt
= Ω3δ
ItJt, (C.4)
where Ω3 = 2π
2 is the volume of a unit 3-sphere. We define the following triple integrals as
Y IY JY K = Ω3aIJK ; (C.5)
(Y α±1 )
1 = Ω3e
αij ; (C.6)
D Interpretation of winding modes
In the fundamental string supergravity solutions (2.1) the null curves describing the motion
of the string along a torus direction xρ (whose periodicity is 2πRρ) could have winding modes
such that Fρ(v) = wρRρv/Ry, with wρ integral. Consider now the correspondence with
quantum string states. Such winding modes are not consistent with both supersymmetry
and momentum and winding quantization for a string propagating in flat space, with no B
field. Recall that the zero modes of a worldsheet compact boson field can be written as
X(σ+, σ−) = x+
+ nR)σ+ +
− nR)σ− ≡ x+ w̃σ+ + wσ−, (D.1)
where R is the radius and (p, n) are the quantized momentum and winding respectively; note
that we define σ± = (τ ± σ). BPS left-moving states with no right-moving excitations have
w = 0 and hence α′p = −nR2. However the latter condition has no solutions at generic
radius and so states with winding along the torus directions cannot be BPS. Therefore
winding modes should not be included to describe the F1-P states and corresponding dual
D1-D5 ground states of interest here.
Now consider switching on constant B
ρv ≡ bρ on the worldsheet. The constant B field
shifts the momentum charges, and thus there are BPS left-moving states with winding
around the torus directions. To be more precise, following the discussion of [12], one can
describe a string with left-moving excitations using a null lightcone gauge. The relevant
terms in the worldsheet fields are then
V = wvσ−; U = wuσ− + w̃uσ+ +
a−n e
−inσ− ; (D.2)
XI = δIρw
Iσ− +
−inσ− ,
where winding modes are included only along torus directions, labeled by ρ. The L0 con-
straint implies
wvwu = (wρ)2 + 2
|n|aInaI−n ≡ (wv)2|∂VXI |20, (D.3)
where |A|0 denotes the projection onto the zero mode. The momentum and winding charges
are given by
dσ(∂τX
m +B(2)mn∂σX
n); Wm =
dσ∂σX
m, (D.4)
respectively, where α′ = 2. Requiring no winding in the time direction and no momentum
along the xρ directions imposes w̃u = wu + wv and wρ = bρw
v . The conserved momentum
and winding charges are then
PM = 1
(1 + |∂VXI |20 + b2ρ), (|∂VXI |20 − b2ρ), 0
; WM = wv(0, 1, 0, bρ). (D.5)
Note that the integral quantized momentum charge py along the y direction is therefore
py = Ry(w
u − (wv)−1(wρ)2). (D.6)
Now consider the solitonic string supergravity solution (2.1) with defining curves F I(v)
where F ρ(v) = bρv + F̄
ρ(v), with F̄ ρ(v) having no zero mode. The ADM charges of this
solitonic string were computed in [15], and are given by
PMADM = kQ
(1 + |∂vF I |20), |∂vF I |20, 0, bρ
, (D.7)
where the effective Newton constant is k = Ω3Ly/2κ
6. When bρ = 0 these charges match
the worldsheet charges (D.5) provided that wv = 2kQ as in [15] but when bρ 6= 0 they do
not quite agree with the worldsheet charges. The reason is that in the supergravity solution
ρv approaches zero at infinity, but to match with the constant B
ρv background on the
worldsheet, B
ρv should approach bρ at infinity. This can be achieved via a constant gauge
transformation Aρ → Aρ − bρ, combined with a coordinate shift u→ u+ 2bρxρ. The ADM
charges of this shifted background indeed exactly match the worldsheet charges (D.5). The
harmonic functions Aρ then take the form
Aρ = −bρH −
|x− F |2 , (D.8)
where in the latter expression |x−F |2 denotes
i−F i(v))2; the harmonic function has
been smeared over the T 4 and the y circle. Note that when F i(v) = 0 the supergravity
solution collapses to
ds2 = H−1dv(−du +Kdv) + dxIdxI ; K = (1 +
Q|∂vF ρ|20
), (D.9)
e−2Φ = H ≡ (1 + Q
); B(2)uv =
(H−1 − 1); B(2)vρ = −bρ.
This is the naive SO(4) invariant F1-P solution, with an additional constant B field. Finally
let us note that one can similarly switch on winding modes for the curves q(c)(v) charac-
terizing the charge waves in the heterotic solution (3.1) by including constant A
v on the
worldsheet.
Now let us consider solutions in the D1-D5 system, and the interpretation of including
winding modes of the internal curves. In particular, it is interesting to note that the general
SO(4) invariant solutions include harmonic functions
A = ao +
; Aα− = aα−o +
, (D.10)
in addition to the harmonic functions (H,K) given in (6.29). The non-constant terms in
these harmonic functions are related to the winding modes of the internal curves, with the
quantities aα̃ = (a, aα−) being given by
a = −Q5
dvḞ(v); aα− = −Q5
dvḞα−(v). (D.11)
Following the duality chain, these constants are given by aα̃ = −Q5bα̃ where for the T 4
case bα̃ ≡ B(2)ρv = bρ and for the K3 case bα̃ ≡ (B(2)ρv = bρ, A(c)v = b(c)). The constant terms
(ao, a
o ) are related to the boundary conditions at asymptotically flat infinity, as we will
discuss below.
When these functions (A,Aα−) are non-zero, the geometry generically differs from the
naive D1-D5 geometry. The functions (f1, f̃1) appearing in the metric behave as
f̃1 = 1 +
− (1 + Q5
(ao +
)2 + (aα−o +
f1 = 1 +
− (1 + Q5
(aα−o +
. (D.12)
In the decoupling limit these functions become
f̃1 → r−2(Q1 −Q−15 (a2 + aα−aα−)) ≡
; f1 → r−2(Q1 −Q−15 (aα−aα−)) ≡
, (D.13)
and thus (ao, a
o ) drop out. Note that q̃1 corresponds to the conserved momentum charge
in the F1-P system (D.6). Substituting the decoupling region functions into (4.1), one finds
that the near horizon region of the solution is AdS3×S3×M4, supported by both F (3) and
H(3) flux:
ds2 =
(−dt2 + dy2) +
q1Q5(
+ dΩ23) +
ds2M4 ; (D.14)
e2Φ =
Q5q̃1
tyr = −
= 2q−11 q̃1Q5;
tyr = 2aQ
1 r, H
= −2a.
The field strengths F (1) and F (5) vanish, but there are non-vanishing potentials:
B(2)ρσ =
2Q−15 a
α−ωα−ρσ , C
(0) = −q−11 a, C(4)ρστπ = Q
5 aǫρστπ; (D.15)
tyαβ = a(1 + q̃
2)ǫαβ, C
αβρσ = 2
2ǫαβa
α−ωα−ρσ , C
tyρσ =
2Q−15 a
α−ωα−ρσ ,
where ǫ is a 2-form such that dǫ is the volume form of the unit 3-sphere. The conserved
charges therefore include Chern-Simons terms; using the equations of motion (A.2) one
finds that they are given by
D5 : Q5 =
(F (3) +H(3)C(0));
D1 : q̃1 =
S3×M4
(∗F (3) +H(3) ∧ C(4)); (D.16)
D3 : aα− =
S3×ωα−
B(2) ∧ (F (3) +H(3)C(0));
NS5 : a = −1
H(3),
where we drop terms which do not contribute to the charges. The curvature radius of the
AdS3 × S3 is l = (q1Q5)1/4, and the three-dimensional Newton constant is
8πV4Ω3
(q1Q5)
3/4, (D.17)
with the volume of M4 being (2π)4V and 2κ210 = (2π)
7(α′)4. Then using [40, 41] the central
charge of the dual CFT is
(α)′4
q̃1Q5 ≡ 6ñ1n5 (D.18)
where the integral charges (ñ1, n5) are given by
Q5 = α
′n5; q̃1 =
(α′)3ñ1
. (D.19)
Now consider the relation between this system and the F1-P system discussed previously.
The conserved charges here are (Q5, q̃1, a, a
α−), which correspond to the winding, momen-
tum along the y circle and winding along the internal manifold in the original system. The
fact that (a, aα−) measure NS5-brane and D3-brane charges in the final system is consistent
with the duality chains from the F1-P systems: applying the standard duality rules along
the chains given in (2.6),(2.7) and (3.4), one indeed finds that the original winding charges
become NS5-brane and D3-brane charges.
Finally let us comment on the constant terms in the harmonic functions, (ao, a
These clearly determine the behavior of the solution at asymptotically flat infinity: the B
field and RR potentials at infinity depend on them. Now consider how these constant terms
can be described in the CFT. In the context of the pure D1-D5 system it was noted in
[12] that (infinitesimal) constant terms in the harmonic functions (f1, f5) can be reinstated
by making (infinitesimal) irrelevant deformations of the CFT by SO(4) singlet operators.
See also [42] for a related discussion in the context of the AdS5/CFT4 correspondence.
It seems probable that a similar interpretation would hold here: the (nt − 1) parameters
(ao, a
o ) (where nt = 5, 21 for T
4 and K3 respectively) would be related to the parameters
of deformations of the CFT by irrelevant SO(4) singlet operators. In total taking into
account these (nt − 1) zero modes, plus the two constant terms in the (f1, f5) harmonic
functions, one gets (nt+1) parameters. This agrees exactly with the count of the number of
irrelevant SO(4) singlet operators5. How to describe these deformations in the field theory
beyond the infinitesimal level is not known, however.
E Density of ground states with fixed R charges
In this appendix we will derive an asymptotic formula for the number of R ground states with
given R charges. Our derivation follows closely that of [43] for the density of fundamental
string states with a given mass and angular momentum. In fact, we will consider the case of
K3, so the relevant counting is precisely that of the density of left moving heterotic string
states with a given excitation level N and (commuting) angular momenta (j12, j34) in the
5Such deformations may also be related to the attractor flow of moduli; this idea is currently being
developed by Kyriakos Papadodimas and collaborators.
transverse R4. For this purpose we can consider the following Hamiltonian
(a)=1
+ λ1j
1 + λ2j
2, (E.1)
where (λ1, λ2) are Lagrange multipliers and
j1 = j12 = −i
n−1(α1−nα
n −α2−nα1n); j2 = j34 = −i
n−1(α3−nα
n−α4−nα3n). (E.2)
Here the oscillators satisfy the standard commutation relations, namely
n , α
nδn+mδ
(a)(b). In [43] the partition function was computed in the case λ2 = 0, and thus
the partition function of interest here can be computed by generalizing their results. The
first step is to diagonalize the Hamiltonian by introducing combinations
a12n =
(α1n + iα
n); b
(α1n − iα2n) (E.3)
and analogously (a34n , b
n ). Then the Hamiltonian takes the form
(a)=5
n + (n− λ1)(a12n )†a12n + (n+ λ1)(b12n )†b12n (E.4)
+(n− λ2)(a34n )†a34n + (n+ λ2)(b34n )†b34n
The partition function Z = Tr(e−βH) is then
(1− wn)−20(1− c1wn)−1(1− c−11 wn)−1(1− c2wn)−1(1− c
(E.5)
with w = e−β and c1 = e
βλ1 , c2 = e
βλ2 . To estimate the asymptotic density of states, one
as usual expresses the partition function in terms of modular functions and then uses the
modular transformation properties. Here one needs the Jacobi theta function
θ1(z|τ) = 2f(q2)q1/4 sin(πz)
(1− 2q2n cos(2πz) + q4n), (E.6)
f(q2) =
(1− q2n), q = eiπτ , (E.7)
and the modular transformation property
| − 1
) = eiπ/4
τeiπz
2/τθ1(z|τ) (E.8)
Rewriting the partition function in terms of the modular functions, applying this modular
transformation and then taking the high temperature limit results in
Z(β, λ1, λ2) = Cβ
12e4π
2/β λ1λ2
sin(πλ1) sin(πλ2)
, (E.9)
with C a constant. From this expression one can extract the density of states with level N
and angular momenta (j1, j2) by expanding
Z(w, k1, k2) =
dN,j1,j2w
Neik1j
1+ik2j
, (E.10)
where k1 = −iβλ1 and k2 = −iβλ2, and projecting out the dN,j1,j2 . Integrating over (k1, k2)
can be done exactly, since
dkeiky
sinh(πk/β)
cosh2(βy/2)
, (E.11)
resulting in the following contour integral over a circle around w = 0 for dN,j1,j2 :
dN,j1,j2 = C
β14e4π
2/β 1
cosh2(βj1/2) cosh2(βj2/2)
. (E.12)
Assuming N is large the integral can be approximated by a saddle point evaluation, with
the saddle point defined by the solution of
= N + 1− j1 tanh(1
j1β)− j2 tanh(1
j2β). (E.13)
For small angular momenta, which is the case of primary interest here, the solution is
β ∼= 2π/
N + 1. For (
∣) = O(N) the stationary point is at
N + 1− |j1| − |j2|
. (E.14)
Note that
∣ ≤ N . This latter stationary point is equally applicable to small angular
momenta, and thus one can write the asymptotic density of states as
dN,j1,j2
4(N + 1− j)31/4
2π(2N − j)√
N + 1− j
cosh2(
N+1−j ) cosh
N+1−j )
, (E.15)
where j =
∣. The constant of proportionality is fixed by the state with j1 = N ,
j2 = 0 being unique. Note that the commuting generators (j3, j̄3) of (SU(2)L, SU(2)R)
respectively are related to the rotations in the 1-2 and 3-4 planes via j3 =
(j1 + j2) and
j̄3 =
(j1 − j2). The total number of states at level N is
dN ∼=
N27/4
exp(4π
N), (E.16)
and thus the density of states with zero angular momenta differs from the total number of
states only by a factor of 1/N ; the exponential growth with N is the same.
References
[1] O. Lunin and S. D. Mathur, “AdS/CFT duality and the black hole information para-
dox,” Nucl. Phys. B 623, 342 (2002) [arXiv:hep-th/0109154].
[2] O. Lunin, S. D. Mathur and A. Saxena, “What is the gravity dual of a chiral primary?,”
Nucl. Phys. B 655 (2003) 185 [arXiv:hep-th/0211292].
[3] O. Lunin, J. Maldacena and L. Maoz, “Gravity solutions for the D1-D5 system with
angular momentum,” arXiv:hep-th/0212210.
[4] O. Lunin and S. D. Mathur, “Metric of the multiply wound rotating string,” Nucl.
Phys. B 610, 49 (2001) [arXiv:hep-th/0105136].
[5] V. Balasubramanian, J. de Boer, E. Keski-Vakkuri and S. F. Ross, “Supersymmetric
conical defects: Towards a string theoretic description of black hole formation,” Phys.
Rev. D 64, 064011 (2001) [arXiv:hep-th/0011217].
[6] J. M. Maldacena and L. Maoz, “De-singularization by rotation,” JHEP 0212 (2002)
055 [arXiv:hep-th/0012025].
[7] S. D. Mathur, “The fuzzball proposal for black holes: An elementary review,”
arXiv:hep-th/0502050.
[8] I. Bena and N. P. Warner, “Black holes, black rings and their microstates,” arXiv:hep-
th/0701216.
[9] V. S. Rychkov, “D1-D5 black hole microstate counting from supergravity,” JHEP 0601
(2006) 063 [arXiv:hep-th/0512053].
[10] K. Skenderis and M. Taylor, “Kaluza-Klein holography,” JHEP 0605, 057 (2006)
[arXiv:hep-th/0603016].
[11] K. Skenderis and M. Taylor, “Fuzzball solutions for black holes and D1-brane-D5-brane
microstates,” Phys. Rev. Lett. 98, 071601 (2007) [arXiv:hep-th/0609154].
[12] I. Kanitscheider, K. Skenderis and M. Taylor, “Holographic anatomy of fuzzballs,”
JHEP 0704 (2007) 023 [arXiv:hep-th/0611171].
[13] M. Taylor, “General 2 charge geometries,” JHEP 0603 (2006) 009 [arXiv:hep-
th/0507223].
[14] C. G. Callan, J. M. Maldacena and A. W. Peet, “Extremal Black Holes As Fundamental
Strings,” Nucl. Phys. B 475, 645 (1996) [arXiv:hep-th/9510134].
[15] A. Dabholkar, J. P. Gauntlett, J. A. Harvey and D. Waldram, “Strings as Solitons and
Black Holes as Strings,” Nucl. Phys. B 474, 85 (1996) [arXiv:hep-th/9511053].
[16] E. Bergshoeff, R. Kallosh, T. Ortin, D. Roest and A. Van Proeyen, “New formulations
of D = 10 supersymmetry and D8 - O8 domain walls,” Class. Quant. Grav. 18, 3359
(2001) [arXiv:hep-th/0103233].
[17] A. Sen, “String String Duality Conjecture In Six-Dimensions And Charged Solitonic
Strings,” Nucl. Phys. B 450 (1995) 103 [arXiv:hep-th/9504027].
[18] A. Sen, “Strong - weak coupling duality in four-dimensional string theory,” Int. J. Mod.
Phys. A 9 (1994) 3707 [arXiv:hep-th/9402002].
[19] J. Maharana and J. H. Schwarz, “Noncompact symmetries in string theory,” Nucl.
Phys. B 390, 3 (1993) [arXiv:hep-th/9207016].
[20] D. Youm, “Black holes and solitons in string theory,” Phys. Rept. 316, 1 (1999)
[arXiv:hep-th/9710046].
[21] E. Bergshoeff, H. J. Boonstra and T. Ortin, “S Duality And Dyonic P-Brane Solutions
In Type II String Theory,” Phys. Rev. D 53 (1996) 7206 [arXiv:hep-th/9508091].
[22] K. Behrndt, E. Bergshoeff and B. Janssen, “Type II Duality Symmetries in Six Di-
mensions,” Nucl. Phys. B 467 (1996) 100 [arXiv:hep-th/9512152].
[23] L. J. Romans, “Selfduality For Interacting Fields: Covariant Field Equations For Six-
Dimensional Chiral Supergravities,” Nucl. Phys. B 276 (1986) 71.
[24] S. de Haro, S. N. Solodukhin and K. Skenderis, “Holographic reconstruction of space-
time and renormalization in the AdS/CFT correspondence,” Commun. Math. Phys.
217, 595 (2001) [arXiv:hep-th/0002230].
[25] M. Bianchi, D. Z. Freedman and K. Skenderis, “How to go with an RG flow,” JHEP
0108, 041 (2001) [arXiv:hep-th/0105276].
[26] M. Bianchi, D. Z. Freedman and K. Skenderis, “Holographic renormalization,” Nucl.
Phys. B 631, 159 (2002) [arXiv:hep-th/0112119].
[27] I. Papadimitriou and K. Skenderis, “AdS / CFT correspondence and geometry,”
arXiv:hep-th/0404176.
[28] I. Papadimitriou and K. Skenderis, “Correlation functions in holographic RG flows,”
JHEP 0410, 075 (2004) [arXiv:hep-th/0407071].
[29] K. Skenderis, “Lecture notes on holographic renormalization,” Class. Quant. Grav. 19
(2002) 5849 [arXiv:hep-th/0209067].
[30] S. Deger, A. Kaya, E. Sezgin and P. Sundell, “Spectrum of D = 6, N = 4b supergravity
on AdS(3) x S(3),” Nucl. Phys. B 536, 110 (1998) [arXiv:hep-th/9804166].
[31] G. Arutyunov, A. Pankiewicz and S. Theisen, “Cubic couplings in D = 6 N = 4b su-
pergravity on AdS(3) x S(3),” Phys. Rev. D 63 (2001) 044024 [arXiv:hep-th/0007061].
[32] J. R. David, G. Mandal and S. R. Wadia, “Microscopic formulation of black holes in
string theory,” Phys. Rept. 369, 549 (2002) [arXiv:hep-th/0203048].
[33] L. F. Alday, J. de Boer and I. Messamah, “The gravitational description of coarse
grained microstates,” JHEP 0612, 063 (2006) [arXiv:hep-th/0607222].
[34] F. Larsen and E. J. Martinec, “U(1) charges and moduli in the D1-D5 system,” JHEP
9906, 019 (1999) [arXiv:hep-th/9905064].
[35] A. Jevicki, M. Mihailescu and S. Ramgoolam, “Gravity from CFT on S**N(X): Sym-
metries and interactions,” Nucl. Phys. B 577, 47 (2000) [arXiv:hep-th/9907144].
[36] E. Bergshoeff, C. M. Hull and T. Ortin, “Duality in the type II superstring effective
action,” Nucl. Phys. B 451, 547 (1995) [arXiv:hep-th/9504081].
[37] M. J. Duff, J. T. Liu and R. Minasian, “Eleven-dimensional origin of string / string
duality: A one-loop test,” Nucl. Phys. B 452, 261 (1995) [arXiv:hep-th/9506126].
[38] P. K. Townsend, “A New Anomaly Free Chiral Supergravity Theory From Compacti-
fication On K3,” Phys. Lett. B 139 (1984) 283.
[39] P. S. Aspinwall, “K3 surfaces and string duality,” arXiv:hep-th/9611137.
[40] J. D. Brown and M. Henneaux, “Central Charges in the Canonical Realization of
Asymptotic Symmetries: An Example from Three-Dimensional Gravity,” Commun.
Math. Phys. 104 (1986) 207.
[41] M. Henningson and K. Skenderis, “The holographic Weyl anomaly,” JHEP 9807, 023
(1998) [arXiv:hep-th/9806087].
[42] K. Skenderis and M. Taylor, “Holographic Coulomb branch vevs,” JHEP 0608, 001
(2006) [arXiv:hep-th/0604169].
[43] J. G. Russo and L. Susskind, “Asymptotic level density in heterotic string theory and
rotating black holes,” Nucl. Phys. B 437, 611 (1995) [arXiv:hep-th/9405117].
ABSTRACT
  We construct general 2-charge D1-D5 horizon-free non-singular solutions of
IIB supergravity on T^4 and K3 describing fuzzballs with excitations in the
internal manifold; these excitations are characterized by arbitrary curves. The
solutions are obtained via dualities from F1-P solutions of heterotic and type
IIB on T^4 for the K3 and T^4 cases, respectively. We compute the holographic
data encoded in these solutions, and show that the internal excitations are
captured by vevs of chiral primaries associated with the middle cohomology of
T^4 or K3. We argue that each geometry is dual to a specific superposition of R
ground states determined in terms of the Fourier coefficients of the curves
defining the supergravity solution. We compute vevs of chiral primaries
associated with the middle cohomology and show that they indeed acquire vevs in
the superpositions corresponding to fuzzballs with internal excitations, in
accordance with the holographic results. We also address the question of
whether the fuzzball program can be implemented consistently within
supergravity.

<|endoftext|><|startoftext|>
Introduction
While the emergence and learning of human languages has been simulated
since decades on computers [1], and while a later economics Nobel laureate
also contributed to linguistics long ago [2], the competition between existing
languages of adults is a more recent research trend, where physicists have
tried to play a major role. It follows the principle of survival of the fittest,
as known from Darwinian evolution in biology, and indeed many of the tech-
niques have been borrowed from simulational biology [3]. This emphasis from
physics on the competition of existing languages for adult humans started
with Abrams and Strogatz[4] and was then followed by at least six groups
independently [5, 6, 7, 8, 9, 10]. More recently, of course, reviews[11, 3] and
conferences brought them together, and others followed them [12, 13, 14].
Today about 7000 different languages (as defined by linguists) are spoken,
and about every ten days one of them dies out. On the other hand, the split
of Latin into different languages spoken from Portugal to Romania is well
documented. In statistical physics, we can describe and explain the pressure
which air molecules of a known density and temperature exert on the walls.
But we cannot predict were one given molecule will be one second from
now. Similarly, the application of statistical physics tools to linguistics may
describe the ensemble of the seven thousand presently existing languages,
http://arxiv.org/abs/0704.0691v1
 1000
 1  10  100  1000  10000  100000  1e+06  1e+07  1e+08  1e+09
size s
Distribution of language sizes from Grimes, Ethnologue, and  550 exp[-0.05{ln(size/7000)}**2]
Figure 1: Empirical variation of the number Ns of languages spoken by
s people each. For better presentation, the language sizes s are binned in
powers of two. Data from Ethnologue [16], as plotted in [17]. The parabola
corresponds to a log-normal distribution; we see deviations from it for the
smallest sizes [18].
but not the extinction of one given language in one given region on Earth.
Figure 1 shows how many languages exist today, as a function of the number
of speakers of that language. A statistical theory of language competition
thus first of all should try to reproduce such results, in order to validate the
model. If if fails to describe this fact, why should one trust it at all? Or as
stated by linguist Yang on page 216 of [15]: It is time for the ancient field of
linguistics to join the quantitative world of modern science.
This review starts with our own model for numerous languages in section
2, followed by a review of the alternative model of Viviane de Oliveira and
coworkers [10]. Then we review more shortly the many other models which
at present do not allow the simulation of thousands of different languages.
2 Schulze Model
2.1 Definition
Our own simulations, also called the Schulze model, characterise each lan-
guage (or grammar) by F independent features each of which can take one
of Q different values; the binary case Q = 2 allows the storage in bit-strings.
Three basic mechanisms connected with probabilities p, q and r are common
to all variants:
i) With probability p at each iteration and for each feature, this feature
is changed (or mutated in biological language). This change is random or
not, depending on process ii).
ii) With probability q the mutation/change under i) is not random but
instead transfers the value of this feature from another person in the popu-
lation. This transfer is called diffusion by linguists. With probability 1− q,
the change is random.
iii) With probability (1 − x)2r (also (1 − x2)r has been used instead)
somebody discards the mother language and takes over the whole language
(all F features) from another person in the population. Here x is the fraction
of people speaking the old language. This flight is called shift by linguists.
Several variants are possible: One can use one joint population where
everybody can meet everybody for transfer and shift; or we put people onto
a square L× L lattice or more complicated network, where diffusion and/or
shift are possible only from a randomly selected neighbour. People may
migrate on this lattice, which physicists would call diffusion. The population
can be fixed, meaning that at every iteration all adults are replaced by their
children. Or it can grow by a suitable birth and death process; in this case the
shifting probability can include also a factor proportional to the population.
If one dislikes to have three free parameters p, q, r one may set q = r = 1
without much loss in results.
For the number F of features, from 8 to 64 were used in simulations. Real
languages contain many thousands of words for everyday use, and thus one
should identify one feature rather with an independent grammatical element
(like the order of subject, object and verb in a sentence) than with a word. F
for real languages was estimated as about 30 [19] or about 40 to 50 [15] such
choices, and the Word Atlas of Language Structures [20] lists 138 features
with up to Q = 9 values. These grammar sizes thus correspond roughly to
what has been simulated. According to [21] the rate of change in normal
linguistic typological features, i.e., excluding a few extraordinarily unstable
ones, is 16 % per 1000 years.
2.2 Results
If we start with everybody speaking one language (or with just one Eve), then
at low p this language still dominates and is spoken by more than half of the
total population, with the remaining people speaking a minor and short-lived
variant of this dominating language. At high p, on the other hand, the whole
population soon fragments into many languages, roughly such that everybody
selects nearly randomly one of the QF possible languages. This corresponds
to the biblical story of the Tower of Babel. We thus have dominance for
small p and fragmentation for large p, with a first-order phase transition or
jump at some threshold value which depends on the other parameters and
details of the model.
If instead we start with everybody speaking a randomly selected lan-
guage, then for high p this situation remains. For low p, however, after some
time one language by random accidents happens to grow to a sufficiently
large size such that it then grows rapidly to be spoken by more than half of
the population. Thus a transition from initial fragmentation to final dom-
inance happens in Fig.2. The threshold value is different from the one for
the opposite direction from dominance to fragmentation: we have hysteresis
as is common for first-order transitions, Fig.3. Empirically, this transition
to one dominating language was observed on the American continent; within
the last five centuries, two thirds of the native Brazilian languages have died
out. And in the last half century we observed the rise of English in physics
research publications. While 85 years ago, physicist Bose sent from India
his paper to Einstein in order to have it translated from English to German
(which lead to Bose-Einstein condensation), after World War II physics re-
search was usually published in English, first in Japan, since the 1960’s in
(West) Germany, a decade later in France, since the 1990’s in Russia; finally,
China has witnessed a surge in physics papers written in English since 2000.
The time needed to go from fragmentation to dominance increases roughly
logarithmically with population size, at least in the binary case Q = 2 with-
out lattice. Thus a mathematical solution for an infinite population might
never get this transition. In other words, proper models should be agent-
based [22], with individuals acting on their own; one should not average over
10000
100000
100 M
0 100 200 300 400 500 600
Size of largest language for 100M, 10M, 1 M, 0.1 M, 10 K, 1 K people, 0.48 mutations per person, q=0.94
Figure 2: Variation with time of the number of people speaking the most
widespread language, for various population sizes. The larger the population
is, the longer is the time until the transition from fragmentation to dominance
takes place. Q = 2, F = 8, p = 0.06, q = 0.94; from [11].
the whole population, using differential equation for the concentrations. Such
simulations have been standard in computational physics for half a century
(Monte Carlo and Molecular Dynamics), while mean field approximations
average over many individuals and can give somewhat or completely wrong
results. (The transition from fragmentation to dominance may require a shift
probability (1− x)2r instead of (1− x2)r.)
The language size distribution to be compared with Fig.1 is shown in
Fig.4a. To get it, we looked at non-equilibrium results and introduced ran-
dom multiplicative noise, since otherwise the language sizes were too small
and their distribution too irregular. Fig.4b avoids these tricks and instead
places the people on a directed scale-free network, discussed below.
No lattice or other spatial structure was employed in the above simula-
tions. On a lattice one can look at language geography [23, 24, 25]. North
and South of the Alps, different languages are spoken, and the same sepa-
ration is made by the English Channel. Genetic and linguistic boundaries
in Europe mostly coincide, and about two thirds of them agree with natural
boundaries like a mountain chain or sea [26]. We simulated this effect on a
0.01 0.1 1 10
population in millions
Critical mutation rate, no transfer, initially one (top) or initially many random languages (bottom)
Figure 3: Dependence of the mutation threshold for the phase transition
on the population size; upper data from dominance to fragmentation, lower
data from fragmentation to dominance. Above the curves we arrive at frag-
mentation, below at dominance. From [11].
lattice [27] with contact only between nearest neighbours and a horizontal
barrier separating the upper from the lower half. The shift from a small
to a large language happens across the barrier only with a small crossing
probability c. For c = 0 one thus has two completely separated halves of the
lattice, and trivially the languages which evolve as dominating are different
on both sides of the border. With c = 1 the border has no effect, and only
one language dominates. Fig.5 shows how often for small but finite c two
separate dominating languages may coexist; already quite small c suffice,
particularly for large lattices, to unify the two regions into only one with the
same language dominating on both sides.
Fig.4b employed a directed Barabási-Albert scale-free network. These
networks are grown from a small fully connected core such that each new
network member selects m already existing members as teachers. The more
people have selected a certain teacher before, the higher is the probability
that this teacher will again be selected. Information only flows from the
teacher to the person who selected this teacher, not in the opposite direction
[31].
 1000
 10000
 100000
 10 M
 1  10  100  1000  10000  100000
number of speakers
Two runs of 200 million; fragmented start, mutation rate 0.0032 per bit-string, t < 5000 or 6000
 10 K
 100K
 1  10  100  1 K  10 K  100K  1 M  10 M
number of speakers
N = 1 million (+), 3 M (x), 10 M (*), 30 M (sq.)
Figure 4: Language size distribution Ns. Top: without lattice, with random
multiplicative noise, not in equilibrium [17]. Bottom: On scale-free network
in equilibrium [30].
 0.0001  0.001  0.01  0.1
crossing rate
100 samples, L = 50 (right line), 100 (+,x), 200 (left line)
Figure 5: Fraction of cases when a semi-permeable barrier allowed two
different languages to dominate on its two sides in the Schulze model.
3 Viviane Model
3.1 Definition
The model of Viviane de Oliveira et al [10] has become known as the Viviane
model (following the Brazilian tradition of how to call people). It simulates
the colonisation of an uninhabited continent by people. Each site j of an L×L
square lattice can later be populated by cj people; this carrying capacity cj
is selected randomly between 1 and some m ∼ 102. Initially only one site i
is occupied by ci people.
Then at each time step, one randomly selected empty lattice neighbour j
of occupied sites becomes occupied with probability cj/m by cj people. Thus
after some time the whole lattice becomes occupied and the simulation stops.
In contrast to the Schulze model, the Viviane model is a growth process and
not one eventually fluctuating about some equilibrium.
Languages have no internal structure and are simply numbered 1, 2, 3,,,
with 1 being the number of the language spoken on the originally occupied
site. All people within one lattice site speak the same language. First, if a
new site has been colonized the language spoken there is taken from one of the
occupied neighbours k, proportional to the fitness Fk of that neighbour site k.
This fitness is the total number of people speaking anywhere in the lattice the
language of k, except that it is bounded from above by a maximum fitness
Mk fixed randomly between 1 and some Mmax ∼ 10
3. Second, mutations
(language change) are made with probability α/Fj on the freshly occupied
site j only, from the selected language of neighbour k. A mutation means
that a new language is created which gets a new number not used previously.
In this way, the flight (shift) from small language and the mutations,
which were two separate processes in the Schulze model, are combined into
one process; and this process also is a transfer (diffusion) process which in
the Schulze model was dealt with separately. Thus we have here only one
free parameter α, the mutation coefficient, instead of three parameters p, q, r
in the Schulze model.
Variants allow mutations also later, after a site is occupied. Or a language
is characterised by a string of F bits (Q = 2 in the Schulze notation) and only
different bit-strings count as different languages [28]. Or the capacities ck are
not homogeneously distributed between 1 and m but more often small than
large, with a frequency proportional to 1/cj, as long as it is not larger than
the maximum m. Lot of computer time then is saved if after the occupation
of the new sites one selects two of its occupied neighbours and takes the
language from the one with the bigger capacity; if only one neighbour is
occupied then its language is taken over.
3.2 Results
In contrast to the Schulze model, the Viviane model gives languages spoken
by 109 people if a sufficiently large lattice is used. The language size distri-
bution Ns has a maximum at moderately small language sizes s. However,
instead of a round parabola as in Fig.1, the log-log plot of Ns versus s gives
two straight lines meeting at the maximum, that means one power law for
small s (where Ns increases with s) and another power law for large s (where
Ns decreases with s). So, not everything is solved yet.
Crucial progress was made by Paulo Murilo de Oliveira (not the same
family as Viviane de Oliveira), who introduced the above-mentioned modifi-
cations [28]: Languages (grammars) are characterized by bit-strings (Q = 2)
of length F ≃ 13 and count as different only when their bit-strings differ; the
carrying capacities c are selected with a probability ∝ 1/c, and the newly
colonized site gets the fitter language of two previously occupied neighbours.
 100K
 1  10  100  1 K  10K 100K 1 M  10M 100M 1 G
number of speakers
L = 20,000; 7500 languages; 5940 million people
Figure 6: Variation of the number Ns of languages spoken by s people each
in the Viviane model, as modified and published in [28].
Now the distribution is roughly log-normal, Fig.6, with enhancement for very
small sizes; the total population and the total number of languages can be
made close to the present reality, the maximum of the parabola in a double-
logarithmic plot (with binning by factors of two in s) is near s ≃ 104, while
the largest language is spoken by 109 people, similar to Mandarin Chinese.
Fig.7 shows for both the modified Viviane model and the Schulze lattice
model that languages in general are less similar to each other if they are
widely separated geographically, in agreement with reality [24, 25]. Note the
difference in scales: One lattice constant (distance between nearest neigh-
bours) corresponds to about one kilometer in the Viviane model and 1000
kilometers in the Schulze model, if Fig.7 is compared to reality [24].
Also the classification of different languages into one family, like the Indo-
European languages, has been simulated with moderate success. Following
the history of the mutations during the colonisation, a language tree like can
be constructed in the unmodified version (Fig.11.15 in [11]). One can imagine
that this is Latin, splitting up into Romanian, Italian, Spanish and French,
with Spanish then splitting into Castellan, Galego and Catalan, and Catalan
mutating further into Mallorquin. (Many small branches were omitted for
clarity.) More quantitative information is obtained from the modified bit-
 0  2  4  6  8  10  12  14  16  18  20
geographical distance in 1000
Modified Viviane model, line = random differences
 1  2  3  4  5  6  7  8  9  10
geographical distance
Schulze model, Q=5, F=9, p=0.5, q=r=0.9, s=0.5
Figure 7: Language differences (in arbitrary units [24]) as a function of
geographical distance in the Viviane model (top) and the Schulze model
(bottom). The horizontal line corresponds to completely uncorrelated lan-
guages. In the bottom part, + and x correspond to start with fragmentation
(+) and with dominance (x). From [24, 30].
string version [28]. The mutated language on a newly occupied site starts a
new family if it differs in two or more bits from the bit-string characterising
the historically first language of the old family. The size distribution of
language families in Fig.8 agrees in its central part with the empirically
observed [29] exponent –0.525 and is independent of the length F = 8, 16,
32 or 64 of the bit-strings for the Viviane model and independent of the
population size for the Schulze model.
 1000
 10000
 100000
 1  10  100  1000
number of languages
100 * L=10,000; 8(+), 16(x), 32(*), 64(sq.) bits
 10 K
 100K
 1  10  100  1000
number of families
N = 1 million (+), 3 M (x), 10 M(*), 30 M(sq.)
Figure 8: Number of families as a function of the number of different lan-
guages in this family [30]. Top: Modified Viviane model for various lengths
of the bit-strings. Bottom: Schulze model with Q = 5, F = 8 on directed
Barabási-Albert scale-free network, p = 0.5, q = 0.59, r = 0.9 for various
population sizes.
4 Other Models
Years before physicists invaded en masse the field of linguistics, Nettle [33]
already wrote down a differential equation for the number L of languages,
dL/dt = 70/t− L/20 ,
where time t is measured in millennia. For long times, only one language
(mathematically: zero languages) will remain; however that time lies far in
the future. A more detailed splitting mechanism was introduced by Novotny
and Drozd [34] for the emergence of new languages from one mother lan-
guage, and gave a log-normal distribution of language sizes, in agreement
with reality except presumably at the smallest sizes [18]. In the same spirit
of looking at languages as a whole, ignoring the individuals, are the very re-
cent models of Tuncay [14], who coupled a splitting mechanism with random
multiplicative noise in the size of the growing population, plus an extinction
probability, and found a power-law decay or a log-normal size distribution
for the simulated languages, depending on parameters. He also checked the
lifetimes of the simulated languages. An “early” attempt to apply the Ising
model of statistical physics to linguistics [36] had little success.
Numerous coupled differential equations were studied by scientists coming
from theoretical chemistry, mathematics and computer science [35] for the
purpose of language learning by children. They have been applied [3] also
to the competition of up to 8000 languages of adults, but since the original
authors have to our knowledge not followed this re-interpretation of their
learning model we now refer to [35, 3] for details and results.
It was the population dynamics of Abrams and Strogatz [4] which started
the avalanche of physics papers on language competition. They assume two
languages X and Y, spoken by the fractions x and y = 1 − x of a fixed
population with a time dependence
dx/dt = yxas− xya(1− s) ,
with a status or prestige variable S which is close to one if X has a high
prestige and close to zero for low prestige of X. The neutral case is s = 1/2.
The exponent a = 1.31 was fitted to some empirical data of how minority
languages decay in size. If a is replaced by unity we arrive at the logistic
equation of Verhulst from the 19th century, which was applied to languages
by Shen in 1997.
0 1 2 3 4 5 6 7 8 9 10
Abrams-Strogatz model for a = 1.31 and s = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6 (bottom to top); unsymmetric
Figure 9: Abrams-Strogatz model: Fraction of people speaking language
X, versus time, for an initial concentration of only ten percent and various
prestige values s of this language. For s ≥ 0.7 this language finally wins over
the other language Y. From [3].
Fig.9 shows the resulting x(t) if X is spoken initially by a minority of
ten percent only. Then for low, neutral or slightly higher status s of its
language X, the fraction decays further towards zero, but for a higher status
like x = 0.7 it finally wins over and is spoken by everybody (not shown).
This may correspond to the influence of a colonial power; indeed in France
today most people speak French as a result of the Roman conquest of more
than two millennia ago, and in Brazil many of the native languages have
become extinct in the last five centuries since the Portuguese arrived there.
This Abrams-Strogatz approach was soon generalized to a lattice by Pa-
triarca and Lepännen [4], and later to populations with bilingual speakers
[5, 38], coexistence of the two languages [8] as well simulations based on
individuals [13].
Such agent-based simulations were also made by Kosmidis et al [6] who
gave each person a string of 20 bits. The first 10 belonged to one language,
the last 10 to another language. In this way they were able to simulate
people speaking, more of less correctly, one or two languages. One could
also interpret their model as one for English which is a mixture of German
(Anglo-Saxon) and French words, due to the conquest of England by the
Normans in 1066.
Finally, Schwämmle [9] also used bit-strings, but to describe biological
ageing through the Penna model. The child can learn the languasge from
the mother, the father, or both, thus also allowing for bilinguals. This model
allowed to simulate that languages are learned easier in youth than at old age.
In this way it builds a bridge between language competition and language
learning [35].
5 How physics may inform linguistics:
prospects for future research
As the research described above has progressed a larger design has become
apparent, which consists in an empirical side looking for quantitative distribu-
tions involving languages [29, 24, 37] and a development of models simulating
similar quantitive distributions. The hope is that as more and more quantifi-
able relations in and among languages are discovered and simulation models
are developed which can adequately replicate these distributions, the simu-
lation models will of necessity become more and more adequate as models
of actual languages, and could therefore be employed for purposes beyond
the ones for which they were designed. For instance, the revised Viviane
model, which was designed to capture the distribution of speaker popula-
tions and the population of languages within families [30], could potentially
be employed for investigating absolute rates of language change, an issue
with which linguists are very much concerned [39], inasmuch as knowledge
of how fast languages change could provide us with a way to date prehis-
toric events involving people speaking given reconstructed languages. Thus,
a strand of research where linguists and physicists can and will continue to
cooperate is the search for quantifiable distributions on the one hand and the
fine-tuning of models which can adequately simulate an increasing range of
such distributions.
Apart from some exceptions [5, 6, 38], most work on language compe-
tition has assumed monolingual speakers. Since most of the world’s pop-
ulation is bi- or multilingual this is clearly not adequate. Language shift
will normally involve transitional bilingualism, or bilingualism may persist
for centuries without the majority language necessarily replacing minority
languages. Diglossia, i.e. the use of different languages for different pur-
poses may help sustain bilingualism. Current models can be extended to
investigate under which conditions bilingualism may persist or get reduced
to monolingualism. Different kinds of situations can be modelled, such as
the replacement of certain, but not all, languages within the domain of the
Roman empire, the development of so-called linguistic areas, where several
languages share a number of features (e.g., the Balkans, India, Mesoamer-
ica), multilingualism caused by linguistic exogamy (the northwest Amazon
region), the shift from one to another lingua franca with retention of minority
languages (Mayan immigrants in urban United States shifting from Spanish
to English but retaining Mayan languages), etc., and may be applied to sit-
uations where prehistoric interaction has left linguistic traces but where the
nature of the interaction is unknown (e.g., the sharing of linguistic features
around the entire coast of the Pacific Ocean). In the Appendix we provide a
preview of the extension of the Schulze model to bilingualism.
An area where physicists may wish to try out their hands more is that of
language change. Simulations may help linguists come to terms with realities
that are accessible through empirical research only in small fragments. Lan-
guages develop and change through the interaction of multitudes of agents
using large lexical inventories and complex grammars. The kinds of regu-
larities that linguist can identify, such as the regularity of sound changes
or directed paths of grammaticalization (roughly, the process whereby sep-
arate words become part of the morphology), are mostly accessible only to
a retrospective view, trough the comparison of language stages dozens or
hundreds of years apart. What lies between is a flux whose behavior is not
easy to understand. 19th century historical linguists, with their focus on
regular sound changes, lived in a universe of clean equations such as Latin
p = English f (as in pater = father). The advent of 20th century soci-
olinguistics, with its focus on the social mechanisms behind sound changes,
complicated the picture, much like the picture gets complicated when one
moves from clean Newtonian physics to modern statistical physics, which
tries to model the actual behavior and interaction of entities. Nevertheless,
unlike physicists, linguists working on the way that languages change have
taken little recourse to simulations that might help them understand the
complexity of how language change or shift percolates within a community.
For instance, a leading sociolinguist has argued that ”networks constituted
chiefly of strong ties function as a mechanism to support minority languages,
resisting institutional pressures to language shift, but when these networks
weaken, language shift is likely to take place” (p. 558 in [40]). This hypoth-
esis is based on just a few case studies, and such case studies are, one the
one hand, extremely costly and, on the other, cannot even begin to cover
the multitute of different situations that actually obtains. In addition to the
strength of network ties, other important parameters are presumably the size
of the group speaking the minority language, geography, prestige of one as
opposed to the other language, economic gain involved in shifting language,
age- and gender-determined mobility, and maybe more. The behaviors of
such parameters can be investigated in simulations (e.g., for geography see
[11, 32]).
Finally, more works needs to be done towards the integration of the mod-
elling of language competition by physicists reviewed here and the modelling
of language evolution by computational linguists [41, 42, 43, 44, 45]. While
physicists have been adept in modelling the interaction among agents but
have operated with languages represented only by numbers or bit-strings,
computational linguists offer elaborate grammar models. With more com-
plex models of the interior structure of languages carried by agents, research
need not be limited to a focus on language competition, but could be ex-
tended to issues of language structure itself.
6 Appendix: Bilingualism
Several authors studied the possibility that people speak more than one lan-
guage [5, 6, 38], and we do here the same for the Schulze model on the square
lattice, with F = 8 features of Q = 3 different values, using only interactions
to nearest neighbours [27]. For this purpose we modify the switch process.
Before, languages spoken by a fraction x of the four neighbours were
dropped in favour of the language of a randomly selected neighbour with
probability (1 − x)2r (r = 0.9). Now we do this at lattice site i only if none
of the four neighbours of i speaks the mother language of site i: x = 0;
then with probability r we replace the mother language of i by the mother
language of a randomly selected neighbour. Otherwise, for x > 0, with the
above probability (1− x)2r, site i learns as an additional “foreign” language
a randomly selected (foreign or mother) language of a randomly selected
neighbour. If in the latter case x > 0, site i has already learned a foreign
language before, then this old foreign language is replaced by the new foreign
language.
 .0001
 0.001
 0.01
 1  10  100  1000  10000
Leading mother(+,x) and foreign(*,sq.) languages
 1e-04
 0.001
 0.01
 1  10  100  1000  10000
Leaders with (+,x) and without (*,sq.) bilinguals
Figure 10: Largest and second-largest languages in the Schulze lattice at
p = 0.01, q = r = 0.9 in a 8000 × 8000 lattice without migration. Part
a includes bilinguals, part b compares only the mother languages with and
without bilinguals.
 .0001
 0.001
 0.01
 1  10  100  1000  10000
As 10a but with forgetting and migration d=0.01
 10  100  1000
100 x 100 migration: d = 1 (left) to 0.0001 (right)
Figure 11: Part a: As Fig 10a, but including a migration probability d of 1 %
and a site-dependent forgetting probability between zero and 5 %, on a 6000×
6000 lattice. Part b shows the drastic speed-up of dominance if the migration
probability d is enhanced: d = 1, 0.5, 0.2, 0.1, . . . , 0.0005, 0.0002, 0.0001.
These are the learning and replacement events if everybody can speak at
most two languages. If instead the number of languages for each person is
restricted by an overall upper limit, then for x > 0 the last-learned foreign
language is replaced by the newly selected neighbour language. If this upper
limit is set equal to one, we go back to the model of monolingual speakers.
In all cases, x is the fraction of neighbours of i speaking as their mother
tongue the mother language of site i. For language diffusion we took q = 0.9
throughout, and for language change mostly p = 0.01. Thus one iteration
may correspond to about one human generation.
We start with a fragmented distribution of mother languages and no
foreign languageas, except that one particular language is spoken initially
by ten percent of the people, randomly spread over the lattice. Then we
check if this “lingua franca” finally (after at most 105 iterations) is spoken
by about everybody: transition from fragmentation to dominance of initially
favoured language. (If another language dominated we count this case as
fragmentation.)
For ten 50× 50 lattices, this transition happened up to p = 0.04 if bilin-
guality is allowed, while for monolinguals it happens up to the larger chang-
ing rate p = 0.09: Bilinguality makes dominance of one language less stable
against continuous changes; see also [38]. For Q = 5 instead of 3 this limit
shifts from 0.04 to 0.05, while for Q = 3, F = 16 it is about 0.03. Fig.10
shows for 8000× 8000 lattices the time dependence of the fraction of people
speaking the largest and the second-largest language, separately for mother
tongue and foreign languages; the comparison for the case without bilinguals
is restricted to the mother language and shows that dominance is reached
faster without bilinguals.
In these simulations, after a short time everybody speaks two languages,
and if up to ten languages are allowed, then again after a short time every-
body speaks ten languages. This is nice but unrealistic. In order to take
into account that people forget again foreign languages which were learned
but not used, or give up learning a foreign language when the need for it
dissipates, we assume that at every time step each speaker (more precisely,
each lattice site) may give up the last-learned foreign language if none of the
neighbours at that time speaks this language. This forgetting happens with
a probability between zero and five percent, fixed for each site randomly and
independently at the beginning.
In addition we included migration via exchanging locations: A speaker
or family exchanges residence with a randomly selected neighbour, and both
carry their languages with them. This happens at each iteration with a
probability d; physicists call d the diffusion constant. Fig.11 shows that ap-
preciable migration can drastically speed up the growth of the lingua franca
from having an initial advantage of being spoken by ten percent of the pop-
ulation to being dominant.
References
[1] http://www.isrl.uiuc.edu/amag/langev/; W.S.Y. Wang, J.W. Minett,
Trans. Philological Soc. 103, 121 (2005).
[2] R.Selten und J. Pool, S.64 in: R. Selten (Hg), Game Equilibrium Models
IV, Berlin-Heidelberg: Springer 1992.
[3] D. Stauffer, S. Moss de Oliveira, P.M.C. de Oliveira, J.S. Sá Martins,
Biology, Sociology, Geology by Computational Physicists. Amsterdam:
Elsevier 2006.
[4] D. Abrams and S.H. Strogatz, Nature 424, 900 (2003); M. Patriarca and
T. Leppänen, Physica A 338, 296 (2004).
[5] J. Mira, J., A. Paredes, Europhys. Lett. 69 (2005) 1031.
[6] K. Kosmidis, J.M. Halley, P. Argyrakis, Physica A, 353 (2005) 595; K.
Kosmidis, A. Kalampokis, P. Argyrakis, Physica A 366, 495 and 320,
808 (2006).
[7] C. Schulze, D. Stauffer, Int. J. Mod. Phys. C 16, 781 and Physics of Life
Reviews 2, 89 (2005).
[8] J.P. Pinasco, L. Romanelli, Physica A 361, 355 (2006).
[9] V. Schwämmle, Int. J. Mod. Phys. C 16, 1519 (2005) and 17, 103 (2006).
[10] V.M. de Oliveira, M.A.F. Gomes and I.R. Tsang, Physica A 361, 361
and 368, 257 (2006).
[11] C. Schulze and D. Stauffer, p.311 in: B.K. Chakrabarti, A. Chakraborti,
and A. Chatterjee (Hgg.), Econophysics and Sociophysics: Trends and
Perspectives. Weinheim: Wiley-VCH 2006. For shorter reviews see
http://www.isrl.uiuc.edu/amag/langev/
AIP Conference Proceedings 779 (2005) 49; Comput. Sci. Engin. 8
(May/June 2006) 86.
[12] T. Tȩsileanu, H. Meyer-Ortmanns, Int. J. Mod. Phys. C 17, 259 (2006).
[13] D. Stauffer, X. Castelló, V.M. Egúıluz, M. San Miguel, Physica A 274,
835 (2007); X. Castelló, V.M. Egúıluz, M. San Miguel, New J. Phys. 8,
article 308 (2006).
[14] Ç. Tuncay, Int. J. Mod. Phys. C 18, No. 5 (2007) , e-prints
physics 0610110, 0612137, 0703144 on arXiv.org
[15] C. Yang, The Infinite Gift - How Children Learn and Unlearn the Lan-
guages of the World, Scribner, New York 2006.
[16] B.F. Grimes, Ethnologue: Languages of the World (14th edn. 2000).
Dallas, TX: Summer Institute of Linguistics; and www.ethnologue.org
[17] D. Stauffer, C. Schulze, F.W.S. Lima, S. Wichmann and S. Solomon,
Physica A 371, 719 (2006).
[18] W.J. Sutherland, Nature 423, 276 (2003).
[19] E.J. Briscoe, Language 76, 245 (2000).
[20] M. Haspelmath, M. Dryer, D. Gil, and B. Comrie (eds.). The World
Atlas of Language Structures. Oxford: Oxford University Press 2005.
[21] S. Wichmann and E.W. Holman. 2007ms. Assessing Temporal Stability
for Linguistic Typological Features. Manuscript to be submitted
[22] F.C.Billari, T. Fent, A. Prskawetz and J. Scheffran (Hgg.) Agent-based
computational modelling, Heidelberg: Physica-Verlag 2006.
[23] Goebl, H., Mitt.Österr. Geogr. Ges. 146 (2004) 247 (in German).
[24] E.W. Holman, C. Schulze, D. Stauffer und S. Wichmann, conditionally
accepted by Linguistic Typology.
[25] L.L. Cavalli-Sforza and W.S.Y. Wang, Language 62, 38 (1986).
[26] G. Barbujani and R.R. Sokal, Proc. Natl. Acad. USA 87, 1816 (1990).
http://arxiv.org/abs/physics/0610110
[27] C. Schulze and D. Stauffer, Physica A, in press (2007).
[28] P.M.C. de Oliveira, D. Stauffer, F.W.S. Lima, A.O. Sousa, C. Schulze
and S. Moss de Oliveira, Physica A 376, 609 (2007).
[29] S. Wichmann, J. Linguistics 41, 117 (2005).
[30] D. Stauffer, C. Schulze and S. Wichmann, in: Beiträge zur Experimen-
talphysik, Didaktik und computergestützten Physik, S. Kolling (ed.):
Logos-Verlag, Berlin 2007 (Festschrift Patt)
[31] R. Albert and L.A. Barabási, Rev. Mod. Phys. 74, 47 (2002).
[32] C. Schulze and D. Stauffer, Adv. Complex Syst. 9, 183 (2006).
[33] D. Nettle, Lingua 108, 95 (1999) and Proc. Natl. Acad. Sci. USA 96,
3325 (1999).
[34] V. Novotny and P. Drozd, Proc. Roy. Soc. London B267, 947 (2000).
[35] M.A. Nowak, N.L. Komarova and P. Niyogi, Nature 417, 611 (2000);
N.L. Komarova, J. Theor. Biology 230, 227 (2004).
[36] N. Prévost, The physics of language: toward a phase Transition of lan-
guage change. PhD Thesis, Simon Fraser University, Vancouver 2003.
[37] S.Wichmann and E.W. Holman. Submitted. Pairwise Comparisons of
Typological Profiles. For the proceedings of the conference Rara & Raris-
sima – Collecting and interpreting unusual characteristics of human lan-
guages, Leipzig (Germany), 29 March - 1 April 2006; e-print 07040071
at arXiv.org
[38] X. Castelló, V.M. Egúıluz, and M. San Miguel, New J.Phys. 8, 308
(2006)
[39] J. Nichols, Diversity and stability in languages. In: B.D. Joseph, and R.
D. Janda (eds.), The Handbook of Historical Linguistics, pp. 283-310.
Malden/Oxford/Melbourne/Berlin: Blackwell Publishing 2003.
[40] L. Milroy, Social networks. In: J.K. Chambers, P. Trudgill and N.
Schilling-Estes (eds.), The Handbook of Language Variation and Change.
Blackwell 2002.
http://arxiv.org/abs/e-print/0704007
[41] A. Cangelosi and D. Parisi (eds.),Simulating the Evolution of Language.
Berlin: Springer-Verlag 2002.
[42] E.J. Briscoe (ed.), Linguistic Evolution through Language Acquisition:
Formal and Computational Models. Cambridge: Cambridge University
Press 2002.
[43] M. Christiansen and S. Kirby (eds.), Language Evolution. Oxford: Ox-
ford University Press 2003.
[44] B. de Boer, Computer modelling as a tool for understanding language
evolution. In: N. Gontier, J. P. van Bendegem and D. Aerts (eds.), Evo-
lutionary Epistemology, Language and Culture: A Non-Adaptationist
Systems Theoretical Approach, 381-406. Dordrecht: Springer 2006.
[45] Niyogi, Partha, The Computational Nature of Language Learning and
Evolution. Cambridge & London: The MIT Press 2006.
	Introduction
	Schulze Model
	Definition
	Results
	Viviane Model
	Definition
	Results
	Other Models
	How physics may inform linguistics:  prospects for future research
	Appendix: Bilingualism
ABSTRACT
  Simulations of physicists for the competition between adult languages since
2003 are reviewed. How many languages are spoken by how many people? How many
languages are contained in various language families? How do language
similarities decay with geographical distance, and what effects do natural
boundaries have? New simulations of bilinguality are given in an appendix.

<|endoftext|><|startoftext|>
Introduction
A consistent theoretical model of the high critical temperature superconductivity in
cuprates is to be able to accommodate both the normal and superconducting states
under incorporation of the essential features of these systems (see, e.g., [1] for a review):
strong antiferromagnetic (AFM) superexchange interaction inside the CuO2 planes,
occurrence of two relatively isolated energy bands around the Fermi level, able to develop
dx2−y2 pairing: one stemming from single particle copper dx2−y2 states and the second
one from singlet doubly occupied states generated [2] by crystal field interaction; hopping
conduction for an extremely low density of the free charge carriers.
The p-d model [3], while incorporating all these features, is too cumbersome and
cell-cluster perturbation theory [4, 5] providing a hierarchy of the various interaction
terms was used to derive simpler models from it. Extreme limit cases of this reduction
procedure are various effective one-band t-J models (see, e.g., [6, 7] and references
therein) which, while unveiling the role played by the AFM exchange interaction in the
occurrence of the d-wave pairing, address exclusively the superconducting state.
The reduction of the p-d model to an effective two-band Hubbard model considered
by Plakida et al. [8], corroborated with the use of the equation of motion technique
for thermodynamic Green functions (GF) [9], provided the simplest approach to the
description of both the normal [8, 10] and the superconducting states [11, 12, 13] within
a frame securing rigorous fulfilment of the Pauli exclusion principle for fermionic states.
The Green function technique rests on the Hubbard operator algebra. Its rigorous
implementation onto a system characterized by specific symmetry properties (translation
invariant two-dimensional spin lattice, spin reversal invariance of the observables) results
either in characteristic invariance properties of several correlation functions, or in the
occurrence of some exactly vanishing correlation functions. The use of these results
allows rigorous derivation and simplification of the expressions of the frequency matrix
and of the generalized mean field approximation (GMFA) Green functions of the model.
The obtained expressions contain higher order boson-boson correlation functions
(CFs). For the CFs involving singlets (normal singlet hopping CFs and anomalous
exchange pairing CFs), an approximation procedure which avoids the usual decoupling
schemes and, yet, secures the correlation order reduction to GMFA-GF expressions,
under the identification and elimination of exponentially small quantities, is described.
The organization of the paper is as follows. Sec. 2 summarizes essentials of the
two-band Hubbard model and GMFA-GF equations. Sec. 3 describes the invariance
properties following from the translation invariance of the underlying spin lattice.
Sec. 4 derives invariance properties and constraints following from the invariance of the
macroscopic properties of the system under spin reversal. On the basis of the results of
Sec. 3 and 4, rigorous derivation of the frequency matrix in the (r, ω)-representation is
done in Sec. 5. The derivation of GMFA-GF expressions for the boson-boson correlation
functions involving singlets is discussed in Sec. 6.
Collecting together the results of sections 5 and 6, expressions of the frequency
Mean field Green functions of Hubbard model of superconductivity 3
matrix and of the GMFA Green function matrix are derived in the (q, ω)-representation
in sections 7 and 8 respectively. These results explicitly incorporate both hole-doping
and electron-doping features of the cuprate systems through the singlet hopping and
superconducting pairing terms.
The paper ends with conclusions in section 9.
2. Mean field approximation
The Hamiltonian of the effective two-band singlet-hole Hubbard model [8] is written in
the form
H = E1
Xσσi + E2
X22i +
σ0,0σ
1,i +K22
2σ,σ2
1,i +K21
2σ̄,0σ
1,i + τ
σ0,σ̄2
1,i ) (1)
The summation label i runs over the sites of an infinite two-dimensional (2D) square
array the lattice constants of which, ax = ay, are defined by the underlying single crystal
structure. The spin projection values in the sums over σ are σ = ±1/2, σ̄ = −σ.
The Hubbard operators (HOs) X
i = |iα〉〈iβ| are defined for the four states of the
model at each lattice site i: |0〉 (vacuum), |σ〉 = |↑〉 and |σ̄〉 = |↓〉 (single particle spin
states inside the hole subband), and |2〉 = |↑↓〉 (singlet state in the singlet subband).
The multiplication rule holdsX
i = δβγX
i . The HOs may be fermionic (single
spin state creation/annihilation in a subband) or bosonic (singlet creation/annihilation,
spin or charge densities, particle numbers). For a pair of fermionic HOs, the
anticommutator rule holds {X
i , X
j } = δij(δβγX
i + δηαX
i ) whereas, if one or both
HOs are bosonic, the commutation rule holds [X
i , X
j ] = δij(δβγX
i − δηαX
i ). At
each lattice site i, the constraint of no double occupancy of any quantum state |iα〉 is
rigorously fulfilled due to the completeness relation X00i +X
i = 1.
In (1), E1 = ε̃d−µ denotes the hole subband energy for the renormalized energy ε̃d
of a d-hole and the chemical potential µ. The energy parameter of the singlet subband
is E2 = 2E1 + ∆, where ∆ ≈ ∆pd = εp − εd is an effective Coulomb energy Ueff
corresponding to the difference between the two energy levels of the model.
In the description of the hopping processes, the label 1 points to the hole subband
and 2 to the singlet subband. The hopping energy parameter Kab = 2tpdKab depends on
tpd, the hopping p-d integral, and on energy band dependent form factors, Kab. Inband
(K11,K22) and interband (K21 = K12) processess are present. The Hubbard 1-forms
αβ,γη
1,i =
m (2)
incorporate the overall effects of specific hopping processes (through the labels (αβ, γη)
of the pair of Hubbard operators) involving the lattice site i and its neighbouring sites.
Up to three coordination spheres around the reference site i do contribute [4, 5] to
the sum (2), each being characterized by a small specific value of the overlap coefficients
Mean field Green functions of Hubbard model of superconductivity 4
νij (ν1 for the nearest neighbour (nn), ν2 for the next nearest neighbour (nnn), ν3 for
the third coordination spheres).
The quasi-particle spectrum and superconducting pairing for the Hamiltonian (1)
are obtained [11, 12] from the two-time 4× 4 GF matrix (in Zubarev notation [9])
G̃ijσ(t− t
′) = 〈〈X̂iσ(t) |X̂
′)〉〉 = −iθ(t− t′)〈{X̂iσ(t), X̂
jσ}〉, (3)
where 〈· · ·〉 denotes the statistical average over the Gibbs grand canonical ensemble.
The GF (3) is defined for the four-component Nambu column operator
X̂iσ = (X
⊤ (4)
where the superscript ⊤ denotes the transposition. In (3), X̂
jσ = (X
is the adjoint operator of X̂jσ.
The GF matrix in (r, ω)-representation is related to the expression (3) of the GF
matrix in (r, t)-representation by the non-unitary Fourier transform,
G̃ijσ(t− t
G̃ijσ(ω) e
−iω(t−t′) dω . (5)
The energy spectrum of the translation invariant spin lattice of (1) is solved in
the reciprocal space. The GF matrix in this (q, ω)-representation is related to the GF
matrix in (r, ω)-representation by the non-unitary discrete Fourier transform
G̃ijσ(ω) =
e−iq (rj−ri) G̃σ(q, ω). (6)
For an elemental GF of labels (αβ, γη), we use the notation 〈〈X
i (t)|X
′)〉〉 in
the (r, t)-representation and, similarly, 〈〈X
j 〉〉ω (assuming Hubbard operators at
t = 0), in the (r, ω)-representation. In the (q, ω)-representation, it is convenient to use
the notation Gαβ,γη(q, ω).
We shall consider henceforth the GMFA-GF, G̃0σ(q, ω). Its derivation involves:
(i) Differentiation of the GF (3) with respect to t and use of the equations of motion
for the Heisenberg operators X
i (t).
(ii) Derivation of an algebraic equation for G̃ijσ(ω), Eq. (5).
(iii) Elimination of the contribution of the inelastic processes to the commutator
Ẑiσ = [X̂iσ, H ] entering the equation of motion of G̃ijσ(ω).
(iv) Transformation to (q, ω)-representation of the obtained equation of G̃0ijσ(ω) by
means of the Fourier transform (6).
This finally yields
G̃0σ(q, ω) = χ̃
χ̃ω − Ãσ(q)
χ̃ , (7)
χ̃ = 〈{X̂iσ, X̂
iσ}〉, (8)
Ãσ(q) =
eiq (rj−ri) Ãijσ, rij = rj − ri , (9)
Ãijσ = 〈{[X̂iσ, H ], X̂
jσ}〉 . (10)
The matrix Ãijσ is Hermitian.
Mean field Green functions of Hubbard model of superconductivity 5
3. Translation invariance of the spin lattice
Four consequences follow from the translation invariance of the spin lattice.
• The definition of the Hubbard 1-form (2) over a translation invariant spin lattice
results in the identity (which secures the hermiticity of the Hamiltonian H):
αβ,γη
1,i = −τ
γη,αβ
1,i . (11)
• The Green function (3) of the model Hamiltonian (1) depends only on the distance
rij = |rj − ri| between the position vectors at the lattice sites i and j [9].
• The one-site statistical averages are independent on the site label i, 〈X
i 〉 = 〈X
(∀ i, j). For this reason, the site label in the one-site averages will be omitted.
• The two-site statistical averages 〈X
j 〉 remain invariant under the interchange
of the site labels i and j,
j 〉 = 〈X
i 〉, i 6= j (12)
4. Spin reversal invariance
The energy spectrum of the system described by the Hamiltonian (1) does not depend
on the specific values σ = ±1/2 of the spin projection. As a consequence, the definition
of the GF (3) either in terms of the σ-Nambu operator (4) or the σ̄-Nambu operator
X̂iσ̄ = (X
⊤ (13)
has to result in mathematically equivalent descriptions of the observables. This means,
however, that the mathematical structures of the frequency matrices Ãijσ, Eq. (10), and
Ãijσ̄ = 〈{[X̂iσ̄, H ], X̂
jσ̄}〉 emerging from the σ̄-Nambu operator (13), have to be related
to each other.
The identification of the existing relationships is constructive: we calculate and
compare the corresponding matrix elements of Ãijσ and Ãijσ̄. The multiplication rules
and the commutation/anticommutation relations satisfied by the Hubbard operators
result in the following general expression of the elemental anticommutators entering
their definitions:
i , H ], X
j } = δijC
λµ,νϕ
i + (1− δij)νijT
λµ,νϕ
ij , (14)
with one-site contributions given by
λµ,νϕ
i = δνµ
δµσ)E1 + δµ2E2
i +K11τ
0ϕ,σ0
1,i −K22τ
2ϕ,σ2
1,i +K21 ·2σ(τ
2ϕ,0σ̄
1,i +τ
0ϕ,2σ̄
1,i )
+δλ2(−E2X
i +K22
σϕ,2σ
1,i +K21
σ̄ϕ,σ0
1,i )−
−δλ0(K11
σϕ,0σ
1,i +K21
σϕ,σ̄2
1,i )
Mean field Green functions of Hubbard model of superconductivity 6
δλσ)E1 + δλ2E2
i +K11τ
ν0,0σ
1,i −K22τ
ν2,2σ
1,i +K21 · 2σ(τ
ν2,σ̄0
1,i +τ
ν0,σ̄2
1,i )
+δµ2(E2X
i +K22
νσ,σ2
1,i +K21
νσ̄,0σ
1,i )−
−δµ0(K11
νσ,σ0
1,i +K21
νσ,2σ̄
1,i )
δϕ0(K11τ
νµ,σ0
1,i + 2σK21τ
νµ,2σ̄
1,i )− δϕ2(K22τ
νµ,σ2
1,i − 2σK21τ
νµ,0σ̄
1,i )
δλ0(K11τ
νµ,0σ
1,i + 2σK21τ
νµ,σ̄2
1,i )− δλ2(K22τ
νµ,2σ
1,i − 2σK21τ
νµ,σ̄0
1,i )
δν0(K11τ
λϕ,0σ
1,i + 2σK21τ
λϕ,σ̄2
1,i )− δν2(K22τ
λϕ,2σ
1,i − 2σK21τ
λϕ,σ̄0
1,i )
δµ0(K11τ
λϕ,σ0
1,i + 2σK21τ
λϕ,2σ̄
1,i )− δµ2(K22τ
λϕ,σ2
1,i − 2σK21τ
λϕ,0σ̄
1,i )
and two-site contributions given by
λµ,νϕ
ij = δνµ
δµσ)(K11X
j −K22X
j )+(−δµ0K11+δµ2K22)
Xλσi X
δλσ)(−K11X
j +K22X
j )+(δλ0K11−δλ2K22)
δν0K11X
j − δν2K22X
+K21 ·2σ
j +δϕ2X
j +δν,−λ(X
δϕ0K11X
j − δϕ2K22X
+K21 ·2σ
j + δν2X
j + δϕ,−µ(X
δλ0K11X
j −δλ2K22X
j +K21 ·2σ(δµ0X
j +δµ2X
δµ0K11X
j −δµ2K22X
j +K21 ·2σ(δλ0X
j +δλ2X
2σ(δλ0δν2X
j −δλ2δν0X
j −δµ0δϕ2X
j +δµ2δϕ0X
The comparison of the results obtained from (14) for the corresponding matrix
elements of Ãijσ and Ãijσ̄ and the use of the translation invariance properties (11)
and (12) result in four distinct kinds of relationships:
• Under the spin reversal σ → σ̄, the following invariance properties hold for the
normal one-site statistical averages:
〈Xσσi 〉 = 〈X
i 〉 (15)
Mean field Green functions of Hubbard model of superconductivity 7
σ2,2σ
1,i 〉 = 〈τ
σ̄2,2σ̄
1,i 〉, 〈τ
0σ̄,σ̄0
1,i 〉 = 〈τ
0σ,σ0
1,i 〉 (16)
σ2,σ̄0
1,i 〉 = 2σ̄〈τ
σ̄2,σ0
1,i 〉 (17)
• The identity 〈C
σ2,0σ
0σ̄,σ̄2
i 〉 = 0 holds, therefrom we get for the one-site anomalous
averages,
〈X02i 〉 = 0 (18)
0σ̄,σ̄2
1,i 〉 = −〈τ
0σ,σ2
1,i 〉 (19)
0σ̄,0σ
1,i 〉 = 〈τ
σ2,σ̄2
1,i 〉 (20)
The first two equations imply that the contributions of the one-site terms 〈X02i 〉
0σ̄,σ̄2
1,i 〉 to the superconducting pairing vanish identically irrespective of the
model details (like, e.g., the relationship between the lattice constants ax and ay).
For a rectangular spin lattice (ax 6= ay), Eq. (20) points to the occurrence of a small
non-vanishing one-site contribution to the superconducting pairing originating
equally in both energy subbands. However, over the square spin lattice (1)
(ax = ay), each term of (20) vanishes for d-wave pairing due to the symmetry
in the reciprocal space [12].
• Under the spin reversal σ → σ̄, the following invariance properties hold for the
two-site statistical averages:
〈Xσσi X
j 〉 = 〈X
j 〉, 〈X
j 〉 = 〈X
j 〉 (21)
〈X22i X
j 〉 = 〈X
j 〉, 〈X
j 〉 = 〈X
j 〉 (22)
〈X02i X
j 〉 = 〈X
j 〉. (23)
• The operator of the number of particles at site i within the singlet subband, Ni, is
the sum of spin σ and σ̄ components,
Ni = niσ + niσ̄, niσ = X
i , niσ̄ = X
i . (24)
Similar relationships hold for the number of particles at site i within the hole
subband, Nhi ,
Nhi = n
iσ + n
iσ̄, n
iσ = X
i , n
iσ̄ = X
i . (25)
Due to the completeness relation,
Ni +N
i = 2, niσ + n
iσ = niσ̄ + n
iσ̄ = 1. (26)
These equalities simply reflect the fact that, at a given lattice site i, there is a single
spin state of predefined spin projection, whereas the total number of spin states
equals two.
Therefore, the operator Ni, Eq. (24), provides unique characterization of the
occupied states within the model [8, 12, 10].
Mean field Green functions of Hubbard model of superconductivity 8
5. Frequency matrix in (r, ω)-representation
A straightforward consequence of the results established in section 4 is the simplest
general expression of the frequency matrix Ãijσ, Eq. (10):
Ãijσ = δij
ĉσ 0̂
0̂ −(ĉσ̄)
+ (1− δij)
D̂ijσ ∆̂ijσ
(∆̂ijσ)
† −(D̂ijσ̄)
. (27)
The one-site 2× 2 matrix ĉσ is Hermitian, its elements do not depend on the particular
lattice site i,
ĉσ =
(E1 +∆)χ2 + a22 2σa21
2σa∗21 E1χ1 + a22
, (28)
and are expressed in terms of the spin reversal invariant quantities
χ2 = 〈niσ〉 = 〈niσ̄〉 (29)
χ1 = 〈n
iσ〉 = 〈n
iσ̄〉 = 1− χ2 (30)
a22 = K11〈τ
0σ̄,σ̄0
1 〉 − K22〈τ
σ2,2σ
1 〉 (31)
a21 = (K11 −K22) · 2σ〈τ
σ2,σ̄0
1 〉+K21(〈τ
0σ̄,σ̄0
1 〉 − 〈τ
σ2,2σ
1 〉). (32)
The normal hopping 2× 2 matrix D̂ijσ is symmetric,
D̂ijσ =
d22ij 2σd
2σd21ij d
Due to the constraints (21)–(22), the charge-spin correlations entering the matrix
elements of (33) get exactly decoupled from each other, such that
d22ij = K22(χ
ij + χ
ij)−K11χ
d11ij = K11[χ
ij + (χ1 − χ2)νij + χ
ij]−K22χ
d21ij = K21[(χ
ij − χ2νij) + χ
ij ]−K21χ
with the three spin reversal invariant weighted boson-boson correlation functions
representing respectively charge-charge (c), spin-spin (S), and singlet-hopping (s-h)
correlations:
χcij = νij〈NiNj〉/4, (34)
χSij = νij〈SiSj〉 (35)
χs−hij = νij〈X
j 〉 (36)
In (35), Si = (S
i , S
i ), with S
i = (X
i )/2 and S
i = X
The anomalous hopping 2× 2 matrix ∆̂ijσ has a very special form namely,
∆̂ijσ =
−K21 · 2σ
(K11 +K22)
(K11 +K22) K21 · 2σ
ij (37)
where the spin reversal invariant weighted boson-boson pairing (pair) correlation
function is given by
ij = νij〈X
i Nj〉 = 2νij〈X
j )〉 = (38)
= − νij〈N
i 〉 = −2νij〈(X
i 〉. (39)
Mean field Green functions of Hubbard model of superconductivity 9
In Eqs. (38) and (39), the derivation of the second expression from the first one makes
use of the spin reversal invariance property (23).
To get a workable expression of the frequency matrix, approximations have to be
derived for the boson-boson statistical averages entering the two-site hopping matrix
elements. In the next section we show that the method of reference [12], yielding
the pairing correlation function 〈X02i Nj〉 in terms of GMFA Green functions within
an approach able to identify and rule out exponentially small terms, can be extended to
the singlet hopping correlations 〈X02i X
j 〉 as well.
6. Hopping processes involving singlets
The right approach to the reduction of the order of correlation of the boson-boson
statistical averages 〈X02i X
j 〉 = 〈X
i 〉 goes differently for the hole-doped and
electron-doped cuprates.
• Reduction of the correlation order for hole-doped cuprates
In these systems, the Fermi level (the zero point energy) stays in the singlet
subband. We get the estimates E2 ≃ −∆, E2 − ∆ ≃ −2∆, E2 + ∆ ≃ 0. With
∆ ∼ 3eV , β∆ ∼ 3.5 · 104T−1. Therefore, at T . 300K, the quantities containing the
factor eβE2 ≃ e−β∆ . e−100 < 10−44 are negligible.
We start with the following form of the spectral theorem [9]
〈X02i X
j 〉 =
1 + e−βω
〈〈X02i |X
j 〉〉ω+iε − 〈〈X
j 〉〉ω−iε
, (40)
written for anticommutator retarded (ω + iε), respectively advanced (ω − iε) Green
functions. Their equation of motion in the (r, ω)-representation is
(ω−E2)〈〈X
j 〉〉ω ≃ 2〈X
j 〉+K21
0σ̄,0σ
1,i |X
j 〉〉ω−〈〈τ
σ2,σ̄2
1,i |X
j 〉〉ω
where, for the sake of simplicity, the labels ±iε, ε = 0+, describing respectively the
retarded and the advanced Green functions have been omitted. In Eq. (41), the higher
order r.h.s. contributions coming from the inband hopping terms have been dropped off.
Replacing (41) in (40), we get
〈X02i X
j 〉 ≃ K21
1 + e−βω
ω − E2 + iε
0σ̄,0σ
1,i |X
j 〉〉ω+iε−〈〈τ
σ2,σ̄2
1,i |X
j 〉〉ω+iε
To evaluate the imaginary part, we use the identity [9]
ω − E2 + iε
ω −E2
− iπδ(ω − E2).
The integrals over the δ-function yield (finite) GF real parts at ω = E2, multiplied by
a thermodynamic factor ∼ e−β∆ ≪ 1. The imaginary part of the hole subband GF
Mean field Green functions of Hubbard model of superconductivity 10
0σ̄,0σ
1,i |X
j 〉〉ω+iε shows a δ-like maximum at ω = E2−∆, where (ω−E2)
−1 ≃ ∆−1 and
the thermodynamic factor reaches a value ∼ e−2∆. The only non-negligible contribution
to the principal part integral comes from the singlet subband GF 〈〈τ
σ2,σ̄2
1,i |X
j 〉〉ω+iε the
imaginary part of which shows a δ-like maximum at ω = E2 +∆ ≃ 0. This allows us to
approximate (ω − E2)
−1 ≈ ∆−1 within the integral over the singlet subband GF to get
〈X02i X
j 〉 ≃ (1− δij)
2σ̄〈τ
σ2,σ̄2
1,i X
j 〉 (42)
Replacing this result in Eq. (38) and using (2) we get
ij ≃ (1− δij)
K21νij
4νij ·2σ̄〈X
m6=(i,j)
2σ〈Xσ2i X
m Nj〉
Omitting the three-site terms, we get the two-site approximation of the superconducting
pairing originating in the singlet subband,
ij ≃ (1− δij)
4K21ν
· 2σ̄〈Xσ2i X
j 〉, (44)
which reproduces the well-known two-site exchange term of the t-J model.
For the singlet hopping correlation function, (42) yields the two-site approximation
χs−hij ≃ (1− δij)
2K21ν
· 2σ̄〈Xσ2i X
j 〉 (45)
• Reduction of the correlation order for electron-doped cuprates
The Fermi level (the zero point energy) stays now in the hole subband. We have
the estimates E2 ≃ ∆, E2 +∆ ≃ 2∆, E2 −∆ ≃ 0.
It is convenient now to start with the alternative form of the spectral theorem [9]
i 〉 =
eβω + 1
〈〈X02i |X
j 〉〉ω+iε − 〈〈X
j 〉〉ω−iε
, (46)
with the retarded and advanced GFs following from the same equation (41).
Exponentially small quantities result from the δ-term of (ω − E2 + iε)
−1 and from
the singlet subband GF 〈〈τ
σ2,σ̄2
1,i |X
j 〉〉ω+iε. The hole subband GF 〈〈τ
0σ̄,0σ
1,i |X
j 〉〉ω+iε,
yields the non-negligible contribution
i 〉 ≃ (1− δij)
2σ̄〈X
0σ̄,0σ
1,i 〉 (47)
Replacing in (39) and omitting the three-site terms, we get the two-site approximation
of the superconducting pairing originating in the hole subband,
ij ≃ (1− δij)
4K21ν
· 2σ〈X0σ̄i X
j 〉 (48)
Finally, the two-site approximation of the singlet-hopping correlation function is
〈X02i X
j 〉 ≃ (1− δij)
2K21ν
· 2σ̄〈X0σ̄i X
j 〉. (49)
In conclusion, the GMFA superconducting pairing is a second order effect. The
lowest order contribution to it originates in interband hopping correlating annihilation
Mean field Green functions of Hubbard model of superconductivity 11
(or creation) of pairs of spins at neighbouring lattice sites i and j within that energy
subband which crosses the Fermi level.
Similarly, the singlet hopping is a second order effect as well. It mainly proceeds
by interband i ⇄ j single particle jumps from the upper energy subband to the lower
energy subband.
7. Frequency matrix in (q, ω)-representation
The calculation of the matrix elements of Ãσ(q) from Eq. (9) asks for three essentially
different kinds of Fourier transforms, namely,
• The averages of the Hubbard 1-forms entering Eqs. (31) and (32) result in sums of
products of q-space averages and geometrical form factors:
λµ,νϕ
1,i 〉 =
〈XλµXνϕ〉qγα(q) (50)
for label sets {(λµ, νϕ)} ∈ {(0σ̄, σ̄0); (σ2, 2σ); (σ2, σ̄0)}.
The quantity 〈XλµXνϕ〉q denotes the average of the q-space image of the product
of Hubbard operators of labels λµ and νϕ respectively,
〈XλµXνϕ〉q =
1 + e−βω
Gλµ,νϕ(q, ω + iε)−Gλµ,νϕ(q, ω − iε)
Finally, in Eq. (50), γα(q) denote the nn (α = 1), nnn (α = 2), and third
neighbour (α = 3) geometrical form factors, γ1(q) = 2[cos(qxax) + cos(qyay)],
γ2(q) = 4 cos(qxax) cos(qyay), γ3(q) = 2[cos(2qxax) + cos(2qyay)].
• For the two-site weighted singlet hopping (36) and the superconducting pairing (38),
the Fourier transforms result in convolutions of specific averages and geometrical
form factors. The results are as follows:
− Singlet hopping
χs−h(q) =
ν2α ·
Ξkγα(q− k) (52)
where Ξk = 2σ〈X
σ2X σ̄0〉k, while Ξk = 2σ〈X
0σ̄X2σ〉k for hole-doped and electron-
doped cuprates respectively, with averages defined in (51).
− Superconducting pairing
χpair(q) =
ν2α ·
Πkγα(q− k) (53)
where Πk = 2σ̄〈X
σ2X σ̄2〉k, while Πk = 2σ〈X
0σ̄X0σ〉k for hole-doped and electron-
doped cuprates respectively, with averages defined in (51).
• The charge-charge and spin-spin correlation functions (34) and (35) are treated
approximately following [8, 10]:
Mean field Green functions of Hubbard model of superconductivity 12
– The order of the charge-charge correlation function 〈NiNj〉 is lowered using a
Hubbard type I approximation decoupling procedure 〈NiNj〉 ≃ 〈Ni〉〈Nj〉 = 2χ2.
– The spin-spin correlation function 〈SiSj〉 is kept undecoupled, but treated
phenomenologically. Eq. (2) implies the occurrence of up to three non-vanishing
spin-spin correlation functions: nn, χS1 = 〈SiSi±ax/y〉, nnn, χ
2 = 〈SiSi±ax±ay〉, and
χS3 = 〈SiSi±2ax/y〉. These are site independent quantities.
Using the above results, we get from (9) and (27) the mathematical structure of the
frequency matrix Ãσ(q) as follows,
Ãσ(q) =
Êσ(q) Φ̂σ(q)
(Φ̂σ(q))
† −(Êσ̄(q))
. (54)
The normal 2× 2 matrix contributions to Ãσ(q) show the characteristic σ-dependence,
Êσ(q) =
c22 2σc21
2σc∗21 c11
; −(Êσ̄(q))
−c22 2σc
2σc21 −c11
with the σ-independent terms cab carrying normal one-site and two-site matrix elements,
c22 ≡ c22(q) = (E1 +∆)χ2 + a22 + d22(q)
c11 ≡ c11(q) = E1χ1 + a22 + d11(q)
c21 ≡ c21(q) = a21 + d21(q)
dab(q) = Kab
ναγα(q)[χ
α + (−1)
a+bχaχb] +
s−h(q)
The one-site terms are defined by Eqs. (31)–(32) and (50). The exchange energy
parameters are given by
Jab = 4KabK21/∆, {ab} ∈ {22, 11, 21}, (56)
while the singlet hopping contribution χs−h(q) is given by Eq. (52).
The anomalous 2× 2 matrix contributions to Ãσ(q), obtained from (37), show the
characteristic σ-dependence,
Φ̂σ(q) =
−2σξ1b ξ2b
−ξ2b 2σξ1b
; (Φ̂σ(q))
−2σξ1b
∗ −ξ2b
∗ 2σξ1b
with ξ1 = J21, ξ2 = (J11 + J22)/2, whereas b ≡ b(q) is a shorthand notation for the
pairing matrix element (53).
Remark 1 The spin reversal σ → σ̄ symmetry properties of the elemental Green
functions entering the matrix GF (3) are identical to those established for the underlying
frequency matrix Ãσ(q).
Mean field Green functions of Hubbard model of superconductivity 13
8. GMFA Green function
From Eqs. (15) and (18) it follows that the matrix χ̃, Eq. (8), is diagonal and spin
reversal invariant, with two nonvanishing matrix elements,
χ̂ 0̂
0̂ χ̂
, χ̂ =
, 0̂ =
, (58)
where χ2 and χ1 are given by Eqs. (29) and (30) respectively.
Replacing in (7) the expressions (58) of the matrix χ̃ and (54) of the frequency
matrix Ãσ(q), we get a structure of the GMFA-GFmatrix obeying the general symmetry
properties established in [11],
G̃0σ(q, ω) =
Ĝ0σ(q, ω) F̂
σ (q, ω)
(F̂ 0σ (q, ω))
† −(Ĝ0σ̄(q,−ω))
, (59)
where the argument ω carries, in fact, the complex value ω + iε, ε = 0+. (Hence the
elemental GFs containing the argument ω point to retarded GFs, while those containing
the argument −ω point to advanced GFs.)
The normal 2× 2 matrix Ĝ0σ(q, ω) shows the characteristic σ-dependence,
Ĝ0σ(q, ω) =
g22(q, ω) 2σg21(q, ω)
2σg∗21(q, ω) g11(q, ω)
D(q, ω)
with the σ-independent components gab(q, ω) found from
gab(q, ω) = Aabω
3 +Babω
2 + Cabω +Dab, {ab} ∈ {22, 11, 21}.
Here the coefficients Aab are given respectively by
A22 = χ2, A11 = χ1, A21 = 0,
while Bab, Cab, Dab are q-dependent coefficients:
B22(q) = c22, B11(q) = c11, B21(q) = c21
C22(q) = − [χ2(c
11 + ξ
1 |b|
2) + χ1(|c21|
2 + ξ22 |b|
2)]/χ21
C11(q) = − [χ1(c
22 + ξ
1 |b|
2) + χ2(|c21|
2 + ξ22 |b|
2)]/χ22
C21(q) = [c21(χ2c11 + χ1c22)− ξ1ξ2|b|
2]/(χ1χ2)
D22(q) = −[c11(c22c11 − |c21|
2) + (c22ξ
1 + c11ξ
2 + 2ℜ(c21)ξ1ξ2)|b|
2]/χ21
D11(q) = −[c22(c22c11 − |c21|
2) + (c11ξ
1 + c22ξ
2 + 2ℜ(c21)ξ1ξ2)|b|
2]/χ22
D21(q) = {c21(c22c11 − |c21|
2)− [c∗21ξ
1 + c21ξ
2 + (c22 + c11)ξ1ξ2]|b|
2}/(χ1χ2)
The anomalous 2× 2 matrix F̂ 0σ (q, ω) shows the characteristic σ-dependence,
F̂ 0σ (q, ω) =
2σf22(q, ω) f21(q, ω)
−f21(q,−ω) 2σf11(q, ω)
D(q, ω)
Mean field Green functions of Hubbard model of superconductivity 14
with the elemental GFs fab(q, ω) given by
faa(q, ω) = (Paaω
2 +Raa)b, {aa} ∈ {22, 11},
f21(q, ω) = (P21ω
2 +Q21ω +R21)b.
Here, P22 = −ξ1, P11 = ξ1, and P21 = −ξ2 are q-independent, while
R22(q) = [(c
11 + c
21)ξ1 + 2c11c21ξ2 + ξ1(ξ
1 − ξ
2)|b|
2]/χ21
R11(q) = −[(c
22 + c
)ξ1 + 2c22c
21ξ2 + ξ1(ξ
1 − ξ
2)|b|
2]/χ22
R21(q) = [(c11c
21 + c22c21)ξ1 + (c22c11 + |c21|
2)ξ2 − ξ2(ξ
1 − ξ
2)|b|
2]/(χ1χ2)
Q21(q) = [(χ2c21 − χ1c
21)ξ1 + (χ2c11 − χ1c22)ξ2]/(χ1χ2).
The denominator D(q, ω) occurring in Eqs. (60) and (61), which is proportional to the
determinant of the matrix χ̃ω − Ãσ(q) in (7), shows the following monic bi-quadratic
dependence in ω:
D(q, ω) = (ω2 − uω + v)(ω2 + uω + v), (62)
where v = v(q) and u = u(q) are found respectively from
(c22c11−|c21|
2)− (ξ21 − ξ
2)|b|
[(c22+c11) + 2ℜ(c21)]
2ξ21 −
−4(c22+c11)ℜ(c21)ξ1(ξ1 − ξ2)− 4|c21|
2(ξ21−ξ
/(χ21χ
2) (63)
u2 − 2v =
(c211 + ξ
1 |b|
(c222 + ξ
1 |b|
(|c21|
2 + ξ22 |b|
2). (64)
A necessary consistency condition to be satisfied by the parameters of the model at any
vector q inside the Brillouin zone is v2(q) ≥ 0.
Remark 2 The zeros of the determinant of the GMFA-GF,
D(q, ω) = 0 (65)
provide the GMFA energy spectrum of the system.
At every wave vector q inside the Brillouin zone, this yields for the superconducting
state the energy eigenvalue set
{Ω1(q), Ω2(q), −Ω2(q), −Ω1(q)},
Ω1,2(q) = (u/2)±
(u/2)2 − v. (66)
In the normal state (b = 0), Eqs. (63) and (64) reduce respectively to
v0 = (c22/χ2)(c11/χ1)− |c21|
2/(χ1χ2)
u0 = (c22/χ2) + (c11/χ1)
such that the energy spectrum is given by the roots of the second order equation
ω2 − u0ω + v0 = 0 solved in [8].
Finally, if we assume a pure Hubbard model (i.e., energy band independent hopping
parameters, K11 = K22 = K21 ≡ t, [10]), then a significant simplification of the
equations derived in the last two sections is obtained. The normal 2× 2 matrix Êσ(q)
Mean field Green functions of Hubbard model of superconductivity 15
becomes symmetric and so is the normal GMFA-GF Ĝ0σ(q, ω). Moreover, there is a
single exchange energy parameter in (57), ξ1 = ξ2 ≡ J = 4t
2/∆, which simplifies the
anomalous 2×2 frequency matrix to Φ̂σ(q) =
−1 2σ
Jb, such that the quantities
u and v in the expression (62) of the GF determinant reduce to
v2 = [(c22c11 − c
2 + (c22 + c11 + 2c21)
2J2|b|2]/(χ21χ
2) (67)
u2 − 2v = [χ22c
11 + χ
22 + 2χ1χ2c
21 + J
2|b|2]/(χ21χ
2). (68)
A non-negative value v ≥ 0 always follows from Eq. (67), however, the reality of the
solutions (66) needs investigation of the domain of variation of the adjustable parameters
of the model.
9. Conclusions
The two-band Hubbard model of the high Tc superconductivity in cuprates [8, 12] uses
Hubbard operator algebra on a physical system characterized by specific invariance
symmetries with respect to translations and spin reversal.
In the present paper we have shown that the system symmetries result either in
invariance properties or exact vanishing of several characteristic statistical averages. The
vanishing of the one-site anomalous matrix elements is shown to be a property which
is embedded in the Hubbard operator algebra. Another worth mentioning consequence
following from the spin reversal invariance properties of the two-site statistical averages
is the exact decoupling from each other of the charge and spin correlations entering
the matrix elements of the frequency matrix. The use of these results allowed rigorous
derivation and simplification of the expression of the frequency matrix of the generalized
mean field approximation (GMFA) Green function (GF) matrix of the model.
For the higher order boson-boson averages 〈X02i X
j 〉 and 〈X
i Nj〉, which enter
respectively the normal singlet hopping and anomalous exchange pairing contributions
to the frequency matrix, an approximation procedure resulting in GMFA-GF expressions
was described. The procedure avoids the current decoupling schemes [14, 15]. Its
principle, first formulated in [12], consists in the identification and elimination of
exponentially small contributions to the spectral theorem representations of these
statistical averages.
A point worth noting is that the proper identification of exponentially small
quantities asks for the use of different starting expressions of the spectral theorem for
the hole-doped and electron-doped cuprates.
The results of the reduction procedure may be summarized as follows:
• The singlet hopping is a second order effect which may be described as interband
i ⇄ j single particle jumps from the upper to the lower energy subband.
• The GMFA superconducting pairing is a second order effect, the lowest order
contribution to which originates in interband hopping correlating the annihilation
Mean field Green functions of Hubbard model of superconductivity 16
(creation) of spin pairs at neighbouring lattice sites i and j within that energy
subband which crosses the Fermi level.
The derivation of the most general and simplest possible expressions of the
frequency matrix and of the GMFA-GF matrix in the (q, ω)-representation enables
reliable numerical investigation of the consequences coming from the adjustable
parameters of the model (the degree of hole/electron doping, the energy gap ∆, the
hopping parameters).
Another open question of the GF approach to the solution of the present model is
the use of the Hubbard operator algebra to get rigorous derivation and simplification of
the Dyson equation of the complete Green function. As shown previously in [12], the
self-energy corrections induce a spin fluctuation d-wave pairing originating in kinematic
interaction in the second order.
These investigations are underway and results will be reported in a forthcoming
paper.
Acknowledgments
The authors would like to express their gratitude to Prof. N.M. Plakida for useful
advice and critical reading of the manuscript. Partial financial support was secured by
the Romanian Authority for Scientific Research (Project 11404/31.10.2005 - SIMFAP).
References
[1] Damascelli A, Hussain Z and Shen Z -X 1986 Rev. Mod. Phys. 75 473
[2] F.C. Zhang F C and T.M. Rice T M 1988 Phys. Rev. B 37 3759
[3] Emery V J 1987 Phys. Rev. Lett. 58 2794; Varma C M Schmitt-Rink S, and Abrahams E 1987
Solid State Commun. 62 681
[4] Feiner L J, Jefferson J H and Raimondi R 1996 Phys. Rev. B 53 8751
[5] Yushankhai V Yu, Oudovenko V S and Hayn R 1997 Phys. Rev. B 55 15562
[6] Plakida N M and Oudovenko V S 1999 Phys. Rev. B 59, 11949
[7] Plakida N M 2001 JETP Lett. 74 36
[8] Plakida N M, Hayn R, and Richard J -L 1995 Phys. Rev. B 51 16599
[9] Zubarev D N 1960 Sov. Phys. Usp. 3 320
[10] Plakida N M and Oudovenko V S 2007 JETP 104 230
[11] Plakida N M 1997 Physica C 282–287 1737
[12] Plakida N M, Anton L, Adam S, and Adam Gh 2003 ZhETF 124, 367; English transl.: 2003 JETP
97 331
[13] Plakida N M 2006 Fiz. Nizkikh Temp. 32 483
[14] Roth L M 1969 Phys. Rev. 184 451
[15] Beenen J and Edwards D M 1995 Phys. Rev. B 52 13636; Avella A, Mancini F, Villani D and
Matsumoto H 1997 Physica C 282–287 1757; Di Matteo T, Mancini F, Matsumoto H and
Oudovenko V S 1997 Physica B 230–232 915; Stanescu T D, Martin I and Phillips Ph, 2000
Phys. Rev. B 62 4300
	Introduction
	Mean field approximation
	Translation invariance of the spin lattice
	Spin reversal invariance
	Frequency matrix in (r, )-representation
	Hopping processes involving singlets
	Frequency matrix in (q, )-representation
	GMFA Green function
	Conclusions
ABSTRACT
  The Green function (GF) equation of motion technique for solving the
effective two-band Hubbard model of high-T_c superconductivity in cuprates
[N.M. Plakida et al., Phys. Rev. B, v. 51, 16599 (1995); JETP, v. 97, 331
(2003)] rests on the Hubbard operator (HO) algebra. We show that, if we take
into account the invariance to translations and spin reversal, the HO algebra
results in invariance properties of several specific correlation functions. The
use of these properties allows rigorous derivation and simplification of the
expressions of the frequency matrix (FM) and of the generalized mean field
approximation (GMFA) Green functions (GFs) of the model.
  For the normal singlet hopping and anomalous exchange pairing correlation
functions which enter the FM and GMFA-GFs, an approximation procedure based on
the identification and elimination of exponentially small quantities is
described. It secures the reduction of the correlation order to GMFA-GF
expressions.

<|endoftext|><|startoftext|>
Geometry Effects at Atomic-Size Aluminium
Contacts
U. Schwingenschlögl ∗, C. Schuster
Institut für Physik, Universität Augsburg, 86135 Augsburg, Germany
Abstract
We present electronic structure calculations for aluminium nanocontacts. Address-
ing the neck of the contact, we compare characteristic geometries to investigate
the effects of the local aluminium coordination on the electronic states. We find
that the Al 3pz states are very sensitive against modifications of the orbital over-
lap, which has serious consequences for the transport properties. Stretching of the
contact shifts states towards the Fermi energy, leaving the system instable against
ferromagnetic ordering. By spacial restriction, hybridization is locally suppressed
at nanocontacts and the charge neutrality is violated. We discuss the influence of
mechanical stress by means of quantitative results for the charge transfer.
Key words: density functional theory, electronic structure, stretched nanocontact,
hybridization, charge neutrality
PACS: 71.20.-b, 73.20.-r, 73.20.At, 73.40.-c, 73.63.Rt
Atomic-size contacts can be prepared by means of scanning tunneling mi-
croscopy [1] or break junction techniques [2]. In each case, piezoelectric ele-
ments are used to stretch a wire with a precision of a few picometers until
finally a single atom configuration is reached. Such contacts have attracted
great attention over the last couple of years, in particular concerning their
electrical transport properties. Since transport is restricted to a small number
of atomic orbitals at the contact, conductance across nanocontacts strongly
depends on the local electronic structure. An atomic-size constriction acco-
modates only a small number of conducting channels, which is determined by
the number of valence orbitals of the contact atom. The transmission of each
channel likewise is fixed by the local atomic environment. For a review on the
quantum properties of atomic-size conductors see Agräıt et al. [3].
∗ Corresponding author. Fax: 49-821-598-3262
Email address: Udo.Schwingenschloegl@physik.uni-augsburg.de
(U. Schwingenschlögl).
Preprint submitted to Elsevier 1 November 2018
http://arxiv.org/abs/0704.0693v1
From the theoretical point of view, the electronic structure and conductance
of nanocontacts and nanowires has been studied by ab initio band structure
calculations. For aluminium contacts, investigations of the electronic states
have been reported in [4,5,6,7], and the conductance has been addressed in
[8,9,10,11,12,13,14,15,16]. In these studies various geometries have been used,
which are assumed to model the local atomic structure of the contact in an
adequate way. The breakage of an aluminium contact has been simulated by
means of molecular dynamics calculations in [17,18,19], i.e. on the basis of
realistic structural arrangements. However, despite such a large number of
investigations, the literature lacks satisfactory reflections about the interrela-
tions between the details of the crystal structure and the local electronic states
at the nanocontact. In the present letter we will deal with this point by com-
paring characteristic contact geometries, including stretched configurations.
In a previous work [20] we have demonstrated that hybridization between Al
3s und 3p states is strongly suppressed at aluminium nanocontacts due to
directed bonds at the neck of the contact. We therefore expect the system
to be very sensitive against modifications of the orbital overlap coming along
with the specific contact geometry. As a consequence, structural details are
important for the electrical transport, since hybridization effects can play a
critical role for transport properties of atomic-size contacts and interfaces
[21,22,23]. In particular, stretching of the nanocontact alterates the chemical
bonding and thus may lead to unexpected electronic features. We will show
that it is mandatory to account for the very details of the contact geometry
in order to obtain adequate results from electronic structure calculations.
The band structure results presented subsequently are obtained within density
functional theory and the generalized gradient approximation. We use the
WIEN2k program package, a state-of-the-art full-potential code based on a
mixed lapw and apw+lo basis [24]. In our calculations the charge density is
represented by ≈150000 plane waves and the exchange-correlation potential is
parametrized according to the Perdew-Burke-Ernzernhof scheme. Moreover,
the mesh for the Brillouin zone integration comprises between 75 and 102
points in the irreducible wedge. While Al 1s, 2s, and 2p orbitals are treated
as core states, the valence states comprise Al 3s and 3p orbitals. The radius
of the aluminium muffin-tin spheres amounts to 2.6 Bohr radii.
Our calculations rely on two characteristic contact geometries, which we in-
troduce in the following. On the one hand, we address a configuration where
a single Al atom is connected to planar Al units on both sides, each consisting
of seven atoms in a hexagonal arrangement with fcc [111] orientation. The
central sites of these planar units lie on top of the contact atom, thus giving
rise to linear σ-type Al-Al bonds along the z-axis. For this reason, we call the
first geometry under consideration the linear contact configuration. The finite
Al units are connected to Al ab-planes of infinite extension, which enables us
to apply periodic boundary conditions. We note that the contact Al site in
this linear geometry, due to its two nearest neighbours, resembles the essential
structural features of atoms in a monostrand nanowire [20].
On the other hand, we study an Al atom sandwiched between two pyramidal
Al electrodes in fcc [001] orientation, which we call the pyramidal contact
configuration. To be specific, the contact Al site has four crystallographically
equivalent nearest neighbours on both sides, which prohibits σ-type Al-Al
bonding via the 3pz orbitals along the z-axis. Whereas the second pyramidal
layer comprises nine atoms, the third layer off the contact extends infinitely
on account of periodic boundary conditions.
For both contact configurations, a convenient choice for the bond lengths and
bond angles is given by the bulk (fcc) aluminium values, therefore by inter-
atomic distances of 2.86 Å. Mechanical stress can increase this bond length at
the nanocontact, which we simulate by interatomic distances of 3.95 Å for the
linear and 3.62 Å for the pyramidal contact configuration. In both cases, only
the bond lengths between the contact Al site and its nearest neighbours are
changed with respect to the fcc setup. Structural relaxation of the electrodes
due to the elongated contact bonds plays a minor role.
For bulk aluminium it is well established that the formal Al 3s23p1 electronic
configuration is seriously interfered by hybridization effects, giving rise to a
prototypical sp-hybrid system. However, the situation changes dramatically
when covalent bonding is no longer isotropic but restricted to specific direc-
tions, as for an atomic-size contact. Partial Al 3s, 3pz, and 3px densities of
states (DOS) as calculated for the Al site at the neck of our linear contact
configuration are shown in figure 1. By symmetry, px and py states are degen-
erate. While most of the occupied states are of 3s type, the 3p states dominate
at energies above the Fermi level. Because hardly any contribution of 3s and
3p states is found at energies dominated by the other states, respectively, evo-
lution of hybrid orbitals and an interpretation in terms of sp-hybrid states
is precluded. Most 3p electrons occupy the pz orbital, which is oriented along
the principal axis of the contact and therefore mediates σ-type orbital overlap.
Because neither 3s nor 3px states give rise to significant contributions to the
DOS at the Fermi energy, chemical bonding is well characterized in terms of
directed 3pz bonds.
When the Al-Al bond length is stretched from 2.86 Å to 3.95 Å at the nanocon-
tact, the central Al site decouples from its neighbours. Its electronic states
hence become more atom-like, which is clearly visible for the 3s states in fig-
ure 1. Smaller band widths and sharper DOS structures likewise are obvious
for the 3px states. Finally, for the 3pz symmetry component we observe a
shift of states from lower energies to the Fermi level and from higher energies
to a new structure at about 2 eV. The DOS at the Fermi energy increases
d=3.95 Å
d=2.86 Å
Al 3s
20-2-4-6
Al 3p
20-2-4-6
Al 3p
E-EF(eV)
20-2-4-6
Fig. 1. Partial Al 3s, 3pz, and 3px densities of states for the contact aluminium site
in the linear contact configuration.
configuration bond length charge transfer
linear 2.86 Å 0.51
2.97 Å 0.54
3.08 Å 0.57
3.33 Å 0.59
3.58 Å 0.62
3.95 Å 0.63
pyramidal 2.86 Å 0.22
3.22 Å 0.53
3.62 Å 0.61
4.04 Å 0.66
Table 1
Net charge transfer off the contact aluminium site.
significantly.
Turning to the occupation of the valence orbitals, the decoupling of the contact
Al site comes along with a reduction of charge, amounting to 0.13 electrons.
In general, the px/py atomic orbital does not mediate chemical bonding due to
the spacial restriction of the crystal structure. Its occupation thus is strongly
reduced and we cannot expect local charge neutrality. Calculated values for
the net charge transfer off the contact Al site, as compared to bulk aluminium,
are given in table 1 for bond lengths between 2.86 Å and 3.95 Å.
Figure 2 shows partial Al 3s, 3pz, and 3px densities of states for the pyramidal
contact configuration. As compared to the linear case fundamental differences
are found. For an Al-Al bond length of 2.86 Å we now have a finite Al 3s DOS
at the Fermi energy. The same is true for the 3pz states, whereas the 3px DOS
almost vanishes. Again, px and py states are degenerate by symmetry. While
most of the occupied states still are of 3s type, contributions of the 3p states are
larger than for the linear contact configuration. Chemical bonding therefore is
more isotropic, which is reflected by larger band widths. As expected from the
contact geometry, σ-type bonding via the 3pz-orbital is significantly reduced.
Due to increased hybridization between the 3s and the three 3p orbitals, all
these states can participate in the electrical transport. Figure 3 illustrates the
differences in the electronic structure at the linear and pyramidal contact by
means of electron density maps. In each case, the density map covers the plane
of the central Al site, where the principal axis of the contact runs from left to
d=3.62 Å
d=2.86 Å
Al 3s
20-2-4-6
Al 3p
20-2-4-6
Al 3p
E-EF(eV)
20-2-4-6
Fig. 2. Partial Al 3s, 3pz, and 3px densities of states for the contact aluminium site
in the pyramidal contact configuration.
pyramidal
linear
Fig. 3. Electron density maps for the neck of the linear and pyramidal contact.
right.
Stretching of the contact by increasing the Al-Al bond length from 2.86 Å to
3.62 Å has similar effects as for the linear contact configuration. The decou-
pling of the central Al site from the pyramidal electrodes results in smaller
band widths and sharper DOS structures for the 3s, 3pz, as well as 3px states,
see figure 2. In addition, the 3pz und 3px states shift to higher energies, giving
rise to pronounced DOS structures near 2 eV. Due to the reduced band width,
the 3s and 3pz DOS disappears almost completely in the vicinity of the Fermi
energy. As concerns the 3pz states, the pyramidal geometry thus shows the
opposite behaviour than the linear geometry. Since the Al 3pz states mediate
the main part of the orbital overlap across the nanocontact, they are very
sensitive against changes in the crystal structure. Structural rearrangement
during the breakage of an aluminium nanocontact or nanowire consequently
should have serious effects on the electrical transport, such as modulation of
the conductance.
Elongated bonds again are accompanied by a decline of charge at the contact.
However, the value of 0.39 electrons net charge transfer off the central Al site
is larger than in the linear case. This traces back to the fact that reduction
of hybridization on stretching is more efficient for the pyramidal geometry,
since for the linear geometry hardly any hybridization is left right from the
Al 3p
E-EF(eV)
20-2-4-6
Fig. 4. Spin majority and minority Al 3pz densities of states for the contact alu-
minium site in the stretched linear contact configuration (d = 3.95 Å).
beginning. Accordingly, smaller values for the net charge transfer are found in
the pyramidal case, see table 1.
We next show that the remarkable increase of the Al 3pz DOS at the Fermi
energy on stretching the linear contact configuration leads to an instability
against ferromagnetic ordering. A spontaneous magnetization of simple metal
nanowires has been predicted by Zabala et al. [4]. These authors have demon-
strated the instability in explicit calculations for an aluminium nanowire, using
a stabilized jellium model. For Al-Al bond lengths of d = 3.95 Å at the neck
of our contact, we hence have performed spin polarized electronic structure
calculations, yielding a stable ferromagnetic solution. Figure 4 displays the
corresponding spin majority and minority densities of states for the contact
Al site. The local spin splitting of about 0.2 eV is connected to an energy gain
of 5.5mRyd. Moreover, the magnetic moment amounts to 0.1µB, which is
significantly smaller than the 0.68µB reported by Zabala et al. [4] for an infi-
nite nanowire. However, Delin et al. [6] have shown for Pd nanowires that the
spontaneous magnetization decreases rapidly for short chains. Only the linear
contact configuration is subject to a magnetic instability on stretching. For
the pyramidal geometry, of course, magnetism cannot be expected, compare
the spin degenerate DOS curves in figure 2.
In conclusion, we have studied the electronic structure of aluminium nanocon-
tacts by means of band structure calculations within density functional theory.
Taking into account the details of the crystal structure, we have discussed the
electronic features of prototypical contact geometries (linear versus pyrami-
dal). Our calculations result in two largely different scenarios. In particular, the
Al 3pz states are strongly affected by modifications of the chemical bonding.
If σ-type bonding via the 3pz orbitals is dominant because of direct orbital
overlap across the contact, sp-type hybridization is almost completely sup-
pressed and only Al 3pz states remain at the Fermi energy [20]. Otherwise, if
the bonding is more isotropic for geometrical reasons, the Al 3s states likewise
have to be taken into account. The divers behaviour of the linear and pyrami-
dal geometry becomes even more pronounced when the contact is stretched.
Whereas in the linear case the Al 3pz DOS at the Fermi energy increases,
which even yields a ferromagnetic instability, it vanishes in the pyramidal
case, leaving the system insulating. As a consequence, the structural details of
the contact are expected to strongly influence the electrical transport. Because
of structural rearrangements, they are particularly relevant for the breakage
of a nanocontact or nanowire.
Acknowledgements
We thank U. Eckern and P. Schwab for helpful discussions and the Deutsche
Forschungsgemeinschaft for financial support (SFB 484).
References
[1] Pascual J.I., Méndez J., Gómez-Herrero J., Baró A.M., and Garćıa N., Phys.
Rev. Lett. 71 (1993) 1852.
[2] Scheer E., Joyez P., Esteve D., Urbina C., and Devoret M.H., Phys. Rev. Lett.
78 (1997) 3535.
[3] Agräıt N., Levi-Yeyati A., and van Ruitenbeck J.-M., Phys. Rep. 377 (2003) 81.
[4] Zabala N., Puska M.J., Ayuela A., Raebiger H., and Nieminen R.M., J. Magn.
Magn. Mater. 249 (2002) 193.
[5] Ribeiro F.J., and Cohen M.L., Phys. Rev. B 68 (2003) 035423.
[6] Delin A., Tosatti E., and Weht R., Phys. Rev. Lett. 92 (2004) 057201.
[7] Delin A. and Tosatti E., J. Phys.: Condens. Matter 16 (2004) 8061.
[8] Levy-Yeyati A., Mart́ın-Rodero A., and Flores F., Phys. Rev. B 56 (1997) 10369.
[9] Cuevas J.C., Levi-Yeyati A., and Mart́ın-Rodero A., Phys. Rev. Lett. 80 (1998)
1066.
[10] Cuevas J.C., Levi-Yeyati A., Mart́ın-Rodero A., Bollinger G.R., Untiedt C., and
Agräıt N., Phys. Rev. Lett. 81 (1998) 2990.
[11] Kobayashi N., Brandbyge M., and Tsukada M., Phys. Rev. B 62 (2000) 8430.
[12] Palacios J.J., Pérez-Jiménez A.J., Louis E., SanFabián E., and Vergés J.A.,
Phys. Rev. B 66 (2002) 035322.
[13] Thygesen K.S., and Jacobsen K.W., Phys. Rev. Lett. 91 (2003) 146801.
[14] Lee H.-W., Sim H.-S., Kim D.-H., and Chang K.J., Phys. Rev. B 68 (2003)
075424.
[15] Okano S, Shiraishi K., and Oshiyama A., Phys. Rev. B 69 (2004) 045401.
[16] Sasaki T., Egami Y., Ono T., and Hirose K., Nanotechnology 15 (2004) 1882.
[17] Hasmy A., Medina E., and Serena P.A., Phys. Rev. Lett. 86 (2001) 5574.
[18] Hasmy A., Pérez-Jiménez A.J., Palacios J.J., Garćıa-Mochales P., Costa-
Krämer J.L., Dı́az M., Medina E., and Serena P.A., Phys. Rev. B 72 (2005)
245405.
[19] Pauly F., Dreher M., Viljas J.K., Häfner M., Cuevas J.C., and Nielaba P.,
arXiv:cond-mat/0607129.
[20] Schwingenschlögl U. and Schuster C., Chem. Phys. Lett. 432 (2006) 245.
[21] Schmitt T., Augustsson A., Nordgren J., Duda L.-C., Höwing J., Gustafsson
T., Schwingenschlögl U., and Eyert V., Appl. Phys. Lett. 86 (2005) 064101.
[22] Schwingenschlögl U. and Schuster C., Chem. Phys. Lett. 435 (2007) 100.
[23] Schwingenschlögl U. and Schuster C., Europhys. Lett. 37 (2007) 37007.
[24] Blaha P., Schwarz K., Madsen G., Kvasicka D., and Luitz J., WIEN2k:
An augmented plane wave + local orbitals program for calculating crystal
properties, Vienna University of Technology, 2001.
http://arxiv.org/abs/cond-mat/0607129
	References
ABSTRACT
  We present electronic structure calculations for aluminium nanocontacts.
Addressing the neck of the contact, we compare characteristic geometries to
investigate the effects of the local aluminium coordination on the electronic
states. We find that the Al 3pz states are very sensitive against modifications
of the orbital overlap, which has serious consequences for the transport
properties. Stretching of the contact shifts states towards the Fermi energy,
leaving the system instable against ferromagnetic ordering. By spacial
restriction, hybridization is locally suppressed at nanocontacts and the charge
neutrality is violated. We discuss the influence of mechanical stress by means
of quantitative results for the charge transfer.

<|endoftext|><|startoftext|>
Introduction
Measurements of current - voltage (I-V ) characteristics are accompanied with the heat
emission and the selfheating. The selfheating can modify dramatically the resulting I-V
curve. A heat hysteresis of I-V curve and a dependence of I-V curve on the velocity of
scanning of current are signs of selfheating.
A removal of the selfheating is very important for transport measurements of high-Tc
superconductors because their heightened temperature sensibility. By reducing the cross
section S of a bulk sample one can measure I-V curve at the fixed range of the current
density for the smaller values of the measuring current. The selfheating decreases as
well. In the case of non-tunneling break junction (BJ) technique, a significant reducing
of S is achieved by the formation of a microcrack in a bulk sample. The non-tunneling
BJ of high-Tc superconductors represents two massive polycrystalline banks connected
by a narrow bottleneck (Figure 1a). The bottleneck is constituted by granules and
intergranular boundaries which are weak links (Figure 1b). The current density in
the bottleneck is much larger than that in the banks. If the bias current I is less
than the critical current Ic of the bulk sample then the weak links in the banks have
zero resistance. Provided small transport currents, (i) Ic and the I-V curve of the BJ
http://arxiv.org/abs/0704.0694v1
Current - voltage characteristics of break junctions of high-Tc superconductors 2
are determined by the weak links in the bottleneck only, (ii) the selfheating effect is
negligible.
Figure 1. a) Break junction of polycrystalline sample. The crack 1 and the bottleneck
2 are displayed. b) Granules in the bottleneck. Filled circles mark weak links that are
intergranular boundaries. Dotted lines are the main paths for transport current. c)
Simplified circuit for the network (Sec. 3.1).
The experimental I-V curves of BJs of high-Tc superconductors have rich
peculiarities reflecting physical mechanisms of a charge transport through weak links.
It was a topic of many investigations [1, 2, 3, 4, 5, 6, 7]. Here we analyze the earlier
works of our group [4, 5, 6, 7] and the new experimental data (Section 2). The model for
description of the I-V curves is suggested in Section 3. The peculiarities observed on the
experimental I-V curves of BJs have been explained in Section 4. Also the parameters
of weak links in the investigated samples are estimated in Section 4.
2. Experiment
La1.85Sr0.15CuO4 (LSCO), Y0.75Lu0.25Ba2Cu3O7−δ (YBCO) and Bi1.8Pb0.3Sr1.9Ca2Cu3Ox
(BSCCO) were synthesized by the standard ceramic technology. The composite 67 vol.%
YBa2Cu3O7−δ + 33 vol.% Ag (YBCO+Ag) was prepared from YBa2Cu3O7−δ powder
and ultra-dispersed Ag [8]. The initial components were mixed and pressed. Then the
composite was synthesized at 925◦C for 8 h. The critical temperatures Tc are 38 K for
LSCO, 112 K for BSCCO, 93.5 K for YBCO and YBCO+Ag.
Samples with a typical size of 2 mm x 2 mm x 10 mm were sawed out from
synthesized pellets. Then the samples were glued to a sapphire substrates. The sapphire
was chosen due to its high thermal conductivity at low temperatures. The central part
of the samples was polished down to obtain a cross-sectional area S ≈ 0.2 x 1 mm2.
For such a value of S, the critical current Ic of YBCO and BSCCO has a typical value
about 2 A at 4.2 K (current density ≈ 1000 A/cm2). Further controllable decrease in S
is very difficult due to an inevitable mechanical stresses breaking the sample. In order
Current - voltage characteristics of break junctions of high-Tc superconductors 3
to obtain a contact of the break junction type, the sample with the above value of S
was bent together with the substrate with the help of screws of spring-loaded current
contacts. It led to the emergence of a microcrack in the part of sample between the
potential contacts. As a result, either a tunnelling contact (no bottleneck, the resistance
R > 10 Ohm at the room temperature) or a metal contact (R < 10 Ohm) was formed.
Only the metal contacts were selected for investigation.
The drop of Ic(4.2 K) when the sample was cracked shows that the values of
S decreased by ≈30 times for LSCO and ≈100 times for YBCO and BSCCO. For
YBCO+Ag Ic(77.4 K) decreases by ∼500 times.
The I-V curves were measured by the standard four probe technique under bias
current. A typical V (I) dependence of BJ has the hysteretic peculiarity which decreases
as temperature increases. Also there is the excess current on the I-V curves. The I-V
curves of LSCO and BSCCO BJs exhibit an arch-like structure at low temperature.
The I-V curves of BJs investigated are independent of scanning velocity of bias current.
Thus, the experimental conditions provides that the hysteretic peculiarity on these I-V
curves is not caused by the selfheating.
3. Model
3.1. I-V curve of network
A polycrystalline high-Tc superconductor is considered to be the network of weak links.
The I-V curve of a network is determined by the I-V curves of individual weak links
and their mutual disposition.
Let us consider firstly an influence of mutual disposition of weak links on the I-V
curve. For a bulk high-Tc superconductor the I-V curve resembles the one of typical
single weak link [9]. However the I-V curves of BJs are distorted usually in comparison
with the one of a single weak link. It is because the combination of finite number
of weak links remains in the bottleneck of BJ (Figure 1 a, b). So the contribution
of different weak links to the resulting I-V curve is more stronger in a BJ than in a
large network. The characteristics of a chaotic network is difficult to calculate [9]. To
simplify the calculation of resulting I-V curve of BJ we consider an equivalent network:
the simple parallel connection of a few chains of series-connected weak links (Figure 1
c). Indeed there are percolation clusters [10] in a network that are paths for current
(Figure 1 b). The each percolating cluster in the considered network is considered to be
the series-connected weak links.
The V (I) dependence of the series-connected weak links is determined as V (I) =
Vi(I), where the sum is over all weak links in the chain, Vi(I) is the I-V curve of each
weak link. The weak links and their I-V curves may be different. It is conveniently to
replace here the sum over all weak links with the sum of a few more typical weak links
multiplied by a weighting coefficient Pi. The relation for the series-connected weak links
Current - voltage characteristics of break junctions of high-Tc superconductors 4
is resulted:
V (I) = NV
PiVi (I) , (1)
where NV is the number of typical weak links, Pi shows the share of ith weak link in
the resulting I-V curve of the chain,
Pi = 1.
The parallel connection of chains is considered further. If the current I flows
through the network then the current Ij through jth chain equals IP‖j/N‖ and
Ij = I.
Here N‖ is the number of parallel chains in the network, P‖j is the weighting coefficient
determined by the resistance of jth chain,
P‖j = 1.
An addition (a subtraction) of chains in parallel connection smears (draws down)
the I-V curve of network to higher (lower) currents. It is like to the modification of
I-V curve due to the increase (the decrease) of cross section of sample. For the sake of
simplicity the difference of parallel chains can be neglected and the typical chain may
be considered only. Then the expression for I-V curve of network of weak links follows:
V (I) = NV
, (2)
where the sum is over the typical weak links with weighting coefficients Pi, NV is the
number of series-connected weak links in the typical chain in the network, I/N‖ = Ii is
the current through the ith weak link of the typical chain.
3.2. I-V curve of a typical weak link
The metal intergranular boundaries were revealed in the polycrystalline YBCO
synthesized by the standard ceramic technology [11]. The excess current and
other peculiarities on the I-V curves of the studied samples are characteristic for
superconductor/normal-metal/superconductor (SNS) junctions [12]. These facts verify
that the intergranular boundaries in the high-Tc superconductors investigated are
metallic. Therefore the networks of SNS junctions are realized in the samples.
The Kümmel - Gunsenheimer - Nicolsky (KGN) theory [13] only among theories
developed for SNS structures predicts the hysteretic peculiarity on the I-V curve of
weak link. The KGN theory considers the multiple Andreev reflections of quasiparticles.
According to the KGN model, the hysteretic peculiarity reflects a part of I-V curve with
a negative differential resistance which can be observed under bias voltage [12, 13]. The
KGN approach was used earlier to the description of experimental I-V curves of low-Tc
[14, 15] and high-Tc weak links [16, 14].
The approach based on consideration of the phase slip in nanowires [17] may
alternatively be employed to compute the hysteretic I-V curve. The model [17] is
valid at T ≈ Tc while the KGN model is appropriate at temperature range T < Tc.
We use the simplified version [14] of KGN to describe the I-V curves of individual
weak links. According to [14] the expression for the current density of SNS junction is
given by:
Current - voltage characteristics of break junctions of high-Tc superconductors 5
Table 1. Parameters of superconductors
Sample ∆0 [meV] m
∗/me kF [Å
BSCCO 25 6.5 0.61
LSCO 9 5 0.35
YBCO 17.5 5 0.65
YBCO+Ag 17.5 5 0.65
j(V ) =
em∗2d2
2π3~5
−∆+neV
∆2 − E2
1− C 2|E|
E2 −∆2
with C = π/2(1 − dm∗∆/2~2kF ) for C > 1 and C = 1 otherwise, E1 = −∆ + neV for
−∆ + neV ≥ ∆ and E1 = ∆ otherwise. Here A is the cross section area and d is the
thickness of normal layer with the inelastic mean free path l and resistance RN , e is the
charge and m∗ is the effective mass of electron, ∆ is the energy gap of superconductor,
n is the number of Andreev reflections which a quasiparticle with energy E undergoes
before it moves out of the normal layer.
One should calculate a few I(V ) dependencies by Eq.(3) for different parameters
to simulate the I-V curve of network by Eq.(2). Almost all parameters in Eq.(3) can be
dispersing for different weak links. Indeed there are some distribution functions of the
parameters of intergranular boundaries (d, A, RN) or the parameters of superconducting
crystallites (∆, the angle of orientation) in the SNS network.
4. Current - voltage characteristics
Figures 2-5 show the experimental I-V curves of BJs (circles) and the calculated I-V
curves of SNS networks (solid lines). The right scale of V -axis of all graphs is given in
the units eV/∆ to correlate the position of peculiarities on I-V curve with the value of
energy gap.
The parameters of superconductors are presented in Table 1. The mean values of
energy gap ∆0 at T = 0 known to be for high-Tc superconductors were used. Parameters
kF , m
∗ were estimated by the Kresin-Wolf model [18].
For a fitting we have calculated the I(V ) dependencies of different SNS junctions
by Eq.(3) to describe different parts of the experimental I-V curve. The parameters
varied were d and RN . Then we have substituted the arrays of I-V values to Eq.(2).
The most experimental I-V curves are satisfactory described when the sum in Eq.(2)
contains at least two members. The first member describes the hysteretic peculiarity,
Current - voltage characteristics of break junctions of high-Tc superconductors 6
the second one describes the initial part of I-V curve. The fitting is illustrated in detail
on Figure 2 were curve 1 is calculated for d = 78 Å, curve 2 is calculated for d = 400 Å
Figure 2. I-V curve of YBCO+Ag break junction at T = 77.4 K. Experiment (circles)
and computed curves (solid lines). Arrows display the jumps of voltage drop. Curve 1
that is N‖I1(V1) fits the hysteretic peculiarity. Curve 2 that is N‖I2(V2) fits the initial
part of I-V curve. Curve 3 is the dependence V (I) = NV
P1V1(I/N‖) + P2V2(I/N‖)
Figure 3. I-V curve of BSCCO break junction at T = 4.2 K. Experiment (circles)
and computed curve (solid line). Arrows display the jumps of voltage drop.
The main fitting parameters are d/l, NV , N‖A, RN/N‖, P1,2. The parameter P1
is the weighting coefficient for the stronger (with the thinner d) typical weak link,
P2 = 1− P1. Some parameters used are presented in Table 2. Values of l are estimated
from the experimental data of resistivity (2, 3, 1.6, 3.6 mOhm cm at 150 K for bulk
BSCCO, LSCO, YBCO, YBCO+Ag correspondingly) and data of works [19, 20]. The
value of l for Ag at 77 K is known to be ∼ 0.1 cm. But it is more realistic to use much
Current - voltage characteristics of break junctions of high-Tc superconductors 7
Figure 4. Temperature evolution of I-V curve of LSCO break junction. Experiment
(circles) and computed curves (solid lines). The I-V curves at 11.05 K, 23.6 K, 32.85
K are shifted up by 0.2 V, 0.4 V, 0.6 V correspondingly.
Figure 5. Temperature evolution of I-V curve of YBCO break junction. Experiment
(circles) and computed curves (solid lines). The I-V curves at 21.65 K, 41.1 K, 61.95
K are shifted up by 10 mV, 20 mV, 30 mV correspondingly.
smaller value for composite. Table 2 shows the possible different values of l and the
corresponding values of d for YBCO+Ag.
The number of the parallel paths N‖ is estimated by assuming A ≃ 10−11 cm2
for the weak links in polycrystalline high-Tc superconductors. Such choice of A is
reasonable because the cross section area of weak link should be more smaller than
D2 (Figure 1b), where D ∼ 10−4 cm is the grain size of high-Tc superconductors. This
rough estimation of N‖ is influenced by a form of the percolation clusters in the sample
[10] and imperfections of weak links.
Figures 2-5 demonstrate that the hysteretic peculiarity on the experimental I-V
curves is resulted from the region of negative differential resistance. This region is due
to the number of the Andreev reflections decreases when the voltage increases.
Current - voltage characteristics of break junctions of high-Tc superconductors 8
Table 2. Parameters of SNS junctions in the networks
Sample l [Å] d1 [Å] d2 [Å] P1 NV N‖
BSCCO 72∗ 3.5 - 1 1 1
LSCO 50∗ 4.8 20 0.905 15 20
YBCO 90∗ 2 20 0.333 3 1
YBCO+Ag 1000∗∗ 78 400 0.75 4 5
100∗∗ 7.8 40 0.75 4 5
∗ value at T = 4.2 K
∗∗ value at T = 77.4 K
The experimental I-V curves for LSCO and YBCO at different temperatures and
the corresponding curves computed are presented in figures 4 and 5. We account a
decreasing of l and ∆ to compute I-V curves at higher temperatures (for LSCO l(11.05
K) = 50 Å, ∆(11.05 K) = 0.93 meV, l(23.6 K) = 50 Å, ∆(23.6 K) = 0.69 meV, l(32.85
K) = 47 Å, ∆(32.85 K) = 0.46 meV; for YBCO l(21.65 K) = 81 Å, ∆(21.65 K) = 17.3
meV, l(41.1 K) = 70 Å, ∆(41.1 K) = 16.6 meV, l(61.95 K) = 60 Å, ∆(61.95 K) = 13.3
meV). The coincidence of computed curves and experimental I-V curves becomes less
satisfactory then T approaches to Tc. As possible, this discrepancy is due to an influence
of other thermoactivated mechanisms.
As the simulation curves demonstrate (Figs. 3 and 4), the arch-like peculiarity
on the experimental I-V curves of LSCO and BSCCO is one of the arches of the
subharmonic gap structure [13]. By using Eq.(2) we account for the arch-like peculiarity
at voltages ≫ ∆/e for LSCO (Figure 4) that should seem to contradict the KGN model
prediction for the subharmonic gap structure at V ≤ 2∆/e [13].
Also we have used Eq.(2) to estimate the number of resistive weak links in the
sample of composite 92.5 vol. % YBCO + 7.5 vol. % BaPbO3 [16]. The I-V curve of
this composite was described earlier by the KGN based approaches [16, 14]. We obtained
NV = 13, N‖ ≈ 4000 and the full number of resistive weak links is 52000. Small number
NV is the evidence that the shot narrowest part of bulk sample is resistive only.
5. Conclusion
We have measured the I-V characteristics of break junctions of polycrystalline high-Tc
superconductors. The peculiarities that are typical for SNS junctions are revealed on
the I-V curves.
The expression for I-V curve of network of weak links (Eq.(2)) was suggested to
describe the experimental data. Eq.(2) determines the relation between the I-V curve
of network and the I-V characteristics of typical weak links.
The I-V curves of SNS junctions forming the network in the polycrystalline high-
Tc superconductors are described by the Kümmel - Gunsenheimer - Nicolsky approach
[13, 14]. The multiple Andreev reflections are found to be responsible for the hysteretic
Current - voltage characteristics of break junctions of high-Tc superconductors 9
and arch-like peculiarities on the I-V curves. The shift of subharmonic gap structure
to higher voltages is explained by the connection of a few SNS junctions in series.
We believe that the expression suggested (Eq.(2)) allows to estimate the number
of junctions with nonlinear I-V curves and R > 0 in various simulated networks.
Acknowledgements
We are thankful to R. Kümmel and Yu.S. Gokhfeld for fruitful discussions. This
work is supported by program of President of Russian Federation for support of young
scientists (grant MK 7414.2006.2), program of presidium of Russian Academy of Sciences
”Quantum macrophysics” 3.4, program of Siberian Division of Russian Academy of
Sciences 3.4, Lavrent’ev competition of young scientist projects (project 52).
References
[1] Zimmermann U, Abens S, Dikin D, Keck K, Wolf T 1996 Physica B 218 205
[2] Svistunov V M, Tarenkov V Yu, Dyachenko A I, Hatta E 2000 JETP Lett. 71 289
[3] Gonnelli R S, Calzolari A, Daghero D, Ummarino G A, Stepanov V A, Giunchi G, Ceresara S,
Ripamonti G. 2001 Phys. Rev. Lett. 87 097001
[4] Petrov M I, Balaev D A, Gokhfeld D M, Shaikhutdinov K A, Aleksandrov K S 2002 Phys. Solid
State 44 1229
[5] Petrov M I, Balaev D A, Gokhfeld D M, Shaikhutdinov K A 2003 Phys. Solid State 45 1219
[6] Petrov M I, Gokhfeld D M, Balaev D A, Shaihutdinov K A, Kümmel R 2004 Physica C 408 620
[7] Gokhfeld D M, Balaev D A, Shaykhutdinov K A, Popkov S I, Petrov M I 2006 Physics of Metals
and Metallography 101 (Suppl. 1) S27 (Preprint cond-mat/0410112)
[8] Mamalis A G, Ovchinnikov S G, Petrov M I, Balaev D A, Shaihutdinov K A, Gohfeld D M,
Kharlamova S A, Vottea I N 2001 Physica C 364-365 174
[9] Haslinger R, Joynt R 2000 Phys. Rev. B 61 4206
[10] Stauffer D 1979 Physics Reports 54 1
[11] Petrov M I, Balaev D A, Gokhfeld D M 2007 Phys. Solid State 49 619
[12] Likharev K K 1979 Rev. Mod. Phys. 51 101
[13] Kümmel R, Gunsenheimer U, Nicolsky R 1990 Phys. Rev. B 42 3992
[14] Gokhfeld D M 2007 Supercond. Sci. Technol. 20 62 (Preprint cond-mat/0609541)
[15] Gokhfeld D M 2007 Physica C (materials of M2S-HTSC, Preprint cond-mat/0605427)
[16] Petrov M I, Balaev D A, Gohfeld D M, Ospishchev S V, Shaihutdinov K A, Aleksandrov K S 1999
Physica C 314 51
[17] Michotte S, Matefi-Tempfli S, Piraux L, Vodolazov D Y, Peeters F M 2004 Phys. Rev. B 69 094512
[18] Kresin V Z, Wolf S A 1990 Phys. Rev. B 41 4278
[19] Larbalestier D, Gurevich A, Feldmann D M, Polyanskii A 2001 Nature 414 368
[20] Gorkov L P, Kopnin N B 1988 Usp. Fiz. Nauk 156 117 (Sov. Phys. Usp. 31 850)
http://arxiv.org/abs/cond-mat/0410112
http://arxiv.org/abs/cond-mat/0609541
http://arxiv.org/abs/cond-mat/0605427
	Introduction
	Experiment
	Model
	I-V curve of network
	I-V curve of a typical weak link
	Current - voltage characteristics
	Conclusion
ABSTRACT
  The current-voltage ($I$-$V$) characteristics of break junctions of
polycrystalline La$_{1.85}$Sr$_{0.15}$CuO$_4$,
Y$_{0.75}$Lu$_{0.25}$Ba$_2$Cu$_3$O$_{7-\delta}$,
Bi$_{1.8}$Pb$_{0.3}$Sr$_{1.9}$Ca$_2$Cu$_3$O$_x$ and composite
YBa$_2$Cu$_3$O$_{7-\delta}$ + Ag are investigated. The experimental $I$-$V$
curves exhibit the specific peculiarities of
superconductor/normal-metal/superconductor junctions. The relation between an
$I$-$V$ characteristic of network of weak links and $I$-$V$ dependencies of
typical weak links is suggested to describe the experimental data. The $I$-$V$
curves of typical weak links are calculated by the K\"{u}mmel - Gunsenheimer -
Nicolsky model considering the multiple Andreev reflections.

<|endoftext|><|startoftext|>
Introduction
	General discussion
	Calculation of R123(2)
	Calculation of R123(3)
	Photon splitting in a monochromatic plane wave
	Asymptotics of the amplitudes
	Amplitudes for small  and small 
	Amplitudes for small  and fixed 
	Amplitudes for large  and fixed =/
	Possibility of experimental observation of photon splitting in a laser field
	Conclusions
	Acknowledgments
	Coefficients for the helicity amplitudes
	References
ABSTRACT
  Photon splitting due to vacuum polarization in a laser field is considered.
Using an operator technique, we derive the amplitudes for arbitrary strength,
spectral content and polarization of the laser field. The case of a
monochromatic circularly polarized laser field is studied in detail and the
amplitudes are obtained as three-fold integrals. The asymptotic behavior of the
amplitudes for various limits of interest are investigated also in the case of
a linearly polarized laser field. Using the obtained results, the possibility
of experimental observation of the process is discussed.

<|endoftext|><|startoftext|>
Introduction: 
Group-VB transition metals V, Nb, and Ta crystallizes in body-centered cubic (BCC) 
structure at ambient pressure and temperature conditions. These metals are of great use 
due to their high thermal, mechanical and chemical stabilities. Recently these metals have 
been the subject of numerous experimental and theoretical studies [1-7] in Mbar pressure 
regions. The Nb is known to have the highest superconducting transition temperature (Tc) 
among elemental solids at ambient pressure [1]. The Ta is used in high-pressure 
experiments as a pressure standard. Experimentally it is known that V, Nb and Ta  remain 
stable in the BCC structure at least up to 150 GPa [2,4] pressure.  
                       However, the linear response phonon calculations using full potential 
linear muffin-in orbital (FP-LMTO) method of V by Suzuki and Otani had shown the 
softening of the transverse acoustics (TA) phonon mode at ~120 GPa pressure, which 
eventually becomes imaginary at pressures higher than 130 GPa [5]. The subsequent 
work of Landa et al. using exact muffin-tin orbital (EMTO) method had also shown the 
anomalous pressure behavior of the C44 elastic constant. They found that the C44 starts 
softening above 50 GPa pressure and drops to zero value at about 180 GPa pressure. In 
the pressure range of 180-270 GPa the value of C44 is negative. Above to this, the C44 
adopts the usual pressure  behavior. These are an indication of  the development of a 
structural instability in BCC lattice.  However, these authors corroborated  this behavior 
to the Fermi-surface nesting  and the possibility of any structural phase transition was not 
explored [6].   
In this paper we are reporting our ab-initio density functional theory based results for the 
V, Nb and Ta under high pressure. For V, BCC to rhombohedral structural phase 
transition has been predicted at around 60 GPa pressure and it remains in rhombohedral 
structure at least up to 332 GPa pressure. However, at 434 GPa pressure the BCC 
structure re-appears  in our calculations. This behavior is not found in the Nb and Ta.. 
But lattice expansion in Nb resulted a new minimum around 111° which is lower in 
energy relative to BCC. By analogy, we expect such features in Ta will appear even at  
higher lattice expansions. 
Computational method:  
The cubic structures namely simple cubic, body centered cubic (BCC) and face-centered 
cubic (FCC) crystal structures are related to a rhombohedral structure. The simple cube 
can be viewed as a rhombohedron with angle (αrhom) equal to 90°. The primitive cells of 
BCC and FCC are rhombohedron with angle αrhom equal to 109.47° and 60° respectively 
[8]. Thus it is possible  to generate all theses cubic structures from a rhombohedron with 
atom at (0,0,0) by varying the angle. Here, we have performed total energy calculations  
taking  a  rhombohedral unit cell for V, Nb and Ta as a function of αrhom in the range of 
54°-112° at several volumes. All the calculations  for V were done using the 
pseudopotential based PWSCF computer code [9]. The total energy calculations are 
based on density functional theory  and the phonon calculations on density functional 
perturbation theory [10]. An ultrasoft pseudopotential for V was used with 40 Ry plane 
wave energy cut off and a 400 Ry cut off in the expansion of the augmentation charges. 
Phonon dynamical matrices were computed on a uniform 6x6x6 grid of q-points in the 
BZ of BCC structure. This leads  to total 16 q-points in the irreducible BZ. Using Fourier 
interpolation, we then constructed and diagonalized the dynamical matrices on a denser 
grid (24x24x24). The calculations for the Ta and Nb were performed using the VASP 
computer program with PAW pseudopotentials taking  an energy cut off of 400 eV for 
plane wave expansions [11-13]. Results were cross checked with both programs at a few 
volumes for V and Nb. The s and p semicore levels of respective elements were included 
in the valence states. For the exchange-correlation term the generalized gradient 
correction approximation of Perdew-Buke-Ernzerof (GGA-PBE) [14] was used. A 
18x18x18 k-point mesh for the Brillouin zone (BZ) integration was used for all the total 
energy calculations. The energy convergence with respect to computational parameters 
was carefully examined.  
Results and discussion:  
The zero-pressure and zero-temperature results for lattice constant a0, bulk modulus BB0, 
and the first pressure derivative of the bulk modulus B’ for V, Ta, and Nb are obtained by 
fitting the total energies versus lattice constant data to a fourth order polynomial. 
Calculated equilibrium lattice constants are in good agreement with the experimental 
values (within the error of 1%). Bulk modulus and its first pressure derivates are also in 
good agreement with existing experimental data (see table for comparison). Calculated 
pressure-volume relations matches well with existing experimental data and shown in 
Fig.1. Hence justifies the use of respective presudopotentials and other computational 
parameters.  
Table-1: The experimental data shown in brackets by symbol (*) is of Takemura [2] and 
by symbol (+) is of Dewaele et. al.[7]. 
Element Lattice constant (a0) Bulk Modulus (BB0) B’ 
V 2.998 (3.00)* 182 (165,188)* 3.70 (3.5,4.04)* 
Nb 3.309 (3.30)* 172 (153,168)* 3.40 (3.9,2.2)* 
Ta 3.320 (3.30)+ 208 (194,)+ 3.13 (3.25)+ 
Fig.2 depicts our calculated total energy as a function of angle αrhom  for V at equilibrium 
volume.  The SC is on the absolute maximum and FCC is on the local maximum 
indicating the unstable nature of theses structures for a given rhombohedral distortion. 
The BCC is in the absolute minimum of the total energy curve as per expectation. 
However, there are two local minima around angles 56° and 68° (see the inset of Fig.2).  
To our best of knowledge no body has shown that for V,  FCC and SC structures are 
mechanically unstable. The energy variation in the range of angles 106° -111°  is about 
4.0 mRy per atom (see Fig.3a). Similar calculations were performed at  many volumes 
covering more than 400 GPa pressures. The Figs.3(a, b) show total energy variation as a 
function of angle αrhom  in the range of  106°-112° at several pressures. The reason for 
showing the energy variation only in this range of angles is two fold. Firstly, BCC is of 
lowest energy among all structures  and secondly no extra feature appear around the FCC 
except that local minima become more pronounced with few degree shift. Particularly at 
112 GPa pressure  the minimum around 68° is shifted to angle 70° and the minimum 
around 56° shifted to 57° . At this pressure 70° minimum is lower in energy by 7 mRy 
per atom relative to the  57° minimum and around 12 mRy per atom relative to FCC. 
However, SC remains on the absolute maximum of the total energy curve up to 400 GPa 
pressure, ruling out the possibility of BCC to SC transition as predicted earlier [15]. 
Below 60 GPa pressure the behavior of energy curves remain unchanged.  However at 60 
GPa a new minimum appear at angle 110.50° (say, Min1) leading to a structural phase 
transition. Now BCC energy is higher by amount of 0.05 mRy per atom relative to Min1. 
Further increase in pressure leads to development of another minimum at angle108.50° 
(say, Min2) and BCC lies on a local maximum. At 112 GPa pressure energy difference 
between Min1 and Min2 is about –0.057 mRy per atom and that between Min1 and BCC 
is -0.10 mRy per atom. Thus now BCC-V becomes mechanically unstable toward a 
rhombohedral distortion which can lead to softening of C44 elastic constant as it related to 
trigonal shear. In fact Land et al [6]  predicted C44 softening in the same pressure region. 
It is to be noted that in our work the BCC to rhombohedral structural phase transition has 
occurred at much lower pressure (~ 60 GPa).  These energy changes are  found to persist 
even with denser (e.g. 28x28x28 k-point) mesh for the BZ integration and with larger 
plane wave energy cut offs (e.g., 65 Ry). Thus it confirms that this feature is not an 
artifact of calculation but it is real. In fact, as these calculations were performed with 
same rhombohedral cell  for all values of angles; inaccuracies due to computational 
parameters will be same. At 112 GPa pressure the calculations were also repeated with 
VASP using PAW pseudopotential [11-13] but results were invariant. 
In our calculations the frequency of TA mode becomes imaginary at some q-values along 
the Γ-Η direction of the BZ (Fig. 3c) at 112 GPa pressure; lower compared to ~130 GPa 
pressure reported by Suzuki and Otani [5]. The reason of this mismatch of pressure can 
be due to the  difference in computational techniques and approximations used for 
exchange-correlation terms.  
The total energy behavior between angles 106°-112° at 160 GPa is similar to that at 112 
GPa except now the Min2 becomes lower in energy relative to Min1 by 0.20 mRy per 
atom leading to an iso-structural phase transition. Now Min2 energy  is lower by 0.14 
mRy per atom relative to BCC. Further increase in pressure lead to  disappearance of  
Min1 and the energy difference between Min2 and Min1 at  240 GPa is  around -0.55 
mRy per atom.  However, the energy difference between the BCC and  Min1  is 0.14 
mRy per atom. At 332 GPa, the Min2 is lower in energy by 0.019 mRy per atom relative 
to 109.47° (BCC) energy value. Further increase in pressure lead to disappearance of the 
Min2  also and at 434 GPa pressure there is only one minimum located at 109.47°. In our 
phonon calculations at 160 GPa pressure the TA mode frequency becomes positive and 
further increase in pressure lead to usual pressure behavior, i.e, hardening with pressure 
[Fig. 3(c)]. The calculations for pressures higher than 300 GPa were checked with VASP 
code using PAW pseudopotential including the  nonlinear core corrections which are 
supposed to be taken care the core-core overlap at higher pressures. Our results remain 
unchanged. 
We have also performed total energy calculations for hexagonal closed packed (HCP) 
lattice at optimized c/a axial ratio (1.829) at two pressures; one at ambient and other at 
332 GPa pressure and its energy is found always higher compared to BCC. The energy 
difference (19.6 mRy/atom at ambient pressure) between BCC and HCP increases with 
pressure (40 mRy/atom at 332 GPa ) thus rules out the BCC-HCP transition.  
It is to be noted that work reported by Ding et. al. had claim the observation of BCC to 
rhombohedral transition near 69 GPa pressure [17] which also supports our predictions. 
                           Similar calculations were repeated for the Nb and Ta and the results of 
total energy variations with αrhom at several pressures are shown in Figs. 4(a,b). No new 
features appear in total energy curves under pressure like those in V and thus no phonon 
calculations were attempted. These are in agreement with earlier elastic constant and 
phonon calculations [3,6,16]. The only known anomaly for these two elements in the C44 
is its slope change with pressure, which occurs at 40 GPa for Nb [6] and in 100 GPa 
region for Ta [16]. But lattice expansion in Nb resulted a new minimum around 111° 
which is lower in energy relative to BCC by 11 meV per atom at 6.7% lattice expansion. 
By analogy, we expect such features in Ta will appear even at  larger lattice expansions. 
Conclusion: 
       In conclusion, high pressure structural behavior of BCC metals of group-VB are 
investigated using density functional theory. Calculations for V show the onset of a 
phonon softening and a rhombohedral instability  at  60 GPa pressure which may lead to 
C44 elastic constant softening as predicted earlier [6] in the same pressure region. Thus,  
we predict a  BCC to rhombohedral structural phase transition (αrhom=110.5°) at around 
60 GPa pressure in V. It exists in rhombohedral structure at least up to 330 GPa pressure, 
however its angle is changed to 108.50° at 160 GPa pressure resulting to an iso-structural 
phase transition. Finally at  around 434 GPa pressure, it transforms to BCC structure 
again.  This behavior is not found in the Nb and Ta.  
References: 
[1] V. V. Struzhkin, Y. A. Timofeev, R. J. Hemley and H. K. Mao, Phys. Rev. Lett. 79, 
4262 (1197). 
 [2] K. Takemura,  ‘Porc. Int. Conf. On High Pressure Science and Technology, AIRAPT-
17’  (Honolulu, 1999), (Sci. Technol. High Pressure vol 1) ed. M. H. Maghnani, W. J. 
Nellis and M. F. Nicol (India: University Press) page 443 (2000). 
[3] J. S. Tse, Z. Li, K. Uehara, Y. Ma and R. Ahuja, Phys. Rev. B 69, 132101 (2004). 
[4] H. Cynn and C. S. Yoo, Phys. Rev. B 59, 8526 (1999). 
[5] N. Suzuki and M. Otani, J. Phys.:Condens. Matter 14, 10869 (2002). 
[6] A. Landa, J. Klepeis, P. Soderlind, I. Naumov, O. Velikokhatnyi, L. Vitos and A. 
Ruban, J. Phys.:Condens. Matter 18, 5079 (2006).  
[7] A. Dewaele, P. Loubeyre and M. Mezouar, Phys. Rev. B 70, 094112 (2004). 
[8] C. Kittel, ‘Introduction to Solid State Physics’  Fifth Edition  (1993). 
[9] Website, http://www.pwscf.org
[10] S.  Baroni, S. de Gironcoli, A. D. Corso and P. Giannozzi, Rev. Mod. Phys. 73, 515 
(2001). 
[11] G. Kresse and J. Furthmuller, Phys. Rev. B. 54, 11169  (1996). 
[12] G. Kresse   and J. Hafner, Phys. Rev. B 47, 558 (1993). 
[13] G. Kresse   and J. Joubert, Phys. Rev. B 59, 1758 (1999). 
[14]  J. P. Perdew, K Burke and, M Ernzerhof, Phys. Rev. Lett. 77, 3865  (1996). 
[15] C. Nirmala Louis and K. Iyakutti, Phys. Rev. B 67, 094509 (2003). 
[16] O. Gulseren and R. E. Cohen, Phys. Rev. B 65, 064103 (2002). 
[17] Ding et. al., Phys. Rev. Lett. 98, 085502  (2007). 
http://www.pwscf.org/
0 100 200 300 400
0 40 80 120 160 200
    Theory
 Experiment
    Theory
 Experiment
Pressure (GPa)
    Theory
 Experiment
 Experiment
Fig. 1: The calculated P-V relations are shown with solid lines. The experimental data for 
V and Nb are from Ref. [2]. The experimental data for Ta shown by symbol (Ο) is from 
Ref. [4] and other is form Ref. [7]. The V0’s are respective theoretical and experimental 
equilibrium volumes. 
40 50 60 70 80 90 100 110 120
-0.74
-0.73
-0.72
-0.71
-0.70
-0.69
-0.68
-0.67
50 55 60 65 70 75
-0.720
-0.716
-0.712 V
P ~ 0 GPa
BCCSC
 (Deg)
Fig. 2: Total energy variation with αrhom at volume 13.48 Å3 per atom. 
-0.739
-0.738
-0.737
-0.736
-0.735
-0.734
-0.7200
-0.7188
-0.7176
-0.7164
-0.694
-0.693
-0.692
-0.691
106 107 108 109 110 111 112
-0.663
-0.662
-0.661
-0.660
-0.659
P~0 GPaVol=13.48
P=42 GPaVol=11.36
P=60 GPa
Vol=10.37
P=112 GPaVol=9.64
Fig. 3(a) 
-0.617
-0.616
-0.615
-0.614
-0.613
-0.612
-0.611
-0.538
-0.536
-0.534
-0.532
-0.530
-0.462
-0.460
-0.458
-0.456
-0.454
106 107 108 109 110 111 112
-0.334
-0.332
-0.330
-0.328
-0.326
P=160 GPaVol=8.89
P=240 GPaVol=8.02
P=332 GPaVol=7.41
P=434 GPaVol=6.67
Fig. 3(b) 
Fig. 3(a,b): The total energy variation with angle αrhom  at different pressures for V. All 
the volumes are in unit of Å3 per atom. 
P=60 GPa
P=112 GPa
P=160 GPa
P=332 GPa
Fig. 3c: The pressure variation of the phonon modes frequency of V along Γ-Η direction 
of the BZ.  
-9.900
-9.895
-9.890
-9.885
-9.880
-9.875
-10.215
-10.200
-10.185
-10.170
-10.18
-10.16
-10.14
-10.12
106 107 108 109 110 111 112
-9.48
-9.44
-9.40
-9.36
Vol=22
P=-2 GPa Vol=18.5
P=13 GPa Vol=17
Vol=10
(Deg)
P=73 GPa Vol=14
Fig. 4a 
-11.84
-11.82
-11.80
-11.78
-11.76
-11.74
-11.72
-11.70
-11.68
-11.36
-11.32
-11.28
-11.24
-11.20
-11.16
-11.12
-10.92
-10.88
-10.84
-10.80
-10.76
-10.72
-10.68
-10.64
-10.60
-10.56
-10.52
-10.48
-10.44
-10.40
-10.36
-10.32
106 107 108 109 110 111 112
-7.60
-7.52
-7.44
-7.36
-7.28
-7.20
-7.12
P=14 GPa Vol= 17.5
P=63 GPa Vol=15
Vol=14
P=114 GPa
P=96 GPa
Vol=13.5
P=280 GPa
Vol=10.78
(Deg)
Fig. 4b 
Fig. 4 (a,b): The calculated total energy variation with αrhom  at several volumes for Nb 
and Ta. The volumes given in figures are in units of Å3 per atom. 
ABSTRACT
  Results of the first-principles calculations are presented for the group-VB
metals V, Nb and Ta up to couple of megabar pressure. An unique structural
phase transition sequence BCC-->(at 60 GPa) rhombohedral (angle=110.5
degree)-->(at ~ 160 GPa)rhombohedral(angle=108.5 degree)--> (at ~ 430 GPa) BCC
is predicted in V. We also find that BCC-V becomes mechanically and
vibrationally unstable at around 112 GPa pressure. Similar transitions are
absent in Nb and Ta.

<|endoftext|><|startoftext|>
Introduction
The Standard Model (SM) of electroweak interactions has been successful. It can explain all experimental
results except for neutrino oscillation phenomena. Masses of quarks and leptons are generated through the
Yukawa interaction after the electroweak symmetry breaking. However, no principle has been established
to determine the flavor structure of the Yukawa couplings, and the origin of fermion mass hierarchy remains
unknown.
There have been many attempts to explain the flavor structure of Yukawa couplings. A promising
approach would be the idea of the flavor symmetry. In models based on the Froggatt–Nielsen (FN)
mechanism[1], the U(1) global symmetry is imposed as a flavor symmetry, in which the vacuum expectation
value (VEV) of an iso-singlet scalar field (FN field) gives a power-like structure of Yukawa couplings due
to the U(1) charge assignment for the relevant fields. Extension to more complicated flavor symmetries
has also been studied; i.e., non-Abelian global symmetries such as U(2)[2], discrete symmetries such as
S3[3], A4[4], D5[5], etc. They have several distinct patterns for the symmetry breaking, and the difference
in VEVs for each scalar field gives a hierarchical structure of the Yukawa matrix. In most of such models,
∗ E-mail: kanemu@sci.u-toyama.ac.jp
† E-mail: matsuda@mail.tsinghua.edu.cn
‡ E-mail: toshi@mpi-hd.mpg.de
§ E-mail: petcov@sissa.it
¶ E-mail: shindou@sissa.it
‖ E-mail: takasugi@het.phys.sci.osaka-u.ac.jp
∗∗ E-mail: ko2@het.phys.sci.osaka-u.ac.jp, Address after April 2007, Theory Group, KEK
http://arxiv.org/abs/0704.0697v1
only the orders of magnitude of the Yukawa matrix elements are estimated, so that O(1) uncertainties
exist in coefficients of the coupling constants between the scalar and matter fields. In this framework, CP
violation (CPV) comes from complex phases in these coefficients.
On the other hand, some kinds of the texture such as a democratic structure[6] have been investigated
for the Yukawa matrix. In the model with the democratic structure, all the elements of the Yukawa matrix
are assumed to have the same value up to the leading order, and mass hierarchy and flavor mixing are
given by diagonalizing these Yukawa matrices. CPV is supposed to appear as a consequence of complex
nature of the terms of tiny breaking of democratic structure. In any case the CP violating phase comes
from complex phases of free parameters so that it is not predictable.
Since both the flavor mixing and the CP violating phase are determined by the Yukawa matrices, it
would be natural to consider that they are given through the same mechanism which is relevant to the
Yukawa interaction. In the scenario of spontaneous CPV[7], the phase is deduced from the relative complex
phase between VEVs of the scalar fields. Combining the spontaneous CPV scenario with the idea of the
flavor symmetry, one can obtain the non-zero complex phase in the Yukawa matrix from the VEVs of
scalar fields. This idea has been developed in several flavor models; e.g., a model with three U(1) scalar
fields[8], spontaneous CPV in non-Abelian flavor symmetry[9], SO(10) model with the complex VEVs of
Higgs field[10], etc.
In this paper, we introduce a simple model where the FN mechanism works with democratic Yukawa
structure between quarks and FN fields, and show how the CPV can be obtained. As mentioned in [8],
this type of models requires at least more than two FN fields for a successful prediction of physical CP
violating phase.
This paper is organized as follows. In Sec.II, we study generation of CPV for quarks based on the FN
mechanism with the democratic ansatz. In Sec.III, simple models with two FN fields are discussed. We
present analytic expressions for the CKM parameters and numerical evaluations are also shown. Conclu-
sions are given in Sec.IV.
2 CP violation in democratic models
The democratic ansatz for the flavor structure of the Yukawa matrix has been implemented in Refs. [6, 13,
14, 15]. In this framework, the Yukawa matrices for the up- and down-type quarks are simply written as
Yu,d ∝
1 1 1
1 1 1
1 1 1
 . (1)
This flavor structure can be constructed by models with the S3R × S3L permutation symmetry[13]. The
symmetry can be realized in the geometrical origin of the brane-world scenario[14], and also in the strong
dynamics[15]. Two of the three eigenvalues are zero in these matrices in Eq. (1), and no CP violating phase
appears in the S3R×S3L limit 1. It is clear that in order to explain the experimental data this permutation
symmetry must be broken by some small effects. When the small breaking terms for the permutation and
CP symmetries are introduced by hand, the mass splitting between the 1st- and 2nd-generation quarks,
the mixing angles and the CP violating phase are explained.
1With keeping the S3R × S3L symmetry, the complex Yukawa matrices Yu,d in Eq. (1) can be re-expressed, for example,
by an appropriate unitary transformation as
Yu,d ∝
ω ω2 1
ω2 1 ω
1 ω ω2
A , (2)
where ω = ei2π/3 is the cube root of one. However the complex phase in this matrix is unphysical, because it is rephased out
by the redefinition of quark fields.
The FN mechanism is a simple idea to generate the mass hierarchy of quarks and leptons. In the
simplest FN model[1], an iso-singlet scalar field, Θ, is introduced with the U(1)FN flavor symmetry in
order to discriminate the fermion flavor by the U(1)FN charge.
The U(1)FN charge assigned for Θ is taken to be fΘ = −1 without loss of generality. Under the U(1)FN
symmetry, non-renormalizable interactions relevant to the quark mass matrix can be written as
LFN = −(CU )ij ŪiQj ·Hu
+fQj+fHu
− (CD)ijD̄iQj ·Hd
+fQj+fHd
, (3)
where Hu and Hd are iso-doublet fields (the Higgs fields) with their hypercharge to be −1/2 and +1/2,
respectively2, Qi is the left-handed quark doublet, Ui and Di are right-handed up- and down-type quarks
in the i-th generation, and CU and CD are coupling constants of order one. The U(1)FN charge of the
field X is expressed by fX .The cut-off scale is given by Λ, which describes the mass scale of new physics
dynamics. The coefficients (CU )ij and (CD)ij are generally complex numbers.
After U(1)FN is broken by the VEV of Θ,
〈Θ〉 = λΛ , (4)
where λ is a small dimensionless parameter, the quark Yukawa matrices are obtained as
(YU,D)ij = (CU,D)ijλ
+fQj+fHu,d . (5)
With the assignment of U(1)FN charges[16] as
(fQ1 , fUc1 ), (fQ2 , fU
), (fQ3 , fUc3 )
= (3, 2, 0) , (fDc
, fDc
, fDc
) = (2, 1, 1),
(fHu , fHd) = (0, 0), (6)
observed quark mass hierarchy and the CKM mixings can be derived by assuming λ to be close to the
Cabibbo angle sin θc = 0.22. At the leading order, induced masses for quarks are mu ∼ λ6〈Hu〉, mc ∼
λ4〈Hu〉, mt ∼ 〈Hu〉, md ∼ λ5〈Hd〉, ms ∼ λ3〈Hd〉 and mb ∼ λ〈Hd〉, and the CKM matrix is given by
UCKM ∼
1 λ λ3
λ 1 λ2
λ3 λ2 1
 . (7)
Now we consider the possibility of the spontaneous CPV due to the complex phase of 〈Θ〉. We assume
that CU and CD in Eq. (3) have the democratic structure, i.e.,
CU,D = αU,D
1 1 1
1 1 1
1 1 1
 , (8)
There is no CPV in the model with only one FN field. Although complex phases may be obtained in
the mass matrices by introducing the complex VEV of Θ, such phases are rotated away by the phase
redefinition of quark fields. Hence the model should have at least two FN fields Θ1,2 in order to obtain
non-vanishing CP violating phase. We start from the following Lagrangian with two FN fields,
LFN2 =−
Ūi(CU )ijQj ·Hu
D̄i(CD)ijQj ·Hd
, (9)
2In the SM, Hd = iσ2H
u is satisfied.
where n
1 and n
2 run from zero to n
ij ≡ fUci ,Dci +fQj+fHu,d with satisfying the constraint n
ij . After the U(1)FN symmetry is broken, the Yukawa couplings in the SM are given by
(YU,D)ij = (CU,D)ijλ
Rk , (10)
where
, R =
≡ |R|eiα . (11)
Therefore, physical CP violating phase can be obtained from the relative phase α between 〈Θ1〉 and 〈Θ2〉.
3 Examples for the model with two Froggatt-Nielsen fields
In this section, we show how to generate CPV from two FN fields by considering simple models. In order
to generate the quark mass hierarchy, we employ the U(1)FN charge assignment for matter fields given in
Eq. (6). In general, U(1)FN charges for the FN fields Θ1 and Θ2 can be different with each other. We here
assume that the both have the same U(1)FN charge for simplicity; (fΘ1 , fΘ2) = (−1,−1).
(a) The simplest toy model
First of all, we discuss the naive model defined in Eq. (9). The mass matrices Mu,d for up- and
down-type quarks are given by
(Mu)ij = αU 〈Hu〉AnU
ij , (Md)ij = αD〈Hd〉AnD
ij , (12)
where An =
k=0 R
k. This model predicts m2s/m
b = O(λ6) and |Vus| = O(λ). Only one of the two
experimental values can be adjusted. Furthermore, each mass matrix gives one massless eigenstate because
of the conditions detMu = detMd = 0
In order to avoid these difficulties, the following possibilities can be considered: (i) introducing an
additional symmetry, (ii) throwing away the democratic ansatz given in Eq. (8), etc. In the next model,
we explore the possibility of keeping the democratic structure for CU,D.
(b) The extended models with the Z2 symmetry
We try to construct more realistic model. The U(1)FN charges are assigned again as in Eq. (6). In
order to obtain the observed value of m2s/m
b by setting λ ∼ sin θc, we impose the Z2 symmetry under the
transformation of Θ1 → Θ1 and Θ2 → −Θ2. For Z2 parity assignment for quark fields, there are a lot of
choices. If we consider the scenario associated with the grand unified theory (GUT), it would be natural
that the Z2 parity for Qi and U
i is common and that for Hu is set to be + for the prediction of a large
top-quark mass. Because Hd always couples to D
i , we can set the Z2 parity for Hd to be + without loss
of generality. Then, there are 64 possibilities on parity assignment for quarks. However, it turns out that
most of them cannot give correct numbers of m2c/m
t and m
b . Consequently, only the following sets
of Z2 parity assignment are enough to be discussed;
• Type I-a,
((Q1, U
1), (Q2, U
2), (Q3, U
3 )) = (+,+,−) , (Dc1, Dc2, Dc3) = (+,+,−) . (13)
• Type I-b,
((Q1, U
1), (Q2, U
2), (Q3, U
3 )) = (+,+,−) , (Dc1, Dc2, Dc3) = (−,+,−) . (14)
3Even when the u- and d-quarks are massless at the electroweak scale, their finite masses would be generated at lower
energy scales due to the strong dynamics[17].
• Type II-a,
((Q1, U
1), (Q2, U
2), (Q3, U
3 )) = (+,−,+) , (Dc1, Dc2, Dc3) = (+,−,+) . (15)
• Type II-b,
((Q1, U
1), (Q2, U
2), (Q3, U
3 )) = (+,−,+) , (Dc1, Dc2, Dc3) = (−,−,+) . (16)
Type I-a gives the mass matrices as
Mu(I-a) =
6 B4λ
5 RB2λ
5 B4λ
4 Rλ2
3 Rλ2 1
αU 〈Hu〉 , Md(I-a) =
5 B4λ
4 Rλ2
4 B2λ
4 RB2λ
αD〈Hd〉 , (17)
where B2n =
k=0 R
2k. Diagonalizing above matrices, we obtain mass ratios m2c/m
t , m
, the
CKM mixing angles (absolute values of CKM matrix elements), and the Kobayashi-Maskawa phase
φ3 ≡ arg(V ∗ubVud/V ∗cbVcd) at the leading order as
= |1 +R4|2λ8 , m
|1−R4|2
(1 + |R|2)2
|Vus| =
|R|8 + |R|−8 − 2(2 cos2 4α− 1)
|Vub| =
|R|||R| − |R|−1|
(|R|+ |R|−1)
|R|4 + |R|−4 + 2 cos 4α
|Vcb| =
|R|2 + |R|−2 + 2 cos 4α
|R|+ |R|−1
λ2 , (18)
φ1 = arg
|R|4 + 1
+ (|R|2 + 1) cos 4α− i(|R|2 − 1) sin 4α
φ2 = arg
(1− |R|2)
− |R|4 + 2i sin 4α
φ3 = arg
(1− |R|2)
|R|4 −
+ (|R|2 − 1) cos 4α+ i(|R|2 + 1) sin 4α
Let us discuss appropriate values of |R|, cos 4α and sin 4α. We first expect that λ ∼ sin θc. Then, |R| > 1
is needed to obtain the reasonable value of |Vub|. However, |R| cannot be much greater than unity, because
m2c/m
t exceeds the experimentally acceptable value. In addition, cos 4α < 0 is necessary for |R| > 1 to
explain the data of |Vcb|. Finally, sin 4α < 0 is required for φ1 to be in the first quadrant. In this case,
however, it turns out that φ3 cannot be in the first quadrant simultaneously.
For numerical evaluation, we take |R| =
3/2, cos 4α = −3/4 and sin 4α = −
7/4. This parameter
set determines λ = 0.25 under the experimental value of |Vus|(= 0.22). The matrices Mu,d(I-a) are
diagonalized, and we obtain
|Vub| = 0.0028 , |Vcb| = 0.032 , (19)
which are in excellent agreement with the CKM mixing angles at the GUT scale, |Vcb| = 0.029–0.039 and
|Vub| = 0.0024–0.0038 which are evaluated from renormalization group method with the experimental data
at low energies[18]. However, quark mass ratios are predicted as
= 0.0061 ,
= 0.073 , (20)
which are about twice as large as the expected values at the GUT scale; mc/mt ∼ 0.0032 ± 0.0007 and
ms/mb = 0.036± 0.005. Finally, we find
φ1 = 12
◦ , φ2 = 31
◦ , φ3 = 137
◦ . (21)
Although the predictions cannot explain all the data simultaneously, it would be amazing to observe that
this model can reproduce most of them in a considerable extent. We also find that for Type I-b, the mixing
angles, mass ratios between 2nd- and 3rd-generations, and the CKM phase are completely the same at
leading order as in Eq. (18).
For Type II-a, the mass matrices are
Mu(II-a) =
6 RB4λ
5 B2λ
5 B4λ
4 Rλ2
3 Rλ2 1
αU 〈Hu〉 , Md(II-a) =
5 RB2λ
4 B2λ
4 B2λ
4 RB2λ
αD〈Hd〉 .
The resulting mass ratios, the CKM parameters and phases φ1, φ2 and φ3 are the same as those in Type
I except for |Vus| and |Vub|, which are
|Vus| =
|R|8 + |R|−8 − 2(2 cos2 4α− 1)
|Vub| =
|R|2||R| − |R|−1|
(|R|+ |R|−1)
|R|4 + |R|−4 + 2 cos 4α
λ3 . (23)
These expressions are different from those in Type I by the multiplication factor |R|.
In this case, we take |R| =
3/2 and cos 4α = −1/2 in order to compensate the effect of the extra factor
|R| in |Vus| in comparison with the that in Type I. In addition, we take sin 4α = −
3/2 and λ = 0.23. We
obtain the following results;
|Vus| = 0.22 , |Vub| = 0.0038 , |Vcb| = 0.035 ,
= 0.0060 ,
= 0.059 ,
φ1 = 18
◦, φ2 = 38
◦, φ3 = 123
◦. (24)
Although the size of ms/mb becomes smaller than that of the Type I, it is still too large to be phenomeno-
logically acceptable. Moreover, mc/mt remains two times greater than the expected value, and φ3 is in
the 2nd quadrant. Type II-b gives almost the same results for the CKM parameters and the mass ratios
between 2nd- and 3rd- generations.
4 Conclusion
We have studied possibility of incorporating CPV by using the FN mechanism in the context of democratic
flavor FN couplings. We have considered models with two FN fields, in which the relative phase of their
VEVs plays as the origin of CP violating phase at low energies. In the scenario with the Z2 symmetry, the
relationship among ratios between quark masses, the absolute values of CKM matrix elements and the CP
violating phase has been examined in several simplest models. We have found that the predictions have
been in good agreement with most of the data. However, the CP violating phase φ3 has been predicted to
be around 130◦, so that the models we have examined are not acceptable. It may be oversimplification to
assume the flavor blind couplings. We expect that the small modification for the democratic assumption
would cure this phenomenological problem.
We have demonstrated the way how introduce the CPV to the FN model and have shown that the
scenario with two FN fields would be promising. An application of our scenario to the lepton sector
including neutrinos is under way and will appear in our future publications. If this will be successfully
achieved, we would obtain a model which gives the unified description of CP phases for quarks and leptons.
Acknowledgements. S. K. was supported, in part, by the Grant-in-Aid of the Ministry of Education,
Culture, Sports, Science and Technology, Government of Japan, Grant Nos.17043008 and 18034004. S. P.
and T. S. were supported in part by the Italian MIUR and INFN under the programs “Fisica Astroparti-
cellare”.
References
[1] C. D. Froggatt and H. B. Nielsen, Nucl. Phys. B 147 (1979) 277.
[2] A. Pomarol and D. Tommasini, Nucl. Phys. B 466 (1996) 3; R. Barbieri, G. R. Dvali and L. J. Hall,
Phys. Lett. B 377 (1996) 76; R. Barbieri and L. J. Hall, Nuovo Cim. A 110 (1997) 1; R. Barbieri,
L. Giusti, L. J. Hall and A. Romanino, Nucl. Phys. B 550 (1999) 32; R. Barbieri, L. J. Hall, S. Raby
and A. Romanino, Nucl. Phys. B 493 (1997) 3; R. Barbieri, L. J. Hall and A. Romanino, Phys. Lett.
B 401 (1997) 47;
[3] S. Pakvasa and H. Sugawara, Phys. Lett. B 73 (1978) 61; H. Harari, H. Haut and J. Weyers, Phys.
Lett. B 78 (1978) 459; E. Derman, Phys. Rev. D 19 (1979) 317; E. Ma, Phys. Rev. D 43 (1991) 2761;
Phys. Rev. D 61 (2000) 033012; L. J. Hall and H. Murayama, Phys. Rev. Lett. 75 (1995) 3985.
[4] D. Wyler, Phys. Rev. D 19 (1979) 3369; E. Ma and G. Rajasekaran, Phys. Rev. D 64 (2001) 113012;
K. S. Babu, E. Ma and J. W. F. Valle, Phys. Lett. B 552 (2003) 207.
[5] C. Hagedorn, M. Lindner and F. Plentinger, Phys. Rev. D 74 (2006) 025007.
[6] H. Fritzsch, Nucl. Phys. B 155 (1979) 189; Y. Koide, Phys. Rev. D 28 (1983) 252; Phys. Rev. D 39
(1989) 1391; H. Fritzsch and Z. Z. Xing, Phys. Lett. B 372 (1996) 265; Phys. Lett. B 440 (1998) 313;
Phys. Rev. D 61 (2000) 073016; Phys. Lett. B 598 (2004) 237;
[7] T. D. Lee, Phys. Rept. 9 (1974) 143 and references therein.
[8] Y. Nir and R. Rattazzi, Phys. Lett. B 382 (1996) 363 .
[9] G. G. Ross, L. Velasco-Sevilla and O. Vives, Nucl. Phys. B 692 (2004) 50; N. Sahu and S. Uma
Sankar, arXiv:hep-ph/0501069. J. Ferrandis, arXiv:hep-ph/0510051.
[10] W. Grimus and H. Kuhbock, arXiv:hep-ph/0607197; arXiv:hep-ph/0612132.
[11] N. Cabibbo, Phys. Rev. Lett. 10 (1963) 531.
[12] M. Kobayashi and T. Maskawa, Prog. Theor. Phys. 49 (1973) 652.
[13] H. Harari, H. Haut and J. Weyers, Phys. Lett. B 78 (1978) 459; M. Tanimoto, Phys. Lett. B 483
(2000) 417; M. Fukugita, M. Tanimoto and T. Yanagida, Phys. Rev. D 57 (1998) 4429; M. Tanimoto,
T. Watari and T. Yanagida, Phys. Lett. B 461 (1999) 345.
[14] T. Watari and T. Yanagida, Phys. Lett. B 544 (2002) 167; A. Soddu and N. K. Tran, Phys. Rev. D
69 (2004) 015010.
[15] T. Kobayashi, H. Shirano and H. Terao, arXiv:hep-ph/0412299. Q. Shafi and Z. Tavartkiladze, Phys.
Lett. B 594 (2004) 177; T. Kobayashi, Y. Omura and H. Terao, Phys. Rev. D 74 (2006) 053005.
http://arxiv.org/abs/hep-ph/0501069
http://arxiv.org/abs/hep-ph/0510051
http://arxiv.org/abs/hep-ph/0607197
http://arxiv.org/abs/hep-ph/0612132
http://arxiv.org/abs/hep-ph/0412299
[16] See e.g. M. Leurer, Y. Nir and N. Seiberg, Nucl. Phys. B 420 (1994) 468; J. K. Elwood, N. Irges and
P. Ramond, Phys. Rev. Lett. 81 (1998) 5064; M. Bando and T. Kugo, Prog. Theor. Phys. 101 (1999)
1313; M. Bando, T. Kugo and K. Yoshioka, Prog. Theor. Phys. 104 (2000) 211.
[17] D. R. Nelson, G. T. Fleming and G. W. Kilcup, Phys. Rev. Lett. 90, 021601 (2003); C. Aubin et al.
[MILC Collaboration], Phys. Rev. D 70, 114501 (2004).
[18] H. Fusaoka and Y. Koide, Phys. Rev. D 57 (1998) 3986; C. R. Das and M. K. Parida, Eur. Phys. J.
C 20 (2001) 121.
	Introduction
	CP violation in democratic models
	Examples for the model with two Froggatt-Nielsen fields
	Conclusion
ABSTRACT
  We study how to incorporate CP violation in the Froggatt--Nielsen (FN)
mechanism. To this end, we introduce non-renormalizable interactions with a
flavor democratic structure to the fermion mass generation sector. It is found
that at least two iso-singlet scalar fields with imposed a discrete symmetry
are necessary to generate CP violation due to the appearance of the relative
phase between their vacuum expectation values.
  In the simplest model, ratios of quark masses and the
Cabibbo-Kobayashi-Maskawa (CKM) matrix including the CP violating phase are
determined by the CKM element |V_{us}| and the ratio of two vacuum expectation
values R=|R|e^{i*alpha} (a magnitude and a phase). It is demonstrated how the
angles phi_i (i=1--3) of the unitarity triangle and the CKM off-diagonal
elements |V_{ub}| and |V_{cb}| are predicted as a function of |V_{us}|, |R| and
\alpha. Although the predicted value of the CP violating phase does not agree
with the experimental data within the simplest model, the basic idea of our
scenario would be promising to construct a more realistic model of flavor and
CP violation.

<|endoftext|><|startoftext|>
Introduction 1
1.1 The adiabatic piston . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Physical motivation for the results . . . . . . . . . . . . . . . . . . . 4
2 Background Averaging Material 8
2.1 The averaging framework . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Some classical averaging results . . . . . . . . . . . . . . . . . . . . 12
2.3 A proof of Anosov’s theorem . . . . . . . . . . . . . . . . . . . . . . 20
2.4 Moral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3 Results for piston systems in one dimension 27
3.1 Statement of results . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Heuristic derivation of the averaged equation for the hard core piston 34
3.3 Proof of the main result for the hard core piston . . . . . . . . . . . 35
3.4 Proof of the main result for the soft core piston . . . . . . . . . . . 41
3.5 Appendix to Section 3.4 . . . . . . . . . . . . . . . . . . . . . . . . 50
4 The periodic oscillation of an adiabatic piston in two or three
dimensions 54
4.1 Statement of the main result . . . . . . . . . . . . . . . . . . . . . . 54
4.2 Preparatory material concerning a
two-dimensional gas container with only one gas particle on each side 58
4.3 Proof of the main result for two-dimensional gas containers with
only one gas particle on each side . . . . . . . . . . . . . . . . . . . 66
4.4 Generalization to a full proof
of Theorem 4.1.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.5 Inducing maps on subspaces . . . . . . . . . . . . . . . . . . . . . . 82
4.6 Derivative bounds for the billiard map
in three dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Bibliography 84
List of Figures
1.1 A gas container D in d = 2 dimensions separated by an adiabatic
piston. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 An effective potential. . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1 A schematic of the phase space M. Note that although the level
set Mc = {h = c} is depicted as a torus, it need not be a torus. It
could be any compact, co-dimension m submanifold. . . . . . . . . . 11
3.1 The piston system with n1 = 3 and n2 = 4. Note that the gas
particles do not interact with each other, but only with the piston
and the walls. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.1 A gas container D ⊂ R2 separated by a piston. . . . . . . . . . . . . 55
4.2 A choice of coordinates on phase space. . . . . . . . . . . . . . . . . 59
4.3 An analysis of the divergences of orbits when ε > 0 and the left gas
particle collides with the moving piston to the right of Q0. Note
that the dimensions are distorted for visual clarity, but that εL and
εL/γ are both o(γ) as ε→ 0. . . . . . . . . . . . . . . . . . . . . . 79
Chapter 1
Introduction
What can be rigorously understood about the nonequilibrium dynamics of chaotic,
many particle systems? Although much progress has been made in understanding
the infinite time behavior of such systems, our understanding on finite time scales
is still far from complete. Systems of many particles contain a large number of
degrees of freedom, and it is often impractical or impossible to keep track of their
full dynamics. However, if one is only interested in the evolution of macroscopic
quantities, then these variables form a small subset of all of the variables. The evo-
lution of these quantities does not itself form a closed dynamical system, because
it depends on events happening in all of the (very large) phase space. We must
therefore develop techniques for describing the evolution of just a few variables in
phase space. Such descriptions are valid on limited time scales because a large
amount of information about the dynamics of the full system is lost. However, the
time scales of validity can often be long enough to enable a good prediction of the
observable dynamics.
Averaging techniques help to describe the evolution of certain variables in
some physical systems, especially when the system has components that move on
different time scales. The primary results of this thesis involve applying averaging
techniques to chaotic microscopic models of gas particles separated by an adiabatic
piston for the purposes of justifying and understanding macroscopic laws.
This thesis is organized as follows. In Section 1.1 we briefly introduce the the
adiabatic piston problem and our results. In Section 1.2 we review the physical
motivations for our results. The following three chapters may each be read inde-
pendently. Chapter 2 presents an introduction to averaging theory and the proofs
of a number of averaging theorems for smooth systems that motivate our later
proofs for the piston problem. Chapter 3 contains our results for piston systems
in one dimension, and Chapter 4 contains our results for the piston system in
dimensions two and three.
1.1 The adiabatic piston
Consider the following simple model of an adiabatic piston separating two gas
containers: A massive piston of mass M ≫ 1 divides a container in Rd, d =
1, 2, or 3, into two halves. The piston has no internal degrees of freedom and can
only move along one axis of the container. On either side of the piston there are a
finite number of ideal, unit mass, point gas particles that interact with the walls
of the container and with the piston via elastic collisions. When M = ∞, the
piston remains fixed in place, and each gas particle performs billiard motion at a
constant energy in its sub-container. We make an ergodicity assumption on the
behavior of the gas particles when the piston is fixed. Then we study the motions
of the piston when the number of gas particles is fixed, the total energy of the
system is bounded, but M is very large.
Heuristically, after some time, one expects the system to approach a steady
state, where the energy of the system is equidistributed amongst the particles
and the piston. However, even if we could show that the full system is ergodic,
an abstract ergodic theorem says nothing about the time scale required to reach
such a steady state. Because the piston will move much slower than a typical gas
particle, it is natural to try to determine the intermediate behavior of the piston
by averaging techniques. By averaging over the motion of the gas particles on a
time scale chosen short enough that the piston is nearly fixed, but long enough
that the ergodic behavior of individual gas particles is observable, we will show
that the system does not approach the expected steady state on the time scale
M1/2. Instead, the piston oscillates periodically, and there is no net energy transfer
between the gas particles.
The results of this thesis follow earlier work by Neishtadt and Sinai [Sin99,
NS04]. They determined that for a wide variety of Hamiltonians for the gas par-
ticles, the averaged behavior of the piston is periodic oscillation, with the piston
moving inside an effective potential well whose shape depends on the initial po-
sition of the piston and the gas particles’ Hamiltonians. They pointed out that
an averaging theorem due to Anosov [Ano60, LM88], proved for smooth systems,
should extend to this case. The main result of the present work, Theorem 4.1.1,
is that Anosov’s theorem does extend to the particular gas particle Hamiltonian
described above. Thus, if we examine the actual motions of the piston with respect
to the slow time τ = t/M1/2, then, as M → ∞, in probability (with respect to
Liouville measure) most initial conditions give rise to orbits whose actual motion is
accurately described by the averaged behavior for 0 ≤ τ ≤ 1, i.e. for 0 ≤ t ≤M1/2.
A recent study involving some similar ideas by Chernov and Dolgopyat [CD06a]
considered the motion inside a two-dimensional domain of a single heavy, large gas
particle (a disk) of mass M ≫ 1 and a single unit mass point particle. They as-
sumed that for each fixed location of the heavy particle, the light particle moves
inside a dispersing (Sinai) billiard domain. By averaging over the strongly hy-
perbolic motions of the light particle, they showed that under an appropriate
scaling of space and time the limiting process of the heavy particle’s velocity is a
(time-inhomogeneous) Brownian motion on a time scale O(M1/2). It is not clear
whether a similar result holds for the piston problem, even for gas containers with
good hyperbolic properties, such as the Bunimovich stadium. In such a container
the motion of a gas particle when the piston is fixed is only nonuniformly hyper-
bolic because it can experience many collisions with the flat walls of the container
immediately preceding and following a collision with the piston.
The present work provides a weak law of large numbers, and it is an open
problem to describe the sizes of the deviations for the piston problem [CD06b].
Although our result does not yield concrete information on the sizes of the devia-
tions, it is general in that it imposes very few conditions on the shape of the gas
container. Most studies of billiard systems impose strict conditions on the shape
of the boundary, generally involving the sign of the curvature and how the corners
are put together. The proofs in this work require no such restrictions. In particu-
lar, the gas container can have cusps as corners and need satisfy no hyperbolicity
conditions.
If the piston divides a container in R2 or R3 with axial symmetry, such as a
rectangle or a cylinder, then our ergodicity assumption on the behavior of the gas
particles when the piston is fixed does not hold. In this case, the interactions of the
gas particles with the piston and the ends of the container are completely specified
by their motions along the normal axis of the container. Thus, this system projects
onto a system inside an interval consisting of a massive point particle, the piston,
which interacts with the gas particles on either side of it. These gas particles make
elastic collisions with the walls at the ends of the container and with the piston,
but they do not interact with each other. For such one-dimensional containers,
the effects of the gas particles are quasi-periodic and can be essentially decoupled,
and we recover a strong law of large numbers with a uniform rate, reminiscent
of classical averaging over just one fast variable in S1: The convergence of the
actual motions to the averaged behavior is uniform over all initial conditions, with
the size of the deviations being no larger than O(M−1/2) on the time scale M+1/2.
See Theorem 3.1.1. Gorelyshev and Neishtadt [GN06] independently obtained this
result.
For systems in d = 1 dimension, we also investigate the behavior of the system
when the interactions of the gas particles with the walls and the piston have been
smoothed, so that Anosov’s theorem applies directly. Let δ ≥ 0 be a parameter
of smoothing, so that δ = 0 corresponds to the hard core setting above. Then
the averaged behavior of the piston is still a periodic oscillation, which depends
smoothly on δ. We show that the deviations of the actual motions of the piston
from the averaged behavior are again not more than O(M−1/2) on the time scale
M1/2. The size of the deviations is bounded uniformly, both over initial conditions
and over the amount of smoothing, Theorem 3.1.2.
Our results for a single heavy piston separating two gas containers generalize
to the case of N heavy pistons separating N +1 gas containers. Here the averaged
behavior of the pistons has them moving like an N -dimensional particle inside an
effective potential well. Compare Section 3.1.3.
The systems under consideration in this work are simple models of an adiabatic
piston. The general adiabatic piston problem [Cal63], well-known from physics,
consists of the following: An insulating piston separates two gas containers, and
initially the piston is fixed in place, and the gas in each container is in a separate
thermal equilibrium. At some time, the piston is no longer externally constrained
and is free to move. One hopes to show that eventually the system will come to a
full thermal equilibrium, where each gas has the same pressure and temperature.
Whether the system will evolve to thermal equilibrium and the interim behavior of
the piston are mechanical problems, not adequately described by thermodynam-
ics [Gru99], that have recently generated much interest within the physics and
mathematics communities following Lieb’s address [Lie99]. One expects that the
system will evolve in at least two stages. First, the system relaxes deterministically
toward a mechanical equilibrium, where the pressures on either side of the piston
are equal. In the second, much longer, stage, the piston drifts stochastically in the
direction of the hotter gas, and the temperatures of the gases equilibrate. See for
example [GPL03, CL02, Che04] and the references therein. Previously, rigorous
results have been limited mainly to models where the effects of gas particles rec-
olliding with the piston can be neglected, either by restricting to extremely short
time scales [LSC02, CLS02] or to infinite gas containers [Che04].
1.2 Physical motivation for the results
In this section, we briefly review the physical motivations for our results on the
adiabatic piston.
Consider a massive, insulating piston of massM that separates a gas container
D in Rd, d = 1, 2, or 3. See Figure 1.1. Denote the location of the piston by Q
and its velocity by dQ/dt = V . If Q is fixed, then the piston divides D into two
subdomains, D1(Q) = D1 on the left and D2(Q) = D2 on the right. By |Di| we
denote the area (when d = 2, or length, when d = 1, or volume, when d = 3) of
Di. Define
∂ |D1(Q)|
= −∂ |D2(Q)|
so that ℓ is the piston’s cross-sectional length (when d = 2, or area, when d = 3).
If d = 1, then ℓ = 1. By Ei we denote the total energy of the gas inside Di.
D1(Q) D2(Q)
E1 E2
D = D1(Q) ⊔ D2(Q)
✲ V = εW
M = ε−2 ≫ 1
s✟✟✙ℓ
Figure 1.1: A gas container D in d = 2 dimensions separated by an adiabatic
piston.
We are interested in the dynamics of the piston when the system’s total energy
is bounded and M → ∞. When M = ∞, the piston remains fixed in place, and
each energy Ei remains constant. When M is large but finite, MV
2/2 is bounded,
and so V = O(M−1/2). It is natural to define
ε =M−1/2,
so that W is of order 1 as ε → 0. This is equivalent to scaling time by ε, and so
we introduce the slow time
τ = εt.
If we let Pi denote the pressure of the gas inside Di, then heuristically the
dynamics of the piston should be governed by the following differential equation:
= V, M
= P1ℓ− P2ℓ,
= P1ℓ− P2ℓ.
(1.1)
To find differential equations for the energies of the gases, note that in a short
amount of time dt, the change in energy should come entirely from the work done
on a gas, i.e. the force applied to the gas times the distance the piston has moved,
because the piston is adiabatic. Thus, one expects that
= −V P1ℓ,
= +V P2ℓ,
= −WP1ℓ,
= +WP2ℓ.
(1.2)
To obtain a closed system of differential equations, it is necessary to insert an
expression for the pressures. Piℓ should be the average force from the gas particles
in Di experienced by the piston when it is held fixed in place. Whether such an
expression, depending only on Ei and Di(Q), exists and is the same for (almost)
every initial condition of the gas particles depends strongly on the microscopic
model of the gas particle dynamics. Sinai and Neishtadt [Sin99, NS04] pointed
out that for many microscopic models where the pressures are well defined, the
solutions of Equations (1.1) and (1.2) have the piston moving according to a model-
dependent effective Hamiltonian.
Because the pressure of an ideal gas in d dimensions is proportional to the
energy density, with the constant of proportionality 2/d, we choose to insert
d |Di|
Later, we will make assumptions on the microscopic gas particle dynamics to justify
this substitution. However, if we accept this definition of the pressure, we obtain
the following ordinary differential equations for the four macroscopic variables of
the system:
d |D1(Q)|
− 2E2ℓ
d |D2(Q)|
− 2WE1ℓ
d |D1(Q)|
2WE2ℓ
d |D2(Q)|
. (1.3)
For these equations, one can see the effective Hamiltonian as follows. Since
d ln(Ei)
d ln(|Di(Q)|)
Ei(τ) = Ei(0)
|Di(Q(0))|
|Di(Q(τ))|
Hence
d2Q(τ)
E1(0) |D1(Q(0))|2/d
|D1(Q(τ))|1+2/d
E2(0) |D2(Q(0))|2/d
|D2(Q(τ))|1+2/d
effective potential
P1 = P2
Figure 1.2: An effective potential.
and so (Q,W ) behave as if they were the coordinates of a Hamiltonian system
describing a particle undergoing motion inside a potential well. The effective
Hamiltonian may be expressed as
W 2 +
E1(0) |D1(Q(0))|2/d
|D1(Q)|2/d
E2(0) |D2(Q(0))|2/d
|D2(Q)|2/d
. (1.4)
The question is, do the solutions of Equation (1.3) give an accurate description
of the actual motions of the macroscopic variables when M tends to infinity? The
main result of this thesis, Theorem 4.1.1, is that, for an appropriately defined
system, the answer to this question is affirmative for 0 ≤ t ≤ M1/2, at least for
most initial conditions of the microscopic variables. Observe that one should not
expect the description to be accurate on time scales much longer than O(M1/2) =
O(ε−1). The reason for this is that, presumably, there are corrections of size O(ε)
in Equation (1.3) that we are neglecting. For τ = εt > O(1), these corrections
should become significant. Such higher order corrections for the adiabatic piston
were studied by Crosignani et al. [CDPS96].
Chapter 2
Background Averaging Material
In this chapter, we present a number of well-known classical averaging results
for smooth systems, as well as a proof of Anosov’s averaging theorem, which is
the first general multi-phase averaging result. All of these theorems are at least
45 years old. However, we present them here because our proofs of the classical
results are at least slightly novel, and the ideas in them lend themselves well to
certain higher-dimensional generalizations. In particular, they are fairly close to
the ideas in the proof we give for our piston results in one dimension. The proof of
Anosov’s theorem is a new and unpublished proof due mainly to Dolgopyat, with
some further simplifications made. The ideas in this proof underly the ideas we
will use to prove the weak law of large numbers for our piston system in dimensions
two and three.
We begin by giving a discussion of a framework for general averaging theory
and some averaging results. A number of classical averaging theorems are then
proved, followed by the proof of Anosov’s theorem.
2.1 The averaging framework
In this section, consider a family of ordinary differential equations
= Z(z, ε) (2.1)
on a smooth, finite-dimensional Riemannian manifold M, which is indexed by the
real parameter ε ∈ [0, ε0]. Assume
• Regularity: the functions Z and ∂Z/∂ε are both C1 on M× [0, ε0].
We denote the flow generated by Z(·, ε) by zε(t, z) = zε(t). We will usually
suppress the dependence on the initial condition z = zε(0, z). Think of zε(·)
as being a random variable whose domain is the space of initial conditions for
the differential equation (2.1) and whose range is the space of continuous paths
(depending on the parameter t) in M.
• Existence of smooth integrals: z0(t) has m independent C2 first integrals
h = (h1, . . . , hm) : M → Rm.
Then h is conserved by z0(t), and at every point the linear operator ∂h/∂z has
full rank. It follows from the implicit function theorem that each level set
Mc := {h = c}
is a smooth submanifold of co-dimension m, which is invariant under z0(t). Fur-
ther, assume that there exists an open ball U ⊂ Rm satisfying:
• Compactness: ∀c ∈ U , Mc is compact.
• Preservation of smooth measures: ∀c ∈ U , z0(t)|Mc preserves a smooth
measure µc that varies smoothly with c, i.e. there exists a C1 function
g : M → R>0 such that g|Mc is the density of µc with respect to the
restriction of Riemannian volume.
hε(t, z) = hε(t) := h(zε(t)).
Again, think of hε(·) as being a random variable that takes initial conditions z ∈ M
to continuous paths (depending on the parameter t) in U . Since dh0/dt ≡ 0,
Hadamard’s Lemma allows us to write
= εH(zε, ε)
for some C1 function H : M× [0, ε0] → U . Observe that
(t) = Dh(zε(t))Z(zε(t), ε) = Dh(zε(t))
Z(zε(t), ε)− Z(zε(t), 0)
so that
H(z, 0) = L ∂Z
|ε=0h.
Here L denotes the Lie derivative.
Define the averaged vector field H̄ by
H̄(h) =
H(z, 0)dµh(z). (2.2)
Then H̄ is C1. Fix a compact set V ⊂ U , and introduce the slow time
τ = εt.
Let h̄(τ, z) = h̄(τ) be the random variable that is the solution of
= H̄(h̄), h̄(0) = hε(0).
We only consider the dynamics in a compact subset of phase space, so for initial
conditions z ∈ h−1U , define the stopping time
Tε(z) = Tε = inf{τ ≥ 0 : h̄(τ) /∈ V or hε(τ/ε) /∈ V}.
Heuristically, think of the phase space M as being a fiber bundle whose base
is the open set U and whose fibers are the compact sets Mh. See Figure 2.1.
Then the vector field Z(·, 0) is perpendicular to the base, so its orbits z0(t) flow
only along the fibers. Now when 0 < ε ≪ 1, the vector field Z(·, ε) acquires a
component of size O(ε) along the base, and so its orbits zε(t) have a small drift
along the base, which we can follow by observing the evolution of hε(t). Because
of this, we refer to h as consisting of the slow variables. Other variables, used
to complete h to a parameterization of (a piece of) phase space, are called fast
variables. Note that hε(t) depends on all the dimensions of phase space, and so it
is not the flow of a vector field on the m-dimensional space U . However, because
the motion along each fiber is relatively fast compared to the motion across fibers,
we hope to be able to average over the fast motions and obtain a vector field
on U that gives a good description of hε(t) over a relatively long time interval,
independent of where the solution zε(t) started on Mhε(0). Because our averaged
vector field, as defined by Equation (2.2), only accounts for deviations of size O(ε),
we cannot expect this time interval to be longer than size O(1/ε). In terms of
the slow time τ = εt, this length becomes O(1). In other words, the goal of the
first-order averaging method described above should be to show that, in some
sense, sup0≤τ≤1∧Tε
∣hε(τ/ε)− h̄(τ)
∣→ 0 as ε → 0. This is often referred to as the
averaging principle.
Note that the assumptions of regularity, existence of smooth integrals, com-
pactness, and preservation of smooth measures above are not sufficient for the
averaging principle to hold in any form. As an example of just one possible
obstruction, the level sets Mc could separate into two completely disjoint sets,
Mc = M+c ⊔ M−c . If this were the case, then it would be implausible that the
solutions of the averaged vector field defined by averaging over all of Mc would
accurately describe hε(t, z), independent of whether z ∈ M+c or z ∈ M−c .
Some averaging results
So far, we are in a general averaging setting. Frequently, one also assumes that the
invariant submanifolds, Mh, are tori, and that there exists a choice of coordinates
z = (h, ϕ)
M = {(h, ϕ)}
✲ h ∈ U ⊂ Rm
“slow variables”
ϕ =“fast variables”
✄✗Z(·, 0) Z(·, ε)
Figure 2.1: A schematic of the phase space M. Note that although the level set
Mc = {h = c} is depicted as a torus, it need not be a torus. It could be any
compact, co-dimension m submanifold.
on M in which the differential equation (2.1) takes the form
= εH(h, ϕ, ε),
= Φ(h, ϕ, ε).
Then if ϕ ∈ S1 and the differential equation for the fast variable is regular,
i.e. Φ(h, ϕ, 0) is bounded away from zero for h ∈ U ,
initial conditions
s.t. hε(0)∈V
0≤τ≤1∧Tε
∣hε(τ/ε)− h̄(τ)
∣ = O(ε) as ε → 0.
See for example Chapter 5 in [SV85], Chapter 3 in [LM88], or Theorem 2.2.3 in
the following section.
When the differential equation for the fast variable is not regular, or when
there is more than one fast variable, the typical averaging result becomes much
weaker than the uniform convergence above. For example, consider the case when
ϕ ∈ Tn, n > 1, and the unperturbed motion is quasi-periodic, i.e. Φ(h, ϕ, 0) =
Ω(h). Also assume that H ∈ Cn+2 and that Ω is nonvanishing and satisfies a
nondegeneracy condition on U (for example, Ω : U → Tn is a submersion). Let P
denote Riemannian volume on M. Neishtadt [LM88, Nei76] showed that in this
situation, for each fixed δ > 0,
0≤τ≤1∧Tε
∣hε(τ/ε)− h̄(τ)
∣ ≥ δ
ε/δ),
and that this result is optimal. Thus, the averaged equation only describes the
actual motions of the slow variables in probability on the time scale 1/ε as ε→ 0.
Neishtadt’s result was motivated by a general averaging theorem for smooth
systems due to Anosov. This theorem requires none of the additional assumptions
in the averaging results above. Under the conditions of regularity, existence of
smooth integrals, compactness, and preservation of smooth measures, as well as
• Ergodicity: for Lebesgue almost every c ∈ U , (z0(·), µc) is ergodic,
Anosov showed that sup0≤τ≤1∧Tε
∣hε(τ/ε)− h̄(τ)
∣ → 0 in probability (w.r.t. Rie-
mannian volume on initial conditions) as ε→ 0, i.e.
Theorem 2.1.1 (Anosov’s averaging theorem [Ano60]). For each T > 0 and for
each fixed δ > 0,
0≤τ≤T∧Tε
∣hε(τ/ε)− h̄(τ)
∣ ≥ δ
as ε→ 0.
We present a recent proof of this theorem in Section 2.3 below.
If we consider hε(·) and h̄(·) to be random variables, Anosov’s theorem is a
version of the weak law of large numbers. In general, we can do no better: There
is no general strong law in this setting. There exists a simple example due to
Neishtadt (which comes from the equations for the motion of a pendulum with
linear drag being driven by a constant torque) where for no initial condition in a
positive measure set do we have convergence of hε(t) to h̄(εt) on the time scale 1/ε
as ε → 0 [Kif04b]. Here, the phase space is R× S1, and the unperturbed motion
is (uniquely) ergodic on all but one fiber.
2.2 Some classical averaging results
In this section we present some simple, well-known averaging results. See for
example Chapter 5 in [SV85] or Chapter 3 in [LM88].
2.2.1 Averaging for time-periodic vector fields
Consider a family of time dependent ordinary differential equations
= εH(h, t, ε), (2.3)
indexed by the real parameter ε ≥ 0, where h ∈ Rm. Fix V ⊂⊂ U ⊂ Rm, and
suppose
• Regularity: H ∈ C1(U × R× [0,∞)).
• Periodicity: There exists T > 0 such that for each h ∈ U , H(h, t, 0) is
T -periodic in time.
= εH(h, t, 0) +O(ε2).
Let hε(t) denote the solution of Equation (2.3). We seek a time independent
vector field whose solutions approximate hε(t), at least for a long length of time.
It is natural to define the averaged vector field H̄ by
H̄(h) =
H(h, s, 0)ds.
Then H̄ ∈ C1(U). Let h̄(τ) be the solution of
= H̄(h̄), h̄(0) = hε(0).
It is reasonable to hope that h̄(εt) and hε(t) are close together for 0 ≤ t ≤ ε−1.
We only consider the dynamics in a compact subset of phase space, so for initial
conditions in U , we define the stopping time
Tε = inf{τ ≥ 0 : h̄(τ) /∈ V or hε(τ/ε) /∈ V}.
Theorem 2.2.1 (Time-periodic averaging). For each T > 0,
hε(0)∈V
0≤τ≤T∧Tε
∣hε(τ/ε)− h̄(τ)
∣ = O(ε) as ε→ 0.
Proof. We divide our proof into three essential steps.
Step 1: Reduction using Gronwall’s Inequality. Now, h̄(τ) satisfies the
integral equation
h̄(τ)− h̄(0) =
H̄(h̄(σ))dσ,
while hε(τ/ε) satisfies
hε(τ/ε)− hε(0) = ε
∫ τ/ε
H(hε(s), s, ε)ds
= O(ε) + ε
∫ τ/ε
H(hε(s), s, 0)ds
= O(ε) + ε
∫ τ/ε
H(hε(s), s, 0)− H̄(hε(s))ds+
H̄(hε(σ/ε))dσ
for 0 ≤ τ ≤ T ∧ Tε.
Define
eε(τ) = ε
∫ τ/ε
H(hε(s), s, 0)− H̄(hε(s))ds.
It follows from Gronwall’s Inequality that
0≤τ≤T∧Tε
∣h̄(τ)− hε(τ/ε)
O(ε) + sup
0≤τ≤T∧Tε
|eε(τ)|
eLip(H̄|V)T .
Step 2: A sequence of times adapted for ergodization. Ergodization refers
to the convergence along an orbit of a function’s time average to its space average.
We define a sequence of times tk for k ≥ 0 by tk = kT . This sequence of times is
motivated by the fact that
tk+1 − tk
∫ tk+1
H(h0(s), s, 0)ds = H̄(h0).
Note that h0(t) is independent of time. Thus,
0≤τ≤T∧Tε
|eε(τ)| ≤ O(ε) + ε
tk+1≤T∧Tεε
∫ tk+1
H(hε(s), s, 0)− H̄(hε(s))ds
. (2.4)
Step 3: Control of individual terms by comparison with solutions of the
ε = 0 equation. The sum in Equation (2.4) has no more than O(1/ε) terms,
and so it suffices to show that each term
∫ tk+1
H(hε(s), s, 0) − H̄(hε(s))ds is no
larger than O(ε). We can accomplish this by comparing the motions of hε(t) for
tk ≤ t ≤ tk+1 with hk,ε(t), which is defined to be the solution of the ε = 0 ordinary
differential equation satisfying hk,ε(tk) = hε(tk), i.e. hk,ε(t) ≡ hε(tk).
Lemma 2.2.2. If tk+1 ≤ T∧Tεε , then suptk≤t≤tk+1 |hk,ε(t)− hε(t)| = O(ε).
Proof. dhε/dt = O(ε).
Using that H and H̄ are Lipschitz continuous, we conclude that
∫ tk+1
H(hε(s), s, 0)− H̄(hε(s))ds
∫ tk+1
H(hε(s), s, 0)−H(hk,ε(s), s, 0)ds
∫ tk+1
H(hk,ε(s), s, 0)− H̄(hk,ε(s))ds
∫ tk+1
H̄(hk,ε(s))− H̄(hε(s))ds
=O(ε) + 0 +O(ε)
=O(ε).
Thus we see that sup0≤τ≤T∧Tε
∣hε(τ/ε)− h̄(τ)
∣ ≤ O(ε), independent of the
initial condition hε(0) ∈ V.
Remark 2.2.1. Note that the O(ε) control in Theorem 2.2.1 on a time scale t =
O(ε−1) is generally optimal. For example, take H(h, t, ε) = cos(t) + ε.
2.2.2 Averaging for vector fields with one regular fast vari-
For h ∈ Rm and ϕ ∈ S1 = [0, 1]/0 ∼ 1, consider the family of ordinary differential
equations
= εH(h, ϕ, ε),
= Φ(h, ϕ, ε), (2.5)
indexed by the real parameter ε ≥ 0. With z = (h, ϕ), we write this family of
differential equations as dz/dt = Z(z, ε).
Fix V ⊂⊂ U ⊂ Rm, and suppose
• Regularity: Z ∈ C1(U × S1 × [0,∞)).
• Regular fast variable: Φ(h, ϕ, 0) is bounded away from 0 for h ∈ U , i.e.
(h,ϕ)∈U×S1
|Φ(h, ϕ, 0)| > 0.
Without loss of generality, we assume that Φ(h, ϕ, 0) > 0.
Let zε(t) = (hε(t), ϕε(t)) denote the solution of Equation (2.5). Then z0(t)
leaves invariant the circles Mc = {h = c} in phase space. In fact, z0(t) preserves
an uniquely ergodic invariant probability measure on Mc, whose density is given
dµc =
Φ(c, ϕ, 0)
where Kc =
Φ(c,ϕ,0)
is a normalization constant.
The averaged vector field H̄ is defined by averaging H(h, ϕ, 0) over ϕ:
H̄(h) =
H(h, ϕ, 0)dµh(ϕ) =
H(h, ϕ, 0)
Φ(h, ϕ, 0)
Then H̄ ∈ C1(U). Let h̄(τ) be the solution of
= H̄(h̄), h̄(0) = hε(0).
For initial conditions in U × S1, we have the usual stopping time Tε = inf{τ ≥ 0 :
h̄(τ) /∈ V or hε(τ/ε) /∈ V}.
Theorem 2.2.3 (Averaging over one regular fast variable). For each T > 0,
initial conditions
s.t. hε(0)∈V
0≤τ≤T∧Tε
∣hε(τ/ε)− h̄(τ)
∣ = O(ε) as ε→ 0.
Remark 2.2.2. This result encompasses Theorem 2.2.1 for time-periodic averaging.
For example, if T = 1, simply take ϕ = t mod 1 and Φ(h, ϕ, ε) = 1.
Remark 2.2.3. Many of the proofs of the above theorem of which we are aware
hinge on considering ϕ as a time-like variable. For example, one could write
H(h, ϕ, 0)
Φ(h, ϕ, 0)
+O(ε2),
and this looks very similar to the time-periodic situation considered previously.
However, it does take some work to justify such arguments rigorously, and the
traditional proofs do not easily generalize to averaging over multiple fast variables.
Our proof essentially uses ϕ to mark off time, and it will immediately generalize
to a specific instance of multiphase averaging.
Proof. Again, we have three steps.
Step 1: Reduction using Gronwall’s Inequality. Now
h̄(τ)− h̄(0) =
H̄(h̄(σ))dσ,
hε(τ/ε)− hε(0) = ε
∫ τ/ε
H(zε(s), ε)ds = O(ε) + ε
∫ τ/ε
H(zε(s), 0)ds
= O(ε) + ε
∫ τ/ε
H(zε(s), 0)− H̄(hε(s))ds+
H̄(hε(σ/ε))dσ
for 0 ≤ τ ≤ T ∧ Tε.
Define
eε(τ) = ε
∫ τ/ε
H(zε(s), 0)− H̄(hε(s))ds.
It follows from Gronwall’s Inequality that
0≤τ≤T∧Tε
∣h̄(τ)− hε(τ/ε)
O(ε) + sup
0≤τ≤T∧Tε
|eε(τ)|
eLip(H̄|V)T .
Step 2: A sequence of times adapted for ergodization. Now for each
initial condition in our phase space and for each fixed ε, we define a sequence of
times tk,ε and a sequence of solutions zk,ε(t) inductively as follows: t0,ε = 0 and
z0,ε(t) = z0(t). For k > 0, tk,ε = inf{t > tk−1,ε : ϕk−1,ε(t) = ϕε(0)}, and zk,ε(t) is
defined as the solution of
dzk,ε
= Z(zk,ε, 0) = (0,Φ(zk,ε, 0)), zk,ε(tk,ε) = zε(tk,ε).
This sequence of times is motivated by the fact that
tk+1,ε − tk,ε
∫ tk+1,ε
tk ,ε
H(zk,ε(s), 0)ds = H̄(hk,ε).
Recall that hk,ε(t) is independent of time. The elements of this sequence of times
are approximately uniformly spaced, i.e. if we fix ω > 0 such that z ∈ V × S1 ⇒
1/ω < Φ(z, 0) < ω, then if tk+1,ε ≤ (T ∧ Tε)/ε, 1/ω < tk+1,ε − tk,ε < ω.
Thus,
0≤τ≤T∧Tε
|eε(τ)| ≤ O(ε) + ε
tk+1,ε≤T∧Tεε
∫ tk+1,ε
H(zε(s), 0)− H̄(hε(s))ds
where the sum in in this equation has no more than O(1/ε) terms.
Step 3: Control of individual terms by comparison with solutions along
fibers. It suffices to show that each term
∫ tk+1,ε
H(zε(s), 0) − H̄(hε(s))ds is no
larger than O(ε). We can accomplish this by comparing the motions of zε(t) for
tk,ε ≤ t ≤ tk+1,ε with zk,ε(t).
Lemma 2.2.4. If tk+1,ε ≤ T∧Tεε , then suptk,ε≤t≤tk+1,ε |zk,ε(t)− zε(t)| = O(ε).
Proof. Without loss of generality, we take k = 0, so that zk,ε(t) = z0(t). Since
h0(t) = hε(0) and dhε/dt = O(ε), supt0,ε≤t≤t1,ε |h0(t)− hε(t)| = O(ε).
Now ϕε(t) − ϕε(0) =
Φ(hε(s), ϕε(s), ε)ds, and because Φ is Lipschitz, we
find that
|ϕε(t)− ϕ0(t)| ≤ O(ε) + Lip (Φ)
|ϕε(s)− ϕ0(s)| ds
for 0 ≤ t ≤ ω. The result follows from Gronwall’s Inequality.
Using that H and H̄ are Lipschitz continuous, we conclude that
∫ tk+1,ε
H(zε(s), 0)− H̄(hε(s))ds
∫ tk+1,ε
H(zε(s), 0)−H(zk,ε(s), 0)ds
∫ tk+1,ε
H(zk,ε(s), 0)− H̄(hk,ε(s))ds
∫ tk+1,ε
H̄(hk,ε(s))− H̄(hε(s))ds
=O(ε) + 0 +O(ε)
=O(ε).
Thus we see that sup0≤τ≤T∧Tε
∣hε(τ/ε)− h̄(τ)
∣ = O(ε), independent of the
initial condition (hε(0), ϕε(0)) ∈ V × S1.
2.2.3 Multiphase averaging for vector fields with separa-
ble, regular fast variables
As explained in Section 2.1, when the differential equation for the fast variable is
not regular, or when there is more than one fast variable, the typical averaging
result becomes much weaker than the uniform convergence in Theorems 2.2.1
and 2.2.3 above. Nonetheless, if the differential equations under consideration
satisfy some very specific hypotheses, the proof in the previous section immediately
generalizes to yield uniform convergence.
For h ∈ Rm and ϕ = (ϕ1, · · · , ϕn) ∈ Tn = ([0, 1]/0 ∼ 1)n, consider the family
of ordinary differential equations
= εH(h, ϕ, ε),
= Φ(h, ϕ, ε), (2.6)
indexed by the real parameter ε ≥ 0. We also write z = (h, ϕ) and dz/dt = Z(z, ε).
Fix V ⊂⊂ U ⊂ Rm, and suppose
• Regularity: Z ∈ C1(U × Tn × [0,∞)).
• Separable fast variables: H(h, ϕ, 0) and Φ(h, ϕ, 0) have the following specific
forms:
– There exist C1 functionsHj(h, ϕj) such thatH(h, ϕ, 0) =
j=1Hj(h, ϕ
This can be thought of as saying that, to first order in ε, each fast vari-
able affects the slow variables independently of the other fast variables.
– The components Φj of Φ satisfy Φj(h, ϕ, 0) = Φj(h, ϕj, 0), i.e. the un-
perturbed motion has each fast variable moving independently of the
other fast variables. Note that this assumption is satisfied if the unper-
turbed motion is quasi-periodic, i.e. Φ(h, ϕ, 0) = Ω(h).
• Regular fast variables: For each j,
(h,ϕj)∈U×S1
∣Φj(h, ϕj, 0)
∣ > 0.
Let zε(t) = (hε(t), ϕε(t)) denote the solution of Equation (2.6). Then z0(t)
leaves invariant the tori Mc = {h = c} in phase space. In fact, z0(t) preserves a
(not necessarily ergodic) invariant probability measure on Mc, whose density is
given by
dµc =
|Φj(c, ϕj, 0)| ,
where Kjc =
|Φj(c,ϕj ,0)| .
The averaged vector field H̄ is defined by
H̄(h) =
H(h, ϕ, 0)dµh(ϕ) =
Hj(h, ϕ
j)dµh(ϕ)
Hj(h, ϕ
|Φj(h, ϕj, 0)|dϕ
H̄j(h).
Let h̄(τ) be the solution of
= H̄(h̄), h̄(0) = hε(0),
and the stopping time Tε = inf{τ ≥ 0 : h̄(τ) /∈ V or hε(τ/ε) /∈ V}.
Theorem 2.2.5 (Averaging over multiple separable, regular fast variables). For
each T > 0,
initial conditions
s.t. hε(0)∈V
0≤τ≤T∧Tε
∣hε(τ/ε)− h̄(τ)
∣ = O(ε) as ε→ 0.
Proof. The proof is essentially the same as the proof of Theorem 2.2.3. As before,
we need only show that sup0≤τ≤T∧Tε |eε(τ)| = O(ε), where
eε(τ) = ε
∫ τ/ε
H(zε(s), 0)− H̄(hε(s))ds.
But by our separability assumptions, it suffices to show that for each j,
0≤τ≤T∧Tε
|ej,ε(τ)| = O(ε),
where ej,ε(τ) is defined by
ej,ε(τ) = ε
∫ τ/ε
Hj(hε(s), ϕ
ε(s))− H̄j(hε(s))ds.
Thus, we have effectively separated the effects of each fast variable, and now the
proof can be completed by essentially following steps 2 and 3 in the proof of
Theorem 2.2.3.
2.3 A proof of Anosov’s theorem
Anosov’s original proof of Theorem 2.1.1 from 1960 may be found in [Ano60]. An
exposition of the theorem and Anosov’s proof in English may be found in [LM88].
Recently, Kifer [Kif04a] proved necessary and sufficient conditions for the averaging
principle to hold in an averaged with respect to initial conditions sense. He also
showed explicitly that his conditions are met in the setting of Anosov’s theorem.
The proof of Anosov’s theorem given here is mainly due to Dolgopyat [Dol05],
although some further simplifications have been made.
Proof of Anosov’s theorem. We begin by showing that without loss of generality
we may take Tε = ∞. This is just for convenience, and not an essential part of
the proof. To accomplish this, let ψ(h) be a smooth bump function satisfying
• ψ(h) = 1 if h ∈ V,
• ψ(h) > 0 if h ∈ interior(Ṽ),
• ψ(h) = 0 if h /∈ Ṽ,
where Ṽ is a compact set chosen such that V ⊂⊂ interior(Ṽ) ⊂⊂ U . Next,
set Z̃(z, ε) = ψ(h(z))Z(z, ε). Because the bump function was chosen to depend
only on the slow variables, our assumption about preservation of measures is still
satisfied; on each fiber, Z̃(z, 0) is a scaler multiple of Z(z, 0). Furthermore, the
flow of Z̃(·, 0)|Mh is ergodic for almost every h ∈ Ṽ. Then it would suffice to prove
our theorem for the vector fields Z̃(z, ε) with the set Ṽ replacing V. We assume
that this reduction has been made, although we will not use it until Step 5 below.
Step 1: Reduction using Gronwall’s Inequality. Observe that h̄(τ) satisfies
the integral equation
h̄(τ)− h̄(0) =
H̄(h̄(σ))dσ,
while hε(τ/ε) satisfies
hε(τ/ε)− hε(0) = ε
∫ τ/ε
H(zε(s), ε)ds
= O(ε) + ε
∫ τ/ε
H(zε(s), 0)ds
= O(ε) + ε
∫ τ/ε
H(zε(s), 0)− H̄(hε(s))ds+
H̄(hε(σ/ε))dσ
for 0 ≤ τ ≤ T ∧ Tε. Here we have used the fact that h−1V × [0, ε0] is compact to
achieve uniformity over all initial conditions in the size of the O(ε) term above. We
use this fact repeatedly in what follows. In particular, H , H̄, and Z are uniformly
bounded and have uniform Lipschitz constants on the domains of interest.
Define
eε(τ) = ε
∫ τ/ε
H(zε(s), 0)− H̄(hε(s))ds.
It follows from Gronwall’s Inequality that
0≤τ≤T∧Tε
∣h̄(τ)− hε(τ/ε)
O(ε) + sup
0≤τ≤T∧Tε
|eε(τ)|
eLip(H̄ |V)T . (2.7)
Step 2: Introduction of a time scale for ergodization. Choose a real-valued
function L(ε) such that L(ε) → ∞, L(ε) = o(log ε−1) as ε → 0. Think of L(ε) as
being a time scale which grows as ε→ 0 so that ergodization, i.e. the convergence
along an orbit of a function’s time average to a space average, can take place.
However, L(ε) doesn’t grow too fast, so that on this time scale zε(t) essentially
stays on one fiber, where we have our ergodicity assumption. Set tk,ε = kL(ε), so
0≤τ≤T∧Tε
|eε(τ)| ≤ O(εL(ε)) + ε
εL(ε)
∫ tk+1,ε
H(zε(s), 0)− H̄(hε(s))ds
. (2.8)
Step 3: A splitting for using the triangle inequality. Now we let zk,ε(s)
be the solution of
dzk,ε
= Z(zk,ε, 0), zk,ε(tk,ε) = zε(tk,ε).
Set hk,ε(t) = h(zk,ε(t)). Observe that hk,ε(t) is independent of t. We break up the
integral
∫ tk+1,ε
H(zε(s), 0)− H̄(hε(s))ds into three parts:
∫ tk+1,ε
H(zε(s), 0)− H̄(hε(s))ds
∫ tk+1,ε
H(zε(s), 0)−H(zk,ε(s), 0)ds
∫ tk+1,ε
H(zk,ε(s), 0)− H̄(hk,ε(s))ds
∫ tk+1,ε
H̄(hk,ε(s))ds− H̄(hε(s))ds
:=Ik,ε + IIk,ε + IIIk,ε.
The term IIk,ε represents an “ergodicity term” that can be controlled by our
assumptions on the ergodicity of the flow z0(t), while the terms Ik,ε and IIIk,ε
represent “continuity terms” that can be controlled using the following control on
the drift from solutions along fibers.
Step 4: Control of drift from solutions along fibers.
Lemma 2.3.1. If 0 < tk+1,ε ≤ T∧Tεε ,
tk,ε≤t≤tk+1,ε
|zk,ε(t)− zε(t)| ≤ O(εL(ε)eLip(Z)L(ε))
Proof. Without loss of generality we may set k = 0, so that zk,ε(t) = z0(t). Then
for 0 ≤ t ≤ L(ε),
|z0(t)− zε(t)| =
Z(z0(s), 0)− Z(zε(s), ε)ds
≤ Lip (Z)
|ε|+ |z0(s)− zε(s)| ds
= O(εL(ε)) + Lip (Z)
|z0(s)− zε(s)| ds.
The result follows from Gronwall’s Inequality.
From Lemma 2.3.1 we find that Ik,ε, IIIk,ε = O(εL(ε)2eLip(Z)L(ε)).
Step 5: Use of ergodicity along fibers to control IIk,ε. From Equations
(2.7) and (2.8) and the triangle inequality, we already know that
0≤τ≤T∧Tε
∣h̄(τ)− hε(τ/ε)
≤ O(ε) +O(εL(ε)) + ε T
εL(ε)
O(εL(ε)2eLip(Z)L(ε)) +O
εL(ε)
|IIk,ε|
= O(εL(ε)eLip(Z)L(ε)) +O
εL(ε)
|IIk,ε|
(2.9)
Fix δ > 0. Recalling that Tε = ∞, it suffices to show that
εL(ε)
|IIk,ε| ≥ δ
as ε→ 0.
For initial conditions z ∈ M and for 0 ≤ k ≤ T
εL(ε)
define
Bk,ε =
|IIk,ε| >
Bz,ε = {k : z ∈ Bk,ε} .
Think of these sets as describing “bad ergodization.” For example, roughly speak-
ing, z ∈ Bk,ε if the orbit zε(t) starting at z spends the time between tk,ε and tk+1,ε
in a region of phase space where the function H(·, 0) is “poorly ergodized” on the
time scale L(ε) by the flow z0(t) (as measured by the parameter δ/2T ). As IIk,ε
is clearly never larger than O(L(ε)), it follows that
εL(ε)
|IIk,ε| ≤
+O(εL(ε)#(Bz,ε)).
Therefore it suffices to show that
#(Bz,ε) ≥
const εL(ε)
as ε→ 0. By Chebyshev’s Inequality, we need only show that
E(εL(ε)#(Bz,ε)) = εL(ε)
εL(ε)
P (Bk,ε)
tends to 0 with ε.
In order to estimate the size of P (Bk,ε), it is convenient to introduce a new
measure P f that is uniformly equivalent to the restriction of Riemannian volume
P to h−1V. Here the f stands for “factor,” and P f is defined by
dP f = dh · dµh,
where dh represents integration with respect to the uniform measure on V.
Observe that B0,ε = zε(tk,ε)Bk,ε. In words, the initial conditions giving rise
to orbits that are “bad” on the time interval [tk,ε, tk+1,ε], moved forward by time
tk,ε, are precisely the initial conditions giving rise to orbits that are “bad” on the
time interval [t0,ε, t1,ε]. Because the flow z0(·) preserves the measure P f , we expect
P f(B0,ε) and P f(Bk,ε) to have roughly the same size. This is made precise by the
following lemma.
Lemma 2.3.2. There exists a constant K such that for each Borel set B ⊂ M
and each t ∈ [−T/ε, T/ε], P f(zε(t)B) ≤ eKTP f(B).
Proof. Assume that P f(B) > 0, and set γ(t) = ln
P f(zε(t)B)/P
. Then
γ(0) = 0, and
(t) =
zε(t)B
f̃(z)dz
zε(t)B
f̃(z)dz
zε(t)B
divP fZ(z, ε)dz
zε(t)B
f̃(z)dz
where f̃ > 0 is the C1 density of P f with respect to Riemannian volume on
h−1V, dz represents integration with respect to that volume, and divP fZ(z, ε) =
divzf̃(z)Z(z, ε). Because z0(t) preserves P
f , divP fZ(z, 0) ≡ 0. By Hadamard’s
Lemma, it follows that divP f Z(z, ε) = O(ε) on the compact set h−1V. Hence
dγ(t)/dt = O(ε), and the result follows.
Returning to our proof of Anosov’s theorem, it suffices to show that
P f(B0,ε) =
dh · µh
∫ L(ε)
H(z0(s), 0)− H̄(h0(0))ds
tends to 0 with ε. By our ergodicity assumption, for almost every h,
∫ L(ε)
H(z0(s), 0)− H̄(h0(0))ds
→ 0 as ε→ 0.
Finally, an application of the Bounded Convergence Theorem finishes the proof.
2.4 Moral
From the proofs of the theorems in this chapter, it should be apparent that there
are at least two key steps necessary for proving a version of the averaging principle
in the setting presented in Section 2.1.
The first step is estimating the continuity between the ε = 0 and the ε > 0
solutions of
= Z(z, ε).
In particular, on some relatively long timescale L = L(ε) ≪ ε−1, we need to show
0≤t≤L
|z0(t)− zε(t)| → 0
as ε → 0. As long as L is sub-logarithmic in ε−1, such estimates for smooth
systems can be made using Gronwall’s Inequality.
The second step is estimating the rate of ergodization of H(·, 0) by z0(t), i.e. es-
timating how fast
H(z0(s), 0) ds→ H̄(h0)
(generally as L → ∞). Note that the estimates in this step compete with those
in the first step in that, if L is small we obtain better continuity, but if L is large
we usually obtain better ergodization. Also, we do not need the full force of the
assumption of ergodicity of (z0(t), µh) on the fibers Mh. We only need z0(t) to
ergodize the specific function H(·, 0). Compare the proof of Theorem 2.2.5.
Note that in the setting of Anosov’s theorem, uniform ergodization leads to
uniform convergence in the averaging principle. Returning to the proof of Theo-
rem 2.1.1 above, suppose that
∫ L(ε)
H(z0(s), 0)ds→ H̄(h0)
uniformly over all initial conditions as L(ε) → ∞. Then for all ε sufficiently small
and each k, Bk,ε = ∅, and hence for all ε sufficiently small and each z, #(Bz,ε) = 0.
From Equation (2.9), it follows that sup0≤τ≤T∧Tε
∣h̄(τ)− hε(τ/ε)
∣ → 0 as ε → 0,
uniformly over all initial conditions z ∈ h−1V. However, uniform convergence in
Birkhoff’s Ergodic Theorem is extremely rare and usually comes about because of
unique ergodicity, so it is unreasonable to expect this sort of uniform convergence
in most situations where Anosov’s theorem applies.
Chapter 3
Results for piston systems in one
dimension
In this chapter, we present our results for piston systems in one dimension. These
results may also be found in [Wri06].
3.1 Statement of results
3.1.1 The hard core piston problem
Consider the system of n1 + n2 + 1 point particles moving inside the unit interval
indicated in Figure 3.1. One distinguished particle, the piston, has position Q and
mass M . To the left of the piston there are n1 > 0 particles with positions q1,j
and masses m1,j, 1 ≤ j ≤ n1, and to the right there are n2 > 0 particles with
positions q2,j and masses m2,j, 1 ≤ j ≤ n2. These gas particles do not interact
with each other, but they interact with the piston and with walls located at the
end points of the unit interval via elastic collisions. We denote the velocities by
dQ/dt = V and dxi,j/dt = vi,j. There is a standard method for transforming
this system into a billiard system consisting of a point particle moving inside an
(n1 + n2 + 1)-dimensional polytope [CM06a], but we will not use this in what
follows.
We are interested in the dynamics of this system when the numbers and masses
of the gas particles are fixed, the total energy is bounded, and the mass of the
piston tends to infinity. When M = ∞, the piston remains at rest, and each
gas particle performs periodic motion. More interesting are the motions of the
system when M is very large but finite. Because the total energy of the system is
bounded, MV 2/2 ≤ const, and so V = O(M−1/2). Set
ε =M−1/2,
t t t t t t t
Figure 3.1: The piston system with n1 = 3 and n2 = 4. Note that the gas particles
do not interact with each other, but only with the piston and the walls.
and let
so that
with W = O(1).
When ε = 0, the system has n1 + n2 + 2 independent first integrals (conserved
quantities), which we take to be Q, W , and si,j = |vi,j |, the speeds of the gas
particles. We refer to these variables as the slow variables because they should
change slowly with time when ε is small, and we denote them by
h = (Q,W, s1,1, s1,2, · · · , s1,n1, s2,1, s2,2, · · · , s2,n2) ∈ Rn1+n2+2.
We will often abbreviate by writing h = (Q,W, s1,j, s2,j). Let hε(t, z) = hε(t)
denote the dynamics of these variables in time for a fixed value of ε, where z
represents the dependence on the initial condition in phase space. We usually
suppress the initial condition in our notation. Think of hε(·) as a random variable
which, given an initial condition in the 2(n1 + n2 + 1)-dimensional phase space,
produces a piecewise continuous path in Rn1+n2+2. These paths are the projection
of the actual motions in our phase space onto a lower dimensional space. The goal
of averaging is to find a vector field on Rn1+n2+2 whose orbits approximate hε(t).
The averaged equation
Sinai [Sin99] derived
= H̄(h) :=
j=1 m1,js
j=1 m2,js
−s1,jW
s2,jW
(3.1)
as the averaged equation (with respect to the slow time τ = εt) for the slow
variables. We provide a heuristic derivation in Section 3.2. Sinai solved this
equation as follows: From
d ln(s1,j)
= −d ln(Q)
s1,j(τ) = s1,j(0)Q(0)/Q(τ). Similarly, s2,j(τ) = s2,j(0)(1−Q(0))/(1−Q(τ)). Hence
j=1m1,js1,j(0)
2Q(0)2
j=1m2,js2,j(0)
2(1−Q(0))2
(1−Q)3 ,
and so (Q,W ) behave as if they were the coordinates of a Hamiltonian system
describing a particle undergoing periodic motion inside a potential well. If we let
s2i,j
be the kinetic energy of the gas particles on one side of the piston, the effective
Hamiltonian may be expressed as
W 2 +
E1(0)Q(0)
E2(0)(1−Q(0))2
(1−Q)2 . (3.2)
Hence, the solutions to the averaged equation are periodic for all initial conditions
under consideration.
Main result in the hard core setting
The solutions of the averaged equation approximate the motions of the slow vari-
ables, hε(t), on a time scale O(1/ε) as ε → 0. Precisely, let h̄(τ, z) = h̄(τ) be the
solution of
= H̄(h̄), h̄(0) = hε(0).
Again, think of h̄(·) as being a random variable that takes an initial condition in
our phase space and produces a path in Rn1+n2+2.
Next, fix a compact set V ⊂ Rn1+n2+2 such that h ∈ V ⇒ Q ⊂⊂ (0, 1),W ⊂⊂
R, and si,j ⊂⊂ (0,∞) for each i and j.1 For the remainder of this discussion we
will restrict our attention to the dynamics of the system while the slow variables
remain in the set V. To this end, we define the stopping time
Tε(z) = Tε := inf{τ ≥ 0 : h̄(τ) /∈ V or hε(τ/ε) /∈ V}.
Theorem 3.1.1. For each T > 0,
initial conditions
s.t. hε(0)∈V
0≤τ≤T∧Tε
∣hε(τ/ε)− h̄(τ)
∣ = O(ε) as ε =M−1/2 → 0.
This result was independently obtained by Gorelyshev and Neishtadt [GN06].
Note that the stopping time does not unduly restrict the result. Given any
c such that h = c ⇒ Q ∈ (0, 1), si,j ∈ (0,∞), then by an appropriate choice of
the compact set V we may ensure that, for all ε sufficiently small and all initial
conditions in our phase space with hε(0) = c, Tε ≥ T . We do this by choosing
V ∋ c such that the distance between ∂V and the periodic orbit h̄(τ) with h̄(0) = c
is positive. Call this distance d. Then Tε can only occur before T if hε(τ/ε) has
deviated by at least d from h̄(τ) for some τ ∈ [0, T ). Since the size of the deviations
tends to zero uniformly with ε, this is impossible for all small ε.
3.1.2 The soft core piston problem
In this section, we consider the same system of one piston and gas particles inside
the unit interval considered in Section 3.1.1, but now the interactions of the gas
particles with the walls and with the piston are smooth. Let κ : R → R be a C2
function satisfying
• κ(x) = 0 if x ≥ 1,
• κ′(x) < 0 if x < 1.
Let δ > 0 be a parameter of smoothing, and set
κδ(x) = κ(x/δ).
1 We have introduced this notation for convenience. For example, h ∈ V ⇒ Q ⊂⊂ (0, 1)
means that there exists a compact set A ⊂ (0, 1) such that h ∈ V ⇒ Q ∈ A, and similarly for
the other variables.
Then consider the Hamiltonian system obtained by having the gas particles inter-
act with the piston and the walls via the potential
κδ(q1,j) + κδ(Q− q1,j) +
κδ(q2,j −Q) + κδ(1− q2,j).
As before, we set ε =M−1/2 and W = V/ε. If we let
E1,j =
m1,jv
1,j + κδ(q1,j) + κδ(Q− q1,j), 1 ≤ j ≤ n1,
E2,j =
m2,jv
2,j + κδ(q2,j −Q) + κδ(1− q2,j), 1 ≤ j ≤ n2,
(3.3)
then Ei,j may be thought of as the energy associated with a gas particle, and
W 2/2 +
j=1E1,j +
j=1E2,j is the conserved energy.
When ε = 0, the Hamiltonian system admits n1 + n2 + 2 independent first
integrals, which we choose this time as h = (Q,W,E1,j , E2,j). While discussing
the soft core dynamics we use the energies Ei,j rather than the variables si,j =
2Ei,j/mi,j, which we used for the hard core dynamics, for convenience.
For comparison with the hard core results, we formally consider the dynamics
described by setting δ = 0 to be the hard core dynamics described in Section 3.1.1.
This is reasonable because we will only consider gas particle energies below the
barrier height κ(0). Then for any ε, δ ≥ 0, hδε(t) denotes the actual time evolution
of the slow variables. While discussing the soft core dynamics we often use δ as a
superscript to specify the dynamics for a certain value of δ. We usually suppress
the dependence on δ, unless it is needed for clarity.
Main result in the soft core setting
We have already seen that when δ = 0, there is an appropriate averaged vector
field H̄0 whose solutions approximate the actual motions of the slow variables,
h0ε(t). We will show that when δ > 0, there is also an appropriate averaged vector
field H̄δ whose solutions still approximate the actual motions of the slow variables,
hδε(t). We delay the derivation of H̄
δ until Section 3.4.1.
Fix a compact set V ⊂ Rn1+n2+2 such that h ∈ V ⇒ Q ⊂⊂ (0, 1),W ⊂⊂ R,
and Ei,j ⊂⊂ (0, κ(0)) for each i and j. For each ε, δ ≥ 0 we define the functions
h̄δ(·) and T δε on our phase space by letting h̄δ(τ) be the solution of
= H̄δ(h̄δ), h̄δ(0) = hδε(0), (3.4)
T δε = inf{τ ≥ 0 : h̄δ(τ) /∈ V or hδε(τ/ε) /∈ V}.
Theorem 3.1.2. There exists δ0 > 0 such that the averaged vector field H̄
δ(h) is
C1 on the domain {(δ, h) : 0 ≤ δ ≤ δ0, h ∈ V}. Furthermore, for each T > 0,
0≤δ≤δ0
initial conditions
s.t. hδε(0)∈V
0≤τ≤T∧T δε
∣hδε(τ/ε)− h̄δ(τ)
∣ = O(ε) as ε =M−1/2 → 0.
As in Section 3.1.1, for any fixed c there exists a suitable choice of the compact
set V such that for all sufficiently small ε and δ, T δε ≥ T whenever hδε(0) = c.
As we will see, for each fixed δ > 0, Anosov’s theorem 2.1.1 applies to the soft
core system and yields a weak law of large numbers, and Theorem 2.2.5 applies and
yields a strong law of large numbers with a uniform rate of convergence. However,
neither of these theorems yields the uniformity over δ in the result above.
3.1.3 Applications and generalizations
Relationship between the hard core and the soft core piston
It is not a priori clear that we can compare the motions of the slow variables on
the time scale 1/ε for δ > 0 versus δ = 0, i.e. compare the motions of the soft core
piston with the motions of the hard core piston on a relatively long time scale. It
is impossible to compare the motions of the fast-moving gas particles on this time
scale as ε → 0. As we see in Section 3.4, the frequency with which a gas particle
hits the piston changes by an amount O(δ) when we smooth the interaction. Thus,
on the time scale 1/ε, the number of collisions is altered by roughly O(δ/ε), and
this number diverges if δ is held fixed while ε → 0.
Similarly, one might expect that it is impossible to compare the motions of the
soft and hard core pistons as ε → 0 without letting δ → 0 with ε. However, from
Gronwall’s Inequality it follows that if h̄δ(0) = h̄0(0), then
0≤τ≤T∧T δε ∧T 0ε
∣h̄δ(τ)− h̄0(τ)
∣ = O(δ).
From the triangle inequality and Theorems 3.1.1 and 3.1.2 we obtain the following
corollary, which allows us to compare the motions of the hard core and the soft
core piston.
Corollary 3.1.3. As ε =M−1/2, δ → 0,
initial conditions
s.t. hδε(0)=c=h
0≤t≤(T∧T δε ∧T 0ε )/ε
∣hδε(t)− h0ε(t)
∣ = O(ε) +O(δ).
This shows that, provided the slow variables have the same initial conditions,
0≤t≤1/ε
∣hδε(t)− h0ε(t)
∣ = O(ε) +O(δ).
Thus the motions of the slow variables converge on the time scale 1/ε as ε, δ → 0,
and it is immaterial in which order we let these parameters tend to zero.
The adiabatic piston problem
We comment on what Theorem 3.1.1 says about the adiabatic piston problem. The
initial conditions of the adiabatic piston problem require thatW (0) = 0. Although
our system is so simple that a proper thermodynamical pressure is not defined,
we can define the pressure of a gas to be the average force received from the gas
particles by the piston when it is held fixed, i.e. P1 =
j=1 2m1,js1,j
= 2E1/Q
and P2 = 2E2/(1 − Q). Then if P1(0) > P2(0), the initial condition for our
averaged equation (3.1) has the motion of the piston starting at the left turning
point of a periodic orbit determined by the effective potential well. Up to errors not
much bigger than M−1/2, we see the piston oscillate periodically on the time scale
M1/2. If P1(0) < P2(0), the motion of the piston starts at a right turning point.
However, if P1(0) = P2(0), then the motion of the piston starts at the bottom of
the effective potential well. In this case of mechanical equilibrium, h̄(τ) = h̄(0),
and we conclude that, up to errors not much bigger thanM−1/2, we see no motion
of the piston on the time scale M1/2. A much longer time scale is required to see
if the temperatures equilibrate.
Generalizations
A simple generalization of Theorem 3.1.1, proved by similar techniques, follows.
The system consists of N − 1 pistons, that is, heavy point particles, located inside
the unit interval at positions Q1 < Q2 < . . . < QN−1. Walls are located at Q0 ≡ 0
and QN ≡ 1, and the piston at position Qi has mass Mi. Then the pistons divide
the unit interval into N chambers. Inside the ith chamber, there are ni ≥ 1 gas
particles whose locations and masses will be denoted by xi,j and mi,j , respectively,
where 1 ≤ j ≤ ni. All of the particles are point particles, and the gas particles
interact with the pistons and with the walls via elastic collisions. However, the
gas particles do not directly interact with each other. We scale the piston masses
as Mi = M̂i/ε
2 with M̂i constant, define Wi by dQi/dt = εWi, and let Ei be
the kinetic energy of the gas particles in the ith chamber. Then we can find an
appropriate averaged equation whose solutions have the pistons moving like an
(N − 1)-dimensional particle inside a potential well with an effective Hamiltonian
Ei(0)(Qi(0)−Qi−1(0))2
(Qi −Qi−1)2
If we write the slow variables as h = (Qi,Wi, |vi,j|) and fix a compact set V such
that h ∈ V ⇒ Qi+1 − Qi ⊂⊂ (0, 1),Wi ⊂⊂ R, and |vi,j| ⊂⊂ (0,∞), then the
convergence of the actual motions of the slow variables to the averaged solutions
is exactly the same as the convergence given in Theorem 3.1.1.
Remark 3.1.1. The inverse quadratic potential between adjacent pistons in the
effective Hamiltonian above is also referred to as the Calogero-Moser-Sutherland
potential. It has also been observed as the effective potential created between two
adjacent tagged particles in a one-dimensional Rayleigh gas by the insertion of one
very light particle inbetween the tagged particles [BTT07].
3.2 Heuristic derivation of the averaged equa-
tion for the hard core piston
We present here a heuristic derivation of Sinai’s averaged equation (3.1) that is
found in [Dol05].
First, we examine interparticle collisions when ε > 0. When a particle on the
left, say the one at position q1,j , collides with the piston, s1,j andW instantaneously
change according to the laws of elastic collisions:
v+1,j
m1,j +M
m1,j −M 2M
2m1,j M −m1,j
v−1,j
. (3.5)
If the speed of the left gas particle is bounded away from zero, and W = M1/2V
is also bounded, it follows that for all ε sufficiently small, any collision will have
v−1,j > 0 and v
1,j < 0. In this case, when we translate Equation (3.5) into our new
coordinates, we find that
s+1,j
1 + ε2m1,j
1− ε2m1,j −2ε
2εm1,j 1− ε2m1,j
s−1,j
, (3.6)
so that
∆s1,j = s
1,j − s−1,j = −2εW− +O(ε2),
∆W =W+ −W− = +2εm1,js−1,j +O(ε2).
The situation is analogous when particles on the right collide with the piston.
For all ε sufficiently small, s2,j and W instantaneously change by
∆W = W+ −W− = −2εm2,js−2,j +O(ε2),
∆s2,j = s
2,j − s−2,j = +2εW− +O(ε2).
We defer discussing the rare events in which multiple gas particles collide with the
piston simultaneously, although we will see that they can be handled appropriately.
Let ∆t be a length of time long enough such that the piston experiences many
collisions with the gas particles, but short enough such that the slow variables
change very little, in this time interval. From each collision with the particle at
position q1,j , W changes by an amount +2εm1,js1,j + O(ε2), and the frequency
of these collisions is approximately
. Arguing similarly for collisions with the
other particles, we guess that
2m1,js1,j
2m2,js2,j
2(1−Q) +O(ε
Note that not only does the position of the piston change slowly in time, but its
velocity also changes slowly, i.e. the piston has inertia. With τ = εt as the slow
time, a reasonable guess for the averaged equation for W is
m1,js
m2,js
1−Q .
Similar arguments for the other slow variables lead to the averaged equation (3.1).
3.3 Proof of the main result for the hard core
piston
3.3.1 Proof of Theorem 3.1.1 with only one gas particle on
each side
We specialize to the case when there is only one gas particle on either side of
the piston, i.e. we assume that n1 = n2 = 1. We then denote x1,1 by q1, m2,2
by m2, etc. This allows the proof’s major ideas to be clearly expressed, without
substantially limiting their applicability. At the end of this section, we outline the
simple generalizations needed to make the proof apply in the general case.
A choice of coordinates on the phase space for a three particle system
As part of our proof, we choose a set of coordinates on our six-dimensional phase
space such that, in these coordinates, the ε = 0 dynamics are smooth. Complete
the slow variables h = (Q,W, s1, s2) to a full set of coordinates by adding the
coordinates ϕi ∈ [0, 1]/ 0 ∼ 1 = S1, i = 1, 2, defined as follows:
ϕ1 = ϕ1(q1, v1, Q) =
if v1 > 0
1− q1
if v1 < 0
ϕ2 = ϕ2(q2, v2, Q) =
2(1−Q) if v2 < 0
1− 1−q2
2(1−Q) if v2 > 0
When ε = 0, these coordinates are simply the angle variable portion of action-
angle coordinates for an integrable Hamiltonian system. They are defined such
that collisions occur between the piston and the gas particles precisely when ϕ1
or ϕ2 = 1/2. Then z = (h, ϕ1, ϕ2) represents a choice of coordinates on our phase
space, which is homeomorphic to (a subset of R4) × T2. We abuse notation and
also let h(z) represent the projection onto the first four coordinates of z.
Now we describe the dynamics of our system in these coordinates. When
ϕ1, ϕ2 6= 1/2,
ϕ1 if 0 ≤ ϕ1 < 1/2
(1− ϕ1) if 1/2 < ϕ1 ≤ 1
2(1−Q) +
1−Qϕ2 if 0 ≤ ϕ2 < 1/2
2(1−Q) −
1−Q(1− ϕ2) if 1/2 < ϕ2 ≤ 1
Hence between interparticle collisions, the dynamics are smooth and are described
= εW,
+O(ε),
2(1−Q) +O(ε).
(3.7)
When ϕ1 reaches 1/2, while ϕ2 6= 1/2, the coordinates Q, s2, ϕ1, and ϕ2 are
instantaneously unchanged, while s1 andW instantaneously jump, as described by
Equation (3.6). As an aside, it is curious that s+1 +εW
+ = s−1 −εW−, so that dϕ1/dt
is continuous as ϕ1 crosses 1/2. However, the collision induces discontinuous jumps
of size O(ε2) in dQ/dt and dϕ2/dt. Denote the linear transformation in Equation
(3.6) with j = 1 by A1,ε. Then
A1,ε =
1 −2ε
2εm1 1
+O(ε2).
The situation is analogous when ϕ2 reaches 1/2, while ϕ1 6= 1/2. Then W and
s2 are instantaneously transformed by a linear transformation
A2,ε =
1 −2εm2
+O(ε2).
We also account for the possibility of all three particles colliding simultane-
ously. There is no completely satisfactory way to do this, as the dynamics have an
essential singularity near {ϕ1 = ϕ2 = 1/2}. Furthermore, such three particle colli-
sions occur with probability zero with respect to the invariant measure discussed
below. However, the two 3× 3 matrices
A1,ε 0
0 A2,ε
have a commutator of size O(ε2). We will see that this small of an error will make
no difference to us as ε → 0, and so when ϕ1 = ϕ2 = 1/2, we pretend that the
left particle collides with the piston instantaneously before the right particle does.
Precisely, we transform the variables s1, W, and s2 by
0 A2,ε
A1,ε 0
We find that
∆s1 = s
1 − s−1 = −2εW− +O(ε2),
∆W = W+ −W− = +2εm1s−1 − 2εm2s−2 +O(ε2),
∆s2 = s
2 − s−2 = +2εW− +O(ε2).
The above rules define a flow on the phase space, which we denote by zε(t).
We denote its components by Qε(t), Wε(t), s1,ε(t), etc. When ε > 0, the flow is
not continuous, and for definiteness we take zε(t) to be left continuous in t.
Because our system comes from a Hamiltonian system, it preserves Liouville
measure. In our coordinates, this measure has a density proportional to Q(1 −
Q). That this measure is preserved also follows from the fact that the ordinary
differential equation (3.7) preserves this measure, and the matrices A1,ε, A2,ε have
determinant 1. Also note that the set {ϕ1 = ϕ2 = 1/2} has co-dimension two,
and so
t zε(t){ϕ1 = ϕ2 = 1/2} has co-dimension one, which shows that only a
measure zero set of initial conditions will give rise to three particle collisions.
Argument for uniform convergence
Step 1: Reduction using Gronwall’s Inequality. Define H(z) by
H(z) =
2m1s1δϕ1=1/2 − 2m2s2δϕ2=1/2
−2Wδϕ1=1/2
2Wδϕ2=1/2
Here we make use of Dirac delta functions. All integrals involving these delta
functions may be replaced by sums. We explicitly deal with any ambiguities arising
from collisions occurring at the limits of integration.
Lemma 3.3.1. For 0 ≤ t ≤ T∧Tε
hε(t)− hε(0) = ε
H(zε(s))ds+O(ε),
where any ambiguity about changes due to collisions occurring precisely at times 0
and t is absorbed in the O(ε) term.
Proof. There are four components to verify. The first component requires that
Qε(t)−Qε(0) = ε
Wε(s)ds+O(ε). This is trivially true because Qε(t)−Qε(0) =
Wε(s)ds.
The second component states that
Wε(t)−Wε(0) = ε
2m1s1,ε(s)δϕ1,ε(s)=1/2−2m2s2,ε(s)δϕ2,ε(s)=1/2ds+O(ε). (3.8)
Let rk and qj be the times in (0, t) such that ϕ1,ε(rk) = 1/2 and ϕ2,ε(qj) = 1/2,
respectively. Then
Wε(t)−Wε(0) =
∆Wε(rk) +
∆Wε(qj) +O(ε).
Observe that there exists ω > 0 such that for all sufficiently small ε and all h ∈ V,
1/ω < dϕi
< ω. Thus the number of collisions in a time interval grows no faster
than linearly in the length of that time interval. Because t ≤ T/ε, it follows that
Wε(t)−Wε(0) = ε
2m1s1,ε(rk)− ε
2m2s2,ε(qj) +O(ε),
and Equation (3.8) is verified. Note that because V is compact, there is uniformity
over all initial conditions in the size of the O(ε) terms above. The third and fourth
components are handled similarly.
Next, h̄(τ) satisfies the integral equation
h̄(τ)− h̄(0) =
H̄(h̄(σ))dσ,
while hε(τ/ε) satisfies
hε(τ/ε)− hε(0) = O(ε) + ε
∫ τ/ε
H(zε(s))ds
= O(ε) + ε
∫ τ/ε
H(zε(s))− H̄(hε(s))ds+
H̄(hε(σ/ε))dσ
for 0 ≤ τ ≤ T ∧ Tε.
Define
eε(τ) = ε
∫ τ/ε
H(zε(s))− H̄(hε(s))ds.
It follows from Gronwall’s Inequality that
0≤τ≤T∧Tε
∣h̄(τ)− hε(τ/ε)
O(ε) + sup
0≤τ≤T∧Tε
|eε(τ)|
eLip(H̄ |V)T . (3.9)
Gronwall’s Inequality is usually stated for continuous paths, but the standard
proof (found in [SV85]) still works for paths that are merely integrable, and
∣h̄(τ)− hε(τ/ε)
∣ is piecewise smooth.
Step 2: A splitting according to particles. Now
H(z)− H̄(h) =
2m1s1δϕ1=1/2 −m1s21/Q
−2Wδϕ1=1/2 + s1W/Q
−2m2s2δϕ2=1/2 +m2s22/(1−Q)
2Wδϕ2=1/2 − s2W/(1−Q)
and so, in order to show that sup0≤τ≤T∧Tε |eε(τ)| = O(ε), it suffices to show that
0≤τ≤T∧Tε
∫ τ/ε
s1,ε(s)δϕ1,ε(s)=1/2 −
s1,ε(s)
2Qε(s)
= O(1),
0≤τ≤T∧Tε
∫ τ/ε
Wε(s)δϕ1,ε(s)=1/2 −
Wε(s)s1,ε(s)
2Qε(s)
= O(1),
as well as two analogous claims about terms involving ϕ2,ε. Thus we have effec-
tively separated the effects of the different gas particles, so that we can deal with
each particle separately. We will only show that
0≤τ≤T∧Tε
∫ τ/ε
s1,ε(s)δϕ1,ε(s)=1/2 −
s1,ε(s)
2Qε(s)
= O(1).
The other three terms can be handled similarly.
Step 3: A sequence of times adapted for ergodization. Ergodization refers
to the convergence along an orbit of a function’s time average to its space average.
For example, because of the splitting according to particles above, one can easily
check that 1
H(z0(s))ds = H̄(h0) + O(1/t), even when z0(·) restricted to the
invariant tori Mh0 is not ergodic. In this step, for each initial condition zε(0) in
our phase space, we define a sequence of times tk,ε inductively as follows: t0,ε =
inf{t ≥ 0 : ϕ1,ε(t) = 0}, tk+1,ε = inf{t > tk,ε : ϕ1,ε(t) = 0}. This sequence is
chosen because δϕ1,0(s)=1/2 is “ergodizd” as time passes from tk,0 to tk+1,0. If ε is
sufficiently small and tk+1,ε ≤ (T ∧ Tε)/ε, then the spacings between these times
are uniformly of order 1, i.e. 1/ω < tk+1,ε − tk,ε < ω. Thus,
0≤τ≤T∧Tε
∫ τ/ε
s1,ε(s)δϕ1,ε(s)=1/2 −
s1,ε(s)
2Qε(s)
≤ O(1) +
tk+1,ε≤T∧Tεε
∫ tk+1,ε
s1,ε(s)δϕ1,ε(s)=1/2 −
s1,ε(s)
2Qε(s)
(3.10)
Step 4: Control of individual terms by comparison with solutions along
fibers. The sum in Equation (3.10) has no more than O(1/ε) terms, and so it
suffices to show that each term is no larger than O(ε). We can accomplish this by
comparing the motions of zε(t) for tk,ε ≤ t ≤ tk+1,ε with the solution of the ε = 0
version of Equation (3.7) that, at time tk,ε, is located at zε(tk,ε). Since each term
in the sum has the same form, without loss of generality we will only examine the
first term and suppose that t0,ε = 0, i.e. that ϕ1,ε(0) = 0.
Lemma 3.3.2. If t1,ε ≤ T∧Tεε , then sup0≤t≤t1,ε |z0(t)− zε(t)| = O(ε).
Proof. To check that sup0≤t≤t1,ε |h0(t)− hε(t)| = O(ε), first note that h0(t) =
h0(0) = hε(0). Then dQε/dt = O(ε), so that Q0(t)−Qε(t) = O(εt). Furthermore,
the other slow variables change by O(ε) at collisions, while the number of collisions
in the time interval [0, t1,ε] is O(1).
It remains to show that sup0≤t≤t1,ε |ϕi,0(t)− ϕi,ε(t)| = O(ε). Using what we
know about the divergence of the slow variables,
ϕ1,0(t)− ϕ1,ε(t) =
s1,0(s)
2Q0(s)
− s1,ε(s)
2Qε(s)
+O(ε)ds =
O(ε)ds = O(ε)
for 0 ≤ t ≤ t1,ε. Showing that sup0≤t≤t1,ε |ϕ2,0(t)− ϕ2,ε(t)| = O(ε) is similar.
From Lemma 3.3.2, t1,ε = t1,0 +O(ε) = 2Q0/s1,0 +O(ε). We conclude that
∫ t1,ε
s1,ε(s)δϕ1,ε(s)=1/2 −
s1,ε(s)
2Qε(s)
ds = O(ε) +
∫ t1,ε
s1,0(s)δϕ1,ε(s)=1/2 −
s1,0(s)
2Q0(s)
= O(ε) + s1,0 − t1,ε
s21,0
= O(ε).
It follows that sup0≤τ≤T∧Tε
∣hε(τ/ε)− h̄(τ)
∣ = O(ε), independent of the initial
condition in h−1V.
3.3.2 Extension to multiple gas particles
When n1, n2 > 1, only minor modifications are necessary to generalize the proof
above. We start by extending the slow variables h to a full set of coordinates on
phase space by defining the angle variables ϕi,j ∈ [0, 1]/ 0 ∼ 1 = S1 for 1 ≤ i ≤ 2,
1 ≤ j ≤ ni:
ϕ1,j = ϕ1,j(q1,j , v1,j, Q) =
if v1,j > 0
1− q1,j
if v1,j < 0
ϕ2,j = ϕ2,j(q2,j , v2,j, Q) =
1−q2,j
2(1−Q) if v2,j < 0
1− 1−q2,j
2(1−Q) if v2,j > 0
Then dϕ1,j/dt = s1,j(2Q)
−1 + O(ε), dϕ2,j/dt = s2,j(2(1 − Q))−1 + O(ε), and
z = (h, ϕ1,j , ϕ2,j) represents a choice of coordinates on our phase space, which
is homeomorphic to (a subset of Rn1+n2+2) × Tn1+n2 . In these coordinates, the
dynamical system yields a discontinuous flow zε(t) on phase space. The flow
preserves Liouville measure, which in our coordinates has a density proportional
to Qn1(1 − Q)n2 . As is Section 3.3.1, one can show that the measure of initial
conditions leading to multiple particle collisions is zero.
Next, define H(z) by
H(z) =
j=1 2m1,js1,jδϕ1,j=1/2 −
j=1 2m2s2,jδϕ2,j=1/2
−2Wδϕ1,j=1/2
2Wδϕ2,j=1/2
For 0 ≤ t ≤ T∧Tε
, hε(t) − hε(0) = ε
H(zε(s))ds + O(ε). From here, the rest of
the proof follows the same arguments made in Section 3.3.1.
3.4 Proof of the main result for the soft core
piston
For the remainder of this chapter, we consider the family of Hamiltonian systems
introduced in Section 3.1.2, which are parameterized by ε, δ ≥ 0. For simplicity,
we specialize to n1 = n2 = 1. As in Section 3.3, the generalization to n1, n2 > 1
is not difficult. The Hamiltonian dynamics are given by the following ordinary
differential equation:
= εW,
= ε (−κ′δ(Q− x1) + κ′δ(x2 −Q)) ,
= v1,
−κ′δ(x1) + κ′δ(Q− x1)
= v2,
−κ′δ(x2 −Q) + κ′δ(1− x2)
(3.11)
Recalling the particle energies defined by Equation (3.3), we find that
= εWκ′δ(Q− x1),
= −εWκ′δ(x2 −Q).
For the compact set V introduced in Section 3.1.2, fix a small positive number
E and an open set U ⊂ R4 such that V ⊂ U and h ∈ U ⇒ Q ∈ (E , 1−E),W ⊂⊂ R,
and E < E1, E2 < κ(0)− E . We only consider the dynamics for 0 < δ < E/2 and
h ∈ U .
Define
U1(q1) = U1(q1, Q, δ) := κδ(q1) + κδ(Q− q1),
U2(q2) = U2(q2, Q, δ) := κδ(q2 −Q) + κδ(1− q2).
Then the energies Ei satisfy Ei = miv
i /2 + Ui(xi).
Let T1 = T1(Q,E1, δ) and T2 = T2(Q,E2, δ) denote the periods of the motions
of the left and right gas particles, respectively, when ε = 0.
Lemma 3.4.1. For i = 1, 2,
Ti ∈ C1{(Q,Ei, δ) : Q ∈ (E , 1− E), Ei ∈ (E , κ(0)− E), 0 ≤ δ < E/2}.
Furthermore,
T1(Q,E1, δ) =
Q +O(δ),
T2(Q,E2, δ) =
(1−Q) +O(δ).
The proof of this lemma is mostly computational, and so we delay it until
Section 3.5. Note especially that the periods can be suitably defined such that
their regularity extends to δ = 0.
In this section, and in Section 3.5 below, we adopt the following convention
on the use of the O notation. All use of the O notation will explicitly contain the
dependence on ε and δ as ε, δ → 0. For example, if a function f(h, ε, δ) = O(ε),
then there exists δ′, ε′ > 0 such that sup0<ε≤ε′, 0<δ≤δ′, h∈V |f(h, ε, δ)/ε| <∞.
When ε = 0,
(Ei − Ui(xi)).
Define a = a(Ei, δ) by
κδ(a) = κ(a/δ) = Ei,
so that a(E1, δ) is a turning point for the left gas particle. Then a = δκ
−1(Ei),
where κ−1 is defined as follows: κ : [0, 1] → [0, κ(0)] takes 0 to κ(0) and 1 to 0.
Furthermore, κ ∈ C2([0, 1]), κ′ ≤ 0, and κ′(x) < 0 if x < 1. By monotonicity,
κ−1 : [0, κ(0)] → [0, 1] exists and takes 0 to 1 and κ(0) to 0. Also, by the Implicit
Function Theorem, κ−1 ∈ C2((0, κ(0)]), (κ−1)′(y) < 0 for y > 0, and (κ−1)′(y) →
−∞ as y → 0+. Because we only consider energies Ei ∈ (E , κ(0)− E), it follows
that a(Ei, δ) is a C2 function for the domains of interest.
3.4.1 Derivation of the averaged equation
As we previously pointed out, for each fixed δ > 0, Anosov’s theorem 2.1.1 and
Theorem 2.2.5 apply directly to the family of ordinary differential equations in
Equation (3.11), provided that δ is sufficiently small. The invariant fibers Mh of
the ε = 0 flow are tori described by a fixed value of the four slow variables and
{(Q,W, q1, v1, q2, v2) : E1 = m1v21/2 + U1(q1, Q, δ), E2 = m2v22/2 + U2(q2, Q, δ)}. If
we use (q1, q2) as local coordinates on Mh, which is valid except when v1 or v2 = 0,
the invariant measure µh of the unperturbed flow has the density
dq1dq2
(E1 − U1(q1)) T2
(E2 − U2(q2))
The restricted flow is ergodic for almost every h. See Corollary 3.5.1 in Section
−κ′δ(Q− q1) + κ′δ(q2 −Q)
Wκ′δ(Q− q1)
−Wκ′δ(q2 −Q)
κ′δ(Q− q1)dµh =
∫ Q−a
κ′δ(Q− q1)
(E1 − U1(q1))
∫ Q−a
κ′δ(Q− q1)
E1 − κδ(Q− q1)
E1 − u
8m1E1
Similarly,
κ′δ(q2 −Q)dµh = −
8m2E2
It follows that the averaged vector field is
H̄δ(h) =
8m1E1
8m2E2
8m1E1
8m2E2
where from Lemma 3.4.1 we see that H̄ ·(·) ∈ C1({(δ, h) : 0 ≤ δ < E/2, h ∈ V}).
H̄0(h) agrees with the averaged vector field for the hard core system from Equation
(3.1), once we account for the change of coordinates Ei = mis
i /2.
Remark 3.4.1. An argument due to Neishtadt and Sinai [NS04] shows that the
solutions to the averaged equation (3.4) are periodic. This argument also shows
that, as in the case δ = 0, the limiting dynamics of (Q,W ) are effectively Hamil-
tonian, with the shape of the Hamiltonian depending on δ, Q(0), and the initial
energies of the gas particles. The argument depends heavily on the observation
that the phase integrals
Ii(Q,Ei, δ) =
miv2+Ui(x,Q,δ)≤Ei
are adiabatic invariants, i.e. they are integrals of the solutions to the averaged
equation. Thus the four-dimensional phase space of the averaged equation is foli-
ated by invariant two-dimensional submanifolds, and one can think of the effective
Hamiltonians for the piston as living on these submanifolds.
3.4.2 Proof of Theorem 3.1.2
The following arguments are motivated by our proof in Section 3.3, although the
details are more involved as we show that the rate of convergence is independent
of all small δ.
A choice of coordinates on phase space
We wish to describe the dynamics in a coordinate system inspired by the one used
in Section 3.3.1. For each fixed δ ∈ (0, δ0], this change of coordinates will be C1
in all variables on the domain of interest. However, it is an exercise in analysis to
show this, and so we delay the proofs of the following two lemmas until Section
We introduce the angular coordinates ϕi ∈ [0, 1]/ 0 ∼ 1 = S1 defined by
ϕ1 = ϕ1(q1, v1, Q) =
0 if q1 = a
E1−U1(s)ds if v1 > 0
1/2 if q1 = Q− a
E1−U1(s)ds if v1 < 0
ϕ2 = ϕ2(q2, v2, Q) =
0 if q2 = 1− a
∫ 1−a
E2−U2(s)ds if v2 < 0
1/2 if q2 = Q+ a
∫ 1−a
E2−U2(s)ds if v2 > 0
. (3.12)
Then z = (h, ϕ1, ϕ2) is a choice of coordinates on h
−1U . As before, we will abuse
notation and let h(z) denote the projection onto the first four coordinates of z.
There is a fixed value of δ0 in the statement of Theorem 3.1.2. However, for
the purposes of our proof, it will be convenient to progressively choose δ0 smaller
when needed. At the end of the proof, we will have only shrunk δ0 a finite number
of times, and this final value will satisfies the requirements of the theorem. Our
first requirement on δ0 is that it is smaller than E/2.
Lemma 3.4.2. If δ0 > 0 is sufficiently small, then for each δ ∈ (0, δ0] the ordinary
differential equation (3.11) in the coordinates z takes the form
= Zδ(z, ε), (3.13)
where Zδ ∈ C1(h−1U × [0,∞)). When z ∈ h−1U ,
Zδ(z, ε) =
−κ′δ(Q− q1(z)) + κ′δ(q2(z)−Q)
εWκ′δ(Q− q1(z))
−εWκ′δ(q2(z)−Q)
+O(ε)
+O(ε)
. (3.14)
Recall that, by our conventions, the O(ε) terms in Equation (3.14) have a
size that can be bounded independent of all δ sufficiently small. Denote the flow
determined by Zδ(·, ε) by zδε(t), and its components by Qδε(t), W δε (t), Eδ1,ε(t), etc.
Also, set hδε(t) = h(z
ε(t)). From Equation (3.14),
Hδ(z, ε) :=
−κ′δ(Q− q1(z)) + κ′δ(q2(z)−Q)
Wκ′δ(Q− q1(z))
−Wκ′δ(q2(z)−Q)
. (3.15)
In particular, Hδ(z, ε) = Hδ(z, 0).
Before proceeding, we need one final technical lemma.
Lemma 3.4.3. If δ0 > 0 is chosen sufficiently small, there exists a constant K
such that for all δ ∈ (0, δ0], κ′δ(|Q− xi(z)|) = 0 unless ϕi ∈ [1/2−Kδ, 1/2 +Kδ].
Argument for uniform convergence
We start by proving the following lemma, which essentially says that an orbit
zδε(t) only spends a fraction O(δ) of its time in a region of phase space where
∣Hδ(zδε(t), ε)
∣Hδ(zδε(t), 0)
∣ is of size O(δ−1)
Lemma 3.4.4. For 0 ≤ T ′ ≤ T ≤ T∧T
∣Hδ(zδε(s), 0)
∣ ds = O(1 ∨ (T − T ′)).
Proof. Without loss of generality, T ′ = 0. From Lemmas 3.4.1 and 3.4.2 it follows
that if we choose δ0 sufficiently small, then there exists ω > 0 such that for all
sufficiently small ε and all δ ∈ (0, δ0], h ∈ V ⇒ 1/ω <
< ω. Define the set
B = [1/2 − Kδ, 1/2 + Kδ], where K comes from Lemma 3.4.3. Then we find a
crude bound on
Qδε(s)− q1(zδε(s))
∣ ds using that
dϕδ1,ε
≥ 1/ω if ϕδ1,ε ∈ B
≤ ω if ϕδ1,ε ∈ Bc.
This yields
Qδε(s)− q1(zδε(s))
∣ ds ≤ const
1ϕδ1,ε(s)∈Bds
≤ const
2Kωδ + 1−2Kδ
T + 2Kωδ
= O(1 ∨ T ).
Similarly,
∣κ′δ(q2(z
ε(s))−Qδε(s))
∣ ds = O(1∨ T ), and so
∣Hδ(zδε(s), 0)
∣ ds =
O(1 ∨ T ).
We now follow steps one through four from Section 3.3.1, making modifications
where necessary.
Step 1: Reduction using Gronwall’s Inequality. Now hδε(τ/ε) satisfies
hδε(τ/ε)− hδε(0) = ε
∫ τ/ε
Hδ(zδε(s), 0)ds.
Define
eδε(τ) = ε
∫ τ/ε
Hδ(zδε(s), 0)− H̄δ(hδε(s))ds.
It follows from Gronwall’s Inequality and the fact that H̄ ·(·) ∈ C1({(δ, h) : 0 ≤
δ ≤ δ0, h ∈ V}) that
0≤τ≤T∧T δε
∣hδε(τ/ε)− h̄δ(τ)
0≤τ≤T∧T δε
∣eδε(τ)
eLip(H̄
δ|V)T
0≤τ≤T∧T δε
∣eδε(τ)
(3.16)
Step 2: A splitting according to particles. Next,
Hδ(z, 0)− H̄δ(h)
−κ′δ(Q− q1(z))−
8m1E1
Wκ′δ(Q− q1(z)) +W
8m1E1
κ′δ(q2(z)−Q) +
8m2E2
−Wκ′δ(q2(z)−Q)−W
8m2E2
and so, in order to show that sup0≤τ≤T∧T δε
∣eδε(τ)
∣ = O(ε), it suffices to show that
for i = 1, 2,
0≤τ≤T∧T δε
∫ τ/ε
∣Qδε(s)− xi(zδε(s))
i,ε(s)
Ti(Qδε(s), E
i,ε(s), δ)
= O(1),
0≤τ≤T∧T δε
∫ τ/ε
Wε(s)κ
∣Qδε(s)− xi(zδε(s))
+Wε(s)
i,ε(s)
Ti(Qδε(s), E
i,ε(s), δ)
= O(1).
We only demonstrate that
0≤τ≤T∧T δε
∫ τ/ε
Qδε(s)− q1(zδε(s))
1,ε(s)
T1(Qδε(s), E
1,ε(s), δ)
= O(1).
The other three terms are handled similarly.
Step 3: A sequence of times adapted for ergodization. Define the se-
quence of times tδk,ε inductively by t
0,ε = inf{t ≥ 0 : ϕδ1,ε(t) = 0}, tδk+1,ε = inf{t >
tδk,ε : ϕ
1,ε(t) = 0}. If ε and δ are sufficiently small and tδk+1,ε ≤ (T ∧ T δε )/ε, then
it follows from Lemma 3.4.2 and the discussion in the proof of Lemma 3.4.4 that
1/ω < tδk+1,ε − tδk,ε < ω. From Lemmas 3.4.2 and 3.4.4 it follows that
0≤τ≤T∧T δε
∫ τ/ε
Qδε(s)− q1(zδε(s))
1,ε(s)
T1(Qδε(s), E
1,ε(s), δ)
≤ O(1) +
k+1,ε
k+1,ε
Qδε(s)− q1(zδε(s))
1,ε(s)
T1(Qδε(s), E
1,ε(s), δ)
(3.17)
Step 4: Control of individual terms by comparison with solutions along
fibers. As before, it suffices to show that each term in the sum in Equation
(3.17) is no larger than O(ε). Without loss of generality we will only examine the
first term and suppose that tδ0,ε = 0, i.e. that ϕ
1,ε(0) = 0.
Lemma 3.4.5. If tδ1,ε ≤
T∧T δε
, then sup0≤t≤tδ1,ε
∣zδ0(t)− zδε(t)
∣ = O(ε).
Proof. By Lemma 3.4.4, hδ0(t) − hδε(t) = hδε(0) − hδε(t) = −ε
Hδ(zδε(s), 0)ds =
O(ε(1 ∨ t)) for t ≥ 0.
Using what we know about the divergence of the slow variables, we find that
ϕδ1,0(t)− ϕδ1,ε(t) =
0(s), E
0(s), δ)
T1(Qδε(s), E
ε(s), δ)
+O(ε)ds
O(ε)ds
= O(ε)
for 0 ≤ t ≤ tδ1,ε. Lemmas 3.4.1 and 3.4.2 ensure the desired uniformity in the sizes
of the orders of magnitudes. Showing that sup0≤t≤tδ1,ε
∣ϕδ2,0(t)− ϕδ2,ε(t)
∣ = O(ε) is
similar.
From Lemma 3.4.5 we find that t1,ε = t1,0+O(ε) = T1(Qδ0, Eδ0 , δ)+O(ε). Hence
∫ tδ1,ε
1,ε(s)
T1(Qδε(s), E
1,ε(s), δ)
ds = O(ε) +
∫ tδ1,0
1,0, δ)
= O(ε) +
But when q1(z
ε) < Q
ε − a,
Eδ1,ε(s)− κδ
Qδε(s)− q1(zδε(s))
ε(s))
Qδε(s)− q1(zδε(s))
and so
∫ tδ1,ε
Qδε(s)− q1(zδε(s))
ds = −
1,ε(0)−
1,ε(t
= O(ε)−
Hence,
∫ tδ1,ε
Qδε(s)− q1(zδε(s))
1,ε(s)
T1(Qδε(s), E
1,ε(s), δ)
ds = O(ε),
as desired.
3.5 Appendix to Section 3.4
Proof of Lemma 3.4.1:
Proof. For 0 < δ < E/2,
T1 = T1(Q,E1, δ) = 2
∫ Q−a
E1 − U1(s)
T2 = T2(Q,E2, δ) = 2
∫ 1−a
E2 − U2(s)
We only consider the claims about T1, and for convenience we take m1 = 2. Then
T1(Q,E1, δ) = 2
∫ Q−a
E1 − U1(s)
∫ Q/2
E1 − κδ(s)
Q/2− δ√
E1 − κδ(s)
2Q− 4δ√
κ−1(E1)
E1 − κ(s)
Define
F (E) :=
κ−1(E)
E − κ(s)
−(κ−1)′(u)√
E − u
Notice that (κ−1)′(u) diverges as u → 0+, while (E − u)−1/2 diverges as u → E−,
but both functions are still integrable on [0, E]. It follows that F (E) is well defined.
Then it suffices to show that F : [E , κ(0)− E ] → R is C1.
Write
F (E) =
∫ E/2
−(κ−1)′(u)√
E − u
−(κ−1)′(u)√
E − u
:= F1(E) + F2(E).
A standard application of the Dominated Convergence Theorem allows us to dif-
ferentiate inside the integral and conclude that F1 ∈ C∞([E , κ(0)− E ]), with
F ′1(E) =
∫ E/2
(κ−1)′(u)
2(E − u)3/2du.
To examine F2, we make the substitution v = E − u to find that
F2(E) =
∫ E−E/2
−(κ−1)′(E − v)√
Using the fact that (κ−1)′ ∈ C1([E/2, κ(0)]) and the Dominated Convergence The-
orem, we find that F2 is differentiable, with
F ′2(E) =
−(κ−1)′(E/2)
E − E/2
∫ E−E/2
−(κ−1)′′(E − v)√
Another application of the Dominated Convergence Theorem shows that F ′2 is
continuous, and so F2 ∈ C1([E , κ(0)− E ]).
T1(Q,E1, δ) =
−E−1/21 + F1(E1) + F2(E1)
has the desired regularity. For future reference, we note that
+O(δ). (3.18)
Corollary 3.5.1. For all δ sufficiently small, the flow zδ0(t) restricted to the in-
variant tori Mc = {h = c} is ergodic (with respect to the invariant Lebesgue
measure) for almost every c ∈ U .
Proof. The flow is ergodic whenever the periods T1 and T2 are irrationally related.
Fix δ sufficiently small such that ∂T1
= −Q/E3/21 + O(δ) < 0. Next, consider
Q, W , and E2 fixed, so that T2 is constant. Because T1 ∈ C1, it follows that,
as we let E1 vary,
/∈ Q for almost every E1. The result follows from Fubini’s
Theorem.
Proof of Lemma 3.4.2:
Proof. For the duration of this proof, we consider the dynamics for a small, fixed
value of δ > 0, which we generally suppress in our notation. For convenience, we
take m1 = 2.
Let ψ denote the map taking (Q,W, q1, v1, q2, v2) to (Q,W,E1, E2, ϕ1, ϕ2). We
claim that ψ is a C1 change of coordinates on the domain of interest. Since E1 =
v21 + κδ(q1) + κδ(Q− q1), E1 is a C2 function of q1, v1, and Q. A similar statement
holds for E2.
The angular coordinates ϕi(xi, vi, Q) are defined by Equation (3.12). We only
consider ϕ1, as the statements for ϕ2 are similar. Then ϕ1(q1, v1, Q) is clearly C1
whenever q1 6= a,Q−a. The apparent difficulties in regularity at the turning points
are only a result of how the definition of ϕ1 is presented in Equation (3.12). Recall
that the angle variables are actually defined by integrating the elapsed time along
orbits, and our previous definition expressed ϕ1 in a manner which emphasized
the dependence on q1. In fact, whenever |v1| <
ϕ1(q1, v1, Q) =
(κ−1δ )
′(E1 − v2)dv if q1 < δ
(κ−1δ )
′(E1 − v2)dv if q1 > Q− δ.
(3.19)
Here E1 is implicitly considered to be a function of q1, v1, and Q. One can verify
that Dψ is non-degenerate on the domain of interest, and so ψ is indeed a C1
change of coordinates.
Next observe that dϕ1,0/dt = 1/T1, so Hadamard’s Lemma implies that
dϕ1,ε
+O(εf(δ)).
It remains to show that, in fact, we may take f(δ) = 1. It is easy to verify this
whenever q1 ≤ Q−δ because dE1/dt = 0 there. We only perform the more difficult
verification when q1 > Q− δ.
When q1 > Q−δ, |v1| <
E1 and E1 = v
1+κδ(Q−q1). From Equation (3.19)
we find that
T1(Q,E1, δ)
(κ−1)′(E1 − v2)dv. (3.20)
To find dϕ1/dt, we consider ϕ1 as a function of v1, Q, and E1, so that
Then, using Equations (3.18) and (3.20), we compute
(κ−1δ )
′(E1 − v21)
κ′δ(Q− q1)
1/2− ϕ1
(εW ) = εW
1/2− ϕ1
1/2− ϕ1
(κ−1)′′(E1 − v2)dv
(εWκ′δ(Q− q1)).
Using that κ′δ(Q− q1) = κ′(κ−1(E1 − v21))/δ = (δ(κ−1)′(E1 − v21))−1, we find that
1/2− ϕ1
(κ−1)′(E1 − v21)
(κ−1)′′(E1 − v2)dv
But here 1/2− ϕ1 is O(δ). See the proof of Lemma 3.4.3 below. Thus the claims
about dϕ1/dt will be proven, provided we can uniformly bound
(κ−1)′(E1 − v21)
(κ−1)′′(E1 − v2)dv.
Note that the apparent divergence of the integral as |v1| →
E1 is entirely due to
the fact that our expression for ϕ1 from Equation (3.20) requires |v1| <
E1. If
we make the substitution u = E1− v2 and let e = E1− v21 , then it suffices to show
E≤E1≤κ(0)−E
0<e≤E1
(κ−1)′(e)
(κ−1)′′(u)√
E1 − u
< +∞.
The only difficulties occur when e is close to 0. Thus it suffices to show that
E≤E1≤κ(0)−E
0<e≤E/2
(κ−1)′(e)
∫ E/2
(κ−1)′′(u)√
E1 − u
is finite. But this is bounded by
0<e≤E/2
(κ−1)′(e)
∫ E/2
(κ−1)′′(u)
= sup
0<e≤E/2
(κ−1)′(e)
(κ−1)′(E/2)− (κ−1)′(e)
which is finite because (κ−1)′(e) → −∞ as e→ 0+. The claims about dϕ2/dt can
be proven similarly.
Proof of Lemma 3.4.3:
Proof. We continue in the notation of the proofs of Lemmas 3.4.1 and 3.4.2 above,
and we set m1 = 2. Then from Equation (3.20), we see that κ
δ(Q− q1) = 0 unless
|ϕ1 − 1/2| ≤
(κ−1)′(E1 − v2)dv
= δF (E1)/T1 = O(δ). Dealing with ϕ2
is similar.
Chapter 4
The periodic oscillation of an
adiabatic piston in two or three
dimensions
In this chapter, we present our results for the piston system in two or three di-
mensions. These results may also be found in [Wri07].
4.1 Statement of the main result
4.1.1 Description of the model
Consider a massive, insulating piston of mass M that separates a gas container
D in Rd, d = 2 or 3. See Figure 4.1. Denote the location of the piston by Q,
its velocity by dQ/dt = V , and its cross-sectional length (when d = 2, or area,
when d = 3) by ℓ. If Q is fixed, then the piston divides D into two subdomains,
D1(Q) = D1 on the left and D2(Q) = D2 on the right. By Ei we denote the
total energy of the gas inside Di, and by |Di| we denote the area (when d = 2, or
volume, when d = 3) of Di.
We are interested in the dynamics of the piston when the system’s total energy
is bounded and M → ∞. When M = ∞, the piston remains fixed in place, and
each energy Ei remains constant. When M is large but finite, MV
2/2 is bounded,
and so V = O(M−1/2). It is natural to define
ε =M−1/2,
so that W is of order 1 as ε → 0. This is equivalent to scaling time by ε.
D1(Q) D2(Q)
ℓ ✲ V = εW
M = ε−2 ≫ 1
❆❯s✟✟✙
s❍❍❍❥ s✘✘✿
Figure 4.1: A gas container D ⊂ R2 separated by a piston.
Next we precisely describe the gas container. It is a compact, connected billiard
domain D ⊂ Rd with a piecewise C3 boundary, i.e. ∂D consists of a finite number of
C3 embedded hypersurfaces, possibly with boundary and a finite number of corner
points. The container consists of a “tube,” whose perpendicular cross-section P is
the shape of the piston, connecting two disjoint regions. P ⊂ Rd−1 is a compact,
connected domain whose boundary is piecewise C3. Then the “tube” is the region
[0, 1] × P ⊂ D swept out by the piston for 0 ≤ Q ≤ 1, and [0, 1] × ∂P ⊂ ∂D. If
d = 2, P is just a closed line segment, and the “tube” is a rectangle. If d = 3, P
could be a circle, a square, a pentagon, etc.
Our fundamental assumption is as follows:
Main Assumption. For almost every Q ∈ [0, 1] the billiard flow of a single
particle on an energy surface in either of the two subdomains Di(Q) is ergodic
(with respect to the invariant Liouville measure).
If d = 2, the domain could be the Bunimovich stadium [Bun79]. Another possible
domain is indicated in Figure 4.1. The ergodicity of billiards in such domains,
which produce hyperbolic flows, goes back to the pioneering work of Sinai [Sin70],
although a number of individuals have contributed to the theory. A full accounting
of this history can be found in [CM06a]. Polygonal domains satisfying our assump-
tions can also be constructed [Vor97]. Suitable domains in d = 3 dimensions can be
constructed using a rectangular box with shallow spherical caps adjoined [BR98].
Note that we make no assumptions regarding the hyperbolicity of the billiard flow
in the domain.
The Hamiltonian system we consider consists of the massive piston of mass M
located at positionQ, as well as n1+n2 gas particles, n1 inD1 and n2 inD2. Here n1
and n2 are fixed positive integers. For convenience, the gas particles all have unit
mass, though all that is important is that each gas particle has a fixed mass. We
denote the positions of the gas particles in Di by qi,j, 1 ≤ j ≤ ni. The gas particles
are ideal point particles that interact with ∂D and the piston by hard core, elastic
collisions. Although it has no effect on the dynamics we consider, for convenience
we complete our description of the Hamiltonian dynamics by specifying that the
piston makes elastic collisions with walls located at Q = 0, 1 that are only visible
to the piston. We denote velocities by dQ/dt = V = εW and dqi,j/dt = vi,j, and
we set
Ei,j = v
i,j/2, Ei =
Ei,j.
Our system has d(n1 + n2) + 1 degrees of freedom, and so its phase space is
(2d(n1 + n2) + 2)-dimensional.
We let
h(z) = h = (Q,W,E1,1, E1,2, · · · , E1,n1 , E2,1, E2,2, · · · , E2,n2),
so that h is a function from our phase space to Rn1+n2+2. We often abbreviate
h = (Q,W,E1,j , E2,j), and we refer to h as consisting of the slow variables because
these quantities are conserved when ε = 0. We let hε(t, z) = hε(t) denote the
actual motions of these variables in time for a fixed value of ε. Here z represents
the initial condition in phase space, which we usually suppress in our notation.
One should think of hε(·) as being a random variable that takes initial conditions
in phase space to paths (depending on the parameter t) in Rn1+n2+2.
4.1.2 The averaged equation
From the work of Neishtadt and Sinai [NS04], one can derive
= H̄(h) :=
d |D1(Q)|
− 2E2ℓ
d |D2(Q)|
− 2WE1,jℓ
d |D1(Q)|
2WE2,jℓ
d |D2(Q)|
(4.1)
as the averaged equation (with respect to the slow time τ = εt) for the slow
variables. Later, in Section 4.2.3, we will give another heuristic derivation of the
averaged equation that is more suggestive of our proof.
Neishtadt and Sinai [Sin99, NS04] pointed out that the solutions of Equa-
tion (1.3) have (Q,W ) behaving as if they were the coordinates of a Hamiltonian
system describing a particle undergoing motion inside a potential well. As in
Section 1.2, the effective Hamiltonian is given by
W 2 +
E1(0) |D1(Q(0))|2/d
|D1(Q)|2/d
E2(0) |D2(Q(0))|2/d
|D2(Q)|2/d
This can be seen as follows. Since
∂ |D1(Q)|
= ℓ = −∂ |D2(Q)|
d ln(Ei,j)/dτ = −(2/d)d ln(|Di(Q)|)/dτ , and so
Ei,j(τ) = Ei,j(0)
|Di(Q(0))|
|Di(Q(τ))|
By summing over j, we find that
Ei(τ) = Ei(0)
|Di(Q(0))|
|Di(Q(τ))|
and so
d2Q(τ)
E1(0) |D1(Q(0))|2/d
|D1(Q(τ))|1+2/d
E2(0) |D2(Q(0))|2/d
|D2(Q(τ))|1+2/d
Let h̄(τ, z) = h̄(τ) be the solution of
= H̄(h̄), h̄(0) = hε(0).
Again, think of h̄(·) as being a random variable.
4.1.3 The main result
The solutions of the averaged equation approximate the motions of the slow vari-
ables, hε(t), on a time scale O(1/ε) as ε → 0. Precisely, fix a compact set
V ⊂ Rn1+n2+2 such that h ∈ V ⇒ Q ⊂⊂ (0, 1),W ⊂⊂ R, and Ei,j ⊂⊂ (0,∞)
for each i and j.1 We will be mostly concerned with the dynamics when h ∈ V.
Define
Qmin = inf
Q, Qmax = sup
Emin = inf
W 2 + E1 + E2, Emax = sup
W 2 + E1 + E2.
1 We have introduced this notation for convenience. For example, h ∈ V ⇒ Q ⊂⊂ (0, 1)
means that there exists a compact set A ⊂ (0, 1) such that h ∈ V ⇒ Q ∈ A, and similarly for
the other variables.
For a fixed value of ε > 0, we only consider the dynamics on the invariant subset
of phase space defined by
Mε = {(Q, V, qi,j, vi,j) ∈ R2d(n1+n2)+2 : Q ∈ [0, 1], qi,j ∈ Di(Q),
Emin ≤
V 2 + E1 + E2 ≤ Emax}.
Let Pε denote the probability measure obtained by restricting the invariant Liou-
ville measure to Mε. Define the stopping time
Tε(z) = Tε = inf{τ ≥ 0 : h̄(τ) /∈ V or hε(τ/ε) /∈ V}.
Theorem 4.1.1. If D is a gas container in d = 2 or 3 dimensions satisfying the
assumptions in Subsection 4.1.1 above, then for each T > 0,
0≤τ≤T∧Tε
∣hε(τ/ε)− h̄(τ)
∣→ 0 in probability as ε =M−1/2 → 0,
i.e. for each fixed δ > 0,
0≤τ≤T∧Tε
∣hε(τ/ε)− h̄(τ)
∣ ≥ δ
→ 0 as ε =M−1/2 → 0.
Remark 4.1.1. It should be noted that the stopping time in the above result is not
unduly restrictive. If the initial pressures of the two gasses are not too mismatched,
then the solution to the averaged equation is a periodic orbit, with the effective
potential well keeping the piston away from the walls. Thus, if the actual motions
follow the averaged solution closely for 0 ≤ τ ≤ T ∧ Tε, and the averaged solution
stays in V, it follows that Tε > T .
Remark 4.1.2. The techniques of this work should immediately generalize to prove
the analogue of Theorem 4.1.1 above in the nonphysical dimensions d > 3, although
we do not pursue this here.
Remark 4.1.3. As in Subsection 3.1.3, Theorem 4.1.1 can be easily generalized
to cover a system of N − 1 pistons that divide N gas containers, so long as, for
almost every fixed location of the pistons, the billiard flow of a single gas particle
on an energy surface in any of the N subcontainers is ergodic (with respect to the
invariant Liouville measure). The effective Hamiltonian for the pistons has them
moving like an (N − 1)-dimensional particle inside a potential well.
4.2 Preparatory material concerning a
two-dimensional gas container with only one
gas particle on each side
Our results and techniques of proof are essentially independent of the dimension
and the fixed number of gas particles on either side of the piston. Thus, we focus
D1 D2
ℓ ✲V = εW
M = ε−2 ≫ 1
Figure 4.2: A choice of coordinates on phase space.
on the case when d = 2 and there is only one gas particle on either side. Later,
in Section 4.4, we will indicate the simple modifications that generalize our proof
to the general situation. For clarity, in this section and next, we denote q1,1 by q1,
v2,1 by v2, etc. We decompose the gas particle coordinates according to whether
they are perpendicular to or parallel to the piston’s face, for example q1 = (q
1 , q
See Figure 4.2.
The Hamiltonian dynamics define a flow on our phase space. We denote this
flow by zε(t, z) = zε(t), where z = zε(0, z). One should think of zε(·) as being
a random variable that takes initial conditions in phase space to paths in phase
space. Then hε(t) = h(zε(t)). By the change of coordinates W = V/ε, we may
identify all of the Mε defined in Section 4.1 with the space
M = {(Q,W, q1, v1, q2, v2) ∈ R10 : Q ∈ [0, 1], q1 ∈ D1(Q), q2 ∈ D2(Q),
Emin ≤
W 2 + E1 + E2 ≤ Emax}.
and all of the Pε with the probability measure P on M, which has the density
dP = const dQdWdq⊥1 dq
(Throughout this work we will use const to represent generic constants that are
independent of ε.) We will assume that these identifications have been made, so
that we may consider zε(·) as a family of measure preserving flows on the same
space that all preserve the same probability measure. We denote the components
of zε(t) by Qε(t), q
1,ε(t), etc.
The set {z ∈ M : q1 = Q = q2} has co-dimension two, and so
zε(t){q1 =
Q = q2} has co-dimension one, which shows that only a measure zero set of initial
conditions will give rise to three particle collisions. We ignore this and other
measures zero events, such as gas particles hitting singularities of the billiard flow,
in what follows.
Now we present some background material, as well as some lemmas that will
assist us in our proof of Theorem 4.1.1. We begin by studying the billiard flow
of a gas particle when the piston is infinitely massive. Next we examine collisions
between the gas particles and the piston when the piston has a large, but finite,
mass. Then we present a heuristic derivation of the averaged equation that is
suggestive of our proof. Finally we prove a lemma that allows us to disregard
the possibility that a gas particle will move nearly parallel to the piston’s face
– a situation that is clearly bad for having the motions of the piston follow the
solutions of the averaged equation.
4.2.1 Billiard flows and maps in two dimensions
In this section, we study the billiard flows of the gas particles when M = ∞ and
the slow variables are held fixed at a specific value h ∈ V. We will only study
the motions of the left gas particle, as similar definitions and results hold for the
motions of the right gas particle. Thus we wish to study the billiard flow of a point
particle moving inside the domain D1 at a constant speed
2E1. The results of
this section that are stated without proof can be found in [CM06a].
Let T D1 denote the tangent bundle to D1. The billiard flow takes place in
the three-dimensional space M1h = M1 = {(q1, v1) ∈ T D1 : q1 ∈ D1, |v1| =√
2E1}/ ∼. Here the quotient means that when q1 ∈ ∂D1, we identify velocity
vectors pointing outside of D1 with those pointing inside D1 by reflecting through
the tangent line to ∂D1 at q1, so that the angle of incidence with the unit normal
vector to ∂D1 equals the angle of reflection. Note that most of the quantities
defined in this subsection depend on the fixed value of h. We will usually suppress
this dependence, although, when necessary, we will indicate it by a subscript h.
We denote the resulting flow by y(t, y) = y(t), where y(0, y) = y. As the billiard
flow comes from a Hamiltonian system, it preserves Liouville measure restricted
to the energy surface. We denote the resulting probability measure by µ. This
measure has the density dµ = dq1dv1/(2π
2E1 |D1|). Here dq1 represents area on
R2, and dv1 represents length on S
v1 ∈ R2 : |v1| =
There is a standard cross-section to the billiard flow, the collision cross-section
Ω = {(q1, v1) ∈ T D1 : q1 ∈ ∂D1, |v1| =
2E1}/ ∼. It is customary to parameter-
ize Ω by {x = (r, ϕ) : r ∈ ∂D1, ϕ ∈ [−π/2,+π/2]}, where r is arc length and ϕ
represents the angle between the outgoing velocity vector and the inward pointing
normal vector to ∂D1. It follows that Ω may be realized as the disjoint union
of a finite number of rectangles and cylinders. The cylinders correspond to fixed
scatterers with smooth boundary placed inside the gas container. If F : Ω 	 is the
collision map, i.e. the return map to the collision cross-section, then F preserves the
projected probability measure ν, which has the density dν = cosϕdϕ dr/(2 |∂D1|).
Here |∂D1| is the length of ∂D1.
We suppose that the flow is ergodic, and so F is an invertible, ergodic mea-
sure preserving transformation. Because ∂D1 is piecewise C3, F is piecewise C2,
although it does have discontinuities and unbounded derivatives near discontinu-
ities corresponding to grazing collisions. Because of our assumptions on D1, the
free flight times and the curvature of ∂D1 are uniformly bounded. It follows that
if x /∈ ∂Ω ∪ F−1(∂Ω), then F is differentiable at x, and
‖DF (x)‖ ≤ const
cosϕ(Fx)
, (4.2)
where ϕ(Fx) is the value of the ϕ coordinate at the image of x.
Following the ideas in Section 4.5, we induce F on the subspace Ω̂ of Ω cor-
responding to collisions with the (immobile) piston. We denote the induced map
by F̂ and the induced measure by ν̂. We parameterize Ω̂ by {(r, ϕ) : 0 ≤ r ≤
ℓ, ϕ ∈ [−π/2,+π/2]}. As νΩ̂ = ℓ/ |∂D1|, it follows that ν̂ has the density
dν̂ = cosϕdϕ dr/(2ℓ).
For x ∈ Ω, define ζx to be the free flight time, i.e. the time it takes the billiard
particle traveling at speed
2E1 to travel from x to Fx. If x /∈ ∂Ω ∪ F−1(∂Ω),
‖Dζ(x)‖ ≤ const
cosϕ(Fx)
. (4.3)
Santaló’s formula [San76, Che97] tells us that
Eνζ =
π |D1|
|v1| |∂D1|
. (4.4)
If ζ̂ : Ω̂ → R is the free flight time between collisions with the piston, then it
follows from Proposition 4.5.1 that
Eν̂ ζ̂ =
π |D1|
|v1| ℓ
. (4.5)
The expected value of
∣ when the left gas particle collides with the (immo-
bile) piston is given by
∣ = Eν̂
2E1 cosϕ =
∫ +π/2
cos2 ϕdϕ =
. (4.6)
We wish to compute limt→∞ t
∣2v⊥1 (s)
∣ δq⊥1 (s)=Qds, the time average of the
change in momentum of the left gas particle when it collides with the piston. If this
limit exists and is equal for almost every initial condition of the left gas particle,
then it makes sense to define the pressure inside D1 to be this quantity divided
by ℓ. Because the collisions are hard-core, we cannot directly apply Birkhoff’s
Ergodic Theorem to compute this limit. However, we can compute this limit by
using the map F̂ .
Lemma 4.2.1. If the billiard flow y(t) is ergodic, then for µ− a.e. y ∈ M1,
∣v⊥1 (s)
∣ δq⊥1 (s)=Qds =
2 |D1(Q)|
Proof. Because the billiard flow may be viewed as a suspension flow over the
collision cross-section with ζ as the height function, it suffices to show that the
convergence takes place for ν̂ − a.e. x ∈ Ω̂. For an initial condition x ∈ Ω̂, define
N̂t(x) = N̂t = #
s ∈ (0, t] : y(s, x) ∈ Ω̂
. By the Poincaré Recurrence Theorem,
N̂t → ∞ as t→ ∞, ν̂ − a.e.
n=0 ζ̂(F̂
∣ (F̂ nx) ≤ 1
∣v⊥1 (s)
∣ δq⊥1 (s)=Qds
≤ N̂t
∑N̂t−1
n=0 ζ̂(F̂
∣ (F̂ nx),
and so the result follows from Birkhoff’s Ergodic Theorem and Equations (4.5)
and (4.6).
Corollary 4.2.2. If the billiard flow y(t) is ergodic, then for each δ > 0,
y ∈ M1 :
∣v⊥1 (s)
∣ δq⊥1 (s)=Qds−
2 |D1(Q)|
→ 0 as t→ ∞.
4.2.2 Analysis of collisions
In this section, we return to studying our piston system when ε > 0. We will
examine what happens when a particle collides with the piston. For convenience,
we will only examine in detail collisions between the piston and the left gas particle.
Collisions with the right gas particle can be handled similarly.
When the left gas particle collides with the piston, v⊥1 and V instantaneously
change according to the laws of elastic collisions:
1−M 2M
2 M − 1
In our coordinates, this becomes
1 + ε2
ε2 − 1 2ε
2ε 1− ε2
. (4.7)
Recalling that v1,W = O(1), we find that to first order in ε,
v⊥+1 = −v⊥−1 +O(ε), W+ =W− +O(ε). (4.8)
Observe that a collision can only take place if v⊥−1 > εW
−. In particular, v⊥−1 >
2Emax. Thus, either v
1 > 0 or v
1 = O(ε). By expanding Equation (4.7)
to second order in ε, it follows that
E+1 − E−1 = −2εW
∣+O(ε2),
W+ −W− = +2ε
∣+O(ε2).
(4.9)
Note that it is immaterial whether we use the pre-collision or post-collision values
of W and
∣ on the right hand side of Equation (4.9), because any ambiguity
can be absorbed into the O(ε2) term.
It is convenient for us to define a “clean collision” between the piston and the
left gas particle:
Definition 4.2.1. The left gas particle experiences a clean collision with the
piston if and only if v⊥−1 > 0 and v
1 < −ε
2Emax.
In particular, after a clean collision, the left gas particle will escape from the
piston, i.e. the left gas particle will have to move into the region q⊥1 ≤ 0 before
it can experience another collision with the piston. It follows that there exists
a constant C1 > 0, which depends on the set V, such that for all ε sufficiently
small, so long as Q ≥ Qmin and
∣ > εC1 when q
1 ∈ [Qmin, Q], then the
left gas particle will experience only clean collisions with the piston, and the time
between these collisions will be greater than 2Qmin/(
2Emax). (Note that when we
write expressions such as q⊥1 ∈ [Qmin, Q], we implicitly mean that q1 is positioned
inside the “tube” discussed at the beginning of Section 4.1.) One can verify that
C1 = 5
2Emax would work.
Similarly, we can define clean collisions between the right gas particle and
the piston. We assume that C1 was chosen sufficiently large such that for all ε
sufficiently small, so long as Q ≤ Qmax and
∣ > εC1 when q
2 ∈ [Q,Qmax], then
the right gas particle will experience only clean collisions with the piston.
Now we define three more stopping times, which are functions of the initial
conditions in phase space.
T ′ε = inf{τ ≥ 0 : Qmin ≤ q⊥1,ε(τ/ε) ≤ Qε(τ/ε) ≤ Qmax and
∣v⊥1,ε(τ/ε)
∣ ≤ C1ε},
T ′′ε = inf{τ ≥ 0 : Qmin ≤ Qε(τ/ε) ≤ q⊥2,ε(τ/ε) ≤ Qmax and
∣v⊥2,ε(τ/ε)
∣ ≤ C1ε},
T̃ε =T ∧ Tε ∧ T ′ε ∧ T ′′ε
Define H(z) by
H(z) =
∣ δq⊥1 =Q − 2
∣ δq⊥2 =Q
∣ δq⊥1 =Q
∣ δq⊥2 =Q
Here we make use of Dirac delta functions. All integrals involving these delta
functions may be replaced by sums.
The following lemma is an immediate consequence of Equation (4.9) and the
above discussion:
Lemma 4.2.3. If 0 ≤ t1 ≤ t2 ≤ T̃ε/ε, the piston experiences O((t2 − t1) ∨ 1)
collisions with gas particles in the time interval [t1, t2], all of which are clean
collisions. Furthermore,
hε(t2)− hε(t1) = O(ε) + ε
H(zε(s))ds.
Here any ambiguities arising from collisions occurring at the limits of integration
can be absorbed into the O(ε) term.
4.2.3 Another heuristic derivation of the averaged equa-
The following heuristic derivation of Equation (4.1) when d = 2 was suggested
in [Dol05]. Let ∆t be a length of time long enough such that the piston experiences
many collisions with the gas particles, but short enough such that the slow variables
change very little, in this time interval. From each collision with the left gas
particle, Equation (4.9) states that W changes by an amount +2ε
∣ + O(ε2),
and from Equation (4.6) the average change in W at these collisions should be
approximately επ
2E1/2 + O(ε2). From Equation (4.5) the frequency of these
collisions is approximately
2E1 ℓ/(π |D1|). Arguing similarly for collisions with
the other particle, we guess that
|D1(Q)|
− ε E2ℓ|D2(Q)|
+O(ε2).
With τ = εt as the slow time, a reasonable guess for the averaged equation for W
|D1(Q)|
− E2ℓ|D2(Q)|
Similar arguments for the other slow variables lead to the averaged equation (4.1),
and this explains why we used Pi = Ei/ |Di| for the pressure of a 2-dimensional
gas in Section 1.2.
There is a similar heuristic derivation of the averaged equation in d > 2 dimen-
sions. Compare the analogues of Equations (4.5) and (4.6) in Subsection 4.4.2.
4.2.4 A priori estimate on the size of a set of bad initial
conditions
In this section, we give an a priori estimate on the size of a set of initial conditions
that should not give rise to orbits for which sup0≤τ≤T∧Tε
∣hε(τ/ε)− h̄(τ)
∣ is small.
In particular, when proving Theorem 4.1.1, it is convenient to focus on orbits that
only contain clean collisions with the piston. Thus, we show that P{T̃ε < T ∧ Tε}
vanishes as ε → 0. At first, this result may seem surprising, since P{T ′ε ∧ T ′′ε =
0} = O(ε), and one would expect ∪T/εt=0zε(−t){T ′ε ∧ T ′′ε = 0} to have a size of order
1. However, the rate at which orbits escape from {T ′ε ∧ T ′′ε = 0} is very small, and
so we can prove the following:
Lemma 4.2.4.
P{T̃ε < T ∧ Tε} = O(ε).
In some sense, this lemma states that the probability of having a gas particle
move nearly parallel to the piston’s face within the time interval [0, T/ε], when one
would expect the other gas particle to force the piston to move on a macroscopic
scale, vanishes as ε → 0. Thus, one can hope to control the occurrence of the
“nondiffusive fluctuations” of the piston described in [CD06a] on a time scale
O(ε−1).
Proof. As the left and the right gas particles can be handled similarly, it suffices
to show that P{T ′ε < T} = O(ε). Define
Bε = {z ∈ M : Qmin ≤ q⊥1 ≤ Q ≤ Qmax and
∣ ≤ C1ε}.
Then {T ′ε < T} ⊂ ∪
t=0zε(−t)Bε, and if γ = Qmin/
8Emax,
zε(−t)Bε
 = P
zε(t)Bε
 = P
Bε ∪
((zε(t)Bε)\Bε)
≤ PBε + P
T/(εγ)
zε(kγ)
(zε(t)Bε)\Bε
≤ PBε +
(zε(t)Bε)\Bε
Now PBε = O(ε), so if we can show that P (
t=0(zε(t)Bε)\Bε) = O(ε2), then it
will follow that P{T ′ε < T} = O(ε).
If z ∈
t=0(zε(t)Bε)\Bε, it is still true that
∣ = O(ε). This is because
changes by at most O(ε) at the collisions, and if a collision forces
∣ > C1ε, then
the gas particle must escape to the region q⊥1 ≤ 0 before v⊥1 can change again, and
this will take time greater than γ. Furthermore, if z ∈
t=0(zε(t)Bε)\Bε, then at
least one of the following four possibilities must hold:
∣q⊥1 −Qmin
∣ ≤ O(ε),
• |Q−Qmin| ≤ O(ε),
• |Q−Qmax| ≤ O(ε),
∣Q− q⊥1
∣ ≤ O(ε).
It follows that P (
t=0(zε(t)Bε)\Bε) = O(ε2). For example,
1{|v⊥1 |≤O(ε), |q⊥1 −Qmin|≤O(ε)}dP
= const
{Emin≤W 2/2+v21/2+v22/2≤Emax}
1{|v⊥1 |≤O(ε)}dWdv
{Q∈[0,1], q1∈D1, q2∈D2}
1{|q⊥1 −Qmin|≤O(ε)}dQdq
= O(ε2).
4.3 Proof of the main result for two-dimensional
gas containers with only one gas particle on
each side
As in Section 4.2, we continue with the case when d = 2 and there is only one gas
particle on either side of the piston.
4.3.1 Main steps in the proof of convergence in probability
By Lemma 4.2.4, it suffices to show that sup0≤τ≤T̃ε
∣hε(τ/ε)− h̄(τ)
∣→ 0 in prob-
ability as ε =M−1/2 → 0. Several of the ideas in the steps below were inspired by
a recent proof of Anosov’s averaging theorem for smooth systems that is due to
Dolgopyat [Dol05].
Step 1: Reduction using Gronwall’s Inequality. Observe that h̄(τ) satisfies
the integral equation
h̄(τ)− h̄(0) =
H̄(h̄(σ))dσ,
while from Lemma 4.2.3,
hε(τ/ε)− hε(0) = O(ε) + ε
∫ τ/ε
H(zε(s))ds
= O(ε) + ε
∫ τ/ε
H(zε(s))− H̄(hε(s))ds+
H̄(hε(σ/ε))dσ
for 0 ≤ τ ≤ T̃ε. Define
eε(τ) = ε
∫ τ/ε
H(zε(s))− H̄(hε(s))ds.
It follows from Gronwall’s Inequality that
0≤τ≤T̃ε
∣hε(τ/ε)− h̄(τ)
O(ε) + sup
0≤τ≤T̃ε
|eε(τ)|
eLip(H̄|V)T . (4.10)
Gronwall’s Inequality is usually stated for continuous paths, but the standard
proof (found in [SV85]) still works for paths that are merely integrable, and
∣hε(τ/ε)− h̄(τ)
∣ is piecewise smooth.
Step 2: Introduction of a time scale for ergodization. Let L(ε) be a
real valued function such that L(ε) → ∞, but L(ε) ≪ log ε−1, as ε → 0. In
Section 4.3.2 we will place precise restrictions on the growth rate of L(ε). Think
of L(ε) as being a time scale that grows as ε → 0 so that ergodization, i.e. the
convergence along an orbit of a function’s time average to a space average, can
take place. However, L(ε) doesn’t grow too fast, so that on this time scale zε(t)
essentially stays on the submanifold {h = hε(0)}, where we have our ergodicity
assumption. Set tk,ε = kL(ε), so that
0≤τ≤T̃ε
|eε(τ)| ≤ O(εL(ε)) + ε
εL(ε)
∫ tk+1,ε
H(zε(s))− H̄(hε(s))ds
. (4.11)
Step 3: A splitting according to particles. Now H(z) − H̄(h(z)) divides
into two pieces, each of which depends on only one gas particle when the piston
is held fixed:
H(z)− H̄(h(z)) =
∣ δq⊥1 =Q −
|D1(Q)|
∣ δq⊥1 =Q +
|D1(Q)|
|D2(Q)| − 2
∣ δq⊥2 =Q
− WE2ℓ|D2(Q)| + 2W
∣ δq⊥2 =Q
We will only deal with the piece depending on the left gas particle, as the right
particle can be handled similarly. Define
G(z) =
∣ δq⊥1 =Q, Ḡ(h) =
2 |D1(Q)|
. (4.12)
Returning to Equation (4.11), we see that in order to prove Theorem 4.1.1, it
suffices to show that both
εL(ε)
∫ tk+1,ε
G(zε(s))− Ḡ(hε(s))ds
εL(ε)
∫ tk+1,ε
Wε(s)
G(zε(s))− Ḡ(hε(s))
converge to 0 in probability as ε→ 0.
Step 4: A splitting for using the triangle inequality. Now we let zk,ε(s)
be the orbit of the ε = 0 Hamiltonian vector field satisfying zk,ε(tk,ε) = zε(tk,ε).
Set hk,ε(t) = h(zk,ε(t)). Observe that hk,ε(t) is independent of t.
We emphasize that so long as 0 ≤ t ≤ T̃ε/ε, the times between collisions of a
specific gas particle and piston are uniformly bounded greater than 0, as explained
before Lemma 4.2.3. It follows that, so long as tk+1,ε ≤ T̃ε/ε,
tk,ε≤t≤tk+1,ε
|hk,ε(t)− hε(t)| = O(εL(ε)). (4.13)
This is because the slow variables change by at most O(ε) at collisions, and
dQε/dt = O(ε).
Also,
∫ tk+1,ε
Wε(s)
G(zε(s))− Ḡ(hε(s))
= O(εL(ε)2) +Wk,ε(tk,ε)
∫ tk+1,ε
G(zε(s))− Ḡ(hε(s))ds,
and so
εL(ε)
∫ tk+1,ε
Wε(s)
G(zε(s))− Ḡ(hε(s))
≤ O(εL(ε)) + ε const
εL(ε)
∫ tk+1,ε
G(zε(s))− Ḡ(hε(s))ds
Thus, in order to prove Theorem 4.1.1, it suffices to show that
εL(ε)
∫ tk+1,ε
G(zε(s))− Ḡ(hε(s))ds
εL(ε)
|Ik,ε|+ |IIk,ε|+ |IIIk,ε|
converges to 0 in probability as ε → 0, where
Ik,ε =
∫ tk+1,ε
G(zε(s))−G(zk,ε(s))ds,
IIk,ε =
∫ tk+1,ε
G(zk,ε(s))− Ḡ(hk,ε(s))ds,
IIIk,ε =
∫ tk+1,ε
Ḡ(hk,ε(s))− Ḡ(hε(s))ds.
The term IIk,ε represents an “ergodicity term” that can be controlled by our
assumptions on the ergodicity of the flow z0(t), while the terms Ik,ε and IIIk,ε
represent “continuity terms” that can be controlled by controlling the drift of
zε(t) from zk,ε(t) for tk,ε ≤ t ≤ tk+1,ε.
Step 5: Control of drift from the ε = 0 orbits. Now Ḡ is uniformly
Lipschitz on the compact set V, and so it follows from Equation (4.13) that
IIIk,ε = O(εL(ε)2). Thus, ε
εL(ε)
k=0 |IIIk,ε| = O(εL(ε)) → 0 as ε→ 0.
Next, we show that for fixed δ > 0, P
εL(ε)
k=0 |Ik,ε| ≥ δ
→ 0 as ε → 0.
For initial conditions z ∈ M and for integers k ∈ [0, T/(εL(ε))− 1] define
Ak,ε =
|Ik,ε| >
and k ≤ T̃ε
εL(ε)
Az,ε = {k : z ∈ Ak,ε} .
Think of these sets as describing “poor continuity” between solutions of the ε = 0
and the ε > 0 Hamiltonian vector fields. For example, roughly speaking, z ∈ Ak,ε
if the orbit zε(t) starting at z does not closely follow zk,ε(t) for tk,ε ≤ t ≤ tk+1,ε.
One can easily check that |Ik,ε| ≤ O(L(ε)) for k ≤ T̃ε/(εL(ε))− 1, and so it
follows that
εL(ε)
|Ik,ε| ≤
+O(εL(ε)#(Az,ε)).
Therefore it suffices to show that P (#(Az,ε) ≥ δ(const εL(ε))−1) → 0 as ε → 0.
By Chebyshev’s Inequality, we need only show that
EP (εL(ε)#(Az,ε)) = εL(ε)
εL(ε)
P (Ak,ε)
tends to 0 with ε.
Observe that zε(tk,ε)Ak,ε ⊂ A0,ε. In words, the initial conditions giving rise to
orbits that are “bad” on the time interval [tk,ε, tk+1,ε], moved forward by time tk,ε,
are initial conditions giving rise to orbits which are “bad” on the time interval
[t0,ε, t1,ε]. Because the flow zε(·) preserves the measure, we find that
εL(ε)
εL(ε)
P (Ak,ε) ≤ constP (A0,ε).
To estimate P (A0,ε), it is convenient to use a different probability measure,
which is uniformly equivalent to P on the set {z ∈ M : h(z) ∈ V} ⊃ {T̃ε ≥ εL(ε)}.
We denote this new probability measure by P f , where the f stands for “factor.”
If we choose coordinates on M by using h and the billiard coordinates on the two
gas particles, then P f is defined on M by dP f = dh dµ1h dµ2h, where dh represents
the uniform measure on V ⊂ R4, and the factor measure dµih represents the in-
variant billiard measure of the ith gas particle coordinates for a fixed value of the
slow variables. One can verify that 1{h(z)∈V}dP ≤ const dP f , but that P f is not
invariant under the flow zε(·) when ε > 0.
We abuse notation, and consider µ1h to be a measure on the left particle’s initial
billiard coordinates once h and the initial coordinates of the right gas particle are
fixed. In this context, µ1h is simply the measure µ from Subsection 4.2.1. Then
P f(A0,ε)
dh dµ2h · µ1h
∫ L(ε)
G(zε(s))−G(z0(s))ds
and εL(ε) ≤ T̃ε
and we must show that the last term tends to 0 with ε. By the Bounded Con-
vergence Theorem, it suffices to show that for almost every h ∈ V and initial
condition for the right gas particle,
∫ L(ε)
G(zε(s))−G(z0(s))ds
and εL(ε) ≤ T̃ε
→ 0 as ε→ 0.
(4.14)
Note that if G were a smooth function and zε(·) were the flow of a smooth
family of vector fields Z(z, ε) that depended smoothly on ε, then from Gronwall’s
Inequality, it would follow that sup0≤t≤L(ε) |zε(t)− z0(t)| ≤ O(εL(ε)eLip(Z)L(ε)). If
this were the case, then
L(ε)−1
∫ L(ε)
G(zε(s))−G(z0(s))ds
= O(εL(ε)eLip(Z)L(ε)),
which would tend to 0 with ε. Thus, we need a Gronwall-type inequality for billiard
flows. We obtain the appropriate estimates in Section 4.3.2.
Step 6: Use of ergodicity along fibers to control IIk,ε. All that remains
to be shown is that for fixed δ > 0, P
εL(ε)
k=0 |IIk,ε| ≥ δ
→ 0 as ε→ 0.
For initial conditions z ∈ M and for integers k ∈ [0, T/(εL(ε))− 1] define
Bk,ε =
|IIk,ε| >
and k ≤ T̃ε
εL(ε)
Bz,ε = {k : z ∈ Bk,ε} .
Think of these sets as describing “bad ergodization.” For example, roughly speak-
ing, z ∈ Bk,ε if the orbit zε(t) starting at z spends the time between tk,ε and tk+1,ε
in a region of phase space where the function G(·) is “poorly ergodized” on the
time scale L(ε) by the flow z0(t) (as measured by the parameter δ/2T ). Note that
G(z) =
∣ δq⊥1 =Q is not really a function, but that we may still speak of the
convergence of t−1
G(z0(s))ds as t → ∞. As we showed in Lemma 4.2.1, the
limit is Ḡ(h0) for almost every initial condition.
Proceeding as in Step 5 above, we find that it suffices to show that for almost
every h ∈ V,
G(z0(s))ds− Ḡ(h0(0))
→ 0 as t→ ∞.
But this is simply a question of examining billiard flows, and it follows immediately
from Corollary 4.2.2 and our Main Assumption.
4.3.2 A Gronwall-type inequality for billiards
We begin by presenting a general version of Gronwall’s Inequality for billiard
maps. Then we will show how these results imply the convergence required in
Equation (4.14).
Some inequalities for the collision map
In this section, we consider the value of the slow variables to be fixed at h0 ∈ V.
We will use the notation and results presented in Section 4.2.1, but because the
value of the slow variables is fixed, we will omit it in our notation.
Let ρ, γ, and λ satisfy 0 < ρ≪ γ ≪ 1 ≪ λ <∞. Eventually, these quantities
will be chosen to depend explicitly on ε, but for now they are fixed.
Recall that the phase space Ω for the collision map F is a finite union of
disjoint rectangles and cylinders. Let d(·, ·) be the Euclidean metric on connected
components of Ω. If x and x′ belong to different components, then we set d(x, x′) =
∞. The invariant measure ν satisfies ν < const · (Lebesgue measure). For A ⊂ Ω
and a > 0, let Na(A) = {x ∈ Ω : d(x,A) < a} be the a-neighborhood of A.
For x ∈ Ω let xk(x) = xk = F kx, k ≥ 0, be its forward orbit. Suppose x /∈ Cγ,λ,
where
Cγ,λ =
∪λk=0F−kNγ(∂Ω)
∪λk=0F−kNγ(F−1Nγ(∂Ω))
Thus for 0 ≤ k ≤ λ, xk is well defined, and from Equation (4.2) it satisfies
d(x′, xk) ≤ γ ⇒ d(Fx′, xk+1) ≤
const
d(x′, xk). (4.15)
Next, we consider any ρ-pseudo-orbit x′k obtained from x by adding on an
error of size ≤ ρ at each application of the map, i.e. d(x′0, x0) ≤ ρ, and for k ≥ 1,
d(x′k, Fx
k−1) ≤ ρ. Provided d(xj , x′j) < γ for each j < k, it follows that
d(xk, x
k) ≤ ρ
const
≤ const ρ
const
. (4.16)
In particular, if ρ, γ, and λ were chosen such that
const ρ
const
< γ, (4.17)
then Equation (4.16) will hold for each k ≤ λ. We assume that Equation (4.17)
is true. Then we can also control the differences in elapsed flight times using
Equation (4.3):
|ζxk − ζx′k| ≤
const ρ
const
. (4.18)
It remains to estimate the size νCγ,λ of the set of x for which the above estimates
do not hold. Using Lemma 4.3.1 below,
νCγ,λ ≤ (λ+ 1)
νNγ(∂Ω) + νNγ(F−1Nγ(∂Ω))
≤ O(λ(γ + γ1/3)) = O(λγ1/3).
(4.19)
Lemma 4.3.1. As γ → 0,
νNγ(F−1Nγ(∂Ω)) = O(γ1/3).
This estimate is not necessarily the best possible. For example, for dispersing
billiard tables, where the curvature of the boundary is positive, one can show that
νNγ(F−1Nγ(∂Ω)) = O(γ). However, the estimate in Lemma 4.3.1 is general and
sufficient for our needs.
Proof. First, we note that it is equivalent to estimate νNγ(FNγ(∂Ω)), as F has
the measure-preserving involution I(r, ϕ) = (r,−ϕ), i.e. F−1 = I ◦F ◦I [CM06b].
Fix α ∈ (0, 1/2), and cover Nγ(∂Ω) with O(γ−1) starlike sets, each of diameter
no greater than O(γ). For example, these sets could be squares of side length γ.
Enumerate the sets as {Ai}. Set G = {i : FAi ∩ Nγα(∂Ω) = ∅}.
If i ∈ G, F |Ai is a diffeomorphism satisfying ‖DF |Ai‖ ≤ O(γ−α). See Equa-
tion (4.2). Thus diameter (FAi) ≤ O(γ1−α), and so
diameter (Nγ(FAi)) ≤ O(γ1−α).
Hence νNγ(FAi) ≤ O(γ2(1−α)), and νNγ(∪i∈GFAi) ≤ O(γ1−2α).
If i /∈ G, Ai ∩ F−1(Nγα(∂Ω)) 6= ∅. Thus Ai might be cut into many pieces
by F−1(∂Ω), but each of these pieces must be mapped near ∂Ω. In fact, FAi ⊂
NO(γα)(∂Ω). This is because outside F−1(Nγα(∂Ω)), ‖DF‖ ≤ O(γ−α), and so
points in FAi are no more than a distance O(γ/γα) away from Nγα(∂Ω), and γ <
γ1−α < γα. It follows that Nγ(FAi) ⊂ NO(γα)(∂Ω), and νNO(γα)(∂Ω) = O(γα).
Thus νNγ(F−1Nγ(∂Ω)) = O(γ1−2α + γα), and we obtain the lemma by taking
α = 1/3.
Application to a perturbed billiard flow
Returning to the end of Step 5 in Section 4.3.1, let the initial conditions of the slow
variables be fixed at h0 = (Q0,W0, E1,0, E2,0) ∈ V throughout the remainder of this
section. We can assume that the billiard dynamics of the left gas particle in D1(Q0)
are ergodic. Also, fix a particular value of the initial conditions for the right gas
particle for the remainder of this section. Then zε(t) and T̃ε may be thought of as
random variables depending on the left gas particle’s initial conditions y ∈ M1.
Now if hε(t) = (Qε(t),Wε(t), E1,ε(t), E2,ε(t)) denotes the actual motions of the slow
variables when ε > 0, it follows from Equation (4.13) that, provided εL(ε) ≤ T̃ε,
0≤t≤L(ε)
|h0 − hε(t)| = O(εL(ε)). (4.20)
Furthermore, we only need to show that
y ∈ M1 :
∫ L(ε)
G(zε(s))−G(z0(s))ds
and εL(ε) ≤ T̃ε
(4.21)
as ε→ 0, where G is defined in Equation (4.12).
For definiteness, we take the following quantities from Subsection 4.3.2 to de-
pend on ε as follows:
L(ε) = L = log log
γ(ε) = γ = e−L,
λ(ε) = λ =
ρ(ε) = ρ = const
(4.22)
The constant in the choice of ρ and ρ’s dependence on ε will be explained in the
proof of Lemma 4.3.3, which is at the end of this subsection. The other choices may
be explained as follows. We wish to use continuity estimates for the billiard map
to produce continuity estimates for the flow on the time scale L. As the divergence
of orbits should be exponentially fast, we choose L to grow sublogarithmically in
ε−1. Since from Equation (4.4) the expected flight time between collisions with
∂D1(Q0) when ε = 0 is Eνζ = π |D1(Q0)| /(
2E1,0 |∂D1(Q0)|), we expect to see
roughly λ/2 collisions on this time scale. Considering λ collisions gives us some
margin for error. Furthermore, we will want orbits to keep a certain distance,
γ, away from the billiard discontinuities. γ → 0 as ε → 0, but γ is very large
compared to the possible drift O(εL) of the slow variables on the time scale L. In
fact, for each C,m, n > 0,
= O(ε econstL2) → 0 as ε→ 0. (4.23)
Let X : M1 → Ω be the map taking y ∈ M1 to x = X(y) ∈ Ω, the location
of the billiard orbit of y in the collision cross-section that corresponds to the most
recent time in the past that the orbit was in the collision cross-section. We consider
the set of initial conditions
Eε = X−1(Ω\Cγ,λ)
x ∈ Ω :
ζ(F kx) > L
Now from Equations (4.19) and (4.22), νCγ,λ → 0 as ε → 0. Furthermore, by the
ergodicity of F ,
x ∈ Ω :
ζ(F kx) ≤ L
x ∈ Ω : λ−1
ζ(F kx) ≤ Eνζ/2
as ε → 0. But because the free flight time is bounded above, µX−1 ≤ const ·ν, and
so µEε → 1 as ε→ 0. Hence, the convergence in Equation (4.21) and the conclusion
of the proof in Section 4.3.1 follow from the lemma below and Equation (4.23).
Lemma 4.3.2 (Analysis of deviations along good orbits). As ε→ 0,
y∈Eε∩{εL≤T̃ε}
G(zε(s))−G(z0(s))ds
const
+O(L−1) → 0.
Proof. Fix a particular value of y ∈ Eε∩
εL ≤ T̃ε
. For convenience, suppose that
y = X(y) = x ∈ Ω. Let y0(t) denote the time evolution of the billiard coordinates
for the left gas particle when ε = 0. Then there is some N ≤ λ such that the
orbit xk = F
kx = (rk, ϕk) for 0 ≤ k ≤ N corresponds to all of the instances
(in order) when y0(t) enters the collision cross-section Ω = Ωh0 corresponding to
collisions with ∂D1(Q0) for 0 ≤ t ≤ L. We write Ωh0 to emphasize that in this
subsection we are only considering the collision cross-section corresponding to the
billiard dynamics in the domain D1(Q0) at the energy level E1,0. In particular, F
will always refer to the return map on Ωh0 .
Also, define an increasing sequence of times tk corresponding to the actual
times y0(t) enters the collision cross-section, i.e.
t0 = 0,
tk = tk−1 + ζxk−1 for k > 0.
Then xk = y0(tk). Furthermore, define inductively
N1 = inf {k > 0 : tk corresponds to a collision with the piston} ,
Nj = inf {k > Nj−1 : tk corresponds to a collision with the piston} .
Next, let yε(t) denote the time evolution of the billiard coordinates for the left
gas particle when ε > 0. We will construct a pseudo-orbit x′k,ε = (r
k,ε, ϕ
k,ε) of
points in Ωh0 that essentially track the collisions (in order) of the left gas particle
with the boundary under the dynamics of yε(t) for 0 ≤ t ≤ L.
First, define an increasing sequence of times t′k,ε corresponding to the actual
times yε(t) experiences a collision with the boundary of the gas container or the
moving piston. Define
N ′ε = sup
k ≥ 0 : t′k,ε ≤ L
N ′1,ε = inf
k > 0 : t′k,ε corresponds to a collision with the piston
N ′j,ε = inf
k > N ′j−1,ε : t
k,ε corresponds to a collision with the piston
Because L ≤ T̃ε(y)/ε, we know that as long as N ′j+1,ε ≤ N ′ε, then N ′j+1,ε−N ′j,ε ≥ 2.
See the discussion in Subsection 4.2.2. Then we define x′k,ε ∈ Ωh0 by
x′k,ε =
k,ε) if k /∈
N ′j,ε
F−1x′k+1,ε if k ∈
N ′j,ε
Lemma 4.3.3. Provided ε is sufficiently small, the following hold for each k ∈
[0, N∧N ′ε). Furthermore, the requisite smallness of ε and the sizes of the constants
in these estimates may be chosen independent of the initial condition y ∈ Eε ∩
εL ≤ T̃ε
and of k:
(a) x′k,ε is well defined. In particular, if k /∈
N ′j,ε
, yε(t
k,ε) corresponds to a
collision point on ∂D1(Q0), and not to a collision point on a piece of ∂D to
the right of Q0.
(b) If k > 0 and k /∈
N ′j,ε
, then x′k,ε = Fx
k−1,ε.
(c) If k > 0 and k ∈
N ′j,ε
, then d(x′k,ε, Fx
k−1,ε) ≤ ρ and the ϕ coordinate of
k,ε) satisfies ϕ(yε(t
k,ε)) = ϕ
k,ε +O(ε).
(d) d(xk, x
k,ε) ≤ const ρ(const/γ)k .
(e) k = N ′j,ε if and only if k = Nj.
(f) If k > 0, t′k,ε − t′k−1,ε = tk − tk−1 +O(ρ (const/γ)
We defer the proof of Lemma 4.3.3 until the end of this subsection. Assuming
that ε is sufficiently small for the conclusions of Lemma 4.3.3 to be valid, we
continue with the proof of Lemma 4.3.2.
Set M = N ∧ N ′ε − 1. Note that M ≤ λ ∼ L. From (f) in Lemma 4.3.3 and
Equations (4.22) and (4.23), we see that
∣tM − t′M,ε
∣t′k,ε − t′k−1,ε − (tk − tk−1)
∣ = O
constλ
→ 0 as ε → 0.
Because the flight times t′k,ε − t′k−1,ε and tk − tk−1 are uniformly bounded above,
it follows from the definitions of N and N ′ε that tM , t
M,ε ≥ L − const. But from
Subsection 4.2.2, the time between the collisions of the left gas particle with the
piston are uniformly bounded away from zero. Using (c) and Equation (4.20), it
follows that
G(zε(s))−G(z0(s))ds
= O(L−1) +
k∈{Nj :Nj≤M}
2E1,0 cosϕk −
2E1,ε(t
k,ε) cos(ϕ
k,ε +O(ε))
= O(L−1) +
k∈{Nj :Nj≤M}
2E1,0 cosϕk −
2E1,0 cosϕ
k,ε +O(εL)
= O(L−1) +O(εL2) +
2E1,0
k∈{Nj :Nj≤M}
∣cosϕk − cosϕ′k,ε
But using (d),
k∈{Nj :Nj≤M}
∣cosϕk − cosϕ′k,ε
O(ρ(const/γ)k) = O(ρ(const/γ)λ).
Since εL2 = O(ρ(const/γ)λ), this finishes the proof of Lemma 4.3.2.
Proof of Lemma 4.3.3. The proof is by induction. We take ε to be so small that
Equation (4.17) is satisfied. This is possible by Equation (4.23).
It is trivial to verify (a)-(f) for k = 0. So let 0 < l < N ∧N ′ε, and suppose that
(a)-(f) have been verified for all k < l. We have three cases to consider:
Case 1: l − 1 and l /∈
N ′j,ε
In this case, verifying (a)-(f) for k = l is a relatively straightforward application
of the machinery developed in Subsection 4.3.2, because for t′l−1,ε ≤ t ≤ t′l,ε, yε(t)
traces out the billiard orbit between x′l−1,ε and x
l,ε corresponding to free flight in
the domain D1(Q0). We make only two remarks.
First, as long as ε is sufficiently small, it really is true that x′l,ε = yε(t
corresponds to a true collision point on ∂D1(Q0). Indeed, if this were not the
case, then it must be that Qε(t
l,ε) > Q0, and yε(t
l,ε) would have to correspond
to a collision with the side of the “tube” to the right of Q0. But then x
l,ε =
Fx′l−1,ε ∈ Ωh0 would correspond to a collision with an immobile piston at Q0 and
would satisfy d(xk, x
k,ε) ≤ const ρ(const/γ)k ≤ const ρ(const/γ)λ = o(γ), using
Equations (4.16) and (4.23). But xk /∈ Nγ(∂Ωh0), and so it follows that when the
trajectory of yε(t) crosses the plane {Q = Q0}, it is at least a distance ∼ γ away
from the boundary of the face of the piston, and its velocity vector is pointed
no closer than ∼ γ to being parallel to the piston’s face. As Qε(t′l,ε) − Q0 =
O(εL) = o(γ), and it is geometrically impossible (for small ε) to construct a right
triangle whose sides s1, s2 satisfy |s1| ≥∼ γ, |s2| ≤ O(εL), with the measure of
the acute angle adjacent to s1 being greater than ∼ γ, we have a contradiction.
After crossing the plane {Q = Q0}, yε(t) must experience its next collision with
the face of the piston, which violates the fact that l /∈
N ′j,ε
Second, t′l,ε − t′l−1,ε = ζx′l−1,ε +O(εL), because v1,ε = v1,0 +O(εL). See Equa-
tion (4.20). From Equation 4.18,
∣ζxl−1 − ζx′l−1,ε
∣ ≤ O((ρ/γ) (const/γ)l−1). As
tl − tl−1 = ζxl−1 and εL = O((ρ/γ) (const/γ)l−1), we obtain (f).
Case 2: There exists i such that l = N ′i,ε:
For definiteness, we suppose that Qε(t
l,ε) ≥ Q0, so that the left gas particle collides
with the piston to the right of Q0. The case when Qε(t
l,ε) ≤ Q0 can be handled
similarly.
We know that xl−1, xl, xl+1 /∈ Nγ(∂Ωh0)∪Nγ(F−1Nγ(∂Ωh0)). Using the induc-
tive hypothesis and Equation (4.16), we can define
x′′l,ε = Fx
l−1,ε, x
l+1,ε = F
2x′l−1,ε,
and d(xl, x
l,ε) ≤ const ρ(const/γ)l, d(xl+1, x′′l+1,ε) ≤ const ρ(const/γ)l+1. In partic-
ular, x′′l,ε and x
l+1,ε are both a distance ∼ γ away from ∂Ωh0 . Furthermore, when
the left gas particle collides with the moving piston, it follows from Equation (4.8)
that the difference between its angle of incidence and its angle of reflection is
O(ε). Referring to Figure 4.3, this means that ϕ′l,ε = ϕ′′l,ε + O(ε). Geometric
arguments similar to the one given in Case 1 above show that the yε-trajectory
of the left gas particle has precisely one collision with the piston and no other
collisions with the sides of the gas container when the gas particle traverses the
region Q0 ≤ Q ≤ Qε(t′l,ε). Note that x′l,ε was defined to be the point in the
collision cross-section Ωh0 corresponding to the return of the yε-trajectory into
the region Q ≤ Q0. See Figure 4.3. From this figure, it is also evident that
d(r′l,ε, r
l,ε) ≤ O(εL/γ). Thus d(x′′l,ε, x′l,ε) = O(εL/γ), and this explains the choice
of ρ(ε) in Equation (4.22).
From the above discussion and the machinery of Subsection 4.3.2, (a)-(e) now
follow readily for both k = l and k = l + 1. Furthermore, property (f) follows in
much the same manner as it did in Case 1 above. However, one should note that
t′l,ε− t′l−1,ε = ζx′l−1,ε+O(εL)+O(εL/γ) and t′l+1,ε− t′l,ε = ζx′l,ε+O(εL)+O(εL/γ),
because of the extra distance O(εL/γ) that the gas particle travels to the right of
Q0. But εL/γ = O((ρ/γ) (const/γ)l−1), and so property (f) follows.
Case 3: There exists i such that l − 1 = N ′i,ε:
As mentioned above, the inductive step in this case follows immediately from our
analysis in Case 2.
r−coordinate Q0 Qε(t′l,ε)
D1(Q0)
γ/2 γ/2
O(εL)
O(εL/γ)
r′′l,ε
ϕ′′l,ε
r′l,ε
ϕ′l,ε
r′l−1,ε
ϕ′l−1,ε
r′l+1,ε
r′′l+1,ε
Figure 4.3: An analysis of the divergences of orbits when ε > 0 and the left
gas particle collides with the moving piston to the right of Q0. Note that the
dimensions are distorted for visual clarity, but that εL and εL/γ are both o(γ) as
ε→ 0.
Furthermore, ϕ′′l,ε ∈ (−π/2 + γ/2, π/2 − γ/2) and ϕ′l,ε = ϕ′′l,ε + O(ε), and so
r′l,ε = r
l,ε + O(εL/γ). In particular, the yε-trajectory of the left gas particle has
precisely one collision with the piston and no other collisions with the sides of the
gas container when the gas particle traverses the region Q0 ≤ Q ≤ Qε(t′l,ε)
4.4 Generalization to a full proof
of Theorem 4.1.1
It remains to generalize the proof in Sections 4.2 and 4.3 to the cases when n1, n2 ≥
1 and d = 3.
4.4.1 Multiple gas particles on each side of the piston
When d = 2, but n1, n2 ≥ 1, only minor modifications are necessary to generalize
the proof above. As in Subsection 4.2.2, one defines a stopping time T̃ε satisfying
T̃ε < T ∧ Tε
= O(ε) such that for 0 ≤ t ≤ T̃ε/ε, gas particles will only
experience clean collisions with the piston.
Next, define H(z) by
H(z) =
∣v⊥1,j
∣ δq⊥1,j=Q
∣v⊥2,j
∣ δq⊥2,j=Q
∣v⊥1,j
∣ δq⊥1,j=Q
∣v⊥2,j
∣ δq⊥2,j=Q
It follows that for 0 ≤ t ≤ T̃ε/ε, hε(t)−hε(0) = O(ε)+ε
H(zε(s))ds. From here,
the rest of the proof follows the same steps made in Subsection 4.3.1. We note
that at Step 3, we find that H(z) − H̄(h(z)) divides into n1 + n2 pieces, each of
which depends on only one gas particle when the piston is held fixed.
4.4.2 Three dimensions
The proof of Theorem 4.1.1 in d = 3 dimensions is essentially the same as the proof
in two dimensions given above. The principal differences are due to differences in
the geometry of billiards. We indicate the necessary modifications.
In analogy with Section 4.2.1, we briefly summarize the necessary facts for the
billiard flows of the gas particles when M = ∞ and the slow variables are held
fixed at a specific value h ∈ V. As before, we will only consider the motions of one
gas particle moving in D1. Thus we consider the billiard flow of a point particle
moving inside the domain D1 at a constant speed
2E1. Unless otherwise noted,
we use the notation from Section 4.2.1.
The billiard flow takes place in the five-dimensional space M1 = {(q1, v1) ∈
T D1 : q1 ∈ D1, |v1| =
2E1}/ ∼. Here the quotient means that when q1 ∈ ∂D1,
we identify velocity vectors pointing outside of D1 with those pointing inside D1
by reflecting orthogonally through the tangent plane to ∂D1 at q1. The billiard
flow preserves Liouville measure restricted to the energy surface. This measure
has the density dµ = dq1dv1/(8πE1 |D1|). Here dq1 represents volume on R3, and
dv1 represents area on S
v1 ∈ R3 : |v1| =
The collision cross-section Ω = {(q1, v1) ∈ T D1 : q1 ∈ ∂D1, |v1| =
2E1}/ ∼ is
properly thought of as a fiber bundle, whose base consists of the smooth pieces of
∂D1 and whose fibers are the set of outgoing velocity vectors at q1 ∈ ∂D1. This and
other facts about higher-dimensional billiards, with emphasis on the dispersing
case, can be found in [BCST03]. For our purposes, Ω can be parameterized as
follows. We decompose ∂D1 into a finite union ∪jΓj of pieces, each of which
is diffeomorphic via coordinates r to a compact, connected subset of R2 with
a piecewise C3 boundary. The Γj are nonoverlapping, except possibly on their
boundaries. Next, if (q1, v1) ∈ Ω and v1 is the outward going velocity vector, let
v̂ = v1/ |v1|. Then Ω can be parameterized by {x = (r, v̂)}. It follows that Ω it is
diffeomorphic to ∪jΓj × S2+, where S2+ is the upper unit hemisphere, and by ∂Ω
we mean the subset diffeomorphic to (∪j∂Γj × S2+)
(∪jΓj × ∂S2+). If x ∈ Ω,
we let ϕ ∈ [0, π/2] represent the angle between the outgoing velocity vector and
the inward pointing normal vector n to ∂D1, i.e. cosϕ = 〈v̂, n〉. Note that we no
longer allow ϕ to take on negative values. The return map F : Ω 	 preserves the
projected probability measure ν, which has the density dν = cosϕdv̂ dr/(π |∂D1|).
Here |∂D1| is the area of ∂D1.
F is an invertible, measure preserving transformation that is piecewise C2.
Because of our assumptions on D1, the free flight times and the curvature of ∂D1
are uniformly bounded. The bound on ‖DF (x)‖ given in Equation (4.2) is still
true. A proof of this fact for general three-dimensional billiard tables with finite
horizon does not seem to have made it into the literature, although see [BCST03]
for the case of dispersing billiards. For completeness, we provide a sketch of a
proof for general billiard tables in Section 4.6.
We suppose that the billiard flow is ergodic, so that F is ergodic. Again, we
induce F on the subspace Ω̂ of Ω corresponding to collisions with the (immobile)
piston to obtain the induced map F̂ : Ω̂ 	 that preserves the induced measure ν̂.
The free flight time ζ : Ω → R again satisfies the derivative bound given in
Equation (4.3). The generalized Santaló’s formula [Che97] yields
Eνζ =
4 |D1|
|v1| |∂D1|
If ζ̂ : Ω̂ → R is the free flight time between collisions with the piston, then it
follows from Proposition 4.5.1 that
Eν̂ ζ̂ =
4 |D1|
|v1| ℓ
The expected value of
∣ when the left gas particle collides with the (immo-
bile) piston is given by
∣ = Eν̂
2E1 cosϕ =
cos2 ϕdv̂1 =
As a consequence, we obtain
Lemma 4.4.1. For µ− a.e. y ∈ M1,
∣v⊥1 (s)
∣ δq⊥1 (s)=Qds =
3 |D1(Q)|
Compare the proof of Lemma 4.2.1.
With these differences in mind, the rest of the proof of Theorem 4.1.1 when
d = 3 proceeds in the same manner as indicated in Sections 4.2, 4.3 and 4.4.1
above. The only notable difference occurs in the proof of the Gronwall-type in-
equality for billiards. Due to dimensional considerations, if one follows the proof
of Lemma 4.3.1 for a three-dimensional billiard table, one finds that
νNγ(F−1Nγ(∂Ω)) = O(γ1−4α + γα).
The optimal value of α is 1/5, and so νNγ(F−1Nγ(∂Ω)) = O(γ1/5) as γ → 0.
Hence νCγ,λ = O(λγ1/5), which is a slightly worse estimate than the one in Equa-
tion (4.19). However, it is still sufficient for all of the arguments in Section 4.3.2,
and this finishes the proof.
4.5 Inducing maps on subspaces
Here we present some well-known facts on inducing measure preserving transfor-
mations on subspaces. Let F : (Ω,B, ν) 	 be an invertible, ergodic, measure
preserving transformation of the probability space Ω endowed with the σ-algebra
B and the probability measure ν. Let Ω̂ ∈ B satisfy 0 < νΩ̂ < 1. Define
R : Ω̂ → N to be the first return time to Ω̂, i.e. Rω = inf{n ∈ N : F nω ∈ Ω̂}.
Then if ν̂ := ν(· ∩ Ω̂)/νΩ̂ and B̂ := {B ∩ Ω̂ : B ∈ B}, F̂ : (Ω̂, B̂, ν̂) 	 defined
by F̂ω = FRωω is also an invertible, ergodic, measure preserving transforma-
tion [Pet83]. Furthermore Eν̂R =
Rdν̂ = (νΩ̂)−1.
This last fact is a consequence of the following proposition:
Proposition 4.5.1. If ζ : Ω → R≥0 is in L1(ν), then ζ̂ =
n=0 ζ ◦F n is in L1(ν̂),
Eν̂ ζ̂ =
Proof.
ζ ◦ F n dν̂ =
ζ ◦ F n dν =
Ω̂∩{R=k}
ζ ◦ F n dν
Fn(Ω̂∩{R=k})
ζ dν =
ζ dν,
because {F n(Ω̂ ∩ {R = k}) : 0 ≤ n < k <∞} is a partition of Ω.
4.6 Derivative bounds for the billiard map
in three dimensions
Returning to Section 4.4.2, we need to show that for a billiard table D1 ⊂ R3 with
a piecewise C3 boundary and the free flight time uniformly bounded above, the
billiard map F satisfies the following: If x0 /∈ ∂Ω ∪ F−1(∂Ω), then
‖DF (x0)‖ ≤
const
cosϕ(Fx0)
Fix x0 = (r0, v̂0) ∈ Ω, and let x1 = (r1, v̂1) = Fx0. Let Σ be the plane that
perpendicularly bisects the straight line between r0 and r1, and let r1/2 denote
the point of intersection. We consider Σ as a “transparent” wall, so that in a
neighborhood of x0, we can write F = F2 ◦ F1. Here, F1 is like a billiard map in
that it takes points (i.e. directed velocity vectors with a base) near x0 to points
with a base on Σ and a direction pointing near r1. (F1 would be a billiard map if
we reflected the image velocity vectors orthogonally through Σ.) F2 is a billiard
map that takes points in the image of F1 and maps them near x1. Let x1/2 =
F1x0 = F
2 x1. Then ‖DF (x0)‖ ≤ ‖DF1(x0)‖
∥DF2(x1/2)
It is easy to verify that ‖DF1(x0)‖ ≤ const, with the constant depending
only on the curvature of ∂D1 at r0. In other words, the constant may be chosen
independent of x0. Similarly,
∥DF−12 (x1)
∥ ≤ const. Because billiard maps pre-
serve a probability measure with a density proportional to cosϕ, detDF−12 (x1) =
cosϕ1/ cosϕ1/2 = cosϕ1. As Ω is 4-dimensional, it follows from Cramer’s Rule for
the inversion of linear transformations that
∥DF2(x1/2)
const
∥DF−12 (x1)
detDF−12 (x1)
≤ const
cosϕ1
and we are done.
Bibliography
[Ano60] D. V. Anosov. Averaging in systems of ordinary differential equations
with rapidly oscillating solutions. Izv. Akad. Nauk SSSR Ser. Mat.,
24:721–742, 1960.
[BCST03] Péter Bálint, Nikolai Chernov, Domokos Szász, and Imre Péter
Tóth. Geometry of multi-dimensional dispersing billiards. Astérisque,
(286):xviii, 119–150, 2003. Geometric methods in dynamics. I.
[BR98] Leonid A. Bunimovich and Jan Rehacek. On the ergodicity of many-
dimensional focusing billiards. Ann. Inst. H. Poincaré Phys. Théor.,
68(4):421–448, 1998. Classical and quantum chaos.
[BTT07] P. Bálint, B. Tóth, and I. P. Tóth. On the zero mass limit of tagged
particle diffusion in the 1-d Rayleigh gas. Submitted to the Journal of
Statistical Physics, 2007.
[Bun79] L. A. Bunimovich. On the ergodic properties of nowhere dispersing
billiards. Comm. Math. Phys., 65(3):295–312, 1979.
[Cal63] H. B. Callen. Thermodynamics. Wiley, New York, 1963. Appendix C.
[CD06a] N. Chernov and D. Dolgopyat. Brownian brownian motion - I. Memoirs
of the American Mathematical Society, to appear, 2006.
[CD06b] N. Chernov and D. Dolgopyat. Hyperbolic billiards and statistical
physics. In Proceedings of the International Congress of Mathemati-
cians, Madrid, Spain, 2006.
[CDPS96] B. Crosignani, P. Di Porto, and M. Segev. Approach to thermal equilib-
rium in a system with adiabatic constraints. Am. J. Phys., 64(5):610–
613, 1996.
[Che97] N. Chernov. Entropy, Lyapunov exponents, and mean free path for
billiards. J. Statist. Phys., 88(1-2):1–29, 1997.
[Che04] N. Chernov. On a slow drift of a massive piston in an ideal gas that
remains at mechanical equilibrium. Math. Phys. Electron. J., 10:Paper
2, 18 pp. (electronic), 2004.
[CL02] N. Chernov and J. L. Lebowitz. Dynamics of a massive piston in an
ideal gas: oscillatory motion and approach to equilibrium. J. Statist.
Phys., 109(3-4):507–527, 2002. Special issue dedicated to J. Robert
Dorfman on the occasion of his sixty-fifth birthday.
[CLS02] N. Chernov, J. L. Lebowitz, and Ya. Sinai. Scaling dynamics of a
massive piston in a cube filled with ideal gas: exact results. J. Statist.
Phys., 109(3-4):529–548, 2002. Special issue dedicated to J. Robert
Dorfman on the occasion of his sixty-fifth birthday.
[CM06a] N. Chernov and R. Markarian. Chaotic Billiards. Number 127 in Math-
ematical Surveys and Monographs. American Mathematical Society,
2006.
[CM06b] N. Chernov and R. Markarian. Dispersing billiards with cusps: slow
decay of correlations. preprint, 2006.
[Dol05] Dmitry Dolgopyat. Introduction to averaging. Available online at
http://www.math.umd.edu/∼dmitry, 2005.
[GN06] I.V. Gorelyshev and A.I. Neishtadt. On the adiabatic perturbation
theory for systems with impacts. Prikl. Mat. Mekh., 70(1):6–19, 2006.
English translation in Journal of Applied Mathematics and Mechanics
70 (2006) 417.
[GPL03] Christian Gruber, Séverine Pache, and Annick Lesne. Two-time-scale
relaxation towards thermal equilibrium of the enigmatic piston. J.
Statist. Phys., 112(5-6):1177–1206, 2003.
[Gru99] Ch. Gruber. Thermodynamics of systems with internal adibatic con-
straints: time evolution of the adiabatic piston. Eur. J. Phys., 20:259–
266, 1999.
[Kif04a] Yuri Kifer. Averaging principle for fully coupled dynamical systems
and large deviations. Ergodic Theory Dynam. Systems, 24(3):847–871,
2004.
[Kif04b] Yuri Kifer. Some recent advances in averaging. In Modern Dynami-
cal Systems and Applications, pages 385–403. Cambridge Univ. Press,
Cambridge, 2004.
http://www.math.umd.edu/~dmitry
[Lie99] Elliott H. Lieb. Some problems in statistical mechanics that I would
like to see solved. Phys. A, 263(1-4):491–499, 1999. STATPHYS 20
(Paris, 1998).
[LM88] P. Lochak and C. Meunier. Multiphase Averaging for Classical Systems.
Springer-Verlag, New York, 1988.
[LSC02] J. Lebowitz, Ya. G. Sinai, and N. Chernov. Dynamics of a massive
piston immersed in an ideal gas. Uspekhi Mat. Nauk, 57(6(348)):3–86,
2002. English translation in Russian Math. Surveys 57 (2002), no. 6,
1045–1125.
[Nei76] A. I. Neishtadt. Averaging in multi-frequency systems II. Doklady Akad.
Nauk. SSSR Mechanics, 226(6):1295–1298, 1976. English translation in
Soviet Phys. Doklady 21 (1976), no. 2, 80–82.
[NS04] A. I. Neishtadt and Ya. G. Sinai. Adiabatic piston as a dynamical
system. J. Statist. Phys., 116(1-4):815–820, 2004.
[Pet83] Karl Petersen. Ergodic Theory. Cambridge University Press, Cam-
bridge, 1983.
[San76] L. A. Santaló. Integral Geometry and Geometric Probability. Addison
Wesley, Reading, Mass., 1976.
[Sin70] Ya. G. Sinăı. Dynamical systems with elastic reflections. Ergodic prop-
erties of dispersing billiards. Uspehi Mat. Nauk, 25(2 (152)):141–192,
1970.
[Sin99] Ya. G. Sinai. Dynamics of a massive particle surrounded by a finite
number of light particles. Teoret. Mat. Fiz., 121(1):110–116, 1999. En-
glish translation in Theoret. and Math. Phys. 121 (1999), no. 1, 1351-
1357.
[SV85] J. A. Sanders and F. Verhulst. Averaging Methods in Nonlinear Dy-
namical Systems. Springer-Verlag, New York, 1985.
[Vor97] Ya. B. Vorobets. Ergodicity of billiards in polygons. Mat. Sb.,
188(3):65–112, 1997.
[Wri06] Paul Wright. A simple piston problem in one dimension. Nonlinearity,
19:2365–2389, 2006.
[Wri07] Paul Wright. The periodic oscillation of an adiabatic piston in two
or three dimensions. Comm. Math. Phys., 2007. To appear; available
online at http://www.cims.nyu.edu/∼paulrite.
http://www.cims.nyu.edu/~paulrite
	Dedication
	Acknowledgements
	Abstract
	List of Figures
	Introduction
	The adiabatic piston
	Physical motivation for the results
	Background Averaging Material
	The averaging framework
	Some classical averaging results
	A proof of Anosov's theorem
	Moral
	Results for piston systems in one dimension
	Statement of results
	Heuristic derivation of the averaged equation for the hard core piston
	Proof of the main result for the hard core piston
	Proof of the main result for the soft core piston
	Appendix to Section 3.4
	The periodic oscillation of an adiabatic piston in two or three dimensions
	Statement of the main result
	Preparatory material concerning a two-dimensional gas container with only one gas particle on each side
	Proof of the main result for two-dimensional gas containers with only one gas particle on each side
	Generalization to a full proof of Theorem 4.1.1
	Inducing maps on subspaces
	Derivative bounds for the billiard map in three dimensions
	Bibliography
ABSTRACT
  We study a heavy piston of mass $M$ that moves in one dimension. The piston
separates two gas chambers, each of which contains finitely many ideal, unit
mass gas particles moving in $d$ dimensions, where $ d\geq 1$. Using averaging
techniques, we prove that the actual motions of the piston converge in
probability to the predicted averaged behavior on the time scale $M^ {1/2} $
when $M$ tends to infinity while the total energy of the system is bounded and
the number of gas particles is fixed. Neishtadt and Sinai previously pointed
out that an averaging theorem due to Anosov should extend to this situation.
  When $ d=1$, the gas particles move in just one dimension, and we prove that
the rate of convergence of the actual motions of the piston to its averaged
behavior is $\mathcal{O} (M^ {-1/2}) $ on the time scale $M^ {1/2} $. The
convergence is uniform over all initial conditions in a compact set. We also
investigate the piston system when the particle interactions have been
smoothed. The convergence to the averaged behavior again takes place uniformly,
both over initial conditions and over the amount of smoothing.
  In addition, we prove generalizations of our results to $N$ pistons
separating $N+1$ gas chambers. We also provide a general discussion of
averaging theory and the proofs of a number of previously known averaging
results. In particular, we include a new proof of Anosov's averaging theorem
for smooth systems that is primarily due to Dolgopyat.

<|endoftext|><|startoftext|>
Introduction 
While the thermal decomposition of cyclanes has been the subject of several papers, there are only 
few studies about the reactions of polycyclanes and the corresponding kinetic parameters are still very 
uncertain. 
The geometry and the enthalpy of formation of norbornane or bicyclo[2.2.1]heptane (C7H12), a 
bridged bicyclic alkane (Figure 1), have been previously studied in order to relate the structure of the 
molecule to its strain.1,2 A ring strain energy of 17,2 kcal.mol-1 has been estimated for norbornane2 (the 
ring strain energy is defined as the difference between the experimental gas phase enthalpy of formation 
and the gas phase enthalpy estimated using the group additivity method proposed by Benson3 for the 
estimation of thermochemical data). Baldwin and al.4 studied the thermal isomerization of 
3-butenyl-cyclopropane to norbornane. It was observed that 3-butenyl-cyclopropane lead to norbornane 
and many other hydrocarbons when heated at the temperature of 688 K. It was also established that the 
formation of norbornane occurs through the scission of the C2–C3 bond of 3-butenyl-cyclopropane and 
might involve the initial generation of a diradical (Figure 2). 
Figure 1 
Figure 2 
O’Neal and Benson studied the kinetics of pyrolysis of some non-bridged bicyclic alkanes (e.g. 
bicyclo[2.2.0]hexane, bicyclo[3.2.0]heptane) from the point of view of diradical intermediates.5 
Diradical mechanism estimates were found to be consistent with experimental Arrhenius parameters for 
a large number of hydrocarbons. Reaction channels for the fate of diradicals were proposed by Tsang in 
the case of cyclopentane and cyclohexane.6,7 Direct studies of trimethylene and tetramethylene 
diradicals have been performed by Pedersen and al. by using femtosecond laser techniques with mass 
spectrometry in a molecular beam.8 More recently, Sirjean et al. performed quantum calculations about 
the gas phase unimolecular decomposition of cyclobutane, cyclopentane and cyclohexane by using a 
diradical mechanism.9 The theoretical approach used for the calculation was validated by comparing 
calculated results with available experimental data. Several papers about the pyrolysis of 
tricyclo[5.2.1.02,6]decane, a tricyclic alkane, have been published.10-14 Indeed this hydrocarbon is a 
component of synthetic fuels used in aeronautics. A comprehensive primary mechanism of the pyrolysis 
of this species has been developed in our laboratory.14 The reactions of unimolecular initiation of this 
polycyclic compound have been detailed and the reactions of diradicals (decompositions by β-scission, 
internal disproportionnations) have been taken in account on a systematic way.  
 The first purpose of this article is to present new experimental results of the pyrolysis of 
norbornane (solid at room temperature) dissolved in benzene. In line with previous work on 
hydrocarbons,14-17 experiments have been performed in a jet stirred reactor which was operated at 
temperatures between 873 and 973 K, residence times between 1 and 4 s, at a pressure of 106 kPa and at 
high dilution. Conversions ranging from 0.04 to 22.6% have been obtained. The attention has been paid 
to the analysis of the products of the reaction. The formation of 25 major and minor products has been 
observed. The second objective of this paper is to describe the reactions involved in the mechanism of 
the pyrolysis of the norbornane – benzene binary mixture. These reactions include all the possible 
channels of unimolecular initiation of norbornane using a diradical approach as in the case of 
tricyclo[5.2.1.02,6]decane.14 Cross-coupling reactions due to the presence of benzene in the feed of the 
reactor have been also reviewed although it will be shown later in this paper that benzene is not very 
reactive under the operating conditions of this study. 
Experimental Section  
The apparatus (Figure 3) used for the experimental study of the thermal decomposition of norbornane 
(dissolved in benzene) has already been described in two papers about the pyrolysis of 
tricyclo[5.2.1.02,6]decane14 and n-dodecane,15 respectively Its main features are reminded and 
specificities linked to the dissolution of norbornane in benzene are discussed below. 
Figure 3 
Experiments were performed in a continuous quartz jet stirred reactor operated at constant 
temperature (inner volume of about 90 cm3). This reactor was designed to be perfectly stirred for 
residence times ranging from 0.5 to 5 s.18,19 The heating of the reactor was achieved using Thermocoax 
heating resistors coiled around the vessel. Temperature inside the reactor was measured with a type K 
thermocouple which was located at the level of the injection cross at the center of the vessel. Before 
entering the reactor, reactants were preheated to the reaction temperature to avoid the formation of 
temperature gradients inside the gas phase due to the endothermal properties of the pyrolysis reaction. 
The residence time of the reactants inside the annular preheater was very short, i.e. about 1% of the 
residence time inside the reactor. Pressure inside the reactor was set equal to 106 kPa and was 
controlled with a control valve set downstream the products analysis devices. 
Unlike most hydrocarbons having close molecular weights, norbornane (C7H12) is solid at room 
temperature. The melting point of pure norbornane at atmospheric pressure is 360 K20 and its boiling 
point is 381 K.21 In order to study the pyrolysis of norbornane with the same apparatus as that used for 
n-dodecane and tricyclo[5.2.1.02,6]decane,14,15 solid norbornane has been dissolved in a solvent. 
Benzene was chosen since it is a good solvent for many hydrocarbons and because, as an aromatic 
compound, it is very unreactive at low temperature. Norbornane used for the experiments was provided 
by Aldrich (mass fraction purity greater than 0.98) and benzene was provided by Fluka (mass fraction 
purity greater than 0.99). 
The liquid reactant (20 wt% norbornane, 80 wt% benzene) was stored in a pressurized glass vessel. 
Before performing experiments nitrogen bubbling and vacuum pumping were performed in order to 
remove oxygen traces dissolved in the hydrocarbon mixture. The liquid reactant mass flow rate was 
controlled by a mass flow controller, mixed to the carrier gas (helium 99.995% pure) and evaporated in 
a single pass heat exchanger, the temperature of which was set above the boiling point of the diluted 
hydrocarbon mixture. The molar composition of the mixture at the inlet of the reactor was 0.7% 
norbornane, 3.6% benzene and 95.7% helium. 
Products leaving the reactor have been analyzed by gas chromatography. Analyses were performed in 
two steps. Light species (which are gaseous at room temperature such as hydrogen and hydrocarbons 
containing less than 5 carbon atoms) were analyzed on-line by two gas chromatographs. The first 
chromatograph was fitted with a carbosphere packed column and both a flame ionization detector (FID) 
for the detection of methane and C2 hydrocarbons and a thermal conductivity detector (TCD) for the 
detection of hydrogen. Argon was chosen as carrier and reference gas in order to detect hydrogen with a 
better sensibility. A first analysis was performed with a constant oven temperature of 303 K to separate 
the hydrogen peak from the helium peak (experimental carrier gas). A second analysis was performed 
with a constant oven temperature of 473 K for the hydrocarbons separation. Retention times (in min) 
were: methane: 2.4, acetylene: 5.3, ethylene: 7.3 and ethane: 9.6. The second chromatograph used for 
light species analyses was equipped with a FID for the hydrocarbons detection and a Haysep D packed 
column. This column gave a good separation for hydrocarbons from methane to C5 hydrocarbons. In 
particular the peaks corresponding to species like allene and propene or like 1-butene, 2-butene, 1,3-
butadiene and 1-butyne were well defined. Retention times (in min) for species whose formation was 
observed during the study were: methane: 2.6, ethane: 15.5, propene: 61.2, allene: 70.9, propyne: 73.5, 
1-butene: 106.6, 1,3-butadiene: 107.7 and 1,3-cyclopentadiene: 147.8. Species identification and 
calibration were performed with gaseous standard mixtures provided by Air Liquide and Messer. Heavy 
species (hydrocarbons containing more than 5 carbon atoms which are liquid or solid at room 
temperature) were condensed in a trap connected at the outlet of the reactor and maintained at liquid 
nitrogen temperature during a determined period of time. After this time of accumulation, the trap was 
disconnected and solvent (acetone) and a known amount of internal standard (n-octane) were added. 
When the temperature of the trap was back to a temperature close to 273 K the mixture was poured into 
a sampling bottle and then analyzed by gas chromatography. A first analysis was performed with a gas 
chromatograph fitted with a capillary HP-1 column and a FID for the separation and the detection of 
hydrocarbons. Oven temperature profile was set to: 313 K held 30 min, rate 5 K.min-1, 453 K held 62 
min in order to obtain a good separation of the products of the reaction. Retention times of main 
products of the reaction (in min) were: 1,3-cyclopentadiene: 3.7, benzene: 4.9, norbornane: 7.3, 
toluene: 7.9, styrene: 17.4, indene: 38.8, naphthalene: 46.3, biphenyl: 53.3. Calibration was performed 
with prepared solutions containing small amounts of quantified hydrocarbons and of n-octane (internal 
standard). A second analysis was performed for the identification of the products of the reaction with a 
gas chromatography-mass spectrometry system working in the same conditions than the gas 
chromatograph used for quantification (same column, same carrier gas, same carrier gas flow rate, 
same oven temperature profile). This procedure allowed us to obtain the same chromatograms with 
both chromatographs so that direct comparison of the peaks could be performed. Identification of the 
products separated by the HP-1 column was performed by comparison of the mass spectrum 
corresponding to the detected peaks with the numerous mass spectra included in the library NBS 75K 
which was provided by Agilent with the GC-MS apparatus. 
The consistency between the different chromatographic analyses was verified from products which 
were present on two chromatograms (like 1,3-cyclopentadiene, methane and ethane). In each case the 
relative variation between the mole fractions corresponding to the different analyses was less than 5%. 
The repeatability of experimental results has been studied. Calculated maximum uncertainties in the 
experimental mole fractions were ±5% for species analyzed on-line and ±8% for heavy species 
condensed in the trap. Carbon to hydrogen ratios (C/H ratios) in the products have been calculated. For 
this calculation all species have been taken in account except norbornane (this species has no influence 
on the value of the C/H ratio), benzene, toluene, styrene, naphthalene and biphenyl (these four last 
species mainly come from benzene). An average value of 0.60 (± 0.02) has been obtained. This value is 
slightly above the theoretical value (7/12=0.58) but it should be kept in mind that a rigorous distinction 
between products from the norbornane and from the benzene (C/H ratio of 1) is not possible. 
Experimental Results 
Norbornane – benzene binary mixture. The evolution of the conversion of norbornane with 
residence time is shown on Figure 4. Because the values of the difference between the mass flow rates 
of norbornane entering and leaving the reactor were not accurate enough for such low conversions, the 
values of conversion presented on this graph were deduced from the products of the reaction apart from 
aromatic and polyaromatic compounds (toluene, styrene, indene, naphthalene and biphenyl produced in 
very small quantities). Under the conditions of the study, these last products probably derived from 
benzene. At higher conversions aromatic and polyaromatic compounds could be formed through 
secondary reactions from small unsaturated hydrocarbons and from 1,3-cyclopentadiene. No evolution 
of the mass flow rate of benzene was observed between the inlet and the outlet of the reactor under the 
operating conditions of our study. Benzene appeared to be very stable corresponding to very low 
conversions. 
Figure 4 
Twenty five products of the thermal decomposition of the norbornane – benzene binary mixture have 
been analyzed. These products are (by increasing molecular weight): hydrogen, methane, acetylene, 
ethylene, ethane, allene, propyne, propene, 1-butene, 1,3-butadiene, 1,3-cyclopentadiene, 1,3-
cyclohexadiene, 1,4-cyclohexadiene, 5-methyl-1,3-cyclopentadiene, 1,3,5-hexatriene, toluene, 3-ethyl-
cyclopentene, ethenyl-cyclopentane, 4-methyl-cyclohexene, methylene-cyclohexane, styrene, indene, 
naphthalene and biphenyl. An unidentified minor product (molecular weight of 94 g.mol-1 according to 
mass spectroscopy) has been detected between toluene and styrene. 
It is worth noticing that the formation in small amounts of several species having the same molecular 
weight as norbornane has been observed: 3-ethyl-cyclopentene, ethenyl-cyclopentane, 4-methyl-
cyclohexene and methylene-cyclohexane (Figure 5). The evolution of the mole fractions of these four 
products with residence time is shown on Figure 6. The possible channels of formation of these 
particular species which were observed even at very low conversion (less than 0.5%) will be discussed 
later in this paper. The formation of very small quantities of aromatic and polyaromatic compounds 
such as toluene, styrene, indene, naphthalene and biphenyl was also observed. These species probably 
come from reactions of benzene or from cross-coupling reactions of the norbornane – benzene binary 
mixture. The presence of benzene in the feed of the reactor masked the possible formation of small 
quantities of this species as specific product from norbornane. 
Figure 5 
Figure 6 
Figure 7 displays the distribution of the products in term of selectivity (here the selectivity of a 
product is defined as the ratio of the mole fraction of the considered product and the sum of the mole 
fractions of all products) at a temperature of 973 K and a residence time of 1 s. This figure shows that 
the three main products of the reaction are hydrogen, ethylene and 1,3-cyclopentadiene which are 
formed in similar quantities. Figure 8 shows the evolution of the mole fraction with residence time of 
these three main products, as well as methane, propene, 1,3-butadiene, toluene and biphenyl. 
Figure 7 
Figure 8 
The primary products of the reaction of the thermal decomposition of norbornane dissolved in 
benzene were determined from a study of the selectivity performed at a temperature of 953 K 
(corresponding to a maximum conversion of 15%). A species is probably a primary product if the 
extrapolation to origin of its selectivity versus residence time gives a value different from zero 
(corresponding to a non zero initial rate of production). According to this study 15 species seem to be 
primary products: hydrogen, methane, ethylene, ethane, propene, 1-butene, 1,3-butadiene, 
1,3-cyclopentadiene, 1,3-cyclohexadiene, toluene, 3-ethyl-cyclopentene, 4-methyl-cyclohexene, 
methylene-cyclohexane, ethenyl-cyclopentane and biphenyl. The values of selectivities to origin of 
these products are given in Table 1. Species with the highest selectivities at origin are hydrogen (0.346), 
ethylene (0.314) and 1,3-cyclopentadiene (0.184). Unlike toluene and biphenyl (Figure 9a) other 
aromatic and polyaromatic compounds (styrene, indene and naphthalene) do not seem to be primary 
products (extrapolations to origin of their selectivities versus residence time are close to zero; Figure 
9b). This can be explained by the fact that toluene and biphenyl can be obtained directly from phenyl 
radicals derived from benzene (by combination of phenyl radical and methyl radical for toluene, by self-
combination of phenyl radicals or by ipso addition of phenyl on benzene for biphenyl) whereas styrene, 
indene and naphthalene can not be directly formed from primary radicals generated by the 
decomposition of norbornane and benzene. Amongst species having the same molecular weight as 
norbornane 3-ethyl-cyclopentene has the largest selectivity to origin.  
Table 1 
Figure 9 
Influence of the benzene on the kinetic of the reaction. A short study of the thermal decomposition 
of pure benzene was performed in order to determine if benzene plays a role in the kinetics of the 
reaction of pyrolysis of the norbornane – benzene binary mixture. Experiments were performed at 
temperatures between 913 and 973 K, at residence times ranging from 1 to 4 s and at a pressure of 106 
kPa. The molar composition of the flow entering the reactor was 96.4% helium and 3.6% benzene (the 
mole fraction of benzene at the inlet of the reactor was fixed to the same value as in the case of the 
study of the binary mixture norbornane – benzene for direct comparison). 
Figure 10 shows the evolution of the conversion of benzene with residence time at temperatures 
between 913 and 973 K (the conversion were deduced from the products of the reaction). Under these 
operating conditions benzene appeared to be very stable. A maximum conversion of benzene of 8×10-2 
% was obtained at a temperature of 973 K and a residence time of 1 s. For comparison the conversion of 
norbornane (dissolved in benzene) was 9.7% under the same conditions. 
Figure 10 
 The only product of the reaction which was detected was biphenyl. Hydrogen is probably another 
product of the reaction22-25 but it was not detected (TCD is known for being a much less sensitive 
detector than the FID). The formation of toluene, styrene, indene and naphthalene was not observed. 
These species which were observed in the case of the norbornane – benzene binary mixture were 
probably formed from cross-coupling reactions or from specific reactions of norbornane. 
Mole fractions of biphenyl which were obtained in both studies (pure benzene and norbornane – 
benzene binary mixture) have been compared. The two graphs of Figure 11 display the evolutions of the 
mole fractions of biphenyl with residence time (Figure 11a) and with temperature (Figure 11b). This 
figure shows that the mole fraction of biphenyl is always slightly larger in the case of the norbornane – 
benzene binary mixture than in the case of benzene (apart from the experiment leading to the lowest 
conversion and performed with a temperature of 913 K and a residence time of 1 s; Figure 11b). It can 
also be observed that the variation between the mole fractions obtained during the two studies increases 
with conversion. 
While interactions between the two hydrocarbons do exist during the thermal decomposition of the 
norbornane dissolved in benzene, the conditions of our study (temperatures less than 973 K) are such 
that the presence of benzene has a negligible influence on the reactions of norbornane. In a recent 
paper26 El Balkali et al. showed that in the case of the oxidation of an equimolar n-heptane – benzene 
binary mixture the presence of benzene had very little influence on the reactivity at low temperature. 
Figure 11 
Comparison with the reaction of pyrolysis of tricyclo[5.2.1.02,6]decane. Experimental results 
obtained during this study have been compared with previous ones obtained with 
tricyclo[5.2.1.02,6]decane,14 a tricyclic alkane the structure of which contains the structure of norbornane 
(Figure 12). This tricyclic alkane can be considered as a norbornane structure sharing two adjacent 
carbon atoms with a cyclopentane structure. 
Figure 12 
Figure 13 displays the evolution of the conversions of norbornane (dissolved in benzene) and 
tricyclo[5.2.1.02,6]decane obtained at temperatures ranging from 873 to 973 K, at a residence time of 1 
s and at a pressure of 106 kPa. The mole percentages of norbornane and tricyclo[5.2.1.02,6]decane at the 
inlet of the reactor were set equal to 0.7%. This figure shows that the reactivities of the two polycyclic 
alkanes are very similar. 
Figure 13 
The thermal decomposition of tricyclo[5.2.1.02,6]decane leads to the formation of large amounts of 
hydrogen, ethylene, propene, 1,3-cyclopentadiene and cyclopentene.14 While hydrogen, ethylene and 
1,3-cyclopentadiene were also amongst the main products of the thermal decomposition of norbornane, 
propene appeared to be a minor product and the formation of cyclopentene was not observed. An 
analysis of the kinetic model of the pyrolysis of tricyclo[5.2.1.02,6]decane14 shows that cyclopentene and 
allyl radicals (precursors of propene) mainly come from the cyclopentane part of the structure of 
tricyclo[5.2.1.02,6]decane. 
Discussion 
Most of the reactions involved in the pyrolysis of polycyclanes are still badly known and the related 
kinetic parameters are still very uncertain. We describe here the reactions involved in the mechanism of 
the thermal decomposition of the norbornane – benzene binary mixture and the possible channels of 
formation of the products of the reaction are discussed 
Unimolecular initiations by bond scission of norbornane. Fate of diradicals. Unlike linear and 
branched alkanes for which two free radicals are directly obtained, unimolecular initiations of 
polycyclic alkanes by breaking of a C–C bond lead to the formation of diradicals (species with two 
radical centers). The molecule of norbornane (bicyclic alkane) has three different C–C bonds. The 
unimolecular initiations can lead to the formation of the three diradicals BR1, BR2 and BR3 shown on 
Figure 14. According to O’Neal and Benson5 the activation energies of these reactions are given by the 
expression: E1=∆H(C–C)-∆ETC+E-1, where E1 is the activation energy of the reaction of opening of the 
cycle, ∆H(C–C) is the bond energy of the broken C–C bond, ∆ETC is the difference of ring strain energy 
between the products and the reactants, and E-1 is the activation energy of the reverse reaction of closure 
of the diradical. If it is considered that the ring strain energy of norbornane2 is equal to 17.2 kcal.mol-1, 
that diradicals BR1 and BR2 have the same ring strain energy than cyclopentane (6,3 kcal.mol-1) and 
that BR3 has the same ring strain energy than cyclohexane (0 kcal.mol-1), the terms ∆ETC are equal to 
10.9 kcal.mol-1 for BR1 and BR2 and to 17.2 kcal.mol-1 for BR3. If we approximate that the sum ∆H(C–
C)+E-1 is equal to about 87 kcal.mol-1 (this was observed for cyclopropane, cyclobutane, cyclopentane 
and cyclohexane14) we obtain activation energies of 76.1 kcal.mol-1 for BR1 and BR2 and of 69.8 
kcal.mol-1 for BR3. 
Figure 14 
In previous studies of the pyrolysis of cyclanes and polycyclanes6,7,9,14 it has been shown that 
diradicals could react through three ways: 
(1) by combination to give back the initial (poly)cyclane; this reaction is the reverse step of an 
unimolecular initiation by C–C bond scission. 
(2) by internal disproportionnation through a (poly)cyclic transition state intermediate; an unsaturated 
molecule is then obtained. 
(3) by decomposition by β–scission; products of the reaction depend on the position of the two radical 
centers. In most cases, a smaller diradical and a molecule are obtained. 
Figure 15 
Kinetic parameters of these reactions have been estimated for cyclobutane, cyclopentane and 
cyclohexane by quantum calculation by Sirjean et al..9 This study showed that the easiest reaction is the 
reverse reaction by combination of the diradical formed by the unimolecular initiation (this is why 
cyclanes and polycyclanes present a greater stability than linear and branched alkanes). If we except this 
last reaction, the internal disproportionnation is largely easier than the decomposition by β–scission 
(apart from the particular case of cyclobutane in which the β–scission is easier because the broken C–C 
bond is in β position of the two radical centers). It is worth noticing that in the early stage of the 
reaction, (poly)cyclanes mainly lead to the formation of molecular species through diradicals and that 
they do not lead directly to the formation of free radicals. Thus at very low conversion the concentration 
of radicals is very low and the primary molecular initiation products from the reactant mainly react by 
unimolecular initiations to form new diradicals or free radicals. 
Figure 15 displays the possible internal disproportionnations and decompositions by β–scission of 
diradicals BR1, BR2 and BR3 of Figure 14. For example BR2 can reacts by three different reactions of 
β–scission (to form two new diradicals of the same size and a smaller diradical with ethylene) and by 
three internal disproportionnations through bicyclic transition state intermediates (to form three 
unsaturated molecular species). New diradicals obtained by β–scission from BR1, BR2 and BR3 react 
in their turn by reactions of combination, disproportionnation and β–scission. Molecular species 
obtained through internal disproportionnation (which are the main products obtained from diradicals) 
react by unimolecular initiations to form diradicals and/or free radicals.  
During experiments the formation of small amounts of 3-ethyl-cyclopentene, ethenyl-cyclopentane, 4-
methyl-cyclohexene and methylene-cyclohexane (corresponding to MA4, MA2, MA6 and MA5 on the 
scheme of Figure 15) has been observed in the early stage of the reaction. These species can be obtained 
through internal disproportionnations of diradicals generated by the unimolecular initiations of 
norbornane and/or through metatheses of radicals involved in transfer and propagation reactions of the 
decomposition of the norbornyl radicals (but this last source of formation is in competition with more 
probable reactions of β–scission and isomerization). Thus 3-ethyl-cyclopentene, ethenyl-cyclopentane, 
4-methyl-cyclohexene and methylene-cyclohexane likely come from the initiation step of norbornane 
(this is in accordance with the observation of the formation of these species at very low conversion). 
The formation of 1-methylene,3-methyl-cyclopentane (MA1 on Figure 15) has not been observed under 
the conditions of our study. This is probably because the diradical BR1 is more likely to react by 
reaction of termination by combination to give back norbornane. Unlike 3-ethyl-cyclopentene and 
ethenyl-cyclopentane the formation of 4-ethyl-cyclopentene (MA3 on Figure 15) was not observed. 
This may be explained by the fact that the bicyclic structure of the transition state which connects 
diradical BR2 and MA3 is more strained than the transition states which connect BR2 with MA2 and 
MA4. Study of the selectivity of the products of the reaction showed that MA2, MA4, MA5 and MA6 
seemed to be primary products (extrapolations of their selectivities to origin gave values different from 
zero). Among these four species MA4 (3-ethyl-cyclopentene) has the highest selectivity at origin (Table 
1) which also means that it has the highest initial rate of formation. This let us suppose that the 
unimolecular initiation of norbornane to diradical BR2 followed by the internal disproportionnation to 
MA4 is the easiest path of the initiation step. 
Possible unimolecular initiations of 3-ethenyl-cyclopentene (main initiation product from norbornane) 
are given as an example (Figure 16). The breaking of the two vinylic C–C bonds of MA4 are not written 
on the scheme of Figure 16 because the bond dissociation energy is much higher than those of alkylic 
and allylic C–C bonds. 
Figure 16 
Reactions of unimolecular initiation by breaking of C–H bonds can also be considered but these 
reactions are more difficult than reactions of unimolecular initiation by breaking of C–C bonds. These 
reactions can lead to the formation of the three norbornyl radicals shown on Figure 17; but this channel 
of formation is negligible compared to the reactions of metathesis of radicals from norbornane (see 
below). 
Figure 17 
Transfer and propagation reactions of norbornyl radicals. The molecule of norbornane owns 
three different carbon atoms. Reactions of metathesis of hydrogen atoms and radicals with norbornane 
lead to the formation of three norbornyl radicals (Figure 18). 
Figure 18 
Figure 19 
The three norbornyl radicals can react by decompositions by β–scission to lead to the formation of six 
cyclic radicals (Figure 19). These six new radicals can then react by decompositions by β–scission, by 
isomerizations and by metatheses (H abstractions) with molecules. Figure 20 displays the reactions of 
β–scission, metathesis and isomerization of radical R6 from Figure 19. Only isomerizations involving 
allylic hydrogen atoms have been taken in account on this scheme. The metatheses of radical R6 with 
molecules lead to the formation of 3-ethenyl-cyclopentane (MA4) but these reactions are less probable 
that the reactions of β–scission and isomerization. 
Estimations of activation energies of reactions of β–scission by opening of the ring of cyclopentyl and 
cyclohexyl radicals9,27 showed that the values used for linear and branched alkyl radicals28-30 cannot be 
used in a systematic way for cycloalkyl radicals.14 Norbornyl radicals have a more complex structure 
than cyclopentyl and cyclohexyl radicals and activation energies of their reactions of β–scission by 
opening of the ring are very uncertain. Moreover for some (poly)cyclic radicals (e.g. cyclopentyl 
radicals) the activation energy of the β–scission of C–C bonds may be much higher than the value used 
for linear and ramified alkanes28-30 and the reactions of β–scission of C–H bonds may become 
competitive with the reactions of β–scission of C–C bonds. 
Figure 20 
Uncertainties on the kinetic parameters of reactions involved in the transfer and propagation steps of 
the norbornyl radicals make the discussion difficult at this stage of the study. Nevertheless it can be 
noticed that some radicals (like the radicals R6 and R7 of Figure 19) can lead to the formation of 
ethylene and cyclic C5 radicals which are precursors of 1,3-cyclopentadiene. 
Figure 21 
Reactions of benzene. Benzene is known for being a very stable hydrocarbon. At low temperature 
primary reactions of the pyrolysis of benzene are rather simple.31 The only reaction of unimolecular 
initiation is the breaking of a C–H bond (bond energy of 110.9 kcal.mol-1). Breaking of a benzylic C–C 
bond of benzene is very difficult because of the high energy of this type of bond (120.8 kcal.mol-1). 
Bimolecular initiation (reverse reaction of a termination by disproportionnation) consists in the transfer 
of an hydrogen atom from a molecule of benzene to another. This step leads to the formation of a 
phenyl radical and a 2,4-cyclohexadien-1-yl radical. Activation energy of this reaction of bimolecular 
initiation (close to the enthalpy of the reaction: 94.4 kcal.mol-1) is rather high. Under the conditions of 
our study, unimolecular and bimolecular initiations of benzene are probably negligible compared to 
unimolecular initiations of the norbornane which have lowest activation energies (part of the strain 
energy of norbornane is recovered during ring opening3). 
The initiation step mainly leads to the formation of hydrogen atoms and phenyl radicals (Figure 21). 
Hydrogen atoms can reacts by reaction of metathesis with benzene to form an hydrogen molecule and a 
phenyl radical, by self combination to form hydrogen molecule (this trimolecular step is negligible) and 
by addition to benzene to lead to the 2,4-cyclohexadien-1-yl radical. Phenyl radicals can react by self 
combination or by addition to benzene (followed by the loss of an atom of hydrogen) to form biphenyl. 
Sivaramakrishnan et al.32 observed the formation of acetylene and diacetylene from the decomposition 
by C–C bond β–scission of phenyl radical under extreme conditions (high temperature, high pressure 
benzene pyrolysis study behind reflected shock waves). 2,4-cyclohexadien-1-yl radical can react by C–
H bond β–scission to give benzene, it can decompose by C–C bond β–scission to give 1,3,5-hexatrien-
1-yl radicals and then acetylene and 1,3-butadien-1-yl radicals and can lead to 1,3-cyclohexadiene and 
1,4-cyclohexadiene by metathesis on molecule. But under the conditions of the present study of benzene 
pyrolysis, i.e. at low temperature (below 973 K) and close to atmospheric pressure, decomposition of 
phenyl and 2,4-cyclohexadien-1-yl radicals does not occur and the formation of acetylene and 
unsaturated C4 hydrocarbons was not observed. This is in agreement with the observation of the 
formation of mainly hydrogen and biphenyl by Brooks and al. who studied the pyrolysis of benzene at 
temperatures between 873 and 1036 K in a static reactor.25 According to Brioukov et al.31, which 
performed the analysis of the experimental results obtained by Brooks and al.25, the decomposition of 
benzene is dominated by the unimolecular initiation generating an hydrogen atom and a phenyl radical 
followed by the short propagation chain composed of the reaction of metathesis of an hydrogen atom 
with benzene and the reaction of ipso addition of a phenyl radical to benzene leading to biphenyl and an 
hydrogen atom. 
Cross-coupling reactions of the benzene – norbornane binary mixture. Comparison between 
experimental results obtained during the study of the pyrolysis of pure benzene and that of norbornane 
dissolved in benzene showed that there are low interactions between the two hydrocarbons. There are 
very few possibilities of cross-coupling reactions (mainly reactions of metathesis and reactions of 
termination) because the primary mechanism of pyrolysis of benzene generates few species. 
Bimolecular initiations involving norbornane and benzene molecules lead to the formation of the three 
norbornyl radicals and of 2,4-cyclohexadien-1-yl radical (Figure 22). Activation energies of these 
reactions (between 80.5 and 89.4 kcal.mol-1 according to the transferred hydrogen atom) are little higher 
than those of unimolecular initiation of norbornane but lower than the unimolecular initiation of 
benzene and than the bimolecular initiation of two molecules of benzene. Hydrogen atoms and radicals 
deriving from benzene (phenyl radicals and 2,4-cyclohexadien-1-yl radicals) can react by metatheses 
with norbornane to form the three norbornyl radicals (hydrogen, benzene, 1,3-cyclohexadiene and 1,4-
cyclohexadiene are obtained respectively). In the same way radicals deriving from norbornane (more 
numerous than in the case of benzene) can react by metatheses with benzene to lead to phenyl radicals 
(Figure 22). Reactions of termination between radicals deriving from the two hydrocarbons can explain 
the formation of some products like toluene: this last species can be obtained from the combination 
between a phenyl radical (deriving from benzene) and a methyl radical (deriving from norbornane and 
not from benzene at such low temperature). 
Figure 22 
Conclusions 
New experimental results of the thermal decomposition of norbornane dissolved in benzene have been 
obtained in a jet stirred reactor. A great attention was paid to the identification and the quantification of 
the products of the reaction. The formation of 25 both major and minor species has been observed 
during the experiments. Main products were hydrogen, ethylene and 1,3-cyclopentadiene. The detection 
of minor species having the same molecular weight as norbornane gave interesting information about 
the reactions of unimolecular initiation of norbornane. The study of the selectivities of the products of 
the reaction showed that 15 species were probably primary products and values of extrapolations to 
origin of selectivities let us think that the easiest initiation of the norbornane leads to the formation of 
3-ethenyl-cyclopentene. The use of benzene as solvent of norbornane appeared to be a good choice 
because benzene was very unreactive under the operating conditions of our study: firstly interactions 
between the two hydrocarbons remained low and the reactivity of norbornane was little affected by the 
presence of benzene; secondly unimolecular initiations of norbornane (most probable initiation steps) 
were not masked by initiations involving benzene and the formation of molecular product from the 
diradicals generated through the initiations could be observed. 
Reactions that occur during the pyrolysis of norbornane and during the pyrolysis of benzene have 
been reviewed and described. Cross coupling reactions in the case of the norbornane – benzene binary 
mixture are rather limited because the decomposition of benzene generates only few species at low 
temperature. The kinetics of the reactions of unimolecular initiation by breaking of the C-C bonds 
which are part of a ring structure, reactions of the diradicals (termination by combination, termination 
by disproportionnation and decomposition by β-scission) and reactions of decomposition by β-scission 
leading to the opening of a ring (e.g. reactions of β-scission of the norbornyl radicals) will require to be 
better investigated with reliable estimations of the kinetic parameters of these specific and sensitive 
reactions. 
ACKNOWLEDGMENT 
This work was supported by MBDA-France and the CNRS. We are grateful to E. Daniau, M. 
Bouchez, and F. Falempin for helpful discussion. 
REFERENCES  
1. Doms, L.;Van den Enden, L.; Geise, H. J.;  Van Alsenoy, C. J. Am. Chem. Soc. 1983, 105, 158-162. 
2. Vervkin, S. P.; Emel’yanenko, V. N. J. Phys. Chem. A 2004, 108, 6575-6580. 
3. Benson, S. W. Thermochemical Kinetics, 2nd ed; John Wiley: New York, 1976. 
4. Baldwin, J. E.; Burrell, R. C.; Shukla, R. Org. Lett. 2002, 4, 3305-3307. 
5. O’Neal, H. E.; Benson, S. W. Int. J. Chem. Kinet. 1970, 2, 423-456. 
6. Tsang, W. Int. J. Chem. Kinet. 1978, 10, 599-617. 
7. Tsang, W. Int. J. Chem. Kinet. 1978, 10, 1119-1138. 
8. Pedersen, S.; Herek, J. L.; Zewail, A. H. Science 1994, 266, 1359-1364. 
9. Sirjean, B.; Glaude, P. A.; Ruiz-Lopez, M. F.; Fournet R. J. Phys. Chem. A 2006, 110, 12693-
12704. 
10. Striebich, R. C.; Lawrence, J. J. Anal. Appl. Pyrol. 2003, 70, 339-352. 
11. Rao, P. N.;  Kunzru, D. J. Anal. Appl. Pyrol. 2006, 76, 154-160. 
12. Nakra, S.; Green, R. J.; Anderson, S. L., Combust. Flame 2006, 144, 662-674. 
13. Davidson, D. F.; Horning, D. C.; Oelschlaeger, M. A.; Hanson, R. K.; 37th Joint Propulsion 
Conference, Salt Lake City, UT, 2001; AIAA-01-3707. 
14. Herbinet, O.; Sirjean, B.; Bounaceur, R.; Fournet, R.; Battin-Leclerc, F.; Scacchi, G.; Marquaire, P. 
M. J. Phys. Chem. A 2006, 110, 11298-11314. 
15. Herbinet, O.; Marquaire, P. M.; Battin-Leclerc, F.; Fournet, R. J. Anal. Appl. Pyrol. 
doi:10.1016/j.jaap.2006.10.010. 
16. Chambon, M.; Marquaire, P. M.; Come, G. M. C1 Mol. Chem. 1987, 2, 47-59. 
17. Ziegler, I.; Fournet, R.; Marquaire, P. M. J. Anal. Appl. Pyrol., 2005, 73, 107-115. 
18. Matras, D.; Villermaux, J. Chem. Eng. Sci. 1973, 28, 129-137. 
19. David, R.; Matras, D. Can. J. Chem. Eng. 1975, 53, 297-300. 
20. Burwell, R. L.; Shim, B. K. C.; Rowlinson, H. C., J. Am. Chem. Soc. 1957, 79, 5142-5148. 
21. Desty, D. H.; Whyman, B. H. F. Anal. Chem. 1957, 29, 320-329. 
22. Mead, F. C.; Burk, R. E. Ind. Eng. Chem. 1935, 27, 299-301. 
23. Hou, K. C.; Palmer, H. B. J. Phys. Chem. 1965, 69, 863-868. 
24. Louw, R.; Lucas, H. J. Recueil des Travaux Chimiques des Pays-Bas 1973, 922, 55-71. 
25. Brooks, C. T.; Peacock, S. J.;  Reuben, B. G. J. Chem. Soc. Faraday Trans. I 1979, 75, 652-62. 
26. El Balkali, A.; Ribaucour, M.; Saylam, A.; Vanhove, G.; Thersen, E.; Pauwels, J. F. Fuel 2006, 85, 
881-895. 
27. Handford-Styring, S. M.; Walker, R. W.  J. Chem. Soc., Faraday Trans. 1995, 91, 1431-1438. 
28. Buda, F.; Bounaceur, R.; Warth, V.; Glaude, P. A.; Fournet, R.; Battin-Leclerc, F. Combust. Flame 
2005, 142, 170-186. 
29. Warth, V.; Stef, N.; Glaude, P. A.; Battin-Leclerc, F.; Scacchi G.; Côme, G. M. Combust. Flame 
1998, 114, 81-102. 
30. Dahm, K. D.; Virk, P. S.; Bounaceur, R.; Battin-Leclerc, F.; Marquaire, P. M.; Fournet, R.; Daniau, 
E.; Bouchez, M. J. Anal. Appl. Pyrol. 2004, 71, 865-881. 
31. Brioukov, M. G.; Park, J.; Lin, M. C. Int. J. Chem. Kinet. 1999, 31, 577-582. 
32. Sivaramakrishnan, R.; Brezinsky, K.; Vasudevan, H.; Tranter, R. S. Combust. Sci. and Tech. 2006, 
178, 285-305. 
Table 1. Values of selectivities at origin of primary products of the reaction of pyrolysis of norbornane 
in benzene at a temperature of 953 K. 
Species Selectivities to origin 
hydrogen 0.346 
methane 0.011 
ethylene 0.314 
ethane 0.004 
propene 0.020 
1-butene 0.006 
1,3-butadiene 0.035 
1,3-cyclopentadiene 0.184 
1,3-cyclohexadiene 0.013 
toluene 0.004 
3-ethyl-cyclopentene 0.028 
4-methyl-cyclohexene 0.009 
methylene-cyclohexane 0.007 
ethenyl-cyclopentane 0.004 
biphenyl 0.012 
Total 0.997 (theoretical value: 1) 
Figure 1. Structure of norbornane (bicyclo[2.2.1]heptane). 
Figure 2. Isomerization of 3-butenyl-cyclopropane to norbornane through a diradical intermediate as 
proposed by Baldwin et al..4
Nitrogen
Vacuum 
Liquid hydrocarbon supply
Reactor pressure
control valve
Liquid
hydrocarbon
Jet Stirred Reactor
Annular preheating zone
Controlled
Evaporator
and Mixer
Pressurized vessel
containing the mixture of
benzene and norbornane
Gas mass flow
controller
Liquid mass flow
controller
Gaseous product sampling line
Inert gas supply
Liquid nitrogen bath
Ice bath
On-line
GC analysis
Nitrogen
Vacuum 
Liquid hydrocarbon supply
Reactor pressure
control valve
Liquid
hydrocarbon
Jet Stirred Reactor
Annular preheating zone
Controlled
Evaporator
and Mixer
Pressurized vessel
containing the mixture of
benzene and norbornane
Gas mass flow
controller
Liquid mass flow
controller
Gaseous product sampling line
Inert gas supply
Liquid nitrogen bath
Ice bath
On-line
GC analysis
Figure 3. Experimental apparatus flow sheet. 
43210
Residence time (s)  
Figure 4. Evolution of the conversion of norbornane with residence time. (  873 K,  893 K,  913 
K,  933 K,  953 K,  973 K). 
(a) (b) (d)(c)  
Figure 5. Structure of products having the same molecular weight as norbornane. (a) 
3-ethyl-cyclopentene, (b) ethenyl-cyclopentane, (c) methylene-cyclohexane and (d) 
4-methyl-cyclohexene. 
5x10-5
43210
Residence time (s)
6x10-5
n (a)
2.5x10-5
43210
Residence time (s)
2.5x10-5
Figure 6. Evolution of the mole fractions of products having the same molecular weight as norbornane. 
(a) 3-ethyl-cyclopentene, (b) ethenyl-cyclopentane, (c) methylene-cyclohexane, (d) 4-methyl-
cyclohexene.     (  873 K,  893 K,  913 K,  933 K,  953 K,  973 K). 
hydrogen
ethane
acetylene
ethylene
ethane
propene
allene
propyne
1-butene
1,3-butadiene
1,3-cyclopentadiene
1,3,5-hexatriene
1,4-cyclohexadiene
ethyl-1,3-cyclopentadiene
1,3-cyclohexadiene
toluene
styrene
indene
naphtalene
biphenyl
3-ethyl-cyclopentene
ethenyl-cyclopentane
ethyl-cyclohexene
ethylene-cyclohexane
unidentify product
T=973 K
   τ=1s
Figure 7. Distribution of the products of the reaction at a temperature of 973 K and at a residence time 
of 1 s. 
1.6x10-4
1.6x10-4
6x10-5
43210
Residence time (s)
3.0x10-4
1.0x10-4
43210
Residence time (s)
1.2x10-3
1.6x10-3
1.6x10-3
n (a)
Figure 8. Evolution of the mole fractions of some products of the reaction with residence time. (a) 
hydrogen, (b) ethylene, (c) 1,3-cyclopentadiene, (d) methane, (e) propene, (f) 1,3-butadiene, (g) toluene, 
(h) biphenyl.     (  873 K,  893 K,  913 K,  933 K,  953 K,  973 K). 
2.0x10-2
43210
Residence time (s)
 biphenyl
 toluene
6x10-3
43210
Residence time (s)
 styrene
 naphtalene
 indene
(b)  
Figure 9. Evolution of the selectivities of (a) biphenyl and toluene and (b) styrene, indene and 
naphthalene with residence time (at a temperature of 953 K). 
43210
Residence time (s)
 973 K
 953 K
 933 K
 913 K
Figure 10. Evolution of the conversion of benzene with residence time. 
3.0x10-5
0.99 1.96 2.96 3.92
Residence time (s)
 benzene
 benzene - norbornane
                  T=933 K
(a)  
3.5x10-5
913 933 953 973
Temperature (K)
 benzene
 benzene - norbornane
                     (τ=1s)
(b)  
Figure 11. Comparison of mole fractions of biphenyl obtained in the two studies. Evolution with 
residence time (a) and with temperature (b). 
Figure 12. Structure of the tricyclo[5.2.1.02,6]decane. 
973953933913893873
Temperature (K)
 tricyclo[5.2.1.02,6]decane
 norbornane in benzene
Figure 13. Comparison of the conversions of norbornane (dissolved in benzene) and 
tricyclo[5.2.1.02,6]decane. Experiments were performed at a residence time of 1 s, at a pressure of 106 
kPa and with a mole percentage of reactant at the inlet of the reactor of 0.7%. 
BR1 BR3
norbornane
Figure 14. Reactions of unimolecular initiation of norbornane by breaking of C-C bonds. 
+ C2H4Z
BR7 BR5
MA2 MA3
β-scission
disproportionnation
β-scission
disproportionnation
β-scission
disproportionnation
BR3 BR7 BR4
MA5 MA6  
Figure 15. Reactions of β-scission and of disproportionnation of diradicals BR1, BR2 and BR3 of 
Figure 14. 
+ C2H5
+ CH3
Figure 16. Reactions of unimolecular initiation of 3-ethenyl-cyclopentene (MA4 of Figure 15). 
norbornane
Figure 17. Reactions of unimolecular initiation of norbornane by breaking of C-H bonds. 
norbornane R2R1 R3
or or
Figure 18. The three norbornyl radicals obtained by metatheses of hydrogen atoms or radicals R  on 
norbornane. 
Figure 19. Reaction of decomposition by β-scission of the three norbornyl radicals. 
R16 R17
β-scission
isomerization
metathesis (+RH)
(+R )
Figure 20. Reactions of the radical R6 of Figure 19. 
+ H (unimolecular initiation)
+ + (bimolecular initiation)
+ H + H2 (metathesis of H atoms with benzene)
(combination of two H atoms)
(combination of two phenyl radicals)
H  + H  (+ M) H2 (+ M)
+ H (addition of  H atoms to benzene)
+ + H (addition of phenyl radicals to benzene)
Figure 21. Primary reactions of the pyrolysis of benzene. 
+ (bimolecular initiation)
+ Rn + RnH
(metathesis of phenyl
radical with norbornane)
Rb   + Rn products
(metatheses of radicals deriving from 
norbornane with benzene)
(combination/disproportionnation of radicals
deriving from norbornane and benzene)
Figure 22. Cross-coupling reactions of norbornane and benzene. 
ABSTRACT
  The thermal decomposition of norbornane (dissolved in benzene) has been
studied in a jet stirred reactor at temperatures between 873 and 973 K, at
residence times ranging from 1 to 4 s and at atmospheric pressure, leading to
conversions from 0.04 to 22.6%. 25 reaction products were identified and
quantified by gas chromatography, amongst which the main ones are hydrogen,
ethylene and 1,3-cyclopentadiene. A mechanism investigation of the thermal
decomposition of the norbornane - benzene binary mixture has been performed.
Reactions involved in the mechanism have been reviewed: unimolecular
initiations 1 by C-C bond scission of norbornane, fate of the generated
diradicals, reactions of transfer and propagation of norbornyl radicals,
reactions of benzene and cross-coupling reactions.

<|endoftext|><|startoftext|>
Introduction  
Many important gas-phase or heterogeneous industrial processes such as combustion, partial 
oxidation, cracking or pyrolysis exhibit a complex chemical scheme involving hundreds of species and 
several thousands of reactions. In these thermal processes, cyclic hydrocarbons, particularly 
cycloalkanes, represent an important class of compounds. These molecules are produced during the gas-
phase processes though they can also be present in the reactants in large amounts; for example, a 
commercial jet fuel contains about 26% of naphtenes and condensed naphtenes, while in a commercial 
diesel fuel, this percentage reaches 40 % [1]. Modeling their reactivity currently represents an important 
challenge in the formulation of new fuels, less polluting and usable with new types of combustion in 
engines like “Homogeneous Charge Compression Ignition” (HCCI) [2]. During combustion or pyrolysis 
processes, cycloalkanes can lead to the formation of a) toxic compounds or soot precursors such as 
benzene (by successive dehydrogenations) and b) linear unsaturated species such as buta-1,3-diene or 
acroleïn (by the opening of the ring). Several experimental and modeling studies have been carried out 
on the oxidation of cyloalkanes in gas phase [3-9]. However, the modeling of the combustion of 
cycloalkanes remains difficult due to the lack of both, kinetic data for elementary reactions and 
thermodynamic data (∆H°f, S°, C°p) for the relevant species.  
A number of experimental and theoretical works have been reported on the decomposition of small 
cycloalkanes, such as cyclopropane [10-16]. Ring opening in this molecule is a well-known process and 
the rate of dissociation leading to propene formation has been extensively measured and calculated. Not 
less than 37 estimations of the rate constant are available on the NIST chemical kinetics database [17].  
Thermal decomposition of cyclobutane has been experimentally studied too and rate constants for the 
ring opening leading to the formation of two ethylene molecules have been measured [15-16, 18-19]. 
Theoretical studies on cyclobutane have mainly focused on the reverse reaction, namely, the 
cycloaddition of two ethylene molecules [20-23], since it represents a prototype reaction for the 
Woodward-Hoffmann rules and illustrates the usefulness of orbital symmetry considerations. Moreover, 
Pedersen et al. [24] have showed the validity of the biradical hypothesis by direct femtosecond studies 
of the transition-state structures. Theses studies have highlighted the fact that cycloaddition of two 
ethylene molecules can proceed through two different routes: one involves a tetramethylene biradical 
intermediate, while the other implies a concerted reaction that directly leads to cyclobutane formation. 
However, the latter reaction has been shown to have a high activation energy due to steric effects (see 
for instance ref [20] and therein) and to be much less favorable than the biradical process.  
Even though the study of ring opening of cyclopropane and cyclobutane are interesting from a 
theoretically point of view, larger cycloalkanes such as cyclopentane or cyclohexane are mainly 
involved in usual fuel. In spite of that, the unimolecular initiation of these molecules has been little 
investigated. Ring opening of cyclopentane and cylohexane has been experimentally studied by Tsang 
[25-26] who has reported the main routes of decomposition of these molecules. The mechanism and 
initial rates of decomposition were determined from single-pulse shock-tube experiments. For 
cyclohexane, a reaction pathway leading to the formation of 1-hexene has been considered only whereas 
for cyclopentane, the processes leading to either 1-pentene or to cyclopropane + ethylene have both 
been investigated, in accordance with the products experimentally detected. These results have been 
confirmed by further experimental studies by Kalra et al. [27] and Brown et al. [28]. In addition, Tsang 
[25] has shown that the experimentally obtained global rate parameters are consistent with a biradical 
mechanism for ring opening (Scheme 1): 
Scheme 1. Biradical mechanism for cyclopentane ring opening [25] 
Tsang connected the global rate parameters for the decomposition of cyclopentane in 1-pentene (k1) 
or in cyclopropane+ethylene (k2) to the rate parameters of the elementary reactions shown in Scheme 1. 
The equilibrium constant Keq=kb/kc has been estimated by the group additivity method proposed by 
Benson [29] and using some analogies with reactions of linear alkanes. From these assumptions, the rate 
constants of the elementary reactions have been estimated. The same procedure has been applied for the 
rate of decomposition of cyclohexane in 1-hexene. Even if the rate parameters obtained for the 
elementary reactions are consistent with the formation of a biradical, no transition state has been defined 
for validating the suggested mechanism. Moreover, the route leading to ethylene and trimethylene 
through C-C bond cleavage is rather ambiguous since other pathways are considered by Tsang, as for 
instance the direct decomposition of the n-pentyl biradical into cyclopropane and ethylene (Scheme 1). 
All of these studies showed that biradicals are central to the understanding of reaction mechanisms as 
well as to the predictability of reaction products and rates in the ring opening of cycloalkanes. 
Our aim in the present work is to analyze the ring opening of cyclobutane, cyclopentane and 
cyclohexane by means of high level quantum chemical calculations in order to obtain accurate rate 
constants for elementary reactions. Comparison of the computed rate constants with available 
experimental data allows us to validate the theoretical approach. We explore several plausible pathways 
that could be involved in the decomposition of the biradicals and we discuss the evolution of the ring 
strain energy (RSE) in going from the reactants to the transition state for ring opening. 
2. Computational method  
Quantum chemical computations have been performed on an IBM SP4 computer with the Gaussian03 
(G03) software package [30]. The high-level composite method CBS-QB3 [31] has been used. Analysis 
of vibrational frequencies confirmed that all transition structures (TS) have one imaginary frequency. 
Intrinsic Reaction Coordinate (IRC) calculations have been systematically performed at the B3LYP/6-
31G(d) [32-33] level to ensure that the computed TSs connect the desired reactants and products.  
Only singlet biradicals states have been considered and their study at the composite CBS-QB3 level 
requires two specific comments. First, at this level, geometry optimization of the systems is performed 
by density functional theory (DFT) using an unrestricted B3LYP method and a CBSB7 basis set. It is 
worth noting that the use of one determinantal wavefunction to describe open shell singlet biradicals can 
be questionable. However, previous studies have shown that the geometries obtained in this way 
compare well with those obtained at more refined computational levels [34-38]. Second, in CBS-QB3 
calculations, a correction for spin-contamination in open-shell species is added to the total energy. It 
includes a term of the form ( )2200954.0 thSSE −−=∆      where  denotes the expected value of the 2S 2Ŝ  
operator and 2thS  the corresponding theoretical value (e.g. 0 for a singlet state). This correction was 
derived for systems displaying a small spin-contamination. However, because of strong singlet-triplet 
mixing in the unrestricted wavefunction for biradicals, the 2S  values are close to 1 and this leads to a 
systematic error in the CBS-QB3 energy correction of about 6 kcal.mol-1. Several papers have pointed 
out this limitation [39-41] and some authors have proposed to remove the spin-contamination correction 
in this case. In the present work, we have preferred to use an empirical parameter specifically designed 
to handle spin-contamination in singlet biradicals so that singlet-triplet gaps for hydrocarbons biradicals 
are correctly described at the CBS-QB3 level. All energy values for singlet biradical species are 
therefore corrected by an expression of the form: 
∆Espin = -0.031 (S2-S2th)                     (1) 
with S2th =1. 
3. Thermochemical data  
Thermochemical data (∆Hf°, S°, Cp°) for all the species involved in this study have been derived from 
energy and frequency calculations and are collected in Table 1. In the CBS-QB3 method, harmonic 
frequencies, at the B3LYP/CBSB7 level of theory, are scaled by a factor 0.99. Explicit treatment of the 
internals rotors has been performed with the hinderedRotor option of Gaussian03 in accordance with the 
work of Ayala and Schlegel [42]. A systematic analysis of the results obtained was made in order to 
ensure that internals rotors were correctly treated. Thus, for biradicals a practical correction was 
introduced in order to take into account the symmetry number of 2 for each CH2(•) terminal group. This 
symmetry is not automatically recognized by Gaussian in the case of a radical group. Moreover, it must 
be stressed that in transition states, the constrained torsions of the cyclic structure have been treated as 
harmonic oscillators and the free alkyl groups as hindered rotations. 
Enthalpies of formation (∆Hf°) of species involved in this study have been calculated using isodesmic 
reactions [43] excepted for cyclanes and 1-alkene for which more accurate experimental enthalpies of 
formation [44] can be found in the literature. Thanks to the conservation of the total number and types 
of bonds, very good results can be obtained due to the cancellation of errors on the two sides of the 
reaction. Several isodesmic reactions have been considered for the calculation of ∆Hf°  in order to obtain 
an average value. However, results appear to be strongly dependent on the accuracy of the experimental 
data used for species involved in isodesmic reactions, especially for biradicals.   
Table 1. Ideal gas phase thermodynamics properties obtained by CBS-QB3 calculation.  is 
expressed in kcal.mol
KfH 298,∆
-1 and  and are given in cal.molo KS298 )(TCp
° -1.K-1. 
      )(TCp
°     
Species 
KfH 298,∆
KS298 300 K 400 K 500 K 600 K 800 K 1000 K 1500K P.G. 
 12.74 56.78 13.28   18.03   22.25   25.74   31.04   34.92   40.98 D3h
 6.79 63.19 16.97 23.47 29.37 34.33 41.94 47.48 55.94 D2d
 -18.27 70.21 20.99   29.17   36.67   43.01   52.81   59.92   70.74 D5h
 -29.43 71.55 25.45   35.27   44.28   51.96   63.89   72.55   85.66 D3d
 -22.99 74.09 25.61   35.33   44.30   51.95   63.84   72.49   85.60 D2
 0.82 78.27 24.52   32.14   38.82   44.39   53.03   59.41   69.39 C1
 -4.40 84.87 29.11   39.91   47.71   54.90   65.51   73.16   84.94 C1
 -5.09 79.85 25.99   32.74   38.83   44.05   52.37   56.64   68.53 C1
 -9.90 91.75 31.26   39.41   46.75   53.05   63.11   70.69   82.62 C1
 26.31 66.32 18.06   22.98   27.35   31.02   36.58   40.52   46.70 C2h
 25.41 78.56 23.57   29.58   34.92   39.42   46.46   51.70   59.91 C2
 67.43 79.01 22.95   27.80   32.16   35.90   41.92   46.54   53.94 C1
 68.13 77.33 23.59   28.36   32.50   36.10   41.93   46.46   53.82 C2h
62.84 87.11 27.87   34.29   40.04   44.93   52.73   58.64   68.05 C1
 62.46 87.12 28.37   34.50   40.11   44.95   52.71   58.62   68.03 C1
 62.79 89.54 28.15   34.26   39.89   44.75   52.52   58.44   67.90 C1
 58.50 98.74 31.50   39.48   46.76   52.98   62.83   70.22   81.87 C1
59.41 94.29  33.61   41.17   48.03   53.96   63.47   70.70 82.15 C1
59.08 92.96 34.02   41.97   49.00   54.97   64.47   71.58   82.74 C1
Hence, for these systems, ∆Hf° has been obtained from the prototype reaction: 
•R1-•R2 + 2R3H  → HR1-R2H  +  2 •R3,               (2) 
where •R1-•R2 represents the biradical and •R3 = •H, •CH3, •C2H5 or n-•C3H7. 
The computed entropy of cyclopentane has been corrected in order to take into account the 
experimental symmetry of the molecule (D5h). Indeed, CBS-QB3 calculations lead to a non planar 
geometry with  C1 symmetry. In addition, a low frequency of 22 cm-1 can be associated with a puckering 
motion of the ring. According to Benson [29], this pseudo-rotation is so fast that cyclopentane can be 
treated as dynamically flat (with a symmetry number σ=10). Thus, we corrected the entropy of 
cyclopentane by R ln10 (equation 3): 
Sc-C5H10 = S c-C5H10(CBS-QB3) – R ln 10               (3) 
The computed value, 70.0 cal.mol-1.K-1, is in good agreement with the experimental value [29].  
To our knowledge, no experimental enthalpy of formation for biradicals •C4H8•, •C4H10• or •C6H12• 
has been reported. However, we can compare our values with those obtained from an estimation based 
on bond dissociation energy (BDE) according to reaction (4): 
CnH2n+2 = •CnH2n• + 2 H•                      (4) 
where CnH2n+2 represents a linear free alkane and •CnH2n• is the corresponding biradical. ∆H°r for 
reaction 4 corresponds to twice the BDE for a C-H bond and the enthalpy of formation of •CnH2n• can 
therefore be calculated from equation (5): 
∆H°f (•CnH2n•, 298K) = 2BDE + ∆H°f (CnH2n+2, 298K) - 2 ∆H°f (H•, 298K)       (5) 
This calculation rests on the assumption that no interaction exists between the two radical centers. 
Table 2 compares estimated values using equation (5) and CBS-QB3 results for the most stable 
conformation of the biradicals •C4H8•, •C4H10• or •C6H12• . The BDE value for a primary C−H bond in 
a linear alkane has been taken equal to 100.9 kcal.mol-1, as proposed by Tsang [45] (it corresponds to 
C-H bond dissociation in propane to give the n-propyl radical). Experimental enthalpies of formation for 
the molecules used in equation (5) come from NIST webbook [44]. As shown in Table 2, a very good 
agreement is obtained between CBS-QB3 calculations using isodesmic reactions and values estimated 
by BDE and equation (5). This result corroborates the consistency of the electronic calculation scheme 
for the biradicals, in particular, the use of a broken symmetry method to optimize their geometry, as 
discussed in Section 2. 
Table 2. Comparison of ∆H°f (in kcal.mol-1, at 298 K) estimated from equation (5) and from theoretical 
calculations.  
Biradical ∆H°f from 
equation (5) 
This work 
•C4H8• 67.57 67.43 
•C5H10• 62.52 62.46 
•C6H12• 57.64 58.50 
4. Kinetic calculations 
The rate constants involved in the mechanisms were calculated using TST [47]: 
( ) ⎟
⎛ ∆−⎟
Trpdk buni   exp  exp 
   κ                (6) 
where and ≠∆S ≠∆H  are, respectively, the entropy and enthalpy of activation and rpd is the reaction 
path degeneracy. For reactions involving H-transfer, a transmission coefficient, namely κ(T), has been 
calculated  in order to take into account tunneling effect. We used an approximation to κ, provided by 
Skodje and Truhlar [47]: 
( ) ( )[ ]
( ) ( )[ ]
)/sin(
     
  for  
      
  for 
                (7) 
where Tkh B 
1   ,
)Im( 
2  ==
In equation (7), ν≠ is the imaginary frequency associated with the reaction coordinate, ∆V≠ is the zero-
point-including potential energy difference between the TS structure and the reactants, and V is 0 for an 
exoergic reaction and the zero-point-including potential energy difference between reactants and 
products for an endoergic reaction. 
The calculation of ≠∆H  was performed by taking into account electronic energies of reactants and 
TSs but also the enthalpy of reaction (Scheme 2). 
Reactant
Product
Scheme 2.Calculation of enthalpy of reaction: ≠∆H  
Thus, ≠∆H (reactant → product) is calculated from the following relation: 
∆H≠ R → P( ) =  ∆H1≠ (elec) +  ∆H−1≠ (elec) +  ∆Hr( ) 2            (8) 
and ≠∆H (product → reactant) by : 
∆H≠ P → R( ) =  ∆H1≠ (elec) +  ∆H−1≠ (elec) −  ∆Hr( ) 2              (9) 
where  and  are, respectively, the enthalpy of activation for the direct and reverse 
reactions, calculated from electronic energies, ZPVE and thermal correction energy. 
)(1 elecH
≠∆ )(1 elecH
rH∆ corresponds to 
the enthalpy of reaction estimated using isodesmic reactions [48]. The use of equations (8) and (9) 
ensures the consistency between kinetic and thermodynamic data and  improves the activation enthalpy 
results since the values obtained for rH∆  are more accurate than direct electronic calculations. 
The kinetic data are obtained with a fitting of equation 6 in the temperature range 600-2000K, with the 
following modified Arrhenius form:  
k = A Tb exp (-E/RT)                      (10) 
5. Ring opening mechanisms 
The ring opening mechanisms for cyclobutane, cyclopentane and cyclohexane are now discussed. In 
all schemes presented below, electronic free enthalpies are reported in kcal.mol-1 and are relative to the 
reference cycloalkane at P=1 atm. The value in bold corresponds to free enthalpy at T=298 K while the 
value in italics corresponds to T=1000 K.  
5.1 Cyclobutane 
Scheme 3 presents the global mechanism obtained for the ring opening of cyclobutane. In this scheme, 
we only considered conformers (molecules, biradicals, TSs) of lowest free enthalpies. Especially, we do 
not account for conformations of the biradical (2) (gauche and trans) and we only considered the 
gauche conformer since its free enthalpy is lowest. This point will be discussed in more detail below.    
100.5
0.0 -15.7
-18.0
-22.9
2 (s) 57.8
(s) singlet state
∆G°(T)  in kcal.mol-1  (bold : T= 298K, italic : T=1000K)  
Scheme 3. Mechanism obtained for the ring opening of cyclobutane and for conformers of lowest 
energies. 
The ring opening of cyclobutane involves a low free enthalpy of activation; ∆G≠ = 60.7 kcal.mol-1 at 
298 K which corresponds to an activation enthalpy (∆H≠) of 62.7 kcal.mol-1.  This value is lower than 
that involved in the dissociation of a C-C bond in the corresponding linear alkane (∆H≠ ≈ 86.3 
kcal.mol-1) [49]. This result will be discussed below but we can anticipate that it is mainly due to the 
high ring strain energy involved in the cyclobutane molecule, that is partially removed in the transition 
state. Following ring opening, three ways of decomposition for tetramethylene have been investigated. 
The route leading to the formation of two ethylene molecules corresponds to the most favorable one, as 
already mentioned in the literature [15-23]. The process leading to 1-butene represents another possible 
pathway though this elementary reaction involves a high stressed cyclic transition state for H-transfer 
(four-member ring) and displays a high activation energy ( ≠∆H =17.6 kcal.mol-1 vs 2.8 kcal.mol-1 for 
the β-scission leading to C2H4, at 298 K). This pathway, however, might provide a non–negligible 
contribution to the total process. Indeed, C4H8 has been identified as a minor product in experimental 
studies of cyclobutane decomposition [18]. The third route consists of the abstraction of two hydrogen 
atoms to form buta-1,3-diene and H2. The reaction step has a high activation barrier ( ≠∆H = 40 
kcal.mol-1 at T=298K) and should not compete with the previous mechanisms.  
A concerted reaction leading from cyclobutane to direct formation of two ethylene molecules has 
been envisaged but no consistent TS could be obtained. As mentioned previously, several studies [20-
23] have been performed on the reverse reaction, i.e. the cyclic addition of two ethylene molecules. The 
authors found that a two-step biradical-type reaction is expected to be favored (by about 24 kcal.mol-1) 
over concerted pathways. Sakai [20] estimated activation energies of 77.6 kcal.mol-1 and 58.8 kcal.mol-1 
for the concerted and stepwise processes, respectively, at the MP2//CAS/6-311+G(d,p) level. However, 
it is important to underline that this author reported a second-order saddle point structure (two negative 
eigenvalues) for the concerted mechanism, which therefore does not correspond to a transition state. 
Table 3 gives the kinetics parameters of the modified Arrhenius form (equation 10) for all the 
elementary processes involved in Scheme 3. Unimolecular initiation of cycloalkanes is unimportant for 
low temperatures in thermal processes and we only give kinetics parameters for T > 600K in all the 
tables. 
Table 3. Rate parameters for the unimolecular initiation of cyclobutane at P=1 atm, 600 ≤ T (K) ≤ 
2000 K and related to scheme 3. 
 k1-2 k2-1 k2-C2H4 k2-4 k2-5
log A (s-1) 18.53 12.21 7.32 5.57 2.23 
n -0.797 -0.305 1.443 2.171 2.995
(kcal.mol-1) 
64.85 1.98 3.03 16.44 37.61
  To the best of our knowledge, no previous quantum calculations have been performed to estimate rate 
parameters for the elementary reactions involved in the thermal decomposition of cyclobutane. In 1971, 
Beadle et al. [19] experimentally studied the pyrolysis of cyclobutane and reported rate parameters for 
the ring opening of c-C4H8 (k1-2 and k2-1) and for the decomposition of the biradical in C2H4 (k2-C2H4). 
Their estimations are based on tabulated thermochemistry and additivity methods. The A factor and 
activation energy proposed by Beadle et al. are as follows: 3.63 1015 s-1 and 63.34 kcal.mol-1 for k1-2, 
2 1012 s-1 and 6.6 kcal.mol-1 for k2-1, and 1.17 1013 s-1 and 8.25 kcal.mol-1 for k2-C2H4. In spite of 
considerable uncertainty in that study, as mentioned by the authors, their results are in good agreement 
with ours. Thus, at 800 K, the ratio between our value and that proposed by Beadle et al. is 1.7, 2.0 and 
0.7 for k1-2, k2-1, and k2-C2H4, respectively. Moreover, it is worth noting that our CBS-QB3 quantum 
calculations provide an accurate energy for the cycloaddition of two ethylene molecules (formation of 
biradical 2). The free enthalpy of activation obtained is equal to 56.3 kcal.mol-1 at 298K, which 
corresponds to an activation energy ∆H≠ = 47.7 kcal.mol-1. This last value can be compared with the 
activation energy of 56.8 kcal.mol-1 reported by Sakai [20] at the MP2/CAS/6-311+G(d,p) level though 
this author obtained  a second order saddle point.  
 As mentioned previously, gauche/trans interconversion of (2) has been neglected in Scheme 3. 
However, the β-scission reaction involves a low ∆G≠ (5.1 kcal.mol-1 at 298K) that could be of the same 
order of magnitude than the rotational barrier around the central CC bond. Accordingly, it can be 
interesting to consider the two conformations of the biradical for this particular but important pathway. 
A detailed scheme is presented in Scheme 4. 
1 2 (s) 3 (s)
6.66.6
46.30.0 48.4
-22.9
-22.9
56.2 53.5
56.0 55.7
Gauche Trans
(s) singlet state
∆G°(T)  in kcal.mol-1  (bold : T= 298K, italic : T=1000K)  
Scheme 4. Detailed mechanism of C2H4 formation from the ring opening of cyclobutane and obtained 
by considering different conformers of C4 biradicals. 
 Cyclobutane ring-opening leads to the gauche biradical (2). The latter can decompose in two C2H4 
molecules or rotate to give the trans conformation (3), which in turn, can also decompose in two C2H4 
molecules. The kinetic parameters for all processes involved in Scheme 4 are collected in Table 4. 
Table 4. Rate parameters for the unimolecular initiation of cyclobutane at P=1 atm, 600 ≤ T (K) ≤ 2000 
K and related to scheme 4. 
 k1-2 k2-1 k2-3 k3-2 k2-C2H4 k3-C2H4
log A (s-1) 18.53 12.21 11.30 11.27 7.63 7.32 
n -0.797 -0.305 0.461 0.545 1.453 1.521 
Ea (kcal/mol) 64.85 1.98 4.29 3.26 4.79 2.07 
Tables 3 and 4 give the kinetic parameters involved in the thermal decomposition of cyclobutane. In 
order to validate our results, we compare the global rate constant for the process:  
Cyclobutane → C2H4 + C2H4,    
measured experimentally by Barnard et al. [15] and Lewis et al. [16], with that derived from our 
computations (Schemes 3 and 4). We consider the quasi-stationary state approximation (QSSA) for 
biradicals. For Scheme 3, QQSA applied to the gauche biradical leads to the simple expression:  
HCscheme
=                          (11) 
For scheme 4, QSSA leads to a more complex expression:  
  k  
scheme
g             (12) 
with , 
           
4242424242 322323323122312
HCHCHCHCHC kkkkkkkkk k
−−−−−−−−−−
The fit of the global rate constants obtained from relations (11) and (12) leads to the rate parameters 
presented in table 5. 
Table 5. Rate parameters for the global reaction cC4H8 → 2 C2H4, at P=1 atm, 600 ≤ T (K) ≤ 2000 K 
and related to Schemes 3 and 4. 
 3scheme
4scheme
log A (s-1) 20.29 21.52 
n -1.259 -1.606 
Ea (kcal/mol) 67.69 68.16 
Figure 1 compares our results (  and ) with the absolute value measured directly by 
Barnard et al. [15] and by Lewis et al. [16] for the thermal rate decomposition of cyclobutane in two 
ethylene molecules.  
3scheme
4scheme
Figure 1. Comparison between calculated rate constant and experimental data for the global reaction 
cC4H8 → C2H4 + C2H4
As shown in Figure 1, our results are close to the experimental values, validating the consistency of the 
theoretical approach. Computed rate constants are always slightly lower than those obtained by Lewis et 
al. [16] (maximum factor 2.4) or by Barnard et al. [15] (maximum factor 4.3). It is interesting to note 
that differences between  and  decrease with temperature, which is consistent with 
rotational hindrance (20% at 600K and 8% at 2000 K). Though these differences are weak, the rate 
4scheme
3scheme
constant obtained by explicit consideration of the two conformations of the biradical (trans and gauche) 
is closer to experimental results than the rate constant calculated from Scheme 3.  
5.2 Cyclopentane 
Scheme 5 shows the global mechanism obtained for the ring opening of cyclopentane. As for 
cyclobutane, the global mechanism does not take into account the different conformations of the 
biradical •C5H10• and we only consider the conformation with the lowest free enthalpy.  
Due to weaker ring strain energy, the free enthalpy of activation of the ring opening of cyclopentane 
(∆G≠ = 80.5 kcal.mol-1 at 298 K) is higher than that obtained in cyclobutane (∆G≠ = 60.7 kcal.mol-1 at 
298 K).  
4 (s)
100.5
117.3
102.7
107.1
(s) singlet state
∆G°(T)  in kcal.mol-1  (bold : T= 298K, italic : T=1000K)  
Scheme 5. Mechanism obtained for the ring opening of cyclopentane and for conformers of lowest 
energies. 
Four routes of decomposition have been investigated for the biradical 4, the most favorable one being 
the formation of 1-pentene by H-transfer. This last reaction exhibits an activation energy much lower 
than the equivalent process in cyclobutane ( ≠∆H = 6 kcal.mol-1 vs 17.6 kcal.mol-1 for cC4H8 at 298 K).  
This noticeable difference can be ascribed to different strain energy in the corresponding transition 
states. Conversely, the β-scission process leading to ethylene and cyclopropane is much more difficult 
than that leading to ethylene in cyclobutane. In this last case, the presence of two radical centers in 
(•C4H8•) in β position weakens the inner C-C σ-bond. Another interesting point concerns the 
dehydrogenation reaction of biradical 4 leading to penta-1,4-diene (6) and H2. At low temperature 
(298K), this reaction is competitive with β-scission (reaction 4 →7) but becomes unimportant at high 
temperature (1000K) due to a low change of entropy between the biradical and TS. Finally, the reaction 
of biradical (4) to yield ethyl-cyclopropane (8) is unlikely to occur at either room or high temperature 
because it involves a high stressed cyclic transition state for H-abstraction and always displays quite 
high activation energy. For completeness, it must be noted that formation of a trimethylene biradical 
from 4 has been envisaged. However, the singlet state for this system could not be obtained, geometry 
optimizations leading systematically to the formation of cyclopropane (only the triplet state could be 
optimized). This may be due to the monodeterminantal character of the methodology used for the 
geometry optimizations but it has been shown that ring closure of trimethylene to form cyclopropane 
occurs very fast anyway [50]. Table 6 gives the kinetics parameters of the modified Arrhenius forms 
(equation 7) for all the elementary processes involved in Scheme 5. As mentioned previously, just a few 
studies have been devoted to the kinetics of the thermal decomposition of cyclopentane [25, 27]. Tsang 
[25] measured by comparative-rate-pulse shock tube experiments, the ring opening of cyclopentane, and 
suggested rate expressions for k1-4, k4-1, k4-5 and k4-7, over the temperature range 1000 K – 1200 K. The 
estimations were based on experimental results obtained for the thermal decomposition of cyclopentane 
and on the thermochemical considerations methods proposed by Benson [29]. 
Table 6. Rate parameters for the unimolecular initiation of cyclobutane at P=1 atm, 600 ≤ T (K) ≤ 2000 
K and related to Scheme 5. 
 k1-4 k4-1 k4-5 k4-6 k4-7 k4-8
log A (s-1) 18.11  9.89 6.77 0.51 9.78 -3.07 
n -0.466 0.311 1.480 3.015 1.1 4.157 
Ea (kcal/mol) 85.18 1.7 7.76 17.78 26.16 32.43 
However, the analysis did not take into account transition state and Tsang used a constant value of 
0.16 for the ratio k4-5/ k4-1 in accordance with the disproportionation to combination ratio for n-propyl 
radicals. At 1100 K, the ratios obtained between our values and those proposed by Tsang for k1-4, k4-1, 
k4-5 and k4-7 are, respectively, 0.5, 1.5, 1.6 and 1.7. Despite uncertainties in both experimental and 
theoretical calculations, the agreement is therefore quite satisfying. It is also possible to compare some 
computed rate parameters for elementary reactions in Scheme 5 with values obtained using semi-
empirical relations as proposed by O’Neal [51]. Accordingly, the activation energy involved in reaction 
4 → 6, corresponding to a H-transfer (disproportionation), can be estimated by the following 
relationship [51]: 
Ea = ED  +  ESE                       (13) 
where ED represents the activation energy involved in the disproportionation of two alkyl free radicals 
and ESE represents the strain energy of the cyclic transition state. The latter was taken to be 6.3 
kcal.mol-1 since the transition state exhibits a five-member ring [29]. An activation energy of 1 
kcal.mol-1 can be reasonably taken into account for disproportionation [49].  From our calculations the 
activation enthalpy of reaction 4→ 6, is estimated to 6 kcal.mol-1 at 298 K and therefore it is in good 
agreement with the semi-empirical estimation. On the other hand , our quantum calculations give an 
activation energy equal to 26.6 kcal.mol-1 at 298 K for the β-scission of biradical 4 (reaction 4→7). This 
value can be compared with the mean activation energy for the β-scission of a C-C bond in an alkyl free 
radical (28.7 kcal.mol-1 [49]).  
In Scheme 5, biradicals 4 represents the most stable conformer of the biradical although it does not 
lead directly to the formation of 1-pentene.  Since some activation energies, such as that of the reaction  
4 → 5 ( ≠∆H = 6.7 kcal.mol-1 at 298 K), have values close to that of the rotation barrier heights, the role 
of rotational hindrance has been examined in the case of cyclopentane. Scheme 6 contains the detailed 
mechanism. 
2 (s) 3(s)
77.5 77.1
65.3 64.7
(s) singlet state
∆G°(T)  in kcal.mol-1  (bold : T= 298K, italic : T=1000K)  
Scheme 6. Detailed mechanism of 1-pentene formation from the ring opening of cyclopentane and 
obtained by considering different conformers of C5 biradicals. 
Two conformations of the biradical •C5H10• are involved in Scheme 6 but only one (biradical 3) leads 
to the formation of 1-pentene by H-abstraction. This result has been validated by IRC calculations. 
Geometries of TS3-5 and biradicals (2) and (3) are presented in Figure 2.  
As shown, the distance between carbon atom 2 and hydrogen atom 11 is larger in biradical 2 than in 
biradical 3 (compares structures a and c), which renders H-abstraction for this last species more 
favorable. This result can be explained by a gauche interaction in biradical 2 involving carbon atoms 4 
and 1. In biradical 3, carbon atoms 1, 3, 4 and 5 are in the same plan and no gauche interaction between 
carbon atoms 1 and 4 is possible.  
Figure 2. Geometries relative to biradical 2 (a), and TS  (b), biradical 3 (c) 
Figure 2. G
 It is wor
Indeed, in 
activation e
In cyclope
only one c
for the mec
Table 7. R
2000 K and
log A (s-1) 
(kcal/mol) 
In order 
rotational b
experiment
2.819 Å
eometries relative to biradical 2 (a), and TS3-5 (b), biradical 3 (c) 
th noting the different role played by rotational hindrance in cyclo
cyclobutane (Scheme 4), the rotational barrier is of the same o
nergy for β-scission and both conformers may lead to the formati
ntane, the rotational barrier is lower than the activation energy for
onformation (biradical 3) allows the latter process. Table 7 summ
hanism shown in Scheme 6. 
ate parameters for the unimolecular initiation of cyclopentane at
 related to Scheme 6. 
k1-2 k2-1 k2-3 k3-2 k3-5
18.11 10.74 10.97 10.54 6.89 
-0.466 0.207 0.569 0.602 1.494
84.76 1.7 2.68 2.97 7.45 
to validate the rate parameters involved in Schemes 5 and 6 and to
arriers between biradicals 2 and 3, we have compared the glob
ally by Tsang [25] for the following reactions: 
cyclopentane → 1-pentene,  k1= 1016.1 exp(-84840 (cal.mol-1
cyclopentane  → cyclopropane + ethylene,  k2= 1016.25 exp(-95060
2.645 Å
2.819 Å
2.645 Å
pentane and cyclobutane. 
rder of magnitude as the 
on of ethylene molecules. 
 1-pentene formation and 
arizes the rate parameters 
 P=1 atm, 600 ≤ T (K) ≤ 
 quantify the effect of the 
al rate constants obtained 
)/RT), 
 (cal.mol-1)/RT), 
with global rate constant estimated by QSSA performed on biradical 4 (Scheme 5) and biradicals 2 and 
3 (Scheme 6). The rate expressions are: 
54415 
51,   
kschemeg ,                       (14) 
533253122-312
5332216 
51,        
−−−−−
kkkkkk
kschemeg ,               (15) 
74415 
71,   
kschemeg ,                    (16) 
and the kinetics parameters are given in Table 8. 
Table 8. Rate parameters for the global reactions cC5H10 → 1-pentene and cC5H10→ cC3H6 +C2H4 at 
P=1 atm, 600 ≤ T (K) ≤ 2000 K and related to Schemes 5 and 6. 
scheme
gk −  
scheme
gk −  
scheme
gk −  
log A (s-1) 20.39 24.33 20.06 
n -0.970 -1.542 -0.878 
(kcal/mol) 
92.86 112.49 92.23 
Comparison of global rate constants estimated by the QSSA approach and those proposed by Tsang 
are presented in Figure 3. In the range 1000K –1200K, the values calculated from our global rate 
constants for the formation of 1-pentene (  and ) are in a good agreement with those 
obtained by Tsang, our values being lower by a factor 1.2 to 2.  
scheme
scheme
Figure 3. Comparison between calculated rate constant and experimental data for the global reactions 
c-C5H10 → 1-pentene and cC5H10 → c-C3H6 + C2H4
A similar result has been obtained in the case of the reaction leading to cyclopropane and ethylene, 
with values underestimated by a factor 0.7 to 2.5 with respect to Tsang estimations. Concerning the 
influence of rotational hindrance in the formation of 1-pentene,  is always found to be lower 
than  over the temperature range (28 % at 600K and 8% at 2000K). By analogy with 
cyclobutane, calculations performed by considering the different conformations for biradical •C
scheme
scheme
5H10• 
permit to obtain results closer to experiment. Moreover, the effect of rotational hindrance is slightly 
greater in the case of cyclopentane than cyclobutane. As said above, this can be explained by the fact 
that biradical •C5H10• must necessarily rotate in order to yield 1-pentene whereas in cyclobutane, both 
gauche and trans conformations of the biradical can decompose in two ethylene molecules. 
5.3 Cyclohexane 
Only a few numbers of studies have been performed on the unimolecular initiation mechanism of 
cylohexane [26, 28]. Scheme 7 shows our results for the ring opening of c-C6H12 and subsequent 
reactions. As before, in this scheme, we only consider the lowest free enthalpy conformers of the 
biradicals. Table 9 summarizes the computed rate parameters.  
5 (s)
106.5 127.0
110.8
8 94.7106.9
53.287.5
C4 pathway
(s) singlet state
∆G°(T)  in kcal.mol-1  (bold : T= 298K, italic : T=1000K)  
Scheme 7. Mechanism obtained for the ring opening of cyclohexane and for conformers of lowest 
energies. 
Table 9. Rate parameters for the unimolecular initiation of cyclohexane at P=1 atm, 600 ≤ T (K) ≤ 2000 
K and related to scheme 7.  
 k1-5 k5-1 k2-5 k5-2 k5-6 k5-7 k5-8 k5-c-C3H6
log(A s-1) 21.32 9.91 20.11 10.38 2.46 -1.33 10.40 5.23 
n -0.972 0.136 -0.785 0.137 2.569 3.800 0.994 2.185 
Ea (kcal/mol) 92.63 2.09 85.77 2.13 1.42 17.22 25.75 44.25 
In our study, the chair and boat conformations of cyclohexane have been taken into account. As 
mentioned by Dixon and al. [52], the boat structure (C2v symmetry) is a transition state that connects the 
chair structure with a twist boat (D2 symmetry) conformation. We did not locate the transition state 
corresponding to the C2v boat structure but we have obtained a D2 twist boat conformation that does 
correspond to a characterized energy minimum. Hereafter, this twist boat conformation will be simply 
refereed to as the “boat conformation” (symmetry number of 4, D2 symmetry). At high temperature, the 
concentration of this boat conformation cannot be neglected and the equilibrium constant Keq, 
corresponding to the reaction: 
c-C6H12 (chair)          c-C6H12 (boat) ,      
has been fitted in the 600 - 2000 K  temperature range using the relation: 
K rreq 271.1
3265exp  )    ( exp  +−=+−=
           (17) 
CBS-QB3 calculations performed from chair and boat conformations of cyclohexane and considerations 
of isodesmic reactions have permitted to calculate and . The mean values for and  
are, respectively, 6.5 kcal.mol
°∆ rS
°∆ rH
°∆ rH
°∆ rS
-1 and 2.5 cal.mol.l-1. From equation (17), Keq is equal to 0.047 at 753K 
that corresponds to 95.5 % of cyclohexane molecules in chair conformation. This result is in very good 
agreement with the value reported by Walker and Gulati [53] who reported a slightly larger value 
(99.5%). Another estimation of the equilibrium constant has been reported by Eliel and Wilen [54] from 
experimental values obtained at 1073K. They found that 25% of the twist boat structure is present in the 
mixture, that leads to Keq = 0.33. From equation (17), we obtain Keq = 0.17 at 1073K  that is consistent 
with the estimation made by Eliel and Wilen. Thus, it appears that, in the range of temperature 
considered in our study, the chair/boat conformation ratio must be taken into account in kinetic 
calculations.  
Two different TSs have been found for the ring opening of cyclohexane depending on its initial boat 
or chair structure (Figure 4), the TS corresponding to the former conformation being 2.4 kcal.mol-1 
lower in free energy. Since activation energy for ring opening is much higher than the energy involved 
in boat/chair conformation change, we conclude that only the lowest TS should be taken into account in 
the kinetic scheme.   
a)   b)   
Figure 4: TS obtained from the ring opening of cyclohexane with the boat conformation (a) and the 
chair conformation (b) 
The free enthalpy of activation obtained for the ring opening of cyclohexane (∆G≠ = 85.5 kcal.mol-1, 
at T = 298K) is higher than that obtained for cyclobutane (∆G≠ = 60.7 kcal.mol-1) or cyclopentane (∆G≠ 
= 80.5 kcal.mol-1) at the same temperature, and close to the value found for a linear alkane. This result 
will be discussed in detail below but it is interesting to point out that it is consistent with an “unstrained 
structure” in cyclohexane. The main way of decomposition for the biradical •C6H12• (biradical 5) is the 
formation of 1-hexene, as previously mentioned by Tsang [26]. Indeed, this process involves a slightly 
constrained transition state for H-abstraction (six-member ring) with a low activation energy ≠∆H = 1.8 
kcal.mol-1 (vs 17.6 kcal.mol-1 for c-C4H8 and 6 kcal.mol-1 for c-C5H10 at 298 K). The value of ≠∆H = 1.8 
kcal.mol-1 can be compared with the semi-empirical estimation of Ea given by equation 13 and 
corresponding to a disproportionation process. Taking 1 kcal.mol-1 for ED and 0.7 kcal.mol-1 for ERS 
(disproportionation of two alkyl radicals for ED and a ring strain energy of a six-member ring for ERS), 
Ea = 1.7 kcal.mol-1, in very good agreement with our calculation.  Owing to this very low activation 
energy, other routes for •C6H12• decomposition are unlikely. An interesting remark can be made for 
β-scission leading to the formation of •C4H8• and ethylene (reaction 5 → 8). In the experimental study 
performed by Tsang [26], no cyclobutane was detected what could implicitly be explained by a very low 
reaction rate for •C4H8• formation compared to 1-hexene. Analyzing free enthalpies of activation at 
T=1000K in Scheme 7 shows that the ratio between the disproportionation leading to 1-hexene (reaction 
5→ 6) and the β-scission (reaction 5→ 8) is about 120. A comparable value is obtained in cylopentane 
decomposition (Scheme 5). However, in the case of the thermal decomposition of cyclopentane, 
cylopropane is detected experimentally [25] (though it represents a minor product) whereas for 
cylohexane, no cyclobutane is observed at all. This discrepancy can be explained by the very fast 
decomposition of the biradical •C4H8• in two ethylene molecules, compared to the cyclization reaction, 
as shown in Scheme 3. This assumption cannot be verified using the experimental data reported by 
Tsang [26] since ethylene formed by initiation would represent a minor part of the total concentration of 
this molecule, which is principally obtained in propagation reactions (decomposition of 1-hexene and 
cylohexyl radical).  Activation energy of the β-scission of biradical 5 (reaction 5→ 8) is equal to 25.2 
kcal.mol-1 at 298 K, which is consistent with the β-scission of a C-C bond in an alkyl free radical, with a 
corresponding activation energy of 28.7 kcal.mol-1 [50].  
  Let us now consider the effect of conformers of •C6H12•, that was ignored in Scheme 7. The 
activation energy involved in the formation of 1-hexene from the biradical •C6H12• is quite low ( ≠∆H = 
1 kcal.mol-1 at 298 K) and might be competitive with rotational barriers. In accordance with this 
assumption, we developed a detailed mechanism for the formation of 1-hexene involving the different 
conformations of the biradical (Scheme 8). 
3 (s)
4 (s)
87.288.43.9
67.1 67.0
6 13.3-1.7
(s) singlet state
∆G°(T)  in kcal.mol-1  (bold : T= 298K, italic : T=1000K)  
Scheme 8. Detailed mechanism of 1-hexene formation from the ring opening of cyclohexane and 
obtained by considering different conformers of C6 biradicals. 
Table 10. Rate parameters for the unimolecular initiation of cyclohexane at P=1 atm, 600 ≤ T (K) ≤ 
2000 K and related to scheme 8. 
 k1-3 k3-1 k2-3 k3-2 k3-6 k3-4 k4-3 k4-6
log A (s-1) 20.68  11.08 20.16 11.23 2.17 10.11 11.40 4.17 
n -0.799 0.117 -0.810 0.073 2.923 0.720 0.359 2.295 
Ea (kcal/mol) 92.44 0.70 85.99 0.81 0.77 2.50 2.98 0.20 
Table 10 summarizes the rate parameters for elementary reactions in Scheme 8. In order to compare 
the global rate constant obtained by Tsang for the thermal decomposition of cyclohexane in 1-hexene 
[26] to that obtained from Schemes 7 and 8, we performed quasi-stationary-state approximation (QSSA) 
on biradical 5 (Scheme 7) and biradicals 3 and 4 (Scheme 8). Scheme 8 leads to the following 
expressions for the reaction c-C6H12 → 1-C6H12 : 
kchair
scheme8 =  
k3−6 k1−3
k4−6 k3−4  k1−3
C k4−3 +  k4−6( )
             (18) 
kboat
scheme8 =  
k3−6 k2−3
k4−6 k3−4  k2−3
C k4−3 +  k4−6( )
              (19) 
with C =  
k3−4 k4−6
k4−3 +  k4−6
 +  k3−1 +  k3−2 +  k3−6               (20) 
At a given temperature, the global rate constant can be calculated from rate constants given by 
equations (18) and (19) and equilibrium constant in order to have the ratio of boat and chair 
conformations. Thus, we obtained the following expression for the global rate constant:  
scheme
scheme
boateq
scheme
chairscheme
)61( +
=−              (21) 
where Keq is the equilibrium constant obtained from equation (17). 
For the more general Scheme 7, QSSA applied on the biradical 5 leads to the following relations: 
kchair
scheme7 =  
 k1−5 k5−6
k5−1 +  k5−6
       and       kboat
scheme7 =  
k2−5 k5−6
k5−2 +  k5−6
               (22) 
By analogy with equation (21), the global rate constant for Scheme 7 can be the calculated by using the 
equilibrium constant:  
kg(1−6)
scheme7 =  
kchair
scheme7
1 +  Keq
Keq  kboat
scheme7
1 +  Keq
           (23) 
Rate parameters obtained by fitting and  for temperatures ranging from 600K to 2000K 
are presented in Table 11.   
kg(1−6)
scheme7 kg(1−6)
scheme8
Table 11. Rate parameters for the global reactions cC6H12 → 1-hexene at P=1 atm, 600 ≤ T (K) ≤ 2000 
K and related to Schemes 7 and 8. 
scheme
gk −  
scheme
gk −  
log A (s-1) 20.45 20.29 
n -0.685 -0.639 
Ea (kcal/mol) 93.01 94.52 
Figure 5 shows the comparison of the values calculated from and  (Table 11) and those 
obtained from the rate constant proposed by Tsang [26], for temperatures between 950K and 1100K, i.e. 
in the range of validity of Tsang’s study.  
kg(1−6)
scheme7 kg(1−6)
scheme8
Figure 5: comparison between calculated rate constant and experimental data [26] for the global 
reactions c-C6H12 → 1-hexene. 
The agreement between k  and the rate constant proposed by Tsang is rather satisfactory since 
the ratio 
g(1−6)
scheme8
kg(1−6)
scheme8
kTsang
lies between 2 and 2.6. Another point concerns the difference obtained between the rate 
constants and  shown in Figure 6. By neglecting rotational hindrance (Scheme 7), the 
global rate constant is overestimated by a factor 2 compared to , in the temperature range  
950 – 1100 K.  This difference can be explained by the low activation energy involved in the formation 
of 1-hexene compared to rotational barrier, but also by entropic effects due to the difference between 
entropy of  the “linear” biradical (biradical 5) and biradicals 3 and 4 (Table 1). This results shows that 
by considering only the lowest energy biradical conformation, the global rate constant is largely over-
estimated. 
kg(1−6)
scheme7 kg(1−6)
scheme8
kg(1−6)
scheme7 kg(1−6)
scheme8
5.4 Ring strain energies 
The results obtained for the ring opening of cycloalkanes into biradicals, show an increase of the 
enthalpy of activation going from cyclobutane to cyclohexane. These differences are mainly due to the 
change of ring strain energy when one goes from cycloalkanes to TS. To discuss the results obtained, 
we first calculated the ring strain energy of cycloalkanes by using the usual definition of RSE [55] 
(equation 24) : 
RSE = ∆HRS = Hcyclo – n HCH2                 (24) 
where Hcyclo and HCH2, represent, respectively, the electronic enthalpies of cycloalkane and of CH2 
fragment in a strain-free alkane. n represents the number of CH2 fragments in the cyclic species. HCH2 
has been calculated by difference between the enthalpy of n-hexane and n-pentane. Hcyclo and HCH2 have 
been obtained by considering the electronic energy, zero-point energy and thermal corrections to 
enthalpy given at a CBS-QB3 level of theory at 298K.  
RSE can also be obtained by using enthalpies calculated by isodesmic reactions. Thus, equation 24 can 
be rewritten as : 
RSE = ∆HRS = ∆fH°cyclo – n ∆fH°CH2               (25) 
where ∆fH°cyclo and ∆fH°CH2 correspond, respectively, to the enthalpy of cycloalkane formation obtained 
from isodesmic reactions and enthalpy of a CH2 group formation, obtained by the difference between 
the enthalpy of formation of n-hexane and n-pentane.  
The values obtained for cyclobutane, cyclopentane, cyclohexane (twist boat) and cyclohexane (chair) 
are summarized in Table 12.  
RSE obtained at a CBS-QB3 level from direct electronic energies and those obtained from isodesmic 
reactions are in a very good agreement with values proposed by Cohen [56] and based on group 
additivity method. 
Table 12. Ring strain energies (in kcal.mol-1) of cycloalkanes as calculated at the CBS-QB3 level of 
theory and  those proposed by Cohen [56] at  T=298K.  
Cycloalkane Cyclobutane Cyclopentane Cyclohexane 
(Chair) 
Cyclohexane 
(Twist Boat) 
RSE given by 
Electronic energies 
27.0 7.4 1.1 7.5 
RSE calculated from 
Isodesmic reactions 
26.8 7.5 1.0 7.5 
RSE from Cohen [56] 26.8 7.1 0.7 / 
It is now interesting to estimate the remaining part of RSE contained in the transition states during 
ring opening. This can be done by comparing the enthalpies of activation obtained for the ring opening 
of cycloalkanes with those obtained for the dissociation of free-strain linear alkanes and by assuming 
that the difference obtained between the cyclic species and the corresponding linear one is only due to 
the ring strain energy. If no difference is observed, one can conclude that no remaining RSE is 
contained in the TS. 
For a linear alkane, the bond dissociation energy (BDE) can be assimilated with the activation 
enthalpy since the recombination of the two free radicals is barrierless. BDE between two secondary 
carbon atoms have been estimated by CBS-QB3 methods for n-butane, n-pentane and n-hexane by 
means of reaction (26): 
CnH2n+2  →  x  •C3H7  +  y  •C2H5                  (26) 
where CnH2n+2 represents a linear alkane; x and y depend on the value of n.  
BDE corresponds to the enthalpy of reaction (24) and can be expressed by the following equation: 
BDE = ∆rH° = x HC3H7 + y HC2H5 – HCnH2n+2 ≈ ∆H≠             (27)   
where HC3H7, HC2H5, represent, respectively, electronic enthalpy of n-propyl and ethyl radicals, and 
HCnH2n+2, the enthalpy of n-butane, n-pentane or n-hexane.  
Calculations of BDE in equation (27) can be done using electronic enthalpies or enthalpies estimated 
from isodesmic reaction.Table 13 gives the results obtained in the two cases and the experimental 
values proposed by Luo [57]. 
Table 13: BDE (in kcal.mol-1) of two secondary carbon atoms for unstrained linear alkanes, at 298 K  
(C-C) 
CBSQ-B3 
Electronic energies 
CBSQ-B3 
Isodesmic reactions
Experimentals 
BDE  [57] 
n-Butane 88.8 87.0 86.8 
n-Pentane 89.6 87.9 87.3 
n-Hexane 90.5 88.9 87.5 
BDE calculated from direct electronic calculations are higher that experimental ones [57]. On the other 
hand, the values deduced from isodesmic calculations are closer to these recommended values and show 
the best accuracy obtained with isodesmic reactions. If we consider that BDE can be assimilated to 
enthalpy of reaction in the case of linear unstrained alkane, we can compare the activation energies 
obtained for the opening of cycloalkanes and those obtained by removing the RSE from the BDE of the 
corresponding linear free-strain alkane. Table 14 shows this comparison from isodesmic calculations. 
Table 14. Comparison of ∆H≠ (in kcal.mol-1), obtained for the ring opening of cycloalkanes and those 
estimated from linear unstrained alkane, at T=298K by taking into account isodesmic reactions.  
Species ∆H
≠ for 
cycloalkane 
Ring strain energy 
(RSE) 
∆H≠ calculated from the 
corresponding linear alkane 
Remaining 
Cyclobutane 61.7 26.8 87 – 26.8 = 60.2 1.5 
Cyclopentane 82.4 7.5 87.9– 7.5 =  80.4 2.0 
Cyclohexane 
(chair) 
89.5 1.0 88.9 – 1.0 = 87.9 1.6 
Cyclohexane 
(twist boat) 
83.0 7.5 88.9 – 7.5 = 81.4 1.6 
The remaining RSE shown in table 14 permits to conclude that almost all of the RSE is removed in the 
transition state of cyloalkanes, excepted, perhaps, for cyclohexane (chair) where isodesmic calculations 
show a slightly increase of the ring strain energy in the transition state. Figure 6 shows the three 
transition state involved in the ring opening of the cycloalkanes studied. 
c) b) a) 
3.533 Å 3.329 Å
2.789 Å 
Figure 6: TS involved in the ring opening of cyclobutane (a), cyclopentane (b) and cyclohexane (c) 
Dudev et al. [55] have showed, from ab initio calculations, that the ring strain energy of cycloalkanes 
can be explained by contributions of ring bond angles, bond lengths or dihedral angles, this last 
parameter reflecting nonbonded interactions. Thus, the ring strain energy remaining in the TS of 
cyclohexane can be explained by the fact that the six-member ring is forced to adopt an energetically 
unfavorable gauche conformation, whereas n-hexane can exist in a strain-free trans-trans-trans 
conformation. Moreover, they showed that the effects of ring bond angles on RSE decrease when the 
size of the cycle increases while an opposite effect is obtained for nonbonded interactions. In our study, 
the formation of TS from cyclobutane involves an increase of the ring bond angles which permits to 
remove a large part of RSE. Indeed, in the TS of Figure 6a, ∠C1C2C3 = ∠C2C3C4  =  108.4°  vs 88.6° in 
cyclobutane. For TS involved in the ring opening of cyclopentane (Figure 6b), the large increase of the 
bond length between C1 and C2 (3.329 Å in TS vs 1.545 Å in cyclopentane) associated with an increase 
of the ring bond angle ( °=∠=∠ 8.114    542531 CCCCCC in the TS vs 104.8° in the molecule), may explain 
a large part of the removal of RSE in the TS. In cyclohexane, the low value of RSE is maintained in the 
transition state, that can be explained by gauche interactions remaining in the TS.  
6. Conclusions 
While much work have been carried out on the thermal reactions of aliphatic hydrocarbons, the high 
temperature reactions of cyclic hydrocarbons has not been explored extensively. In this study, the ring 
opening of the most representative cyclic alkanes, i.e. cyclobutane, cyclopentane, and cyclohexane have 
been extensively explored by means of quantum chemistry. All the possible elementary reactions have 
been investigated from the biradicals yielded by the initiation steps. The thermochemical properties of 
all the species have been calculated with the high-level CBS-QB3 method. The inharmonic contribution 
of hindered rotors have been taken into account and isodesmic reactions have been systematically used 
for the evaluation of the enthalpies in order to minimize the systematic errors. The enthalpies of 
formation of the biradicals have been compared to data obtained with a semi-empirical method and 
show a very good agreement. 
For all the elementary reactions, the transition state theory allowed to calculate the rate constant. 
Three parameters Arrhenius expressions have been derived in the temperature range 600 to 2000 K at 
atmospheric pressure. Tunneling effect has been taken into account in the case of internal H transfers. 
Thanks to the Quasi Steady State Approximation applied to the biradicals, rate constants have been 
calculated for the global reaction leading directly from the cyclic alkane to the molecular products. 
These values have been compared with the few data available in the literature and showed a rather good 
agreement. The main reaction routes are the decomposition to two ethylene molecules in the case of 
cyclobutane and the internal disproportionation of the biradicals yielding 1-pentene and 1-hexene in the 
case of cyclopentene and cyclohexane, respectively. An important fact highlighted in this work is the 
role of the internal rotation hindrance in the biradical fate. Whereas the energy barriers between 
conformers are usually of low energy in comparison to the reaction barriers, all the energies are close in 
this case and taking the rotations between the conformers into account changes the global rate constant 
especially for the largest biradicals. The analysis of the variation of the ring strain energy has also 
showed that the larger part is removed when going from the cycloalkanes to the transition states. These 
last structure are close to be unconstrained with between 1 or 2 kcal remaining. 
ACKNOWLEDGMENT  
The Centre Informatique National de l’Enseignement Supérieur (CINES) is gratefully acknowledged 
for allocation of computational resources 
SUPPORTING INFORMATION AVAILABLE 
  The full list of author in ref 30, the structural parameters for all the species investigated in this study 
the frequencies, energies and zero point energies. This material is available free of charge via the 
Internet at http://pubs.acs.org.  
http://pubs.acs.org/
REFERENCES 
(1) Guibet, J.C. Fuels and Engines, Institut français du Pétrole Publications, Eds.; Technip : Paris, 
1999 ; Vol 1, pp 55-56. 
(2) Bounaceur, R. ; Glaude, P.A. ; Fournet, R. ;  Battin-Leclerc, F. ; Jay, S. ; Pires Da Cruz, A. Int. 
J. of Vehicles Design, in press. 
(3) Slutsky, V.; Kazakov, O.; Severin, E.; Bespalov, E.;Tsyganov, S. Combust. Flame 1993, 94, 
108. 
(4) Zeppieri, S.; Brezinsky, K.; Glassman, I.  Combust. Flame 1997, 108, 266.  
(5) Voisin, D.; Marchal, A.; Reuillon, M. ; Boetner, J.C. ; Cathonnet, M.  Combust. Sci. Technol. 
1998, 138, 137. 
(6) Simon, V. ; Simon, Y.; Scacchi, G. ; Barronet, F.  Can. J. Chem. 1999, 77, 1177.  
(7) El Bakali, A.; Braun-Unkhoff, M.; Dagaut, P.; Frank, P.; Cathonnet, M. Proc. Combust. Inst. 
2000, 28, 1631. 
(8) Ristori, A.;  Dagaut, P. ; El Bakali, A. ; Cathonnet, M.  Combust. Sci. Technol. 2001, 165, 197. 
(9) Lemaire, O.; Ribaucour, M. ; Carlier, M. ; Minetti, R.  Combust. Flame 2001, 127, 1971. 
(10) Dubnikova, F. ; Lifshitz, A.  J. Phys. Chem. A 1998, 102, 3299. 
(11) Hohm, U. ; Kerl, K. Ber Bunsenges phys. Chem. 1990, 94, 1414. 
(12) Hidaka, V.; Oki, T. Chem. Phys. Lett. 1987, 141, 212. 
(13) Rickborn, S.F.; Rogers, D.S.; Ring, M.A.; O’Neal, H.E. J.Phys. Chem. 1986, 90, 408-414. 
(14) Lewis, D.K.; Bosch, H.W.; Hossenlop, J.M. J. Phys. Chem. 1982, 86, 803. 
(15) Barnard, J.A.; Cocks, A.T.; Lee, R.K-Y. J. Chem. Soc. Faraday Trans. 1 1974, 70, 1782. 
(16) Lewis, D.K.; Bergmann, J.; Manjoney, R.; Paddock, R.; Kaira, B.L. J. Phys. Chem. 1984, 88, 
4112. 
(17) NIST Chemical kinetics Database; http://kinetics.nist.gov 
(18) Butler, J.N.; Ogawa, R.B. J. Am. Chem. Soc. 1963, 85, 3346. 
(19) Beadle, P.C.; Golden, D.M.; King, K.D.; Benson, S.W. J. Am. Chem. Soc. 1972, 94, 2943.  
(20) Sakai, S. Int. J. Quantum Chem. 2002, 90, 549. 
(21) Bernardi, F.; Bottoni, A.; Robb, A.R.; Schlegel, H.B.; Tonachini, G. J. Am. Chem. Soc. 1985, 
107, 2260.  
(22) Bernardi, F.; Bottoni, A.; Olivucci, M.; Robb, A.R.; Schlegel, H.B.; Tonachini, G. J. Am. 
Chem. Soc. 1988, 110, 5993. 
(23) Doubleday, C., Jr.; J. Am. Chem. Soc. 1993, 115, 11968.  
(24) Pedersen, S; Herek, J.L.; Zewail, A.H. Science 1994, 266, 1359. 
(25) Tsang, W. Int. J. Chem. Kinet. 1978, 10, 599. 
(26) Tsang, W. Int. J. Chem. Kinet. 1978, 10, 1119. 
(27) Kalra, B.L.; Feinstein, S.A.; Lewis, K. Can. J. Chem. 1979, 57, 1324. 
(28) Brown, T.C.; King, K.D., Nguyen,T.T. J. Phys. Chem. 1986, 90, 419. 
(29) Benson, S.W. Thermochemical Kinetics, 2nd Ed.; Wiley: New York, 1976. 
(30) Frisch, M. J.; et al. Gaussian03, revision B05; Gaussian, Inc.: Wallingford, CT, 2004. 
http://kinetics.nist.gov/
(31) Montgomery, J.A.; Frisch, M.J.; Ochterski, J.W.; Petersson, G.A. J. Chem. Phys. 1999, 110, 
2822. 
(32) Becke, A.D. J. Phys. Chem.  1993, 98, 5648. 
(33) Lee, L.; Yang, W.; Parr, R.G. Phys Rev. B 1998, 37, 785. 
(34) Gräfenstein, J.; Hjerpe A.M.; Kraka E.; Cremer D. J. Phys. Chem. A 2000, 104, 1748. 
(35) Cremer, D.; Filatov, M.; Polo, V. ; Kraka, E.; Shaik, S. Int. J. Mol. Sci. 2002, 3, 604. 
(36) Schreiner, P.R.; Prall, M.  J. Am. Chem. Soc., 1999, 121, 8615. 
(37) Goldstein, E.; Beno, B.; Houk, K.N. J. Am. Chem. Soc., 1996, 118, 6036. 
(38) Kutawa, K.T. ; Valin, L.C. ; Converse, A.D. J. Phys. Chem. A 2005, 109, 10710. 
(39) Wijaya, C.D.; Sumathi, R.; Green, W.H.Jr. J. Phys. Chem. A 2003, 107, 4908. 
(40) Coote, M.L.; Wood, G.P.F., Radom, L. J. Phys. Chem. A 2002, 106, 12124. 
(41) Wood, G.P.F.; Henry, D.J.; Radom, L. J. Phys. Chem. A 2003, 107, 7985. 
(42) Ayala, P.Y.; Schlegel, H.B. J. Chem. Phys., 1998, 108, 2314.  
(43) Irikura, K.K.; Frurip, D.J.; in Irikura, K.K.; Frurip, D.J (Eds), Computational 
Thermochemistry, ACS Symposium series 677, Washington DC, 1998, pp 13-14. 
(44) NIST Chemistry WebBook : http://webbook.nist.gov/chemistry  
(45) Tsang, W., in: Simões, J.A.M.; Greenberg A.; Liebman, J.F. (Eds), Energetics of organic free 
radicals, vol. 4, Blackie A&P, Glasgow, 1996, pp 22-58. 
(46) Cramer, J.C. Essentials of Computational Chemistry, 2nd Ed., Wiley: Chichester, 2004, p. 527. 
(47) Skodje, R. T.; Truhlar, D.G. J. Phys. Chem., 1981, 85, 624. 
http://webbook.nist.gov/chemistry
(48) Lee, J.; Bozelli, J.W. J. Phys. Chem. A, 2003, 107, 3778. 
(49) Bounaceur, R.; Buda, F.; Conraud, V.; Glaude, P.A.; Fournet, R.; Battin-Leclerc, F. Comb. & 
Flame, 2005, 142, 170. 
(50) Bettinger, H.F.; Rienstra-Kiracofe, J.C.; Hoffman, B.C.; Schaefer III, H.F.; Baldwin, J.E.; 
Schleyer, R. Chem. Commun., 1999, 1515  
(51) Brocard, J.C.; Baronnet, F.; O’Neal, H.E. Comb. & Flame, 1983, 52, 25. 
(52) Dixon, D.A.; Komornicki, A. J. Phys. Chem. 1990, 94, 5630. 
(53) Gulati, S.K.; Walker, R.W. J. Chem. Soc., Faraday Trans., 1989, 85, 1799 
(54) Eliel, E.L.; Wilen, S.H. Stereochemistry of Organic Compounds, Wiley-Intersciences, New-
York, 1994. 
(55) Dudev, T.; Lim, C. J. Am. Chem. Soc., 1998, 120, 4450. 
(56) Cohen, N. J. Phys. Chem. Ref. Data, 1996, 25, 1411 
(57) Luo, Y.R. Handbook of Bond Dissociation Energies in Organic Compounds, CRC Press LLC 
2003, pp 96-97. 
ABSTRACT
  This work reports a theoretical study of the gas phase unimolecular
decomposition of cyclobutane, cyclopentane and cyclohexane by means of quantum
chemical calculations. A biradical mechanism has been envisaged for each
cycloalkane, and the main routes for the decomposition of the biradicals formed
have been investigated at the CBS-QB3 level of theory. Thermochemical data
(\delta H^0_f, S^0, C^0_p) for all the involved species have been obtained by
means of isodesmic reactions. The contribution of hindered rotors has also been
included. Activation barriers of each reaction have been analyzed to assess the
1 energetically most favorable pathways for the decomposition of biradicals.
Rate constants have been derived for all elementary reactions using transition
state theory at 1 atm and temperatures ranging from 600 to 2000 K. Global rate
constant for the decomposition of the cyclic alkanes in molecular products have
been calculated. Comparison between calculated and experimental results allowed
to validate the theoretical approach. An important result is that the
rotational barriers between the conformers, which are usually neglected, are of
importance in decomposition rate of the largest biradicals. Ring strain
energies (RSE) in transition states for ring opening have been estimated and
show that the main part of RSE contained in the cyclic reactants is removed
upon the activation process.

<|endoftext|><|startoftext|>
Introduction 
 I.1 The Yb breakthrough 
 I.2. Strategy on the matrix host 
II. Temperature profile of an ytterbium-doped crystal under diode pumping 
 II.1. Theoretical aspects 
II.1.1. The steady-state heat equation 
II.1.2. A review of analytical solutions of the steady state heat equation  
II.1.3. What is special about ytterbium-doped materials? The influence of absorption saturation 
in the temperature distribution. 
II.1.4. Determining the absolute temperature: the boundary conditions 
II.2. Experimental absolute temperature mapping and heat transfer measurements 
using an infrared camera 
II.2.1. Introduction 
II.2.2. Experimental setup for direct temperature mapping 
II.2.3. results and measurements of heat transfer coefficients 
III. Thermal lensing effects : theory  
III.1. Introduction  
III.2. Stress and strain calculations 
III.3. How can we take into account the photoelastic effect ? 
III.4. Simplified account of photoelastic effect in isotropic crystals 
III.5. A consequence of strain-induced birefringence: depolarization losses.  
III.6. Thermally-induced optical phase shift.  
III.6.1. Expression of the optical path 
  III.6.2. The thermal lens focal length 
 III.7. Discussion about the use of the “dn/dT” coefficient 
 3 /115  
III.8. A novel definition for thermo-optic coefficient based on experimentally 
measurable parameters.  
III.9. The aberrations of the thermal lens.  
IV. Thermal lensing techniques  
IV.1. Introduction  
IV.2. Geometrical methods 
IV.3. Methods based on the properties of cavity eigenmodes 
IV.4. Methods based on wavefront measurements 
IV.4.1. Classical interferometric techniques 
IV.4.2. Shearing interferometric techniques 
IV.4.3. Methods based on Shack-Hartmann wavefront sensing 
IV.4.4. Other techniques 
IV.5. Conclusion 
V. thermal lensing measurements in ytterbium-doped materials: the evidence of 
a non radiative path  
V.1. the thermal load in Yb-doped materials  
 V.2. Evidence of nonradiative effects in Yb-doped materials: the example of Yb:YAG  
V.3. Laser wavelength dependence oN the thermal load in Yb-doped broadband 
materials: the example of Yb:Y2SiO5 
IV.4. The influence of the mean fluorescence wavelength on the thermal load: an 
illustration with Yb:KGW 
IV.5. Conclusion 
VI. Conclusion 
Appendix : Calculation of the photoelastic constants Cr, θ  and C’r, θ  using plane strain and plane 
stress approximations.  
 4 /115  
I. Introduction  
I.1. The Yb3+ breakthrough 
Diode-pumped solid-state laser (DPSSL) technology has become a very intense field of 
research in Physics [1,2]. The replacement of flash-lamp pumping by direct laser-diode pumping for 
solid-state materials has brought a very important breakthrough in the laser technology in particular 
for high power lasers [3-4]. In fact, the better matching between absorption wavelength and 
material’s absorption spectra brought by the use of laser diode emission ― compared to the broad 
one of flash-lamps ― has lead to a significant benefit in efficiency and subsequently in simplicity, 
compactness, reliability and cost. This progress has substantial implications on laser applications 
such as fundamental and applied research, laser processing, medical applications …     
 5 /115  
4I9/2
4I11/2
4I13/2
4I15/2
4F3/2
4F5/2
4G5/2
4G7/2
4G9/2
2F5/2
2F7/20
10000
20000
Nd3+ Yb3+
0.8 µm
0.9 µm
0.94 µm
0.98 µm
~1 µm
~ 1 µm
Parasitic effects
4I9/2
4I11/2
4I13/2
4I15/2
4F3/2
4F5/2
4G5/2
4G7/2
4G9/2
2F5/2
2F7/20
10000
20000
Nd3+ Yb3+
0.8 µm
0.9 µm
0.94 µm
0.98 µm
~1 µm
~ 1 µm
Parasitic effects
Figure 1: Energy levels of Yb and Nd ions. Typical laser transition lines are represented for both 
pump absorption and laser emission. Lines of high excited states are also represented including the 
lines involved in the deleterious effects (up-conversion, excited-state absorption or concentration 
quenching).   
In the realm of high average power DPSSL, two rare-earth ions dominate:  neodymium and 
ytterbium [5-6]. Actually they can be efficiently pumped, respectively at 808 nm with 
InGaAsP/GaAs or AlGaAs/GaAs diodes for neodymium, and between 900 and 980 with 
InGaAs/GaAs diodes for ytterbium (fig. 1).  In both case the standard laser emission is around 1 µm, 
corresponding to the transition between the 4F3/2 and 4I11/2 lines for the Nd3+ and between the 2F5/2 
and 2F7/2 for the Yb3+. At the beginning of the high-power-laser development, the Nd-doped 
materials were preferred to the Yb-doped ones mainly because of the four level nature and their 
 6 /115  
many absorption lines, which are more convenient as far as flash-lamp pumping is concerned. 
However it seems obvious, for more than one decade now, that Yb-doped materials are more suited 
for very efficient and very-high-average-power diode-pumped lasers. The main reason for this is the 
very simple electronic level structure of the Yb3+ ion, which consists on two manifolds as shown in 
figure 1. This singular property allows avoiding most of the parasitic effects such as upconversion, 
cross relaxation or excited-state absorption which are present in Nd-doped materials [7] because of 
the existence of higher excited-state levels (4G9/2 for the 1-µm-laser emission). These deleterious 
effects have two main consequences. First, they increase the thermal load and subsequently the 
thermal problems [8] because the main desexcitation paths of the high-excited state levels are non-
radiative (as represented in fig. 1). Secondly, they also alter the gain because they can induce strong 
depopulation of the 4F3/2 level implicated in the laser inversion population. Another advantage of 
Yb-doped materials compared to their neodymium doped counterparts is the very low quantum 
defect (again due to the 2-manifold based electronic structure). In fact, when pumped at 980 nm the 
quantum defect of ytterbium is around 5 % compared to 30 % for neodymium (in YAG). This is a 
real benefit for reducing the thermal problems and thus to attain very high average powers. As an 
example of comparison between Nd and Yb doped materials, we summarized in table 1, the 
different parameters for the same well-known matrix host: YAG (Y3Al5O12) [9-13].  
 7 /115  
Table 1: comparison between Nd:YAG and Yb:YAG 
Crystal Nd:YAG Yb:YAG 
Emission line 
Wavelength 
Cross section 
Broadness (FWHM) 
1064 nm 
28 10-20 cm-2 
0.8 nm 
1031 nm 
2.1 10-20 cm-2 
9 nm 
Lifetime 230 µs 951 µs 
Saturation fluence 0.67 J/cm2 9.2 J/cm2 
Maximum doping rate 2 % 100 % 
Absorption line 
Wavelength 
Cross section 
Broadness (FWHM) 
808 nm 
67 10-20 cm-2 
2 nm 
968 nm 
0.7 10-20 cm-2 
4 nm 
942 nm 
0.75 10-20 cm-2 
18 nm 
Quantum defect  32 % 6.5 % 9.5 % 
I.2. Strategy on the matrix host 
Another advantage of Yb-doped versus Nd-doped materials is the longer lifetime which may 
allow a better storage of the pump energy; and the last but not the least advantage is the generally 
broader bandwidth of the emission lines. This last advantage leads to a potential for femtosecond 
pulse generation which, in the current state-of-the-art has never been demonstrated with 
neodymium. The emergence of ytterbium-based lasers has allowed crucial progress in the 
ultrashort-pulsed laser technology. These materials have been actually the key point for the 
development of the latest generation of “ultrafast” lasers: the all-solid-state femtosecond lasers [14-
36]. Applications for such lasers are abundant and excite a great interest in the scientific community.  
However Yb-doping brings several drawbacks or difficulties. The first one is the very strong 
influence of the matrix on the spectral properties. Actually, as the two levels 2F5/2 and 2F7/2 are split 
in manifolds by the Stark effect due to the electric crystalline field of the host matrix, the ion 
environment strongly models the spectrum. In a simple way, the spectral broadening can be directly 
related to the level of disorder of the matrix [37-48]. On the first hand, if the matrix is relatively 
simple and well-ordered, the spectra would reveal relatively narrow and intense lines (which are a 
 8 /115  
strong disadvantage for short pulse generation). Though, a simple matrix structure generally implies 
a high thermal conductivity which is a key point for developing high power lasers. An example of 
such an Yb-doped material is given in figure 2 with Yb:YAG. On the second hand, if the disorder of 
the host matrix is high, the spectrum will be large and suitable for very-short pulse generation but at 
the expanse of thermal conductivity. An example of such an Yb-doped material is given in figure 3 
with Yb:SYS. The numerous advantages of the Yb3+ ion have led to a strong interest for many host 
matrices but in general favouring either short pulses or high power applications. Table 2 represents 
this diversity of already studied Yb-doped host matrices and their principal properties.  
 9 /115  
Table 2: Comparison between Yb-doped crystals 
Emission line Absorption line Material 
(name and formula) Wavelength 
(nm) 
Cross 
section 
10-20 cm2
Broadness 
(nm) 
Lifetime
(ms) Usual 
wavelength  
(nm) 
Thermal 
conductivity
(undoped) 
(W/m/K) 
Yb:YAG 
Yb:Y3Al5O12 
1031 2.1 9 0.951 942 968 11 
Yb:GGG 
Yb:Gd3Al5O12 
1025 2 10 0.8 971 8 
Yb:Y2O3 
1076 0.4 14.5 0.82 979 13.6 
Yb:Sc2O3 
1041 1.44 11.6 0.8 979 16.5 
Yb:CaF2 
1045 0.25 70 2.4 979 9.7 
Yb:YVO4 
1020 0.9 40 0.25 985 5.1 
Yb:LSO 
Yb:Lu2SiO5 
1040 0.6 35 0.95 978 5.3 
Yb:YSO 
Yb:Y2SiO5 
1042 0.6 40 0.67 978 4.4 
Yb:YLF 
Yb:YLiF4 
1030 0.81 14 2.21 940 4.3 
Yb:KGW 
Yb:KGd(WO4)2 
1023 2.8 20 0.3 981 3.3 
Yb:KYW 
Yb:KY(WO4)2 
1025 3 24 0.3 981 3.3 
Yb:SYS 
Yb:SrY4(SiO4) 30 
1065 0.44 73 0.82 980 2 
Yb:GdCOB 
Yb:Ca4GdO(BO3)3 
1044 0.35 44 2.6 976 2.1 
Yb:BOYS 
Yb:Sr3Y(BO3)3 
1060 0.3 60 1.1 975 1.8 
Yb:glass 
(phosphate glass) 1020 0.05 35 1.3 975 0.8 
Another drawback of the Yb-doped materials is due to the quasi-3-level structure of these lasers. As 
it is apparent on the spectra of figure 2 and 3, there is an overlap between the emission and the 
absorption bands which leads to strong re-absorption effects and to a reduction of the effective 
emission band broadness. Moreover, since the splitting due to Stark effect is relatively small 
(between 200 and 1000 cm-1), the high energy levels within the 2F7/2 manifold (corresponding to the 
 10 /115  
different possible low-energy levels of the laser transition) are somewhat populated at thermal 
equilibrium. This implies two deleterious effects when temperature increases: first, a reduction of 
the laser inversion population, second, an increase of the reabsorption at the laser wavelength. A 
special care concerning the thermal load and thermal management will be then necessary to develop 
efficient lasers based on ytterbium-doped materials, especially in the high power regime. 
 11 /115  
950 970 990 1010 1030 1050 1070 1090
Section efficace
d'absorption
Section efficace
d'émission
Yb:YAG
Wavelength (nm)
Absorption
Emission
Absorption
Emission 
950 970 990 1010 1030 1050 1070 1090
Section efficace
d'absorption
Section efficace
d'émission
Yb:YAG
Wavelength (nm)
Absorption
Emission
Absorption
Emission 
Absorption
Emission 
Figure 2 : Absorption and emission spectra of Yb:YAG.   
850 900 950 1000 1050 1100 1150
Section efficace
d'absorption
Section efficace
d'émission
73 nm
Yb:SYS
Wavelength (nm)
Absorption
Emission
Absorption
Emission 
850 900 950 1000 1050 1100 1150
Section efficace
d'absorption
Section efficace
d'émission
73 nm
Yb:SYS
Wavelength (nm)
Absorption
Emission
850 900 950 1000 1050 1100 1150
Section efficace
d'absorption
Section efficace
d'émission
73 nm
Yb:SYS
Wavelength (nm)
850 900 950 1000 1050 1100 1150
Section efficace
d'absorption
Section efficace
d'émission
73 nm
Yb:SYS
Wavelength (nm)
850 900 950 1000 1050 1100 1150
Section efficace
d'absorption
Section efficace
d'émission
73 nm
Yb:SYS
Wavelength (nm)
Absorption
Emission
Absorption
Emission 
Absorption
Emission 
Figure 3 : Absorption and emission spectra of Yb:SYS.   
 As a first conclusion, Ytterbium-doped crystals are particularly suitable for directly diode-
pumped, solid-state high-power and/or femtosecond lasers. The emergence of new ytterbium-doped 
laser crystals has allowed crucial progress in the DPSSL technology. Nevertheless, a very special 
 12 /115  
care has to be done concerning the thermal properties and thermal effects in the Yd-doped materials 
because of their strong influence on the laser performance. In this paper, we then propose to make a 
review of different thermal-effect studies made on ytterbium-doped laser crystals. 
II. Temperature profile of an ytterbium-doped crystal 
under diode pumping  
 In this part we present a review about how to calculate and measure the temperature 
distribution in an end-pumped laser crystal. We explain in which cases it is possible to obtain an 
analytical expression (otherwise a finite-element analysis would be necessary), and how these well-
known results have to be corrected when we deal with ytterbium-doped crystals, because absorption 
saturation cannot be ignored in this case. Then, we investigate the role of the thermal contact at the 
boundaries, which is an essential parameter for the knowledge of the temperature. This will be 
illustrated, in the last part of this section, by experimental absolute temperature maps, obtained with 
an infrared imaging camera.  
II.1. Theoretical aspects 
II.1.1. The steady-state heat equation 
A study of thermal effects in crystals first requires the calculation of the temperature field at any 
point of the crystal. One has to solve the heat equation: 
)t,z,y,x(Q)t,z,y,x(TK
)t,z,y,x(T
c thcp =∇−
ρ  (II.1.1.) 
with :  - T = T(x,y,z,t) : temperature in K; 
 - ρ : density in kg.m-3; 
 - cp : specific heat in J.kg-1. K-1; 
 13 /115  
 - Kc : thermal conductivity in W.m-1.K-1; 
 - Qth : thermal power (or thermal load) per unit volume in W.m-3.  
The specific heat affects the temperature variation in the pulsed regime or in the transient regime: 
we will ignore it in the following work since we will only consider CW lasers. The thermal 
conductivity governs the temperature gradient inside the crystal, and will have a crucial importance 
for the thermal lens magnitude. The heat transfer coefficient H is arising when writing the boundary 
conditions, and has then an influence only on the absolute value of the temperature inside the 
crystal. We will discuss at the end of this part how to measure it, and some ways to improve its 
value.  
In order to obtain analytical expressions, some assumptions have to be made. We will assume the 
following: (1) The pump profile is axisymmetric. End-pumping by a fiber-coupled diode is a good 
example of such a profile; (2) the thermal conductivity Kc is a scalar quantity, not a tensor; this 
means that we restrict our discussion to glasses and cubic crystals [49] (in practice however the 
anisotropy of the thermal conductivity tensor is often weak); (3) the cooling is isotropic in the z-
plane, which means that the crystal mount does not favour one given direction of cooling. (4) At 
last we assume that the thermal conductivity is not significantly dependant on temperature, so that it 
can be considered as a constant. This approximation is very realistic in YAG around room 
temperature, following the study of Brown et al. [50], and we assume that it is also true in other 
crystals. However this approximation would not be valid anymore at cryogenic temperatures.  
The heat equation, including all these assumptions, becomes: 
)z,r(Q
 (II.1.2.) 
 14 /115  
when r is the radial coordinate of a point inside the crystal, measured with respect to the pump 
distribution axisymmetric axis. To simplify a bit further the equations, the crystal will be considered 
cylindrical, whose axis corresponds to the pump symmetry axis, with a radius r0 and a length L (see 
figure 4) 
Figure 4: geometry of the crystal, taken for all calculations. z=0 is the input face. 
II.1.2. A review of analytical solutions of the steady state heat equation  
An analytical expression of the temperature distribution inside a crystal is calculable only in a 
limited number of simple systems. For more complex geometries, one has to use finite element 
codes. We present in this subsection a brief review of the cases where such an analytical treatment 
is feasible.  
The first study of thermal effects in crystals has been provided by Koechner [51] in the early 
seventies. He considered flash-pumped Nd:YAG rods, within which the thermal load is uniform :  
Q thth 2
=  (II.1.3.)  
where Pth = ηh Pabs is the thermal power (in W) dissipated into the rod. Pabs is the absorbed pump 
power and ηh is the fractional thermal load. The solution writes:  
z r0 
z = 0 
Pump 
 15 /115  
)r(T)r(T
absh 22
+=  (II.1.4.) 
where T(r0) is the temperature at the edge of the crystal, which will be estimated later thanks to the 
boundary conditions. It is useful to write the temperature shift between the centre and the edge of 
the rod: 
( ) ( )
0∆ 0 =−=  (II.1.5.) 
We note that ∆T is independent of the radius r0  of the crystal, but scales inversely with L. 
The previous results can not be applied to end-pumping configurations, because in these latter cases 
the thermal load is localized within a small volume inside the crystal. In a vast majority of practical 
circumstances, the pump beam profile is axisymmetric and can be described by a super-gaussian 
shape. The general solution of the heat equation for a super-gaussian beam of any order has been 
derived by Schmid et al. [52].  
The situation is even simpler in most cases: indeed, the pump is often either a near-diffraction-
limited Gaussian beam (laser or single mode diode), or a “top-hat” beam (that is a super-gaussian 
profile of infinite order). The latter description corresponds quite well to fiber-coupled diode laser 
array pumping.  
The solution of the heat equation in the specific case of Gaussian-beam pumping is treated in [53] 
and [54]. The case of the top-hat shape has been derived by Chen et al. [55].  
Hereafter we give the solution for a top-hat beam profile. Assuming that the thermal load in each 
slice at z coordinate is a disk of radius wp(z), the temperature shift with respect to the edge 
temperature is:  
 16 /115  
)z(wr
)z(wr
)z,r(T)z,r(T
 (II.1.6.) 
where: 
- αNS is the (non-saturated) linear absorption coefficient; saturation absorption is then not 
taken into account in this formula ; 
- The axial heat flux (along z) is ignored, which means in other words that 
∂ is neglected 
in the heat equation. We’ll see in the next section the reasons why we can neglect axial heat 
flux. The formula is then not valid for thin disks.  
- As a consequence of the latter point, the temperature can be computed inside each “slice” of 
thickness dz of the crystal, as if the surrounding slices did not exist.  
The temperature gradient inside the pumped volume is of particular interest because the laser beam 
is usually (and preferably) smaller than the pump volume. One obtains: 
( , ) ( 0, ) ( , )
4 1 ( )
h abs NS
P e r
T r z T r z T r z
K e w z
∆ = = − =
 (II.1.7.) 
The temperature shift turns out to be independent of the crystal radius r0 and of the crystal length L, 
which was not the case in Koechner’s simple model (equation II.1.4). It makes sense since the 
important parameter here is the absorption length NSabsL α1= , and not the  whole length L of the 
crystal.  
In figure 5 we plotted the normalized temperature distribution for a typical ratio wp/r0 = 0.1. 
 17 /115  
-1 -0.8 -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1
r/r0 
T( r ) −T( r0 )
Figure 5: Normalized temperature distribution (in a plane perpendicular to the propagation axis)  
for a crystal pumped by a top-hat-profile fiber-coupled laser diode with wp/r0 = 0.1).  
II.1.3. What is special about ytterbium-doped materials? The influence of absorption 
saturation in the temperature distribution. 
An ytterbium-doped material, especially when pumped at the zero-line wavelength (i.e. around 980 
nm), has many common points with a saturable absorber. The absorption rate due to the pump is 
counterbalanced by the spontaneous emission rate, but also (which is far from being negligible) by 
the stimulated emission rate at the pump wavelength. It is essential to take absorption saturation 
effects into account; otherwise the absorbed pump power can be dramatically overestimated. As a 
result, the absorbed pump power is lower under nonlasing than under lasing conditions, since lasing 
provides (hopefully) a very efficient path to carry the excited population back to the fundamental 
level.  
 18 /115  
It is noteworthy that most Finite Element Analysis (FEA) codes (primarily designed for 4-level 
laser systems in which absorption saturation is not a problem) basically assume an exponential 
decay for the pump power inside the crystal. It can lead to large errors, as we illustrate below. 
Let Pp(z) be the pump power through a plane in the crystal at z coordinate. The thermal load density 
generated into a disk of radius wp(z) and thickness dz is: 
dz)z(w
)z(dP
th 2π
=  (II.1.8) 
where -dPp(z) represents the absorbed pump power in the thin slice of thickness dz.  
The temperature field is: 
)z(wr
)z(wr
)z(dP
)z,r(T)z,r(T
 (II.1.9.) 
Inside the pumped volume the temperature shift writes as follows:  
( ) ( )
zrTzT
=−  (II.1.10.)  
Absorption saturation issues are taken into account, under nonlasing conditions, by the following 
equation for the pump irradiance  Ip (which is the pump power divided by the pump spot area):  
 (II.1.11.) 
Where αNS is the absorption coefficient in the non saturated regime.  
The pump saturation irradiance
I is calculated from the spectroscopic properties of the material:  
( ) ( )[ ]τλσλσλ pempabspp
sat +
=  (II.1.12.) 
 19 /115  
where absσ is the absorption cross section, emσ is the emission cross section, λp the pump wavelength, 
and τ  the radiative lifetime. 
The pump power Pp(z) obeys the following equation, for a top hat beam profile (one may find the 
equivalent formulation for a gaussian pump profile in [56]): 
)z(wI)z(P
)z(wI)z(P
)z(dP
pppNSp
=  (II.1.13.) 
A practical way to study absorption saturation issues, and to check the assumptions made so far, is 
to perform fluorescence imaging experiments in a pumped crystal. Using a crystal whose one of the 
edge surfaces (in practice one side not facing the radiator) has been polished, one can make an 
optical image of the fluorescence, under lasing or nonlasing conditions, with a CCD camera and an 
interference filter at a long wavelength (at 1064 nm for instance), required to completely eliminate 
the scattered light at the pump wavelength, as well as to prevent detection of fluorescence photons 
which could have experienced reabsorption. This simple experiment allows visualizing absorption 
saturation (the fluorescence intensity, integrated along the depth of focus of the imaging system, 
does not decay exponentially) and also to measure what is the optimum location for the pump spot 
inside the crystal (figure 6). The experiments we performed with different Yb-doped materials 
taught us that the optimum focus (the one for which the measured laser efficiency was the highest) 
was always located at about one third of the whole crystal length from the input face. This 
parameter is taken into consideration in the following.  
The low brightness of the diode pump beam (compared to the brightness of the laser beam) makes 
the effective Rayleigh distance of the pump beam considerably shorter than the crystal length. For 
this reason, the divergence of the pump beam inside the crystal must also be considered, in order to 
correctly account for saturation issues. Here we describe the pump radius evolution by a relation of 
the type:  
 20 /115  
pp wn
  (II.1.14.) 
where 
w is the pump beam waist radius. The M2 factor is determined experimentally. In our case 
we used a 200µm-diameter core fiber-coupled diode (HLU15F200-980 from LIMO GmbH), whose 
the M2 was measured to be around 80.  
Results shown in figure 6 show experimental data and theoretical predictions in a 15%-at. doped 
Yb:GdCOB crystal [57]. The theoretical profiles are computed assuming that: 1) the pump volume 
has a top-hat profile, and 2) the imaging objective has a very low numerical aperture, so that the rate 
of spontaneous photons detected by one pixel can be calculated by integrating the fluorescence 
yield over one vertical line underneath. The good match between theory and experiments show 
incidentally that the “top hat” hypothesis for the pump beam profile is well justified. 
 21 /115  
Without saturation absorption :
At low power
(Pinc = 1 W
Pabs =200 mW)
With saturation absorption :
At high power
(Pinc = 13.7 W
Pabs =6 W
Ipsat =4.1 kW/cm
experiment
experiment
simulation
simulation
Without saturation absorption :
At low power
(Pinc = 1 W
Pabs =200 mW)
With saturation absorption :
At high power
(Pinc = 13.7 W
Pabs =6 W
Ipsat =4.1 kW/cm
experiment
experiment
simulation
simulation
Theoretical profile 
computed along the
symmetry axis
Experimental
profile measured
along the
symmetry axis  
Figure 6: Fluorescence detected @ 1064 nm on a crystal pumped at 980 nm at low power (top) and at high 
power (bottom), through the optically-polished top face. The influence of absorption saturation is clearly 
visible: at low pump power, the fluorescence yield is higher at the pump waist location, as expected provided 
that both absorption coefficient and absorption saturation are weak; on the contrary, when absorption 
saturation becomes non negligible, the amount of fluorescence photons is minimum at the pump waist. 
Theory and experiments agree very well, except near the exit face of the crystal, a discrepancy which could 
be related to the fact that far from the waist, the pump beam is no longer “top hat”. 
 22 /115  
Fig 7a)  
0 0.5 1 1.5 2 2.5 3
distance z (mm)
Intensité de saturation
décroissante 
position du waist
 de pompe 
Thickness z  (mm) 
Pump waist 
location 
Decreasing pump saturation 
intensity ↓
Fig 7b) 
0 0.5 1 1.5 2 2.5 3
distance z (mm)
position du waist de pompe 
Intensité de saturation
décroissante 
Thickness z (mm)
Pump waist location  
Decreasing pump saturation 
intensity ↓
Figure 7: Evolution of pump power (Fig. 7a) and temperature difference T(0) –T(r0) (Fig. 7b) 
versus crystal thickness z. The pump saturation intensities values are: ∞=
I - 50 - 20 - 10 - 
5 kW/cm2 (for these curves ηh= 0.065 et Kc = 2 W.m-1.K-1, corresponding to the parameters of 
Yb:GdCOB). 
 23 /115  
Fig 8a) 
Thickness z (mm) 
radius r (mm) 
fig. 8b)  
radius r (mm) Thickness z (mm) 
Figure 8: Temperature distribution under nonlasing condition. The pump beam divergence inside 
the crystal is taken into account (M2 = 80). In Fig 8a) the saturation of absorption is ignored; Fig 
 24 /115  
8b) pump absorption saturation is taken into account (
I = 4.1 kW/cm2). The parameters used are 
form Yb:GdCOB. 
Equations (II.1.13) and (II.1.14.) can be solved numerically and injected in (II.1.9.) to obtain the 
temperature distribution. Figure 7 shows the evolution of pump power (fig. 7a) and temperature 
(fig. 7b) at the center of the rod versus crystal thickness, for various values of the pump saturation 
intensity.  
Here we assumed that the pump beam waist was located at z0 = L/3, which is experimentally well 
verified, as far as the laser output is optimized (see figure 6 and above text). In absence of saturation 
I infinite), both the pump power and the temperature experience an exponential decay as 
expected; but for lower values of the pump saturation intensity, the temperature reaches a local 
minimum at the pump beam waist. Figure 8 shows a 3D view of temperature distribution without 
saturation (fig. 8a) and in presence of strong saturation (fig 8b.) corresponding to Ipsat = 4.1 kW/cm2, 
that is the value for Yb:GdCOB. It appears in the latter case that the region where the pump density 
is the strongest (near the pump beam waist) is not the region where the temperature is the highest 
(near the faces of the crystal). Pump beam divergence appears to be an important parameter: it 
makes, for this example, the temperature higher at the exit face than at the entrance face of the 
crystal.  
In presence of laser extraction, the pump intensity evolution through the crystal is given by: 
satsat
  (II.1.15.)  
where  
( ) NlabslNS λσα =   (II.1.16.)  
 25 /115  
( ) ( )[ ]τλσλσλ lemlabsl
sat +
=   (II.1.17.) 
are the non-saturated absorption coefficient at laser wavelength, and the laser saturation intensity, 
respectively.   
When the intracavity laser intensity I largely exceeds 
I , and if reabsorption at laser wavelength 
is small, one can show that (II.1.15) simply becomes: 
α−=   (II.1.18.) 
which means that the ground manifold is repopulated so that absorption is not saturated any more.  
In real cases, as a matter of fact, the absorbed pump power under lasing conditions is intermediate 
between the non saturated regime and the saturated (non lasing) regime: in a first approximation it 
is possible to ignore saturation effects only if the laser extraction is efficient.  
II.1.4. Determining the absolute temperature: the boundary conditions 
In this subsection we deal with the boundary problem. For the moment we have established 
expressions for the temperature gradient, but we have no idea of the absolute temperature inside the 
crystal. Let us assume that the four edge faces of the crystal are in “contact” with a radiator, which 
will be in most cases a piece of cooled copper. The first boundary condition expresses the 
continuity of the thermal flux across these contacts:   
copperinside
copper
crystaltheinside
crystal n
 (II.1.19.) 
where K is the thermal conductivity, n is the surface normal vector, and  ∂/∂n the normal derivative. 
Common metals (Copper or indium, the latter being used as an intermediate contacting material) 
have thermal conductivities that are several orders of magnitude higher than the usual conductivities 
of laser crystals: 400 W.m-1.K-1 for copper and 820 W.m-1.K-1 for indium. This means that the 
 26 /115  
temperature gradients inside these metals will always be negligible, so that we consider in the 
following that the temperature inside the radiator is uniform and is noted Tc.   
Let’s see now the second boundary condition. In many papers and FEA codes, the temperature at 
the edge of the crystal is set equal to Tc:  
cTrT =)( 0                                                                                                                         
(II.1.20.) This is actually true only for an ideal contact [58]. But even for flat and polished surfaces pressed 
one against another, this relation is far from reality [59].  
The most realistic condition is surprisingly a Newton-type law of cooling, even if we indeed deal 
with conduction problems here: 
( )cc TrTHn
K −=−= )( 0∂
 (II.1.21.) 
where jq is the thermal density flux. H is the heat transfer coefficient or surface conductance 
(W.cm-2.K-1). H is of course infinite for ideal thermal contact.  
Carslaw et al. [58] have shown that the physical origin of a temperature gap between the edge of the 
rod and the mount was due to the presence of a thin oxide (or air or grease) layer, which acts as a 
very large thermal resistance.  
Measuring the heat transfer coefficient is usually difficult and not found easily in the literature: we 
present in the next section a simple and accurate method to perform this measurement.  
What about the end faces, which are in contact with air most of the time? The heat can flow out of 
the crystal through the two end faces by both convection in free air and thermal radiation. Cousins 
[60] calculated the equivalent H coefficient for the two processes and has shown that both 
coefficients were of the order of 10-3 W.cm-2.K-1.  
Since the measured H coefficients for conduction are typically in the range 1-10 W.cm-2.K-1, this is 
the proof that the assumption of pure radial heat flux made on the previous subsection is correct.  
 27 /115  
Using (II.1.9.) et (II.1.21.) one can calculate the temperature gap between the radiator and the edge 
of the crystal: 
)()(  (II.1.22.) 
The parameter of interest here is the normal derivative of the temperature at the interface. This 
explains why the quality of the thermal contact has a tremendous impact on side or edge-pumped 
slabs or rods; since in these configurations the temperature distribution is described by a formula of 
the type II.1.4, that is a parabolic dependence. In contrast, in end-pumped configurations, where the 
temperature profile is described by equation II.1.6., the thermal gradient at the periphery is smaller 
and the requirement of a good thermal contact can be loosen. 
It is also interesting to know the maximum temperature reached inside the crystal. In order to obtain 
a easy-to-handle scaling formula, we make the strong assumption that absorption saturation is 
absent, and we ignore the divergence of the pump beam inside the crystal. We have:  
cmax w
 (II.1.23.) 
As a conclusion for this section, we will list some conclusions one can make from these two last 
equations, like a list of recipes to reduce the temperature Tmax : 
• Increase wp : obvious and efficient, but at the expense of laser efficiency.  
• Increase  H. As shown in the next experimental section, reducing H does not affect the 
temperature gradient, and will not help to reduce the thermal lens magnitude. All we can get 
is a uniform decrease of temperature. However reducing the absolute temperature is actually 
more interesting in an Yb-doped crystal than in an Nd-doped material for example, in virtue 
of reabsorption losses that are highly temperature-dependant. A better contact can also help 
 28 /115  
reducing fracture risks but this is not directly linked to a decrease of the temperature either: 
it is because a good contact can induce radial components to the stress tensor at the 
periphery, or also because it will decrease the density of high spatial frequency alterations of 
the surface which are the ultimate causes of crack-induced propagating fractures [61-63].   
• Decrease Tc : if the radiator temperature is decreased but still remains around the room 
temperature (that is with a standard thermoelectric or water-flow cooling), the effect is the 
same as increasing H: we only play on the temperature pedestal, not on the gradient. 
However, if the mount is cooled far below room temperature (at cryogenic temperatures for 
instance), the thermal conductivity of the crystals significantly increases, which is highly 
positive for the thermal gradient. This approach has been successively applied to reduce the 
thermal lens in high-energy femtosecond laser chains [64] or in Nd:YAG rods [65] 
• Decrease the crystal size? The crystal size has no influence on the temperature gradient. To 
understand its influence on the maximum temperature, Tmax is plotted versus r0 for different 
values of H in figure 9. We observe that when the radius of the crystal exceeds roughly 10 
times the pumped area radius wp, the temperature becomes independent of r0 . In practice, it 
is possible to reduce the absolute temperature using small crystals, providing they are really 
small (see for example [66]). It is practically very difficult to cut and polish crystals whose 
size is smaller than 2 mm: one can then conclude that the transverse section of a crystal is 
not a parameter on which one can play efficiently. Besides, the effect of a bad thermal 
contact is visible only for crystals whose size would be on the order of the pump spot size. 
As illustrated by figure 9, a small crystal with bad cooling (for example r0 = 0.5 mm and H = 
0.1 W.cm-2.K-1) is far worse than a « reasonably » sized crystal with correct cooling (r0 = 2 
mm ; H = 1 W.cm-2.K-1) since the temperature difference between the two configurations 
reaches  200 °C.  
 29 /115  
• Add an axial component to the heat flux: it does not appear in equation II.1.23 because it has 
been derived with the assumption of a purely radial heat flux. However, one can add a large 
axial heat flux by putting either a « transparent radiator » in front of the input face (this is 
the principle of composite bondings [67] ) or by using thin disks (i.e. L« r0) that are very 
efficiently cooled through the face in contact with the radiator [68]. 
0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
Crystal radius (mm)
 H = ∞ : perfect contact 
H = 1 W/cm2/K 
H = 0.1 W/cm2/K  
0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
Crystal radius (mm)
 H = ∞ : perfect contact 
H = 1 W/cm2/K 
H = 0.1 W/cm2/K  
Figure 9: Maximum temperature at the center of the input face of the crystal T(r=0, z=0) versus the 
crystal radius r0. The absorption saturation is neglected, as well as the divergence of the pump 
beam inside the crystal. The parameters are: Tc=15°C, ηh=6.5 %, αNS = 7.4 cm-1, Kc=2.1 W/m/K 
(values for GdCOB), Pinc=15 W, wp=100 µm. 
 30 /115  
II.2. Experimental absolute temperature mapping and heat transfer 
measurements using an infrared camera 
II.2.1. Introduction 
As depicted in the previous paragraph, the temperatures obtained by solving the heat equation are 
only relative temperature distributions, expressed with respect to the rod surface temperature. The 
latter depends on the boundary conditions and is then very difficult to predict. Direct temperature 
mapping could consequently be a helpful measurement to understand pump-induced thermal effects. 
Moreover, we have shown that one of the crucial parameter to uniformly decrease the temperature 
inside the crystal (which can be useful to reduce fracture risks, see above paragraph) is the thermal 
contact between the crystal and its surrounding mount. Consequently, the knowledge of quantitative 
and experimentally measured information as the heat transfer coefficient H is of practical 
importance for high power laser development. 
We herein report on a very simple experimental setup, based on an infrared camera that can perform 
spatially resolved analysis of the absolute temperature on the entrance face of the crystal, where 
temperature reaches generally a very high value (in any case higher than at the beam waist, as 
explained in the subsection II.1).  We can also experimentally measure the heat transfer coefficient 
H between the crystal and its surrounding for different types of commonly used thermal contacts. 
We first describe the experimental setup that allows such measurements, and illustrate it with the 
well-known Yb:YAG crystal [8]. 
II.2.2. Experimental setup for direct temperature mapping 
The experimental setup is presented on figure 10. A fiber-coupled laser diode was focussed inside 
an Yb:YAG laser crystal; the infrared emission of the entrance face of the crystal was observed with 
an infrared camera. A dichroic Zinc selenide (ZnSe) plate was used as a dichroic mirror: it was 
High Reflectivity (HR) coated for 960-1080 nm on one face (at 45° angle of incidence) to direct the 
 31 /115  
pump beam into the crystal, and also coated for High Transmission (HT) in the 8-12 µm spectral 
range on both faces to let the thermal radiation reach the camera.  
Figure 10: Experimental setup for absolute temperature measurements. 
A germanium objective (focal length 50 mm, N.A. 0.7, aberration-corrected for infinite conjugation) 
was appended close to the ZnSe plate to create the intermediate thermal image with high spatial 
resolution. The camera was an AGEMA 570 (Flir Systems Inc.) consisting of 240x320 
microbolometers working at room temperature. The measured noise equivalent temperature 
difference (NETD) of the camera is 0.2 °C. The numerical aperture of the whole imaging system in 
the object plane being around 1, a theoretical spatial resolution of about 10 µm could be achieved; 
however, the resolution is here limited to 60 µm by the size of the pixels of our camera.  The crystal 
used here was a 2-mm long, 4x4 mm2 square cross section, 8-at. % doped Yb:YAG crystal. It was 
 32 /115  
AR-coated on its faces (the lateral ones are polished). Its thermal conductivity, which is lower than 
that of an undoped YAG crystal, was measured to be 7 W.m-1.K-1 (11 W.m-1.K-1 for the undoped 
crystal). The pump source was a high power fiber-coupled diode array (HLU15F200-980 from 
LIMO GmbH) emitting 13.5 W at 968 nm. The fiber had a core diameter of 200 µm and a 
numerical aperture of 0.22. The output face was imaged onto the crystal to a 270-µm-diameter spot 
via two doublets. The crystal absorbed 5.4 watts of pump power in this case. The crystal was 
clamped in a copper block by its four side faces. In addition, on the top surface of the crystal, a 
frictionless copper finger allowed us to apply a well-controlled pressure on the crystal by the use of 
a set of known weights put upon the finger. The heat is finally evacuated from the copper block by a 
flow of circulating water. 
The key issue of infrared absolute temperature measurements is the correct calibration of the system. 
Indeed, neither the crystal nor the copper mount has an infrared luminance which equals that of a 
blackbody at the same temperature. The signal V detected by one pixel for a portion of crystal (or 
copper mount) at temperature T is: 
( ) ( ) ( )∫
ελ dLL
TTrSGTV rt
optr  (II.2.1.) 
where G is the geometric extent; ( )λrS is the spectral sensitivity; Tropt is the whole transmission 
coefficient of the ZnSe plate, Germanium objective and camera optics; 
dLTBB is the spectral 
luminance of a blackbody at temperature T, ε(T) is the emissivity; Lr denotes the infrared luminance 
of the camera itself (and its close surroundings) which is reflected back into it by the Germanium 
objective and by the polished surface of the crystal; Lt is the luminance transmitted through the 
crystal: it is zero in the 8-12 µm range since the crystal is highly opaque in this spectral region. Lr is 
nonzero and makes polished objects look brighter than blackbodies: if Lr is ignored it leads to 
overestimation of the temperature around room temperature. Inversely, the emissivity is less than 
 33 /115  
one and makes objects radiate less than a blackbody. Since the parameters ε and Lr are dramatically 
dependant on the surface quality and flatness, all the visible parts of the heat sink were covered with 
lustreless black painting. Moreover, the evaluation of all those parameters is not straightforward. 
We propose to calibrate the whole system as follows: the crystal and the copper mount were heated 
together to a set of given temperatures using a thermoelectric (Peltier) element, and we then 
compare with the temperature given by the camera to apply the adequate correction. This careful 
calibration allows rigorous and absolute measurement of the temperature with a spatial resolution 
large enough to study with sufficient accuracy the thermal behaviour on the crystal’s input face. 
II.2.3 Results and measurement of heat transfer coefficients 
Figure 11: The temperature map obtained when the crystal is clamped by its four edge faces by 
bare contact with copper without thermal joint (a), with heat sink grease (b), with a thin graphite 
layer (c), and with pressured and non-pressured indium (d). 
 34 /115  
Figure 11 shows the temperature map obtained when the crystal is clamped by its four edge faces 
by bare contact with copper without thermal joint (a), with a thin graphite layer (b), with indium (c) 
and with heat sink grease (d).  
Figure 12 is an enlargement of the two extreme cases, namely the bare contact and heat sink grease 
contact, with a transverse profile (y = 0) that shows the temperature evolution along the crystal 
lateral dimension. 
Figure 12 : Temperature mapping of the crystal (front view) and lateral profile at y=0 for two 
different types of thermal contact (direct copper-crystal contact on the left, with grease on the right).  
In the “bare contact” case, a clear gap is noticeable between the temperatures of the mount and at 
the edge of the crystal. The temperature distribution is parabolic inside the pumped region and then 
 35 /115  
experiences a logarithmic decay until the edge of the crystal, in good agreement with the theory 
described in the previous section in the case of fibre-coupled diode pumping (see equation II.1.9 
and figure 5). As already mentioned, the quality of the heat transfer at the interface between the 
crystal and its mount has an influence on the value of the temperature but not on the thermal 
gradient.  
We consequently studied more in detail the heat contact. The heat transfer coefficient H is defined 
by equation (II.1.22), where the thermal gradient is considered normal to the surface.  
Our system provides a space-resolved temperature mapping of the crystal, with a spatial resolution 
which is far below the crystal size: it then allows the measurement of H. 
By performing a linear fit of the temperature versus position on the points that are closer to the 
crystal edge, the heat flux can be determined: by applying the equation (II.1.22), one can then infer 
the value of H. We found for instance a value of 0.25 W.cm-2.K-1 in the case of bare contact. We 
estimate that the uncertainty on H is about 15%. The order of magnitude obtained is consistent with 
the values evoked by Carslaw [58] and Koechner [69]. The hot spot that can be noticed in figure 12 
betrays the poor contact between the polished face of the crystal and the copper surface. The heat 
transfer is primarily a question of how much two surfaces are in contact with respect to each other; 
we checked experimentally that the temperature inside the crystal did not depend on the applied 
pressure: we did not observe any noticeable variation of the temperature when changing the applied 
pressure in absence of thermal joint between the crystal and the copper mount.  
We summarized in table 3 the results obtained for the different thermal joints used in our set of 
experiments, namely graphite layer, indium foil and heat sink grease (CT40-5 from Circuitworks®). 
Graphite layer (around 0.5 mm thick) does not modify significantly either the maximum 
temperature or the heat transfer coefficient, but it was noticed that the contact was much more 
uniform than in the case of bare contact: in particular no hot spot appeared any more and the contact 
was somewhat independent of the applied pressure. It is not the case with indium foil. For this 
 36 /115  
experiment the crystal was wrapped within a 1-mm thick indium foil. Since Indium is a soft 
material, the quality of the contact is greatly dependant on the applied pressure. The temperature at 
the center of the pumped region experiences a 7°C decrease while the pressure increased from 1.5 
kg/cm2 to 22 kg/cm2 as shown in figure 13 (note that in this case the H coefficient is measured 
across the surface where the pressure is applied and is then an “effective” heat coefficient that takes 
into account the transfer from crystal to indium and then from indium to copper.)  
Figure 13: Evolution of the heat transfer coefficient H (squares) and maximum temperature 
(triangles) versus applied pressure for indium-wrapped crystals. 
The most dramatic change in heat transfer coefficient is obtained with heat sink grease (see table 3). 
The temperature gap drops down to 1°C and H reaches 2 W.cm-2.K-1. The heat contact is here 
independent on the applied pressure. This better heat transfer coefficient is achieved while the 
thermal grease has a much lower thermal conductivity than indium (0.62 W/m.K for 
 37 /115  
CircuitWorks ® CT40-5 thermal grease vs. 82 W/m.K for pure indium). This is an illustration, in 
conjunction with the data about the variation of H with the applied pressure, of the idea that 
achieving a good H is first and foremost a question of decreasing the thermal resistance at the 
interface (eliminating air gaps, maximizing the surface contact…) 
Table 3: Table of measured H coefficients for different contacts. Tmax is the temperature at the 
center of the pumped region; Te is the temperature at the edge of the crystal (averaged on the 4 
sides if not symmetrical), and Tm is the copper mount temperature near the crystal.  
Contact H (W.cm-2.K-1) Tmax (°C) Te(°C) Te-Tm (°C) 
Bare 0.25 49.8 33.5 10.7 
Graphite layer 0.28 46.5 30.5 8.7 
Indium foil 
(applied pressure : 
22 kg/cm2) 
0.9 40.0 25.1 4.9 
Heat sink grease 2.0 37.0 21.6 1.5 
 38 /115  
III. Thermal lensing effects: theory  
III.1. Introduction  
The previous chapter was dedicated to the calculation and measurement of the temperature 
distribution, which is the first essential step for the study of thermal effects. The appearance of 
thermal gradients causes the crystal to be under stress. The presence of inhomogeneous temperature, 
stress, and strain distributions is responsible of many deleterious effects for laser action: the most 
radical effect is fracture, observed when the hoop (tangential) stress at the periphery of the crystal 
exceeds the so-called tensile stress. More subtle effects arise from the stress-induced modification 
of the optical indices of refraction: alteration of the stability domains of the cavity, depolarization, 
losses and degradation in beam quality, all of these four phenomena being largely intermixed. In 
this paper we designate by “thermal lensing” effects all the phenomena resulting in a phase change 
of a beam passing through a pumped crystal; in other words we do not restrict this expression to an 
ideal spherical thin thermal lens, we also include its aberrations and its polarization-dependant 
aberrations. This chapter presents a general and synthetic scope of these effects, and points out how 
they are related to each other. We base our discussion on analytical simple scaling relationships, 
and we point out the validity of these formulas.  
In this review, we come back to well-established theories that have been exposed many 
times in the past [60, 69, 70], but we also bring some new insights, to our knowledge, on some 
points of practical interest.  
In particular, we will point out several inaccuracies generally reported about the values of 
the photoelastic constants in YAG, which are the result of an incorrect use of the Hooke Law; we 
also present what is (still to our knowledge) the first derivation of the photoelastic constants that 
have to be used for end-pumped crystals, that is in other words when the calculation is made using 
the plane stress rather than the plane strain approximation.  
 39 /115  
We will eventually point out that the use of the dn/dT coefficient (temperature derivative of 
the refractive index) is very confusing. In one hand, the classical formula (which reveals the 
existence of three contributions: the “dn/dT” part, the bulging of end faces, and the photoelastic 
effect) which is used since decades is correct provided that the dn/dT appearing in this expression is 
understood as a partial derivative taken at constant strain. In the other hand, the experimentalist 
can measure a quantity which is closer to a partial derivative at constant stress, and the partial 
derivatives are obviously not equal. The dn/dT parameter is then not actually the correct parameter 
to be used in order to estimate the thermal lens focal length: this subtlety means in particular that 
one cannot, in general, make use of a value of dn/dT readily found in handbooks to estimate the 
magnitude of the thermal lens of an operating laser, because the experimental measurement 
conditions are in the two cases mutually inconsistent. We’ll see however that when the dn/dT is 
large and positive, the difference can be ignored. 
We will conclude this review by a synthetic diagram showing all the thermal effects and 
how they are connected together.  
Given that thermo-optical properties pertain more to a crystal host than to a doping ion, this 
section is more general than the others and does not restrict to the case of Ytterbium-doped 
materials.  
III.2. Stress and strain calculations 
Once the temperature field has been computed, the next step is to calculate the stress and strain 
distributions inside the crystal, obtained from the so-called “generalized” Hooke law, because it 
includes the thermal expansion term [49]:  
TS ijTklijklij ∆+= ασε          (III.1) 
 40 /115  
where i, j, k, l = 1,2,3 and the Einstein summation convention is used. ∆T is the temperature shift 
with respect to equilibrium (no strain), (Sijkl) is the compliances tensor, (σkl) is the stress tensor, (εij) 
is the strain tensor, and (
α ) is the thermal expansion coefficients tensor.  
The analytical formulations of thermal stress and strain distributions in end-pumped lasers require a 
large amount of approximations, thoroughly discussed in Cousins’s reference paper published in 
1992 [60].  
In order to obtain an analytical solution to the stress problem, an additional approximation is 
required, which consists in considering the problem in two dimensions. This is either the plane 
strain approximation (valid for long and thin rods) or the plane stress approximation (valid for thin 
disks). Interestingly, Cousins [60] pointed out that the plane stress approximation remained valid 
(within approximately 10%) for aspect ratios up to L/2r0 = 1.5, providing that the stresses were 
considered as mean values integrated along the whole thickness of the rod. In the previous section, 
when we derived the temperature distribution, we had considered the crystal as a stacking of thin 
slices, so that the temperature could be calculated in a single thin slice as if the surrounding material 
did not exist. It is not possible to use this approach for the stress distribution, because a given slice 
is under the mechanical influence of the slices located on both sides, and cannot be considered as 
independent. This is why an attempt to take into account absorption saturation effects and pump 
divergence (as far as thermal stresses are concerned) inevitably requires a finite element analysis.  
As far as diode end-pumping is concerned, the plane stress approximation is then the most 
meaningful approximation that can be done. However, the exact calculation remains possible for a 
given crystal (using FEA codes) provided that all the compliances and thermal expansion 
coefficients are known. To the best of our knowledge, these coefficients have been measured for a 
very restricted number of laser crystals up to now: we may readily find these data for YAG, 
sapphire, YLF and Y2O3 [71]: other data are available in the Handbook of Optics [71] but for 
 41 /115  
crystals which are not commonly used in laser applications. The data for other materials are almost 
inexistent.  
The analytical solution of the generalized Hooke law (eq. III.1) can be found, for example in [60] 
under the plane stress approximation. In this paper, the discussion was restricted to isotropic 
materials (at a mechanical rather than an optical point of view, that is when the compliances can be 
reduced to only two parameters, the Young modulus and the Poisson ratio), absorption saturation 
was not considered, and the divergence of the pump beam was not taken into account either.  
Once this has been calculated, it is possible to study thermal fracture issues. It is generally admitted 
that fracture occurs when the maximum hoop (tangential) stress σmax at the surface periphery of the 
crystal exceeds the tensile stress σTS. The latter depends on both the fracture toughness of the 
material and on its surface flatness. These aspects have been studied in detail by Marion [61-63]. 
Data about fracture toughness of materials can be readily found in the literature for YAG, 
fluoroapatites, sapphire, yttrium orthosilicate YSO and some phosphate glasses [39, 63, 71, 72].  
For a qualitative discussion of fracture issues in Yb-doped materials, the reader is invited to refer to 
a previous publication [73]. 
III.3. How can we take into account the photoelastic effect ? 
We now consider how the temperature, stress and strain fields inside the crystal alter the phase of 
the cavity beam, all these effects being referred as “thermal lensing” phenomena.  
The appearance of stresses in the crystal causes the linear optical indicatrix (related to the linear 
indices of refraction) to change its shape, its size and its orientation. This photoelastic effect is 
accounted by the 4th rank elasto-optical tensor (pijkl): 
klijklij pB ε=∆   (III.2) 
Where (Bij) is the dielectric impermeability 2nd rank tensor. This expression if obviously valid in the 
linear optical regime only, and when piezoelectric effect is neglected [49].  
 42 /115  
The complete computation of thermal effects in a given material requires that we know everything 
about the tensors (Sijkl), (pijkl) and (
α ). The minimum number of independent terms of each tensor 
depends on the crystal symmetry, as discussed by Nye [49]. For instance, let us consider crystals 
like KGW or KYW [74], GdCOB [75], YCOB [76] or YSO [43] which are of particular interest for 
Ytterbium doping. These crystals belong to the monoclinic crystal system: this means that the 
compliances can be “reduced” to 13 independent parameters, and the elasto-optical tensor, once the 
redundant coefficients have been identified, appears to have 20 independent coefficients [49]. 
Adding the 3 thermal expansion coefficients, this means that we need to know no less than 36 
coefficients before to be able to draw the new index ellipsoid at a given point of the crystal. 
Obviously these parameters are not known (for any monoclinic crystal, in fact, to the best of our 
knowledge), which means that a rigorous calculation even with a FEA code is just not possible. 
This simple remark highlights the importance of experimental measurements of thermal effects in 
such crystals, and shows the interest as well as the inherent limitations of a simple analytical model.    
III.4. Simplified account of photoelastic effect in isotropic crystals 
Now we are aware of these difficulties, we focus our discussion on a simpler study case, 
actually the only case where analytical expressions are obtainable, that is:  
- we consider isotropic crystals only, and more particularly the widespread YAG crystal, for which 
all the parameters previously evoked are well known. Generally, cubic crystals belonging to the 
space groups 43 ,432, 3m m m  (like YAG) require 3 independent elastic coefficients ; however the 
remarkable isotropic mechanical properties of YAG enable to think of only two mechanical 
coefficients, that is the Young modulus and the Poisson ratio. In the end, 6 coefficients only are 
needed for YAG. 
- The plane stress approximation is used; 
 43 /115  
- the pump profile is still axisymmetric; 
- we consider what occurs inside the pump volume, that is for r < wp.  
- the pump divergence inside the crystal is neglected. 
- temperature, stress, strain, index are considered integrated along the whole thickness of the rod. 
For a physical quantity A(r,z) , we shall note : 
( ) ( ) dzzrArA
the integrated value of A(r,z) along the rod.  
According to the Neumann-Curie theorem [49], under these assumptions the principal axes of all 
the involved tensors (stress, strain, index ellipsoid) are radial and tangential. The notations used in 
the following are depicted in figure 14. 
nr nθ 
nr nθ 
Figure 14: Orientation of the indices’ ellipsoid in an isotropic crystal under thermal stress.  
 44 /115  
The shift of the principal indices nr and nθ are related to the diagonal coefficients of the 
optical indicatrix by:  
=  (III.3) 
We can also write the indices variation as a function of the strains as follows: 
Bnn ε
=∆−=∆ ,,,,
 (III.4) 
The six coefficients 
(i= r, θ; j=r, θ, z) can be calculated from the pijkl coefficients by a correct 
change of coordinates.  
The complete solutions for the strains εr, εθ and εz can be found, for YAG, in many papers and 
textbooks [60, 69] and can also be found in the appendix. 
Inside the pump volume, it can be readily shown that stresses and strains have a parabolic 
dependence, like the temperature distribution. Since the indices of refraction are linear 
combinations of strains, it turns out that radial and tangential index distributions must also be 
parabolic.   
The (integrated) shift of refractive index may be written as:  
( ) ( ) ( )
rndzzr
rn θθ
=∆ ∑ ∫
 (III.5) 
where Cr and Cθ (or C’r and C’θ)  are constants which will be called, following the pioneering work 
of Koechner, the “photoelastic constants”. Their calculation and expression is given in the appendix. 
We would like to point out two important clarifications about these coefficients (and also justify the 
presence of this appendix in this review):  
♦ W. Koechner published incorrect values of these coefficients in his reference book [69] 
because the temperature term in the Hooke law has been omitted; this omission has first 
 45 /115  
been highlighted by Cousins [60], but the expression of the photoelastic constants remained 
uncorrected in the following editions of this book, and nowadays still remains used under 
this form in many papers. 
♦ Secondly, the derivation of the photoelastic constants requires turning the 3D problem into 
a 2D problem, as discussed in the previous section. Only the plane strain case was 
considered by Koechner. However, we saw that the plane stress case is closer to reality in 
end-pumped rods. Here we denote as Cr and Cθ the photoelastic constants valid for long and 
thin rods (the “Koechner case”, that is when the plane strain approximation is valid), and 
C’r and C’θ the photoelastic constants derived within the framework of the plane stress 
approximation. Since we are only interested in end-pumping, we only consider the ' ,θrC  
constants in the following. 
The above-mentioned relations are derived by making the assumption that the pump beam 
radius is constant through the crystal thickness; however we don’t have to assume a particular 
absorption regime, so that they stay valid in presence of absorption saturation, under lasing as well 
as under nonlasing conditions. 
As we will see in the next section, the most interesting feature for the laser scientist is the 
index shift between the center and the edge of the pumped zone, since it yields the contribution to 
the global thermal lens. From (II.1.10), and using the bracket notation for z-integrated values, we 
can write (III.5) under the form:  
( ) ( ) ( ) ( )rTTCnrnn
−=∆−∆ 020 '30,, ,θαθθ  (III.6) 
III.5. A consequence of strain-induced birefringence: depolarization losses.  
 46 /115  
The birefringence of a crystal submitted to thermal stress has two main consequences for a 
light beam passing through it: both its state of polarization and its phase will be altered. Before 
examining in detail the effect on phase (i.e. thermal lensing effects), let’s first examine the influence 
on polarization. 
We use the restrictions exposed in the previous subsection, given that the following can be 
readily extended to uniaxial crystals provided that the optical axis lies parallel to the propagation 
axis. In these cases, if no polarizing element is added into the cavity, the laser output is not 
polarized, and stress-induced birefringence has no net effect. 
Beam polarized along x 
crystal 
Depolarized beam  
Figure 15: Depolarization of a polarized beam passing through an isotropic crystal under thermal 
stress. 
For many applications however a polarized output is desirable: the situation is depicted 
schematically in figure 15. An incident beam (polarized along the x direction) will have its 
polarization modified differently for every single ray: For a ray crossing the (Ox) or (Oy) axis for 
instance, the polarization is not modified, for all the other rays the polarization becomes elliptical, 
with principal axis that are radial and tangential.  
 47 /115  
At every roundtrip in the laser cavity, the beam meets a polarizing element (such as a plate 
at Brewster angle) and this depolarizing element, yielding to the so-called “depolarization losses”. 
Another effect, consequence of the latter, is a modification of the beam spatial profile: since 
the beam is not altered along the (Ox) and (Oy) directions and suffers losses elsewhere, it tends to 
take the shape of a cross; this aspect is chiefly discussed in Koechner [69].  
In biaxial crystals (or uniaxial crystals with optical axis normal to the direction of 
propagation), the output is naturally polarized along the crystallophysic (not to be confused with 
cristallographic) axis along which the emission cross section is the highest. In this case, stress-
induced birefringence does not generally twist the index ellipsoid enough to significantly modify 
the polarization state.  
Finally, clever solutions have been imagined to compensate for depolarization losses in 
isotropic crystals: for example the use of two rods with a Faraday rotator inbetween [78, 79], or 
even a simple quarter waveplate [70]. This last technique is however limited to a few configurations 
[80]. 
In the following, we present results obtained with Yb-doped crystals which are either 
isotropic (and naturally not polarized), or naturally birefringent, so that this problem was not 
encountered. 
III.6. Thermally-induced optical phase shift.  
III.6.1. Expression of the optical path 
We present now this derivation with some detail, because it makes appear a trivial yet 
fundamental difference between these results and what is currently reported in the literature. The 
derivation is largely reproducing Cousins’s work [60]. 
 48 /115  
n (T, ε j ) 
L+∆L d - ∆L 
z  = 0 
n0 (Tc ) 
z  = 0 
Figure 16: Notations used for the calculation of the optical path 
Consider a crystal whose length is L and index n0 at temperature Tc (temperature of the heat 
sink) and in absence of strain. We consider the optical path of a straight ray (running parallel to the 
crystal axis z) between a plane z = 0 (taken at the entrance face of the crystal) and a plane z = L+d  
(see figure 16).  
In absence of temperature and stress fields (pump off), the optical path is: 
dLTn c
off +×= )(0δ  (III.7) 
With the pump on, the optical path is dependant on the lateral shift r of the ray with respect 
to the crystal axis, but also on the direction of polarization. As a consequence there will be two 
distinct thermal lens focal lengths, one for a (virtual) radially polarized beam and another for a 
tangentially polarized beam, what is commonly named “bifocussing”.  
We may write the optical path with the pump beam “on” as (see figure 16):  
  ( ) ( ) ( )rLddzTnr
∆−+= ∫
, ,, εδ θθ  (III.8) 
where ∆L(r) is the crystal length shift due to the inner compression, responsible for the 
bulging of the end faces.  
 49 /115  
Assuming small variations in the refractive index, one may expand ( )jr Tn εθ ,,  in Taylor 
series, and discard the second and higher-order terms:  
( ) ( ) ( )( ) ( ) ( )rLddzzr
+= ∫ ∑
=0 ,,
0 ,,,
 (III.9) 
Note that the temperature derivative of the refractive index appearing in this equation is a 
partial derivative calculated at constant strain. As we will discuss in the next subsection, this is not 
the usual dn/dT parameter. 
The rod length change ∆L(r) can be written as a function of the axial strain εz and equals  
( ) ( ) ( )rdzz,rrL z
z εε∆
== ∫  (III.10) 
From the strain-stress relationships featured in the appendix, it can be easily shown that the 
axial strain, under the plane stress approximation, is equal to: 
 ( ) ( ) ( ) ( )rTrTr Tz −=+−= 01 ναε  (III.11) 
Given that the first-order terms in the integral appearing in (III.9) are much smaller than 
n0(Tc) and given that ∆L « L, we can write that for the first order terms  ( ) ( )∫∫ ≈
∆+ LLL
......
The relative optical path is then:  
( ) ( )
( ) ( ) ( )( ) ( ) ( )0110
TrTnr
offon
−+−+⎟
 (III.12) 
III.6.2. The thermal lens focal length 
The thermal lens is related to the optical path difference (OPD or ∆ in the following) 
between an on-axis central ray (r = 0) and an outer parallel ray passing inside the pumped region, 
defined by a radius r < wp.  
 50 /115  
We note : 
( ) ( ) ( )rr relrrelr θθ δδ ,, 0 −=∆  (III.13) 
The expression of the optical path difference is, from (III.6) and (III.12):  
 ( ) ( ) ( ) ( ) ( )rTTnCn
r TrT
+−++⎟⎟
=∆ 0112 0
, ναα θ
θ    (III.14) 
We assume that the thermal derivative of the refractive index is equal for radial and tangential index, 
so we can write:  
( ) ( )( ) ( ) ( )
( ) ( )rTT
rTTCnn
++−+⎟
 (III.15) 
where θχ ,r is usually called the “thermo-optic” coefficient. We remind the reader that this 
expression is valid only under some restrictive conditions that have been presented in detail in the 
section 4 of this chapter.   
Given that the integrated temperature shift is given by (from eq. II.1.11):  
( ) ( ) ( ) ( )( )
dzzrTzTrTT
∫ =−=−  (III.16) 
it appears that the optical path difference also follows a quadratic dependence in r. This 
means that in the paraxial approximation, the pumped crystal acts as a thin lens whose focal length 
is given by :  
),( 2∆
=  (III.17) 
In the following the difference between fr and fθ (responsible for bifocussing) will be 
omitted (realistic if the photoelastic effect is negligible, or if the laser beam is not polarized.) 
 51 /115  
The thermal lens dioptric power is thus defined by: 
( )( )
th Kw
002 2
++−+⎟
==  (III.18) 
where χ is a polarization-averaged thermo-optic coefficient. 
Let’s notice that this formula still holds in Yb-doped materials, with strong absorption 
saturation, since no assumption was made concerning absorption.  
If the pump profile is gaussian (e.g. end-pumping by another laser), one can show [70] that 
the thermal lens dioptric power is twice as large as (III.18), meaning that the thermal load is more 
spiky around the center of the beam than in the case of an uniform “top hat” pump beam energy 
deposition.  
To conclude, let’s address some orders of magnitude. To be correct, the previous derivation 
has to be performed within the paraxial approximation, which means Lf th >> . 
Taking some typical values (ηh ∼ 0.1, χ ∼ 10-5 , Kc ∼ 5 W/m/K, wp = 100 µm, et L = 3 mm), 
it appears that the paraxial conditions are met provided that  Pabs << 100 W. This corresponds to 
most of practical cases. 
        
III.7. Discussion about the use of the “dn/dT” coefficient. 
Let’s start this discussion by the expression of the thermo-optic coefficient: 
( )( ) 3 ', 0 0 ,1 1 2r T T r
n n C
Tθ θ ε
χ ν α α
∂⎛ ⎞= − + + + ⎜ ⎟∂⎝ ⎠
 (III.19)  
 52 /115  
In this subsection, we would like to point out how this expression can be misleading if one 
carelessly uses, to evaluate the magnitude of a thermal lens, the so-called dn/dT instead of
The error is especially important when all terms except the dn/dT are discarded for the sake of 
simplification. This represents, to our knowledge, a discussion that has never been published so far.  
The three contributions appearing in (III.19) may be understood as follow: 
♦ The term ( )( ) Tn αν110 +−  is clearly related to the bulging of end faces, and is the direct 
consequence of the inner compression of the crystal, which causes the optical path to 
increase (if αT>0). It is strictly true for an infinitely thin crystal, since plane stress 
approximation is used to derive it. It is reported by Cousins [60] that for a rod whose ratio 
length/diameter is 1.5, this term overestimates the actual bulging by around 35%. In general 
this can be taken as an upper limit for end faces bulging in DPSSLs. 
♦ The term ' ,
02 θα rT Cn accounts for the photoelastic effect only, as already discussed in 
subsections III.3 and 4. It explains bifocussing, depolarization and polarization-dependant 
astigmatism.  
♦ As for the first term (III.19), it represents the partial derivative of refractive index at 
constant strain, which is the thermo-optic coefficient of a virtual perfectly rigid crystal.  
It is noteworthy that it is actually not the usual dn/dT parameter that one can measure and 
find easily in the literature. The “usual techniques” for measuring dn/dT are based either on 
geometrical optics (e.g. measurement of the minimum-of-deviation angle of a prism cut in the 
material under study [72]) or on interferometric techniques (e.g. measuring fringe patterns 
displacements [81]). In all cases, the sample is put into an oven and exposed to different well-
known temperatures. The crystal is free to expand, and the temperature rise into the material is 
uniform, which is incidentally an essential condition for valuable measurements. In this case, 
 53 /115  
obviously, the crystal experiences thermal expansion, a phenomenon which causes the index to 
change (to decrease, in general) in virtue of a pure photoelastic effect. In all these practical 
circumstances, the strain tensor relates directly to the temperature shift by the thermal expansion 
tensor, in other words the stress terms in equation III.1 are zero. The coefficient measured 
experimentally can then be regarded as a partial derivative at constant stress: 
measured
 (III.20) 
On the other hand, the reality experienced by the laser crystal while optically pumped is 
radically different. We know that in all cases (transverse as well as end pumping, thin disks as well 
as long and thin rods) the pumped area inside the crystal is under compression (negative stresses 
and strains), which is true as soon as the thermal expansion coefficient is positive. It can be 
explained qualitatively by saying that the central region of the crystal, yet hotter than the edges, is 
prevented from expanding by the expanding (cooler) outer parts of the rod, which eventually causes 
the central region to be under compression.  
As a result, we see that if we consider the measured dn/dT instead of the partial derivative at 
constant strain in eq. (III.19), the photoelastic contribution to the thermal lens, already fully taken 
into account with the term 2 ' ,
0 θα rT Cn , which accounts for thermal expansion if any, is partially 
cancelled by the photoelastic (thermal expansion) term hidden in the measured dn/dT.  
In a first attempt to correct for this, we thus have to evaluate precisely the thermal expansion 
contribution to the measured dn/dT. It can be done in a simple way by considering the Clausius-
Mossotti model for refractive index, which writes: 
 54 /115  
( ) ( )( )
    (III.21) 
where ρ is the specific mass (density), M the molecular weight, Na the Avogadro number, and αe the 
polarizability. This expression is valid strictly speaking for isotropic ionic crystals (for covalent 
crystals the local field correction is smaller and atomic polarizabilities loose their meaning due to 
the very nature of covalent bonding) 
The modification of polarizability with temperature results from the change in thermal 
occupancies and spectra of the energy levels. Tsay et al. [82] have developed a two-oscillator model 
where they consider the contribution of both electronic and lattice vibration terms. The discussion 
about the different origins of dn/dT is beyond the scope of this review and will not be exposed in 
detail here.  
We can differentiate (III.20), assuming that changes in density only originate from isotropic 
thermal expansion. This is consistent with the experimental procedure used to measure this 
coefficient. 
( ) ( )( ){ }, ,e e e
measured
dn T T T n n n n
dT T T Tσ
ρ α ρ α αρ
ρ α ρ α
⎛ ⎞ ⎛ ⎞∂ ∂∂ ∂ ∂ ∂ ∂⎛ ⎞⎜ ⎟ = = + +⎜ ⎟⎜ ⎟⎜ ⎟ ∂ ∂ ∂ ∂ ∂ ∂ ∂⎝ ⎠ ⎝ ⎠⎝ ⎠
    (III.22) 
From (III.20) and isotropic expansion assumption we obtain: 
( )( )
12 22
    (III.23) 
This expression puts into the light three contributions to the measured dn/dT:  
- a pure effect of thermal expansion : 
( )( )
12 22 −+
 (III.24) 
 55 /115  
- the influence of thermal expansion on polarizability,  
              
( )( )2 22 1
− + − ⎡ ⎤∂
⎢ ⎥∂⎣ ⎦
 (III.25) 
- a thermal expansion-independent contribution, which can be assimilated to the partial 
derivative at constant strain: 
( )( )
12 22
 (III.26) 
Since the second and third terms do not breakdown easily into a set of available material physical 
parameters, no general formula based on the measured dn/dT can be derived for the last partial 
derivative. We see also that thermal expansion appears in both (III.24) and (III.25), so that we 
cannot in a straightforward way dissociate “pure” thermal expansion from strain-related 
polarizability effects. 
This formulation brings some questions (rather than answers) about the interpretation of 
some experimental results, in particular those obtained with materials whose measured dn/dT is 
negative.  
It is often reported that in such materials (e.g. LiCAF [72, 83], FAP [39], YLF [84]), the 
thermal lens is weak or even divergent (case observed in YLF crystals) because the negative dn/dT 
counterbalances (or even surpasses) the other positive terms in the expression of the thermo-optic 
coefficient. 
Although, because of the lack of data about these materials, the photoelastic term is just 
supposed to be positive, or evaluated from other materials whose properties are believed to be 
similar (Woods et al. [72] use data from CaF2 to evaluate photoelastic constants of LiCAF, Payne et 
al. [39] used data from LG-750 phosphate glass to approximate that of FAP). In some cases (e.g. 
thermal lensing measurements in Nd:YLF crystals [84]), discrepancies are reported  between theory 
and experiment, which do not occur with YAG which has a positive measured dn/dT. 
 56 /115  
A large and negative dn/dT coefficient means that the thermal expansion is dominating 
polarizability contribution for an unstressed crystal; it is observed that such behaviour is generally 
associated with a large thermal expansion coefficient. This also means that the photoelastic term, 
proportional to αT, can be expected to be greater. In contrast, we can say nothing about the sign of 
this term, whose knowledge requires that we know all of the pijkl coefficients of the crystal. 
This means that there is no obvious relationship between the sign and magnitude of the 
measured dn/dT and the sign and magnitude of the thermal lens. 
To go further, the only term which is truly always positive is the end faces bulging term. The 
polarizability dependence on temperature (eq. III.26) is mostly positive too [82]. We can then assess 
that negative thermal lensing is more likely to be explained by negative photoelastic terms, and/or 
possibly, by a negative e
 term. 
In conclusion, the crystals with negative measured dn/dT have to be considered very 
carefully as far as simulations are concerned: photoelastic terms must not be neglected. However, it 
remains that photoelastic contributions, whatever calculated or measured, tend to be small in many 
crystals. This means that the rude approximation made by replacing χ by the measured dn/dT (+ the 
end bulging term) will be all the more close to reality that the dn/dT is large and positive.   
III.8. A novel definition for thermo-optic coefficient based on experimentally 
measurable parameters.  
In the precedent subsection, we saw that there are many problems related to the definition of 
the thermo-optic coefficient, which are above all related to the abusive use of the parameter (dn/dT) 
in a context where it is not relevant.  
 57 /115  
Actually, a better way to describe the phenomena is to start from measurable data that are 
relevant as far as solid-state lasers are concerned. The partial derivative at constant strain is a formal 
parameter which has not a real physical meaning since it is impossible to prevent the crystal from 
any strain, compression or thermal expansion. Furthermore, photoelastic and polarizability effects 
are so strongly intermingled, that one cannot imagine easily an experiment that could separate 
clearly the effect of one from the other. That’s why we propose to base solid state laser thermal 
characterization for high power applications on measurable data and separate the thermo-optic 
coefficient χ in three truly independent contributions, as follows: 
lgn bu ing birefringenceχ χ χ χ= + +  (III.27) 
where  
( )3 ' '0n T r
n C C
= + +⎜ ⎟∂⎝ ⎠
 (III.28) 
( )( )lg 0 1 1bu ing Tnχ ν α= − +  (III.29) 
( )3 ' '0birefringence T rn C Cθχ α= ± −  (III.30) 
χbulging accounts for curvature of end faces, and is measurable by performing, for example, 
interferometric or wavefront measurements on a probe beam reflected on each side of the crystal, as 
done by Baer et al. [85] or Kleine et al. [86]. The expression given in (III.29) applies to the thin 
disk ideal model, but its real value can be computed depending on every special geometry, quite 
easily with a finite element code, since this is a pure thermomechanical problem, where data are 
more readily accessible or measurable. 
χbirefringence accounts for strain-induced birefringence (“+” for radial polarization, “-“ for 
tangential). It can be measured separately by performing measurements of polarization-dependant 
 58 /115  
astigmatism, for instance, thanks to a wavefront measurement method sensitive to aberrations (see 
next section). 
Eventually, the term χn which accounts for all the refractive index variations with 
temperature, is not rigorously calculable for all the reasons exposed above, but its exact value can 
be deduced for each material and for each pumping configuration, from the separate measurement 
of χ (global thermo-optic coefficient), χbirefringence and χbulging.  
To conclude, let us give some orders of magnitude of different terms in widespread YAG. 
The measured value of the global thermo-optic coefficient (see next section of this article for details) 
for this material under diode pumping is 10 10-6 K-1. 
The different contributions calculated from tabulated data are: 
6 19.10
measured
− −⎛ ⎞ =⎜ ⎟
( )( )2 2 6 12 1 31.5 10
measured
n ndn
+ −⎛ ⎞ + =⎜ ⎟
( )( ) 6 1lg 0 1 1 7.210bu ing Tn Kχ ν α − −= − + =  
3 6 1
3 6 1
2 0.27 10
2 0.9310
n C K
n C Kθ
(see appendix for the detail of the calculation of photoelastic constants within the plane stress 
approximation). 
The thermo-optic coefficient are small, but the bulging term is far from being negligible. 
Weber et al. [87] have shown that in Nd:YAG, bulging represented 30% of the global thermal lens.  
We calculated also for information a “corrected” dn/dT, which is obtained after subtraction of the 
inappropriate thermal expansion term (III.24): this coefficient turns out to be very large here.   
 59 /115  
In diode-end-pumped Nd:YLF and Nd:YVO4, Baer et al. [85] and Kleine et al. [86] have 
shown experimentally that the bulging term represented half of the total thermo-optic coefficient. A 
finite-element analysis, performed by Peng et al. [88], leads to a similar conclusion for a Nd:YVO4 
crystal.  
III.9. The aberrations of the thermal lens.  
In conclusion to this chapter, we will introduce the thermal lens higher-order distortions id 
est the thermal lens aberrations. 
For a perfectly parabolic distortion of the wave front or equivalently a pure thermal lens the 
thermal distortion can be easily compensated by addition in the laser cavity of the opposite 
divergent lens or by adjusting the distance of the different cavity elements. But, if aberrations are 
present the compensation is very difficult and requires complex systems [89]. While uncorrected, 
these aberrations lead to degradation in beam quality (brightness), and also to losses due to 
diffraction of the beam high spatial frequencies [90]. 
The aberrations are present when the wavefront distortions induced by the absorbed pump 
beam are not perfectly parabolic. This occurs when the longitudinal pump beam has not a true top-
hat profile, for example in the case of a Gaussian pump beam profile [91]. Moreover, if the laser 
beam size is larger than the pumped area, the aberrations also become important as shown in figure 
17. The rays, far from the pumped area are almost not deviated. This is the signature of spherical 
aberration (which is sometimes referred as a “thermal lens varying with radius r”.)   
 60 /115  
Pumped areaPumped area
Figure 17: When the laser beam is larger than the pumped area, spherical aberration is observed. 
In general the aberrations affect the laser modes of the cavity in a way that to degrade the 
beam quality. This degradation can be evaluated by the M2 factor or the Strelh ratio for exemple 
[92]. We are just giving here an exemple of the influence of aberrations on the beam quality, some 
results given by Clarkson [70].  
Let us consider that only 3rd order spherical aberration is present. In this case, the optical 
path difference ∆(r) can be written as:  
( ) 44
−=  (III.31) 
if the laser initial beam M2 factor is 2iM , the M
2 with the added aberations is :  
( ) ( )22222 qif MMM +=  (III.32) 
 61 /115  
with 
π8 442 L
M =  (III.33) 
where λ is the wavelength and wL the laser beam waist. 
One can show that obviously C4=0 for a top-hat pump profile provided that wL< wp. But for 
a gaussian pump-beam profile (beam waist wp) we obtain [70]:  
M  (III.34) 
In that case, the wL/wp is present to the power of 4, which means a strong increase of M2 
even for small mode mismatch. 
Another classical aberration to be considered is the thermal astigmatism.  In this case, the 
analytical solution is not as simple as with the 3rd order spherical aberration and we will only focus 
on the qualitative approach, answering this simple question: in which conditions does the thermal 
lens exhibit astigmatism? In practice it will occur whenever: 
- the thermal conductivity is anisotropical ;  
- the thermal expansion tensor 
ijTα does not reduce to a scalar quantity; 
- the cooling is inhomogeneous; 
- the laser beam is  polarized, even for an isotropic material. This is the so-called polarization-
dependant astigmatism, which can be used a sa probe to evaluate the strain-induced 
birefringence (see figure 18.) 
 62 /115  
crystal 
(nr-1)L 
(nθ-1)L 
Figure 18: Illustration of polarization-dependant astigmatism. Here a vertically polarized strikes 
the crystal from the left. The ray  #    sees the radial index of refraction nr while ray #  sees the 
tangential index of refraction nθ . If for the sake of simplicity the indices are considered constant 
over the whole crystal length L, the astigmatism is (nr- nθ)L. 
As a conclusion for this part, we present a schematic diagram (fig. 19) summing up the different 
thermal effects arising in a solid state laser medium, and how they are related to each other.  
 63 /115  
Figure 19: Summary of the thermal effects in solid-state lasers. The observable consequences 
observable are presented in full rectangles. The aberrations can be split in two classes: the ones 
Pumping ∆T 
    temperature 
deformations 
Modification of crystal lattice (thermal 
expansion)  
(stress) 
Elastic 
tensor 
Modification of refraction indices 
Thermal lens 
Induced birefringence 
Photoelastic 
effect 
depolarization 
Modification of 
polarizability, 
related to 
⎜ ⎟∂⎝ ⎠
FRACTURE 
if σ > σTS
Curvature (bulging) of both faces of 
the crystal 
focussing  
For an ideal 
lens 
Changes in 
stability range of 
the laser cavity, 
and in the size 
and the 
divergence of the 
laser beam. 
Losses   
potentially 
Spherical 
aberration or/and 
astigmatism  
Beam quality degradation  
ABERRATIONS of the thermal lens 
Polarization-
dependant 
Astigmatism 
Leading to 
 64 /115  
that do not depend on the polarization (lined rectangles)  and the ones that  depend on polarization 
(dotted rectangles)  which come from the strain-induced birefringence. 
(*): 2 types of losses are induced by aberrations: the diffraction losses (associated with degradation 
in beam quality) and the losses induced by the eventual presence of a diaphragm in the cavity to 
prevent from oscillation of higher order laser modes. 
IV. Thermal lensing techniques  
IV.1. Introduction  
The first evidence of thermal effects in lasers were demonstrated in 1965 by Gordon, Leite 
and Whinnery [93] working at Bell Labs on He-Ne lasers for Raman spectroscopy applications. The 
use of liquids to Q-switch the laser lead to the observation of unexpected effects such as relaxation 
oscillation of jump of modes. The exceptionally long time constant of this phenomenon (several 
seconds) lead to the conclusion that a thermal lens was at the origin of the observed effects.  This 
lens was created by the small absorption occurring in the liquids. The first application proposed by 
the authors was then to use it to measure very small absorption coefficients down to 10-4 cm-1. As a 
matter of fact, the so-called “thermal lens method” allows nowadays to measure absorption 
coefficient lower than 10-7 cm-1 [94]. Since 1965, the photothermal-methods panel available to 
physicians has grown in diversity [95]. Actually, due to the complexity of simulating thermal 
effects in lasers (see previous subsections), the experimental determination is often the only 
accurate method.  
Since the 70’s, numerous attempts have been done to measure the thermal lens in solid-state 
laser media. They can be classified in three categories: the geometrical methods based on the 
deflexion of a beam, the methods based on the properties of cavity modes, and the methods based 
on the wavefront measurements. 
 65 /115  
IV.2. Geometrical methods 
These methods are probably the simplest methods to measure the thermal lenses. They can 
be separated in two sub-groups: one can exploit either the defocussing or the deflection of a beam 
passing through the pumped medium  
crystal 
Powermeter 
Figure 20 : Example of thermal lens measurement based on the displacement of the focal point. 
The principle of the first category of methods is very simple. Considering a probe beam 
going through the crystal, the measurement of the axial shift of the focal point position allows to 
retrieve directly the thermal lens using simple geometrical optics. When the rod is relatively large, 
one can use a collimated beam of comparable size to directly measure the position of this focal 
point by Z-scan measurement for example as shown in figure 20 (a small aperture is longitudinally 
translated in order to find the maximum of probe-beam transmission [96].) This method is based on 
the assumption that the lens is perfect (without aberrations) and then is especially suited to 
transverse pumping with large-size materials . In fact, the method is not easily applicable to end-
pumped lasers because of the very large depth-of-focus associated with a very small and low solid-
angle–probe beam, and obviously does not yield the thermal lens aberrations. 
Moreover, for small beam diameters (10-100 µm), this method needs to be generalized using 
Gaussian beam optics. Hu and Whinnery [97] described a simple method to evaluate the thermal 
lens with a probe beam whose size is comparable to the cavity beam in end-pumping schemes. This 
 66 /115  
last method is described in figure 21: it can be based either on the measurement of the divergence of 
the probe beam [97], or in the measurement of the beam diameter in an appropriate plane.  
figure 21 : Example of measurement based on focal point displacement for tightly focused probe 
beam according to Hu and Whinnery’s method. 
To conclude on the methods based on the focal point displacement, we can say that their 
main drawback is their low accuracy. The relative precision is only 20-30% on the focal lens 
measurement. 
We can use geometrical optics in another way by measuring the deflection of a probe beam. 
Instead of using a beam covering the whole pumped area, we can measure the deflection of a small 
beam slightly off-axis as shown in figure 22.  
figure 22 : Example of measurement based on deflection measurement [98]. 
detector 
Laser 
crystal 
With thermal lens 
Gaussian 
beam 
Without thermal lens
 67 /115  
The whole surface can be scanned in order to measure not only the focal lens but also the 
aberrations due to the thermal effects [98]. However this method is complex and it can only be used 
for large transversally-pumped crystals (even if the resolution can be lower than the probe beam 
size [99]). Moreover this method can be considered as a point-to-point measurement of the simpler 
Shack-Hartmann technique described later.  
IV.3. Methods based on the properties of cavity eigenmodes 
After the development of end-pumped lasers (particularly thanks to the increasing 
performance of laser-diodes), alternative methods appeared. These less straightforward methods are 
based on the properties on the laser cavity eigenmodes, in particular on the fact that the thermal lens 
affects the stability zones of a laser cavity. All these methods are based on the theory of paraxial 
beam propagation theory in cavities presented by Kogelnik and Li [100] and can be formalized 
using the ABCD matrices. An example of this influence of the thermal lens on the stability of a 
cavity is given in figure 23. As a direct consequence, these methods consider ideal lens (aberration-
free) and do not give any information on the thermally-induced aberrations. Nevertheless these 
methods remain easy to implement since there is no probe beam (the laser itself is used to measure 
the thermal lens). In counterpart, it is impossible to measure the thermal lens in absence of laser 
extraction (with pumping but no laser emission). Here are three different examples of these methods 
based respectively on the work of Frauchiger and al., Neuenschwander and al. and Ozygus et al.   
 68 /115  
No thermal lens
thermal lens
Unstable cavity
Stable cavity
No thermal lens
thermal lens
Unstable cavity
Stable cavity
Figure 23: example of this influence of the thermal lens on the stability using a plano-plano cavity 
[101]. 
- Frauchiger and al. [102] measured the divergence of the output laser beam in a diode-pumped 
Nd :YAG and retrieved the thermal lens by a paraxial calculation. This example allows us to 
perceive the limitation of these techniques using the properties of cavity modes. Indeed, they are 
very sensitive to the beam quality. The measured divergence is actually directly proportional to the 
M2 factor of the beam. If the laser mode is not perfectly TEM00 the reliability of the method is 
strongly affected.  
- Neuenschwander and al. [101] used a plano-plano laser cavity stabilised by the thermal lens 
(figure 23). Adding two extra-cavity lenses and measuring the beam diameter in different 
longitudinal positions they found the waist in the cavity and therefore the thermal lens. This method 
is interesting because it allows the simultaneous measurement of the M2 factor which reduces the 
 69 /115  
limitations due to the beam quality. In that case, the limitation is due to the precision on the 
distances between the different optical components which is not always perfectly known.  
- Ozygus et al. [103] proposed an alternative method that consisted in using the frequencies of 
different transverse modes. In fact, the frequencies of the different transverse modes (in a plano-
concave laser cavity) depend not only on the length of the cavity but also on the radius of curvature 
of the mirrors (considering the flat mirror and the crystal as another concave mirror). If one 
achieves to have two modes lasing simultaneously in the cavity, the measurement of the beating 
frequency with an optical spectrum analyser allows retrieving the thermal lens. Another upgraded 
technique based on the same effect was also presented by Ozygus et al.later in 1997 [104]. In the 
last one, one translates one of the mirrors to find the positions of the spectral degeneracy (when two 
eigenmodes have the same frequency). The advantage of this method is its capacity to be used for 
measuring long-focal-length thermal lenses. It allows the measurement of focal lengths as long as 5 
In conclusion, the methods based on the properties of the cavity modes allows to have 
relative precisions on the thermal focal lens of 15 % for a TEM00 beam but down to 60 % for an 
non-diffraction limited beam [101].  
IV.4. Methods based on  wavefront measurements 
In this part, we will distinguish three wavefront measurement techniques: “classical” 
interferometry (based on fringe measurements on Michelson, Fizeau, … interferometers),  shearing 
interferometry, and Shack-Hartmann sensing. 
IV.4.1. Classical interferometric techniques 
 70 /115  
As far as “classical” interferometry is concerned, one can consider equal-thickness fringes 
between the parallel end faces of the rod (this is the so-called Fizeau interferometer); one can also 
insert the rod under study in an arm of a Michelson-type [105] or Mach-Zehnder -type 
interferometer [106] for example. The first type of method is simpler since it does not require a 
second interferometrically adjusted arm.   
These methods are particularly well suited for large amplifier rods but in counterpart, they 
are not convenient for end-pumping. Indeed, even if there is no fundamental contraindication to use 
this method for small spots, in practice in this case the number of fringes is too small to obtain an 
exploitable interferogram. As an example, if we consider a 200-µm diameter probe beam, a focal 
length of 5 cm only induces a phase shift between the centre and the edge of the beam of ∆ = h2/2f 
= λ/6 (with h the radius of the beam, f the focal length, and λ the wavelength of the probe beam, 
here λ =670 nm). In this case, of course, no fringes are visible.  
To overcome this problem, one can choose to take a probe beam larger than the pumped 
area, given that in these conditions, as shown in figure 17, some spherical aberration will be present, 
which requires an additional numerical model to fit the data and retrieve the thermal lens, as done 
by Pfistner et al. [107]. The weakness of this method relies precisely in the retrieving algorithm 
since the whole interferogram consists on only a few fringes. The precision is then a hard point and 
in the best case this precision is evaluated to λ/4. This work [107] has also been done on YAG, 
GSGG and YLF crystals.   
IV.4.2. Shearing interferometric techniques 
A classical solution to obtain information on the phase “between two fringes” is phase shift 
interferometry. This technique has been used by Khizhnyak et al [108] with longitudinally-pumped 
Nd :YAG lasers. This method is easy to implement since it’s based on commercial products, but it 
 71 /115  
remains quite unused due to its important cost. One generally prefers lateral-shearing-interferometer 
methods.  
Methods based on lateral shearing interferometers are particularly well suited to end 
pumping. The principle is the following: the beam is duplicated in several replicas, typically 2 [109-
110],3 [111] or 4 [112-113] (as presented in figures 24 and 25 with a tri-wave lateral shearing 
interferometer setup).  
Figure 24 : Example of tri-wave lateral shearing interferometer (courtesy of J.C. Chanteloup). 
 72 /115  
Figure 25: Example of tri-wave lateral shearing interferometer setup (courtesy of J.C. Chanteloup). 
The replicas are slightly shifted from each other in the lateral direction which provides an 
interferogram whose fringes (for 2 waves) or dots (for 3, 4 waves) separation give information on 
the derivative of the wavefront.  For example in the absence of wave-front distortion the lines are 
rectilinear for the 2-wave shearing interferometer, they form an homogenous honey-comb for the 3-
wave shearing interferometer and perfect squares for the 4-wave shearing interferometer. This 
technique is more sensitive than classical interferometry since the sensitivity is tunable by adjusting 
the shearing distance. The larger the shift, the more precise the technique . The precision of this 
method is excellent since it is in the order of λ/50 [109] and can even reach λ/200 [111].  
The method using 3 (or 4)-wave « trilateral shearing interferometry » [111] has the 
advantage over the 2-wave shearing interferometry to allow the cartography in the 2 dimensions in 
one acquisition.  This method is simple to implement (figure 25) since the splitting of the beam can 
be realized readily with a 3D grating.  Moreover the use of a grating makes this method totally 
achromatic which allows its use for broad-band lasers such as femtosecond lasers. In 1998, 
 73 /115  
Chanteloup et al.  [112] reported on the wavefront distortions of a terawatt-class femtosecond laser 
system with an accuracy of λ/50. 
This method was also used to characterize thermal lensing in Ytterbium-doped materials, 
namely in Yb:YAB by J.L. Blows et al. [110], who used the 2-wave shearing interferometer 
technique to measure thermal lenses, and then thermal conductivities and fractional thermal 
loadings. The experimental setup used is reproduced on figure 26.  
Figure 26: setup for measuring thermal lensing in diode-pumped Yb:YAB crystal with lateral 
shearing interferometry from [110] (courtesy of J. Dawes, Centre for lasers and Applications, 
Sydney) 
The use of a probe beam at 530 nm allowed the thermal lens to be evaluated under lasing 
and nonlasing conditions (that is with a shutter inside the cavity), which is an essential requirement 
to perform radiative quantum efficiencies measurements. 
IV.4.3. Methods based on Shack-Hartmann wavefront sensing 
Although it can be seen also as a multiple beam interferometric method, we separate this method 
from the previously mentioned ones since it can be understood easily with geometrical optics. 
 74 /115  
In 1900, Hartman proposed to use a drilled plate [115] to measure wavefronts. The principle is 
simple: since light rays run perpendicular to the wavefront, one can retrieve the local wavefront 
slope as soon as the direction of the ray is measured. The Hartmann plate is made of small apertures 
which scatter the beam into regularly-spaced diffraction patterns, and behind which is located a 
detector (typically a CCD camera nowadays). In 1971, Roland Shack and Ben Platt improved the 
Hartmann setup by replacing the array of holes by an array of microlenses (figures 27, 26).  
Single unit 
Figure 27: example of intensity pattern observed in the focal plane of the Shack-Hartmann 
microlenses. 
 75 /115  
Rays 
propagationIncident 
Focal length
Microlens
array
Rays 
propagationIncident 
Focal length
Microlens
array
Figure 28: Shack-Hartmann wavefront sensor setup. 
Reconstructed 
wave front
Measured 
local slopes
Input wave 
front
Reconstructed 
wave front
Measured 
local slopes
Input wave 
front
Measured 
local slopes
Input wave 
front
Figure 29: Shack-Hartmann sampling. 
 76 /115  
The small axial shift of the microlens diffraction pattern centroid (see figure 28) is directly 
proportional to the average local slope of the wavefront on the aperture of the micro-lenses. The 
displacement vectors allow having 2D information. The standard sensitivity of such systems is 
typically λ/100 RMS for commercial products, which settles this technique as a competitor of  
lateral shearing interferometry. One of the advantages of this method, compared to interferometric 
ones, is its insensitivity to mechanical vibrations or thermal fluctuations. In counterpart, the 
principal limitation of this kind of sensor is the discrete sampling that limits the transverse 
resolution (figure 29) and thus prevents from obtaining information about high-spatial-frequency 
phase distortions. Nevertheless, it’s noteworthy that this point is not a problem for thermal lensing 
and aberration measurements because in virtue of the general heat equation (II.1.1), temperature 
variations in a crystal are smooth even if the thermal load exhibits sharp variations.  
In 1998, Armstrong [116] reported the measurement of thermal lens in transversally-pumped 
Nd:YAG and Nd:YAP rods. More recently Ito et al. [117] and Pittman et al. [118] reported the use 
of a Hartmann-Shack sensor to measure the thermal lens in Ti:sapphire rods of terawatt-class 
femtosecond lasers.   
In 2001, reports of temporal changes of thermal lens effects on high power pulsed Yb:glass 
lasers have been done by Nishimura et al. using a Shack-Hartmann wavefront sensor [120]. Here 
the sensor was used for its ability to yield real-time (100 Hz sampling rate) estimation of Zernike 
coefficients of aberrations, and allowed to measure characteristic thermal relaxation times. 
Our group reported recently [73] a derivation of Armstrong’s work to measure thermal lensing 
in various diode-pumped Yb:doped crystals, under lasing or nonlasing conditions. The setup 
appears in figure 30. The probe beam was a laser diode at 670 nm coupled in a single mode fiber, 
chosen for its high spatial coherence (an essential feature to correctly define the “reference 
wavefront”) and its low temporal coherence (necessary to avoid coherent cross talk, that is 
interference between two neighbouring microlens diffraction patterns.)  After collimation by a 
 77 /115  
microscope objective, the probe beam was focused onto the crystal and superimposed with the 
pump beam. The crystal was then imaged upon the microlens array using a magnifying relay 
imaging system. An uncoated glass plate was inserted in the pump beam path to reflect the probe 
beam towards the sensor. A selective interference filter at 670 nm was added in front of the sensor 
to eliminate any unwanted signal at the pump or laser wavelengths. A « reference wavefront » is 
recorded when the pump diode is turned off, which includes all static aberrations of the optical 
elements and of the cold crystal itself. It is then subtracted to the measured wavefront when the 
pump is on. Thus, only phase distortions originating in thermal effects are recorded. The phase front 
was then reconstructed by projection over the set of the orthogonal Zernike polynomials [70].   
Figure 30: Experimental setup used to measure thermal lensing in diode-pumped Yb-doped crystals 
with a Shack-Hartmann wavefront sensor, from [73]. 
IV.4.4  Other techniques 
There exists some more marginal and less used techniques to measure thermal lensing. For 
example, it is possible to achieve phase reconstruction basing on Fourier optics. One can show that 
it is possible to retrieve the phase by knowing the intensity profile in different planes linked by 
Fiber-coupled pump 
laser diode 
Fiber-coupled 
Shack-Hartmann sensor 
Interference 
Filter @ 670 nm 
Laser cavity Glass plate 
crystal 
 78 /115  
Fourier transformation. For instance it is quite obvious that a uniform intensity over a circular 
aperture and an Airy pattern appearing in two Fourier-related planes implies that the wavefront 
propagating from one plane to the other is purely spherical. Nevertheless the general inverse 
problem is far from being obvious. Grossard and al. [114] proposed a technique for measuring 
thermal lensing aberrations in a diode-pumped Nd:YVO4 crystal using intensity profiles in 3 planes 
and a complex phase retrieval algorithm derived from Gerchberg and Saxton’s work.      
IV.5. Conclusion 
In conclusion, we made a review of the main methods used to experimentally measure the 
thermal effects in lasers, putting in emphasis the advantages and the limitations of each method. A 
summary of this review is presented in table 4. 
 79 /115  
Methods 
Based on  focal point 
displacement no 
Very 
difficult yes yes yes yes yes 
Very easy / 
very cheap 
Based on cavity 
properties 
no yes Only with 
Almost 
impossible
probe yes yes 
Low  
≈ 15 à 60 %
Difficult in 
general / 
cheap 
« classical » 
Interferometry 
yes Very difficult 
yes in 
theory no no no yes 
σ∆ ∼λ/4 for the 
interferometry 
without phase 
shift. 
Difficult / 
Cheap 
(except phase 
shift interf.) 
lateral shearing 
interferometry yes yes yes yes yes yes 
yes  
for the 3-4-wave 
shearing 
high 
σ∆∼λ/50 
to λ/200 
medium / 
expensive 
reconstruction de la 
phase from different 
position intensity 
profiles 
yes yes yes yes yes yes no unknown Difficult /no commercial 
Hartmann- Shack yes yes yes yes yes yes yes 
high 
σ ∆ ∼λ/100 
easy/ 
expensive 
Table 4: summary of the different techniques used to measure thermal lensing. σ∆ denotes the 
reported RMS deviation on phase shift. 
V. Thermal lensing measurements in ytterbium-doped 
materials: the evidence of a non radiative path  
We will conclude this review by giving some examples of experimental thermal lensing 
measurements in diode-pumped Yb-doped crystals. Most of the examples are taken from our 
previous publications [73,121]: the reader interested in the technical details of the experiments, as 
well as by the theoretical considerations underneath, is invited to refer to these works. 
 80 /115  
V.1. the thermal load in Yb-doped materials  
What are the heat sources in a laser medium ? Following T. Y. Fan and using the same notations 
[8], the fractional thermal load ηh (that is the fraction of the absorbed pump power converted into 
heat) can be written:  
rlph λ
ηηηη 11   (V.1) 
Where: 
- λp, λl, λf are the pump wavelength, the observed laser wavelength, and the mean fluorescence 
spectrum wavelength, respectively; 
- ηp is the pump quantum efficiency, which is the fraction of absorbed pump photons contributing 
to inversion. Non-unity pump quantum efficiency accounts for residual absorption of the undoped 
crystal, or can be related to the presence of nonradiative sites. 
- ηr  is the radiative quantum efficiency for the upper manifold: it represents the fraction of excited 
atoms that decay by a radiative path (in absence of stimulated emission). Non-unity radiative 
quantum efficiency can be related to multiphonon relaxation (although it is very unlikely since a 
large number of phonons are necessary to bridge the 10 000 cm-1 gap separating the excited and 
ground manifolds) or more probably to concentration quenching. The latter phenomenon 
corresponds to the trapping of the energy by a color center, an impurity, or a lattice defect (Yb2+, 
rare-earth impurities or hydroxyl groups have been evoked as possible “dark sites” [122-124]) after 
several transfers of excitation between neighbouring ions.  
- ηl is the laser extraction efficiency, defined as the fraction of excited ions that are extracted by 
stimulated emission. An expression of the laser extraction efficiency can be derived by writing the 
stimulated, spontaneous and nonradiative relaxation rates [73]. In most cases an approximate 
 81 /115  
relation can be used by neglecting reabsorption at laser wavelength. In the latter case we obtain the 
simplified expression: 
≈   (V.2) 
where I is the intracavity laser intensity, and ( )lem λσ  the emission cross section at laser 
wavelength. As shown by eq. (V.2) the laser extraction efficiency tends towards 1 for intracavity 
laser intensities that surpass the laser saturation intensity. Generally, cw oscillators based on Yb-
doped materials work with high reflectivity output couplers: as a consequence the intracavity laser 
intensity is very high, at least one order of magnitude higher than the laser saturation intensity, so 
that ηl is typically close to unity in an operating Yb laser. In this case, the thermal load becomes 
nearly independent on the radiative quantum efficiency, and is only given by the quantum defect. 
Nevertheless, the quantum efficiency directly affects the excited state population and has 
subsequently crucial importance for Q-switched lasers or low repetition rate amplifiers. 
Incidentally, the relation (V.2) also illustrates that the performance of an Yb-based cw oscillator 
becomes nearly independent of the emission cross section at laser wavelength, provided that the 
pump intensity is far higher than the pump saturation intensity.  
V.2. Evidence of nonradiative effects in Yb-doped materials: the example of 
Yb:YAG  
During the past decade, it has been assumed and claimed many times that nonradiative effects did 
not exist in Yb-doped materials, in virtue of the very simple electronic structure that prevented 
deleterious effects that are well known with other ions, such as cross relaxation, excited state 
absorption or upconversion. Nevertheless, all the measurements performed in the past few years 
 82 /115  
have all brought a contradiction to this statement. Blows et al. have demonstrated clear evidence of 
a nonradiative path in Yb:YAB [110],  Ramirez et al. [125] in Yb:MgO:LiNbO3 , and as for YAG, 
non-unity quantum efficiencies have been reported by Barnes et al. [126], Patel et al. [127], 
Ramirez et al. [125] and Chenais et al. [121]. This recent work has also shown the existence of 
nonradiative effects in Yb:GGG, Yb:YSO, Yb:KGW, Yb:YCOB, Yb:GdCOB.  
An example of quantum efficiency measurement thanks to the thermal lens method is shown in 
figure 31 with an Yb:YAG crystal. A simple qualitative explanation is given in figure 32. The clear 
difference between the thermal lens dioptric power under lasing and nonlasing conditions can be 
modelled using eq. (V.1) and eq. (V.2), provided that laser power is measured simultaneously. 
Given some additional approximations, detailed in [121], one can retrieve both radiative quantum 
efficiency and thermo-optic coefficient χ. The results recently reported for different crystals have 
been summarized in table 5. 
 83 /115  
figure 31: Thermal lens dioptric power (here, aberrations of the thermal lens were negligible) 
under lasing and nonlasing conditions. On the right (same graph) the measured laser power, useful 
to compute the laser extraction efficiency. (from [121]) 
laserfluorescence
NON lasing lasing
2 F 5/2 
2 F 7/2 
Non Radiative 
(NR) decay 
Energy 
fluorescence
NR decay  
NON LASING 
LASING
543210
Absorbed pump power (W)
8- at.% Yb:YAG
 TL, non lasing : experiment
 TL, non lasing : model
 TL, lasing : experiment
 TL, lasing : model
 Laser power
 84 /115  
figure 32 : Simple qualitative explanation of the observed difference between thermal lens dioptric 
power under lasing and nonlasing conditions. When the laser is on, stimulated emission short-
circuits the nonradiative path, causing the thermal load to be lower. 
It is seen that for a crystal like YAG, many different techniques have been used (for the details of 
each method, refer to the cited publications), and that a large dispersion of reported quantum 
efficiencies appears. This dispersion tends to assess the conjecture of concentration quenching as 
the nonradiative source in Yb-doped materials. This conjecture has been very alleviated in highly-
doped Yb:YAG samples and was attributed to cooperative processes between two Yb3+ ions 
towards Yb2+ impurities [122]. Owing to the intrinsic nature of concentration quenching and the 
major role played by impurities, it is clear that the radiative quantum efficiency is a parameter that 
pertains to a single given sample, characterized by its doping concentration, the growth technique 
and its associated environment (in particular the nature of the crucible), and of course the degree of 
purity of the compounds.  
 85 /115  
Table 5: Reported values of measured thermo-optic coefficients and radiative quantum efficiencies 
in the literature for different Yb-doped materials. 
crystal Mean 
fluorescence 
wavelength 
λf (nm) 
Thermo-optic 
coefficient (×10-6 K-1)
(measured, under 
diode-pumping 
conditions) 
Radiative 
quantum 
efficiency ηr 
(measured) 
Method used 
 (for quantum 
efficiency 
measurement) 
Reference 
0.70 Thermal lensing 
(Shack-Hartmann) 
[121] 
0.874 photometric [126] 
0.835 calorimetric [126] 
0.97 lifetime [127] 
Yb:YAG 1007 10.0 
(from [121]) 
0.85 Direct temperature 
measurement 
[125] 
Yb:GGG 1013 31 0.90 Thermal lensing 
(Shack-Hartmann) 
[121] 
Yb:GdCOB 1011 6.5 0.71 idem [121] 
Yb:YCOB 1035 17 0.90 idem [121] 
Yb:KGW 993 7.5 0.96 idem [121] 
Yb:YSO 1001 15 0.89 idem [121] 
Yb:YAB 996 Non reported 0.88 Thermal lensing 
(lateral shearing) 
[110] 
V.3. Laser wavelength dependence on the thermal load in Yb-doped broadband 
materials: the example of Yb:Y2SiO5 
 86 /115  
We saw that the fractional thermal load (eq. V.1) is dependant on the operating wavelength of the 
laser. In most practical circumstances however, this dependence is hidden by the fact that the laser 
extraction efficiency also greatly depends on the laser wavelength, since it is linked to the emission 
cross section. The Yb:Y2SiO5 (Yb:YSO [128]) crystal exhibits two maxima of comparable 
amplitude in its emission spectrum, at 1042 and 1058 nm respectively. In addition the output is 
naturally linearly polarized along the crystallophysic axis Y for both wavelengths. It is thus a good 
candidate to put clearly into evidence the influence of laser wavelength on thermal lensing. To 
perform the experiment, we added a SF6 dispersive prism cut at Brewster angle in the collimated 
arm of the three-mirror cavity appearing in figure 30, so that identical laser efficiencies were 
achieved at both wavelengths (2.1 Watts were obtained for 8.5 Watts of absorbed pump power). 
The results are shown in figure 33. It appears clearly that the thermal lens is weaker when the laser 
oscillates at 1042 nm, as expected since quantum defect is lower at this wavelength. The theoretical 
curve derived for the 1058-nm laser oscillation was obtained from the 1042-nm curve by just 
modifying the wavelength and the emission cross section in eq. (V.1) and (V.2), without any 
adjustable parameter (see [121]). This simple experiment shows the interest of multiwavelength 
thermal lensing measurements in broadband materials. Indeed, we have considered here a simple 
formulation of the fractional thermal loading (given by eq. V.1) which gives here satisfactory 
results; but it is possible, from the work of Patel et al. [127] for instance, to derive a more accurate 
expression of the fractional thermal load, which takes into account the probability of excitation 
transfer to a neighbouring ion [129]. If measurements are performed versus absorbed power at 
different (more than 2) laser wavelengths, for which the laser extraction efficiency is also known, 
this means that we have the possibility to infer other spectroscopic parameters (such as the transfer 
probability to a neighbouring ion) involved in the expression of the thermal load. 
 87 /115  
86420
Absorbed pump power (W)
1058 nm
1042 nm
1058 nm
1042 nm
 TL, lasing at 1042 nm : experiment
 TL, lasing at 1058nm : experiment
 TL, lasing at 1042 nm : model
 TL, lasing at 1058 nm : model
5-at. % Yb:YSO 
Figure 33: Thermal lens dioptric power at 1042 and 1058 nm (left) and laser power (right in 
Yb:YSO. The effect of wavelength on quantum defect is clearly visible for this material. (from [121]) 
IV.4. The influence of the mean fluorescence wavelength on the thermal load: an 
illustration with Yb:KGW 
The fact that the mean fluorescence wavelength affects the fractional thermal load is a very 
interesting feature of broadband Yb-doped materials. There are even some materials whose mean 
fluorescence wavelength is below the tail of the absorption spectrum. More precisely, according to 
eq (V.1) it is clear that if we may find a pump wavelength verifying:  
 88 /115  
f p r pλ η η λ<  (V.3) 
then the thermal load will be negative (in absence of laser extraction) and “radiative cooling” will 
be achieved. Such radiative cooling has been reported in Yb-doped ZBLANP glass in 1995 [130] 
and in a KGW crystal by Bowman et al. in 2002 [131]. The latter author has also proposed to use 
this phenomenon to realize a thermal load-free (radiation-balanced) laser [132]. The key idea is to 
correctly adjust the laser intensity so that the spontaneous emission rate, source of cooling 
providing the pump wavelength is long enough, exactly balances the stimulated emission rate, 
source of heating. 
We show here how the mean fluorescence wavelength plays a key role in the interpretation of the 
results obtained when measuring the thermal lens in a Yb:KGW crystal. KGW is now a well-
known crystal, suited for ultrafast laser applications [32-33, 17, 46, 74]. When used with 
wavevector k//c, the polarization-averaged mean fluorescence wavelength is 993 nm while the 
observed laser wavelength is 1030 nm and the pump wavelength tuned to the zero-line absorption 
peak, i.e. 979 nm. Results are shown in figure 34, where it can be seen that unlike previously 
reported results, thermal lensing is actually stronger under lasing conditions. A simple explanation 
is given with the schematic picture appearing in figure 35. The simple model suggested above fits 
well with experimental data and yields a high quantum efficiency for this sample (0,96), consistent 
with the fact that tungstate crystals are grown with the flux method, which is known to carry less 
impurities during growth than the Czochralski technique.  
 89 /115  
figure 34: Thermal lensing measurements in Yb:KGW (from [121]) 
NON LASING 
LASING 
6543210
Absorbed pump power (W)
5- at.% Yb:KGW
 TL, non lasing : experiment
 TL, non lasing : model
 TL, lasing : experiment
 TL, lasing : model
 Laser power
2 F 5/2 
2 F 7/2 
993 nm 1030 nm 
laser fluo fluorescence 
Non lasing lasing 
Energy 
 90 /115  
Figure 35: Qualitative explanation of the paradoxical behaviour of the Yb:KGW crystal. Under 
nonlasing conditions (left), the thermal defect is low due to a low mean fluorescence wavelength 
(993 nm). When laser extraction at 1030 nm becomes the dominant way down for excitation, then 
the quantum defect is higher. This simple picture assumes that ηr = 1. 
IV.5. Conclusion 
As a conclusion for this final part of this paper, we have shown some examples of thermal lensing 
measurements, which allowed us to highlight several points of interest pertaining to Yb-doped laser 
media : 
- all measurements show a difference between thermal lensing dioptric power with and without 
laser action: this provides the proof that in all the crystals under investigation (here YAG, GGG, 
YSO, GdCOB, YCOB, KGW, YAB)  there is a nonradiative return path for excited state 
population. This is all the more detrimental for laser performance in pulsed regime, since in cw 
oscillators the laser extraction efficiency can be large enough to dissimulate this effect. 
- Since Yb-doped materials have broader spectra than their Nd-doped counterparts, thermal lensing 
has two specific properties, which have been illustrated by two experiments: 
1) the thermal load depends on the laser oscillation wavelength: the lower the wavelength and the 
lower the quantum defect; 
2) the mean fluorescence wavelength λf plays a key role. Materials exhibiting a low λf will be less 
sensitive to heating under nonlasing conditions, and in KGW we saw that the thermal lens was 
even higher under lasing action.  
 91 /115  
The peculiarities of Yb-doped materials are not limited to these two latter facts. Chenais et al. [121] 
have reported the appearance of a roll-off in the thermal lens dioptric power at high pumping 
densities, under nonlasing conditions, with several different materials [121]. Furthermore, it has 
been observed by several groups ([110], [124]) a green luminescence that can be related to 
cooperative luminescence.  
The detailed interpretation of these phenomena and their connection to the thermal load are still 
works in progress. 
VI. Conclusion 
Ytterbium-doped materials have brought new prospects and deep changes in the area of high power 
solid state lasers. Associated to new and clever pumping concepts (fiber lasers, thin disk lasers, 
spinning disk lasers...) they are now well-established competitors of Nd-doped materials for high 
power applications. This paper has been an effort to make a review of the recently published works 
dealing with thermal effects in solid state lasers, with a particular scope on the special case of Yb-
doped crystals.  
In the first section we have presented the general properties of Ytterbium-doped media, and 
pointed out the crucial role of the matrix host on the properties of the laser material. The part II was 
devoted to a detailed presentation of the temperature distribution in a diode-pumped Yb-doped 
crystal: how to calculate it and how to measure it. We pointed out the importance of boundary 
conditions, and gave some practical information about the role of the thermal contacts in the 
temperature profile. We have shown that it was possible to easily include pump absorption 
saturation effects and pump beam divergence inside the crystal, exploiting the fact that the heat 
transfer coefficient towards end faces was far smaller than towards edge faces.  
In the third part of this review we focussed on the thermo-optical properties and made a quite 
detailed study of the so-called thermal lensing phenomenon. A comprehensive understanding of 
 92 /115  
this aspect requires a good knowledge about both thermo-mechanical and thermo-optical properties 
of the materials under consideration. This lead us to point out several inaccuracies reported in 
previous works, concerning the calculation of photoelastic constants for isotropic materials, and 
more generally the abusive employment of the dn/dT parameter when it is used to estimate the 
magnitude of the thermal lens: we have shown that the partial derivative of index with temperature 
taken at constant strain is the most appropriate figure, this is because the dn/dT is classically 
measured under experimental conditions that are not consistent with the usual situation of a diode-
pumped crystal under mechanical stress.  We proposed an alternative way to split the thermo-optic 
coefficient into three truly independent terms, and addressed in conclusion a schematic diagram of 
thermal effects showing how all the different apparent consequences (lensing, depolarization, 
strain-induced birefringence, astigmatism, fracture...) are connected to each other.  
Given the high complexity of these thermo-optical phenomena, and the unfeasibility of precise 
calculations as far as all the properties of a crystal are not known (that is the case for the majority 
of laser crystals), we then focussed our attention on thermal lensing measurement techniques, 
which was the topic of part IV.  
We presented in that section a review of what are, to our knowledge, the main different techniques 
that can be employed to measure thermal lensing in end-pumped laser media, and discussed their 
relative advantages and drawbacks.  
Finally, we presented some examples of thermal lensing measurements that have been reported 
recently in Yb-doped crystals. All these measurements agree to find non-unity radiative quantum 
efficiencies for the Yb-doped materials under investigation. This non-ideal behaviour, presumably 
related to concentration quenching, provides contradiction to the general consideration that Yb-
doped materials are totally free of deleterious nonradiative effects. We concluded this review by 
giving some examples of the influence of the laser operating wavelength on the thermal load, as 
well as the influence of the mean fluorescence wavelength. 
 93 /115  
Acknowledgements: We wish to acknowledge all the people involved in the realization of this 
review: at first Romain Gaumé (Stanford University) for fruitful discussions about issues related to 
dn/dT and characterization of materials; Bruno Viana, Gerard Aka and Daniel Vivien from the 
Laboratoire de Chimie Appliquée de l’Etat Solide (Ecole Nationale Supérieure de Chimie Paris, 
France), Alain Brenier and Georges Boulon from the Laboratoire de Physico-Chimie des 
Materiaux Luminescents (Lyon, France) and at last Bernard Ferrand from CEA-LETI (Grenoble, 
France), for growing the crystals used in most of the experimental results reported here. We 
gratefully acknowledge the Ecole Supérieure d’Optique (Orsay, France) for loaning us the Shack-
Hartmann wavefront sensor as well as the infrared camera. The experiments related to our previous 
works have been done under the auspices of the CNRS and the Délégation Générale de 
l’Armement (DGA). We also thank Jean-Christophe Chanteloup and Judith Dawes for allowing us 
to reproduce some figures taken from their works.  
VII. Appendix: Calculation of the photoelastic constants Cr, θ  and 
C’r, θ  using plane strain and plane stress approximations.  
The Cr, θ constants are useful parameters to account for photoelastic effects in the most 
simple cases (optically isotropic crystals or glasses, and parabolic dependence of strain inside the 
crystal).  When Koechner published for the first time the derivation of these constants in 1970, the 
term accounting for thermal dilatation in the generalized Hooke law was omitted [51]. In 1992, 
Cousins pointed out the mistake but did not correct, at that time, the expressions of the constants Cr, 
θ, whose derivation however requires the use of the generalized Hooke law. In the last version of 
 94 /115  
Koechner’s reference book (Solid State Laser engineering, Springer Verlag ed.), the faulty 
equations were actually removed or corrected, but the formulation of the photoelastic constants Cr, θ 
remained unchanged since the first edition. 
Because the generalized Hooke law is used to infer an expression of the constants, it means 
that their expression depends on which assumption has been made about stresses and strains. In 
Koechner’s calculation, the plane strain approximation was made. In the case of end-pumping, it is 
well known however that the plane stress approximation is closer to reality. In this appendix we 
derive the expression of the photoelastic constants, within the framework of the two approximations.  
VII.1. Basic equations  
We restrict our discussion to cubic crystals. In this case, only the elasto-optical coefficients 
p11, p12, p44 are nonzero. These coefficients are given in the [100] system linked to the crystal.  
In optically isotropic materials, the principal axes of strain are given by the cylindrical coordinate 
system axes (radial, tangential, axial). After a change of coordinates one obtains [77] : 
[ ]441211
441211
441211
 (A.1.) 
for the radial index, and : 
 95 /115  
[ ]441211
441211
441211
 (A.2.) 
for the tangential index.  
As stated by Shoji et al.[119], these expressions are valid only for propagation along the [111] axis. 
For YAG we have :  
p11 = - 0.029 
p22 = + 0.0091 
p44 = - 0.0615. 
The strains are related to stresses by the generalized Hooke law, which in our case (whatever 
the approximation we make) writes:  
( )[ ] ( )
( )[ ] ( )
( )[ ] ( )cTrzz
cTzrr
−++−=
−++−=
−++−=
ασσνσ
ασσνσ
ασσνσ
 (A.3.) 
VII.2. Derivation of Cr,θ  using the plane strain approximation (Koechner’s 
case)  
 96 /115  
We consider a homogeneously pumped rod. Only the radial dependence of stresses and 
temperature is considered, since the additional constants represent a constant phase shift which does 
not affect the phase profile.  
We define:  
( )ν116
and  
Q absh
=  . 
The stresses and temperature fields, within the plane strain approximation, are given by 
Koechner [69] :  
( ) cteQSrr
cteQSrr
.cteQSrr
θ   (A.4.) 
( ) .cter
+−= 2
 (A.5.) 
Omitting the constant terms, the generalized Hooke Law yields :  
( )[ ] ( ) 0
ν14ασσνσ
ε 22θ =⎥
−−=++−= r
Trzz  (A.6.) 
which checks the plane strain condition. We have also: 
( )[ ] ( ) 22θ ν134
ν71ασσνσ
ε QSr
−−=++−=   (A.7.)  
 97 /115  
( )[ ] ( ) 22θθ ν14
ν53ασσνσ
ε QSr
−−=++−=  (A.8.) 
The radial index variation is related to the strains by:  
=  (A.9.) 
and the  
are given by (A.1) and (A.2). 
Following Koechner, we can write the index variations under the form :  
nn ,r
,r −=  (A.10.) 
With :  
( )( )
( )1ν48
875ν1 441211
C r  (A.11.) 
( )( )
( )1ν16
3ν1 1211
C  (A.12.) 
For YAG, we find Cr = + 0.020 et Cθ = + 1.77 10-4. The valued computed by Koechner are: Cr = 
0.017 et Cθ = -0.0025 (with the same values taken for pmn and the same Poisson ratio: ν = 0.25). 
The error made by Koechner is about 20%.  
 98 /115  
These coefficients have been extensively used for the purpose of evaluating depolarization losses in 
Nd:YAG. In this latter case, the relevant parameter is [60]: 
( )rB CCC −= θ2
 (A.13.) 
With (A.11) and (A.12), we find:  
( )441211 4148
pppCB +−−
 (A.14.) 
which is coincidentally the same formula obtained by Koechner from incorrect expressions of Cr et 
Cθ.  
VII.3. Derivation of C’r,θ  using the plane stress approximation (end pumping 
case) 
We consider a thin disk, pumped by a top-hat pump beam profile of radius wp  equal to the rod 
radius. This is also true in the conditions defined in Section III.4, provided that we are only 
concerned in the area r < wp and that integrated values of parameters along the crystal length are 
considered. 
Let’s define :   
=     
The temperature distribution is given by:  
 99 /115  
( ) 2' .
T r r cte
= − +  (A.15.) 
where we use Cousins’ notation (see Part III.4) , meaning that the bracketed quantity is integrated 
along the rod length L, and is then homogeneous to the said quantity times a length.  
We have: 
3 ' '
0 ( )
r Q S r Cte
r Q S r Cte
r plane stress
 (A.16.) 
From (A.3) we obtain :  
3 ' '
4 ' '
Q S r
Q S r
Q S r
 (A.17.) 
By analogy with (A.10.), the index shift can be written under the form:  
3 ' 2
n n C r
∆ = −  
which yields, using (A.1), (A.2), and (A.9) :  
( )( )11 12' 1 9 15
ν− + +
=  (A.18.) 
 100 /115  
( )( )11 12 44' 1 7 17 8
p p p
ν− + + −
=  (A.19.) 
For YAG : C’r = + 0.0032 et C’θ = - 0.011.  
The optical indicatrix does not deform the same way compared to the case of the long thin rod: here 
the tangential index becomes greater than n0. An interesting feature is the stress-induced 
birefringence term, defined previously (eq. A.13):  
( )' 11 12 44
C p p p
= − +  (A.20.) 
This expression differs from (A.14) only by a factor of 1-ν (=0.75 for YAG, but also for most 
materials).  
It is of special interest to calculate the average value of the photoelastic constants:  
0.0039
rC Cθ+ = −  
which gives an order of magnitude of the role of photoelastic effect in the thermal lens.  
Here, in the end pumping case, the thermal stresses yield a divergent contribution to the 
thermal lens, whereas it was convergent in the case of side-pumping.  
See subsections 6 to 8 of Section III for a detailed discussion about the use of these 
parameters.  
 101 /115  
References:  
1. T.Y. Fan : “Diode-pumped Solid-State Lasers”, in Laser Sources and Applications, Proc. of 
the 47th Scottish Universities Summer School in Physics, St Andrews, edited by A. Miller 
and D.Finlayson, SUSSP Publications & Institute of Physics Publishing, 1996. 
2. D. Hanna and W. Clarkson : “A review of diode-pumped lasers”, in Advances in Lasers and 
applications, Proc. of the 52nd  Scottish Universities Summer School in Physics, St Andrews, 
edited by D. Finlayson and B. Sinclair, SUSSP Publications & Institute of Physics 
Publishing, 1999. 
3. R.J. Keyes, T.M. Quist « Injection luminescent pumping of CaF2:U3+ with GaAs diode lasers 
», Appl. Phys. Lett. 4, 50-52 (1964).  
4. W. Streifer, D. Scifres, G. Harnagel, D. Welch, J. Berger, M. Sakamoto : “Advances in laser 
diode pumps“, IEEE J. Quant. Elec. Vol. 24, pp. 883-894 (1988). 
5.  D. Brown : « Ultrahigh-average power diode-pumped Nd :YAG and Yb:YAG lasers », IEEE 
J. Quant. Elec., Vol. 33, No. 5, pp. 861-873 (1997). 
6. W. Krupke : «Ytterbium Solid-State Lasers – The first decade», IEEE Journal On Sel. Topics 
in Quant. Elec. 6, 1287-1296 (2000). 
7. P. Hardman, W. Clarkson, G. Friel, M. Pollnau, D. Hanna : “energy-transfer upconversion 
and thermal lensing in high-power end-pumped Nd:YLF laser crystals”, IEEE J. Quant. 
Elec. , Vol. 35, No. 4, pp. 647-655 (1999). 
8. T.Y. Fan : “Heat generation in Nd:YAG and Yb:YAG”, IEEE J. Quant. Elec., Vol. 29, No. 6, 
pp. 1457-1459 (1993). 
9. D. Sumida, A. Betin, H. Bruesselbach, R. Byren, S. Matthews, R. Reeder, M. Mangir : 
“Diode-pumped Yb:YAG catches up with Nd:YAG”, Laser Focus World, Juin 1999, pp. 
63-70. 
 102 /115  
10. L. Johnson, J. Geusic, L. Van Uitert : “Coherent oscillations from Tm3+, Ho3+, Yb3+ and Er3+ 
ions in yttrium aluminium garnet”, Appl. Phys. Lett., Vol. 7, No. 5, pp. 127-129 (1965). 
11. R. Wynne, J. Daneu, T. Y. Fan : “thermal coefficients of the expansion and refractive index in 
YAG”, Appl. Opt., vol. 38, no. 15, pp. 3282-3284 (1999). 
12. P. Lacovara, H. Choi, C. Wang, R. Aggrawal, T. Fan : « room-temperature diode-pumped 
Yb:YAG laser », Opt. Lett. , vol. 16  No. 14, pp.1089-1091 (1991). 
13. G.A. Slack and D.W. Oliver : “Thermal conductivity of garnets and phonon scattering by 
rare-earth ions”, Phys. Rev. B, 4, 592-609 (1971). 
14. F. Krausz, M.E. Fermann, T. Brabec, P. F. Curley, M. Hofer, M. H. Ober, C. Spielmann, E. 
Wintner, A. J. Schmidt, « Femtosecond Solid-State Lasers»‚IEEE   J. Quant. Elec. 28, 
2097-2121 (1992). 
15. U. Keller, «Ultrafast all-solid-state laser technology», Appl. Phys. B, 58, 347-364 (1994). 
16. J. Aus der Au, D. Kopf, F. Morier-Genoud, M. Moser, U. Keller  « 60-fs pulses from a diode-
pumped Ndglass laser » Opt. Lett. 22 307 (1997). 
17. F. Brunner, T. Sdmeyer, E. Innerhofer, F. Morier-Genoud, R. Paschotta, V. E. Kisel, V. G. 
Shcherbitsky, N. V. Kuleshov, J. Gao, K. Contag, A. Giesen, U. Keller «240-fs pulses with 
22-W average power from a mode-locked thin-disk YbKY(WO4)2 laser», Opt. Lett. 27 1162 
(2002). 
18. A. Courjaud, R. Maleck-Rassoul, N. Deguil, C. Hönninger, and F. Salin, “Diode pumped 
multikilohertz femtosecond amplifier”, in OSA Trends in Optics and Photonics, Advanced 
Solid-State Lasers, 68, 121-123 (2002).  
19. F. Druon, G.J. Valentine, S. Chénais, P. Raybaut, F. Balembois, P. Georges, A. Brun, A.J. 
Kemp, W. Sibbett, S. Mohr, D. Kopf, D.J.L. Birkin, D. Burns, A. Courjaud, C. Hönninger, 
F. Salin, R. Gaumé, A. Aron, G. Aka, B. Viana, C. Clerc, H. Bernas, «Diode-pumped 
femtosecond oscillator using ultra-broad Yb-doped crystals and modelocked using low-
 103 /115  
temperature grown or ion implanted saturable-absorber mirrors», Euro. Phys. J. 20, 177-
182 (2003). 
20. F. Druon, F. Balembois, P. Georges, A. Brun, A. Courjaud, C. Honninger, F. Salin, A. Aron, 
F. Mougel, G. Aka, D. Vivien «Generation of 90-fs pulses from a mode-locked diode-
pumped Yb 3+: Ca4GdO(BO3)3 laser» Opt. Lett. 25, 423-425 (2000). 
21. F. Druon, S. Chénais, P. Raybaut, F. Balembois, P. Georges, R. Gaum, G. Aka, B. Viana, S. 
Mohr, D. Kopf «Diode-pumped Yb:Sr3Y(BO3)3 femtosecond laser»   Opt. Lett. 27, 197-199 
(2002). 
22. F. Druon, S. Chénais, P. Raybaut, F. Balembois, P. Georges, R. Gaumé, P. H. Haumesser, B. 
Viana, D. Vivien, S. Dhellemmes, V. Ortiz, C. Larat  «Apatite-structure crystal, 
Yb3+:SrY4(SiO4)3O, for the development of diode-pumped femtosecond lasers» , Opt. Lett. 
27, 1914-1916 (2002). 
23. F. Druon, F. Balembois, P. Georges « Laser crystals for the production of ultra-short laser 
pulses » Ann. Chim. Mat. 28, 47-72 (2003). 
24. C. Honninger , F. Morier-Genoud , M. Moser , U. Keller , L. R. Brovelli , and C. Harder , 
«Efficient and tunable diode-pumped femtosecond Yb:glass lasers» , Opt. Lett. 23, 126-128 
(1998). 
25. C. Honninger, R. Paschotta, M. Graf, F. Morier-Genoud, G. Zhang, M. Moser, S. Biswal, J. 
Nees, A. Braun, G. Mourou, I. Johannsen, A. Giesen, W. Seeber, and U. Keller, «Ultrafast 
ytterbium-doped bulk laser amplifiers» Appl. Phys. B, 69, 3 (1999). 
26. E. Innerhofer, T. Sdmeyer, F. Brunner, R. Hring, A. Aschwanden, R. Paschotta, C. Honninger, 
M. Kumkar, U. Keller «60-W average power in 810-fs pulses from a thin-disk YbYAG 
laser»   Opt. Lett. 28, 367-369 (2003). 
27. Junji Kawanaka, Koichi Yamakawa, Hajime Nishioka, Ken-ichi Ueda «30-mJ, diode-pumped, 
chirped-pulse YbYLF regenerative amplifier »  Opt. Lett. 28, 2121-2123 (2003). 
 104 /115  
28. Peter Klopp, Valentin Petrov, Uwe Griebner, Klaus Petermann, Volker Peters, Götz Erbert « 
Highly efficient mode-locked Yb:Sc2O3 laser  »   Opt. Lett. 29, 391-393 (2004). 
29. P. Klopp,  V. Petrov,  U. Griebner, and G. Erbert, "Passively mode-locked Yb:KYW laser 
pumped by a tapered diode laser," Opt. Express 10, 108-113 (2002). 
http://www.opticsexpress.org/abstract.cfm?URI=OPEX-10-2-108 
30. Peter Klopp, Valentin Petrov, Uwe Griebner, Klaus Petermann, Volker Peters, Götz Erbert « 
Highly efficient mode-locked Yb:Sc2O3 laser»  Opt. Lett. 29, 391-393 (2004).  
31. H. Liu, S. Biswal, J. Paye, J. Nees, G. Mourou, C. Hnninger, U. Keller « Directly diode-
pumped millijoule subpicosecond Ybglass regenerative amplifier » ,  Opt. Lett. 24, 917-919 
(1999).    
32. H. Liu, J. Nees, G. Mourou « Diode-pumped Kerr-lens mode-locked Yb:KY(WO4)2 laser»   
Opt. Lett. 26,  1723-1725 (2001). 
33. Hsiao-Hua Liu, John Nees, Grard Mourou, « Directly diode-pumped Yb:KY(WO4).2 
regenerative amplifiers »  Opt. Lett. 27, 722-724 (2002). 
34. P. Raybaut, F. Druon, F. Balembois, P. Georges, R. Gaumé, G. Aka, B. Viana «Directly 
diode-pumped oscillators and regenerative amplifiers for ultra-short pulse generation »   
Invited paper CThFF5 CLEO 2004. 
35. Pierre Raybaut, Frederic Druon, Francois Balembois, Patrick Georges, Romain Gaumé, 
Bruno Viana, Daniel Vivien « Directly diode-pumped Yb3+:SrY4(SiO4)3O regenerative 
amplifier »  Opt. Lett. 28, 2195-2196 (2003). 
36. A. Shirakawa,  K. Takaichi,  H. Yagi,  J. -. Bisson,  J. Lu,  M. Musha,  K. Ueda,  T. 
Yanagitani,  T. S. Petrov, and A. A. Kaminskii, "Diode-pumped mode-locked Yb3+:Y2O3 
ceramic laser," Opt. Express 11, 2911-2916 (2003).  
 105 /115  
37. L. DeLoach, S. Payne, L. Chase, L. Smith, W. Kway, W. Krupke : “Evaluation of absorption 
and emission properties of Yb3+-doped crystals for laser applications”, IEEE J. Quant. Elec., 
vol. 29, no. 4, pp. 1179-1091 (1984). 
38. S.A. Payne, L.D. DeLoach, L.K. Smith W.L. Kway, J.B. Tassano, W.F. Krupke, B.H.T. Chai, 
G. Louts : “Ytterbium-doped apatite-structure crystals : a new class of laser materials ”, J. 
Appl. Phys 76, 497-503 (1994) 
39. S. Payne, L. Smith, L. DeLoach, W. Kway, J. Tassano, W. Krupke : “Laser, optical, and 
thermomechanical properties of Yb-doped fluorapatite”, IEEE. J. Quant. Elec. , vol. 30, 
No.1, pp. 170-179 (1994). 
40. F. Mougel, G. Aka, A. Kahn-Harari, H. Hubert, J.M. Benitez, D. Vivien : “Infrared laser 
performance and self-frequency doubling of Nd3+:Ca4GdO(BO3)3 (Nd:GdCOB)”, Opt. Mat. 
8, pp.161-173 (1997). 
41. F. Druon, F. Augé, F. Balembois, P. Georges, A. Brun, A. Aron, F. Mougel, G. Aka, D. 
Vivien : « Efficient, tunable, zero-line diode-pumped, continuous-wave 
Yb3+ :Ca4LnO(BO3)3 (Ln = Gd, Y) lasers at room temperature and application to miniature 
lasers », J. Opt. Soc. Am. B, Vol. 17, No. 1 (2000). 
42. P.-H. Haumesser, R. Gaumé, B. Viana, D. Vivien : “Determination of laser parameters of 
ytterbium-doped oxide crystalline materials”, J. Opt. Soc. Am. B, Vol. 19, No. 10, pp. 2365-
2375 (2002). 
43. R. Gaumé, P.-H. Haumesser, B. Viana, G. Aka, D. Vivien, E. Scheer, P. Bourdon, B. Ferrand, 
M. Jacquet, N. Lenain : « Spectroscopic properties and laser performance of Yb3+ :Y2SiO5, 
a new infrared laser material », OSA TOPS vol. 34, Advanced Solid State Lasers, p. 469, 
2000. 
 106 /115  
44. S. Chénais, F. Druon, F. Balembois, P. Georges, R. Gaumé, B. Viana, D. Vivien, A. Brenier, 
G. Boulon : « Diode-pumped Yb:GGG laser : comparison with Yb:YAG », Optical 
Materials 22 pp.99-106 (2003). 
45. N. V. Kuleshov, A. A. Lagatsky, V. G. Shcherbitsky, V. P. Mikhailov, E. Heumann, T. Jensen, 
A. Diening, G. Huber : “CW laser performance of Yb and Er, Yb doped tungstates”,  Appl. 
Phys. B  64, pp.409-413 (1997). 
46. A.A. Lagatsky, N.V. Kuleshov, V.P. Mikhailov: « diode-pumped CW lasing of Yb:KYW and 
Yb:KGW », Opt. Comm. 165, pp.71-75 (1999) 
47. A. Lucca, M. Jacquemet, F. Druon, F. Balembois, P. Georges, P. Camy,, J.L. Doualan, R. 
Moncorgé « high power diode-pumped CW laser operation of Yb :CaF2 » post-deadline 
CPD-A1 CLEO (2004).   
48. S. Jiang, M. Myers, D. Rhonehouse, U. Griebner, R. Koch, H. Schonnagel : “Ytterbium 
doped phosphate laser glasses”, Proceedings of SPIE “Solid State Lasers VI”, Vol. 2986 
(1997). 
49. J.F. Nye : physical properties of crystals, Clarendon Press, Oxford, 1985. 
50. D. Brown : « Ultrahigh-average power diode-pumped Nd :YAG and Yb:YAG lasers », IEEE 
J. Quant. Elec., Vol. 33, No. 5, pp. 861-873 (1997). 
51. W. Koechner : « Absorbed Pump Power, Thermal Profile and Stresses in a cw Pumped 
Nd :YAG Crystal », Appl. Opt., Vol. 9, No. 6, June 1970, 1429-1434 
52. M. Schmid, T. Graf, and H. P. Weber, “Analytical model of the temperature distribution and 
the thermally induced birefringence in laser rods with cylindrically symmetric heating,” J. 
Opt. Soc. Amer. B, vol. V17, no. 8, pp. 1398–1404, 2000. 
53. U. Farrukh, A. Buoncristiani, C. Byvik : « an analysis of the temperature distribution in finite 
solid-state laser rods », IEEE J. Quant. Elec., vol. 34, no. 11, pp. 2253-2263 (1988) 
 107 /115  
54. M. Innocenzi, H. Yura, C. Fincher, R. Fields : « Thermal modeling of continuous-wave end-
pumped solid-state lasers », Appl. Phys. Lett. 56 (19), 7 may 1990, pp. 1831-1833 
55. Y. Chen, T. Huang, C. Kao, C. Wang, S. Wang : « optimization in scaling fiber-coupled laser-
diode end-pumped lasers to higher power : influence on thermal effect », IEEE J. Quant. 
Elec., Vol. 33, No. 8, pp. 1424-1429 (1997) 
56. F. Sanchez, M. Brunel, K Aït-Ameur : « Pump-saturation effects in end-pumped solid-state 
lasers », J. Opt. Soc. Am. B, Vol. 15, No. 9, pp. 2390-2394 (1998). 
57. F. Augé, F. Druon, F. Balembois, P. Georges, A. Brun, F. Mougel, G. P. Aka, and D. Vivien, 
“Theoretical and experimental investigations of a diode-pumped quasi-three-level laser: 
The Yb –doped Ca4GdO(BO3)3 (Yb:GdCOB) laser,” IEEE J. Quantum Electron., vol. 36, 
pp. 598–606, May 2000. 
58. H.S. Carslaw, J.C. Jaeger : conduction of heat in solids, 2nd edition, Clarendon Press, Oxford, 
1986. 
59. Jacobs and Starr, Rev. Sci. Instr. 10, 140 (1941). 
60. A. Cousins: « temperature and thermal stress scaling in finite-length end-pumped laser rods» , 
IEEE J. Quant. Elec. , vol. 28, no. 4, pp.1057-1069 (1992). 
61. J. Marion: “strengthened solid-state laser materials”, Appl. Phys. Lett. 47 (7), pp. 94-96 
(1985). 
62. J. Marion: “Fracture of solid-state laser slabs”, J. Appl. Phys. 60 (1), pp. 69-77 (1986).  
63. J. Marion: “appropriate use of the strength parameter in solid-state slab laser design”, J. Appl. 
Phys. 62 (5), pp. 1595-1604 (1987). 
64. S. Ferré: « Caractérisation expérimentale et simulation des effets thermiques d’une chaîne 
laser ultra-intense à base de saphir dopé au titane »,  thèse de doctorat de l’école 
polytechnique, 2002. 
 108 /115  
65. H. Glur, R. Lavi, T. Graf: “Reduction of thermally induced lenses in Nd:YAG rods with low 
temperatures, IEEE J. of Quant. Elec., Vol. 40, No. 5, 499-504 (2004) 
66. D. Kopf, K. Weingarten, G. Zhang, M. Moser, M. Emanuel, R. Beach, J. Skidmore, U. 
Keller :“ High average power diode-pumped femtosecond Cr:LiSAF lasers”, Appl. Phys. B 
65, pp. 235-243 (1997). 
67. M. Tsunekane, N. Taguchi, H. Inaba : “ reduction of thermal effects in a diode-end-pumped, 
composite Nd:YAG rod with a sapphire end”, Applied Optics, vol. 37, no. 15, pp. 3290-
3294 (1998). 
68. A. Giesen, H. Hügel, A. Voss, K. Wittig, U. Brauch, H. Opower :“scalable concept for diode-
pumped hgh-power solid-state lasers“, Appl. Phys. B 58, 365-372 (1994). 
69.  W. Koechner : Solid State laser engineering, 5th version, Springer, 1999. 
70.  W.A. Clarkson : « thermal effects and their mitigation in end-pumped solid-state lasers », J. 
Phys. D : Appl. Phys. 34 pp. 2381-2395 (2001) 
71.  Handbook of Optics, 2nd ed., vol. II (devices, measurements and properties), edited by M. 
Bass, E. Stryland, D. Williams, W. Wolfe, McGRAW-HILL, Inc., 2001. 
72.  B. Woods, S. Payne, J. Marion, R. Hugues, L. Davis : “thermomechanical and thermo-
optical properties of the LiCaAlF6:Cr3+ laser material”, J. Opt. Soc. Am. B Vol. 8, No. 5, pp. 970-
977 (1991).  
73. S. Chenais, F. Balembois, F. Druon, G. Lucas-leclin, P. Georges, “thermal lensing in 
Diode-pumped Ytterbium Lasers – Part I : Theoretical analysis and wavefront measurements”, 
IEEE J. Quant. Elec. Vol 40, No. 9, 1217-1234, (2004) 
74. A.A. Lagatsky, N.V. Kuleshov, V.P. Mikhailov: « diode-pumped CW lasing of Yb:KYW 
and Yb:KGW », Opt. Comm. 165, pp.71-75 (1999) 
 109 /115  
75. F. Augé, F. Druon, F. Balembois, P. Georges, A. Brun, F. Mougel, G. P. Aka, D. Vivien : 
« Theoretical and experimental investigations of a diode-pumped quasi-three-level laser : the 
Yb3+-doped Ca4GdO(BO3)3 (Yb:GdCOB) laser » , IEEE J. Quant. Elec., Vol. 36, No.5, pp. 
598-606 (2000). 
76. A. Aron, G. Aka, B. Viana, A. Kahn-Harari, D. Vivien, F. Druon, F. Balembois, P. Georges, A. 
Brun, N. Lenain, M. Jacquet : “Spectroscopic properties and laser performances of Yb:YCOB 
and potential of the Yb:LaCOB material”, Opt. Mat. 16, pp.181-188 (2001). 
77. W. Koechner, D.K. Rice : “effect of birefringence on the performance of linearly polarized 
YAG:Nd lasers,” IEEE J. Quant. Elec. , vol. 6, pp.557-566 (1990).  
78. Scott W.C., De Wit M., Appl Phys Lett. vol. 18, pp.3-4 1971 
79. Lu Q., Kugler N., Weber H., Dong S., Muller N. and Wittrock U., Opt Quant Electron, vol 28, 
pp. 57-69 (1996) 
80. R. Fluck, M. Hermann, L. Hackel : “energetic and thermal performance of high-gain diode-side 
pumped Nd:YAG rods”, Appl. Phys. B 70, pp. 491-498 (2000). 
81. S De Nicola, A Finizio, G Pierattini and G Carbonara “Interferometric method for concurrent 
measurement of the thermo-optic coefficients of quartz retarders” Pure Appl. Opt. 3 209-213 
(1994)  
82. Y-F. Tsay, B. Bendow, S. Mitra : “theory of the temperature derivative of the refractive index 
in transparent crystals”, Phys. Rev. B., Vol. 8, No. 6, pp. 2688-2696 (1973).  
83. J. Eichenholz, M. Richardson, : “measurement of thermal lensing in Cr3+-doped colquirites”, 
IEEE J. Quant. Elec., vol. 34, no. 5, pp. 910-919 (1988)  
84. C. Pfistner, R. Weber, H. Weber, S. Merazzi, R. Gruber : “ Thermal beam distortions in end-
pumped Nd:YAG, Nd:GSGG, and Nd:YLF Rods ”, IEEE. J. Quant. Elec. vol. 30, no. 7, pp. 
1605-1615 (1994). 
 110 /115  
85. T. Baer, W. Nighan, M. Keierstead : “modeling of end-pumped, solid-state lasers”, in 
Conference on Lasers and Electro-Optics, Vol. 11 of 1993 OSA Technical Digest Series 
(Optical society of America, Washington D.C., 1993), p. 638. 
86. K. Kleine, L. Gonzalez, R. Bhatia, L. Marshall,, D. Matthews : “High brightness Nd:YVO4 
laser for nonlinear optics”, Advanced Solid State Lasers, M. Fejer, H. Injeyan and U. Keller 
eds., Vol. 26 of OSA Trends in Optics and Photonics Series (Optical Society of America, 
Washington D.C., 1999), pp. 157-158. 
87. R. Weber, B. Neuenschwender, M. Macdonald, M.B. Roos, H.P. Weber : “Cooling schemes for 
longitudinally diode-laser pumped Nd:YAG rods”, IEEE J. Quant. Elec. 34, pp. 1046-1053 
(1998). 
88. X. Peng, A. Asundi, Y. Chen, Z. Xiong : “study of the mechanical properties of Nd:YVO4 
crystal by use of laser interferometry and finite-element analysis”, Appl. Opt., Vol. 40, No. 9, 
pp. 1396-1403 (2001). 
89. A. Brignon, J.-P. Huignard, M. H. Garrett, and I. Mnushkina, “Spatial beam cleanup of a 
Nd:YAG laser operating at 1.06 µm with two-wave mixing in Rh:BaTiO3,” Appl. Opt. 36, 
7788-7793 (1997). 
90. W. Xie, S. Tam, Y. Lam, Y. Kwon : “Diffraction losses of high power solid state lasers”, Opt. 
Comm. 189, pp. 337-343 (2001). 
91. J. Frauchiger, P. Albers, H. Weber : "modeling of thermal lensing and higher order ring mode 
oscillation in end-pumped CW Nd:YAG lasers“, IEEE J. Quant. Elec. , vol. 28, no. 4, pp. 
1046-1056 (1992). 
92. F. Druon, G. Chériaux, J. Faure, G. Vdovin, J.C. Chanteloup, J. Nees, M. Nantel, A. 
Maksimchuk, G. Mourou, "Wave-front correction of femtosecond terawatt lasers using 
deformable mirrors", Optics Letters, Vol. 23 No 1, pp. 1043-45, 1998 
 111 /115  
93. J. Gordon, R. Leite, R. Moore, S. Porto, J. Whinnery : “long-transient effects in lasers with 
inserted liquid samples”, J. Appl. Phys. Vol. 36, no. 1, pp. 3-8 (1965). 
94. D. Fournier, A.C. Boccara, N. Amer, R. Gerlach : « sensitive in situ trace-gas detection by 
photothermal deflection », Appl. Phys. Lett. 37 (6), pp. 519-521 (1980). 
95. S. Bialkowski : “Photothermal Spectroscopy Methods for Chemical Analysis” Volume 134 
Chemical Analysis: A Series of Monographs on Analytical Chemistry and Its Applications, , J. 
D. Winefordner, Series Editor, John Wiley & Sons, Inc. 1996 
96. D. Burnham : “simple measurement of thermal lensing effects in laser rods”, Appl. Opt. vol. 9, 
no. 7, pp.1727-1728 (1970). 
97. C. Hu, J.R. Whinnery : “New thermooptical measurement method and a comparison with other 
methods”, Appl. Opt. Vol. 12, No. 1, pp. 72-79 (1973). 
98. R. Paugstadt, M. Bas : “method for temporally and spatially resolved thermal-lensing 
measurements”, Appl. Opt. , vol. 33, no. 6, pp. 954-959 (1994). 
99. R. Misra, P. Banerjee : “theoretical and experimental studies of pump-induced probe deflection 
in a thermal medium”, Appl. Opt. , vol. 34, no. 18, pp.3358-3365 (1995). 
100. H. Kogelnik, T. Li : “laser beams and resonators”, Appl. Opt. vol. 5, pp. 1550 (octobre 
1966). 
101. B. Neuenschwander, R. Weber, H. Weber : “determination of the thermal lens in solid-state 
lasers with stable cavities”, IEEE. J. Quant. Elec. , vol. 31, no. 6, pp. 1082-1087 (1995). 
102. J. Frauchiger, P. Albers, H. Weber : "modeling of thermal lensing and higher order ring 
mode oscillation in end-pumped CW Nd:YAG lasers“, IEEE J. Quant. Elec. , vol. 28, no. 4, pp. 
1046-1056 (1992). 
103. B. Ozygus, J. Erhard : “thermal lens determination of end-pumped solid-state lasers with 
transverse beat frequencies”, Appl. Phys. Lett. 67 (10), pp. 1361-1362 (1995). 
 112 /115  
104. B. Ozygus, Q. Zhang : “thermal lens determination of end-pumped solid-state lasers using 
primary degeneration modes”, Appl. Phys. Lett. 71 (18), pp. 2590-2592 (1997). 
105. A. Cabezas, L. Komai, R. Treat : “dynamic measurements of phae shifts in laser amplifiers”, 
Appl. Opt., Vol. 5, No. 4, pp. 647-651 (1966). 
106. H. Welling, C. Bickart : “spatial and temporal variation of the optical path length in flash-
pumped laser rods”, J. Opt. Soc. Am. Vol. 56, No. 5, pp. 611-618 (1966). 
107. C. Pfistner, R. Weber, H. Weber, S. Merazzi, R. Gruber : “ Thermal beam distortions in 
end-pumped Nd:YAG, Nd:GSGG, and Nd:YLF Rods ”, IEEE. J. Quant. Elec. vol. 30, no. 7, pp. 
1605-1615 (1994). 
108. A. Khizhnyak, G. Galich, M. Lopiitchouk : “characteristics of thermal lens induced in 
active rod of cw Nd:YAG laser”, Semiconductor Physics ; Quantum Electronics & 
Optoelectronics (SQO), Vol. 2, No. 1, pp. 147-152 (1999). 
109. J. Blows, J. Dawes, T. Omatsu : “thermal lensing measurements in line-focus end-pumped 
neodymium yttrium aluminium garnet using holographic lateral shearing interferometry”, J. 
Appl. Phys. , Vol. 83, No. 6, pp. 2901-2906 (1998). 
110. J.L. Blows, P. Dekker, P. Wang , J.M. Dawes , T. Omatsu, "Thermal lensing  
measurements and thermal conductivity of Yb:YAB" Appl. Phys. B 76, 289-292  
(2003) 
111. J. Primot : “three-wave lateral shearing interferometer”, Appl. Opt. , Vol 32, No. 31, 
pp.6242-6249 (1993). J. Primot : “three-wave lateral shearing interferometer”, Appl. Opt. , Vol 
32, No. 31, pp.6242-6249 (1993). 
112. J-C. Chanteloup: « Multiple-wave lateral shearing interferometry for wave-front sensing », 
Opt. Lett., Vol. 23, No. 8, pp. 621-623 (1998) 
 113 /115  
113. J-C. Chanteloup, F. Druon, M. Nantel, A. Maksimchuk, G. Mourou : « single-shot wave-
front measurements of high-intensity ultrashort laser pulses with a three-wave interferometer », 
Applied Optics, Vol. 44, No. 9, pp. 1559-1571 (2005) 
114. L. Grossard, A. Desfarges-Berthelemot, B. Colombeau, C. Froehly : “iterative 
reconstruction of thermally induced phase distortion in a Nd3+:YVO4 laser”, J. Opt. A : Pure 
Appl. Opt. 4 pp. 1-7 (2002). 
115. J. Hartmann : “Bemerkungen uber den Bau und die Justirung von Spectrographen”, Zeitung 
Instrumentenkd., 20 p.47 (1900). 
116. D. Armstrong, J. Mansell, D. Neal : “how to avoid beam distortion in solid-state laser 
design”, Laser Focus World, décembre 1998, pp. 129-132. 
117. S. Ito, H. Nagaoka, T. Miura, K. Kobayashi, A. Endo, K. Torizuka : “ measurement of 
thermal lensing in a power amplifier of a terawatt Ti:sapphire laser”, Appl. Phys. B 74, pp. 343-
347 (2002). 
118. M. Pittman, S. Ferré, J.P. Rouseau, L. Notebaert, J.P. Chambaret, G. Chériaux : « design 
and characterization of a near-difraction-limited femtosecond 100-TW 10-Hz high-intensity 
laser system », Appl. Phys. B 74, pp. 529-535 (2002). 
119. I. Shoji, T. Taira : “Drastic reduction of depolarization resulting from thermally induced 
birefringence by use of a (110)-cut YAG crystal”, OSA TOPS Advanced Solid State Lasers Vol. 
68, pp.521-525, 2002. 
120. A. Nashimura, K. Akaoka, A. Ohzu, T. Usami, “Temporal changes of thermal lens effects 
on highly-pumped Ytterbium glass by wavefront measurements”, Journal of Nuclear Science 
and Technology, Vol. 38, No. 12, p. 1043-1047 (2001) 
121. S. Chénais, F. Balembois, F . Druon, G. Lucas-Leclin, P. Georges, "Thermal Lensing in 
Diode-Pumped Ytterbium Lasers  - Part II: evaluation of quantum efficiencies and thermo-optic 
coefficients." IEEE J. Quantum Electronics Vol. 40 No 9 September, 1235-1243, 2004 
 114 /115  
122. D. F. de Sousa, N. Martynyuk, V. Peters, K. Lunstedt, K. Rademaker, K. Petermann, and S. 
Basun, “Quenching behavior of highly doped Yb:YAG and YbAG,” in Conf. Lasers Electro-
Optics Europe, Tech. Dig., Conf. Ed., 2003, CG1-3. 
123. H. Yin, P. Deng, and F. Gan, “Defects in Yb:YAG,” J. Appl. Phys., vol. 83, no. 8, pp. 
3825–3828, 1998. 
124. P. Yang, P. Deng, and Z. Yin, “Concentration quenching in Yb:YAG,” J. Lumin., pp. 51–54, 
125. M. O. Ramirez, D. Jaque, L. E. Bausa, J. A. S. Garcia, and J. G. Solé, “Thermal loading in 
highly efficient diode pumped ytterbium doped lithium niobate lasers,” presented at the Conf. 
Lasers and Electro-Optics Europe 2003 (CLEO Europe), Münich, Germany, 2003. 
126. N. Barnes, B. Walsh : “Quantum efficiency measurements of Nd:YAG, Yb:YAG, and 
Tm:YAG”, OSA TOPS Advanced Solid State Lasers Vol. 68, pp.284-287, 2002. 
127. F. Patel, E. Honea, J. Speth, S. Payne, R. Hutcheson, R. Equall : “Laser demonstration of 
Yb3Al5O12 (YbAG) and materials properties of highly doped Yb:YAG”, IEEE. J. Quant. Elec. , 
vol. 37, no. 1, pp. 135-144 (2001). 
128.  Jacquemet, M.; Jacquemet, C.; Janel, N.; Druon, F.; Balembois, F.; Georges, P.; Petit, J.; 
Viana, B.; Vivien, D.; Ferrand, B. “Efficient laser action of Yb:LSO and Yb:YSO 
oxyorthosilicates crystals under high-power diode-pumping”, Applied Physics B, Volume 80, 
Issue 2, pp.171-176 (2005) 
129. R. Gaumé, “Relations structure-propriétés dans les lasers solides de puissance à l’ytterbium. 
élaboration et caractérization de nouveaux matériaux et de cristaux composites soudés par 
diffusion,” Ph.D. dissertation, Pierre et Marie Curie Univ., Paris, VI, France, 2002 
130. R. Epstein et. al : « Observation of laser-induced fluorescent cooling of a solid », Nature, 
vol. 377, pp.500-503, 12 oct. 1995. 
 115 /115  
131. S. Bowman, N. Jenkins, B. Feldman, S. O’Connor: “Demonstration of a radiatively cooled 
laser”, in Proceedings of Conference on Lasers and Electro-Optics (CLEO 2002), Long Beach, 
CA, June 2002, p. 180. 
132. S. Bowman : “ lasers without internal heat generation”, IEEE J. Quant. Elec., vol. 35, no. 1, 
115-122 (1999)
ABSTRACT
  A review of theoretical and experimental studies of thermal effects in
solid-state lasers is presented, with a special focus on diode-pumped
ytterbium-doped materials. A large part of this review provides however general
information applicable to any kind of solid-state laser. Our aim here is not to
make a list of the techniques that have been used to minimize thermal effects,
but instead to give an overview of the theoretical aspects underneath, and give
a state-of-the-art of the tools at the disposal of the laser scientist to
measure thermal effects. After a presentation of some general properties of
Yb-doped materials, we address the issue of evaluating the temperature map in
Yb-doped laser crystals, both theoretically and experimentally. This is the
first step before studying the complex problem of thermal lensing (part III).
We will focus on some newly discussed aspects, like the definition of the
thermo-optic coefficient: we will highlight some misleading interpretations of
thermal lensing experiments due to the use of the dn/dT parameter in a context
where it is not relevant. Part IV will be devoted to a state-of-the-art of
experimental techniques used to measure thermal lensing. Eventually, in part V,
we will give some concrete examples in Yb-doped materials, where their
peculiarities will be pointed out.

<|endoftext|><|startoftext|>
Introduction
	Numerical simulations
	Field scattered by a dipole along the interface
	Analytical model
	Summary
	Acknowledgments
	APPENDIX
	References
ABSTRACT
  We present a numerical study and analytical model of the optical near-field
diffracted in the vicinity of subwavelength grooves milled in silver surfaces.
The Green's tensor approach permits computation of the phase and amplitude
dependence of the diffracted wave as a function of the groove geometry. It is
shown that the field diffracted along the interface by the groove is equivalent
to replacing the groove by an oscillating dipolar line source. An analytic
expression is derived from the Green's function formalism, that reproduces well
the asymptotic surface plasmon polariton (SPP) wave as well as the transient
surface wave in the near-zone close to the groove. The agreement between this
model and the full simulation is very good, showing that the transient
"near-zone" regime does not depend on the precise shape of the groove. Finally,
it is shown that a composite diffractive evanescent wave model that includes
the asymptotic SPP can describe the wavelength evolution in this transient
near-zone. Such a semi-analytical model may be useful for the design and
optimization of more elaborate photonic circuits whose behavior in large part
will be controlled by surface waves.

<|endoftext|><|startoftext|>
Introduction
(a) Equipped with the L2-Wasserstein distance dW (cf. (2.1)), the space P(M) of probability
measures on an Euclidean or Riemannian space M is itself a rich object of geometric interest.
Due to the fundamental works of Y. Brenier, R. McCann, F. Otto, C. Villani and many others
(see e.g. [Bre91, McC97, CEMS01, Ott01, OV00, Vil03]) there are well understood and pow-
erful concepts of geodesics, exponential maps, tangent spaces TµP(M) and gradients Du(µ) of
functions on this space. In a certain sense, P(M) can be regarded as an infinite dimensional
Riemannian manifold, or at least as an infinite dimensional Alexandrov space with nonnegative
lower curvature bound if the base manifold (M,d) has nonnegative sectional curvature.
A central role is played by the relative entropy : P(M) → R ∪ {+∞} with respect to the
Riemannian volume measure dx on M
Ent(µ) =
ρ log ρ dx, if dµ(x) ≪ dx with ρ(x) = dµ(x)
+∞, else.
The relative entropy as a function on the geodesic space (P(M), dW ) is K-convex for a given
number K ∈ R if and only if the Ricci curvature of the underlying manifold M is bounded from
below by K, [vRS05, Stu06]. The gradient flow for the relative entropy in the geodesic space
(P(M), dW ) is given by the heat equation ∂∂tµ = ∆µ on M , [JKO98]. More generally, a large
class of evolution equations can be treated as gradient flows for suitable free energy functionals
S : P(M) → R, [Vil03].
What is missing until now, is a natural ’Riemannian volume measure’ P on P(M). The basic re-
quirement will be an integration by parts formula for the gradient. This will imply the closability
of the pre-Dirichlet form
E(u, v) =
〈Du(µ),Dv(µ)〉Tµ dP(µ)
in L2(P(M),P), – which in turn will be the key tool in order to develop an analytic and stochastic
calculus on P(M). In particular, it will allow us to construct a kind of Laplacian and a kind
of Brownian motion on P(M). Among others, we intend to use the powerful machinery of
Dirichlet forms to study stochastically perturbed gradient flows on P(M) which – on the level
http://arxiv.org/abs/0704.0704v1
of the underlying spaces M – will lead to a new concept of SPDEs (preserving probability by
construction).
Instead of constructing a ’uniform distribution’ P on P(M), for various reasons, we prefer to
construct a probability measure Pβ on P(M) formally given as
dPβ(µ) =
e−β·Ent(µ) dP(µ) (1.1)
for β > 0 and some normalization constant Zβ. (In the language of statistical mechanics, β is
the ’inverse temperature’ and Zβ the ’partition function’ whereas the entropy plays the role of
a Hamiltonian.)
(b) One of the basic results of this paper is the rigorous construction of such a entropic measure
Pβ in the one-dimensional case, i.e. M = S1 or M = [0, 1]. We will essentially make use of the
representation of probability measures by their inverse distributions function gµ. It allows to
transfer the problem of constructing a measure Pβ on the space of probability measures P([0, 1])
(or P(S1)) into the problem of constructing a measure Qβ0 (or Qβ) on the space G0 (or G, resp.)
of nondecreasing functions from [0, 1] (or S1, resp.) into itself.
In terms of the measure Q
0 on G0, for instance, the formal characterization (1.1) then reads as
follows
0 (g) =
e−β·S(g) dQ0(g). (1.2)
Here Q0 denotes some ’uniform distribution’ on G0 ⊂ L2([0, 1]) and S : G0 → [0,∞] is the
entropy functional
S(g) := Ent(g∗Leb) = −
log g′(t) dt.
This representation is reminiscent of Feynman’s heuristic picture of the Wiener measure, — now
with the energy
H(g) =
g′(t)2dt
of a path replaced by its entropy. Q
0 will turn out to be (the law of) the Dirichlet process or
normalized Gamma process.
(c) The key result here is the quasi-invariance – or in other words a change of variable formula –
for the measure Pβ (or P
0 ) under push-forwards µ 7→ h∗µ by means of smooth diffeomorphisms
h of S1 (or [0, 1], resp.). This is equivalent to the quasi-invariance of the measure Qβ under
translations g 7→ h ◦ g of the semigroup G by smooth h ∈ G. The density
dPβ(h∗µ)
dPβ(µ)
· Y 0h (µ)
consists of two terms. The first one
h (µ) = exp
log h′(t)dµ(t)
can be interpreted as exp(−βEnt(h∗µ))/ exp(−βEnt(µ)) in accordance with our formal inter-
pretation (1.1). The second one
Y 0h (µ) =
I∈gaps(µ)
h′(I−) · h′(I+)
|h(I)|/|I|
can be interpreted as the change of variable formula for the (non-existing) measure P. Here
gaps(µ) denotes the set of intervals I =]I−, I+[⊂ S1 of maximal length with µ(I) = 0. Note that
Pβ is concentrated on the set of µ which have no atoms and not absolutely continuous parts and
whose supports have Lebesgue measure 0.
(d) The tangent space at a given point µ in P = P(S1) (or in P0 = P([0, 1])) will be an
appropriate completion of the space C∞(S1,R) (or C∞([0, 1],R), resp.). The action of a tangent
vector ϕ on µ (’exponential map’) is given by the push forward ϕ∗µ. This leads to the notion
of the directional derivative
Dϕu(µ) = lim
[u((Id+ tϕ)∗µ)− u(µ)]
for functions u : P → R. The quasi-invariance of the measure Pβ implies an integration by parts
formula (and thus the closability)
D∗ϕu = −Dϕu− Vϕ · u
with drift Vϕ = limt→0
Id+tϕ
− 1).
The subsequent construction will strongly depend on the choice of the norm on the tangent
spaces TµP. Basically, we will encounter two important cases.
(e) Choosing TµP = Hs(S1,Leb) for some s > 1/2 — independent of µ — leads to a regular,
local, recurrent Dirichlet form E on L2(P,Pβ) by
E(u, u) =
|Dϕku(µ)|2 dPβ(µ).
where {ϕk}k∈N denotes some complete orthonormal system in the Sobolev space Hs(S1). Ac-
cording to the theory of Dirichlet forms on locally compact spaces [FOT94], this form is associ-
ated with a continuous Markov process on P(S1) which is reversible with respect to the measure
Pβ. Its generator is given by
DϕkDϕk +
Vϕk ·Dϕk . (1.3)
This process (gt)t≥0 is closely related to the stochastic processes on the diffeomorphism group
of S1 and to the ’Brownian motion’ on the homeomorphism group of S1, studied by Airault,
Fang, Malliavin, Ren, Thalmaier and others [AMT04, AM06, AR02, Fan02, Fan04, Mal99].
These are processes with generator 1
kDϕkDϕk . Hence, one advantage of our approach is to
identify a probability measure Pβ such that these processes — after adding a suitable drift —
are reversible.
Moreover, previous approaches are restricted to s ≥ 3/2 whereas our construction applies to all
cases s > 1/2.
(f) Choosing TµG = L2([0, 1], µ) leads to the Wasserstein Dirichlet form
E(u, u) =
‖Du(µ)‖2L2(µ) dP
0 (µ)
on L2(P0,Pβ0 ). Its square field operator is the squared norm of the Wasserstein gradient and its
intrinsic distance (which governs the short time asymptotic of the process) coincides with the
L2-Wasserstein metric. The associated continuous Markov process (µt)t≥0 on P([0, 1]), which we
shall call Wasserstein diffusion, is reversible w.r.t. the entropic measure P
0 . It can be regarded
as a stochastic perturbation of the Neumann heat flow on P([0, 1]) with small time Gaussian
behaviour measured in terms of kinetic energy.
2 Spaces of Probability Measures and Monotone Maps
The goal of this paper is to study stochastic dynamics on spaces P(M) in case M is the unit
interval [0, 1] or the unit circle S1.
2.1 The Spaces P0 = P([0, 1]) and G0
Let us collect some basic facts for the space P0 = P([0, 1]) of probability measures on the unit
interval [0, 1] the proofs of which can be found in the monograph [Vil03]. Equipped with the
L2-Wasserstein distance dW , it is a compact metric space. Recall that
dW (µ, ν) := inf
[0,1]2
|x− y|2γ(dx, dy)
, (2.1)
where the infimum is taken over all probability measures γ ∈ P([0, 1]2) having marginals µ and
ν (i.e. γ(A×M) = µ(A) and γ(M ×B) = ν(B) for all A,B ⊂M).
Let G0 denote the space of all right continuous nondecreasing maps g : [0, 1[→ [0, 1] equipped
with the L2-distance
‖g1 − g2‖L2 =
|g1(t)− g2(t)|2dt
Moreover, for notational convenience each g ∈ G0 is extended to the full interval [0, 1] by g(1) :=
1. The map
χ : G0 → P0, g 7→ g∗Leb
(= push forward of the Lebesgue measure on [0, 1] under the map g) establishes an isometry
between (G0, ‖.‖L2) and (P0, dW ). The inverse map χ−1 : P0 → G0, µ 7→ gµ assigns to each
probability measure µ ∈ P0 its inverse distribution function defined by
gµ(t) := inf{s ∈ [0, 1] : µ[0, s] > t} (2.2)
with inf ∅ := 1. In particular, for all µ, ν ∈ P0
dW (µ, ν) = ‖gµ − gν‖L2 . (2.3)
For each g ∈ G0 the generalized inverse g−1 ∈ G0 is defined by g−1(t) = inf{s ≥ 0 : g(s) > t}.
Obviously,
‖g1 − g2‖L1 = ‖g−11 − g
2 ‖L1 (2.4)
(being simply the area between the graphs) and (g−1)−1 = g. Moreover, g−1(g(t)) = t for all
t provided g−1 is continuous. (Note that under the measure Q
0 to be constructed below the
latter will be satisfied for a.e. g ∈ G0.)
On G0, there exist various canonical topologies: the L2-topology of G0 regarded as subset of
L2([0, 1],R); the image of the weak topology on P0 under the map χ−1 : µ 7→ gµ (= inverse
distribution function); the image of the weak topology on P0 under the map µ 7→ g−1µ (=
distribution function). All these – and several other – topologies coincide.
Proposition 2.1. For each sequence (gn)n ⊂ G0, each g ∈ G0 and each p ∈ [1,∞[ the following
are equivalent:
(i) gn(t) → g(t) for each t ∈ [0, 1] in which g is continuous;
(ii) gn → g in Lp([0, 1]);
(iii) g−1n → g−1 in Lp([0, 1]);
(iv) µgn → µg weakly;
(v) µgn → µg in dW .
In particular, G0 is compact.
Let us briefly sketch the main arguments of the
Proof. Since all the functions gn and g
n are bounded, properties (ii) and (iii) obviously are
independent of p. The equivalence of (ii) and (iii) for p = 1 was already stated in (2.4) and the
equivalence between (ii) for p = 2 and (v) was stated in (2.3). The equivalence of (iv) and (v)
is the well known fact that the Wasserstein distance metrizes the weak topology. Another well
known characterization of weak convergence states that (iv) is equivalent to (i’): g−1n (t) → g−1(t)
for each t ∈ [0, 1] in which g−1 is continuous. Finally, (i′) ⇔ (i) according to the equivalence
(ii) ⇔ (iii) which allows to pass from convergence of distribution functions g−1n to convergence
of inverse distribution functions gn. The last assertion follows from the compactness of P0 in
the weak topology.
2.2 The Spaces G, G1 and P = P(S1)
Throughout this paper, S1 = R/Z will always denote the circle of length 1. It inherits the group
operation + from R with neutral element 0. For each x, y ∈ S1 the positively oriented segment
from x to y will be denoted by [x, y] and its length by |[x, y]|. If no ambiguity is possible, the
latter will also be denoted by y − x. In contrast to that, |x − y| will denote the S1-distance
between x and y. Hence, in particular, |[y, x]| = 1 − |[x, y]| and |x − y| = min{|[y, x]|, |[x, y]|}.
A family of points t1, . . . , tN ∈ S1 is called an ’ordered family’ if
i=1 |[ti, ti+1]| = 1 with
tN+1 := t1 (or in other words if all the open segments ]ti, ti+1[ are disjoint).
G(R) = {g : R → R right continuous nondecreasing with g(x+ 1) = g(x) + 1 for all x ∈ R}.
Due to the required equivariance with respect to the group action of Z, each map g ∈ G(R)
induces uniquely a map π(g) : S1 → S1. Put G := π(G(R)). The monotonicity of the functions
in G(R) induces also a kind of monotonicity of maps in G: each continuous g ∈ G will be
order preserving and homotopic to the identity map. In the sequel, however, we often will have
to deal with discontinuous g ∈ G. The elements g ∈ G will be called monotone maps of S1.
G is a compact subspace of the L2-space of maps from S1 to S1 with metric ‖g1 − g2‖L2 =
|g1(t)− g2(t)|2dt
With the composition ◦ of maps, G is a semigroup. Its neutral element e is the identity map.
Of particular interest in the sequel will be the semigroup G1 = G/S1 where functions g, h ∈ G
will be identified if g(.) = h(.+ a) for some a ∈ S1.
Proposition 2.2. The map
χ : G1 → P, g 7→ g∗Leb
(= push forward of the Lebesgue measure on S1 under the map g) and its inverse χ−1 : P →
G1, µ 7→ gµ (with gµ as defined in (2.2)) establish an isometry between the space G1 equipped
with the induced L2-distance
‖g1 − g2‖G1 =
|g1(t)− g2(t+ s)|2dt
and the space P of probability measures on S1 equipped with the L2-Wasserstein distance. In
particular, G1 is compact.
Proof. The bijectivity of χ and χ−1 is clear. It remains to prove that
dW (µ, ν) = ‖gµ − gν‖G1 (2.5)
for all µ, ν ∈ P. Obviously, it suffices to prove this for all absolutely continuous µ, ν (or
equivalently for strictly increasing gµ, gν) since the latter are dense in P (or in G1, resp.). For
such a pair of measures, there exists a map F : S1 → S1 (’transport map’) which minimizes the
transportation costs [Vil03]. Fix any point in S1, say 0, and put s = F (0). Then the map F is
a transport map for the mass µ on the segment ]0, 1[ onto the mass ν on the segment ]s, s+ 1[.
Since these segments are isometric to the interval ]0, 1[, the results from the previous subsection
imply that the minimal cost for such a transport is given by
|g1(t) − g2(t + s)|2dt. Varying
over s finally proves the claim.
3 Dirichlet Process and Entropic Measure
3.1 Gibbsean Interpretation and Heuristic Derivation of the Entropic Mea-
One of the basic results of this paper is the rigorous construction of a measure Pβ formally given
as (1.1) in the one-dimensional case, i.e. M = S1 or M = [0, 1]. We will essentially make use
of the isometries χ : G1 → P = P(S1), g 7→ g∗Leb and χ : G0 → P0 = P([0, 1]). They allow to
transfer the problem of constructing measures Pβ on spaces of probability measures P (or P0)
into the problem of constructing measures Qβ (or Q
0 ) on spaces of functions G1 (or G0, resp.).
In terms of the measure Q
0 on G0, for instance, the formal characterization (1.1) then reads as
follows
0 (dg) =
e−β·S(g)Q0(dg). (3.1)
Here Q0 denotes some ’uniform distribution’ on G0 ⊂ L2([0, 1]) and S : G0 → [0,∞] is the
entropy functional S(g) := Ent(g∗Leb). If g is absolutely continuous then S(g) can be expressed
explicitly as
S(g) = −
log g′(t) dt.
The representation (3.1) is reminiscent of Feynman’s heuristic picture of the Wiener measure.
Let us briefly recall the latter and try to use it as a guideline for our construction of the measure
According to this heuristic picture, the Wiener measure Pβ with diffusion constant σ2 = 1/β
should be interpreted (and could be constructed) as
Pβ(dg) =
e−β·H(g)P(dg) (3.2)
with the energy functional H(g) = 1
g′(t)2dt. Here P(dg) is assumed to be the ’uniform
distribution’ on the space G∗ of all continuous paths g : [0, 1] → R with g(0) = 0. Even if
such a uniform distribution existed, typically almost all paths g would have infinite energy.
Nevertheless, one can overcome this difficulty as follows.
Given any finite partition {0 = t0 < t1 < · · · < tN = 1} of [0, 1], one should replace the energy
H(g) of the path g by the energy of the piecewise linear interpolation of g
HN (g) = inf {H(g̃) : g̃ ∈ G∗, g̃(ti) = g(ti) ∀i} =
|g(ti)− g(ti−1)|2
2(ti − ti−1)
Then (3.2) leads to the following explicit representation for the finite dimensional distributions
Pβ (gt1 ∈ dx1, . . . , gtN ∈ dxN ) =
|xi − xi−1|2
ti − ti−1
pN (dx1, . . . , xN ). (3.3)
Here pN (dx1, . . . , xN ) = P (gt1 ∈ dx1, . . . , gtN ∈ dxN ) should be a ’uniform distribution’ on RN
and Zβ,N a normalization constant. Choosing pN to be the N -dimensional Lebesgue measure
makes the RHS of (3.3) a projective family of probability measures. According to Kolmogorov’s
extension theorem this family has a unique projective limit, the Wiener measure Pβ on G∗ with
diffusion constant σ2 = 1/β.
Now let us try to follow this procedure with the entropy functional S(g) replacing the energy
functional H(g). Given any finite partition {0 = t0 < t1 < · · · < tN < tN+1 = 1} of [0, 1], we
will replace the entropy S(g) of the path g by the entropy of the piecewise linear interpolation
SN (g) = inf {S(g̃) : g̃ ∈ G0, g̃(ti) = g(ti) ∀i} = −
g(ti)− g(ti−1)
ti − ti−1
· (ti − ti−1).
This leads to the following expression for the finite dimensional distributions
0 (gt1 ∈ dx1, . . . , gtN ∈ dxN )
xi − xi−1
ti − ti−1
· (ti − ti−1)
qN(dx1 . . . dxN ) (3.4)
where qN (dx1, . . . , xN ) = Q0 (gt1 ∈ dx1, . . . , gtN ∈ dxN ) is a ’uniform distribution’ on the simplex
(x1, . . . , xN ) ∈ [0, 1]N : 0 < x1 < x2 . . . < xN < 1
and x0 := 0, xN+1 := 1.
What is a ’canonical’ candidate for qN? A natural requirement will be the invariance property
qN (dx1, . . . , dxN ) = [(Ξ
xi−1,xi+k)∗ qk(dxi, . . . , dxi+k−1)]
dqN−k(dx1, . . . , dxi−1, dxi+k, . . . , dxN ) (3.5)
for all 1 ≤ k ≤ N and all 1 ≤ i ≤ N − k + 1 with the convention x0 = 0, xN+1 = 1 and the
rescaling map Ξa,b : ]0, 1[k→ Rk, yj 7→ yj(b− a) + a for j = 1, · · · , k.
If the qN , N ∈ N, were probability measures then the invariance property admits the following
interpretation: under qN , the distribution of the (N − k)-tuple (x1, . . . , xi−1, xi+k, . . . , xN ) is
nothing but qN−k; and under qN , the distribution of the k-tuple (xi, . . . , xi+k−1) of points in
the interval ]xi−1, xk[ coincides — after rescaling of this interval — with qk. Unfortunately, no
family of probability measures qN , N ∈ N with property (3.5) exists. However, there is a family
of measures with this property.
By iteration of the invariance property (3.5), the choice of the measure q1 on the interval
Σ1 = ]0, 1[ will determine all the measures qN , N ∈ N. Moreover, applying (3.5) for N = 2,
k = 1 and both choices of i yields
(Ξ0,x1)∗ q1(dx2)
dq1(dx1) =
(Ξx2,1)∗ q1(dx1)
dq1(dx2) (3.6)
for all 0 < x1 < x2 < 1. This reflects the intuitive requirement that there should be no difference
whether we first choose randomly x1 ∈ ]0, 1[ and then x2 ∈ ]x1, 1[ or the other way round, first
x2 ∈ ]0, 1[ and then x1 ∈ ]0, x2[.
Lemma 3.1. A family of measures qN , N ∈ N, with continuous densities satisfies property (3.5)
if and only if
qN (dx1, . . . , dxN ) = C
N dx1 . . . dxN
x1 · (x2 − x1) · . . . · (xN − xN−1) · (1− xN )
(3.7)
for some constant C ∈ R+.
Proof. If q1(dx) = ρ(x)dx then (3.6) is equivalent to
ρ(y) · ρ
= ρ(x) · ρ
y − x
for all 0 < x < y < 1. For continuous ρ this implies that there exists a constant C ∈ R+ such
that ρ(x) = C
x(1−x)
for all 0 < x < 1. Iterated inserting this into (3.5) yields the claim.
Let us come back to our attempt to give a meaning to the heuristic formula (3.1). Combining
(3.4) with the choice (3.7) of the measure qN finally yields
0 (gt1 ∈ dx1, . . . , gtN ∈ dxN )
(xi − xi−1)β(ti−ti1 )
dx1 . . . dxN
x1 · (x2 − x1) · . . . · (1− xN )
(3.8)
with appropriate normalization constants Zβ,N . Now the RHS of this formula indeed turns out
to define a consistent family of probability measures. Hence, by Kolmogorov’s extension theorem
it admits a projective limit Q
0 on the space G0. The push forward of this measure under the
canonical identification χ : G0 → P0, g 7→ g∗Leb will be the entropic measure Pβ0 which we were
looking for.
The details of the rigorous construction of this measure as well as various properties of it will
be presented in the following sections.
3.2 The Measures Qβ and Pβ
The basic object to be studied in this section is the probability measure Qβ on the space G.
Proposition 3.2. For each real number β > 0 there exists a unique probability measure Qβ on
G, called Dirichlet process, with the property that for each N ∈ N and for each ordered family
of points t1, t2, . . . , tN ∈ S1
β (gt1 ∈ dx1, . . . , gtN ∈ dxN ) =
i=1 Γ(β(ti+1 − ti))
(xi+1 − xi)β(ti+1−ti)−1dx1 . . . dxN .
(3.9)
The precise meaning of (3.9) is that for all bounded measurable u : (S1)N → R
u (gt1 , . . . , gtN ) dQ
i=1 Γ(β · |[ti, ti+1]|)
u(x1, . . . , xN )
|[xi, xi+1]|β·|[ti,ti+1]|−1dx1 . . . dxN .
with ΣN =
(x1, . . . , xN ) ∈ (S1)N :
i=1 |[xi, xi+1]| = 1
and xN+1 := x1, tN+1 := t1. In
particular, with N = 1 this means
u(gt)dQ
β(g) =
u(x)dx for each t ∈ S1.
Proof. It suffices to prove that (3.9) defines a consistent family of finite dimensional distributions.
The existence of Qβ (as a ’projective limit’) then follows from Kolmogorov’s extension theorem.
The required consistency means that
i=1 Γ(β · |[ti, ti+1]|)
|[xi, xi+1]|β·|[ti,ti+1]|−1u(x1, . . . , xN ) dx1 . . . dxN
Γ(β · |[t1, t2]|) · . . . · Γ(β · |[tk−1, tk+1]|) · . . . · Γ(β · |[tN , t1]|)
|[x1, x2]|β·|[t1,t2]|−1 · . . . · |[xk−1, xk+1]|β·|[tk−1,tk+1]|−1 · . . . · |[xN , x1]|β·|[tN ,t1]|−1
·v(x1, . . . , xk−1, xk . . . , xN ) dx1 . . . dxk−1dxk+1 . . . dxN
whenever u(x1, . . . , xN ) = v(x1, . . . , xk−1, xk . . . , xN ) for all (x1, . . . xN ) ∈ ΣN . The latter is an
immediate consequence of the well-known fact (Euler’s beta integral) that
[xk−1,xk+1]
|[xk−1, xk]|β·|[tk−1,tk]|−1 · |[xk, xk+1]|β·|[tk,tk+1]|−1 dxk
Γ(β · |[tk−1, tk]|)Γ(β · |[tk, tk+1]|)
Γ(β · |[tk−1, tk+1]|)
|[xk−1, xk+1]|β·|[tk−1,tk+1]|−1.
For s ∈ S1 let θ̂s : G → G, g 7→ g ◦ θs be the isomorphism of G induced by the rotation
θs : S
1 → S1, t 7→ t+ s. Obviously, the measure Qβ on G is invariant under each of the maps θ̂s.
Hence, Qβ induces a probability measure Q
1 on the quotient spaces G1 = G/S1.
Recall the definition of the map χ : G → P, g 7→ g∗Leb. Since (g ◦ θs)∗Leb = g∗Leb this
canonically extends to a map χ : G1 → P. (As mentioned before, the latter is even an isometry.)
Definition 3.3. The entropic measure Pβ on P is defined as the push forward of the Dirichlet
process Qβ on G (or equivalently, of the measure Qβ1 on G1) under the map χ. That is, for all
bounded measurable u : P → R
u(µ) dPβ(µ) =
u(g∗Leb) dQ
β(g).
3.3 The Measures Q
0 and P
The subspaces {g ∈ G : g(0) = 0} and {g ∈ G0 : g(0) = 0} can obviously be identified.
Conditioning the probability measure Qβ onto this event thus will define a probability measure
0 on G0. However, we prefer to give the direct construction of Q
Proposition 3.4. For each real number β > 0 there exists a unique probability measure Q
G0, called Dirichlet process, with the property that for each N ∈ N and each family 0 = t0 <
t1 < t2 < . . . < tN < tN+1 = 1
0 (gt1 ∈ dx1, . . . , gtN ∈ dxN ) =
i Γ(β · (ti+1 − ti))
(xi+1 − xi)β·(ti+1−ti)−1dx1 . . . dxN .
(3.10)
The precise meaning of (3.10) is that for all bounded measurable u : [0, 1]N → R
u (gt1 , . . . , gtN ) dQ
0 (g)
i=1 Γ(β · (ti+1 − ti))
u(x1, . . . , xN )
(xi+1 − xi)β·(ti+1−ti)−1dx1 . . . dxN .
with ΣN =
(x1, . . . , xN ) ∈ [0, 1]N : 0 < x1 < x2 . . . < xn < 1
and xN+1 := x1, tN+1 := t1.
Remark 3.5. According to these explicit formulae, it is easy to calculate the moments of the
Dirichlet process. For instance,
0 (gt) :=
gt dQ
0 (g) = t
0 (gt) :=
(gt − t)2 dQβ0 (g) =
1 + β
t(1− t)
for all β > 0 and all t ∈ [0, 1].
Definition 3.6. The entropic measure P
0 on P0 = P([0, 1]) is defined as the push forward of
the Dirichlet process Q
0 on G0 under the map χ. That is, for all bounded measurable u : P0 → R
u(µ) dP
0 (µ) =
u(g∗Leb) dQ
0 (g).
Remark 3.7. (i) According to the above construction Q
0 ( . ) = Q
β( . |g(0) = 0) and
u(g) dQ
0 (g) =
u(g − g(0)) dQβ(g),
u(g) dQβ(g) =
u(g + x) dQ
0 (g) dx.
(ii) Analogously, the entropic measures on the sphere and on the are linked as follows
u(µ) dPβ(µ) =
u((θx)∗µ)dP
0 (µ) dx
or briefly
dPβ =
(θ̂x)∗dP
where θx : S
1 → S1, y 7→ x + y and θ̂x : P → P : µ 7→ (θx)∗µ. We would like to emphasize,
however, that Pβ 6= Pβ0 . For instance, consider u(µ) :=
f dµ for some f : S1 → R (which may
be identified with f : [0, 1] → R). Then
P(S1)
u(µ) dPβ(µ) =
f(x) dx
whereas
P([0,1])
u(µ) dP
0 (µ) =
[0,1]
f(x)ρβ(x) dx
with ρβ(x) =
Γ(βt)Γ(β(1−t))
xβt−1(1− x)β(1−t)−1 dt.
According to the last remark, it suffices to study in detail one of the four measures Qβ, Q
Pβ, and P
0 . We will concentrate in the rest of this chapter on the measure Q
0 which seems to
admit the most easy interpretations.
3.4 The Dirichlet Process as Normalized Gamma Process
We start recalling some basic facts about the real valued Gamma processes. For α > 0 denote
by G(α) the absolutely continuous probability measure on R+ with density
xα−1e−x.
Definition 3.8. A real valued Markov process (γt)t≥0 starting in zero is called standard Gamma
process if its increments γt − γs are independent and distributed according to G(t − s) for 0 ≤
s < t. Without loss of generality we may assume that almost surely the function t→ γt is right
continuous and nondecreasing.
Alternatively the Gamma-Process may be defined as the unique pure jump Levy process with
Levy measure Λ(dx) = 1‖‖x>0
dx. The connection between pure jump Levy and Poisson point
processes gives rise to several other equivalent representations of the Gamma process [Kin93,
Ber99]. For instance, let Π = {p = (px, py) ∈ R2} be the Poisson point process on R+×R+ with
intensity measure dx× Λ(dy) with Λ as above, then a Gamma process is obtained by
γt :=
p∈Π:px≤t
py. (3.11)
For β > 0 the process γt·β is a Levy process with Levy measure Λβ(dx) = β · 1‖‖x>0 e
dx. Its
increments are distributed according to
P (γβ·t − γβ·s ∈ dx) =
Γ(β · (t− s))x
β·(t−s)−1e−xdx. �
Proposition 3.9. For each β > 0, the law of the process (
)t∈[0,1] is the Dirichlet process Q
Proof. This well-known fact is easily obtained from Lukacs’ characterization of the Gamma
distribution [ÉY04].
3.5 Support Properties
Proposition 3.10. (i) For each β > 0, the measure Q
0 has full support on G0.
(ii) Q
0 -almost surely the function t 7→ g(t) is strictly increasing but increases only by jumps
(that is, the jumps heights add up to 1 and the jump locations are dense in [0, 1]).
(iii) For each fixed t0 ∈ [0, 1], Qβ0 -almost surely the function t 7→ g(t) is continuous at t0.
Proof. (i) Let g ∈ G ⊂ L2([0, 1], dx) and ǫ > 0 then we have to show Qβ(Bǫ(g)) > 0 where
Bǫ(g) = {h ∈ G0 : ‖h − g‖L2([0,1]) < ǫ}. For this choose finitely many points ti ∈ [0, 1] together
with δi > 0 such that the set S := {f ∈ G
∣ |f(ti)−g(ti)| ≤ δi ∀i} is contained in Bǫ(g). Clearly,
from (3.10) Qβ(S) > 0 which proves the claim.
(ii) (3.10) implies that Q
0 -almost surely g(s) < g(t) for each given pair s < t. Varying over all
such rational pairs s < t, it follows that a.e. g is strictly increasing on R+.
In terms of the probabilistic representation (3.9), it is obvious that g increases only by jumps.
(iii) This also follows easily from the representation as a normalized gamma process (3.9).
Restating the previous property (ii) in terms of the entropic measure yields that P
0 -a.e. µ ∈ P0
is ’Cantor like’. More precisely,
Corollary 3.11. P
0 -almost surely the measure µ ∈ P0 has no absolutely continuous part and
no discrete part. The topological support of µ has Lebesgue measure 0. Moreover,
Ent(µ) = +∞. (3.12)
Proof. The assertion on the entropy of µ is an immediate consequence of the statement on the
support of µ. The second claim follows from the fact that the jump heights of g add up to 1.
In terms of the measure Q
0 , the last assertion of the corollary states that S(g) = +∞ for Q
0 -a.e.
g ∈ G0.
3.6 Scaling and Invariance Properties
The Dirichlet process Q
0 on G0 has the following Markov property: the distribution of g|[s,t]
depends on g[0,1]\[s,t] only via g(s), g(t).
And the Dirichlet process Q
0 on G0 has a remarkable self-similarity property: if we restrict the
functions g onto a given interval [s, t] and then linearly rescale their domain and range in order
to make them again elements of G0 then this new process is distributed according to Qβ
0 with
β′ = β · |t− s|.
Proposition 3.12. For each β > 0, and each s, t ∈ [0, 1], s < t
g|[s,t] ∈ .
∣ g[0,1]\[s,t]
g|[s,t] ∈ .
∣ g(s), g(t)
(3.13)
(Ξs,t)∗Q
0 = Q
β·|t−s|
0 (3.14)
where Ξs,t : G0 → G0 with Ξs,t(g)(r) = g((1−r)s+rt)−g(s)g(t)−g(s) for r ∈ [0, 1].
Proof. Both properties follow immediately from the representation in Proposition 3.10.
Corollary 3.13. The probability measures Q
0 , β > 0 on G0 are uniquely characterized by the
self-similarity property (3.14) and the distributions of g1/2:
0 (g1/2 ∈ dx) =
Γ(β/2)2
· [x(1− x)]β/2−1dx.
Proposition 3.14. (i) For β → 0 the measures Qβ0 weakly converge to a measure Q00 defined
as the uniform distribution on the set {1[t,1] : t ∈ ]0, 1]} ⊂ G0.
Analogously, the measures Qβ weakly converge for β → 0 to a measure Q0 defined as the uniform
distribution on the set of constant maps {t : t ∈ S1} ⊂ G.
(ii) For β → ∞ the measures Qβ0 (or Qβ) weakly converge to the Dirac mass δe on the identity
map e of [0, 1] (or S1, resp.).
Proof. (i) Since the space G0 (equipped with the L2-topology) is compact, so is P(G0) (equipped
with the weak topology). Hence the family Q
0 , β > 0 is pre-compact. Let Q
0 denote the limit of
any converging subsequence of Q
0 for β → 0. According to the formula for the one-dimensional
distributions, for each t ∈ ]0, 1[
0 (gt ∈ dx) =
Γ(βt)Γ(β(1 − t)) · x
βt−1(1− x)β(1−t)−1dx
−→ (1− t)δ{0}(dx) + tδ{1}(dx)
as β → 0. Hence, Q00 is the uniform distribution on the set {1[t,1] : t ∈ ]0, 1]} ⊂ G0.
(ii) Similarly, Q
0 (gt ∈ dx) → δt(dx) as β → ∞. Hence, δe with e : t 7→ t will be the unique
accumulation point of Q
0 for β → ∞.
Restating the previous results in terms of the entropic measures, yields that the entropic mea-
sures P
0 converge weakly to the uniform distribution P
0 on the set {(1 − t)δ{0} + tδ{1} : t ∈
[0, 1]} ⊂ P0; and the measures Pβ converge weakly to the uniform distribution P0 on the set
{δ{t} : t ∈ S1} ⊂ P whereas for β → ∞ both, Pβ0 and Pβ, will converge to δLeb, the Dirac mass
on the uniform distribution of [0, 1] or S1, resp.
The assertions of Proposition 3.12 imply the following Markov property and self-similarity prop-
erty of the entropic measure.
Proposition 3.15. For each each x, y ∈ [0, 1], x < y
µ|[x,y] ∈ .
∣µ|[0,1]\[x,y]
µ|[x,y] ∈ .
∣µ([x, y]
µ|[x,y] ∈ .
∣µ([x, y]) = α
0 (µx,y ∈ . )
with µx,y ∈ P0 (’rescaling of µ|[x,y]’) defined by µx,y(A) = 1µ([x,y])µ(x+(y−x) ·A) for A ⊂ [0, 1].
3.7 Dirichlet Processes on General Measurable Spaces
Recall Ferguson’s notion of a Dirichlet process on a general measurable spaceM with parameter
measure m on M . This is a probability measure Qm
on P(M), uniquely defined by the fact
that for any finite measurable partition M = ˙
i=1 Mi and σi := m(Mi).
P(M) (µ : µ(M1) ∈ dx1, . . . , µ(MN ) ∈ dxN )
Γ(m(M))
i=1 Γ(σi)
1 · · · x
)σN+1−1dx1 · · · dxN ,
If a map h :M →M leaves the parameter measure m invariant then obviously the induced map
ĥ : P(M) → P(M), µ 7→ h∗µ leaves the Dirichlet process QmP(M) invariant.
In the particular case M = [0, 1] and m = β · Leb, the Dirichlet process Qm
can be obtained
as push forward of the measure Q
0 (introduced before) under the isomorphism ζ : G0 → P([0, 1])
which assigns to each g the induced Lebesgue-Stieltjes measure dg (the inverse ζ−1 assigns to
each probability measure its distribution function):
P([0,1]) = ζ∗Q
0 . (3.15)
Note that the support properties of the measure Qm
P([0,1])
are completely different from those of
the measure P
0 . In particular, Q
P([0,1])
-almost every µ ∈ P([0, 1]) is discrete and has full topo-
logical support, cf. Corollary 3.11. The invariance properties of Qm
P([0,1])
under push forwards by
means of measure preserving transformations of [0, 1] seems to have no intrinsic interpretation
in terms of Q
4 The Change of Variable Formula for the Dirichlet Process and
for the Entropic Measure
Our main result in this chapter will be a change of variable formula for the Dirichlet process.
To motivate this formula, let us first present an heuristic derivation based on the formal repre-
sentation (3.1).
4.1 Heuristic Approaches to Change of Variable Formulae
Let us have a look on the change of variable formula for the Wiener measure. On a formal level,
it easily follows from Feynman’s heuristic interpretation
dPβ(g) =
g′(t)2dt dP(g)
with the (non-existing) ’uniform distribution’ P. Assuming that the latter is ’translation invari-
ant’ (i.e. invariant under additive changes of variables, – at least in ’smooth’ directions h) we
immediately obtain
dPβ(h+ g) =
(h+g)′(t)2dt dP(h + g)
h′(t)2dt−β
h′(t)g′(t)dt · e−
g′(t)2dt dP(g)
h′(t)2dt−β
h′(t)dg(t) dPβ(g).
If we interpret
h′(t)dg(t) as the Ito integral of h′ with respect to the Brownian path g then
this is indeed the famous Cameron-Martin-Girsanov-Maruyama theorem.
In the case of the entropic measure, the starting point for a similar argumentation is the heuristic
interpretation
0 (g) =
log g′(t)dt dQ0(g),
again with a (non-existing) ’uniform distribution’ Q0 on G0. The natural concept of ’change
of variables’, of course, will be based on the semigroup structure of the underlying space G0;
that is, we will study transformations of G0 of the form g 7→ h ◦ g for some (smooth) element
h ∈ G0. It turns out that Q0 should not be assumed to be invariant under translations but
merely quasi-invariant:
dQ0(h ◦ g) = Y 0h (g) dQ0(g)
with some density Yh. This immediately implies the following change of variable formula for Q
0 (h ◦ g) =
log(h◦g)′(t)dt dQ0(h ◦ g)
log h′(g(t))dt · eβ
log g′(t)dt · Y 0h (g) dQ0(g)
log g′(t)dt · Y 0h (g) dQ
0 (g).
This is the heuristic derivation of the change of variables formula. Its rigorous derivation (and
the identification of the density Yh) is the main result of this chapter.
4.2 The Change of Variables Formula on the Sphere
For g, h ∈ G with h ∈ C2 we put
Y 0h (g) :=
h′ (g(a−)) · h′ (g(a+))
δ(h◦g)
, (4.1)
where Jg ⊂ S1 denotes the set of jump locations of g and
δ(h ◦ g)
(a) :=
h (g(a+)) − h (g(a−))
g(a+)− g(a−) .
To simplify notation, here and in the sequel (if no ambiguity seems possible), we write y − x
instead of |[x, y]| to denote the length of the positively oriented segment from x to y in S1. We
will see below that the infinite product in the definition of Y 0h (g) converges for Q
β-a.e. g ∈ G.
Moreover, for β > 0 we put
h (g) := exp
log h′ (g(s)) ds
h (g) := X
h (g) · Y
h (g). (4.2)
Theorem 4.1. Each C2-diffeomorphism h ∈ G induces a bijective map τh : G → G, g 7→ h ◦ g
which leaves the measure Qβ quasi-invariant:
dQβ(h ◦ g) = Y βh (g) dQ
β(g).
In other words, the push forward of Qβ under the map τ−1h = τh−1 is absolutely continuous w.r.t.
Qβ with density Y
d(τh−1)∗Q
dQβ(g)
h (g).
The function Y
h is bounded from above and below (away from 0) on G.
By means of the canonical isometry χ : G → P, g 7→ g∗Leb, Theorem 4.1 immediately implies
Corollary 4.2. For each C2-diffeomorphism h ∈ G the entropic measure Pβ is quasi-invariant
under the transformation µ 7→ h∗µ of the space P:
dPβ(h∗µ) = Y
−1(µ)) dPβ(µ).
The density Y
−1(µ)) introduced in (4.2) can be expressed as follows
−1(µ)) = exp
log h′(s)µ(ds)
I∈gaps(µ)
h′(I−) · h′(I+)
|h(I)|/|I|
where gaps(µ) denotes the set of segments I = ]I−, I+[⊂ S1 of maximal length with µ(I) = 0
and |I| denotes the length of such a segment.
4.3 The Change of Variables Formula on the Interval
From the representation of Qβ as a product of Q
0 and Leb (see Remark 3.7) and the change of
variable formulae for Qβ and Leb, one can deduce a change of variable formula for Q
0 similar to
that of Theorem 4.1 but containing an additional factor 1
h′(0)
. In this case, one has to restrict
to translations by means of C2-diffeomorphisms h ∈ G with h(0) = 0.
More generally, one might be interested in translations of G0 by means of C2-diffeomorphisms
h ∈ G0. In contrast to the previous situation, it now may happen that h′(0) 6= h′(1).
For g ∈ G0 and C2-ismorphism h : [0, 1] → [0, 1] we put
h,0(g) := X
h (g) · Yh,0(g) (4.3)
Yh,0(g) =
h′(0) · h′(1)
· Y 0h (g)
and X
(g) and Y 0h (g) defined as before in (4.1), (4.2). Note that here and in the sequel by a
C2-isomorphism h ∈ G0 we understand an increasing homeomorphism h : [0, 1] → [0, 1] such that
h and h−1 are bounded in C2([0, 1]), which in particular implies h′ > 0.
Theorem 4.3. Each translation τh : G0 → G0, g 7→ h ◦ g by means of a C2-isomorphism h ∈ G0
leaves the measure Q
0 quasi-invariant:
0 (h ◦ g) = Y
(g) dQ
0 (g)
or, in other words,
d(τh−1)∗Q
0 (g)
0 (g)
The function Y
h,0 is bounded from above and below (away from 0) on G0.
Corollary 4.4. For each C2-isomorphism h ∈ G0 the entropic measure Pβ0 is quasi-invariant
under the transformation µ 7→ h∗µ of the space P0:
0 (h∗µ)
0 (µ)
= exp
log h′(s)µ(ds)
h′(0) · h′(1)
I∈gaps(µ)
h′(I−) · h′(I+)
|h(I)|/|I|
where gaps(µ) denotes the set of intervals I = ]I−, I+[⊂ [0, 1] of maximal length with µ(I) = 0
and |I| denotes the length of such an interval.
Remark 4.5. Theorem 4.3 seems to be unrelated to the quasi-invariance of the measure Qm
P([0,1])
under the transformation dg → h · dg/〈h, dg〉 shown in [Han02]. Nor is it anyhow implied by
the quasi-ivarariance formula for the general measure valued gamma process as in [TVY01] with
respect to a similar transformation. In our present case the latter would correspond to the
mapping dγ → h · dγ of the (measure valued) Gamma process dγ.
4.4 Proofs for the Sphere Case
Lemma 4.6. For each C2-diffeomorphism h ∈ G
h (g) = lim
h (g(ti+1))− h (g(ti))
g(ti+1)− g(ti)
]β(ti+1−ti)
(4.4)
Here ti =
for i = 0, 1, . . . , k − 1 and tk = 0. Thus ti+1 − ti := |[ti, ti+1]| = 1k for all i.
Proof. Without restriction, we may assume β = 1. According to Taylor’s formula
h (g(ti+1)) = h (g(ti)) + h
′ (g(ti)) · (g(ti+1)− g(ti)) + 12h
′′(γi) · (g(ti+1)− g(ti))2
for some γi ∈ [g(ti), g(ti+1)]. Hence,
h (g(ti+1))− h (g(ti))
g(ti+1)− g(ti)
]ti+1−ti
= lim
h′ (g(ti)) +
h′′(γi) · (g(ti+1)− g(ti))
]ti+1−ti
= lim
log h′ (g(ti)) + log
1 + 1
h′′(γi)
h′ (g(ti))
(g(ti+1)− g(ti))
· (ti+1 − ti)
= exp
log h′ (g(ti)) · (ti+1 − ti)
= exp
log h′ (g(t)) dt
= X1h(g).
Here (⋆) follows from the fact that
1 + 1
h′′(γi)
h′ (g(ti))
· (g(ti+1)− g(ti)) =
h (g(ti+1))− h (g(ti))
g(ti+1)− g(ti)
h′ (g(ti))
= h′(ηi) ·
h′ (g(ti))
≥ ε > 0
for some ηi ∈ [g(ti), g(ti+1)] and some ε > 0, independent of i and k. Thus
1 + 1
h′′(γi)
h′ (g(ti))
· (g(ti+1)− g(ti))
· (ti+1 − ti)
≤ C1 ·
h′′(γi)
h′ (g(ti))
· (g(ti+1)− g(ti)) · (ti+1 − ti)
≤ C2 ·
(g(ti+1)− g(ti)) · (ti+1 − ti)
≤ C3 · 1k .
Lemma 4.7. For each C3-diffeomorphism h ∈ G
Y 0h (g) := lim
h′ (g(ti)) ·
g(ti+1)− g(ti)
h (g(ti+1))− h (g(ti))
(4.5)
where ti =
for i = 0, 1, . . . , k − 1 and tk = 0.
Proof. Let h and g be given. Depending on some ε > 0 let us choose l ∈ N large enough (to be
specified in the sequel) and let a1, . . . , al denote the l largest jumps of g. Put J
g = Jg\{a1, . . . , al}
and for simplicity al+1 := a1. For k very large (compared with l) and j = 1, . . . , l let kj denote
the index i ∈ {0, 1, . . . , k − 1}, for which aj ∈ [ti, ti+1[. Then again by Taylor’s formula
kj+1−1
i=kj+1
h′ (g(ti)) ·
g(ti+1)− g(ti)
h (g(ti+1))− h (g(ti))
kj+1−1
i=kj+1
1 + 1
h′′ (g(ti))
h′ (g(ti))
· (g(ti+1)− g(ti)) + 16
h′′′(ηi)
h′ (g(ti))
· (g(ti+1)− g(ti))2
≤ exp
kj+1−1
i=kj+1
log h′
(g(ti)) +
· (g(ti+1)− g(ti))
≤ eε/l · exp
kj+1−1
i=kj+1
log h′
(g(ti)) · (g(ti+1)− g(ti))
provided l and k are chosen so large that
|g(ti+1)− g(ti)| ≤
C1 · l
for all i ∈ {0, . . . , k − 1} \ {k1, . . . , kl}, where C1 = sup
|h′′′(x)|
6·h′(y)
On the other hand,
g(tkj+1)
g(tkj+1)
) = exp
∫ g(tkj+1 )
g(tkj+1)
log h′
(s) ds
= exp
kj+1−1
i=kj+1
log h′
(g(ti)) · (g(ti+1)− g(ti)) +
log h′
(γi) · 12 (g(ti+1)− g(ti))
≥ e−ε/l · exp
kj+1−1
i=kj+1
log h′
(g(ti)) · (g(ti+1)− g(ti))
provided l and k are chosen so large that
|g(ti+1)− g(ti)| ≤
C2 · l
for all i ∈ {0, 1, . . . , k − 1} \ {k1, . . . , kl}, where C2 = sup
log h′
Therefore,
i∈{0,1,...,k−1}\{k1,...,kl}
h′ (g(ti)) ·
g(ti+1)− g(ti)
h (g(ti+1))− h (g(ti))
≤ e2ε ·
g(tkj+1)
g(tkj+1)
) = (I).
In order to derive the corresponding lower estimate, we can proceed as before in (1a) and (2)
(replacing ε by −ε and ≤ by ≥ and vice versa). To proceed as in (1b) we have to argue as
follows
kj+1−1
i=kj+1
log h′
(g(ti))− εl
· (g(ti+1)− g(ti))
≥ e−ε/l · exp
kj+1−1
i=kj+1
(1− ε) ·
log h′
(g(ti)) · (g(ti+1)− g(ti))
provided l and k are chosen so large that
log (1 + C3 · (g(ti+1)− g(ti))) ≥ (1− ε) · C3 · (g(ti+1)− g(ti))
for all i ∈ {0, 1, . . . , k − 1} \ {k1, . . . , kl}, where C3 = sup
log h′
Thus we obtain the following lower estimate
i∈{0,1,...,k−1}\{k1,...,kl}
h′ (g(ti)) ·
g(ti+1)− g(ti)
h (g(ti+1))− h (g(ti))
≥ e−2ε ·
g(tkj+1)
g(tkj+1)
≥ e−2ε · C−ε/23 ·
g(tkj+1)
g(tkj+1)
) = (II),
since
g(tkj+1)
g(tkj+1)
= exp
log h′
g(tkj+1)
− log h′
g(tkj+1)
≤ exp
g(tkj+1)− g(tkj+1)
≤ exp
where C3 = sup
∣(log h′)
Now for fixed l as k → ∞ the bound (I) converges to
(I′) = e2ε ·
h′ (g(aj+1−))
h′ (g(aj+))
and the bound (II) to
(II′) = e−2ε · C−ε/23 ·
h′ (g(aj+1−))
h′ (g(aj+))
Finally, it remains to consider
i∈{k1,...,kl}
h′ (g(ti)) ·
g(ti+1)− g(ti)
h (g(ti+1))− h (g(ti))
= (III).
Again for fixed l and k → ∞ this obviously converges to
(III′) =
h′ (g(aj−))
· δ(h ◦ g)
Putting together these estimates and letting l → ∞, we obtain the claim.
Lemma 4.8. (i) For all g, h ∈ G with h ∈ C2 strictly increasing, the infinite product in the
definition of Y 0h (g) converges. There exists a constant C = C(β, h) such that ∀g ∈ G
≤ Y β
(g) ≤ C.
(ii) If hn → h in C2 then Y 0hn(g) → Y
h (g).
(iii) Let Y 0h,k,X
denote the sequences used in Lemma 4.6 and 4.7 to approximate Y 0h ,X
Then there exists a constant C = C(β, h) such that ∀g ∈ G, ∀k ∈ N
≤ Y βh,k(g) ≤ C.
Proof. (i) Put C = sup |(log h′)′|. Given g ∈ G and ǫ > 0, we choose k large enough such that
a∈Jg(k)
|g(a+)−g(a−)| ≤ ǫ where Jg(k) = Jg \{a1, a2, . . . , ak} denotes the ’set of small jumps’
of g. Here we enumerate the jump locations a1, a2, . . . ∈ Jg according to the size of the respective
jumps. Then with suitable ξa ∈ [g(a−), g(a+)]
a∈Jg(k)
h′(g(a−))
h′(g(a+))
δ(h◦g)
a∈Jg(k)
log h′(g(a−)) + 1
log h′(g(a−)) − log h′(ξ(a))
a∈Jg(k)
|C · (g(a+)− g(a−))| = C · ǫ.
Hence, the infinite sum
h′(g(a−))
h′(g(a+))
δ(h◦g)
= lim
a∈Jg(k)
h′(g(a−))
h′(g(a+))
δ(h◦g)
is absolutely convergent and thus also infinite product in the definition of Y 0h (g) converges. The
same arguments immediately yield
∣log Y 0h (g)
log h′(g(a−)) + 1
log h′(g(a−)) − log h′(ξ(a))
≤ C. (4.6)
(ii) In order to prove the convergence Y 0hn(g) → Y
h (g), for given g ∈ G we split the product over
all jumps into a finite product over the big jumps and an infinite product over all small jumps.
Obviously, the finite products will converge (for any choice of k)
a∈{a1,...,ak}
h′n(g(a−))
h′n(g(a+))
δ(hn◦g)
a∈{a1,...,ak}
h′(g(a−))
h′(g(a+))
δ(h◦g)
as n → ∞ provided hn → h in C2. Now let C = supn supx |(log h′n)′(x)| and choose k as before.
Then uniformly in n
a∈Jg\{a1,...,ak}
h′n(g(a−))
h′n(g(a+))
δ(hn◦g)
≤ C · ǫ.
(iii) Let C1 = sup
|h′(x)| and C2 = sup
∣(log h′)
∣. Then for all g and k:
Xh,k(g) =
h′(ηi)
ti+1−ti ≤ C1
Y 0h,k(g) =
h′ (g(ti))
h′(γi)
= exp
log h′
(ζi) · (g(ti)− γi)
≤ exp
|g(ti)− γi|
≤ exp(C2)
(with suitable γi, ηi ∈ [g(ti), g(ti+1)] and ζi ∈ [g(ti), γi]). Analogously, the lower estimates
follow.
Proof of Theorem 4.1. In order to prove the equality of the two measures under consideration, it
suffices to prove that all of their finite dimensional distributions coincide. That is, for each m ∈
N, each ordered family t1, . . . , tm of points in S
1 and each bounded continuous u : (S1)m −→ R
one has to verify that
h−1 (g(t1)) , h
−1 (g(t2)) , . . . , h
−1 (g(tm))
dQβ(g)
u (g(t1), g(t2), . . . , g(tm)) · Y βh (g) dQ
β(g).
Without restriction, we may restrict ourselves to equidistant partitions, i.e. ti =
for i =
1, . . . ,m. Let us fix m ∈ N, u and h. For simplicity, we first assume that h is C3. Then by
Lemmas 4.6 - 4.8 and Lebesgue’s theorem
, . . . , g (1)
· Y βh (g) dQ
, . . . , g (1)
· lim
(g) dQβ(g)
= lim
, . . . , g (1)
dQβ(g)
= lim
[Γ(β/km)]km
u(xk, x2k, . . . , xmk)
h′(xi) ·
[h(xi+1)− h(xi)]
dx1 . . . dxmk
= lim
[Γ(β/km)]km
u(xk, x2k, . . . , xmk) ·
[h(xi+1)− h(xi)]
dh(x1) . . . dh(xmk)
= lim
[Γ(β/km)]km
h−1(yk), h
−1(y2k), . . . , h
−1(ymk)
[yi+1 − yi]
dy1 . . . dymk
, h−1
, . . . , h−1 (g (1))
dQβ(g).
Now we treat the general case h ∈ C2. We choose a sequence of C3-functions hn ∈ G with hn → h
in C2. Then
h−1 (g(t1)) , h
−1 (g(t2)) , . . . , h
−1 (g(tm))
dQβ(g)
= lim
h−1n (g(t1)) , h
n (g(t2)) , . . . , h
n (g(tm))
dQβ(g)
= lim
u (g(t1), g(t2), . . . , g(tm)) · Y βhn(g) dQ
u (g(t1), g(t2), . . . , g(tm)) · Y βh (g) dQ
β(g).
For the last equality, we have used the dominated convergence Y
(g) → Y βh (g) (due to Lemma
4.8).
4.5 Proof for the Interval Case
The proof of Theorem 4.3 uses completely analogous arguments as in the previous section. To
simplify notation, for h ∈ C1([0, 1]), k ∈ N let Xh,k, Y 0h,k : G0 → R be defined by
Xh,k(g) :=
h (g(ti+1))− h (g(ti))
g(ti+1)− g(ti)
]ti+1−ti
Y 0h,k(g) :=
g(t1)− g(t0)
h (g(t1))− h (g(t0))
] k−1
h′ (g(ti)) ·
g(ti+1)− g(ti)
h (g(ti+1))− h (g(ti))
where ti =
with i = 0, 1, . . . , k. Similar to the proof of theorem 4.1 the measure Q
0 satisfies
the following finite dimensional quasi-invariance formula.
For any u : [0, 1]m−1 → R, m, l ∈ N and C1-isomorphism h : [0, 1] → [0, 1]
h−1 (g(t1)) , h
−1 (g(t2)) , . . . , h
−1 (g(tm−1))
0 (g)
u (g(t1), g(t2), . . . , g(tm−1)) ·Xβh,l·m(g) · Y
h,l·m(g) dQ
0 (g),
where ti =
, i = 1, · · · ,m−1. The passage to the limit for letting first l and then m to infinity
is based on the following assertions.
Lemma 4.9. (i) For each C2-isomorphism h ∈ G0 and g ∈ G0
Xh(g) = lim
Xh,k(g).
(ii) For each C3-isomorphism h ∈ G0 and g ∈ G0
Y 0h,k(g) =
h′(g(a+)) · h′(g(a−))
δ(h◦g)
h′(g(0)) · h′(g(1−))
1 if g(1−) = g(1)
h′(g(1−))
δ(h◦g)
else,
where Jg ⊂]0, 1[ is the set of jump locations of g on ]0, 1[. In particular,
Y 0h,k(g) = Yh,0(g) for Q
0 -a.e.g.
(iii) For all g ∈ G0 and C2-isomorphism h ∈ G0, the infinite product in the definition of Yh,0(g)
converges. There exists a constant C = C(β, h) such that ∀g ∈ G0
≤ Y β
(g) ≤ C.
(iv) If hn → h in C2([0, 1], [0, 1]) with h as above, then Y 0,hn(g) → Y0,h(g).
(v) For each C3-isomorphism h ∈ G0 there exists a constant C = C(β, h) such that ∀g ∈ G,
∀k ∈ N
≤ Xβh,k(g) · Y
h,k(g) ≤ C.
Proof. The proofs of (i) and (iii)-(iv) carry over from their respective counterparts on the sphere,
lemmas 4.6 and 4.8 above. We sketch the proof of statement (ii) which needs most modification.
For ε > 0 choose l ∈ N large enough and let a2, . . . , al−1 denote the l − 2 largest jumps of g
on ]0, 1[. For k very large (compared with l) we may assume that a2, . . . , al−2 ∈] 2k , 1 −
[. Put
a1 :=
, al := 1 − 1k . For j = 1, . . . , l let kj denote the index i ∈ {1, . . . , k − 1}, for which
aj ∈ [ti, ti+1[. In particular, k1 = 1 and kl = k−1. Then using the same arguments as in lemma
4.7 one obtains, for k and l sufficiently large, the two sided bounds
(I) = e2ε ·
g(tkj+1)
g(tkj+1)
i∈{1,...,k−1}\{k1,...,kl}
h′ (g(ti)) ·
g(ti+1)− g(ti)
h (g(ti+1))− h (g(ti))
≥ e−2ε · C−ε/23 ·
g(tkj+1)
g(tkj+1)
) = (II)
For fixed l and k → ∞ the bounds (I) and (II) converge to
(I′) = e2ε
h′(g(a2−))
h′(g(0))
h′ (g(aj+1−))
h′ (g(aj+))
h′(g(1−))
h′(g(al−1+))
(II′) = e−2ε · C−ε/23 ·
h′(g(a2−))
h′(g(0))
h′ (g(aj+1−))
h′ (g(aj+))
h′(g(1−))
h′(g(al−1+))
It remains to consider the three remaining terms
(III) =
i∈{k2,...,kl−1}
h′ (g(ti)) ·
g(ti+1)− g(ti)
h (g(ti+1))− h (g(ti))
which for fixed l and k → ∞ converges to
(III′) =
h′ (g(aj−))
· δ(h ◦ g)
(IV) =
)− g(0)
− h (g(0))
)− g( 1
converging by right continuity of g to
(IV′) = h′(g(0))
(V) =
k − 1
g(1) − g(k−1
h (g(1)) − h
g(k−1
which tends, also for k → ∞, to
(V′) =
1 if g continuous in 1
δ(h◦g)
(1) 1
h′(g(1−))
else.
Combining these estimates and letting l → ∞, we obtain the first claim. The second claim in
statement (ii) follows from the fact that g is continuous in t = 1 Q
0 -almost surely.
5 The Integration by Parts Formula
In order to construct Dirichlet forms and Markov processes on G, we will consider it as an infinite
dimensional manifold. For each g ∈ G, the tangent space TgG will be an appropriate completion
of the space C∞(S1,R). The whole construction will strongly depend on the choice of the norm
on the tangent spaces TgG. Basically, we will encounter two important cases:
• in Chapter 6 we will study the case TgG = Hs(S1,Leb) for some s > 1/2, independent
of g; this approach is closely related to the construction of stochastic processes on the
diffeomorphism group of S1 and Malliavin’s Brownian motion on the homeomorphism
group on S1, cf. [Mal99].
• in Chapters 7-9 we will assume TgG = L2(S1, g∗Leb); in terms of the dynamics on the
space P(S1) of probability measures, this will lead to a Dirichlet form and a stochastic
process associated with the Wasserstein gradient and with intrinsic metric given by the
Wasserstein distance.
In this chapter, we develop the basic tools for the differential calculus on G. The main result
will be an integration by parts formula. These results will be independent of the choice of the
norm on the tangent space.
5.1 The Drift Term
For each ϕ ∈ C∞(S1,R), the flow generated by ϕ is the map eϕ : R × S1 → S1 where for each
x ∈ S1 the function eϕ(., x) : R → S1, t 7→ eϕ(t, x) denotes the unique solution to the ODE
= ϕ(xt) (5.1)
with initial condition x0 = x. Since eϕ(t, x) = etϕ(1, x) for all ϕ, t, x under consideration, we
may simplify notation and write etϕ(x) instead of eϕ(t, x).
Obviously, for each ϕ ∈ C∞(S1,R) the family etϕ, t ∈ R is a group of orientation preserving,
C∞-diffeomorphism of S1. (In particular, e0 is the identity map e on S1, etϕ ◦ esϕ = e(t+s)ϕ for
all s, t ∈ R and (eϕ)−1 = e−ϕ.)
Since ∂
etϕ(x)|t=0 = ϕ(x) we obtain as a linearization for small t
etϕ(x) ≈ x+ tϕ(x). (5.2)
More precisely,
|etϕ(x)− (x+ tϕ(x))| ≤ C · t2
as well as
etϕ(x)− (1 + t
ϕ(x))| ≤ C · t2
uniformly in x and |t| ≤ 1.
For ϕ ∈ C∞(S1,R) and β > 0 we define functions V βϕ : G → R by
V βϕ (g) := V
ϕ (g) + β
ϕ′(g(x))dx
where
V 0ϕ (g) :=
ϕ′(g(a+)) + ϕ′(g(a−))
− ϕ(g(a+)) − ϕ(g(a−))
g(a+)− g(a−)
. (5.3)
Lemma 5.1. (i) The sum in (5.3) is absolutely convergent. More precisely,
|V 0ϕ (g)| ≤
ϕ′(g(a+)) + ϕ′(g(a−))
− ϕ(g(a+)) − ϕ(g(a−))
g(a+)− g(a−)
|ϕ′′(x)|dx
|V βϕ (g)| ≤ (1/2 + β) ·
|ϕ′′(x)|dx.
(ii) For each β ≥ 0
V βϕ (g) =
Y βetϕ(g)
e+tϕ(g)
. (5.4)
Proof. (i) According to Taylor’s formula, for each a ∈ Jg
ϕ′(g(a+)) + ϕ′(g(a−))
− δ(ϕ ◦ g)
(a) =
2(g(a+) − g(a−))
∫ g(a+)
g(a−)
∫ g(a+)
g(a−)
sgn(y−x) ·ϕ′′(y)dydx.
Hence,
ϕ′(g(a+)) + ϕ′(g(a−))
− δ(ϕ ◦ g)
(g(a+)− g(a−))
∫ g(a+)
g(a−)
∫ g(a+)
g(a−)
sgn(y − x) · ϕ′′(y)dydx
∫ g(a+)
g(a−)
|ϕ′′(y)|dy = 1
|ϕ′′(y)|dy.
Finally,
ϕ′(g(x))dx| ≤ sup
|ϕ′(y)| ≤
|ϕ′′(y)|dy.
(ii) Let us first consider the case β = 0.
log Y 0etϕ(g)
etϕ)(g(a+)) +
etϕ)(g(a−)) − log
δ(etϕ ◦ g)
etϕ)(g(a+)) +
etϕ)(g(a−)) − log
δ(etϕ ◦ g)
In order to justify that we may interchange differentiation and summation, we decompose (as
we did several times before) the infinite sum over all jumps in Jg into a finite sum over big
jumps a1, . . . , ak and an infinite sum over small jumps in Jg(k) = Jg \ {a1, . . . , ak}. Of course,
the finite sum will make no problem. We are going to prove that the contribution of the small
jumps is arbitrarily small. Recall from Lemma 4.8 that
a∈Jg(k)
etϕ)(g(a+)) +
etϕ)(g(a−)) − log
δ(etϕ ◦ g)
≤ Ct·
a∈Jg(k)
[g(a+)− g(a−)]
where Ct := supx
log( ∂
etϕ)(x)
∣. Now Ct ≤ C · |t| for all |t| ≤ 1 and an appropriate constant
C. Thus for any given ǫ > 0
a∈Jg(k)
etϕ)(g(a+)) +
etϕ)(g(a−)) − log
δ(etϕ ◦ g)
provided k is chosen large enough (i.e. such that C ·
a∈Jg(k)
|g(a+)−g(a−)| ≤ ǫ). This justifies
the above interchange of differentiation and summation.
Now for each x ∈ S1
etϕ(x)
= ϕ′(x)
since the linearization of etϕ for small t yields
etϕ(x) ≈ x+ tϕ(x),
etϕ(x) ≈ 1 + tϕ′(x).
Similarly, for small t we obtain
δ(etϕ ◦ g)
(a) ≈ 1 + t · δ(ϕ ◦ g)
and thus
δ(etϕ ◦ g)
δ(ϕ ◦ g)
Therefore,
log Y 0etϕ(g)
= V 0ϕ (g).
On the other hand, obviously
log Y 0etϕ(g)
Y 0etϕ(g)
since Y 0e0(g) = 1.
Finally, we have to consider the derivative of Xetϕ . Based on the previous arguments and using
the fact that ∂
(x) is uniformly bounded in t ∈ [−1, 1] and x ∈ S1 we immediately
logXetϕ(g)
(g(y))dy
(g(y)) dy =
ϕ′(g(y))dy.
Again Xe0(g) = 1. Therefore,
= β ·
ϕ′(g(y))dy
and thus
Y βetϕ(g)
= V βϕ (g).
this proves the first identity in (5.4). The proof of the second one V
ϕ (g) =
e+tϕ(g)
similar (even slightly easier).
5.2 Directional Derivatives
For functions u : G → R we will define the directional derivative along ϕ ∈ C∞(S1,R) by
Dϕu(g) := lim
[u(etϕ ◦ g)− u(g)] (5.5)
provided this limit exists. In particular, this will be the case for the following ’cylinder functions’.
Definition 5.2. We say that u : G → R belongs to the class Sk(G) if it can be written as
u(g) = U(g(x1), . . . , g(xm)) (5.6)
for some m ∈ N, some x1, . . . , xm ∈ S1 and some Ck-function U : (S1)m → R.
It should be mentioned that functions u ∈ Sk(G) are in general not continuous on G.
Lemma 5.3. The directional derivative exists for all u ∈ S1(G). In particular, for u as above
Dϕu(g) = lim
[u(g + t · ϕ ◦ g) − u(g)]
∂iU(g(x1), . . . , g(xm)) · ϕ(g(xi))
with ∂iU :=
U . Moreover, Dϕ : S
k(G) → Sk−1(G) for all k ∈ N ∪ {∞} and
‖Dϕu‖L2(Qβ) ≤
m · ‖∇U‖∞ · ‖ϕ‖L2(S1).
Proof. The first claim follows from
Dϕu(g) =
U(etϕ(g(x1)), . . . , etϕ(g(xm)))
∂iU(etϕ(g(x1)), . . . , etϕ(g(xm))) ·
etϕ(g(xi))
∂iU(g(x1), . . . , g(xm)) · ϕ(g(xi))
U(g(x1) + tϕ(g(x1)), . . . , g(xm) + tϕ(g(xm)))
= lim
[u(g + t · ϕ ◦ g)− u(g)] .
For the second claim,
‖Dϕu‖2L2(Qβ) =
∂iU(g(x1), . . . , g(xm)) · ϕ(g(xi))
dQβ(g)
(∂iU)
2(g(x1), . . . , g(xm)) ·
ϕ2(g(xi))
dQβ(g)
≤ ‖∇U‖2∞ ·
ϕ2(g(xi)) dQ
= m · ‖∇U‖2∞ ·
ϕ2(y) dy.
5.3 Integration by Parts Formula on P(S1)
For ϕ ∈ C∞(S1,R) let D∗ϕ denote the operator in L2(G,Qβ) adjoint to Dϕ with domain S1(G).
Proposition 5.4. Dom(D∗ϕ) ⊃ S1(G) and for all u ∈ S1(G)
D∗ϕu = −Dϕu− V βϕ · u. (5.7)
Proof. Let u, v ∈ S1(G). Then
Dϕu · v dQβ = lim
[u(etϕ ◦ g) − u(g)] · v(g) dQβ(g)
= lim
u(g) · v(e−tϕ ◦ g) · Y βe−tϕ − u(g) · v(g)
dQβ(g)
= lim
u(g) · [v(e−tϕ ◦ g)− v(g)] dQβ(g)
+ lim
u(g) · v(g) ·
Y βe−tϕ − 1
dQβ(g)
+ lim
u(g) · [v(e−tϕ ◦ g) − v(g)] ·
Y βe−tϕ − 1
dQβ(g)
u ·Dϕv dQβ(g)−
u · v · V βϕ dQβ(g) + 0.
To justify the last equality, note that according to Lemma 4.8 | log Y βetϕ | ≤ C · |t| for |t| ≤ 1.
Hence, the claim follows with dominated convergence and Lemma 5.4.
Corollary 5.5. The operator (Dϕ,S
1(G)) is closable in L2(Qβ). Its closure will be denoted by
(Dϕ,Dom(Dϕ)).
In other words, Dom(Dϕ) is the closure (or completion) of S
1(G) with respect to the norm
[u2 + (Dϕu)
2] dQβ
Of course, the space Dom(Dϕ) will depend on β but we assume β > 0 to be fixed for the sequel.
Remark 5.6. The bilinear form
Eϕ(u, v) :=
Dϕu ·Dϕv dQβ , Dom(Eϕ) := Dom(Dϕ) (5.8)
is a Dirichlet form on L2(G,Qβ) with form core S∞(G). Its generator (Lϕ,Dom(Lϕ)) is the
Friedrichs extension of the symmetric operator
(−D∗ϕ ◦Dϕ, S2(G)).
5.4 Derivatives and Integration by Parts Formula on P([0, 1])
Now let us have a look on flows on [0, 1]. To do so, let a function ϕ ∈ C∞([0, 1],R) with
ϕ(0) = ϕ(1) = 0 be given. (Note that each such function can be regarded as ϕ ∈ C∞(S1,R)
with ϕ(0) = 0.) The flow equation (5.1) now defines a flow etϕ, t ∈ R, of order preserving C∞
diffeomorphisms of [0, 1]. In particular, etϕ(0) = 0 and etϕ(1) = 1 for all t ∈ R.
Lemma 5.1 together with Theorem 4.3 immediately yields
Lemma 5.7. For ϕ ∈ C∞([0, 1],R) with ϕ(0) = ϕ(1) = 0 and each β ≥ 0
etϕ,0
= V βϕ (g)−
ϕ′(0) + ϕ′(1)
ϕ,0(g). (5.9)
For functions u : G0 → R we will define the directional derivative along ϕ ∈ C∞([0, 1],R) with
ϕ(0) = ϕ(1) = 0 as before by
Dϕu(g) := lim
[u(etϕ ◦ g)− u(g)] (5.10)
provided this limit exists. We will consider three classes of ’cylinder functions’ for which the
existence of this limit is guaranteed.
Definition 5.8. (i) We say that a function u : G0 → R belongs to the class Ck(G0) (for k ∈
N ∪ {0,∞}) if it can be written as
u(g) = U
~f(t)g(t)dt
(5.11)
for some m ∈ N, some ~f = (f1, . . . , fm) with fi ∈ L2([0, 1],Leb) and some Ck-function U : Rm →
R. Here and in the sequel, we write
~f(t)g(t)dt =
f1(t)g(t)dt, . . . ,
fm(t)g(t)dt
(ii) We say that u : G0 → R belongs to the class Sk(G0) if it can be written as
u(g) = U (g(x1), . . . , g(xm)) (5.12)
for some m ∈ N, some x1, . . . , xm ∈ [0, 1] and some Ck-function U : Rm → R.
(iii) We say that u : G0 → R belongs to the class Zk(G0) if it can be written as
u(g) = U
~α(gs)ds
(5.13)
with U as above, ~α = (α1, . . . , αm) ∈ Ck([0, 1],Rm) and
~α(gs)ds =
α1(gs)ds, . . . ,
αm(gs)ds
Remark 5.9. For each ϕ ∈ C∞(S1,R) with ϕ(0) = 0 (which can be regarded as ϕ ∈ C∞([0, 1],R)
with ϕ(0) = ϕ(1) = 0), the definitions of Dϕ in (5.5) and (5.10) are consistent in the following
sense. Each cylinder function u ∈ S1(G0) defines by v(g) := u(g − g0) (∀g ∈ G) a cylinder
function v ∈ S1(G) with Dϕv = Dϕu on G0. Conversely, each cylinder function v ∈ S1(G)
defines by u(g) := v(g) (∀g ∈ G0) a cylinder function u ∈ S1(G0) with Dϕv = Dϕu on G0.
Lemma 5.10. (i) The directional derivative Dϕu(g) exists for all u ∈ C1(G0)∪S1(G0)∪Z1(G0)
(in each point g ∈ G0 and in each direction ϕ ∈ C∞([0, 1],R) with ϕ(0) = ϕ(1) = 0) and
Dϕu(g) = limt→0
[u(g + t · ϕ ◦ g)− u(g)]. Moreover,
Dϕu(g) =
~f(t)g(t)dt
fi(t)ϕ(g(t))dt
for each u ∈ C1(G0) as in (5.11),
Dϕu(g) =
∂iU(g(x1), . . . , g(xm)) · ϕ(g(xi))
for each u ∈ S1(G0) as in (5.12), and
Dϕu(g) =
~α(gs)ds
α′i(gs)ϕ(gs)ds
for each u ∈ Z1(G0) as in (5.13).
(ii) For ϕ ∈ C∞([0, 1],R) with ϕ(0) = ϕ(1) = 0 let D∗ϕ,0 denote the operator in L2(G0,Q
adjoint to Dϕ. Then for all u ∈ C1(G0) ∪S1(G0) ∪ Z1(G0)
D∗ϕ,0u = −Dϕu− V
ϕ,0 · u. (5.14)
Proof. See the proof of the analogous results in Lemma 5.3 and Proposition 5.4.
Remark 5.11. The operators (Dϕ,C
1(G0)), (Dϕ,S1(G0)), and (Dϕ,Z1(G0)) are closable in
0 ). The closures of (Dϕ,C
1(G0)), (Dϕ,Z1(G0)) and (Dϕ,S1(G0)) coincide. They will be
denoted by (Dϕ,Dom(Dϕ)). See (proof of) Corollary 6.11.
6 Dirichlet Form and Stochastic Dynamics on on G
At each point g ∈ G, the directional derivative Dϕu(g) of any ’nice’ function u on G defines
a linear form ϕ 7→ Dϕu(g) on C∞(S1). If we specify a pre-Hilbert norm ‖.‖g on C∞(S1) for
which this linear form is continuous then there exists a unique element Du(g) ∈ TgG with
Dϕu(g) = 〈Du(g), ϕ〉g for all ϕ ∈ C∞(S1). Here TgG denotes the completion of C∞(S1) w.r.t.
the norm ‖.‖g.
The canonical choice of a Dirichlet form on G will then be (the closure of)
E(u, v) =
〈Du(g),Dv(g)〉g dQβ(g), u, v ∈ S1(G). (6.1)
Given such a Dirichlet form, there is a straightforward procedure to construct an operator (’gen-
eralized Laplacian’) and a Markov process (’generalized Brownian motion’). Different choices of
‖.‖g in general will lead to completely different Dirichlet forms, operators and Markov processes.
We will discuss in detail two choices: in this chapter we will choose ‖.‖g (independent of g) to
be the Sobolev norm ‖.‖Hs for some s > 1/2; in the remaining chapters, ‖.‖g will always be the
L2-norm ϕ 7→ (
ϕ(gt)
2dt)1/2 of L2(S1, g∗Leb).
For the sequel, fix – once for ever – the number β > 0 and drop it from the notations, i.e.
Q := Qβ, Vϕ := V
ϕ etc.
6.1 The Dirichlet Form on G
Let (ψk)k∈N denote the standard Fourier basis of L
2(S1). That is,
ψ2k(x) =
2 · sin(2πkx), ψ2k+1(x) =
2 · cos(2πkx)
for k = 1, 2, . . . and ψ1(x) = 1. It constitutes a complete orthonormal system in L
2(S1): each
ϕ ∈ L2(S1) can uniquely be written as ϕ(x) =
k=1 ck · ψk(x) with Fourier coefficients of ϕ
given by ck :=
ϕ(y)ψk(y)dy. In terms of these Fourier coefficients we define for each s ≥ 0
the norm
‖ϕ‖Hs :=
c21 +
k2s · (c22k + c22k+1)
(6.2)
on C∞(S1). The Sobolev space Hs(S1) is the completion of C∞(S1) with respect to the norm
‖.‖Hs . It has a complete orthonormal system consisting of smooth functions (ϕk)k∈N. For
instance, one may choose
ϕ2k(x) =
2 · k−s · sin(2πkx), ϕ2k+1(x) =
2 · k−s · cos(2πkx) (6.3)
for k = 1, 2, . . . and ϕ1(x) = 1.
A linear form A : C∞(S1) → R is continuous w.r.t. ‖.‖Hs — and thus can be represented as
A(ϕ) = 〈ψ,ϕ〉Hs for some ψ ∈ Hs(S1) with ‖ψ‖Hs = ‖A‖Hs — if and only if
‖A‖Hs :=
|A(ψ1)|2 +
k2s · (|A(ψ2k)|2 + |A(ψ2k+1)|2)
<∞. (6.4)
Proposition 6.1. Fix a number s > 1/2. Then for each cylinder function u ∈ S(G) and each
g ∈ G, the directional derivative defines a continuous linear form ϕ 7→ Dϕu(g) on C∞(S1) ⊂
Hs(S1). There exists a unique tangent vector Du(g) ∈ Hs(S1) such that Dϕu(g) = 〈Du(g), ϕ〉Hs
for all ϕ ∈ C∞(S1).
In terms of the family Φ = (ϕk)k∈N from (6.3)
Du(g) =
Dϕku(g) · ϕk(.)
‖Du(g)‖2Hs =
|Dϕku(g)|2. (6.5)
Proof. It remains to prove that the RHS of (6.5) is finite for each u and g under consideration.
According to Lemma 5.3, for any u ∈ S(G) represented as in (5.12)
|Dϕku(g)|2 =
∂iU(g(x1), . . . , g(xm)) · ϕk(g(xi))
≤ m · ‖∇U‖2∞ · ‖
ϕ2k‖∞ = m · ‖∇U‖2∞ · (1 + 4
k−2s).
And, indeed, the latter is finite for each s > 1/2.
For the sequel, let us now fix a number s > 1/2 and define
E(u, v) =
〈Du(g),Dv(g)〉Hs dQ(g) (6.6)
for u, v ∈ S1(G). Equivalently, in terms of the family Φ = (ϕk)k∈N from (6.3)
E(u, v) =
Dϕku(g) ·Dϕkv(g) dQ(g). (6.7)
Theorem 6.2. (i) (E ,S1(G)) is closable. Its closure (E ,Dom(E)) is a regular Dirichlet form
on L2(G,Q) which is strongly local and recurrent (hence, in particular, conservative).
(ii) For u ∈ S1(G) with representation (5.6)
E(u, u) =
∂iU(g(x1), . . . , g(xm)) · ϕk(g(xi))
dQ(g).
The generator of the Dirichlet form is the Friedrichs extension of the operator L given on S2(G)
Lu(g) =
i,j=1
∂i∂jU (g(x1), . . . , g(xm))ϕk(g(xi))ϕk(g(xj))
∂iU (g(x1), . . . , g(xm)) [ϕ
k(g(xi)) + Vϕk(g)]ϕk(g(xi)).
(iii) Z1(G) is a core for Dom(E) (i.e. it is contained in the latter as a dense subset). For
u ∈ Z1(G) with representation (5.13)
E(u, u) =
~α(gt)dt) ·
α′i(gt)ϕk(gt)dt
dQ(g).
The generator of the Dirichlet form is the Friedrichs extension of the operator L given on Z2(G)
Lu(g) =
i,j=1
∂i∂jU
~α(gt)dt
α′i(gt)ϕk(gt)dt ·
α′j(gt)ϕk(gt)dt
~α(gt)dt
{Vϕk(g) +
[α′′i (gt)ϕ
k(gt) + α
i(gt)ϕ
k(gt)ϕk(gt)]dt}.
(iv) The intrinsic metric ρ can be estimated from below in terms of the L2-metric:
ρ(g, h) ≥ 1√
‖g − h‖L2 .
Remark 6.3. All assertions of the above Theorem remain valid for any E defined as in (6.7)
with any choice of a sequence Φ = (ϕk)k∈N of smooth functions on S
1 with
C := ‖
ϕ2k‖∞ <∞. (6.8)
(This condition is satisfied for the sequence from (6.3) if and only if s > 1/2.)
The proof of the Theorem will make use of the following
Lemma 6.4. (i) Dom(E) contains all functions u which can be represented as
u(g) = U(‖g − f1‖L2 , . . . , ‖g − fm‖L2) (6.9)
with some m ∈ N, some f1, . . . , fm ∈ G and some U ∈ C1(Rm,R).
For each u as above, each ϕ ∈ C∞(S1) and Q-a.e. g ∈ G
Dϕu(g) =
∂iU(‖g − f1‖L2 , . . . , ‖g − fm‖L2) ·
sign(g(t)− fi(t))
|g(t) − fi(t)|
‖g − fi‖L2
ϕ(g(t))dt
where sign(z) := +1 for z ∈ S1 with |[0, z]| ≤ 1/2 and sign(z) := −1 for z ∈ S1 with |[z, 0]| < 1/2.
(ii) Moreover, Dom(E) contains all functions u which can be represented as
u(g) = U(gǫ1(x1), . . . , gǫm(xm)) (6.10)
with some m ∈ N, some x1, . . . , xm ∈ S1, some ǫ1, . . . , ǫm ∈ ]0, 1[ and some U ∈ C1((S1)m,R).
Here gǫ(x) :=
∫ x+ǫ
g(t)dt ∈ S1 for x ∈ S1 and 0 < ǫ < 1. More precisely,
gǫ(x) := π(
∫ x+ǫ
π−1g(t)dt)
where π : G(R) → G (cf. section 2.2) denotes the projection and π−1 : G → G(R) the canonical
lift with π−1(g)(t) ∈ [g(x), g(x) + 1] ⊂ R for t ∈ [x, x+ 1] ⊂ R.
For each u as above, each ϕ ∈ C∞(S1) and each g ∈ G
Dϕu(g) =
∂iU(gǫ1(x1), . . . , gǫm(xm)) ·
∫ xi+ǫi
ϕ(g(t))dt.
(iii) The set of all u of the form (6.10) is dense in Dom(E).
Proof. (i) Let us first prove that for each f ∈ G, the map u(g) = ‖g − f‖L2 lies in Dom(E). For
n ∈ N, let πn : G → G be the map which replaces each g by the piecewise constant map:
πn(g)(t) := g(
) for t ∈ [ i
Then by right continuity πn(g) → g as n→ ∞ and thus
|g( i
)− f( i
)|2 −→
|g(t) − f(t)|2dt.
Therefore, for each g ∈ G as n→ ∞
un(g) := Un(g(0), g(
), . . . , g(
n − 1
)) −→ u(g) (6.11)
where Un(x1, . . . , xn) :=
i=0 dn(xi+1 − f( in))
and dn is a smooth approximation of
the distance function x 7→ |x| on S1 (which itself is non-differentiable at x = 0 and x = 1
) with
|d′n| ≤ 1 and dn(x) → |x| as n→ ∞. Obviously, un ∈ S1(G).
By dominated convergence, (6.11) also implies that un → u in L2(G,Q). Hence, u ∈ Dom(E) if
(and only if) we can prove that
E(un) <∞.
E(un) =
∂iUn(g(0), g(
), . . . , g(
n − 1
)) · ϕ(g( i − 1
dQ(g)
ϕ2k(g(
i − 1
)) dQ(g) =
‖ϕk‖2L2 <∞,
uniformly in n ∈ N. This proves the claim for the function u(g) = ‖g − f‖L2 .
From this, the general claim follows immediately: if vn, n ∈ N, is a sequence of S1(G) approx-
imations of g 7→ ‖g − 0‖L2 then un(g) := U(vn(g − f1), . . . , vn(g − fm)) defines a sequence of
S1(G) approximations of u(g) = U(‖g − f1‖L2 , . . . , ‖g − fm‖L2).
(ii) Again it suffices to treat the particular case m = 1 and U = id, that is, u(g) = gǫ(x)
for some x ∈ S1 and some 0 < ǫ < 1. Let g̃ ∈ G(R) be the lifting of g and recall that
u(g) = π(1
∫ x+ǫ
g̃(t)dt). Define un ∈ S1(G) for n ∈ N by un(g) = π( 1n
i=0 g̃(x +
ǫ)). Right
continuity of g̃ implies un → u as n → ∞ pointwise on G and thus also in L2(G,Q). To see the
boundedness of E(un) note that Dϕun(g) = 1n
i=0 ϕ(g(x +
ǫ)). Thus
E(un) ≤
ϕ2k(g(x +
ǫ))dQ(g) =
‖ϕk‖2L2 <∞.
(iii) We have to prove that each u ∈ S1(G) can be approximated in the norm (‖.‖2 + E(.))1/2 by
functions un of type (6.10). Again it suffices to treat the particular case u(g) = g(x) for some
x ∈ S1. Choose un(g) = g1/n(x). Then by right continuity of g, un → u pointwise on G and
thus also in L2(G,Q). Moreover, Dϕun(g) = n
∫ x+1/n
ϕ(g(t))dt (for all ϕ and g) and therefore
E(un) ≤
∫ x+1/n
ϕ2k(g(t))dtdQ(g) =
‖ϕk‖2L2 <∞.
Proof of the Theorem. (a) The sum E of closable bilinear forms with common domain S1(G)
is closable, provided it is still finite on this domain. The latter will follow by means of Lemma
5.3 which implies for all u ∈ S1(G) with representation (5.11)
E(u, u) =
∂iU(g(x1), . . . , g(xm)) · ϕk(g(xi))
dQ(g)
≤ m · ‖∇U‖2∞ ·
‖ϕk‖2L2(S1) <∞.
Hence, indeed E is finite on S1(G).
(b) The Markov property for E follows from that of the Eϕk(u, v) =
Dϕku ·Dϕkv dQ.
(c) According to the previous Lemma, the class of continuous functions of type (6.10) is dense
in Dom(E). Moreover, the class of finite energy functions of type (6.9) is dense in C(G) (with
the L2 topology of G ⊂ L2(S1), cf. Proposition 2.1). Therefore, the Dirichlet form E is regular.
(e) The estimate for the intrinsic metric is an immediate consequence of the following estimate
for the norm of the gradient of the function u(g) = ‖g − f‖L2 (which holds for each f ∈ G
uniformly in g ∈ G):
‖Du(g)‖2 =
sign(g(t)− fi(t))
|g(t) − fi(t)|
‖g − fi‖L2
ϕk(g(t))dt
ϕ2k(g(t))dt ≤ ‖
ϕ2k‖∞ =: C.
(f) The locality is an immediate consequence of the previous estimate: Given functions u, v ∈
Dom(E) with disjoint supports, one has to prove that E(u, v) = 0. Without restriction, one may
assume that supp[u] ⊂ Br(g) and supp[v] ⊂ Br(h) with ‖g − h‖L2 > 2r + 2δ. (The general
case will follow by a simple covering argument.) Without restriction, u, v can be assumed to be
bounded. Then |u| ≤ Cwδ,g and |v| ≤ Cwδ,h for some constant C where
wδ,g(f) =
(r + δ − ‖f − g‖L2) ∧ 1
Given un ∈ S1(G) with un → u in Dom(E) put
un = (un ∧ wδ,g) ∨ (−wδ,g).
Then un → u in Dom(E). Analogously, vn → v in Dom(E) for vn = (vn ∧ wδ,h) ∨ (−wδ,h). But
obviously, E(un, vn) = 0 since un · vn = 0. Hence, E(u, v) = 0.
(g) In order to prove that Z1(G) is contained in Dom(E) it suffices to prove that each u ∈ Z1(G)
of the form u(g) =
α(gt)dt can be approximated in Dom(E) by un ∈ S1(G). Given u as above
with α ∈ C1(S1,R) put un(g) = 1n
i=1 α(gi/n). Then un ∈ S1(G), un → u on G and
Dϕun(g) =
α′(gi/n)ϕ(gi/n) →
α′(gt)ϕ(gt)dt = Dϕu(g).
Moreover,
E(un, un) =
α′(gi/n)ϕ(gi/n)
dQ(g)
≤ C ·
α′(gi/n)
2 dQ(g) = C ·
α′(t)2dt
uniformly in n ∈ N. Hence, u ∈ Dom(E) and
E(u, u) = lim
E(un, un) =
α′(gt)ϕk(gt)dt
dQ(g).
(h) The set Z1(G) is dense in Dom(E) since according to assertion (ii) of the previous Lemma
already the subset of all u of the form (6.10) is dense in Dom(E).
Finally, one easily verifies that Z2(G) is dense in Z1(G) and (using the integration by parts
formula) that L is a symmetric operator on Z2(G) with the given representation.
Corollary 6.5. There exists a strong Markov process (gt)t≥0 on G, associated with the Dirichlet
form E. It has continuous trajectories and it is reversible w.r.t. the measure Q. Its generator
has the form
DϕkDϕk +
Vϕk ·Dϕk
with {ϕk}k∈N being the Fourier basis of Hs(S1).
Remark 6.6. This process (gt)t≥0 is closely related to the stochastic processes on the diffeo-
morphism group of S1 and to the ’Brownian motion’ on the homeomorphism group of S1,
studied by Airault, Fang, Malliavin, Ren, Thalmaier and others [AMT04, AM06, AR02, Fan02,
Fan04, Mal99]. These are processes with generator 1
kDϕkDϕk . For instance, in the
case s = 3/2 our process from the previous Corollary may be regarded as ’Brownian motion
plus drift’. All the previous approaches are restricted to s ≥ 3/2. The main improvements of
our approach are:
• identification of a probability measure Q such that these processes — after adding a
suitable drift — are reversible;
• construction of such processes in all cases s > 1/2.
6.2 Finite Dimensional Noise Approximations
In the previous section, we have seen the construction of the diffusion process on G under minimal
assumptions. However, the construction of the process is rather abstract. In this section, we try
to construct explicitly a diffusion process associated with the generator of the Dirichlet form E
from Theorem 6.2. Here we do not aim for greatest generality.
Let a finite family Φ = (ϕk)k=1,...,n of smooth functions on S
1 be given and let (Wt)t≥0 with
Wt = (W
t , . . . ,W
t ) be a n-dimensional Brownian motion, defined on some probability space
(Ω,F ,P). For each x ∈ S1 we define a stochastic processes (ηt(x))t≥0 with values in S1 as the
strong solution of the Ito differential equation
dηt(x) =
ϕk(ηt(x))dW
ϕ′k(ηt(x))ϕk(ηt(x))dt (6.12)
with initial condition η0(x) = x. Equation (6.12) can be rewritten in Stratonovich form as
follows
dηt(x) =
ϕk(ηt(x)) ⋄ dW kt . (6.13)
Obviously, for every t and for P-a.e. ω ∈ Ω, the function x 7→ ηt(x, ω) is an element of the
semigroup G. (Indeed, it is a C∞-diffeomorphism.) Thus (6.13) may also be interpreted as a
Stratonovich SDE on the semigroup G:
dηt =
ϕk(ηt) ⋄ dW kt , η0 = e. (6.14)
This process on G is right invariant: if gt denotes the solution to (6.14) with initial condition
g0 = g for some initial condition g ∈ G then gt = ηt ◦ g. One easily verifies that the generator
of this process (gt)t≥0 is given on S
2(G) by 1
k=1DϕkDϕk . What we aim for, however, is a
process with generator
D∗ϕkDϕk =
DϕkDϕk +
Vϕk ·Dϕk .
Define a new probability measure Pg on (Ω,F), given on Ft by
dPg = exp
Vϕk(ηs ◦ g)dW ks −
|Vϕk(ηs ◦ g)|2ds
dP (6.15)
and a semigroup (Pt)t≥0 acting on bounded measurable functions u on G as follows
Ptu(g) =
u(ηt(g(.), ω)) dP
g(ω).
Proposition 6.7. (Pt)t≥0 is a strongly continuous Markov semigroup on G. Its generator is an
extension of the operator 1
L = −1
Dϕk with domain S
2(G). That is, for all u ∈ S2(G)
and all g ∈ G
(Ptu(g)− u(g)) =
Lu(g). (6.16)
Proof. The strong continuity follows easily from the fact that ηt(x, .) → x a.s. as t → 0 which
implies by dominated convergence
Ptu(g) =
u(ηt ◦ g) dPg → u(g)
for each continuous u : G → R.
Now we aim for identifying the generator. According to Girsanov’s theorem, under the measure
Pg the processes
W̃ kt =W
Vϕk(ηs ◦ g)ds
for k = 1, . . . , n will define n independent Brownian motions. In terms of these driving processes,
(6.12) can be reformulated as
dgt(x) =
ϕk(gt(x))dW̃
[ϕ′k(gt(x)) + Vϕk(gt)]ϕk(gt(x))dt (6.17)
(recall that gs = ηs ◦ g). The chain rule applied to a smooth function U on (S1)m, therefore,
yields
dU (gt(y1), . . . , gt(ym))
U (gt(y1), . . . , gt(ym)) dgt(yi)
i,j=1
∂xi∂xj
U (gt(y1), . . . , gt(ym)) d〈g.(yi), g.(yj)〉t
U (gt(y1), . . . , gt(ym))ϕk(gt(yi))dW̃
U (gt(y1), . . . , gt(ym)) [ϕ
k(gt(yi)) + Vϕk(gt)]ϕk(gt(yi))dt
i,j=1
∂xi∂xj
U (gt(y1), . . . , gt(ym))ϕk(gt(yi))ϕk(gt(yj))dt.
Hence, for a cylinder function of the form u(g) = U(g(y1), . . . , g(ym)) we obtain
(Ptu(g) − u(g))
= lim
[U (gt(y1), . . . , gt(ym))− U (g0(y1), . . . , g0(ym))] dPg
= lim
U (gs(y1), . . . , gs(ym)) [ϕ
k(gs(yi)) + Vϕk(gs)]ϕk(gs(yi))
i,j=1
∂xi∂xj
U (gs(y1), . . . , gs(ym))ϕk(gs(yi))ϕk(gs(yj))
 ds dPg
U (g(y1), . . . , g(ym)) [ϕ
k(g(yi)) + Vϕk(g)]ϕk(g(yi))
i,j=1
∂xi∂xj
U (g(y1), . . . , g(ym))ϕk(g(yi))ϕk(g(yj))
[DϕkDϕku(g) + Vϕk(g) ·Dϕku(g)] = −
D∗ϕkDϕku(g).
In order to justify (∗), we have to verify continuity in s in all the expressions preceding (∗). The
only term for which this is not obvious is Vϕk(gs). But gs = ηs ◦g with a function ηs(x, ω) which
is continuous in x and in s. Thus Vϕk(ηs(., ω) ◦ g) is continuous in s.
Remark 6.8. All the previous argumentations in principle also apply to infinite families of
(ϕk)k=1,2,..., provided they have sufficiently good integrability properties. For instance, the
family (6.3) with s > 5
will do the job. There are three key steps which require a careful
verification:
• the solvability of the Ito equation (6.12) and the fact that the solutions are homeomor-
phisms of S1; here s ≥ 3
suffices, cf. [Mal99];
• the boundedness of the quadratic variation of the drift to justify Girsanov’s transformation
in (6.15); for s > 5
this will be satisfied since Lemma 5.1 implies (uniformly in g)
|Vϕk(g)|
2 ≤ (β + 1)2
|ϕ′′k(x)|2dx ≤ 4(β + 1)2
k4−2s;
• the finiteness of the generator and Ito’s chain rule for C2-cylinder functions; here s > 3
will be sufficient.
Remark 6.9. Another completely different approximation of the process (gt)t≥0 in terms of
finite dimensional SDEs is obtained as follows. For N ∈ N, let S1N denote the set of cylinder
functions u : G → R which can be represented as u(g) = U(g(1/N), g(2/N), . . . , g(1)) for some
U ∈ C1((S1)N ). Denote the closure of (E ,S1N ) by (EN ,Dom(EN )). It is the image of the
Dirichlet form (EN ,Dom(EN )) on ΣN ⊂ (S1)N given by
EN (U) =
i,j=1
∂iU(x)∂jU(x) aij(x)ρ(x) dx (6.18)
aij(x) =
ϕk(xi)ϕk(xj), ρ(x) =
Γ(β/N)N
(xi+1 − xi)β/N−1dx.
and (as before) ΣN =
(x1, . . . , xN ) ∈ (S1)N :
i=1 |[xi, xi+1]| = 1
. That is,
EN (u) = EN (U)
for cylinder functions u ∈ S1N as above. Let (Xt,Px)t≥0,x∈ΣN be the Markov process on ΣN
associated with EN . Then the semigroup associated with EN is given by
TNt u(g) = Eg(1/N),...,g(1) [U(Xt)] .
Now let (gt,Pg)t≥0,g∈G and (Tt)t≥0 denote the Markov process and the L
2-semigroup associated
with E . Then as N → ∞
t → Tt strongly in L2
since
E2N ց E
in the sense of quadratic forms, [RS80], Theorem S.16. (Note that ∪N∈NS12N is dense inDom(E).)
6.3 Dirichlet Form and Stochastic Dynamics on G1 and P
In order to define the derivative of a function u : G1 → R we regard it as a function ũ on G with
the property ũ(g) = ũ(g ◦θz) for all z ∈ S1. This implies that Dϕũ(g) = (Dϕũ)(g ◦θz) whenever
one of these expressions is well-defined. In other words, Dϕũ defines a function on G1 which will
be denoted by Dϕu and called the directional derivative of u along ϕ.
Corollary 6.10. (i) Under assumption (6.8), with the notations from above,
E(u, u) =
|Dϕku|
2 dQ.
defines a regular, strongly local, recurrent Dirichlet form on L2(G1,Q).
(ii) The Markov process on G analyzed in the previous section extends to a (continuous, re-
versible) Markov process on G1.
In order to see the second claim, let g, g̃ ∈ G with g̃ = g ◦ θz for some z ∈ S1. Then obviously,
g̃t(., ω) = ηt(g̃(.), ω) = ηt(g(. + z), ω) = gt(., ω) ◦ θz.
Moreover,
Pg̃ = Pg
since Vϕ(g ◦ θz) = Vϕ(g) for all ϕ under consideration and all z ∈ S1.
The objects considered previously – derivative, Dirichlet form and Markov process on G1 – have
canonical counterparts on P. The key to these new objects is the bijective map χ : G1 → P.
The flow generated by a smooth ’tangent vector’ ϕ : S1 → R through the point µ ∈ P will be
given by ((etϕ)∗µ)t∈R. In these terms, the directional derivative of a function u : P → R at the
point µ ∈ P in direction ϕ ∈ C∞(S1,R) can be expressed as
Dϕu(µ) = lim
[u((etϕ)∗µ)− u(µ)] ,
provided this limit exists. The adjoint operator to Dϕ in L
2(P,P) is given (on a suitable dense
subspace) by
D∗ϕu(µ) = −Dϕ(µ)− Vϕ(χ−1(µ)) · u(µ).
The drift term can be represented as
−1(µ)) = β
ϕ′(s)µ(ds) +
I∈gaps(µ)
ϕ′(I−) + ϕ
′(I+)
− ϕ(I+)− ϕ(I−)|I|
Given a sequence Φ = (ϕk)k∈N of smooth functions on S
1 satisfying (6.8), we obtain a (regular,
strongly local, recurrent) Dirichlet form E on L2(P,P) by
E(u, u) =
|Dϕku(µ)|2dP(µ). (6.19)
It is the image of the Dirichlet form defined in (6.7) under the map χ. The generator of E is
given on an appropriate dense subspace of L2(P,P) by
L = −
D∗ϕkDϕk . (6.20)
For P-a.e. µ0 ∈ P, the associated Markov process (µt)t≥0 on P starting in µ0 is given as
µt(ω) = gt(ω)∗Leb
where (gt)t≥0 is the process on G, starting in g0 := χ−1(µ0). (As mentioned before, (gt)t≥0 admits
a more direct construction provided we restrict ourselves to a finite sequence Φ = (ϕk)k=1,...,n.)
6.4 Dirichlet Form and Stochastic Dynamics on G0 and P0
For s > 0 and ϕ : [0, 1] → R let the Sobolev norm ‖ϕ‖Hs be defined as in (6.2) and let Hs0([0, 1])
denote the closure of C∞c (]0, 1[), the space of smooth ϕ : [0, 1] → R with compact support
in ]0, 1[. If s ≥ 1/2 (which is the only case we are interested in) Hs0([0, 1]) can be identified
with {ϕ ∈ Hs([0, 1]) : ϕ(0) = ϕ(1) = 0} or equivalently with {ϕ ∈ Hs(S1) : ϕ(0) = 0}.
For the sequel, fix s > 1/2 and a complete orthonormal basis Φ = {ϕk}k∈N of Hs0([0, 1]) with
C := ‖
k‖∞ <∞, and define
E0(u, u) =
|Dϕk ,0u(g)|2 dQ0(g).
Corollary 6.11. (E0,S1(G0)), (E0,Z1(G0)) and (E0,C1(G0)) are closable. Their closures coincide
and define a regular, strongly local, recurrent Dirichlet form (E0,Dom(E0)) on L2(G0,Q0).
Proof. For the closability (and the equivalence of the respective closures) of (E0,S1(G0)) and
(E0,Z1(G0)), see the proof of Theorem 6.2. Also all the assertions on the closure are deduced in
the same manner. For the closability of (E0,C1(G0)) (and the equivalence of its closure with the
previously defined closures), see the proof of Theorem 7.8 below.
As explained in the previous subsection, these objects (invariant measure, derivative, Dirichlet
form and Markov process) on G0 have canonical counterparts on P0 defined by means of the
bijective map χ : G0 → P0.
7 The Canonical Dirichlet Form on the Wasserstein Space
7.1 Tangent Spaces and Gradients
The aim of this chapter is to construct a canonical Dirichlet form on the L2-Wasserstein space
P0. Due to the isometry χ : G0 → P0 this is equivalent to construct a canonical Dirichlet form
on the metric space (G0, ‖.‖L2). This can be realized in two geometric settings which seem to be
completely different:
• Like in the preceding two chapters, G0 can be considered as a group, with composition of
functions as group operation. The tangent space TgG0 is the closure (w.r.t. some norm)
of the space of smooth functions ϕ : [0, 1] → R with ϕ(0) = ϕ(1) = 0. Such a function ϕ
induces a flow on G0 by (g, t) 7→ etϕ ◦ g ≈ g + t ϕ ◦ g and it defines a directional derivative
by Dϕu(g) = limt→0
[u(etϕ ◦ g)−u(g)] for u : G0 → R. The norm on TgG0 we now choose
to be ‖ϕ‖Tg := (
ϕ(gs)
2ds)1/2. That is,
TgG0 := L2([0, 1], g∗Leb).
For given u and g as above, a gradient Du(g) ∈ TgG0 exists with
Dϕu(g) = 〈Du(g), ϕ〉Tg (∀ϕ ∈ Tg)
if and only if supϕ
Dϕu(g)
‖ϕ◦g‖
• Alternatively, we can regard G0 as a closed subset of the space L2([0, 1],Leb). The lin-
ear structure of the latter (with the pointwise addition of functions as group operation)
suggests to choose as tangent space
TgG0 := L2([0, 1],Leb).
An element f ∈ TgG0 induces a flow by (g, t) 7→ g+tf and it defines a directional derivative
(’Frechet derivative’) by Dfu(g) = limt→0
[u(g + tf) − u(g)] for u : G0 → R, provided u
extends to a neighborhood of G0 in L2([0, 1],Leb) or the flow (induced by f) stays within
G0. A gradient Du(g) ∈ TgG0 exists with
Dfu(g) = 〈Du(g), f〉L2 (∀ϕ ∈ L2)
if and only if supf
Dfu(g)
<∞. In this case, Du(g) is the usual L2-gradient.
Fortunately, both geometric settings lead to the same result.
Lemma 7.1. (i) For each g ∈ G0, the map ιg : ϕ 7→ ϕ ◦ g defines an isometric embedding
of TgG0 = L2([0, 1], g∗Leb) into TgG0 = L2([0, 1],Leb). For each (smooth) cylinder function
u : G0 → R
Dϕu(g) = Dϕ◦gu(g).
If Du ∈ L2(Leb) exists then Du ∈ L2(g∗Leb) also exists.
(ii) For Q0-a.e. g ∈ G0, the above map ιg : TgG0 → TgG0 is even bijective. For each u as above
Du(g) = Du(g) ◦ g−1 and
‖Du(g)‖Tg = ‖Du(g)‖Tg .
Proof. (i) is obvious, (ii) follows from the fact that for Q0-a.e. g ∈ G0 the generalized inverse
g−1 is continuous and thus g−1(gt) = t for all t (see sections 3.5 and 2.1). Hence, the map
ιg : TgG0 → TgG0 is surjective: for each f ∈ TgG0
ιg(f ◦ g−1) = f ◦ g−1 ◦ g = f.
Example 7.2. (i) For each u ∈ Z1(G0) of the form u(g) = U(
~α(gt)dt) with U ∈ C1(Rm,R)
and ~α = (α1, . . . , αm) ∈ C1([0, 1],Rm), the gradients Du(g) ∈ TgG0 = L2([0, 1], g∗Leb) and
Du(g) ∈ TgG0 = L2([0, 1],Leb) exist:
Du(g) =
~α(gt)dt) · α′i(g(.)), Du(g) =
~α(gt)dt) · α′i(.)
and their norms coincide:
‖Du(g)‖2Tg = ‖Du(g)‖
~α(gt)dt) · α′i(g(s))
(ii) For each u ∈ C1(G0) of the form u(g) = U(
~f(t)g(t)dt) with U ∈ C1(Rm,R) and ~f =
(f1, . . . , fm) ∈ L2([0, 1],Rm), the gradient
Du(g) =
~f(t)g(t)dt) · αi(.) ∈ L2([0, 1],Leb)
exists and
‖Du(g)‖2Tg =
~f(t)g(t)dt) · fi(s)
For u ∈ C1(G0) ∪ Z1(G0), the gradient Du can be regarded as a map G0 × [0, 1] → R, (g, t) 7→
Du(g)(t). More precisely,
D : C1(G0) ∪ Z1(G0) → L2(G0 × [0, 1],Q0 ⊗ Leb).
Proposition 7.3. The operator D : Z1(G0) → L2(G0×[0, 1],Q0⊗Leb) is closable in L2(G0,Q0).
Proof. LetW ∈ L2(G0×[0, 1],Q0⊗Leb) be of the formW (g) = w(g)·ϕ(gt) with some w ∈ Z1(G0)
and some ϕ ∈ C∞([0, 1]) satisfying ϕ(0) = ϕ(1) = 0. Then according to the integration by parts
formula for each u ∈ Z1(G0) with u(g) = U(
~α(gs)ds)
G0×[0,1]
Du ·W d(Q0 ⊗ Leb) =
~α(gs)ds)α
i(gt)w(g)ϕ(gt)dtdQ0(g)
Dϕu(g)w(g) dQ0(g) =
u(g)D∗ϕw(g) dQ0(g).
To prove the closability of D, consider a sequence (un)n in Z
1(G0) with un → 0 in L2(Q0) and
Dun → V in L2(Q0 ⊗ Leb). Then
V ·W d(Q0 ⊗ Leb) = lim
Dun ·W d(Q0 ⊗ Leb) = lim
ϕw dQ0 = 0 (7.1)
for all W as above. The linear hull of the latter is dense in L2(Q0 ⊗ Leb). Hence, (7.1) implies
V = 0 which proves the closability of D.
The closure of (D,Z1(G0)) will be denoted by (D,Dom(D). Note that a priori it is not clear
whether D coincides with D on C1(G0). (See, however, Theorem 7.8 below.)
7.2 The Dirichlet Form
Definition 7.4. For u, v ∈ Z1(G0) ∪ C1(G0) we define the ’Wasserstein Dirichlet integral’
E(u, v) =
〈Du(g),Dv(g)〉L2 dQ0(g). (7.2)
Theorem 7.5. (i) (E,Z1(G0)) is closable. Its closure (E,Dom(E)) is a regular, recurrent
Dirichlet form on L2(G0,Q0).
Dom(E) = Dom(D) and for all u, v ∈ Dom(D)
E(u, v) =
G0×[0,1]
Du · Dv d(Q0 ⊗ Leb).
(ii) The set Z∞0 (G0) of all cylinder functions u ∈ Z∞(G0) of the form u(g) = U(
~α(gs)ds) with
U ∈ C∞(Rm,R) and ~α = (α1, . . . , αm) ∈ C∞([0, 1],Rm) satisfying α′i(0) = α′i(1) = 0 is a core
for (E,Dom(E)).
(iii) The generator (L,Dom(L) of (E,Dom(E)) is the Friedrichs extension of the operator
(L,Z∞0 (G0) given by
Lu(g) = −
D∗αiui(g)
i,j=1
∂i∂jU
~α(gs)ds
α′i(gs)α
j(gs)ds +
~α(gs)ds
· V β
where ui(g) := ∂iU(
~α(gs)ds) and V
(g) denotes the drift term defined in section 5.1 with
ϕ = α′i; β > 0 is the parameter of the entropic measure fixed throughout the whole chapter.
(iv) The Dirichlet form (E,Dom(E)) has a square field operator given by
Γ(u, v) := 〈Du,Dv〉L2(Leb) ∈ L1(G0,Q0)
with Dom(Γ) = Dom(E) ∩ L∞(G0,Q0). That is, for all u, v, w ∈ Dom(E) ∩ L∞(G0,Q0)
w · Γ(u, v) dQ0 = E(u, vw) + E(uw, v) − E(uv,w). (7.3)
Proof. (a) The closability of the form (E,Z1(G0)) follows immediately from the previous Propo-
sition 7.3. Alternatively, we can deduce it from assertion (iii) which we are going to prove first.
(b) Our first claim is that E(u,w) = −
u · Lw dQ0 for all u,w ∈ Z∞0 (G0). Let u(g) =
~α(gs)ds) and w(g) = W (
~γ(gs)ds) with U,W ∈ C∞(Rm,R) and ~α = (α1, . . . , αm), ~γ =
(γ1, . . . , γm) ∈ C∞([0, 1],Rm) satisfying α′i(0) = α′i(1) = γ′i(0) = γ′i(1) = 0. Observe that
〈Du(g),Dw(g)〉L2 =
i,j=1
~α(gs)ds) · ∂jW (
~γ(gs)ds) ·
α′i(gs)γ
j(gs)ds
ui(g) ·Dα′
w(g).
Hence, according to the integration by parts formula from Proposition 5.10
E(u,w) =
〈Du(g),Dw(g)〉L2 dQ0(g)
ui(g) ·Dα′
w(g) dQ0(g)
ui(g) · w(g) dQ0(g)
Lu(g) · w(g) dQ0(g).
This proves our first claim. In particular, (L,Z∞0 (G0)) is a symmetric operator. Therefore, the
form (E,Z∞0 (G0)) is closable and its generator coincides with the Friedrichs extension of L.
(c) Now let us prove that Z∞0 (G0) is dense in Z1(G0). That is, let us prove that each function
u ∈ Z1(G0) can be approximated by functions uǫ ∈ Z∞0 (G0). For simplicity, assume that u is of the
form u(g) = U(
α(gs)ds) with U ∈ C1(R) and α ∈ C1([0, 1]). (That is, for simplicity, m = 1.)
Let Uǫ ∈ C∞(R) for ǫ > 0 be smooth approximations of U with ‖U − Uǫ‖∞ + ‖U ′ − U ′ǫ‖∞ → 0
as ǫ → 0 and let αǫ ∈ C∞(R) with α′ǫ(0) = α′ǫ(1) = 0 be smooth approximations of α with
‖α−αǫ‖∞ → 0 and α′ǫ(t) → α′(t) for all t ∈]0, 1[ as ǫ → 0. Moreover, assume that supǫ ‖α′‖∞ <
Define uǫ ∈ Z∞0 (G0) as uǫ(g) = Uǫ(
αǫ(gs)ds). Then uǫ → u in L2(G0,Q0) by dominated
convergence relative Q0.
Since
U ′ǫ(
αǫ(g(s))ds)
[0,1]
α′ǫ(gs)
2ds ≤ C,
(α′ǫ)
2(g(s))
ǫ→0−→ α′(gs)2 ∀s ∈ [0, 1] \
{g = 0} ∩ {g = 1}
[0, 1] \
{g = 0} ∩ {g = 1}
=]0, 1[ for Q0-almost all g ∈ G0
one finds by dominated convergence in L2([0, 1],Leb), for Q0-almost all g ∈ G0
U ′ǫ(
αǫ(gs)ds)
[0,1]
α′ǫ(gs)
ǫ→0−→
α(gs)ds)
[0,1]
α′(gs)
Hence also with
E(uǫ, uǫ) =
U ′ǫ(
αǫ(gs)ds)
α′ǫ(gs)
2dsQ0(dg)
ǫ→0−→
α(gs)ds)
α′(gs)
2dsQ0(dg)
by dominated convergence in L2(G0,Q0). In particular, {uǫ}ǫ constitutes a Cauchy sequence
relative to the norm ‖v‖2
E,1 := ‖v‖2L2(G,Q) + E(v, v). In fact, since the sequence uǫ is uniformly
bounded w.r.t. to ‖.‖E,1, by weak compactness there is a weakly converging subsequence in
(Dom(E), ‖.‖E,1). Since the associated norms converge, the convergence is actually strong in
(Dom(E), ‖.‖E,1). Moreover, since uǫ → u in L2(G0,Q0), this limit is unique. Hence the entire
sequence converges to u ∈ (Dom(E), ‖.‖E,1), such that in particular E(u, u) = limǫ→0E(uǫ, uǫ).
This proves our second claim. In particular, it implies that also (E,Z1(G0)) is closable and that
the closures of Z∞0 (G0) and Z1(G0) coincide.
(d) Obviously, (E,Dom(E)) has the Markovian property. Hence, it is a Dirichlet form. Since
the constant functions belong to Dom(E), the form is recurrent. Finally, the set Z1(G0) is dense
in (C(G0), ‖.‖∞) according to the theorem of Stone-Weierstrass since it separates the points in
the compact metric space G0. Hence, (E,Dom(E)) is regular.
(e) According to Leibniz’ rule, (7.3) holds true for all u, v, w ∈ Z1(G0). Arbitrary u, v, w ∈
Dom(E)∩L∞(G0,Q0) can be approximated in (E(.) + ‖.‖2)1/2 by un, vn, wn ∈ Z1(G0) which are
uniformly bounded on G0. Then unvn → uv, unwn → uw and vnwn → vw in (E(.) + ‖.‖2)1/2.
Moreover, we may assume that wn → w Q0-a.e. on G0 and thus
|wΓ(u, v) − wnΓ(un, vn)| dQ0 ≤
|w−wn|Γ(u, v)dQ0 +
|wn| · |Γ(u, v)−Γ(un, vn)|dQ0 → 0
by dominated convergence. Hence, (7.3) carries over from Z1(G0) to Dom(E) ∩L∞(G0,Q0).
Lemma 7.6. For each f ∈ G0 the function u : g 7→ 〈f, g〉L2 belongs to Dom(E).
Proof. (a) For f, g ∈ G0 put µf = f∗Leb and µg = g∗Leb. Recall that by Kantorovich duality
‖f − g‖2L2 =
d2W (µf , µg)
= sup
ϕdµf +
= sup
ϕ(ft)dt+
ψ(gt)dt
where the supϕ,ψ is taken over all (smooth, bounded) ϕ ∈ L1([0, 1], µf ), ψ ∈ L1([0, 1], µg)
satisfying ϕ(x) + ψ(y) ≤ 1
|x − y|2 for µf -a.e. x and µg-a.e. y in [0, 1]. Replacing ϕ(x) by
|x|2/2 − ϕ(x) (and ψ(y) by . . .) this can be restated as
〈f, g〉L2 = inf
ϕ(ft)dt+
ψ(gt)dt
(7.4)
where the infϕ,ψ now is taken over all (smooth, bounded) ϕ ∈ L1([0, 1], µf ), ψ ∈ L1([0, 1], µg)
satisfying ϕ(x) + ψ(y) ≥ 〈x, y〉 for µf -a.e. x and µg-a.e. y in [0, 1]. If g is strictly increasing
then ψ can be chosen as
ψ′ = f ◦ g−1,
cf. [Vil03], sect. 2.1 and 2.2.
(b) Now fix a countable dense set {gn}n∈N of strictly increasing functions in G0 and an arbitrary
function f ∈ G0. Let (ϕn, ψn) denote a minimizing pair for (f, gn) in (7.4) and define un : G0 → R
un(g) := min
i=1,...,n
ϕ(fi(t))dt+
ψi(g(t))dt
Note that ψ′i = f ◦ g
i and thus un(gi) = 〈f, gi〉 for all i = 1, . . . , n. Therefore,
|un(g)− un(g̃)| ≤ max
|ψi(g(t))dt − ψi(g̃(t))|dt ≤ max
‖ψ′i‖∞ ·
|g(t)− g̃(t)|dt ≤ ‖g − g̃‖L1
for all g, g̃ ∈ G0. Hence, un → u pointwise on G0 and in L2(G0,Q0) where u(g) := 〈f, g〉.
(c) The function un is in the class Z
0(G0):
un(g) = Un
~α(gt)dt
with Un(x1, . . . , xn) = min{c1 + x1, . . . , cn + xn}, ci =
ϕi(f(t))dt and αi = ψi. The function
Un can be easily approximated by C1 functions in order to verify that un ∈ Dom(E) and
Dun(g) =
1Ai(g) · ψ′i(g(.))
with a suitable disjoint decomposition G0 = ∪iAi. (More precisely, Ai denotes the set of all
g ∈ G0 satisfying
ϕ(fi(t))dt +
ψi(g(t))dt <
ϕ(fj(t))dt +
ψj(g(t))dt for all j < i and
ϕ(fi(t))dt+
ψi(g(t))dt ≤
ϕ(fi(t))dt+
ψi(g(t))dt for all j > i.) Thus
‖Dun(g)‖2 =
1Ai(g) ·
ψ′i(g(t))
E(un) ≤ max
‖ψ′i ◦ g‖2L2dQ0(g).
In particular, since |ψ′i| ≤ 1,
E(un) ≤ 1
and thus u ∈ Dom(E).
Lemma 7.7. For all u ∈ Z1(G0) and all w ∈ C1(G0) ∩Dom(E)
E(u,w) =
〈Du(g),Dw(g)〉L2dQ0(g) (7.5)
(with Du(g) and Dw(g) given explicitly as in Example 7.2).
Proof. Recall that for u ∈ Z∞0 (G0) of the form u(g) = U(
~α(gt)dt)
Lu(g) =
ui(g)
with ui(g) = ∂iU(
~α(gt)dt). Hence, for w ∈ C1(G0) of the form w(g) =W (〈~h, g〉)
E(u,w) = −
Lu(g)w(g) dQ0(g)
ui(g)w(g) dQ0(g) =
ui(g)Dα′
ui(g)w(g) dQ0(g)
i,j=1
~α(gt)dt) · ∂jW (
~h(t)g(t)dt) ·
α′i(g(t))hj(t)dt dQ0(g)
〈Du(g),Dw(g)〉dQ0(g).
This proves the claim provided u ∈ Z∞0 (G0). By density this extends to all u ∈ Z1(G0).
Theorem 7.8. (i) (E,C1(G0)) is closable and its closure coincides with (E,Dom(E)). Similarly,
(D,C1(G0)) is closable and its closure coincides with (D,Dom(D)).
(ii) For all u,w ∈ Z1(G0) ∪ C1(G0)
Γ(u,w)(g) = 〈Du(g),Dw(g)〉L2 , (7.6)
in particular, E(u,w) =
〈Du(g),Dw(g)〉L2dQ0(g) (with Du(g) and Dw(g) given explicitly as
in Example 7.2).
(iii) For each f ∈ G0 the function uf : g 7→ ‖f − g‖L2 belongs to Dom(E) and Γ(uf , uf ) ≤ 1
Q0-a.e. on G0.
(iv) (E,Dom(E)) is strongly local.
Proof. (a) Claim: For each f ∈ L2([0, 1],Leb) the function uf : g 7→ 〈f, g〉L2 belongs to Dom(E)
and E(uf , uf ) = ‖f‖2L2 .
Indeed, if f ∈ L2 ∩ C1 then f = c0 + c1f1 + c2f2 with f1, f2 ∈ G0 and c0, c1, c2 ∈ R. Hence,
uf ∈ Dom(E) according to Lemma 7.6 and E(uf , uf ) =
‖Duf‖2dQ0 = ‖f‖2 according to
Lemma 7.7. Finally, each f ∈ L2 can be approximated by fn ∈ L2 ∩ C1 with ‖f − fn‖ → 0.
Hence, uf ∈ Dom(E) and E(uf , uf ) = ‖f‖2.
(b) Claim: C1(G0) ⊂ Dom(E).
Let u ∈ C1(G0) be given with u(g) = U(〈~f , g〉), U ∈ C1(Rm,R), ~f = (f1, . . . , fm) ∈ L2([0, 1],Rm).
For each i = 1, . . . ,m let (wi,n)n∈N be an approximating sequence in (Z
1(G0), (E + ‖.‖2)1/2) for
wi : g 7→ 〈fi, g〉. Put un(g) = U(w1,n(g), . . . , wm,n(g)). Then un ∈ Z1(G0), un → u pointwise on
G0 and in L2(G0,Q0). Moreover,
E(un, un) =
∂iU(w1,n(g), . . . , wm,n(g))Dwi,n(g)‖2L2 dQ0(g)
∂iU(〈f1, g〉, . . . , 〈fm, g〉)Dwi(g)‖2L2 dQ0(g)
‖Du(g)‖2dQ0(g).
Hence, u ∈ Dom(E) and E(u, u) =
‖Du(g)‖2dQ0(g).
(c) Assertion (ii) then follows via polarization and bi-linearity. Assertion (iii) is an immediate
consequence of assertion (ii). Assertion (iii) allows to prove the locality of the Dirichlet form
(E,Dom(E)) in the same manner as in the proof of Theorem 6.2.
(d) Claim: C1(G0) is dense in Dom(E).
We have to prove that each u ∈ Z1(G0) can be approximated by un ∈ C1(G0). As usual, it suffices
to treat the particular case u(g) =
α(gt)dt for some α ∈ C1([0, 1]). Put Un(x1, . . . , xn) =
i=1 α(xi) and fn,i(t) = n · 1[ i−1
(t). Then
un(g) := Un(〈fn,1, g〉, . . . 〈fn,n, g〉) =
defines a sequence in C1(G0) with un(g) → u(g) pointwise on G0 and in L2(G0,Q0).
Moreover,
Dun(g) =
· 1[ i−1
[(.) (7.7)
and therefore
E(un) =
dQ0(g) −→
α′(gt)
2dtdQ0(g) = E(u). (7.8)
Thus (un)n is Cauchy in Dom(E) and un → u in Dom(E).
7.3 Rademacher Property and Intrinsic Metric
We say that a function u : G0 → R is 1-Lipschitz if
|u(g)− u(h)| ≤ ‖g − h‖L2 (∀g, h ∈ G0).
Theorem 7.9. Every 1-Lipschitz function u on G0 belongs to Dom(E) and Γ(u, u) ≤ 1 Q0-a.e.
on G0.
Before proving the theorem in full generality, let us first consider the following particular case.
Lemma 7.10. Given n ∈ N, let {h1, . . . , hn} be a orthonormal system in L2([0, 1],Leb) and let
U be a 1-Lipschitz function on Rn. Then the function u(g) = U(〈h1, g〉, . . . , 〈hn, g〉) belongs to
Dom(E) and Γ(u, u) ≤ 1 Q0-a.e. on G0.
Proof. Let us first assume that in addition U is C1. Then according to Theorem 7.8, u is in
Dom(E) and Du(g) =
i=1 ∂iU(〈~h, g〉) · hi. Thus
Γ(u, u)(g) = ‖Du(g)‖L2 =
|∂iU(〈~h, g〉)|2 ≤ 1.
In the case of a general 1-Lipschitz continuous U on Rn we choose an approximating sequence
of 1-Lipschitz functions Uk, k ∈ N, in C1(Rn) with Uk → U uniformly on Rn and put uk(g) =
Uk((〈~h, g〉) for g ∈ G0. Then uk → u pointwise and in L2(G0,Q0). Hence, u ∈ Dom(E) and
Γ(u, u) ≤ 1 Q0-a.e. on G0.
Proof of Theorem 7.9. Every 1-Lipschitz function u on G0 can be extended to a 1-Lipschitz
function ũ on L2([0, 1],Leb) (’Kirszbraun extension’). Hence, without restriction, assume that
u is a 1-Lipschitz function on L2([0, 1],Leb). Choose a complete orthonormal system {hi}i∈N of
the separable Hilbert space L2([0, 1],Leb) and define for each n ∈ N the function Un : Rn → R
Un(x1, . . . , xn) = u
for x = (x1, . . . , xn) ∈ Rn. This function Un is 1-Lipschitz on Rn:
|Un(x)− Un(y)| ≤
xihi −
≤ |x− y|.
Hence, according to the previous Lemma the function
un(g) = Un(〈h1, g〉, . . . , 〈hn, g〉)
belongs belongs to Dom(E) and Γ(un, un) ≤ 1 Q0-a.e. on G0.
Note that
un(g) = u
〈hi, g〉hi
for each g ∈ L2([0, 1],Leb). Therefore, un → u on L2([0, 1],Leb) since
i=1〈hi, g〉hi → g
on L2([0, 1],Leb) and since u is continuous on L2([0, 1],Leb). Thus, finally, u ∈ Dom(E) and
Γ(u, u) ≤ 1 Q0-a.e. on G0.
Our next goal is the converse to the previous Theorem.
Theorem 7.11. Every continuous function u ∈ Dom(E) with Γ(u, u) ≤ 1 Q0-a.e. on G0 is
1-Lipschitz on G0.
Lemma 7.12. For each u ∈ C1(G0) ∪ Z1(G0) and all g0, g1 ∈ G0
u(g1)− u(g0) =
〈Du ((1− t)g0 + tg1) , g1 − g0〉L2dt. (7.9)
Proof. Put gt = (1− t)g0 + tg1 and consider the C1 function η : [0, 1] → R defined by ηt = u(gt).
η̇t = Dg1−g0u(gt) = 〈Du(gt), g1 − g0〉
and thus
η1 − η0 =
η̇tdt =
〈Du(gt), g1 − g0〉dt.
Lemma 7.13. Let g0, g1 ∈ G0 ∩C3 and put gt = (1− t)g0 + tg1. Then for each u ∈ Dom(E) and
each bounded measurable Ψ : G0 → R
[u(g1 ◦ h)− u(g0 ◦ h)]Ψ(h) dQ0(h) =
〈Du(gt ◦ h, (g1 − g0) ◦ h〉Ψ(h)Q0(h)dt. (7.10)
Proof. Given g0, g1, Ψ and u ∈ Dom(E) as above, choose an approximating sequence in Z1(G0)∪
C1(G0) with un → u in Dom(E) as n→ ∞. According to the previous Lemma for each n
[un(g1◦h)−un(g0◦h)]Ψ(h) dQ0(h) =
〈Dun (gt ◦ h) , (g1−g0)◦h〉Ψ(h) dQ0(h)dt. (7.11)
By assumption un → u in L2(G0,Q0) and Dun → Du in L2(G0 × [0, 1],Q0 ⊗ Leb) as n → ∞.
Using the quasi-invariance of Q0 (Theorem 4.3) this implies
|u(gt ◦ h)− un(gt ◦ h)|Ψ(h) dQ0(h) =
[u(h)− un(h)|Ψ(g−1t ◦ h) · Y
(h) dQ0(h) → 0
as n→ ∞ as well as
‖Du(gt ◦ h)− Dun(gt ◦ h)|2L2Ψ(h)Q0(h)
‖Du(h) − Dun(h)|2L2Ψ(g
t ◦ h) · Y
(h)Q0(h) → 0
Hence, we may pass to the limit n→ ∞ in (7.11) which yields the claim.
Proof of Theorem 7.11. Let a continuous u ∈ Dom(E) be given with Γ(u, u) ≤ 1 Q0-a.e. on G0.
We want to prove that u(g1)− u(g0) ≤ ‖g1 − g0‖L2 for all g0, g1 ∈ G0. By density of G0 ∩ C3 in
G0 and by continuity of u it suffices to prove the claim for g0, g1 ∈ G0 ∩ C3.
Choose a sequence of bounded measurable Ψk : G0 → R+ such that the probability measures
ΨkdQ0 on G0 converge weakly to δe, the Dirac mass in the identity map e ∈ G0. Then according
to the previous Lemma and the assumption ‖Du‖ ≤ 1
[u(g1 ◦ h)− u(g0 ◦ h)]Ψk((h)dQ0(h)
〈Du(gt ◦ h, (g1 − g0) ◦ h〉Ψk(h) dQ0(h)dt
‖Du(gt ◦ h)‖L2 · ‖(g1 − g0) ◦ h‖L2 ·Ψk(h) dQ0(h)dt
‖(g1 − g0) ◦ h‖L2 ·Ψk(h) dQ0(h).
Now the integrands on both sides, h 7→ u(g1 ◦h)−u(g0 ◦h) as well as h 7→ ‖(g1 − g0) ◦h‖L2 , are
continuous in h ∈ G0. Hence, as k → ∞ by weak convergence ΨkdQ0 → δe we obtain
u(g1)− u(g0) ≤ ‖g1 − g0‖L2 .
Corollary 7.14. The intrinsic metric for the Dirichlet form (E,Dom(E)) is the L2-metric:
‖g1 − g0‖L2 = sup {u(g1)− u(g0) : u ∈ C(G0) ∩Dom(E), Γ(u, u) ≤ 1Q0-a.e. on G0}
for all g0, g1 ∈ G0.
7.4 Finite Dimensional Noise Approximations
The goal of this section is to present representations – and finite dimensional approximations –
of the Dirichlet form
E(u, v) =
〈Du(g),Dv(g)〉L2 dQ0(g)
in terms of globally defined vector fields.
If (ϕi)i∈N is a complete orthonormal system in Tg = L
2([0, 1], g∗Leb) for a given g ∈ G0 then
obviously
〈Du(g),Dv(g)〉L2 =
Dϕiu(g)Dϕiv(g). (7.12)
Unfortunately, however, there exists no family (ϕi)i∈N which is simultaneously orthonormal in
all Tg = L
2([0, 1], g∗Leb), g ∈ G0. For a general family, the representation (7.12) should be
replaced by
〈Du(g),Dv(g)〉L2 =
i,j=1
Dϕiu(g) · aij(g) ·Dϕjv(g) (7.13)
where a(g) = (aij(g))i,j∈N is the ’generalized inverse’ to Φ(g) = (Φij(g))i,j∈N with
Φij(g) := 〈ϕi, ϕj〉Tg =
ϕi(gt)ϕj(gt)dt.
In order to make these concepts rigorous, we have to introduce some notations.
For fixed n ∈ N let S+(n) ⊂ Rn×n denote the set of symmetric nonnegative definite real (n×n)-
matrices. For each A ∈ S+(n) a unique element A−1 ∈ S+(n), called generalized inverse to A,
is defined by
A−1x :=
0 if x ∈ Ker(A),
y if x ∈ Ran(A) with x = Ay
This definition makes sense since (by the symmetry of A) we have an orthogonal decomposition
Rn = Ker(A)⊕ Ran(A). Obviously,
A−1 ·A = A · A−1 = πA
where πA denotes the projection onto Ran(A).
Moreover, for each A ∈ S+(n) there exists a unique element A1/2 ∈ S+(n), called nonnegative
square root of A, satisfying
A1/2 · A1/2 = A.
Let Ψ(n) denote the map A 7→ A−1, regarded as a map from S+(n) ⊂ Rn×n to Rn×n, with
ij (A) = (A
−1)ij for i, j = 1, . . . , n. Similarly, put
Ξ(n) : S+(n) → S+(n), A 7→ (A1/2)−1 = (A−1)1/2.
Note that Ψ(n)(A) = Ξ(n)(A) · Ξ(n)(A) for all A ∈ S+(n).
The maps Ψ(n) and Ξ(n) are smooth on the subset of positive definite matrices A ∈ S+(n) but
unfortunately not on the whole set S+(n). However, they can be approximated from below
(in the sense of quadratic forms) by smooth maps: there exists a sequence of C∞ maps Ξ(n,l) :
Rn×n → Rn×n with
ξ · Ξ(n,k)(A) · ξ ≤ ξ · Ξ(n,l)(A) · ξ
for all A ∈ S+(n), ξ ∈ Rn and all k, l ∈ N with k ≤ l and
(n,l)
ij (A) → Ξ
ij (A) = (A
−1/2)ij
for all A ∈ S+(n), i, j ∈ {1, . . . , n} as l → ∞. Put Ψ(n,l)(A) = Ξ(n,l)(A) ·Ξ(n,l)(A) for A ∈ Rn×n.
Then the sequence (Ψ(n,l))l∈N approximates Ψ
(n) from below in the sense of quadratic forms.
Now let us choose a family {ϕi}i∈N of smooth functions ϕi : [0, 1] → R which is total in C0([0, 1])
w.r.t. uniform convergence (i.e. its linear hull is dense). Put
Φij(g) := 〈ϕi, ϕj〉Tg =
ϕi(gx)ϕj(gx)dx
(n,l)
ij (g) = Ψ
(n,l)
ij (Φ(g)) , σ
(n,l)
ij (g) = Ξ
(n,l)
ij (Φ(g)).
Note that the maps g 7→ a(n,l)ij (g) and g 7→ σ
(n,l)
ij (g) (for each choice of n, l, i, j) belong to the
class Z∞(G0). Moreover, put
ij (g) = Ψ
ij (Φ(g)) .
Then obviously the orthogonal projection πn onto the linear span of {ϕ1, . . . , ϕn} ⊂ Tg =
L2([0, 1], g∗Leb) is given by
πnu =
i,j=1
ij (g) · 〈u, ϕi〉Tg · ϕj
〈πnu, πnv〉Tg =
i,j=1
〈u, ϕi〉Tg · a
ij (g) · 〈v, ϕj〉Tg
for all u, v ∈ Tg.
Theorem 7.15. (i) For each n, l ∈ N the form (E(n,l),Z1(G0)) with
(n,l)(u, v) =
i,j=1
Dϕiu(g) · a
(n,l)
ij (g) ·Dϕjv(g) dQ0(g)
is closable. Its closure is a Dirichlet form with generator being the Friedrichs extension of the
symmetric operator (L(n,l),Z2(G0)) given by
(n,l) =
i,j=1
(n,l)
ij ·DϕiDϕj +
i,j=1
(n,l)
ij + a
(n,l)
ij · V
Dϕj . (7.14)
(ii) As l → ∞
(n,l) ր E(n)
where
(n)(u, v) =
i,j=1
Dϕiu(g) · a
ij (g) ·Dϕjv(g) dQ0(g).
for u, v ∈ Z1(G0). Hence, in particular, E(n) is a Dirichlet form.
(iii) As n→ ∞
(n) ր E
(which provides an alternative proof for the closability of the form (E,Z1(G0))).
Proof. (i) The function a
(n,l)
i,j on G0 is a cylinder function in the class Z1(G0). The integration
by parts formula for the Dϕi , therefore, implies that for all u, v ∈ Z2(G0)
(n,l)(u, v) =
Dϕiu(g)Dϕjv(g)a
(n,l)
ij (g)dQ0(g)
u(g) ·D∗ϕi
(n,l)
ij Dϕjv
(g) dQ0(g) = −
u(g) · L(n,l)v(g) dQ0(g).
(n,l) = −
i,j=1
(n,l)
ij Dϕj
Hence, (E(n,l),Z2(G0)) is closable and the generator of its closure is the Friedrichs extension of
(L(n,l),Z2(G0)).
(ii) The monotone convergence E(n,l) ր E(n) of the quadratic forms is an immediate consequence
of the fact that a(n,l)(g) ր a(n)(g) (in the sense of symmetric matrices) for each g ∈ G0 which in
turn follows from the defining properties of the approximations Ψ(n,l) of the generalized inverse
Ψ(n).
The limit of an increasing sequence of Dirichlet forms is itself again a Dirichlet form provided it
is densely defined which in our case is guaranteed since it is finite on Z2(G0).
(iii) Obviously, the En, n ∈ N constitute an increasing sequence of Dirichlet forms with En ≤ E
for all n. Moreover, Z1(G0) is a core for all the forms under consideration. Hence, it suffices to
prove that for each u ∈ Z1(G0) and each ǫ > 0 there exists an n ∈ N such that
(n)(u, u)− E(u, u)
To simplify notation, assume that u is of the form u(g) = U(
α(gt)dt) for some U ∈ C1c (R)
and some α ∈ C1([0, 1]). By assumption, the set {ϕi, i ∈ N} is total in C0([0, 1]) w.r.t. uniform
convergence. Hence, for each δ > 0 there exist n ∈ N and ϕ ∈ span(ϕ1, . . . , ϕn) with ‖α′−ϕ‖sup ≤
δ which implies
〈α′, ϕ〉Tg
‖ϕ‖Tg
≥ ‖ϕ‖Tg − δ ≥ ‖α′‖Tg − 2δ.
E(u, u) ≥ E(n)(u, u) ≥
α(gt)dt)
2 · 〈α′, ϕ〉2Tg ·
‖ϕ‖2Tg
dQ0(g)
α(gt)dt)
‖α′‖Tg − 2δ
dQ0(g)
α(gt)dt)
1 + δ
‖α′‖2Tg − 4δ
dQ0(g)
1 + δ
E(u, u)− 4δ‖U ′‖2sup.
Hence, for δ sufficiently small, E(u, u) and E(n)(u, u) are arbitrarily close to each other.
Remark 7.16. For any given g0 ∈ G0, let (gt)t≥0 with gt : (x, ω) 7→ gxt (ω) be the solution to the
dgxt =
i,j=1
(n,l)
ij (gt) · ϕj(g
t ) dW
i,j=1
(n,l)
ij (gt) · ϕj(g
t ) ·
ϕ′i(g
t ) + V
i,j=1
k,m=1
(n,l)
ij (Φ(gt)) · 〈(ϕkϕm)
′, ϕi〉Tg · ϕj(gxt )dt
where ∂kmΨ
(n,l)
ij for (k,m) ∈ {1, . . . , n}2 denotes the 1st order partial derivative of the function
(n,l)
ij : R
n×n → R with respect to the coordinate xkm. Then the generator of the process coincides
on Z2(G0) with the operator 12L
(n,l) from (7.14), the generator of the Dirichlet form E(n,l).
Let us briefly comment on the various terms in the SDE from above:
• The first one,
i,j=1 σ
(n,l)
ij (gt) · ϕj(gxt ) dW it is the diffusion term, written in Ito form;
• the second one, 1
i,j=1 a
(n,l)
ij (gt) ·ϕj(gxt ) ·ϕ′i(gxt )dt is a drift which comes from the trans-
formation between Stratonovich and Ito form (it would disappear if we wrote the diffusion
term in Stratonovich form).
• The next one, 1
i,j=1 a
(n,l)
ij (gt) ·ϕj(gxt ) · V
ϕi(gt)dt is a drift which arises from our change
of variable formula. Actually, since
V βϕi(g) = β
ϕ′i(g(y))dy +
ϕ′i(g(a+)) + ϕ
i(g(a−))
− ϕi(g(a+)) − ϕi(g(a−))
g(a+)− g(a−)
it consists of two parts, one originates in the logarithmic derivative of the entropy of the
g’s (which finally will force the process to evolve as a stochastic perturbation of the heat
equation), the other one is created by the jumps of the g’s.
• The last term, 1
i,j=1
k,m=1 ∂kmΨ
(n,l)
ij (Φ(gt)) · 〈(ϕkϕm)′, ϕi〉Tg · ϕj(gxt )dt involves the
derivative of the diffusion matrix. It arises from the fact that the generator is originally
given in divergence form.
7.5 The Wasserstein Diffusion (µt) on P0
The objects considered previously – derivative, Dirichlet form and Markov process on G0 – have
canonical counterparts on P0. The key to these objects is the bijective map χ : G0 → P0,
g 7→ g∗Leb.
We denote by Zk(P0) the set of all (’cylinder’) functions u : P0 → R which can be written as
u(µ) = U
α1dµ, . . . ,
(7.15)
with some m ∈ N, some U ∈ Ck(Rm) and some ~α = (α1, . . . , αm) ∈ Ck([0, 1],Rm) . The subset
of u ∈ Zk(P0) with α′i(0) = α′i(1) = 0 for all i = 1, . . . ,m will be denoted by Zk0(P0). For
u ∈ Z1(P0) represented as above we define its gradient Du(µ) ∈ L2([0, 1], µ) by
Du(µ) =
~αdµ) · α′i(.)
with norm
‖Du(µ)‖L2(µ) =
~αdµ) · α′i
The tangent space at a given point µ ∈ P0 can be identified with L2([0, 1], µ). The action of a
tangent vector ϕ ∈ L2([0, 1], µ) on µ (’exponential map’) is given by the push forward ϕ∗µ.
Theorem 7.17. (i) The image of the Dirichlet form defined in (7.2) under the map χ is the
regular, strongly local, recurrent Wasserstein Dirichlet form E on L2(P0,P0) defined on its core
Z1(P0) by
E(u, v) =
〈Du(µ),Dv(µ)〉2L2(µ)dP0(µ). (7.16)
The Dirichlet form has a square field operator, defined on Dom(E) ∩ L∞, and given on Z1(P0)
Γ(u, v)(µ) = 〈Du(µ),Dv(µ)〉2L2(µ).
The intrinsic metric for the Dirichlet form is the L2-Wasserstein distance dW . More precisely,
a continuous function u : P0 → R is 1-Lipschitz w.r.t. the L2-Wasserstein distance if and only
if it belongs to Dom(E) and Γ(u, u)(µ) ≤ 1 for P0-a.e. µ ∈ P0.
(ii) The generator of the Dirichlet form is the Friedrichs extension of the symmetric operator
(L,Z20(P0) on L2(P0,P0) given as L = L1 + L2 + β · L3 with
L1u(µ) =
i,j=1
∂i∂jU(
~αdµ) ·
L2u(µ) =
~αdµ) ·
I∈gaps(µ)
α′′i (I−) + α
i (I+)
i(I+)− α′i(I−)
i (0) + α
i (1)
L3u(µ) =
~αdµ) ·
α′′i dµ.
Recall that gaps(µ) denotes the set of intervals I = ]I−, I+[⊂ [0, 1] of maximal length with
µ(I) = 0 and |I| denotes the length of such an interval.
(iii) For P0-a.e. µ0 ∈ P0, the associated Markov process (µt)t≥0 on P0 starting in µ0, called
Wasserstein diffusion, with generator 1
L is given as
µt(ω) = gt(ω)∗Leb
where (gt)t≥0 is the Markov process on G0 associated with the Dirichlet form of Theorem 7.5,
starting in g0 := χ
−1(µ0).
For each u ∈ Z20(P0) the process
u(µt)− u(µ0)−
Lu(µs)ds
is a martingale whenever the distribution of µ0 is chosen to be absolutely continuous w.r.t. the
entropic measure P0. Its quadratic variation process is
Γ(u, u)(µs)ds.
Remark 7.18. L1 is the second order part (’diffusion part’) of the generator L, L2 and L3
are first order operators (’drift parts’). The operator L1 describes the diffusion on P0 in all
directions of the respective tangent spaces. This means that the process (µt) at each time t ≥ 0
experiences the full ’tangential’ L2([0, 1], µt)-noise.
L3 is the generator of the deterministic semigroup (’Neumann heat flow’) (Ht)t≥0 on L
2(P0,P0)
given by
Htu(µ) = u(htµ).
Here ht is the heat kernel on [0, 1] with reflecting (’Neumann’) boundary conditions and htµ(.) =
ht(., y)µ(dy). Indeed, for each u ∈ Z10(P0) given as u(g) = U(
~αdµ) we obtain Htu(µ) =
~α(x)ht(x, y)µ(dy)dx
and thus
∂tHtu(µ) =
∂iU(htµ) · ∂t
αi(x)ht(x, y)µ(dy)dx
∂iU(htµ) ·
αi(x)h
t (x, y)µ(dy)dx
∂iU(htµ) ·
α′′i (x)ht(x, y)µ(dy)dx = L3Htu(µ).
Note that L depends on β only via the drift term L3 and
L → L3 as β → ∞.
The following statement, which in the finite dimensional case is known as Varadhan’s formula,
exhibits another close relationship between (µt) and the geometry of (P([0, 1]), dW ). The Gaus-
sian short time asymptotics of the process (µt)t≥0 are governed by the L
2-Wasserstein distance.
Corollary 7.19. For measurable sets A,B ∈ P0 with positive P0-measure, let dW (A,B) =
inf{dW (ν, ν̃) | ν ∈ A, ν̃ ∈ B} and pt(A,B) =
pt(ν, dν̃)P0(dν) where pt(ν, dν̃) denotes the
transition semigroup for the process (µt)t≥0.
t log pt(A,B) = −
dW (A,B)
. (7.17)
Proof. This type of result is known as Varadhan’s formula. Its respective form for (E,Dom(E) on
L2(P0,P0) holds true by the very general results of [HR03] for conservative symmetric diffusions,
and the identification of the intrinsic metric as dW in our previous Theorem.
Due to the sample path continuity of (µt) the Wasserstein diffusion is equivalently characterized
by the following martingale problem. Here we use the notation 〈α, µt〉 =
α(x)µt(dx).
Corollary 7.20. For each α ∈ C2([0, 1]) with α′(0) = α′(1) = 0 the process
Mt = 〈α, µt〉 −
〈α′′, µs〉ds
I∈gaps(µs)
α′′(I−) + α
′′(I+)
′(I+)− α′(I−)
′′(0) + α′′(1)
is a continuous martingale with quadratic variation process
[M ]t =
〈(α′)2, µs〉ds.
Remark 7.21. For illustration one may compare corollary 7.20 for (µt) in the case β = 1 to
the respective martingale problems for four other well-known measure valued process, say on
the real line, namely the so-called super-Brownian motion or Dawson-Watanabe process (µDWt ),
the Fleming-Viot process (µFW ), both of which we can consider with the Laplacian as drift, the
Dobrushin-Doob process (µDDt ) which is the empirical measure of independent Brownian motions
with locally finite Poissonian starting distribution, cf. [AKR98], and finally simply the empirical
measure process of a single Brownian motion (µBMt = δXt). For each i ∈ {DW,FV,DD,BM}
and sufficiently regular α : R → R the process M it := 〈α, µit〉 − 12
〈α′′, µis〉ds is a continuous
martingale with quadratic variation process
[MDW ]t =
〈α2, µDWs 〉ds,
[MFV ]t =
[〈α2, µFVs 〉 − (〈α, µFVs 〉)2]ds,
[MDD]t =
〈(α′)2, µDDs 〉ds,
[MBM ]t =
〈(α′)2, µBMs 〉ds.
In view of corollary 7.19 the apparent similarity of µDD and µBM to the Wasserstein diffusion
µ is no suprise. However, the effective state spaces of µDD, µBM and µt are as much different
as their invariant measures.
References
[AKR98] S. Albeverio, Yu. G. Kondratiev, and M. Röckner. Analysis and geometry on con-
figuration spaces. J. Funct. Anal., 154(2):444–500, 1998.
[AM06] Hélène Airault and Paul Malliavin. Quasi-invariance of Brownian measures on the
group of circle homeomorphisms and infinite-dimensional Riemannian geometry. J.
Funct. Anal. 241 (1): 99-142, 2006.
[AMT04] Hélène Airault, Paul Malliavin, and Anton Thalmaier. Canonical Brownian motion
on the space of univalent functions and resolution of Beltrami equations by a con-
tinuity method along stochastic flows. J. Math. Pures Appl. (9), 83(8):955–1018,
2004.
[AR02] Hélène Airault and Jiagang Ren. Modulus of continuity of the canonic Brownian
motion “on the group of diffeomorphisms of the circle”. J. Funct. Anal. 196 (2):
395-426, 2002.
[Ber99] Jean Bertoin. Subordinators: examples and applications. In Lectures on probability
theory and statistics (Saint-Flour, 1997), volume 1717 of Lecture Notes in Math.,
pages 1–91. Springer, Berlin, 1999.
[Bre91] Yann Brenier. Polar factorization and monotone rearrangement of vector-valued
functions. Comm. Pure Appl. Math., 44(4):375–417, 1991.
[CEMS01] Dario Cordero-Erausquin, Robert J. McCann, and Michael Schmuckenschläger. A
Riemannian interpolation inequality à la Borell, Brascamp and Lieb. Invent. Math.,
146(2):219–257, 2001.
[Daw93] Donald A. Dawson. Measure-valued Markov processes. In École d’Été de Probabilités
de Saint-Flour XXI—1991, volume 1541 of Lecture Notes in Math., pages 1–260.
Springer, Berlin, 1993.
[DZ06] Arnaud Debussche and Lorenzo Zambotti. Conservative Stochastic Cahn-Hilliard
equation with reflection. 2006. Preprint.
[ÉY04] Michel Émery and Marc Yor. A parallel between Brownian bridges and gamma
bridges. Publ. Res. Inst. Math. Sci., 40(3):669–688, 2004.
[Fan02] Shizan Fang. Canonical Brownian motion on the diffeomorphism group of the circle.
J. Funct. Anal. 196 (1): 162-179, 2002.
[Fan04] Shizan Fang. Solving stochastic differential equations on Homeo(S1). J. Funct. Anal.,
216(1):22–46, 2004.
[FOT94] Masatoshi Fukushima, Yōichi Oshima, and Masayoshi Takeda. Dirichlet forms and
symmetric Markov processes. Walter de Gruyter & Co., Berlin, 1994.
[Han02] Kenji Handa. Quasi-invariance and reversibility in the Fleming-Viot process. Probab.
Theory Related Fields, 122(4):545–566, 2002.
[HR03] Masanori Hino and José A. Ramı́rez. Small-time Gaussian behavior of symmetric
diffusion semigroups. Ann. Probab., 31(3):1254–1295, 2003.
[JKO98] Richard Jordan, David Kinderlehrer, and Felix Otto. The variational formulation of
the Fokker-Planck equation. SIAM J. Math. Anal., 29(1):1–17 (electronic), 1998.
[Kin93] J. F. C. Kingman. Poisson processes, volume 3 of Oxford Studies in Probability.
The Clarendon Press Oxford University Press, New York, 1993. , Oxford Science
Publications.
[Mal99] Paul Malliavin. The canonic diffusion above the diffeomorphism group of the circle.
C. R. Acad. Sci. Paris Sér. I Math., 329(4):325–329, 1999.
[McC97] Robert J. McCann. A convexity principle for interacting gases. Adv. Math.,
128(1):153–179, 1997.
[NP92] D. Nualart and É. Pardoux. White noise driven quasilinear SPDEs with reflection.
Probab. Theory Related Fields, 93(1):77–89, 1992.
[Ott01] Felix Otto. The geometry of dissipative evolution equations: the porous medium
equation. Comm. Partial Differential Equations, 26(1-2):101–174, 2001.
[OV00] F. Otto and C. Villani. Generalization of an inequality by Talagrand and links with
the logarithmic Sobolev inequality. J. Funct. Anal., 173(2):361–400, 2000.
[RS80] Michael Reed and Barry Simon. Functional Analysis I. Academic Press 1980.
[vRS05] Max-K. von Renesse and Karl-Theodor Sturm. Transport inequalities, gradient es-
timates, entropy, and Ricci curvature. Comm. Pure Appl. Math., 58(7):923–940,
2005.
[Sch97] Alexander Schied. Geometric aspects of Fleming-Viot and Dawson-Watanabe pro-
cesses. Ann. Probab., 25(3):1160–1179, 1997.
[Sta03] Wilhelm Stannat. On transition semigroups of (A,Ψ)-superprocesses with immigra-
tion. Ann. Probab., 31(3):1377–1412, 2003.
[Stu06] Karl-Theodor Sturm. On the geometry of metric measure spaces. I. Acta Math.,
196(1):65–131, 2006.
[TVY01] Natalia Tsilevich, Anatoly Vershik, and Marc Yor. An infinite-dimensional analogue
of the Lebesgue measure and distinguished properties of the gamma process. J.
Funct. Anal., 185(1):274–296, 2001.
[Vil03] Cédric Villani. Topics in optimal transportation, volume 58 of Graduate Studies in
Mathematics. American Mathematical Society, Providence, RI, 2003.
	Introduction
	Spaces of Probability Measures and Monotone Maps
	The Spaces P0=P([0,1]) and G0
	The Spaces G, G1 and P=P(S1)
	Dirichlet Process and Entropic Measure
	Gibbsean Interpretation and Heuristic Derivation of the Entropic Measure
	The Measures Q and P 
	The Measures Q0 and P0 
	The Dirichlet Process as Normalized Gamma Process
	Support Properties
	Scaling and Invariance Properties
	Dirichlet Processes on General Measurable Spaces
	The Change of Variable Formula for the Dirichlet Process and for the Entropic Measure
	Heuristic Approaches to Change of Variable Formulae
	The Change of Variables Formula on the Sphere
	The Change of Variables Formula on the Interval
	Proofs for the Sphere Case
	Proof for the Interval Case
	The Integration by Parts Formula
	The Drift Term
	Directional Derivatives
	Integration by Parts Formula on P(S1)
	Derivatives and Integration by Parts Formula on P([0,1])
	Dirichlet Form and Stochastic Dynamics on on G
	The Dirichlet Form on G
	Finite Dimensional Noise Approximations
	Dirichlet Form and Stochastic Dynamics on G1 and P
	Dirichlet Form and Stochastic Dynamics on G0 and P0
	The Canonical Dirichlet Form on the Wasserstein Space
	Tangent Spaces and Gradients
	The Dirichlet Form
	Rademacher Property and Intrinsic Metric
	Finite Dimensional Noise Approximations
	The Wasserstein Diffusion (t) on P0
ABSTRACT
  We construct a new random probability measure on the sphere and on the unit
interval which in both cases has a Gibbs structure with the relative entropy
functional as Hamiltonian. It satisfies a quasi-invariance formula with respect
to the action of smooth diffeomorphism of the sphere and the interval
respectively. The associated integration by parts formula is used to construct
two classes of diffusion processes on probability measures (on the sphere or
the unit interval) by Dirichlet form methods. The first one is closely related
to Malliavin's Brownian motion on the homeomorphism group. The second one is a
probability valued stochastic perturbation of the heat flow, whose intrinsic
metric is the quadratic Wasserstein distance. It may be regarded as the
canonical diffusion process on the Wasserstein space.

<|endoftext|><|startoftext|>
Introduction
The eikonal approximation [1–3] has a long history of successful results in
describing scattering processes like nucleon-nucleus scattering, heavy-ion col-
lisions, and electroinduced nucleon-knockout reactions. The latter class of re-
actions, usually denoted as A(e, e′p), provide access to a wide range of nuclear
phenomena like short- and long-range correlations, relativistic effects, the tran-
sition from hadronic to partonic degrees of freedom, and medium modifications
of nucleon properties. The interpretation of A(e, e′p) data heavily relies on an
Email address: Jan.Ryckebusch@UGent.be (J. Ryckebusch).
Preprint submitted to Elsevier 6 August 2021
http://arxiv.org/abs/0704.0705v1
accurate description of the effect of the final-state interactions (FSI), i.e., the
interactions of the ejected proton with the residual nucleus such as rescatter-
ing and/or absorption. The eikonal approximation has been widely used to
treat these distortions, either in combination with optical potentials [4–7], or
with Glauber theory, its multiple-scattering extension [8–15].
The eikonal scattering wave functions are derived by linearizing the continuum
wave equation for the ejected proton. Hence, the solution is only valid to first
order in 1/k, with k the proton’s momentum, and the eikonal approximation
is suited for the description of reactions at sufficiently high energies. To extend
the applicability to lower energies, Wallace [16] has developed systematic cor-
rections to the eikonal scattering amplitude. Several authors have investigated
the effect of higher-order eikonal corrections in elastic nuclear scattering by
protons, antiprotons, and α particles [17,18], heavy-ion collisions [19–22], and
inclusive electron-nucleus scattering [23]. The aim of this Letter is to deter-
mine the influence of higher-order eikonal corrections on A(e, e′p) observables.
To this purpose, we extend the relativistic optical model eikonal approxima-
tion (ROMEA) A(e, e′p) framework of Ref. [7]. Our formalism builds upon the
work of Baker [24], where an eikonal approximation for potential scattering
was derived to second order in 1/k. Here, this work is extended to include the
effect of the spin-orbit potential.
The outline of this Letter is as follows. In Section 2, the second-order eikonal
correction to the ROMEA model is derived. Section 3 presents the results of
the A(e, e′p) numerical calculations. We look into how the second-order eikonal
correction affects more inclusive quantities like the nuclear transparency, as
well as truly exclusive observables such as the induced normal polarization
Pn, the left-right asymmetry ALT , and the differential cross section. Finally,
in Section 4, we state our conclusions.
2 Formalism
For the description of the A(e, e′p) reaction, we adopt the impulse approxi-
mation (IA) and the independent-nucleon picture. Within this approach, the
basic quantity to be computed is the transition matrix element [25]
〈Jµ〉 =
~k,ms
(~r) Ĵµ(~r) ei~q·~r φα1(~r) . (1)
Here, φα1 and Ψ
~k,ms
are the relativistic bound-state and scattering wave func-
tions, with α1 the quantum numbers of the struck proton and ~k and ms the
momentum and spin of the ejected proton. The relativistic bound-state wave
function is obtained in the Hartree approximation to the σ − ω model [26]
with the W1 parametrization for the different field strengths [27]. The scat-
tering wave function Ψ
~k,ms
appears with incoming boundary conditions and is
related to Ψ
~k,ms
by time reversal. Furthermore, Ĵµ is the relativistic one-body
current operator. Throughout this Letter, we use the Coulomb gauge and the
CC2 form of Ĵµ [28].
We now turn our attention to the determination of the scattering wave func-
tion Ψ
~k,ms
. We start by considering the Dirac equation for a proton with
relativistic energy E =
k2 +M2N and spin state
subject to Lorentz
scalar and vector potentials Vs(r) and Vv(r). The Dirac equation for the four-
component spinor Ψ
~k,ms
(~r) is converted to a Schrödinger-like equation for the
upper component u
~k,ms
(~r) [7, 29]
+ Vc(r) + Vso(r) (~σ · ~L− i~r · ~̂p)
~k,ms
(~r) =
~k,ms
(~r) . (2)
The central Vc(r) and spin-orbit Vso(r) potentials are defined in terms of the
scalar and vector ones, Vs(r) and Vv(r). The lower component w
~k,ms
(~r) is
related to the upper one through
~k,ms
(~r) =
E +MN + Vs(r)− Vv(r)
~σ · ~̂p u
~k,ms
(~r) . (3)
When solving Eq. (2) in the eikonal approximation, a standard procedure is
to replace the momentum operator ~̂p by the asymptotic momentum ~k in the
spin-orbit (Vso(r)~σ · ~L) and Darwin (Vso(r) (−i~r · ~̂p)) terms, as well as in the
lower component (3). In literature, this is usually referred to as the effective
momentum approximation (EMA) [30]. For the upper component, one puts
forward a solution of the form
~k,ms
(~r) ≡ N η(~r) ei
~k·~r χ 1
, (4)
i.e., a plane wave modulated by an eikonal factor η(~r). Here, N is a normal-
ization factor.
In the ROMEA approach [7, 29], which adopts the first-order eikonal approx-
imation, Eq. (2) is linearized in ~̂p leading to a solution for the eikonal factor
of the form
ηROMEA(~r) = ηROMEA(~b, z) = exp
dz ′ Vopt(~b, z
 , (5)
where ~r ≡ (~b, z), the z axis lies along the momentum ~k of the proton, and
Vopt(~b, z) = Vc(~b, z) + Vso(~b, z) (~σ · ~b × ~k − ikz). Despite the fact that it is
written as an exponential phase, the solution (5) is only valid up to first order
in Vopt/k.
In what follows, we will derive an expression for the eikonal factor η(~r) that
is valid up to order Vopt/k
2. The momentum dependence in the spin-orbit
and Darwin terms makes that these terms are retained up to order Vso/k,
while central terms are included up to order Vc/k
2. Note that the expansion is
not expressed in terms of the Lorentz scalar and vector potentials Vs and Vv.
Looking for a solution of the form (4) for the Schrödinger-like equation (2),
Baker arrived at the following equation for the eikonal factor (see Eq. (14) of
Ref. [24]):
η(~b, z) = 1− i
dz ′ Vopt(~b, z
′) η(~b, z ′) +
Vopt(~b, z) η(~b, z)
dz ′ (z − z ′)
Vopt(~b, z
′) η(~b, z ′)
. (6)
Note that, apart from dropping contributions of order Vopt/k
3 and higher, no
additional assumptions were made when deriving Eq. (6). In Ref. [24], Eq. (6)
was subsequently solved for spherically symmetric potentials. The spin-orbit
and Darwin terms, however, break the spherical symmetry and a novel method
to solve Eq. (6) is needed. To that purpose, we assume that the derivative of
the function η is of higher order in 1/k than η itself (as is true for the ROMEA
solution (5)). This allows us to drop the ∂η/∂b contribution in the last term
of Eq. (6), as it is of order Vopt/k
3 or higher:
dz ′ (z − z ′)
Vopt(~b, z
′) η(~b, z ′)
dz ′ (z − z ′)
Vc(~b, z
′) + Vso(~b, z
′) (~σ ·~b× ~k − ikz ′)
η(~b, z ′) . (7)
Spherical symmetry implies that z ′ ∂Vc(~b, z
′)/∂b = b ∂Vc(~b, z
′)/∂z ′. Hence,
the z ′ ∂Vc/∂b term in Eq. (7) can be written as
dz ′ b
∂Vc(~b, z
η(~b, z ′)
dz ′ b
Vc(~b, z
′) η(~b, z ′)
2 + b
Vc(~b, z)
η(~b, z) . (8)
In the first step, we made use of the fact that the derivative ∂η/∂z ′ is of higher
order to turn the integrand into an exact differential. A similar reasoning,
followed by integration by parts, leads to
dz ′ (z − z ′)
∂Vso(~b, z
(−ikz ′) η(~b, z ′)
2 + b
Vso(~b, z)
η(~b, z ′) , (9)
for the Darwin term of Eq. (7). Inserting the expressions of Eqs. (8) and (9),
Eq. (6) adopts the form
η(~b, z) =
dz ′ Vopt(~b, z
′) η(~b, z ′)−
1 + b
Vc(~b, z)
η(~b, z)
1 + b
∂Vc(~b, z
η(~b, z ′)
Vso(~b, z) (~σ ·~b× ~k − ikz) η(~b, z)
1 + b
dz ′ (z − z ′)
Vso(~b, z
′)~σ ·~b× ~k
η(~b, z ′)
2 + b
Vso(~b, z)
η(~b, z ′) . (10)
We look for a solution of the form
η(~b, z) = f(~b, z) exp
dz ′ Vopt(~b, z
′) f(~b, z ′)
= f(~b, z) exp
i S(~b, z)
, (11)
which should reduce to the ROMEA result of Eq. (5) when terms of higher
order than Vopt/k are neglected. Accordingly, the function f(~b, z) should be of
the form f = 1+O(Vopt/k
2). Substituting (11) into Eq. (10) and multiplying
by e−i S(
~b,z) yields
f(~b, z) = 1−
1 + b
Vc(~b, z)
f(~b, z)
1 + b
∂Vc(~b, z
f(~b, z ′)
Vso(~b, z) (~σ ·~b× ~k − ikz) f(~b, z)
1 + b
dz ′ (z − z ′)
Vso(~b, z
′)~σ ·~b× ~k
f(~b, z ′)
2 + b
Vso(~b, z)
f(~b, z ′) . (12)
In deriving this equation, we set ei S(
~b,z ′) e−i S(
~b,z) equal to 1, since higher-order
terms are neglected. The difficulty in solving for f(~b, z) is that Eq. (12) is an
integral equation. An expression for f(~b, z) can, however, be readily obtained
by adding (1−f) terms, which introduce only higher-order terms, to the right-
hand side of Eq. (12). This is permitted since we seek for a solution up to order
Vopt/k
2. With this manipulation, the function f becomes
f(~b, z) = 1−
1 + b
Vc(~b, z) +
1 + b
∂Vc(~b, z
Vso(~b, z) (~σ ·~b× ~k − ikz)
1 + b
dz ′ (z − z ′)
Vso(~b, z
′)~σ ·~b× ~k
2 + b
Vso(~b, z) . (13)
The eikonal factor of Eq. (11) with f(~b, z) given by (13), is a solution of the
integral equation (6) to order Vopt/k
2 and reduces to the ROMEA result (5)
when truncated at order Vopt/k. Furthermore, it can be easily verified that the
derivative of η is of higher order in Vopt/k than η itself. Henceforth, calculations
performed with the eikonal factor of Eqs. (11) and (13), are dubbed as the
second-order relativistic optical model eikonal approximation (SOROMEA).
3 Results
One way to quantify the overall effect of FSI in A(e, e′p) processes is via
the nuclear transparency. The measurements are commonly performed under
quasielastic conditions [31–36]. We obtain the theoretical transparencies by
adopting similar expressions and cuts as in the experiments. Hence, the nuclear
transparency is defined as [37]
d~pmS
α(~pm, Em, ~k)
d~pmS
PWIA(~pm, Em)
. (14)
Here, Sα is the reduced cross section for knockout from the shell α
Sα(~pm, Em, ~k) =
dΩpdǫ′dΩǫ′
(e, e′p)
, (15)
where ~pm and Em are the missing momentum and energy, K is a kinemati-
cal factor and σep is the off-shell electron-proton cross section. S
PWIA is the
reduced cross section within the plane-wave impulse approximation (PWIA)
in the nonrelativistic limit. Further,
α extends over all occupied shells α in
the target nucleus. The phase-space volume in the missing momentum ∆3pm
is defined by the cut |pm| ≤ 300 MeV/c. The A-dependent factor cA corrects
in a phenomenological way for the effect of short-range correlations. We intro-
duce the cA in the denominator of Eq. (14) because the data have undergone
a rescaling with cA = 0.9 (
12C) and 0.82 (56Fe).
Transparencies have been computed for the nuclei 12C and 56Fe at planar
and constant (~q, ω) kinematics compatible with the phase space covered in
the experiments. For the optical potential, the EDAD1 parametrization of
Ref. [38] was used.
In Fig. 1 the ROMEA and SOROMEA results are displayed as a function
of the four-momentum transfer Q2 and compared to the data. Not surpris-
ingly, at high Q2, the ROMEA and SOROMEA predictions practically coin-
cide and the role of the second-order eikonal effects grows with decreasing Q2.
At Q2 = 1.7 (GeV/c)2, the ROMEA and SOROMEA transparencies agree to
within 1%; while at Q2 = 0.3 (GeV/c)2, the difference has risen to 3% for
56Fe and 5% for 12C. The enhancement of the nuclear transparency due to
the second-order eikonal corrections is modest, even for values of the four-
momentum transfer as low as Q2 = 0.2 (GeV/c)2. Both the ROMEA and
the SOROMEA predictions tend to slightly underestimate the measurements.
The second-order corrections move the predictions somewhat closer to the
Q2 = 0.34 (GeV/c)2 data point.
As the nuclear transparency involves integrations over missing momenta and
energies, it may hide subtleties in the theoretical treatment of the FSI mech-
anisms. Next, we focus on highly exclusive A(e, e′p) quantities and quantify
the role of second-order eikonal effects.
An observable that is particularly well suited to study FSI effects is the induced
normal polarization
d5σ (σn =↑)− d
5σ (σn =↓)
d5σ (σn =↑) + d5σ (σn =↓)
, (16)
where σn denotes the spin orientation of the ejectile in the direction orthogonal
to the reaction plane. Indeed, in the one-photon exchange approximation, Pn
vanishes in the absence of FSI.
Fig. 2 shows the missing momentum dependence of the induced normal polar-
ization for the kinematics of Ref. [39], corresponding with Q2 ≈ 0.5 (GeV/c)2.
The calculations are performed with the energy-dependent A-independent
(EDAI) potential of Ref. [38]. The ROMEA results are in line with the rela-
tivistic distorted-wave impulse approximation (RDWIA) calculations of Ref. [40].
The RDWIA framework was implemented by the Madrid-Sevilla group [41]
and relies on a partial-wave expansion of the exact scattering wave function.
It is similar to the (SO)ROMEA approach in that both models compute the
effect of the FSI with the aid of proton-nucleus optical potentials. Further,
the overall agreement with the data is excellent. The second-order eikonal
corrections are most pronounced for the 1s1/2 level. For missing momenta
pm > 125 MeV/c, they reduce the magnitude of the Pn for the 1s1/2 state
by roughly 20%, thereby resulting in a marginally better agreement with the
highest pm data point. For 1p3/2 knockout, on the other hand, the effect of the
second-order eikonal corrections is smaller than 5%.
The inclusion of the second-order eikonal effects is particularly visible at high
missing momentum, a region where also other mechanisms become impor-
tant. The qualitative behavior of the meson-exchange and ∆-isobar currents,
for instance, is alike [42]. At low missing momenta (pm ≤ 200 MeV/c), the in-
duced normal polarization Pn is relatively insensitive to the two-body currents;
whereas at higher missing momenta, sizable contributions from the meson-
exchange and isobar currents are predicted. The influence of the meson and
isobar degrees of freedom is also stronger for knockout from the 1s1/2 shell
than for 1p3/2 knockout.
In Fig. 2, also calculations neglecting the spin-orbit part Vso(~b, z)~σ ·~b×~k are
shown. They illustrate that the spin-orbit distortion is the largest source of
Pn. Hence, a correct inclusion of this term is essential. Moreover, Pn proves to
be rather sensitive to the choice of optical potential [40].
Another A(e, e′p) observable which has been the subject of many investigations
is the left-right asymmetry
ALT =
d5σ (φ = 0◦)− d5σ (φ = 180◦)
d5σ (φ = 0◦) + d5σ (φ = 180◦)
. (17)
The subscript LT indicates that this quantity is closely related to the longitudinal-
transverse response function.
Fig. 3 presents the ALT predictions for the removal of 1p-shell protons in
16O in the kinematics of Refs. [43, 44]. The FSI shift the dip in ALT , which
is located at pm ≈ 400 MeV/c in the relativistic PWIA (RPWIA), to lower
values of the missing momentum. This shift is essential to describe the data at
pm ≈ 350 MeV/c. The exact pm location and height of the ripple, however, are
affected by many ingredients of the calculations, such as the current operator,
bound-state wave function, and parametrization of the optical potential [44].
As can be inferred from Fig. 3, the second-order eikonal corrections affect the
height, but not the position of the ripple.
We also show the results of our SOROMEA calculations within the so-called
noSV approximation. In this approximation, the dynamical enhancement of
the lower component of the scattering wave (3) due to the Vs(r)− Vv(r) term
is omitted. As such, the SOROMEA-noSV calculations make the same set of
assumptions as the EMAf-noSV predictions by the Madrid-Sevilla group. The
EMAf-noSV approach is an RDWIA calculation which adopts the EMA in
combination with the noSV approximation. The second-order eikonal correc-
tions clearly increase the height of the oscillation in ALT and brings the eikonal
noSV calculations in excellent agreement with the corresponding partial-wave
prediction EMAf-noSV. Finally, the comparison between the SOROMEA and
the SOROMEA-noSV calculations demonstrates that the dynamical enhance-
ment plays a significant role in the description of the ALT data.
In Fig. 4, 16O(e, e′p) cross-section results are displayed for the kinematics
of Fig. 3. The spectroscopic factors, which normalize the calculations to the
data, were determined by performing a χ2 fit to the data and are summa-
rized in Table 1. The RDWIA spectroscopic factors are 5–10% higher than
the (SO)ROMEA ones. The second-order eikonal corrections hardly affect the
values of the extracted spectroscopic factors. Both our (SO)ROMEA calcula-
tions and the RDWIA predictions of the Madrid-Sevilla group do a very good
job of representing the data over the entire pm range. For missing momenta
|pm| ≤ 250 MeV/c, the (SO)ROMEA and RDWIA results are in excellent
agreement. The impact of the second-order eikonal corrections on the com-
puted differential cross sections is almost negligible for pm below the Fermi
momentum, but can be as large as 30% at high pm. The inclusion of the
second-order effects improves the agreement with the RDWIA calculations at
these high missing momenta. Results for the effective response functions RL,
RT , RLT , and RTT are not shown, but the effect of the second-order eikonal
corrections is similar to the effect on the differential cross section.
RPWIA ROMEA SOROMEA RDWIA
1p3/2 0.55 0.84 0.83 0.92
1p1/2 0.47 0.75 0.74 0.78
Table 1
The spectroscopic factors for the 16O(e, e′p) reaction of Ref. [43], as obtained with
a χ2 procedure.
4 Conclusions
We have developed a formalism to account for second-order corrections in the
eikonal approximation. Our model is relativistic and includes both the central
and spin-orbit parts of the optical potentials. The formalism has been applied
to A(e, e′p) processes. Our numerical calculations show that the effect of the
second-order eikonal corrections on A(e, e′p) observables is rather limited for
Q2 ≥ 0.2 (GeV/c)2. The nuclear transparency calculations confirm the ex-
pected energy dependence of the eikonal corrections: the effect decreases with
increasing Q2. Concerning the pm dependence of the A(e, e
′p) observables, the
effect of the second-order eikonal corrections is minor except at high missing
momenta. In this high-pm region, the eikonal corrections affect the observables
up to an order of 30%, thereby bringing the calculations closer to the data
and/or the RDWIA calculations. The robustness of the first-order eikonal ap-
proximation, which emerges from this study, can be invoked to explain the
success of the Glauber approach to A(e, e′p) down to relatively low kinetic
energies of 200 MeV.
Acknowledgements
This work was supported by the Fund for Scientific Research, Flanders (FWO).
References
[1] G.P. McCauley, G.E. Brown, Proc. Phys. Soc. London 71 (1958) 893.
[2] R.J. Glauber, in: W.E. Brittin, et al. (Eds.), Lectures in Theoretical Physics,
Interscience, New York, 1959.
[3] C.J. Joachain, Quantum Collision Theory (Elsevier, Amsterdam, 1975).
[4] W.R. Greenberg, G.A. Miller, Phys. Rev. C 49 (1994) 2747.
[5] A. Bianconi, M. Radici, Phys. Lett. B 363 (1995) 24.
[6] H. Ito, S.E. Koonin, R. Seki, Phys. Rev. C 56 (1997) 3231.
[7] D. Debruyne, J. Ryckebusch, W. Van Nespen, S. Janssen, Phys. Rev. C 62
(2000) 024611.
[8] L.L. Frankfurt, E. Moniz, M. Sargsyan, M.I. Strikman, Phys. Rev. C 51 (1995)
3435.
[9] N.N. Nikolaev, A. Szcurek, J. Speth, J. Wambach, B.G. Zakharov, V.R. Zoller,
Nucl. Phys. A 582 (1995) 665.
[10] S. Jeschonnek, T.W. Donnelly, Phys. Rev. C 59 (1999) 2676.
[11] A. Kohama, K. Yazaki, R. Seki, Nucl. Phys. A 662 (2000) 175.
[12] O. Benhar, N. Nikolaev, J. Speth, A. Usmani, B. Zakharov, Nucl. Phys. A 673
(2000) 241.
[13] C. Ciofi degli Atti, L.P. Kaptari, D. Treleani, Phys. Rev. C 63 (2001) 044601.
[14] M. Petraki, E. Mavrommatis, O. Benhar, J.W. Clark, A. Fabrocini, S. Fantoni,
Phys. Rev. C 67 (2003) 014605.
[15] J. Ryckebusch, D. Debruyne, P. Lava, S. Janssen, B. Van Overmeire, T. Van
Cauteren, Nucl. Phys. A 728 (2003) 226.
[16] S.J. Wallace, Phys. Rev. Lett. 27 (1971) 622;
S.J. Wallace, Ann. Phys. (N.Y.) 78 (1973) 190;
S.J. Wallace, Phys. Rev. D 8 (1973) 1846;
S.J. Wallace, J.A. McNeil, Phys. Rev. D 16 (1977) 3565;
S.J. Wallace, Phys. Rev. C 29 (1984) 956.
[17] D. Waxman, C. Wilkin, J.-F. Germond, R.J. Lombard, Phys. Rev. C 24 (1981)
[18] G. Fäldt, A. Ingemarsson, J. Mahalanabis, Phys. Rev. C 46 (1992) 1974.
[19] F. Carstoiu, R.J. Lombard, Phys. Rev. C 48 (1993) 830.
[20] M.H. Cha, Y.J. Kim, Phys. Rev. C 51 (1995) 212.
[21] J.S. Al-Khalili, J.A. Tostevin, J.M. Brooke, Phys. Rev. C 55 (1997) R1018.
[22] C.E. Aguiar, F. Zardi, A. Vitturi, Phys. Rev. C 56 (1997) 1511.
[23] J.A. Tjon, S.J. Wallace, Phys. Rev. C 74 (2006) 064602.
[24] A. Baker, Phys. Rev. D 6 (1972) 3462.
[25] J.J. Kelly, Adv. Nucl. Phys. 23 (1996) 75.
[26] B.D. Serot, J.D. Walecka, Adv. Nucl. Phys. 16 (1986) 1.
[27] R.J. Furnstahl, B.D. Serot, H.-B. Tang, Nucl. Phys. A 615 (1997) 441.
[28] T. de Forest, Nucl. Phys. A 392 (1983) 232.
[29] R.D. Amado, J. Piekarewicz, D.A. Sparrow, J.A. McNeil, Phys. Rev. C 28
(1983) 1663.
[30] J.J. Kelly, Phys. Rev. C 60 (1999) 044609.
[31] G. Garino, et al., Phys. Rev. C 45 (1992) 780.
[32] T.G. O’Neill, et al., Phys. Lett. B 351 (1995) 87.
[33] N.C.R. Makins, et al., Phys. Rev. Lett. 72 (1994) 1986.
[34] D. Abbott, et al., Phys. Rev. Lett. 80 (1998) 5072.
[35] D. Dutta, et al., Phys. Rev. C 68 (2003) 064603.
[36] D. Rohe, et al., Phys. Rev. C 72 (2005) 054602.
[37] P. Lava, M.C. Mart́ınez, J. Ryckebusch, J.A. Caballero, J.M. Ud́ıas, Phys. Lett.
B 595 (2004) 177.
[38] E.D. Cooper, S. Hama, B.C. Clark, R.L. Mercer, Phys. Rev. C 47 (1993) 297.
[39] R.J. Woo, et al., Phys. Rev. Lett. 80 (1998) 456.
[40] J.M. Ud́ıas, J.R. Vignote, Phys. Rev. C 62 (2000) 034302.
[41] J.M. Ud́ıas, P. Sarriguren, E. Moya de Guerra, E. Garrido, J. A. Caballero,
Phys. Rev. C 48 (1993) 2731.
[42] J. Ryckebusch, D. Debruyne, W. Van Nespen, S. Janssen, Phys. Rev. C 60
(1999) 034604.
[43] J. Gao, et al., Phys. Rev. Lett. 84 (2000) 3265.
[44] K.G. Fissum, et al., Phys. Rev. C 70 (2004) 034606.
ROMEA
SOROMEA
Q2  (GeV2/c2)
0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Fig. 1. Nuclear transparencies versus Q2 for A(e, e′p) reactions in quasielastic kine-
matics. The SOROMEA (dashed lines) are compared to the ROMEA (solid lines)
results. The EDAD1 potential [38] has been employed in both formalisms. Data
points are from Refs. [31] (open squares), [32, 33] (open triangles), [34, 35] (solid
triangles), and [36] (open diamonds).
1p3/2
ROMEA
SOROMEA
ROMEA-noSO
SOROMEA-noSO
1s1/2
pm (MeV/c)
0 50 100 150 200 250 300
Fig. 2. Induced normal polarization Pn for proton knockout from the 1p3/2 (upper
panel) and 1s1/2 (lower panel) shell in the
12C(e, e′~p) reaction. The kinematics is de-
termined by beam energy ǫ = 579 MeV, momentum transfer q = 760 MeV/c, energy
transfer ω = 292 MeV, and azimuthal angle φ = 180◦. The solid (dashed) curves
represent ROMEA (SOROMEA) calculations. The dot-dashed (dotted) curves re-
fer to predictions obtained within the ROMEA (SOROMEA) frameworks, with the
spin-orbit term Vso(~b, z)~σ ·~b× ~k turned off. The data are from Ref. [39].
1p1/2
ROMEA
SOROMEA
SOROMEA-noSV
EMAf-noSV
RPWIA
1p3/2
pm (MeV/c)
0 50 100 150 200 250 300 350 400 450
Fig. 3. The left-right asymmetry ALT for the
16O(e, e′p) experiment of [43].
The kinematics was ǫ = 2.442 GeV, q = 1 GeV/c, and ω = 445 MeV (i.e.,
Q2 = 0.8 (GeV/c)2). The red solid (green dashed) lines show the results of the
ROMEA (SOROMEA) calculations. The SOROMEA-noSV (orange long-dotted
curves) calculations differ from the SOROMEA calculations in that the dynamical
enhancement of the lower component of the scattering wave function is neglected.
The cyan short-dotted curves present the results from an RDWIA calculation where
the spinor distortions in the scattered wave are neglected. All calculations use the
EDAI version for the optical potentials [38]. The black short-dotted curves represent
the RPWIA results. The data points are from Ref. [43].
ROMEA
SOROMEA
RDWIA
RPWIA
1p3/2
1p1/2
pm (MeV/c)
-400 -300 -200 -100 0 100 200 300 400
Fig. 4. 16O(e, e′p) cross sections compared to ROMEA, SOROMEA, RDWIA, and
RPWIA calculations for the constant (~q, ω) kinematics of Fig. 3. The calculations
use the optical potential EDAI [38]. The data are from Ref. [43] and the RDWIA
results from Ref. [44]. The following convention is adopted: positive (negative) pm
corresponds to φ = 180◦ (φ = 0◦).
	Introduction
	Formalism
	Results
	Conclusions
	Acknowledgements
	References
ABSTRACT
  The first-order eikonal approximation is frequently adopted in interpreting
the results of $A(e,e'p)$ measurements. Glauber calculations, for example,
typically adopt the first-order eikonal approximation. We present an extension
of the relativistic eikonal approach to $A(e,e'p)$ which accounts for
second-order eikonal corrections. The numerical calculations are performed
within the relativistic optical model eikonal approximation. The nuclear
transparency results indicate that the effect of the second-order eikonal
corrections is rather modest, even at $Q^{2} \approx 0.2$ (GeV/c)$^2$. The same
applies to polarization observables, left-right asymmetries, and differential
cross sections at low missing momenta. At high missing momenta, however, the
second-order eikonal corrections are significant and bring the calculations in
closer agreement with the data and/or the exact results from models adopting
partial-wave expansions.

<|endoftext|><|startoftext|>
Flory-Huggins theory for the solubility of heterogeneously-modified polymers
Patrick B. Warren
Unilever R&D Port Sunlight, Bebington, Wirral, CH63 3JW, UK.
(Dated: April 5, 2007)
Many water soluble polymers are chemically modified versions of insoluble base materials such as
cellulose. A Flory-Huggins model is solved to determine the effects of heterogeneity in modification
on the solubility of such polymers. It is found that heterogeneity leads to decreased solubility, with
the effect increasing with increasing blockiness. In the limit of extreme blockiness, the nature of the
phase coexistence crosses over to a polymer-polymer demixing transition. Some consequences are
discussed for the synthesis of partially modified polymers, and the experimental characterisation of
such systems.
Many water-soluble polymers are made by chemically
modifying insoluble base materials such as starches and
gums, for example a wide class of water-soluble polymers
are obtained from cellulose [1, 2]. It is often possible to
vary the degree of modification of the base polymer to
obtain water soluble polymers with, in principle, con-
tinuously variable properties. A basic characteristic of
these polymers is their solubility, but given the essen-
tially stochastic nature of the chemical modification step,
what is the effect of heterogeneity in modification on the
solubility of the resulting materials?
In the present paper, this question is approached from
a theoretical point of view by setting up a Flory-Huggins
model for the phase behaviour of a polymer-solvent mix-
ture [3], where the polymers have a random degree of
modification. In this approach, the issue of solubility
is translated into the problem of determing the phase
coexistence between a dissolved aqueous phase and an
undissolved (water-poor) phase. The solubility is then
formally given by the polymer concentration in the aque-
ous phase. Determination of the full phase behaviour for
a multicomponent Flory-Huggins theory is an onerous
task though, and a simpler approach is to examine the
spinodal stability of the system, which can be taken to
be representative of the full phase behaviour. This is the
approach taken in the present paper. It is arguably more
insightful than a full calculation of the phase behaviour
since closed-form analytic expressions can be obtained for
the spinodal stability limit. The approach taken is sim-
ilar to models for the phase behaviour of random block
copolymer melts which have been developed in the past
[4, 5, 6]. There has been rather little work though on
random copolymers which also include a solvent, apart
from a brief example described by Sollich et al [7].
In the present model, it is supposed that the system
comprises a large number of species of polymers i with
differing degrees of modification 0 < αi < 1 and con-
centrations ρi. For simplicity, length polydispersity is
neglected, and all the polymers are assumed to have the
same number N of segments. The system is then de-
scribed by the following (mean field) Flory-Huggins free
energy density,
i ρi log ρi+(1−φ) log(1−φ)+χ(φ−η)(1−φ), (1)
where φ is the total polymer segment concentration and
η is the concentration of chemically modified segments,
given respectively by φ = N
i ρi and η = N
i ρiαi.
The first term in Eq. (1) is the ideal free energy of mix-
ing. The second term is the usual Flory-Huggins configu-
rational chain entropy. The third term is the free energy
cost of the unmodified polymer segments at a concentra-
tion φ− η coming into contact with solvent (water) at a
concentration 1 − φ. Typically one expects χ > 1/2 for
this interaction, to represent the repulsion between un-
modified segments and water which leads to phase sepa-
ration of unmodified polymers. To keep the model sim-
ple, this is the only χ-parameter that is retained in the
problem.
Eq. (1) has the structure of a moment free energy,
since the excess free energy, comprising the second and
third terms, only depends on φ and η which are mo-
ment densities. Such a system can be analysed using the
methods developed by Sollich and coworkers [7, 8, 9, 10].
In particular, Ref. [10] describes how the spinodal sta-
bility conditions for systems with an excess free energy
can be expressed in terms of moment densities, gen-
eralising various truncation theorems obtained by ear-
lier workers [11, 12]. I now summarise the relevant re-
sults, translated into terms suitable for the present prob-
lem. Let us consider such a system with a free energy
i ρi log ρi + f
(ex)(φ(1) . . . φ(n)), where the excess
free energy depends on moment densities of the form
φ(r) =
i ρiw
i (r = 1 . . . n), with the w
i being
species-dependent weights. The fundamental idea is that
the moment densities can be treated as effective species
concentrations. In particular, it can be proved that spin-
odal stability corresponds to the positive-definiteness of
the matrix M of second partial derivatives of the free en-
ergy with respect to the moment densities. In Ref. [10]
it is shown that M = Mid + Mex where (M
id )rs =
i ρiw
i and (Mex)rs = ∂
2f (ex)/∂φ(r)∂φ(s). The
limit of spinodal stability is given by detM = 0. This
condition usually corresponds to the vanishing of a single
eigenvalue of M, with an eigenvector ∆φ(s) that satisfies
s(M)rs∆φ
(s) = 0. It is shown in Ref. [10] that the
spinodal instability direction in the space of species con-
centrations is given by ∆ρi =
rs ρiw
i (Mid)rs∆φ
For the present problem, there are two moment densi-
ties φ and η, defined respectively with w
i = N (a con-
http://arxiv.org/abs/0704.0707v1
stant) and w
i = Nαi (the number of modified groups
on the ith species). Application of the above theory to
Eq. (1) leads to
id = N
i ρiαi
i ρiαi
i ρiα
Mex =
(1− φ)−1 − 2χ χ
. (3)
After some algebra the condition detM = 0 reduces to
−2χ(1−〈α〉)−χ2Nφ(〈α2〉−〈α〉) = 0, (4)
where
〈α〉 =
i ρiαi /
i ρi, 〈α
i ρiα
i ρi. (5)
I emphasise that, despite being remarkably simple,
Eq. (4) is exact.
One already reaches a significant conclusion from this.
The first three terms in Eq. (4) are what one would ex-
pect from standard Flory-Huggins theory [3], with an ef-
fective χ-parameter given by the product of the original
χ-parameter and the fraction 1− 〈α〉 of unmodified seg-
ments. These terms therefore take account of the mean
degree of modification. The final term in Eq. (4) is a
correction due to the heterogeneity. Since the variance
〈α2〉− 〈α〉2 is positive, this term is always negative. The
effect is that heterogeneity in modification reduces the
solubility, over and above what would be expected from
the mean degree of modification.
To make further progress, it is convenient to specify a
model for the distribution of the αi. In particular, such a
model can be used to examine the effect of blockiness in
modification which is expected to play an important role.
In previous work on random block copolymers [4, 5], a
Markov model was used to characterise the correlations
between different kinds of segments. Whilst such a model
may be appropriate for the stochastic nature of the syn-
thetic route for such random block copolymers, as dis-
cussed below it is probably not appropriate in the present
case. I therefore consider instead a very simple model for
the heterogeneity in which the modified segments occur
in blocks of size M , where 1 < M < N . In this model,
it is supposed that each block has an equal probability p
of being modified, and there are no further correlations.
Then, for any particular species, αi = (1/N)
j=1 Mǫij
where j labels the blocks, and ǫij is zero or one with prob-
ability 1 − p and p respectively. Thus the αi are drawn
from scaled binomial distribution, with
〈α〉 = p, 〈α2〉 − 〈α〉 = (M/N) p(1− p). (6)
Eq. (4) becomes
− 2χ(1− p)− χ2Mφp(1− p) = 0. (7)
This is a quadratic equation for χ and the appropriate
root is
)}1/2
. (8)
I now examine the consequences of this result.
The formal limit M → 0 corresponds to a vanishing
variance and a completely uniform distribution of mod-
ified segments, as though each monomer has undergone
an identical fractional modification by a fraction p, rather
than being modified or not with probability p and 1− p.
As noted already above, this limit corresponds to sim-
ple Flory-Huggins theory with an effective χ-parameter
equal to χ(1−p). For large N , this indicates the absence
of phase separation for χ(1− p) < 1/2 or p > 1− 1/(2χ).
Now let us consider Eq. (8) for block size M = 1.
In this case, individual segments are modified randomly
with no correlations. For M = 1 and large N in Eq. (8),
there are two behaviours depending on the value of p.
For p < 4/5, there is an absence of phase separation
for χ(1 − p) < 1/2, just as for the M → 0 limit. For
4/5 < p < 1, the behaviour is more complicated. To be
precise, the location of the minimum value of the χ(φ)
spinodal shifts from φmin ∼ N
−1/2 for p < 4/5 to a non-
vanishing 0 < φmin < 1 for p > 4/5 (it is the examination
of Eq. (8) in the limit φ ∼ N−1/2 that gives the cross over
point p = 4/5). The change in behaviour can be seen for
the M = 1 curves (dashed lines) in Fig. 1 and is shown
explicitly in the upper plot of Fig. 2.
Let us next consider the limit of extreme blockiness
M = N . This limit is strikingly different from the M = 1
case. For largeN and p > 0, one can show that there is an
absence of phase separation only for χ
Np(1− p) < 2.
In the large N limit, this inequality is always violated,
indicating that the system always has a tendency to un-
dergo phase separation in the limit of extreme blockiness.
Since the unmodified polymer system itself only phase
separates for χ > 1/2, this suggests that the phase sep-
aration has the nature of a polymer-polymer demixing
transition rather than a solvent-driven phase separation.
This insight is confirmed by analysis of the spinodal in-
stability direction below.
For large N and general M in Eq. (8), one would ex-
pect that the above two cases represent the two classes of
behaviour. In the first case M ≪ N and the behaviour
is similar to the M = 1 limit where individual segments
are randomly modified. In the second case, M ∝ N
and the behaviour is similar to the M = N limit of ex-
treme blockiness. Fig. 1 shows typical spinodal curves
calculated from Eq. (8) for various values of p and M .
The location of the minimum (φmin, χmin) of the spin-
odal curves can be numerically determined, and Fig. 2
shows how this depends on p.
The results show firstly that for M ≪ N , increasing p
leads to increasing solubility as the value of χ required to
reach the spinodal instability is increased. Moreover, a
decrease in solubility between a uniform model (M → 0)
with no heterogeneity, and a model with fine-grained
0.0 0.2 0.4 0.6 0.8 1.0
PSfrag replacements
p = 0.1
0.0 0.2 0.4 0.6 0.8 1.0
PSfrag replacements
p = 0.1
p = 0.5
0.0 0.2 0.4 0.6 0.8 1.0
PSfrag replacements
p = 0.1
p = 0.5
p = 0.9
FIG. 1: Spinodal curves calculated from Eq. (8) for polymers
of length N = 103, for three values of the mean degree of
modification p, and for block sizes M → 0 (uniform limit,
solid line), M = 1 (dashed line), M = 10 (dash-dot line),
M = 100 (dash-dot-dot line) and M = 103 (dash-dash-dot
line). The system is spinodally unstable above the indicated
curves. Note the change in shape of the M = 1 curves: for
p = 0.1 and 0.5 the minimum is at φ → 0, whereas for p = 0.9
the minimum is at φ ≈ 0.25.
blockiness (M = 1), is apparent. The major effect arises
as M → N though, where the tendency for phase sepa-
ration is greatly enhanced.
The above analysis is augmented considering the spin-
odal instability direction associated with the spinodal
stability limit which can provide a useful mechanistic in-
sight. As explained above, the spinodal instability direc-
tion is characterised by the eigenvector that corresponds
to the vanishing eigenvalue responsible for the vanish-
0.0 0.2 0.4 0.6 0.8 1.0
PSfrag replacements
0.0 0.2 0.4 0.6 0.8 1.0
PSfrag replacements
FIG. 2: The location of the numerically determined minimum
of the spinodal curves from Eq. (8) is plotted as a function
of p, for polymers of length N = 103 and block sizes M → 0
(uniform limit, solid line), M = 1 (dashed line), M = 10
(dash-dot line), M = 100 (dash-dot-dot line) and M = 103
(dash-dash-dot line). For M = 1 (dashed line) the upper
plot shows clearly that φmin ≈ N
≈ 0.03 only holds for
4/5 = 0.8.
ing spinodal determinant. For the present problem, from
Eqs. (2)–(3), one finds the instability direction is charac-
terised by
∆η /∆φ = 〈α〉 − χNφ(〈α2〉 − 〈α〉2) (9)
The corresponding spinodal instability direction in the
space of species concentrations is
〈α2〉∆φ − 〈α〉∆η + αi(∆η − 〈α〉∆φ)
φ(〈α2〉 − 〈α〉2)
1 + χNφ(〈α〉 − αi)
where the second line follows by inserting the result for
the ratio ∆η/∆φ. These results should be evaluated on
the spinodal. They are all exact, for an arbitrary distri-
bution of αi.
For the instability direction to lie along a pure dilution
line, one should have ∆ρi/ρi independent of species i.
One can conclude that this only happens if ∆η/∆φ =
〈α〉, in other words if the variance 〈α2〉 − 〈α〉2 vanishes.
In such a case, the phase transition is purely associative,
or solvent-driven, meaning that the compositions of the
coexisting phases remain the same (∆ρi/∆φ = ρi/φ).
If one specialises to the model of blockiness described
above by inserting the value of χ corresponding to the
spinodal stability limit, the instability direction becomes
)}1/2
This confirms that the spinodal instability lies along a
dilution line (∆η/∆φ = 〈α〉 = p) only in the limit
M → 0 which formally corresponds to a vanishing vari-
ance. For M = 1 (and M ≪ N in general) the phase
transition has a mixed character. The interesting case
occurs when M = N (or M ∝ N in general) for which
∆η/∆φ ∼ (−)N1/2 in the limit of largeN . One can write
this as ∆φ/∆η → 0 as N → ∞. This shows that the
phase transition tends towards being purely segregative,
meaning that the overall polymer concentration in coex-
isting phases remains the same (∆φ = 0). This confirms
the suggestion above, that in the limit of extreme block-
iness, the system tends towards a segregative polymer-
polymer demixing transition.
Let us now try to draw some conclusions. The main ef-
fect of randomness is to reduce the solubility of partially-
modified polymers beyond what would be expected from
the mean degree of modification. The extent to which
this occurs depends on the blockiness in substitution.
For fine-grained blockiness, the phase behaviour is ex-
pected to be similar to a system for which there is no ran-
domness, albeit with a somewhat reduced solubility. For
coarse-grained blockiness, where the block size is com-
parable to the polymer length, the nature of the phase
transition changes to a polymer-polymer demixing tran-
sition. In this situation, one expects that the modified
polymers (being almost fully modified) will partition into
the aqueous phase, leaving the unmodified polymers be-
hind.
The reason for considering the two extreme kinds of
blockiness is now clearer: namely one can envisage two
different mechanisms of chemical modification (this is the
reason why a Markov model for the distribution of mod-
ified segments has not been used). Fine-grained block-
iness would arise if monomers are equally accessible to
the modifying agent, irrespective of their surroudings. If
this cannot be achieved in a one-step process (for the
reason described below) it could perhaps be achieved in
a two-step process, by fully modifying the polymers then
removing a random fraction of the derivative groups. Ex-
treme blockiness on the scale of the polymer chain itself
would arise if the modifying agent was present only in the
aqueous phase, and as such only able to access polymer
which had already been solubilised. This would lead to a
mixture of polymers which were either fully modified, or
remained unmodified and insoluble. The process of mod-
ification of insoluble polymers could still be initiated be-
cause the modifying agent is able to access the tiny pro-
portion of the insoluble polymer segments which lie at
the interface between the insoluble and aqueous phases.
Experimentally, confirmation of the scenario of extreme
blockiness would be given by measuring the mean degree
of modification for the dissolved polymers. One should
find that this is much in excess of the apparent mean
degree of modification.
In the calculation, the major effect arises from inter-
chain rather than intra-chain heterogeneities. The model
is not sophisticated enough to take account of the solu-
tion structures such as micelles or mesophases that could
form for blocky polymers with block sizes M ≫ 1 but
still M < N (for example, diblock copolymers). Such
polymers would be expected to have greater solubilities
than would be predicted from the Flory-Huggins theory
since the hydrophobic groups can be buried in micelles
or other solution structures. The present theory could be
extended to discuss these inhomogeneous situations using
a Landau approach developed for random block copoly-
mers [4, 5, 6]. For the mechanistic routes discussed above
though, it is difficult to envisage that polymers with in-
termediate block sizes could arise very easily. I therefore
expect that the general conclusions will remain.
Finally I note that in principle the above model for
the phase behaviour could be combined with a model for
the chemical modification reaction, to obtain a theory for
reaction-induced solubility. However, one needs to take
great care to capture the kinetics correctly [13].
I thank Nigel Clarke for a critical reading of the
manuscript.
[1] R. L. Davidson, Handbook of water-soluble gums and
resins (McGraw-Hill, New York, 1980).
[2] J. Rueben, Macromol. 17, 156 (1984).
[3] P. J. Flory, Principles of polymer chemistry (Cornell Uni-
versity Press, Ithaca, New York, 1953).
[4] G. H. Fredrickson and S. T. Milner, Phys. Rev. Lett. 67,
835 (1991).
[5] G. H. Fredrickson, S. T. Milner, and L. Leibler, Macro-
molecules 25, 6341 (1992).
[6] A. Nesariker, M. Olvera de la Cruz, and B. Crist, J.
Chem. Phys. 98, 7385 (1993).
[7] P. Sollich, P. B. Warren, and M. E. Cates, Adv. Chem.
Phys. 116, 265 (2001).
[8] P. Sollich and M. E. Cates, Phys. Rev. Lett. 80, 1365
(1998).
[9] P. B. Warren, Phys. Rev. Lett. 80, 1369 (1998).
[10] P. B. Warren, Europhys. Lett. 46, 295 (1999).
[11] P. Irvine and M. Gordon, Proc. R. Soc. Lond. A 375,
397 (1981).
[12] E. M. Hendriks, Ind. Eng. Chem. Res. 27, 1728 (1988).
[13] G. A. Buxton and N. Clarke, Macromolecules 38, 8929
(2005).
ABSTRACT
  Many water soluble polymers are chemically modified versions of insoluble
base materials such as cellulose. A Flory-Huggins model is solved to determine
the effects of heterogeneity in modification on the solubility of such
polymers. It is found that heterogeneity leads to decreased solubility, with
the effect increasing with increasing blockiness. In the limit of extreme
blockiness, the nature of the phase coexistence crosses over to a
polymer-polymer demixing transition. Some consequences are discussed for the
synthesis of partially modified polymers, and the experimental characterisation
of such systems.

<|endoftext|><|startoftext|>
Introduction and statement of the results.
Let Ω be a bounded open set with smooth boundary in R2 or R3. Consider a L∞ function σ such
that there exists a real c with σ(x) ≥ c > 0. Consider the elliptic equation
− div
σ(x)∇u
= 0 in Ω, (1)
with the Dirichlet boundary condition
u = f on ∂Ω. (2)
Define the Dirichlet-to-Neumann map as
Λσ : f 7→ σ (∂nu)|∂Ω ,
where u solves (1),(2) and n is the outer unit normal vector to ∂Ω. The inverse conductivity problem
of Calderón is to determine σ from Λσ. Electrical impedance tomography aims to form an image
of the conductivity distribution σ from the knowledge of Λσ. When σ is smooth enough, one can
reconstruct σ from Λσ (see the works of Sylvester and Uhlmann [21], Nachmann [15, 16] and Novikov
∗Laboratoire de Mathématiques Appliquées de Compiègne. Université de Technologie de Compiègne.
http://arxiv.org/abs/0704.0708v1
[17]). When the conductivity distribution is only L∞, Astala and Päivärinta have recently shown in
[3] that, in dimension two, the map Λσ determines σ ∈ L
∞(Ω).
We are interested in a particular case of that problem: when a body is inserted inside a given
object with a distinct conductivity, the question of determining its shape from boundary measurement
arises in many fields of modern technology. In the context of the inverse problem of conductivity
of Calderón, we restrict the range of admissible conductivity distributions to the family of piecewise
constant functions which take only two distinct values σ1, σ2 > 0 which are assumed to be known.
The conductivity distribution is then defined by an open subset ω as
σ = σ1χΩ\ω + σ2χω. (3)
Here, the only unknown of the problem is ω a subdomain of Ω with a smooth boundary ∂ω; its outer
unit normal vector is denoted by n. The notation χω (respectively. χΩ\ω) denotes the characteristic
function of ω (respectively. Ω \ ω). The second main difference arises from practical considerations:
it is unrealistic from the point of view of applications to know the full graph of Dirichlet-to-Neumann.
Therefore, we will assume that one has access to a single point in that graph. This non destructive
testing problem is usually written from a numerical point of view as the minimization of a cost
function: typically a least-square matching criterion. Many authors have investigated the steepest
descent method for this problem [13, 7, 10, 18, 1] with the methods of shape optimization since the
unknown parameter is a geometrical domain.
This work is devoted to the study of second order methods for this problem that has only be
considered before for simplified models in [5, 2]. By introducing second order methods, one aims to
reach two distinct objectives.
• On one hand, we provide all the needed material to design a Newton algorithm. We will give
differentiability results for the state function and for the objective that we have chosen to study
in this work. Nevertheless, we point out that the discretization of a Newton method for this
problem turns out to be very delicate; this is why, in the present paper, we will neither discuss
about this problem nor present numerical examples. This topic is actually the main objective
of a work in progress.
• On the other hand, we analyze rigorously the well-posedness of the optimization method. This is
justified by the huge numerical literature devoted to the numerical study of this question in the
field of inverse problems; the numerical experiments insist on the ill-posedness of this problem.
We will explain the instability in the continuous settings in terms of shape optimization. We
show that the shape Hessian is not coercive -in fact its Riesz operator is compact – and this
explains the unstability of the minimization process.
Let us describe the precise problem under consideration and the notations. We consider a bounded
domain Ω ⊂ Rd (d = 2 or 3) with a C2 boundary. It is filled with a material whose conductivity is σ1
and with an unknown inclusion ω in Ω of conductivity σ2 6= σ1. We search to reconstruct the shape
of ω by measuring on ∂Ω, the input voltage and the corresponding output current. In the sequel, we
fix d0 > 0 and consider inclusions ω such that ω ⊂⊂ Ωd0 = {x ∈ ω, d(x, ∂Ω) > d0}. We also assume
that the boundary ∂ω is of class C4,α. The inverse problem arises when one has access to the normal
vector derivative of the potential u that solves (1)-(2) when the conductivity distribution is defined
by (3) . Assume that ones knows
σ1∂nu = g on ∂Ω, (4)
then the problem (1)-(2)-(4) is overdetermined. The electrical impedance tomography problem we
consider is to recover the shape of ω from the knowledge of the single Cauchy pair (f, g).
In order to recover the shape of the inclusion ω, an usual strategy is to minimize a cost function.
Many choices are possible; however it turns out that a Kohn and Vogelius type objective leads to a
minimization problem with nicer properties than the least squares fitting approaches (we refer to [1]
for a comparison of different objectives with order one methods and to [2] for the case of a perfectly
insulated inclusion). Therefore, we study such a cost function in this work.
Let us define this criterion. Its distinctive feature is to involve two state functions ud and un: the
state ud solves (1)-(2) while un solves (1)-(4). The Kohn -Vogelius objective JKV is then defined as:
JKV (ω) =
σ|∇(ud − un)|
2 (5)
Let us sum up the results of this paper concerning the minimization of this objective. We first prove
differentiability results for the state ud. In the sequel, we use the convention that a bold character
denotes a vector. If h denotes a deformation field, it can be written as h = hτ + hnn on ∂ω. Note
also that in the following lines, n denotes the outer normal field to ∂ω pointing into Ω\ω. Hence, for
x ∈ ∂ω, we define, when the limit exists, u±(x) (resp. (∂nu)
±(x)) as the limit of u(x± tn(x)) (resp.
〈∇u(x± tn(x),n(x))) when t > 0 tends to 0. Note that hτ is a vector while hn is a scalar quantity.
The admissible deformation fields have to preserve ∂Ω and the regularity of the boundaries:
therefore the space of admissible fields is
H = {h ∈ C4,α(Rd,Rd), Supp(h) ⊂ Ωd0}.
The following result concerns the first order derivative of the state functions ud and un. It was
derived in [7, 18, 1].
Theorem 1 Let Ω be an open smooth subset of Rd (d = 2 or 3) and let ω be an element of Ωd0 with
a boundary of class C4,α. Then the state functions ud and un are shape differentiable; furthermore
their shape derivative u′d and u
n belongs to H
1(Ω \ ω) ∪H1(ω) and satisfy


∆u′d = 0 in Ω \ ω and in ω,
d on ∂ω,[
= [σ]divτ (hn∇τud) on ∂ω,
u′d = 0 on ∂Ω.


∆u′n = 0 in Ω \ ω and in ω,
n on ∂ω,[
= [σ]divτ (hn∇τun) on ∂ω,
∂u′n = 0 on ∂Ω.
The main result of this work concerns the second order derivative. It is given is the following theorem.
Theorem 2 Let Ω be an open smooth subset of Rd (d = 2 or 3) and let ω be an element of Ωd0
with a C4,α boundary. Let h1 and h2 be two deformation fields in H. Then the state ud has a second
order shape derivative u′′d ∈ H
1(Ω \ ω) ∪H1(ω) that solves


∆u′′d = 0 in Ω \ ω and in ω,[
h1,nh2,nH − h1τ .(Dnh2τ )
[∂nud]−
h1,n[∂n(ud)
2] + h2,n[∂n(ud)
h1τ .∇h2,n + h2τ .∇h1,n
[∂nud] on ∂ω,[
= divτ
σ∇τ (ud)
+ h1,n
σ∇τ (ud)
+ h1τ .(Dnh2τ )[σ∇τud]
− divτ
(h1τ .∇τh2,n +∇τh1,n.h2τ ) [σ∇τud]
+ divτ
h2,nh1,n(2Dn−HI) [σ∇τud]
on ∂ω,
u′′d = 0 on ∂Ω.
Here, (ud)
i denotes the first order derivative of u in the direction of hi as given in (6), Dn stands for
the second fundamental form of the manifold ∂ω and H stands for the mean curvature of ∂ω. The
twin result concerning un is an easy adaption of Theorem 2. Once the differentiability of the state
function has been established, one can consider the objectives. In [1], we have shown the first order
result.
Theorem 3 Let Ω be an open smooth subset of Rd (d = 2 or 3) and let ω be an element of Ωd0
with a C4,α boundary. Let h1 and h2 be two deformation fields in H. The Kohn-Vogelius objective is
differentiable with respect to the shape and its derivative in the direction of a deformation field h is
given by:
DJKV (ω)h = [σ]
2 − |∂nu
+ |∇τud|
2 − |∇τun|
hn. (9)
We now give the second-order derivative of the Kohn and Vogelius criterion.
Theorem 4 Let Ω be an open smooth subset of Rd (d = 2 or 3) and ω be an element of Ωd0 with a
C4,α boundary. Let h1 and h2 be two deformation fields in H. The Kohn-Vogelius objective is twice
differentiable with respect to the shape and its second derivative in the directions h1 and h2 is given
D2JKV (ω)(h1,h2) =
σ|∇v|2
h1τ .∇τ (h2,n) + h2τ .∇τ (h1,n)− h2τ .(Dnh1τ )
σ|∇v|2
h1,nh2,n + 2
σ∇v.(h1,n∇v
2 + h2,n∇v
∂n(un)
2 + ∂n(un)
1 − ∂nv
1(ud)
2 − ∂nv
2(ud)
σ∂n(un)
− σ1∂nv
where we have set v = ud − un.
To investigate the properties of stability of this cost function, we are led to consider an admissible
inclusion ω∗ to solve both (1)-(2) and (1)-(4) in order to obtain the corresponding measurements f∗
and g∗. It is obvious that the domain ω∗ realizes the absolute minimum of the criterion JKV since,
by construction, we can write ud = un in Ω and hence JKV (ω
∗) = 0. We will check that the Euler
equation
DJKV (ω
∗)(h) = 0,
holds. We will also prove that
D2JKV (ω
∗)(h,h) =
σ|∇v′|2. (11)
Moreover, if hn 6= 0, then D
2JKV (ω
∗)(h,h) > 0 holds. Nevertheless, (11) does not mean that the
minimization problem is well-posed. In fact, it is the following theorem that explains the instability
of standard minimization algorithms.
Theorem 5 Assume that ω∗ is a critical shape of JKV for which the additional condition un = ud
holds. Then the Riesz operator corresponding to D2JKV (ω
∗) defined from H1/2(∂ω∗) with values in
H−1/2(∂ω∗) is compact. Moreover, the minimization problem is severely ill-posed in the following
sense: if the target domain is C∞ and if λn denotes the n
th eigenvalue of D2JKV (ω
∗), then λn =
o(n−s) for all s > 0.
Theorem 5 has two main consequences. First, the shape Hessian at the global minimizer is
not coercive. This means that this minimizer may not be a local strict minimum of the criterion.
Moreover, the criterion provides no control of the distance between the parameter ω and the target
ω∗. The second consequence concerns any numerical scheme used to obtain this optimal domain ω∗.
One has to face this difficulty and this explains why frozen Newton or Levenberg-Marquard schemes
have been used to solve numerically this problem [7, 1].
The paper is organized as follows. In a first section, we state some preliminary results. Some are
well known facts in shape optimization and will be recalled without proof for the sake of readability.
Some of them (e.g the derivatives of a Laplace-Beltrami operator and the tangential regularity of the
solution to (1)-(2) along the discontinuity of the conductivity distribution) are less known and will
be proved thanks to potential layer methods. Hence we will tackle the computations in Section 3
that we consider as the core of this work : it is essentially devoted to prove Theorem 2. After a first
part where we prove the existence of a second order derivative for the state, we propose two distinct
methods to find the boundary value problem solved by this second order derivative. The first method
(subsection 3.3) follows the lines of classical proofs of shape differentiability by differentiating the
weak formulation of problem (1)-(2) and interpreting the result in terms of differential operator and
boundary conditions. The alternative method (subsection 3.4) consists in a direct differentiation of
the boundary conditions. Finally, Section 4 is devoted to the analysis of the criterion, we establish
Theorem 4 and Theorem 5. We will present their consequences on the stability of critical shapes.
2 Preliminary results.
2.1 Elements of shape calculus
Before entering the proof of Theorem 2, we recall without proof some basic facts from shape opti-
mization (see [6] for references). Let h be a deformation field in C2(Ω,Rd) with ‖h‖C2 < 1. We set
Tt(h, .) = Id + th and denote by Ωt the transported domain Ωt = Tt(Ω). To avoid heavy notations,
we will misuse the notation Tt instead of Tt(h, .).
Material and shape derivatives. Classically, in mechanics of continuous media, the material
derivative is defined as being a positive limit. In our context, for any vector field h ∈ H, we define
the material derivative of the domain functional y = y(Ω) at Ω in an admissible direction h as the
limit
ẏ(Ω;h) = lim
y(Ωt) ◦ Tt − y(Ω)
, (12)
Similarly, one can define the material derivative ẏ(∂Ω,h) for any domain functional y = y(∂Ω) which
depends on ∂Ω. Another kind of derivative occurs : it is called the shape derivative of y(Ω,h). It is
viewed as a first local variation. Its definition is given by the following
Definition 1 The shape derivative y′ = y′(Ω;h) of a functional y(Ω) at Ω in the direction of a
vector field h is given by
y′ = ẏ − h.∇y. (13)
For more details on these derivations, the reader can consult [20, 6].
Elements of tangential derivatives. We will need in the sequel to manipulate the tangential
differential operators on a manifold. For the reader’s convenience, we recall from [4, 6] some definitions
and also some useful rules of calculus.
Definition 2 The tangential divergence of a vector field V ∈ C1(Rd,Rd) is given by
divτ (V) = div (V)−DV.n.n, (14)
where the notation DV denotes the Jacobian matrix of V. When the vector V ∈ C1(∂Ω,Rd) is
defined on ∂Ω, then the following notation is used to define the tangential divergence
divτ (V) = div
− (DṼ.n).n, (15)
where Ṽ stands for an arbitrary C1 extension of V on an open neighborhood of ∂Ω.
We introduce now, the notion of tangential gradient ∇τ of any smooth scalar function f in
C1(∂Ω,Rd).
Definition 3 Let an element f ∈ C1(∂Ω,Rd) be given and let f̃ be an extension of f in the sense
that f̃ ∈ C1(U) and f̃ |∂Ω = f and where U is an open neighborhood of ∂Ω. Then the following
notation is used to defined the tangential gradient
∇τf = ∇f̃ |∂Ω −∇f̃ .n n on ∂Ω. (16)
The details for the existence of such an extension can be found in [4]. Let us remark that these
definitions do not depend on the choice of the extension. Furthermore, one can show the important
relation ∫
∇τf.F = −
f divτ (F) , (17)
for all elements f ∈ C1(∂Ω) and all vector fields F ∈ C1(∂Ω,Rd) satisfying Fn = 〈F, n〉 = 0.
Integration by parts on ∂Ω. In general, the condition above Fn = 0 is not always satisfied.
We are then led to find another formula to extend the formula in the general case. The extension of
this integration by parts formula to fields with a normal vector component involves curvature.
First, we point out that the curvature is connected to the normal vector via the tangential
divergence operator. Recall that the mean curvature of ∂Ω is defined as H = divτ (n). Making use
of the form of divτ (n) on the boundary, one shows straightforwardly the following statement.
Proposition 1 Let Ω be an open subset of R3 with a C2 boundary. For any unitary extension N of
n on a neighborhood of ∂Ω, one has
div (N ) = H on ∂Ω.
Assume that the manifold ∂Ω has no borders. If F ∈ H2(∂Ω)3 and f ∈ H2(∂Ω), then we have
∇f.F+ fdivτ (F) =
(∇f.n+Hf)F.n. (18)
We assume now that the domain Ω has a C3 boundary. The simplest second-order derivative is the
Laplace Beltrami operator; it is defined as follows (see [20, 4, 6]) thanks to the following usual chain
rule.
Definition 4 Let f ∈ H2(∂Ω). The Laplace-Beltrami ∆τ of f is defined as follows
∆τf = divτ (∇τf) . (19)
There is a relation connecting the Laplace operator and the Laplace-Beltrami operator. Let us denote
by ∂2nnf = (D
2f.n).n where D2f stands for the Hessian of f .
Proposition 2 Let Ω be a domain with a boundary ∂Ω of class C3. For all functions f ∈ H3(Ω), it
holds
∆f = ∆τf +H∂nf + ∂
nnf, on ∂Ω. (20)
We need to compute shape and material derivative of special vector fields: the outer unit normal
vector n, the tangential gradient and the Laplace-Beltrami operator applied to a function. While the
derivative of the normal vector is obtained by a straightforward calculus, we have to transport from
∂Ωt to ∂Ω the Laplace-Beltrami operator and the tangential gradient in order to compute the other
derivatives.
Derivatives of the normal vector. We describe the material and shape derivatives of the normal
vector. We will denote by n the gradient of the signed distance to ∂Ω. This is an unitary extension
of the unitary normal vector n at ∂Ω which is smooth in the vicinity of ∂Ω. This extension furnishes
a symmetric Jacobian Dn that satisfies Dnn = 0 on ∂Ω. The direction h will be supposed to be in
C2(Rd,Rd) or in C2(∂Ω,Rd).
Proposition 3 The material derivative ṅ of the normal vector n at Ω in the direction of a vector
field h ∈ C1(Rd,Rd) is given by
ṅ = −∇τ (h.n) +Dnhτ ,
where hτ = h− h.n n.
Concerning its shape derivative defined as n′ = (∂tnt)|t=0 where nt is any smooth unitary extension
of n to ∂Ωt, we obtain.
Proposition 4 The shape boundary n′ in the direction of h is given by
n′ = −∇τ (h.n).
Derivative of the tangential gradient. For f ∈ H3(∂Ω), we compute the material derivative of
∇τf . We first compute the difference
∇τf −∇ḟ .
Proposition 5 For all functions f ∈ C2(R3) and directions h ∈ C2(∂Ω,R3), one has
˙∇τf = ∇ḟ + (D
2fh)τ −∇f.n ṅ−∇f.ṅ n
Proof of Proposition 5. We differentiate ∇f and ∇f.n n and obtain
∇̇f = ∇f ′ +D2fh
while
˙∇f.n n = ∇f.ṅ n+∇f.n ṅ+∇f ′.n n+ (D2fh).n n.
The two former equations give the desired result. �
Derivative of the Laplace-Beltrami operator. Now, we want to compute the material derivative
∆τf . We begin to study how to transport the Laplace-Beltrami operator when one works on ∂Ωt.
Let ∆τ,t denote the Laplace-Beltrami operator on the manifold ∂Ωt. To compute the derivative of a
Laplace-Beltrami operator, we need the following proposition that we quote from [20].
Proposition 6 Let f ∈ H5/2(Rd), then
∆τ,tf
◦ Tt γτ (t)
φ = −
∇(f ◦ Tt)− (B(t) n).∇(f ◦ Tt)
.∇φ, ∀φ ∈ D(Rd).
In the former proposition, we set
γ(t) = detDTt,
γτ (t) = γ(t)‖(DT
T .n‖Rd ,
B(t) =
D(T−1t )(D(T
‖(DT−1t )
T .n‖2
C(t) = γτ (t)D(T
t )(D(Tt)
−1)T .
. (22)
A straightforward computation gives
γ′(0) = divτ (h) ,
γ′τ (0) = divτ (h) = divτ (hτ ) +Hhn,
B′(0) = 2(Dhn).nI − (Dh+ (Dh)T ),
C′(0) = divτ (h) I − (Dh+ (Dh)
. (23)
Theorem 6 Let f ∈ D(Rd). The material derivative of ∆τf in the direction h is given by
∆τf = ∆τ ḟ+∇τf.∇τ
divτ (hτ )
+∇τ (Hhn).∇τf − divτ
Dh+ (Dh)T
Proof of Theorem 6 : Formula (24) is shown in a weak sense. For each test function φ ∈ C∞(∂Ω),
there exists an extension φ̃ ∈ D(Rd) such that ∂nφ̃ = 0; this can be done by extending φ as a constant
along the orbits of the gradient of the signed distance function to ∂Ω and the use of a cut-off function.
For f ∈ D(Rd), we set
A(t) =
(∆τ,tf) ◦ Tt −∆τf
γτ (t) φ.
After an integration by parts on ∂Ω, we obtain:
A(t) =
1− γτ (t)
(∆τ,tf) ◦ Tt φ+
(∆τ,tf) ◦ Ttφ+
∇τf.∇τφ
1− γτ (t)
(∆τ,tf) ◦ Tt φ
∇τf − C(t)∇ (f ◦ Tt)
.∇φ̃+
(B(t)n.∇(f ◦ Tt)
C(t) n.∇φ̃
Since ∂nφ̃ = 0 and C(0) = I, we get
A(t) =
1− γτ (t)
(∆τ,tf) ◦ Tt φ+
∇τ (f − f ◦ Tt)
.∇τ φ̃+
C(0)− C(t)
∇(f ◦ Tt).∇τ φ̃.
When t→ 0, it then comes
∆τfφ = −
γ′τ (t)∆τ fφ+∇τ ḟ .∇τφ+
C′(0).∇f
.∇τφ,
∆τ ḟ − divτ (h)∆τf
Dh+ (Dh)T − divτ (h) I
∇f.∇τφ,
∆τ ḟ − divτ (h)∆τf + divτ
divτ (h)∇τf
− divτ
Dh+ (Dh)T
Expanding the double divergence term, we obtain:
∆τf = ∆τ ḟ +∇τf.∇τdivτ (h)− divτ
Dh+ (Dh)T
In order to explicit these derivatives, we let appear the curvatures of ∂Ω by means of
∇τf.∇τdivτ (h) = ∇τf.∇τ
divτ (hτ ) +Hhn
and this ends the proof of the theorem (24). �
3 Existence of the second order derivative of the state. Proof
of Theorem 2.
The section is devoted to prove Theorem 2. We follow the usual strategy to derive existence in shape
optimization. In section 3.2, we will write the weak formulation of the problem, then transport it on
the reference domain, pass to the limit and obtain existence of the material derivative. In a second
time, we will seek a boundary value problem solved by the material derivative. This will provide
a characterization of the second order shape derivative. Two strategies, that we will detail, are
possible: the first one explored in section 3.3 consists in working on the variational formulation while
the second one uses the tangential differential calculus by differentiating the boundary conditions.
This last approach will be presented in section 3.4. The computations that will be made in subsections
3.3 and 3.4 require some regularity of the traces of the state ud on the interface of discontinuity ∂ω.
For the sake of readability, we postponed in subsection 3.5 all the needed justifications.
3.1 Preliminary results.
In the sequel, we will use some technical formulae. To preserve the readability of the proof of the
main result, we state them in this paragraph. The tools needed for proving these results can be found
in [20]. Given a smooth vector field h, we denote
Ah = Dh+Dh
T − div (h) I
We begin with the following formula.
Lemma 1 It holds:
∇u.Ah∇v = ∇(h.∇u).∇v +∇(h.∇v)∇u − div
(∇u.∇v)h
. (25)
Given two smooth vector fields h1 and h2, we set
A = Dh2Ah1 +Ah1Dh2
T −Ah1div (h2)− (Ah1)
′(h2), (26)
b = (h2.∇u)Ah1∇v + (h2.∇v)Ah1∇u − ((Ah1∇u).∇v)h2.
Here, the notation (Ah1)
′(h2) stands for the matrix defined by its elements
((Ah1)
′(h2))k,l = ∇(((Ah1 )
′)k,l).h2
Lemma 2 One has:
∇u.A∇v = div (b)− (h2.∇u)div
(Ah1∇v)
− (h2.∇v)div
(Ah1∇u)
. (27)
We need the following crucial result
Lemma 3 If u is harmonic then
Ah1∇u
= ∆(h1.∇u). (28)
Proof of Lemma 3 For any harmonic function u in Ω and for every test function φ ∈ D(Ω), we
can write ∫
∇u̇∇φ =
Ah∇u∇φ
then ∫
∆u̇ φ =
div (Ah∇u) φ
Since u̇ = u′ + h.∇u and since u′ is harmonic in Ω, we obtain the result. �
3.2 Proof of existence of the second order derivative.
We follow Hettlich and Rundell [8] and Simon [19] to define the second order derivative of an op-
erator with respect to a domain. We compute the second derivative by considering two admissible
deformations h1,h2 ∈ H that will describe the small variations of ∂ω. Simon shows that the second
derivative F ′′(∂ω;h1,h2) of F (∂ω) is defined as a bounded bilinear operator satisfying
F ′′(∂ω;h1,h2) =
F ′(∂ω;h1)
h2 − F
′(∂ω;Dh1 h2) (29)
For more details, the reader can consult the appendix in page 613 of [8].
Let us begin the proof. Let h1,h2 ∈ H be two vector fields. The direction h1 being fixed, we
consider u̇1,h2 the variation of u̇1 with respect to the direction h2. We recall from [1] that the material
derivative u̇1 of u in the direction h1 satisfies
∀v ∈ H10 (Ω),
σ∇u̇1.∇v =
σ∇u.Ah1∇v.
Let φ2 : Ω 7→ Ω be the diffeomorphism defined by φ2(x) = x+h2(x) and we set ψ2 = φ
2 . Setting
ωh2 =
x+ h2(x), x ∈ ω
, Ωh2 =
x+ h2(x), x ∈ Ω
= Ω and σh2 = σ ◦ φ2, we get
σh2∇u̇1,h2 .∇v =
σh2∇uh2 .Ah1∇v (30)
where uh2 is the solution of the original problem with ωh2 instead of ω. Making the change of
variables x = φ2(X), we get the integral identity on the fixed domain Ω :
σ∇˜̇u1,h2 .
Dψ2(Dψ2)
T det(Dφ2)
σ∇ũh2 .
Dψ2Ãh1(Dψ2)
T det(Dφ2)
∇v (31)
with the notations ũ = u ◦φ2 and Ãh1 = Ah1 ◦φ2. Since the material derivative u̇1 of u with respect
to the direction h1 satisfies ∫
σ∇u̇1.∇v =
σ∇u.Ah1∇v,
the difference of (30) and (31) gives
˜̇u1,h2 − u̇1
.∇v =
σ∇˜̇u1,h2 .
I −Dψ2(Dψ2)
T det(Dφ2)
σ∇ũh2 .
Dψ2Ãh1(Dψ2)
T det(Dφ2)−Ah1
(∇ũh2 −∇u).Ah1∇v.
We quote from [13] and [8] the following asymptotic formulae
‖ − div (hi) ‖∞ = O(‖hi‖
‖Dψi(Dψi)
T det(Dφi)− I +Ahi‖∞ = O(‖hi‖
‖Dψ2Ãh1(Dψ2)
T det(Dφ2)−Ah1 +Dh2Ah1 +Ah1(Dh2)
T − div (h2)Ah1 − (Ah1)
′(h2)‖∞ = O(‖h2‖
Making the adequate substitutions, we easily check that the material derivative of u̇1 with respect
to h2 exists. This derivative, denoted by ü1, satisfies
σ∇ü1.∇v dx =
∇u̇1.Ah2∇v +∇u̇2.Ah1∇v −∇u.A∇v
. (32)
where A is defined in (26).
3.3 Derivation of (8) from the weak formulation.
We want to make explicit the problem solved by (u′)′. To achieve this, we should write the right
hand side
∇u̇1.Ah2∇v +∇u̇2.Ah1∇v −∇u.A∇v
as the sum of an integral with ∇v in factor and an integral of a divergence to identify the jump
conditions on ∂ω. To that end, we will use algebraic identities that involve second order derivatives
of u, u̇i and of the test function v ∈ D(Ω). Using Lemma 1, we obtain:
σ∇u̇1.Ah2∇v =
∇(h2.∇u̇1).∇v +∇(h2.∇v)∇u̇1 − div
(∇u̇1.∇v)h2
σ∇u̇2.Ah1∇v =
∇(h1.∇u̇2).∇v +∇(h1.∇v)∇u̇2 − div
(∇u̇2.∇v)h1
Concerning the remaining terms, we use Lemma 2 to get
σ∇u.A∇v =
σ div
(h2.∇u)Ah1∇v + (h2.∇v)Ah1∇u− (Ah1∇u.∇v)h2
(h2.∇u)div
Ah1∇v
+ (h2.∇v)div
Ah1∇u
We apply Lemma 3 and gather the expressions obtained for F .
∇ (h1.∇u̇2 + h2.∇u̇1) .∇v +∇(h2.∇v).∇u̇1 +∇(h1.∇v).∇u̇2
σ div
(Ah1∇u.∇v −∇u̇1.∇v)h2 − (∇u̇2.∇v)h1
(h2.∇v)∆(h1.∇u)− div
(h2.∇v)Ah1∇u
−∇(h2.∇u).Ah1∇v
Using (25), we remove the dependency on Ah1∇v:
∇(h2.∇u).Ah1∇v = ∇(h1.∇(h2.∇u)).∇v +∇(h1.∇v)∇(h2.∇u)− div
(∇(h2.∇u).∇v)h1
Therefore, we write F = F1 + F2 where
∇ (h1.∇u̇2 + h2.∇u̇1)−∇(h1.∇(h2.∇u))
.∇v, (34)
∇(h1.∇v).∇(u̇2 − h2.∇u) +∇(h2.∇v).∇u̇1 + (h2.∇v)∆(h1.∇u)
σ div
(Ah1∇u.∇v −∇u̇1.∇v)h2 +
∇(h2.∇u).∇v −∇u̇2.∇v
h1 − (h2.∇v)Ah1∇u
The connection between second order material and shape derivatives is given by:
ü1 = (u
2 + h1.∇u̇2 + h2.∇u̇1 − h1.∇(h2.∇u),
incorporating this expression in (34), we rewrite (32) as:
∀v ∈ H10 (Ω),
σ∇(u′1)
2.∇v = F2. (35)
Testing it against v ∈ D(Ω \ ∂ω), we get ∆(u′1)
2 = 0 in Ω \ ω and in ω. We now deduce the jump
conditions for (u′1)
2. To obtain the jump of the potential, we simply write that ü1 ∈ H
0(Ω), hence
[ü1] = 0 on ∂ω and then
[(u′1)
2] = −h1.∇u
2 − h2.∇u̇1.
To express the jump of the flux, we then apply the Gauss formula in (35) to get
[σ∂n(u
2]v = F2. (36)
The second term F2 contains all the jumps of the flux on the interface ∂ω.
A simplified expression of F2. To get a simplified formula for F2 under a boundary integral,
some lengthy but straightforward calculations are needed. We summarize the result by means of the
following lemma
Lemma 4 One has:
2h2,nh1,nDn [σ∇τu]− h2,nn.∇h1,n [σ∇τu] + h2,nh1τ .Dnn [σ∇τu]
h1τ .∇τ (h2,n) [σ∇τu]− h1,nh2,nH [σ∇τu]
+ divτ
Proof of lemma First, write :
σ∇(h1.∇v).∇(u̇2 − h2.∇u) = σ1
∇(h1.∇v).∇u
2 + σ2
∇(h1.∇v).∇u
[σ∂nu
2](h1.∇v)
Note that the normal vector is oriented from ω to Ω \ ω. In the same spirit, we write
∇(h2.∇v).∇u̇1 + (h2.∇v)∆(h1.∇u) = ∇(h2.∇v).∇(u̇1 − h1.∇u) + div
(h2.∇v).∇(h1.∇u)
By a argument of symmetry, we then can write:
σ∇(h2.∇v).∇(u̇1 − h1.∇u) = −
[σ∂nu
1](h2.∇v).
To drop the dependency in Ah1 , we use (25) and get after expansion:
Ah1∇u.∇v
= div
∇(h1.∇v).∇u +∇(h1.∇u)∇v
− div
(∇u.∇v)h1
(h2.∇v)Ah1∇u
= ∇(h2.∇v).Ah1∇u + (h2.∇v)div
Ah1∇u
= ∇(h1.∇(h2.∇v)).∇u +∇(h1.∇u)∇(h2.∇v) + (h2.∇v)∆(h1.∇u)
− div
∇(h2.∇v).∇u
h1.∇(h2.∇v)
.∇u+ div
(h2.∇v)∇(h1.∇u)−
∇(h2.∇v).∇u
After integrating by parts, we conclude thanks to the state equation and obtain
h1.∇(h2.∇v)
.∇u = −
h1.∇(h2.∇v)
div (σ∇u) = 0
We substitute the shape derivative u′ to the material one u̇:
F2 = −
[σ∂nu
1](h2.∇v) + [σ∂nu
2](h1.∇v)−
σ div
(∇u.∇v)h1
σ div
∇(h1.∇v).∇u)h2 + (∇(h2.∇v).∇u
(∇u′2.∇v)h1 + (∇u
1.∇v)h2
First, we use the continuity of the flux on ∂ω, then we integrate by parts on ∂ω and finally we
incorporate the expressions of the jumps of the shape derivatives u′ to obtain
∇(h2.∇v).∇u
σ∇u.∇(h2.∇v)
h1,n = −
[σ∇τu]h1,n∇τ (h2.∇v)
[σ∇τu]h1,n
h2.∇v =
h2.∇v.
This leads to a simplified expression for F2:
F2 = −
σ div
(∇u.∇v)h1
(∇u′1.∇v)h2 + (∇u
2.∇v)h1
Let us study each term of this sum. Using Gauss formula and integrating by parts on the manifold
∂ω, we obtain
σ div
∇u′1.∇v)h2
σ∇u′1.∇v
∂nv −
∂nv +
By symmetry, we also get:
σ div
∇u′2.∇v)h1
∂nv +
We now turn to the term with a double divergence. We first write it as a boundary integral thanks
to Gauss formula as
σ div
(∇u.∇v)h1
h2,ndiv
σ(∇u.∇v)
then, we use (14) to introduce the tangential operators
σ div
(∇u.∇v)h1
h2,ndivτ
σ(∇u.∇v)
h2,nD(h1
σ(∇u.∇v)
)n.n.
We study each of these terms. We start with the one involving tangential derivatives: we expand the
tangential divergence to incorporate the jump relation for the state u.
σ(∇u.∇v)
= divτ (h1)
σ(∇u.∇v)
+ h1.∇τ [σ∇u.∇v]
= divτ (h1) [σ∇τu] .∇τv + h1.∇τ [σ∇τu.∇τv] .
Then, the first term becomes:
h2,ndivτ
σ(∇u.∇v)
h2,ndivτ (h1) [σ∇τu] .∇τv +
h2,nh1.∇τ [σ∇τu∇τv] .
We use the integration by parts formula (18) to get:
h2,ndivτ
σ(∇u.∇v)
h1,nh2,nH [σ∇τu] .∇τv − divτ
divτ (h1)h2,n [σ∇τu]
v − divτ
h1h2,n
[σ∇τu] .∇τv
h1h2,n
− divτ (h1)h2,n − h1,nh2,nH
[σ∇τu]
Expanding
h1h2,n
[σ∇τu]
v = divτ
divτ (h1) h2,n [σ∇τu] + h1.∇τ (h2,n) [σ∇τu]
= divτ
divτ (h1) h2,n [σ∇τu]
v + divτ
h1τ∇τh2,n [σ∇τu]
we obtain the new expression:
h2,ndivτ
σ(∇u.∇v)
h1τ∇τh2,n − h1,nh2,nH
[σ∇τu]
v. (38)
Now, we consider the term involving normal components. We have
n.D(h1
σ∇(∇u.∇v)
)n = n.∇(h1,n [σ∇u.∇v])− [σ∇u.∇v]h1τ .Dnn
= n.∇(h1,n) [σ∇τu]∇τv + h1,nn.∇([σ∇u.∇v]).
Then, we get
h2,nD(h1
σ(∇u.∇v)
)n.n =
h2,nn.∇(h1,n) [σ∇τu] .∇τv + h2,nh1,nn.∇([σ∇u.∇v])
−divτ
h2,nn.∇(h1,n) [σ∇τu]
v + h2,nh1,nn.∇([σ∇u.∇v]).
A straightforward calculus leads to
n.∇([σ∇u.∇v]) = n.
σD2u∇v
+D2v [σ∇u]
∇τv +D
2v [σ∇τu]
= ∂nv
∇τv + n.D
2v [σ∇τu] .
where D2u is the Hessian matrix of u. From (20) and from the jump conditions for the state u, we
deduce that [
= − [σ∆τu] .
When one differentiates the relation expressing the continuity of the flux for the state along the
tangential direction ∇τv, one gets ([6], p 235):
0 = ∇[σ∂nu].∇τv = [σD
2u]∇τv.n+ [σ∇u].(Dn∇τv).
In the same spirit, it comes that
∇∂nv.[σ∇τu] = D
2v[σ∇τu].n+∇v.(Dn[σ∇τu]). (40)
Since Dn is a symmetric matrix and Dnn = 0, one checks ∇v.(Dn[σ∇τu]) = [σ∇u].(Dn∇τv).
n.∇([σ∇u.∇v])) = − [σ∆τu] ∂nv − 2Dn [σ∇τu] .∇τv + [σ∇τu]∇τ∂nv
We integrate this expression on ∂ω and obtain after some integration by parts:
h2,nh1,nn.∇([σ∇u.∇v])
h2,nh1,n [σ∆τu]∂nv +
h2,nh1,n [σ∇τu]∇τ∂nv − 2
h2,nh1,nDn [σ∇τu] .∇τv,
h2,nh1,n [σ∆τu] + divτ
h2,nh1,n [σ∇τu]
∂nv + 2
h2,nh1,nDn [σ∇τu]
Hence
h2,nD(h1
σ(∇u.∇v)
)n.n = −
h2,nh1,n [σ∆τu] + divτ
h2,nh1,n [σ∇τu]
2h2,nh1,nDn [σ∇τu]− h2,nn.∇τh1,n [σ∇τu]
(∇u.∇v)h1
h2,nh1,n [σ∆τu] + divτ
h2,nh1,n [σ∇τu]
2h2,nh1,nDn [σ∇τu]− h2,nn.∇h1,n [σ∇τu]
h1τ .∇τ (h2,n) [σ∇τu]− h1,nh2,nH [σ∇τu]
Gathering all the terms, we write F2 as:
2h2,nh1,nDn [σ∇τu] +
h1τ .∇τ (h2,n)− h2,nn.∇h1,n − h1,nh2,nH
[σ∇τu]
+ divτ
h2,nh1,n [σ∆τu] + divτ
h2,nh1,n [σ∇τu]
h1,ndivτ
h2,n [σ∇τu]
+ h2,ndivτ
h1,n [σ∇τu]
We end the proof after expanding the tangential divergence of the last term of F2. �
Let us return to the weak formulation (36) of the derivative. By identification, we get
[σ∂n(u
2] = divτ
+ divτ
− divτ
h2,nh1,n(2Dn−HI) [σ∇τu]
− divτ
h1τ .∇τ (h2,n) [σ∇τu]− h2,nn.∇h1,n [σ∇τu] + h2,nh1τ .Dnn [σ∇τu]
It remains to compute the jump of the flux for the second order derivative. Since
u′′1,2 = (u
2 − u
Dh1 h2
where u′Dh1 h2 is the first shape derivative of u in the direction of the vector field Dh1 h2. Thanks
to (6), we can write the jump under the form
[σ∂nu
1,2] = [σ∂n(u
2]− [σ∂nu
Dh1 h2
] = [σ∂n(u
2]− divτ
Dh1h2.n[σ∇τu]
. (42)
Let us split the field h2 in two parts: Dh1 h2.n = h2,nn.Dh1 n +Dh1 h2τ .n. In the spirit of (40),
we obtain
Dh1h2τ .n = ∇τh1,n.h2τ − h1τ .Dnh2τ . (43)
Thanks to (39), the jump [σ∂nu
Dh1 h2
] then can be written under the form
[σ∂nu
Dh1 h2
] = divτ
(h2,nn.∇h1,n +∇τh1,n.h2τ − h1τ .Dnh2τ )[σ∇τu]
Gathering all the terms, simplifications occur and we get:
[σ∂nu
1,2] =divτ
+ h1,n
− divτ
(h1τ .∇τh2,n +∇τh1,n.h2τ ) [σ∇τu]
− divτ
h2,nh1,n(2Dn−HI) [σ∇τu]
+ divτ
h1τ .Dnh2τ )[σ∇τu]
To get the jumps of the potential, we use (41) and obtain
u′′1,2
(u′1)
u′Dh1 h2
= −h1.
− h2. [∇u̇1]−
u′Dh1 h2
= −h2,n
− h1,n
− h2τ .
− h1τ .
− h2,nn.
∇(h1.∇u)
− h1,nn.
∇(h2.∇u)
u′Dh1 h2
Thanks to the jump of the potential for the first order shape derivative given in (6), it comes that
h2τ .
= −h2τ .
∇(h1.∇u)
and h1τ .
= −h1τ .
∇(h2.∇u)
and then:
u′′1,2
= −h2,n
− h1,n
− h2,nn.
∇(h1.∇u)
+ h1τ .
∇(h2.∇u)
u′Dh1 h2
Computing the other jumps that appeared in the former expression, we get
∇(h2.∇u)
= (Dh2)
T [∇u] +
h1τ .
∇(h2.∇u)
= n.Dh2h1τ [∂nu] + h2,nh1τ .
n+ h1τ .
h2τ .
h2,nn.
∇(h1.∇u)
= h2,n [∂nu]n.Dh1n+ h2,nh1,nn.
n+ h2,nn.
h1τ .
u′Dh1 h2
= −Dh1 h2.n [∂nu] = −
h2,nn.Dh1n+ n.Dh1h2τ
[∂nu]
With the help of formula (43), we obtain:
−h2,nn.
∇(h1.∇u)
+ h1τ .
∇(h2.∇u)
u′Dh1 h2
∇τh1,n.h2τ +∇τh2,n.h1τ
[∂nu]
− 2h1τ .Dnh2τ [∂nu] + h1τ .
h2τ − h2,nh1,nn.
n = − [∆τu]−H [∂nu] = −H [∂nu] ,
h1τ .
h2τ = h1τ .D([∇u])h2τ = h1τ .D([∂nu]n)h2τ = [∂nu]h1τ .Dnh2τ .
Finally, we gather the results of these computations to write
u′′1,2
+ h1,n
∇τh1,n.h2τ +∇τh2,n.h1τ
[∂nu]
h2,nh1,nH − h1τ .Dnh2τ
[∂nu])
3.4 How to recover (8) by formal differentiation of the boundary condi-
tions.
The aim of this section is to retrieve the expression of the flux jump [σ∂nu
′′] by computing the normal
derivatives of each of the expressions
[σ∇u′].n and
h1,n[σ∇τu]
. Since
[σ∇u′].n = divτ
h1,n[σ∇τu]
= h1,n[σ∆τu] +∇τh1,n.[σ∇τu],
then, we get
[σ∇u′].n = ˙h1,n[σ∆τu] + h1,n
[σ∆τu] +
˙∇τh1,n.[σ∇τu] +∇τh1,n.
[σ∇τu]. (46)
In order to avoid lengthy computations, we shall concentrate on each normal derivative appearing in
the above formula. Some of the results are straightforward and their proof will be left to the reader.
Combining propositions (3) and (5), we conclude that
˙∇τh1,n = −∇τ (h1.∇τh2,n) + (D
2h1,n.h2)τ −∇h1,n.ṅ n−∇h1,n.n ṅ.
In the same manner, we also get
[σ∇τu] = [σ∇τu
2] + ([σD
2u].h2)τ − [σ∇τu].ṅ n− [σ∇τu]n ṅ.
Hence, we can write
h1.n = h2.∇h,n −∇τh2,n.h1τ .
It remains to simplify the terms A = (D2u.h2)τ .∇τh1,n and B = [σ∇τu].(D
2h1,n.h2)τ . We obtain:
A = −[σ∇τu].(Dn∇τh1,n)h2,n + [σ∆τu]∇τh1,n.h2τ ,
B = (D2h2,n.h2τ ).[σ∇τu] +∇τ (∂nh1,n).[σ∇τu]h2,n − [σ∇τu].(Dn∇τh1,n)h2,n.
We tackle the computation of (∂nu
′)′. For the sake of clearness, we subdivide the work in several
steps.
First step. We compute
h1,n[σ∇τu]
. We expand:
h1,n[σ∇τu]
h1,n[σ∆τu] +
∇τh1,n.[σ∇τu],
h1,n[σ∆τu] + h1,n
[σ∆τu] +
∇τh1,n.[σ∇τu] +∇τh1,n.
[σ∇τu].
Hence, after substitution, one gets
h1,n[σ∇τu]
= divτ
h1,n[σ∇τu
2] + (h2,n∂nh1,n −∇τh2,n.h1τ )[σ∇τu]
+ 2[σ∆τu]∇τh1,n.h2τ − ∂nh1,[σ∇τu].(Dnh2τ ) + [σ∇τu].(D
2h1,n.h2τ )
− 2h2,n[σ∇τu].(Dn∇τh1,n) + h1,n
σ∆τu− [σ∆τu
. (47)
Second step. We compute
[σ∂nu
1]. From the expression of ṅ, we get after some straightforward
computations:
[σ∂nu
1] = [σ∂n(u
2] + ([σD
2u′1]h2).n+ [σ∇τu
1].(Dnh2τ −∇τh2,n). (48)
Third step. We compute σ∂n(u
2. From the jump condition on the flux of the derivative (6) and
(47) and (48), we obtain:
[σ∂(u′1)
2] = divτ
h1,n[σ∇τu
h2,n∂nh1,n −∇τh2,n.h1τ
[σ∇τu]
+ 2∇τh1,n.h2τ [σ∆τu]
−([σD2u′1]h2).n+ [σ∇τu
∇τh2,n −Dnh2τ
− ∂nh1,n[σ∇τu].(Dn h2τ )
+(D2h1,n h2τ ).[σ∇τu]− h2,n(Dn [σ∇τu])£.∇τh1,n.
Taking account of the following calculation,
−([σD2u′1]h2).n+ [σ∇τu
1].∇τh2, n = −
h2,n[σD
2u′1]n+ [σD
2u′1]h2τ
.n+ [σ∇τu
1].∇τh2, n,
= h2,n
[σ∆τu
1] +H [σ∂nu
+ [σu′1].∇τh2,n − ([σD
2u′1]h2τ ).n,
= divτ
h2,n[σ∇τu
+Hh2,n[σ∂nu
1]− ([σD
2u′1]h2τ ).n;
it comes
[σ∂n(u
2] = divτ
h1,n[σ∇τu
2] + h2,n[σ∇τu
h2,n∂nh1,n −∇τh2,n.h1τ
[σ∇τu]
+2[σ∆τu]∇τh1,n.h2τ +Hh2,n[σ∂nu
1]− ([σD
2u′1]h2τ ).n
[σ∇τu
1] + ∂nh1,n[σ∇τu]
.(Dnh2τ ) + (D
2h1,n h2τ ).[σ∇τu]
−2h2,n∇τh1,n.(Dn [σ∇τu]) + h1,n
[σ∆τu]− [σ∆τu
. (49)
This formula remains hard to handle. To get a more convenient one, we decide to derive tangentially
to the direction h2 the boundary identity
[σ∂nu
1] = h1,n[σ∆τu] +∇τh1,n.[σ∇τu].
This leads to:
([σD2u′1]h2τ ).n+(Dnh2τ ).[σ∇τu
1] = ∇τh1,n.h2τ [σ∆τu] + h1,n∇τ [σ∆τu].h2τ
+ (D2h1,n h2τ ).[σ∇τu]− ∂nh1,n[σ∇τu].(Dnh2τ ) + [σ∆τu]h2τ .∇τh1,n.
From (24) and subtracting (50) from (49), we can write
[σ∂n(u
2] = divτ
h1,n[σ∇τu
2] + h2,n[σ∇τu
h2,n∂nh1,n −∇τh2,n.h1τ
[σ∇τu]
+divτ
h1,nh2,n(HI − 2Dn).[σ∇τu]
− h1,n
∇τ [σ∆τu].h2τ +∆τ [σ∇τu].h2τ
+h1,n
∇τdivτ (h2τ ) .[σ∇τu]− divτ
Dh2 + (Dh2)
[σ∇τu]
From (24), we obtain
[σ∆τu] = [σ∆τ u̇] +∇τdivτ (h2τ ) .[σ∇τu] +∇τ (Hh2,n).[σ∇τu]
− divτ
Dh2 + (Dh2)
[σ∇τu]
and using the relation between the material and shape derivative, we get
[σ∆τu] = [σ∆τu
′] +∇
[σ∆τu]
.h2 and [σ∆τ u̇] = [σ∆τu
′] + ∆τ
[σ∇u].h2
Injecting these relations in (51) and applying them for h2τ , we get
∆τ ([σ∇τu].h2τ ) +∇τdivτ (h2τ ) .[σ∇τu] = ∇τ [σ∆τu].h2τ + divτ
Dh2 + (Dh2)
[σ∇τu]
This last fact allows us to conclude.
3.5 Justification of the formal computations.
We have to justify rigorously that the right-hand sides of (6),(7),(8) make sense. They involve
tangential derivatives of un and ud along the interface ∂ω up to the order three. The existence of
these derivatives is not clear a priori since the gradient of the solution has a discontinuity along this
interface. Our first aim is to precise the tangential regularity along the interface ∂ω of the solution
u of (1) with either Dirichlet or Neumann boundary conditions.
We should access to the trace of u on the interface ∂ω. Any numerical discretization needs also
to compute the state, its derivatives with respect to the shape and the normal derivatives along the
interface ∂ω. To that end, we introduce for any α ∈ H1/2(∂ω) and β ∈ H−1/2(∂ω) the following
boundary value problems
∆v = 0 in Ω \ ω and in ω,
[v] = α on ∂ω,
[σ∂nv] = β on ∂ω,
v = f1 on ∂Ω.
and (N)
∆v = 0 in Ω \ ω and in ω,
[v] = α on ∂ω,
[σ∂nv] = β on ∂ω,
∂nv = g1 on ∂Ω,
where (f1, g1) ∈ H
1/2(∂Ω)×H−1/2(∂Ω). Note that for α = 0, β = 0 and (f1, g1) = (f, g) then (ud)
and un solve respectively (D) and (N); furthermore the choice of
hn∂nu
+ and β = [σ] divτ (hn∇τu) (53)
leads to (6) and (7) when we take (f1, g) = (0, 0).
Existence of solutions to (D) and (N). To study these problems, we use the integral rep-
resentation in terms of layer potentials. In a first step, we recall some definitions. The Newtonian
potential Γ is defined as:
Γ(x, y) =
ln(|x− y|) if n = 2,
|x− y|
if n = 3.
The integral equations applying to direct problem will be obtained from a study of the classical
single- and double-layer potentials. We begin to introduce the following operators
S∂Ω∂ω : u 7→ S∂Ω∂ωu(x) :=
Γ(x, y)u(y) dσ(y);
S∂ω∂Ω : u 7→ S∂ω∂Ωu(x) :=
Γ(x, y)u(y) dσ(y);
K∂Ω∂ω : u 7→ K∂Ω∂ωu(x) :=
∂nΓ(x, y)u(y) dσ(y) ;
K∂ω∂Ω : u 7→ K∂ω∂Ωu(x) :=
∂nΓ(x, y)u(y) dσ(y)
Note that all these operators have a smooth kernel since the boundaries ∂ω and ∂Ω are assumed to
have no common point. We also denote
SΩ : u 7→ SΩu(x) :=
Γ(x, y)u(y) dσ(y);
KΩ : u 7→ KΩu(x) :=
∂nΓ(x, y)u(y) dσ(y);
Sω : u 7→ Sωu(x) :=
Γ(x, y)u(y) dσ(y);
Kω : u 7→ Kωu(x) :=
∂nΓ(x, y)u(y) dσ(y).
We now obtain some systems of integral equations to compute the state function and their shape
derivatives. Since v is harmonic in Ω \ ω and for all x ∈ ∂Ω ∪ ∂ω, it has the classical boundary
representation:
v(x) =
∂nΓ(x, y)v(y) −
∂nΓ(x, y)v(y)−
Γ(x, y)∂nv(y) +
Γ(x, y)∂nv(y). (54)
Similarly since v harmonic in ω, for all x ∈ ∂ω we can write
v(x) =
∂nΓ(x, y)v(y) −
Γ(x, y)∂nv(y). (55)
Let us denote by vd the solution of the boundary values problem (D) in (52). Let us show how
to compute their restrictions and also their normal vector derivatives on the boundaries. Incorpo-
rating the jump conditions, a straightforward computation leads to the following boundary integral
equations
I + µKω
σ2 + σ1
S∂Ω∂ω
µK∂ω∂Ω
σ2 + σ1
(v+d )|∂ω
(∂nvd)|∂Ω
σ1 + σ2

I −Kω
−σ2K∂ω∂Ω S∂ω∂Ω

σ1 + σ2

K∂Ω∂ωf1

where µ = [σ]/(σ1 + σ2). Thanks to (55), the quantity (∂nvd)
+ is then given by
Sω(∂nvd)
I +Kω
v+d (x)|∂ω − α
Concerning vn, the solution of the Neumann problem (N) in (52), the same kind of computations
gives

I + µKω −
σ2 + σ1
K∂Ω∂ω
µK∂ω∂Ω −
σ2 + σ1
I +KΩ

(v+n )|∂ω
(vn)|∂Ω
σ1 + σ2

I −Kω
−σ2K∂ω∂Ω S∂ω∂Ω

σ1 + σ2
S∂Ω∂ωg1
Finally, the computation of (∂nvn)
is given by
Sω(∂nvn)
I +Kω
v+n (x)|∂ω − α
Concerning the well-posedness of (56), we can state the following result.
Theorem 7 The linear system of integral equation (56) has an unique solution in H1/2(∂ω) ×
H−1/2(∂Ω).
Proof of Theorem 7 Let A be the matricial operator defined on H1/2(∂ω)×H−1/2(∂Ω) as
I + µKω
σ2 + σ1
S∂Ω∂ω
µK∂ω∂Ω
σ1 + σ2
The main argument of the proof is based on the Fredholm alternative. In a first step, we have to show
that the adjoint operator A∗ is injective. Since the boundaries are bounded, the adjoint operator A∗
defined on H−1/2(∂ω)×H1/2(∂Ω) can be written under the form
I + µK∗ω µK
σ2 + σ1
S∂ω∂Ω
σ2 + σ1
. (59)
Let (u, v) ∈ H−1/2(∂ω)×H1/2(∂Ω) be in the kernel of A∗. Consider the potential W defined for each
x ∈ Rd by
W (x) =
σ2 + σ1
Γ(x, y)u(y) +
Γ(x, y)v(y)
. (60)
In a first step, we show that W = 0. The function W satisfies ∆W = 0 in Rd \ (∂ω ∪ ∂Ω) by
construction. We check that W |∂Ω = 0 from the equation corresponding to the second line of A
By the properties of the single layer potential, [W ] = 0 on ∂ω. Furthermore, it holds [σ∂nW ] = 0 on
∂ω. Indeed, we can have ([11])
σ1 + σ2
u+K∗∂Ω∂ωv
σ1 + σ2
v +K∗∂Ω∂ωv
hence,
σ1∂nW
+ − σ2∂nW
− = σ1
I + µK∗ω)u+ µK
∂Ω∂ωv
This corresponds to the first line of A∗(u, v). Then, W solves the Laplace equation (1) with ho-
mogeneous Dirichlet boundary conditions. By the uniqueness of the solution, we get W = 0 in
In a second step, we deduce that u = v = 0. Since W = 0 in Ω, we see that [∂nW ] = 0 on ∂ω.
Since [∂nW ] = σ1u/(σ1 + σ2) on ∂ω , we deduce u = 0. From the second line of A
∗(u, v) = 0, we
see that SΩv = 0 on ∂Ω. Since the single layer potential operator SΩ : H
−1/2(∂Ω) 7→ H1/2(∂Ω) is an
isomorphism, v = 0 holds. The injectivity of A∗ is proved. Since 2A = I + C where C is a compact
operator, we conclude that A has a continuous inverse thanks to the Fredholm alternative. �
In a similar way, the problem (57) is well-posed under some additional assumptions. We define
the adequate space
♦ (∂Ω) =
φ ∈ H1/2(∂Ω) :
φ = 0
We can state the following result.
Theorem 8 If we impose the normalizing condition
then there exists one unique couple ((vn)|∂ω, (vn)|∂Ω) ∈ H
1/2(∂ω)×H
♦ (∂Ω) solution of (57) .
Proof of Theorem 8 Set

I + µKω −
σ2 + σ1
K∂Ω∂ω
µK∂ω∂Ω −
σ1 + σ2
I +KΩ

the operator defined on H1/2(∂ω)×H
♦ (∂Ω). The adjoint B
∗ can be written under the form
I + µK∗ω µK
σ1 + σ2
K∗∂ω∂Ω −
σ1 + σ2
I +K∗Ω
In a first step, we begin to show that B∗ is injective. Let (u, v) ∈ H1/2(∂ω) × H1/2(∂Ω) be in the
kernel of B∗. We introduce the potential
Z(x) = −
σ1 + σ2
Γ(x, y)u(y) +
Γ(x, y)v(y)
, x ∈ Rd.
We can see that Z is a harmonic function in Rd\(∂ω ∪ ∂Ω), satisfying ∂nZ = 0 on ∂Ω. By the
properties of the single layer potential, [Z] = 0 Furthermore, a straightforward calculation shows
that [σ∂nZ] = 0 on ∂ω. Hence, Z solves the boundary value problem
−div (σ∇Z) = 0 in Ω,
∂nZ = 0 on ∂Ω.
The function is therefore constant in Ω. Writing [∂nZ] = 0 on ∂ω, we get easily u = 0 and then
+K∗Ω)v = 0. Since the operator λI −K
Ω is one to one on H
♦ (∂Ω), we deduce that v = 0. We
conclude the proof thanks to the Fredholm alternative. �
Tangential regularity results. Let us consider now the particular case where both α and β
are the zero function and (f1, g1) = (f, g) where f and g are respectively the Dirichlet and Neumann
boundary data. To recover the tangential regularity of the solution u along ∂ω, we look at the first
line of (56) to deduce that
I + µKω
(ud)|∂ω = −
σ2 + σ1
S∂Ω∂ω∂nud|∂Ω +
σ1 + σ2
K∂Ω∂ωf ; (63)
Sω(∂nud)
I +Kω
u+d (x)|∂ω (64)
It is easy to deduce that (ud)|∂ω ∈ C
3,α(∂ω). Indeed, from (63) that we consider as an equation in
(ud)|∂ω with data f and (∂nud)|∂Ω = g, we see that (f, (∂nvd)|∂Ω) belongs to H
1/2(∂Ω)×H−1/2(∂Ω),
thanks to Theorem 7.
In order to give a sense to the jump conditions arising in (6),(7),(8), we need to work in space of
functions of higher regularity. We choose the framework of Hölder spaces. We quote [12] to precise
the behavior of the layer potentials on these spaces.
Theorem 9 (Kirsch [12])
1. If ∂ω is of class C2,α, 0 < α < 1 then the operators Sω and Kω map C
β(∂ω) continuously into
C1,β for all 0 < β ≤ α.
2. Let k ∈ N with k 6= 0. If ∂ω is of class Ck+1,α with 0 < α < 1, then the operators Sω and Kω
map Ck,β(∂ω) continuously into Ck+1,β(∂ω) for all 0 < β ≤ α.
3. Let k be an integer. If ∂ω is of class Ck+2,α, then K∗ω maps C
k,β continuously into Ck+1,β(∂ω)
for all 0 < β ≤ α.
We go back to the proof. Since the two boundaries have no intersection point and since ∂ω is of
class C4,α, it follows that the right hand side of the former equation is of class C3,α(∂ω). We then
conclude the solution of (63) will be of class C3,α since the operator 1/2I + µKω is an isomorphism
from C3,α(∂ω) into itself. With the same arguments, we show straightforwardly that (∂nun)
∈ C2,α.
About the regularity of the jumps of the second derivative. The equations giving the jump
conditions [u′d] and [∂u
d] show obviously that [u
d] and [∂nu
d] belong respectively to C
2,α(∂ω) and
C1,α(∂ω). Hence, it comes straightforwardly that [u′′d ] ∈ C
1,α. With the same arguments, we show
that [∂nu
d ] ∈ C
0,α(see [20] for more details) and then that all the formal computations to get the
equations describing the second derivative have a sense.
Remark 1 In a view of a numerical discretization of the state equation, one has to emphasize that
the choice of a finite elements method seems inappropriate: one should extract tangential derivative
of high order on the interface ∂ω. The obtained numerical accuracy is not sufficient to incorporate
the results in an optimization scheme. On the converse, the systems of boundary integral equations
(56) and (57) are well-suited for this kind of computation. Nevertheless, a discussion of adapted
schemes should be precise and is out of the scope of this manuscript.
3.6 Case of Neumann boundary conditions.
Since the admissible deformation fields have a support with no intersection points with the outer
boundary, it is a straightforward application of the preceding computations to show that un solution
to (1)-(4) is twice differentiable with respect to the shape. Furthermore, its second order derivative
u′′n belongs to H
1(Ω \ ω) ∪H1(ω) and solves


∆u′′n = 0 in ω \ ω and in ω,[
h1,nh2,nH − h1τ .Dnh2τ
[∂nun]−
h1,n[∂n(un)
2] + h2,n[∂n(un)
h1τ .∇h2,n + h2τ .∇h1,n
[∂nun] on ∂ω,[
= divτ
σ∇τ (un)
+ h1,n
σ∇τ (un)
+ h1τ .Dn.h2τ )[σ∇τun]
−divτ
(h1τ .∇τh2,n +∇τh1,n.h2τ + h2,nh1,n(2Dn−HI)) [σ∇τun]
on ∂ω,
n = 0 on ∂Ω;
where we use the notations of Theorem 2.
4 Second order derivatives for the criterion.
4.1 Proof of Theorem 4.
The differentiability of the objective is a direct application of Theorem 2. The computation we make
here is based on the relation
D2JKV (ω)(h1,h2) = D
DJKV (w)h1
h2 −DJKV (w)Dh1h2. (66)
To obtain (10), we compute in a first step the shape gradient in the direction h1. Then, in a second
step, we differentiate the obtained expression in the direction of h2. In the sequel, we adopt the
notation v = ud − un to obtain concise expressions.
DJKV (ω)h1 = σ1
|∇v|2h1
+ 2∇v.∇v′1 + σ2
|∇v|2h1
+ 2∇v.∇v′1
= σ1(A1 + 2B1) + σ2(A2 + 2B2),
where
|∇v|2h1
∇v.∇v′1
|∇v|2h1
∇v.∇v′1.
Now, we use the classical formulae to differentiate a domain integral to get
DA1(ω)h2 =
|∇v|2h1
+ 2div
∇v.∇v′2 h1
|∇v+|2h1
h2,n + 2∇v
+.∇(v′2)
+ h1,n;
DA2(ω)h2 =
|∇v−|2h1
h2,n + 2∇v
−.∇(v′2)
− h1,n.
The terms DBi, i = 1, 2 require more precisions. First, we write
DB1(ω)h2 =
∇v.∇v′1)h2
+∇v′1.∇v
2 +∇v.∇(v
∇v+.∇(v′1)
+h2,n + ∂nv
+((v′1)
+)′2 +
+(v′2)
+ + ∂n(v
+(v′1)
∂nv((un)
∂n(ud)
1(un)
2 + ∂n(ud)
2(un)
Note that we used the Green formula twice to keep the symmetry in h1 and h2. We also use the fact
that the derivatives (ud)
i are harmonic in Ω \ ω to transform the boundary integral on the exterior
boundary into an integral on the moving boundary. We obtain
DB1(ω)h2 = −
∇v+.∇(v′1)
+h2,n + ∂nv
+(((ud)
+ − v∂n(((un)
+((ud)
+ + ∂n(v
+((ud)
+ − ∂n(un)
+(v′2)
+ − ∂n((un)
+(v′1)
By the same methods, we get
DB2(ω)h2 =
∇v−.∇(v′1)
−h2,n + ∂nv
−((v′1)
−(v′2)
− + ∂n(v
−(v′1)
We regroup the different terms and after some straightforward computations, we obtain:
DJKV (ω)h1
(ω)h2 = −
σ|∇v|2h1
h1,n∇v
2 + h2,n∇v
1 + (ud)
2 − ∂n(un)
1 − ∂n(un)
σ∂n((un)
− σ1∂nv
((ud)
In order to compute D2JKV (ω)(h1,h2), the first order derivative of the Kohn-Vogelius objective is
needed. It can be written as follows:
DJKV (w)h = −
σ|∇v|2
hn + 2
σ∂n(un)
− σ1∂nv
Gathering (66),(41) and (42), we write the second derivative of the Kohn-Vogelius criterion as:
D2JKV (ω)(h1,h2) = −
σ|∇v|2h1
σ|∇v|2
(Dh1h2).n
1 + (ud)
2 − ∂n(un)
1 − ∂n(un)
h1,n∇v
2 + h2,n∇v
σ∂n(un)
− σ1∂nv
Let us give a more simplified version for the first term. We decompose the field h2 into normal vector
and tangential parts and we use (43). After some elementary computations, we obtain
σ|∇v|2h1
σ|∇v|2
(Dh1h2).n
σ|∇v|2
h1τ .∇τh2,n + h2τ .∇τh1,n − h2τ .Dnh1τ
σ|∇v|2
h1,nh2,n.
Finally, the second order derivative of the Kohn-Vogelius objective becomes:
D2JKV (ω)(h1,h2) =
σ|∇v|2
h1τ .∇τh2,n + h2τ .∇τh1,n − h2τ .Dnh1τ
σ|∇v|2
h1,nh2,n + 2
h1,n∇v
2 + h2,n∇v
1 + (ud)
2 − ∂n(un)
1 − ∂n(un)
σ∂n(un)
− σ1∂nv
4.2 Analysis of stability. Proof of Theorem 5
Now, we specify the domain ω that is assumed to be a critical shape for JKV . Moreover, we assume
that the additional condition ud = un holds. To emphasize that we deal with such a special domain,
we will denote it ω∗. The assumptions mean that the measurements are compatible and that ω∗ is
a global minimum of the criterion. From the necessary condition of order two at a minimum, the
shape Hessian is positive at such a point.
Let us notice that only the normal component of h appears. Let us also emphasize that there
is no hope to get h = 0 from the structure theorem for second order shape derivative ([6]). The
deformation field h appears in D2JKV (ω
∗)(h,h) only thought its normal component hn since ω
a critical point for JKV . This remark explains why we consider in the statement of Theorem 5 the
scalar Sobolev space corresponding to the normal components of the deformation field.
We now prove Theorem 5. From (67), we deduce
DJ2KV (ω
∗)[h, h] = −2
u′d∂nv
′ − ∂nu
= 2 [σ]
(u′+d − u
n )divτ (hn∇τud)−
d hn∂n(u
d − u
= 2 [σ]
u′+d − u
n , divτ (hn∇τud)
∂nudhn, ∂n
u′d − u
where 〈, 〉 denotes the duality between H1/2(∂ω∗)×H−1/2(∂ω∗) . Let us introduce the operators
T1 : H
1/2(∂ω∗) → H−1/2(∂ω∗) M1 : H
1/2(∂ω∗) → H1/2(∂ω∗)
h 7→ divτ (hn∇τud) h 7→ u
d − u
T2 : H
1/2(∂ω∗) → H1/2(∂ω∗) M2 : H
1/2(∂ω∗) → H−1/2(∂ω∗)
h 7→ hn∂nu
d h 7→ ∂n
u′+d − u
The Hessian can then be written under the form :
D2JKV (ω
∗)(h,h) = 2 [σ]
M1(h), T1(h)
T2(h),M2(h)
From the classical results of Maz’ya and Shaposhnikova on multipliers ([14], [22]), we get easily that
T1 and T2 are continuous operators. In fact, the compactness of the Hessian is a consequence of the
fact that both operators M1 and M2 are compact. We use a regularity argument : we remark that
M1 is the composition of the operators:
R1 : H
1/2(∂ω∗) → H
⋄ (∂Ω) and R2 : H
⋄ (∂Ω) → H
1/2(∂ω∗)
h 7→ −u′n φ 7→ ψ
where ψ is the trace on ∂ω∗ of Ψ solution of
−∆Ψ = 0 in Ω \ ω∗ and in ω∗, ,
[Ψ] = 0 on ∂ω∗,
[σ∂nΨ] = 0 on ∂ω
Ψ = φ on ∂Ω.
While R1 is a continuous operator, we prove that R2 is compact. Let us express u|∂ω∗ = ψ. We use
the integral formula of u to obtain:
I + µKω∗
σ2 + σ1
S∂Ω∂ω∗
µK∂ω∗∂Ω
σ2 + σ1
(u)|∂ω∗
(∂nu)|∂Ω
σ1 + σ2

K∂Ω∂ω∗φ

The matricial operator arising in this equation appeared also in (56). It has a continuous inverse
thanks to Theorem 7. Let us express u|∂ω∗ = ψ:
I + µKω∗)− µS∂Ω∂ω∗S
Ω K∂ω∗∂Ω
σ1 + σ2
K∂Ω∂ω∗ − S∂Ω∂ω∗S
I −KΩ)
φ. (69)
Since the operators K∂Ω∂ω∗ and S∂Ω∂ω∗ are compact, the operator R2 is compact, hence M1 is
compact. The proof of compactness of M2 is similar. Let us mention that a similar strategy of proof
can be found in [5].
The natural question is then to quantify how is this optimization problem ill-posed. This question
is directly in related to the rate at which the singular values of the Hessian operator are decreasing.
Equation (69) shows that this rate is the one of the operators K∂Ω∂ω∗ and S∂Ω∂ω∗ . Now, since for
every u ∈ H1/2(∂Ω), the functions K∂Ω∂ω∗u and S∂Ω∂ω∗u are harmonic outside of ∂Ω and therefore in
Ω, their restrictions on ∂ω∗ are as smooth as ∂ω∗. We conclude that if ∂ω∗ is C∞ then the restriction
belongs to each Hs(∂ω∗) for s > 1/2 then that if λn denotes the n
th eigenvalue of D2JKV (ω
∗), then
λn = o(n
−s) for all s > 0.
References
[1] L Afraites, and M. Dambrine and D. Kateb. Conformal mappings and shape derivatives for the
transmission problem with a single measurement. Preprint HAL 2006 to appear in Numerical
Functional Analysis and Optimization.
[2] L Afraites, and M. Dambrine, and K. Eppler and D. Kateb. Detecting perfectly insulated obstacles
by shape optimization techniques of order two. Preprint HAL 2006.
[3] K. Astala, and L. Pävärinta. Calderón’s inverse conductivity problem in the plane. Ann. of Math.,
(163), 265-299.
[4] M. Delfour, and J.-P. Zolesio. Shapes and Geometries: Analysis, Differential Calculus, and
Optimization SIAM, (2001).
[5] K. Eppler, and H. Harbrecht. A regularized Newton method in electrical impedance tomography
using Hessian information, Control and Cybernetics (34), 203-225.
[6] A. Henrot, and M. Pierre. Variation et optimization de formes. Springer Mathématiques et
Applications, volume 48 (2005).
[7] F. Hettlich, and W. Rundell. The determination of a discontinuity in a conductivity from a single
boundary measurement, Inverse Problems 14 (1998), 67-82.
[8] F. Hettlich, and W. Rundell. A Second Degree Method for Nonlinear Inverse Problems, SIAM J.
Numer. Anal., 37, No.2, (1999), 587–620.
[9] B. Hofmann. Approximation of the inverse electrical impedance tomography problem by an inverse
transmission problem, Inverse problems 14 (1998), 1171-1187.
[10] K. Ito, K. Kunisch, and Z. Li. Level-set function approach to an inverse interface problem,
Inverse Problems 17 (2001), 1225-1242.
[11] R. Kress. Linear Integral Equations. Springer-Verlag, Applied Mathematical Sciences (82).
[12] A. Kirsch. Surface Gradients and Continuity Properties for some integral operators in Classical
Scattering Theory. Mathematical Methods in the Applied Sciences, Vol 11 (1989), 789-804.
[13] A. Kirsch. The Domain Derivative and Two Applications in Inverse Scattering Theory, Inverse
Problems 9 (1993), 81-96.
[14] V.G. Maz’ya and T.O. Shaposhnikova. Theory of multipliers in spaces of differentiable functions,
Pitman, Boston, Monographs and Studies in Mathematics, 23, (1985).
[15] A.I. Nachmann. Reconstruction from boundary measurements, Ann. of Math., 128 (1988),
531-576
[16] A.I. Nachmann. Global uniqueness for a two dimensional inverse boundary value problem, Ann.
of Math., 143 (1996), 71-96
[17] R.G. Novikov. A multidimensional inverse spectral problem for the equation ∆ψ + (v(x) −
Eu(x))ψ = 0, Funktsional. Anal. i Prilozhen. 22 (1988), no 4, 11-22, 96; translation in Funct.
Anal. Appl. 22, 263-272.
[18] O. Pantz. Sensibilité de l’équation de la chaleur aux sauts de conductivité, C.R. Acad. Sci. Paris,
Ser. I 341-5 (2005), 333-337.
[19] J. Simon. Second variation for domain optimization problems, In Control and estimation of
distributed parameter systems, F. Kappel, K. Kunisch and W. Schappacher ed., International
Series of Numerical Mathematics, no 91, Birkhäuser, 361-378.
[20] J. Sokolowski and Jean-Paul Zolesio. Introduction to Shape Optimization: Shape Sensitivity
Analysis, Springer-Verlag (1992).
[21] J. Sylvester, and G. Uhlmann. A global uniqueness for an inverse boundary value problem Ann.
of Math. 125 (1987), 153-169.
[22] H. Triebel. Theory of Function Spaces, Birkhaüser (1983).
	Introduction and statement of the results.
	Preliminary results. 
	Elements of shape calculus
	Existence of the second order derivative of the state. Proof of Theorem ??.
	Preliminary results.
	Proof of existence of the second order derivative.
	Derivation of (??) from the weak formulation.
	How to recover (??) by formal differentiation of the boundary conditions.
	Justification of the formal computations.
	Case of Neumann boundary conditions.
	Second order derivatives for the criterion.
	Proof of Theorem ??.
	Analysis of stability. Proof of Theorem ??
ABSTRACT
  This paper is devoted to the analysis of a second order method for recovering
the \emph{a priori} unknown shape of an inclusion $\omega$ inside a body
$\Omega$ from boundary measurement. This inverse problem - known as electrical
impedance tomography - has many important practical applications and hence has
focussed much attention during the last years. However, to our best knowledge,
no work has yet considered a second order approach for this problem. This paper
aims to fill that void: we investigate the existence of second order derivative
of the state $u$ with respect to perturbations of the shape of the interface
$\partial\omega$, then we choose a cost function in order to recover the
geometry of $\partial \omega$ and derive the expression of the derivatives
needed to implement the corresponding Newton method. We then investigate the
stability of the process and explain why this inverse problem is severely
ill-posed by proving the compactness of the Hessian at the global minimizer.

<|endoftext|><|startoftext|>
Braiding transformation, entanglement swapping and Berry phase in entanglement
space
Jing-Ling Chen,1, ∗ Kang Xue,2 and Mo-Lin Ge1, †
Liuhui Center for Applied Mathematics and Theoretical Physics Division,
Chern Institute of Mathematics, Nankai University, Tianjin 300071, People’s Republic of China
Department of Physics, Northeast Normal University,
Changchun, Jilin 130024, People’s Republic of China
We show that braiding transformation is a natural approach to describe quantum entanglement, by
using the unitary braiding operators to realize entanglement swapping and generate the GHZ states
as well as the linear cluster states. A Hamiltonian is constructed from the unitary Ři,i+1(θ, ϕ)-
matrix, where ϕ = ωt is time-dependent while θ is time-independent. This in turn allows us to
investigate the Berry phase in the entanglement space.
PACS numbers: 03.67.Mn, 02.40.-k, 03.65.Vf
I. INTRODUCTION
Quantum entanglement is the most surprising non-
classical property of composite quantum systems that
Schrödinger singled out many decades ago as “the char-
acteristic trait of quantum mechanics”. Recently entan-
glement has become one of the most fascinating topics
in quantum information, because it has been shown that
entangled pairs are more powerful resources than the sep-
arable ones in a number of applications, such as quantum
cryptography [1], dense coding, teleportation [2] and in-
vestigation of quantum channels, communication proto-
cols and computation [3]. For instance, by using a maxi-
mally entangled state |Φ+〉 = 1/
2(| ↑↑〉+| ↓↓〉) (i.e., one
of Bell states and also the so-called Einstein-Podolsky-
Rosen (EPR) channel in [2]), Bennett et al. have showed
that it is faithful to transmit a one-qubit state a| ↑〉+b| ↓〉
from one location (Alice) to another (Bob) by sending
two bits of classical information.
For a two-qubit system, there has been defined a
“magic basis” consisting of four Bell states [4]:
|Φ+〉 = 1/
2(| ↑↑〉+ | ↓↓〉),
|Φ−〉 = 1/
2(| ↑↑〉 − | ↓↓〉),
|Ψ+〉 = 1/
2(| ↑↓〉+ | ↓↑〉),
|Ψ−〉 = 1/
2(| ↑↓〉 − | ↓↑〉), (1)
where spin-1/2 notation for definiteness has been used.
Any pure state of two-qubit can be expanded in this par-
ticular basis and its degree of entanglement can be ex-
pressed in a remarkably simple way [4]. It is possible to
study these Bell states from the other point of view of
transformation theory. The fact that they are all nor-
malized and mutual orthogonal naturally indicates that
the four Bell states are connected to the standard basis
∗Electronic address: chenjl@nankai.edu.cn
†Electronic address: geml@nankai.edu.cn
{| ↑↑〉, | ↑↓〉, | ↓↑〉, | ↓↓〉} by a unitary transformation
1 0 0 1
0 1 1 0
0 −1 1 0
−1 0 0 1
. (2)
More precisely, let | ↑〉 = (1, 0)T and | ↓〉 = (0, 1)T , | ↑↑〉
is understood as | ↑〉⊗| ↑〉, one then has the matrix forms
for the standard basis as | ↑↑〉 = (1, 0, 0, 0)T , | ↑↓〉 =
(0, 1, 0, 0)T , | ↓↑〉 = (0, 0, 1, 0)T , | ↓↓〉 = (0, 0, 0, 1)T . Act-
ing the unitary matrix U on the standard basis will pro-
duce the four Bell states: U | ↑↑〉 = 1/
2(1, 0, 0,−1)T =
|Φ−〉, U | ↑↓〉 = 1/
2(0, 1,−1, 0)T = ¯|Ψ−〉, U | ↓↑〉 =
2(0, 1, 1, 0)T = |Ψ+〉, U | ↓↓〉 = 1/
2(1, 0, 0, 1)T =
|Φ+〉, in short one obtains U(| ↑↑〉, | ↑↓〉, | ↓↑〉, | ↓↓〉) =
(|Φ−〉, |Ψ−〉, |Ψ+〉, |Φ+〉).
During the investigation of the relationships among
quantum entanglement, topological entanglement and
quantum computation, Kauffman et al. have discovered
a very significant result that the matrix U is nothing but
a braiding operator, and furthermore it can be identi-
fied to the universal quantum gate (i.e., the CNOT gate)
[5][6]. There is an earlier literature on topological quan-
tum computation and which is all about quantum com-
puting using braiding [7]. These literatures introduce
the braiding operators and Yang–Baxter equations to the
field of quantum information and quantum computation,
and also provide a novel way to study the quantum en-
tanglement.
Our aim in this work is twofold: one is to show that
braiding transformation is a natural approach describing
the quantum entanglement, the other is to investigate
the Berry phase in the entanglement space (or the Bloch
space). The paper is organized as follows. In Sec. II, we
briefly review the unitary braiding operators and apply
them to realize entanglement swapping and to generate
the Greenberger-Horne-Zeilinger (GHZ) states as well as
the linear cluster states. In Sec. III, after briefly review-
ing the Yang–Baxterization approach, we construct a
Hamiltonian from the unitary Ři,i+1(θ, ϕ)-matrix, where
ϕ is time-dependent while θ is time-independent. This in
http://arxiv.org/abs/0704.0709v3
mailto:chenjl@nankai.edu.cn
mailto:geml@nankai.edu.cn
turn allows us to investigate the Berry phase in the en-
tanglement space. Conclusion and discussion are made
in the last section.
II. BRAIDING TRANSFORMATION AND ITS
APPLICATIONS
Hereafter for convenience, we shall denote the spin up
| ↑〉 and down | ↓〉 as |0〉 and |1〉, respectively. Braiding
operators are the generalizations of the usual permuta-
tion operators. ForN spin-1/2 particles, the permutation
operator for the particles i and i+ 1 reads
Pi,i+1 =
(1 + ~σi · ~σi+1) =
1 0 0 0
0 0 1 0
0 1 0 0
0 0 0 1
, (3)
Here Pi,i+1 is understood as 11 ⊗ 12 ⊗ · · · ⊗ 1i−1 ⊗ (1 +
~σi · ~σi+1)/2⊗ 1i+2 ⊗ · · · ⊗ 1N , where 1 is the 2× 2 unit
matrix. The permutation operator Pi,i+1 exchanges the
spin state |k〉i ⊗ |l〉i+1 to be |l〉i ⊗ |k〉i+1.
The braiding operators satisfy the following braid re-
lations:
bi,i+1bi+1,i+2bi,i+1 = bi+1,i+2bi,i+1bi+1,i+2, i ≤ N − 2,
bi,i+1bj,j+1 = bj,j+1bi,i+1, |i− j| ≥ 2. (4)
The usual permutation operator Pi,i+1 is a solution of
Eq. (4) with the constraint P 2i,i+1 = 1. Physics prefers
to the unitary transformations. One may observe that
both U and Pi,i+1 are unitary. Two more general unitary
braiding transformations satisfying the braiding relations
Bi,i+1 =
1 0 0 e−iϕ
0 1 1 0
0 −1 1 0
−eiϕ 0 0 1
, (5)
Pi,i+1 =
eiξ00 0 0 0
0 0 eiξ10 0
0 eiξ01 0 0
0 0 0 eiξ11
, (6)
which allow additional phase factors. Braiding opera-
tors Bi,i+1 and Pi,i+1 transform the direct-product states
|kl〉 ≡ |k〉i ⊗ |l〉i+1 in the following way
Bi,i+1
|00〉 − eiϕ|11〉
|01〉 − |10〉
|01〉+ |10〉
e−iϕ|00〉+ |11〉
, (7)
Pi,i+1
eiξ00 |00〉
eiξ10 |10〉
eiξ01 |01〉
eiξ11 |11〉
. (8)
B12 B34
B12 B34
|ψ〉ABCD = |Φ
−〉AB ⊗ |Φ
|ψ′〉ABCD = −|Φ
−〉AD ⊗ |Φ
|0〉A |0〉B |0〉D|0〉C
FIG. 1: Realizing ES by braiding transformations. After act-
ing B34B12 on a separable state |0000〉ABCD , one prepares
a state |ψ〉ABCD = |Φ−〉AB ⊗ |Φ−〉CD needed for quantum
entanglement swapping. After performing successive braid-
ing transformations B23B34B12B23 on |ψ〉ABCD , the entan-
glement involved in the state |ψ〉ABCD is swapped to the state
|ψ′〉ABCD = −|Φ−〉AD ⊗ |Φ+〉BC .
They may generate entangled states from disentangled
ones: (i) The braiding matrix Bi,i+1 yields directly the
four Bell states |Φ±〉 and |Ψ±〉 with the relative phase
factor e−iϕ. The phase factor e−iϕ originates from the
q-deformation of the braiding operator U with q = e−iϕ
[8][9], and ϕ may have a physical significance of mag-
netic flux [10]. In the next section, we shall vary adia-
batically the parameter ϕ to obtain the Berry phase in
the entanglement space. (ii) When Pi,i+1 acts on an ini-
tial separable state 1/
2(|0〉+ |1〉)i⊗ 1/
2(|0〉+ |1〉)i+1,
it produces an entangled state (eiξ00 |00〉 + eiξ01 |01〉 +
eiξ10 |10〉 + eiξ11 |11〉)/2 whose degree of entanglement
equals to |ei(ξ00+ξ11) − ei(ξ01+ξ10)|/2. Thus it is indeed
a very natural way for the braiding operators to describe
and to generate quantum entanglement. To strengthen
such a viewpoint, we would like to provide two explicit
examples as applications of braiding transformations as
follows.
Example 1: Entanglement swapping. Entanglement
swapping (ES) is a very interesting quantum mechanical
phenomenon, which was originally proposed by Żukowski
et al. [11], generalized to multipartite quantum systems
by Zeilinger et al. [12] and Bose et al. [13] independently,
and experimentally realized by Pan et al. [14]. The origi-
nal ES is based on quantum measurement: Suppose Alice
and Bob share an entangled state, similarly Claire and
Danny also share some entangled states, if Bob and Claire
come together and make a measurement in a suitable
basis and communicate their measurement results clas-
sically, then Alice’s and Danny’s particles may become
entangled. Now we come to use the braiding transforma-
tions to realize the ES. Starting from a separable state
|0000〉ABCD ≡ |0000〉1234, we prepare a state |ψ〉ABCD
needed for quantum entanglement swapping due to the
braiding transformations B12 and B34 as follows:
|ψ〉ABCD = B34B12|0000〉ABCD
= |Φ−〉AB ⊗ |Φ−〉CD, (9)
(|00〉 − |11〉)AB ⊗
(|00〉 − |11〉)CD,
here for simplicity we have set ϕ = 0, and |Φ±〉 are the
usual Bell states. One may verify that
|ψ′〉ABCD = B23B34B12B23|ψ〉ABCD
= −|Φ−〉AD ⊗ |Φ+〉BC , (10)
(−|00〉+ |11〉)AD ⊗
(|00〉+ |11〉)BC ,
in other words, after making the successive braiding
transformations B23B34B12B23, the entanglement in-
volved in the state |ψ〉ABCD is swapped to |ψ′〉ABCD,
therefore we have realized the ES (see Fig. 1). The dif-
ference between the original ES scenario and ours is that
the former based on quantum measurement, while the
latter based on unitary braiding transformations with-
out quantum measurement. It is worthy to mention that
the approach of realizing ES by braiding transformations
is not unique. For instance, ES can be done even simpler
by using only two permutations P34P23 that acting on
the state |ψ〉ABCD.
Example 2: Generating the GHZ states and the linear
cluster states. These are some kinds of important en-
tangled states in quantum information, such as the well-
known GHZ state and the linear cluster state. (i) It is
easy to check that, after acting B12B23 on the initially
separable three-qubit state |111〉123, one obtains a state
|ψ′〉GHZ = B12B23|111〉123 (11)
(|100〉123 + |010〉123 + |001〉123 + |111〉123),
which is equivalent to the standard three-qubit GHZ
state |ψ〉GHZ = 1/
2(|000〉123 + |111〉123) up to a local
unitary transformation:
|ψ′〉GHZ = Ua ⊗ Ub ⊗ Uc|ψ〉GHZ , (12)
where Ua = Ub = Uc = V , and
, (13)
i.e., the unitary transformation V is decomposed as a
product of the Hadamard gate and the phase gate of σz.
In general, one may obtain the N -qubit GHZ states by
acting B12B23 · · ·BN−1,N on the initially separable N -
qubit state |11 · · · 1〉12···N . (ii) The linear cluster state
is the highly entangled multiparticle state on which one-
way quantum computation is based [15][16]. The linear
cluster state is locally equivalent to the N -qubits ring
cluster state. The random quantum measurement er-
ror can be overcome by applying a feed-forward tech-
nique, such that the future measurement basis depends
on earlier measurement results. This technique is crucial
for achieving deterministic quantum computation once
a cluster state is prepared. For four qubits, the linear
cluster state reads
|ψ〉cluster =
(|0〉1|0〉2|0〉3|0〉4 + |0〉1|0〉2|1〉3|1〉4 +
|1〉1|1〉2|0〉3|0〉4 − |1〉1|1〉2|1〉3|1〉4). (14)
However, it is not easy to generate |ψ〉cluster by us-
ing only one kind of unitary braiding transformations
Bi,i+1. In the following, starting from the initial separa-
ble four-qubit state |0000〉1234, we would like to mathe-
matically generate the four-qubit linear cluster state by
combined using two kinds of unitary braiding transfor-
mations Bi,i+1 and Pi,i+1, namely
|ψ〉cluster = P23P23B34B12|0000〉1234, (15)
where the phases in P23 are chosen as ξ00 = 0, ξ01 =
ξ10 = ξ11 = π, and P23 is the usual permutation opera-
tor in Eq. (3). Moreover, one can mathematically gener-
ate 16 orthogonal four-qubit linear cluster states by act-
ing P23P23B34B12 on the initial states |ijkl〉1234, where
i, j, k, l run from 0 to 1.
Significantly such realizations of entanglement swap-
ping as well as the GHZ states are purely based on one
kind of braiding transformations Bi,i+1. Eqs. (9)-(13)
are hopeful to provide an alternative approach for the
experimenter to realize the ES and also generate the
GHZ states through a network of quantum logic gates in
the future. Recent realization of the linear cluster states
is based on quantum measurements [16]. By using two
kinds of braiding transformations, Eq. (15) has mathe-
matically produced the state |ψ〉cluster . Since Bi,i+1 and
Pi,i+1 do not have the same eigenvalues and they can-
not be the matrices representing exchanges within the
same braid group representation, there is still a distance
between the mathematical realization Eq. (15) and the
actual physical realization.
III. R-MATRIX, HAMILTONIAN AND BERRY
PHASE IN ENTANGLEMENT SPACE
In Ref. [6], the unitary matrix Ři,i+1(θ, ϕ) has been in-
troduced from the Yang–Baxterization approach [8] in or-
der to include the general discussion of the nonmaximally
entangled states. To make the paper be self-contained,
we briefly review it in the following.
The Yang-Baxterization of the unitary braiding oper-
ator Bi,i+1 is
Ři,i+1(x) =
1 + x2
(Bi,i+1 + xB
i,i+1), (16)
namely, Ři,i+1(x)-matrix is a linear superposition of ma-
trices Bi,i+1 and B
i,i+1, where B
−1 = B† is the inverse
matrix of B. The unitary Ř-matrix is a generalization
of the unitary braiding matrix Bi,i+1, which satisfies the
Yang–Baxter equation:
Ři(x) Ři+1(xy) Ři(y) = Ři+1(y) Ři(xy) Ři+1(x), (17)
where x and y are called the spectral parameters. The
braid relations (4) can be viewed as an asymptotic be-
havior of the Yang–Baxter equation. By introducing the
new variables of angles θ as cos θ = (1− x)/
2(1 + x2),
sin θ = (1+x)/
2(1 + x2), the matrix Ři,i+1(x) may be
recast to Ři,i+1(θ, ϕ) = sin θ 1i ⊗ 1i+1 + cos θ Mi,i+1.
where Mi,i+1 = e
−iϕS+i ⊗ S
i+1 − eiϕS
i ⊗ S
i+1 + S
S−i+1 − S
i ⊗ S
i+1, and S
± = Sx ± iSy are the matrices
for spin-1/2 angular momentum operators.
Similar to Eq. (7), when the unitary matrix
Ři,i+1(θ, ϕ) acts on the direct-product states |kl〉, it is
expected to produce the nonmaximally entangled states
Ři,i+1(θ, ϕ)
sin θ|00〉 − eiϕ cos θ|11〉
sin θ|01〉 − cos θ|10〉
cos θ|01〉+ sin θ|10〉)
e−iϕ cos θ|00〉+ sin θ|11〉
Remarkably, the four states in the right-hand side of Eq.
(18) possess the same degree of entanglement (or the con-
currence [17]) equals to | sin(2θ)|. When θ = π/4, they
reduce to the four Bell basis and correspondingly the ma-
trix Ři,i+1(θ, ϕ) reduces to the braiding operator Bi,i+1.
There are two parameters θ, ϕ in the unitary matrix
Ři,i+1(θ, ϕ). If let θ be time-dependent while ϕ be time-
independent, one can construct a Hamiltonian as in Ref.
[6]. However, the eigenstates of such a Hamiltonian are
separable states, which do not allow us to study the Berry
phases for entangled states. To reach this purpose, in this
paper we will let ϕ = ωt be time-dependent while θ be
time-independent.
Equation (18) can be abbreviated as
Ři,i+1(θ, ϕ)|ψ(π/2, 0)〉 = |ψ(θ, ϕ)〉. Taking
the Schrödinger equation ih̄∂|ψ(θ, ϕ)〉/∂t =
H(θ, ϕ)|ψ(θ, ϕ)〉 into account, one obtains
ih̄∂/∂t[Ři,i+1(θ, ϕ)|ψ(π/2, 0)〉] = ih̄∂/∂t[|ψ(θ, ϕ)〉] =
H(θ, ϕ)|ψ(θ, ϕ)〉 = H(θ, ϕ)Ři,i+1(θ, ϕ)|ψ(π/2, 0)〉. Now
let the parameters θ be time-independent and ϕ(t) = ωt,
one may arrive at a Hamiltonian through the unitary
transformation Ři,i+1(θ, ϕ) as
H(θ, ϕ) = ih̄
∂Ři,i+1(θ, ϕ)
i,i+1(θ, ϕ). (19)
More precisely, the Hamiltonian reads
H(θ, ϕ) = h̄ϕ̇ cos θ
cos θ 0 0 e−iϕ sin θ
0 0 0 0
0 0 0 0
eiϕ sin θ 0 0 − cos θ
,(20)
or, H(θ, ϕ) = h̄ϕ̇ cos θ[cos θ(Szi ⊗ 1i+1 + 1i ⊗ Szi+1) +
sin θ(e−iϕS+i ⊗ S
i+1 + e
iϕS−i ⊗ S
i+1]. In the standard
basis {|00〉, |01〉, |10〉, |11〉}, one observes that H(θ, ϕ)
has contributions merely on {|00〉, |11〉}, i.e., it makes
four-dimensions “collapse” to two-dimensions since θ
is assumed to be time-independent. In the basis of
{|01〉, |10〉}, the two eigenstates |χ01〉 = |01〉, |χ10〉 = |10〉
FIG. 2: Berry phases in Bloch space (or the entanglement
space). The parameter θ comes from the Yang–Baxterization
of the unitary braiding operators, while parameters ϕ origi-
nates from the q-deformation of the braiding operators. They
define a point on the unit three-dimensional sphere named
the Bloch sphere, and have definite geometric meanings as
angles of longitude and latitude respectively. Let θ be time-
independent, when the parameter ϕ(t) evolves adiabatically
from 0 to 2π, one obtains the Berry phases for χ±(θ, ϕ)
as shown in Eq. (22). The relation between Berry phases
and concurrence of the entangled states χ±(θ, ϕ) is γ± =
∓π(1−
1− C2), where C = | sin θ| is the concurrence.
are degenerate with zero eigenvalues E01 = E10 = 0,
they will not give rise to Berry phases so we would not
like to discuss them here. In the basis of {|00〉, |11〉}, the
two eigenvalues E± = ±h̄ϕ̇ cos θ with two corresponding
eigenstates read
|χ+(θ, ϕ)〉 = cos
|00〉+ eiϕ sin
|11〉,
|χ−(θ, ϕ)〉 = −e−iϕ sin
|00〉+ cos θ
|11〉. (21)
Interestingly, the interval between E+ and E− depends
on θ that related to the degree of entanglement of the
states. According to Berry’s theory [18], when ϕ(t)
evolves adiabatically from 0 to 2π, the corresponding
Berry phases for the entangled states are
γ± = i
dt 〈χ±(θ, ϕ)|
|χ±(θ, ϕ)〉 = ∓
, (22)
where Ω = 2π(1 − cos θ) is the familiar solid angle en-
closed by the loop on the Bloch sphere (see Fig. 2).
Actually, the eigenstates |χ±(θ, ϕ)〉 are the SU(2) spin
coherent states. If we express the Hamiltonian in terms
of SU(2) generators as [19]
H(θ, ϕ) = X1J1 +X2J2 +X3J3, (23)
where X1 = 2h̄ϕ̇ cos θ sin θ cosϕ, X2 =
2h̄ϕ̇ cos θ sin θ sinϕ, X3 = 2h̄ϕ̇ cos θ cos θ, and the
SU(2) generators are
J1 = (S
i ⊗ S
i+1 + S
i ⊗ S
i+1)/2,
J2 = (S
i ⊗ S
i+1 − S
i ⊗ S
i+1)/2i,
J3 = (S
i ⊗ 1i+1 + 1i ⊗ Szi+1)/2, (24)
based on which one can verify directly that
|χ+(θ, ϕ)〉 = exp[ζJ+ − ζ∗J−] |00〉,
|χ−(θ, ϕ)〉 = exp[ζJ+ − ζ∗J−] |11〉, (25)
where exp[ζJ+ − ζ∗J−] is the spin coherence operators
(and also the usual D
2 (θ, ϕ)-matrix in the angular mo-
mentum theory), J± = J1 ± iJ2 and ζ = e−iϕθ/2. Berry
phase for spin coherence states has been discussed in [19],
where the corresponding result coincides with Eq. (22).
IV. CONCLUSION AND DISCUSSION
In summary, we have shown that braiding transfor-
mation is a natural approach to describe quantum en-
tanglement, by applying the unitary braiding operators
to realize entanglement swapping and to generate the
GHZ states as well as the linear cluster states. The uni-
tary braiding matrix Bi,i+1 describes the Bell states and
the Yang–Baxter matrix Ři,i+1(θ, ϕ) describes generally
entangled states with arbitrary degree of entanglement.
Varying the parameter θ continuously from 0 to 2π, one
may obtain an “oscillating entanglement” phenomenon
for the entangled states. A Hamiltonian is constructed
from the unitary Ři,i+1(θ, ϕ)-matrix, where ϕ = ωt is
time-dependent while θ is time-independent. This in turn
allows us to investigate the Berry phases for the entan-
gled states in the entanglement space.
Let us make two discussions to end this paper.
(i) Very recently, geometric phases for mixed states
[20] have been observed in experiments by using NMR
interferometry [21] as well as single photon interferome-
try [22]. Under a certain noisy environment, the states
|χ±(θ, ϕ)〉 may become mixed states as
ρ±(r, θ, ϕ) = r |χ±〉〈χ±|+ (1− r)ρnoise, (26)
where 0 ≤ r ≤ 1. Usually, ρnoise is chosen as 1i ⊗
1i+1/4 = (|00〉〈00| + |01〉〈01| + |10〉〈10| + |11〉〈11|)/4
and ρ±(r, θ, ϕ) become the generalized Werner states
[3]. Following Ref. [23], one may calculate the geo-
metric phases for the mixed states ρ±(r, θ, ϕ), however,
the computation becomes complicated since ρ±(r, θ, ϕ)
have two nonzero degenerate eigenvalues in the subspace
spanned by {|01〉, |10〉}. Geometric phases for degen-
erate mixed states are complicated and we will discuss
them elsewhere. In the following, we would like to dis-
cuss a more simpler case for geometric phases of mixed
states, by restricting the noise in the subspace spanned
by {|00〉, |11〉}. The analysis on such a restriction to
the noisy environment is limited, for it assumes that the
states |01〉 and |10〉 are decoupled, and the environment
only affects the |00〉 and |11〉 subspace.
For simplicity, let us denote |0〉 ≡ |00〉, |1〉 ≡ |11〉,
then the Hamiltonian can be rewritten in a very sim-
ple form as H(θ, ϕ) = h̄ϕ̇ cos θ r̂ · σ, where r̂ =
(sin θ cosϕ, sin θ sinϕ, cos θ) is a unit vector on the Bloch
sphere, and σ = (σ1, σ2, σ3) is the Pauli matrix vector
in the basis of {|0〉, |1〉}, namely, σ1 = |0〉〈1| + |1〉〈0|,
σ2 = −i|0〉〈1| + i|1〉〈0|, σ3 = |0〉〈0| − |1〉〈1|. Based
on which, the pure states |χ±(θ, ϕ)〉 can be rewritten
in a density matrix form as |χ±〉〈χ±| = (11 ± r̂ · σ)/2,
where 11 = |0〉〈0| + |1〉〈1|. In other words, in the basis
of {|0〉, |1〉}, |χ±〉 may be viewed as states of a single
“qubit”, which allows us to introduce mixed states and
discuss their geometric phases in a particular noisy en-
vironment as follows. By choosing ρnoise = 11/2, one has
from Eq. (26) that
ρ±(r, θ, ϕ) =
(11± r · σ), (27)
where r = rr̂. The state |χ+〉 corresponds to a point r̂
on the Bloch sphere; ρnoise is located on the center of
the Bloch sphere; the unit vector r̂ shrinks to r when the
particular noise is presented and then |χ±〉〈χ±| turn to
be mixed states ρ±(r, θ, ϕ). Follow the same calculations
in [23], let r and θ be time-independent, when parameter
ϕ(t) evolves adiabatically from 0 to 2π, one obtains the
geometric phase for the mixed states (27) as
γmixed± = ∓ arctan(r tan
), (28)
which reduces to Eq. (22) for r = 1.
(ii) The Berry phases in Eq. (22) can be expressed
in terms of the concurrence of the states |χ±(θ, ϕ)〉 as
γ± = ∓π(1 −
1− C2), with C = | sin θ| being the con-
currence. It is well-known that C is an invariant of entan-
glement for the entangled states |χ±(θ, ϕ)〉, while Berry
phase is related to some certain topological structures.
This might bridge a connection between quantum entan-
glement and topological quantum computation. Even-
tually, when one restricts the discussion to the basis of
{|0〉, |1〉}, by taking θ = π/4, φ = −π/2 (or q = i), the
matrix Ři,i+1 may reduce to the two-dimensional repre-
sentation of braiding operators as in Eq. (140) of [9],
which has physical applications in non-Abelian quantum
Hall systems and topological quantum field theory.
ACKNOWLEDGMENTS The authors thank Prof.
L. D. Faddeev and Prof. K. Fijikawa for their encourage-
ment and useful discussions. This work was supported
in part by NSF of China (Grant No. 10575053 and No.
10605013) and Program for New Century Excellent Tal-
ents in University.
[1] A. K. Ekert, Phys. Rev. Lett. 67, 661 (1991). [2] C. H. Bennett, and S. J. Wiesner, Phys. Rev. Lett. 69,
2881 (1992); C. H. Bennett, G. Brassard, C. Crépeau, R.
Jozsa, A. Peres, and W. K. Wootters, Phys. Rev. Lett.
70, 1895 (1993).
[3] M. A. Nielsen and I. L. Chuang, Quantum Computation
and Quantum Information (Cambridge University Press,
2000).
[4] C. H. Bennett, D. P. DiVincenzo, J. A. Smolin, and W.
K. Wootters, Phys. Rev. A 54, 3824 (1996).
[5] L. H. Kauffman and S. J. Lomonaco Jr., New J. Phys. 6,
134 (2004); J. M. Franko, E. C. Rowell, and Z. Wang, J.
Knot Theory Ramifications 15, 413 (2006).
[6] Y. Zhang, L. H. Kauffman and M. L. Ge, Int. J. Quant.
Inform. 3, 669 (2005).
[7] A. Y. Kitaev, Annals Phys. 303, 2 (2003); e-print
quant-ph/9707021.
[8] Yang–Baxter Equations in Integrable Systems, edited by
M. Jimbo (World Scientific, Singapore, 1990).
[9] J. K. Slingerland, and F. A. Bais, Nucl. Phys. B 612, 229
(2001).
[10] G. Badurek, H. Rauch, A.Zeilinger, W. Bauspiess, and
U. Bonse, Phys.Rev. D 14, 1177 (1976); A. Zeilinger,
Physica B 137, 235 (1986).
[11] M. Żukowski, A. Zeilinger, M. A. Horne, and A. K. Ekert,
Phys. Rev. Lett. 71, 4287 (1993).
[12] A. Zeilinger, M. A. Horne, H. Weinfurter, and M.
Żukowski, Phys. Rev. Lett. 78, 3031 (1997).
[13] S. Bose, V. Vedral, and P. L. Knight, Phys. Rev. A 57,
822 (1998).
[14] J. W. Pan, D. Bouwmeester, H. Weinfurter, A. Zeilinger,
Phys. Rev. Lett. 80, 3891 (1998).
[15] R. Raussendorf and H. J. Briegel, Phys. Rev. Lett. 86,
5188 (2001).
[16] R. Prevedel, P. Walther, F. Tiefenbacher, P. Böhi, R.
Kaltenbaek, T. Jennewein, and A. Zeilinger, Nature 445,
65 (2007).
[17] W. K. Wootters, Phys. Rev. Lett. 80, 2245 (1998).
[18] Geometric Phases in Physics, edited by A. Shapere and
F. Wilczek (World Scientific, Singapore, 1989).
[19] S. Chaturvedi, M. S. Sriram, and V. Srinivasan, J. Phys.
A 20, L1091 (1987).
[20] E. Sjöqvist, A. K. Pati, A. Ekert, J. S. Anandan, M.
Ericsson, D. K. L. Oi, and V. Vedral, Phys. Rev. Lett.
85, 2845 (2000).
[21] J. Du, P. Zou, M. Shi, L. C. Kwek, J. W. Pan, C. H. Oh,
A. Ekert, D. K. L. Oi, and M. Ericsson, Phys. Rev. Lett.
91, 100403 (2003).
[22] M. Ericsson, D. Achilles, J. T. Barreiro, D. Branning, N.
A. Peters, and P. G. Kwiat, Phys. Rev. Lett. 94, 050401
(2005).
[23] K. Singh, D. M. Tong, K. Basu, J. L. Chen, and J. F.
Du, Phys. Rev. A 67, 032106 (2003).
http://arxiv.org/abs/quant-ph/9707021
ABSTRACT
  We show that braiding transformation is a natural approach to describe
quantum entanglement, by using the unitary braiding operators to realize
entanglement swapping and generate the GHZ states as well as the linear cluster
states. A Hamiltonian is constructed from the unitary
$\check{R}_{i,i+1}(\theta,\phi)$-matrix, where $\phi=\omega t$ is
time-dependent while $\theta$ is time-independent. This in turn allows us to
investigate the Berry phase in the entanglement space.

<|endoftext|><|startoftext|>
Introduction
A large number of baryon resonances has been estab-
lished experimentally [1]. Below a mass of 1.8GeV/c2,
most of these states are well reproduced by constituent
quark models [2,3,4]. The models differ in details of the
predicted mass spectrum but have a common feature: above
1.8GeV/c2, they predict many more states than have been
seen experimentally. A natural explanation for these miss-
ing resonances is that they have escaped detection. The
majority of known non-strange baryon resonances stems
from πN scattering experiments. Model calculations show
that for some of these missing resonances only a small
coupling to πN is expected [5]. In elastic scattering, the
coupling to πN enters in the entrance and exit channel so
Correspondence to: klempt@hiskp.uni-bonn.de
these resonances contribute only very weakly. By contrast,
these resonances are predicted to have normal photo cou-
plings [6] and some of them should be observed in channels
like Nη, KΛ, KΣ [7], ∆η or ∆ω [8]. In comparison to the
Nπ final state, most of the above provide a distinctive
advantage: they act as isospin filters; only N∗ resonances
contribute to the Nη and KΛ final states while resonances
in ∆η, and ∆ω belong to the ∆∗ series.
A partial-wave analysis of various photoproduction data
suggested the existence of several new resonances [9]. The
analysis included data from CB-ELSA on π0 and η photo-
production [10,11], Mainz-TAPS data on η photoproduc-
tion [12], beam-asymmetry measurements of π0 and η [13,
14,15], data on γp → nπ+ [16] and from the compilation
of the SAID database [17], and data on photoproduction
http://arxiv.org/abs/0704.0710v1
2 J. Junkersfeld et al.: Photoproduction of π0ω off protons
Fig. 1. Contributions to ∆ω photoproduction: left, produc-
tion of ∆∗ intermediate states; right: production of ω mesons
via t-channel pion exchange.
of γp → K+Λ and γp → K+Σ0 from SAPHIR [18,19],
CLAS [20,21], and LEPS [22].
The reaction γp → pω is known to receive large contri-
butions from t-channel exchange processes [23,24]. A sim-
ilar mechanism may contribute also to ∆ω photoproduc-
tion: the incoming photon may couple to ωπ0, the virtual
π0 excites the nucleon to a ∆ and the ω escapes, pref-
erentially in forward direction. Fig. 1 shows a Feynman
diagram for this reaction mechanism and for the produc-
tion of a ∆ resonance decaying into ∆ω.
This paper reports on a measurement of differential
and total cross sections for the reaction
γp → pπ0ω , (1)
with ω → π0γ and the π0 detected in its two photon decay.
From this data the total cross-section for
γp → ∆+ω (2)
with the subsequent decays
∆+ → pπ0 and ω → π0γ
was extracted and compared to an earlier measurement
at higher energies [25]. The low statistics for reactions (1)
and (2) does not yet provide a sufficiently large data sam-
ple for a partial-wave analysis, but may serve as a guide
for what to expect from future experiments and is thus of
exploratory character.
2 Experimental setup
The experiment was performed at the Electron Stretcher
Accelerator ELSA [26] at the University of Bonn. Elec-
trons were extracted at an energy of 3.2GeV and brems-
strahlung was produced in a radiator foil with a thickness
of 3/1000 of a radiation length (Fig. 2). Electrons deflected
in a magnet were detected with a tagging system cover-
ing the photon energy range from 750 to 2970MeV. The
tagging system consisted of 14 thick scintillation counters
and two proportional wire chambers with a total of 352
wires. The scintillation counters were used to derive a fast
timing signal and the wire chambers to determine the pho-
ton energy. The γ energy resolution varied from 30MeV
dipole magnet beam
radiator
e  beam
g beam
Crystal Barrel
scifi detector
target
quadrupole
H  liquifier2
6,6 m
Fig. 2. Setup of the CB-ELSA experiment
at the lower end to 0.5MeV at the upper end of the spec-
trum not taking into account the energy distribution of
the electron beam of 3−5MeV [26]. This is well matched
with the overall resolution of the detector for this reac-
tion of 30MeV (FWHM) (see below). Typical rates were
1 − 3 × 106 photons/s. The photon beam hit a liquid H2
target of 5.3 cm length and 3 cm diameter. The absolute
normalisation was derived from a comparison of our dif-
ferential angular distributions for the reaction γp → pπ0
with the SAID model SM02. The normalisation uncer-
tainty was estimated to be 15% [10,27].
Charged reaction products were detected by a three-
layer scintillating fibre detector covering polar angles from
15◦ to 165◦ [28]. The outer layer was parallel to the beam
axis, the fibres of the other two layers were bent ±25◦
with respect to the first layer to allow for a spatial re-
construction of hits. Photons and charged particles were
detected in the Crystal Barrel detector [29], a calorimeter
consisting of 1380 CsI(Tl) crystals with photodiode read-
out, covering 98% of 4π solid angle. The detector with its
high granularity and energy resolution is excellently suited
for the detection of multi-photon final states.
Electromagnetic showers typically extended over up to
30 crystals in the calorimeter. Photons were reconstructed
with an energy resolution of σE/E = 2.5%/
E[GeV] and
an angular resolution of σθ,φ ≈ 1.1
◦. Hits due to charged
particles induce smaller clusters with typical 3 – 6 crystals.
A fast first-level trigger signal was derived from a coin-
cidence between a hit in the tagging system and a signal in
at least two out of three layers of the inner fibre detector.
The second-level trigger required a minimum number of
hits in the calorimeter. For part of the data the minimal
number of hits in the calorimeter was 2, otherwise at least
3 hits were requested, in order to reduce the dead-time.
Dead-time losses were below 70%, and below 20% for the
more restrictive trigger.
A more detailed description of the experimental setup
and the event reconstruction can be found in [27].
3 The reaction γp → pπ0ω
3.1 Event selection
The reaction γp → pπ0ω, ω → π0γ, leads to a final state
with five photons and a proton. The π0ω photoproduction
threshold is at Eγ = 1365MeV; a cut on a tagged photon
energyEγ > 1315MeV was applied right at the beginning.
J. Junkersfeld et al.: Photoproduction of π0ω off protons 3
 [MeV]
γ  0π
500 600 700 800 900 1000 1100 1200
 0.9 MeV± = 783.8  ωm
  83± = 2017  ωN
 [MeV]
γ  0π
500 600 700 800 900 1000 1100 1200
Fig. 3. ω signal in π0γ invariant mass
The first step in the analysis is the identification of the
five photons and the reconstruction of their energies and
directions. Protons with Ekin below ∼ 95MeV only pro-
duce a signal in the inner detector but not in the calorime-
ter. Hence in the analysis, events were selected with 5 or 6
hits in the Crystal Barrel calorimeter and 1 – 3 hits in the
inner detector. (A three–hit pattern can arise from three
single hits in each layer not crossing in a single point.) At
least two layers of the inner detector had to have a sig-
nal. For each pair of fibre and barrel hits it was tested if
the two vectors pointing from the target centre to a fibre–
detector-hit and to a Crystal-Barrel-hit form an angle of
20◦ or less; in this case the Crystal Barrel hit was identi-
fied as a proton, otherwise as a photon. The 20◦ matching
angle was chosen to allow for the extension of the target
and the uncertainties in the measurement. Events with
five photons were kept for further analysis. In case of 6
hits in the barrel, one of them had to match the proton
identification.
Surviving events were kinematically fitted to the hy-
pothesis γp → pmiss π
0π0γ with a missing proton, neglect-
ing identified charged hits and using all remaining photon
candidates. The kinematic fit assumed that the reaction
took place in the target centre. Since the momentum of
the proton is unknown and needs to be reconstructed, en-
ergy and momentum conservation give one constraint, the
π0 masses two constraints. A cut on a confidence level of
2% was applied, optimised to lose only few good events.
From the fit, the flight direction of the proton was deter-
mined and compared to hits in the inner detector. Again,
the direction of the missing proton and the direction to
a hit in the inner fibre detector had to form an angle of
±20◦ or less for the hit to be identified as proton.
The pπ0π0γ events were used to identify pπ0ω events
with ω decaying into π0γ. Fig. 3 shows the π0γ mass dis-
tribution with two entries per event. The fit using a Voigt
function (a Breit-Wigner convoluted with a Gaussian) im-
posing the ω width of Γ = 8.49MeV/c2 assigns about
2000 events to reaction (1). The ω mass was determined
to (783.8± 0.9stat± 1.0syst)MeV/c
2. The systematic error
was estimated from the comparison of η, η′, and ω masses
in different reactions with the PDG values. The mass res-
olution is determined to σ = 16MeV/c2.
 [MeV]γE
1500 2000 2500 3000
ω 0 π
 MCω 0πp 
 [MeV]γE
1500 2000 2500 3000
ω 0 π
0.005
0.015
 MC0π 0π 0πp 
Fig. 4. Acceptance of pπ0ω events (left) and the misidentifi-
cation probability of p 3π0 events (right).
) [MeV]γ0πm(
600 800 1000 12000
  26± = 253  ωN
600 800 1000 12000
 < 2382 MeVγ2256 < E
) [MeV]γ0πm(
600 800 1000 12000
  30± = 364  ωN
600 800 1000 12000
 < 2677 MeVγ2495 < E
Fig. 5. The ω signal with background. The background is
predicted in height and shape by simulations of p 3π0 (dark-
grey) and pπ0ω combinatorial background (light-grey).
The background in the π0γ distribution of pπ0π0γ
events has two main sources. A large fraction stems from
p 3π0 events. Fig. 4 shows that, in the energy region 2000 <
Eγ < 3000MeV, p 3π
0 events have a high probability to be
misidentified as pπ0ω. The misidentification probability is
only one order of magnitude smaller than the acceptance
for pπ0ω. However, the branching ratio of p 3π0 → p 6γ is
(96.44±0.09)% compared with (8.71±0.25)% for pπ0ω →
p 5γ. The cross-section for 3π0 photoproduction was esti-
mated using the cross-section for γp → pη [11], which
was determined from events with the η decaying into γγ
and 3 π0. The fractions of η and non-η events in the 3π0
event samples were determined and used to estimate the
cross-section of γp → p 3π0. Monte Carlo simulations were
performed using the p 3π0 cross-section estimate to deter-
mine the expected number of p 3π0 events surviving the
pπ0π0γ reconstruction. Fig. 5 shows for two photon en-
ergy ranges the predicted contribution of pπ0π0π0 events
to the background and the observed π0γ distribution.
The expected combinatorial background was determined
from the number of reconstructed pπ0ω events. It is shown
together with the p 3π0 part of the background in Fig. 5.
In the ω mass region, there is good agreement between the
simulated background distribution and the observed back-
ground. The study of simulated pπ0π0 and pπ0η events
shows misidentification probabilities of the order of 0.1%.
Their contributions were neglected.
The number of events due to reaction (1) in a given
energy range was determined by fitting the π0γ distribu-
tion using a Voigt function for the ω signal and a sec-
ond order polynomial for the background. The fit also re-
turned the number of background events below the peak.
For background subtraction, data histograms were filled
with events within the ω mass region (mω ± 40MeV/c
and background histograms with events falling into the
upper or lower sidebands (687 − 727MeV/c2 and 839 −
4 J. Junkersfeld et al.: Photoproduction of π0ω off protons
879MeV/c2, also shown in Fig. 3). The latter histograms
were scaled to contain the same number of events as found
in the background below the peak. For each energy and
angular region, the sideband histograms were subtracted
from the data histograms to extract the pπ0ω distribu-
tions. The same procedure was used to determine the pπ0ω
distributions for each energy region as function of the mo-
mentum transfer and the invariant mass respectively.
The acceptance was studied with a GEANT-based Mon-
te Carlo simulation using phase space distributed pπ0ω
and ∆+ω events. In the first iteration only pπ0ω Monte-
Carlo events were taken into account. cross-sections for
γp → pπ0ω and γp → ∆+ω were thus obtained, as will
be described in sections 3.3 and 3.4, and used to produce
Monte Carlo events with a realistic mixture of pπ0ω and
∆+ω events. This provides a more realistic acceptance
simulation. Stable results were achieved in the second iter-
ation. The simulated acceptance was different when only
events due to phase-space distributed pπ0ω events or ∆+ω
events were used for the simulation. The difference in the
acceptance was taken as a contribution to the systematic
error.
3.2 Differential cross-sections
The differential cross-sections were obtained from the side-
band subtracted histograms. We give in the centre of mass
system cross-sections differential in cos θω, cos θπ0 and |t−
tmin|,
dσ/dΩ(cos θω), dσ/dΩ(cos θπ0), dσ/dt (|t− tmin|),
respectively. Here t is the squared four-momentum trans-
fer from the photon beam to the pπ0 system given by
t = q2 = (pγ − pω)
and tmin is the minimal momentum transfer imposed by
kinematics.
Fig. 6 presents the differential cross-sections as a func-
tion of cos θω, in table they are given in numerical form.
The distributions are compatible with a description of the
(x) = a0 + a1 · e
a2x with x = cos θ. (4)
In the backward direction, the acceptance is small and the
errors large. The fit using Eq. (4) took into account only
data for which the acceptance ǫ was above 5% (thus re-
stricting the fit range to cos θω > −0.6 forEγ < 1800MeV,
and to cos θω > −0.8 for Eγ > 1800MeV), and was then
extrapolated to cover the full cos θω range. In the forward
direction, there is a strong increase in intensity, in partic-
ular at energies above 2GeV. Production of ω mesons via
t-channel exchange with simultaneous p → ∆(1232) ex-
citation seems to play an important role in the dynamics
of reaction (1).
From the cos θω distributions a total cross-section was
determined by summing over the measured values for which
the acceptance was above 5% and using extrapolated val-
ues in the remaining range.
0.8  < 1817 MeVγ1383 < E
0.8  < 2256 MeVγ2020 < E
 < 2495 MeVγ2382 < E
-1 -0.5 0 0.5 10
1.5  < 2845 MeVγ2677 < E
 < 2020 MeVγ1817 < E
 < 2382 MeVγ2256 < E
 < 2677 MeVγ2495 < E
ωθcos  
-0.5 0 0.5 1
 < 2970 MeVγ2845 < E
Fig. 6. Differential cross-sections dσ/dΩ(cos θω).
The differential cross-sections dσ/dΩ(cos θπ0) are shown
in Fig. 7 and listed numerically in 2. There are no ob-
vious structures beside some fluctuations in the forward
and backward regions. The data were fitted using a con-
stant. The fit was restricted to data points measured with
an acceptance of at least 5%, thus excluding for Eγ >
2380MeV the points with cos θπ0 > 0.8. From this dis-
tribution the total cross-section is derived from the data
points and the extrapolation was used for the points with
small acceptance.
Fig. 8 shows the differential cross-sections dσ/dt in
dependence of |t − tmin|, which are compatible with an
exponential behaviour in the low t region. This is charac-
teristic for production via t-channel exchange. The data
were fitted in the region below 0.8 · |tmax − tmin| (approx-
imately corresponding to ǫ > 5%) using
(|t− tmin|) = e
a+b|t−tmin| + c(E) (5)
where tmin is the minimum squared momentum transfer
imposed by kinematics. The non-t-dependent contribution
was described with a function c(E) = c0 + c1 · Eγ . The
parameters c0 and c1 were determined in a combined fit
of the differential cross sections.
The slope parameter b is shown in Fig. 9. The slope
is approximately constant over the covered energy range.
This indicates a strong contribution from ω production
via t-channel exchange processes.
J. Junkersfeld et al.: Photoproduction of π0ω off protons 5
Table 1. Differential cross-sections dσ/dΩ(cos θω). There is a common systematic error of 16%.
cos θω dσ/dΩ(cos θω) dσ/dΩ(cos θω) dσ/dΩ(cos θω) dσ/dΩ(cos θω)
[µb/sr] [µb/sr] [µb/sr] [µb/sr]
Eγ [MeV] 1383 - 1817 1817 - 2020 2020 - 2256 2256 - 2382
−1.00 −−0.80 0.16 ± 0.08 0.00 ± 0.13 0.29 ± 0.14 0.35 ± 0.23
−0.80 −−0.60 0.08 ± 0.06 0.05 ± 0.07 0.18 ± 0.09 0.23 ± 0.13
−0.60 −−0.40 0.04 ± 0.04 0.12 ± 0.06 0.15 ± 0.06 0.33 ± 0.12
−0.40 −−0.20 0.02 ± 0.03 0.03 ± 0.04 0.05 ± 0.04 0.22 ± 0.08
−0.20− 0.00 0.03 ± 0.02 0.17 ± 0.04 0.16 ± 0.05 0.10 ± 0.08
0.00 − 0.20 −0.02± 0.02 0.09 ± 0.05 0.17 ± 0.05 0.31 ± 0.09
0.20 − 0.40 0.06 ± 0.03 0.13 ± 0.04 0.22 ± 0.06 0.30 ± 0.08
0.40 − 0.60 0.03 ± 0.03 0.19 ± 0.06 0.33 ± 0.06 0.39 ± 0.11
0.60 − 0.80 0.04 ± 0.03 0.32 ± 0.06 0.46 ± 0.08 0.71 ± 0.13
0.80 − 1.00 0.11 ± 0.04 0.32 ± 0.07 0.75 ± 0.10 0.65 ± 0.17
Eγ [MeV] 2382 - 2495 2495 - 2677 2677 - 2845 2845 - 2970
−1.00 −−0.80 −0.10± 0.25 0.27 ± 0.18 0.70 ± 0.28 0.47 ± 0.23
−0.80 −−0.60 0.27 ± 0.14 0.44 ± 0.13 0.37 ± 0.16 0.42 ± 0.16
−0.60 −−0.40 0.44 ± 0.13 0.27 ± 0.09 0.15 ± 0.10 0.13 ± 0.12
−0.40 −−0.20 0.21 ± 0.10 0.17 ± 0.08 0.30 ± 0.09 0.25 ± 0.11
−0.20− 0.00 0.17 ± 0.09 0.09 ± 0.07 0.14 ± 0.09 0.25 ± 0.10
0.00 − 0.20 0.25 ± 0.09 0.28 ± 0.08 0.22 ± 0.08 0.17 ± 0.11
0.20 − 0.40 0.48 ± 0.11 0.26 ± 0.08 0.42 ± 0.10 0.32 ± 0.12
0.40 − 0.60 0.45 ± 0.13 0.42 ± 0.10 0.42 ± 0.11 0.51 ± 0.15
0.60 − 0.80 1.02 ± 0.18 0.57 ± 0.13 0.77 ± 0.18 0.99 ± 0.24
0.80 − 1.00 1.76 ± 0.27 1.23 ± 0.20 1.86 ± 0.28 1.79 ± 0.33
Table 2. Differential cross-sections dσ/dΩ(cos θ
). There is a common systematic error of 16%.
cos θ
dσ/dΩ(cos θ
) dσ/dΩ(cos θ
) dσ/dΩ(cos θ
) dσ/dΩ(cos θ
[µb/sr] [µb/sr] [µb/sr] [µb/sr]
Eγ [MeV] 1383 - 1817 1817 - 2020 2020 - 2256 2256 - 2382
−1.00−−0.80 0.05± 0.03 0.23 ± 0.07 0.30 ± 0.07 0.40± 0.11
−0.80−−0.60 0.05± 0.03 0.21 ± 0.05 0.32 ± 0.06 0.27± 0.09
−0.60−−0.40 0.07± 0.03 0.09 ± 0.05 0.29 ± 0.06 0.40± 0.11
−0.40−−0.20 0.04± 0.03 0.08 ± 0.05 0.27 ± 0.07 0.30± 0.11
−0.20− 0.00 −0.04± 0.03 0.17 ± 0.05 0.23 ± 0.06 0.42± 0.11
0.00 − 0.20 0.06± 0.03 0.17 ± 0.05 0.18 ± 0.06 0.17± 0.10
0.20 − 0.40 0.03± 0.03 0.11 ± 0.05 0.30 ± 0.06 0.40± 0.12
0.40 − 0.60 0.03± 0.03 0.16 ± 0.06 0.27 ± 0.07 0.32± 0.10
0.60 − 0.80 0.07± 0.04 0.18 ± 0.06 0.24 ± 0.07 0.33± 0.11
0.80 − 1.00 0.04± 0.04 0.18 ± 0.08 0.18 ± 0.10 0.64± 0.17
Eγ [MeV] 2382 - 2495 2495 - 2677 2677 - 2845 2845 - 2970
−1.00−−0.80 0.59± 0.14 0.67 ± 0.11 0.67 ± 0.14 0.61± 0.17
−0.80−−0.60 0.54± 0.12 0.29 ± 0.10 0.37 ± 0.10 0.50± 0.14
−0.60−−0.40 0.57± 0.13 0.37 ± 0.10 0.26 ± 0.11 0.32± 0.12
−0.40−−0.20 0.47± 0.13 0.31 ± 0.10 0.34 ± 0.12 0.23± 0.14
−0.20− 0.00 0.58± 0.13 0.46 ± 0.10 0.64 ± 0.12 0.43± 0.13
0.00 − 0.20 0.22± 0.12 0.12 ± 0.09 0.42 ± 0.12 0.49± 0.13
0.20 − 0.40 0.65± 0.14 0.39 ± 0.10 0.61 ± 0.12 0.60± 0.14
0.40 − 0.60 0.28± 0.12 0.29 ± 0.10 0.37 ± 0.13 0.58± 0.15
0.60 − 0.80 0.39± 0.15 0.39 ± 0.11 0.24 ± 0.12 0.09± 0.16
0.80 − 1.00 0.56± 0.27 0.15 ± 0.21 0.26 ± 0.16 0.35± 0.26
6 J. Junkersfeld et al.: Photoproduction of π0ω off protons
From these differential distributions the total cross-
section was obtained by integrating over function (5) from
tmin to tmax.
3.3 Total cross-section
The total cross-section was determined in three different
ways, by extrapolation and summation of the three types
of differential cross sections, dσ/dΩ(cos θω), dσ/dΩ(cos θπ0),
and dσ/dt (|t−tmin|) as described above. Statistical errors
of the total cross-sections were determined by error prop-
agation. As final result, the mean value of the total cross-
section and the mean statistical error are shown in Fig. 10
(left) as a function of the photon energy. The cross-section
rises with increasing photon energy, i. e. with the available
phase space.
A systematic uncertainty was derived from the spread
of the three different determinations of the total cross-
section, using data of Fig. 6, 7 and 8. A further error
of 5.7% was assigned to the Monte Carlo reconstruction
efficiency [30]. These contributions and the 15% normal-
isation error [27] were added in quadrature to yield the
total systematic error shown in Fig. 10.
3.4 The ∆+ω contribution to pπ0ω
Fig. 11 shows the differential cross-sections dσ/dm (pπ0),
which were used to disentangle the ∆+ω contribution to
 < 1817 MeVγ1383 < E
 < 2256 MeVγ2020 < E
1  < 2495 MeVγ2382 < E
-1 -0.5 0 0.5 10
1  < 2845 MeVγ2677 < E
 < 2020 MeVγ1817 < E
 < 2382 MeVγ2256 < E
 < 2677 MeVγ2495 < E
0πθcos  
-0.5 0 0.5 1
 < 2970 MeVγ2845 < E
Fig. 7. Differential cross-sections dσ/dΩ(cos θ
d -110
10  < 1817 MeVγ1383 < E
10  < 2256 MeVγ2020 < E
10  < 2495 MeVγ2382 < E
0 1 2
10  < 2845 MeVγ2677 < E
 < 2020 MeVγ1817 < E
 < 2382 MeVγ2256 < E
 < 2677 MeVγ2495 < E
]2| [(GeV/c)
|t - t
 < 2970 MeVγ2845 < E
Fig. 8. Differential cross-sections dσ/dt (|t − tmin|) of the
squared four-momentum transfer t to the pπ0 system.
 [MeV]γE
1500 2000 2500 3000
Fig. 9. Slope parameter of dσ/dt (|t− tmin|)
the total cross-section. The distributions show prominent
∆ signals. The ∆ peak was fitted by a phase space cor-
rected Breit-Wigner function (see e. g. [31] for details).
The non-resonant pπ0ω part was described by phase-space
distributed pπ0ω Monte Carlo events. Only the ampli-
tudes of the two contributions were left free in the fit.
The Breit-Wigner width of the ∆ was fixed to 120MeV/c2
the mass was fixed to 1232MeV/c2 for energies below
2500MeV and set to values between 1240 and 1250MeV/c2
for higher energies to improve the fit. With these two com-
ponents a good description of the ∆ peak and of the pπ0ω
J. Junkersfeld et al.: Photoproduction of π0ω off protons 7
phase space contribution to the differential cross-section
was achieved.
The Breit-Wigner distributions and the pπ0ω phase-
space contributions were integrated and their fractions de-
termined. The systematic uncertainty due to the disentan-
glement was estimated to 3− 10% and added in quadra-
ture to the systematic error. The cross-section for γp →
pπ0ω without ∆+ω contributions is shown in Fig. 10 (right).
The total cross-section of γp → ∆+ω was determined
from the observed fraction of ∆+ω events and the γp →
pπ0ω cross-section, taking into account the unseen ∆+ →
nπ+ decay mode. The resulting cross-section is shown in
Fig. 12 together with the results of the LAMP2 exper-
iment [25]. It is worthwhile to discuss how the LAMP2
cross-section was determined.
The LAMP2 experiment measured the reaction γp →
∆+ω by identifying ω mesons in their π+π−π0 decays. The
∆+ decay products were not observed. Instead, the ∆+
was identified in the missing mass spectrum of the γp →
ωX reaction. The missing mass distribution (Fig. 13) con-
tains signals for pω and ∆+ω production. The authors
give a 15% systematic uncertainty due to the difficulty in
disentangling the pω, pπ0ω and ∆+ω contributions.
In our analysis the fraction of pπ0 below the ∆ is sig-
nificant (see fig. 11) and larger than estimated by LAMP2.
Hence it seems possible that the LAMP2 cross-section is
overestimated.
The total cross-section for ∆+ω photoproduction in
Fig. 12 (see table 3) is consistent with a simple fit as-
suming a background amplitude in the form A · (E −
Ethreshold)
α/2 · (E − Eh)
β/2 (A, α, β, Ethreshold, Eh fit
parameters). The χ2 = 12.1 for NDoF = 6 corresponds to
an acceptable 6% probability.
4 Summary
We have studied the reaction γp → pπ0ω with ω → π0γ
from the ωπ0 production threshold up to 3GeV photon
energy using an unpolarised tagged photon beam and a
liquid hydrogen target. Differential cross-sections were de-
termined as functions of cos θω, cos θπ0 and |t− tmin|. The
distributions reveal strong contributions from isovector
 [MeV]γE
1500 2000 2500 3000
ω 0 π
 [MeV]γE
1500 2000 2500 3000
Fig. 10. Total cross-sections σ(γp → p π0ω) before (left) and
after (right) subtraction of the ∆+ω contribution. The grey
band represents the systematic uncertainty.
0.005
0.015
0.025
 < 1817 MeVγ1383 < E
0.04  < 2256 MeVγ2020 < E
0.04  < 2495 MeVγ2382 < E
1200 1400 1600 18000
0.04  < 2845 MeVγ2677 < E
 < 2020 MeVγ1817 < E
 < 2382 MeVγ2256 < E
 < 2677 MeVγ2495 < E
 [MeV]0πpm
1200 1400 1600 1800
 < 2970 MeVγ2845 < E
Fig. 11. Differential cross-sections dσ/dm (pπ0). They are fit-
ted with a combination of a Breit-Wigner (blue) and phase
space pπ0ω Monte Carlo events (red).
 [MeV]γE
2000 3000 4000 5000
 [MeV]γE
2000 3000 4000 5000
CB-ELSA
LAMP2
Fig. 12. Total cross-section σ(γp → ∆+ω), shown are the
data from this analysis and from the LAMP2 experiment [25].
The systematic errors are shown as an error band in light (CB-
ELSA) and dark grey (LAMP2). A fit to the data points is
shown, which is described in the text.
8 J. Junkersfeld et al.: Photoproduction of π0ω off protons
Fig. 13. Missing mass of the ω in the LAMP2 experiment
(from [25]). Note that the spectrum is dominated by pω events.
Table 3. Total cross-sections of γp → p π0ω (with and with-
out ∆+ω contribution) and γp → ∆+ω. The systematic error
is shown in Fig. 10 and 12.
Eγ σ(γp → pπ
0ω) σ(γp → pπ0ω) σ(γp → ∆+ω)
[MeV] [µb] (no ∆+ω) [µb] [µb]
1383− 1817 0.49 ± 0.13 0.35 ± 0.13 0.21 ± 0.17
1817− 2020 1.95 ± 0.24 0.93 ± 0.21 1.54 ± 0.35
2020− 2256 3.28 ± 0.28 1.46 ± 0.28 2.74 ± 0.44
2256− 2382 4.47 ± 0.47 2.44 ± 0.50 3.04 ± 0.63
2382− 2495 6.31 ± 0.56 3.58 ± 0.61 4.10 ± 0.76
2495− 2677 4.80 ± 0.43 3.49 ± 0.50 1.97 ± 0.50
2677− 2845 5.87 ± 0.52 3.92 ± 0.62 2.92 ± 0.63
2845− 2970 5.93 ± 0.63 4.18 ± 0.77 2.62 ± 0.74
exchange currents from the photon – converting to an
ω meson – to the proton which undergoes a p-∆ excita-
tion. The cross section for ∆ω production and for non-∆ω
events rises with increasing phase space; LAMP2 data in-
dicate a decrease of the cross-sections when going to larger
photon energies (3 - 5GeV).
We thank the technical staff at ELSA and at all the partici-
pating institutions for their invaluable contributions to the suc-
cess of the experiment. We acknowledge financial support from
the Deutsche Forschungsgemeinschaft (DFG) within the Son-
derforschungsbereich SFB/TR16. The collaboration with St.
Petersburg received funds from DFG and the Russian Founda-
tion for Basic Research. B. Krusche acknowledges support from
Schweizerischer Nationalfond. U. Thoma thanks for an Emmy
Noether grant from the DFG. A.V. Anisovich and A.V. Sarant-
sev acknowledge support from the Alexander von Humboldt
Foundation. This work comprises part of the PhD thesis of J.
Junkersfeld.
References
1. W.M. Yao et al., J. Phys. G 33, 1 (2006).
2. S. Capstick and N. Isgur, Phys. Rev. D 34 (1986) 2809.
S. Capstick and W. Roberts, Prog. Part. Nucl. Phys. 45
(2000) S241.
3. L. Y. Glozman, W. Plessas, K. Varga, and R. F. Wagen-
brunn, Phys. Rev. D 58 (1998) 094030.
4. U. Löring, K. Kretzschmar, B. C. Metsch, and H. R. Petry,
Eur. Phys. J. A 10 (2001) 309.
U. Löring, B. C. Metsch, and H. R. Petry, Eur. Phys. J. A
10 (2001) 395, 447
5. S. Capstick and W. Roberts, Phys. Rev. D 49 (1994) 4570.
6. S. Capstick, Phys. Rev. D 46 (1992) 2864.
7. S. Capstick and W. Roberts, Phys. Rev. D 58 (1998)
074011.
8. S. Capstick and W. Roberts, Phys. Rev. D 57 (1998) 4301.
9. A. V. Anisovich, A. Sarantsev, O. Bartholomy, E. Klempt,
V. A. Nikonov and U. Thoma, Eur. Phys. J. A 25 (2005)
A. Sarantsev, V. A. Nikonov, A. V. Anisovich, E. Klempt
and U. Thoma, Eur. Phys. J. A 25 (2005) 441.
10. O. Bartholomy et al., Phys. Rev. Lett. 94 (2005) 012003.
11. V. Crede et al., Phys. Rev. Lett. 94 (2005) 012004.
12. B. Krusche et al., Phys. Rev. Lett. 74 (1995) 3736.
13. O. Bartalini et al., Eur. Phys. J. A 26 (2005) 399.
14. A.A. Belyaev et al., Nucl. Phys. B 213 (1983) 201.
R. Beck et al., Phys. Rev. Lett. 78 (1997) 606.
D. Rebreyend et al., Nucl. Phys. A 663 (2000) 436.
15. J. Ajaka et al., Phys. Rev. Lett. 81 (1998) 1797.
16. K.H. Althoff et al., Z. Phys. C 18 (1983) 199.
E. J. Durwen, BONN-IR-80-7 (1980).
K. Buechler et al., Nucl. Phys. A 570 (1994) 580.
17. R.A. Arndt et al., W. J. Briscoe, I. I. Strakovsky, and
R. L.Workman, http://gwdac.phys.gwu.edu.
18. K. H. Glander et al., Eur. Phys. J. A 19 (2004) 251.
19. R. Lawall et al., Eur. Phys. J. A 24 (2005) 275.
20. J. W. C. McNabb et al., Phys. Rev. C 69 (2004) 042201.
21. B. Carnahan, UMI-31-09682 (microfiche), Ph.D. thesis
(2003), CUA, Washington, D.C., see also R. Bradford et
al., Phys. Rev. C 73 (2006) 035202
22. R. G. T. Zegers et al., Phys. Rev. Lett. 91 (2003) 092001.
23. J. Barth et al., Eur. Phys. J. A 18 (2003) 117.
24. J. Ajaka et al., Phys. Rev. Lett. 96 (2006) 132003.
25. D. P. Barber et al.,Z. Phys. C 26 (1984) 343.
26. W. Hillert, Eur. Phys. J. A 28S1 (2006) 139.
27. H. van Pee et al., Eur. Phys. J. A 31 61 (2007).
28. G. Suft et al., Nucl. Instrum. Meth. A 538 (2005) 416.
29. E. Aker et al., Nucl. Instrum. Meth. A 321 (1992) 69.
30. C. Amsler et al., Z. Phys. C 58 (1993) 175.
31. A. V. Anisovich, A. Sarantsev, O. Bartholomy, E. Klempt,
V. A. Nikonov and U. Thoma, Eur. Phys. J. A 25 (2005)
http://gwdac.phys.gwu.edu
	Introduction
	Experimental setup
	The reaction pp0
	Summary
ABSTRACT
  Differential and total cross-sections for photoproduction of gamma proton to
proton pi0 omega and gamma proton to Delta+ omega were determined from
measurements of the CB-ELSA experiment, performed at the electron accelerator
ELSA in Bonn. The measurements covered the photon energy range from the
production threshold up to 3GeV.

<|endoftext|><|startoftext|>
arXiv:0704.0711v1  [nucl-th]  5 Apr 2007
Two-pion exchange three-nucleon potential:
O(q4) chiral expansion
S. Ishikawa1 and M. R. Robilotta2
1Department of Physics, Science Research Center,
Hosei University, 2-17-1 Fujimi, Chiyoda, Tokyo 102-8160, Japan
2Instituto de F́ısica, Universidade de São Paulo,
C.P. 66318, 05315-970, São Paulo, SP, Brazil
(Dated: November 4, 2018)
Abstract
We present the expansion of the two-pion exchange three-nucleon potential (TPE-3NP) to chiral
order q4, which corresponds to a subset of all possibilities at this order and is based on the πN
amplitude at O(q3). Results encompass both numerical corrections to strength coefficients of
previous O(q3) terms and new structures in the profile functions. The former are typically smaller
than 10% whereas the latter arise from either loop functions or non-local gradients acting on the
wave function. The influence of the new TPE-3NP over static and scattering three-body observables
has been assessed and found to be small, as expected from perturbative corrections.
PACS numbers: 13.75.Cs, 21.30.Fe, 13.75.Gx, 12.39.Fe
http://arxiv.org/abs/0704.0711v1
I. INTRODUCTION
The research programme for nuclear forces, outlined more than fifty years ago by Taketani,
Nakamura, and Sasaki [1], treats pions and nucleons as basic degrees of freedom. This insight
proved to be very fruitful. On the one hand, it implies the interconnection of all nuclear
processes, both among themselves and with a class of free reactions. On the other, it
determines a close relationship between the number of pions involved in a given interaction
and its range. As a consequence, the outer components of nuclear forces are dominated by
just a few basic subamplitudes, describing either single (N → πN) or multipion (ππ → ππ,
πN → πN , πN → ππN , ...) interactions.
Nevertheless, it took a long time for a theoretical tool to be available which allows the
precise treatment of these amplitudes. Nowadays, owing to the development of chiral per-
turbation theory (ChPT) in association with effective lagrangians [2, 3], the roles of pions
and nucleons in nuclear forces can be described consistently. The rationale for this approach
is that the quarks u and d, which have small masses, dominate low-energy interactions.
One then works with a two-flavor version of QCD and treats their masses as perturbations
in a chiral symmetric lagrangian. The systematic inclusion of quark mass contributions is
performed by means of chiral perturbation theory, which incorporates low-energy features
of QCD into the nuclear force problem. In performing perturbative expansions, one uses a
typical scale q, set by either pion four-momenta or nucleon three-momenta, such that q ≪ 1
Nuclear forces are dominated by two-body (NN) interactions and leading contributions
are due to the one-pion exchange potential (OPEP), which begins [4] at O(q0). The two-
pion exchange potential (TPEP) begins at O(q2) and, at present, there are two independent
expansions up to O(q4) in the literature, based on either heavy baryon [5] or covariant [6, 7]
ChPT. The TPEP is closely related with the off-shell πN amplitude and, at this order,
two-loop diagrams involving intermediate ππ scattering already begin to contribute.
In proper three-nucleon (3N) interactions, the leading term is due to the process known
as TPE-3NP, in which the pion emitted by a nucleon is scattered before being absorbed
by another one. It has been available since long [8–10], involves only tree-level interactions
and has the longest possible range. This contribution begins at O(q3) and consistency with
available NN forces demands the extension of the chiral series for the 3NP up to O(q4).
However, the implementation of this programme is not straightforward, since it requires
the evaluation of a rather large number of diagrams. With the purpose of exploring the
magnitude of O(q4) effects, in this work we concentrate on the particular subset of processes
which still belong to the TPE-3NP class. Our presentation is divided as follows. In section
II we display the general relationship between the TPE-3NP and the πN amplitude, in order
to discuss how it affects chiral power counting in the former. The πN amplitude relevant for
the O(q4) potential is derived in section III and used to construct the three-body interaction
in section IV. We concentrate on numerical changes induced into both potential parameters
and observables in sections V and VI, whereas conclusions are presented in section VII.
There are also four appendices, dealing with kinematics, πN subthreshold coefficients, loop
integrals and non-local terms.
II. GENERAL FORMULATION
Potentials to be used into non-relativistic equations can be derived from field the-
ory by means of the T -matrix. In the case of three-nucleon potentials, one starts
from the non-relativistic transition matrix describing the process N(p1) N(p2) N(p3) →
N(p′1) N(p
2) N(p
3), which includes both kernels and their iterations. The former corre-
spond to proper interactions, represented by diagrams which cannot be split into two pieces
by cutting positive-energy nucleon lines only, whereas the latter are automatically gener-
ated by the dynamical equation. Therefore, just the kernels, denoted collectively by t̄3, are
included into the potential.
The transformation of a T -matrix into a potential depends on both the dynamical equa-
tion adopted and conventions associated with off-shell effects. The latter were discussed
in a comprehensive paper by Friar [11]. Here we use the kinematical variables defined in
Appendix A and relate t̄3 to the momentum space potential operator Ŵ by writing [12]
〈p′1,p′2,p′3 |Ŵ |p1,p2,p3〉 = −(2π)3 δ3(P ′−P ) t̄3(p′1,p′2,p′3,p1,p2,p3) . (1)
In configuration space, internal dynamics is described by the function
W (r′,ρ′; r,ρ) = −
(2π)3
(2π)3
(2π)3
(2π)3
×ei[Qr·(r′−r)+Qρ·(ρ′−ρ)+qr ·(r′+r)/2+qρ·(ρ′+ρ)/2] t̄3(Qr,Qρ, qr, qρ) , (2)
which is to be used in a non-local version of the Schrödinger equation:
ρ′ − ǫ
ψ(r′,ρ′) = −
dr dρ W (r′,ρ′; r,ρ) ψ(r,ρ) . (3)
Non-local effects are associated with the variablesQr andQρ. When these effects are not too
strong, they can be represented by gradients acting on the wave function and the potential
W is rewritten as
W (r′,ρ′; r,ρ) = δ3(r′−r) δ3(ρ′−ρ)
V (r,ρ) . (4)
The two-pion exchange three-nucleon potential is represented in Fig. 1a. It is closely related
with the πN scattering amplitude, which is O(q) for free pions and becomes O(q2) within
the three-nucleon system. As a consequence, the TPE-3NP begins at O(q3) and, at this
order, it also receives contributions from interactions (c) and (d), which have shorter range.
The extension of the chiral series to O(q4) requires both the inclusion of single loop effects
into processes that already contribute at O(q3) and the evaluation of many new amplitudes,
especially those associated with diagram (b).
(a) (b) (c) (d)
FIG. 1: (Color online) Classes of three-nucleon forces, where full and dashed lines represent nucleons
and pions respectively; diagram (a) corresponds to the TPE-3NP.
In this paper we concentrate on the particular set of processes which belong to the TPE-
3NP class, represented by the T -matrix Tππ and evaluated using the kinematical conditions
given in Fig. 2. The coupling of a pion to nucleon i = (1, 2) is derived from the usual lowest
order pseudo-vector lagrangian L(1) and the Dirac equation yields the equivalent forms for
the vertex
(gA/2fπ) [τ ū (p
′−p) γ5 u](i) = (mgA/fπ) [τ ū γ5 u](i) , (5)
where gA, fπ and m represent, respectively, the axial nucleon decay, the pion decay and the
nucleon mass.
k´, b
   k , a
FIG. 2: (Color online) Two-pion exchange three-nucleon potential.
The amplitude for the intermediate process πa(k)N(p) → πb(k′)N(p′) has the isospin
structure
Tba = δab T
+ + iǫbacτc T
− (6)
and Fig. 2 yields
Tππ =−
[ū γ5 u]
[ū γ5 u]
(2) 1
k2−µ2
′2−µ2
(1) ·τ (2) T+ − i τ (1)×τ (2) ·τ (3) T−
, (7)
µ being the pion mass. Results in Appendix A show that [ū γ5 u]
(i) → O(q), whereas pion
propagators are O(q−2). As a consequence, in the O(q4) expansion of the potential one
needs Tππ to O(q) and T± to O(q3). For on-shell nucleons, the sub amplitudes T± can be
written as
T± = ū(p′)
D± − i
σµν(p
′−p)µKν B±
u(p) , (8)
with K = (k′+k)/2. The dynamical content of the πN interaction is carried by the functions
D± and B± and their main properties were reviewed by Höhler [13]. The chiral structure
of these sub amplitudes was discussed by Becher and Leutwyler [14, 15] a few years ago,
in the framework of covariant perturbation theory, and here we employ their results. As
far as power counting is concerned, in Appendix A one finds [ū(p′) u(p)](3) → O(q0) and
ū(p′) σµν(p
′−p)µKν u(p)](3) → O(q2), indicating that one needs the expansions of D±
and B± up to O(q3) and O(q) respectively.
At low and intermediate energies, the πN amplitude is given by a nucleon pole superim-
posed to a smooth background. One then distinguishes the pseudovector (PV) Born term
from a remainder (R) and writes
T± = T±pv + T
R . (9)
The former contribution depends on just two observables, namely the nucleon mass m and
the πN coupling constant g, as prescribed by the Ward-Takahashi identity [16]. The calcu-
lation of these quantities in chiral perturbation theory may involve loops and other coupling
constants but, at the end, results must be organized so as to reproduce the physical values
of both m and g in T±pv [17]. For this reason, one uses the constant g, instead of (gA/fπ),
since the former is indeed the observable determined by the residue of the nucleon pole
[13, 15, 18]. The pv Born sub amplitudes are given by
D+pv =
k′ ·k
k′ ·k
, (10)
B+pv = −g2
, (11)
D−pv =
k ·k′
k ·k′
, (12)
B−pv = −g2
, (13)
where s and u are the usual πN Mandelstam variables. In the case of free pions, their chiral
orders are respectively [D+pv, B
pv, D
pv, B
pv] → O[q2, q−1 , q, q0], but important changes do
occur when the pions become off-shell.
The amplitudes T±R receive contributions from both tree interactions and loops. The
former can be read directly from the basic lagrangians and correspond to polynomials in
t = (k′−k)2 and ν = (p′+p)·(k′+k)/4m, with coefficients given by renormalized LECs [15].
The latter are more complex and depend on Feynman integrals. In the description of πN
amplitudes below threshold, one approximates both types of contributions by polynomials
and writes [13, 19]
2mtn , (14)
where XR stands for D
R/ν, D
R/ν or B
R . The subthreshold coefficients xmn have the
status of observables, since they can be obtained by means of dispersion relations applied
to scattering data. As such, they constitute an important source of information about the
values of the LECs to be used in effective lagrangians.
The isospin odd subthreshold coefficients include leading order terms, which implement
the predictions made by Weinberg [20] and Tomozawa [21] for πN scattering lengths, given
D−WT =
2f 2π
, B−WT =
2f 2π
. (15)
For free pions, one has [D−WT , B
WT ] → O[q, q0], but these orders of magnitude also change
when pions become virtual.
Quite generally, the ranges of nuclear interactions are determined by t-channel exchanges.
At O(q3), the TPE-3NP involves only single-pion exchanges among different nucleons and
has the longest possible range. Another t-channel structure becomes apparent at O(q4),
associated with the pion cloud of the nucleon, which gives rise to both scalar and vector
form factors [18]. These effects extend well beyond 1 fm [22, 23] and a limitation of the
power series given by Eq. (14) is that they cannot accommodate these ranges, since Fourier
transforms of polynomials yield only δ-functions and its derivatives. In the description of the
πN amplitude produced by Becher and Leutwyler [15], one learns that the only sources of
medium range (mr) effects are their diagrams k and l, which contain two pions propagating
in the t-channel. In our derivation of the TPE-3NP, the loop content of these diagrams is not
approximated by power series and, for free pions, the non-pole subamplitudes are written as
D+R = D
mr(t) +
d̄+00 + d
2 + d̄+01t
d+20ν
4 + d+11ν
2t + d̄+02t
, (16)
B+R = B
mr(t) +
b+00ν
, (17)
D−R = D
mr(t) +
ν/(2f 2π)
d̄−00ν + d
3 + d̄−01νt
, (18)
B−R = B
mr(t) +
1/(2f 2π) + b̄
b−10ν
2 + b̄−01t
, (19)
where the labels (n) outside the brackets indicate the presence of O(qn) leading terms
and mr denotes terms associated with the nucleon pion cloud. The bar symbol over some
coefficients indicates that they do not include both Weinberg-Tomozawa and medium range
contributions, which are accounted for explicitly. The functions D±R and B
R depend on the
parameters fπ, gA, µ, m and on the LECs ci and d̄i, which appear into higher order terms
of the effective lagrangian. The subthreshold coefficients are the door through which LECs
enter our calculation and their explicit forms are given in Appendix B.
The dynamical content of the O(q3) πN amplitude is shown in Fig. 3. The first two
diagrams correspond to PV Born amplitudes, whereas the third one represents the Weinberg-
=    + +
FIG. 3: (Color online) Representation of the πN amplitude used in the construction of the TPE-
Tomozawa contact interaction, all of them with physical masses and coupling constants. The
fourth graph summarizes the terms within square brackets in Eqs. (16-19) and depends on
the LECs. Finally, the last two diagrams describe medium range effects owing to the nucleon
pion cloud, associated with scalar and vector form factors. This decomposition of the πN
amplitude has also been used in our derivation of the two-pion exchange components of the
NN interaction [6, 7] and hence the present calculation is consistent with those results.
III. INTERMEDIATE πN AMPLITUDE
The combination of Figs. 2 and 3 gives rise to the TPE-3NP, associated with the six
diagrams shown in Fig. 4. In the sequence, we discuss their individual contributions to the
subamplitudes D± and B±. We are interested only in the longest possible component of
the potential and numerators of expressions are systematically simplified by using k2 → µ2
and k
′2 → µ2. In configuration space, this corresponds to keeping only those terms which
contain two Yukawa functions and neglecting interactions associated with Figs. 1 (c) and 1
• diagrams (a) and (b): The crosses in the nucleon propagators of Figs. 4 (a) and 4 (b)
indicate that they do not include forward propagating components, so as to avoid double
counting when the potential is used in the dynamical equation. The covariant evaluation of
these contributions is based on Eqs. (10-13). Denoting by p̄ the momenta of the propagating
 + +=
 +  + +
(a) (b) (c)
(d) (e) (f)
FIG. 4: (Color online) Structure of the O(q4) two-pion exchange three-nucleon potential
nucleons, the factors 1/(s−m2) and 1/(u−m2) are decomposed as
(p̄0)2−Ē2
2Ē(p̄0−Ē)
2Ē(p̄0+Ē)
, (20)
with Ē =
m2+p̄2. The first term represents forward propagating nucleons, associated with
the iteration of the OPEP, whereas the second one gives rise to connected contributions.
Discarding the former and using the results of Appendix A, one has
1/({su} −m2)→ −1/
4m2 +
3q2r+q
ρ/3+16Q
ρ/3± 10qr ·Qρ/
3∓ 2qρ ·Qr/
. (21)
After appropriate truncation, one obtains
D+ab = −
(2µ2−t)→ O(q2) , (22)
B+ab → O(q
2) , (23)
D−ab = −
ν → O(q2) , (24)
B−ab → O(q
2) , (25)
where we have used the fact that, in the case of virtual pions, ν → O(q2).
• diagrams (c) and (d): These contributions are purely polynomial, can be read directly
from Eqs. (16-19), and are given by
D+cd = −
16 πf 4π
(2µ2−t)→ O(q2) , (26)
B+cd → O(q
2) , (27)
D−cd =
2f 2π
ν → O(q2) , (28)
B−cd =
2f 2π
2 c4m
8 πf 4π
→ O(q0) . (29)
• diagrams (e) and (f): The medium range components of the intermediate πN amplitude
D+e =
64π2f 4π
(2t−µ2)
(1−t/2µ2) Πt − 2π
→ O(q3) , (30)
D+ef → O(q
4) , (31)
B−e =
g2Amµ
16π2f 4π
(1−t/4µ2) Πt − π
→ O(q) , (32)
where Πt is the dimensionless Feynman integral
µ2 F (a)
← M = 2µ/a , F (a) =
tan−1
µ (1−a2/2)
. (33)
The amplitude D−ef , proportional to ν, is O(q3) for free pions and here becomes O(q4). Thus,
in fact, diagram (f) does not contribute to the TPE-3NP at O(q4).
• full results: The Golberger-Treiman relation g/m = gA/fπ is valid up to O(q2) and can
be used in diagrams (a) and (b). One then has
σ(2µ2)
(2µ2−t)
+ c3 +
g2A(1+g
16πf 2π
128π2f 2π
(1−2t/µ2) Πt
, (34)
where
σ(t = 2µ2) = −4 c1 µ2 −
3g2Aµ
32πf 2π
is the value of the scalar form factor at the Cheng-Dashen point [14]. The remaining ampli-
tudes read
B+ → O(q2) , (36)
1−g2A
2f 2π
ν , (37)
1 + 4 c4m
2f 2π
A(1+2g
16 πf 4π
g2Amµ
16 π2f 4π
(1−t/4µ2) Πt . (38)
The subamplitudes D± and B± begin at O(q2) and one needs just the leading terms in the
spinor matrix elements of Eq. (8), which is rewritten as
T+ = 2mD+ , (39)
T− = 2mD− + iσ(3) ·k′×kB− , (40)
with D+ → O(q2)+O(q3), D− → O(q2), and B− → O(q0)+O(q).
• O(q3) reduction: In order to compare our amplitudes with previous O(q3) results, one
notes that, in case corrections are dropped, one would have
(2µ2−t)
, (41)
2f 2π
2 c4m
. (42)
These expressions agree with those derived directly from a chiral lagrangian [24], except
for the terms within square brackets in both D+ and B−. The former corresponds to a
Born contribution whereas the latter is due to diagram (c) in Fig. 4, associated with the
Weinberg-Tomozawa term.
IV. TWO-PION EXCHANGE POTENTIAL
The expansion of the TPE-3NP up to O(q4) requires only leading terms in vertices and
propagators. In order to derive the non-relativistic potential in momentum space, one divides
Eq. (7) by the relativistic normalization factor
2m for each external nucleon leg
and writes1
t̄3 =
4f 2π
′2+µ2
(1) ·k σ(2) ·k′
(1) ·τ (2)D+ − i τ (1) × τ (2) ·τ (3)
(3) ·k′×k B−
. (43)
The configuration space potential has the form
V3(r,ρ) = τ
(1) ·τ (2) V +3 (r,ρ) + τ (1) × τ (2) ·τ (3) V −3 (r,ρ) + cyclic permutations, (44)
1 One notes that this expression is identical with Eq. (33) of Ref. [10] divided by 8m3.
V +3 (r,ρ) = C
(1) ·x̂31 σ(2) ·x̂23 U1(x31)U1(x23)
+ C+2
(1/9)σ(1) ·σ(2) [U(x31)−U2(x31)] [U(x23)−U2(x23)]
+ (1/3)σ(1) ·x̂23 σ(2) ·x̂23 [U(x31)−U2(x31)] U2(x23)
+ (1/3)σ(1) ·x̂31 σ(2) ·x̂31 U2(x31) [U(x23)−U2(x23)]
+ σ(1) ·x̂31 σ(2) ·x̂23 x̂31 ·x̂23 U2(x31)U2(x23)
+ C+3 σ
(1) ·∇I31 σ(2) ·∇I23 ∇I31 ·∇I23
I0 − 2 I1
, (45)
V −3 (r,ρ) = C
(1/9)σ(1)×σ(2) ·σ(3) [U(x31)−U2(x31)] [U(x23)−U2(x23)]
+ (1/3)σ(3)×σ(1) ·x̂23 σ(2) ·x̂23 [U(x31)−U2(x31)] U(x23)
+ (1/3)σ(1) ·x̂31 σ(2)×σ(3) ·x̂31 U2(x31) [U(x23)−U2(x23)]
+ σ(1) ·x̂31 σ(2) ·x̂23 σ(3) ·x̂31×x̂23 U2(x31)U2(x23)
+ C−2
(1) ·
31 −i∇
(2) ·x̂23 [U(x31)−U2(x31)] U1(x23)
+ σ(1) ·x̂31 σ(2) ·
31 −i∇
U1(x31) [U(x23)−U2(x23)]
+ 3σ(1) ·x̂31 σ(2) ·x̂23
31 −i∇
· [x̂31 U2(x31)U1(x23) + x̂23 U1(x31)U2(x23)]
+ C−3 σ
(1) ·∇I31 σ(2) ·∇I23 σ(3) ·∇I31×∇I23
I0 − I1/4
. (46)
The profile functions are written in terms of the dimensionless variables xij = µ rij and read
U(x) =
, (47)
U1(x) = −
, (48)
U2(x) =
, (49)
In = −
(2π)3
(2π)3
ei(k·r31+k
·r23)
′2+µ2
Πt(t) . (50)
The last function involves the loop integral given in Eq. (33) and is discussed further in
Appendix C. The gradients ∇Iij act on the functions I
n, whereas the ∇
ij act only on the
wave function and give rise to non-local interactions, as discussed in Appendix D.
The strength coefficients are the following combinations of the basic coupling constants
C+1 =
64 π2f 4π
σ(2µ2) , (51)
C+2 =
32 π2f 4π m
+mc3 +
g2A(1+g
16πf 2π
, (52)
C+3 =
4096 π3f 6π
, (53)
C−1 =
256 π2f 4π m
1 + 4mc4 −
g2A(1+2g
8πf 2π
, (54)
C−2 =
g2A(g
A−1)µ6
768 π2f 4π m
, (55)
C−3 = −
2048 π3f 6π
. (56)
V. STRENGTH COEFFICIENTS
The strength constants of the potential involve a blend of four well determined param-
eters, namely m = 938.28 MeV, µ = 139.57 MeV, gA = 1.267 and fπ = 92.4 MeV, with
the scalar form factor at the Cheng-Dashen point and the LECs c3 and c4, which are less
precise. As far as σ(2µ2) is concerned, we rely on the results [25] σ(2µ2)−σ(0) = 15.2± 0.4
MeV, σ(0) = 45±8 MeV, and adopt the central value σ(2µ2) = 60 MeV. The values quoted
for the LECs in the literature vary considerably, depending on the empirical input employed
and the chiral order one is working at. A sample of values is given in Table I.
Our work is based on the O(q3) expansion of the intermediate πN amplitude and, for
the sake of consistency, one must use LECs extracted at the same order. The kinematical
conditions of the three-body interaction are such that the variable ν is O(q2), an order of
magnitude smaller than the threshold value, ν = µ. This makes information encompassed in
the subthreshold coefficients better suited to this problem and we use results from Appendix
TABLE I: Some values of the LECs c3 and c4; m is the nucleon mass.
Reference Chiral order πN input mc3 mc4
[26] 3 amplitude at ν = 0, t = 0 −5.00± 1.43 3.62 ± 0.04
[26] 3 amplitude at ν = 0, t = 2µ2/3 −5.01± 1.01 3.62 ± 0.04
[27] 3 scattering amplitude −5.69± 0.04 3.03 ± 0.16
[15] 4 subthreshold coefficients -3.4 2.0
[15] 4 scattering lengths -4.2 2.3
tree 2 subthreshold coefficients -3.6 2.0
this work 3 subthreshold coefficients -4.9 3.3
B in order to write
mc3 = −mf 2π d+01 −
g4A mµ
16 π f 2π
− 77 g
768 π f 2π
, (57)
mc4 =
f 2π b
g2A(1+g
16 π f 2π
. (58)
Adopting the values for the subthreshold coefficients given by Höhler [13], namely d+01 =
1.14 ± 0.02µ−3 and b−00 = 10.36 ± 0.10µ−2, one finds the figures shown in the last row of
Table I. These, in turn, produce the strength coefficients displayed in Table II. For the sake
of comparison, we also quote values employed in our earlier calculation [10] and in two TM’
versions [28] of the Tucson-Melbourne potential [8].
TABLE II: Strength coefficients in MeV.
reference C+1 C
this work 0.794 -2.118 0.034 0.691 0.014 -0.067
Brazil [10] 0.92 -1.99 - 0.67 - -
TM’(93) [28] 0.60 -2.05 - 0.58 - -
TM’(99) [28] 0.91 -2.26 - 0.61 - -
Changes in these parameters represent theoretical progress achieved over more than two
decades and it is worth investigating their origins in some detail. With this purpose in
mind, we compare present results with those of our previous O(q3) calculation [10]. At the
chiral order one is working here, new qualitative effects begin to show up, associated with
both loops and non-local interactions. They are represented by terms proportional to the
coefficients C+3 , C
2 and C
3 in Eqs. (45) and (46).
The πN coupling is now described by g2Aµ
2/f 2π = 3.66 whereas, previously, the factor
g2µ2/m2 = 3.97 was used. From a conceptual point of view, the latter should be preferred,
since g is indeed the proper coupling observable. In chiral perturbation theory, the difference
between both forms is ascribed to the parameter ∆GT = −2d18µ2/g, which describes the
Goldberger-Treiman discrepancy [15]. As this is a O(q2) effect, both forms of the coupling
become equivalent in the present calculation. On the other hand, the empirical value of g is
subject to larger uncertainties and the form based on gA is more precise. Our present choice
accounts for a decrease of 8% in all parameters.
The relations C+1 ↔ Cs, C+2 ↔ Cp and C−1 ↔ −C ′p allow one to compare Eqs. (45)
and (46) with Eq. (67) of Ref. [10]. One notes that the latter contains an unfortunate
misprint in the sign of the term proportional to C ′p, as pointed out in Ref. [29]. In the
earlier calculation, the coefficient Cs was based on a parameter [30] ασ = 1.05µ
−1, which
corresponds to σ(2µ2) = 64 MeV. The results of Table II show that the values of C+2 and
C−1 are rather close to those of Cp and −C ′p. This can be understood by rewriting Eqs. (52)
and (54) in terms of the subthreshold coefficient d+01 and b
00 as follows
C+2 = −
32 π2f 4π m
mf 2π d
29g2Amµ
768πf 2π
, (59)
C−1 =
128 π2f 4π m
f 2π b
g2Amµ
16πf 2π
. (60)
Numerically, this amounts to C+2 = −(1.845 + 0.110 + [0.163]) MeV and C−1 = (0.624 +
[0.067]) MeV. The second term in the former equation was overlooked in Ref. [10] and
should have been considered there. The square brackets2 correspond to next-to-leading
order contributions and yield corrections of about 8% and 11% to the leading terms in C+2
and C−1 , respectively.
3 As the model used in Ref. [10] was explicitly designed to reproduce
the subthreshold coefficients quoted by Höhler [13], it produces the very same contributions
as the first terms in Eqs. (59) and (60).
2 These factors can be traced back to loop diagrams in Fig. 3 and are dynamically related with the term
proportional to C±
, as we discuss in Appendix C.
3 When comparing the new coefficients with those in the second row of Table II, one should also take into
account the 8% effect due to the Goldberger-Treiman discrepancy.
VI. NUMERICAL RESULTS FOR THREE-NUCLEON SYSTEMS
In order to test the effects of the TPE-3NP at O(q4), in this section, we present some
numerical results of Faddeev calculations for three-nucleon bound and scattering states. The
calculations are based on a configuration space approach, in which we solve the Faddeev
integral equations [31–33],
Φ3 = Ξ12,3 +
E + iǫ−H0 − V12
× [V12 (Φ1 + Φ2) +W3 (Φ1 + Φ2 + Φ3)] ,
(and cyclic permutations), (61)
where Ξ12,3, which does not appear in the bound state problem, is an initial state wave
function for the scattering problem, H0 is a three-body kinetic operator in the center of
mass, V12 is a nucleon-nucleon (2NP) potential between nucleons 1 and 2, and W3 is the
3NP displayed in Fig. 2. Partial wave states of a 3N system, in which both NN and 3N
forces act, are restricted to those with total NN angular momenta j ≤ 6 for bound state
calculations, and j ≤ 3 for scattering state calculations. The total 3N angular momentum
(J) is truncated at J = 19/2, while 3NP is switched off for 3N states with J > 9/2 for
scattering calculations. These truncation procedures are confirmed to give converged results
for the purposes of the present work.
When just local terms are retained, t̄3 in Eq. (43) can be cast in the conventional form
[8–10]
t̄3 = −
4f 2π
F (k2)
2 + µ2
F (k′2)
′2 + µ2
(σ(1) · k)(σ(2) · k′)
(τ (1) · τ (2)){a+ b(k · k′)}
−(iτ (1) × τ (2) · τ (3))(iσ(3) · k′ × k)d
, (62)
where the coefficients, a, b, and d are related with our potential strength coefficients by
[C+1 , C
2 , C
1 ] =
(4π)2
[−aµ4, bµ6, −dµ6] . (63)
The values of the coefficients, a, b, and d for the TPE-3NP at O(q4) are shown in Table III,
as BR-O(q4). In this table, the values for the older version of the Brazil TPE-3NP, BR(83)
[10], and the potential up to O(q3) given by Eqs. (41-42), BR-O(q3), are shown as well.
TABLE III: Coefficients a, b, and d of the TPE-3NP.
3NP a µ b µ3 d µ3
BR-O(q4) -0.981 -2.617 -0.854
BR-O(q3) -0.736 -3.483 -1.204
BR(83) -1.05 -2.29 -0.768
In Eq. (62), the function F (k2) represents a πNN form factor. We apply a dipole form
factor with the cut off mass Λ,
Λ2−µ2
, which modifies the profile functions U(x), U1(x),
and U2(x) in Eqs. (47-49) as
U(x) =
Λ̄2 − 1
, (64)
U1(x) = −
+ Λ̄2
e−Λ̄x
Λ̄2 − 1
e−Λ̄x, (65)
U2(r) =
(Λ̄x)2
e−Λ̄x
−Λ̄(Λ̄
2 − 1)
e−Λ̄x, (66)
with Λ̄ = Λ/µ.
We choose the Argonne V18 model (AV18) [34] for a realistic NN potential, by which the
triton binding energy (B3) becomes 7.626 MeV, underbinding it by about 0.9 MeV compared
to the empirical value, 8.482 MeV. As it is well known, the introduction of the TPE-3NP
remedies this deficiency. The amount of attractive contribution depends on the cutoff mass
Λ, as shown in Fig. 5. The solid curve shows the dependence of B3 on Λ for the calculation
with the BR-O(q4) 3NP in addition to the AV18 2NP (AV18+BR-O(q4)). In the figure,
the empirical value and the AV18 result are displayed by the dashed and dotted horizontal
lines, respectively. Due to the strong attractive character of the 3NP, B3 is reproduced by
choosing a rather small value of Λ, namely 660 MeV. In the same figure, the Λ-dependence
of B3 for AV18+BR-O(q3) is displayed by a dashed curve and that for the AV18+BR(83) by
a dotted curve. From these curves we see that AV18+BR-O(q3) reproduces B3 for Λ = 620
MeV and AV18+BR(83) for Λ = 680 MeV. In other words, the BR-O(q4) 3NP is slightly
more attractive than the BR(83) 3NP and a large attractive effect occurs when one moves
from the TPE O(q4) 3NP to the O(q3) 3NP. This tendency is strongly correlated with the
magnitude of the coefficient b, as shown in Table III. This can be understood as a dominant
contribution to B3 from the component of the TPE-3NP associated with the coefficients b.
This dominance is shown in Table IV, where we tabulate calculated B3 for the AV18 plus
the BR-O(q4) 3NP and plus each term of the BR-O(q4) coming from the coefficients a, b,
and d.
550 600 650 700 750 800
L (MeV)
FIG. 5: (Color online) The triton binding energy B3 as functions of the cutoff mass Λ of the πNN
dipole form factor. The solid curve denotes the result for AV18+BR-O(q4), the dashed curve for
AV18+BR-O(q3), and the dotted curve for AV18+BR(83). The horizontal lines denote the AV18
result (dotted line) and the empirical value (dashed line).
In Fig. 6, we compare six calculated observables for proton-deuteron elastic scattering,
namely differential cross sections σ(θ), vector analyzing powers of the proton Ay(θ) and
of the deuteron iT11(θ), and tensor analyzing powers of the deuteron T20(θ), T21(θ), and
T22(θ), at incident proton energy E
N = 3.0 MeV, (or incident deuteron energy E
d = 6.0
MeV,) with experimental data of Ref. [35, 36]. In the figure, the solid curves designate
the AV18 calculations and the dashed curves the AV18+BR-O(q4) calculations, which are
almost indistinguishable from the AV18+BR-O(q3) and AV18+BR(83) calculations, once
TABLE IV: Triton binding energy for the AV18 2NP plus the BR-O(q4) 3NP for each term of the
BR-O(q4) 3NP with Λ = 660 MeV. ∆B3 means the difference of the calculated binding energy
from that of the AV18 calculation.
B3 (MeV) ∆B3 (MeV)
AV18+BR-O(q4) 8.492 0.866
AV18+BR-O(q4)-a 7.673 0.047
AV18+BR-O(q4)-b 8.241 0.615
AV18+BR-O(q4)-d 7.787 0.161
the cut off masses are chosen so that B3 is reproduced.
It is reminded that the TPE-3NF gives minor effects on the vector analyzing powers.
This happens because the exchange of pions gives essentially scalar and tensor components
of nuclear interaction in spin space, which are not so effective to the vector analyzing powers.
On the other hand, as is noticed in Refs. [37, 38], at ElabN = 3.0 MeV, the TPE-3NP gives a
wrong contribution to the tensor analyzing power T21(θ) around θ = 90
In Fig. 7, we compare calculations of observables in neutron-deuteron elastic scattering
at ElabN = 28.0 MeV with experimental data of proton-deuteron scattering Ref. [39]. At
this energy, discrepancies between the calculations and the experimental data in the vector
analyzing power iT11(θ) appear at θ ∼ 100◦, where iT11(θ) has a minimum, and at θ ∼ 140◦,
where iT11(θ) has a maximum, which are not compensated by the introduction of the TPE-
3NP. On the other hand, while the AV18 calculation almost reproduces the experimental
data of T21(θ) at θ ∼ 90◦, the introduction of the TPE-3NP gives a wrong effect, as in the
ElabN = 3 MeV case.
These results set the stage for the introduction of terms associated with the coefficients
C+3 , C
2 , and C
3 , Eqs. (44-45), which are new features of the O(q4) expansion of the TPE-
3NP. Terms proportional to C±3 , which include the rather complicated function I(r31, r23)
given in Appendix C, arise from a loop integral, Eq. (33). On the other hand, the term with
C−2 corresponds to a non-local potential and includes the gradient operator ∇
ij , which acts
on the wave function and arises from the kinematical variable ν. Both kinds of contributions
are not expressed in the conventional local form shown in Eq. (62), which involves only
the coefficients C+1 , C
2 , and C
1 , and the full evaluation of their effects would require an
0 60 120 180
-0.04
-0.03
-0.02
-0.01
0 60 120 180
0 60 120 180
-0.03
-0.02
-0.01
 (deg) 
 = 3.0 MeV   (E
 = 6.0 MeV)
 (deg) 
 (deg) 
 (deg) 
 (deg) 
 (deg) 
FIG. 6: (Color online) Proton-deuteron elastic scattering observables at ElabN = 3.0 MeV. Solid
curves are calculations for the AV18 potential, and dashed curves for the AV18+BR-O(q4). Ex-
perimental data are taken from Refs. [35, 36].
extensive rebuilding of large numerical codes. However, the coefficients of the new terms are
small, and in this exploratory paper we estimate their influence over observables as follows.
The function I(r31, r23) is approximated by Eq. (C11), which amounts to replacing Πt(t)
by a factor −π. Further, the kinematical factors in front of Πt(t) in Eqs. (34) and (38),
namely 1 − 2t/µ2 and 1 − t/4µ2, are approximately evaluated by putting t ≈ 2µ2, which
yields −3 and 1/2, respectively. By this procedure, the coefficients C+3 and C−3 are absorbed
into C+2 and C
1 , or in b and d respectively, and one has
∆C+2 = −3C+3 , ∆C−1 = C−3 /2. (67)
Numerically, this corresponds to ∆C+2 = −0.102 MeV ∼ 120C
2 and ∆C
1 = −0.034 MeV ∼
C−1 , or ∆b = −0.125(µ−3) and ∆d = 0.042(µ−3). The net change produced in the triton
binding energy is +0.026 MeV (+0.037 MeV from ∆C+2 and -0.011 MeV from ∆C
1 ), just
about 1/30 of the total increase in B3 due to the local terms of the BR-O(q4) TPE-3NP.
The non-local term proportional to C−2 is more involved and we restrict ourselves to a
rough assessment of its role. We replace the variable ν by a constant 〈ν〉 and assume, for
0 60 120 180
0 60 120 180
0 60 120 180
 (deg) 
 = 28.0 MeV   �E
 = 56.0 MeV�
 (deg) 
 (deg) 
 (deg) 
 (deg) 
 (deg) 
FIG. 7: (Color online) Nucleon-deuteron elastic scattering observables at ElabN = 28.0 MeV. Curves
are calculations for neutron-deuteron scattering. Solid curves denote calculations for the AV18
potential and dashed curves for the AV18+BR-O(q4). Experimental data are those for proton-
deuteron scattering taken from Ref. [39].
example, that 〈ν〉 = µ2
. This changes the C−2 term in Eq. (46) into the very simple form
V −3 (r,ρ) = C
1 (· · · ) + iC̃−2 σ(1) · x̂31σ(2) · x̂23U1(x31)U1(x23) + C−3 (· · · ) , (68)
C̃−2 = −
4f 2π
1− g2A
2f 2π
(4π)2
g2A(1− g2A)µ6
512π2f 4πm
= 0.021 MeV . (69)
Except for the isospin factor, this term is similar to that with C+1 (or a), which adds
about 0.05 MeV to the triton binding energy. Since the potential strength C̃−2 is about 3 %
of C+1 , its contribution to the binding energy may be estimated to be a tiny 0.001 MeV.
VII. CONCLUSIONS
In the framework of chiral perturbation theory, three-nucleon forces begin at O(q3), with
a long range component which is due to the exchanges of two pions and relatively simple.
At O(q4), on the other hand, a large number of different processes intervene and a full
description becomes rather complex. For this reason, here we concentrate on a subset of
O(q4) interactions, namely that which still involves the exchanges of just two pions. This
part of the 3NP is closely related with the πN amplitude, and the expansion of the former
up to O(q4) depends on the latter at O(q3).
Our expressions for the potential are given in Eqs. (44-56) and the new chiral layer of the
TPE-3NP considered in this work gives rise to both numerical corrections to strength coef-
ficients of already existing terms (C+1 , C
2 , C
1 ) and new structures in the profile functions.
Changes in numerical coefficients lay in the neighborhood of 10% and can be read in Tables
II and III. New structures, on the other hand, arise either from loop functions representing
form factors or the non-local terms associated with gradients acting on the wave function.
They correspond to the terms proportional to the parameters C+3 , C
2 and C
3 , which are
small and compatible with perturbative effects.
In order to insert our results into a broader picture, in Table V we show the orders at
which the various effects begin to appear, including the drift potential derived recently [40].
TABLE V: Chiral picture for two- and three-body forces.
beginning TWO-BODY TWO-BODY THREE-BODY
O(q0) OPEP: V −T , V
O(q2) OPEP: V −D TPEP: V
T , V
O(q3) TPEP: V −LS, V
T , V
SS ;V
C , V
LS TPEP: C
1 , C
O(q4) TPEP: V −D ;V
Q , V
D TPEP: C
3 , C
The influence of the new TPE-3NP over three-body observables has been assessed in both
static and scattering environments, adopting the Argonne V18 potential for the two-body
interaction. In order to reproduce the empirical triton binding energy, the O(q4) potential
requires a cutoff mass of 660 MeV. Comparing this with the value of 680 MeV for the 1983
Brazil TPE-3NP, one learns that the later version is more attractive.
In the study of proton-deuteron elastic scattering, we have calculated cross sections σ(θ),
vector analyzing powers Ay(θ) of the proton and iT11(θ) of the deuteron, and tensor analyzing
powers T20(θ), T21(θ), and T22(θ) of the deuteron, at energies of 3 and 28 MeV. Results are
displayed in Figs. 6 and 7, where it is possible to see that there is little sensitivity to
the changes induced in the strength parameters when one goes from O(q3) to O(q4). Old
problems, as the Ay(θ) puzzle, remain unsolved.
The present version of the TPE-3NP contains new structures, associated with loop inte-
grals an non-local operators. Their influence over observables has been estimated and found
to be at least one order of magnitude smaller than other three-body effects. A more detailed
study of this part of the force is being carried on.
APPENDIX A: KINEMATICS
The coordinate describing the position of nucleon i is ri and one uses the combinations
R = (r1+r2+r3)/3 , r = r2−r1 , ρ = (2 r3−r1−r2)/
3 , (A1)
which yield
r1 = R−
, r2 = R +
, r3 = R +
. (A2)
The momentum of nucleon i is pi and one defines
P = p1+p2+p3 , pr = (p2−p1)/2 , pρ = (2p3−p1−p2)/2
3 . (A3)
Initial momenta p and final momenta p′ are used in the combinations
Q = (P ′+P )/2 , q = (P ′−P ) , (A4)
Qr = (p
r+pr)/2 , qr = (p
r−pr) , (A5)
Qρ = (p
ρ+pρ)/2 , qρ = (p
ρ−pρ) . (A6)
In the CM, one has P = 0 and the three-momenta are given by
p1 = −(Qr−qr/2)− (Qρ−qρ/2)/
3 , p′1 = −(Qr+qr/2)− (Qρ+qρ/2)/
3 , (A7)
p2 = (Qr−qr/2)− (Qρ−qρ/2)/
3 , p′2 = (Qr+qr/2)− (Qρ+qρ/2)/
3 , (A8)
p3 = 2(Qρ−qρ/2)/
3 , p′3 = 2(Qρ+qρ/2)/
3 . (A9)
Energy conservation for on-shell particles yield the non-relativistic constraint
Qr ·qr +Qρ ·qρ = 0 . (A10)
The momenta of the exchanged pions are written as
k = p1 − p′1 , k′ = p′2 − p2 , (A11)
k0 = −(qr+qρ/
3)·(Qr+Qρ/
3)/m , k = qr+qρ/
3 , (A12)
′0 = (qr−qρ/
3)·(Qr−Qρ/
3)/m , k′ = qr−qρ/
3 , (A13)
and the Mandelstam variables for nucleon 3 read
s = (p3+k)
2 = m2 − (qr+qρ/
3) · (qr+2Qr−qρ/
3Qρ) +O(q4) , (A14)
u = (p3−k′)2 = m2 − (qr−qρ/
3) · (qr+2Qr+qρ/
3Qρ) +O(q4) , (A15)
ν = (s−u)/4m = −2 qr ·Qρ/
3 +O(q4) . (A16)
In the evaluation of the intermediate πN amplitude, one needs
[ū(p′) u(p)](3) ≃ 2m+O(q2) , (A17)
ū(p′) σµν(p
′−p)µKν u(p)](3) ≃ 2 iσ(3) ·qρ×qr/
3 +O(q4) . (A18)
The πN vertex for nucleon 1 is associated with
[ū(p′) γ5 u(p)]
(1) ≃ σ(1) ·(qr+qρ/
3) +O(q3) , (A19)
and results for nucleon 2 are obtained by making qr → −qr.
APPENDIX B: SUBTHRESHOLD COEFFICIENTS
The polynomial parts of the amplitudes T±R , Eqs. (30-35), are determined by the sub-
threshold coefficients of Ref. [15]. The terms relevant to the O(q3) expansion are written as
d+00 = −
2 (2c1 − c3) µ2
8 g4A µ
64 π f 4π
3 g2A µ
64 π f 4π
, (B1)
d+01 = −
48 g4A µ
768 π f 4π
77 g2A µ
768 π f 4π
, (B2)
d+02 =
193 g2A
15360 π f 4π µ
, (B3)
d−00 =
2 f 2π
+O(q2) , (B4)
b−00 =
2 f 2π
2 c4 m
g4A m µ
8 π f 4π
g2A m µ
8 π f 4π
, (B5)
b−01 =
g2A m
96 π f 4π µ
, (B6)
where the parameters ci and d̃i are the usual coupling constants of the chiral lagrangians
of order 2 and 3 respectively [41] and the tilde over the latter indicates that they were
renormalized [15]. Terms within square brackets labeled (mr) in these results are due to
the medium range diagrams shown in Fig. 3 and have been included explicitly into the
functions D±mr and B
mr. Terms bearing the (WT ) label were also explicitly considered in
Eqs. (15-19). The subthreshold coefficients are determined from πN scatterig data and a
set of experimental values is given in Ref. [13].
APPENDIX C: FUNCTIONS In
The functions In, describing loop contributions, are given by
In(r31, r23) = −
(2π)3
(2π)3
ei(k·r31+k
·r23)
′2+µ2
Πt(t) . (C1)
Using the definition Eq. (33) and the Jacobi variables Eq. (A1), one writes
In(r31, r23) =
I(r31, r23) , (C2)
I(r31, r23) = 128π
da tan−1
µ (1−a2/2)
L(a; r,ρ) (C3)
L(a; r,ρ) =
(2π)3
(2π)3
ei(Q·r−
3q·ρ/2)
a2q2+4µ2
[(Q−q)2+µ2]
[(Q+q)2+µ2]
. (C4)
The numerical evaluation of the function L is can be simplified by using alternative repre-
sentations.
• form 1: One uses the Feynman procedure for manipulating denominators, which yields
L(a; r,ρ) =
(2π)3
(2π)3
ei(Q·r−
3q·ρ/2)
a2q2+4µ2
[(Q2+q2/4+µ2)−(1−2b)q ·Q]2
(2π)3
ei[(1−2b)r−
3ρ]·q/2
a2q2+4µ2
e−Θ r
µ2+b(1−b) q2 . (C5)
Performing the angular integration over q, one has
L(a; r,ρ) =
16 π3
e−Θ r
Θ (a2q2+4µ2)
sin q [(1− 2b) r −
3ρ]/2
[(1− 2b) r −
3ρ]/2
. (C6)
• form 2: The Fourier transform
dx e−ik·x
allows one to write
L(a; r,ρ) =
e−µ|r31+z|
|r31+z|
e−µ|r23−z|
|r23−z|
e−2µ z/a
. (C8)
These results may be further simplified by means of approximations.
• heavy baryon approximation: In the limit m → ∞, corresponding to the heavy
baryon case, one uses F (a)→ 4π/a2 in Eq. (33) and Eqs. (C5) and (C7) yield, respectively,
I(r31, r23) ≃
tan−1
e−Θ r
sin q [(1− 2b) r −
3ρ]/2
[(1− 2b) r −
3ρ]/2
, (C9)
I(r31, r23) ≃
e−µ|r31+z|
|r31+z|
e−µ|r23−z|
|r23−z|
e−2µ z
2µ z2
. (C10)
• multipole approximation: The integrand in Eq. (C10) is peaked around z = 0 and a
multipole expansion of the Yukawa functions produces
I(r31, r23) ≃ U(x31) U(x23) + · · · . (C11)
The same result can also be obtained by using the expansion Πt(t) ∼ −π[1 + t/12µ2 +
t2/80µ4 + · · · ], valid for low t, directly into Eq. (C1).
APPENDIX D: NON-LOCAL TERM
In configuration space, the variable Qρ corresponds to a non-local operator, represented
by a gradient acting on the wave function. In order to make the dependence of t̄3 on Qρ
explicit, one writes
t̄3 = [Qρ]i Xi(qr, qρ) , (D1)
where X is a generic three-vector, and evaluates the matrix element
〈ψ |W |ψ〉 =−
]12 ∫
dr′ dρ′ dr dρ ψ∗(r′,ρ′) ψ(r,ρ)
dQr dQρ dqr dqρ
× ei[Qr·(r
′−r)+Q
·(ρ′−ρ)+q
·(r′+r)/2+q
·(ρ′+ρ)/2] t̄3(Qr,Qρ, qr, qρ)
dr dρ
∗(r,ρ)
ψ(r,ρ) + ψ∗(r,ρ)
∇ρ ψ(r,ρ)
dqr dqρ e
·ρ] Xi(qr, qρ) . (D2)
This yields the potential
V3(r,ρ) = −
(2π)6
dqr dqρ e
·ρ] Xi(qr, qρ) , (D3)
where the operator ∇
acts only on the wave function. An alternative form can be
obtained by integrating Eq. (D2) by parts, and one finds
V3(r,ρ) =−
(2π)6
dqr dqρ e
·ρ] X(qr, qρ)
− i∇wfρ
dqr dqρ e
·ρ] X(qr, qρ)
. (D4)
In the case of the three-body force, the only non-local contribution is associated with the
subamplitude D−, Eq. (37), which yields
Xi = −i τ (1) × τ (2) ·τ (3)
′2+µ2
(1) ·k σ(2) ·k′
g2A(g
A − 1)√
3 8f 4π m
(k′+k)i . (D5)
The action of ∇ρ on the second term of Eq. (D4) gives rise to an integrand proportional to
′2−k2), which has short range and does not contribute to the TPE-3NP. Therefore it is
neglected.
[1] M. Taketani, S. Nakamura, and T. Sasaki, Prog. Theor. Phys. 6, 581 (1951).
[2] S. Weinberg, Phys. Lett. B 251, 288 (1990); Nucl. Phys. B 363, 3 (1991).
[3] S. Weinberg, Phys. Lett. B 295, 114 (1992).
[4] C. Ordóñez and U.van Kolck, Phys. Lett. B 291, 459 (1992); C. Ordóñez, L. Ray, and U. van
Kolck, Phys. Rev. Lett. 72, 1982 (1994); Phys. Rev. C 53, 2086 (1996).
[5] N. Kaiser, R. Brockman, and W. Weise, Nucl. Phys. A625, 758 (1997); N. Kaiser, Phys.
Rev. C 64, 057001 (2001); Phys. Rev. C 65, 017001 (2001); E. Epelbaum, W.Glöckle, and
U-G. Meissner, Nucl. Phys. A637, 107 (1998); ibid. A671, 295 (2000); D. R. Entem and R.
Machleidt, Phys. Rev. C 66, 014002 (2002).
[6] R. Higa and M. R. Robilotta, Phys. Rev. C 68, 024004 (2003).
[7] R. Higa, M. R. Robilotta, and C. A. da Rocha, Phys. Rev. C 69, 034009 (2004).
[8] S. A Coon, M. D. Scadron, P. C. McNamee, B. R. Barrett, D. W. E. Blatt, and B. H. J.
McKellar, Nucl. Phys. A317, 242 (1979).
[9] S. A. Coon and W. Glöckle, Phys. Rev. C 23, 1790 (1981).
[10] H. T. Coelho, T. K. Das, and M. R. Robilotta, Phys. Rev. C 28, 1812 (1983).
[11] J. L. Friar, Phys. Rev. C 60, 034002 (1999).
[12] S-N. Yang, Phys. Rev. C 10, 2067 (1974).
[13] G. Höhler, group I, vol.9, subvol.b, part 2 of Landölt-Bornstein Numerical data and Functional
Relationships in Science and Technology, ed. H.Schopper, 1983; G. Höhler, H. P. Jacob, and
R. Strauss, Nucl. Phys. B39, 273 (1972).
[14] T. Becher and H. Leutwyler, Eur. Phys. Journal C 9, 643 (1999).
[15] T. Becher and H. Leutwyler, JHEP 106, 17 (2001).
[16] J. C. Ward, Phys. Rev. 78, 1824 (1950); Y. Takahashi, Nuovo Cimento 6, 370 (1957); L. S.
Brown, W. J. Pardee, and R. Peccei, Phys. Rev. D 4, 2801 (1971).
[17] M. Mojžǐs and J. Kambor, Phys. Lett. B 476, 344 (2000).
[18] J. Gasser, M. E. Sainio, and A. Švarc, Nucl. Phys. B307, 779 (1988).
[19] G. Höhler, H. P. Jacob, and R. Strauss, Nucl. Phys. B39, 273 (1972).
[20] S. Weinberg, Phys. Rev. Lett. 17, 616 (1966).
[21] Y. Tomozawa, Nuovo Cimento A 46, 707 (1966).
[22] M. R. Robilotta, Phys. Rev. C 63, 044004 (2001).
[23] I. P. Cavalcante, M. R. Robilotta, J. Sá Borges, D. de O. Santos, and G. R. S. Zarnauskas,
Phys. Rev. C 72, 065207 (2005).
[24] J. L. Friar, D. Huber, and U. van Kolck, Phys. Rev. C 59, 53 (199); U. van Kolck, Ph. D.
thesis, University of Texas, 1993; C. Ordóñez and U. van Kolck, Phys. Lett. B 291, 459 (1992);
U. van Kolck, Phys. Rev. C 49, 2932 (1994).
[25] J. Gasser, H.Leutwyler, and M. E. Sainio, Phys. Lett. B 253, 252, 260 (1991).
[26] P. Büttiker and U.-G. Meissner, Nucl. Phys. A668, 97 (2000).
[27] N. Fettes and U-G. Meissner, Nucl. Phys. A693, 693 (2001).
[28] S. A. Coon and H. K. Han, Few-Body Syst. 30, 131 (2001).
[29] M. R. Robilotta and H. T. Coelho, Nucl. Phys. A460, 645 (1986).
[30] M. G. Olsson and E. T. Osypowski, Nucl. Phys. B101,136 (1975); E. T. Osypowski, Nucl.
Phys. B21, 615 (1970).
[31] T. Sasakawa and S. Ishikawa, Few-Body Syst. 1, 3 (1986).
[32] S. Ishikawa, Few-Body Syst. 32, 229 (2003).
[33] S. Ishikawa, Few-Body Syst. (to be published), nucl-th/0701044.
[34] R. B. Wiringa, V. G. J. Stoks, and R. Schiavilla, Phys. Rev. C 51, 38 (1995).
[35] K. Sagara, H. Oguri, S. Shimizu, K. Maeda, H. Nakamura, T. Nakashima, and S. Morinobu,
Phys. Rev. C 50, 576 (1994).
[36] S. Shimizu, K. Sagara, H. Nakamura, K. Maeda, T. Miwa, N. Nishimori, S. Ueno,
T. Nakashima, and S. Morinobu, Phys. Rev. C 52, 1193 (1995).
[37] S. Ishikawa, M. Tanifuji, and Y. Iseri, Phys. Rev. C 67, 061001(R) (2003).
[38] S. Ishikawa, M. Tanifuji, and Y. Iseri, in Proc. of the Seventeenth International IUPAP Con-
ference on Few-Body Problems in Physics, Durham, North Carolina, USA, 2003, edited by
W. Glöckle and W. Tornow, (Elsevier, Amsterdam, 2004) S61.
[39] K. Hatanaka, N. Matsuoka, H. Sakai, T. Saito, K. Hosono, Y. Koike, M. Kondo, K. Imai,
H. Shimizu, T. Ichihara, K. Nisimura, and A. Okihana, Nucl. Phys. A426, 77 (1984).
[40] M. R. Robilotta, Phys. Rev. C 74, 044002 (2006).
[41] V. Bernard, N. Kaiser, J. Kambor, and U-G. Meissner, Nucl. Phys. B388, 315 (1992).
ABSTRACT
  We present the expansion of the two-pion exchange three-nucleon potential
(TPE-3NP) to chiral order q^4, which corresponds to a subset of all
possibilities at this order and is based on the \piN amplitude at O(q^3).
Results encompass both numerical corrections to strength coefficients of
previous O(q^3) terms and new structures in the profile functions. The former
are typically smaller than 10% whereas the latter arise from either loop
functions or non-local gradients acting on the wave function. The influence of
the new TPE-3NP over static and scattering three-body observables has been
assessed and found to be small, as expected from perturbative corrections.

<|endoftext|><|startoftext|>
Interference effects in above-threshold ionization from diatomic molecules:
determining the internuclear separation
H. Hetzheim,1, 2 C. Figueira de Morisson Faria,3 and W. Becker2
Max-Planck-Institut für Kernphysik, Saupfercheckweg 1, 69117 Heidelberg, Germany
Max-Born-Institut für nichtlineare Optik und Kurzzeitspektroskopie, Max-Born-Str. 2A, D-12489 Berlin, Germany
Department of Physics and Astronomy, University College London,
Gower Street, London WC1E 6BT, United Kingdom
(Dated: October 28, 2018)
We calculate angle-resolved above-threshold ionization spectra for diatomic molecules in linearly
polarized laser fields, employing the strong-field approximation. The interference structure resulting
from the individual contributions of the different scattering scenarios is discussed in detail, with
respect to the dependence on the internuclear distance and molecular orientation. We show that,
in general, the contributions from the processes in which the electron is freed at one center and
rescatters off the other obscure the interference maxima and minima obtained from single-center
processes. However, around the boundary of the energy regions for which rescattering has a classical
counterpart, such processes play a negligible role and very clear interference patterns are observed.
In such energy regions, one is able to infer the internuclear distance from the energy difference
between adjacent interference minima.
I. INTRODUCTION
The interaction of matter with an intense laser field
(I & 1013W/cm
) leads to several phenomena, such as
above-threshold ionization (ATI) or high-order harmonic
generation (HHG). Such phenomena owe their existence
to physical mechanisms, in which an electron reaches
the continuum, by tunneling or multiphoton ionization,
at an instant t′. Subsequently, it is accelerated by the
field and driven back towards its parent ion, or molecule,
with which it rescatters or recombines at a later time
t [1]. Such laser-induced recombination or rescattering
processes take place within a fraction of a laser-field cycle.
The period of a typical near-infrared Ti:sapphire laser
pulse is T = 2π/ω ∼ 2.6fs. Thus, HHG and ATI occur on
a time scale of hundreds of attoseconds [2]. Hence, above-
threshold ionization and high-order harmonic generation
may be employed for probing, or even controlling, dy-
namic processes with attosecond and sub-angstrom reso-
lution.
This fact, together with new alignment techniques, has
opened a whole new range of possibilities for studying
molecules in strong laser fields, employing high-energy
photoelectrons or high-order harmonic radiation. Con-
crete examples are the attosecond reconstruction of the
nuclear motion in a molecule [3], the real-time imaging of
vibrational wavepackets [4], the tomographic reconstruc-
tion of molecular orbitals [5], the time-resolved measure-
ment of intramolecular quantum-interference effects [6],
or the determination of internuclear distances [7].
These applications are a direct consequence of the fact
that a molecule possesses a very specific configuration of
ions from which the electron may leave, or off which it
may rescatter causing above-threshold ionization, or re-
combine generating high-harmonics. This leads to char-
acteristic quantum-interference patterns in the HHG or
ATI spectra, in which structural information about the
molecule is hidden. This is true both for polyatomic [8]
and diatomic molecules [6, 7, 9, 10, 11, 13, 14, 15, 16,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27]. In particular
for diatomic molecules, it has been shown that the high-
order harmonic or ATI spectra exhibit overall maxima
and minima, which are highly dependent on the spatial
separation between both centers in the molecules, and
can be described as the interference between two radi-
ating point sources. In this sense, HHG or ATI by a
diatomic molecule may be viewed as the microscopic ana-
log of a double-slit experiment [6, 10, 11]. Furthermore,
such features depend on the symmetry of the highest oc-
cupied molecular orbitals, and on the alignment angle of
the molecule with respect to the laser-field polarization
[6, 7, 10, 11, 12, 13, 15, 19, 20, 21, 22, 23, 24, 25, 26, 27].
Specifically in the diatomic case, several aspects of
this interference have been extensively studied in the
past few years, such as the influence of the orbital sym-
metry, the internuclear distance, the alignment angles
[6, 7, 10, 11, 12, 13, 15, 19, 20, 21, 22, 23, 27], and
molecular vibration [24, 25, 26], as well as the role of the
laser-field shape [16, 17] or polarization [18]. Further-
more, an adequate modeling of bound molecular states,
in comparison with existing ionization experiments [14],
has also raised considerable debate [15, 19, 20, 27].
For that purpose, both the purely numerical solution
of the time-dependent Schrödinger equation [7, 11, 12],
and the strong-field approximation [10, 13, 15, 19, 20,
22, 23, 24, 25, 26, 27] have been employed. The latter
method allows a transparent physical interpretation of
the phenomena in question as laser-induced rescattering
or recombination processes, and permits a clear space-
time picture, which can be related to the classical orbits
of an electron in a strong laser field [28]. For a diatomic
molecule, there exist two main rescattering or recombi-
nation scenarios: the electron born through ionization at
the center Ci upon its return may recollide and interact
with either the same ion (Ci), or with the other one ( Cj
http://arxiv.org/abs/0704.0712v2
(i 6= j)) Such processes have been taken into account for
high-order harmonic generation employing a two-center
zero-range potential [13, 21], using Bessel function expan-
sions [19], and by means of saddle-point methods [21, 22].
In this paper, we calculate the energy spectra and an-
gular distributions of ATI produced electrons in linearly
polarized laser fields, within the framework of the strong-
field approximation (SFA) and the single-active electron
approximation (SAE). We employ a zero-range poten-
tial model similar to that in [13], and consider both the
direct electrons, which reach the detector without inter-
acting with their parent molecule, and the electrons that
suffer a single act of rescattering before reaching the de-
tector. In the latter case, we put particular emphasis on
interference effects: A final state with given momentum
outside the laser field can be reached via two different
scenarios. An electron can be born at and rescatter off
the same center, or it can be born at one center and
rescatter off the other. We show that the processes in-
volving two centers, in general, obscure the interference
patterns in the ATI spectra, in almost all energy-angle
regions. An exception, however, is the boundary of the
region that after tunneling is classically accessible to the
ionized electron, in other words, the region before the
classical cutoff. Near this boundary, the two-center pro-
cesses yield negligible contributions, and one may identify
very clear interference patterns. This makes it possible to
provide a recipe to determine the internuclear distance R
out of the angle-resolved ATI spectra. Throughout the
article we will use the velocity gauge and atomic units
(e = m = ~ = 1, c = 137).
The paper is organized as follows: In Sec. II we pro-
vide the ATI transition amplitudes for the direct and for
the rescattered electrons, which in Sec. III are employed
to compute the ATI spectra. The interference patterns in
the spectra are analyzed with respect to molecular orien-
tation, internuclear distance, and the position of the de-
tector with respect to the polarization of the laser field
and the molecular axis (Sec III A). In Sec. III B, we
present angle-resolved spectra, from which we infer the
internuclear distance. Finally, in Sec. IV, we summarize
the paper.
II. TRANSITION AMPLITUDES
The transition amplitude for direct ionization, within
the strong-field approximation (SFA) [29], is given by the
Keldysh-Faisal-Reiss amplitude [30]
Mp = −i
dt〈Ψ(V )p (t)|V |Ψ0(t)〉, (1)
where |Ψ0(t)〉 = |Ψ0〉 exp(i|E0|t). The amplitude de-
scribes an electron, initially in the ground state |Ψ0〉, that
is injected in the continuum by the laser field overcom-
ing the ionization potential |E0|, and reaches the detector
with final momentum p. The form of the transition am-
plitude given here, which contains the binding potential
V (r) rather than the interaction with the laser field, was
first presented in Ref. [31]. In the SFA, the final state
with momentum p is described by a Volkov state, which
in velocity gauge has the form
〈r|Ψ(V )p (t)〉 =
(2π)3/2
eip·re
dτ [p+A(τ)]2
. (2)
In the amplitude (1), the electron once ionized does not
interact with the ion (that is, with the binding potential
V ) anymore. If we allow for at most one single act of
rescattering, the amplitude (1) is replaced by
M (0,1)p = −
×〈Ψ(V )p (t)|V U (V )(t, t′)V |Ψ0(t′)〉. (3)
Here, U (V )(t, t′) denotes the Volkov time-evolution oper-
ator, which describes the evolution of the electron in the
presence of the external laser field, ignoring the binding
potential. Equation (3) incorporates direct ionization, as
described by Eq. (1), as well as ionization followed by
rescattering (for details, see, e.g., Ref. [32]).
In order to apply Eqs. (1) and (3) to a diatomic
molecule, we consider the two-center binding potential
V (r) = V0(r−R1) + V0(r−R2), (4)
where Ri (i = 1, 2) denote the coordinates of the centers
Ci (i = 1, 2). For the ground-state wave function, we
employ a linear combination of atomic orbitals (LCAO):
Ψ(r) = c1Ψ0(r−R1) + c2Ψ0(r−R2). (5)
Specifically, we will use the zero-range potential
V (r) =
r, (6)
whose single bound state is described by the wave func-
Ψ0(r) =
)1/2 1
e−κr, (7)
with κ =
2|E0|. The regularization operator ∂/∂r r
acts on the wave function to its right in order to satisfy
the proper boundary conditions at the origin [34]. For
direct ionization by a monochromatic linearly polarized
laser field
A(t) = A0 cosωt e, (8)
the evaluation of the amplitude (1) is straightforward.
Taking R1 = R/2 and R2 = −R/2, so that R is the
internuclear distance of the two centers, one obtains by
expanding the exponent in the Volkov wave function in
Eq. (2) into Bessel functions
M0p =
+ Up + |E0| −Nω
J−(2l+N)
2p · e
, (9)
0 0.5 1 1.5 2 2.5
energy in units of U
s) atom
molecule R=2 a.u.
molecule R=6 a.u.
FIG. 1: (Color online) ATI spectra of the direct electrons in
the laser polarization direction in the atomic and molecular
case, for a molecule aligned parallel to a laser field of frequency
ω = 0.058 a.u. and ponderomotive potential Up = 2.08 a.u.
We consider a symmetric combination of atomic orbitals [c1 =
c2 = 1/
2 in Eq. (5)], with ionization potential |E0| = 0.9
a.u. In order to facilitate the comparison, the same ionization
potential V0(r) was chosen in the atomic and molecular cases.
The arrows mark the various destructive interference energies
(n = 0, 1, 2) for R = 6 a.u..
where Up = A
0/4 denotes the ponderomotive energy of
the laser field (8). The prefactor, which is proportional
F = −κ+ exp(−κR)/R, (10)
is of no relevance, since we do not attempt to calculate
total ionization rates. It is, however, worth mentioning
that the limit of R → 0 is not straightforward. Below we
will not face this limit. For a more detailed discussion,
see, e.g., Refs. [13, 35].
The only difference between the matrix element (9) for
a molecule and the corresponding matrix element for an
atom, besides the R-dependent prefactor, is the presence
of the term cos(p ·R/2). This term describes the inter-
ference of electron orbits with momentum p originating
from one or the other center of the two-center potential
(4). The cosine term comes from assuming a symmetric
combination of orbitals in the ground-state wave func-
tion (5), so that c1 = c2 = 1/
2. For an antisymmetric
combination, so that c1 = −c2, the cosine is replaced by
a sine, leading to suppression of electrons with low mo-
menta due to destructive interference [9, 10, 11]. The
interference factor cos(p ·R/2) = cos(pR cos θ/2) yields
destructive interference for electrons with energies
E ≡ p
(2n+ 1)π
R cos θ
for integer n. An illustration of the interference effect
is given in Fig. 1, which shows the spectrum of the di-
rect electrons for an atom and for a symmetric diatomic
molecule aligned parallel to the laser-field polarization.
The clearly visible sharp dips in the spectrum, due to
destructive interference, are indicated by the arrows in
the figure. Next, we turn to the evaluation of the ma-
trix element (3), which allows rescattering. With the
two-center potential (4) and the symmetric ground-state
wave function (5), the matrix element reads
(2π)3/2
d3r′ e−ip·re
dτ(p+A(τ))2 [V0(r+R/2) + V0(r−R/2)]
× U (V )(rt; r′t′) [V0(r′ +R/2) + V0(r′ −R/2)] [Ψ0(r′ +R/2) + Ψ0(r′ −R/2)] ei|E0|t
. (12)
For the zero-range potential (6), the integrations over
space can be carried out easily [32], which leaves a
two-dimensional integral over the ionization time t′ and
the rescattering time t. For finite-range potentials, one
may proceed by introducing form factors and employing
saddle-point methods. In this case, the single-center pre-
factors cause an overall decrease of the yield for increas-
ing photoelectron energy. There are, however, no signif-
icant changes in the interference patterns in comparison
with the zero-range case, since these prefactors do not
influence the action or the cosine factor. For a detailed
discussion of the single-atom case, see, e.g., Ref. [36].
We split the eight integrals into those where the elec-
tron rescatters off the same center from which it was ion-
ized (r = r′ = ±R/2) and those where it rescatters off
the opposite center (r = −r′ = ±R/2). We refer to the
respective terms by M ij where i = +,− denote the cen-
ter of ionization and j = +,− the center of rescattering.
The integrals M++ and M−−, which specify the elec-
trons coming from and rescattering off the same center,
are essentially identical to the corresponding results for
an atom [32]. The structure of the molecule is reflected
in the integrals M+− and M−+, which characterize the
electrons that experience the presence of both centers.
Evaluating the remaining two integrals over t and t′, we
substitute t′ = t − τ . The doubly infinite integral over t
then yields a δ-function expressing energy conservation,
while the semi-infinite integration over τ has to be cal-
culated numerically. Expanding all oscillating exponents
in terms of Bessel functions, we obtain for the transition
amplitudes
M++p +M
p = 2F cos
+ Up + |E0| −Nω
J−(2l+N)
2p · e
)3/2{
e−i(|E0|τ+lα)e
−iUpτ [1−(
sinωτ/2
= 2F cos
M (atom)p , (13)
M+−p = Fe
+ Up + |E0| −Nω
2τ e−i[|E0|τ+lα+(2l+N)β−]e
−iUpτ [1−(
sinωτ/2
J−(2l+N)
, (14)
M−+p = Fe
+ Up + |E0| −Nω
2τ e−i[|E0|τ+lα+(2l+N)β+]e
−iUpτ [1−(
sinωτ/2
J−(2l+N)
. (15)
The real quantities A, B± and the phases α and β± are
defined by
Ae−iα = e−2iωτ +
e−iωτ , (16)
−iβ± = p · e± R · e
[i sinωτ − (1− cosωτ)].(17)
Upon R → −R, we have B± exp(−iβ±) →
B∓ exp(−iβ∓). Consequently, the matrix element Mp
does not change when R → −R. The complete matrix
element is the sum of the terms (13) – (15),
Mp = M
p . (18)
The first two terms describe electrons originating from
and rescattering off the same center. They are pro-
portional to the atomic ionization amplitude M
(atom)
[32] multiplied by the wave-function overlap F and the
two-center interference factor cos(p ·R/2), which we ob-
served for the direct electrons in Eq. (9). The behavior of
the exchange terms M+−p and M
p is more complicated
and will be discussed below.
The transition amplitude Mp simplifies enormously
when the molecule is aligned perpendicularly to the field
so that R · e = 0. Equation (17) shows that in this case
B+ = B− and β+ = β− = 0. Hence, the integrals on
the right-hand side of Eqs. (14) and (15) are equal and
M+−p +M
p becomes proportional to 2 cos(p ·R/2) just
like M++p +M
p . If, in addition, the electron is emitted
perpendicularly to the field so that also p · e = 0, then
B+ = B− = 0 and we have Mp = 0 unless N is even.
Substituting τ → τ/ω in Eqs. (13)–(17) one can see
that the amplitudes M ijp and their sum Mp depend on
the parameters of the problem through the dimension-
less quantities p2/ω, Up/ω, and ωR
2 when the relative
orientations of the vectors R, p, and e are kept fixed.
III. PHOTOELECTRON SPECTRA
In this section we discuss the ATI spectra computed
employing the transition matrix elements (1) and (3),
and a symmetric combination of equivalent centers [c1 =
c2 = 1/
2 in Eq. (5)]. For the sake of simplicity, in
the comparison between the atomic and molecular case
the same ionization potential V0 is chosen. Specifically,
we take |E0| = 0.9 a.u. in Eqs. (5) and (7) through-
out [37]. In Sec. III A, we perform a detailed analysis of
the interference patterns with respect to the molecular
orientation, rescattering scenarios, and the direction of
electron emission, while in Sec. III B we provide a recipe
for measuring the internuclear distance from an analysis
of the interference patterns in the angle-resolved spectra.
A. Analysis of the interference patterns
As a first step, we investigate how the interference pat-
terns are influenced by the orientation of the molecule
with respect to the laser-polarization direction. Such re-
sults are displayed in Figs. 2 and 3 for parallel and per-
pendicular orientations, respectively. In both cases, we
compare the entire ATI spectrum consisting of the di-
rect and the rescattered electrons in the atomic and the
molecular case. Unless stated otherwise (cf. Fig. 5), we
consider electron emission in the laser-polarization direc-
tion.
As expected from Eq. (9), for energies smaller than
2Up, the main contributions to the yield come from the
direct-ionization matrix element (9). Apart from the
interference-related factor of cos(p ·R/2), the transition
matrix element is identical to that obtained for a sin-
gle atom (cf. Fig. 1). This factor is responsible for the
sharp interference dips at the positions given by Eq. (11).
In the plateau energy region, however, the spectra de-
pend on the laser-field polarization in a more complex
way, as will be discussed next. In case the molecule is
aligned parallel to the laser-field polarization (Fig. 2),
the plateau is strongly enhanced in the molecular case,
and the structure of the spectrum is very different from
the atomic case and dependent on the internuclear dis-
tanceR. Indeed, inspection of the exchange integrals (14)
and (15) does not reveal any simple dependence on the
internuclear distance. Generally, for the molecular case
there are more pathways into a given final state. For our
case of a two-center potential, there are four pathways in
place of one for the atomic case. If they add coherently,
a significant enhancement can result, ideally by a factor
of 16, which is roughly what is observed in Fig. 2 be-
fore the cutoff. The structure caused by the cosine factor
is suppressed in the plateau region. This is caused by
the contribution of the processes in which the electron
is ejected from one center and rescatters off the other.
Such processes correspond to the transition amplitudes
M−+p andM
p , which do not exhibit the proportionality
to the cosine that is characteristic of the the one-center
scattering amplitudes M++p and M
p . A further par-
ticular feature observed in this case is the displacement
of the cut-off to higher energies with increasing internu-
clear distance. This can be understood by the fact that
an electron that moves from one center to the other may
gain more energy from the field since it may be acceler-
ated over a longer distance before it recollides.
A strikingly different behavior is observed if the
molecule is aligned perpendicular to the direction of the
laser field so that R ·e = 0. In Fig. 3 we consider the case
where the electron is emitted in the direction of the laser
polarization so that p ·R = 0, too. In this case, there is
a general enhancement of the ATI yield in comparison to
that of a single atom by roughly a factor of two for the di-
rect electrons and a much larger factor for the rescattered
electrons. Notice that the molecular spectrum is practi-
cally independent of the internuclear distance [21], since
the R dependence of the prefactor (10) is weak for R & 2
and the exponential of R2/(2τ) is small for the values
of R that we consider and the values of τ that give the
dominant contributions to the integral. The entire ATI
spectrum does not show any interference structure, since
the contributions from the two centers add constructively
for p ·R = 0. In fact, the cosine term in the matrix el-
ements M0p, M
p and M
p simply reduces to one and
the spectrum therefore looks like that of an atom. Specif-
ically within the plateau, by symmetry the contributions
from the two centers always interfere constructively. This
results in a spectrum that is largely independent on the
internuclear separation R, except that the plateau is en-
hanced, compared with the atomic case, by the existence
of four pathways. Formally, this can be understood as
discussed above at the end of Sec. II.
For arbitrary p · R, if the electron is emitted perpen-
dicular to the laser polarization so that p · e = 0, then
we see from Eqs. (16) and (17) that B+ = B−, while
β− = β+ + π. The sum of the two exchange terms then
goes like cos(p ·R/2) for even N and like sin(p ·R/2)
for odd N . This holds regardless of the orientation of
the molecule. If the laser polarization is perpendicular to
both the electron momentum and the internuclear axis,
then B+ = B− = 0. This implies that Mp is nonzero
only for integer N . Each other electron peak is missing.
Next, we discuss the interference pattern in more detail
by analyzing the individual contributions to the transi-
tion matrix element. In Fig. 4, we separately investigate
the individual contributions to the amplitude (18). If
only M++p and M
p are taken, a very pronounced min-
inum is observed near 5Up. These matrix elements cor-
respond to the case in which the electron is ejected from
and rescatters off the same center, so that the minimum is
due to the term cos(p ·R/2). In the full spectrum |Mp|2,
however, this minimum is absent because it is filled by the
contributions from the exchange terms M−+p and M
For a given orientation of the molecule with respect to
the laser field and for fixed momentum p, Fig. 4 shows
that the contribution |M−+p |2 of the scenario in which
the electron is freed at the center C1 and rescatters off
at the center C2 is different from that of |M+−p |2 where
it is released at C2 and rescatters at C1. The same has
been observed for high-order harmonic generation in a
two-center system [13].
The imprints of interference can still be observed if
the electron is emitted away from the laser polarization
direction. This will cause, however, an overall decrease
in the photoelectron energies for both the direct and the
rescattered electrons. Examples are presented in Fig. 5.
This behavior is known from atomic ionization, and its
origin is the same in the molecular case; for a discussion,
see, e.g., Ref. [32].
0 2 4 6 8 10
energy in units of U
s) atom
molecule R=2 a.u.
molecule R=4 a.u.
FIG. 2: (Color online) Comparison of the complete ATI spec-
tra consisting of direct and rescattered electrons in the atomic
and the molecular case for internuclear distances of R = 2
a.u. and R = 4 a.u.. The molecule is aligned parallel to
the laser-polarization direction, and the electrons are emitted
in the same direction. The arrows mark the destructive in-
terferences (n = 0, 1) of the molecule for R = 2 a.u.. The
destructive interference for n = 1 is already in the plateau
region, where the role of exchange terms becomes important.
The remaining parameters are the same as in Fig. 1.
0 2 4 6 8 10
energy in units of U
s) atom
molecule R=2 a.u.
molecule R=4 a.u.
FIG. 3: (Color online) The same as Fig. 2 but with the
molecular axis perpendicular to the laser polarization. The
electrons are emitted parallel to the laser polarization.
B. Determining the internuclear distance
9 10 11
0 2 4 6 8 10
energy in units of U
FIG. 4: (Color online) Individual contributions of the var-
ious rescattering scenarios to the total amplitude (18) for
a diatomic molecule with internuclear distance R = 2 a.u.
aligned parallel to the laser-field polarization, for the same
molecular and field parameters as in Fig. 2. The electrons are
emitted in the polarization direction. The arrows mark the
respective cutoff energies for the various transition amplitude
matrix elements. The inset at the lower left is an enlargement
of the region near the cutoff where the direct terms and the
exchange terms differ in a characteristic fashion, allowing for
the determination of the internuclear separation.
-8 Ψ=0
0 2 4 6 8 10 12
energy in units of U
A || R
A ⊥ R
FIG. 5: (Color online) Electron yield for the parameters of
Fig. 1 and internuclear distance R = 2 a.u. for different
emission angles ψ with respect to the polarization of the laser
field. The molecule is aligned parallel (perpendicular) to the
laser field in the upper (lower) panel.
FIG. 6: (Color online) Angle-resolved ATI spectra on a log-
arithmic scale for a diatomic molecule with internuclear dis-
tances R = 2 a.u. and R = 3 a.u. (middle and bottom
panels, respectively), aligned parallel to the laser-field polar-
ization, compared to the single-atom case (upper panel). The
binding energy is E0 = 0.9 a.u. in all cases, and the laser fre-
quency and the ponderomotive potential are ω = 0.058 a.u.
and Up = 2.08 a.u., respectively. The plotted lines depict the
minima (solid lines) and the maxima (dashed lines) of the
energy distribution given by Eq. (11).
FIG. 7: (Color online) Angle-resolved ATI spectra on a log-
arithmic scale for a diatomic molecule with ionization poten-
tial E0 = 0.9 a.u. and internuclear distances R = 4 a.u.,
R = 6 a.u. and R = 8 a.u. (upper, middle and bottom
panels, respectively), aligned parallel to the laser-field polar-
ization. The field parameters are the same as in the previous
figure. The plotted lines depict the minima (solid lines) and
the maxima (dashed lines) of the energy distribution given by
Eq. (11).
For a complete picture of the angle-resolved ATI spec-
trum, not restricted to emission in particular directions,
we will now present density plots. While they invariably
imply loss of fine details and depend on the positioning
and gradient of the false-color scale, they give a compre-
hensive overview of the general structure. We restrict
ourselves to the case of parallel alignment [41]. In this
case the spectrum is symmetrical with respect to the in-
ternuclear axis. It is obvious that the electrons with max-
FIG. 8: (Color online) Enlargement of a limited energy-angle
region of Fig. 7 for R = 8 a.u. (lowest panel) with increased
resolution. The indents of Fig. 7 are distinctly visible as val-
leys deeply cut into the high ridge that precedes the cutoff.
imal kinetic energy will be detected in the direction of the
laser field.
The angle-resolved spectra displayed in Figs. 6 and 7
are very intricate and do not exhibit any simple struc-
tures. They depend strongly on the internuclear sep-
aration but do not, on a first inspection, lend them-
selves in any obvious way to the assignment of a specific
value of R to a given spectrum. Especially, owing to the
presence and magnitude of the exchange terms (14) and
(15), the two-center interference, which is expressed in
the cos(p ·R/2) term, is not immediately visible. How-
ever, looking more closely, one can observe a very distinct
manifestation of this term just near the classical bound-
ary of the spectrum. Roughly, the latter agrees with the
boundaries of the colored areas in the various panels of
Figs. 6 and 7. We observe well-defined indents in the
overall smooth curve that defines the classical boundary.
The positions of these indents and, especially, their sepa-
rations agree quite well with the interference minima pre-
dicted by Eq. (11). The figures show that the separation
δE between the indents (on the scale of Up) monotoni-
cally decreases with increasing R. Hence, by comparing a
measured angle-resolved spectrum with Figs. 6 and 7 we
can infer the internuclear separation. For the parameters
underlying Figs. 6 and 7, the resulting function R(δE) is
given in Table I. Fig. 8 exhibits an enlargement of the
relevant area around the classical cutoff for the case of
R = 8 a.u. with a higher resolution of the electron yield.
The indents are very clearly visible like valleys that cut
into the drop of a plateau on a topographical map.
An analytical formula for R(δE) can be gained from
an analytical formula for the classical cutoff energy E(θ)
as a function of the angle θ. Intersecting this with the
energies of the interference minima given by Eq. (11) al-
lows one to determine the function R(δE) in dependence
of the parameters of the problem. For the case of an
atom, such a formula for E(θ) is actually known [39]. At
least for R ≤ 6 a.u., Figs. 6 and 7 show that this classical
boundary does not depend very strongly on the inter-
nuclear separation R, so that the atomic result could be
employed. However, even with this simplification, the re-
sulting formula is quite complicated and we refrain from
presenting it here.
The question arises of why near the classical bound-
ary the interference term cos(p ·R/2) roughly multiplies
the angle-resolved spectrum, like it does for the direct
electrons. The answer can be inferred from Fig. 4. The
total ionization amplitude Mp is the superposition (13)
of four different scenarios such that the electron starts
from and rescatters off one or the other center. The two
contributions (13) where they start from and rescatter
off the same center are identical except for the geomet-
rical phase, which leads to the cosine in Eq. (13). In
contrast, the other two contributions (14) and (15) are
uncorrelated since they are generated by geometrically
different scenarios. Their magnitudes are different and
almost nowhere do they exhibit a significant construc-
tive interference. The two contributions (13) are individ-
ually large when the long orbit and the short orbit add
constructively, as is the case specifically just before the
classical cutoff. In this case, they dominate the other
two terms (14) and (15) by a factor of the order of 2 to
4. Hence the complete spectrum distinctly exhibits the
geometrical interference, which is expressed in the factor
cos(p ·R/2).
Internuclear distances Energy differences of the indents
of the molecule at the spectral boundary
R = 2 a.u. δE ≈ 4.5Up
R = 3 a.u. δE ≈ 3.2Up
R = 4 a.u. δE ≈ 2.4Up
R = 6 a.u. δE ≈ 1.3Up
R = 8 a.u. δE ≈ 1.1Up
TABLE I: Energy differences between adjacent indents
around the classical boundary of the angle-resolved spectrum
of Fig. 6. The differences are taken, for each internuclear
separation, by starting with the first indent for Ψ ≥ 0◦ as a
function of the energy.
IV. CONCLUSIONS
We have analyzed ATI spectra for a two-center
molecule in a linearly polarized laser field. The terms
of the two-center wave function contributing to the in-
terference structure within the SFA formalism could be
identified as well as the absence of the interference struc-
ture throughout most of the plateau region. We have
shown that the angle-resolved spectra can be used to de-
termine the internuclear distance of a molecule aligned
with the laser field, by reading off the energy differences
between subsequent interference minima at the classical
boundary of the spectrum.
The validity of this method depends upon how close
to reality is the angle-resolved spectrum calculated for
our model molecule. Certainly, the spectrum of the di-
rect electrons cannot be trusted for this purpose. How-
ever, high-order ATI of an atom is well described by the
SFA and a zero-range potential, especially near the classi-
cal boundary [32]; for a comparison of spectra calculated
from the SFA with the solution of the time-dependent
Schrödinger equation, see Ref. [40] for the case of an
atom. Experimentally, application of the method re-
quires a high detection efficiency that allows one to ob-
tain a sufficient number of counts down to the classical
boundary.
The problem of how to extract the internuclear sepa-
ration from a diffraction pattern has been addressed by
a different method in Ref. [7], employing the numerical
solution of the time-dependent Schrdinger equation. In
[7], however, it is necessary to compute a radial distribu-
tion function from the diffraction intensity, whereas, with
the method discussed in this paper, one may determine
the internuclear distances directly from the spectra. The
method suggested in Ref. [7] has the advantage that it
analyzes direct electrons and, therefore, does not require
exceptionally high detection efficiency.
Finally, in a real physical system, there exist additional
effects, which have not been incorporated in this model
and may alter the interference patterns. Molecular vi-
bration, for instance, causes an intensity loss in the high-
harmonic signal [25], which may lead to a blurring in the
patterns. However, recently, numerical ATI computa-
tions in which such an effect is included have shown that
for H+2 the angle-dependent interference patterns related
to the double-slit physical picture remain distinguishable
in the case considered [17]. Generally, the amount of
blurring depends on the rigidity of the vibrational poten-
tial. Since the period of vibrations is much longer than
the laser period, with a few-cycle laser pulse our method
could be used to track a vibrational wave packet or the
dissociation of a molecule [4]. Another feature which has
not been incorporated in our model is the dependence of
the ionization potential on the internuclear distance. In
fact, we have taken E0 to be constant, whereas, in reality,
it decreases with R [13]. This feature, however, will only
cause an overall energy shift in the spectra. Therefore,
it will not affect the distance between two consecutive
minima or maxima in the patterns for constant internu-
clear distance (Figs. 6 and 7). Therefore, we expect our
method to be applicable to real physical systems and a
wide parameter range.
Acknowledgments
C.F.M.F. would like to thank L.E. Chipperfield, R.
Torres, and J.P. Marangos for useful discussions and the
UK Engineering and Physical Sciences Research Council
(Advanced Fellowship, grant No. EP/D07309X/1) for
financial support
[1] P. B. Corkum, Phys. Rev. Lett. 71, 1994 (1993); K. C.
Kulander, K. J. Schafer, and J. L. Krause in: B. Piraux et
al. eds., Proceedings of the SILAP conference, (Plenum,
New York, 1993).
[2] A. Scrinzi, M. Y. Ivanov, R. Kienberger, and D. M. Vil-
leneuve, J. Phys. B 39, R1 (2006).
[3] S. Baker, J. Robinson, C. Haworth, H. Teng, R. Smith,
C. Chirilă, M. Lein, J. Tisch, and J. P. Marangos, Science
312, 424 (2006).
[4] H. Niikura, F. Légaré, R. Hasbani, A. D. Bandrauk, M.
Yu. Ivanov, D. M. Villeneuve, and P. B. Corkum, Nature
417, 917 (2002); H. Niikura, F. Légaré, R. Hasbani, M.
Yu. Ivanov, D. M. Villeneuve, and P. B. Corkum, Nature
421, 826 (2003); E. Goll, G. Wunner, and A. Saenz, Phys.
Rev. Lett. 97, 103003 (2006); F. Légaré, K. F. Lee, A.
D. Bandrauk, D. M. Villeneuve, and P. B. Corkum, J.
Phys. B 39, S503 (2006).
[5] J. Itatani, J. Levesque, D. Zeidler, H. Niikura, H. Pépin,
J.C. Kieffer, P. B. Corkum, and D. M. Villeneuve, Nature
432, 867 (2004).
[6] T. Kanai, S. Minemoto, and H. Sakai, Nature 435, 470
(2005).
[7] S. X. Hu and L. A. Collins, Phys. Rev. Lett. 94, 073004
(2005).
[8] For experiments see, e.g., N. Hay, R. de Nalda, T. Half-
mann, K. J. Mendham, M. B. Mason, M. Castillejo, and
J. P. Marangos, Phys. Rev. A 62, 041803(R)(2000); C.
Altucci, R. Velotta, E. Heesel, E. Springate, J. P. Maran-
gos, C. Vozzi, E. Benedetti, F. Calegari, G. Sansone, S.
Stagira, M. Nisoli, and V. Tosa, Phys. Rev. A 73, 043411
(2006); H. Ohmura, F. Ito, and M. Tachyia, Phys. Rev.
A 74, 043410 (2006); and for theory T. K. Kjeldsen, C.
Z. Bisgaard, L. B. Madsen, and H. Stapelfeld, Phys. Rev.
A 71, 013418 (2005).
[9] B. Shan, X. M. Tong, Z. Zhao, Z. Chang, and C. D.
Lin, Phys. Rev. A 66, 061401(R) (2002); F. Grasbon,
G. G. Paulus, S. L. Chin, H. Walther, J. Muth-Böhm,
A. Becker, and F. H. M. Faisal, Phys. Rev. A 63,
041402(R)(2001); C. Altucci, R. Velotta, J. P. Maran-
gos, E. Heesel, E. Springate, M. Pascolini, L. Poletto,
P. Villoresi, C. Vozzi, G. Sansone, M. Anscombe, J. P.
Caumes, S. Stagira, and M. Nisoli, Phys. Rev. A 71,
013409 (2005).
[10] J. Muth-Böhm, A. Becker, and F. H. M. Faisal, Phys.
Rev. Lett. 85, 2280 (2000); A. Jarón-Becker, A. Becker,
and F. H. M. Faisal, Phys. Rev. A 69, 023410 (2004)
[11] M. Lein, N. Hay, R. Velotta, J. P. Marangos, and P. L.
Knight, Phys. Rev. Lett. 88, 183903 (2002); Phys. Rev.
A 66, 023805 (2002); M. Lein, J. P. Marangos, and P. L.
Knight, Phys. Rev. A 66, 051404(R) (2002); M. Spanner,
O. Smirnova, P. B. Corkum, and M. Y. Ivanov, J. Phys.
B 37, L243 (2004).
[12] D. A. Telnov and Shih-I Chu, Phys. Rev. A 71, 013408
(2005); G. Lagmago Kamta and A. D. Bandrauk, Phys.
Rev. A 71, 053407 (2005).
[13] R. Kopold, W. Becker, and M. Kleber, Phys. Rev. A 58,
4022 (1998).
[14] C. Guo, M. Li, J. P. Nibarger, and G. N. Gibson, Phys.
Rev. A 58, R4271 (1998); M. J. DeWitt, E. Wells, and R.
R. Jones, Phys. Rev. Lett. 87, 153001 (2001); E. Wells,
M. J. DeWitt, and R. R. Jones, Phys. Rev. A 66, 013409
(2002); I. V. Litvinyuk, K.F. Lee, P.W. Dooley, D.M.
Rayner, D.M. Villeneuve, and P.B. Corkum, Phys. Rev.
Lett. 90, 233003 (2003).
[15] T. K. Kjeldsen and L. B. Madsen, J. Phys. B 37, 2033
(2004); Phys. Rev. A 73, 047401 (2006).
[16] C. P. J. Martiny, and L. B. Madsen, Phys. Rev. Lett.
97, 093001 (2006); ibid. 97, 169903 (2006); S. Baier, C.
Ruiz, L. Plaja, and A. Becker, Phys. Rev. A 74, 033405
(2006).
[17] S. Seltsø, J. F. McCann, M. Førre, J. P. Hansen, and L.
B. Madsen, Phys. Rev. A 73, 033407 (2006).
[18] M. Lein, P. P. Corso, J. P. Marangos, and P. L. Knight,
Phys. Rev. A 67, 023819 (2003).
[19] V. I. Usachenko, P. E. Pyak, and S.-I Chu, Laser Phys.
16, 1326 (2006).
[20] V. I. Usachenko, and S.-I Chu, Phys. Rev. A 71, 063410
(2005); V. I. Usachenko, Phys. Rev. A 73, 047402 (2006).
[21] H. Hetzheim, M. Sc. thesis (Humboldt Universität zu
Berlin, 2005).
[22] C. C. Chirilă and M. Lein, Phys. Rev. A 73, 023410
(2006); ibid. 74, 051401(R) (2006).
[23] X. Zhou, X. M. Tong, Z. X. Zhao, and C. D. Lin, Phys.
Rev. A 71, 061801(R) (2005); ibid. 72, 033412 (2005).
[24] T. K. Kjeldsen and L. B. Madsen, Phys. Rev. A 71,
023411 (2005); Phys. Rev. Lett. 95, 073004 (2005); C.
B. Madsen and L. B. Madsen, Phys. Rev. A 74, 023406
(2006).
[25] M. Lein, Phys. Rev. Lett. 94, 053004 (2005); C. C.
Chirilă and M. Lein, J. Phys. B 39, S437 (2006).
[26] A. Requate, A. Becker, and F. H. M. Faisal, Phys. Rev.
A 73, 033406 (2006).
[27] D. B. Milošević, Phys. Rev. A 74, 063404 (2006).
[28] P. Salières, B. Carré, L. LeDéroff, F. Grasbon, G.
G. Paulus, H. Walther, R. Kopold, W. Becker, D. B.
Milošević, A. Sanpera, and M. Lewenstein, Science 292,
902 (2001).
[29] The SFA consists in neglecting the atomic binding poten-
tials when the electron is in the continuum, the external
laser field when the electron is bound, and the internal
structure of the molecule.
[30] L. V. Keldysh, Sov. Phys. JETP 20, 1307 (1964); F. H.
M. Faisal, J. Phys. B 6, L89 (1973); H. R. Reiss, Phys.
Rev. A 22, 1786 (1980).
[31] A. Perelomov, V. Popov, and M. Terent’ev, JETP 23,
924 (1966).
[32] A. Lohr, M. Kleber, R. Kopold, and W. Becker, Phys.
Rev. A 55, R4003 (1997).
[33] W. Becker, S. Long, and J. K. McIver Phys. Rev. A 50,
1540 (1994); M. Lewenstein, Ph. Balcou, M. Yu. Ivanov,
A. L’Huillier, and P. B. Corkum Phys. Rev. A 49, 2117
(1994); W. Becker, A. Lohr, M. Kleber, and M. Lewen-
stein, Phys. Rev. A 56, 645 (1997).
[34] E. Fermi, Ric. Sci. 7, 13 (1936).
[35] P. Krstić, D. B. Milošević, and R. Janev, Phys. Rev. A
44, 3089 (1991).
[36] C. Figueira de Morisson Faria, H. Schomerus, and W.
Becker, Phys. Rev. A 66, 043413 (2002).
[37] In reality, the ionization potential of a molecule decreases
with increasing internuclear distance. This effect, how-
ever, will only cause an overall shift in the interference
patterns. Since it will neither modify their shapes, nor the
energy difference between neighboring maxima, it is not
relevant to the present discussion. For the specific com-
putation of this shift within the context of a zero-range
model potential see, e.g., Ref. [13].
[38] W. Becker, F. Grasbon, R. Kopold, D. B. Milošević, G.
G. Paulus, and H. Walther, Adv. At. Mol. Opt. Phys.
48, 36 (2002).
[39] E. Hasović, M. Busuladžić, A. Gasibegović-Busuladžić,
D. B. Milošević, and W. Becker, Laser Phys. 17, 376
(2007).
[40] D. Bauer, D. B. Milošević, and W. Becker, J. Mod. Opt.
53, 135 (2006).
[41] If one changes the orientation of the molecule, the sit-
uation becomes different. For small angles the electrons
with the maximal energy are still observable in the direc-
tion of the laser field, but there exists a further maximum
of electrons with a certain momentum in the opposite di-
rection as a result of the alignment of the molecule [21].
With increasing angle of the molecular axis with respect
to the laser polarization direction, this local maximum
will move over the entire angle-resolved ATI spectrum.
ABSTRACT
  We calculate angle-resolved above-threshold ionization spectra for diatomic
molecules in linearly polarized laser fields, employing the strong-field
approximation. The interference structure resulting from the individual
contributions of the different scattering scenarios is discussed in detail,
with respect to the dependence on the internuclear distance and molecular
orientation. We show that, in general, the contributions from the processes in
which the electron is freed at one center and rescatters off the other obscure
the interference maxima and minima obtained from single-center processes.
However, around the boundary of the energy regions for which rescattering has a
classical counterpart, such processes play a negligible role and very clear
interference patterns are observed. In such energy regions, one is able to
infer the internuclear distance from the energy difference between adjacent
interference minima.

<|endoftext|><|startoftext|>
Néel order in the two-dimensional S = 1
-Heisenberg Model
Ute Löw1
Theoretische Physik, Universität zu Köln, Zülpicher Str.77, 50937 Köln, Germany
(Dated: November 1, 2018)
The existence of Néel order in the S = 1
Heisenberg model on the square lattice at T = 0 is
shown using inequalities set up by Kennedy, Lieb and Shastry in combination with high precision
Quantum Monte Carlo data.
The ground state order of quantum spin systems, in
particular the issue whether the ground state shows long
range magnetic order, has attracted long and continuous
interest. For the prototype of spin models, the antiferro-
magnetic Heisenberg model, the existence of Néel order
at low temperatures was proved in the seminal paper of
Dyson, Lieb and Simon [1] in 1978 for spin S ≥ 1 and
spatial dimension d ≥ 3 and also for S = 1
and d > 3.
Ten years later Kennedy, Lieb and Shastry [2] showed
that also for S = 1
and d = 3 Néel order in the ground
state exists.
The situation in two dimensions is different and more
subtle, since the Mermin-Wagner-Hohenberg theorem
forbids Néel order at finite T , leaving open however the
possibility of Néel order in the ground state. The exis-
tence of Néel order for the two-dimensional model and
S ≥ 1 was shown in [3, 4] and later in [2] by an indepen-
dent derivation of the relevant inequality at T = 0.
However the inequalities sufficient to show Néel order
for S = 1 in the two-dimensional case are not sufficient
to construct an analogous proof for S = 1
. Thus the
case of S = 1
remains an open problem. Still it is pos-
sible to derive inequalities concerning spin-spin correla-
tions at short distances [2] which are violated if Néel order
is present. That is, with a minimum of numerical infor-
mation, the question of Néel order in the ground state
can be decided.
The issue of this paper is to evaluate the spin-spin cor-
relations of the two-dimensional S = 1
antiferromagnetic
Heisenberg model at short distances and demonstrate
that these results combined with the analytic expres-
sions of [2] show the existence of Néel order in the two-
dimensional S = 1
antiferromagnetic Heisenberg model
at T = 0. Such a study has become possible, due to the
developement of high precision Monte-Carlo techniques
over the last decade.
In Ref.[2] Kennedy, Lieb and Shastry used data of
Gross, Sanchez-Velasco and Siggia [5] for a comparison,
however these data clearly deviate from the results pre-
sented here. The authors of [5] used a Quantum Monte
Carlo method without loop updates and with discrete
Trotter time (see below). Their data served only as a
crude comparison to extrapolated Lanczos data and data
produced by the Neumann-Ulam method, which were
the best algorithms to study the properties of the two-
dimensional Heisenberg model in 1988. Today modern
loop algortihms by far outreach both methods.
As will be shown in the following an accurate evalua-
tion of correlation functions at short distances is possible
with modern Quantum Monte Carlo methods, which al-
low us to compute expectation values at very low temper-
atures and even though the short distance results have a
certain finite size and finite temperature correction, these
uncertainties are well controlled and allow to draw defi-
nite conclusions.
The approach and intention of this paper is diffrent
from a completely numerical evaluation of e.g. the corre-
lation length, which involves a calculation of correlations
at long distances and an appropriate extrapolation to in-
finite distances, which cannot be used as a proof of long
range order in any rigorous sense.
At first sight a ”Quantum Monte Carlo algorithm”
seems a puzzling concept, since an important step in
any Monte-Carlo-method is the evaluation of Boltzmann
weights for given energies of the system. For quantum
models these energies are hard if not impossible to calcu-
late. A key idea to make Monte Carlo methods applicable
to quantum systems is to map the quantum model onto a
classical model by introducing an extra dimension, usu-
ally referred to as Trotter-time [6].
In the first generation of algorithms this mapping
was straightforwardly applied to the quantum Heisen-
berg model. Though this allowed for a wealth of new
studies of the finite temperature properties in one and in
particular in two-dimensional systems, these algorithms
had two major drawbacks, which became most evident
at low temperatures. Firstly the extra Trotter dimension
was discretized, introducing the number of time slices as
a parameter which had to be eliminated from the final
results by an extrapolation. Secondly the update pro-
cedure, i.e. the construction of new independent config-
urations, was done locally. As a consequence one had
to move through the lattice site by site several times to
obtain a configuration independent of the starting config-
uration and useful for a new evaluation of an observable.
A first improvement was introduced by the so called
loop-algorithms [7], which uses nonlocal updates similar
to the Swendsen-Wang algorithm for classical models. A
second and important step towards high precision Quan-
tum Monte Carlo techniques were algorithms which work
directly in the Euclidian time continuum [8] and require
no extrapolation in Trotter time. For the algorithm [9]
used for the analysis presented here no approximations
enter, and statistical errors are the only source of inac-
curacy.
Since this work intends to produce highly accurate
http://arxiv.org/abs/0704.0713v1
data it seems appropriate to assess the precision of the
method by a comparison with exact results. The best
candidate for such a comparison are the correlations of
one-dimensional systems evaluated by the Bethe-ansatz
with almost arbitrary precision up to distance seven [10]
and with results for finite chains from Ref.[11]. This is
done in the Appendix for chains of 400 sites at T=0.005.
After these introductory remarks we now return to our
actual goal, which is the two-dimensional system. Our
starting point is a S = 1
Heisenberg model
x,yεΛ
~Sx~Sy (1)
with nearest neighbour interaction on a finite square lat-
tice Λ with an even number of sites in every direction
and periodic boundary conditions.
The Fourier transform of the spin-spin correlation
function at T = 0 is given by
gq = 〈S−qSq〉 =
e−iqx〈S30S
x〉 (2)
where
e−iqxS3x. (3)
For the corresponding finite temperature expectation
value of gq an upper bound fq was derived in [1]. The
T = 0 limit of this bound was obtained in Ref.[4] and
a direct proof of the bound at T = 0 was given in [2].
Following the notation and arguments of Kennedy, Lieb
and Shastry [2] the inequality for d = 2 reads
gq ≤ fq for q 6= Q (4)
where fq =
12Eq−Q
, Eq = 2 − cos q1 − cos q2, Q =
(π, π) and −e0 is the ground state energy per site of the
Heisenberg model Eq.1 on the lattice Λ.
The fundamental idea is, that the existence of Néel
order in the limit of infinite system size corresponds to a
delta-function in the Fourier transform of the spin-spin
correlation gq at Q. This means, if Eq. 4 is integrated
over the whole Brillouin zone one finds in the case of Néel
order
d2q fq ≥
d2q gq = S(S + 1)/3 (5)
where m2 is the coefficient of the delta-function at Q.
If there is no Néel order m2 is zero. By numerically
evaluating the integral over fq, and by using exact varia-
tional upper and lower bounds on the ground state energy
−e0 one sees, that the above inequality and its analogon
for d ≥ 3 cannot be fulfilled with m2 = 0 and S ≥ 1,
which proves Néel order.
Inequalities of type Eq. 5 are not sufficient to prove
the existence of a nonzero m2 for d = 2, 3 and S = 1
but a new relation is obtained by multiplying gq by cos qi
and again integrating over the Brillouin zone:
ddq gq cos qi = 〈S
〉 = −e0/3d (6)
with i=1,2 for d=2 and i=1,2,3 for d=3 and δi the unit
vector in i-direction and the value of the ground state
energy form Ref.[12] is e0 = 0.669437(5).
Carrying out an analogous integral over fq and using
again Eq.4 one finds:
d2Eq−Q
cos qi
were the f+ means the positive part of a function, which
equals f, when f is positive and is zero otherwise.
Again Eq.7, which is valid if no Néel order exists, was
shown to be violated for d = 3 and S = 1
in Ref.[2] by
using bounds on e0 and thus the existence of Néel order
was proved also for d = 3 and S = 1
For S = 1
and d = 2 one cannot construct a contra-
diction by using only the ground state energy. Here more
input from numerical data is needed. This can be incor-
porated by multiplying gq by cos(mqi) with m = 2, 3...
and again integrating over the whole Brillouin zone:
d2q gq cos(mqi) = 〈S
〉 (8)
with i=1,2.
Next, defining ḡ(n) as
ḡ(n) =
(−1)m〈S30S
〉 (9)
and using again inequality 4 one constructs the follow-
ing relations involving the correlation functions:
ḡ(n) =
2n+ 2
(−1)m{cos(mq1) + cos(mq2)} gq
2n+ 2
(−1)m{cos(mq1) + cos(mq2)}+ fq.
Whenever the inequality Eq. 10 is violated for a cer-
tain n, a nonzero m2 multiplying the delta-function at
Q is needed and therefore the existence of Néel order is
proved.
The ḡ(n) as defined in Eq.9 were calculated by the
QuantumMonte Carlo method [9]. The results, displayed
in table I, show that the ḡ(n) calculated by Quantum
Monte Carlo cross the bound obtained by integrating
over fq at n = 8. This is also depicted in Fig. 1. Thus
inequality Eq.10 is violated and Néel order must exists in
n Bound T = 0.005 T = 0.025 T = 0.075
1 2.297e-01 1.80799e-01 ± 3.63e-06 1.80794e-01 1.80792e-01
2 1.714e-01 1.40308e-01 ± 5.63e-06 1.40302e-01 1.40298e-01
3 1.383e-01 1.17686e-01 ± 6.84e-06 1.17678e-01 1.17670e-01
4 1.166e-01 1.03005e-01 ± 7.67e-06 1.02997e-01 1.02985e-01
5 1.013e-01 9.27815e-02 ± 8.27e-06 9.27743e-02 9.27544e-02
6 8.990e-02 8.52115e-02 ± 8.73e-06 8.52048e-02 8.51770e-02
7 8.107e-02 7.93875e-02 ± 9.10e-06 7.93811e-02 7.93436e-02
8 7.400e-02 7.47551e-02 ± 9.40e-06 7.47496e-02 7.47012e-02
9 6.820e-02 7.09844e-02 ± 9.64e-06 7.09795e-02 7.09191e-02
10 6.334e-02 6.78504e-02 ± 9.85e-06 6.78464e-02 6.77734e-02
11 5.921e-02 6.52055e-02 ± 1.00e-05 6.52021e-02 6.51163e-02
12 5.563e-02 6.29418e-02 ± 1.02e-05 6.29389e-02 6.28404e-02
13 5.252e-02 6.09835e-02 ± 1.03e-05 6.09806e-02 6.08695e-02
14 4.976e-02 5.92718e-02 ± 1.04e-05 5.92692e-02 5.91456e-02
15 4.732e-02 5.77638e-02 ± 1.06e-05 5.77617e-02 5.76255e-02
TABLE I: Bound obtained by integrating numerically over
the right hand side of Eq.10 compared with ḡ(n) for a 40×40
lattice and different temperatures.
the two-dimensional antiferromagnetic Heisenberg model
with S = 1
at T = 0.
There are three type of corrections to the data of table
I, which need to be taken into account, but which, as
we shall show in the following, do not change the above
conclusion of a crossing of the curves at n = 8:
(i) effects of finite temperature,
(ii) effects of the finiteness of the system,
(iii) statistical errors.
In the following we comment on how these corrections
modify the data.
(i) The Quantum Monte Carlo data presented are at
T ≥ 0.005. The overall effect of finite temperature is to
lower the absolute value of the correlations and therefore
also the value of the ḡ(n). The effect of finite temperature
is to shift the crossing of the bound and ḡ(n) to larger n,
or eventually to destroy a crossing completely.
The functional dependence of the internal energy
U(T ), which up to an overall factor 3z (z = 2 is the
coordination number of the two-dimensional square lat-
tice) equals the correlation-function at distance one, has
been determined for low T by spin wave theory [13, 14]
U(T ) = −e0 + bT
3. (11)
The coefficient is given in [15] as b =
≈ 0.2853626,
so the correction for distance one is ≈ b
10−7, which is
two orders of magnitude smaller than the statistical error,
(see point (iii)).
For distances larger than one, we fitted the data as
a function of temperature (taking the exponent of T as
n Bound T = 0.025 T=0.025 extrapolated
1 2.297e-01 1.80794e-01 1.80791e-01 ± 5.09e-06
2 1.714e-01 1.40302e-01 1.40295e-01 ± 7.87e-06
3 1.383e-01 1.17678e-01 1.17668e-01 ± 9.53e-06
4 1.166e-01 1.02997e-01 1.02983e-01 ± 1.07e-05
5 1.013e-01 9.27743e-02 9.27534e-02 ± 1.15e-05
6 8.990e-02 8.52048e-02 8.51762e-02 ± 1.21e-05
7 8.107e-02 7.93811e-02 7.93428e-02 ± 1.26e-05
8 7.400e-02 7.47496e-02 7.46995e-02 ± 1.30e-05
9 6.820e-02 7.09795e-02 7.09154e-02 ± 1.34e-05
10 6.334e-02 6.78464e-02 6.77663e-02 ± 1.37e-05
11 5.921e-02 6.52021e-02 6.51035e-02 ± 1.39e-05
12 5.563e-02 6.29389e-02 6.28188e-02 ± 1.41e-05
13 5.252e-02 6.09806e-02 6.08346e-02 ± 1.43e-05
14 4.976e-02 5.92692e-02 5.90923e-02 ± 1.45e-05
15 4.732e-02 5.77617e-02 5.75473e-02 ± 1.46e-05
TABLE II: Bound obtained by integrating numerically over
the right hand side of Eq.10 compared with ḡ(n) extrapolated
for N=40,36,32,24 at T = 0.025.
fit parameter) for T = 0.005, 0.025, 0.05, 0.075 and found
the corrections due to finite temperature all of the order
of 10−5, which is the order of the statistical error. There-
fore we do not give any finite temperature corrections.
(ii) The absolute value of the correlations in the ther-
modynamic limit are smaller than in systems of finite
size. This means that the effect of finite system size is
opposite to the effect of temperature. The finite size be-
haviour of the ground state energy is well studied for
the Heisenberg model on the square lattice. Arguments
originating from the quantum nonlinear sigma model de-
scription [16] of the Heisenberg model to lowest order in
system size give
− e0 = −e0(N) +
, with c > 0 (12)
where −e0(N) is the ground state energy of a system of
size N ×N . Though the corrections are not substantial,
they do effect the results, and taking into account, that
the finite size errors in contrast to the finite temperature
effects, might falsely lead to a crossing, we extrapolated
the data for N = 24...40 using the functional dependence
Eq.12, which we found well satisfied also for larger dis-
tances. The results are shown in table II. One sees that
the numeric values are changed but the crossing point is
still at n = 8.
(iii) We compute ∆x = 1√
〈x2〉 − 〈x〉2 (where the
observable x stands for the value of a correlation at a
given distance, temperature and system size and NMC is
the number of Monte Carlo iterations), which is a reliable
estimate for the statistical error of the mean value 〈x〉,
since for the algorithm of Ref.[9] the autocorrelation time
is of order one and the Monte Carlo configurations are al-
most independent. To assess the quality of our error anal-
0 5 10 15 20
bound 
 24x24  
 36x36 
 40x40 
FIG. 1: Bound on ḡ(n) obtained from Eq.9 and ḡ(n) for 24×
24, 36× 36 and 40× 40 at T = 0.025.
Distance Quantum Monte Carlo Bethe-Ansatz
0 0.25000000 ( 0)
1 -0.14771586 (198) -0.1477157268
2 0.06067787 (324) 0.0606797699
3 -0.05024194 (282) -0.0502486272
4 0.03464515 (281) 0.0346527769
5 -0.03088096 (260) -0.0308903666
6 0.02443619 (255) 0.0244467383
7 -0.02248413 (242) -0.0224982227
8 0.01895736 (236)
TABLE III: Correlations for a chain with N = 400 sites at
T/J=0.005 compared with results from Ref. [10].
ysis we also returned to the case of the one-dimensional
antiferromagnetic Heisenberg model ( see Appendix ) and
compared results with independent streams of random
numbers.
To calculate an upper limit to the errors of ḡ(n), the er-
rors of the correlations where added up ( being evaluated
with the same configurations, they are not independent).
To conclude, the error analysis shows that the short
range correlations entering Eq.9 were determined with
sufficiently high accuracy to prove the existence of a
crossing of the bound and the Quantum Monte Carlo
data for ḡ(n) at n = 8 and therefore to show the
existence of long range order.
Appendix
(1) In this Appendix we list the correlations of a one-
dimensional Heisenberg model with periodic boundary
conditions and chain length N = 400 at T=0.005 com-
pared with results of Ref.[10] for infinite chain length and
T = 0.
For the internal energy U(T ) of the Heisenberg chain
the temperature dependence for low T is U(T ) = −e10 +
aT 2 with the ground-state energy e10 = 0.4431471804
for 400 sites and e10 = −
+ ln 2 for the infinite size
system[17]. and the coefficient a = 1
given in Ref.
[18, 19]. This means that the correction for the corre-
lations in tableIII due to finite temperatures are of the
order of 10−5.
(2) The exact values of the correlation functions
[11, 20] for distance one and two at T = 0 for a chain
with 400 sites are 〈S30S
1〉400 = −0.147717441765735 and
〈S30S
2〉400 = 0.0606813790491800. The above data show
that the error analysis concerning statistical errors and
finite temperature effects is consistent.
Acknowledgement
I am indebted to Prof. E.H. Lieb for bringing the prob-
lem of longrange order to my attention and for his interest
in this work.
[1] F.J. Dyson, E.H. Lieb and B. Simon, J.Stat.Phys. 18
335-383 (1978).
[2] T. Kennedy, E. H.Lieb and S. Shastry, J.Stat.Phys. 53,
1019-1030,(1988).
[3] I. Affleck, T. Kennedy, E.H. Lieb and H. Tasaki, Comm.
Math. Phys. 115:477-528.
[4] E. Jordão Neves and J. Fernando Perez, Phys.Lett.114A
331-333 (1986).
[5] M. Gross, E. Sanchez-Velasco, and E. Siggia, Phys.Rev.B
39 2484(1989).
[6] M. Suzuki, Commun. Math. Phys. 51, (1976).
[7] H.G. Evertz, G. Lana, M.Marcu, Phys. Rev. Lett 70, 875
(1993).
[8] E. Farhi and S. Gutmann, Ann.Phys. (N.Y.)213, 182
(1992).
[9] B. B. Beard and U. -J. Wiese, Phys. Rev. Lett. 77, 5130
(1996).
[10] J. Sato, M. Shiroishi, M. Takahashi hep-th/0507290.
[11] J.Damerau, F.Göhmann, N.P.Hasenclever, A.Klümper,
cond-mat/0701463.
[12] A. W. Sandvik, Phys.Rev.B56 (1997) 11678.
[13] R. Kubo, Phys.Rev.87, 568 (1952).
[14] T. Oguchi, Phys. Rev.117, 117 (1960).
[15] M. Takahashi, Phys. Rev.B 40, 2494 (1989).
[16] S. Chakravaty, B.I. Halperin, D.R. Nelson,
Phys.Rev.B39,2344(1989).
[17] L. Hulthén, Arkiv Mat.Astron.Fysik 26A,1 (1938).
[18] H.M. Babujian, Nucl.Phys. B215, 317 (1982).
[19] I. Affleck, Phys.Rev.Lett. 56,746 (1986).
[20] J. Damerau, private communication.
ABSTRACT
  The existence of Neel order in the S=1/2 Heisenberg model on the square
lattice at T=0 is shown using inequalities set up by Kennedy, Lieb and Shastry
in combination with high precision Quantum Monte Carlo data.

<|endoftext|><|startoftext|>
Introduction
Let (M, g) be a compact Riemannian manifold. The Perelman λ-functional
(1.1) λM(g) = inf
f∈C∞(M)
{F(g, f) :
e−fdvolg = 1}
where F(g, f) =
(Rg + |∇f |2)e−fdvolg and Rg is the scalar curvature of g. Note
that λM(g) is the lowest eigenvalue of the operator −4△ + Rg. By [Pe1] the gradient
flow of the Perelman λ-functional is the Hamilton’s the Ricci-flow evolution equation
(1.2)
g(t) = −2Ric(g(t))
The normalized Ricci flow equation on an n-manifold M reads
(1.3)
g(t) = −2Ric(g(t)) + 2R
where Ric (resp. R) denotes the Ricci tensor (resp. the average scalar curvature
). Note that (1.2) and (1.3) differ only by a change of scale in space and time,
and the volume Vol(g(t)) is constant in t. If dimM = n, λM(g) = λM(g)Volg(M)
is invariant up to rescaling the metric. Perelman [Pe1] has proved that λM(g(t)) is
non-decreasing along the Ricci flow g(t) whenever λM(g(t)) ≤ 0. This leads to the
The first author was supported by NSF Grant 19925104 of China, 973 project of Foundation Science
of China, and the Capital Normal University.
http://arxiv.org/abs/0704.0714v1
2 F. FANG, Y. ZHANG, AND Z. ZHANG
Perelman invariant λM by taking supremum of λM(g) in the set of all Riemannian
metrics on M .
By [AIL] the Perelman invariant λM is equal to the Yamabe invariant whenever
λM ≤ 0, after the earlier estimations (cf. [An5] [Pe2] [Le4] [FZ] and [Kot]). In
particular, if (M, g) is a smooth compact oriented 4-manifold with a Spinc-structure
c which is a monopole class (i.e., the associated Seiberg-Witten equation possesses
an irreducible solution) so that that c21(c)[M ] > 0, by [FZ] λM ≤ −
32π2c21(c)[M ].
Moreover, g is a Kähler-Einstein metric of negative scalar curvature if and only if
λM(g) = −
32π2c21(c)[M ]. However, there are plenty of 4-manifolds where the Perel-
man invariant λM = −
32π2c21(c)[M ] but do not admit any Kähler Einstein metric.
It is natural to study 4-manifolds with these extremal property. For such a 4-manifold
M , to seek for an ”optimal” Riemannian metric on M with respect to the Perelman
functional λM : M → R, we want to consider a maximal solution g(t) which is a solu-
tion of the Ricci flow (1.3). We call a longtime solution g(t), t ∈ [0,+∞), to the Ricci
flow (1.3) a maximum solution if lim
λM(g(t)) = λM . For a compact 3-manifold, by
Perelman [Pe2] all solutions of the Ricci flow (1.2) with surgery exist for longtime and
are maximum solutions, provided λM ≤ 0. In the paper [FZZ] obstructions are found
for the longtime solutions with bounded curvature to (1.3).
In this paper we are going to study the maximum solutions of (1.3) with bounded
Ricci curvatures instead. To avoid technique terminology we only state our results
for symplectic 4-manifolds by using the celebrated work of Taubes [Ta]: if (M,ω) is
a compact symplectic manifold with b+2 (M) > 1 (the dimension of self-dual harmonic
2-forms of M), the spinc-structure induced by ω is a monopole class. Moreover, in
this situation c21(c)[M ] = 2χ(M) + 3τ(M), where χ(M) (resp. τ(M)) is the Euler
characteristic (resp. signature) of M .
Theorem 1.1. Let (M,ω) be a smooth compact symplectic 4-manifold satisfying that
b+2 (M) > 1 and 2χ(M) + 3τ(M) > 0. If g(t), t ∈ [0,∞), is a solution to (1.3) such
that |Ric(g(t))| ≤ 3, and
λM(g(t)) = −
32π2(2χ(M) + 3τ(M)),
then there exists an m ∈ N, and sequences of points {xj,k ∈ M}, j = 1, · · · , m,
satisfying that, by passing to a subsequence,
(M, g(tk + t), x1,k, · · · , xm,k)
dGH−→ (
Nj, g∞, x1,∞, · · · , , xm,∞),
t ∈ [0,∞), in the m-pointed Gromov-Hausdorff sense for any sequence tk −→ ∞, where
(Nj , g∞), j = 1, · · · , m, are complete Kähler-Einstein orbifolds of complex dimension 2
with at most finitely many isolated orbifold points. The scalar curvature (resp. volume)
of g∞ is
−Volg0(M)−
32π2(2χ(M) + 3τ(M)) (resp. Volg0(M) =
Volg∞(Nj))
Moreover, the convergence is C∞ in the non-singular part of
1 Nj.
MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 3
We first remark that, if the diameters diamg(tk)(M) possess a uniform upper bound,
then m = 1, and N1 is a compact Kähler-Einstein orbifold. Secondly, if the Ricci curva-
ture bound in the above theorem is replaced by a uniform bound of sectional curvature,
then every (Nj, g∞), j = 1, · · · , m are complete Kähler-Einstein manifolds. By the
same arguments as in [An5][An6],
j=1Nj can weakly embed in M ,
j=1Nj ⊂⊂ M ,
i.e. for any compact subset K ⊂
j=1Nj , there is a smooth embedding FK : K −→ M .
Furthermore, there exists a sufficiently large compact subset K ⊂
j=1Nj such that
M\K admits an F-structure of positive rank. This type geometric decomposition seems
very useful to understand the diffeomorphism type of 4-manifolds.
Theorem 1.2. Let (M,ω) be a smooth compact symplectic 4-manifold such that b+2 (M) >
1 and let g(t), t ∈ [0,∞), be a solution to (1.3) such that |R(g(t))| ≤ 12. If in addition
χ(M) = 3τ(M) > 0, then
λM(g(t)) = −
32π2(2χ(M) + 3τ(M))
Moreover, if |Ric(g(t))| ≤ 3, the Kähler-Einstein metric g∞ in Theorem 1.1 is complex
hyperbolic.
To conclude the section we point out that the main result in Theorem 1.1 (resp.
Corollary 1.2) holds if the manifold is not symplectic but a compact oriented 4-manifold
with a monopole class c1 (i.e. with a spin
c-structure with non-vanishing Seiberg-Witten
invariant) so that c21 = 2χ(M) + 3τ(M) > 0.
2. Preliminaries
2.1. Monopole class. Let (M, g) be a compact oriented Riemannian 4-manifold with
a Spinc structure c. Let b+2 (M) denote the dimension of the space of self-dual harmonic
2-forms in M . Let S±
denote the Spinc-bundles associated to c, and let L be the
determinant line bundle of c. There is a well-defined Dirac operator
DA : Γ(S+c ) −→ Γ(S−c )
Let c : ∧∗T ∗M −→ End(S+
) denote the Clifford multiplication on the Spinc-
bundles, and, for any φ ∈ Γ(S±
), let
q(φ) = φ⊗ φ− 1
|φ|2id.
The Seiberg-Witten equations read
(2.1)
DAφ = 0
c(F+A ) = q(φ)
where A is an Hermitian connection on L, and F+A is the self-dual part of the curvature
of A.
A solution of (2.1) is called reducible if φ ≡ 0; otherwise, it is called irreducible. If
(φ,A) is a resolution of (2.1), one calculates easily that
(2.2) |F+A | =
|φ|2,
4 F. FANG, Y. ZHANG, AND Z. ZHANG
The Bochner formula reads
(2.3) 0 = −2△|φ|2 + 4|∇Aφ|2 +Rg|φ|2 + |φ|4,
where Rg is the scalar curvature of g.
The Seiberg-Witten invariant can be defined by counting the irreducible solutions
of the Seiberg-Witten equations (cf. [Le2]).
Definition 2.2. ([K1]) Let M be a smooth compact oriented 4-manifold. An element
α ∈ H2(M,Z)/torsion is called a monopole class ofM if and only if there exists a Spinc-
structure c on M with first Chern class c1 ≡ α(mod torsion), so that the Seiberg-Witten
equations have a solution for every Riemannian metric g on M .
By the celebrated work of Taubes [Ta], if (M,ω) is a compact symplectic 4-manifold
with b+2 (M) > 1, the canonical class of (M,ω) is a monopole class.
2.3. Kato’s inequality. Let (M, g) be a Riemannian Spinc-manifold of dimension n,
the following Kato inequality is useful.
Proposition 2.4. (Proposition 2.2 in [BD]) Let φ be a harmonic Spinc-spinor on
(M, g), i.e. DAφ = 0, where DA is the Dirac operator and A is an Hermitian connection
on the determinant line bundle. Then
(2.4) |∇|φ||2 ≤ n− 1
|∇Aφ|2 ≤ |∇Aφ|2
at all points where φ is non-zero. Moreover, |∇|φ||2 = |∇Aφ|2 occurs only if ∇Aφ ≡ 0.
Note that the arguments in the proof of Proposition 2.2 in [BD] can be used to
prove this proposition without any change, where the same conclusion was derived for
Spin-spinor φ. For any ǫ > 0, let |φ|2ǫ = |φ|2+ǫ2. If φ is harmonic, by above proposition,
(2.5) |∇|φ|ǫ|2 ≤
|∇|φ||2 ≤ n− 1
|∇Aφ|2 ≤ |∇Aφ|2
at points where φ(p) 6= 0. Since {p ∈ M : φ(p) 6= 0} is dense in M for harmonic φ, we
conclude that (2.5) holds everywhere in M .
2.5. Chern-Gauss-Bonnet formula and Hirzebruch signature formula. Let
(M, g) be a compact closed oriented Riemannian 4-manifold, χ(M) and τ(M) are the
Euler number and the signature of M respectively. The Chern-Gauss-Bonnet formula
and the Hirzebruch signature theorem say that
(2.6) χ(M) =
+ |Wg|2 −
|Rico|2)dvg, and
(2.7) τ(M) =
(|W+g |2 − |W−g |2)dvg,
where Rico = Ric(g)− Rg
g is the Einstein tensor, W+g and W
g are the self-dual and
anti-self-dual Weyl tensors respectively (cf. [B]). If g is a Kähler-Einstein metric, then
(2.8) R2g = 24|W+g |2,
MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 5
(cf. [Le3]) which will be used in the proof of Theorem 1.1.
By Chern-Gauss-Bonnet formula, one has an L2-bound of the curvature operator
Rm(g) by the bounds of Ricci curvature, i.e. if |Ric(g)| < C, then
(2.9)
|Rm(g)|2dvg ≤ 8π2χ(M) + C1V olg(M),
where C and C1 are constants independent of (M, g).
Let (N, g) be a complete Ricci-flat Einstein 4-manifold. Assume that
(2.10)
|Rm(g)|2dvg < ∞, and Volg(Bg(x, r)) ≥ Cr4,
for all r > 0, a point x ∈ N , and a positive constant C. By Theorem 2.11 of [N], (N, g)
is ALE. (i.e, Asymptotically Locally Euclidean space) of order 4. It is well-known that
N is asymptotic to the cone on the spherical space form S3/Γ, where Γ ⊂ SO(4) is a
finite group. The Chern-Gauss-Bonnet formula implies that
(2.11) χ(N) =
|Rm(g)|2dvg +
(cf. [N] and [An1]).
2.6. Curvature estimates for 4-manifolds. Now let’s recall a result of [CT], which
is important to the proof of Theorem 1.1. Let (M, g) be a complete Riemannian 4-
manifold. A subset U ⊂ M such that for all p ∈ U , sup
Bg(p,1)
Ric(g) ≥ −3, is called
̺-collapsed if for all p ∈ U ,
V olg(Bg(p, 1)) ≤ ̺.
By Theorem 0.1 in [CG], there is a constant ε4 such that if U is ̺-collapsed with
sectional curvature |Kg| ≤ 1 and ̺ ≤ ε4, then U carries an F-structure of positive
rank.
Theorem 2.7. (Remark 5.11 and Theorem 1.26 in [CT]) There exist constants δ > 0,
c > 0 such that: if (M, g) is a complete Riemannian 4-manifold with |Ric(g)| ≤ 3 and
|Rm(g)|2dvg ≤ C,
and if E ⊂ M is a bounded open subset such that T1(E) = {x ∈ M : dist(x, E) ≤ 1} is
ε4-collapsed with
Bg(x,1)
|Rm(g)|2dvg ≤ δ (for all T1(E)),
then ∫
|Rm(g)|2dvg ≤ cV olg(A0,1(E)),
where A0,1(E) = T1(E)\E.
6 F. FANG, Y. ZHANG, AND Z. ZHANG
3. The limiting behavior of Ricci flow
In this section we study the limiting behavior of Ricci-flow with bounded Ricci
curvatures on 4-manifolds. We will assume in this section that M is a smooth closed
oriented 4-manifold with λM < 0, and g(t), t ∈ [0,+∞), is a longtime solution of the
normalized Ricci flow (1.3) with bounded Ricci-curvature. By normalization we may
assume that |Ric(g(t))| ≤ 3. By (2.9) there is a constant C independent of t such that
|Rm(g(t))|2dvg(t) ≤ C.
Let us denoted by V the volume Volg(0)(M) = Volg(t)(M), and R̆(g(t)) = min
R(g(t))(x)
the minimum of the scalar curvature of g(t). It is easy to see that R̆(g(t)) ≤ λMV −
Lemma 3.1. (3.1.1) lim
λM(g(t)) = lim
R(g(t)) = lim
R̆(g(t)) = R∞
(3.1.2) lim
|R(g(t))−R(g(t))|dvg(t) = 0,
(3.1.3) lim
|Rico(g(t))|2dvg(t) = 0.
Proof. By Perelman [Pe1] λM(g(t)) is a non-decreasing function on t, therefore the limit
λM(g(t)) exists since λM < 0. Now let us denote by R∞ the limit lim
λM(g(t)).
Note that R∞ ≤ λMV −
2 < 0. To prove (3.1.1), we first prove that both lim
R(g(t))
and lim
R̆(g(t)) exist and take values R∞. By the same arguments as in the proof of
Proposition 2.6 and Lemma 2.7 of [FZZ] we get that
R(g(t))− R̆(g(t)) = 0.
Observe that R(g(t)) ≥ λM(g(t)) ≥ R̆(g(t)) (cf. [KL] (92.3)). Therefore lim
R(g(t)) =
R∞ = lim
R̆(g(t)). This proves (3.1.1).
Note that∫
|R(g(t))− R(g(t))|dvg(t) ≤
(R(g(t))− R̆(g(t)))dvg(t) +
(R(g(t))− R̆(g(t)))dvg(t)
= 2(R(g(t))− R̆(g(t)))V
(3.1.2) follows from (3.1.1).
By Lemma 3.1 in [FZZ],
|Rico(g(t))|2dvg(t)dt < ∞,
and, by Lemma 1 in [Ye], we have
|Rico(g(t))|2dvg(t) ≤ −2
|∇Rico(g(t))|2dvg(t)+4
|Rm||Rico(g(t))|2dvg(t) < D,
where D is a constant independent of t. By the same argument as in the proof of
Proposition 2.6 in [FZZ] (3.1.3) follows. �
MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 7
The following is the main result of this section, which is an analogy of Theorem 10.5
in [CT], where the same conclusion was derived for closed oriented Einstein 4-manifolds
with the same negative Einstein constant. The key point in our case is to use Lemma
3.1 to get non-collapsing balls and to prove the limiting metric is an Einstein metric
(cf. Lemma 3.3 and Lemma 3.4 below).
Proposition 3.2. Let M be a smooth closed oriented 4-manifold with λM < 0. If
g(t), t ∈ [0,∞) is a solution to (1.3) such that |Ric(g(t))| ≤ 3, and {tk} is a sequence
of times tends to infinity such that
diamgk(M) −→ ∞,
when k −→ ∞, where gk = g(tk), then there exists an m ∈ N, and sequences of points
{xj,k ∈ M}, j = 1, · · · , m, satisfying that, by passing to a subsequence,
(M, gk, x1,k, · · · , xm,k)
dGH−→ (
Nj , g∞, x1,∞, · · · , , xm,∞)
in the m-pointed Gromov-Hausdorff sense for k → ∞, where (Nj , g∞) j = 1, · · · , m are
complete Einstein 4-orbifolds with at most finitely many isolated orbifold points {qi}.
The scalar curvature (resp. volume) of g∞ is
R∞ = lim
λM(g(t)), (resp. V = Volg0(M) =
Volg∞(Nj)).
Furthermore, in the regular part of Nj, {gk} converges to g∞ in both L2,p (resp. C1,α)
sense for all p < ∞ (resp. α < 1).
We divide the proof of Proposition 3.2 into several useful lemmas.
A key result in the paper [CT] shows that, for any compact oriented Einstein 4-
manifold (X, g) with Einstein constant −3, there exists a constant C depending only
on the Euler number of X , and a point x ∈ X such that Volg(Bg(x, 1)) ≥ CVolg(X) (cf.
Theorem 0.14 [CT]). Cheeger-Tian remarked that the same result continues to hold
for 4-manifolds which are sufficiently negatively Ricci pinched. The following lemmas
is an analogy of the result for the metric gk in Proposition 3.2.
Lemma 3.3. There exists a constant v > 0, and a sequence {xk} ⊂ M such that
Volgk(Bgk(xk, 1)) ≥ v.
Proof. Let ε4 > 0 be the critical constant of Cheeger-Tian (cf. §1 [CT]), i.e., if X is a
Riemannian 4-manifold which is ε4-collapsed with locally bounded curvature, then X
carries an F-structure of positive rank. We may assume that, for all x ∈ M and gk,
Volgk(Bgk(x, 1)) < ε4. By a standard covering argument, for any k, there exist finitely
many points q1, · · · , ql such that E = M\
i=1Bgk(qi, 1) satisfies the hypothesis of
Theorem 2.7. Moreover, l ≤ Cδ−1 where C and δ are the constants in Theorem 2.7.
Therefore, by Theorem 2.7 we conclude that, there is a constant C1 independent of k
such that
(3.1)
|R(gk)|2dvk ≤ 6
|Rm(gk)|2dvk ≤ C1
Volgk(Bgk(qi, 1)).
8 F. FANG, Y. ZHANG, AND Z. ZHANG
On the other hand, by Lemma (3.1.2)
(R(gk)
2 − R(gk)2)dvk| ≤ 24
|R(gk)− R(gk)|dvk
k→∞−→ 0.
Therefore
(3.2)
∞Volgk(E)−
R(gk)
2dvk ≤ R(gk)2Volgk(E)−
R(gk)
(R(gk)
2 − R(gk)2)dvk
for sufficiently large k since R∞ ≤ λMV −
2 < 0. By inserting (3.1) we get that
∞(V −
V olgk(Bgk(qi, 1))−
∞Volgk(E)−
Volgk(Bgk(qi, 1)),
V ≤ C2
Volgk(Bgk(qi, 1)),
where C2 is a constant independent of k. Therefore, there is at least a ball among the
l balls whose volume is at least V
. The desired result follows. �
Assuming that diamgk(M) → ∞ for k → ∞, by using the technique developed in
[An3], the analogue of Theorem 3.3 in [An2] holds (cf. Theorem 2.3 in [An4]), i.e.
there exist a sequence of points {xk} ⊂ M such that, by passing to a subsequence,
{(M, gk, xk)}
dGH−→ (N∞, g∞, x∞)
where N∞ is a 4-orbifold with only isolated orbifold points {qi}, g∞ is a complete C0
orbifold metric, and g∞ is a C
1,α ∩L2,p Riemannian metric on the regular part of N∞,
for all p < ∞ and α < 1. Furthermore, {gk} converges to g∞ in the L2,p (resp. C1,α)
sense on the regular part of N∞, i.e. for any r ≫ 1 and k, there is a smooth embedding
Fk,r : Bg∞(x∞, r)\
i Bg∞(qi, r
−1) ⊂ N∞ → M such that, by passing to a subsequence,
F ∗k,rgk converge to g∞ in both L
2,p and C1,α senses.
Lemma 3.4. g∞ is an Einstein orbifold metric with scalar curvature R∞.
Proof. We first prove that g∞ is an Einstein metric with scalar curvature R∞ on the
regular part of N∞. Since F
k,rgk converge to g∞ in the L
2,p(resp. C1,α) sense on
Bg∞(p∞, r)\
i Bg∞(qi, r
−1), for any r, by Lemma 3.1, we obtain that
Bg∞ (p∞,r)\
i Bg∞(qi,r
|Rico(g∞)|2dv∞ ≤ lim
|Rico(gk)|2dvk = 0,
MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 9
Bg∞ (p∞,r)\
i Bg∞ (qi,r
|R(g∞)− R∞|dv∞ ≤ lim
|R(gk)− R(gk)|dvk = 0.
Therefore g∞ is a C
1,α Riemannian metric on Bg∞(p∞, r)\
i Bg∞(qi, r
−1) which sat-
isfies the Einstein equation in the weak sense. By elliptic regularity theory, g∞ is a
smooth Einstein metric with scalar curvature R∞.
Since g∞ is a C
0-orbifold metric, i.e. for any orbifold point qi ∈ N∞, there is
a neighborhood Ui ∼= B(0, r)/Γ of qi such that g̃∞ is a C0-Riemannian metric on
B(0, r) ⊂ R4 where Γ ⊂ SO(4) is a finite subgroup acting freely on S3, and g̃∞|B(0,r)\{0}
is the pull-back metric of g∞. Note that g̃∞ is a smooth Einstein metric on B(0, r)\{0}
satisfying that
B(0,r)
|Rm(g̃∞)|2dveg∞ < C < ∞. By the arguments as in [An1] and
[Ti], g̃∞ is a C
∞ Einstein metric on B(0, r) (cf. the proof of Theorem C in [An1], and
Section 4 in [Ti]). Hence g∞ is an Einstein orbifold metric. �
By the discussion before Lemma 3.4 we may choose ℓ sequences of points {xj,k} ⊂
M , j = 1, · · · , ℓ, such that distgk(xi,k, xj,k)
k→∞−→ ∞ for any i 6= j, and
(3.3) {(M, gk, x1,k, · · · , xℓ,k)}
dGH−→ (
Nj , g∞, x1,∞, · · · , xℓ,∞)
where (Nj , g∞, xj,∞), j = 1, · · · , ℓ are complete Einstein 4-orbifolds with only isolated
singular points and scalar curvatures R∞. Furthermore, {gk} converges to g∞ in both
L2,p (resp. C1,α) sense on the regular parts of Nj , j = 1, · · · , ℓ. Note that
(3.4) V ≥
Volg∞(Nj).
Lemma 3.5. The number of orbifold points of
Nj is less than a constant depending
only on the Euler characteristic χ(M).
Proof. For each orbifold point q ∈ Nj , there exist a sequence {qk} ⊂ M , and two
constants r ≫ r1 > 0 such that:
(3.5.1) q ∈ Bg∞(xj,∞, r);
(3.5.2) Bg∞(q, r1)\Bg∞(q, σ) lies in the regular part of Bg∞(xj,∞, r) for any σ < r1;
(3.5.3) (Bgk(qk, r1)\Bgk(qk, σ), gk)
C1,α−→ (Bg∞(q, r1)\Bg∞(q, σ), g∞).
By the definition of harmonic radius (cf. [An3]), the harmonic radii of all points in
Bgk(qk, r1)\Bgk(qk, σ) have a uniform lower bound, saying µ > 0, a constant depending
on σ but independent of k.
Clearly, there is a positive constant v0 (e.g.,
Volg∞(Bg∞(xj,∞, r))) such that
Volgk(Bgk(xj,k, r)) ≥ v0. Note that the Sobolev constants CS,k of Bgk(xj,k, r) are
bounded from below by a constant depending only on v0, r (cf. [An2] and [Cr]). There-
fore, by [An2] again we get that Volgk(Bgk(qk, s)) ≥ Cs4 for any s ≪ 1, where C is
independent of k.
10 F. FANG, Y. ZHANG, AND Z. ZHANG
Let us denote by rh,k the infimum of the harmonic radii of gk in the ball Bgk(qk, r1).
Note that rh,k
k→∞−→ 0 since q is a orbifold point (cf. [An3]). Therefore, there is a point
q̄k ∈ Bgk(qk, σ) so that rh(q̄k) = rh,k for sufficiently large k.
Consider the normalized balls (Bgk(qk, r1), r
h,kgk), which have harmonic radii at
least 1. By passing to a subsequence if necessary,
(Bgk(qk, r1), r
h,kgk, q̄k)
C1,α−→ (W, ḡ∞, q̄)
where (W, ḡ∞) is a complete Ricci-flat 4-manifold satisfying that
(3.5) Volḡ∞(Bḡ∞(q̄, r)) ≥ Cr4
for any r > 0. It is obvious that
|Rm(ḡ∞)|2dvḡ∞ ≤ lim inf
|Rm(gk)|2dvk ≤ C.
Therefore (W, ḡ∞) is an Asymptotically Locally Euclidean space (cf. Theorem 2.11 in
[N] or [An1]), which is asymptotic to a cone of S3/Γ where Γ ⊂ SO(4) is a finite group
acting freely on S3. By the Chern-Gauss-Bonnet formula
(3.6) χ(W ) =
|Rm(ḡ∞)|2dvḡ∞ +
|Γ| .
By [An1] W is isometric to R4, provided |Γ| = 1. Since the harmonic radius of ḡ∞ at
q̄ is 1, hence ḡ∞ can not be the Euclidean metric. Hence |Γ| ≥ 2. It is easy to verify
that χ(W ) ≥ 1. By (3.6) we get that
|Rm(ḡ∞)|2dvḡ∞ ≥ 4π2.
This proves that every orbifold point contributes to lim inf
|Rm(gk)|2dvk at least
4π2. By the rescaling invariance of the integral we conclude that the number of orbifold
points β ≤ C
The following lemma is an analogue of a result in Cheeger-Tian [CT].
Lemma 3.6. ℓ < χ(M)+β+1, where β := #{number of orbifold points in Lemma 3.5}.
Proof. Suppose not, i.e, ℓ ≥ χ(M) + β + 1, by definition there are at least χ(M) + 1
components of
1Nj which are smooth complete non-compact Einstein 4-manifolds of
finite volume, for simplicity saying N1, · · · , Ns, where s ≥ χ(M) + 1. By Theorem 4.5
in [CT], for each 1 ≤ j ≤ s,
|Rm(g∞)|2dvg∞ ≥ 8π2.
Since (M, gk, xk,j)
L2,p−→ (Nj , g∞, x∞,j), by Chern-Gauss-Bonnet formula and (3.1.3) in
Lemma 3.1 we get that
8π2χ(M) = lim
|Rm(gk)|2dvgk ≥
|Rm(g∞)|2dvg∞ ≥ 8π2(χ(M) + 1).
A contradiction. �
MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 11
Let m denote the maximal value of all possible choice of the base point sequences
in (3.3), which has a upper bound by Lemma 3.6.
Lemma 3.7. Let Mk,r = M\
Bgk(xj,k, r). For sufficiently large r, there is a
constant C independent of r such that
(3.7) lim
Volgk(Mk,r) ≤ C
Volg∞(Nj\Bg∞(xj,∞,
(3.8)
Volg∞(Nj) = V.
Proof. We may choose r ≫ 1 such that, for any y ∈
j=1(Nj\Bg∞(xj,∞, r − 1)),
Volg∞(Bg∞(y, 1)) ≤ 12ε4, where ε4 > 0 is the critical constant of Cheeger-Tian (cf.
proof of Lemma 3.3 or §1 [CT] ).
Now we claim that there is a constant k0 ≫ 1 such that, for any k > k0 and any
x ∈ Mk,r, Volgk(Bgk(x, 1)) ≤ ε4.
If it is false, without loss of generality we may assume a sequence of points {yk} ⊂
Mk,r such that
(3.9) Volgk(Bgk(yk, 1)) > ε4
Observe that the distance distgk(yk, xj,k) → ∞ as k → ∞ for all 1 ≤ j ≤ m. Otherwise,
assuming distgk(yk, xj,k) < ρ for some j and ρ > 0, we get that F
j,k,ρ(yk) → y∞ ∈
Bg∞(xj,∞, ρ)\Bg∞(xj,∞, r − 1), and so
(3.10) Volgk(Bgk(yk, 1)) → Volg∞(Bg∞(y∞, 1)) ≤
when k → ∞, since F ∗j,k,ρgk C1,α-converges to gj,∞, where
(3.11) Fj,k,ρ : Bg∞(xj,∞, ρ)\
Bg∞(qi, ρ
−1) ⊂ N∞ → M
is a smooth embedding so that F ∗j,k,ρgk converges to g∞ in the C
1,α-sense (cf. the
discussion before Lemma 3.4). A contradiction to (3.9).
Note that (M, gk, yk)
dGH−→ (N∞, g∞, y∞) where N∞ is a complete 4-orbifold different
from each of Nj, 1 ≤ j ≤ m. This violates the choice of maximality of m. Hence we
have proved the claim.
By a standard covering argument, for any k, there exist finitely many points
z1,k, · · · , zI,k such that Ek,r = Mk,r\
i=1Bzi,k(1) satisfies the hypothesis of Theorem
2.7, where I is independent of k. By Theorem 2.7, there is a constant C independent
of k such that
|R(gk)|2dvk ≤ 6
|Rm(gk)|2dvk ≤ C(
Volgk(Bgk(zi,k, 1))+Volgk(A0,1(Mk,r))).
By Lemma 3.1, for k ≫ 1, we have
(3.12)
|R(gk)− R(gk)|dvk <
|R(gk)− R(gk)|dvk −→ 0.
12 F. FANG, Y. ZHANG, AND Z. ZHANG
By (3.2) we get
∞Volgk(Ek,r)−
R(gk)
2dvk ≤
(R(gk)
2 − R(gk)2)dvk
|R(gk)− R(gk)|dvk,
Since Volgk(Ek,r) ≥ Volgk(Mk,r) −
Volgk(Bgk(zi,k, 1)), by the above together we get
immediately that
(3.13)
Volgk(Mk,r) ≤ C(
Volgk(Bgk(zi,k, 1))+Volgk(A0,1(Mk,r)))+24
|R(gk)−R(gk)|dvk.
If distgk(zi,k, xj,k) → ∞ for all 1 ≤ j ≤ m, by the same argument as above we get that
Volgk(Bgk(zi,k, 1)) → 0
when k → ∞. Otherwise, there exists a subsequence ks → ∞ and an index j such that
distgks (zi,ks, xj,ks) < ρ
for some constant ρ. In both cases, we obtain
lim sup
Volgk(Bgk(zi,ks, 1)) ≤
Volg∞(Nj\Bg∞(xj,∞,
for r ≫ ρ. Therefore, by (3.12) and (3.13) we conclude immediately (3.7).
By (3.7) it follows that lim
k,r→∞
Volgk(Mk,r) → 0. Hence (3.8) follows. �
By now Proposition 3.2 follows by the above lemmas.
4. Smooth convergence on the regular part
The main result of this section is the following:
Proposition 4.1. Let M be a closed 4-manifold satisfying that λ̄M < 0 and let
g(t), t ∈ [0,∞), be a solution to the normalized Ricci flow equation (1.3) on M with
uniformly bounded Ricci curvature. If (M, g(tk), pk)
dGH−→ (N∞, g∞, p∞), where tk → ∞
and N∞ is a 4-dimensional orbifold, and g(tk)
C1,α−→ g∞ on the regular part R of N∞ (the
compliment of the orbifold points), then, by passing to a subsequence, for all t ∈ [0,∞),
(M, g(tk + t), pk)
dGH−→ (N∞, g∞(t), p∞), where g∞(t) is a family of smooth metrics on
R solving the normalized Ricci flow equation on R with g∞(0) = g∞. Moreover, the
convergence is smooth on R× [0,∞).
In [Se] the convergence of Kähler-Ricci flow on compact Kähler manifolds with
bounded Ricci curvature was studied. It seems that the arguments in [Se] could be
applied to prove Proposition 4.1, but the authors can not follow completely her line.
Therefore, we give a quite different approach, where we first give a curvature estimate
of the Ricci flow similar to Perelman’s pseudolocality theorem. Using this curvature
MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 13
estimation we prove the limit Ricci flow exists on R × [0,∞). Finally, we prove that
R is exactly the regular part of every subsequence limit of (M, g(tk + t), pk), for all
t ∈ [0,∞). It deserves to point out that our approach works only in dimension 4.
We now give a curvature estimate for the Ricci flow which is an analogy of Perel-
man’s pseudolocality theorem (cf. [Pe1] Thm. 10.1). The difference is that here we
use the hypothesis of local almost Euclidean volume growth, instead of the almost
Euclidean isoperimetric estimate. The proof is much easier than that of Perelman’s
pseudolocality theorem.
Theorem 4.2. There exist universal constants δ0, ǫ0 > 0 with the following property.
Let g(t), t ∈ [0, (ǫP r0)2], be a solution to the Ricci flow equation (1.2) on a closed
n-manifold M and x0 ∈ M be a point. If the scalar curvature
R(x, t) ≥ −r−20 whenever distg(t)(x0, x) ≤ r0,
and the volume
Volg(t)(Bg(t)(x, r)) ≥ (1− δ0)Vol(B(r)) for all Bg(t)(x, r) ⊂ Bg(t)(x0, r0),
where B(r) denotes a ball of radius r in the n-Euclidean space and Vol(B(r)) denotes
its Euclidean volume, then the Riemannian curvature tensor satisfies
|Rm|g(t)(x, t) ≤ t−1, whenever distg(t)(x0, x) < ǫ0r0, and 0 < t ≤ (ǫ0r0)2.
In particular, |Rm|g(t)(x0, t) ≤ t−1 for all time t ∈ (0, (ǫ0r0)2].
Proof. We use Claim 1 and Claim 2 of Theorem 10.1 in [Pe1] and adopt a contradiction
argument. For any given small constants ǫ, δ > 0, set ǫ0 = ǫ, δ0 = δ, then there is a
solution to the Ricci flow equation (1.2), say (M, g(t)), not satisfying the conclusion
of the theorem. After a rescaling, we may assume that r0 = 1. Denote by M̄ the
non-empty set of pairs (x, t) such that |Rm|g(t)(x, t) > t−1, then as in Claim 1 and
Claim 2 of Theorem 10.1 in [Pe1], we can choose another space time point (x̄, t̄) ∈ M̄
with 0 < t̄ ≤ ǫ2, distg(t̄)(x0, x̄) < 110 , such that |Rm|g(t)(x, t) ≤ 4Q whenever
t̄− 1
Q−1 ≤ t ≤ t̄, distg(t̄)(x̄, x) ≤
(100nǫ)−1Q−1/2,
where Q = |Rm|g(t̄)(x̄, t̄). It is remarkable that from the proof of Claim 2 of Theorem
10.1 in [Pe1], each such a space time point (x, t) satisfies
distg(t)(x, x0) < distg(t̄)(x0, x̄) + (100nǫ)
−1Q−1/2 <
+ (100n)−1 <
Now choosing sequences of positive numbers ǫk → 0 and δk → 0, we obtain a
sequence of solutions (Mk, gk(t)), t ∈ [0, ǫ2k] and a sequence of points x0,k, x̄k ∈ Mk
and times t̄k, with each satisfying the assumptions of the theorem and the properties
described above. In particular, we have that Qk = |Rmk|gk(t̄k)(x̄k, t̄k) → ∞. Consider
the sequence of pointed Ricci flow solutions
(Bgk(t̄k)(x̄k,
(100nǫk)
k ), Qkgk(Q
k t+ t̄k), x̄k), t ∈ [−
, 0].
Using Hamilton’s compactness theorem for solutions to the Ricci flow, we can extract
a subsequence which converge to a complete Ricci flow solution (M∞, g∞(t), x̄∞), t ∈
, 0], with |Rm∞|g∞(0)(x̄∞, 0) = 1.
14 F. FANG, Y. ZHANG, AND Z. ZHANG
By assumption, the balls
Bgk(t̄k)(x̄k,
(100nǫk)
k ) ⊂ Bgk(t)(x0,k,
for any t ∈ [t̄− 1
Q−1, t̄], so the scalar curvature Rk(x, t) ≥ −1 for t ∈ [t̄− 12nQ
−1, t̄] and
x ∈ Bgk(t̄k)(x̄k, 110(100nǫk)
k ) and Volgk(t)(Bgk(t)(x, r)) ≥ (1−δk)Vol(B(r)) for any
metric ball Bgk(t)(x, r) ⊂ Bgk(t̄k)(x̄k, 110(100nǫk)
k ), t ∈ [t̄− 12nQ
−1, t̄]. Passing to
the limit, we see that g∞(t) has scalar curvature R∞ ≥ 0 everywhere and local volume
Volg∞(t)(Bg∞(t)(z, r)) ≥ Vol(B(r)) for any balls Bg∞(t)(z, r) at time t ∈ (− 12n , 0]. Then
the local variation formula of volume implies that R∞ ≡ 0 on M∞×(− 12n , 0], see [STW]
for details. By the evolution of the scalar curvature ∂
R∞ = △R∞ + 2|Ric∞|2, we get
that Ric∞ ≡ 0 over M∞ × (− 12n , 0]. Then the Bishop-Gromov volume comparison
theorem implies that g∞(t) are flat solutions to the Ricci flow, which contradicts the
fact that |Rm∞|(x̄∞, 0) = 1. This ends the proof of the theorem. �
The next lemma provides a comparison of the curvature of the normalized and un-
normalized Ricci flow. By assumption, there is C̄ < ∞ such that |Ric| ≤ C̄ everywhere
along the flow (M, g(t)). Note that by Lemma 3.1, there is some time T < ∞ such that
2R∞ ≤ R(g(t)) ≤ 12R∞ < 0 whenever t > T . Fix any such a time t̄ > T and let h(t)
and h̃(t̃) be the solutions to the normalized and unnormalized Ricci flow with initial
metric h(0) = h̃(0) = g(t̄) respectively, where t̃ = t̃(t) is the corresponding rescaled
time for t. Denote by Rmt̄, Rict̄, Rt̄ and R̃mt̄, R̃ict̄, R̃t̄ the corresponding Riemann-
ian curvature, Ricci curvature and scalar curvature of them, where |Rict̄| ≤ C̄ since
h(t) = g(t̄+ t). Then we have
Lemma 4.3. The solution h̃(t̃) exists for all time t̃ ∈ [0,∞). Furthermore, there exist
constants C and τ depending on λ̄M and C̄, such that
t ≤ t̃ ≤ Ct, |R̃mt̄|(x, t̃) ≤ |Rmt̄|(x, t) ≤ C|R̃mt̄|(x, t̃), whenever t ≤ τ.
Proof. The solution h(t) has average scalar curvature R(t̄ + t) ≤ 1
R∞ < 0, so h̃(t̃)
also has average scalar curvature R̃ < 0. From the evolution d
lnVol(h̃(t̃)) = −R̃, the
volume Vol(h̃(t̃)) increases strictly in t̃, so to normalize it, we need to compress the
space and time. Thus t̃ ≥ t and |R̃mt̄|(x, t̃) ≤ |Rmt̄|(x, t) for all (x, t). So h̃(t̃) exists
for all time.
The last assertion means that the scaling factor from normalized Ricci flow to the
unnormalized one is less than C on the time interval [0, τ ]. Consider the evolution of
average scalar curvature R̃(t̃):
(2|R̃ict̄|2 − R̃2t̄ )dvk
V olh̃( t̃)(M)
for some constant Λ = Λ(C̄), since |R̃ict̄| ≤ |Rict̄| ≤ C̄, |R̃t̄| ≤ |Rt̄| ≤ C̄, |R̃| ≤ |R| ≤ C̄.
Note that the initial value R̃(0) = R(g(t̄)) ≤ 1
R∞, so there is some constant τ̃ = τ̃(Λ)
such that R̃(t̃) ≤ 1
R∞ for t̃ ∈ [0, τ̃ ]. Thus the scaling factor from normalized Ricci
MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 15
flow to the unnormalized one, which equals
R(h(t))
R̃(t̃)
, is less than 8 on the time interval
t̃ ∈ [0, τ̃ ]. Now the result follows, by setting τ = τ̃
and C = 8. �
The following lemma gives the estimation of the local volume along the Ricci flow.
As in [Se], the proof uses Theorem A 1.5 of [CC]. By assumption, we have a solution
(M, g(t)) to the normalized Ricci flow (1.3) and a sequence of times tk → ∞ and points
pk such that (M, g(tk), pk)
dGH−→ (N∞, g∞, p∞) with g(tk)
C1,α−→ g∞ on the regular part R
of the orbifold N∞. For the space M or N∞, let Rǫ,ρ be the set of points x such that
dGH(B(x, r), B(r)) < ǫr for any r ≤ u, where u ≥ ρ is some constant depending on x.
Here and after, B(r) denotes a ball of radius r in 4-Euclidean space and B(x, r) the
metric ball of radius r with center x in a metric space. A weak version is WRǫ,ρ, the
set of points x such that there is u ≥ ρ with dGH(B(x, u), B(u)) < ǫu.
Lemma 4.4. For each q ∈ R, choose a sequence qk ∈ M that converge to q. Then for
any ǫ > 0, there exist k0, η, ρ > 0 such that
Vol(Bg(tk+t)(q
k, r)) ≥ (1− ǫ)Vol(B(r)), ∀r < ρ, k0 < k,
whenever Bg(tk+t)(q
k, r) ⊂ Bg(tk)(qk, ρ) and t ∈ [−η, η].
Proof. By the boundedness of Ricci tensor, there is a universal constant Λ = Λ(C̄) > 1
such that Bg(t)(p,Λ
−1r) ⊂ Bg(s)(p, r) ⊂ Bg(t)(p,Λr) for all t, s ∈ [tk − 1, tk + 1], p ∈ M
and r > 0. By Theorem A.1.5 of [CC], for fixed ǫ > 0, there are δ = δ(ǫ, n), ρ =
ρ(ǫ, n) > 0 such that x ∈ WRδ,r implies Vol(Bg(t)(x, r)) ≥ (1 − ǫ)Vol(B(r)) for each
r ≤ ρ and x ∈ M . So by definition, it suffice to show q′k ∈ Rδ,ρ with respect to each
metric g(t), t ∈ [tk − η, tk + η], whenever q
k ∈ Bg(tk)(qk,Λρ), for some constant η > 0.
The constant ρ may be modified by a smaller one if necessary.
Using Theorem A.1.5 of [CC] again, for fixed δ as above, there is δ1 = δ1(δ, n) > 0
such that qk ∈ WR
(Λ2+1)ρ
implies q
k ∈ Rδ,ρ for any q
k ∈ Bg(t)(qk,Λ2ρ). So it re-
duces to show qk ∈ WR
(Λ2+1)ρ
with respect to each time t ∈ [tk − η, tk + η] for some
η > 0 small enough. In fact, as showed in [Se], dGH(Bg(tk)(qk, ρ1), B(ρ1)) <
for some small number ρ1 and all k large enough. By the boundedness of Ricci ten-
sor again, there is a constant η ≤ 1 such that for each time t ∈ [−η, η], we have
dGH(Bg(tk+t)(qk, ρ1), Bg(tk)(qk, ρ1)) <
δ1ρ1 for all k. Thus dGH(Bg(tk+t)(qk, ρ1), B(ρ1)) <
δ1ρ1 for each t ∈ [−η, η]. Now the result follows by setting ρ = (1−δ)ρ1Λ2+1 . �
Note that in the proof, the constant δ1 = δ1(ǫ, n), so the constant η depends only on
ǫ, n and C̄. By assumption, there is a compact exhaustion {Ki}∞i=1 of R and a sequence
of smooth embeddings Fi : Ki → M such that Fi(p∞) = pi and F ∗i g(ti) converges to
g∞ in the local C
1,α sense. Following the lines described in [Se], we can prove
Lemma 4.5. Denote by Ki,k = Fk(Ki), then for any ǫ > 0 and i, there are k0, η, ρ > 0
such that
Vol(Bg(tk+t)(q
k, r)) ≥ (1− ǫ)Vol(B(r)), ∀q
k ∈ Ki,k, k0 < k, t ∈ [−η, η] and r < ρ.
Now we are ready to prove the Proposition 4.1.
16 F. FANG, Y. ZHANG, AND Z. ZHANG
Proof of Proposition 4.1. Assume that p∞ ∈ Ki for each i. Set ǫ = δ0 in the the
previous lemma, where δ0 is just the constant in Theorem 4.2, then for one fixed Ki,
there exist k0, η, ρ > 0 such that Vol(Bg(tk+t)(q, r)) ≥ (1 − δ0)Vol(B(r)) whenever
q ∈ Ki,k, k0 < k, t ∈ [−η, η] and r < ρ. Modifying ρ and η by smaller constants, we
assume (ǫ0ρ)
2 ≤ 2η < τ , where τ and ǫ0 are constants in Lemma 4.3 and Theorem 4.2
respectively.
Let hk(t̃) be the corresponding solutions to the unnormalized Ricci flow equation
with initial value hk(0) = g(tk−η), then Vol(Bhk(t̃)(q, r)) ≥ (1−δ0)Vol(B(r)) whenever
q ∈ Ki,k, r < ρ, k0 < k and t̃ satisfying t(t̃) ∈ [0, 2η], since the inequality Vol(B(q, r)) ≥
(1− δ0)Vol(B(r)) is scale invariant and Bhk(t̃) ⊂ Bg(tk+t(t̃))(q, r) for k large enough such
that tk ≥ T + η for T chosen as above. Denote by R̃mk the Riemannian curvature
tensor of hk, then by Theorem 4.2 and Lemma 4.3, we have
|Rm|(q, tk + t) ≤ C|R̃mk|(q, t̃) ≤ C(t̃)−1 ≤ C(t− tk + η)−1,
for all q ∈ Ki,k. Hence |Rm|(q, t) is uniformly bounded on Ki,k × [tk − η2 , tk +
By Hamilton’s compactness theorem of Ricci flow solution, {(Ki,k, g(tk+ t), pk)}∞k=1
converge along a subsequence to a solution to the normalized Ricci flow (Ki,∞, gi,∞(t), pi,∞), t ∈
), in the local C∞ sense. When we consider the time t = 0, then using a diag-
onalization argument, a subsequence of {(Ki,k, g(tk), pk)}i,k will converge in the local
C∞ sense to a smooth Riemannian manifold (K∞, g∞, p∞), which is just (R, g∞), by
the uniqueness of the limit space.
For fixed i, there is a family of metrics gi,∞(t), t ∈ (−η2 ,
), on Ki. As showed
in [Se], we translate the time by η
, say considering the sequence {(Ki,k, g(tk + η4 +
t), pk)}k, and repeat the above argument, then obtain that {(Ki,k, g(tk + t), pk)}k
loc−→
(Ki,∞, gi,∞(t), pi,∞) along another subsequence, on the time interval t ∈ (−η2 ,
The essential point is that the estimate dGH(Bg(tk)(qk, ρ1), B(ρ1)) <
δ1ρ1 in the proof
of Lemma 4.4 holds for some constant ρ1, simultaneously the time tk is replaced by
, but the constant η in Lemma 4.5 is fixed in this procedure. Iterating this process
infinite times we obtain the convergence on Ki for all t ∈ [0,∞). Then do the same
thing for each Ki, i = 1, 2, · · · , and after a diagonalization argument, we get that a
subsequence of {(Ki,k, g(tk + t), pk)}k, say (Ki,ki, g(tki + t), pki)
loc−→ (R, g∞(t), p∞) for
all t ∈ [0,∞), with g∞(0) = g∞.
We finally show that the completion of R with respect to the metric g∞(t), say R̄t,
is just N∞, for each time t ∈ [0,∞). Denote by S = N∞\R the set of singular points
of (N∞, g∞(0)), then it suffice to show that R̄t = R ∪ S for fixed time t. Assume
S = {ql}Ql=1, where Q ≤ β for β = β(M) by Lemma 3.5, and let ε > 0 be any
small constant such that Bg∞(0)(qi, ε) ∩ Bg∞(0)(qj , ε) = ∅ whenever i 6= j. Denote by
Kε = R\
Bg∞(0)(pl, ε), then using |Ric∞| ≤ C̄ on R× [0,∞) and by the evolution
of the distance function, we obtain dGH((R\Kε, g∞(t)),S) ≤ e2C̄tε and consequently
R̄t = R ∪ S, by letting ε → 0. �
5. Proofs of Theorems 1.1 and 1.2
The main result of this section is the following
MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 17
Theorem 5.1. Let (M, c) be a smooth oriented closed 4-manifold with a Spinc-structure
c. Assume that the first Chern class c1(c) of c is a monopole class of M satisfying that
(5.1) c21(c)[M ] ≥ 2χ(M) + 3τ(M) > 0.
Let g(t), t ∈ [0,∞), be a solution to (1.3) so that |Ric(g(t))| ≤ 3, and
(5.2) lim
λM(g(t)) = −
32π2c21(c)[M ].
Then there exists an m ∈ N, and sequences of points {xj,k ∈ M}, j = 1, · · · , m,
satisfying that, by passing to a subsequence,
(M, g(tk + t), x1,k, · · · , xm,k)
dGH−→ (
Nj, g∞, x1,∞, · · · , , xm,∞),
t ∈ [0,∞), in the m-pointed Gromov-Hausdorff sense for any tk → ∞, where (Nj , g∞)
j = 1, · · · , m are complete Kähler-Einstein orbifolds of complex dimension 2 with at
most finitely many isolated orbifold points {qi}. The scalar curvature (resp. volume)
of g∞ is
−Volg0(M)−
32π2c21(c)[M ] (resp. V = Volg0(M) =
Volg∞(Nj)).
Furthermore, in the regular part of Nj, {g(tk + t)} converges to g∞ in C∞-sense.
Comparing with Proposition 3.2, Theorem 5.1 shows that the Einstein orbifolds are
actually Kähler Einstein orbifolds under the additional assumptions. The key point in
the proof is that the sequence of the self-dual parts of the curvatures of the connections
on the determinant line bundles given by the irreducible solutions in the Seiberg-Witten
equations converges to a non-trivial parallel self-dual 2-form on every component Nj ,
which is a candidate of the Kähler form.
Let (M, c) and g(t) be the same as in Thoerem 5.1, and let V , m, tk, xj,k, R̆(g(t)),
gk, g∞, Nj and Fj,k,r be the same as in Section 3. Assume that, for each k, (φk, Ak) is
an irreducible solution to the Seiberg-Witten equations (2.1). Let | · |k denote the norm
with respect to the metric gk = g(tk). The following lemma shows that the L
2-norms
of the self-dual parts F+Ak tends to zero.
Lemma 5.2.
|∇kF+Ak |
kdvk = 0,
where ∇k is the connection on Λ2T ∗(M) induced by Levi-civita connection.
Proof. The Bochner formula implies that
0 = −1
∆k|φk|2k + |∇Akφk|2k +
R(gk)
|φk|2k +
|φk|4k,
By taking integration we get that
(5.3)
(|∇Akφk|2k +
R(gk)
|φk|2k)dvk = −
|φk|4kdvk.
18 F. FANG, Y. ZHANG, AND Z. ZHANG
Since λM(gk) is the lowest eigenvalue of the operator −4△k+R(gk), for any 1 ≫ ǫ > 0,
by definition
(5.4) λM(gk)
|φk|2k,ǫdvk ≤
(4|∇|φk|k,ǫ|2 +R(gk)|φk|2k,ǫ)dvk,
where | · |2k,ǫ = | · |2k + ǫ2. By Kato’s inequality (cf. (2.5)) and letting ǫ → 0,
λM(gk)
|φk|2kdvk ≤
(4|∇Akφk|2k +R(gk)|φk|2k)dvk = −
|φk|4kdvk ≤ 0.
As λM(gk) ≤ 0, by Schwarz inequality,
λM(gk)(
|φk|4k,ǫdvk)
2 = λM(gk)Volgk(M)
|φk|4k,ǫdvk)
2 ≤ λM(gk)
|φk|2k,ǫdvk.
Therefore
λM(gk)(
|φk|4k,ǫdvk)
(4|∇|φk|k,ǫ|2 +R(gk)|φ|2k,ǫ)dvk.
(5.5) 4
(|∇Akφk|2k − |∇|φk|k,ǫ|2)dvk ≤ −
|φk|4kdvk − λM(gk)(
|φk|4k,ǫdvk)
From (2.5), |∇|φk|k,ǫ|2 ≤ 34 |∇
Akφk|2k. Hence, by letting ǫ −→ 0, we have
(5.6)
|∇Akφk|2kdvk ≤ −((
|φk|4kdvk)
2 + λM(gk))(
|φk|4kdvk)
If c+1,k denotes the self-dual part of the harmonic form representing the first Chern class
c1(c) of c, by the Seiberg-Witten equation we get that
(5.7)
|φk|4kdvk = 8
|F+Ak |
2dvk ≥ 32π2[c+1,k]2[M ] ≥ 32π2c21(c)[M ].
Note that, by the standard estimates for Seiberg-Witten equations,
−R̆(gk) ≥ |φk|2k
and, by Theorem 1.1 in [FZ],
32π2c21(c)[M ] + λM(gk) is non-positive. Hence
(5.8)
|∇Akφk|2kdvk ≤ −(
32π2c21(c)[M ] + λM(gk))(
|φk|4kdvk)
≤ R̆(gk)V
32π2c21(c)[M ] + λM(gk)) −→ 0,
when k −→ ∞, by (5.2) and Lemma 3.1.
By the second one of the Seiberg-Witten equations again (cf. [Le2]),
(5.9) |∇kF+Ak |
|φk|2k|∇Akφk|2k,
where ∇Ak is the connection on Γ(S
) induced by the Levi-civita connection. Hence
|∇kF+Ak |
kdvk ≤
|R̆(g(tk))|
|∇Akφk|2kdvk −→ 0,
when k −→ ∞. �
MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 19
Regard F+Ak as self-dual 2-forms of g
k on Uj,r = Bg∞(xj,∞, r)\
i Bg∞(qi,j, r
where g′k = F
j,k,r+1gk, and qi,j are the orbifold points of Nj. Since
(5.10) |F+Ak |
|φk|4k ≤
R̆(gk)
2 ≤ C,
where C is a constant independent of k, F+Ak ∈ L
1,2(g′k), and
‖F+Ak‖L1,2(g′k) ≤ C
where C ′ is a constant independent of k. Note that ‖ · ‖L1,2(g∞) ≤ 2‖ · ‖L1,2(g′k) for k ≫ 1
since g′k
C1,α−→ g∞ on Uj,r. Thus, by passing to a subsequence, F+Ak
L1,2−→ Ωj ∈ L1,2(g∞), a
self-dual 2-form with respect to g∞.
Lemma 5.3. For any j, Ωj is a smooth self-dual 2-form on Uj,r\∂Uj,r such that
∇∞Ωj ≡ 0, and |Ωj |∞ ≡ cont. 6= 0, where ∇∞ is the connection induced by the
Levi-civita connection of g∞. Hence, g∞ is a Kähler metric with Kähler form
|Ωj |
on Uj,r.
Proof. By Lemma 5.2
|∇∞Ωj |2∞dv∞ = lim
|∇∞F+Ak |
∞dv∞ ≤ 2 lim
|∇kF+Ak |
kdvk = 0.
It is easy to see that Ωj is a weak solution of the elliptic equation ∇∞Ωj = 0 on Uj,r.
By elliptic equation theory, Ωj is a smooth self-dual 2-form on Uj,r\∂Uj,r, ∇∞Ωj ≡ 0,
and |Ωj|∞ ≡ cont..
Now we claim that, for any j and r ≫ 1,
|Ωj |2∞dv∞ 6= 0. If not, there exist
js, s = 1, · · · , m0, m0 ≤ m, such that
Ujs,r
|Ωjs|2∞dv∞ ≡ 0. By Lemma 3.1, R∞ =
R(gk) = lim
R̆(gk) = λMV
2 , which is the scalar curvature of g∞, i.e. R∞ =
R(g∞). Note that, by (5.10) and Lemma 3.7,
|Ωj |2∞dv∞ = lim
|F+Ak |
R̆(gk)
2Volg′
(Uj,r)
∞Volg∞(Uj,r),
|F+Ak |
kdvk −
|F+Ak |
kdvk| ≤
R̆(gk)
2Volgk(M\
Fk,j,r(Uj,r))
Volg∞(Nj\Uj, r2 ),
and, by Lemma 3.1,
(R(gk)
2 − R2∞)dvk| ≤ 24 lim
(|R(gk)−R(gk)|+ |R∞ − R(gk)|)dvk = 0,
20 F. FANG, Y. ZHANG, AND Z. ZHANG
where C is a constant in-dependent of k. Hence, we obtain
j 6=j1,··· ,jm0
Volg∞(Uj,r) ≥
8|Ωj|2∞dv∞ = lim
8|F+Ak|
≥ lim
8|F+Ak |
kdvk − CR
Volg∞(Nj\Uj, r2 )
≥ 32π2c21(c)[M ]− CR
Volg∞(Nj\Uj, r2 ).
The last inequality is obtained by (5.7). Thus, by (5.1),
j 6=j1,··· ,jm0
Volg∞(Uj,r) ≥ 32π2(2χ(M) + 3τ(M))− CR
Volg∞(Nj\Uj, r2 ).
By the Chern-Gauss-Bonnet formula and the Hirzebruch signature theorem,
2χ(M) + 3τ(M) ≥ 1
R(gk)
2 + 2|W+(gk)|2k)dvk −
|Rico(gk)|2dvk.
By Lemma 3.1, and the fact that g′k
L2,p−→ g∞ on Uj,r, we obtain that
j 6=j1,··· ,jm0
Volg∞(Uj,r) ≥
+ 2|W+(g∞)|2∞)dv∞
−CR2∞
Volg∞(Nj\Uj, r2 ).
Note that, on any Uj,r, j 6= j1, · · · , jm0 , ∇∞Ωj ≡ 0, |Ωj|∞ ≡ cont. 6= 0, and Ωj
is a self-dual 2-form. Thus g∞ is a Kähler metric with Kähler form
|Ωj |
on Uj,r,
j 6= j1, · · · , jm0 . It is well known that R
∞ = 24|W+(g∞)|2∞ for Kähler metrics (cf.
[Le3]). Thus
j 6=j1,··· ,jm0
Volg∞(Uj,r) ≥ R
j 6=j1,··· ,jm0
Volg∞(Uj,r)− CR
Volg∞(Nj\Uj, r2 )
js=j1,··· ,jm0
+ 2|W+(g∞)|2∞)dv∞
≥ R2∞
j 6=j1,··· ,jm0
Volg∞(Uj,r)− CR
Volg∞(Nj\Uj, r2 )
js=j1,··· ,jm0
∞Volg∞(Ujs,r).
MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 21
Note that, for r ≫ 1,
1 ≫ 3CR2∞
Volg∞(Nj\Uj, r2 ) ≥
js=j1,··· ,jm0
∞Volg∞(Ujs,r).
A contradiction. Thus, for all j,
|Ωj |2∞dv∞ 6= 0, and∇∞Ωj ≡ 0, |Ωj |∞ ≡ cont. 6= 0.
Thus we obtain the conclusion.
Proof of Theorem 5.1. First, assume that diamg(tk)(M) −→ ∞, when k −→ ∞. By
Proposition 3.2 and Proposition 4.1, there exists a m ∈ N, and a sequence of points
{xj,k ∈ M}, k ∈ N, j = 1, · · · , m, satisfying that, by passing to a subsequence,
(M, g(tk+t), x1,k, · · · , xm,k), t ∈ [0,∞), converges to {(N1, g∞, x1,∞), · · · , (Nm, g∞, xm,∞)}
in them-pointed Gromov-Hausdorff sense, when k −→ ∞, where (Nj , g∞) j = 1, · · · , m
are complete Einstein 4-orbifolds with finite isolated orbifold points {qi}. The scalar
curvature of g∞ is
R∞ = lim
λM(g(t)), and V = Volg0(M) =
Volg∞(Nj).
By Lemma 5.2, g∞ is a Kähler-Einstein metric in the non-singular part of
Then by the same arguments as in Section 4 of [Ti], g∞ is actually a Kähler-Einstein
orbifold metric. Furthermore, in the non-singular part of
Nj , {g(tk+ t)}, t ∈ [0,∞),
C∞-converges to g∞ by Proposition 4.1.
If diamgk(M) < C for a constant C in-dependent of k, we can also obtain the
conclusion by the similar, but much easier, arguments as above.
Theorem 5.4. Let (M, c) be a smooth compact closed oriented 4-manifold with a Spinc-
structure c. Assume that the first Chern class c1(c) of c is a monopole class of M
satisfying c21(c)[M ] = 2χ(M) + 3τ(M) > 0, and χ(M) = 3τ(M). If M admits a
solution g(t), t ∈ [0,∞) to (1.3) with |R(g(t))| ≤ 12, then
λM(g(t)) = −
32π2c21(c)[M ].
Furthermore, if |Ric(g(t))| ≤ 3, the Kähler-Einstein metric g∞ in Theorem 5.1 is a
complex hyperbolic metric.
Proof. Let V = V olg(t)(M). By the Chern-Gauss-Bonnet formula and the Hirzebruch
signature theorem,
(5.11) 2χ(M)− 3τ(M) ≥ 1
R(g(t))2 + 2|W−(g(t))|2 − 1
|Rico(g(t))|2dvg(t),
where W− is the anti-self-dual Weyl tensor. Note that
(5.12)
R(g(t))2dvg(t) ≥ R(g(t))2V −→ R
∞V = lim
λM(g(t))
22 F. FANG, Y. ZHANG, AND Z. ZHANG
when t −→ ∞, by Schwarz inequality and Lemma 3.1. By (5.11), (5.12), Lemma 3.1
and Theorem 1.1 in [FZ],
2χ(M)− 3τ(M) ≥ lim inf
|W−(g(t))|2dvg(t) +
λM(g(t))
≥ lim inf
|W−(g(t))|2dvg(t) +
c21(c)[M ]
= lim inf
|W−(g(t))|2dvg(t) +
(2χ(M) + 3τ(M)).
Since χ(M) = 3τ(M), we obtain
λM(g(t)) = −
32π2c21(c)[M ],
lim inf
|W−(g(t))|2dvg(t) = 0.
Now, assume that |Ric(g(t))| ≤ 3. Let tk, Nj , gk, and g∞ be the same as above.
For any j and compact subset U of the regular part of Nj,
|W−(g∞)|2∞dv∞ ≤ lim inf
|W−(g(tk))|2kdvk = 0,
since g(tk)
L2,p−→ g∞ on U . Hence g∞ is a Kähler-Einstein metric with W−(g∞) ≡ 0.
This implies that g∞ is a complex hyperbolic metric (cf. [Le1]). The desired result
follows.
Proofs of Theorem 1.1 and Theorem 1.2. By the work of Taubes [Ta], if (M,ω) is a
compact symplectic manifold with b+2 (M) > 1, the spin
c-structure induced by ω is a
monopole class. Moreover, since in this situation c21(c)[M ] = 2χ(M)+3τ(M), Theorem
1.1 (resp. Theorem 1.2) is an obvious consequence of Theorem 5.1 (resp. Theorem
5.4). �
References
[1] [An1] M. T. Anderson, Ricci curvature bounds and Einstein metrics on compact manifolds, J.
Amer. Math. Soc. 2 (1989), 455-490.
[An2] M. T. Anderson, The L2 structure of moduli spaces of Einstein metrics on 4-manifolds,
G.A.F.A. (1991), 231-251.
[An3] M. T. Anderson, Convergence and rigidity of manifolds under Ricci curvature bounds, Invent.
Math. 102 (1990), 429-445.
[An4] M. T. Anderson, Degeneration of metrics with bounded curvature and applications to critical
metrics of Riemannian functionals, Proceeding of Sympoia in Pure Mathematics, 54 (1993), 53-79.
[An5] M. T. Anderson, Canonical metrics on 3-manifolds and 4-manifolds, Asian J.Math. 10, (2006),
127-163.
[An6] M. T. Anderson, Extrema of curvature functionals on the space of metrics on 3-manifolds,
Calc. Var. and PDE, 5 (1997), 199-269.
[AIL] K.Akutagawa, M.Ishida, and C.LeBrun, Perelman’s invariant, Ricci flow, and the Yamabe
invariants of smooth manifolds, arxiv/math.DG/0610130.
[B] A. L. Besse, Einstein manifolds, Ergebnisse der Math. Springer-Verlag, Berlin-New York 1987.
MAXIMUM SOLUTIONS OF NORMALIZED RICCI FLOWS ON 4-MANIFOLDS 23
[BD] C.Bär, M.Dahl, Small eigenvalues of the conformal Laplacian, Geom. Funct. anal. 13 (2003),
483-508.
[CC] J. Cheeger and T. H. Colding, On the structure of space with Ricci curvature bounded below I,
Jour. Diff. Geom., 45 (1997), 406-480.
[CG] J. Cheeger and M. Gromov, Collapsing Riemannian Manifolds while keeping their curvature
bounded I, J.Diff.Geom. 23, (1986), 309-364.
[Cr] C.Croke, Some isoperimetric inequalities and eigenvalue estimates, Ann. Sci. Ecole Norm. Sup.
(4)13 (1980), 419-435.
[CT] J.Cheeger, and G.Tian, Curvature and injectivity radius estimates for Einstein 4-manifolds,
Journal of the American Mathematical Society, 19, (2006), 487-525.
[FZ] F.Fang, and Y.G.Zhang, Perelman’s λ-functional and the Seiberg-Witten equations,
math.FA/0608439.
[FZZ] F.Fang, Y.G.Zhang, and Z.L.Zhang, Non-singular solutions to the normalized Ricci flow equa-
tion, math.DG/0609254.
[H1] R. Hamilton, Three-manifolds with positive Ricci curvature, J. Diff. Geom. 17 (1982) 255-306.
[H2] R. Hamilton, A compactness property for solutions of the Ricci flow, Amer. J. Math. 117 (1995)
545-574.
[K] P.B.Kronheimer, Minimal genus in S1 ×M3, Invent. Math. 135(1) (1999), 45-61.
[KL] B.Kleiner, J.Lott, Notes on Perelman’s papers, arxiv/math.DG/0605667.
[Kot] D.Kotschick, Monopole classes and Perelman’s invariant of four-manifolds,
arXiv:math.DG/0608504.
[Le1] C.LeBrun, Einstein metrics and Mostow rigidity, Math. Res. Lett., 2 (1995), 1-8.
[Le2] C.LeBrun, Four-Dimensional Einstein Manifolds and Beyond, in Surveys in Differential Ge-
ometry, vol VI: Essays on Einstein Manifolds, 247-285.
[Le3] C.LeBrun, Ricci curvature, minimal volumes, and Seiberg-Witten theory, Invent. Math., 145
(2001), 279-316.
[Le4] C.LeBrun, Kodaira dimension and the Yamabe probblem, Comm. Anal.Geom. 7 (1999), 133-156.
[N] H.Nakajima,Self-duality of ALE Ricci-flat 4-manifolds and positive mass theorem, Advanced
Studies in Pure Math. 18-I, (1990), 385-395.
[Pe1] G.Perelman, The entropy formula for the Ricci flow and its geometric applications,
arXiv:math/0211159.
[Pe2] G.Perelman, Ricci flow with surgery on three-manifolds, arXiv: math:DG/0303109v1.
[Se] N.Sesum, Convergence of a Kähler-Ricci flow, arXiv:math.DG/0402238v1.
[STW] N. Sesum, G. Tian and X.D. Wang, Notes on Perelman’s paper on the entropy formula for
the Ricci flow and its geometric applications, preprint.
[Ta] C.H.Taubes, More constraints on symplectic forms from Seiberg-Witten invariants, Math. Res.
Lett. 2 (1995), 9-13.
[Ti] G.Tian, On Calabi’s conjecture for complex surface with positive first Chern class, Invent. Math.
101 (1990), 101-172.
[W] E.Witten, Monopoles and four-manifolds, Math. Res. Lett., 1(1994), 809-822.
[Y] R.Ye, Ricci flow, Einstein metrics and space forms, Trans. Amer. Math. Soc. 338 no.2 (1993),
871-896.
Department of Mathematics, Capital Normal University, Beijing, P.R.China
E-mail address : ffang@nankai.edu.cn
Department of Mathematics, Capital Normal University, Beijing, P.R.China
Nankai Institute of Mathematics, Weijin Road 94, Tianjin 300071, P.R.China
http://arxiv.org/abs/math/0608439
http://arxiv.org/abs/math/0609254
http://arxiv.org/abs/math/0608504
http://arxiv.org/abs/math/0211159
http://arxiv.org/abs/math/0402238
	1. Introduction
	2. Preliminaries 
	2.1. Monopole class
	2.3. Kato's inequality
	2.5. Chern-Gauss-Bonnet formula and Hirzebruch signature formula
	2.6. Curvature estimates for 4-manifolds
	3.  The limiting behavior of Ricci flow 
	4. Smooth convergence on the regular part
	5. Proofs of Theorems 1.1 and 1.2
	References
ABSTRACT
  We consider maximum solution $g(t)$, $t\in [0, +\infty)$, to the normalized
Ricci flow. Among other things, we prove that, if $(M, \omega) $ is a smooth
compact symplectic 4-manifold such that $b_2^+(M)>1$ and let
$g(t),t\in[0,\infty)$, be a solution to (1.3) on $M$ whose Ricci curvature
satisfies that $|\text{Ric}(g(t))|\leq 3$ and additionally $\chi(M)=3 \tau
(M)>0$, then there exists an $m\in \mathbb{N}$, and a sequence of points
$\{x_{j,k}\in M\}$, $j=1, ..., m$, satisfying that, by passing to a
subsequence, $$(M, g(t_{k}+t), x_{1,k},..., x_{m,k})
\stackrel{d_{GH}}\longrightarrow (\coprod_{j=1}^m N_j, g_{\infty},
x_{1,\infty}, ...,, x_{m,\infty}),$$ $t\in [0, \infty)$, in the $m$-pointed
Gromov-Hausdorff sense for any sequence $t_{k}\longrightarrow \infty$, where
$(N_{j}, g_{\infty})$, $j=1,..., m$, are complete complex hyperbolic orbifolds
of complex dimension 2 with at most finitely many isolated orbifold points.
Moreover, the convergence is $C^{\infty}$ in the non-singular part of
$\coprod_1^m N_{j}$ and
$\text{Vol}_{g_{0}}(M)=\sum_{j=1}^{m}\text{Vol}_{g_{\infty}}(N_{j})$, where
$\chi(M)$ (resp. $\tau(M)$) is the Euler characteristic (resp. signature) of
$M$.

<|endoftext|><|startoftext|>
Phase separation and flux quantization in the doped quantum dimer model on the
square and triangular lattices
Arnaud Ralko,1 Frédéric Mila,2 and Didier Poilblanc1
1 Laboratoire de Physique Théorique, CNRS & Université Paul Sabatier, F-31062 Toulouse, France
2 Institute of Theoretical Physics, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
(Dated: April 5, 2007)
The doped two-dimensional quantum dimer model is investigated by numerical techniques on the
square and triangular lattices, with significantly different results. On the square lattice, at small
enough doping, there is always a phase separation between an insulating valence-bond solid and
a uniform superfluid phase, whereas on the triangular lattice, doping leads directly to a uniform
superfluid in a large portion of the RVB phase. Under an applied Aharonov-Bohm flux, the superfluid
exhibits quantization in terms of half-flux quanta, consistent with Q = 2e elementary charge quanta
in transport properties.
PACS numbers: 75.10.Jm, 05.50.+q, 05.30.-d
Understanding electron pairing in high temperature
superconductors is a major challenge in strongly corre-
lated systems. In his milestone paper, Anderson pro-
posed a simple connection between high temperature su-
perconductors and Mott insulators [1]. Electron pairs
”hidden” in the strongly correlated insulating parent
state as Valence Bond (VB) singlets lead, once fried to
move at finite doping, to a superconducting behavior.
A very good candidate of the insulating parent state is
the resonating VB state (RVB), a state with only expo-
nentially decaying correlations and no lattice symmetry
breaking. A simple realization of RVB has been pro-
posed by Rokhsar and Kivelson (RK) in the framework
of an effective quantum dimer model (QDM) with only
local processes and orthogonal dimer coverings [2]. Even
though the relevance of these models for the description
of SU(2) Heisenberg models is still debated, this approach
is expected to capture the physics of systems that nat-
urally possess singlet ground states (GS). For instance,
specific quantum dimer models have recently been de-
rived from a spin-orbital model describing LiNiO2 [3], or
from the trimerized kagome antiferromagnet [4]. In a re-
cent work, a family of doped QDMs (at T=0) generalizing
the so-called RK point of Ref.[2] has been constructed
and investigated[5], taking advantage of a mapping to
classical dimer models [6] that extends the mapping of
the RK model onto a classical model at infinite temper-
ature, with evidence of phase separation at low doping.
However, the soluble models of Ref.[5] are ’ad hoc’ con-
structions, and this call for the investigation of similars
issue in the context of more realistic models. In that re-
spect, a natural minimal model to describe the motion of
charge carriers in a sea of dimers is the two-dimensional
quantum hard-core dimer-gas Hamiltonian:
H = v
Nc|c〉〈c| − J
(c,c′)
|c′〉〈c| − t
(c,c′′)
|c′′〉〈c|
where the sum on (c) runs over all configurations of the
Hilbert space, Nc is the number of flippable plaquettes,
the sum on (c′, c) runs over all configurations |c〉 and |c′〉
that differ by a single plaquette dimer flip, and the sum
on (c′′, c) runs over all configurations |c〉 and |c′′〉 that
differ by a single hole hopping between nearest neigh-
bors (triangular) or (diagonal) next-nearest neighbors
(square). Throughout the energy scale is set by J = 1.
A schematic phase diagram for the two lattices is de-
picted in Fig.1 in the undoped case. Remarkably, these
lattices lead to quite different insulating states. Indeed,
an ordered plaquette phase appears on the square lattice
immediately away from the special RK point, whereas a
RVB liquid phase is present in the triangular lattice
PSfrag replacements
columnar
columnar staggered
staggered
liquid
plaquette
FIG. 1: (color online) Schematic phase diagrams for the tri-
angular and the square lattice.
In this Letter, we investigate in details the properties
of model (1) on the square and triangular lattices at fi-
nite doping. Building on the differences between the two
lattices in the undoped case, we investigate to which ex-
tent the properties of the doped system are governed by
the nature of the insulating parent state. This investi-
gation is based on exact Diagonalisations and extensive
Green’s Function Monte-Carlo (GFMC) simulations [7]
essentially free of the usual finite-size limitations [8].
Phase separation: At small t, it is expected that holes
http://arxiv.org/abs/0704.0715v2
experience an effective attractive potential. It is there-
fore natural to first address the issue of phase separation
(PS), i.e. the possibility for the system to spontaneously
undergo a macroscopic segregation into two phases with
different hole concentrations. We analyze the problem as
a function of the hopping parameter t and hole concen-
tration x = nh/N , where nh is the number of holes in the
system and N the number of sites. In order to perform
a Maxwell construction we define:
s(x) =
e(x)− e(0)
where e(x) is the energy per site at doping x. This quan-
tity corresponds to the slope of the line passing through
e(0) and e(x). If the system exhibits PS, the energy will
present a change of curvature implying s(x) to have a
minimum at a critical doping xc. The fact that the local
curvature of e(x) at x = 0 is negative then implies that
the two seperated phases will have x = xc and x = 0 (the
undoped insulator). In Fig.2, typical results are shown
for both square and triangular lattices and for different
sizes. Interestingly, PS appears in both cases, but with
0 0.05 0.1 0.15
-0.025
-0.02
3x6x6
3x8x8
3x10x10
0 0.05 0.1 0.15
-0.138
-0.136
-0.134
12x12
14x14
16x16
PSfrag replacements
Triangular
v=0.85, t=0.05
Square
v=0.90, t=0.10
FIG. 2: (color online) Slope of energy density (Eq.(2)) vs
doping for different sizes. (a) Triangular lattice. (b) Square
lattice.
noticeable differences. While for the square lattice (lower
panel) the critical hole concentration xc is roughly size
independent, there is a strong size dependence for the
triangular lattice (upper panel). This size effect can be
traced back to the nature of the parent undoped state.
On the square lattice, the crystalline phase (for v < 1) at
zero doping is very robust and for increasing size, its lo-
cal order changes only weakly. On the triangular lattice,
it has been shown that size effects play an important role
[8], especially in the RVB liquid phase for 0.8 . v ≤ 1.
Periodic boundary conditions (BC) tend to stabilize the
so-called
12 phase on small clusters, and clusters
with more than 192 sites are necessary to significantly re-
duce finite-size effects, in particular, as in Fig.2, close to
the transition point with the crystalline phase. Hence the
PS observed around x = 0.075 for the 3× 6× 6 cluster is
not representative of the thermodynamic limit.
To obtain the phase diagram in the (v, x) plane, we
have performed a systematic size-scaling analysis at fixed
t and for various v’s depicted in Fig.3. In agreement with
0.003 0.004 0.005 0.006 0.007 0.008
v=0.95
v=0.90
v=0.85
0 0.002 0.004 0.006 0.008 0.01
v=0.90
v=0.85
v=0.80PSfrag replacements
Triangular
Square
Triangular
t=0.05
Square
t=0.15
FIG. 3: (color online) Scaling of the critical doping xc de-
fined by Maxwell construction with the inverse total number
of sites. (a) Triangular lattice. (b) Square lattice.
the previous discussion, a significant size dependence is
only present for the triangular lattice, in which case PS
disappears for large clusters in the RVB phase in the
vicinity of the RK point [9]. In Fig.4, we report the ther-
modynamic limit of xc for the two lattices as a function
of v, and for different values of t. For the square lattice,
calculations have been done from the RK point down
to the expected phase transition between the plaquette
phase and the columnar phase, namely v ≃ 0.6 [10]. For
the triangular lattice, the range between the RK point
down to the RVB-
12 transition point at v ≃ 0.8
has been covered [8]. These results clearly demonstrate
the difference between the square and triangular lattices.
In the first case, as soon as v 6= 1, PS occurs for x < xc.
Moreover, upon decreasing v, crystalline order strength-
ens and, for fixed t, it is necessary to consider a higher
concentration of holes to reach a stable conducting phase.
Similarly, the bigger t, the lower xc. On the triangular
lattice, a finite size-scaling analysis shows that no phase
separation appears down to a critical value v ∼ 0.9, well
above the critical value v ∼ 0.8 below which plaquette
order sets in. Although numerical limitations prevent
computations for smaller v and t, our results up to the
3 × 12 × 12-site cluster provide clear evidence for a re-
gion of PS inside the RVB region, between v ∼ 0.8 and
v ∼ 0.9 [11].
Dimer ordering on the square lattice: Next we investi-
gate how dimer order, known to exist at x = 0, evolves
under finite doping. Two scenarii are a priori possible: i)
the dimer order vanishes in the stable conducting phase
0.6 0.7 0.8 0.9 1
t=0.05
t=0.10
t=0.15
t=0.20
PSfrag replacements
Square
Triangular
FIG. 4: (color online) Phase separation boundaries for the
square and triangular lattices in the thermodynamic limit.
The dashed lines correspond to the approximate location of
the phase transition between plaquette and columnar phases
[10] for the square lattice and between plaquette and RVB
phases [8, 12] for the triangular lattice.
immediately at xc; ii) dimer order survives above xc in
a narrow region of the conducting phase. To solve this
problem, we have calculated the squared order parameter
D2(~k) in the GS |Ψ0〉 defined by:
D2(~k) =
〈Ψ0|d(−~k)d(~k)|Ψ0〉
〈Ψ0|Ψ0〉
along the path of Fig.5(a) the first Brillouin zone of the
square lattice, where d(~k) is the Fourier transform of the
dimer operator defined on the horizontal bonds. Note
that this calculation has not been tried for the triangular
lattice since no Bragg peak is present in the RVB phase,
and the algorithm is losing efficiency for v . 0.8 [9]. In
the pure plaquette phase on the square lattice, a Bragg
peak develops at point ~kM = (π, 0), the middle of the side
of the BZ. We show in Fig.5(b) a typical result for the
squared order parameter on the 196-site cluster for differ-
ent values of x. Clearly, the Bragg peak disappears upon
doping. A finite size scaling of the order parameter can
be performed thanks to the linear behaviour at low con-
centration. Within our data, D2(~k = ~kM ) ≡ D2M (L, x)
behaves like aLx + bL in the linear region. In this
case, one can determine rather precisely x+∞ such that
D2M (+∞, x+∞) = 0 i.e. x+∞ is the concentration in
the thermodynamic limit where the Bragg peak vanishes.
By definition, x+∞ = −b+∞/a+∞ ≃ 0.05(8). This value,
and the linear behaviour in the thermodynamic limit, are
displayed in Fig.5(c). If we compare x+∞ to the corre-
sponding xc from Fig.4, we can conclude that the Bragg
peak indeed vanishes at xc ≃ 0.067(5) within error bars.
Note that numerical errors increase for larger clusters,
and we are not able to use the same analysis for clusters
larger than 256 sites (the results for the 18 × 18 cluster
have not been used). Although the determination of x+∞
0 0.03 0.06 0.09 0.12 0.15
0.005
0.015
12x12
14x14
16x16
18x18
TD limit
0.005
x=0.00
x=0.03
x=0.06
x=0.09
x=0.12
PSfrag replacements
v=0.90, t=0.15
FIG. 5: (color online) (a) First Brillouin zone. (b) Momen-
tum dependence of the squared order parameter for the 196-
site cluster (14x14). (c) Squared order parameter at the M
point as a function of the hole concentration x and for differ-
ent cluster sizes. The thermodynamic limit is depicted as a
dashed line, with the corresponding error bar (see main text).
is delicate, the linear behavior of D2 vs x is consistent
with the physical behavior expected for the binary sys-
tem of Fig.4. No dimer order is present above xc, showing
that the system is simply “conducting ”in this case, with
D2(~kM ) decreasing as N
−1 (critical behaviour).
Flux quantization: Finally, let us turn to a better char-
acterization of the “conducting phase”. Since holes are
bosonic one expects the conducting phase to be a su-
perfluid (through Bose condensation)[13]. However, ex-
tra complexity results from the fact that these bosons
move in a dimer environment. To investigate this is-
sue, we pierce the torus by an Aharonov-Bohm flux
of strength φ = ξφ0 with 0 ≤ ξ ≤ 1 and where
φ0 = hc/e is the magnetic flux quantum. To reduce
finite size effects, we also consider arbitrary BC in the
second direction (y). All this is implemented by the
Peierls substitution, changing the hole hopping term into
t′ = exp(±i2πξa/Lx± i2πηa/Ly)t, where the ± depends
on the directions ±x± y, while a is the lattice parameter
and Lx and Ly the linear sizes of the system. Obviously,
the whole spectrum should be periodic in ξ with period
1. We show in Fig.6 the spectrum of the 4 × 4 cluster
on the square lattice, with 4 holes. It turns out that (i)
the GS energy exhibits well-defined minima and (ii) is
rigorously periodic with period ξ = 1/2, which means
that there is flux quantization in units of half the flux
quantum (red curve). Property (i) is typical of a super-
fluid [14]: It is the precursor of a finite barrier in the ther-
modynamic limit. [15]. Property (ii) was suggested quite
some time ago in the context of a more general QDM
by Kivelson[13], who also predicted that, in the cylinder
geometry, one should be able to tunnel between the two
branches of Fig.6, thus lifting the degeneracy at the level
crossing. This degeneracy is not lifted in our case, neither
in the torus geometry, nor in the cylinder geometry, due
0 0.2 0.4 0.6 0.8 1
-0.54
-0.51
-0.48
-0.45
k(0,0)
k(π,0)
-0.48
-0.46
PSfrag replacements
FIG. 6: (color online) Energy spectrum vs (reduced)
Aharonov-Bohm flux ξ, for a 4 × 4 cluster with 4 holes at
v = 0.70 and t = 1.00. (a) Torus geometry. Arbitrary BC are
used in the transverse direction to obtain a continuous set of
momenta leading to a continuum (colored area). (b) The two
low-energy branches for the case of a cylinder (dashed line)
and including a small bond disorder (full red line).
to the translational symmetry, which puts the two states
that are degenerate at ξ = 1/4 into different symmetry
sectors. However, getting rid of the translational sym-
metry by changing the amplitude of a local dimer flip
indeed removes the degeneracy (upper panel of Fig.6),
leading to a detectable flux quantization in units of half
the flux quantum in an experiment in which the flux is
sweeped. Thus, in our model, the ground-state energy
has periodicity hc/2e, consistent with mobile elementary
particles of charge Q = 2e in the system. Unlike what
was recently found in a bosonic model with correlated
hopping[16], these particles are not boson pairs: ¿From
the bosonic point of view, it is the statistical flux of the
dimer background that leads to the half-flux quantiza-
tion. If dimers are interpreted as SU(2) electron singlets,
these singlets are the physical pairs that lead to half-flux
quantization. This scenario is fundamentally different
from the usual mechanism related to real space pairing
of the charge carriers found e.g. in the extended Hubbard
chain[17], in the 2-leg ladders [18] or, more generally, in
Luther-Emery liquids [19], as can be inferred from the
exact degeneracy between ξ = 0 and ξ = 0.5 for finite
systems in the present case, to be contrasted with the
significant finite-size effects of the other cases.
Summary and conclusions: The numerical investiga-
tion with Green’s function QuantumMonte Carlo and ex-
act diagonalizations of the doped two-dimensional quan-
tum hard-core dimer model on the square and triangular
lattices has led to a number of interesting conclusions re-
garding hole motion in a dimer background. Phase sep-
aration is often present at low doping, as suggested by
earlier investigations, but our results indicate that it is
related to the presence of valence bond order [20]: In the
RVB phase of the triangular lattice, PS only occurs close
to the plaquette phase, where short-range dimer correla-
tions are already strong enough. Close to the RK point,
doping the RVB phase leads directly to a superfluid phase
as shown from its response to an Aharonov-Bohm flux.
Moreover, we observed that the flux quantization is in
units of half a flux quantum, consistent with the idea
that the dimer background leads to effective particles of
charge 2e. All these results are in qualitative agreement
with the gauge theories of high Tc superconductivity in
strongly correlated systems [21].
We acknowledge useful discussions with Federico
Becca. This work was supported by the Swiss National
Fund, by MaNEP, and by the Agence Nationale de la
Recherche (France).
[1] P.W. Anderson, Science 235, 1196 (1987).
[2] D.S. Rokhsar and S.A. Kivelson, Phys. Rev. Lett. 61,
2376 (1988).
[3] F. Vernay, A. Ralko, F. Becca and F. Mila, Phys. Rev.
B 74, 054402 (2006).
[4] M. E. Zhitomirsky, Phys. Rev. B 71, 214413 (2005).
[5] D. Poilblanc, F. Alet, F. Becca, A. Ralko, F. Trousselet
and F. Mila, Phys. Rev. B 74, 014437 (2006).
[6] C. Castelnovo, C. Chamon, C. Mudry and P. Pujol, Ann.
Phys. 322, 903 (2007).
[7] N. Trivedi and D.M. Ceperley, Phys. Rev. B 41, 4552
(1990); M. Calandra and S. Sorella, Phys. Rev. B 57,
11446 (1998).
[8] A. Ralko, M. Ferrero, F. Becca, D. Ivanov, and F. Mila,
Phys. Rev. B 74, 134301 (2006) and references therein.
[9] For small values of t, the GFMC algorithm is no longer
ergodic due to hole localization.
[10] O.F. Syljuasen, Phys. Rev. B 73, 245105 (2006).
[11] Long-range Coulomb repulsion is expected to have a ma-
jor role in the PS region and might stabilize stripes.
[12] R. Moessner and S. L. Sondhi, Phys. Rev. B 63, 224401
(2001).
[13] S. Kivelson, Phys. Rev. B 39, 259 (1989).
[14] N. Byers and C.N. Yang, Phys. Rev. Lett. 7, 46 (1961).
[15] By contrast, for non-interacting fermions, signs of a flat
energy curve already appear on such small clusters pro-
vided one also uses arbitrary BC. Similar arguments were
used in D. Poilblanc, Phys. Rev. B 44, 9562 (1991) for
the 2D t-J model.
[16] R. Bendjama, B. Kumar, F. Mila, Phys. Rev. Lett. 95,
110406 (2005).
[17] K. Penc and F. Mila, Phys. Rev. B 49, 9670 (1994).
[18] C. A. Hayward, D. Poilblanc, R. M. Noack, D. J.
Scalapino and W. Hanke, Phys. Rev. Lett. 75, 926
(1995).
[19] A. Seidel and D. H. Lee, Phys. Rev. B 71, 045113 (2005).
[20] Our results give an explicit characterization of the
confinement-deconfinement discussed in O.F. Syljuasen,
Phys. Rev. B 71, 020401(R)(2005).
[21] T. Senthil and P.A. Lee, Phys. Rev. B 71, 174515 (2005).
and references therein.
ABSTRACT
  The doped two-dimensional quantum dimer model is investigated by numerical
techniques on the square and triangular lattices, with significantly different
results. On the square lattice, at small enough doping, there is always a phase
separation between an insulating valence-bond solid and a uniform superfluid
phase, whereas on the triangular lattice, doping leads directly to a uniform
superfluid in a large portion of the RVB phase. Under an applied Aharonov-Bohm
flux, the superfluid exhibits quantization in terms of half-flux quanta,
consistent with Q=2e elementary charge quanta in transport properties.

<|endoftext|><|startoftext|>
Introduction
For a given combinatorial class of objects, such as polygons or polyhedra, the most basic
question concerns the number of objects of a given size (always assumed to be finite), or
an asymptotic estimate thereof. Informally stated, in this overview we will analyse the
refined question:
What does a typical object look like?
In contrast to the combinatorial question about the number of objects of a given size, the
latter question is of a probabilistic nature. For counting parameters in addition to object
size, one asks for their (asymptotic) probability law. To give this question a meaning,
an underlying ensemble has to be specified. The simplest choice is the uniform ensemble,
where each object of a given size occurs with equal probability.
For self-avoiding polygons on the square lattice, size may be the number of edges of the
polygon, and an additional counting parameter may be the area enclosed by the polygon.
We will call this ensemble the fixed perimeter ensemble. For the uniform fixed perimeter
ensemble, one assumes that, for a fixed number of edges, each polygon occurs with the
same probability. Another ensemble, which we will call the fixed area ensemble, is obtained
with size being the polygon area, and the number of edges being an additional counting
http://arxiv.org/abs/0704.0716v4
parameter. For the uniform fixed area ensemble, one assumes that, for fixed area, each
polygon occurs with the same probability.
To be specific, let pm,n denote the number of square lattice self-avoiding polygons of
half-perimeter m and area n. Discrete random variables X̃m of area in the uniform fixed
perimeter ensemble and of perimeter Ỹn in the uniform fixed area ensemble are defined by
P(X̃m = n) =
pm,n∑
n pm,n
, P(Ỹn = m) =
pm,n∑
m pm,n
We are interested in an asymptotic description of these probability laws, in the limit of
infinite object size.
In statistical physics, certain non-uniform ensembles are important. For fixed object
size, the probability of an object with value n of the counting parameter (such as the area
of a polygon) may be proportional to an, for some non-negative parameter a = e−βE of
non-uniformity. Here E is the energy of the object, and β = 1/(kBT ), where T is the
temperature, and kB denotes Boltzmann’s constant. A qualitative change in the behaviour
of typical objects may then be reflected in a qualitative change in the probability law of
the counting parameter w.r.t. a. Such a change is an indication of a phase transition, i.e.,
a non-analyticity in the free energy of the corresponding ensemble.
For self-avoiding polygons in the fixed perimeter ensemble, let q denote the parameter
of non-uniformity,
P(X̃m(q) = n) =
pm,nq
n pm,nq
Polygons of large area are suppressed in probability for small values of q, such that one
expects a typical self-avoiding polygon to closely resemble a branched polymer. Likewise,
for large values of q, a typical polygon is expected to be inflated, closely resembling a ball
(or square) shape. Let us define the ball-shaped phase by the condition that the mean
area of a polygon grows quadratically with its perimeter. The ball-shaped phase occurs
for q > 1 [31]. Linear growth of the mean area w.r.t. perimeter is expected to occur
for all values 0 < q < 1. This phase called the branched polymer phase. Of particular
interest is the point q = 1, at which a phase transition occurs [31]. This transition is called
a collapse transition. Similar considerations apply for self-avoiding polygons in the fixed
area ensemble,
P(Ỹn(x) = m) =
pm,nx
pm,nxm
with parameter of non-uniformity x, where 0 < x < ∞.
For a given model, these effects may be studied using data from exact or Monte-Carlo
enumeration and series extrapolation techniques. Sometimes, the underlying model is
exactly solvable, i.e., it obeys a combinatorial decomposition, which leads to a recursion
for the counting parameter. In that case, its (asymptotic) behaviour may be extracted
from the recurrence.
A convenient tool is generating functions. The combinatorial information about the
number of objects of a given size is coded in a one-variable (ordinary) generating function,
typically of positive and finite radius of convergence. Given the generating function of the
counting problem, the asymptotic behaviour of its coefficients can be inferred from the
leading singular behaviour of the generating function. This is determined by the location
and nature of the singularity of the generating function closest to the origin. There are
elaborate techniques for studying this behaviour exactly [37] or numerically [43].
The case of additional counting parameters leads to a multivariate generating function.
For self-avoiding polygons, the half-perimeter and area generating function is
P (x, q) =
pm,nx
For a fixed value of a non-uniformity parameter q0, where 0 < q0 ≤ 1, let x0 be the radius
of convergence of P (x, q0). The asymptotic law of the counting parameter is encoded in the
singular behaviour of the generating function P (x, q) about (x0, q0). If locally about (x0, q0)
the nature of the singularity of P (x, q) does not change, then distributions are expected
to be concentrated, with a Gaussian limit law. This corresponds to the physical intuition
that fluctuations of macroscopic quantities are asymptotically negligible away from phase
transition points. If the nature of the singularity does change locally, we expect non-
concentrated distributions, resulting in non-Gaussian limit laws. This is expected to be
the case at phase transition points.
Qualitative information about the singularity structure is given by the singularity di-
agram (also called the phase diagram). It displays the region of convergence of the two-
variable generating function, i.e., the set of points (x, q) in the closed upper right quadrant
of the plane, such that the generating function P (x, q) converges. The set of boundary
points with positive coordinates is a set of singular points of P (x, q), called the critical
curve. See Figure 1 for a sketch of the singularity diagram of a typical polygon model
such as self-avoiding polygons, counted by half-perimeter and area, with generating func-
tion P (x, q) as above. There appear two lines of singularities, which intersect at the point
Figure 1: Singularity diagram of a typical polygon model counted by half-perimeter and area,
with x conjugate to half-perimeter and q conjugate to area.
(x, q) = (xc, 1). Here xc is the radius of convergence of the half-perimeter generating func-
tion P (x, 1), also called the critical point. The nature of a singularity does not change
along each of the two lines, and the intersection point (x, q) = (xc, 1) of the two lines is a
phase transition point. For 0 < q < 1 fixed, denote by xc(q) the radius of convergence of
P (x, q). The branched polymer phase for the fixed perimeter ensemble 0 < q < 1 (and also
for the corresponding fixed area ensemble) is asymptotically described by the singularity
of P (x, q) about (xc(q), q). In the ball-shaped phase q > 1 of the fixed perimeter ensemble,
the (ordinary) generating function does not seem the right object to study, since it has zero
radius of convergence for fixed q > 1. The singularity of P (x, q) about (x, 1) describes, for
0 < x < xc, a ball-shaped phase in the fixed area ensemble, with a finite average size of a
ball.
For points (x, q) within the region of convergence, both x and y positive, the generating
function P (x, q) is finite and positive. Thus, such points may be interpreted as parameters
in a mixed infinite ensemble
P(X̃(x, q) = (m,n)) =
pm,nx
m,n pm,nx
The limiting law of the counting parameter in the fixed area or fixed perimeter ensem-
ble can be extracted from the leading singular behaviour of the two-variable generating
function. There are two different approaches to the problem. The first one consists in
analysing, for fixed non-uniformity parameter a, the singular behaviour of the remaining
one-parameter generating function and its derivatives w.r.t. a. This method is also called
the method of moments. It can be successfully applied in the fixed perimeter ensemble at
the phase transition point. Typically, this results in non-concentrated distributions.
The second approach derives an asymptotic approximation of the two-variable generat-
ing function. Away from a phase transition point, such an approximation can be obtained
for some classes of models, typically resulting in concentrated distributions, with a Gaus-
sian law for the centred and normalised random variable. However, it is usually difficult to
extract such information at a phase transition point. The theory of tricritical scaling seeks
to fill this gap, by suggesting and justifying a particular ansatz for an approximation using
scaling functions. Knowledge of the approximation may imply knowledge of the quantities
analysed in the first approach.
In the following, we give an overview of these two approaches. For the first approach,
summarised by the title limit distributions , there are a number of rigorous results, which
we will discuss. The second approach, summarised by the title scaling functions , is less
developed. For that reason, our presentation will be more descriptive, stating important
open questions. We will stress connections between the two approaches, thereby providing
a probabilistic interpretation of scaling functions in terms of limit distributions.
2 Polygon models and generating functions
Models of polygons, polyominoes or polyhedra have been studied intensively on the square
and cubic lattices. It is believed that the leading asymptotic behaviour of such models,
such as the type of limit distribution or critical exponents, is independent of the underlying
lattice.
In two dimensions, a number of models of square lattice polygons have been enumerated
according to perimeter and area and other parameters, see [7] for a review of models with an
exact solution. The majority of such models has an algebraic perimeter generating function.
We mention prudent polygons [96, 22, 8] as a notable exception. Of particular importance
for polygon models is the fixed perimeter ensemble, since it models two-dimensional vesicle
collapse. Another important ensemble is the fixed area ensemble, which serves as a model of
ring polymers. The fixed area ensemble may also describe percolation and cluster growth.
For example, staircase polygons are models of directed compact percolation [26, 28, 29,
27, 12, 57]. This may be compared to the exactly solvable case of percolation on a tree
[42]. The model of self-avoiding polygons is conjectured to describe the hull of critical
percolation clusters [60].
In addition to perimeter, other counting parameters have been studied, such as width
and height, generalisations of area [89], radius of gyration [53, 64], number of nearest-
neighbour interactions [4], last column height [7], and site perimeter [20, 11]. Also, mo-
tivated by applications in chemistry, symmetry subclasses of polygon models have been
analysed [63, 62, 40, 95]. Whereas this gives rise to a number of different ensembles, only
a few of them have been asymptotically studied. Not all of them display phase transitions.
In three dimensions, models of polyhedra on the cubic lattice have been enumerated
according to perimeter, surface area and volume, see [74, 102, 3] and the discussion in
section 3.9. Various ensembles may be defined, such as the fixed surface area ensemble
and the fixed volume ensemble. The fixed surface area ensemble serves as a model of
three-dimensional vesicle collapse [104].
In this chapter, we will consider models of square lattice polygons, counted by half-
perimeter and area. Let pm,n denote the (finite) number of such polygons of half-perimeter
m and area n. The numbers pm,n will always satisfy the following assumption.
Assumption 1. For m,n ∈ N0, let non-negative integers pm,n ∈ N0 be given. The numbers
pm,n are assumed to satisfy the following properties.
i) There exist positive constants A,B > 0 such that pm,n = 0 if n ≤ Am or if n ≥ Bm2.
ii) The sequence (
n pm,n)m∈N0 has infinitely many positive elements and grows at most
exponentially.
Remarks. i) A sequence (an)n∈N0 is said to grow at most exponentially, if there are positive
constants C, µ such that |an| ≤ Cµn for all n.
ii) Condition i) reflects the geometric constraint that the area of a polygon grows at most
quadratically and at least linearly with its perimeter. For self-avoiding polygons, we have
n ≥ m − 1. Since pm,n = 0 if m < 2, we may choose A = 1/3. Since n ≤ m2/4 for
self-avoiding polygons, we may choose B = 1/3. Condition ii) is a natural condition on
the growth of the number of polygons of a given perimeter. For self-avoiding polygons, we
may choose C = 1 and µ = 16.
iii) For models with counting parameters different from area, or for models in higher
dimensions, a modified assumption holds, with the growth condition i) being replaced by
n ≤ Amk0 and n ≥ Bmk1 , for appropriate values of k0 and k1. Counting parameters
statisfying pm,n = 0 for n ≥ Bmk are called rank k parameters [25].
The above assumption imposes restrictions on the generating function of the numbers
pm,n. These explain the qualitative form of the singularity diagram Figure 1.
Proposition 1. For numbers pm,n, let Assumption 1 be satisfied. Then, the generating
function P (x, q) =
m,n pm,nx
mqn has the following properties.
i) The generating function P (x, q) satisfies for k ∈ N
P (x, q) ≪
P (x, q) ≪ Bk
P (x, q),
where ≪ denotes coefficient-wise domination.
ii) The evaluation P (x, 1) is a power series with radius of convergence xc, where 0 <
xc ≤ 1.
iii) The generating function P (x, q) diverges, if x 6= 0 and |q| > 1. It converges, if |q| < 1
and |x| < xcq−A. In particular, for k ∈ N0, the evaluations
P (x, q)
are power series with radius of convergence 1.
iv) For k ∈ N0, the evaluations
P (x, q)
are power series with radius of convergence xc. They satisfy, for |x| < xc,
P (x, q)
= lim
−1<q<1
P (x, q).
sketch. The domination formula follows immediately from condition i). The existence of
the evaluations at q = 1 and x = xc as formal power series also follows from condition i).
Condition ii) ensures that 0 < xc ≤ 1 for the radius of convergence of P (x, 1). Equality
of the radii of convergence for the derivatives follows from condition i) by elementary
estimates. The claimed analytic properties of P (x, q) follow from conditions i) and ii) by
elementary estimates. The claimed left-continuity of the derivatives in iv) is implied by
Abel’s continuity theorem for real power series.
Remarks. i) Proposition 1 implies that the critical curve xc(q) satisfies for 0 < q < 1 the
estimate xc(q) ≥ xcq−A. For self-avoiding polygons, the critical curve xc(q) is continuous
for 0 < q < 1. This follows from a certain supermultiplicative inequality for the numbers
pm,n by convexity arguments [48].
ii) Of central importance in the sequel will be the power series
gk(x) =
P (x, q)
. (1)
They are called factorial moment generating functions , for reasons which will become clear
later.
We continue studying analytic properties of the factorial moment generating functions.
In the following, the notation x ր x0 denotes the limit x → x0 for sequences (xn) satisfying
|xn| < x0. The notation f(x) ∼ g(x) as x ր x0 means that g(x) 6= 0 in a left neigbourhood
of x0 and that limxրx0 f(x)/g(x) = 1. Likewise, am ∼ bm as m → ∞ for sequences
(am), (bm) means that bm 6= 0 for almost all m and limm→∞ am/bm = 1. The following
lemma is a standard result.
Lemma 1. Let (am)m∈N0 be a sequence of real numbers, which asymptotically satisfy
am ∼ Ax−mc mγ−1 (m → ∞), (2)
for real numbers A, xc, γ, where A 6= 0 and xc > 0.
Then, the generating function g(x) =
m=0 amx
m has radius of convergence xc. If
γ /∈ {0,−1,−2, . . .}, then there exists a power series g(reg)(x) with radius of convergence
strictly larger than xc, such that g(x) satisfies
g(x)− g(reg)(x)
∼ AΓ(γ)
(1− x/xc)γ
(x ր xc), (3)
where Γ(z) denotes the Gamma function.
Remarks. i) The above lemma can be proved using the analytic properties of the polylog
function [32]. If γ ∈ {0,−1,−2, . . .}, an asymptotic form similar to Eq. (3) is valid, which
involves logarithms.
ii) The function g(reg)(x) in the above lemma is not unique. For example, if γ > 0, any
polynomial in x may be chosen. We demand g(reg)(x) ≡ 0 in that case. If γ < 0 and
g(reg)(x) is restricted to be a polynomial, it is uniquely defined. If −1 < γ < 0, we
have g(reg)(x) ≡ g(xc). In the general case, the polynomial has degree ⌊−γ⌋, compare
[32]. In the following, we will demand uniqueness by the above choice. The power series
g(sing)(x) :=
g(x)− g(reg)(x)
is then called the singular part of g(x).
Conversely, let a power series g(x) with radius of convergence xc be given. In order to
conclude from Eq. (3) the behaviour Eq. (2), certain additional analyticity assumptions
on g(x) have to be satisfied. To this end, a function g(x) is called ∆(xc, η, φ)-regular (or
simply ∆-regular) [30], if there is a positive real number xc > 0, such that g(x) is analytic
in the indented disc ∆(xc, η, φ) := {z ∈ C : |z| ≤ xc + η, |Arg(z − xc)| ≥ φ}, for some
η > 0 and some φ, where 0 < φ < π/2. Note that xc /∈ ∆, where we adopt the convention
Arg(0) = 0. The point x = xc is the only point for |x| ≤ xc, where g(x) may possess a
singularity.
Lemma 2 ([35]). Let the function g(x) be ∆-regular and assume that
g(x) ∼ 1
(1− x/xc)γ
(x → xc in ∆).
If γ /∈ {0,−1,−2, . . .}, we then have
[xm]g(x) ∼
x−mc m
γ−1 (m → ∞),
where [xm]g(x) denotes the Taylor coefficient of g(x) of order m about x = 0.
Remarks. i) Note that the coefficients of the function f(x) = (1 − x/xc)−γ with real
exponent γ /∈ {0,−1,−2, . . .} satisfy
[xm]f(x) ∼ 1
x−mc m
γ−1 (m → ∞). (4)
This may be seen by an application of the binomial series and Stirling’s formula. For
functions g(x) ∼ f(x), the assumption of ∆-regularity for g(x) ensures that the same
asymptotic estimate holds for the coefficients of g(x).
ii) Theorems of the above type are called transfer theorems [35, 37]. The set of ∆-regular
functions with singularities of the above form is closed under addition, multiplication,
differentiation, and integration [30].
iii) The case of a finite number of singularities on the circle of convergence can be treated
by a straightforward extension of the above result [35, 37].
Lemma 1 implies a particular singular behaviour of the factorial moment generating
functions, if the numbers pm,n satisfy certain typical asymptotic estimates. We write
(a)k = a · (a− 1) · . . . · (a− k + 1) to denote the lower factorial.
Proposition 2. For m,n ∈ N0, let real numbers pm,n be given. Assume that the numbers
pm,n asymptotically satisfy, for k ∈ N0,
(n)kpm,n ∼ Akx−mc mγk−1 (m → ∞), (5)
for real numbers Ak, xc, γk, where Ak > 0, xc > 0, and γk /∈ {0,−1,−2, . . .}.
Then, the factorial moment generating functions gk(x) satisfy
(sing)
k (x) ∼
(1− x/xc)γk
(x ր xc), (6)
where fk = Ak Γ(γk).
Model φ θ γ0 Area limit law
rectangles
convex polygons
−1 2 β1,1/2
Ferrers diagrams
stacks
1 Gaussian
staircase polygons
bargraph polygons
column-convex polygons
directed column-convex polygons
diagonally convex directed polygons
rooted self-avoiding polygons∗
directed convex polygons 2
meander
diagonally convex polygons∗ −1
three-choice polygons 0
Table 1: Exponents and area limit laws for prominent polygon models. An asterisk denotes a
numerical analysis.
Remarks. i) The above assumption on the growth of the coefficients in Eq. (5) is typical
for polygon models, with γk = (k − θ)/φ, and φ > 0.
ii) If the numbers pm,n satisfy, in addition to Eq.(5), Condition i) of Assumption 1, this
implies for exponents of the form γk = (k − θ)/φ, where φ > 0, the estimate 1/2 ≤ φ ≤ 1.
iii) The proposition implies that the singular part of the factorial moment generating
function gk(x) is asymptotically equal to the singular part of the corresponding (ordinary)
moment generating function,
P (x, q)
)(sing)
P (x, q)
∣∣∣∣∣
(sing)
(x ր xc).
We give a list of exponents and area limit distributions for a number of polygon models.
An asterisk denotes that corresponding results rely on a numerical analysis. It appears that
the value (θ, φ) = (1/3, 2/3) arises for a large number of models. Furthermore, the exponent
γ0 seems to determine the area limit law. These two observations will be explained in the
following section.
3 Limit distributions
In this section, we will concentrate on models of square lattice polygons in the fixed perime-
ter ensemble, and analyse their area law. The uniform ensemble is of particular interest,
since non-Gaussian limit laws usually appear, due to expected phase transitions at q = 1.
For non-uniform ensembles q 6= 1, Gaussian limit laws are expected, due to the absence of
phase transitions.
There are effective techniques for the uniform ensemble, since the relevant generating
functions are typically algebraic. This is different from the fixed area ensemble, where
singularities are more difficult to analyse. It will turn out that the dominant singularity of
the perimeter generating function determines the limiting area law of the model. We will
first discuss several examples with different type of singularity. Then, we will describe a
general result, by analysing classes of q-difference equations (see e.g. [103]), which exactly
solvable polygon models obey. Whereas in the case q 6= 1 their theory is developed to some
extent, the case q = 1 is more difficult to analyse. Motivated by the typical behaviour of
polygon models, we assume that a q-difference equation reduces to an algebraic equation
as q approaches unity, and then analyse the behaviour of its solution about q = 1.
Useful background concerning a probabilistic analysis of counting parameters of combi-
natorial structures can be found in [37, Ch IX]. See [80, Ch 1] and [5, Ch 1] for background
about asymptotic expansions. For properties of formal power series, see [39, Ch 1.1]. A
useful reference on the Laplace transform, which will appear below, is [23].
3.1 An illustrative example: Rectangles
3.1.1 Limit law of area
Let pm,n denote the number of rectangles of half-perimeter m and area n. Consider the
uniform fixed perimeter ensemble, with a discrete random variable of area X̃m defined by
P(X̃m = n) =
pm,n∑
n pm,n
. (7)
The k-th moments of X̃m are given explicitly by
E[X̃km] =
(l(m− l))k
∼ m2k
(x(1− x))kdx = (k!)
(2k + 1)!
m2k (m → ∞),
where we approximated the Riemann sum by an integral, using the Euler-MacLaurin sum-
mation formula. Thus, the random variable X̃m has mean µm ∼ m2/6 and variance
σ2m ∼ m4/180. Since the sequence of random variables (X̃m) does not satisfy the con-
centration property limm→∞ σm/µm = 0, we expect a non-trivial limiting distribution.
Consider the normalised random variable
. (8)
Since the moments ofXm converge asm → ∞, and the limit sequenceMk := limm→∞ E[Xkm]
satisfies the Carleman condition
k(M2k)
−1/(2k) = ∞, they define [17, Ch 4.5] a unique
random variable X with moments Mk. Its moment generating function M(t) = E[e
−tX ] is
readily obtained as
M(t) =
E[Xk]
(−t)k =
et erf
The corresponding probability distribution p(x) is obtained by an inverse Laplace trans-
form, and is given by
p(x) =
1−x 0 ≤ x ≤ 1
0 x > 1
. (9)
This distribution is known as the beta distribution β1,1/2. Together with [17, Thm 4.5.5],
we arrive at the following result.
Theorem 1. The area random variable X̃m of rectangles Eq. (7) has mean µm ∼ m2/6
and variance σ2m ∼ m4/180. The normalised random variables Xm Eq. (8) converge in
distribution to a continuous random variable with limit law β1,1/2 Eq. (9). We also have
moment convergence.
3.1.2 Limit law via generating functions
We now extract the limit distribution using generating functions. Whereas the derivation
is less direct than the previous approach, the method applies to a number of other cases,
where a direct approach fails. Consider the half-perimeter and area generating function
P (x, q) for rectangles,
P (x, q) =
pm,nx
The factorial moments of the area random variable X̃m Eq. (7) are obtained from the
generating function via
E[(X̃m)k] =
n(n)kpm,n∑
n pm,n
[xm] ∂
P (x, q)
[xm]P (x, 1)
where (a)k = a · (a − 1) · . . . · (a − k + 1) is the lower factorial. The generating function
P (x, q) satisfies [87, Eq. 5.1] the linear q-difference equation [103]
P (x, q) = x2qP (qx, q) +
x2q(1 + qx)
1− qx
. (10)
Due to the particular structure of the functional equation, the area moment generating
functions
gk(x) =
P (x, q)
are rational functions and can be computed recursively from the functional equation, by
repeated differentiation w.r.t. q and then setting q=1. (Such calculations are easily per-
formed with a computer algebra system.) This gives, in particular,
g0(x) =
(1− x)2
, g1(x) =
(1− x)4
g2(x) =
(1− x)6
, g3(x) =
(1− x)8
g4(x) =
x4(1 + 22x+ x2)
(1− x)10
, g5(x) =
12x5(1 + 8x+ x2)
(1− x)12
Whereas the exact expressions get messy for increasing k, their asymptotic form about
their singularity xc = 1 is simply given by
gk(x) ∼
(1− x)2k+2
(x → 1). (11)
The above result can be inferred from the functional equation, which induces a recursion
for the functions gk(x), which in turn can be asymptotically analysed. This method is
called moment pumping [36]. Below, we will extract the above asymptotic behaviour by
the method of dominant balance.
The asymptotic behaviour of the moments of X̃m can be obtained from singularity
analysis of generating functions, as described in Lemma 2. Using the functional equation,
it can be shown that all functions gk(x) are Laurent series about x = 1, with a finite number
of terms. Hence the remark following Lemma 2 implies for the (factorial) moments of the
random variable Xm Eq. (8) the expression
E[(Xm)
∼ E[(Xm)k]
Γ(2k + 2)
(2k + 1)!
(m → ∞),
in accordance with the previous derivation.
On the level of the moment generating function, an application of Watson’s lemma [5,
Sec 4.1] shows that the coefficients k! in Eq. (11) appear in the asymptotic expansion of a
certain Laplace transform of the (entire) moment generating function E[e−tX ],
E[Xk]
(−t2)k
t dt ∼
(−1)kk!s−(2k+2) (s → ∞).
Note that the r.h.s. is formally obtained by term-by-term integration of the l.h.s..
Using the arguments of [46, Ch 8.11], one concludes that there exists an s0 > 0, such
that there is a unique function F (s) analytic for ℜ(s) ≥ s0 with the above asymptotic
expansion. It is given by
F (s) = Ei(s2) es
, (12)
where Ei(z) =
e−tz /t dt is the exponential integral. The moment generating function
M(t) = E[e−tX ] of the random variable X is given by an inverse Laplace transform of F (s),
e−stM(t2)t dt = F (s).
Since there are effective methods for computing inverse Laplace transforms [23], the
question arises whether the function F (s) can be easily obtained. It turns out that the
functional equation Eq. (10) induces a differential equation for F (s). This equation can be
obtained in a mechanical way, using the method of dominant balance.
3.1.3 Dominant balance
For a given functional equation, the method of dominant balance consists of a certain
rescaling of the variables, such that the quantity of interest appears in the expansion of a
rescaled variable to leading order. The method was originally used as an heuristic tool in
order to extract the scaling function of a polygon model [84] (see the following section). In
the present framework, it is a rigorous method.
Consider the half-perimeter and area generating function P (x, q) as a formal power
series. The substitution q = 1− ǫ̃ is valid, since the coefficients of the power series P (x, q)
in x are polynomials in q. We get the power series in ǫ̃,
H(x, ǫ̃) =
(−1)kgk(x)ǫ̃k.
whose coefficients (−1)kgk(x) are power series in x. The functional equation Eq. (10)
induces an equation for H(x, ǫ̃), from which the factorial area moment generating functions
gk(x) may be computed recursively.
Now replace gk(x) by its expansion about x = 1,
gk(x) =
(1− x)2k+2−l
Introducing s̃ = 1− x, this leads to a power series E(s̃, ǫ̃) in ǫ̃,
E(s̃, ǫ̃) =
(−1)k
s̃2k+2−l
whose coefficients are Laurent series in s̃. As above, the functional equation induces an
equation for the power series E(s̃, ǫ̃) in ǫ̃, from which the expansion coefficients may be
computed recursively.
We infer from the previous equation that
E(sǫ, ǫ2) =
(−1)k fk,l
s2k+2−l
F (s, ǫ). (13)
Write F (s, ǫ) =
l≥0 Fl(s)ǫ
l. By construction, the (formal) series F0(s) = F (s, 0) coincides
with the asymptotic expansion of the desired function F (s) Eq. (12) about infinity.
The above example suggests a technique for computing F0(s). The functional equation
Eq. (10) for P (x, q) induces, after reparametrisation, differential equations for the functions
Fl(s), from which F0(s) may be obtained explicitly. These may be computed by first writing
P (x, q) =
(1− q)1/2
, (1− q)1/2
, (14)
and then introducing variables s and ǫ, by setting x = 1− sǫ and q = 1− ǫ2. Expand the
equation to leading order in ǫ. This yields, to order ǫ0, the first order differential equation
sF ′0(s) + 2− 2s2F0(s) = 0.
The above equation translates into a recursion for the coefficients fk,0, from which fk,0 = k!
can be deduced. In addition, the equation has a unique solution with the prescribed
asymptotic behaviour Eqn. (13), which is given by F0(s) = Ei(s
2) es
As we will argue in the next section, Eq. (14) is sometimes referred to as a scaling
Ansatz, the function F (s, 0) appears as a scaling function, the functions Fl(s), for l ≥ 1,
appear as correction-to-scaling functions . In our formal framework, where the series Fl(s)
are rescaled generating functions for the coefficients fk,l, their derivation is rigorous.
3.2 A general method
In the preceding two subsections, we described a method for obtaining limit laws of counting
parameters, via a generating function approach. Since this method will be important in
the remainder of this section, we summarise it here. Its first ingredient is based on the
so-called method of moments [17, Thm 4.5.5].
Proposition 3. For m,n ∈ N0, let real numbers pm,n be given. Assume that the numbers
pm,n asymptotically satisfy, for k ∈ N0,
(n)kpm,n ∼ Akx−mc mγk−1 (m → ∞), (15)
where Ak are positive numbers, and γk = (k − θ)/φ, with real constants θ and φ > 0.
Assume that the numbers Mk := Ak/A0 satisfy the Carleman condition
(M2k)
−1/(2k) = +∞. (16)
Then the following conclusions hold.
i) For almost all m, the random variables X̃m
P(X̃m = n) =
pm,n∑
n pm,n
are well defined. We have
Xm :=
d→ X, (18)
for a unique random variable X with moments Mk, where d denotes convergence in
distribution. We also have moment convergence.
ii) If the numbers Mk satisfy for all t ∈ R the estimate
= 0, (19)
then the moment generating function M(t) = E[e−tX ] of X is an entire function.
The coefficients AkΓ(γk) are related to M(t) by a Laplace transform which has, for
θ > 0, the asymptotic expansion
E[Xk]
(−t1/φ)k
t1−γ0
(−1)kAkΓ(γk)s−γk (s → ∞).
sketch. A straightforward calculation using Eq. (15) leads to
E[(X̃m)k]
mk/φ (m → ∞).
This implies that the same asymptotic form holds for the (ordinary) moments E[(X̃m)
Due to the growth condition Eq. (16), the sequence (Mk) defines a unique random variable
X with moments Mk. Also, moment convergence of the sequence (Xm) to X implies
convergence in distribution, see [17, Thm 4.5.5]. Due to the growth condition Eq. (19), the
function M(t) is entire. Hence the conditions of Watson’s Lemma [5, Sec 4.1] are satisfied,
and we obtain Eq. (20).
Remarks. i) The growth condition Eq. (19) implies the Carleman condition Eq. (16). All
examples below have entire moment generating functions M(t).
ii) If γ0 < 0, a modified version of Eq. (20) can be given, see for example staircase polygons
below.
Proposition 2 states that assumption Eq. (15) translates, at the level of the half-
perimeter and area generating function P (x, q) =
m,n pm,nx
mqn, to a certain asymptotic
expression for the factorial moment generating functions
gk(x) =
P (x, q)
Their asymptotic behaviour follows from Eq. (15), and is
(sing)
k (x) ∼
(1− x/xc)γk
(x ր xc),
where fk = AkΓ(γk). Adopting the generating function viewpoint, the amplitudes fk
determine the numbers Ak, hence the moments Mk = Ak/A0 of the limit distribution. The
series F (s) =
k≥0(−1)kfks−γk will be of central importance in the sequel.
Definition 1 (Area amplitude series). Let Assumption 1 be satisfied. Assume that the
generating function P (x, q) =
m,n pm,nx
mqn satisfies asymptotically
P (x, q)
)(sing)
(1− x/xc)γk
(x ր xc),
with exponents γk /∈ {0,−1,−2, . . .}. Then, the formal series
F (s) =
(−1)k fk
is called the area amplitude series.
Remarks. i) Proposition 3 states that the area amplitude series appears in the asymptotic
expansion about infinity of a Laplace transform of the moment generating function of the
area limit distribution. The probability distribution of the limiting area distribution is
related to F (s) by a double Laplace transform.
ii) For typical polygon models, all derivatives of P (x, q) w.r.t. q, evaluated at q = 1, exist
and have the same radius of convergence, see Proposition 1. Typical polygon models do
have factorial moment generating functions of the above form, see the examples below.
The second ingredient of the method consists in applying the method of dominant
balance. As described above, this may result in a differential equation (or in a difference
equation [90]) for the function F (s). Its applicability has to be tested for each given type
of functional equation. Typically, it can be applied if the factorial area moment generating
functions gk(x) Eq. (1) have, for values x < xc, a local expansion about x = xc of the form
(sing)
k (x) =
(1− x/xc)γk,l
where γk,l = (k − θl)/φ and θl+1 > θl. If a transfer theorem such as Lemma 2 applies,
then the differential equation for F (s) induces a recurrence for the moments of the limit
distribution. If the differential equation can be solved in closed form, inverse Laplace
transform techniques may be applied in order to obtain explicit expressions for the moment
generating function and the probability density. Also, higher order corrections to the
limiting behaviour may be analysed, by studying the functions Fl(s), for l ≥ 1. See [87]
for examples.
3.3 Further examples
Using the general method as described above, area limit laws for the other exactly solved
polygon models can be derived. A model with the same area limit law as rectangles
is convex polygons, compare [87]. We will discuss some classes of polygon models with
different area limit laws.
3.3.1 Ferrers diagrams
In contrast to the previous example, the limit distribution of area of Ferrers diagrams is
concentrated.
Proposition 4. The area random variable X̃m of Ferrers diagrams has mean µm ∼ m2/8.
The normalised random variablesXm Eq. (18) converge in distribution to a random variable
with density p(x) = δ(x− 1/8).
Remark. It should be noted that the above convergence statement already follows from
the concentration property limm→∞ σm/µm = 0, with σ
m ∼ m3/48 the variance of Xm, by
an explicit analysis of the first three factorial moment generating functions. (By Cheby-
shev’s inequality, the concentration property implies convergence in probability, which in
turn implies convergence in distribution.) For illustrative purposes, we follow a different
route via the moment method in the following proof.
Proof. Ferrers diagrams, counted by half-perimeter and area, satisfy the linear q-difference
equation [87, Eq (5.4)]
P (x, q) =
(1− qx)2
P (qx, q) +
(1− qx)2
The perimeter generating function g0(x) = x
2/(1 − 2x) is obtained by setting q = 1 in
the above equation. Hence xc = 1/2. Using the functional equation, it can be shown by
induction on k that all area moment generating functions gk(x) are rational in g0(x) and
its derivatives. Hence all gk(x) are rational functions. Since the area of a polygon grows
at most quadratically with the perimeter, we have a bound on the exponent, γk ≤ 2k + 1,
of the leading singular part of gk(x). Given this bound, the method of dominant balance
can be applied. We set
P (x, q) =
(1− q) 12
1− 2x
(1− q) 12
, (1− q)
and introduce new variables s and ǫ by q = 1 − ǫ2 and 2x = 1 − sǫ. Then an expansion
of the functional equation yields, to order ǫ0, the ODE of first order F ′(s) = 4sF (s)− 1,
whose unique solution with the prescribed asymptotic behaviour is
F (s) =
It can be inferred from the differential equation that all coefficients in the asymptotic
expansion of F (s) at infinity are nonzero. Hence, the above exponent bound is tight.
It can be inferred from the functional equation by induction on k that each gk(x) is a
Laurent polynomial about xc = 1/2. Thus, Lemma 2 applies, and we obtain the moment
generating function of the corresponding random variable Eq. (18) as M(s) = exp(−s/8).
This is readily recognised as the moment generating function of a probability distribution
concentrated at x = 1/8.
A sequence of random variables, which satisfies the concentration property, often leads
to a Gaussian limit law, after centering and suitable normalisation. This is also the case
for Ferrers diagrams.
Theorem 2 ([97]). The area random variable X̃m of Ferrers diagrams has mean µm ∼
m2/8 and variance σ2m ∼ m3/48. The centred and normalised random variables
X̃m − µm
, (21)
converge in distribution to a Gaussian random variable.
Remarks. i) It is possible to prove this result by the method of dominant balance. The
idea of proof consists in studying the functional equation of the generating function for the
“centred coefficients” pm,n − µm.
ii) The above arguments can also be applied to stack polygons to yield the concentration
property and a central limit theorem.
3.3.2 Staircase polygons
The limit law of area of staircase polygons is the Airy distribution. This distribution (see
[34] and the survey [52]) is conveniently defined via its moments.
Definition 2 (Airy distribution [34]). The random variable Y is said to be Airy distributed
E[Y k]
Γ(γ0)
Γ(γk)
where γk = 3k/2− 1/2, and the numbers φk satisfy, for k ≥ 1, the quadratic recurrence
γk−1φk−1 +
φlφk−l = 0,
with initial condition φ0 = −1.
Remarks ([34, 58]). i) The first moment is E[Y ] =
π. The sequence of moments can
be shown to satisfy the Carleman condition. Hence the distribution is uniquely determined
by its moments.
ii) The numbers φk appear in the asymptotic expansion of the logarithmic derivative of
the Airy function at infinity,
logAi(s) ∼
(−1)kφk
s−γk (s → ∞),
where Ai(x) = 1
cos(t3/3 + tx) dt is the Airy function.
iii) Explicit expressions for the numbers φk are known [58]. They are, for k ≥ 1, given by
φk = 2
k+1 3
x3(k−1)/2
Ai(x)2 + Bi(x)2
where Bi(z) is the second standard solution of the Airy differential equation f ′′(z)−zf(z) =
iv) The Airy distribution appears in a variety of contexts [34]. In particular, the random
variable Y/
8 describes the law of the area of a Brownian excursion. See also [76] for an
overview from a physical perspective.
Explicit expressions have been derived for the moment generating function of the Airy
distribution and for its density.
Fact 1 ([19, 66, 99, 34]). The moment generating function M(t) = E[e−tY ] of the Airy
distribution satisfies the modified Laplace transform
(e−st−1)M(2−3/2t3/2)
dt = 21/3
Ai′(21/3s)
Ai(21/3s)
Ai′(0)
Ai(0)
. (22)
The moment generating function M(t) is given explicitly by
M(2−3/2t) =
−βkt2/32−1/3
where the numbers −βk are the zeros of the Airy function. Its density p(x) is given explicitly
23/2p(23/2x) =
e−vk v
where vk = 2β
k/(27x
2) and U(a, b, z) is the confluent hypergeometric function.
Remarks. i) The confluent hypergeometric function U(a, b; z) is defined as [1]
U(a, b; z) =
sin πb
1F1[a, b; z]
Γ(1 + a− b)Γ(b)
1F1[1 + a− b, 2− b; z]
Γ(a)Γ(2− b)
where 1F1[a; b; z] is the hypergeometric function
1F1[a; b; z] = 1 +
a(a+ 1)
b(b+ 1)
+ . . .
ii) The moment generating function and its density are obtained by two consecutive inverse
Laplace transforms of Eq. (22), see [67, 68] and [99, 54].
iii) In the proof of the following theorem, we will derive Eq. (22) using the model of staircase
polygons. This shows, in particular, that the coefficients φk appear in the asymptotic
expansion of the Airy function.
Theorem 3. The normalised area random variables Xm of staircase polygons Eq. (18)
satisfy
d−→ Y√
(m → ∞),
where Y is Airy distributed according to Definition 2. We also have moment convergence.
Remark. Given the functional equation of the half-perimeter and area generating function
of staircase polygons,
P (x, q) =
1− 2xq − P (qx, q)
(see [88] for a recent derivation), this result is a special case of Theorem 4 below, which is
stated in [25].
Proof. We use the method of dominant balance. From the functional equation Eq. (23),
we infer g0(x) = 1/4 +
1− 4x/2 + (1 − 4x)/4. Hence xc = 1/4. The structure of the
functional equation implies that all functions gk(x) can be written as Laurent series in
1− 4x, see also Proposition 7 below. Explicitly, we get g1(x) = x2/(1 − 4x). This
suggests γk = (3k− 1)/2. An upper bound of this form on the exponent γk can be derived
without too much effort from the functional equation, by an application of Faa di Bruno’s
formula, see also [89, Prop (4.4)]. Thus, the method of dominant balance can be applied.
We set
P (x, q) =
+ (1− q)1/3F
1− 4x
(1− q)2/3
, (1− q)1/3
and introduce variables s, ǫ by 4x = 1 − sǫ2 and q = 1 − ǫ3. In the above equation,
we excluded the constant 1/4 =: P (reg)(x, q), since it does not contribute to the moment
asymptotics. Expanding the functional equation to order ǫ2 gives the Riccati equation
F ′(s) + 4F (s)2 − s = 0. (24)
It follows that the coefficients fk of F (s) satisfy, for k ≥ 1, the quadratic recursion
γk−1fk−1 + 4
flfk−l = 0,
with initial condition f0 = −1/2. A comparison with the definition of the Airy distribution
shows that φk = 2
2k+1fk. Using the closure properties of ∆-regular functions, it can be
inferred from the functional equation that (the analytic continuation of) each factorial
moment generating function gk(x) is ∆-regular, with xc = 1/4, see also Proposition 7
below. Hence the transfer theorem Lemma 2 can be applied. We obtain 4Xm
d→ Y in
distribution and for moments, where Y is Airy distributed.
Remarks. i) The unique solution F (s) of the differential equation in the above proof
Eq. (24), satisfying the prescribed asymptotic behaviour, is given by
F (s) =
logAi(41/3s). (25)
The moment generating function M(t) of the limiting random variable X = limm→∞Xm
is related to the function F (s) via the modified Laplace transform
(e−st−1)M(t3/2) 1
dt = 4
π(F (s)− F (0)),
where the modification has been introduced in order to ensure a finite integral about the
origin. This result relates the above proof to Proposition 1.
ii) The method of dominant balance can be used to obtain corrections Fl(s) to the limiting
behaviour [87].
The fact that the area law of staircase polygons is, up to normalisation, the same as
that of the area under a Brownian excursion, suggests that there might be a combinato-
rial explanation. Indeed, as is well known, there is a bijection [21, 98] between staircase
polygons and Dyck paths, a discrete version of Brownian excursions [2], see figure 2 [88].
Within this bijection, the polygon area corresponds to the sum of peak heights of the Dyck
path, but not to the area below the Dyck path. For more about this connection, see the
Figure 2: [88] A combinatorial bijection between staircase polygons and Dyck paths [21, 98].
Column heights of a polygon correspond to peak heights of a path.
remark at the end of the following subsection.
3.4 q-difference equations
All polygon models discussed above have an algebraic perimeter generating function. More-
over, their half-perimeter and area generating function satisfies a functional equation of
the form
P (x, q) = G(x, q, P (x, q), P (qx, q)),
for a real polynomial G(x, q, y0, y1). Since, under mild assumptions on G, the equation
reduces to an algebraic equation for P (x, 1) in the limit q → 1, it may be viewed as a
“deformation” of an algebraic equation. In this subsection, we will analyse equations of
this type at the special point (x, q) = (xc, 1), where xc is the radius of convergence of
P (x, 1). It will appear that the methods used in the above examples also can be applied
to this more general case.
The above equation falls into the class of q-difference equations [103]. While particular
examples appear in combinatorics in a number of places, see e.g. [37], the asymptotic
behaviour of equations of the above form seems to have been systematically studied initially
in [25, 87]. The study can be done in some generality, e.g., also for non-polynomial power
series G, for replacements more general than x 7→ qx, and for multivariate generalisations,
see [89] and [25]. For simplicity, we will concentrate on polynomial G, and then briefly
discuss generalisations. Our exposition closely follows [89, 87].
3.4.1 Algebraic q-difference equations
Definition 3 (Algebraic q-difference equation [25, 87]). An algebraic q-difference equation
is an equation of the form
P (x, q) = G(x, q, P (x, q), P (qx, q), . . . , P (qNx, q)), (26)
where G(x, q, y0, y1, . . . , yN) is a complex polynomial. We require that
G(0, q, 0, 0, . . . , 0) ≡ 0,
(0, q, 0, 0, . . . , 0) ≡ 0 (k = 0, 1, . . . , N).
Remarks. i) See [103] for an overview of the theory of q-difference equations. As q
approaches unity, the above equation reduces to an algebraic equation.
ii) Asymptotics for solutions of algebraic q-difference equations have been considered in
[25]. The above definition is a special case of [89, Def 2.4], where a multivariate extension
is considered, and where G may be non-polynomial. Also, replacements more general than
x 7→ f(q)x are allowed. Such equations are called q-functional equations in [89]. The
results presented below apply mutatis mutandis also to q-functional equations.
The algebraic q-difference equation in Definition 3 uniquely defines a (formal) power
series P (x, q) satisfying P (0, q) ≡ 0. This is shown by analysing the implied recurrence
for the coefficients pm(q) of P (x, q) =
m>0 pm(q)x
m, see also [89, Prop 2.5]. In fact,
pm(q) is a polynomial in q. The growth of its degree in m is not larger than cm
2 for some
positive constant c, hence the counting parameters are rank 2 parameters [25]. In our
situation, such a bound holds, since the area of a polygon grows at most quadratically
with its perimeter.
From the preceding discussion, it follows that the factorial moment generating functions
gk(x) =
P (x, q)
are well-defined as formal power series. In fact, they can be recursively determined from the
q-difference equation by implicit differentiation, as a consequence of the following proposi-
tion.
Proposition 5 ([87, 89]). Consider the derivative of order k > 0 of an algebraic q-
difference equation Eq. (26) w.r.t. q, evaluated at q = 1. It is linear in gk(x), and its
r.h.s. is a complex polynomial in the power series gl(x) and its derivatives up to order
k − l, where l = 0, . . . , k.
Remarks. i) This statement can be shown by analysing the k-th derivative of the q-
difference equation, using Faa di Bruno’s formula [18].
ii) It follows that every function gk(x) is rational in gl(x) and its derivatives up to order
k− l, where 0 ≤ l < k. Since G is a polynomial, gk(x) is algebraic, by the closure properties
of algebraic functions.
We discuss analytic properties of the (analytic continuations of the) factorial moment
generating functions gk(x). These are determined by the analytic properties of g0(x) =
P (x, 1). We discuss the case of a square-root singularity of P (x, 1), which often occurs
for combinatorial structures, and which is well studied, see e.g. [79, Thm 10.6] or [37,
Ch VII.4]. Other cases may be treated similarly. We make the following assumption:
Assumption 2. The q-difference equation in Definition 3 has the following properties:
i) All coefficients of the polynomial G(x, q, y0, y1, . . . , yN) are non-negative.
ii) The polynomial Q(x, y) := G(x, 1, y, y, . . . , y) satisfies Q(x, 0) 6≡ 0 and has degree at
least two in y.
iii) P (x, 1) =
m≥1 pmx
m is aperiodic, i.e., there exist indices 1 ≤ i < j < k such that
pipjpk 6= 0, while gcd(j − i, k − i) = 1.
Remarks. i) The positivity assumption is natural for combinatorial constructions. There
are, however, q-difference equations with negative coefficients, which arise from systems of
q-difference equations with non-negative coefficients by reduction. Examples are convex
polygons [87, Sec 5.4] and directed convex polygons, see below.
ii) Assumptions i) and ii) result in a square-root singularity as the dominant singularity
of P (x, 1).
iii) Assumption iii) implies that there is only one singularity of P (x, 1) on its circle of
convergence. Since P (x, 1) has non-negative coefficients only, it occurs on the positive real
half-line. The periodic case can be treated by a straightforward extension [37].
An application of the (complex) implicit function theorem ensures that P (x, 1) is an-
alytic at the origin. It can be analytically continued, as long as the defining algebraic
equation remains invertible. Together with the positivity assumption, one can conclude
that there is a number 0 < xc < ∞, such that the analytic continuation of P (x, 1) satisfies
yc = limxրxc P (x, 1) < ∞, with
Q(xc, yc) = yc,
Q(xc, y)
With the positivity assumption on the coefficients, it follows that
Q(xc, y)
> 0, C :=
Q(x, yc)
> 0. (27)
These conditions characterise the singularity of P (x, 1) at x = xc as a square-root. It
can be shown that there exists a locally convergent expansion of P (x, 1) about x = xc, and
that P (x, 1) is analytic for |x| < xc. We have the following result. Recall that a function
f(z) is ∆-regular if it is analytic in the indented disc ∆ = {z : |z| ≤ xc+η, |Arg(z−xc)| ≥
φ} for some η > 0 and some φ, where 0 < φ < π/2.
Proposition 6 ([79, 37, 89]). Given Assumption 2, the power series P (x, 1) is analytic
at x = 0, with radius of convergence xc. Its analytic continuation is ∆-regular, with a
square-root singularity at x = xc and a local Puiseux expansion
P (x, 1) = yc +
f0,l(1− x/xc)1/2+l/2,
where yc = limxրxc P (x, 1) < ∞ and f0,0 = −
xcC/B, for constants B > 0 and C >
0 as in Eq. (27). The numbers f0,l can be recursively determined from the q-difference
equation.
The asymptotic behaviour of P (x, 1) = g0(x) carries over to the factorial moment
generating functions gk(x).
Proposition 7 ([89]). Given Assumption 2, all factorial moment generating functions
gk(x) are, for k ≥ 1, analytic at x = 0, with radius of convergence xc. Their analytic
continuations are ∆-regular, with local Puiseux expansions
gk(x) =
fk,l(1− x/xc)−γk+l/2,
where γk = 3k/2 − 1/2. The numbers fk,0 = fk are, for k ≥ 2, characterised by the
recursion
γk−1fk−1 +
flfk−l = 0,
and the numbers f0 < 0 and f1 > 0 are given by
f0 = −
, 4f1 =
k=1 k
(xc, 1, yc, yc, . . . , yc)
, (28)
for constants B > 0 and C > 0 as in Eq. (27).
Remarks. i) This result can be obtained by a direct analysis of the q-difference equation,
applying Faa di Bruno’s formula, see also [87, Sec 2.2].
ii) Alternatively, it can be obtained by applying the method of dominant balance to the
q-difference equation. To this end, one notes that all functions gk(x) are Laurent series in√
1− x/xc, and that their leading exponents are bounded from above by γk. (An upper
bound on an exponent is usually easier to obtain than its exact value, since cancellations
can be ignored). With these two ingredients, the method of dominant balance, as described
above, can be applied. The differential equation of the function F (s) then translates, via
a transfer theorem, into the above recursion for the coefficients. See [89, Sec 5].
The above result can be used to infer the limit distribution of area, along the lines of
Section 3.2.
Theorem 4 ([25, 89]). Let Assumption 2 be satisfied. For the solution of an algebraic
q-difference equation P (x, q) =
m,n pm,nx
mqn, let X̃m denote the random variable
P(X̃m = n) =
pm,n∑
n pm,n
(which is well-defined for almost all m). The mean of X̃m is given by
E[X̃m] ∼ 2
m3/2 (m → ∞),
where the numbers f0 and f1 are given in Eq. (28). The sequence of normalised random
variables Xm converges in distribution,
E[X̃m]
d−→ Y√
(m → ∞),
where Y is Airy distributed according to Definition 2. We also have moment convergence.
Remarks. i) An explicit calculation shows that φk = |f0|−1
fk. Together with
Proposition 7, the claim of the proof follows by standard reasoning, as in the examples
above.
ii) The above theorem appears in [25, Thm 3.1], together with an indication of the ar-
guments of a proof. [There is a misprint in the definition of γ in [25, Thm 3.1]. In our
notation γ = 4Bf1.] Within the more general setup of q-functional equations, the theorem
is a special case of [89, Thm 1.5].
iii) The above theorem is a kind of central limit theorem for combinatorial constructions,
since the Airy distribution arises under natural assumptions for a large class of combina-
torial constructions. For a connection to certain Brownian motion functionals, see below.
3.4.2 q-functional equations and other extensions
We discuss extensions of the above result. Generically, the dominant singularity of P (x, 1)
is a square-root. The case of a simple pole as dominant singularity, which generalises
the example of Ferrers diagrams, has been discussed in [87]. Under weak assumptions,
the resulting limit distribution of area is concentrated. Other singularities can also be
analysed, as shown in the examples of rectangles above and of directed convex polygons in
the following subsection. Compare also [90].
The case of non-polynomial G can be discussed along the same lines, with certain
assumptions on the analyticity properties of the series G. In the undeformed case q = 1,
it is a classical result [37, Ch VII.3] that the generating function has a square-root as
dominant singularity, as in the polynomial case. One can then argue along the above
lines that an Airy distribution emerges as the limit law of the deformation variable [89,
Thm 1.5]. Such an extension is relevant, since prominent combinatorial models, such as the
Cayley tree generating function, fall into that class. See also the discussion of self-avoiding
polygons below.
The above statements also remain valid for more general classes of replacements x 7→ qx,
e.g., for replacements x 7→ f(q)x, where f(q) is analytic for 0 ≤ q ≤ 1, with non-negative
series coefficients about q = 0. More interestingly, the idea of introducing a q-deformation
may be iterated [25], leading to equations such as
P (x, q1, . . . , qM) = G(x, P (xq1 · . . . · qM , q1q2 · . . . · qM , q2q3 · . . . · qM , . . . , qM)). (29)
The counting parameters corresponding to qk are rank k + 1 parameters, and limit distri-
butions for such quantities have been derived for some types of singularities [77, 78, 88].
There is a central limit result for the generic case of a square-root singularity [89]. This
generalisation applies to counting parameters, which decompose linearly under a combina-
torial construction. These results can also be obtained by an alternative method, which
generalises to non-linear parameters, see [51].
The case where the limit q to unity in a q-difference equation is not algebraic, has not
been discussed. For example, if G(x, q, P (x, q), P (qx, q)) = 0 for some polynomial G, the
limit q to unity might lead to an algebraic differential equation for P (x, 1). This may be
seen by noting that
f(x)− f(qx)
(1− q)
= xf ′(x),
for f(x) differentiable at x. Such equations are possibly related to polygon models such as
three-choice polygons [44] or punctured staircase polygons [45]. Their perimeter generating
function is not algebraic, hence the models do not satisfy an algebraic q-difference equation
as in Definition 3.
3.4.3 A stochastic connection
Lastly, we indicate a link to Brownian motion, which appears in [99, 100] and was further
developed in [77, 78, 89, 88]. As we saw in Section 3.2, limit distributions can, under
certain conditions, be characterised by a certain Laplace transform of their moment gen-
erating functions. This approach, which arises naturally from the viewpoint of generating
functions, can be applied to discrete versions of Brownian motion, excursions, bridges or
meanders. Asymptotic results are results for the corresponding stochastic objects. In fact,
distributions of some functionals of Brownian motion have apparently first been obtained
using this approach [99, 100].
Interestingly, a similar characterisation appears in stochastics for functionals of Brow-
nian motion, via the Feynman-Kac formula. For example, Louchard’s formula [66] relates
the logarithmic derivate of the Airy function to a certain Laplace transform of the moment
generating function of the law of the Brownian excursion area. Distributions of functionals
of Brownian motion can also be obtained by a path integral approach, see [75] for a recent
overview.
The discrete approach provides an alternative method for obtaining information about
distributions of certain functionals of Brownian motion. For such functionals, it provides
an alternative proof of Louchard’s formula [77, 78]. It leads, via the method of dominant
balance, quite directly to moment recurrences for the underlying distribution. These have
been studied in the case of rank k parameters for discrete models of Brownian motion.
In particular, they characterise the distributions of integrals over (k − 1)-th powers of the
corresponding stochastic objects [77, 78, 89, 88]. Such results have apparently not been
previously derived using stochastic methods. The generating function approach can also be
applied to classes of q-functional equations with singularities different from those connected
to Brownian motion. For a related generalisation, see [10].
Vice versa, results and techniques from stochastics can be (and have been) analysed in
order to study asymptotic properties of polygons. An example is the contour process of
simply generated trees [38], which asymptotically describes the area of a staircase polygon.
See also [69, 70, 71, 59].
3.5 Directed convex polygons
We show that the limit law of area of directed convex polygons in the uniform fixed
perimeter ensemble is that of the area of the Brownian meander.
Fact 2 ([100, Thm 2]). The random variable Z of area of the Brownian meander is char-
acterised by
E[Zk]
Γ(α0)
Γ(αk)
where αk = 3k/2 + 1/2. The numbers ωk satisfy for k ≥ 1 the quadratic recurrence
αk−1ωk−1 +
−lωk−l = 0,
with initial condition ω0 = 1, where the numbers φk appear in the Airy distribution as in
Definition 2.
Remarks. i) This result has been derived using a discrete meander, whose length and
area generating function is described by a system of two algebraic q-difference equations,
see [77, Prop 1].
ii) We have E[Z] = 3
2π/8 for the mean of Z. The random variable Z is uniquely
determined by its moments. The numbers ωk appear in the asymptotic expansion [100,
Thm 3]
Ω(s) =
Ai(t) dt
3Ai(s)
(−1)kωks−αk (s → ∞),
where Ai(x) = 1
cos(t3/3 + tx) dt is the Airy function.
Explicit expressions have been derived for the moment generating function and for the
distribution function of Z.
Fact 3 ([100, Thm 5]). The moment generating function M(t) = E[e−tZ ] of Z satisfies the
Laplace transform ∫ ∞
e−stM(
2 t3/2)
πΩ(s). (30)
It is explicitly given by
M(t) = 2−1/6t1/3
Rk exp(−βkt2/32−1/3)
for ℜ(t) > 0, where the numbers −βk are the zeroes of the Airy function, and where
βk(1 + 3
Ai(−t) dt)
3Ai′(−βk)
The random variable Z has a continuous density p(y), with distribution function R(x) =∫ x
p(y) dy given by
R(x) =
(18)1/6x
−vk v
k Ai((3vk/2)
2/3),
where vk = (βk)
3/(27x2).
Remark. The moment generating function and the distribution function are obtained by
two consecutive inverse Laplace transforms of Eq. (30).
Theorem 5. The normalised area random variablesXm of directed convex polygons Eq. (18)
satisfy
d−→ 1
Z (m → ∞),
where Z is the area random variable of the Brownian meander as in Fact 2. We also have
moment convergence.
Proof. A system of q-difference equations for the generating function Q(x, y, q) of directed
convex polygons, counted by width, height and area, has been given in [9, Lemma 1.1]. It
can be reduced to a single equation,
q(qx− 1)Q(x, y, q) + ((1 + q)(P (x, y, q) + y))Q(qx, y, q)+
xyq − y2 + P (x, y, q)(qx− y − 1)
Q(q2x, y, q)
− q2xy (y + P (x, y, q)− 1) = 0,
where P (x, y, q) is the width, height and area generating function of staircase polygons.
Setting q=1 and x = y yields the half-perimeter generating function
g0(x) =
1− 4x
Hence xc = 1/4 for the radius of convergence of Q(x, x, 1).
It is possible to derive from Eq. (31) a q-difference equation for the (isotropic) half-
perimeter and area generating function Q(x, q) = Q(x, x, q) of directed convex polygons.
This is due to the symmetry Q(x, y, q) = Q(y, x, q), which results from invariance of the
set of directed convex polygons under reflection along the negative diagonal y = −x. Since
this equation is quite long, we do not give it here. By arguments analogous to those of the
previous subsection, it can be deduced from this equation that all area moment generating
functions gk(x) of Q(x, 1) are Laurent series in s =
1− 4x, see also [89, Prop (4.3)]. The
leading singular exponent of gk(x), defined by gk(x) ∼ hk(1− x/xc)−αk as x ր xc, can be
bounded from above by αk ≤ 3k/2 + 1/2, see also [89, Prop (4.4)] for the argument. We
apply the method of dominant balance, in order to prove that αk = 3k/2 + 1/2 and to
yield recurrences for the coefficients hk. We define
P (x, q) =
+ (1− q)1/3F
1− 4x
(1− q)2/3
, (1− q)1/3
Q(x, q) = (1− q)−1/3H
1− 4x
(1− q)2/3
, (1− q)1/3
where F (s) = F (s, 0) has already been determined in Eq. (25). We set 4x = 1 − sǫ2,
q = 1− ǫ3, and expand the q-difference equation to leading order in ǫ. We get for H(s) :=
H(s, 0) the inhomogeneous linear differential equation of first order
H ′(s) + 4H(s)F (s) +
This implies for the coefficients hk of H(s) =
k≥0 hks
−αk and fk of F (s) =
k≥0 fks
for k ≥ 1 the quadratic recursion
αk−1hk−1 + 4
flhk−l = 0,
where h0 = 1/16. Using fk = 2
−2k−1φk, we obtain the meander recursion in Fact 2 by
setting hk = 2
−k−4ωk. It can be inferred from the functional equation that (the analytic
continuations of) all factorial moment generating functions are ∆-regular, with xc = 1/4.
Thus Lemma 2 applies, and we conclude Xm
d→ Z/2.
Remarks. i) The above theorem states that the limit distribution of area of directed
convex polygons coincides, up to normalisation, with the area distribution of the Brownian
meander [100]. This suggests that there might exist a combinatorial bijection to discrete
meanders, in analogy to that between staircase polygons and Dyck paths. Up to now,
a “nice” bijection has not been found, see however [6, 72] for combinatorial bijections to
discrete bridges.
ii) The above proof relies on a q-difference equation for the isotropic generating function
Q(x, x, q). Up to normalisation, the meander distribution also appears for the anisotropic
model Q(x, y, q), where 0 < y < 1/2 is fixed, as can be shown by a considerably simpler
calculation. The normalisation constant coincides with that of the isotropic model for
y = 1/2. The latter statement is also a consequence of the fact that the height random
variable of directed polygons is asymptotically Gaussian, after centering and normalisation.
Analogous considerations apply to the relation between isotropic and anisotropic versions
of the other polygon classes.
3.6 Limit laws away from (xc, 1)
As motivated in the introduction, limit laws in the fixed perimeter ensemble for q 6= 1 are
expected to be Gaussian. The same remark holds for the fixed area ensemble for x 6= xc.
There are partial results for the model of staircase polygons. The fixed area ensemble can,
for x < xc and q near unity, be analysed using Fact 7 of the following section. For staircase
polygons in the uniform fixed area ensemble x = 1, the following result holds.
Fact 4 ([37, Prop IX.11]). Consider the perimeter random variable of staircase polygons
in the uniform fixed area ensemble,
P(Ỹn = m) =
pm,n∑
m pm,n
The variable Ỹn has mean µn ∼ µ ·n and standard deviation σn ∼ σ
n, where the numbers
µ and σ satisfy
µ = 0.8417620156 . . . , σ = 0.4242065326 . . .
The centred and normalised random variables
Ỹn − µn
converge in distribution to a Gaussian random variable.
Remark. The above result is proved using an explicit expression for the half-perimeter
and area generating function, as a ratio of two q-Bessel functions. It can be shown that this
expression is meromorphic about (x, q) = (1, qc) with a simple pole, where qc is the radius
of convergence of the generating function P (1, q). The explicit form of the singularity
about (1, qc) yields a Gaussian limit law.
There are a number of results for classes of column-convex polygons in the uniform
fixed area ensemble, typically leading to Gaussian limit laws. The upper and lower shape
of a polygon can be described by Brownian motions. See [69, 70, 71] for details. It would
be interesting to prove convergence to a Gaussian limit law within a more general frame-
work, such as q-difference equations. Analogous questions for other functional equations,
describing counting parameters such as horizontal width, have been studied in [24].
3.7 Self-avoiding polygons
A numerical analysis of self-avoiding polygons, using data from exact enumeration [91,
92], supports the conjecture that the limit law of area is, up to normalisation, the Airy
distribution.
Let pm,n denote the number of square lattice self-avoiding polygons of half-perimeter
m and area n. Exact enumeration techniques have been applied to obtain the numbers
pm,n for all values of n for given m ≤ 50. Numerical extrapolation techniques yield very
accurate estimates of the asymptotic behaviour of the coefficients of the factorial moment
generating functions. To leading order, these are given by
[xm]gk(x) =
(n)kpm,n ∼ Akx−mc m3k/2−3/2−1 (m → ∞), (32)
for positive amplitudes Ak. The above form has been numerically checked [91, 92] for values
k ≤ 10 and is conjectured to hold for arbitrary k. The value xc is the radius of convergence
of the half-perimeter generating function of self-avoiding polygons. The amplitudes Ak
have been extrapolated to at least five significant digits. In particular, we have
xc = 0.14368062927(2), A0 = 0.09940174(4), A1 = 0.0397886(1),
where the numbers in brackets denote the uncertainty in the last digit. An exact value of
the amplitude A1 = 1/(8π) has been predicted [15] using field-theoretic arguments.
The particular form of the exponent implies that the model of rooted self-avoiding poly-
gons p̃m,n = mpm,n has the same exponents φ = 2/3 and θ = 1/3 as staircase polygons. In
particular, it implies a square-root as dominant singularity of the half-perimeter generat-
ing function. Together with the above result for q-functional equations, this suggests that
(rooted) self-avoiding polygons might obey the Airy distribution as a limit law of area.
A natural method to test this conjecture consists in analysing ratios of moments, such
that a normalisation constant is eliminated. Such ratios are also called universal amplitude
ratios. If the conjecture were true, we would have asymptotically
E[X̃km]
E[X̃m]k
∼ k! Γ(γ1)
Γ(γk)Γ(γ0)k−1
(m → ∞),
for the area random variables X̃m as in Eq. (17). The numbers φk and exponents γk are
those of the Airy distribution as in Definition 2. The above form was numerically confirmed
for values of k ≤ 10 to a high level of numerical accuracy. The normalisation constant is
obtained by noting that E[Y ] =
Conjecture 1 (cf [91, 92]). Let pm,n denote the number of square lattice self-avoiding
polygons of half-perimeter m and area n. Let X̃m denote the random variable of area in
the uniform fixed perimeter ensemble,
P(X̃m = n) =
pm,n∑
n pm,n
We conjecture that
E[X̃m]
d−→ Y√
where Y is Airy distributed according to Definition 2.
Remarks. i) Field theoretic arguments [15] yield A1 = 1/(8π).
ii) References [91, 92] contain conjectures for the scaling function of self-avoiding polygons
and rooted self-avoiding polygons, see the following section. In fact, the numerical analysis
in [91, 92] mainly concerns the area amplitudes Ak, which determine the limit distribution
of area.
iii) The area law of self-avoiding polygons has also been studied [91, 92] on the triangular
and hexagonal lattices. As for the square lattice, the area limit law appears to be the Airy
distribution, up to normalisation.
iv) It is an open question whether there are non-trivial counting parameters other than the
area, whose limit law (in the fixed perimeter ensembles) coincides between self-avoiding
polygons and staircase polygons. See [88] for a negative example. This indicates that
underlying stochastic processes must be quite different.
v) A proof of the above conjecture is an outstanding open problem. It would be interest-
ing to analyse the emergence of the Airy distribution using stochastic Loewner evolution
[60]. Self-avoiding polygons at criticality are conjectured to describe the hull of critical
percolation clusters and the outer boundary of two-dimensional Brownian motion [60].
A numerical analysis of the fixed area ensemble along the above lines again shows
behaviour similar to that of staircase polygons. This supports the following conjecture.
Conjecture 2. Consider the perimeter random variable of self-avoiding polygons in the
uniform fixed area ensemble,
P(Ỹn = m) =
pm,n∑
m pm,n
The random variable Ỹn is conjectured to have mean µn ∼ µ · n and standard deviation
σn ∼ σ
n, where the numbers µ and σ satisfy
µ = 1.855217(1), σ2 = 0.3259(1),
where the number in brackets denotes the uncertainty in the last digit. The centred and
normalised random variables
Ỹn − µn
are conjectured to converge in distribution to a Gaussian random variable.
The above conjectures, together with the results of the previous subsection, also raise
the question whether rooted square-lattice self-avoiding polygons, counted by half-perimeter
and area, might satisfy a q-functional equation. In particular, it would be interesting to
consider whether rooted self-avoiding polygons might satisfy
P (x) = G(x, P (x)), (33)
for some power series G(x, y) in x, y. If the perimeter generating function P (x) is not
algebraic, this excludes polynomials G(x, y) in x and y. Note that the anisotropic perimeter
generating function of self-avoiding polygons is not D-finite [86]. It is thus unlikely that
the isotropic perimeter generating function is D-finite and, in particular, algebraic. On the
other hand, solutions of Eq. (33) need not be algebraic nor D-finite. An example is the
Cayley tree generating function T (x) satisfying T (x) = x exp(T (x)), see [33].
3.8 Punctured polygons
Punctured polygons are self-avoiding polygons with internal holes, which are also self-
avoiding polygons. The polygons are also mutually avoiding. The perimeter of a punctured
polygon is the sum of the lengths of its boundary curves, the area of a punctured polygon is
the area of the outer polygon minus the area of the holes. Apart from intrinsic combinato-
rial interest, models of punctured polygons may be viewed as arising from two-dimensional
sections of three-dimensional self-avoiding vesicles. Counted by area, they may serve as an
approximation to the polyomino model.
We consider, for a given subclass of self-avoiding polygons, punctured polygons with
holes from the same subclass. The case of a bounded number of punctures of bounded
size can be analysed in some generality. The case of a bounded number of punctures of
unbounded size leads to simple results if the critical perimeter generating function of the
model without punctures is finite.
For a given subclass of self-avoiding polygons, the number pm,n denotes the number
of polygons with half-perimeter m and area n. Let p
(r,s)
m,n denote the number of polygons
with r ≥ 1 punctures whose half-perimeter sum equals s. Let p(r)m,n denote the number of
polygons with r ≥ 1 punctures of arbitrary size.
Theorem 6 ([94, Thms 1,2]). Assume that, for a class of self-avoiding polygons with-
out punctures, the area moment coefficients p
n≥0 n
kpm,n have, for k ∈ N0, the
asymptotic form
p(k)m ∼ Akx−mc mγk−1 (m → ∞),
for numbers Ak > 0, for 0 < xc ≤ 1 and for γk = (k − θ)/φ, where 0 < φ < 1. Let
g0(x) =
m≥0 p
m denote the half-perimeter generating function.
Then, the area moment coefficient p
(r,k,s)
(r,s)
m,n of the polygon class with r ≥ 1
punctures whose half-perimeter sum equals s is, for k ∈ N0, asymptotically given by
p(r,k,s)m ∼ A
(r,s)
γk+r−1 (m → ∞),
where A
(r,s)
xsc[x
s](g0(x))
If θ > 0, the area moment coefficient p
(r,k)
m,n of the polygon class with r ≥ 1
punctures of arbitrary size satisfies, for k ∈ N0, asymptotically
p(r,k)m ∼ A
γk+r−1 (m → ∞),
where the amplitudes A
k are given by
Ak+r(g0(xc))
Remarks. i) The basic argument in the proof of the preceding result involves an estimate
of interactions of hole polygons with one another or with the boundary of the external
polygon, which are shown to be asymptotically irrelevant. This argument also applies in
higher dimensions, as long as the exponent φ satisfies 0 < φ < 1.
ii) In the case of an infinite critical perimeter generating function, such as for subclasses of
convex polygons, boundary effects are asymptotically relevant, if punctures of unbounded
size are considered. The case of an unbounded number of punctures, which approximates
the polyomino problem, is unsolved.
iii) The above result leads to new area limit distributions. For rectangles with r punctures
of bounded size, we get βr+1,1/2 as the limit distribution of area. For staircase polygons with
punctures, we obtain generalisations of the Airy distribution, which are discussed in [94].
In contrast, for Ferrers diagrams with punctures of bounded size, the limit distribution of
area stays concentrated.
iv) The theorem also applies to models of punctured polygons, which do not satisfy an
algebraic q-difference equation. An example is given by staircase polygons with a staircase
hole of unbounded size, whose perimeter generating function is not algebraic [45].
3.9 Models in three dimensions
There are very few results for models in higher dimensions, notably for models on the cubic
lattice. There are a number of natural counting parameters for such objects. We restrict
consideration to area and volume, which is the three-dimensional analogue of perimeter
and area of two-dimensional models.
One prominent model is self-avoiding surfaces on the cubic lattice, also studied as
a model of three-dimensional vesicle collapse. We follow the review in [102] (see also
the references therein) and consider closed orientable surfaces of genus zero, i.e., surfaces
homeomorphic to a sphere. Numerical studies indicate that the surface generating function
displays a square-root γ = −1/2 as the dominant singularity.
Consider the fixed surface area ensemble with weights proportional to qn, with n the
volume of the surface. One expects a deflated phase (branched polymer phase) for small
values of q and an inflated phase (spherical phase) for large values of q. In the deflated
phase, the mean volume of a surface should grow proportionally to the aream of the surface,
in the inflated phase the mean volume should grow like m3/2 with the surface. Numerical
simulations suggest a phase transition at q = 1 with exponent φ = 1. This indicates that
a typical surface resembles a branched polymer, and a concentrated distribution of volume
is expected. Note that this behaviour differs from that of the two-dimensional model of
self-avoiding polygons.
Even relatively simple subclasses of self-avoiding surfaces such as rectangular boxes [73]
and plane partition vesicles [50], generalising the two-dimensional models of rectangles and
Ferrers diagrams, display complicated behaviour. Let pm,n denote the number of surfaces of
area m and volume n and consider the generating function P (x, q) =
m,n pm,nx
mqn. For
rectangular box vesicles, we apparently have P (x, 1) ∼ A| log(1−x)|/(1−x)3/2 as x → 1−,
some some constant A > 0, see [73, Eq (35)]. In the fixed surface area ensemble, a linear
polymer phase 0 < q < 1 is separated from a cubic phase q > 1. At q = 1, we have φ = 2/3,
such that typical rectangular boxes are expected to attain a cubic shape. We expect a limit
distribution which is concentrated. For plane partition vesicles, it is conjectured on the
basis of numerical simulations [50, Sec 4.1.1] that P (x, 1) ∼ A exp(α/(xc−x)1/3)/(xc−x)γ ,
where γ ≈ 1.7 at xc = 0.8467(3), for non-vanishing constants A and α. It is expected that
φ = 1/2.
As in the previous subsection, three-dimensional models of punctured vesicles may
be considered. The above arguments hold, if the exponent φ satisfies 0 < φ < 1. A
corresponding result for punctures of unbounded size can be stated if the critical surface
area generating function is finite.
3.10 Summary
In this section, we described methods to extract asymptotic area laws for polygon models
on the square lattice, and we applied these to various classes of polygons. Some of the laws
were found to coincide with those of the (absolute) area under a Brownian excursion and a
Brownian meander. A combinatorial explanation for the latter result has not been given.
Is there a simple polygon model with the same area limit law as the area under a Brownian
bridge? The connection to stochastics deserves further investigation. In particular, it would
be interesting to identify underlying stochastic processes. For an approach to a number of
different random combinatorial structures starting from a probabilistic viewpoint, see [82].
Area laws of polygon models in the uniform fixed perimeter ensemble q = 1 have been
understood in some generality, by an analysis of the singular behaviour of q-functional
equations about the point (x, q) = (xc, 1). Essentially, the type of singularity of the
half-perimeter generating function determines the limit law. A refined analysis can be
done, leading to local limit laws and providing convergence rates. Also, limit distributions
describing corrections to the asymptotic behaviour can be derived. They seem to coincide
with distributions arising in models of punctured polygons, see [94].
For non-uniform ensembles, concentrated distributions are expected, but general re-
sults, e.g. for q-functional equations, are lacking. These may be obtained by multivariate
singularity analysis, see also [24, 65].
The underlying structure of q-functional equations appears in a number of other combi-
natorial models, such as models of two-dimensional directed walks, counted by length and
area between the walk and the x-axis, models of simply generated trees, counted by the
number of nodes and path length, and models which appear in the average case analysis
of algorithms, see [34, 37]. Thus, the above methods and results can be applied to such
models. In statistical physics, this mainly concerns models of (interacting) directed walks,
see [48] for a review. There is also an approach to the behaviour of such walks from a
stochastic viewpoint, see e.g. the review [101].
There are exactly solvable polygon models, which do not satisfy an algebraic q-difference
equation, such as three-choice polygons [44], punctured staircase polygons [45], prudent
polygon subclasses [96], and possibly diagonally convex polygons. For a rigorous analysis
of the above models, it may be necessary to understand q-difference equations with more
general holonomic solutions, as q approaches unity.
Focussing on self-avoiding polygons, it might be interesting to analyse whether the
perimeter generating function of rooted self-avoiding polygons might satisfy an implicit
equation Eq. (33). Asymptotic properties of the area can possibly be studied using stochas-
tic Loewner evolution [60]. Another open question concerns the area limit law for q 6= 1
or the perimeter limit law for x 6= xc, where Gaussian behaviour is expected. At present,
even the simpler question of analyticity of the critical curve xc(q) for 0 < q < 1 is open.
Most results of this section concerned area limit laws of polygon models. Similarly,
one can ask for perimeter laws in the fixed area ensemble. Results have been given for
the uniform ensemble. Generally, Gaussian limit laws are expected away from criticality,
i.e., away from x = xc. Perimeter laws are more difficult to extract from a q-functional
equation than area laws. We will however see in the following section that, surprisingly,
under certain conditions, knowledge of the area limit law can be used to infer the perimeter
limit law at criticality.
4 Scaling functions
From a technical perspective, the focus in the previous section was on the singular be-
haviour of the single-variable factorial moment generating function gk(x) Eq. (1), and on
the associated asymptotic behaviour of their coefficients. This yielded the limiting area
distribution of some polygon models.
In this section, we discuss the more general problem of the singular behaviour of the two-
variable perimeter and area generating function of a polygon model. Near the special point
(x, q) = (xc, 1), the perimeter and area generating function P (x, q) =
m≥0 pm(q)x
n≥0 an(x)q
n is expected to be approximated by a scaling function, and the correspond-
ing coefficient functions pm(q) and an(x) are expected to be approximated by finite size
scaling functions. As we will see, scaling functions encapsulate information about the limit
distributions discussed in the previous section, and thus have a probabilistic interpretation.
We will give a focussed review, guided by exactly solvable examples, since singularity
analysis of multivariate generating functions is, in contrast to the one-variable case, not
very well developed, see [81] for a recent overview. Methods of particular interest to poly-
gon models concern asymptotic expansions about multicritical points, which are discussed
for special examples in [80, 5]. Conjectures for the behaviour of polygon models about
multicritical points arise from the physical theory of tricritical scaling [41], see the review
[61], which has been adapted to polygon models [14, 13]. There are few rigorous results
about scaling behaviour of polygon models, which we will discuss. This will complement
the exposition in [47]. See also [42, Ch 9] for the related subject of scaling in percolation.
4.1 Scaling and finite size scaling
The half-perimeter and area generating function of a polygon model P (x, q) about (x, q) =
(xc, 1) is expected to be approximated by a scaling function. This is motivated by the
following heuristic argument. Assume that the factorial area moment generating functions
gk(x) Eq. (1) have, for values x < xc, a local expansion about x = xc of the form
gk(x) =
(1− x/xc)γk,l
where γk,l = (k − θl)/φ and θl+1 > θl. Disregarding questions of analyticity, we argue
P (x, q) ≈
(−1)k
(1− x/xc)γk,l
(1− q)k
(1− q)θl
(−1)kfk,l
1− x/xc
(1− q)φ
)−γk,l)
In the above calculation, we replaced P (x, q) by its Taylor series about q = 1, and then
replaced the Taylor coefficients by their expansion about x = xc. The preceding heuristic
calculation has, for some polygon models and on a formal level, a rigorous counterpart,
see the previous section. In the above expression, the r.h.s. depends on series Fl(s) =∑
k≥0(−1)kfk,ls−γk,l of a single variable of combined argument s = (1 − x/xc)/(1 − q)φ.
Restricting to the leading term l = 0, this motivates the following definition. For φ > 0
and xc > 0, we define for numbers s−, s+ ∈ [−∞,+∞] the domain
D(s−, s+) = {(x, q) ∈ (0,∞)× (0, 1) : s− < (1− x/xc)/(1− q)φ < s+)}.
Definition 4 (Scaling function). For numbers pm,n with generating function P (x, q) =∑
m,n pm,nx
mqn, let Assumption 1 be satisfied. Let 0 < xc ≤ 1 be the radius of convergence
of P (x, 1). Assume that there exist constants s−, s+ ∈ [−∞,+∞] satisfying s− < s+ and
a function F : (s−, s+) → R, such that P (x, q) satisfies, for real constants θ and φ > 0,
P (sing)(x, q) ∼ (1− q)θF
1− x/xc
(1− q)φ
(x, q) → (xc, 1) in D(s−, s+). (34)
Then, the function F(s) is called an (area) scaling function, and θ and φ are called critical
exponents.
Remarks. i) In analogy to the one-variable case, the above asymptotic equality means
that there exists a power series P (reg)(x, q) convergent for |x| < x1 and |q| < q1, where
x1 > xc and q1 > 1, such that the function P
(sing)(x, q) := P (x, q)− P (reg)(x, q) is asymp-
totically equal to the r.h.s..
ii) Due to the region D(s−, s+) where the limit (x, q) → (xc, 1) is taken, admissible values
(x, q) satisfy 0 < q < 1 and 0 < x < x0(q), where x0(q) = xc(1− s−(1− q)φ), if s− 6= −∞.
Thus, in this case, the critical curve xc(q) satisfies xc(q) ≥ x0(q) as q approaches unity.
Note that equality need not hold in general.
iii) The method of dominant balance was originally applied in order to obtain a defining
equation for a scaling function F(s) from a given functional equation of a polygon model.
This assumes the existence of a scaling function, together with additional analyticity prop-
erties. See [84, 91, 87].
iv) For particular examples, an analytic scaling function F(s) exists, with an asymptotic
expansion about infinity, and the area amplitude series F (s) agrees with the asymptotic
series, see below.
v) There is an alternative definition of a scaling function [31] by demanding
P (sing)(x, q) ∼ 1
(1− x/xc)−θ/φ
(1− x/xc)1/φ
(x, q) → (xc, 1) (35)
in a suited domain, for a functionH(t) of argument t = (1−q)/(1−x/xc)1/φ. Such a scaling
form is also motivated by the above argument. One may then call such a function H(t) a
perimeter scaling function. If F(s) is a scaling function, then a function H(t), satisfying
Eq. (35) in a suited domain, is given by
H(t) = tθF(t−φ).
If s− ≤ 0 and s+ = ∞, the particular scaling form Eq. (34) implies a certain asymptotic
behaviour of the critical area generating function and of the half-perimeter generating
function. The following lemma is a consequence of Definition 4.
Lemma 3. Let the assumptions of Definition 4 be satisfied.
i) If s+ = ∞ and if the scaling function F(s) has the asymptotic behaviour
F(s) ∼ f0s−γ0 (s → ∞),
then γ0 = − θφ , and the half-perimeter generating function P (x, 1) satisfies
P (sing)(x, 1) ∼ f0(1− x/xc)θ/φ (x ր xc).
ii) If s− ≤ 0 and if the scaling function F(s) has the asymptotic behaviour
F(s) ∼ h0sα0 (s ց 0),
then α0 = 0, and the critical area generating function P (xc, q) satisfies
P (sing)(xc, q) ∼ h0(1− q)θ (q ր 1).
A sufficient condition for equality of the area amplitude series and the scaling function
is stated in the following lemma, which is an extension of Lemma 3.
Lemma 4. Let the assumptions of Definition 4 be satisfied.
i) Assume that the relation Eq. (34) remains valid under arbitrary differentiation w.r.t.
q. If s+ = ∞, if the scaling function F(s) has an asymptotic expansion
F(s) ∼
(−1)kfks−γk (s → ∞),
and if an according asymptotic expansion is true for arbitrary derivatives, then the
following statements hold.
a) The exponent γk is, for k ∈ N0, given by
k − θ
b) The scaling function F(s) determines the asymptotic behaviour of the factorial
area moment generating functions via
P (x, q)
)(sing)
(1− x/xc)γk
(x ր xc).
ii) Assume that the relation Eq. (34) remains valid under arbitrary differentiation w.r.
to x. If s− ≤ 0, and if the scaling function F(s) has an asymptotic expansion
F(s) ∼
(−1)khksαk (s ց 0),
and if an according asymptotic expansion is true for arbitrary derivatives, then the
following statements hold.
a) The exponent αk is, for k ∈ N0, given by αk = k.
b) The scaling function determines the asymptotic behaviour of the factorial perime-
ter moment generating functions at x = xc via
P (x, q)
)(sing)
(1− q)βk
(q ր 1),
where βk = kφ− θ.
Remarks. Lemma 4 states conditions under which the area amplitude series coincides
with the scaling function. Given these conditions, the scaling function also determines the
perimeter law of the polygon model at criticality.
In the one-variable case, the singular behaviour of a generating function translates,
under suitable assumptions, to the asymptotic behaviour of its coefficients. We sketch
the analogous situation for the asymptotic behaviour of a generating function involving a
scaling function.
Definition 5 (Finite size scaling function). For numbers pm,n with generating function
P (x, q) =
m,n pm,nx
mqn, let Assumption 1 be satisfied. Let 0 < xc ≤ 1 be the radius of
convergence of the generating function P (x, 1).
i) Assume that there exist a number t+ ∈ (0,∞] and a function f : [0, t+] → R, such
that the perimeter coefficient function asymptotically satisfies, for real constants γ0
and φ > 0,
[xm]P (x, q) ∼ x−mc mγ0−1f(m1/φ(1− q)) (q,m) → (1,∞),
where the limit is taken for m a positive integer and for real q, such that m1/φ(1−q) ∈
[0, t+]. Then, the function f(t) is called a finite size (perimeter) scaling function.
ii) Assume that there exist constants t− ∈ [−∞, 0), t+ ∈ (0,∞], and a function h :
[t−, t+] → R, such that the area coefficient function asymptotically satisfies, for real
constants β0 and φ > 0,
[qn]P (x, q) ∼ nβ0−1h(nφ(1− x/xc)) (x, n) → (xc,∞),
where the limit is taken for n a positive integer and real x, such that nφ(1− x/xc) ∈
[t−, t+]. Then, the function h(t) is called a finite size (area) scaling function.
Remarks. i) The following heuristic calculation motivates the expectation that a finite
size scaling function approximates the coefficient function. For the perimeter coefficient
function, assume that the exponents γk of the factorial area moment generating functions
are of the special form γk = (k − θ)/φ. We argue
[xm]P (x, q) ≈ [xm]
(−1)k fk
(1− x/xc)γk
(1− q)k
≈ x−mc mγ0−1
(−1)k
Γ(γk)
m1/φ(1− q)
In the above expression, the r.h.s. depends on a function f(t) of a single variable of
combined argument t = m1/φ(1− q).
For the area coefficient function, we assume that βk = kφ− θ and argue as above,
[qn]P (x, q) ≈ [qn]
(−1)k hk
(1− q)βk
(1− x/xc)k
≈ nβ0−1
(−1)k hk
Γ(βk)
nφ(1− x/xc)
In the above expression, the r.h.s. depends on a function h(t) of a single variable of
combined argument t = nφ(1− x/xc).
ii) The above argument suggests that a scaling function and a finite size scaling function
may be related by a Laplace transformation. A comparison with Eq.(20) leads one to
expect that finite size scaling functions are moment generating functions of the limit laws
of area and perimeter.
iii) Sufficient conditions under which knowledge of a scaling function implies the existence
of a finite size scaling function have been given for the finite size area scaling function [13]
using Darboux’s theorem.
A scaling function describes the leading singular behaviour of the generating function
P (x, q) in some region about (x, q) = (xc, 1). A particular form of subsequent correction
terms has been argued for at the beginning of the section.
Definition 6 (Correction-to-scaling functions). For numbers pm,n with generating function
P (x, q) =
m,n pm,nx
mqn, let Assumption 1 be satisfied. Let 0 < xc ≤ 1 be the radius of
convergence of the generating function P (x, 1). Assume that there exist constants s−, s+ ∈
[−∞,+∞] satisfying s− < s+, and functions Fl : (s−, s+) → R for l ∈ N0, such that the
generating function P (x, q) satisfies, for real constants φ > 0 and θl, where θl+1 > θl,
P (sing)(x, q) ∼
(1− q)θlFl
1− x/xc
(1− q)φ
(x, q) → (xc, 1) in D(s−, s+).
Then, the function F0(s) is a scaling function, and for l ≤ 1, the functions Fl(s) are called
correction-to-scaling functions.
Remarks. i) In the above context, the symbol ∼ denotes a (generalised) asymptotic
expansion (see also [80, Ch 1]): Let (Gk(x))k∈N0 be a sequence of (multivariate) functions
satisfying for all k the estimate Gk+1(x) = o(Gk(x)) as x → xc in some prescribed region.
For a function G(x), we then write G(x) ∼
k=0Gk(x) as x → xc, if for all n we have
G(x) =
k=0 Gk(x) +O(Gn(x)) as x → xc.
ii) The previous section yielded effective methods for obtaining area amplitude functions.
These are candidates for correction-to-scaling functions, see also [87].
4.2 Squares and rectangles
We consider the models of squares and rectangles, whose scaling behaviour can be explicitly
computed. Their half-perimeter and area generating function can be written as a single
sum, to which the Euler-MacLaurin summation formula [80, Ch 8] can be applied. We
first discuss squares.
Fact 5 (cf [49, Thm 2.4]). For 0 < x, q < 1, the generating function P (x, q) =
m=0 x
of squares, counted by half-perimeter and area, is given by
P (x, q) =
| log q|
| log x|
| log q|
+R(x, q),
with F(s) =
erfc(s), where the remainder term R(x, q) is bounded by
|R(x, q)| ≤ 1
| log x|.
Remarks. i) The remainder term differs from that in [49, Thm 2.4], where it was estimated
by an integral with lower bound one instead of zero [49, Eq. (46)].
ii) With xc = 1, s− = 0 and s+ = ∞, the function F(s) is a scaling function according to the
above definition. The remainder term is uniformly bounded in any rectangle [x0, 1)× [q0, 1)
for 0 < x0, q0 < 1, and so the approximation is uniform in this rectangle.
iii) The generating function P (x, q) satisfies the quadratic q-difference equation P (x, q) =
1+xq1/4P (q1/2x, q). Using the methods of the previous section, the area amplitude series of
the model can be derived. It coincides with the above scaling function F(s). This particular
form is expected, since the distribution of area is concentrated, p(x) = δ(x−1/4), compare
also with Ferrers diagrams.
iv) It has not been studied whether the scaling region can be extended to values x > 1
near (x, q) = (1, 1). It can be checked that the scaling function F(s) also determines
the asymptotic behaviour of the perimeter moment generating functions, via its expansion
about the origin. As expected, they indicate a concentrated distribution.
The half-perimeter and area generating function of rectangles is given by
P (x, q) =
xr+sqrs =
x(qx)r
1− qrx
We have P (x, 1) = x2/(1− x)2, and it can be shown that P (1, q) ∼ − log(1−q)
1−q as q ր 1, see
[85, 49]. The latter result implies that a scaling form as in Definition 4, with s− ≤ 0, does
not exist for rectangles. We have the following result.
Fact 6 ([49, Thm 3.4]). For 0 < q < 1 and 0 < qx < 1, the generating function P (x, q) of
rectangles satisfies
P (x, q) =
| log q|
| log q|
| log x|
− LerchPhi
qx, 1,
| log x|
| log q|
+R(x, q),
with the Lerch Phi-function LerchPhi(z, a, v) =
(v+n)a
, where the remainder term
R(x, q) is bounded by
|R(x, q)| ≤
1− qx
| log x|
(1− qx)2
| log q|
Remarks. i) The theorem implies that, for every q0 ∈ (0, 1), the function (1−qx)2P (x, q)
is uniformly approximated for points (x, q) satisfying q0 < q < 1 and 0 < x < xc(q), where
xc(q) = 1/q is the critical curve.
ii) Rectangles cannot have a scaling function F(s) as in Definition 4 with s− ≤ 0, since
the area generating function diverges with a logarithmic singularity. This is reflected in
the above approximation.
iii) It has not been studied whether the area moments or the perimeter moments at criti-
cality can be extracted from the above approximation.
iv) The relation of the above approximation to the area amplitude series of rectangles of
the previous section, F (s) = Ei(s2) es
, is not understood. Interestingly, the expansion of
F (s) about s = 0 resembles a logarithmic divergence. It is not clear whether its expansion
at the origin is related to the asymptotic behaviour of the perimeter moment generating
functions.
4.3 Ferrers diagrams
The singularity diagram of Ferrers diagrams is special, since the value xc(1) := limqր1 xc(q)
does not coincide with the radius of convergence xc of the half-perimeter generating function
P (x, 1). (The function q 7→ xc(q) is continuous on (0, 1], as may be inferred from the
exact solution.) Thus, there are two special points in the singularity diagram, namely
(x, q) = (xc, 1) and (x, q) = (xc(1), 1). Scaling behaviour about the latter point has
apparently not been studied, see also [85].
About the former point (x, q) = (xc, 1), scaling behaviour is expected. The area ampli-
tude series F (s) of Ferrers diagrams is given by the entire function
F (s) =
A numerical analysis indicates that its Taylor coefficients about s = 0 coincide with the
perimeter moment amplitudes at criticality, which characterise a concentrated distribu-
tion. There is no singularity of F (s) on the negative real axis at any finite value of s, in
accordance with the fact that the critical line at q = 1 extends above x = xc.
It is not known whether a scaling function exists for Ferrers diagrams, or whether it
would coincide with the amplitude generating function, see also the recent discussion [50,
Sec 2.3]. An rigorous study may be possible, by first rewriting the half-perimeter and
area generating function as a contour integral. A further analysis then reveals a saddle
point coalescing with the integration boundary at criticality. For such phenomena, uniform
asymptotic expansions can be obtained by Bleistein’s method [80, Ch 9.9]. The approach
proposed above is similar to that for the staircase model [83] in the following subsection.
4.4 Staircase polygons
For staircase polygons, counted by width, height, and area with associated variables x, y, q,
the existence of an area scaling function has been proved. The derivation starts from an
exact expression for the generating function, which has then been written as a complex
contour integral. About the point (x, q) = (xc, 1), this led to a saddle-point evaluation
with the effect of two coalescing saddles.
Fact 7 (cf [83, Thm 5.3]). Consider 0 < x, y, q < 1 such that the generating function
P (x, y, q) of staircase polygons, counted by width, height and area, is convergent. Set
q = e−ǫ for ǫ > 0. Then, as ǫ ց 0, we have
P (x, y, q) =
1− x− y
+α−1/2ǫ1/3
Ai′(αǫ−2/3)
Ai(αǫ−2/3)
1− x− y
 (1 +O(ǫ))
uniformly in x, y, where α = α(x, y) satisfies the implicit equation
α3/2 = log(x)
log(zm −
log(zm +
+ 2 Li2(zm −
d)− 2 Li2(zm +
where zm = (1 + y − x)/2 and d = z2m − y, and Li2(t) = −
log(1−u)
du is the Euler
dilogarithm.
Remarks. i) The characterisation of α3/2 given in [83, Eq (4.21)] has been used.
ii) The above approximation defines an area scaling function. For x = y and xc = 1/4, we
obtain the approximation [83, Eq (1.14)]
P (x, q) ∼
+ 4−2/3ǫ1/3
Ai′(44/3(1/4− x)ǫ−2/3)
Ai(44/3(1/4− x)ǫ−2/3)
as (x, q) → (xc, 1) within the region of convergence of P (x, q). It follows by comparison
that the area amplitude series coincides with the area scaling function.
iii) An area amplitude series for the anisotropic model has been given in [56], by a suitable
refinement of the method of dominant balance.
iv) It is expected that the perimeter law at x = xc may be inferred from the Taylor ex-
pansion of the scaling function F(s) at s = 0. A closed form for the moment generating
function or the probability density has not been given. The right tail of the distribution
has been analysed via the asymptotic behaviour of the moments [57, 55]. See also the next
subsection.
v) The above expression gives the singular behaviour of P (x, q) as q approaches unity,
uniformly in x, y. Restricting to x = y, it describes the singular behaviour along the line
q = 1 for 0 < x < xc. In the compact percolation picture, this line describes compact per-
colation below criticality. Perimeter limit laws away from criticality may be inferred along
the above lines. (Asymptotic expansions which are uniform in an additional parameter
appear also for solutions of differential equations near singular points [80].)
vi) By analytic continuation, it follows that the critical curve xc(q) for P (x, x, q) coincides
near q = 1 with the upper boundary curve x0(q) = (1 − s−(1 − q)2/3)/4 of the scaling
domain, where the value s− is determined by the singularity of smallest modulus of the
scaling function on the negative real axis, hence by the first zero of the Airy function. This
leads to a simple pole singularity in the generating function, which describes the branched
polymer phase close to q = 1.
4.5 Self-avoiding polygons
In the previous section, a conjecture for the limit distribution of area for self-avoiding
polygons and rooted slef-avoiding polygons was stated. We further explain the underlying
numerical analysis, following [91, 92, 93]. The numerically established form Eq. (32) implies
for the area moment generating functions for k 6= 1 singular behaviour of the form
(sing)
k (x) ∼
(1− x/xc)γk
(x ր xc),
with critical point xc = 0.14368062927(2) and γk = 3k/2− 3/2, where the numbers fk are
related to the amplitudes Ak in Eq. (32) by
Γ(γk)
For k = 1, we have γk = 0, and a logarithmic singularity is expected, g1(x) ∼ f1 log(1 −
x/xc), with f1 = A1. Similar to Conjecture 1, this leads to a corresponding conjecture
for the area amplitude series of self-avoiding polygons. If the area amplitude series was
a scaling function, we would expect that it also describes the limit law of perimeter at
criticality x = xc, via its expansion about the origin. (Interestingly, these moments are
related to the moments of the Airy distribution of negative order, see [93, 34].) This
prediction was confirmed in [93], up to numerical accuracy, for the first ten perimeter
moments. Also, the crossover behaviour to the branched polymer phase has been found
to be consistent with the corresponding scaling function prediction. As was argued in the
previous subsection, the critical curve xc(q) close to unity should coincide with the upper
boundary curve x0(q) = xc(1 − s−(1 − q)2/3), where the point s− is related to the first
zero of the Airy function on the negative real axis, s− = −0.2608637(5). The latter two
observations support the following conjecture.
Conjecture 3 ([87, 93]). Let pm,n denote the number of self-avoiding polygons of half-
perimeter m and area n, with generating function P (x, q) =
m,n pm,nx
mqn. Let xc =
0.14368062927(2) be the radius of convergence of the half-perimeter generating function
P (x, 1). Assume that
pm,n ∼ A0x−mc m−5/2 (m → ∞),
where A0 is estimated by A0 = 0.09940174(4). Let the number s− be such that (4A0)
3 πs−
coincides with the zero of the Airy function on the negative real axis of smallest modulus.
We have s− = −0.2608637(5).
i) For rooted self-avoiding polygons with half-perimeter and area generating function
P (r)(x, q) = x d
P (x, q), the conjectured form of a scaling function F (r)(s) : (s−,∞) →
R as in Definition 4 is
F (r)(s) = xc
logAi
(4A0)
with critical exponents θ = 1/3 and φ = 2/3.
ii) The conjectured form of a scaling function F(s) : (s−,∞) → R for self-avoiding
polygons is obtained by integration,
F(s) = −
log Ai
(4A0)
(1− q) log(1− q), (36)
with critical exponents θ = 1 and φ = 2/3.
Remarks. i) The above conjecture is essentially based on the conjecture of the previous
section that both staircase polygons and rooted self-avoiding polygons have, up to normal-
isation constants, the same limiting distribution of area in the uniform ensemble q = 1.
For a numerical investigation of the implications of the scaling function conjecture, see the
preceding discussion.
ii) A field-theoretical justification of the above conjecture has been proposed [16]. Also,
the values of A1 = 1/(8π) and the prefactor 1/(12π) in Eq. (36) have been predicted using
field-theoretic methods [15], see also the discussion in [93].
4.6 Models in higher dimensions
Only very few models of vesicles have been studied in three dimensions. For the simple
model of cubes, the scaling behaviour in the perimeter-area ensemble is the same as for
squares [49, Thm 2.4]. The scaling form in the area-volume ensemble has been given [49,
Thm 2.8]. The asymptotic behaviour of rectangular box vesicles has been studied to some
extent [73]. Explicit expressions for scaling functions have not been derived.
4.7 Open questions
The mathematical problem of this section concerns the local behaviour of multivariate
generating functions about non-isolated singularities. If such behaviour is known, it may,
under appropriate conditions, be used to infer asymptotic properties such as limit distri-
butions. Along lines of the same singular behaviour in the singularity diagram, expressions
uniform in the parameters are expected. This may lead to Gaussian limit laws [37]. Parts
of the theory of such asymptotic expansions have been developed using methods of sev-
eral complex variables [81]. The case of several coalescing lines of different singularities
is more difficult. Non-Gaussian limit laws are expected, and this case is subject to recent
mathematical research [81].
Our approach is motivated by certain models of statistical physics. It relies on the
observation that the singular behaviour of their generating function is described by a
scaling function. There are major open questions concerning scaling functions. On a
conceptual level, the transfer problem [35] should be studied in more detail, i.e., conditions
under which the existence of a scaling function implies the existence of the finite-size scaling
function. Also, conditions have to be derived such that limit laws can be extracted from
scaling functions. This is related to the question when can an asymptotic relation be
differentiated. Real analytic methods, in conjunction with monotonicity properties of the
generating function, might prove useful [80].
For particular examples, such as models satisfying a linear q-difference equation or di-
rected convex polygons, scaling functions may be extracted explicitly. It would be interest-
ing to prove scaling behaviour for classes of polygon models from their defining functional
equation. Furthermore, the staircase polygon result indicates that some generating func-
tions may have in fact asymptotic expansions for q ր 1, which are valid uniformly in the
perimeter variable (i.e., not only in the limit x ր xc). Such expansions would yield scaling
functions and correction-to-scaling functions, thereby extending the formal results of the
previous section. This might be worked out for specific models, at least in the relevant
example of staircase polygons.
Acknowledgements
The author would like to thank Tony Guttmann and Iwan Jensen for comments on the
manuscript, and Nadine Eisner, Thomas Prellberg and Uwe Schwerdtfeger for helpful dis-
cussions. Financial support by the German Research Council (Deutsche Forschungsge-
meinschaft) within the CRC701 is gratefully acknowledged.
References
[1] M. Abramowitz and I.A. Stegun. Handbook of Mathematical Functions with Formu-
las, Graphs, and Mathematical Tables, volume 18. National Bureau of Standards
Applied Mathematics Series, 1964. Reprint Dover 1973.
[2] D.J. Aldous. The continuum random tree II: An overview. In M.T. Barlow and N.H.
Bingham, editors, Stochastic Analysis, pages 23–70. Cambridge University Press,
Cambridge, 1991.
[3] G. Aleksandrowicz and G. Barequet. Counting d-dimensional polycubes and nonrect-
angular planar polyominoes. In Proc. 12th Ann. Int. Computing and Combinatorics
Conf. (COCOON), Taipei, Taiwan, volume 4112 of Springer Lecture Notes in Com-
puter Science, pages 418–427. Springer, 2006.
[4] D. Bennett-Wood, I.G. Enting, D.S. Gaunt, A.J. Guttmann, J.L. Leask, A.L.
Owczarek, and S.G. Whittington. Exact enumeration study of free energies of inter-
acting polygons and walks in two dimensions. J. Phys. A: Math. Gen, 31:4725–4741,
1998.
[5] N. Bleistein and R.A. Handelsman. Asymptotic Expansions of Integrals. Holt, Rine-
hart and Winston, New York, 1975.
[6] M. Bousquet-Mélou. Une bijection entre les polyominos convexes dirigés et les mots
de Dyck bilatéres. RAIRO Inform. Théor. Appl., 26:205–219, 1992.
[7] M. Bousquet-Mélou. A method for the enumeration of various classes of column-
convex polygons. Discrete Math., 154:1–25, 1996.
[8] M. Bousquet-Mélou. Families of prudent self-avoiding walks. Preprint
arXiv:0804.4843, 2008.
[9] M. Bousquet-Mélou and J.-M. Fédou. The generating function of convex polyomi-
noes: the resolution of a q-differential system. Discr. Math., 137:53–75, 1995.
[10] M. Bousquet-Mélou and S. Janson. The density of the ISE and local limit laws for
embedded trees. Ann. Appl. Probab., 16:1597–1632, 2006.
[11] M. Bousquet-Mélou and A. Rechnitzer. The site-perimeter of bargraphs. Adv. in
Appl. Math., 31:86–112, 2003.
[12] R. Brak and J.W. Essam. Directed compact percolation near a wall. III. Exact results
for the mean length and number of contacts. J. Phys. A: Math. Gen., 32:355–367,
1999.
[13] R. Brak and A.L. Owczarek. On the analyticity properties of scaling functions in
models of polymer collapse. J. Phys. A: Math. Gen., 28:4709–4725, 1995.
[14] R. Brak, A.L. Owczarek, and T. Prellberg. A scaling theory of the collapse transi-
tion in geometric cluster models of polymers and vesicles. J. Phys. A: Math. Gen.,
26:4565–5479, 1993.
[15] J. Cardy. Mean area of self-avoiding loops. Phys. Rev. Lett., 72:1580–1583, 1994.
[16] J. Cardy. Exact scaling functions for self-avoiding loops and branched polymers.
J. Phys. A: Math. Gen., 34:L665–L672, 2001.
[17] K.L. Chung. A Course in Probability Theory. Academic Press, New York, 2nd
edition, 1974.
[18] G.M. Constantine and T.H. Savits. A multivariate Faa di Bruno formula with ap-
plications. Trans. Amer. Math. Soc., 348:503–520, 1996.
[19] D.A. Darling. On the supremum of a certain Gaussian process. Ann. Probab., 11:803–
806, 1983.
[20] M.-P. Delest, D. Gouyou-Beauchamps, and B. Vauquelin. Enumeration of parallelo-
gram polyominoes with given bond and site perimeter. Graphs Combin., 3:325–339,
1987.
[21] M.-P. Delest and X.G. Viennot. Algebraic languages and polyominoes enumeration.
Theor. Comput. Sci., 34:169–206, 1984.
[22] J.C. Dethridge, T.M. Garoni, A.J. Guttmann, and I. Jensen. Prudent walks and
polygons. Preprint arXiv:0810:3137, 2008.
[23] G. Doetsch. Introduction to the Theory and Application of the Laplace Transform.
Springer, New York, 1974.
[24] M. Drmota. Systems of functional equations. Random Structures Algorithms, 10:103–
124, 1997.
[25] P. Duchon. q-grammars and wall polyominoes. Ann. Comb., 3:311–321, 1999.
[26] J.W. Essam. Directed compact percolation: Cluster size and hyperscaling. J. Phys.
A: Math. Gen., 22:4927–4937, 1989.
[27] J.W. Essam and A.J. Guttmann. Directed compact percolation near a wall. II.
Cluster length and size. J. Phys. A: Math. Gen., 28:3591–3598, 1995.
[28] J.W. Essam and D. Tanlakishani. Directed compact percolation. II. Nodal points,
mass distribution, and scaling. In Disorder in physical systems, volume 67, pages
67–86. Oxford Univ. Press, New York, 1990.
[29] J.W. Essam and D. Tanlakishani. Directed compact percolation near a wall. I. Biased
growth. J. Phys. A: Math. Gen., 27:3743–3750, 1994.
[30] J.A. Fill, P. Flajolet, and N. Kapur. Singularity analysis, Hadamard products, and
tree recurrences. J. Comput. Appl. Math., 174:271–313, 2005.
[31] M.E. Fisher, A.J. Guttmann, and S.G. Whittington. Two-dimensional lattice vesicles
and polygons. J. Phys. A: Math. Gen., 24:3095–3106, 1991.
[32] P. Flajolet. Singularity analysis and asymptotics of Bernoulli sums. Theoret. Comput.
Sci., 215:371–381, 1999.
[33] P. Flajolet, S. Gerhold, and B. Salvy. On the non-holonomic character of logarithms,
powers, and the n-th prime function. Electronic Journal of Combinatorics, 11:A2:1–
16, 2005.
[34] P. Flajolet and G. Louchard. Analytic variations on the Airy distribution. Algorith-
mica, 31:361–377, 2001.
[35] P. Flajolet and A. Odlyzko. Singularity analysis of generating functions. SIAM
J. Discr. Math., 3:216–240, 1990.
[36] P. Flajolet, P. Poblete, and A. Viola. On the analysis of linear probing hashing.
Average-case analysis of algorithms. Algorithmica, 22:37–71, 1998.
[37] P. Flajolet and R. Sedgewick. Analytic Combinatorics. Book in preparation, 2008.
[38] B. Gittenberger. On the contour of random trees. SIAM J. Discr. Math., 12:434–458,
1999.
[39] I.P. Goulden and D.M. Jackson. Combinatorial enumeration. John Wiley & Sons,
New York, 1983.
[40] D. Gouyou-Beauchamps and P. Leroux. Enumeration of symmetry classes of convex
polyominoes on the honeycomb lattice. Theoret. Comput. Sci., 346:307–334, 2005.
[41] R.B. Griffiths. Proposal for notation at tricritical points. Phys. Rev. B, 7:545–551,
1973.
[42] G. Grimmett. Percolation. Springer, Berlin, 1999. 2nd ed.
[43] A.J. Guttmann. Asymptotic analysis of power-series expansions. In C. Domb and
J.L. Lebowitz, editors, Phase Transitions and Critical Phenomena, volume 13, pages
1–234. Academic, New York, 1989.
[44] A.J. Guttmann and I. Jensen. Fuchsian differential equation for the perimeter gen-
erating function of three-choice polygons. Séminaire Lotharingien de Combinatoire,
54:B54c, 2006.
[45] A.J. Guttmann and I. Jensen. The perimeter generating function of punctured stair-
case polygons. J. Phys. A: Math. Gen., 39:3871–3882, 2006.
[46] G.H. Hardy. Divergent Series. Clarendon Press, Oxford, 1949.
[47] E.J. Janse van Rensburg. The Statistical Mechanics of Interacting Walks, Polygons,
Animals and Vesicles, volume 18 of Oxford Lecture Series in Mathematics and its
Applications. Oxford University Press, Oxford, 2000.
[48] E.J. Janse van Rensburg. Statistical mechanics of directed models of polymers in the
square lattice. J. Phys. A: Math. Gen., 36:R11–R61, 2003.
[49] E.J. Janse van Rensburg. Inflating square and rectangular lattice vesicles.
J. Phys. A: Math. Gen., 37:3903–3932, 2004.
[50] E.J. Janse van Rensburg and J. Ma. Plane partition vesicles. J. Phys. A: Math.
Gen., 39:11171–11192, 2006.
[51] S. Janson. The Wiener index of simply generated random trees. Random Structures
Algorithms, 22:337–358, 2003.
[52] S. Janson. Brownian excursion area, Wright’s constants in graph enumeration, and
other Brownian areas. Probab. Surv., 4:80–145, 2007.
[53] I. Jensen. Perimeter generating function for the mean-squared radius of gyration of
convex polygons. J. Phys. A: Math. Gen., 38:L769–775, 2005.
[54] B.McK. Johnson and T. Killeen. An explicit formula for the c.d.f. of the l1 norm of
the Brownian bridge. Ann. Prob., 11:807–808, 1983.
[55] J.M. Kearney. On a random area variable arising in discrete-time queues and compact
directed percolation. J. Phys. A: Math. Gen., 37:8421–8431, 2004.
[56] M.J. Kearney. Staircase polygons, scaling functions and asymmetric compact perco-
lation. J. Phys. A: Math. Gen., 35:L731–L735, 2002.
[57] M.J. Kearney. On the finite-size scaling of clusters in compact directed percolation.
J. Phys. A: Math. Gen., 36:6629–6633, 2003.
[58] M.J. Kearney, S.N. Majumdar, and R.J. Martin. The first-passage area for drifted
Brownian motion and the moments of the Airy distribution. J. Phys. A: Math.
Theor., 40:F863–F869, 2007.
[59] J.-M. Labarbe and J.-F. Marckert. Asymptotics of Bernoulli random walks, bridges,
excursions and meanders with a given number of peaks. Electronic J. Probab., 12:229–
261, 2007.
[60] G.F. Lawler, O. Schramm, and W. Werner. On the scaling limit of planar self-
avoiding walk. In Fractal Geometry and Applications: A Jubilee of Benôıt Man-
delbrot, Part 2, volume 72 of Proceedings of Symposia in Pure Mathematics, pages
339–364. Amer. Math. Soc., Providence, RI, 2004.
[61] I.D. Lawrie and S. Sarbach. Theory of tricritical points. In C. Domb and J.L.
Lebowitz, editors, Phase Transitions and Critical Phenomena, volume 9, pages 1–
161. Academic Press, London, 1984.
[62] P. Leroux and É. Rassart. Enumeration of symmetry classes of parallelogram poly-
ominoes. Ann. Sci. Math. Québec, 25:71–90, 2001.
[63] P. Leroux, É. Rassart, and A. Robitaille. Enumeration of symmetry classes of convex
polyominoes in the square lattice. Adv. in Appl. Math, 21:343–380, 1998.
[64] K.Y. Lin. Rigorous derivation of the perimeter generating functions for the mean-
squared radius of gyration of rectangular, Ferrers and pyramid polygons. J. Phys.
A: Math. Gen., 39:8741–8745, 2006.
[65] M. Lladser. Asymptotic enumeration via singularity analysis. PhD thesis, Ohio State
University, 2003. Doctoral dissertation.
[66] G. Louchard. Kac’s formula, Lévy’s local time and Brownian excursion.
J. Appl. Probab., 21:479–499, 1984.
[67] G. Louchard. The Brownian excursion area: A numerical analysis. Com-
put. Math. Appl., 10:413–417, 1985.
[68] G. Louchard. Erratum: ”The Brownian excursion area: A numerical analysis”.
Comput. Math. Appl., 12:375, 1986.
[69] G. Louchard. Probabilistic analysis of some (un)directed animals. Theoret. Comput.
Sci., 159:65–79, 1996.
[70] G. Louchard. Probabilistic analysis of column-convex and directed diagonally-convex
animals. Random Structures Algorithms, 11:151–178, 1997.
[71] G. Louchard. Probabilistic analysis of column-convex and directed diagonally-convex
animals. II. Trajectories and shapes. Random Structures Algorithms, 15:1–23, 1999.
[72] A. Del Lungo, M. Mirolli, R. Pinzani, and S. Rinaldi. A bijection for directed-convex
polyominoes. Discr. Math. Theo. Comput. Sci., AA (DM-CCG):133–144, 2001.
[73] J. Ma and E.J. Janse van Rensburg. Rectangular vesicles in three dimensions.
J. Phys. A: Math. Gen., 38:4115–4147, 2005.
[74] N. Madras and G. Slade. The Self-Avoiding Walk. Birkhäuser Boston, Boston, MA,
1993.
[75] S.N. Majumdar. Brownian functionals in physics and computer science. Current
Sci., 89:2076–2092, 2005.
[76] S.N. Majumdar and A. Comtet. Airy distribution function: From the area under a
Brownian excursion to the maximal height of fluctuating interfaces. J. Stat. Phys.,
119:777–826, 2005.
[77] M. Nguy˜̂en Th´̂e. Area of Brownian motion with generatingfunctionology. In C.
Banderier and C. Krattenthaler, editors, Discrete Random Walks, DRW’03, Discrete
Mathematics and Theoretical Computer Science Proceedings, AC, pages 229–242.
Assoc. Discrete Math. Theor. Comput. Sci., Nancy, 2003.
[78] M. Nguy˜̂en Th´̂e. Area and inertial moment of Dyck paths. Combin. Probab. Comput.,
13:697–716, 2004.
[79] A.M. Odlyzko. Asymptotic enumeration methods. In R.L. Graham, M. Grötschel,
and L. Lovász, editors, Handbook of Combinatorics, volume 2, pages 1063–1229.
Elsevier, Amsterdam, 1995.
[80] F.W.J. Olver. Asymptotics and Special Functions. Academic Press, New York, 1974.
[81] R. Pemantle and M. Wilson. Twenty combinatorial examples of asymptotics derived
from multivariate generating functions. SIAM Rev., 50:199–272, 2008.
[82] J. Pitman. Combinatorial Stochastic Processes, volume 1875 of Lecture Notes in
Mathematics. Springer-Verlag, Berlin, 2006.
[83] T. Prellberg. Uniform q-series asymptotics for staircase polygons.
J. Phys. A: Math. Gen., 28:1289–1304, 1995.
[84] T. Prellberg and R. Brak. Critical exponents from nonlinear functional equations for
partially directed cluster models. J. Stat. Phys., 78:701–730, 1995.
[85] T. Prellberg and A.L. Owczarek. Stacking models of vesicles and compact clusters.
J. Stat. Phys., 80:755–779, 1995.
[86] A. Rechnitzer. Haruspicy 2: The anisotropic generating function of self-avoiding
polygons is not D-finite. J. Combin. Theory Ser. A, 113:520–546, 2006.
[87] C. Richard. Scaling behaviour of two-dimensional polygon models. J. Stat. Phys.,
108:459–493, 2002.
[88] C. Richard. Staircase polygons: Moments of diagonal lengths and column heights.
J. Phys.: Conf. Ser., 42:239–257, 2006.
[89] C. Richard. On q-functional equations and excursion moments. Discr. Math., in
press, 2008. math.CO/0503198.
[90] C. Richard and A.J. Guttmann. q-linear approximants: Scaling functions for polygon
models. J. Phys. A: Math. Gen., 34:4783–4796, 2001.
[91] C. Richard, A.J. Guttmann, and I. Jensen. Scaling function and universal amplitude
combinations for self-avoiding polygons. J. Phys. A: Math. Gen., 34:L495–L501,
2001.
[92] C. Richard, I. Jensen, and A.J. Guttmann. Scaling function for self-avoiding poly-
gons. In D. Iagolnitzer, V. Rivasseau, and J. Zinn-Justin, editors, Proceedings of the
International Congress on Theoretical Physics TH2002 (Paris), Supplement, pages
267–277. Birkhäuser, Basel, 2003.
[93] C. Richard, I. Jensen, and A.J. Guttmann. Scaling function for self-avoiding polygons
revisited. J. Stat. Mech.: Th. Exp., page P08007, 2004.
[94] C. Richard, I. Jensen, and A.J. Guttmann. Area distribution and scaling function
for punctured polygons. Electronic Journal of Combinatorics, 15:#R53, 2008.
[95] C. Richard, U. Schwerdtfeger, and B. Thatte. Area limit laws for symmetry classes
of staircase polygons. Preprint arXiv:0710:4041, 2007.
[96] U. Schwerdtfeger. Exact solution of two classes of prudent polygons. Preprint
arXiv:0809:5232, 2008.
[97] U. Schwerdtfeger. Volume laws for boxed plane partitions and area laws for Ferrers
diagrams. In Fifth Colloquium on Mathematics and Computer Science, Discrete
Mathematics and Theoretical Computer Science Proceedings, AG, pages 535–544.
Assoc. Discrete Math. Theor. Comput. Sci., Nancy, 2008.
[98] R.P. Stanley. Enumerative Combinatorics, volume 2. Cambridge University Press,
Cambridge, Cambridge.
[99] L. Takács. On a probability problem connected with railway traffic.
J. Appl. Math. Stochastic Anal., 4:1–27, 1991.
[100] L. Takács. Limit distributions for the Bernoulli meander. J. Appl. Prob., 32:375–395,
1995.
[101] R. van der Hofstad and W. König. A survey of one-dimensional random polymers.
J. Statist. Phys., 103:915–944, 2001.
[102] C. Vanderzande. Lattice Models of Polymers, volume 11 of Cambridge Lecture Notes
in Physics. Cambridge University Press, Cambridge, 1998.
[103] L. Di Vizio, J.-P. Ramis, J. Sauloy, and C. Zhang. Équations aux q-différences.
Gaz. Math., 96:20–49, 2003.
[104] S.G. Whittington. Statistical mechanics of three-dimensional vesicles.
J. Math. Chem., 14:103–110, 1993.
	Introduction
	Polygon models and generating functions
	Limit distributions
	An illustrative example: Rectangles
	Limit law of area
	Limit law via generating functions
	Dominant balance
	A general method
	Further examples
	Ferrers diagrams
	Staircase polygons
	q-difference equations
	Algebraic q-difference equations
	q-functional equations and other extensions
	A stochastic connection
	Directed convex polygons
	Limit laws away from (xc,1)
	Self-avoiding polygons
	Punctured polygons
	Models in three dimensions
	Summary
	Scaling functions
	Scaling and finite size scaling
	Squares and rectangles
	Ferrers diagrams
	Staircase polygons
	Self-avoiding polygons
	Models in higher dimensions
	Open questions
ABSTRACT
  We discuss the asymptotic behaviour of models of lattice polygons, mainly on
the square lattice. In particular, we focus on limiting area laws in the
uniform perimeter ensemble where, for fixed perimeter, each polygon of a given
area occurs with the same probability. We relate limit distributions to the
scaling behaviour of the associated perimeter and area generating functions,
thereby providing a geometric interpretation of scaling functions. To a major
extent, this article is a pedagogic review of known results.

<|endoftext|><|startoftext|>
Incommmensurability and unconventional superconductor to insulator transition in
the Hubbard model with bond-charge interaction
A. A. Aligia,1 A. Anfossi,2, 3 L. Arrachea,4, 3 C. Degli Esposti Boschi,5 A.
O. Dobry,6 C. Gazza,6 A. Montorsi,2 F. Ortolani,7 and M. E. Torio6
1Comisión Nacional de Enerǵıa Atómica, Centro Atómico Bariloche and Instituto Balseiro, 8400 S.C. de Bariloche, Argentina
2Dipartimento di Fisica del Politecnico and CNISM,
corso Duca degli Abruzzi 24, I-10129, Torino, Italy
3BIFI, Universidad de Zaragoza, Corona de Aragón 42, 5009 Zaragoza, Spain
4Departamento de F́ısica de la Materia Condensada, Universidad de Zaragoza, 5009 Zaragoza
5Unità CNISM and Dipartimento di Fisica dell’Università di Bologna, viale Berti-Pichat 6/2, I-40127, Bologna, Italy
6Instituto de F́ısica Rosario, CONICET-UNR, Bv. 27 de Febrero 210 bis, 2000 Rosario, Argentina.
7Dipartimento di Fisica dell’Università di Bologna and INFN, viale Berti-Pichat 6/2, I-40127, Bologna, Italy
(Dated: November 1, 2018)
We determine the quantum phase diagram of the one-dimensional Hubbard model with bond-
charge interaction X in addition to the usual Coulomb repulsion U > 0 at half-filling. For large
enoughX < t the model shows three phases. For large U the system is in the spin-density wave phase
as in the usual Hubbard model. As U decreases, there is first a spin transition to a spontaneously
dimerized bond-ordered wave phase and then a charge transition to a novel phase in which the
dominant correlations at large distances correspond to an incommensurate singlet superconductor.
PACS numbers: 71.10.Fd,71.10.Hf,71.10.Pm,71.30.+h
The Hubbard model has been originally proposed to
describe the effect of the Coulomb interaction in tran-
sition metals, which usually contain localized orbitals.
Other real compounds containing more extended orbitals
cannot in general be properly described by this simple
Hamiltonian. Well-known examples are several quasi-
one-dimensional (1D) materials that have been recently
investigated [1], which exhibit a variety of phases that
cannot be explained with the usual Hubbard model. Ad-
ditional interactions should be included. A natural in-
teraction that arises in systems with extended orbitals is
the bond-charge interaction X [2]. In fact, it is natural
to assume that the charge in the bond affects screening
and the effective potential acting on valence electrons,
and therefore the extension of the Wannier orbitals and
the hopping between them should vary with the charge.
This leads to the U −X Hamiltonian:
H = −t
σ=↑,↓,〈ij〉
iσcjσ +H.c.) + U
ni↑ni↓
σ,〈ij〉
cjσ +H.c.)(ni−σ + nj−σ). (1)
This model has been studied in two dimensions, moti-
vated by a theory of hole superconductivity [3]. A mod-
ified version of it has been derived as an effective model
for the cuprates and shows enhanced d-wave supercon-
ducting correlations [4]. Recently, this model has been
paramount to broader audiences, and its relevance has
been discussed in the context of mesoscopic transport [5]
and quantum information [6, 7].
In 1D, there are bosonization [8, 9] and numerical [9]
results available. However, at half-filling, the effect of X
disappears in the standard bosonization treatment and a
behavior different from the usual Hubbard model was not
expected in these studies. For X = t, an exact solution
is available [10]. In this case the ground state is highly
degenerate: the transition to a metallic state takes place
at Uc = 4t > 0, but the response of the system to an
applied magnetic flux indicates that it is not supercon-
ducting [11]. In view of the previous studies, the recent
evidence of an insulator-metal transition driven by X < t
at finite Uc > 0 at half-filling comes as a surprise [12].
The nature of the metallic phase and the character of
the transition have not been fully elucidated, though the
possibility of superconductivity has been suggested.
In this Letter we employ several analytical and numer-
ical techniques to calculate accurately the phase diagram
of the model at half-filling in 1D and to determine the na-
ture of each phase. We establish that the insulator-metal
transition is of commensurate-incommensurate (CIC)
type to a phase with dominating singlet superconducting
(SS) correlations. Remarkably, unlike other CIC tran-
sitions [13, 14], it is not driven by one-body effects like
chemical potential or the emergence of more than two
Fermi points in the noninteracting dispersion relation,
but by strong correlations induced by large enough X . In
addition, we unveil that inside the insulating phase there
is a spin transition separating the expected spin-density
wave (SDW) for U > Us from a spontaneously dimerized
bond-ordered wave (BOW) phase for Uc < U < Us. This
transition is of Kosterlitz-Thouless (KT) type and a spin
gap opens in the BOW phase.
The nature of each phase and the qualitative aspects of
the phase diagram can be understood by a weak coupling
bosonization analysis provided it includes vertex correc-
tions of second order in X to the coupling constants and
http://arxiv.org/abs/0704.0717v2
one term of order a2 in the bosonization of the bond-
charge interaction as described below, where a is the lat-
tice constant. A bosonized version of (1) is given by the
following Hamiltonian density:
H = H0σ +H0ρ +
(2πα)2
8φσ)−
(2πα)2
(2πα)2
8φσ)∂xφρ, (2)
where H0σ and H0ρ are the usual known quadratic forms
and α is a short range cutoff in the bosonization proce-
dure. The first line of (2) has the structure of the pre-
viously studied bosonized theory [8], which corresponds
to two decoupled sine-Gordon field theories, one for the
spin (φσ) and the other for the charge (φρ). In order
to take into account the effect of the bond-charge inter-
action on the phase diagram of the system, we included
vertex corrections of second order in X in the definition
of the the coupling constants gi, due to virtual processes
involving states far from the Fermi energy [15]. In addi-
tion, we took into account the usually neglected gσρ term
that couples spin and charge degrees of freedom. The lat-
ter is ∝ a2. It arises including spatial derivatives of the
fermionic fields in the representation of (1) in terms of
a low energy field theory. All of these terms have naive
scaling dimension 3 and are usually neglected. However,
one term that bosonize as the second line of (2) becomes
relevant for large enough X and provides a mechanism
for an incommensurate transition, as discussed below.
Explicitly, the effective parameters read g1⊥ = g2⊥ =
(U− 8X
π(t−X)
)a and gσρ =
2a2X . The forward and umk-
lapp processes are the same as in the Hubbard model,
g3⊥ = g4⊥ = Ua. The Luttinger liquid parameters (Kρ
and Kσ) and the charge and spin wave velocity (uρ and
uσ) in terms of gi are given by known expressions [16].
Neglecting the gσρ term, the renormalization-group (RG)
flow diagrams are of KT type. A spin gap opens when
g1⊥ < 0, i.e., when the flow of RG, which takes place
on the separatrix of the KT diagram due to spin SU(2)
symmetry, goes to strong coupling. Therefore, the spin
gapped phase appears when U < Us =
π(t−X)
. As for
the behavior of the charge modes, a gap opens when the
g3⊥ term becomes relevant. The charge gapped phase
takes place for U > Uc, with Uc < Us. The gσρ term
becomes relevant for Kσ < 1/2 (X > 0.6t for U = 0).
In the spin gapped phase the cos(
8φσ) is frozen at its
mean value. This term could be interpreted as a chemical
potential [µ =
(2πα)2
〈cos(
8φσ)〉] times a charge density
operator. The effects of such a term are known [16].
If we start the analysis from a situation where there is
also a charge gap (∆c) smaller than the spin one (∆s),
and we then increase the value of X , the effect of this
term is to close ∆c, leading to a metallic phase when
µ > ∆c. The effective Fermi level is shifted with respect
to the original one and the system develops incommen-
0 0,2 0,4 0,6
Bosonization S
Bosonization C
0 0,2 0,4 0,6 0,8
0 0,2 0,4 0,6 0,8 1
Exact
Top. Ph S
Top. Ph C
DMRG S
DMRG C
Figure 1: (Color online). Phase diagram. Left: Bosoniza-
tion (top), and real space renormalization-group (bottom)
predictions. Right: Numerical results as obtained by DMRG
(circles-squares) and topological phase (crosses) methods.
surate correlations. A numerical analysis discussed be-
low shows that the system has dominant SS correlations.
Thus, this phase can be characterized as incommensurate
singlet superconducting (ICSS).
For a qualitative localization of the boundary transi-
tion line between the insulator and the ICSS phase, we
have implemented a procedure as follows: (i) We start
from a parameter regime where the spin gap is open.
(ii) We follow the RG flow up to a length scale where
|g1⊥|/(πUs)| ∼ 1. (iii) At this point the gσρ term is de-
coupled by a mean field approach similar to that used
by Nersesyan et al. to show incommensurability in the
anisotropic zigzag chain [17]. The value of 〈cos(
8φσ)〉
is exactly obtained at the LE point (Kσ = 1/2). (iv) For
vanishing gσρ, ∆c is obtained by rescaling the problem
to the LE point of the charge sector, by using the RG
equations of the sine-Gordon theory. (v) The CIC tran-
sition takes place when
(2πα)2
〈cos(
8φσ)〉 = ∆c [16]. In
the top left panel of Fig. 1 we show the phase diagram
of the model predicted by this approach. For each value
of X , there are two transition points Uc and Us corre-
sponding to the charge and spin transition, respectively.
Each phase is characterized by the gapped modes and the
relevant order parameter. For U > Uc the system is an
insulator. For U > Us, the slowest decaying correlation
functions are the spin-spin ones. The system is in a SDW
phase. For Uc < U < Us a fully gapped (spin and charge)
phase is developed. The fields φσ and φρ are located at
the minimum of the potential, and the translation sym-
metry is spontaneously broken. The BOW parameter,
defined below, acquires a nonzero value. For U < Uc
the charge gap closes and the dominant correlations at
large distances are the SS ones. While the nature of each
phase has been identified, the phase boundaries predicted
by bosonization are not quantitatively valid, particularly
for large values of the interactions. In the right panel of
Fig. 1 we show the phase diagram of the model, as ob-
tained by accurate numerical techniques. One of them,
used to determine the charge transition line, consists in
studying singularities of single-site entanglement [12] by
means of density-matrix renormalization group (DMRG)
[18]. Another method is based on topological numbers,
or jumps of Berry phases [19], which was successfully ap-
plied to a similar model [8] (b). The value of Uc (Us) is
determined in this case by the jump of the charge (spin)
Berry phase. The corresponding values of Uc and Us in
systems up to L = 14 sites, extrapolated to the thermo-
dynamic limit using a parabola in 1/L2, are also shown
in Fig. 1.
DMRG evaluations of ∆c and ∆s confirm these pre-
dictions. The charge gap was calculated in [12] from
the definition 2∆c = E0(N + 2) + E0(N − 2)− 2E0(N),
E0(N) being the ground-state energy of the chain with
N particles. Similarly, the spin gap is here determined
through ∆s = E0(Sz = 1) − E0(Sz = 0), being E0(Sz),
the ground-state energy of the half-filled system within
the subspace with a given total Sz. We can see in Fig. 1
that the closing of ∆c, ∆s do not take place simultane-
ously for small U and X . The critical lines for the closing
of both gaps obtained by extrapolations to the thermo-
dynamic limit are in reasonable quantitative agreement
with the ones determined by the method of the topolog-
ical phases.
We have verified that the spin transition is of KT
type, calculating the scaling dimensions of the singlet and
triplet operators as described in [19]. In order to identify
the universality class of the charge transition, we em-
ployed the finite-size crossing method [20]. The study of
the dependence of 〈ni↑ni↓〉 = ∂eL/∂U on the size L (eL
being the ground-state energy density) provides a loca-
tion of the critical points in agreement with the methods
discussed above. In addition, the divergence that devel-
ops ∂e2L/∂U
2 with increasing L indicates that the gap ex-
ponent ν remains close to 1/2 (the value that can be com-
puted exactly at the point X = t) for X/t = 0.6, 0.7, 0.8
with a possible increase for X/t → 0.5; below this point,
our numerical analysis suggests that the charge transi-
tion becomes of KT type, with “ν = ∞”. The estimate
ν = 1/2 relies upon the assumption that the dynamic
exponent ζ (through which gap and correlation length ξ
are related, ∆c ∝ ξζ) is still ζ = 2, as in the exactly
solvable case X = t [7]. As already noted in [12], the
behavior of ∆c ∝ L−2 along the transition line is consis-
tent with this exponent. We stress that such feature is in
agreement with the CIC character of the metal-insulator
transition [16]. Instead, within the metallic phase, the
finite-size scaling suggests ∆c ∝ L−1, although the data
are rather noisy due to incommmensurability.
In Fig. 2 we show numerical results supporting the in-
commensurate character of the metallic phase. We report
the density distributions in real space for the local charge
8 16 24 32 40 48 56 64
U=4.0 t
U=3.05 t
U=2.5 t
U=0.5 t
8 16 24 32 40 48 56 64
X=0.3 t
X=0.58 t
X=0.8 t
Figure 2: (Color online). Charge distribution 〈ni〉 evaluated
by DMRG. Left: X = 0.8t. Right: U = 1.5t.
density ni = ni↑ + ni↓ in the ground state in an open
chain with L = 64 sites. The incommensurate character
of the metallic phase manifests itself also in the behavior
of the charge and spin correlation functions, whose cor-
responding structure factors show peaks away from the
commensurate reciprocal vector q = π (not shown). The
left panel of Fig. 2 corresponds to X = 0.8t as U is var-
ied. The behavior is similar to the one observed within
the incommensurate phase of the Hubbard model includ-
ing next-nearest-neighbor hopping (t−t′−U model) [13].
For U > Uc = 3.05t, the commensurate charge distribu-
tion characterizing the insulating phase is reached within
a few lattice sites from the edge. The insulator-metal
transition shows up via the appearance of incommensu-
rate modulations in the charge distribution, whose wave-
length increases within the metallic phase. The right
part of the figure shows the results obtained by varying
X at U = 1.5t. Interestingly, a first modulation appears
already for Xs < X < Xc ( Xs ≈ 0.5t, and Xc ≈ 0.6t).
Again, for X > Xc further incommensurate modulations
appear in the LE phase.
Within the charge sector U < Uc, the dominating cor-
relations at large distance are superconducting pair-pair
ones if the correlation exponent Kρ > 1 or charge-charge
ones otherwise. We calculatedKρ employing the method-
ology described in [9]. This study casts extrapolated val-
ues Kρ ∼ 1.3 for U = 0 and X = 0.8t. To provide
stronger evidence for the SS character of the incommen-
surate phase, we have calculated on-site pairing correla-
tions 〈P †i Pj〉 with Pi = c
i↓ and charge-charge correla-
tions |〈ninj〉 − 〈ni〉〈nj〉| in an open chain with 100 sites
and using the sites 30 to 70 to avoid boundary effects.
The results are displayed in Fig. 3. A fitting of the pair-
ing correlations at distances between 8 and 40 sites gives
Kρ = 1.32± 0.01. This value is also consistent with the
long distance behavior of the charge-charge correlations.
The inset also shows the tendency of the system to show
the anomalous flux quantization characteristic of super-
0.0 0.5 1.0 1.5
0.0 0.5 1.0
-5.060
-5.055
-5.050
Φ / Φ
 pair-pair
 charge-charge
 |i-j|
Figure 3: Pair-pair and charge-charge correlation functions
for U = t and X = 0.8t. Full (dashed) line corresponds to
a power law with exponent 1/Kρ (Kρ). The inset shows the
ground state energy as a function of an applied magnetic flux.
conductivity [11], which is more pronounced as the size
of the system increases.
An additional argument suggesting superconducting
correlations within this phase is provided by the real
space renormalization-group method, used before for the
standard Hubbard model [21]. Different from that case,
the recursive equations for the renormalized parameters
in the positive U regime, depending on X and U , exhibit
three different fixed points for the nth step renormalized
Coulomb interaction U (n) in the large n limit: U (n) > 0
for U > Urc, U
(n) = 0 for U = Urc, and U
(n) < 0 for
U < Urc. In the latter case, the effective Coulomb inter-
action becomes attractive. In the bottom left insert of
Fig. 1 Urc obtained in this way is reported.
To support the bosonization predictions, which char-
acterize the intermediate phase as a BOW, we have eval-
uated with DMRG the BOW order parameter OBOW =
(−1)i〈c†i+1σciσ +H.c.〉]/(L− 1) in chains with open
boundary conditions, following the same procedure as
Manmana et al. for the ionic Hubbard model [22] in
chains up to 400 sites. In spite of the large systems used,
finite-size effects are still important and do not allow an
accurate extrapolation. In any case, the qualitative be-
havior of our results (not shown) is similar to that found
by Manmana et al. showing a clear maximum inside the
BOW phase, an abrupt fall for U ∼ Uc as the system en-
ters the SS phase and a slower decay for larger U ∼ Us,
which for finite systems extends inside the SDW phase.
To conclude, we have presented compelling evidence,
based on bosonization as well as on other analytical and
numerical techniques, of the existence of a narrow bond-
ordered wave phase and a transition to an unconventional
incommensurate metallic one with dominant singlet su-
perconducting correlations in the phase diagram of the
U − X model. The appearance of superconductivity in
a model with repulsive on-site interactions at half fill-
ing, and of incommensurate correlations induced by in-
teraction are both unusual features. Their emergence can
be understood from the structure of the exactly solvable
case X = t. There the number Nd of doubly occupied
sites (doublons) becomes a conserved quantity; holes and
doublons play an identical role regarding the kinetic en-
ergy ǫ(kF ), which can be mapped into that of a spinless
fermion system, with Fermi momentum kF . The compe-
tition of ǫ(kF ) and UNd fixes the Fermi level of the re-
sulting effective model. The presence of doublons in the
ground state (U < 4t) simultaneously drives the spinless
fermions away from half-filling (kF 6= π), and switches on
the doublons role in the kinetic energy. The latter ceases
to be identical to that of holes as soon as X 6= t, gener-
ating incommensurability within the system. Moreover
superconducting correlations can dominate away from
half-filling [8]. Thus, a nonvanishing number of doublons
provides the scenario for both incommensurability and
superconductivity for X . t.
We thank D. Cabra for useful discussions. We acknowl-
edge support from PICT’s No. 03-11609, No. 03-12742,
and No. 05-33775 of ANPCyT and PIP’s No. 5254
and No. 5306 of CONICET, Argentina, No. FIS2006-
08533-C03-02, and the “Ramon y Cajal” program from
MCEyC of Spain, Angelo Della Riccia Foundation, and
PRIN 2005021773 Italy.
[1] Burbonais, Science 281, 1155 (1998); H. Kishida et al.,
Nature (London) 405, 929 (2000).
[2] J. T. Gammel and D. K. Campbell, Phys. Rev. B 60, 71
(1988); Y. Z. Zhang, ibid. 92, 246404 (2004); R. Strack
and D. Volhardt, Phys. Rev. Lett. 70, 2637 (1993).
[3] J. E. Hirsch, Physica (Amsterdam) 158C, 326 (1989);
J. E. Hirsch and F. Marsiglio, Phys. Rev. B 39, 11515
(1989).
[4] L. Arrachea and A. A. Aligia, Phys. Rev. B 59, 1333
(1999); ibid. 61, 9686 (2000).
[5] A. Hübsch et al., Phys. Rev. Lett. 96, 196401 (2006).
[6] A. Anfossi et al. Phys. Rev. Lett. 95, 056402 (2005).
[7] A. Anfossi, P. Giorda, and A. Montorsi, Phys. Rev. B
75, 165106 (2007).
[8] G. I. Japaridze and E. Müller-Hartman, Ann. Phys.
(Leipzig) 506, 163 (1994); A. A. Aligia and L. Arrachea,
Phys. Rev. B 60, 15332 (1999), and references therein.
[9] L. Arrachea et al., Phys. Rev. B 50, 16044 (1994).
[10] L. Arrachea and A. A. Aligia, Phys. Rev. Lett. 73, 2240
(1994); J. de Boer, V. E. Korepin, and A. Schadschneider,
ibid. 74, 789 (1995).
[11] L. Arrachea, A. A. Aligia and E. Gagliano, Phys. Rev.
Lett. 76, 4396 (1996).
[12] A. Anfossi et al., Phys. Rev. B 73, 085113 (2006).
[13] G. I. Japaridze, R. M. Noack, and D. Baeriswyl, Phys.
Rev. B 76, 115118 (2007).
[14] I. N. Karnaukhov, Phys. Rev. B 66, 092304 (2002).
[15] M. Tsuchiizu and A. Furusaki, Phys. Rev. B 69, 035103
(2004).
[16] T. Giamarchi, Quantum Physics in One Dimension (Ox-
ford University Press, Oxford, U.K., 2004).
[17] A. A. Nersesyan, A. O. Gogolin, and F.H.L. Eβler, Phys.
Rev. Lett. 81, 910 (1998).
[18] S. R. White, Phys. Rev. Lett. 69, 2863 (1992); K. Hall-
berg, Adv. Phys. 55, 477 (2006).
[19] M. E. Torio et al., Phys. Rev. B 73, 115109 (2006), and
references therein.
[20] L. Campos Venuti et al., Phys. Rev. A 73, 010303(R)
(2006).
[21] J. E. Hirsch, Phys. Rev. B 22, 5259 (1980).
[22] S. R. Manmana et al., Phys. Rev. B 70, 155115 (2004).
ABSTRACT
  We determine the quantum phase diagram of the one-dimensional Hubbard model
with bond-charge interaction X in addition to the usual Coulomb repulsion U at
half-filling. For large enough X and positive U the model shows three phases.
For large U the system is in the spin-density wave phase already known in the
usual Hubbard model. As U decreases, there is first a spin transition to a
spontaneously dimerized bond-ordered wave phase and then a charge transition to
a novel phase in which the dominant correlations at large distances correspond
to an incommensurate singlet superconductor.

<|endoftext|><|startoftext|>
Introduction
	Flavor Ratios of Astrophysical Neutrinos
	Production of Astrophysical Neutrinos
	Energy Spectra in weak decays 
	Power law Spectra
	Energy Loss Mechanisms
	Decay Energy Distribution
	Hadronic interactions
	Gas Target
	Photoproduction
	Nuclear photodisintegration
	Neutrino Production in Gamma Ray Bursts
	Experimental Flavor identification
	Conclusions
	References
ABSTRACT
  The measurement of the flavor composition of the neutrino fluxes from
astrophysical sources has been proposed as a method to study not only the
nature of their emission mechanisms, but also the neutrino fundamental
properties. It is however problematic to reconcile these two goals, since a
sufficiently accurate understanding of the neutrino fluxes at the source is
needed to extract information about the physics of neutrino propagation. In
this work we discuss critically the expectations for the flavor composition and
energy spectrum from different types of astrophysical sources, and comment on
the theoretical uncertainties connected to our limited knowledge of their
structure.

<|endoftext|><|startoftext|>
Introduction. The gravitational waves from the present system are calculated by the quadrupole formula [33],
which is given by
hQxx − hQyy
cos 2ϕ+ 2hQxy sin 2ϕ
] (cos2 θ + 1)
hQxx + h
yy − 2hQzz
) sin2 θ
hQxz cosϕ+ h
yz sinϕ
sin θ cos θ , (3.1)
2hQxy cos 2ϕ− (hQxx − hQyy) sin 2ϕ
] cos θ
hQxz sinϕ− hQyz cosϕ
sin θ, (3.2)
where
d2Qij
with Qij ≡ µ
ZiZj −
δij Z
(the reduced quadrupole moment of a point mass). (3.3)
(r, θ, ϕ) [or (x, y, z)] is the position of a distant observer in spherical coordinates [or Cartesian coordinates], andZ(t)
is a trajectory of a particle. We assume that the observer is on the equatorial plane, i.e. (θ, ϕ) = (π/2, 0). Figure 6
shows the waveforms from Orbits (a), (b), and (c). The left panels show the “+” polarization modes of those waves,
while the right ones are the “×” polarization. The top, middle, and bottom panels correspond to the waves from
Orbits (a), (b), and (c), respectively. The waves from Orbits (a) and (c) show a periodic feature, which is expected
from the Poincaré maps in Fig. 1. On the other hand, the waves from Orbit (b) show a completely different behaviour.
We find much random spiky noise in the waveform before t/M = 1.6×105 and after t/M = 3.8×105. This is a typical
feature of the gravitational waves from highly chaotic motion [11, 17]. We also find that the amplitude decreases
for the time interval of t/M = (1.6 ∼ 3.8) × 105. As shown in Fig. 3, in this time interval, the particle moves near
the small tori in the phase space. This adjective feature of this particle motion appears clearly in the gravitational
amplitudes. That is, in the phase of a nearly regular motion, the particle position and its velocity do not change
much compared with those in the more strongly chaotic phase (b-1) (see Fig. 1(b) and Fig. 3(b)). The time variation
of the quadrupole moment of the system is small and hence the wave amplitude decreases as well.
We also calculate the energy spectra of the gravitational waves, which will be one of the most important observable
quantities in the near future. In Fig. 7, we show the energy spectra for each orbit. Figures 7(a) and (c) show many
sharp peaks at certain characteristic frequencies. If a motion is regular, we expect several typical frequencies with those
harmonics. So such a result reflects that the particle moves regularly. Figure 7 (b) gives the spectrum of Orbit (b). It
is clearly different from the previous two almost regular cases. It looks just like white noise, below a typical frequency
fM ∼ 10−2, i.e., the shape of the spectrum is flat and it contains many noisy components. However, the spectrum
of Orbit (b-2) (Fig. 7(b-2)), which is analyzed by the orbit only in the time interval of t/M = (1.6 ∼ 3.8)× 105, does
not do so. Rather it looks similar to the spectrum of a regular orbit. Contrary to Fig. 7(b), it does not contain much
noise at the low frequency region (fM ≤ 10−2).
To see more detail, dividing the time interval of Orbit (b) into two, we show the magnifications of the spectra of
Orbits (a), (b-1), (b-2), and (c) in Fig. 8. Compared to the spectra (a) and (c), the spectra (b-1) and (b-2) contain
many noisy spikes. Such noisy spikes are usually found in the gravitational waves from a chaotic orbit [17]. However,
the spectra (b-1) and (b-2) are completely different. The spectrum (b-1) is just white noise. No structure is found.
On the other hand, the spectrum (b-2) looks similar to those for regular orbits. The “sharp” peaks appear at some
frequencies, but the widths of those peaks are broadened by many noisy spikes. Therefore, we conclude that Orbit
(b-2) looks nearly regular but still holds its chaotic character, and such a feature imprints in the spectrum of the
waves. The important point is that two phases in the particle orbit (b), i.e., the nearly regular phase and the more
strongly chaotic one, are also distinguishable in the gravitational wave forms and the energy spectra. With this
analysis, we could constrain orbital parameters.
IV. SUMMARY AND DISCUSSION
In this paper we have investigated chaos characteristic for a test particle motion in a system of a point mass with
a massive disk in Newtonian gravity. To distinguish such characteristics, we propose the gravitational waves emitted
from this system. At first, we analyzed the motion of the particle by use of the Poincaré map and the “local” Lyapunov
exponent. We found that the phase in which particle motion becomes nearly regular always appears even though the
global motion is chaotic. We emphasize that both phases of nearly regular and more strongly chaotic motions are
found in the same orbit.
The gravitational wave forms and their energy spectra have been evaluated by use of the quadrupole formula in
each case. In two almost regular cases, the waves show the periodic behaviour and certain sharp peaks appear in those
energy spectra. In the chaotic case, we have found that the waves show two phases, the nearly regular phase and
a more strongly chaotic one. In the nearly regular phase, wave amplitude gets smaller in the more strongly chaotic
phase. The energy spectra are also clearly different. The spectrum in the more strongly chaotic phase looks like white
noise, but in the nearly regular one, it becomes similar to those in the regular ones. However it is accompanied by
many small noisy spikes, which is a characteristic feature of a chaotic system. These spikes make the widths of the
spectrum peaks broader than those in the regular cases. Comparing information from the waves with the particle
motion, we conclude that we can extract chaotic characteristics of a particle motion of the gravitational waves of the
system. In the present analysis, in the spectrum (b-2) of the gravitational waves, we do not find a power-law structure,
which appears in the spectrum of the particle motion. This may be because the waveform is given by the change of
the quadrupole moment, which contains higher time derivatives of a particle trajectory such as acceleration. It may
be much more interesting if one can find the 1/f behaviour in some information of the gravitational waves because
 2000  4000  6000  8000  10000
orbit(a)
 2000  4000  6000  8000  10000
orbit(a)
 100000  200000  300000  400000  500000
orbit(b)
 100000  200000  300000  400000  500000
orbit(b)
 2000  4000  6000  8000  10000
orbit(c)
 2000  4000  6000  8000  10000
orbit(c)
FIG. 6: The gravitational waveforms evaluated by the quadrupole formula. Top, middle, and bottom figures correspond to
those for Orbits (a), (b), and (c), respectively. The left and right rows give the “+” and “×” polarization modes, respectively.
such an indication may specify the type of chaos more clearly. This is under investigation.
Finally, we mention a possibility to constrain parameters in a dynamical system. If the gravitational waves are
observed for a sufficiently long time, we can monitor the time variation of the wave amplitudes, their forms and
polarizations. We can then calculate the energy spectra for some durations. If the spectra show one of the typical
characteristics found in this paper, the parameters of a particle motion could be constrained. Of course, a realistic
system can be more complicated, and the present model may be too simple. But we believe the characteristic behaviour
of the gravitational waves found in this paper will help us to understand a chaotic system. Therefore our next task
is to analyze the gravitational waves from various chaotic systems, especially relativistic chaotic systems [4, 5, 6, 7,
9, 10, 11, 12, 13, 17, 21, 32]. Then, we should investigate whether or not the correlation between the gravitational
-6 -5 -4 -3 -2 -1  0
Log10[fM]
orbit(a)
-6 -5 -4 -3 -2 -1  0
Log10[fM]
orbit(b)
-5.5 -5 -4.5 -4 -3.5 -3 -2.5 -2 -1.5 -1 -0.5  0
Log10[fM]
orbit(b-2)
-2 -1.9 -1.8 -1.7 -1.6 -1.5
Log10[fM]
orbit(c)
FIG. 7: The energy spectra of the gravitational waves shown in Fig. 6. Orbit (b-2) gives the spectrum of the waves for the
“stagnant motion”, i.e., when the particle motion of Orbit (b) becomes near regular for t/M = (1.6 ∼ 3.8) × 105. Figures (a)
and (c) show many sharp peaks at certain characteristic frequencies. This is because of the regular motion. The spectrum in
Fig. (b), which looks like white noise for fM ≤ 10−2, is clearly different from those in Figs. (a) and (c), but the spectrum
in Fig. (b-2) does not look like white noise. It looks similar to the cases (a) and (c). However, the peaks are not sharp
but rather broadened by appearing so many other spikes. Note that the typical frequency of the orbits is in the range of
fM = 10−2 ∼ 10−1 (see Fig. 5).
waves and chaos in dynamical systems found in this work is generic.
Acknowledgments
We express thanks to T. Konishi for useful discussions. This work was supported in part by Japan Society for
Promotion of Science (JSPS) Research Fellowships (K.K. and H.K.), by a Grant-in-Aid from the Scientific Research
Fund of the JSPS (No. 17540268), and by the Japan-U.K. Research Cooperative Program. K.M. would like to thank
DAMTP, the Centre for Theoretical Cosmology, and Clare Hall, where this work was completed.
[1] Deterministic Chaos in General Relativity, edited by D. Hobill, A. Burd, and A. Coley (Plenum, New York, 1994), and
references therein.
[2] J.D. Barrow, Phys. Rep. 85, 1 (1982).
[3] G. Contopoulos, Proc. R. Soc. London A431,183(1990).
[4] V. Karas and D. Vokrouhlický, Gen. Rela. Grav. 24,729(1992).
[5] H. Varvoglis and D. Papadopoulos, Astron. Astrophys. 261,664(1992).
[6] L. Bombelli and E. Calzetta, Class. Quantum Grav. 9,2573(1992).
[7] R. Moeckel, Commun. Math. Phys. 150,415(1992).
-2.4 -2.2 -2 -1.8 -1.6
Log10[fM]
orbit(a) 
-2 -1.8 -1.6 -1.4 -1.2 -1
Log10[fM]
orbit(b-1)
-2 -1.8 -1.6 -1.4 -1.2 -1
Log10[fM]
orbit(b-2)
-2 -1.9 -1.8 -1.7 -1.6 -1.5
Log10[fM]
orbit(c)
FIG. 8: Magnification of energy spectra of Orbits (a), (b-1), (b-2), and (c).
[8] C.P. Dettmann, N.E. Frankel and N.J. Cornish, Phys. Rev. D 50, 618(1994).
[9] U. Yurtsever, Phys. Rev. D 52,3176(1995).
[10] Y. Sota, S. Suzuki, and K. Maeda, Class. Quantum Grav. 13,1241(1996).
[11] S. Suzuki and K. Maeda, Phys. Rev. D 55,4848(1997); 58,023005(1998); 61,024005(1999).
[12] P.S. Letelier and W.M. Viera, Phys. Rev. D 56,8095(1997).
[13] J. Podolsky and K. Vesely, Phys. Rev. D 58, 081501(1998).
[14] J. Levin, Phys. Rev. Lett. 84,3515(2000).
[15] J.D. Schnittman and F.A. Rasio, Phys. Rev. Lett. 87, 121101 (2001).
[16] N.J. Cornish and J. Levin, Phys. Rev. Lett. 89,179001(2002).
[17] K. Kiuchi and K. Maeda, Phys. Rev. D 70,064036(2004)
[18] C.F.F. Karney, Physica D 8,360(1983)
[19] G. Contopoulos, M. Harsoula and N. Voglis, Celest. Mech. Dyn. Astron. 78,197(2000)
[20] G. Contopoulos, Order and Chaos in Dynamical Astronomy, (Springer, 2002)
[21] H. Koyama, K. Kiuchi and T. Konishi, arXiv:gr-qc/0702072.
[22] K. Tsubono, Prepared for Edoardo Amaldi Meeting on Gravitational Wave Experiments, Rome, Italy, 14-17 Jun 1994
[23] A. Abramovici et al., “LIGO: The Laser interferometer gravitational wave observatory,” Science 256, 325 (1992).
[24] J. Hough et al., Prepared for TAMA Workshop on Gravitational Wave Detection, Saitama, Japan, 12-14 Nov 1996
[25] K.S. Thorne, arXiv:gr-qc/9506086.
[26] A. Saa and R. Venegeroles, Phys. Lett. A 259,201(1999)
[27] A. Saa, Phys. Lett. A 269,204(2000)
[28] The recent observation suggests there exist huge black holes at the centers of many galaxies. See, for example,
J. Kormendy and D. Richstone, Astrophys. J. 393,559(1992)
[29] I. Shimada and T. Nagashima, PTP 61, 1605 (1979)
[30] The reason why we find non-zero positive Lyapunov exponent for an integrable system is that we solve the equation of
motion by a finite difference method and the finite difference approximation does not provide an exact integrable system.
In fact, if we reduce the time step for integration, the value decreases.
[31] W.C.Saslaw, Gravitational physics of stellar and galactic systems, Cambridge University Press, 1985.
[32] J.P.S. Lemos and P.S. Letelier, Phys. Rev. D 49,5135(1994).
[33] L. D. Landau and E. M. Lifshitz, The Classical Theory of Fields, (Pergamon, Oxford, 1951).
[34] P. Grassberger, R. Badii, and A. Politi, J. Stat. Phys. 51, 135 (1988); H. E. Kandrup, B. L. Eckstein, and B. O. Bradley,
Astron. Asrophys. 320, 65 (1997)
http://arxiv.org/abs/gr-qc/0702072
http://arxiv.org/abs/gr-qc/9506086
APPENDIX A: LOCAL LYAPUNOV EXPONENT
In this appendix, we give the definition of “local” Lyapunov exponent. Our definition of “local” Lyapunov exponent
is somewhat different from the conventional one[34], but those are essentially the same.
At first, let us consider the system whose time evolution is described by a set of differential equations in N -
dimensional space,
ẋ = F(x) , (A1)
where x(t) is a N -dimensional vector.
The time evolution of the orbital deviation δx, which is the difference between two nearby orbits, obeys the following
set of linear differential equations:
δẋ =
(x(t))δx. (A2)
The solution of Eq. (A2) can be written formally as
δx(t) = U tt0δx0, (A3)
where δx0 is an “initial” deviation at some time t0 and U
is an evolution matrix, which is given by the following
integration;
U tt0 = exp
(x(t′))dt′
. (A4)
We define the “local” Lyapunov exponent in time interval [t0, t] by
λ(ek, t) =
t− t0
||U tt0e1 ∧ U
e2 ∧ · · · ∧ U tt0ek||
||e1 ∧ e2 · · · ∧ ek||
for k = 1, 2, · · · , N , where ek is a k-dimensional subspace in the tangent space at the initial point x0, which is spanned
by k independent vectors ei (i = 1, 2, · · · , k), ∧ is an exterior product, and || ◦ || is a norm with respect to some
appropriate Riemannian metric. If we take a limit of t → ∞, λ(ek,∞) correspond to the conventional Lyapunov
exponents.
If the integration time interval t∆ ≡ t − t0 is much longer than the dynamical time of the system, we may find
convergent values for each λ(ek, t), which are almost independent of t∆ (or t0). We may call them “local” Lyapunov
exponents at t. The maximum value of “local” Lyapunov exponents, i.e. λ(t) = max{λ(ek, t)|k = 1, 2, · · · , N} is the
most important one for our discussion. So we also call it the “local” Lyapunov exponent at t.
-6 -5 -4 -3 -2 -1  0
Log10[fM]
orbit(b)
	Introduction
	Basic equations
	Numerical Analysis
	Two phases of chaos in particle motion
	Indication of chaos in gravitational waves
	Summary and Discussion
	Acknowledgments
	References
	Local Lyapunov Exponent
ABSTRACT
  We study gravitational waves from a particle moving around a system of a
point mass with a disk in Newtonian gravitational theory. A particle motion in
this system can be chaotic when the gravitational contribution from a surface
density of a disk is comparable with that from a point mass. In such an orbit,
we sometimes find that there appears a phase of the orbit in which particle
motion becomes to be nearly regular (the so-called ``stagnant motion'') for a
finite time interval between more strongly chaotic phases. To study how these
different chaotic behaviours affect on observation of gravitational waves, we
investigate a correlation of the particle motion and the waves. We find that
such a difference in chaotic motions reflects on the wave forms and energy
spectra. The character of the waves in the stagnant motion is quite different
from that either in a regular motion or in a more strongly chaotic motion. This
suggests that we may make a distinction between different chaotic behaviours of
the orbit via the gravitational waves.

<|endoftext|><|startoftext|>
Introduction
This paper is a sequel to [Z]. We present here a Lohner-type algorithm for
computation of rigorous enclosures of partial derivatives with respect to initial
conditions up to an arbitrary order r of the flow induced by an autonomous
ODE, hence the name Cr-Lohner algorithm. Let r be a positive integer, then
by Cr-algorithm we will mean the routine which gives rigorous estimates for
partial derivatives with respect to initial conditions up to an order r and Cr-
computations we mean an application of an Cr-algorithm.
Our main motivation for the development of Cr-algorithm was a desire to
provide a tool, which will considerably extend the possibilities of computer
assisted proofs in the dynamics of ODEs. Till now most of such proofs have used
topological conditions (see for example [HZHT, MM, GZ, Z1]) and additionally
conditions on the first derivatives with respect to initial conditions (see for
example [RNS, T, Wi1, WZ, KZ]), hence it required C0- and C1-computations,
respectively. The spectrum of problems treated includes the questions of the
existence of periodic orbits and their local uniqueness, the existence of symbolic
dynamics, the existence of hyperbolic invariants sets, the existence of homo-
and heteroclinic orbits. To treat other phenomena, like bifurcations of periodic
orbits, the route to chaos, invariant tori through KAM theory one needs the
knowledge of partial derivatives with respect to initial conditions of higher order.
In principle, one can think that a good rigorous ODE solver should be en-
ough. Namely, to compute the partial derivatives of the flow induced by
x′ = f(x), x ∈ Rn (1)
1 Research supported by an annual national scholarship for young scientists from the
Foundation for Polish Science
2 Research supported in part by Polish State Ministry of Science and Information Techno-
logy grant N201 024 31/2163
http://arxiv.org/abs/0704.0720v1
it is enough to rigorously integrate a system of variational equations obtained by
a formal differentiation of (1) with respect to the initial conditions. For example
for r = 2 we have the following system
x′ = f(x), (2)
Vij(t) =
(x)Vsj(t) (3)
Hijk(t) =
s,r=1
∂xs∂xr
(x)Vrk(t)Vsj(t) +
(x)Hsjk(x), (4)
with the initial conditions
x(0) = x0, V (0) = Id, Hijk(0) = 0, i, j, k = 1, . . . , n.
It is well known that if by ϕ(t, x0) we denote the (local) flow induced by (1),
(t, x0) = Vi,j(t),
∂xj∂xk
(t, x0) = Hijk(t).
Analogous statements are true for higher order partial derivatives with respect
to initial conditions.
It turns out that a straightforward application of a rigorous ODE solver to
the system of variational Equations (2–4) is very inefficient. Namely, it totally
ignores the structure of the system and leads to a very poor performance and
unnecessary long computation times (see Section 4.1).
Our algorithm is a modification of the Lohner algorithm [Lo], which takes
into account the structure of variational Equations (2–4). Basically it consists of
the Taylor method, a heuristic routine for a priori bounds for solution of (2–4)
during a time step and a Lohner-type control of the wrapping effect, which is
done separately for x and partial derivatives with respect initial conditions (the
variables V and H in (3,4)). The Taylor method is realized using the automa-
tic differentiation [Ra] and the algorithms for computation of compositions of
multivariate Taylor series.
The proposed algorithm has been successfully applied in [HNW] to the Mi-
chelson system [Mi], where a computer assisted proof of the existence of a cocoon
bifurcation was presented. Some parts of this proof required C2-computations.
In the present paper in Section 8 we show an application of our algorithm to
pendulum equation with periodic forcing and the Michelson system. We used
it to compute rigorous bounds for the coefficients of some normal forms up to
order five, which enabled us to prove the existence of invariant tori around some
elliptic periodic orbits in these systems using KAM theorem for twist maps on
the plane. These proofs required C3 and C5 computations.
2 Basic definitions
To effectively deal with the formulas involving partial derivatives we will use ex-
tensively a notation of multiindices, multipointers and submultipointers throu-
ghout the paper.
As an motivation let us consider the formula for the partial derivatives of
the composition of maps. Assume g : Rn → Rn and f : Rn → R are of class C3.
We have
∂3(f ◦ g)
∂xi∂xj∂xc
k,r,s=1
∂xk∂xr∂xs
∂xi∂xj∂xc
k,r=1
∂xk∂xr
∂xi∂xc
∂xj∂xc
∂xi∂xj
To the operator ∂
∂xi1∂xi2∂xi3
we can in a unique way assign a multipointer,
which is a nondecreasing sequence of integers (j1, j2, j3), such that {i1, i2, i3} =
{j1, j2, j3}. A submultipointer is a multipointer, which is a part of a longer mul-
tipointer, for example (i, j, c)(1,3) = (i, c). One observes, that submultipointers
appear at several places in the above formula.
A multiindex is an element of α ∈ Nn. It is another way to represent various
partial derivatives. The coefficient αi tells us how many times to differentiate
a function with respect to the i-th variable. Obviously, we have one-to-one
correspondence between multipointers and multiindices.
2.1 Multiindices
By N we will denote the set of nonnegative integers, i.e. N = {0, 1, 2, . . .}.
Definition 1 An element τ ∈ Nn will be called a multiindex.
For a sequence α = (α1, . . . , αn) ∈ Nn and a vector x = (x1, . . . , xn) ∈ Rn we
1. |α| = α1 + · · ·+ αn
2. α! = α1! · α2! · · ·αn!
3. xα = (xα11 , . . . , x
By eni ∈ Nn we will denote
eni = (0, 0, . . . , 0,
1 , 0, . . . , 0, 0).
We will drop the index n (the dimension) in the symbol eni when it is obvious
from the context.
Put Nnp := {a ∈ Nn : |a| = p}.
For δ = (δ1, . . . , δk) ∈ Nn1 × · · · × Nnk we set
1. |δ| =
i=1 |δi|
2. δ! =
i=1 δi!
Let f = (f1, . . . , fm) : R
n → Rm be sufficiently smooth. For α ∈ Nn we set
1. Dαfi =
∂|α|fi
∂xα11 · · · ∂x
2. Dαf = (Dαf1, D
αf2, . . . , D
For a function f : R× Rn → Rn by Dαfi(t, x) we will denote Dαfi(t, ·)(x) and
similarly
Dαf(t, x) = (Dαf1(t, x), . . . , D
αfn(t, x)).
This convention means that Dα always acts on x-variables.
2.2 Multipointers
For a fixed n > 0 and p > 0 we define
Nnp := {(a1, a2, . . . , ap) ∈ Np : 1 ≤ a1 ≤ · · · ≤ ap ≤ n}
N = Nn :=
Definition 2 An element of Nn will be called a multipointer.
Remark 3 A function
Λ : Nnp ∋ (a1, . . . , ap) →
enai ∈ N
p (5)
is a bijection.
Let f = (f1, . . . , fm) : R
n → Rm be a sufficiently smooth. For a ∈ Nnp we set
1. Dafi :=
∂xa1 . . . ∂xap
2. Daf := (Daf1, . . . , Dafm)
For a function f : R × Rn → Rn by Dafi(t, x) we will denote Dafi(t, ·)(x). In
the light of the above notations Dαf = D
Λ(α)f .
For a = (a1, a2, . . . , an) ∈ Nnp and b = (b1, b2, . . . , bn) ∈ Nnq we define
a+ b = (a1 + b1, . . . , an + bn) ∈ Nnp+q.
For α ∈ Nnp and β ∈ Nnq we define
α+ β = Λ−1 (Λ(α) + Λ(β)) ∈ Nnp+q.
By ≤ we will denote a linear order (lexicographical order) in N defined in
the following way. For a ∈ Nnp and b ∈ Nnq
(a ≤ b) ⇐⇒
either ∃i, i ≤ p, i ≤ q, ai < bi and aj = bj for j < i
or p ≤ q and ai = bi for i = 1, . . . , p.
Definition 4 For k ≤ p we set
N p(k) := {(δ1, . . . , δk) ∈ (N p)k : δ1 ≤ · · · ≤ δk, δ1+ · · ·+δk = (1, 2, . . . , p)} (7)
We will use N p(k) extensively in the next section. Its will be used to label
terms in Dαfi(ϕ(t, x)). Observe that for p > 0
N p(1) = {(1, 2, . . . , p)}
N p(p) = {((1), (2), . . . , (p))}
One can construct all elements of N p(k) using the following recursive procedure.
From the definition of N p(k) it follows that if (δ1, . . . , δm−1) ∈ N p−1(m − 1)
then (δ1, . . . , δm−1, (p)) ∈ N p(m) (notice that order is preserved). Similarly, if
(δ1, . . . , δm) ∈ N p−1(m) then
(δ1, . . . , δs−1, δs + (p), δs+1, . . . , δm) ∈ N p(m)
and again order of elements is preserved. Hence, for p > 2 and 1 < k < p we
have N p(k) = A ∪B where
(δ1, . . . , δk−1, (p)) : (δ1, . . . , δk−1) ∈ N p−1(k − 1)
(δ1, . . . , δs−1, δs + (p), δs+1, . . . , δk) : (δ1, . . . , δk) ∈ N p−1(k)
} (8)
and the sets A and B are disjoint.
Another way to generate all elements of N p(k) can be described as follows
• decompose the set {1, 2, . . . , p} into k nonempty and disjoints sets ∆i,
i = 1, . . . , k
• we sort each ∆i and permute ∆i’s to obtain min(∆1) < min(∆2) < · · · <
min(∆k)
• we define δi to be an ordered set consisting of all elements of ∆i for
i = 1, . . . , k
Definition 5 For an arbitrary a ∈ Nnp and δ ∈ N
k such that k ≤ p we define a
submultipointer aδ ∈ Nnk by (aδ)i = aδi for i = 1, . . . , k, which can be expressed
using Λ as follows
aδ := Λ
enaδi
∈ Nnk
3 Equations for variations
Consider an ODE x′ = f(x) where f is CK+1. Let ϕ : R× Rn−→◦ Rn be a local
dynamical system induced by x′ = f(x). It is well known, that ϕ ∈ CK and one
can derive the equations for partial derivatives of ϕ by differentiating equation
(t, x) = f(ϕ(t, x)) with respect to the initial condition x. As a result we
obtain a system of so-called equations for variations, whose size depends on the
order r of partial derivatives we intend to compute. An example of such system
for r = 2 is given by (2–4) with initial conditions given by (5).
The goal of this section is to write the equations for variations in a compact
form using multipointers and multiindices, which allows us to take into account
the symmetries of partial derivatives,
Lemma 6 Assume f ∈ Cr+1 and let ϕ : R × Rn−→◦ Rn be a local dynamical
system induced by x′ = f(x). Then for a ∈ Nnp such that p ≤ r holds
Daϕi =
i1,...,ik=1
Dei1+···+eik fi
(δ1,...,δk)∈Np(k)
Daδjϕij (9)
for i = 1, . . . , n.
Proof: In the proof the functions Dei1+···+eik fi are always evaluated at
ϕ(t, x), and various partial derivatives of ϕ are always evaluated at (t, x), there-
fore the arguments will be always dropped to simplify formulae. We prove the
lemma by induction on p = |a|. If p = 1 then a = (c) for some c ∈ {1, . . . , n}
and (9) becomes
D(c)ϕi =
Desfi ·D(c)ϕs.
Assume (9) holds true for p − 1, p > 1. Let us fix a ∈ Nnp . We have
a = b + (c), where b = (a1, . . . , ap−1) ∈ Nnp−1 and c = ap. Since (9) is satisfied
for p− 1, therefore we have
Daϕi = D(c)
= D(c)
i1,...,ik=1
β:=ei1+···+eik
(δ1,...,δk)∈Np−1(k)
Dbδjϕij
i1,...,ik+1=1
β:=ei1+···+eik+1
Dβfi ·D(c)ϕik+1
(δ1,...,δk)∈Np−1(k)
Dbδjϕij
i1,...,ik=1
β:=ei1+···+eik
(δ1,...,δk)∈Np−1(k)
Dbδs+(c)ϕis
j 6=s
Dbδjϕij
For k = 1, . . . , p we set
Tk :=
i1,...,ik=1
Dei1+···+eik fi
(δ1,...,δk)∈Np(k)
Daδjϕij (10)
Now our goal is to prove that:
Daϕi =
Tk (11)
Our strategy of proof is as follows. We will define S1, . . . , Sp, such that
Daϕi =
Sk, Si = Ti, i = 1, . . . , p. (12)
We set
i1,...,ik=1
β:=ei1+···+eik
(δ1,...,δk)∈Np−1(k)
Dbδs+(c)ϕis
j 6=s
Dbδjϕij
k=p−1
i1,...,ik+1=1
β:=ei1+···+eik+1
Dβfi ·D(c)ϕik+1
(δ1,...,δk)∈Np−1(k)
Dbδjϕij .
For m = 2, 3, . . . , p− 1 we set
k=m−1
i1,...,ik+1=1
β:=ei1+···+eik+1
Dβfi ·D(c)ϕik+1
(δ1,...,δk)∈Np−1(k)
Dbδjϕij
i1,...,ik=1
β:=ei1+···+eik
(δ1,...,δk)∈Np−1(k)
Dbδs+(c)ϕis
j 6=s
Dbδjϕij
It remains to show that Si = Ti for i = 1, . . . , p. Consider first i = 1. Recall
that N p−1(1) = {(1, 2, . . . , p− 1)}, hence
Desfi ·Db+(c)ϕs =
Desfi ·Daϕs.
Therefore
S1 = T1. (13)
Consider now i = p. For an arbitrary s > 0 N s(s) contains only one element
((1), (2), . . . , (s)). Therefore we obtain
i1,...,ip=1
Dei1+···+eip fi ·D(c)ϕip
(δ1,...,δp−1)∈Np−1(p−1)
Dbδjϕij
i1,...,ip=1
Dei1+···+eip fi ·D(c)ϕip
Dbjϕij .
Since a = b+ (c), where c = (ap), hence
i1,...,ip=1
Dei1+···+eip fi
Dajϕij
i1,...,ip=1
Dei1+···+eip fi
(δ1,...,δp)∈Np(p)
Daδjϕij = Tp
Consider now m = 2, 3, . . . , p− 1. We have
i1,...,im=1
Dei1+···+eim fi ·D(c)ϕim
(δ1,...,δm−1)∈Np−1(m−1)
Dbδjϕij
i1,...,im=1
Dei1+···+eim fi
(δ1,...,δm)∈Np−1(m)
Dbδs+(c)ϕis
j 6=s
Dbδjϕij
Using decomposition N p(m) = A ∪B as in (8) we obtain
i1,...,im=1
Dei1+···+eim fi
(δ1,...,δm−1,δm=(p))∈A
Daδjϕij
i1,...,im=1
Dei1+···+eim fi
(δ1,...,δm)∈B
Daδjϕij
i1,...,im=1
Dei1+···+eim fi
(δ1,...,δm)∈Np(m)
Daδjϕij = Tm
We have shown that Ti = Si for i = 1, . . . , p. This finishes the proof.
4 Cr-Lohner algorithm
4.1 Why one needs an Cr-algorithm?
There are several effective algorithms for the computation of rigorous bounds
for solutions of ordinary differential equations, including Lohner method [Lo],
Hermite–Obreschkoff algorithm [NJ] or Taylor models [BM]. For Cr-computa-
tions the number of equations to solve is equal to n
hence, even for
r = 1 direct application of such an algorithms to equations for variations (14)
leads to integration in high dimensional space and is usually inefficient. Let us
recall after [Z, Sec. 6] the basic reason for this. In order to have a good control
over the expansion rate of the set of initial conditions during a time step these
algorithms, while being C0, are C1 ’internally’(or higher for Taylor models),
because they solve non-rigorously equations for (∂ϕ
) - the variational matrix of
the flow. This effectively squares the dimension of phase space of the equation
and impacts heavily the computation time. But as it was observed in [Z] the
equations for partial derivatives of the flow can be seen as non-autonomous and
nonhomogenous linear equations, therefore we do not need additional equations
for variations for them. As a result the dimension of the effective phase space
for our Cr-algorithm is given by n
and not a square of this number.
Another important aspect of the proposed algorithm is the fact that the
Lohner-type control of the wrapping effect is done separately for x-variables
and variables Daϕ. This feature is not present in the blind application of C0
algorithm to the system of variational equations and it turns out that this often
practically switches off the control of the wrapping effect on x-variables, as
various choices used in this control become dominated by the Daϕ-variables.
In [Z] a C1-algorithm has been proposed. Here we present an algorithm for
computation of higher order partial derivatives.
4.2 An outline of the algorithm
Let us fix r ≤ K and consider the following system of differential equations
ϕ = f ◦ ϕ
Daϕ =
i1,...,ik=1
Dei1+···+eik f
(δ1,...,δk)∈Nd(k)
Daδjϕij
for all a ∈ Nnd , d = 1, . . . , r.
Our goal is to present an algorithm for computing a rigorous bound for the
solution of (14) with a set of initial conditions
ϕ(0, x0) ∈ [x0] ⊂ Rn
Dϕ(0, x0) = Id
Daϕ(0, x0) = 0, for a ∈ Nn2 ∪ . . . ∪ Nnr .
In the sequel we will use the following notations:
• if a solution of system (14) is defined for t > 0 and some x0 ∈ Rn, then
for a ∈ N by Va(t, x0) we denote Daϕ(t, x0)
• for [x0] ⊂ Rn by [Va(t, [x0])] we will denote a set for which we have
Va(t, [x0]) ⊂ [Va(t, [x0])]. This set is obtained using an rigorous nume-
rical routine described below.
The Cr-Lohner algorithm is a modification of C1-Lohner algorithm [Z]. One
step of Cr-Lohner is a shift along the trajectory of the system (14) with the
following input and output data
Input data:
• tk - a current time,
• hk - a time step,
• [xk] ⊂ Rn, such that ϕ(tk, [x0]) ⊂ [xk],
• [Vk,a] = [Vk,a(tk, [x0])] ⊂ Rn, such that Daϕ(tk, [x0]) ⊂ [Vk,a] for a ∈
Nn1 ∪ . . . ∪ Nnr .
Output data:
• tk+1 = tk + hk - a new current time,
• [xk+1] ⊂ Rn, such that ϕ(tk+1, [x0]) ⊂ [xk+1],
• [Vk+1,a] = [Vk+1,a(tk+1, [x0])] ⊂ Rn, such that Daϕ(tk+1, [x0]) ⊂ [Vk+1,a]
for a ∈ Nn1 ∪ . . . ∪ Nnr .
We will often skip the arguments of Vk,a when they are obvious from the context.
The values of [xk+1] and [Vk+1,a], a ∈ Nn1 are computed using one step
C1-Lohner algorithm. After it is done, we perform the following operations to
compute [Vk+1,a] for a ∈ Nn2 ∪ . . . ∪ Nnr
1. Find a rough enclosure for Daϕ([0, hk], [xk]).
2. Compute [Vk+1,a], this will also involve some rearrangement computations
to reduce the wrapping effect for V [Mo, Lo].
5 Computation of a rough enclosure for Daϕ
For a fixed multipointer a ∈ Nnd Equation (14) can be written as follows
Daϕ(t, x) = Ba(t, x) +A(t, x)Daϕ(t, x) (16)
where
i1,...,ik=1
Dei1+···+eik f
(δ1,...,δk)∈Nd(k)
Daδjϕij
A = Df ◦ ϕ
The procedure for computing the rough enclosure is based on the notion of
a logarithmic norm, which we give below.
Definition 7 [HNW] For a square matrix A the logarithmic norm µ(A) is de-
fined as a limit
µ(A) = lim sup
‖Id +Ah‖ − 1
where ‖ · ‖ is a given matrix norm.
The formulas for the logarithmic norm of a real matrix in the most frequently
used norms are (see [HNW])
1. for ‖x‖1 =
i |xi|, µ(A) = maxj(ajj +
i6=j |aij |)
2. for m ‖x‖2 =
i |xi|2, µ(A) is equal to the largest eigenvalue of (A +
AT )/2
3. for ‖x‖∞ = maxi |xi|, µ(A) = maxi(aii +
j 6=i |aij |)
In order to find bounds for Daϕ we use the following theorem [HNW, Thm.
I.10.6]
Theorem 8 Let x(t) be a solution of a differential equation
x′(t) = f(t, x(t)), x ∈ Rn (18)
Let ν(t) be a piecewise differentiable function with values in Rn. Assume that
(t, η)
≤ l(t) for η ∈ [x(t), ν(t)]
|ν′(t)− f(t, ν(t))| ≤ δ(t),
where by µ(A), we denote a logarithmic norm of a square matrix A ∈ Rn×n.
Then for t ≥ t0 we have
|x(t) − ν(t)| ≤ eL(t)
|x(t0)− ν(t0)|+
e−L(s)δ(s)ds
, (19)
with L(t) =
l(τ)dτ .
We apply the above theorem to Equation (16) to obtain
Lemma 9 Let us fix x ∈ Rn. Assume that |Ba(t, x)| ≤ δ(t) and µ(A(t, x)) ≤
l(t), then for t > t0
|Daϕ(t, x)| ≤ |Daϕ(t0, x)|eL(t) + eL(t)
e−L(τ)δ(τ)dτ (20)
with L(t) =
l(τ)dτ .
Proof: Consider Equation (16) and a homogenous problem for (16)
w = f(t, w) := A(t, x) · w, w ∈ Rn. (21)
Using Theorem 8 we can estimate the difference between any solution of (21),
w, and a solution of (16), denoted by Daϕ.
|Daϕ(t) − w(t)| ≤ |Daϕ(t0)− w(t0)|eL(t) + eL(t)
e−L(τ)δ(τ)dτ. (22)
After a substitution w(t) = 0, which is a solution of the homogenous equation,
we obtain our assertion.
Usually, we do not have any control over the time dependence of δ and l,
hence we will use the following
Lemma 10 Assume that |Ba(t, x)| ≤ δ and µ(A(t, x)) ≤ l for t ∈ [0, h] then
for t ∈ [0, h] we have
|Daϕ(t, x)| ≤ |Daϕ(0, x)|max(1, ehl) + δ
elt − 1
, if l 6= 0, (23)
|Daϕ(t, x)| ≤ |Daϕ(0, x)|+ δt, when l = 0. (24)
5.1 The procedure for the computation of the rough en-
closure for V .
The procedure for the computing of the rough enclosure is iterative, which
means that given a rough enclosure for ϕ([0, hk], [xk]) and rough enclosures
Daϕ([0, hk], [xk]) for all a ∈ Nn1 ∪ . . . ∪ Nnp we are able to compute the rough
enclosure for Daϕ([0, hk], [xk]) for a ∈ Nnp+1.
The procedures for computation of the rough enclosures of ϕ([0, hk], [xk])
and Daϕ([0, hk], [xk]) for a ∈ Nn1 has been given in [Z]. Below we present an
algorithm for computing [Ea] for a ∈ Nn2 ∪ . . . ∪Nnr .
Input parameters:
• hk - a time step,
• [xk] ⊂ Rn - the current value of x = ϕ(tk, [x0]),
• [E0] ⊂ Rn - a compact and convex such that ϕ([0, hk], [xk]) ⊂ [E0]
• [Ea] ⊂ Rn, a ∈ Nn1 ∪ . . . ∪ Nnp such that Daϕ([0, hk], [xk]) ⊂ [Ea] for
a ∈ Nn1 ∪ . . . ∪ Nnp .
Output:
• [Ea] ⊂ Rn, a ∈ Nnp+1 such that
Daϕ([0, hk], [xk]) ⊂ [Ea]
Before we present an algorithm let us observe that for a fixed a ∈ Nnp+1, Ba
defined in (17) could be seen as a multivariate function of t, x and Vb = Dbϕ for
b ∈ Nn1 ∪. . .∪Nnp . More precisely, putmp := ♯
Nn1 ∪ . . . ∪ Nnp
, where ♯ stands
for number of elements of a set. Recall that, we have defined by (6) a linear
order in Nn. Hence, there is a unique sequence of multipointers b1, . . . , bmp ,
such that bi ∈ Nn1 ∪ . . .∪Nnp for i = 1, . . . ,mp, b1 ≤ b2 ≤ · · · ≤ bmp and bi 6= bj
for i 6= j.
Let us define
B̃a : R× (Rn)mp+1 → Rn,
Fa : R× (Rn)mp+1 → Rn
B̃a(t, x, vb1 , . . . , vbmp ) =
i1,...,ik=1
Dei1+···+eik f(ϕ(t, x))
(δ1,...,δk)∈Np+1(k)
Fa(t, x, vb1 , . . . , vbm) = B̃a(t, x, vb1 , . . . , vbm) +Df(ϕ(t, x))Va(t, x) (26)
Algorithm:
To compute [Ea] for a ∈ Nnp+1 we proceed as follows
1. Find l ≥
maxx∈[E0] µ (Df(x))
2. Compute δa ≥ max ‖B̃a‖, i.e.
δa ≥ max
(x,vb1 ,...,vbmp )∈[E0]×[Eb1 ]×···×[Ebmp ]
∥B̃a(0, x, vb1 , . . . , vbmp )
For example, if a = (j, c) ∈ Nn2 , then δa should be such that
δa ≥ max
x∈[E0],v1∈[E(1)],...,vn∈[E(n)]
r,s=1
∂xr∂xs
(x) (vj)s (vc)r
3. Define [Ea]i = [−1, 1]δa e
, for i = 1, . . . , n, where [Ea]i denotes i-th
coordinate of [Ea].
One can refine the obtained enclosure by
[Ea] :=
[0, hk]Fa
0, [E0], [Eb1 ], . . . , [Ebmp ]
∩ [Ea]
Indeed, for i = 1, . . . , n, t ∈ [0, hk] and x0 ∈ [E0] we have
Daϕi(t, x0) = Daϕi(t, x0)−Daϕ(0, x0)
= t (Fa)i (θi, x0, Db1ϕ(θi, x0), . . . , Dbmpϕ(θi, x0))
= t (Fa)i (0, ϕ(θi, x0), Db1ϕ(θi, x0), . . . , Dbmpϕ(θi, x0))
for some θi ∈ [0, t] ⊂ [0, hk]. In the above we have used the fact that
Fa(t, x, v1, . . . , vmp) = Fa(0, ϕ(t, x), v1, . . . , vmp).
Since ϕ(θi, x0) ∈ [E0] and Dbjϕ(θi, x0) ∈ [Ebj ] for j = 1, . . . ,mp we get
Daϕi(t, x0) ∈ [0, hk] (Fa)i
0, [E0], [Eb1 ], . . . , [Ebmp ]
6 Computation of [Vk+1]
6.1 Composition formulas
For any p-times continuously differentiable functions f, g : Rn → Rn and a ∈ Nnp
we have
Da(f ◦ g) =
i1,...,ik=1
Dei1+···+eik fi
(δ1,...,δk)∈Np(k)
Daδj gij (27)
We can apply the above formula to f = ϕ(hk, ·) and g = ϕ(tk, ·) to obtain
Va(tk + hk, x0) =
i1,...,ik=1
VΛ−1(ei1+...+eik )
(hk, xk)
(δ1,...,δk)∈Np(k)
(tk, x0)
for all x0 ∈ [x0]. Using notations [Vk+1,a] := [Va(tk + hk, [x0])] and [Vk,a] =
[Va(tk, [x0])] we can rewrite the above equation as
[Vk+1,a] =
i1,...,ik=1
VΛ−1(ei1+...+eik )
(hk, [xk])
(δ1,...,δk)∈Np(k)
Vk,aδj
where Λ is defined by (5).
6.2 The procedure for computation of [Vk+1]
We introduce new parameters od - the order of the Taylor method used in
computations of Va for a ∈ Nnd . It makes sense to take o1 ≥ o2 ≥ · · · ≥ or.
Input parameters:
• hk - a time step,
• [xk] ⊂ Rn - the current value of x = ϕ(tk, [x0]),
• [Vk,a] ⊂ Rn - a current value of Vk,a(tk, [x0]), for a ∈ Nn1 ∪ . . . ∪Nnr
• [E0] ⊂ Rn compact and convex, such that ϕ([0, hk], [xk]) ⊂ [E0] - a rough
enclosure for [xk],
• [Ea] ⊂ Rn, compact and convex, such that Daϕ([0, hk], [xk]) ⊂ [Ea], for
a ∈ Nn1 ∪ . . . ∪ Nnr .
Output: [Vk+1,a] ⊂ Rn, such that
Va(tk + hk, x0) ∈ [Vk+1,a] (29)
for x0 ∈ [x0] and a ∈ Nn1 ∪ . . . ∪ Nnr .
Algorithm: We compute [Vk+1] as follows
1. Computation of Va(hk, [xk]) using Taylor method for Equation (14), i.e. for
a ∈ Nnp we compute
[Fa] =
dti−1
Fa(0, [xk], Vb1 , . . . , Vbmp−1 ) (30)
hop+1
(op + 1)!
Fa(0, [E0], [Eb1 ], . . . , [Ebmp−1 ]).
where Vbi = 0 for bi ∈ Nn2 ∪ . . . ∪ Nnp−1 and V(j) = enj for j = 1, . . . , n.
Observe that
Va(hk, [xk]) ⊂ [Fa] (31)
Indeed, using Taylor series expansion we obtain that for xk ∈ [xk] and
j = 1, . . . , n holds
(Va)j(hk, xk) =
dti−1
(Fa)j(0, xk, Vb1(0, xk), . . . , Vbmp−1 (0, xk))
hop+1
(op + 1)!
(Fa)j(θi, xk, Vb1(θi, xk), . . . , Vbmp−1 (θi, xk))
for some θi ∈ [0, hk]. Observe, that
(Fa)j(θi, xk, Vb1(θi, xk), . . . , Vbmp−1 (θi, xk))
(Fa)j(0, ϕ(θi, xk), Vb1(θi, xk), . . . , 0, Vbmp−1 (θi, xk))
Using ϕ(θi, xk) ∈ [E0] and Vbs(θi, xk) ∈ [Ebs ] for s = 1, . . . ,mp−1 we
obtain our assertion.
2. The composition. Put
[Jk] := ([F(1)], . . . , [F(n)])
Using (28) for a ∈ Nnp we have
[Vk+1,a] = [αa] + [Jk] · [Vk,a], (32)
where
[αa] =
i1,...,ik=1
[FΛ−1(ei1+...+eik )
(δ1,...,δk)∈Np(k)
Vk,aδj
In our implementation of the algorithm we use the symbolic differentiation to
obtain formulae for Daf . Next, using the automatic differentiation we compute
Fa(t, x, Vb1 (t, x), . . . , Vbmp−1 (t, x))|t=0 which appear in (30).
6.3 Rearrangement for Va - the evaluation of Equation
It is well know that a direct evaluation of Equation (32) leads to wrapping effect
[Mo, Lo]. To avoid it following the work of Lohner [Lo] we will use the same
scheme as it was proposed in [Z].
Namely, observe that Equation (32) has exactly the same structure as the
propagation equations for C1-method (see [Z, Section 3]). Moreover, all vectors
Vk,a, for a ∈ Nn1 ∪ . . .Nnr ’propagate’ by the same [Jk] as did the variational
part in [Z], hence it makes sense the same approach.
To be more precise, each set [Vk,a], for a ∈ Nn1 ∪ . . . ∪ Nnr is represented in
the following form
[Vk,a] = vk,a + [Bk][rk,a] + Ck[qk,a]
where [Bk] is interval matrix, Ck is point matrix, vk,a is a point vector and
rk,a, qk,a are interval vectors. Observe that [Bk] and Ck are independent of a.
In the sequel we will drop index a. Equation (32) leads to
[Vk+1] = [α] + [Jk](vk + [Bk][rk] + Ck[qk]) (34)
Let m([z]) denotes a center of an interval object, i.e. [z] is interval vector or
interval matrix and ∆([z]) = [z]−m([z]).
Let [Q] be an interval matrix which contains an orthogonal matrix. Usually,
[Q] is computed by the orthonormalisation of the columns of m([Jk])[Bk].
[Z] = m([Jk])Ck
Ck+1 = m([Z])
[Bk+1] = [Q]
Then we rearrange formula (34) as follows
[s] = [α] + [Jk]vk +∆([Jk])[Vk]
vk+1 = m([s])
[qk+1] = [qk]
[rk+1] = [Q
T ](∆([s]) + ∆([Z])[qk]) + ([Q
T ]m([Jk])[Bk])[rk]
Summarizing, we can use the following data structure to represent ϕ(tk, [x0])
and Daϕ(tk, [x0]), for a ∈ Nn1 ∪ . . . ∪ Nnr
type CnSet = record
v0, r0, q0: IntervalVector;
C0, B0, C,B : IntervalMatrix;
{va, ra, qa : IntervalVector}a∈Nn1 ∪...∪Nnr
The set ϕ(tk, [x0]) is represented as v0 + B0r0 + C0q0, the partial derivatives
Daϕ(tk, [x0]) are represented as va+Bra+Cqa. The matrices B,C are common
for all partial derivatives.
Notice, that if we start the Cr computation with an initial condition (15) then
there is no Lipschitz part at the beginning for the partial derivatives. Hence,
the initial values for C and B are set to the identity matrix and the initial values
for qa, ra are set to zero.
If the interval vectors ra become ’thick’ (i.e. theirs diameters are larger than
some threshold value) we can set a new Lipschitz part in our representation (it
must be done simultaneously for all Daϕ) and reset ra in the following way
qa = ra + (B
TC)qa, for a ∈ Nn1 ∪ . . . ∪ Nnr
ra = 0, for a ∈ Nn1 ∪ . . . ∪ Nnr
C = B
B = Id
A similar change of the Lipshitz part may be done when vectors ra become thick
in comparison to qa.
7 Derivatives of Poincaré map
Consider a differential equation
x′ = f(x), x ∈ Rn, f ∈ CK+1 (36)
Let ϕ : R × Rn → Rn be a (local) dynamical system induced by (36). Let
α : Rn → R be C1-map. Put Π = {x | α(x) = C}.
Definition 11 We will say that Π is a local section for the vector field f at
y0 ∈ Π if
〈∇α(y0)|f(y0)〉 6= 0. (37)
Assume x0 ∈ Rn and t0 ∈ R are such that Π is a local section at ϕ(t0, x0).
Consider an implicit equation
α(ϕ(tP (x), x)) = C. (38)
It follows easily from (37) and from the implicit function theorem that there
exists a uniquely defined tP : R
n−→◦ R in a neighborhood of x0, such that
tP (x0) = t0. The function tP is as smooth as the flow ϕ. We will refer to
tP as to the Poincare return time to section Π.
We define a Poincaré map P : Rn ⊃ dom (tP ) → Rn by
P (x) = ϕ(tP (x), x). (39)
Usually the Poincaré map is defined as a map P : Π1−→◦ Π2, where Π1,Π2 are
local sections in Rn. The approach taken here, i.e. treating the Poincaré map
as map P : Rn−→◦ Rn allows us to not to worry about the coordinates on local
section.
In this section we are interested in the partial derivatives of P defined by
(39).
From (39) we can compute ∂Pi
and we obtain
(x) = fi(P (x))
(x) +
(tP (x), x). (40)
We need ∂tP
. We differentiate (38) to obtain
(P (x))
fk(P (x))
(x) +
(tP (x), x)
(∇α(P (x)) · f(P (x))) ∂tP
(x) +
(P (x))
(tP (x), x) = 0. (41)
Hence
(x) = − 1〈∇α(P (x))|f(P (x))〉
(P (x))
(tP (x), x). (42)
7.1 Higher order derivatives of the Poincaré map
To make formulas transparent we will drop arguments of functions in this sec-
tion, but reader should be aware that for tP and its partial derivatives the
argument is x, for ϕ and Daϕ the argument is always the pair (tP (x), x).
From (40) we obtain
D(j,c)P =
ϕD(j)tPD(c)tP +
D(c)ϕD(j)tP +
ϕD(j,c)tP
D(j)ϕD(c)tP +D(j,c)ϕ.
It is easy to see that partial derivatives of high order give rise to quite complex
expressions and it is not entirely obvious how to organize it in some coherent
and programmable way. For this purpose we use the following
Lemma 12 For a multipointer a ∈ Nnp we have
DaP = Daϕ+
(δ1,...,δk)∈Np(k)
j=1Daδj tP
(δ1,...,δk)∈Np(k)
∂tk−1
Daδsϕ
j 6=sDaδj tP
Proof: By induction on p. For p = 1 formula (43) is equivalent to (40), because
the two last sums are taken over empty set. Assume (43) holds true for some
p ≥ 1 and fix a ∈ Nnp+1. Our goal is to show that
DaP = R1 +R2 +R3
where
R1 = Daϕ+
ϕDatP
(δ1,...,δk)∈Np+1(k)
Daδj tP
(δ1,...,δk)∈Np+1(k)
∂tk−1
Daδsϕ
j 6=s
Daδj tP
Write a = β + γ, where β ∈ Nnp and γ = (ap+1) ∈ Nn1 . From the induction
assumption we have
DaP = Dγ
ϕDβtP
(δ1,...,δk)∈Np(k)
j=1Dβδj tP
(δ1,...,δk)∈Np(k)
∂tk−1
Dβδsϕ
j 6=sDβδj tP
i=1 Si
where
S1 = Daϕ+
ϕDatP
DβϕDγtP
ϕDβtPDγtP
DγϕDβtP
(δ1,...,δk)∈Np(k)
j=1Dβδj tP
∂tk+1
ϕDγtP
(δ1,...,δk)∈Np(k)
j=1Dβδj tP
(δ1,...,δk)∈Np(k)
s=1Dβδs+γtP
j 6=s
Dβδj tP
(δ1,...,δk)∈Np(k)
∂tk−1
Dβδs+γϕ
j 6=sDβδj tP
(δ1,...,δk)∈Np(k)
DβδsϕDγtP
j 6=sDβδj tP
S10 =
(δ1,...,δk)∈Np(k)
r 6=s
∂tk−1
DβδsϕDβδr+γtP
j 6=s
j 6=r
Dβδj tP
Obviously R1 = S1. We will show that R2 = S3 + S6 + S7 and R3 = S2 + S4 +
S5 + S8 + S9 + S10.
Denote by Ri,k, i = 2, 3 a part of sum Ri with fixed k = 2, . . . , p + 1.
Similarly, let us denote by Si,k a part of sum Si, i = 5, . . . , 10, for k = 2, . . . , p.
Using decomposition of N p+1(2) as in (8) we obtain that R2,2 = S3 + S7,2.
Similarly, using (8) we observe that R2,k = S6,k−1 + S7,k for k = 3, . . . , p.
Finally, since N p+1(p + 1) = {((1), (2), . . . , (p + 1))} and γ = (ap+1) we find
that R2,p+1 = S6,p. This shows that R2 = S3 + S6 + S7.
It remains to show that R3 = S2 +S4 +S5 + S8 + S9 + S10. We will classify
possible terms by the fact, where p+ 1 appears in δi, i = 1, . . . , k and how this
δi enters in R3 as δs or δj . There are four cases
1. δs = (p+ 1)
2. δj = (p+ 1)
3. p+ 1 ∈ δs, |δs| ≥ 2
4. p+ 1 ∈ δj, |δj | ≥ 2
Let us fix k = 2. Let (δ1, δ2) ∈ N p+1(2). The term for case 1 is S4, for case 2
is S2, case 3 is S8,2 and case 4 is S10,2. Hence, R3,2 = S2 + S4 + S8,2 + S10,2.
For k = 3, . . . , p and fixed (δ1, . . . , δk) ∈ N p+1(k) we have: case 1 is given
by S5,k−1, case 2 by S9,k−1, case 3 by S8,k and case 4 by S10,k Hence, for
k = 3, . . . , p we have R3,k = S5,k−1 + S9,k−1 + S8,k + S10,k.
Finally, for k = p+ 1 we observe, that R3,p+1 = S5,p + S9,p. Indeed, in this
case (δ1, . . . , δp+1) = ((1), (2), . . . , (p + 1)). Hence, either for δs = γ we have
term S5,p and δs 6= γ we have S9,p.
We have showed that R3 = S2 + S4 + S5 + S8 + S9 + S10 and the proof is
finished.
Hence, if we know all the partial derivatives of tP up order p we can compute
the partial derivatives of the Poincaré map up the same order. In next subsection
we show how to compute partial derivatives of tP for affine sections.
7.2 Partial derivatives of tP for affine sections
Assume α : Rn → R is an affine map given by
α(x) = α0 +
αixi.
This is a quite restrictive assumption about sections, but it leads to relatively
simple formulas for DatP and it is sufficient for the applications we have in
mind.
Lemma 13 For a multipointer a ∈ Nnp holds
−DatP
∇α| ∂
= 〈∇α|Daϕ〉
∇α| ∂
(δ1,...,δk)∈Np(k)
j=1Daδj tP
(δ1,...,δk)∈Np(k)
∇α| ∂
∂tk−1
Daδsϕ
j 6=sDaδj tP
Proof: The proof is a direct consequence of Lemma 12 and (38). Since α is
affine, by differentiating of α(P (x)) = C we get 〈∇α|DaP 〉 = 0. Using formula
(43) for DaP we obtain our assertion.
Fix [x] ⊂ Rn and assume we have a rigorous bound for tP ([x]) ∈ [t1, t2] (see
[Z, Section 6] for more details on this). Lemmas 13 and 12 show that given rigo-
rous bounds for the partial derivatives Daϕ([t1, t2], [x]) and
Daϕ([t1, t2], [x])
up to some order p we can compute recursively rigorous bounds for the partial
derivatives of tP ([x]) and P ([x]) up to the same order. Notice, that
Daϕ are
given by Taylor coefficients of the solution of (14) with initial conditions P ([x])
for C0 part and Daϕ(tP (x), [x]) for equations for variations. Hence, these coef-
ficients can be easily computed using the automatic differentiation algorithm.
8 Applications.
One of the typical invariant sets in hamiltonian mechanics are invariant tori.
However, the existence of invariant torus in a given system is often difficult to
prove despite the fact that the theory is quite well developed. Probably the
best work in this direction was done by Celletti and Chercia [CC1, CC2], where
the an effective application (computer assisted proof) of KAM theory to the
restricted three body problem modelling system consisting of Sun, Jupiter and
asteroid 12 Victoria was given. Our aim here is more modest as we focus on the
invariant tori emanating from the elliptic fixed point satisfying suitable twist
condition.
In this section we show that the rigorous computations of partial derivatives
of a dynamical system up to order 3 or 5 can be used to prove that in a particular
system an invariant torus exists around some elliptic periodic orbits. In this
section this will be done for the forced pendulum equation and the Michelson
system.
8.1 Area preserving maps on the plane, normal forms and
KAM theorem
Definition 14 Let f : R2 → R2 be a smooth area preserving map, such that
f(p) = p. Let λ and µ be eigenvalues of df(p). Following [SM] we will call the
point p
• hyperbolic if λ, µ ∈ R and λ 6= µ,
• elliptic if λ = µ and λ 6= µ,
• parabolic if λ = µ.
The following KAM theorem will be the main tool to prove the existence of
invariant tori in this paper.
Theorem 15 [SM, §32] Consider an analytic area preserving map f : R2 →
2, f(r, s) = (r1, s1) where
r1 = r cosα− s sinα+O2l+2
s1 = r sinα+ s cosα+O2l+2 (44)
r2 + s2
and O2l+2 denotes convergent power series in r, s with terms of order greater
than 2l+ 1, only.
If at least one of γ1, . . . , γl is not zero then the origin is a stable fixed point
for map f . Moreover, in any neighborhood U of point 0 there exists an invariant
curve for map f around the origin contained in U .
The next theorem and its proof tells how to bring a planar area preserving
map in the neighborhood of an elliptic fixed point into the form (44).
Theorem 16 [SM, §23] Consider an analytic area preserving map f : R2 → R2
such that f(0) = 0. Let λ, λ̄ be complex eigenvalues of Df(0), such that |λ| =
|λ̄| = 1. If λk 6= 1 for k = 1, . . . , 2l+2, then there is an analytic area preserving
substitution such that in the new coordinates mapping f has form (44).
The proof of the above theorem is constructive, i.e. given the power series
for f at an elliptic fixed point one can construct explicitly an area preserving
substitution and compute the coefficients γ0, . . . , γl in (44). An explicit formula
for the coefficient γ1 in the above normal form is given in Appendix A.
8.2 The existence of invariant tori in forced pendulum.
Consider an equation
θ̈ = − sin(θ) + sin(ωt) (45)
Observe that (45) is hamiltonian.
Let us denote by Pω : R
2−→◦ R2 the Poincaré map for Equation (45) with
a parameter ω, i.e. Pω = ϕ(2π/ω, ·), where ϕ : R × R2−→◦ R2 is a local flow
induced by (45). Observe that (45) is nonautonomous, but it is equivalent to
first order system of autonomous ODE given by
= − sin(θ) + sin(ωt) (46)
In the sequel all rigorous computations for (45) will be in fact performed for the
system (46).
Observe that to any invariant closed curve for Pω corresponds and invariant
2-torus for (45).
Consider a set of parameter values
Ω1 = [2, 2.994], Ω2 = [3, 3.997], Ω3 = [4, 8]
Ω = Ω1 ∪ Ω2 ∪ Ω3
The following lemma was proved with computer assistance
Lemma 17 For all parameter values ω ∈ Ω there exists an elliptic fixed point
xω ∈ R2 for Pω. Moreover, there exists an area-preserving substitution such
that in the new coordinates the map fω(x) = Pω(x+xω)−xω has the form (44)
with l = 1 and γ1 6= 0.
Before we give the proof, let us briefly comment about the choice of the parame-
ter set Ω. For parameter values slightly lower than 2 we observe the parabolic
case, i.e. there exists a parameter value ω1 for which eigenvalues of the deriva-
tive of Pω1 are equal to −1. In two gaps in Ω below 3 and 4 we have resonances
of low order. Namely, we have parameter values with an elliptic fixed with ei-
genvalues to e±2π/3 = −1
i and e±iπ/2 = ±i, respectively. Clearly, in a
computer assisted proof we need to exclude a small interval around those para-
meters. For ω > 4 it seems that the interval Ω3 can be extended much further
to the right without any difficulty.
Proof of Lemma 17: A computer assisted proof consists of the following steps.
We cover the set Ω by 9910 nonequal subintervals ωi. Diameters of ωi’s were
relatively large for values far away from the parabolic cases and very small close
to them. For a fixed subinterval ωi we proceed as follows
1. Let ω̄ denote an approximate center of the interval ωi. We find an approxi-
mate fixed point for Pω̄ using the standard nonrigorous Newton method.
Let us denote such a point by xi.
2. We define a box centered at xi, i.e we set vi := xi + [−εi, εi]2, where
εi > 0 depends on subinterval ωi - the values we used are from the interval
[5 · 10−6, 3 · 10−3], depending on whether xi close to parameter values
corresponding to parabolic cases.
3. Using the C1-Lohner algorithm we compute the Interval Newton operator
[Mo, N, A] Ni := N(Pωi − Id, xi, vi) and verify that Ni ⊂ int vi. This
proves that for all ω ∈ ωi there exists a unique fixed point xω ∈ Ni for
4. Using the C3-Lohner algorithm we compute a rigorous bound for Pωi(Ni)
and DαPωi(Ni), α ∈ N21 ∪N22 ∪N23. Hence, we obtain a rigorous bound for
the coefficients in
fω(x) =
|α|=1
DaP (xω)x
5. We show that an arbitrary matrix M ∈ DPωi(Ni) has a pair of complex
eigenvalues λ, λ̄ which satisfy λk 6= 1 for k = 1, . . . , 4. From Theorem 16
it follows there exists an area-preserving substitution such that in the new
coordinates the map fω for ω ∈ ωi has the form (44) with l = 1.
6. We compute a rigorous bound for γ0 and γ1 which appear in the formula
(44) and verify that for ω ∈ ωi holds γ1 6= 0.
The rigorous bounds for the values of γ1 on Ω are
γ1(Ω1) ⊂ [0.29930416771330087, 30.118260918229566]
γ1(Ω2) ⊂ [0.099747909112924596, 0.56550301088840627]
γ1(Ω3) ⊂ [0.18574835001593507, 0.4129279974577012]
A computer assisted proof of the above took approximately 95 minutes on the
Pentium IV 3GHz processor.
As a straightforward consequence of Lemma 17 and Theorem 15 we obtain
Theorem 18 For all parameter values ω ∈ Ω there exists an elliptic fixed point
xω ∈ R2 for Pω. Moreover, any neighborhood of point xω contains an invariant
curve for Pω around xω.
8.3 Higher order normal forms.
In the previous section it was shown that C3 computations are sufficient to
prove that for (45) a family of invariant tori exists. However, it may happen
that the coefficient γ1 in the normal form vanishes. In this situation we may try
to compute higher order normal form. As an example we consider a pendulum
with a different forcing term,
θ̈ = − sin(θ) + sin(ωt) + sin(2ωt). (47)
Theorem 19 Let Pω be the Poincaré map for (47). For all parameter values
ω ∈ Ω∗ = [2.9957694795, 2.9957694796] there exists an elliptic fixed point xω ∈
2 for Pω. Moreover, any neighbourhood of point xω contains an invariant
curve for Pω around xω.
Proof: The main concept of the proof is the same as in Lemma 17. Using the
nonrigorous Newton method we find an approximate fixed point
x = (−7.7491573604896152 · 10−12,−0.54723831527031352).
We set v = x + 3 · 10−5([−1, 1] × [−1, 1]). Using the C1-Lohner algorithm we
compute the Interval Newton Operator of Pω − Id on v and we obtain that for
all ω ∈ Ω∗, N = N(Pω − Id, center(v), v) ⊂ (N1, N2), where
N1 = [−5.1582932672798325, 5.1582631625020222] · 10−10
N2 = [−0.54723831580217108,−0.54723831470891193]
Since N ⊂ v we conclude that for all ω ∈ Ω∗ there exists a unique fixed point
xω ∈ N for the Poincaré map.
Using C5-Lohner algorithm we compute a rigorous bound for PΩ∗(N) and
DαPΩ∗(N), α ∈ N21 ∪ . . . ∪ N25. Hence, we obtain a rigorous bound for the
coefficients in
fω(x) =
|α|=1
DαP (xω)x
α +O6
We show that an arbitrary matrix M ∈ DPΩ∗(N) has a pair of complex ei-
genvalues λ, λ̄ which satisfy λk 6= 1 for k = 1, . . . , 6. From Theorem 16 it follows
there exists an area-preserving substitution such that in the new coordinates the
map fω for ω ∈ Ω∗ has the form (44) with l = 2.
Next, we compute a rigorous bound for γ1 and γ2 which appear in the formula
(44) and we get
γ1(Ω∗) ⊂ [−5.3924276719042241, 5.381714805052106] · 10−6
γ2(Ω∗) ⊂ [199.95180660157078, 199.99104965939162]
Since for ω ∈ Ω∗, γ2(ω) 6= 0 the assertion follows from Theorem 15.
The main observation which makes this example interesting is that there
exists ω∗ ∈ Ω∗ for which γ1(ω∗) = 0 and we cannot conclude the existence of
invariant tori for all ω ∈ Ω∗ from C3 computations. To be more precise, we
computed the coefficient γ1 for the parameter values ω1 = minΩ∗ and ω2 =
maxΩ∗ and we get
γ1(ω1) ∈ [−2.3559594437885885,−1.3593457220363871] · 10−8
γ1(ω2) ∈ [2.9671154858524365 · 10−9, 1.2819312939263052 · 10−8]
Since γ1 exists for all ω ∈ Ω∗ and depends continuously on ω we conclude, that
γ1(ω∗) = 0 for some ω∗ ∈ Ω∗.
8.4 Application to the Michelson system
The existence of an invariant curve for a planar map f : R2 → R2 can be proven
without assumption that f is measure preserving. The key assumption in the
proof given in [SM] is that any curve γ around an elliptic point intersect its
image under f , i.e. f(γ)∩ γ 6= ∅. Such a situation is also observed in reversible
planar map around an symmetric elliptic fixed points.
Definition 20 An invertible transformation M : Ω −→ Ω is called a reversing
symmetry of a local dynamical system φ : T × Ω −→ Ω, T = R or T = Z if the
following conditions are satisfied
1. if (t, x) ∈ dom (φ) then (−t, S(x)) ∈ dom (φ).
2. S(φ(t, x)) = φ(−t, S(x)))
Remark 21 In the discrete time case, the above two conditions are equivalent
to identity
M ◦ f = f−1 ◦M.
where f = φ(1, ·) is a generator of φ.
Definition 22 Let φ : T×Ω → Ω be a local (discrete or continuous) dynamical
system. For x ∈ Ω put
I(x) = {t ∈ T : (t, x) ∈ dom (φ)}
O (x) = {φ(t, x) ∈ Ω : t ∈ I(x)}
The set O (x) will be called a trajectory of a point x.
Definition 23 Assume S is an reversing symmetry for φ : T × Ω → Ω. An
orbit O (x) is called S-symmetric orbit if O (x) = S(x).
Remark 24 [La] In continuous case the orbit O (x) is S-symmetric if it contains
a point from the set Fix(S) = {y : S(y) = y}.
Remark 25 [Wi2, Lem.3.3] It is easy to see that if Θ ⊂ Ω is a Poincaré section
for a R-reversible flow φ : R × Ω → Ω such that Θ = R(Θ) then the Poincaré
map P : Θ → Θ is R|Θ-reversible.
As we observed at the beginning of this section, an R-reversible planar map
may admit an invariant curve around an R-symmetric elliptic fixed point. In
reversible case a planar map admits the same normal form around symmetric, el-
liptic fixed point as in the area-preserving case and the substitution which tends
the map to the normal form is exactly the same as we described in Appendix A
– for details see [Se, BHS].
Consider an ODE 
ẋ = y
ẏ = z
ż = c2 − y − 1
On one hand, the system (48) is an equation for the steady state solution of
one-dimensional Kuramoto-Sivashinsky PDE and it is known in the literature
as the Michelson system[Mi]. On the other hand, this system appears as a part
of the limit family of the unfolding of the nilpotent singularity of codimension
three (see [DIK1]).
The system (48) is reversible with respect to the symmetry
R : (x, y, z, t) → (−x, y,−z,−t) (49)
and since the divergence vanishes it is also volume preserving.
A dynamical system induced by (48) exhibits several types of dynamics for
different values of parameter. For sufficiently large c there is a simple invariant
set consisting of two equilibria (±c
2, 0, 0) and heteroclinic orbit between them
[MC]. Lau [Lau] numerically observed that when the parameter c decreases a
cascade of cocoon bifurcations occurs and at the limit value c ≈ 1.266232337
a periodic orbit is born through a saddle-node bifurcation. This hypothesis
has been proved in [KWZ]. The computer assisted proof of this fact given in
[KWZ] uses the algorithm presented in this paper in order to compute partial
derivatives up to second order for a certain Poincaré map.
For the parameter value equal to one and slightly smaller than one it was
proven in [DIK2, Wi1, Wi2, Wi3] that the system has rich and complicated
dynamics including symbolic dynamics, heteroclinic solutions, Shilnikov homo-
clinic solutions.
However, as the bifurcations diagram presented by Michelson suggests [Mi,
Fig.1] for all parameter values c ∈ (0, 0.3195) there are at least two elliptic
periodic orbits with large invariant islands around them. In this section we
present a proof that such islands exist for some range of parameter values. The
main idea of the proof is almost the same as in the previous section. There are
two main differences. First, the Poincaré map will not be a time shift. Therefore
computations of the partial derivatives of the Poincaré map require Lemma 12
and Lemma 13. Second difference is: we use the shooting method instead of
the interval Newton method for the proof of the existence of symmetric periodic
orbit.
The aim of this section is to prove the following
Theorem 26 For all parameter values from the set
C = C1 ∪ C2 = [0.1, 0.225]∪ [0.226, 0.25]
there exists a symmetric elliptic periodic orbit for the Michelson system (48).
Moreover, each neighbourhood of such an orbit contains a 2D tori invariant
under the flow generated by the Michelson system.
Let us define the Poincaré section Π := {(0, z, y) : z, y ∈ R}. Let Pc =
(P1, P2) : Π−→◦ Π be the Poincaré map for the system with the parameter value
c. Notice, that Pc is in fact a half Poincaré map, which means that the trajectory
of x crosses Π in opposite directions when passing through x and Pc(x), and
therefore periodic orbits for the Michelson system corresponds to periodic points
for P 2.
Since the section Π is invariant under symmetry (x, y, z) → (−x, y,−z),
from Remark 25 the Poincaré map is also reversible with respect to an involu-
tion R(y, z) = (y,−z). We will use the same letter R to denote the reversing
symmetry of the Poincaré map and the Michelson system.
Let us comment about the choice of the set C. In the gap between intervals
C1 and C2 there is a parameter value c∗ for which the eigenvalues of the Poincaré
map P 2c∗ are ±i. Apparently at this parameter value we have a bifurcation and
four periodic islands are born as it is shown in Fig.1 - see also a movie mpp.mov
available at [Wi4] which presents an animation of the phase portrait of Pc for
the parameter values from the range [0.1, 0.25].
Proof of Theorem 26. The main concept of the proof is quite similar to the
one presented in Lemma 17. We divide the set C of parameter values onto 20800
nonequal parts (smaller when close to the bifurcation parameter c∗ and close to
0.1 and 0.25). For a fixed subinterval ci from the grid we proceed as follows
1. Let c̄ denote a center of the interval ci. We find an approximate fixed point
of P 2c̄ using the standard nonrigorous Newton method. Let us denote this
point by (yi, zi).
2. Since the map Pc is reversible one can prove the existence of the fixed
point for P 2c using the shooting method as follows.
Let Fix(R) = {(y, z) ∈ Π : R(y, z) = (y, z)} = {(y, 0) ∈ Π : y ∈ R}.
Since Pc satisfies (Pc ◦ R)2 = Id whenever the left side is defined, one
can see that if x ∈ Fix(R) and Pc(x) ∈ Fix(R) then P 2c (x) = x. Let us
remark, that we always get an approximate fixed points (yi, zi) resulting
from the nonrigorous Newton method very close to Fix(R). We define
two points u1 = (yi − εi, 0), u2 = (yi + εi, 0) ∈ Fix(R), where εi is a small
number depending on ci and we show that πz(Pci(u1)) · πz(Pci(u2)) < 0,
where πz is a projection onto z coordinate. Hence, if the Pci is defined
on the set Ni = (0, [yi − εi, yi + εi], 0) then for all parameter values c ∈ ci
there is a point uc ∈ N which satisfies πz(Pc(uc)) = 0 and therefore
Figure 1: Phase portrait of the Poincaré map Pc (top) before bifurcation for
c = 0.225 and (bottom) after bifurcation for c = 0.226 with four periodic islands.
Between those parameters resonant case occurs with eigenvalues equal to ±i.
See also auxiliary material [Wi4].
Pc(uc) ∈ Fix(R). This shows that for all c ∈ ci there exists a fixed point
for P 2c inside Ni provided Pc is defined on Ni, which will be discussed
below.
3. Using C3-Lohner algorithm we compute rigorous bounds for P 2ci(Ni) and
DαP 2ci(Ni) for α ∈ N
1 ∪ N22 ∪ N23. This implies also that Ni ⊂ domPci .
4. We show that an arbitrary matrix M ∈ DPci(Ni) has a pair of complex
eigenvalues λ, λ̄ which satisfy λk 6= 1 for k = 1, . . . , 4. From Theorem 16
it follows there exists an area-preserving substitution such that in the new
coordinates the map Pc for c ∈ ci has the form (44) with l = 1.
5. We compute a rigorous bound for γ0 and γ1 which appear in the formula
(44) and verify that for c ∈ ci holds γ1 6= 0.
The rigorous bounds for the values of γ1 on C are
γ1(C1) ⊂ [0.014515898754816965, 157.76639522562903]
γ1(C2) ⊂ [1.1002393483255526, 151.35147664498677]
The computer assisted proof of the above took approximately 7 hours and 50
minutes on the Pentium IV 3GHz processor.
9 Implementation notes.
All the algorithms presented in this paper have been implemented in C++ by
authors and are part of the CAPD library [CAPD]. In particular, the package
implements the computation of partial derivatives of a flow with respect to
initial condition, partial derivatives of Poincaré maps for linear sections and
computations of normal forms for planar maps up to order 5.
The implementation combines the automatic and symbolic differentiation in
order to generate a coefficients in Taylor series for the solutions of the system
(14).
Our tests shows that without difficulty we can compute partial derivatives
up to order 3 for an equation in 8-dimensional phase space (which gives 1320
equations to solve) on a computer with 512MB memory. However, our current
implementation is optimized for lower dimensional problems. All the trees which
represent formulas (14) are stored in the memory of a computer. This speeds
up computations because we do not need to recompute all the multiindices,
multipointers and submultipointers in each step of the algorithm. Unfortunately,
such an implementation is memory-consuming. Therefore, higher dimensional
problems require a computer with huge memory even for C3 or C5 computations.
A Explicit formulas for third order normal forms
for a planar map
The goal of this section is to give some details about the proof of Theorem 16.
We want to present some formulas to give the reader the feeling about the
necessary computations.
Throughout this section we assume that the assumptions of Theorem 16 are
satisfied. In the neighbourhood of 0 f is given by a real, convergent power series
f(x, y) = (x1, y1)
ak−l,lx
lyk−l
bk−l,lx
lyk−l
Denote also by f : C2 → C2 a complex extension of f . Let λ, λ̄ ∈ C be
complex eigenvalues of Df(0) and v, v̄ ∈ C corresponding eigenvectors (here
bar denotes the complex conjugation). Then, using a linear substitution of the
form L = [vT , v̄T ], we can change the coordinate system such that in the new
coordinates the mapping f has the form
f(ξ, η) = (λξ + p(ξ, η), λ̄η + q(ξ, η))
p(ξ, η) =
pl,k−lξ
lηk−l
q(ξ, η) =
ql,k−lξ
lηk−l
pi,j = qj,i for i, j ≥ 0.
The last condition is a consequence of the invariance of R2 ⊂ C2 under the
complex map f . We will refer to it as the reality condition. Namely, the set
2 ⊂ C2 in the new coordinates (ξ, η) is given by ξ = η and the condition
f(R2) ⊂ R2 expressed in coordinates (ξ, η) is equivalent to (50).
Assume now, that λk 6= 1 for k = 1, . . . , 4. Then an analytic area-preserving
substitution satisfying reality condition (50)
(Φ(z, v),Ψ(z, v)) = (z1, v1)
z1 = z +
φl,k−lz
lvk−l + · · ·
v1 = v +
ψl,k−lz
lvk−l + · · ·
where
ψ2,0 = φ0,2 = −λ2p0,2(λ3 − 1)−1
ψ1,1 = φ1,1 = −p1,1(λ− 1)−1
ψ0,2 = φ2,0 = p2,0(λ
2 − λ)−1
ψ3,0 = φ0,3 = −λ3 (p0,3 + p1,1φ0,2 + 2q0,2ψ0,2) (λ4 − 1)−1
ψ2,1 = φ1,2 =
λ2 − 1 (p1,2 + 2p2,0φ0,2 + p1,1φ1,1 + p1,1ψ0,2 + 2p0,2ψ1,1)
ψ1,2 = φ2,1 = −φ2,0ψ0,2 + φ0,2ψ2,0
ψ0,3 = φ3,0 = (p3,0 + 2p2,0φ2,0 + p1,1ψ2,0) (λ
3 − λ)−1
brings f = (f1, f2) to the normal form
(z, v) → (z(α0 + α2zv), v(β0 + β2zv)) +O((zv)2)
β0 = α0 = λ
β2 = α2 = q1,2 + 2q2,0φ0,2 + q1,1φ1,1 + q1,1ψ0,2 + 2q0,2ψ1,1
Finally, let γ0 ∈ R be such that λ = α0 = eiγ0 and we compute coefficient γ1 by
From the proof given in [SM] it follows that γ1 ∈ R and the mapping f in
coordinates (z, v) has the form
f(z, v) =
zei(γ0+γ1zv), ve−i(γ0+γ1zv)
where O4 is a convergent power series with the terms of degree at least 4.
Again, the coefficients of f(z, v) satisfy reality condition (50). In order to
express this normal form in terms of real variables we make a linear substitution
z = r + is, v = r − is
and we obtain the normal form for f
f(r, s) = (r1, s1) +O4
r1 = r cos(γ0 + γ1(r
2 + s2))− s sin(γ0 + γ1(r2 + s2))
s1 = r sin(γ0 + γ1(r
2 + s2)) + s cos(γ0 + γ1(r
2 + s2))
which agrees with (44).
The formulas for higher order terms φi,j , ψi,j (and for γ2, which are not given
here) has been computed in Mathematica.
References
[A] G. Alefeld, Inclusion methods for systems of nonlinear equations - the
interval Newton method and modifications, in Topics in Validated Com-
putations, J. Herzberger (Editor), Elsevier Science B.V., 1994, pages
[BHS] H.W. Broer, G.B. Huitema and M.B. Sevryuk, Quasi-periodicity in
families of dynamical systems: order amidst chaos, Lecture Notes in
Mathematics, Vol. 1645, Springer Verlag, (1996).
[BM] M. Berz, K. Makino, New Methods for High-Dimensional Verified Qua-
drature, Reliable Computing, 5, 13-22 (1999)
[CAPD] CAPD – Computer Assisted Proofs in Dynamics group, a C++ pa-
ckage for rigorous numerics, http://capd.wsb-nlu.edu.pl.
[CC1] A. Celletti, L. Cherchia, KAM Stability for three-body problem of the
Solar System, Z. angw. Math. Phys. 57 (2006) 33-41
[CC2] A. Celletti, L. Cherchia, KAM Stability and Celestial Mechanics, Me-
moirs of the AMS, Vol 187, Num 878 (2007)
[DIK1] F. Dumortier, S. Ibáñez, and H. Kokubu, New aspects in the unfolding
of the nilpotent singularity of codimension three, Dynam. Syst. 16
(2001), 63–95.
[DIK2] F. Dumortier, S. Ibáñez, and H. Kokubu, Cocoon bifurcation in three
dimensional reversible vector fields, Nonlinearity 19 (2006), 305–328.
[GZ] Z. Galias, P. Zgliczyński, Computer assisted proof of chaos in the Lo-
renz system, Physica D, 115, 1998,165–188
[HNW] E. Hairer, S.P. Nørsett, G. Wanner, Solving Ordinary Differential
Equations I, Nonstiff Problems, Springer-Verlag, Berlin Heidelberg
1987.
[HZHT] B. Hassard, J. Zhang, S. Hastings, W. Troy, A computer proof that
the Lorenz equations have ”chaotic” solutions, Appl. Math. Letters, 7
(1994), 79–83
[KZ] T. Kapela, P. Zgliczyński, The existence of simple choreographies for
N-body problem - a computer assisted proof, Nonlinearity, 16 (2003),
1899-1918
[KWZ] H. Kokubu, D.Wilczak, P. Zgliczyński, Rigorous verifi-
cation of cocoon bifurcations in the Michelson system,
http://www.ii.uj.edu.pl/~wilczak, submitted
[La] J.S.W. Lamb, Reversing symmetries in dynamical systems, PhD The-
sis, Amsterdam University, 1994
http://capd.wsb-nlu.edu.pl
http://www.ii.uj.edu.pl/~wilczak
[Lau] Y-T Lau, The “cocoon” bifurcations in three-dimensional systems with
two fixed points, Int. Jour. Bif. Chaos, Vol.2, No.3 (1992) 543-558.
[Lo] R.J. Lohner, Computation of Guaranteed Enclosures for the Solutions
of Ordinary Initial and Boundary Value Problems, in: Computational
Ordinary Differential Equations, J.R. Cash, I. Gladwell Eds., Claren-
don Press, Oxford, 1992.
[MC] C.K. McCord, Uniqueness of connecting orbits in the equation Y (3) =
Y 2 − 1, J. Math. Anal. Appl. 114, 584-592.
[Mi] D. Michelson, Steady solutions of the Kuramoto–Sivashinsky equation,
Physica D, 19, (1986) 89-111.
[Mo] R.E. Moore, Interval Analysis. Prentice Hall, Englewood Cliffs, N.J.,
[MM] K. Mischaikow, M. Mrozek, Chaos in the Lorenz equations: A compu-
ter assisted proof. Part II: Details, Mathematics of Computation, 67,
(1998), 1023–1046
[MZ] M. Mrozek, P. Zgliczyński, Set arithmetic and the enclosing problem
in dynamics, Annales Pol. Math., 2000, 237–259
[NJ] N. S. Nedialkov, K. R. Jackson, An Interval Hermite – Obreschkoff
Method for Computing Rigorous Bounds on the Solution of an Initial
Value Problem for an Ordinary Differential Equation, chapter in the
book Developments in Reliable Computing, editor T. Csendes, 289-310,
Kluwer, Dordrecht, Netherlands, 1999.
[N] A. Neumeier, Interval methods for systems of equations, Cambridge
University Press, 1990.
[RNS] T. Rage, A. Neumaier, C. Schlier, Rigorous verification of chaos in a
molecular model, Phys. Rev. E 50 (1994), 2682–2688
[Ra] L.B. Rall, Automatic Differentiation: Techniques and Applications,
volume 120 of Lecture Notes in Computer Science. Springer Verlag,
Berlin, 1981
[Se] Sevryuk, M. B. Reversible systems, Lecture Notes in Mathematics,
1211. Springer-Verlag, Berlin, 1986
[SM] C.L. Siegel, J.K. Moser, Lectures on Celestial Mechanics, Springer-
Verlag Berlin Heidelberg New York, 1995.
[SK] D. Stoffer, U. Kirchgraber, Possible chaotic motion of comets in the
Sun Jupiter system - an efficient computer-assisted approach, Nonli-
nearity, 17 (2004) 281-300.
[T] W. Tucker, A Rigorous ODE solver and Smale’s 14th Problem, Foun-
dations of Computational Mathematics, (2002), Vol. 2, Num. 1, 53-117
[W] W. Walter, Differential and integral inequalities, Springer-Verlag Ber-
lin Heidelberg New York, 1970
[Wi1] D. Wilczak, The existence of Shilnikov homoclinic orbits in the Mi-
chelson system: a computer assisted proof, Found. Comp. Math. Vol.6,
No.4, 495-535, (2006).
[Wi2] D. Wilczak, Chaos in the Kuramoto–Sivashinsky equations – a compu-
ter assisted proof, J. Diff. Eqns., 194, 433-459 (2003).
[Wi3] D. Wilczak, Symmetric heteroclinic connections in the Michelson sys-
tem – a computer assisted proof, SIAM J. App. Dyn. Sys., Vol.4, No.3,
489-514 (2005).
[Wi4] D. Wilczak, http://www.ii.uj.edu.pl/~wilczak, a refference for
auxiliary materials.
[WZ] D. Wilczak, P. Zgliczyński, Heteroclinic Connections between Periodic
Orbits in Planar Restricted Circular Three Body Problem - A Compu-
ter Assisted Proof, Commun. Math. Phys. 234, 37-75 (2003).
[Z1] P. Zgliczyński, Computer assisted proof of chaos in the Hénon map and
in the Rössler equations, Nonlinearity, 1997, Vol. 10, No. 1, 243–252
[Z] P. Zgliczyński, C1-Lohner algorithm, Foundations of Computational
Mathematics, (2002) 2:429–465
[ZPer] P. Zgliczyński, Lohner Algorithm for perturbations of ODEs and diffe-
rential inclusions, http://www.ii.uj.edu.pl/~zgliczyn
http://www.ii.uj.edu.pl/~wilczak
http://www.ii.uj.edu.pl/~zgliczyn
	Introduction
	Basic definitions
	Multiindices
	Multipointers
	Equations for variations
	Cr-Lohner algorithm
	Why one needs an Cr-algorithm?
	An outline of the algorithm
	Computation of a rough enclosure for Da
	The procedure for the computation of the rough enclosure for V.
	Computation of [Vk+1]
	Composition formulas
	The procedure for computation of [Vk+1]
	Rearrangement for Va - the evaluation of Equation (32)
	Derivatives of Poincaré map
	Higher order derivatives of the Poincaré map
	Partial derivatives of tP for affine sections
	Applications.
	Area preserving maps on the plane, normal forms and KAM theorem 
	The existence of invariant tori in forced pendulum.
	Higher order normal forms.
	Application to the Michelson system
	Implementation notes.
	Explicit formulas for third order normal forms for a planar map
ABSTRACT
  We present a Lohner type algorithm for the computation of rigorous bounds for
solutions of ordinary differential equations and its derivatives with respect
to initial conditions up to arbitrary order. As an application we prove the
existence of multiple invariant tori around some elliptic periodic orbits for
the pendulum equation with periodic forcing and for Michelson system.

<|endoftext|><|startoftext|>
Mon. Not. R. Astron. Soc. 000, 1–8 (2006) Printed 1 November 2018 (MN LATEX style file v2.2)
Timing evidence in determining the accretion state of the
Seyfert galaxy NGC 3783
D. P. Summons1,3⋆, P. Arévalo1 , I. M. McHardy1, P. Uttley2 and A. Bhaskar3
1School of Physics and Astronomy, University of Southampton, Southampton SO17 1BJ, UK
2Astronomical Institute ‘Anton Pannekoek’, University of Amsterdam, Kruislaan 403, 1098 SJ, Amsterdam, the Netherlands
3School of Engineering Science, University of Southampton, Southampton SO17 1BJ, UK
Received /Accepted
ABSTRACT
Previous observations with the Rossi X-ray Timing Explorer (RXTE) have suggested
that the power spectral density (PSD) of NGC 3783 flattens to a slope near zero at low
frequencies, in a similar manner to that of Galactic black hole X-ray binary systems
(GBHs) in the ‘hard’ state. The low radio flux emitted by this object, however, is
inconsistent with a hard state interpretation. The accretion rate of NGC 3783 (∼ 7%
of the Eddington rate) is similar to that of other AGN with ‘soft’ state PSDs and
higher than that at which the GBH Cyg X-1, with which AGN are often compared,
changes between ‘hard’ and ‘soft’ states (∼ 2% of the Eddington rate). If NGC 3783
really does have a ‘hard’ state PSD, it would be quite unusual and would indicate
that AGN and GBHs are not quite as similar as we currently believe. Here we present
an improved X-ray PSD of NGC 3783, spanning from ∼ 10−8 to ∼ 10−3 Hz, based
on considerably extended (5.5 years) RXTE observations combined with two orbits
of continuous observation by XMM-Newton. We show that this PSD is, in fact, well
fitted by a ‘soft’ state model which has only one break, at high frequencies. Although
a ‘hard’ state model can also fit the data, the improvement in fit by adding a second
break at low frequency is not significant. Thus NGC 3783 is not unusual. These results
leave Arakelian 564 as the only AGN which shows a second break at low frequencies,
although in that case the very high accretion rate implies a ‘very high’, rather than
‘hard’ state PSD. The break frequency found in NGC 3783 is consistent with the
expectation based on comparisons with other AGN and GBHs, given its black hole
mass and accretion rate.
Key words: galaxies: active – galaxies: Seyfert – galaxies: NGC 3783 – X rays:
galaxies
1 INTRODUCTION
Super-massive black holes in active galactic nuclei (AGN)
and Galactic stellar-mass black hole X-ray binary sys-
tems (GBHs) both display aperiodic X-ray variability
which may be quantified by calculating the power spec-
tral densities (PSDs) of the X-ray light curves. The PSDs
can typically be represented by red-noise type power
laws (i.e. P (ν), the power at frequency ν, ∝ να where
α ∼ -1) with a bend or break (to α 6 -2) at a
characteristic PSD frequency. The time-scale, correspond-
ing to the bend-frequency, scales approximately linearly
with black hole mass from AGN to GBHs (McHardy
1988; Edelson & Nandra 1999; Uttley et al. 2002, 2005;
Markowitz et al. 2003; McHardy et al. 2004, 2005), albeit
⋆ E-mail: dps@astro.soton.ac.uk
with some scatter. However, the scatter is entirely accounted
for by variations in accretion rate, allowing scaling between
AGN and GBHs on time-scales from ∼ years to ∼ ms
(McHardy et al. 2006).
GBHs are observed in a number of distinct X-ray spec-
tral states which also have distinct X-ray timing proper-
ties. Two common states are the low/hard (hereafter ‘hard’)
and high/soft (hearafter ‘soft’) states. In the hard state, the
energy-spectrum is dominated by a highly variable power
law component and the PSDs are well fitted by multiple
broad Lorentzians. For use in AGN, where signal/noise is
lower than in GBHs, this PSD shape can be approximated by
a double-bend power law with slopes α = 0, -1 and -2, from
low to high frequency, where the high- and low-frequency
bends correspond to the strongest peaks in the Lorentzian
parameterisation. The breaks are typically separated by only
one to two decades in frequency. In the soft state, the en-
c© 2006 RAS
http://arxiv.org/abs/0704.0721v1
2 D. P. Summons et al
ergy spectrum is dominated by an approximately constant
thermal disc component which extends into the X-ray band
in GBHs but which in AGN is shifted down to the opti-
cal/UV band. Therefore, a meaningful comparison between
the PSDs of soft state GBH and AGN can only be made in
cases where the GBH power-law emission is strong enough
to show significant variability. Such GBHs are rare and the
best example is Cyg X-1 which shows a ‘1/f’ PSD over many
decades of frequencies (Reig et al. 2002). The soft state is
distinguished from the hard state by having only one, high
frequency, break in this power law, from slope -1 to -2.
It has been suggested that this pure simple broken or
cut-off power-law PSD shape is unique to the soft state of
Cyg X-1, which is a persistent source. However in transient
GBHs with similar X-ray spectra, the power law PSD com-
ponent may be seen in combination with broad Lorentzian
features (Done & Gierliński 2005). Axelsson et al. (2006)
also note that a mixed power law plus Lorentzian PSD is
also present in Cyg X-1 in lower luminosity, harder spectral
states, but as the luminosity rises the Lorentzian features
weaken and the power law PSD component strengthens un-
til, in the softest state, it completely dominates. Since the
softest spectral states of transient GBHs are dominated by
constant disc emission we cannot determine whether they
show a similar PSD shape to Cyg X-1.
However, a direct comparison of transient GBHs and
Cyg X-1 is complex, since the transients show much larger
luminosity changes, and complex hysteresis effects in spec-
tral hardness versus luminosity (e.g. Homan et al. 2001,
Belloni et al. 2005) which are not seen in Cyg X-1. There-
fore it is not clear that one can compare timing properties
between Cyg X-1 and transient GBHs simply as a function
of observed X-ray spectrum.
None the less, it is still interesting that the X-ray spec-
trum of Cyg X-1 never becomes totally disc-dominated,
and always contains a relatively strong variable component
whose PSD resembles that of X-ray bright AGN. If variabil-
ity originates, at least partly, in the disc, so power spectral
shape is related to the disc structure, that structure might
be severely disrupted during outbursts, thereby suggesting
a possible difference between the persistent Cyg X-1 and the
transient GBH sources. The similarities between the PSDs
of Cyg X-1 and AGN may also be related to the possible
similarities in accretion flows between AGN and Cyg X-1
noted by Done & Gierliński (2005).
To date, NGC 3783 and the Narrow Line Seyfert 1
Galaxy (NLS1) Ark 564 are the only AGN with suggested
second, low-frequency breaks in their PSDs (i.e. similar to
low/hard GBHs) and are both commonly referred to as be-
ing unusual (e.g. Done & Gierliński 2005). The power spec-
tral evidence for a second break in the case of Ark 564
is very strong (Pounds et al. 2001; Papadakis et al. 2002;
Markowitz et al. 2003, McHardy et al. in prep.). Of all the
AGN with good timing data, Ark 564 shows the highest
accretion-rate (possibly super-Eddington) so it would not be
surprising if it were in an unusual state, e.g. the ‘very high’
state where the PSD, in GBHs, also displays two distinct
breaks. The properties of NGC 3783, on the other hand, are
similar to those of AGN with proven soft-state PSDs (e.g.
NGC 3227, NGC 4051 McHardy et al. 2004, MCG-6-30-15
McHardy et al. 2005), and in particular it is radio quiet
(e.g. Reynolds 1997). In the hard state, GBHs are strong
radio sources whereas in the soft state the radio emission is
quenched (Corbel et al. 2000; Fender 2001; Körding et al.
2006). We also note that NGC 3783 has a more moder-
ate accretion rate than Ark 564 (∼ 7%), and more similar
to the other AGN mentioned above, and Cyg X-1 changes
from the hard to the soft state at around 2% of the Edding-
ton accretion rate (i.e. ṁE =0.02) (Pottschmidt et al. 2003;
Wilms et al. 2006; Axelsson et al. 2006). These two facts do
not lie easily with a hard state identification of NGC 3783.
Thus it would be surprising, and might indicate that our cur-
rent ideas regarding the scaling between AGN and GBHs are
not entirely correct, if NGC 3783 were proven to have a hard
state PSD. It is therefore important to determine whether
NGC 3783 does have a second, low frequency, break in its
PSD or not.
Markowitz et al. (2003) recognised the presence of a
break in the 2− 10 keV PSD of NGC 3783 at 4× 10−6 Hz
and found provisional evidence for a second lower-frequency
break at ∼ 2×10−7 Hz. Specifically, Markowitz et al. (2003)
rejected the possibility that the PSD is described by a single-
break power law with low-frequency slope -1, similar to other
AGN, at the 98% confidence level. In this paper we re-
investigate the evidence for the second break in the PSD of
NGC 3783, using new long-term monitoring data that cov-
ers the frequency range where the break appears to be. By
including additional RXTE archival data spanning several
years, along with short time-scale observations by XMM-
Newton, we will demonstrate that the improved PSD is
perfectly compatible with a single-bend power law, consis-
tent with the behaviour of the other moderately-accreting
Seyferts. In Section 2 we describe the observations and the
methods by which we extract the RXTE and XMM-Newton
light curves. In Sections 3 we discuss the PSD of NGC 3783
as produced from the RXTE and XMM-Newton observa-
tions, and compare it with various PSD models. In Section
4 we briefly review the implications of our observations.
2 OBSERVATIONS AND DATA REDUCTION
2.1 RXTE Data Reduction
From 1999 to 2006, NGC 3783 has been the target of various
monitoring campaigns with RXTE. These campaigns have
consisted of short, ∼1 ks duration, observations with the
proportional counter array (PCA, Zhang et al. 1993). We
have analysed the archival PCA STANDARD-2 data and
our own proprietary data with FTOOLS v6.0.2 using stan-
dard extraction methods. We use data from the top layers of
PCUs 0 and 2 up to 2000 May 12 and only top layer PCU 2
data from observations after this date. The remaining PCUs
were not used due to repeated breakdowns.
Data were selected according to the standard ‘good-
time’ criteria, i.e. target elevation < 10◦, offset pointing
< 0.02◦, and electron contamination < 0.1. The background
was simulated with the L7 model for faint sources using
PCABACKEST v3.0. The response matrices for each PCA
observation were calculated using PCARSP v10.1. The fi-
nal 2− 10 keV fluxes were calculated using XPSEC v12.2.1
by fitting a power law with galactic absorption to the PHA
data.
The RXTE data used in our analysis, together with the
c© 2006 RAS, MNRAS 000, 1–8
Accretion state of NGC 3783 3
Figure 1. RXTE long-term light curve of NGC 3783 in the 2-
10keV band.
Figure 2. RXTE intense sampling light curve in the 2-10 keV
band of NGC 3783.
sampling patterns, are listed in Table 1 and displayed in
Fig. 1. The early data (to MJD 52375) with 4 d sampling,
together with the 20 d period of 3 h sampling already pre-
sented by Markowitz et al. (2003) are followed, after a 2 year
gap, by our new long-term monitoring, with 2 d sampling.
As the gap is large compared to the duration of each moni-
toring campaign we will include data from each monitoring
campaign as separate lightcurves in our fits.
2.2 XMM-Newton Data Reduction
NGC 3783 was observed by XMM-Newton during revolu-
tions 371 and 372, between 2001 December 17 and 2001 De-
cember 21. Temporal analysis of these data were first pre-
sented by Markowitz (2005) who discusses the coherence,
frequency-dependent phase lags, and variation of high fre-
quency PSD slope with energy. Here we use these data to
constrain the high frequency part of the overall long and
short timescale PSD. We used data from the European Pho-
ton Imaging Cameras (EPIC) PN and MOS2 instruments,
which were operated in imaging mode. MOS1 was operated
in Fast Uncompressed Mode and we do not use those data
here. The PN camera was operated in Small Window mode,
using the medium filter. Source photons were extracted from
a circular region of 40′′ radius and the background was se-
Figure 3. XMM-Newton light curve in the 0.2–2 keV band of
NGC 3783.
lected from a source-free region of equal area on the same
chip. We selected single and double events, with quality
flag=0. The MOS2 camera was operated in the Full Win-
dow mode, using the medium filter. We extracted source
and background photons using the same procedure as for
the PN data and selected single, double, triple and quadru-
ple events. These data showed no serious pile-up when tested
with the XMM-SAS task epatplot.
We constructed light curves, for each detector and or-
bit, in the 0.2–2, 2–10 and 4–10 keV energy bands. We filled
in the ∼ 5 ks gap in the middle of orbit 371 light curves, and
some other much smaller gaps, by interpolation and added
Poisson noise. The resulting PN and MOS2 continuous light
curves were then combined to produce the final light curves
for each orbit. The combined, background subtracted, av-
erage count rates in the 0.2–10 keV band were 11.8 c/s for
orbit 371 and 15.8 c/s for orbit 372, and the 0.2-2 keV com-
bined light curve is shown in Fig. 3. Poisson noise dominates
the PSD on timescales shorter than 1000s, so the light curves
were binned into 200s bins.
3 POWER SPECTRAL MODELLING
3.1 Combining RXTE and XMM-Newton data
To determine the PSD over the largest possible frequency
range we combine the RXTE and XMM-Newton data. In
GBHs the break-frequency and slope of the PSD below the
break appear to be independent of the chosen energy band
(Cui et al. 1997; Churazov et al. 2001; Nowak et al. 1999;
Revnivtsev et al. 2000; McHardy et al. 2004). On the other
hand, the PSD normalisation and the slope above the break
are often energy-dependent (Markowitz 2005). Therefore,
when combining data from different instruments, it is prefer-
able to use similar energy ranges. The RXTE data are in
the 2–10 keV band and, for NGC 3783, that band has a
median photon energy of 5.7 keV. The XMM-Newton band
with the same median photon energy is 4.1–10 keV. How-
ever the count rate in that XMM-Newton band is low (2
c/s) so we only detect significant source power above the
Poisson noise level at frequencies below 10−4Hz. To probe
higher frequencies we can use the 0.2–2 keV XMM-Newton
data (8.8 c/s) but we must re-scale its PSD normalisation to
c© 2006 RAS, MNRAS 000, 1–8
4 D. P. Summons et al
Light curve Sampling interval Observation length Date Range [MJD]
RXTE Long-term 1 ∼4.36 days 1194.6 days 51180.5–52375.1
RXTE Long-term 2 ∼2.1 days 928.3 days 53063.4–53991.6
RXTE Intense monitoring ∼3.2 hours 19.9 days 51960.1–51980.1
XMM-Newton observations (2 orbits) 200-s 3.2 days 52260.8–52264.0
Table 1. Summary of the RXTE and XMM-Newton light curves used in the analysis of NGC 3783, including their sampling frequency
and date range.
that of the 4–10 keV PSD. We determined the scaling cor-
rection by producing PSDs in both energy bands and fitting
the same bending power law model to the noise-subtracted
data. On the assumption that the PSD shape below the high
frequency break is energy-independent, the combined RXTE
2–10 keV and XMM-Newton 0.2–2 keV PSD will then have
the shape of the 0.2–2 keV PSD.
3.2 Monte Carlo simulations
We use the Monte Carlo technique of Uttley et al. (2002)
(PSRESP), to estimate the underlying PSD parameters in
the presence of sampling biases. In this method we first
calculate the observed (or ‘dirty’) PSD, in parts, from the
observed lightcurves, using the Discrete Fourier Transform.
Here the PSD estimates are binned in bins of width 1.3ν,
where ν is the starting frequency by taking the average of
the log of power (Papadakis & Lawrence 1993). We require
a minimum of 4 PSD estimates per bin. We then compare si-
multaneously the dirty PSDs from each lightcurve with var-
ious model PSDs derived from lightcurves simulated with
the same sampling pattern as the real observations. We al-
ter the model parameters to obtain the best fit for any given
model. We refer the reader to Uttley et al. (2002) for a full
discussion of the method.
For each set of chosen underlying-PSD model param-
eters, we simulate red-noise light curves, as described by
Timmer & Koenig (1995). The RXTE light curves are sim-
ulated with time resolutions of 10.5 h, 5.0 h and 18 m for the
first and second long time-scale and the medium time-scale
light curves respectively. The simulated resolution, which is
10 times shorter than the typical sampling intervals of the
real observations, given in column 2 of Table 1, is to take into
account the effect of aliasing. These simulated light curves
were resampled and binned to match the real NGC 3783 ob-
servations. XMM-Newton light curves were simulated with
200-s resolution, as at shorter time-scales the underlying
varability power is negligible compared to the Poisson noise,
so aliasing does not play a role. The Poisson noise level was
not subtracted from the observed PSD but was added to
the simulated PSDs. To reproduce the effect of red-noise
leak, each light curve was simulated to be ∼300 times longer
than the real observation, and was then split into sections,
constituting 300 simulated light curves for each observed
lightcurve. The simulated model average PSD is evaluated
from this ensemble of PSD realisations, and the errors are
assigned from the rms spread of the realisations within a
frequency bin.
We present the results of several PSD model fits in an
attempt to quantify the underlying model shape that best
describes the PSD of NGC 3783, and associate an accep-
Figure 4. The best fit unbroken power law PSD of the combined
RXTE and XMM-Newton data. The solid lines represent the ob-
served data and the points with error bars exhibit the biased
model and the spread in individual realisations of the model. The
dashed line is the underlying model used to generate the simu-
lated PSD. The three lowest frequency data sets are from RXTE
observations and the high frequency data set (∼10−5 Hz) is from
the XMM-Newton observations. Note that the rise in power at
the highest frequencies is due to the photon Poisson noise.
tance probability with each model. We initially test a sim-
ple unbroken power law model. We next fit a power law with
a single-bend in the PSD, and then a model incorporating
a double-bend. We also fit a single-bend power law with a
Lorentzian component.
3.3 Unbroken power law model
To begin, we fitted a simple power law model to the data of
the form:
P (ν) = A
where A is the normalisation at a frequency ν0, and α
is the power law slope. We made 900 simulations and in Fig.
4 we show the best fit plotted in ν × Pν , which has a PSD
slope of -2.1. However, the fit is poor and this model can be
rejected with a probability > 99 %, or & 3 σ.
3.4 Single-bend power law model
Here we fit a single-bend power law to the data. This model
best describes the PSD of Cyg X-1 in the high/soft state,
and provides a good fit to the PSDs of the AGN NGC 4051
and MCG–6-30-15 (McHardy et al. 2004, 2005).
c© 2006 RAS, MNRAS 000, 1–8
Accretion state of NGC 3783 5
Figure 5. The best fit single-bend power law PSD of the com-
bined RXTE and XMM-Newton data. The various lines represent
the same data as seen in Fig. 4.
Figure 6. The best fit double-bend power law PSD of the com-
bined RXTE and XMM-Newton data. The various lines represent
the same data as seen in Fig. 4.
P (ν) =
A ναL
”αL−αH
Fig. 5 presents the observed PSD fitted with a single-
bend power law model, for which a good likelihood of accep-
tance is obtained (P = 44 %). The best fit bend-frequency is
νB = 6.2
+40.6
−5.6 × 10
−6 Hz, the high-frequency slope is αH =
−2.6+0.6
, and the low-frequency slope is αL = −0.8
−0.5.
The errors are 90% confidence limits, an asterisk indicates
that the limit is unconstrained. For αH the best fit value
is well within the searched parameter space but the degen-
eracy produced by red-noise leak in the probability at high
values of αH , means that the upper limit is not constrained
at the 90% confidence level. The confidence contours for the
main interesting parameters are plotted in Figs. 7 and 8. Ta-
ble 2 shows the single-bend power law best fit parameters to
the data. The best fit single-bend frequency obtained here is
consistent with the value found by Markowitz et al. (2003)
3.5 Double-bend power law model
Markowitz et al. (2003) provide tentative evidence that
Figure 7. Single-bend power law model: 68, 90, and 99% confi-
dence contours for the bend frequency, νB, and the high frequency
slope, −αH , for the single-bend power law fit to the combined
RXTE and XMM-Newton PSD.
Figure 8. Single-bend power law model: 68, 90, and 99% confi-
dence contours for the bend frequency, νB, and the low frequency
slope, −αL, for the single-bending power law fit to the combined
RXTE and XMM-Newton PSD.
a second, lower, frequency break exists in the PSD of
NGC 3783. Thus, we also fitted a more complex double-bend
power law model to see if the goodness-of-fit is improved sig-
nificantly. The double-bend power law model is given by:
P (ν) =
A ναL
”αL−αI
”αI−αH
where αI is the intermediate-frequency slope and νL
and νH are the low and high bend-frequencies respectively.
We fixed the low-frequency slope to zero, to avoid making
the simulations computationally prohibitive, and because a
low-frequency slope of zero would allow the best qualitative
comparison to the low state of Cyg X-1 (Nowak et al. 1999).
Fig. 6 presents the same observed PSD as in Fig. 5,
but fitted with the double-bend power law model. A good
likelihood of acceptance is obtained (P=64 %). The best-
fitting high bend-frequency is νH = 2.6
× 10−5 Hz, the
high-frequency slope is αH = −3.2
, the intermediate-
frequency slope is αI = −1.3
, the low-frequency bend is
νL = 1.7
×10−7 Hz. As before, we use 90% confidence lim-
c© 2006 RAS, MNRAS 000, 1–8
6 D. P. Summons et al
Model Normalisation αH αI αL νH νL Acceptance
(a) (Hz) (Hz) (%)
Single-bend 1.5× 10−4 −2.6
NA −0.8
+40.6
× 10−6 NA 44.4
Double-bend 1.0× 102 −3.2
0.0 2.6
× 10−5 1.7
× 10−7 63.9
Table 2. Best fit model parameters for the examined models to the combined RXTE and XMM-Newton PSD of NGC 3783. The errors
on the single- and double-bend fits are calculated from the 90% confidence intervals. The bend-frequency for the single-bend model, νB ,
is denoted here as νH . An asterisk indicates that the limit is unconstrained.
its. The added parameters allow extra freedom to find better
fit probabilities for any given set of double-bend parameters.
For this reason, the contour levels cover larger ranges in the
parameter space and therefore, most of the 90% contours
in our double-bend fit remain unbounded over the fitted
parameter space. The high-frequency slope is subject to the
same problems as in the single-bend model. Table 2 contains
a summary of the best-fitting model parameters.
The best-fitting low-frequency bend is found close to the
lowest frequencies probed by the data and, as seen in Fig. 9,
it is essentially unbounded down to the lowest measurable
frequency at the 68% confidence level. These facts suggest
that the second, low-frequency, bend might not be required
by the data and that the improvement in the fit might be
only due to the increased complexity of the model fitted.
The likelihood of acceptance is better in the double-
bend model than in the single-bend model, 64 versus 44 %
respectively, but there are more free parameters. In order
to determine the significance of this improvement, we per-
formed the following test. Using the best-fitting single-bend
PSD parameters, we generated 300 realisations of the sets of
RXTE and XMM-Newton lightcurves. Each realisation was
then fitted with the best-fitting double-bend parameters, ex-
actly as was done with the real data, and the distribution
of their fit probabilities was constructed. We found that 121
out of the 300 single-bend simulations have a higher fit prob-
ability than the real data, when fitted with the double-bend
model. Therefore, we conclude that the improvement in fit
probability is no more than may be expected from fitting a
model which is more complicated than required by the data:
the double-bend model does not represent a significant im-
provement.
3.6 Single-bend power law with a Lorentzian
component
We finally consider whether the observed PSD might be
best-described by adding a Lorentzian component, such as
are commonly used to describe broad-band noise compo-
nents in GBHs (e.g. Nowak 2000), to the single-bend power
law. We are motivated to consider this possibility because
the PSD of the intense-sampling RXTE light curve is not
very well described by either the single- or double-bend
power law model. Visual inspection of this light curve, shown
in Fig. 2, suggests that the variability is strongly concen-
trated on time-scales of around a day, or equivalently, fre-
quencies around 10−5 Hz, which is confirmed by the peak
seen in the corresponding section of the PSD, and the drop
in the same PSD at lower frequencies (∼ 10−6Hz). The
long-term monitoring PSDs, however, do not show a dip
at 10−6Hz, creating a large discrepancy in the PSD mea-
Figure 9. Double-bend power law model: 68, and 90% confi-
dence contours for the high bend-frequency, νH , and the low bend-
frequency, νL, for the double-bend power law fit to the combined
RXTE and XMM-Newton PSD.
surements at this frequency. A strongly peaked component
in the underlying PSD, at ∼ 10−5 Hz, could produce the
observed features. Such a component would appear as a
peak in a PSD that covered frequencies above and below
its peak-frequency, but would be insufficiently sampled by
the long-term monitoring campaigns; thus, its power would
be aliased into the highest frequencies of the longer time-
scale data, making them rise above the underlying model
level and causing the apparent disparity.
The Lorentzian profile is described by:
PLor(ν) =
ν2c + 4Q
2(νc − ν)2
where the centroid frequency νc is related to the peak-
frequency νp by νp = νc
1 + 1/4Q2 and the quality factor
Q is equal to νc divided by the full width at half maximum
of the Lorentzian. The variable A parameterizes the relative
contribution of the power law and Lorentzian components
to the total rms. Fitting a Lorentzian component in addition
to a single-bend power law provides a good fit (P=52 %).
The best-fitting Lorentzian contributes 20% of the variance
in the frequency range probed and its best-fitting parame-
ters are quoted in Table 3. Fig. 10 shows the observed PSD
compared with the best-fitting single-bend power law model
plus a Lorentzian component. The Lorentzian feature in the
model can reproduce qualitatively the spurious power at the
high frequency end of the long-term monitoring data and the
turn down effect observed in the intensive-sampling data.
To determine the significance of the Lorentzian compo-
nent fit we repeated the procedure used in determining the
c© 2006 RAS, MNRAS 000, 1–8
Accretion state of NGC 3783 7
νp νB Q A αL αH Acceptance
(Hz) (Hz) (%)
× 10−6 1.1
× 10−5 5.1
Table 3. Best fit single-bend power law with Lorentzian component model parameters to the combined RXTE and XMM-Newton PSD
of NGC 3783, where νp is the Lorentzian peak frequency, Q is its quality factor, νB is the power law bend frequency and αL and αH
are the power law slopes below and above the bend, respectively. The errors are calculated from the 68% confidence intervals, and an
asterisk indicates that the limit is unconstrained.
Figure 10. The best-fitting single-bend power law with a
Lorentzian component. The fit was done using the entire data
set but here we only show the Lorentzian region. As before, solid
lines represent the real data PSD, dashed lines represent the best-
fitting model and markers with error bars represent the model
distorted by sampling effets. The Lorentzian feature in the model
can reproduce qualitatively the spurious power at the high fre-
quency end of the long-term monitoring data and the turn down
effect observed in the intensive-sampling data.
significance of the double-bend model. We found that 222 of
the 300 single-bend simulated PSDs have a higher fit proba-
bility than the data, when fitted with the single-bend power
law plus Lorentzian model. This result indicates that the
increase in fit probability could be due to the added com-
plexity of the model, and that the improvement in the fit
over a simple bending power law is not significant.
4 DISCUSSION AND CONCLUSIONS
We have combined our own new RXTE monitoring data
with archival RXTE and XMM-Newton observations to
construct a high-quality PSD of NGC 3783 spanning five
decades in frequency.
We find that a ‘soft’ state model, with a single
bend at 6.2 × 10−6 Hz, similar to that found earlier by
Markowitz et al. (2003), a power law of slope approximately
-0.8 extending over almost three decades in frequency below
the bend, and slope above the bend of approximately -2.6
is a good fit to the data. We also find that a ‘hard’ state
model, with a double bend, fits the data, as does a model
with a single bend plus an additional Lorentzian compo-
nent. However the improvement in fit is marginal and, given
the additional free parameters, is not significant. Thus we
conclude that a simple ‘soft’ state model provides the most
likely explanation of the data.
Assuming a mass of 3 × 107M⊙ for NGC 3783
(Peterson et al. 2004), and an accretion rate of 7% of
the Eddington limit (Uttley & McHardy 2005, based on
Woo & Urry 2002), then NGC 3783 is still in good agree-
ment with the scaling of PSD break timescale as ∼ M/ṁE
between AGN and GBHs found by McHardy et al. (2006).
Our new fits, show that the PSD of NGC 3783 is per-
fectly consistent with a single-bend power law with low-
frequency slope of -1, in contrast with the earlier result of
Markowitz et al. (2003), who found that a similar model
was rejected tentatively at ∼ 98% confidence. The differ-
ence can be understood in terms of the improved long-term
data. Our new RXTE monitoring observations occur every 2
days, compared to 4 days previously, thereby increasing the
long term RXTE data set by a factor 2.6 and, in particular,
providing overlap at high frequencies with the RXTE inten-
sive monitoring data. The drop in long-timescale variability
power, evident in the older long term monitoring data is not
reproduced by the new long-term monitoring data, showing
that this drop could be just a statistical fluctuation. In addi-
tion, the very high frequencies are better constrained by the
2 orbits of XMM-Newton data than by the earlier Chandra
data used by Markowitz et al. (2003).
The classification of the PSD as being ‘soft’ state means
that NGC 3783 is no longer considered unusual amongst
AGN. The fact that this AGN is radio-quiet strongly sup-
ports the analogy with GBHs in the soft state. Also the ac-
cretion rate of NGC 3783 ( ṁE =0.07) (Uttley & McHardy
2005, based on Woo & Urry 2002) is similar to that
of other AGN with soft-state PSDs (e.g. NGC 3227
Uttley & McHardy 2005, NGC 4051 McHardy et al. 2004,
MCG-6-30-15 McHardy et al. 2005). This accretion rate is
above the rate at which the persistent GBH Cyg X-1 tran-
sits between hard and soft states in either direction and at
which other GBHs transit from the soft to hard state ( ṁE
=0.02) (Maccarone et al. 2003; Maccarone 2003). We note
that other transient GBHs in outburst, where the variable
power law emission in the soft state PSD is weak, can re-
main in the hard state to much higher accretion rates (∼ 2–
50% Homan & Belloni 2005) but it is not clear whether we
should expect similar PSD shapes to AGN for such outburst-
ing sources. Thus NGC 3783 remains compatible with other
moderately accreting AGN in being analogous to Cyg X-1
in the soft state. It is, of course, possible that the transition
rate might not be independent of mass. Observations do not
yet greatly constrain the transition rate as a function of mass
but the abscence of large deviations from the so-called ‘fun-
damental’ plane of radio luminosity, X-ray luminosity and
black hole mass (Merloni et al. 2003; Falcke et al. 2004) ar-
gues against a large spread in the transistion accretion-rates
(e.g. see Körding et al. 2006). In the case of Seyfert galaxy
c© 2006 RAS, MNRAS 000, 1–8
8 D. P. Summons et al
NGC 3227, the accretion rate is ∼ 1–2% and a ‘soft’ state
PSD is measured (Uttley & McHardy 2005), which suggests
that the transition accretion-rate in AGN should be at or
below that value.
Our observations, which show that NGC 3783 does not
have a highly unusual PSD, therefore confirm the growing
similarities between AGN and Galactic black hole systems
and leave only Arakelian 564, which is probably a very high
state object, as the only AGN showing clear double breaks
(or multiple Lorentzians) in its PSD (e.g. Arévalo et al.
2006, McHardy et al. in prep.).
ACKNOWLEDGEMENTS
We would like to thank the referee, Chris Done, for use-
ful comments and suggestions. This research has made use
of the data obtained from the High Energy Astrophysics
Science Archive Research Center (HEASARC), provided by
NASA’s Goddard Space Flight Center. We would like to
thank Information Systems Services (ISS) at the Univer-
sity of Southampton for the use of their Beowulf cluster,
Iridis2. PU acknowledges support from a Marie Curie Inter-
European Research Fellowship.
REFERENCES
Arévalo P., Papadakis I. E., Uttley P., McHardy I. M.,
Brinkmann W., 2006, MNRAS, 372, 401
Axelsson M., Borgonovo L., Larsson S., 2006, A&A, 452,
Belloni T., Homan J., Casella P., van der Klis M., Nespoli
E., Lewin W. H. G., Miller J. M., Méndez M., 2005, A&A,
440, 207
Churazov E., Gilfanov M., Revnivtsev M., 2001, MNRAS,
321, 759
Corbel S., Fender R. P., Tzioumis A. K., Nowak M., McIn-
tyre V., Durouchoux P., Sood R., 2000, A&A, 359, 251
Cui W., Heindl W. A., Rothschild R. E., Zhang S. N., Ja-
hoda K., Focke W., 1997, ApJL, 474, L57+
Done C., Gierliński M., 2005, MNRAS, 364, 208
Edelson R., Nandra K., 1999, ApJ, 514, 682
Falcke H., Körding E., Markoff S., 2004, A&A, 414, 895
Fender R. P., 2001, MNRAS, 322, 31
Homan J., Belloni T., 2005, ApSS, 300, 107
Homan J., Wijnands R., van der Klis M., Belloni T., van
Paradijs J., Klein-Wolt M., Fender R., Méndez M., 2001,
ApJS, 132, 377
Körding E. G., Fender R. P., Migliari S., 2006, MNRAS,
369, 1451
Körding E. G., Jester S., Fender R., 2006, MNRAS, 372,
Maccarone T. J., 2003, A&A, 409, 697
Maccarone T. J., Gallo E., Fender R., 2003, MNRAS, 345,
Markowitz A., 2005, ApJ, 635, 180
Markowitz A., Edelson R., Vaughan S., Uttley P., George
I. M., Griffiths R. E., Kaspi S., Lawrence A., McHardy I.,
Nandra K., Pounds K., Reeves J., Schurch N., Warwick
R., 2003, ApJ, 593, 96
McHardy I., 1988, Memorie della Societa Astronomica Ital-
iana, 59, 239
McHardy I. M., Gunn K. F., Uttley P., Goad M. R., 2005,
MNRAS, 359, 1469
McHardy I. M., Koerding E., Knigge C., Uttley P., Fender
R. P., 2006, Nature, 444, 730
McHardy I. M., Papadakis I. E., Uttley P., Page M. J.,
Mason K. O., 2004, MNRAS, 348, 783
Merloni A., Heinz S., di Matteo T., 2003, MNRAS, 345,
Nowak M. A., 2000, MNRAS, 318, 361
Nowak M. A., Vaughan B. A., Wilms J., Dove J. B., Begel-
man M. C., 1999, ApJ, 510, 874
Papadakis I. E., Brinkmann W., Negoro H., Gliozzi M.,
2002, A&A, 382, L1
Papadakis I. E., Lawrence A., 1993, MNRAS, 261, 612
Peterson B. M., Ferrarese L., Gilbert K. M., Kaspi S.,
Malkan M. A., Maoz D., Merritt D., Netzer H., Onken
C. A., Pogge R. W., Vestergaard M., Wandel A., 2004,
ApJ, 613, 682
Pottschmidt K., Wilms J., Nowak M. A., Pooley G. G.,
Gleissner T., Heindl W. A., Smith D. M., Remillard R.,
Staubert R., 2003, A&A, 407, 1039
Pounds K., Edelson R., Markowitz A., Vaughan S., 2001,
ApJL, 550, L15
Reig P., Papadakis I., Kylafis N. D., 2002, A&A, 383, 202
Revnivtsev M., Gilfanov M., Churazov E., 2000, A&A, 363,
Reynolds C. S., 1997, MNRAS, 286, 513
Timmer J., Koenig M., 1995, A&A, 300, 707
Uttley P., McHardy I. M., 2005, MNRAS, 363, 586
Uttley P., McHardy I. M., Papadakis I. E., 2002, MNRAS,
332, 231
Uttley P., McHardy I. M., Vaughan S., 2005, MNRAS, 359,
Wilms J., Nowak M. A., Pottschmidt K., Pooley G. G.,
Fritz S., 2006, A&A, 447, 245
Woo J.-H., Urry C. M., 2002, ApJ, 579, 530
Zhang W., Giles A. B., Jahoda K., Soong Y., Swank J. H.,
Morgan E. H., 1993, in Siegmund O. H., ed., Proc. SPIE
Vol. 2006, p. 324-333, EUV, X-Ray, and Gamma-Ray In-
strumentation for Astronomy IV, Oswald H. Siegmund;
Ed. Laboratory performance of the proportional counter
array experiment for the X-ray Timing Explorer. pp 324–
c© 2006 RAS, MNRAS 000, 1–8
	INTRODUCTION
	OBSERVATIONS AND DATA REDUCTION
	RXTE Data Reduction
	XMM-Newton Data Reduction
	Power spectral Modelling
	Combining RXTE and XMM-Newton data
	Monte Carlo simulations
	Unbroken power law model
	Single-bend power law model
	Double-bend power law model
	Single-bend power law with a Lorentzian component
	Discussion and Conclusions
ABSTRACT
  Previous observations with the Rossi X-ray Timing Explorer (RXTE) have
suggested that the power spectral density (PSD) of NGC 3783 flattens to a slope
near zero at low frequencies, in a similar manner to that of Galactic black
hole X-ray binary systems (GBHs) in the `hard' state. The low radio flux
emitted by this object, however, is inconsistent with a hard state
interpretation. The accretion rate of NGC 3783 (~7% of the Eddington rate) is
similar to that of other AGN with `soft' state PSDs and higher than that at
which the GBH Cyg X-1, with which AGN are often compared, changes between
`hard' and `soft' states (~2% of the Eddington rate). If NGC 3783 really does
have a `hard' state PSD, it would be quite unusual and would indicate that AGN
and GBHs are not quite as similar as we currently believe. Here we present an
improved X-ray PSD of NGC 3783, spanning from ~10^{-8} to ~10^{-3} Hz, based on
considerably extended (5.5 years) RXTE observations combined with two orbits of
continuous observation by XMM-Newton. We show that this PSD is, in fact, well
fitted by a `soft' state model which has only one break, at high frequencies.
Although a `hard' state model can also fit the data, the improvement in fit by
adding a second break at low frequency is not significant. Thus NGC 3783 is not
unusual. These results leave Arakelian 564 as the only AGN which shows a second
break at low frequencies, although in that case the very high accretion rate
implies a `very high', rather than `hard' state PSD. The break frequency found
in NGC 3783 is consistent with the expectation based on comparisons with other
AGN and GBHs, given its black hole mass and accretion rate.

<|endoftext|><|startoftext|>
Two- and three-point Green’s functions in two-dimensional Landau-gauge Yang-Mills
theory
Axel Maas1, ∗
Department of Complex Physical Systems, Institute of Physics,
Slovak Academy of Sciences, Dúbravská cesta 9, SK-845 11 Bratislava, Slovakia
(Dated: November 4, 2018)
The ghost and gluon propagator and the ghost-gluon and three-gluon vertex of two-dimensional
SU(2) Yang-Mills theory in (minimal) Landau gauge are studied using lattice gauge theory. It is
found that the results are qualitatively similar to the ones in three and four dimensions. The propa-
gators and the Faddeev-Popov operator behave as expected from the Gribov-Zwanziger scenario. In
addition, finite volume effects affecting these Green’s functions are investigated systematically. The
critical infrared exponents of the propagators, as proposed in calculations using stochastic quanti-
zation and Dyson-Schwinger equations, are confirmed quantitatively. For this purpose lattices of
volume up to (42.7 fm)2 have been used.
PACS numbers: 11.10.Kk 11.15.-q 11.15.Ha 12.38.Aw
I. INTRODUCTION
Two-dimensional Yang-Mills theory turns out to be a
very fascinating topic. Quite a number of quantities, e.
g. the string tension [1], can be calculated exactly, al-
though not all quantities are (yet) known analytically.
In particular, up to now it was not possible to calcu-
late the Green’s functions in Landau gauge. However,
exactly these Green’s functions may contain interesting
information.
The reason for this is confinement. In two-dimensional
Yang-Mills theory, confinement in Landau gauge is al-
ready manifest in perturbation theory: All elementary
fields, the gluons and ghosts, form a BRST quartet, and
thus are confined according to the Kugo-Ojima mecha-
nism [2]. This can be extended non-perturbatively, pro-
vided that BRST symmetry is unbroken beyond pertur-
bation theory. This makes explicit the absence of propa-
gating degrees of freedom in two-dimensional Yang-Mills
theory. But even without propagating degrees of free-
dom, this permits to investigate the manifestation of the
quartet mechanism on the level of the Green’s functions.
In addition, the reasoning for the confinement scenario
of Gribov and Zwanziger [3, 4, 5, 6] is applicable to two
dimensions as well [6]. However, this scenario has no
direct manifestation on the perturbative level, as in the
case of the quartet mechanism. It is only manifest in
the infrared properties of correlation functions. In par-
ticular, the Gribov-Zwanziger scenario predicts that the
Faddeev-Popov operator Mab accumulates near-zero or
zero eigenvalues. As a consequence, the ghost propagator
DG, being the expectation value of the inverse Faddeev-
Popov operator, should be infrared diverging. Detailed
calculations using stochastic quantization [6] or Dyson-
Schwinger equations (DSEs) [7, 8] lead to a power-law
∗Electronic address: axel.maas@savba.sk
behavior in the far infrared in any dimension from two
to four,
DG(p) ∼p→0 p
−2−2κ. (1)
Furthermore, the gluon propagator is infrared vanishing,
and thereby explicitly positivity violating. Its scalar part
also behaves like a power-law in the far infrared,
D(p) =
(d− 1)
δµν −
Dµν(p) ∼p→0 p
−2−2t,
where d is the space-time dimension. The two exponents
are related by the sum rule
t+ 2κ+
= 0. (3)
Under the assumption of an infrared bare ghost-gluon
vertex, two possible values for κ are found, 0 and 1/5
[6, 8]. If physics is smooth as a function of dimensionality,
the non-zero exponent would be expected due to the re-
sults obtained in three and in four dimensions [6, 7, 8, 9].
Note that in calculations using the renormalization group
in the case of a bare ghost-gluon vertex the same equa-
tions as in DSE calculations are obtained, thus leading
to the same results for the infrared exponents in any di-
mension [10].
These two scenarios are two of the most discussed for
the confinement mechanism of gluons also in higher di-
mensions, see e. g. for four dimensions the reviews [11]
and in three dimensions [6, 8, 12]. A verification of their
predictions using lattice gauge theory in higher dimen-
sions has, however, turned out to be very complicated,
mainly due to finite volume effects. In three dimensions
only a qualitative agreement between the predictions of
the Gribov-Zwanziger scenario and functional calcula-
tions has been obtained [12, 13]. In four dimensions, the
lattice results are inconclusive (see e. g. [14, 15, 16, 17]).
Studies using Dyson-Schwinger equations in a finite vol-
ume support that these problems are, in fact, finite vol-
ume effects, and provide even a quantitative prediction
http://arxiv.org/abs/0704.0722v2
mailto:axel.maas@savba.sk
of these in four dimensions [18]. The latter are in ac-
ceptable agreement with the results obtained in lattice
calculations [18].
Here, for two dimensions, the accessible lattices permit
a quantitative test of the predictions. It will be shown
that the predictions, assumptions, and actually the value
of κ = 1/5, of the Gribov-Zwanziger scenario are found
in lattice calculations, and hence there is very strong ev-
idence for the Gribov-Zwanziger scenario to be at work.
In fact, it is possible to quantify the finite volume effects.
Hence, in the following a quantitative confirmation of
the predictions of the Gribov-Zwanziger scenario using
lattice gauge theory for two-dimensional SU(2) Yang-
Mills theory in (minimal) Landau gauge will be given.
Of course, with such results, one question immediately
arises when comparing the two-dimensional results to
those in higher dimensions: Why do they agree qualita-
tively on the level of two- and three-point Green’s func-
tions in the infrared? This points to a structural origin
of both, the Gribov-Zwanziger and the Kugo-Ojima sce-
nario, provided both are, in fact, correct. It is partic-
ularly tempting to then investigate the relation of both
scenarios in two dimensions. Also how the relation of
two-dimensional Yang-Mills theory to topological field
theory [19] comes then into play is immediately on one’s
mind. These, and similar questions arise when contem-
plating the results, and indicate that there are many in-
teresting opportunities still present in the study of two-
dimensional Yang-Mills theory. These are highly inter-
esting questions, and must be investigated in the future.
Within this work, however, as a first step just the
results from the lattice calculations will be collected.
The two-point functions, and as associated quantities
the Faddeev-Popov operator and the running coupling,
will be investigated in section II. The three-point func-
tions will afterwards be discussed in section III. A short
summary of the results will be given in section IV. The
technicalities of the lattice simulations can be found in
appendix A. Lattice artifacts other than finite volume ef-
fects will be discussed in appendix B. That the suppres-
sion of color indices in equations (1) and (2) is justified
will be shown in appendix C.
II. TWO-POINT FUNCTIONS
The definition and determination of the two-point
functions on the lattice, and the associated quantities,
have been repeatedly discussed in the literature (see, e.
g., [5, 12, 14, 15, 16, 17]). Here, the methods described
in [12] are employed. Furthermore, the appearance of β-
factors to obtain the correct scaling has been discussed
there, also in case of the three-point functions. Hence,
this will not be repeated here. To assign units to the
quantities, the exactly calculable string tension [1] has
been assigned the conventional value (440 MeV)2, as in
higher dimensions.
p [GeV]
0 0.2 0.4 0.6 0.8 1
Gluon propagator
p [GeV]
0 0.5 1 1.5 2 2.5 3
Gluon dressing function
FIG. 1: The top panel shows the gluon propagator at small
momenta for various volumes. The lower panel shows the
gluon dressing function over the whole accessible momentum
range. Open circles correspond to a volume of (42.7 fm)2, full
squares to (14.2 fm)2, full triangles to (7.11 fm)2, and upside-
down full triangles to (2.02 fm)2. The solid line in the top
panel is the function 4.5p4/5.
A. Gluon propagator
The gluon propagator is the most readily accessible
two-point correlation function. The results for the prop-
agator D(p), (2), and its associated dressing function
p2D(p) are shown in figure 1. A strongly infrared sup-
pressed gluon propagator is clearly visible. At the same
time, the infrared suppression increases with increasing
physical volume. In particular, while on a volume of (2.02
fm)2 the propagator appears to be infrared diverging, a
clear maximum appears already at a volume only a factor
1/L [GeV]
-210 -110
Gluon propagator at zero momentum
FIG. 2: The zero-momentum value D(0) of the gluon propa-
gator as a function of inverse edge length. The straight line is
the power-law fit 5.67L−0.79 to the 20 points at the smallest
volumes.
(2-3)2 larger. Only the point at the lowest non-vanishing
momentum and the point at zero are not consistent with
an infrared vanishing gluon propagator. This is, however,
expected [18]. The scaling of D(0) with volume, shown
in figure 2, makes it very likely that in the infinite volume
limit the gluon propagator vanishes at zero momentum,
as it vanishes like a power-law with inverse volume. In
fact, the exponent 0.79 of the determined power-law is
in very good agreement with the expectation [18] that it
should coincide with the exponent of the gluon propaga-
tor t = 4/5 of equation (2).
Furthermore, even the gluon dressing function does not
exhibit any qualitative difference to three dimensions.
In particular it also exhibits a (shallow) maximum. As
the propagator becomes ultraviolet constant, as a conse-
quence of asymptotic freedom, there is no intrinsic ne-
cessity for such a maximum, as in four dimensions. Its
presence in this theory without propagating degrees of
freedom is hence slightly surprising. However, in the con-
text of a DSE treatment, it is natural to expect such a
maximum due to the different signs of ghost and gluon
self-energy contributions [8].
The most interesting quantity is the far infrared be-
havior. It is clearly visible that the gluon propagator is
strongly infrared suppressed. The deviation at the very
lowest momenta points, however, shows a more massive
behavior, as expected from DSE-studies in finite volumes
[18]. However, the mass decreases rapidly with volume,
as discussed above, and a massive behavior is seen only
in a momentum window which rapidly decreases with in-
creasing volume. More interestingly, it is expected that
1/L [GeV]
0 0.1 0.2 0.3 0.4 0.5
Gluon infrared exponent
FIG. 3: The measured infrared exponent κZ obtained from
the gluon propagator. Two fits are given. The dashed line
corresponds to a fit of type (5) which is forced to go to the
predicted value κ = 1/5 at 1/L = 0, while the one given by
the solid line is not forced to do so. The fit parameters can
be found in table I.
in the regime1
≪ p ≪ ΛQCD (4)
the continuum behavior should prevail. In particular, the
gluon propagator should decrease even in a finite volume
in this domain like the power-law (2) [18]. Using the sum
rule (3), the exponent of the propagator itself should be
4κ. Such a power-law is shown in the top panel in figure
1, and agrees well with the data inside the domain (4).
To investigate this quantitatively, the effective expo-
nent κZ was determined. This was done by discarding
the two lowest non-vanishing momentum points. Then
the next five highest points in momentum were used to
fit a power-law. To obtain errors, the steepest and shal-
lowest curve consistent with a 1σ-confidence interval was
determined as well. That this is likely too optimistic is
shown by the scattering of the results below. If more
than one momentum representation for a given momen-
tum existed, the results were averaged over the various
representations, as the violation of rotational symmetry
is a minor effect that far in the infrared, see appendix B.
The results are shown in figure 3. While there are still
significant fluctuations at large volumina, the measured
exponents tend towards the continuum value.
The volume-dependence of the measured exponents
1 The characteristic scale ΛQCD is in two dimensions of course
proportional to the coupling constant g.
TABLE I: Fit parameters of formula (5). Fit 1 corresponds
to one with fixed a = κ = 1/5, fit 2 to one where a was fitted
as well.
Fit a b [fm] c [fm2] d [fm3]
1 1/5 0.130 -12.9 19.5
2 0.190 0.358 -14.0 20.9
can be fitted by the formula2
Z = a+
. (5)
Two fits have been done. In one case, a was fitted as
well, while in the second case a was set to the continuum
value κ = 1/5. However, even with a free, the result
is in reasonable agreement with 1/5. In particular, the
results are not consistent with an infrared finite gluon
propagator, which would be expected if κ = 0, the second
solution found in [6, 8]. The individual fit parameters are
given in table I.
Hence, the gluon propagator behaves quantitatively
exactly as predicted in the Gribov-Zwanziger scenario,
when finite volume effects are taken properly into ac-
count.
B. Ghost propagator
The ghost propagator has been determined along the
same line as in higher dimensions [5, 12]. However, more
interesting than the propagator itself is the dressing func-
tion p2DG(p). The propagator and the dressing function
are shown for different volumes in figure 4. It is clearly
visible that the dressing function is infrared diverging.
This already indicates that of the two possible exponents
κ = 0 and κ = 1/5 found [6, 8] only the latter one, if one
at all, is realized.
Compared to the case of the gluon propagator, finite
volume effects are hardly visible to the eye. It seems that
the propagator actually becomes less infrared diverging
with volume. From the quantitative evaluation below,
this is found to be not the case. What seems to be the
case is that the domain of closest approach to the origin is
affected by finite volume effects. Its modification leads to
the various changes in the infrared in a non-trivial man-
ner. If this is the case, the finite-volume effects would
be very hard to compare between lattice calculations
and functional calculations, as they would be dominated
by mid-momentum effects, which in functional methods
are usually most strongly affected by truncations [8, 11].
This would, on the other hand, explain why in four di-
mensions the finite volume effects in the ghost propaga-
tor have indeed been found to be at least to some extent
2 The cubic term is necessary to include all volumes.
p [GeV]
0 1 2 3 4 5
Ghost propagator
p [GeV]
0 0.2 0.4 0.6 0.8 1 1.2 1.4
Ghost dressing function
FIG. 4: The top panel shows the ghost dressing function at
small momenta for various volumes. The lower panel shows
the ghost propagator over the whole accessible momentum
range. Open circles correspond to a volume of (42.7 fm)2,
full squares to (14.2 fm)2, full triangles to (7.11 fm)2, and
upside-down full triangles to (2.02 fm)2. The solid line is the
function 1.1p−2/5.
different in lattice and in Dyson-Schwinger calculations
[18]. In addition, Gribov-Singer effects [3, 20], which ac-
cording to the Gribov-Zwanziger scenario are irrelevant
in the infinite-volume limit [21], may still be relevant even
at volumes as large as those used here. This has not yet
been investigated in two dimensions in Landau gauge.
Even with the available volumes the effect is small. A
power-law with exponent κ = 1/5 already describes the
data quite well in the infrared, as shown in the top panel
of figure 4. Therefore, a more quantitative investigation
of the infrared behavior is required.
1/L [GeV]
0 0.1 0.2 0.3 0.4 0.5
Ghost infrared exponent
FIG. 5: The measured infrared exponent κ obtained from the
ghost propagator. Two fits of type (5) are given. The dashed
line corresponds to a fit which is forced to go to the predicted
value at 1/L = 0, while the one given by the solid line is not
forced to do so. The fit parameters can be found in table II.
TABLE II: Fit parameters for the ghost effective exponent
κG using formula (5). Fit 1 corresponds to one with fixed
a = κ = 1/5, fit 2 to one where a was fitted as well.
Fit a b [fm] c [fm2] d [fm3]
1 1/5 0.139 -0.711 0.173
2 0.150 1.28 -6.41 7.47
This is done by extracting the effective infrared ghost
exponent κG in the same way as in the case of the gluon
propagator. The results for κG are shown in figure 5.
While statistical errors are larger than in the case of
the gluon propagator, it is visible that all results clus-
ter around the predicted continuum value of κ = 1/5 at
large volumes. This is also seen from a fit of the type
(5). The corresponding fit parameters can be found in
table II. Due to statistical uncertainties it is not as clean
as for the gluon. However, it is visible that the exponent
does not vary strongly with volume. In fact, the effective
ghost exponent seems only to change by about a third
when changing the volume by almost two orders of mag-
nitude. Finally, even with the limited fit accuracy it is
not unreasonable that the results are, in fact, consistent
with the prediction of κ = 1/5.
Another possibility to check the continuum results is
to test the predicted sum rule (3). This is done by using
the effective measured exponents κZ and κG in figure 6.
Again, a fit of type (5) has been performed. The corre-
sponding fit parameters are given in table III. As already
anticipated from the individual results, the sum rule be-
comes better and better satisfied when approaching the
continuum limit. Hence, it seems very likely that the
1/L [GeV]
0 0.1 0.2 0.3 0.4 0.5
Infrared sum rule
FIG. 6: Test of the sum rule t+2κ+1 = 0, using the effective
ghost exponent κG, shown in figure 5, and the effective gluon
exponent tZ = −(1 + 2κZ), shown in figure 3. Two fits of
type (5) are given. The dashed line corresponds to a fit which
is forced to go to the predicted value at 1/L = 0, while the
one given by the solid line is not forced to do so. The fit
parameters are given in table III.
shiny relation (3) is, in fact, recovered in the continuum
limit.
TABLE III: Fit parameters for a formula of type (5) for the
sum rule. Fit 1 corresponds to one with fixed a = 0, fit 2 to
one where a was fitted as well.
Fit a b [fm] c [fm2] d [fm3]
1 0 0.0192 24.4 -38.6
2 -0.0806 1.85 15.2 -26.9
One of the particularly interesting results so far is that
the ghost exponent is only very weakly dependent on
the volume, compared to the one of the gluon. This
is in marked contrast to the case in four-dimensional
DSEs in a finite volume [18]. Furthermore, all attempts
to extract a ghost exponent in lattice calculations in
higher dimensions also yield a rather small, more or less
volume-independent exponent [17]. It is thus interesting
to compare the ghost propagator in various dimensions at
roughly the same volume. This is done in figure 7. Only
the momentum regime is shown which is accessible by all
of the lattices used. Furthermore, the propagators have
been normalized so that they coincide at a momentum of
2 GeV. For the momenta itself, the string tension was set
to the same value for all three different dimensionalities.
Aside from the question to which extent such a com-
parison is justified, the results behave as predicted: The
ghost propagator becomes more divergent with increasing
dimension. Also, it is in agreement with the predictions
p [GeV]
0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2
Ghost propagator
p [GeV]
0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2
Ghost dressing function
FIG. 7: Comparison of the ghost propagator in different di-
mensions. Circles are two dimensions, triangles are three
dimensions, and upside-down triangles are four dimensions.
The lattice volumes are (6.06 fm)2, (5.20 fm)3 [12], and (5.28
fm)4 [22], at β = 30, β = 4.2, and β = 2.3, respectively.
[6, 7, 8, 11] that the difference is more pronounced from
two to three dimensions than from three to four dimen-
sions: κ changes from 1/5 to ≈ 0.39 or 1/2 from two
to three dimensions. The four-dimensional exponent of
κ ≈ 0.59 is, on the other hand, rather close to the one in
three dimensions. This qualitative behavior, with all its
caveats, is another indication for the correctness of the
Gribov-Zwanziger scenario and the quantitative predic-
tions.
Furthermore, the result in two dimensions is in fact
confirming quantitatively the Gribov-Zwanziger scenario
in two dimensions. However, the very slow change in the
effective exponent over orders of magnitude in volume is
p [GeV]
0 0.5 1 1.5 2 2.5 3 3.5
Effective coupling
FIG. 8: The effective running coupling divided by p2 for var-
ious volumes. Open circles correspond to a volume of (21.3
fm)2, full squares to (14.2 fm)2, full triangles to (8.08 fm)2,
and upside-down full triangles to (2.02 fm)2.
indicative of what challenges have to be met in higher
dimensions to see the asymptotic ghost exponent.
Finally, the exact value of the exponent obtained in
DSE calculations depends on the projection of the ten-
sor equation for the ghost [8, 11]. The value of 1/5 is
obtained only in the case of a transverse projection [8].
This in turn implies automatically a certain structure of
the longitudinal (w. r. t. to the gluon momentum) tensor
structure of the ghost-gluon vertex, such that it leads for
arbitrary projections to the infrared exponent 1/5. This
then makes the determination of this tensor structure an
almost trivial exercise in the infrared limit. Furthermore,
this precisely prescribes how the Slavnov-Taylor identity
for the gluon propagator, and hence its transversality, is
recovered in the far infrared.
C. Running coupling
Although it is possibly a questionable concept in two-
dimensional Yang-Mills theory, it is possible to formally
define a running coupling. Analogous to higher dimen-
sions [11, 23], the quantity3 α(p) = p6D(p)DG(p)
then proportional to the coupling constant. In particu-
lar, as a consequence of the sum rule, the quantity α(p)
3 To improve the statistical behavior, the ghost dressing function
has been evaluated on a plane-wave source instead of a point
source, as in case of the propagator alone [12]. Hence only the
same volumes are accessible for the coupling constant as for the
ghost-gluon vertex below, where this is also necessary.
should behave in the infrared as p2. Hence α/p2 should
be constant.
From the results on the sum-rule, given in figure 6, it
is already clear that an infrared fixed point will hardly
be seen. However, the results, shown in figure 8, exhibit
such a fixed point at the largest volumes, provided the
lowest point at non-vanishing momentum is discarded4.
Note that the finite volume effects seem to make the run-
ning coupling diverging instead of vanishing, as in higher
dimensions [17, 18].
Thus at sufficiently large volumes, and taking finite
volume effects into account, it is in fact possible to ob-
serve a fixed point in the coupling in lattice gauge theory.
Note that there is a small, systematic overall factor
between the coupling obtained in the different volumes
shown in figure 8. This effect is not visible in the propaga-
tors themselves, but is increased here by taking effectively
the third power of the propagators. As this effect occurs
at all momenta, it is likely not simply a finite volume
effect. However, this can still be an O(a)-effect which is
caused, e. g. among other effects, by the fact that tadpole
corrections, which give overall-factors to the propagators,
have been neglected here [16, 24].
D. Faddeev-Popov operator
A last element in the analysis of the two-point cor-
relation functions are the properties of the Faddeev-
Popov operator, central to the Gribov-Zwanziger sce-
nario [3, 4, 5]. The results on the ghost propagator, which
is the expectation value of the inverse Faddeev-Popov op-
erator, already indicate the existence of an enhancement
of its eigenspectrum near zero eigenvalue. This enhance-
ment is the hallmark of the Gribov-Zwanziger scenario.
However, it is interesting to see the quantitative behav-
ior of the eigenspectrum. Hence the spectral properties
of the Faddeev-Popov operator have been determined as
well, using the technique described in [12].
The near-zero part of the eigenspectrum is shown for
various volumes in figure 9. The volume scaling of the
lowest eigenvalue is shown in figure 10. It is clearly visi-
ble that with increasing volume more and more eigenval-
ues are found near zero. This is the near-zero eigenvalue
enhancement, as predicted in the Gribov-Zwanziger sce-
nario5. In addition, the lowest eigenvalue vanishes in
the infinite-volume limit, and in fact vanishes faster than
4 For the coupling constant only edge momenta have been used,
in contrast to the propagators where also other momenta have
been included. Dismissing here only the lowest non-vanishing
momenta is thus equivalent to dismissing the two lowest non-
vanishing momenta in case of the propagators.
5 Note that the decrease towards larger eigenvalues seen in figure
9 is likely an artifact of the method to determine the eigenvalues
[12]. Furthermore, all eigenvalues are only found with multiplic-
ity 1.
0 0.002 0.004 0.006 0.008 0.01 0.012 0.014 0.016 0.018
0.001
Spectrum near zero eigenvalue
FIG. 9: The near-zero part of the eigenvalue spectrum of
the Faddeev-Popov operator for volumes (2.02 fm)2 (dashed-
dotted line), (7.11 fm)2 (dotted line), (14.2 fm)2 (dashed line),
and (24.9 fm)2 (solid line). 1164228, 2261493, 3517400, and
2614098 eigenvalues have been enclosed in the full spectrum,
respectively.
1/L [GeV]
-210×3 -110 -110×2
Lowest eigenvalue of the Faddeev-Popov operator
FIG. 10: The volume-dependence of the lowest eigenvalue of
the Faddeev-Popov operator. The solid line is the function
0.314L−2.34 .
the lowest eigenvalue of the Laplacian. This is a prop-
erty which has also been observed in higher dimensions
[12, 22]. It has been argued that such a larger rate may be
necessary for the ghost propagator to develop an infrared
divergence [25]. It is therefore another direct evidence for
the validity of the Gribov-Zwanziger scenario. This van-
ishing of the lowest eigenvalue is in fact necessary for the
Gribov-Zwanziger mechanism to work: For infinite vol-
ume, an average configuration should be arbitrarily close
or on the common boundary of the fundamental mod-
ular region and the Gribov horizon, where by definition
the determinant of the Faddeev-Popov operator vanishes,
and thus must have at least one vanishing eigenvalue [5].
III. THREE-POINT FUNCTIONS
Investigating the vertices in two dimensions is a very
interesting task. On the one hand, the vertices do not
lend themselves easily to evaluation, since as three-point
functions they are much more strongly affected by statis-
tical fluctuations than two-point functions. Hence their
investigation has so far been limited to rather small
volumes in four [26, 27] and even in three dimensions
[12, 26]. On the other hand, the vertices describe inter-
action effects, and it is not a-priori clear how they should
behave in a theory without propagating degrees of free-
dom. In particular, the possibility that the three-gluon
vertex, or at least some of its tensor structures, could
change sign is a very interesting observation in higher di-
mensions [12, 26]. Whether this is also the case in two
dimensions, especially in large volumes, is thus also a
question of interest.
One drawback of investigating vertices in two dimen-
sions on a square lattice is the impossibility to construct
a momentum configuration such that all three momenta
are equal. This equal momentum configuration is the one
usually preferred in functional studies of the vertices [28],
as it permits to have only one external scale. However,
in higher dimensions it was found that the results do not
change qualitatively when instead two of the momenta
are taken to be orthogonal [12, 26]. This configuration
can be realized in two dimensions, and will thus be em-
ployed here.
In general, vertices have a significant amount of tensor
structures. To obtain a more simple function to measure
the interaction represented by a vertex, the quantity
Γtl,abcGabc
Γtl,abcDadDbeDcfΓtl,def
. (6)
will be evaluated instead. Here the indices a, . . . , f are
generic multi-indices, encompassing field-type, Lorentz
and color indices. Also, Dab are the propagators of the
fields, Gabc represent the full Green’s functions and Γtl,abc
are the corresponding tree-level vertices. This quantity
is defined such that it becomes equal to one if the full
and the tree-level vertex coincide. For a more detailed
discussion of this quantity and its properties, see [12].
There are two vertices in Landau-gauge Yang-Mills
theory. The first is the ghost-gluon vertex, which is
shown for four different volumes in figure 11. In this case
in fact the vertex is shown, as only one tensor structure,
the tree-level one, survives non-amputation [12].
As in the higher-dimensional cases [12, 26, 27], it ex-
hibits an essentially constant behavior, except for a pos-
sible small structure below roughly 1 GeV in ghost mo-
mentum. This structure is a maximum, with a drop to-
wards smaller momenta below the tree-level value. Fur-
thermore, the value at small ghost momenta and finite
gluon momenta is below 1, but finite. If a modification
away from a constant infrared behavior of this vertex
should exist, it must set in with an extremely small ef-
fective exponent to not be visible on these volumes.
These results are all in qualitative agreement with the
ghost-gluon vertex in higher dimensions [12, 26, 27]. In
particular, the results confirm the truncation scheme in
the far infrared used in two dimensions in stochastic
quantization and DSE calculations [6, 7, 8]. In that case
an infrared finite ghost-gluon vertex was assumed, de-
livering the critical infrared exponent κ = 1/5, which in
fact was observed in the previous section. This once more
nicely confirms the Gribov-Zwanziger scenario, which
leads directly to this type of approximation. Further-
more, in four dimensions the infrared critical exponent
of the ghost-gluon vertex is fixed, once the exponents of
the propagators are known [28]. This can be extended
to two dimensions and yields in fact an infrared constant
ghost-gluon vertex [29]. This is a very stringent test of
the scenario. The results found here in lattice calcula-
tions once more pass this test. Or, more aptly put, the
test passes the results.
The three-gluon vertex is much more troublesome to
calculate due to strong statistical fluctuations, in partic-
ular at large lattice (not physical) momenta. These are,
in fact, even more pronounced than in higher dimensions,
as was already observed when going from three to four di-
mensions [26]. Thus the uncertainty connected with this
vertex is quite large. Nonetheless, the results shown in
figure 12 are quite spectacular. At a point of about 300-
400 MeV, corresponding roughly to the position where
the plateau in the coupling constant develops or where
the gluon propagator starts to bend over, the quantity
changes sign. Thereafter, it diverges, likely like a
power-law, as can be seen from the bottom-left panel
in figure 12. Precisely such a divergence is expected in
higher dimensions [28]. This also compares very well to
lattice results in higher dimensions, which found the on-
set of such a negative divergence in three dimensions [12],
and at least an infrared suppression in four dimensions
[26]. Note, however that due to the contraction (6) not
necessarily one particular tensor structure of the vertex
changes sign. It is as well possible that two tensor struc-
tures have opposite sign throughout, but differ in mag-
nitude, and one is dominant in the infrared, while the
other dominates in the ultraviolet.
The infrared divergence of the three-gluon vertex when
one momentum vanishes is roughly in agreement with
a power-law with exponent −2.2 for the single external
scale, as can be seen in the bottom-left panel of figure 12.
Although this is not the momentum configuration used
in DSE calculations [28], there is again just one external
scale. It could be expected that the infrared behavior is
the same, if there is just one scale left. In that case, this
exponent of −2.2 is actually the one expected in DSE
q [GeV] (Ghost)
0.5 1
1.5 2
eV] (G
luon) 0
Ghost-gluon vertex, orthogonal momenta
FIG. 11: The ghost-gluon vertex for orthogonal momenta. The
top left panel shows the vertex for all possible orthogonal mo-
mentum configurations for a volume of (21.3 fm)2, with errors
suppressed. The ripple structure is an artifact of the method
[12], and vanishes with increasing statistics. The bottom left
and right panel show the vertex in two specific momentum con-
figurations. In one case the gluon momentum vanishes (left
panel), and in the other the gluon and ghost momenta are of
equal magnitude (right panel). In this case, various physical
volumes are compared. Open circles correspond to a volume
of (21.3 fm)2, full squares to (14.2 fm)2, full triangles to (8.08
fm)2, and upside-down full triangles to (2.02 fm)2.
q [GeV]
0 0.5 1 1.5 2 2.5 3 3.5
Ghost-gluon vertex, one momentum vanishing
q [GeV]
0 0.5 1 1.5 2 2.5 3 3.5
Ghost-gluon vertex, orthogonal momenta with two equal
calculations [29]. This statement applies as well to the
infrared constancy of the ghost-gluon vertex.
Taking this reasoning seriously would imply that all
two- and three-point functions exhibit exactly and quan-
titatively the infrared exponents predicted in DSE calcu-
lations, and are in agreement with the Gribov-Zwanziger
scenario. Therefore, this work here would represent the
first quantitative confirmation of these two frameworks
using lattice gauge theory.
It is of course tempting to also investigate higher n-
point functions. Unfortunately, this is currently out of
reach in the present approach. The reason is that only
non-amputated, full Green’s functions can be directly ob-
tained with the methods used here. Therefore, it would
be necessary to first subtract the not-connected part of
the amplitude, and then amputate the Green’s functions.
In general, the not-connected and the connected ampli-
tude have the same infrared behavior, at least in four
dimensions, if the predictions [28] are correct. There-
fore, it would be necessary to disentangle the sum of
two functions, which both have the same leading infrared
behavior. As the statistical fluctuations become larger
when increasing the number of external legs, the required
statistics become impractical at the current time. Hence
it would be necessary to reduce these fluctuations. It is
possible that e. g. including only results for the same sign
of the Polyakov loop6 would be helpful, as by this statis-
tical fluctuations, at least in case of the gluon propagator,
are reduced [30]. This has to be investigated further.
6 At finite volume, the value of the Polyakov loop is non-zero for
each individual configuration.
p [GeV]
0 0.5 1 1.5 2 2.5 3 3.5 4
Three-gluon vertex, one momentum vanishing
p [GeV]
0 0.5 1 1.5 2 2.5 3 3.5 4
Three-gluon vertex, orthogonal momenta with two equal
p [GeV]
-210×5 -110 -110×2 -110×3
Three-gluon vertex, one momentum vanishing
q [GeV
k [GeV]
Three-gluon vertex, orthogonal momenta
FIG. 12: The three-gluon vertex for orthogonal momenta. The top left and right panel show the vertex in two specific
momentum configurations. In one case one of the gluon momenta vanishes (left panel), and in the other two of the gluon
momenta are of equal magnitude (right panel). The bottom left panel shows a magnification of the low-momentum regime
for one momentum vanishing. In this case the absolute value of GA
is displayed. Various physical volumes are compared.
Open circles correspond to a volume of (21.3 fm)2, full squares to (14.2 fm)2, full triangles to (8.08 fm)2, and upside-down full
triangles to (2.02 fm)2. Finally, in the bottom right panel GA
is shown for the complete orthogonal momentum configuration
plane in case of the largest volume (21.3 fm)2. In case of the bottom left panel, results from all available volumes up to lattices
of size 1202 are shown, see appendix A. In addition to the previously used symbols, the remaining symbols correspond to (3.56
fm)2 (pluses), (4.04 fm)2 (open stars), (6.06 fm)2 (open crosses), (7.11 fm)2 (full stars), (10.1 fm)2 (open triangles), (10.7 fm)2
(diamonds), (12.1 fm)2 (full circles), and (17.8 fm)2 (open squares). The line is the function −0.17p−2.2.
IV. SUMMARY
The volumes accessible in two-dimensional Yang-Mills
theory permitted here to obtain the two-point and three-
point functions on very large lattices, up to (42.7 fm)2
and (21.3 fm)2, respectively. In particular, it was possible
to obtain quantitative results on the infrared behavior
with a precision which is unprecedented in the lattice
investigations of these quantities.
These results demonstrated that the gluon propaga-
tor is infrared vanishing, the ghost propagator is infrared
diverging, and the ’effective coupling constant’ also has
the expected qualitative infrared behavior. Moreover, it
was possible to make these statements quantitative. In-
cluding the effects of finite volume, it was possible to
determine the infinite-volume limit of the characteris-
tic infrared exponents for the two-point functions, and
demonstrate the validity of the sum-rule (3). In fact, the
value κ = 1/5 found coincides with one of the two pos-
sible values expected from stochastic quantization and
Dyson-Schwinger equations for an ’on-shell’, i. e. trans-
verse, gluon. Furthermore, the infrared behavior of the
vertices permit to close the system self-consistently in the
context of such equations. In particular, the ghost-gluon
vertex is infrared constant.
These results confirm the Gribov-Zwanziger scenario
in two dimensions. Without any dynamic, i. e. propa-
gating, degrees of freedom, all the infrared behavior is
still qualitative the same as in higher dimensions. This
implies that these effects in fact stem from the gauge-
fixing procedure, in essentially the way predicted by the
Gribov-Zwanziger scenario.
It will, of course, take some time before it is possible to
repeat the same in higher dimensions. One of the quan-
titative reasons is that the critical exponent in the gluon
observables decreases with increasing dimension [6, 8].
Hence the effects observed here will only be observable
for larger volumes in higher dimensions. Nonetheless,
the results are also in excellent qualitative agreement
with the predictions of DSE calculations for the finite
volume behavior of the propagators in four dimensions
[18]. Finally, the comparison of the ghost propagator for
different dimensions yields the pattern expected from the
Gribov-Zwanziger scenario.
However, these results should also be taken with care,
as two-dimensional Yang-Mills theory is different from its
higher-dimensional versions. And although there is little
evidence to the contrary, no rigorous implication exists
that the effects seen here translate themselves into higher
dimensions without changes. Hence a satisfactory state
of affairs in higher dimensions has to await equivalent
investigations in higher dimensions. Until then, these
results here are another piece of the puzzle, which seem
to indicate that the Gribov-Zwanziger scenario in Landau
gauge is valid also in higher dimensions.
These results are, beyond these questions, also inter-
esting on their own. It is very tempting to investigate
how these results relate to the host of exact results avail-
able in two-dimensional Yang-Mills theory, what is the
connection to the topological aspects of the theory, and,
last but not least, how and if an equivalence between
the Gribov-Zwanziger and the Kugo-Ojima confinement
scenario exists, at least in two dimensions.
Acknowledgments
The author is grateful to Attilio Cucchieri and Tereza
Mendes for many helpful and interesting discussions.
Furthermore he thanks all those (in particular, Markus
Huber) who always ask about two dimensions. This work
was supported by the DFG under grant MA 3935/1-2 and
in part by the Slovak Grant Agency for Science, Project
VEGA No. 2/6068/2006. The ROOT framework [31] has
been used in this project.
APPENDIX A: GENERATION OF
CONFIGURATIONS
The generation of configurations in two dimensions and
their gauge-fixing to Landau gauge can be and has been
done exactly as in higher dimensions [12]. In particular,
the confirmation of the Gribov-Zwanziger scenario in the
present work implies that the problem of Gribov-Singer
copies [3, 20] should also in two dimensions become ir-
relevant for Green’s functions in the infinite volume limit
[21]: Gribov-Singer effects should become smaller with
increasing volume. Hence they have been ignored here,
although, as discussed in section II, effects at finite vol-
ume cannot be excluded.
To give units to the momenta, the infinite volume limit
of the string tension for a given β, which can be deter-
mined analytically [1], is set to (440 MeV)2. The con-
figurations used are shown in table IV. The comparison
with the (also exactly known) infinite volume value of the
plaquette [1] shows that locally the continuum has been
reached. However, the discussion in section II shows that
this is not correct globally.
APPENDIX B: LATTICE ARTIFACTS OTHER
THAN FINITE VOLUME
As one of the main claims here is that the deviation
from the asymptotic continuum form in the infrared is a
pure finite-volume effect, it is necessary to check the in-
fluence of other lattice artifacts. In particular, discretiza-
tion effects and violation of rotational symmetry may be
relevant. The latter is known to be a significant effect
when comparing correlation functions measured along
different directions of the hypercube (see, e. g., [14]), in
the present case along an edge or along a diagonal. In
figure 13, these effects are explicitly checked. The results
are at roughly the same volume of about (10.3 fm)2 at
two different βs, 10 and 30, and results with momenta
along any possible direction are directly compared.
It is clearly visible that, despite a factor of nearly 2
in a, both results agree remarkably well over the whole
range of momenta. Thus discretization effects are nearly
negligible, at least for a volume of a few fm2 and mo-
mentum not too close to the maximum one. Treating
only the physical volume as an independent parameter
in the infrared throughout the main text is hence justi-
fied. Also no significant effect is seen of the violation of
rotational invariance, which is usually most pronounced
at intermediate momenta in the gluon dressing function.
For the current case a few tens of lattice sites along each
edge seem to be sufficient to have already a quite good
approximation of rotational invariance.
Furthermore, there is no distinct difference between
the gluon and the ghost dressing function in terms of
TABLE IV: Data of the configurations considered in the numerical simulations. The values for a are 1.108 GeV−1 for β = 10
and 1.951 GeV−1 for β = 30 [1]. The momenta p0, pi, and pf denote the lowest non-vanishing momentum and the beginning
and the end of the fit interval used in the determination of the effective exponents in section II, respectively. Note that for
N ≥ 140 not as many momentum configurations for the gluon propagator were available as for N ≤ 120. Ghost configurations
are the ones used to determine the ghost propagator, the properties of the Faddeev-Popov operator, the ghost-gluon vertex,
and the running coupling. Gluon configurations are the ones used to determine the gluon propagator and the three-gluon
vertex. As the autocorrelation time for the plaquette is less than one hybrid overrelaxation (HOR) sweep, all sweeps (after
thermalization) have been used for the plaquette measurement, given the number of plaquette configurations in the table. Note
that all ghost configurations are also included in the gluon configurations, the sets are not independent. In case of N ≥ 140,
only the propagators have been determined. Hence the number of both configurations coincide. The quantity < P > / < P∞ >
gives the ratio of the expectation value of the plaquette over the analytical infinite volume limit. The error is determined
according to [12]. Finally, p is the tuning parameter for the stochastic overrelaxation algorithm used for gauge-fixing [32], and
which has been obtained by linear self-adjustment [12]. Note that this quantity is not very precisely determined, and should be
used rather as an indication of the correct order. Sweeps is the number of HOR sweeps between two consecutive measurements
[12].
V [fm2] N =
V/a2 β p0 [MeV] pi [MeV] pf [MeV] Ghost config. Gluon config. Plaq. config. 1-< P > /P∞ p Sweeps
2.02 20 30 610 1206 1874 2430 11525 369211 -5(4) 10−6 0.83 30
3.56 20 10 347 685 1064 2102 12319 355257 1(1) 10−5 0.84 30
4.04 40 30 306 610 961 1964 10579 527689 1(2) 10−6 0.88 50
6.06 60 30 204 408 644 1723 7311 510688 0(1) 10−6 0.93 70
7.11 40 10 174 347 546 2161 10758 536786 -2(6) 10−6 0.87 50
8.08 80 30 153 306 484 1429 4898 438579 0(1) 10−6 0.90 90
10.1 100 30 123 245 387 747 1988 216391 -3(1) 10−6 0.96 110
10.7 60 10 116 232 366 1825 7108 496291 -2(4) 10−6 0.92 70
12.1 120 30 102 204 323 552 1754 225036 1(1) 10−6 0.95 130
14.1 140 30 87.6 175 371 368 368 53971 -2(2) 10−6 0.97 150
14.2 80 10 87.0 174 275 1582 6465 579900 -1(3) 10−6 0.92 90
16.2 160 30 76.6 153 325 291 291 48199 0(2) 10−6 0.98 170
17.8 100 10 69.6 139 220 1339 4337 478853 -2(3) 10−6 0.96 110
18.2 180 30 68.1 136 289 308 308 56724 -2(1) 10−6 0.96 190
20.2 200 30 61.3 123 260 199 199 40584 -2(1) 10−6 0.96 210
21.3 120 10 58.0 116 183 762 5236 678065 1(2) 10−6 0.93 130
22.2 220 30 55.7 111 236 232 232 51577 1(1) 10−6 0.99 230
24.2 240 30 51.1 102 217 232 232 55691 0(1) 10−6 0.98 250
24.9 140 10 49.7 99.4 211 517 517 76053 1(5) 10−6 0.96 150
28.4 160 10 43.5 87.0 184 455 455 75500 -8(4) 10−6 0.97 170
32.0 180 10 38.7 77.3 164 390 390 72034 -4(4) 10−6 0.97 190
35.6 200 10 34.8 69.6 148 328 328 66976 -4(4) 10−6 0.97 210
39.1 220 10 31.6 63.3 134 287 287 63703 3(3) 10−3 0.98 230
42.7 240 10 29.0 58.0 123 394 394 96075 0(2) 10−6 0.98 250
these artifacts. In case of the propagators these effects
would be even diminished, as the trivial factor p−2 helps
in the reduction of such artifacts. Hence the totally domi-
nant contribution for the artifacts in the correlation func-
tions in the infrared is clearly the finite physical volume.
Similar observations pertain to all quantities measured
here, and hence only the physical volumes are used as
explicit parameters in the main text, and no heed is paid
for the different β-values. The only exception observed
here is in the case of the running coupling in section II C,
where an overall scaling factor has been seen. This issue
has been discussed in detail in this section II C.
APPENDIX C: CONTRIBUTIONS IN OTHER
COLOR TENSOR STRUCTURES
There is no a-priori necessity for correlation functions
to carry only their tree-level color structure, although
such a color structure permits a consistent solution us-
ing functional methods in the infrared, at least in four
dimensions [18]. Therefore, this property should be ex-
plicitly checked. This is done for the ghost and the gluon
propagator in figure 14. All contributions are compatible
with zero. Furthermore, the average value decreases in all
cases with increasing statistics. So, within the statistics
available, there are no color-off-diagonal components in
the propagators. Due to the structure of the DSEs, it is
then very hard to imagine how the higher n-point Green’s
functions should have a color structure different from the
p [GeV]
0 1 2 3 4 5
Gluon dressing function
p [GeV]
0 1 2 3 4 5
Ghost dressing function
FIG. 13: Consequences of different discretizations and violation of rotational invariance in case of the gluon (left panel) and
ghost (right panel) dressing functions. Open circles correspond to a system at β = 30 and a volume of (10.1 fm)2, open stars
to a system at β = 10 and a volume of (10.7 fm)2. The different momentum directions have not been marked differently.
p [GeV]
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2
-0.04
-0.02
Off-diagonal gluon propagator
p [GeV]
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2
Off-diagonal ghost dressing function
FIG. 14: The color off-diagonal elements of the gluon propagator (left) and of the ghost propagator (right) on a (21.3 fm)2
volume.
tree-level one. This can, of course, not be excluded by this result. Nonetheless, it seems to be unlikely.
[1] H. G. Dosch and V. F. Muller, Fortsch. Phys. 27, 547
(1979).
[2] T. Kugo and I. Ojima, Prog. Theor. Phys. Suppl. 66, 1
(1979) [Erratum Prog. Theor. Phys. 71, 1121 (1984)].
[3] V. N. Gribov, Nucl. Phys. B 139, 1 (1978).
[4] D. Zwanziger, Phys. Lett. B 257, 168 (1991); D.
Zwanziger, Nucl. Phys. B 364, 127 (1991); D. Zwanziger,
Phys. Rev. D 67, 105001 (2003) [arXiv: hep-th/0206053].
[5] D. Zwanziger, Nucl. Phys. B 412, 657 (1994).
[6] D. Zwanziger, Phys. Rev. D 65, 094039 (2002) [arXiv:
hep-th/0109224].
[7] C. Lerche and L. von Smekal, Phys. Rev. D 65, 125006
(2002) [arXiv:hep-ph/0202194].
[8] A. Maas, J. Wambach, B. Grüter and R. Alkofer, Eur.
http://arxiv.org/abs/hep-th/0206053
http://arxiv.org/abs/hep-th/0109224
http://arxiv.org/abs/hep-ph/0202194
Phys. J. C 37, No.3, 335 (2004) [arXiv:hep-ph/0408074].
[9] D. Zwanziger, Phys. Rev. D 70, 094034 (2004)
[arXiv:hep-ph/0312254].
[10] J. M. Pawlowski, D. F. Litim, S. Nedelko and
L. von Smekal, Phys. Rev. Lett. 93, 152002 (2004)
[arXiv:hep-th/0312324].
[11] R. Alkofer and L. von Smekal, Phys. Rept. 353, 281
(2001) [arXiv:hep-ph/0007355]; C. S. Fischer, J. Phys.
G 32, R253 (2006) [arXiv:hep-ph/0605173]; R. Alkofer,
arXiv:hep-ph/0611090.
[12] A. Cucchieri, A. Maas and T. Mendes, Phys. Rev. D 74,
014503 (2006) [arXiv:hep-lat/0605011].
[13] A. Cucchieri, Phys. Rev. D 60, 034508 (1999)
[arXiv:hep-lat/9902023]; A. Cucchieri, F. Karsch and
P. Petreczky, Phys. Rev. D 64, 036001 (2001)
[arXiv:hep-lat/0103009]; A. Cucchieri, T. Mendes and
A. R. Taurines, Phys. Rev. D 67, 091502 (2003)
[arXiv:hep-lat/0302022]; A. Cucchieri and T. Mendes,
Phys. Rev. D 73, 071502 (2006) [arXiv:hep-lat/0602012].
[14] A. Cucchieri and T. Mendes, arXiv:hep-ph/0605224.
[15] P. Boucaud et al., arXiv:hep-lat/0602006.
[16] J. C. R. Bloch, A. Cucchieri, K. Langfeld and T. Mendes,
Nucl. Phys. B 687, 76 (2004) [arXiv:hep-lat/0312036].
[17] A. Sternbeck, E. M. Ilgenfritz, M. Müller-Preussker
and A. Schiller, Phys. Rev. D 72, 014507 (2005)
[arXiv:hep-lat/0506007]; A. Sternbeck, E. M. Ilgenfritz,
M. Muller-Preussker, A. Schiller and I. L. Bogolubsky,
PoS LAT2006, 076 (2006) [arXiv:hep-lat/0610053].
[18] C. S. Fischer, A. Maas, J. M. Pawlowski and L. von
Smekal, arXiv:hep-ph/0701050, accepted by Ann. of.
Phys..
[19] D. Birmingham, M. Blau, M. Rakowski and G. Thomp-
son, Phys. Rept. 209, 129 (1991); S. Cordes,
G. W. Moore and S. Ramgoolam, Nucl. Phys. Proc.
Suppl. 41, 184 (1995) [arXiv:hep-th/9411210].
[20] I. M. Singer, Commun. Math. Phys. 60 (1978) 7.
[21] D. Zwanziger, Phys. Rev. D 69, 016002 (2004)
[arXiv:hep-ph/0303028].
[22] A. Cucchieri, A. Maas and T. Mendes,
arXiv:hep-lat/0702022, accepted by Phys. Rev. D.
[23] L. von Smekal, R. Alkofer and A. Hauck, Phys. Rev.
Lett. 79, 3591 (1997) [arXiv:hep-ph/9705242]; Annals
Phys. 267, 1 (1998) [Erratum-ibid. 269, 182 (1998)]
[arXiv:hep-ph/9707327].
[24] G. P. Lepage and P. B. Mackenzie, Phys. Rev. D 48, 2250
(1993) [arXiv:hep-lat/9209022].
[25] A. Cucchieri, arXiv:hep-lat/0612004.
[26] A. Maas, A. Cucchieri and T. Mendes,
arXiv:hep-lat/0610006.
[27] A. Cucchieri, T. Mendes and A. Mihara, JHEP
0412, 012 (2004) [arXiv:hep-lat/0408034]; A. Stern-
beck, E. M. Ilgenfritz, M. Muller-Preussker, A. Schiller
and I. L. Bogolubsky, PoS LAT2006, 076 (2006)
[arXiv:hep-lat/0610053].
[28] C. S. Fischer and J. M. Pawlowski, Phys. Rev. D
75, 025012 (2007) [arXiv:hep-th/0609009]; R. Alkofer,
C. S. Fischer and F. J. Llanes-Estrada, Phys. Lett. B
611, 279 (2005) [arXiv:hep-th/0412330].
[29] C. S. Fischer, private communication; M. Huber,
Diploma Thesis, Graz U., March 2007 (advisor:
R. Alkofer).
[30] A. Cucchieri, Nucl. Phys. B 508, 353 (1997)
[arXiv:hep-lat/9705005].
[31] R. Brun and F. Rademakers, Nucl. Instrum. Meth. A
389, 81 (1997).
[32] A. Cucchieri and T. Mendes, Nucl. Phys. B 471,
263 (1996) [arXiv:hep-lat/9511020]; Ph. de Forcrand,
R. Gupta, Nucl. Phys. B (Proc. Suppl. ) 9, 516 (1989).
http://arxiv.org/abs/hep-ph/0408074
http://arxiv.org/abs/hep-ph/0312254
http://arxiv.org/abs/hep-th/0312324
http://arxiv.org/abs/hep-ph/0007355
http://arxiv.org/abs/hep-ph/0605173
http://arxiv.org/abs/hep-ph/0611090
http://arxiv.org/abs/hep-lat/0605011
http://arxiv.org/abs/hep-lat/9902023
http://arxiv.org/abs/hep-lat/0103009
http://arxiv.org/abs/hep-lat/0302022
http://arxiv.org/abs/hep-lat/0602012
http://arxiv.org/abs/hep-ph/0605224
http://arxiv.org/abs/hep-lat/0602006
http://arxiv.org/abs/hep-lat/0312036
http://arxiv.org/abs/hep-lat/0506007
http://arxiv.org/abs/hep-lat/0610053
http://arxiv.org/abs/hep-ph/0701050
http://arxiv.org/abs/hep-th/9411210
http://arxiv.org/abs/hep-ph/0303028
http://arxiv.org/abs/hep-lat/0702022
http://arxiv.org/abs/hep-ph/9705242
http://arxiv.org/abs/hep-ph/9707327
http://arxiv.org/abs/hep-lat/9209022
http://arxiv.org/abs/hep-lat/0612004
http://arxiv.org/abs/hep-lat/0610006
http://arxiv.org/abs/hep-lat/0408034
http://arxiv.org/abs/hep-lat/0610053
http://arxiv.org/abs/hep-th/0609009
http://arxiv.org/abs/hep-th/0412330
http://arxiv.org/abs/hep-lat/9705005
http://arxiv.org/abs/hep-lat/9511020
ABSTRACT
  The ghost and gluon propagator and the ghost-gluon and three-gluon vertex of
two-dimensional SU(2) Yang-Mills theory in (minimal) Landau gauge are studied
using lattice gauge theory. It is found that the results are qualitatively
similar to the ones in three and four dimensions. The propagators and the
Faddeev-Popov operator behave as expected from the Gribov-Zwanziger scenario.
In addition, finite volume effects affecting these Green's functions are
investigated systematically. The critical infrared exponents of the
propagators, as proposed in calculations using stochastic quantization and
Dyson-Schwinger equations, are confirmed quantitatively. For this purpose
lattices of volume up to (42.7 fm)^2 have been used.

<|endoftext|><|startoftext|>
Coupled electron and phonon transport in one-dimensional atomic junctions
J. T. Lü∗ and Jian-Sheng Wang†
Center for Computational Science and Engineering and Department of Physics,
National University of Singapore, Singapore 117542, Republic of Singapore
(Dated: August 5, 2021)
Employing the nonequilibrium Green’s function method, we develop a fully quantum mechanical
model to study the coupled electron-phonon transport in one-dimensional atomic junctions in the
presence of a weak electron-phonon interaction. This model enables us to study the electronic and
phononic transport on an equal footing. We derive the electrical and energy currents of the coupled
electron-phonon system and the energy exchange between them. As an application, we study the
heat dissipation in current carrying atomic junctions within the self-consistent Born approximation,
which guarantees energy current conservation. We find that the inclusion of phonon transport is
important in determining the heat dissipation and temperature change of the atomic junctions.
PACS numbers: 71.38.-k,63.20.Kr,72.10.Bg
I. INTRODUCTION
The electronic transport and phononic transport in
meso- and nano-structures have attracted a great deal
of interest in the past two decades, although their devel-
opment is not so parallel sometimes. These structures
display important quantum effects due to the confine-
ment in one or more directions1. The quantized electrical
conductance2 was observed much earlier than that of the
thermal conductance3 mainly due to the difficulty in mea-
suring the thermal transport properties. Electrons and
phonons are not two isolated systems. Their interactions
are important for both electronic and phononic trans-
port. With the development of both fields there arises
the requirement to study the coupled electron-phonon
transport from time to time. When studying electronic
transport problems, one usually assumes that electrons
interact with some phonon bath where the phonons are in
their thermal equilibrium state characterized by the Bose
distribution. This simple assumption is not able to give
satisfactory results in some cases where the phonons are
driven out of equilibrium by the electrons. This is espe-
cially true in places where the thermal conductance is low
or the phonon relaxation is slow4,5. To take into account
the nonequilibrium phonon effect, one usually introduces
into the electronic transport formalism some phenomeno-
logical parameters that describe the phonon relaxation
process. In engineering applications, as the size of the
electronic devices decreases to nanoscale, the heat dissi-
pation and conduction in these structures become crit-
ical issues, which may influence the electronic proper-
ties dramatically6. Only studying the electronic trans-
port is not enough in these cases. On the other hand,
heat transport in one-dimensional (1D) structures has re-
ceived considerable attention recently6,7,8. Fourier’s law
of heat conduction is no longer valid in many 1D systems.
The microscopic origins of the macroscopic Fourier’s law
remain one of the most frustrating problems in nonequi-
librium statistical mechanics. Since the electrons and
phonons both contribute to the heat conduction, their
relative roles in many nanostructures are still not clear.
Especially in semiconductors, which one carries the ma-
jority of the thermal current is not a trivial problem. To
answer these questions, we need some general models,
which take into account the electron, phonon transport,
and their mutual interactions.
Theoretically, although the development of electronic
transport in 1D structures has been very striking, that
of the phononic transport is relatively slow. Classical
molecular dynamics (MD) and the Boltzmann-Peierls
equation are the widely used methods in phononic trans-
port. MD method is not accurate below the Debye
temperature, while the Boltzmann-Peierls equation can
not be used in nanostructures without translational in-
variance. In both cases, the quantum effect becomes
important1. Only recently, the nonequilibrium Green’s
function method9,10,11,12, which has been widely used
to study the electronic transport, has been applied to
study the quantum phononic transport13,14,15,16,17. As
far as we know, the study of the coupled electronic and
phononic transport in nanostructures is rare18,19,20,21.
In Ref. 19, the authors considered the nonequilibrium
phonons in molecular transport junctions. Galperin and
co-authors analyzed the heat generation and conduction
in molecular systems18. In this paper, using the nonequi-
librium Green’s function method, we study the coupled
electronic and phononic transport in 1D atomic junc-
tions. The formalism is similar to that of Ref. 18. In our
model the electron subsystem is described by a single-
orbital tight-binding Hamiltonian, and the phonon sub-
system is described in a harmonic approximation. We
assume that the electron-phonon interaction is weak so
that the perturbative treatment is valid. The strong-
interaction case is the scope of future work.
The rest of the paper is organized as follows. In Sec. II,
we introduce the 1D model system, and derive expres-
sions for the electrical, energy current of the coupled
electron-phonon system. In Sec. III we show the heat
generation in one- and two-atom structures under differ-
ent model parameters. Sec. IV is the conclusion. In Ap-
pendix A-C we give some technical details of our deriva-
tion.
http://arxiv.org/abs/0704.0723v1
II. COUPLED ELECTRONIC AND PHONONIC
TRANSPORT
A. The Hamitonian
Our model system is an infinite 1D atomic chain as
shown in Fig. 1. The electrons and atoms are only al-
lowed to move in the longitudinal direction. We treat
the atoms as coupled harmonic oscillators, and take into
account their nearest neighbour interactions up to the
second order. We assume that there is only one single
electronic state for each atom and take into account hop-
ping transitions between the nearest states. This corre-
sponds to a single-orbital tight-binding model. Also, we
assume that there is only one spin state for each orbital.
Following Caroli22, we divide the whole system into one
central region and two semi-infinite leads, which act as
electrical and thermal baths (Fig. 1). The Hamiltonian
of the whole system is
α=L,C,R;β=e,ph
α=L,R;β=e,ph
HαCβ +H
+Heph. (1)
The electron-phonon interaction Hamiltonian Heph is
non-zero only in the central region. The electron Hamil-
tonian reads
Hαe =
εαi c
|i−j|=1
tαijc
j , (2)
where c
i and c
i are the electron creation and annihila-
tion operators. εαi is the electron onsite energy, and t
ij is
the hopping energy between adjacent states. i and j run
over the sites in the α region. The coupling Hamiltonian
with the leads is
HLCe =
tLCij c
j , (3)
HCRe =
tCRij c
j . (4)
HCLe and H
e have similar expressions. We also have
tαC = tCα
, α = L,R. For our 1D tight-binding model,
tαC has only one non-zero element. If we label the central
atoms with indices 1 to n as shown in Fig. 1, the non-zero
elements will be tLC01 , t
10 , t
n+1,n, and t
n,n+1.
The phonon Hamiltonian is
Hαph =
u̇αi u̇
|i−j|=0,1
uαi K
j . (5)
uαi and u̇
i are the mass-renormalized atom displacement
and momentum operator. Kαii = 2K
i , and K
εCm+1 ε
εRn+1 ε
KL0,−1
tL0,−1 t
KCL10
tCm+1,m
KCm+1,m
tRCn+1,n
KRCn+1,n
tRn+2,n+1
KRn+2,n+1
Heph Heph
FIG. 1: Shematic diagram of the 1D coupled electron-phonon
system and the parameters used in the model. The big dots
in the bottom line represent atoms, while the small dots in
the upper line represent electron states. They are coupled via
the electron-phonon interaction.
−Kα0 /
mαi m
j (i 6= j). Here Kα0 is the spring constant,
and mαi is the mass of the ith atom in the α region. Like
the electrons, the coupling Hamiltonian with the leads is
HLCph =
uLi K
j , (6)
HCRph =
uCi K
j . (7)
We also have KCα = KαC
. The non-zero elements are
KLC01 , K
10 , K
n+1,n, and K
n,n+1.
The electron-phonon interaction is included within the
adiabatic Born-Oppenheimer approximation. First, the
electron subsystem is solved with all the atoms in their
equilibrium positions. Then, the isolated phonon sub-
system is considered. After that, the electron-phonon in-
teraction is turned on by allowing the atoms to oscillate
around their equilibrium positions. Within this picture,
the electron-phonon interaction is11
Heph =
i,j,k
Mkijc
icjuk. (8)
The interaction matrix element is Mkij =
. All
the operators in Eq. (8) are in the central region, so we
omitted the superscript C. In our model, the electron
operators are in the second quantization, while that of
the phonons are in the first quantization.
B. Green’s functions
The nonequilibrium Green’s function method for the
electronic transport is discussed in Refs. 9,10,11,12, and
that for the phononic transport in Refs. 13,14,15,16,17.
Here we concentrate on the electron-phonon interactions.
The definition of the electron contour-ordered Green’s
function is Gjk(τ, τ
′) = −i〈T {cj(τ)c†k(τ ′)}〉, and the
phonon counterpart is Djk(τ, τ
′) = −i〈T {uj(τ)uk(τ ′)}〉.
Here τ is time on the Keldysh contour, and T {· · ·} is
the contour-ordered operator. We set h̄ = 1 through-
out the formulas. Without the electron-phonon inter-
action, the isolated electron and phonon problem can
be solved exactly. We denote these Green’s functions
as G0(τ, τ
′) and D0(τ, τ
′), respectively. In our case,
it is convenient to write the Hamiltonians as matri-
ces and work in the energy space. The electron re-
tarded and advanced Green’s functions are Gr0(ε) =
†(ε) =
(ε+ iη)I −HCe − ΣrL(ε)− ΣrR(ε)
. I is
an identity matrix, and η → 0+. The retarded self-
energy Σrα = t
Cαgrαt
αC is due to the interactions with
the lead α. The retarded Green’s function of the semi-
infinite lead grα can be obtained analytically (Appendix
A). The “less than” Green’s function is given by G<0 =
Gr0(Σ
L + Σ
0 , where Σ
α = −f eα(Σrα − Σaα). f eα is
the Fermi-Dirac distribution. The phonon retarded and
advanced Green’s functions are23 Dr0(ω) = D
†(ω) =
(ω + iη)2I −KC −ΠrL(ω)−ΠrR(ω)
. The lead re-
tarded self-energy is Πrα(ω) = K
Cαdrα(ω)K
αC . drα also
has analytical expression (Appendix A). The phonon
“less than” Green’s function is D<0 = D
)Da0 ,
where Π<α = f
α −Πaα). fphα is the Bose distribution
function.
Knowing the bare electron and phonon Green’s func-
tions G0 and D0, we can include their interaction as per-
turbation. Following the standard procedure of nonequi-
librium Green’s function method, we can express this
interaction as self-energies. The full Green’s functions
are obtained from the Dyson equation, e.g., for elec-
trons Gr,a = G
0 + G
r,a, and G< = GrΣ<t G
Σ<t = Σ
eph + Σ
L + Σ
R is the total self-energy. Keeping
the lowest non-zero order (the second order) of the self-
energies, we have two (Hartree- and Fock-like) terms for
the electrons, and one polarization term for the phonons.
This is the so-called Born approximation (BA)11. The
Fock self-energies are
ΣF,<mn (ε) = iM
G<0 ij(ε− ω)D
0 kl(ω)
M ljn, (9)
ΣF,rmn(ε) = iM
Gr0ij(ε− ω)D
0 kl(ω)
+G<0 ij(ε− ω)D
0kl(ω)
+Gr0ij(ε− ω)D
0kl(ω)
M ljn. (10)
The “less than” Hartree self-energy is zero, and the re-
tarded one is
ΣH,rmn = −iM imnDr0ij(ω
′ = 0)M
G<0 lk(ε)
. (11)
This term is a constant for all energies, which represents
a static potential due to the presence of phonons. The
self-energies for the phonons are
Π<mn(ω) = −iMmlk
G<0 ki(ε)G
0 jl(ε− ω)M
ij , (12)
Πrmn(ω) = −iMmlk
Gr0ki(ε)G
0 jl(ε− ω)
+G<0 ki(ε)G
0jl(ε− ω)
Mnij . (13)
In Eqs. (9-13), sum over internal indices is assumed. The
self-consistent Born approximation (SCBA) is obtained
by replacing all the bare Green’s functions G0 and D0 in
Eqs. (9-13) with the full G and D11. In Appendix B,
we show that the SCBA fulfills the electrical and energy
current conservation, while BA fails.
C. The electrical and energy current
The electrical and energy current can be expressed by
the Green’s functions. The electrical current out of the
lead α is11,24
Jα = e
Tr{G>(ε)Σ<α (ε)−G<(ε)Σ>α (ε)}. (14)
The electron energy current is
JE,eα =
ε Tr{G>(ε)Σ<α (ε)−G<(ε)Σ>α (ε)}. (15)
The electron heat current is obtained from Eqs. (14-
15) as Jh,eα = J
α − µαJα/e. µα is the lead chemical
potential. The derivation of the phonon energy current
runs parallel with that of the electrons17
JE,phα = −
ω Tr{D>(ω)Π<α (ω)−D<(ω)Π>α (ω)}.
For phonons the energy current is the same as the heat
current. When there is no electron-phonon interaction,
the electron energy current is conserved throughout the
structure. So is the phonon energy current. In the pres-
ence of such an interaction, only the total energy current
is conserved due to the energy exchange between them.
The phonons do not carry charges, so in both cases the
electrical current is conserved. Since we can’t get the
exact self-energies in most cases, we need some approxi-
mations. Properly defined self-energies should fulfill the
electrical and energy current conservation
Jα = 0, (17)
(JE,eα + J
α ) = 0, (18)
where α runs over all the leads. We justify that the SCBA
fulfills these conservation laws, while the BA fails to con-
serve the energy current (Appendix B). Provided we sat-
isfy these conservation laws, we can write the electrical
and energy current in symmetric forms. The electrical
current is
J = e
T̃ e(ε) [f eL(ε)− f eR(ε)] . (19)
The transmission coefficient reads
T̃ e = Tr
Gr(ΓL +
Γeph − Se)GaΓR
+GrΓLG
a(ΓR +
Γeph + S
, (20)
where Se is
(f eR + f
L)Γeph + iΣ
f eL − f eR
. (21)
Γα = i(Σ
α − Σaα), α = L,R is the electron level-width
function. Γeph = i(Σ
eph − Σaeph) is due to the electron-
phonon interaction. The total energy current is
T̃ e(ε) [f eL(ε)− f eR(ε)]
T̃ ph(ε)
L (ε)− f
R (ε)
. (22)
The phonon transmission coefficient is
T̃ ph = Tr
Dr(ΛL +
Λeph − Sph)DaΛR
+DrΛLD
a(ΛR +
Λeph + S
, (23)
where Sph is
Sph =
R + f
L )Λeph − iΠ
L − f
. (24)
Λα = i(Π
α − Πaα) is the phonon level-width function.
Λeph = i(Π
eph−Πaeph) is due to the electron-phonon inter-
action. Eqs. (19-24) are the generalization of the Caroli
formula22 to include the electron-phonon interaction.
III. HEAT GENERATION IN CURRENT
CARRYING 1D ATOMIC JUNCTIONS
As an application of the formalism in Sec. II, we
study the heat dissipation in current-carrying 1D atomic
junctions18,25,26,27,28,29,30,31. In the presence of potential
difference between the two leads, there will be electrical
current flowing between them. When the electrons pass
the central region, there is energy exchange between the
electron and phonon systems. The energy dissipated into
the phonon system makes the atom temperature higher
than that of the leads if it is not efficiently conducted to
the leads. If the electron-phonon interaction is weak, the
energy dissipated into the phonon system is only a small
fraction of the electron energy current. But this small
fraction still influences the transport properties of the
atomic junction and even leads to junction breakup32,33,
especially when the thermal conductance is low. Dif-
ferent models have been used to study the local heating
effect. Some simply assume that the phonons are in their
thermal equilibrium states31. Some take into account the
phonon transport by using the rate equations30 or other
semi-classical models26,27,32. Few of them take into ac-
count the quantum effect in heat transport18,34. Our
model treats the electron and phonon transport on an
equal quantum-mechanical footing, and includes their in-
teractions self-consistently. The heat generation is given
by (Eq. (B9))
Q = i
G>nm(ε)M
kl(ω)G
ij(ε− ω)M ljn
. (25)
At zero temperature, we can get an analytical expression
Eq. (C1) for a single-atom structure by using the bare
Green’s functions G0 and D0 in Eq. (25) (Appendix C).
Equation (C1) can reproduce most qualitative features of
heat generation in a single atom, except that it does not
take into account heat conduction in the phonon system.
We first study the case where the lead energy band is
wide compared to the voltage applied to the structure.
For most metallic leads, this condition should hold. In
the weak electron-phonon coupling regime, the Born ap-
proximation should give acceptable results for the heat
generation, although physically it is not a good approx-
imation. Figure 2 shows the heat generation of a single
atom (n = 1 in Fig. 1) computed using Eqs. (B5) and
(B8) under BA and SCBA, respectively. The parameters
used in the calculation are stated in the figure caption.
With these parameters, the electron energy band is in
the range −1 ≤ ε ≤ 1 eV. The chemical potential of each
lead is zero in equilibrium. The phonon energy is ap-
proximately ω = 0.05 eV. In all the results presented in
this section, the temperature is T = 4.2 K, the electron-
phonon coupling matrix M = 0.08 eV/(Å·amu 12 ). The
cut-off energy of the electron system is 2.1 eV, and the
phonon system is 0.2 eV. The energy spacing is dis-
cretized into grids of 1 meV. Equation (B5) gives the
energy decrease of the electron system, while Eq. (B8)
gives the energy increase of the phonon system. Numeri-
cal results from Eq. (B5) and Eq. (B8) under SCBA have
some slight discrepancy. This is due to numerical inac-
curacies. But most of the discrepancy under BA comes
from the difference between the bare and the full Green’s
functions, which may become even larger for some pa-
rameters. So BA should be used with care in the study
the energy exchange between the electron and phonon
system. We also note that although Eqs. (B5) and (B7)
are equivalent, numerical result from Eq. (B5) is unsta-
ble in many cases. The reason is that the energy ex-
change between the electron and the phonon system is
only a small fraction of the total electrical energy cur-
rent. Equation (B5) is the difference between two large
numbers, so our numerical integration has to be accurate
enough to get a reasonable result30. On the contrary,
Eq. (25) is much more stable since we have got the dif-
ference analytically. All the results presented below use
this equation.
From Fig. 2, we can see two threshold values in heat
generation. The first one corresponds to the onset of
phonon emission. Under low temperatures, the equilib-
rium phonon occupation is very small, so the phonon
absorption process seldom takes place. If the applied
bias is smaller than the phonon energy, electrons don’t
 0  0.1  0.2  0.3  0.4  0.5
Voltage (V)
SCBA, B8
SCBA,B5
BA, B8
BA,B5
 0  0.05  0.1
FIG. 2: Comparison of different methods to compute the heat
generation in a single atom structure. The four curves cor-
respond to results from Eqs. (B5) and (B8) under BA and
SCBA, respectively. If we label this single atom as index 1,
its electronic onsite energy is written as εC1 = 0.1 eV. The
onsite energy of the leads is εL = εR = 0 eV. The hopping
energy is tLij = t
ij = 0.5 eV. The non-zero electronic cou-
pling with the lead is tCL10 = t
21 = 0.1 eV. The matrix ele-
ment of the single atom is KC11 = 0.654 eV/(Å
· amu). The
spring constant between the lead atoms is KLij = K
ij = 0.654
eV/(Å2·amu). The non-zero atomic coupling with the leads
is KCL10 = K
21 = 0.127 eV/(Å
·amu).
have enough energy to emit one phonon. So the heat
generation is zero. Once the applied voltage is larger
than the phonon energy, phonon emission turns on. The
heat generation increases almost linearly with the applied
bias (inset of Fig. 2). This is different from the electrical
current, which increases smoothly in this regime. The
second threshold value corresponds to the alignment of
the left lead chemical potential with the electron onsite
energy eV = 2ε0 (positive bias µl > µr). The electron
transmission is nearly unity above the onsite energy. The
larger the transmission, the larger the current and the
heat generation provided that the other parameters re-
main unchanged. These two threshold behaviours may
become less obvious when the coupling with the leads
get stronger. As a result of coupling, the discrete elec-
tron and phonon density of states (DoS) extends to a
small energy region around their discrete values. The
continuous phonon DoS leads to the broadening of the
first threshold behaviour, while the continuous electron
DoS is responsible for that of the second. It is smoothed
out when the coupling is large enough (Fig. 3). Only
electrons whose energies are within the broadened energy
spectrum can tunnel across the central atom. The heat
generation reaches maximum when the electron states in
one lead are all occupied in this energy range, while those
in the other are all empty.
The electron-lead coupling not only leads to the elec-
tron level broadening, but it also influences the elec-
tron tunneling time. The larger this coupling, the less
time electrons spend in the central region. In Fig. 3, we
show the heat generation and the atom temperature for a
single-atom structure under different electronic coupling
strengths. The definition of temperature is ambiguous
in nanostructures6. Here we use the method proposed
in Ref. 18. We can only see one threshold behaviour at
about 0.2 V, which is smoothed out when the coupling is
larger than 0.2 eV. The temperature and the heat gener-
ation show similar trends. The saturate voltage of heat
generation increases with the strengthen of the electron-
lead coupling. This is due to the coupling induced atomic
level broadening. The decrease of the heat generation
and temperature with increasing electron-leads coupling
can be easily understood. The larger this coupling, the
less time electrons spend at the central atom. Since the
electron-phonon interaction takes place there, the heat
generation decreases. We also show the heat generation
as a function of electron-lead coupling in the inset of the
lower panel. The applied voltage is 0.3 V. On one side,
when the coupling is too small, few electrons can tun-
neling through the atom. The heat generation is small.
On the other, when the coupling is very large, the elec-
tron tunneling process is too quick for the phonons to
interact with the electrons. The heat generation is also
small. It has a maximum value at some moderate cou-
pling strength. This is different from the electrical cur-
rent, which increases monotonously with the increase of
coupling strength.
 0  0.1  0.2  0.3  0.4  0.5
Voltage (V)
0.2 0.4
FIG. 3: Heat generation Q and the atom temperature T under
different electron coupling strength tCL10 = t
21 = 0.05, 0.1,
0.2, and 0.3 eV, respectively. Other parameters are the same
with Fig. 2. The inset shows the heat generation as a function
of electron coupling strength at an applied bias V = 0.3 V.
The atom-lead coupling determines how well the gen-
erated heat can be conducted into the surrounding leads.
One of the important reasons why we are interested in
the heat generation in nanostructures is that it may leads
to temperature increase and even structure breakup. To
study the temperature change, we need to take into ac-
count not only the heat generation, but also the heat con-
duction into the leads. In the simplest one-atom struc-
ture, the heat conductance is mainly determined by the
atom-lead coupling. Our model includes this intrinsically.
Figure 4 shows the heat generation and the atom temper-
ature as a function of atom-lead coupling. For the heat
generation, the BA and SCBA results show large differ-
ence around the resonant position, which corresponds to
a perfect atomic junction. For the atom temperature, BA
and SCBA give almost the same results. In the case of
a perfect junction, the heat generation reaches its max-
imum value, while the atom temperature is the lowest.
The reason is that the perfect junction has the best heat
conductance. When the atom-lead coupling is weak, the
heat generation is small. But the poor heat conductance
can still result in a much higher temperature than the
surrounding leads. We also show the heat conductance
as a function of atom-lead coupling in the inset of the
upper panel, which shows a sharp peak at resonance.
 0  0.2  0.4  0.6  0.8  1
K (eV/A2 amu)
0.80.60.40.2
FIG. 4: Heat generation Q and the atom temperature T as
a function of the atom-lead coupling KCL10 = K
21 = K.
Dashed and dotted lines correspond to results of SCBA and
BA, respectively. Other parameters are the same with Fig. 2.
The inset shows the thermal conductance κ as a function of
K. The unit is 1× 10−12 W/K.
In Fig. 5, we show the heat generation as a function
of electron onsite energy at different biases. We assume
that we can tune the the onsite energy via a gate voltage.
When the applied bias is less than the phonon energy,
there will be no heat generation. When the bias energy is
slightly larger than the phonon energy and less than 2ω,
there are two energy positions where the heat generation
is the largest. These two peaks are approximately at
−0.5eV +ω and 0.5eV −ω. They merge into a single one
at a bias of eV = 2ω until it reaches saturation. After
that, this peak broadens, and becomes ladders. All these
behaviour can also be explained by the analytical result
of Eq. (C1).
In Fig. 6 we show the heat generation of a two-atom
structure (n = 2 in Fig. 1). The central region has two
identical atoms. Interaction between them leads to two
discrete energy levels. One is at 0 eV, and the other at
0.4 eV. When the electrical coupling between the leads
and the central region is small (0.1 eV), additional to the
threshold behaviour at eV = ω, there are two ladders
corresponding to the phonon assisted resonant tunneling
across the two electrical levels. If the electrical coupling
0.20.10-0.1-0.2
ε(eV)
FIG. 5: Heat generation Q as a function of electrical onsite
energy εC1 under different biases. From the inner to the outer
side, the applied biases are V = 0.05, 0.10, 0.15, 0.20, 0.25,
0.30, 0.35, 0.40, 0.45, and 0.50 V, respectively.
gets larger (0.2 eV), the two ladders broaden out. Again
this is attributed to the coupling induced level broad-
ening. The heat generation for the two-atom structure
is much larger than that of a single-atom structure. The
more the electrical levels, the larger the electrical current
and heat generation. It is worth noting that for multi-
atom structures the distribution of the electrostatic po-
tential may influence the results significantly35. In the
above calculation, we assume that the two electrical lev-
els don’t change with the applied bias, and that we can
tune their positions via a gate voltage.
 0  0.2  0.4  0.6  0.8  1
Voltage (V)
FIG. 6: Heat generation Q as a function of applied voltage for
a two-atom structure. The two-atom onsite energy is εC = 0.2
eV, the hopping energy is tC = 0.1 eV, and the spring con-
stant is KC = 0.654 eV/(Å2·amu). The two leads are iden-
tical. The electron onsite energy is εL = εR = 0, and the
hopping energy is tL = tR = 0.5 eV. Their spring constants
are the same as the central region. The non-zero coupling cou-
plings with the leads are KCL10 = K
32 = 0.327 eV/(Å
·amu),
and tCL10 = t
32 = 0.1 (solid), 0.2 eV (dashed), respectively.
If one of the metallic leads is replaced by a semicon-
ductor, there will be some new features in the electrical
current and the heat generation. In our simple model,
we can alternate the electron onsite energies between two
values to mimic a simple semiconductor (Appendix A).
In Fig. 7 we show the heat generation and the electri-
cal current for such kind of structure. The alternating
onsite energies of the left lead are −0.1 and −0.2 eV, re-
spectively. This produces an energy band-gap of 0.1 eV.
Other parameters are given in the figure caption. We can
see that there appears negative differential conductivity
in the current-voltage characteristics due to the semi-
conductor band-gap. This qualitatively agrees with the
experimental36 and first-principle37 studies. The heat
generation curve is slightly different. Additional to its
threshold behaviour, the peak and valley positions are
also different. The electrical current has a peak when
the chemical potential of the lead is aligned with the
central electrical level, while the peak of the heat gen-
eration shifts to the right by one phonon energy. This
corresponds to the phonon-assisted resonant tunneling.
The current and heat generation decrease when the sin-
gle electrical level is within the band-gap of the left lead.
The peak-to-valley ratio depends on the coupling with
the semiconductor lead. In the limit of small band-gap
and large coupling, we recover the metallic lead results.
 0  0.2  0.4  0.6  0.8  1
Voltage (V)
FIG. 7: Heat generation Q and electrical current J as a
function of applied voltage for a single-atom structure. The
left lead is a semiconductor. Its alternating onsite energies
are −0.2 and −0.1 eV, respectively. The chemical potential
is µL = 0.05 eV higher than the conduction band bottom,
which corresponds to n-type doping. Other parameters are
10 = K
21 = 0.4 eV/(Å
·amu), tCL10 = t
21 = 0.1 eV,
1 = 0.1 eV, K
11 = 0.654 eV/(Å
·amu), εR = −0.05 eV,
L = tR = 0.5 eV, and KL = KR = 0.654 eV/(Å2·amu).
IV. CONCLUSION
We studied the coupled electron and phonon transport
in 1D atomic junctions in the weak electron-phonon inter-
action regime. Base on the nonequilibrium Green’s func-
tion method, we derived the electrical, energy current of
the coupled electron-phonon system, and the energy ex-
change between them. We showed that the SCBA con-
serves the energy current. Using this formalism, we stud-
ied the heat generation in one- and two-atom structures
coupling with different leads under a broad range of pa-
rameters. Especially, we studied the influence of the ther-
mal transport properties on the heat generation and atom
temperature of the central region. The results on semi-
conductor leads agree qualitatively with the experimental
and first-principle studies. This model can be easily ex-
tended to study more realistic structures such as molec-
ular transport junctions and metallic nanowires. The
electron, phonon Hamiltonian, their interaction and lead-
coupling matrices can all be obtained from first-principle
calculations17,38,39. The surface Green’s functions for
bulk leads can be computed by recursive method17,38. It
is also possible to include the electron-electron and the
phonon-phonon interactions14,17.
Acknowledgments
We thank Baowen Li, Sai Kong Chin, Jian Wang, and
Nan Zeng for discussions. This work is supported in part
by a Faculty Research Grant of the National University
of Singapore.
APPENDIX A: SURFACE GREEN’S FUNCTIONS
OF THE 1D LEAD
In this Appendix, we show that for the 1D tight-
binding model the lead self-energies can be expressed
analytically17. The electron and phonon self-energies are
similar in their form. Here we take electrons as an exam-
ple, and give the phonon results directly. We assume that
the onsite energies of the electrons alternate between ε1
and ε2. The hopping energy is t
ij = t0. If ε1 = ε2, we get
a continuum band. This corresponds to a metallic lead.
If they are not equal, we get two bands with a band gap.
We can take the lower as the valence band (VB), and the
upper as the conduction band (CB). We use this method
to mimic a semiconductor lead. In this case, the semi-
infinite lead has two electron states in each period. In the
tight-binding model, only the left- (right-) most state of
the central region is coupled to the left (right) lead. So
we only need to know the surface Green’s function, e.g.,
for the left lead it is g0 = g
00. We assume the retarded
Green’s function is
grij =
i−j state 1,
i−j state 2.
Putting it into the definition of the retarded Green’s func-
tions [(ε+ iη)I −H ]gr = I, we have
− t0c1 + (ε+ iη − ε2)c2 − t0c1λ = 0, (A2)
− t0c2 + (ε+ iη − ε1)c1λ− t0c2λ = 0. (A3)
From Eqs. (A2-A3), we get an equation for λ
2− (ε+ iη − ε1)(ε+ iη − ε2)
λ+ 1 = 0. (A4)
The condition that Eq. (A4) has travelling wave solutions
gives the dispersion relation
(ε1+ε2)−
(ε1−ε2)2+16t
≤ ε ≤ ε1 (VB),
ε2 ≤ ε ≤
(ε1+ε2)+
(ε1−ε2)2+16t
(CB).
We assume ε1 ≤ ε2 without loss of generality. The energy
band-gap is ε2 − ε1. If they are equal, the two bands
merge into one, which corresponds to a metallic lead.
For the surface Green’s function of the left lead, we
also have
(ε+ iη − ε1)c1 − t0c2 = 1. (A6)
From Eqs. (A2,A6), we get
ε+iη−ε2
(1+λ)t2
(VB),
ε+iη−ε1
(1+λ)t2
(CB).
|λ| ≥ 1 is one of the roots of Eq. (A4). The surface
Green’s function of the right lead is identical.
We can also alternate the atom masses to generate a
phonon band-gap. In our model the mass change will
modify the renormalized spring constants. The diagonal
elements of the dynamical matrix will be two alternating
values Kαii = 2k1 or 2k2, while the off-diagonal elements
will be a single value Kαij = −
k1k2, where |i − j| = 1.
If we assume that k2 ≥ k1, the acoustic band (AB) is
0 < ω2 < 2k1, and the optical band (OB) 2k2 < ω
2(k1 + k2). The surface Green’s function is
(1+λ)k1k2
(AB),
(1+λ)k1k2
(OB),
where Ωn = (ω + iη)
2 − 2kn. |λ| ≥ 1 is one of the roots
2− Ω1Ω2
λ+ 1 = 0. (A9)
In all the simulation results of present paper, the two
spring constants are equal (k1 = k2), which correspond
a single continuum phonon band. The electron onsite
energies are also equal (ε1 = ε2) except in Fig. 7, where
we set ε1 = −0.2 eV and ε2 = −0.1 eV to mimic a
semiconductor lead.
APPENDIX B: ENERGY CURRENT
CONSERVATION
In this Appendix, we justify that the SCBA satisfies
the energy current conservation. The justification of the
electrical current conservation is given in the Refs.40,41.
What we need to prove is that
(JE,eα + J
α ) = 0. (B1)
The electron part is
JE,eα =
ε Tr{G>(ε)Σ<α (ε)−G<(ε)Σ>α (ε)}.
Using the important relation39,40
G>Σ<t −G<Σ>t
= 0, (B3)
we get
JE,eα = −
ε Tr{G>(ε)Σ<eph(ε)−G
<(ε)Σ>eph(ε)}.
The Hartree term does not contribute to the current di-
rectly. It’s just like a static potential which only modifies
the Green’s function. Putting the Fock self-energy into
Eq. (B4), we have
JE,eα
G>nm(ε)M
kl(ω)G
ij(ε− ω)M
−G<nm(ε)MkmiD>kl(ω)G
ij(ε− ω)M ljn
. (B5)
Sum over all the indices is assumed. The heat generation
Q is the energy decrease of the electron system, which
should also be the energy increase of the phonon system.
Replacing ω by −ω, using the symmetric properties of
the phonon Green’s functions17, replacing ε by ε − ω,
and finally changing dummy variables, we get
G<nm(ε)M
kl(ω)G
ij(ε− ω)M ljn
G<nm(ε)M
kl(−ω)G
ij(ε+ ω)M
(ε− ω)
G<nm(ε− ω)MkmiD<lk(ω)G
ij(ε)M
(ε− ω)
G>nm(ε)M
kl(ω)G
ij(ε− ω)M
Putting Eq. (B6) back into Eq. (B5), we get
JE,eα
G>nm(ε)M
kl(ω)G
ij(ε− ω)M ljn
6= 0. (B7)
For the phonon energy current we have
JE,phα
D>nm(ω)M
ki(ε)G
jl(ε− ω)M
−D<nm(ω)MmlkG>ki(ε)G
jl(ε− ω)M
. (B8)
Following the same procedure as electrons, finally we get
JE,phα
G>nm(ε)M
kl(ω)G
ij(ε− ω)M ljn
6= 0. (B9)
So we still have
JE,eα + J
= 0. (B10)
Eqs. (B7, B9) give the energy exchange between the
electron and the phonon system, which is also the heat
generation of the atomic junction. Replacing D<, G<
by D<0 , G
0 in Eq. (B7), and G
>, G< by G>0 , G
Eq. (B9), we get the results under BA. We can find that
the energy increase of the phonons does not equal to the
energy decrease of the electrons under BA.
APPENDIX C: ANALYTICAL RESULT AT ZERO
TEMPERATURE
At zero temperature, we can get an analytical expres-
sion for heat generation in a single-atom structure by
using the bare Green’s functions in Eq. (B7, B9). We
only take into account the imaginary part of the lead self-
energies and ignore their energy dependence (the wide-
band limit)24. Finally, we assume that the phonons are
in their equilibrium states. Under these approximations,
the heat generation is (assuming eV ≥ ω0)
Q ≈ 1
M2ΓLΓR
[(ε− ε0)2 + Γ2/4] [(ε− ε0 − ω0)2 + Γ2/4]
M2ΓLΓR
4π(ω20 + Γ
(eV/2− ε0)2 + Γ2/4
(eV/2− ε0 − ω0)2 + Γ2/4
(−eV/2− ε0 + ω0)2 + Γ2/4
(−eV/2− ε0)2 + Γ2/4
arctan
eV/2− ε0
+ arctan
eV/2− ε0 − ω0
−arctan
−eV/2− ε0
−arctan
−eV/2− ε0 + ω0
. (C1)
ω0 is the phonon energy, ε0 is the electron onsite energy,
V is the applied bias, and Γ = ΓL+ΓR. The heat genera-
tion is zero when eV ≤ ω0. Equation (C1) can reproduce
most the qualitative features of heat generation in a sin-
gle atom, except that it does not take into account heat
conduction in the phonon system.
∗ Electronic address: tower.lu@gmail.com
http://staff.science.nus.edu.sg/~phywjs/
1 S. Ciraci, A. Buldum, and I. P. Batra, J. Phys.: Condens.
Matter 13, R537 (2001).
2 B. J. van Wees, H. van Houten, C. W. J. Beenakker,
J. G. Williamson, L. P. Kouwenhoven, D. van der Marel,
and C. T. Foxon, Phys. Rev. Lett. 60, 848 (1988);
D. A. Wharam, T. J. Thornton, R. Newbury, M. Pepper,
H. Ahmed, J. E. F. Frost, D. G. Hasko, D. C. Peacock,
D. A. Ritchie, and G. A. C. Jones, J. Phys. C: Solid State
Phys. 21, L209 (1988).
3 L. G. C. Rego and G. Kirczenow, Phys. Rev. Lett. 81, 232
(1998); K. Schwab, E. A. Henriksen, J. M. Worlock, and
M. L. Roukes, Nature 404, 974 (2000).
4 E. Pop, D. Mann, J. Cao, Q. Wang, K. Goodson, and
H. Dai, Phys. Rev. Lett. 95, 155505 (2005); D. Mann,
Y. K. Kato, A. Kinkhabwala, E. Pop, J. Cao, X. Wang,
L. Zhang, Q. Wang, J. Guo, and H. Dai, Nature Nanotech-
nology 2, 33 (2007).
5 M. Lazzeri, S. Piscanec, F. Mauri, A. C. Ferrari, and
J. Robertson, Phys. Rev. Lett. 95, 236802 (2005).
6 D. G. Cahill, W. K. Ford, K. E. Goodson, G. D. Mahan,
A. Majumdar, H. J. Maris, R. Merlin, and S. R. Phillpot,
J. Appl. Phys. 93, 793 (2003).
7 S. Lepri, R. Livi, and A. Politi, Phys. Rep. 377, 1 (2003).
8 B. Li, J. Wang, L. Wang, and G. Zhang, Chaos 15, 015121
(2005).
9 L. V. Keldysh, Sov. Phys. JETP 20, 1018 (1965).
10 L. Kadanoff and G. Baymn, Quantum Statistical Mechan-
ics (W. A. Benjamin, New York, 1962).
11 H. Haug and A.-P. Jauho, Quantum Kinetics in Transport
and Optics of Semiconductors (Springer, Berlin, 1996).
12 S. Datta, Electronic Transport in Mesoscopic Systems
(Cambridge University Press, 1997).
13 A. Ozpineci and S. Ciraci, Phys. Rev. B 63, 125415 (2001).
14 N. Mingo and L. Yang, Phys. Rev. B 68, 245406 (2003);
Phys. Rev. B 70, 249901(E) (2004); N. Mingo, Phys. Rev.
B 74, 125402 (2006).
15 T. Yamamoto and K. Watanabe, Phys. Rev. Lett. 96,
255503 (2006).
16 A. Dhar and D. Sen, Phys. Rev. B 73, 085119 (2006).
17 J.-S. Wang, J. Wang, and N. Zeng, Phys. Rev. B 74,
033408 (2006); J.-S. Wang, N. Zeng, J. Wang, and C. K.
Gan, cond-mat/0701164.
18 M. Galperin, M. A. Ratner, and A. Nitzan, J. Phys.: Con-
dens. Matter 19, 103201 (2007); cond-mat/0611169.
19 D. A. Ryndyk, M. Hartung, and G. Cuniberti, Phys. Rev.
B 73, 045420 (2006).
20 C. Auer, F. Schurrer, and C. Ertler, Phys. Rev. B 74,
165409 (2006).
21 M. Lazzeri and F. Mauri, Phys. Rev. B 73, 165419 (2006).
22 C. Caroli, R. Combescot, P. Nozieres, and D. Saint-James,
mailto:tower.lu@gmail.com
http://staff.science.nus.edu.sg/~phywjs/
J. Phys. C : Solid State Phys. 4, 916 (1971).
23 We denote electron energy as ε, and phonon energy as ω,
except in the unified expressions including both electrons
and phonons, i.e., Eq. (22).
24 Y. Meir and N. S. Wingreen, Phys. Rev. Lett. 68, 2512
(1992); A.-P. Jauho, N. S. Wingreen, and Y. Meir, Phys.
Rev. B 50, 5528 (1994).
25 R. Lake and S. Datta, Phys. Rev. B 46, 4757 (1992).
26 T. N. Todorov, Phil. Mag. B 77, 965 (1998).
27 M. J. Montgomery, T. N. Todorov, and A. P. Sutton, J.
Phys.: Condens. Matter 14, 5377 (2002).
28 A. P. Horsfield, D. R. Bowler, H. Ness, C. G. Snchez,
T. N. Todorov, and A. J. Fisher, Rep. Prog. Phys. 69,
1195 (2006).
29 N. Agräıt, C. Untiedt, G. Rubio-Bollinger, and S. Vieira,
Phys. Rev. Lett. 88, 216803 (2002).
30 A. Pecchia, G. Romano, and A. D. Carlo, Phys. Rev. B
75, 035401 (2007).
31 Q.-F. Sun and X. C. Xie, cond-mat/0608536.
32 T. N. Todorov, J. Hoekstra, and A. P. Sutton, Phys. Rev.
Lett. 86, 3606 (2001).
33 M. D. Ventra, Y.-C. Chen, and T. N. Todorov, Phys. Rev.
Lett. 92, 176803 (2004).
34 Y.-C. Chen, M. Zwolak, and M. D. Ventra, Nano Lett. 3,
1691 (2003); Z. Yang, M. Chshiev, M. Zwolak, Y.-C. Chen,
and M. D. Ventra, Phys. Rev. B 71, 041402(R) (2005);
Z. Huang, B. Xu, Y. Chen, M. D. Ventra, and N. Tao,
Nano Lett. 6, 1240 (2006).
35 D. Segal and A. Nitzan, J. Chem. Phys. 117, 3915 (2002).
36 N. Guisinger, M. Greene, R. Basu, A. Baluch, and M. Her-
sam, Nano Lett. 4, 55 (2004).
37 T. Rakshit, G.-C. Liang, A. W. Ghosh, M. C. Hersam, and
S. Datta, Phys. Rev. B 72, 125305 (2005).
38 J. Taylor, H. Guo, and J. Wang, Phys. Rev. B 63, 245407
(2001); N. Sergueev, D. Roubtsov, and H. Guo, Phys. Rev.
Lett. 95, 146803 (2005).
39 T. Frederiksen, M. Brandbyge, N. Lorente, and A.-P.
Jauho, Phys. Rev. Lett. 93, 256601 (2004); T. Frederik-
sen, M. Paulsson, M. Brandbyge, and A.-P. Jauho, cond-
mat/0611562.
40 T. Frederiksen, Master’s thesis, Technical University of
Denmark (2004).
41 J. K. Viljas, J. C. Cuevas, F. Pauly, and M. Hafner, Phys.
Rev. B 72, 245415 (2005).
ABSTRACT
  Employing the nonequilibrium Green's function method, we develop a fully
quantum mechanical model to study the coupled electron-phonon transport in
one-dimensional atomic junctions in the presence of a weak electron-phonon
interaction. This model enables us to study the electronic and phononic
transport on an equal footing. We derive the electrical and energy currents of
the coupled electron-phonon system and the energy exchange between them. As an
application, we study the heat dissipation in current carrying atomic junctions
within the self-consistent Born approximation, which guarantees energy current
conservation. We find that the inclusion of phonon transport is important in
determining the heat dissipation and temperature change of the atomic
junctions.

<|endoftext|><|startoftext|>
Introduction to Mesoscopic Physics 2nd ed. (Oxford, 2001).
h-hole
l-hole
[100]
[010]
FIG. 1: (a) Surface plot of spin subbands explains two Fermi contours with different kF appear
as a result of spin-orbit interaction. (b) According to more realistic calculation[6, 7], the outer
subband (h-hole) is significantly warped. (c) Schematic illustration of Beff seen by holes moving
around a ring structure in the presence of spin-orbit interaction. (d) Scanning electron micrograph
of the ADL sample.
Bext (T)
0 0.1 0.2 0.3 0.4 0.5
-0.02
-0.01
FIG. 2: Upper solid line: Resistance of the sample as a function of the external magnetic field
Bext for T=60 mK. Lower broken line: Extracted oscillating part of the magnetoresistance. For
the background subtraction, a 12th order polynomial for each set of 100 consecutive data points is
adopted.
Frequency (T −1)
T =60mK
20 40 60 80 100 120 140
160 180 200 220
FIG. 3: Fourier spectrum of the AB-type oscillation in the magnetoresistance at 60mK. The main
peak splits into four (B, A, A’, B’) and also the 2nd harmonic splits into two (A∗, A’∗). (The other
two are within the noise level.) The inset shows peak structures around four times frequency of the
main peak structure as indicated by arrows. The scales of FT amplitude are shown by the arrows
marked as u in the respective figures.
Frequency (T −1)
T=60mK
120mK
150mK
210mK
250mK
40 60 80 100 120 140
T (mK)
100 200 300
FIG. 4: FT results at temperatures from 300mK down to 60mK. The data are offset for clarity.
The inset shows the peak heights normalized at 60mK for peaks A, A’ and B’. The solid line in
the inset is the fit to the Dingle function (4).
40 50 60
100 110 120
140 150 160 170
Frequency (T −1)
190 200 210 220 230 240
FIG. 5: Solid lines are the FT spectra of the function (5) for the winding number n = 1,2,3,4.
The magnetic field region is taken as −0.5T< B <0.5T and the parameter ∆kF is taken from SdH
measurement as 4.1×107m−1. The scale of abscissa is modified with n to show the entire peak
structures. Dotted lines are the results when Berry’s phase ∆θB is set to zero. Significant difference
between solid and dotted lines around the center while they are almost identical in surrounding
regions.
	Acknowledgments
	References
ABSTRACT
  We report observation of spin-orbit Berry's phase in the Aharonov-Bohm (AB)
type oscillation of weak field magnetoresistance in an anti-dot lattice (ADL)
of a two-dimensional hole system. An AB-type oscillation is superposed on the
commensurability peak, and the main peak in the Fourier transform is clearly
split up due to variation in Berry's phase originating from the spin-orbit
interaction. A simulation considering Berry's phase and the phase arising from
the spin-orbit shift in the momentum space shows qualitative agreement with the
experiment.

<|endoftext|><|startoftext|>
Introduction
Jones (1976) studied the propagation of light in a moving dielectric and showed
by experiment that a rotating medium induces a rotation of the polarisation of
the transmitted light. Player (1976) confirmed that this observation could be ac-
counted for through an application of Maxwells equations in a moving medium.
More recently Padgett et al. (2006) reasoned that the rotation of the medium turns
a transmitted image by the same angle as the polarisation. This is in contrast to the
Faraday effect (Faraday 1846), where a static magnetic field in a dielectric medium,
parallel to the propagation of light, causes a rotation of the polarisation but not
a rotation of a transmitted image. Rotation of the plane of polarisation and im-
age rotation in a rotating medium may be attributed respectively to the spin and
orbital angular momentum of light (Allen et al. 1999, 2003).
The first theoretical treatment of this problem was published by Fermi (1923),
who considered plane waves and a non-dispersive medium. The theoretical ana-
lysis of Player (1976) was also restricted to the propagation of plane waves, but
took the dispersion of the medium into account. Player assumed that the dielectric
response does not depend on the motion of the medium. In our treatment we
follow his assumption although a more careful analysis by Nienhuis et al. (1992)
showed that there will be an effect of the motion on the refractive index for a
dispersive medium near to an absorption resonance (see also Baranova & Zel’dovich
(1979) for a discussion on the effect of the Coriolis force on the refractive index).
In contrast to Player we allow for more general electromagnetic fields that can
carry orbital angular momentum (OAM). This leads to an additional term in our
Article submitted to Royal Society TEX Paper
http://arxiv.org/abs/0704.0725v1
2 J. B. Götte, S. M. Barnett and M. Padgett
wave equation, which corresponds to a Fresnel drag term familiar from analysis of
uniform motion. For a rotating medium, however, this drag leads to a rotational
shift of the image. The propagation of light in a rotating medium thus involves
both spin angular momentum (SAM) and OAM. We solve the wave equation for
circularly polarised Bessel beams and consider two different superpositions of such
Bessel beams to quantify the effects of both polarisation and image rotation. For
rotation of the polarisation we examine a superposition of left- and right-circularly
polarised Bessel beams carrying the same amount of OAM. For image rotation we
consider a superposition of Bessel beams with the same circular polarisation but
opposite OAM values. Such a superposition creates an intensity pattern with lobes
or ‘petals’. In both cases the constituent Bessel beams propagate differently in the
medium, which leads to a change in their relative phase. This is the origin of the
rotation of both the polarisation and the transmitted image. For both phenomena
we derive an expression for the angle per unit length of dielectric through which
the image or the polarisation is rotated.
The significance of the total angular momentum can be most easily seen in the
wave equation for the propagation of light in a rotating medium. We derive this
wave equation in section 2. In the remaining sections we calculate the rotation of
polarisation (section 3) and the image rotation (section 4) and reveal their common
form.
2. Wave equations
The wave equation for a general electric displacment D in a rigid dielectric medium
rotating with angular velocity Ω is given by:
−∇2D = −ǫ(ω′)D̈+ 2[ǫ(ω′)− 1]
Ω× Ḋ− (v · ∇)Ḋ
. (2.1)
An analoguous wave equation can be derived for the magnetic induction B. Com-
pared to the form derived by Player (1976), who considered the special case of a
plane wave propagating along the direction of Ω, these wave equations contain an
additional term 2[ǫ(ω′)− 1](v · ∇)Ḋ. This term is responsible for the Fresnel drag
effect which modifies the speed of light in a moving medium (McCrea 1954; Bar-
ton 1999; Rindler 2001). In the following we will derive this wave equation for the
electric displacement.
Our analysis starts with the same considerations as Player (1976), by intro-
ducing a rest frame and a moving frame. In the rest frame the dielectric medium
rotates with an angular velocity v = Ω× r and in the moving frame the medium is
at rest. We restrict our analysis to small velocities with v ≪ c and use Maxwell’s
equations in both reference frames (Landau & Lifshitz 1975). For the medium at
rest we assume the following constitutive relations:
′ = ǫ(ω′)E′, (2.2a)
′ = H′, (2.2b)
where we have used primes to denote the fields and their frequency ω′ in the moving
frame. The fields in the moving frame can be expressed in the rest frame by a Lorentz
Article submitted to Royal Society
On the dragging of light by a rotating medium 3
transformation (Stratton 1941; Jackson 1998), which gives to first order in v/c:
′ = D+ v ×H, (2.3a)
′ = B− v ×E, (2.3b)
′ = E+ v ×B, (2.3c)
′ = H− v ×D, (2.3d)
where we have set c = 1 and work with units in which ǫ0 = µ0 = 1. The two
constitutive relation in (2.2) in the rest frame are thus given by
D+ v ×H = ǫ(ω′) (E+ v ×B) , (2.4a)
B− v ×E = H− v ×D. (2.4b)
The dielectric constant is still given as a function of the frequency in the moving
frame. We also assume that the dielectric constant depends only on the frequency
and is otherwise independent of the state of motion of the medium. On combining
these two equations we can express D and B with the two other fields E and H to
the first order in v:
D = ǫ(ω′)E+ [ǫ(ω′)− 1]v×H, (2.5a)
B = H− [ǫ(ω′)− 1]v×E. (2.5b)
After taking the curl of (2.5a) we can use the Maxwell equation ∇×E = −Ḃ and
express Ḃ, with the help of (2.5b), in terms of Ḣ and Ė. If we assume v to be
constant (see Appendix A), as in Player’s paper (Player, 1976) this yields
∇×D = −ǫ(ω′)Ḣ+ ǫ(ω′)[ǫ(ω′)− 1]v × Ė+ [ǫ(ω′)− 1]∇× (v ×H). (2.6)
It follows from (2.5a) that ǫ(ω′)v × E = v ×D, to the first order in v, and so we
can rewrite (2.6) as:
∇×D = −ǫ(ω′)Ḣ+ [ǫ(ω′)− 1]v × Ḋ+ [ǫ(ω′)− 1]∇× (v ×H). (2.7)
We can now take the curl of (2.7) to obtain a wave equation for D, as ∇×∇×D =
−∇2D for ∇·D = 0, and the curl of Ḣ is given by ∇× Ḣ = D̈. In order to express
the curl of the vector products we use the identity∇×(a×b) = ∂ibia−∂iaib, where
the doubly occurring index denotes a summation over the Cartesian components.
The operator ∂i represents differentiation with respect to the ith component and
acts on the whole product which gives rise to terms containing the divergences of
v,D and H. These terms are either zero, because ∇ · v = 0 and ∇ ·D = 0 or they
lead to terms which are of second order in v and therefore negligible. The wave
equation for D is thus given by
−∇2D = −ǫ(ω′)D̈+ [ǫ(ω′)− 1]
(Ḋ · ∇)v − (v · ∇)Ḋ
+ [ǫ(ω′)− 1]∇× [(H · ∇)v − (v · ∇)H] .
(2.8)
For a rotation v = Ω×r we can specify terms of the form (a ·∇)v by expressing the
components of the velocity v using the Levi-Civitta symbol εijk as vi = εijkΩjrk.
The components of (a · ∇)v are thus given by
[(a · ∇)v]i = al∂lεijkΩjrk = alεijkΩjδlk = [Ω× a]i . (2.9)
Article submitted to Royal Society
4 J. B. Götte, S. M. Barnett and M. Padgett
If we use the results from (2.9) in (2.8) we find for ∇2D:
−∇2D = −ǫ(ω′)D̈+ [ǫ(ω′)− 1]
Ω× Ḋ− (v · ∇)Ḋ
+ [ǫ(ω′)− 1]∇× [Ω×H− (v · ∇)H] .
(2.10)
The curl of the last bracket requires some some additional calculations. The first
term is given by:
∇× (Ω×H) = (∇ ·H)Ω− (Ω · ∇)H, (2.11)
and the second term can be written as:
∇× (v · ∇)H = Ω (∇ ·H)−∇ (Ω ·H) + (v · ∇) Ḋ, (2.12)
where the last term originates from ∇×H. The terms containing the divergence of
H cancel and the term (v · ∇) Ḋ can be added to the second term in (2.10). The
two remaining terms − (Ω · ∇)H and ∇ (Ω ·H) together give Ω× Ḋ:
Ω× Ḋ = Ω× (∇×H) = ∇ (Ω ·H)− (Ω · ∇)H. (2.13)
This concludes the derivation of the wave equation (2.1). It is possible to derive the
same wave equation for B using similar methods.
For a rotation around the z axis with constant angular velocity Ω = Ωez, the
directional derivative v · ∇ is proportional to an azimuthal derivative, as v · ∇ =
Ω× r · ∇ = Ω∂φ. This allows us to identify the two terms Ω× Ḋ and Ω∂φḊ in the
wave equation
−∇2D = −ǫ(ω′)D̈+ 2[ǫ(ω′)− 1]
Ω× Ḋ− Ω∂φḊ
(2.14)
as the polarisation rotation and rotary Fresnel drag terms, respectively. Player’s
derivation does not contain the term proportional to ∂φḊ because he treated only
the case of a plane wave propagating in the z-direction and for such fields D is
independent of φ.
On substituting a monochromatic ansatz of the form D = D0 exp(−iωt) into
(2.14), where ω is the optical angular frequency in the rest frame, we obtain:
−∇2D0 = ǫ(ω
′)ω2D0 − 2[ǫ(ω
′)− 1]ωΩ [iez ×D0 − i∂φD0] . (2.15)
If we make an ansatz for D0 with a general polarisation given by the complex
numbers α and β (with |α|2 + |β|2 = 1) in the form of D0 = (αex + βey)D+Dzez,
we find that the x and y components of the wave equation (2.15) decouple if β = ±iα
corresponding to left- and right-circularly polarised light respectively. If we restrict
the solutions to these two cases we can write the wave equation as:
∇2D = −ǫ(ω′)ω2D + 2[ǫ(ω′)− 1]ωΩ (±1− i∂φ)D, (2.16)
where the plus sign refers to left-circular polarisation and the minus sign to right-
circular polarisation. We can then identify ±1 as the extreme values of the variable
σ which corresponds to the circular polarisation or SAM of the light beam. Similarly
we can identify −i∂φ = Lz as the OAM operator, so that the wave equation contains
a term which depends on the total angular momentum σ + Lz:
∇2D = −ǫ(ω′)ω2D + 2[ǫ(ω′)− 1]ωΩ (σ + Lz)D. (2.17)
We shall see that it is the dependence on the optical angular momentum that is
responsible for the rotation of both the polarisation and of a transmitted image.
Article submitted to Royal Society
On the dragging of light by a rotating medium 5
3. Specific rotary power
The rotation of the polarisation arises from the difference in the refractive indices
for left- and right-circularly polarised light. The angle per unit length by which the
polarisation is rotated is called the specific rotary power. For an optically active
medium at rest the specific rotary power is characteristic for a given material, but
from (2.17) it can be seen that light propagates differently in a rotating medium,
depending on whether the circular polarisation turns in the same rotation sense
as the dielectric or in the opposite sense. This phenomenon is described by the
effective specific rotary power (Jones 1976; Player 1976).
The specific rotary power, defined as (Fowles 1975):
δpol(ω) = (nr(ω)− nl(ω))
= (nr(ω)− nl(ω))
, (3.1)
is the angle of rotation of the plane of polarisation in an optical active medium. Here,
the indices r and l refer to right- and left-circularly polarised light. It was convenient
to set c = 1 for our derivation in section 2 but we reintroduce it here to facilitate
the calculation of measurable quantities. In order to illustrate the effect of the OAM
of light we choose a Bessel beam as an ansatz for the electrical displacement in the
x− y plane:
D = Jm(κρ) exp(imφ) exp(ikzz), (3.2)
where κ and kz are the transverse and longitudinal components of the wavevector.
Bessel beams of this form carry OAM of m~ per photon (Allen et al. 1992, 1999,
2003). Substituting the Bessel beam ansatz in the wave equation (2.17) yields the
following result for the overall wavenumber k =
κ2 + k2z :
k2l/r(ω) = ǫ(ω
− 2[ǫ(ω′)− 1]
(σ +m). (3.3)
The indices l and r denoting the circular polarisation correspond respectively to
σ = 1 and σ = −1. With the help of the relations ǫ(ω′) = n2(ω′) and k(ω) =
n(ω)ω/c we can turn the equation for the wavenumbers into an equation for the
effective refractive indices for left- and right-circularly polarised light:
n2l/r(ω) = n
2(ω′)− 2[n2(ω′)− 1]
(σ +m). (3.4)
Following Player (1976) we assume that Ω ≪ ω and we can therefore approximate
the square root for the refractive indices nl/r by a small parameter expansion to
the first order in Ω/ω:
nl/r(ω) ≃ n(ω
n(ω′)−
n(ω′)
(σ +m) . (3.5)
The frequency in the moving frame ω′ is different for left- and right-circularly po-
larised light (Garetz 1981) and, more generally, the azimuthal or rotational Doppler
shift is proportional to the total angular momentum (σ + m) (Allen et al. 1994;
Bialynicki-Birula & Bialynicka-Birula 1997; Courtial et al. 1998; Allen et al. 2003).
For left-circularly polarised light with σ = 1 the frequency is thus ω′ = ω−Ω(1+m),
Article submitted to Royal Society
6 J. B. Götte, S. M. Barnett and M. Padgett
and for right-circularly polarised light with σ = −1 the frequency changes to
ω′ = ω − Ω(−1 + m). Following Player (1976) we expand the refractive index
of the dielectric in a Taylor series to calculate the difference nr − nl:
nl(ω) ≃ n(ω)−
Ω(1 +m)−
n(ω)−
(1 +m) , (3.6a)
nr(ω) ≃ n(ω)−
Ω(−1 +m)−
n(ω)−
(−1 +m) . (3.6b)
Higher order derivatives of n become comparable in magnitude if n′(ω)Ω ≃ n(ω).
This will only be case for a strongly dispersive medium, such as atomic or molecular
gases, near a resonance. For such gaseous media the dielectric response in a rotating
medium has to examined more closely (Nienhuis et al. 1992). For solid materials,
such as a rotating glass rod, and for optical frequencies this condition is not fulfilled
and we can neglect higher order derivatives in the expansion (3.6). Within Player’s
assumption that the refractive index is independent of the motion of the medium
we find for the effective specific rotary power:
δpol(ω) =
ωn′(ω) + n(ω)−
. (3.7)
On introducing the group refractive index ng(ω) = n(ω) + ωn
′(ω) and the phase
refractive index nϕ(ω) = n(ω), we can rewrite the rotary power as
δpol(ω) =
ng(ω)− n
ϕ (ω)
(Ω/c), (3.8)
which is identical to Player’s (1976) expression. In this form the specific rotary
power (3.8) can be used directly with experimental data in the SI unit system. In
the next section we look at image rotation caused by a difference in the effective
refractive indices for different values of m.
4. Image rotation
The specific rotary power describes the rotation of the propagation, but we can
define, analogously, a rotary power of image rotation. The image can simply be
created by the superposition of two light beams carrying different values of OAM
which leads to an azimuthal variation of the intensity pattern. In particular we
consider an incident superposition of two similarly circularly polarised Bessel beams
with opposite OAM values of the form
D = D+ +D−
= Jm(κρ) exp(imφ) exp(ikzz) + J−m(κρ) exp(−imφ) exp(ikzz).
(4.1)
Outside the medium the superposition can be written as one Bessel beam with a
trigonometric modulation
D = Jm(κρ) (exp(imφ) + (−1)
m exp(−imφ)) exp(ikzz), (4.2)
but inside the medium the effective refractive index is different for the two com-
ponents of the superposition (Allen & Padgett 2007). On propagation this leads to
Article submitted to Royal Society
On the dragging of light by a rotating medium 7
HaL HbL ∆
Figure 1. Image rotation
(Intensity pattern created by the superposition of Bessel beams (a) in (4.1) for m = 2.
On propagation the relative phase between the constituent Bessel beams changes which
leads to a rotation of the pattern (b). The angle of rotation at a propagation distance L
is given by δimgL.)
phase difference which causes a rotation of the image (see figure 1). We define
δimg(ω) = (n−(ω)− n+(ω))
, (4.3)
which is the angle per unit length by which the image is rotated. The factorm in the
expression for δimg appears because of the exp(imφ) and exp(−imφ) phase structure
of the interfering beams and the resulting 2m-fold symmetry of the created image
(Pagdett et al. 2006).
The different effective refractive indices for the components of the superposition
(4.1) are given by:
n2+/−(ω) = n
2(ω′)− 2[n2(ω′)− 1]
(σ ±m). (4.4)
Here, σ is fixed in contrast to (3.4). The roles of σ and m are reversed for the image
rotation and the refractive indices for positive and negative OAM are given by:
n+(ω) ≃ n(ω)−
Ω(σ +m)−
n(ω)−
(σ +m) , (4.5a)
n−(ω) ≃ n(ω)−
Ω(σ −m)−
n(ω)−
(σ −m) . (4.5b)
On substituting (4.5) into (4.3) we find:
δimg(ω) =
+ n(ω)−
, (4.6)
which can be written in terms of the group and phase refractive indices as:
δimg(ω) =
ng(ω)− n
ϕ (ω)
(Ω/c). (4.7)
Article submitted to Royal Society
8 J. B. Götte, S. M. Barnett and M. Padgett
This verifies the reasoning of Padgett et al. (2006) that the polarisation and the
image are turned by the same amount when passing through a rotating medium.
It is the total angular momentum that determines the phase shifts and a linearly
polarised image will undergo rotations of both the plane of polarisation and the
intensity pattern or image.
5. Conclusion
We have extended a theoretical study by Player (1976) on the propagation of light
through a rotating medium to include general electromagnetic fields. In the original
analysis Player (1976) showed that the rotation of the polarisation inside a rotating
medium can be understood in terms of a difference in the propagation for left- and
right-circularly polarised light. Player’s (1976) analysis was thus concerned solely
with the spin angular momentum (SAM) of light.
Our treatment has shown that the general wave equation has an additional term,
which is of the same form as the Fresnel drag term for a uniform motion. In the
context of rotating motion, however, this term is connected to the orbital angular
momentum (OAM) of the light. By extending the theoretical analysis to include
OAM we have been able to attribute polarisation rotation and image rotation to
SAM and OAM respectively. We have shown that a superposition of Bessel beams
with the same OAM but opposite SAM states leads to the rotation of the polari-
sation, whereas a superposition of Bessel beams with the same SAM and opposite
OAM values gives rise to a rotation of the transmitted image. We have obtained
quantitative expressions for the rotation of the polarisation and of the transmitted
image and have verified that both are turned through the same angle, as recently
suggested by Padgett et al. (2006).
Player (1976) remarked that the derivation by Fermi (1923) appears to be in
error. The mistake in Fermi’s treatment seems to be in missing the transformation
of the magnetic fields. Whereas the change in the electric fields induced by the
motion of the medium is explicitly given in terms of the electric polarisation P†, a
similar transformation for the magnetic field is missing. In terms of our derivation
this would mean that (2.5b) changes to B = H in the rest frame. This in turn
causes that the term v × Ḋ would be missing in (2.6). This term and the term
∇ × (v × H) contribute equally to the wave equation (2.1), which explains why
Fermi’s result for the specific rotary power is smaller than Player’s and ours by a
factor of two. As pointed out by Player (1976) this missing factor is cancelled by
an additional factor of two in Fermi’s definition of the specific rotary power.
We would like to thank Amanda Wright and Jonathan Leach whose experiments on this
problem motivated our work. This work was supported by the UK Engineering and Phys-
ical Sciences Research Council.
Appendix A. Accelerated motion
The assumption that v = Ω × r is steady is problematic for a rotating motion;
if we assume Ω to be constant over time, then v̇ = (Ω · r)Ω − Ω2r. In princi-
ple this would invalidate our initial considerations for the transformation of the
† Fermi (1923) denotes the electric polarisation by S
Article submitted to Royal Society
On the dragging of light by a rotating medium 9
electromagnetic fields (2.3) which strictly hold only for uniform motion. Includ-
ing the time-derivative of v would lead to additional terms in (2.6) of the form
ǫ(ω′)[ǫ(ω′) − 1]v̇ × E. If we proceed in taking the curl of this vector product we
produce four terms which either can be neglected because they are second order in
v/c, or they do not contain the time derivative of an optical field. The latter are
smaller than terms that do contain a time derivative by ∼ Ω/ω. For our assumption
Ω ≪ ω all such terms are negligible.
References
Allen, L., Beijersbergen, M. W., Spreeuw, R. J. C. &Woerdman, J. P. 1992 Orbital angular
momentum of light and the transformation of Laguerre-Gaussian modes. Phys. Rev. A
45, 8185–8190.
Allen, L., Babiker, M. & Power, W. L. 1994 Azimuthal Doppler-shift in light-beams with
orbital angular-momentum. Opt. Commun. 112, 141–144.
Allen, L., Padgett, M. J. & Babiker M. 1999 The Orbital Angular Momentum of Light.
Prog. Opt. 39, 291–372.
Allen, L., Barnett, S. M. & Padgett, M. J. 2003 Optical Angular Momentum. Bristol:
Institute of Physics Publishing.
Allen, L. & Padgett, M. J. 2007 Equivalent geometric transformation for spin and orbital
angular momentum of light. J. Mod. Opt 54, 487–491.
Baranova, N. B. & Zel’dovich, B. Ya. 1979 Coriolis contribution to the rotary ether drag.
Proc. R. Soc. Lond. A 368, 591–592.
Barton, G. 1999 Introduction to the Relativity principle. Chichester: John Wiley & Sons.
Bialynicki-Birula I. & Bialynicka-Birula Z. 1997 Rotational Frequency Shift. Phys. Rev.
Lett 78, 2539–2542.
Courtial J., Robertson D. A., Dholakia K., Allen L. & Padgett M. J. 1998 Rotational
Frequency Shift of a Light Beam. Phys. Rev. Lett. 81, 4828 – 4830.
Faraday, M. 1846 Experimental Researches in Electricity. Nineteenth Series. Philos. Trans.
R. Soc. Lond. 136, 1–20.
Fermi, E. 1923 Sul trascinamento del piano di polarizzazione da parte di un mezzo
rotante. Rend. Lincei 32, 115–118. Reprinted in: Fermi, E. 1962 Collected Papers, vol.
1. Chicago: University of Chicago Press.
Fowles, F. R. 1975 Introduction to Modern Optics, 2nd edn. New York: Dover Publications.
Garetz, B. A. 1981 Angular Doppler-effect. J. Opt. Soc. Am. 71, 609–611.
Jackson, J. D. 1999 Classical Electrodynamics, 3rd edn. New York: John Wiley & Sons.
Jones, R. V. 1976 Rotary ‘aether drag’. Proc. R. Soc. Lond. A 349, 423–439.
Landau, L. D. & Lifshitz, E. M. 1975 The Classical Theory of Fields, 4th edn. Oxford:
Elsevier Butterworth-Heinemann.
Nienhuis, G., Woerdman, J. P. & Kuščer 1992 Magnetic and mechanical Faraday effects.
Phys. Rev. A 46 (11), 7079–7092.
McCrea, W. H. 1954 Relativity Physics, 4th edn. London: Methuen Publishing.
Padgett M., Whyte G., Girkin J., Wright A., Allen L., Öhberg P. & Barnett S. M. 2006
Polarization and image rotation induced by a rotating dielectric rod: an optical angular
momentum interpretation. Optics Lett. 31 (14), 2205–2207.
Player, M. A. 1976 Polarization and image rotation induced by a rotating dielectric rod:
an optical angular momentum interpretation. Proc. R. Soc. Lond. A 349, 441–445.
Rindler, W. 2001 Relativity. Oxford: Oxford University Press.
Stratton, J. A., 1941 Electromagnetic Theory. New York: McGraw-Hill.
Article submitted to Royal Society
	Introduction
	Wave equations
	Specific rotary power
	Image rotation
	Conclusion
	Accelerated motion
ABSTRACT
  When light is passing through a rotating medium the optical polarisation is
rotated. Recently it has been reasoned that this rotation applies also to the
transmitted image (Padgett et al. 2006). We examine these two phenomena by
extending an analysis of Player (1976) to general electromagnetic fields. We
find that in this more general case the wave equation inside the rotating
medium has to be amended by a term which is connected to the orbital angular
momentum of the light. We show that optical spin and orbital angular momentum
account respectively for the rotation of the polarisation and the rotation of
the transmitted image.

<|endoftext|><|startoftext|>
Introduction to Econophysics (Cambridge University
Press, Cambridge); Liu Y, Gopikrishnan P, Cizeau P, Meyer M, Peng C K and Stanley H E,
1999 Phys. Rev. E 60, 1390; Vandewalle N, Ausloos M and Boveroux P, 1999 Physica A 269,
[30] Kantelhardt J W, Berkovits R, Havlin S and Bunde A, 1999 Physica A 266, 461; Vandewalle
N, Ausloos M, Houssa M, Mertens P W and Heyns M M, 1999 Appl. Phys. Lett. 74, 1579
[31] Feder J, 1988 Fractals (Plenum Press, New York)
[32] Barabási A L and Vicsek T, 1991 Phys. Rev. A 44, 2730
[33] Peitgen H O, Jürgens H and Saupe D, 1992 Chaos and Fractals (Springer-Verlag, New York),
Appendix B
[34] Bacry E, Delour J and Muzy J F, 2001 Phys. Rev. E 64, 026103
[35] Fano U, 1947 Phys. Rev. 72 26
[36] Barmes J A and Allan D W, 1996 Proc. IEEE 54 176
[37] Buldyrev S V, Goldberger A L, Havlin S, Mantegna R N, Matsa M E, Peng C K, Simons M,
Stanley H E, 1995 Phys. Rev. E 51 5084
[38] Eke A, Herman P, Kocsis L and Kozak L R, 2002 Physiol. Meas. 23, R1-R38
[39] M Sadegh Movahed, G R Jafari, F Ghasemi , Sohrab Rahvar and M Reza Rahimi Tabar, J.
Stat. Mech. (2006) P02003.
[40] Kantelhardt J W, Zschiegner S A, Kosciliny-Bunde E, Bunde A, Pavlin S and Stanley H E,
2002 Physica A 316, 78-114.
[41] Oświȩcimka P and et. al, [arXive:cond-mat/0504608]
http://arxiv.org/abs/cond-mat/0504608
	Introduction
	Multifractal Detrended Fluctuation Analysis
	Description of MF-DFA method
	Relation to standard multifractal analysis
	Analysis of music frequency series
	Conclusion
	Acknowledgment
	References
ABSTRACT
  We show that it can be considered some of Bach pitches series as a stochastic
process with scaling behavior. Using multifractal deterend fluctuation analysis
(MF-DFA) method, frequency series of Bach pitches have been analyzed. In this
view we find same second moment exponents (after double profiling) in ranges
(1.7-1.8) in his works. Comparing MF-DFA results of original series to those
for shuffled and surrogate series we can distinguish multifractality due to
long-range correlations and a broad probability density function. Finally we
determine the scaling exponents and singularity spectrum. We conclude fat tail
has more effect in its multifractality nature than long-range correlations.

<|endoftext|><|startoftext|>
Circuit QED with a Flux Qubit Strongly Coupled
to a Coplanar Transmission Line Resonator
T.Lindström1, C.H. Webster1, J.E. Healey2, M. S. Colclough2 C.M.Muirhead2, A.Ya.Tzalenchuk1
1National Physical Laboratory, Hampton Road, Teddington, TW11 0LW, UK and
2 University of Birmingham, Edgbaston, Birmingham B15 2TT, UK
(Dated: October 22, 2018)
We propose a scheme for circuit quantum electrodynamics with a superconducting flux-qubit
coupled to a high-Q coplanar resonator. Assuming realistic circuit parameters we predict that it
is possible to reach the strong coupling regime. Routes to metrological applications, such as single
photon generation and quantum non-demolition measurements are discussed.
I. INTRODUCTION
Until a few years ago it was an open question whether
true quantum effects such as quantum entanglement
would ever be observed in a man-made macroscopic elec-
tronic device. However, over the past decade, quantum
coherence has been demonstrated in a variety of macro-
scopic systems, including superconducting circuits1,2,3,4.
Many of these experiments drew inspiration from the pi-
oneering work on atomic qubits that took place a decade
earlier5. As the fields of atomic physics and quantum
optics continue to advance, it makes sense to continue to
look to them for guidance.
The universal nature of quantum mechanics is greatly
to our advantage, in that the terminology and method-
ology apply as well to macroscopic as to microscopic
systems. This allows well-known results from atomic
physics and quantum optics to be used to plan and pre-
dict the outcome of experiments on solid state devices.
Recently, such techniques have been applied with great
success to implement a number of ideas such as quan-
tum state tomography6, Mach-Zehnder interferometry7
and sideband cooling8.
This approach has also been very successful in the field
of cavity quantum electrodynamics (CQED) [see Ref.9
for an introduction]. In CQED an atomic 2-level system
(i.e. a qubit) is made to interact with a high-finesse op-
tical cavity with a coupling energy h̄g. Provided that
the relaxation rates γ of the qubit and κ of the cavity
field are smaller than g (known as the strong coupling
criterion), it is possible to observe a coherent exchange
of energy between the qubit and the cavity field. The
resulting entangled states can be detected spectroscopi-
cally. Recently, Schoelkopf and co-workers10,11 achieved
strong coupling in a macroscopic circuit comprising a su-
perconducting charge qubit and a coplanar transmission
line resonator. This new field is known as circuit-QED,
and has many potential applications such as the gener-
ation and detection of single microwave photons. More
recently strong coupling was observed in experiments on
photonic crystals12 and quantum dots13.
Much of the work to date on qubit-cavity sys-
tems has been focused on methods for making quan-
tum non-demolition (QND) measurements of the qubit
state. QND schemes have been used to read-out single
qubits10,14 and also to measure the photon number in
the cavity15. Several ways of reading out qubits using
low-Q cavities/resonators have also been reported16,17,
but these do not allow the strong coupling regime to be
reached.
The benefits of superconducting circuit-QED systems
over atomic systems are twofold. Firstly, the qubit en-
ergy can be tuned by varying the external magnetic field,
enabling control over the qubit-cavity interaction. Sec-
ondly, qubit parameters such as the level separation at
the degeneracy point can be engineered through appro-
priate circuit design. The main advantage of flux qubits
over other types of superconducting qubits is that they
are less susceptible to fluctuations in the background
charge and the associated noise. This makes them less
prone to decoherence, and therefore easier to manipulate
in a deterministic way.
In section II we summarise the established background
theory in the context of our proposed system; in III we
show that it is possible to achieve strong coupling be-
tween a superconducting flux qubit18 and a high-Q copla-
nar transmission line resonator; in IV we present com-
puter simulations of the resonator response; and in V we
propose a scheme for producing single photons on de-
mand.
II. THEORY OF CIRCUIT QED WITH A FLUX
QUBIT
In this section we analyze the processes which occur
when a superconducting qubit is coupled to a supercon-
ducting coplanar transmission line resonator, as shown in
Fig. 1. The effect of coupling a flux qubit to a resonator
has previously been experimentally demonstrated19 but
the quality factor of the resonator was too small to fulfill
the strong coupling criteria.
Two types of flux qubits will be discussed - an RF
SQUID, which consists of a single Josephson junction in
a superconducting loop, and a persistent current qubit
(PCQ), which has three junctions in the loop, one of
which is smaller than the others by a factor α18. Both
need to be biased by an external magnetic flux Φx, which
tunes the energy level separation through an anticross-
ing at Φx = 0.5Φ0. The resonator is a coplanar trans-
http://arxiv.org/abs/0704.0727v3
FIG. 1: a) Sketch of a typical coplanar waveguide resonator
of length l=λ/2 ≈ 11 mm. Shown is also how the qubit can be
placed in between the centre conductor and the ground plane
of the waveguide. b) Schematic diagram of a superconducting
qubit coupled to a coplanar transmission line resonator. (A)
Persistent current qubit. (B) RF SQUID. M is the mutual
inductance between the qubit and resonator, and ΦX is the
magnetic flux threading the qubit loop.
mission line with inductance L and capacitance C, which
is weakly coupled to external transmission lines via cou-
pling capacitors CC . The Hamiltonian H of the complete
system is
H = Hq +Hr +Hg +HI +HE . (1)
Hq describes the qubit, Hr the resonator, and Hg the
interaction between them. HI denotes the interaction
of the system with a periodic drive field. Finally, HE
describes the interaction of the resonator with its envi-
ronment, resulting in the loss of photons to the external
transmission lines and the interaction of the qubit with
its environment, resulting in spontaneous decay from the
excited state to the ground state.
The qubit Hamiltonian Hq is given by the expression
(h̄/2)(−ǫσz−δσx) where σz and σx are Pauli spin matri-
ces, δ is the the level repulsion, ǫ = (2Ip/h̄)(Φx −Φ0/2),
and Ip is the persistent current. In the case of an RF
SQUID suitable for operation as a qubit, Ip is roughly
equal to half the critical current of the single junction,
whereas, for the persistent current qubit, Ip is approxi-
mately equal to the critical current of the smallest of the
three junctions. If the qubit is operated at or near the
degeneracy point, Hq can be expressed more simply by
transforming to the basis in which the ground state | ↓〉
and excited state | ↑〉 correspond to symmetric and anti-
symmetric superpositions of clockwise and anti-clockwise
persistent currents. This yields
(| ↑〉〈↑ | − | ↓〉〈↓ |) = h̄ω0
σz, (2)
where the level separation is h̄ω0, ω0 being the qubit
Larmor frequency
ǫ2 + δ2.
Assuming the resonator supports only a single mode
of the electromagnetic field, its Hamiltonian is given by
Hr = h̄ωr
, (3)
where a†(a) is the creation(annihilation) operator which
creates (destroys) a single photon in the cavity. The
eigenstates of the resonator described by this Hamilto-
nian are Fock states |0〉 . . . |n〉 . . . with n photons. The
single mode condition corresponds to a harmonic oscilla-
tor with the energy levels at h̄ωr(n+ 1/2) and the zero-
point energy h̄ωr/2.
The interaction between the radiation and the qubit
is described as the dipole interaction Hg = −µ̂ · B̂ be-
tween a magnetic moment µ of the persistent current
circulating in the qubit loop and the magnetic field B in
the resonator. Introducing quantization, this term in the
Hamiltonian can be written in the form
Hg = h̄g(a
†σ− + σ+a), (4)
where σ+(σ−) is the raising(lowering) operator for the
qubit. This expression is valid in the so-called non-
dispersive regime where ωr ≈ ω035. The constant g char-
acterizes the qubit-photon interaction strength and the
expression in the brackets describes the process whereby
the qubit can be excited by absorbing a photon, or a
photon can be generated at the expense of de-exciting
the qubit into its ground state. In the next section we
shall explicitly calculate the dipole coupling strength g
for specific designs of the qubit and the cavity.
The interaction of the coupled system with an external
classical drive field can be seen as a periodic exchange of
photons between the resonator and the driving field:
HI = ξ(e
−iωta† + eiωta), (5)
where ξ is the drive amplitude. One can also drive the
qubit directly using a separate control line leading to
terms ξ′(e−iωtσ+ + eiωtσ−), but we shall not consider
this case any further.
Adding the terms (2)-(5) together we arrive at the
driven Jaynes-Cummings Hamiltonian20
σz + h̄ωr
+ h̄g(a†σ− + σ+a) (6)
+ ξ(e−iωta† + eiωta).
This Hamiltonian can be used to write down a mas-
ter equation which completely describes the dynamics of
the system. All the parameters which define the system
can be conveniently written in units of angular frequency
– a convention which we will follow here. If the cav-
ity is weakly coupled to an already weak drive field one
can achieve a regime when only two lower Fock states of
the resonator are relevant. Within the picture described
above we make one two-level system, the qubit, interact
with another two-level system, the resonator. When the
qubit is detuned from the resonator the eigenstates of
the coupled system can be written: |0 ↓〉, |0 ↑〉, |1 ↓〉
and |1 ↑〉, where the number represents a Fock state of
the resonator and the arrow represents the qubit state.
However, when the qubit is brought into resonance, Hg
couples the states |0 ↑〉 and |1 ↓〉 and lifts their degen-
eracy. The system will oscillate between the states |0 ↑〉
and |1 ↓〉 at a frequency Ω = 2g , known as the vac-
uum Rabi frequency21, giving rise to a splitting22 of the
central peak as shown in Fig. 2. One can visualize this
as a cycle in which the resonator and qubit continuously
exchange an amount of energy equal to one photon. As
the drive amplitude increases it will start to perturb the
system. This leads to a set of states that are shifted by
En = ±
nh̄g[1− (2ξ/g)2]3/4. (7)
Hence, the effect of the drive is to reduce the Rabi fre-
quency.
The coupling strength g can be determined experimen-
tally by making a spectroscopic measurement of the split-
ting ∆E1. The drive amplitude should be reduced until
the splitting reaches its maximum value, where it is equal
to 2gh̄.
|0 |1↑〉 ± ↓〉
Energy
hω0hωr
|1↓〉|0 |1↑〉 ± ↓〉
FIG. 2: Energy levels of the coupled qubit-cavity system. On
the left of the diagram the qubit is far detuned from the cavity.
As we move from left to right, the magnetic flux threading the
qubit loop is increased, tuning the qubit transition frequency
into resonance with the cavity. On the right of the diagram,
the qubit is once again far detuned.
The above discussion covers the resonant regime,
where the qubit is tuned into resonance with the cav-
ity. In contrast, when the detuning ∆ = ω0−ωr is large,
such that g/∆ ≪ 1, a dispersive Stark shift pulls the cav-
ity frequency by ±g2/∆. This so-called dispersive regime
can be used to perform quantum non-demolition mea-
surements of the qubit state11.
The effects of the environment on the system are taken
into account by the term HE . There are three types of
damping that need to be considered:
• Photons leak out of the cavity at a rate κ = ωr/Q,
where Q is the cavity quality factor.
• The qubit relaxes at a rate γ = 1/T1, where T1 is
the energy relaxation time.
• Pure dephasing of the qubit at a rate γφ = 1/Tφ =
1/T2 − 2/T1, where T2 is the dephasing time.
Dephasing plays a larger role in solid state systems than
atomic systems, due to the stronger interaction of solid
state qubits with their environments. In the absence of
pure dephasing we would have T2 = 2T1 but in real sys-
tems T2 is frequently much shorter than that, indicating
the need to take pure dephasing into account.
III. STRONG COUPLING WITH A FLUX
QUBIT
Below we estimate the coupling strength g using a
semi-classical approach. We treat the flux qubit as
a magnetic dipole and assume that it is placed at a
magnetic field antinode of the resonator. The coupling
strength is given by g = µB0rms/h̄, where B0rms is the
zero-point root mean square magnetic field generated by
the current fluctuations at the antinode of the resonator.
The magnetic dipole moment of the qubit is given by
µ = IpA, where Ip is the persistent current flowing
around the loop and A is the loop area. We can esti-
mate B0rms by considering the zero point energy of the
resonator, h̄ωr/2. This energy cycles continuously be-
tween inductive and capacitive components. The mag-
netic field is determined by the inductive component,
LI(t)2/2, where L is the total equivalent inductance of
the resonator near resonance and I(t) is the instanta-
neous current. At the moment when the energy is purely
inductive, we have (1/2)LI2max = (1/2)h̄ωr. Since I(t)
undergoes sinusoidal oscillation we therefore have that
Irms =
. (8)
Assuming that current flows in thin strips (whose width
is determined by the superconducting penetration depth)
at the edges of the centre conductor and ground plane,
the field at the antinode of the fundamental mode is ap-
proximately given by
B0rms ≈
µ0Irms
, (9)
where µ0 is the permeability of free space and r is half the
width of the gap between the centre conductor and the
ground plane (we assume the qubit is placed at the centre
of the gap). Therefore, the coupling strength between the
qubit and resonator is given approximately by
g ≈ IpAµ0
. (10)
By inserting realistic values for the parameters in the
above equation we can obtain an estimate of g.
First, we choose the fundamental frequency of the res-
onator. It is convenient to choose a value that lies within
the range 4–8 GHz, as this is well within the design scope
of both the qubit and resonator, and can be accessed
with commercial microwave sources and components. We
choose ωr/2π = 6 GHz. With a centre conductor of width
∼ 10 µm and a gap of width ∼ 5 µm, it is possible to
achieve a total inductance L ∼ 2 nH for a resonator op-
erated at its fundamental frequency..
Next, we choose Ip such that the transition frequency
ω0 of the qubit at the degeneracy point is slightly less
than that of the resonator. This will enable us to tune
the qubit in and out of resonance with the resonator by
changing the external flux Φx threading the qubit loop.
For the 3-junction persistent current qubit having two
junctions of critical current Ic = 800 nA and junction
capacitance C = 4 fF, and one junction of critical cur-
rent αIc, where α = 0.72, we get ω0/2π = 4.9 GHz at
the degeneracy point and Ip ≈ 580 nA. These parameters
were obtained by solving the Schroedinger equation nu-
merically. The transition frequency ω0 does not depend
on the area A of the qubit loop, provided that the loop
inductance remains small compared with the Josephson
inductance. However, we note that the larger the loop
area, the greater the (undesired) coupling to the environ-
ment. Here we choose a value A ≈ 8 µm2.
With the above parameters we obtain g/2π ≈ 35 MHz.
We now compare this with the rate of photon loss from
the resonator κ and the relaxation rate of the qubit γ.
When g > κ, γ, the coupled system is able to undergo
many cycles (≈ 2g/(κ+γ)) of vacuum Rabi oscillation be-
fore losing coherence. This is important for applications
such as single microwave photon generation. The photon
loss rate from the resonator is given by κ/2π = ωr/Q,
where Q is the loaded quality factor of the resonator. It
is possible to design a resonator with Q = 105, yield-
ing κ/2π ≈ 0.1 MHz. The relaxation rate of the qubit
is given by γ = 2π/T1. Taking T1 ∼ 1 µs3, we obtain
γ/2π ≈ 1 MHz. Naturally, the values for T1 and T2 for
a real system can not be predicted with any accuracy
and will depend on the experimental conditions. How-
ever, based on the data available in the literature23,24 we
believe that the aforementioned values are reasonable.
Hence, it is clear that our estimated value of g for the
persistent current qubit should satisfy the strong cou-
pling criterion.
For an RF SQUID with critical current Ic = 10 µA,
area A = 64 µm2, loop inductance LSQUID = 35 pH and
junction capacitance C = 50 fF, we obtain a transition
frequency ω0/2π = 4.6 GHz. In contrast to the persis-
tent current qubit, the area of the RF SQUID does affect
the transition frequency, via the loop inductance. It is
difficult to reduce the area further than the value we have
chosen, as this necessitates increasing the critical current
and decreasing the junction capacitance, which becomes
increasingly difficult to achieve in practice. Close to the
degeneracy point, the persistent current Ip in the above
SQUID is expected to be about 5 µA. Combined with the
increased loop area, this is likely to lead to an even larger
coupling strength g than predicted for the persistent cur-
rent qubit (unless the SQUID is displaced significantly
from the antinode of the resonator) but at the same time
make qubit more susceptible to noise. The fact that the
loop area of the 3-junction persistent current qubit can
be made small enough to render it relatively insensitive
to flux noise is one reason why it has been so successfully
used by e.g Mooij and co-workers3,18.
IV. NUMERICAL SIMULATION OF THE
SPECTRUM UNDER MICROWAVE
EXCITATION
Having shown that it is possible to reach the strong
coupling regime with a flux qubit coupled to a coplanar
resonator, we now simulate the results of a spectroscopic
experiment to measure g. The experiment would involve
driving the coupled system with an external microwave
field whose frequency ωl would be swept through the res-
onance of the coupled system. The most straightforward
way to probe the response of the system would be to
use what effectively amounts to a standard microwave
transmission ( S12 ) measurement of the cavity. The ex-
periment would be done by starting with the qubit far
detuned from the resonator, then stepping the external
magnetic flux to tune the qubit through the cavity reso-
nance. This type of experiment would allow us to record
the output power of the resonator as a function of qubit
Larmor frequency.
These simulations were performed by solving the mas-
ter equation using a Liouvillian with the Hamiltonian (7)
and three collapse (Lindblad) operators which account
for the decay and dephasing of the qubit and the cavity
at the aforementioned rates κ,γ and γφ (i.e. the effects of
HE). A brief description of the formalism can be found
in appendix B. All simulations were performed using the
“Quantum Optics Toolbox” developed by Tan25. Note
that all the figures show the spectrum of the intracavity
field (see appendix C for the definition and a description
of how it is calculated).
In the regime where the qubit Larmor frequency ω0 is
very far detuned from the cavity (i.e. when even disper-
sive effects are negligible) the qubit and the cavity are
effectively decoupled and no exchange of energy can take
place. If the system is probed by measuring the response
of the cavity, a single peak located at the bare resonance
−100 −80 −60 −40 −20 0 20 40 60 80 100
Frequency shift(MHz)
FIG. 3: The spectrum far from resonance (where the only
effect of the qubit is to broaden the resonance) and at zero
de-tuning for small drive (mean steady-state photon number
〈n〉 < 10−3), the latter giving rise to a Rabi splitting 2g.
Shown is also the spectrum at large (〈n〉 ≈ 8) drive amplitudes
(dashed line).
frequency ωr will be seen.
If instead the qubit is tuned exactly on resonance (∆ =
0) the effects of the coupling become clearly visible. Now,
there are two peaks located at ωr ±Ω/2 = ωr ± g as can
be seen in fig. 3. In between these two extremes there
is a gradual change from a single- to a double-peaked
spectrum where the splitting is approximately g2/∆ as
can be seen in the left picture of fig. 4. In this case the
pure dephasing rate γφ was set to zero. The result is a
”diamond-shaped” picture with a maximum splitting of
2g at zero detuning.
Far from resonance we can identify the four branches as
being associated with the states |0 ↑〉 and |1 ↓〉 above and
below the resonance. Exactly on resonance the system
is in a superposition of states |0 ↑〉 ± |1 ↓〉. This is in
agreement with the diagram shown in fig. 2. Note that
e.g. an interferometric measurement method26 must be
used in order to be able to observe the whole spectrum,
a simpler transmission experiments would only see one
peak off-resonance since such a measurement only records
transitions between photon states where the final state
can emit a photon; in this case |0 ↑〉 ↔ |1 ↓〉.
The right picture in fig. 4 shows the spectrum when
pure dephasing is introduced to the model. The result is
somewhat more complicated in that the spectral weights
are asymmetric with respect to wr − wl = 0 when the
qubit is detuned from resonance. The reason for this
is that dephasing leads to a loss of coherence, meaning
that the qubit tends to stay in its ground state and the
coupling between the states |0 ↑〉 and |1 ↓〉 is reduced,
the system is therefore no longer in a superposition of
those states. Whence, states with zero photons in the
cavity are effectively ”decoupled” from the one-photon
states. Since only states which allow for (at least) one
photon in the cavity can be measured it follows that only
the two branches (one above and one below resonance)
that (approximately) correspond to ±|1 ↓〉 will be clearly
visible. Note, however, that despite the loss of coherence
FIG. 4: (Color online) Power spectrum of the coupled qubit-
resonator system as a function of qubit detuning in the strong
coupling limit. The frequency of the drive field ωl is held at
the resonance frequency ωr of the bare resonator, while the
Larmor frequency ω0 of the qubit is tuned by changing the
external magnetic field. Left : Spectrum in the absence of
pure dephasing. Right: Adding a pure dephasing channel to
the dissipation results in an asymmetric spectrum, here the
pure dephasing rate γφ is 9.5 MHz. Shown is also one branch
of the expression ±g2/∆ (white dashed line). The following
parameters were used in the simulations (in GHz): ωr = ωl =
6, g = 0.035, κ = 0.004, γ = 0.001 and ξ/h̄ = 0.25κ.
an on-resonance measurement would still show two peaks
separated by 2g, i.e. exactly on resonance this situation is
effectively indistinguishable from the more coherent case
even when the full spectrum is measured.
While there are no bound states in the limit of strong
driving ξ > g/2 a continuum of states still exists giv-
ing rise to complex spectra (dashed line in fig. 3). The
structure is reminiscent to the so-called ”Mollow” peaks,
well known in atomic physics from e.g. fluorescence spec-
troscopy. However, in the latter case the peaks are the
result of strong driving of the atom (qubit) whereas in
this simulation the cavity is being driven so the similarity
is somewhat superficial. When as in this case the cavity
is driven so strongly that there are on average several
photons in the cavity, we see both a drive induced shift
of the position of the sidebands and a reappearance of
the central peak. Note that since it is difficult to directly
relate the parameter ξ to the power output from a mi-
crowave generator, care must be taken not to drive the
system inadvertently into this regime.
For the persistent current qubit considered here, a de-
tuning of ±0.4 GHz corresponds to an external magnetic
flux in the range ±10−4Φ0 which is a useful value for a
real experiment. However, for the RF SQUID, the flux
required is rather less: ±3 · 10−5Φ0.
One effect which is not taken into account in our sim-
ulations is the presence of thermal photons in the cavity.
Thermal effects can, in general, be neglected when work-
ing with optical cavities due to the very small average
number of photons at those frequencies. In experiment on
solid state qubits this is, however, not generally true since
they are operated in the microwave range. Also, the rele-
vant temperature scale is set not only by the the phonon
temperature (e.g. the temperature of the mixing cham-
ber of a dilution refrigerator) but also by the amount of
noise (essentially ”hot” photons) which reaches the sys-
tem via the leads. However, in a well-filtered system a
total temperature of 50 mK is attainable. The resonator
will then nearly be in its ground state with an average
thermal occupancy n̄ of 0.009 and thermal fluctuations in
the photon number of the order of 0.1. This justifies ig-
noring thermal effects in our simulations for now. That
said, even a moderate increase in temperature can sig-
nificantly change the outcome of an experiment27. One
further simplifying assumption in the model is that T1
and T2 do not change as the qubit is detuned from the
optimal bias point Φ0/2. While this is clearly unrealistic,
it has been experimentally shown24 that neither param-
eter should change dramatically in the parameter range
considered here, giving some justification to this approx-
imation.
V. APPLICATIONS OF CIRCUIT-QED
One of the most important applications of circuit QED
is the generation of single microwave photons on demand.
Single photon sources in the optical regime have been re-
alized using e.g. cavity QED with atoms and high finesse
optical cavities26. Design and fabrication of deterministic
sources that operate in the microwave regime have proved
to be more difficult, but a source based on superconduct-
ing circuit-QED was recently demonstrated28. This kind
of source could be used for quantum radiometry, as well
as for quantum information applications such as quan-
tum key distribution.
Various schemes can be envisaged29,30,31,32, by which
single photons can be generated with a circuit QED de-
vice. Below, we describe a straightforward technique,
based on manipulation of the qubit state with microwave
pulses and rapid changes in the DC magnetic field to tune
the qubit in and out of resonance with the cavity.
Our technique begins with the qubit far detuned from
the cavity and the combined system in its ground state
|0 ↓〉. A microwave π-pulse is applied to the qubit to
excite it to the state |0 ↑〉 [Fig. 5(a)], and this is fol-
lowed by a step in the magnetic field to bring the qubit
into resonance with the cavity [Fig. 5(b)]. The state of
the system immediately after the step is still |0 ↑〉, but
due to the qubit-cavity interaction on resonance, this is
no longer an eigenstate, so the system begins to precess
|0 + |1↑ ↓〉 〉|0 |1↑〉 − ↓〉
large
detuning
|0 + |1↑ ↓〉 〉
|0 |1↑〉 − ↓〉
|0↑〉|1↓〉
|0 + |1↑ ↓〉 〉
|0 |1↑〉 − ↓〉
|0↑〉|1↓〉
|0 + |1↑ ↓〉 〉|0 |1↑〉 − ↓〉
(a) (b)
(c) (d)
detuning
detuning
large
detuning
FIG. 5: (Color online). Bloch sphere diagrams showing the
state of the qubit-cavity system at successive stages of the
single photon generation process.
around the equator of the Bloch sphere at the vacuum
Rabi frequency 2g. After a time 2π/4g, the state of the
system will be |1 ↓〉 [Fig. 5(c)]. This means that coherent
energy exchange has taken place between the qubit and
the cavity, creating a photon-like state. Another step in
magnetic field detunes the qubit from the cavity so that
the state |1 ↓〉 is once more an eigenstate of the system
[Fig. 5(d)]. The system remains in this state until the
photon decays out of the cavity into one of the external
waveguides in a time of order 2π/κ. By repeating this
sequence many times, photons can be generated on de-
mand, provided that the time window within which they
are required is much longer than 2π/κ.
If a scheme similar to the one above is implemented, it
is important to prove that it generates single photons de-
terministically, rather than stochastically. At optical fre-
quencies this is done by studying photon-counting statis-
tics using interferometric measurements. Such measure-
ments require the use of a beamsplitter. An analogous
experiment can be envisaged in the microwave regime,
provided that a microwave beamsplitter can be realised.
Such a device has been proposed recently29, and could
lead to a microwave analogue of the Hanbury-Brown and
Twiss interferometer26.
VI. CONCLUSIONS
We have shown that it should be possible to reach the
strong coupling regime using a flux qubit coupled to a
coplanar waveguide resonator. If realized, it would open
the door to potential applications in metrology, quantum
communication and experimental tests of quantum me-
chanics. The fact that conventional lithography can be
used to fabricate the samples and that the experimen-
tal parameters can be chosen freely can in some cases
a be significant advantage compared to CQED implen-
tantaions utilizing e.g. atoms of ions. Since the relevant
frequencies are in the microwave regime it is also possible
to use well established methods to manipulate the sys-
tem. The main drawback compared to experiments done
at optical frequencies is the short coherence time of the
qubit and the fact that the system must be operated at
very low temperatures.
VII. ACKNOWLEDGMENTS
The authors would like to thank Mark Oxborrow,
Alexandre Zagoskin, Alexander Blais and Vladimir
Antonov for helpful discussions and comments. We
would also like to thank Scott Parkins for his help with
the Quantum Optics Toolbox. This work was funded
by the UK Department of Trade and Industry Quan-
tum Metrology Programme, project QM04.3.4. and the
Swedish Research Council.
APPENDIX A: DERIVATION OF THE
HAMILTONIAN
It is useful to compare the informal procedure used
in the introduction of this paper to derive the Hamil-
tonian with a more formal approach. Starting with
the bare qubit Hamiltonian − 1
(ǫσz + ∆σx) where ǫ =
2Ip(Φx −Φ0/2) we proceed just as before by noting that
the flux threading the qubit loop will be modulated via
the mutual inductance M that couple the fluctuations in
the cavity to the qubit. Writing the total external flux
as Φx = Φ
x + δΦ, where δΦ = M
(a†+a), adding
the Hamiltonian for the oscillator mode and the external
field and finally transforming into the eigenbasis of the
qubit we get
H = h̄ωr
σz + h̄g(a
†σ− + σ+a) sin θ
+ξ(e−iωta† + eiωta)− h̄g(a† + a)σz cos θ (A1)
in the RWA. Here we have introduced the mixing angle
θ = arctan∆/ǫ. This Hamiltonian is identical to the
J-C Hamiltonian (7) except that we now have an effec-
tive coupling g sin θ and an extra term h̄g(a†+ a)σz cos θ
which is zero when the qubit is operated at the degener-
acy point θ = π/2. By moving to an interaction frame
rotating at the drive frequency ω we see that all terms in
the Hamiltonian (A1) are time-independent except the
last term which picks up a factor exp(−iwω), meaning
it can be neglected in the rotating wave approxiamtion.
Note, however, that this additional term can potentially
play a role in the dispersive regime.
APPENDIX B: DISSIPATION
The effects of the environment on a quantum system is
in general very difficult to model but is nevertheless cru-
cial to understand since it is the cause of decoherence.
However, assuming the interaction with the environment
is Markovian the evolution of the (reduced) density ma-
trix of the system can be described by a master equation
ρ̇ = Lρ of Lindblad form33
= − i
[H, ρ] +
kCkρ+ ρC
where Ck are Lindblad operators. In the case considered
here we have 3 Lindblad operators. Firstly, the relaxation
from the excited state to the ground state at a rate γ1 =
1/T1 represented by a Lindblad operator proportional
to the lowering operator σ̂−, i.e. C1 =
−. The
cavity is loosing energy at a rate κ = ωr/Q which leads
to the ”destruction” of photons in the system, C2 =
Finally, we also need to consider pure dephasing of the
qubit at a rate γφ = 1/Tφ = 1/T2 − 1/2T1 where T2 is
the usual total dephasing time of the qubit. This process
is represented by the operator C3 =
σ̂z .
APPENDIX C: CALCULATION OF THE
SPECTRUM
Our aim is to calculate the steady-state power spec-
trum S(ω) of the intracavity field, formally this is de-
fined in terms of the photocount output from the cavity
as seen by a monochromatic detector20. The spectrum
can be calculated from the 2-time correlation function34
〈a†(t+ τ)a(τ)〉
S(ω) =
e−iωτ 〈a†(τ + t)a(t)〉dτ (C1)
which can be evaluated using the quantum-regression
theorem
< a†(τ + t)a(t) >= Tr{a†eLτaρ} (C2)
where the Liovillian (which includes the three Lindblad
operators defined in Appendix B) is given by the right
hand side of equation B1 and ρ is the steady-state den-
sity matrix which is the solution to Lρ = 0 These calcu-
lations are straightforward using the built-in routines of
the ”Quantum Optics Toolbox”.
1 Y. Nakamura, Yu.A. Pashkin, and J.S. Tsai. Coherent
control of macroscopic quantum states in a single-cooper-
pair box. Nature, 398:786–8, 1999.
2 C.H. Van Der Wal, A.C.J. Ter Haar, F.K. Wilhelm, R.N.
Schouten, C.J.P.M. Harmans, T.P. Orlando, S. Lloyd,
and J.E. Mooij. Quantum superposition of macroscopic
persistent-current states. Science, 290:773–7, 2000.
3 I. Chiorescu, Y. Nakamura, C.J.P.M. Harmans, and J.E.
Mooij. Coherent quantum dynamics of a superconducting
flux qubit. Science, 299:1869–71, 2003.
4 J.M. Martinis, S. Nam, J. Aumentado, and C. Urbina.
Rabi oscillations in a large Josephson-junction qubit. Phys.
Rev. Lett., 89(11):117901–1, 2003.
5 A. Zeilinger. Experiment and the foundations of quantum
physics. Reviews of Modern Physics, 71:288, 1999.
6 Matthias Steffen, M. Ansmann, Radoslaw C. A ND Bial-
czak, N. Katz, Erik Lucero, R. McDermott, Matthew Nee-
ley, E. M. Weig, A. N. Cleland, and John M. Martinis.
Measurement of the entanglement of two superconducting
qubits vis quantum state tomography. Science, 313:1423–
1425, 2006.
7 William D. Oliver, Yang Yu, Janice C. Lee, Karl K.
Berggren, Leonid S. Levitov, and Terry P. Orlando. Mach-
Zehnder interferometry in a strongly driven superconduct-
ing qubit. Science, 310:1653–1657, 2005.
8 S.A Valenzuela, W. Oliver, D.M. Berns, K.K. Berggren,
L.S. Levitov, and T.P. Orlando. Microwave-induced cool-
ing of a superconducting qubit. Science, 314:1589–1591,
2006.
9 C.S. Gerry and P.L. Knight. Introductory Quantum Optics.
Cambridge University Press, 2005.
10 A. Wallraff, D. I. Schuster, A. Blais, L. Frunzio, R.-S.
Huang, J. Majer, S. Kumar, S. M. Girvin, and R. J.
Schoelkopf. Strong coupling of a single photon to a super-
conducting qubit using circuit quantum electrodynamics.
Nature, 431:162–167, 2004.
11 A Blais, R Huang, A Wallraff, S.M. Girvin, and R.J.
Shoelkopf. Cavity quantum electrodynamics for supercon-
ducting electrical circuits: An architecture for quantum
computation. Phys. Rev. A, 69:062320, 2004.
12 T Yoshie, A. Sherer, A. Hendrickson, G. Khitrova, H.M.
Gibbs, G. Rupper, C. Ell, O.B. Shchekin, and D.G. Deppe.
Vacuum Rabi splitting with a single quantum dot in a pho-
tonic crystal nanocavity. Nature, 432:200–203, 2004.
13 J. P. Reithmaier, A. Lffler, G. Sk, C. Hofmann, S. Kuhn,
S. Reitzenstein, L.V. Keldysh, V.D. Kulakovskii, Rei-
necke T. L., and A. Forchel. Strong coupling in a single
quantum dot-semiconductor microcavity system. Nature,
432:197–200, 2004.
14 A. Lupaşcu, S. Saito, T. Pictot, P.C. de Groot, C.J.M.
Harmans, and J.E. Mooij. Quantum non-demolition mea-
surement of superconducting two-level system. arXiv:cond-
Mat, (0611505), 2006.
15 D. I. Schuster, A. A. Houck, J. A. Schreier, A. Wallraff,
J. M. Gambetta, A. Blais, L. Frunzio, B. Johnson, M. H.
Devoret, S. M. Girvin, and Schoelkopf R. J. Resolving
photon number states in a superconducting circuit. Nature,
445:515, 2007.
16 G. Johansson, L. Tornberg, and C. M. Wilson. Fast quan-
tum limited readout of a superconducting qubit using a
slow oscillator. Phys. Rev. B, 74:100504R, 2006.
17 Grajcar et al. Four-qubit device with mixed couplings.
Phys. Rev. Lett., 96:047006, 2006.
18 J. E. Mooij, T. P. Orlando, L. Levitov, L. Tian, C. H.
van der Wal, and S. Lloyd. Josephson persistent-current
qubit. Science, 285:1036, 1999.
19 I. Chiorescu, P. Bertet, K. Semba, Y. Nakamura, C.J.P.M.
Harmans, and J.E. Mooij. Coherent dynamics of a flux
qubit coupled to a harmonic oscillator. Nature, 431:159–
162, 2004.
20 D.F. Walls and G.J. Milburn. Quantum Optics. Springer-
Verlag, 1994.
21 J. Johansson, S. Saito, T. Meno, H. Nakano, M. Ueda,
K. Semba, and H. Takayanagi. Vacuum Rabi oscillations in
a macroscopic superconducting qubit LC oscillator system.
Physical Review Letters, 96:127006, 2007.
22 P. Alsing, D.S Guo, and H.J. Carmichael. Dynamic Stark
effect for the Jaynes-Cummings system. Phys. Rev. A,
47(7):5135–5143, 1994.
23 P. Bertet, I. Chiorescu, G. Burkard, K. Semba, C. J. P. M.
Harmans, D.P. DiVincenzo, and J. E. Mooij. Relaxation
and dephasing in a flux qubit. arXiv:cond-mat, (0412485),
2004.
24 K. Kakuyanagi, T. Meno, S. Saito, H. Nakano, K. Semba,
H. Takayanagi, F. Deppe, and A. Shnirman. Dephasing
of a superconducting flux qubit. Physical Review Letters,
98:47004, 2007.
25 S. Tan. A computational toolbox for quantum and atomic
optics. Journal of Optics B, 1:424–432, 1999.
26 M. Oxborrow and A. Sinclair. Single photon sources. Con-
temporary Physics, 46(3):173–206, 2005.
27 I Rau, G. Johansson, and A. Shnirman. Cavity quantum
electrodynamics in superconducting circuits: Susceptibil-
ity at elevated temperatures. Phys. Rev. B, 70:054521,
2004.
28 A. A. Houck, D. I. Schuster, J. M. Gambetta, J. A.
Schreier, B. R. Johnson, J. M. Chow, J. Majer, L. Frunzio,
M. H. Devoret, S. M. Girvin, and R. J. Schoelkopf. Gen-
erating single microwave photons in a circuit. arXiv:cond-
mat, (0702648v1), 2007.
29 M. Mariantoni, M.J. Storcz, F.K. Wilhelm, W.D. Oliver,
A. Emmert, A. Marx, R. Gross, H. Christ, and E. Solano.
On-chip microwave Fock states and quantum homodyne
measurements. arXiv:Cond-Mat, (0509737), 2006.
30 F. Marquardt. Efficient on-chip source of microwave pho-
ton pairs in superconducting circuit qed. arXiv:cond-Mat,
(0605232), 2006.
31 A.M. Zagoskin, M. Grajcar, and A.N. Omelyanchouk. Se-
lective amplification of a quantum state. Phys. Rev. B,
70:060301(R), 2004.
32 K. Saito, M. Wubs, S. Kohler, P. Hänggi, and
Y. Kayanuma. Quantum state preparation in circuit QED
via Landau-Zener tunneling. Europhysics Letters (EPL),
76(1):22–28, 2006.
33 H. P. Breuer and F. Petruccione. The theory of open quan-
tum systems. Oxford University Press, 2004.
34 R. J. Glauber. Coherent and incoherent states of radiation
field. Physical Review, 131:2766–2788, 1963.
35 In the dispersive regime the effective Hamiltonian becomes
H = g2/∆(σ̂+σ̂− + â†âσ̂z)
ABSTRACT
  We propose a scheme for circuit quantum electrodynamics with a
superconducting flux-qubit coupled to a high-Q coplanar resonator. Assuming
realistic circuit parameters we predict that it is possible to reach the strong
coupling regime. Routes to metrological applications, such as single photon
generation and quantum non-demolition measurements are discussed.

<|endoftext|><|startoftext|>
Zero bias anomaly out of equilibrium
D. B. Gutman1, Yuval Gefen2, and A. D. Mirlin3,4,∗
Dept. of Physics, University of Florida, Gainesville, FL 32611, USA
Dept. of Condensed Matter Physics, Weizmann Institute of Science, Rehovot 76100, Israel
Institut für Nanotechnologie, Forschungszentrum Karlsruhe, 76021 Karlsruhe, Germany
Inst. für Theorie der kondensierten Materie, Universität Karlsruhe, 76128 Karlsruhe, Germany
(Dated: August 7, 2021)
The non-equilibrium zero bias anomaly (ZBA) in the tunneling density of states of a diffusive
metallic film is studied. An effective action describing virtual fluctuations out-of-equilibrium is
derived. The singular behavior of the equilibrium ZBA is smoothed out by real processes of inelastic
scattering.
PACS numbers: 73.23.-b, 73.40.Gk, 73.50.Td
The suppression of tunneling current at low bias due
to electron-electron interaction is known as the zero bias
anomaly (ZBA). The theory of ZBA for disordered metals
at thermal equilibrium has been developed, on a pertur-
bative level, by Altshuler and Aronov [1, 2]. The non-
perturbative generalization of this theory was achieved
by Finkelstein [3]. Measurements of the tunneling density
of states (DOS) in biased quasi-one-dimensional wires [4]
call for an extension of the theory to non-equilibrium
setups. In this work we study the ZBA for disordered
metallic films out of equilibrium, in both the perturba-
tive and the non-perturbative (in interaction) limits.
Besides the experimental motivation, the problem of
ZBA in a non-equilibrium system is of fundamental the-
oretical interest. At equilibrium, the distribution of elec-
trons in phase space has a single edge at the Fermi sur-
face. The Coulomb interaction between the tunneling
electron and the electrons in the Fermi sea excites vir-
tual particle-hole pairs around the Fermi edge, leading
to the suppression of the tunneling DOS, similarly to the
Debye-Waller factor. The suppression gets stronger when
the electron energy approaches the Fermi energy. Out of
equilibrium, the distribution of particles may have sev-
eral sharp edges rather than a single one at the Fermi
surface, which poses important questions addressed in
this work: How will the excitation of electron-hole pairs
in this situation affect the tunneling DOS? Will there
be an interpaly between the two edges? We show that
the two edges are not independent: one edge affects the
ZBA near the other via real interaction-induced scatter-
ing processes governing the dephasing of electrons in the
non-equlibrium regime. From this point of view the prob-
lem we are considering is a representive of a class of phe-
nomena that involve renormalization away from thermal
equilibrium, such as the Fermi edge singularity [5] and
the Kondo effect [6].
What makes the ZBA particularly interesting is
its deep connection to various conceptually impor-
tant phenomenological ideas. At equilibrium, the non-
perturbative results [3] have been reproduced by quan-
tum hydrodynamical methods [7], and, within the frame-
work of the theory of dissipation [8], by methods that
rely on the fluctuation-dissipation theorem. Our work
circumvents this restriction. Starting with the Keldysh
non-linear σ-model, we derive an effective action that ac-
counts for virtual fluctuations in disordered metals away
from equilibrium. This action is complementary to the
one for kinetics of real fluctuations (such as noise) devel-
oped earlier [11, 12, 13, 14]. Further, we discuss a connec-
tion between our theory and phenomenological methods
[7, 8, 9, 10]. As a central application of our theory, we an-
alyze the ZBA problem and calculate the tunneling DOS
for a two-dimensional (2D) diffusive metallic film subject
to external bias.
Consider an electron which tunnels from a tip into a
metal, subject to an external bias. The Coulomb inter-
action U(q) (U(q) = 2πe2/q, d = 2) causes electrons in
the metal to readjust their position, such that they try
to screen the added charge. Their motion is impeded by
static disorder. For a sufficiently good metal, character-
ized by the dimensionless conductance g = ǫF τ ≫ 1, (ǫF
is the Fermi energy and τ – the elastic mean free time)
the electron motion is diffusive. Low energy excitations
are accounted for by the non-linear Keldysh σ-model [15]
iS[Q,φ] = iTr{φT (U−1 + ν0)σ1φ}
Tr{D(∇Q)2 − 4∂tQ} − iπν0Tr{φαγ
αQ}, (1)
where Q(r, t, t′) is a 2 × 2 matrix field (in the Keldysh
space) describing slow electronic modes, φ(r, t) is a 2-
vector in the Keldysh space representing the Coulomb
interaction, and Tr denotes the trace both over Keldysh
indices (TrK), spatial, and temporal coordinates. The Q
field is subject to a non-linear constraint,
dt1Q(r, t, t1)Q(r, t1, t
′) = δ(t− t′)σ0, (2)
γ1 ≡ σ0 is a 2×2 unit matrix, γ2 ≡ σ1 the first Pauli ma-
trix, ν0 is the DOS at the Fermi energy in the absence of
interaction, and D is the diffusion coefficient. The theory
can be rewritten in terms of the field Q only. Integrating
the φ field out, we derive the following action:
iS[Q] = −
Tr{D(∇Q)2 − 4∂tQ}
i(πν0)
TrK{σ1Q}
U−1 + ν0
TrK{Q}. (3)
http://arxiv.org/abs/0704.0728v1
Physical observables can be found by differentiating the
generating functional
Z[ϕ1, ϕ2] = exp (iS[Q]) exp
1 + ν0U
2ϕ1ϕ2 − πϕ1TrK{Q} − πϕ2TrK{σ1Q}
over the source fields ϕ1 and ϕ2.
So far all low energy excitation in the problem have
been kept indiscriminately. The price one pays for this is
an extreme technical complexity of the theory. In some
cases, this description is excessive and the theory can be
simplified by singling out a particular subspace of the
Q matrices. The resulting theory is less general, but
more suitable for tackling a specific class of problems.
The noise statistics in disordered systems is a remark-
able example of such a simplification. An extension of
the Boltzman-Langevin approach [16] to high order cor-
relators is achieved on the subspace of matrices that are
diagonal in the Wigner representation,
Q(r, ǫ, t) = e
f̄(r,ǫ,t)
1 2− 4f(r, ǫ, t)
f̄(r,ǫ,t).
The theory [11, 13, 14] resulting from the σ-model (3),
reduced to the subspace (5), accounts for real fluctuations
in agreement with the cascade idea of Nagaev [17].
Below we use a similar ideology, “projecting” the σ-
model on the subspace appropriate for the ZBA prob-
lem. According to works on ZBA at equilibrium, these
are virtual fluctuations of gauge fields that dominate the
suppression of the tunneling DOS [3, 15, 18]. This sug-
gests that gauge-type fluctuations constituting local-in-
time rotations of the saddle point Λt−t′ ,
Q(r, t, t′) = e
f̄(r,t)+ i
f(r,t)Λ
r,t−t′e
f(r,t′)−
f̄(r,t′),
r,t−t′ =
1 2− 4n(r, t− t′)
, (6)
where n(r, t− t′) is the Fourier transform of the local dis-
tribution function n(r, ǫ), are to be retained. Plugging
(6) into the action (3) and expanding up to quadratic
order in f and f̄ , we derive an effective theory of fluctu-
ations in an interacting diffusive conductor.
iS[f, f̄ ] =
(dω)(dq)
f̄(−ω,−q) ω
−Dq2 +
1 + ν0U(q)
f(ω, q)
−Dq2T (ω)f̄(−ω,−q)f̄(ω, q)
. (7)
Here the effective temperature T (ω) is defined as
T (ω) =
dǫ n(ǫ) [2− n(ǫ − ω)− n(ǫ+ ω)] , (8)
and n(ǫ) is a distribution function determined by the
Boltzmann equation.
To demonstrate the consistency of our approach, we
first show that the effective action (7) reproduces cor-
rectly known density correlation functions. The sym-
metrized density correlation function is given by
ρ̂(r, t), ρ̂(0, 0)
〉ω,q = −
∂φ2(ω, q)∂φ2(−ω,−q)
1 + ν0U
ω2〈ff〉ω,q . (9)
Similarly, the response function can be expressed as
i〈[ρ̂(r, t), ρ̂(0, 0)]−θ(t)〉ω,q =
∂φ1(ω, q)∂φ2(−ω,−q)
1 + ν0U
ω2〈f f̄〉ω,q +
1 + ν0U
. (10)
Calculating the correlation functions entering Eqs. (9),
(10) using the action (7),
〈ff〉ω,q =
4Dq2T (ω)
Dq2 + iω
1+ν0V (q)
〈fω,qf̄−ω,−q〉 ≡ 〈f f̄〉ω,q =
Dq2 − iω
1+ν0U(q)
) ,(11)
we reproduce the known results for for the spectral func-
tion of density fluctuations
ρ̂(r, t), ρ̂(0, 0)
〉ω,q =
2T (ω)
|Dq2[1 + ν0U(q)] + iω|2
and the density-density response function,
i〈[ρ̂(r, t), ρ̂(0, 0)]−θ(t)〉ω,q =
Dq2[1 + ν0U(q)]− iω
It is worth noting the analogy between our effective ac-
tion and phenomenological theories that describe ZBA at
equilibrium within an effective environment model [9, 10].
The latter explain the ZBA as the influence of virtual
fluctuations of an electromagnetic field at equilibrium [8]
on the electron tunneling. In view of the fluctuation-
dissipation theorem, fluctuations of the electromagnetic
field are determined by the complex impedance of the
system. In the zero dimensional case, modes of the elec-
tromagnetic field can be considered as independent quan-
tum harmonic oscillators. Being suddenly shaken by the
incoming electron, they move away from the equilibrium
position, reducing the overlap with their original config-
uration hence suppressing electron tunneling amplitude.
The action (7) describes similar processes in a 2D diffu-
sive system and without the assumption of thermal equi-
librium, thus keeping the information about both real
and virtual processes in a non-equilibrium state.
Now we are ready to apply the theory to the problem
of ZBA out of equilibrium. The tunneling DOS
ν(ǫ) =
GR(ǫ, p)−GA(ǫ, p)
can be rewritten in terms of Q matrices as
ν(ǫ) =
dt e−iǫ(t−t
′)〈TrK Q(r, t
′, t)σz〉Q . (13)
Plugging Eq. (6) into Eq.(13), we find
= 1+2i
dt n(t) eiǫt−
Iff (t) sin
Iff̄ (t)
, (14)
where we used the notations
Iff (t) =
(dω)(dq) (1 − cosωt) 〈〈ff〉〉ω,q ,
Iff̄ (t) =
(dω)(dq) sinωt 〈〈f f̄〉〉ω,q . (15)
There is a subtle point related to the definition of the cor-
relator 〈〈ff〉〉 and 〈〈f f̄〉〉 in (15). The ZBA should vanish
in the absence of interaction (U = 0). This physically ob-
vious statement is valid to all orders of the diagrammatic
calculations (all potentially singular contributions vanish
after the time integration over the Keldysh contour) and
is satisfied by the non-linear σ model. However, it ceases
to be an exact feature of the theory when the full space
of Q matrices is reduced to the subspace Eq. (6). This
problem is easily cured. Since we are only interested in
summing up the leading interaction-induced ln2 ǫ contri-
butions to the tunneling DOS (to all orders), these are
the interaction-induced parts of 〈ff〉 and 〈f f̄〉 that we
have to keep in Eq. (15) [19]:
〈〈f f̄〉〉ω,q ≡ 〈f f̄〉ω,q − 〈f f̄〉ω,q,U=0
(Dq2 − iω)[Dq2(1 + ν0U)− iω]
. (16)
Analogously,
〈〈ff〉〉ω,q =
4Dq2T (ω)(2 + ν0U)U
|Dq2 − iω|2|Dq2(1 + ν0U)− iω|2
. (17)
Equations (14)–(17) constitute our general result for the
non-equilibrium ZBA. At equilibrium they reproduce the
known results for the ZBA in the perturbative [1, 2] and
non-perturbative [3] regimes.
To illustrate the effect of non-equilibrium conditions on
ZBA we consider a diffusive film of length L connected to
two zero-temperature reservoirs with voltage difference
V . We assume that the energy relaxation in the film
can be neglected, i.e. that L is shorter than the energy
relaxation length, a condition met at not too high bias.
In this case the solution of the Boltzmann equation is a
double-step function
n(ǫ, x) = an0(ǫ− eV/2) + (1 − a)n0(ǫ+ eV/2) , (18)
where n0(ǫ) is the Fermi distribution function, a = x/L,
and x is the distance from the reservoir. This distribution
corresponds to the effective temperature
T (ω) = [a2 + (1− a)2]Teq(ω)
+a(1− a)[Teq(ω + eV ) + Teq(ω − eV )] , (19)
where Teq(ω) is the equilibrium value
Teq(ω) = ω coth(ω/T ) →
|ω| . (20)
Substituting Eqs. (16), (17) into Eqs.(15),( 14), we find
the tunneling DOS out of equilibrium:
= a exp
8π2ν0D
log(D2κ4τt−)
+(1− a) exp
8π2ν0D
log(D2κ4τt+)
. (21)
This result is valid in both the perturbative (when the
exponentials can be expanded up to the first non-trivial
term) and non-perturbative regimes. The DOS consists
of two terms corresponding to electrons coming from the
left and right reservoir. The energy scales governing the
argument of the logarithm in these two terms are
= max
a(1− a)eV log
4πν0D
, (22)
respectively (κ = 2πe2ν).
The evolution of the ZBA in the tunneling DOS with
decreasing conductance (i.e. from the perturbative to
the non-perturbative regime) is illustrated in Fig. 1. The
DOS has a two-dip shape with minima reached at the
energies where the distribution function exhibits discon-
tinuous jumps (i.e. at the positions of the Fermi edge
in the left and right leads, ǫ = ± eV
). Away from the
minima, the DOS is controlled by the energy measured
from the corresponding edge. As this energy decreases
(we get closer to one of the Fermi edges), the singularity
near this edge gets affected by the presence of the other
edge. The broadening of the ZBA singularity takes place
on a new energy scale,
(V ) =
a(1 − a)
4πν0D
eV log
. (23)
The notation introduced in Eq. (23) stresses the analogy
with the equilibrium dephasing rate τ−1
(T ) governing
the infrared cutoff of the interference phenomena such
as the weak localization [2]. The emergence of τ−1
shows that inelastic scattering processes (responsible for
dephasing) lead to smearing of the ZBA singularity.
Qualitatively the results can be explained as follows.
Virtual fluctuations of the gauge fields act on a quasipar-
ticle and suppress the tunneling DOS. Their influence is
limited by the quasiparticle life time. An external bias
-0.001 -0.0005 0 0.0005 0.001
( a )
-0.001 -0.0005 0 0.0005 0.001
FIG. 1: Tunneling DOS vs. energy for the double-step dis-
tribution function (18) with a = 0.5. The dimensionless con-
ductance is, from top to bottom, g = 100, 10 (a) and 1 (b).
enhances fluctuations of the electromagnetic field, which
in turn dephase the tunneling quasiparticle. This results
in a competition between the virtual processes establish-
ing the ZBA and the real fluctuations (noise) cutting it
Let us stress the crucial difference with the equilib-
rium situation. At equilibrium, the ZBA singularity
in 2D is cut off by temperature (which enters via the
distribution function); specifically, the dephasing rate
(T ) does not affect the ZBA in any essential way,
since τ−1
(T ) ≪ T . It is thus a distinct feature of the
strongly non-equilibrium regime that the ZBA is smeared
by the inelastic scattering (dephasing) rate.
In conclusion, we have developed an effective theory
for virtual fluctuations in disordered metals out of equi-
librium. Using this theory, we studied the ZBA in a 2D
metallic film biased by external voltage. We have found
that out of equilibrium the tunneling DOS has a double-
dip structure with minima reached at the ”edges” of the
particle distribution. The ZBA near any of the ”edges” is
influenced by the other one. The suppression of DOS is
smoothed out by the real processes of inelastic scattering
(dephasing) with characteristic energy scale τφ(V ). Fur-
ther applications and extensions of our theory include, in
particular, the ZBA in quasi-one-dimensional and strictly
one-dimensional (Luttinger liquid) wires [20] out of equi-
librium; these results will be reported elsewhere.
We thank D. Bagrets, N. Birge, A. Finkelstein, D.
Maslov, A. Shnirman, and A. Shytov for useful discus-
sions. This work was supported by NSF-DMR-0308377
(DG), US-Israel BSF, ISF of the Israel Academy of Sci-
ences, and DFG SPP 1285 (YG), DFG Center for Func-
tional Nanostructures, EC Transnational Access Pro-
gram at the WIS Braun Submicron Center (ADM), and
Einstein Minerva Center.
[*] Also at Petersburg Nuclear Physics Institute, 188300
St. Petersburg, Russia.
[1] B. L. Altshuler and A. G. Aronov, Solid State Commun.
30, 115 (1979); Sov. Phys. JETP 50, 968 (1979); B. L.
Altshuler, A. G. Aronov, and P. A. Lee, Phys. Rev. Lett.
44, 1288 (1980).
[2] B. L. Altshuler and A. G. Aronov, in Electron–Electron
Interaction In Disordered Systems, ed. by A.L. Efros and
M. Pollak (Elsevier, 1985), p.1.
[3] A. M. Finkel’stein, Sov. Phys. JETP 57, 97 (1983); ibid
59, 212 (1984); Sov. Sci. Rev. A 14, 1 (1990).
[4] A. Anthore, F. Pierre, H. Pothier, and D. Esteve, Phys.
Rev. Lett. 90, 076806 (2003).
[5] D.A. Abanin and L.S. Levitov, Phys. Rev. Lett. 94,
186803 (2005).
[6] J. Paaske, A. Rosch, J. Kroha, and P. Wölfle, Phys. Rev.
B 70, 155301 (2004); J. Paaske, A. Rosch, P. Wölfle, N.
Mason, C.M. Marcus, and J. Nygard, Nature Physics 2,
460 (2006).
[7] L.S. Levitov and A.V. Shytov, JETP Lett. 66, 214
(1997).
[8] Yu. V. Nazarov, Sov. Phys. JETP 68, 561 (1989).
[9] M.H. Devoret, D. Esteve, H. Grabert, G.-L. Ingold, H.
Pothier, and C. Urbina, Phys. Rev. Lett. 64, 1824 (1990).
[10] S.M. Girvin, L.I. Glazman, M. Jonson, D.R. Penn, and
M.D. Stiles, Phys. Rev. Lett. 64, 3183 (1990).
[11] S. Pilgram, Phys. Rev. B 69, 115315 (2004); S. Pilgram,
K.E. Nagaev, and M. Büttiker, Phys. Rev. B 70, 045304
(2004); A.N. Jordan, E.V. Sukhorukov, and S. Pilgram,
J. Math. Phys. 45, 4386 (2004).
[12] T. Bodineau and B. Derrida, Phys. Rev. Lett. 92, 180601
(2004).
[13] D.B. Gutman, Y. Gefen, and A.D. Mirlin, Phys. Rev B
71, 085118 (2005).
[14] D.A. Bagrets, Phys. Rev. Lett. 93, 236803 (2004).
[15] A. Kamenev and A. Andreev, Phys Rev B. 60, 2218
(1999).
[16] A. Ya. Shulman and Sh. M. Kogan, Sov. Phys. JETP
29, 467 (1969); S.V. Gantsevich, V. L. Gurevich, and R.
Katilius, Sov. Phys. JETP 30, 276 (1970).
[17] K. Nagaev, Phys. Rev. B 66, 075334, (2002).
[18] P. Kopietz, Phys. Rev. Lett. 81, 2120 (1998).
[19] A similar subtraction was implemented in L.S. Levitov,
A.V. Shytov, and B.I. Halperin, Phys. Rev. B 64, 075322
(2001).
[20] The equilibrium ZBA in the 1D geometry, including
a crossover between the diffusve and ballistic regimes,
was recently studied in E.G. Mishchenko, A.V. Andreev,
and L.I. Glazman, Phys. Rev. Lett. 87, 246801 (2001);
C. Mora, R. Egger, and A. Altland, Phys. Rev. B 75,
035310 (2007).
ABSTRACT
  The non-equilibrium zero bias anomaly (ZBA) in the tunneling density of
states of a diffusive metallic film is studied. An effective action describing
virtual fluctuations out-of-equilibrium is derived. The singular behavior of
the equilibrium ZBA is smoothed out by real processes of inelastic scattering.

<|endoftext|><|startoftext|>
Introduction
The physics of Kaluza-Klein black holes, i.e. black hole spacetimes asymp-
totic at infinity to M × S1, has proved to be a surprisingly rich subject, in-
cluding such phenomena as the Gregory-Laflamme instability, non-uniform
static black strings and the black hole/black string phase transition (see e.g.
the reviews [1, 2]). Research to date has focused primarilly on the static
case. However, it is also of interest to explore the properties of stationary
solutions. Accordingly, in this paper we will study the thermodynamics of
stationary Kaluza-Klein black holes1.
Static Kaluza-Klein black holes are characterized at infinity by the mass
M, tension T and the length L of the compact direction. The physical
meaning of the tension follows from its role in the first law for static S1
Kaluza-Klein black holes [6][7][8]
dM = κ
dA + T dL. (1)
We see that the tension determines the variation of the mass with varying
length of the compact direction, under the constraint that the horizon area is
held fixed. Within the thermodynamic analogy, it appears to be an intensive
parameter of the system, like temperature or pressure.
Stationary Kaluza-Klein black holes can carry linear momentum in the
compact direction, as well as angular momentum. In this paper, we will be
interested in this linear momentum, which we denote by P, and will assume
that the angular momentum vanishes. The simplest solutions with P 6= 0
are boosted black strings. These are obtained by starting from the infinite
uniform black string, boosting in the z direction and then identifying the new
z coordinate with period L. The boosted black string is then locally, but not
globally, the same as the static uniform black string. Further stationary, but
not z-translationally invariant, solutions may be obtained by giving localized
black holes or non-uniform black strings velocity in the compact direction.
In subsequent sections, we present the following results. We use the
Hamiltonian methods of [9][10][8] to establish the first law for stationary, non-
rotating Kaluza-Klein black holes. We also derive two Smarr formulas for
these spacetimes. These are exact relations between the geometric quantities
1Aspects of stationary Kaluza-Klein black holes have been studied in [3][4]. The ther-
modynamics of asymptotically AdS, boosted domain walls have been investigated in ref-
erence [5].
M, κA, vHP and T L, where the quantity VH is defined below. The first of
these formulas holds for the entire class of spacetimes under consideration.
The second Smarr formula holds under the additional assumption of exact
translation invariance in the compact direction. A linear combination of these
two formulas gives the relation between mass and tension for the boosted
black string. We derive each of these Smarr formulas in two ways, first using
scaling arguments (as in e.g. reference [11]) and second using Komar integral
relations (as in reference [12]). Finally, we present a Gibbs-Duhem formula
that relates variations in the tension to variations in the other intensive
parameters. This result generalizes the ‘tension first law’ of reference [13].
Our result for the first law resolves a small puzzle related to the boosted
black string, which formed part of the motivation for this work. It was found
in reference [3] that the tension of the boosted black string becomes negative
for values of the boost parameter in excess of a certain critical value, which
depends only on the spacetime dimension. If the physical interpretation of
the tension based on equation (1) were to continue to hold in the stationary
case, then the energy of the system would decrease with increasing L, which
seems counter-intuitive. This puzzle is resolved by showing that the coeffi-
cient of the dL term in the first law for black holes is an effective tension T̂ .
The effective tension T̂ is equal to the ADM tension in the static case, but
includes a contribution from the momentum in the stationary case. For the
boosted black string T̂ is always positive, and is in fact given by the tension
of the unboosted black string with the same horizon radius.
2 Stationary, non-rotating Kaluza-Klein black
holes
We consider stationary D-dimensional vacuum black hole spacetimes that
are asymptotic to MD−1 × S1, and assume that the black hole horizon is a
bifurcate Killing horizon. In accordance with our focus on linear momentum
around the S1, we take the ADM angular momentum to vanish. We denote
the horizon generating Killing field by la, and assume that at infinity it has
the form
la = T a + vHZ
a (2)
where T a = (∂/∂t)a and Za = (∂/∂z)a, with z being the coordinate around
the S1. The surface gravity κ of the black hole horizon is defined, as usual,
via the relation on the horizon
∇a(lblb) = −2κla. (3)
The form (2) of the horizon generating Killing field at infinity resem-
bles the decomposition of the horizon generating Killing field for a rotating,
asymptotically flat black hole. In that case, i.e. upon replacing vHZ
a, with φ an azimuthal coordinate, one can show that T a and φa are
themselves Killing vectors [14, 15, 16]. The quantity ΩH can then be in-
terpreted as the angular velocity of the horizon, and further shown to be
constant on the horizon [14].
Returning to the case of Kaluza-Klein black holes, the situation is quite
different. Already in the static case, solutions exist which are non-uniform in
the z direction. In the stationary case then, it will not generally be the case
that T a and Za are Killing vectors. For localized black holes or non-uniform
black strings with velocity around the S1, only the linear combination la is
a Killing vector.
2.1 Two commuting Killing fields
It is nonetheless useful to separately consider the case in which T a = (∂/∂t)a
and Za = (∂/∂z)a are two commuting Killing fields, and that the relation
(2) holds throughout the spacetime. The boosted black string falls into this
class of spacetimes. If both Za and T a are Killing fields, then the quantity
vH in equation (2) may be considered to be the velocity of the black hole
horizon. This identification follows in a similar way to that of ΩH as the
angular velocity in the rotating case (see e.g. the article by Carter in [17]).
It follows from equation (3), together with our assumption that T a and Za
are commuting Killing vectors, that in addition to lala = 0 on the horizon,
one also has there the orthogonality relations
laTa = 0, l
aZa = 0. (4)
Given these, one can then show that the metric components on the horizon
satisfy the two relations
(T aZa)
2 = (T aTa) Z
bZb, vH = −
T aZa
(ZbZb)
The second of these leads to the interpretation of vH as the velocity of the
horizon in the following manner. For rotating black holes, one considers ‘zero
angular momentum observers’ or ZAMO’s. The angular velocity ΩH of the
horizon is the limit of a ZAMO’s angular velocity as it approaches the horizon
radius. For a boosted black string, we may analogously consider observers
with zero linear momentum along the string, which we might be justified
in calling ZELMO’s. Let pa = m dxa/dτ be the momentum of a particle
following a geodesic. It’s energy E = −T apa and the z−component of its
momentum P = Zapa are both constants of motion. The condition P = 0
of vanishing linear momentum is then dz/dt = −gtz/gzz, and we see that on
the horizon the coordinate velocity of a ZELMO is equal to vH .
3 ADM mass, tension and momentum
We review the formulas for the ADM mass, tension and momentum. Let us
write the spacetime metric near infinity as gab = ηab + γab, where ηab is the
D-dimensional Minkowski metric. The components of γab are assumed to fall-
off sufficiently rapidly that the integral expressions for the mass, tension and
momentum are well-defined. In the asymptotic region, write the spacetime
coordinates as xa = (t, z, xi), where i = 1, . . . , D − 2 and the coordinate z
running around the S1 is identified with period L. Let Σ be a spatial slice
and ∂Σ∞ its boundary at spatial infinity. The ADM mass and momentum
in the z direction are then given in asymptotically Cartesian coordinates by
the integrals
M = 1
dz dsi
−∂iγjj − ∂iγzz + ∂jγij
dz dsi ∂
iγtz (7)
where indices are raised and lowered with the asymptotic metric ηab and the
area element dsi is that of a sphere S
D−3 at infinity in a slice of constant t
and z.
The ADM tension is similarly given by the integral [13][6][18]
T = −
∂Σ∞/S1
−∂iγjj − ∂iγtt + ∂jγij
. (8)
Note that in contrast with the ADM mass and momentum, the definition
of the tension does not include an integral in the z-direction. The ADM
mass is an integral over the boundary of a slice of constant t, which includes
the direction around the S1. The tension, on the other hand, is defined
[13][6][18] by an integral over the boundary of a slice of constant z. This
includes, in principle, an integration over time. However, if one expands the
integrand around spatial infinity, one finds that terms that make non-zero
contributions to the integral are always time independent. Time dependent
terms fall-off too rapidly to contribute. Hence, one can omit the integration
over the time direction and work with the quantity T defined above, which
is strictly speaking a ‘tension per unit time’.
We can evaluate these formulas for M, P and T in terms of the asymp-
totic parameters of our spacetimes. The spacetimes we consider have topol-
ogy RD−1 × S1, the coordinate z in the compact direction being identified
with period L. We can write the metric explicitly as
ds2 = gttdt
2 + 2gtzdtdz + gzzdz
2 + 2(gtidtdx
i + gzidzdx
i) + gijdx
idxj (9)
where xi with i = 1, . . .D − 2 are the non-compact spatial coordinates. We
assume the following falloff conditions at spatial infinity
gtt ≃ −1 + ct/rD−4, gzz ≃ 1 + cz/rD−4, gtz ≃ ctz/rD−4, (10)
and further that the coefficients gti and gzi falloff sufficiently fast that they
do not contribute to any ADM integrals at infinity. The mass, tension [7]
and momentum can then be shown, using the field equations, to be given in
terms of the asymptotic parameters ct, cz and ctz by
M = ΩD−3  L
((D − 3)ct − cz), T =
(ct − (D − 3)cz), (11)
P = −(D − 4)ΩD−3  L
ctz . (12)
4 The boosted black string
The boosted black string serves as a simple analytic vacuum spacetime in
which to check the results we present below for the first law, Smarr and
Gibbs-Duhem relations. The boosted black string metric may be obtained
starting from the uniform black string, performing a boost transformation
with parameter β and identifying the new, boosted z coordinate with period
L. This gives
ds2 = −(1 −
cosh2 β)dt2 + (1 +
sinh2 β)dz2 (13)
sinh β cosh βdzdt + (1 − c
)−1dr2 + r2dΩ2D−3
The horizon, which has topology SD−3×S1 is located at rH = c1/(D−4). From
the asymptotic form of the metric, one finds using the expressions (11) and
(12) that the ADM mass, tension and momentum are given as in [3] by
ΩD−3L
rD−4H ((D − 4) cosh2 β + 1) (14)
rD−4H (1 − (D − 4) sinh2 β) (15)
P = −
ΩD−3L
rD−4H (D − 4) sinh β cosh β (16)
Note that, as mentioned in the introduction, the tension becomes negative
for sinh2 β > 1/(D − 4). We can further compute, as in reference [3], that
the horizon area, surface gravity, and horizon velocity of the boosted black
string are given by
A = ΩD−3LrD−3H cosh β, κ =
D − 4
2rH cosh β
, vH = −
sinh β
cosh β
. (17)
5 Gauss’ Laws for Perturbations
Following the work of [9], we use the Hamiltonian formalism of general rela-
tivity to derive the first law for stationary, non-rotating Kaluza-Klein black
holes. Another of our goals is to derive a ‘first law’ for variations in the
tension as in reference [13], for this class of spacetimes. This requires a slight
generalization of the Hamiltonian formalism to accomodate evolution of data
on timelike surfaces in a spacelike direction. Although, as we discuss below
in section (8), we have not yet succeeded in providing a Hamiltonian deriva-
tion of the ‘tension first law’ in the stationary case, our presentation of the
Hamiltonian formalism will be general enough to provide the necessary tools.
The essence of the method is as follows. In vacuum gravity, suppose one
has a black hole solution with a Killing field. Now consider solutions that
are perturbatively close to this background solution, but are not required
to have the original Killing symmetry. The linearized Einstein constraint
equations on a hypersurface can be expressed in the form of a Gauss’ law
(see [10]), relating a boundary integral at infinity to a boundary integral at
the horizon. The physical meaning of this Gauss’ law relation depends on
the choice of Killing field, as well as on the choice of hypersurface. Taking
the generator la of a Killing horizon, together with an appropriate choice of a
spacelike hypersurface, yields the usual first law for variation of the mass [9].
In the case of solutions that are z translation invariant, choosing the spatial
translation Killing vector Za, again with an appropriate choice of a timelike
hypersurface, gives a ‘first law’ for variations in the tension [13].
The formalism then proceeds in the following way. Assume we have a
foliation of a spacetime by a family of hypersurfaces of constant coordinate
w. We denote these hypersurfaces, both collectively and individually, by V
and the unit normal to the hypersurfaces by wa. With the application to
tension in mind, we consider both timelike and spacelike normals by setting
a = ǫ with ǫ = ±1. This slight generalization introduces factors of ǫ
into a number of otherwise standard formulas. The spacetime metric can be
written as
gab = ǫwawb + sab (18)
where sab, satisfying sa
bwb = 0, is the metric on the hypersurfaces V . As
usual, the dynamical variables in the Hamiltonian formalism are the met-
ric sab and its canonically conjugate momentum π
ab = ǫ
|s|(Ksab − Kab).
Here Kab = sa
c∇cwb is the extrinsic curvature of a hypersurface V . We con-
sider Hamiltonian evolution along the vector field W a = (∂/∂w)a, which can
be decomposed into its components normal and tangential to V , according
to W a = Nwa + Na where Nawa = 0. As usual, we refer to N and N
respectively as the lapse function and the shift vector. The gravitational
Hamiltonian density which evolves the system along W a is then given by
H = NH + NaHa with
H = −R(D−1) + ǫ
D − 2
− πabπab) (19)
Hb = −2Da(|s|−
2πab). (20)
where R(D−1) is the scalar curvature for the metric sab and the derivative
operator Da on the hypersurface V . One further finds that the quantities H
and Ha are simply related to the normal components of the Einstein tensor,
H = 2ǫ Gabw
awb, Hb = 2ǫ Gacw
ascb (21)
These components of the field equations contain only first derivatives with
respect to the coordinate w, and hence represent constraints on the dynamical
fields, sab and π
ab, on V . This property is independent of whether the normal
direction is timelike, as in the usual ADM formalism, or spacelike. In vacuum,
the equations H = 0 and Hb = 0 are enforced in the Hamiltonian formalism as
the equations of motion of the nondynamical lapse and shift variables. These
are referred to as the Hamiltonian and momentum constraints, a terminology
we continue to use in the case that the normal wa is spacelike.
Let us now assume that the spacetime metric ḡab is a solution to the
vacuum Einstein equations2 with a Killing vector ξa. We decompose ξa into
components normal and tangent to the hypersurfaces V introduced above,
according to ξa = Fwa + βa. Now, let us further assume that the metric
gab = ḡab+δgab is the linear approximation to another solution to the vacuum
Einstein equations. Denote the Hamiltonian data for the background metric
by s̄ab, π̄
ab, the corresponding perturbations to the data by hab = δsab and
pab = δπab, and the linearized Hamiltonian and momentum constraints by
δH and δHa. As shown in [10, 9, 13], the following statement then holds as
a consequence of Killing’s equation in the background metric,
FδH + βaδHa = −D̄aBa (22)
where D̄a is the background covariant derivative operator on the hypersurface
and the vector Ba is given by
Ba = F (D̄ah−D̄bhab)−hD̄aF +habD̄bF +
βb(π̄cdhcds̄
b−2π̄achbc−2pab).
Here indices are raised and lowered with the background metric s̄ab. Since the
metric gab solves the vacuum Einstein equations by assumption, we know that
δH = δHa = 0. Therefore, we have the Gauss’ law type statement D̄aB
a = 0.
Note that the detailed form of this relation for the perturbations hab and p
depends on the the Killing vector ξa and the normal wa to the hypersurface,
2In this paper we will focus on the case when the background spacetime is vacuum.
It is straightforward to add stress-energy which is described by a Hamiltonian [13]. If
the matter Hamiltonian contains constraints–such as for Maxwell theory–then additional
charges appear in the first law. This was worked out for Einstein-Yang-Mills in [9]. The
general case when the background spacetime has stress-energy, such as a cosmology, was
studied earlier in [10]. In this situation, the criterion for a Gauss’ law on perturbations is
that the background have an Integral Constraint Vector.
as well as on the background spacetime metric. Making different choices for
the Killing vector and normal can lead to different relations of this form. We
can now integrate the relation D̄aB
a = 0 over the hypersurface V and use
Stokes theorem to obtain
c = 0, (24)
where for black hole spacetimes the boundary ∂V of the hypersurface V
typically has two components, one on the horizon and one at infinity3.
6 The first law for stationary, Kaluza-Klein
black holes
Following references [9, 8], we now use the Hamiltonian formalism presented
in the last section to derive the first law for stationary, non-rotating Kaluza-
Klein black holes. The first law relates the difference δA in the horizon
area between nearby solutions to the variations δM, δP and δ L in the mass,
momentum and length of the compact direction. As in reference [8], we carry
out the calculation first holding the length at infinity,  L, fixed, and then use
this result in order to do the calculation with δ L 6= 0.
We assume as in section (2) that we have a stationary, non-rotating
Kaluza-Klein black hole solution with metric ḡab and horizon generating
Killing field la, which is given at infinity by la = T a + vHZ
a. We further
assume as in section (5) that the metric gab = ḡab + δgab is a linear approx-
imation to a solution of the field equations. At this stage, we assume that
δgab is such that δ L = 0. Further on, we will relax this assumption.
The derivation of the mass first law is then quite similar to that for
rotating black holes [9]. Consider a spacelike hypersurface V , which intersects
the horizon at the bifurcation surface and has a unit normal approaching the
vector T a at infinity. Choose the Killing vector in the Gauss’ law construction
to be the horizon generator la. Let ∂V∞ and ∂VH denote the boundaries of the
hypersurface V at infinity and at the horizon bifurcation surface. Equation
(24) then implies that
IH + I∞ = 0 (25)
3Kaluza-Klein bubble spacetimes, which we do not consider here, provide an interesting
contrast . There is no interior horizon, but the rotational Killing field has an axis. Hence
to use Stokes theorem, one must exclude the axis, which introduces an inner boundary.
where
c, I∞ =
c. (26)
Let us first consider the calculation of IH . On the horizon bifurcation
surface, the quantities F and βa vanish, and the boundary integral on the
horizon reduces to
IH = −
daρ̂c(−h D̄cF + hcb D̄bF ) (27)
where ρ̂c is the unit outward pointing normal to the bifurcation surface within
V . One can show that the surface gravity is given by κ = ρ̂c∂cF , and it then
follows as in reference [9] that
IH = 2κ δA (28)
Now consider the boundary term at infinity. Many of the terms in (23) fall
off too rapidly to make non-zero contributions. In particular, it is straight-
forward to check that the DaF terms, as well as those including products of
π̄ab with the metric perturbation, fall off too rapidly as r → ∞ to contribute.
Furthermore, it is sufficient to take F ≃ 1 and βz = vH in this limit. We
then arrive at the expression
dz dsi(∂
ih− ∂jhij − 2vHpiz) (29)
At this point, we need to note that the formulas (6) and (8) for the ADM
mass, momentum and tension are written in terms of the variable γab defined
by gab = ηab + γab. In order to interpret the boundary integral (29) in
terms of variations in M, P and T , we need to relate the perturbations
γab and p
ab in the Hamiltonian formalism to a covariant perturbation δγab.
It is straightforward to show that to the required order of accuracy h =
δklδγkl + δγzz and h
ij ≃ δikδjlδγkl, while a further calculation reveals that
piz ≃ −(1/2) ∂iδhzt. We then find that
∂ih− ∂jhij + vH∂ihzt
= −16πG(δM− vHδP) (31)
Inserting these results into equation (25) then yields the mass first law for
boosted black strings (with the length  L at infinity held fixed)4
δM = κ
δA + vHδP (33)
We see that the momentum appears as an extensive parameter in the first law,
while vH , which for the boosted black string is the horizon velocity, appears
as an intensive parameter. This parallels the way angular momentum enters
the first law for rotating black holes. Equation (32) is easily verified for the
case of the boosted black string using the formulas of section (4).
We now generalize the first law (33) to include perturbations with δL 6= 0.
Our analysis of the boundary term is based on that in [8] for the static
case. The boundary integral at the horizon in this case remains unchanged
and is still given by equation (28). Additional terms, however, occur in
the boundary term at infinity. Given the results above, we can write the
boundary term at infinity as
I∞ = 16πG(−δM|δL=0 + vHδP|δL=0 + λδL), (34)
where λ remains to be determined. On the other hand, we know the L
dependence of M and P explicitly from the expressions (11) and (12). This
allows us to write
δM = δM|δL=0 +
δL (35)
δP = δP|δL=0 +
δL (36)
Combining these with equations (34), (25) and (28) then gives
I∞ = 16πG
−δM + vHδP + (λ +
. (37)
We can now further appeal to the results of [8] for the case P = 0 and write
λ = λ|P=0 + λ′. We know from [8] that λ|P=0 + M/L = T . Putting this
together, allows us to rewrite (37) as
I∞ = 16πG
−δM + vHδP + (λ′ + T − vH
. (38)
4We can also include perturbative stress energy in this relation, in which case the mass
first law becomes
δM = κ
δA+ vHδP +
(−δT a
b). (32)
We still need to calculate the quantity Î∞ = 16πGλ
′δL which includes
only the terms in I∞ that are proportional to both P and δL. It is noted in [8]
that in order for the perturbative Gauss’s law (24) to apply with δL 6= 0, one
need to make a coordinate transformation so that δL appears in the metric
perturbation, rather than in a change in the range of coordinates. Following
this procedure yields the metric perturbations
hzz ≃ 2
), hzt ≃
There are two terms in equation (23) that potentially contribute to Î∞ and
we accordingly write Î∞ = Î
+ Î(2)
. The first of these terms is given by
Î(1)
dzdac
−2βbπ̄achab
dzdai(−2vH π̄izhzz) (41)
dzdai vH∂iḡtz
= 16πG vH P
The second term, which requires some care in evaluating, is given by
Î(2)
−2βbδπcb
dai(−2vHpiz) (45)
daivH ∂iḡtz
(1 − 2 + 1) (46)
= 0. (47)
where the factor (1 − 2 + 1) in line (46) comes about in the following way.
We have piz ≃ δ(
sszzKiz) with Kiz ≃ −(1/2)∂igtz. The first 1 comes from
the variation of the volume element
s, the −2 comes from the variation
of inverse metric component szz following from equation (39), and the final
1 comes from the variation of gtz, also as in equation (39). Putting these
results together gives λ′ = 2vHP/L and hence
I∞ = 16πG
−δM + vHδP + (T +
. (48)
Finally, combining this with IH gives the first law
δA + vHδP + (T +
)δL (49)
From the δL term, we see that the coefficient of δL is an effective tension given
by T̂ = T + vHP/L. As mentioned in the introduction, the tension of the
boosted black string becomes negative for sufficiently large boost parameter.
It is straightforward to check that the first law (49) is satisfied for variations
within the family of boosted black string solutions in section (4), and also
that the effective tension is given by
rD−4H (50)
which is equal to the tension of the unboosted black string having the same
horizon radius.
7 Smarr formulas, scaling relations and Ko-
mar integrals
Smarr formulas are relations between the thermodynamic parameters that
hold for black hole solutions that have exact symmetries. In this section we
will derive the Smarr formula for stationary, but non-rotating Kaluza-Klein
black holes. We will also derive a second Smarr-type formula that holds in the
case of exact translation invariance in the z-direction, e.g. for the boosted
black string. We present two approaches to deriving these formulas. The
first is based on general scaling relations, which are familiar from classical
thermodynamics, and the second is based on Komar integral relations.
Given the statement of the first law (49) for stationary, non-rotating
Kaluza-Klein black holes, the Smarr formula can be derived by making use
of a simple scaling argument (see e,g, [11]). Given any stationary vacuum
solution to Einstein’s equations, we may obtain a one parameter family of
solutions by rescaling all the dimensionful parameters in the given solution in
an appropriate way. If a parameter µ has dimensions (length)n, we replace it
with λnµ. The parameters ct, cz and ctz that specify the asymptotics of the
stationary solution all scale as (length)D−4. If we rescale these accordingly,
and also replace L with λL, then the mass and momentum rescale as
M = λD−3M̄, P = λD−3P̄ (51)
where M̄ and P̄ are the mass and momentum of the original solution. Sim-
ilarly, the area of the event horizon of the family of spacetimes will be
A = λD−2Ā. Now consider how these quantities change under a small change
in λ. We have
dM = (D−3)Mdλ
, dP = (D−3)P dλ
, dA = (D−2)Adλ
, dL = Ldλ
The first law (49) must hold in particular for this variation in λ. This will
implies that
(D − 3)M = (D − 2) 1
κA + T̂ L + (D − 3)vHP (53)
which is the Smarr formula for stationary, non-rotating Kaluza-Klein black
holes. Note that via the scaling argument, the effective tension T̂ naturally
enters the Smarr formula as well as the first law.
We now derive a second Smarr formula that holds only for solutions, such
as the boosted black string, that have exact translation invariance in the z-
direction5. Note that the mass, momentum and horizon area are all extensive
quantities in the compactification length L and that different values of L give
another one parameter family of solutions. Within this family we have
dM = MdL
, dP = P dL
, dA = AdL
under a small variation in L. For the first law to be satisfied under such
variations, we must have
M = 1
κA + T̂ L + vHP (55)
Because of the simple extensivity of M, P, A and L itself in the length L
of the compact direction, this second Smarr formula takes the form of the
usual Euler relation for a thermodynamic system, without any additional
dimension dependent prefactors. Note that by taking a linear combination
of the two Smarr formulas (53) and (55), the horizon area term may be
eliminated, giving
M = (D − 3)T̂ L + vHP (56)
= (D − 3)T L + (D − 2)vHP (57)
5It appears likely that the sboosted black strings can be shown to be the only stationary,
non-rotating, z translational vacuum solutions with non-singular horizons (see reference
[19]).
For P = 0 this is the well known relation between the mass and tension for
a uniform black string.
The Smarr formulas may also be derived by geometrical means using
Komar integral relations. This is done in reference [12] for the first Smarr
formula in the case P = 0. For a vacuum spacetime with a Killing vector ka,
and a hypersurface Σ with boundaries ∂Σ∞ at infinity and ∂ΣH at the black
hole horizon, the Komar integral relation implies the equality I∂Σ∞ = I∂ΣH
where
IS = −
dSab∇akb. (58)
The first Smarr formula results from taking ka to be the horizon generator
la and Σ to be a spacelike hypersurface with normal dt at infinity. The com-
putation of the horizon boundary term in this case is by now quite standard
(see [20]). The horizon generator la is null on the horizon and consequently
normal to the boundary ∂ΣH . Let q
a be the second null vector orthogo-
nal to ∂ΣH , normalized so that l
aqa = −1. One then has on the boundary
dSab = 2l[aqb]dA, where dA is the surface area element. It then follows that
I∂ΣH =
κA (59)
where we have made use of the definition (3) of the surface gravity. The
boundary term at infinity may be straightforwardly computed using the
asymptotic form of the metric in (10) and the expressions (11) and (12)
for the ADM mass, tension and momentum. One finds
I∂Σ∞ =
ΩD−3L
(D − 4)ct (60)
D − 2
((D − 3)M−T L) − vHP. (61)
Equating the two boundary integrals correctly reproduces the first Smarr
formula (53).
The scaling argument that led to the second Smarr formula assumed
translation invariance in the z-direction, i.e. that Za as well as the horizon
generator la = T a + vHZ
a is a Killing vector. To give a geometric derivation,
we additionally assume, as in section (2.1), that the Killing vectors T a and Za
commute. Let us now take the Killing vector in the Komar construction to
be V a = vHT
a +Za, which is orthogonal to the horizon generator la both at
infinity and on the horizon. We take the hypersurface Σ to be timelike, with
normal equal to dz at infinity and proportional to Va at the horizon. The
normal to the horizon within Σ is then proportional to the horizon generator
la, and hence the boundary term at the horizon includes the factor
laVb∇aV b = VaVb∇alb = 0. (62)
In the first equality the commutivity of the Killing vectors is used and the
second equality follows from Killing’s equation. Hence, the boundary term
at the horizon I∂ΣH vanishes for this choice of Killing field and hypersurface.
The boundary term at infinity is again straightforward to compute using the
expressions in (10),(11) and (12). We find
I∂Σ∞ = −
(D − 4)(cz + vHctz) (63)
= − 1
D − 2
(M/L− (D − 3)T ) + vHP/L. (64)
Equating this with zero then gives the second Smarr formula (55).
8 Tension first law and Gibbs-Duhem rela-
A second kind of variational formula for static Kaluza-Klein black holes was
derived in reference [13]. This ‘tension first law’ states that
LdT = − 1
Adκ (65)
and holds for perturbations that take a static, translation invariant solution
into a nearby solution that is stationary, but not necessarily translation in-
variant. It applies, for example, to the perturbation between the marginally
stable uniform black string and the static non-uniform black string of ref-
erence [21]. In this section, we discuss the thermodynamic context of this
formula and conjecture its extension to include P 6= 0.
We regard the quantities M, A, L and P as extensive parameters, while κ,
T and vH are regarded as intensive parameters. For thermodynamic systems,
the first law relates variations in the extensive parameters, as does equation
(49). In classical thermodynamics a formula relating the variations of the
intensive parameters is known as a Gibbs-Duhem relation. A Gibbs-Duhem
relation can be derived from the first law, together with the variation of an
Euler formula, such as equation (55). In the present case, variation of the
Euler formula gives
dM = 1
(κdA + Adκ) + T̂ dL + LdT̂ + vHdP + PdvH (66)
Combining this with the first law then gives the Gibbs-Duhem relation
Adκ + LdT̂ + PdvH (67)
which reduces to (65) for P = 0.
Note, however, that the Euler formula (55) holds only for z-translationally
invariant solutions, and hence the result above holds only for perturbations
that respect this symmetry, i.e. within the boosted black string family of
solutions. Equation (65) was derived in [13] via the Hamiltonian perturba-
tion methods of section (5), and does not require that the perturbations are
invariant under z translations. We would like to extend the derivation of [13]
to the stationary non-rotating case, but have not yet accomplished this.
9 Conclusions
We have derived various thermodynamic relations for stationary, non-rotating
Kaluza-Klein black holes. As in reference [8], the derivation of the first law
required a careful application of Hamiltonian perturbation theory techniques.
Perhaps the most interesting aspect of the first law (49) is the appearance
of the effective tension T̂ which generally differs from the ADM tension. For
the boosted black string, the ADM tension becomes negative for large boost
parameter, while the effective tension remains positive. We note that the
gravitational contribution to the ADM tension was shown to be positive for
static spacetimes in reference [22] using spinorial techniques. It should be
interesting to see what these techniques yield in the stationary case, e.g. do
they prove positivity of the effective tension.
Our results concerning the Smarr formulas in section (7) are also of in-
terest. In particular, the parallels between the scaling argument and Komar
integral relation derivations are intriguing and can most likely be understood
in a more general setting. Finally, we would like to be able to give a Hamilto-
nian derivation of the Gibbs-Duhem, or ‘tension first law’, result in equation
(67).
Acknowledgments
The authors would like to thank Roberto Emparan and Henriette Elvang for
helpful discussions. This work was supported in part by NSF grant PHY-
0555304.
References
[1] B. Kol, “The phase transition between caged black holes and black
strings: A review,” Phys. Rept. 422, 119 (2006) [arXiv:hep-th/0411240].
[2] T. Harmark, V. Niarchos and N. A. Obers, “Instabilities of black strings
and branes,” arXiv:hep-th/0701022.
[3] J. L. Hovdebo and R. C. Myers, “Black rings, boosted strings
and Gregory-Laflamme,” Phys. Rev. D 73, 084013 (2006)
[arXiv:hep-th/0601079].
[4] B. Kleihaus, J. Kunz and E. Radu, “Rotating nonuniform black string
solutions,” arXiv:hep-th/0702053.
[5] R. G. Cai, “Boosted domain wall and charged Kaigorodov space,” Phys.
Lett. B 572, 75 (2003) [arXiv:hep-th/0306140].
[6] P. K. Townsend and M. Zamaklar, “The first law of black brane me-
chanics,” Class. Quant. Grav. 18, 5269 (2001) [arXiv:hep-th/0107228].
[7] T. Harmark and N. A. Obers, “Phase structure of black holes and strings
on cylinders,” Nucl. Phys. B 684, 183 (2004) [arXiv:hep-th/0309230].
[8] D. Kastor and J. Traschen, “Stresses and strains in the first
law for Kaluza-Klein black holes,” JHEP 0609, 022 (2006)
[arXiv:hep-th/0607051].
[9] D. Sudarsky and R. M. Wald, “Extrema of mass, stationarity, and static-
ity, and solutions to the Einstein Yang-Mills equations,” Phys. Rev. D
46, 1453 (1992).
[10] J. H. Traschen, “Constraints On Stress Energy Perturbations In General
Relativity,” Phys. Rev. D 66, 010001 (2002).
http://arxiv.org/abs/hep-th/0411240
http://arxiv.org/abs/hep-th/0701022
http://arxiv.org/abs/hep-th/0601079
http://arxiv.org/abs/hep-th/0702053
http://arxiv.org/abs/hep-th/0306140
http://arxiv.org/abs/hep-th/0107228
http://arxiv.org/abs/hep-th/0309230
http://arxiv.org/abs/hep-th/0607051
[11] B. D. Chowdhury, S. Giusto and S. D. Mathur, “A microscopic model
for the black hole - black string phase transition,” Nucl. Phys. B 762,
301 (2007) [arXiv:hep-th/0610069].
[12] T. Harmark and N. A. Obers, “New phase diagram for black holes
and strings on cylinders,” Class. Quant. Grav. 21, 1709 (2004)
[arXiv:hep-th/0309116].
[13] J. H. Traschen and D. Fox, “Tension perturbations of black brane space-
times,” Class. Quant. Grav. 21, 289 (2004) [arXiv:gr-qc/0103106].
[14] S. W. Hawking, “Black holes in general relativity,” Commun. Math.
Phys. 25, 152 (1972).
[15] S. W. Hawking and G. F. R. Ellis, “The Large scale structure of space-
time,” Cambridge University Press, Cambridge, 1973
[16] S. Hollands, A. Ishibashi and R. M. Wald, “A higher di-
mensional stationary rotating black hole must be axisymmetric,”
arXiv:gr-qc/0605106.
[17] B. Carter and J. B. Hartle, “Gravitation in Astrophysics, Proceedings,
Nato Advanced Study Institute, Cargese, France, July 15-31, 1986,”
New York, USA: Plenum (1987) 399 P. (Nato ASI Series. Series B,
Physics, 156)
[18] T. Harmark and N. A. Obers, “General definition of gravitational ten-
sion,” JHEP 0405, 043 (2004) [arXiv:hep-th/0403103].
[19] J. Lee and H. C. Kim, “Black string solution and frame dragging,”
arXiv:gr-qc/0703091.
[20] J. M. Bardeen, B. Carter and S. W. Hawking, “The Four laws of black
hole mechanics,” Commun. Math. Phys. 31, 161 (1973).
[21] T. Wiseman, “Static axisymmetric vacuum solutions and non-
uniform black strings,” Class. Quant. Grav. 20, 1137 (2003)
[arXiv:hep-th/0209051].
[22] J. H. Traschen, “A positivity theorem for gravitational ten-
sion in brane spacetimes,” Class. Quant. Grav. 21, 1343 (2004)
[arXiv:hep-th/0308173].
http://arxiv.org/abs/hep-th/0610069
http://arxiv.org/abs/hep-th/0309116
http://arxiv.org/abs/gr-qc/0103106
http://arxiv.org/abs/gr-qc/0605106
http://arxiv.org/abs/hep-th/0403103
http://arxiv.org/abs/gr-qc/0703091
http://arxiv.org/abs/hep-th/0209051
http://arxiv.org/abs/hep-th/0308173
	Introduction
	Stationary, non-rotating Kaluza-Klein black holes
	Two commuting Killing fields
	ADM mass, tension and momentum
	The boosted black string
	Gauss' Laws for Perturbations
	The first law for stationary, Kaluza-Klein black holes
	Smarr formulas, scaling relations and Komar integrals
	Tension first law and Gibbs-Duhem relation
	Conclusions
ABSTRACT
  We study the thermodynamics of Kaluza-Klein black holes with momentum along
the compact dimension, but vanishing angular momentum. These black holes are
stationary, but non-rotating. We derive the first law for these spacetimes and
find that the parameter conjugate to variations in the length of the compact
direction is an effective tension, which generally differs from the ADM
tension. For the boosted black string, this effective tension is always
positive, while the ADM tension is negative for large boost parameter. We also
derive two Smarr formulas, one that follows from time translation invariance,
and a second one that holds only in the case of exact translation symmetry in
the compact dimension. Finally, we show that the `tension first law' derived by
Traschen and Fox in the static case has the form of a thermodynamic Gibbs-Duhem
relation and give its extension in the stationary, non-rotating case.

<|endoftext|><|startoftext|>
arXiv:0704.0730v1  [cs.PF]  5 Apr 2007
Revisiting the Issues On Netflow Sample and
Export Performance
Hamed Haddadi, Raul Landa, Miguel Rio
Department of Electronic and Electrical Engineering
University College London
United Kingdom
Email: hamed,mrio,rlanda@ee.ucl.ac.uk
Saleem Bhatti
School of Computer Science
University of St. Andrews
United Kingdom
Email: saleem@dcs.st-and.ac.uk
Abstract— The high volume of packets and packet rates of traffic
on some router links makes it exceedingly difficult for routers to
examine every packet in order to keep detailed statistics about the
traffic which is traversing the router. Sampling is commonly applied
on routers in order to limit the load incurred by the collection
of information that the router has to undertake when evaluating
flow information for monitoring purposes. The sampling process in
nearly all cases is a deterministic process of choosing 1 in every N
packets on a per-interface basis, and then forming the flow statistics
based on the collected sampled statistics. Even though this sampling
may not be significant for some statistics, such as packet rate, others
can be severely distorted. However, it is important to consider the
sampling techniques and their relative accuracy when applied to
different traffic patterns.
The main disadvantage of sampling is the loss of accuracy in
the collected trace when compared to the original traffic stream.
To date there has not been a detailed analysis of the impact of
sampling at a router in various traffic profiles and flow criteria.
In this paper, we assess the performance of the sampling process
as used in NetFlow in detail, and we discuss some techniques for
the compensation of loss of monitoring detail.
I. INTRODUCTION
Packet sampling is an integral part of passive network
measurement on today’s Internet. The high traffic volumes on
backbone networks and the pressure on routers has resulted
in the need to control the consumption of resources in the
measurement infrastructure. This has resulted in the definition
and use of estimated statistics by routers, generated based on
sampling packets in each direction of each port on the routers.
The aims of this paper is to analyse the effects of the sampling
process as operated by NetFlow, the dominant standard on
today’s routers.
There are three constraints on a core router which lead to
the use packet sampling: the size of the record buffer, the
CPU speed and the record look-up time. In [6], it is noted
that in order to manage and analyse the performance of a
network, it is enough to look at the basic statistical measures
and summary statistics such as average range, variance, and
standard deviation. However, in this paper we analyse both
analytically and practically the accuracy of the inference of
original characteristics from the sampled stream when higher
order statistics are used.
This paper focuses on the inference of original network
traffic characteristics for flows from a sampled set of packets
and examines how the sampling process can affect the quality
of the results. In this context, a flow is identified specifically,
as the tuple of the following five key fields: Source IP address,
Destination IP address, Source port number, Destination port
number, Layer 4 protocol type.
A. NetFlow memory constraints
A router at the core of an internet link is carrying a large
number of flows at any given time. this pressure on the router
entails the use of strict rules in order to export the statistics and
keep the router memory buffer and CPU resources available to
deal with changes in traffic patterns by avoiding the handling
of large tables of flow records. Rules for expiring NetFlow
cache entries include:
• Flows which have been idle for a specified time are
expired and removed from the cache (15 seconds is
default)
• Long lived flows are expired and removed from the cache
(30 minutes is default)
• As the cache becomes full a number of heuristics are ap-
plied to aggressively age groups of flows simultaneously
• TCP connections which have reached the end of byte
stream (FIN) or which have been reset (RST) will be
expired
B. Sampling basics
Distributions studies have been done extensively in lit-
erature. In brief conclusion, internet traffic is believed to
have Heavy-tailed distribution, self-similar nature, Long Range
Dependence [2]. Sampling has the following effects on the
flows:
• It is easy to miss short flows [14]
• Mis-ranking on high flows [4]
• Sparse flow creation [14]
Packet sampling:
The inversion methods are of little to no use in practice for
low sampling probability q, such as q = 0.01 (1 packet in
100) or smaller, and become much worse as q becomes smaller
still. For example, on the Abilene network, 50% sampling was
needed to detect the top flow correctly [4].
Flow sampling:
Preserves flows intact and the sampling is done on the flow
http://arxiv.org/abs/0704.0730v1
records. In practice, any attempt to gather flow statistics
involves classifying individual packets into flows. All packet
meta-data has to be organised into flows before sampling can
take place. This involves more CPU load and more memory
if one uses the traditional hash table approach with one entry
per flow. New flow classification techniques, such as bitmap
algorithms, could be applied but there is no practical usage in
this manner currently.
II. VARIATION OF HIGHER ORDER STATISTICS
In this section we look at a more detailed analysis of the
effect of sampling as performed by netflow on higher order
statistics of the packet and flow size distributions. For the
analysis of packet sampling application is used by NetFlow,
we emulated the NetFlow operation on a 1 hour OC-48 trace,
collected from the CAIDA link on 24th of April 2003. This
data set is available from the public repository at CAIDA [7].
The trace comprises of 84579462 packets with anonymised
source and destination IP addresses. An important factor to
rememberer in this work is the fact that the memory constraint
on the router has been relaxed in generating the flows from
the sampled stream. This means that there maybe more than
tens of thousands of flow keys present at the memory at a
given time, while in NetFlow, the export mechanism empties
the buffer list regularly which can have a more severe impact
on the resultant distribution of flow rates and statistics3.
A. Effects of the short time-out imposed by memory con-
straints
Table I illustrates the data rates d(t) per interval of mea-
surement. Inverted data rates, by dividing d(t) by the sampling
probability q, are shown as dn(t).
TABLE I
THE STATISTICAL PROPERTIES ON DATA RATES d(t)
Dataset,bin(secs) STD Skewness Kurtosis
d(t), 30 2.2274e+07 0.5421 0.6163
dn(t), 30 2.9109e+07 0.3837 0.4444
d(n)− dn(t), 30 1.6748e+07 -0.2083 0.7172
d(t), 120 7.8650e+07 0.7398 1.6190
dn(t), 120 9.5216e+07 0.3274 0.9268
d(t) − dn(t), 120 3.7652e+07 -0.2971 -1.1848
d(t), 300 1.8491e+08 1.3058 3.7451
dn(t), 300 2.1248e+08 1.1016 2.5408
d(t) − dn(t), 300 6.1039e+07 0.1840 -1.1628
As observed in table I, the mean does not have a great
variation, possibly because distributions of packet sizes within
single flows do not exhibit high variability. The standard
deviation of the estimated data rate is higher than the cor-
responding standard deviation for the unsampled data stream.
In the absence of any additional knowledge about the higher
level protocol, or the nature of the session level activity, in the
unsampled data stream, each flow can be thought of as having
3The processing of the data was done using tools which are made available
to the public by the authors.
packets of varying sizes that are more or less independent
from one another. Thus, the whole traffic profile results from
the addition of many independent random variables which, by
the central limit theorem, tend to balance among themselves
to produce a more predictable, homogeneous traffic aggre-
gate. However, simple inversion eliminates this multiplicity
of randomly distributed values by introducing a very strong
correlation effect, whereby the size of all the packets in a
reconstructed flow depend on the size of a very small set of
sampled packets. This eliminates the possibility for balancing
and thus increases the variability of the resulting stream, i.e.
its standard deviation.
However, the skewness and kurtosis do change. Skewness is
a measure of the asymmetry of the probability distribution of a
real-valued random variable. Roughly speaking, a distribution
has positive skew (right-skewed) if the right (higher value) tail
is longer and negative skew (left-skewed) if the left (lower
value) tail is longer (confusing the two is a common error).
Skewness, the third standardised moment, is written as γ1 and
defined as:
where µ3 is the third moment about the mean and σ is the
standard deviation.
Kurtosis is more commonly defined as the fourth cumulant
divided by the square of the variance of the probability
distribution,
which is known as excess kurtosis. The ”minus 3” at the end
of this formula is often explained as a correction to make the
kurtosis of the normal distribution equal to zero. The skewness
is a sort of measure of the asymmetry of the distribution
function. The kurtosis measures the flatness of the distribution
function compared to what would be expected from a Gaussian
distribution.
Table II illustrates the packet rates p(t) per interval of
measurement. Inverted packet rates, by dividing p(t) by the
sampling probability q, are shown as pn(t). The distributions
before and after sampling are extremely close, and thus their
difference tends to exaggerate those small difference that they
do have. That is the reason of the enormous skewness and
kurtosis that are observed. The skewness of the reconstructed
stream is smaller than that of the unsampled stream this means
that the reconstructed distribution is more symmetric, that is
, it tends to diverge in a more homogeneous manner around
the mean. Additionally, it is positive, meaning that in both
cases the distribution tends to have longer tails towards large
packets rather than towards short packets, concentrating its
bulk on the smaller packets. If we concede that small flows
(flows consisting of a small number of packets) tend to contain
small packets, then it is clear that this smaller packets will
be underrepresented and the distribution will shift its weight
towards bigger packets (members of bigger flows). Thus, it
will become more symmetric and hence less skewed.
The Kurtosis decreases in all of the considered examples.
TABLE II
THE STATISTICAL PROPERTIES ON PACKET RATES p(t)
Dataset,bin(secs) STD Skewness Kurtosis
p(t), 30 3.1162e+04 -0.4007 0.7415
pn(t), 30 3.1359e+04 -0.3584 0.6072
p(t) − pn(t), 30 5.4148e+03 9.1469 96.0659
p(t), 120 1.1215e+05 -0.3875 1.2027
pn(t), 120 1.1178e+05 -0.3759 1.2238
p(t) − pn(t), 120 3.0157e+03 4.7140 26.1079
p(t), 300 2.5128e+05 0.1305 1.6495
pn(t), 300 2.5152e+05 0.1433 1.6597
p(t) − pn(t), 300 2.1047e+03 2.4298 8.9377
This means that the reconstructed streams are more homo-
geneous and less prone to outliers when compared with the
original traces. Thus, more of the variance in the original
traces in packet size can be attributed to infrequent packets
that have inordinately big packets that were missed in the
sampling process, and thus the variance in the reconstructed
stream consists more of homogeneous differences and not
large outliers. However, both the reconstructed and unsampled
streams are leptokurtic and thus tend to have long, heavy tails.
B. The two-sample KS test
The two-sample KS test is one of the most useful and
general non-parametric methods for comparing two samples,
as it is sensitive to differences in both location and shape
of the empirical cumulative distribution functions of the two
samples. A CDF was calculated for the number of packets
per flow and the number of octets per flow for each of
the 120 sampling intervals of 30 seconds each, both for the
sampled/inverted and unsampled streams. Then, a Two-Sample
Kolmogorov-Smirnov Test with 5% significance level was
performed between the 120 unsampled and the 120 sampled
& inverted distributions. In every case the distributions before
and after sampling and inversion were found to be significantly
different, and thus it is very clear that the sampling and inver-
sion process significantly distorts the actual flow behaviour of
the network.
III. PRACTICAL IMPLICATIONS OF SAMPLING
The effects of sampling on network traffic statistics can be
measured from different perspectives. In this section we will
cover the theories behind the sampling strategy and use some
real data captures from CAIDA in an emulation approach to
demonstrate the performance constraints of systematic sam-
pling.
A. Inversion errors on sampled statistics
The great advantage of sampling is the fact that the first
order statistics do not show much variation when the sampling
is done at consistent intervals and from a large pool of
data. This enables the network monitoring to use the sampled
statistics to form a relatively good measure of the aggregate
measure of network performance. Figure 1 displays the data
rates d(t), in number of bytes seen per 30 second interval,
on the one hour trace. The inverted data d(t) is also shown
with diamond notation, showing the statistics gathered after
the sampled data is multiplied by the sampling rate. The black
dots display the relative error per interval, e(t) = d(t)−dn(t)
0 20 40 60 80 100 120
8 Data Rates on CAIDA, 30sec interval
Sampling interval [x 30sec]
Original
Inverted from sampling
Relative error
Fig. 1. Data rates per 30 second interval, original versus normal inversion
of sampled
Figure 2 displays the packet rates p(t), the number of
packets per 30 second interval, versus the sampled and inverted
packet rates pn(t). In this figure, it can be observed that
the inversion does a very good job at nearly all times and
the relative error is negligible. This is a characteristics of
systematic sampling and is due to the central limit theorem.
0 20 40 60 80 100 120
5 Packet Rate p(t) on CAIDA trace, 30 Second period
Sampling interval [x 30sec]
Original
Inverted from sampling
Fig. 2. Packet rates per 30 second interval, original vs inversion of sampled
It can be readily seen that the recovery of packet rates by
simple inversion is much better than the recovery of data
rates. This is because sampling one in a thousand packets
deterministically can be trivially inverted by multiplying by the
sampling rate (1000): we focus on packet level measurement,
as opposed to a flow level measurement. If the whole traffic
flow is collapsed into a single link, then if we sample one
packet out every thousand and then multiply that by the
sampling rate, we will get the total number of packets in that
time window. We believe that the small differences that we can
see in Figure 2 are due to the fact that at the end of the window
some packets are lost (because their ‘representative’ was not
sampled) or overcounted (a ‘representative’ for 1000 packets
was sampled but the time interval finished before they had
passed). We believe these errors happen between measurement
windows in time, i.e. they are window-edge effects.
The inversion property described above does not hold for
measuring the number of bytes in a sampling interval. Simple
inversion essentially assumes that all packets in a given flow
are the same size, and of course this assumption is incorrect.
It is to be expected that the greater the standard deviation of
packet size over an individual flow, the more inaccurate the
recovery by simple inversion will be regarding the number
of bytes per measurement interval. Figures 3 and 4 displays
the standard error rate on data rate and packet rate recovery
respectively, in different measurement intervals.
−1 0 1 2
Datarate errors, 300 second
−1 −0.5 0 0.5 1
Datarate errors, 120 second
−5 0 5 10
Datarate errors, 30 second
Fig. 3. Standard Sampling & inversion error on data rates, different
measurement bins
−5000 0 5000 10000
Packet rate errors, 300 second
−5 0 5 10
Packet rate errors, 30 second
−1 0 1 2
Packet rate errors, 120 second
Fig. 4. Sampling & inversion error on packet rates, different measurement
B. Flow size and packet size distributions
Figure 5, displays the CDF of packet size distribution in all
the flows formed from the sampled and unsampled streams.
The little variation in the packet size distribution conforms to
the findings of the previous section where it was discussed
that the packet sampling has low impact on the packet size
distribution.
3 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8
CDF of packet size distributaions, Logarithmic view
Log of Packet size [Octets]
Sampled
Original
Fig. 5. Normalised CDF of packets distributions per flow, original vs inverted
Figure 6:1 shows the effect that the distribution of packet
lengths can have on the distribution of flow lengths when
periodic packet sampling is applied. As flows reconstructed
from a sampled packet stream are predominantly formed by
just one packet, their length distribution follows that of single
packets (Figure 5). That is the reason for the sharp jump near
1500 octets, as this characteristic originates from the maximum
frame size in ethernet networks.
2 4 6 8 10 12 14 16 18
CDF of Flow Size Distributaions [logarithmic]
Flow size [Log(Octets)]
Sampled
Original
0 2 4 6 8 10 12
CDF of Flow Length Distributaions [logarithmic]
Flow length [Log(Packets)]
Inverted
Original
Fig. 6. Normalised CDF of flow size in packets [figure] & length in bytes
[right] per flow, original vs inverted
From Figure 6:2 , it can be readily seen that, in the sampled
stream, more than 90 percent of flows consist of a single
packet, whereas in the unsampled case a much grater diversity
in flow lengths exists for small flows. This is due to the
fact that simple packet-based deterministic sampling under-
represents short flows, and those short flows that are indeed
detected by the procedure after sampling usually consist of a
single packet. Thus, short flows are either lost or recovered as
single packet flows, and long flows have their lengths reduced.
IV. CONCLUSION
In this paper we have reviewed the effects of sampling
and flow record creation, as done by NetFlow, on the traffic
statistics which are reported by such a process. It is inevitable
that systematic sampling can no longer provide a realistic
picture of the traffic profile present on internet links. The
emergence of applications such as video on-demand, file
sharing, streaming applications and even on-line data process-
ing packages prevents the routers from reporting an optimal
measure of the traffic traversing them. In the inversion process,
it is a mistake to assume that the inversion of statistics by
multiplication by the sampling rate is an indicate of even the
first order statistics such as packet rates.
An extension to this work and the inversion problem entails
the use of more detailed statistics such as port numbers
and TCP flags in order to be able to infer the original
characteristics from the probability distribution functions of
such variables. This will enable a more detailed recovery of
original packet and data rates for different applications. The
inference of such probabilities, plus use of methods such as
Bayesian inference, would enable a forecasting method which
would enable the inversion of the sampled stream in near real
time.
In a related work, we will be looking at alternative flow syn-
thesis schemes, looking at techniques replacing the NetFlow,
such as use of hashing techniques using Bloom filters. The use
of a light weight flow indexing system will allow for a larger
number of flows to be present at the router, possibly increasing
the memory constraints and allowing for a higher sampling
rate, which will in turn lead to more accurate inversion.
V. RELATED WORK
There has been a great deal of worked done on analysis
of sampling process and inversion problem. Choi et al. have
explored the sampling error and measurement overhead of
NetFlow in [11] though they have not looked at inversion
process.
In [3], the authors have compared the Netflow reports with
those obtained from SNMP statistics and packet level traces,
but without using the sampling feature of NetFlow which is
perhaps the dominant version in use nowadays. Estan et al. [5]
have proposed a novel method of adapting the sampling rate at
a NetFlow router in order to keep the memory resources at a
constant level. This is done by upgrading the router firmware,
which can be compromised by an attacker injecting varying
traffic volume in order to take down the router. Also this work
has not considered the flow length statistics which are the
primary focus of our work.
Hohn et al. [10] have proposed a flow sampling model
which can be used in an offline analysis of flow records formed
from an unsampled packet stream. In this model the statistics
of the original stream are recovered to a great extent. However
the intensive computing and memory resources needed in
this process prevents the implementation of such a scheme
on highspeed routers. They prove it impossible to accurately
recover statistics from a packet sampled stream, but based on
the assumption of packets being independent and identically
distributed
Roughan at [12] has looked at statistical processes of active
measurement using Poisson and uniform sampling and has
compared the theoretical performance of the two methods.
Papagiannaki et al. at [8] have discussed the effect of sampling
on tiny flows when looking at generation of traffic matrices.
Authors at [15] have been looking at anomaly detection
using flow statistics, but without sampling. In [17] and [16],
authors have looked at inferring the numbers and lengths of
flows of original traffic that evaded sampling altogether. They
have looked at inversion via multiplication.
ACKNOWLEDGEMENTS
The authors would like to acknowledge CAIDA [7] for
providing the trace files. This work is conducted under the
MASTS (EPSRC grant GR/T10503) and the 46PaQ project
(EPSRC grant GR/S93707).
REFERENCES
[1] NetFlow Services Solutions Guide available at http://www.
cisco.com/univercd/cc/td/doc/cisintwk/intsolns/
netflsol/nfwhite.htm
[2] Will Leland, Murad Taqqu, Walter Willinger, and Daniel Wilson, On the
Self-Similar Nature of Ethernet Traffic (Extended Version), IEEE/ACM
Transactions on Networking, Vol. 2, No. 1, pp. 1-15, February 1994.
[3] Sommer, R. and Feldmann, A. 2002. NetFlow: information loss or win?.
In Proceedings of the 2nd ACM SIGCOMM Workshop on internet
Measurment (Marseille, France, November 06 - 08, 2002). IMW ’02.
ACM Press, New York, NY, 173-174. DOI= http://doi.acm.org/
10.1145/637201.637226
[4] Barakat, C., Iannaccone, G., and Diot, C. 2005. Ranking flows from sam-
pled traffic. In Proceedings of the 2005 ACM Conference on Emerging
Network Experiment and Technology (Toulouse, France, October 24 - 27,
2005). CoNEXT’05. ACM Press, New York, NY, 188-199.
[5] Estan, C., Keys, K., Moore, D., and Varghese, G. 2004. Building a better
NetFlow. SIGCOMM Comput. Commun. Rev. 34, 4 (Aug. 2004), 245-
256. http://doi.acm.org/10.1145/1030194.1015495
[6] Performance and Fault Management (Cisco Press Core Series) by Paul
L Della Maggiora (Author), James M. Thompson (Author), Robert L.
Pavone Jr. (Author), Kent J. Phelps (Author), Christopher E. Elliott
(Editor), Publisher: Cisco press; 1ST edition, ISBN: 1578701805
[7] CAIDA, the Cooperative Association for Internet Data Analysis: http:
//www.caida.org
[8] K. Papagiannaki, N. Taft, and A. Lakhina. A Distributed Approach to
Measure IP Traffic Matrices. In ACM Internet Measurement Conference,
Taormina, Italy, October, 2004.
[9] IETF PSAMP working Group: http://www.ietf.org/html.
charters/psamp-charter.html
[10] N. Hohn and D. Veitch. Inverting Sampled Traffic. In Internet Mea-
surement Conference, 2003. http://citeseer.ist.psu.edu/
hohn03inverting.html
[11] Choi, B. and Bhattacharyya, S. 2005. Observations on Cisco sampled
NetFlow. SIGMETRICS Perform. Eval. Rev. 33, 3 (Dec. 2005), 18-23.
DOI= http://doi.acm.org/10.1145/1111572.1111579
[12] A Comparison of Poisson and Uniform Sampling for Active Measure-
ments, Matthew Roughan, accepted to appear in IEEE JSAC.
[13] InMon Corporation (2004). sFlow accuracy and billing. Available at
www.inmon.com/pdf/sFlowBilling.pdf
[14] Sampling for Passive Internet Measurement: A Review, N.G. Duffield,
Statistical Science,Vol. 19, No. 3, 472-498, 2004.
[15] Brauckhoff, D., Tellenbach, B., Wagner, A., May, M., and Lakhina,
A. 2006. Impact of packet sampling on anomaly detection metrics. In
Proceedings of the 6th ACM SIGCOMM on internet Measurement (Rio
de Janeriro, Brazil, October 25 - 27, 2006). IMC ’06. ACM Press,
New York, NY, 159-164. DOI= http://doi.acm.org/10.1145/
1177080.1177101
http://www.cisco.com/univercd/cc/td/doc/cisintwk/intsolns/netflsol/nfwhite.htm
http://doi.acm.org/10.1145/637201.637226
http://doi.acm.org/10.1145/1030194.1015495
http://www.caida.org
http://www.ietf.org/html.charters/psamp-charter.html
http://citeseer.ist.psu.edu/hohn03inverting.html
www.inmon.com/pdf/sFlowBilling.pdf
http://doi.acm.org/10.1145/1177080.1177101
[16] N. Duffield, C. Lund, and M. Thorup. Properties and Prediction of
Flow Statistics from Sampled Packet Streams. In Proc. ACM SIGCOMM
IMW’02, Marseille, France, Nov. 2002.
[17] N. Duffield, C. Lund, and M. Thorup. Estimating Flow Distributions
from Sampled Flow Statistics. In Proc. ACM SIGCOMM’03, Karlsruhe,
Germany, Aug. 2003.
ABSTRACT
  The high volume of packets and packet rates of traffic on some router links
makes it exceedingly difficult for routers to examine every packet in order to
keep detailed statistics about the traffic which is traversing the router.
Sampling is commonly applied on routers in order to limit the load incurred by
the collection of information that the router has to undertake when evaluating
flow information for monitoring purposes. The sampling process in nearly all
cases is a deterministic process of choosing 1 in every N packets on a
per-interface basis, and then forming the flow statistics based on the
collected sampled statistics. Even though this sampling may not be significant
for some statistics, such as packet rate, others can be severely distorted.
However, it is important to consider the sampling techniques and their relative
accuracy when applied to different traffic patterns. The main disadvantage of
sampling is the loss of accuracy in the collected trace when compared to the
original traffic stream. To date there has not been a detailed analysis of the
impact of sampling at a router in various traffic profiles and flow criteria.
In this paper, we assess the performance of the sampling process as used in
NetFlow in detail, and we discuss some techniques for the compensation of loss
of monitoring detail.

<|endoftext|><|startoftext|>
The tensor part of the Skyrme energy density functional. I. Spherical nuclei
T. Lesinski,1, ∗ M. Bender,2, 3, † K. Bennaceur,1, 2 T. Duguet,4 and J. Meyer1
1Université de Lyon, F-69003 Lyon, France; Institut de Physique Nucléaire de Lyon,
CNRS/IN2P3, Université Lyon 1, F-69622 Villeurbanne, France
2DSM/DAPNIA/SPhN, CEA Saclay, F-91191 Gif-sur-Yvette Cedex, France
3Université Bordeaux 1; CNRS/IN2P3; Centre d’Études Nucléaires de Bordeaux Gradignan,
UMR5797, Chemin du Solarium, BP120, F-33175 Gradignan, France
4National Superconducting Cyclotron Laboratory and Department of Physics and Astronomy,
Michigan State University, East Lansing, MI 48824, USA
(Dated: April 4, 2007)
We perform a systematic study of the impact of the J2 tensor term in the Skyrme energy functional
on properties of spherical nuclei. In the Skyrme energy functional, the tensor terms originate both
from zero-range central and tensor forces. We build a set of 36 parameterizations which cover a wide
range of the parameter space of the isoscalar and isovector tensor term coupling constants with a
fit protocol very similar to that of the successful SLy parameterizations. We analyze the impact of
the tensor terms on a large variety of observables in spherical mean-field calculations, such as the
spin-orbit splittings and single-particle spectra of doubly-magic nuclei, the evolution of spin-orbit
splittings along chains of semi-magic nuclei, mass residuals of spherical nuclei, and known anomalies
of radii. The major findings of our study are (i) tensor terms should not be added perturbatively to
existing parameterizations, a complete refit of the entire parameter set is imperative. (ii) The free
variation of the tensor terms does not lower the χ2 within a standard Skyrme energy functional.
(iii) For certain regions of the parameter space of their coupling constants, the tensor terms lead
to instabilities of the spherical shell structure, or even the coexistence of two configurations with
different spherical shell structure. (iv) The standard spin-orbit interaction does not scale properly
with the principal quantum number, such that single-particle states with one or several nodes have
too large spin-orbit splittings, while those of nodeless intruder levels are tentatively too small. Tensor
terms with realistic coupling constants cannot cure this problem. (v) Positive values of the coupling
constants of proton-neutron and like-particle tensor terms allow for a qualitative description of the
evolution of spin-orbit splittings in chains of Ca, Ni and Sn isotopes. (vi) For the same values of
the tensor term coupling constants, however, the overall agreement of the single-particle spectra
in doubly-magic nuclei is deteriorated, which can be traced back to features of the single-particle
spectra that are not related to the tensor terms. We conclude that the currently used central and
spin-orbit parts of the Skyrme energy density functional are not flexible enough to allow for the
presence of large tensor terms.
PACS numbers: 21.10.Dr, 21.10.Pc, 21.30.Fe, 21.60.Jz
I. INTRODUCTION
The strong nuclear spin-orbit interaction in nuclei is
responsible for the observed magic numbers in heavy nu-
clei [1, 2, 3, 4]. While a simple spin-orbit interaction al-
lows for the qualitative description of the global features
of shell structure, the available data suggest that single-
particle energies evolve with neutron and proton number
in a manner that cannot be related to the geometrical
growth of the single-particle potential with N and Z.
Many anomalies of shell structure have been identified
that do not fit into simple experimental systematics, and
that challenge any global model of nuclear structure.
The evolution of shell structure with N and Z as a fea-
ture of self-consistent mean-field models has been known
for long. To quote the pioneering study of shell structure
∗Electronic address: lesinski@ipnl.in2p3.fr
†Electronic address: bender@cenbg.in2p3.fr
in a self-consistent model performed by Beiner et al. [5],
the “most striking effect is the appearance of N = 16,
34 and 56 as neutron magic numbers for unstable nuclei,
together with a weakening of the shell closure at N = 20
and 28”. Various mechanisms that modify the appear-
ance of gaps in the single-particle spectra have been dis-
cussed in detail in the literature. The two most promi-
nent ones that were worked out by Dobaczewski et al. in
Ref. [6], however, play mainly a role for weakly-bound
exotic nuclei far from stability, as they are directly or
indirectly related to the physics of loosely bound single-
particle states, namely that the enhancement of the dif-
fuseness of neutron density distribution reduces the spin-
orbit coupling in neutron-rich nuclei on the one hand,
and the interaction between bound orbitals and the con-
tinuum results in a quenching of shell effects in light and
medium systems on the other hand. The former effect
was also extensively discussed in the framework of rela-
tivistic models by Lalazissis et al. [7, 8], while the latter
triggered a number of studies that discussed the poten-
tial relevance of this so-called “Boguliubov enhanced shell
http://arxiv.org/abs/0704.0731v3
mailto:lesinski@ipnl.in2p3.fr
mailto:bender@cenbg.in2p3.fr
quenching” to explain the abundance pattern from the
astrophysical r-process of nucleosynthesis [9, 10, 11, 12].
These two effects take place in neutron-rich nuclei. In
proton-rich nuclei, the Coulomb barrier suppresses both
the diffuseness of the proton density and the coupling of
bound proton states to the continuum. But the Coulomb
interaction itself can also modify the shell structure: for
super-heavy nuclei, it begins to destabilize the nucleus
as a whole. Mean-field models predict that it ampli-
fies the shell oscillations of the densities for incomplete
filled oscillator shells, which leads to strong variations of
the density profile that feed back onto the single-particle
spectra [13, 14].
Interestingly, most theoretical papers about the evolu-
tion of shell structure from the last decade have specu-
lated about new effects that mainly affect neutron shells
in nuclei far from stability in the anticipation of the rare-
isotope physics that might become accessible with the
next generation of experimental facilities. The known
anomalies, some of which have been known for a long
time, and many more have been identified recently, con-
cern also proton shells and already appear sufficiently
close to stability that “exotic phenomena can be ruled
out for their explanation” in most cases, to paraphrase
the authors of Ref. [15]. By contrast, this suggests that
there exists a mechanism that induces a strong evolution
of single-particle spectra already in stable nuclei that has
been overlooked for long.
There is a prominent ingredient of the nucleon-nucleon
interaction that has been ignored for decades in virtu-
ally all global nuclear structure models for medium and
heavy nuclei, be it macroscopic-microscopic approaches
or self-consistent mean-field methods. It is only very re-
cently, that the systematic discrepancies between model
predictions and experiment have triggered a renaissance
of the tensor force in the description of finite medium-
and heavy-mass nuclei.
The tensor force is a crucial and necessary ingredient
of the bare nucleon-nucleon interaction [16, 17], and con-
sequently is contained in all ab-initio approaches that are
available for light, mainly p-shell nuclei [18, 19]. One of
the first experimental signatures of the tensor force was
the small, but finite quadrupole moment of the deuteron.
In a boson-exchange picture of the bare nucleon-nucleon
interaction, the tensor force originates from the exchange
of pseudoscalar pions, which have both central and tensor
couplings, see for example section 2.3 in Ref. [20] or ap-
pendix 13A of Ref. [21]. In a nuclear many-body system,
the bare tensor force induces a strong correlation between
the spatial and spin orientations in the two-body density
matrix. For two nucleons with parallel spins, the ten-
sor force energetically favors the configuration where the
distance vector is aligned with the spins, while for anti-
parallel spins the tensor force prefers when the distance
vector is perpendicular to the spins, see the discussion
of Fig. 13 in Ref. [22] and of Fig. 3 in Ref. [23]. The
authors of these papers also demonstrate very nicely the
well-known fact [24, 25] that in an approach that starts
from the bare nucleon-nucleon interaction, nuclei are not
bound without taking into account the two-body corre-
lations induced by the tensor force.
The role of the tensor force, however, manifests itself
differently in self-consistent mean-field models, otherwise
called energy density functional (EDF) methods, the tool
of choice for medium and heavy nuclei. The latter meth-
ods use an independent-particle state as a reference state
to express the energy of the correlated nuclear ground
state. Thus, correlations are not explicitly present in the
higher-order density matrices of the reference state, but
rather included under the form of a more elaborate func-
tional of the (local and nearly local parts of the) one-body
density matrix of that reference state. In such a scheme,
most of the effect of the bare tensor force on the binding
energy is integrated out through the renormalization of
the coupling constants associated with a central effective
vertex, in a similar fashion as the tensor part of the bare
interaction is renormalized into the central one when go-
ing from the bare nucleon-nucleon force to a Brueckner G
matrix. The tensor terms of the EDF relate to a residual
tensor vertex, that gives nothing but a correction to the
spin-orbit splittings, which for light p-shell nuclei might
be of the same order as the contribution from the genuine
spin-orbit force. The interplay of spin-orbit and tensor
forces in the mean field of medium and heavy nuclei was
explored in Refs. [26, 27, 28], where the particular role
of spin-unsaturated shells was pointed out.
There are two widely used effective interactions for
non-relativistic self-consistent mean-field models [29], the
zero-range non-local Skyrme interaction [30, 31, 32, 33]
on the one hand and the finite-range Gogny force [34, 35]
on the other hand.
In fact, the effective zero-range non-local interaction
proposed by Skyrme in 1956 [30, 31, 32, 33] already con-
tained a zero-range tensor force. The first applications of
Skyrme’s interaction in self-consistent mean-field models
that became available around 1970, however, neglected
the tensor force, and the simplified effective Skyrme in-
teraction used in the seminal paper by Vautherin and
Brink [36] soon became the standard Skyrme interaction
that was used in most applications ever since. Until very
recently, there was only very little exploratory work on
Skyrme’s tensor force. In their early study, Stancu, Brink
and Flocard [37], who added the tensor force perturba-
tively to the SIII parameterization, pointed out that some
spin-orbit splittings in magic nuclei can be improved with
a tensor force. A complete fit including the terms from
the tensor force that contribute in spherical nuclei was
attempted by Tondeur [38], with the relevant coupling
constants of the spin-orbit and tensor terms adjusted to
selected spin-orbit splittings in 16O, 48Ca and 208Pb. An-
other complete fit of a generalized Skyrme interaction in-
cluding a tensor force was performed by Liu et al. [39],
but the authors did not investigate the effect of the ten-
sor force in detail, nor was the resulting parameterization
ever used in the literature thereafter.
Similarly, the seminal paper by Gogny [34] on the eval-
uation of matrix elements of a finite-range force of Gaus-
sian shape in an harmonic oscillator basis contains the ex-
pressions for a finite-range tensor force, which, however,
was omitted in the parameterizations of Gogny’s force
adjusted by the Bruyères-le-Châtel group [35]. It were
Onishi and Negele [40] who first published an effective
interaction that combined a Gaussian two-body central
force, a finite-range tensor force with a zero-range spin-
orbit force and a zero-range non-local three-body force,
which, however, also fell into oblivion.
The role of the tensor force is slightly different in
Skyrme and Gogny interactions. In the Gogny force,
the contributions from the central and tensor parts re-
main explicitly distinct, although, of course, this does
not prevent a certain entanglement of their physical ef-
fects. In the context of Skyrme’s functional, however, the
contribution of a zero-range tensor force to the spherical
mean-field state of an even-even nucleus has exactly the
same form as a particular exchange term from the non-
local part of the central Skyrme force. When looking at
spherical nuclei only, adding Skyrme’s tensor force simply
allows one to decouple a term that is already provided
by the central force. This indeed makes the effective-
interaction-restricted functional more flexible, as the ad-
ditional degrees of freedom from the tensor force remove
an interdependence between the effective mass, the sur-
face terms and the “tensor terms”. However, one must
always keep in mind that both the central and tensor
part of the effective vertex contribute to the so-called J2t
“tensor” terms of the functional.1
In the context of relativistic mean-field models, the
equivalent of the non-relativistic tensor force appears
as the exchange term of effective fields with the quan-
tum numbers of the pion, which by construction do
not appear in the standard relativistic Hartree models.
Only relativistic Hartree-Fock models contain this tensor
force, with the first predictive parameterizations becom-
ing available just recently [42].
We also mention that there is a large body of work
on the tensor force in the interacting shell model, see
Ref. [43] for a review, that concentrates on a completely
different aspect of the tensor force, namely its unique
contribution to excitations with unnatural parity.
The recent interest in the effect of the tensor force in
the context of self-consistent mean field models was trig-
gered by the observed evolution of single-particle levels
of one nucleon species in dependence of the number of
the other nucleon species. Otsuka et al. [44] proposed
1 As we will outline below, and as was already pointed out in
Ref. [5], this argument does not hold for deformed even-even nu-
clei or any situation where intrinsic time-reversal is broken, for
example odd nuclei or dynamics. There, the tensor and non-local
central parts of the effective Skyrme interaction give contribu-
tions to the mean-fields and the binding energy with different
analytical expressions. This will be discussed in a companion
article [41].
that at least part of the effect is caused by the proton-
neutron tensor force from pion exchange. Many groups
attempt now to explain known, but so far unresolved,
anomalies of shell structure in terms of a tensor force.
A particularly popular playground is the relative shift of
the proton 1g7/2 and 1h11/2 levels in tin isotopes, which is
interpreted as the reduction of the spin-orbit splittings of
both levels with their respective partners with increasing
neutron number [45].
Otsuka et al. [46] added a Gaussian tensor force, ad-
justed on the long-range part of a one-pion+ρ exchange
potential, to a standard Gogny force. After a consis-
tent readjustment of the parameters of its central and
spin-orbit parts, they were able to explain coherently the
anomalous relative evolution of some single-particle levels
without, however, being able to describe their absolute
distance in energy. Dobaczewski [47] has pointed out that
a perturbatively added tensor interaction with suitably
chosen coupling constants in the Skyrme energy density
functional does not only modify the evolution of shell
structure, but does also improve the description of nu-
clear masses around magic nuclei. Brown et al. [48] have
fitted a Skyrme interaction with added zero-range tensor
force with emphasis on the reproduction of single-particle
spectra. While the authors appreciate the qualitatively
correctly described evolution of relative level distances,
they point out that the combination of zero-range spin-
orbit and tensor forces does not and can not correctly
describe the ℓ-dependence of spin-orbit splittings. Colò
et al. [49], and Brink et al. [50] have added Skyrme’s
tensor force perturbatively to the existing standard pa-
rameterization SLy5 [51, 52], and to the SIII [5] one,
respectively. They have investigated some single-particle
energy differences: the 1h11/2 and 1g7/2 proton states in
tin isotopes as well as 1i13/2 and 1h9/2 neutron states in
N = 82 isotones and propose similar parameters as in
Ref. [48]. The effect of the tensor force on the centroid of
the GT giant resonance is also estimated by Colò et al.
using a sum-rule approach and found to be substantial.
Long et al. [53], demonstrate that the tensor force that
emerges naturally in relativistic Hartree-Fock also im-
proves the relative shifts of the proton 1g7/2 and 1h11/2
levels in tin isotopes.
The work on the tensor force published so far aims
at an optimal single parameterization, that establishes a
best fit to either the underlying bare tensor force [46, 48]
or empirical data [38, 47, 49]. The published results,
as well as our first exploratory studies, however, suggest
that adding a tensor force to the existing mean-field mod-
els gives only a local improvement of the relative change
of certain single-particle energies, but not necessarily a
global improvement of single-particle spectra or other ob-
servables. In the framework of the Skyrme interaction,
that we will employ throughout this work, there is also
the already mentioned ambiguity that the contribution
from the tensor force to spherical nuclei has the same
structure as a term from the central force. In view of
this situation, we will pursue a different strategy and in-
vestigate the effect of the tensor terms on a multitude
of observables in nuclei though a set of Skyrme interac-
tions with systematically varied coupling constants of the
tensor terms.
The present study was motivated by the finding that
the performance of the existing Skyrme-type effective in-
teractions for masses and spectroscopic properties is lim-
ited by systematic deficiencies of the single-particle spec-
tra [54, 55, 56, 57] that seem to be impossible to remove
within the standard Skyrme interaction. The details of
single-particle spectra were so far somewhat outside the
focus of self-consistent mean-field methods, on the one
hand as they do not correspond directly to empirical
single-particle energies (we will come back to that be-
low), and on the other hand because many of the ob-
servables that are usually calculated with self-consistent
mean-field methods are not very sensitive to the exact
placement of single-particle levels. By contrast, there
is an enormous body of work that examines the infi-
nite and semi-infinite nuclear matter properties of the
effective interactions that are the analog of liquid-drop
and droplet parameters in great detail. The reason is,
of course, that the global trends over the whole chart of
nuclei have to be understood before one can look into
details. The last few years have seen an increasing de-
mand on predictive power. Moreover, beyond-mean-field
approaches of the projected generator coordinate method
(GCM), or Bohr-Hamiltonian type, have become widely
used tools to analyze and predict spectroscopic properties
in medium and heavy nuclei, employing either Gogny or
Skyrme interactions. The underlying single-particle spec-
tra thus now deserve more attention, as many of the spec-
troscopic properties of interest turn out to be extremely
sensitive to even subtle details of the single-particle spec-
tra. As the tensor force is the most obvious missing piece
in all standard mean-field interactions, it is the natural
starting point for the systematic investigation of possi-
ble generalizations with the ultimate goal to improve the
predictive power of the interactions for spectroscopy.
In the present paper, we will outline the formalism of
a Skyrme interaction with added tensor force, describe
the fit of the parameterizations, analyze the role of the
tensor terms for single-particle spectra, masses and radii
of spherical even-even nuclei. A second paper [41] studies
the surface and deformation properties of these Skyrme
interactions for even-even nuclei, and future work will ex-
amine the stability of nuclear matter and the role of the
time-odd terms from the tensor force in odd and rotating
nuclei. Only deformed nuclei and, in particular, observ-
ables sensitive to the time-odd contributions, will pos-
sibly allow to distinguish clearly between the non-local
central and tensor parts of the Skyrme force.
II. THE SKYRME INTERACTION WITH
TENSOR TERMS
A. The energy density functional
The usual ansatz for the Skyrme effective interac-
tion [51, 52] leads to an energy density functional which
can be written as the sum of a kinetic term, the Skyrme
potential energy functional that models the effective
strong interaction in the particle-hole channel, a pairing
energy functional corresponding to a density-dependent
contact pairing interaction, the Coulomb energy func-
tional (calculated using the Slater approximation [58])
and correction terms to approximately remove the ex-
citation energy from spurious motion caused by broken
symmetries
E = Ekin + ESkyrme + Epairing + ECoulomb + Ecorr . (1)
B. The Skyrme energy density functional
Throughout this work, we will use an effective Skyrme
energy functional that corresponds to an antisymme-
trized density-dependent two-body vertex in the particle-
hole channel of the strong interaction, that can be decom-
posed into a central, spin-orbit and tensor contribution
vSkyrme = vc + vt + vLS . (2)
Other choices for the writing of the Skyrme energy func-
tional are possible and have been made in the literature,
which might affect the form of the effective interaction,
its interpretation and the results obtained from it. We
will come back to that in section IID below.
The Skyrme energy density functional is a functional
of local densities and currents
ESkyrme =
d3r HSkyrme(r) , (3)
which has many technical advantages compared to finite-
range forces such as the Gogny force. All exchange terms
have the same structure as the direct terms, which greatly
reduces the number of necessary integrations during a
calculation.
1. Local densities and currents
Throughout this paper we will assume that we have
pure proton and neutron states. The formal framework
of the general case including proton-neutron mixing is
discussed in Ref. [59]. Without making reference to any
single-particle basis, we start from the density matrices
of protons and neutrons in coordinate space [60]
ρq(rσ, r
′σ′) = 〈â
r′σ′q ârσq〉
ρq(r, r
′)δσσ′ +
sq(r, r
′) · 〈σ′|σ̂|σ〉
where
ρq(r, r
ρq(rσ, r
sq(r, r
ρq(rσ, r
′σ′) 〈σ′|σ̂|σ〉 . (5)
The Skyrme energy functional up to second order in
derivatives that we will introduce below can be expressed
in terms of seven local densities and currents [59] that are
defined as
ρq(r) = ρq(r, r
sq(r) = sq(r, r
τq(r) = ∇ ·∇
′ ρq(r, r
Tq,µ(r) = ∇ ·∇
′ sq,µ(r, r
jq(r) = −
(∇−∇′) ρq(r, r
Jq,µν(r) = −
(∇µ −∇
µ) sq,ν(r, r
Fq,µ(r) =
sq,ν(r, r
which are the density ρq(r), the kinetic density τq(r),
the current (vector) density jq(r), the spin (pseudovec-
tor) density sq(r), the spin kinetic (pseudovector) density
Tq(r), the spin-current (pseudotensor) density Jq,µν(r),
and the tensor-kinetic (pseudovector) density Fq(r).
ρq(r), τq(r) and Jq,µν(r) are time-even, while sq(r),
Tq(r), jq(r) and Fq(r) are time-odd. For a detailed dis-
cussion of their symmetries see Ref. [60]. There are other
local densities up to second order in derivatives that can
be constructed, but when constructing an energy func-
tional they either cannot be combined with others to
terms with proper symmetries or they lead to terms that
are not independent from the others [61].
The cartesian spin-current pseudotensor density Jµν
can be decomposed into pseudoscalar, (anti-symmetric)
vector and (symmetric) traceless pseudotensor parts, all
of which have well-defined transformation properties un-
der rotations
Jµν(r) =
δµν J
(0)(r)+ 1
ǫµνκ J
κ (r)+J
µν (r) , (7)
where δµν is the Kronecker symbol and ǫµνκ the Levi-
Civita tensor. The pseudoscalar, vector and pseudoten-
sor parts expressed in terms of the cartesian tensor are
given by
J (0)(r) =
Jµµ(r) , (8)
J (1)κ (r) =
µ,ν=x
ǫκµν Jµν(r) ,
J (2)µν (r) =
[Jµν(r) + Jνµ(r)] −
Jκκ(r) .
The vector spin current density J(1)(r) ≡ J(r) is often
called spin-orbit current, as it enters the spin-orbit energy
density. 2
For the formal discussion of the physical content of the
Skyrme energy functional it is of advantage to recouple
the proton and neutron densities to isoscalar and isovec-
tor densities, for example
ρ0(r) = ρn(r) + ρp(r) ,
ρ1(r) = ρn(r)− ρp(r) (9)
and similar for all others. As we assume pure proton
and neutron states, only the Tz = 0 component of the
isovector density is non-zero, which we exploit to drop
the index Tz from the isovector densities ρ1Tz (r) etc.
2. Skyrmes’s central force
We will use the standard density-dependent central
Skyrme force
vc(R, r) = t0 (1 + x0P̂σ) δ(r)
t3 (1 + x3P̂σ) ρ
α(R) δ(r)
t1 (1 + x1P̂σ)
′2 δ(r) + δ(r) k̂2
+ t2 (1 + x2P̂σ) k̂
′ · δ(r) k̂ (10)
where we use the shorthand notation
r = r1 − r2 ,
R = 1
(r1 + r2) , (11)
while k̂ is the usual operator for relative momenta
k̂ = − i
(∇1 −∇2) (12)
and k̂′ its complex conjugated acting on the left. Finally,
P̂σ is the spin exchange operator that controls the relative
strength of the S = 0 and S = 1 channels for a given term
in the two-body interaction
P̂σ =
(1 + σ̂1 · σ̂2) . (13)
As said above, we restrict ourselves to a parameterization
of the Skyrme energy functional as obtained from the
average value of an effective two-body vertex in the ref-
erence Slater determinant. We decompose the isoscalar
and isovector parts of the resulting energy density func-
tional Hc into a part H
c,even
t that is composed entirely of
time-even densities and currents, and a part H
c,odd
t that
contains terms which are bilinear in time-odd densities
2 Some authors call J(r) spin density, which is ambiguous and
confusing when discussing the complete energy density functional
including terms that contain the time-odd s(r).
and currents and vanishes in intrinsically time-reversal
invariant systems
Hc(r) =
t=0,1
c,even
t (r) +H
c,odd
t (r)
. (14)
Both H
c,even
t and H
c,odd
t are of course constructed such
that they are time-even; they are given by [59, 62]
c,even
t = A
t [ρ0] ρ
t ρt∆ρt +A
t ρtτt
µ,ν=x
Jt,µνJt,µν ,
c,odd
t = A
t [ρ0] s
+A∆st st ·∆st +A
t st ·Tt , (15)
where A
t [ρ0] and A
t [ρ0] are density dependent coupling
constants that depend on the total (isoscalar) density.
The detailed relations between the coupling constants of
the functional and the central Skyrme force are given
in appendix A. The notation reflects that two pairs of
terms in H
c,even
t and H
c,odd
t are connected by the require-
ment of local gauge invariance of the Skyrme energy func-
tional [63].
3. A zero-range spin-orbit force
The spin-orbit force used with most standard Skyrme
interactions
vLS(r) = iW0 (σ̂1 + σ̂2) · k̂
′ × δ(r) k̂ (16)
is a special case of the one proposed by Bell and
Skyrme [32, 33]. Again, the corresponding energy func-
tional [59, 62] can be separated into a time-even and a
time-odd term
HLS(r) =
t=0,1
LS,even
t (r) +H
LS,odd
t (r)
where
LS,even
t = A
t ρt∇ · Jt
LS,odd
t = A
t st · ∇ × jt (18)
which share the same coupling constant as again both
terms are linked by the local gauge invariance of the en-
ergy functional. The relation between the A∇·Jt and the
one coupling constant of the two-body spin-orbit force
W0 is given in appendix A.
4. Skyrme’s tensor force
By convention, the tensor operator in the tensor force
is constructed using the unit vectors in the direction of
the relative coordinate er = r/|r| and subtracting σ̂1 · σ̂2
Ŝ12 = 3(σ̂1 · er)(σ̂2 · er)− σ̂1 · σ̂2 , (19)
such that its mean value vanishes for a relative S state,
which decouples the central and tensor channels of the
interaction. The operator Ŝ12 commutes with the total
spin [Ŝ12, Ŝ
2] = 0, therefore it does not mix partial waves
with different spin, i.e. spin singlet and spin triplet states.
In particular, it does not act in spin singlet states at
all, as Ŝ12P̂S=0 = 0 (see section 13.6 of Ref. [21]). As
a consequence, there is no point in multiplying a tensor
force with an exchange operator (1+xtP̂σ) as done for the
central force, as this will only lead to an overall rescaling
of its strength.
The derivation of the general energy functional from a
zero-range two-body tensor force is discussed in detail in
Refs. [59, 64]. We repeat here the details relevant for our
discussion, starting from the two zero-range tensor forces
proposed by Skyrme [30, 31]
vt(r) = 1
3 (σ1 · k
′) (σ2 · k
′)− (σ1 · σ2)k
δ(r) + δ(r)
3 (σ1 · k) (σ2 · k)− (σ1 · σ2)k
3 (σ1 · k
′) δ(r) (σ2 · k)− (σ1 · σ2)k
′ · δ(r)k
where r, k̂ and k̂′ are defined as above, Eqs. (11) and (12).
The corresponding energy density functional can again be
decomposed in a time-even and a time-odd part
Ht(r) =
t=0,1
t,even
t (r) +H
t,odd
t (r)
with [59]
t,even
t = −B
µ,ν=x
Jt,µνJt,µν −
Jt,µµ
µ,ν=x
Jt,µνJt,νµ
t,odd
t = B
t st ·Tt +B
t st · Ft
+B∆st st ·∆st +B
t (∇ · st)
2 , (22)
where we already used the local gauge invariance of the
energy functional [59] for the expressions of the coupling
constants. The actual expressions for the coupling con-
stants expressed in terms of the two coupling constants
te and to of the tensor forces are given in appendix A.
The “even” term proportional to te in the two-body
tensor force (20) mixes relative S and D waves, while
the “odd” term proportional to to mixes relative P and
F waves. Thus, due to the fact that both act in spin-
triplet states only, antisymmetrization implies that the
former acts in isospin-singlet states (and hence con-
tributes to the neutron-proton interaction only) and the
latter in isospin-triplet states (contributing both to the
like-particle and neutron-proton interactions). The cen-
tral and spin-orbit interactions as we use them, however,
do not containD or F wave interactions. From this point
of view, one might suspect a mismatch when combining
the various interaction terms. From the point of view
of the energy functional (22), however, all contributions
from the zero-range tensor force are of the same second
order in derivatives as the contributions from the non-
local part of the central Skyrme force (15) and from the
spin-orbit force (18).
In the time-even part of the energy functional H
t,even
there appear three different combinations of the carte-
sian components of the spin current tensor. The term
proportional to BTt contains the symmetric combination
JµνJµν as it already appeared in the energy functional
from the central Skyrme interaction (15), while the term
proportional to BFt contains two different terms, namely
the antisymmetric combination JµνJνµ and the square of
the trace of Jνµ.
5. Combining central and tensor interactions
The Skyrme energy functional representing central,
tensor, and spin-orbit interactions is given by
ESkyrme = Ec + ELS + Et
t=0,1
t [ρ0] ρ
t + C
t [ρ0] s
t + C
t ρt∆ρt + C
t (∇ · st)
2 + C∆st st ·∆st + C
t (ρtτt − j
st ·Tt −
µ,ν=x
Jt,µνJt,µν
+ CFt
st · Ft −
Jt,µµ
µ,ν=x
Jt,µνJt,νµ
+C∇·Jt (ρt∇ · Jt + st · ∇ × jt)
. (23)
This functional contains all possible bilinear terms up to
second order in the derivatives that can be constructed
from local densities and that are invariant under spatial
and time inversion, rotations, and local gauge transfor-
mations [59].
Some of the coupling constants are completely defined
by the standard central Skyrme force, i.e. C
t = A
Cst = A
t , C
t = A
t , and C
t = A
t , two by the
spin-orbit force, C∇Jt = A
t , others by the tensor force,
CFt = B
t and C
t = B
t , while some are the sum of
coupling constants from both central and tensor forces,
CTt = A
t , and C
t = A
The three terms bilinear in Jµν can be recoupled into
terms bilinear in its pseudoscalar, vector, and pseudoten-
sor components J (0), J (1), and J (2), Eq. (8), which is
prefered by some authors [59]
µ,ν=x
Jt,µνJt,µν =
µ,ν=x
t,µνJ
t,µν (24)
Jt,µµ
µ,ν=x
Jt,µνJt,νµ
µ,ν=x
t,µνJ
t,µν . (25)
After combining (23) with the kinetic, Coulomb, pairing
and other contributions from (1), the mean-field equa-
tions are obtained by standard functional derivative tech-
niques from the total energy functional [29, 59].
The complete Skyrme energy functional (23) has quite
complicated a structure, and in the most general case
leads to seven distinct mean fields in the single-particle
Hamiltonian [59]. As already mentioned, we want to di-
vide the examination of those terms that contain two
derivatives and two Pauli matrices in the complete func-
tional, i.e. those terms from the central Skyrme force
that are often neglected and all the terms from the ten-
sor Skyrme force, into three distinct steps: First, in the
present paper, we enforce spherical symmetry which re-
moves all time-odd densities and all but one out of the
nine components of the spin current tensor Jµν as will
be outlined in the following section. A subsequent pa-
per [41] will discuss deformed even-even nuclei where the
complete spin current tensor Jµν is present, and future
work will address the time-odd part of the energy func-
tional (23).
C. The Skyrme energy functional in spherical
symmetry
For the rest of this paper, we will concentrate on spher-
ical nuclei, enforcing spherical symmetry of the N -body
wave functions. As a consequence, the canonical single-
particle wave functions Ψi [65] can be labeled by ji, ℓi
and mi. The index ni labels the different states with
same ji and ℓi. The functions Ψi separates into a radial
part ψ and an angular and spin part, represented by a
tensor spherical harmonic Ωjℓm
Ψnjℓm(r) =
ψnjℓ(r) Ωjℓm(θ, φ) . (26)
Spherical symmetry also enforces that all magnetic sub-
states of Ψnjℓm have the same occupation probability
v2njℓm ≡ v
njℓ for all −j ≤ m ≤ j. For a static spherical
state, all time-odd densities are zero sq(r) = Tq(r) =
jq(r) = Fq(r) = 0, as are the corresponding mean fields
in the single-particle Hamiltonian.
Enforcing spherical symmetry also greatly simplifies
the spin-current tensor, both the pseudoscalar and pseu-
dotensor parts of Jµν vanish. From the vector spin-orbit
current, only the radial component is non-zero, which is
given by [36]
Jq(r) =
n,j,ℓ
(2j + 1) v2njℓ
j(j + 1)− ℓ(ℓ+ 1)− 3
ψ2njℓ(r) (27)
so that there is only one out of the nine components of the
spin-current tensor density that contributes in spherical
nuclei. Unlike the total density ρ and the kinetic den-
sity τ , that are bulk properties of the nucleus and grow
with the size of the nucleus, the spin-orbit current is a
shell effect that shows strong fluctuations. Assume the
two shells with same n and ℓ which are split by the spin-
orbit interaction, one coupled with the spin to j = ℓ+ 1
the other to j = ℓ − 1
. It is easy to verify that their
contributions to Jq(r) are equal but of opposite signs
such that they cancel when (i) both shells are completely
filled and (ii) their radial wave functions are identical
ψn,ℓ+1/2,ℓ = ψn,ℓ−1/2,ℓ. Although the latter condition is
never exactly fulfilled, this demonstrates that the spin-
orbit current is not a bulk property, but a shell effect
that strongly fluctuates with N and Z. It nearly van-
ishes in so-called spin-saturated nuclei, where all spin-
orbit partners are either completely occupied or empty,
and it might be quite large when only the j = ℓ+1/2 level
out of one or even several pairs of spin-orbit partners is
filled.
Altogether, the Skyrme part of the energy density func-
tional in spherical nuclei is reduced to
HSkyrme =
t=0,1
t [ρ0] ρ
t + C
t ρt∆ρt + C
t ρtτt
CJt J
t + C
t ρt∇ · Jt
, (28)
where we have introduced an effective coupling constant
CJt of the J
t tensor terms at sphericity, such that the
corresponding contribution to the energy functional is
given by
t=0,1
CJt J
t=0,1
CTt +
t . (29)
The effective coupling constants can be separated back
into contributions from the non-local central and tensor
forces
CJt = A
t (30)
which are given by
AJ0 =
AJ1 =
BJ0 =
(te + 3to) =
(T + 3U)
BJ1 =
(to − te) =
(U − T ) , (31)
where we also give the expressions using the notation
T = 3te and U = 3to employed in [37, 49, 64].
For the following discussion it will be also illuminating
to recouple this expression to a representation that uses
proton and neutron densities, where we use the notation
introduced in Ref. [37]
Ht = 1
α (J2n + J
p) + β Jn · Jp , (32)
α = CJ0 + C
1 , β = C
0 − C
CJ0 =
(α+ β) , CJ1 =
(α− β) . (33)
The proton-neutron coupling constants α = αC+αT and
β = βC + βT can again be separated into contributions
from central and tensor forces
(t1 − t2)−
(t1x1 + t2x2) ,
βC = −
(t1x1 + t2x2) ,
(te + to) =
(T + U) . (34)
As could be expected, the isospin-singlet tensor force
contributes only to the proton-neutron term, while the
isospin-triplet tensor force contributes to both.
The spin-orbit potential of the neutrons is given by
Wn(r) =
δJn(r)
2∇ρn +∇ρp) + αJn + β Jp . (35)
The expression for the protons is obtained exchanging
the indices for protons and neutrons. In spherical sym-
metry, the tensor force gives a contribution to the spin-
orbit potential, but does not alter the structure of the
spin-orbit terms in the single-particle Hamiltonian as
such. This will be different in the case of deformed mean
fields [41, 59].
The dependence of the spin-orbit potential Wq(r) on
the spin-orbit current Jq(r) through the tensor terms is
the source of a potential instability. When the spin-orbit
splitting becomes larger than the splitting of the cen-
troids of single-particle states with different orbital angu-
lar momentum ℓ, the reordering of levels might increase
the number of spin-unsaturated levels, which increases
the spin-orbit current Jn and feeds back on the spin-orbit
potential by increasing it even further, which ultimately
leads to an unphysical shell structure. An example will
be given in appendix B.
D. A brief history of tensor terms in the central
Skyrme energy functional
For the interpretation of the parameterizations we will
describe below it is important to point out that within
our choice of the effective Skyrme interaction as an an-
tisymmetrized vertex the two coupling constants of the
contribution from the central force toHT , Eq. (29), either
represented through AJ0 , A
1 or through αC , βC , are not
independent from the coupling constants Aτ0 , A
1 , A
and A
1 , that appear in Eq. (28). Through the expres-
sions given in appendix A, all six of them are determined
by the four coupling constants t1, x1, t2, and x2 from the
central Skyrme force, Eq. (10). As a consequence, a ten-
sor force is absolutely necessary to decouple the values of
the CJt from those of the C
t and C
t , which determine
the isoscalar and isovector effective masses and give the
dominant contribution to the surface and surface asym-
metry coefficients, respectively.
This interpretation of the Skyrme interaction is, how-
ever, far from being common practice and a source of
confusion and potential inconsistencies in the literature.
Many authors have used parameterizations of the central
and spin-orbit Skyrme energy functional with coupling
constants that in one way or the other do not exactly
correspond to the functional obtained from Eqns. (10)
and (16), which, depending on the point of view, can be
seen as an approximation to or a generalization of the
original Skyrme interaction. As the most popular mod-
ification concerns the tensor terms, a few comments on
the subject are in order. Again, the practice goes back
to the seminal paper by Vautherin and Brink [36], who
state that “the contribution of this term to [the spin-
orbit potential] is quite small. Since it is difficult to in-
clude such a term in the case of deformed nuclei, it has
been neglected”. This choice was further motivated by
the interpretation of the effective Skyrme interaction as
a density-matrix expansion (DME) [25, 66, 67, 68]. All
early parameterizations as SI and SII [36], SIII-SVI [5],
SkM [69] and SkM∗ [70] followed this example and did
not contain the J2 terms. Beiner et al. [5] weakened
the case for J2 terms further by pointing out that they
might lead to unphysical single-particle spectra. During
the 1980s and later, however, it became more popular
to include them, for example in SkP [65], the parame-
terizations T1-T9 by Tondeur et al. [71], Eσ and Zσ by
Friedrich and Reinhard [72]. Some of the recent param-
eterizations come in pairs, where variants without and
with J2 terms are fitted within the same fit protocol, for
example (SLy4, SLy5) and (SLy6, SLy7) in Ref. [52], or
(SkO, SkO’) in Ref. [73].
Interestingly, all but one parameterization of the cen-
tral Skyrme interaction found in the literature set the
coupling constants of the J2 terms either to their Skyrme
force value (A1) or strictly to zero. The exception is
Ref. [38] by Tondeur, where an independent fit of the cou-
pling constants of the J2 terms was attempted, making
explicit reference to a DME interpretation of the energy
functional.
Setting the coupling constants of a term to zero when
one does not know how to adjust its parameters is of
course an acceptable practise when permitted by the cho-
sen framework. For Skyrme interactions fitted without
the J2 terms, the situation becomes confusing when one
looks at deformed nuclei and any situation that breaks
time-reversal invariance. First of all, Galilean invariance
of the energy functional dictates that the coupling con-
stant of the s · T terms is also set to zero, as already
indicated by the presentation of the energy functional
in Eq. (23). Second, using a DME interpretation of the
Skyrme energy functional in one place, but the interre-
lations from the two-body Skyrme force in all others is
not entirely satisfactory. Many authors who drop the J2
terms rarely show scruples to keep most of the time-odd
terms in the Skyrme energy functional (23) with coupling
constants Ast and A
t from (A1), although they are not
at all constrained in the common fit protocols employ-
ing properties of even-even nuclei and spin-saturated nu-
clear matter. For a list of exceptions see Sect. II.A.2.d of
Ref. [29]. An alternative is to set up a hierarchy of terms,
as it was attempted by Bonche, Flocard and Heenen in
their mean-field and beyond codes, which set A∆st = 0 in
addition to the coupling constant of the J2 terms, as all
three terms have in common that they couple two Pauli
matrices with two derivatives in different manners, see
the footnote on page 129 of [74].
There are also inconsistent applications of parameter-
izations without J2 − s · T terms to be found in the lit-
erature. For example, almost all applications of Skyrme
interactions to the Landau parameters gℓ and g
ℓ and the
properties of polarized nuclear matter, include the con-
tribution from the s · T terms, although it should be
dropped for parameterizations fitted without J2 terms.
Similarly, most RPA and QRPA codes include them for
simplicity, see the discussion in Refs. [75, 76, 77].
As it is relevant for the subject of the present paper,
we also mention another generalization of the Skyrme in-
teraction that invokes the interpretation of the Skyrme
energy functional in a DME framework. The spin-orbit
force (16) fixes the isospin mix of the corresponding
terms in the Skyrme energy functional (23) such that
A∇J0 = 3A
1 (A2). There are a few parameterizations
as MSkA [78], SkI3 and SkI4 [79], SkO and SkO’ [73] and
SLy10 [52] that liberate the isospin degree of freedom in
the spin-orbit functional. A DME interpretation of the
energy functional is mandatory for this generalization. It
is motivated by the better performance of standard rela-
tivistic mean-field models for the kink of the charge radii
in Pb isotopes. Note that the standard RMF models are
effective Hartree theories without exchange terms, and
that the standard Lagrangians have very limited isovec-
tor degrees of freedom [29], both of which supress a strong
isospin dependence of the spin-orbit interaction. It is in-
teresting to note that the existing fits of Skyrme energy
functionals with generalized spin-orbit interaction do not
improve spin-orbit splittings [14].
III. THE FITS
A. General remarks
In order to study the effect of the J2 terms, we have
built a set of 36 effective interactions that systematically
cover the region of coupling constants CJ0 and C
1 that
give a reasonable description of finite nuclei in connec-
tion with the standard central and spin-orbit Skyrme
forces. At variance with the perturbative approach used
in Refs. [37, 49], each of these parameterizations has been
fitted separately, following a procedure nearly identical to
that used for the construction of the SLy parameteriza-
tions [51, 52], so that we can keep the connection between
the new fits with parameterizations that have been ap-
plied to a large variety of observables and phenomena.
The Saclay-Lyon fit protocol focuses on the simultaneous
reproduction of nuclear bulk properties such as binding
energies and radii of finite nuclei and the empirical char-
acteristics of infinite nuclear matter (i.e. symmetric and
pure neutron matter). The latter establishes an impor-
tant, though highly idealized, limiting case as it permits
to confront the energy functional with calculations from
first principles using the bare nucleon-nucleon force [80].
The region of effective coupling constants (CJ0 , C
1 ) of
the J2 terms acting in spherical nuclei, as defined in
Eq. (28), that we will explore, is shown in Fig. 1. The
parameterizations are labeled TIJ , where indices I and
J refer to the proton-neutron (β) and like-particle (α)
coupling constants in Eq. (32) such that
α = 60 (J − 2) MeV fm5,
β = 60 (I − 2) MeV fm5. (36)
The corresponding values of CJt can be obtained through
Eq. (33) or from Fig. 1. On the one hand, we cover
the positions of the most popular existing Skyrme in-
teractions that take the J2 terms from the central force
into account, which are SLy5 [52], SkP [65], Zσ [72],
T6 [71], SkO’ [73] and BSk9 [81]. On the other hand,
among recent parameterizations including a tensor term,
i.e. Skxta [48], Skxtb [48, 82] as well as those published
by Colò et al. [49] and Brink and Stancu [50], most fall
in a region of negative CJ1 and vanishing C
0 , that is to
the lower left of Fig. 1. Parameterizations of this region,
which also includes a part of the triangle advocated in
the perturbative study of Stancu et al. [37], gave unsat-
isfactory results for many observables. Moreover, when
attempting to fit parameterizations with large negative
coupling constants, we sometimes obtained unrealistic
single-particle spectra or even ran into the instabilities al-
ready mentioned and outlined in appendix B. Parameter-
izations further to the lower and upper right also have un-
realistic deformations properties. The contribution from
the J2 terms vanishes for T22, which will serve as the
reference point. For the parameterizations T2J , only the
proton-proton and neutron-neutron terms in Ht are non-
zero (β = 0), while for the parameterizations TI2, only
the proton-neutron term in Ht contributes (α = 0). Note
that the earlier parameterizations T6 and Zσ have a pure
like-particle J2 terms as a consequence of the constraint
x1 = x2 = 0 employed for both (and most other early
parameterizations of Skyrme’s interaction).
B. The fit protocol and procedure
The list of observables used to construct the cost
function χ2 minimized during the fit (see Eq. (4.1) in
Ref. [51]) reads as follows: binding energies and charge
radii of 40Ca, 48Ca, 56Ni, 90Zr, 132Sn and 208Pb; the bind-
ing energy of 100Sn; the spin-orbit splitting of the neutron
3p state in 208Pb; the empirical energy per particle and
density at the saturation point of symmetric nuclear mat-
ter; and finally, the equation of state of neutron matter
as predicted by Wiringa et al. [16].
Furthermore, some properties of infinite nuclear mat-
ter are constrained through analytic relations between
-60 -30  0  30  60  90  120  150  180  210  240  270
CJ0 [MeV fm
BSk9T6
Skxta
Skxtb
Brink
FIG. 1: Values of CJ0 and C
1 for our set of parameteriza-
tions (circles). Diagonal lines indicate α = CJ0 +C
1 = 0 (pure
neutron-proton coupling) and β = CJ0 − C
1 = 0 (pure like-
particle coupling). Values for classical parameter sets are also
indicated (dots), with SLy4 representing all parameterizations
for which J2 terms have been omitted in the fit. Recent pa-
rameterizations with tensor terms are indicated by squares.
coupling constants in the same manner as they were in
Refs. [51, 52]: the incompressibility modulus K∞ is kept
at 230 MeV, while the volume symmetry energy coeffi-
cient aτ is set to 32 MeV. The isovector effective mass, ex-
pressed through the Thomas-Reiche-Kuhn sum rule en-
hancement factor κv, is taken such that κv = 0.25.
When using a single density-dependent term in the
central Skyrme force (10), the isoscalar effective mass
m∗0 cannot be chosen independently from the incompress-
ibility modulus for a given exponent α of ρ0. We fol-
low here the prescription used for the SLy parameteriza-
tions [51, 52] and use α = 1/6, which leads to an isoscalar
effective mass close to 0.7 in units of the bare nucleon
mass for all TIJ parameterizations. This value allows for
a correct description of dynamical properties, as for ex-
ample the energy of the giant quadrupole resonance [83].
Using such a protocol we cannot reproduce the isovec-
tor effective mass consistent with recent ab-initio predic-
tions [84]. Regarding the present exploratory study of the
tensor terms this is not a critical limitation, in particular
as the influence of this quantity on static properties of
finite nuclei turns out to be small.
There are three modifications of the fit protocol com-
pared to [51, 52]. The obvious one is that the values
for CJ0 and C
1 are fixed beforehand as the parameters
that will later on label and classify the fits. The second
is that we have added the binding energies of 90Zr and
100Sn to the set of data. Indeed, we observed that the
latter nucleus is usually significantly overbound when not
   -60   0   60   120   180   240
(T11)
β [MeV fm5] α [MeV fm5]
FIG. 2: Values of the cost function χ2 as defined in the fit
procedure, for the set of parameterizations TIJ . The label
“T11” indicates the position of this parameterization in the
(α,β)-plane as obtained from Eqs. (36). Contour lines are
drawn at χ2 = 11, 12, 15, 20, 25, and 30. The minimum
value is found for T21 (χ2 = 10.05), the maximum for T61
(χ2 = 37.11).
included in the fit. The third is that we have dropped
the constraint x2 = −1 that was imposed on the SLy pa-
rameterizations [51, 52] to ensure the stability of infinite
homogeneous neutron matter against a transition into
a ferromagnetic state. On the one hand, this stability
criterion is completely determined by the coupling con-
stants of the time-odd terms in the energy functional [76],
that we do not want to constrain here, accepting that
the parameterizations might be of limited use beyond
the present study. On the other hand, the tensor force
brings many new contributions to the energy per parti-
cle of polarized nuclear matter that lead to a much more
complex stability criterion. We postpone the entire dis-
cussion concerning the stability in polarized systems in
the presence of a tensor force to future work that will
also address finite-size instabilities [84]. It also has to be
stressed that the actual stability criterion, as all proper-
ties of the time-odd part of the Skyrme energy functional,
depends on the choices made for the interpretation of its
coupling constants, i.e. antisymmetrized vertex or den-
sity functional [76].
The properties of the finite nuclei entering the fit are
computed using a Slater determinant without taking
pairing into account. The cost function χ2 was mini-
mized using a simulated annealing algorithm. The an-
nealing schedule was an exponential one, with a charac-
teristic time of 200 iterations (also referred to as “simu-
lated quenching”) Thus, assuming a reasonably smooth
cost function, we strive to obtain satisfactory convergence
to its absolute minimum in a single run, allowing a sys-
tematic and straightforward production of a large series
of forces. The coupling constants for all 36 parameteri-
zations can be found in the Physical Review archive [85].
Figure 2 displays the value of χ2 after minimization as
-150 -120 -90 -60 -30  0  30  60  90  120  150  180
BJ0 [MeV fm
FIG. 3: The contributions from the tensor force BJ0 and B
to the effective coupling constants of the J2 term at sphericity.
Diagonal lines as in Fig. 1. The diagonal where BJ0 + B
αT = 0 (pure proton-neutron contribution) additionally cor-
responds to an isospin-singlet force with to ≡ U = 0.
a function of the recoupled coupling constants α and β.
The first striking feature is the existence of a “valley” at
β = 0, i.e. a pure like-particle tensor term ∼ (J2n + J
The abrupt rise of χ2 around this value can be attributed
to the term depending on nuclear binding energies, as
sharp variations of energy residuals can be seen between
neighboring magic nuclei with functionals of the T6J se-
ries (β = 240). For example, 48Ca and 90Zr tend to be
significantly overbound in this case. We will come back
later to discussing the implications for the quality of the
functionals.
C. General properties of the fits
The coupling constants of the energy functional for
spherical nuclei (28) obtained for T22 are very similar to
those of SLy4, except for a slight readjustment coming
from the inclusion of the binding energies of 90Zr and
100Sn in the fit as well as the abandoned constraint on
x2. With its value of −0.945, the x2 obtained for T22
still stays close to the value −1 enforced for SLy4, which
confirms that this is not too severe a constraint for pa-
rameterizations without effective J2 terms at sphericity.
Increasing the effective tensor term coupling constants
CJt , however, the values for x2 start to deviate strongly
from the region around −1, which is to a large extent due
to the feedback from the contribution of the J2 terms to
the surface and surface symmetry energy coefficients in
the presence of constraints on isoscalar and isovector ef-
    -60    0
    60    120
    180    240
-60  
120  
180  
240  
W0 [MeV fm
(T11)
β [MeV fm5]α [MeV fm5]
W0 [MeV fm
FIG. 4: Value of spin-orbit coupling constant W0 for each
of the parameterizations TIJ , vs. indices I and J (The
“(T11)” label indicates the position of this parameterization
in the (α, β)-plane). The contour lines differ by 20 MeV fm5.
The values plotted here range from 103.7 MeV fm5 (T11) to
195.3 MeV fm5 (T66).
fective masses, all of which also depend on x2. A more de-
tailed discussion of the contribution of the J2 terms to the
surface energy coefficients will be given elsewhere [41].
From the constrained coupling constants CJ0 and C
the respective contributions BJ0 and B
1 from the tensor
force can be deduced afterwards using the expressions
given in Sect. II C. Their values, shown in Fig. 3, are
less regularly distributed, which is a consequence of the
the non-linear interdependence of all coupling constants.
Still, a general trend can be observed, such that all
parameterizations are shifted towards the “south-west”
compared to Fig. 1. In turn, this indicates that the con-
tribution from the central Skyrme force always stays in
the small region outlined by SkP, SLy5, Zσ, etc in Fig. 1,
with values that range between 28 and 104 MeV fm5 for
AJ0 and between 38 to 62 MeV fm
5 for AJ1 , respectively.
This justifies a posteriori to use the tensor force as a
motivation to decouple the J2t terms from the central
part of the effective Skyrme vertex. We note in passing
that all our parameterizations TI4 correspond to an al-
most pure proton-neutron or isospin-singlet tensor force,
i.e. the term ∝ te in Eq. (20), as they are all located close
to the αT = 0 line.
We also find a particularly strong and systematic vari-
ation of the coupling constant W0 of the spin-orbit force,
which varies from W0 = 103.7 MeV fm
5 for T11 to
W0 = 195.3 MeV fm
5 for T66, see Fig. 4. This variation
is of course correlated to the strength of the tensor force.
As already shown, the tensor force has the tendency to
reduce the spin-orbit splittings in spin-unsaturated nu-
clei. To maintain a given spin-orbit splitting in such a
nucleus, the spin-orbit coupling constant W0 has to be
increased.
Ni, T440.015
0.010
0.005
0.000
ρsat/2
0 1 2 3 4 5 6 7
r [fm]
-0.010
-0.005
0.000
0.005
0.010
0.015
0.020
Jn [fm
FIG. 5: (Color online) Radial component of the neutron
spin-orbit current for the chain of Ni isotopes, plotted against
radius and neutron number N . The solid line on the base
plot indicates the radius where the total density has half its
saturation value.
IV. RESULTS AND DISCUSSION
The calculations presented below include open-shell
nuclei treated in the Hartree-Fock-Bogoliubov (HFB)
framework. In the particle-particle channel, we use a
zero-range interaction with a mixed surface/volume form
factor (called DFTM pairing in Ref. [86]). The HFB
equations were regularized with a cutoff at 60 MeV in
the quasiparticle equivalent spectrum [87]. The pair-
ing strength was adjusted in 120Sn with the particle-hole
mean field calculated using the parameter set T33. The
resulting strength was kept at the same value for all pa-
rameterizations, which is justified by the fact that the
effective mass parameters are the same. Moreover, we
thus avoid including, in the adjustment of the pairing
strength, local effects linked with changes in details of
the single-particle spectrum.
A. Spin-orbit currents and potentials
As a first step in the analysis of the role of the tensor
terms and their interplay with the spin-orbit interaction
in spherical nuclei, we analyze the spin-orbit current den-
sity and its relative contribution to the spin-orbit poten-
tial. We choose the chain of nickel isotopes, Z = 28, as it
covers the largest number of spherical neutron shells and
subshells (N = 20, 28, 40 and 50) of any isotopic chain,
two of which are spin-saturated (N = 20 and 40), while
the other two are not. Figure 5 displays the radial com-
ponent of the neutron spin-orbit current Jn for isotopes
from the proton to the neutron drip-lines. The calcula-
tions are performed with T44, but the spin-orbit current
is fairly independent from the parameterization. Starting
from N = 20, which corresponds to a completely filled
and spin-saturated sd-shell, the next magic number at
N = 28 is reached by filling the 1f7/2 shell, which leads
to the steeply rising bump in the plot of Jn in the fore-
ground, peaked around r ≃ 3.5 fm. Then, from N = 28
to N = 40 the rest of the fp shell is filled, which first
produces the small bump at small radii that corresponds
to the filling of the 2p3/2 shell, but ultimately leads to
a vanishing spin-orbit current when the 1f and 2p lev-
els are completely filled for the N = 40 isostope, visible
as the deep valley in Fig. 5. Adding more neutrons, the
filling of the 1g9/2 shell leads again to a strong neutron
spin-orbit current at N = 50. For the remaining isotopes
up to the neutron drip line, the evolution of Jn is slower
with the filling of the 2d and 3s orbitals.
A few further comments are in order. First, the spin-
orbit current clearly reflects the spatial probability dis-
tribution of the single-particle wave function in pairs of
unsaturated spin-orbit partners. Within a given shell,
the high-ℓ states contribute at the surface, represented
by the solid line on the base of Fig. 5, while low-ℓ states
contribute at the interior. The peak from the high-ℓ or-
bitals, however, is always located on the inside of the nu-
clear surface, as defined by the radius of half saturation
density. Second, within a given shell, the largest contri-
butions to the spin-orbit current density obviously come
from the levels with largest ℓ, as they have the largest
degeneracy factors in (27), and because they do not have
nodes, which leads to a single, sharply peaked contribu-
tion. Third, the spin-orbit current is not exactly zero
for nominally “spin-saturated” nuclei, exemplified by the
N = 20 and N = 40 isotopes in Fig. 5, as the radial
single-particle wave functions are not exactly identical
for all pairs of spin-orbit partners, which is a necessary
requirement to obtain Jn = 0 at all radii (Cf. the example
of the ν 2d states in 132Sn in Fig. 16 below). Fourth, pair-
ing and other correlations will always smooth the fluctu-
ations of the spin-orbit current with nucleon numbers, as
levels in the vicinity of the Fermi energy will never be
completely filled or empty.
Next, we compare the contributions from the tensor
terms and from the spin-orbit force to the spin-orbit po-
tentials of protons and neutrons, Eq. (35). The contri-
butions from the tensor force to the spin-orbit poten-
tial are proportional to the spin-orbit currents of pro-
tons and neutrons. For the Ni isotopes, the proton spin-
orbit current is very similar to that of the neutrons at
N = 28 displayed in Fig. 5. For the parameterization
T44 we use here as an example, we have contributions
from both proton and neutron spin-orbit currents, which
come with equal weights. Their combined contribution
to the spin-orbit potential of the neutronWn might be as
large as 4 MeV, see Fig. 6. This is more than a third of
the maximum contribution from the spin-orbit force to
Wn, see Fig. 7. The latter is proportional to a combina-
tion of the gradients of the proton and neutron densities,
2∇ρn(r) + ∇ρp(r), see Eq. (35). As a consequence, it
has a smooth behavior as a function of particle number,
Ni, T44
ρsat/2
0 1 2 3 4 5 6 7
r [fm]
  20 N
Wn,t [MeV fm]
FIG. 6: (Color online) Contribution from the tensor terms to
the neutron spin-orbit potential for the chain of Ni isotopes
as obtained with the parameterization T44. The solid line on
the base plot indicates the radius where the isoscalar density
ρ0 crosses half its saturation value.
Ni, T44
ρsat/2
0 1 2 3 4 5 6 7
r [fm]
  20 N
Wn,so [MeV fm]
FIG. 7: (Color online) Contribution from the spin-orbit force
to the neutron spin-orbit potential for the chain of Ni isotopes
as obtained with the parameterization T44. The solid line on
the base plot indicates the radius where the isoscalar density
ρ0 crosses half its saturation value.
with slowly and monotonically varying width, depth and
position. Only limited local variations can be seen on
the interior due to small variations of the density profile
originating from the successive filling of different orbits.
Furthermore, one can easily verify that the contribution
from the spin-orbit force is peaked at the surface of the
nucleus (the solid line on the base plot). The strongest
variation of the depth of this potential occurs just be-
fore the neutron drip line at N = 62, where is becomes
wider and shallower due to the development of a diffuse
Ni, T44
ρsat/2
0 1 2 3 4 5 6 7
r [fm]
Wn [MeV fm]
FIG. 8: (Color online) Total neutron spin-orbit potential for
the chain of Ni isotopes as obtained with the parameterization
T44. The solid line on the base plot indicates the radius where
the isoscalar density ρ0 crosses half its saturation value.
Ni, T44
ρsat/2
0 1 2 3 4 5 6 7
r [fm]
Wp [MeV fm]
FIG. 9: (Color online) Total proton spin-orbit potential for
the chain of Ni isotopes as obtained with the parameterization
T44. The solid line on the base plot indicates the radius where
the isoscalar density ρ0 crosses half its saturation value.
neutron skin, which reduces the gradient of the neutron
density [6, 7, 8].
Adding the contributions from the proton and neutron
tensor terms to that from the spin-orbit force, the total
neutron spin-orbit potential for neutrons in Ni isotopes
is shown in Fig. 8. For the parameterization T44 used
here (and most others in the sample of parameterizations
used in this study) the dominating contributions from
the spin-orbit and tensor forces to the spin-orbit poten-
tial are of opposite sign. For Ni isotopes, Jp is always
quite large, while Jn varies as shown in Fig. 5. Notably,
both are peaked inside of the surface. When examining
 20  30  40  50  60
1g9/2
2p1/2
1f5/2
2p3/2
1f7/2
1d3/2
2s1/2
1d5/2
2d3/2
3s1/2
2d5/2
1g9/2
2p1/2
1f5/2
2p3/2
1f7/2
2s1/2
1d3/2
FIG. 10: (Color online) Single-particle spectra of neutrons
(upper panel) and protons (lower panel) for the chain of Ni
isotopes, as obtained with the parameterization T22 with van-
ishing combined J2 terms. The thick solid line in the upper
panel denotes the Fermi energy for neutrons.
the combined contribution from the spin-orbit and tensor
forces to the spin-orbit potential (35), one must keep in
mind that they are peaked at different radii. Moreover,
the variation of tensor-term coupling constants among a
set of parameterizations implies a rearrangement of the
spin-orbit term strength, as will be discussed later. As a
consequence, taking into account the tensor force modi-
fies the width and localization of the spin-orbit potential
Wq(r) much more than it modifies its depth through the
variation of the spin-orbit currents.
Our observations also confirm the finding of Otsuka
et al. [46] that the spin-orbit splittings might be more
strongly modified by the tensor force than they are by
neutron skins in neutron-rich nuclei through the reduc-
tion of the gradient of the density.
Figure 9 shows the spin-orbit potential of the protons
for the chain of Ni isotopes. Here, the contribution from
the spin-orbit force has a larger contribution coming from
the gradient of the proton density that just grows with
the mass number, without being subject to varying shell
fluctuations. The same holds for the proton contribution
from the tensor terms. Only the neutron contribution
from the tensor terms varies rapidly, proportional to Jn
displayed in Fig. 5, which has a very limited effect on the
total spin-orbit potential, though.
With that, we can examine how the tensor terms af-
fect the evolution of single-particle spectra. To that end,
Fig. 10 shows the single-particle energies of protons and
neutrons along the chain of Ni isotopes for the parameter-
ization T22 with vanishing combined tensor terms, which
 20  30  40  50  60
1g9/2
2p1/2
1f5/2
2p3/2
1f7/2
1d3/2
2s1/2
1d5/2
2d3/2
3s1/2
2d5/2
1g9/2
2p1/2
1f5/2
2p3/2
1f7/2
2s1/2
1d3/2
FIG. 11: (Color online) The same as Fig. 10, obtained with
T44 with proton-neutron and like-particle tensor terms of
equal strength.
will serve as a reference, while Fig. 11 shows the same for
the parameterization T44 with proton-neutron and like-
particle tensor terms of equal strength. For the latter,
the variation of the neutron spin-orbit current with N in-
fluences both neutron and proton single-particle spectra.
The effect of the tensor terms is subtle, but clearly visi-
ble: for T22, the major change of the single-particle en-
ergies is their compression with increasing mass number,
while for T44 the level distances oscillate on top of this
background correlated to the neutron shell and sub-shell
closures at N = 20, 28, 40 and 50. As shown above, the
neutron spin-orbit current vanishes for N = 20, where
it consequently has no effect on the spin-orbit potentials
and splittings. By contrast, the neutron spin-orbit cur-
rent is large for N = 28 and 50, where its contribution
to the spin-orbit potential reduces the splittings from the
spin-orbit force.
The strong variation of the spin-orbit current with
nucleon numbers is typical for light nuclei up to about
mass 100. For heavier nuclei, its variation becomes much
smaller. This is exemplified in Fig. 12 for the neutron
spin-orbit current in the chain of Pb isotopes. There
remain the fast fluctuations at small radii which we al-
ready saw for the Ni isotopes and that reflect the subse-
quent filling of low-ℓ levels with many nodes, but which
have a very limited impact on the spin-orbit splittings
when fed into the spin-orbit potential. The dominating
peak of the spin-orbit current, just beneath the surface
shows only small fluctuations, as the overlapping spin-
orbit splittings of levels with different ℓ never give rise to
a spin-saturated configuration in heavy nuclei.
Pb, T44
0.010
0.005
0.000
ρsat/2
0 2 4 6 8 10r [fm]
-0.005
0.000
0.005
0.010
0.015
Jn [fm
FIG. 12: (Color online) Radial component of the Neutron
spin-orbit current for the chain of Pb isotopes plotted in the
same manner as in Fig. 5.
Note that both the spin-orbit current J and the spin-
orbit potential are exactly zero at r = 0 as they are
vectors with negative parity.
B. Single-particle energies
As a next step, we analyze the modifications that the
presence of J2 terms brings to single-particle energies in
detail. Before we do so, a few general comments on the
definition and interpretation of single-particle energies
are in order. From an experimental point of view, empir-
ical single-particle energies in a doubly-magic nucleus are
determined as the separation energies between the even-
even doubly magic nucleus and low-lying states in the
adjacent odd-A nuclei, i.e. they are differences of bind-
ing energies. In nuclear models, however, it is customary
to discuss shell structure and single-particle energies in
terms of the spectrum of eigenvalues ǫi of the Hartree-
Fock mean-field Hamiltonian (in even-even nuclei), as we
have done already in Figs. 10 and 11:
ĥΦi = ǫiΦi . (37)
In the nuclear EDF approach without pairing, the ref-
erence state is directly constructed as a Slater determi-
nant of eigenstates of ĥ; hence, the corresponding eigen-
values are directly connected to the fundamental build-
ing blocks of the theory and reflect the mean field in
the nucleus. The density of single-particle levels around
the Fermi surface drives the magnitude of pairing cor-
relations, the relative distance of single-particle levels at
sphericity and their quantum numbers determine to a
large extent the detailed structure of the deformation en-
ergy landscape which in turn, determines the collective
spectroscopy. The spectroscopic properties of even-even
nuclei, in particular when they exhibit shape coexistence,
provide valuable benchmarks for the underlying single-
particle spectrum [56]. The link between the spectrum
of single-particle energies on the one hand and the col-
lective excitation spectrum on the other hand, however,
always remains indirect.
On the other hand, “single-particle” states near the
Fermi level of a magic nucleus can be observed by adding
or removing a particle in one of these states, and thus cor-
respond to the ground and excited states of the neighbor-
ing odd-mass nuclei. Assuming an infinitely stiff magic
core, which is neither subject to any rearrangement or po-
larization, nor to any collective excitations following the
addition (or removal) of a nucleon, the separation ener-
gies with the states in the odd-mass neighbors are equal
to the single-particle energies as defined through (37).
This highly idealized situation is modified by static [88]
and dynamic [89, 90] correlations, often called “core po-
larization” (see chapter 7 of Ref. [91]) and “particle-
vibration coupling” (see section 9.3.3 of Ref. [92]) in the
literature, that alter the separation energies. The main
effect of the correlations is that they compress the spec-
trum, pulling down the levels from above the Fermi en-
ergy and pushing up those from below. The gross fea-
tures, i.e. the ordering and relative placement of single-
particle states, however, are more weakly affected by
correlations. The particle-vibration coupling, however,
is also responsible for the fractionization of the single-
particle strength. When the latter is too large, the naive
comparison between the calculated ǫi given by Eq. (37)
and the energy of the lowest experimental state with the
same quantum numbers is not even qualitatively mean-
ingful anymore [48].
We mention that a part of the static correlations orig-
inate from the non-vanishing time-odd densities in the
mean-field ground-state of an odd-A nucleus, that also
cannot be truly spherical, so that the complete energy
functional from Eq. (23) should be considered in a fully
self-consistent calculation of the separation energies.
The effective single-particle energies that are used to
characterize the underlying shell structure in the inter-
acting shell model [93] have a slightly different mean-
ing. Their definition usually renormalizes polarization
and particle-vibration coupling effects around a doubly-
magic nucleus whereas their evolution is discussed in
terms of monopole shifts [94]. A collection of effective
single-particle energies and their evolution was collected
by Grawe [95, 96]. Note that the SkX parameterization
of the Skyrme energy functional by Brown and its vari-
ants [48, 97] were constructed aiming at a description of
effective single-particle energies along these lines.
It should be kept in mind that the obvious, coarse dis-
crepancies between the calculated spectra of ǫµ and the
empirical single-particle energies are often larger than the
uncertainties coming from the missing correlations, as
long as one observes some elementary precautions. We
took care to ensure that the states used in the analy-
sis below were one-quasiparticle states weakly coupled
to core phonons. First, we checked that the even-even
nucleus of interest could be described as spherical, indi-
cated by a sufficiently high-lying 2+ state. Second, we
avoided all levels which were obviously correlated with
the energies of 2+ states in the adjacent semi-magic se-
ries, as this indicates strong coupling with core excita-
tions. Finally, we carefully examined states, lying above
the 2+ energy and/or twice the pairing gap of adjacent
semi-magic nuclei, in order to eliminate those more accu-
rately described as an elementary core excitation coupled
to one or more quasiparticles, which generally appear as
a multiplet of states. We did not attempt to use energy
centroids calculated with use of spectroscopic factors, as
these are not systematically available. Indeed, our re-
quirement is that if some collectivity is present, it should
be similar among all nuclei considered, in order to be eas-
ily subtracted out. Empirical single-particle levels shown
below are determined from the lowest states having given
quantum numbers in an odd-mass nucleus.
1. Spin-orbit splittings
The primary effect one expects from a tensor term
is that it affects spin-orbit splittings by altering the
strength of the spin-orbit field in spin-unsaturated nuclei,
according to Eq. (35). One should remember, though,
that the spin-orbit coupling itself is readjusted for each
pair of coupling constants CJ0 , and C
1 . The effect of this
readjustment is generally opposite to that of the variation
of the isoscalar tensor term coupling constant. It should
thus be stressed that the effects described result from the
balance between the variation of tensor and spin-orbit
terms, which for most of our parameterizations pull into
opposite directions.
Common wisdom states that the energy spacing be-
tween levels that are both above or both below the magic
gap are not much affected by correlations, even when
their absolute energy changes; hence it is common prac-
tice to confront only the spin-orbit splittings between
pairs of particle or hole states with calculated single-
particle energies from the spherical mean field. Figure 13
shows the relative error of single-particle splitting of such
levels for doubly-magic nuclei throughout the chart of nu-
clei. The calculated values are typically 20 to 60% larger
than the experimental ones, with the exception of 16O,
where the splittings of the neutron and proton 1p states
are acceptably reproduced at least for the parameteri-
zations T22, T24 and T42, i.e. those with the weakest
tensor terms in the sample.
It is noteworthy that the calculated splittings depend
much more sensitively on the tensor terms for light nuclei
with spin-saturated shells (protons and neutrons in 16O,
protons in 90Zr) than for the heavy doubly-magic 132Sn
and 208Pb, which are quite robust against a variation of
the tensor terms. The reason will become clear below.
ν1p π1p
132Sn
ν2d π2d
208Pb
ν3p π2d
FIG. 13: (Color online) Relative error of the spin-orbit split-
tings in doubly-magic nuclei for ℓ ≤ 2 levels.
2. Connection between tensor and spin-orbit terms
The finding that our parameterizations systematically
overestimate the spin-orbit splittings deserves an expla-
nation. It was earlier already noted that all standard
Skyrme interactions, including the SLy parameteriza-
tions that share our fit protocol, have an unresolved trend
that overestimates the spin-orbit splittings in heavy nu-
clei [14, 29, 98]. Adding the tensor terms, however,
further deteriorates the overall description of spin-orbit
splittings, instead of improving it. It is particularly dis-
turbing that the spin-orbit splitting of the 3p level in
208Pb that was used to constrain W0 in the fit is overes-
timated by 30 to 40%, which is larger than the relative
tolerance of 20% included in the fit protocol. In fact,
it turns out that the coupling constant W0 of the spin-
orbit force is more tightly constrained by the binding
energies of light nuclei than by this or any other spin-
orbit splitting. In the HF approach used during the fit,
the structure of 40Ca, 48Ca, and 56Ni differs by the occu-
pation of the neutron and proton 1f7/2 levels. First, we
have to note that the terms in the energy functional that
contain the spin-orbit current play an important role for
the energy difference between 40Ca and 56Ni. The com-
bined contribution from the tensor and spin-orbit terms
varies from a near-zero value in the spin-saturated 40Ca
to about −60 MeV in 56Ni for all our parameterizations,
which is a large fraction of the −142 MeV difference in
total binding energy between both nuclei. The Z = 40
subshell and Z = 50 shell are another example of abrupt
variation of the spin-orbit current with the filling of the
1g9/2 level, which strongly affects the relative binding
energy of N = 50 isotones 90Zr and 100Sn. Second, the
fit to phenomenological data can take advantage of the
large relative variation of these terms to mock up missing
physics in the energy functional that should contribute
to the energy difference, but that is absent in it. The
consequence will be a spurious increase of the spin-orbit
and tensor term coupling constants. The resulting energy
functional will correctly describe the mass difference, but
not the physics of the spin-orbit and tensor terms.
In order to test the above interpretation, we performed
a refit of selected TIJ parameterizations without taking
into account the masses of 40Ca, 48Ca, 56Ni and 90Zr in
the fit procedure. In the resulting parameterizations, the
spin-orbit coefficient W0 is typically 20% lower than in
the original ones. As a consequence, the empirical value
for the spin-orbit splitting of the neutron 3p level in 208Pb
is met well within tolerance, at the price of binding en-
ergy residuals in light nuclei being unacceptably large,
i.e. 56Ni being underbound by 5 MeV while 40Ca and
90Zr are overbound by up to 10 MeV. While the global
trend of the spin-orbit splittings shown in Fig. 13 is enor-
mously improved with these fits, in particular for heavy
nuclei, the overall agreement of the single-particle spectra
with experiment is not, so that we had to discard these
parameterizations. This finding hints at a deeply rooted
deficiency of the Skyrme energy functional. The spin-
orbit and, when present, tensor terms indeed do simu-
late missing physics of the energy functional at the price
of unrealistic spin-orbit splittings. This also hints why
perturbative studies, as those performed in [37, 49] give
much more promising results than what we will find be-
low with our complete refits. We will discuss mass resid-
uals in more detail in Sect. IVC1 below.
During the fit, the masses of light nuclei do not only
compromise the spin-orbit splittings, they also establish
a correlation betweenW0 and C
0 in all our parameteriza-
tions. The combined spin-orbit and spin-current energy
of a given spherical nucleus (N,Z) is given by (keeping
only the isoscalar part since we shall focus on the N = Z
nuclei 40Ca and 56Ni)
0 (N,Z) = C
0 (N,Z) + C
0 (N,Z) (38)
I∇J0 (N,Z) =
d3r ρ0∇ · J0 (39)
IJ0 (N,Z) =
d3r J20 . (40)
-30 -15  0  15  30  45  60  75  90 105 120
J [MeV fm5]
FIG. 14: Correlation between the values of spin-orbit cou-
pling constant C∇J0 and the isoscalar spherical effective spin-
current coupling constant CJ0 . Dots: values for the actual
parameterizations TIJ , solid line: trend estimated through
Eq. (42) (see text).
The difference of E
0 between
56Ni and 40Ca
= ∆Espin (41)
turns out to be fairly independent from the parameteri-
zation. Averaged over all 36 parameterizations TIJ used
here, ∆Espin has a value of −58.991MeV with a standard
deviation as small as 3.202 MeV, or 5.4%.
The integrals in Eqs. (39,40) are fairly independent
from the actual parameterization. For a rough estimate,
we can replace them in Eq. (38) by their average values.
Plugged into Eq. (41) this yields
C∇J0 =
∆Espin − CJ0 〈I
− IJ0
〈I∇J0 (
56Ni)− I∇J0 (
40Ca)〉
. (42)
Figure 14 compares the values of C∇J0 as obtained
through (42) with the values for the actual parameter-
izations. The estimate works very well, which demon-
strates that C∇J0 = −
W0 and C
0 are indeed correlated
and cannot be varied independently within a high qual-
ity fit of the energy functional (28). As the combined
strength of the spin-orbit and tensor terms in the energy
functional is mainly determined by the mass difference
of the two N = Z nuclei 40Ca and 56Ni, the spin-orbit
coupling constant W0 depends more or less linearly on
the isoscalar tensor coupling constant CJ0 , while for all
practical purposes it is independent from the isovector
one, see also Fig. 4 above.
3. Splitting of high-ℓ states and the role of the radial form
factor
As stated above, it is common practice to confront only
the spin-orbit splittings between pairs of particle or hole
states with calculated single-particle energies from the
spherical mean field. The spin-orbit splitting of intruder
states is rarely examined. Figure 15 displays the relative
ν1f π1f
132Sn
ν1h π1g
208Pb
ν1i π1h
FIG. 15: (Color online) Spin-orbit splittings of high-ℓ levels
in magic nuclei across the Fermi energy. The calculated values
are less robust against correlation effects than those shown in
Fig. 13 and have to be interpreted with caution (see text).
deviation of the spin-orbit splittings of the intruder states
with ℓ ≥ 3 that span across major shell closures and
are thus given by the energy difference of a particle and
a hole state. These splittings are not “safe”, i.e. they
can be expected to be strongly decreased by polarization
and correlation effects [88, 89, 90]. To leave room for
this effect, a mean-field calculation should overestimate
the empirical spin-orbit splittings. We observe, however,
that mean-field calculations done here give values that
are quite close to the experimental ones, or even smaller
for parameterizations with large positive isoscalar tensor
coupling (cf. the evolution from T22 to T66).
This means that the spin-orbit splittings are not too
large in general, as might be concluded from Fig. 13,
but that there is a wrong trend of the splittings with ℓ
with the strength of the spin-orbit potential establishing
a compromise between the in-shell splittings of small ℓ
orbits that are too large and the across-shell splittings
of the intruders that are tentatively too small. In fact,
the levels in Fig. 15 obviously have in common that their
radial wave functions do not have nodes, while the levels
in Fig. 13 have one or two nodes, with the notable excep-
tion of the 1p levels in 16O, for which we also find smaller
deviations of the spin-orbit splittings than for the other
levels in Fig. 13.
Underestimating the spin-orbit splittings of intruder
levels has immediate and obvious consequences for the
performance of an effective interaction, as this closes the
magic gaps in the single-particle spectra and compro-
mises the predictions for doubly-magic nuclei, as we will
demonstrate in detail below. By contrast, the spin-orbit
splittings of the low-ℓ states within the major shells have
no obvious direct impact on bulk properties. Their devi-
ation from empirical data is less dramatic, as the typical
bulk observables discussed with mean-field approaches
are not very sensitive to them. It is only in applica-
tions to spectroscopy that their deficiencies become ev-
ident. It is noteworthy that the parameterization T22
without effective tensor terms at sphericity provides a
reasonable compromise between the tentatively underes-
timated splittings of the intruder levels shown in Fig. 15
and the tentatively overestimated splittings of the lev-
els within major shells shown in Fig. 13 above, while for
parameterizations with tensor terms this balance is lost.
There clearly is a proton-neutron staggering in Figs. 13
and 15, such that calculated proton splittings are rela-
tively smaller than the neutron ones. The effect appears
both when comparing proton and neutron levels with dif-
ferent ℓ in the same nucleus, and when comparing proton
and neutron levels with the same ℓ in the same or dif-
ferent nuclei (see the 1h levels in 132Sn and 208Pb). The
staggering for the intruder levels is even amplified for pa-
rameterizations with large proton-neutron tensor term,
as T62, T64 or T66. The effect is particularly promi-
nent for the heavy 132Sn and 208Pb with a large proton-
to-neutron ratio N/Z, which might hint at unresolved
isospin dependence of the spin-orbit interaction, although
alternative explanations that involve how single-particle
states in different shells should interact through tensor
and spin-orbit forces are possible as well, see also the
next paragraph.
Note that also the spin-orbit splittings of the low-ℓ
levels shown in Fig. 13 exhibit a staggering, which is
of smaller amplitude, though. It has been pointed out
by Skalski [99], that an exact treatment of the Coulomb
exchange term (compared to the Slater approximation
used here and nearly all existing literature) does indeed
slightly increase the spin-orbit splittings of protons across
major shells. This effect might give a clue to the stagger-
ing observed for the N = Z nucleus 56Ni, but the magni-
tude of the effect reported in [99] is too small to explain
the large staggering we find for the heavier N 6= Z nuclei.
Next, we use the example of 132Sn to demonstrate why
the spin-orbit splittings of nodeless high-ℓ states are more
sensitive to the tensor terms than low-ℓ states with one
or several nodes, see Fig. 16. The lower panel shows the
neutron spin-orbit potential in 132Sn for four different
parameterizations, while the upper panel shows selected
radial single-particle wave functions. The ν 1h11/2 and
 0  1  2  3  4  5  6  7  8  9  10
r [fm]
ν2d3/2
ν2d5/2
ν1h11/2
π1g9/2
FIG. 16: (Color online) Neutron spin-orbit potential (top)
and the radial wave function of selected orbitals (bottom) in
132Sn.
π 1g9/2 levels give the main contribution to the neutron
and proton spin-orbit currents in this nucleus, and con-
sequently to the tensor contribution to the spin-orbit po-
tential. Indeed, the largest differences between the spin-
orbit potentials from the chosen parameterizations are
caused by the varying contribution from the tensor terms
and appear for the region between 3 and 6 fm, where the
wave functions of the 1g and 1h states are peaked. This
region corresponds to the inner flank of the spin-orbit po-
tential well, while the outer flank is much less affected.
While the 1g and 1h wave functions are peaked at the in-
ner flank, the 2d orbitals have their node in this region.
Consequently, the splittings of the 1g and 1h levels are
strongly modified by the tensor terms, while those of the
2d orbitals are quite insensitive.
As a rule of thumb, the tensor contribution to the
spin-orbit potential in doubly-magic nuclei comes mainly
from the nodeless intruder states, which, when present, in
turn mainly affect their own spin-orbit splittings, leaving
the splittings of the low-ℓ states with one or more nodes
nearly unchanged for reasons of geometrical overlap.
We note in passing that the slightly different radial
wave functions of the 2d orbitals demonstrate nicely that
their contribution to the spin-orbit current, Eq. (27), can-
not completely cancel.
In fact, when regarding more specifically the evolution
of the spin-orbit potential between the parameterizations
T22 and T66, it is striking that for T66 it is essentially
narrowed and its minimum slightly pushed towards larger
radii, while its depth remains unaltered. Recalling that
] 132Sn, ν  (a)
1h centroid
Exp. T22
T42 T62 T24 T44 T64 T26 T46 T66
(7/2+)
(5/2+)
(1/2+)
(11/2-)
(3/2+)
(7/2-)
(3/2-)
(9/2-)
(1/2-)
1g7/2
2d5/2
3s1/2
2d3/2
1h11/2
2f7/2
3p3/2
1h9/2
3p1/2
2f5/2
132Sn, π  (b)
1g centroid
Exp. T22
T42 T62 T24 T44 T64 T26 T46 T66
(1/2-)
(9/2+)
(7/2+)
(5/2+)
(3/2+)
(11/2-)
2p1/2
1g9/2
1g7/2
2d5/2
2d3/2
3s1/2
1h11/2
FIG. 17: Single-particle energies in 132Sn for a subset of our
parameterizations. We also show the centroid of the intruder
levels, defined through Eq. (43) Top panel: neutron levels,
bottom panel: proton levels. A thick mark indicates the Fermi
level.
T66 shows a pathological behavior of too weak spin-orbit
splitting of the intruder states, it appears that a cor-
rect ℓ-dependence of spin-orbit splittings might require
to modify the radial dependence of the spin-orbit poten-
tial such that it becomes wider towards smaller radii.
This uncalled-for modification of the shape of the spin-
orbit field has previously been put forward by Brown
et al. [48] as an argument for a negative like-particle J2
coupling constant α. However, as will be discussed in
paragraph IVB6 below, the evolution of single-particle
levels along isotopic chains calls for α > 0, see also [48].
Additionally, as we will show in appendix B, large nega-
tive values of α pose the risk of instabilities towards the
transition to states with unphysical shell structure.
4. Single-particle spectra of doubly-magic nuclei
After we have examined the predictions for spin-orbit
splittings, we will now turn to the overall quality of the
single-particle spectra of doubly-magic nuclei. Figure 17
shows the single-particle spectrum of 132Sn. It is evi-
dent that as a consequence of the underestimated spin-
orbit splittings of the intruder levels that we discussed
in the last section, the spectrum is deteriorated for large
positive isoscalar tensor term coupling constants CJ0 (see
T66), as, for example, a decrease of the spin-orbit split-
208Pb, ν  (a)
1i centroid
Exp. T22
T42 T62 T24 T44 T64 T26 T46 T66
13/2+
11/2+
15/2-
3p3/2
2f5/2
1i13/2
3p1/2
2g9/2
1i11/2
3d5/2
4s1/2
1j15/2
2g7/2
3d3/2
208Pb, π  (b)
1h centroid
Exp. T22
T42 T62 T24 T44 T64 T26 T46 T66
11/2-
2d5/2
2d3/2
3s1/2
1h11/2
1h9/2
2f7/2
1i13/2
2f5/2
FIG. 18: Same as Fig. 17 for 208Pb.
ting of the neutron 1h shell pushes the 1h11/2 further up,
closing the N = 82 gap. As a consequence, the presence
of the tensor terms cannot remove the problem shared by
all standard mean-field methods that always wrongly put
the neutron 1h11/2 level above the 2d3/2 and 3s1/2 lev-
els [29], which compromises the description of the entire
mass region. For the same reason, the proton spectrum
of 132Sn also excludes interactions with large positive CJ0 ,
which reduces the Z = 50 gap between the 1g levels to
unacceptable small values.
Figure 17 also shows the energy centroids of the ν 1h
and π 1g levels, defined as
εcentqnℓ =
2ℓ+ 1
εqnℓ,j=ℓ+1/2 +
2ℓ+ 1
εqnℓ,j=ℓ−1/2 . (43)
The position of the centroid is fairly independent from
the parameterization. Assuming that the calculated en-
ergy of the centroid of an intruder state is more robust
against corrections from core polarization and particle-
vibration coupling that its spin-orbit splitting, we see
that the ν 1h centroid is clearly too high in energy by
about 1 MeV. In combination with its tentatively too
small spin-orbit splitting, see Fig. 15, this offers an ex-
planation for the notorious wrong positioning of the
ν 1h11/2, 2d3/2 and 3s1/2 levels in
132Sn [29]. The near-
degeneracy of the ν 2d3/2 and 3s1/2 levels is always well
reproduced, while the 1h11/2 comes out much too high.
As the 1h11/2 is the last occupied neutron level, self-
consistency puts it close to the Fermi energy, which, in
turn, pushes the 2d3/2 and 3s1/2 levels down in the spec-
trum.
The overall situation is similar for 208Pb, see Fig. 18.
Again, the high-ℓ intruder states move too close to the
Z = 82 and N = 126 gaps for large positive CJ0 . The
effect is less obvious than for 132Sn as the intruders and
their spin-orbit partners are further away from the gaps.
Still, the level ordering and the size of the Z = 82 gap
become unacceptable for parameterizations with large
tensor coupling constants. For strong tensor term cou-
pling constants (both like-particle and proton-neutron), a
Z = 92 gap opens in the single-particle spectrum of the
protons that is also frequently predicted by relativistic
mean-field models [14, 88] but absent in experiment [100].
The single-particle spectra for the light doubly magic
nuclei 40Ca (Fig. 19), 48Ca (Fig. 20), 56Ni (Fig. 21), 68Ni
(Fig. 22) and 90Zr (Fig. 23), all have in common that
the relative impact of the J2 terms on the ordering and
relative distance of single-particle levels is even stronger
than for the heavy nuclei discussed above. But not all
of the strong dependence on the coupling constants of
the J2 terms that we see in the figures is due to the ac-
tual contribution of the tensor terms to the spin-orbit
potential. This is most obvious for 40Ca, where protons
and neutrons are spin-saturated so that the J2 terms do
not contribute to the spin-orbit potentials. Still, increas-
ing their coupling constants increases the spin-orbit split-
tings, which manifests the readjustment of the spin-orbit
force to a given set of CJ0 and C
1 (see Fig. 4). The evolu-
tion of the spin-orbit splittings in 40Ca visible in Fig. 19
is the background which we have to keep in mind when
discussing the impact of the tensor terms on nuclei with
non-vanishing spin-orbit currents. Note that the spin-
orbit coupling constant W0 is correlated with isoscalar
tensor coupling constant CJ0 , such that the single-particle
spectra obtained with T24 and T42 are very similar, as
they are for T26, T44 and T62.
For 48Ca, Fig. 20, the protons are still spin-saturated
with vanishing proton spin-orbit current Jp, while for
neutrons we have a large Jn. Depending on the nature
of the tensor terms in the energy functional – i.e. like-
particle or proton-neutron or a mixture of both – the
spin-orbit current will either contribute to the spin-orbit
potential of the neutrons or that of the protons or both,
see Eq. (35). For the parameterizations with dominating
like-particle J2 term, for example T24 and T26, the situ-
ation for the protons is the same as for 40Ca: there is no
contribution from the tensor terms to the proton spin-
orbit splittings, but compared to T22 the proton Z = 20
gap is reduced through the readjustment of the spin-orbit
force, leading to values that are too small. For the same
parameterizations, the large contribution from Jn to Wn
opens up the N = 20 gap to values that are tentatively
too large, as it reduces the neutron spin-orbit splittings
and thereby compensates, even overcompensates, the ef-
fect from the readjustment of the spin-orbit force. At the
same time the N = 28 gap is reduced. The opposite effect
is seen for parameterizations with large proton-neutron
tensor term, for example T42 or T62. For those, the pro-
ton spin-orbit splitting is reduced, opening up the Z = 20
40Ca, ν  (a)
Exp. T22
T24 T26 T42 T44 T46 T62 T64 T66
1d5/2
2s1/2
1d3/2
1f7/2
2p3/2
40Ca, π  (b)
Exp. T22
T24 T26 T42 T44 T46 T62 T64 T66
2s1/2
1d3/2
1f7/2
2p3/2
FIG. 19: Same as Fig. 17 for 40Ca.
48Ca, ν  (a)
Exp. T22
T24 T26 T42 T44 T46 T62 T64 T66
2s1/2
1d3/2
1f7/2
2p3/2
2p1/2
1f5/2
48Ca, π  (b)
Exp. T22
T24 T26 T42 T44 T46 T62 T64 T66
2s1/2
1d3/2
1f7/2
2p3/2
FIG. 20: Same as Fig. 17 for 48Ca.
gap compared to T22, while the neutron spin-orbit split-
tings are increased by the background effect from the
readjusted spin-orbit force.
56Ni, ν  (a)
Exp. T22
T24 T26 T42 T44 T46 T62 T64 T66
1f7/2
2p3/2
1f5/2
2p1/2
56Ni, π  (b)
Exp. T22
T24 T26 T42 T44 T46 T62 T64 T66
1f7/2
2p3/2
1f5/2
2p1/2
FIG. 21: Same as Fig. 17 for 56Ni.
68Ni, ν  (a)
Exp. T22
T24 T26 T42 T44 T46 T62 T64 T66
(5/2-)
(1/2-)
(9/2+)
2p3/2
1f5/2
2p1/2
1g9/2
68Ni, π  (b)
Exp. T22
T24 T26 T42 T44 T46 T62 T64 T66
(7/2-)
1f7/2
2p3/2
1f5/2
2p1/2
FIG. 22: Same as Fig. 17 for 68Ni.
For 56Ni, Fig. 21, we have large Jn and Jp. In
this N = Z nucleus, the like-particle or proton-neutron
parts of the tensor terms cannot be distinguished. The
spectra depend only on the overall coupling constant of
the isoscalar tensor term CJ0 , on the one hand directly
through the contribution of the tensor terms to the spin-
orbit potentials, and on the other hand through the back-
ground readjustment of W0 that is correlated to C
well. As already mentioned, results for T24 and T42 are
very similar, as they are for T26, T44 and T62. All pa-
rameterizations have in common that the proton and neu-
tron gaps at 28 are too small. The variation of the single-
particle spectra among the parameterizations is smaller
than for 40Ca, mainly because the tensor terms compen-
sate the background drift from the readjustment of W0.
The slightly neutron-rich 68Ni combines a spin-
saturated sub-shell closure N = 40 that gives a vanishing
neutron spin-orbit current with the magic Z = 28 that
gives a strong proton spin-orbit current. The variation of
the single-particle spectra in dependence of the coupling
constants of the tensor terms is similar to those of 48Ca,
with the roles of protons and neutrons exchanged.
The nucleus 90Zr combines the spin-saturated proton
sub-shell closure Z = 40 with the major neutron shell
closure N = 50. The high degeneracy of the occupied ν
1g9/2 level leads to a very strong neutron spin-orbit cur-
rent, while the proton spin-orbit current is zero. Even
in the absence of a tensor term contributing to their
spin-orbit potential for parameterizations with pure like-
particle tensor terms, the proton single-particle spectra
are dramatically changed by the feedback effect from the
readjusted spin-orbit force; see the evolution from T22
to T26. The π 1g9/2 comes down, and closes the Z = 40
sub-shell gap. For parameterizations with pure proton-
neutron tensor term, one has the opposite effect, this time
because the contribution from the tensor terms overcom-
pensates the background effect from the spin-orbit force.
The effect of the tensor terms on the neutron spin-orbit
splittings is less dramatic, but still might be sizable.
We have to point out that the calculations displayed
in Fig. 23 were performed without taking pairing into
account, as the HFB scheme breaks down in the weak
pairing regime of doubly magic nuclei. For some ex-
treme (and unrealistic) parameterizations, however, the
gaps disappear which, in turn, would lead to strong pair-
ing correlations if the calculations were performed within
the HFB scheme. This happens, for example, for neu-
trons in 90Zr when using T26 and T46. Interestingly,
the pairing correlations for neutrons break the spin sat-
uration, which leads to a substantial neutron spin-orbit
current Jn. As these parameterizations use values of the
like-particle coupling constant significantly larger than
the neutron-proton one, Jn feeds back onto the neutron
spin-orbit potential only, Eq. (35). As the correspond-
ing coupling constant α is positive for T26 and T46, the
contribution from the tensor terms reduces the spin-orbit
splittings, in particular those of the 1g9/2 and 1f5/2. As
a result, this counteracts the reduction of the N = 40
gap predicted by T26 and T46 in calculations without
pairing.
90Zr, ν  (a)
Exp. T22
T24 T26 T42 T44
T46 T62 T64 T66
(11/2-)
2p3/2
1f5/2
2p1/2
1g9/2
2d5/2
3s1/2
2d3/2
1g7/2
1h11/2
90Zr, π  (b)
Exp. T22 T24
T42 T44 T46 T62 T64 T66
1f5/2
2p3/2
2p1/2
1g9/2
FIG. 23: Same as Fig. 17 for 90Zr.
5. Evolution along isotopic chains: np coupling
In the preceding sections, we have analyzed character-
istics of the single-particle spectra for isolated doubly-
magic nuclei. We found that larger tensor terms do not
lead to an overall improvement of the single-particle spec-
tra. However, we also argued that it might be essentially
due to deficiencies of the central (and possibly spin-orbit)
interactions and that it should not be used to discard the
tensor terms as such. In any case, the results gathered so
far on single-particle spectra of doubly-magic nuclei do
not permit to narrow down a region of meaningful cou-
pling constants of the tensor terms. The analysis must be
complemented by looking at other observables. A better
suited observable is provided by the evolution of spin-
orbit splittings along an isotopic or isotonic chain, which
ideally reflects the nucleon-number-dependent contribu-
tion from the J2 terms to the spin-orbit potentials. Un-
fortunately, safe experimental data for the evolution of
spin-orbit partners are scarce; hence, one has to content
oneself to the evolution of the energy distance of lev-
els with different ℓ, assuming that the effect is primarily
caused by the evolution of the spin-orbit splittings of each
level with its respective partner. A popular playground
for such studies is the chain of Sn isotopes, where two
such pairs of levels have gained attention; the π 2d5/2 and
π 1g7/2 on the one hand, and the π 1g7/2 and π 1h11/2 on
the other hand. Figure 24 shows these two sets of results
for a selection of our parameterizations.
Experimentally, the 2d5/2 and 1g7/2 levels cross be-
 56  60  64  68  72  76  80  84
π1g7/2 - π2d5/2
π1h11/2 - π1g7/2
FIG. 24: (Color online) Distance of the proton 1h11/2 and
1g7/2 levels (top) and of the proton 2d5/2 and 1g7/2 levels
(bottom), for the chain of tin isotopes. The “best” param-
eterization cannot and should not be determined with a χ2
criterion, see text.
tween N = 70 and 72, such that the 2d5/2 provides the
ground state of light odd-A Sb isotopes, and 1g7/2 that of
the heavy ones, see for example Ref. [101]. The crossing
as such is predicted by many mean-field interactions and
most of the parameterizations of the Skyrme interaction
we use here. It has also been studied in detail with the
standard Gogny force (without any tensor term) using
elaborate blocking calculations of the odd-A nuclei [102].
The crossing, however, is never predicted at the right
neutron number, see Fig. 24. As we have learned above,
we should not assume that the absolute distance of the
two levels will be correctly described by any of our param-
eterizations (as the centroids of the ℓ shells will not have
the proper distance and the spin-orbit splittings have a
wrong ℓ dependence within a given shell). Hence, the
neutron number where the crossing takes place cannot
and should not be used as a quality criterion. What does
characterize the tensor terms is the bend of the curves
in Fig. 24, as ideally it reflects how the spin-orbit split-
tings of both levels change in the presence of the ten-
sor terms. Similar caution has to be exercised in the
 28  32  36  40  44
π1f5/2 - π2p3/2
FIG. 25: (Color online) Distance of the proton 1f5/2 and
2p3/2 in the chain of Ni isotopes.
analysis of the unusual relative evolution of the proton
1g7/2 and 1h11/2 levels that was brought to attention
by Schieffer et al. [45]. Their spacing has been investi-
gated in terms of the tensor force before [44, 46, 48, 49].
Again, we pay attention to the qualitative nature of the
bend without focusing too much on the precise value by
which the splitting changes when going from N ≈ 58 to
N = 82. Indeed, the matching of the lowest proton frag-
ment with quantum number 1h11/2 seen experimentally
with the corresponding empirical single-particle energy is
unsafe because of the fractionization of the strength as
discussed in Ref. [48].
For both pairs of levels, the evolution of their distance
can be attributed to the tensor coupling between the pro-
ton levels and neutrons filling the 1h11/2 level below the
N = 82 gap. Unfortunately, this introduces an addi-
tional source of uncertainty: as can be seen in Fig. 17,
the ordering of the neutron levels in 132Sn is not prop-
erly reproduced by any of our parameterizations, with the
1h11/2 level being predicted above the 2d3/2 level, while
it is the other way round in experiment. This means that
in the calculations, the contribution from the 1h11/2 level
to the neutron spin-orbit current builds up at larger N
than what can be expected in experiment. As a conse-
quence, the prediction for the relative evolution of the
levels might be shifted by up to four mass units to the
right compared to experiment for both pairs of levels we
examine here.
In the end, the trend of both splittings is best repro-
duced when using a positive value of the neutron-proton
Jn · Jp coupling constant β such that the filling of the
neutron 1h11/2 shell decreases the spin-orbit splittings of
the proton shells. The parameterizations from the T4J
and T6J series indeed do reproduce the bend of empirical
data, with, however, a clear shift in the neutron number
where it occurs, as expected from the previous discussion.
A value of β = 120 MeV fm5, which corresponds to the
series of T4J parameterizations, matches its magnitude
best (see for example T44).
A similar analysis can be performed for the proton
1f5/2 and 2p3/2 levels in the chain of Ni isotopes, see
Fig. 25. This case is interesting as no distinctive feature
can be observed in the empirical spectra, yet the standard
parameterizations without tensor terms like T22 do not
reproduce them. In fact, to keep the 1f5/2 and 2p3/2 at a
constant distance, two competing effects have to cancel.
First, the increasing diffuseness of the neutron density
with increasing neutron number diminishes the proton
spin-orbit splittings through its reduced gradient in the
expression for the proton spin-orbit potential when going
from N = 32 to N = 40. Second, the filling of the neu-
tron 1f5/2 state reduces the neutron spin-orbit current
which in turn increases the proton spin-orbit splittings
for interactions with sizable proton-neutron tensor con-
tribution to the proton spin-orbit potential when going
from N = 32 to N = 40. The former effect can be clearly
seen for parameterizations T2J with vanishing proton-
neutron tensor term, β = 0. Again, parameterizations
of the T4J series seem to be the most appropriate to
describe the evolution of these levels.
The evolution of single-particle levels is the tool of
choice to determine the sign and magnitude of the
proton-neutron tensor coupling constant. The value
which we favor, as a result of our semi-qualitative analy-
sis is β = 120 MeV fm5. This value is only slightly larger
than the value of 94 to 96 MeV fm5 advocated by Brown
et al. in Ref. [48], which was adjusted to theoretical level
shifts in the chain of tin isotopes obtained from a G-
matrix interaction. We can consider this as a reasonable
agreement.
Let us defer the discussion of this value to the end of
this section and study in the next paragraph the like-
particle tensor-term coupling constant α.
6. Evolution along isotopic chains: nn coupling
In order to narrow down an empirical value for the
neutron-neutron tensor coupling constant, the ideal ob-
servable would be the evolution of neutron single-particle
levels along an isotopic chain. Unfortunately, these are
only accessible at the respective shell closures. We shall
therefore compare neutron single-particle spectra of pairs
of doubly-magic nuclei belonging to the same isotopic
chain. Again, the necessity to extract pure single-particle
effects calls for precautions. We choose pairs of parti-
cle or hole levels which are close enough in energy that
their absolute spacing is not much affected by particle-
vibration coupling. Of course, one also has to be careful if
both states appear at relatively high excitation energy in
the neighboring odd isotope because the fractionization
of their strength could again interfere with the analysis.
In the following, we choose pairs of orbitals which are as
safe as possible.
To remove the uncertainties from the deficiencies of the
central and spin-orbit parts of the effective interaction
that we have identified above, we will look at a double
-60  0  60  120  180  240
α [MeV fm5]
β = 0
β = 120
β = 240
FIG. 26: Shift of the distance between the neutron 1d3/2 and
2s1/2 levels when going from
40Ca to 48Ca, Eq. (44) (top) and
of the neutron 1f5/2 and 2p1/2 levels when going from
and 68Ni, Eq. (45) (bottom).
difference, where, first, we construct the energy difference
between the neutron 1d3/2 and 2s1/2 levels separately
for 40Ca and 48Ca, and then compare the value of this
difference in both nuclei
δCa =
1d3/2
2s1/2
1d3/2
2s1/2
. (44)
Assuming that the problems from the central and spin-
orbit forces discussed in Sects. IVB 1 and IVB4 have the
same effect in both nuclei, they will cancel out in δCa.
The interesting feature of this pair of states is that
they are separated by more than 2 MeV in 40Ca, while
they are nearly degenerate in 48Ca, see Figs. 19 and 20.
Such a shift can only be reproduced with a positive (140-
180 MeV fm5) value of α, which decreases the splitting
of the neutron 1d shell when the neutron 1f7/2 level is
filled.
A similar analysis can be performed for the 1f5/2 and
2p1/2 neutron states in the Ni isotopes
56Ni and 68Ni
δNi =
1f5/2
2p1/2
1f5/2
2p1/2
. (45)
Going from 56Ni to 68Ni, the neutron 1f5/2 level comes
further down in energy than the 2p1/2 level for param-
eterizations without tensor terms (T22), see Figs. 21
and 22. The reason for this trend is the geometrical
growth of the nucleus, which on the one hand lowers the
centroid of the 1f levels in the widening potential well,
and on the other hand pushes the spin-orbit field to larger
radii, which has opposite effects on the splittings of 2p
and 1f states. The like-particle tensor terms can com-
pensate this trend through a reduction of the spin-orbit
splitting of the 1f levels. The observed downward shift
by 0.3 MeV can be recovered with a value of α around
120 MeV fm5, see Fig. 26.
It is also gratifying to see that the analysis of Ca and
Ni isotopes suggests nearly the same value for the like-
particle tensor term coupling constant α.
C. Binding energies
Our ultimate goal, although far beyond the scope of
the present paper, is the construction of a universal nu-
clear energy density functional that simultaneously de-
scribes bulk properties like masses and radii, giant res-
onances, and low-energy spectroscopy, such as quasipar-
ticle configurations and collective rotational and vibra-
tional states. To crosscheck how our findings on single-
particle spectra and spin-orbit splittings translate into
bulk properties, we will now analyze the evolution of
mass residuals and charge radii along isotopic and iso-
tonic chains. It has been repeatedly noted in the liter-
ature that the mass residuals from mean-field calcula-
tions show characteristic arches [29, 52, 54, 65, 72, 103,
104, 105], where heavy mid-shell nuclei are usually un-
derbound compared to the doubly magic ones that are
located at the bottom of deep ravines. For light nuclei,
the patterns are often less obvious. Part of this effect can
be explained and removed taking large-amplitude corre-
lations from collective shape degrees of freedom into ac-
count through suitable beyond-mean-field methods. In
turn, this means that the mass residuals should leave
room for the extra binding of mid-shell nuclei from cor-
relations. However, it turns out that for typical effective
interactions the amplitude of the arches is larger than
what is brought by correlations [54]. Furthermore, this
effect seems not to be of the same size for isotopic and
isotonic chains, which altogether hints at deficiencies of
the current effective interactions.
Recently, Dobaczewski pointed out [47] that the
strongly fluctuating contribution brought by the J2 terms
to the total binding energy could remove at least some
of the ravines found in the mass residuals around magic
numbers. The hypothesis was motivated by calculations
that evaluate the tensor terms either perturbatively, or
self-consistently, using in this case an existing standard
parameterization without tensor terms for the rest of the
energy functional. Our set of refitted parameterizations
with varied coupling constants of the tensor terms gives
us a tool to check how much of the argument persists to
a full fit.
1. Semi-magic series
Figure 27 displays binding energy residuals along var-
ious isotopic and isotonic chains of semi-magic nuclei for
a selection of our parameterizations: T22 is the reference
with vanishing J2 terms at sphericity; T24 has a sub-
stantial like-particle coupling constant α and vanishing
proton-neutron coupling constant β, which is similar to
most of the published parameterizations which take the
J2 terms from the central Skyrme force into account; T42
and T62 are parameterizations with substantial proton-
neutron coupling constant β and vanishing like-particle
coupling constant; T44 has a mixture of like-particle and
proton-neutron tensor terms that is close to what we
found preferable for the evolution of spin-orbit splittings
above; and T46 is a parameterization that gives the best
root-mean-square residual of binding energies for spher-
ical nuclei, as we will see below. Finally, T66 is a pa-
rameterization with large and equal proton-neutron and
like-particle tensor-term coupling constants.
The tensor terms have opposite effects in light and
heavy nuclei: The curves obtained with T22, the parame-
terization without J2 term contribution at sphericity, are
relatively flat for the light isotopic and isotonic chains,
but show very pronounced arches with an amplitude of
5 or even more MeV for the heavy Sn and Pb isotopic
chains. By contrast, the most striking effect of the J2
terms is that they induce large fluctuations of the mass
residuals in light nuclei, while they flatten the curves in
the heavy ones.
The strong variation between the parameter sets for
light nuclei are of course the direct consequence of the
strong variation of the spin-orbit current J that enters the
spin-orbit and tensor terms when going back and forth
between nuclei where the configuration of at least one
nucleon species is spin-saturated. The variations seen
are a result of the modifications of tensor-term coupling
constants and the associated readjustment of the spin-
orbit strength W0. For example,
48Ca is overbound with
respect to 40Ca and 56Ni for parameterizations with a
proton-neutron coupling constant β > 0, while the like-
particle coupling constant α has a more limited effect.
Since only the neutron core is spin-unsaturated in this
nucleus, this must be attributed to the increase in the
readjusted spin-orbit strength W0 (correlated with C
(α + β)) which dominates when β is increased and α
kept at zero, and counterbalances the effect of α when
the latter varies. See the parameter sets T62 and T66
in Figures 27 and 28. The large overbinding of nuclei
around 90Zr (Z = 40, N = 50) for parameterizations
with large proton-neutron tensor coupling constant has
the same origin. For a given parameterization and a given
nucleus, the energy gain from the spin-orbit term seems
to be almost always larger than the energy loss from the
J2 one, see Fig. 28 for Ca isotopes and Fig. 29 for Sn
isotopes. Of course other terms in the energy functional
compensate for a part of the gain from the spin-orbit
term, but the overall trends of the mass residuals suggest
that the spin-orbit energy has a much larger contribution
to the differences between the parameterizations visible
in Fig. 27 than the J2 terms.
We have to note that the spin-orbit current does not
completely vanish for the nominally proton and neutron
spin-saturated 40Ca for parameterizations with large cou-
 16  20  24  28
 32  36  40  44  48
 48  52  56  60  64  68
 16  20  24  28  32
 28  32  36  40  44
 50  60  70  80
 100  110  120  130
FIG. 27: (Color online) Mass residuals Eth − Eexp along selected isotopic and isotonic chains of semi-magic nuclei for the
parameterizations as indicated. Positive values of Eth − Eexp denote underbound nuclei, negative values overbound nuclei.
pling constants of the J2 terms. For those, the gap at 20
is strongly (and nonphysically) reduced, see Fig. 19. The
small gap at 20 does not suppress pairing correlations
anymore in our HFB approach. The resulting scattering
of particles from the sd shell to the fp shell breaks the
spin-saturation, such that there is a finite, in some cases
quite sizable, contribution from the spin-orbit term to
the total binding energy. Owing to the compensation be-
tween all contributions, the total energy gain compared
to a HF calculation without pairing is usually small and
rests on the order of 200 keV for the parameterizations
shown in Fig. 27.
It is also important to note that some of the light
chains in Fig. 27 are sufficiently close to or even cross the
N = Z line that they are subject to the Wigner energy,
which still lacks a satisfying explanation, not to men-
tion a description in the framework of mean-field meth-
ods [106]. The Wigner energy is not taken into account
 16  20  24  28  32
FIG. 28: (Color online) Evolution of spin-orbit current (J2t )
energy (bottom panel, zero by construction for T22) and spin-
orbit energy (top panel) with neutron number N in the chain
of Ca isotopes (Z = 20).
in our fits, while it turned out to be a crucial ingredient
of any HFB [107, 108, 109] or other mass formula. In
fact, as shown in Fig. 14 of Ref. [54], the missing Wigner
energy clearly sticks out from the mass residuals for SLy4
(which is very similar to T22) when they are plotted for
isobaric chains. This local trend around N = Z is, how-
ever, overlaced with a global trend with mass number,
such that the missing Wigner energy cannot be spotted
anymore when looking at the mass residuals for the iso-
topic chain of Ca isotopes, similar to what is seen for T22
in Fig. 27. Within our fit protocol, the correlation be-
tween the masses of 40Ca, 48Ca and 56Ni, that is brought
by the spin-orbit force (see Sect. IVB 2) does not tolerate
a correction for the Wigner energy for standard central
and spin-orbit Skyrme forces, as this will lead to an un-
acceptable underbinding of 48Ca. This, however, might
change when the J2 terms are added. Indeed, Fig. 27
suggests that adding a phenomenological Wigner term
around 40Ca and 56Ni to a parameter set like T44, which
is consistent with the evolution of single-particle levels,
would flatten the curves for the mass residuals in the Ca,
Ni and N = 28 chains. The mass residuals for the chain
of oxygen isotopes that are not shown here would be im-
proved in a similar manner. However, extreme caution
should be exercised before jumping to premature conclu-
sions, as the spin-orbit splittings and level distances in
light nuclei are far from realistic for all our parameter-
 50  60  70  80
FIG. 29: (Color online) Same as Fig. 28 for tin isotopes
(Z = 50).
izations; as a consequence it is difficult to judge if the
room we find for the Wigner energy is fortuitous or in-
deed a feature of well-tuned J2 terms. Note that the HFB
mass formulas that do include a correction for the Wigner
energy side-by-side with the J2 terms from the central
Skyrme force give satisfying mass residuals for light nu-
clei [107, 108, 109], but have nuclear matter properties
that are quite different from ours; cf. BSk1 and BSk6
with SLy4 in Table I of Ref. [110]. Our constraints on
the empirical nuclear matter properties (same as those
on SLy4) that are absent in these HFB mass fits might
be the deeper reason for this conflict.
Large tensor-term coupling constants straighten the
arches in the mass residuals in the heavy Sn and Pb iso-
topic chains, but the improvements are not completely
satisfactory. Large, combined proton-neutron and like-
particle coupling constants tend to transform the arch
for the tin isotopic chain into a an s-shaped curve, which
is not very realistic from the standpoint of expected cor-
rections through collective effects. It can again be as-
sumed that the deficiencies of the single-particle spectra
pointed out in Fig. 17 are responsible, where the ν 1h11/2
and π 1g9/2 are placed too high above the rest of the
single-particle spectra in heavy Sn isotopes. For Pb iso-
topes, large values of the tensor terms tend to overbind
the neutron-deficient isotopes. It is noteworthy that the
tensor terms seem to not much affect the mass residu-
als of the heavy Pb isotopes above N = 126, which are
on the flank of a very deep ravine that becomes visible
 50  60  70  80  90  100  110  120
FIG. 30: (Color online) Two-neutron separation energy along
the chain of isotopes (Z = 50).
when going towards heavier elements, cf. the SLy4 results
in Ref. [54].
It has been often noted that effective interactions that
give a similar satisfying description of masses close to the
valley of stability give diverging predictions when extrap-
olated to exotic nuclei. The standard example is the two-
neutron separation energy S2n(N,Z) = E(N,Z − 2) −
E(N,Z) for the chain of Sn isotopes. Results obtained
with a subset of our parameterizations are shown in Fig.
30. It is noteworthy that the differences for neutron-rich
nuclei beyond N = 82 are not larger than those for the
isotopes closer to stability. Around the valley of stabil-
ity, increasing the coupling constants of tensor terms, in
particular the like-particle ones, tilts the curve, pushing
it up for light isotopes and pulling it down it for heavy
ones, which reflects of course the position of the ν 1h11/2
level that is pushed into the N = 82 gap, see Fig. 17.
For the neutron-rich isotopes, small differences appear
around N = 90, which reflects the change of level struc-
ture above the ν 2f7/2 level and at the drip line, but they
are much smaller than the differences seen between pa-
rameterizations obtained with different fit protocols, see
Fig. 5 of Ref. [29].
2. Systematics
In the preceding section we showed how the J2 terms
in the energy functional modify the trends of mass resid-
uals along isotopic and isotonic chains, in particular the
amplitude of the arches between doubly-magic nuclei. In
this section, we want to examine how this translates into
quality criteria for the overall performance of the param-
eterizations for masses.
Figure 31 displays the root-mean-square deviation of
-60060120180240
∆Erms [MeV]
(T11)
β [MeV fm5]α [MeV fm
∆Erms [MeV]
FIG. 31: Root-mean-square deviation from experiment of the
binding energies of a set of 134 spherical nuclei, for each of
the forces TIJ , vs. α and β (The “(T11)” label indicates
the position of this parameterization in the (α, β)-plane).
Contour lines at ∆Erms = 2.0, 2.25, 2.5, 3.0, 3.5, 4.0 MeV. The
minimal value is found for T46 (∆Erms = 1.96 MeV).
the mass residuals for all our 36 parameterizations, eval-
uated for a set of 134 nuclei predicted to have spherical
mean-field ground states when calculated with the pa-
rameterizations SLy4 [54]. One observes a clear mini-
mum around T46, i.e. (α, β) = (240, 120), with (Eth −
Eexp)r.m.s. = 1.96 MeV, compared with 3.44 MeV for
T22 (α = β = 0). We found even slightly better values
with even more repulsive isoscalar and isovector coupling
constants, but the single-particle spectra of these inter-
actions turn out to be quite unrealistic, cf. Sect. IVB 1.
This already demonstrates that in the presence of the J2
terms a good fit of masses does not necessarily lead to
satisfactory single-particle spectra.
Figure 32 demonstrates how the distribution of the
mass residuals Eth − Eexp affects the evolution of their
r.m.s. value for a subset of 9 parameterizations. For
T22 (α = β = 0), the distribution is centered at posi-
tive mass residuals, with only very few nuclei being over-
bound. Increasing β to 120 MeV fm5 (T42) or even 240
MeV fm5 (T62) shifts the median of the distribution to
smaller values, which yields more and more overbound
nuclei. For large values of β, the distribution spreads out
more, which diminishes the improvement from centering
the distribution closer to zero. For given β, increasing
α mainly shifts the median of the distribution without
spreading out its overall shape, which is preferable to
optimize the r.m.s. value.
These considerations, however, have to be taken with
caution. As said above, we aim at a model where certain
correlations beyond the mean-field are treated explicitly,
which asks for a distribution of mean-field mass residuals
with an asymmetric distribution towards positive mass
residuals, and a width that is similar to the difference
-10 -8 -6 -4 -2  0  2  4  6  8  10
Eth - Eexp [MeV]
FIG. 32: (Color online) Distribution of deviations from ex-
periment of the binding energies of a set of 134 spherical nuclei
(1 MeV bins) for a subset of parameterizations. Each panel
corresponds to a given value of β (from top to bottom: β = 0,
120, 240 MeV fm5).
between the maximum and minimum correlation energies
to be found.
D. Radii
The evolution of nuclear charge radii along isotopic
chains reflects how the mean field of the protons changes
when neutrons are added in the system. In the sim-
plistic liquid-drop model, it just follows the geometrical
growth of the nucleus ∼ A1/3, but data show that there
are many local deviations from this global trend. On
the one hand, radii are of course subject to correlations
beyond the mean field [54, 111, 112, 113, 114] On the
other hand, they are also sensitive to the detailed shell
structure, which, in turn, might be influenced by tensor
terms. We will concentrate here on two anomalies of the
evolution of charge radii, both of which are not much
influenced by collective correlations beyond the mean-
field (at least in calculations with the Skyrme interaction
SLy4) [54]: that the root-mean-square (r.m.s.) charge ra-
dius of 48Ca is almost the same as the one of the lighter
40Ca or possibly slightly smaller, and the kink in the iso-
topic shifts of mean-square (m.s.) charge radii in the Pb
isotopes, where Pb isotopes above 208Pb are larger than
what could be expected from liquid-drop systematics. In
-60  0  60  120  180  240
β [MeV fm5]
 40Ca 48Ca 
α = 0
α = 120
α = 240
(α = 120)
1s1/2
1p3/2
1p1/2
1d5/2
1d3/2
2s1/2
FIG. 33: (Color online) Middle panel: Difference of mean-
square charge radii between 40Ca and 48Ca as a function of
the proton-neutron tensor term coupling constant β for three
values of α. The experimental value (with error bar) is rep-
resented by the two horizontal black lines. Bottom panel:
Root-mean-square charge radii of 40Ca and 48Ca. Top panel:
Contribution of the single-particle proton states to the dif-
ference of the charge radii (mean square radius of the point
proton distribution, see Eq. (46)).
both cases it is plausible that shell effects are the de-
termining factor, although alternative explanations that
involve pairing effects have been put forward for the lat-
ter case as well [115, 116].
Charge radii have been calculated with the approxima-
tion used in Ref. [51]3 and derived from Ref. [117]
r2ch = 〈r
2〉p + r
r2n +
v2i µqi〈σ · ℓ〉i ,
where the mean-square (m.s.) radius of the point-proton
distribution 〈r2〉p is corrected by three terms: the first
two estimate the effects of the intrinsic charge distribu-
tion of the free proton and neutron (with m.s. radii r2p
and r2n) and the third adds a correction from the mag-
netic moments of the nucleons. Since we will consider
the shift of charge radii for different isotopes of the same
series, the actual value of r2p cancels out. For the second
correction term, which is independent from the interac-
tion, we take r2n = −0.117 fm
2 [29]. Finally, the magnetic
correction can only depend weakly on the details of the
interaction through the occupation factors v2i when non-
magic nuclei are considered. The same expressions had
been used during the fit of our parameterizations.
We begin with the Ca isotopes. Most parameteriza-
tions of Skyrme’s interaction are not able to reproduce
that the charge radius of 48Ca has about the same size as
that of 40Ca, see Fig. 11 in Ref. [29]. The middle panel of
Fig. 33 shows the difference of the the m.s. radii of 48Ca
and 40Ca in dependence of the tensor term coupling con-
stants α and β. First, this difference is almost indepen-
dent of α, the strength of the like-particle tensor terms.
Second, it is strongly correlated with β, the strength of
the proton-neutron tensor term, with large positive val-
ues of β bringing the difference of radii into the domain
of experimentally acceptable values [118] or even below,
with a best match obtained for β = 80 MeV fm5. This
effect can be explained by looking at the proton single-
particle spectra of 40Ca (Fig. 19) and 48Ca (Fig. 20). In-
deed, one observes that a positive neutron-proton tensor
coupling constant decreases the strength of the proton
spin-orbit field in 48Ca, which in turn lowers the π 1d3/2
level in 48Ca (compare the parameterizations TIJ in
Fig. 20 with increasing I for given J). As a consequence,
the m.s. radius of this state decreases as it sinks deeper
into the potential well of 48Ca. At the same time, this
level is pushed up in 40Ca, which slightly increases the
contribution of this state to the charge m.s. radius of this
nucleus. This effect is demonstrated in the top panel of
Fig. 33, which displays the degeneracy-weighted and nor-
malized change of the m.s. radii of proton hole states be-
tween 40Ca and 48Ca as a function of the proton-neutron
tensor term coupling constant β for forces with a like-
particle tensor term coupling constant α = 120 MeV fm5.
Indeed, the decreasing contribution from the π1d3/2 state
to the m.s. radius significantly decreases the isotopic shift
3 There is a typographical error in Eq. (4.2) in Ref. [51], that
was copied to Eq. (110) in Ref. [29]: the ~/mc factor should
be squared, as is trivially found by dimensional analysis and
confirmed by Ref. [117].
-60  0  60  120  180  240
α [MeV fm5]
β = 0
β = 120
β = 240
FIG. 34: Change of slope in the m.s. charge radii ∆2r2ch
around 208Pb, Eq. (47), in fm2 as a function of α for three
values of β. The experimental value is about one and a half
times as large as the largest theoretical value shown here, see
text.
between both Ca isotopes. It has to be noted that the
m.s. value of the charge radii of 40Ca and 48Ca are al-
most independent of alpha and that their absolute values
are not reproduced for any of our parameterizations.
The latter study demonstrates the correlation between
the isotopic shift of m.s. charge radius between 40Ca and
48Ca and the absolute single-particle energy of the pro-
ton 1d3/2 state. This level can be moved around within
the single-particle spectrum with the J2 terms. However,
the agreement of the calculated single-particle energy of
the proton 1d3/2 state in both nuclei with experiment is
not necessarily improved for the parameterizations that
reproduce the isotopic shift of the m.s. charge radius.
Furthermore, a good reproduction of the isotopic shift
does not guarantee that the absolute values of the charge
radii are well reproduced, see the bottom panel in Fig. 33.
In fact, they are predicted too large for all of our pa-
rameterizations, which again points to deficiencies of the
central field. Altogether, this suggests that in spite of
its sensitivity to the coupling constants of the J2 terms,
the isotopic shift of m.s. charge radius between 40Ca and
48Ca should not be used to constrain them before one
has gained sufficient control over the central interaction.
A few further words of caution are in place. The charge
radii of all light nuclei are significantly increased by dy-
namical quadrupole correlations, see Fig. 23 of Ref. [54].
Correlations beyond the static self-consistent mean field
are also at the origin of the arch of the ms charge radii
between 40Ca and 48Ca that is neither reproduced by
any pure mean-field model, see again Fig. 11 in Ref. [29],
nor by the beyond-mean-field calculations with SLy4 of
Ref. [54], while the shell model allows for a satisfactory
description [119].
Many explanations have been put forward to explain
the kink in the isotopic shifts of Pb radii. As it qual-
itatively appears in relativistic mean-field models, but
not in non-relativistic ones using the standard spin-orbit
interaction (16), it has been used as a motivation to gen-
eralize the isospin mix of the standard spin-orbit energy
density functional, Eq. (18), to simulate the isospin de-
pendence of the relativistic Hartree models [78, 79]. The
resulting parameterizations are not completely satisfac-
tory, as the price for the improvement of the radii is a fur-
ther deterioration of spin-orbit splittings [14], while the
relativistic mean field gives a satisfactory description of
both. Some standard Skyrme interactions that take the
tensor terms from the central Skyrme force into account
also give a kink, but it is by far too small to reproduce
the experimental values [52].
Plotting the m.s. radii along the chain of Pb isotopes as
a function of N , the slopes are nearly linear when looking
separately at the isotopes below and above 208Pb. We
will concentrate on the change in the slope at 208Pb that
is brought by the tensor terms, which can be quantified
through the second finite difference of the m.s. radii at
208Pb
∆2〈r2ch〉(
208Pb) (47)
r2ch(
206Pb)− 2 r2ch(
208Pb) + r2ch(
210Pb)
There are two conflicting values to be found in the lit-
erature, either 46.4± 1.4 fm2 [118] and the significantly
larger 59 ± 3 fm2 [120]. Figure 34 shows the change of
slope around 208Pb as defined through Eq. (47) as a func-
tion of the like-particle tensor coupling constant α and
for three different values of β. It is striking to see that
this quantity is almost independent of the neutron-proton
tensor coupling constant β, so the change is mainly in-
duced by the tensor interaction between particles of the
same kind. It has been noted before that the kink in the
isotopic shift of the charge radii in Pb isotopes is corre-
lated to the single-particle spectrum of neutrons above
N = 126, in particular the position of the 1i11/2 level.
(This has to be contrasted with the Ca isotopic chain
discussed above, where the difference of charge radii be-
tween 40Ca and 48Ca appears to be particularly sensi-
tive to the single-particle spectrum of the protons.) The
closer the 1i11/2 level is to the 2g9/2 level that is filled
above N = 126, the more the 1i11/2 becomes occupied
through pairing correlations. Through the shape of its
radial wave function, the partial filling of the nodeless
1i11/2 increases the neutron radius faster than filling only
the 2g9/2, and in particular faster than for the isotopes
below N = 126. As the protons follow the density distri-
bution of the neutrons, the charge radius grows rapidly
beyond N = 126. This offers an explanation why the
kink increases with the like-particle tensor term coupling
constant α: for large values of the weight α of the neu-
tron spin-orbit current in the neutron spin-orbit poten-
tial, Eq. (35), the spin-orbit splitting of the ν 1i levels is
reduced such that the 1i11/2 approaches the 2g9/2 level
in 208Pb, see Fig. 18.
While the kink is clearly sensitive to the tensor terms,
they cannot be responsible for the entire effect, as
even for extreme parameterizations that give unrealistic
single-particle spectra the calculated kink hardly reaches
about three quarters of its experimental value.
V. SUMMARY AND CONCLUSIONS
We have reported a systematic study of the effects of
the J2 (tensor) terms in the Skyrme energy functional for
spherical nuclei. The aim of the present study was not to
obtain a unique best fit of the Skyrme energy functional
with tensor terms, but to analyze the impact of the tensor
terms on a large variety of observables in calculations
at a pure mean-field level and to identify, if possible,
observables that are particularly, even uniquely, sensitive
to the J2 terms. To reach our goal, we have built a set
of 36 parameterizations that cover the two-dimensional
parameter space of the coupling constants of the J2t terms
that does not give obviously unphysical predictions for a
wide variety of observables we have looked at. The fits
were performed using a protocol very similar to that of
the SLy parameterizations [51, 52]. The 36 actual sets of
parameters can be found in the Physical Review archive
[85].
We use a formalism that explicitly relates the tensor
terms in the energy functional to underlying effective
density-dependent central, spin-orbit and tensor forces
(or vertices) in the particle-hole channel. As has been
known for long, a zero-range tensor force gives no qual-
itatively new terms for spherical mean-field states when
combined with a central Skyrme force, but solely modifies
the coupling constants of the J2 terms that are already
present. The contribution from the central Skyrme force
to the coupling constants of the J2 terms depends on the
same parameters t1, x1, t2 and x2 that determine the ef-
fective mass and contribute to the surface terms. As the
latter terms are much more important for the description
of bulk properties than the J2 terms, the coupling con-
stants of the J2 terms are confined to a very small region
of the parameter space. From this point of view, adding
a tensor force is necessary to explore it fully.
There is, however, the alternative interpretation of the
Skyrme energy functional from the density matrix ex-
pansion, which in the absence of ab-initio realizations so
far is used as a motivation to set up energy functionals
with independent, and phenomenologically fitted, cou-
pling constants of all terms not constrained by symme-
tries. In particular, this can be used to set unwanted or
underconstrained terms to zero, as it is done for many
existing parameterizations of the (central) Skyrme in-
teraction. For the ground states of spherical nuclei, as
discussed here, the frameworks cannot be distinguished.
For deformed nuclei and, in particular, polarized nuclear
matter, this choice will make a difference.
As a result of our study, we have obtained a long list
of potential deficiencies of the Skyrme energy functional,
most of which can be expected to be related to the prop-
erties of the central and spin-orbit interactions used. In
fact, these deficiencies become more obvious the moment
one adds a tensor force, as it appears that the presence of
a tensor force unbalances a delicate compromise within
various terms of the Skyrme interaction that permits to
get the global trend of gross features of the shell structure
right.
Our conclusions, however, have to be taken with a
grain of salt. On the one hand, some might depend on
the fit protocol; and on the other hand, we have to stress
that (within the framework of our study – and all others
available so far using mean-field methods) the compari-
son between calculated and empirical single-particle en-
ergies is not straightforward and without the risk of being
misled. However, without even looking at single-particle
spectra, we find that
1. The presence of the tensor terms leads to a strong
rearrangement of the other coupling constants,
most notably that of the spin-orbit force. In fact,
we find that the variation of the spin-orbit strength
W0 provoked by the presence of tensor terms has
a larger impact on the global systematics of single-
particle spectra than the tensor terms themselves.
The rearrangement of the parameters of the central
and spin-orbit parts of the effective interaction sug-
gests that perturbative studies of the tensor terms,
in which they are added to an existing parameter-
ization without readjustment, allow only very lim-
ited conclusions.
2. In the Skyrme energy functional, the combined cou-
pling constants of the spin-orbit and tensor terms
are nearly exclusively fixed by the mass differences
between 40Ca, 48Ca and 56Ni. This correlation ap-
pears to be (at least partly) spurious, the rapidly
varying spin-orbit and tensor terms being misused
to simulate missing physics in the standard Skyrme
functional.
3. The cost function χ2 used in our fit protocol prefers
parameterizations with β = 0, i.e. pure like-particle
tensor terms ∼ (J2n + J
p), without giving a clear
preference for a value of the corresponding coupling
constant α. By contrast, the mass residuals of 134
spherical even-even nuclei are minimized for inter-
actions with large α and β. However, and as we
will discuss in [41], the deformation properties of
many nuclei obtained with the latter parameteriza-
tions are unrealistic, which disfavors this region of
the parameter space.
4. The difference of the charge radii of 40Ca and 48Ca
turns out to be particularly sensitive to the abso-
lute single-particle energy of the proton 1d3/2 level,
which can be moved around by the J2 terms. As
the parameterizations that give the best agreement
for the absolute placement of this level do not nec-
essarily give the best overall single-particle spectra
for these two nuclei, this quantity should not be
used to constrain the J2 terms.
Concerning the global properties of the spin-orbit current
J and its contribution to the spin-orbit potential, we have
shown that
1. The spin-orbit current J in non-spin-saturated
doubly-magic nuclei as 56Ni, 100Sn, 132Sn or 208Pb
is dominated by the nodeless intruder orbitals.
Through the contribution of the tensor terms to
the spin-orbit field, the feedback effect on their own
spin-orbit splitting is maximized.
2. In light nuclei, J and consequently the contribu-
tion of the J2 terms to the binding energy and the
spin-orbit potential, vary rapidly between near-zero
and very large values when adding just a few nucle-
ons to a given nucleus. In heavy spherical nuclei,
the variation becomes much slower and smoother
as on the one hand one does not encounter spin-
saturated configurations anymore, and on the other
hand there are more and more high-ℓ states with
large degeneracy that require more nucleons to be
filled.
3. The contribution from the zero-range spin-orbit
force to the spin-orbit potential is peaked at the
nuclear surface, as it is proportional to the gradi-
ent of the density. By contrast, the contribution
from the zero-range tensor terms is peaked further
inside of the nucleus, modifying the width of the
spin-orbit potential with varying nucleon numbers.
As shown in Ref. [48], experimental data tend to
dislike such a modification.
4. Large negative coupling constants of the tensor
terms will lead to instabilities, where a nucleus
gains energy separating the levels from many spin-
orbit partners on both sides of the Fermi energy.
This process leads to unphysical single-particle
spectra and rules out a large part of the parameter
space. In particular cases, one might even obtain
a (probably spurious) coexistence of two spherical
configurations with different shell structure in the
same nucleus, which are separated by a barrier.
The main motivation to add J2 terms is of course to im-
prove the single-particle spectra. All observations and
conclusions concerning those have to be taken with care,
as in this study we compare the eigenvalues of a spher-
ical single-particle Hamiltonian with the separation en-
ergy to low-lying states in the odd-A neighbors of doubly
and semi-magic nuclei (as was done in all existing earlier
studies). When looking at the single-particle spectra in
doubly-magic nuclei (or semi-magic nuclei combined with
a strong subshell closure of the other species) we find that
1. The relative error of the spin-orbit splittings de-
pends strongly on the principal quantum number of
the orbitals within a given shell, such that for pa-
rameterizations without the tensor terms the split-
tings of the intruder state (without nodes in the ra-
dial wave function) is tentatively too small, while it
becomes too large with increasing number of nodes.
Adding the tensor terms further increases the dis-
crepancy. This problem can only be resolved by
an improved control over the shape of the spin-
orbit potential. Indeed, the size of the spin-orbit
splittings is related to the overlap of the radial
wave function of a given single-particle state with
the spin-orbit potential. The tensor terms mod-
ify the width of the spin-orbit potential, but to
cure this deficiency calls for a large negative like-
particle tensor coupling constant α, which is not
consistent with the evolution of spin-orbit splittings
along chains of semi-magic nuclei, and will lead to
instabilities.
2. We also find that, in a given nucleus, the predicted
spin-orbit splittings of neutron levels are larger
than those of the protons when both are compared
to experiment, which hints at an unresolved isospin
trend in the spin-orbit interaction.
3. For spin-saturated doubly-magic nuclei as 16O and
40Ca, the spin-orbit splittings of the spin-saturated
species of nucleons depends strongly on the cou-
pling constants of the J2 terms, although they do
not contribute to the spin-orbit field. This is a con-
sequence of the strong correlation between the spin-
orbit and tensor term coupling constants, which try
to compensate each other in spin-unsaturated nu-
clei. For parameterizations with strong tensor-term
coupling constants, the resulting spin-orbit force
leads to unrealistic single-particle spectra of spin-
saturated configurations.
4. The centroid of the spin-orbit partners that give
the intruder state is tentatively too high compared
to the major shell below.
The main effect of the tensor terms, that most of the
recent studies concentrate on, is the evolution of spin-
orbit splittings with N and Z. Unfortunately, there are
no data for the splittings themselves, such that one re-
lies on data for the evolution of the distance of two lev-
els with different ℓ. The comparison is compromised by
the global deficiencies of the single-particle spectra listed
above. Still, a careful comparison of calculations and
experiment suggests that
1. The evolution of the proton 1h11/2, 1g7/2 and 2d5/2
levels in the chain of Sn isotopes and that of the
proton 1f5/2 and 2p3/2 levels in Ni isotopes call for
a positive proton-neutron tensor coupling constant
β with a value around 120 MeV fm5, consistent
with the findings of Refs. [48, 49, 50].
2. The evolution of the neutron 1d3/2 and 2s1/2 levels
between 40Ca and 48Ca calls for a like-particle ten-
sor coupling constant α with a similar value around
120 MeV fm5. This it at variance to the findings of
Refs. [48, 49, 50], but in qualitative agreement with
the parameterization skxta of Brown et al. [48] for
which the tensor terms were derived from a realis-
tic interaction but disregarded thereafter because
of its poor description of spin-orbit splittings.
3. Combined this leads to a dominantly isoscalar ten-
sor term with a coupling constant CJ0 around 120
MeV fm5, while the isovector coupling constant will
have a small, near-zero, value.
Our study is obviously only a stepping stone towards
improved parameterizations of the Skyrme energy den-
sity functional. There are a number of necessary further
studies and future theoretical developments
1. The deformation properties of selected parameter-
izations TIJ from this study will be discussed in a
forthcoming paper [41].
2. The influence of the terms depending on time-odd
densities and currents in the complete energy func-
tional (23) on nuclear matter and finite nuclei (rota-
tional bands etc) is under investigation as well. The
existing stability criteria of polarized matter have
to be generalized as the tensor force introduces new
unique terms, for example in the Landau parame-
ters [121].
3. It is well known that the strength of the spin-
orbit force has to scale with the effective mass
of an interaction, which in turn determines the
average density of single-particle levels. All pa-
rameterizations discussed here have a similar effec-
tive mass close to m∗0/m = 0.7 that was already
used for the SLy parameterizations. This value is
somewhat smaller than the one obtained from ab-
initio calculations. We have checked that increas-
ing the effective isoscalar mass to the more realistic
m∗0/m = 0.8 (which within our fit protocol requires
to use two density dependent terms [84]) does not
significantly affect any of our conclusions.
4. It is evident that improvements of the central and
spin-orbit parts of the energy density functional are
necessary, which will require a generalization of its
functional form. Other motivations were found re-
cently to perform such a generalization [84].
5. The only quantity that we found sufficiently sen-
sitive to the tensor terms is the evolution of the
distance between single-particle levels in isotopic
or isotonic chains of semi-magic nuclei. The dis-
tance between the levels that can be used for such
studies is so large, that it might be compromised
by their coupling to collective excitations. Reliable
calculations including pairing, polarization as well
as particle-vibration coupling effects [89, 90] along
isotopic and isotonic chains are needed to test the
quality, reliability and limits of the simplistic iden-
tification of the eigenvalues of the spherical mean-
field Hamiltonian in an even-even nucleus with the
separation energy to or from low-lying states in the
adjacent odd-A nuclei.
Acknowledgments
We thank P. Bonche, H. Flocard, P.-H. Heenen and
B. A. Brown for stimulating and encouraging discus-
sions. Work by M. B. and K. B. was performed within
the framework of the Espace de Structure Nucléaire
Théorique (ESNT). T. L. acknowledges the hospitality
of the SPhN and ESNT on many occasions during the
realization of this work. This work was supported by
the U.S. National Science Foundation under Grant No.
PHY-0456903.
APPENDIX A: COUPLING CONSTANTS OF
THE SKYRME ENERGY FUNCTIONAL
The coupling constants of the central Skyrme energy
density functional in terms of the parameters of the cen-
tral Skyrme force are given by
0 (r)
1 = −
ρα0 (r)
As0 = −
ρα0 (r)
As1 = −
0 (r)
Aτ0 =
Aτ1 = −
AT0 = −
AT1 = −
0 = −
A∆s0 =
A∆s1 =
t2 . (A1)
The coupling constants of the spin-orbit energy density
functional in terms of the parameters of the spin-orbit
force are given by
A∇J0 = −
A∇J1 = −
W0 . (A2)
The coupling constants of the tensor energy density func-
tional in terms of the parameters of Skyrme’s tensor force
are given by (Table I in [59])
BT0 = −
(te + 3to) B
(te − to) (A3)
BF0 =
(te + 3to) B
1 = −
(te − to) (A4)
B∆s0 =
(te − to) B
1 = −
(3te + to) (A5)
B∇s0 =
(te − to) B
1 = −
(3te + to) . (A6)
APPENDIX B: PHASE TRANSITIONS
The densities ρ and τ entering the energy functional
(28) vary smoothly with nucleon numbers as they fol-
-1025
-1020
-1015
-1010
 0  0.1  0.2  0.3  0.4  0.5  0.6  0.7
FIG. 35: Total binding energy of 120Sn as a function of C =
r Jn ·∇ρn in a constrained calculation. The dashed curve
shows results obtained with the parameterization mentioned
in the text, while the solid curve shows results obtained with
SLy5.
low the geometric growth of the nucleus. As a result, a
functional depending only on ρ and τ usually shows a
unique minimum for given N , Z and shape. The situ-
ation is quite different when the tensor terms are taken
into account. Indeed, the amplitude of the spin-orbit
current density J (27) depends on the number of spin-
unsaturated single-particle states in the nucleus; it varies
from (almost) zero in spin-saturated nuclei to large finite
values as a consequence of shell and finite-size effects, see
Fig. 5.
This behavior poses the risk of an instability, which
was already reported in [5]: multiplying J with a large
coupling constant in the spin-orbit potential (35) might,
for certain combinations of the signs of the coupling con-
stant and the spin-orbit currents of protons and neutrons,
increase the spin-orbit splittings. In some nuclei, this will
cause two levels originating from different ℓ shells to ap-
proach the Fermi energy, one from above and the other
from below, or even to cross. In that situation, their
occupation numbers will change such that J increases
further, which feeds back onto the spin-orbit potential
and ultimately leads to a dramatic rearrangement of the
single-particle spectrum.
We faced this problem when attempting to fit param-
eter sets with large negative CJ0 and C
1 . During the fit,
some nuclei sometimes fell into the instability, depending
on the values of the other coupling constants. As this is
a highly nonlinear threshold effect that results in a very
large energy gain from tiny modifications of the coupling
constants, the corresponding fits did not, and could not,
converge.
In special cases, one might even run into a situation
with two coexisting minima, where as a function of a
suitable coordinate the configuration with regular shell
structure is separated from a configuration with unphys-
ical large spin-orbit splittings by a barrier. In such a
120Sn
SLy5 (a)
SLy5 (a) (b)
1g9/2
2d5/2
1g7/2
3s1/2
2d3/2
1h11/2
2f7/2
3p3/2
3p1/2
2p3/2
2p1/2
1f5/2
1g9/2
2d5/2
3s1/2
1h11/2
2d3/2
1g7/2
FIG. 36: Single-particle spectra corresponding to the mini-
mum found with SLy5 and (a) the secondary minimum found
with TXX , (b) the absolute minimum (see Fig. 35; left: neu-
tron levels, right: proton levels).
case, a calculation of the ground state might converge
into one or the other minimum depending on the initial
conditions chosen for the iterative solution of the HFB
equations. In a calculation along an isotopic or isotonic
chain, the coexistence will reveal itself through a large
scattering of the mass residuals, which will fall on two
distinct curves. We illustrate this phenomenon in Fig. 35
for 120Sn using a parameter set denoted “TXX” with
CJ0 = −157.57 MeV fm
5 and CJ1 = −114.88 MeV fm
which is located outside the parameter space shown in
Fig. 1, to its lower left. Among the various possible
recipes for a constraint on the spin-orbit current density,
we chose to minimize the following quantity:
E[ρ]− µ
d3r Jn · ∇ρn − C
where µ is a Lagrange parameter and C is a constant used
to tune the constraint. The energy curve exhibits two
minima denoted (a) and (b). The corresponding single-
particle spectra are shown in Fig. 36 along with those
obtained for SLy5. The minimum (a) corresponds to an
almost spin saturated neutron configuration where both
spin partners are either occupied or empty,4 which is very
similar to what is found using SLy5. In the minimum (b),
which is deeper by more than 7 MeV, the single-particle
spectrum is completely reorganized in order to maximize
the spin-orbit current density and take advantage of its
contribution in the functional. In this situation the neu-
tron spin doublets 2d, 1g and 1h split on both sides of
the Fermi surface and generate a large spin-orbit current
density.
This clearly shows that the parameter sets with large
and negative coupling constants of the J2 terms must
be discarded since for many nuclei they lead to ground
states with unrealistic single-particle structure.
Note that this kind of instability does not appear for
the spin-orbit term: although its contribution to the en-
ergy functional (28) also varies between small, sometimes
near-zero, and very large values, see Figs. 28 and 29, it
is only linear in J. As a consequence, its contribution to
the spin-orbit potential (35) lacks the feedback mecha-
nism outlined above as it does not scale with J. Still, its
contribution to the total energy is usually much larger
than that of the J2 terms, so it plays a decisive role for
the absolute energy gained when varying J.
[1] M. Goeppert Mayer, Phys. Rev. 74, 235 (1948).
[2] O. Haxel, J. H. D. Jensen, and H. E. Suess, Phys. Rev.
75, 1766 (1949).
[3] E. Feenberg and K. C. Hammack, Phys. Rev. 75, 1877
(1949).
[4] M. Goeppert Mayer, Phys. Rev. 75, 1969 (1949).
[5] M. Beiner, H. Flocard, Nguyen Van Giai, and
P. Quentin, Nucl. Phys. A238, 29 (1975).
[6] J. Dobaczewski, I. Hamamoto, W. Nazarewicz, and
J. A. Sheikh, Phys. Rev. Lett. 72, 981 (1994).
[7] G. A. Lalazissis, D. Vretenar, W. Pöschl, and P. Ring,
Phys. Lett. B418, 7 (1998).
[8] G. A. Lalazissis, D. Vretenar, W. Pöschl, and P. Ring,
Nucl. Phys. A632, 363 (1998).
[9] B. Chen, J. Dobaczewski, K. L. Kratz, K. Langanke,
B. Pfeiffer, F.-K. Thielemann, and P. Vogel, Phys. Lett.
B355, 37 (1995).
[10] J. Dobaczewski, A. Nazarewicz, and T. R. Werner,
Phys. Scr. T56, 15 (1995).
[11] J. M. Pearson, R. C. Nayak, and S. Goriely, Phys. Lett.
B387, 455 (1996).
[12] B. Pfeiffer, K.-L. Kratz, and F.-K. Thielemann, Z. Phys.
A357, 235 (1997).
[13] J. Dechargé, J. F. Berger, K. Dietrich, and M. S. Weiss,
Phys. Lett. B451, 275 (1999).
[14] M. Bender, K. Rutz, P.-G. Reinhard, J. A. Maruhn, and
W. Greiner, Phys. Rev. C 60, 034304 (1999).
[15] K. Langanke, J. Terasaki, F. Nowacki, D. J. Dean, and
W. Nazarewicz, Phys. Rev. C 67, 044314 (2003).
[16] R. B. Wiringa, V. G. J. Stoks, and R. Schiavilla, Phys.
Rev. C 51, 38 (1995).
[17] R. Machleidt, Phys. Rev. C 63, 024001 (2001).
[18] S. C. Pieper and R. B. Wiringa, Ann. Rev. Nucl. Part.
Sci. 51, 53 (2001).
[19] P. Navrátil and W. E. Ormand, Phys. Rev. C 68, 034305
(2003).
[20] J. M. Eisenberg and W. Greiner, Nuclear Theory. III.
Microscopic theory of the nucleus (Second printing,
North Holland Physics Publ., Elsevier Science Publish-
ers, Amsterdam, 1976).
[21] S. G. Nilsson and I. Ragnarsson, Shapes and Shells in
Nuclear Structure (Cambridge University Press, Cam-
bridge, England, 1995).
[22] T. Neff and H. Feldmeier, Nucl. Phys. A713, 311
(2003).
[23] R. Roth, T. Neff, H. Hergert, and H. Feldmeier, Nucl.
Phys. A745, 3 (2004).
[24] H. A. Bethe, Phys. Rev. 167, 879 (1968).
[25] J. W. Negele, Phys. Rev. C 1, 1260 (1970).
[26] R. R. Scheerbaum, Phys. Lett. B63, 381 (1976).
[27] A. L. Goodman and J. Borysowicz, Nucl. Phys. A295,
333 (1978).
[28] D. C. Zheng and L. Zamick, Ann. Phys. (NY) 206, 106
(1991).
[29] M. Bender, P.-H. Heenen, and P.-G. Reinhard, Rev.
Mod. Phys. 75, 121 (2003).
[30] T. H. R. Skyrme, Phil. Mag. 1, 1043 (1956).
[31] T. H. R. Skyrme, Nucl. Phys. 9, 615 (1958).
[32] J. S. Bell and T. H. R. Skyrme, Phil. Mag. 1, 1055
(1956).
[33] T. H. R. Skyrme, Nucl. Phys. 9, 635 (1958).
[34] D. Gogny, Nucl. Phys. A237, 399 (1975).
[35] J. Dechargé and D. Gogny, Phys. Rev. C 21, 1568
(1980).
[36] D. Vautherin and D. M. Brink, Phys. Rev. C 5, 626
(1972).
[37] F. Stancu, D. M. Brink, and H. Flocard, Phys. Lett.
B68, 108 (1977).
[38] F. Tondeur, Phys. Lett. B123, 139 (1983).
[39] K.-F. Liu, H. Luo, Z. Ma, Q. Shen, and S. A.
Moszkowski, Nucl. Phys. A534, 1 (1991).
[40] N. Onishi and J. W. Negele, Nucl. Phys. A301, 336
(1978).
[41] M. Bender, K. Bennaceur, T. Duguet, P.-H. Heenen,
T. Lesinski, and J. Meyer (2007), companion paper, in
preparation.
[42] W.-H. Long, Nguyen Van Giai, and J. Meng, Phys. Lett.
B640, 150 (2006).
[43] M. S. Fayache, L. Zamick, and B. Castel, Phys. Rep.
290, 201 (1997).
[44] T. Otsuka, T. Suzuki, R. Fujimoto, H. Grawe, and
Y. Akaishi, Phys. Rev. Lett. 95, 232502 (2005).
[45] J. P. Schiffer, S. J. Freeman, J. A. Caggiano, C. Deibel,
A. Heinz, C.-L. Jiang, R. Lewis, A. Parikh, P. D. Parker,
K. E. Rehm et al., Phys. Rev. Lett. 92, 162501 (2004).
[46] T. Otsuka, T. Matsuo, and D. Abe, Phys. Rev. Lett.
97, 162501 (2006).
[47] J. Dobaczewski, in Proceedings of the Third
ANL/MSU/JINA/INT RIA Workshop, edited by
T. Duguet, H. Esbensen, K. M. Nollett, and C. D.
Roberts (World Scientific, 2006), vol. 15 of Proceed-
ings from the Institute for Nuclear Theory, preprint
nucl-th/0604043.
[48] B. A. Brown, T. Duguet, T. Otsuka, D. Abe, and
T. Suzuki, Phys. Rev. C 74, 061303(R) (2006).
[49] G. Colò, H. Sagawa, S. Fracasso, and P. F. Bortignon,
Phys. Lett. B646, 227 (2007).
[50] D. M. Brink and F. Stancu, Phys. Rev. C 75, 064311
(2007).
[51] E. Chabanat, P. Bonche, P. Haensel, J. Meyer, and
R. Schaeffer, Nucl. Phys. A627, 710 (1997).
[52] E. Chabanat, P. Bonche, P. Haensel, J. Meyer, and
R. Schaeffer, Nucl. Phys. A635, 231 (1998), erratum
Nucl. Phys. A643, 441 (1998).
[53] W. H. Long, H. Sagawa, J. Meng, and Nguyen Van Giai
(2006), preprint nucl-th/0609076.
[54] M. Bender, G. F. Bertsch, and P.-H. Heenen, Phys. Rev.
C 73, 034322 (2006).
[55] M. Bender, P. Bonche, T. Duguet, and P. H. Heenen,
Nucl. Phys. A723, 354 (2003).
[56] M. Bender, P. Bonche, and P. H. Heenen, Phys. Rev. C
74, 024312 (2006).
[57] A. Chatillon, C. Theisen, P. T. Greenlees, G. Auger,
J. E. Bastin, E. Bouchez, B. Bouriquet, J. M. Casand-
jian, R. Cee, E. Clément et al., Eur. Phys. J. A30, 397
(2006).
[58] J. C. Slater, Phys. Rev. 81, 385 (1951).
[59] E. Perlińska, S. G. Rohoziński, J. Dobaczewski, and
W. Nazarewicz, Phys. Rev. C 69, 014316 (2004).
[60] J. Dobaczewski, J. Dudek, S. G. Rohoziński, and T. R.
Werner, Phys. Rev. C 62, 014310 (2000).
[61] J. Dobaczewski and J. Dudek, Acta Phys. Pol. B27, 45
(1996).
[62] Y. M. Engel, D. M. Brink, K. Goeke, S. J. Krieger, and
D. Vautherin, Nucl. Phys. A249, 215 (1975).
[63] J. Dobaczewski and J. Dudek, Phys. Rev. C 52, 1827
(1995), erratum Phys. Rev. C 55, 3177 (1997).
[64] H. Flocard, Ph.D. thesis, Orsay, Série A, No. 1543, Uni-
versité Paris Sud (1975).
[65] J. Dobaczewski, H. Flocard, and J. Treiner, Nucl. Phys.
A422, 103 (1984).
[66] J. W. Negele and D. Vautherin, Phys. Rev. C 5, 1472
(1972).
[67] J. W. Negele and D. Vautherin, Phys. Rev. C 11, 1031
(1975).
[68] X. Campi and A. Bouyssy, Phys. Lett. 73B, 263 (1978).
[69] H. Krivine, J. Treiner, and O. Bohigas, Nucl. Phys.
A336, 155 (1980).
[70] J. Bartel, P. Quentin, M. Brack, C. Guet, and H. B.
Hakansson, Nucl. Phys. A386, 79 (1982).
[71] F. Tondeur, M. Brack, M. Farine, and J. M. Pearson,
Nucl. Phys. A420, 297 (1984).
[72] J. Friedrich and P. G. Reinhard, Phys. Rev. C 33, 335
(1986).
[73] P. G. Reinhard, D. J. Dean, W. Nazarewicz,
J. Dobaczewski, J. A. Maruhn, and M. R. Strayer, Phys.
Rev. C 60, 014316 (1999).
[74] P. Bonche, H. Flocard, and P. H. Heenen, Nucl. Phys.
A467, 115 (1987).
[75] J. Engel, M. Bender, J. Dobaczewski, W. Nazarewicz,
and R. Surman, Phys. Rev. C 60, 014302 (1999).
[76] M. Bender, J. Dobaczewski, J. Engel, and
W. Nazarewicz, Phys. Rev. C 65, 054322 (2002).
[77] J. Terasaki, J. Engel, M. Bender, J. Dobaczewski,
W. Nazarewicz, and M. V. Stoitsov, Phys. Rev. C 71,
034310 (2005).
[78] M. M. Sharma, G. Lalazissis, J. König, and P. Ring,
Phys. Rev. Lett. 74, 3744 (1995).
[79] P. G. Reinhard and H. Flocard, Nucl. Phys. A584, 467
(1995).
[80] A. Akmal, V. R. Pandharipande, and D. G. Ravenhall,
Phys. Rev. C58, 1804 (1998).
[81] S. Goriely, M. Samyn, J. M. Pearson, and M. Onsi, Nucl.
Phys. A750, 425 (2005).
[82] B. A. Brown, private communication.
[83] K. F. Liu and G. E. Brown, Nucl. Phys. A265, 385
(1976).
[84] T. Lesinski, K. Bennaceur, T. Duguet, and J. Meyer,
Phys. Rev. C 74, 044315 (2006).
[85] See EPAPS Document No. ???? for the coupling con-
stants of the 36 TIJ parametrizations.
[86] T. Duguet, K. Bennaceur, and P. Bonche (2006), invited
talk at the Workshop on New developments in Nuclear
Self-Consistent Mean-Field Theories, Yukawa Institute
for Theoretical Physics, Kyoto, Japan, May 30 - June
1, 2005, nucl-th/0508054.
[87] K. Bennaceur and J. Dobaczewski, Comp. Phys. Comm.
168, 96 (2005).
[88] K. Rutz, M. Bender, P. G. Reinhard, J. A. Maruhn, and
W. Greiner, Nucl. Phys. A634, 67 (1998).
[89] V. Bernard and Nguyen Van Giai, Nucl. Phys. A348,
75 (1980).
[90] E. Litvinova and P. Ring, Phys. Rev. C 73, 044328
(2006).
[91] G. F. Bertsch, The Practitioner’s Shell model (North
Holland, Amsterdam, 1972).
[92] P. Ring and P. Schuck, The Nuclear Many Body Problem
(Springer, Berlin, 1980).
[93] E. Caurier, G. Martinez-Pinedo, F. Nowacki, A. Poves,
and A. P. Zuker, Rev. Mod. Phys. 77, 427 (2005).
[94] J. Duflo and A. P. Zuker, Phys. Rev. C 59, R2347
(1999).
[95] H. Grawe, A. Blazhev, M. Górska, I. Mukha, C. Plet-
tner, E. Roeckl, F. Nowacki, R. Grzywacz, and M. Saw-
icka, Eur. Phys. J. A25, 357 (2005).
[96] H. Grawe, A. Blazhev, M. Górska, R. Grzywacz,
H. Mach, and I. Mukha, Eur. Phys. J. A27, 257 (2006).
[97] B. A. Brown, Phys. Rev. C 58, 220 (1998).
[98] M. López-Quelle, Nguyen Van Giai, S. Marcos, and
L. N. Savushkin, Phys. Rev. C 61, 064321 (2000).
[99] J. Skalski, Phys. Rev. C 63, 024312 (2001).
[100] K. Hauschild, M. Rejmund, H. Grawe, E. Caurier,
F. Nowacki, F. Becker, Y. Le Coz, W. Korten, J. Döring,
M. Górska et al., Phys. Rev. Lett. 87, 072501 (2001).
[101] J. Shergur, D. J. Dean, D. Seweryniak, W. B. Walters,
A. Wohr, P. Boutachkov, C. N. Davids, I. Dillmann,
A. Juodagalvis, G. Mukherjee et al., Phys. Rev. C 71,
064323 (2005).
[102] M. G. Porquet, S. Péru, and M. Girod, Eur. Phys. J.
A25, 319 (2005).
[103] Z. Patyk, A. Baran, J. F. Berger, J. Dechargé,
J. Dobaczewski, P. Ring, and A. Sobiczewski, Phys.
Rev. C 59, 704 (1999).
[104] D. Lunney, J. M. Pearson, and C. Thibault, Rev. Mod.
Phys. 75, 1021 (2003).
[105] J. Dobaczewski, M. V. Stoitsov, and W. Nazarewicz, in
Proc. Int. Conf. on Nuclear Physics, Large and Small,
Cocoyoc, Mexico, April 19-22, 2004, edited by R. Bi-
jker, R. F. Casten, and A. Frank (American Insti-
tute of Physics, Melville, NY, 2004), pp. 51–56, nucl-
th/0404077.
[106] W. Satu la, D. J. Dean, J. Gary, S. Mizutori, and
W. Nazarewicz, Phys. Lett. B407, 103 (1997).
[107] F. Tondeur, S. Goriely, J. M. Pearson, and M. Onsi,
Phys. Rev. C 62, 024308 (2000).
[108] M. Samyn, S. Goriely, P. H. Heenen, J. M. Pearson, and
F. Tondeur, Nucl. Phys. A700, 142 (2002).
[109] S. Goriely, M. Samyn, M. Bender, and J. M. Pearson,
Phys. Rev. C 68, 054325 (2003).
[110] P.-G. Reinhard, M. Bender, W. Nazarewicz, and
T. Vertse, Phys. Rev. C 73, 014309 (2006).
[111] P.-G. Reinhard and D. Drechsel, Z. Phys. A290, 85
(1979).
[112] M. Girod and P.-G. Reinhard, Nucl. Phys. A384, 179
(1982).
[113] P. Bonche, J. Dobaczewski, H. Flocard, and P. H. Hee-
nen, Nucl. Phys. A530, 149 (1991).
[114] P. H. Heenen, P. Bonche, J. Dobaczewski, and H. Flo-
card, Nucl. Phys. A561, 367 (1993).
[115] N. Tajima, P. Bonche, H. Flocard, P. H. Heenen, and
M. S. Weiss, Nucl. Phys. A551, 434 (1993).
[116] S. A. Fayans, S. V. Tolokonnikov, E. L. Trykov, and
D. Zawischa, Nucl. Phys. A676, 49 (2000).
[117] W. Bertozzi, J. Friar, J. Heisenberg, and J. W. Negele,
Phys. Lett. B41, 408 (1972).
[118] E. W. Otten, in Treatise on Heavy-Ion Science, edited
by A. D. Bromley (Plenum, New York, 1989), vol. 8.
Nuclei far from Stability, pp. 517–638.
[119] E. Caurier, K. Langanke, G. Martinez-Pinedo,
F. Nowacki, and P. Vogel, Phys. Lett. B522, 240 (2001).
[120] I. Angeli, Atom. Data Nucl. Data Tables 87, 185 (2004).
[121] P. Haensel and A. J. Jerzak, Phys. Lett. B112, 285
(1982).
ABSTRACT
  We perform a systematic study of the impact of the J^2 tensor term in the
Skyrme energy functional on properties of spherical nuclei. In the Skyrme
energy functional, the tensor terms originate both from zero-range central and
tensor forces. We build a set of 36 parameterizations, which covers a wide
range of the parameter space of the isoscalar and isovector tensor term
coupling constants, with a fit protocol very similar to that of the successful
SLy parameterizations. We analyze the impact of the tensor terms on a large
variety of observables in spherical mean-field calculations, such as the
spin-orbit splittings and single-particle spectra of doubly-magic nuclei, the
evolution of spin-orbit splittings along chains of semi-magic nuclei, mass
residuals of spherical nuclei, and known anomalies of charge radii. Our main
conclusion is that the currently used central and spin-orbit parts of the
Skyrme energy density functional are not flexible enough to allow for the
presence of large tensor terms.

<|endoftext|><|startoftext|>
Introduction
It is a pleasure to participate in the celebration of the seminal accomplish-
ments of Gabriele Veneziano. I will try to do so by reviewing a line of
research which is intimately connected with several of Gabriele’s important
contributions, being concerned with the cardinal problem of String Cosmol-
ogy: the fate of the Einstein-like space-time description at big crunch/big
bang cosmological singularities. Actually, the work, described below started
as a by-product of the string cosmology program initiated by M. Gasperini
1Contribution to “String Theory and Fundamental Interactions” – in celebration of
Gabriele Veneziano’s 65th birthday – eds. M. Gasperini and J. Maharana, Springer-Verlag,
Heidelberg, 2007.
http://arxiv.org/abs/0704.0732v1
and G. Veneziano [1]. While collaborating with Gabriele on the possible
birth of “pre-big bang bubbles” from the gravitational-collapse instability of
a generic string vacuum made of a stochastic bath of incoming gravitational
and dilatonic waves [2], an issue raised itself : what is the structure of a
generic spacelike (i.e. big crunch or big bang) singularity within the effec-
tive field theory approximation of (super-) string theory (when keeping all
fields, and not only the metric and the dilaton). The answer turned out to be
surprisingly complex, and rich of hidden structures. It was first found [3, 4]
that the general solution, near a spacelike singularity, of the massless bosonic
sector of all superstring models (D = 10, IIA, IIB, I, HE, HO), as well as
that of M theory (D = 11 supergravity), exhibits a never ending oscillatory
behaviour of the Belinsky-Khalatnikov-Lifshitz (BKL) type [5]. However, it
was later realized that behind this seeming entirely chaotic behaviour there
was a hidden symmetry structure [6, 7, 8]. This led to the conjecture of the
existence of a hidden equivalence (i.e. a correspondence) between two seem-
ingly very different dynamical systems: on the one hand, 11-dimensional
supergravity (or even, hopefully, “M-theory”), and, on the other hand, a
one-dimensional E10/K(E10) nonlinear σ model, i.e. the geodesic motion of
a massless particle on the infinite-dimensional coset space2 E10/K(E10) [8].
The intuitive hope behind this conjecture is that the BKL-type near space-
like singularity limit might act as a tool for revealing a hidden structure, in
analogy to the much better established AdS/CFT correspondence [9], where
the consideration of the near horizon limit of certain black D-branes has
revealed a hidden equivalence between 10-dimensional string theory in AdS
spacetime on one side, and a lower-dimensional CFT on the other side. If
the (much less firmly established) “gravity/coset correspondence” were con-
firmed, it might provide both the basis of a new definition of M-theory, and
a description of the “de-emergence” of space near a cosmological singularity
(see [10] and below).
2Here K(E10) denotes the (formal) “maximal compact subgroup” of the hyperbolic
Kac-Moody group E10.
2 Cosmological billiards
Let us start by summarizing the BKL-type analysis of the “near spacelike
singularity limit”, that is, of the asymptotic behaviour of the metric gµν(t,x),
together with the other fields (such as the 3-form Aµνλ(t,x) in supergravity),
near a singular hypersurface. The basic idea is that, near a spacelike singu-
larity, the time derivatives are expected to dominate over spatial derivatives.
More precisely, BKL found that spatial derivatives introduce terms in the
equations of motion for the metric which are similars to the “walls” of a
billiard table [5]. To see this, it is convenient [11] to decompose the D-
dimensional metric gµν into non-dynamical (lapse N , and shift N
i, here set
to zero) and dynamical (e−2β
, θai ) components. They are defined so that the
line element reads
ds2 = −N2dt2 +
θai θ
idxj . (1)
Here d ≡ D − 1 denotes the spatial dimension (d = 10 for SUGRA11, and
d = 9 for string theory), e−2β
represent (in an Iwasawa decomposition)
the “diagonal” components of the spatial metric gij, while the “off diagonal”
components are represented by the θai , defined to be upper triangular matrices
with 1’s on the diagonal (so that, in particular, det θ = 1).
The Hamiltonian constraint, at a given spatial point, reads (with Ñ ≡
det gij denoting the “rescaled lapse”)
H(βa, πa, P, Q)
Gabπaπb +
cA(Q,P, ∂β, ∂
2β, ∂Q) exp
− 2wA(β)
. (2)
Here πa (with a = 1, ..., d) denote the canonical momenta conjugate to the
“logarithmic scale factors” βa, while Q denote the remaining configuration
variables (θai , 3-form components Aijk(t,x) in supergravity), and P their
canonically conjugate momenta (P ia, π
ijk). The symbol ∂ denotes spatial
derivatives. The (inverse) metric Gab in Eq. (2) is the DeWitt “superspace”
metric induced on the β’s by the Einstein-Hilbert action. It endows the
d-dimensional3 β space with a Lorentzian structure Gab β̇
aβ̇b.
One of the crucial features of Eq. (2) is the appearance of Toda-like
exponential potential terms ∝ exp(−2wA(β)), where the wA(β) are linear
forms in the logarithmic scale factors: wA(β) ≡ wAa βa. The range of labels
A and the specific “wall forms” wA(β) that appear depend on the considered
model. For instance, in SUGRA11 there appear: “symmetry wall forms”
wSab(β) ≡ βb − βa (with a < b), “gravitational wall forms” w
abc(β) ≡ 2βa +
e 6=a,b,c
βe (a 6= b, b 6= c, c 6= a), “electric 3-form wall forms”, eabc(β) ≡
βa + βb + βc (a 6= b, b 6= c, c 6= a), and “magnetic 3-form wall forms”,
ma1....a6 ≡ βa1 + βa2 + ...+ βa6 (with indices all different).
One then finds that the near-spacelike-singularity limit amounts to con-
sidering the large β limit in Eq.(2). In this limit a crucial role is played
by the linear forms wA(β) appearing in the “exponential walls”. Actually,
these walls enter in successive “layers”. A first layer consists of a sub-
set of all the walls called the dominant walls wi(β). The effect of these
dynamically dominant walls is to confine the motion in β-space to a fun-
damental billiard chamber defined by the inequalities wi(β) > 0. In the
case of SUGRA11, one finds that there are 10 dominant walls: 9 of them
are the symmetry walls wS12(β), w
23(β), ..., w
910(β), and the 10th is an elec-
tric 3-form wall e123(β) = β
1 + β2 + β3. As noticed in [6] a remarkable
fact is that the fundamental cosmological billiard chamber of SUGRA11
(as well as type-II string theories) is the Weyl chamber of the hyperbolic
Kac-Moody algebra E10. More precisely, the 10 dynamically dominant wall
forms
wS12(β), w
23(β), ..., w
910(β), e123(β)
can be identified with the 10 sim-
ple roots {α1(h), α2(h), ..., α10(h)} of E10. Here h parametrizes a generic el-
ement of a Cartan subalgebra (CSA) of E10 . [Let us also note that for
Heterotic and type-I string theories the cosmological billiard is the Weyl
310 dimensional for SUGRA11; but the various superstring theories also lead to a 10
dimensional Lorentz space because one must add the (positive) kinetic term of the dilaton
ϕ ≡ β10 to the 9-dimensional DeWitt metric corresponding to the 9 spatial dimensions.
chamber of another rank-10 hyperbolic Kac-Moody algebra, namely BE10].
In the Dynkin diagram of E10, Fig. 1, the 9 “horizontal” nodes correspond
to the 9 symmetry walls, while the characteristic “exceptional” node sticking
out “vertically” corresponds to the electric 3-form wall e123 = β
1 + β2 + β3.
[The fact that this node stems from the 3rd horizontal node is then seen to
be directly related to the presence of the 3-form Aµνλ, with electric kinetic
energy ∝ giℓgjmgknȦijkȦℓmn].
α1 α2 α3 α4 α5 α6 α7 α8 α9
✐ ✐ ✐ ✐ ✐ ✐ ✐ ✐ ✐
Figure 1: Dynkin diagram of E10.
The appearance of E10 in the BKL behaviour of SUGRA11 revived an old
suggestion of B. Julia [12] about the possible role of E10 in a one-dimensional
reduction of SUGRA11. A posteriori, one can view the BKL behaviour as
a kind of spontaneous reduction to one dimension (time) of a multidimen-
sional theory. Note, however, that we are always discussing generic inho-
mogeneous 11-dimensional solutions, but that we examine them in the near-
spacelike-singularity limit where the spatial derivatives are sub-dominant:
∂x ≪ ∂t. Note also that the discrete E10(Z) was proposed as a U -duality
group of the full (T 10) spatial toroidal compactification of M-theory by Hull
and Townsend [13].
3 Gravity/Coset correspondence
Refs [8, 14] went beyond the leading-order BKL analysis just recalled by in-
cluding the first three “layers” of spatial-gradient-related sub-dominant walls
∝ exp(−2wA(β)) in Eq.(2). The relative importance of these sub-dominant
walls, which modify the leading billiard dynamics defined by the 10 dom-
inant walls wi(β), can be ordered by means of an expansion which counts
how many dominant wall forms wi(β) are contained in the exponents of the
sub-dominant wall forms wA(β), associated to higher spatial gradients. By
mapping the dominant gravity wall forms wi(β) onto the corresponding E10
simple roots αi(h), i = 1, ..., 10, the just described BKL-type gradient ex-
pansion becomes mapped onto a Lie-algebraic height expansion in the roots
of E10. It was remarkably found that, up to height 30 (i.e. up to small
corrections to the billiard dynamics associated to the product of 30 leading
walls e−2wi(β)), the SUGRA11 dynamics for gµν(t,x), Aµνλ(t,x) considered
at some given spatial point x0, could be identified to the geodesic dynam-
ics of a massless particle moving on the (infinite-dimensional) coset space
E10/K(E10). Note the “holographic” nature of this correspondence between
an 11-dimensional dynamics on one side, and a 1-dimensional one on the
other side.
A point on the coset space E10(R)/K(E10(R)) is coordinatized by a time-
dependent (but spatially independent) element of the E10(R) group of the
(Iwasawa) form: g(t) = exp h(t) exp ν(t). Here, h(t) = βacoset(t)Ha belongs
to the 10-dimensional CSA of E10, while ν(t) =
α>0 ν
α(t)Eα belongs to a
Borel subalgebra of E10 and has an infinite number of components labelled
by a positive root α of E10. The (null) geodesic action over the coset space
E10/K(E10) takes the simple form
SE10/K(E10) =
(vsym|vsym) (3)
where vsym ≡ 1
(v + vT ) is the “symmetric”4 part of the “velocity” v ≡
(dg/dt)g−1 of a group element g(t) running over E10(R).
The correspondence between the gravity, Eq. (2), and coset, Eq. (3), dy-
namics is best exhibited by decomposing (the Lie algebra of) E10 with respect
4Here the transpose operation T denotes the negative of the Chevalley involution ω
defining the real form E10(10) of E10. It is such that the elements k of the Lie sub-algebra
ofK(E10) are “T -antisymmetric”: k
T = −k, which is equivalent to them being fixed under
ω : ω(k) = +ω(k).
to (the Lie algebra of) the GL(10) subgroup defined by the horizontal line in
the Dynkin diagram of E10. This allows one to grade the various components
of g(t) by their GL(10) level ℓ. One finds that, at the ℓ = 0 level, g(t) is
parametrized by the Cartan coordinates βacoset(t) together with a unimodu-
lar upper triangular zehnbein θacoset i(t). At level ℓ = 1, one finds a 3-form
Acosetijk (t); at level ℓ = 2, a 6-form A
coset
i1i2...i6
(t), and at level ℓ = 3 a 9-index
object Acoseti1|i2...i9(t) with Young-tableau symmetry {8, 1}. The coset action
(3) then defines a coupled set of equations of motion for βacoset(t), θ
coset i(t),
Acosetijk (t), A
coset
i1...i6
(t), Acoseti1|i2...i9(t). By explicit calculations, it was found that
these coupled equations of motion could be identified (modulo terms corre-
sponding to potential walls of height at least 30) to the SUGRA11 equations
of motion, considered at some given spatial point x0.
The dictionary between the two dynamics says essentially that:
(0) βagravity(t,x0) ↔ βacoset(t) , θai (t,x0) ↔ θacoset i(t), (1) ∂t Acosetijk (t) corre-
sponds to the electric components of the 11-dimensional field strength Fgravity
= dAgravity in a certain frame e
i, (2) the conjugate momentum of Acoseti1...i6(t)
corresponds to the dual (using εi1i2...i10) of the “magnetic” frame compo-
nents of the 4-form Fgravity = dAgravity, and (3) the conjugate momentum of
Ai1|i2...i9(t) corresponds to the ε
10 dual (on jk) of the structure constants C ijk
of the coframe ei (d ei = 1
C ijk e
j ∧ ek).
The fact that at levels ℓ = 2 and ℓ = 3 the dictionary between supergrav-
ity and coset variables maps the first spatial gradients of the SUGRA variables
Aijk(t,x) and gij(t,x) onto (time derivatives of) coset variables suggested
the conjecture [8] of a hidden equivalence between the two models, i.e. the
existence of a dynamics-preserving map between the infinite tower of (spa-
tially independent) coset variables (βacoset, ν
α), together with their conjugate
momenta (πcoseta , pα), and the infinite sequence of spatial Taylor coefficients
(β(x0), π(x0), Q(x0), P (x0), ∂Q(x0), ∂
2β(x0), ∂
2Q(x0), . . . , ∂
nQ(x0), . . .)
formally describing the dynamics of the gravity variables (β(x), π(x), Q(x),
P (x)) around some given spatial point x0.
5One, however, expects the map between the two models to become spatially non-local
It has been possible to extend the correspondence between the two models
to the inclusion of fermionic terms on both sides [15, 16, 17]. Moreover,
Ref. [18] found evidence for a nice compatibility between some high-level
contributions (height −115!) in the coset action, corresponding to imaginary
roots6, and M-theory one-loop corrections to SUGRA11, notably the terms
quartic in the curvature tensor. (See also [19] for a study of the compatibility
of an underlying Kac-Moody symmetry with quantum corrections in various
models).
4 A new view of the (quantum) fate of space
at a cosmological singularity
Let us now, following [10], sketch the physical picture suggested by the
gravity/coset correspondence. That is, let us take seriously the idea that,
upon approaching a spacelike singularity, the description in terms of a spa-
tial continuum, and space-time based (quantum) field theory breaks down,
and should be replaced by a purely abstract Lie algebraic description. More
precisely, we suggest that the information previously encoded in the spatial
variation of the geometry and of the matter fields gets transferred to an
infinite tower of spatially independent (but time dependent) Lie algebraic
variables. In other words, we are led to the conclusion that space actually
“disappears” (or “de-emerges”) as the singularity is approached7. In partic-
ular (and this would be bad news for Gabriele’s pre-big bang scenario), we
suggest no (quantum) “bounce” from an incoming collapsing universe to some
outgoing expanding universe. Rather it is suggested that “life continues” for
for heights ≥ 30.
6i.e. such that (α, α) < 0, by contrast to the “real” roots, (α, α) = +2, which enter the
checks mentionned above.
7We have in mind here a “big crunch”, i.e. we conventionally consider that we are
tending towards the singularity. Mutatis mutandis, we would say that space “appears” or
“emerges” at a big bang.
an infinite “affine time” at a singularity, with the double understanding,
however, that: (i) life continues only in a totally new form (as in a kind of
“transmigration”), and (ii) an infinite affine time interval (measured, say, in
the coordinate t of Eq. (3) with a coset lapse function n(t) = 1) corresponds
to a sub-Planckian interval of geometrical proper time8.
Let us also comment on some expected aspects of the “duality” between
the two models. It seems probable (from the AdS/CFT paradigm) that,
even if the equivalence between the “gravity” and the “coset” descriptions
is formally exact, each model has a natural domain of applicability in which
the corresponding description is sufficiently “weakly coupled” to be trustable
as is, even in the leading approximation. For the gravity description this
domain is clearly that of curvatures smaller than the Planck scale. One then
expects that the natural domain of validity of the dual coset model would
correspond (in gravity variables) to that of curvatures larger than the Planck
scale. In addition, it is possible that the coset description should primarily
be considered as a quantum model, as now sketched.
The coset action (3) describes the classical motion of a massless particle
on the symmetric space E10(R)/K(E10(R)). Quantum mechanically, one
should consider a quantum massless particle, i.e., if we neglect polarization
effects9 a Klein-Gordon equation,
�Ψ(βa, να) = 0 , (4)
where � denotes the (formal) Laplace-Beltrami operator on the infinite-
dimensional Lorentz-signature curved coset manifold E10(R)/K(E10(R)).
Eq. (4) would apply to the case considered here of un-compactified M-theory.
In the case where all spatial dimensions are toroidally compactified, it has
been suggested [20, 21] that Ψ satisfy (4) together with a condition of period-
8Indeed, it is found that the coset time t (with n(t) = 1) corresponds to a “Zeno-like”
gravity coordinate time (with rescaled lapse Ñ = N/
g = 1) which tends to +∞ as the
proper time tends to zero.
9Actually, Refs. [15, 16, 17] indicate the need to consider a spinning massless particle,
i.e. some kind of Dirac equation on E10/K(E10).
icity over the discrete group E10(Z). In other words, Ψ would be a “modular
wave form” on E10(Z)\E10(R)/K(E10(R)).
Let us emphasize (still following [10]) that all reference to space and
time has disappeared in Eq. (4). The disappearance of time is common
between (4) and the usual Wheeler-DeWitt equation in which the “wave
function(al) of the universe” Ψ[gij(x)] no longer depends on any extrinsic time
parameter. [As usual, one needs to choose, among all the dynamical variables
a specific “clock field” to be used as an intrinsic time variable parametrizing
the dynamics of the remaining variables.] The interesting new feature of (4)
(when compared to a Wheeler-DeWitt type equation) is the disappearance
of any notion of geometry gij(x) and its replacement by the infinite tower
of Lie-algebraic variables (βa, να)10. This quantum de-emergence of space,
and the emergence of an infinite-dimensional symmetry group E10
11 which
deeply intertwines space-time with matter degrees of freedom might be radical
enough to get us closer to an understanding of the fate of space-time and
matter at cosmological singularities.
Acknowledgments. It is a pleasure to dedicate this review to Gabriele
Veneziano, a dear friend and a great physicist from whom I have learned a
lot. I am also very grateful to my collaborators Marc Henneaux and Hermann
Nicolai for the (continuing) E10 adventure. I wish also to thank Maurizio
Gasperini and Jnan Maherana for their patience.
References
[1] M. Gasperini and G. Veneziano, Phys. Rept. 373, 1 (2003) [arXiv:hep-
th/0207130].
10Note that this is conceptually very different from the E11-based proposal of [22].
11Let us note that E10 enjoys a similarly distinguished status among the (infinite-
dimensional) hyperbolic Kac-Moody Lie groups as E8 does in the Cartan-Killing classi-
fication of the finite-dimensional simple Lie groups [23].
[2] A. Buonanno, T. Damour and G. Veneziano, Nucl. Phys. B 543, 275
(1999) [arXiv:hep-th/9806230].
[3] T. Damour and M. Henneaux, Phys. Rev. Lett. 85, 920 (2000)
[arXiv:hep-th/0003139].
[4] T. Damour and M. Henneaux, Phys. Lett. B 488, 108 (2000) [Erratum-
ibid. B 491, 377 (2000)] [arXiv:hep-th/0006171].
[5] V. A. Belinsky, I. M. Khalatnikov and E. M. Lifshitz, Adv. Phys. 19,
525 (1970).
[6] T. Damour and M. Henneaux, Phys. Rev. Lett. 86, 4749 (2001)
[arXiv:hep-th/0012172].
[7] T. Damour, M. Henneaux, B. Julia and H. Nicolai, Phys. Lett. B 509,
323 (2001) [arXiv:hep-th/0103094].
[8] T. Damour, M. Henneaux and H. Nicolai, Phys. Rev. Lett. 89, 221601
(2002) [arXiv:hep-th/0207267].
[9] O. Aharony, S. S. Gubser, J. M. Maldacena, H. Ooguri and Y. Oz, Phys.
Rept. 323, 183 (2000) [arXiv:hep-th/9905111].
[10] T. Damour and H. Nicolai, “Symmetries, Singularities and the De-
emergence of Space”, essay submitted to the Gravity Research Foun-
dation, March 2007.
[11] T. Damour, M. Henneaux and H. Nicolai, Class. Quant. Grav. 20, R145
(2003) [arXiv:hep-th/0212256].
[12] B. Julia, in: Lectures in Applied Mathematics, Vol. 21 (1985), AMS-
SIAM, p. 335; preprint LPTENS 80/16.
[13] C. M. Hull and P. K. Townsend, Nucl. Phys. B 438, 109 (1995)
[arXiv:hep-th/9410167].
[14] T. Damour and H. Nicolai, arXiv:hep-th/0410245.
[15] T. Damour, A. Kleinschmidt and H. Nicolai, Phys. Lett. B 634, 319
(2006) [arXiv:hep-th/0512163].
[16] S. de Buyl, M. Henneaux and L. Paulot, JHEP 0602, 056 (2006)
[arXiv:hep-th/0512292].
[17] T. Damour, A. Kleinschmidt and H. Nicolai, JHEP 0608, 046 (2006)
[arXiv:hep-th/0606105].
[18] T. Damour and H. Nicolai, Class. Quant. Grav. 22, 2849 (2005)
[arXiv:hep-th/0504153].
[19] T. Damour, A. Hanany, M. Henneaux, A. Kleinschmidt and H. Nicolai,
Gen. Rel. Grav. 38, 1507 (2006) [arXiv:hep-th/0604143].
[20] O. J. Ganor, arXiv:hep-th/9903110.
[21] J. Brown, O. J. Ganor and C. Helfgott, JHEP 0408, 063 (2004)
[arXiv:hep-th/0401053].
[22] P. C. West, Class. Quant. Grav. 18, 4443 (2001) [arXiv:hep-th/0104081].
[23] V. G. Kac, Infinite dimensional Lie algebras, 3rd edition, Cambridge
University Press (Cambridge, 1990).
ABSTRACT
  We review the recently discovered connection between the
Belinsky-Khalatnikov-Lifshitz-like ``chaotic'' structure of generic
cosmological singularities in eleven-dimensional supergravity and the ``last''
hyperbolic Kac-Moody algebra E(10). This intriguing connection suggests the
existence of a hidden ``correspondence'' between supergravity (or even
M-theory) and null geodesic motion on the infinite-dimensional coset space
K(E(10)). If true, this gravity/coset correspondence would offer a new view of
the (quantum) fate of space (and matter) at cosmological singularities.

<|endoftext|><|startoftext|>
Acceleration and localization of matter in a ring trap
Yu. V. Bludov1 and V. V. Konotop1,2
Centro de F́ısica Teórica e Computacional, Universidade de Lisboa,
Complexo Interdisciplinar, Avenida Professor Gama Pinto 2, Lisboa 1649-003, Portugal
Departamento de F́ısica, Faculdade de Ciências, Universidade de Lisboa,
Campo Grande, Ed. C8, Piso 6, Lisboa 1749-016,
Portugal and Departamento de Matemáticas, E. T. S. de Ingenieros Industriales,
Universidad de Castilla-La Mancha 13071 Ciudad Real, Spain
A toroidal trap combined with external time-dependent electric field can be used for implementing
different dynamical regimes of matter waves. In particular, we show that dynamical and stochastic
acceleration, localization and implementation of the Kapitza pendulum can be originated by means
of proper choice of the external force.
PACS numbers: 03.75.Lm, 03.75.Kk, 03.75.Ss
I. INTRODUCTION
Exploring different geometries of potentials trapping
cold condensed atoms is of both fundamental and practi-
cal importance. Toroidal traps play a special role allow-
ing for ”infinite” atomic trajectories and for realization
of quasi-one-dimensional (quasi-1D) regimes. These ad-
vantages are relevant for designing highly precise sensors
based on matter wave interferometry [1, 2] as well as for
accurate study of such phenomena as superfluid currents,
stability of sound waves, solitons and vortices in Bose-
Einstein condensates (BEC’s) [3, 4]. Traps with circular
geometry are also believed to be conceptually important
for implementation of the main ideas of the accelerator
physics at ultra-low temperatures [2] and, in particular,
for acceleration of ultracold atoms [5]. In this last con-
text existence of well localized wave packets, and thus
attenuation of the dispersion, the latter being the intrin-
sic property of a quantum systems, is of primary impor-
tance. In the first experimental studies [2] it was shown
that the dispersive spreading out [1] can be compensated
by using betatron resonances in a storage ring. An al-
ternative way of contra-balancing dispersion is also well
known - it is nonlinearity, leading in quasi-1D regime to
existence of bright and dark matter solitons (see e.g. [6, 7]
and [8, 9, 10], respectively). This issue has already been
explored [11] from the point of view of acceleration of
matter waves in a toroidal trap with help of a modulated
optical lattice, which is known to be an efficient tool for
acceleration of matter waves [12].
In this paper we propose two alternative ways of ac-
celerating matter wave solitons – either by time varying
or by stochastic external electric field. These new ways
of soliton acceleration are especially relevant in view of
radiative losses [13] and distortions [12] of solitons mov-
ing in optical lattices (the effects acquiring significance
for long trajectories). At the same time, it turns out
that the toroidal geometry of a trap confining a BEC al-
lows one to realize a number of other dynamical regimes,
like dynamical localization of solitons and solitonic im-
plementation of the celebrated Kapitza pendulum. The-
oretical description of all mentioned phenomena can be
put witching the unique framework, based on the pertur-
bation theory for solitons, what is done in the present pa-
per. More specifically, in Sec. II we formulate the model
and the main physical constraints determining its valid-
ity. In sections III and IV we describe how by applying
external time-dependent electric field matter solitons can
be accelerated in the usual sense and in the sense of the
time increase of the velocity variance (the stochastic ac-
celeration), respectively. In Sec. V we describe localized
states of the matter in circular trap subject to external
field, and in Sec. VI we show that a matter soliton af-
fected by rapidly varying force represents an example of
the Kapitza pendulum [14]. Summary and discussion of
the results are given in the Conclusion.
II. SCALING AND THE EVOLUTION
EQUATION
We assume that a BEC is loaded in a circular trap,
which in cylindrical coordinates r = (ρ, ϕ, z) is described
by V = Vc(ρ) +mω
2/2, where ωz is the frequency of
the magnetic trap in the z–direction, Vc(ρ) is the poten-
tial in the radial direction, forming the trap circular in
the (x, y)–plane, and m is the mass of an atom. We also
suppose that the BEC is subject to external electric field
with amplitude E0, which produces an additional poten-
tial Vext = −α′E20/4, where α′ is the polarizability of the
atoms (see e.g. [15]). If the amplitude E0 or direction of
the field vary along some direction, say, along the x-axis,
smoothly on the scale of the trap radius R, the potential
energy Vext van be expanded in the Taylor series and, af-
ter neglecting nonessential constant, be rewritten in the
form Vext = −αx, where α = (α′/4)∂(E0)2/∂x|x=0 and
we restricted the consideration only to the first term of
the expansion. In order to realize one-dimensional geom-
etry we require torus radius to be much larger than the
core radius rc, what allows us to define a small param-
eter ǫ = rc/R ≪ 1. In order to introduce quantitative
characteristics, we consider the normalized ground state
http://arxiv.org/abs/0704.0733v1
φ of the eigenvalue problem
φ+ Vc(ρ)φ = εrφ ,
φ2ρdρ = 1 (1)
and define R1 =
φ2ρ2dρ, R2 =
φ2dρ/ρ
)−1/2
and λ =
φ4ρdρ
)−1/2
. In the case at hand λ ∼√
Rrc ∼ ǫ1/2R and thus λ≪ R1 ∼ R2 ∼ R.
In the present paper we are interested in the dynam-
ics of matter waves which spatial extension is much less
than the trap perimeter, allowing treat them similarly to
the matter solitons in an infinite one-dimensional trap.
This in particular the case where the spatial size of the
BEC excitations along the trap are of order of λ, which
is the well defined parameter and thus convenient for for-
mulation the constraints of the theory. Indeed, now we
can estimate the kinetic energy of the longitudinal ex-
citations as ε‖ = h̄
2/(2mλ2) and require it to be much
less than the kinetic energy of the transverse excitations,
εr ∼ h̄2/(2mr2c) (for the sake of simplicity here we assume
that the size of the trap in z-direction is of order of the
core radius: az =
h̄/mωz ∼ rc). Adding the require-
ment for the energy of the two-body interactions, which
is estimated as |g|n (where g = 4πh̄2as/m, as is the scat-
tering length, n ∼ N/V is a mean density, N is the total
number of atoms and V is the effective volume occupied
by the atoms and estimated as V ∼ πλrcaz), to be of or-
der of ε‖ and to be much less than εr (or more precisely,
requiring |g|n/εr ∼ ǫ), we can neglect in the leading or-
der the transitions between the transverse energy lev-
els [9, 10], and employ the multiple scale expansion [6, 10]
for description of the quasi-one-dimensional evolution of
the BEC. We also notice that subject to the assumptions
introduced, one has the estimate N ∼ ǫ3/2R/(8|as|).
In order to get an insight on practical numbers, let us
consider 7Li atoms (as = −2 nm) in a trap with R =
100µm, rc = 5µm and az = 10µm. Then ǫ = 0.05, the
characteristic size of solitonic excitations is λ ≈ 22µm
and the number of particles is estimated as N ≈ 140. We
emphasize, that these estimates indicate only an order of
the parameters. Thus, for example, a condensate of 102÷
103 lithium atoms satisfy the conditions of the theory.
We will be interested in managing soliton dynamics
by means of weak (i.e not destroying solitons) electro-
magnetic field varying in time. Respectively, we con-
sider α time-dependent and characterized by the esti-
mate α ∼ h̄2/(mR1λ2). Then, starting with the Gross-
Pitaevskii equation, in which the external potential in
cylindrical coordinates has the form Vext = −αρ cos(ϕ) ≈
−αR1 cos(ϕ), and using the multiple-scale expansion one
ensures that the BEC macroscopic wave function in the
leading order allows factorization
Ψ = π−1/4a−1/2z e
−i(ωr+ωz)t/2e−z
2/2a2
zφ(r)ψ(t, ϕ), (2)
where ωr = 2εr/h̄ and ψ(t, ϕ) solves the nonlinear
Schrödinger equation, which we write in terms of A =
|g|m/
2πh̄2azψ, ζ = R2ϕ/λ, and τ = h̄t/mλ
− cos(κζ)f(τ)A + σ|A|2A . (3)
Here σ =signas, f(τ) ≡ mR1λ2α(t)/h̄2 and κ = λ/R2 ∼√
ǫ. We choose the scaling in such a way that all terms
in (3) are of the unity order, and in particular A = O(1).
This can be done, taken into account the normalization
|A|2dζ = 2
|as|N
, (4)
L = 2π/κ, which follows from the normalization condi-
tion for the order parameter
|Ψ|2d3r = N , and consid-
ering N ∼ az/|as|, what is of order of 103, in a typical
experimental setting.
Eq. (3) is subject to periodic boundary conditions
A(ζ, τ) = A(ζ + L, τ).
III. ACCELERATION OF BRIGHT MATTER
SOLITONS BY TIME-DEPENDENT EXTERNAL
FORCE
First we consider the acceleration, γ, which can be
achieved due to the potential Ve properly dependent on
time. An order of magnitude of γ can be estimated by
taking into account that Eq. (3) makes sense provided
that all terms are of the unity order. In the physical
units this gives γ ∼ h̄2/(m2λ3). Then, recalling the
above example of the lithium condensate we estimates
γ ∼ 7mm/s2, what is of order of the acceleration an-
nounced in [11]. It however does not provide the best es-
timate in our case, because it is based on the 1D model,
while lowering dimensionality imposes constrains on the
atomic density and consequently on the amplitude of the
applied force.
To describe the physics of the phenomenon we consider
a BEC with a negative scattering length (σ = −1). Then
a ”bright soliton” solution of Eq. (3) at f(τ) ≡ 0 (or more
precisely a periodic solution mimicking a bright soliton
in an infinite 1D system) which moves with a constant
velocity vn can be written down as follows [8]
As = e
−i(ω(k)+v2
/2)τ+ivnζη(k) dn (η(k)(ζ − vnτ), k) . (5)
Here dn(x, k) is the Jacobi elliptic function [16], k is
the elliptic modulus parameterizing the solution. The
frequency and the amplitude are given by: ω(k) =
(k2/2 − 1)η2(k) and η(k) = 2K(k)/L [K(k) is the com-
plete elliptic integral of the first kind]. The velocity of
the soliton is quantized vn = 2πn/L with n being integer.
To ensure that the solution As satisfies the scaling
relations imposed above, we notice that the size of
the soliton can be estimated as π/K(k) and its small-
ness implies that k is close to unity. In that case we
obtain the estimates 1 − k2 ∼ 16 exp(−2π/
ǫ) and
dn (η(k)(ζ − vnτ), k) ≈ 1/ cosh(η(k)(ζ − vnτ)).
In the limit k → 1 quantization of the velocity does not
play any significant role. We verified this numerically.
For example, for L = 10 deviation of the initial velocity
from the quantized one produces appreciable effect on
dynamics during intervals τ <∼ 100 only if k <∼ 0.99.
When external force is applied, f(τ) 6= 0, the velocity is
not preserved any more, what manifests itself in evolution
of the momentum P = (1/2i)
AζĀ−AĀζ
(here Ā
stands for complex conjugation of A) according to the
= −f(τ)
cos(κζ)
∂|A|2
dζ . (6)
The external field, however, does not affect the norm:
|A|2dζ=const. It follows from (5) that in the
adiabatic approximation the solution of the perturbed
equation (3) can be searched in the form
A = e−i(ω(k)+V (τ)
2/2)τ+iV (τ)ζη dn (η(ζ −X(τ)), k) (7)
where V (τ) = dX(τ)/dτ is the time-dependent velocity
of the soliton and X(τ) is the coordinate of the soliton
center. Substituting (7) in (6) and taking into account
the parity of the functions in the integrand as well as the
fact that all of them are periodic with the same period
L, we obtain the equation for the soliton coordinate
= −κC(k)f(τ) sin(κX) . (8)
C(k) =
2πE(k)
cos(θ)dn2
dθ (9)
and it is taken into account that N = 2ηE(k), where
E(k) is the complete elliptic integral of the second kind.
Depending on the choice of the function f(τ), Eq. (8)
describes different dynamical regimes. Now we are inter-
ested in acceleration which occurs during the rotational
movement of the soliton in the trap (i.e. X is a growing
function). We illustrate this acceleration using an exam-
ple of the simplest step-like dependence f(τ). To this
end we assume that initially the soliton is centered at
X(0) = 0 and require f(τ) to be a constant f0 for time
intervals such that the soliton coordinates X(τ) ∈ Ip and
to be zero for X(τ) /∈ Ip where the intervals Ip are given
by Ip = [(p +
)L, (p + 1)L] with p = 0, 1, .... Then, as
it is clear, the acceleration of the soliton, which is given
by the right hand side of (8), is positive for all times.
The above requirement introduces natural splitting of
the temporal axis in the set of intervals Tl = [τl, τl+1]
(l = 0, 1, ...), with τ0 = 0, such that f(τ) = 0 for τ ∈ T2p
and f(τ) = f0 for τ ∈ T2p+1 (here X(τl) = lL/2). Thus,
our task is to find τl. This can be done by taking into ac-
count that during each of the ”odd” intervals T2p+1 Eq.
(8) describes conservative nonlinear oscillator, the solu-
tion for which is well known. During ”even” intervals T2p
the motion is free (with a constant velocity) what means
FIG. 1: (Color online) The soliton velocity vs time (panels
a,b) for the parameters k = 0.99999, L = 10.0, f0 = 0.3,
n = 0.43 (panel a, the non-quantized velocity), n = 1.0 (panel
b, the quantized velocity), and the forms of the soliton (panels
c,d,e) in the instants of time indicated in the panel a. In
the panels a,b solid and dashed lines represent the velocity
numerically computed from Eq.(3), and Eq.(8), respectively.
that the time T2p necessary for the soliton to cross an
interval [pL, (p+ 1
)L] is
T2p = τ2p+1 − τ2p = L/(2v2p), (10)
where v2p is the velocity in the point pL. During the
time interval T2p+1 the soliton has to cross the interval
Ip. From this condition we obtain:
T2p+1 = τ2p+2 − τ2p+1 =
2E0/(H2p+1 + E0)
H2p+1 + E0
where H2p+1 = v
2p+1/2 + E0 is the energy of the soli-
ton in the point (p + 1/2)L, E0 = C(k)f0, and v2p+1 =
v2p is the soliton velocity in the same point. At the
end of the interval T2p+1 the soliton velocity is given
by v2(p+1) =
2(H2p+1 + E0). Thus one computes
that after p rotations the soliton acquires the velocity
v2(p+1), which can be obtained from the recurrent rela-
tion: v2(p+1) =
v22p + 4C(k)f0.
In Fig.1 a,b we compare the solution, obtained from
the perturbation theory, Eq.(8), with numerical simula-
tion of Eq.(3) for f0 = 0.3. Nevertheless during the nu-
merical simulation we used the values for T2p and T2p+1
(Eqs.(10)–(11)) obtained for the case of adiabatic approx-
imation. It follows form the results presented that the
dashed and solid lines perfectly match until τ ≈ 50.0. At
larger times appreciable discrepancy appears. It occurs
due to failure of the adiabatic approximation and can be
removed by introducing temporal corrections to T2p and
T2p+1. This naturally leads to an optimization problem,
which requires numerical approach and goes beyond the
scope of the present work. Finally we notice, that for the
above example of 7Li condensate the obtained accelera-
tion is 0.36mm/s2.
Comparison of the panels a and b in Fig.1 shows that
for k ≈ 1 quantization of the velocity is not important,
what is also confirmed by the evolution of the solitonic
forms depicted in the panels Fig.1 c-e.
IV. STOCHASTIC ACCELERATION OF
MATTER SOLITONS
Now we concentrate on another dynamical regime – on
the stochastic acceleration – where increase of the veloc-
ity of a matter soliton in a toroidal trap is achieved by
applying a fluctuating external field. To this end hold-
ing all conditions of the applicability of the model (3),
we consider the case of a stochastic force f(τ), which
is delta-correlated Gaussian process with characteristics
〈f(τ)〉 = 0 and 〈f(τ)f(τ ′)〉 = Dδ(τ − τ ′) (the angular
brackets stand for the stochastic averaging and D is the
dispersion). Now the dynamics can be described in terms
of the distribution function
P(V,Φ, τ) = 〈δ(Φ− Φ(τ))δ(V − V (τ))〉, (12)
where Φ(τ) ≡ κX is the angular coordinate of the soli-
ton, Φ(τ) and V (τ) with explicit time dependence stand
for the soliton coordinates obtained from the dynamical
equations while Φ and V are considered as independent
variables. The distribution function solves the Fokker-
Planck equation, which is obtained by the standard pro-
cedure (see e.g. [17]):
= −V ∂P
+ D̃ sin2(Φ)
. (13)
Here D̃ = κ4C2(k)D is the diffusion coefficient. Due to
the circular geometry of the trap Eq. (13) is considered
on the interval −π < Φ < π with the periodic boundary
conditions P(V,Φ − π, τ) = P(V,Φ + π, τ) with respect
to Φ and zero boundary conditions with respect V : P →
0 as V → ±∞. The normalization condition for the
probability density reads:
dΦP = 1.
Multiplying Eq.(13) by V and Φ and integrating over
V and Φ one readily obtains that the averaged velocity
and angular position of the soliton are constants, which
for the sake of simplicity will be considered zeros, i.e.
〈V 〉 = 0 and 〈Φ〉 = 0. Next, multiplying (13) by V 2, Φ2
and V Φ and performing the integration one obtains the
equations of the second momenta. They are not closed
and can be written down as follows:
〈V 2〉 = 2D̃〈sin2 Φ〉 , (14)
〈Φ2〉 = 2〈V Φ〉 , (15)
〈V Φ〉 = −2π
P (π, V, τ)V 2dV + 〈V 2〉 .(16)
Eq. (14) means that the average square velocity is grow-
ing with time, i.e. the soliton undergoes the stochastic
acceleration. The law of the growth of the velocity in-
variance deviates form the linear, as it would happen for
FIG. 2: (Color online) The mean square velocity (panel a) and
the stochastic acceleration γ̃ (panel b) of the soliton vs time
for different values of the dispersion, obtained by numerical
integration of Eq. (13) with parameters L = 10.0 and k =
0.99999. In panel b dashed lines depict the approximation of
numerical data by the law γ̃ ∝ τ−1/2. All axes in panels a,b
are represented in logarithmic scale.
the Brownian diffusion in the momentum space, what
happens because the diffusion coefficient in the Fokker-
Planck equation (13) is not a constant, but depends on
the angular variable. However, due to the diffusion one
can expect that the phase distribution will tend to ho-
mogeneous, i.e. that P → 1/(2π) as τ → ∞. In this
formal limit one obtains that 〈V Φ〉 → 0, 〈sin2 Φ〉 → 1/2
and hence 〈V 2〉 → D̃τ . In other words, the system (14)-
(16), describes random walk which in the limit of large
time, approaches the Brownian diffusion in the velocity
space. In that limit the stochastic acceleration, which
can be defined as γ̃ = d
〈V 2〉/dτ , would tend to zero
according to the law γ̃ ∝ τ−1/2.
In order to check the above predictions and reveal other
features of the stochastic dynamics of a soliton in a ring
trap we solved numerically Eq. (13) subject to the ini-
tial condition P(V,Φ, 0) = 〈δ(Φ)δ(V )〉. The results is
summarized in Fig. 2. In the panel a one observes the
predicted monotonic growth of the mean velocity with
time, which slightly different from the linear law. In the
panel b one can see that the stochastic acceleration γ̃
is a monotonically decreasing function, which at suffi-
ciently large times tends to zero. In particular, at τ >∼ 15
the decreasing of the acceleration with time can be well
approximated by the predicted law γ̃ ∝ τ−1/2, as it is
shown by dashed curves in the panel b of Fig. 2 (it was
verified that in at the same times 〈sin2 Φ〉 ≈ 1/2, what
is in agreement with the analytical predictions). Also
Fig. 2 b shows that the stochastic acceleration is larger
for larger D. The physical explanation of this last fact
is simple: the acceleration is generated by the stochastic
force, whose intensity is determined by the dispersion D.
V. LOCALIZATION OF MATTER INDUCED BY
THE EXTERNAL FIELD
Let us now turn to localized states of a matter in a
toroidal trap and concentrate on the states generated by
the constant external electric field, i.e. by f(τ) ≡ f0.
Respectively, we look for stationary solutions of Eq. (3)
in the form A = e−iωτA(ζ) and obtain for A(ζ) the equa-
tion:
− f0 cos(κζ)A + σ|A|2A = ωA (17)
which is subject to periodic boundary conditions
A(ζ, τ) = A(ζ + L, τ).
Several lowest branches of the numerically obtained
solutions of Eq. (17) are shown in Fig. 3. The lowest
branch approaches zero at the frequency ω0 ≈ −0.143
(it is interesting to mention that this frequency coincides
with the lowest gap edge of the spectrum of the Mathieu
equation (17) considered on the whole axis), where the
amplitude of the nonlinear periodic mode is small and
it transforms into the linear periodic Bloch mode at the
lowest gap edge. Such a behavior of the branch is sim-
ilar to that of the strongly localized modes in a BECs
in the optical lattice [18]. The lowest mode A is local-
ized in the vicinity of ϕ = 0 (Fig.3, b), i.e. around the
minimum of the effective potential and that is why such
a mode is stable and can exists even in the linear case,
where the two-body interactions are negligible (here it
is important that we are dealing with periodic boundary
conditions). The modes of the upper branches – B and
D (their typical examples are shown in Fig.3 b) – bifur-
cate at ω∗ ≈ −0.345. They are localized at ϕ = ±π, i.e.
near the points where the potential has maxima. As it
is clear, this is pure nonlinear effect and occurs due to
delicate balance between the attractive interactions and
repulsive forces of the external potential. Such balance
can easily be destroyed even by an infinitesimal pertur-
bation, what allows us to expect instability of the modes.
The mode C represents two local maxima of the atomic
distributions at ϕ = 0, π. Similar to the modes B and D
we one can expect it to be unstable, what physically can
be explained by existence of local atomic maxima at the
maxima of the potential. By direct numerical solution of
Eq. (3) (more specifically by perturbing the mode pro-
files by the factor 1 + 0.1 cos (21 ζ) and computing the
dynamics until τ = 1000) we have verified that, indeed,
only the mode A on Fig. 3 is dynamically stable, while
the modes B, C, and D are unstable.
VI. MATTER SOLITON AS A KAPITZA
PENDULUM
As the final example of nontrivial dynamics of a mat-
ter soliton in a toroidal trap we consider dynamical lo-
calization induced by a rapidly oscillating force f(τ) =
f0 [ν + cos(Ωτ)]. In this case the solitonic motion mimics
FIG. 3: (Color online) The number of bosons N (for the ex-
ample of the lithium condensate described in the text) vs
frequency ω (panel a) and shapes of the localized modes at
ω = −0.4 (panel b) for the case where L = 10.0, f0 = 0.3,
σ = −1.
the famous Kapitza pendulum, which acquires an addi-
tional stable point due rapid oscillation of the pivot [14].
Assuming that the physical conditions of the validity of
the quasi-1D approximation (3) holds and that the fre-
quency Ω is large enough, i.e. Ω2 ≫ κ2C(k)f0, one can
perform the standard analysis (see e.g. [14]), i.e. look for
a solution of (3) in a form X(τ) + ξ(τ) where ξ is small,
|ξ| ≪ |X |, and rapidly varying, and provide averaging
over rapid oscillations. Then one arrives at the equation
d2X/dτ2 = −∂U/∂X with the effective potential
U = −C(k)f0
ν cos(κX) + κ2C(k)
cos(2κX)
.(18)
If the condition κ2C(k)f0/(2Ω
2ν) > 1 is met, the ef-
fective potential U possesses two stable points: X = 0
(Φ = 0) and X = L/2 (Φ = π). So, it opens the pos-
sibility for the new type of soliton moving around the
new stable point. Two typical trajectories of the soli-
ton, obtained by numerical integration of Eq. (3), are
presented in Fig.4. One of the trajectories displays oscil-
lations around the new equilibrium point, while the other
one shows the large oscillations around the equilibrium
point Φ = 2π started with the same initial data but in the
case where Φ = π is not an equilibrium any more. The
amplitude of large oscillations decay with time because
of energy losses of the soliton in the nonconservative sys-
VII. CONCLUSIONS
In the present paper we have shown that dynamics of
a matter soliton in a toroidal trap, well reproducing one-
dimensional geometry, can be very efficiently governed by
time varying external electric field. In particular, such
regimes like dynamical acceleration, stochastic acceler-
ation, localization and implementation of the Kapitza
pendulum can be realized by proper choices of the time
dependence of the external force.
Experimental detection of the acceleration can be im-
plemented either by direct imaging of the atomic cloud,
FIG. 4: (Color online) The angular coordinate of the soliton
center vs time for the soliton motion affected by the rapidly
oscillating external force, obtained numerically from Eq.(3)
with parameters L = 10.0, n = 0.01, σ = −1, f0 = 0.15,
Ω = 2.0, and k = 0.99999.
which is well localized in space and has well specified tra-
jectory, or by measurement of the atomic distribution in
the momentum space displaying shift of the maximum
towards higher kinetic energies. Alternatively, one can
study the evolution of the atomic cloud releasing from
the trap (by switching the trap off) after some period of
accelerating motion. The respective dynamics will be a
spreading out cloud whose center of mass is moving with
the acquired velocity.
The obtained results were based on the one-
dimensional model, although deduced using the multiple-
scale method and thus mathematically controllable. This
means that a number of problem, are still left open. One
of them is the limitation on the soliton velocity, and thus
acceleration, introduced by lowering the space dimension,
which appears when the solitonic kinetic energy becomes
comparable with the transverse kinetic energy. Another
limitation on the soliton acceleration emerges from the
fact of the velocity quantization, when the radius of the
ring trap is not large enough. We also left for further
studies the diversity of localized stationary atomic distri-
butions supported by the external filed, indicating only
the lowest modes. We thus believe that the richness of
the phenomena which can be observed by simple com-
bination of the trap geometry and varying external field
will stimulate new experimental and theoretical studies.
Acknowledgments
The authors are indebted to H. Michinel for providing
additional data on Ref. [11]. YVB was supported by the
FCT grant SFRH/PD/20292/2004. VVK was supported
by the Secretaria de Stado de Universidades e Investi-
gaci?n (Spain) under the grant SAB2005-0195. The work
was supported by the FCT (Portugal) and European pro-
gram FEDER under the grant POCI/FIS/56237/2004.
[1] S. Gupta, K. W. Murch, K. L. Moore, T. P. Purdy, and
D. M. Stamper-Kurn Phys. Rev. Lett. 95, 143201 (2005)
[2] K. W. Murch, K. L. Moore, S. Gupta, and D. M.
Stamper-Kurn, Phys. Rev. Lett. 96, 013202 (2006)
[3] J. Javanainen, S. M. Paik, and S. Mi Yoo, Phys. Rev. A
58, 580 (1998); L. Salasnich, A. Parola, and L. Reatto,
Phys. Rev. A 59, 2990 (1999); A. B. Bhattacherjee, E.
Courtade, and A. Arimondo, J. Phys. B, 37, 4397 (2004).
[4] J. Brand and W. P. Reinhardt, J. Phys. B: At. Mol. Opt.
Phys. 34, L113 (2001)
[5] O. Dutta, M. Jääskeläinen, and P. Meystre, Phys. Rev.
A 74, 023609 (2006).
[6] V. M. Pérez-Garćıa, H. Michinel and H. Herrero, Phys.
Rev. A 57, 3837 (1998).
[7] F. Kh. Abdullaev, et. al. Int. J. Mod. Phys. B 19, 3415
(2005)
[8] T. Tsuzuki, J. Low Temp. Phys. 4, 441 (1971).
[9] Th. Busch and J. R. Anglin, Phys. Rev. Lett. 84, 2298
(2000); A. Muryshev, G.V. Shlyapnikov, W. Ertmer,
K. Sengstock, and M. Lewenstein, Phys. Rev. Lett. 89,
110401 (2002)
[10] V. A. Brazhnyi and V. V. Konotop, Phys. Rev. A 68,
043613 (2003).
[11] A.V. Carpentier and H. Michinel, cond-mat/0610047.
[12] V. A. Brazhnyi, V. V. Konotop, and V. Kuzmiak, Phys.
Rev. A 70, 043604 (2004)
[13] A. V. Yulin, D. V. Skryabin, and P. St. J. Russell, Phys.
Rev. Lett. 91, 260402 (2003)
[14] see e.g. L. D. Landau and E. M. Lifshitz, Mechanics
(Pergamon Press, New York, 1976)
[15] C.J. Pethick and H. Smith, Bose-Einstein condensation
in dilute gases (Cambridge, University Press, 2001)
[16] D. K. Lawden: Elliptic Functions and applications
(Springer-Verlag, New York Inc., 1989)
[17] see e.g. V. V. Konotop and L. Vázquez Nonlinear random
waves (World Sci., Singapore, 1994)
[18] A. Trombettoni and A. Smerzi, Phys. Rev. Lett. 86, 2353
(2001); F. K. Abdullaev, B. B. Baizakov, S. A. Dar-
manyan, V. V. Konotop, and M. Salerno, Phys. Rev. A
64, 043606 (2001).
http://arxiv.org/abs/cond-mat/0610047
ABSTRACT
  A toroidal trap combined with external time-dependent electric field can be
used for implementing different dynamical regimes of matter waves. In
particular, we show that dynamical and stochastic acceleration, localization
and implementation of the Kapitza pendulum can be originated by means of proper
choice of the external force.

<|endoftext|><|startoftext|>
**FULL TITLE**
ASP Conference Series, Vol. **VOLUME**, **YEAR OF PUBLICATION**
**NAMES OF EDITORS**
GRO J1655-40: from ASCA and XMM-Newton
Observations
Xiao-Ling Zhang1, Shuang Nan Zhang2,3,4, Gloria Sala1,
Jochen Greiner1, Yuxin Feng3,4, Yangsen Yao5
Abstract. We have analysed four ASCA observations (1994–1995, 1996–
1997) and three XMM-Newton observations (2005) of this source, in all of which
the source is in high/soft state. We modeled the continuum spectra with rel-
ativistic disk model kerrbb, estimated the spin of the central black hole, and
constrained the spectral hardening factor fcol and the distance. If kerrbb model
applies, for normally used value of fcol (1.7), the distance cannot be very small,
and fcol changes with observations.
1. Background
GRO J1655-40, the second microquasar (after GRS 1915-105), had X-ray out-
bursts in 1994-1995, 1996-1997, 2005. Its geometric parameters are considered
best determined: mass MBH = 7.0± 0.2M⊙, inclination angle θ = 69.50
± 0.08
(Orosz & Bailyn 1997), distance D = 3.2 ± 0.2 kpc (Hjellming & Rupen 1995),
which makes it a very good laboratory of studying black holes and environments.
The spin of the central black hole has been estimated by various authors
with various methods (see, e.g., Zhang et al. 1997; Abramowicz & Kluźniak 2001;
Aschenbach 2004; Shafee et al. 2006), and the reported value range from 0.2
(Abramowicz & Kluźniak 2001) to 0.996 (Aschenbach 2004).
In estimating black hole spin from continuum spectral modeling, the color
correction factor fcol = Tcol/Teff , is one of the key factors. The normally used
value of fcol is 1.7, following Shimura & Takahara (1995), while many authors
believe it should not be constant (see, e.g., Merloni et al. 2000). The distance
is also very important. The widely accepted value 3.2± 0.2 kpc was challenged
by Foellmi et al. (2006), who gave an upper limit of 1.7 kpc.
2. Observations, data reduction and model fitting
We analysed three ASCA observations during the 1994–1995 and the 1996–1997
outbursts, and three XMM-Newton observations during the 2005 outburst, in all
of which the source was in high/soft state. For ASCA, only GIS2 data were used,
after gain correction and deadtime correction. For XMM-Newton, only Epic-pn
MPE, Postfach 1312, 85741 Garching, Germany, zhangx@mpe.mpg.de
Tsinghua Univ, 100084, Beijing, China
U. of Alabama, Huntsville, AL 35899, USA
NSSTC, Sparkman Dr. 320, Huntsville AL 35805, USA
MIT Kavli Inst. for Astro. and Space Research, 70 Vassar Street, Cambridge, MA 02139
http://arxiv.org/abs/0704.0734v1
2 X.-L. Zhang et al
data were used, after correction for rate-dependent Charge-Transfer-Efficiency
(Sala et al. 2006).
The classical way of estimating black hole spin from the continuum spectral
fitting is to fit the spectra with disk models, and obtain the spin directly or
indirectly. All models take the source distance as parameter, and most models
treat the disk as multi-temperature black-body rings and the derived spin value
depends on the apparent/effective temperature ratio.
The relativistic disk model kerrbb in XSPEC was used in the fitting. We
let fcol vary from 1.0 to 3.0, and D vary from 1.0 kpc and 3.2 kpc. For each
combination of fcol and D, we fitted the data sets and obtained a spin value, if
the fit was acceptable (χ2/dof < 2). The contour of the derived spin a over D
and fcol are shown in the Fig. 1.
3. Conclusion
From Fig. 1 we can see, 1. for the normally used fcol value 1.7, kerrbbmodel does
not favor small distance; 2. because the black hole spin and the source distance
should be constant, fcol changes dramatically between these observations.
Figure 1. Contour of black hole specific angular momentum a versus dis-
tance D and fcol. The two dotted lines indicate D=1.7kpc, and fcol = 1.7.
References
Abramowicz, M. A., & Kluźniak, W. 2001, A&A, 374, L19
Aschenbach, B. 2004, A&A, 425, 1075
Foellmi, C., Depagne, E., Dall, T. H., & Mirabel, I. F. 2006, A&A, 457, 249
Hjellming, R.M. & Rupen, M.P. 1995, Nat, 375, 464
Merloni, A., Fabian, A.C., Ross, R.R. 2000, MNRAS, 313, 193
Orosz, J. A., & Bailyn, C. D. 1997, ApJ, 477, 876
Sala, G., et al 2006, A&A, accepted (astro-ph/0606272 )
Shafee, R., et al. 2006, ApJ, 636, L113
Shimura, T. & Takahara, F. 1995, ApJ, 445, 780
Zhang, S. N., Cui, W., & Chen, W. 1997, ApJ, 482, L155
http://arxiv.org/abs/astro-ph/0606272
ABSTRACT
  We have analysed four ASCA observations (1994--1995, 1996--1997) and three
XMM-Newton observations (2005) of this source, in all of which the source is in
high/soft state. We modeled the continuum spectra with relativistic disk model
kerrbb, estimated the spin of the central black hole, and constrained the
spectral hardening factor f_col and the distance. If kerrbb model applies, for
normally used value of f_col, the distance cannot be very small, and f_col
changes with observations.

<|endoftext|><|startoftext|>
Introduction
Hard X-ray surveys have clearly revealed the important role
played by obscured Active Galactic Nuclei (AGN) to repro-
duce the cosmic X-ray background (XRB; e.g., Comastri et al.
1995) and have provided evidence that a significant fraction of
the accretion-driven energy density in the Universe resides in
obscured X-ray sources (e.g., Barger et al. 2005; Hopkins et al.
2006; Hickox & Markevitch 2006).
Until recently, the limited information on the broad-band
emission of the counterparts of obscured X-ray sources pre-
vented a reliable determination of their bolometric luminosi-
ties. The lack of a proper knowledge of the spectral energy dis-
tributions (SEDs) of obscured sources has led many authors to
adopt, in the computation of the bolometric luminosities, the
average value derived by Elvis et al. (1994), although in that
work the sample comprises mostly local unabsorbed quasars.
By the current work, we aim at providing a robust estimate of
the bolometric luminosity for obscured sources, which is an
essential parameter to derive the cosmic mass density of super-
massive black holes (SMBHs, i.e., following the Soltan 1982
approach). A reliable estimate of the bolometric luminosity of
obscured AGN is typically limited by the actual capabilities
of disentangling the nuclear emission (related to the accretion
processes) from that of the host galaxy which, unlike for un-
absorbed quasars, often dominates at optical and near-infrared
(near-IR) bands.
A significant fraction of high-redshift, luminous obscured
AGN (the so-called Type 2 quasars) may have escaped spec-
troscopic identification due to their faint optical counterparts,
thus preventing current studies from an accurate sampling of
obscured sources. Mid-infrared (mid-IR) observations appear
http://arxiv.org/abs/0704.0735v2
2 F. Pozzi, C. Vignali, A. Comastri et al.: Spitzer observations of luminous obscured quasars
Table 1. Properties of our targets
Source Id. 2–10 keV flux† R Ks Morph(Ks) X/O z
(10−14 erg cm−2 s−1)
Abell 2690#75 3.30 24.60 18.33 E 1.86 2.13a
PKS 0312−77#36 1.90 24.70 19.13 E 1.66 –
PKS 0537−28#91 4.20 23.70 18.99 E 1.60 –
PKS 0537−28#54 2.10 25.10 18.91 E 1.86 –
PKS 0537−28#111 2.10 24.50 17.64 E 1.62 –
Abell 2690#29 2.80 25.10 17.67 P 1.99 2.08b
PKS 0312−77#45 2.80 24.40 18.62 P 1.71 –
BPM 16274#69 2.27 24.08 17.87 E 1.49 1.35b
† 2-10 kev X-ray fluxes from Perola et al. (2004).
a Tentative spectroscopic redshift from near-IR spectroscopic observations (Maiolino et al. 2006).
b Spectroscopic redshift from near-IR spectroscopic observations (Maiolino et al. 2006).
to be fundamental for this class of objects, since they are only
marginally affected by dust obscuration and are able to recover
the nuclear emission. With mid-IR observations, we expect to
reveal the nuclear radiation re-processed by the torus of the
obscured active nuclei, which are often recognized as such by
means of their X-ray emission only. For these sources, the
soft X-ray emission, which is photo-electrically absorbed by
the gas, and the optical emission, extincted by the dusty cir-
cumnuclear medium, are expected to be downgraded in energy
and to emerge as thermally reprocessed radiation in the IR at
wavelengths in the range between a few and a few hundred µm
(Granato et al. 1997).
The potentialities of mid-IR observations were firstly
shown by Fadda et al. (2002), who detected with ISOCAM at
15 µm about two-thirds of the X-ray sources detected in the
5–10 keV band in the XMM-Newton Lockman Hole survey. A
similar high detection rate at 24 µm has been recently reported
by Rigby et al. (2004) and Franceschini et al. (2005) studying
the Spitzer counterparts of the Chandra sources in the CDF-
S and in the ELAIS-N1 field, respectively, within the SWIRE
survey (Lonsdale et al. 2004).
In this context, a new interesting class of objects is emerg-
ing from the current X-ray surveys: these sources are character-
ized by a high (>1) X-ray-to-optical flux ratio (hereafter X/O)1;
for comparison, unobscured Type 1 AGN have a broad distri-
bution peaked at X/O≈0. Objects with X/O &1 are about 20%
of the hard X-ray selected sources, and the fraction of these
sources seems to remain constant over ≈ 3 decades of X-ray
flux (Comastri & Fiore 2004). By definition, sources with high
X/O are among the faintest sources in the optical band. In the
shallow, large-area X-ray surveys (e.g., the HELLAS2XMM
survey, with F2−10 keV > 10
−14 erg cm−2 s−1 over ≈ 3 deg2;
Baldi et al. 2002), where the identification of a sizable sam-
ple of sources with X/O>1 has been possible (e.g., Fiore et al.
2003), the X/O selection criterion has proven to be effective in
selecting Type 2 quasars at high redshifts.
1 X/O is defined as log FX
= log FX +
+ 5.5.
We have performed a pilot program to study with Spitzer a
sample of eight sources selected in the 2–10 keV band from the
HELLAS2XMM survey on the basis of their high (>1) X/O and
large column densities (NH ≥10
22 cm−2). The sample observed
with Spitzer has been previously investigated in other bands.
The most surprising finding of the follow-up campaigns was
the association of these sources with luminous near-IR objects
(Mignoli et al. 2004), placing them into the class of Extremely
Red Objects (EROs, R − K ≥5).
The outline of the paper is as follows: in Sect. 2 we present
the sample selection; Spitzer data reduction and analysis are
discussed in Sect. 3, while in Sect. 4 we describe the analy-
sis of the SEDs. Finally, in Sect. 5 we estimate the bolometric
luminosities, the stellar masses of the host galaxies, the black
hole masses, and the Eddington ratios.
Throughout this paper we adopt the “concordance”
(WMAP) cosmology (H0=70 km s
−1 Mpc−1, ΩM=0.3, and
ΩΛ=0.7; Spergel et al. 2003). Magnitudes are expressed in the
Vega system.
2. Sample selection
The eight objects presented in this paper (see Table 1) were se-
lected among the 10 HELLAS2XMM high X/O ratio sources
detected in the Ks band with ISAAC at ESO-VLT; for details
on the association of the Ks-band counterpart to the X-ray
source, see Mignoli et al. (2004). Two sources of the origi-
nal sample were not selected for Spitzer observations: one
(PKS 0537−28#31) is associated with a disky galaxy, while
the other (BPM 16274#181) has an ambiguous Ks-band mor-
phological classification. All but one of the sources observed
by Spitzer belong to the first square degree field (122 X-ray
sources; see Fiore et al. 2003 for the spectroscopic and photo-
metric identification and Perola et al. 2004 for the X-ray spec-
tral analysis); the only exception is BPM 16274#69, which be-
longs to the sample of the second square degree (110 X-ray
sources; see Cocchia et al. 2007; Lanzuisi et al., in prepara-
tion).
F. Pozzi, C. Vignali, A. Comastri et al.: Spitzer observations of luminous obscured quasars 3
Fig. 1. ISAAC Ks images, centered on the Ks counterpart of the X-ray sources; each box is 30′′ wide. North is to the top and East
to the left. Contour levels of the 24 µm emission corresponding to [3, 4, 5, 7, 10, 20, 40]σ are superimposed to each image.
The selected sources, although faint in the optical band
(23.7<R<25.1), are all bright in the near-IR (Ks.19) and, all
but one, have R − Ks > 5, thus being EROs. Mignoli et al.
(2004) were able to study the surface brightness profiles of
these sources in the Ks band and obtained a morphological clas-
sification. Only two sources are classified as point-like objects,
while all of the others are extended, with clear detection of the
host galaxy and radial profiles consistent with those of ellipti-
cal galaxies. In this latter class of sources, the central AGN is
evident in the X-ray band, while in the near-IR the host galaxy
dominates. An upper limit to the contribution of a central un-
resolved source (i.e., the nuclear emission), ranging from 2%
to 12% of the galaxy emission, was obtained. Furthermore, us-
ing both the R − K colour and the morphological information,
a minimum photometric redshift, in the range 0.9–2.4, was es-
timated for these sources.
For three of the sources with the reddest colours
(Abell 2690#75, Abell 2690#29 and BPM 16274#69; see
Table 1), near-IR spectroscopic observations with ISAAC at
ESO-VLT were performed by Maiolino et al. (2006), thus al-
lowing for a spectroscopic identification for at least two of
these sources (one redshift measurement appears tentative).
The point-like source (Abell 2690#29) shows the typical rest-
frame optical spectrum of high-redshift dust-reddened quasars,
with a broad Hα line (Gregg et al. 2002). The other two ob-
served sources, both of them extended in the Ks band, have nar-
row emission-line spectra: one is a LINER-like object at z=1.35
and the second source has a spectrum with a single weak line,
tentatively associated with Hα at z=2.13. Consistently with the
morphological information, in the first source the AGN domi-
nates the emission, while in the other two sources the nuclear
spectrum is heavily diluted by the host galaxy starlight.
4 F. Pozzi, C. Vignali, A. Comastri et al.: Spitzer observations of luminous obscured quasars
Table 2. Spitzer flux densities
Source Id. 3.6 µm 4.5 µm 5.8 µm 8.0 µm 24 µm
Sν±∆Sν Sν±∆Sν Sν±∆Sν Sν±∆Sν Sν±∆Sν
Abell 2690#75 51 ± 5 56 ± 6 89 ± 11 139 ± 15 565 ± 62
PKS 0312−77#36 41 ± 4 44 ± 5 40 ± 8 71 ± 9 236 ± 30a
PKS 0537−28#91 28 ± 4 35 ± 4 42 ± 8 80 ± 10 301 ± 40
PKS 0537−28#54 31 ± 4 35 ± 4 50 ± 10 47 ± 8 279 ± 45
PKS 0537−28#111 88 ± 9 75 ± 8 41 ± 6 46 ± 7 148 ± 28
Abell 2690#29 141 ± 14 185 ± 19 260 ± 27 371 ± 38 1012 ±106a
PKS 0312−77#45 50 ± 6 62 ± 7 69 ± 10 78 ± 10 249 ± 35
BPM 16274#69 86 ± 9 92 ± 9 97 ± 11 120 ± 13 286 ± 34
The flux density is reported in units of µJy. a The 24 µm flux density is probably over-
estimated due to contamination from nearby sources and should be considered as an
upper limit.
3. Spitzer observations and data reduction
The whole sample of eight hard X-ray selected sources has
been observed by Spitzer (Werner et al. 2004), with IRAC
(Fazio et al. 2004) observations of 480 s integration time and
MIPS (Rieke et al. 2004) observations at 24 µm with a total
integration time of ≈ 1400 s per position. IRAC observations
were performed in photometry mode with frame time of 30 s
and dither pattern of 16 points. The MIPS 24 µm observations
were performed in MIPS photometry mode with frame time of
10 s, 10 cycles and small-field pattern. To reduce overheads,
the cluster option was used when possible.
For the IRAC bands, we used the final combined post-basic
calibrated data (BCD) mosaics produced by the Spitzer Science
Center (SSC) pipeline (Version S12.0–S13.01). At 24 µm,
we started the analysis from the BCD produced by the SSC
pipeline (Version S12.4.2–S13.01) and then we applied ad hoc
procedures to optimize the reduction, since some of our sources
were close to the detection limit (see Table 2). We remind that
BCD are individual frames already corrected for dark, flat field
and geometric distortion. We improved the quality of the BCD
by correcting each individual BCD for a residual flat field de-
pending on the scan mirror position (Fadda et al. 2006). The
residual flat fielding was obtained from our own data by aver-
aging the BCD corresponding to the same scan mirror position
and the same Astronomical Observation Request (AOR), con-
sidering all the different cluster positions. To each BCD, its
median level was subtracted before this operation. This pro-
cedure was possible since our observations are not dominated
by background fluctuations. The corrected BCD were co-added
and background-subtracted using the SSC MOPEX software
(Makovoz & Marleau 2005). The resulting mosaics were made
with 2.4′′ pixel size. The overall analysis at 24 µm produces
mosaics with a typical noise of ≈ 0.020 MJy/pixel (a factor of
2 lower in comparison with the SSC pipeline mosaics). The
noise map has been computed for each mosaic by scaling the
measured mean rms of the central part of the map according to
the inverse square root of the coverage map.
The flux densities of our targets in IRAC and MIPS bands
were measured on the signal maps using aperture photometry at
the position of the sources. The chosen aperture radius for the
IRAC bands is 2.45′′ and the adopted factors for the aperture
corrections are 1.21, 1.23, 1.38 and 1.58 (following the IRAC
Data Handbook, Version 3.0) at 3.6 µm, 4.5 µm, 5.8 µm and 8
µm, respectively.
The chosen aperture radius at 24 µm is 7.5′′; aperture correc-
tions were derived by examining the photometry of bright stars.
Taking into account an additional correction of 1.15 to match
the procedure used by the MIPS instrument team to derive cal-
ibration factors from standard star observations, the resulting
aperture correction is 1.57 (in agreement with the SWIRE team,
see Surace et al. 2005).
Table 2 reports the results of the Spitzer observations. To
compute the photometric uncertainties, we added in quadra-
ture the noise map and the systematic uncertainties (≈10%, see
MIPS and IRAC Data Handbook 2006, Version 3.0). The rel-
ative photometric uncertainties range from ≈10% in the best
cases (IRAC channels 1 and 2), up to ≈20% at 24 µm in the
worst ones.
All of the eight sources are clearly detected in both IRAC
and MIPS 24 µm bands. At 24 µm, the flux densities span an
order of magnitude, ranging between ≈1000 µJy and ≈150 µJy,
with the faintest source (PKS 0537−28#111) close to the 5σ
detection level.
In Fig. 1 the Ks-band images, along with the contour lev-
els of the 24 µm emission, are shown. At 24 µm, the sources
PKS 0312−77#36 and Abell 2690#29 appear to be confused.
In particular, in both cases, there is a second source at ≈ 8-10′′,
unrelated to the targets. The contribution of these sources to
the 24 µm flux density of our targets has been estimated by a
decomposition analysis (using the PSF fitting algorithm IMFIT
within the AIPS environment). Furthermore, both the sources
PKS 0312−77#36 and Abell 2690#29 present a second object
at ≈ 2′′, clearly visible in the Ks images, too close to our targets
for a decomposition analysis, given the 24 µm pixel size. Since
these close companions become increasingly fainter moving
from the Ks bands to the longer IRAC wavelengths, we have
attributed the entire flux density estimated from the decompo-
sition analysis to our targets (see Table 2). However, the 24 µm
flux densities for these two sources should be treated as upper
limits (see Fig. 2).
F. Pozzi, C. Vignali, A. Comastri et al.: Spitzer observations of luminous obscured quasars 5
We point out that the deblending procedure measures the
peak flux density. To convert the peak flux density into total
flux density, we have assumed that the deblended objects are
point-like sources and applied a correction factor of 8.9 derived
from the 24 µm Spitzer PSF and including the 1.15 calibration
factor.
4. Analysis of the spectral energy distributions
In this section we provide an analysis of the SEDs of our
sources, in order to derive the energy distribution of the nu-
clear component “cleaned” by the host galaxy contribution. As
anticipated in Sect. 1, the determination of the nuclear SEDs
over a large wavelength range is an essential step to estimate
the physical properties of the black hole, such as its bolometric
luminosity, mass, and accretion rate. Taking advantage of the
new Spitzer photometric points, simultaneously to the SED de-
termination we have estimated the photometric redshifts of our
sources. These new values are then compared with the min-
imum redshifts estimated by Mignoli et al. (2004) using only
the R and Ks bands and with the spectroscopic ones measured
in three cases by Maiolino et al. (2006, see §2).
Given the different morphological properties of our
sources, two approaches have been adopted, one for the sources
dominated in the Ks band by the host galaxy (elliptical-like
sources) and another one for the sources dominated by the nu-
clear component (point-like sources).
4.1. Elliptical-like sources
From the Ks-band morphological analysis, we know that at
least up to the observed 2.2 µm band the stellar contribu-
tion dominates the emission in these sources. At longer wave-
lengths, the nuclear component is expected to arise as repro-
cessed radiation of the primary emission, while the stellar
component is expected to drop (e.g., Bruzual & Charlot 2003;
Silva et al. 2004).
In the analysis presented in this work, we have decided
to follow a phenomenological approach, checking whether the
emission of our sources can be reproduced as the sum of two
components, one from the host galaxy and the other related to
the reprocessing of the nuclear emission by the dusty torus en-
visaged by unification schemes (Antonucci 1993). The shape
and relative strengths of the two components have to be con-
sistent with all our observed data sets (multi-band photometry,
Ks-band morphology and magnitude, Ks-band upper limit on
the nuclear component, and X-ray spectral analysis).
For the galaxy component, we adopted a set of six galaxy
templates, obtained from the synthetic spectra of GISSEL 2003
(Bruzual & Charlot 2003) assuming a simple stellar population
and spanning a wide range of ages, from 1 Gyr up to a “maxi-
mum age” model (z f orm = 20). The “maximum age” model has
been adopted by Mignoli et al. (2004) to derive the minimum
photometric redshift for the sources of the current sample.
For the nuclear component, we adopted the nuclear tem-
plates from Silva et al. (2004), based on the radiative transfer
models of Granato & Danese (1994). We chose these templates
since in the work of Silva et al. (2004) the radiative transfer
models are used to interpolate the observed nuclear IR data for
a sizable sample of local AGN. We must note, however, that
the nuclear observed SEDs are available only in the 2–20 µm
regime, where data from small-aperture instruments are avail-
able. At wavelengths above ≈ 20 µm, the SEDs are model ex-
trapolations.
Silva et al. (2004) found that the nuclear SEDs can be ex-
pressed as a function of two parameters, the hard X-ray (2–
10 keV) intrinsic luminosity, which provides the normalization
to the SED, and the column density NH , which gives the shape
to the SED (see Fig. 2 in Silva et al. 2004). The shapes of the
SEDs of the Seyfert galaxies are assumed to be valid also at
quasar luminosities.
In the attempt to provide a better estimate for the source
redshifts, we used the four torus templates as given by
Silva et al. (2004), which depend on the column densities NH ,
and we left the normalizations free to vary. The redshift interval
explored by this procedure is 0.5–3.0; the best-fit solution is ob-
tained when the algorithm, based on the χ2, finds a minimum
in the galaxy template, torus template and redshift parameter
space.
For four out of six sources the procedure finds a clear min-
imum χ2, which allows us to determine a photometric red-
shift with relatively good accuracy (see Table 3). For sources
BPM 16274#69 and PKS 0537−28#54, our procedure con-
strains only the lower bound of the redshift interval. For all
the six sources, the estimated redshifts are consistent with
the minimum redshifts of Mignoli et al. (2004). In case of the
source BPM 16274#69, where a secure spectroscopic redshift
is available, the minimum photometric redshift (zphot > 1.25)
is consistent with the spectroscopic one (z = 1.35). For source
Abell 2690#75, we find zphot = 1.30
+0.30
−0.20, which is signifi-
cantly lower than the spectroscopic value (z = 2.13) reported
by Maiolino et al. (2006). In this case, we choose to adopt the
photometric determination, since the spectroscopic redshift is
based on the tentative detection of a single line.
The results of the SED fitting and decomposition are shown
in Fig. 2. The dot-dashed line represents the best-fit galaxy tem-
plate, the dashed line is the best-fit nuclear template and the
thick solid line is the sum of the two components. The best-fit
galaxy templates are all typical of early-type galaxies with ages
between 3 and 6 Gyr. We find an overall agreement between the
SED templates and the data points. Moreover, in all but one of
the sources, the nuclear component derived from the best fit
is consistent with the upper limits derived from the analysis
of the Ks-band images (shown as downward-pointing arrows).
In Fig. 2 we also report as dotted line the SED of the nuclear
component normalized to the intrinsic (i.e., de-absorbed) X-ray
luminosity following the prescriptions of Silva et al. (2004),
where, at a given NH , the normalization depends only on the in-
trinsic hard X-ray luminosity. The overall agreement between
the SEDs normalized to the X-ray luminosity and the best-fit
SEDs is extremely interesting, being consistent within a factor
of ≈ 2–3.
In Table 3 we report, along with the photometric redshifts,
the column densities NH and the de-absorbed L2−10 keV lumi-
nosities. We derive rest-frame NH column densities in the range
1022.0–1023.4 cm−2 and 2–10 keV luminosities in the range
6 F. Pozzi, C. Vignali, A. Comastri et al.: Spitzer observations of luminous obscured quasars
Fig. 2. Rest-frame SEDs of the elliptical sources (black filled circles) compared with the best-fit model obtained as the sum (solid
line) of an early-type galaxy (dot-dashed line) and a nuclear component (dashed line). For comparison, the nuclear component
as derived from the X-ray normalization is also reported (dotted line). The nuclear Ks-band upper limits (downward-pointing
arrows) were derived from the morphological analysis carried out by Mignoli et al. (2004). The 24 µm upper limit of source
PKS 0312−77#36 takes into account a possible contribution from a companion source. zs means that the redshift is spectroscopic
(see §2 for details), while zphot means that the redshift is photometric (see §4.1).
1043.8–1044.7 erg s−1, placing these sources among the Type 2
quasar population.
4.2. Point-like sources
From the Ks-band morphological analysis, Abell 2690#29 and
PKS 0312−77#45 (see Mignoli et al. 2004 and Table 1) show
a completely different appearance at 2.2 µm in comparison
with the sample of extended objects; these two sources have
their near-IR emission mostly dominated by an unresolved
source. The dominant role played by the AGN is supported
for Abell 2690#29 also by its near-IR spectrum, where a broad
Hα emission line is detected (see Maiolino et al. 2006 and §2).
F. Pozzi, C. Vignali, A. Comastri et al.: Spitzer observations of luminous obscured quasars 7
Fig. 3. Rest-frame SEDs of the two point-like sources (black filled circles) compared to an extinguished quasar template (dashed
line) and the best-fit red quasar template (solid line). The extinguished quasar template is obtained from the unobscured quasar
template of Elvis et al. (1994) using the SMC extinction law with E(B − V) = 0.7 and scaled to fit the R − Ks colour (for
comparison, the unobscured quasar template is also shown as a dotted line). The red quasar template is taken from Polletta et al.
(2006). The 24 µm flux density upper limit for the source Abell 2690#29 takes into account a possible contribution from a
companion source. zs means that the redshift is spectroscopic (see §2 for details), while zphot means that the redshift is photometric
(see §4.2).
Unfortunately, the near-IR spectroscopy information is absent
for PKS 0312−77#45 (see Table 1).
We first tried to reproduce their observed SEDs by redden-
ing the composite template spectrum of bright Type 1 quasars
of Elvis et al. (1994) with several extinction laws. Reddening
has been applied as prescribed by Calzetti (1997) for a dust-
screen model and by Pei (1992) for the Small Magellanic
Cloud (SMC) galaxy. The two prescriptions produce similar
effects at λ > 0.5 µm, but the SMC law produces redder spec-
tra at shorter wavelengths for the same amount of extinction.
Reddened templates with an SMC law reproduce quite well the
optical spectra of dust-reddened quasars in the Sloan Digital
Sky Survey (SDSS; see Richards et al. 2003), while, using the
Calzetti (1997) law, Polletta et al. (2006) were able to repro-
duce the SEDs of X-ray sources in the Spitzer SWIRE survey.
The procedure of reddening a typical Type 1 quasar does
not provide a satisfactory fit to the photometric data points of
our sources. In Fig. 3 we show the results obtained when the
prescription of Mignoli et al. (2004) for the extinction [SMC
extinction law and E(B − V) = 0.7] is adopted.
The dashed line shows the reddened quasar template nor-
malized to fit the R−Ks colour. For comparison, the unobscured
quasar template is also shown (dotted line). Although the R−Ks
colour is obviously reproduced, the overall SED is not well re-
produced, since the observed IRAC and 24 µm flux densities
are systematically lower than predicted (up to a factor of 10 at
24 µm).
The discrepancy between the data points and the reddened
Type 1 quasar template might be due to the application to ac-
tive galactic nuclei of an extinction curve derived from galax-
ies. The different behaviour of AGNs from galaxies can be
attributed to different dust distribution (i.e., torus shape in
AGNs), and gas-to-dust ratios, which can lead to an unusual
dust reddening curve for AGNs.
Since a reddened quasar template does not reproduce the
shape of our data, we adopted the red quasar template from
Polletta et al. (2006), which is a composite spectrum: in the
optical/near-IR band, it is the spectrum of the red quasar
FIRST J013435.7−093102 from Gregg et al. (2002), while the
average of several bright quasars from the Palomar-Green (PG)
sample (Schmidt & Green 1983) with consistent optical data
has been used in the IR. The Polletta et al. (2006) template re-
produces the observed data points significantly better, allowing
for the observed sharp decrease from 0.2 to 0.7 µm in the source
rest frame (see Fig. 3 where the Polletta et al. 2006 spectrum is
shown as a solid line).
For source Abell 2690#29, which has a spectroscopic red-
shift, the SED normalization has been obtained through a best-
fit procedure. For source PKS 0312−77#45, where only a min-
imum redshift was available prior to this analysis, we have left
free to vary both the normalization and the redshift.
In Table 3 we report the derived redshifts, column densi-
ties NH and de-absorbed L2−10 keV also for these sources which,
similarly to the elliptical-like sources, belong to the Type 2
quasar population.
5. Physical parameters
5.1. Bolometric correction
Once the SED of the nuclear component has been determined,
the following step is to estimate the bolometric luminosity Lbol,
which is a quantity directly related to the central black hole
activity.
The bolometric luminosity Lbol can be estimated from
the luminosity in a given band b, Lb, by applying a suit-
able bolometric correction kbol,b = Lbol/Lb. For the X-ray se-
lected sources, the bolometric luminosity is typically estimated
8 F. Pozzi, C. Vignali, A. Comastri et al.: Spitzer observations of luminous obscured quasars
from the luminosity in the 2–10 keV band (kbol,2−10 keV =
Lbol/L2−10 keV ). In previous works, several authors used the
bolometric correction obtained by Elvis et al. (1994) for lumi-
nous, mostly nearby quasars, i.e., kbol,2−10 keV≈ 30. However,
these corrections could be affected by the following uncer-
tainties: firstly, they are average corrections obtained from
a few dozens of bright quasars; secondly, as discussed in
Marconi et al. (2004), these corrections could overestimate the
bolometric luminosities since they are based on the integral
of the observed SEDs of bright unobscured AGN, without
removing the IR bump (hence counting twice a fraction of
≈ 30% of the intrinsic optical–UV radiation). At lower lumi-
nosities (typical of Seyfert galaxies, i.e., 1042 − 1044 erg s−1), a
lower value for this correction (kbol,2−10 keV≈ 10) was suggested
(e.g., Fabian 2004). For heavily obscured luminous sources,
only few objects have been studied in detail; in particular, for
two SWIRE Compton-thick (i.e., log NH & 24 cm
−2) AGN
Polletta et al. (2006) found kbol,0.3−8 keV ≈ 3 and ≈ 100.
In this work, thanks to the multi-band observations and ef-
forts in disentangling the AGN and the host components, we
try to derive directly the nuclear bolometric luminosity of our
sources without assuming any average correction. We estimate
Lbol by adding the X-ray luminosity integrated over the entire
X-ray range (L0.5−500 keV , not corrected for absorption) to the
IR luminosity (L1−1000 µm).
L0.5−500 keV has been estimated from the observed L2−10 keV
luminosity assuming a single power-law spectrum with Γ=1.9
(typical for AGN emission; see, e.g., Fig. 6 of Vignali et al.
2005 and references therein) plus absorption (where the col-
umn densities are taken from the X-ray spectral analysis; see
Perola et al. 2004 and Lanzuisi et al., in preparation) and an
exponential cut-off at 200 keV. The median value found for the
ratio L0.5−500 keV/L2−10 keV is ≈ 4. The IR luminosity has been
estimated by integrating the SED from 1 µm to 1000 µm using
only the nuclear component for the AGN hosted in the ellipti-
cal galaxies (see §4.1), and the Polletta et al. (2006) template
for the point-like sources.
Before computing the bolometric output of our sources, the
derived IR luminosities must be properly corrected to account
for the geometry of the torus and its orientation. The first cor-
rection is related to the covering factor f (which represents the
fraction of the primary optical–UV radiation intercepted by the
torus), while the second correction is due to the anisotropy of
the IR emission, which is a function of the viewing angle (see
Pier & Krolik 1993 and Granato & Danese 1994 for further de-
tails).
We estimated the first correction (≈ 1.5) from the ratio
of obscured (Compton thin + Compton thick) to unobscured
quasars as required by the most recent X-ray background syn-
thesis model (see Gilli et al. 2007) in the luminosity range of
our sources. A correction of ≈ 1.5 implies an average covering
factor f≈0.67 which, in a simple torus geometry, corresponds
to an angle θ≈48◦ between the perpendicular to the equatorial
plane and the edge of the torus.
A first-order estimate of the anisotropy factor has been
computed from the Silva et al. (2004) templates as the ratio
(R) of the luminosity of a face-on vs. an edge-on AGN, whose
obscuration is parametrized as a function of NH. The integra-
tion has been performed in the 1–30 µm range, after normal-
izing the two SEDs to the same luminosity in the 30–100 µm
range, where the anisotropy is thought to be negligible. The
derived anisotropy factors are large only for the Silva et al.
(2004) template with higher column density (R ≈3–4 for NH =
1024.5 cm−2); since all of our targets are characterized by lower
obscuration, such corrections do not affect our IR luminosities
significantly (R ≈1.2–1.3 for NH ≈ 10
22.0 − 1023.4 cm−2). In
conclusion, the final combined corrections to be applied to the
observed IR luminosities of our sources, given their column
densities, are in the range ≈ 1.8–2.0. After adding the X-ray lu-
minosities, the IR correction factors would translate in a mean
correction factor of ≈ 1.7 in the computation of the bolometric
luminosities.
In Table 3 the derived bolometric luminosities are reported
along with the full range (L0.5−500 keV) of X-ray luminosities,
the IR (L1−1000 µm) luminosities and the bolometric corrections
(kbol,2−10 keV ). We note that our L1−1000 µm estimates (hence Lbol)
are robust despite the choice of our SEDs. By comparing the
L1−1000 µm obtained using the Silva et al. (2004) model with the
L1−1000 µm obtained adopting other recent average quasar SEDs
(i.e., Richards et al. 2006), we have verified that the uncertain-
ties in L1−1000 µm are within the ≈ 10% level. The bolometric
output of our targets is dominated by the IR reprocessed emis-
sion, the primary X-ray radiation (L0.5−500 keV) accounting only
for .15% of the total luminosity.
The derived median (mean) value of kbol,2−10 keV is ≈ 25
(35±9; see Fig. 4 and column 8 of Table 3), consistent with the
value kbol,2−10 keV≈ 30 from Elvis et al. (1994) widely adopted
in past works. However, as pointed out also by Elvis et al.
(1994), the bolometric corrections span a wide range of values
(≈ 12–100); as a consequence, the adoption of a mean value
could lead to inaccurate results.
We note that in the extreme case where no corrections for
the covering factor and the anisotropy of the torus are applied,
we would obtain a median kbol,2−10 keV of ≈ 16.
In Fig. 4 the derived Lbol as a function of L2−10 keV is shown;
the dot-dashed line joins the expected values from the analysis
of Marconi et al. (2004), where kbol,2−10 keV is derived by con-
structing an AGN reference template taking into account how
the spectral index αox (Zamorani et al. 1981) varies as a func-
tion of the luminosity (Vignali et al. 2003). Although the bolo-
metric luminosities estimated for our objects are on average
lower than those expected on the basis of Marconi et al. (2004)
relation, they are however consistent with a trend of higher
kbol,2−10 keV for objects with higher X-ray luminosity. If we fit
our objects in the Lbol − L2−10keV plane with the same slope as
the Marconi et al. (2004) relation, the difference in normaliza-
tion is ≈ 50%.
5.2. Galaxy and black hole masses, and black hole
Eddington ratios
For the AGN hosted in elliptical galaxies we are able to infer
both the galaxy and the black hole masses. The galaxy masses
are estimated, assuming a Salpeter (1955) initial mass func-
tion (IMF), from the Ks luminosities taking into account that
ozzi,C
ignali,A
astri
itzer
observations
inous
obscured
quasars
Table 3. Inferred rest-frame properties of our targets
Source Id. za NbH L
2−10 keV L
0.5−500 keV L
1−1000 µm Lbol Lbol/L2−10 keV L
K Mstar M
BH (Lbol/LEdd)
(1022 cm−2) (1044 erg s−1) (1045 erg s−1) (1045 erg s−1) (1045 erg s−1) (1011Lk,⊙) (10
11M⊙) (10
Abell 2690#75 1.30+0.30
−0.20 6.9 3.2 1.3 9.7 11.0 34.7 5.2 3.5 1.3 0.065
PKS 0312-77#36 0.90+0.05
−0.15 1.0 0.7 0.2 1.2 1.5 20.6 1.0 0.8 0.2 0.058
PKS 0537-28#91 1.30+0.40
−0.70 25.8 5.3 2.6 4.4 7.1 13.4 2.8 1.5 0.7 0.084
PKS 0537-28#54 >1.30 1.6 2.0 0.6 3.9 4.6 23.0 3.0 2.0 0.7 0.049
PKS 0537-28#111 1.20+0.20
−0.10 9.1 1.7 0.7 1.4 2.1 12.3 7.8 6.2 2.1 0.008
Abell 2690#29 2.08 2.1 8.4 2.8 78.4 81.2 97.0 4.17 – – –
PKS 0312-77#45 1.85+0.20
−0.30 8.0 6.2 2.6 27.2 29.8 47.9 1.65 – – –
BPM 16274#69 1.35 2.5 2.4 0.8 5.9 6.7 28.2 8.8 6.0 2.5 0.022
a Photometric redshifts as derived from the analysis presented in this paper (see §4.1). For source PKS 0537-28#54, only a minimun redshift was estimated; for sources Abell 2690#29
and BPM 16274#69, the spectroscopic redshifts measured by Maiolino et al. (2006) are reported.
b The column densities, measured through X-ray spectral fitting (see Perola et al. 2004 and Lanzuisi et al., in preparation, for details), were “matched” to the redshift used in the SED
best-fitting procedure (using the relation NH(z)=NH(z = 0)×(1 + z)
2.6).
c Absorption-corrected X-ray luminosity.
d The 0.5–500 keV luminosities have been derived from the observed 2–10 keV luminosities as described in the text. The luminosities are not corrected for absorption.
e The 1–1000 µm luminosities have been derived from the integral of the nuclear SEDs (including the corrections described in the text). The values reported for the point-like sources
refer to the Polletta et al. (2006) red quasar template (see 4.2 for details).
f The Ks-band luminosities refer to the nuclear component for the point-like sources (Abell 2690#29 and PKS 0312−77#45) and to the host-galaxy starlight for the AGN hosted in the
elliptical galaxies.
g,h The reported MBH and Lbol/LEdd have been computed from the local LK −MBH relation (Marconi & Hunt 2003), under the hypothesis of an evolution of the MBH/Mstar ratio of a factor
two with redshift (see §5.2 for details).
10 F. Pozzi, C. Vignali, A. Comastri et al.: Spitzer observations of luminous obscured quasars
Fig. 4. Bolometric luminosity vs. absorption-corrected 2–
10 keV luminosity for the six AGN hosted in the elliptical
galaxies (circles) and the two point-like AGN (triangles). The
filled symbols refer to the values corrected for the covering fac-
tor and torus anisotropy, while the empty symbols refer to the
bolometric luminosity without applying these corrections. The
dot-dashed line represents the correlation from Marconi et al.
(2004). The three dashed lines represent the loci of kbol,x=10,
30 and 100.
Fig. 5. Black hole mass (MBH) vs. bolometric luminosity (Lbol)
for the AGN hosted in the elliptical galaxies. The Eddington
luminosity for a given MBH is reported in the right-hand axis.
The black hole masses have been estimated from the local
LK − MBH relation (Marconi & Hunt 2003) under two differ-
ent hypotheses (see 5.2): (1) evolution by a factor of two of
the MBH/Mstar ratio with redshift in comparison to the local
values (black filled circles); (2) no evolution of the MBH/Mstar
ratio (empty circles). The three dashed lines represent the loci
of Lbol/LEdd = 0.01, 0.033 and 0.1 (from left to right).
Mstar/LK for an old stellar population can vary from ≈ 0.5 to
≈ 0.9 (for ages between 3 and 6 Gyr; Bruzual & Charlot 2003).
We can derive Mstar directly from LK since for these sources
the Ks-band emission is dominated by the galaxy starlight. The
rest-frame LK have been derived using the appropriate SED
templates (see Sect. 4.1). The inferred stellar masses are in
the range (0.8–6.2)×1011 M⊙, implying that our obscured AGN
are hosted by massive elliptical galaxies at high redshifts. In
Table 3 both the LK and Mstar values are reported; we note
that the different assumption of the Chabrier (2003) IMF would
produce a factor ≈ 1.7 lower masses (di Serego Alighieri et al.
2005).
To estimate the black hole masses, we take advantage of the
local MBH − LK relation (Marconi & Hunt 2003) which, taking
into account the Mstar/LK values, is expression of the intrin-
sic MBH − Mstar relation. Given the challenging measurements
of high-redshift black hole masses, the behaviour of this rela-
tion with redshift is still matter of debate and different authors,
using different techniques, have found different results.
Woo et al. (2006) and Peng et al. (2006) derive a signifi-
cant evolution of the MBH − Mstar relation with redshift, be-
ing the MBH/Mstar ratio larger, at high redshift, up to a factor
≈ 4 in comparison to the local value. In the Woo et al. (2006)
analysis the discrepancy with respect to the local value is al-
ready present at z = 0.36, while Peng et al. (2006) find an av-
erage MBH/Mstar a factor &4 times larger than the local value
at z > 1.7, while at lower redshifts (1 . z . 1.7) they de-
rive a ratio which is at most two times higher than the local
value, and maybe consistent with marginal or no evolution. On
the other hand, Shields et al. (2006) and Hopkins et al. (2006)
suggest that the MBH/Mstar ratio is not significantly higher (at
most a factor of two) than that measured locally up to z . 2.
Given these uncertainties about the evolution of the MBH −
Mstar relation with redshift, we have estimated the black hole
masses for our objects under two different hypotheses: (1) the
MBH/Mstar ratio is higher than locally by a factor two in the
redshift range (0.9 . z . 1.4) of our sources; (2) the MBH/Mstar
ratio does not evolve with redshift.
In both cases, our results imply very massive black hole
masses (see Fig. 5), in the range ≈ 2.0 × 108 − 2.5× 109 M⊙ in
the former hyphothesis, with a factor of two lower values under
the second hypothesis.
Our estimated MBH are consistent with the results derived
by McLure & Dunlop (2004) studying a large sample of Type 1
SDSS quasars and deriving the black hole masses from virial
methods; most of their black hole masses are in the range
1.5×108–2.5×109 M⊙ in the redshift interval of our sample (see
Fig. 1 of McLure & Dunlop 2004).
From the comparison of the bolometric luminosities com-
puted in the previous section (see Table 3) with the Eddington
luminosities calculated from the black hole masses esti-
mated above, we derive that our obscured AGN are radiat-
ing at a relatively low fraction of their Eddington luminos-
ity (λ ≈ 0.008–0.084 and λ ≈ 0.015–0.170 under the two hy-
potheses; see Fig. 5 and Table 3). This finding confirms and
extends to a larger sample the results found by Maiolino et al.
(2006) for two sources of our sample and by Brusa et al. (2005)
for a sample of EROs in the “Daddi Field”. As suggested by
Maiolino et al. (2006), the data indicate that our very massive
black holes may have already passed their rapidly accreting
phase and are reaching their final masses at low accretion rates.
F. Pozzi, C. Vignali, A. Comastri et al.: Spitzer observations of luminous obscured quasars 11
Fig. 6. Bolometric luminosity as a fraction of the Eddington lu-
minosity vs. redshift for the whole sample of SDSS quasars of
McLure & Dunlop (2004), plotted as small crosses. The large
circles indicate the six HELLAS2XMM AGN hosted in ellipti-
cal galaxies (symbols as in Fig. 5).
The estimated radiating efficiencies are significantly lower
than the average Lbol/LEdd ≈ 0.4 inferred by Marconi et al.
(2004). However, since in the Marconi et al. (2004) model only
the phases of significant black hole growth are considered, our
results are not in contrast with the proposed model but sug-
gest that our targets belong to the tail of the sources charac-
terized by low accretion rates. Consistently, our data (black
filled and empty circles representing the evolution and no-
evolution hypothesis, respectively) lie in the lower envelope of
the Eddington ratio distribution found by McLure & Dunlop
(2004) for their large SDSS quasar sample. This is shown in
Fig. 6, where our data are overlaid on the SDSS data points.
This suggests that the SDSS quasar survey and the HELLAS
survey probe different regimes of AGN activity: the SDSS sam-
ples the brightest sources in the sky (R <∼ 20), most likely char-
acterized by a high accretion rate, while our targets (X-ray se-
lected, optically faint, i.e., R > 24, and obscured), are asso-
ciated with a different evolutionary phase. We argue that the
SMBH in our targets has already reached its final mass and the
observed emission is witnessing a late stage of the accretion
activity.
6. Conclusions
We have performed with Spitzer a pilot program to study a sam-
ple of eight Type 2 (i.e., luminous and obscured) quasars at
high redshift, selected from the HELLAS2XMM survey. Three
sources have a measured spectroscopic redshift (two secure and
one tentative) from near-IR spectroscopy; the remaining ob-
jects have an estimated minimum redshift obtained from the
R − K colours. On the basis of their Ks-band morphological
properties, the sample is divided into two classes: sources with
radial profiles typical of elliptical galaxies and point-like ob-
jects. The most important results can be summarized as fol-
lows:
• All of the eight sources have been clearly detected in both
IRAC and MIPS 24 µm bands.
• The Spitzer observations have allowed us to detect the nu-
clear component (often hidden at short wavelengths by the
host galaxy) as thermal IR re-processed emission from the
circumnuclear torus. While for the two point-like sources
the nuclear component dominates at all frequencies, for the
six sources with elliptical-like radial profile the contribu-
tion from the strong stellar continuum is dominant up to
the first IRAC bands, but the torus emission accounts for
the entire emission at 24 µm.
• Taking advantage of the new Spitzer data, the nuclear SEDs
of the sources have been modeled and new photometric
redshifts have been estimated, following two approaches:
for the elliptical sources, the nuclear emission has been
“cleaned” from the host galaxy contribution adopting a
two-component model (galaxy plus nuclear component),
constrained using all the extensive observed data sets. For
the point-like sources, the SEDs appear inconsistent with
an extinguished Type 1 quasar template, being well repro-
duced by an empirical SED of red quasars (Polletta et al.
2006). We find an overall agreement between the SED tem-
plates and the data points, and the derived photometric red-
shifts are consistent with the spectroscopic ones for two
sources.
• Using the model components to extrapolate the nuclear
SEDs in the far-IR regime, we derived the bolometric lumi-
nosities (being in the range ≈ 1045−1047 erg s−1) by adding
the IR luminosities to the full range of X-ray luminosities.
In this computation, we have considered and discussed the
corrections to be applied to the observed IR luminosities to
take into account the covering factor of the torus and the
anisotropy of the IR emission. The median 2–10 keV bolo-
metric correction is ≈ 25, consistent with the value typically
assumed in literature.
• For the elliptical sources, thanks to the independent esti-
mates of the stellar light and nuclear bolometric luminos-
ity, the physical parameters of the central black holes have
been estimated using the MBH − LK relation and exploring
different hypotheses for the evolution of the MBH/Mstar ra-
tio with redshift. Under the hyphothesis that the MBH/Mstar
ratio is a factor of two higher at z ≈ 1.2 than locally, our
luminous, obscured AGN have masses in the range (0.2-
2.5)×109 M⊙, reside in massive [(0.8-6.2)×10
11 M⊙] high-
redshift ellipticals and are characterized by low Eddington
ratios (λ≈ 0.008–0.084). Through our direct estimate of the
IR luminosity, we confirm the conclusion of Maiolino et al.
(2006) that these black holes may have already passed their
rapidly accretion phase.
Acknowledgements. The authors acknowledge partial support by the
Italian Space agency under the contract ASI–INAF I/023/05/0. The
authors thank R. Gilli, M. Polletta and L.Silva for useful discussions,
and R. J. McLure for kindly providing us with the data points of Fig. 6.
We thank the anonymous referee for the useful comments.
References
Antonucci, R. 1993, ARA&A, 31, 473
12 F. Pozzi, C. Vignali, A. Comastri et al.: Spitzer observations of luminous obscured quasars
Baldi, A., Molendi, S., Comastri, A., et al. 2002, ApJ, 564, 190
Barger, A. J., Cowie, L. L., Mushotzky, R. F., et al. 2005, AJ,
129, 578
Brusa, M., Comastri, A., Daddi, E., et al. 2005, A&A, 432, 69
Bruzual, G. & Charlot, S. 2003, MNRAS, 344, 1000
Calzetti, D. 1997, in American Institute of Physics Conference
Series, ed. W. H. Waller, 403
Chabrier, G. 2003, PASP, 115, 763
Cocchia, F., Fiore, F., Vignali, C., et al. 2007, A&A, in press,
astro-ph/0612023
Comastri, A. & Fiore, F. 2004, Ap&SS, 294, 63
Comastri, A., Setti, G., Zamorani, G., & Hasinger, G. 1995,
A&A, 296, 1
di Serego Alighieri, S., Vernet, J., Cimatti, A., et al. 2005,
A&A, 442, 125
Elvis, M., Wilkes, B. J., McDowell, J. C., et al. 1994, ApJS, 95,
Fabian, A. C. 2004, in Coevolution of Black Holes and
Galaxies, ed. L. C. Ho, 446
Fadda, D., Flores, H., Hasinger, G., et al. 2002, A&A, 383, 838
Fadda, D., Marleau, F. R., Storrie-Lombardi, L. J., et al. 2006,
AJ, 131, 2859
Fazio, G. G., Hora, J. L., Allen, L. E., et al. 2004, ApJS, 154,
Fiore, F., Brusa, M., Cocchia, F., et al. 2003, A&A, 409, 79
Franceschini, A., Manners, J., Polletta, M. d. C., et al. 2005,
AJ, 129, 2074
Gilli, R., Comastri, A., & Hasinger, G. 2007, A&A, 463, 79
Granato, G. L. & Danese, L. 1994, MNRAS, 268, 235
Granato, G. L., Danese, L., & Franceschini, A. 1997, ApJ, 486,
Gregg, M. D., Lacy, M., White, R. L., et al. 2002, ApJ, 564,
Hickox, R. C. & Markevitch, M. 2006, ApJ, 645, 95
Hopkins, P. F., Hernquist, L., Cox, T. J., et al. 2006, ApJS, 163,
Lonsdale, C., Polletta, M. d. C., Surace, J., et al. 2004, ApJS,
154, 54
Maiolino, R., Mignoli, M., Pozzetti, L., et al. 2006, A&A, 445,
Makovoz, D. & Marleau, F. R. 2005, PASP, 117, 1113
Marconi, A. & Hunt, L. K. 2003, ApJ, 589, L21
Marconi, A., Risaliti, G., Gilli, R., et al. 2004, MNRAS, 351,
McLure, R. J. & Dunlop, J. S. 2004, MNRAS, 352, 1390
Mignoli, M., Pozzetti, L., Comastri, A., et al. 2004, A&A, 418,
Pei, Y. C. 1992, ApJ, 395, 130
Peng, C. Y., Impey, C. D., Rix, H.-W., et al. 2006, ApJ, 649,
Perola, G. C., Puccetti, S., Fiore, F., et al. 2004, A&A, 421, 491
Pier, E. A. & Krolik, J. H. 1993, ApJ, 418, 673
Polletta, M. d. C., Wilkes, B. J., Siana, B., et al. 2006, ApJ, 642,
Richards, G. T., Hall, P. B., Vanden Berk, D. E., et al. 2003, AJ,
126, 1131
Richards, G. T., Lacy, M., Storrie-Lombardi, L. J., et al. 2006,
ApJS, 166, 470
Rieke, G. H., Young, E. T., Engelbracht, C. W., et al. 2004,
ApJS, 154, 25
Rigby, J. R., Rieke, G. H., Maiolino, R., et al. 2004, ApJS, 154,
Salpeter, E. E. 1955, ApJ, 121, 161
Schmidt, M. & Green, R. F. 1983, ApJ, 269, 352
Shields, G. A., Salviander, S., & Bonning, E. W. 2006, New
Astronomy Review, 50, 809
Silva, L., Maiolino, R., & Granato, G. L. 2004, MNRAS, 355,
Soltan, A. 1982, MNRAS, 200, 115
Spergel, D. N., Verde, L., Peiris, H. V., et al. 2003, ApJS, 148,
Surace, J. A., Shupe, D. L., Fang, F., & et al. 2005, tech-
nical report, The SWIRE Data Release 2. Available at
http://swire.ipac.caltech.edu/swire/astronomers/
publications/SWIRE2_doc_083105.pdf
Vignali, C., Brandt, W. N., & Schneider, D. P. 2003, AJ, 125,
Vignali, C., Brandt, W. N., Schneider, D. P., & Kaspi, S. 2005,
AJ, 129, 2519
Werner, M. W., Roellig, T. L., Low, F. J., et al. 2004, ApJS,
154, 1
Woo, J.-H., Treu, T., Malkan, M. A., & Blandford, R. D. 2006,
ApJ, 645, 900
Zamorani, G., Henry, J. P., Maccacaro, T., et al. 1981, ApJ,
245, 357
	Introduction
	Sample selection
	Spitzer observations and data reduction
	Analysis of the spectral energy distributions
	Elliptical-like sources
	Point-like sources
	Physical parameters
	Bolometric correction
	Galaxy and black hole masses, and black hole Eddington ratios
	Conclusions
ABSTRACT
  Aims: We aim at estimating the spectral energy distributions (SEDs) and the
physical parameters related to the black holes harbored in eight high
X-ray-to-optical (F_X/F_R>10) obscured quasars at z>0.9 selected in the 2--10
keV band from the HELLAS2XMM survey.
  Methods: We use IRAC and MIPS 24 micron observations, along with optical and
Ks-band photometry, to obtain the SEDs of the sources. The observed SEDs are
modeled using a combination of an elliptical template and torus emission (using
the phenomenological templates of Silva et al. 2004) for six sources associated
with passive galaxies; for two point-like sources, the empirical SEDs of red
quasars are adopted. The bolometric luminosities and the M_BH-L_K relation are
used to provide an estimate of the masses and Eddington ratios of the black
holes residing in these AGN.
  Results: All of our sources are detected in the IRAC and MIPS (at 24 micron)
bands. The SED modeling described above is in good agreement with the observed
near- and mid-infrared data. The derived bolometric luminosities are in the
range ~10^45-10^47 erg s^-1, and the median 2--10 keV bolometric correction is
~25, consistent with the widely adopted value derived by Elvis et al. (1994).
For the objects with elliptical-like profiles in the K_s band, we derive high
stellar masses (0.8-6.2)X10^11 Mo, black hole masses in the range
(0.2-2.5)X10^9 Mo, and Eddington ratios L/L_Edd<0.1, suggesting a low-accretion
phase.

<|endoftext|><|startoftext|>
Introduction
1.1. p-Adic wavelets and pseudo-differential operators. According to
the well-known Ostrovsky theorem, any nontrivial valuation on the field Q is
equivalent either to the real valuation | · | or to one of the p-adic valuations
| · |p. We recall that the field Qp of p-adic numbers is defined as the completion
of the field of rational numbers Q with respect to the non-Archimedean p-adic
norm | · |p. This norm is defined as follows: if an arbitrary rational number
x 6= 0 is represented as x = pγ m
, where γ = γ(x) ∈ Z, and m and n are not
divisible by p, then
(1.1) |x|p = p−γ , x 6= 0, |0|p = 0.
This norm inQp satisfies the strong triangle inequality |x+y|p ≤ max(|x|p, |y|p).
Thus there are two equal in rights universes: the real universes and the
p-adic one. The latter has a specific and unusual properties. Nevertheless,
there are a lot of papers where different applications of p-adic analysis to
physical problems, stochastics, cognitive sciences and psychology are stud-
ied [6]– [10], [13]– [19], [34]– [36] (see also the references therein). In view
of the Ostrovsky theorem such investigations not only have great interest in
Date:
2000 Mathematics Subject Classification. Primary 11F85, 42C40; Secondary 46F10.
Key words and phrases. p-adic multiresolution analysis, p-adic compactly supported
wavelets.
The first author (V. S.) was also supported in part by DFG Project 436 RUS 113/809
and Grant 05-01-04002-NNIOa of Russian Foundation for Basic Research.
http://arxiv.org/abs/0704.0736v1
2 V. M. SHELKOVICH AND M. SKOPINA
itself, but lead to applications and better understanding of similar problems
in usual mathematical physics.
We recall that there exists a p-adic analysis connected with the mapping Qp
into Qp and an analysis connected with the mapping Qp into the field of com-
plex numbers C, there exist two types of p-adic physics models. For the p-adic
analysis related to the mapping Qp → C the operation of partial differentia-
tion is not defined , and as a result, large number of models connected with
p-adic differential equations use pseudo-differential operators and the theory
of p-adic distributions (generalized functions) (see the above mentioned pa-
pers and books). In particular, fractional operators Dα are extensively used
in applications (see fore-quoted papers and especially [34]).
It is well known that the theory of p-adic pseudo-differential operators (in
particular, fractional operators) and equations closely related to wavelet type
bases. It is typical that p-adic compactly supported wavelets are eigenfunc-
tions of p-adic pseudo-differential operators [3]– [5], [16], [17], [18], [20] – [22].
Thus the wavelet theory plays a key role in application of p-adic analysis and
gives a new powerful technique for solving p-adic problems. This theory starts
development only in resent years and has many open problems.
In [20], S. V. Kozyrev constructed the orthonormal compactly supported
p-adic wavelet basis (1.2) in L2(Qp):
(1.2) θγja(x) = p
−γ/2χp
p−1j(pγx− a)
|pγx− a|p
, x ∈ Qp,
j ∈ Jp = {1, 2, . . . , p−1}, γ ∈ Z, a ∈ Ip = Qp/Zp. Kozyrev’s wavelets (1.2) are
eigenfunctions of the Vladimirov fractional operator [34, IX]. Further develop-
ment and generalization of the theory of such type wavelets can be found in
the papers by S. V. Kozyrev [21], [22], A. Yu. Khrennikov, and S. V. Kozyrev
[16], [17], J. J. Benedetto, and R. L. Benedetto [8], and R. L. Benedetto [9].
In [3], the multidimensional p-adic wavelets generated by direct product of
the Kozyrev one-dimensional wavelets were introduced. In [18], a new type of
p-adic multidimensional wavelet basis was introduced:
θ(m)γsa (x) = p
−γ/2χp
s(pγx− a)
|pγx− a|p
, x ∈ Qp,
where s ∈ Jp;m, γ ∈ Z, a ∈ Ip. Here Jp;m = {s = p−m
s0 + s1p + · · · +
sm−1p
: sj = 0, 1, . . . , p − 1; j = 0, 1, . . . , m − 1; s0 6= 0}, m ≥ 1 is
a fixed positive integer. The multidimensional wavelets from [3] are a par-
ticular case of the last wavelets. Moreover, in [3], [18], there were derived
the necessary and sufficient conditions for a class of multidimensional p-adic
pseudo-differential operators (including fractional operator) to have such mul-
tidimensional wavelets as eigenfunctions.
It remains to point out that for pseudo-differential operators from [3], [18]
a “natural” definition domain is the Lizorkin spaces of distributions Φ′(Qnp ),
introduced in [3]. The space Φ′(Qnp ) is invariant under the mentioned above
pseudo-differential operators. Moreover, the above mentioned p-adic wavelets
belong to the Lizorkin space Φ(Qnp ) of test functions. Recall that the usual
p-ADIC HAAR MULTIRESOLUTION ANALYSIS 3
Lizorkin spaces were studied in the excellent papers of P. I. Lizorkin [24], [25]
(see also [29], [30]).
It’s interesting to compare appearing first wavelets in p-adic analysis with
the history of the wavelet theory in real analysis. In 1910 Haar [12] constructed
an orthogonal basis for L2(R) consisting of the dyadic shifts and scales of one
piecewise constant function. A lot of mathematicians actively studied Haar
basis, different kinds of generalizations were introduced, but during almost the
whole century nobody could find another wavelet function (a function whose
shifts and scales form an orthogonal basis). Only in early nineties a method
for construction of wavelet functions appeared. This method is based on the
notion of multiresolution analysis (MRA in the sequel) introduced by Y. Meyer
and S. Mallat [28], [26], [27]. Smooth compactly supported wavelet functions
were found in this way, which has been very important for some engineering
applications. In this paper we introduce MRA in L2(Qp) and present a concrete
MRA for p = 2 being an analog of Haar MRA in L2(R). The same scheme as
in the real setting leads to a Haar basis. It turned out that this Haar basis
coincides with Kozyrev’s wavelet system. However, 2-adic Haar MRA is not
an identical copy of its real analog. In contrast to Haar MRA in L2(R), we
proved that there exist infinity many different Haar orthogonal bases in L2(Q2)
generated by the same MRA.
1.2. Contents of the paper. In Sec. 2, we recall some facts from the p-adic
theory of distributions [11], [32], [33], [34]. In Sec. 3, some facts from the
theory of the p-adic Lizorkin spaces [3] are recalled.
In Sec. 4, by Definition 4.1 we introduce the MRA adapted to the p-adic
case. In Subsec. 4.2, we introduce the refinement equation (4.7)
φ(x) =
, x ∈ Qp,
whose solution φ(x) = Ω
is the characteristic function of the unit disc,
where where Ω(t) is the characteristic function of the interval [0, 1]. The con-
jecture to use the above equation as the refinement equation was proposed
in [18]. The above refinement equation is natural and reflects the fact that the
characteristic function Ω
of the unit disc B0 is represented as a sum of p
pieces characteristic functions of the disjoint discs B−1(r), r = 0, 1, . . . , p − 1
(see (2.7)).
In Subsec. 4.3, the 2-adic MRA is constructed. Namely, we proved that
MRA is generated by a refinable function which is the characteristic function
φ(x) = Ω
of the unit disc B0 = {x : |x|2 ≤ 1} ⊂ Q2 and satisfies the
refinement equation (4.8)
φ(x) =
, x ∈ Q2.
4 V. M. SHELKOVICH AND M. SKOPINA
By our MRA we construct 2-adic orthonormal wavelet basis (4.15) in L2(Q2),
which is the Kozyrev basis (1.2) for the case p = 2. It turned out that the
Kozyrev wavelet basis is not unique orthonormal wavelet basis.
In Sec. 5, infinity many different 2-adic wavelet orthonormal bases in L2(Q2)
are constructed. Namely, using Theorem 5.1, we construct wavelet functions
ψ(s)(x), s ∈ N whose dilatations and shifts form 2-adic orthonormal wavelet
bases in L2(Q2).
Since many p-adic models use pseudo-differential operators, in particular,
fractional operator, these results on p-adic wavelets can be intensively used in
applications. Moreover, p-adic wavelets can be used to construct solutions of
linear and semi-linear pseudo-differential equations [5], [23].
2. p-Adic distributions
We recall some facts from the theory of p-adic distributions (generalized
functions). Here and in what follows, we shall systematically use the notations
and results from [34] and [11, Ch.II]. Let N, Z, C be the sets of positive
integers, integers, complex numbers, respectively, and N0 = {0} ∪ N. Denote
by Q∗p = Qp \ {0} the multiplicative group of the field Qp.
The canonical form of a p-adic number x 6= 0 is
(2.1) x = pγ(x0 + x1p + x2p
2 + · · · ),
where γ = γ(x) ∈ Z, xj = 0, 1, . . . , p − 1, x0 6= 0, j = 0, 1, . . . . The series
is convergent in the p-adic norm (1.1), and one has |x|p = p−γ. By means of
representation (2.1), the fractional part {x}p of a number x ∈ Qp is defined as
follows
(2.2) {x}p =
0, if γ(x) ≥ 0 or x = 0,
pγ(x0 + x1p + x2p
2 + · · ·+ x|γ|−1p|γ|−1), if γ(x) < 0.
The function
(2.3) χp(ξx) = e
2πi{ξx}p
for every fixed ξ ∈ Qp is an additive character of the field Qp.
According to [34, III.2.], any multiplicative character π of the field Qp can
be represented as
= πα(x) = |x|α−1p π1(x), x ∈ Q∗p,
where π(p) = p1−α and π1(x) is a normed multiplicative character such that
π1(x) = π1(|x|px), π1(p) = π1(1) = 1, |π1(x)| = 1. We denote π0 = |x|−1p .
The space Qnp = Qp × · · · × Qp consists of points x = (x1, . . . , xn), where
xj ∈ Qp, j = 1, 2 . . . , n, n ≥ 2. The p-adic norm on Qnp is
(2.4) |x|p = max
1≤j≤n
|xj|p, x ∈ Qnp ,
where |xj |p id defined by (1.1).
p-ADIC HAAR MULTIRESOLUTION ANALYSIS 5
Denote by Bnγ (a) = {x ∈ Qnp : |x − a|p ≤ pγ} the ball of radius pγ with the
center at a point a = (a1, . . . , an) ∈ Qnp and by Snγ (a) = {x ∈ Qnp : |x − a|p =
pγ} = Bnγ (a) \ Bnγ−1(a) its boundary (sphere), γ ∈ Z. For a = 0 we set
Bnγ (0) = B
γ and S
γ (0) = S
γ . For the case n = 1 we will omit the upper index
n. It is clear that
(2.5) Bnγ (a) = Bγ(a1)× · · · × Bγ(an),
where Bγ(aj) = {xj : |xj − aj |p ≤ pγ} ⊂ Qp is a disc of radius pγ with the
center at a point aj ∈ Qp, j = 1, 2 . . . , n.
Any two balls in Qnp either are disjoint or one contains the other. Every
point of the ball is its center.
According to [34, I.3,Examples 1,2.], the disc Bγ is represented by the sum
of pγ−γ
disjoint discs Bγ′(a), γ
′ < γ:
(2.6) Bγ = Bγ′ ∪ ∪aBγ′(a),
where a = 0 and a = a−rp
−r + a−r+1p
−r+1 + · · ·+ a−γ′−1p−γ
′−1 are the centers
of the discs Bγ′(a), r = γ, γ − 1, γ − 2, . . . , γ′ + 1, 0 ≤ aj ≤ p− 1, a−r 6= 0.
In particular, the disc B0 is represented by the sum of p disjoint discs
(2.7) B0 = B−1 ∪ ∪p−1r=1B−1(r),
where B−1(r) = {x ∈ S0 : x0 = r} = r + pZp, r = 1, . . . , p− 1; B−1 = {|x|p ≤
p−1} = pZp; and S0 = {|x|p = 1} = ∪p−1r=1B−1(r). Here all the discs are disjoint.
We call coverings (2.6) and (2.7) the canonical covering of the discs B0 and
Bγ , respectively.
On Qp there exists the Haar measure, i.e., a positive measure dx invariant
under shifts, d(x + a) = dx, and normalized by the equality
|ξ|p≤1 dx = 1.
The invariant measure dx on the field Qp is extended to an invariant measure
dnx = dx1 · · · dxn on Qnp in the standard way.
If f is an integrable function on Qp, then [11, Ch.II,§2.2], [34, IV]:
(2.8)
dx = pγ ,
f(x) dx =
f(x) dx,
f(x) dx =
f(x) dx−
f(x) dx.
A complex-valued function f defined on Qnp is called locally-constant if for
any x ∈ Qnp there exists an integer l(x) ∈ Z such that
f(x+ y) = f(x), y ∈ Bnl(x).
Let E(Qnp ) and D(Qnp ) be the linear spaces of locally-constant C-valued func-
tions on Qnp and locally-constant C-valued functions with compact supports
6 V. M. SHELKOVICH AND M. SKOPINA
(so-called test functions), respectively [34, VI.1.,2.]. If ϕ ∈ D(Qnp ), according
to Lemma 1 from [34, VI.1.], there exists l ∈ Z, such that
ϕ(x+ y) = ϕ(x), y ∈ Bnl , x ∈ Qnp .
The largest of such numbers l = l(ϕ) is called the parameter of constancy of
the function ϕ. Let us denote by DlN(Qnp ) the finite-dimensional space of test
functions from D(Qnp ) having supports in the ball BnN and with parameters
of constancy ≥ l [34, VI.2.]. The following embedding holds: DlN (Qnp ) ⊂
Dl′N ′(Qnp ), N ≤ N ′, l ≥ l′. Thus D(Qnp ) = lim indN→∞ lim indl→−∞DlN(Qnp ).
The space D(Qnp ) is a complete locally convex vector space.
According to [34, VI,(5.2’)], any function ϕ ∈ DlN(Qnp ) is represented in the
following form
(2.9) ϕ(x) =
pn(N−l)∑
ϕ(cν)∆l(x− cν), x ∈ Qnp ,
where ∆l(x − cν) are the characteristic functions of the disjoint balls Bl(cν),
and the points cν = (cν1 , . . . c
n) ∈ BnN do not depend on ϕ.
Denote by D′(Qnp ) the set of all linear functionals on D(Qnp ) [34, VI.3.].
Let us introduce in D(Qnp ) a canonical δ-sequence δk(x) = pnkΩ(pk|x|p), and
a canonical 1-sequence ∆k(x) = Ω(p
−k|x|p), k ∈ Z, x ∈ Qnp , where
(2.10) Ω(t) =
1, 0 ≤ t ≤ 1,
0, t > 1.
Here ∆k(x) is the characteristic function of the ball B
k . It is clear [34, VI.3.,
VII.1.] that δk → δ, k → ∞ in D′(Qnp ) and ∆k → 1, k → ∞ in E(Qnp ).
The Fourier transform of ϕ ∈ D(Qnp ) is defined by the formula
F [ϕ](ξ) =
χp(ξ · x)ϕ(x) dnx, ξ ∈ Qnp ,
where χp(ξ · x) = χp(ξ1x1) · · ·χp(ξnxn) = e2πi
j=1{ξjxj}p; ξ · x is the scalar
product of vectors.
The Fourier transform is a linear isomorphism D(Qnp ) into D(Qnp ). Moreover,
according to [32, Lemma A.], [33, III,(3.2)], [34, VII.2.],
(2.11) ϕ(x) ∈ DlN(Qnp ) iff F
(ξ) ∈ D−N−l (Q
We define the Fourier transform F [f ] of a distribution f ∈ D′(Qnp ) by the
relation [34, VII.3.]:
(2.12) 〈F [f ], ϕ〉 = 〈f, F [ϕ]〉, ∀ϕ ∈ D(Qnp ).
Let A be a matrix and b ∈ Qnp . Then for a distribution f ∈ D′(Qnp ) the
following relation holds [34, VII,(3.3)]:
(2.13) F [f(Ax+ b)](ξ) = | detA|−1p χp
−A−1b · ξ
F [f(x)]
p-ADIC HAAR MULTIRESOLUTION ANALYSIS 7
where detA 6= 0. According to [34, IV,(3.1)],
(2.14) F [∆k](x) = δk(x), k ∈ Z, x ∈ Qnp .
In particular, F [Ω(|ξ|p)](x) = Ω(|x|p).
The convolution f ∗ g for distributions f, g ∈ D′(Qnp ) is defined (see [34,
VII.1.]) as
(2.15) 〈f ∗ g, ϕ〉 = lim
〈f(x)× g(y),∆k(x)ϕ(x+ y)〉
if the limit exists for all ϕ ∈ D(Qnp ), where f(x) × g(y) is the direct product
of distributions. If for distributions f, g ∈ D′(Qnp ) the convolution f ∗ g exists
then [34, VII,(5.4)]
(2.16) F [f ∗ g] = F [f ]F [g].
Definition 2.1. Let πα be a multiplicative character of the field Qp. A dis-
tribution f ∈ D′(Qnp ) is called homogeneous of degree πα if for all ϕ ∈ D(Qnp )
and t ∈ Q∗p we have the relation
, . . . ,
= πα(t)|t|np
f, ϕ(x1, . . . , xn)
i.e., f(tx) = f(tx1, . . . , txn) = πα(t)f(x), x = (x1, . . . , xn) ∈ Qnp . A homoge-
neous distribution of degree πα(t) = |t|α−1p (α 6= 0) is called homogeneous of
degree α− 1.
3. The p-adic Lizorkin spaces
Let us introduce the p-adic Lizorkin space of test functions
Φ(Qnp ) = {φ : φ = F [ψ], ψ ∈ Ψ(Qnp )},
where
Ψ(Qnp ) = {ψ(ξ) ∈ D(Qnp ) : ψ(0) = 0}.
Here Ψ(Qnp ),Φ(Q
p ) ⊂ D(Qnp ). The space Φ(Qnp ) is called the p-adic Lizorkin
space of test functions . The space Φ(Qnp ) can be equipped with the topology
of the space D(Qnp ) which makes Φ a complete space.
In view of (2.11), the following lemma holds.
Lemma 3.1. ( [3], [4]) (a) φ ∈ Φ(Qnp ) iff φ ∈ D(Qnp ) and
(3.1)
φ(x) dnx = 0.
(b) φ ∈ DlN(Qnp ) ∩ Φ(Qnp ), i.e.,
φ(x) dnx = 0, iff ψ = F−1[φ] ∈
D−N−l (Qnp ) ∩Ψ(Qnp ), i.e., ψ(ξ) = 0, ξ ∈ Bn−N .
Unlike the classical Lizorkin space, any function ψ(ξ) ∈ Φ(Qnp ) is equal to
zero not only at ξ = 0 but in a ball Bn ∋ 0, as well.
Let Φ′(Qnp ) denote the topological dual of the space Φ(Q
p ). We call it the
p-adic Lizorkin space of distributions .
8 V. M. SHELKOVICH AND M. SKOPINA
By Ψ⊥ and Φ⊥ we denote the subspaces of functionals in D′(Qnp ) orthogonal
to Ψ(Qnp ) and Φ(Q
p ), respectively. Thus Ψ
⊥ = {f ∈ D′(Qnp ) : f = Cδ, C ∈ C}
and Φ⊥ = {f ∈ D′(Qnp ) : f = C, C ∈ C}.
Proposition 3.1. ( [3])
Φ′(Qnp ) = D′(Qnp )/Φ⊥, Ψ′(Qnp ) = D′(Qnp )/Ψ⊥.
The space Φ′(Qnp ) can be obtained from D′(Qnp ) by “sifting out” constants.
Thus two distributions in D′(Qnp ) differing by a constant are indistinguishable
as elements of Φ′(Qnp ).
Similarly to (2.12), we define the Fourier transform of distributions f ∈
Φ′×(Q
p ) and g ∈ Ψ′×(Qnp ) by the relations:
(3.2)
〈F [f ], ψ〉 = 〈f, F [ψ]〉, ∀ψ ∈ Ψ(Qnp ),
〈F [g], φ〉 = 〈g, F [φ]〉, ∀φ ∈ Φ(Qnp ).
By definition, F [Φ(Qnp )] = Ψ(Q
p ) and F [Ψ(Q
p )] = Φ(Q
p ), i.e., (3.2) give well
defined objects.
4. Construction of multiresolution analysis
4.1. p-Adic multiresolution analysis. Denote the factor group Qp/Zp by
Ip, i.e.
Ip = {a = p−γ
a0 + a1p+ · · ·+ aγ−1pγ−1
(4.1) γ ∈ N; aj = 0, 1, . . . , p− 1; j = 0, 1, . . . , γ − 1}.
It is well known that Qp = B0 ∪ ∪∞γ=1Sγ, where Sγ = {x ∈ Qp : |x|p = pγ}.
In view of (2.1), x ∈ Sγ , γ ≥ 1 if and only if x = x−γp−γ + x−γ+1p−γ+1 + · · ·+
−1 + ξ, where ξ ∈ B0. Since x−γp−γ + x−γ+1p−γ+1 + · · ·+ x−1p−1 ∈ Ip, we
have a “natural” decomposition of Qp to a union of mutually disjoint discs:
Qp = ∪a∈IpB0(a).
So, Ip is a “natural” group of shifts for Qp.
Definition 4.1. A collection of closed spaces Vj ⊂ L2(Qp), j ∈ Z is called a
multiresolution analysis (MRA) in L2(Qp) if the following axioms hold
(a) Vj ⊂ Vj+1 for all j ∈ Z;
(b) ∪j∈ZVj is dense in L2(Qp);
(c) ∩j∈ZVj = {0};
(d) f(·) ∈ Vj ⇐⇒ f(p−1·) ∈ Vj+1 for all j ∈ Z;
(e) there a function φ ∈ V0 such that the system φ(x − a), a ∈ Ip, form an
orthonormal basis for V0.
The function φ from axiom (e) is called scaling or refinable. It follows im-
mediately from axioms (d) and (e) that the functions pj/2φ(p−j · −a), a ∈ Ip,
form an orthonormal basis for Vj.
p-ADIC HAAR MULTIRESOLUTION ANALYSIS 9
According to the standard scheme (see, e.g., [31, §1.3]) for construction of
MRA-based wavelets, for each j, we define a space Wj (wavelet space) as the
orthogonal complement of Vj in Vj+1, i.e.,
(4.2) Vj+1 = Vj ⊕Wj , j ∈ Z,
where Wj ⊥ Vj , j ∈ Z. It is not difficult to see that
(4.3) f ∈ Wj ⇐⇒ f(p−1·) ∈ Wj+1, for all j ∈ Z
and Wj ⊥Wk, j 6= k. Taking into account axioms (b) and (c), we obtain
(4.4) ⊕j∈ZWj = L2(Qp) (orthogonal direct sum).
If now we find a function ψ ∈ W0 such that the system ψ(x−a), a ∈ Ip, form
an orthonormal basis for W0, then the system p
j/2ψ(p−j · −a), a ∈ Ip, is an
orthonormal basis for L2(Qp). Such a function ψ is called a wavelet function
and the basis is a wavelet basis.
4.2. p-Adic refinement equation. Let φ be a refinable function for a MRA.
As was mentioned above, the system p1/2φ(p−1 · −a), a ∈ Ip, is a basis for V1.
It follows from axoim (a) that
(4.5) φ =
αaφ(p
−1 · −a), αa ∈ C.
We see that the function φ is a solution of a special kind of functional equation.
Such equations are called refinement equations. Investigation of refinement
equations and their solutions is the most difficult part of wavelet theory in real
analysis.
A natural way for construction of a MRA (see, e.g., [31, §1.2]) is the fol-
lowing. We start with an appropriate function φ whose integer shifts form
an orthonormal system, and set V0 = span
: a ∈ Ip
and Vj =
p−jx− a
: a ∈ Ip
, j ∈ Z. It is clear that axioms (d) and (e) of
Definition 4.1 are fulfilled.
Of course, not any such a function φ provides axiom (a). In the real setting,
the relation V0 ⊂ V1 holds if and only if the refinable function satisfies a
refinement equation. Situation is different in p-adics. Generally speaking,
a refinement equation (4.5) does not imply the including property V0 ⊂ V1.
Indeed, we need all the functions φ(· − b), b ∈ Ip, to belong to the space V1,
i.e., the equalities φ(x − b) =
a∈Ip αa,bφ(p
−1x − a) should be fulfilled for all
b ∈ Ip. Since p−1b+ a is not in Ip in general, we can not state that refinement
equation (4.5) implies φ(x − b) =
a∈Ip αa,bφ(p
−1x − p−1b − a) ∈ V1 for all
b ∈ Ip.
The refinement equation reflects some “self-similarity”. The structure of the
space Qp has a natural “self-similarity” property which is given by formulas
(2.6), (2.7). By (2.7), the characteristic function ∆0(x) = Ω
of the unit
10 V. M. SHELKOVICH AND M. SKOPINA
disc B0 is represented as a sum of p characteristic functions of the disjoint
discs B−1(r), r = 0, 1, . . . , p− 1, i.e.,
(4.6) ∆0(x) =
, x ∈ Qp.
Thus, in p-adics, we have a natural refinement equation (4.5):
(4.7) φ(x) =
, x ∈ Qp,
whose solution is φ(x) = ∆0(x) = Ω
. This equation is an analog of the
refinement equation generating Haar MRA in real analysis.
4.3. Construction of 2-adic Haar multiresolution analysis. Now, using
the refinement equation (4.7) for p = 2
(4.8) φ(x) = φ
, x ∈ Q2,
and its solution, the refinable function φ(x) = ∆0(x) = Ω
, we construct
2-adic multiresolution analysis.
(4.9) V0 = span
: a ∈ I2
(4.10) Vj = span
2−jx− a
: a ∈ I2
, j ∈ Z.
It is clear that axioms (d) and (e) of Definition 4.1 are fulfilled and the system
2j/2φ(2−j · −a), a ∈ Ip is an orthonormal basis for Vj, j ∈ Z.
Note that the characteristic function of the unit disc Ω
has a wonderful
feature: Ω(| · +ξ|2) = Ω(| · |2), for all ξ ∈ Z2 because the p-adic norm is non-
Archimedean. In particular, Ω(| · ±1|2) = Ω(| · |2), i.e.,
(4.11) φ(x± 1) = φ(x), ∀ x ∈ Q2.
Thus φ is periodic with the period 1.
In view of this fact, taking into account that 2−1b+ a ( mod 1) is in I2, for
all a, b ∈ I2, it follows from the refinement equation (4.8) that V0 ⊂ V1. By
(4.10), this yields axiom (a).
Due to the refinement equation (4.8), we obtain that Vj ⊂ Vj+1, i.e., the
axiom (a) from Definition 4.1 holds.
Lemma 4.1. The axiom (b) of Definition 4.1 holds, i.e., ∪j∈ZVj = L2(Q2).
Proof. According to (2.9), any function ϕ ∈ D(Q2) belongs to one of the spaces
DlN(Q2), and consequently, is represented in the form
(4.12) ϕ(x) =
pN−l∑
ϕ(cν)∆l(x− cν), x ∈ Q2,
p-ADIC HAAR MULTIRESOLUTION ANALYSIS 11
where ∆l(· − cν) are the characteristic functions of the mutually disjoint discs
ν) ⊂ Q2, cν ∈ BN , ν = 1, 2, . . . pN−l; l = l(ϕ), N = N(ϕ). Since
∆l(x − cν) = Ω(p−l|x − cν |p) = Ω(|plx − plcν |p) and any number plcν can be
represented in the form plcν = aν + bν , where aν ∈ I2, bν ∈ Z2, we have
∆l(x− cν) = ∆l(x− aν). Thus any function ϕ ∈ D(Q2) can be represented in
the form
(4.13) ϕ(x) =
pN−l∑
αν∆l(x− aν), x ∈ Q2, aν ∈ I2, αν ∈ C.
Consequently, on the basis of (4.10), ϕ(x) ∈ V−l. Thus any test function ϕ
belongs to one of the space Vj , where j = j(ϕ).
Since the space D(Q2) is dense in L2(Q2) [34, VI.2], approximating any
function from L2(Q2) by test functions (4.13), we prove our assertion. �
Lemma 4.2. The axiom (c) of Definition 4.1 holds, i.e., ∩j∈ZVj = {0}.
Proof. Suppose that ∩j∈ZVj 6= {0}. Then there exists a function f ∈ Vj for all
j ∈ Z. Hence, due to (4.10), f(x) =
a∈I2 cjaφ
2−jx− a
for all j ∈ Z.
Let x = 2−N(x0+x12+x22
2+· · · ). Since 2−jx = 2−N−j(x0+x12+x222+· · · ),
for all j ≤ −N , we have 2−jx ∈ Z2, and, consequently, |2−jx − a|2 > 1 for
all a ∈ I2, a 6= 0. Thus φ
2−jx − a
= 0 for all j ≤ −N and a ∈ I2, a 6= 0.
Since |2−jx|2 ≤ 1, we have f(x) = cj0 for all j ≤ −N . Similarly, for another
x′ = 2−N
(x′0 + x
12 + x
2 + · · · ), we have f(x′) = cj′0 for all j ≤ −N ′. This
yields that f(x) = f(x′). Consequently, f(x) ≡ C, where C is a constant.
However, if C 6= 0, f 6∈ L2(Q2). Thus, C = 0 and the proof of the theorem is
complete. �
According to the above scheme, we introduce the spaceW0 as the orthogonal
complement of V0 in V1.
(4.14) ψ(0)(x) = φ
Lemma 4.3. The shift system ψ(0)(x− a), a ∈ I2, is an orthonormal basis of
the space W0.
Proof. Let us prove that W0 ⊥ V0. It follows from (4.8), (4.14) that
ψ(0)(x− a), φ(x− b)
ψ(0)(x− a)φ(x− b) dx
for all a, b ∈ I2. Let a 6= b. Since it is impossible a 6= b+ 1, b 6= a + 1, taking
into account that the functions 21/2φ(2−1 · −c), c ∈ I2 are orthonormal, we
obtain
ψ(0)(x − a), φ(x− b)
= 0. If a = b, again due to the orthonormality
12 V. M. SHELKOVICH AND M. SKOPINA
of the system 21/2φ(2−1 · −c), c ∈ I2, taking into account that a2 ,
∈ I2,
we have
ψ(0)(x− a), φ(x− a)
dx = 0.
Thus, ψ(0)(x+ a) ⊥ φ(x+ b) for all a, b ∈ I2.
The refinement equation (4.8) and relation (4.14) imply that
x− 2a
+ ψ(0)
x− 2a
, a ∈ I2.
Since {21/2φ(2−1x − a) : a ∈ I2} is a basis for V1, we have V1 = V0 ⊕W0, i.e.,
(4.2) holds. �
Thus we prove that the collection {Vj : j ∈ Z} is a MRA in L2(Q2) and
the function ψ(0) defined by (4.14) is a wavelet function. This MRA is a 2-
adic analog of the real Haar MRA and the wavelet basis generated by ψ(0) is
an analog of real Haar wavelet basis. But in contrast to the real setting, the
refinable function φ generating our Haar MRA is periodic with the period 1
(see (4.11)), which never holds for real refinable functions. It will be shown
bellow that due of this specific property of φ, there exist infinity many different
orthonormal wavelet bases in the same Haar MRA (see Sec. 5).
Due to (2.3), (2.7), the function ψ(0) can be rewritten in the form ψ(0)(x) =
−1x)Ω(|x|2) and the Haar wavelet basis is
ψ(0)γa (x) = 2
−γ/2ψ(0)(2γx− a)
(4.15) = 2−γ/2χ2
2−1(2γx− a)
|2γx− a|2
, x ∈ Q2, γ ∈ Z, a ∈ I2.
It is clear that
(4.16)
ψ(0)γa (x) dx = 0,
and, according to Lemma 3.1, ψ
γa (x) belongs to the Lyzorkin space Φ(Q2).
Remark 4.1. The Haar wavelet basis (4.15) coincides with Kozyrev’s wavelet
basis (1.2) for the case p = 2. In present paper we restrict ourself by con-
structing the Haar wavelets only for p = 2. Since Haar refinement equation
(4.7) was presented for all p, a similar construction may be easily realized in
the general case. Moreover, it is not difficult to see that Kozytev’s wavelet
function θj(x) from (1.2) can be expressed in terms of the refinable function
φ(x) as
(4.17) θj(x) = χp(p
−1jx)Ω
= p−1/2
, x ∈ Qp,
p-ADIC HAAR MULTIRESOLUTION ANALYSIS 13
where hr = p
2πi{ jr
}p, r = 0, 1, . . . , p− 1, j = 1, 2, . . . , p− 1.
Remark 4.2. In view of periodicity (4.11) of the refinable function φ, one can
use shifts ψ(0)(·+ a), a ∈ I2, instead of shifts ψ(0)(· − a), a ∈ I2.
Now we show that there is another function ψ(1)(x) whose shifts form an
orthonormal basis in W0. Indeed, taking into account (4.11), we have
ψ(1)(x) =
(4.18) =
an its shifts
(4.19) =
(4.20) =
Since the system of functions {φ(2−1x − a) : a ∈ I2} is orthonormal, in
view of (4.11), formulas (4.18)–(4.20) imply that the function ψ(1)(x) and the
function ψ(1)(x − a) are orthonormal, whenever a ∈ I2, a 6= 0, 12 . Here we
take into account that all shifts (up to mod 1) of refinable function in (4.18),
(4.20) are distinct.
Similarly, by (4.18), (4.19), we have
ψ(1)(x), ψ(1)(x+ 2−1)
ψ(1)(x)ψ(1)(x+ 2−1) dx
= 2−1
dx = 0.
ψ(1)(x), ψ(1)(x)
= 2−1
dx = 1.
14 V. M. SHELKOVICH AND M. SKOPINA
Thus all shifts of ψ(1) are orthonormal.
It is clear that the functions (4.18) and (4.19) can be rewritten in the form
(4.21) ψ(1)(x) =
− ψ(0)
+ ψ(0)
It follows that
ψ(0)(x) =
+ ψ(1)
Since the system ψ(0)(· − a), a ∈ I2, forms an orthonormal basis for W0, the
system ψ(1)(· − a), a ∈ I2, is another orthonormal basis for W0.
So, we showed that a wavelet basis generated by the Haar MRA is not
unique.
5. Description of 2-adic Haar bases
5.1. Complex wavelets. Using the fact that all dilatations and shifts (x →
2γx+ a, a ∈ I2) of the Haar wavelet function ψ(0) form a orthonormal basis in
L2(Q2), we show that there exist infinitely many wavelet functions ψ(s), s ∈ N
in W0.
In what follows, we shall write the 2-adic number a = 2−s
a0 + a12 + · · ·+
as−12
s−1) ∈ I2, aj = 0, 1, j = 0, 1, . . . , s − 1 briefly as a rational number
a = m
, where m = a0 + a12 + · · ·+ as−12s−1.
Since the characteristic function of the unit disc φ(x) = ∆0(x) = Ω
periodic with the period ξ ∈ S0, the wavelet function ψ0(x) has the following
evident and important property:
(5.1) ψ(0)(x+ ξ) = −ψ(0)(x), ξ ∈ S0.
Here ξ = 1 + ξ12 + ξ22
2 + · · · , where ξj = 0, 1; j ∈ N.
Before we prove a general result, we consider the simplest particular case.
Consider the function
(5.2) ψ(1)(x) = α0ψ
+ α1ψ
, α0, α1 ∈ C,
and solve the problem when all shifts of this function generates an orthonormal
basis ψ(1)(x+ a), a ∈ I2 in W0.
Taking into account orthonormality of the system ψ(0)(x + a), a ∈ I2 and
relation (5.1), we can see that the function ψ(1)(x) and the functions ψ(1)(x+a)
are orthonormal for all a ∈ I2, a 6= 0, 12 . Thus, in view of (5.1), the system
of functions ψ(1)(x + a), a ∈ I2 is orthonormal if and only if the system of
functions (5.2) and
(5.3) ψ(1)
= −α1ψ(0)
+ α0ψ
p-ADIC HAAR MULTIRESOLUTION ANALYSIS 15
is orthonormal. Hence, we have |α0|2 + |α1|2 = 1. In other words, the matrix
α0 α1
−α1 α0
is unitary. Thus, the function (5.2), where |α0|2 + |α1|2 = 1 is the wavelet
function. It is clear that the wavelet function (4.21) is a particular case of the
wavelet function (5.2).
Consequently, all dilatations and shifts of ψ(1)(x) form 2-adic orthonormal
wavelet basis in L2(Q2).
Now we will prove a general theorem.
Theorem 5.1. Let s = 1, 2, . . . . The function
(5.4) ψ(s)(x) =
2s−1∑
is the wavelet function (whose dilatations and shifts form 2-adic orthonormal
wavelet basis in L2(Q2)) if and only if
(5.5) αk = 2
−s(−1)k
2s−1∑
−iπ 2r+1
k, k = 0, 1, 2, . . . , 2s − 1,
γk ∈ C, |γk| = 1.
Proof. Suppose that ψ(s)(x), s ≥ 1 is given by formula (5.4). Since the system
ψ(0)(· + a), a ∈ I2 is orthonormal (see Subsec. 4.3) and in view of relation
(5.1), it is easy to see that ψ(s) and ψ(s)(·+ a) are orthonormal for any a ∈ I2,
a 6= k
, k = 0, 1, . . . 2s − 1. Thus the system of functions ψ(s)(x+ a), a ∈ I2 is
orthonormal if and only if the system of functions, consisting of the function
(5.4) and its shifts, i.e.,
= −α2s−rψ(0)(x)−α2s−r+1ψ(0)
−· · ·−α2s−1ψ(0)
r − 1
(5.6) + α0ψ
+ · · ·+ α2s−r−1ψ(0)
2s − 1
r = 0, 1, . . . , 2s − 1 is orthonormal.
Set Ξ(0) = {ψ(0)(· + k
) : k = 0, 1, . . . , 2s − 1}T , Ξ(s) = {ψ(s)(· + k
) : k =
0, 1, . . . , 2s − 1}T . In view of (5.4), (5.6), Ξ(s) = DΞ(0), where
(5.7) D =

α0 α1 α2 . . . α2s−2 α2s−1
−α2s−1 α0 α1 . . . α2s−3 α2s−2
−α2s−2 −α2s−1 α0 . . . α2s−4 α2s−3
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
−α2 −α3 −α4 . . . α0 α1
−α1 −α2 −α3 . . . −α2s−1 α0

Thus the system Ξ(s) is orthonormal if and only if the matrix D is unitary.
16 V. M. SHELKOVICH AND M. SKOPINA
Let u = (α0, α1, . . . , α2s−1)
T be a vector and

0 0 . . . 0 0 −1
1 0 . . . 0 0 0
0 1 . . . 0 0 0
. . . . . . . . . . . . . . . . . . .
0 0 . . . 1 0 0
0 0 . . . 0 1 0

be a 2s × 2s matrix. It is easy to see that
Aru = (−α2s−r,−α2s−r+1, . . . ,−α2s−1, α0, α1, . . . , α2s−r−1)T ,
r = 1, 2, . . . , 2s − 1. Thus D =
u,Au, . . . , A2
. It is significant that
u = −u. Consequently, in order to describe all matrixes D (or in other
words, all vectors u), we should find all vectors u = (α0, α1, . . . , α2s−1)
T such
that the system {Aru : r = 0, 1, 2, . . . , 2s − 1} is orthonormal.
In view of the fact that the system ψ(0)(x+a), a ∈ I2 forms an orthonormal
basis in W0, it is easy to see that the vector u0 = (1, 0, . . . , 0, 0)
T is one of
mentioned above vectors u. That is the system composed of vectors u0 and
Aru0 = (δ0 r, δ1 r, . . . , δ2s−2 r, δ2s−1 r)
T , r = 1, 2, . . . , 2s−1, is orthonormal, where
δi r is the Kronecker symbol.
Let us prove that the vector u = (α0, α1, . . . , α2s−1)
T already mentioned
above such that Aru, r = 0, 1, 2, . . . , 2s − 1 is orthonormal, can be expressed
by the formula u = Bu0 if and only if B is a unitary matrix such that AB =
BA. Indeed, let u = Bu0, where B is a unitary matrix such that AB =
BA. Then Aru = BAru0, r = 0, 1, 2, . . . , 2
s − 1. Since the system Aru0,
r = 0, 1, 2, . . . , 2s − 1 is orthonormal and the matrix B is unitary, the vectors
Aru, r = 0, 1, 2, . . . , 2s − 1 are orthonormal. Conversely, if the system Aru,
r = 0, 1, 2, . . . , 2s−1 is orthonormal, taking into account that the system Aru0,
r = 0, 1, 2, . . . , 2s − 1 is orthonormal, we conclude that there exists a unitary
matrix B such that Aru = B(Aru0), r = 0, 1, 2, . . . , 2
s − 1. Since A2su = −u,
u0 = −u0, we have an additional relation A2
u = BA2
u0. It follows from
the above relations that (AB − BA)(Aru0) = 0, r = 0, 1, 2, . . . , 2s − 1. Since
the vectors Aru0, r = 0, 1, 2, . . . , 2
s − 1 form a basis in the 2s-dimensional
space, we conclude that AB = BA.
Thus we have D =
Bu0, BAu0, . . . , BA
2s−1u0
It is clear that the eigenvalues of A and the corresponding normalized eigen-
vectors are
(5.8) λr = −eiπ
and vr =
(vr)1, . . . , (vr)2s
, respectively, where
(5.9) (vr)l = 2
−s/2(−1)le−iπ
l, l = 0, 1, 2, . . . , 2s − 1,
p-ADIC HAAR MULTIRESOLUTION ANALYSIS 17
r = 0, 1, 2, . . . , 2s − 1. As is well known, the matrix A can be represented as
A = CÃC−1, where
λ0 0 . . . 0
0 λ1 . . . 0
. . .
0 0 . . . λ2s−1
is a diagonal matrix, C =
v0, v1, . . . , v2s−1
. Since C is a unitary matrix, the
matrix B = CB̃C−1 is unitary if and only if B̃ is unitary. On the other hand,
AB = BA if and only if ÃB̃ = B̃Ã. Moreover, since according to (5.8) λk 6= λl,
whenever k 6= l, all unitary matrix B̃ such that ÃB̃ = B̃Ã, are given by
γ0 0 . . . 0
0 γ1 . . . 0
. . .
0 0 . . . γ2s−1
 ,
where γk ∈ C, |γk| = 1. Hence, all unitary matrix B such that AB = BA, are
given by B = CB̃C−1, where B̃ is the above diagonal matrix.
By using formula (5.9), one can calculate
αk = (Bu0)k = (CB̃C
−1u0)k =
2s−1∑
γr(vr)k(vr)0
= 2−s(−1)k
2s−1∑
−iπ 2r+1
k, k = 0, 1, 2, . . . , 2s − 1,
where γk ∈ C, |γk| = 1. Thus (5.5) holds.
Taking into account that Ξ(0) = D−1Ξ(s), we conclude that if we define
ψ(s)(x) by formula (5.4), where αk is given by (5.5), k = 0, 1, 2, . . . , 2
s − 1,
then the system of functions {ψ(s)(· − a) : a ∈ a ∈ I2} is orthonormal and
forms the orthonormal basis in W0.
Consequently, all dilatations and shifts of the function (5.4) form 2-adic
orthonormal wavelet basis in L2(Q2). �
It is clear that
γa (x) dx = 0, and in view of Lemma 3.1, ψ
γa (x) belongs
to the Lizorkin space ∈ Φ(Qn2 ).
5.2. Real wavelets. Using formulas (5.5), one can extract all real wavelet
functions (5.4).
Let s = 1. According to (5.2), (5.3),
(5.10) ψ(1)(x) = cos θ ψ(0)
+ sin θ ψ(0)
is the real wavelet function.
18 V. M. SHELKOVICH AND M. SKOPINA
Let s = 2. Set γr = e
iθr , r = 0, 1, 2, . . . , 2s − 1. Then (5.5) imply that the
wavelet function ψ(1)(x) is real if and only if
sin θ1 + sin θ2 + sin θ3 + sin θ4 = 0,
cos θ1 − cos θ2 + cos θ3 − cos θ4 = 0,
sin θ1 − sin θ2 − sin θ3 + sin θ4 =
cos θ1 + cos θ2 − cos θ3 − cos θ4,
sin θ1 − sin θ2 − sin θ3 + sin θ4 =
−(cos θ1 + cos θ2 − cos θ3 − cos θ4).
The last relations are equivalent to the system
sin θ1 = − sin θ4, cos θ1 = cos θ4,
sin θ2 = − sin θ3, cos θ2 = cos θ3.
Thus for s = 2 the real wavelet functions (5.4) is represented as
ψ(1)(x) =
(cos θ1 + cos θ2)ψ
(cos θ1 − cos θ2 + sin θ1 + sin θ2)ψ(0)
(sin θ1 − sin θ2)ψ(0)
(5.11) +
(cos θ1 − cos θ2 − sin θ1 − sin θ2)ψ(0)
In particular, for the special cases θ1 = θ2 = θ, θ1 = −θ2 = θ, θ1 = θ2+π2 = θ,
we obtain one-parameter families of the real wavelet functions
(5.12)
ψ(1)(x) = cos θψ(0)
+ sin θψ(0)
ψ(1)(x) = cos θψ(0)
sin θψ(0)
sin θψ(0)
ψ(1)(x) = 1
(cos θ − sin θ)ψ(0)
(cos θ + sin θ)ψ(0)
(cos θ − sin θ)ψ(0)
respectively.
Acknowledgments
The authors are greatly indebted to E. Yu. Panov for fruitful discussions.
p-ADIC HAAR MULTIRESOLUTION ANALYSIS 19
References
[1] S. Albeverio, A.Yu. Khrennikov, V.M. Shelkovich, Associated homogeneous p-adic dis-
tributions, J. Math. An. Appl. 313 (2006) 64–83.
[2] S. Albeverio, A.Yu. Khrennikov, V. M. Shelkovich, Associated homogeneous p-adic
generalized functions, Dokl. Ross. Akad. Nauk 393 no. 3 (2003), 300–303. English
transl. in Russian Doklady Mathematics. 68 no. 3 (2003) 354–357.
[3] S. Albeverio, A.Yu. Khrennikov, V.M. Shelkovich, Harmonic analysis in the p-adic
Lizorkin spaces: fractional operators, pseudo-differential equations, p-adic wavelets,
Tauberian theorems, Journal of Fourier Analysis and Applications, Vol. 12, Issue 4,
(2006), 393–425.
[4] S. Albeverio, A.Yu. Khrennikov, V.M. Shelkovich, Pseudo-differential operators in the
p-adic Lizorkin space, p-Adic Mathematical Physics. 2-nd International Conference,
Belgrade, Serbia and Montenegro, 15 – 21 September 2005, Eds: Branko Dragovich,
Zoran Rakic, Melville, New York, 2006, AIP Conference Proceedings – March 29, 2006,
Vol. 826, Issue 1, pp. 195–205.
[5] S. Albeverio, A.Yu. Khrennikov, V.M. Shelkovich, p-Adic semi-linear evolutionary
pseudo-differential equations in the Lizorkin space, To appear in Dokl. Ross. Akad.
Nauk, (2007). English transl. in Russian Doklady Mathematics, (2007).
[6] I.Ya. Aref′eva, B.G. Dragovic, and I.V. Volovich On the adelic string amplitudes, Phys.
Lett. B 209 no. 4 (1998) 445–450.
[7] V.A. Avetisov, A.H. Bikulov, S.V. Kozyrev, and V.A. Osipov, p-Adic models of ultra-
metric diffusion constrained by hierarchical energy landscapes, J. Phys. A: Math. Gen.
12 (2002) 177–189.
[8] J.J. Benedetto, and R.L. Benedetto, A wavelet theory for local fields and related groups,
The Journal of Geometric Analysis 3 (2004) 423–456.
[9] R.L. Benedetto, Examples of wavelets for local fields, Wavelets, Frames, and operator
Theory, (College Park, MD, 2003), Am. Math. Soc., Providence, RI, (2004), 27–47.
[10] A.H. Bikulov, and I.V. Volovich, p-Adic Brownian motion, Izvestia Akademii Nauk,
Seria Math. 61 no. 3 (1997) 537–552.
[11] I.M. Gel′fand, M.I. Graev and I.I. Piatetskii-Shapiro, Generalized functions. vol 6:
Representation theory and automorphic functions. Nauka, Moscow, 1966.
[12] A. Haar, Sur Theorie de orthogonalen, Funktionensysteme, Math. Ann. 69 (1910) 331–
[13] A. Khrennikov, p-Adic valued distributions in mathematical physics. Kluwer Academic
Publ., Dordrecht, 1994.
[14] A. Khrennikov, Non-archimedean analysis: quantum paradoxes, dynamical systems and
biological models. Kluwer Academic Publ., Dordrecht, 1997.
[15] A. Khrennikov, Information dynamics in cognitive, psychological, social and anomalous
phenomena. Kluwer Academic Publ., Dordrecht, 2004.
[16] A.Yu. Khrennikov, and S.V. Kozyrev, Wavelets on ultrametric spaces, Applied and
Computational Harmonic Analysis 19 (2005) 61–76.
[17] A.Yu. Khrennikov, and S.V. Kozyrev, Pseudodifferential operators on ultrametric
spaces and ultrametric wavelets, Izvestia Akademii Nauk, Seria Math. 69 no. 5 (2005)
133–148.
[18] A.Yu. Khrennikov, V.M. Shelkovich, p-Adic multidimensional wavelets and their
application to p-adic pseudo-differential operators, (2006), Preprint at the url:
http://arxiv.org/abs/math-ph/0612049
[19] A.N. Kochubei, Pseudo-differential equations and stochastics over non-archimedean
fields, Marcel Dekker. Inc. New York, Basel, 2001.
[20] S.V. Kozyrev, Wavelet analysis as a p-adic spectral analysis, Izvestia Akademii Nauk,
Seria Math. 66 no. 2 (2002) 149–158.
http://arxiv.org/abs/math-ph/0612049
20 V. M. SHELKOVICH AND M. SKOPINA
[21] S.V. Kozyrev, p-Adic pseudodifferential operators: methods and applications, Proc.
Steklov Inst. Math. 245, Moscow (2004) 154–165.
[22] S.V. Kozyrev, p-Adic pseudodifferential operators and p-adic wavelets, Theor. Math.
Physics 138, no. 3 (2004) 1–42.
[23] S.V. Kozyrev, V.Al. Osipov, V.C. A.Avetisov, Nondegenerate ultrametric diffusion, J.
Math. Phys. 46 no. 6 (2005) 15 pp.
[24] P.I. Lizorkin, Generalized Liouville differentiation and the functional spaces Lp
r(En).
Imbedding theorems, (Russian) Mat. Sb. (N.S.) 60(102) (1963) 325–353.
[25] P.I. Lizorkin, Operators connected with fractional differentiation, and classes of differ-
entiable functions, (Russian) Studies in the theory of differentiable functions of several
variables and its applications, IV. Trudy Mat. Inst. Steklov. Vol. 117 (1972), 212–243.
[26] S. Mallat, Multiresolution representation and wavelets, Ph. D. Thesis, University of
Pennsylvania, Philadelphia, PA. 1988.
[27] S. Mallat, An efficient image representation for multiscale analysis, In: Proc. of Machine
Vision Conference, Lake Taho. 1987.
[28] Y. Meyer, Ondelettes and fonctions splines, Seminaire EDP. Paris. Decamber 1986.
[29] S.G. Samko, Hypersingular integrals and their applications. Taylor & Francis, London,
2002.
[30] S.G. Samko, A.A. Kilbas, and O.I. Marichev, Fractional integrals and derivatives and
some of their applications. Minsk, Nauka i Tekhnika, 1987 (in Russian); English transla-
tion: Fractional integrals and derivatives. Theory and applications, Gordon and Breach,
London, 1993.
[31] I. Novikov , V. Protassov, and M. Skopina, Wavelet Theory. Moscow: Fizmatlit, 2005.
[32] M.H. Taibleson, Harmonic analysis on n-dimensional vector spaces over local fields. I.
Basic results on fractional integration, Math. Annalen 176 (1968) 191–207.
[33] M.H. Taibleson, Fourier analysis on local fields. Princeton University Press, Princeton,
1975.
[34] V.S. Vladimirov, I.V. Volovich and E.I. Zelenov, p-Adic analysis and mathematical
physics. World Scientific, Singapore, 1994.
[35] V.S. Vladimirov, I.V. Volovich, p-Adic quantum mechanics, Commun. Math. Phys. 123
(1989) 659–676.
[36] I.V. Volovich, p-Adic string, Class. Quant. Grav. 4 (1987) L83–L87.
Department of Mathematics, St.-Petersburg State Architecture and Civil
Engineering University, 2 Krasnoarmeiskaya 4, 190005, St. Petersburg, Rus-
sia. Phone: +7 (812) 2517549 Fax: +7 (812) 3165872
E-mail address : shelkv@vs1567.spb.edu
Department of Applied Mathematics and Control Processes, St. Peters-
burg State University, Universitetskii pr.-35, Petrodvorets, 198504 St. Pe-
tersburg, Russia. Phone: +7 (812) 51326090 Fax: +7 (812)
E-mail address : skopina@ms1167.spb.edu
	1. Introduction
	1.1. p-Adic wavelets and pseudo-differential operators.
	1.2. Contents of the paper.
	2. p-Adic distributions
	3. The p-adic Lizorkin spaces
	4. Construction of multiresolution analysis
	4.1. p-Adic multiresolution analysis.
	4.2. p-Adic refinement equation.
	4.3. Construction of 2-adic Haar multiresolution analysis.
	5. Description of 2-adic Haar bases
	5.1. Complex wavelets.
	5.2. Real wavelets.
	Acknowledgments
	References
ABSTRACT
  In this paper, the notion of {\em $p$-adic multiresolution analysis (MRA)} is
introduced. We use a ``natural'' refinement equation whose solution (a
refinable function) is the characteristic function of the unit disc. This
equation reflects the fact that the characteristic function of the unit disc is
the sum of $p$ characteristic functions of disjoint discs of radius $p^{-1}$.
The case $p=2$ is studied in detail. Our MRA is a 2-adic analog of the real
Haar MRA. But in contrast to the real setting, the refinable function
generating our Haar MRA is periodic with period 1, which never holds for real
refinable functions. This fact implies that there exist infinity many different
2-adic orthonormal wavelet bases in ${\cL}^2(\bQ_2)$ generated by the same Haar
MRA. All of these bases are constructed. Since $p$-adic pseudo-differential
operators are closely related to wavelet-type bases, our bases can be
intensively used for applications.

<|endoftext|><|startoftext|>
Introduction 3
2 Review 5
2.1 KKLT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Consistency of KKLT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Large volume scenario (LVS) . . . . . . . . . . . . . . . . . . . . . . . . 7
3 String loop corrections to LVS 10
3.1 From toroidal orientifolds to Calabi-Yau manifolds . . . . . . . . . . . . 11
3.2 LVS with loop corrections . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3 The P4[1,1,1,6,9] model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4 Gaugino masses 20
4.1 Including loop corrections . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2 Other soft terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5 LVS for other classes of Calabi-Yau manifolds? 25
5.1 Abundance of “Swiss cheese” Calabi-Yau manifolds . . . . . . . . . . . 25
5.2 Toroidal orientifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.3 Fibered Calabi-Yau manifolds . . . . . . . . . . . . . . . . . . . . . . . 28
6 Further corrections 29
7 Conclusions 29
A Some details on LVS 31
A.1 LVS for P4
[1,1,1,6,9]
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
A.2 Many Kähler moduli . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
B Loop corrected inverse Kähler metric for P4[1,1,1,6,9] 34
C No-scale Kähler potential in type II string theory 35
C.1 No-scale structure in type IIA . . . . . . . . . . . . . . . . . . . . . . . 36
C.2 No-scale structure in type IIB . . . . . . . . . . . . . . . . . . . . . . . 37
C.3 Cancellation with just the volume modulus . . . . . . . . . . . . . . . . 39
C.4 Cancellation with many Kähler moduli . . . . . . . . . . . . . . . . . . 39
C.5 Perturbative corrections to Vnp1 and Vnp2 . . . . . . . . . . . . . . . . . 40
D KK spectrum with fluxes 42
E The orientifold calculation 44
F Factorized approximation 48
F.1 Factorized approximation of the scalar potential . . . . . . . . . . . . . 50
1 Introduction
The KKLT strategy [1,2] for producing stabilized string vacua that can serve as a
starting point for phenomenology has been a source of great interest for the last few
years. The “large-volume scenario” (LVS) [3,4] is an extension of KKLT where string
corrections to the tree-level supergravity effective action computed in [5] play a sig-
nificant role, and where the compactification volume can be as large as 1015 in string
units. In LVS, work has been done on soft supersymmetry breaking [4,6,7], the QCD
axion [8,9], neutrino masses [10], inflation [11–14], and even first attempts at LHC
phenomenology [15].
Although tantalizing, the models discussed in the aforementioned papers (nominally
“string compactifications”) raise many questions. It remains an open problem to con-
struct complete KKLT models in string theory, as opposed to supergravity. Problems
one faces include things like the description of RR fluxes in string theory, showing that
the necessary nonperturbative effects actually can and do appear in a way consistent
with other contributions to the potential (for progress in this direction, see [16–32]),
and verifying that one can uplift to a Minkowski or deSitter vacuum without ruining
stabilization [26,33–36]. In LVS, since string corrections play a crucial role, striving
for actual string constructions seems quite important. In the end, the restrictiveness
this entails may greatly improve predictivity, or kill the models completely as string
compactifications.
In this paper, we will not improve on the consistency of KKLT or LVS in general,
but rather assume the existence of LVS models in string theory, and then perform self-
consistency checks. This is a modest step on the way towards reconciling phenomeno-
logically promising scenarios with underlying string models. We will see that although
a priori the situation looks very bleak, and one might have hastily concluded that even
our modest consistency check would put very strong constraints on LVS, things are
more interesting. It turns out that LVS jumps through every hoop we present it with,
and instead of broad qualitative changes, we find only small quantitative changes.
The main difference between KKLT and LVS is that LVS includes a specific string
α′ correction ∆Kα′ in the Kähler potential K of the four-dimensional N = 1 effective
supergravity. Naturally, the four-dimensional string effective action also contains other
string corrections. Here, we will focus on gs corrections due to sources (D-branes
and O-planes). For some N = 1 and N = 2 toroidal orientifolds, these corrections
were computed in [37] (see also [38]; for a comprehensive introduction to orientifolds,
see [39]). Compared to the α′ correction ∆Kα′ considered in LVS, the gs corrections to
the Kähler potential ∆Kgs will scale as
∆Kα′ : ∆Kgs
O(α′3) : O(g2sα′2)
(string frame) . (1)
By naive dimensional analysis, one would expect that in a 1/V expansion, where V is
the overall volume in the Einstein frame, eq. (1) implies
∆Kα′ ∼ O(g−3/2s V−1) , ∆Kgs ∼ O(gsV−2/3) (Einstein frame) . (2)
If there is more than one Kähler modulus, as is usually the case, various combinations
of Kähler moduli may appear in ∆Kgs in eq.(2), and a priori this could lead to even
weaker suppression in 1/V than that shown. However, we will argue that (2) is actually
correct as far as the suppression factors in the 1/V expansion go. Nevertheless, even the
suppression displayed in (2) seems to be a challenge for LVS, if indeed V ∼ 1015. For V
this large, ∆Kgs would dominate ∆Kα′ , since we do not expect the string coupling gs
to be stabilized extremely small. On the other hand, if we are interested in the effects
gs corrections may have on the existence of the large volume minima, the relevant
quantity to look at is the scalar potential V , rather than the Kähler potential K. It
turns out that certain cancellations in the expression for the scalar potential leave us
with leading correction terms to V that scale as
∆Vα′ ∼ O(g−1/2s V−3) , ∆Vgs ∼ O(gsV−3) . (3)
This is already much better news for LVS. However, restoring numerical factors in (3),
and with gs typically not stabilized extremely small, it would seem that ∆Kgs could still
have a significant effect both on stabilization and on the resultant phenomenology (like
soft supersymmetry breaking terms, which also depend on the Kähler potential). We
will see that although this is indeed so in principle, in practice the models we consider
are surprisingly robust against the inclusion of ∆Kgs . The clearest example of this is
the calculation of gaugino masses in sec. 4. The result is that for the “11169 model”
(analyzed in [6]), the correction to the gaugino masses due to ∆Kgs is negligible. Thus,
for the most part, LVS survives our onslaught unscathed.
We consider this a sign that scenarios such as LVS deserve to be taken seriously as
goals to be studied in detail in string theory, even as the caveats above (that apply to
any KKLT-like setup) serve to remind us that there is much work left to be done to
really understand phenomenologically viable stabilized flux compactifications in string
theory.
2 Review
Let us begin by a quick review of the KKLT and large volume scenarios. For reasons
that will become clear, we will want to allow for more than a single Kähler modulus.
2.1 KKLT
The KKLT setup [1,2] is a warped Type IIB flux compactification on a Calabi-Yau (or
more generally, F-theory) orientifold, with all moduli stabilized. In this paper, we will
neglect warping. For progress towards taking warping into account in phenomenological
contexts, see [40,41].
In the four-dimensional N = 1 effective supergravity, the Kähler potential and
superpotential read
K = −ln(S + S̄)− 2 ln(V) +Kcs(U, Ū) ,
W = Wtree +Wnp =W (S, U) +
Ai(S, U)e
−aiTi , (4)
where the volume V is a function of the Kähler moduli Ti = τi + ibi whose real parts
are 4-cycle volumes and whose imaginary parts are axions bi, arising from the integral
of the RR 4-form over the corresponding 4-cycles. In particular, the volume V depends
on the Ti only through the real parts τi,
V = V(Ti + T̄i) = V(τi) , (5)
and the nonperturbative superpotential Wnp a priori depends on the complexified dila-
ton S and the complex structure moduli U . After stabilization of S and U by demand-
ing DUW = 0 = DSW , we have
K = −ln(S + S̄)− 2 ln(V) +Kcs(U, Ū) ,
W = Wtree +Wnp =W0 +
−aiTi . (6)
We keep the dependence on the complexified dilaton S and the complex structure
moduli U in the Kähler potential for now, since the Kähler metric in the F-term
potential
V = eK
GJ̄IDJ̄W̄DIW − 3|W |2
is to be calculated with the full Kähler potential K, including the dependence on S and
U . (In eq. (7), the index I a priori runs over all moduli, but after fixing the complex
structure moduli and the dilaton, only the sum over the Kähler moduli remains). The
scalar potential V has a supersymmetric AdS minimum at a radius that is barely large
enough to make the use of a large-radius effective supergravity self-consistent, typically
τ ∼ 100 (recall that τ has units of (length)4).1 In addition, to obtain a supersymmetric
minimum at all, one needs to tune the flux superpotential W0 to very small values.
That is, the stabilization only works for a small parameter range. This is easy to
understand, since we are balancing a nonperturbative term against a tree-level term.
Let us briefly digress on the reasons for and implications of this balancing.
2.2 Consistency of KKLT
In the previous section we only considered the lowest-order supergravity effective action.
As was already noted in the original KKLT paper, α′ corrections and gs corrections
(string loops) that appear in addition to the tree-level effective action could in principle
1This minimum then has to be uplifted to dS or Minkowski by an additional contribution to the
potential. Various mechanisms were suggested in [2,36,42–45].
affect stabilization. Oftentimes, the logic of string effective actions is that if one such
correction matters, they all do, so no reliable physics can be learned from considering
the first few corrections. If this is true, one can only consider regimes in which all
corrections are suppressed. This is not necessarily so if some symmetry prevents the
tree-level contribution to the effective action from appearing, so that the first correc-
tion (be it α′ or gs) constitutes lowest order. This indeed happens for type IIB flux
compactifications; given the tree level Kähler potential (6), if we were to set Wnp = 0,
the remaining K andW in (6) produce a no-scale potential, i.e. the scalar potential for
the Kähler moduli then vanishes [46]. In KKLT, this no-scale structure is only broken
by the nonperturbative contribution to the superpotential Wnp. Since each term in
Wnp is exponentially suppressed in some Kähler modulus, the resulting terms in the
potential are also exponentially suppressed. For instance, for the simpler example of a
single modulus τ , the potential (after already fixing the axionic partner along the lines
of appendix A.2) reads
4|A|2aτe−aτ
aτ + 1
− 4aτ |AW0|
e−aτ , (8)
meaning that even for moderate values of the Kähler modulus τ , all these terms are
numerically very small. Corrections in α′ and gs, however, are expected to go as powers
of Kähler moduli τ , so will dominate the scalar potential for most of parameter space. In
particular, it was argued in [3,4] that only for very small values of W0 can perturbative
corrections to the Kähler potential be neglected. It was the insight of [3] that even if
W0 is O(1) (which is more generic than the tiny value forW0 required in KKLT), there
can still be a competition between the perturbative and nonperturbative corrections to
the potential in regions of the Kähler cone where large hierarchies between the Kähler
moduli are present. We now review this scenario.
2.3 Large volume scenario (LVS)
As was shown in [5], the no-scale structure (and factorization of moduli space) is broken
by perturbative α′ corrections to the Kähler potential, such as
K = −ln(2S1)− 2 ln(V + 12ξS
1 ) +Kcs(U, Ū) , (9)
where2 ξ = −ζ(3)χ/2(2π)3 and S1 = ReS. For large volume V, we see that the
perturbative correction goes as a power in the volume,
− 2 ln(V + 1
1 ) = −2 lnV −
V + . . . , (10)
2Here ξ differs by a factor (2π)−3 from [5] because we use the string length ls = 2π
which by the discussion in the previous subsection will dominate in the scalar potential
if all Kähler moduli are even moderately large. Using the superpotential
W = W0 +Wnp = W0 +
−aiTi , (11)
the scalar potential has the structure
V = Vnp1 + Vnp2 + V3 (12)
G̄i∂̄W̄np∂iWnp +
G̄iK̄ (W̄0 + W̄np)∂iWnp + c.c.
G̄iK̄Ki − 3
|W |2
For concrete calculations we will use the model based on the hypersurface of degree 18 in
[1,1,1,6,9]
(see [16,47,48] for background information on its topology. Some comments
about generalizations to other models with arbitrary numbers of Kähler moduli are
given in appendix A.2). The defining equation is
z181 + z
2 + z
3 + z
4 + z
5 − 18ψz1z2z3z4z5 − 3φz61z62z63 = 0 (13)
and it has the Hodge numbers h1,1 = 2 and h2,1 = 272 (only two of the complex
structure moduli ψ and φ have been made explicit in (13); moreover, not all of the
272 survive orientifolding). We denote the two Kähler moduli by Tb = τb + ibb and
Ts = τs + ibs, where τb and τs are the volumes of 4-cycles, and the subscripts “b” and
“s” are chosen in anticipation of the fact that one of the Kähler moduli (τb) will be
stabilized big, and the other one (τs) will be stabilized small. An interesting property
of this model is that it allows expressing the 2-cycle volumes ti explicitly as functions of
the 4-cycle volumes τj , so that the total volume of the manifold can be written directly
in terms of 4-cycle volumes, yielding
V = 1
b − τ 3/2s
, (14)
(ts + 6tb)
, τs =
Following [4], we are interested in minima of the potential with the peculiar property
that one Kähler modulus τb ∼ V2/3 is stabilized large and the rest are relatively small
(but still large compared to the string scale),
aτs ∼ lnV ∼
ln τb (15)
in the case at hand. Thus, we expand the potential around large volume, treating
e−aτs as being of the same order as V−1. In the end one has to check that the resulting
potential indeed leads to a minimum consistent with the exponential hierarchy aτs ∼
lnV, so that the procedure is self-consistent. Applying this strategy, the scalar potential
at leading order in 1/V becomes3
VO(1/V3) =
2|A|2a2√τse−2aτs
− 2a|AW0|τse
3|W0|2
eKcs . (16)
From here one can see the existence of the large volume minima rather generally. By
the Dine-Seiberg argument [49], the scalar potential goes to zero asymptotically in
every direction. Along the direction (15), for large volume the leading term in (16) is
V ∼ Vnp2 ∝ −
V3 , (17)
which is negative, so the potential V approaches zero from below. For moderately
small values of the volume, V is positive (this is guaranteed if the Euler number χ
is negative, hence ξ positive), so in between there is a minimum. This minimum
is typically nonsupersymmetric, and because we are no longer balancing a tree-level
versus a nonperturbative term, we can find minima at large volume — hence the name
large volume scenario (LVS).4 To be precise, in flux compactifications we move in
parameter space by the choice of discrete fluxes, but since V is exponentially sensitive
to parameters like S1, large volume minima appear easy to achieve also by small changes
in flux parameters. If we allow for very small values ofW0 (so that KKLT minima exist
at all), the above minimum can coexist with the KKLT minimum [4,50]. Here, we will
allow W0 to take generic values of order one.
The astute reader will have noticed that this argument for the existence of the
LVS minimum is “one dimensional”, as it only takes into account the behavior of the
potential along the direction (15). One must of course check minimization with respect
to all Kähler moduli. In [3] a plausibility argument to this effect was given, and the
existence of the minimum was explicitly checked in the case of the P4
[1,1,1,6,9]
model by
explicitly minimizing the potential (16) with respect to the Kähler moduli. In doing
so, it is convenient to trade the two independent variables {τb,τs} for {V,τs} so that
3Here we have already stabilized the axion bs, i.e. solved ∂V/∂bs = 0, which produces the minus
sign in the second term; this is also true with many small moduli τi. See appendix A.2 for details.
Also note that solving DUW = 0 = DSW causes the values of U and S at the minimum to depend
on the Kähler moduli. However, this dependence arises either from the nonperturbative terms in
the superpotential or from the α′-correction to the Kähler potential. Thus it would only modify the
potential at subleading order in the 1/V expansion.
4By “tree-level” we intend “tree-level supergravity”, i.e. for the purposes of this paper we call both
α′ and gs corrections “quantum corrections”.
∂τsV = 0, as then the last term in (16) is independent of τs (this will be different when
we include loop corrections). Extremizing with respect to τs, and defining
X ≡ Ae−aτs , (18)
one obtains a quadratic equation for X ,
τsS1V
(4aτs − 1)X2 +
2a|W0|
(aτs − 1)X
eKcs . (19)
In (18), we chose A to be real as a potential phase can be absorbed into a shift of the
axion b and disappears after minimization with respect to b (see section A.2). Two
comments are in order. The quadratic equation (19) has just one meaningful solution
(X = 0 corresponds to τs = ∞). Moreover, when expanding (19) in 1/(aτs), the leading
terms arise from derivatives of the exponential.
Formula (19) is an implicit equation determining τs. However, one can easily solve
(19) for X and obtains
X = Ae−aτs =
2|W0|
(aτs)2
. (20)
The hierarchy (15) is obvious in this solution, rendering the procedure self-consistent.
One also notices that reasonably large values of τs (e.g. 35) are not difficult to obtain,
if V is stabilized large enough; for example, simply set a ∼ 1, A ∼ 1, W0 ∼ 1. We
fill in the numerical details, following [3], in appendix A.1 (including some further
observations).
3 String loop corrections to LVS
As already emphasized, the α′ correction proportional to ξ is only one among many
corrections in the string effective action. We now consider the effect of string loop
corrections on this scenario and what the regime of validity is for including or neglect-
ing those corrections. Volume stabilization with string loop corrections but without
nonperturbative effects was considered in [51].
To be precise, the corrections considered in [51] were those of [37], that were com-
puted for toroidal N = 1 and N = 2 orientifolds. Here, we would need the analogous
corrections for smooth Calabi-Yau orientifolds. Needless to say, these are not known.
Faced with the fact that the string coupling gs is stabilized at a finite (and typically
not terribly small) value, we propose that attempting to estimate the corrections based
on experience with the toroidal case is better than arbitrarily discarding them. As we
will see, if our estimates are correct, typically the loop corrections can be neglected,
though there may at least be some regions of parameter space where they must be
taken into account (see figure 4). (In section 5, we will briefly consider “cousins” of
LVS where they cannot be neglected anywhere in parameter space.) Improvement on
our guesswork would of course be very desirable.
3.1 From toroidal orientifolds to Calabi-Yau manifolds
We would like to make an educated guess for the possible form of one-loop corrections
in a general Calabi-Yau orientifold. All we can hope to guess is the scaling of these
corrections with the Kähler moduli T and the dilaton S. The dependence on other
moduli, like the complex structure moduli U , cannot be determined by the following
arguments (even in the toroidal orientifolds this dependence was quite complicated).
In order to generalize the results of [37] to the case of smooth Calabi-Yau mani-
folds, we should first review them and in particular remind ourselves where the various
corrections come from in the case of toroidal orientifolds. There, the Kähler potential
looks as follows (we will explain the notation as we go along):
K = −ln(2S1)− 2 ln(V) +Kcs(U, Ū)−
V (21)
E (K)i (U, Ū)
4τiS1
i 6=j 6=k
E (W )k (U, Ū)
4τiτj
There are two kinds of corrections. One comes from the exchange of Kaluza-Klein
(KK) modes between D7-branes (or O7-planes) and D3-branes (or O3-planes, both
localized in the internal space), which are usually needed for tadpole cancellation, cf.
fig. 1. This leads to the first kind of corrections in (21), proportional to E (K)i where
the superscript (K) reminds us that these terms originate from KK modes. In the
toroidal orientifold case, this type of correction is suppressed by the dilaton and a
single Kähler modulus τi, related to the volume of the 4-cycle wrapped by the D7-
branes (or O7-planes, respectively).5 We expect an analog of these terms to arise more
5We should mention that there was no additional correction of this kind coming from KK exchange
between (parallel) D7-branes in [37] (actually that paper considered the T-dual version with D5-branes,
but here we directly translate the result to the D7-brane language). This was due to the fact that
in [37] the D7-brane scalars were set to zero. In general we would also expect a correction coming
PSfrag replacements τb
Figure 1: The loop correction E(K) comes from the exchange of closed strings, or
equivalently an open-string one-loop diagram, between the D3-brane and D7-branes
(or O7-planes) wrapped on either the small 4-cycle τs (as in a) or the large 4-cycle
τb (as in b). The exchanged closed strings carry Kaluza-Klein momentum.
generally, given that they originate from the exchange of KK states which are present
in all compactifications.
The second type of correction comes from the exchange of winding strings between
intersecting stacks of D7-branes (or between intersecting D7-branes and O7-planes).
The exchanged strings are wound around non-contractible 1-cycles within the intersec-
tion locus of the D7-branes (and O7-planes, respectively), cf. fig. 2. This leads to the
PSfrag replacements
Figure 2: The loop correction E(W ) comes from the exchange of winding strings
on the intersection between the small 4-cycle τs and the large 4-cycle τb. If this
intersection is empty, there are no terms with E(W ).
second kind of correction in (21) proportional to E (W )i . The superscript (W ) reminds
us that these terms arise from the exchange of winding strings. In toroidal orientifolds,
this type of correction is suppressed by the two Kähler moduli measuring the volumes
from parallel (or more generally, non-intersecting) D7-branes by exchange of KK-states. These should
scale in the same way with the Kähler moduli as those arising from the KK exchange between D3-
and D7-branes.
of the 4-cycles wrapped by the D7-branes (and O7-planes). One might a priori think
that this kind of correction does not generalize easily to a smooth Calabi-Yau which
has vanishing first Betti number (and therefore at most torsional 1-cycles). However,
the exchanged winding strings are, from the open string point of view, Dirichlet strings
with their endpoints stuck on the D7-branes. Thus, the topological condition is on the
cycle over which the two D7-brane stacks (or one D7-brane stack and an O7-plane)
intersect, as in figure 3. Thus, it depends on the topology of specific cycles within
cycles whether winding open strings exist in a given model.6
PSfrag replacements
no D-brane
no D-
brane
Figure 3: A D7-brane is wrapped on a 4-cycle A, which intersects the 4-cycle B on
a 2-cycle C. For Dirichlet strings, the relevant topological condition (the existence
of nontrivial 1-cycles) is on the intersection locus C, not on cycle B or on the whole
Calabi-Yau. In other words, without the D-brane, the string on cycle C could have
been unwound by sliding it along cycle B (as shown in the figure). With the D-brane,
the string on cycle C is stuck.
Given the expressions in [37] and the subset reproduced in (21) above, it is tempting
to conjecture that some terms at one loop might be suppressed only by powers of single
Kähler moduli like the τi (and the dilaton):
Calabi-Yau: ∆Kgs
for some function E of the complex structure and open string moduli. If this were
the case, the one-loop corrections would typically dominate the α′ correction in (21)
6The toroidal orientifold case seems to be a bit degenerate. Two stacks of D7-branes intersect along
a 2-cycle with the topology of P1. However, there are point-like curvature singularities along the P1
at the orbifold point and strings winding around these singular points cannot be contracted without
crossing the singularities. This seems to allow for stability of winding strings (at least classically).
(which is suppressed by the overall volume V) in the Kähler potential, if there are large
hierarchies among the Kähler moduli. However, one should keep in mind that toroidal
orientifolds are rather special in that they have very simple intersection numbers. In
particular, the overall volume can be written as V ∼ τiti, where there is no summation
over i implied. Thus, it is not obvious whether a generalization to the case of a general
Calabi-Yau really contains terms suppressed by single Kähler moduli instead of the
overall volume. Even though we cannot exclude the presence of such terms, we deem it
more likely that the scaling of one-loop corrections to the Kähler potential is not (22)
Calabi-Yau: ∆Kgs
gaK(t, S1)E
W(t, S1)E
V , (23)
where the sums run over KK and winding states, respectively. Also, E (K) and E (W ) are
again unknown functions of the complex structure and open string moduli, t stands
for the 2-cycle volumes (in the Einstein frame; see appendix C.1) and the functions
gK(t, S1) and gW(t, S1) determine the scaling of the KK and winding mode masses
with the Kähler moduli and the dilaton.7 As we review in appendix E, in the toroidal
orientifold case the suppression by the overall volume arises naturally through the Weyl
rescaling to the 4-dimensional Einstein frame.
Starting with the ansatz (23) for smooth Calabi-Yau manifolds, the known form
(22) for toroidal orientifolds follows simply by substituting gK, gW and the intersec-
tion numbers for the toroidal orientifold case. In particular, gK ∼ ti for the 2-cycle
transverse to the relevant D7-brane, while gW ∼ t−1i for the 2-cycle along which the
two D7-branes intersect. Then, the first of the terms in (23) reduces to E (K)i /(S1τi) for
toroidal orientifolds, the second to E (W )i /(τjτk) with j 6= i 6= k, cf (50). Our strategy
in the following chapters will therefore be to assume a scaling like (23) for the 1-loop
corrections to the Kähler potential for general Calabi-Yau spaces.
As already mentioned, the dependence on the complex structure and open string
moduli cannot be inferred by analogy to the orientifold case. We parameterize our igno-
7In rewriting the sums over KK and winding states in terms of the functions g and E , we assume that
the dependence of the corresponding spectra on the complex structure and Kähler moduli factorizes.
In the known examples of toroidal orientifolds (with or without world-volume fluxes), this is always
the case, cf. [52]. Moreover, in general there can appear several contributions (denoted by a and q)
depending on which tower of KK or winding states are exchanged in a given process. We will see
explicit examples of this in the following.
rance by keeping the expressions E in (23) as unknown functions of the corresponding
moduli. Then we investigate the consequences of the one-loop terms, depending on
the size of these unknown functions at the minimum of the potential for the complex
structure and open string moduli. Some further comments on the form of ∆Kgs will
appear in section 5.2.
3.2 LVS with loop corrections
Thus, allowing for string loop corrections of the form (23) in (9), and expanding the
α′ correction as in (10), we can write
K = −ln(2S1)− 2 ln(V) +Kcs(U, Ū)−
W = W0 +
−aiTi , (24)
where as explained in the previous section, we have not specified the explicit form of
the loop corrections E , that are allowed to be functions of U (and in general of the open
string moduli, that we neglect in our analysis, assuming that they can be stabilized
by fluxes). The Kähler potential for the complex structure moduli Kcs(U, Ū) is left
unspecified in (24), indeed we will not need its explicit form. For consistency, we have
also included loop corrections to the α′ correction.8 This changes ξ to ξ̃, which is a
small change; for S1 = 10, numerically ξ̃ ≈ 1.02 ξ.
Neglecting fluxes, the functions gaK and g
W are proportional and inversely propor-
tional to some 2-cycle volume, respectively. (We will come back to corrections from
fluxes in appendix D.) When using a particular basis of 2-cycles (with volumes ti as
in appendix C.1), the 2-cycle volume appearing in gaK or g
W might be given by a linear
combination ta =
i citi of the basis cycles ti (and similarly for tq). Depending on
which 2-cycle is the relevant one, this linear combination might or might not contain
the large 2-cycle tb ∼ V1/3, which always exists in LVS. If it is present in the linear
combination, one can neglect the contribution of the small 2-cycles to leading order in
a large volume expansion and obtains possible terms proportional to E (K)b S−11 V−2/3 or
E (W )b V−4/3, where the subscript b refers to the large 4-cycle τb.
8We remind the reader that the α′ correction arises from the R4 term in 10 dimensions whose
coefficient receives corrections at 1-loop (and from D-instantons). The 1-loop correction amounts to
a shift of the prefactor from ξ to ξ̃ = ξ
1 + π
3ζ(3)S2
, see for instance [53] for a review.
Before getting into the details, it is hard to resist trying to anticipate what might
happen. For those terms that are more suppressed in volume than the ξ̃ term (e.g
E (W )b ), one would expect the loop corrections to have little effect on stabilization. They
could still represent a small but interesting correction to physical quantities in LVS. For
those that are less suppressed in volume than the ξ̃ term (e.g. E (K)b ), one would expect
the loop correction to have a huge effect on stabilization, and severely constrain the
allowed values for the complex structure moduli and the dilaton in LVS (in particular,
constrain them to a region in moduli space where the function E (K)b takes very small
values). We will find, however, that this expectation is sometimes too naive. For
example, there can be cancellations in the scalar potential that are not obvious from
just looking at the Kähler potential.
Let us now get into more detail on what happens in the LVS model with loop
corrections.
3.3 The P4
[1,1,1,6,9]
model
We would now like to specify the general form of the Kähler- and superpotential (24) to
the case of the P4[1,1,1,6,9] model. In this space, the divisors that produce nonperturbative
superpotentials when D7- (or D3-) branes are wrapped around them do not intersect,
as reviewed for instance in [48]. Therefore, we do not expect any correction of the E (W )
type in this model (for the generalization to models where there are such intersections,
see appendix D). Moreover, we neglect flux corrections to the KK mass spectrum in
the main text. It is shown in appendix D that, for small fluxes, this correctly captures
all the qualitative features we are interested in, and it leads to much clearer formulas.
Thus, we now consider the scalar potential resulting from
K = −ln(2S1)− 2 ln(V) +Kcs(U, Ū)−
τbE (K)b
τsE (K)s
W = W0 + Ae
−aTs . (25)
As τb is very large the corresponding non-perturbative term in the superpotential of
(24) can be neglected, which allowed us to simplify the notation by setting As = A
and as = a.
The general structure of the scalar potential was already given in (12). The three
contributions at leading order (O(V−3)) in the large volume expansion are
Vnp1 = e
24a2|A|2τ 3/2s e−2aτs
V∆ , (26)
Vnp2 = −eKcs
2a|AW0|τse−aτs
6E (K)s
, (27)
3eKcs |W0|2
1 ξ̃ +
4(E (K)s )2
, (28)
where the axion has already been minimized for, as discussed in section A.2, and
2S1τs − 3E (K)s . (29)
The leading α′-correction is the ξ̃ term in V3 above. We now see that it scales with the
volume and the string coupling gs = 1/S1 as claimed in the Introduction, in eq. (3).
Also the volume dependence of the loop correction (E (K)s term) in V3 is as announced
in (3). The gs factors seem to differ from (3); we see g
s , g
s and g
s for Vnp1, Vnp2 and
V3, respectively. This is because the gs dependence advertised in (3) arises in models
where, unlike in P4[1,1,1,6,9], the E (W ) correction is present as well, cf. appendix D.9 It is
also worth mentioning that the loop correction proportional to E (K)s modifies Vnp1 and
Vnp2 at leading order in the V-expansion. whereas the α′ correction does not; it only
appears in V3. This is so even though both corrections are equally suppressed in the
Kähler potential (i.e. ∼ V−1). The reason for this can be traced back to the fact that
the loop-correction explicitly depends on τs and not only on the overall volume, cf. the
discussion in appendix C.4 and C.5.
As anticipated, E (K)b and its first derivatives appear only at the next order, O(V−10/3):
V10/3 = 2
61/3|W0|2eKcs
S31V10/3
(E (K)b )2 +
∂αE (K)b ∂ᾱE
, (30)
where ∂α = ∂/∂U
α and ∂ᾱ = ∂/∂Ū
ᾱ and α enumerates the complex structure moduli.
For E (K)b = E
s = 0, the potential terms at leading order coincide with the original
case discussed in (56), cf. appendix A.1. The singularity from zeros of the denominator
is an artifact of the expansion as discussed in appendix B. The range of validity is
9There, it is shown that including the effect of fluxes on the KK spectrum might also produce this
behavior.
limited to the range in moduli space where the denominator ∆ does not become too
small. It is also apparent that the loop terms are subleading in a large τs, large S1
expansion. However, depending on the relative values of the parameters {E (K)s , τs, S1},
a truncation to the first terms in such an expansion may or may not be valid. We
perform a numerical comparison of the two contributions to V3 in figure 4.
Figure 4: The top surface is the α′ correction, the second is the gs correction, and
the “red carpet” is 10/∆ (we used the values A = 1,W0 = 1, a = 2π/8). We see
that for most of the parameter range, the α′ correction dominates, and only for large
E(K)s , with the string coupling gs = 1/S1 not too small, do the contributions become
comparable.
We can understand the volume dependence of the terms (26)-(30) as follows. The
common prefactor eK gives an overall suppression τ−3b ≃ V−2. The quantum corrections
obey the rule that a term proportional to 1/τλb in K appears in V3 at order 1/τ
(where the +3 comes from the overall eK factor) for all values of λ except for λ = 1.
When it does appear, it is generated by the term (KiKi − 3) and breaks the no-scale
structure. For λ = 1 there is a cancellation at leading order, so it appears only at
order 1/τ 2+3b (see appendix C.3 and C.4). This rule can explicitly be verified in our
calculation: the α′ and the E (K)s corrections are suppressed by 1/τ 3/2b inK, and therefore
they appear with the suppression 1/τ
b in V3. On the other hand, for the E
b term a
cancellation takes place to leading order (λ = 1). It appears neither in Vnp1 nor in Vnp2
at leading order (which can be understood more generally, cf. appendix C.5). Thus, it
only appears subleading in the potential, at O(V−10/3).10
10This cancellation for λ = 1 was already noticed in [51], albeit in the case without nonperturbative
superpotential. In [54] it was argued that this cancellation can be understood from a field redefinition
We now proceed to minimize the potential (26)-(28), using the same strategy as
in the case without loop corrections, cf. section 2.3 and appendix A.1. The equations
∂VV = 0 = ∂τsV are of course more complicated now, but it is easy to solve them
numerically. Doing so we find that the volume V and the small 4-cycle volume τs,
viewed as functions of S1 and E (K)s , are well fit by linear functions when restricted to
a sufficiently limited range in parameter space. For example,
range: S1 = [8, 11], E (K)s = [20, 40]
log10 V = 1.720S1 − 0.1208 E (K)s − 3.437 , (31)
τs = 5.000S1 − 0.3581 E (K)s − 8.638 .
The fits are quite good; the error is no greater than ±0.3 for τs and ±0.1 for log10 V in
this range, for an {S1, E (K)s } grid of 402 points.
From (31) we see an interesting difference to the case without loop corrections. The
value of τs at the minimum depends on the complex structure moduli U , through E (K)s .
This is in contrast to the case without loop corrections, where the value of τs is only
determined by the value of the Euler number ξ and the dilaton S1, cf. (57) below. It is
analogous to the perturbative stabilization in [51] where the volume at the minimum
of the potential also depends on U .
The result (26)-(28) was derived in a particular model, but we expect the appearance
of loop corrections in V to be more general. This opens up the possibility that in
principle, one might obtain large volume minima even for manifolds of vanishing (or
even positive) Euler number, where LVS is not applicable, as LVS-style stabilization
only holds for one sign of ξ. In practice it might be difficult to get large enough values
for the 1-loop corrections to stabilize τs at a value sufficiently bigger than the string
scale. This deserves further study.
We also note that the special structure of (28) and (30), i.e. the appearance of Es
only in (28) and of Eb only in (30), offers additional flexibility in tuning the relative
size of these terms in a purely perturbative stabilization of the Kähler moduli along
the lines of [51,55]. Also this point deserves further study.
argument combined with the no-scale structure of the tree-level Kähler potential. That argument
holds for the case of a single Kähler modulus T with tree-level Kähler potential −3 ln(T + T̄ ) and
under the assumption that the coefficient of the loop correction to the Kähler potential ∼ (T + T̄ )−1
is independent of the complex structure moduli and the dilaton. Here, these assumptions do not
hold, but we showed that the term ∼ (Tb + T̄b)−1 in the Kähler potential nevertheless only appears
at subleading order in the potential in LVS, cf. (26)-(30).
4 Gaugino masses
Now that we know how the stabilization of the (Kähler) moduli is modified by loop
corrections, it is natural to extend our analysis to the soft supersymmetry breaking
Lagrangian (For a review see for instance [56,57].) In LVS, supersymmetry breaking
is mostly due to F -terms: Fs 6= 0, Fb 6= 0. These determine the soft supersymmetry
breaking terms which can be present in the low energy effective action without spoiling
the hierarchy between the electroweak and the Planck scale,
Leff = LMSSM + Lsoft . (32)
The soft Lagrangian contains gaugino masses M , scalar masses m, further scalar bilin-
ear terms B and trilinear terms A. (For explicit expressions, see the aforementioned
reviews, or e.g. [6].)
Let us start considering gaugino masses. In [6] it was shown that in LVS, gaugino
masses Ma are generically suppressed with respect to the gravitino mass m3/2:
|Ma| ≃
ln(1/m3/2)
ln(1/m3/2)
(we use units in which MPl = 1). This suppression results from a cancellation of the
leading order F -term contribution to gaugino masses. We briefly review this calcula-
tion. Given the F -terms
F I = eK/2GJ̄IDJ̄W̄ , (34)
gaugino masses are given by [56]
F I∂Ifa , (35)
where fa are the gauge kinetic functions and a labels the different gauge group fac-
tors. In LVS the Standard Model (SM) gauge groups arise from D7-branes wrapped
around small 4-cycles. We do not try to go into the details of how to embed the SM
concretely, but we mention that different gauge group factors might arise from brane
stacks wrapping the same 4-cycle if world volume fluxes are present on the branes. In
that case the gauge kinetic functions are given by11
+ ha(F)S + f (1)a (U) , (36)
11We use the “phenomenology” normalization of the gauge generators, in the language of [58]; that
explains the relative factor of 4π in (36).
where ha depends on the world volume fluxes and we also included a possible 1-loop
correction to the gauge kinetic function which depends on the complex structure (and
possibly open string) moduli. If several gauge groups arise from branes wrapped around
the same cycle, the same Kähler modulus T would appear in all of them. From (36) it
is clear why the D7-branes of the SM have to wrap small 4-cycles, because otherwise
the gauge coupling would come out too small (unless there is an unnatural cancellation
between the different contributions to fa).
As is also apparent from (36), the gauge kinetic function in general depends not
only on the Kähler moduli but also on the dilaton and the complex structure. Thus,
according to (35) we need to know FU , F S and F i for the small Kähler moduli.12
From the definition (34), it is clear that FU and F S might be non-vanishing even
though we assume DUW = 0 = DSW , provided the inverse metric G
J̄I contains mixed
components between Kähler moduli on the one hand and complex structure moduli
and dilaton on the other hand. Without loop corrections (i.e. considering only the
leading α′ correction) there is no mixing between the Kähler and complex structure
moduli, and one finds
FU = 0 , F S ∼ O(V−2) and F i ∼ O(V−1) (without loop corrections) . (37)
Thus, at leading order in the large volume expansion, the sum in (35) only runs over
the Kähler moduli. Moreover, taking into account the linear dependence of the gauge
kinetic functions (36) on the (small) Kähler moduli, the sum effectively only involves
a single term, i.e.
F a +O(V−2) , (38)
where F a is the F-term of the (small) Kähler modulus appearing in fa.
As a concrete example we consider again the P4[1,1,1,6,9] model with only one small
Kähler modulus τs. The corresponding F-term is given by
F s = eK/2
Gs̄s∂s̄W̄ + (G
s̄sKs̄ +G
b̄sKb̄)W̄
= 2τse
K/2W̄0
− 1 +O((aτs)−2)
+O(V−2) , (39)
where we used (20) and (61) for the first term and (64) for the second.
12With a slight abuse of notation, we denote the F -terms of the Kähler moduli by the index i, but
the F -terms of the other moduli are identified by the symbol for the corresponding modulus, like FS .
This is to avoid introducing too many indices.
Now the leading order cancellation is obvious in (39). Determining the gaugino
masses requires dividing by Refs, cf. (35). In order to further evaluate this, [6,7] as-
sumed that the dilute flux approximation fs = (4π)
−1Ts is valid, i.e. they neglected the
contributions from world-volume fluxes and one-loop terms compared to the tree-level
term. This puts some constraints on the allowed discrete flux values determining hs.
We want to stress that the cancellation appearing in (39) is independent of this approx-
imation. We are mainly interested in the fate of this cancellation when including loop
corrections, and do not have anything to add concerning phenomenological constraints
that may arise from imposing the dilute flux approximation. Using it, the gaugino
masses simplify to
|Ms| =
= eK/2|W0|
+O((aτs)−2)
ln(1/m3/2)
ln(1/m3/2)
which is the announced result. In (40) we used
m3/2 ∼ |W0|/V and aτs ∼ ln(V/|W0|) , (41)
where the second relation holds in LVS due to (20).
4.1 Including loop corrections
The previous section was a review of the results found in [6]. Now we ask what changes
if one considers the loop corrected Kähler potential (25). A priori, as (40) results from
a leading order cancellation, one might wonder whether loop corrections might spoil
this small hierarchy between the gaugino and gravitino masses. To address this concern
we start by observing that the gaugino masses are still determined by the F-terms of
the small Kähler moduli (in the large volume limit). More precisely, the scaling of the
F-terms (37) now becomes
FU = O(V−2), F S ∼ O(V−2) and F i ∼ O(V−1) (with loop corrections) , (42)
i.e. FU no longer vanishes, but it is just as suppressed as F S.
We again focus on the P4[1,1,1,6,9] model and ask how (39) is modified by loop cor-
rections. Amongst other things, we need to generalize equation (20) to include loop
corrections, since we need it to calculate the first term in (39). Thus, we need to
extremize the potential again with respect to τs by setting
∂τsV =
4aτs − 1
∆+ 6E (K)s
2a|W0|
V2S1∆2
aτs − 1
∆2 − 18(E (K)s )2
2aS1τ
s E (K)s
X (43)
− 3|W0|
2(E (K)s )2
4S21V3∆
to zero. Obviously, X = 0 is no longer a solution. Instead, there are now two non-trivial
solutions, one of which goes to zero in the limit E (K)s → 0. This solution corresponds
to a maximum of the potential, so it is of no use to us here. We can expand the other
solution for large aτs, as in the case without loop corrections, yielding
X = Ae−aτs =
2|W0|
2aE (K)s
(aτs)2
. (44)
Another ingredient we need is the quantity Gı̄sKı̄, in order to evaluate the second term
in (39). Using equation (65) we obtain
Gı̄sKı̄ = −2τs
6E (K)s
+ . . .
= −2τs −
2E (K)s
− 18(E
S21τs
2(E (K)s )3
τ 2s S
+ . . . , (45)
where the ellipsis represents terms that are more suppressed in V−1.
Now we see from (44), (45) and (65) that at leading order in an expansion in aτs, the
quantities relevant to evaluate (39) are not affected by the loop corrections. Thus, the
leading order cancellation in the gaugino masses survives the inclusion of loop effects.13
At first glance, though, equations (44), (45) and (65) seem to suggest a correction to the
subleading term, that could potentially give a significant contribution to the gaugino
masses after the leading-order cancellation, cf. (39).
In the actual calculation, this contribution drops out. Putting all the ingredients
together (and employing the dilute flux approximation again), the gaugino mass turns
13One might argue that this result was to be expected, because the main assumption of [6] is
that the stabilization is due to nonperturbative effects, i.e. the dominant effect in ∂τsV should arise
from the nonperturbative superpotential. However, in view of (43), it is no longer obvious that the
nonperturbative terms dominate when loop corrections are included.
out to be
|Ms| =
= 3eK/2|W0|
16a2τ 2s
S1 − 12
2aE (K)s
64S1a3τ 3s
+ . . .
ln(1/m3/2)
ln(1/m3/2)
. (46)
The result of [6] is therefore very robust. Unexpectedly, the correction to (40) due to
E (K)s only appears at sub-sub-leading order in the 1/ ln(1/m3/2) expansion.
4.2 Other soft terms
In [7] all other soft terms were calculated for LVS. The main result (see p. 15 of [7]) is
that roughly speaking, all the soft parameters are determined by F s and by the power
with which the chiral matter metrics scale with τs. As we saw in the previous section,
F s gets modified by loop corrections only at sub-sub-leading order in a 1/τs expansion
(see (46)). Therefore, the calculation of all the soft terms in [7] appears to be quite
robust against including loop effects.
One of the key assumptions in [7] is that all the Yukawa couplings Y are already
present in perturbation theory, i.e. they have the schematic form Y = Y pert(U) +
Y np(e−T ). This requirement featured prominently already in the derivation of the
volume dependence of the chiral matter metrics in [59] by scaling arguments. In [7] the
same schematic form is essential for simplifying the trilinear soft terms A. In general
these terms receive a contribution of the schematic form
F T∂T logY = F
T ∂T (Y
pert(U) + Y np(e−T ))
Y pert(U) + Y np(e−T )
∼ O(e
O(T 0) +O(e−T ) , (47)
which is exponentially suppressed if and only if Y pert(U) is non-vanishing. However, in
many examples the Yukawa couplings are actually only generated nonperturbatively,
see for instance the discussion in [60], and [61] for some examples. This poses a con-
straint on the way the Standard Model is realized in LVS, if one wants to ensure flavor
universality of the soft breaking terms as advertised in [7].
One more comment about the important issue of flavor universality. In [7], section
3.4., it was argued that in LVS, approximate flavor universality is a natural consequence
of the zeroth-order factorization of Kähler and complex structure moduli spaces. We
provide some more details on the factorized approximation in appendix F.
5 LVS for other classes of Calabi-Yau manifolds?
In section 3.3 and 4 we saw that the 1-loop corrections to the moduli Kähler potential
only have relatively small effects on the large volume scenario based on the P4
[1,1,1,6,9]
model of [3]. In this chapter, we would like to ask the question how generic the “Swiss
cheese” form is for a Calabi-Yau manifold and if there are other models in which the
one-loop corrections discussed above might become more important. This is indeed
to be expected if the Calabi-Yau under consideration has a fibered structure, as we
explain in the following. If gs corrections do dominate α
′ corrections, they could ruin
the volume expansion of LVS.
5.1 Abundance of “Swiss cheese” Calabi-Yau manifolds
In the LVS examples discussed in [4] the volume in terms of the Kähler moduli takes
the “Swiss cheese” form
− . . .−
, (48)
where the coefficients ai, ..., ci are only non-vanishing for the small Kähler moduli. The
LVS limit consists in scaling the overall volume of the Calabi-Yau more or less isotrop-
ically while having small holes inside the manifold. The τ ’s are linear combinations of
∂tiV, where now V is considered as a (cubic) function of the 2-cycle volumes ti. For
the effective field theory analysis to be valid one should not only demand that the
4-cycle volumes τi are large compared to the string scale, but also that the 2-cycle
volumes ti are large. In the cases discussed in [4], the linear combinations ∂tiV are
indeed such that one can have one of them exponentially large and the others small
(but still sufficiently larger than the string scale), without taking any of the ti to be
exponentially small. This is obvious for the P4
[1,1,1,6,9]
example where the 2-cycle volume
tb only appears in the definition of one of the τ ’s, cf. (14), but it is also true for the
second example of [4], cf. their formulas (84).
However, the F18 model of [16] does not seem to allow its volume to be written in
the form (48) with one Kähler modulus τb that can become large while keeping all the
others small (again, demanding that the ti stay larger than 1 in string units). Thus,
it is an interesting question how generic or non-generic the “Swiss cheese” Calabi-Yau
manifolds are. We do not attempt to give a general answer; instead, we turn to two
examples in which the form of the volume differs from (48).
5.2 Toroidal orientifolds
The reason loop corrections may be more important in toroidal orientifolds than in
compactifications on “Swiss cheese” Calabi-Yau manifolds is the following. As we al-
ready discussed in section 3.1, the conjectured form of 1-loop corrections (23) simplifies
in the case of toroidal orientifolds, because they have very special and simple intersec-
tion numbers. More concretely, using the definition τi = ∂tiV, together with the special
form of the intersection numbers in the toroidal case, i.e. V = t1t2t3, the volume can
alternatively be expressed as
V = √τ1τ2τ3 = tiτi (no summation; i = 1, 2 or 3) . (49)
Thus, formula (23) simplifies and the 1-loop corrections proportional to E (K)i are only
suppressed by single Kähler moduli instead of by the overall volume. Also the terms
proportional to E (W )i can be rewritten in the toroidal orientifold case and the Kähler
potential takes the form (for the T 6/(Z2 × Z2) example)
K = −ln(2S1)− 2 ln(V) +Kcs(U, Ū)−
V (50)
E (K)i (U, Ū)
4τiS1
i 6=j 6=k
E (W )k (U, Ū)
4τiτj
where the functions E are non-holomorphic Eisenstein series in this case [37]. It is easy
to see that the origin of this simplification is the fact that there is just a single non-
vanishing intersection number in the toroidal orientifold case and all Kähler moduli
appear linearly in the cubic expression for the volume.
The difference of the toroidal orientifold to the “Swiss cheese” case of LVS can also
be seen in the different forms of the functional dependence of the volume on the Kähler
moduli. In the toroidal orientifold case one has the relations
∂t1V = t2t3 , ∂t2V = t1t3 , ∂t3V = t1t2 , (51)
so that two of them will always become large if one takes one of the ti to be large
and demands that the other two stay larger than 1. This also holds for any linear
combinations of them. The difference is also obvious from the fact that the 2-cycle
volume tb that is responsible for τb becoming large in the LVS examples of [4] always
appears cubically in the volume. This is related to the fact that the term (τb +
aiτi)
should be the square of a linear combination of the ti, in order for (48) to be expressible
as a cubic polynomial in the ti. In contrast, any (untwisted) 2-cycle volume in the
toroidal orientifold case only appears linearly in the cubic volume polynomial.
To illustrate the effect of terms in the Kähler potential that are suppressed only by
single Kähler moduli instead of the overall volume, we take the Kähler potential (50)
and expand V3 in the region of the Kähler cone where τ1 = τ2 = τb ≫ τ3 = τs (as we
explained above, at least two of the Kähler moduli have to become large simultaneously,
if one wants to avoid any of the 2-cycle volumes becoming very small). This leads to
(for simplicity setting all Ui = U , all E (K)i = E (K) and all E
i = E (W )):
|W0|2eKcs
2S1V2
(E (K))2 + 1
(∂UŪKcs)
−1∂UE (K)∂ŪE (K)
8τ 2s S
3 ξ̃ S
E (W )
(E (K))2 + (∂UŪKcs)−1∂UE (K)∂ŪE (K)
4S21τs
. (52)
Obviously, the leading term in the large τb expansion now comes from the loop correc-
tion and not from the α′ term (which term really dominates depends on the values of
S1 and U as well, of course). Thus, an expansion of the potential as in LVS, cf. (16),
would not be realized in this case, even if one found a way to lift enough zero modes
by fluxes for τs to appear in a nonperturbative superpotential.
This toy example was meant to show that for a consistent large volume expansion
in models with large hierarchies in the Kähler moduli, it is important to make sure that
there are no correction terms in the Kähler potential (from loop or α′-corrections) that
are suppressed only by some of the small Kähler moduli. We should stress again that
also terms suppressed by the large volume can be dangerous if the suppression is less
than for the α′ term, i.e. if it is τ−λb with λ < 3/2. The only exception to this rule is
the case λ = 1 as we showed above (and as is shown more generally in appendices C.4
and C.5). In this respect it would be important to know if the conjecture (25) really
bears out. If it turns out that the actual form of the 1-loop corrections also contains
terms like
tλ1b t
, (53)
with λ1 + λ2 = 1 but 0 < λ1 6= 1 or 0, such a 1-loop correction would spoil the large
volume expansion performed in (16).14
14In principle, one would also need an argument that no such terms arise at higher loop order, which
5.3 Fibered Calabi-Yau manifolds
The feature of orientifolds that all Kähler moduli appear linearly in the cubic expression
for the volume shows that a similar simplification can occur in the case of (K3) fibered
Calabi-Yau manifolds, which also have the property that one Kähler modulus (the one
corresponding to the volume of the base) only appears linearly in the cubic expression
for the volume. This takes the form
V = tbηijtitj + dijktitjtk , (54)
where ηij are the intersection numbers of the (K3) fiber, and neither they nor the triple
intersection numbers dijk contain the index b, which is chosen to denote “base”, but it
is also suggestively the same index as the one we used for the large Kähler modulus in
the P4
[1,1,1,6,9]
model. Two-parameter examples of this type appear in e.g. [47,62]. In a
region of the Kähler moduli space where the base tb is rather large but all the other
ti stay relatively small, the volume is approximately V = tbηijtitj . Thus, if the Kähler
potential has a 1-loop correction ∼ E (K)b tb/V, it could be approximated in this region
E (K)b tb
E (K)b
, (55)
where τf = ηijtitj is the volume of the (K3) fiber (which is small compared to t
Obviously, this would lead to a correction to the Kähler potential that is only sup-
pressed by a single (small) 4-cycle volume, similar to the toroidal orientifold example
we discussed in the last section.
We should note that this limit (large base and small fiber for (K3) fibered Calabi-
Yau manifolds), is quite different from the one performed in the usual LVS of [3], even
though both cases involve hierarchical limits of the Kähler moduli. As explained in
section 5.1, the LVS limit consists in scaling the overall volume of the Calabi-Yau more
or less isotropically while keeping holes in the bulk of the manifold small. In contrast,
the limit of large base and small fiber is anisotropic. At the moment we have nothing to
add about whether such anisotropic configurations with all moduli stabilized actually
exist. We merely wanted to point out that if they do exist, that would be an example
of smooth Calabi-Yau compactifications where the gs corrections we consider dominate
over the α′ corrections considered in the large volume limit, as in the toroidal orientifold
case.
would, however, have to be further suppressed in the dilaton S1.
6 Further corrections
In [4], further α′ corrections to the string effective action beyond the one in (9) were
considered. In the case of bulk α′ corrections (i.e. those already present in type IIB
bulk theory without D-branes, arising from sphere level) scaling arguments were given
as to why they are suppressed in the large volume limit. Although that discussion was
surprisingly powerful in its simplicity, we do not consider it completely conclusive, if
large hierarchies between the Kähler moduli exist. After all, dimensional analysis alone
does not guarantee that the other α′-corrections are always suppressed by additional
powers in the overall volume, instead of powers of some of the small Kähler moduli.
Moreover, in addition to the bulk α′ corrections that appear at order O(α′3), in the
models of interest for LVS further α′-corrections arise on the worldvolume of D-branes
and O-planes, cf. [63–70]. These corrections begin already at order O(α′) and scaling
arguments of the kind used for the bulk corrections do not seem to be sufficient to
neglect them.
Indeed, there are correction terms involving two powers of the Riemann tensor which
do modify the effective D3-brane charge and tension, if the D7-branes are wrapped
over 4-cycles with non-vanishing Euler number. These terms were already taken into
account in [1]. However, there are further contributions to the DBI action at the same
order in α′, like F 23R or F
3 , where F3 stands for the RR 3-form field strength, R for
the Riemann tensor and we left index contractions unspecified. If the D7-branes do
not break supersymmetry and remain BPS, it seems unlikely that these terms could
contribute to the potential for the closed string moduli, i.e. induce some effective D3-
brane tension. The reason is that there does not seem to be a corresponding term in
the Chern-Simons action that could lead to the necessary modification of the effective
D3-brane charge at the same time. This could be checked in more detail.
In general, we think that the question of additional corrections to the moduli
(Kähler) potential deserves further attention. Here we only outlined some steps in
that direction.
7 Conclusions
In this paper, we have investigated whether string loop corrections may impact a) stabi-
lization in the large volume compactification scenario (LVS), and b) the phenomenology
of those scenarios, as manifested in the soft supersymmetry breaking terms. The result
is that for the specific class of compactification manifolds considered in LVS, so-called
“Swiss cheese” Calabi-Yau manifolds, changes are minuscule. Only if the loop cor-
rections become abnormally large (in the toroidal orientifold case, this can happen if
the complex structure is stabilized very large) do they affect LVS. For other classes
of manifolds, the corrections may be important. We hasten to add that the detailed
expressions for the loop corrections in LVS remain unknown; we have merely tried to
infer their scaling with the Kähler moduli from experience in the toroidal orientifold
limit. We think it is important to attempt to address this issue, as the string coupling
is stabilized at a nonzero value, so the corrections cannot be turned off.
We also stress the (to some readers obvious) fact that there remain a host of is-
sues that must ultimately be dealt with if one wishes to claim that these are “string
compactifications”.
• We cannot be sure that fluxes do not alter the corrections, since backgrounds
with RR and NSNS fluxes are not well understood in string perturbation theory.
• Additional corrections may appear (see section 6) that could be equally threat-
ening to LVS as the loop corrections, or worse.
• In [37] only the corrections to the Kähler potential coming from N = 2 sectors
were determined and we based our generalization on those results. However,
there might be interesting corrections coming from the N = 1 sectors as well.
• It has not yet been shown that a local Standard Model-like construction can
be embedded in the simplest examples like the P4
[1,1,1,6,9]
model. If more general
models turn out to be needed, one needs to reconsider whether the requisite
nonperturbative superpotentials are generated.
• We have largely ignored open string moduli, under the proviso that they are
stabilized heavy, as are the dilaton and complex structure moduli.
• The coefficient A(S, U) in the nonperturbative superpotential is generally as-
sumed to be of order 1. It is not known how generic this is.
• All string computations we have discussed were performed in a supersymmetric
context. In LVS supersymmetry is broken already before uplift, in the AdS
minimum. Supersymmetry breaking directly in string theory is not very well
understood [39,71].
Faced with all these caveats, a pessimist might be inclined to give up. We think we
have shown that it is worth considering these issues in detail. Sometimes, an effect one
would have thought to be devastating turns out to be as gentle as a summer breeze.
Acknowledgments
It is a pleasure to thank Carlo Angelantonj, Vijay Balasubramanian, Massimo Bianchi,
Joe Conlon, Gottfried Curio, Robbert Dijkgraaf, Michael Douglas, Bogdan Florea,
Elias Kiritsis, Max Kreuzer, Fernando Marchesano, Peter Mayr, Thomas Mohaupt,
Hans-Peter Nilles, Gabi Pfuff, Fernando Quevedo, Waldemar Schulgin, Mike Schulz,
Stephan Stieberger, Angel Uranga, and Alexander Westphal for helpful discussions
and comments and Boris Körs for initial collaboration. This work is supported in part
by the European Community’s Human Potential Program under contract MRTN-CT-
2004-005104 ’Constituents, fundamental forces and symmetries of the universe’. The
work of M. B. is supported by European Community’s Human Potential Program un-
der contract MRTN-CT-2004-512194, ‘The European Superstring Theory Network’.
He would like to thank the Galileo Galilei Institute in Florence for hospitality. M. H.
would like to thank the university of Nis for hospitality. The work of M. H. and E. P.
is supported by the German Research Foundation (DFG) within the Emmy Noether-
Program (grant number: HA 3448/3-1). Both M. B. and M. H. would like to thank
the KITP in Santa Barbara for hospitality during the program “String Phenomenol-
ogy” and the university of Hamburg for hospitality during the workshop “Generalized
Geometry and Flux Compactifications”.
A Some details on LVS
In this appendix we collect some details on the minimization of the potential in LVS,
mainly reviewing the results of [3,8], but filling in some details. The minimization with
respect to the axions (i.e. the imaginary parts of the Kähler moduli) is performed for
an arbitrary number of Kähler moduli, while for the minimization of the real parts, we
restrict to the example of the hypersurface in P4[1,1,1,6,9] discussed throughout the main
text.
A.1 LVS for P4
[1,1,1,6,9]
Here we give some more numerical details on large-volume stabilization in the P4
[1,1,1,6,9]
orientifold. The relevant features of this Calabi-Yau have been described in chapter
2.3. The leading terms of the scalar potential are
V e−Kcs =
τe−2aτ
V2 τe
−aτ +
V3 , (56)
where we use τ = τs and V as the independent variables and for the expansion we
have in mind the limit (15). The minimum of this potential under the assumption that
aτ ≫ 1 is given by
V = µ
eaτ . (57)
In the P4[1,1,1,6,9] orientifold the coefficients λ, µ and ν can be calculated explicitly,
yielding
2a2|A|2
, µ =
2a|AW0|
and ν = ξ
S1|W0|2 . (58)
We notice that the value of τ at the minimum is determined only by the Euler number
τ ∝ χ2/3 and the value of the dilaton S1 at its minimum. An example of a set of
possible parameters (using a = 2π/10, A = 1, S1 = 10 and W0 = 10) is
ξ = − ζ(3)χ
2(2π)3
≃ 1.31 −→ ν ≃ 155 ,
λ ≃ 0.67 , µ = 4π
. (59)
There is an unknown overall factor eKcs that does not change the shape of the potential
and so leaves the position of the minima unchanged. For the parameters given in
equation (59), the minimum is at τ ≃ 41.1 and V ≃ 9.96 · 1011. These values come
from equation (57) which is just approximated using the assumption aτ ≫ 1. This
solution has the shortcoming that, if one is interested in the value of the potential at
the minimum, after substitution of (57) into (56), one finds V = 0. If instead one
solves the exact equation for the minimum of the potential numerically, the result is
V ≃ −6.6 · 10−37 at the point τ ≃ 41.7 and V ≃ 1.38 1012. From this one checks that,
apart from the shortcoming that V = 0, the approximate solution gives the position of
the minimum with a good precision.
A.2 Many Kähler moduli
The simple picture of P4
[1,1,1,6,9]
, gets slightly more involved in models with more than
two Kähler moduli, but some general statements can still be made. For a single small
Kähler modulus, among the leading contributions to the potential only the one from
Vnp2 is axion dependent, while the leading terms in V3 and Vnp1 are axion independent.
For several small Kähler moduli, all three terms are axion dependent. However, the
argument that the leading term in Vnp2 only receives a sign change due to axion stabi-
lization generalizes (and holds also for the regular KKLT scenario with relatively small
volume, see e.g. [8], section 3.2).
Indeed, with the superpotential (11) one obtains
Vnp1 = e
K G̄i
aiaj |AiAj |e−aiτi−ajτj cos(−aibi + ajbj + βi − βj)
Vnp2 = −2eK aiGk̄iKk̄
|AiW0|e−aiτi cos(−aibi + βi − βW0)
+|AiAj |e−aiτi−ajτj cos(−aibi + ajbj + βi − βj)
, (60)
V3 = e
K (Gk̄lKk̄Kl − 3)
|W0|2 + 2 |W0Ai|e−aiτi cos(−aibi + βi − βW0)
+|AiAj |e−aiτi−ajτj cos(−aibi + ajbj + βi − βj)
where Ai = |Ai|eiβi, W0 = |W0|eiβW0 and a sum over repeated indices is understood
throughout. As the only dependence on the axions is in form of cosines, one can easily
see that this potential has a minimum for
aibi = −βW0 + βi + niπ , ni ∈ 2Z+ 1 . (61)
We notice that the minimum of the bi depends on the (already fixed) complex structure
moduli, but it is independent of the Kähler moduli.
In the regime (15) the scalar potential again contains three terms at leading order,
Vnp1 ∼ 2eKcs
aiaj |AiAj |e−aiτi−ajτjM ljMki (−VVlk + VlVk)
+ . . . ,
Vnp2 ∼ −2eKcs
ai|AiW0|e−aiτiτi
+ . . . , (62)
V3 ∼ eKcs
8V3 |W0|
2 + . . . ,
where the sum over i and j effectively only picks up terms from the small moduli
because of the exponential suppression of Vnp1 and Vnp2. Moreover, for Vnp1 we used
the form (89) for the inverse of the moduli metric with respect to the basis (83). The
Kähler moduli appearing in the nonperturbative superpotential are linear combinations
of these, which we account for by a basis-changing matrix Mki , i.e.
Ti =M
i T̃k , (63)
where T̃k are the fields defined in (83) and Ti are the Kähler moduli appearing in the
nonperturbative superpotential. (Another way of saying this is that the real parts of Ti
measure the volumes of a basis of divisors that have the right properties to contribute
to the nonperturbative superpotential.) In the second term we used
Gk̄iKk̄ = −2τi + . . . = −2ReTi + . . . . (64)
In the basis (83), this would follow straightforwardly from (87), (89) and the relations
(80), but it holds equally well after a change of basis, because both sides of (64)
transform linearly under a change of basis (63).
Note finally that the ellipsis in (62) and (64) stand for subleading corrections in the
large volume limit (assuming also (15)).
B Loop corrected inverse Kähler metric for P4
[1,1,1,6,9]
We now have a closer look at the inverse metric from the Kähler potential in equation
(25). We invert the 4× 4 matrix and focus on the four terms that appear in the scalar
potential for the Kähler moduli,
Gb̄b =
τ 2b +O (τb) ,
Gb̄s = Gs̄b = 4τbτs
6E (K)s
, (65)
Gs̄s =
2S1τs
where we have performed an expansion in τb ≃ V2/3 and the quantity ∆ was introduced
in (29). We notice that only Gb̄b is not corrected at leading order. The apparent
divergence from zeros of the denominator ∆ is an artifact of the expansion. In fact,
the determinant of the (entire) Kähler metric behaves as
detG ∼ Aτ−7/2b +Bτ
b + . . . (66)
for some expressions15 A and B, which depend on the moduli τs, U and S1. In par-
ticular, one finds A ∼ ∆, but B does not vanish at a zero of ∆. Thus, in general
the expansion in large τb picks up the factor A, which is responsible for the apparent
divergence in (65). However, this is fictitious because when ∆ = 0, the next term
proportional to B is non-vanishing and the determinant stays away from zero. Indeed,
we do not expect to find any zero of the determinant in the range of validity of the
parameters.
If E (K)s ≪ (S1τs), one can further expand (65) with respect to E (K)s /(S1τs), yielding
Gb̄b =
τ 2b +O (τb) ,
Gb̄s = Gs̄b = 4τbτs
2E (K)s
((E (K)s )2
Gs̄s =
3E (K)s√
2S1τs
((E (K)s )2
Depending on the values of the moduli (τs and S1), this expansion may or may not be
useful. In general, only the expansion in τb makes sense and one has to deal with the
full expressions (65). That is what we did in section 3.3.
C No-scale Kähler potential in type II string the-
In this appendix we review why compactification of type IIA and type IIB theory
on general Calabi-Yau manifolds, or orientifolds thereof, lead to no-scale (F-term)
potentials if
i) the superpotential does not depend on the Kähler moduli (68)
and if
ii) one uses the tree-level form of the Kähler potential. (69)
(Of course, in LVS neither i) nor ii) holds, but one can think of jointly imposing i)
and ii) as a zeroth-order approximation, that we will successively move away from in
later subsections of this appendix.)
15This A has nothing to do with the A in Wnp.
If the moduli spaces of Kähler and complex structure moduli factorize (see appendix
F for more details on this), and under assumption i), the F-term potential takes the
V = eK
GĪJDĪW̄DJW − 3|W |2
GābDāW̄DbW + (G
ı̄jKı̄Kj − 3)|W |2
. (71)
The indices a and b run over the complex structure moduli and the dilaton, i, j over
the Kähler moduli and I and J refer to all moduli.
The condition for a no-scale potential (V = 0 for the Kähler moduli) is then
Gı̄jKı̄Kj = 3 , (72)
and we will verify in turn that this is fulfilled in both type IIA and type IIB Calabi-
Yau compactifications, if one uses the tree-level Kähler potential, as in assumption ii).
In that case, the moduli spaces of Kähler and complex structure moduli do factorize
exactly.
C.1 No-scale structure in type IIA
The tree level Kähler potential for the Kähler moduli is
K = − ln
dijk(σ + σ̄)i(σ + σ̄)j(σ + σ̄)k
= − ln
dijktitjtk
= − ln(V) , (73)
where dijk are the intersection numbers of the Calabi-Yau,
dijk =
ωi ∧ ωj ∧ ωk , (74)
σi = ti + ici (75)
are the complexified Kähler moduli whose real parts ti represent the volumes of 2-cycles
and whose imaginary parts originate from the expansion of the NSNS 2-form. Using
the Kähler form
J = tiωi (76)
of the Calabi-Yau, it is useful to introduce the notation
V = 1
J ∧ J ∧ J = 1
dijktitjtk ,
ωi ∧ J ∧ J =
dijktjtk ,
Vij =
ωi ∧ ωj ∧ J = dijktk . (77)
Note that here the index i does not denote a derivative with respect to the Kähler
variables (in contrast to subscripts on the Kähler potential K). Instead, one has the
relations Vi = 2∂σiV and Vij = 4∂σi∂σ̄̄V. It is straightforward to calculate
Ki = −
2V = Kı̄ Gi̄ = Ki̄ = −
. (78)
Then one can show that the inverse Kähler metric is
G̄i = −4VjiV + 2tjti . (79)
To verify this, one has to use
V ijVj =
ti , Vijtj = 2Vi , Viti = 3V . (80)
Putting everything together, one arrives at
Gı̄jKı̄Kj =
−4V ijV + 2titj
V = 3 , (81)
i.e. (72) is fulfilled under assumption (69).
C.2 No-scale structure in type IIB
In the type IIB case, the tree-level Kähler potential for the Kähler moduli is
K = −2 ln
dijktitjtk
= −2 ln(V) . (82)
The difference to the IIA case is that, even if K in (82) is expressed in terms of the
2-cycle volumes ti, the real parts of the good Kähler moduli, T̃i, are now the 4-cycle
volumes τ̃i (the imaginary parts, on the other hand, arise from the RR 4-form). The
relation between them depends on the particular Calabi-Yau:
Re T̃i = τ̃i =
dijktjtk = Vi , (83)
which cannot be inverted in general.16 In order to calculate Ki = ∂T̃iK we note that
∂ti = (∂ti T̃j)∂T̃j + (∂ti
T̄)∂ ¯̃T̄
= Vij
∂T̃j + ∂ ¯̃T̄
. (84)
If acting on a function F that only depends on T̃ +
T , as is the case for K, (84)
simplifies to
∂tiF (T̃ +
T ) = 2Vij∂T̃jF (T̃ +
T ) , (85)
where on the left hand side T̃ is understood as a function of t. Alternatively, one has
∂T̃iF (T̃ +
T ) =
V ij∂tjF (T̃ +
T ) = ∂ ¯̃
F (T̃ +
T ) . (86)
Using this, one can calculate
Ki = −
V ∂T̃iV = −
V ijVj
V = −
2V = Kı̄ , (87)
where in the last step we used (80). In the same way one can calculate
Gi̄ =
. (88)
Using this formula one can check that the inverse Kähler metric is given by
Gı̄j = 4 (−VVij + ViVj) . (89)
Putting everything together, no-scale structure holds also for type IIB:
Gı̄jKı̄Kj = (−VVij + ViVj)
V = 3 , (90)
again under the assumption (69).
16Note that the Kähler moduli appearing in the non-perturbative superpotentials in the examples
of [16] are related to the ones in (83) by a linear field redefinition. However, this does not play any role
in verifying the no-scale structure at leading order, as (90) below is invariant under field redefinitions.
We chose to make the distinction clear by using tildes for the Kähler moduli defined by (83).
C.3 Cancellation with just the volume modulus
Now we relax assumption (69). For simplicity, let us first consider the Kähler potential
K = −3 ln(T + T̄ ) + Ξ
(T + T̄ )λ
, (91)
which corresponds to the case of a single Kähler modulus and the complex structure
moduli and the dilaton are neglected. A generic quantum correction was added to the
tree level term, which could be an α′ or a loop correction, depending on the value of
λ. Focusing on V3, i.e.
eK |W |2 = G
̄iK̄Ki − 3 , (92)
one calculates
eK |W |2 =
(3(2τ)λ + ξλ)2
3(2τ)2λ + Ξ(2τ)λλ(λ+ 1)
= 3− 3 + (λ− λ
(2τ)λ
3(2τ)2λ
. (93)
This simplified calculation gives an intuition of why the E (K)b -term does not appear
in V3 of (28) whereas the α
′- and E (K)s -terms contribute. When the exponent of the
quantum correction is exactly 1, there is a cancellation at leading order in the scalar
potential (compare also the discussion in footnote 10). Note that since we focused on
V3 in this subsection, it did not matter whether assumption (68) holds or not.
C.4 Cancellation with many Kähler moduli
We would now like to see how the previous result is changed when we have an arbitrary
number of moduli. We do not make any assumption on the dependence of the volume
on the Kähler moduli (“Swiss cheese” or fibered manifolds are special cases). Due to
its relevance for LVS, we consider a single correction to the Kähler potential which only
depends on the large Kähler modulus Tb (an example would be the α
′-correction or the
loop term proportional to E (K)b , considering the moduli other than the Kähler moduli
as fixed; this is allowed at leading order in a τb-expansion, as we argue in appendix F).
Thus, we take the Kähler potential to be of the form
K = K(0) + δK = −2ln(V) + δK(Tb, T̄b) ≡ −2ln(V) +
(Tb + T̄b)λ
. (94)
Again focusing on V3, we obtain
eK |W |2 = G
̄iK̄Ki − 3 = (G̄i0 + δG̄i)
̄ + δK̄
i + δKi
− 3 ,
where δKi ≡ ∂TiδK and G
0 is the inverse metric of appendix C.2; finally δG
̄i is
the modification of the inverse metric coming from considering the modified Kähler
potential (94). Explicitly one has
G̄i = (G0i̄ + δKi̄)
−1 ≃ G̄i0 −G
0 δKhk̄G
0 + . . . ,
δKi = −
(2τb)λ+1
δib , δKi̄ =
(λ2 + λ) Ξb
(2τb)λ+2
δibδ̄b . (96)
We now put everything together and use the results of appendix C.2 and formula (64)
(which, for the unperturbed metric and Kähler potential, is an exact equality) to arrive
eK |W |2 =
i − 3
̄ δKi
0 δKkh̄G
i + . . .
= 0 +
(2τb)λ+1
4(λ2 + λ)Ξb
(2τb)λ+2
τbτb + . . .
(λ− λ2)Ξb
(2τb)λ
+ . . . . (97)
We notice that the term 1/τλb vanishes exactly for λ = 1, independently of the explicit
form of the volume in terms of the Kähler moduli. In particular, the loop correction
proportional to E (K)b experiences a cancellation at leading order in V3 (and it is not
difficult to see that the subleading order is suppressed by τ−2λb ). Therefore, the loop
correction is subleading in the potential compared to the α′ correction, even though it
is leading in the Kähler potential. Next, we would like to extend this analysis to the
other parts of the potential, i.e. Vnp1 and Vnp2.
C.5 Perturbative corrections to Vnp1 and Vnp2
We now introduce the nonpertubative superpotential into the game, i.e. relax assump-
tion (68), and look at the other terms of the scalar potential, Vnp1 and Vnp2 (see eq.
(12)). For this, we restrict to the P4[1,1,1,6,9] model again. The contribution Vnp1 is pro-
portional to Gs̄sW̄,s̄W,s. From (65) we see that no E (K)b appears at leading order. This
can be understood as follows. Consider the Kähler potential (94) where now V is the
volume of P4
[1,1,1,6,9]
, given in (14). Then the scaling with the large Kähler modulus τb
is schematically given by
Gs̄s ≃ Gs̄s0 −Gs̄b0 δKbb̄Gb̄s0 + . . .
∼ τ 3/2b + τ 2b
τλ+2b
∼ τ 3/2b +
+ . . . , (98)
which shows that any loop correction to the Kähler potential of the form Ξ/τλb leads to
a subleading contribution to Vnp1 in the large volume expansion. As usual, the ellipsis
stands for terms that are even more subleading in the τb expansion.
To understand the E (K)s correction to Gs̄s we need to consider
K = −2 ln(V) + δ̃K(T, T̄ ) ≡ −2 ln(V) + Ξb g(Ts, T̄s)
(Tb + T̄b)λ
for some function g(Ts, T̄s) of the small Kähler modulus and we assume λ ≥ 3/2 in the
following. Then, again very schematically, the scaling behavior is given by17
Gs̄s ≃ Gs̄s0 −Gs̄i0 δ̃Ki̄G
0 + . . .
∼ τ 3/2b + Ξb (τb, τ
τ−λ−2b τ
τ−λ−1b τ
+ . . . (100)
∼ τ 3/2b + τ−λb + τ
−λ+3/2
b + τ
b + . . . .
One sees that λ = 3/2 indeed contributes at the same order as Gs̄s0 . This is confirmed
by the dependence of Gs̄s in (65) on E (K)s through ∆, cf. (29).
We now consider Vnp2. This is proportional to G
̄sK̄. Again we start by considering
a correction to the Kähler potential whose only dependence on the Kähler moduli is
via τb, as in (94). Schematically, we find
G̄sK̄ ≃ (G̄s0 −G
0 δKbb̄G
̄ + δK̄
+ . . .
∼ τs +
Gs̄b0 Ξb
τλ+1b
+ . . . ∼ τ 0b +
+ . . . . (101)
This result is confirmed by the absence of E (K)b in the leading term of Vnp2. A calculation
very similar to the one in (100) shows, however, that Vnp2 is modified by a correction
to the Kähler potential of the form (99) for λ = 3/2. It is straightforward to generalize
this analysis to a more general form of the “Swiss cheese” volume, with more than one
small Kähler modulus.
17For λ = 3/2 we can still use the expansion of the inverse metric (96), because the correction term
would also be further suppressed e.g. in the dilaton.
D KK spectrum with fluxes
In this section we would like to develop some intuition on how the analysis of sections
3.3 and 4 might change in the presence of fluxes. We will restrict the discussion to
one possible effect of the fluxes, namely their influence on the KK spectrum. It is not
known explicitly how closed string fluxes, which are present in LVS, would change the
mass spectrum. We will consider a toy example, using an analogy to the correction
arising from world volume fluxes (cf. [52]), in order to get a feeling for what kind of
effects one might expect. In particular, for the purposes of this appendix we assume a
modified KK mass spectrum of the form
m2KK ∼
1 + F
1 + F
) , (102)
where F represents any of the fluxes that may be present, and in the second equality
the factors of S1 appeared when expressing the 2-cycle volumes in Einstein frame as
compared to the string frame (t ∼ e−Φ/2tstr). Note that expanding (102) for large
values of t would lead to a correction ∆m2 ∼ F 2/t3, whose scaling with the flux and
with t is reminiscent of the moduli masses induced by closed string 3-form flux [72,73].
In that case, the suppression would be by the overall volume (which would lead to only
mild effects in LVS), but in (102) we allow for a suppression by single 2-cycle volumes
(which might be the small 2-cycle in the P4[1,1,1,6,9] model).
Substituting (102) in (23), we now consider the scalar potential resulting from
K = −ln(2S1)− 2 ln(V) +Kcs(U, Ū)−
τbE (K)b
τsE (K)s
F 2S1
W = W0 + Ae
−aTs . (103)
We have not included any flux correction to the term proportional to E (K)b because we
expect such corrections to be subleading in a large volume expansion.18 Note that the
F -dependent correction term we did include is of the same form as the winding string
18Even though we think it is unlikely, we cannot exclude that the correction to KK masses that
scale like t−1b without fluxes is only suppressed by F
2/τs instead of F
2/τb. In that case, one would
have to redo the analysis of appendix C.5, using (99) with λ = 1. This would prohibit the use of the
expansion (96), because in the large volume limit the leading contribution to Gss̄ would arise from the
loop correction (it would scale as τ−1b as opposed to the scaling of the tree level contribution ∼ τ
In that case the leading terms in Vnp1 and Vnp2 would be suppressed compared to V3 and only arise
at order V−10/3, thus invalidating the volume expansion of LVS.
correction ∼ E (W )s , when one neglects any potential flux corrections to the winding
string spectrum, cf. (24) (remember that gsW would just be proportional to 1/
without fluxes). Thus, by considering (103) we implicitly also analyze in the following
the effect of corrections from winding strings (recall from section 3.3 that this correction
is not present in the P4[1,1,1,6,9] model, but may be present in general).
We now give the generalization of (26)-(28) when using the modified Kähler po-
tential (103). The three contributions at leading order (O(V−3)) in the large volume
expansion are
Vnp1 = e
24a2|A|2τ 3/2s e−2aτs
V∆ , (104)
Vnp2 = −eKcs
2a|AW0|τse−aτs
6E (K)s
1− 2F
, (105)
3eKcs |W0|2
1 ξ̃ + (106)
4(E (K)s )2 − 8(E (K)s )2F 2S1τ−1s (1 + F 2S1τ−1s )− 8
F 2S21E
where the axion has already been minimized for, as discussed in section A.2, and now
∆ is generalized to
2S1τs − 3E (K)s
1− 3F
. (107)
Plots for F = 1 and F = 3 are given in fig. 5, and they look quite similar to the
plot without flux, fig. 4. Qualitatively, the conclusion is the same; only for nongeneric
values of the gs corrections do they compete with the α
′ correction. Note, however,
that the amount of fine-tuning seems to depend on the value of the flux, cf. fig. 5.
The same is true for the dependence of the values of V and τs at the minimum on S1
and E (K)s . For F = 3, for instance, this dependence becomes more complicated than
what we found in (31). For the parameter range shown in fig. 5, the values of τs and
V in the minimum vary in the ranges τs ∈ [14.6, 46.3] and log10 V ∈ [3.7, 15.5], where
the smallest value for both of them is reached in the corner where the two corrections
become comparable.
Also the cancellation that we found for the gaugino masses survives the inclusion
of the flux factor in (103). The correction still only appears at sub-sub-leading order
0.0 9
PSfrag replacements
0.0 9
PSfrag replacements
Figure 5: Similarly to figure 4, the top surface is the α′ correction, the second is
the gs correction (with F = 1 in the left graph and F = 3 in the right), and the “red
carpet” is 10/∆, with ∆ from (107), using the same values as in fig. 4. The result
is qualitatively the same as before. Note, however, that the range for E(K)s differs.
For larger values of F one does not need to fine-tune E(K)s as much in order for the
two corrections to become of similar order.
in an expansion in ln(1/m3/2) and we find (again using the dilute flux approximation
for the prefactor (Ref)−1):
|Ms| =
= 3eK/2|W0|
16a2τ 2s
S1 − 12
2a(1− 2F 2S1τ−1s )E
64S1a3τ 3s
+ . . .
ln(1/m3/2)
ln(1/m3/2)
. (108)
This concludes our brief study of the direct effects of fluxes on the loop corrections.
E The orientifold calculation
In the main text, we are interested in how ∆Kgs , the one-loop correction to the Kähler
potential, scales with the Kähler moduli Ti. Our argument in section 3.1 is based on the
known result for ∆Kgs in the case of N = 2 supersymmetric K3× T 2 orientifolds and
N = 1 supersymmetric T 6/ZN (or T 6/(Z2 ×Z2)) orientifolds, from [37] (see also [74]).
Here we review this computation for the case of K3 × T 2, and take this opportunity
to adapt it to our case of D3-branes and D7-branes from the beginning. (One can also
obtain them by T-duality on the final D9/D5 results of [37], e.g. as in the appendix
of [75], but as we shall see, the direct computation is enlightening in its own right.)
We will leave out details that are essentially identical to [37], and only emphasize the
differences.
As shown in [37] using “Kähler adapted” vertex operators, the easiest way to com-
pute ∆Kgs is by considering the 2-point function of the complex structure modulus U
of T 2, with vanishing Wilson line moduli, i.e.
〈VUVŪ〉 = −
4g2cα
(U − Ū)2
〈V (0,0)ZZ V
(0,0)
〉σ . (109)
Here, we use the notation of [37],
(0,0)
U = −gcα′−2
U − Ū
(0,0)
(0,0)
= gcα
′−2 2
U − Ū
(0,0)
(110)
(0,0)
ZZ = −
i∂Z +
α′(p · ψ)Ψ
i∂̄Z +
α′(p · ψ̃)Ψ̃
eipX ,
(0,0)
= − 2
i∂Z̄ +
α′(p · ψ)Ψ̄
i∂̄Z̄ +
α′(p · ψ̃) ¯̃Ψ
eipX . (111)
As in [37], and [76] before that, we find these complex worldsheet variables particularly
convenient:
(X4 + ŪX5) , Z̄ =
(X4 + UX5) ,
(ψ4 + Ūψ5) , Ψ̄ =
(ψ4 + Uψ5) , (112)
where
Gstr is the volume of T
2 measured in string frame. The 2-point function (109)
can be expanded for small momenta, p1 · p2 ≪ 1, and we obtain
〈V (0,0)ZZ V
(0,0)
〉σ = −V4
(p1 · p2)
16(4π2α′)2
d2ν1d
k=0,1
~n=(n,m)T
e−π~n
~nt−1
](0, τ)
η3(τ)
γσ,kZ intσ,k[
〈∂̄Z(ν̄1)∂̄Z̄(ν̄2)〉σ〈Ψ(ν1)Ψ̄(ν2)〉α,βσ 〈ψ(ν1)ψ(ν2)〉α,βσ (113)
+〈∂̄Z(ν̄1)∂Z̄(ν2)〉σ〈Ψ(ν1) ¯̃Ψ(ν̄2)〉α,βσ 〈ψ(ν1)ψ̃(ν̄2)〉α,βσ + c.c.
+O((p1 · p2)2) .
For the details we refer to [37]. The main difference to the corresponding formula
(C.3) in [37] is the appearance of the inverse metric G−1str in the exponent arising from
the zero mode sum, and in the prefactor. This is due to the fact that the D3 and
D7 branes are localized along the T 2, and so the closed string channel involves a
Kaluza-Klein momentum sum instead of a winding sum. The sum over bosonic zero
modes has been made explicit, since there is also an implicit dependence on m,n in
the bosonic correlators: this arises from the classical piece in the split into zero modes
and fluctuations. That is, Z(ν) = Zclass(ν) + Zqu(ν), where the classical part is given
Zclass =
n+mŪ
Re(ν) c̃σ , c̃σ =
1 for K
2 for A ,M . (114)
These zero modes have the right periodicity under Re(ν) → Re(ν) + π (for A,M) or
Re(ν) → Re(ν) + 2π (for K), i.e. X4 → X4 + 2πn
α′ and X5 → X5 + 2πm
α′. In
contrast to [37] they involve the real part of ν. The reason is again that in the D3/D7
case the branes are localized along T 2 and thus the winding appears in the open string
channel as opposed to the closed string channel (as was the case for D9/D5 branes).
The sum over spin structures is performed using Riemann identities. This leaves
the correlators of the bosonic fields as the only piece that depends on the positions
νi of the vertex operators. The νi integral can then be evaluated. As the zero modes
(114) involve the real part of ν in the case of D3/D7-branes, in contrast to the D9/D5-
case studied in [37], the zero mode contribution in the Z-correlators drops out. The
quantum part is evaluated using the method of images on the worldsheet [38,77,78].
To evaluate the KK sum in (113), it is useful to regularize the integral over t by a UV
cutoff Λ. With this we obtain
1/(eσΛ2)
~n=(n,m)T
π3c2σtα
′e−π~n
~nt−1
π3α′c2σe
4 + πα′c2σ
E2(0, U) + . . . , (115)
where the prime at the sum indicates that the (n,m) = (0, 0) term is left out, and cσ,
eσ are constants whose precise values will not be important in the following (but can
be found in [37]). Terms that go to zero in the limit Λ → ∞ have been dropped, as
indicated by the ellipsis. The nonholomorphic Eisenstein series E2(0, U) is the s = 2
special case of
Es(0, U) =
~n=(n,m)T
|n+mU |2s . (116)
The terms involving the UV cutoff Λ drop out after summing over all diagrams, due
to tadpole cancellation. We have then reduced (113) to
〈V (0,0)ZZ V
(0,0)
〉σ = −(p1 · p2)α′
(4π2α′)2
E2(0, U)γσ,kQσ,k
+O((p1 · p2)2) . (117)
The quantities Qσ,k come from the sum over spin structures and are defined in [37].
Introducing the notation
E2(0, U) =
k=0,1
E2(0, U)γσ,kQσ,k
, (118)
we end up with (neglecting some irrelevant factors of gc, α
′, terms subleading in the
low-energy expansion, and constants of order 1)
〈VUVŪ〉 ∼ −i(p1 · p2)
(4π2α′)2
(U − Ū)2
E2(0, U) . (119)
To read off the one-loop correction to the kinetic term of U we need to perform a Weyl
rescaling to the Einstein frame. In the one-loop term (119) this just leads to
Weyl rescaling: × e
Vstr , (120)
where
Vstr = VstrK3
Gstr (121)
is the overall volume in string frame. The Kähler potential can then be read off from
the kinetic term by use of the identity
∂U∂ŪE2(0, U) = −
(U − Ū)2
E2(0, U) , (122)
producing the final result
∆Kgs ∼
Gstre
Vstr(S + S̄)
E2(0, U) , (123)
where
Gstre
Φ/Vstr is to be interpreted as a function of the Kähler variables. In the
K3× T 2 orientifold case, using (121), this is just proportional to eΦ/VstrK3 ∼ (T + T̄ )−1
(with Re T the volume of K3 measured in Einstein frame), giving a result T-dual
to [37] (note that we switched the real and imaginary parts in the definition of T and
S as compared to [37], to conform with the rest of this paper). As we argue in section
3.1, in general the dependence on the Kähler moduli will be more complicated than
this, because there is no analog to the relation (121). It is still clear that the inverse
suppression in the overall volume will appear as in (123), given that it is a direct
consequence of the Weyl rescaling.
F Factorized approximation
As mentioned in section 4.2, it is an important issue to what extent the moduli spaces
of Kähler and complex structure moduli factorize. In this appendix, we give further
details on the factorized approximation.
A common starting point in the analysis of the potential arising in type IIB theory
with 3-form fluxes is to assume that all complex structure moduli Uα and the dilaton
S are stabilized by demanding
DUαW = 0 = DSW . (124)
In this case the F-term potential for the moduli (7) reduces to
V = eK
G̄iD̄W̄DiW − 3|W |2
, (125)
where as in the main text, the indices i and j refer only to the Kähler moduli and
thus run from 1 to h1,1. Note that even though the complex structure moduli and
the dilaton are assumed to be stabilized by (124), the inverse metric G̄i is part of
the inverse of the whole moduli space metric. More precisely, if we denote the Kähler
moduli by Ti, as before, and all other moduli (i.e. the complex structure moduli and
the dilaton) collectively as Za, the moduli space metric is given by
GIJ̄ ∼
Ki̄ Kib̄
Ka̄ Kab̄
. (126)
We denote the inverse of this (whole) metric by GJ̄I . In general
G̄i 6= (Ki̄)−1 . (127)
Equality only holds if Gib̄ = 0, i.e. if the moduli space of the Kähler moduli is factorized
from the rest, as it is the case without loop and α′ corrections.
In this appendix, we would like to investigate at which order in a large volume
expansion the two matrices in (127) start to deviate from each other. For this analysis
we assume a volume of the “Swiss cheese” form as in (48) and a Kähler potential of
the form (24) (without taking possible effects of fluxes on the KK and winding mode
spectra into account as was done in appendix D; thus, gaK ∼ ta and g
K ∼ t−1q for some
2-cycle volumes). To avoid cumbersome notation we will indicate all the small moduli
collectively as τs. We then use the formula
A−1(1 +BP−1CA−1) −A−1BP−1
−P−1CA−1 P−1
, (128)
where P is the Schur complement of A, defined as
P = D − CA−1B . (129)
In our case P is the Schur complement of Ki̄. From (24) we read off that
GIJ̄ ∼
τ−2b τ
τ−2b τ
τ−2b τ
, (130)
where we only indicate the τb dependence and the indices run over I, J = {Tb, Ts, U, S}.
Here β = −2 for those τi with a nonvanishing ai in (48) (so β has an implicit index
i), otherwise β = −5/2 (which is in particular the value in the P[1,1,1,6,9] case). We
decompose GIJ̄ as in (128)
τ−2b τ
, A−1 ∼
τ 2b τ
7/2+β
7/2+β
B = CT ∼
τ−2b τ
, D ∼
τ 0b τ
τ−1b τ
τ 0b τ
τ−1b τ
, P−1 ∼
τ 0b τ
τ−1b τ
. (131)
Using (128) one easily obtains the scaling of the inverse:
GJ̄I ∼
τ 2b τ
7/2+β
7/2+β
τ 0b τ
τ 0b τ
. (132)
Now, from (128), G̄i receives two contributions. The first is K ̄i, that would be the
only term in the case of a factorized metric; the second is K−1h̄ Khb̄P
Kal̄K
, that
breaks factorization. Let us compare their τb scaling:
G̄i = A−1 + A−1BP−1CA−1 (133)
τ 2b τ
7/2+β
7/2+β
τ 0b τ
τ 0b τ
Thus the corrections coming from non-vanishing off-diagonal metric elements in (126)
set in with a suppression by τ−2b , τ
−7/2−β
b and τ
b in G
b̄b, Gb̄s and Gs̄s, respectively.
In the explicit example based on P[1,1,1,6,9], β = −5/2, and we checked this result by
comparing to the subleading terms in (65).
F.1 Factorized approximation of the scalar potential
What we are really interested in is not the (inverse) metric itself, but the scalar po-
tential, to which we now turn. For the nonperturbative terms Vnp1 and Vnp2, the
suppression of the off-diagonal terms in (133) is inherited by the scalar potential, as
they are proportional to Gs̄sW̄,s̄W,s and G
̄sK̄, respectively. For V3 things are not as
simple, due to the no-scale structure at leading order. Let us neglect for a moment all
the quantum corrections, then the no-scale structure implies
Gı̄jKı̄Kj − 3
no−scale ∼ (τ
b , τ
τ 2b τ
7/2+β
7/2+β
∼ τ 0b + τ
2β+7/2
b = 0 . (134)
The two terms have to vanish independently. Now let us add corrections that break
no-scale structure. Because of the cancellation described in appendix C.3, the leading
contribution can be seen to come at order τ
b (from the α
′, E (K)s and E (W )s corrections).
On the other hand, the off-diagonal terms appear at order
Gı̄jKı̄Kj
off−diagonal ∼ (τ
b , τ
τ 0b τ
τ 0b τ
∼ τ−2b + τ
b + τ
∼ τ−2b + . . . , (135)
for both β = −2 and β = −5/2. Therefore, the off-diagonal terms of the moduli space
metric appear in the scalar potential with a suppression of at least τ
b (as is confirmed
by the explicit example of section 3.3, cf. formulas (26)-(30)). The suppression can be
even stronger if some corrections are absent and the leading term in (135) vanishes.
To summarize: if one is only interested in the leading term of the scalar potential in
the large volume (i.e. large τb) expansion, then one can use the factorized approximation,
G̄i = K ̄i +O
. (136)
This provides a useful tool to simplify the calculations.
References
[1] S. B. Giddings, S. Kachru, and J. Polchinski, Hierarchies from fluxes in string
compactifications, Phys. Rev. D66 (2002) 106006, [hep-th/0105097].
[2] S. Kachru, R. Kallosh, A. Linde, and S. P. Trivedi, De Sitter vacua in string
theory, Phys. Rev. D68 (2003) 046005, [hep-th/0301240].
[3] V. Balasubramanian, P. Berglund, J. P. Conlon, and F. Quevedo, Systematics of
moduli stabilisation in Calabi-Yau flux compactifications, JHEP 03 (2005) 007,
[hep-th/0502058].
http://xxx.lanl.gov/abs/hep-th/0105097
http://xxx.lanl.gov/abs/hep-th/0301240
http://xxx.lanl.gov/abs/hep-th/0502058
[4] J. P. Conlon, F. Quevedo, and K. Suruliz, Large-volume flux compactifications:
Moduli spectrum and D3/D7 soft supersymmetry breaking, JHEP 08 (2005)
007, [hep-th/0505076].
[5] K. Becker, M. Becker, M. Haack, and J. Louis, Supersymmetry breaking and
alpha’-corrections to flux induced potentials, JHEP 06 (2002) 060,
[hep-th/0204254].
[6] J. P. Conlon and F. Quevedo, Gaugino and scalar masses in the landscape,
JHEP 06 (2006) 029, [hep-th/0605141].
[7] J. P. Conlon, S. S. Abdussalam, F. Quevedo, and K. Suruliz, Soft SUSY
breaking terms for chiral matter in IIB string compactifications, JHEP 01
(2007) 032, [hep-th/0610129].
[8] J. P. Conlon, The QCD axion and moduli stabilisation, JHEP 05 (2006) 078,
[hep-th/0602233].
[9] J. P. Conlon, Seeing the invisible axion in the sparticle spectrum, Phys. Rev.
Lett. 97 (2006) 261802, [hep-ph/0607138].
[10] J. P. Conlon and D. Cremades, The neutrino suppression scale from large
volumes, hep-ph/0611144.
[11] J. P. Conlon and F. Quevedo, Kaehler moduli inflation, JHEP 01 (2006) 146,
[hep-th/0509012].
[12] R. Holman and J. A. Hutasoit, Axionic inflation from large volume flux
compactifications, hep-th/0603246.
[13] J. Simon, R. Jimenez, L. Verde, P. Berglund, and V. Balasubramanian, Using
cosmology to constrain the topology of hidden dimensions, astro-ph/0605371.
[14] J. R. Bond, L. Kofman, S. Prokushkin, and P. M. Vaudrevange, Roulette
inflation with Kaehler moduli and their axions, hep-th/0612197.
[15] G. L. Kane, P. Kumar, and J. Shao, LHC string phenomenology,
hep-ph/0610038.
[16] F. Denef, M. R. Douglas, and B. Florea, Building a better racetrack, JHEP 06
(2004) 034, [hep-th/0404257].
http://xxx.lanl.gov/abs/hep-th/0505076
http://xxx.lanl.gov/abs/hep-th/0204254
http://xxx.lanl.gov/abs/hep-th/0605141
http://xxx.lanl.gov/abs/hep-th/0610129
http://xxx.lanl.gov/abs/hep-th/0602233
http://xxx.lanl.gov/abs/hep-ph/0607138
http://xxx.lanl.gov/abs/hep-ph/0611144
http://xxx.lanl.gov/abs/hep-th/0509012
http://xxx.lanl.gov/abs/hep-th/0603246
http://xxx.lanl.gov/abs/astro-ph/0605371
http://xxx.lanl.gov/abs/hep-th/0612197
http://xxx.lanl.gov/abs/hep-ph/0610038
http://xxx.lanl.gov/abs/hep-th/0404257
[17] L. Görlich, S. Kachru, P. K. Tripathy, and S. P. Trivedi, Gaugino condensation
and nonperturbative superpotentials in flux compactifications, JHEP 12 (2004)
074, [hep-th/0407130].
[18] P. K. Tripathy and S. P. Trivedi, D3 brane action and fermion zero modes in
presence of background flux, JHEP 06 (2005) 066, [hep-th/0503072].
[19] F. Denef, M. R. Douglas, B. Florea, A. Grassi, and S. Kachru, Fixing all moduli
in a simple F-theory compactification, Adv. Theor. Math. Phys. 9 (2005)
861–929, [hep-th/0503124].
[20] N. Saulina, Topological constraints on stabilized flux vacua, Nucl. Phys. B720
(2005) 203–210, [hep-th/0503125].
[21] R. Kallosh, A.-K. Kashani-Poor, and A. Tomasiello, Counting fermionic zero
modes on M5 with fluxes, JHEP 06 (2005) 069, [hep-th/0503138].
[22] L. Martucci, J. Rosseel, D. Van den Bleeken, and A. Van Proeyen, Dirac actions
for D-branes on backgrounds with fluxes, Class. Quant. Grav. 22 (2005)
2745–2764, [hep-th/0504041].
[23] P. Berglund and P. Mayr, Non-perturbative superpotentials in F-theory and
string duality, hep-th/0504058.
[24] E. Bergshoeff, R. Kallosh, A.-K. Kashani-Poor, D. Sorokin, and A. Tomasiello,
An index for the Dirac operator on D3 branes with background fluxes, JHEP
10 (2005) 102, [hep-th/0507069].
[25] D. Lüst, S. Reffert, W. Schulgin, and P. K. Tripathy, Fermion zero modes in the
presence of fluxes and a non- perturbative superpotential, JHEP 08 (2006) 071,
[hep-th/0509082].
[26] D. Lüst, S. Reffert, E. Scheidegger, W. Schulgin, and S. Stieberger, Moduli
stabilization in type IIB orientifolds. II, Nucl. Phys. B766 (2007) 178–231,
[hep-th/0609013].
[27] R. Blumenhagen, M. Cvetic, and T. Weigand, Spacetime instanton corrections
in 4D string vacua - the seesaw mechanism for D-brane models, Nucl. Phys.
B771 (2007) 113–142, [hep-th/0609191].
http://xxx.lanl.gov/abs/hep-th/0407130
http://xxx.lanl.gov/abs/hep-th/0503072
http://xxx.lanl.gov/abs/hep-th/0503124
http://xxx.lanl.gov/abs/hep-th/0503125
http://xxx.lanl.gov/abs/hep-th/0503138
http://xxx.lanl.gov/abs/hep-th/0504041
http://xxx.lanl.gov/abs/hep-th/0504058
http://xxx.lanl.gov/abs/hep-th/0507069
http://xxx.lanl.gov/abs/hep-th/0509082
http://xxx.lanl.gov/abs/hep-th/0609013
http://xxx.lanl.gov/abs/hep-th/0609191
[28] M. Haack, D. Krefl, D. Lüst, A. Van Proeyen, and M. Zagermann, Gaugino
condensates and D-terms from D7-branes, JHEP 01 (2007) 078,
[hep-th/0609211].
[29] N. Akerblom, R. Blumenhagen, D. Lüst, E. Plauschinn, and
M. Schmidt-Sommerfeld, Non-perturbative SQCD superpotentials from string
instantons, JHEP 04 (2007) 076, [hep-th/0612132].
[30] D. Tsimpis, Fivebrane instantons and Calabi-Yau fourfolds with flux, JHEP 03
(2007) 099, [hep-th/0701287].
[31] M. Bianchi and E. Kiritsis, Non-perturbative and flux superpotentials for Type
I strings on the Z3 orbifold, hep-th/0702015.
[32] R. Argurio, M. Bertolini, G. Ferretti, A. Lerda, and C. Petersson, Stringy
instantons at orbifold singularities, arXiv:0704.0262 [hep-th].
[33] K. Choi, A. Falkowski, H. P. Nilles, M. Olechowski, and S. Pokorski, Stability of
flux compactifications and the pattern of supersymmetry breaking, JHEP 11
(2004) 076, [hep-th/0411066].
[34] D. Lüst, S. Reffert, W. Schulgin, and S. Stieberger, Moduli stabilization in type
IIB orientifolds. I: Orbifold limits, Nucl. Phys. B766 (2007) 68–149,
[hep-th/0506090].
[35] D. Krefl and D. Lüst, On supersymmetric minkowski vacua in IIB orientifolds,
JHEP 06 (2006) 023, [hep-th/0603166].
[36] M. Gomez-Reino and C. A. Scrucca, Locally stable non-supersymmetric
Minkowski vacua in supergravity, JHEP 05 (2006) 015, [hep-th/0602246].
[37] M. Berg, M. Haack, and B. Körs, String loop corrections to Kaehler potentials
in orientifolds, JHEP 11 (2005) 030, [hep-th/0508043].
[38] I. Antoniadis, C. Bachas, C. Fabre, H. Partouche, and T. R. Taylor, Aspects of
type I - type II - heterotic triality in four dimensions, Nucl. Phys. B489 (1997)
160–178, [hep-th/9608012].
[39] C. Angelantonj and A. Sagnotti, Open strings, Phys. Rept. 371 (2002) 1–150,
[hep-th/0204089].
http://xxx.lanl.gov/abs/hep-th/0609211
http://xxx.lanl.gov/abs/hep-th/0612132
http://xxx.lanl.gov/abs/hep-th/0701287
http://xxx.lanl.gov/abs/hep-th/0702015
http://xxx.lanl.gov/abs/arXiv:0704.0262 [hep-th]
http://xxx.lanl.gov/abs/hep-th/0411066
http://xxx.lanl.gov/abs/hep-th/0506090
http://xxx.lanl.gov/abs/hep-th/0603166
http://xxx.lanl.gov/abs/hep-th/0602246
http://xxx.lanl.gov/abs/hep-th/0508043
http://xxx.lanl.gov/abs/hep-th/9608012
http://xxx.lanl.gov/abs/hep-th/0204089
[40] C. P. Burgess, P. Camara, S. de Alwis, S. Giddings, A. Maharana, F. Quevedo,
and K. Suruliz, Warped supersymmetry breaking, hep-th/0610255.
[41] S. B. Giddings and A. Maharana, Dynamics of warped compactifications and
the shape of the warped landscape, Phys. Rev. D73 (2006) 126003,
[hep-th/0507158].
[42] C. P. Burgess, R. Kallosh, and F. Quevedo, de Sitter string vacua from
supersymmetric D-terms, JHEP 10 (2003) 056, [hep-th/0309187].
[43] O. Lebedev, H. P. Nilles, and M. Ratz, de Sitter vacua from matter
superpotentials, Phys. Lett. B636 (2006) 126, [hep-th/0603047].
[44] E. Dudas and Y. Mambrini, Moduli stabilization with positive vacuum energy,
JHEP 10 (2006) 044, [hep-th/0607077].
[45] E. Dudas, C. Papineau, and S. Pokorski, Moduli stabilization and uplifting with
dynamically generated F-terms, JHEP 02 (2007) 028, [hep-th/0610297].
[46] E. Cremmer, S. Ferrara, C. Kounnas, and D. V. Nanopoulos, Naturally
vanishing cosmological constant in N=1 supergravity, Phys. Lett. B133 (1983)
[47] P. Candelas, A. Font, S. H. Katz, and D. R. Morrison, Mirror symmetry for two
parameter models. 2, Nucl. Phys. B429 (1994) 626–674, [hep-th/9403187].
[48] G. Curio and V. Spillner, On the modified KKLT procedure: A case study for
the P(11169)(18) model, hep-th/0606047.
[49] M. Dine and N. Seiberg, Is the superstring weakly coupled?, Phys. Lett. B162
(1985) 299.
[50] V. Balasubramanian and P. Berglund, Stringy corrections to Kaehler potentials,
SUSY breaking, and the cosmological constant problem, JHEP 11 (2004) 085,
[hep-th/0408054].
[51] M. Berg, M. Haack, and B. Körs, On volume stabilization by quantum
corrections, Phys. Rev. Lett. 96 (2006) 021601, [hep-th/0508171].
[52] R. Blumenhagen, B. Körs, D. Lüst, and S. Stieberger, Four-dimensional string
compactifications with D-branes, orientifolds and fluxes, hep-th/0610327.
http://xxx.lanl.gov/abs/hep-th/0610255
http://xxx.lanl.gov/abs/hep-th/0507158
http://xxx.lanl.gov/abs/hep-th/0309187
http://xxx.lanl.gov/abs/hep-th/0603047
http://xxx.lanl.gov/abs/hep-th/0607077
http://xxx.lanl.gov/abs/hep-th/0610297
http://xxx.lanl.gov/abs/hep-th/9403187
http://xxx.lanl.gov/abs/hep-th/0606047
http://xxx.lanl.gov/abs/hep-th/0408054
http://xxx.lanl.gov/abs/hep-th/0508171
http://xxx.lanl.gov/abs/hep-th/0610327
[53] M. B. Green, Interconnections between type II superstrings, M theory and N =
4 Yang-Mills, hep-th/9903124.
[54] K. Choi and H. P. Nilles, The gaugino code, JHEP 04 (2007) 006,
[hep-ph/0702146].
[55] G. von Gersdorff and A. Hebecker, Kaehler corrections for the volume modulus
of flux compactifications, Phys. Lett. B624 (2005) 270–274, [hep-th/0507131].
[56] H. P. Nilles, Supersymmetry, supergravity and particle physics, Phys. Rept.
110 (1984) 1.
[57] S. P. Martin, A supersymmetry primer, hep-ph/9709356.
[58] V. S. Kaplunovsky, One loop threshold effects in string unification, Nucl. Phys.
B307 (1988) 145, [hep-th/9205068].
[59] J. P. Conlon, D. Cremades, and F. Quevedo, Kaehler potentials of chiral matter
fields for Calabi-Yau string compactifications, JHEP 01 (2007) 022,
[hep-th/0609180].
[60] D. Berenstein, Branes vs. GUTS: Challenges for string inspired phenomenology,
hep-th/0603103.
[61] L. E. Ibanez, F. Marchesano, and R. Rabadan, Getting just the standard model
at intersecting branes, JHEP 11 (2001) 002, [hep-th/0105155].
[62] P. Candelas, X. De La Ossa, A. Font, S. H. Katz, and D. R. Morrison, Mirror
symmetry for two parameter models. I, Nucl. Phys. B416 (1994) 481–538,
[hep-th/9308083].
[63] M. B. Green, J. A. Harvey, and G. W. Moore, I-brane inflow and anomalous
couplings on D-branes, Class. Quant. Grav. 14 (1997) 47–52, [hep-th/9605033].
[64] K. Dasgupta, D. P. Jatkar, and S. Mukhi, Gravitational couplings and Z(2)
orientifolds, Nucl. Phys. B523 (1998) 465–484, [hep-th/9707224].
[65] Y.-K. E. Cheung and Z. Yin, Anomalies, branes, and currents, Nucl. Phys.
B517 (1998) 69–91, [hep-th/9710206].
[66] R. Minasian and G. W. Moore, K-theory and Ramond-Ramond charge, JHEP
11 (1997) 002, [hep-th/9710230].
http://xxx.lanl.gov/abs/hep-th/9903124
http://xxx.lanl.gov/abs/hep-ph/0702146
http://xxx.lanl.gov/abs/hep-th/0507131
http://xxx.lanl.gov/abs/hep-ph/9709356
http://xxx.lanl.gov/abs/hep-th/9205068
http://xxx.lanl.gov/abs/hep-th/0609180
http://xxx.lanl.gov/abs/hep-th/0603103
http://xxx.lanl.gov/abs/hep-th/0105155
http://xxx.lanl.gov/abs/hep-th/9308083
http://xxx.lanl.gov/abs/hep-th/9605033
http://xxx.lanl.gov/abs/hep-th/9707224
http://xxx.lanl.gov/abs/hep-th/9710206
http://xxx.lanl.gov/abs/hep-th/9710230
[67] J. F. Morales, C. A. Scrucca, and M. Serone, Anomalous couplings for D-branes
and O-planes, Nucl. Phys. B552 (1999) 291–315, [hep-th/9812071].
[68] J. Stefanski, Bogdan, Gravitational couplings of d-branes and o-planes, Nucl.
Phys. B548 (1999) 275–290, [hep-th/9812088].
[69] C. P. Bachas, P. Bain, and M. B. Green, Curvature terms in D-brane actions
and their M-theory origin, JHEP 05 (1999) 011, [hep-th/9903210].
[70] A. Fotopoulos, On (alpha’)**2 corrections to the D-brane action for non-
geodesic world-volume embeddings, JHEP 09 (2001) 005, [hep-th/0104146].
[71] E. Dudas, G. Pradisi, M. Nicolosi, and A. Sagnotti, On tadpoles and vacuum
redefinitions in string theory, Nucl. Phys. B708 (2005) 3–44, [hep-th/0410101].
[72] N. Kaloper and R. C. Myers, The O(dd) story of massive supergravity, JHEP
05 (1999) 010, [hep-th/9901045].
[73] S. Kachru, M. B. Schulz, and S. Trivedi, Moduli stabilization from fluxes in a
simple IIB orientifold, JHEP 10 (2003) 007, [hep-th/0201028].
[74] M. Berg, M. Haack, and B. Körs, Loop corrections to volume moduli and
inflation in string theory, Phys. Rev. D71 (2005) 026005, [hep-th/0404087].
[75] M. Berg, M. Haack, and B. Körs, On the moduli dependence of nonperturbative
superpotentials in brane inflation, hep-th/0409282.
[76] D. Lüst, P. Mayr, R. Richter, and S. Stieberger, Scattering of gauge, matter,
and moduli fields from intersecting branes, Nucl. Phys. B696 (2004) 205–250,
[hep-th/0404134].
[77] C. P. Burgess and T. R. Morris, Open and unoriented strings a la Polyakov,
Nucl. Phys. B291 (1987) 256.
[78] C. P. Burgess and T. R. Morris, Open superstrings a la Polyakov, Nucl. Phys.
B291 (1987) 285.
http://xxx.lanl.gov/abs/hep-th/9812071
http://xxx.lanl.gov/abs/hep-th/9812088
http://xxx.lanl.gov/abs/hep-th/9903210
http://xxx.lanl.gov/abs/hep-th/0104146
http://xxx.lanl.gov/abs/hep-th/0410101
http://xxx.lanl.gov/abs/hep-th/9901045
http://xxx.lanl.gov/abs/hep-th/0201028
http://xxx.lanl.gov/abs/hep-th/0404087
http://xxx.lanl.gov/abs/hep-th/0409282
http://xxx.lanl.gov/abs/hep-th/0404134
	Introduction
	Review
	KKLT
	Consistency of KKLT
	Large volume scenario (LVS)
	String loop corrections to LVS
	From toroidal orientifolds to Calabi-Yau manifolds
	LVS with loop corrections
	The P[1,1,1,6,9]4 model
	Gaugino masses
	Including loop corrections
	Other soft terms
	LVS for other classes of Calabi-Yau manifolds?
	Abundance of ``Swiss cheese'' Calabi-Yau manifolds
	Toroidal orientifolds
	Fibered Calabi-Yau manifolds
	Further corrections
	Conclusions
	Some details on LVS
	LVS for P[1,1,1,6,9]4
	Many Kähler moduli
	Loop corrected inverse Kähler metric for P[1,1,1,6,9]4
	No-scale Kähler potential in type II string theory
	No-scale structure in type IIA
	No-scale structure in type IIB
	Cancellation with just the volume modulus
	Cancellation with many Kähler moduli
	Perturbative corrections to Vnp1 and Vnp2
	KK spectrum with fluxes
	The orientifold calculation
	Factorized approximation
	Factorized approximation of the scalar potential
ABSTRACT
  We subject the phenomenologically successful large volume scenario of
hep-th/0502058 to a first consistency check in string theory. In particular, we
consider whether the expansion of the string effective action is consistent in
the presence of D-branes and O-planes. Due to the no-scale structure at
tree-level, the scenario is surprisingly robust. We compute the modification of
soft supersymmetry breaking terms, and find only subleading corrections. We
also comment that for large-volume limits of toroidal orientifolds and fibered
Calabi-Yau manifolds the corrections can be more important, and we discuss
further checks that need to be performed.

<|endoftext|><|startoftext|>
Driven activation versus thermal activation
Patrick Ilg∗ and Jean-Louis Barrat
Université de Lyon; Univ. Lyon I, Laboratoire de Physique de la Matière Condensée et des Nanostructures; CNRS,
UMR 5586, 43 Bvd. du 11 Nov. 1918, 69622 Villeurbanne Cedex, France
(Dated: November 2, 2018)
Activated dynamics in a glassy system undergoing steady shear deformation is studied by numerical simula-
tions. Our results show that the external driving force has a strong influence on the barrier crossing rate, even
though the reaction coordinate is only weakly coupled to the nonequilibrium system. This ”driven activation”
can be quantified by introducing in the Arrhenius expression an effective temperature, which is close to the one
determined from the fluctuation-dissipation relation. This conclusion is supported by analytical results for a
simplified model system.
PACS numbers: 64.70.Pf,05.40.-a,05.70.Ln
Activated rate theory is ubiquitous in the description and
understanding of dynamical processes in condensed matter,
physical chemistry or materials science. The basic problem,
known as the ”barrier crossing” or ”Kramers problem”, is that
of a single degree of freedom, coupled to a heat bath, and
moving in a double well potential. The ”barrier crossing rate”
is defined as the average time taken by the system to switch
from a potential well to the other, under the influence of ther-
mal noise. In general, the single degree of freedom, often
called ”reaction coordinate”, is coupled to a complex, fluc-
tuating environment. The ”thermal noise” is a schematic de-
scription of the interaction with this environment.
This approach has been applied to a wealth of different
problems. We can for example mention diffusion in solids, in
which case the reaction coordinate is an atomic position, and
the noise is associated with thermal vibrations. In isomeriza-
tion reactions, the reaction coordinate is an internal coordinate
of the molecule, coupled to a liquid solvent. In nucleation the-
ory, the internal coordinate describes a collective fluctuation
of an order parameter, and the ”barrier” is interpreted as a free
energy, rather than energy, barrier. Other examples involve
the Eyring theory of plasticity in solids, in which the activated
process is associated with a local strain change.
The analysis of the barrier crossing problem is often asso-
ciated with the names of Eyring, who proposed the so called
”transition state approximation” [1], and of Kramers, who
made the first complete analysis of the problem in the lim-
its of low and high friction [2]. Since then, many refinements
of the theory have been studied and are reviewed in reference
[3]. In all cases, it turns out that an essential factor in the re-
action rate, which to a large extent governs the variation with
temperature T , is the Arrhenius contribution:
r(T ) ∼ exp(−∆E/kBT ) (1)
where ∆E is the energy barrier to overcome. The exponential
variation of the Arrhenius factor (1) is, in fact, the hallmark of
activated processes.
∗Present address: ETH Zürich, Polymer Physics, HCI H541, CH-8093
Zürich, Switzerland
As discussed above, activated processes are often invoked
in the description of the dynamical response of condensed
matter systems. As such, they will typically take place un-
der nonequilibrium conditions. The deviation from equilib-
rium can be weak, e.g. during the flow of a Newtonian liquid,
in which case the applicability of equation (1) is straightfor-
ward. In other cases, however, the same equation is applied to
systems that are strongly out of equilibrium, in the sense that
their response to an external driving force is strongly nonlin-
ear, or that their phase space distribution is very different from
the equilibrium, Gibbs-Boltzmann distribution.
A prototypical example of such a strongly nonequilibrium
situation is the flow of a glassy system. Such a flow can be in-
duced only by stresses larger than the yield stress (see e.g. [4]
for the effect of strain and temperature in glassy solids). In
the absence of flow, the relaxation is very slow, and the sys-
tem is out of equilibrium and non-stationary [5, 6]. The flow
produces a nonequilibrium steady state [7, 8, 9], with a typi-
cal relaxation time that is fixed internally by the applied stress
or the strain rate. This situation has attracted a considerable
amount of theoretical and experimental interest, in two differ-
ent contexts. The first one is the rheology of ”soft glasses”
(emulsions, pastes, colloidal glasses, foams). The second one
is the plastic deformation of bulk metallic glasses. In both
cases, approaches have been proposed that introduce a ”noise
temperature” [10] or ”disorder temperature” [11, 12]. In [10],
this noise temperature replaces the actual temperature in equa-
tion (1). In such models, the effective temperature is intro-
duced in a somewhat empirical manner.
Another concept of effective temperature, rooted in sta-
tistical mechanics ideas, was introduced in [13, 14], based
on the ”fluctuation-dissipation ratio”. At equilibrium, the
fluctuation-dissipation theorem states that the ratio between
integrated response and correlation functions (FDR) is equal
to the temperature. Cugliandolo et al. [13] showed how this
concept could be extended to out-of-equilibrium system, by
defining the effective temperature from the FDR, which now
differs from the thermal bath temperature. It was proposed
that a thermometer probing a nonequilibrium system on long
time scales would actually be sensitive to this effective tem-
perature, and this result was checked numerically on simple
models [8, 9, 15]. Experimental evidence supporting this
definition of an effective temperature has been found e.g. in
http://arxiv.org/abs/0704.0738v1
[16, 17].
In this contribution, we explore the influence of an external
driving force on the rate of a simple activated process. Our
primary objective is here to check how the external drive, and
the ”noise” it generates, can influence the dynamics of an in-
ternal degree of freedom, which is not directly coupled to the
driving force. A very standard way of quantifying the results
is to use the Arrhenius representation, which provides an op-
erational way of introducing an ”activation temperature”, that
can be compared to other calculations of effective tempera-
tures in nonequilibrium systems.
Our approach involves the simulation of the classical Kob-
Andersen ”binary Lennard-Jones” model undergoing shear
flow, similar to the one used in ref. [8]. In order to probe ac-
tivated dynamics, one appealing possibility would be to iden-
tify and study the activated events that actually give rise to
the flow at low temperature, in the spirit of [10]. This ap-
proach, however, is difficult and could yield ambiguous re-
sults, as the flow is self consistently coupled to these events.
We therefore make use of the flexibility of numerical model-
ing to devise a very simple ”activated degree of freedom” that
has only a weak coupling to the existing flow in our system.
This is achieved by replacing each particle of the minority
species rBj by a peanut shaped ”dumbbell” with coordinates
j ± (uj/2)e
z, with fixed orientation along ez , the direc-
tion perpendicular to the shear plane. Each center of force
in the dumbbell carries half of the particle interaction, and
the separation between the two centers of force u is small
enough that the perturbation of the surrounding fluid can be
neglected. The important feature of the model is the fact that
the two centers of force are related through an internal ”reac-
tion coordinate” u, which evolves in a bistable intramolecular
potential V (u) = (V0/u
2 − u20)
2, where u0 = 0.1 (in
Lennard-Jones units) is the equilibrium dumbbell separation
(see fig. 1b). Each dumbbell is therefore a simple ”two-state”
system which can undergo, under the influence of the interac-
tions with the surrounding fluid, an ”isomerization reaction”.
This reaction corresponds to exchanging the positions of the
two centers of force (see fig. 1).
This ”isomerization” will be the focus of our study. Its rate
can be studied as a function of the imposed barrier height, of
the external temperature T and on the driving force, which is
here quantified by the shear rate γ̇. We have chosen to work
under conditions for T and γ̇ that have been well character-
ized previously [8] (T = 0.3 and γ̇ = 10−3, in Lennard-Jones
units) and to concentrate on the influence of barrier height
∆E = V0. At this temperature, the system would not undergo
structural relaxation on the time scales that can be achieved
using computer simulation. Under the influence of the exter-
nal drive, a relaxation on a time scale τα ≃ 100 is observed.
This time scale is very well separated from microscopic, vi-
brational time scale, so that our system is a practical realiza-
tion of the theoretical concepts described in ref. [13].
Determination of reaction rates is a notoriously difficult
challenge for numerical simulations, as the activated events
typically take place on much larger time scales than the short
time vibrations of the intramolecular bonds. A number of so-
phisticated methods [18] have been developed to bypass this
intrinsic difficulty, either from biased simulations, or by mak-
ing use directly of the rate formula 1. Unfortunately, such
methods always assume that the system is close to thermal
equilibrium, and are therefore inapplicable in our case. The
forward flux method recently proposed in [19] is applicable
to nonequilibrium systems. However, only a single reaction
coordinate per system can be treated with this method, which
is impractical for the present situation. As a result, we have
to use ”brute force” simulations to obtain reaction rates from
the study of individual trajectories, which seriously limits the
range of barrier heights that can be considered.
The Sllod equations of motion appropriate for a fluid under-
going simple shear were integrated with a leapfrog algorithm
using a time step of ∆t = 5×10−4 in reduced Lennard-Jones
units [8]. For the dumbbell particles, the leapfrog algorithm
is applied to the center-of-mass and relative positions and mo-
menta. Lees-Edwards periodic boundary conditions [8] are
employed in order to minimize effects due to the finite system
size. Constant temperature conditions are ensured by rescal-
ing the velocity components in the neutral direction of all par-
ticles at each time step.
The reaction rate r is determined from the number correla-
tion function C(t) by
C(t) =
〈δn(t)δn(0)〉
〈δn(0)2〉
≈ exp [−rt/〈n〉] (2)
where δn(t) = n(t) − 〈n〉 and n(t) equals one if u(t) > uB
and zero else [20]. The systems studied contain N = 2048
particles, 410 of which are dumbbells. Eq. (2) is evaluated
from an ensemble average over 20− 40 independent systems.
The results are found to be independent of the exact location
of the dividing surface uB in the vicinity of the barrier max-
imum uB = 0. The fast initial decay of C(t) is well de-
scribed by transition state theory. Escape rates are extracted
from fits to Eq. (2) for intermediate times 5 ≤ t ≤ 10.
We verified that very similar results are found within a broad
range 1 ≤ t ≤ 30, before the correlation function finally de-
cays to zero, in full agreement with theoretical expectation
[20]. For relatively low barrier heights V0/T . 3, C(t) de-
cays more rapidly, so that we extracted rates for shorter times,
0.5 ≤ t ≤ 1.
In the following, we present results for the reaction rate as a
function of V0 based on the study of the decay in the number-
number correlation function [20]. We adopt common practice
by giving all temperature and energy values in terms of the
depth of the Lennard-Jones potential ǫ.
Figure 2 illustrates the difficulty of the approach, by show-
ing the trajectories of selected dumbbells for different values
of the barrier at T = 0.3. For V0/kT = 1, barrier crossings
are so common that describing them trough classical rate the-
ory is problematic. For V0/kT = 10, the crossings become
very unlikely, so that the determination of the rate becomes
difficult. This leaves us with typically two decades in terms of
variation of the reaction rate.
The corresponding reaction rates, determined from the cor-
relation function of the dumbbell internal coordinate, are
shown in figure 3. At T = 0.8, γ̇ = 10−3, the rates obey
the equilibrium Arrhenius law (1), showing that under these
conditions the drive is only a weak perturbation to the system.
We now concentrate on the rates obtained at T = 0.3 and
γ̇ = 10−3. The reaction rates are clearly influenced by the
external driving imposed to the system. To show this, we use
the rates obtained at a rather high temperature, req(T = 0.8),
to extrapolate to T = 0.3. The equilibrium extrapolation rext
is achieved using the Arrhenius formula, i.e. rext(T = 0.3) =
req(T = 0.8)×exp(+V0/0.8−V0/0.3). Clearly, the extrapo-
lated rates are significantly lower than those actually observed
under shear, except at low barrier heights (high rates) were the
two estimates almost coincide. The difference between the ex-
trapolated rates and the measured ones is an indicator of the
inadequacy of the standard Arrhenius formula, using the ther-
mal bath temperature, in the driven system.
In spite of the limited range of accessible rates, it is clear
from figure 3 that the rates in a glassy system under shear do
not obey Arrhenius behavior of the form exp(−V0/T ) over
the whole range of barrier heights under study. While this
law is relatively well obeyed at low barriers and large cross-
ing rates, it would significantly underestimate the rate for high
barriers. Instead, at high barrier rates, the crossing rate is con-
siderably increased. If an attempt is made to fit the results
to an ”effective Arrhenius factor”, a value of Teff ≃ 0.6 is
obtained.
Under the same conditions, a completely different de-
termination of the effective temperature [8], based on the
fluctuation-dissipation approach mentioned above, yields
T ∗ ≃ 0.65. This is in good agreement with the present fit
to an Arrhenius law. The determination of Teff based on re-
action rates is of limited accuracy, such that we cannot ex-
clude that Teff and T
∗ actually differ slightly or that Teff is
slightly dependent on barrier height. A more precise determi-
nation of Teff would require larger barrier heights, which is
computationally quite demanding. Note, that V0 = 3 corre-
sponds for T = 0.3 already to the rather high barrier height of
V0/kT = 10.
In figure 3, we also display the results obtained for the rates
at a slightly higher value of the shear, γ̇ = 10−2. The sepa-
ration of time scales between relaxation time and microscopic
times is less marked than for the low shear rates (τα ≃ 10
in this case). It appears that the increase in shear induces a
change in the prefactor for the rates, rather than in the bar-
rier height dependence. This is consistent with the relatively
weak influence of shear rate on effective temperature reported
earlier [8].
It is interesting to discuss the time scale at which the
crossover between the two Arrhenius laws, characterized ei-
ther by the bath temperature or an effective temperature, takes
place. A natural guess would be to associate this crossover
with a value of the rate that corresponds to the inverse of the
α relaxation time. The general idea is, that fluctuations taking
place on longer time scales will be associated with a higher
temperature [8, 13]. In figure 3 we see that this guess over-
estimates the crossover rate by a factor of 5 in the case of
γ̇ = 10−3. It is not clear at this point, whether this differ-
ence is significant or reflects merely some arbitrariness in the
definition of relaxation times.
The simulation results presented above suggest that the
activated dynamics is governed by an elevated temperature
Teff ≃ 0.6 > T . This temperature is consistent with the ef-
fective temperature T ∗ = 0.65 found in extensive simulation
studies on the fluctuation-dissipation relation in this system
[8]. In order to investigate the relation between Teff and T
and to rationalize our simulation results, we study to following
toy model proposed in [21].
Consider a particle of mass m at position x moving in an
external potential V (x) under the influence of two thermal
baths. One bath, associated with the fast degrees of freedom,
is kept at temperature Tfast and exerts an instantaneous fric-
tion force of strength Γ0. The second bath, which mimics the
slow degrees of freedom is held at temperature Tslow and is
described by the retarded friction coefficient (memory kernel)
Γ(t). The equations of motion read ẋ = v,
mv̇ = −V ′(x)−
dsΓ(t−s)v(s)−Γ0v(t)+ξ(t)+η(t) (3)
The fast bath is modeled as Gaussian white noise with
〈η(t)〉 = 0, 〈η(t)η(s)〉 = 2Tfastδ(t − s), whereas the ran-
dom force due to the slow bath is described by 〈ξ(t)〉 = 0,
〈ξ(t)ξ(s)〉 = 2TslowΓ(t − s). We use an exponentially de-
caying memory kernel Γ(t) = α−1e−t/(αγ) for which the
non-Markovian dynamics (3) can equivalently be rewritten as
Markovian dynamics in an extended set of variables [22].
Exact solutions of the model (3) for harmonic potentials
V are presented in [21]. For barrier crossing problems with
double-well potentials V , no analytical solutions to (3) are
known. We therefore extend the widely used transition state
approximation to the present model after adiabatic elimina-
tion of the fast degrees of freedom. The resulting expres-
sion for the rate rTST is rather lengthy and will be presented
elsewhere together with the (straightforward) procedure. For
the double-well potential V (x) considered above, the depen-
dence of the rate rTST on the barrier height V0 is again domi-
nated by the Arrhenius factor, however with an effective tem-
perature Teff,TST = Tfastw/[w + 4(Tfast − Tslow)], where
w = Tslow + αV
′′Tfast and V
′′ = 8V0/u
0. Thus, if the
slow and fast bath are both kept at the same temperature,
Tslow = Tfast = T , one recovers the usual Kramers re-
sult with Teff,TST = T . If, however, Tslow > Tfast, the
escape rate is enhanced due to Teff,TST > Tfast. Due to
the interplay between fast and slow dynamics in the barrier
crossing, the effective temperature is in general intermedi-
ate between the temperature of the slow and the fast bath.
These predictions are in agreement with the simulation re-
sults presented above. Furthermore, estimating the coefficient
α ≈ 0.01 from the inverse high frequency shear modulus for
the Lennard-Jones system [22], the predicted effective tem-
perature is Teff,TST ≈ 0.45. In view of the simplicity of the
model and the uncertainty in α, the order of magnitude agree-
ment with the observed Teff is reasonable.
In conclusion, we have shown that activated processes out
of equilibrium are influenced by an external driving, even if
the corresponding degree of freedom is weakly coupled to
the drive. Qualitatively, this increase is the essential result
from our simulations. From a more quantitative point of view,
the analysis of the Arrhenius plot allows one to define oper-
ationally an effective activation temperature. The link of this
activation temperature to other definitions of effective tem-
perature, and the time scale for the crossover from ”thermal
activation” to ”driven activation” will have to be explored fur-
ther. However, the results are consistent with a general picture
involving a degree of freedom coupled to two different heat
baths, one associated with short time vibrations and one asso-
ciated with shear induced fluctuations, taking place on longer
time scales and described by a higher temperature [21].
This ”driven activation” (as opposed to ”thermal” activa-
tion) could have interesting consequences for characterizing
the effective temperature of nonequilibrium systems, by pro-
viding a ”thermometer” based on activated processes. It can
also be of importance within the theory of plasticity of amor-
phous materials, by providing a self-consistent description of
the ”noise” that induces local plastic events, within a classical
statistical mechanics description involving a noise tempera-
ture [11, 23]. Inserting an effective temperature in Eyring’s
rate theory of plasticity, T. Haxton and A. J. Liu were recently
able to account for the flow curves of simple glassy systems
at low temperatures [24].
[1] H. Eyring, J. Chem. Phys. 3, 107 (1935).
[2] H. A. Kramers, Physica 7, 284 (1940).
[3] P. Hänggi, P. Talkner, and M. Borkovec, Rev. Mod. Phys. 62,
251 (1990).
[4] J. Rottler and M. O. Robbins, Phys. Rev. E 68, 011507 (2003).
[5] L. Berthier, G. Biroli, J.-P. Bouchaud, L. Cipelletti, D. E. Masri,
D. L’Hôte, F. Ladieu, and M. Pierno, Science 310, 1797 (2005).
[6] K. N. Pham, A. M. Puertas, J. Bergenholtz, S. U. Egelhaaf,
A. Moussaı̈d, P. N. Pusey, A. B. Schofield, M. E. Cates,
M. Fuchs, and W. C. K. Poon, Science 296, 104 (2002).
[7] L. Berthier, J.-L. Barrat, and J. Kurchan, Phys. Rev. E 61, 5464
(2000).
[8] L. Berthier and J.-L. Barrat, J. Chem. Phys. 116, 6228 (2002).
[9] L. Berthier and J.-L. Barrat, Phys. Rev. Lett. 89, 095702 (2002).
[10] P. Sollich, F. Lequeux, P. Hébraud, and M. E. Cates, Phys. Rev.
Lett. 78, 2020 (1997).
[11] J. S. Langer, Phys. Rev. E 70, 041502 (2004).
[12] J. S. Langer and A. Lemaitre, Phys. Rev. Lett. 94, 175701
(2005).
[13] L. Cugliandolo, J. Kurchan, and L. Peliti, Phys. Rev. E 55, 3898
(1997).
[14] J. Kurchan, Nature 433, 222 (2005).
[15] I. K. Ono, C. S. O’Hern, D. J. Durian, S. A. Langer, A. J. Liu,
and S. R. Nagel, Phys. Rev. Lett. 89, 095703 (2002).
[16] P. Wang, C. Song, and H. A. Makse, Nature Physics 2, 526
(2006).
[17] D. Herisson and M. Ocio, Phys. Rev. Lett. 88, 257202 (2002).
[18] see e.g. C. Dellago, P. G. Bolhuis, and P. L. Geissler, Advances
Chem. Phys. 123, 1 (2002).
[19] R. J. Allen, P. B. Warren, and P. R. ten Wolde, Phys. Rev. Lett.
94, 018104 (2005).
[20] D. Chandler, J. Chem. Phys. 68, 2959 (1978).
[21] P. Ilg and J.-L. Barrat, J. Phys.: Conf. Ser. 40, 76 (2006).
[22] J.-L. Barrat, Chem. Phys. Lett. 165, 551 (1990).
[23] A. Lemaitre and C. Caroli, cond-mat/0609689.
[24] A. Liu, private communication.
Figures
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
�������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
������
dumbbell separation  u 
FIG. 1: (a) Schematic representation of a dumbbell particle in a sys-
tem undergoing shear flow with fixed orientation perpendicular to the
shear plane. Also shown is an isomerization reaction. The magnitude
of the separation between the two centers of force is considerably
exaggerated in this schematic representation. (b) Intramolecular po-
tential (characteristic of the nonlinear ”spring” shown in panel (a) )
between the two centers of force that define the dumbbell.
FIG. 2: (Color online) Trajectories of the internal dumbbell coor-
dinate for different barrier heights. The thermal bath temperature
is T = 0.3 in reduced Lennard-Jones units, and the shear rate is
γ̇ = 10
1 1.5 2 2.5 3 3.5
barrier height  V
τα, γ=10−2
τα, γ=10−3
1 / 0.8
1 / 0.3
1 / 0.6
FIG. 3: (Color online) Reaction rate as a function of barrier height,
for fixed temperature and shear rate. Full squares: results for T =
0.8 (red) at equilibrium. T = 0.3 (black and blue), and different
shear rates (full diamonds and circles correspond to γ̇ = 10−2 and
γ̇ = 10
−3, respectively). Open circles represent rext, an extrapola-
tion of the high temperature results to T = 0.3 as explained in the
text.
ABSTRACT
  Activated dynamics in a glassy system undergoing steady shear deformation is
studied by numerical simulations. Our results show that the external driving
force has a strong influence on the barrier crossing rate, even though the
reaction coordinate is only weakly coupled to the nonequilibrium system. This
"driven activation" can be quantified by introducing in the Arrhenius
expression an effective temperature, which is close to the one determined from
the fluctuation-dissipation relation. This conclusion is supported by
analytical results for a simplified model system.

<|endoftext|><|startoftext|>
Computation of Power Loss in Likelihood
Ratio Tests for Probability Densities
Extended by Lehmann Alternatives
Lucas Gallindo Martins Soares
Departamento de Estatı́stica e Informática
Universidade Federal Rural de Pernambuco, Brasil
lucasgallindo@gmail.com
Abstract
We compute the loss of power in likelihood ratio tests when we test
the original parameter of a probability density extended by the first
Lehmann alternative.
1 Distributions Generated by Lehmann Alter-
natives
In the context of parametric models for lifetime data, [Gupta et alii 1998]
disseminated the study of distributions generated by Lehmann alterna-
tives, cumulative distributions that take one of the following forms:
G1 (x, λ) = [F (x)]
λ or G2 (x, λ) = 1− [1− F (x)]
λ (1)
where F (x) is any cumulative distribution and λ > 0. In the present note,
we are going to call both G distributions generated distributions or extended
distributions. It is easy to see that for integer values of λ, G1 and G2 are,
respectively, the distribution of the maximum and the minimum of a sample
of size λ, the support of the two distribution is the same of F , and that the
associated density functions are
g1 (x, λ) = λf(x) [F (x)]
λ−1 and g2 (x, λ) = λf(x) [1− F (x)]
λ−1 (2)
where f(x) is the density function associated with F . Suppose that we
generate a distribution G(x|λ) based on the distribution F (x), and want to
generate another distribution G′(x|λ, λ′) repeating the process; It is easy
to see that the distribution G′ will be the same as G, for the new param-
eter of the distribution, λλ′ may be summarized as a single one. This has
the interesting side effect that the standard uniparametric exponential dis-
tribution may be seen as a distribution generated by the second Lehmann
alternative from the distribution F (x) = 1− e−x.
To compute the moments of distribution generated by Lehmann alter-
natives, we use the change of variables u = F (x) in the expression
xkλf(x) [F (x)]λ−1 dx (3)
yielding
λQk(u)uλ−1du = EBeta(λ,1) [Q(u)] (4)
where Q(u) = F−1(u) is the quantile function. This integral is equivalent to
the expectancy of Q(u) with respect to a Beta distribution with parameters
α = λ, β = 1. The same reasoning can be used to show that, for the second
Lehmann alternative, E
= EBeta(1,λ) [Q(u)].
Using the log-likelihood functions
G1 (x, λ) = n ln (λ) +
ln f (xj) + (λ− 1)
lnF (xj) (5)
G2 (x, λ) = n ln (λ) +
ln f (xj) + (λ− 1)
ln [1− F (xj)] (6)
we see that the maximum likelihood estimators to the parameter λ have
the forms
λ̂ = −
j=1 lnF (xj)
and λ̂ = −
j=1 ln [1− F (xj)]
The existing literature about distributions generated by Lehmann al-
ternatives concerns mostly distributions defined on the interval (0,∞) or in
the real line, with the paper by [Nadarajah and Kotz 2006] being the more
complete review of progresses and the paper [Nadarajah 2006] being an in-
teresting application of the concepts developed outside the original proposal
by [Gupta et alii 1998], which was to analyze lifetime data. In the present
paper, we are concerned with some information theoretical quantities of the
first extension. These are not the only papers dealing with the subject, but
a complete list with comments would be a paper on its own.
2 Kullback-Leibler Divergence
Given two probability density functions, the quantity defined as
DKL (f |g) =
f(x) ln
dx (8)
is called Kullback-Leibler Divergence (abbreviated DKL) after the authors
of the classical paper [Kullback and Leibler 1951]. Very often, this quantity
is used as a measure of distance between two probability density functions,
even though it is not a metric; This divergence measure clearly is greater
or equal than zero, with zero occurring only and only if f = g, but it is
not symmetric, so DKL (f |g) 6= DKL (g|f), and it does not obey the triangle
inequality also.
Rewriting equation (8), we get∫
f(x) ln
f(x) ln(f(x))− f(x) ln(g(x))dx (9)
= Ef [ln(f(X))]− Ef [ln(g(X))] (10)
where Ef [h(X)] is the expectation of the random variable h(X) with respect
to the probability density f . Since DKL (f |g) is greater than zero, we have
Ef [ln(f(X))] > Ef [ln(g(X))] (11)
We will now show that maximizing the likelihood is equivalent to mini-
mize DKL (f |e), where e is the empirical distribution function. Calculating
DKL (f |e) we arrive at
DKL (f |e) = Ef [ln(f(X))]−
ln (f(xj , θ)) (12)
where the rightmost term is the empirical log-likelihood multiplied by a
constant. So, maximizing the rightmost term we minimize the whole diver-
gence; Then the process of maximizing the likelihood is equivalent to mi-
nimizing the divergence between the empirical density and the parametric
model. This result is very common in the related literature, and is shown in
full detail on sources like [Eguchi and Copas 1998], which gives an accessi-
ble but rather compact deduction of properties of methods based on Like-
lihood Functions using DKL. In the next (and last) section we draw freely
from a result shown in the [Eguchi and Copas 1998] paper that states that
DKL might be used to measure the loss of power in likelihood ratio tests
when the distribution under the alternative hypothesis is mis-specified.
3 Wrong Specification of Reference Distribu-
tion and Loss of Power in Likelihood Ratio
Tests
Suppose we have data from a probability distribution H(x|θ, λ), and want
to test the hypothesis that (θ = θ0, λ = λ0). The usual log-likelihood ratio is
expressed as
Λ(λ0, θ0) =
`(λ̂, θ̂)
`(λ0, θ0)
where the notation ξ̂ is used for the unrestricted maximum likelihood es-
timative of the parameter ξ. Suppose we are not willing to (or not able
to) compute `(λ̂, θ̂) because the estimative of the parameter λ is trouble-
some and decide to approximate the likelihood ratio statistic using `(λ1, θ̃)
instead of the likelihood under the alternative hypothesis, where θ̃ is the
maximum likelihood estimator of θ given that λ = λ1. We have then the
relation
Λ(x) ≈
`(λ1, θ̃)
`(λ0, θ0)
A result by [Eguchi and Copas 1998], section 3, states that the test statistic
generated this way is less powerful than the usual one, with the loss in the
power equal to
∆Power = DKL
f(x|λ̂, θ̂), f(x|λ1, θ̃)
In the present paper, we are concerned with the case where the data
follows a distribution extended with the first Lehmann alternative, where
the original distribution is such that F = F (x|θ) for a parameter θ. The null
hypothesis will be of the form
H0 : θ = θ0, λ = 1 (16)
against a alternative hypothesis
HA : θ 6= θ0, λ 6= 1 (17)
If we erroneously consider that the data doesn’t come from a extended dis-
tribution G(x|λ, θ), but from a population that follows the original F (x|θ)
distribution, we can say that we are approximating the log-likelihood un-
der the alternative hypothesis like in the previous discussion. In this case,
the log-likelihood will be taken under the hypothesis
HA′ : θ 6= θ0, λ = 1 (18)
which generates the following expression for the log-likelihood:
Λ(x) ≈
`(1, θ̃)
`(1, θ0)
Then we have that the test has less power than the one using the full G
distribution; The difference on the power of the tests is given by
∆Power = DKL
g(x|λ̂, θ̂)|g(x|1, θ̃)
The main point in the above discussion is that for testing hypotheses about
the ”original” parameter ξ, the tests using the extended version of distribu-
tions are always more powerful, with a considerable difference in the error
type II rate.
Expanding the equation (20) we have that
∆P = DKL
g(x|λ̂, θ̂)|g(x|1, θ̃)
g(x|λ̂, θ̂) ln
g(x|λ̂, θ̂)
g(x|1, θ̃)
dx (22)
λf(x|λ̂, θ̂)Fλ−1(x|λ̂, θ̂) ln
λf(x|λ̂, θ̂)Fλ−1(x|λ̂, θ̂)
f(x|1, θ̃)
dx (23)
λf(x|λ̂, θ̂)Fλ−1(x|λ̂, θ̂) ln
λFλ−1(x|λ̂, θ̂)
dx (24)
= lnλ+
λ(λ− 1)f(x|λ̂, θ̂)Fλ−1(x|λ̂, θ̂) ln
F (x|λ̂, θ̂)
dx (25)
Integrating by parts, we get
∆Power = lnλ+
The graphic of this function is the loss of power that we have on our test
when we the distribution of our data is one extended by the first Lehmann
alternative and we fail to notice that, and is depicted in Figure 1 for values
of λ bigger than one.
References
[Eguchi and Copas 1998] EGUCHI, S. AND COPAS, J. (2006). Interpreting
Kullback-Leibler divergence with the Neyman-Pearson lemma. Journal
of Multivariate Analysis, vol. 97, Issue 9, pages 2034-2040.
[Gupta et alii 1998] GUPTA, R. C., GUPTA, P. L. AND GUPTA, R. D. (1998).
Modeling failure time data by Lehman alternatives. Communication in
Statistics: Theory and Methods, vol. 27, pages 887-904.
[Kullback and Leibler 1951] KULLBACK, S. AND LEIBLER, R. A. (1951).
On information and sufficiency. The Annals of Mathematical Statistics,
vol. 22, Number 1, pages 79-86.
[Nadarajah and Kotz 2006] NADARAJAH, S., KOTZ, S. (2006). The Expo-
nentiated Type Distributions. Acta Applicandae Mathematicae, vol. 92,
pages 97-111.
[Nadarajah 2006] NADARAJAH, S. 2006. The exponentiated Gumbel dis-
tribution with climate application. Environmetrics, vol. 17, Number 1,
pages 13-23.
Figure 1: Loss of Power as a Function of λ, for λ > 1.
	Distributions Generated by Lehmann Alternatives
	Kullback-Leibler Divergence
	Wrong Specification of Reference Distribution and Loss of Power in Likelihood Ratio Tests
ABSTRACT
  We compute the loss of power in likelihood ratio tests when we test the
original parameter of a probability density extended by the first Lehmann
alternative.

<|endoftext|><|startoftext|>
Introduction
	The method
	Two-integral dynamics
	A preliminary analysis
	Results
	Miyamoto-Nagai disks
	Thick exponential disks
	Milky-Way like galaxies
	Discussion and conclusions
	REFERENCES
ABSTRACT
  We investigate the possibility of discriminating between Modified Newtonian
Dynamics (MOND) and Newtonian gravity with dark matter, by studying the
vertical dynamics of disk galaxies. We consider models with the same circular
velocity in the equatorial plane (purely baryonic disks in MOND and the same
disks in Newtonian gravity embedded in spherical dark matter haloes), and we
construct their intrinsic and projected kinematical fields by solving the Jeans
equations under the assumption of a two-integral distribution function. We
found that the vertical velocity dispersion of deep-MOND disks can be much
larger than in the equivalent spherical Newtonian models. However, in the more
realistic case of high-surface density disks this effect is significantly
reduced, casting doubts on the possibility of discriminating between MOND and
Newtonian gravity with dark matter by using current observations.

<|endoftext|><|startoftext|>
Al’tshuler-Aronov correction to the conductivity of
a large metallic square network
Christophe Texier1, 2 and Gilles Montambaux2
1Laboratoire de Physique Théorique et Modèles Statistiques,
UMR 8626 du CNRS, Université Paris-Sud, F-91405 Orsay Cedex, France.
2Laboratoire de Physique des Solides, UMR 8502 du CNRS,
Université Paris-Sud, F-91405 Orsay Cedex, France.
(Dated: April 5, 2007)
We consider the correction ∆σee due to electron-electron interaction to the conductivity of a
weakly disordered metal (Al’tshuler-Aronov correction). The correction is related to the spectral
determinant of the Laplace operator. The case of a large square metallic network is considered. The
variation of ∆σee(LT ) as a function of the thermal length LT is found very similar to the variation
of the weak localization ∆σWL(Lϕ) as a function of the phase coherence length. Our result for
∆σee interpolates between the known 1d and 2d results, but the interaction parameter entering
the expression of ∆σee keeps a 1d behaviour. Quite surprisingly, the result is very close to the 2d
logarithmic behaviour already for LT ∼ a/2, where a is the lattice parameter.
PACS numbers: 73.23.-b ; 73.20.Fz ; 72.15.Rn
I. INTRODUCTION
At low temperature, the classical (Drude) con-
ductivity of a weakly disordered metal is affected
by two kinds of quantum corrections : the first one
is the weak localization (WL) correction, a phase
coherent contribution that originates from quan-
tum interferences between reversed electronic tra-
jectories. This contribution to the averaged con-
ductivity depends on the phase coherence length
Lϕ and the magnetic field : ∆σWL(B, Lϕ). The
temperature manifests itself through Lϕ, since
phase breaking may depend on temperature, e.g.
if it originates from electron-electron1 or electron-
phonon2 interaction.
In a metal, an electron is not only elastically
scattered on the disordered potential, but, due to
the electron-electron interaction, is also scattered
by the electrostatic potential created by the other
electrons. At low temperatures, when the elas-
tic scattering rate (1/τe) dominates the electron-
electron scattering rate (1/τee(T )), the motion of
the electron is diffusive between scattering events
with other electrons. In this regime, electron-
electron interaction is responsible for a small deple-
tion of the density of states at Fermi energy (called
the DoS anomaly or the Coulomb dip) and a cor-
rection to the averaged conductivity as well, the so-
called Al’tshuler-Aronov (AA) correction3,4,5,6,7,8,9
(see Refs.10,11,12 for a recent discussion). AA and
WL corrections are of the same order (but this
latter vanishes in a magnetic field). However, con-
trary to the WL, the AA correction is not sensitive
to phase coherence and involves another important
length scale : the thermal length LT =
(~ = kB = 1). The AA correction, denoted be-
low ∆σee(LT ), has been measured in metallic wires
in several experiments14,15,16,17. From the exper-
imental point of view, AA correction allows to
study interaction effects in weakly disordered met-
als, but also furnishes a local probe of temperature
in order to control Joule heating effects15,17, which
is crucial in a phase coherent experiment.
All the works aforementioned refer to the quasi-
one-dimensional (wire) or two-dimensional (plane)
situations. Quantum transport has also been stud-
ied in more complex geometries like networks of
quasi-1d wires. For example several studies of WL
have been provided on large regular networks in
honeycomb and square metallic networks18,19, in
square networks etched in a 2DEG20, and in square
and dice silver networks21. Theoretical studies of
WL on networks have been initiated by the works
of Douçot & Rammal22,23 and improved by Pas-
caud & Montambaux24 who introduced a powerful
tool25 : the spectral determinant of the Laplace
operator, that will be used in the following (see
also Ref.26).
The aim of this paper is to study how the AA
correction can be computed in networks. In a first
part we briefly recall how the spectral determinant
can be used to compute the WL. Then in a second
part we will consider the AA correction.
II. SPECTRAL DETERMINANT AND
WEAK LOCALIZATION
Interferences of reversed electronic trajec-
tories are encoded in the Cooperon, solu-
tion of a diffusion-like equation (∂t − D[∇ −
2ieA(x)]2)Pc(x, x′; t) = δ(x − x′)δ(t), where A(x)
is the vector potential. On large regular networks,
when it is justified to integrate uniformly the
Cooperon over the network (see Ref.27 for a discus-
sion of this point) it is meaningful to introduce the
space-averaged Cooperon Pc(t) =
Pc(x, x; t)
http://arxiv.org/abs/0704.0741v1
∆σWL = −
dt e−t/τϕ Pc(t) (1)
= −2e
lnS(γ) (2)
where τϕ = L
ϕ/D is the phase coherence time.
The factor 2 stands for spin degeneracy. We
have omitted in (1,2) a factor 1/s where s is the
cross-section of the wires. The parameter γ is re-
lated to the phase coherence length γ = 1/L2ϕ
(note that description of the decoherence due to
electron-electron interaction in networks requires
a more refined discussion28,29). The spectral de-
terminant of the Laplace operator is formally de-
fined as S(γ) = det(γ −∆) =
n(γ + En) where
{En} is the spectrum of −∆ [in the presence of
a magnetic field, ∆ → (∇ − 2ieA)2]. The inter-
est in introducing S(γ) is that it can be related
to the determinant of a V × V -matrix, where V is
the number of vertices, that encodes all informa-
tion on the network (topology, length of the wires,
magnetic field, connection to reservoirs). We la-
bel vertices by greek letters. lαβ designates the
length of the wire (αβ) and θαβ the circulation of
the vector potential along the wire. The topology
is encoded in the adjacency matrix : aαβ = 1 if
α and β are linked by a wire, aαβ = 0 otherwise.
λα = ∞ if α is connected to a reservoir and λα = 0
if not. We introduce the matrix
Mαβ = δαβ
aαµ coth
−iθαβ
where the aαµ constraints the sum to run over
neighbouring vertices. Then24
S(γ) =
γlαβ√
detM (4)
where the product runs over all wires. We now con-
sider a large square network of size Nx ×Ny made
of wires of length lαβ = a ∀(αβ). For simplicity we
impose periodic boundary conditions (topology of
a torus), which is inessential as soon as the to-
tal size of the network remains small compared
to Lϕ. At zero magnetic field the spectrum of
the adjacency matrix is ǫn,m = 2 cos(2nπ/Nx) +
2 cos(2mπ/Ny), with n = 1, · · · , Nx and m =
1, · · · , Ny. Therefore
S(γ) =
)NxNy
2 cosh
γa− cos 2πn
− cos 2πm
The calculation of lnS(γ) involves a sum that can
be replaced by an integral when Nx, Ny ≫ Lϕ/a.
Using
(2π)2
2A+ cosx+ cos y
K(1/A), (6)
where K(x) is the complete elliptic integral of first
kind30, yields20
lnS(γ) =
γa− 1√
where the volume of the network is Vol = 2NxNya.
We recover the expression of the WL first derived
by Douçot & Rammal23. Figure 1 displays the
dependence of the WL correction as a function of
the phase coherence length Lϕ. We now discuss
two limiting cases.
0 1 2 3 4 5Lϕ/a
0.1 1.0 10.0
FIG. 1: ∆σWL in unit of 2e
2/h as a function of Lϕ/a
(at zero magnetic field). The dashed line is the 1d re-
sult. The dotted line is the 2d limit eq. (10).
1d limit.– In the limit Lϕ ≪ a (i.e.
γa ≫ 1) :
∆σWL = −
−2a/Lϕ
We compare with the result for a wire of length a
connected at its extremities : ∆σwireWL ≃ − 2e
). As we can see the dominant terms coincide.
Deviations appear when Lϕ/a increases since tra-
jectories begin to feel the topology of the network.
This is already visible by comparing the second
terms of the expansions.
2d limit.– In the limit Lϕ ≫ a (i.e.
γa ≪ 1), we
obtain
lnS(γ) =
ln(4Lϕ/a) +
The conductivity reads
∆σWL ≃ −
ln(Lϕ/a) + CWL
with CWL =
2 ln 2
≃ 0.608. As noticed in the
beginning of the section, eqs. (8,10) should be di-
vided by the cross-section s of the wires. In the 2d
limit, diffusive trajectories expand over distances
larger than a and feel the two dimensional nature
of the system, being the reason why (10) is reminis-
cent of the 2d result. It is interesting to point that
the network provides a natural cutoff (the length
of the wires, a) while the computation of the WL
for a plane in the diffusion approximation requires
to introduce a cutoff by hand for lower times in
eq. (1), which is the elastic scattering time τe. In
this latter case the constant added to the logarith-
mic behaviour is not well controlled since it de-
pends on the cutoff procedure (the computation of
the constant for a plane requires to go beyond the
diffusion approximation and leads to31 ∆σ
plane
ln(2L2ϕ/ℓ
e + 1) ≃ − 2e
ln(Lϕ/ℓe) +
ln 2]
since ℓe ≪ Lϕ).
III. AL’TSHULER-ARONOV
CORRECTION
At first order in the electron-electron interac-
tion, the exchange term is the dominant contribu-
tion to the correction to the conductivity8,10,11,32
∆σee = −
dπVol
ω coth
D~q 2
U(~q, ω)
(−iω +D~q 2)3 (11)
where U(~q, ω) is the dynamically screened inter-
action. Within the RPA approximation and in
the small ~q and ω limit, the interaction takes the
form33 U(~q, ω) ≃ 1
−iω+D~q 2
D~q 2
where ρ0 is the den-
sity of states per spin channel. Replacing the
Drude conductivity by its expression σ0 = 2e
and performing an integration by parts, we get
∆σee = −
πdVol
ω coth
−iω +D~q 2 (12)
After Fourier transform, the result can be cast in
the form11 :
∆σee = −λσ
sinhπT t
Pd(t) ,
For the exchange term considered here, one finds
λσ = 4/d. Further calculation yields
8 λσ ≃
F , where F is the average of the interaction
on the Fermi surface (see definition in Refs.8,9).
This expression of λσ is valid in the perturba-
tive regime, F ≪ 1 ; nonperturbative expression
is given in Refs.6,7,8,9. Pd(t) is the space inte-
grated return probability Pd(t) =
Pd(x, x; t),
where Pd(x, x′; t) is solution of a classical diffusion
equation similar to the equation for Pc(x, x′; t),
apart that it does not feel the magnetic field :
[∂t−D∆]Pd(x, x′; t) = δ(x−x′)δ(t). Therefore the
Laplace transform of Pd(t) is given by ∂γ lnS(γ)
with θαβ = 0. It is interesting to point out that
(13) has a similar structure to (1) with a different
cutoff procedure for large time. It also involves
a different scale : the temperature dependence of
∆σee is driven by the length scale LT instead of
Lϕ for the weak-localization correction ∆σWL.
Up to eq. (13) the discussion is rather general
and nothing has been specified on the system.
We have seen in section II that the WL for the
square network presents a dimensional crossover
from 1d to 2d by tuning Lϕ/a. A similar dimen-
sional crossover occurs for the AA correction by
tuning LT /a as we will see. This remark raises the
question of the dimension d in eq. (11). To answer
this question we should return to the origin of the
factor 1/d : the current lines in the conductivity
σij produce a factor qiqj replaced by δij
~q 2 after
angular integration. Since in a network the diffu-
sion in the wires has a 1d structure (provided that
W ≪ LT ∼
D/ω, where W is the width of the
wires), the dimension in λσ is d = 1. Therefore we
have for the network λnetworkσ ≃ 4− 32F .
If one now expands the thermal function in (13)
sinh y
= 4y2
n e−2ny , (14)
we can also relate ∆σee to the spectral determi-
nant. We obtain :
∆σee = −λσ
lnS(γ)
γ= 2nπ
which is the central result of this paper. It is the
starting point of the discussion below.
Application to the case of the square network.– We
have to compute γ2 ∂
lnS(γ). We start from (7)
and compute its second derivative. We obtain after
some algebra :
∆σee = −λσ
where the function ϕ(x) is given by :
ϕ(x) = − 8
2x coshx
sinh3 x
sinh2 x
3 cothx
3 tanhx
coshx
3− 2x
sinh 2x
coshx
, (17)
E(x) being the complete elliptic integral of second
kind30. The function ϕ(x) is plotted in figure 2
and its limiting behaviours are easily obtained30 :
ϕ(x) =
+O(x2) for x → 0 (18)
+O(xe−2x) for x → ∞ (19)
The LT dependence of AA correction on a square
network is displayed on figure 3, where we have
plotted ∆σee(LT ) given by eq. (16). The di-
mensional crossover now occurs by tuning the ra-
tio LT/a. We consider the two limits.
0 5 10 15
FIG. 2: The function ϕ(x) of eq. (17).
1d limit.– For LT ≪ a we can replace the expan-
sion (19) in the series (16). Therefore
∆σee ≃ −λσ
3ζ(3/2)
3ζ(3/2)
≃ 0.782. The dominant term again
coincides with the one for a connected wire8,10,11
while the second differs by a factor 2, as for the WL
[see discussion after eq. (8)].
2d limit.– In the limit LT ≫ a we introduce
N = (LT /a)2 and cut the sum (16) in two pieces :
N . It is clear from the limit be-
0 1 2 3 4 5LT/a
ee 0.1 1.0 10.0
FIG. 3: The continuous line is ∆σee in unit of λσ
as a function of LT /a (the series (16) is computed nu-
merically). The dashed line is the 1d limit, eq. (20),
and the dotted curve is the 2d limit, eq. (21).
haviours of ϕ(x) that the first sum diverges loga-
rithmically with N while the second brings a neg-
ligible contribution of order N 0. Therefore :
∆σee ≃ −λσ
ln(LT /a) + Cee
The constant is estimated numerically. We find
Cee ≃ 0.56.
The two eqs. (20,21) should be divided by the
cross-section s of the wires.
The two functions ∆σWL(B = 0, Lϕ) (figure 1)
and ∆σee(LT ) (figure 3) are very similar. Apart
from the prefactors 2e2/h and λσe
2/h which ac-
count respectively for the spin degeneracy and the
interaction strength, the linear behaviours at the
origin have a different slope (1 and 0.782) and
the logarithmic behaviours are slightly shifted :
CWL ≃ 0.61 and Cee ≃ 0.56.
IV. COMPARISON WITH EXPERIMENTS
The AA correction has been recently measured
by Mallet et al34 in networks of silver wires with
3 104 and 105 cells, lattice spacing a = 0.64 µm
and diffusion constant D ≃ 100 cm2/s. The dif-
fusion constant D has been measured separately
(through measurement of the Drude conductance),
therefore we can compare our result (16) with ex-
periment using one fitting parameter only : the
interaction parameter λσ. The 2d logarithmic
behaviour (21) has been observed in the range
100mK< T < 1K from which the value λexpσ ≃ 3.1
was extracted, in agreement with similar mea-
surements performed on a long silver wire for
which34,35 λexp, wireσ ≃ 3.2. We now compare with
the theoretical value : for silver Fermi wavelength
is k−1F = 0.083 nm and Thomas-Fermi screening
length κ−1 = 1/
8πρ0e2 = 0.055 nm. In the
Thomas-Fermi approximation, the parameter F is
given by11 F = ( κ
)2 ln[1 + (2kF
)2], therefore
F ≃ 0.58. Using the 1d nonperturbative expres-
sion8 λσ = 4 +
1 + F/2 − 1 − F/4), we get
λthσ ≃ 3.24, close to the experimental value.
V. CONCLUSION
Equations (15,16) are our main results. The first
one shows that AA and WL can be formally re-
lated :
∆σee(LT ) =
∆σWL(Lϕ)
= 2nπ
The validity of this relation is the same as for
eqs. (1,2) : the system should be such that it
is meaningful to average uniformly the nonlocal
conductivity σ(r, r′) to get the local conductivity
drdr′
σ(r, r′). A similar discussion has been
proposed to relate WL and conductivity fluctua-
tions (see appendix E of Ref.29).
Our starting point (11) is a formulation in the
Fourier space, what implicitly assumes translation
invariance. Whereas this assumption seems rea-
sonable for a large regular network such as the
square network studied in this article, its valid-
ity is not clear for networks of arbitrary topology,
what would need further developments.
We have computed the AA correction in a large
square network and shown that the result inter-
polates between the 1d, eq. (20), and a 2d result,
eq. (21). Interestingly, the 2d limit in a network
involves a 1d constant λnetworkσ ≃ 4 − 32F , what
is confirmed by experiments, as discussed in sec-
tion IV.
The interest of the network compared to the
plane is to control the constant Cee of eq. (21) :
for a plane, a cutoff must be introduced in eq. (13)
at short time t ∼ τe and the constant Cee is
replaced by a number that depends on the pre-
cise cutoff procedure. Experimentally, it would
be interesting to observe the crossover from (20)
to (21) by varying LT/a. This was not pos-
sible in experiments of Mallet et al34 described
in section IV because measurements are compli-
cated by the fact that electron-phonon interaction
also brings a temperature-dependent contribution,
∆σe−ph, at high temperature (above few Kelvins).
The conductivity is given by σ = σ0 + ∆σWL +
∆σee + ∆σe−ph. The WL can be suppressed by a
magnetic field however the electron-phonon contri-
bution is difficult to separate from ∆σee. There-
fore the network should be patterned in a way such
that the crossover 1d-2d remains below T ∼ 1 K
where ∆σe−ph is negligible. As an example we con-
sider the silver networks studied in Ref.21 for which
LT = 0.27 × T−1/2 (LT in µm and T in K). In
order to see clearly the 1d and the 2d regimes it
would be convenient to study two networks with
different lattice spacings. If temperature is con-
strained by 10 mK< T < 1 K, for a = 0.5 µm we
have 0.5 . LT /a . 5, which probes the 2d regime
over one decade. A second lattice with a ∼ 5 µm
would allow to probe the 1d regime since in this
case 0.05 . LT/a . 0.5.
Acknowledgements
We have benefitted from stimulating discussions
with Christopher Bäuerle, Hélène Bouchiat, Meydi
Ferrier, François Mallet, Laurent Saminadayar and
Félicien Schopfer.
1 B. L. Altshuler, A. G. Aronov, and D. E. Khmel-
nitsky, Effects of electron-electron collisions with
small energy transfers on quantum localisation,
J. Phys. C: Solid St. Phys. 15, 7367 (1982).
2 S. Chakravarty and A. Schmid, Weak localization:
the quasiclassical theory of electrons in a random
potential, Phys. Rep. 140(4), 193 (1986).
3 B. L. Al’tshuler and A. G. Aronov, Contribu-
tion to the theory of disordered metals in strongly
doped semiconductors, Sov. Phys. JETP 50(5), 968
(1979).
4 B. L. Al’tshuler, D. E. Khmel’nitzkĭı, A. I. Larkin,
and P. A. Lee, Magnetoresistance and Hall effect
in a disordered two-dimensional electron gas, Phys.
Rev. B 22(11), 5142 (1980).
5 B. L. Altshuler and A. G. Aronov, Fermi-liquid the-
ory of the electron-electron interaction effects in
disordered metals, Solid State Commun. 46, 429
(1983).
6 A. M. Finkel’shtĕın, Influence of Coulomb interac-
tion on the properties of disordered metals, Sov.
Phys. JETP 57(1), 97 (1983).
7 C. Castellani, C. Di Castro, P. A. Lee, and
M. Ma, Interaction-driven metal-insulator transi-
tions in disordered fermion systems, Phys. Rev. B
30(2), 527 (1984).
8 B. L. Altshuler and A. G. Aronov, Electron-electron
interaction in disordered conductors, in Electron-
electron interactions in disordered systems, edited
by A. L. Efros and M. Pollak, page 1, North-
Holland, 1985.
9 P. A. Lee and T. V. Ramakrishnan, Disordered elec-
tronic systems, Rev. Mod. Phys. 57, 287 (1985).
10 I. L. Aleiner, B. L. Altshuler, and M. E. Gershenson,
Interaction effects and phase relaxation in disor-
dered systems, Waves RandomMedia 9, 201 (1999).
11 É. Akkermans and G. Montambaux, Mesoscopic
physics of electrons and photons, Cambridge Uni-
versity Press, 2007.
12 It is worth mentioning that a similar effect exists at
higher temperature, in the ballistic regime (τee ≪
τe) ; Ref.
13 provides a nice review on this point and
describes the crossover between the two regimes.
13 G. Zala, B. N. Narozhny, and I. L. Aleiner, Interac-
tion corrections at intermediate temperatures: Lon-
gitudinal conductivity and kinetic equation, Phys.
Rev. B 64, 214204 (2001).
14 A. E. White, M. Tinkham, W. J. Skocpol, and D. C.
Flanders, Evidence for interaction effects in the
low-temperature resistance rise in ultrathin metallic
wires, Phys. Rev. Lett. 48(25), 1752 (1982).
15 P. M. Echternach, M. E. Gershenson, H. M. Bozler,
A. L. Bogdanov, and B. Nilsson, Temperature de-
pendence of the resistance of one-dimensional metal
films with dominant Nyquist phase breaking, Phys.
Rev. B 50(8), 5748 (1994).
16 F. Pierre, A. B. Gougam, A. Anthore, H. Pothier,
D. Esteve, and N. O. Birge, Dephasing of electrons
in mesoscopic metal wires, Phys. Rev. B 68, 085413
(2003).
17 C. Bäuerle, F. Mallet, F. Schopfer, D. Mailly,
G. Eska, and L. Saminadayar, Experimental Test
of the Numerical Renormalization Group Theory
for Inelastic Scattering from Magnetic Impurities,
Phys. Rev. Lett. 95, 266805 (2005).
18 B. Pannetier, J. Chaussy, R. Rammal, and P. Gan-
dit, First Observation of Altshuler-Aronov-Spivak
effect in gold and copper, Phys. Rev. B 31(5), 3209
(1985).
19 G. J. Dolan, J. C. Licini, and D. J. Bishop, Quan-
tum Interference Effects in Lithium Ring Arrays,
Phys. Rev. Lett. 56(14), 1493 (1986).
20 M. Ferrier, L. Angers, A. C. H. Rowe, S. Guéron,
H. Bouchiat, C. Texier, G. Montambaux, and
D. Mailly, Direct measurement of the phase co-
herence length in a GaAs/GaAlAs square network,
Phys. Rev. Lett. 93, 246804 (2004).
21 F. Schopfer, F. Mallet, D. Mailly, C. Texier,
G. Montambaux, L. Saminadayar, and C. Bäuerle,
Dimensional crossover in quantum networks: from
mesoscopic to macroscopic physics, Phys. Rev. Lett.
98, 026807 (2007).
22 B. Douçot and R. Rammal, Quantum oscillations
in normal-metal networks, Phys. Rev. Lett. 55(10),
1148 (1985).
23 B. Douçot and R. Rammal, Interference effects and
magnetoresistance oscillations in normal-metal net-
works: 1. weak localization approach, J. Physique
47, 973–999 (1986).
24 M. Pascaud and G. Montambaux, Persistent cur-
rents on networks, Phys. Rev. Lett. 82, 4512 (1999).
25 Pascaud & Montambaux have rather considered
thermodynamic properties. The nonlocal effects in
networks have been further investigated in Ref.27.
26 E. Akkermans, A. Comtet, J. Desbois, G. Montam-
baux, and C. Texier, On the spectral determinant
of quantum graphs, Ann. Phys. (N.Y.) 284, 10–51
(2000).
27 C. Texier and G. Montambaux, Weak localization
in multiterminal networks of diffusive wires, Phys.
Rev. Lett. 92, 186801 (2004).
28 T. Ludwig and A. D. Mirlin, Interaction-induced
dephasing of Aharonov-Bohm oscillations, Phys.
Rev. B 69, 193306 (2004).
29 C. Texier and G. Montambaux, Dephasing due
to electron-electron interaction in a diffusive ring,
Phys. Rev. B 72, 115327 (2005).
30 I. S. Gradshteyn and I. M. Ryzhik, Table of in-
tegrals, series and products, Academic Press, fifth
edition, 1994.
31 A. Cassam-Chenai and B. Shapiro, Two dimen-
sional weak localization beyond the diffusion ap-
proximation, J. Phys. I France 4, 1527 (1994).
32 The formula (5.1) of Ref.8 has the wrong sign.
33 This interaction assumes that the screening length
is smaller than the transverse size of the wire.
34 F. Mallet et al, to be published (2007).
35 L. Saminadayar, P. Mohanty, R. A. Webb, P. De-
giovanni and C. Bäuerle, Phase coherence in the
presence of magnetic impurities, Physica E, to be
published (2007).
ABSTRACT
  We consider the correction $\Delta\sigma_\mathrm{ee}$ due to
electron-electron interaction to the conductivity of a weakly disordered metal
(Al'tshuler-Aronov correction). The correction is related to the spectral
determinant of the Laplace operator. The case of a large square metallic
network is considered. The variation of $\Delta\sigma_\mathrm{ee}(L_T)$ as a
function of the thermal length $L_T$ is found very similar to the variation of
the weak localization $\Delta\sigma_\mathrm{WL}(L_\phi)$ as a function of the
phase coherence length. Our result for $\Delta\sigma_\mathrm{ee}$ interpolates
between the known 1d and 2d results, but the interaction parameter entering the
expression of $\Delta\sigma_\mathrm{ee}$ keeps a 1d behaviour. Quite
surprisingly, the result is very close to the 2d logarithmic behaviour already
for $L_T\sim{a}/2$, where $a$ is the lattice parameter.

<|endoftext|><|startoftext|>
Introduction
At low temperature, quantum interferences of reversed electronic trajectories are responsible
for a small reduction of the averaged conductivity called the weak localization (WL) correction.
This correction is a manifestation of quantum coherence which is always limited over a certain
length scale, named the phase coherence length Lϕ. A way to extract this important length
scale in experiments is to use the magnetic field sensitivity of the WL. For example the WL
correction of an infinitely long wire of rectangular section of width W and area S submitted to
a perpendicular magnetic field B is 1 〈∆σ〉 = −2e2
)2]−1/2 (in the following we
will forget the 1/S factor). The width of the magnetoconductance (MC) curve provides a direct
determination of Lϕ.
... ...
(b) ......
Figure 1: Chains of rings. If we consider the regime b ≫ Lϕ ≫ L the rings can be considered as independent in
case (a) but not in case (b).
Another possibility to extract phase coherence length is to study arrays of rings whose
MC present oscillations as a function of the flux φ per ring with period half the quantum flux
φ0 = h/e. These are the famous Al’tshuler-Aronov-Spivak oscillations
2 (AAS), observed in
many experiments3,4,5,6. In order to extract the phase coherence length from AAS oscillations,
a precise theoretical prediction for the behaviour of the AAS harmonics with the phase coherence
length is needed. Harmonics of the oscillations are defined as 〈∆σ(φ)〉 =
n∆σne
4πinφ/φ0 . A
well-known expression has been derived in Ref. 1 for an isolated ring of perimeter L :
∆σn = −
−|n|L/Lϕ (1)
However, in a real experiment where the ring is connected to wires, or embedded in a larger
network, this expression can only be relevant in the regime Lϕ ≪ L. At the lowest tempera-
tures, when Lϕ & L, the AAS harmonics are strongly affected by the surrounding wires since
trajectories can expand outside the ring over distances larger than the perimeter. It is the aim
of this paper to discuss the behaviour of AAS harmonics in chains of rings when Lϕ & L. We
will consider two cases represented on figure 1 : in the first situation the rings are separated
http://arxiv.org/abs/0704.0742v1
by a distance b ≫ Lϕ and can therefore be considered as independent (however the connecting
wires will affect the AAS harmonics). In the second case rings are in contact and harmonics can
involve trajectories winding around several neighbouring rings. In section 3, we will see that
when electron-electron interaction is the dominant process for decoherence, eq. (1) cannot be
used even in the regime L ≪ Lϕ.
2 Nonlocality of quantum transport in chain of rings
We consider an array of rings all pierced by the same flux φ. The n-th harmonic of the WL
correction at a given point x of a network can be expressed as
∆σn(x) = −
dtPn(x, x; t) e−t/τϕ (2)
where D is the diffusion constant and τϕ = L
ϕ/D the phase coherence time. The factor 2 stands
for spin degeneracy. Pn(x, x; t) is the probability that a particle diffusing into the network
comes back to its initial point x in a time t, after having encircled a flux nφ. For example,
in an isolated ring Pn(x, x; t) = 1√
−(nL)2/4Dt which immediatly gives eq. (1). Except in
translation invariant systems, ∆σn(x) depends on x and expression (2) must be averaged over
the network in a proper way described in Ref. 7.
A ring with two arms.– The case of a ring connected to two arms has been studied in detail
in Ref. 8 where it has been shown that Pn(x, x; t) ≃
2(Dt)3/4
(Dt)1/4
) for time scales t ≫ L2/D
with x inside the ring (the precise form of the dimensionless function Ψ(ξ) is inessential for the
present discussion). Compared with the isolated ring case, where the typical number of winding
scales with time as nt ∼ t1/2, diffusion around the ring is slowed down as nt ∼ t1/4 due to the
time spent in the arms. As a consequence the harmonics of the conductance of a ring are given
by 8 : ∆σn ∝ Lϕ3/2e−|n|
2L/Lϕ (note that the scaling n ∼ L1/2ϕ is analogous to the scaling of
winding with time nt ∼ t1/4 since Lϕ ∼ t1/2).
The chain of distant rings.– The same argument holds for the chain of rings separated by
a distance b ≫ Lϕ (figure 1.a). In this case, averaging properly ∆σn(x) inside the chain of Nr
rings, one finds that the harmonics of the dimensionless conductance read :
∆gn ≃ −
1/2Lϕ
2 [(Nr + 1)b]2
2L/Lϕ for b ≫ Lϕ ≫ L (3)
The chain of attached rings.– If we now consider the network of figure 1.b, we can show
that the probability reads 9 Pn(x, x; t) ≃ L8πDte
−(nL)2/4Dt for t ≫ L2/D. The AAS harmonics
are given in this case by 9 :
∆gn ≃ −
[ln(2Lϕ/|n|L) + bn] for Lϕ/L ≫ |n| (4)
−|n|L/Lϕ
|n|L/Lϕ
for |n| ≫ Lϕ/L ≫ 1 (5)
where bn depends weakly on n (b∞ = −C, the Euler constant).
3 Decoherence due to electron-electron interaction
The above results rely on the fact that, in eq. (2), the long times have been cut off with an
exponential damping e−t/τϕ . However it has been shown recently that this simple modelization
does not account correctly for the decoherence due to electron-electron interaction, which is the
dominant one at low temperature a. In this case, an alternative description was proposed by
Al’tshuler, Aronov & Khmel’nitskii (AAK) 12 but it is only recently that the consequences for
AAS oscillations have been understood13,14,15.
The model of AAK.– The length scale characterizing the efficiency of electron-electron inter-
action to suppress phase coherence in wires is known as the Nyquist length LN = (ν0D
2/T )1/3
where ν0 is the density of states, D the diffusion constant and T the temperature (~ = kB = 1).
In the model of AAK, the random phase accumulated by an electron moving in the fluctuat-
ing electric potential due to other electrons is included in the calculation of the WL. The pair
of reversed interfering trajectories picks a phase eiΦ[C], where C designates a closed diffusive
trajectory, and the harmonics of WL are given by
∆σn ∼ −
〈eiΦ[Cn]〉V = −
〈Φ[Cn]2〉V (6)
The sum runs over all closed trajectories with winding n (a proper formulation of eq. (6) requires
a path integral). Gaussian fluctuations of the electric potential are given by the fluctuation-
dissipation theorem 〈V (~r, t)V (~r ′, t′)〉V = 2e
Tδ(t − t′)Pd(~r,~r ′) (written here in the classical
limit T ≪ ω), where σ0 is the classical Drude conductivity. Pd is solution of the diffusion
equation −∆Pd(~r,~r ′) = δ(~r−~r ′) and therefore depends on the topology of the system. Then14
〈Φ[x(τ)]2〉V =
dτ [Pd(x(τ), x(τ)) − Pd(x(τ), x(t− τ))] (7)
where C ≡ (x(τ), 0 6 τ 6 t |x(0) = x(t)) is a closed diffusive path. The crucial point is that
the simple exponential damping of eq. (2) is replaced in eq. (6) by a functional of the trajectory
〈Φ[Cn]2〉V . Therefore decoherence is now network-dependent and a priori sensitive to the
nature of trajectories (in particular whether they do enclose a magnetic flux or not).
The limit LN ≪ L.– The model described above was applied to the case of a single
ring 13,14,15. The result for an isolated ring is relevant to describe arrays of rings in the limit
LN ≪ L where winding trajectories hardly exit from a ring, which makes rings independent
from each other. For the chain of distant rings (figure 1.a) we have
∆gn ∼ −
[(Nr + 1)b]2
−|n|π
(L/LN )
3/2 ∼
−nL3/2T 1/2
T 1/3
for LN ≪ L ≪ b (8)
(for the case of the chain of rings in contact (figure 1.b), (Nr + 1)b in the denominator is
replaced by NrL/4). Whereas the time characterizing efficiency of electron-electron interaction
to suppress phase coherence in a wire is the Nyquist time τN = L
N/D ∝ T−2/3, it was shown in
Refs. 13,14 that the behaviour (8) is related to a new time scale characterizing decoherence for
winding trajectories : τc = τ
L ∝ T
−1, where τL = L
2/D is the Thouless time of the ring.
The chain of distant rings.– If we consider a ring connected to long arms, winding trajectories
spend most of the time in the arms8 and the decoherence mostly occurs in the arms. Therefore
decoherence occurs on a time scale τN , like in a wire. The function 〈eiΦ[C]〉V for a wire was
a The exponential damping gives the correct shape of a MC of a wire 10,11 with Lϕ →
2LN (see below
for definition of LN ), however this simple substitution gives an incorrect result for AAS harmonics as explained
below.
studied in Ref. 16. Using this result and the winding properties recalled in section 2 leads to 14
∆gn ∼ −
[(Nr + 1)b]2
for n2 ≪ LN/L (9)
[(Nr + 1)b]
)7/12
−κ2|n|
L/LN ∼ e
−nL1/2T 1/6
T 11/36
for n2 ≫ LN/L , (10)
where κ2 =
2|u1|1/4 ≃ 1.421.
The chain of attached rings.– In this case, the nature of decoherence was shown to be closely
related to the one of a wire since diffusion along the chain is reminiscent of a 1d diffusion and
again occurs on time scale τN
∆gn ≃ −
ln(LN/|n|L) + cste for |n| ≪ LN/L (11)
≃ − 1
Nr|u1|3/2
−κ3|n|L/LN ∼ e−nLT 1/3 for |n| ≫ LN/L (12)
where κ3 = 2
−1/3|u1|1/2 ≃ 0.801.
4 Conclusion
We have considered networks of connected rings, made of weakly disordered wires. We have first
shown that geometrical effects can strongly modify the exponential behaviour of AAS harmonics
well-known for an isolated ring, since trajectories can now explore the network around each
ring. In the second part we have shown that decoherence due to electron-electron interaction
is sensitive to geometry, a second reason that modifies the simple AAS result. An interesting
experiment would be to compare precisely AAS oscillations for the two networks of figure 1 in
the low temperature regime LN ≫ L.
References
1. B. L. Al’tshuler and A. G. Aronov, JETP Lett. 33(10), 499 (1981).
2. B. L. Al’tshuler, A. G. Aronov, and B. Z. Spivak, JETP Lett. 33(2), 94 (1981).
3. B. Pannetier, J. Chaussy, R. Rammal, and P. Gandit, Phys. Rev. B 31(5), 3209 (1985).
4. G. J. Dolan, J. C. Licini, and D. J. Bishop, Phys. Rev. Lett. 56(14), 1493 (1986).
5. M. Ferrier, L. Angers, A. C. H. Rowe, S. Guéron, H. Bouchiat, C. Texier, G. Montambaux,
and D. Mailly, Phys. Rev. Lett. 93, 246804 (2004).
6. F. Schopfer, F. Mallet, D. Mailly, C. Texier, G. Montambaux, L. Saminadayar, and
C. Bäuerle, Phys. Rev. Lett. 98, 026807 (2007).
7. C. Texier and G. Montambaux, Phys. Rev. Lett. 92, 186801 (2004).
8. C. Texier and G. Montambaux, J. Phys. A: Math. Gen. 38, 3455–3471 (2005).
9. C. Texier and G. Montambaux, in preparation (2007).
10. F. Pierre, A. B. Gougam, A. Anthore, H. Pothier, D. Esteve, and N. O. Birge, Phys.
Rev. B 68, 085413 (2003).
11. É. Akkermans and G. Montambaux, Physique mésoscopique des électrons et des photons,
EDP Sciences, CNRS éditions, 2004. Mesoscopic physics of electrons and photons, Cam-
bridge University Press, 2007.
12. B. L. Altshuler, A. G. Aronov, and D. E. Khmelnitsky, J. Phys. C: Solid St. Phys. 15,
7367 (1982).
13. T. Ludwig and A. D. Mirlin, Phys. Rev. B 69, 193306 (2004).
14. C. Texier and G. Montambaux, Phys. Rev. B 72, 115327 (2005) ; ibid 74, 209902(E)
(2006).
15. C. Texier and G. Montambaux, Comment on Ref. 13, submitted (2007).
16. G. Montambaux and E. Akkermans, Phys. Rev. Lett. 95, 016403 (2005).
	Introduction
	Nonlocality of quantum transport in chain of rings
	Decoherence due to electron-electron interaction
	Conclusion
ABSTRACT
  We study weak localization in chains of metallic rings. We show than
nonlocality of quantum transport can drastically affect the behaviour of the
harmonics of magnetoconductance oscillations. Two different geometries are
considered: the case of rings separated by long wires compared to the phase
coherence length and the case of contacted rings. In a second part we discuss
the role of decoherence due to electron-electron interaction in these two
geometries.

<|endoftext|><|startoftext|>
Diatomicmolecule as a quantum entanglement switch
Adam Rycerz ∗
Marian Smoluchowski Institute of Physics, Jagiellonian University, Reymonta 4, 30–059 Kraków, Poland
Abstract
We investigate a pair entanglement of electrons in diatomic molecule, modeled as a correlated double quantum dot attached to
the leads. The low-temperature properties are derived from the ground state obtained by utilizing the Rejec-Ramšak variational
technique within the framework of EDABI method, which combines exact diagonalization with ab initio calculations. The results
show, that single-particle basis renormalization modifies the entanglement-switch effectiveness significantly. We also found the
entanglement signature of a competition between an extended Kondo and singlet phases.
Key words: Correlated nanosystems, Entanglement manipulation, EDABI method
PACS: 73.63.-b, 03.67.Mn, 72.15.Qm
Quantum entanglement, as one of the most intriguing
features of quantum mechanics, have spurred a great deal
of scientific activity during the last decade, mainly because
it is regarded as a valuable resource in quantum commu-
nication and information processing [1]. The question on
entanglement between microscopic degrees of freedom in a
condensed phase have been raised recently [2], in hope to
shed new lights on the physics of quantum phase transitions
and quantum coherence [3]. In the field of quantum elec-
tronics, a pair entanglement appeared to be a convenient
tool to characterize the nature of transport through quan-
tum dot, since its vanish when the system is in a Kondo
regime [4]. The analogical behavior was observed for two
qubits in double quantum dot, for either serial and paral-
lel configuration [5]. The latter case is intriguing, since the
concurrence [6] at T = 0 changes abruptly from C ≈ 1 to
C = 0 when varying the interdot coupling, so a finite An-
derson system shows a true quantum phase transition.
Here we consider a nanoscale version of such an entangle-
ment switch, inspired by conductance measurements for a
single hydrogenmolecule [7]. A special attention is payed to
electron-correlation effects, in particular the wave-function
renormalization [8]. Recent experiment [9] shows the cur-
rent through a molecule is carried by a single conductance
channel, so serial configuration shown in Fig. 1 seems to be
the realistic one. The Hamiltonian of the system is
∗ Corresponding author. Tel: (+48 12) 663–55–68 Fax: (+48 12)
633–40–79
Email address: rycerz@th.if.uj.edu.pl (Adam Rycerz).
���������� �������� �������� ������������������������ ��������
Fig. 1. Diatomic molecule modeled as a double quantum dot attached
serially to the leads. A cross-section of the single-particle potential
along the main system axis is shown schematically.
H = HL + VL +HC + VR +HR, (1)
where HC models the central region, HL(R) describes the
left (right) lead, and VL(R) is the coupling between the lead
and the central region. Both HL(R) and VL(R) terms have a
tight–binding form, with the chemical potential in leads µ,
the hopping t, and the tunneling amplitude V , as depicted
schematically in Fig. 1. The central-region Hamiltonian
iσcjσ +
iσ 6=jσ′
Uijniσniσ′ + (Ze)
2/R (2)
(with i, j = 1, 2 and σ =↑, ↓) describes a double quan-
tum dot with electron-electron interaction. tij and Uij are
single-particle and interaction elements, the last term de-
scribes the Coulomb repulsion of the two ions at the dis-
tance R. Here we put Z = 1 and calculate all the param-
eters tij , Uij as the Slater integrals [10] for 1s-like hydro-
genic orbitals Ψ1s(r) =
α3/π exp(−α|r|), where α−1 is
the orbital size (cf. Fig. 1). The parameter α is optimized
to get a minimal ground-state energy for whole the system
Preprint submitted to Elsevier 15 November 2018
http://arxiv.org/abs/0704.0743v2
-3 -2 -1  0  1
chemical potential,   
 0  1  2  3  4
 average filling,   
〈n1+n2〉
Fig. 2. Entanglement and transport through the system in Fig. 1 as
a function of the chemical potential µ (top panel) and the average
filling 〈n1+n2〉 (bottom panel). Tick (thin) solid and dashed lines
shows the concurrence C (conductance G) for Γ = t/9 and t/4,
respectively. The limits Γ → 0 are depicted with dotted lines in the
bottom panel. The interatomic distance is R = 1.5a.
 1  2  3  4  5  6
interatomic distance,   
Fig. 3. Concurrence (tick lines) and conductance (thin lines) at
the half-filled sector 〈n1+n2〉 = 2 as a function of the interatomic
distance R. The remaining parameters are the same as in Fig. 2.
described by the Hamiltonian (1). Thus, following the idea
of EDABI method [8], we reduce the number of physical
parameters of the problem to just a three: the interatomic
distance R, the lead-molecule hybridization Γ = V 2/t, and
the chemical potential µ (we put the lead hopping t =
1 Ry = 13.6 eV to work in the wide–bandwidth limit).
The entanglement between electrons placed on two atoms
can be characterized by the charge concurrence [4]
C = 2max
0, |〈c
iσcjσ〉| −
〈niσnjσ〉〈n̄iσ n̄jσ〉
where n̄iσ ≡ 1−niσ. We also discuss the conductivity cal-
culated from the formula G = G0 sin
2(E+−E−)/4tN [11],
where G0 = 2e
2/h̄, and E± are the ground-state ener-
gies of the system with periodic and antiperiodic boundary
conditions, respectively. Either the energies E± or correla-
tion functions in Eq. (3) are calculated within the Rejec–
Ramšak variational method [11], complemented by the or-
bital size optimization, as mentioned above. We use up to
N = 104 sites to reach the convergence.
In Fig. 2 we show the concurrence and conductance for
R = 1.5a0 (where a0 is the Bohr radius) and two values
of the hybridization Γ = t/9 and t/4. The conductance
spectrum asymmetry, caused by wave-function renormal-
ization [8], is followed by an analogical effect on entangle-
ment, which changes significantly faster for the upper con-
duction band, where the average filling is 〈n1+n2〉 ≈ 3
(one extra electron). The asymmetry vanish when analyz-
ing the system properties as a function of 〈n1+n2〉, show-
ing it originates from varying charge compressibility χc =
∂〈n1+n2〉/∂µ ≈ 2/(U11 + U12) ∼ 1/α. We also note the
convergence of discussed quantities with Γ → 0 to C ≈
1− |〈n1+n2〉 − 2|/2 and G ≈ G0 sin
2(π〈n1+n2〉/2).
Entanglement evolution with R is illustrated in Fig. 3,
where we focus on the charge neutral section 〈n1+n2〉 =
2. The abrupt entanglement drop follows the sharp con-
ductance peak for Γ = t/9, which is associated with the
competition between double Kondo and spin/charge sin-
glet phases [12]. For Γ = t/4 both C andG dependence onR
become smooth, but the switching behavior is still present.
Earlier, we have shown that Γ = t/4 is large enough to
cause molecule instability and therefore may allow the in-
dividual atom manipulation [8].
In conclusion, we analyzed a pair entanglement of elec-
trons in diatomic molecule attached serially to the leads.
Entanglement evolution with the chemical potential speeds
up remarkably for the negatively charged system, due to
electron correlation effects. The switching behavior was also
observed when changing the interatomic distance.
The work was supported by Polish Science Foundation
(FNP), and Ministry of Science Grant No. 1 P03B 001 29.
References
[1] See review by C.H. Bennet and D.P. Divincenzo, Nature 404, 247
(2000); M.A. Nielsen and I. L. Chuang, Quantum Computation
and Quantum Information (Cambridge, 2000).
[2] A. Osterloh et al., Nature 416, 608 (2002); T. J. Osborne and
M. A. Nielsen, Phys. Rev. A 66, 032110 (2002); S.–J. Gu et al.,
Phys. Rev. Lett. 93, 086402 (2004).
[3] J. van Wezel, J. van den Brink, J. Zaanen, Phys. Rev. Lett. 94,
230401 (2005); cond-mat/0606140.
[4] A. Rycerz, Eur. Phys. J. B 52, 291 (2006); S. Oh, J. Kim, Phys.
Rev. B 73, 052407 (2007).
[5] A. Ramšak, J. Mravlje, R. Žitko, J. Bonča, Phys. Rev. B 74,
241305(R) (2006); R. Žitko, J. Bonča, ibid. 74, 045312 (2006).
[6] W.K. Wootters, Phys. Rev. Lett. 80, 2245 (1998).
[7] R.H.M. Smit et al., Nature 419, 906 (2002).
[8] J. Spa lek et al., cond-mat/0610815.
[9] D. Djukic, J.M. van Ruitenbeek, Nano. Lett. 6, 789 (2006); M.
Kiguchi et al., cond-mat/0612681.
[10] J. C. Slater, Quantum Theory of Molecules and Solids, McGraw–
Kill (New York, 1963), Vol. 1, p. 50.
[11] T. Rejec, A. Ramšak, Phys. Rev. B 68, 033306 (2003).
[12] P.S. Cornaglia, D.R. Grempel, Phys. Rev. B 71, 075305 (2005);
J. Mravlje, A. Ramšak, T. Rejec, ibid. 73, 241305(R) (2006).
	References
ABSTRACT
  We investigate a pair entanglement of electrons in diatomic molecule, modeled
as a correlated double quantum dot attached to the leads. The low-temperature
properties are derived from the ground state obtained by utilizing the
Rejec-Ramsak variational technique within the framework of EDABI method, which
combines exact diagonalization with ab initio calculations. The results show,
that single-particle basis renormalization modifies the entanglement-switch
effectiveness significantly. We also found the entanglement signature of a
competition between an extended Kondo and singlet phases.

<|endoftext|><|startoftext|>
Introduction and Setting
Let (Ω,F ,P) be a probability space carrying an N -dimensional Brownian motion
(Wt)t≥0 with a d × d correlation matrix. We consider smooth curves Fǫ : R →
L2(Ω;RN ) of random variables, where ǫ ∈ R is a parameter. We apply Taylor
theorems to obtain strong approximations of the curve Fǫ at ǫ = 0 and we apply
partial integration on Wiener space to obtain weak approximations of the law of Fǫ
for small values of ǫ.
We choose the notion Taylor expansion instead of asymptotic expansion in order
to point out that the strong method is indeed a classical Taylor expansion with usual
conditions for convergence. The weak method represents a truncated converging
power series in the parameter ǫ if – for instance – the payoff f : RN → R stems
from a real analytic function and some distributional properties are satisfied.
2. Weak and strong Taylor methods - Structure Theorems
We introduce in this section two concepts of approximation. Consider a curve
ǫ 7→ Fǫ, where ǫ ∈ R and Fǫ ∈ L2(Ω;RN ).
Definition 1. A strong Taylor approximation of order n ≥ 0 is a (truncated)
power series
(2.1) Tnǫ (Fǫ) :=
such that
(2.2) E
|Fǫ −Tnǫ (Fǫ)|
= o(ǫn),
Financial support from the Austrian Science Fund (FWF) under grant P 15889 and the START-
prize-grant Y328-N13 is gratefully acknowledged. Furthermore this work was financially supported
by the Christian Doppler Research Association (CDG). The authors gratefully acknowledge a
fruitful collaboration and continued support by Bank Austria and the Austrian Federal Financing
Agency (ÖBFA) through CDG.
http://arxiv.org/abs/0704.0745v1
2 MARIA SIOPACHA AND JOSEF TEICHMANN
holds true as ǫ→ 0.
Remark 1. In our setting a strong Taylor approximation of any order n ≥ 0 of
the curve Fǫ can always be obtained, see for instance [KM97].
Let f : RN → R be a Lipschitz function with Lipschitz constant K, then we
obtain
(2.3) E
|f(Fǫ)− f
ǫ (Fǫ)
‖Fǫ −Tnǫ (Fǫ)‖
= Ko(ǫn).
Equation (2.3) does not hold anymore if f is not globally Lipschitz continuous.
In particular, we observe the dependence of the right hand side on the Lipschitz
constant K. Hence, truncating an a-priori known Taylor expansion leads to an
error term, which contains the Lipschitz constant and is therefore not useful for
non-Lipschitz claims. The weak method navigates around this feature by partial
integration.
Definition 2. A weak Taylor approximation of order n ≥ 0 is a power series
for each bounded, measurable f : RN → R,
ǫ (f, Fǫ) :=
E(f(F0)πi),
where πi ∈ L1(Ω) denote real valued, integrable random variables, such that
f(Fǫ)
−Wnǫ (f, Fǫ)| = o(ǫn).
Remark 2. The weights πi for i ≥ 1 are called Malliavin weights.
Remark 3. If the law of Fǫ is real analytic at ǫ = 0 in the weak sense, i.e. if there
exist (signed) measures µi such that for all bounded, measurable f : R
N → R the
following series converges and the equality
f(Fǫ)
f(x)µi(dx),
holds true, precisely then we do have a converging weak Taylor expansion. We aim
for constructing stochastic representations of the following type, for i ≥ 0:
f(x)µi(dx) = E(f(F0)πi).
For the definition of the weak Taylor approximation to make sense, existence of
the Malliavin weights has to hold. The following theorem can be found in a slightly
different version in [MT06] and goes back to S. Watanabe. For the definition and
notion of D∞(RN ) see [Mal97] or [Nua06].
Theorem 1. Let Fǫ : R → D∞(RN ) be smooth and assume that the Malliavin co-
variance matrix γ(Fǫ) is invertible with p-integrable inverse for every p ≥ 1 around
ǫ = 0 (i.e. on an open interval containing ǫ = 0). Then there is a weak Taylor
approximation of any order n ≥ 0 and there are explicit formulas for the weights
πi. If we only know that the Malliavin covariance matrix γ(F0) is invertible with
p-integrable inverse, then we can also calculate the Malliavin weights, since they
depend only on γ(F0).
WEAK AND STRONG TAYLOR METHODS FOR NUMERICAL SOLUTIONS OF SDES 3
Proof. Fix n ≥ 0 and take a smooth test function f : RN → R and assume that
γ−1(Fǫ) exists as a smooth curve in D∞ on a open ǫ-interval containing ǫ = 0. By
standard arguments we can prove the following formula
f(Fǫ)
f(Fǫ)δ
s 7→ (DsFǫ)Tγ−1(Fǫ)
More precisely, by the integration by parts [Nua06, Definition 1.3.1-(1.42)], the
chain rule [Nua06, Proposition 1.2.3] and the definition of the Malliavin covariance
matrix, [Nua06, page 92], we obtain from the right hand side the desired left hand
side.
Notice that the ǫ-dependence of the Skorohod integral is smooth due to basic
properties of D∞. Hence, we can calculate higher derivatives of the left hand side
by iterating the above procedure and differentiating the Skorohod integral. We
denote
(2.4) π1 := δ
s 7→ (DsFǫ)Tγ−1(Fǫ)
We write then, pars pro toto, the formula for the second derivative
E(f(Fǫ)) = E
f(Fǫ)δ
s 7→ π1(DsFǫ)Tγ−1(Fǫ)
f(Fǫ)δ
s 7→ (Ds
γ−1(Fǫ)
f(Fǫ)δ
s 7→ (DsFǫ)Tγ−1(Fǫ)
dγ(Fǫ)
γ−1(Fǫ)
f(Fǫ)δ
s 7→ (DsFǫ)Tγ−1(Fǫ)
This formula makes perfect sense at ǫ = 0 and – by induction – we see that we can
perform this step for any derivative. The general, recursive result is the following:
as := (DsFǫ)
γ−1(Fǫ)
for 0 ≤ s ≤ T,
πn := δ(s 7→ asπn−1) +
πn−1,
π0 := 1.
Here we understand the weights πn as ǫ-dependent, whereas in the final formulas
we put ǫ = 0. This proves the result for smooth test functions f and under the
assumption that the Malliavin covariance matrix is invertible around ǫ = 0. If we
approximate a bounded, measurable function f by smooth test functions we obtain
the desired assertion by standard arguments, since the weights are integrable. �
Remark 4. By Taylor’s theorem and the Faà-di-Bruno-formula we obtain
dnf(Fǫ)
|α|≤n
f (α)(Fǫ)pα,
where pα is a well-defined polynomial in derivatives of the curve ǫ 7→ Fǫ, for a
multi-index α. Since D∞ is an algebra, see [Mal97], the above expression lies in
4 MARIA SIOPACHA AND JOSEF TEICHMANN
Lp(Ω) for each p ≥ 0. The previous result provides a representation of the partial
integration result for
|α|≤n
f (α)(Fǫ)pα) = E(f(Fǫ)πn).
The structure of the weights is seen from above. The result can be considered as
a dual version of the Faà-di-Bruno-formula. However, the structure of this dual
formula is much simpler.
We provide an example to demonstrate the strong and weak method of approx-
imation. The method works in order to replace time-consuming iteration schemes,
like the Euler-scheme, by simulations of “simple” Itô integrals.
Example 1. We deal with a generic, real-valued random variable over a one-
dimensional Gaussian space, see [Nua06], i.e.
where the F i lie in the (i+1)st Wiener chaos Hi+1(Ω) (one can think of a Hermite
expansion for instance) and the sum is understood in the L2-sense. From the strong
expansion we obtain immediately – for a given Lipschitz function f : R → R – that
f(Fǫ)
f(F 0 + ǫF 1)
| ≤ Ko(ǫ),
as ǫ→ 0, where K denotes the Lipschitz constant of f . This simple approximation
can be sometimes quite useful.
We assume now that F 0 =
h(s)dWs has non-vanishing variance in order to
calculate the weights, which do depend only on γ(F0). The strong Taylor approxi-
mation is given by definition, the weak Taylor expansion can be constructed by the
previous recursive formulas and the specifications
0 = h(s),
γ(F 0) =
h(s)2ds,
h(s)2ds
In order to obtain a first-order approximation for bounded, measurable random
variables we therefore have to calculate
f(F 0)
f(F 0)π1
where
π1 = δ
s 7→ asF 1
This amounts to an integration of f times a polynomial with respect to a Gaussian
density, since:
f(F 0)π1
f(F 0)F 1
asdWs
f(F 0)DsF
Notice that the strong approximation does not yield such a result for bounded,
measurable random variables. Notice also that in the given case the approximation
can be calculated in a deterministic way, since we deal with Gaussian integrations.
WEAK AND STRONG TAYLOR METHODS FOR NUMERICAL SOLUTIONS OF SDES 5
The second-order weak Taylor approximation is given by
f(F 0)
f(F 0)π1
+ ǫ2E
f(F 0)π2
where
π2 = δ
s 7→ π1asF 1
s 7→ asF 2
3. Applications from Financial Mathematics
For applications we want to deal with strong and weak Taylor approximations
of a given curve of random variables. We are particulary interested in cases, where
the first derivative dFǫ
|ǫ=0 is of simple form or – even more important – where the
Malliavin covariance matrix γ(F0) is of simple form. In these cases it is easy to
obtain first or second order approximations of the respective quantities in the weak
or strong sense.
In what follows, first we will present one of the most applied interest rate models,
namely the LIBOR market model (LMM). Then, we will introduce the commonly
used technique of freezing the drift. We will show how to embed the”freezing the
drift” technique into our framework of Taylor approximations. We understand
freezing the drift as a strong Taylor approximation of order zero in the drift term
of the LIBOR SDE. Our goal is to put this technique into a method, where we
can in particular improve the order of approximation. We will finally extend the
assumption of log normality and develop a stochastic volatility LMM, where we will
show how to obtain tractable option prices via our weak Taylor approximations.
3.1. The LIBOR Market Model. We apply our concepts to the LMM, initially
constructed by [BGM97], [MSS97] and [Jam97]. Let T denote a strictly positive
fixed time horizon and (Ω,FT ,P, (Ft)0≤t≤T ) be a complete probability space, sup-
porting an N -dimensional Brownian motion Wt = (W
t , ...W
t )0≤t≤T . The factors
are correlated with dW it dW
t = ρijdt. Let 0 = T0 < T1 < T2 < . . . < TN <
TN+1 =: T be a discrete tenor structure and α := Ti+1 − Ti the accrual factor for
the time period [Ti, Ti+1], i = 0, . . . , N . Let P (t, Ti) denote the value at time t
of a zero coupon bond with maturity Ti ∈ [0, T ]. The measure P is the terminal
forward measure, which corresponds to taking the final bond P (t, T ) as numéraire.
The forward LIBOR rate Lit := Lt(Ti, Ti+1) at time t ≤ Ti for the period [Ti, Ti+1]
is given by:
Lit = Lt(Ti, Ti+1) =
P (t, Ti)
P (t, Ti+1)
We assume that for any maturity Ti there exists a bounded, continuous, determin-
istic function σi(t) : [0, Ti] → R, which represents the volatility of the LIBOR Lit,
i = 1, ..., N . The log normal LIBOR market model can be expressed under the
measure P as:
(3.1) dLit = σ
i(t)Lit
j=i+1
1 + αL
dt+ σi(t)LitdW
t , i = 1, ..., N.
3.2. Freezing the Drift. The dynamics of forward LIBORs for i = 1, ..., N − 1
depend on the stochastic drift term
, i ≤ j ≤ N , which is determined by LI-
BOR rates with longer maturities. This random drift prohibits analytic tractability
when pricing products that depend on more that one LIBOR rate, since there is
no unifying measure under which all LIBOR rates are simultaneously log normal.
6 MARIA SIOPACHA AND JOSEF TEICHMANN
In addition, it encumbers the numerical implementation of the model. Common
practice is to approximate this term by its starting value
or as it is widely
referred to as freezing the drift, i.e.
1 + αL
1 + αL
It was first implemented in the original paper [BGM97] for the pricing of swaptions
based on the LMM. [BW00] and [Sch02] argue that freezing the drift is justified due
to the fact that this term has small variance. However, by freezing the drift there
is a difference in option prices with the real and the frozen drift. It has not been
examined how big the error is or for which assets it works well or not. Our aim is
to investigate such a phenomenon and improve the performance by providing with
correction terms of order one.
3.3. Correcting the Frozen Drift. The purpose of this section is to embed the
well-known and often applied technique of freezing the drift into the strong and
weak Taylor approximations, in order to develop a method to improve the order of
accuracy. Specifically for the strong Taylor approximation, the method works well,
since we always deal with a globally Lipschitz drift term x 7→ αx+
1+αx+
with small
Lipschitz constant α.
Remark 5. As it will be clear later, the strong Taylor correction method can be
accommodated with any extension of the log normal LMM, for example with the
Lévy LIBOR model by Eberlein and Özkan [EÖ05].
3.3.1. Strong Taylor Approximation. We first state a useful lemma, asserting that
we can indeed freeze the drift under special model formulation and choice parame-
ters.
Lemma 1. Let ǫ1 ∈ R and consider for i = 1, . . . , N the following stochastic
differential equation:
(i,ǫ1)
t = ǫ1
σi(t)X
(i,ǫ1)
j=i+1
(j,ǫ1)
1 + αX
(j,ǫ1)
ρijdt+ dW
,(3.2)
defined on the complete probability space (Ω,FT ,P, (Ft)0≤t≤T ) where Wt is an N -
dimensional Brownian motion under the measure P with dW it dW
t = ρijdt. Then
the first-order strong Taylor approximation for X
(i,ǫ1)
t is given by:
(3.3) T1ǫ1(X
(i,ǫ1)
t ) = X
(i,0)
t + ǫ1
(i,ǫ1)
Proof. By (1) we obtain for n = 1:
(i,ǫ1)
t ) ≃ X
(i,ǫ1)
t = X
(i,0)
0 + ǫ1Y
t + o(ǫ1),
since X
(i,0)
t = X
(i,0)
0 and where Y
|ǫ1=0X
(i,ǫ1)
t is the first-order correction
term. By differentiating (3.2) with respect to ǫ1, we calculate:
(i,ǫ1)
= σi(t)X
(i,0)
j=i+1
(j,0)
1 + αX
(j,0)
ρijdt+ dW
WEAK AND STRONG TAYLOR METHODS FOR NUMERICAL SOLUTIONS OF SDES 7
and derive Y it as the solution to the above linear SDE:
(3.4) Y it =
−σi(s)X(i,0)0
j=i+1
(j,0)
1 + αX
(j,0)
σi(s)X
(i,0)
with Y i0 = 0. �
Remark 6. We parametrise the LIBOR market model in terms of the parameter
ǫ1 as follows:
(i,ǫ1)
t = σ
i(t)L
(i,ǫ1)
j=i+1
(j,ǫ1)
1 + αX
(j,ǫ1)
ρijdt+ dW
and assume at t = 0 that L
(i,ǫ1)
0 = X
(i,ǫ1)
0 for all ǫ1 and all i = 1, ..., N . If ǫ1 = 1,
what we obtain is the standard LIBOR market model formulation and in particular
(i,1)
t = X
(i,1)
t . For ǫ1 = 0, X
(i,0)
t equals its starting value and thus the drift term
in the following SDE is no longer stochastic:
(3.5) dL
(i,0)
t = σ
i(t)L
(i,0)
j=i+1
(j,0)
1 + αX
(j,0)
ρijdt+ dW
The next proposition provides a way for a pathwise approximation of L
(i,ǫ1)
t , by
means of adjusting its SDE. This is achieved by adding Tnǫ1(X
(j,ǫ1)
t ) in the frozen
drift part.
Proposition 1. Assume the setup of Lemma 1 and assume further at t = 0 that
(i,ǫ1)
0 = X
(i,ǫ1)
0 for all ǫ1 and all i = 1, ..., N . Then the stochastic differential
equation for L
(i,ǫ1)
t with the unfrozen drift:
(3.6) dL
(i,ǫ1)
t = σ
i(t)L
(i,ǫ1)
j=i+1
(j,ǫ1)
1 + αX
(j,ǫ1)
ρijdt+ dW
can be strongly approximated as ǫ1 ↓ 0 by
(3.7) dL̂
(i,ǫ1)
t = σ
i(t)L̂
(i,ǫ1)
j=i+1
Tnǫ1(X
(j,ǫ1)
σj(t)
1 + α
Tnǫ1(X
(j,ǫ1)
ρijdt+ dW
Remark 7. For n = 0, we derive the ”freezing the drift” case. For n = 1, we
already obtain an improvement.
Proof. First step is to interchange X
(j,ǫ1)
t with (X
(j,ǫ1)
t )+ in (3.6) to obtain:
(i,ǫ1)
t = σ
i(t)L
(i,ǫ1)
j=i+1
(j,ǫ1)
t )+σ
1 + α(X
(j,ǫ1)
ρijdt+ dW
This yields no change for the dynamics of L
(i,ǫ1)
t , since X
(j,ǫ1)
t = (X
(j,ǫ1)
t )+.
8 MARIA SIOPACHA AND JOSEF TEICHMANN
By Taylor’s expansion, we know that as ǫ1 ↓ 0, L̂(i,ǫ1)t → L
(i,ǫ1)
t P-a.s. The
estimate for the error term is given by
log L̂
(i,ǫ1)
t − logL
(i,ǫ1)
σi(s)
j=i+1
Tnǫ1(X
(j,ǫ1)
σj(s)
1 + α
Tnǫ1(X
(j,ǫ1)
ρij +
j=i+1
(j,ǫ1)
t )+σ
1 + α(X
(j,ǫ1)
α|X(j,ǫ1)s − (Tnǫ1(X
(j,ǫ1)
s ))+|ds.
Remark 8. The SDE for the approximated L̂
(i,ǫ1)
t is easier and faster to simulate
than (3.1), as it is exhibited by the following example. Notice additionally that
(i,ǫ1)
t is a continuous functional of the process Y
t (3.4) and of the Brownian path
W it . Eventually, by using L̂
(i,ǫ1)
t as the LIBOR rates, the computational complexity
of the drift and thus of the model can be reduced substantially, while maintaining
accuracy of prices.
Example 2. In this example, we examine the performance of the strong Taylor
correction method. Let N = 3 and consider pricing a caplet on the LIBOR rate L1
with strike K. Its price is given by:
0 = αEP
L1T1 −K
Assume that the volatility functions σi(t) : [0, Ti] → R for i = 1, 2, 3 are given by
(cf. Brigo and Mercurio [BM01], formulation (6.12)):
σi(t) =
a(Ti − t) + d
− b(Ti − t)
where the constants a, b, d, e are the same for all three LIBOR rates and are equal
to a = −0.113035, b = 0.22911, d = −a, e = 0.684784. Thus, we can write the
model under the terminal measure P as:
(1,ǫ1)
t = σ
1(t)L
(1,ǫ1)
(2,ǫ1)
2(t)ρ12
1 + αX
(2,ǫ1)
(3,ǫ1)
3(t)ρ13
1 + αX
(3,ǫ1)
dt+ σ1(t)L
(1,ǫ1)
(2,ǫ1)
t = σ
2(t)L
(2,ǫ1)
(3,ǫ1)
3(t)ρ23
1 + αX
(3,ǫ1)
dt+ σ2(t)L
(2,ǫ1)
dL3t = σ
3(t)L3t dW
(1,ǫ1)
t = ǫ1
σ1(t)X
(1,ǫ1)
(2,ǫ1)
2(t)ρ12
1 + αX
(2,ǫ1)
(3,ǫ1)
3(t)ρ13
1 + αX
(3,ǫ1)
dt+ σ1(t)X
(1,ǫ1)
(2,ǫ1)
t = ǫ1
σ2(t)X
(2,ǫ1)
(3,ǫ1)
3(t)ρ23
1 + αX
(3,ǫ1)
dt+ σ2(t)X
(2,ǫ1)
(3,ǫ1)
t = ǫ1
σ3(t)X
(3,ǫ1)
with initial values L
(i,ǫ1)
0 = X
(i,ǫ1)
0 = ci, for i = 1, 2, 3 and for all ǫ1. The Brownian
motion vector (W 1t ,W
t ) is correlated with correlation coefficient ρij given by:
ρij = 0.49 + (1− 0.49) exp (−0.13|i− j|), i, j = 1, 2, 3.
WEAK AND STRONG TAYLOR METHODS FOR NUMERICAL SOLUTIONS OF SDES 9
The SDEs for the approximated LIBOR rates L̂
(1,ǫ1)
t and L̂
(2,ǫ1)
t are given by:
(1,ǫ1)
t = σ
1(t)L̂
(1,ǫ1)
c2 + ǫ1Y
σ2(t)ρ12
1 + α
c2 + ǫ1Y
c3 + ǫ1Y
σ3(t)ρ13
1 + α
c3 + ǫ1Y
+ σ1(t)L̂
(1,ǫ1)
(2,ǫ1)
t = σ
2(t)L̂
(2,ǫ1)
c3 + ǫ1Y
σ3(t)ρ23
1 + α
c3 + ǫ1Y
dt+ σ2(t)L̂
(2,ǫ1)
The partial derivative terms Y 2t and Y
t are equal to:
Y 2t = c2
( ∫ t
σ2(s)dW 2s −
αc3ρ23
1 + αc3
σ2(s)σ3(s)ds
Y 3t = c3
σ3(s)dW 3s .
We compare three caplet prices:
• benchmark price, underlying L(1,ǫ1)t ;
• strong Taylor price, underlying L̂(1,ǫ1)t ;
• frozen drift price, underlying L(1,0)t .
Numerical results in basis points (bps) are displayed in Table 2 for parameters
ǫ1 = 1, N = 3, α = 0.50137, c1 = 3.86777%, c2 = 3.7574%, c3 = 3.8631%,
T1 = 1.53151, Ti = T1 + iα, i = 2, 3, 4. We characteristically observe the difference
in prices between the benchmark and frozen drift price, whilst our strong Taylor
correction method performs very well and is computationally simpler and faster.
strikes K=3% K=3.5% K=4% K=5.75% K=6.25% K=8%
benchmark 11.1831 8.5897 6.5503 3.0349 2.4423 1.2969
strong Taylor 11.0687 8.5691 6.5867 3.1448 2.5513 1.3926
frozen drift 13.9551 11.1822 8.8803 4.6313 3.8506 2.2524
Table 1: Caplet values in bps for parameters ǫ1 = 1, α = 0.50137, c1 = 3.86777%, c2 = 3.7574%,
c3 = 3.8631% and T1 = 1.53151.
3.3.2. Weak Taylor Approximation. In what follows, we provide some results on
how to correct option prices obtained by the SDE with the frozen drift (3.5) by
adding a correction term involving the appropriate Malliavin weight. Let L
i,k,ǫ1
denote the vector of the LIBOR rates (L
(i,ǫ1)
, . . . , L
(k,ǫ1)
Proposition 2. Assume the setup of Lemma 1, where the ith LIBOR rate is given
(3.8) dL
(i,ǫ1)
t = σ
i(t)L
(i,ǫ1)
j=i+1
(j,ǫ1)
1 + αX
(j,ǫ1)
ρijdt+ dW
with L
(i,ǫ1)
0 = X
(i,ǫ1)
0 for all ǫ1 and all i = 1, ..., N . Assume furthermore that the
Malliavin covariance matrix γ(L
i,k,0
) is invertible. Then the price of an option with
10 MARIA SIOPACHA AND JOSEF TEICHMANN
payoff g(L
i,k,ǫ1
), for i ≤ k ≤ N and g bounded measurable, can be approximated by
the weak Taylor approximation of order one:
a(g,L
i,k,ǫ1
) = P (0, T )
i,k,0
+ ǫ1EP
i,k,0
,(3.9)
where the Malliavin weight ζTi is given by:
ζTi = δ
i,k,0
)Tγ−1(L
i,k,0
i,k,ǫ1
,(3.10)
for t ≤ Ti.
Proof. The weight ζTi is obtained by (2.4). Notice that we can write:
i,k,ǫ1
i,k,0
and hence the result (3.9) by Definition 2 for n = 1. �
Example 3. In this example we let N = 3 and we price a payers swaption with
strike price K and maturity T1, where the underlying swap is entered at T1 and has
payment dates T2 and T3. We assume that the volatility functions σ
i(t) : [0, Ti] → R
for i = 1, 2, 3 are constant:
σ1(t) = σ1, σ
2(t) = σ2, σ
3(t) = σ3,
such that we obtain under the terminal measure P:
(1,ǫ1)
t = σ1L
(1,ǫ1)
(2,ǫ1)
1 + αX
(2,ǫ1)
(3,ǫ1)
1 + αX
(3,ǫ1)
dt+ dW 1t
(2,ǫ1)
t = σ2L
(2,ǫ1)
(3,ǫ1)
1 + αX
(3,ǫ1)
dt+ dW 2t
dL3t = σ3L
t ,(3.11)
(1,ǫ1)
t = ǫ1
(1,ǫ1)
(2,ǫ1)
1 + αX
(2,ǫ1)
(3,ǫ1)
1 + αX
(3,ǫ1)
dt+ dW 1t
(2,ǫ1)
t = ǫ1
(2,ǫ1)
(3,ǫ1)
1 + αX
(3,ǫ1)
dt+ dW 2t
(3,ǫ1)
t = ǫ1
(3,ǫ1)
with initial values L
(i,ǫ1)
0 = X
(i,ǫ1)
0 = ci, for i = 1, 2, 3 and for all ǫ1. W
t and
W 2t are correlated with correlation coefficient ρ12. We freeze the drifts in the above
equations to obtain:
(1,0)
t = c1 exp
( αc2σ2
1 + αc2
αc3σ3
1 + αc3
(2,0)
t = c2 exp
( αc3σ3
1 + αc3
L3t = c3 exp
Similarly to the previous example, we compare four option prices:
• benchmark price;
• frozen drift;
WEAK AND STRONG TAYLOR METHODS FOR NUMERICAL SOLUTIONS OF SDES 11
• strong Taylor price;
• weak Taylor price.
The weak correction formula (3.9) adds a correction term to the closed form price
of the option. The swaption payoff at Ti can be found for example in [MR98]:
swptn
k=i+1
(1 + αL
if the underlying swap is entered at time Ti and has payment dates Ti+1, ..., T . αk
is given by:
Kα, k = i+ 1, . . . , N,
1 +Kα, k = N + 1.
The payers swaption value at time t = 0 can be written as:
swptn
0 = P (0, Ti)EPi
swptn
= P (0, T )EP
(1 + αL
)− (1 +Kα)
,(3.12)
where αi := −1 and Pi denotes the forward measure corresponding to the bond
P (t, Ti) as numéraire. Therefore, its benchmark price is given by the above formula
with N = 2 and i = 1:
swptn
0 = P (0, T )
αL1T1 + αL
+ α2L1T1L
−Kα2L2T1 − 2Kα
Its weak Taylor price is given by (3.9) with i = 1 and k = N = 2.
swptn
0 = P (0, T )
(1,0)
(2,0)
+ α2L
(1,0)
(2,0)
−Kα2L(2,0)
− 2Kα
+ ǫ1EP
(1,0)
(2,0)
+ α2L
(1,0)
(2,0)
−Kα2L(2,0)
− 2Kα
The weight ζT1 is given by (3.10). The partial derivative terms C
|ǫ1=0L
(1,ǫ1)
and C2T1 :=
|ǫ1=0L
(2,ǫ1)
are given by:
C1T1 = L
(1,0)
σ1ρ12
( σ3αc3β2
(1 + αc3)
t− (β2 + β3)W 2t
C2T1 = L
(2,0)
−σ2β3W 2t dt,
with C10 = C
0 = 0 and β2 :=
(1+αc2)2
, β3 :=
(1+αc3)2
. The Malliavin covariance
matrix of the vector (L
(1,0)
(2,0)
) is equal to:
(1,0)
(2,0)
(1 + ρ212)(L
(1,0)
)2T1σ
1 2ρ12(L
(1,0)
(2,0)
)T1σ1σ2
2ρ12(L
(1,0)
(2,0)
)T1σ1σ2 (1 + ρ
12)(L
(2,0)
)2T1σ
⇒ det
(1,0)
(2,0)
(1,0)
(2,0)
)2T 21 σ
2(1− ρ212).
12 MARIA SIOPACHA AND JOSEF TEICHMANN
The determinant is not zero as long as ρ12 6= 1, which is a natural assumption.
Hence under this condition, its inverse is given by:
(1,0)
(2,0)
(1− ρ212)
1+ρ212
(1,0)
)2T1σ
− 2ρ12
(1,0)
(2,0)
T1σ1σ2
− 2ρ12
(1,0)
(2,0)
T1σ1σ2
1+ρ212
(2,0)
)2T1σ
Write the weight ζT1 = ζ
+ ζ2T1 , where the first weight ζ
is obtained as:
ζ1T1 =
(1,0)
C1T1γ
11 + C
γ−112
+D1tL
(2,0)
C1T1γ
21 + C
γ−122
δW 1t ,
and ζ2T1 similarly:
ζ2T1 =
(1,0)
C1T1γ
11 + C
γ−112
+D2tL
(2,0)
C1T1γ
21 + C
γ−122
δW 2t .
Performing all necessary calculations, we conclude that:
ζ1T1 = ρ12
W 1T1
(σ3αc3β2T1
2(1 + αc3)
− (β2 + β3)
W 2t dt
ρ12(β2 + β3)T1
− ρ12
(ρ12β3T1
W 2t dt
Analogously we obtain ζ2T1 as:
ζ2T1 = ρ
W 2T1
(σ3αc3β2T1
2(1 + αc3)
(β2 + β3)
W 2t dt
(β2 + β3)T1
(β3T1
W 2t dt
Notice that the weights are functions of normal variables and thus the calculation of
the weak Taylor price amounts just to computation of deterministic integrals. Table
3 gives the swaption prices in bps for parameters N = 3, α = 0.25, σ1 = 18%,
σ2 = 15%, σ3 = 12%, c0 = 5.28875%, c1 = 5.37375%, c2 = 5.40%, c3 = 5.40125%
and ρ12 = 0.75.
strikes K=4% K=4.5% K=4.75% K=5% K=5.15% K=5.25%
benchmark 10.2240 6.5386 4.7454 3.1060 2.2599 1.7758
frozen drift 10.2132 6.5326 4.7419 3.1028 2.2582 1.7618
strong Taylor 10.2240 6.5386 4.7454 3.1060 2.2599 1.7758
weak Taylor 10.2266 6.5407 4.7485 3.1064 2.2593 1.7626
Table 2: Swaption values in bps for parameters ǫ1 = 1, α = 0.25, σ1 = 18%, σ2 = 15%,
σ3 = 12%, c0 = 5.28875%, c1 = 5.37375%, c2 = 5.40%, c3 = 5.40125% and ρ12 = 0.75.
WEAK AND STRONG TAYLOR METHODS FOR NUMERICAL SOLUTIONS OF SDES 13
3.4. The Stochastic Volatility LIBOR Market Model. In this section, we
develop a stochastic volatility LMM. The stochastic volatility parameter vt follows
a square root process, like in the extensively applied Heston model [Hes93]. The
resulting model, called hereafter the stochastic volatility LMM (SVLMM), has the
following dynamics under the terminal measure:
dLit = σ
i(t)Lit
j=i+1
1 + αL
vtdt+ dW
, i = 1, ..., N,(3.13)
dvt = κ(θ − vt)dt+ ǫ2
vtdBt,
where κ, θ, ǫ2 ∈ R+. The Brownian motions Wt = (W 1t , ...,WNt ) and Bt are ex-
pressed under the terminal measure with correlations dW it dBt = ρidt and dW
ρijdt for i, j = 1, ...N . We assume additionally that the filtration (Ft)0≤t≤T is gen-
erated by both Brownian motions. Observe that the process vt is a time-changed
squared Bessel process with dimension δ = 4κθ/ǫ22. If δ ≥ 2, then the point zero is
unattainable. So we require 2κθ ≥ ǫ22 for the process vt not to reach zero.
3.4.1. Pricing a multi-LIBOR option. In this section, we aim at approximating the
price of an option with payoff depending on the vector L
i,k,ǫ1,ǫ2
i,ǫ1,ǫ2
, . . . , L
k,ǫ1,ǫ2
We interpret the volatility of the volatility parameter ǫ2 as a parameter on which
the LIBOR rates depend. Overall, we parametrise the SVLMM by both ǫ1 and ǫ2
and correct prices in a weak sense introducing Malliavin weights.
Proposition 3. Consider the SVLMM (3.13) and assume that the Malliavin co-
variance matrix γ(L
i,k,0,0
) is invertible. Then the price of an option with payoff
i,k,ǫ1,ǫ2
), i ≤ k ≤ N , where ψ is a bounded measurable function, can be approx-
imated by the weak Taylor approximation of order one:
(ǫ1,ǫ2)
i,k,ǫ1,ǫ2
)) = P (0, T )
i,k,0,0
+ ǫ1EP
i,k,0,0
+ ǫ2EP
i,k,0,0
,(3.14)
where the Malliavin weights ζTi , πTi are given by:
ζTi = δ
i,k,0,0
)Tγ−1(L
i,k,0,0
i,k,ǫ1,0
,(3.15)
πTi = δ
i,k,0,0
)Tγ−1(L
i,k,0,0
i,k,0,ǫ2
),(3.16)
for t ≤ Ti.
Proof. The weights ζTi and πTi are obtained by (2.4). We derive (3.14) by noticing
that:
i,k,ǫ1,ǫ2
i,k,0,0
i,k,ǫ1,0
i,k,0,ǫ2
i,k,0,0
+ ǫ1E
i,k,0,0
+ ǫ2E
i,k,0,0
from Definition 2 for n = 1. �
14 MARIA SIOPACHA AND JOSEF TEICHMANN
Example 4. Let N = 2 and consider the SVLMM where the volatility functions
σi(t) : [0, Ti] → R for i = 1, 2 are assumed to be constant and in particular σ1(t) =
σ1, σ
2(t) = σ2. We derive an approximative formula for the price of a payers
swaption with maturity T1 and strike price K. The underlying swap is entered at
T1 and has payment dates T2, T3. Under the terminal measure P we can write the
SDEs for the LIBOR rates and stochastic volatility as:
dvǫ2t = κ
θ − vǫ2t
dt+ ǫ2
vǫ2t dBt,
(1,ǫ1,ǫ2)
t = −L
(1,ǫ1,ǫ2)
t ρ12
(2,ǫ1,ǫ2)
1 + αX
(2,ǫ1,ǫ2)
t dt+ σ1L
(1,ǫ1,ǫ2)
vǫ2t dW
(2,ǫ2)
t = σ2L
(2,ǫ2)
vǫ2t dW
(2,ǫ1)
t = ǫ1
(2,ǫ2)
vǫ2t dW
W 1t and W
t are assumed to be correlated, so correlations are as dW
t dBt = ρidt
and dW 1t dW
t = ρ12 for i = 1, 2. The (0, 0)-model is given by:
v0t = exp (−κt)(v00 − θ) + θ,
(1,0,0)
= c1 exp
v0t dW
( αc2ρ12
1 + αc2
(2,0)
= c2 exp
v0t dW
(2,0,0)
t = c2,
with c :=
v0t dt = θT1 −
(exp (−κT1) − 1). As in the previous example, we
compare the following option prices:
• benchmark price;
• frozen drift;
• weak Taylor price (3.14).
The benchmark price is given by (3.12) with N = 2 and i = 1:
swptn
0 = P (0, T )EP
(1,ǫ1,ǫ2)
(2,ǫ2)
(1,ǫ1,ǫ2)
(2,ǫ2)
−Kα2L(2,ǫ2)
The weak Taylor price is obtained by (3.14):
swptn
0 = P (0, T )
(1,0,0)
(2,0)
+ α2L
(1,0,0)
(2,0)
−Kα2L(2,0)
− 2Kα
+ ǫ1EP
(1,0,0)
(2,0)
+ α2L
(1,0,0)
(2,0)
−Kα2·
· L(2,0)
− 2Kα
+ ǫ2EP
(1,0,0)
(2,0)
+ α2L
(1,0,0)
(2,0)
−Kα2L(2,0)T1 − 2Kα
We calculate the Malliavin weights ζT1 , πT1 as given by (3.15) and (3.16) corre-
spondingly. We can express the weight ζT1 as:
ζT1 = ζ
+ ζ2T1 ,
WEAK AND STRONG TAYLOR METHODS FOR NUMERICAL SOLUTIONS OF SDES 15
with:
ζ1T1 =
(1,0,0)
(1,ǫ1,0)
γ−1(L
(1,0,0)
(2,0,0)
(2,0,0)
(1,ǫ1,0)
γ−1(L
(1,0,0)
(2,0,0)
δW 1t ,
ζ2T1 =
(1,0,0)
(1,ǫ1,0)
γ−1(L
(1,0,0)
(2,0,0)
(2,0,0)
(1,ǫ1,0)
γ−1(L
(1,0,0)
(2,0,0)
δW 2t .
since ∂
|ǫ1=0 L
(2,ǫ1,0)
= 0. The partial derivative term with respect to ǫ1 for L
given by:
(1,ǫ1,0)
(1,0,0)
(1 + αc2)2
v0sdW
= −σ1ρ12β2L(1,0,0)T1
θ(T1 − t)−
v00 − θ
(exp (−κT1)− exp (−κt))
dW 2t ,
where β2 =
(1+αc2)2
. Similarly the weight πT1 is given by:
πT1 = π
+ π2T1 ,
with:
π1T1 =
(l,0,0)
(j,0,ǫ2)
(1,0,0)
(2,0,0)
δW 1t ,
π2T1 =
(l,0,0)
(j,0,ǫ2)
(1,0,0)
(2,0,0)
δW 2t .
Partial derivative terms are equal to:
(1,0,ǫ2)
(1,0,0)
exp (−κt)
exp (κs)
v0sdBsdW
αc2σ1σ2ρ12
1 + αc2
exp (κs)
exp (−κT1)− exp (−κs)
Doing similar calculations, we derive the second partial derivative:
(2,ǫ2)
(2,0)
σ1σ2Vtdt
where Vt = exp (−κt)
exp (κs)
v0sdBs.
16 MARIA SIOPACHA AND JOSEF TEICHMANN
We calculate the Malliavin covariance matrix γ
(1,0,0)
(2,0,0)
and its in-
verse.
(1 + ρ212)(L
(1,0,0)
)2σ21
v0t dt
︸ ︷︷ ︸
2ρ12L
(1,0,0)
(2,0,0)
σ1σ2c
2ρ12L
(1,0,0)
(2,0,0)
σ1σ2c (1 + ρ
12)(L
(2,0,0)
)2σ22c
⇒ det
(1,0,0)
(2,0,0)
(1,0,0)
(2,0,0)
)2σ21σ
2(1− ρ212).
Hence its inverse is given by, for ρ12 6= 1:
γ−1 =
(1− ρ212)
1+ρ212
(1,0,0)
)2σ21c
− 2ρ12
(1,0,0)
(2,0,0)
σ1σ2c
− 2ρ12
(1,0,0)
(2,0,0)
σ1σ2c
1+ρ212
(2,0,0)
)2σ22c
If we define Xi =
v0t dW
t , i = 1, 2 and Y =
θ(T1−t)− v
(exp (−κT1)−
exp (−κt))
dW 2t , we finally obtain the weights as:
ζ1T1 = −
ρ12β2
X1Y − Cov(X1, Y )
ζ2T1 =
ρ212β2
X2Y − Cov(X2, Y )
Moreover, for the weight πT1 we define:
exp (−κT1)− exp (−κt)
and random variables Di, Zi for i = 1, 2:
g(s)dW isdW
g(s)dZisdW
where the Brownian motions Zit are independent from W
t and f(t) =
exp (−κt)√
g(s) = exp (κs)
v0s . Therefore, we obtain the weights as:
π1T1 =
X1(ρ1D1 +
1− ρ21Z1) +
(αc2(2ρ12σ2 + σ1) + σ1
1 + αc2
(αc2(2ρ12σ2 + σ1) + σ1
1 + αc2
X1(ρ2D2 +
1− ρ22Z2)+
σ1BX1
− σ1ρ1E
WEAK AND STRONG TAYLOR METHODS FOR NUMERICAL SOLUTIONS OF SDES 17
where E equals to
1− exp (−κT1)
. Similarly we get π2T1 as:
π2T1 =
X2(ρ2D2 +
1− ρ22Z2) +
σ2X2 + 1
− σ2ρ2E
− ρ12
X2(ρ1D1 +
1− ρ21Z1) +
σ1BX2
(αc2(2ρ12σ2 + σ1) + σ1
1 + αc2
− σ1ρ2E
(αc2(2ρ12σ2 + σ1) + σ1
1 + αc2
In this example, the weights are functions of normal variables and double sto-
chastic integrals, which are computed via simulation. Table 4 reports the swaption
prices in bps with parameters N = 2, α = 1.5, σ1 = 25%, σ2 = 15%, c0 = 5.28875%,
c1 = 5.4%, c2 = 5.39%, v0 = 1, ρ1 = −0.75, ρ2 = −0.6, κ = 2.3767, θ = 0.2143,
ǫ2 = 25%, ρ12 = 0.63.
strikes K=3.5% K=4% K=5% K=6% K=7% K=8%
benchmark 3.8984 2.9221 1.2588 0.3858 0.1019 0.0216
(0, 0)-model 3.8951 2.9053 1.2705 0.3966 0.0942 0.0185
weak Taylor 3.8990 2.9159 1.2694 0.3791 0.1042 0.0210
Table 3: Stochastic volatility swaption values in bps for parameters ǫ1 = 1, α = 1.5, σ1 = 25%,
σ2 = 15%, c0 = 5.28875%, c1 = 5.4%, c2 = 5.39%, v0 = 1, ρ1 = −0.75, ρ2 = −0.6,
κ = 2.3767, θ = 0.2143, ǫ2 = 25%, ρ12 = 0.63.
References
[BGM97] A. Brace, D. Gatarek, and M. Musiela, The Market Model of Interest Rate Dynamics,
Mathematical Finance 7 (1997), no. 2, 127–155.
[BM01] D. Brigo and F. Mercurio, Interest Rate Models: Theory and Practice, Springer Finance,
Springer, 2001.
[BW00] A. Brace and R.S. Womersley, Exact Fit to the Swaption Volatility Matrix using Semi-
definite Programming, Working paper, presented at ICBI Global Derivatives Confer-
ence, Paris, April 2000, 2000.
[EÖ05] E. Eberlein and F. Özkan, The Lévy LIBOR model, Finance and Stochastics 9 (2005),
327348.
[Hes93] S. Heston, A Closed-Form Solution for Options with Stochastic Volatility with Applica-
tions to Bond and Currency Options, The Review of Financial Studies 6 (1993), no. 2,
327–343.
[Jam97] F. Jamshidian, LIBOR and Swap Market Models and Measures, Finance and Stochas-
tics 1 (1997), 293–330.
[KM97] A. Kriegl and P. W. Michor, The Convenient Setting of Global Analysis, American
Mathematical Society, 1997.
[Mal97] P. Malliavin, Stochastic Analysis, Springer, 1997.
[MR98] M. Musiela and M. Rutkowski, Martingale Methods in Financial Modelling, second ed.,
Springer, 1998.
[MSS97] K. Miltersen, K. Sandmann, and D. Sondermann, Closed Form Solutions for Term
Structures Derivatives with Log-Normal Interest Rates, Journal of Finance 52 (1997),
409–430.
[MT06] P. Malliavin and A. Thalmaier, Stochastic Calculus of Variations in Mathematical Fi-
nance, Springer, 2006.
18 MARIA SIOPACHA AND JOSEF TEICHMANN
[Nua06] D. Nualart, The Malliavin Calculus and Related Topics, second ed., Springer Verlag,
2006.
[Sch02] E. Schlögl, A Multicurrency Extension of the Lognormal Interest Rate Market Models,
Finance and Stochastics 6 (2002), 173196.
Department of Mathematical Methods in Economics, Vienna University of Technol-
ogy, Wiedner Hauptstrasse 8–10/105–1, A-1040 Vienna, Austria.
E-mail address: [josef.teichmann,siopacha]@fam.tuwien.ac.at
	1. Introduction and Setting
	2. Weak and strong Taylor methods - Structure Theorems
	3. Applications from Financial Mathematics
	3.1. The LIBOR Market Model
	3.2. Freezing the Drift
	3.3. Correcting the Frozen Drift
	3.4. The Stochastic Volatility LIBOR Market Model
	References
ABSTRACT
  We apply results of Malliavin-Thalmaier-Watanabe for strong and weak Taylor
expansions of solutions of perturbed stochastic differential equations (SDEs).
In particular, we work out weight expressions for the Taylor coefficients of
the expansion. The results are applied to LIBOR market models in order to deal
with the typical stochastic drift and with stochastic volatility. In contrast
to other accurate methods like numerical schemes for the full SDE, we obtain
easily tractable expressions for accurate pricing. In particular, we present an
easily tractable alternative to ``freezing the drift'' in LIBOR market models,
which has an accuracy similar to the full numerical scheme. Numerical examples
underline the results.

<|endoftext|><|startoftext|>
Finite bias visibility of the electronic Mach-Zehnder interferometer
Preden Roulleau, F. Portier, D. C. Glattli,∗ and P. Roche†
Nanoelectronic group, Service de Physique de l’Etat Condensé,
CEA Saclay, F-91191 Gif-Sur-Yvette, France
A. Cavanna, G. Faini, U. Gennser, and D. Mailly
CNRS, Phynano team,
Laboratoire de Photonique et Nanostructures,
Route de Nozay, F-91460 Marcoussis, France
(Dated: November 28, 2018)
We present an original statistical method to measure the visibility of interferences in an electronic
Mach-Zehnder interferometer in the presence of low frequency fluctuations. The visibility presents
a single side lobe structure shown to result from a gaussian phase averaging whose variance is
quadratic with the bias. To reinforce our approach and validate our statistical method, the same
experiment is also realized with a stable sample. It exhibits the same visibility behavior as the
fluctuating one, indicating the intrinsic character of finite bias phase averaging. In both samples,
the dilution of the impinging current reduces the variance of the gaussian distribution.
PACS numbers: 85.35.Ds, 73.43.Fj
Nowadays quantum conductors can be used to per-
form experiments usually done in optics, where electron
beams replace photon beams. A beamlike electron mo-
tion can be obtained in the Integer Quantum Hall Effect
(IQHE) regime using a high mobility two dimensional
electron gas in a high magnetic field at low temperature.
In the IQHE regime, one-dimensional gapless excitation
modes form, which correspond to electrons drifting along
the edge of the sample. The number of these so-called
edge channels corresponds to the number of filled Lan-
dau levels in the bulk. The chirality of the excitations
yields long collision times between quasi-particles, mak-
ing edge states very suitable for quantum interferences
experiments like the electronic Mach-Zehnder interfer-
ometer (MZI) [1, 2, 3]. Surprisingly, despite some ex-
periments which show that equilibrium length in chiral
wires is rather long [4], very little is known about the
coherence length or the phase averaging in these ”per-
fect” chiral uni-dimensional wires. In particular, while in
the very first interference MZI experiment the interfer-
ence visibility showed a monotonic decrease with voltage
bias, which was attributed to phase noise [1], in a more
recent paper, a surprising non-monotonic decrease with
a lobe structure was observed [5]. A satisfactory expla-
nation has not yet been found, and the experiment has
so far not been reported by other groups to confirm these
results.
We report here on an original method to measure the
visibility of interferences in a MZI, when low frequency
phase fluctuations prevent direct observation of the peri-
odic interference pattern obtained by changing the mag-
netic flux through the MZI. We studied the visibility at
finite energy and observed a single side lobe structure,
which can be explained by a gaussian phase averaging
whose variance is proportional to V 2, where V is the
FIG. 1: SEM view of the electronic Mach-Zehnder with a
schematic representation of the edge state. G0, G1, G2 are
quantum point contacts which mimic beam splitters. The
pairs of split gates defining a QPC are electrically connected
via a Au metallic bridge deposited on an insolator (SU8). G0
allows a dilution of the impinging current, G1 and G2 are the
two beam splitters of the Mach-Zehnder interferometer. SG
is a side gate which allows a variation of the length of the
lower path (b).
bias voltage. To reinforce our result and check if low
frequency fluctuation may be responsible for that behav-
ior, we realized the same experiment on a stable sample
: we also observed a single side lobe structure which can
be fitted with our approach of gaussian phase averaging.
This proves the validity of the results, which cannot be
an artefact due to the low frequency phase fluctuations
in the first sample. In both samples, the dilution of the
impinging current has an unexpected effect : it decreases
the variance of the gaussian distribution.
The MZI geometry is patterned using e-beam lithogra-
phy on a high mobility two dimensional electron gas in a
GaAs/Ga1−xAlxAs heterojunction with a sheet density
nS = 2.0×1011 cm−2 and a mobility of 2.5×106 cm2/Vs.
http://arxiv.org/abs/0704.0746v3
The experiment was performed in the IQHE regime at
filling factor ν = nSh/eB = 2 (magnetic field B =5.2
Tesla). Transport occurs through two edge states with an
extremely large energy redistribution length [4]. Quan-
tum point contacts (QPC) controlled by gates G0, G1
and G2 define electronic beam splitters with transmis-
sions T0, T1 and T2 respectively. In all the results pre-
sented here, the interferences were studied on the outer
edge state schematically drawn as black lines in Fig.(1),
the inner edge state being fully reflected by all the QPCs.
The interferometer consists of G1, G2 and the small cen-
tral ohmic contact in between the two arms. G1 splits
the incident beam into two trajectories (a) and (b), which
are recombined with G2 leading to interferences. The
two arms defined by the mesa are 8 µm long and en-
close a 14 µm2 area. The current which is not transmit-
ted through the MZI, IB = ID − IT , is collected to the
ground with the small ohmic contact. An additional gate
SG allows a change of the length of the trajectory (b).
The impinging current I0 can be diluted thanks to the
beam splitter G0 whose transmission T0 determines the
diluted current dID = T0 × dI0. We measure the differ-
ential transmission through the MZI by standard lock-in
techniques using a 619 Hz frequency 5 µVrms AC bias
VAC superimposed to the DC voltage V . This AC bias
modulates the incoming current dID = T0 × h/e2 ×VAC ,
and thus the transmitted current in an energy range close
to eV , giving the transmission T (eV ) = dIT /dI0.
Using the single particle approach of the Landauer-
Büttiker formalism, the transmission amplitude t
through the MZI is the sum of the two complex
transmission amplitudes corresponding to paths (a)
and (b) of the interferometer; t = t0{t1 exp(iφa)t2 −
r1 exp(iφb)r2}. This leads to a transmission probabil-
ity T (ǫ) = T0{T1T2 + R1R2 +
T1R2R1T2 sin[ϕ(ǫ)]},
where ϕ(ǫ) = φa − φb and Ti = |ti|2 = 1 −Ri. ϕ(ǫ) cor-
responds to the total Aharonov-Bohm (AB) flux across
the surface S(ǫ) defined by the arms of the MZI, ϕ(ǫ) =
2πS(ǫ) × eB/h. The surface S depends on the energy ǫ
when there is a finite length difference ∆L = La−Lb be-
tween the two arms. This leads to a variation of the phase
with the energy, ϕ(ǫ+EF ) = ϕ(EF )+ǫ∆L/(~vD), where
vD is the drift velocity. When varying the AB flux, the
interferences manifest themselves as oscillations of the
transmission; in practice this is done either by varying
the magnetic field or by varying the surface of the MZI
with a side gate [1, 5, 6]. The visibility of the interfer-
ences defined as V = (TMAX−TMIN )/(TMAX+TMIN ), is
maximum when both beam splitter transmission are set
to 1/2. In the present experiment the MZI is designed
with equal arm lengths (∆L = 0) and the visibility is not
expected to be sensitive to the coherence length of the
source ~vD/max(kBT, eVAC). Thus the visibility pro-
vides a direct measurement of the decoherence and/or
phase averaging in this quantum circuit.
In Ref.[1], 60% visibility was observed at low tem-
-0.76 -0.72 -0.68 -0.64
-0.4 -0.2 0.0 0.2 0.4
2 x Visibility
δT / T
0.0 0.2 0.4 0.6 0.8 1.0
Transmission T
gate voltage (V
 or V
) (Volt)
FIG. 2: Sample #1 a)Transmission T = dIT /dI0 as a func-
tion of the gate voltages V 1 and V 2 applied on G1 and G2.
(◦) T = T1 versus V 1. (•) T = T2 versus V 2. The solid
line is the transmission T obtained with T1 fixed to 1/2 while
sweeping V 2 : transmission fluctuations due to interferences
with low frequency phase noise appears. b) Stack histogram
on 6000 successive transmission measurements as a function
of the normalized deviation from the mean value. The solid
line is the distribution of transmission expected for a uniform
distribution of phases. c)Visibility of interferences as a func-
tion of the transmission T2 when T1 = 1/2. The solid line is
T2(1− T2) dependence predicted by the theory.
perature, showing that the quantum coherence length
can be at least as large as several micrometer at 20 mK
(and probably larger if phase averaging is the limiting
factor). At finite energy (compared to the Fermi en-
ergy), the visibility was also found decreasing with the
bias voltage[1, 5, 6]. This effect is not due to an increase
of the coherence length of the electron source which re-
mains determined by eVAC or kBT [7]. In a first exper-
iment, a monotonic visibility decrease was found, which
was attributed to phase averaging, as confirmed by shot
noise measurements [1]. Nevertheless, it remains unclear
why and how the phase averaging increases with the bias.
In a recent paper, instead of a monotonic decrease of the
visibility, a lobe structure was observed for filling factor
less than 1 in the QPCs [5]. No non-interacting electron
model was found to be able to explain this observation,
and although interaction effects have been proposed [8],
a satisfactory explanation has not yet been found to ac-
count for all the experimental observations. So far, two
experiments have shown to two different behaviors, rais-
ing questions about the universality of these observations.
Here, we report experiments where different samples give
consistent results, with a fit to the data clearly demon-
strating that our MZI suffers from a gaussian phase av-
eraging whose variance is proportional to V 2, leading to
the single side lobe structure of the visibility.
We have used the following procedure to tune the MZI.
We first measure independently the two beam splitters’
transparencies versus their respective gate voltages, the
inner edge state being fully reflected. This is shown in
figure (2a) where the transmission (T1 or T2) through
one QPC is varied while keeping unit transparency for
the other QPC. This provides the characterization of the
transparency of each beam splitter as a function of its
gate voltage. The fact that the transmission vanishes
for large negative voltages means that the small ohmic
contact in between the two arms can absorb all incom-
ing electrons, otherwise the transmission would tend to
a finite value. This is very important in order to avoid
any spurious effect in the interference pattern. In a sec-
ond step we fix the transmission T1 to 1/2 while sweep-
ing the gate voltage of G2 (solid line of figure (2a)).
Whereas for a fully incoherent system the T should be
1/2× (R2 + T2) = 1/2, we observe large temporal trans-
mission fluctuations around 1/2. We show in the fol-
lowing that they result from the interferences, expected
in the coherent regime, but in presence of large low fre-
quency phase noise. This is revealed by the probabil-
ity distribution of the transmissions obtained when mak-
ing a large number of transmission measurements for the
same gate voltage. Figure (2b) shows a histogram of T
when making 6000 measurements (each measurement be-
ing separated from the next by 10 ms). The histogram of
the transmission fluctuations δT = T − Tmean displays
two maxima very well fitted using a probability distribu-
tion p(δT /Tmean) = 1/(2π
1− (δT /Tmean)2/V2) (the
solid line of figure (2b)). This distribution is obtained
assuming interferences δT = Tmean×Vsin(ϕ) and a uni-
form probability distribution of ϕ over [−π,+π]. Note
that the peaks around |δT /Tmean| = V have a finite
width. They correspond to the gaussian distribution as-
sociated with the detection noise which has to be convo-
luted with the previous distribution.
Although no regular oscillations of transmission can
be observed due to phase noise, we can directly extract
the visibility of the interferences by calculating the vari-
ance of the fluctuations (the approach is similar to mea-
surements of Universal Conductance Fluctuations via the
amplitude of 1/f noise in diffusive metallic wires) [10].
As expected when T1 = 1/2, the visibility extracted by
our method is proportional to
T2(1− T2), definitively
showing that fluctuations results from interference: we
are able to measure the visibility of fluctuating interfer-
ences (see figure (2c)).
The visibility depends on the bias voltage with a lobe
structure shown in figure (3), confirming the pioneer-
ing observation [5]. Nevertheless, there are marked dif-
ferences. The visibility shape is not the same as that
in ref.[5]. We have always seen only one side lobe, al-
though the sensitivity of our measurements would be high
enough to observe a second one if it existed. Moreover,
-100 -50 0 50 100
 = 0.02
 = 0.14
Drain-Source Voltage (µV)
FIG. 3: (Color online) Sample #1 : Visibility of the inter-
ferences as a function of the drain-source voltage I0h/e
2 for
three different values of T0. The curves are shifted for clar-
ity. The energy width of the lobe structure is modified by
the dilution whereas the maximum visibility at zero bias is
not modified. Solid lines are fits using equation (1). From
top to bottom, T0 = 0.02 and V0 = 31 µV, T0 = 0.14 and
V0 = 22 µV, T0 = 1 and V0 = 11.4 µV.
the lobe width (see figure (3)) can be increased by di-
luting the impinging current with G0, whereas no such
effect is seen for G1 and G2. This apparent increase of
the energy scale cannot be attributed to the addition of
a resistance in series with the MZI because G0 is close to
the MZI, at a distance shorter than the coherence length.
An almost perfect fit for the whole range of T0 (dilu-
tion), is
V = V0e−V
2/2V 2
0 |1− V ID
dID/dV
|, (1)
where V0 is a fitting parameter. Equation (1) is obtained
when assuming a gaussian phase averaging with a vari-
ance < δϕ2 > proportional to V 2 and a length difference
∆L small enough to neglect the energy dependence of
the phase in the observed energy range eV ≪ ~vD/∆L.
In such a case, the interfering part of the current I∼ is
thus proportional to ID sin(ϕ). The gaussian distribu-
tion of the phase leads to I∼ ∝ ID sin(< ϕ >)e−<δϕ
2>/2,
where < ϕ > is the mean value of the phase distribu-
tion. The measured interfering part of the transmission,
T∼ = h/e2 dI∼/dV gives a visibility corresponding to for-
mula (1) when < δϕ2 >= V 2/V 2
. Such behavior gives
a nul visibility accompanied with a π shift of the phase
when V ID/(V
dID/dV ) = 1. When T0 ∼ 1, ID is pro-
portional to V and the width of the central lobe is simply
equal to 2V0. However in the most general case, dID/dV
varies with V . One can see in figure (3) that the fit with
Equation (1) is very good, definitively showing that the
-20 0 20
0.4 0.6 0.8
 / dI
0.5 0.6 0.7
 / dI
 V  (µV)
-40 -20 0 20 40
d)  T
 = 0.06
Drain Source Voltage (µV)
FIG. 4: (Color online) Sample #2 : a) Gray plot of the
transmission T as a function of the bias voltage V and the
side gate voltage VSG. Note the π shift of the phase when
the visibility reaches 0. b) & c) T as a function of the side
gate voltage for two different values of the drain source volt-
age corresponding to the dashed line of a) (0 and 16 µV
respectively). d) Lobe structure of the visibility fitted using
equation (1) for a diluted and an undiluted impinging current.
existence of one side lobe, as observed in the experiment
of ref.[5] at ν = 2 (for the highest fields) and at ν = 1,
can be explained within our simple approach. Concern-
ing multiple side lobes, we cannot yet conclude if they do
arise from long range interaction as recently proposed by
ref.[8]. Our geometry is different from the one used in the
earlier experiment [5] and the coupling between counter
propagating edge states, thought to be responsible for
multiple side lobe [8], should be less efficient here.
To check if low frequency fluctuations have an impact
on the finite bias phase averaging, we have studied an-
other sample, with the same geometry and fabricated
simultaneously (sample #2), which exhibits clear inter-
ference pattern (see Figure (4a,b,c)). As one can remark
on figure (4d), the lobe structure is well fitted with our
theory, definitively showing that the gaussian phase av-
eraging is not associated with low frequency phase fluc-
tuations.
It is noteworthy that V0 increases (see figure (5)) with
the dilution, namely when the transmission T0 at zero
bias decreases. An impact of the dilution was already
observed as it suppressed multiple side lobes [9] (arXiv
version of Ref.[5]), but the conclusion was that the width
of the central lobe was barely affected. Here, dilution
plays a clear role whose T0 dependence is the same for
the two studied samples, once normalized to the not di-
luted case. This dilution effect is nevertheless not easy
to explain. For example, mechanisms like screening, in-
tra edge scattering and fluctuations mediated by shot
noise should have maximum effect at half transmission,
in contradiction with figure (5). More generally, it is
difficult to determine if the process responsible for the
phase averaging introduced in our model is located at
the beam splitters, or is uniformly distributed along the
interfering channels. However, setting T1 = 0.02 or 0.05,
keeping T2 = 0.5, leaves the lobe width unaffected. This
shows that, if located at the Quantum Point Contacts,
the phase averaging process is independent of transmis-
sion.
0.0 0.2 0.4 0.6 0.8 1.0
 Sample #1 : V
(1) = 13.7 µV
 Sample #2 : V
(1) = 10.6 µV
FIG. 5: (Color online) V0 obtained by fitting the visibility
with equation (1), normalized to V0 at T0 = 1, as a function
of T0 at zero bias.
To summarize, we propose a statistical method to mea-
sure the visibility of ”invisible” interferences. We observe
a single side lobe structure of the visibility on stable and
unstable samples which is shown to result from a gaus-
sian phase averaging whose variance is proportional to
V 2. Moreover, this variance is shown to be reduced by
diluting the impinging current. However, the mechanism
responsible for such type of phase averaging remains yet
unexplained.
The authors would like to thank M. Büttiker for fruit-
ful discussions. This work was supported by the French
National Research Agency (grant n◦ 2A4002).
∗ Also at LPA, Ecole Normale Supérieure, Paris.
† Electronic address: patrice.roche@cea.fr
[1] Y. Ji, Y. Chung, D. Sprinzak, M. Heiblum, D. Mahalu,
and H. Shtrikman, Nature 422, 415 (2003).
[2] P. Samuelsson, E. V. Sukhorukov, and M. Büttiker, Phys.
Rev. Lett. 92, 026805 (2004).
[3] I. Neder, N. Ofek, Y. Chung, M. Heiblum, D. Mahalu,
and V. Umansky, arXiv:0705.0173 (2007).
[4] T. Machida, H. Hirai, S. Komiyama, T. Osada, and Y.
Shiraki, Solid State Commun. 103, 441 (1997).
mailto:patrice.roche@cea.fr
http://arxiv.org/abs/0705.0173
[5] I. Neder, M. Heiblum, Y. Levinson, D. Mahalu, and V.
Umansky, Phys. Rev. Lett. 96, 016804 (2006).
[6] L. V. Litvin, H.-P. Tranitz, W. Wegscheider, and C.
Strunk, Phys. Rev. B 75, 033315 (2007).
[7] V. S.-W. Chung, P. Samuelsson, and M. Büttiker, Phys.
Rev. B 72, 125320 (2005).
[8] E. V. Sukhorukov and V. V. Cheianov,
cond-mat/0609288 .
[9] I. Neder, M. Heiblum, Y. Levinson, D. Mahalu, and V.
Umansky, cond-mat/0508024 (2005).
[10] All the results on the visibility reported here on sam-
ple #1 have been obtained using the following proce-
dure : we measured N = 2000 times the transmission
and calculated the mean value Tmean and the variance
< δT 2 >. It is straightforward to show that the visi-
bility is V =
< δT 2 > − < δT 2 >0/Tmean, where
< δT 2 >0 is the measurement noise which depends on
the AC bias amplitude, the noise of the amplifiers and
the time constant of the lock-in amplifiers (fixed to 10
ms), measured in absence of the quantum interferences.
http://arxiv.org/abs/cond-mat/0609288
http://arxiv.org/abs/cond-mat/0508024
0.2 0.4 0.6
0.6 d)
Magnetic Field - 4.6 T (mT)
0.2 0.4 0.6
0.6 c)
0.2 0.4 0.6
ABSTRACT
  We present an original statistical method to measure the visibility of
interferences in an electronic Mach-Zehnder interferometer in the presence of
low frequency fluctuations. The visibility presents a single side lobe
structure shown to result from a gaussian phase averaging whose variance is
quadratic with the bias. To reinforce our approach and validate our statistical
method, the same experiment is also realized with a stable sample. It exhibits
the same visibility behavior as the fluctuating one, indicating the intrinsic
character of finite bias phase averaging. In both samples, the dilution of the
impinging current reduces the variance of the gaussian distribution.

<|endoftext|><|startoftext|>
arXiv:0704.0747v1  [math.DG]  5 Apr 2007
Univ. Beograd. Publ. Elektrotehn. Fak.
Ser. Mat. 7 (1996), 105–109.
A NOTE ON HIGHER-ORDER
DIFFERENTIAL OPERATIONS
Branko J. Malešević
In this paper we consider successive iterations of the first-order differential
operations in space R
1. INTRODUCTION
Let C∞(R3) be the set of scalar functions f = f(x1, x2, x3) : R
3 7→ R which
have the continuous partial derivatives of the arbitrary order on coordinates xi (i =
1, 2, 3). Let ~C∞(R3) be the set vector functions ~f =
f1(x1, x2, x3), f2(x1, x2, x3),
f3(x1, x2, x3)
: R3 7→ R3 which have the coordinately continuous partial deriva-
tives of the arbitrary order on coordinates xi (i = 1, 2, 3). First-order differential
operations of the vector analysis of the space R3 are defined on the following set
of functions:
f : R3 7→ R | f ∈ C∞(R3)
and ~F =
~f : R3 7→ R3 | ~f ∈ ~C∞(R3)
First-order differential operations of the vector analysis of the space R3 are
defined as the following three linear operations [1], denoted here by ∇1,∇2 and ∇3
for a convenience:
(1) grad f = ∇1f =
~e1 +
~e2 +
~e3 : F 7→ ~F ,
(2) curl ~f = ∇2 ~f =
~e1 +
~e2 +
~e3 : ~F 7→ ~F ,
(3) div ~f = ∇3 ~f =
: ~F 7→ F.
Let Ω = {∇1,∇2,∇3} be the set of above defined operations and let Σ =
F ∪ ~F . Then the first-order differential operations can be considered as partial
operations Σ 7→ Σ, i.e. as operations whose domain (and codomain) are subsets F or
01991 Mathematics Subject Classification: 26B12
http://arxiv.org/abs/0704.0747v1
106 Branko J. Malešević
~F of Σ. Second and higher-order differential operations are then defined as products
of operations in Ω in the sense of composition of operations. Some of these products
might be meaningful, like ∇3 ◦ ∇1, while the others are meaningless, like ∇1 ◦ ∇1.
To all meaningless products for any argument we associate the value of nowhere
defined function ϑ (Dom (ϑ) = ∅ and Ran (ϑ) = ∅). Nowhere defined function ϑ(f∅)
is a concept from the recursive function theory [2]. We do not consider the function
ϑ as the starting argument for calculating the value of the higher-order differential
operations. In that way we increase set Σ into set Σ = F ∪ ~F ∪ {ϑ}.
All meaningful second-order differential operations are:
(4) ∆f = div grad f = (∇3 ◦ ∇1) (f),
(5) curl curl ~f = (∇2 ◦ ∇2) (~f),
(6) graddiv ~f = (∇1 ◦ ∇3)(~f),
(7) div curl ~f = (∇3 ◦ ∇2) (~f) = 0,
(8) curl grad f = (∇2 ◦ ∇1) (f) = ~0, f, ~f ∈ Σ \ {ϑ}.
In this paper we consider higher-order differential operations, search for mean-
ingful ones and present some applications.
2. HIGHER-ORDER DIFFERENTIAL OPERATIONS
Theorem 1. For arbitrary operations ∇i,∇j ,∇k ∈ Ω (i, j, k ∈ {1, 2, 3}) and
argument ξ ∈ Σ \ {ϑ} the associative law holds:
(9) ∇i ◦ (∇j ◦ ∇k)(ξ) = (∇i ◦ ∇j) ◦ ∇k(ξ).
Proof. Choosing the ∇i,∇j ,∇k from Ω and argument ξ from Σ\{ϑ}, (9) appears
in 54 possible cases. It is directly verified that whenever the left side of the equality
is meaningless, the right side is also meaningless. Than, all meaningless products
have the same value of the nowhere defined function ϑ, so that (9) is true in the
following form: ϑ = ϑ. Also, whenever the left side of equality is meaningful,
the right side is also meaningful. Then, according to the associative law of the
meaningful functions, we conclude that (9) is true.
From Theorem 1 it follows (by induction) that the generalized associative
law also holds, so we may write the product ∇i1 ◦ ∇i2 ◦ · · · ◦ ∇in without brackets
(ij ∈ {1, 2, 3} : j = 1, 2, ..., n).
For higher-order differential operations, given as meaningful products, we
say that they are the trivial products if they are trivially anullated, i.e. if they are
identically the same as the anullating functions 0, ~0 from Σ. Otherwise, we refer to
the higher-order differential operations, given as meaningful products, as nontrivial
products (if they are nontrivially anullated).
Next, we prove the statement:
A note on higher-order differential operations 107
Theorem 2. Higher-order differential operations appear as nontrivial products in
the following three forms:
(grad) div . . . graddiv grad f = (∇1◦)∇3 ◦ · · · ◦ ∇1 ◦ ∇3 ◦ ∇1f,
curl curl . . . curl curl curl ~f = ∇2 ◦ ∇2 ◦ · · · ◦ ∇2 ◦ ∇2 ◦ ∇2 ~f,
(div) grad . . . div graddiv ~f = (∇3◦)∇1 ◦ · · · ◦ ∇3 ◦ ∇1 ◦ ∇3 ~f,
for arbitrary functions f, ~f ∈ Σ \ {ϑ}, where terms in brackets are included for odd
number of terms and are left out otherwise. All other meaningful operations are
identically zero in their domain.
Proof. Meaningful third-order differential operations appear in the form of eight
compositions as follows:
(10) graddiv gradf = ∇1 ◦ ∇3 ◦ ∇1f,
(11) curl curl curl ~f = ∇2 ◦ ∇2 ◦ ∇2 ~f,
(12) div graddiv ~f = ∇3 ◦ ∇1 ◦ ∇3 ~f,
(13) div curl curl ~f = ∇3 ◦ ∇2 ◦ ∇2 ~f = 0,
(14) div curl gradf = ∇3 ◦ ∇2 ◦ ∇1f = 0,
(15) curl curl grad f = ∇2 ◦ ∇2 ◦ ∇1f = ~0,
(16) curl graddiv ~f = ∇2 ◦ ∇1 ◦ ∇3 ~f = ~0,
(17) graddiv curl ~f = ∇1 ◦ ∇3 ◦ ∇2 ~f = ~0, f, ~f ∈ Σ \ {ϑ}.
Anullations of the operations (13)–(17) follow directly from the anullations (4)–(5).
The statement follows directly from the principle of mathematical induction by
means of using the general associative law and formulas (10)–(17).
For a given sequence of operations ∇i1 ,∇i2 , . . . ,∇in from the set Ω of func-
tions, let define the concept of the collection of functions as a subset of functions
Θ ⊆ Σ \ {ϑ} such that all functions ξ from Θ anullate the nontrivial product
∇i1 ◦ ∇i2 ◦ · · · ◦ ∇in (ξ).
Let us form some collections. Scalar functions f from Σ, such that ∆nf = 0
is true, define harmonic collection Hn of order n, as the form of the polyharmonic
functions. Let us notice that in the case of two dimensions there is a general form
of polyharmonic functions f as a solution of the equation ∆nf = 0, [3]. Vector
functions ~f from Σ, such that curln ~f = ~0 is true, define curling collection Cn of
order n.
We can remark that besides the total scalar operation ∆ : F 7→ F (partial
scalar operation ∆ : Σ 7→ Σ) we can also consider the total vector operation
~∆ : ~F 7→ ~F (partial vector operation ~∆ : Σ 7→ Σ) defined by:
(18) ~∆~f = (∆f1,∆f2,∆f3) = ∆f1 · ~e1 +∆f2 · ~e2 +∆f3 · ~e3.
108 Branko J. Malešević
Let set ~Hn be the sign for the vector functions ~f from Σ such that ~∆
n(~f) = ~0,
where ~∆n is iteration of order n of the vector operation ~∆ given by (18). The set
of vector harmonic functions ~Hn of order n, which is defined in such a way, is not
in the list of collections which appear in the previous theorem because it is not
obtained through the compositions of operations (1)–(3). For the set ~Hn we shall
keep the term collection.
Let us notice that for scalar polyharmonic collections, vector polyharmonic
collections and curling collections, related to the index-order, the following inclu-
sions hold:
(19) H ⊂ H2 ⊂ · · · ⊂ Hn−1 ⊂ Hn ⊂ · · · ,
(20) ~H ⊂ ~H2 ⊂ · · · ⊂ ~Hn−1 ⊂ ~Hn ⊂ · · · ,
(21) C ⊂ C2 ⊂ · · · ⊂ Cn−1 ⊂ Cn ⊂ · · · .
Let emphasize that all previous considerations can be transformed in three-
dimensional orthogonal curvilinear coordinate system by introducing of correspond-
ing presumptions for functions from the sets F, ~F and Lamé’s coefficients.
Finally, let state a few examples where scalar and vector polyharmonic col-
lections appear.
Example 1. All meaningful products of third-and-higher-order differential opera-
tions for vector functions ~f ∈ ~H and scalar functions f ∈ H are anullated.
For vector functions ~f ∈ ~H the following equation holds:
(22) curl curl ~f = graddiv ~f.
Hence, for f ∈ H and ~f ∈ ~H, on the basis of formulas (22) and (10)–(17) the
following is true:
graddiv gradf = grad (∆f) = ~0,
curl curl curl ~f = curl (graddiv) ~f = ~0,
div graddiv ~f = div (curl curl) ~f = 0.
Thus, all eight meaningful products of third-order differential operations are anul-
lated, so that the statement is true.
Example 2. If f ∈ Hn−1, then x · f ∈ Hn, n ≥ 2.
Let us notice that if f ∈ F, then x · f ∈ F. For an arbitrary scalar function f ∈ F
the following equation is directly verified:
∆(x · f) = 2∂f/∂x+ x ·∆(f).
Inductive generalization is the following equation:
∆n(x · f) = 2n · ∂
∆n−1(f)
/∂x+ x ·∆n(f).
Thus, for (n− 1)-harmonic function f ∈ Hn−1 the conclusion x · f ∈ Hn is true.
A note on higher-order differential operations 109
Example 3. If f ∈ Hn−1, then (x
2 + y2 + z2) · f ∈ Hn, n ≥ 2.
Let us notice that if f ∈ F, then (x2 + y2 + z2) · f ∈ F. For the arbitrary scalar
function f ∈ F the following equations are directly verified:
∆(x2 · f) = 2 · f + 4x · ∂f/∂x+ x2 ·∆(f),
∆2(x2 · f) = 8 · ∂2f/∂x2 + 8x · ∂
/∂x+ 4 ·∆(f) + x2 ·∆2(f).
Inductive generalization is the equation as follows:
∆n(x2 · f) = 4n(n− 1) · ∂2
∆n−2(f)
+ 4nx · ∂
∆n−1(f)
/∂x+ 2n ·∆n−1(f) + x2 ·∆n(f).
Thus, if f ∈ Hn−1, then (x
2 + y2 + z2) · f ∈ Hn.
Two previous examples are the generalizations of the corresponding problems con-
tained in [4].
Acknowledgement. I wish to express my gratitude to Professors M. Merkle,
I. Lazarević and D. Tošić who examined the first version of paper and gave me
their suggestions and some very useful remarks.
REFERENCES
1. M. L. Krasnov, A. I. Kiselev, G. I. Makarenko: Vector Analysis. Moscow 1981.
2. N. Cutland: Computability. Cambridge University Press, London 1980.
3. D. S. Mitrinović, J. D. Kečkić: Jednačine matematičke fizike. Beograd 1985.
4. D. S. Mitrinović, in association with P. M. Vasić: Diferencijalne jednačine, Novi
zbornik problema 4. Beograd 1986.
5. M. J. Crowe: A History of Vector Analysis. University of Notre Dame Press, London
1967.
Faculty of Electrical Engineering, (Received May 6, 1996)
University of Belgrade,
P.O.B 816, 11001 Belgrade,
Yugoslavia
malesevic@kiklop.etf.bg.ac.yu
ABSTRACT
  In this paper we consider successive iterations of the first-order
differential operations in space ${\bf R}^3.$

<|endoftext|><|startoftext|>
Introduction 
 The most important magneto-optical interactions that can occur in material media 
are the Faraday effect, magnetic dichroism, and magnetic birefringence (the Cotton-
Mouton effect). Quantum electrodynamics predicts that because of photon-photon 
interactions even the vacuum becomes birefringent in the presence of a strong magnetic 
field [1-5]. Further, the interaction with an axion-like particle and two photons via the 
Primakoff effect will also lend optical properties to the vacuum in the presence of a 
strong magnetic field [6-10]. The occurrence of an apparent magnetic dichroism of the 
vacuum would imply the preferential disappearance of left- or right circularly polarized 
photons from a light beam. To conserve mass and energy this would imply either the 
production of particles, or photon-splitting.  
The QED effect and the axion effect are treated in terms of an effective 
Lagrangian [1-7], in units where 1c= =h and . 2 / 4 1/137eα π= ≈
 ( ) ( ) ( )
2 22 2 2
1 7 1
4 90 4 2 4ae
L F F F F F F a a m a F F
µν µν µν µ µν
µν µν µν µ µν
α ⎡ ⎤
= − + + + ∂ ∂ − +⎢ ⎥⎣ ⎦
% %1 a  (1) 
Where the first half of the expression is the Euler-Heisenberg effective Lagrangian, 
which is appropriate to the QED effect, and the second half is the effective Lagrangian, 
which is appropriate to the Primakoff effect and accounts for the axion.  Here,  is the 
axion field,  is the axion mass, and 
am M is the inverse axion coupling constant.  Raffelt 
and Stodolsky [7] synthesize the results of Adler [4] and solve for the equations of 
motion.  Analysis of the classical wave solutions of the equations of motion produces a 
picture of mixing between photon and axion modes in a polarized laser experiment with a 
static transverse magnetic field and an optical cavity to increase path length.  In such an 
experiment, CP arguments predict that the axion will only couple to the parallel 
components of the beam. Thus, two main effects are predicted.   The first effect is a phase 
difference ∆φ=φ||-φ⊥ between the parallel and perpendicular components of polarized 
light interacting with the magnetic field. This arises from both QED and the preferential 
mixing of axion and photon modes.  In the mixing part of this picture, a photon mode 
oscillates into an axion mode before turning back into a photon and gets out of phase.  In 
both cases, this phase difference causes an apparent birefringence. 
 The second main effect, is an apparent linear dichroism which manifests itself as a 
rotation,ψ ,of the polarization and attenuation.  This is caused by the fact that mirrors do 
not reflect axions and, hence, any axion modes that do not oscillate back to photons 
before hitting the mirror will appear as lost parallel photon modes.  For small axion 
masses, the theory predicts: 
                  
φ ω∆ =  ,  
ext a
∆ =  and  
lψ = , (2) 
where l is the length of the cavity, N is the number of passes and L Nl=  is the total path 
length of the beam through the interaction region.  The subscripts QED and a, refer to the 
origin of the effects.  In terms of index of refraction, ∆φ=kL(n||-n⊥).  Choosing the limit 
of small axion masses is justified by several experimental results and astrophysical 
observations [6 -11] which bound the axion mass to and 3 610 10aeV m eV
− −> >
1010M GeV> .  This result also takes into account Adler’s analysis of the E-H 
Lagrangian which predicts the following vacuum birefringence: 
                                 n⊥=1+2ξsin2θ, n||=1+ θξ 2sin
7  with
= .                          (3) 
Here,θ  is the angle between and k .[4,7] extB
 The expected birefringence, as a function of extB in Tesla, due to the QED effect, is 
∆n=n||-n⊥=4×10-23Bext2.  The phase shift between the two orthogonal components of a 
light beam is 2 /L nφ π∆ = ∆ λ . For input light linearly polarized at 45o to the field 
direction, this translates into an induced ellipticity of the light /L nε π λ= ∆ , for a path 
length L. For a 1m path and a 1T field the induced ellipticity is expected to be 1.2×10-16.  
No experiment to date has achieved this sensitivity. 
The BRFT and PVLAS Experiments 
 Two important experiments have attempted to detect the phenomena that would 
result from the Primakoff effect. In the BRFT experiment [12] an upper limit of 3.5×10-10 
rad was determined for the possible rotation angle for a 2.2km path in a 3.25T field, 
equivalent to 1.5×10-14 rad m-1T-2, and an ellipticity of 1.6×10-9 was measured on a 299m 
path in a 3.25T field. The PVLAS experiment [13] claims a rotation of 1.7×10-7rad for a 
44km long path in a 5T field, equivalent to 1.55×10-14 rad m-1T-2. The BRFT and PVLAS 
experiments differ in several important specific ways, although from the standpoint of 
applying a modulated magnetic field they are similar. BRFT uses a transverse magnetic 
field modulated at a frequency of 32mHz about a background level of 3.25T. PVLAS 
uses a transverse magnetic field that rotates around the light propagation axis at 
1.89rad/s. This field is equivalent to the simultaneous application of two orthogonal 
transverse field components oscillating at 0.3Hz, but in quadrature.  Neither the BRFT 
nor PVLAS experiments operated at the photon noise limit. The BRFT experiment used a 
200mW argon ion laser and achieved a sensitivity of 4.7×10-7 rad Hz ½ m W -1/2. The 
PVLAS experiment used a 100mW 1.06µm Nd:YAG laser and achieved a sensitivity of 
10-6 rad Hz-1/2 mW-1/2. The photon noise limit at 1.06µm for a detector with a responsivity 
of 0.4 A/W (a typical value for a Si photodiode at this wavelength) is 2×10-8 
rad Hz 1/2 mW 1/2. 
Discussion 
 We have for several years operated a balanced coherent homodyne polarization 
interferometer for the study of the Faraday and Cotton-Mouton effects in condensed 
matter [14], and have achieved a photon noise limited sensitivity of 2×10-8 rad Hz-1/2 
mW-1/2 at 632.8nm or 1.06µm. Because we have only a 1kGauss modulated transverse 
field magnet with 0.1m pole pieces we could not compete with the BRFT and PVLAS 
experiments in overall sensitivity since we were a factor of 2.3×104 mT2 below BRFT and 
a factor 109mT2 below PVLAS in terms of path length and field strength. However, our 
experience with a very sensitive system for measuring elipticity has taught us much about 
the potential pitfalls of these experiments from an experimental optics standpoint. It is 
clear to us that the PVLAS experiment suffers from artifacts, as has already been pointed 
out by Melissinos [15], that the BRFT experiment suffers from artifacts has been 
acknowledged by its authors, although they do not specify all the sources of these 
spurious signals. A primary source of spurious signals in sensitive experiments of this 
kind is motion of optical components caused by a time-varying or a rotating magnetic 
field. The BRFT experiment acknowledges this and used a feedback system to attempt to 
minimize its effects. The PVLAS data show clear sideband peaks corresponding to the 
rotation frequency of their magnet, which should not be present for an effect proportional 
to B2. Indeed these peaks are approximately 18 times larger than the “real” signal at twice 
the magnet rotation frequency. They do not explain the origin of the fundamental signal 
but interpret the second harmonic signal as resulting from  an interaction involving a 
light, neutral, spin-zero particle.   
 In both the BRFT and PVLAS experiments optical components are either close to 
the magnet or mechanically coupled to the magnet and its cryostat. A primary component 
of the experiment that is strongly affected by the magnetic field is the evacuated tube 
passing though the magnet. This tube extends to the cavity end mirrors. All components 
in the experiment that experience any modulated field or field gradients will experience 
time-varying diamagnetic or paramagnetic forces. For example, any stainless steel or 
aluminum optical mounts will experience paramagnetic forces. There are torques acting 
on induced magnetic dipoles, especially in any components exposed to the field that are 
not absolutely symmetrically placed with respect to the field direction. A quartz sample 
tube in the magnet will experience the strongest forces in the regions where it leaves the 
magnet and experiences the largest field gradients, and will be pulled into the magnet 
bore. In general time-varying forces all result from any changes in magnetic stored 
energy that occur as the field is modulated. This generalized force on an object is 
F= ∫ ⋅∇− .21 dVHB   
 In our sensitive magneto-optical experiments we have verified that significant 
artifacts can result from any modulated feedback of light into the laser [16]. It has been 
shown that if a part of its own field is fed back into a laser by an optical component 
vibrating with small amplitude, then in the weak feedback regime, phase and amplitude 
of the output beam from the laser are synchronously modulated [17]. This effect is so 
efficient that when the source laser is influenced by the feedback the modulated light can 
cause interference in a sensitive measurement even for a balanced homodyne 
interferometer measuring an extremely small signal. We have performed a rigorous study 
of the feedback effect for the case of a balanced homodyne polarization interferometer. 
As a result, we have been able to detect phase and/or amplitude modulation produced in a 
balanced homodyne polarization interferometer when light from a mirror oscillating with 
an amplitude of only 9nm is fed back into the laser with 120dB of attenuation. This effect 
is still present even if the laser is an extremely low phase noise Nd:YAG ring laser [17]. 
The BRFT experiment is less sensitive to this feedback effect because it uses a multipass, 
zig-zag Herriott type cavity [18,19] rather than a spherical Fabry-Perot cavity. It is 
possible for light scattered by any of the optical components in these experiments to 
cause feedback, even if no specific optical component is used in the normal direction, and 
this includes scattered light that reflects off the inside walls of the evacuated tube inside 
the magnet.  The BRFT experiment uses a single optical isolator, which probably does 
not provide sufficient isolation to prevent feedback modulation effects. It appears that, 
according to the experimental arrangement shown in ref [11], the PVLAS experiment 
does not use an optical isolator after its laser. In principle, the Fabry-Perot resonator 
might not reflect significant incident light if the source laser is perfectly frequency locked 
to the resonator. In practice, however, even for a very high-Q resonator, it is impossible 
to avoid the feedback due to imperfectness of mirrors and locking electronics. Therefore, 
in the PVLAS experiment, the feedback modulation effects may cause major interference 
in measurements. 
 In principle, any correlated intensity noise can be rejected in a balanced 
homodyne interferometer. However, because of the imperfect performance of real optical 
and/or electronic components, overall common mode rejection ratio of the interferometer 
used in our study was approximately 40dB. Synchronous feedback can cause interference 
in a sensitive experiment even when the signal level is very low. In the case of the 
PVLAS scheme, by including the feedback effect synchronized at twice the rotating 
frequency of the magnet, the representation for the light intensity transmitted through the 
crossed polarizers of the ellipsometer given in Eq (2) of Ref. [11] can be rewritten as 
                                      2 20 ,2( ){ [ ( ) ( ) ( )] }mNI I I t t tν σ α η= + + + +Γ  
where 0I , 
2σ , α , η , and Γ  have the same meaning as in Ref [11] and ,2 mNI ν  is the 
intensity modulation caused by the feedback. The frequency of this synchronized 
modulation is given by the vibration frequency of a feedback element, twice the 
frequency of the rotating magnet. Small misalignment between the polarization 
components must be included in the quasi-static, uncompensated rotation and ellipticity, 
, which is much larger than the rotation caused by the Primakoff effect. Thus the term Γ
,22 ( )mN ( )I t tν η Γ  in the above equation has not only the same Fourier frequency as 02I αη  
but also has the same phase relationship when the quarter-wave plate is rotated by 90o. 
The synchronous interference, thereby cannot be distinguished from the magneto-optical 
effect being sought.  
 An important, but subtle distinction between the BRFT and PVLAS experiments 
is that the BRFT uses a mode-matched mirror cavity while the PVLAS apparently does 
not. Consequently, in the PVLAS experiment as the light beam oscillates between the 
two cavity mirrors its spot size and radius of curvature both oscillate and the radius of 
curvature does not match the mirror curvatures. This mismatch in radius leads to local 
non-normal incidence on the cavity mirrors (except on axis) and causes the local P-and S- 
polarization components of the beam to suffer different phase shifts, which vary radially 
on the mirror. A calculation for a typical very high reflectance multilayer mirror shows 
that this phase difference can be easily 10-11 rad per reflection for an incidence angle of 
1.5mrad. The PVLAS cavity is subject to these effects, which would be modulated if the 
cavity mirrors move, although the BRFT cavity is not. 
 A potential confounder in a search for vacuum magneto optic effects is the 
Faraday effect resulting from residual axial field components and trace gas. There are 
residual axial field components in both the BRFT and PVLAS experiments, since the 
local wave-vector directions in a Gaussian beam are only nominally perpendicular to a 
transverse field at the beam waist, or on axis. We do not however, believe that these were 
the sources of sidebands at the magnet oscillation or rotation frequency ωm. Nonetheless, 
an experiment in which there is no obvious modulation of the effect at frequency ωm is 
desirable, since an effect proportional to Bext2 only shows up at frequency 2ωm. In an 
experiment in which the entire field is modulated at frequency ωm a Faraday effect signal, 
or spurious signal, at ωm is distinguished from the desired signal at frequency 2ωm, which 
should be further checked by verifying that the desired signal is proportional to Bext2. A 
complication can arise if the magnet modulation is not a pure harmonic at frequency ωm. 
Any second harmonic of the magnetic field can produce a spurious signal at 2ωm, but this 
can be identified since it will be linear in Bext.  
Features of an Improved Experiment 
 It is our belief that a balanced coherent homodyne interferometer is a better 
instrument to use than an extinction-based ellipsometer in a search for vacuum magneto-
optical effects. Such a system is almost guaranteed to achieve the photon noise limit and 
provides excellent common mode rejection of laser noise. We also believe that any effect 
observed should be demonstrated to scale with Bext2 [14]. It will also be desirable to use 
the largest magnetic field possible, but not to modulate this. An experiment similar to 
PVLAS can then be performed by rotating the optical train at angular frequency ωm.  
Conclusions 
 We believe that we have identified the likely causes of artifacts in the PVLAS 
experiment, and therefore suggest that the case for an interaction involving an axion-like 
particle has not been made. Furthermore, the PVLAS experiment contradicts the findings 
of the BRFT experiment, and a series of astrophysical observations that restrict the range 
of axion particle masses that are possible. An improved experimental arrangement is 
needed to pursue vacuum magnetic birefringence and polarization rotation effects. With 
an improved system, detection of the QED- predicted magnetic birefringence [4,5] should 
be possible, and a more sensitive examination of the existences of any axion-like 
interactions. 
* Corresponding author 
Email address: davis@umd.edu
                                                 
[1] W. Heisenberg and H. Euler, Z. Phys. 98, 714 (1936). 
[2] V.F. Weisskopf, K. Dan. Vidensk.Selsk.Mat.Fys.Medd, 14, 6 (1936). 
[3] J. Schwinger, Phys.Rev. 82, 664 (1951). 
[4] S.L. Adler, Ann Phys. (N.Y.) 67,599 (1971). 
[5] S. L. Adler, J. Phys. A 40, F143 (2007). 
[6] P. Sikivie, Phys. Rev. Lett. 51, 1415 (1983).  
[7] G. Raffelt and L. Stodolsky, Phys. Rev. D 37, 1237 (1988). 
[8] S.J. Asztalos et al. Ann.Rev.Nucl.Part.Sci. 56, 293 (2006). 
[9] P. Sikivie, arXiv:hep-ph/0701198v1 (2007).  
[10] G. Raffelt, arXiv:hep-ph/0611350 (2006). 
[11] J. Jaeckel et al, Phys.Rev.D 75, 013004 (2007) 
[12] R. Cameron et al. Phys Rev. D 47, 3703 (1993) 
[13] E. Zavattini et al. Phys. Rev. Lett. 96, 110406 (2006) 
[14] K. Cho, S.P. Bush, D.L. Mazzoni,  and C.C. Davis, Phys. Rev. B 43, 965 (1991). 
[15] A.C. Melissinos, arXiv:hep-ph/0702135v1 13 Feb 2007. 
[16] K. Cho, Ph.D Thesis, University of Maryland, 1991. 
[17] M. Sargent III, M.O. Scully and W.E. Lamb, Laser Physics, Addison-Wesley, 
Reading. Mass, 1974. 
[18] D. R. Herriott, H. Kogelnik, and R. Kompfner, Appl. Opt. 3, 523 (1964) 
[19] D. R. Herriott and H.J. Schulte,  Appl. Opt. Aug. 4, 883 (1965) 
mailto:davis@umd.edu
ABSTRACT
  We discuss the experimental techniques used to date for measuring the changes
in polarization state of a laser produced by a strong transverse magnetic field
acting in a vacuum. We point out the likely artifacts that can arise in such
experiments, with particular reference to the recent PVLAS observations and the
previous findings of the BFRT collaboration. Our observations are based on
studies with a photon-noise limited coherent homodyne interferometer with a
polarization sensitivity of 2x10^-8 rad Hz^(1/2) mW^(-1/2).

<|endoftext|><|startoftext|>
Introduction
The discovery of binary pulsars in 1974 [1] opened up a new testing ground
for relativistic gravity. Before this discovery, the only available testing ground
for relativistic gravity was the solar system. As Einstein’s theory of General
Relativity (GR) is one of the basic pillars of modern science, it deserves to
be tested, with the highest possible accuracy, in all its aspects. In the solar
system, the gravitational field is slowly varying and represents only a very small
deformation of a flat spacetime. As a consequence, solar system tests can only
probe the quasi-stationary (non radiative) weak-field limit of relativistic gravity.
By contrast binary systems containing compact objects (neutron stars or black
holes) involve spacetime domains (inside and near the compact objects) where
the gravitational field is strong. Indeed, the surface relativistic gravitational
field h00 ≃ 2GM/c2R of a neutron star is of order 0.4, which is close to the one
of a black hole (2GM/c2R = 1) and much larger than the surface gravitational
fields of solar system bodies: (2GM/c2R)Sun ∼ 10−6, (2GM/c2R)Earth ∼ 10−9.
In addition, the high stability of “pulsar clocks” has made it possible to monitor
the dynamics of its orbital motion down to a precision allowing one to measure
the small (∼ (v/c)5) orbital effects linked to the propagation of the gravitational
field at the velocity of light between the pulsar and its companion.
The recent discovery of the remarkable double binary pulsar PSR J0737−
3039 [2, 3] (see also the contributions of M. Kramer and A. Possenti to these
∗Based on lectures given at the SIGRAV School “A Century from Einstein Relativity:
Probing Gravity Theories in Binary Systems”, Villa Olmo (Como Lake, Italy), 17-21 May
2005. To appear in the Proceedings, edited by M. Colpi et al. (to be published by Springer).
http://arxiv.org/abs/0704.0749v1
proceedings) has renewed the interest in the use of binary pulsars as test-beds
of gravity theories. The aim of these notes is to provide an introduction to the
theoretical frameworks needed for interpreting binary pulsar data as tests of GR
and alternative gravity theories.
2 Motion of binary pulsars in general relativity
The traditional (text book) approach to the problem of motion of N separate
bodies in GR consists of solving, by successive approximations, Einstein’s field
equations (we use the signature −+++)
Rµν −
Rgµν =
Tµν , (1)
together with their consequence
∇ν T µν = 0 . (2)
To do so, one assumes some specific matter model, say a perfect fluid,
T µν = (ε+ p)uµ uν + p gµν . (3)
One expands (say in powers of Newton’s constant)
gµν(x
λ) = ηµν + h
µν + h
µν + . . . , (4)
together with the use of the simplifications brought by the ‘Post-Newtonian’
approximation (∂0 hµν = c
−1 ∂t hµν ≪ ∂i hµν ; v/c ≪ 1, p ≪ ε). Then one
integrates the local material equation of motion (2) over the volume of each
separate body, labelled say by a = 1, 2, . . . , N . In so doing, one must define
some ‘center of mass’ zia of body a, as well as some (approximately conserved)
‘mass’ ma of body a, together with some corresponding ‘spin vector’ S
a and,
possibly, higher multipole moments.
An important feature of this traditional method is to use a unique coor-
dinate chart xµ to describe the full N -body system. For instance, the center
of mass, shape and spin of each body a are all described within this common
coordinate system xµ. This use of a single chart has several inconvenient as-
pects, even in the case of weakly self-gravitating bodies (as in the solar system
case). Indeed, it means for instance that a body which is, say, spherically sym-
metric in its own ‘rest frame’ Xα will appear as deformed into some kind of
ellipsoid in the common coordinate chart xµ. Moreover, it is not clear how
to construct ‘good definitions’ of the center of mass, spin vector, and higher
multipole moments of body a, when described in the common coordinate chart
xµ. In addition, as we are interested in the motion of strongly self-gravitating
bodies, it is not a priori justified to use a simple expansion of the type (4) be-
cause h
Gma/(c
2 |x − za|) will not be uniformly small in the common
coordinate system xµ. It will be small if one stays far away from each object a,
but, as recalled above, it will become of order unity on the surface of a compact
body.
These two shortcomings of the traditional ‘one-chart’ approach to the rela-
tivistic problem of motion can be cured by using a ‘multi-chart’ approach.The
multi-chart approach describes the motion of N (possibly, but not necessarily,
compact) bodies by using N+1 separate coordinate systems: (i) one global coor-
dinate chart xµ (µ = 0, 1, 2, 3) used to describe the spacetime outside N ‘tubes’,
each containing one body, and (ii) N local coordinate charts Xαa (α = 0, 1, 2, 3;
a = 1, 2, . . . , N) used to describe the spacetime in and around each body a.
The multi-chart approach was first used to discuss the motion of black holes
and other compact objects [4, 5, 6, 7, 8, 9, 10, 11]. Then it was also found to
be very convenient for describing, with the high-accuracy required for dealing
with modern technologies such as VLBI, systems of N weakly self-gravitating
bodies, such as the solar system [12, 13].
The essential idea of the multi-chart approach is to combine the information
contained in several expansions. One uses both a global expansion of the type
(4) and several local expansions of the type
Gαβ(X
a ) = G
a ;ma) +H
αβ (X
a ;ma,mb) + · · · , (5)
where G
αβ(X ;ma) denotes the (possibly strong-field) metric generated by an
isolated body of mass ma (possibly with the additional effect of spin).
The separate expansions (4) and (5) are then ‘matched’ in some overlapping
domain of common validity of the type Gma/c
2 . Ra ≪ |x−za| ≪ d ∼ |xa−xb|
(with b 6= a), where one can relate the different coordinate systems by expansions
of the form
xµ = zµa (Ta) + e
i (Ta)X
ij(Ta)X
a + · · · (6)
The multi-chart approach becomes simplified if one considers compact bodies
(of radius Ra comparable to 2Gma/c
2). In this case, it was shown [9], by
considering how the ‘internal expansion’ (5) propagates into the ‘external’ one
(4) via the matching (6), that, in General Relativity, the internal structure of
each compact body was effaced to a very high degree, when seen in the external
expansion (4). For instance, for non spinning bodies, the internal structure of
each body (notably the way it responds to an external tidal excitation) shows
up in the external problem of motion only at the fifth post-Newtonian (5PN)
approximation, i.e. in terms of order (v/c)10 in the equations of motion.
This ‘effacement of internal structure’ indicates that it should be possible
to simplify the rigorous multi-chart approach by skeletonizing each compact
body by means of some delta-function source. Mathematically, the use of dis-
tributional sources is delicate in a nonlinear theory such as GR. However, it
was found that one can reproduce the results of the more rigorous matched-
multi-chart approach by treating the divergent integrals generated by the use
of delta-function sources by means of (complex) analytic continuation [9]. The
most efficient method (especially to high PN orders) has been found to use
analytic continuation in the dimension of space d [14].
Finally, the most efficient way to derive the general relativistic equations of
motion of N compact bodies consists of solving the equations derived from the
action (where g ≡ − det(gµν))
dd+1 x
R(g)−
−gµν(zλa ) dz
a dzνa , (7)
formally using the standard weak-field expansion (4), but considering the space
dimension d as an arbitrary complex number which is sent to its physical value
d = 3 only at the end of the calculation.
Using this method1 one has derived the equations of motion of two compact
bodies at the 2.5PN (v5/c5) approximation level needed for describing binary
pulsars [15, 16, 9]:
d2 zia
= Aia0(za − zb) + c−2Aia2(za − zb,va,vb)
+ c−4Aia4(za − zb,va,vb,Sa,Sb)
+ c−5Aia5(za − zb,va − vb) +O(c−6) . (8)
Here Aia0 = −Gmb(zia − zib)/|za − zb|3 denotes the Newtonian acceleration, Aia2
its 1PN modification, Aia4 its 2PN modification (together with the spin-orbit
effects), and Aia5 the 2.5PN contribution of order v
5/c5. [See the references
above; or the review [17], for more references and the explicit expressions of A2,
A4 and A5.] It was verified that the term A
a5 has the effect of decreasing the
mechanical energy of the system by an amount equal (on average) to the energy
lost in the form of gravitational wave flux at infinity. Note, however, that here
Aia5 was derived, in the near zone of the system, as a direct consequence of the
general relativistic propagation of gravity, at the velocity c, between the two
bodies. This highlights the fact that binary pulsar tests of the existence of Aia5
are direct tests of the reality of gravitational radiation.
Recently, the equations of motion (8) have been computed to even higher
accuracy: 3PN ∼ v6/c6 [18, 19, 20, 21, 22] and 3.5PN ∼ v7/c7 [23, 24, 25]
(see also the review [26]). These refinements are, however, not (yet) needed for
interpreting binary pulsar data.
3 Timing of binary pulsars in general relativity
In order to extract observational effects from the equations of motion (8) one
needs to go through two steps: (i) to solve the equations of motion (8) so as to
1Or, more precisely, an essentially equivalent analytic continuation using the so-called
‘Riesz kernels’.
get the coordinate positions z1 and z2 as explicit functions of the coordinate
time t, and (ii) to relate the coordinate motion za(t) to the pulsar observables,
i.e. mainly to the times of arrival of electromagnetic pulses on Earth.
The first step has been accomplished, in a form particularly useful for dis-
cussing pulsar timing, in Ref. [27]. There (see also [28]) it was shown that,
when considering the full (periodic and secular) effects of the A2 ∼ v2/c2 terms
in Eq. (8), together with the secular effects of the A4 ∼ v4/c4 and A5 ∼ v5/c5
terms, the relativistic two-body motion could be written in a very simple ‘quasi-
Keplerian’ form (in polar coordinates), namely:
n dt+ σ = u− et sinu , (9)
θ − θ0 = (1 + k) 2 arctan
1 + eθ
1− eθ
, (10)
R ≡ rab = aR(1 − eR cosu) , (11)
ra ≡ |za − zCM | = ar(1− er cosu) , (12)
rb ≡ |zb − zCM | = ar′(1− er′ cosu) . (13)
Here n ≡ 2π/Pb denotes the orbital frequency, k = ∆θ/2π = 〈ω̇〉/n =
〈ω̇〉Pb/2π the fractional periastron advance per orbit, u an auxiliary angle (‘rel-
ativistic eccentric anomaly’), et, eθ, eR, er and er′ various ‘relativistic eccentric-
ities’ and aR, ar and ar′ some ‘relativistic semi-major axes’. See [27] for the
relations between these quantities, as well as their link to the relativistic energy
and angular momentum E, J . A direct study [28] of the dynamical effect of the
contribution A5 ∼ v5/c5 in the equations of motion (8) has checked that it led
to a secular increase of the orbital frequency n(t) ≃ n(0)+ ṅ(t−t0), and thereby
to a quadratic term in the ‘relativistic mean anomaly’ ℓ =
n dt+ σ appearing
on the left-hand side (L.H.S.) of Eq. (9):
ℓ ≃ σ0 + n0(t− t0) +
ṅ(t− t0)2 . (14)
As for the contribution A4 ∼ v4/c4 it induces several secular effects in the
orbital motion: various 2PN contributions to the dimensionless periastron pa-
rameter k (δ4 k ∼ v4/c4+ spin-orbit effects), and secular variations in the incli-
nation of the orbital plane (due to spin-orbit effects).
The second step in relating (8) to pulsar observations has been accomplished
through the derivation of a ‘relativistic timing formula’ [29, 30]. The ‘timing
formula’ of a binary pulsar is a multi-parameter mathematical function relating
the observed time of arrival (at the radio-telescope) of the center of the N th
pulse to the integer N . It involves many different physical effects: (i) dispersion
effects, (ii) travel time across the solar system, (iii) gravitational delay due to
the Sun and the planets, (iv) time dilation effects between the time measured
on the Earth and the solar-system-barycenter time, (v) variations in the travel
time between the binary pulsar and the solar-system barycenter (due to relative
accelerations, parallax and proper motion), (vi) time delays happening within
the binary system. We shall focus here on the time delays which take place
within the binary system (see the lectures of M. Kramer for a discussion of the
other effects).
For a proper derivation of the time delays occurring within the binary sys-
tem we need to use the multi-chart approach mentionned above. In the ‘rest
frame’ (X0a = c Ta, X
a) attached to the pulsar a, the pulsar phenomenon can be
modelled by the secularly changing rotation of a beam of radio waves:
Ωa(Ta) d Ta ≃ Ωa Ta +
Ω̇a T
Ω̈a T
a + · · · , (15)
where Φa is the longitude around the spin axis. [Depending on the precise defi-
nition of the rest-frame attached to the pulsar, the spin axis can either be fixed,
or be slowly evolving, see e.g. [13].] One must then relate the initial direction
(Θa,Φa), and proper time Ta, of emission of the pulsar beam to the coordinate
direction and coordinate time of the null geodesic representing the electromag-
netic beam in the ‘global’ coordinates xµ used to describe the dynamics of the
binary system [NB: the explicit orbital motion (9)–(13) refers to such global
coordinates x0 = ct, xi]. This is done by using the link (6) in which zia denotes
the global coordinates of the ‘center of mass’ of the pulsar, Ta the local (proper)
time of the pulsar frame, and where, for instance
e0i =
c2 rab
+ · · ·
+ · · · (16)
Using the link (6) (with expressions such as (16) for the coefficients e
i , . . .)
one finds, among other results, that a radio beam emitted in the proper direction
N i in the local frame appears to propagate, in the global frame, in the coordinate
direction ni where
ni = N i +
−N i N
. (17)
This is the well known ‘aberration effect’, which will then contribute to the
timing formula.
One must also write the link between the pulsar ‘proper time’ Ta and the
coordinate time t = x0/c = z0a/c used in the orbital motion (9)–(13). This reads
− c2 d T 2a = g̃µν(aλa) dzµa dzνa (18)
where the ‘tilde’ denotes the operation consisting (in the matching approach) in
discarding in gµν the ‘self contributions’ ∼ (Gma/Ra)n, while keeping the effect
of the companion (∼ Gmb/rab, etc. . .). One checks that this is equivalent (in
the dimensional-continuation approach) in taking xµ = zµa for sufficiently small
values of the real part of the dimension d. To lowest order this yields the link
1− 2Gmb
c2 rab
1− Gmb
c2 rab
which combines the special relativistic and general relativistic time dilation
effects. Hence, following [30] we can refer to them as the ‘Einstein time delay’.
Then, one must compute the (global) time taken by a light beam emitted
by the pulsar, at the proper time Ta (linked to temission by (19)), in the initial
global direction ni (see Eq. (17)), to reach the barycenter of the solar system.
This is done by writing that this light beam follows a null geodesic: in particular
0 = ds2 = gµν(x
λ) dxµ dxν ≃ −
1− 2U
c2 dt2 +
dx2 (20)
where U = Gma/|x−za|+Gmb/|x−zb| is the Newtonian potential within the
binary system. This yields (with te ≡ temission, ta ≡ tarrival)
ta − te =
dt ≃ 1
|dx|+ 2
|x− za|
|x− zb|
|dx| . (21)
The first term on the last RHS of Eq. (21) is the usual ‘light crossing time’
|zbarycenter(ta) − za(te)| between the pulsar and the solar barycenter. It con-
tains the ‘Roemer time delay’ due to the fact that za(te) moves on an orbit.
The second term on the last RHS of Eq. (21) is the ‘Shapiro time delay’ due to
the propagation of the beam in a curved spacetime (only the Gmb piece linked
to the companion is variable).
When inserting the ‘quasi-Keplerian’ form (9)–(13) of the relativistic motion
in the ‘Roemer’ term in (21), together with all other relativistic effects, one finds
that the final expression for the relativistic timing formula can be significantly
simplified by doing two mathematical transformations. One can redefine the
‘time eccentricity’ et appearing in the ‘Kepler equation’ (9), and one can define
a new ‘eccentric anomaly’ angle: u→ unew [we henceforth drop the superscript
‘new’ on u]. After these changes, the binary-system part of the general relativis-
tic timing formula [30] takes the form (we suppress the index a on the pulsar
proper time Ta)
tbarycenter − t0 = D−1[T +∆R(T ) + ∆E(T ) + ∆S(T ) + ∆A(T )] (22)
∆R = x sinω[cosu− e(1 + δr)] + x[1 − e2(1 + δθ)2]1/2 cosω sinu , (23)
∆E = γ sinu , (24)
∆S = −2r ln{1− e cosu− s[sinω(cosu− e) + (1− e2)1/2 cosω sinu]},(25)
∆A = A{sin[ω +Ae(u)] + e sinω}+B{cos[ω +Ae(u)] + e cosω} , (26)
where x = x0 + ẋ(T − T0) represents the projected light-crossing time (x =
apulsar sin i/c), e = e0 + ė(T − T0) a certain (relativistically-defined) ‘timing
eccentricity’, Ae(u) the function
Ae(u) ≡ 2 arctan
1 + e
, (27)
ω = ω0 + k Ae(u) the ‘argument of the periastron’, and where the (relativisti-
cally-defined) ‘eccentric anomaly’ u is the function of the ‘pulsar proper time’
T obtained by solving the Kepler equation
u− e sinu = 2π
T − T0
T − T0
. (28)
It is understood here that the pulsar proper time T corresponding to the N th
pulse is related to the integer N by an equation of the form
N = c0 + νp T +
ν̇p T
ν̈p T
3 . (29)
From these formulas, one sees that δθ (and δr) measure some relativistic distor-
tion of the pulsar orbit, γ the amplitude of the ‘Einstein time delay’2 ∆E , and
r and s the range and shape of the ‘Shapiro time delay’3 ∆S . Note also that
the dimensionless PPK parameter k measures the non-uniform advance of the
periastron. It is related to the often quoted secular rate of periastron advance
ω̇ ≡ 〈dω/dt〉 by the relation k = ω̇Pb/2π. It has been explicitly checked that
binary-pulsar observational data do indeed require to model the relativistic pe-
riastron advance by means of the non-uniform (and non-trivial) function of u
multiplying k on the R.H.S. of Eq. (27) [31]4. Finally, we see from Eq. (28) that
Pb represents the (periastron to periastron) orbital period at the fiducial epoch
T0, while the dimensionless parameter Ṗb represents the time derivative of Pb
(at T0).
Schematically, the structure of the DD timing formula (22) is
tbarycenter − t0 = F [TN ; {pK}; {pPK}; {qPK}] , (30)
where tbarycenter denotes the solar-system barycentric (infinite frequency) ar-
rival time of a pulse, T the pulsar emission proper time (corrected for aberra-
tion), {pK} = {Pb, T0, e0, ω0, x0} is the set of Keplerian parameters, {pPK =
k, γ, Ṗb, r, s, δθ, ė, ẋ} the set of separately measurable post-Keplerian parameters,
2The post-Keplerian timing parameter γ, first introduced in [29], has the dimension of
time, and should not be confused with the dimensionless post-Newtonian Eddington parameter
γPPN probed by solar-system experiments (see below).
3The dimensionless parameter s is numerically equal to the sine of the inclination angle i
of the orbital plane, but its real definition within the PPK formalism is the timing parameter
which determines the ‘shape’ of the logarithmic time delay ∆S(T ).
4Alas this function is theory-independent, so that the non-uniform aspect of the periastron
advance cannot be used to yield discriminating tests of relativistic gravity theories.
and {qPK} = {δr, A,B,D} the set of not separately measurable post-Keplerian
parameters [31]. [The parameter D is a ‘Doppler factor’ which enters as an
overall multiplicative factor D−1 on the right-hand side of Eq. (22).]
A further simplification of the DD timing formula was found possible. In-
deed, the fact that the parameters {qPK} = {δr, A,B,D} are not separately
measurable means that they can be absorbed in changes of the other param-
eters. The explicit formulas for doing that were given in [30] and [31]: they
consist in redefining e, x, Pb, δθ and δr. At the end of the day, it suffices to
consider a simplified timing formula where {δr, A,B,D} have been set to some
given fiducial values, e.g. {0, 0, 0, 1}, and where one only fits for the remaining
parameters {pK} and {pPK}.
Finally, let us mention that it is possible to extend the general parametrized
timing formula (30) by writing a similar parametrized formula describing the ef-
fect of the pulsar orbital motion on the directional spectral luminosity [d(energy)
/d(time) d(frequency) d(solid angle)] received by an observer. As discussed in
detail in [31] this introduces a new set of ‘pulse-structure post-Keplerian pa-
rameters’.
4 Phenomenological approach to testing rela-
tivistic gravity with binary pulsar data
As said in the Introduction, binary pulsars contain strong gravity domains and
should therefore allow one to test the strong-field aspects of relativistic gravity.
The question we face is then the following: How can one use binary pulsar data
to test strong-field (and radiative) gravity?
Two different types of answers can be given to this question: a phenomeno-
logical (or theory-independent) one, or various types of theory-dependent ap-
proaches. In this Section we shall consider the phenomenological approach.
The phenomenological approach to binary-pulsar tests of relativistic gravity
is called the parametrized post-Keplerian formalism [32, 31]. This approach
is based on the fact that the mathematical form of the multi-parameter DD
timing formula (30) was found to be applicable not only in General Relativity,
but also in a wide class of alternative theories of gravity. Indeed, any theory
in which gravity is mediated not only by a metric field gµν but by a general
combination of a metric field and of one or several scalar fields ϕ(a) will induce
relativistic timing effects in binary pulsars which can still be parametrized by the
formulas (22)–(29). Such general ‘tensor-multi-scalar’ theories of gravity contain
arbitrary functions of the scalar fields. They have been studied in full generality
in [33]. It was shown that, under certain conditions, such tensor-scalar gravity
theories could lead, because of strong-field effects, to very different predictions
from those of General Relativity in binary pulsar timing observations [34, 35, 36].
However, the point which is important for this Section, is that even when such
strong-field effects develop one can still use the universal DD timing formula
(30) to fit the observed pulsar times of arrival.
The basic idea of the phenomenological, parametrized post-Keplerian (PPK)
approach is then the following: By least-square fitting the observed sequence
of pulsar arrival times tN to the parametrized formula (30) (in which TN is
defined by Eq. (29) which introduces the further parameters νp, ν̇p, ν̈p) one can
phenomenologically extract from raw observational data the (best fit) values of
all the parameters entering Eqs. (29) and (30). In particular, one so determines
both the set of Keplerian parameters {pK} = {Pb, T0, e0, ω0, x0}, and the set of
post-Keplerian (PK) parameters {pPK} = {k, γ, Ṗb, r, s, δθ, ė, ẋ}. In extracting
these values, we did not have to assume any theory of gravity. However, each
specific theory of gravity will make specific predictions relating the PK param-
eters to the Keplerian ones, and to the two (a priori unknown) masses ma and
mb of the pulsar and its companion. [For certain PK parameters one must also
consider other variables related to the spin vectors of a and b.] In other words,
the measurement (in addition of the Keplerian parameters) of each PK param-
eter defines, for each given theory, a curve in the (ma,mb) mass plane. For any
given theory, the measurement of two PK parameters determines two curves
and thereby generically determines the values of the two masses ma and mb (as
the point of intersection of these two curves). Therefore, as soon as one mea-
sures three PK parameters one obtains a test of the considered gravity theory.
The test is passed only if the three curves meet at one point. More generally,
the measurement of n PK timing parameters yields n− 2 independent tests of
relativistic gravity. Any one of these tests, i.e. any simultaneous measurement
of three PK parameters can either confirm or put in doubt any given theory of
gravity.
As General Relativity is our current most successful theory of gravity, it is
clearly the prime target for these tests. We have seen above that the timing data
of each binary pulsar provides a maximum of 8 PK parameters: k, γ, Ṗb, r, s, δθ, ė
and ẋ. Here, we were talking about a normal ‘single line’ binary pulsar where,
among the two compact objects a and b only one of the two, say a is observed
as a pulsar. In this case, one binary system can provide up to 8− 2 = 6 tests of
GR. In practice, however, it has not yet been possible to measure the parameter
δθ (which measures a small relativistic deformation of the elliptical orbit), nor
the secular parameters ė and ẋ. The original Hulse-Taylor system PSR 1913+16
has allowed one to measure 3 PK parameters: k ≡ 〈ω̇〉Pb/2π, γ and Ṗb. The
two parameters k and γ involve (non radiative) strong-field effects, while, as
explained above, the orbital period derivative Ṗb is a direct consequence of the
term A5 ∼ v5/c5 in the binary-system equations of motion (5). The term A5 is
itself directly linked to the retarded propagation, at the velocity of light, of the
gravitational interaction between the two strongly self-gravitating bodies a and
b. Therefore, any test involving Ṗb will be a mixed radiative strong-field test.
Let us explain on this example what information one needs to implement a
phenomenological test such as the (k−γ−Ṗb)1913+16 one. First, we need to know
the predictions made by the considered target theory for the PK parameters k, γ
and Ṗb as functions of the two masses ma and mb. These predictions have been
worked out, for General Relativity, in Refs. [29, 28, 30]. Introducing the notation
(where n ≡ 2π/Pb)
M ≡ ma +mb (31)
Xa ≡ ma/M ; Xb ≡ mb/M ; Xa +Xb ≡ 1 (32)
βO(M) ≡
, (33)
they read
kGR(ma,mb) =
1− e2
β2O , (34)
γGR(ma,mb) =
Xb(1 +Xb)β
O , (35)
ṖGRb (ma,mb) = −
1 + 73
e2 + 37
(1− e2)7/2
XaXb β
O . (36)
However, if we use the three predictions (34)–(36), together with the best
current observed values of the PK parameters kobs, γobs, Ṗ obdb [37] we shall find
that the three curves kGR(ma,mb) = k
obs, γGR(ma,mb) = γ
obs, ṖGRb (ma,mb) =
Ṗ obsb in the (ma,mb) mass plane fail to meet at about the 13 σ level! Should
this put in doubt General Relativity? No, because Ref. [38] has shown that the
time variation (notably due to galactic acceleration effects) of the Doppler fac-
tor D entering Eq. (22) entailed an extra contribution to the ‘observed’ period
derivative Ṗ obsb . We need to subtract this non-GR contribution before drawing
the corresponding curve: ṖGRb (ma,mb) = Ṗ
b − Ṗ
galactic
b . Then one finds that
the three curves do meet within one σ. This yields a deep confirmation of Gen-
eral Relativity, and a direct observational proof of the reality of gravitational
radiation.
We said several times that this test is also a probe of the strong-field aspects
of GR. How can one see this? A look at the GR predictions (34)–(36) does not
exhibit explicit strong-field effects. Indeed, the derivation of Eqs. (34)–(36) used
in a crucial way the ‘effacement of internal structure’ that occurs in the general
relativistic dynamics of compact objects. This non trivial property is rather
specific of GR and means that, in this theory, all the strong-field effects can be
absorbed in the definition of the masses ma and mb. One can, however, verify
that strong-field effects do enter the observable PK parameters k, γ, Ṗb etc. . . by
considering how the theoretical predictions (34)–(36) get modified in alternative
theories of gravity. The presence of such strong-field effects in PK parameters
was first pointed out in Ref. [7] (see also [39]) for the Jordan-Fierz-Brans-Dicke
theory of gravity, and in Ref. [8] for Rosen’s bi-metric theory of gravity. A
detailed study of such strong-field deviations was then performed in [33, 34, 35]
for general tensor-(multi-)scalar theories of gravity. In the following Section
we shall exhibit how such strong-field effects enter the various post-Keplerian
parameters.
Continuing our historical review of phenomenological pulsar tests, let us
come to the binary system which was the first one to provide several ‘pure
strong-field tests’ of relativistic gravity, without mixing of radiative effects:
PSR 1534+12. In this system, it was possible to measure the four (non ra-
diative) PK parameters k, γ, r and s. [We see from Eq. (25) that r and s mea-
sure, respectively, the range and the shape of the ‘Shapiro time delay’ ∆S .] The
measurement of the 4 PK parameters k, γ, r, s define 4 curves in the (ma,mb)
mass plane, and thereby yield 2 strong-field tests of GR. It was found in [40]
that GR passes these two tests. For instance, the ratio between the measured
value sobs of the phenomenological parameter5 s and the value sGR[kobs, γobs]
predicted by GR on the basis of the measurements of the two PK parameters k
and γ (which determine, via Eqs. (34) , (35), the GR-predicted value of ma and
mb) was found to be s
obs/sGR[kobs, γobs] = 1.004± 0.007 [40]. The most recent
data [41] yield sobs/sGR[kobs, γobs] = 1.000± 0.007. We see that we have here a
confirmation of the strong-field regime of GR at the 1% level.
Another way to get phenomenological tests of the strong field aspects of
gravity concerns the possibility of a violation of the strong equivalence principle.
This is parametrized by phenomenologically assuming that the ratio between the
gravitational and the inertial mass of the pulsar differs from unity (which is its
value in GR): (mgrav/minert)a = 1+∆a. Similarly to what happens in the Earth-
Moon-Sun system [42], the three-body system made of a binary pulsar and of the
Galaxy exhibits a ‘polarization’ of the orbit which is proportional to ∆ ≡ ∆a −
∆b, and which can be constrained by considering certain quasi-circular neutron-
star-white-dwarf binary systems [43]. See [44] for recently published improved
limits6 on the phenomenological equivalence-principle violation parameter ∆.
The Parkes multibeam survey has recently discovered several new interesting
‘relativistic’ binary pulsars, thereby giving a huge increase in the number of
phenomenological tests of relativistic gravity. Among those new binary pulsar
systems, two stand out as superb testing grounds for relativistic gravity: (i)
PSR J1141−6545 [46, 47], and (ii) the remarkable double binary pulsar PSR
J0737−3039A and B [2, 3, 48, 49] (see also the lectures by M. Kramer and
A. Possenti).
The PSR J1141−6545 timing data have led to the measurement of 3 PK
parameters: k, γ, and Ṗb [47]. As in PSR 1913+16 this yields one mixed
radiative-strong-field test7.
5As already mentioned the dimensionless parameter s is numerically equal (in all theories)
to the sine of the inclination angle i of the orbital plane, but it is better thought, in the PPK
formalism, as a phenomenological timing parameter determining the ‘shape’ of the logarithmic
time delay ∆S(T ).
6Note, however, that these limits, as well as those previously obtained in [45], assume
that the (a priori pulsar-mass dependent) parameter ∆ ≃ ∆a is the same for all the analyzed
pulsars.
7In addition, scintillation data have led to an estimate of the sine of the orbital inclination,
sin i [50]. As said above, sin i numerically coincides with the PK parameter s measuring the
‘shape’ of the Shapiro time delay. Therefore, one could use the scintillation measurements as
an indirect determination of s, thereby obtaining two independent tests from PSR J1141−6545
data. A caveat, however, is that the extraction of sin i from scintillation measurements rests
on several simplifying assumptions whose validity is unclear. In fact, in the case of PSR
J0737−3039 the direct timing measurement of s disagrees with its estimate via scintillation
The timing data of the millisecond binary pulsar PSR J0737−3039A have
led to the direct measurement of 5 PK parameters: k, γ, r, s and Ṗb [3, 48,
49]. In addition, the ‘double line’ nature of this binary system (i.e. the fact
that one observes both components, A and B, as radio pulsars) allows one to
perform new phenomenological tests by using Keplerian parameters. Indeed, the
simultaneous measurement of the Keplerian parameters xa and xb representing
the projected light crossing times of both pulsars (A and B) gives access to the
combined Keplerian parameter
Robs ≡
xobsb
xobsa
. (37)
On the other hand, the general derivation of [30] (applicable to any Lorentz-
invariant theory of gravity, and notably to any tensor-scalar theory) shows that
the theoretical prediction for the the ratio R, considered as a function of the
masses ma and mb, is
Rtheory =
. (38)
The absence of any explicit strong-field-gravity effects in the theoretical predic-
tion (38) (to be contrasted, for instance, with the predictions for PK parameters
in tensor-scalar gravity discussed in the next Section) is mainly due to the con-
vention used in [30] and [31] for defining the masses ma and mb. These are
always defined so that the Lagrangian for two non interacting compact objects
reads L0 =
−ma c2(1− v2a/c2)1/2. In other words, ma c2 represents the total
energy of body a. This means that one has implicitly lumped in the definition
of ma many strong-self-gravity effects. [For instance, in tensor-scalar gravity
ma includes not only the usual Einsteinian gravitational binding energy due
to the self-gravitational field gµν(x), but also the extra binding energy linked
to the scalar field ϕ(x).] Anyway, what is important is that, when performing
a phenomenological test from the measurement of a triplet of parameters, e.g.
{k, γ,R}, at least one parameter among them be a priori sensitive to strong-
field effects. This is enough for guaranteeing that the crossing of the three
curves ktheory(ma,mb) = k
obs, γtheory(ma,mb) = γ
obs, Rtheory(ma,mb) = R
is really a probe of strong-field gravity.
In conclusion, the two recently discovered binary pulsars PSR J1141−6545
and PSR J0737−3039 have more than doubled the number of phenomenological
tests of (radiative and) strong-field gravity. Before their discovery, the ‘canoni-
cal’ relativistic binary pulsars PSR 1913+16 and PSR 1534+12 had given us four
data [49]. It is therefore safer not to use scintillation estimates of sin i on the same footing as
direct timing measurements of the PK parameter s. On the other hand, a safe way of obtaining
an s-related gravity test consists in using the necessary mathematical fact that s = sin i ≤ 1.
In GR the definition xa = aa sin i/c leads to sin i = nxa/(β0 Xb). Therefore we can write the
inequality nxa/(β0(M)Xb) ≤ 1 as a phenomenological test of GR.
s ≤ 1
0 0.5 1 1.5 2 2.5
PSR J1141−6545
intersection
0 0.5 1 1.5 2 2.5
PSR B1534+12
intersection
0 0.5 1 1.5 2 2.5
PSR J0737−3039
intersection
0 0.5 1 1.5 2 2.5
2.5 ω
s ≤ 1
PSR B1913+16
intersection
Figure 1: Phenomenological tests of General Relativity obtained from Keplerian
and post-Keplerian timing parameters of four relativistic pulsars. Figure taken
from [51].
such tests: one (k−γ−Ṗb) test from PSR 1913+16 and three (k−γ−r−s−Ṗb8)
tests from PSR 1534+12. The two new binary systems have given us five9 more
phenomenological tests: one (k−γ− Ṗb) (or two, k−γ− Ṗb−s) tests from PSR
J1141−6545 and four (k−γ− r−s− Ṗb−R) tests from PSR J0737−303910. As
illustrated in Figure 1, these nine phenomenological tests of strong-field (and
radiative) gravity are all in beautiful agreement with General Relativity.
In addition, let us recall that several quasi-circular wide binaries, made of
a neutron star and a white dwarf, have led to high-precision phenomenological
confirmations [44] (in strong-field conditions) of one of the deep predictions of
General Relativity: the ‘strong’ equivalence principle, i.e. the fact that var-
ious bodies fall with the same acceleration in an external gravitational field,
independently of the strength of their self-gravity.
Finally, let us mention that Ref. [31] has extended the philosophy of the
8The timing measurement of Ṗ obs
in PSR 1534+12 is even more strongly affected by
kinematic corrections (Ḋ terms) than in the PSR 1913+16 case. In absence of a precise,
independent measurement of the distance to PSR 1534+12, the k−γ− Ṗb test yields, at best,
a ∼ 15% test of GR.
9Or even six, if we use the scintillation determination of s in PSR J1141−6545.
10The companion pulsar 0737−3039B being non recycled, and being visible only during a
small part of its orbit, cannot be timed with sufficient accuracy to allow one to measure any
of its post-Keplerian parameters.
phenomenological (parametrized post-Keplerian) analysis of timing data, to a
similar phenomenological analysis of pulse-structure data. Ref. [31] showed that,
in principle, one could extract up to 11 ‘post-Keplerian pulse-structure param-
eters’. Together with the 8 post-Keplerian timing parameters of a (single-line)
binary pulsar, this makes a total of 19 phenomenological PK parameters. As
these parameters depend not only on the two massesma,mb but also on the two
angles λ, η determining the direction of the spin axis of the pulsar, the maximum
number of tests one might hope to extract from one (single-line) binary pulsar
is 19 − 4 = 15. However, the present accuracy with which one can model and
measure the pulse structure of the known pulsars has not yet allowed one to
measure any of these new pulse-structure parameters in a theory-independent
and model-independent way.
Nonetheless, it has been possible to confirm the reality (and order of mag-
nitude) of the spin-orbit coupling in GR which was pointed out [52, 53] to be
observable via a secular change of the intensity profile of a pulsar signal. Confir-
mations of general relativistic spin-orbit effects in the evolution of pulsar profiles
were obained in several pulsars: PSR 1913+16 [54, 55], PSR B1534+12 [56] and
PSR J1141−6545 [57]. In this respect, let us mention that the spin-orbit interac-
tion affects also several PK parameters, either by inducing a secular evolution in
some of them (see [31]) or by contributing to their value. For instance, the spin-
orbit interaction contributes to the observed value of the periastron advance
parameter k an amount which is significant for the pulsars (such as 1913+16
and 0737−3039) where k is measured with high-accuracy. It was then pointed
out [58] that this gives, in principle, and indirect way of measuring the moment
of inertia of neutron stars (a useful quantity for probing the equation of state
of nuclear matter [59, 60]). However, this can be done only if one measures,
besides k, two other PK parameters with 10−5 accuracy. A rather tall order
which will be a challenge to meet.
The phenomenological approach to pulsar tests has the advantage that it can
confirm or invalidate a specific theory of gravity without making assumptions
about other theories. Moreover, as General Relativity has no free parameters,
any test of its predictions is a potentially lethal test. From this point of view, it is
remarkable that GR has passed with flying colours all the pulsar tests if has been
submitted to. [See, notably, Fig. 1.] As argued above, these tests have probed
strong-field aspects of gravity which had not been probed by solar-system (or
cosmological) tests. On the other hand, a disadvantage of the phenomenological
tests is that they do not tell us in any precise way which strong-field structures,
have been actually tested. For instance, let us imagine that one day one specific
PPK test fails to be satisfied by GR, while the others are OK. This leaves us in a
quandary: If we trust the problematic test, we must conclude that GR is wrong.
However, the other tests say that GR is OK. This example shows that we would
like to have some idea of what physical effects, linked to strong-field gravity,
enter in each test, or even better in each PK parameter. The ‘effacement of
internal structure’ which takes place in GR does not allow one to discuss this
issue. This gives us a motivation for going beyond the phenomenological PPK
approach by considering theory-dependent formalisms in which one embeds GR
within a space of alternative gravity theories.
5 Theory-space approach to testing relativistic
gravity with binary pulsar data
A complementary approach to testing gravity with binary pulsar data consists
in embedding General Relativity within a multi-parameter space of alternative
theories of gravity. In other words, we want to contrast the predictions of GR
with the predictions of continuous families of alternative theories. In so doing we
hope to learn more about which structures of GR are actually being probed in
binary pulsar tests. This is a bit similar to the well-known psycho-physiological
fact that the best way to appreciate a nuance of colour is to surround a given
patch of colour by other patches with slightly different colours. This makes it
much easier to detect subtle differences in colour. In the same way, we hope to
learn about the probing power of pulsar tests by seeing how the phenomeno-
logical tests summarized in Fig. 1 fail (or continue) to be satisfied when one
continuously deform, away from GR, the gravity theory which is being tested.
Let us first recall the various ways in which this theory-space approach has
been used in the context of the solar-system tests of relativistic gravity.
5.1 Theory-space approaches to solar-system tests of rel-
ativistic gravity
In the quasi-stationary weak-field context of the solar-system, this theory-space
approach has been implemented in two different ways. First, the parametrized
post-Newtonian (PPN) formalism [61, 62, 63, 42, 64, 65, 11, 66] describes
many ‘directions’ in which generic alternative theories of gravity might dif-
fer in their weak-field predictions from GR. In its most general versions the
PPN formalism contains 10 ‘post-Einstein’ PPN parameters, γ̄ ≡ γPPN − 111,
β̄ ≡ βPPN−1, ξ, α1, α2, α3, ζ1, ζ2, ζ3, ζ4. Each one of these dimensionless quanti-
ties parametrizes a certain class of slow-motion, weak-field gravitational effects
which deviate from corresponding GR predictions. For instance, γ̄ parametrizes
modifications both of the effect of a massive body (say, the Sun) on the light
passing near it, and of the terms in the two-body gravitational Lagrangian which
are proportional to (Gmamb/rab) · (va − vb)2/c2.
A second way of implementing the theory-space philosophy consists in con-
sidering some explicit, parameter-dependent family of alternative relativistic
theories of gravity. For instance, the simplest tensor-scalar theory of gravity
11The PPN parameter γPPN is usually denoted simply as γ. To distinguish it from the
Einstein-time-delay PPK timing parameter γ used above we add the superscript PPN. In
addition, as the value of γPPN in GR is 1, we prefer to work with the parameter γ̄ ≡ γPPN−1
which vanishes in GR, and therefore measures a ‘deviation’ from GR in a certain ‘direction’
in theory-space. Similarly with β̄ ≡ βPPN − 1.
put forward by Jordan [67], Fierz [68] and Brans and Dicke [69] has a unique
free parameter, say α20 = (2ωBD + 3)
−1. When α20 → 0, this theory reduces
to GR, so that α20 (or 1/ωBD) measures all the deviations from GR. When
considering the weak-field limit of the Jordan-Fierz-Brans-Dicke (JFBD) the-
ory, one finds that it can be described within the PPN formalism by choosing
γ̄ = −2α20(1 + α20)−1, β̄ = 0 and ξ = αi = ζj = 0.
Having briefly recalled the two types of theory-space approaches used to
discuss solar-system tests, let us now consider the case of binary-pulsar tests.
5.2 Theory-space approaches to binary-pulsar tests of rel-
ativistic gravity
There exist generalizations of these two different theory-space approaches to the
context of strong-field gravity and binary pulsar tests. First, the PPN formalism
has been (partially) extended beyond the ‘first post-Newtonian’ (1PN) order
deviations from GR (∼ v2/c2+Gm/c2 r) to describe 2PN order deviations from
[70]. Remarkably, there appear only two new parameters
at the 2PN level12: ǫ and ζ. Also, by expanding in powers of the self-gravity
parameters of body a and b the predictions for the PPK timing parameters
in generic tensor-multi-scalar theories, one has shown that these predictions
depended on several ‘layers’ of new dimensionless parameters [33]. Early among
these parameters one finds, the 1PN parameters β̄, γ̄ and then the basic 2PN
parameters ǫ and ζ, but one also finds further parameters β3, (ββ
′), β′′, . . .
which would not enter usual 2PN effects. The two approaches that we have just
mentionned can be viewed as generalizations of the PPN formalism.
There exist also useful generalizations to the strong-field context of the idea
of considering some explicit parameter-dependent family of alternative theo-
ries of relativistic gravity. Early studies [7, 8, 39] focussed either on the one-
parameter JFBD tensor-scalar theory, or on some theories which are not con-
tinuously connected to GR, such as Rosen’s bimetric theory of gravity. Though
the JFBD theory exhibits a marked difference from GR in that it predicts the
existence of dipole radiation, it has the disadvantage that the weak field, solar-
system constraints on its unique parameter α20 are so strong that they drastically
constrain (and essentially forbid) the presence of any non-radiative, strong-field
deviations from GR. In view of this, it is useful to consider other ‘mini-spaces’
of alternative theories.
A two-parameter mini-space of theories, that we shall denote13 here as
′, β′′), was introduced in [33]. This two-parameter family of tensor-bi-scalar
theories was constructed so as to have exactly the same first post-Newtonian
limit as GR (i.e. γ̄ = β̄ = · · · = 0), but to differ from GR in its predictions
12When restricting oneself to the general class of tensor-multi-scalar theories. At the 1PN
level, this restriction would imply that only the ‘directions’ γ̄ and β̄ are allowed.
13We add here an index 2 to T as a reminder that this is a class of tensor-bi-scalar theories,
i.e. that they contain two independent scalar fields ϕ1, ϕ2 besides a dynamical metric gµν .
for the various observables that can be extracted from binary pulsar data. Let
us give one example of this behaviour of the T2(β
′, β′′) class of theories. For
a general theory of gravity we expect to have violations of the strong equiva-
lence principle in the sense that the ratio between the gravitational mass of a
self-gravitating body to its inertial mass will admit an expansion of the type
mgrava
minerta
≡ 1 + ∆a = 1−
η1 ca + η2 c
a + . . . (39)
where ca ≡ −2 ∂ lnma∂ lnG measures the ‘gravitational compactness’ (or fractional
gravitational binding energy, ca ≃ −2Egrava /ma c2) of body a. The numerical
coefficient η1 of the contribution linear in ca is a combination of the first post-
Newtonian order PPN parameters, namely η1 = 4 β̄ − γ̄ [42]. The numerical
coefficient η2 of the term quadratic in ca is a combination of the 1PN and 2PN
parameters. When working in the context of the T2(β
′, β′′) theories, the 1PN
parameters vanish exactly (β̄ = 0 = γ̄) and the coefficient of the quadratic term
becomes simply proportional to the theory parameter β′ : η2 =
Bβ′, where
B ≈ 1.026. This example shows explicitly how binary pulsar data (here the data
constraining the equivalence principle violation parameter ∆ = ∆a − ∆b, see
above) can go beyond solar-system experiments in probing certain strong-self-
gravity effects. Indeed, solar-system experiments are totally insensitive to 2PN
parameters because of the smallness of ca ∼ Gma/c2Ra and of the structure
of 2PN effects [70]. By contrast, the ‘compactness’ of neutron stars is of order
ca ∼ 0.21ma/M⊙ ∼ 0.3 [33] so that the pulsar limit |∆| < 5.5×10−3 [44] yields,
within the T2(β
′, β′′) framework, a significant limit on the dimensionless (2PN
order) parameter β′ : |β′| < 0.12.
Ref. [35] introduced a new two-parameter mini-space of gravity theories,
denoted here as T1(α0, β0), which, from the point of view of theoretical physics,
has several advantages over the T2(β
′, β′′) mini-space mentionned above. First,
it is technically simpler in that it contains only one scalar field ϕ besides the
metric gµν (hence the index 1 on T1(α0, β0)). Second, it contains only positive-
energy excitations (while one combination of the two scalar fields of T2(β
′, β′′)
carried negative-energy waves). Third, it is the minimal way to parametrize
the huge class of tensor-mono-scalar theories with a ‘coupling function’ a(ϕ)
satisfying some very general requirements (see below).
Let us now motivate the use of tensor-scalar theories of gravity as alternatives
to general relativity.
5.3 Tensor-scalar theories of gravity
Let us start by recalling (essentially from [35]) why tensor-(mono)-scalar theories
define a natural class of alternatives to GR. First, and foremost, the existence
of scalar partners to the graviton is a simple theoretical possibility which has
surfaced many times in the development of unified theories, from Kaluza-Klein
to superstring theory. Second, they are general enough to describe many inter-
esting deviations from GR (both in weak-field and in strong field conditions),
but simple enough to allow one to work out their predictions in full detail.
Let us therefore consider a general tensor-scalar action involving a metric
g̃µν (with signature ‘mostly plus’), a scalar field Φ, and some matter variables
ψm (including gauge bosons):
16πG∗
g̃1/2
F (Φ)R̃ − Z(Φ)g̃µν∂µΦ ∂νΦ− U(Φ)
+ Sm[ψm; g̃µν ] .
For simplicity, we assume here that the weak equivalence principle is satisfied,
i.e., that the matter variables ψm are all coupled to the same ‘physical metric’
g̃µν . The general model (40) involves three arbitrary functions: a function F (Φ)
coupling the scalar Φ to the Ricci scalar of g̃µν , R̃ ≡ R(g̃µν), a function Z(Φ)
renormalizing the kinetic term of Φ, and a potential function U(Φ). As we
have the freedom of arbitrary redefinitions of the scalar field, Φ → Φ′ = f(Φ),
only two functions among F , Z and U are independent. It is often convenient
to rewrite (40) in a canonical form, obtained by redefining both Φ and g̃µν
according to
g∗µν = F (Φ) g̃µν , (41)
ϕ = ±
F ′2(Φ)
F 2(Φ)
F (Φ)
. (42)
This yields
16πG∗
∗ [R∗ − 2gµν∗ ∂µϕ∂νϕ− V (ϕ)] + Sm
2(ϕ) g∗µν
where R∗ ≡ R(g∗µν), where the potential
V (ϕ) = F−2(Φ)U(Φ) , (44)
and where the conformal coupling function A(ϕ) is given by
A(ϕ) = F−1/2(Φ) , (45)
with Φ(ϕ) obtained by inverting the integral (42).
The two arbitrary functions entering the canonical form (43) are: (i) the con-
formal coupling function A(ϕ), and (ii) the potential function V (ϕ). Note that
the ‘physical metric’ g̃µν (the one measured by laboratory clocks and rods) is
conformally related to the ‘Einstein metric’ g∗µν , being given by g̃µν = A
2(ϕ) g∗µν .
The canonical representation is technically useful because it decouples the two
14Actually, most unified models suggest that there are violations of the weak equivalence
principle. However, the study of general string-inspired tensor-scalar models [71] has found
that the composition-dependent effects would be negligible in the gravitational physics of
neutron stars that we consider here. The experimental limits on tests of the equivalence
principle would, however, bring a strong additional constraint of order 10−5 α2
∼ ∆a/a .
10−12. As this constraint is strongly model-dependent, we will not use it in our exclusion
plots below. One should, however, keep in mind that a limit on the scalar coupling strength
of order α2
. 10−7 [71, 72] is likely to exist in many, physically-motivated, tensor-scalar
models.
irreducible propagating excitations: the spin-0 excitations are described by ϕ,
while the pure spin-2 excitations are described by the Einstein metric g∗µν (with
kinetic term the usual Einstein-Hilbert action ∝ R(g∗µν)).
In many technical developments it is useful to work with the logarithmic
coupling function a(ϕ) such that:
a(ϕ) ≡ lnA(ϕ) ; A(ϕ) ≡ ea(ϕ) . (46)
In the case of the general model (40) this logarithmic15 coupling function is
given by
a(ϕ) = −
lnF (Φ) ,
where Φ(ϕ) must be obtained from (42).
In the following, we shall assume that the potential V (ϕ) is a slowly varying
function of ϕ which, in the domain of variation we shall explore, is roughly
equivalent to a very small mass term V (ϕ) ∼ 2m2ϕ(ϕ−ϕ0)2 with m2ϕ of cosmo-
logical order of magnitude m2ϕ = O(H20 ), or, at least, with a range λϕ = m−1ϕ
much larger than the typical length scales that we shall consider (such as the
size of the binary orbit, or the size of the Galaxy when considering violations of
the strong equivalence principle). Under this assumption16 the potential func-
tion V (ϕ) will only serve the role of fixing the value of ϕ far from the system
(to ϕ(r = ∞) = ϕ0), and its effect on the propagation of ϕ within the system
will be negligible. In the end, the tensor-scalar phenomenology that we shall
explore only depends on one function: the coupling function a(ϕ).
Let us consider some examples to see what kind of coupling functions might
naturally arise. First, the simplest case is the Jordan-Fierz-Brans-Dicke action,
which is of the general type (40) with
F (Φ) = Φ (47)
Z(Φ) = ωBD Φ
−1 , (48)
where ωBD is an arbitrary constant. Using Eqs. (42), (45) above, one finds that
− 2α0 ϕ = lnΦ and that the (logarithmic) coupling function is simply
a(ϕ) = α0 ϕ+ const. , (49)
where α0 = ∓(2ωBD + 3)−1/2, depending on the sign chosen in Eq. (42). Inde-
pendently of this sign, one has the link
α20 =
2ωBD + 3
. (50)
15As we shall mostly work with a(ϕ) below, we shall henceforth drop the adjective ‘loga-
rithmic’.
16Note, however, that, as was recently explored in [73, 74, 75], a sufficiently fast varying
potential V (ϕ) can change the tensor-scalar phenomenology by endowing ϕ with a mass term
m2ϕ =
∂2V/∂ϕ2 which strongly depends on the local value of ϕ and, thereby can get large
in sufficiently dense environments.
Note that 2ωBD + 3 must be positive for the spin-0 excitations to have the
correct (non ghost) sign.
Let us now discuss the often considered case of a massive scalar field having
a nonminimal coupling to curvature
16πG∗
g̃1/2
R̃− g̃µν∂µΦ ∂νΦ−m2ΦΦ2 + ξR̃Φ2
+ Sm[ψm; g̃µν ] .
This is of the form (40) with
F (Φ) = 1 + ξΦ2 , Z(Φ) = 1 , U(Φ) = m2ΦΦ
2 . (52)
The case ξ = − 1
is usually referred to as that of ‘conformal coupling’. With the
variables (51) the theory is ghost-free only if 2 (1+ ξΦ2)2 (dϕ/dΦ)2 = 1+ ξ(1 +
6 ξ)Φ2 is everywhere positive. If we do not wish to restrict the initial values of
Φ, we must have ξ(1+6 ξ) > 0. Introducing then the notation χ ≡
ξ(1 + 6 ξ),
we get the following link between Φ and ϕ:
1 + 2χΦ
1 + χ2 Φ2 + χΦ
1 + χ2 Φ2 −
1 + ξΦ2
. (53)
For small values of Φ, this yields ϕ = Φ/
2 + O(Φ3). The potential and the
coupling functions are given by
V (ϕ) =
1 + ξΦ2
, (54)
a(ϕ) = −1
ln(1 + ξΦ2) . (55)
These functions have singularities when 1+ ξΦ2 vanishes. If we do not wish
to restrict the initial value of Φ we must assume ξ > 0 (which then implies our
previous assumption ξ(1+6 ξ) > 0). Then there is a one-to-one relation between
Φ and ϕ over the entire real line. Small values of Φ correspond to small values
of ϕ and to a coupling function
a(ϕ) = − ξ ϕ2 +O(ϕ4) . (56)
On the other hand, large values of |Φ| correspond to large values of |ϕ|, and to
a coupling function of the asymptotic form
a(ϕ) ≃ −
|ϕ|+ const. (57)
The potential V (ϕ) has a minimum at ϕ = 0, as well as other minima at
ϕ → ±∞. If we assume, for instance, that m2Φ and the cosmological dynamics
are such that the cosmological value of ϕ is currently attracted towards zero,
the value of ϕ at large distances from the local gravitating systems we shall
consider will be ϕ0 ≪ 1.
As a final example of a possible tensor-scalar gravity theory, let us discuss
the string-motivated dilaton-runaway scenario considered in [76]. The starting
action (a functional of ḡµν and Φ) was taken of the general form
Bg(Φ)
BΦ(Φ)
[2 �̄Φ
−(∇̄Φ)2]− 1
BF (Φ)F̄
2 − V (Φ) + · · ·
and it was assumed that all the functions Bi(Φ) have a regular asymptotic be-
havior when Φ → +∞ of the form Bi(Φ) = Ci+O(e−Φ). Under this assumption
the early cosmological evolution can push Φ towards +∞ (hence the name ‘run-
away dilaton’). In the canonical, ‘Einstein frame’ representation (43), one has,
for large values of Φ, Φ ≃ c ϕ, where c is a numerical constant, and the coupling
function to hadronic matter is given by
ea(ϕ) ∝ ΛQCD(ϕ) ∝ B−1/2g (ϕ) exp[−8π2 b−13 BF (ϕ)]
where b3 is the one-loop rational coefficient entering the renormalization-group
running of the gauge field coupling g2F . This finally yields a coupling function
of the approximate form (for large values of ϕ):
a(ϕ) ≃ k e−cϕ + const. ,
where the dimensionless constants k and c are both expected to be of order unity.
[The constant c must be positive, but the sign of k is not a priori restricted.]
Summarizing: the JFBD model yields a coupling function which is a linear
function of ϕ, Eq. (49), a nonminimally coupled scalar yields a coupling function
which interpolates between a quadratic function of ϕ, Eq. (56), and a linear one,
Eq. (57), and the dilaton-runaway scenario of Ref. [76] yields a coupling function
of a decaying exponential type.
5.4 The role of the coupling function a(ϕ); definition of the
two-dimensional space of tensor-scalar gravity theo-
ries T1(α0, β0)
Let us now discuss how the coupling function a(ϕ) enters the observable pre-
dictions of tensor-scalar gravity at the first post-Newtonian (1PN) level, i.e.,
in the weak-field conditions appropriate to solar-system tests. It was shown in
previous work that, if one uses appropriate units in the asymptotic region far
from the system, namely units such that the asymptotic value a(ϕ0) of a(ϕ)
vanishes17, all observable quantities at the 1PN level depend only on the values
17In these units the Einstein metric g∗µν and the physical metric g̃µν asymptotically coin-
cide.
of the first two derivatives of the a(ϕ) at ϕ = ϕ0. More precisely, if one defines
α(ϕ) ≡ ∂ a(ϕ)
; β(ϕ) ≡ ∂ α(ϕ)
∂2 a(ϕ)
, (58)
and denotes by α0 ≡ α(ϕ0), β0 ≡ β(ϕ0) their asymptotic values, one finds
(see, e.g., [33]) that the effective gravitational constant between two bodies (as
measured by a Cavendish experiment) is given by
G = G∗(1 + α
0) , (59)
while, among the PPN parameters, only the two basic Eddington ones, γ̄ ≡
γPPN − 1, and β̄ ≡ βPPN − 1, do not vanish, and are given by
γ̄ ≡ γPPN − 1 = −2 α
1 + α20
, (60)
β̄ ≡ βPPN − 1 = 1
α0 β0 α0
(1 + α20)
. (61)
The structure of the results (60) and (61) can be transparently expressed by
means of simple (Feynman-like) diagrams (see, e.g., [77]). Eqs. (59) and (60)
correspond to diagrams where the interaction between two worldlines (repre-
senting two massive bodies) is mediated by the sum of the exchange of one
graviton and one scalar particle. The scalar couples to matter with strength
G∗. The exchange of a scalar excitation then leads to a term ∝ α20. On
the other hand, Eq. (61) corresponds to a nonlinear interaction between three
worldlines involving: (i) the ‘generation’ of a scalar excitation on a first world-
line (factor α0), (ii) a nonlinear vertex on a second worldline associated to the
quadratic piece of a(ϕ) (aquad(ϕ) =
β0(ϕ−ϕ0)2; so that one gets a factor β0),
and (iii) the final ‘absorption’ of a scalar excitation on a third worldline (second
factor α0).
Eqs. (60) and (61) can be summarized by saying that the first two coefficients
in the Taylor expansion of the coupling function a(ϕ) around ϕ = ϕ0 (after
setting a(ϕ0) = 0)
a(ϕ) = α0(ϕ− ϕ0) +
β0(ϕ − ϕ0)2 + · · · (62)
suffice to determine the quasi-stationary, weak-field (1PN) predictions of any
tensor-scalar theory. In other words, the solar-system tests only explore the
‘osculating approximation’ (62) (slope and local curvature) to the function a(ϕ).
Note that GR corresponds to a vanishing coupling function a(ϕ) = 0 (so that
α0 = β0 = · · · = 0), the JFBD model corresponds to keeping only the first term
on the R.H.S. of (62), while, for instance, the nonminimally coupled scalar field
(with asymptotic value ϕ0 ≪ 1) does indeed lead to nonzero values for both α0
and β0, namely
α0 ≃ − 2 ξ ϕ0 ; β0 ≃ − 2 ξ . (63)
Finally the dilaton-runaway scenario considered above leads also to non zero
values for both α0 and β0, namely
α0 ≃ − k c e−cϕ0 ; β0 ≃ + k c2 e−cϕ0 , (64)
for a largish value of ϕ0. Note that the dilaton-runaway model naturally predicts
that α0 ≪ 1, and that β0 is of the same order of magnitude as α0 : β0 ≃ − c α0
with c being (positive and) of order unity. The interesting outcome is that such
a model is well approximated by the usual JFBD model (with β0 = 0). This
shows that a JFBD-like theory could come out from a model which is initially
quite different from the usual exact JFBD theory.
As we shall discuss in detail below, solar-system tests constrain α20 and α
0 |β0|
to be both small. This immediately implies that |α0| must be small, i.e., that
the scalar field is linearly weakly coupled to matter. On the other hand, the
quadratic coupling parameter β0 is not directly constrained. Both its magnitude
and its sign can be more or less arbitrary. Note that there are no a priori sign
restrictions on β0. The conformal factor A
2(ϕ) = exp(2 a(ϕ)) entering Eq. (43)
had to be positive, but this leads to no restrictions on the sign of a(ϕ) and of
its various derivatives18. For instance, in the nonminimally coupled scalar field
case, it seemed more natural to require ξ > 0, which leads to a negative β0 in
view of Eq. (63).
Let us summarize the results above: (i) the most general tensor-scalar the-
ory19 is described by one arbitrary function a(ϕ); and (ii) weak-field tests depend
only on the first two terms, parametrized by α0 and β0, in the Taylor expansion
(62) of a(ϕ) around its asymptotic value ϕ0.
From this follows a rather natural way to define a simplemini space of tensor-
scalar theories. It suffices to consider the two-dimensional space of theories, say
T1(α0, β0), defined by the coupling function which is a quadratic polynomial in
ϕ [34, 35], say
aα0,β0(ϕ) = α0(ϕ− ϕ0) +
β0(ϕ− ϕ0)2 . (65)
As indicated, this class of theories depends only on two parameters: α0 and
β0. The asymptotic value ϕ0 of ϕ does not count as a third parameter (when
using the form (65)) because one can always work with the shifted field ϕ̄ ≡
ϕ−ϕ0, with asymptotic value ϕ̄0 = 0 and coupling function aα0,β0(ϕ̄) = α0 ϕ̄+
β0 ϕ̄
2. Moreover, as already said, the asymptotic value a(ϕ0) of a(ϕ) has also
no physical meaning, because one can always use units such that it vanishes (as
done in (65)).
18As explained above, we assume here the presence of a potential term V (ϕ) to fix the
asymptotic value ϕ0 of ϕ. If the potential V (ϕ) is absent (or negligible), the ‘attractor
mechanism’ of Refs. [78, 71] would attract ϕ to a minimum of the coupling function a(ϕ),
thereby favoring a positive value of β0.
19Under the assumption that the potential V (ϕ) is a slowly-varying function of ϕ, which
modifies the propagation of ϕ only on very large scales.
Note also that an alternative way to represent the same class of theories is
to use a coupling function of the very simple form
aβ(ϕ) =
β ϕ2 , (66)
but to keep the asymptotic value ϕ0 as an independent parameter. This class
of theories is clearly equivalent to T1(α0, β0), Eq. (65), with the dictionary:
α0 = β ϕ0, β0 = β.
5.5 Tensor-scalar gravity, strong-field effects, and binary-
pulsar observables
Having chosen some mini-space of gravity theories, we now wish to derive what
predictions these theories make for the timing observables of binary pulsars. To
do this we need to generalize the general relativistic treatment of the motion and
timing of binary systems comprising strongly self-gravitating bodies summarized
above. Let us recall that this treatment was based on a multi-chart method,
using a matching between two separate problems: (i) the ‘internal problem’
considers each strongly self-gravitating body in a suitable approximately freely
falling frame where the influence of its companion is small, and (ii) the ‘external
problem’ where the two bodies are described as effective point masses which
interact via the various fields they are coupled to. Let us first consider the
internal problem, i.e., the description of a neutron star in an approximately
freely falling frame where the influence of the companion is reduced to imposing
some boundary conditions on the tensor and scalar fields with which it interacts
[7, 8, 33, 34, 35]. The field equations of a general tensor-scalar theory, as derived
from the canonical action (43) (neglecting the effect of V (ϕ)) read
R∗µν = 2 ∂µϕ∂νϕ+ 8πG∗
T ∗µν −
T ∗g∗µν
, (67)
�g∗ ϕ = − 4πG∗ α(ϕ)T∗ , (68)
where T
∗ ≡ 2 c (g∗)−1/2 δSm/δg∗µν denotes the material stress-energy tensor in
‘Einstein units’, and α(ϕ) the ϕ-derivative of the coupling function, see Eq. (58).
All tensorial operations in Eqs. (67) and (68) are performed by using the Einstein
metric g∗µν .
Explicitly writing the field equations (67) and (68) for a slowly rotating
(stationary, axisymmetric) neutron star, labelled20 A, leads to a coupled set of
ordinary differential equations constraining the radial dependence of g∗µν and
ϕ [35, 79]. Imposing the boundary conditions g∗µν → ηµν , ϕ → ϕa at large
radial distances, finally determines the crucial ‘form factors’ (in Einstein units)
describing the effective coupling between the neutron star A and the fields to
20We henceforth use the labels A and B for the (recycled) pulsar and its companion, instead
of the labels a and b used above. We henceforth use the label a to denote the asymptotic
value of some quantity (at large radial distances within the local frame, Xi
or Xi
, of the
considered neutron star A or B).
which it is sensitive: total mass mA(ϕa), total scalar charge ωA(ϕa), and inertia
moment IA(ϕa). As indicated, these quantities are functions of the asymptotic
value ϕa of ϕ felt by the considered neutron star
21. They satisfy the relation
ωA(ϕa) = −∂ mA(ϕa)/∂ ϕa. From them, one defines other quantities that play
an important role in binary pulsar physics, notably
αA(ϕa) ≡ −
≡ ∂ lnmA
, (69)
βA(ϕa) ≡
, (70)
as well as
kA(ϕa) ≡ −
∂ ln IA
. (71)
The quantity αA, Eq. (69), plays a crucial role. It measures the effective coupling
strength between the neutron star and the ambient scalar field. If we formally let
the self-gravity of the neutron A tend toward zero (i.e., if we consider a weakly
self-gravitating object), the function αA(ϕa) becomes replaced by α(ϕa) where
α(ϕ) ≡ ∂ a(ϕ)/∂ ϕ is the coupling strength appearing in the R.H.S. of Eq. (68).
Roughly speaking, we can think of αA(ϕa) as a (suitable defined) average value
of the local coupling strength α(ϕ(r)) over the radial profile of the neutron star
0.5 1 1.5 2 2.5 3
critical
maximum
maximum
mass in GR
scalar charge
baryonic mass
neutron star
Figure 2: Dependence upon the baryonic mass m̄A of the coupling parameter
αA in the theory T1(α0, β0) with α0 = −0.014, β0 = −6. Figure taken from
[80].
It was pointed out in Refs. [34, 35] that the strong self-gravity of a neutron
star can cause the effective coupling strength αA(ϕa) to become of order unity,
21This ϕa is a combination of the cosmological background value ϕ0 and of the scalar
influence of the companion of the considered neutron star. It varies with the orbital period and
is determined as part of the ‘external problem’ discussed below. Note that, strictly speaking,
the label a (for asymptotic) should be indexed by the label of the considered neutron star: i.e.
one should use a label aA (and a locally asymptotic value ϕaA) when considering the neutron
star A, and a label aB (with a corresponding ϕaB ) when considering the neutron star B.
even when its weak-field counterpart α0 = α(ϕa) is extremely small (as is im-
plied by solar-system tests that put strong constraints on the PPN combination
γ̄ = −2α20/(1+α20)). This is illustrated, in the minimal context of the T1(α0, β0)
class of theories, in Figure 2.
Note that when the baryonic mass m̄A of the neutron star is smaller than
the critical mass m̄cr ≃ 1.24M⊙ the effective scalar coupling strength αA of
the star is quite small (because it is proportional to its weak-field limit α0 =
α(ϕa)). By contrast, when m̄A > m̄cr, |αA| becomes of order unity, nearly
independently of the externally imposed α0 = αa = α(ϕa). This interesting
non-perturbative behaviour was related in [34, 35] to a mechanism of spontaneous
scalarization, akin to the well-known mechanism of spontaneous magnetization
of ferromagnets. See also [51] for a simple analytical description of the behaviour
of αA.
Let us also mention in passing that, in the case where A is a black hole, the
effective coupling strength αA actually vanishes [33]. This result is related to
the impossibility of having (regular) ‘scalar hair’ on a black hole.
We have sketched above the first part of the matching approach to the mo-
tion and timing of strongly self-gravitating bodies: the ‘internal problem’. It
remains to describe the remaining ‘external problem’. As already mentionned
(and emphasized, in the present context, by Eardley [7, 11]), the most efficient
way to describe the external problem is, instead of matching in detail the exter-
nal fields (g∗µν , ϕ) to the fields generated by each body in its comoving frame, to
‘skeletonize’ the bodies by point masses. Technically this means working with
the action
16πG∗
∗ [R∗ − 2 gµν∗ ∂µϕ∂νϕ]
mA(ϕ(zA))(−g∗µν(zA) dz
1/2 , (72)
where the function mA(ϕ) in the last term on the R.H.S. is the function mA(ϕa)
obtained above by solving the internal problem. Eq. (72) indicates that the ar-
gument of this function is taken to be ϕa = ϕ(zA), i.e., the value that the scalar
field (as viewed in the external problem) takes at the location z
A of the center of
mass of body A. However, as body A is described, in the external problem, as a
point mass this causes a technical difficulty: the externally determined field ϕ(x)
becomes formally singular at the location of the point sources, so that ϕ(zA) is a
priori undefined. One can either deal with this problem by coming back to the
physically well-defined matching approach (which shows that ϕ(zA) should be
replaced by ϕa, the value of ϕ in an intermediate domain RA ≪ r ≪ |zA−zB |),
or use the efficient technique of dimensional regularization. This means that the
spacetime dimension D in Eq. (72) is first taken to have a complex value such
that ϕ(zA) is finite, before being analytically continued to its physical value
D = 4.
One then derives from the action (72) two important consequences for the
motion and timing of binary pulsars. First, one derives the Lagrangian de-
scribing the relativistic interaction between N strongly self-gravitating bodies
(including orbital ∼ (v/c)2 effects, and neglecting O(v4/c4) ones) [11, 7, 33, 39].
It is the sum of one-body, two-body and three-body terms.
The one-body action has the usual form of the sum (over the label A) of the
kinetic term of each point mass:
one-body
A = −mA c
1− v2A/c2
= −mA c2 +
(v2A)
. (73)
Here, we use Einstein units, and the inertial mass mA entering Eq. (73) is mA ≡
mA(ϕ0), where ϕ0 is the asymptotic value of ϕ far away from the considered
N -body system.
The two-body action is a sum over the pairs A,B of a term L
2-body
AB which
differs from the GR-predicted 2-body Lagrangian in two ways: (i) the usual
gravitational constant G appearing as an overall factor in L
2-body
AB must be re-
placed by an effective (body-dependent) gravitational constant (in the appro-
priate units mentioned above) given by
GAB = G∗(1 + αA αB) , (74)
and (ii) the relativistic (O(v2/c2)) terms in L2-bodyAB contain, in addition to those
predicted by GR, new velocity-dependent terms of the form
2-body
AB = (γ̄AB)
GAB mAmB
(vA − vB)2
, (75)
γ̄AB ≡ γAB − 1 = − 2
αA αB
1 + αA αB
. (76)
In these expressions αA ≡ αA(ϕ0) ≡ ∂ lnmA(ϕ0)/∂ϕ0 (see Eq. (69) with ϕa →
Finally, the 3-body action is a sum over the pairs B,C and over A (with
A 6= B, A 6= C, but the possibility of having B = C) of
3-body
ABC = −(1 + 2 β̄
GAB GAC mAmB mC
c2 rAB rAC
where
β̄ABC ≡ βABC − 1 =
αB βA αC
(1 + αA αB)(1 + αA αC)
, (78)
with βA = ∂αA(ϕ0)/∂ϕ0 (see Eq. (70) with ϕa → ϕ0).
When comparing the strong-field results (74), (76), (78) to their weak-field
counterparts (59), (60), (61) one sees that the body-dependent quantity αA
replaces the weak-field coupling strength α0 in all quantities which are linked
to a scalar effect generated by body A. Note also that, in keeping with the
‘3-body’ nature of Eq. (77), the quantity βABC −1 is linked to scalar interactions
which are generated in bodies B and C and which nonlinearly interact on body
A. The notation used above has been chosen to emphasize that γAB and β
are strong-field analogs of the usual Eddington parameters γPPN, βPPN, so that
γ̄AB and β̄
BC are strong-field analogs of the ‘post-Einstein’ 1PN parameters γ̄
and β̄ (which vanish in GR). Indeed the usual PPN results for the post-Einstein
terms in the O(1/c2) 2-body and 3-body Lagrangians are obtained by replacing
in Eqs. (75) and (77) γ̄AB → γ̄, β̄ABC → β̄ and GAB → G.
The non-perturbative strong-field effects discussed above show that the strong
self-gravity of neutron stars can cause γAB and β
BC to be significantly different
from their GR values γGR = 1, βGR = 1, in some scalar-tensor theories having
a small value of the basic coupling parameter α0 (so that γ
PPN − 1 ∝ α20 and
βPPN − 1 ∝ β0 α20 are both small). For instance, Fig. 2 shows that it is possible
to have αA ∼ αB ∼ ± 0.6 which implies γAB − 1 ∼ − 0.53, i.e., a 50% deviation
from GR! Even larger effects can arise in βABC − 1 because of the large values
that βA = ∂αA/∂ϕ0 can reach near the spontaneous scalarization transition
[35].
Those possible strong-field modifications of the effective Eddington param-
eters γAB, β
BC , which parametrize the ‘first post-Keplerian’ (1PK) effects
(i.e., the orbital effects ∼ v2/c2 smaller than those entailed by the Lagrangian
A 6=B
GAB mAmB/rAB), can then significantly modify the usual
GR predictions relating the directly observable parametrized post-Keplerian
(PPK) parameters to the values of the masses of the pulsar and its compan-
ion. As worked out in Refs. [11, 31, 33, 35] one finds the following modified
predictions for the PPK parameters k ≡ 〈ω̇〉/n, r and s:
kth(mA,mB) =
1− e2
GAB(mA +mB)n
αA αB
1 + αA αB
− XA βB α
A +XB βA α
6 (1 + αA αB)2
, (79)
rth(mA,mB) = G0B mB , (80)
sth(mA,mB) =
GAB(mA +mB)n
]−1/3
. (81)
Here, the label A refers to the object which is timed (‘the pulsar’22), the label B
refers to its companion, xA = aA sin i/c denotes the projected semi-major axis
of the orbit of A (in light seconds), XA ≡ mA/(mA+mB) andXB ≡ mB/(mA+
mB) = 1 − XA the mass ratios, n ≡ 2π/Pb the orbital frequency and G0B =
G∗(1 + α0 αB) the effective gravitational constant measuring the interaction
22In the double binary pulsar, both the first discovered pulsar and its companion are pulsars.
However, the companion B is a non recycled, slow pulsar whose motion is well described by
Keplerian parameters only.
between B and a test object (namely electromagnetic waves on their way from
the pulsar toward the Earth). In addition one must replace the unknown bare
Newtonian G∗ by its expression in terms of the one measured in Cavendish
experiments, i.e., G∗ = G/(1 + α
0) as deduced from Eq. (59).
The modified theoretical prediction for the PPK parameter γ entering the
‘Einstein time delay’ ∆E , Eq. (24), is more complicated to derive because one
must take into account the modulation of the proper spin period of the pulsar
caused by the variation of its moment of inertia IA under the (scalar) influence
of its companion [11, 7, 35]. This leads to
γth(mA,mB) =
1 + αA αB
GAB(mA +mB)n
[XB(1 + αA αB) + 1 + kA αB ] , (82)
where kA(ϕ0) = −∂ ln IA(ϕ0)/∂ϕ0 (see Eq. (71) with ϕa → ϕ0). Numerical
studies [35] show that kA can take quite large values. Actually, the quantity
kA αB entering (82) blows up near the scalarization transition when α0 → 0
(keeping β0 < 0 fixed). In other words a theory which is closer to GR in weak-
field conditions predicts larger deviations in the strong-field regime.
The structure dependence of the effective gravitational constantGAB , Eq. (74),
has also the consequence that the object A does not fall in the same way
as B in the gravitational field of the Galaxy. As most of the mass of the
Galaxy is made of non strongly-self-gravitating bodies, A will fall toward the
Galaxy with an acceleration ∝ GA0, while B will fall with an acceleration
∝ GB0. Here, as above, GA0 = G0A = G∗(1 + α0 αA) is the effective gravi-
tational constant between A and any weakly self-gravitating body. As pointed
out in Ref. [43] this possible violation of the universality of free fall of self-
gravitating bodies can be constrained by using observational data on the class
of small-eccentricity long-orbital-period binary pulsars. More precisely, the
quantity which can be observationally constrained is not exactly the violation
∆AB = (G0A −G0B)/G = (1 +α20)−1(α0 αA −α0 αB) of the strong equivalence
principle [which simplifies to ∆A0 = (G0A − G)/G = (1 + α20)−1(α0 αA − α20)
in the case of observational relevance where one neglects the self-gravity of the
white-dwarf companion] but rather23 [33]
∆effective ≡
2 γAB − (XA βBAA +XB βABB) + 2
(1 + αA αB)
−3/2(1 + α20)
−1(α0 αA − α0 αB) . (83)
Here, the index B (= white-dwarf companion) can be replaced by 0 (weakly self-
gravitating body) so that, for instance, γAB = γA0 = 1− 2αAα0/(1+αA α0) =
(1− αA α0)/(1 + αA α0), as deduced from Eq. (76).
23This refinement is given here for pedagogical completeness. However, in practice, the
lowest-order result ∆ ≃ (1 + α2
)−1(α0 αA − α
) ≃ α0 αA − α
is accurate enough.
It remains to discuss the possible strong-field modifications of the theoretical
prediction for the orbital period derivative Ṗb = Ṗ
b (mA,mB). This is obtained
by deriving from the effective action (72) the energy lost by the binary system
in the form of fluxes of spin-2 and spin-0 waves at infinity. The needed results
in a generic tensor-scalar theory were derived in Refs. [33, 39] (in addition one
must take into account the tensor-scalar modification of the additional ‘varying-
Doppler’ contribution to the observed Ṗb due to the Galactic acceleration [38]).
The final result for Ṗb is of the form
Ṗ thb (mA,mB) = Ṗ
monopole
bϕ + Ṗ
dipole
bϕ + Ṗ
quadrupole
bϕ + Ṗ
quadrupole
galagtic
bGR + δ
th Ṗ
galactic
b , (84)
where, for instance, Ṗ
monopole
bϕ is (heuristically
24) related to the monopolar flux
of spin-0 waves at infinity. The term Ṗ
quadrupole
bg∗ corresponds to the usual
quadrupolar flux of spin-2 waves at infinity. It reads:
quadrupole
bg∗ (mA,mB) = −
5(1 + αA αB)
(mA +mB)2
GAB(mA +mB)n
1 + 73 e2/24 + 37 e4/96
(1− e2)7/2
with GAB = G∗(1 + αA αB) = G(1 + αA αB)/(1 + α
0), where G∗ is the ‘bare’
gravitational constant appearing in the action, while G is the gravitational con-
stant measured in Cavendish experiments. The flux (85) is the only one which
survives in GR (although without any αA-related modifications). Among the
several other contributions which arise in tensor-scalar theories, let us only write
down the explicit expression of the contribution to (84) coming from the dipolar
flux of scalar waves. Indeed, this contribution is, in most cases, the dominant
one [7] because it scales as (v/c)3, while the monopolar and quadrupolar con-
tributions scale as (v/c)5. It reads
dipole
bϕ (mA,mB) = −2π
G∗mAmB n
c3(mA +mB)
1 + e2/2
(1 − e2)5/2
(αA − αB)2 . (86)
Note that the dipolar effect (86) vanishes when αA = αB . Indeed, a binary
system made of two identical objects (A = B) cannot select a preferred direction
for a dipole vector, and cannot therefore emit any dipolar radiation. This also
implies that double neutron star systems (which tend to have mA ≈ mB ∼
1.35M⊙) will be rather poor emitters of dipolar radiation (though (86) still tends
to dominate over the other terms in (84), because of the remaining difference
(mA − mB)/(mA + mB) 6= 0). By contrast, very dissymmetric systems such
24Contrary to the GR case where a lot of effort was spent to show how the observed Ṗb
was directly related to the GR predictions for the (v/c)5-accurate orbital equations of motion
of a binary system [9], we use here the indirect and less rigorous argument that the energy
flux at infinity should be balanced by a corresponding decrease of the mechanical energy of
the binary system.
as a neutron-star and a white-dwarf (or a neutron-star and a black hole) will
be very efficient emitters of dipolar radiation, and will potentially lead to very
strong constraints on tensor-scalar theories. See below.
5.6 Theory-space analyses of binary pulsar data
Having reviewed the theoretical results needed to discuss the predictions of
alternative gravity theories, let us end by summarizing the results of various
theory-space analyses of binary pulsar data.
Let us first recall what are the best, current solar-system limits on the two
1PN ‘post-Einstein’ parameters γ̄ ≡ γPPN − 1 and β̄ ≡ βPPN − 1. They are:
γ̄ = (2.1± 2.3)× 10−5 , (87)
from frequency shift measurements made with the Cassini spacecraft [81], which
supersedes the constraint
γ̄ = (−1.7± 4.5)× 10−4 (88)
from VLBI measurements [82],
|2 γ̄ − β̄| < 3× 10−3 , (89)
from Mercury’s perihelion shift [66, 83], and
4 β̄ − γ̄ = (4.4± 4.5)× 10−4 , (90)
from Lunar laser ranging measurements [84].
Concerning binary pulsar data, we can make use of the published measure-
ments of various Keplerian and post-Keplerian timing parameters in the binary
pulsars: PSR 1913+16 [37], PSR B1534+12 [41], PSR J1141−6545 [47] and
PSR J0737−3039A+B [3, 48, 49]. In addition, we can use25 the recently up-
dated limit on the parameter ∆ measuring a possible violation of the strong
equivalence principle (SEP), namely |∆| < 5.5 × 10−3 at the 95% confidence
level [44].
This ensemble of solar-system and binary-pulsar data can then be analyzed
within any given parametrized theoretical framework. For instance, one might
work within
(i) the 4-parameter framework T0(γ̄, β̄; ǫ, ζ) [70] which defines the 2PN exten-
sion of the original (Eddington) PPN framework T0(γ̄, β̄); or
(ii) the 2-parameter class of tensor-mono-scalar theories T1(α0, β0) [34]; or
25There is, however, a caveat in the theoretical use one can make of the phenomenological
limits on ∆. Indeed, in the small-eccentricity long-orbital-period binary pulsar systems used
to constrain ∆ one does not have access to enough PK parameters to measure the pulsar mass
mA directly. As the theoretical expression of ∆ ≃ α0 αA −α
depends on mA (through αA),
one needs to assume some fiducial value of mA (say mA ≃ 1.35M⊙).
(iii) the 2-parameter class of tensor-bi-scalar theories T2(β
′, β′′) [33].
Here, the index 0 on T0(γ̄, β̄; ǫ, ζ) is a reminder of the fact that this framework
is not a family of specific theories (it contains zero explicit dynamical fields),
but is a parametrization of 2PN deviations from GR. As a consequence, its use
for analyzing binary pulsar data is somewhat ill-defined because one needs to
truncate the various timing observables (which are functions of the compactness
of the two bodies A and B, say PPK = f(cA, cB)) at the 2PN order (i.e. es-
sentially at the quadratic order in cA and/or cB). For some observables (or for
product of observables) there might be several ways of defining this truncation.
In spite of this slight inconvenience, the use of the T0(γ̄, β̄; ǫ, ζ) framework is
conceptually useful because it shows very clearly why and how binary-pulsar
data can probe the behaviour of gravitational theories beyond the usual 1PN
regime probed by solar-system tests.
For instance, the parameter ∆A ≡ mgravA /minertA − 1 measuring the strong
equivalence principle (SEP) violation in a neutron star has, within the T0(γ̄, β̄; ǫ, ζ)
framework, a 2PN-order expansion of the form [33, 70]
∆A = −
(4 β̄ − γ̄) cA +
+ ζ +O(β̄)
bA , (91)
where cA = −2 ∂ lnmA∂ lnG ≃
〈U〉A, bA = 1c4 〈U
2〉A ≃ B c2A, with B ≃ 1.026 and
cA ≃ kmA/M⊙ with k ∼ 0.21. The general result (91) is compatible with the
result quoted in subsection 5.2 within the context of the theory T2(β
′, β′′) when
taking into account the fact that, within T2(β
′, β′′), one has β̄ = γ̄ = 0, ǫ = β′
and ζ = 0 [and that β′′ parametrizes some effects beyond the 2PN level].
On the example of Eq. (91) one sees that, after having used solar-system tests
to constrain the first contribution on the RHS to a very small value, one can
use binary-pulsar tests of the SEP to set a significant limit on the combination
ǫ + ζ of 2PN parameters. Other pulsar data then yield significant limits on
other combinations of the two 2PN parameters ǫ and ζ. The final conclusion
is that binary-pulsar data allow one to set significant limits (around or better
than the 1% level) on the possible 2PN deviations from GR (in contrast to
solar-system tests which are unable to yield any limit on ǫ and ζ) [70]. For a
recent update of the limits on ǫ and ζ, which makes use of recent pulsar data
see [51].
Let us now briefly discuss the use of mini-space of theories, such as T1(α0, β0)
or T2(β
′, β′′), for analyzing solar-system and binary-pulsar data. The basic
methodology is to compute, for each given theory (e.g. for each given values of
α0 and β0 if one chooses to work in the T1(α0, β0) theory space) a goodness-
of-fit statistics χ2(α0, β0) measuring the quality of the agreement between the
experimental data and the considered theory. For instance, when considering
the timing data of a particular pulsar, for which one has measured several PK
parameters pi (i = 1, . . . , n) with some standard deviations σ
, one defines,
for this pulsar
χ2(α0, β0) = min
mA,mB
(σobspi )
theory
i (α0, β0;mA,mB)− p
2 , (92)
where ‘min’ denotes the result of minimizing over the unknown masses mA,mB
and where p
theory
i (α0, β0;mA,mB) denotes the theoretical prediction (within
T1(α0, β0)) for the PK observable pi (given also the observed values of the
Keplerian parameters).
The goodness-of-fit quantity χ2(α0, β0) will reach its minimum χ
min for some
values, say αmin0 , β
0 , of α0 and β0. Then, one focusses, for each pulsar, on the
level contours of the function
∆χ2(α0, β0) ≡ χ2(α0, β0)− χ2min . (93)
Each choice of level contour (e.g. ∆χ2 = 1 or ∆χ2 = 2.3) defines a certain
region in theory space, which contains, with a certain corresponding ‘confidence
level’, the ‘correct’ theory of gravity (if it belongs to the considered mini-space of
theories). When combining together several independent data sets (e.g. solar-
system data, and different pulsar data) we can define a total goodness-of-fit
statistics χ2tot(α0, β0), by adding together the various individual χ
2(α0, β0). This
leads to a corresponding combined contour ∆χ2tot(α0, β0).
Let us end by briefly summarizing the results of the theory-space approach
to relativistic gravity tests. For detailed discussions the reader should consult
Refs. [33, 40, 35, 36, 80], and especially the recent update [51] which uses the
latest binary-pulsar data.
Regarding the two-parameter class of tensor-bi-scalar theories T2(β
′, β′′) the
recent analysis [51] has shown that the ∆χ2(β′, β′′) corresponding to the double
binary pulsar PSR J0737−3039 was defining quite a small elliptical allowed
region in the (β′, β′′) plane. By contrast the other pulsar data define much wider
allowed regions, while the strong equivalence principle tests define (in view of
the theoretical result ∆ ≃ 1 + 1
Bβ′(c2A − c2B)) a thin, but infinitely long, strip
|β′| < cst. in the (β′, β′′) plane. This highlights the power of the double binary
pulsar in probing certain specific strong-field deviations from GR.
Contrary to the T2(β
′, β′′) tensor-bi-scalar theories, which were constructed
to have exactly the same first post-Newtonian limit as GR26 (so that solar-
system tests put no constraints on β′ and β′′), the class of tensor-mono-scalar
theories T1(α0, β0) is such that its parameters α0 and β0 parametrize both the
weak-field 1PN regime (see Eqs. (60) and (61) above) and the strong-field regime
(which plays an important role in compact binaries). This means that each class
of solar-system data (see Eqs. (87)–(90) above) will define, via a corresponding
goodness-of-fit statistics of the type, say
χ2Cassini(α0, β0) = (σ
Cassini
−2 (γ̄theory(α0, β0)− γ̄Cassini)2
26However, this could be achieved only at the cost of allowing some combination of the two
scalar fields to carry a negative energy flux.
a certain allowed region27 in the (α0, β0) plane. As a consequence, the analysis
in the framework of the T1(α0, β0) space of theories allows one to compare
and contrast the probing powers of solar-system tests versus binary-pulsar tests
(while comparing also solar-system tests among themselves and binary-pulsar
ones among themselves). The result of the recent analysis [51] is shown in
Figure 3.
general relativity
B1534+12
J1141–6545
J0737–3039
B1913+16
−6 −4 −2 0 2 4 6
0.025
matter
matter
0.175
0.075
0.125
solar
system
Figure 3: Solar-system and binary-pulsar constraints on the two-parameter fam-
ily of tensor-mono-scalar theories T1(α0, β0). Figure taken from [51].
In Fig. 3, the various solar-system constraints (87)–(90) are concentrated
around the horizontal β0 axis. In particular, the high-precision Cassini con-
straint is the lower small grey strip. The various pulsar constraints are labelled
by the name of the pulsar, except for the strong equivalence principle constraint
which is labelled SEP. Note that General Relativity corresponds to the origin
of the (α0, β0) plane, and is compatible with all existing tests.
The global constraint obtained by combining all the pulsar tests would, to a
good accuracy, be obtained by intersecting the various pulsar-allowed regions.
One can then see on Fig. 3 that it would be comparable to the pre-Cassini
solar-system constraints and that its boundaries would be defined successively
(starting from the left) by 1913+16, 1141−6545, 0737−3039, 1913+16 again
and 1141−6545 again.
A first conclusion is therefore that, at the quantitative level, binary-pulsar
tests constrain tensor-scalar gravity theories as strongly as most solar-system
27Actually, in the case of the Cassini data, as it is quite plausible that the positive value of
the published central value γ̄Cassini = +2.1× 10−5 is due to unsubtracted systematic effects,
we use σCassiniγ = 2.3× 10
−5 but γ̄Cassini = 0. Otherwise, we would get unreasonably strong
1σ limits on α2
because tensor-scalar theories predict that γ̄ must be negative, see Eqs. (60)
and (61).
tests (excluding the exceptionally accurate Cassini result which constrains α20
to be smaller than 1.15 × 10−5, i.e. |α0| < 3.4 × 10−3). A second conclusion
is obtained by comparing the behaviour of the solar-system exclusion plots and
of the binary-pulsar ones around the negative β0 axis. One sees that binary-
pulsar tests exclude a whole domain of the theory space (located on the left
of β0 < −4) which is compatible with all solar-system experiments (even when
including the very tight Cassini constraint). This remarkable qualitative feature
of pulsar tests is a direct consequence of the existence of (non-perturbative)
strong-field effects which start developing when the product −β0 cA (with cA
denoting, as above, the compactness of the pulsar) becomes of order unity.
6 Conclusion
In conclusion, we hope to have convinced the reader of the superb opportunities
that binary pulsar data offer for testing gravity theories. In particular, they
have been able to go qualitatively beyond solar-system experiments in probing
two physically important regimes of relativistic gravity: the radiative regime
and the strong-field one. Up to now, General Relativity has passed with flying
colours all the radiative and strong-field tests provided by pulsar data. However,
it is important to continue testing General Relativity in all its aspects (weak-
field, radiative and strong-field). Indeed, history has taught us that physical
theories have a limited range of validity, and that it is quite difficult to predict
in which regime a theory will cease to be an accurate description of nature. Let
us look forward to new results, and possibly interesting surprises, from binary
pulsar data.
Acknowledgments
It is a pleasure to thank my long-term collaborator Gilles Esposito-Farèse for his
useful remarks on the text, and for providing the figures. I wish also to thank
the organizers of the 2005 Sigrav School, and notably Monica Colpi and Ugo
Moschella, for organizing a warm and intellectually stimulating meeting. This
work was partly supported by the European Research and Training Network
“Forces Universe” (contract number MRTN-CT-2004-005104).
References
[1] R. A. Hulse and J. H. Taylor: Discovery of a pulsar in a binary system,
Astrophys. J. 195, L51 (1975).
[2] M. Burgay et al.: An increased estimate of the merger rate of double
neutron stars from observations of a highly relativistic system, Nature
426, 531 (2003), arXiv:astro-ph/0312071.
[3] A. G. Lyne et al.: A double-pulsar system: A rare laboratory for relativistic
gravity and plasma physics, Science 303, 1153 (2004).
http://arxiv.org/abs/astro-ph/0312071
[4] F. K. Manasse: J. Math. Phys. 4, 746 (1963).
[5] P. D. D’Eath: Phys. Rev. D 11, 1387 (1975).
[6] R. E. Kates: Phys. Rev. D 22, 1853 (1980).
[7] D. M. Eardley: Astrophys. J. 196, L59 (1975).
[8] C. M. Will, D. M. Eardley: Astrophys. J. 212, L91 (1977).
[9] T. Damour: Gravitational radiation and the motion of compact bodies,
in Gravitational Radiation, edited by N. Deruelle and T. Piran, North-
Holland, Amsterdam, pp. 59-144 (1983).
[10] K. S. Thorne and J. B. Hartle: Laws of motion and precession for black
holes and other bodies, Phys. Rev. D 31, 1815 (1984).
[11] C. M. Will: Theory and experiment in gravitational physics, Cambridge
University Press (1993) 380 p.
[12] V. A. Brumberg and S. M. Kopejkin: Nuovo Cimento B 103, 63 (1988)
[13] T. Damour, M. Soffel and C. M. Xu: General relativistic celestial mechan-
ics. 1. Method and definition of reference system, Phys. Rev. D 43, 3273
(1991); General relativistic celestial mechanics. 2. Translational equations
of motion, Phys. Rev. D 45, 1017 (1992); General relativistic celestial me-
chanics. 3. Rotational equations of motion, Phys. Rev. D 47, 3124 (1993);
General relativistic celestial mechanics. 4. Theory of satellite motion, Phys.
Rev. D 49, 618 (1994).
[14] G. ’t Hooft and M. J. G. Veltman: Regularization and renormalization of
gauge fields, Nucl. Phys. B 44, 189 (1972).
[15] T. Damour and N. Deruelle: Radiation reaction and angular momentum
loss in small angle gravitational scattering, Phys. Lett. A 87, 81 (1981).
[16] T. Damour: Problème des deux corps et freinage de rayonnement en rela-
tivité générale, C.R. Acad. Sci. Paris, Série II, 294, 1355 (1982).
[17] T. Damour: The problem of motion in Newtonian and Einsteinian grav-
ity, in Three Hundred Years of Gravitation, edited by S.W. Hawking and
W. Israel, Cambridge University Press, Cambridge, pp. 128-198 (1987).
[18] P. Jaranowski, G. Schäfer: Third post-Newtonian higher order ADM
Hamilton dynamics for two-body point-mass systems, Phys. Rev. D 57,
7274 (1998).
[19] L. Blanchet, G. Faye: General relativistic dynamics of compact binaries
at the third post-Newtonian order, Phys. Rev. D 63, 062005-1-43 (2001).
[20] T. Damour, P. Jaranowski, G. Schäfer: Dimensional regularization of the
gravitational interaction of point masses, Phys. Lett. B 513, 147 (2001).
[21] Y. Itoh, T. Futamase: New derivation of a third post-Newtonian equation
of motion for relativistic compact binaries without ambiguity, Phys. Rev.
D 68, 121501(R), (2003).
[22] L. Blanchet, T. Damour, G. Esposito-Farèse: Dimensional regularization
of the third post-Newtonian dynamics of point particles in harmonic coor-
dinates, Phys. Rev. D 69, 124007 (2004).
[23] M. E. Pati, C. M. Will: Post-Newtonian gravitational radiation and equa-
tions of motion via direct integration of the relaxed Einstein equations.
II. Two-body equations of motion to second post-Newtonian order, and
radiation-reaction to 3.5 post-Newtonian order, Phys. Rev. D 65, 104008-
1-21 (2001).
[24] C. Königsdörffer, G. Faye, G. Schäfer: Binary black-hole dynamics at the
third-and-a-half post-Newtonian order in the ADM formalism, Phys. Rev.
D 68, 044004-1-19 (2003).
[25] S. Nissanke, L. Blanchet: Gravitational radiation reaction in the equa-
tions of motion of compact binaries to 3.5 post-Newtonian order, Class.
Quantum Grav. 22, 1007 (2005).
[26] L. Blanchet: Gravitational radiation from post-Newtonian sources and
inspiralling compact binaries, Living Rev. Rel. 5, 3 (2002); Updated article:
http://www.livingreviews.org/lrr-2006-4
[27] T. Damour, N. Deruelle: General relativitic celestial mechanics of binary
system I. The post-Newtonian motion, Ann. Inst. Henri Poincaré 43, 107
(1985).
[28] T. Damour: Gravitational radiation reaction in the binary pulsar and the
quadrupole formula controversy, Phys. Rev. Lett. 51, 1019 (1983).
[29] R. Blandford, S. A. Teukolsky: Astrophys. J. 205, 580 (1976).
[30] T. Damour, N. Deruelle: General relativitic celestial mechanics of binary
system II. The post-Newtonian timing formula, Ann. Inst. Henri Poincaré
44, 263 (1986).
[31] T. Damour, J. H. Taylor: Strong field tests of relativistic gravity and
binary pulsars, Phys. Rev. D 45, 1840 (1992).
[32] T. Damour: Strong-field tests of general relativity and the binary pulsar,
in Proceedings of the 2cd Canadian Conference on General Relativity and
Relativistic Astrophysics, edited by A. Coley, C. Dyer, T. Tupper, World
Scientific, Singapore, pp. 315-334 (1988).
[33] T. Damour, G. Esposito-Farèse: Tensor-multi-scalar theories of gravita-
tion, Class. Quant. Grav. 9, 2093 (1992).
http://www.livingreviews.org/lrr-2006-4
[34] T. Damour, G. Esposito-Farèse: Non-perturbative strong-field effects in
tensor-scalar theories of gravitation, Phys. Rev. Lett. 70, 2220 (1993).
[35] T. Damour, G. Esposito-Farèse: Tensor-scalar gravity and binary-pulsar
experiments, Phys. Rev. D 54, 1474 (1996), arXiv:gr-qc/9602056.
[36] T. Damour, G. Esposito-Farèse: Gravitational-wave versus binary-pulsar
tests of strong-field gravity, Phys. Rev. D 58, 042001 (1998).
[37] J. M. Weisberg, J. H. Taylor: Relativistic binary pulsar B1913+16: thirty
years of observations and analysis, To appear in the proceedings of As-
pen Winter Conference on Astrophysics: Binary Radio Pulsars, Aspen,
Colorado, 11-17 Jan 2004., arXiv:astro-ph/0407149.
[38] T. Damour, J. H. Taylor: On the orbital period change of the binary pulsar
Psr-1913+16, The Astrophysical Journal 366, 501 (1991).
[39] C. M. Will, H. W. Zaglauer: Gravitational radiation, close binary systems,
and the Brans-Dicke theory of gravity, Astrophys. J. 346, 366 (1989).
[40] J. H. Taylor, A. Wolszczan, T. Damour, J. M. Weisberg: Experimental
constraints on strong field relativistic gravity, Nature 355, 132 (1992).
[41] I. H. Stairs, S. E. Thorsett, J. H. Taylor, A. Wolszczan: Studies of the
relativistic binary pulsar PSR B1534+12: I. Timing analysis, Astrophys.
J. 581, 501 (2002).
[42] K. Nordtvedt: Equivalence principle for massive bodies. 2. Theory, Phys.
Rev. 169, 1017 (1968).
[43] T. Damour and G. Schäfer: New tests of the strong equivalence principle
using binary pulsar data, Phys. Rev. Lett. 66, 2549 (1991).
[44] I. H. Stairs et al.: Discovery of three wide-orbit binary pulsars: implica-
tions for binary evolution and equivalence principles, Astrophys. J. 632,
1060 (2005).
[45] N. Wex: New limits on the violation of the Strong Equivalence Princi-
ple in strong field regimes, Astronomy and Astrophysics 317, 976 (1997),
gr-qc/9511017.
[46] V. M. Kaspi et al.: Discovery of a young radio pulsar in a relativistic
binary orbit, arXiv:astro-ph/0005214.
[47] M. Bailes, S. M. Ord, H. S. Knight, A. W. Hotan: Self-consistency of
relativistic observables with general relativity in the white dwarf-neutron
star binary pulsar PSR J1141-6545, Astrophys. J. 595, L49 (2003).
[48] M. Kramer et al.: eConf C041213, 0038 (2004), astro-ph/0503386.
http://arxiv.org/abs/gr-qc/9602056
http://arxiv.org/abs/astro-ph/0407149
http://arxiv.org/abs/gr-qc/9511017
http://arxiv.org/abs/astro-ph/0005214
http://arxiv.org/abs/astro-ph/0503386
[49] M. Kramer et al.: Tests of general relativity from timing the double pulsar,
Science 314, 97-102 (2006).
[50] S. M. Ord, M. Bailes and W. van Straten: The Scintillation Velocity of
the Relativistic Binary Pulsar PSR J1141-6545, arXiv:astro-ph/0204421.
[51] T. Damour, G. Esposito-Farèse: Binary-pulsar versus solar-system tests of
tensor-scalar gravity, 2007, in preparation.
[52] T. Damour, R. Ruffini: Sur certaines vérifications nouvelles de la rela-
tivité générale rendues possibles par la découverte d’un pulsar membre
d’un système binaire, C.R. Acad. Sci. Paris (Série A) 279, 971 (1974).
[53] B. M. Barker, R. F. O’Connell: Gravitational two-body problem with
arbitrary masses, spins, and quadrupole moments, Phys. Rev. D 12, 329
(1975).
[54] M. Kramer: Astrophys. J. 509, 856 (1998).
[55] J. M. Weisberg and J. H. Taylor: Astrophys. J. 576, 942 (2002).
[56] I. H. Stairs, S. E. Thorsett, Z. Arzoumanian: Measurement of gravitational
spin-orbit coupling in a binary pulsar system, Phys. Rev. Lett. 93, 141101
(2004).
[57] A. W. Hotan, M. Bailes, S. M. Ord: Geodetic Precession in PSR J1141-
6545, Astrophys. J. 624, 906 (2005).
[58] T. Damour, G. Schäfer: Higher order relativistic periastron advances and
binary pulsars, Nuovo Cim. B 101, 127 (1988).
[59] J.M. Lattimer, B.F. Schutz: Constraining the equation of state with
moment of inertia measurements, Astrophys. J. 629, 979 (2005),
arXiv:astro-ph/0411470.
[60] I. A. Morrison, T. W. Baumgarte, S. L. Shapiro, V. R. Pandharipande:
The moment of inertia of the binary pulsar J0737-3039A: constraining the
nuclear equation of state, Astrophys. J. 617, L135 (2004).
[61] A. S. Eddington: The Mathematical Theory of Relativity, Cambridge Uni-
versity Press, London (1923).
[62] L. I. Schiff: Am. J. Phys. 28, 340 (1960).
[63] R. Baierlein: Phys. Rev. 162, 1275 (1967).
[64] C. M. Will: Astrophys. J. 163, 611 (1971).
[65] C. M. Will, K. Nordtvedt: Astrophys. J. 177, 757 (1972).
http://arxiv.org/abs/astro-ph/0204421
http://arxiv.org/abs/astro-ph/0411470
[66] C. M. Will: The confrontation between general relativity and experi-
ment, Living Rev. Rel. 4, 4 (2001) arXiv:gr-qc/0103036; update (2005)
in arXiv:gr-qc/0510072.
[67] P. Jordan, Nature (London) 164, 637 (1949); Schwerkraft und Weltall
(Vieweg, Braunschweig, 1955); Z. Phys. 157, 112 (1959).
[68] M. Fierz: Helv. Phys. Acta 29, 128 (1956).
[69] C. Brans, R. H. Dicke: Mach’s principle and a relativistic theory of gravi-
tation, Phys. Rev. 124, 925 (1961).
[70] T. Damour, G. Esposito-Farèse: Testing gravity to second postNewto-
nian order: A Field theory approach, Phys. Rev. D 53, 5541 (1996),
arXiv:gr-qc/9506063.
[71] T. Damour, A. M. Polyakov: The string dilaton and a least coupling prin-
ciple, Nucl. Phys. B 423, 532 (1994) arXiv:hep-th/9401069; String theory
and gravity, Gen. Rel. Grav. 26, 1171 (1994), arXiv:gr-qc/9411069.
[72] T. Damour, D. Vokrouhlicky: The equivalence principle and the moon,
Phys. Rev. D 53, 4177 (1996), arXiv:gr-qc/9507016.
[73] J. Khoury, A. Weltman: Chameleon fields: Awaiting surprises for
tests of gravity in space, Phys. Rev. Lett. 93, 171104 (2004),
arXiv:astro-ph/0309300.
[74] J. Khoury, A. Weltman: Chameleon cosmology, Phys. Rev. D 69, 044026
(2004), arXiv:astro-ph/0309411.
[75] P. Brax, C. van de Bruck, A. C. Davis, J. Khoury, A. Weltman: Detect-
ing dark energy in orbit: The cosmological chameleon, Phys. Rev. D 70,
123518 (2004), arXiv:astro-ph/0408415.
[76] T. Damour, F. Piazza, G. Veneziano: Runaway dilaton and equiv-
alence principle violations, Phys. Rev. Lett. 89, 081601 (2002),
arXiv:gr-qc/0204094; Violations of the equivalence principle in a dilaton-
runaway scenario, Phys. Rev. D 66, 046007 (2002), arXiv:hep-th/0205111.
[77] T. Damour, G. Esposito-Farèse: Testing gravity to second postNewto-
nian order: A Field theory approach, Phys. Rev. D 53, 5541 (1996),
arXiv:gr-qc/9506063.
[78] T. Damour, K. Nordtvedt: General relativity as a cosmological attractor
of tensor scalar theories, Phys. Rev. Lett. 70, 2217 (1993); Tensor-scalar
cosmological models and their relaxation toward general relativity, Phys.
Rev. D 48, 3436 (1993).
[79] J. B. Hartle: Slowly rotating relativistic stars. 1. Equations of structure,
Astrophys. J. 150, 1005 (1967).
http://arxiv.org/abs/gr-qc/0103036
http://arxiv.org/abs/gr-qc/0510072
http://arxiv.org/abs/gr-qc/9506063
http://arxiv.org/abs/hep-th/9401069
http://arxiv.org/abs/gr-qc/9411069
http://arxiv.org/abs/gr-qc/9507016
http://arxiv.org/abs/astro-ph/0309300
http://arxiv.org/abs/astro-ph/0309411
http://arxiv.org/abs/astro-ph/0408415
http://arxiv.org/abs/gr-qc/0204094
http://arxiv.org/abs/hep-th/0205111
http://arxiv.org/abs/gr-qc/9506063
[80] G. Esposito-Farèse: Binary-pulsar tests of strong-field gravity and gravita-
tional radiation damping, in Proceedings of the tenth Marcel Grossmann
Meeting, July 2003, edited by M. Novello et al., World Scientific (2005),
p. 647, arXiv:gr-qc/0402007.
[81] B. Bertotti, L. Iess, P. Tortora: A test of general relativity using radio
links with the Cassini spacecraft, Nature 425, 374 (2003).
[82] S. S. Shapiro et al: Phys. Rev. Lett 92, 121101 (2004).
[83] I. I. Shapiro, in General Relativity and Gravitation 12, edited by N. Ashby,
D. F. Bartlett, and W. Wyss (Cambridge University Press, 1990), p. 313.
[84] J. G. Williams, S. G. Turyshev, D. H. Boggs: Progress in lunar laser
ranging tests of relativistic gravity, Phys. Rev. Lett. 93, 261101 (2004),
arXiv:gr-qc/0411113.
http://arxiv.org/abs/gr-qc/0402007
http://arxiv.org/abs/gr-qc/0411113
	Introduction
	Motion of binary pulsars in general relativity
	Timing of binary pulsars in general relativity
	Phenomenological approach to testing relativistic gravity with binary pulsar data
	Theory-space approach to testing relativistic gravity with binary pulsar data
	Theory-space approaches to solar-system tests of relativistic gravity
	Theory-space approaches to binary-pulsar tests of relativistic gravity
	Tensor-scalar theories of gravity
	The role of the coupling function a(); definition of the two-dimensional space of tensor-scalar gravity theories T1 (0 , 0)
	Tensor-scalar gravity, strong-field effects, and binary-pulsar observables
	Theory-space analyses of binary pulsar data
	Conclusion
ABSTRACT
  We review the general relativistic theory of the motion, and of the timing,
of binary systems containing compact objects (neutron stars or black holes).
Then we indicate the various ways one can use binary pulsar data to test the
strong-field and/or radiative aspects of General Relativity, and of general
classes of alternative theories of relativistic gravity.

<|endoftext|><|startoftext|>
arXiv:0704.0750v1  [math.DG]  5 Apr 2007
Univ. Beograd. Publ. Elektrotehn. Fak.
Ser. Mat. 9 (1998), 29–33
SOME COMBINATORIAL ASPECTS
OF DIFFERENTIAL OPERATION
COMPOSITION ON THE SPACE R
Branko J. Malešević
In this paper we present a recurrent relation for counting meaningful compositions
of the higher-order differential operations on the space Rn (n=3,4,...) and extract
the non-trivial compositions of order higher than two.
1. DIFFERENTIAL FORMS AND OPERATIONS ON THE SPACE R
It is well known that the first-order differential operations grad, curl and div
on the space R3 can be introduced using the operator of the exterior differentia-
tion d of differential forms [1]:
Ω0(R3)
−→ Ω1(R3)
−→ Ω2(R3)
−→ Ω3(R3),
where Ωi(R3) is the space of differential forms of degree i = 0, 1, 2, 3 on the space
3 over the ring of functions A = {f : R3 → R | f ∈ C∞(R3)}. In the consideration,
which follows, we give definitions of the first-order differential operations.
Let us notice that one-dimensional spaces Ω0(R3) and Ω3(R3) are isomorphic
toA and let ϕ0 : Ω
0(R3) → A, ϕ3 : Ω
3(R3) → A be the corresponding isomorphisms.
Next, the set of vector functionsB = {f =(f1, f2, f3) : R
3 → R3 | f1, f2, f3 ∈ C
∞(R3)},
over the ring A, is three-dimensional. It is isomorphic to Ω1(R3) and Ω2(R3). Let
ϕ1 : Ω
1(R3) → B, ϕ2 : Ω
2(R3) → B be the corresponding isomorphisms. In that
case, the compositions ϕ−10 ◦ϕ3 : Ω
3(R3) → Ω0(R3) and ϕ−11 ◦ϕ2 : Ω
2(R3) → Ω1(R3)
are isomorphisms of the corresponding spaces of differential forms. The first-order
differential operations are defined via the operator of the exterior differentiation d
of differential forms in the following form:
∇1 = ϕ1◦d◦ϕ
0 : A → B, ∇2 = ϕ2◦d◦ϕ
1 : B → B, ∇3 = ϕ3◦d◦ϕ
2 : B → A.
Therefore we obtain explicit expressions for the first order differential operations
∇1, ∇2, ∇3 on the space R
3 in the following form:
(1) gradf = ∇1f =
e3 : A → B,
(2) curlf = ∇2f =
e3 : B → B,
(3) divf = ∇3f =
: B → A.
1991 Mathematics Subject Classification: 26B12, 58A10
http://arxiv.org/abs/0704.0750v1
30 Branko J. Malešević
Let us count meaningful compositions of differential operations ∇1,∇2,∇3.
Consider the set of functions Θ = {∇1,∇2,∇3}. Let us define a binary relation
ρ ”to be in composition” with ∇iρ∇j = ⊤ iff the composition ∇j ◦∇i is meaningful
(∇i,∇j ∈ Θ). The Cayley’s table of this relation reads:
ρ ∇1 ∇2 ∇3
∇1 ⊥ ⊤ ⊤
∇2 ⊥ ⊤ ⊤
∇3 ⊤ ⊥ ⊥ .
We form the graph of relation ρ as follows. If ∇iρ∇j = ⊤ then we put the node ∇j
under the node ∇i. Let us mark ∇0 as nowhere-defined function ϑ, with domain
and range being the empty set [2]. We shall consider ∇0ρ∇i = ⊤ (i = 1, 2, 3). For
the set of functions Θ ∪ {∇0} our graph is the tree with the root in the node ∇0.
∇0 f(0) = 1
∇2 ❳❳
∇3 f(1) = 3
∇1 f(2) = 5
∇3 f(3) = 8
✔✔❚❚ ✔✔❚❚ ✔✔❚❚ ✔✔❚❚ ✔✔q∇2 ❚❚q∇3 q∇1 f(4) = 13
✔✔❚❚ ✔✔❚❚Fig. 1 f(5) = 21
Let fi(k) be a number of meaningful compositions of the k
th-order beginning with
∇i. Let f(k) be a number of meaningful composition of the k
th-order of operations
over Θ. Then f(k) = f1(k) + f2(k) + f3(k). Based on partial self similarity of the
tree (Fig. 1), which is formed according to Cayley’s table (4), we get equalities:
f1(k) = f2(k − 1) + f3(k − 1) ∧ f2(k) = f2(k − 1) + f3(k − 1) ∧ f3(k) = f1(k − 1).
Now, a recurrent relation for f(k) can be derived as follows:
f(k) = f1(k) + f2(k) + f3(k)
f1(k − 1) + f2(k − 1) + f3(k − 1)
f3(k − 1) + f2(k − 1)
= f(k − 1) +
f1(k − 2) + f2(k − 2) + f3(k − 2)
= f(k − 1) + f(k − 2).
Based on the initial values: f(1) = 3, f(2) = 5, f(3) = 8 we conclude that f(k) =
Fk+3, where is Fibonacci’s number of order k + 3.
Let us note that ∇2 ◦∇1 = 0 and ∇3 ◦∇2 = 0, because d
2 = 0. On the other
hand, the compositions ∇1 ◦∇3, ∇2 ◦∇2 and ∇3 ◦∇1 are not annihilated, because
of ϕ−10 ◦ ϕ3 6= i and ϕ
1 ◦ ϕ2 6= i. Thus, as in the paper [2], we conclude that the
non-trivial compositions are of the following form:
(∇1◦)∇3 ◦ · · · ◦ ∇1 ◦ ∇3 ◦ ∇1,
∇2 ◦ ∇2 ◦ · · · ◦ ∇2 ◦ ∇2 ◦ ∇2,
(∇3◦)∇1 ◦ · · · ◦ ∇3 ◦ ∇1 ◦ ∇3.
As non-trivial compositions we consider those which are not identical to the zero
function. Terms in parentheses are included in for an odd number of terms and are
left out otherwise.
Some combinatorial aspects of differential operation compositions ... 31
2. DIFFERENTIAL FORMS AND OPERATIONS ON THE SPACE R
Let us present a recurrent relation for counting meaningful compositions of
the higher-order differential operations on the space Rn (n = 3, 4, . . .) and extract
the non-trivial compositions of order higher than two. Let us form the following
sets of functions:
Ai = {f : R
)|f1, . . . , f(n
) ∈ C
for i = 0, 1, . . . ,m where m = [n/2]. Let Ωi(Rn) be a set of differential forms of
degree i = 0, 1, . . . , n on the space Rn. Let us notice that Ωi(Rn) and Ωn−i(Rn),
over ring A0, are spaces of the same dimension
, for i = 0, 1, . . . ,m. They can
be identified with Ai, using the corresponding isomorphisms:
ϕi : Ω
i(Rn) → Ai (0 ≤ i ≤ m) and ϕn−i : Ω
n−i(Rn) → Ai (0 ≤ i < n−m).
We define the first-order differential operations on the space Rn
via the operator of the exterior differentiation d as follows:
∇i = ϕi ◦ d ◦ ϕ
i−1 (1 ≤ i ≤ n). (1 ≤ i ≤ m)
Therefore, we obtain the first order differential operations on the space Rn, de-
pending on pairity of dimension n, in the following form:
n = 2m : ∇1 : A0 → A1
∇2 : A1 → A2
∇i : Ai → Ai+1
∇m : Am−1 → Am
∇m+1 : Am → Am−1
∇n−j : Aj+1 → Aj
∇n−1 : A2 → A1
∇n : A1 → A0,
n = 2m+ 1 : ∇1 : A0 → A1
∇2 : A1 → A2
∇i : Ai → Ai+1
∇m : Am−1 → Am
∇m+1 : Am → Am
∇m+2 : Am → Am−1
∇n−j : Aj+1 → Aj
∇n−1 : A2 → A1
∇n : A1 → A0.
Consider the set of functions Θ = {∇1,∇2, . . . ,∇n}. Let us define a binary relation
ρ ”to be in composition” with ∇iρ∇j = ⊤ iff the composition ∇j ◦∇i is meaningful
(∇i,∇j ∈ Θ). It is not difficult to check that Cayley’s table of this relation is
determined with:
(6) ∇iρ∇j =
⊤ : (j = i+ 1) ∨ (i+ j = n+ 1),
⊥ : (j 6= i+ 1) ∧ (i+ j 6= n+ 1).
Let us form an adjacency matrix A = [aij ] ∈ {0, 1}
n×n of the graph, determined
by relation ρ. Let fi(k) be a number of meaningful compositions of the k
th-order
32 Branko J. Malešević
beginning with ∇i (notice that fi(1) = 1 for i= 1, . . . , n). Let f(k) be a number
of meaningful composition of the kth-order of operations over Θ. Then f(k) =
f1(k)+. . .+fn(k). Notice that the following is true:
(7) fi(k) =
aij · fj(k − 1),
for i = 1, . . . , n. Based on (7) we form the system of recurrent equations:
f1(k)
fn(k)
a11 · · · a1n
an1 · · · ann
f1(k − 1)
fn(k − 1)
If vn = [ 1 · · · 1 ]1×n then:
(9) f(k) = vn ·
f1(k)
fn(k)
So, the expression:
(10) f(k) = vn · A
k−1 · vTn .
follows from (8) and (9). Reducing the system of the recurrent equations (8), for
any of the functions fi(k) we have:
(11) α0fi(k) + α1fi(k − 1) + · · ·+ αnfi(k − n) = 0 (k > n),
where α0, . . . , αn are coefficients of the characteristic polynomial Pn(λ) = |A−λI| =
n+ . . .+αn. Thus, we conclude that the function f(k) =
fi(k) also satisfies:
(12) α0f(k) + α1f(k − 1) + · · ·+ αnf(k − n) = 0 (k > n).
Hence, the following theorem holds.
Theorem 1. The number of meaningful differential operations, on the space R
(n = 3, 4, . . .), of the order higher than two, is determined by the formula (10), i.e.
by the recurrent formula (12).
In n-dimensional space Rn, for dimensions n = 3, 4, 5, . . . , 10, using the pre-
vious theorem we form a table of the corresponding recurrent formula:
Dimension: Recurrent relations for the number of meaningful compositions:
n = 3 f(i+ 2) = f(i + 1) + f(i)
n = 4 f(i+ 2) = 2f(i)
n = 5 f(i+ 3) = f(i+ 2) + 2f(i+ 1)− f(i)
n = 6 f(i+ 4) = 3f(i+ 2)− f(i)
n = 7 f(i + 5) = f(i + 3) + 3f(i + 2) − 2f(i + 1)− f(i)
n = 8 f(i + 4) = 4f(i + 3) − 3f(i)
n = 9 f(i+ 5) = f(i+ 4) + 4f(i+ 3)− 3f(i+ 2)− 3f(i + 1) + f(i)
n = 10 f(i + 6) = 5f(i + 4) − 6f(i + 2) + f(i)
Some combinatorial aspects of differential operation compositions ... 33
Let us determine non-trivial higher-order meaningful compositions on the
space Rn. For isomorphisms ϕk we have:
(13) ϕ−1
◦ ϕn−k 6= i,
for k = 1, 2, . . . , n and 2k 6= n. Then, based on (6) and (13), all second-order
compositions are given by the formula:
(14) ∇j ◦ ∇k =
0 : j = k + 1,
gj,k : (k + j = n+ 1) ∧ (2k 6= n),
ϑ : (j 6= k + 1) ∧ (k + j 6= n+ 1);
where 0 is a trivial composition, gj,k is a non-trivial second-order composition and
ϑ is a nowhere-defined function for j, k = 1, . . . , n. Notice that in gj,k = ∇j ◦∇k =
ϕn+1−k ◦ d ◦ ϕ
◦ ϕk ◦ d ◦ ϕ
k−1 (j=n+1−k ∧ 2k 6=n) and switching the terms
is impossible, because in that way we get nowhere-defined function ϑ. Hence, we
conclude that the following theorem holds.
Theorem 2. All meaningful non-trivial differential operations on the space R
(n = 3, 4, . . .), of order higher than, two are given in the form of the following com-
positions:
(∇k) ◦ ∇j ◦ ∇k ◦ · · · ◦ ∇j ◦ ∇k,
(∇j) ◦ ∇k ◦ ∇j ◦ · · · ◦ ∇k ◦ ∇j ,
with to the condition k+ j = n+ 1 and 2k, 2j 6= n for k, j = 1, 2, . . . , n. Terms in
parentheses are included in for an odd number of terms and are left out otherwise.
Acknowledgment. I wish to express my gratitude to ProfessorsM. Merkle and
M. Prvanović who examined the first version of the paper and gave me their
suggestions and some very useful remarks.
REFERENCES
1. R.Bott, L.W.Tu: Differential forms in algebraic topology, Springer, New York 1982.
2. B.J.Malešević: A note on higher-order differential operations, Univ. Beograd, Publ.
Elektrotehn. Fak.,Ser. Mat. 7 (1996), 105-109.
University of Belgrade, (Received September 8, 1997)
Faculty of Electrical Engineering, (Revised October 30, 1998)
P.O.Box 35-54, 11120 Belgrade,
Yugoslavia
malesevic@kiklop.etf.bg.ac.yu
ABSTRACT
  In this paper we present a recurrent relation for counting meaningful
compositions of the higher-order differential operations on the space $R^{n}$
(n=3,4,...) and extract the non-trivial compositions of order higher than two.

<|endoftext|><|startoftext|>
Introduction
	2. Preliminary
	3. The proof of Theorem ??
	4. Applications
	References
ABSTRACT
  We provide several equivalent characterizations of Kobayashi hyperbolicity in
unbounded convex domains in terms of peak and anti-peak functions at infinity,
affine lines, Bergman metric and iteration theory.

<|endoftext|><|startoftext|>
Introduction
The two-dimensional models have widely been used in the context of the two-dimensional
gravity (e.g. see [1, 2, 3, 4] and references therein) and string theory. From the 2d-gravity
point of view, higher-dimensional gravity models, by dimensional reduction reduce to the
2d-gravity [1, 2, 3]. From the string theory point of view, the (1+1)-dimensional actions
are fundamental tools of the theory. However, 2d-gravity and 2d- string theory are closely
related to each other.
The known sigma models for string, in the presence of the dilaton field Φ(X), contain
the two-dimensional scalar curvature R(hab),
hRΦ(X). (1)
In two dimensions the combination
hR is total derivative. Thus, in the absence of the
dilaton field, this action is a topological invariant that gives no dynamics to the worldsheet
metric hab.
In fact, in the action (1), the dilaton is not the only choice. For example, replacing the
dilaton field with the scalar curvature R, leads to the R2-gravity [1, 4, 5]. In particular
the Polyakov action is replaced by a special combination of the worldsheet fields, which
include an overall factor R−1. Removing the dilaton and replacing it with another quantities
motivated us to study a class of two-dimensional actions. They are useful in the context of
the non-critical strings with curved worldsheet, and the 2-dimensional gravity.
Instead of the dilaton field, we introduce some combinations of hab, R and the induced
metric on the worldsheet, i.e. γab, which give dynamics to hab. These non-linear combinations
can contain an arbitrary function f(R) of the scalar curvature R. We observe that these
dynamics lead to the constraint equation for hab, extracted from the Polyakov action.
For the flat spacetime, these models have the Poincaré symmetry. In addition, they are
reparametrization invariant. However, for any function f(R), they do not have the Weyl
symmetry. Therefore, the string worldsheet at most is conformally flat. By introducing an
extra scalar field in these actions, they also find the Weyl symmetry. Note that a Weyl
non-invariant string theory has noncritical dimension, e.g. see [6].
This paper is organized as follows. In section 2, we introduce a new action for the string
in which the corresponding worldsheet always is curved. In section 3, the Poincaré symmetry
of this string model will be studied. In section 4, the generalized form of the above action
will be introduced and it will be analyzed.
2 Curved worldsheet in the curved spacetime
We consider the following action for the string, which propagates in the curved spacetime
S = −T
habγab
, (2)
where h = − det hab, and T is a dimensionless constant. In addition, R denotes the two-
dimensional scalar curvature which is made from hab. The string coordinates are {Xµ(σ, τ)}.
The induced metric on the worldsheet, i.e. γab, is also given by
γab = gµν(X)∂aX
µ(σ, τ)∂bX
ν(σ, τ), (3)
where gµν(X) is the spacetime metric.
In two dimensions, the symmetries of the curvature tensor imply the identity
Rab −
habR = 0. (4)
Therefore, the variation of the action (2) leads to the following equation of motion for hab,
Rab −
γab = 0. (5)
This implies that the energy-momentum tensor, extracted from the action (2), vanishes.
Contraction of this equation by hab gives R = 1
habγab. Introducing this equation and
the equation (5) into (4) leads to
(Polyakov)
ab ≡ γab −
hab(h
a′b′γa′b′) = 0. (6)
This is the constraint equation, extracted from the Polyakov action. Note that the energy-
momentum tensor, due to the action (2), is proportional to the left-hand-side of the equation
(5). Thus, it is different from (6).
The equation of motion of the string coordinate Xµ(σ, τ) also is
hRhab∂bX
hRhabΓ
νλ∂aX
λ = 0. (7)
Presence of the scalar curvature R distinguishes this equation from its analog, extracted
from the Polyakov action.
Now consider those solutions of the equations of motion (5) and (7), which admit constant
scalar curvature R. For these solutions, the equation (7) reduces to the equation of motion
of the string coordinates, extracted from the Polyakov action with the curved background.
However, for general solutions the scalar curvature R depends on the worldsheet coordinates
σ and τ , and hence this coincidence does not occur.
2.1 The model in the conformal gauge
Under reparametrization of σ and τ , the action (2) is invariant. That is, in two dimensions
the general coordinate transformations σ → σ′(σ, τ) and τ → τ ′(σ, τ), depend on two free
functions, namely the new coordinates σ′ and τ ′. By means of such transformations any
two of the three independent components of hab can be eliminated. A standard choice is a
parametrization of the worldsheet such that
hab = e
φ(σ,τ)ηab, (8)
where ηab = diag(−1, 1), and eφ(σ,τ) is an unknown conformal factor. The choice (8) is called
the conformal gauge. Since the action (2) does not have the Weyl symmetry (a local rescaling
of the worldsheet metric hab) we cannot choose the gauge hab = ηab.
The scalar curvature corresponding to the metric (8) is
R = −e−φ∂2φ, (9)
where ∂2 = ηab∂a∂b. Thus, the action (2) reduces to
S ′ = −T
d2σe−φ∂2φ
ηabγab
. (10)
According to the gauge (8), this action describes a conformally flat worldsheet.
3 Poincaré symmetry of the model
In this section we consider flat Minkowski space, i.e. gµν(X) = ηµν . Therefore, the equations
of motion are simplified to
Rab −
ηµν∂aX
ν = 0, (11)
hRhab∂bX
µ) = 0. (12)
The Poincaré symmetry reflects the symmetry of the background in which the string is
propagating. It is described by the transformations
δXµ = aµνX
ν + bµ,
δhab = 0, (13)
where aµν and b
µ are independent of the worldsheet coordinates σ and τ , and aµν = ηµλa
is antisymmetric. Thus, from the worldsheet point of view, these transformations are global
symmetries. Under these transformations the action (2) is invariant.
3.1 The conserved currents
The Poincaré invariance of the action (2) is associated to the following Noether currents
J µνa = T
hRhab(Xµ∂bX
ν −Xν∂bXµ),
Pµa = T
hRhab∂bX
µ, (14)
where the current Pµa is corresponding to the translation invariance and J µνa is the current
associated to the Lorentz symmetry. According to the equation of motion (12) these are
conserved currents
∂aJ µνa = 0,
∂aPµa = 0. (15)
3.2 The covariantly conserved currents
It is possible to construct two other currents from (14), in which they be covariantly con-
served. For this, there is the useful formula
∇aKa =
a), (16)
where Ka is a worldsheet vector. Therefore, we define the currents Jµνa and P µa as in the
following
Jµνa =
J µνa,
P µa =
Pµa. (17)
According to the equations (15) and (16), these are covariantly conserved currents, i.e.,
∇aJµνa = ∇aP µa = 0. (18)
The currents (17) can also be written as
Jµνa =
R(Xµ∂aX
ν −Xν∂aXµ),
P µa =
µ. (19)
Since there is ∇ahbc = 0, the conservation laws (18) also imply the covariantly conservation
of the currents (19).
4 Generalization of the model
The generalized form of the action (2) is
I = −T
f(R)−
, (20)
where f(R) is an arbitrary differentiable function of the scalar curvature R. The set
{Xµ(σ, τ)} describes a string worldsheet in the spacetime. These string coordinates ap-
peared in the induced metric γab through the equation (3). Thus, (20) is a model for the
string action.
The equation of motion of Xµ is as previous, i.e. (7). Vanishing the variation of this
action with respect to the worldsheet metric hab, gives the equation of motion of hab,
df(R)
γab = 0. (21)
The trace of this equation is
df(R)
habγab = 0. (22)
Combining the equations (4), (21) and (22) again leads to the equation (6).
As an example, consider the function f(R) = α lnR + β. Thus, the field equation (21)
implies that the intrinsic metric hab becomes proportional to the induced metric γab, that is
hab =
Since the Poincaré transformations contain δhab = 0, the generalized action (20) for
the flat background metric gµν = ηµν , also has the Poincaré invariance. This leads to the
previous conserved currents, i.e. (14) and (19).
4.1 Weyl invariance in the presence of a new scalar field
The action (20) under the reparametrization transformations is symmetric. The Weyl trans-
formation is also defined by
hab −→ h′ab = eρ(σ,τ)hab. (23)
Thus, the scalar curvature transforms as
R −→ R′ = e−ρ(R−∇2ρ), (24)
where ∇2ρ = 1√
hhab∂bρ). The equations (23) and (24) imply that the action (20), for
any function f(R), is Weyl non-invariant.
Introducing (23) and (24) into the action (20) gives a new action which contains the field
ρ(σ, τ),
I ′ = −T
h(R−∇2ρ)
f [e−ρ(R−∇2ρ)]− 1
e−ρhabγab
. (25)
We can ignore the origin of this action. In other words, it is another model for string.
However, under the Weyl transformations
hab −→ eu(σ,τ)hab,
ρ −→ ρ− u, (26)
the action I ′, for any function f , is symmetric. Note that according to the definition of ∇2
there is the transformation ∇2 → e−u∇2.
5 Conclusions
We considered some string actions which give dynamics to the worldsheet metric hab. Due
to the absence of the Weyl invariance, these models admit at most conformally flat (but not
flat) worldsheet. We observed that the constraint equation on the metric, extracted from
the Polyakov action, is a special result of the field equations of our string models. Obtaining
this constraint equation admits us to introduce an arbitrary function of the scalar curvature
to the action. For the case f(R) = α lnR + β, the metric hab becomes proportional to the
induced metric of the worldsheet.
By introducing a new degree of freedom we obtained a string action, in which for any
function f is Weyl invariant.
Our string models with arbitrary f(R), in the flat background have the Poincaré sym-
metry. The associated conserved currents are proportional to the scalar curvature R. We
also constructed the covariantly conserved currents from the Poincaré currents.
References
[1] H.J. Schmidt, Int. J. Mod. Phys. D7 (1998) 215, gr-qc/9712034.
[2] D. Park and Y. Kiem, Phys. Rev. D53 (1996) 5513; Phys. Rev. D53 (1996) 747.
[3] A. Achucarro and M. Ortiz, Phys. Rev. D48 (1993) 3600.
[4] D. Grumiller, W. Kummer and D.V. Vassilevich, Phys. Rept. 369 (2002) 327-429, hep-
th/0204253.
[5] M.O. Katanaev and I.V. Volovich, Phys. Lett. B175 (1986) 413, hep-th/0209014.
[6] F. David, Mod. Phys. Lett. A3 (1988) 1651; J. Distler, H. Kawai, Nucl. Phys. B321
(1989) 509; A. A. Tseytlin, Int. Jour. Mod. Phys. A4 (1989) 1257; J. Polchinski, Nucl.
Phys. B324 (1989) 123.
ABSTRACT
  At first we introduce an action for the string, which leads to a worldsheet
that always is curved. For this action we study the Poincar\'e symmetry and the
associated conserved currents. Then, a generalization of the above action,
which contains an arbitrary function of the two-dimensional scalar curvature,
will be introduced. An extra scalar field enables us to modify these actions to
Weyl invariant models.

<|endoftext|><|startoftext|>
Introduction
The cosmic strings play an important role in the study of the early universe.
These strings arise during the phase transition after the big bang explosion
as the temperature drops down below some critical temperature as predicted
by grand unified theories [1-5]. It is thought that cosmic strings cause density
perturbations leading to the formation of galaxies [6]. These cosmic strings have
stress-energy and couple with the gravitational field. Therefore, it is interesting
to study the gravitational effects that arise from strings. The general relativistic
treatment of strings was started by Letelier [7, 8] and Stachel [9]. Exact solutions
of string cosmology in various space-times have been studied by several authors
[10-23].
http://arxiv.org/abs/0704.0753v2
On the other hand, the magnetic field has an important role at the cosmo-
logical scale and is present in galactic and intergalactic spaces. The importance
of the magnetic field for various astrophysical phenomena has been studied in
many papers. Melvin [24] has pointed out that during the evolution of the uni-
verse, the matter was in a highly ionized state and is smoothly coupled with
the field and forms a neutral matter as a result of universe expansion. FRW
models are approximately valid as present day magnetic field strength is very
small. In the early universe, the strength might have been appreciable. The
break-down of isotropy is due to the magnetic field. Therefore the possibility of
the presence of magnetic field in the cloud string universe is not unrealistic and
has been investigated by many authors [25-28].
In this paper, we have investigated Bianchi type I massive string magnetized
barotropic perfect fluid cosmological model in General Relativity. The magnetic
field is due to an electric current produced along x-axis with infinite electrical
conductivity. Also the behaviour of the model in the presence and absence of
magnetic field together with other physical aspects is discussed.
2 The Metric and Field Equations
We consider the space-time of Bianchi type-I in the form
ds2 = −dt2 +A2(t)dx2 +B2(t)dy2 + C2(t)dz2. (1)
The energy momentum tensor for a cloud of massive string and perfect fluid
distribution with electromagnetic field is taken as
i = (ρ+ p)viv
j + pg
i − λxix
j + E
i , (2)
where vi and xi satisfy condition
vivi = −xixi = −1, vixi = 0, (3)
p is the isotropic pressure, ρ is the proper energy density for a cloud string with
particles attached to them, λ is the string tension density, vi the four-velocity
of the particles, and xi is a unit space-like vector representing the direction of
string. In a co-moving co-ordinate system, we have
vi = (0, 0, 0, 1), xi =
, 0, 0, 0
. (4)
The electromagnetic field E
i given by Lichnerowicz [29] as
i = µ̄
| h |2
− hihj
. (5)
Here the flow-vector vi satisfies
ivj = −1, (6)
and µ̄ is the magnetic permeability, hi the magnetic flux vector defined by
ǫijklF
klvj , (7)
where Fkl is the electromagnetic field tensor and ǫijkl is the Levi Civita tensor
density. The incidental magnetic field is taken along x-axis, so that h1 6= 0,
h2 = h3 = h4 = 0. We assume that F23 is the only non-vanishing component of
Fij .
The Maxwell’s equations
Fij;k + Fjk;i + Fki;j = 0,
;j = 0, (8)
are satisfied by
F23 = constant = H(say).
Here F14 = 0 = F24 = F34, due to the assumption of infinite electrical conduc-
tivity [30]. Hence
. (9)
Since | h |2= hlhl = h1h1 = g11(h1)2, therefore
| h |2=
µ̄2B2C2
. (10)
Using Eqs. (9) and (10) in (5), we have
E11 = −
2µ̄B2C2
= −E22 = −E33 = E44 . (11)
If the particle density of the configuration is denoted by ρp, then we have
ρ = ρp + λ. (12)
The Einstein’s field equations (in gravitational units c = 1, G = 1) read as
i = −T
i , (13)
where R
i is the Ricci tensor; R = g
ijRij is the Ricci scalar.
The field equations (13) with (2) subsequently lead to the following system
of equations:
= −p+ λ+ H
2µ̄B2C2
, (14)
2µ̄B2C2
, (15)
2µ̄B2C2
, (16)
2µ̄B2C2
, (17)
where the suffix 4 at the symbols A, B and C denotes ordinary differentiation
with respect to t.
3 Solution of Field Equations
The field Eqs. (14)-(17) are a system of four equations with six unknown param-
eters A, B, C, p, λ and ρ. Two additional constraints relating these parameters
are required to obtain explicit solutions of the system.
From Eq. (16), we have
p = −A44
− B44
− A4B4
, (18)
where K = H
. Now from Eq. (17), we have
. (19)
To get deterministic solution, we first assume that the universe is filled with
barotropic perfect fluid which leads to
p = γρ, (20)
where γ(0 ≤ γ ≤ 1) is a constant. Putting the values of p and ρ from Eqs. (18)
and (19) in (20), we obtain
+ (1 + γ)
+ (1 − γ) K
= 0. (21)
Equations (15) and (16) lead to
(CB4 −BC4)4
(CB4 −BC4)
= −A4
, (22)
which again leads to
, (23)
where L is an integrating constant and
BC = µ,
= ν. (24)
Thus from Eqs. (23) and (24), we have
. (25)
For deterministic solution, we secondly assume
A = constant = α(say). (26)
Thus Eq. (25) leads to
. (27)
From Eqs. (21) and (26), we have
(1− γ)K
= 0. (28)
Using (24) in Eq. (28), we obtain
+ (γ − 1) µ
+ (1− γ) L
4α2µ2
+ (1− γ)K
= 0, (29)
which again leads to
2µ44 + (γ − 1)
, (30)
where
a = (γ − 1)L
+ 4(γ − 1)K. (31)
Let us assume that µ4 = f(µ). Thus µ44 = ff
′, where f ′ = df
. Accordingly
Eq. (30) leads to
(f2) + (γ − 1) 1
, (32)
which again reduces to
γ − 1
+ bµ1−γ . (33)
Now from Eq. (27), we have
dµ (34)
Using Eq. (33) in Eq. (34), we have
Lµ̄γdµ
bµ1−γ
ℓ2 + µ1−γ
, (35)
where ℓ2 = a
(γ−1)b .
Eq. (35), after integration, leads to
ν = S
ℓ2 + µ1−γ − ℓ
ℓ2 + µ1−γ + ℓ
α(1−γ)ℓ
, (36)
where S is the constant of integration.
Thus the metric (1) reduces to the form
ds2 = −
dµ2 + α2dx2 + µ
ℓ2 + µ1−γ − ℓ
ℓ2 + µ1−γ + ℓ
αℓ(1−γ)
ℓ2 + µ1−γ − ℓ
ℓ2 + µ1−γ + ℓ
αℓ(1−γ)
, (37)
which after suitable transformation of coordinates, leads to
ds2 = − dT
b(ℓ2 + T 1−γ)
+ dX2 + T
ℓ2 + µ1−γ − ℓ
ℓ2 + µ1−γ + ℓ
αℓ(1−γ)
ℓ2 + µ1−γ − ℓ
ℓ2 + µ1−γ + ℓ
αℓ(1−γ)
, (38)
where αx = X,
Sy = Y, 1√
z = z, µ = T .
In the absence of the magnetic field, i.e. when K → 0, then the metric (37)
reduces to
2 = −
+ T 1−γ
) + dX2 + T
+ T 1−γ − L
+ T 1−γ + L
+ T 1−γ − L
+ T 1−γ + L
. (39)
4 The Geometric and Physical Significance of
Model
The energy density (ρ), the string tension density (λ), the particle density (ρp),
the isotropic pressure (p), the scalar of expansion (θ), and shear tensor (σ) for
the model (38) are given by
b(ℓ2 + T 1−γ)− L
, (40)
b(1− γ)T 1−γ
, (41)
b(ℓ2 + γT 1−γ)− L
, (42)
b(ℓ2 + T 1−γ)− L
. (43)
ℓ2 + T 1−γ
, (44)
6b(ℓ2 + T 1−γ) +
. (45)
ρ+ p =
b{2ℓ2 − (1− γ)T 1−γ} − 2L
, (46)
ρ+ 3p =
b{4ℓ2 + (1 + 3γ)T 1−γ − 4L
− 16K
. (47)
The reality conditions given by Ellis [31] as
(i)ρ+ p > 0, (ii)ρ+ 3p > 0,
are satisfied when
T 1−γ <
ℓ2 − L
The energy conditions ρ ≥ 0 and ρp ≥ 0 are satisfied in the presence of
magnetic field for the model (38). The condition ρ ≥ 0 leads to
b(ℓ2 + T 1−γ) ≥
+ 4K.
The condition ρp ≥ 0 leads to
b(ℓ2 + T 1−γ) ≥ L
− 4K.
From Eq. (42), we observe that the string tension density λ ≥ 0 provided
b(1− γ)T 1−γ ≥ 8K.
The model (38) starts with a big bang at T = 0 and the expansion in the
model decreases as time increases. When T → 0 then ρ → ∞, λ → ∞. When
T → ∞ then ρ → 0, λ → 0. Also p → ∞ when T → 0 and p → 0 when T → ∞.
Since limT→∞
6= 0, hence the model does not isotropize in general. However,
if L = 0 then the model (38) isotropizes for large values of T . There is a point
type singularity [32] in the model (38) at T = 0.
The ratio of magnetic energy to material energy is given by
b(ℓ2 + T (1−γ))− L2
, (48)
where 0 ≤ γ ≤ 1. The ratio E
is non-zero finite quantity initially and tends to
zero as T → ∞.
The scale factor (R) is given by
R3 = ABC = αµ = αT. (49)
Thus R increases as T increases.
The deceleration parameter (q) in presence of magnetic field is given by
q = −RR44
a(3a− 2) + 1
b(1 + 3γ)T (1−γ)
γ−1 + bT
(1−γ)
) . (50)
The deceleration parameter (q) approaches the value (−1) as in the case of
de-Sitter universe if
T (1−γ) =
2a(3a− 2)(γ − 1)− a
b(1− γ)(1− 3γ)
In the absence of magnetic field, i.e. K → 0, the above mentioned quantities
are given by
4T 1+γ
, (51)
b(1− γ)
4T 1+γ
, (52)
4T 1+γ
, (53)
4T 1+γ
, (54)
In the absence of magnetic field when γ = 1, then ρ = b
and also the string
tension density becomes zero. The energy conditions ρ ≥ 0 and ρp ≥ 0 are
satisfied for the model (38) when b ≥ 0.
The reality conditions given by Ellis [31] as
(i) ρ+ p > 0, (ii) ρ+ 3p > 0,
are satisfied when b > 0.
+ bT 1−γ
, (55)
+ 6bT 1−γ
. (56)
In the absence of magnetic field, the model (39) starts with a big bang at T = 0
and the expansion in the model decreases as time increases. When T → 0 then
ρ → ∞, λ → ∞ and p → ∞. When T → ∞ then ρ → 0, λ → 0 and p → 0. In
the absence of magnetic field, the particle density (ρp) and the isotropic pressure
(p) are equal. Since limT→∞
6= 0, therefore the model does not isotropize in
general. However, if L = 0 then the model (39) isotropizes for large values of
T . There is a point type singularity [32] in the model (39) at T = 0.
In absence of magnetic field, the scale factor (R) is given by
R3 = αT. (57)
The R increases as T increases in this case also. The deceleration parameter (q)
is given by
q = −
b(1 + 3γ)T (1−γ) − L
{3L2(1− γ) + α2}
+ bT (1−γ)
We observe that q < 0 if T (1−γ) >
2{3L2(1−γ)α2}
b(1+3γ)α4
. The deceleration parameter
(q) approaches the value (−1) as in the case of de-Sitter universe if
T (1−γ) =
L2{3L2(1− γ) + α2}
3bγα4
Acknowledgments
Authors would like to thank the Inter-University Centre for Astronomy and
Astrophysics (IUCAA), Pune, India for providing facility and support where
this work was carried out. Authors also thank to the referee for their fruitful
comments.
References
[1] Kibble T W B 1976 J. Phys. A: Math. Gen. 9 1387
[2] Zel’dovich Ya B, Kobzarev, I Yu, and Okun, L B 1975 Zh. Eksp. Teor.
Fiz. 67 3
Zel’dovich Ya B, Kobzarev, I Yu, and Okun, L B 1975 Sov. Phys.-JETP
[3] Kibble T W B 1980 Phys. Rep. 67 183
[4] Everett A E 1981 Phys. Rev. 24 858
[5] Vilenkin A 1981 Phys. Rev. D 24 2082
[6] Zel’dovich Ya B 1980 Mon. Not. R. Astron. Soc. 192 663
[7] Letelier P S 1979 Phys. Rev. D 20 1249
[8] Letelier P S 1983 Phys. Rev. D 28 2414
[9] Stachel J 1980 Phys. Rev. D 21 2171
[10] Banerjee A, Sanyal A K and Chakraborty S 1990 Pramana-J. Phys. 34 1
[11] Chakraborty S 1991 Ind. J. Pure Appl. Phys. 29 31
[12] Tikekar R and Patel L K 1992 Gen. Rel. Grav. 24 397
[13] Tikekar R and Patel L K 1994 Pramana-J. Phys. 42 483
[14] Patel, L K and Maharaj S D 1996 Pramana-J. Phys. 47 33
[15] Ram, S and Singh, T K 1995 Gen. Rel. Grav. 27 1207
[16] Carminati J and McIntosh C B G 1980 J. Phys. A: Math. Gen. 13 953
[17] Krori K D, Chaudhury T, Mahanta C R and Mazumdar A 1990 Gen. Rel.
Grav. 22 123
[18] Wang X X 2003 Chin. Phys. Lett. 20 615
[19] Singh G P and Singh T 1999 Gen. Relativ. Gravit. 31 371
[20] Bali R and Upadhaya R D 2003 Astrophys. Space Sci. 283 97
[21] Bali R and Pradhan A 2007 Chin. Phys. Lett. 24 585
[22] Bali R and Anjali 2006 Astrophys. Space Sci. 302 201
[23] Yadav M K, Rai A and Pradhan A 2007 Int. J. Theor. Phys. to appear
(gr-qc/0611032).
[24] Melvin M A 1975 Ann. New York Acad. Sci. 262 253
[25] Wang X X 2006 Chin. Phys. Lett. 23 1702
[26] Wang X X 2004 Astrophys. Space Sci. 293 933
[27] Chakraborty N C and Chakraborty 2001 Int. J. Mod. Phys. D 10 723
[28] Singh G P and Singh T 1999 Gen. Rel. Gravit. 31 371
[29] Lichnerowicz A 1967 Relativistic Hydrodynamics and Magnetohydrody-
namics Benjamin New York p. 13
[30] Roy Maartens 2000 Pramana-J. Phys. 55 576
[31] Ellis G F R 1971 General Relativity and Cosmology ed. Sachs R K Claren-
don Press p. 117
[32] MacCallum M A H 1971 Comm. Math. Phys. 20 57
http://arxiv.org/abs/gr-qc/0611032
	Introduction
	The Metric and Field Equations
	Solution of Field Equations
	The Geometric and Physical Significance of Model
ABSTRACT
  Bianchi type I massive string cosmological model with magnetic field of
barotropic perfect fluid distribution through the techniques used by Latelier
and Stachel, is investigated. To get the deterministic model of the universe,
it is assumed that the universe is filled with barotropic perfect fluid
distribution. The magnetic field is due to electric current produced along
x-axis with infinite electrical conductivity. The behaviour of the model in
presence and absence of magnetic field together with other physical aspects is
further discussed.

<|endoftext|><|startoftext|>
Introduction
The theory of general relativity was developed by Einstein in work that extended
from 1907 to 1915. The starting point for Einstein’s thinking was the compo-
sition of a review article in 1907 on what we today call the theory of special
relativity. Recall that the latter theory sprang from a new kinematics governing
length and time measurements that was proposed by Einstein in June of 1905
[1], [2], following important pioneering work by Lorentz and Poincaré. The
theory of special relativity essentially poses a new fundamental framework (in
place of the one posed by Galileo, Descartes, and Newton) for the formulation of
physical laws: this framework being the chrono-geometric space-time structure
of Poincaré and Minkowski. After 1905, it therefore seemed a natural task to
formulate, reformulate, or modify the then known physical laws so that they fit
within the framework of special relativity. For Newton’s law of gravitation, this
task was begun (before Einstein had even supplied his conceptual crystallization
in 1905) by Lorentz (1900) and Poincaré (1905), and was pursued in the period
from 1910 to 1915 by Max Abraham, Gunnar Nordström and Gustav Mie (with
these latter researchers developing scalar relativistic theories of gravitation).
Meanwhile, in 1907, Einstein became aware that gravitational interactions
possessed particular characteristics that suggested the necessity of generalizing
the framework and structure of the 1905 theory of relativity. After many years
of intense intellectual effort, Einstein succeeded in constructing a generalized
∗Talk given at the Poincaré Seminar “Gravitation et Expérience” (28 October 2006, Paris);
to appear in the proceedings to be published by Birkhäuser.
†Translated from the French by Eric Novak.
http://arxiv.org/abs/0704.0754v1
theory of relativity (or general relativity) that proposed a profound modification
of the chrono-geometric structure of the space-time of special relativity. In 1915,
in place of a simple, neutral arena, given a priori, independently of all material
content, space-time became a physical “field” (identified with the gravitational
field). In other words, it was now a dynamical entity, both influencing and
influenced by the distribution of mass-energy that it contains.
This radically new conception of the structure of space-time remained for
a long while on the margins of the development of physics. Twentieth century
physics discovered a great number of new physical laws and phenomena while
working with the space-time of special relativity as its fundamental framework,
as well as imposing the respect of its symmetries (namely the Lorentz-Poincaré
group). On the other hand, the theory of general relativity seemed for a long
time to be a theory that was both poorly confirmed by experiment and without
connection to the extraordinary progress springing from application of quantum
theory (along with special relativity) to high-energy physics. This marginaliza-
tion of general relativity no longer obtains. Today, general relativity has become
one of the essential players in cutting-edge science. Numerous high-precision ex-
perimental tests have confirmed, in detail, the pertinence of this theory. General
relativity has become the favored tool for the description of the macroscopic uni-
verse, covering everything from the big bang to black holes, including the solar
system, neutron stars, pulsars, and gravitational waves. Moreover, the search
for a consistent description of fundamental physics in its entirety has led to the
exploration of theories that unify, within a general quantum framework, the
description of matter and all its interactions (including gravity). These theo-
ries, which are still under construction and are provisionally known as string
theories, contain general relativity in a central way but suggest that the funda-
mental structure of space-time-matter is even richer than is suggested separately
by quantum theory and general relativity.
2 Special Relativity
We begin our exposition of the theory of general relativity by recalling the
chrono-geometric structure of space-time in the theory of special relativity. The
structure of Poincaré-Minkowski space-time is given by a generalization of the
Euclidean geometric structure of ordinary space. The latter structure is sum-
marized by the formula L2 = (∆x)2 + (∆y)2 + (∆z)2 (a consequence of the
Pythagorean theorem), expressing the square of the distance L between two
points in space as a sum of the squares of the differences of the (orthonormal)
coordinates x, y, z that label the points. The symmetry group of Euclidean ge-
ometry is the group of coordinate transformations (x, y, z) → (x′, y′, z′) that
leave the quadratic form L2 = (∆x)2 + (∆y)2 +(∆z)2 invariant. (This group is
generated by translations, rotations, and “reversals” such as the transformation
given by reflection in a mirror, for example: x′ = −x, y′ = y, z′ = z.)
The Poincaré-Minkowski space-time is defined as the ensemble of events (ide-
alizations of what happens at a particular point in space, at a particular moment
in time), together with the notion of a (squared) interval S2 defined between
any two events. An event is fixed by four coordinates, x, y, z, and t, where
(x, y, z) are the spatial coordinates of the point in space where the event in
question “occurs,” and where t fixes the instant when this event “occurs.” An-
other event will be described (within the same reference frame) by four different
coordinates, let us say x+∆x, y+∆y, z+∆z, and t+∆t. The points in space
where these two events occur are separated by a distance L given by the for-
mula above, L2 = (∆x)2+(∆y)2+(∆z)2. The moments in time when these two
events occur are separated by a time interval T given by T = ∆t. The squared
interval S2 between these two events is given as a function of these quantities,
by definition, through the following generalization of the Pythagorean theorem:
S2 = L2 − c2 T 2 = (∆x)2 + (∆y)2 + (∆z)2 − c2(∆t)2 , (1)
where c denotes the speed of light (or, more precisely, the maximum speed of
signal propagation).
Equation (1) defines the chrono-geometry of Poincaré-Minkowski space-time.
The symmetry group of this chrono-geometry is the group of coordinate trans-
formations (x, y, z, t) → (x′, y′, z′, t′) that leave the quadratic form (1) of the
interval S invariant. We will show that this group is made up of linear trans-
formations and that it is generated by translations in space and time, spatial
rotations, “boosts” (meaning special Lorentz transformations), and reversals of
space and time.
It is useful to replace the time coordinate t by the “light-time” x0 ≡ ct, and
to collectively denote the coordinates as xµ ≡ (x0, xi) where the Greek indices
µ, ν, . . . = 0, 1, 2, 3, and the Roman indices i, j, . . . = 1, 2, 3 (with x1 = x, x2 = y,
and x3 = z). Equation (1) is then written
S2 = ηµν ∆x
µ ∆xν , (2)
where we have used the Einstein summation convention1 and where ηµν is a
diagonal matrix whose only non-zero elements are η00 = −1 and η11 = η22 =
η33 = +1. The symmetry group of Poincaré-Minkowski space-time is therefore
the ensemble of Lorentz-Poincaré transformations,
x′µ = Λµν x
ν + aµ , (3)
where ηαβ Λ
ν = ηµν .
The chrono-geometry of Poincaré-Minkowski space-time can be visualized
by representing, around each point x in space-time, the locus of points that
are separated from the point x by a unit (squared) interval, in other words the
ensemble of points x′ such that S2xx′ = ηµν(x
′µ−xµ)(x′ν −xν) = +1. This locus
is a one-sheeted (unit) hyperboloid.
If we were within an ordinary Euclidean space, the ensemble of points x′
would trace out a (unit) sphere centered on x, and the “field” of these spheres
1Every repeated index is supposed to be summed over all of its possible values.
centered on each point x would allow one to completely characterize the Eu-
clidean geometry of the space. Similarly, in the case of Poincaré-Minkowski
space-time, the “field” of unit hyperboloids centered on each point x is a visual
characterization of the geometry of this space-time. See Figure 1. This figure
gives an idea of the symmetry group of Poincaré-Minkowski space-time, and
renders the rigid and homogeneous nature of its geometry particularly clear.
Figure 1: Geometry of the “rigid” space-time of the theory of special relativity.
This geometry is visualized by representing, around each point x in space-time,
the locus of points separated from the point x by a unit (squared) interval. The
space-time shown here has only three dimensions: one time dimension (repre-
sented vertically), x0 = ct, and two spatial dimensions (represented horizon-
tally), x, y. We have also shown the ‘space-time line’, or ‘world-line’, (moving
from the bottom to the top of the “space-time block,” or from the past towards
the future) representing the history of a particle’s motion.
The essential idea in Einstein’s article of June 1905 was to impose the group
of transformations (3) as a symmetry group of the fundamental laws of physics
(“the principle of relativity”). This point of view proved to be extraordinarily
fruitful, since it led to the discovery of new laws and the prediction of new phe-
nomena. Let us mention some of these for the record: the relativistic dynamics
of classical particles, the dilation of lifetimes for relativistic particles, the re-
lation E = mc2 between energy and inertial mass, Dirac’s relativistic theory
of quantum spin 1
particles, the prediction of antimatter, the classification of
particles by rest mass and spin, the relation between spin and statistics, and
the CPT theorem.
After these recollections on special relativity, let us discuss the special fea-
ture of gravity which, in 1907, suggested to Einstein the need for a profound
generalization of the chrono-geometric structure of space-time.
3 The Principle of Equivalence
Einstein’s point of departure was a striking experimental fact: all bodies in an
external gravitational field fall with the same acceleration. This fact was pointed
out by Galileo in 1638. Through a remarkable combination of logical reason-
ing, thought experiments, and real experiments performed on inclined planes,2
Galileo was in fact the first to conceive of what we today call the “universality of
free-fall” or the “weak principle of equivalence.” Let us cite the conclusion that
Galileo drew from a hypothetical argument where he varied the ratio between
the densities of the freely falling bodies under consideration and the resistance of
the medium through which they fall: “Having observed this I came to the con-
clusion that in a medium totally devoid of resistance all bodies would fall with
the same speed” [3]. This universality of free-fall was verified with more pre-
cision by Newton’s experiments with pendulums, and was incorporated by him
into his theory of gravitation (1687) in the form of the identification of the iner-
tial massmi (appearing in the fundamental law of dynamics F = mi a) with the
gravitational mass mg (appearing in the gravitational force, Fg = Gmgm
mi = mg . (4)
At the end of the nineteenth century, Baron Roland von Eötvös verified
the equivalence (4) between mi and mg with a precision on the order of 10
and Einstein was aware of this high-precision verification. (At present, the
equivalence between mi and mg has been verified at the level of 10
−12 [4].) The
point that struck Einstein was that, given the precision with whichmi = mg was
verified, and given the equivalence between inertial mass and energy discovered
by Einstein in September of 1905 [2] (E = mi c
2), one must conclude that all of
the various forms of energy that contribute to the inertial mass of a body (rest
mass of the elementary constituents, various binding energies, internal kinetic
energy, etc.) do contribute in a strictly identical way to the gravitational mass of
this body, meaning both to its capacity for reacting to an external gravitational
field and to its capacity to create a gravitational field.
In 1907, Einstein realized that the equivalence betweenmi andmg implicitly
contained a deeper equivalence between inertia and gravitation that had impor-
tant consequences for the notion of an inertial reference frame (which was a fun-
damental concept in the theory of special relativity). In an ingenious thought
experiment, Einstein imagined the behavior of rigid bodies and reference clocks
within a freely falling elevator. Because of the universality of free-fall, all of the
objects in such a “freely falling local reference frame” would appear not to be
accelerating with respect to it. Thus, with respect to such a reference frame,
the exterior gravitational field is “erased” (or “effaced”). Einstein therefore pos-
tulated what he called the “principle of equivalence” between gravitation and
inertia. This principle has two parts, that Einstein used in turns. The first
part says that, for any external gravitational field whatsoever, it is possible to
2The experiment with falling bodies said to be performed from atop the Leaning Tower of
Pisa is a myth, although it aptly summarizes the essence of Galilean innovation.
locally “erase” the gravitational field by using an appropriate freely falling local
reference frame and that, because of this, the non-gravitational physical laws
apply within this local reference frame just as they would in an inertial reference
frame (free of gravity) in special relativity. The second part of Einstein’s equiv-
alence principle says that, by starting from an inertial reference frame in special
relativity (in the absence of any “true” gravitational field), one can create an
apparent gravitational field in a local reference frame, if this reference frame is
accelerated (be it in a straight line or through a rotation).
4 Gravitation and Space-Time Chrono-Geometry
Einstein was able (through an extraordinary intellectual journey that lasted
eight years) to construct a new theory of gravitation, based on a rich general-
ization of the 1905 theory of relativity, starting just from the equivalence prin-
ciple described above. The first step in this journey consisted in understanding
that the principle of equivalence would suggest a profound modification of the
chrono-geometric structure of Poincaré-Minkowski space-time recalled in Equa-
tion (1) above.
To illustrate, let Xα, α = 0, 1, 2, 3, be the space-time coordinates in a lo-
cal, freely-falling reference frame (or locally inertial reference frame). In such a
reference frame, the laws of special relativity apply. In particular, the infinites-
imal space-time interval ds2 = dL2 − c2 dT 2 between two neighboring events
within such a reference frame Xα, X ′α = Xα + dXα (close to the center of this
reference frame) takes the form
ds2 = dL2 − c2 dT 2 = ηαβ dXα dXβ , (5)
where we recall that the repeated indices α and β are summed over all of their
values (α, β = 0, 1, 2, 3). We also know that in special relativity the local energy
and momentum densities and fluxes are collected into the ten components of
the energy-momentum tensor Tαβ. (For example, the energy density per unit
volume is equal to T 00, in the reference frame described by coordinates Xα =
(X0, X i), i = 1, 2, 3.) The conservation of energy and momentum translates
into the equation ∂β T
αβ = 0, where ∂β = ∂/∂ X
The theory of special relativity tells us that we can change our locally in-
ertial reference frame (while remaining in the neighborhood of a space-time
point where one has “erased” gravity) through a Lorentz transformation, X ′α =
Λαβ X
β. Under such a transformation, the infinitesimal interval ds2, Equation
(5), remains invariant and the ten components of the (symmetric) tensor Tαβ
are transformed according to T ′αβ = Λαγ Λ
γδ. On the other hand, when
we pass from a locally inertial reference frame (with coordinates Xα) to an
extended non-inertial reference frame (with coordinates xµ; µ = 0, 1, 2, 3), the
transformation connecting the Xα to the xµ is no longer a linear transforma-
tion (like the Lorentz transformation) but becomes a non-linear transformation
Xα = Xα(xµ) that can take any form whatsoever. Because of this, the value of
the infinitesimal interval ds2, when expressed in a general, extended reference
frame, will take a more complicated form than the very simple one given by
Equation (5) that it had in a reference frame that was locally in free-fall. In
fact, by differentiating the non-linear functions Xα = Xα(xµ) we obtain the
relation dXα = ∂Xα/∂xµ dxµ. By substituting this relation into (5) we then
obtain
ds2 = gµν(x
λ) dxµ dxν , (6)
where the indices µ, ν are summed over 0, 1, 2, 3 and where the ten functions
gµν(x) (symmetric over the indices µ and ν) of the four variables x
λ are de-
fined, point by point (meaning that for each point xλ we consider a refer-
ence frame that is locally freely falling at x, with local coordinates Xαx ) by
gµν(x) = ηαβ ∂X
x (x)/∂x
µ ∂Xβx (x)/∂x
ν . Because of the nonlinearity of the
functions Xα(x), the functions gµν(x) generally depend in a nontrivial way on
the coordinates xλ.
The local chrono-geometry of space-time thus appears to be given, not by
the simple Minkowskian metric (2), with constant coefficients ηµν , but by a
quadratic metric of a much more general type, Equation (6), with coefficients
gµν(x) that vary from point to point. Such general metric spaces had been
introduced and studied by Gauss and Riemann in the nineteenth century (in
the case where the quadratic form (6) is positive definite). They carry the
name Riemannian spaces or curved spaces. (In the case of interest for Einstein’s
theory, where the quadratic form (6) is not positive definite, one speaks of a
pseudo-Riemannian metric.)
We do not have the space here to explain in detail the various geometric
structures in a Riemannian space that are derivable from the data of the in-
finitesimal interval (6). Let us note simply that given Equation (6), which
gives the distance ds between two infinitesimally separated points, we are able,
through integration along a curve, to define the length of an arbitrary curve
connecting two widely separated points A and B: LAB =
ds. One can then
define the “straightest possible line” between two given points A and B to be
the shortest line, in other words the curve that minimizes (or, more generally,
extremizes) the integrated distance LAB. These straightest possible lines are
called geodesic curves. To give a simple example, the geodesics of a spherical
surface (like the surface of the Earth) are the great circles (with radius equal
to the radius of the sphere). If one mathematically writes the condition for
a curve, as given by its parametric representation xµ = xµ(s), where s is the
length along the curve, to extremize the total length LAB one finds that x
must satisfy the following second-order differential equation:
d2 xλ
+ Γλµν(x)
= 0 , (7)
where the quantities Γλµν , known as the Christoffel coefficients or connection
coefficients, are calculated, at each point x, from the metric components gµν(x)
by the equation
Γλµν ≡
gλσ(∂µ gνσ + ∂ν gµσ − ∂σ gµν) , (8)
where gµν denotes the matrix inverse to gµν (g
µσ gσν = δ
ν where the Kronecker
symbol δµν is equal to 1 when µ = ν and 0 otherwise) and where ∂µ ≡ ∂/∂xµ
denotes the partial derivative with respect to the coordinate xµ. To give a
very simple example: in the Poincaré-Minkowski space-time the components of
the metric are constant, gµν = ηµν (when we use an inertial reference frame).
Because of this, the connection coefficients (8) vanish in an inertial reference
frame, and the differential equation for geodesics reduces to d2 xλ/ds2 = 0,
whose solutions are ordinary straight lines: xλ(s) = aλ s + bλ. On the other
hand, in a general “curved” space-time (meaning one with components gµν that
depend in an arbitrary way on the point x) the geodesics cannot be globally
represented by straight lines. One can nevertheless show that it always remains
possible, for any gµν(x) whatsoever, to change coordinates x
µ → Xα(x) in such
a way that the connection coefficients Γαβγ , in the new system of coordinates
Xα, vanish locally, at a given point Xα0 (or even along an arbitrary curve).
Such locally geodesic coordinate systems realize Einstein’s equivalence principle
mathematically: up to terms of second order, the components gαβ(X) of a
“curved” metric in locally geodesic coordinates Xα (ds2 = gαβ(X) dX
α dXβ)
can be identified with the components of a “flat” Poincaré-Minkowski metric:
gαβ(X) = ηαβ +O((X−X0)2), where X0 is the point around which we expand.
5 Einstein’s Equations: Elastic Space-Time
Having postulated that a consistent relativistic theory of the gravitational field
should include the consideration of a far-reaching generalization of the Poincaré-
Minkowski space-time, Equation (6), Einstein concluded that the same ten
functions gµν(x) should describe both the geometry of space-time as well as
gravitation. He therefore got down to the task of finding which equations must
be satisfied by the “geometric-gravitational field” gµν(x). He was guided in this
search by three principles. The first was the principle of general relativity, which
asserts that in the presence of a gravitational field one should be able to write
the fundamental laws of physics (including those governing the gravitational
field itself) in the same way in any coordinate system whatsoever. The second
was that the “source” of the gravitational field should be the energy-momentum
tensor T µν . The third was a principle of correspondence with earlier physics:
in the limit where one neglects gravitational effects, gµν(x) = ηµν should be
a solution of the equations being sought, and there should also be a so-called
Newtonian limit where the new theory reduces to Newton’s theory of gravity.
Note that the principle of general relativity (contrary to appearances and
contrary to what Einstein believed for several years) has a different physical
status than the principle of special relativity. The principle of special relativity
was a symmetry principle for the structure of space-time that asserted that
physics is the same in a particular class of reference frames, and therefore that
certain “corresponding” phenomena occur in exactly the same way in different
reference frames (“active” transformations). On the other hand, the principle
of general relativity is a principle of indifference: the phenomena do not (in
general) take place in the same way in different coordinate systems. However,
none of these (extended) coordinate systems enjoys any privileged status with
respect to the others.
The principle asserting that the energy-momentum tensor T µν should be the
source of the gravitational field is founded on two ideas: the relations E = mi c
and the weak principle of equivalence mi = mg show that, in the Newtonian
limit, the source of gravitation, the gravitational mass mg, is equal to the total
energy of the body considered, or in other words the integral over space of
the energy density T 00, up to the factor c−2. Therefore at least one of the
components of the tensor T µν must play the role of source for the gravitational
field. However, since the gravitational field is encoded, according to Einstein,
by the ten components of the metric gµν , it is natural to suppose that the
source for gµν must also have ten components, which is precisely the case for
the (symmetric) tensor T µν .
In November of 1915, after many years of conceptually arduous work, Ein-
stein wrote the final form of the theory of general relativity [6]. Einstein’s equa-
tions are non-linear, second-order partial differential equations for the geometric-
gravitational field gµν , containing the energy-momentum tensor Tµν ≡ gµκ gνλ T κλ
on the right-hand side. They are written as follows:
Rµν −
Rgµν =
Tµν (9)
where G is the (Newtonian) gravitational constant, c is the speed of light, and
R ≡ gµν Rµν and the Ricci tensor Rµν are calculated as a function of the
connection coefficients Γλµν (8) in the following way:
Rµν ≡ ∂α Γαµν − ∂ν Γαµα + Γαβα Γβµν − Γαβν Γβµα . (10)
One can show that, in a four-dimensional space-time, the three principles
we have described previously uniquely determine Einstein’s equations (9). It
is nevertheless remarkable that these equations may also be developed from
points of view that are completely different from the one taken by Einstein. For
example, in the 1960s various authors (in particular Feynman, Weinberg and
Deser; see references in [4]) showed that Einstein’s equations could be obtained
from a purely dynamical approach, founded on the consistency of interactions
of a long-range spin 2 field, without making any appeal, as Einstein had, to
the geometric notions coming from mathematical work on Riemannian spaces.
Let us also note that if we relax one of the principles described previously (as
Einstein did in 1917) we can find a generalization of Equation (9) in which one
adds the term +Λ gµν to the left-hand side, where Λ is the so-called cosmological
constant. Such a modification was proposed by Einstein in 1917 in order to be
able to write down a globally homogeneous and stationary cosmological solution.
Einstein rejected this additional term after work by Friedmann (1922) showed
the existence of expanding cosmological solutions of general relativity and after
the observational discovery (by Hubble in 1929) of the expanding motion of
galaxies within the universe. However, recent cosmological data have once again
made this possibility fashionable, although in the fundamental physics of today
one tends to believe that a term of the type Λ gµν should be considered as a
particular physical contribution to the right-hand side of Einstein’s equations
(more precisely, as the stress-energy tensor of the vacuum, T Vµν = − c
Λ gµν),
rather than as a universal geometric modification of the left-hand side.
Let us now comment on the physical meaning of Einstein’s equations (9).
The essential new idea is that the chrono-geometric structure of space-time,
Equation (6), in other words the structure that underlies all of the measurements
that one could locally make of duration, dT , and of distance, dL, (we recall
that, locally, ds2 = dL2 − c2 dT 2) is no longer a rigid structure that is given a
priori, once and for all (as was the case for the structure of Poincaré-Minkowski
space-time), but instead has become a field, a dynamical or elastic structure,
which is created and/or deformed by the presence of an energy-momentum
distribution. See Figure 2, which visualizes the “elastic” geometry of space-
time in the theory of general relativity by representing, around each point x,
the locus of points (assumed to be infinitesimally close to x) separated from x by
a constant (squared) interval: ds2 = ε2. As in the case of Poincaré-Minkowski
space-time (Figure 1), one arrives at a “field” of hyperboloids. However, this
field of hyperboloids no longer has a “rigid” and homogeneous structure.
Figure 2: “Elastic” space-time geometry in the theory of general relativity. This
geometry is visualized by representing, around each space-time point x, the locus
of points separated from x by a given small positive (squared) interval.
The space-time field gµν(x) describes the variation from point to point of
the chrono-geometry as well as all gravitational effects. The simplest example
of space-time chrono-geometric elasticity is the effect that the proximity of a
mass has on the “local rate of flow for time.” In concrete terms, if you separate
two twins at birth, with one staying on the surface of the Earth and the other
going to live on the peak of a very tall mountain (in other words farther from
the Earth’s center), and then reunite them after 100 years, the “highlander”
will be older (will have lived longer) than the twin who stayed on the valley
floor. Everything takes place as if time flows more slowly the closer one is to
a given distribution of mass-energy. In mathematical terms this effect is due
to the fact that the coefficient g00(x) of (dx
0)2 in Equation (6) is deformed
with respect to its value in special relativity, gMinkowski00 = η00 = −1, to become
gEinstein00 (x) ≃ −1 + 2GM/c2r, where M is the Earth’s mass (in our example)
and r the distance to the center of the Earth. In the example considered above
of terrestrial twins the effect is extremely small (a difference in the amount of
time lived of about one second over 100 years), but the effect is real and has
been verified many times using atomic clocks (see the references in [4]). Let us
mention that today this “Einstein effect” has important practical repercussions,
for example in aerial or maritime navigation, for the piloting of automobiles, or
even farm machinery, etc. In fact, the GPS (Global Positioning System), which
uses the data transmitted by a constellation of atomic clocks on board satellites,
incorporates the Einsteinian deformation of space-time chrono-geometry into its
software. The effect is only on the order of one part in a billion, but if it were not
taken into account, it would introduce an unacceptably large error into the GPS,
which would continually grow over time. Indeed, GPS performance relies on the
high stability of the orbiting atomic clocks, a stability better than 10−13, or in
other words 10,000 times greater than the apparent change in frequency(∼ 10−9)
due to the Einsteinian deformation of the chrono-geometry.
6 TheWeak-Field Limit and the Newtonian Limit
To understand the physical consequences of Einstein’s equations (9), it is useful
to begin by considering the limiting case of weak geometric-gravitational fields,
namely the case where gµν(x) = ηµν + hµν(x), with perturbations hµν(x) that
are very small with respect to unity: |hµν(x)| ≪ 1. In this case, a simple
calculation (that we encourage the reader to perform) starting from Definitions
(8) and (10) above, leads to the following explicit form of Einstein’s equations
(where we ignore terms of order h2 and hT ):
� hµν − ∂µ ∂α hαν − ∂ν ∂α hαµ + ∂µν hαα = −
16 πG
T̃µν , (11)
where � = ηµν ∂µν = ∆−∂20 = ∂2/∂x2+∂2/∂y2+∂2/∂z2− c−2 ∂2/∂t2 denotes
the “flat” d’Alembertian (or wave operator; xµ = (ct, x, y, z)), and where indices
in the upper position have been raised by the inverse ηµν of the flat metric ηµν
(numerically ηµν = ηµν , meaning that −η00 = η11 = η22 = η33 = +1). For
example ∂α hαν denotes η
αβ ∂α hβν and h
α ≡ ηαβ hαβ = −h00 + h11 + h22 +
h33. The “source” T̃µν appearing on the right-hand side of (11) denotes the
combination T̃µν ≡ Tµν − 12 T
α ηµν (when space-time is four-dimensional).
The “linearized” approximation (11) of Einstein’s equations is analogous to
Maxwell’s equations
�Aµ − ∂µ ∂αAα = −4π Jµ , (12)
connecting the electromagnetic four-potential Aµ ≡ ηµν Aν (where A0 = V ,
Ai = A, i = 1, 2, 3) to the four-current density Jµ ≡ ηµν Jν (where J0 = ρ is the
charge density and J i = J is the current density). Another analogy is that the
structure of the left-hand side of Maxwell’s equations implies that the “source”
Jµ appearing on the right-hand side must satisfy ∂
µ Jµ = 0 (∂
µ ≡ ηµν ∂ν),
which expresses the conservation of electric charge. Likewise, the structure of
the left-hand side of the linearized form of Einstein’s equations (11) implies that
the “source” Tµν = T̃µν − 12 T̃
α ηµν must satisfy ∂
µ Tµν = 0, which expresses the
conservation of energy and momentum of matter. (The structure of the left-
hand side of the exact form of Einstein’s equations (9) implies that the source
Tµν must satisfy the more complicated equation ∂µ T
µν+Γµσµ T
σν+Γνσµ T
µσ = 0,
where the terms in ΓT can be interpreted as describing an exchange of energy
and momentum between matter and the gravitational field.) The major dif-
ference is that, in the case of electromagnetism, the field Aµ and its source Jµ
have a single space-time index, while in the gravitational case the field hµν and
its source T̃µν have two space-time indices. We shall return later to this anal-
ogy/difference between Aµ and hµν which suggests the existence of a certain
relation between gravitation and electromagnetism.
We recover the Newtonian theory of gravitation as the limiting case of Ein-
stein’s theory by assuming not only that the gravitational field is a weak defor-
mation of the flat Minkowski space-time (hµν ≪ 1), but also that the field hµν
is slowly varying (∂0 hµν ≪ ∂i hµν) and that its source Tµν is non-relativistic
(Tij ≪ T0i ≪ T00). Under these conditions Equation (11) leads to a Poisson-
type equation for the purely temporal component, h00, of the space-time field,
∆h00 = −
16 πG
T̃00 = −
(T00 + Tii) ≃ −
T00 , (13)
where ∆ = ∂2x + ∂
y + ∂
z is the Laplacian. Recall that, according to Laplace
and Poisson, Newton’s theory of gravity is summarized by saying that the grav-
itational field is described by a single potential U(x), produced by the mass
density ρ(x) according to the Poisson equation ∆U = −4 πGρ, that deter-
mines the acceleration of a test particle placed in the exterior field U(x) ac-
cording to the equation d2 xi/dt2 = ∂i U(x) ≡ ∂U/∂xi. Because of the relation
mi = mg = E/c
2 one can identify ρ = T 00/c2. We therefore find that (13)
reproduces the Poisson equation if h00 = +2U/c
2. It therefore remains to ver-
ify that Einstein’s theory indeed predicts that a non-relativistic test particle is
accelerated by a space-time field according to d2 xi/dt2 ≃ 1
c2 ∂i h00. Einstein
understood that this was a consequence of the equivalence principle. In fact,
as we discussed in Section 4 above, the principle of equivalence states that the
gravitational field is (locally) erased in a locally inertial reference frame Xα
(such that gαβ(X) = ηαβ +O((X −X0)2)). In such a reference frame, the laws
of special relativity apply at the point X0. In particular an isolated (and elec-
trically neutral) body must satisfy a principle of inertia in this frame: its center
of mass moves in a straight line at constant speed. In other words it satisfies
the equation of motion d2Xα/ds2 = 0. By passing back to an arbitrary (ex-
tended) coordinate system xµ, one verifies that this equation for inertial motion
transforms into the geodesic equation (7). Therefore (7) describes falling bodies,
such as they are observed in arbitrary extended reference frames (for example a
reference frame at rest with respect to the Earth or at rest with respect to the
center of mass of the solar system). From this one concludes that the relativis-
tic analog of the Newtonian field of gravitational acceleration, g(x) = ∇U(x),
is gλ(x) ≡ −c2 Γλµν dxµ/ds dxν/ds. By considering a particle whose motion is
slow with respect to the speed of light (dxi/ds ≪ dx0/ds ≃ 1) one can easily
verify that gi(x) ≃ −c2 Γi00. Finally, by using the definition (8) of Γαµν , and the
hypothesis of weak fields, one indeed verifies that gi(x) ≃ 1
c2 ∂i h00, in perfect
agreement with the identification h00 = 2U/c
2 anticipated above. We encour-
age the reader to personally verify this result, which contains the very essence
of Einstein’s theory: gravitational motion is no longer described as being due
to a force, but is identified with motion that is “as inertial as possible” within a
space-time whose chrono-geometry is deformed in the presence of a mass-energy
distribution.
Finding the Newtonian theory as a limiting case of Einstein’s theory is ob-
viously a necessity for seriously considering this new theory. But of course,
from the very beginning Einstein explored the observational consequences of
general relativity that go beyond the Newtonian description of gravitation. We
have already mentioned one of these above: the fact that g00 = η00 + h00 ≃
−1 + 2U(x)/c2 implies a distortion in the relative measurement of time in the
neighborhood of massive bodies. In 1907 (as soon as he had developed the
principle of equivalence, and long before he had obtained the field equations
of general relativity) Einstein had predicted the existence of such a distortion
for measurements of time and frequency in the presence of an external gravita-
tional field. He realized that this should have observable consequences for the
frequency, as observed on Earth, of the spectral rays emitted from the surface of
the Sun. Specifically, a spectral ray of (proper local) frequency ν0 emitted from
a point x0 where the (stationary) gravitational potential takes the value U(x0)
and observed (via electromagnetic signals) at a point x where the potential is
U(x) should appear to have a frequency ν such that
g00(x0)
g00(x)
≃ 1 + 1
[U(x)− U(x0)] . (14)
In the case where the point of emission x0 is in a gravitational potential well
deeper than the point of observation x (meaning that U(x0) > U(x)) one has
ν < ν0, in other words a reddening effect on frequencies. This effect, which was
predicted by Einstein in 1907, was unambiguously verified only in the 1960s,
in experiments by Pound and collaborators over a height of about twenty me-
ters. The most precise verification (at the level of ∼ 10−4) is due to Vessot
and collaborators, who compared a hydrogen maser, launched aboard a rocket
that reached about 10,000 km in altitude, to a clock of similar construction
on the ground. Other experiments compared the times shown on clocks placed
aboard airplanes to clocks remaining on the ground. (For references to these
experiments see [4].) As we have already mentioned, the “Einstein effect” (14)
must be incorporated in a crucial way into the software of satellite positioning
systems such as the GPS.
In 1907, Einstein also pointed out that the equivalence principle would sug-
gest that light rays should be deflected by a gravitational field. Indeed, a gener-
alization of the reasoning given above for the motion of particles in an external
gravitational field, based on the principle of equivalence, shows that light must
itself follow a trajectory that is “as inertial as possible,” meaning a geodesic
of the curved space-time. Light rays must therefore satisfy the geodesic equa-
tion (7). (The only difference from the geodesics followed by material particles
is that the parameter s in Equation (7) can no longer be taken equal to the
“length” along the geodesic, since a “light” geodesic must also satisfy the con-
straint gµν(x) dx
µ dxν = 0, ensuring that its speed is equal to c, when it is
measured in a locally inertial reference frame.) Starting from Equation (7) one
can therefore calculate to what extent light is deflected when it passes through
the neighborhood of a large mass (such as the Sun). One nevertheless soon
realizes that in order to perform this calculation one must know more than the
component h00 of the gravitational field. The other components of hµν , and in
particular the spatial components hij , come into play in a crucial way in this
calculation. This is why it was only in November of 1915, after having obtained
the (essentially) final form of his theory, that Einstein could predict the total
value of the deflection of light by the Sun. Starting from the linearized form of
Einstein’s equations (11) and continuing by making the “non-relativistic” sim-
plifications indicated above (Tij ≪ T0i ≪ T00, ∂0 h≪ ∂i h) it is easy to see that
the spatial component hij , like h00, can be written (after a helpful choice of
coordinates) in terms of the Newtonian potential U as hij(x) ≃ +2U(x) δij/c2,
where δij takes the value 1 if i = j and 0 otherwise (i, j = 1, 2, 3). By inserting
this result, as well as the preceding result h00 = +2U/c
2, into the geodesic
equation (7) for the motion of light, one finds (as Einstein did in 1915) that
general relativity predicts that the Sun should deflect a ray of light by an angle
θ = 4GM/(c2b) where b is the impact parameter of the ray (meaning its mini-
mum distance from the Sun). As is well known, the confirmation of this effect
in 1919 (with rather weak precision) made the theory of general relativity and
its creator famous.
7 The Post-Newtonian Approximation and Ex-
perimental Confirmations in the Regime of
Weak and Quasi-Stationary Gravitational Fields
We have already pointed out some of the experimental confirmations of the
theory of general relativity. At present, the extreme precision of certain mea-
surements of time or frequency in the solar system necessitates a very careful
account of the modifications brought by general relativity to the Newtonian de-
scription of space-time. As a consequence, general relativity is used in a great
number of situations, from astronomical or geophysical research (such as very
long range radio interferometry, radar tracking of the planets, and laser tracking
of the Moon or artificial satellites) to metrological, geodesic or other applica-
tions (such as the definition of international atomic time, precision cartography,
and the G.P.S.). To do this, the so-called post-Newtonian approximation has
been developed. This method involves working in the Newtonian limit sketched
above while keeping the terms of higher order in the small parameter
ε ∼ v
∼ |hµν | ∼ |∂0 h/∂i h|2 ∼ |T 0i/T 00|2 ∼ |T ij/T 00| ,
where v denotes a characteristic speed for the elements in the system considered.
For all present applications of general relativity to the solar system it suffices
to include the first post-Newtonian approximation, in other words to keep the
relative corrections of order ε to the Newtonian predictions. Since the theory of
general relativity was poorly verified for a long time, one found it useful (as in
the pioneering work of A. Eddington, generalized in the 1960s by K. Nordtvedt
and C.M. Will) to study not only the precise predictions of the equations (9)
defining Einstein’s theory, but to also consider possible deviations from these
predictions. These possible deviations were parameterized by means of several
non-dimensional “post-Newtonian” parameters. Among these parameters, two
play a key role: γ and β. The parameter γ describes a possible deviation from
general relativity that comes into play starting at the linearized level, in other
words one that modifies the linearized approximation given above. More pre-
cisely, it is defined by writing that the difference hij ≡ gij − δij between the
spatial metric and the Euclidean metric can take the value hij = 2γ U δij/c
2 (in a
suitable coordinate system), rather than the value hGRij = 2U δij/c
2 that it takes
in general relativity, thus differing by a factor γ. Therefore, by definition γ takes
the value 1 in general relativity, and γ− 1 measures the possible deviation with
respect to this theory. As for the parameter β (or rather β−1), it measures a pos-
sible deviation (with respect to general relativity) in the value of h00 ≡ g00−η00.
The value of h00 in general relativity is h
00 = 2U/c
2 − 2U2/c4, where the first
term (discussed above) reproduces the Newtonian approximation (and cannot
therefore be modified, as the idea is to parameterize gravitational physics be-
yond Newtonian predictions) and where the second term is obtained by solving
Einstein’s equations (9) at the second order of approximation. One then writes
an h00 of a more general parameterized type, h00 = 2U/c
2 − 2 β U2/c4, where,
by definition, β takes the value 1 in general relativity. Let us finally point out
that the parameters γ−1 and β−1 completely parameterize the post-Newtonian
regime of the simplest theoretical alternatives to general relativity, namely the
tensor-scalar theories of gravitation. In these theories, the gravitational inter-
action is carried by two fields at the same time: a massless tensor (spin 2) field
coupled to T µν , and a massless scalar (spin 0) field ϕ coupled to the trace Tαα .
In this case the parameter −(γ − 1) plays the key role of measuring the ratio
between the scalar coupling and the tensor coupling.
All of the experiments performed to date within the solar system are com-
patible with the predictions of general relativity. When they are interpreted
in terms of the post-Newtonian (and “post-Einsteinian”) parameters γ − 1 and
β−1, they lead to strong constraints on possible deviations from Einstein’s the-
ory. We make note of the following among tests performed in the solar system:
the deflection of electromagnetic waves in the neighborhood of the Sun, the grav-
itational delay (‘Shapiro effect’) of radar signals bounced from the Viking lander
on Mars, the global analysis of solar system dynamics (including the advance of
planetary perihelia), the sub-centimeter measurement of the Earth-Moon dis-
tance obtained from laser signals bounced off of reflectors on the Moon’s surface,
etc. At present (October of 2006) the most precise test (that has been published)
of general relativity was obtained in 2003 by measuring the ratio 1 + y ≡ f/f0
between the frequency f0 of radio waves sent from Earth to the Cassini space
probe and the frequency f of coherent radio waves sent back (with the same
local frequency) from Cassini to Earth and compared (on Earth) to the emitted
frequency f0. The main contribution to the small quantity y is an effect equal, in
general relativity, to yGR = 8(GM/c
3 b) db/dt (where b is, as before, the impact
parameter) due to the propagation of radio waves in the geometry of a space-time
deformed by the Sun: ds2 ≃ −(1−2U/c2) c2 dt2+(1+2U/c2)(dx2+dy2+dz2),
where U = GM/r. The maximum value of the frequency change predicted
by general relativity was only |yGR| . 2 × 10−10 for the best observations,
but thanks to an excellent frequency stability ∼ 10−14 (after correction for the
perturbations caused by the solar corona) and to a relatively large number of
individual measurements spread over 18 days, this experiment was able to verify
Einstein’s theory at the remarkable level of ∼ 10−5 [7]. More precisely, when
this experiment is interpreted in terms of the post-Newtonian parameters γ− 1
and β − 1, it gives the following limit for the parameter γ − 1 [7]
γ − 1 = (2.1± 2.3)× 10−5 . (15)
As for the best present-day limit on the parameter β−1, it is smaller than 10−3
and comes from the non-observation, in the data from lasers bounced off of the
Moon, of any eventual polarization of the Moon’s orbit in the direction of the
Sun (‘Nordtvedt effect’; see [4] for references)
4(β − 1)− (γ − 1) = −0.0007± 0.0010 . (16)
Although the theory of general relativity is one of the best verified theories in
physics, scientists continue to design and plan new or increasingly precise tests of
the theory. This is the case in particular for the space mission Gravity Probe B
(launched by NASA in April of 2004) whose principal aim is to directly observe a
prediction of general relativity that states (intuitively speaking) that space is not
only “elastic,” but also “fluid.” In the nineteenth century Foucalt invented both
the gyroscope and his famous pendulum in order to render Newton’s absolute
(and rigid) space directly observable. His experiments in fact showed that,
for example, a gyroscope on the surface of the Earth continued, despite the
Earth’s rotation, to align itself in a direction that is “fixed” with respect to the
distant stars. However, in 1918, when Lense and Thirring analyzed some of the
consequences of the (linearized) Einstein equations (11), they found that general
relativity predicts, among other things, the following phenomenon: the rotation
of the Earth (or any other ball of matter) creates a particular deformation of the
chrono-geometry of space-time. This deformation is described by the “gravito-
magnetic” components h0i of the metric, and induces an effect analogous to
the “rotation drag” effect caused by a ball of matter turning in a fluid: the
rotation of the Earth (minimally) drags all of the space around it, causing it to
continually “turn,” as a fluid would.3 This “rotation of space” translates, in an
observable way, into a violation of the effects predicted by Newton and confirmed
by Foucault’s experiments: in particular, a gyroscope no longer aligns itself in a
direction that is “fixed in absolute space,” rather its axis of rotation is “dragged”
by the rotating motion of the local space where it is located. This effect is much
too small to be visible in Foucalt’s experiments. Its observation by Gravity
Probe B (see [8] and the contribution of John Mester to this Poincaré seminar)
is important for making Einstein’s revolutionary notion of a fluid space-time
tangible to the general public.
Up till now we have only discussed the regime of weak and slowly varying
gravitational fields. The theory of general relativity predicts the appearance
of new phenomena when the gravitational field becomes strong and/or rapidly
varying. (We shall not here discuss the cosmological aspects of relativistic grav-
itation.)
8 Strong Gravitational Fields and Black Holes
The regime of strong gravitational fields is encountered in the physics of grav-
itationally condensed bodies. This term designates the final states of stellar
evolution, and in particular neutron stars and black holes. Recall that most of
the life of a star is spent slowly burning its nuclear fuel. This process causes
the star to be structured as a series of layers of differentiated nuclear structure,
surrounding a progressively denser core (an “onion-like” structure). When the
initial mass of the star is sufficiently large, this process ends into a catastrophic
phenomenon: the core, already much denser than ordinary matter, collapses in
on itself under the influence of its gravitational self-attraction. (This implosion
of the central part of the star is, in many cases, accompanied by an explosion
of the outer layers of the star—a supernova.) Depending on the quantity of
mass that collapses with the core of a star, this collapse can give rise to either
a neutron star or a black hole.
A neutron star condenses a mass on the order of the mass of the Sun inside
a radius on the order of 10 km. The density in the interior of a neutron star
(named thus because neutrons dominate its nuclear composition) is more than
100 million tons per cubic centimeter (1014 g/cm3)! It is about the same as the
density in the interior of atomic nuclei. What is important for our discussion is
that the deformation away from the Minkowski metric in the immediate neigh-
borhood of a neutron star, measured by h00 ∼ hii ∼ 2GM/c2R, where R is the
radius of the star, is no longer a small quantity, as it was in the solar system.
In fact, while h ∼ 2GM/c2R is on the order of 10−9 for the Earth and 10−6 for
the Sun, one finds that h ∼ 0.4 for a typical neutron star (M ≃ 1.4M⊙, R ∼ 10
3Recent historical work (by Herbert Pfister) has in fact shown that this effect had already
been derived by Einstein within the framework of the provisory relativistic theory of gravity
that he started to develop in 1912 in collaboration with Marcel Grossmann.
km). One thus concludes that it is no longer possible, as was the case in the
solar system, to study the structure and physics of neutron stars by using the
post-Newtonian approximation outlined above. One must consider the exact
form of Einstein’s equations (9), with all of their non-linear structure. Because
of this, we expect that observations concerning neutron stars will allow us to
confirm (or refute) the theory of general relativity in its strongly non-linear
regime. We shall discuss such tests below in relation to observations of binary
pulsars.
A black hole is the result of a continued collapse, meaning that it does
not stop with the formation of an ultra-dense star (such as a neutron star).
(The physical concept of a black hole was introduced by J.R. Oppenheimer and
H. Snyder in 1939. The global geometric structure of black holes was not un-
derstood until some years later, thanks notably to the work of R. Penrose. For
a historical review of the idea of black holes see [9].) It is a particular structure
of curved space-time characterized by the existence of a boundary (called the
“black hole surface” or “horizon”) between an exterior region, from which it is
possible to emit signals to infinity, and an interior region (of space-time), within
which any emitted signal remains trapped. See Figure 3.
r = 0 SINGULARITY
r = 2M
HORIZON
FLASH
OF LIGHT
EMITTED
FROM CENTER
COLLAPSING
space
Figure 3: Schematic representation of the space-time for a black hole created
from the collapse of a spherical star. Each cone represents the space-time history
of a flash of light emitted from a point at a particular instant. (Such a “cone
field” is obtained by taking the limit ε2 = 0 from Figure 2, and keeping only
the upper part, in other words the part directed towards the future, of the
double cones obtained as limits of the hyperboloids of Figure 2.) The interior
of the black hole is shaded, its outer boundary being the “black hole surface”
or “horizon.” The “inner boundary” (shown in dark grey) of the interior region
of the black hole is a space-time singularity of the big-crunch type.
The cones shown in this figure are called “light cones.” They are defined as
the locus of points (infinitesimally close to x) such that ds2 = 0, with dx0 =
cdt ≥ 0. Each represents the beginning of the space-time history of a flash of
light emitted from a certain point in space-time. The cones whose vertices are
located outside of the horizon (the shaded zone) will evolve by spreading out
to infinity, thus representing the possibility for electromagnetic signals to reach
infinity.
On the other hand, the cones whose vertices are located inside the horizon
(the grey zone) will evolve without ever succeeding in escaping the grey zone.
It is therefore impossible to emit an electromagnetic signal that reaches infinity
from the grey zone. The horizon, namely the boundary between the shaded
zone and the unshaded zone, is itself the history of a particular flash of light,
emitted from the center of the star over the course of its collapse, such that
it asymptotically stabilizes as a space-time cylinder. This space-time cylinder
(the asymptotic horizon) therefore represents the space-time history of a bubble
of light that, viewed locally, moves outward at the speed c, but which globally
“runs in place.” This remarkable behavior is a striking illustration of the “fluid”
character of space-time in Einstein’s theory. Indeed, one can compare the pre-
ceding situation with what may take place around the open drain of an emptying
sink: a wave may move along the water, away from the hole, all the while run-
ning in place with respect to the sink because of the falling motion of the water
in the direction of the drain.
Note that the temporal development of the interior region is limited, ter-
minating in a singularity (the dark gray surface) where the curvature becomes
infinite and where the classical description of space and time loses its meaning.
This singularity is locally similar to the temporal inverse of a cosmological sin-
gularity of the big bang type. This is called a big crunch. It is a space-time
frontier, beyond which space-time ceases to exist. The appearance of singulari-
ties associated with regions of strong gravitational fields is a generic phenomenon
in general relativity, as shown by theorems of R. Penrose and S.W. Hawking.
Black holes have some remarkable properties. First, a uniqueness theorem
(due to W. Israel, B. Carter, D.C. Robinson, G. Bunting, and P.O. Mazur)
asserts that an isolated, stationary black hole (in Einstein-Maxwell theory) is
completely described by three parameters: its mass M , its angular momen-
tum J , and its electric charge Q. The exact solution (called the Kerr-Newman
solution) of Einstein’s equations (11) describing a black hole with parameters
M,J,Q is explicitly known. We shall here content ourselves with writing the
space-time geometry in the simplest case of a black hole: the one in which
J = Q = 0 and the black hole is described only by its mass (a solution discov-
ered by K. Schwarzschild in January of 1916):
ds2 = −
1− 2GM
c2 dt2 +
1− 2GM
+ r2(dθ2 + sin2 θ dϕ2) . (17)
We see that the purely temporal component of the metric, g00 = −(1−2GM/c2r),
vanishes when the radial coordinate r takes the value r = rH ≡ 2GM/c2. Ac-
cording to the earlier equation (14), it would therefore seem that light emitted
from an arbitrary point on the sphere r0 = rH , when it is viewed by an observer
located anywhere in the exterior (in r > rH), would experience an infinite red-
dening of its emission frequency (ν/ν0 = 0). In fact, the sphere rH = 2GM/c
is the horizon of the Schwarzschild black hole, and no particle (that is capable of
emitting light) can remain at rest when r = rH (nor, a fortiori, when r < rH).
To study what happens at the horizon (r = rH) or in the interior (r < rH) of
a Schwarzschild black hole, one must use other space-time coordinates than the
coordinates (t, r, θ, ϕ) used in Equation (17). The “big crunch” singularity in
the interior of a Schwarzschild black hole, in the coordinates of (17), is located
at r = 0 (which does not describe, as one might believe, a point in space, but
rather an instant in time).
The space-time metric of a black hole space-time, such as Equation (17)
in the simple case J = Q = 0, allows one to study the influence of a black
hole on particles and fields in its neighborhood. One finds that a black hole
is a gravitational potential well that is so deep that any particle or wave that
penetrates the interior of the black hole (the region r < rH) will never be able
to come out again, and that the total energy of the particle or wave that “falls”
into the black hole ends up augmenting the total mass-energy M of the black
hole. By studying such black hole “accretion” processes with falling particles
(following R. Penrose), D. Christodoulou and R. Ruffini showed that a black hole
is not only a potential well, but also a physical object possessing a significant
free energy that it is possible, in principle, to extract. Such black hole energetics
is encapsulated in the “mass formula” of Christodoulou and Ruffini (in units
where c = 1)
Mirr +
4GMirr
4G2M2irr
, (18)
where Mirr denotes the irreducible mass of the black hole, a quantity that can
only grow, irreversibly. One deduces from (18) that a rotating (J 6= 0) and/or
charged (Q 6= 0) black hole possesses a free energy M − Mirr > 0 that can,
in principle, be extracted through processes that reduce its angular momentum
and/or its electric charge. Such black hole energy-extraction processes may lie
at the origin of certain ultra-energetic astrophysical phenomena (such as quasars
or gamma ray bursts). Let us note that, according to Equation (18), (rotating or
charged) black holes are the largest reservoirs of free energy in the Universe: in
fact, 29% of their mass energy can be stored in the form of rotational energy, and
up to 50% can be stored in the form of electric energy. These percentages are
much higher than the few percent of nuclear binding energy that is at the origin
of all the light emitted by stars over their lifetimes. Even though there is not, at
present, irrefutable proof of the existence of black holes in the universe, an entire
range of very strong presumptive evidence lends credence to their existence. In
particular, more than a dozen X-ray emitting binary systems in our galaxy
are most likely made up of a black hole and an ordinary star. Moreover, the
center of our galaxy seems to contain a very compact concentration of mass
∼ 3 × 106M⊙ that is probably a black hole. (For a review of the observational
data leading to these conclusions see, for example, Section 7.6 of the recent book
by N. Straumann [6].)
The fact that a quantity associated with a black hole, here the irreducible
mass Mirr, or, according to a more general result due to S.W. Hawking, the
total area A of the surface of a black hole (A = 16 πG2M2irr), can evolve only by
irreversibly growing is reminiscent of the second law of thermodynamics. This
result led J.D. Bekenstein to interpret the horizon area, A, as being propor-
tional to the entropy of the black hole. Such a thermodynamic interpretation
is reinforced by the study of the growth of A under the influence of external
perturbations, a growth that one can in fact attribute to some local dissipative
properties of the black hole surface, notably a surface viscosity and an electrical
resistivity equal to 377 ohm (as shown in work by T. Damour and R.L. Zna-
jek). These “thermodynamic” interpretations of black hole properties are based
on simple analogies at the level of classical physics, but a remarkable result by
Hawking showed that they have real content at the level of quantum physics.
In 1974, Hawking discovered that the presence of a horizon in a black hole
space-time affected the definition of a quantum particle, and caused a black
hole to continuously emit a flux of particles having the characteristic spectrum
(Planck spectrum) of thermal emission at the temperature T = 4 ~G∂M/∂A,
where ~ is the reduced Planck constant. By using the general thermodynamic
relation connecting the temperature to the energy E = M and the entropy S,
T = ∂M/∂S, we see from Hawking’s result (in conformity with Bekenstein’s
ideas) that a black hole possesses an entropy S equal (again with c = 1) to
. (19)
The Bekenstein-Hawking formula (19) suggests an unexpected, and perhaps pro-
found, connection between gravitation, thermodynamics, and quantum theory.
See Section 11 below.
9 Binary Pulsars and Experimental Confirma-
tions in the Regime of Strong and Radiating
Gravitational Fields
Binary pulsars are binary systems made up of a pulsar (a rapidly spinning
neutron star) and a very dense companion star (either a neutron star or a white
dwarf). The first system of this type (called PSR B1913+16) was discovered by
R.A. Hulse and J.H. Taylor in 1974 [10]. Today, a dozen are known. Some of
these (including the first-discovered PSR B1913+16) have revealed themselves
to be remarkable probes of relativistic gravitation and, in particular, of the
regime of strong and/or radiating gravitational fields. The reason for which a
binary pulsar allows for the probing of strong gravitational fields is that, as we
have already indicated above, the deformation of the space-time geometry in the
neighborhood of a neutron star is no longer a small quantity, as it is in the solar
system. Rather, it is on the order of unity: hµν ≡ gµν − ηµν ∼ 2GM/c2R ∼ 0.4.
(We note that this value is only 2.5 times smaller than in the extreme case of a
black hole, for which 2GM/c2R = 1.) Moreover, the fact that the gravitational
interaction propagates at the speed of light (as indicated by the presence of
the wave operator, � = ∆ − c−2∂2/∂t2 in (11)) between the pulsar and its
companion is found to play an observationally significant role for certain binary
pulsars.
Let us outline how the observational data from binary pulsars are used to
probe the regime of strong (hµν on the order of unity) and/or radiative (effects
propagating at the speed c) gravitational fields. (For more details on the obser-
vational data from binary pulsars and their use in probing relativistic gravita-
tion, see Michael Kramer’s contribution to this Poincaré seminar.) Essentially,
a pulsar plays the role of an extremely stable clock. Indeed, the “pulsar phe-
nomenon” is due to the rotation of a bundle of electromagnetic waves, created
in the neighborhood of the two magnetic poles of a strongly magnetized neutron
star (with a magnetic field on the order of 1012 Gauss, 1012 times the size of
the terrestrial magnetic field). Since the magnetic axis of a pulsar is not aligned
with its axis of rotation, the rapid rotation of the pulsar causes the (inner)
magnetosphere as a whole to rotate, and likewise the bundle of electromagnetic
waves created near the magnetic poles. The pulsar is therefore analogous to a
lighthouse that sweeps out space with two bundles (one per pole) of electromag-
netic waves. Just as for a lighthouse, one does not see the pulsar from Earth
except when the bundle sweeps the Earth, thus causing a flash of electromag-
netic noise with each turn of the pulsar around itself (in some cases, one even
sees a secondary flash, due to emission from the second pole, after each half-
turn). One can then measure the time of arrival at Earth of (the center of) each
flash of electromagnetic noise. The basic observational data of a pulsar are thus
made up of a regular, discrete sequence of the arrival times at Earth of these
flashes or “pulses.” This sequence is analogous to the signal from a clock: tick,
tick, tick, . . .. Observationally, one finds that some pulsars (and in particular
those that belong to binary systems) thus define clocks of a stability comparable
to the best atomic clocks [11]. In the case of a solitary pulsar, the sequence of
its arrival times is (in essence) a regular “arithmetic sequence,” TN = aN + b,
where N is an integer labelling the pulse considered, and where a is equal to the
period of rotation of the pulsar around itself. In the case of a binary pulsar, the
sequence of arrival times is a much richer signal, say TN = aN + b+∆N , where
∆N measures the deviation with respect to a regular arithmetic sequence. This
deviation (after the subtraction of effects not connected to the orbital period
of the pulsar) is due to a whole ensemble of physical effects connected to the
orbital motion of the pulsar around its companion or, more precisely, around
the center of mass of the binary system. Some of these effects could be pre-
dicted by a purely Keplerian description of the motion of the pulsar in space,
and are analogous to the “Rœmer effect” that allowed Rœmer to determine,
for the first time, the speed of light from the arrival times at Earth of light
signals coming from Jupiter’s satellites (the light signals coming from a body
moving in orbit are “delayed” by the time taken by light to cross this orbit and
arrive at Earth). Other effects can only be predicted and calculated by using a
relativistic description, either of the orbital motion of the pulsar, or of the prop-
agation of electromagnetic signals between the pulsar and Earth. For example,
the following facts must be accounted for: (i) the “pulsar clock” moves at a
large speed (on the order of 300 km/s ∼ 10−3c) and is embedded in the varying
gravitational potential of the companion; (ii) the orbit of the pulsar is not a
simple Keplerian ellipse, but (in general relativity) a more complicated orbit
that traces out a “rosette” around the center of mass; (iii) the propagation of
electromagnetic signals between the pulsar and Earth takes place in a space-time
that is curved by both the pulsar and its companion, which leads to particular
effects of relativistic delay; etc. Taking relativistic effects in the theoretical de-
scription of arrival times for signals emitted by binary pulsars into account thus
leads one to write what is called a timing formula. This timing formula (due to
T. Damour and N. Deruelle) in essence allows one to parameterize the sequence
of arrival times, TN = aN + b +∆N , in other words to parameterize ∆N , as a
function of a set of “phenomenological parameters” that include not only the
so-called “Keplerian” parameters (such as the orbital period P , the projection
of the semi-major axis of the pulsar’s orbit along the line of sight xA = aA sin i,
and the eccentricity e), but also the post-Keplerian parameters associated with
the relativistic effects mentioned above. For example, effect (i) discussed above
is parameterized by a quantity denoted γT ; effect (ii) by (among others) the
quantities ω̇, Ṗ ; effect (iii) by the quantities r, s; etc.
The way in which observations of binary pulsars allow one to test rela-
tivistic theories of gravity is therefore the following. A (least-squares) fit be-
tween the observational timing data, ∆obsN , and the parameterized theoreti-
cal timing formula, ∆thN (P, xA, e; γT , ω̇, Ṗ , r, s), allows for the determination of
the observational values of the Keplerian (P obs, xobsA , e
obs) and post-Keplerian
(γobsT , ω̇
obs, Ṗ obs, robs, sobs) parameters. The theory of general relativity pre-
dicts the value of each post-Keplerian parameter as a function of the Keple-
rian parameters and the two masses of the binary system (the mass mA of
the pulsar and the mass mB of the companion). For example, the theoretical
value predicted by general relativity for the parameter γT is γ
T (mA,mB) =
en−1(GMn/c3)2/3mB(mA + 2mB)/M
2, where e is the eccentricity, n = 2π/P
the orbital frequency, andM ≡ mA+mB. We thus see that, if one assumes that
general relativity is correct, the observational measurement of a post-Keplerian
parameter, for example γobsT , determines a curve in the plane (mA,mB) of the
two masses: γGRT (mA,mB) = γ
T , in our example. The measurement of two
post-Keplerian parameters thus gives two curves in the (mA,mB) plane and
generically allows one to determine the values of the two masses mA and mB,
by considering the intersection of the two curves. We obtain a test of general
relativity as soon as one observationally measures three or more post-Keplerian
parameters: if the three (or more) curves all intersect at one point in the plane
of the two masses, the theory of general relativity is confirmed, but if this is
not the case the theory is refuted. At present, four distinct binary pulsars have
allowed one to test general relativity. These four “relativistic” binary pulsars
are: the first binary pulsar PSR B1913+16, the pulsar PSR B1534+12 (dis-
covered by A. Wolszczan in 1991), and two recently discovered pulsars: PSR
J1141−6545 (discovered in 1999 by V.M. Kaspi et al., whose first timing results
are due to M. Bailes et al. in 2003), and PSR J0737−3039 (discovered in 2003
by M. Burgay et al., whose first timing results are due to A.G. Lyne et al. and
M. Kramer et al.). With the exception of PSR J1141−6545, whose companion
is a white dwarf, the companions of the pulsars are neutron stars. In the case
of PSR J0737−3039 the companion turns out to also be a pulsar that is visible
from Earth.
In the system PSR B1913+16, three post-Keplerian parameters have been
measured (ω̇, γT , Ṗ ), which gives one test of the theory. In the system PSR
J1141−65, three post-Keplerian parameters have been measured (ω̇, γT , Ṗ ), which
gives one test of the theory. (The parameter s is also measured through scin-
tillation phenomena, but the use of this measurement for testing gravitation is
more problematic.) In the system PSR B1534+12, five post-Keplerian param-
eters have been measured, which gives three tests of the theory. In the system
PSR J0737−3039,six post-Keplerian parameters,4 which gives four tests of the
theory. It is remarkable that all of these tests have confirmed general relativ-
ity. See Figure 4 and, for references and details, [4, 11, 12, 13], as well as the
contribution by Michael Kramer.
Note that, in Figure 4, some post-Keplerian parameters are measured with
such great precision that they in fact define very thin curves in the mA,mB
plane. On the other hand, some of them are only measured with a rough
fractional precision and thus define “thick curves,” or “strips” in the plane of
the masses (see, for example, the strips associated with Ṗ , r and s in the case
of PSR B1534+12). In any case, the theory is confirmed when all of the strips
(thick or thin) have a non-empty common intersection. (One should also note
that the strips represented in Figure 4 only use the “one sigma” error bars, in
other words a 68% level of confidence. Therefore, the fact that the Ṗ strip for
PSR B1534+12 is a little bit disjoint from the intersection of the other strips is
not significant: a “two sigma” figure would show excellent agreement between
observation and general relativity.)
In view of the arguments presented above, all of the tests shown in Figure 4
confirm the validity of general relativity in the regime of strong gravitational
fields (hµν ∼ 1). Moreover, the four tests that use measurements of the pa-
rameter Ṗ (in the four corresponding systems) are direct experimental confir-
mations of the fact that the gravitational interaction propagates at the speed
c between the companion and the pulsar. In fact, Ṗ denotes the long-term
variation 〈dP/dt〉 of the orbital period. Detailed theoretical calculations of the
motion of two gravitationally condensed objects in general relativity, that take
into account the effects connected to the propagation of the gravitational inter-
action at finite speed[14], have shown that one of the observable effects of this
propagation is a long-term decrease in the orbital period given by the formula
ṖGR(mA,mB) = −
192 π
1 + 73
e2 + 37
(1− e2)7/2
4In the case of PSR J0737−3039, one of the six measured parameters is the ratio xA/xB
between a Keplerian parameter of the pulsar and its analog for the companion, which turns
out to also be a pulsar.
s ≤ 1
0 0.5 1 1.5 2 2.5
PSR J1141−6545
intersection
0 0.5 1 1.5 2 2.5
PSR B1534+12
intersection
0 0.5 1 1.5 2 2.5
PSR J0737−3039
intersection
0 0.5 1 1.5 2 2.5
2.5 ω
s ≤ 1
PSR B1913+16
intersection
Figure 4: Tests of general relativity obtained from observations of four binary
pulsars. For each binary pulsar one has traced the “curves,” in the plane of
the two masses (mA = mass of the pulsar, mB = mass of the companion),
defined by equating the theoretical expressions for the various post-Keplerian
parameters, as predicted by general relativity, to their observational value, de-
termined through a least-squares fit to the parameterized theoretical timing
formula. Each “curve” is in fact a “strip,” whose thickness is given by the
(one sigma) precision with which the corresponding post-Keplerian parameter
is measured. For some parameters, these strips are too thin to be visible. The
grey zones would correspond to a sine for the angle of inclination of the or-
bital plane with respect to the plane of the sky that is greater than 1, and are
therefore physically excluded.
The direct physical origin of this decrease in the orbital period lies in the mod-
ification, produced by general relativity, of the usual Newtonian law of gravi-
tational attraction between two bodies, FNewton = GmAmB/r
AB. In place of
such a simple law, general relativity predicts a more complicated force law that
can be expanded in the symbolic form
FEinstein =
GmAmB
+ · · ·
, (20)
where, for example, “v2/c2” represents a whole set of terms of order v2A/c
v2B/c
2, vA vB/c
2, or even GmA/c
2 r or GmB/c
2 r. Here vA denotes the speed
of body A, vB that of body B, and rAB the distance between the two bod-
ies. The term of order v5/c5 in Equation (20) is particularly important. This
term is a direct consequence of the finite-speed propagation of the gravitational
interaction between A and B, and its calculation shows that it contains a com-
ponent that is opposed to the relative speed vA − vB of the two bodies and
that, consequently, slows down the orbital motion of each body, causing it to
evolve towards an orbit that lies closer to its companion (and therefore has a
shorter orbital period). This “braking” term (which is correlated with the emis-
sion of gravitational waves), δFEinstein ∼ v5/c5 FNewton, leads to a long-term
decrease in the orbital period ṖGR ∼ −(v/c)5 ∼ −10−12 that is very small, but
whose reality has been verified with a fractional precision of order 10−3 in PSR
B1913+16 and of order 20% in PSR B1534+12 and PSR J1141−6545 [4, 11, 13].
To conclude this brief outline of the tests of relativistic gravitation by binary
pulsars, let us note that there is an analog, for the regime of strong gravitational
fields, of the formalism of parametrization for possible deviations from general
relativity mentioned in Section 6 in the framework of weak gravitational fields
(using the post-Newtonian parameters γ−1 and β−1). This analog is obtained
by considering a two-parameter family of relativistic theories of gravitation,
assuming that the gravitational interaction is propagated not only by a tensor
field gµν but also by a scalar field ϕ. Such a class of tensor-scalar theories
of gravitation allows for a description of possible deviations in both the solar
system and in binary pulsars. It also allows one to explicitly demonstrate that
binary pulsars indeed test the effects of strong fields that go beyond the tests
of the weak fields of the solar system by exhibiting classes of theories that
are compatible with all of the observations in the solar system but that are
incompatible with the observations of binary pulsars, see [4, 13].
10 Gravitational Waves: Propagation, Genera-
tion, and Detection
As soon as he had finished constructing the theory of general relativity, Ein-
stein realized that it implied the existence of waves of geometric deformations
of space-time, or “gravitational waves” [15, 2]. Mathematically, these waves are
analogs (with the replacement Aµ → hµν) of electromagnetic waves, but concep-
tually they signify something remarkable: they exemplify, in the purest possible
way, the “elastic” nature of space-time in general relativity. Before Einstein
space-time was a rigid structure, given a priori, which was not influenced by
the material content of the Universe. After Einstein, a distribution of matter
(or more generally of mass-energy) that changes over the course of time, let us
say for concreteness a binary system of two neutron stars or two black holes,
will not only deform the chrono-geometry of the space-time in its immediate
neighborhood, but this deformation will propagate in every possible direction
away from the system considered, and will travel out to infinity in the form of a
wave whose oscillations will reflect the temporal variations of the matter distri-
bution. We therefore see that the study of these gravitational waves poses three
separate problems: that of generation, that of propagation, and, finally, that of
detection of such gravitational radiation. These three problems are at present
being actively studied, since it is hoped that we will soon detect gravitational
waves, and thus will be able to obtain new information about the Universe [16].
We shall here content ourselves with an elementary introduction to this field
of research. For a more detailed introduction to the detection of gravitational
waves see the contribution by Jean-Yves Vinet to this Poincaré seminar.
Let us first consider the simplest case of very weak gravitational waves,
outside of their material sources. The geometry of such a space-time can be
written, as in Section 6, as gµν(x) = ηµν+hµν(x), where hµν ≪ 1. At first order
in h, and outside of the source (namely in the domain where Tµν(x) = 0), the
perturbation of the geometry, hµν(x), satisfies a homogeneous equation obtained
by replacing the right-hand side of Equation (11) with zero. It can be shown that
one can simplify this equation through a suitable choice of coordinate system.
In a transverse traceless (TT) coordinate system the only non-zero components
of a general gravitational wave are the spatial components hTTij , i, j = 1, 2, 3 (in
other words hTT00 = 0 = h
0i ), and these components satisfy
� hTTij = 0 , ∂j h
ij = 0 , h
jj = 0 . (21)
The first equation in (21), where the wave operator � = ∆ − c−2 ∂2t appears,
shows that gravitational waves (like electromagnetic waves) propagate at the
speed c. If we consider for simplicity a monochromatic plane wave (hTTij =
ζij exp(ik ·x− i ω t)+ complex conjugate, with ω = c |k|), the second equation
in (21) shows that the (complex) tensor ζij measuring the polarization of a
gravitational wave only has non-zero components in the plane orthogonal to
the wave’s direction of propagation: ζij k
j = 0. Finally, the third equation
in (21) shows that the polarization tensor ζij has vanishing trace: ζjj = 0.
More concretely, this means that if a gravitational wave propagates in the z-
direction, its polarization is described by a 2 × 2 matrix,
ζxx ζxy
ζyx ζyy
, which
is symmetric and traceless. Such a polarization matrix therefore only contains
two independent (complex) components: ζ+ ≡ ζxx = −ζyy, and ζ× ≡ ζxy =
ζyx. This is the same number of independent (complex) components that an
electromagnetic wave has. Indeed, in a transverse gauge, an electromagnetic
wave only has spatial components ATi that satisfy
�ATi = 0 , ∂j A
j = 0 . (22)
As in the case above, the first equation (22) means that an electromagnetic wave
propagates at the speed c, and the second equation shows that a monochromatic
plane electromagnetic wave (ATi = ζi exp(ik · x− i ω t)+ c.c., ω = c |k|) is de-
scribed by a (complex) polarization vector ζi that is orthogonal to the direction
of propagation: ζj k
j = 0. For a wave propagating in the z-direction such a
vector only has two independent (complex) components, ζx and ζy. It is in-
deed the same number of components that a gravitational wave has, but we
see that the two quantities measuring the polarization of a gravitational wave,
ζ+ = ζxx = −ζyy, ζ× = ζxy = ζyx are mathematically quite different from
the two quantities ζx, ζy measuring the polarization of an electromagnetic wave.
However, see Section 11 below.
We have here discussed the propagation of a gravitational wave in a back-
ground space-time described by the Minkowski metric ηµν . One can also con-
sider the propagation of a wave in a curved background space-time, namely by
studying solutions of Einstein’s equations (9) of the form gµν(x) = g
µν(x) +
hµν(x) where hµν is not only small, but varies on temporal and spatial scales
much shorter than those of the background metric gBµν(x). Such a study is nec-
essary, for example, for understanding the propagation of gravitational waves
in the cosmological Universe.
The problem of generation consists in searching for the connection between
the tensorial amplitude hTTij of the gravitational radiation in the radiation zone
and the motion and structure of the source. If one considers the simplest case of
a source that is sufficiently diffuse that it only creates waves that are everywhere
weak (gµν − ηµν = hµν ≪ 1), one can use the linearized approximation to Ein-
stein’s equations (9), namely Equations (11). One can solve Equations (11) by
the same technique that is used to solve Maxwell’s equations (12): one fixes the
coordinate system by imposing ∂α hαµ − 12 ∂µ h
α = 0 (analogous to the Lorentz
gauge condition ∂αAα = 0), then one inverts the wave operator by using re-
tarded potentials. Finally, one must study the asymptotic form, at infinity, of
the emitted wave, and write it in the reduced form of a transverse and traceless
amplitude hTTij satisfying Equations (21) (analogous to a transverse electromag-
netic wave ATi satisfying (22)). One then finds that, just as charge conservation
implies that there is no monopole type electro-magnetic radiation, but only
dipole or higher orders of polarity, the conservation of energy-momentum im-
plies the absence of monopole and dipole gravitational radiation. For a slowly
varying source (v/c≪ 1), the dominant gravitational radiation is of quadrupole
type. It is given, in the radiation zone, by an expression of the form
hTTij (t, r,n) ≃
[Iij(t− r/c)]TT . (23)
Here r denotes the distance to the center of mass of the source, Iij(t) ≡
d3x c−2
T 00(t,x)
xixj − 1
x2δij
is the quadrupole moment of the mass-energy distri-
bution, and the upper index TT denotes an algebraic projection operation for
the quadrupole tensor Iij (which is a 3 × 3 matrix) that only retains the part
orthogonal to the local direction of wave propagation ni ≡ xi/r with vanish-
ing trace (ITTij is therefore locally a (real) 2× 2 symmetric, traceless matrix of
the same type as ζij above). Formula (23) (which was in essence obtained by
Einstein in 1918 [15]) is only the first approximation to an expansion in powers
of v/c, where v designates an internal speed characteristic of the source. The
prospect of soon being able to detect gravitational waves has motivated theo-
rists to improve Formula (23): (i) by describing the terms of higher order in
v/c, up to a very high order, and (ii) by using new approximation methods that
allow one to treat sources containing regions of strong gravitational fields (such
as, for example, a binary system of two black holes or two neutron stars). See
below for the most recent results.
Finally, the problem of detection, of which the pioneer was Joseph Weber in
the 1960s, is at present giving rise to very active experimental research. The
principle behind any detector is that a gravitational wave of amplitude hTTij
induces a change in the distance L between two bodies on the order of δL ∼ hL
during its passage. One way of seeing this is to consider the action of a wave
hTTij on two free particles, at rest before the arrival of the wave at the positions
xi1 and x
2 respectively. As we have seen, each particle, in the presence of the
wave, will follow a geodesic motion in the geometry gµν = ηµν + hµν (with
h00 = h0i = 0 and hij = h
ij ). By writing out the geodesic equation, Equation
(7), one finds that it simply reduces (at first order in h) to d2xi/ds2 = 0.
Therefore, particles that are initially at rest (xi = const.) remain at rest in a
transverse and traceless system of coordinates! This does not however mean
that the gravitational wave has no observable effect. In fact, since the spatial
geometry is perturbed by the passage of the wave, gij(t,x) = δij + h
ij (t,x),
one finds that the physical distance between the two particles xi1, x
2 (which is
observable, for example, by measuring the time taken for light to make a round
trip between the two particles) varies, during the passage of the wave, according
to L2 = (δij + h
ij )(x
2 − xi1)(x
2 − x
The problem of detecting a gravitational wave thus leads to the problem of
detecting a small relative displacement δL/L ∼ h. By using Formula (23), one
finds that the order of magnitude of h, for known or hoped for astrophysical
sources (for example,a very close system of two neutron stars or two black holes),
situated at distances such that one may hope to see several events per year
(r & 600 million light-years), is in fact extremely small: h . 10−22 for signals
whose characteristic frequency is around 100 Hertz. Several types of detectors
have been developed since the pioneering work of J. Weber [16]. At present,
the detectors that should succeed in the near future at detecting amplitudes
h ∼ δL/L ∼ 10−22 are large interferometers, of the Michelson or Fabry-Pérot
type, having arms that are many kilometers in length into which a very powerful
monochromatic laser beam is injected. Such terrestrial interferometric detectors
presently exist in the U.S.A. (the LIGO detectors [17]), in Europe (the VIRGO
[18] and GEO 600 [19] detectors) and elsewhere (such as the TAMA detector
in Japan). Moreover, the international space project LISA [20], made up of
an interferometer between satellites that are several million kilometers apart,
should allow one to detect low frequency (∼ one hundredth or one thousandth
of a Hertz) gravitational waves in a dozen years or so. This collection of gravi-
tational wave detectors promises to bring invaluable information for astronomy
by opening a new “window” on the Universe that is much more transparent
than the various electromagnetic (or neutrino) windows that have so greatly
expanded our knowledge of the Universe in the twentieth century.
The extreme smallness of the expected gravitational signals has led a num-
ber of experimentalists to contribute, over many years, a wealth of ingenuity
and know-how in order to develop technology that is sufficiently precise and
trustworthy (see [17, 18, 19, 20]). To conclude, let us also mention how much
concerted theoretical effort has been made, both in calculating the general rel-
ativistic predictions for gravitational waves emitted by certain sources, and in
developing methods adapted to the extraction of the gravitational signal from
the background noise in the detectors. For example, one of the most promising
sources for terrestrial detectors is the wave train for gravitational waves emitted
by a system of two black holes, and in particular the final (most intense) portion
of this wave train, which is emitted during the last few orbits of the system and
the final coalescence of the two black holes into a single, more massive black
hole. We have seen above (see Section 9) that the finite speed of propagation
of the gravitational interaction between the two bodies of a binary system gives
rise to a progressive acceleration of the orbital frequency, connected to the pro-
gressive approach of the two bodies towards each other. Here we are speaking
of the final stages in such a process, where the two bodies are so close that they
orbit around each other in a spiral pattern that accelerates until they attain
(for the final “stable” orbits) speeds that become comparable to the speed of
light, all the while remaining slightly slower. In order to be able to determine,
with a precision that is acceptable for the needs of detection, the dynamics of
such a binary black hole system in such a situation, as well as the gravitational
amplitude hTTij that it emits, it was necessary to develop a whole ensemble of
analytic techniques to a very high level of precision. For example, it was neces-
sary to calculate the expansion (20) of the force determining the motion of the
two bodies to a very high order and also to calculate the amplitude hTTij of the
gravitational radiation emitted to infinity with a precision going well beyond the
quadrupole approximation (23). These calculations are comparable in complex-
ity to high-order calculations in quantum field theory. Some of the techniques
developed for quantum field theory indeed proved to be extremely useful for
these calculations in the (classical) theory of general relativity (such as certain
resummation methods and the mathematical use of analytic continuation in the
number of space-time dimensions). For an entryway into the literature of these
modern analytic methods, see [21], and for an early example of a result obtained
by such methods of direct interest for the physics of detection see Figure 5 [22],
which shows a component of the gravitational amplitude hTTij (t) emitted during
the final stages of evolution of a system of two black holes of equal mass. The
first oscillations shown in Figure 5 are emitted during the last quasi-circular
orbits (accelerated motion in a spiral of decreasing radius). The middle part
of the signal corresponds to a phase where, having moved past the last stable
orbit, the two black holes “fall” toward each other while spiraling rapidly. In
fact, contrarily to Newton’s theory, which predicts that two condensed bodies
would be able to orbit around each other with an orbit of arbitrarily small ra-
dius (basically up until the point that the two bodies touch), Einstein’s theory
predicts a modified law for the force between the two bodies, Equation (20),
whose analysis shows that it is so attractive that it no longer allows for sta-
ble circular orbits when the distance between the two bodies becomes smaller
than around 6G(mA +mB)/c
2. In the case of two black holes, this distance is
sufficiently larger than the black hole “radii” (2GmA/c
2 and 2GmB/c
2) that
one is still able to analytically treat the beginning of the “spiralling plunge” of
the two black holes towards each other. The final oscillations in Figure 5 are
emitted by the rotating (and initially highly deformed) black hole formed from
the merger of the two initial, separate black holes.
−200 −100 0 100
−0.48
−0.38
−0.28
−0.18
−0.08
inspiral + plunge
merger + ring−down
Figure 5: The gravitational amplitude h(t) emitted during the final stages of
evolution of a system of two equal-mass black holes. The beginning of the signal
(the left side of the figure), which is sinusoidal, corresponds to an inspiral motion
of two separate black holes (with decreasing distance); the middle corresponds
to a rapid “inspiralling plunge” of the two black holes towards each other; the
end (at right) corresponds to the oscillations of the final, rotating black hole
formed from the merger of the two initial black holes.
Up until quite recently the analytic predictions illustrated in Figure 5 con-
cerning the gravitational signal h(t) emitted by the spiralling plunge and merger
of two black holes remained conjectural, since they could be compared to neither
other theoretical predictions nor to observational data. Recently, worldwide ef-
forts made over three decades to attack the problem of the coalescence of two
black holes by numerically solving Einstein’s equations (9) have spectacularly
begun to bear fruit. Several groups have been able to numerically calculate
the signal h(t) emitted during the final orbits and merger of two black holes
[23]. In essence, there is good agreement between the analytical and numerical
predictions. In order to be able to detect the gravitational waves emitted by
the coalescence of two black holes, it will most likely be necessary to properly
combine the information on the structure of the signal h(t) obtained by the two
types of methods, which are in fact complementary.
11 General Relativity and Quantum Theory: From
Supergravity to String Theory
Up until now, we have discussed the classical theory of general relativity, ne-
glecting any quantum effects. What becomes of the theory in the quantum
regime? This apparently innocent question in fact opens up vast new prospects
that are still under construction. We will do nothing more here than to touch
upon the subject, by pointing out to the reader some of the paths along which
contemporary physics has been led by the challenge of unifying general relativity
and quantum theory. For a more complete introduction to the various possi-
bilities “beyond” general relativity suggested within the framework of string
theory (which is still under construction) one should consult the contribution of
Ignatios Antoniadis to this Poincaré Seminar.
Let us recall that, from the very beginning of the quasi-definitive formula-
tion of quantum theory (1925–1930), the creators of quantum mechanics (Born,
Heisenberg, Jordan; Dirac; Pauli; etc.) showed how to “quantize” not only
systems with several particles (such as an atom), but also fields, continuous dy-
namical systems whose classical description implies a continuous distribution of
energy and momentum in space. In particular, they showed how to quantize (or
in other words how to formulate within a framework compatible with quantum
theory) the electromagnetic field Aµ, which, as we have recalled above, satisfies
the Maxwell equations (12) at the classical level. They nevertheless ran into dif-
ficulty due to the following fact. In quantum theory, the physics of a system’s
evolution is essentially contained in the transition amplitudes A(f, i) between
an initial state labelled by i and a final state labelled by f . These amplitudes
A(f, i) are complex numbers. They satisfy a “transitivity” property of the type
A(f, i) =
A(f, n)A(n, i) , (24)
which contains a sum over all possible intermediate states, labelled by n (with
this sum becoming an integral when there is a continuum of intermediate pos-
sible states). R. Feynman used Equation (24) as a point of departure for a
new formulation of quantum theory, by interpreting it as an analog of Huy-
gens’ Principle: if one thinks of A(f, i) as the amplitude, “at the point f ,” of
a “wave” emitted “from the point i,” Equation (24) states that this amplitude
can be calculated by considering the “wave” emitted from i as passing through
all possible intermediate “points” n (A(n, i)), while reemitting “wavelets” start-
ing from these intermediate points (A(f, n)), which then superpose to form the
total wave arriving at the “final point f .”
Property (24) does not pose any problem in the quantum mechanics of dis-
crete systems (particle systems). It simply shows that the amplitude A(f, i)
behaves like a wave, and therefore must satisfy a “wave equation” (which is in-
deed the case for the Schrödinger equation describing the dependence of A(f, i)
on the parameters determining the final configuration f). On the other hand,
Property (24) poses formidable problems when one applies it to the quantiza-
tion of continuous dynamical systems (fields). In fact, for such systems the
“space” of intermediate possible states is infinitely larger than in the case of
the mechanics of discrete systems. Roughly speaking, the intermediate possible
states for a field can be described as containing ℓ = 1, 2, 3, . . . quantum excita-
tions of the field, with each quantum excitation (or pair of “virtual particles”)
being described essentially by a plane wave, ζ exp(i kµ x
µ), where ζ measures
the polarization of these virtual particles and kµ = ηµν kν , with k
0 = ω and
ki = k, their angular frequency and wave vector, or (using the Planck-Einstein-
de Broglie relations E = ~ω, p = ~k) their energy-momentum pµ = ~ kµ.
The quantum theory shows (basically because of the uncertainty principle) that
the four-frequencies (and four-momenta) pµ = ~ kµ of the intermediate states
cannot be constrained to satisfy the classical equation ηµν p
µ pν = −m2 (or in
other words E2 = p2 +m2 ; we use c = 1 in this section). As a consequence,
the sum over intermediate states for a quantum field theory has the following
properties (among others): (i) when ℓ = 1 (an intermediate state containing
only one pair of virtual particles, called a one-loop contribution), there is an in-
tegral over a four-momentum pµ,
d4p =
dp; (ii) when ℓ = 2 (two pairs
of virtual particles; a two-loop contribution), there is an integral over two four-
momenta p
1 , p
d4p1 d
4p2; etc. The delicate point comes from the fact that
the energy-momentum of an intermediate state can take arbitrarily high values.
This possibility is directly connected (through a Fourier transform) to the fact
that a field possesses an infinite number of degrees of freedom, corresponding
to configurations that vary over arbitrarily small time and length scales.
The problems posed by the necessity of integrating over the infinite domain
of four-momenta of intermediate virtual particles (or in other words of account-
ing for the fact that field configurations can vary over arbitrarily small scales)
appeared in the 1930s when the quantum theory of the electromagnetic field
Aµ (called quantum electrodynamics, or QED) was studied in detail. These
problems imposed themselves in the following form: when one calculates the
transition amplitude for given initial and final states (for example the collision
of two light quanta, with two photons entering and two photons leaving) by
using (24), one finds a result given in the form of a divergent integral, because
of the integral (in the one-loop approximation, ℓ = 1) over the arbitrarily large
energy-momentum describing virtual electron-positron pairs appearing as pos-
sible intermediate states. Little by little, theoretical physicists understood that
the types of divergent integrals appearing in QED were relatively benign and,
after the second world war, they developed a method (renormalization theory)
that allowed one to unambiguously isolate the infinite part of these integrals,
and to subtract them by expressing the amplitudes A(f, i) solely as a function of
observable quantities [24] (work by J. Schwinger, R. Feynman, F. Dyson etc.).
The preceding work led to the development of consistent quantum theories
not only for the electromagnetic field Aµ (QED), but also for generalizations of
electromagnetism (Yang-Mills theory or non-abelian gauge theory) that turned
out to provide excellent descriptions of the new interactions between elementary
particles discovered in the twentieth century (the electroweak theory, partially
unifying electromagnetism and weak nuclear interactions, and quantum chro-
modynamics, describing the strong nuclear interactions). All of these theories
give rise to only relatively benign divergences that can be “renormalized” and
thus allowed one to compute amplitudes A(f, i) corresponding to observable
physical processes [24] (notably, work by G. ’t Hooft and M. Veltman).
What happens when we use (24) to construct a “perturbative” quantum
theory of general relativity (namely one obtained by expanding in the number
ℓ of virtual particle pairs appearing in the intermediate states)? The answer is
that the integrals over the four-momenta of intermediate virtual particles are
not at all of the benign type that allowed them to be renormalized in the simpler
case of electromagnetism. The source of this difference is not accidental, but is
rather connected with the basic physics of relativistic gravitation. Indeed, as we
have mentioned, the virtual particles have arbitrarily large energies E. Because
of the basic relations that led Einstein to develop general relativity, namely
E = mi and mi = mg, one deduces that these virtual particles correspond to
arbitrarily large gravitational masses mg. They will therefore end up creating
intense gravitational effects that become more and more intense as the number
ℓ of virtual particle pairs grows. These gravitational interactions that grow
without limit with energy and momentum correspond (by Fourier transform) to
field configurations concentrated in arbitrarily small space and time scales. One
way of seeing why the quantum gravitational field creates much more violent
problems than the quantum electromagnetic field is, quite simply, to go back to
dimensional analysis. Simple considerations in fact show that the relative (non-
dimensional) one-loop amplitude A1 must be proportional to the product ~G
and must contain an integral
d4k. However, in 1900 Planck had noticed that
(in units where c = 1) the dimensions of ~ and G were such that the product
~G had the dimensions of length (or time) squared:
≃ 1.6× 10−33 cm, tP ≡
≃ 5.4× 10−44 s . (25)
One thus deduces that the integral
d4k f(k) must have the dimensions of a
squared frequency, and therefore that A1 must (when k → ∞) be of the type,
A1 ∼ ~G
d4k/k2. Such an integral diverges quadratically with the upper
limit Λ of the integral (the cutoff frequency, such that |k| ≤ Λ), so that A1 ∼
~GΛ2 ∼ t2P Λ2. The extension of this dimensional analysis to the intermediate
states with several loops (ℓ > 1) causes even more severe polynomial divergences
to appear, of a type such that the power of Λ that appears grows without limit
with ℓ.
In summary, the essential physical characteristics of gravitation (E = mi =
mg and the dimension of Newton’s constant G) imply the impossibility of gener-
alizing to the gravitational case the methods that allowed a satisfactory quantum
treatment of the other interactions (electromagnetic, weak, and strong). Several
paths have been explored to get out of this impasse. Some researchers tried to
quantize general relativity non-perturbatively, without using an expansion in
intermediate states (24) (work by A. Ashtekar, L. Smolin, and others). others
have tried to generalize general relativity by adding a fermionic field to Einstein’s
(bosonic) gravitational field gµν(x), the gravitino field ψµ(x). It is indeed re-
markable that it is possible to define a theory, known as supergravity, that gener-
alizes the geometric invariance of general relativity in a profound way. After the
1974 discovery (by J. Wess and B. Zumino) of a possible new global symmetry
for interacting bosonic and fermionic fields, supersymmetry (which is a sort of
global rotation transforming bosons to fermions and vice versa), D.Z. Freedman,
P. van Nieuwenhuizen, and S. Ferrara; and S. Deser and B. Zumino; showed that
one could generalize global supersymmetry to a local supersymmetry, meaning
that it varies from point to point in space-time. Local supersymmetry is a sort
of fermionic generalization (with anti-commuting parameters) of the geometric
invariance at the base of general relativity (the invariance under any change in
coordinates). The generalization of Einstein’s theory of gravitation that admits
such a local supersymmetry is called supergravity theory. As we have mentioned,
in four dimensions this theory contains, in addition to the (commuting) bosonic
field gµν(x), an (anti-commuting) fermionic field ψµ(x) that is both a space-
time vector (with index µ) and a spinor. (It is a massless field of spin 3/2,
intermediate between a massless spin 1 field like Aµ and a massless spin 2 field
like hµν = gµν − ηµν .) Supergravity was extended to richer fermionic struc-
tures (with many gravitinos), and was formulated in space-times having more
than four dimensions. It is nevertheless remarkable that there is a maximal
dimension, equal to D = 11, admitting a theory of supergravity (the maximal
supergravity constructed by E. Cremmer, B. Julia, and J. Scherk). The initial
hope underlying the construction of these supergravity theories was that they
would perhaps allow one to give meaning to the perturbative calculation (24)
of quantum amplitudes. Indeed, one finds for example that at one loop, ℓ = 1,
the contributions coming from intermediate fermionic states have a sign oppo-
site to the bosonic contributions and (because of the supersymmetry, bosons ↔
fermions) exactly cancel them. Unfortunately, although such cancellations exist
for the lowest orders of approximation, it appeared that this was probably not
going to be the case at all orders5. The fact that the gravitational interaction
constant G has “a bad dimension” remains true and creates non-renormalizable
divergences starting at a certain number of loops ℓ.
Meanwhile, a third way of defining a consistent quantum theory of gravity
was developed, under the name of string theory. Initially formulated as models
for the strong interactions (in particular by G. Veneziano, M. Virasoro, P. Ra-
mond, A. Neveu, and J.H. Schwarz), the string theories were founded upon the
quantization of the relativistic dynamics of an extended object of one spatial di-
mension: a “string.” This string could be closed in on itself, like a small rubber
band (a closed string), or it could have two ends (an open string). Note that
the point of departure of string theory only includes the Poincaré-Minkowski
space-time, in other words the metric ηµν of Equation (2), and quantum theory
(with the constant ~ = h/2π). In particular, the only symmetry manifest in the
classical dynamics of a string is the Poincaré group (3). It is, however, remark-
5Recent work by Z. Bern et al. and M. Green et al., has, however, suggested that such
cancellations take place at all orders for the case of maximal supergravity, dimensionally
reduced to D = 4 dimensions.
able that (as shown by T. Yoneya, and J. Scherk and J.H. Schwarz, in 1974) one
of the quantum excitations of a closed string reproduces, in a certain limit, all
of the non-linear structure of general relativity (see below). Among the other
remarkable properties of string theory [25], let us point out that it is the first
physical theory to determine the space-time dimension D. In fact, this theory
is only consistent if D = 10, for the versions allowing fermionic excitations (the
purely bosonic string theory selects D = 26). The fact that 10 > 4 does not
mean that this theory has no relevance to the real world. Indeed, it has been
known since the 1930s (from work of T. Kaluza and O. Klein) that a space-
time of dimension D > 4 is compatible with experiment if the supplementary
(spatial) dimensions close in on themselves (meaning they are compactified) on
very small distance scales. The low-energy physics of such a theory seems to
take place in a four-dimensional space-time, but it contains new (a priori mass-
less) fields connected to the geometry of the additional compactified dimensions.
Moreover, recent work (due in particular to I. Antoniadis, N. Arkani-Hamed,
S. Dimopoulos, and G. Dvali) has suggested the possibility that the additional
dimensions are compactified on scales that are small with respect to everyday
life, but very large with respect to the Planck length. This possibility opens
up an entire phenomenological field dealing with the eventual observation of
signals coming from string theory (see the contribution of I. Antoniadis to this
Poincaré seminar).
However, string theory’s most remarkable property is that it seems to avoid,
in a radical way, the problems of divergent (non-renormalizable) integrals that
have weighed down every direct attempt at perturbatively quantizing gravity.
In order to explain how string theory arrives at such a result, we must discuss
some elements of its formalism.
Recall that the classical dynamics of any system is obtained by minimizing a
functional of the time evolution of the system’s configuration, called the action
(the principle of least action). For example, the action for a particle of mass
m, moving in a Riemannian space-time (6), is proportional to the length of the
line that it traces in space-time: S = −m
ds. This action is minimized when
the particle follows a geodesic, in other words when its equation of motion is
given by (7). According to Y. Nambu and T. Goto, the action for a string is
S = −T
dA, where the parameter T (analogous to m for the particle) is
called the string tension, and where
dA is the area of the two-dimensional
surface traced out by the evolution of the string in the (D-dimensional) space-
time in which it lives. In quantum theory, the action functional serves (as
shown by R. Feynman) to define the transition amplitude (24). Basically, when
one considers two intermediate configurations m and n (in the sense of the
right-hand side of (24)) that are close to each other, the amplitude A(n,m) is
proportional to exp(i S(n,m)/~), where S(n,m) is the minimal classical action
such that the system considered evolves from the configuration labelled by n to
that labelled by m. Generalizing the decomposition in (24) by introducing an
infinite number of intermediate configurations that lie close to each other, one
ends up (in a generalization of Huygens’ principle) expressing the amplitude
A(f, i) as a multiple sum over all of the “paths” (in the configuration space of
the system studied) connecting the initial state i to the final state f . Each path
contributes a term eiφ where the phase φ = S/~ is proportional to the action
S corresponding to this “path,” or in other words to this possible evolution of
the system. In string theory, φ = −(T/~)
dA. Since the phase is a non-
dimensional quantity, and
dA has the dimension of an area, we see that the
quantum theory of strings brings in the quantity ~/T , having the dimensions
of a length squared, at a fundamental level. More precisely, the fundamental
length of string theory, ℓs, is defined by
ℓ2s ≡ α′ ≡
2 π T
. (26)
This fundamental length plays a central role in string theory. Roughly speak-
ing, it defines the characteristic “size” of the quantum states of a string. If ℓs
is much smaller than the observational resolution with which one studies the
string, the string will look like a point-like particle, and its interactions will be
described by a quantum theory of relativistic particles, which is equivalent to
a theory of relativistic fields. It is precisely in this sense that general relativity
emerges as a limit of string theory. Since this is an important conceptual point
for our story, let us give some details about the emergence of general relativity
from string theory.
The action functional that is used in practice to quantize a string is not
really −T
dA, but rather (as emphasized by A. Polyakov)
= − 1
4 π ℓ2s
−γ γab ∂aXµ ∂bXν ηµν + · · · , (27)
where σa, a = 0, 1 are two coordinates that allow an event to be located on
the space-time surface (or ‘world-sheet’) traced out by the string within the
ambient space-time; γab is an auxiliary metric (dΣ
2 = γab(σ) dσ
a dσb) defined
on this surface (with γab being its inverse, and γ its determinant); and Xµ(σa)
defines the embedding of the string in the ambient (flat) space-time. The dots
indicate additional terms, and in particular terms of fermionic type that were
introduced by P. Ramond, by A. Neveu and J.H. Schwarz, and by others. If
one separates the two coordinates σa = (σ0, σ1) into a temporal coordinate,
τ ≡ σ0, and a spatial coordinate, σ ≡ σ1, the configuration “at time τ” of the
string is described by the functions Xµ(τ, σ), where one can interpret σ as a
curvilinear abscissa describing the spatial extent of the string. If we consider
a closed string, one that is topologically equivalent to a circle, the function
Xµ(τ, σ) must be periodic in σ. One can show that (modulo the imposition of
certain constraints) one can choose the coordinates τ and σ on the string such
that dΣ2 = −dτ2+dσ2. Then, the dynamical equations for the string (obtained
by minimizing the action (27)) reduce to the standard equation for waves on
a string: −∂2Xµ/∂τ2 + ∂2Xµ/∂σ2 = 0. The general solution to this equation
describes a superposition of waves travelling along the string in both possible
directions: Xµ = X
L(τ+σ)+X
R(τ−σ). If we consider a closed string (one that
is topologically equivalent to a circle), these two types of wave are independent
of each other. For an open string (with certain reflection conditions at the
endpoints of the string) these two types of waves are connected to each other.
Moreover, since the string has a finite length in both cases, one can decompose
the left- or right-moving waves X
L(τ + σ) or X
R(τ − σ) as a Fourier series. For
example, for a closed string one may write
Xµ(τ, σ) = X
0 (τ) +
e−2in(τ−σ) +
ãµn√
e−2in(τ+σ)
+ h.c. (28)
Here X
0 (τ) = x
µ + 2 ℓ2s p
µτ describes the motion of the string’s center of mass,
and the remainder describes the decomposition of the motion around the center
of mass into a discrete set of oscillatory modes. Like any vibrating string, a rel-
ativistic string can vibrate in its fundamental mode (n = 1) or in a “harmonic”
of the fundamental mode (for an integer n > 1). In the classical case the com-
plex coefficients aµn, ã
n represent the (complex) amplitudes of vibration for the
modes of oscillation at frequency n times the fundamental frequency. (with aµn
corresponding to a wave travelling to the right, while ãµn corresponds to a wave
travelling to the left.) When one quantizes the string dynamics the position of
the string Xµ(τ, σ) becomes an operator (acting in the space of quantum states
of the system), and because of this the quantities xµ, pµ, aµn and ã
n in (28) be-
come operators. The notation h.c. signifies that one must add the hermitian
conjugates of the oscillation terms, which will contain the operators (aµn)
† and
(ãµn)
†. (The notation † indicates hermitian conjugation, in other words the oper-
ator analog of complex conjugation.) One then finds that the operators xµ and
pµ describing the motion of the center of mass satisfy the usual commutation re-
lations of a relativistic particle, [xµ, pµ] = i ~ ηµν , and that the operators aµn and
ãµn become annihilation operators, like those that appear in the quantum theory
of any vibrating system: [aµn, (a
†] = ~ ηµν δnm, [ã
n, (ã
†] = ~ ηµν δmn. In
the case of an open string, one only has one set of oscillators, let us say aµn.
The discussion up until now has neglected to mention that the oscillation am-
plitudes aµn, ã
n must satisfy an infinite number of constraints (connected with
the equation obtained by minimizing (27) with respect to the auxiliary metric
γab). One can satisfy these by expressing two of the space-time components of
the oscillators aµn, ã
n (for each n) as a function of the other. Because of this, the
physical states of the string are described by oscillators ain, ã
n where the index i
only takes D−2 values in a space-time of dimension D. Forgetting this subtlety
for the moment (which is nevertheless crucial physically), let us conclude this
discussion by summarizing the spectrum of a quantum string, or in other words
the ensemble of quantum states of motion for a string.
For an open string, the ensemble of quantum states describes the states of
motion (the momenta pµ) of an infinite collection of relativistic particles, having
squared massesM2 = −ηµν pµ pν equal to (N−1) m2s, whereN is a non-negative
integer andms ≡ ~/ℓs is the fundamental mass of string theory associated to the
fundamental length ℓs. For a closed string, one finds another “infinite tower”
of more and more massive particles, this time with M2 = 4(N − 1)m2s. In both
cases the integer N is given, as a function of the string’s oscillation amplitudes
(travelling to the right), by
n ηµν(a
† aνn . (29)
In the case of a closed string one must also satisfy the constraint N = Ñ where
Ñ is the operator obtained by replacing aµn by ã
n in (29).
The preceding result essentially states that the (quantized) internal energy
of an oscillating string defines the squared mass of the associated particle. The
presence of the additional term −1 in the formulae given above for M2 means
that the quantum state of minimum internal energy for a string, that is, the
“vacuum” state |0〉 where all oscillators are in their ground state, aµn | 0〉 = 0,
corresponds to a negative squared mass (M2 = −m2s for the open string and
M2 = −4m2s for the closed string). This unusual quantum state (a tachyon) cor-
responds to an instability of the theory of bosonic strings. It is absent from the
more sophisticated versions of string theory (“superstrings”) due to F. Gliozzi,
J. Scherk, and D. Olive, to M. Green and J.H. Schwarz, and to D. Gross and
collaborators. Let us concentrate on the other states (which are the only ones
that have corresponding states in superstring theory). One then finds that the
first possible physical quantum states (such that N = 1) describe some massless
particles. In relativistic quantum theory it is known that any particle is the
quantized excitation of a corresponding field. Therefore the massless particles
that appear in string theory must correspond to long-range fields. To know
which fields appear in this way one must more closely examine which possible
combinations of oscillator excitations a
1 , a
2 , a
3 , . . ., appearing in Formula (29),
can lead to N = 1. Because of the factor n in (29) multiplying the harmonic
contribution of order n to the mass squared, only the oscillators of the fun-
damental mode n = 1 can give N = 1. One then deduces that the internal
quantum states of massless particles appearing in the theory of open strings are
described by a string oscillation state of the form
† | 0〉 . (30)
On the other hand, because of the constraint N = Ñ = 1, the internal quantum
states of the massless particles appearing in the theory of closed strings are
described by a state of excitation containing both a left-moving oscillation and
a right-moving oscillation:
ζµν(a
† (ãν1)
† | 0〉 . (31)
In Equations (30) and (31) the state |0〉 denotes the ground state of all oscillators
(aµn | 0〉 = ãµn | 0〉 = 0).
The state (30) therefore describes a massless particle (with momentum sat-
isfying ηµν p
µ pν = 0), possessing an “internal structure” described by a vector
polarization ζµ. Here we recognize exactly the definition of a photon, the quan-
tum state associated with a wave Aµ(x) = ζµ exp(i kλ x
λ), where pµ = ~ kµ.
The theory of open strings therefore contains Maxwell’s theory. (One can also
show that, because of the constraints briefly mentioned above, the polarization
ζµ must be transverse, k
µ ζµ = 0, and that it is only defined up to a gauge
transformation: ζ′µ = ζµ + a kµ.) As for the state (31), this describes a massless
particle (ηµν p
µ pν = 0), possessing an “internal structure” described by a tensor
polarization ζµν . The plane wave associated with such a particle is therefore of
the form h̄µν(x) = ζµν exp(i kλ x
λ), where pµ = ~ kµ. As in the case of the open
string, one can show that ζµν must be transverse, ζµν k
ν = 0 and that it is only
defined up to a gauge transformation, ζ′µν = ζµν+kµ aν+kν bµ. We here see the
same type of structure appear that we had in general relativity for plane waves.
However, here we have a structure that is richer than that of general relativity.
Indeed, since the state (31) is obtained by combining two independent states of
oscillation, (a
† and (ã
†, the polarization tensor ζµν is not constrained to be
symmetric. Moreover it is not constrained to have vanishing trace. Therefore,
if we decompose ζµν into its possible irreducible parts (a symmetric traceless
part, a symmetric part with trace, and an antisymmetric part) we find that the
field h̄µν(x) associated with the massless states of a closed string decomposes
into: (i) a field hµν(x) (the graviton) representing a weak gravitational wave
in general relativity, (ii) a scalar field Φ(x) (called the dilaton), and (iii) an
antisymmetric tensor field Bµν(x) = −Bνµ(x) subject to the gauge invariance
B′µν(x) = Bµν(x) + ∂µ aν(x) − ∂ν aµ(x). Moreover, when one studies the non-
linear interactions between these various fields, as described by the transition
amplitudes A(f, i) in string theory, one can show that the field hµν(x) truly
represents a deformation of the flat geometry of the background space-time in
which the theory was initially formulated. Let us emphasize this remarkable
result. We started from a theory that studied the quantum dynamics of a string
in a rigid background space-time. This theory predicts that certain quantum
excitations of a string (that propagate at the speed of light) in fact represent
waves of deformation of the space-time geometry. In intuitive terms, the “elas-
ticity” of space-time postulated by the theory of general relativity appears here
as being due to certain internal vibrations of an elastic object extended in one
spatial dimension.
Another suggestive consequence of string theory is the link suggested by
the comparison between (30) and (31). Roughly, Equation (31) states that
the internal state of a closed string corresponding to a graviton is constructed
by taking the (tensor) product of the states corresponding to photons in the
theory of open strings. This unexpected link between Einstein’s gravitation
(gµν) and Maxwell’s theory (Aµ) translates, when we look at interactions in
string theory, into remarkable identities (due to H. Kawai, D.C. Lewellen, and
S.-H.H. Tye) between the transition amplitudes of open strings and those of
closed strings. This affinity between electromagnetism, or rather Yang-Mills
theory, and gravitation has recently given rise to fascinating conjectures (due to
A. Polyakov and J. Maldacena) connecting quantum Yang-Mills theory in flat
space-time to quasi-classical limits of string theory and gravitation in curved
space-time. Einstein would certainly have been interested to see how classical
general relativity is used here to clarify the limit of a quantum Yang-Mills theory.
Having explained the starting point of string theory, we can outline the in-
tuitive reason for which this theory avoids the problems with divergent integrals
that appeared when one tried to directly quantize gravitation. We have seen
that string theory contains an infinite tower of particles whose masses grow
with the degree of excitation of the string’s internal oscillators. The gravita-
tional field appears in the limit that one considers the low energy interactions
(E ≪ ms) between the massless states of the theory. In this limit the gravi-
ton (meaning the particle associated with the gravitational field) is treated as
a “point-like” particle. When we consider more complicated processes (at one
loop, ℓ = 1, see above), virtual elementary gravitons could appear with arbitrar-
ily high energy. It is these virtual high-energy gravitons that are responsible for
the divergences. However, in string theory, when we consider any intermediate
process whatsoever where high energies appear, it must be remembered that
this high intermediate energy can also be used to excite the internal state of
the virtual gravitons, and thus reveal that they are “made” from an extended
string. An analysis of this fact shows that string theory introduces an effective
truncation of the type E . ms on the energies of exchanged virtual particles.
In other words, the fact that there are no truly “point-like” particles in string
theory, but only string excitations having a characteristic length ∼ ℓs, elimi-
nates the problem of infinities connected to arbitrarily small length and time
scales. Because of this, in string theory one can calculate the transition ampli-
tudes corresponding to a collision between two gravitons, and one finds that the
result is given by a finite integral [25].
Up until now we have only considered the starting point of string theory.
This is a complex theory that is still in a stage of rapid development. Let us
briefly sketch some other aspects of this theory that are relevant for this exposé
centered around relativistic gravitation. Let us first state that the more sophis-
ticated versions of string theory (superstrings) require the inclusion of fermionic
oscillators bµn, b̃
n, in addition to the bosonic oscillators a
n, ã
n introduced above.
One then finds that there are no particles of negative mass-squared, and that
the space-time dimension D must be equal to 10. One also finds that the mass-
less states contain more states than those indicated above. In fact, one finds
that the fields corresponding to these states describe the various possible theo-
ries of supergravity in D = 10. Recently (in work by J. Polchinski) it has also
been understood that string theory contains not only the states of excitation of
strings (in other words of objects extended in one spatial direction), but also
the states of excitation of objects extended in p spatial directions, where the
integer p can take other values than 1. For example, p = 2 corresponds to a
membrane. It even seems (according to C. Hull and P. Townsend) that one
should recognize that there is a sort of “democracy” between several different
values for p. An object extended in p spatial directions is called a p-brane. In
general, the masses of the quantum states of these p-branes are very large, be-
ing parametrically higher than the characteristic mass ms. However, one may
also consider a limit where the mass of certain p-branes tends towards zero. In
this limit, the fields associated with these p-branes become long-range fields. A
surprising result (by E. Witten) is that, in this limit, the infinite tower of states
of certain p-branes (in particular for p = 0) corresponds exactly to the infinite
tower of states that appear when one considers the maximal supergravity in
D = 11 dimensions, with the eleventh (spatial) dimension compactified on a
circle (that is to say with a periodicity condition on x11). In other words, in
a certain limit, a theory of superstrings in D = 10 transforms into a theory
that lives in D = 11 dimensions! Because of this, many experts in string theory
believe that the true definition of string theory (which is still to be found) must
start from a theory (to be defined) in 11 dimensions (known as “M -theory”).
We have seen in Section 8 that one point of contact between relativistic grav-
itation and quantum theory is the phenomenon of thermal emission from black
holes discovered by S.W. Hawking. String theory has shed new light upon this
phenomenon, as well as on the concept of black hole “entropy.” The essential
question that the calculation of S.W. Hawking left in the shadows is: what is
the physical meaning of the quantity S defined by Equation (19)? In the ther-
modynamic theory of ordinary bodies, the entropy of a system is interpreted,
since Boltzmann’s work, as the (natural) logarithm of the number of micro-
scopic states N having the same macroscopic characteristics (energy, volume,
etc.) as the state of the system under consideration: S = logN . Bekenstein had
attempted to estimate the number of microscopic internal states of a macroscop-
ically defined black hole, and had argued for a result such that logN was on the
order of magnitude of A/~G, but his arguments remained indirect and did not
allow a clear meaning to be attributed to this counting of microscopic states.
Work by A. Sen and by A. Strominger and C. Vafa, as well as by C.G. Callan
and J.M. Maldacena has, for the first time, given examples of black holes whose
microscopic description in string theory is sufficiently precise to allow for the
calculation (in certain limits) of the number of internal quantum states, N . It
is therefore quite satisfying to find a final result for N whose logarithm is pre-
cisely equal to the expression (19). However, there do remain dark areas in the
understanding of the quantum structure of black holes. In particular, the string
theory calculations allowing one to give a precise statistical meaning to the en-
tropy (19) deal with very special black holes (known as extremal black holes,
which have the maximal electric charge that a black hole with a regular horizon
can support). These black holes have a Hawking temperature equal to zero, and
therefore do not emit thermal radiation. They correspond to stable states in the
quantum theory. One would nevertheless also like to understand the detailed
internal quantum structure of unstable black holes, such as the Schwarzschild
black hole (17), which has a non-zero temperature, and which therefore loses
its mass little by little in the form of thermal radiation. What is the final state
to which this gradual process of black hole “evaporation” leads? Is it the case
that an initial pure quantum state radiates all of its initial mass to transform
itself entirely into incoherent thermal radiation? Or does a Schwarzschild black
hole transform itself, after having obtained a minimum size, into something
else? The answers to these questions remain open to a large extent, although it
has been argued that a Schwarzschild black hole transforms itself into a highly
massive quantum string state when its radius becomes on the order of ℓs [26].
We have seen previously that string theory contains general relativity in
a certain limit. At the same time, string theory is, strictly speaking, infinitely
richer than Einstein’s gravitation, for the graviton is nothing more than a partic-
ular quantum excitation of a string, among an infinite number of others. What
deviations from Einstein’s gravity are predicted by string theory? This question
remains open today because of our lack of comprehension about the connection
between string theory and the reality observed in our everyday environment
(4-dimensional space-time; electromagnetic, weak, and strong interactions; the
spectrum of observed particles; . . .). We shall content ourselves here with out-
lining a few possibilities. (See the contribution by I. Antoniadis for a discussion
of other possibilities.) First, let us state that if one considers collisions between
gravitons with energy-momentum k smaller than, but not negligible with respect
to, the characteristic string mass ms, the calculations of transition amplitudes
in string theory show that the usual Einstein equations (in the absence of mat-
ter) Rµν = 0 must be modified, by including corrections of order (k/ms)
2. One
finds that these modified Einstein equations have the form (for bosonic string
theory)
Rµν +
ℓ2s Rµαβγ R
ν + · · · = 0 , (32)
where
�ναβ ≡ ∂α Γ
νβ + Γ
νβ − ∂β Γµνα − Γ
να , (33)
denotes the “curvature tensor” of the metric gµν . (the quantity Rµν defined in
Section 5 that appears in Einstein’s equations in an essential way is a “trace” of
this tensor: Rµν = R
�µσν .) As indicated by the dots in (32), the terms written
are no more than the two first terms of an infinite series in growing powers of
ℓ2s ≡ α′. Equation (32) shows how the fact that the string is not a point, but
is rather extended over a characteristic length ∼ ℓs, modifies the Einsteinian
description of gravity. The corrections to Einstein’s equation shown in (32) are
nevertheless completely negligible in most applications of general relativity. In
fact, it is expected that ℓs is on the order of the Planck scale ℓp, Equation (25).
More precisely, one expects that ℓs is on the order of magnitude of 10
−32 cm.
(Nevertheless, this question remains open, and it has been recently suggested
that ℓs is much larger, and perhaps on the order of 10
−17 cm.)
If one assumes that ℓs is on the order of magnitude of 10
−32 cm (and that
the extra dimensions are compactified on distances scales on the order of ℓs),
the only area of general relativistic applications where the modifications shown
in (32) should play an important role is in primordial cosmology. Indeed, close
to the initial singularity of the Big Bang (if it exists), the “curvature” Rµναβ
becomes extremely large. When it reaches values comparable to ℓ−2s the infinite
series of corrections in (32) begins to play a role comparable to the first term,
discovered by Einstein. Such a situation is also found in the interior of a black
hole, when one gets very close to the singularity (see Figure 3). Unfortunately, in
such situations, one must take the infinite series of terms in (32) into account,
or in other words replace Einstein’s description of gravitation in terms of a
field (which corresponds to a point-like (quantum) particle) by its exact stringy
description. This is a difficult problem that no one really knows how to attack
today.
However, a priori string theory predicts more drastic low energy (k ≪ ms)
modifications to general relativity than the corrections shown in (32). In fact,
we have seen in Equation (31) above that Einsteinian gravity does not appear
alone in string theory. It is always necessarily accompanied by other long-range
fields, in particular a scalar field Φ(x), the dilaton, and an antisymmetric ten-
sor Bµν(x). What role do these “partners” of the graviton play in observable
reality? This question does not yet have a clear answer. Moreover, if one recalls
that (super)string theory must live in a space-time of dimension D = 10, and
that it includes the D = 10 (and eventually the D = 11) theory of supergravity,
there are many other supplementary fields that add themselves to the ten com-
ponents of the usual metric tensor gµν (in D = 4). It is conceivable that all of
these supplementary fields (which are massless to first approximation in string
theory) acquire masses in our local universe that are large enough that they no
longer propagate observable effects over macroscopic scales. It remains possible,
however, that one or several of these fields remain (essentially) massless, and
therefore can propagate physical effects over distances that are large enough to
be observable. It is therefore of interest to understand what physical effects are
implied, for example, by the dilaton Φ(x) or by Bµν(x). Concerning the latter,
it is interesting to note that (as emphasized by A. Connes, M. Douglas, and
A. Schwartz), in a certain limit, the presence of a background Bµν(x) has the
effect of deforming the space-time geometry in a “non-commutative” way. This
means that, in a certain sense, the space-time coordinates xµ cease to be sim-
ple real (commuting) numbers in order to become non-commuting quantities:
xµxν − xνxµ = εµν where εµν = −ενµ is connected to a (uniform) background
Bµν . To conclude, let us consider the other obligatory partner of the graviton
gµν(x), the dilaton Φ(x). This field plays a central role in string theory. In fact,
the average value of the dilaton (in the vacuum) determines the string theory
coupling constant, gs = e
Φ. The value of gs in turn determines (along with other
fields) the physical coupling constants. For example, the gravitational coupling
constant is given by a formula of the type ~G = ℓ2s(g
s + · · · ) where the dots
denote correction terms (which can become quite important if gs is not very
small). Similarly, the fine structure constant, α = e2/~c ≃ 1/137, which deter-
mines the intensity of electromagnetic interactions is a function of g2s . Because
of these relations between the physical coupling constants and gs (and therefore
the value of the dilaton; gs = e
Φ), we see that if the dilaton is massless (or in
other words is long-range), its value Φ(x) at a space-time point x will depend on
the distribution of matter in the universe. For example, as is the case with the
gravitational field (for example g00(x) ≃ −1 + 2GM/c2r), we expect that the
value of Φ(x) depends on the masses present around the point x, and should
be different at the Earth’s surface than it is at a higher altitude. One may
also expect that Φ(x) would be sensitive to the expansion of the universe and
would vary over a time scale comparable to the age of the universe. However,
if Φ(x) varies over space and/or time, one concludes from the relations shown
above between gs = e
Φ and the physical coupling constants that the latter must
also vary over space and/or time. Therefore, for example, the value, here and
now, of the fine structure constant α could be slightly different from the value
it had, long ago, in a very distant galaxy. Such effects are accessible to detailed
astronomical observations and, in fact, some recent observations have suggested
that the interaction constants were different in distant galaxies. However, other
experimental data (such as the fossil nuclear reactor at Oklo and the isotopic
composition of ancient terrestrial meteorites) put very severe limits on any vari-
ability of the coupling “constants.” Let us finally note that if the fine structure
“constant” α, as well as other coupling “constants,” varies with a massless field
such as the dilaton Φ(x), then this implies a violation of the basic postulate of
general relativity: the principle of equivalence. In particular, one can show that
the universality of free fall is necessarily violated, meaning that bodies with dif-
ferent nuclear composition would fall with different accelerations in an external
gravitational field. This gives an important motivation for testing the principle
of equivalence with greater precision. For example, the MICROSCOPE space
mission [27] (of the CNES) should soon test the universality of free fall to the
level of 10−15, and the STEP space project (Satellite Test of the Equivalence
Principle) [28] could reach the level 10−18.
Another interesting phenomenological possibility is that the dilaton (and/or
other scalar fields of the same type, called moduli) acquires a non-zero mass that
is however very small with respect to the string mass scale ms. One could then
observe a modification of Newtonian gravitation over small distances (smaller
than a tenth of a millimeter). For a discussion of this theoretical possibility and
of its recent experimental tests see, respectively, the contributions by I. Anto-
niadis and J. Mester to this Poincaré seminar.
12 Conclusion
For a long time general relativity was admired as a marvellous intellectual con-
struction, but it only played a marginal role in physics. Typical of the appraisal
of this theory is the comment by Max Born [29] made upon the fiftieth an-
niversary of the annus mirabilis: “The foundations of general relativity seemed
to me then, and they still do today, to be the greatest feat of human thought
concerning Nature, the most astounding association of philosophical penetra-
tion, physical intuition, and mathematical ability. However its connections to
experiment were tenuous. It seduced me like a great work of art that should be
appreciated and admired from a distance.”
Today, one century after the annus mirabilis, the situation is quite different.
General relativity plays a central role in a large domain of physics, including
everything from primordial cosmology and the physics of black holes to the
observation of binary pulsars and the definition of international atomic time.
It even has everyday practical applications, via the satellite positioning sys-
tems (such as the GPS and, soon, its European counterpart Galileo). Many
ambitious (and costly) experimental projects aim to test it (G.P.B., MICRO-
SCOPE, STEP, . . .), or use it as a tool for deciphering the distant universe
(LIGO/VIRGO/GEO, LISA, . . .). The time is therefore long-gone that its con-
nection with experiment was tenuous. Nevertheless, it is worth noting that the
fascination with the structure and physical implications of the theory evoked
by Born remains intact. One of the motivations for thinking that the theory
of strings (and other extended objects) holds the key to the problem of the
unification of physics is its deep affinity with general relativity. Indeed, while
the attempts at “Grand Unification” made in the 1970s completely ignored the
gravitational interaction, string theory necessarily leads to Einstein’s fundamen-
tal concept of a dynamical space-time. At any rate, it seems that one must more
deeply understand the “generalized quantum geometry” created through the in-
teraction of strings and p-branes in order to completely formulate this theory
and to understand its hidden symmetries and physical implications. Einstein
would no doubt appreciate seeing the key role played by symmetry principles
and gravity within modern physics.
References
[1] A. Einstein, Zur Elektrodynamik bewegter Körper, Annalen der Physik
17, 891 (1905).
[2] See http://www.einstein.caltech.edu for an entry into the Einstein Col-
lected Papers Project. The French reader will have access to Einstein’s
main papers in Albert Einstein, Œuvres choisies, Paris, Le Seuil/CNRS,
1993, under the direction of F. Balibar. See in particular Volumes 2 (Rel-
ativités I) and 3 (Relativités II). One can also consult the 2005 Poincaré
seminar dedicated to Einstein (http://www.lpthe.jussieu.fr/poincare):
Einstein, 1905-2005, Poincaré Seminar 2005, edited by T. Damour,
O. Darrigol, B. Duplantier and V. Rivasseau (Birkhäuser Verlag, Basel,
Suisse, 2006). See also the excellent summary article by D. Giulini and
N. Straumann, “Einstein’s impact on the physics of the twentieth cen-
tury,” Studies in History and Philosophy of Modern Physics 37, 115-
173 (2006). For online access to many of Einstein’s original articles and
to documents about him, see http://www.alberteinstein.info/. We also
note that most of the work in progress on general relativity can be
consulted on various archives at http://xxx.lanl.gov, in particular the
archive gr-qc. Review articles on certain sub-fields of general relativ-
ity are accessible at http://relativity.livingreviews.org. Finally, see T.
Damour, Once Upon Einstein, A K Peters Ltd, Wellesley, 2006, for a
recent non-technical account of the formation of Einstein’s ideas.
[3] Galileo, Dialogues Concerning Two New Sciences, translated by
Henry Crew and Alfonso di Salvio, Macmillan, New York, 1914.
[4] The reader interested in learning about recent experimental
tests of gravitational theories may consult, on the internet, ei-
ther the highly detailed review by C.M. Will in Living Re-
views (http://relativity.livingreviews.org/Articles/lrr-2001-4) or the
brief review by T. Damour in the Review of Particle Physics
http://www.einstein.caltech.edu
http://www.lpthe.jussieu.fr/poincare
http://www.alberteinstein.info/
http://xxx.lanl.gov
http://relativity.livingreviews.org
http://relativity.livingreviews.org/Articles/lrr-2001-4
(http://pdg.lbl.gov/). See also John Mester’s contribution to this
Poincaré seminar.
[5] A. Einstein, Die Feldgleichungen der Gravitation, Sitz. Preuss. Akad.
Wiss., 1915, p. 844.
[6] The reader wishing to study the formalism and applications of
general relativity in detail can consult, for example, the following
works: L. Landau and E. Lifshitz, The Classical Theory of Fields,
Butterworth-Heinemann, 1995; S. Weinberg, Gravitation and Cos-
mology, Wiley, New York, 1972; H.C. Ohanian and R. Ruffini,
Gravitation and Spacetime, Second Edition, Norton, New York,
1994; N. Straumann, General Relativity, With Applications to Astro-
physics, Springer Verlag, 2004. Let us also mention detailed course
notes on general relativity by S.M. Carroll, available on the in-
ternet: http://pancake.uchicago.edu/∼carroll/notes/∼; as well as at
gr-qc/9712019. Finally, let us mention the recent book (in French) on
the history of the discovery and reception of general relativity: J. Eisen-
staedt, Einstein et la relativité générale, CNRS, Paris, 2002.
[7] B. Bertotti, L. Iess, and P. Tortora, A Test of General Relativity
Using Radio Links with the Cassini Spacecraft, Nature 425, 374 (2003).
[8] http://einstein.stanford.edu
[9] W. Israel, Dark stars: the evolution of an idea, in 300 Years of Grav-
itation, edited by S.W. Hawking and W. Israel, Cambridge University
Press, Cambridge, 1987, Chapter 7, pp. 199-276.
[10] The discovery of binary pulsars is related in Hulse’s Nobel Lecture:
R.A. Hulse, Reviews of Modern Physics 66, 699 (1994).
[11] For an introduction to the observational characteristics of pulsars, and
their use in testing relativistic gravitation, see Taylor’s Nobel Lecture:
J.H. Taylor, Reviews of Modern Physics 66, 711 (1994). See also Michael
Kramer’s contribution to this Poincaré seminar.
[12] For an update on the observational characteristics of pulsars, and their
use in testing general relativity, see the Living Review by I.H. Stairs,
available at http://relativity.livingreviews.org/Articles/lrr-2003-5/ and
the contribution by Michael Kramer to this Poincaré seminar.
[13] For a recent update on tests of relativistic gravitation (and of tensor-
scalar theories) obtained through the chronometry of binary pulsars, see
G. Esposito-Farèse, gr-qc/0402007 (available on the general relativity
and quantum cosmology archive at the address http://xxx.lanl.gov),
and T. Damour and G. Esposito-Farèse, in preparation. Figure 4 is
adapted from these references.
http://pdg.lbl.gov/
http://pancake.uchicago.edu/~carroll/notes/~
http://arxiv.org/abs/gr-qc/9712019
http://einstein.stanford.edu
http://relativity.livingreviews.org/Articles/lrr-2003-5/
http://arxiv.org/abs/gr-qc/0402007
http://xxx.lanl.gov
[14] For a review of the problem of the motion of two gravitationally con-
densed bodies in general relativity, up to the level where the effects con-
nected to the finite speed of propagation of the gravitational interaction
appear, see T. Damour, The problem of motion in Newtonian and Ein-
steinian gravity, in 300 Years of Gravitation, edited by S.W. Hawking
and W. Israel, Cambridge University Press, Cambridge, 1987, Chapter
6, pp. 128-198.
[15] A. Einstein, Näherungsweise Integration der Feldgleichungen der
Gravitation, Sitz. Preuss. Akad. Wiss., 1916, p. 688 ; ibidem, Über Grav-
itationswellen, 1918, p. 154.
[16] For a highly detailed introduction to these three problems, see
K.S. Thorne Gravitational radiation, in 300 Years of Gravitation, edited
by S.W. Hawking and W. Israel, Cambridge University Press, Cam-
bridge, 1987, Chapter 9, pp. 330-458.
[17] http://www.ligo.caltech.edu/
[18] http://www.virgo.infn.it/
[19] http://www/geo600.uni-hanover.de/
[20] http://lisa.jpl.nasa.gov/
[21] L. Blanchet et al., gr-qc/0406012 ; see also the Living Review by
L. Blanchet, available at http://relativity.livingreviews.org/Articles.
[22] Figure 5 is adapted from work by A. Buonanno and T. Damour,
gr-qc/0001013.
[23] F. Pretorius, Phys. Rev. Lett. 95, 121101 (2005), gr-qc/0507014;
M. Campanelli et al., Phys. Rev. Lett. 96, 111101 (2006),
gr-qc/0511048 J. Baker et al., Phys. Rev. D 73, 104002 (2006),
gr-qc/0602026.
[24] For a particularly clear exposé of the development of the quantum the-
ory of fields, see, for example, the first chapter of S. Weinberg, The
Quantum Theory of Fields, volume 1, Foundations, Cambridge Univer-
sity Press, Cambridge, 1995.
[25] For an introduction to the theory of (super)strings see
http://superstringtheory.com/. For a detailed (and technical) in-
troduction to the theory see the books: K. Becker, M. Becker,
and J.H. Schwarz, String Theory and M-theory: An Introduction,
Cambridge University Press, Cambridge, 2006; B. Zwiebach, A First
Course in String Theory, Cambridge University Press, Cambridge,
2004; M.B. Green, J.H. Schwarz et E. Witten, Superstring theory, 2 vol-
umes, Cambridge University Press, Cambridge, 1987 ; and J. Polchinski,
String Theory, 2 volumes, Cambridge University Press, Cambridge,
http://www.ligo.caltech.edu/
http://www.virgo.infn.it/
http://www/geo600.uni-hanover.de/
http://lisa.jpl.nasa.gov/
http://arxiv.org/abs/gr-qc/0406012
http://relativity.livingreviews.org/Articles
http://arxiv.org/abs/gr-qc/0001013
http://arxiv.org/abs/gr-qc/0507014
http://arxiv.org/abs/gr-qc/0511048
http://arxiv.org/abs/gr-qc/0602026
http://superstringtheory.com/
1998. To read review articles or to research this theory as it develops
see the hep-th archive at http://xxx.lanl.gov. To search for information
on the string theory literature (and more generally that of high-energy
physics) see also the site http://www.slac.stanford.edu/spires/find/hep.
[26] For a detailed introduction to black hole physics see P.K. Townsend,
gr-qc/9707012; for an entry into the vast literature on black hole en-
tropy, see, for example, T. Damour, hep-th/0401160 in Poincaré Sem-
inar 2003, edited by Jean Dalibard, Bertrand Duplantier, and Vincent
Rivasseau (Birkhäuser Verlag, Basel, 2004), pp. 227-264.
[27] http://www.onera.fr/microscope/
[28] http://www.sstd.rl.ac.uk/fundphys/step/.
[29] M. Born, Physics and Relativity, in Fünfzig Jahre Relativitätstheorie,
Bern, 11-16 Juli 1955, Verhandlungen, edited by A. Mercier and
M. Kervaire, Helvetica Physica Acta, Supplement 4, 244-260 (1956).
http://xxx.lanl.gov
http://www.slac.stanford.edu/spires/find/hep
http://arxiv.org/abs/gr-qc/9707012
http://arxiv.org/abs/hep-th/0401160
http://www.onera.fr/microscope/
http://www.sstd.rl.ac.uk/fundphys/step/
	Introduction
	Special Relativity
	The Principle of Equivalence
	Gravitation and Space-Time Chrono-Geometry
	Einstein's Equations: Elastic Space-Time
	The Weak-Field Limit and the Newtonian Limit
	The Post-Newtonian Approximation and Experimental Confirmations in the Regime of Weak and Quasi-Stationary Gravitational Fields
	Strong Gravitational Fields and Black Holes
	Binary Pulsars and Experimental Confirmations in the Regime of Strong and Radiating Gravitational Fields
	Gravitational Waves: Propagation, Generation, and Detection
	General Relativity and Quantum Theory: From Supergravity to String Theory
	Conclusion
ABSTRACT
  After recalling the conceptual foundations and the basic structure of general
relativity, we review some of its main modern developments (apart from
cosmology) : (i) the post-Newtonian limit and weak-field tests in the solar
system, (ii) strong gravitational fields and black holes, (iii) strong-field
and radiative tests in binary pulsar observations, (iv) gravitational waves,
(v) general relativity and quantum theory.

<|endoftext|><|startoftext|>
Introduction
This worksheet demonstrates the use of Maple in Linear Algebra.
We give a new procedure (PowerMatrix) in Maple for finding the kth power of n-by-n square
matrix A, in a symbolic form, for any positive integer k, k ≥ n. The algorithm is based on an
application of Cayley-Hamilton theorem. We used the fact that the entries of the matrix Ak satisfy
the same recurrence relation which is determined by the characteristic polynomial of the matrix A
(see [1]). The order of these recurrences is n− d, where d is the lowest degree of the characteristic
polynomial of the matrix A.
For non-singular matrices the procedure can be extended for k not only a positive integer.
2 Initialization
> restart:
with(LinearAlgebra):
2.1 Procedure Definition
2.1.1 PowerMatrix
Input data are a square matrix A and a parameter k. Elements of the matrix A can be numbers
and/or parameters. The parameter k can take numeric value or be a symbol. The output data is
the kth power of the matrix. The procedure PowerMatrix is as powerful as the procedure rsolve.
> PowerMatrix := proc(A::Matrix,k)
local i,j,m,r,q,n,d,f,P,F,C;
P := x->CharacteristicPolynomial(A,x);
n := degree(P(x),x);
d := ldegree(P(x),x);
http://arxiv.org/abs/0704.0755v2
http://www.maplesoft.com/
mailto:malesh@EUnet.yu
mailto:ivana121@EUnet.yu
F := (i,j)->rsolve(sum(coeff(P(x),x,m)*f(m+q),m=0..n)=0,seq(f(r)=(A^r)[i,j],
r=d+1..n),f);
C := q->Matrix(n,n,F);
if (type(k,integer)) then return(simplify(A^k)) elif (Determinant(A)=0 and
not type(k,numeric)) then printf("The %ath power of the matrix for %a>=%d:",
k,k,n) elif (Determinant(A)=0 and type(k,numeric)) then return(simplify(A^k)) fi;
return(simplify(subs(q=k,C(q))));
3 Examples
3.1 Example 1.
> A := Matrix([[4,-2,2],[-5,7,-5],[-6,6,-4]]);
4 −2 2
−5 7 −5
−6 6 −4
> PowerMatrix(A,k);
−2k + 2 · 3k 2(1+k) − 2 · 3k −2(1+k) + 2 · 3k
−5 · 3k + 5 · 2k 5 · 3k − 4 · 2k −5 · 3k + 5 · 2k
6 · 2k − 6 · 3k −6 · 2k + 6 · 3k −6 · 3k + 7 · 2k
> Determinant(A);
> B := A^(-1);
> PowerMatrix(B,k);
−2(−k) + 2 · 3(−k) 2(1−k) − 2 · 3(−k) −2(1−k) + 2 · 3(−k)
−5 · 3(−k) + 5 · 2(−k) 5 · 3(−k) − 4 · 2(−k) −5 · 3(−k) + 5 · 2(−k)
−6 · 3(−k) + 6 · 2(−k) −6 · 2(−k) + 6 · 3(−k) −6 · 3(−k) + 7 · 2(−k)
3.2 Example 2.
> A := Matrix([[1-p,p],[p,1-p]]);
1− p p
p 1− p
> PowerMatrix(A,k);
(1− 2 p)k
(1− 2 p)k
(1− 2 p)k
(1− 2 p)k
The example is from [4], page 272, exercise 19.
3.3 Example 3.
> A := Matrix([[a,b,c],[d,e,f],[g,h,i]]);
a b c
d e f
g h i
> PowerMatrix(A,k)[1,1];
R = RootOf( (gbf + hdc + iea − gce − hfa− idb) Z3 + (gc + hf + db − ie − ia − ea) Z2 + (i+ e + a) Z − 1 )
R2ie− R2hf − Re− Ri+ 1
(3 R2gbf + 3 R2hdc+ 3 R2iea
−3 R2gce− 3 R2hfa− 3 R2idb+ 2 Rgc+ 2 Rhf + 2 Rdb− 2 Rie− 2 Ria− 2 Rea+ i+ e+ a) R
# Warning!
In this example MatrixPower and MatrixFuction procedures cannot be done in real-time.
# MatrixPower(A,k)[1,1];
# MatrixFunction(A,v^k,v)[1,1];
3.4 Example 4.
> A := Matrix([[0,0,1,0,1],[1,0,0,0,1],[0,0,0,1,1],[0,1,0,0,1],[1,1,1,1,0]]);
0 0 1 0 1
1 0 0 0 1
0 0 0 1 1
0 1 0 0 1
1 1 1 1 0
> PowerMatrix(A,k)[1,5];
Replace ’:’ with ’;’ and see result!
> MatrixPower(A,k)[1,5]:
> assume(m::integer):simplify(MatrixPower(A,k)[1,5]):
The example is from [3], page 101.
3.5 Example 5. and Example 6.
Pay attention what happens for singular matrices.
3.5.1 Example 5.
> A := Matrix([[0,2,1,3],[0,0,-2,4],[0,0,0,5],[0,0,0,0]]);
0 2 1 3
0 0 −2 4
0 0 0 5
0 0 0 0
> PowerMatrix(A,2);
0 0 −4 13
0 0 0 −10
0 0 0 0
0 0 0 0
> PowerMatrix(A,3);
0 0 0 −20
0 0 0 0
0 0 0 0
0 0 0 0
> PowerMatrix(A,k);
The kth power of the matrix A for k ≥ 4:
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 0
> MatrixPower(A,k);
Error, (in LinearAlgebra:-LA_Main:-MatrixPower)
power k is not defined for this Matrix
> MatrixFunction(A,v^k,v);
Error, (in LinearAlgebra:-LA_Main:-MatrixFunction)
Matrix function vk is not defined for this Matrix
The example is from [2], page 151, exercise 23.
3.5.2 Example 6.
> A := Matrix([[1,1,1,0],[1,1,1,-1],[0,0,-1,1],[0,0,1,-1]]);
1 1 1 0
1 1 1 −1
0 0 −1 1
0 0 1 −1
> PowerMatrix(A,k);
> The kth power of the matrix for k ≥ 4:
2(−1+k) 2(−1+k)
(−1)(1+k) · 2k
5 · 2k
(−1)k · 2k
2(−1+k) 2(−1+k)
5 · 2k
5 · (−1)(1+k) · 2k
5 · (−1)k · 2k
0 0 (−1)k · 2(−1+k) (−1)(1+k) · 2(−1+k)
0 0 (−1)(1+k) · 2(−1+k) (−1)k · 2(−1+k)
> MatrixPower(A,k);
Error, (in LinearAlgebra:-LA_Main:-MatrixPower)
power k is not defined for this Matrix
> MatrixFunction(A,v^k,v);
Error, (in LinearAlgebra:-LA_Main:-MatrixFunction)
Matrix function vk is not defined for this Matrix
4 References
[1] BrankoMalešević: Some combinatorial aspects of the composition of a set of functions, NSJOM
2006 (36), 3-9, URLs: http://www.im.ns.ac.yu/NSJOM/Papers/36 1/NSJOM 36 1 003 009.pdf,
http://arxiv.org/abs/math.CO/0409287.
[2] JohnB. Johnston, G.BaleyPrice, Fred S.Van Vleck: Linear Equations and Matrices, Addi-
son-Wesley, 1966.
[3] Carl D.Meyer: Matrix Analysis and Applied Linear Algebra Book and Solutions Manual SIAM,
2001.
[4] Robert Messer: Linear Algebra Gateway to Mathematics, New York, Harper-Collins College
Publisher, 1993.
5 Conclusions
This procedure has an educational character. It is an interesting demonstration for finding the kth
power of a matrix in a symbolic form. Sometimes, it gives solutions in the better form than the
existing procedure MatrixPower (see example 4.). See also example 5. and example 6., where we
consider singular matrices. In these cases the procedure MatrixPower does not give a solution. The
procedure PowerMatrix calculates the kth power of any singular matrices. In some examples it is
possible to get a solution in the better form with using the procedure allvalues (see example 3.).
Legal Notice: The copyright for this application is owned by the authors. Neither Maplesoft nor the
author are responsible for any errors contained within and are not liable for any damages resulting
from the use of this material. This application is intended for non-commercial, non-profit use only.
Contact the author for permission if you wish to use this application in for-profit activities.
http://www.im.ns.ac.yu/NSJOM/Papers/36_1/NSJOM_36_1_003_009.pdf
http://arxiv.org/abs/math.CO/0409287
	Introduction 
	Initialization
	Procedure Definition
	PowerMatrix
	Examples
	Example 1.
	Example 2.
	Example 3.
	Example 4.
	Example 5. and Example 6.
	Example 5. 
	Example 6.
	References
	Conclusions
ABSTRACT
  We give a new procedure in Maple for finding the k-th power of a martix. The
algorithm is based on the article [1].

<|endoftext|><|startoftext|>
Introduction: String theory, the most serious can-
didate for a quantum theory of gravity, predicts the ex-
istence of ’branes’, i.e. hypersurfaces in the 10- (or 11-
) dimensional spacetime on which ordinary matter, e.g.
gauge particles and fermions, are confined. Gravitons
can move freely in the ’bulk’, the full higher dimensional
spacetime [1].
The scenario, where our Universe moves through a five-
dimensional Anti de Sitter (AdS) spacetime has been
especially successful in reproducing the observed four-
dimensional behavior of gravity. It has been shown that
at sufficiently low energies and large scales, not only grav-
ity on the brane looks four dimensional [2], but also cos-
mological expansion can be reproduced [3]. We shall con-
centrate here on this example and comment on behavior
which may survive in other warped braneworlds.
We consider the following situation: A fixed ’static
brane’ is sitting in the bulk. The ’physical brane’, our
Universe, is first moving away from the AdS Cauchy hori-
zon, approaching the second brane. This motion corre-
sponds to a contracting Universe. After a closest en-
counter the physical brane turns around and moves away
from the static brane. This motion mimics the observed
expanding Universe.
The moving brane acts as a time-dependent boundary for
the 5D bulk leading to production of gravitons from vac-
uum fluctuations in the same way a moving mirror causes
photon creation from vacuum in dynamical cavities [4].
Apart from massless gravitons, braneworlds allow for a
tower of Kaluza-Klein (KK) gravitons which appear as
massive particles on the brane leading possibly to phe-
nomenological consequences.
We postulate, that high energy stringy physics will lead
to a turnaround of the brane motion, i.e., provoke a re-
pulsion of the physical brane from the static one. This
∗Electronic address: ruth.durrer@physics.unige.ch
†Electronic address: marcus.ruser@physics.unige.ch
motion is modeled by a kink where the brane velocity
changes sign. As we shall see, a perfect kink leads to
divergent particle production due to its infinite acceler-
ation. We therefore assume that the kink is rounded
off at the string scale Ls. Then particles with energies
E > Es = 1/Ls are not generated. This setup represents
a regular ’bouncing Universe’ as, for example the ’ekpy-
rotic Universe’ [5]. Four-dimensional bouncing Universes
have also been studied in Ref. [6].
Moving brane in AdS5: Our starting point is the met-
ric of AdS5 in Poincaré coordinates:
ds2 = gABdx
AdxB =
−dt2 + δijdxidxj + dy2
The physical brane (our Universe) is located at some
time-dependent position y = yb(t), while the static brane
is at a fixed position y = ys > yb(t). The scale factor on
the brane is
a(η) =
yb(t)
, dη =
1− v2dt = γ−1dt , v = dyb
where we have introduced the brane velocity v and the
conformal time η on the brane. If v ≪ 1, the junction
conditions lead to the Friedmann equations on the brane.
For reviews see [7, 8]. Defining the string and Planck
scales by κ5 ≡ L3s and κ4 ≡ L2Pl the Randall-Sundrum
(RS) fine tuning condition [2] implies
. (2)
We assume that the brane energy density is dominated
by a radiation component. The contracting (t < 0) and
expanding (t > 0) phases are then described by
a(t) =
|t|+ tb
, yb(t) =
|t|+ tb
, (3)
v(t) = − sign(t)L
(|t|+ tb)2
≃ −HL (4)
http://arxiv.org/abs/0704.0756v3
http://arxiv.org/abs/0704.0790
mailto:ruth.durrer@physics.unige.ch
mailto:marcus.ruser@physics.unige.ch
where H = (da/dη)/a2 is the Hubble parameter and we
have used that η ≃ t if v ≪ 1. A small velocity also
requires yb(t) ≪ L. The transition from contraction to
expansion is approximated by a kink at t = 0, such that
at the moment of the bounce
|v(0)| ≡ vb =
, ab = a(0) =
, H2b =
. (5)
Tensor perturbations: We now consider tensor per-
turbations hij on this background,
ds2 =
−dt2 + (δij + 2hij)dxidxj + dy2
. (6)
For each polarization, their amplitude h satisfies the
Klein-Gordon equation in AdS5 [8]
∂2t + k
2 − ∂2y +
h(t, y;k) = 0 (7)
where k = |k| is the momentum parallel to the brane and
h is subject to the boundary (2nd junction) conditions
(v∂t + ∂y)h|yb(t) = 0 → ∂yh|yb(t) = 0 and ∂yh|ys = 0 .
Being interested in late-time (low energy) effects, we have
approximated the first of those conditions by a Neumann
condition (v ≪ 1). Then, the spatial part of Eq. (7)
together with (8) forms a Strum-Liouville problem at any
given time and therefore has a complete orthonormal set
of eigenfunctions {φα(t, y)}∞α=0. These ’instantaneous’
mode functions are given by
φ0(t) =
ysyb(t)
y2s − y2b (t)
. (9)
φn(t, y) = Nn(t)y
2C2(mn(t), yb(t), y) with
Cν(m,x, y) = Y1(mx)Jν(my)−J1(mx)Yν (my) (10)
and satisfy [−∂2y + (3/y)∂y]φα(y) = m2αφα(y) as well as
(8). Nn is a time-dependent normalization condition.
More details can be found in [9]. The massless mode φ0
represents the ordinary four-dimensional graviton on the
brane, while the massive modes are KK gravitons. Their
masses are quantized by the boundary condition at the
static brane which requires C1(mn, yb, ys) = 0. At late
times and for large n the KK masses are roughly given
by mn ≃ nπ/ys. The gravity wave amplitude h may now
be decomposed as [9]
h(t, y;k) =
qα,k(t)φα(t, y) (11)
where the prefactor assures that the variables qα,k are
canonically normalized. Their time evolution is deter-
mined by the brane motion [cf. Eq. (14)].
Localization of gravity: From the above expressions
and using L/yb(t) = a(t), we can determine the late-
time behavior of the mode functions φα on the brane
(yb ≪ L ≪ ys)
φ0(t, yb) →
, φn(t, yb) →
. (12)
At this point we can already make two crucial ob-
servations: First, the mass mn is a comoving mass.
The instantaneous energy of a KK graviton is ωn,k =
k2 +m2n, where k denotes comoving wave number.
The ’physical mass’ of a KK mode measured by an ob-
server on the brane with cosmic time dτ = adt is therefore
mn/a, i.e. the KK masses are redshifted with the expan-
sion of the Universe. This comes from the fact that mn
is the wave number corresponding to the y direction with
respect to the bulk time t which corresponds to conformal
time η on the brane and not to physical time. It implies
that the energy of KK particles on a moving AdS brane
is redshifted like that of massless particles. From this
alone we would expect the energy density of KK modes
on the brane decays like 1/a4.
But this is not all. In contrast to the zero mode
which behaves as φ0(t, yb) ∝ 1/a the KK-mode functions
φn(t, yb) decay as 1/a
2 with the expansion of the Uni-
verse and scale like 1/
ys. Consequently the amplitude
of the KK modes on the brane dilutes rapidly with the
expansion of the Universe and is in general smaller the
larger ys. This can be understood by studying the prob-
ability of finding a KK-graviton at position y in the bulk
which turns out to be much larger in regions of less warp-
ing than in the vicinity of the physical brane[9]. If KK
gravitons are present on the brane, they escape rapidly
into the bulk, i.e., the moving brane looses them, since
their wave function is repulsed away from the brane. This
causes the additional 1/a-dependence of φn(t, yb) com-
pared to φ0(t, yb). The 1/
ys-dependence expresses the
fact that the larger the bulk the smaller the probabil-
ity to find a KK-graviton at the position of the moving
brane. This behavior reflects the localization of gravity:
traces of the five-dimensional nature of gravity like KK
gravitons become less and less ’visible’ on the brane as
time evolves. As a consequence, the energy density of
KK gravitons at late times on the brane behaves as
ρKK ∝ 1/a6 . (13)
It means that KK gravitons redshift like stiff matter and
cannot be the dark matter in an AdS braneworld since
their energy density does not have the required 1/a3 be-
havior. They also do not behave like dark radiation [7, 8]
as one might naively expect. This new result is derived
in detail in Ref. [9]. It is based on the calculation of
〈ḣ2(t, yb,k)〉 ∝ 〈q̇2α,k(t)〉φ2α(t, yb) where the bracket in-
corporates a quantum expectation value with respect to
a well-defined initial vacuum state and averaging over
several oscillations of the field [9]. An overdot denotes
the derivative with respect to t. The scaling behaviour
(13) is due to φ2α(t, yb) only, since graviton production
from vacuum fluctuations has ceased at late times (like
in radiation domination) which is necessary for a mean-
ingful particle definition. Then, 〈q̇2α,k(t)〉 is related to the
number of produced gravitons and is constant in time. In
case that amplification of tensor perturbation is still on-
going, e.g., during a de Sitter phase, the energy density
related to the massive modes might scale differently. The
scaling behavior (13) remains valid also when the fixed
brane is sent off to infinity and we end up with a single
braneworld in AdS5, like in the Randall-Sundrum II sce-
nario [2]. The situation is not altered if we replace the
graviton by a scalar or vector degree of freedom in the
bulk. Since every bulk degree of freedom must satisfy the
five-dimensional Klein-Gordon equation, the mode func-
tions will always be the functions φα, and the energy
density of the KK-modes decays like 1/a6. KK particles
on a brane moving through an AdS bulk cannot play the
role of dark matter.
It is important here that we consider a static bulk and
the time depencence of the brane comes solely from its
motion through the bulk. In Ref. [10] the situation of a
fixed brane in a time-dependent bulk is discussed. There
it is shown that under certain assumption (separability
of the y and t dependence of fluctuations), the energy
density of KK modes on a low energy cosmological brane
does scale like 1/a3 which seems to be in contradiction
with our result. However, the approximations used in [10]
lead to a system of equations governing the expansion of
the Universe but neglecting the time dependence of the
bulk. The situation is then effectively four dimensional
even for the KK modes; effects of the fifth dimension like
the possibility of KK gravitons escaping into the bulk
seem to be lost in this approach. In our case we would
have a similar situation if we keep the expansion on the
brane a(t) but take the position of the brane in the bulk
as static yb(t) = const, which is not consistent with the
general relation yb(t) = L/a(t). [For a fixed physical
mass M = m/a, if we neglect the time dependence of
φn(yb(t)) ∝ 1/a2 we also obtain an energy density for
this mass proportional to 1/a3.]
Particle production: The equation of motion for the
canonical variables qα,k is of the form, see Ref. [9],
q̈α,k + ω
α,kqα,k =
β 6=α
Mαβ q̇β,k +
Nαβqβ,k . (14)
Here ωα,k =
k2 +m2α is the frequency of the mode and
M andN are coupling matrices. When we quantize these
variables, gravitons can be created by two effects: First,
the time dependence of the effective frequency (ωeffα,k)
ω2α,k−Nαα and second, the time dependence of the mode
couplings described by the antisymmetric matrix M and
the off-diagonal part of N .
Note that Equation (14) is derived from the corre-
sponding action for the variables qα,k rather than from
the wave equation (7) itself. In this way the approxi-
mated boundary conditions (8) can be implemented con-
sistently [9, 11].
In the technical paper [9] we have studied graviton pro-
duction provoked by a brane moving according to (3) in
great detail numerically. We have found that for long
wavelengths, kL ≪ 1, the zero mode is mainly generated
by its self-coupling, i.e. the time dependence of its ef-
fective frequency. One actually finds that N00 ∝ δ(t),
so that there is an instability at the moment of the kink
which leads to particle creation, and the number of 4D-
gravitons is given by 2vb/(kL)
2. This is specific to ra-
diation dominated expansion where H2a2 = −∂η(Ha).
For another expansion law we would also obtain particle
creation during the contraction and expansion phases.
Light KK gravitons are produced mainly via their cou-
pling to the zero mode. This behavior changes drastically
for short wavelengths kL ≫ 1. Then the evolution of the
zero mode couples strongly to the KK modes and pro-
duction of 4D gravitons via the decay of KK modes takes
place. In this case the number of produced 4D gravitons
decays only like ∝ 1/(kL).
Results and discussion: The numerical simulations
have revealed a multitude of interesting effects. In the
following we summarize the main findings. We refer the
interested reader to Ref. [9] for an extensive discussion.
For the zero-mode power spectrum we find on scales
kL ≪ 1 on which we observe cosmological fluctuations
(Mpc or larger)
P0(k) =
k2 if kt ≪ 1
(La)−2 if kt ≫ 1 . (15)
The spectrum of tensor perturbations is blue on super-
horizon scales as one would expect for an ekpyrotic sce-
nario. On cosmic microwave background scales the am-
plitude of perturbations is of the order of (H0/mPl)
2 and
hence unobservably small.
Calculating the energy density of the produced massless
gravitons one obtains [9]
ρh0 ≃
. (16)
Comparing this with the radiation energy density, ρrad =
(3/(κ4L
2))a−4, the RS fine-tuning condition leads to the
simple relation
ρh0/ρrad ≃ vb/2. (17)
The nucleosynthesis bound [12] requests ρh0 <∼ 0.1ρrad,
which implies vb ≤ 0.2, justifying our low energy ap-
proach. The model is not severely constrained by the
zero-mode.
More stringent bounds come from the KK modes. Their
energy density on the brane is found to be
ρKK ≃
. (18)
This result is dominated by high energy KK gravitons
which are produced due to the kink. It is reasonable
to require that the KK-energy density on the brane be
(much) smaller than the radiation density at all times,
and in particular, right after the bounce where ρKK is
greatest. If this is not satisfied, back reaction cannot be
neglected. We obtain with ρrad(0) = 3H
b /κ4
a=a(0)=1/
≃ 100 v3b
. (19)
If we use the largest value for the brane velocity vb ad-
mitted by the nucleosynthesis bound vb ≃ 0.2 and re-
quire that ρKK/ρrad be (much) smaller than one for back-
reaction effects to be negligible, we obtain the very strin-
gent condition
. (20)
Taking the largest allowed value for L ≃ 0.1mm,
the RS fine-tuning condition Eq. (2) determines
Ls = (LL
1/3 ≃ 10−22mm ≃ 1/(106TeV) and
(L/Ls)
2 ≃ 1042 so that ys > L(L/Ls)2 ≃ 1041mm
∼ 1016Mpc. This is about 12 orders of magnitude larger
than the present Hubble scale. Also, since yb(t) ≪ L
in the low energy regime, and ys ≫ L according to the
inequality (20), the physical brane and the static brane
need to be far apart at all times otherwise back reaction
is not negligible. This situation is probably not very
realistic. We need some high energy, stringy effects to
provoke the bounce and these may well be relevant only
when the branes are sufficiently close, i.e. at a distance
of order Ls. But in this case the constraint (20) will
be violated which implies that back reaction will be
relevant. On the other hand, if we want that ys ≃ L
and back reaction to be unimportant, then Eq. (19)
implies that the bounce velocity has to be exceedingly
small, vb <∼ 10−15. One might first hope to find a way
out of these conclusions by allowing the bounce to
happen in the high energy regime. But then vb ≃ 1 and
the nucleosynthesis bound is violated since too many
zero-mode gravitons are being produced. Clearly our
low energy approach looses its justification if vb ≃ 1, but
it seems unlikely that modifications coming from the
high energy regime alleviate the bounds.
Conclusions: Studying graviton production in an AdS
braneworld we have found the following. First, the
energy density of KK gravitons on the brane behaves as
∝ 1/a6, i.e. it scales like stiff matter with the expansion
of the Universe and can therefore not serve as a can-
didate for dark matter. Furthermore, if gravity looks
four dimensional on the brane, its higher-dimensional
aspects, like the KK modes, are repelled from the brane.
Even if KK gravitons are produced on the brane they
rapidly escape into the bulk as time evolves, leaving no
traces of the underlying higher-dimensional nature of
gravity. This is likely to survive also in other warped
braneworlds when expansion can be mimicked by brane
motion.
Secondly, a braneworld bouncing at low energies is not
constrained by massless 4D gravitons and satisfies the
nucleosynthesis bound as long as vb <∼ 0.2. However,
for interesting values of the string and AdS scales
and the largest admitted bounce velocity the back
reaction of the KK modes is only negligible if the
two branes are far apart from each other at all times,
which seems rather unrealistic. For a realistic bounce
the back reaction from KK modes can most likely not
be neglected. Even if the energy density of the KK
gravitons on the brane dilutes rapidly after the bounce,
the corresponding energy density in the bulk could even
lead to important changes of the bulk geometry. The
present model seems to be adequate to address the back
reaction issue since the creation of KK gravitons happens
exclusively at the bounce. This and the treatment of
the high energy regime vb ≃ 1 is reserved for future work.
We thank Kazuya Koyama for discussions. This
work is supported by the Swiss National Science
Foundation.
[1] J. Polchinski, String theory. An introduction to the
bosonic string, Vol. I, and String theory. Superstring the-
ory and beyond, Vol. II , Cambridge University Press
(1998).
J. Polchinski, Phys. Rev. Lett. 75, 4724 (1995), hep-
th/9910219.
[2] L. Randall and R. Sundrum, Phys. Rev. Lett. 83, 3370
(1999), hep-th/9905221; 83, 4690, hep-th/9906064
[3] P. Binetruy, C. Deffayet, U. Ellwanger, and D. Langlois,
Phys. Lett. B477, 285 (2000), hep-th/9910219.
[4] M. Ruser, Phys. Rev. A73, 043811 (2006); J. Phys. A39,
6711 (2006), and references therein.
[5] J. Khoury, P. Steinhardt and N. Turok, Phys. Rev. Lett.
92, 031302 (2004), hep-th/0307132; Phys. Rev. Lett. 91
161301 (2003), astro-ph/0302012.
[6] R. Durrer and F. Vernizzi, Phys.Rev.D66 083503 (2002),
hep-ph/0203275; C. Cartier, E. Copeland and R. Durrer,
Phys. Rev. D67, 103517 (2003), hep-th/0301198.
[7] R. Maartens, Living Rev. Rel. 7, 7 (2004), gr-qc/0312059.
[8] R. Durrer, Braneworlds, at the XI Brazilian School
of Cosmology and Gravitation, Edt. M. Novello and
S.E. Perez Bergliaffa, AIP Conference Proceedings, 782
(2005), hep-th/0507006.
[9] M. Ruser and R. Durrer, Phys. Rev. D 76, 104014 (2007),
arXiv:0704.0790.
[10] M. Minamitsuji, M. Sasaki and D. Langlois, Phys. Rev.
D71, 084019 (2005).
[11] C. Cartier, R. Durrer and M. Ruser, Phys. Rev. D72,
104018 (2005).
[12] M. Maggiore. Phys. Rept. 331, 283 (2000).
ABSTRACT
  In braneworld cosmology the expanding Universe is realized as a brane moving
through a warped higher-dimensional spacetime. Like a moving mirror causes the
creation of photons out of vacuum fluctuations, a moving brane leads to
graviton production. We show that, very generically, Kaluza-Klein (KK)
particles scale like stiff matter with the expansion of the Universe and can
therefore not represent the dark matter in a warped braneworld. We present
results for the production of massless and KK gravitons for bouncing branes in
five-dimensional anti de Sitter space. We find that for a realistic bounce the
back reaction from the generated gravitons will be most likely relevant. This
letter summarizes the main results and conclusions from numerical simulations
which are presented in detail in a long paper [M.Ruser and R. Durrer, Phys.
Rev. D 76, 104014 (2007), arXiv:0704.0790]

<|endoftext|><|startoftext|>
Bounds on Negativity of Superpositions
Yong-Cheng Ou and Heng Fan
Institute of Physics, Chinese Academy of Sciences, Beijing 100080, People’s Republic of China
The entanglement quantified by negativity of pure bipartite superposed states is studied. We
show that if the entanglement is quantified by the concurrence two pure states of high fidelity to one
another still have nearly the same entanglement. Furthermore this conclusion can be guaranteed by
our obtained inequality, and the concurrence is shown to be a continuous function even in infinite
dimensions. The bounds on the negativity of superposed states in terms of those of the states being
superposed are obtained. These bounds can find useful applications in estimating the amount of
the entanglement of a given pure state.
PACS numbers: 03.67.Mn, 03.65.Ta, 03.65.Ud
Quantum entanglement plays an important role both
in many aspects of quantum information theory[1] and in
describing quantum phase transition in quantum many-
body systems[2, 3]. As such characterization quantifi-
cation of quantum entanglement is a fundamental issue.
Consequently the legitimate measures of entanglement
are desirable as a first step. The existing well-known bi-
partite measure of entanglement with an elegant formula
is the concurrence derived analytically by Wootters[4]
and the entanglement of formation[5, 6] is a monoton-
ically increasing function of the concurrence. In general
for a multipartite or higher-dimensional system it is a
formidable task of quantifying its entanglement since it
needs complicate convex-roof extension. In the last 10
years some important properties of quantum entangle-
ment were found, one of which is the monogamy prop-
erty described by Coffman-Kundu-Wootters inequality in
terms of concurrence [7]. In our previous work we have
shown that the monogamy inequality can not general-
ize to higher-dimensional systems[8] and established a
monogamy inequality in terms of negativity giving a dif-
ferent residual entanglement[9].
On the other hand, quantum entanglement is a direct
consequence of the superposition principle. It is an in-
teresting physical phenomenon that the superposition of
two separable states may give birth to an entangled state,
on the contrary, the superposition of two entangled states
may give birth to a separable state. The relation between
the entanglement of the state and the entanglement of the
individual terms that by superposition yield the state has
been studied, where the entanglement is quantified by the
von Neumann entropy[10] and the concurrence[12]. Re-
cently it was generalized to the superposition of more
than two components[13]. If the entanglement is quan-
tified by negativity, it would be interesting to establish
the analogous relation and obtain the bound of entangle-
ment for the superposition state. In this paper, we first
show that, by contrast to the von Neumann entropy, the
concurrence is a continuous function even in infinite di-
mensions. We deduce an inequality to guarantee this
property. Next we give the bounds of the negativity of
the superposition state. The discussion and conclusion
are presented in the end.
The authors in[10] have shown that two states of high
fidelity to one another may not have the same entan-
glement, i.e., |〈ψ|φ〉|2 → 1 may not generally result in
E(ψ) → E(φ), where E is the von Neumann entropy.
For a bipartite pure state |Φ〉AB the von Neumann en-
tropy is defined as
E(ΦAB) ≡ S(TrB|Φ〉AB〈Φ|) = S(TrA|Φ〉AB〈Φ|), (1)
where S(ρ) = −Tr(ρ log ρ), and the concurrence is de-
fined as
C(ΦAB) ≡
2 (1− Trρ2
, (2)
where ρA = TrB|Φ〉AB〈Φ| with the eigenvalues µi. How-
ever, if we employ the concurrence to quantify the entan-
glement, |〈ψ|φ〉|2 → 1 must result in C(ψ) → C(φ). Let
us see their example letting
|φ〉AB = |00〉, (3)
|ψ〉AB =
1− ǫ|φ〉AB +
[|11〉+ |22〉+ · · ·+ |dd〉]. (4)
It is obviously true that E(φAB) = C(φAB) = 0, while
according to [10] the von Neumann entropy of the state
|ψ〉AB is
E(ψAB) ≈ ǫ log2 d→ ∞, (5)
specially when d is as large as we expect. It follows from
Eq.(2) that the concurrence of the state |ψ〉AB give us
the result
C2(ψAB) = 2
2ǫ− ǫ2 − ǫ
→ 0, (6)
when ǫ is adequately small. By contrast to E(ψAB) in
Eq.(5), C2(ψAB) in Eq.(6) is independent of d. Note that
when ǫ is small the two states have high fidelity |〈ψ|φ〉|2 =
1 − ǫ → 1. Comparing Eq.(5) to Eq.(6), we can draw a
http://arxiv.org/abs/0704.0757v1
conclusion that if the entanglement is quantified by the
concurrence two states of high fidelity to one another still
have nearly the same entanglement.
It is indeed that the difference of the von Neumann en-
tropy between two pure states of fixed dimension can be
bounded using Fannes’ inequality[11], while the von Neu-
mann entropy is not a continuous function and no such
bound applies in infinite dimensions. However, as we will
show here, a similar bound still works if the entanglement
is quantified by the concurrence and the concurrence is
a continuous function even in infinite dimensions. In or-
der to explain our above viewpoint we present the fol-
lowing Theorem which is similar to the original Fannes’
inequality except that the entanglement is quantified by
the concurrence.
Theorem 1. Suppose ρAB and σAB are density matri-
ces of two bipartite pure states in arbitrary dimensions.
For the trace distance T (ρA, σA) ≡ Tr|ρA − σB| between
ρA = TrBρAB and σA = TrBσAB we have
|C2(ρAB)− C2(σAB)| ≤ 4T (ρA, σA). (7)
Proof. Let r1 ≥ r2 ≥ · · · ≥ rd be the eigenvalues of
ρA, in decreasing order, and s1 ≥ s2 ≥ · · · ≥ sd be the
eigenvalues of σA, also in decreasing order. According
to[1], it follows that
|ri − si| ≤ T (ρA, σA). (8)
From the observation of the definition of the concurrence
in Eq.(2), we can rewrite the left-hand-side of Eq.(7) as
∣∣C2(ρAB)− C2(σAB)
∣∣ = 2
∣∣∣∣∣
(r2i − s2i )
∣∣∣∣∣
∣∣r2i − s2i
|ri + si||ri − si|
|ri − si|. (9)
The second formula is obtained from the observation that
|a+ b+ · · ·+ k| ≤ |a| + |b| + · · · + |k| for any complex
quantities a, b, · · ·, k. In the derivation of the last formula
we have taken into account the fact that ri + si ≤ 2
since each eigenvalue of ri and si is not greater than one.
Combining Eqs.(8) and (9) can give Eq.(7). Thus the
proof is completed.
From the Theorem 1 it can be seen that the difference
of the concurrences of two pure states is a function of
fidelity and can be bounded by Eq.(7). What’s more, by
contrast to the von Neumann entropy[10] the concurrence
is a continuous function and such a bound still works in
infinite dimensions. Note that whether a similar bound
in Eq.(7) holds for the negativity is still open. In the next
paragraphs we are devoted to deducing the bounds on the
negativity of any bipartite pure state as a superposition
of two terms |Γ〉AB = α|Ψ〉+ β|Φ〉.
Before embarking on this study, we first recall some
basic definitions of the negativity. As for detecting en-
tangled state in higher-dimensional Hilbert space, Peres-
Horodecki criterion based on partial transpose[15, 16] is
a convenient method. Given a density matrix ρ in a bi-
partite pure system of A and B, the partial transpose
with respect to A subsystem is described by (ρTA)ij,kl =
(ρ)kj,il and the negativity is defined as
N = 1
(‖ρTA‖ − 1). (10)
The trace norm ‖R‖ is given by ‖R‖ = Tr
RR†. Note
that N > 0 is the necessary and sufficient condition for
entangled bipartite pure states.
There are two key ingredients to obtain the bounds
of the negativity for bipartite superposition pure states.
One is that the negativity can be expressed by means of
Schmidt coefficients of a pure state. Suppose that a pure
m⊗ n(m ≤ n) quantum state has the standard Schmidt
form |ψ〉AB =
µi|aibi〉, where
µi(i = 1, · · · ,m)
are the Schmidt coefficients, ai and bi are the orthogonal
basis in HA and HB, respectively. For the pure bipartite
state we can derive ‖ρTA‖ =
[18], and there-
fore Eq.(10) can be reexpressed as
N = 1
 . (11)
In order for the later use we can transform Eq.(11) into
= 2N + 1. (12)
The other is the Theorem[17], which states that for any
two Hermitian matrix H and K defined in Cn×n,
µi(H) + µ1(K) ≤ µi(H +K) ≤ µi(H) + µn(K), (13)
holds, where µi(·) are the eigenvalues in increasing order.
If µ1(K) ≥ 0, from Eq.(13) it is easy to check that
µi(H) ≤
µi(H +K) ≤
µi(H) +
µn(K), (14)
holds also. Then Eq.(14) will be used repeatedly in what
follows.
For the negativity of the arbitrary superposition state
let us first see the simplest case in which two bi-
partite states we are superposing, Φ1 and Ψ1, are
biorthogonal[10], i.e., Φ1Ψ
= Ψ1Φ
= 0[12]. Since the
matrix representation of a reduced density matrix will
be used, we explain the corresponding notations in the
following. For the pure state |Φ〉AB defined in m ⊗ n
dimensions, generally it can be considered as a vector:
|Φ〉AB = [a00, a01, · · · , a0m, a10, a11, · · · , amn]T with the
superscript T denoting transpose operation. With the
matrix notation, the reduced density matrix reads
ρA = ΦΦ
†, (15)
whose eigenvalues are µi appearing in Eq.(11).
Theorem 2. Suppose that two biorthogonal pure states
Φ1 and Ψ1, which are defined in m ⊗ n(n ≤ m) di-
mensions. The negativity of their superposed states
Γ1 = αΦ1 + βΨ1 with |α2|+ |β|2 = 1 satisfies
2|α|2N (Φ1) + 2|β|2N (Ψ1)− 1
≤ N (αΦ1 + βΨ1) ≤
2|α|2Ñ (Φ1) + 2|β|2Ñ (Ψ1)− 1
, (16)
where
Ñ (Φ1) = N (Φ1) +
µn(Ψ1)[2N (Φ1) + 1]
|α| +
n2|β|2µn(Ψ1)
2|α|2 ,
Ñ (Ψ1) = N (Ψ1) +
µn(Φ1)[2N (Ψ1) + 1]
|β| +
n2|α|2µn(Φ1)
2|β|2 .
Proof. From Eq.(15) the reduced density matrix of the state Γ1 can read
= |α|2Φ1Φ†1 + |β|2Ψ1Ψ
+ αβ∗Φ1Ψ
+ α∗βΨ1Φ
. (17)
The biorthogonal condition with Φ1Ψ
= 0 and Ψ1Φ
0 makes Eq.(17) reduce to
= |α|2Φ1Φ†1 + |β|2Ψ1Ψ
. (18)
Substituting Eq.(18) into the left inequality of Eq.(13)
we have
|α|2µi(Φ1Φ†1) + |β|2µ1(Ψ1Ψ
) ≤ µi(Γ1Γ†1). (19)
Since Ψ1Ψ
is positive semidefinite, µ1(Ψ1Ψ
) ≥ 0. Thus
Eq.(19) becomes
|α|2µi(Φ1Φ†1) ≤ µi(Γ1Γ
). (20)
Taking the square root of both sides in Eq.(20) and the
sum of
µi(·) over all index i, we have
µi(Φ1Φ
µi(Γ1Γ
). (21)
In a similar way, substituting Eq.(18) into the right in-
equality of Eq.(14) and taking the sum of
µi(·) over all
index i, we have
µi(Γ1Γ
) ≤ |α|
µi(Φ1Φ
) + n|β|
µn(Ψ1Ψ
Substituting Eqs.(21) and (22) into Eq.(12), respectively,
we can obtain
|α|2N (Φ1) +
|α|2 − 1
≤ N (αΦ1 + βΨ1)
≤ |α|2Ñ (Φ1) +
|α|2 − 1
.(23)
If we replace the matrix |α|2Φ1Φ†1 with |β|2Ψ1Ψ
Eqs.(20) and (21), i.e., equivalently exchange the ma-
trixes H and K in Eq.(14), finally we can also obtain
|β|2N (Ψ1) +
|β|2 − 1
≤ N (αΦ1 + βΨ1)
≤ |β|2Ñ (Ψ1) +
|β|2 − 1
.(24)
Then combining Eqs.(23) and (24) gives Eq.(16). Thus
the proof is completed.
Note that the lower bound in Eq.(16) can provide a
nonzero value only when 2|α|2N (Φ1) + 2|β|2N (Ψ1) > 1.
Next we show an example to illustrate the validity of our
bound. Consider the state
|φ〉AB = α|ϕ〉AB + β|ψ〉AB , (25)
|ϕ〉AB =
|00〉+ 1√
|11〉, (26)
|ψ〉AB =
|22〉+ 1√
|33〉, (27)
where α = β = 1/
2. It is easy to check that |ϕ〉AB and
|ψ〉AB are biorthogonal, N (|φ〉AB) = 3/2, N (|ϕ〉AB) =
N (|ψ〉AB) = 1/2, and µ4(|ϕ〉AB) = µ4(|ψ〉AB) = 1/2.
Accordingly from Eq.(16) we obtain the lower and upper
bounds
0 < N (|φ〉AB) =
< 4, (28)
which work well.
Finally we directly present the main Theorem of this
paper, in which the two states being superposed can be
biorthoganal, orthogonal, or nonorthogonal.
Theorem 3. Suppose that two arbitrary normalized
pure states Φ2 with rank r1 and Ψ2 with rank r2, which
are defined in any dimensions. The negativity of their
superposed states Γ2 = αΦ2 + βΨ2 with rank r3 and
|α2|+ |β|2 = 1 satisfies
2‖α|Φ2〉+ β|Ψ2〉‖2N (αΦ2 + βΨ2) ≤ 2|α|2Ñ (Φ2) + 2|β|2Ñ (Ψ2)− ‖α|Φ2〉+ β|Ψ2〉‖2 + 1, (29)
where
Ñ (Φ2) = N (Φ2) +
µn(Ψ2)[2N (Φ2) + 1]
|α| +
r2|β|2µn(Ψ2)
2|α|2 ,
Ñ (Ψ2) = N (Ψ2) +
µn(Φ2)[2N (Ψ2) + 1]
|β| +
r2|α|2µn(Φ2)
2|β|2 ,
where r = max{r1, r2, r3}.
Proof. Consider the matrix
M = |α|2Φ2Φ†2 + |β|2Ψ2Ψ
, (30)
which can be rewritten as
‖Γ2‖2
Γ̂2(Γ̂2)
)†, (31)
where Γ−
= αΦ2 − βΨ2, Γ̂2 = Γ2/‖Γ2‖, and Γ̂−2 =
‖. Thus Eqs.(13) shows that
|α|2µi(Φ2Φ†2) + |β|2µ1(Ψ2Ψ
≤ µi(M) ≤ |α|2µi(Φ2Φ†2) + |β|2µn(Ψ2Ψ
), (32)
‖Γ2‖2
Γ̂2Γ̂
≤ µi(M) ≤
‖Γ2‖2
Γ̂2Γ̂
.(33)
Since µ1(Ψ2Ψ
) ≥ 0 and µ1
≥ 0, observing
the left inequality of Eq.(33) and the right inequality in
Eq.(32) we have
‖Γ2‖√
Γ̂2Γ̂
≤ |α|
µi(Φ2Φ
) + |β|
µn(Ψ2Ψ
Substituting Eqs.(34) into Eq.(12) we have
‖α|Φ2〉+ β|Ψ2〉‖2N (αΦ2 + βΨ2)
≤ 2|α|2Ñ (Φ2)−
‖α|Φ2〉+ β|Ψ2〉‖2
+ |α|2. (35)
Likewise, if we replace the two matrixes |α|2Φ2Φ†2 with
|β|2Ψ2Ψ†2 in Eq.(32), we can obtain
‖α|Φ2〉+ β|Ψ2〉‖2N (αΦ2 + βΨ2)
≤ 2|β|2Ñ (Ψ2)−
‖α|Φ2〉+ β|Ψ2〉‖2
+ |β|2. (36)
Combining Eqs.(35) and (36) gives Eq.(29). Thus the
proof is completed.
Since there exists a extra term of the maximal eigen-
value in the second inequality in Eq.(33), generally it
is difficult to achieve the universal formula for the lower
bound of the negativity in this case. But it is our interest
in the future work.
In conclusion, we have shown that if the entanglement
is quantified by the concurrence two pure states of high
fidelity to one another still have nearly the same en-
tanglement and obtained an inequality that can guaran-
tee that the concurrence is a continuous function even
in infinite dimensions. However, whether the similar
property can apply to the negativity is still open. The
bounds on the negativity of superposed states in terms
of those of the states being superposed were obtained.
So far some bounds of the wildly-studied measures of
entanglement like the von Neumann entropy[10], the
concurrence[12] and the negativity in this paper for the
superposition states have been provided. In view of that
the concurrence can be directly accessible in laboratory
experiment[19], these bounds can find useful applications
in estimating the amount of the entanglement of a given
pure state.
The author Y.C.O. was supported from China Post-
doctoral Science Foundation and the author H.F. was
supported by ’Bairen’ program NSFC grant and ’973’
program (2006CB921107).
[1] M. A. Nielsen and I. L. Chuang, Quantum Computation
and Quantum Information (Cambridge University Press,
Cambridge, 2000).
[2] A. Osterloh, L. Amico, G. Falci, and R. Fazio Na-
ture(London) 416, 608(2002).
[3] L. A. Wu, M. S. Sarandy, and D. A. Lidar, Phys. Rev.
Lett. 93, 250404(2004).
[4] W. K. Wootters, Phys. Rev. Lett. 80, 2245(1998).
[5] C. H. Bennett, D. P. DiVincenzo, J. A. Smolin, and W.
K. Wootters, Phys. Rev. A 54, 3824(1996).
[6] S. Hill and W. K. Wootters, Phys. Rev. Lett. 78,
5022(1997).
[7] V. Coffman, J. Kundu, and W. K. Wootters, Phys. Rev.
A 61, 052306(2000).
[8] Y. C. Ou, Phys. Rev. A 75, 034305(2007).
[9] Y. C. Ou and H. Fan, quant-ph/0702127.
[10] N. Linden, S. Popescu, and J. A. Smolin, Phys. Rev. Lett.
97, 100502(2006).
[11] M. Ohya and D. Petz, Quantum Entropy and Its Use
(Springer-Verlag, Berlin 1983); see also Ref.(1).
[12] C. S. Yu, X. X. Yi, and H. S. Song, Phys. Rev. A 75,
022332(2007).
[13] Y. Xiang, S. J. Xiong, and F. Y. Hong,
quant-ph/0701188.
[14] G. Vidal and R. F. Werner, Phys. Rev. A 65,
032314(2002).
[15] A. Peres, Phys. Rev. Lett. 77, 1413(1996).
[16] M. Horodecki, P. Horodecki, and R. Horodecki, Phys.
Lett. A 223, 1(1996).
[17] R. A. Horn and C. R. Johnson, Matrix Analysis (Cam-
bridge University Press, New York, 1985), see Theorem
4.3.1.
[18] K. Chen, S. Albeverio, and S. M. Fei, Phys. Rev. Lett.
95, 040504(2005).
[19] S. P. Walborn, P. H. Souto Ribeiro, L. Davidovich,
F. Mintert, and A. Buchleitner, Nature(London) 440,
1022(2006).
http://arxiv.org/abs/quant-ph/0702127
http://arxiv.org/abs/quant-ph/0701188
ABSTRACT
  The entanglement quantified by negativity of pure bipartite superposed states
is studied. We show that if the entanglement is quantified by the concurrence
two pure states of high fidelity to one another still have nearly the same
entanglement. Furthermore this conclusion can be guaranteed by our obtained
inequality, and the concurrence is shown to be a continuous function even in
infinite dimensions. The bounds on the negativity of superposed states in terms
of those of the states being superposed are obtained. These bounds can find
useful applications in estimating the amount of the entanglement of a given
pure state.

<|endoftext|><|startoftext|>
Halder_articleFigureArxiv
Entangling Independent Photons by Time Measurement 
Matthäus Halder, Alexios Beveratos, Nicolas Gisin, Valerio Scarani, Christoph Simon 
& Hugo Zbinden  
Group of Applied Physics, University of Geneva, 20, rue de l'Ecole-de-Médecine, 1211 
Geneva 4, Switzerland 
A quantum system composed of two or more subsystems can be in an entangled 
state, i.e. a state in which the properties of the global system are well defined but 
the properties of each subsystem are not. Entanglement is at the heart of quantum 
physics, both for its conceptual foundations and for applications in information 
processing and quantum communication. Remarkably, entanglement can be 
“swapped”: if one prepares two independent entangled pairs A1-A2 and B1-B2, a 
joint measurement on A1 and B1 (called a “Bell-State Measurement”, BSM) has 
the effect of projecting A2 and B2 onto an entangled state, although these two 
particles have never interacted or shared any common past1,2. Experiments using 
twin photons produced by spontaneous parametric down-conversion (SPDC) have 
already demonstrated entanglement swapping3-6, but here we present its first 
realization using continuous wave (CW) sources, as originally proposed2. The 
challenge was to achieve sufficiently sharp synchronization of the photons in the 
BSM. Using narrow-band filters, the coherence time of the photons that undergo 
the BSM is significantly increased, exceeding the temporal resolution of the 
detectors. Hence pulsed sources can be replaced by CW sources, which do not 
require any synchronization6,7, allowing for the first time the use of completely 
autonomous sources. Our experiment exploits recent progress in the time precision 
of photon detectors, in the efficiency of photon pair production by SPDC with 
waveguides in nonlinear crystals8, and in the stability of narrow-band filters. This 
approach is independent of the form of entanglement; we employed time-bin 
entangled photons9 at telecom wavelengths. In addition to entangling photons from 
autonomous sources, a fundamental quantum phenomenon, our setup is robust 
against thermal or mechanical fluctuations in optical fibres thanks to cm-long 
coherence lengths. The present experiment is thus an important step towards real-
world quantum networks with truly independent and distant nodes. 
The BSM is the essential element in an entanglement-swapping experiment. 
Linear optics allows the realization of only a partial BSM10 by coupling the two 
incoming modes on a beam-splitter (BS) and observing a suitable detection pattern in 
the outgoing modes. Such a measurement is successful in at most 50% of the cases. 
Still, a successful partial BSM entangles two photons that were, up to then, independent. 
The physics behind this realization is the bosonic character of photons, it is therefore 
crucial that the two incoming photons are indistinguishable: they must be identical in 
their spectral, spatial, polarization and temporal modes at the BS: Spectral overlap is 
achieved by the use of similar filters, spatial overlap by the use of single-mode optical 
fibres and polarization is matched by a polarization controller. In addition, the temporal 
resolution must be unambiguous: detection at a time t ± ∆td, with ∆td the temporal 
resolution of the detector, must single out a unique time mode. In previous experiments, 
synchronised pulsed sources created both the photons at the same time and path lengths 
had to be matched to obtain the required temporal overlap. The pulse length, i.e. the 
coherence length of the photons, was τc << ∆td (typically τc <1ps), but two subsequent 
pulses were separated by more than ∆td11. The drawback of such a realization is that the 
two sources cannot be totally autonomous, because of the indispensable 
synchronization. Here, by using stable narrow-band filters and detectors with low jitter, 
we reach the regime where τc > ∆td12. In this case, the detectors always single out a 
unique time mode. As a benefit, we can give up the pulsed character of the sources and 
the synchronization between them, realizing for the first time the entanglement 
swapping scheme as originally proposed in Ref.2. 
The experimental scheme is sketched in Fig.1. Each of the two non-linear crystals 
emits pairs of energy-time entangled photons13 produced by SPDC of a photon 
originating from a CW laser. A pair can be created at any time t, and all these processes 
are coherent within the km-long coherence length of the laser: 
tt,ψ describes a pair of signal and idler photons emitted by source A. Thus, 
the state produced by two independent sources can conveniently be represented as 
( )∑ ∑ 
++++++∝=Ψ
BABABABAprep
tttttttttttt
,,,,,,
ττττψψ .
The first term in the above sum describes 4 photons all arriving at the same time t at a 
BS. Since for this case two identical photons bunch in the same mode, due to their 
bosonic nature, this term leads to a Hong-Ou-Mandel (HOM) dip14. The second term 
describes two photon pairs arriving with a time difference τ>0. The two photons A1 and 
B1 are sent through a 50/50 BS. This fibre-coupler and the two detectors behind it 
realize a partial BSM10: in particular, when one of the detectors fires at time t and the 
other one at time t+τ, this corresponds to a measurement of the Bell-state −Ψ  for A1 
and B19. In consequence the remaining two photons A2 and B2 are projected in the 
state 
22222222 BABABABA
tttt ττψ +−+∝Ψ≡ − , which is a singlet state for 
time-bin entanglement. Hence entanglement has been swapped. This process can be 
seen as teleportation15-19 of entanglement. It can be tested by sending the photons in 
unbalanced interferometers such that the path difference between the two arms 
corresponds to τ. Interference between temporally distinguishable events (at t and t+τ, 
respectively) is obtained by erasing the time information via unbalanced 
interferometers9,12,20 as shown in Fig.1. Note that the value of τ varies from one 
successful entanglement swapping event to another. As in our experiment the path 
differences of the analysing interferometers are fixed, we test the entanglement of the 
swapped pairs produced with one fixed τ.  
We now describe our experiment in more detail. Above we have assumed that the 
detection times t and t+τ of the BSM are sharply defined. In physical terms, this 
requirement means that the detection times have to be determined with sub-coherence-
time precision: this is the key ingredient that makes it possible to achieve 
synchronisation of photons A1 and B1 by detection, thus to use CW sources. Since 
single-photon detectors have a certain intrinsic minimal jitter, the coherence length of 
the photons has to be increased to exceed this value by narrow filtering.  
Consider the case where each of the two sources emits one entangled pair of 
photons, and where A1 and B1 take different exits of the BS. The photon that takes 
output port 1 is detected by a NbN superconducting single-photon detector (SSPD)21 
with a time resolution ∆td = 74ps. The photon in output port 2 is detected by an InGaAs 
single photon avalanche diode (APD, ∆td = 105ps) triggered by the detection in the 
SSPD. The time resolution of these detectors is several times smaller than that of 
commercial telecommunication photon detection modules. To enable synchronization of 
the photons at the BS by post-selection, the coherence length of the photons has to 
exceed ∆td. This is achieved by using filters of 10pm bandwidth, corresponding to a 
coherence time τc of 350ps. We are able to tolerate the losses due to filtering because 
we use cm-long wave guides in PPLN crystals with a high down-conversion efficiency 
of 5*10-7 per pump photon and per nm of the created spectrum. For 2mW of laser 
power, an emission flux q of 2*10-2 pairs per coherence time is obtained. This q is 
independent of the filtered bandwidth: in fact, narrower filtering decreases the number 
of photons per second but increases their coherence time by the same factor, hence 
keeping q constant. 
Any two-detector click in the BSM prepares the two remaining photons in a time-
bin entangled state. In our experiment the creation rate for such entangled photon pairs 
is ≈104 per second, with time delays τ ranging up to 10ns. This is two orders of 
magnitude larger than in previous experiments at shorter and similar wavelengths3-6. As 
the probability of both the pairs originating from different sources equals the probability 
of creating them in the same source, the first cases have to be post selected by 
considering only 4-fold-events. Furthermore only one fixed τ is tested. The resulting 
rate is smaller by two orders of magnitude compared to the creation rate. To verify their 
entanglement, the two photons are sent through unbalanced interferometers (a and b) in 
Michelson configuration. The path length differences of the interferometers must be 
identical only within the coherence length of the analyzed photons (7cm), but stable in 
phase (α and β): this is achieved by active stabilization22. On each side, both output 
ports of the interferometer are connected to InGaAs APD, triggered by the detection of 
both the photons in the BSM.  
Four-fold coincidences, between one click in each BSM detector and one behind 
each interferometer, are registered by a multistop time to digital converter (TDC) and 
the arrival times (t, t+τ) are stored in a table. For τ = 0, we observe a decrease in this 
coincidence count rate (see Fig.2). The visibility of this HOM dip of 77% indicates the 
degree of indistinguishability of the two photons A1 and B1 and could be further 
improved by increasing τc/∆td. The width of the dip corresponds to the convolution of τc 
for the two photons with the jitter of the detectors. Note that photons which are detected 
after the BS at measurable different times, but within τc, do still partially bunch, which 
confirms that the relevant time precision is set by the coherence time of the photons. 
To test for successful entanglement swapping, the relative phase α-β between the 
interferometers is changed by keeping α fixed and scanning β. As usual for the analysis 
of time-bin entanglement9, interference is observable in the case where, at the output of 
the interferometers, both photons are detected at the same time.  
We measured the four possible 4-fold coincidence count rates ),( βαijR  (clicks in 
two outer detectors conditioned on a successful BSM) with { }−+∈ ,, ji  the different 
detectors behind interferometer a and b, respectively. Thus the two-photon spin-
correlation coefficient 
),(),(),(),(
),(),(),(),(
βαβαβαβα
βαβαβαβα
−−+−−+++
−−+−−+++
E  is 
obtained as a function of the phase settings α and β and plotted in Fig.3 for α fixed. A fit 
of the form )cos(),( βαβα −= VE  to our experimental data gives a visibility V=0.63. 
If one assumes that the two photons are in a Werner state (which corresponds to white 
noise), one can show that 31>V  is sufficient to demonstrate entanglement
5,23. Our 
experimental visibility clearly exceeds this bound24. The plain squares show that the 3-
fold coincidence count rate between a successful BSM and only one of the outside 
detectors is independent of the phase setting, as expected for a −Ψ -state. V is limited by 
imperfections in the matching of wavelengths, polarisations and temporal 
synchronisation. In our setup, the latter is the main source of errors. The integration 
time of this measurement was 1 hour for each of the 13 phase settings and the 
experiment was run 8 times, hence took 104 hours, which demonstrates the stability of 
our setup. Such long integration times are necessary because of low count rates (5 four-
fold coincidences per hour), which are mainly due to poor coupling efficiencies of the 
photons into optical fibres, losses in optical components like filters and interferometers, 
as well as the limited detectors efficiencies. All these factors decrease the probability of 
detecting all four photons of a two-pair event. 
Exploiting all the produced entangled pairs with different delays τ is possible in 
principle using rapidly adjustable delays in the interferometers or quantum memories. 
This would be an important step towards the realization of recent proposals for long-
distance quantum communication25. Time-bin entanglement is particularly stable and 
well suited for fibre optic communications26, and the coherence length of 7cm allows 
tolerating significant fiber length fluctuations as expected in field experiments. If 
additionally, count rates are further improved, long distance applications like quantum 
relays27,28 become realistic. 
In conclusion, we realized an entanglement swapping experiment with completely 
autonomous CW sources. This is possible thanks to the low jitter of new NbN 
superconducting and InGaAs avalanche single-photon detectors and to the long 
coherence length of the created photon pairs after narrow-band filtering. The setup does 
not require any synchronization between the sources and is highly stable against length 
fluctuations of the quantum channels. 
Methods 
Schematic description of the setup. Both sources consist of an external cavity diode 
laser in CW mode at 780.027nm (Toptica DL100), stabilized against a Rubidium 
transition (85Rb F = 3), pumping a nonlinear periodically poled Lithium Niobate 
waveguide8 (PPLN, HC photonics Corp) at a power of 2mW. The process of SPDC 
creates 4*1011 pairs of photons per second with a spectral width of 80nm FWHM 
centered at 1560nm. The photons are emitted collinearly and coupled into a single-mode 
fiber with 25% efficiency and the remaining laser light is blocked with a silicon high-
pass filter (Si). Signal and idler photons are separated and filtered down to a bandwidth 
of 10pm by custom-made tunable phase-shifted Bragg gratings (AOS GmbH). These 
filters have a rejection of >40dB, 3dB insertion losses, and can be tuned independently 
over a range of 400pm. Once a signal photon has been filtered to ωs, the corresponding 
idler photon has a well-defined frequency ωi, due to stabilized pump wavelength and 
energy conservation in the process of SPDC (ωs + ωi = ωlaser). After filtering, the 
effective conversion efficiency for creating a photon pair within these 10pm is 5*10-9 
per pump photon. In principle, the available pump power permits us to produce narrow 
band entangled photon pairs at rates up to 3*108 pairs per second, which translates to an 
emission flux of more than 0.1 photons per coherence time. In this experiment, we 
limited the laser to 2mW, in order to reduce the probability of multiple pair creation 
which would decrease the interference visibility29.  
After the beam splitter (BS), the first photon is detected by a NbN 
superconducting single-photon detector (SSPD, Scontel) operated in free running 
mode21, with a total detection efficiency of 4.5%, 300 dark counts/sec and a timing 
resolution of 74ps, including the time jitter of both the detector and the amplification 
and discrimination electronics. The second photon is detected by an InGaAs single-
photon avalanche diode operated in Geiger mode and actively triggered by the detection 
in the SSPD. With home-made electronics this detector has a time jitter of 105ps. The 
observed HOM-dip with a visibility of 77% was obtained with two SSPD detectors, 
which were used because of their smaller time jitter. For the entanglement swapping, we 
used an APD, because of its higher efficiency, in order to shorten the integration time. 
This means that the visibility of the interference fringe in Fig.3 could further be 
increased by the use of two SSPDs, but with the drawback of longer measurement 
times. 
Photons A2 and B2 are also detected by InGaAs APDs (ID200, idQuantique). All 
the APDs have quantum efficiencies of 30% and dark count probabilities of 10-4 per ns. 
The interferometers are actively stabilized against a laser locked on an atomic transition, 
have a path length difference of 1.2ns and insertion losses of 4dB each.  
1. Yurke, B. & Stoler, D. Bell’s-inequality experiments using independent-particle 
sources. Phys. Rev. A 46 2229 (1992). 
2. Żukowski M., Zeilinger A., Horne M. A. & Ekert A. K. ”Event-ready-detectors” Bell 
experiment via entanglement swapping. Phys. Rev. Lett. 71, 4287-4290 (1993). 
3. Pan J.-W., Bouwmeester D., Weinfurter H. & Zeilinger A. Experimental 
Entanglement Swapping: Entangling Photons That Never Interacted. Phys. Rev. Lett. 
80, 3891-3894 (1998). 
4. Jennewein T., Weihs G., Pan J.-W. & Zeilinger A. Experimental Nonlocality Proof of 
Quantum Teleportation and Entanglement Swapping. Phys. Rev. Lett. 88, 017903 
(2002). 
5. de Riedmatten, H., Marcikic, I., Tittel, W., Zbinden, H. & Gisin, N. Long-distance 
entanglement swapping with photons from separated sources. Phys. Rev. A 71, 050302 
(2005). 
6. Yang, T. et al. Experimental Synchronization of Independent Entangled Photon 
Sources. Phys. Rev. Lett. 96, 110501 (2006). 
7. Kaltenbaek, R., Blauensteiner, B., Żukowski, M., Aspelmeyer, M., & Zeilinger A. 
Experimental interference of independent photons. Phys. Rev. Lett. 96, 240502 (2006). 
8. Tanzilli S. et al. PPLN waveguide for quantum communication. Eur. Phys. .J D 18, 
155 (2002). 
9. Brendel, J., Gisin, N., Tittel, W. & Zbinden, H. Pulsed Energy-Time Entangled Twin-
Photon Source for Quantum Communication. Phys. Rev. Lett. 82, 2597 (1999). 
10. Weinfurter, H. Experimental Bell-State Analysis. Europhys Lett. 25, 559 (1994). 
11. Bouwmeester, D. et al. Experimental quantum teleportation. Nature 390, 575 
(1997). 
12. Legero, T., Wilk, T., Henrich, M., Rempe, G. & Kuhn, A. Quantum Beat of Two 
Single Photons. Phys. Rev. Lett. 93, 070503 (2004). 
13. Franson, J.D. Bell inequality for position and time. Phys. Rev. Lett. 62, 2205 (1989). 
14. Hong, C.K., Ou, Z.Y. & Mandel, L. Measurement of Subpicosecond Time Intervals 
between Two Photons by Interference. Phys. Rev. Lett. 59, 2044 (1987). 
15. Furusawa, A. Unconditional Quantum Teleportation. Science 282, 706 (1998) 
16. Boschi, D.,Branca, S., De Martini, F., Hardy, L. & Popescu, S. Experimental 
Realization of Teleporting an Unknown Pure Quantum State via Dual Classical and 
Einstein-Podolsky-Rosen Channels. Phys. Rev. Lett. 80, 1121 (1998). 
17. Marcikic, I., de Riedmatten, H., Tittel, W., Zbinden, H. & Gisin, N. Long-distance 
teleportation of qubits at telecommunication wavelengths. Nature 421, 509 (2003). 
18. Riebe, M. et al. Deterministic quantum teleportation with atoms. Nature 429, 734 
(2004). 
19. Barrett, M. D. et al. Deterministic quantum teleportation of atomic qubits. Nature 
429, 737 (2004). 
20. Pittman, T. B. et al. Can Two-Photon Interference be Considered the Interference of 
Two Photons? Phys. Rev. Lett. 77, 1917 (1996). 
21. Milostnaya, I. et al. Superconducting single-photon detectors designed for operation 
at 1.55-µm telecommunication wavelength. J. Phys. Conference Series 43, 1334 (2006). 
22. Marcikic, I. et al. Distribution of Time-Bin Entangled Qubits over 50 km of Optical 
Fiber. Phys. Rev. Lett. 93, 180502 (2004). 
23. Peres, A. Separability Criterion for Density Matrices. Phys. Rev. Lett. 77, 1413 
(1996). 
24. In fact, in our experimental situation entanglement is present even for values of V 
that are smaller than 0.33 because the noise is dominated by phase errors due to the 
partial distinguishability of the two photons involved in the BSM. 
25. Duan, L.-M., Lukin, M.D., Cirac, J.I. & Zoller, P. Long-distance quantum 
communication with atomic ensembles and linear optics. Nature 414, 413 (2001). 
26. Thew R. T., Tanzilli S., Tittel W., Zbinden H., & Gisin N. Experimental 
investigation of the robustness of partially entangled qubits over 11 km. Phys. Rev. A 
66, 062304 (2002). 
27. Jacobs B. C., Pittman T. B. & Franson J. D. Quantum relays and noise suppression 
using linear optics. Phys. Rev. A 66, 052307 (2002). 
28. Collins, D., Gisin, N. & de Riedmatten, H. Quantum Relay for Long Distance 
Quantum Cryptography. J. Mod. Opt. 522, 735 (2005). 
29. Scarani, V., de Riedmatten, H., Marcikic, I., Zbinden, H. & Gisin, N. Four-photon 
correction in two-photon Bell experiments. Eur. Phys. J. D 32, 129 (2005). 
Acknowledgements: We thank C. Barreiro, J.-D. Gautier, G. Gol’tsman, C. Jorel, S Tanzilli and J. van 
Houwelingen for technical support, and H. de Riedmatten, S. Iblisdir and R. Thew for helpful 
discussions. Financial support by the EU projects QAP and SINPHONIA and by the Swiss NCCR 
Quantum Photonics is acknowledged. 
Author Information: Reprints and permissions information is available at 
npg.nature.com/reprintsandpermissions. The authors declare that they have no competing financial 
interests Correspondence and requests for materials should be addressed to M.H. 
(matthaeus.halder@physics.unige.ch). 
Figure 1: 
Experimental 
Setup. Two 
pairs of 
entangled 
photons (A1-
A2 and B1-B2) 
are produced, 
one by each source (A and B), and all the photons are narrowly filtered (10pm). 
One photon of each pair is sent into a 50/50 beam splitter (BS) and both 
undergo a partial Bell-State measurement (BSM). By detection in different 
output ports of BS and with a certain time delay τ the two photons A1 and B1 
are projected on the 
−Ψ -state for time bin qubits, projecting the two remaining 
photons on the 
−Ψ -state as well. The entanglement is swapped onto the 
photons A2 and B2 and can be tested by passing them through interferometers 
with phases α and β, and detecting them by single photon avalanche detectors 
(APD) in both outputs (+,-) of each interferometer. 
 Figure 2: 4-fold 
coincidence count rate as 
a function of the temporal 
delay τ. It can be seen, 
that the detection 
probability decreases if 
the two photons A1 and 
B1 arrive simultaneously 
(τ=0) at the beam splitter 
due to photon bunching, leading to a Hong-Ou-Mandel dip with 77% visibility.  
Figure3  
Figure 3: The correlation coefficient E(α,β) between photons A2 and B2, 
conditioned on a BSM of photons A1 and B1, as a function of the relative phase 
α–β of the interferometers (open points). A fit of the form )cos(),( βαβα −= VE  
gives a visibility V=0.63. This proves successful entanglement swapping (see 
text). The coincidence count rate of only one detector conditioned on a 
successful BSM (3-fold coincidence) is independent of the phase setting as 
expected for a −Ψ -state (squares).
ABSTRACT
  A quantum system composed of two or more subsystems can be in an entangled
state, i.e. a state in which the properties of the global system are well
defined but the properties of each subsystem are not. Entanglement is at the
heart of quantum physics, both for its conceptual foundations and for
applications in information processing and quantum communication. Remarkably,
entanglement can be "swapped": if one prepares two independent entangled pairs
A1-A2 and B1-B2, a joint measurement on A1 and B1 (called a "Bell-State
Measurement", BSM) has the effect of projecting A2 and B2 onto an entangled
state, although these two particles have never interacted or shared any common
past[1,2]. Experiments using twin photons produced by spontaneous parametric
down-conversion (SPDC) have already demonstrated entanglement swapping[3-6],
but here we present its first realization using continuous wave (CW) sources,
as originally proposed[2]. The challenge was to achieve sufficiently sharp
synchronization of the photons in the BSM. Using narrow-band filters, the
coherence time of the photons that undergo the BSM is significantly increased,
exceeding the temporal resolution of the detectors. Hence pulsed sources can be
replaced by CW sources, which do not require any synchronization[6,7], allowing
for the first time the use of completely autonomous sources. Our experiment
exploits recent progress in the time precision of photon detectors, in the
efficiency of photon pair production by SPDC with waveguides in nonlinear
crystals[8], and in the stability of narrow-band filters. This approach is
independent of the form of entanglement; we employed time-bin entangled
photons[9] at telecom wavelengths. Our setup is robust against thermal or
mechanical fluctuations in optical fibres thanks to cm-long coherence lengths.

<|endoftext|><|startoftext|>
arXiv:0704.0759v1  [math.AP]  5 Apr 2007
ENERGY CONSERVATION AND ONSAGER’S CONJECTURE
FOR THE EULER EQUATIONS
A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY
ABSTRACT. Onsager conjectured that weak solutions of the Euler equa-
tions for incompressible fluids in R3 conserve energy only if they have a
certain minimal smoothness, (of order of 1/3 fractional derivatives) and
that they dissipate energy if they are rougher. In this paper we prove
that energy is conserved for velocities in the function space B
3,c(N)
show that this space is sharp in a natural sense. We phrase the energy
spectrum in terms of the Littlewood-Paley decomposition and show that
the energy flux is controlled by local interactions. This locality is shown
to hold also for the helicity flux; moreover, every weak solution of the
Euler equations that belongs to B
3,c(N)
conserves helicity. In contrast,
in two dimensions, the strong locality of the enstrophy holds only in the
ultraviolet range.
1. INTRODUCTION
The Euler equations for the motion of an incompressible inviscid fluid
+ (u · ∇)u = −∇p,
(2) ∇ · u = 0,
where u(x, t) denotes the d-dimensional velocity, p(x, t) denotes the pres-
sure, and x ∈ Rd. We mainly consider the case d = 3. When u(x, t) is a
classical solution, it follows directly that the total energyE(t) = 1
|u|2 dx
is conserved. However, conservation of energy may fail for weak solutions
(see Scheffer [25], Shnirelman [24]). This possibility has given rise to a
considerable body of literature and it is closely connected with statistical
theories of turbulence envisioned 60 years ago by Kolmogorov and On-
sager. For reviews see, for example, Eyink and Sreenivasan [14], Robert
[23], and Frisch [15].
Date: April 5, 2007.
2000 Mathematics Subject Classification. Primary: 76B03; Secondary: 76F02.
Key words and phrases. Euler equations, anomalous dissipation, energy flux, Onsager
conjecture, turbulence, Littlewood-Paley spectrum.
http://arxiv.org/abs/0704.0759v1
2 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY
Onsager [22] conjectured that in 3-dimensional turbulent flows, energy
dissipation might exist even in the limit of vanishing viscosity. He sug-
gested that an appropriate mathematical description of turbulent flows (in
the inviscid limit) might be given by weak solutions of the Euler equations
that are not regular enough to conserve energy. According to this view, non-
conservation of energy in a turbulent flow might occur not only from vis-
cous dissipation, but also from lack of smoothness of the velocity. Specif-
ically, Onsager conjectured that weak solutions of the Euler equation with
Hölder continuity exponent h > 1/3 do conserve energy and that turbulent
or anomalous dissipation occurs when h ≤ 1/3. Eyink [12] proved energy
conservation under a stronger assumption. Subsequently, Constantin, E and
Titi [7] proved energy conservation for u in the Besov spaceBα3,∞, α > 1/3.
More recently the result was proved under a slightly weaker assumption by
Duchon and Robert [11].
In this paper we sharpen the result of [7]: we prove that energy is con-
served for velocities in the Besov space of tempered distributions B
3,p . In
fact we prove the result for velocities in the slightly larger spaceB
3,c(N)
Section 3). This is a space in which the “Hölder exponent” is exactly 1/3,
but the slightly better regularity is encoded in the summability condition.
The method of proof combines the approach of [7] in bounding the trilinear
term in (3) with a suitable choice of the test function for weak solutions in
terms of a Littlewood-Paley decomposition. Certain cancelations in the tri-
linear term become apparent using this decomposition. We observe that the
space B
3,c(N)
is sharp in the context of no anomalous dissipation. We give
an example of a divergence free vector field in B
3,∞ for which the energy
flux due to the trilinear term is bounded from below by a positive constant.
This construction follows ideas in [12]. However, because it is not a solu-
tion of the unforced Euler equation, the example does not prove that indeed
there exist unforced solutions to the Euler equation that live in B
3,∞ and
dissipate energy.
Experiments and numerical simulations indicate that for many turbu-
lent flows the energy dissipation rate appears to remain positive at large
Reynolds numbers. However, there are no known rigorous lower bounds
for slightly viscous Navier-Stokes equations. The existence of a weak so-
lution of Euler’s equation, with positive smoothness and that does not con-
serve energy remains an open question. For a discussion see, for example,
Duchon and Robert [11], Eyink [12], Shnirelman [25], Scheffer [24], de
Lellis and Szekelyhidi [10].
We note that the proof in Section 3 applied to Burger’s equation for 1-
dimensional compressible flow gives conservation of energy in B1/3
3,c(N)
ENERGY CONSERVATION 3
this case it is easy to show that conservation of energy can fail in B
which is the sharp space for shocks.
The Littlewood-Paley approach to the issue of energy conservation ver-
sus turbulent dissipation is mirrored in a study of a discrete dyadic model
for the forced Euler equations [4, 5]. By construction, all the interactions
in that model system are local and energy cascades strictly to higher wave
numbers. There is a unique fixed point which is an exponential global at-
tractor. Onsager’s conjecture is confirmed for the model in both directions,
i.e. solutions with bounded H5/6 norm satisfy the energy balance condition
and turbulent dissipation occurs for all solutions when the H5/6 norm be-
comes unbounded, which happens in finite time. The absence of anomalous
dissipation for inviscid shell models has been obtained in [8] in a space with
regularity logarithmically higher than 1/3.
In Section 3.2 we present the definition of the energy flux employed in
the paper. This is the flux of the Littlewood-Paley spectrum, ([6]) which
is a mathematically convenient variant of the physical concept of flux from
the turbulence literature. Our estimates employing the Littlewood-Paley de-
composition produce not only a sharpening of the conditions under which
there is no anomalous dissipation, but also provide detailed information
concerning the cascade of energy flux through frequency space. In sec-
tion 3.3. we prove that the energy flux through the sphere of radius κ is
controlled primarily by scales of order κ. Thus we give a mathematical
justification for the physical intuition underlying much of turbulence the-
ory, namely that the flux is controlled by local interactions (see, for exam-
ple, Kolmogorov [16] and also [13], where sufficient conditions for locality
were described). Our analysis makes precise an exponential decay of non-
local contributions to the flux that was conjectured by Kraichnan [17].
The energy is not the only scalar quantity that is conserved under evolu-
tion by classical solutions of the Euler equations. For 3-dimensional flows
the helicity is an important quantity related to the topological configura-
tions of vortex tubes (see, for example, Moffatt and Tsinober [21]). The
total helicity is conserved for smooth ideal flows. In Section 4 we observe
that the techniques used in Section 3 carry over exactly to considerations of
the helicity flux, i.e., there is locality for turbulent cascades of helicity and
every weak solution of the Euler equation that belongs to B
3,c(N)
conserves
helicity. This strengthens a recent result of Chae [2]. Once again our argu-
ment is sharp in the sense that a divergence free vector field in B
3,∞ can be
constructed to produce an example for which the helicity flux is bounded
from below by a positive constant.
4 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY
An important property of smooth flows of an ideal fluid in two dimen-
sions is conservation of enstrophy (i.e. the L2 norm of the curl of the ve-
locity). In section 4.2 we apply the techniques of Section 3 to the weak
formulation of the Euler equations for velocity using a test function that
permits estimation of the enstrophy. We obtain the result that, unlike the
cases of the energy and the helicity, the locality in the enstrophy cascade is
strong only in the ultraviolet range. In the infrared range there are nonlo-
cal effects. Such ultraviolet locality was predicted by Kraichnan [18] and
agrees with numerical and experimental evidence. Furthermore, there are
arguments in the physical literature that hold that the enstrophy cascade is
not local in the infrared range. We present a concrete example that exhibits
this behavior.
In the final section of this paper, we study the bilinear term B(u, v). We
show that the trilinear map (u, v, w) → 〈B(u, v), w)〉 defined for smooth
vector fields in L3 has a unique continuous extension to {B1/2
18/7,2
}3 (and a
fortiori to {H5/6}3, which is the relevant space for the dyadic model prob-
lem referred to above). We present an example to show that this result is
optimal. We stress that the borderline space for energy conservation is much
rougher than the space of continuity for 〈B(u, v), w〉.
2. PRELIMINARIES
We will use the notation λq = 2
q (in some inverse length units). Let
B(0, r) denote the ball centered at 0 of radius r in Rd. We fix a nonnegative
radial function χ belonging to C∞0 (B(0, 1)) such that χ(ξ) = 1 for |ξ| ≤
1/2. We further define
(3) ϕ(ξ) = χ(λ−11 ξ)− χ(ξ).
Then the following is true
(4) χ(ξ) +
ϕ(λ−1q ξ) = 1,
(5) |p− q| ≥ 2 ⇒ Supp ϕ(λ−1q ·) ∩ Supp ϕ(λ−1p ·) = ∅.
We define a Littlewood-Paley decomposition. Let us denote by F the
Fourier transform on Rd. Let h, h̃, ∆q (q ≥ −1) be defined as follows:
ENERGY CONSERVATION 5
h = F−1ϕ and h̃ = F−1χ,
∆qu = F−1(ϕ(λ−1q ξ)Fu) = λdq
h(λqy)u(x− y)dy, q ≥ 0
∆−1u = F−1(χ(ξ)Fu) =
h̃(y)u(x− y)dy.
For Q ∈ N we define
(6) SQ =
Due to (3) we have
(7) SQu = F−1(χ(λ−1Q+1ξ)Fu).
Let us now recall the definition of inhomogeneous Besov spaces.
Definition 2.1. Let s be a real number, p and r two real numbers greater
than 1. Then
‖u‖Bsp,r
= ‖∆−1u‖Lp +
λsq‖∆qu‖Lp
ℓr(N)
is the inhomogeneous Besov norm.
Definition 2.2. Let s be a real number, p and r two real numbers greater
than 1. The inhomogeneous Besov space Bsp,r is the space of tempered
distributions u such that the norm ‖u‖Bsp,r is finite.
We refer to [3] and [19] for background on harmonic analysis in the con-
text of fluids. We will use the Bernstein inequalities
Lemma 2.3.
‖∆qu‖Lb ≤ λ
q ‖∆qu‖La for b ≥ a ≥ 1.
As a consiquence we have the following inclusions.
Corollary 2.4. If b ≥ a ≥ 1, then we have the following continuous em-
beddings
Bsa,r ⊂ B
B0a,2 ⊂ La
6 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY
In particular, the following chain of inclusions will be used throughout
the text.
(8) H
6 (R3) ⊂ B
(R3) ⊂ B
(R3) ⊂ B
3,2(R
3. ENERGY FLUX AND LOCALITY
3.1. Weak solutions.
Definition 3.1. A function u is a weak solution of the Euler equations with
initial data u0 ∈ L2(Rd) if u ∈ Cw([0, T ];L2(Rd)), (the space of weakly
continuous functions) and for every ψ ∈ C1([0, T ];S(Rd)) with S(Rd) the
space of rapidly decaying functions, with ∇x · ψ = 0 and 0 ≤ t ≤ T , we
(u(t), ψ(t))− (u(0), ψ(0))−
(u(s), ∂sψ(s))ds =
b(u, ψ, u)(s)ds,
where
(u, v) =
u · vdx,
b(u, v, w) =
u · ∇v · w dx,
and ∇x · u(t) = 0 in the sense of distributions for every t ∈ [0, T ].
Clearly, (9) implies Lipschitz continuity of the maps t → (u(t), ψ) for
fixed test functions. By an approximation argument one can show that for
any weak solution u of the Euler equation, the relationship (9) holds for all
ψ that are smooth and localized in space, but only weakly Lipschitz in time.
This justifies the use of physical space mollifications of u as test functions
ψ. Because we do not have an existence theory of weak solutions, this is a
rather academic point.
3.2. Energy flux. For a divergence-free vector field u ∈ L2 we introduce
the Littlewood-Paley energy flux at wave number λQ by
(10) ΠQ =
Tr[SQ(u⊗ u) · ∇SQu]dx.
If u(t) is a weak solution to the Euler equation, then substituting the test
function ψ = S2Qu into the weak formulation of the Euler equation (9) we
obtain
(11) ΠQ(t) =
‖SQu(t)‖22.
ENERGY CONSERVATION 7
Let us introduce the following localization kernel
(12) K(q) =
q , q ≤ 0;
q , q > 0,
For a tempered distribution u in R3 we denote
dq = λ
q ‖∆qu‖3,(13)
d2 = {d2q}q≥−1.(14)
Proposition 3.2. The energy flux of a divergence-free vector field u ∈ L2
satisfies the following estimate
(15) |ΠQ| ≤ C(K ∗ d2)3/2(Q).
From (15) we immediately obtain
(16) lim sup
|ΠQ| ≤ lim sup
We define B
3,c(N)
to be the class of all tempered distributions u in R3 for
which
(17) lim
λ1/3q ‖∆qu‖3 = 0,
and hence dq → 0. We endow B1/33,c(N) with the norm inherited from B
Notice that the Besov spaces B
3,p for 1 ≤ p < ∞, and in particular B
are included in B
3,c(N)
As a consequence of (11) and (16) we obtain the following theorem.
Theorem 3.3. The total energy flux of any divergence-free vector field in
the class B
3,c(N)
∩ L2 vanishes. In particular, every weak solution to the
Euler equation that belongs to the class L3([0, T ];B
3,c(N)
) ∩Cw([0, T ];L2)
conserves energy.
Proof of Proposition 3.2. In the argument below all the inequalities should
be understood up to a constant multiple.
Following [7] we write
(18) SQ(u⊗ u) = rQ(u, u)− (u− SQ)⊗ (u− SQ) + SQu⊗ SQu,
where
rQ(u, u) =
hQ(y)(u(x− y)− u(x))⊗ (u(x− y)− u(x))dy,
h̃Q(y) = λ
Q+1h̃(λQ+1y).
8 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY
After substituting (18) into (10) we find
Tr[rQ(u, u) · ∇SQu]dx(19)
Tr[(u− SQ)⊗ (u− SQ) · ∇SQu]dx.(20)
We can estimate the term in (19) using the Hölder inequality by
‖rQ(u, u)‖3/2‖∇SQu‖3,
whereas
(21) ‖rQ(u, u)‖3/2 ≤
∣∣∣h̃Q(y)
∣∣∣ ‖u(· − y)− u(·)‖23dy.
Let us now use Bernstein’s inequalities and Corollary 2.4 to estimate
‖u(· − y)− u(·)‖23 ≤
|y|2λ2q‖∆qu‖23 +
‖∆qu‖23(22)
Q |y|2
Q−q d
q + λ
q(23)
≤ (λ4/3Q |y|2 + λ
Q )(K ∗ d2)(Q).(24)
Collecting the obtained estimates we find
Tr[rQ(u, u) · ∇SQu]dx
≤ (K ∗ d2)(Q)
∣∣∣h̃Q(y)
∣∣∣λ4/3Q |y|2dy + λ
λ2q‖∆qu‖23
≤ (K ∗ d2)(Q)λ−2/3Q
λ4/3q d
≤ (K ∗ d2)3/2(Q)
Analogously we estimate the term in (20)
Tr[(u− SQ)⊗ (u− SQ) ·∆SQu]dx
≤ ‖u− SQu‖23‖∆SQu‖3
‖∆qu‖23
λ2q‖∆qu‖23
≤ (K ∗ d2)3/2(Q).
This finishes the proof.
ENERGY CONSERVATION 9
3.3. Energy flux through dyadic shells. Let us introduce the energy flux
through a sequence of dyadic shells between scales −1 ≤ Q0 < Q1 < ∞
as follows
(25) ΠQ0Q1 =
Tr[SQ0Q1(u⊗ u) · ∇SQ0Q1u] dx,
where
(26) SQ0Q1 =
Q0≤q≤Q1
∆q = SQ1 − SQ0.
We will show that similar to formula (15) the flux through dyadic shells
is essentially controlled by scales near the inner and outer radii. In fact it
almost follows from (15) in view of the following decomposition
S2Q0Q1 = (SQ1 − SQ0−1)
= S2Q1 + S
Q0−1 − 2SQ0−1SQ1
= S2Q1 + S
Q0−1 − 2SQ0−1
= S2Q1 − S
Q0−1 − 2SQ0−1(1− SQ0−1)
= S2Q1 − S
Q0−1 − 2∆Q0−1∆Q0 .
Therefore
(28) ΠQ0Q1 = ΠQ1 − ΠQ0−1 − 2
Tr[∆̄Q0(u⊗ u) · ∇∆̄Q0u] dx,
where
(29) ∆̄Q0(u) =
h̄Q0(y)u(x− y) dy,
and h̄Q0(x) = F−1
ϕ(λ−1Q0−1ξ)ϕ(λ
Note that the flux through a sequence of dyadic shells is equal to the
difference between the fluxes across the dyadic spheres on the boundary
plus a small error term that can be easily estimated. Indeed, let us rewrite
the tensor product term as follows
(30) ∆̄Q0(u⊗ u) = r̄Q0(u, u) + ∆̄Q0u⊗ u+ u⊗ ∆̄Q0u,
where
r̄Q(u, u) =
h̄Q(y)(u(x− y)− u(x))⊗ (u(x− y)− u(x)) dy.
10 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY
Thus we have
Tr[∆̄Q0(u⊗ u) · ∇∆̄Q0u] dx =
Tr[r̄Q(u, u) · ∇∆̄Q0u] dx
∆̄Q0u · ∇u · ∆̄Q0u dx
We estimate the first integral as previously to obtain
Tr[r̄Q0(u, u) · ∇∆̄Q0u] dx
∣∣∣∣ ≤ dQ0(K ∗ d
2)(Q0).
As to the second integral we have
∆̄Q0u · ∇u · ∆̄Q0u dx
∣∣∣∣ =
∆̄Q0u · ∇SQ0u · ∆̄Q0u dx
≤ d2Q0(K ∗ d
2)1/2(Q0).
Applying these estimates to the flux (28) we arrive at the following con-
clusion.
Theorem 3.4. The energy flux through dyadic shells between wavenumbers
λQ0 and λQ1 is controlled primarily by the end-point scales. More precisely,
the following estimate holds
(33) |ΠQ0Q1| ≤ C(K ∗ d2)3/2(Q0) + C(K ∗ d2)3/2(Q1).
3.4. Construction of a divergence free vector field with non-vanishing
energy flux. In this section we give a construction of a divergence free
vector field in B
3,∞(R
3) for which the energy flux is bounded from below
by a positive constant. This suggests the sharpness ofB
3,c(N)
(R3) for energy
conservation. Our construction is based on Eyink’s example on a torus [12],
which we transform to R3 using a method described below.
Let χQ(ξ) = χ(λ
Q+1ξ). We define P
ξ for vectors ξ ∈ R3, ξ 6= 0 by
P⊥ξ v = v − |ξ|−2(v · ξ)ξ =
I− |ξ|−2(ξ ⊗ ξ)
for v ∈ C3 and we use v · w =
vjwj for v, w ∈ C3.
Lemma 3.5. Let Φk(x) be R3 – valued functions, such that
Ik :=
|ξ||FΦk(ξ)| dξ <∞.
ENERGY CONSERVATION 11
Let also Ψk(x) = P(e
ik·xΦk(x)) where P is the Leray projector onto the
space of divergence free vectors. Then
(34) sup
∣∣Ψk(x)− eik·x(P⊥k Φk)(x)
∣∣ ≤ 1
|k| ,
(35) sup
∣∣(S2QΨk)(x)− χ2Q(k)Ψk(x)
∣∣ ≤ c
(2π)3
where c is the the Lipschitz constant of χ(ξ)2.
Proof. First, note that for any k, ξ ∈ R3 and v ∈ C3 we have
(v · ξ)ξ
|ξ|2 +
(v · ξ)k
∣∣∣∣ ≤
|ξ| ξ +
|v||ξ + k|
|k| .
In addition, it follows that
(v · k)k
|k|2 +
(v · ξ)k
∣∣∣∣ =
|(v · (k + ξ))k|
≤ |v||ξ + k||k| .
Adding (36) and (37) we obtain
|P⊥ξ v − P⊥k v| =
(v · ξ)ξ
|ξ|2 −
(v · k)k
(v · ξ)ξ
|ξ|2 +
(v · ξ)k
∣∣∣∣ +
(v · k)k
|k|2 +
(v · ξ)k
≤ 2 |v||ξ + k||k| .
Using this inequality we can now derive the following estimate:
|Ψk(x)− eik·x(P⊥k Φk)(x)| = |F−1[P⊥ξ (FΦk)(ξ + k)− P⊥k (FΦk)(ξ + k)]|
(2π)3
|ξ + k|
|k| |(FΦk)(ξ + k)| dξ
= |k|−1 1
|ξ||(FΦk(ξ))| dξ.
12 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY
Finally, we have
|(S2QΨk)(x)− χQ(k)2Ψk(x)| = |F−1[(χQ(ξ)2 − χQ(k)2)(FΨk)(ξ)]|
(2π)3
c|ξ + k|
|(FΦk)(ξ + k)| dξ
= λ−1Q+1
(2π)3
|ξ||(FΦk)(ξ)| dξ,
where c is the the Lipschitz constant of χ(ξ)2. This concludes the proof. �
Example illustrating the sharpness of Theorem 3.3. Now we proceed to
construct a divergence free vector field in B
3,∞(R
3) with non-vanishing
energy flux. Let U(k) be a vector field U : Z3 → C3 as in Eyink’s example
[12] with
U(λq, 0, 0) = iλ
q (0, 0,−1), U(−λq, 0, 0) = iλ−1/3q (0, 0, 1),
U(0, λq, 0) = iλ
q (1, 0, 1), U(0,−λq, 0) = iλ−1/3q (−1, 0,−1),
U(λq, λq, 0) = iλ
q (0, 0, 1), U(−λq,−λq, 0) = iλ−1/3q (0, 0,−1),
U(λq,−λq, 0) = iλ−1/3q (1, 1,−1), U(−λq, λq, 0) = iλ−1/3q (−1,−1, 1),
for all q ∈ N and zero otherwise. Denote ρ(x) = F−1χ(4ξ) and A =∫
ρ(x)3 dx. Since χ(ξ) is radial, ρ(x) is real. Moreover,
ρ(x)3 dx =
(2π)3
F(ρ2)Fρ dξ
(2π)6
χ(4η)χ(4(ξ − η))χ(4ξ) dηdξ > 0.
Now let
u(x) = P
U(k)eik·xρ(x).
Note that u ∈ B1/33,∞(R3). Our goal is to estimate the flux ΠQ for the vector
field u. Define
Φk = |k|1/3U(k)ρ(x) and Ψk(x) = P(eik·xΦk(x)).
Then clearly Φk(x) and Ψk(x) satisfy the conditions of Lemma 3.5, and we
(41) u(x) =
k∈Z3\{0}
|k|−1/3Ψk(x).
ENERGY CONSERVATION 13
Now note that
Ψk1 · ∇S2QΨk2 = Ψk1 · S2QP[∇(eik·xΦk2)]
= i(Ψk1 · k2)S2QΨk2 +Ψk1 · S2QP(eik2·x∇Φk2).
In addition, the following equality holds by construction:
(43) P⊥k Φk = Φk, ∀k ∈ Z3.
Define the annulusAQ = Z
3∩B(0, λQ+2)\B(0, λQ−1). Thanks to Lemma 3.5,
for any sequences k1(Q), k2(Q), k3(Q) ∈ AQ with k1 + k2 = k3, we have
(Ψk1 · ∇S2QΨk2) ·Ψ∗k3 dx = i
(Ψk1 · k2)S2QΨk2 ·Ψ∗k3 dx+O(λ
(eik1·xΦk1 · k2)χQ(k2)2eik2·xΦk2 · e−ik3·xΦ∗k3 dx+O(λ
= i(|k1||k2||k3|)1/3A(U(k1) · k2)χQ(k2)2U(k2) · U(k3)∗ +O(λ0Q).
On the other hand, since the Fourier transform of Ψk is supported in
B(k, 1/4), we have
(Ψk1 · ∇S2QΨk2) ·Ψ∗k3 dx = 0,
whenever k1 + k2 6= k3. In addition, due to locality of interactions in this
example, (44) also holds if Aq \ {k1, k2, k3} 6= ∅ for all q ∈ N. Finally,
(Ψk1 · ∇S2QΨk2) ·Ψ∗k3 dx+
(Ψk1 · ∇S2QΨk3) ·Ψ∗k2 dx = 0,
whenever k2 /∈ AQ and k3 /∈ AQ. Hence, the flux for u can be written as
(46) ΠQ = −
k1,k2,k3∈AQ
k1+k2+k3=0
(|k1||k2||k3|)−1/3
(Ψk1 · ∇S2QΨk2) ·Ψk3 dx.
Since the number of nonzero terms in the above sum is independent of Q,
we obtain
(47) ΠQ = AΠ̃Q +O(λ
where Π̃ is the flux for the vector field U , i.e.,
(48) Π̃Q := −
k1,k2,k3∈AQ
k1+k2+k3=0
i(U(k1) · k2)χQ(k2)2U(k2) · U(k3).
14 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY
The flux Π̃Q has only the following non-zero terms (see [12] for details):
|k2|=λQ
|k3|=
i(U1(−k2 − k3) · k2)U2(k2) · U3(k3)(χQ(k2)2 − χQ(k3)2)
≥ 4(χ(1/2)2 − χ(1/
2)2),
|k2|=
|k3|=2λQ
i(U1(−k2 − k3) · k2)U2(k2) · U3(k3)(χQ(k2)2 − χQ(k3)2)
≥ 4(χ(1/
2)2 − χ(1)2).
Hence
Π̃Q ≥ 4(χ(1/2)2 − χ(1/
2)2 + χ(1/
2)2 − χ(1)2) = 4.
This together with (47) implies that
lim inf
ΠQ ≥ 4A.
4. OTHER CONSERVATION LAWS
In this section we apply similar techniques to derive optimal results con-
cerning the conservation of helicity in 3D and that of enstrophy in 2D for
weak solutions of the Euler equation. In the case of the helicity flux we
prove that simultaneous infrared and ultraviolet localization occurs, as for
the energy flux. However, the enstrophy flux exhibits strong localization
only in the ultraviolet region, and a partial localization in the infrared re-
gion. A possibility of such a type of localization was discussed in [18].
4.1. Helicity. For a divergence-free vector field u ∈ H1/2 with vorticity
ω = ∇ × u ∈ H−1/2 we define the helicity and truncated helicity flux as
follows
u · ω dx(49)
Tr [SQ(u⊗ u) · ∇SQω + SQ(u ∧ ω) · ∇SQu] dx,(50)
where u ∧ ω = u ⊗ ω − ω ⊗ u. Thus, if u was a solution to the Euler
equation, then HQ would be the time derivative of the Littlewood-Paley
helicity at frequency λQ,
SQu · SQω dx.
ENERGY CONSERVATION 15
Let us denote
bq = λ
q ‖∆qu‖3,(51)
b2 = {b2q}∞q=−1,(52)
T (q) =
q , q ≤ 0;
q , q > 0,
Proposition 4.1. The helicity flux of a divergence-free vector field u ∈ H1/2
satisfies the following estimate
(54) |HQ| ≤ C(T ∗ b2)3/2(Q).
Theorem 4.2. The total helicity flux of any divergence-free vector field in
the class B
3,c(N)
∩H1/2 vanishes, i.e.
(55) lim
HQ = 0.
Consequently, every weak solution to the Euler equation that belongs to the
class L3([0, T ];B
3,c(N)
) ∩ L∞([0, T ];H1/2) conserves helicity.
Proposition 4.1 and Theorem 4.2 are proved by direct analogy with the
proofs of Proposition 3.2 and Theorem 3.3.
Example illustrating the sharpness of Theorem 4.2. We can also construct
an example of a vector field in B
3) for which the helicity flux is
bounded from below by a positive constant. Indeed, let U(k) be a vector
field U : Z3 → C3 with
U(±λq, 0, 0) = λ−2/3q (0, 0,−1),
U(0,±λq, 0) = λ−2/3q (1, 0, 1),
U(±λq,±λq, 0) = λ−2/3q (0, 0, 1),
U(±λq,∓λq, 0) = λ−2/3q (1, 1,−1),
for all q ∈ N and zero otherwise. Denote ρ(x) = F−1χ(4ξ),A =
ρ(x)3 dx,
and let
(56) u(x) = P
U(k)eik·xρ(x).
Note that u ∈ B2/33,∞(R3). On the other hand, a computation similar to the
one in Section 3.4 yields
(57) lim inf
|HQ| ≥ 4A.
16 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY
4.2. Enstrophy. We work with the case of a two dimensional fluid in this
section. In order to obtain an expression for the enstrophy flux one can use
the original weak formulation of the Euler equation for velocities (9) with
the test function chosen to be
(58) ψ = ∇⊥S2Qω.
Let us denote by ΩQ the expression resulting on the right hand side of (9):
(59) ΩQ =
SQ(u⊗ u) · ∇∇⊥SQω
Thus,
d‖SQω‖22
= ΩQ.
As before we write
rQ(u, u) · ∇∇⊥SQω
(u− SQu)⊗ (u− SQu) · ∇∇⊥SQω
Let us denote
cq = ‖∆qω‖3,(61)
c2 = {c2q}∞q=−1,(62)
W (q) =
λ2q, q ≤ 0;
λ−4q , q > 0,
We have the following estimate (absolute constants are omitted)
|ΩQ| ≤
∣∣∣h̃Q(y)
∣∣∣ (‖∇SQu‖23|y|2 + ‖(I − SQ)u‖23)‖∇2SQω‖3dy
+ ‖(I − SQ)u‖23‖∇2SQω‖3
λ−2Q ‖SQω‖23 +
λ−2q c
λ−2q c
≤ ‖SQω‖23
λ−4Q−qc
λ2Q−qc
λ−4Q−qc
≤ ‖SQω‖23(W ∗ c2)1/2(Q) + (W ∗ c2)3/2(Q)
Thus, we have proved the following proposition.
ENERGY CONSERVATION 17
FIGURE 1. Construction of the vector field illustrating in-
frared nonlocality.
Proposition 4.3. The enstrophy flux of a divergence-free vector field satis-
fies the following estimate up to multiplication by an absolute constant
(64) |ΩQ| ≤ ‖SQω‖23(W ∗ c2)1/2(Q) + (W ∗ c2)3/2(Q).
Consequently, every weak solution to the 2D Euler equation
with ω ∈ L3([0, T ];L3) conserves enstrophy.
Much stronger results concerning conservation of enstrophy are available
for the Euler equations ([13], [20]) and for the long time zero-viscosity limit
for damped and driven Navier-Stokes equations ([9]).
Example illustrating infrared nonlocality. We conclude this section with a
construction of a vector field for which the enstrophy cascade is nonlocal in
the infrared range. Let θq = arcsin(λq−Q−2) and
(65) U lq = (cos(θq),− sin(θq)), Uhq = (sin(θq), cos(θq)),
klq = λq(sin(θq), cos(θq)), k
λ2Q+2 − λ2q(cos(θq),− sin(θq)),
see Fig. 4.2 for the case q = Q. Denote ρ(x) = δh̃(δx),A =
ρ(x)3 dx =∫
h̃(x)3 dx. Note that A > 0 and is independent of δ. Now let
(67) ulq(x) = P[U
q sin(k
q · x)ρ(x)], uhq (x) = P[Uhq sin(khq · x)ρ(x)].
(68) uq(x) = u
q(x) + u
q (x)
18 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY
for q = 0, . . . , Q, and
(69) uQ+1(x) = P[V sin(λQ+2x1)ρ(x)],
where V = (0, 1). Now define
(70) u(x) =
uq(x).
Our goal is to estimate the enstrophy flux for u. Since Fu is compactly
supported, the expression (59) is equivalent to
(71) ΩQ =
(u · ∇)S2Qω · ω dx.
It is easy to see that
(72) ΩQ ≥
(uhq · ∇)S2Q(∇⊥ · ulq)(∇⊥ · uQ+1) dx.
Using Lemma 3.5 we obtain
ΩQ ≥ A
|Uhq |λ2q|U lq|λQ+2|V |+O(δ)
= λQ+2‖∆Q+2u‖3
λ2q‖∆qu‖23 +O(δ),
which shows sharpness of (64) in the infrared range.
5. INEQUALITIES FOR THE NONLINEAR TERM
We take d = 3 and consider u, v ∈ B
3,2 with ∇ · u = 0 and wish to
examine the advective term
(74) B(u, v) = P(u · ∇v) = ΛH(u⊗ v)
where
(75) [H(u⊗ v)]i = Rj(ujvi) +Ri(RkRl(ukvl))
and P is the Leray-Hodge projector, Λ = (−∆) 12 is the Zygmund operator
and Rk = ∂kΛ
−1 are Riesz transforms.
Proposition 5.1. The bilinear advective term B(u, v) maps continuously
the space B
3,2 × B
3,2 to the space B
. More precisely, there exist
ENERGY CONSERVATION 19
bilinear continuous maps C(u, v), I(u, v) so that B(u, v) = C(u, v) +
I(u, v) and constants C such that, for all u, v ∈ B
3,2 with ∇ · u = 0,
(76) ‖C(u, v)‖
≤ C‖u‖
(77) ‖I(u, v)‖
≤ C‖u‖
hold. If u, v, w ∈ B
(78) |〈B(u, v), w〉| ≤ C‖u‖
holds. So the trilinear map (u, v, w) 7→ 〈B(u, v), w〉 defined for smooth
vector fields in L3 has a unique continuous extension to
and a
fortiori to
Proof. We use duality. We take w smooth (w ∈ B
) and take the scalar
product
〈B(u, v), w〉 =
B(u, v) · wdx
We write, in the spirit of the paraproduct of Bony ([1])
(79) ∆q(B(u, v)) = Cq(u, v) + Iq(u, v)
(80) Cq(u, v) =
p≥q−2, |p−p′|≤2
∆q(ΛH(∆pu,∆p′v))
Iq(u, v) =
[∆qΛH(Sq+j−2u,∆q+jv) + ∆qΛH(Sq+j−2v,∆q+ju)]
20 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY
We estimate the contribution coming from the Cq(u, v):
|〈Cq(u, v), w〉|
|q−q′|≤1
p≥q−2, |p−p′|≤2
3∆pu‖L3‖Λ
3∆p′v‖L3‖∆q′w‖L3
|p−p′|≤2
3∆pu‖L3‖Λ
3∆p′v‖L3
q≤p+2,|q−q′|≤1
3∆q′w‖L3
|p−p′|≤2
‖Λ 13∆pu‖ L3‖Λ
3∆p′v‖L3
 ‖w‖
≤ C‖u‖
This shows that the bilinear map C(u, v) =
q≥−1Cq(u, v) maps continu-
ously
(82) |〈C(u, v), w〉| ≤ C‖u‖
The terms Iq(u, v) contribute
|〈Iq(u, v), w〉|
|j|≤2, |q−q′|≤1
λq‖Sq+j−2u‖
‖∆q+jv‖L3‖∆q′w‖
|j|≤2, |q−q′|≤1
λq‖Sq+j−2v‖
‖∆q+ju‖L3‖∆q′w‖
≤ C‖u‖
|j|≤2,|q−q′|≤1
q ‖∆q+jv‖L3λ
q ‖∆q′w‖
+C‖v‖
|j|≤2,|q−q′|≤1
q ‖∆q+ju‖L3λ
q ‖∆q′w‖
≤ C‖u‖
Here we used the fact that
‖Squ‖
≤ C‖u‖
ENERGY CONSERVATION 21
This last fact is proved easily:
‖Sq(u)‖
∥∥∥∥∥∥
|∆ju|2
∥∥∥∥∥∥
‖∆ju‖2
≤ C‖u‖
We used Minkowski’s inequality in L
4 in the penultimate inequality and
Bernstein’s inequality in the last. This proves that I maps continuously
3,2 × B
3,2 to B
The proof of (78) follows along the same lines. Because of Bernstein’s
inequalities, the inequality (82) for the trilinear term 〈C(u, v), w〉 is stronger
than (78). The estimate of I follows:
|〈Iq(u, v), w〉|
|j|≤2, |q−q′|≤1
λq‖Sq+j−2u‖
‖∆q+jv‖
‖∆q′w‖
|j|≤2, |q−q′|≤1
λq‖Sq+j−2v‖
‖∆q+ju‖
‖∆q′w‖
≤ C‖u‖
|j|≤2,|q−q′|≤1
q ‖∆q+jv‖
q ‖∆q′w‖
+C‖v‖
|j|≤2,|q−q′|≤1
q ‖∆q+ju‖
q ‖∆q′w‖
+ ‖v‖
This concludes the proof. ✷
The inequality (82) is not true for 〈B(u, v), w〉 and (78) is close to being
optimal:
Proposition 5.2. For any 0 ≤ s ≤ 1
, 1 < p < ∞, 2 < r ≤ ∞ there exist
functions u, v, w ∈ Bsp,r and smooth, rapidly decaying functions un, vn, wn,
such that limn→∞ un = u, limn→∞ vn = v, limn→∞wn = w hold in the
norm of Bsp,r and such that
〈B(un, vn), wn〉 = ∞
Proof. We start the construction with a divergence-free, smooth function
u such that Fu ∈ C∞0 (B(0, 14)) and
u31dx > 0. We select a direction
22 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY
e = (1, 0, 0) and set Φ = (0, u1, 0). Then
(83) A :=
(u(x) · e)
∣∣P⊥e Φ(x)
∣∣2 dx > 0.
Next we consider the sequence aq =
so that (aq) ∈ ℓr(N) for r > 2, but
not for r = 2, and the functions
(84) vn =
q aqP [sin(λqe · x)Φ(x)]
(85) wn =
q aqP [cos(λqe · x)Φ(x)] .
Clearly, the limits v = limn→∞ vn and w = limn→∞wn exist in norm in
every Bsp,r with 0 ≤ s ≤ 12 , 1 < p < ∞ and r > 2. Manifestly, by
construction, u, vn and wn are divergence-free, and because their Fourier
transforms are in C∞0 , they are rapidly decaying functions. Clearly also
〈B(u, vn), wn〉 =
P(u · ∇vn)wndx =
(u · ∇vn) · wndx.
The terms corresponding to each q in
u · ∇vn =
(u(x) · e)aqλ
q P [cos(λqe · x)Φ(x)]
q u(x) · P [sin(λqe · x)∇Φ(x)]
and in (85) have Fourier transforms supported B(λqe,
) ∪ B(−λq, 12) and
respectively B(λqe,
) ∪ B(−λqe, 14). These are mutually disjoint sets for
distinct q and, consequently, the terms corresponding to different indices q
do not contribute to the integral
(u · ∇vn) · wndx. The terms from the
second sum in (86) form a convergent series. Therefore, using Lemma 3.5,
we obtain
(u · vn) · wn =
(u(x) · e) {P [cos(λqe · x)Φ(x)]}2 dx+O(1)
(u(x) · e)
∣∣P⊥e Φ(x)
∣∣2 dx+O(1)
A+O(1),
ENERGY CONSERVATION 23
which concludes the proof. �
ACKNOWLEDGMENT
The work of AC was partially supported by NSF PHY grant 0555324,
the work of PC by NSF DMS grant 0504213, the work of SF by NSF DMS
grant 0503768, and the work of RS by NSF DMS grant 0604050.
REFERENCES
[1] J-M Bony, Calcul symbolique et propagation des singularité pour leséquations aux
dérivées partielles non linéaires, Ann. Ecole Norm. Sup. 14 (1981) 209–246.
[2] D. Chae, Remarks on the helicity of the 3-D incompressible Euler equations, Comm.
Math. Phys. 240 (2003), 501–507.
[3] J-Y Chemin, Perfect Incompressible Fluids, Clarendon Press, Oxford Univ 1998.
[4] A. Cheskidov, S. Friedlander and N. Pavlović, An inviscid dyadic model of turbu-
lence: the fixed point and Onsager’s conjecture, Journal of Mathematical Physics, to
appear.
[5] A. Cheskidov, S. Friedlander and N. Pavlović, An inviscid dyadic model of turbu-
lence: the global attractor (with S. Friedlander and N. Pavlović), preprint.
[6] P. Constantin, The Littlewood-Paley spectrum in 2D turbulence, Theor. Comp. Fluid
Dyn.9 (1997), 183-189.
[7] P. Constantin, W. E, E. Titi, Onsager’s conjecture on the energy conservation for
solutions of Euler’s equation, Commun. Math. Phys. 165 (1994), 207–209.
[8] P. Constantin, B. Levant, E. Titi, Regularity of inviscid shell models of turbulence,
Physical Review E 75 1 (2007) 016305.
[9] P. Constantin, F. Ramos, Inviscid limit for damped and driven incompressible Navier-
Stokes equations in R2, Commun. Math. Phys., to appear (2007).
[10] C. De Lellis and L. Székelyhidi, The Euler equations as differential inclusion,
preprint.
[11] J. Duchon and R. Robert, Inertial energy dissipation for weak solutions of incom-
pressible Euler and Navier-Stokes equations, Nonlinearity 13 (2000), 249–255.
[12] G. L. Eyink, Energy dissipation without viscosity in ideal hydrodynamics. I. Fourier
analysis and local energy transfer, Phys. D 78 (1994), 222–240.
[13] G. L. Eyink, Locality of turbulent cascades, Phys. D 207 (2005), 91–116.
[14] G. L. Eyink and K. R. Sreenivasan, Onsager and the theory of hydrodynamic turbu-
lence, Rev. Mod. Phys. 78 (2006).
[15] U. Frisch, Turbulence. The legacy of A. N. Kolmogorov. Cambridge University Press,
Cambridge, 1995.
[16] A. N. Kolmogorov, The local structure of turbulence in incompressible viscous fluids
at very large Reynolds numbers, Dokl. Akad. Nauk. SSSR 30 (1941), 301–305.
[17] R. H. Kraichnan, The structure of isotropic turbulence at very high Reynolds num-
bers, J. Fluid Mech. 5 (1959), 497–543.
[18] R. H. Kraichnan, Inertial ranges in two-dimensional turbulence, Phys. Fluids 10
(1967), 1417-1423.
[19] P-G Lemarié-Rieusset, Recent developments in the Navier-Stokes problem, Chapman
and Hall/CRC, Boca Raton, 2002.
[20] M. Lopes Filho, A. Mazzucato, H. Nussenzveig-Lopes, Weak solutions, renormalized
solutions and enstrophy defects in 2D turbulence, ARMA 179 (2006), 353-387.
24 A. CHESKIDOV, P. CONSTANTIN, S. FRIEDLANDER, AND R. SHVYDKOY
[21] H. K. Moffatt and A. Tsinober, Helicity in laminar and turbulent flow, Ann. Rev. Fluid
Mech. 24 (1992), 281–312.
[22] L. Onsager, Statistical Hydrodynamics, Nuovo Cimento (Supplemento) 6 (1949),
279–287.
[23] R. Robert, Statistical Hydrodynamics ( Onsager revisited ), Handbook of Mathemat-
ical Fluid Dynamics, vol 2 ( 2003), 1–55. Ed. Friedlander and Serre. Elsevier.
[24] V. Scheffer, An inviscid flow with compact support in space-time, J. Geom. Anal.
3(4) (1993), 343–401.
[25] A. Shnirelman, On the nonuniqueness of weak solution of the Euler equation, Comm.
Pure Appl. Math. 50 (1997), 1261–1286.
(A. Cheskidov) DEPARTMENT OF MATHEMATICS, UNIVERSITY OF MICHIGAN, ANN
ARBOR, MI 48109
E-mail address: acheskid@umich.edu
(P. Constantin) DEPARTMENT OF MATHEMATICS, UNIVERSITY OF CHICAGO, CHICAGO,
IL 60637
E-mail address: const@cs.uchicago.edu
(S. Friedlander and R. Shvydkoy) DEPARTMENT OF MATHEMATICS, STAT. AND
COMP. SCI., UNIVERSITY OF ILLINOIS, CHICAGO, IL 60607
E-mail address: susan@math.northwestern.edu
E-mail address: shvydkoy@math.uic.edu
ABSTRACT
  Onsager conjectured that weak solutions of the Euler equations for
incompressible fluids in 3D conserve energy only if they have a certain minimal
smoothness, (of order of 1/3 fractional derivatives) and that they dissipate
energy if they are rougher. In this paper we prove that energy is conserved for
velocities in the function space $B^{1/3}_{3,c(\NN)}$. We show that this space
is sharp in a natural sense. We phrase the energy spectrum in terms of the
Littlewood-Paley decomposition and show that the energy flux is controlled by
local interactions. This locality is shown to hold also for the helicity flux;
moreover, every weak solution of the Euler equations that belongs to
$B^{2/3}_{3,c(\NN)}$ conserves helicity. In contrast, in two dimensions, the
strong locality of the enstrophy holds only in the ultraviolet range.

<|endoftext|><|startoftext|>
Search for Heavy, Long-Lived Particles that Decay to Photons at CDF II
A. Abulencia,24 J. Adelman,13 T. Affolder,10 T. Akimoto,55 M.G. Albrow,17 S. Amerio,43 D. Amidei,35
A. Anastassov,52 K. Anikeev,17 A. Annovi,19 J. Antos,14 M. Aoki,55 G. Apollinari,17 T. Arisawa,57 A. Artikov,15
W. Ashmanskas,17 A. Attal,3 A. Aurisano,42 F. Azfar,42 P. Azzi-Bacchetta,43 P. Azzurri,46 N. Bacchetta,43
W. Badgett,17 A. Barbaro-Galtieri,29 V.E. Barnes,48 B.A. Barnett,25 S. Baroiant,7 V. Bartsch,31 G. Bauer,33
P.-H. Beauchemin,34 F. Bedeschi,46 S. Behari,25 G. Bellettini,46 J. Bellinger,59 A. Belloni,33 D. Benjamin,16
A. Beretvas,17 J. Beringer,29 T. Berry,30 A. Bhatti,50 M. Binkley,17 D. Bisello,43 I. Bizjak,31 R.E. Blair,2
C. Blocker,6 B. Blumenfeld,25 A. Bocci,16 A. Bodek,49 V. Boisvert,49 G. Bolla,48 A. Bolshov,33 D. Bortoletto,48
J. Boudreau,47 A. Boveia,10 B. Brau,10 L. Brigliadori,5 C. Bromberg,36 E. Brubaker,13 J. Budagov,15 H.S. Budd,49
S. Budd,24 K. Burkett,17 G. Busetto,43 P. Bussey,21 A. Buzatu,34 K. L. Byrum,2 S. Cabreraq,16 M. Campanelli,20
M. Campbell,35 F. Canelli,17 A. Canepa,45 S. Carilloi,18 D. Carlsmith,59 R. Carosi,46 S. Carron,34 B. Casal,11
M. Casarsa,54 A. Castro,5 P. Catastini,46 D. Cauz,54 M. Cavalli-Sforza,3 A. Cerri,29 L. Cerritom,31 S.H. Chang,28
Y.C. Chen,1 M. Chertok,7 G. Chiarelli,46 G. Chlachidze,17 F. Chlebana,17 I. Cho,28 K. Cho,28 D. Chokheli,15
J.P. Chou,22 G. Choudalakis,33 S.H. Chuang,52 K. Chung,12 W.H. Chung,59 Y.S. Chung,49 M. Cilijak,46
C.I. Ciobanu,24 M.A. Ciocci,46 A. Clark,20 D. Clark,6 M. Coca,16 G. Compostella,43 M.E. Convery,50 J. Conway,7
B. Cooper,31 K. Copic,35 M. Cordelli,19 G. Cortiana,43 F. Crescioli,46 C. Cuenca Almenarq,7 J. Cuevasl,11
R. Culbertson,17 J.C. Cully,35 S. DaRonco,43 M. Datta,17 S. D’Auria,21 T. Davies,21 D. Dagenhart,17
P. de Barbaro,49 S. De Cecco,51 A. Deisher,29 G. De Lentdeckerc,49 G. De Lorenzo,3 M. Dell’Orso,46 F. Delli Paoli,43
L. Demortier,50 J. Deng,16 M. Deninno,5 D. De Pedis,51 P.F. Derwent,17 G.P. Di Giovanni,44 C. Dionisi,51
B. Di Ruzza,54 J.R. Dittmann,4 M. D’Onofrio,3 C. Dörr,26 S. Donati,46 P. Dong,8 J. Donini,43 T. Dorigo,43
S. Dube,52 J. Efron,39 R. Erbacher,7 D. Errede,24 S. Errede,24 R. Eusebi,17 H.C. Fang,29 S. Farrington,30
I. Fedorko,46 W.T. Fedorko,13 R.G. Feild,60 M. Feindt,26 J.P. Fernandez,32 R. Field,18 G. Flanagan,48 R. Forrest,7
S. Forrester,7 M. Franklin,22 J.C. Freeman,29 I. Furic,13 M. Gallinaro,50 J. Galyardt,12 J.E. Garcia,46
F. Garberson,10 A.F. Garfinkel,48 C. Gay,60 H. Gerberich,24 D. Gerdes,35 S. Giagu,51 P. Giannetti,46 K. Gibson,47
J.L. Gimmell,49 C. Ginsburg,17 N. Giokarisa,15 M. Giordani,54 P. Giromini,19 M. Giunta,46 G. Giurgiu,25
V. Glagolev,15 D. Glenzinski,17 M. Gold,37 N. Goldschmidt,18 J. Goldsteinb,42 A. Golossanov,17 G. Gomez,11
G. Gomez-Ceballos,33 M. Goncharov,53 O. González,32 I. Gorelov,37 A.T. Goshaw,16 K. Goulianos,50 A. Gresele,43
S. Grinstein,22 C. Grosso-Pilcher,13 R.C. Group,17 U. Grundler,24 J. Guimaraes da Costa,22 Z. Gunay-Unalan,36
C. Haber,29 K. Hahn,33 S.R. Hahn,17 E. Halkiadakis,52 A. Hamilton,20 B.-Y. Han,49 J.Y. Han,49 R. Handler,59
F. Happacher,19 K. Hara,55 D. Hare,52 M. Hare,56 S. Harper,42 R.F. Harr,58 R.M. Harris,17 M. Hartz,47
K. Hatakeyama,50 J. Hauser,8 C. Hays,42 M. Heck,26 A. Heijboer,45 B. Heinemann,29 J. Heinrich,45 C. Henderson,33
M. Herndon,59 J. Heuser,26 D. Hidas,16 C.S. Hillb,10 D. Hirschbuehl,26 A. Hocker,17 A. Holloway,22 S. Hou,1
M. Houlden,30 S.-C. Hsu,9 B.T. Huffman,42 R.E. Hughes,39 U. Husemann,60 J. Huston,36 J. Incandela,10
G. Introzzi,46 M. Iori,51 A. Ivanov,7 B. Iyutin,33 E. James,17 D. Jang,52 B. Jayatilaka,16 D. Jeans,51 E.J. Jeon,28
S. Jindariani,18 W. Johnson,7 M. Jones,48 K.K. Joo,28 S.Y. Jun,12 J.E. Jung,28 T.R. Junk,24 T. Kamon,53
P.E. Karchin,58 Y. Kato,41 Y. Kemp,26 R. Kephart,17 U. Kerzel,26 V. Khotilovich,53 B. Kilminster,39 D.H. Kim,28
H.S. Kim,28 J.E. Kim,28 M.J. Kim,17 S.B. Kim,28 S.H. Kim,55 Y.K. Kim,13 N. Kimura,55 L. Kirsch,6 S. Klimenko,18
M. Klute,33 B. Knuteson,33 B.R. Ko,16 K. Kondo,57 D.J. Kong,28 J. Konigsberg,18 A. Korytov,18 A.V. Kotwal,16
A.C. Kraan,45 J. Kraus,24 M. Kreps,26 J. Kroll,45 N. Krumnack,4 M. Kruse,16 V. Krutelyov,10 T. Kubo,55
S. E. Kuhlmann,2 T. Kuhr,26 N.P. Kulkarni,58 Y. Kusakabe,57 S. Kwang,13 A.T. Laasanen,48 S. Lai,34 S. Lami,46
S. Lammel,17 M. Lancaster,31 R.L. Lander,7 K. Lannon,39 A. Lath,52 G. Latino,46 I. Lazzizzera,43 T. LeCompte,2
E. Lee,53 J. Lee,49 J. Lee,28 Y.J. Lee,28 S.W. Leeo,53 R. Lefèvre,20 N. Leonardo,33 S. Leone,46 S. Levy,13
J.D. Lewis,17 C. Lin,60 C.S. Lin,17 M. Lindgren,17 E. Lipeles,9 A. Lister,7 D.O. Litvintsev,17 T. Liu,17
N.S. Lockyer,45 A. Loginov,60 M. Loreti,43 R.-S. Lu,1 D. Lucchesi,43 P. Lujan,29 P. Lukens,17 G. Lungu,18
L. Lyons,42 J. Lys,29 R. Lysak,14 E. Lytken,48 P. Mack,26 D. MacQueen,34 R. Madrak,17 K. Maeshima,17
K. Makhoul,33 T. Maki,23 P. Maksimovic,25 S. Malde,42 S. Malik,31 G. Manca,30 F. Margaroli,5 R. Marginean,17
C. Marino,26 C.P. Marino,24 A. Martin,60 M. Martin,25 V. Marting,21 M. Mart́ınez,3 R. Mart́ınez-Ballaŕın,32
T. Maruyama,55 P. Mastrandrea,51 T. Masubuchi,55 H. Matsunaga,55 M.E. Mattson,58 R. Mazini,34 P. Mazzanti,5
K.S. McFarland,49 P. McIntyre,53 R. McNultyf ,30 A. Mehta,30 P. Mehtala,23 S. Menzemerh,11 A. Menzione,46
P. Merkel,48 C. Mesropian,50 A. Messina,36 T. Miao,17 N. Miladinovic,6 J. Miles,33 R. Miller,36 C. Mills,10
M. Milnik,26 A. Mitra,1 G. Mitselmakher,18 A. Miyamoto,27 S. Moed,20 N. Moggi,5 B. Mohr,8 C.S. Moon,28
http://arxiv.org/abs/0704.0760v1
R. Moore,17 M. Morello,46 P. Movilla Fernandez,29 J. Mülmenstädt,29 A. Mukherjee,17 Th. Muller,26 R. Mumford,25
P. Murat,17 M. Mussini,5 J. Nachtman,17 A. Nagano,55 J. Naganoma,57 K. Nakamura,55 I. Nakano,40 A. Napier,56
V. Necula,16 C. Neu,45 M.S. Neubauer,9 J. Nielsenn,29 L. Nodulman,2 O. Norniella,3 E. Nurse,31 S.H. Oh,16
Y.D. Oh,28 I. Oksuzian,18 T. Okusawa,41 R. Oldeman,30 R. Orava,23 K. Osterberg,23 C. Pagliarone,46
E. Palencia,11 V. Papadimitriou,17 A. Papaikonomou,26 A.A. Paramonov,13 B. Parks,39 S. Pashapour,34
J. Patrick,17 G. Pauletta,54 M. Paulini,12 C. Paus,33 D.E. Pellett,7 A. Penzo,54 T.J. Phillips,16 G. Piacentino,46
J. Piedra,44 L. Pinera,18 K. Pitts,24 C. Plager,8 L. Pondrom,59 X. Portell,3 O. Poukhov,15 N. Pounder,42
F. Prakoshyn,15 A. Pronko,17 J. Proudfoot,2 F. Ptohose,19 G. Punzi,46 J. Pursley,25 J. Rademackerb,42
A. Rahaman,47 V. Ramakrishnan,59 N. Ranjan,48 I. Redondo,32 B. Reisert,17 V. Rekovic,37 P. Renton,42
M. Rescigno,51 S. Richter,26 F. Rimondi,5 L. Ristori,46 A. Robson,21 T. Rodrigo,11 E. Rogers,24 S. Rolli,56
R. Roser,17 M. Rossi,54 R. Rossin,10 P. Roy,34 A. Ruiz,11 J. Russ,12 V. Rusu,13 H. Saarikko,23 A. Safonov,53
W.K. Sakumoto,49 G. Salamanna,51 O. Saltó,3 L. Santi,54 S. Sarkar,51 L. Sartori,46 K. Sato,17 P. Savard,34
A. Savoy-Navarro,44 T. Scheidle,26 P. Schlabach,17 E.E. Schmidt,17 M.P. Schmidt,60 M. Schmitt,38 T. Schwarz,7
L. Scodellaro,11 A.L. Scott,10 A. Scribano,46 F. Scuri,46 A. Sedov,48 S. Seidel,37 Y. Seiya,41 A. Semenov,15
L. Sexton-Kennedy,17 A. Sfyrla,20 S.Z. Shalhout,58 M.D. Shapiro,29 T. Shears,30 P.F. Shepard,47 D. Sherman,22
M. Shimojimak,55 M. Shochet,13 Y. Shon,59 I. Shreyber,20 A. Sidoti,46 P. Sinervo,34 A. Sisakyan,15 A.J. Slaughter,17
J. Slaunwhite,39 K. Sliwa,56 J.R. Smith,7 F.D. Snider,17 R. Snihur,34 M. Soderberg,35 A. Soha,7 S. Somalwar,52
V. Sorin,36 J. Spalding,17 F. Spinella,46 T. Spreitzer,34 P. Squillacioti,46 M. Stanitzki,60 A. Staveris-Polykalas,46
R. St. Denis,21 B. Stelzer,8 O. Stelzer-Chilton,42 D. Stentz,38 J. Strologas,37 D. Stuart,10 J.S. Suh,28 A. Sukhanov,18
H. Sun,56 I. Suslov,15 T. Suzuki,55 A. Taffardp,24 R. Takashima,40 Y. Takeuchi,55 R. Tanaka,40 M. Tecchio,35
P.K. Teng,1 K. Terashi,50 J. Thomd,17 A.S. Thompson,21 E. Thomson,45 P. Tipton,60 V. Tiwari,12 S. Tkaczyk,17
D. Toback,53 S. Tokar,14 K. Tollefson,36 T. Tomura,55 D. Tonelli,46 S. Torre,19 D. Torretta,17 S. Tourneur,44
W. Trischuk,34 S. Tsuno,40 Y. Tu,45 N. Turini,46 F. Ukegawa,55 S. Uozumi,55 S. Vallecorsa,20 N. van Remortel,23
A. Varganov,35 E. Vataga,37 F. Vazquezi,18 G. Velev,17 G. Veramendi,24 V. Veszpremi,48 M. Vidal,32 R. Vidal,17
I. Vila,11 R. Vilar,11 T. Vine,31 I. Vollrath,34 I. Volobouevo,29 G. Volpi,46 F. Würthwein,9 P. Wagner,53
R.G. Wagner,2 R.L. Wagner,17 J. Wagner,26 W. Wagner,26 R. Wallny,8 S.M. Wang,1 A. Warburton,34 D. Waters,31
M. Weinberger,53 W.C. Wester III,17 B. Whitehouse,56 D. Whiteson,45 A.B. Wicklund,2 E. Wicklund,17
G. Williams,34 H.H. Williams,45 P. Wilson,17 B.L. Winer,39 P. Wittichd,17 S. Wolbers,17 C. Wolfe,13
T. Wright,35 X. Wu,20 S.M. Wynne,30 A. Yagil,9 K. Yamamoto,41 J. Yamaoka,52 T. Yamashita,40 C. Yang,60
U.K. Yangj,13 Y.C. Yang,28 W.M. Yao,29 G.P. Yeh,17 J. Yoh,17 K. Yorita,13 T. Yoshida,41 G.B. Yu,49 I. Yu,28
S.S. Yu,17 J.C. Yun,17 L. Zanello,51 A. Zanetti,54 I. Zaw,22 X. Zhang,24 J. Zhou,52 and S. Zucchelli5
(CDF Collaboration∗)
1Institute of Physics, Academia Sinica, Taipei, Taiwan 11529, Republic of China
2Argonne National Laboratory, Argonne, Illinois 60439
3Institut de Fisica d’Altes Energies, Universitat Autonoma de Barcelona, E-08193, Bellaterra (Barcelona), Spain
4Baylor University, Waco, Texas 76798
5Istituto Nazionale di Fisica Nucleare, University of Bologna, I-40127 Bologna, Italy
6Brandeis University, Waltham, Massachusetts 02254
7University of California, Davis, Davis, California 95616
8University of California, Los Angeles, Los Angeles, California 90024
9University of California, San Diego, La Jolla, California 92093
10University of California, Santa Barbara, Santa Barbara, California 93106
11Instituto de Fisica de Cantabria, CSIC-University of Cantabria, 39005 Santander, Spain
12Carnegie Mellon University, Pittsburgh, PA 15213
13Enrico Fermi Institute, University of Chicago, Chicago, Illinois 60637
14Comenius University, 842 48 Bratislava, Slovakia; Institute of Experimental Physics, 040 01 Kosice, Slovakia
15Joint Institute for Nuclear Research, RU-141980 Dubna, Russia
16Duke University, Durham, North Carolina 27708
17Fermi National Accelerator Laboratory, Batavia, Illinois 60510
18University of Florida, Gainesville, Florida 32611
19Laboratori Nazionali di Frascati, Istituto Nazionale di Fisica Nucleare, I-00044 Frascati, Italy
20University of Geneva, CH-1211 Geneva 4, Switzerland
21Glasgow University, Glasgow G12 8QQ, United Kingdom
22Harvard University, Cambridge, Massachusetts 02138
23Division of High Energy Physics, Department of Physics,
University of Helsinki and Helsinki Institute of Physics, FIN-00014, Helsinki, Finland
24University of Illinois, Urbana, Illinois 61801
25The Johns Hopkins University, Baltimore, Maryland 21218
26Institut für Experimentelle Kernphysik, Universität Karlsruhe, 76128 Karlsruhe, Germany
27High Energy Accelerator Research Organization (KEK), Tsukuba, Ibaraki 305, Japan
28Center for High Energy Physics: Kyungpook National University,
Taegu 702-701, Korea; Seoul National University, Seoul 151-742,
Korea; SungKyunKwan University, Suwon 440-746, Korea
29Ernest Orlando Lawrence Berkeley National Laboratory, Berkeley, California 94720
30University of Liverpool, Liverpool L69 7ZE, United Kingdom
31University College London, London WC1E 6BT, United Kingdom
32Centro de Investigaciones Energeticas Medioambientales y Tecnologicas, E-28040 Madrid, Spain
33Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
34Institute of Particle Physics: McGill University, Montréal,
Canada H3A 2T8; and University of Toronto, Toronto, Canada M5S 1A7
35University of Michigan, Ann Arbor, Michigan 48109
36Michigan State University, East Lansing, Michigan 48824
37University of New Mexico, Albuquerque, New Mexico 87131
38Northwestern University, Evanston, Illinois 60208
39The Ohio State University, Columbus, Ohio 43210
40Okayama University, Okayama 700-8530, Japan
41Osaka City University, Osaka 588, Japan
42University of Oxford, Oxford OX1 3RH, United Kingdom
43University of Padova, Istituto Nazionale di Fisica Nucleare,
Sezione di Padova-Trento, I-35131 Padova, Italy
44LPNHE, Universite Pierre et Marie Curie/IN2P3-CNRS, UMR7585, Paris, F-75252 France
45University of Pennsylvania, Philadelphia, Pennsylvania 19104
46Istituto Nazionale di Fisica Nucleare Pisa, Universities of Pisa,
Siena and Scuola Normale Superiore, I-56127 Pisa, Italy
47University of Pittsburgh, Pittsburgh, Pennsylvania 15260
48Purdue University, West Lafayette, Indiana 47907
49University of Rochester, Rochester, New York 14627
50The Rockefeller University, New York, New York 10021
51Istituto Nazionale di Fisica Nucleare, Sezione di Roma 1,
University of Rome “La Sapienza,” I-00185 Roma, Italy
52Rutgers University, Piscataway, New Jersey 08855
53Texas A&M University, College Station, Texas 77843
54Istituto Nazionale di Fisica Nucleare, University of Trieste/ Udine, Italy
55University of Tsukuba, Tsukuba, Ibaraki 305, Japan
56Tufts University, Medford, Massachusetts 02155
57Waseda University, Tokyo 169, Japan
58Wayne State University, Detroit, Michigan 48201
59University of Wisconsin, Madison, Wisconsin 53706
60Yale University, New Haven, Connecticut 06520
(Dated: November 3, 2018; Version 5.1)
We present the first search for heavy, long-lived particles that decay to photons at a hadron
collider. We use a sample of γ+jet+missing transverse energy events in pp̄ collisions at
1.96 TeV taken with the CDF II detector. Candidate events are selected based on the arrival time
of the photon at the detector. Using an integrated luminosity of 570 pb−1 of collision data, we
observe 2 events, consistent with the background estimate of 1.3±0.7 events. While our search
strategy does not rely on model-specific dynamics, we set cross section limits in a supersymmetric
model with eχ01 → γ eG and place the world-best 95% C.L. lower limit on the eχ01 mass of 101 GeV/c2
at τχ̃0
= 5 ns.
PACS numbers: 13.85.Rm, 12.60.Jv, 13.85.Qk, 14.80.Ly
∗With visitors from aUniversity of Athens, bUniversity of
Bristol, cUniversity Libre de Bruxelles, dCornell University,
eUniversity of Cyprus, fUniversity of Dublin, gUniversity of Ed-
inburgh, hUniversity of Heidelberg, iUniversidad Iberoamericana,
jUniversity of Manchester, kNagasaki Institute of Applied Science,
lUniversity de Oviedo, mUniversity of London, Queen Mary Col-
lege, nUniversity of California Santa Cruz, oTexas Tech University,
Searches for events with final state photons and miss-
ing transverse energy (E/T ) [1] at collider experiments
are sensitive to new physics from a wide variety of mod-
els [2] including gauge mediated supersymmetry breaking
(GMSB) [3]. In these models the lightest neutralino (χ̃01)
decays into a photon (γ) and a weakly interacting, stable
gravitino (G̃) that gives rise to E/T by leaving the detec-
tor without depositing any energy. The observation of an
eeγγE/T candidate event by the CDF experiment during
Run I at the Fermilab Tevatron [4] has increased the in-
terest in experimental tests of this class of theories. Most
subsequent searches have focused on promptly produced
photons [5, 6], however the χ̃01 can have a lifetime on the
order of nanoseconds or more. This is the first search
for heavy, long-lived particles that decay to photons at a
hadron collider.
We optimize our selection requirements using a GMSB
model with a standard choice of parameters [7] and vary
the values of the χ̃01 mass and lifetime. However, the final
search strategy is chosen to be sufficiently general and
independent of the specific GMSB model dynamics to
yield results that are approximately valid for any model
producing the same reconstructed final state topology
and kinematics [8]. In pp̄ collisions at the Tevatron the
inclusive GMSB production cross section is dominated
by pair production of gauginos. The gauginos decay
promptly, resulting in a pair of long-lived χ̃01’s in asso-
ciation with other final state particles that can be identi-
fied as jets. For a heavy χ̃01 decaying inside the detector,
the photon can arrive at the face of the detector with a
time delay relative to promptly produced photons. To
have good sensitivity for nanosecond-lifetime χ̃01’s [8], we
search for events that contain a time-delayed photon, E/T ,
and ≥ 1 jet. This is equivalent to requiring that at least
one of the long-lived χ̃01’s decays inside the detector.
This Letter summarizes [9] the first search for heavy,
long-lived particles that decay to photons at a hadron
collider. The data comprise 570±34 pb−1 of pp̄ collisions
collected with the CDF II detector [10] at
s = 1.96 TeV.
Previous searches for nanosecond-lifetime particles using
non-timing techniques yielded null results [11].
A full description of the CDF II detector can be found
elsewhere [10]. Here we briefly describe the aspects of the
detector relevant to this analysis. The magnetic spec-
trometer consists of tracking devices inside a 3-m diame-
ter, 5-m long superconducting solenoid magnet that op-
erates at 1.4 T. An eight-layer silicon microstrip detector
array and a 3.1-m long drift chamber with 96 layers of
sense wires measure the position (~xi) and time (ti) of the
pp̄ interaction [12] and the momenta of charged particles.
Muons from collisions or cosmic rays are identified by a
pUniversity of California Irvine, qIFIC(CSIC-Universitat de Valen-
cia),
system of drift chambers situated outside the calorime-
ters in the region with pseudorapidity |η| < 1.1 [1].
The calorimeter consists of projective towers with elec-
tromagnetic and hadronic compartments. It is divided
into a central barrel that surrounds the solenoid coil
(|η| < 1.1) and a pair of end-plugs that cover the region
1.1 < |η| < 3.6. Both calorimeters are used to identify
and measure the energy and position of photons, elec-
trons, jets, and E/T . The electromagnetic calorimeters
were recently instrumented with a new system, the EM-
Timing system (completed in Fall 2004) [13], that mea-
sures the arrival time of electrons and photons in each
tower with |η| < 2.1 for all energies above ∼5 GeV.
The time and position of arrival of the photon at the
calorimeter, tf and ~xf , are used to separate the photons
from the decays of heavy, long-lived χ̃01’s from promptly
produced photons or photons from non-collision sources.
We define the corrected arrival time of the photon as
tγc ≡ tf − ti −
|~xf − ~xi|
The tγc distribution for promptly produced, high energy
photons is Gaussian with a mean of zero by construction
and with a standard deviation that depends only on the
measurement resolution assuming that the pp̄ production
vertex has been correctly identified. Photons from heavy,
long-lived particles can have arrival times that are many
standard deviations larger than zero.
The analysis preselection is summarized in Table I. It
begins with events passing an online, three-level trigger
by having a photon candidate in the region |η| < 1.1
with ET> 25 GeV and E/T> 25 GeV. Offline, the high-
est ET photon candidate in the fiducial region of the
calorimeter is required to have ET > 30 GeV and to
pass the standard photon identification requirements [5]
with a minor modification [14]. We require the event
to have E/T > 30 GeV where the trigger is 100% effi-
cient. We require at least one jet with |ηjet| < 2.0 and
> 30 GeV [15]. Since a second photon can be identi-
fied as a jet, the analysis is sensitive to signatures where
one or both χ̃01’s decay inside the detector. To ensure
a high quality ti and ~xi measurement, we require a ver-
tex with at least 4 tracks,
tracks
pT > 15 GeV/c, and
|zi| < 60 cm; this also helps to reduce non-collision back-
grounds. For events with multiple reconstructed vertices,
we pick the vertex with the highest
tracks
pT . To re-
duce cosmic ray background, events are rejected if there
are hits in a muon chamber that are not matched to any
track and are within 30◦ of the photon. After the above
requirements there are 11,932 events in the data sample.
There are two major classes of background events: col-
lision and non-collision photon candidates. Collision pho-
tons are presumed to come from standard model interac-
tions, e.g., γ+jet+mismeasured E/T , dijet+mismeasured
E/T where the jet is mis-identified as a γ, and W → eν
where the electron is mis-identified as a γ. Non-collision
Preselection Requirements Cumulative (individual)
Efficiency (%)
> 30 GeV, E/T > 30 GeV 54 (54)
Photon ID and fiducial, |η| < 1.0 39 (74)*
Good vertex,
tracks
pT > 15 GeV/c 31 (79)
|ηjet| < 2.0, Ejet
> 30 GeV 24 (77)
Cosmic ray rejection 23 (98)*
Requirements after Optimization
E/T > 40 GeV, E
> 35 GeV 21 (92)
∆φ(E/T , jet) > 1 rad 18 (86)
2 ns < tγc < 10 ns 6 (33)
TABLE I: The data selection criteria and the cumulative
and individual requirement efficiencies for an example GMSB
model point at mχ̃0
= 100 GeV/c2 and τχ̃0
= 5 ns. The ef-
ficiencies listed are, in general, model-dependent and have a
fractional uncertainty of 10%. Model-independent efficiencies
are indicated with an asterisk. The collision fiducial require-
ment of |zi| < 60 cm is part of the good vertex requirement
(95%) and is estimated from data.
backgrounds come from cosmic rays and beam effects
that can produce photon candidates, E/T , and sometimes
the reconstructed jet. We separate data events as a func-
tion of tγc into several control regions that allow us to
estimate the number of background events in the final
signal region by fitting to the data using collision and
non-collision shape templates as shown in Fig. 1.
Collision photons are subdivided in two subclasses:
correct and incorrect vertex selection [13]. An incorrect
vertex can be selected when two or more collisions occur
in one beam bunch crossing, making it possible that the
highest reconstructed
tracks
pT vertex does not produce
the photon. While the fraction of events with incorrect
vertices depends on the final event selection criteria, the
tγc distribution for each subclass is estimated separately
using W → eν data where the electron track is dropped
from the vertexing. For events with a correctly associ-
ated vertex, the tγc distribution is Gaussian and centered
at zero with a standard deviation of 0.64 ns [13]. For
those with an incorrectly selected vertex the tγc distribu-
tion is also Gaussian with a standard deviation of 2.05 ns.
The tγc distributions for both non-collision backgrounds
are estimated separately from data using events with
no reconstructed tracks. Photon candidates from cos-
mic rays are not correlated in time with collisions, and
therefore their tγc distribution is roughly flat. Beam halo
photon candidates are produced by muons that origi-
nate upstream of the detector (from the p direction) and
travel through the calorimeter, typically depositing small
amounts of energy. When the muon deposits significant
energy in the EM calorimeter, it can be misidentified as a
photon and cause E/T . These photons populate predomi-
nantly the negative tγc region, but can contribute to the
signal region. Since beam halo muons travel parallel to
the beam line, these events can be separated from cosmic
ray events by identifying the small energy deposited in
the calorimeter towers along the beam halo muon trajec-
tory.
The background prediction uses control regions out-
side the signal time window but well within the 132 ns
time window that the calorimeter uses to measure the
energy. The non-collision background templates are nor-
malized to match the number of events in two time win-
dows: a beam halo-dominated window at {−20, −6} ns,
selected to be 3σ away from the wrong vertex collision
background, and a cosmic rays-dominated window at
{25, 90} ns, well away from the standard model and
beam halo contributions. The collision background is
estimated by fitting events in the {−10, 1.2} ns window
with the non-collision contribution subtracted and with
the fraction of correct to incorrect vertex events allowed
to vary. In this way the background for the signal region
is entirely estimated from data samples. The systematic
uncertainty on the background estimate is dominated by
our ability to calibrate the mean of the tγc distribution
for prompt photons. We find a variation of 200 ps on
the mean and 20 ps on the standard deviation of the dis-
tribution by considering various possible event selection
criteria. These contribute to the systematic uncertainty
of the collision background estimate in the signal region
and are added in quadrature with the statistical uncer-
tainties of the final fit procedure.
We estimate the sensitivity to heavy, long-lived parti-
cles that decay to photons using GMSB models for dif-
ferent χ̃01 masses and lifetimes. Events from all SUSY
processes are simulated with the pythia Monte Carlo
program [16] along with the detector simulation [17]. The
acceptance is the ratio of simulated events that pass all
the requirements to all events produced. It is used in
the optimization procedure and in the final limit setting
and depends on a number of effects. The fraction of χ̃01
decays in the detector volume is the dominant effect on
the acceptance. For a given lifetime this depends on the
boost of the χ̃01. A highly boosted χ̃
1 that decays in
the detector typically does not contribute to the accep-
tance because it tends to produce a photon traveling in
the same direction as the χ̃01. Thus, the photon’s arrival
time is indistinguishable from promptly produced pho-
tons. At small boosts the decay is more likely to happen
inside the detector, and the decay angle is more likely
to be large, which translates into a larger delay for the
photon. The fraction of events with a delayed photon ar-
rival time initially rises as a function of χ̃01 lifetime, but
falls as the fraction of χ̃01’s decaying outside the detector
begins to dominates. In the χ̃01 mass region considered
(65 ≤ mχ̃0
≤ 150 GeV/c2), the acceptance peaks at a
lifetime of around 5 ns. The acceptance also depends on
the mass as the boost effects are mitigated by the ability
to produce high energy photons or E/T in the collision, as
discussed in Ref. [8].
The total systematic uncertainty of 10% on the ac-
Photon Corrected Time of Arrival (ns)
-20 0 20 40 60 80
)-1 + Jet data (570 pb
E + γ
Standard Model
Beam Halo
Cosmics
GMSB Signal MC
Photon Corrected Time of Arrival (ns)
-20 0 20 40 60 80
FIG. 1: The time distribution for photons passing all but
the final timing requirement for the background predic-
tions, data, and a GMSB signal for an example point at
= 100 GeV/c2, τχ̃0
= 5 ns. A total of 1.3±0.7 back-
ground events are predicted and 2 (marked with a star) are
observed in the signal region of 2 < tγc < 10 ns.
ceptance is dominated by the uncertainty on the mean
of the tγc distribution (7%) and on the photon ID effi-
ciency (5%). Other significant contributions come from
uncertainties on initial and final state radiation (3%), jet
energy measurement (3%), and the parton distribution
functions (1%).
We determine the kinematic and tγc selection require-
ments that define the final data sample by optimizing
the expected cross section limit without looking at the
data in the signal region. To compute the expected 95%
confidence level (C.L.) cross section upper limit [18], we
combine the predicted GMSB signal and background esti-
mates with the systematic uncertainties using a Bayesian
method with a flat prior [19]. The expected limits are op-
timized by simultaneously varying the selection require-
ments for E/T , photon ET , jet ET , azimuth angle be-
tween the leading jet and E/T (∆φ(E/T , jet)), and t
c . The
∆φ(E/T , jet) requirement rejects events where the E/T is
overestimated because of a poorly measured jet. While
each point in χ̃01 lifetime vs. mass space gives a slightly
different optimization, we choose a single set of require-
ments because it simplifies the final analysis, while only
causing a small loss of sensitivity. The optimized require-
ments are summarized in Table I. As an example, the ac-
ceptance for mχ̃0
= 100 GeV/c2 and lifetime τχ̃0
= 5 ns
is estimated to be (6.3±0.6)%.
After all kinematic requirements, 508 events are ob-
served in the data before the final signal region time re-
quirement. Their time distribution is shown in Fig. 1.
Our fit to the data outside the signal region predicts total
backgrounds of 6.2±3.5 from cosmic rays, 6.8±4.9 from
beam halo background sources, and the rest from the
)2 mass (GeV/c
65 70 75 80 85 90 95 100 105 110
0 1χ∼
1.0 pb
0.5 pb
0.3 pb
0.2 pb
0.13 pb
FIG. 2: The contours of constant 95% C.L. upper cross section
limits for a GMSB model [7].
standard model. Inside the signal time region, {2, 10} ns,
we predict 1.25±0.66 events: 0.71±0.60 from standard
model, 0.46±0.26 from cosmic rays, and 0.07±0.05 from
beam halo. Two events are observed in the data. Since
the result is consistent with the no-signal hypothesis, we
set limits on the χ̃01 lifetime and mass. Figure 2 shows the
contours of constant 95% C.L. cross section upper limit.
Figure 3 shows the exclusion region at 95% C.L., along
with the expected limit for comparison. This takes into
account the predicted production cross section at next-
to-leading order [20] as well as the uncertainties on the
parton distribution functions (6%) and the renormaliza-
tion scale (2%). Since the number of observed events
is above expectations, the observed limits are slightly
worse than the expected limits. These limits extend at
large masses beyond those of LEP searches using photon
“pointing” methods [11].
In conclusion, we have performed the first search for
heavy, long-lived particles that decay to photons at a
hadron collider using data collected with the EMTim-
ing system at the CDF II detector. There is no excess
of events beyond expectations. As our search strategy
does not rely on event properties specific solely to GMSB
models, we can exclude any γ+jet+E/T signal that would
produce more than 5.5 events. We set cross section limits
using a supersymmetric model with χ̃01 → γG̃, and find
a GMSB exclusion region in the χ̃01 lifetime vs. mass
plane with the world-best 95% C.L. lower limit on the
χ̃01 mass of 101 GeV/c
2 at τχ̃0
= 5 ns. Future improve-
ments with similar techniques should also provide sen-
sitivity to new particle decays with a delayed electron
signature [2]. By the end of Run II, an integrated lumi-
nosity of 10 fb−1 is possible for which we estimate a mass
reach of ≃ 140 GeV/c2 at a lifetime of 5 ns.
)2 mass (GeV/c
65 70 75 80 85 90 95 100 105 110
0 1χ∼
)-1+1jet analysis with EMTiming (570 pb
Predicted exclusion region
Observed exclusion region
ALEPH exclusion upper limit
χ∼GMSB 
)=15β, tan(Λ=2messM
>0µ=1, messN
65 70 75 80 85 90 95 100 105 110
FIG. 3: The exclusion region at 95% C.L. as a function of eχ01
lifetime and mass for a GMSB model [7]. The predicted and
the observed regions are shown separately and are compared
to the most stringent published limit from LEP searches [11].
We thank the Fermilab staff and the technical staffs
of the participating institutions for their vital contribu-
tions. This work was supported by the U.S. Department
of Energy and National Science Foundation; the Italian
Istituto Nazionale di Fisica Nucleare; the Ministry of
Education, Culture, Sports, Science and Technology of
Japan; the Natural Sciences and Engineering Research
Council of Canada; the National Science Council of the
Republic of China; the Swiss National Science Founda-
tion; the A.P. Sloan Foundation; the Bundesministerium
für Bildung und Forschung, Germany; the Korean Sci-
ence and Engineering Foundation and the Korean Re-
search Foundation; the Particle Physics and Astronomy
Research Council and the Royal Society, UK; the Russian
Foundation for Basic Research; the Comisión Interminis-
terial de Ciencia y Tecnoloǵıa, Spain; in part by the Eu-
ropean Community’s Human Potential Programme un-
der contract HPRN-CT-2002-00292; and the Academy
of Finland.
[1] We use a cylindrical coordinate system in which the pro-
ton beam travels along the z-axis, θ is the polar angle, φ
is the azimuthal angle, and η = − ln tan(θ/2). The trans-
verse energy and momentum are defined as ET = E sin θ
and pT = p sin θ where E is the energy measured by
the calorimeter and p the momentum measured in the
tracking system. E/T = | −
EiT ~ni| where ~ni is a unit
vector that points from the interaction vertex to the ith
calorimeter tower in the transverse plane.
[2] J. L. Feng, A. Rajaraman and F. Takayama, Phys. Rev.
D 68, 063504 (2003); M. J. Strassler and K. M. Zurek,
arXiv:hep-ph/0605193.
[3] S. Ambrosanio et al., Phys. Rev. D 54, 5395 (1996);
C. H. Chen and J. F. Gunion, Phys. Rev. D 58, 075005
(1998).
[4] F. Abe et al. (CDF Collaboration), Phys. Rev. Lett. 81,
1791 (1998) and Phys. Rev. D 59, 092002 (1999).
[5] D. Acosta et al. (CDF Collaboration), Phys. Rev. D 71,
031104 (2005).
[6] V. Abazov et al. (D0 Collaboration), Phys. Rev. Lett. 94,
041801 (2005).
[7] B. C. Allanach et al., Eur. Phys. J. C25, 113 (2002). We
use benchmark model 8 and allow the eG mass factor and
the supersymmetry breaking scale to vary independently.
[8] D. Toback and P. Wagner, Phys. Rev. D 70, 114032
(2004).
[9] P. Wagner, Ph.D. Thesis, Texas A&M University, 2007.
[10] D. Acosta et al. (CDF Collaboration), Phys. Rev. D 71,
032001 (2005).
[11] A. Heister et al. (ALEPH Collaboration), Eur. Phys. J.
C 25, 339 (2002); also see M. Gataullin, S. Rosier, L. Xia
and H. Yang, arXiv:hep-ex/0611010; G. Abbiendi et al.
(OPAL Collaboration), Proc. Sci. HEP2005 346 (2006);
J. Abdallah et al. (DELPHI Collaboration), Eur. Phys. J.
C 38 395 (2005).
[12] The distribution of the pp̄ collisions has a standard devi-
ation of 30 cm and 1.3 ns in zi and ti, respectively.
[13] M. Goncharov et al., Nucl. Instrum. Methods A565, 543
(2006).
[14] The standard requirement, χ2CES < 20 (see F. Abe et
al. (CDF Collaboration), Phys. Rev. D 52, 4784 (1995)),
has been removed because there is evidence that it is in-
efficient for photons that arrive with large incident angles
relative to the face of the detector.
[15] See F. Abe et al. (CDF Collaboration), Phys. Rev. D 45,
1448 (1992). We use corrected jets reconstructed with
a cone of ∆R = 0.7, see A. Bhatti et al., Nucl. In-
strum. Methods A566, 375 (2006).
[16] T. Sjöstrand et al., Comput. Phys. Commun. 135, 238
(2001). We use version 6.216.
[17] We use the standard geant based detector simulation
[R. Brun et al., CERN-DD/EE/84-1 (1987)] and add a
parametrized EMTiming simulation.
[18] E. Boos, A. Vologdin, D. Toback, and J. Gaspard, Phys.
Rev. D 66, 013011 (2002).
[19] J. Conway, CERN Yellow Book Report No. CERN 2000-
005, 2000, p. 247.
[20] W. Beenakker et al., Phys. Rev. Lett. 83, 3780 (1999).
http://arxiv.org/abs/hep-ph/0605193
http://arxiv.org/abs/hep-ex/0611010
ABSTRACT
  We present the first search for heavy, long-lived particles that decay to
photons at a hadron collider. We use a sample of photon+jet+missing transverse
energy events in p-pbar collisions at \sqrt{s}=1.96 TeV taken with the CDF II
detector. Candidate events are selected based on the arrival time of the photon
at the detector. Using an integrated luminosity of 570 pb-1 of collision data,
we observe 2 events, consistent with the background estimate of 1.3+-0.7
events. While our search strategy does not rely on model-specific dynamics, we
set cross section limits in a supersymmetric model with
\tilde{\chi}_1^0->\gamma\gravitino and place the world-best 95% C.L. lower
limit on the \tilde{\chi}_1^0 mass of 101 GeV/c^2 at \tau_{\tilde{\chi}_1^0} =
5 ns.

<|endoftext|><|startoftext|>
Failure of the work-Hamiltonian connection for free energy calculations 
Failure of the work-Hamiltonian connection for free energy calculations 
Jose M. G. Vilar1 and J. Miguel Rubi2 
1Computational Biology Program, Memorial Sloan-Kettering Cancer Center, 1275 York 
Avenue, New York, NY 10021 
2Departament de Fisica Fonamental, Universitat de Barcelona, Diagonal 647, 08028 
Barcelona, Spain 
Abstract 
Extensions of statistical mechanics are routinely being used to infer free energies 
from the work performed over single-molecule nonequilibrium trajectories. A key 
element of this approach is the ubiquitous expression / ( , )dW dt H x t t/= ∂ ∂ , which 
connects the microscopic work W  performed by a time-dependent force on the 
coordinate x  with the corresponding Hamiltonian (H x t),  at time t . Here we show that 
this connection, as pivotal as it is, cannot be used to estimate free energy changes. We 
discuss the implications of this result for single-molecule experiments and atomistic 
molecular simulations and point out possible avenues to overcome these limitations. 
PACS numbers: 05.40.-a, 05.20.-y, 05.70.Ln 
Hamiltonians provide two key ingredients to bridge the microscopic structure of nature 
with macroscopic thermodynamic properties: they completely specify the underlying 
dynamics and they can be identified with the energy of the system [1].  At equilibrium, 
the link with the thermodynamic properties is established through the partition function 
( )H xZ e dβ−= ∫ x , which here uses the Hamiltonian  in the coordinate space ( )H x x  as the 
energy of the system [2]. In particular, the free energy is given by 
= − Z , where 
1 Bk Tβ ≡ /  is the inverse of the temperature T  times the Boltzmann’s constant . 
Thermodynamic properties play an important role because they provide information that 
is not readily available from the microscopic properties, such as whether or not a given 
process happens spontaneously. 
The connection between work and Hamiltonian expressed through the relation 
W H x
)t, , or equivalently  through its integral representation 
( ( ') ') '
W H x t t
∂∫ dt , is typically used to extend statistical mechanics to far-from-
equilibrium situations [3-5]. These relations are meant to imply that the work W  
performed on a system is used to change its energy. The potential advantage of this type 
of approach is that it would allow one to infer thermodynamic properties even when the 
relevant details of the Hamiltonian are not known or when they are too complex for a 
direct analysis. Experiments and computer simulations can thus be performed to probe 
the microscopic mechanical properties from which to obtain thermodynamic properties.  
Time-dependent Hamiltonians, however, provide the energy up to an arbitrary factor that 
typically depends on time and on the microscopic history of the system. Such 
dependence, as we show below, prevents this approach from being generally applicable 
to compute thermodynamic properties.    
To illustrate how work and Hamiltonian fail to be generally connected, we consider a 
system described by the Hamiltonian  under the effects of a time-dependent force 0( )H x
( )f t . The total Hamiltonian is given by  
 0( ) ( ) ( ) (H x t H x f t x g t), = − + , 
where  is an arbitrary function of time, which leads to a total force 
. The function  does not affect the total force but it changes the 
Hamiltonian. Therefore,  has to be chosen so that the Hamiltonian can be identified 
with the energy of the system.  
( )g t
0 /F H x f= −∂ ∂ + ( )t ( )g t
( )g t
In general, the arbitrary time dependence of the Hamiltonian, , cannot be chosen 
so that the Hamiltonian gives a consistent energy. Consider, for instance, that the system, 
being initially at 
( )g t
0x , is subjected to a sudden perturbation 0( ) ( )f t f t≡ Θ , where 0f  is a 
constant and  is the Heaviside step function. The work performed on the system, 
, where 
( )tΘ
0( tW f x x= − 0 ) ( )tx x t≡  represents the value of the coordinate x  at time , is in 
general different from 
' 0 0
( ') ' ( ) (0
tH x t dt f x g t gt
, = − + −
∂∫ ) , irrespective of the explicit 
form of the function .  ( )g t
To illustrate the consequences of the lack of connection between work and changes in 
the Hamiltonian, we focus on the domain of validity of nonequilibrium work relations [3] 
of the type 
 ,EG We eβ β− Δ −=  
which have been widely used recently to obtain estimates EGΔ  of free energy changes 
from single-molecule pulling experiments [6] and atomistic computer simulations [7]. 
The promise of this type of relations is that they provide the values of the free energy 
from irreversible trajectories and therefore do not require equilibration of the system. Yet, 
in almost all instances in which this approach has been applied, the agreement with the 
canonical thermodynamic results has not been complete and in some cases the 
discrepancies have been large. These discrepancies have been attributed to the presence 
of statistical errors in the estimation of the exponential average We β−  [8].  
Currently, the mathematical validity of these type of nonequilibrium work relations 
appears to be well established: they have been derived using approximations [3] and 
rigorously for systems described by Langevin equations [4, 5]. However, all these 
derivations rely in different ways on the work-Hamiltonian connection, which as we 
show below prevents them from giving general estimates of thermodynamic free 
energies. 
The free energy difference between two states is defined as revG WΔ = , where  
is the work required to bring the system from the initial to the final state in a reversible 
manner [2]. Note that, if the system is not macroscopic,  is in general a fluctuating 
quantity. At quasi-equilibrium, the external force 
( )f t  balances with the system force 
. After integration by the displacement, the reversible work done on the 
system is given by . Therefore, the free energy follows from  
( )H x x−∂ /∂
0 0( ) ( )rev tW H x H x= − 0
  0 0( , ) ( ,0)rev eq t eq tG W P x t P x dx dxΔ = ,∫ ∫
where the equilibrium probabilities  are obtained, in the usual way, from the 
Boltzmann distribution 
( )( , )
H x t
eq Z tP x t e
β− ,= . To be explicit, let us consider a harmonic 
system described by 210 2( )H x kx=  and ( ) 0g t = , with  a constant. In this case, we can 
compute exactly the free energy change: 
G kxΔ = ,  
where ( )eqx f t k≡ / , which leads to a positive value as required for non-spontaneous 
processes. 
One might have been tempted to use the partition function to estimate changes in free 
energy according to the expression 1 ln( ( ) (0))ZG Z t ZβΔ = − / , where 
( )( ) H x tZ t e dβ− ,= ∫ x  is 
the time-dependent quasi-equilibrium partition function [3, 4]. However, this relation is 
not valid when changes in the Hamiltonian cannot be associated with changes in energy. 
In the case of the harmonic potential, the use of the time-dependent partition function 
leads to 212Z eqG kΔ = − x , a negative value inconsistent with a process that is not 
spontaneous. More generally, the Hamiltonian 212( ) ( )(tH x t kx f t x )γ, = − − , where γ  is a 
constant parameter that does not affect the dynamics of the system, leads to 
2( )Z eq eqG kx xγΔ = − , which can be positive or negative depending on the value of γ . 
Therefore, the estimates ZGΔ  are not suitable to predict typical thermodynamic 
properties, such as whether or not a process happens spontaneously. 
To what extent does the failure of the work-Hamiltonian connection impact 
nonequilibrium work equalities? In the case of a sudden perturbation and a harmonic 
potential discussed previously, the following result follows straightforwardly:  
 0 0( ) 0 0( , ) ( ,0) 1t
f x xW
eq t eq te e P x t P x dx dx
ββ − −− = =∫ ∫ ,  
which is different from .  Ge β− Δ
An intriguing question then arises: why do experiments and computer simulations 
sometimes lead to results that agree with nonequilibrium work equalities? Let us consider 
a situation closer to the experimental and computational setups, with a harmonic time-
dependent force that constrains the motion on the coordinate x :  
 210 2( ) ( ) ( )tH x t H x K x X, = + − .  
Here  is a constant and K tX  is the time-dependent equilibrium position for the 
constraining force. In this case, with 210 2( )H x kx=  and 0 0X = , we also have 
2rev eq
G W kxΔ = = ,  
where now Keq tk Kx X+≡ . 
For quasi-equilibrium displacements of tX , so that the work performed is equal to 
the reversible work,  , we have  0 0( ) ( )rev tW W H x H x= = − 0
 0 0 0( ( ) ( )) 0 0( , ) ( ,0)rev t
W H x H x
eq t eq te e P x t P x dx
β β− − −= ,∫ ∫ dx  
which leads to 
( ) 2
2( 2 ) ( )
k k K
eqk K
W e ke
K k K
This result indicates that quasi-equilibrium does not guarantee the accuracy of the 
exponential estimate of the free energy from nonequilibrium work relations. The free 
energy change  and its exponential estimate GΔ EGΔ  agree with each other only for large 
values of . The reason is that, in this case, work and Hamiltonian are connected to each K
other when both quasi-equilibrium and large-  conditions are fulfilled simultaneously. 
Under such conditions, the work-Hamiltonian connection is valid because 
eq tx x X≈ ≈  
implies that the rate of change of the Hamiltonian, ( ) / ( ) /t tH x t t K x X dX dt∂ , ∂ = − − , 
equals the power associated with the external force, / ( ) /tdW dt K x X dx dt= − − . 
Interestingly, large values of  suppress fluctuations and lead to quasi-deterministic 
dynamics. Indeed, the experimental data [6] and computer simulations [7] indicate that 
the agreement between the free energy change 
GΔ  and its exponential estimate EGΔ  
occurs mainly for relatively slow perturbations that lead to quasi-deterministic 
trajectories.  
Bringing thermodynamics to nonequilibrium microscopic processes [9] is becoming 
increasingly important with the advent of new experimental and computational 
techniques able to probe the properties of single molecules [6, 7]. Our results show that 
the classical connection between work and changes in the Hamiltonian cannot be applied 
straightforwardly to time-dependent systems. As a result, quantities that are based on the 
work-Hamiltonian connection, such as those obtained from nonequilibrium work 
relations and time-dependent partition functions, cannot generally be used to estimate 
thermodynamically consistent free energy changes. A possible avenue to overcome these 
limitations, as we have shown here, is to identify the particular conditions for which work 
and changes in the Hamiltonian are connected to each other.  
References 
[1] H. Goldstein, Classical mechanics (Addison-Wesley Pub. Co., Reading, Mass., 
1980). 
[2] R. C. Tolman, The principles of statistical mechanics (Oxford University Press, 
London, 1955). 
[3] C. Jarzynski, Physical Review Letters 78, 2690 (1997). 
[4] G. Hummer, and A. Szabo, Proc Natl Acad Sci USA 98, 3658 (2001). 
[5] A. Imparato, and L. Peliti, Physical Review E 72, 046114 (2005). 
[6] J. Liphardt et al., Science 296, 1832 (2002). 
[7] S. Park et al., Journal of Chemical Physics 119, 3559 (2003). 
[8] J. Gore, F. Ritort, and C. Bustamante, Proc Natl Acad Sci USA 100, 12564 (2003). 
[9] D. Reguera, J. M. Rubi, and J. M. G. Vilar, Journal of Physical Chemistry B 109, 
21502 (2005). 
ABSTRACT
  Extensions of statistical mechanics are routinely being used to infer free
energies from the work performed over single-molecule nonequilibrium
trajectories. A key element of this approach is the ubiquitous expression
dW/dt=\partial H(x,t)/ \partial t which connects the microscopic work W
performed by a time-dependent force on the coordinate x with the corresponding
Hamiltonian H(x,t) at time t. Here we show that this connection, as pivotal as
it is, cannot be used to estimate free energy changes. We discuss the
implications of this result for single-molecule experiments and atomistic
molecular simulations and point out possible avenues to overcome these
limitations.

<|endoftext|><|startoftext|>
Introduction 
Adsorption of polymers on surfaces plays a key role in many technological applications 
and is also relevant to many biological processes. As a result, it has been studied for more than 
three decades1 and continues to receive intense interest.2 The field is rich and contains a wide 
variety of topics, from equilibrium properties of adsorbed layers and conformations of adsorbed 
polymer chains to dynamic properties and non-equilibrium processes in adsorption.2 For polymer 
adsorption on planar surfaces, it is well-known that there exists a critical adsorption point (CAP) 
that marks the transition of a polymer chain, in contact with a surface, from a non-adsorbed state 
to an adsorbed state.3 Scaling laws for a variety of quantities below, above and at the CAP for a 
homopolymer in contact with a planar surface were developed by Eisenriegler, Kremer, and 
Binder (EKB).4 For example, when the chain goes from a non-adsorbed state to an adsorbed 
state, the energy of the chain E changes from an intensive variable independent of chain length N 
to an extensive variable dependent on N. At the CAP, E is expected to scale with Nφ where φ is 
the crossover exponent. Numerical studies, including exact enumeration,5 the scanning method6,7 
and the multiple Markov chain method8 have been performed to determine the location of the 
CAP and the crossover exponent φ. The values reported are however not completely in 
agreement with each other and are still under debate, especially the crossover exponent φ. The 
disagreement may be traced, as suggested by a recent article,9 to different methods used for 
determining the CAP and the crossover exponent φ.  
While many studies focused on adsorption of homopolymers on planar homogeneous 
surfaces, adsorption of polymers on chemically or physically heterogeneous surfaces has also 
received a fair amount of studies.10-20 Some were inspired by specific applications such as 
segregation of polymer chains on patterned surfaces,10 or pattern transfer via surface 
adsorption,21,22 others were motivated by a desire to understand how the presence of surface or 
sequence disorders may influence adsorption.13,14,16,17,23-25 For example, Sebastian and Sumithra 
developed an analytical theory of the adsorption of Gaussian chains on random surfaces using 
Gaussian variational approach.24,25 They took surface heterogeneity into account by modifying 
de Genne’s adsorption boundary condition and analyzed influence of randomness on the 
conformation of the adsorbed chains. Adsorption of heteropolymers on heterogeneous surfaces, 
in particular, has been studied because of its relevance to molecular recognition in biological 
process. The concept of “pattern matching” was proposed26 and has been investigated with 
different approaches.12,20,26,27 Muthukumar for example derived an equation for the critical 
condition of adsorption of a polyelectrolyte to an oppositely charged patterned surface.26 
Golumbfski et al.12 showed that a statistical blocky chain was selectively adsorbed on a patchy 
surface while a statistically alternating chain was selectively adsorbed on an alternating surface. 
Jayaraman et al.19 described a simulation method to design surfaces for recognizing specific 
monomer sequences in heteropolymers. Recently Polotsky et al18 considered adsorption of 
Gaussian heteropolymer chains onto heterogeneous surface. They found that the presence of 
correlations between sequence and surface heterogeneity always enhances adsorption. However, 
the dependence of the critical adsorption point on either surface disorder or sequence disorder is 
not well-understood. Lack of this knowledge hampers further understanding on the correlation 
between sequence disorder and surface disorder during adsorption.  
Here we present theoretical equations that describe the dependence of CAP on the surface 
disorder or sequence disorder, along with Monte Carlo simulation data in agreement with the 
derived equations. The current study does not address the correlation between sequence disorder 
and surface disorder. We only consider cases where the disorder is either present randomly on 
the surface (i.e. adsorption of homopolymers on random heterogeneous surface) or on the 
sequence (i.e., adsorption of random copolymer on homogeneous surface). The correlation 
between sequence disorder and surface disorder will be the subject of future publications. In the 
following, we first present the theory that predicts the dependence of CAP on surface disorder 
and sequence disorder. Then we present details of Monte Carlo simulation methods used to 
determine the CAP, followed by simulation data that agree with the derived equations. Finally, 
we discuss implications of these results on practical applications such as chromatographic 
separations of polymers.  
2. Theory 
2.1 Adsorption of a homopolymer on a homogeneous surface 
We first consider adsorption of a homopolymer chain on a homogeneous surface. This 
can be represented by a self-avoiding walk (SAW) in a three-dimensional lattice interacting with 
a plane and restricted to lie on one side of the plane. The vertices of the walks interact with the 
surface sites with an attractive energy εw. The partition function for a N-step SAW interacting 
with a homogeneous surface is given by 
( )∑=
wNw vvcNZ εε exp)(),(homo  (1) 
where cN(v) is the number of SAWs that lie above the surface with v visits to the surface. 
Hammersley et al.28 have shown that the model exhibits a phase transition at a critical adsorption 
energy, εc, with a desorbed state for εw  < εc, and an adsorbed state for εw  > εc. They have shown 
that the limiting monomer free energy f(εw) 
),(log
lim)( homo wNw NZN
=  (2) 
exists and is a convex non-decreasing continuous function of εw. Moreover, f(εw)=κ  for εw ≤ 0, 
where κ is the lattice connective constant, and f(εw) is a strictly increasing function of εw when 
εw  > εc. Therefore, f(εw) is non-analytic at εw = εc. εc has also been determined to be greater than 
zero and, based on the best-known connective constant for the simple cubic lattice29, to have an 
upper bound of 0.5738. The lattice connective constant κ is also the limiting monomer free 
energy of the SAWs in bulk solution. Hence the CAP can be understood as the condition where 
the limiting monomer free energy of a chain attached to the surface becomes equal to the limiting 
monomer free energy of the chain in the bulk solution.  
2.2 Adsorption of a homopolymer on a random heterogeneous surface 
Now we consider the adsorption of a homopolymer interact with a heterogeneous surface 
consisting of two types of surface sites, A and B. The interaction energy of the vertices with the 
two surface sites are εwA and εwB. Following Soteros and Whittington23, and express the partition 
function of a N-step SAW interacting with a heterogeneous surface that consists of A and B 
surface sites as: 
( ) ( )∑ ∑
ANBAhet BvfAvfAv
vcffZ
)()( )((exp)(exp
)(),( εε            (3) 
where cN(v) is the number of walks that have v surface contacts, v(A) is the number of monomers 
interacting with the A sites, and v(B) is the number of monomers interacting with the B sites, fA 
and fB = 1 - fA are the fractions of A and B sites on the surface, respectively.  Here the partition 
function is averaged over random distributions of the surface sites, i.e. the so called annealed 
approximation. Physically the annealed disorder means that the type of surface sites may change 
while the system attains equilibrium state. However, it has been previously suggested11,15 that the 
annealed approximation is valid if the chain can visit a large area of the surface and hence 
samples all distributions of surface patterns. Furthermore, the surface sites are randomly 
distributed. If there is a correlation between surface disorders, such as those present in patchy 
surface or alternating surface, then Eq. (3) will not be valid, as Eq. (3) gives equal weight to all 
possible surface labelings, while correlations restrict possible labelings. Summing over v(A), 
equation (3) can be simplified to 
( )∑ +=
wANBAhet ffvcffZ )exp()exp()(),( εε  (4) 
A comparison of equations (1) and (4) reveals that the partition functions for homogeneous and 
annealed random heterogeneous surface become equivalent if    
( ) ( ) )exp(expexp BwBAwAw ff εεε +=  (5) 
From Eq. (5), we derive the following equation that gives the dependence of CAP on the surface 
disorder: 
( ) ( ) ))(exp(exp)1()(exp ccffcc BwBAwBhw εεε +−=  (6) 
 where εwh(cc) is the CAP of a homopolymer above a homogeneous surface, εwB(cc) is the CAP 
of a homopolymer above a heterogeneous surface while the surface interaction energy εwA held 
constant. It can be easily seen from this equation, that the dependence of the CAP on the 
percentage of attractive sites on the surface is not expected to be linear, in contrast to the 
conclusion drawn by an earlier study.13 Equation (6) is expected to be valid as long as the two 
conditions are met: (i) the chain has enough mobility to visit a large area of surface so that the 
annealed approximation is valid, and (ii) the surface sites are randomly distributed (i.e. 
uncorrelated).  
2.3 Adsorption of a random heteropolymer on a homogeneous surface 
The same approach can be extended to consider the adsorption of a random 
heteropolymer interacting with a homogeneous surface.  We will use the same notation as in 
previous section except now fA and fB represent fractions of A and B monomers present on the 
heteropolymer. We will only consider random copolymers composed by A and B monomers. 
The sequence of a random copolymer can be represented by χ ={χ1, χ2, … χN} where χi are 
independently and identically distributed random variables with χi =A with a probability of fA 
and χi=B with a probability of 1-fA.  A sequence order parameter λ can be defined to characterize 
the sequence randomness.12,27  
BAAB pp −−= 1λ  (7) 
where pij is the nearest neighbor transition probabilities which is the probability that a monomer 
of type i is followed by a monomer of type j. When λ=0, the sequence is random. When 
λ>0, then the sequence is statistically blocky, and when λ<0, the sequence is statistically 
alternating. We note that a given random sequence designated by χ may have non-zero values of 
λ. More discussions will be given in the later section. 
The partition function of N-step SAWs with the given sequence above a homogenous 
surface is written as: 
)exp()|,(),,( BwB
BANBAhetpoly vvvvCffZ εεχχ += ∑  (8) 
There are two different ways to average over different distributions of random sequences, 
namely the annealed average and the quenched average. With the annealed average, the partition 
function in Eq. (8) is first averaged over different distributions of χ. This then leads to a partition 
function, Zhetpoly(fA, fB), which is exactly the same as in Eq. (3). With the annealed 
approximation, we derive the same equation as given by Eq. (6) for the CAP of a random 
heteropolymer interacting with a homogeneous surface, provided that fA and fB now represent the 
fractions of A and B monomers on the chain.  
In the following, we will present Monte Carlo simulation data that conform to the two 
equations and also results that do not conform to the equations because of the invalidation of the 
approximations used in deriving the equations.  
3. Monte Carlo Simulation Methods 
In our simulations, polymer chains are modeled as SAWs with N vertices on a simple 
cubic lattice of dimensions 250a × 250a × 100a, where a is the lattice spacing.  Each vertex 
represents a monomer on the polymer chain. Chain lengths studied are in the range of N = 25 to 
250.  There is an impenetrable wall in the z = a plane representing the surface.  One monomer, 
picked randomly from the chain, is first placed on a site adjacent to the wall (in the z = 2a plane). 
The rest of the chain is then grown using the biased chain insertion method.30 Monomers that are 
in the z = 2a plane are considered to be adsorbed on the surface.  For all adsorbed monomers, an 
attractive polymer-surface interaction, εw, is applied.  The standard chemical potential of the 
chain (since it does not contain translation entropy), µ0, is calculated from the Rosenbluth-
Rosenbluth weighting factor, W(N), which is given by30  
0 ln)(lnβµ  and 
)exp( β
 (9) 
where z is the lattice coordination number (z = 6 for simple cubic lattice), Ej is the energy of ith 
inserted monomer in the jth potential direction. We note that µ0 calculated is the free energy per 
chain, and µ0/N is free energy per monomer discussed in equation (2). Typically, the chemical 
potential is determined based on about twenty million copies of trial chain conformations. 
We obtained the standard chemical potentials of a chain with at least one monomer 
attached to the surface, µads0, and compared that against a chain grown in a bulk solution, µbulk0. 
The bulk solution is modeled by a 100a × 100a × 100a lattice with periodic boundary conditions 
applied in all three directions.  All chemical potentials calculated are reduced by the Boltzmann 
factor, β=1/kBT=1. A coefficient K, similar to partition coefficient if the chain was placed in a 
pore instead of near a surface, is calculated by K =exp(-∆µ0), where ∆µ0 = µads0 − µbulk0. The way 
we determined the CAP is based on the dependence of K on the chain length N and will be 
presented in the results section.  
Heterogeneous surfaces were modeled by making the z = a plane composed of two 
different types of sites, which have different values for polymer-surface interactions. The 
designations εwA and εwB will be used to distinguish between interaction energies of different site 
types.  Simulations were performed using surfaces with different fractions of A and B sites.  
Surfaces were created by randomly assigning each site as A or B based on the probabilities, pA 
and pB, where pA and pB are, respectively, the desired fractions of A and B sites on the surface.  
Because of size of the surface, this procedure resulted in the real surface composition 
percentages matching the desired percentages within 0.1%.  For a given surface composition, the 
surface was randomly created once and was subsequently used in all simulations that determine 
the chemical potential of a chain above that surface. The surfaces displayed quenched 
randomness, i.e. the surface pattern remained unchanged throughout the simulations.  However, 
the first bead of chain was placed randomly over the surface during the chain insertion, and 
hence the chemical potential determined has been averaged over different surface randomness. 
Therefore, the annealed approximation used in deriving Eq. (6) was met in the simulations. In a 
few cases, patchy and alternating surfaces were created by simulating a two-dimensional Ising 
model at appropriate conditions.  
Heteropolymers were modelled as SAWs consisting of two types of monomers, A and B 
with specified fractions fA and fB=1-fA.. Chains were created by randomly selecting N*fB different 
positions along the chain to be B beads, while the remaining beads were assigned as A beads, 
ensuring that the chain had the exact composition called for by fA and fB. The sequence order 
parameter, λ, in generated random sequences exhibits a Gaussian distribution with zero mean. 
Examples of distributions are presented in Figure 1.  The longer the chain, the narrower the 
distribution is. For a given chain length N, we typically generate 5000 copies of random 
sequences with specified fA. Each sequence is then used in biased insertion for 5000 or more 
copies to obtain the Rosenbluth-Rosenbluth weighting factor. Letting W(N, χ) stands for the 
sequence order parameter λ
-1.0 -0.5 0.0 0.5 1.0
N=100
N=200
Figure 1: Distribution of sequence order parameters obtained from 5000 copies of 
random sequences generated with fA = fB = 0.50 for three different chain lengths. Lines 
are smooth fit to the data. 
Rosenbluth-Rosenbluth weighting factor obtained for a given sequence χ, the chemical potential 
of a chain can be obtained using two different averages over sequences: 
),(ln)(0 χβµ NWNads −=  (10) 
),(ln),()( 00 χχβµβµ NWNN adsads −==  (11) 
The first approach is the annealed average, while the second approach is the quenched average. 
The two chemical potentials calculated differ slightly from each other. More discussion of the 
quenched versus annealed averages will be given later. For the determination of CAP, we have 
used annealed chemical potentials. 
4. Results and Discussion 
4.1. Method Used to Determine the Critical Adsorption Point 
The method we used to determine the CAP follows our earlier papers31,32 and is briefly 
sketched out. We obtain the difference in standard chemical potential ∆µ0 at different surface 
interaction εw for a set of chains with different lengths.  An example of data is presented in 
Figure 2(a) for a homopolymer above a homogeneous surface. The lines for different length N 
nearly intersect at a common point, which is estimated to be at εc=0.276 ± 0.005. A convenient 
way to identify this intersection point is to plot the standard deviation of all ∆µ0, σ(∆µ0), for a 
given range of chain length studied versus εw, which yields a minimum in a plot shown in Figure 
2(b). The minimum identified is directly related to the critical condition point employed in liquid 
chromatography at the critical condition (LCCC) 32-34. In LCCC, the critical condition was 
defined as the co-elution point of homopolymers with different molecular weights, which, 
corresponding to computer simulation, is the point where K has least dependence on chain 
length. If K is truly independent of chain length, then σ(∆µ0)  will be zero and will be the 
minimum in a plot in Figure 2(b). The critical condition point bracketed in this fashion depends 
slightly on the range of chain length included in the calculation of σ(∆µ0).  However, in the 
current study we fixed the range of chain lengths used.  
Since this common intersection point does not occur at ∆µ0 =0, one may wonder if it is 
the critical adsorption point discussed in the literature. We have applied the same method for 
random walks above a planar surface in simple cubic lattice31. The intersection point found was 
at εc = 0.183± 0.002, in excellent agreement with expected CAP for random-walks, εc = -ln(5/6)= 
0.1823.1 On the other hand, CAP could be understood as the point where the limiting monomer 
free energy for a chain attached to the surface f(ε) equals to the limiting monomer free energy of 
an unattached chain in the bulk solution. Therefore, we may define a CAP at a finite chain 
0.18 0.20 0.22 0.24 0.26 0.28 0.30 0.32 0.34
N=100
N=200
0.18 0.20 0.22 0.24 0.26 0.28 0.30 0.32 0.34
Figure 2: (a) Plot of ∆µ0 versus εw for SAW chains with N =25, 50, 100 and 200 above a 
homogeneous surface. The critical adsorption point is identified as the common intersection 
point, εw(cc)=0.276±0.005. (b) Plot of deviation in ∆µ0 for the given range of N versus εw. The 
minimum in the plot is the critical adsorption point. 
length, εc (N), at which ∆µ0(N)=0. From Figure 2(a), we extract such εc (N).  This εc (N) is 
expected to depend on N in a scaling law, εc (N) = εc(∞) –αN−φ, and εc(∞) is the CAP at infinite 
chain length limit. Assuming φ = 0.5, Figure 3 shows the linear fitting of εc (N) versus N−0.5 
which yields εc (∞) = 0.274 ± 0.005. The εc (∞) identified is within the error bars of the common 
intersection point. 
The CAP of SAWs in simple cubic lattice has been studied by others.6-8 The reported literature 
value for the CAP of SAWs on the simple cubic lattice ranged from ~0.37 by Ma et al.35 down to 
0.288 ± 0.02 by Janse van Rensburg and Rechnitzer8.  The value reported by Ma et al. was 
considered to be too high, probably due to chains analyzed being too short. Methods used to 
determine the CAP varied in the literature. Meirovitch and Livne6 obtained the CAP for SAW in 
simple cubic lattice with Monte Carlo simulations with the scanning method. They plotted 
E(T)/N against N and found the exponent α in E(T)/N~Nα over three different ranges of chain 
length (N = 20-60, 60-170, and 170-350).  Then, the critical point was located by finding the 
N-0.5
0.00 0.05 0.10 0.15 0.20 0.25
Figure 3: Plot of εc(N) versus N-0.5 where εc(N) is extracted from figure 1(a) as 
the point when ∆µ(N) = 0. The extrapolated εc(∞) =0.274 ± 0.005.  
value of the reciprocal temperature Θ that resulted in the exponent α being constant for the three 
different ranges of chain lengths. Their reported Θc, which is equivalent to our εc, was 0.291 + 
0.001. Their method for determining Θc was based on the scaling theory developed by EKB.4 As 
stated earlier, at CAP, E(T)/N is expected to scale with Nφ-1 where φ is the crossover exponent.  
The value of this crossover exponent was debated. EKB first showed that φ  ≈ ν ≈ 0.59, where ν 
is the Flory’s exponent. Several recent reports suggest that φ = 0.5 even for SAW chains, the 
same as φ for random-walks.8,36 In Meriovitch and Levin’s study, φ was left as an adjustable 
parameter. The reported φ value in their study was =0.530+ 0.007, slightly larger than recent 
reported values φ=0.5.  If we were to take φ=0.5, then their data would suggest a lower Θc. 
Recently Decase et al.9 explored four different ways to determine the CAP, mostly based on the 
scaling idea. They found that a slight change of εc lead to large deviations in the resulting φ. 
Therefore, simultaneous determination of εc and φ may not give the true location of CAP. Janse 
van Rensburg and Rechnitzer8 studied CAP for SAWs in two and three dimensions using a 
variety of methods, including studying the energy ratios of walks of different lengths and the 
specific heats of the chains. They found that analysis of the specific heat data in three dimensions 
were fraught with difficulty. The energy ratios of different lengths and the free energy method 
yielded εc within the error bars. They reported a value for the CAP, εc=0.288 + 0.020 and a 
crossover exponent φ = 0.5005 + 0.0036. Our CAP is within the error bars of their reported 
value. Interestingly, if they assume that the convergence of the energy ratios of different chain 
lengths is proportional to N1 , the yielded εc = 0.276 + 0.029, exactly the same as in our study.  
The above discussion suggests that the critical condition determined with our approach is 
the CAP. Our approach to determine the CAP does not depend on knowledge of φ and therefore 
does not suffer from the uncertainty in εc when both εc and φ need to be determined 
simultaneously. In the remainder of the paper, we will use this method to determine the CAP of 
SAWs above a planar heterogeneous surface and SAWs for heteropolymers above a planar 
homogeneous surface.  
4.2. Homopolymers above Heterogeneous Surfaces with Attractive and Non-Interacting Sites 
Here we consider adsorption of homopolymers above a heterogeneous surface. The first 
type of heterogeneous surface studied consists of a surface composed of two types of sites.  One 
type of the surface sites, which will be called A sites, did not interact with the polymer chains; 
that is, εwA = 0. The other type of surface site, the B sites, had an attractive interaction with the 
polymer chains, εwB.  The value of εwB was varied to locate the CAP.  Figure 4 shows a plot of 
the standard deviations in β∆µ0 over all chain lengths for each value of εwB scanned. The 
minimum in standard deviations occurs for εwB(cc) = 0.49± 0.01, where the error was based on 
the energy increment scanned. The same method was used to determine the CAP for surfaces 
with 10%, 15%, 20%, 25%, and 75% attractive sites. Table I summarizes the CAP of 
homopolymers over heterogeneous surfaces along with the data over a homogeneous surface. 
Figure 5 presents the plot of CAP, εwB(cc), as a function of fB along with the theoretical 
prediction according to Eq. (8) with εwA = 0 and εwh(cc) = 0.276.  
It is clear that a good agreement between Eq. (6) and simulation data is observed. Also 
we note that CAP is not linearly dependent on fB over the entire range but is well-described by 
Eq.(6). Earlier study by Sumithra and Baumgaertner13 focused on surfaces with fB above the 
percolation threshold. Within that limited range of fB, a linear dependence may be obtained. This 
study is the first to confirm the dependence of CAP on the surface disorder over a wide range of 
fB.  
0.44 0.46 0.48 0.50 0.52 0.54 0.56
Figure 4: Plot of deviation in ∆µ0 against εwB for a homogeneous chain 
adsorbing on a surface with 50% attractive sites and 50% non-interacting 
sites. The CAP occurs at εwB = 0.49 + 0.01. 
 As discussed in the theory section, one of the assumptions used in deriving Eq. (6) is that 
the interacting surface sites are randomly distributed. We have tested this assumption by 
studying adsorption of homopolymers over a 50% surface with alternating and patchy patterns. 
For a surface with 50% of A and B, an order parameter O.P. can be defined (readers are referred 
to literature for the definition).19 If O.P.=0, the surface is random; if O.P.=+1, then the surface is 
patchy; and if O.P.=-1, the surface is alternating. The data are also included in Table I and are 
indicated in Figure 4. The two points deviate from the line described by Eq. (6). The CAP 
obtained over a 50% alternating surface is larger than that over a 50% random surface.  On the 
other hand, the CAP obtained over a 50% patchy surface is smaller than over a 50% random 
surface. These results can be easily understood. When a chain is adsorbed on the surface, it 
forms trains, loops and tails.1 Formation of trains lowers the energy of a chain to overcome the 
entropy loss during the adsorption. When a chain is in contact with an alternating surface, it is 
however difficult to form trains as no adsorbing sites are adjacent, while this is possible for 
Percent B sites
0 20 40 60 80 100
Figure 5: Plot of the CAP, εwB(cc), against the percent of attractive B sites, fB.  The 
symbols are the CAP determined by the simulation, and the solid line is from equation 
(6) with εWA =0.0 and εwh(cc) =0.276. Circles are CAP over random surfaces, the cross 
(×) is the CAP over a strictly alternating surface, and the upper triangle (∆) is the CAP 
over a patchy surface with O.P. =+0.94.  
random and patchy surfaces. Therefore, chains attraction to the alternating surface is lessened, 
and adsorption over a 50% alternating surface has to occur at a larger value of εw. On the other 
hand, a chain over a patchy surface can selectively sample patches of the surface composed of 
adsorbing sites, so the adsorption over patchy surface can occur at a smaller value of εw. 
Another assumption used in deriving Eq. (6) is the annealed approximation. This 
approximation is strictly met if the surface pattern in contact with the chain changes during the 
chain adsorption,11 hence averaging over different distributions can be performed as done in Eq. 
(3). The surface in this case is said to contain annealed randomness. If the surface pattern can not 
change, then the surface is said to contain quenched randomness. In our simulations, the surface 
contains quenched randomness. In fact, we have used only one realization of a quenched random 
surface. However, the chain was placed randomly over different surface sites, making the 
annealed approximation applicable to our simulations. We note that Sumithra and 
Baumgaertner13, in their studies, averaged over 50 different realizations of quenched randomness 
and they compared the results with that of a single surface realization. They did not find major 
difference between these two approaches, especially if the temperature is high. Moghaddam and 
Whittington16 investigated the difference between the quenched average and the annealed 
average for homopolymer adsorption on heterogeneous surface and random copolymer 
adsorption on homogeneous surface. Their data show that there was no difference between the 
two averages in the case of adsorption on random surfaces but there were differences for 
adsorption of random copolymers especially at low temperature. It has been argued that 
quenched and annealed averages are equivalent in cases where the quenched surface is large in 
comparison with the polymer.11,15 Polotsky et. al18 have also found that the CAP for quenched 
and annealed surface disorders are the same. In our simulations, the surface is large in 
comparison with the size of the polymer, and the attachment of the polymer to the surface occurs 
at many random places on the surface. Therefore, the chain can effectively interact with many 
different random arrangements of surface sites, and the system approaches the annealed average.  
4.3. Homopolymers above Heterogeneous Surfaces with All Sites Interacting 
In order to assess whether the equation derived for the CAP of random surfaces was valid 
in more general cases, random surfaces that contained all attractive sites were prepared.  For 
these surfaces, the polymer-surface interaction for the A sites, εwA, was set at a relatively weak 
attractive strength, 0.10, and the interaction for the B surface sites was varied to find the CAP.  
Additionally, surfaces with repulsive A sites (εwA = -.10) were also investigated. 
Percent B Sites
0 20 40 60 80 100
Figure 6: Plot of the critical adsorption point, εwB(cc), against the percent of attractive 
B sites for surfaces with attractive or repulsive A sites.  The dashed line and open 
symbols are for surfaces with slightly repulsive A sites, εwA=-0.10.  The solid line and 
closed symbols are for surfaces with slightly attractive A sites, εwA=+0.10.  The 
symbols are simulation results, while the lines are from equation (6) with the 
corresponding εwA values. 
Figure 6 shows the values of εwB(cc) determined for these two cases, as well as the prediction of 
the value of εwB(cc) given the values of fB and εwA used in the simulation.  As can be seen in the 
figure, there is a good agreement between the data and the equation, indicating that the equation 
is valid for surfaces with many different types of surfaces, not just surfaces with attractive and 
non-interacting sites. 
4.4. Random Copolymers above Homogeneous Surfaces  
Critical adsorption point for random copolymers adsorbing on homogeneous surfaces 
were also determined.  In these systems, polymer chains are considered to be composed of two 
different types of monomers, A’s and B’s, interacting with a surface composed of only one type 
of site. B monomers were attracted to the surface, while A monomers do not interact with the 
surface, i.e. εwA = 0. Table 2 shows the values of the CAP, εwB(cc), for various values of  fB along 
with results obtained for homopolymers, alternating copolymers and block copolymers. Here we 
have used annealed chemical potentials to determine the CAP. Figure 7 presents the plot of 
B(cc) as a function of fB along with the theoretical prediction according to Eq. (6) with εwA = 0 
and εwh(cc) = 0.276.  The data fit the equation well for situations in which sequences are 
randomly specified. However, similar to homopolymer adsorption on heterogeneous surfaces, 
the equation does not apply when the chain sequence is not random. For a diblock copolymer, 
where the first half of the chain is all A monomers while the second half of the chain is all B 
monomers, a weaker attraction is required to reach the CAP than for a random 50% copolymer 
chain.  An alternating copolymer requires a slightly stronger attraction to reach the CAP. Again, 
these results can be explained by considering the tendency of forming trains during adsorption.  
The diblock copolymer is a homogeneous string of adsorbing B monomers attached to a string of 
A monomers.  The B section of the chain is able to interact with the surface like a homogeneous 
chain, while the A section does not adsorb and slightly repels the chain from the surface, 
indicating that the value of εwB(cc) for a diblock chain should be similar to a homogeneous chain 
on a homogeneous surface. In fact, εwB(cc) = 0.30 for diblock copolymers, a value only slightly 
higher than for homopolymer adsorption, and much lower than εwB(cc) for a 50% random 
copolymer chain.  For an alternating chain, consecutive attractive interactions are not possible, 
resulting in the necessity of a stronger εwB(cc) than for a random chain. 
Finally we compare the chemical potential determined with annealed approximation 
versus quenched average. We found that the chemical potential of a random copolymer above 
the surface, µ0ads, obtained via the annealed average in Eq. (10) was smaller than the quenched 
Percent B monomers
0 20 40 60 80 100
Figure 7: Plot of the CAP, εwB(cc), of copolymers over a homogenous surface against the 
percent of attractive B monomers, fB.  The symbols are the CAP determined by the 
simulation, and the solid line is the plot according to equation (6) with εWA =0.0 and 
h(cc) =0.276. Circles are CAP of random copolymers, the cross (×) is the CAP of block 
copolymers, and (∆) is the CAP of alternating copolymers. 
average in Eq. (11). This has been suggested in the literature.23 Annealed approximation implies 
that the chain sequence can change when it interacts with the surface. As a result, the chemical 
potential is lowered when compared with a chain with a fixed sequence. Figure 8 below shows 
the distribution of µ0ads, obtained based on trial insertions of a given random sequence, against 
the sequence order parameter λ. As discussed in section 3, a generated random sequence may not 
correspond to exactly λ=0, therefore resulting a distribution of µ0ads against λ. Figure 8 shows 
that within the range of λ spanned by random sequences, the chemical potential is seen to depend 
on λ. The µ0ads is higher for negative λ and is lower for positive λ. This is consistent with the 
results in Table II. A negative λ implies the random copolymer chain exhibits statistically 
alternating behaviour. A higher µ0ads implies that the chain is more difficult to be adsorbed on the 
surface; therefore, it needs a stronger attraction to reach CAP.  
-0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4
Figure 8: The distribution of µ0ads versus the sequence order parameter λ of a random 
copolymer. Each data point represent one µ0ads based on insertion of one given random 
sequence for 5000 times and the figure contains data for 5000 random sequences. Chain 
length N =100, fA = fB = 0.5, and εwA =0.0 and εWB =-0.5.   
5. Summary Remarks 
Polymer adsorption at surfaces is relevant to many practical applications and has thus 
received extensive experimental investigation. However, interest in the CAP, to a large degree, 
has, until recently, remained a theoretical exercise. There were neither experimental methods that 
directly measure the CAP, nor were there applications that depended on the exact location of the 
CAP.  This has now changed as interesting applications in liquid chromatography separations 
have been developed.37,38 In particular, liquid chromatography at the critical condition (LCCC), 
first reported in the 1980’s, has now widely used for characterization of polymer systems that 
contain structural and chemical heterogeneities. The critical condition in LCCC experiments was 
defined as the point where homopolymers of a specific type co-elute regardless of their 
molecular weights. By erasing the dependence of elution on the molecular weights of one 
species, other species, differing either chemically or structurally, can then be analyzed. 
Experimentalists39 have mostly regarded this critical condition as the CAP. Our earlier Monte 
Carlo simulations largely support this view.31-34 The current study provides knowledge on the 
dependence of CAP on sequence disorder or surface disorder and such knowledge will be useful 
to develop chromatographic methods for analyzing random copolymers.  
We note that several earlier studies 13,14,16,17 have examined the adsorption of polymers on 
surfaces with either surface disorder or sequence disorder. These studies examined influence of 
disorder on a variety of properties related to polymer adsorption, such as the change of heat 
capacity, energy of the chain, and radius of gyration of the chain. Very few, however, have tried 
to determine the dependence of CAP on the disorder. One of possible reasons that hamper these 
earlier studies to study the dependence of CAP on the disorder may be due to the lack of a 
convenient way to determine the CAP. As we have discussed in the theory section, CAP was 
typically understood as the phase transition of an infinitely long chain near a surface. Earlier 
studies trying to determine the CAP need to wrestle with the difficulty in extrapolation of results 
to the limit of infinitely long chain. On the other hand, validity of our studies hinges on the way 
we determine the CAP. In the case of adsorption of homopolymers over homogeneous surface, 
we discussed the relationship between the CAP determined by our method with reported 
literature values. Abundant evidence that supports the validity of our approach was presented in 
section 4.1.  However, for the adsorption over heterogeneous surface, the nature of this CAP is 
not well-understood. Can a long chain in contact with a surface with few adsorbing sites still 
exhibit a phase transition similar as that of homopolymers over homogeneous surface? If it does, 
is the transition first-order or second-order? These questions therefore may cast some doubt on 
the CAP determined by our approach in the presence of disorder. However, the CAP we 
determined is directly related to the critical condition point in LCCC. Hence, even though the 
physical meaning of the CAP determined in this study in the presence of disorder could be 
subjected to further scrutiny, the importance of our results is not undermined.  
Table 1: Critical Adsorption Point for Homopolymers above Heterogeneous Surfaces with 
Attractive B Sites and Non-interacting A Sites. 
Percentage of Attractive Sites εwB(cc) 
100% 0.276 + 0.005 
75% 0.35 + 0.01 
50% 0.49 + 0.01 
25% 0.82 + 0.01 
20% 0.96 + 0.01 
15% 1.15 + 0.01 
10% 1.45 + 0.01 
50% alternating surface 0.55+ 0.01 
50% patchy surface (O.P=0.94) 0.31+ 0.01 
Table 2: Critical Adsorption Point for Heteropolymers with Attractive B Monomers and 
Non-interacting A Monomers over Homogeneous Surface 
Percentage of B monomers εwB(cc) 
100% 0.276 + 0.005 
75% 0.36 + 0.01 
50% 0.49 + 0.01 
25% 0.84 + 0.01 
15% 1.16 + 0.01 
50% alternating copolymers 0.55 + 0.01 
50% block copolymers 0.30 + 0.01 
References: 
(1) Fleer, G. J.;  Cohens Stuart, M. A.;  Scheutjens, J. M. H. M.;  Cosgrove, T.; Vincent, B. 
Polymers at Interfaces; Chapman & Hall: London, UK, 1993. 
(2) O'Shaughnessy, B.; Vavylonis, D. J. Phys.: Conden. Matt. 2005, 17, R63-R99. 
(3) De Gennes, P. G. Scaling Concepts in Polymer Physics; Cornell Univ. Press: Ithaca, 
1979. 
(4) Eisenriegler, E.;  Kremer, K.; Binder, K. J. Chem. Phys. 1982, 77, 6296-6320. 
(5) Ishinabe, T. J. Chem. Phys. 1982, 77, 3171-3176. 
(6) Meirovitch, H.; Livne, S. J. Chem. Phys. 1988, 88, 4507-4515. 
(7) Livne, S.; Meirovitch, H. J. Chem. Phys. 1988, 88, 4498-4506. 
(8) van Rensburg, E. J. J.; Rechnitzer, A. R. J. Phys. A: Math. Gen. 2004, 37, 6875-6898. 
(9) Decase, R.;  Sommer, J.-U.; Blumen, A. J. Chem. Phys. 2004, 120, 8831-8840. 
(10) Balazs, A. C.;  Huang, K.;  McElwain, P.; Brady, J. E. Macromolecules 1991, 24, 714-
717. 
(11) Wu, D.;  Hui, K.; Chandler, D. J. Chem. Phys. 1992, 96, 835-841. 
(12) Golumbfskie, A. J.;  Pande, V. S.; Chakraborty, A. K. Proc. Nat. Acad. Sci. 1999, 96, 
11707-11712. 
(13) Sumithra, K.; Baumgaertner. J. Chem. Phys. 1998, 109, 1540-1544. 
(14) Sumithra, K.; Baumgaertner, A. J. Chem. Phys. 1999, 110, 2727-2731. 
(15) Charkraborty, A. K. Phys. Rep. 2001, 342, 1-61. 
(16) Moghaddam, M. S.; Whittington, S. G. J. Phys. A: Math. Gen. 2002, 35, 33-42. 
(17) Moghaddam, M. S. J. Phys. A: Math. Gen. 2003, 36, 939-949. 
(18) Polotsky, A.;  Schmid, F.; Degenhard, A. J. Chem. Phys. 2004, 121, 4853-4864. 
(19) Jayaraman, A.;  Hall, C. K.; Genzer, J. Phys. Rev. Lett. 2005, 94, 078103. 
(20) Bogner, T.;  Degenhard, A.; Schmid, F. Phys. Rev. Lett. 2004, 93, 268108-268101-
268104. 
(21) Genzer, J. J. CHem. Phys. 2001, 115, 4873-4881. 
(22) Genzer, J. Macromol. Theory Simul. 2002, 11, 481-493. 
(23) Soteros, C. E.; Whittington, S. G. J. Phys. A.: Math. Gen. 2004, 37, R279-R325. 
(24) Sumithra, K.; Sebastian, K. L. Journal of Physical Chemistry 1994, 98, 9312-9317. 
(25) Sebastian, K. L.; Sumithra, K. Phys. Rev. E. 1993, 47, R32-R35. 
(26) Muthukumar, M. J.Chem. Phys. 1995, 103, 4723-4731. 
(27) Bratko, D.;  Chakraborty, A. K.; Shakhnovich, E. I. Chem. Phys. Lett. 1997, 280, 46-52. 
(28) Hammersly, J. M.;  Torrie, G. M.; Whittington, S. G. J. Phys. A: Math. Gen. 1982, 15, 
539-571. 
(29) Arteca, G. A.; Zhang, S. Phys. Rev. E. 1998, 58, 6817-6820. 
(30) Frenkel, D.; Smit, B. Understanding molecular simulations-from algorithms to 
applications; Academic Press: San Diego, CA, 2002. 
(31) Gong, Y.; Wang, Y. Macromolecules 2002, 35, 7492-7498. 
(32) Orelli, S.;  Jiang, W.; Wang, Y. Macromolecules 2004, 37, 10073-10078. 
(33) Jiang, W.;  Khan, S.; Wang, Y. Macromolecules 2005, 38, 7514-. 
(34) Ziebarth, J.;  Orelli, S.; Wang, Y. Polymer 2005, 46, 10450-10456. 
(35) Ma, L.;  Middlemiss, K. M.;  Torrie, G. M.; Whittington, S. G. J. Chem. Soc. Frad. 
Trans. II. 1978, 74, 721-726. 
(36) Metzger, S.;  Muller, M.;  Binder, K.; Baschnagel, J. Macromol. Theory Simul. 2002, 11, 
985-995. 
(37) Pasch, H.; Trathnigg, B. HPLC of Polymers; Springer-Verlag Berlin Heidelberg, 1999. 
(38) Chang, T. J. Polym. Sci. B 2005, 43, 1591-1607. 
(39) Macko, T.; Hunkeler, D. Adv. Polym. Sci. 2003, 163, 61-136. 
Graphics to be used for the Table of Contents 
Percent B sites
0 20 40 60 80 100
ABSTRACT
  The critical adsorption point (CAP) of self-avoiding walks (SAW) interacting
with a planar surface with surface disorder or sequence disorder has been
studied. We present theoretical equations, based on ones previously developed
by Soteros and Whittington (J. Phys. A.: Math. Gen. 2004, 37, R279-R325), that
describe the dependence of CAP on the disorders along with Monte Carlo
simulation data that are in agreement with the equations. We also show
simulation results that deviate from the equations when the approximations used
in the theory break down. Such knowledge is the first step toward understanding
the correlation of surface disorder and sequence disorder during polymer
adsorption.

<|endoftext|><|startoftext|>
rhoRR_rhoee.tex
Coherent control of atomic tunneling
John Martin and Daniel Braun
Laboratoire de Physique Théorique, IRSAMC, UMR 5152 du CNRS, Université Paul Sabatier, Toulouse, FRANCE
We study the tunneling of a two-level atom in a double well potential while the atom is coupled to
a single electromagnetic field mode of a cavity. The coupling between internal and external degrees
of freedom, due to the mechanical effect on the atom from photon emission into the cavity mode,
can dramatically change the tunneling behavior. We predict that in general the tunneling process
becomes quasiperiodic. In a certain regime of parameters a collapse and revival of the tunneling
occurs. Accessing the internal degrees of freedom of the atom with a laser allows to coherently
manipulate the atom position, and in particular to prepare the atom in one of the two wells. The
effects described should be observable with atoms in an optical double well trap.
PACS numbers: 73.40.Gk, 37.30.+i
I. INTRODUCTION
The tunneling effect is considered one of the hallmarks
of quantum mechanical behavior. Historically, tunneling
was first examined for single particles (e.g. α particles [1],
electrons in field emission [2] and later in mesoscopic cir-
cuits [3]), for Cooper pairs [4], and for molecular groups
[5, 6, 7]. Recently the tunneling of atoms has attracted
substantial attention [8, 9, 10, 11]. Dynamical (chaos as-
sisted) tunneling of ultracold atoms between different is-
lands of stability in phase space was analyzed in [12, 13]
and has been observed experimentally [14, 15]. Reso-
nantly enhanced tunneling of atoms between wells of a
tilted optical lattice has also been observed very recently
[16]. In all of these examples, the atoms have been con-
sidered internally as inert, and only the center of mass
coordinate of the atom was of interest. In [17] it was
shown that by taking into account the internal degrees
of freedom of atoms, an atom/optical double well poten-
tial could be created in which tunneling atoms see their
internal and external states correlated (such an effect is
also known from other contexts [18]). Mechanical effects
of light in optical resonators were also investigated in
[19], but no tunneling was considered.
Here we show that the tunneling effect can be drasti-
cally modified if an internal transition of the atom is cou-
pled to a single electromagnetic mode in a cavity, such
that photon emission is a reversible and coherent pro-
cess. The resulting Rabi oscillations between states with
the excitation in the atom and states with a photon in
the cavity modulate the periodic tunneling motion. De-
pending on the frequencies involved, a rich quasi-periodic
behavior can result. If the cavity is fed with a coherent
state, collapse and revival of the tunneling effect can oc-
cur. Moreover, we show that one may profit from access
to the internal degrees of freedom of the atom (e.g. with
a laser) to control the atomic motion in the external po-
tential.
FIG. 1: (Color online) Two-level atom in a double well po-
tential interacting with a standing wave inside a cavity.
II. MODEL
A. Derivation of the Hamiltonian
Consider a trapped two-level atom (with levels |g〉, |e〉
of energy ∓~ω0/2 respectively) interacting with a stand-
ing wave (with wave number k and frequency ω) inside
a cavity as illustrated in Fig. 1. The atom is assumed to
be bound in the y − z plane at the equilibrium position
y = z = 0 and to experience a symmetric double well po-
tential V (x) along the x direction. We denote by ∆ the
tunnel splitting, i.e. the energy spacing between the two
lowest energy states (the symmetric |−〉 and antisym-
metric |+〉 states) of this double well potential. Below
we also allow the trapped atom to interact resonantly
with an external laser. The Hamiltonian of this system
is given by
H = HA +HF +HAF , (1)
where HA = H
A is the Hamiltonian of the trapped
atom, HF is the Hamiltonian of the free field and HAF
is the interaction Hamiltonian describing the atom-field
interaction. We have
HexA =
+ V (x),
H inA =
σinz ,
HF = ~ωa
HAF = −d.E,
http://arxiv.org/abs/0704.0763v2
where d denotes the atomic dipole,
E = Eωε
a+ a†
sin(k(x− x0)) (3)
is the electric field operator, with Eω =
, where
ǫ0 is the permittivity of free space, V the electromag-
netic mode volume, x0 the abscissa at the left cavity
mirror (x0 < 0), and ε the electric field polarization vec-
tor. We have introduced the operators σini (resp. σ
for i = x, y, z as the Pauli spin operators in the basis
{|e〉, |g〉} (resp. {|+〉, |−〉}). The operator x stands for
the center-of-mass position of the atom, px is the conju-
gate momentum along the x axis, m denotes the atomic
mass, and a (a†) the annihilation (creation) operator of
the cavity radiation field.
We adopt the two-level approximation which consists
of taking into account only the two lowest motional en-
ergy states. This requires the Rabi frequency
4g2 + δ2
(with δ = ω − ω0 the detuning between the cavity field
and the atomic transition frequencies) to be much smaller
than the frequency gap ∆̃ between the upper motional
states and the ground state doublet (see Fig. 1). Within
this approximation, Hamiltonian HexA becomes
HexA =
σexz (4)
and the position operator takes the form x = b
σexx with
b/2 = 〈+|x|−〉. We can form states that are mainly con-
centrated in the left/right wells,
|L〉 = (|+〉 − |−〉)/
|R〉 = (|+〉+ |−〉)/
The average position of a particle localized in the right
well is then given by b/2 (see Fig. 1) and σexx = |R〉〈R| −
|L〉〈L|. The interaction Hamiltonian HAF can then be
written
HAF = −~g(a+ a†)
sinχ cosκ σinx − cosχ sinκ σexx σinx
with the atom-field coupling strength g =
−〈e|d|g〉.εEω/~, and
χ = kx0, κ = kb/2. (6)
For long wavelengths (κ ≪ 1), or κ = nπ with inte-
ger n, the left and right sites of the double well are in-
distinguishable to the cavity photon and HAF reduces
to Jaynes-Cummings Hamiltonian without rotating wave
approximation (with a sine varying coupling constant),
−~g sinχ (a + a†)σinx . Note that κ ≪ 1 would normally
be identified with the Lamb-Dicke regime. Here the sit-
uation is more subtle as the level spacings between the
tunneling split ground state doublet and the next excited
states can be very different such that the recoil energy
~ωrecoil satisfies ∆ ≪ ωrecoil ≪ ∆̃. One may thus be in
the Lamb-Dicke regime concerning transitions to higher
vibrational states but have a significant mechanical ef-
fect on the atomic tunneling. Furthermore, since there is
only one photon mode, the recoil energy cannot vary con-
tinuously and exciting higher vibrational levels requires
ωrecoil close to a level spacing. Our numerical calculations
show that even for κ ∼ 1 the two-level approximation can
still work very well (see Fig. 4).
For δ, ∆ ≪ ω, ω0, a rotating wave approximation is
justified, which consists in eliminating the energy non-
conserving terms aσex± σ
− and a
†σex± σ
+ with σ
+ = |e〉〈g|,
σin− = σ
+ and σ
+ = |+〉〈−|, σex− = σ
+ . Within this
approximation, the total Hamiltonian reads
σexz +
σinz + ~ωa
†a (7)
+~g(aσin+ + a
†σin− )
cosχ sinκ σexx − sinχ cosκ 1ex
Thus, depending on the parameters χ and κ, the cavity
photon may induce internal transitions in the atom only
(cosχ sinκ = 0), or induce transitions between internal
and external states at the same time (cosχ sinκ 6= 0)
even for a vanishing detuning (δ = ω − ω0 = 0). This is
in contrast to conventional sideband transitions of har-
monically bound atoms or ions in the Lamb-Dicke regime
which require an appropriate value of the detuning. For
a fixed potential center (and thus fixed χ), κ can be
changed through a modulation of the well-to-well sep-
aration b. We will neglect in the following the effects
of decoherence, which means that not only g but also
∆ should be much larger than the rate of spontaneous
emission Γ, and the cavity decay rate κcav.
We denote the global state of the atom-field system by
|n, i, j〉 ≡ |n〉⊗|i〉⊗|j〉 where |n〉 stands for the cavity field
eigenstates, |i〉 ∈ {|−〉, |+〉} for the external motional
states, and |j〉 ∈ {|g〉, |e〉} for the internal states. The
total excitation number N is given by a†a+ σin+σ
B. Energy levels
The states |0,±, g〉 are eigenstates of H with eigen-
value (−~ω0 ± ~∆)/2, i.e. these states remain uncou-
pled and represent the two lowest energy states in the
regime δ, ∆ ≪ ω, ω0. It is straightforward to ver-
ify that the Hamiltonian (7) only induces transitions
between states with the same number of excitations
N , {|N − 1,+, e〉, |N,+, g〉, |N − 1,−, e〉, |N,−, g〉} ≡
{|1〉, |2〉, |3〉, |4〉}. It is therefore sufficient to solve the
dynamics in this subspace. In doing so, we obtain the
eigenvalues of H ,
λρµ = (N − 1/2)~ω + ρ
, (8)
for ρ, µ ∈ {±}, N = 1, 2, . . ., and with
2Ng2(1− cos(2κ) cos(2χ)) + δ2 +∆2 ± 2Ω2 , (9)
4Ng2 cos2 κ sin2 χ(∆2 + 4Ng2 sin2 κ cos2 χ) + δ2∆2 . (10)
For a vanishing tunnel splitting (∆ = 0), Ω± reduces to
the maximum (minimum) of the two Rabi frequencies of
the Jaynes-Cummings models in the right and left wells.
For cosκ = 1, the decoupling of external and internal
degrees of freedom manifests itself also in the eigenvalues
with Ω± = |
4Ng2 sin2 χ+ δ2 ±∆|.
C. Evolution operator
The whole dynamics of the system can be described
by means of the evolution operator U(t) = e−iHt/~ with
components Uij = 〈i|U(t)|j〉 = Uji, which can be calcu-
lated exactly. In order to simplify the expressions, we
restrict ourselves in the following to χ = −π/4−2nπ (in-
teger n). We find, up to a an overall phase e−i(N−1/2)ωt,
U11 = −
µSµΩ−µ
ξ + µ(∆− δ)Ω2
− iµΩ+Ω−Cµ(δ∆− µΩ2)
U12 =
Ng cosκ√
µSµΩ−µ(∆
2 + 2Ng2 sin2 κ
+ µΩ2) + iµΩ+Ω−∆Cµ
U13 =
−iNg2 sin(2κ)
µδΩµS−µ + iµΩ+Ω−Cµ
U23 =
Ng sinκ√
µΩµS−µ(δ∆+ 2Ng
2 cos2 κ− µΩ2)
ξ = ∆(δ2 + 2Ng2 cos2 κ− δ∆),
Λ = Ω+Ω−Ω
and where all time dependence is in the coefficients
C± = cos(Ω±t/2), S± = sin(Ω±t/2). (16)
The remaining components can be deduced from the
relations U22(δ,∆) = U33(−δ,−∆) = U44(δ,−∆) =
U11(−δ,∆), U24(δ,∆) = U13(−δ,∆), U14(δ,∆) =
U23(δ,−∆) = U23(−δ,∆), and U34(δ,∆) = U12(δ,−∆),
valid for any χ, where we have made explicit the depen-
dence of the Uij on δ and ∆.
III. INTERNAL AND EXTERNAL DYNAMICS
The reduced density matrix ρex for the atomic center-
of-mass motion alone follows from ρ = |ψ(t)〉〈ψ(t)| by
tracing out the field and internal degrees of freedom,
where the total wave function at time t reads |ψ(t)〉 =
i,j=1 Uij〈j|ψ(0)〉 |i〉. The average position of the atom
in the double well potential is then given by
〈x〉 =
Trex(ρ
exσexx ) =
(1− 2ρLL) (17)
with ρLL = 〈L|ρex|L〉. Similarly, we obtain the reduced
density matrix ρin for the internal atomic state by tracing
out the field and external degrees of freedom, and the
probability to find the atom in the excited state as ρee =
〈e|ρin|e〉.
In the following, we first focus on resonant atom-field
interaction (ω = ω0) before moving to the non-resonant
case (ω 6= ω0). We distinguish three regimes according
to the tunnel splitting compared to the Rabi frequency
g : the small tunnel splitting regime (when ∆/g ≪ 1),
the intermediate regime (when ∆/g ∼ 1), and the large
tunnel splitting regime (when ∆/g ≫ 1).
A. Resonant atom-field interaction
For resonant atom-field interaction (δ = 0), the ex-
pressions for Uij can be greatly simplified. If the system
is initially prepared in the state |N − 1, R, e〉 and for
κ = π/4, we have
ρLL =
∆2 +Ng2
Ωtunt
with the tunnel frequency
Ωtun =
(Ω+ +Ω−) , (19)
ρee =
Ω2µ −∆2
cos(Ωµt) + 4∆
2 cos
Ω+−Ω−
8(Ng2 +∆2)
The atom position oscillates with a single frequency
Ωtun given by Eq. (19), whereas ρee evolves with three in
general incommensurable frequencies Ω+, Ω−, and (Ω+−
Ω−)/2 giving rise to a quasi-periodic signal.
For ∆/g ≪ 1, Eq. (18) leads to ρLL ≃ 0 (up to order
(∆/g)2), indicating that tunneling is suppressed. This is
already obvious from (7), as the term responsible for tun-
neling, (~∆/2)σexz = (~∆/2)(|R〉〈L| + |L〉〈R|) becomes
very small compared to the last term, diagonal in |R〉, |L〉
which leads to internal Rabi flopping. Note, however,
that tunneling is suppressed on all time scales, even for
t≫ 1/∆, due to the reduced amplitude in Eq. (18), very
much in contrast to tunneling without internal degrees of
freedom, where only the period of the tunneling motion,
but not the amplitude is affected when ∆ is reduced. For
κ approaching π, the situation changes because the term
g cosχ sinκ σexx of the interaction Hamiltonian inducing
transitions between vibrational states becomes small in
comparison with ∆ thereby allowing tunneling again.
Because internal and external degrees of freedom are
coupled, the tunneling frequency (Eq. (19)) depends on
the number of photons inside the cavity. As an exam-
ple, let us now consider ∆ ∼ g and a cavity field initially
in a coherent state |α〉 = e− 12 |α|2
|n〉 with |α|2
equal to the mean photon number 〈n〉. Figure 2 shows
that the average position of the atom in the double well
as a function of time for a coherent state exhibits col-
lapses and revivals. The oscillation amplitude decreases
with increasing mean photon number 〈n〉 and decreasing
tunnel splitting ∆ (see Eqs. (18,9)). Since the probabil-
ity to find the atom in the excited state oscillates with
three frequencies, no collapses and revivals are observed
for ρee.
The collapse time tc of the tunneling motion can be
estimated from the condition [20] (Ωtun(〈n〉 +
〈n〉) −
Ωtun(〈n〉−
〈n〉)) tc ∼ 1 with Ωtun(m) given by Eq. (19)
for N = m+ 1, which yields, for 〈n〉 ≫ 1,
(∆/g)2 + 3/4
+O(〈n〉−2) (21)
The time interval between two following revivals, tr,
follows from (Ωtun(〈n〉) − Ωtun(〈n〉 − 1)) tr = 2π, and is
given for 〈n〉 ≫ 1 by
(∆/g)2 + 1/2
+O(〈n〉−2)
For the parameters of Fig. 2, Eq. (22) yields gtr ≃ 68.23
for ∆/g = 2 and gtr ≃ 86.70 for ∆/g = 5. Smaller revival
times are possible for smaller values of 〈n〉, but in general
the observation of revivals will be quite challenging, as
they require ∆ ∼ g ≫ κcav.
For large tunnel splitting, ∆/g ≫ 1, Ωtun = ∆ +
Ng2/(2∆) + O((g/∆)3), and Eq. (18) reduces to ρLL ≃
sin2(∆t/2), which is identical to the tunneling of a parti-
cle without internal structure. Equation (20) reduces to
a Rabi oscillation ρee ≃ cos2(
Ngt/2).
B. Non-resonant atom-field interaction
For non-resonant atom-field interaction (δ 6= 0), and
intermediate tunnel splitting [see Fig. 3 for ∆ = δ = g],
∆/g = 5
∆/g = 2
100806040200
FIG. 2: (Color online) Average position of the atom in the
double well as a function of time for ∆/g = 2 (blue, top curve)
and ∆/g = 5 (red), κ = π/4 and a coherent state with α = 5.
100806040200
FIG. 3: (Color online) Average position of the atom in the
double well as a function of time for ∆ = δ = g, κ = π/4 and
N = 1. The blue solid/red dashed curve corresponds to an
excited atom initially located in the left/right well.
〈x(t)〉 involves in general the two non-commensurate fre-
quencies Ω+ and Ω− and varies therefore quasiperiod-
ically as a function of time. Figure 3 also shows that
an atom initially located in one of the two wells remains
mostly confined to that well.
For small tunnel splitting, ∆/g ≪ 1 and large detuning
|δ|/g ≫ 1 (with ∆|δ|/g2 ∼ 1), the matrix elements of U
simplify to
U13 =
i Ng2 sin 2κ
δ2∆2 +N2g4 sin2(2κ)
(23a)
U33 = cos
δ2∆2 +N2g4 sin2(2κ)
(23b)
up to corrections of order O(∆/g) and a phase factor
ei[(Ng
2/δ+δ)−(2N−1)ω]t/2 while the components U12 and
U23 are of order O(∆/g). In this situation, the system
20151050
FIG. 4: (Color online) Density matrix elements ρRR (top)
and ρee (bottom) as a function of the interaction time gt
for an initially excited atom located in the right well and
for the parameters ∆/g ≃ 0.3336, δ/g = 3, κ = π/4, and
N = 1. Numerical results from the propagation of the time
dependent Schrödinger equation with Hamiltonian (1) and
rotating wave approximation are represented by circles and
analytical results by solid curves. The time propagation was
done with (~ = m = 1) g = 0.01 and the double well potential
V (x) = 0.08x4 − x2 yielding a tunnel splitting ∆ ≃ 0.003336
and a ratio ∆̃/
4g2 + δ2 ≃ 44.4 ≫ 1.
oscillates only between the two states |N − 1,+, e〉 and
|N − 1,−, e〉 with a single frequency
δ2∆2 +N2g4 sin2(2κ)
, (24)
just as a three-level atom undergoing a Raman transition
in the far detuned regime behaves as a two-level system.
If the system is initially in the state |N − 1,−, e〉, we
have from Eqs. (23)
ρLL =
− Nδ∆sin(2κ)
2Ω̄2(δ/g)2
1− cos
, (25)
and ρee = 1. For a detuning δ = ±Ng2 sin(2κ)/∆,
ρLL =
1− cos
. (26)
This regime may be suitable for coherently manipu-
lating the atom position through access to its internal
degrees of freedom with a laser. Coherent manipulation
of the position of neutral atoms has been proposed and
demonstrated before, see e.g. [21, 22, 23]. In these exam-
ples, the manipulation is done by modifying the external
potential. The mechanism we propose here is very dif-
ferent, as the potential remains totally unchanged, and
only internal transitions and the tunneling effect are used
to move the atom in a controlled way. As an example,
we show how the atom can be prepared in the left well
starting from the ground state |0,−, g〉 for δ = −g2/∆.
We first apply a π-pulse with an external laser resonant
with the atomic transition. By using a laser with a wave
vector perpendicular to the Ox-direction, only the atomic
internal degree of freedom is affected, resulting in the
transition |0,−, g〉 → −i|0,−, e〉. We assumed that the
laser Rabi frequency ΩR is much larger than the tunnel
frequency ∆.
Now we use the coupling between the internal and ex-
ternal degrees of freedom to create a superposition of
the |0,±, e〉 states, and then apply a second resonant π-
pulse to get back to the uncoupled states |0,±, g〉. For
∆/g ≪ 1, δ = −g2/∆ and κ = π/4, the initial state
transforms according to
|0,−, g〉 −−−−→
ΩRt=π
|0,−, e〉 −−−−−−→
∆t=π/
|0, L, e〉 −−−−→
ΩRt=π
|0, L, g〉
up to a physically irrelevant phase. Other coherent su-
perpositions of |0,+, g〉 and |0,−, g〉 can be obtained by
choosing appropriate interaction times.
In order to verify that the two-level approximation for
the external motion used in the derivation of the Hamilto-
nian is a good approximation, we have numerically solved
the time dependent Schrödinger equation with Hamil-
tonian (1) and rotating wave approximation but with
the exact external potential V (x) (i.e. with a large num-
ber of vibrational states). Figure 4 shows that provided
4g2 + δ2 as stated before, to take only the two
lowest vibrational states into account is indeed a good
approximation.
We finally comment on possible experimental realiza-
tions of our model. Double well potentials with tunable
well-to-well separation have been demonstrated with op-
tical dipole traps e.g. in [21, 24], and on atom chips
e.g. in [25, 26]. For our model, the double well poten-
tial has to be realized inside the cavity. Optical trapping
and even cooling of atoms close to their ground state
inside a cavity has been achieved in several groups by
now [27, 28, 29, 30], but up to our knowledge double
well potentials have not been realized in a cavity so far.
However, some of the cavities developed have a very long
lateral opening (up to 222 µm [31]) and should allow more
complicated trapping potentials (optical lattices inter-
secting a cavity have been realized in Chapman’s group
[31]). We remark that it is not essential for our model
that the double well potential be aligned with the cavity
axes. Any other orientation is possible, and only leads to
modified coefficients cosχ sinκ and sinχ cosκ.
At certain “magical wavelengths”, Cs, Yb, Sr, Mg, and
Ca atoms in optical traps experience the same potential
for ground and excited internal states coupled by a dipole
transition [27, 32, 33, 34]. In a symmetric potential V (z)
the tunneling frequency ∆ is given in WKB approxi-
mation by ∆ ∼ ωosc exp(−1/~
2m(E0 − V (z)) dz)
where E0 is the ground state energy, ωosc the single well
harmonic oscillation frequency, and z = ±a are the corre-
sponding classical turning points delimiting the range of
the barrier. The exponential factor can approach unity
for a barrier that is only slightly higher than the ground
state energy E0, in which case cooling to temperatures
kBT < ~∆ should be possible with state of the art
techniques [27]. In [27] a trap depth V0/~ = 47 MHz
was achieved inside a cavity with 1.2 mW laser power.
In any case, the trap frequency and thus the tunneling
splitting are determined by the laser power and the fo-
cussing (or the wavelength for optical lattices), and can
therefore be controlled independently of Γ, κcav, such
that there should be no fundamental problem achieving
∆ ≫ Γ, κcav. The detection of the tunneling motion
should be possible by optical imaging, i.e. diffusion of
laser light from another transition in the optical regime
with smaller wavelength than the well separation. Al-
ternatively, one might monitor the transmission through
the cavity in the case that it differs for the two locations
of the wells [35]. Another possibility might be using the
atomic spin as a position meter [17].
Acknowledgments
We thank Jacques Vigué for an interesting discussion
and CALMIP (Toulouse) for the use of their comput-
ers. This work was supported by the Agence National de
la Recherche (ANR), project INFOSYSQQ, and the EC
IST-FET project EUROSQIP.
[1] G. Gamow, Z. Phys. 51, 204 (1928).
[2] E. Guth and C. J. Mullin, Phys. Rev. 61, 339 (1942).
[3] M. H. Devoret, D. Esteve, H. Grabert, G.-L. Ingold,
H. Pothier, and C. Urbina, Phys. Rev. Lett. 64, 1824
(1990).
[4] B. D. Josephson, Rev. Mod. Phys. 46, 251 (1974).
[5] A. Hueller, Z. Phys. B 36, 215 (1980).
[6] A. Würger, Z. Phys. B 76, 65 (1989).
[7] D. Braun and U. Weiss, Physica B 202, 264 (1994).
[8] A. A. Louis and J. P. Sethna, Phys. Rev. Lett. 74, 1363
(1995).
[9] F. Meier and W. Zwerger, Phys. Rev. A 64, 033610
(2001).
[10] D. L. Luxat and A. Griffin, Phys. Rev. A 65, 043618
(2002).
[11] M. Albiez, R. Gati, J. Folling, S. Hunsmann, M. Cris-
tiani, and M. K. Oberthaler, Phys. Rev. Lett. 95, 010402
(2005).
[12] F. Grossmann, T. Dittrich, P. Jung, and P. Hanggi, Phys.
Rev. Lett. 67, 516 (1991).
[13] V. Averbukh, S. Osovski, and N. Moiseyev, Phys. Rev.
Lett. 89, 253201 (2002).
[14] D. A. Steck, W. H. Oskay, and M. G. Raizen, Science
293, 274 (2001).
[15] W. K. Hensinger, H. Häffner, A. Browaeys, N. R. Hecken-
berg, K. Helmerons, C. McKenzie, G. J. Milburn, W. D.
Philipps, S. L. Holston, H. Rubinsztein-Dunlop, et al.,
Nature 412, 52 (2001).
[16] C. Sias, A. Zenesini, H. Lignier, S. Wimberger,
D. Ciampini, O. Morsch, and E. Arimondo, Phys. Rev.
Lett. 98, 120403 (2007).
[17] D. L. Haycock, P. M. Alsing, I. H. Deutsch, J. Grondalski,
and P. S. Jessen, Phys. Rev. Lett. 85, 3365 (2000).
[18] T. Salzburger and H. Ritsch, Phys. Rev. Lett. 93, 063002
(2004).
[19] P. Domokos and H. Ritsch, J. Opt. Soc. Am. B 20, 1098
(2003).
[20] M. Scully and M. Zubairy, Quantum Optics (Cambridge
University Press, Cambridge, UK, 1997).
[21] J. Sebby-Strabley, M. Anderlini, P. S. Jessen, and J. V.
Porto, Phys. Rev. A 73, 033605 (2006).
[22] O. Mandel, M. Greiner, A. Widera, T. Rom, T. W.
Hänsch, and I. Bloch, Phys. Rev. Lett. 91, 010407 (2003).
[23] J. Mompart, K. Eckert, W. Ertmer, G. Birkl, and
M. Lewenstein, Phys. Rev. Lett. 90, 147901 (2003).
[24] Y. Shin, M. Saba, T. A. Pasquini, W. Ketterle, D. E.
Pritchard, and A. E. Leanhardt, Phys. Rev. Lett. 92,
050405 (2004).
[25] E. A. Hinds, C. J. Vale, and M. G. Boshier, Phys. Rev.
Lett. 86, 1462 (2001).
[26] W. Hänsel, J. Reichel, P. Hommelhoff, and T. W. Hänsch,
Phys. Rev. A 64, 063607 (2001).
[27] J. McKeever, J. R. Buck, A. D. Boozer, A. Kuzmich, H.-
C. Nägerl, D. M. Stamper-Kurn, and H. J. Kimble, Phys.
Rev. Lett. 90, 133602 (2003).
[28] J. Ye, D. W. Vernooy, and H. J. Kimble, Phys. Rev. Lett.
83, 4987 (1999).
[29] J. A. Sauer, K. M. Fortier, M. S. Chang, C. D. Hamley,
and M. S. Chapman, Phys. Rev. A 69, 051804(R) (2004).
[30] P. Maunz, T. Puppe, I. Schuster, N. Syassen, P. W. H.
Pinske, and G. Rempe, Nature 428, 50 (2004).
[31] K. M. Fortier, S. Y. Kim, M. J. Gibbons, P. Ahmadi, and
M. S. Chapman, Phys. Rev. Lett. 98, 233601 (2007).
[32] H. Katori, M. Takamoto, V. G. Pal’chikov, and V. D.
Ovsiannikov, Phys. Rev. Lett. 91, 173005 (2003).
[33] A. Brusch, R. LeTargat, X. Baillard, M. Fouche, and
P. Lemonde, Phys. Rev. Lett. 96, 103003 (2006).
[34] Z. W. Barber, C. W. Hoyt, C. W. Oates, L. Hollberg,
A. V. Taichenachev, and V. I. Yudin, Phys. Rev. Lett.
96, 083002 (2006).
[35] P. Maunz, T. Puppe, I. Schuster, N. Syassen, P. W. H.
Pinkse, and G. Rempe, Phys. Rev. Lett. 94, 033002
(2005).
ABSTRACT
  We study the tunneling of a two-level atom in a double well potential while
the atom is coupled to a single electromagnetic field mode of a cavity. The
coupling between internal and external degrees of freedom, due to the
mechanical effect on the atom from photon emission into the cavity mode, can
dramatically change the tunneling behavior. We predict that in general the
tunneling process becomes quasiperiodic. In a certain regime of parameters a
collapse and revival of the tunneling occurs. Accessing the internal degrees of
freedom of the atom with a laser allows to coherently manipulate the atom
position, and in particular to prepare the atom in one of the two wells. The
effects described should be observable with atoms in an optical double well
trap.

<|endoftext|><|startoftext|>
Correlation functions and excitation spectrum of the frustrated ferromagnetic spin-1
chain in an external magnetic field
T. Vekua,1 A. Honecker,2 H.-J. Mikeska,3 and F. Heidrich-Meisner4
Laboratoire de Physique Théorique et Modèles Statistiques,
Université Paris Sud, 91405 Orsay Cedex, France
Institut für Theoretische Physik, Universität Göttingen, 37077 Göttingen, Germany
Institut für Theoretische Physik, Universität Hannover, Appelstrasse 2, 30167 Hannover, Germany
Materials Science and Technology Division, Oak Ridge National Laboratory, Tennessee, 37831, USA and
Department of Physics and Astronomy, University of Tennessee, Knoxville, Tennessee 37996, USA
(Dated: April 5, 2007; revised: July 6, 2007)
Magnetic field effects on the one-dimensional frustrated ferromagnetic chain are studied by means
of effective field theory approaches in combination with numerical calculations utilizing Lanczos
diagonalization and the density matrix renormalization group method. The nature of the ground
state is shown to change from a spin-density-wave region to a nematic-like one upon approaching
the saturation magnetization. The excitation spectrum is analyzed and the behavior of the single
spin-flip excitation gap is studied in detail, including the emergent finite-size corrections.
I. INTRODUCTION
The interest in helical and chiral phases of frustrated
low-dimensional quantum magnets has been triggered by
recent experimental results. While many copper-oxide
based materials predominantly realize antiferromagnetic
exchange interactions, several candidate materials with
magnetic properties believed to be described by frus-
trated ferromagnetic chains have been identified,1,2,3,4,5,6
including Rb2Cu2Mo3O12 (Ref. 1), LiCuVO4 (Refs. 2,
3,4,5), and Li2ZrCuO4 (Ref. 6). The frustrated anti-
ferromagnetic chain is well-studied,7 but the magnetic
phase diagram of the model with ferromagnetic nearest-
neighbor interactions remains a subject of active theoret-
ical investigations.8,9,10,11
In this work we consider a parameter regime that is
in particular relevant for the low-energy properties of
LiCuVO4, corresponding to a ratio of J1 ≈ −0.3 J2 be-
tween the nearest neighbor interaction J1 and the frus-
trating next-nearest neighbor interaction J2 > 0. As
the interchain couplings for this material are an order
of magnitude smaller than the intrachain ones,3 we an-
alyze a purely one-dimensional (1D) model. Apart from
mean-field based predictions,8 the nature of the ground
state in a magnetic field h is not yet completely known.
Therefore, combining the bosonization technique with a
numerical analysis we determine ground-state properties
and discuss the model’s elementary excitations.
The Hamiltonian for our 1D model reads:
J1~Sx · ~Sx+1 + J2~Sx · ~Sx+2
Szx , (1)
where ~Sx represents a spin one-half operator at site x.
Bosonization has turned out to be the appropriate lan-
guage for describing the regime |J1| ≪ J2 of Eq. (1).
This result has been established by studying the magne-
tization process yielding a good agreement between field
theory and numerical data.9 The derivation of the ef-
fective field theory is summarized in Sec. II. Here, we
extend on such comparison of analytical and numerical
results and further confirm the predictions of field the-
ory by analyzing several correlation functions in Sec. III.
Then, in Sec. IV, we numerically compute the one- and
two-spin flip excitation gaps and compare them to field-
theory predictions. Finally, Sec. V contains a summary
and a discussion of our results.
II. EFFECTIVE FIELD THEORY
We start from an effective field theory describing the
long-wavelength fluctuations of Eq. (1). In the limit of
strong next-nearest neighbor interactions J2 ≫ |J1|, the
spin operators can be expressed as:
Szα(r) ∼ m+ c(m) sin
2kF r +
+ · · ·
S−α (r) ∼ (−1)re−iθα
π + · · · . (2)
kF = (
−m)π is the Fermi-wave vector and α = 1, 2 enu-
merates the two chains of the zig-zag ladder. In relation
with Eq. (1), note that ~S1(r) = ~S(x+1)/2 (~S2(r) = ~Sx/2)
for x odd (even). φα and θα are compactified quantum
fields describing the out-of-plane and in-plane angles of
fluctuating spins obeying Gaussian Hamiltonians:
H = v
(∂xφα)
2 +K(∂xθα)
, (3)
with [φα(x), θα(y)] = iΘ(y − x), where Θ(x) is the
Heaviside function. Sub-leading terms are suppressed in
Eq. (2). m is the magnetization of decoupled chains, re-
lated to the real magnetization M of the zig-zag system
M ≃ m
1− 2K(m)J1
πv(m)
. (4)
K(m) and v(m) are the Luttinger liquid (LL) parame-
ter and the spin-wave velocity of the decoupled chains,
http://arxiv.org/abs/0704.0764v2
respectively. The nonuniversal amplitude c(m) appear-
ing in the bosonization formulas (2) has been determined
from density matrix renormalization group (DMRG)
calculations.12 Note that in our notation M = 1/2 at
saturation.
Now we perturbatively add the interchain coupling
term to two decoupled chains, each of which is described
by an effective Hamiltonian of the form Eq. (3) and fields
φi and θi, i = 1, 2. For convenience, we transform to
the symmetric and antisymmetric combinations of the
bosonic fields φ± = (φ1±φ2)/
2 and θ± = (θ1±θ2)/
In this basis and apart from terms H±0 of the form (3),
the effective Hamiltonian describing low-energy proper-
ties of Eq. (1) contains a single relevant interaction term
with the bare coupling g1 ∝ J1 ≪ v:
Heff = H+0 +H
0 + g1
dx cos
, (5)
and the renormalized LL parametersK± are, in the weak
coupling limit:
K± = K
1∓ J1
. (6)
K+ is the Luttinger-liquid parameter of the soft mode
of the zig-zag ladder. The Hamiltonian (5) represents
the minimal effective low-energy field theory describing
the region J2 ≫ |J1| of the frustrated FM spin-1/2 chain
for M 6= 0.9,13 The relevant interaction term cos
opens a gap in the φ− sector. Since S
x+1 − Szx ∼ ∂xφ−,
relative fluctuations of the two chains are locked. This
implies that single-spin flips are gapped with a sine-
Gordon gap in the sector describing relative spin fluctua-
tions of the two-chain system.9 Gapless excitations come
from the ∆Sz = 2 channel, i.e. only those excitations
are soft where spins simultaneously flip on both chains.
DMRG results show that this picture applies to a large
part of the magnetic phase diagram.9
III. CORRELATION FUNCTIONS
We now turn to the ground state properties of Eq. (1)
as a function of magnetization, concentrating on several
correlation functions in order to identify the leading in-
stabilities. Note that our analysis is only valid if M 6= 0.
Apart from a term representing the magnetization M in-
duced by the external field, the longitudinal correlation
function shows an algebraic decay with distance r:
〈Szα(0)Szβ(r)〉 ≃ M2+
C1 cos(2kF r + (α− β)kF )
2π2rK+
8π2r2
The constants Ci, i = 1, 2, 3, appearing here and in
Eq. (9) will be determined through a comparison with
numerical results.
In contrast to Eq. (7), the transverse xy-correlation
functions decay exponentially reflecting the gapped na-
ture of the single spin-flip excitations. Here we do not
restrict ourselves to the equal-time expression only, be-
cause we will need non-equal time correlation functions
to extract the finite-size corrections to the gap later on.
We obtain:
〈S+α (0, 0)S−β (r, τ)〉 ≃
δα,β(−1)re−∆1(M)
τ2+r2/v2
(r2 + v2+τ
8K+ (r2 + v2−τ
where τ stands for the Euclidean time, ∆1(M) is the
∆Sz = 1 gap, and v± ∼ v ± J1/π in the weak cou-
pling limit. The Kronecker delta strictly applies to the
thermodynamic limit, while on the lattice an additional
contribution for α 6= β exists.
It is noteworthy that, different from Eq. (8), the in-
plane correlation functions involving bilinear spin combi-
nations decay algebraically. This stems from the gapless
nature of ∆Sz = 2 excitations. In fact, these are the
slowest decaying correlators close to the saturation mag-
netization:
〈S+1 (r)S
2 (r)S
1 (0)S
2 (0)〉 ≃
r1/K+
C3 cos(2kF r)
rK++1/K+
This result is reminiscent of a partially ordered state be-
cause the ordering tendencies in this correlation function
are more pronounced than those of the corresponding
single-spin correlation function Eq. (8). Therefore, we
call the correlator (9) ‘nematic’. Furthermore, we will
refer to a situation where Eq. (9) is the slowest decay-
ing one among all correlation functions as a ‘nematic-like
phase’.
By virtue of the exponential decay in (8), the correlator
(9) is proportional to:
〈(S+1 (r) + S
2 (r))
2 (S−1 (0) + S
2 (0))
2〉 . (10)
The term (Sα1 + S
2 appearing in the case of the S = 1
zig-zag ladder corresponds to the operator (Sα)2 in the
case of a S = 1 chain. One can think of an effective S = 1
spin formed from two neighboring S = 1
spins coupled
by the ferromagnetic interaction. A similar behavior of
correlation functions, namely the exponential decay of in-
plane spin components and the algebraic decay of their
bilinear combinations, is encountered also in the XY 2
phase of the anisotropic S = 1 chain14 and in the spin-1
chain with biquadratic interactions, see, e.g., Ref. 15.
The algebraic decay of the nematic correlator as op-
posed to the exponential decay of (8) suggests that there
are tendencies towards nematic ordering in this phase.
Depending on the value of K+ the dominant instabilities
are either spin-density-wave ones for K+ < 1 or nematic
ones for K+ > 1. From the result for K+ given in Eq. (6)
one can perturbatively evaluate the crossover value of J1:
|J1,cr| =
πv(m)
. (11)
For J1 < J1,cr the nematic correlator (9) is the slowest
decaying one, i.e. one is in the nematic-like phase. The
-0.015
-0.01
-0.005
0.005
0.001
0 4 8 12 16 20 24 28 32
fit, L=32
fit, L=48
fit, L=64
ED, L=32
ED, L=48
ED, L=64
FIG. 1: (Color online) Correlation functions at J1 = −J2 < 0,
and magnetization M = 3/8: (a) longitudinal component Sz
(b) transverse component S±
, (c) spin nematic S±
. x is
the distance in a single-chain notation. ED results for periodic
boundary conditions are shown by symbols, fits by lines. Note
the logarithmic scale of the vertical axis in panel (b).
behavior of the cross-over line can be read off from the
behavior of K(m): K(m) increases monotonically with
m, tends to K = 1 for m → 1/2, and satisfies K < 1
for m < 1/2 (see, e.g., Refs. 16,17). Therefore, we have
J1,cr = 0 for M = 1/2 with increasing ferromagnetic
|J1,cr| for decreasing M . This means that for J1 < 0
a regime opens at high M where nematic correlations
given by Eq. (9) dominate over spin-density-wave corre-
lations given by Eq. (7), in agreement with Chubukov’s
prediction.8
Now we check the correlation functions obtained within
bosonization against exact diagonalization (ED) results.
Numerical data obtained for J1 = −J2 < 0 and M = 3/8
on finite systems with periodic boundary conditions are
shown in Fig. 1. This parameter set allows for a clear
test of the above predictions, but represents the generic
behavior in the phase of two weakly coupled chains. To
take into account finite-size effects we use the observation
that for a conformally invariant theory, any power law
on a plane becomes a power law in the following variable
defined on a cylinder of circumference L:
x → L
. (12)
First we fit the nematic correlator given by Eq. (9),
which from bosonization is expected to be the leading
instability at high magnetizations. Using the part with
x ≥ 5 of the L = 64 data shown in Fig. 1c, we find
1/K+ = 0.904 ± 0.011, C2 = 0.143 ± 0.004, and C3 =
−0.326±0.013. Fig. 1c shows that all finite-size results for
the nematic correlator are nicely described by this fit with
the dependence on L taken into account by substituting
Eq. (12) for the power laws. Moreover, from K+ > 1, we
see that the system is indeed in the region dominated by
nematic correlations for M = 3/8 and J1 = −J2.
Now we turn to the longitudinal correlation function
which we fit to the bosonization result Eq. (7). Since
most numerical parameters have been determined by the
previous fit, only one free parameter is left which we de-
termine from the numerical results of Fig. 1a for L = 64
and x ≥ 14 as C1 = 0.060± 0.004. Predictions for other
system sizes are again obtained by substituting Eq. (12)
for the power laws. The agreement in Fig. 1a is not as
good as in Fig. 1c. However, it improves at larger dis-
tances x and system sizes L, indicating that corrections
omitted in Eq. (7) are still relevant on the length scales
considered here.
Finally, the xy-correlation function is shown in Fig. 1b
with a logarithmic scale of the vertical axis of this panel.
The exponential decay predicted by Eq. (8) is verified.
One further observes that correlations between the in-
plane spin-operators belonging to different chains (odd
x) are an order of magnitude smaller than on the same
chain (even x). This suppression of correlations between
different chains corresponds to the δ symbol in (8), which
strictly applies only in the thermodynamic limit and for
large distances.
We summarize the main result of this section: in-plane
spin correlators are exponentially suppressed for any fi-
nite value of the magnetization in the parameter region
|J1| < J2. The ground state crosses over from a spin-
density-wave dominated to a nematic-like phase with in-
creasing magnetic field, with the crossover line given by
Eq. (11).
IV. EXCITATIONS
We next address the excitation spectrum. Since the
gap to ∆Sz = 1 excitations should be directly accessible
to microscopic experimental probes such as inelastic neu-
tron scattering or nuclear magnetic resonance, we analyze
its behavior as a function of magnetization. Sufficiently
below the fully polarized state the gap can be calculated
analytically using results from sine-Gordon theory. In
addition, to leading order of the interchain coupling, one
can get qualitative expressions using dimensional argu-
ments for the perturbed conformally invariant model:
∆1(m) ∼
c2(m)|J1| sin(πm)
v(m)(1 − J1K(m)/πv(m))
, (13)
where ν = 2− 2K(m)
1+ J1K(m)/πv(m)
. m(h),K(h)
and v(h) can be determined numerically from the Bethe
ansatz integral equations.16,17,18,19
With this information and Eqs. (4) and (13) we de-
termine the qualitative behavior of the single-spin gap
∆1(M) as a function of M : it increases from zero at
zero magnetization, reaches a maximum at intermediate
magnetization values, then shows a minimum and, upon
approaching the saturation magnetization, it increases
again. As our formulas do not strictly apply at m = 0,
the notion of a vanishing gap at zero magnetization may
be a spurious result. Note that when the fully polarized
state is approached, the magnetization increases in an un-
physical fashion since in this limit bosonization becomes
inapplicable. At the point where the magnetization sat-
urates the exact value of the gap can be obtained from
the following mapping to hard-core bosons:8,11,20
Szi =
− a†iai , S
i = a
i . (14)
Comparing Eq. (14) with Eq. (2) one recognizes the lead-
ing terms in Haldane’s harmonic fluid transformation for
bosons.20 Using a ladder approximation which is exact in
the two-magnon subspace we arrive at:
4J22 − 2J1J2 − J21
2(J2 − J1)
J21 + 8J1J2 + 16J
J21 (3 J2 + J1)
J2 (J2 − J1)
. (15)
In Eq. (15) we have represented the gap as a difference
of two terms: the quantum and the classical instability
fields emphasizing its quantum origin.
In order to verify these field theory predictions, we
perform complementary numerical computations using
the DMRG method.21 Open boundary conditions are im-
posed and we typically keep up to 400 DMRG states.
From DMRG we obtain the ground-state energies E(Sz)
as a function of total Sz. For those values of Sz that
emerge as a ground state in an external magnetic field
we compute the single-spin excitation gap from
∆1(M) =
E(Sz + 1) + E(Sz − 1)− 2E(Sz)
. (16)
Fig. 2a shows numerical results for ∆1 at a selected value
of J1 = −0.3 J2 < 0 for the largest system sizes investi-
gated. We find that the finite-size behavior of the gap
∆1(M,L) for system sizes L ≥ 24 is well described by
a 1/L correction. This will be further corroborated by
field-theoretical arguments outlined below. Therefore, we
0 0.1 0.2 0.3 0.4 0.5
L=120
L=144
L=156
L=120
L=144
L=156
extrapolated
FIG. 2: (Color online) Density matrix renormalization group
results for the gaps at J1 = −0.3 J2 < 0 as a function of
magnetization M . Panel (a) shows the single-spin excitation
gap (16), panel (b) the finite-size gap (19) for two flipped
spins multiplied by the chain length L.
extrapolate it to the thermodynamic limit using a fit to
the form
∆1(M,L) = ∆1(M) +
+ · · · , (17)
allowing for an additional 1/L2 correction for those values
of M where at least 4 different system sizes are available.
This extrapolation is represented by the full circles in
Fig. 2a; errors are estimated not to exceed the size of the
symbols. Our extrapolation for ∆1 is consistent with a
vanishing gap at M = 0 in agreement with previous nu-
merical studies22 although bosonization predicts a non-
zero – possibly very small – gap.13,22,23 The behavior of
∆1(M) confirms the picture described above: the gap is
non-zero for M > 0, goes first through a maximum and
then a minimum and finally approaches ∆1/J2 ≈ 0.023
given by Eq. (15) for M → 1/2.
We further wish to point out that for chains with pe-
riodic boundary conditions, the coefficient a(M) of the
finite-size extrapolation Eq. (17) is determined by the
spin-wave velocity and the critical exponent of the soft
mode from the ∆Sz = 2 channel. Indeed, using Eq. (8)
where we can set r = 0, and use the conformal mapping
(12) to the cylinder, we see that the leading finite-size
correction to the gap is:
∆1(M,L) = ∆1(M) +
πv+(M)
4K+(M)
. (18)
0 π/2 π
wave vector
1.0 0.0
FIG. 3: (Color online) Numerical dispersion spectrum in the
subspaces of odd Sz computed for L = 24 and J1 = −J2 < 0.
The wave vector is given relative to the ground state wave
vector (0 for Sz = 0, 4, 8 and π for Sz = 2, 6).
Note that we have to replace sin with sinh in Eq. (12)
in order to extract a gap, since we are dealing with Eu-
clidean time. In addition we used the fact that in our ap-
proximation the effective Hamiltonian (5) is a direct sum
of symmetric and antisymmetric sectors. Moreover, it is
only the symmetric sector enjoying conformal invariance
and consequently we perform the replacement τ → sinh τ
only in the symmetric sector. The antisymmetric sector
has a spectral gap and its contribution to the finite-size
corrections of the single-spin flip excitation energy are
exponentially suppressed with system size.24 With this
method one cannot fix the amplitudes of the 1/L2 term
and beyond. Note furthermore that there may be ad-
ditional surface terms for open boundary conditions as
employed in the numerical DMRG computations. Nev-
ertheless there is a dominant 1/L correction in any case.
Next, we briefly look at the ∆Sz = 2 excitations. Their
finite-size gap is, in analogy to Eq. (16), computed with
DMRG from
∆2(M) =
E(Sz + 2) + E(Sz − 2)− 2E(Sz)
. (19)
Fig. 2b shows numerical results for L∆2(M,L) again at
the value J1 = −0.3 J2 < 0. One observes that the scaled
finite-size gaps collapse onto a single curve which shows
that ∆2(M,L) scales linearly to zero with 1/L, exactly
as expected for gapless excitations in 1D. Furthermore,
we observe that the scaled quantity L∆2(M,L) vanishes
as one approaches saturation M = 1/2 which indicates a
vanishing of the velocity of the corresponding excitations
at saturation.
We proceed by discussing the wave-vector dependence
of the ∆Sz = 1 excitation, while we remind the reader
that the low-energy excitations are in the ∆Sz = 2 sec-
tor. Fig. 3 shows representative ED results obtained for
rings with L = 24 and J1 = −J2 < 0. For ground
states with low Sz, the ∆Sz = 1 excitation spectrum
looks similar to the continuum of spinons. On the other
hand, close to saturation one has single-magnon excita-
tions with a minimum given by the classical value of
the wave vector kcl = arccos(|J1|/4J2).8,13,25 We read
off from Fig. 3 that upon lowering the magnetic field,
this minimum shifts from the classical incommensurate
value towards π/2, i.e. the value appropriate for two de-
coupled chains. This renormalization of the minimum of
the magnon excitations towards the value of decoupled
chains can be interpreted in terms of quantum fluctua-
tions, which are enhanced when the density of magnons
increases. A strong quantum renormalization of the pitch
angle from its classical value at zero magnetization was
previously observed by the coupled-cluster method and
DMRG calculations.26
V. SUMMARY
We have combined numerical techniques with analyti-
cal approaches and mapped out the ground state phase
diagram of the frustrated ferromagnetic spin chain in an
external magnetic field. We have established that with
increasing magnetic field, the ground state crosses over
from a spin-density-wave dominated to a nematic-like
phase. Single spin flip excitations are gapped, giving rise
to an exponential decay of in-plane spin correlation func-
tions in both regimes. We have studied the single- and
two-spin flip excitation energy numerically. Using tools
from conformal field theory we have further shown that
the amplitude of the leading 1/L correction term to the
single-spin flip gap is determined by the critical exponent
and the spin-wave velocity of the soft mode.
Finally, in order to apply our findings to the mate-
rial LiCuVO4, one should take into account interchain
interactions as well as anisotropies, which are expected
to be present in this system.3 At low fields, a helical state
has been observed experimentally.2,3 On the other hand,
for the purely one dimensional case, we have shown that
upon increasing the magnetic field there is a competition
between spin-density-wave and nematic-like tendencies.
Those are the two leading instabilities at high magnetiza-
tions and thus they are the natural candidates to become
long-range ordered in higher dimensions. The question
whether there are true phase transitions at high fields
in higher dimensions is beyond the scope of the current
work.
Acknowledgments
We thank A. Feiguin for providing us with his DMRG
code used for large scale calculations. Most of T.V.’s
work was done during his visits to the Institutes of
Theoretical Physics at the Universities of Hannover and
Göttingen, supported by the Deutsche Forschungsge-
meinschaft. The hospitality of the host institutions is
gratefully acknowledged. T.V. also acknowledges support
from the Georgian National Science Foundation under
grant N 06−81−4−100. LPTMS is a mixed research unit
8626 of CNRS and University Paris-Sud. A.H. is sup-
ported by the Deutsche Forschungsgemeinschaft (Project
No. HO 2325/4-1), and F.H.-M. is supported by NSF
grant No. DMR-0443144.
1 M. Hase, H. Kuroe, K. Ozawa, O. Suzuki, H. Kitazawa, G.
Kido, and T Sekine, Phys. Rev. B 70, 104426 (2004).
2 B. J. Gibson, R. K. Kremer, A. V. Prokofiev, W. Assmus
and G. J. McIntyre, Physica B 350, e253 (2004).
3 M. Enderle, C. Mukherjee, B. F̊ak, R. K. Kremer, J.-M.
Broto, H. Rosner, S.-L. Drechsler, J. Richter, J. Malek,
A. Prokofiev, W. Assmus, S. Pujol, J.-L. Raggazzoni, H.
Rakoto, M. Rheinstädter, and H. M. Rønnow, Europhys.
Lett. 70, 237 (2005).
4 M. G. Banks, F. Heidrich-Meisner, A. Honecker, H.
Rakoto, J.-M. Broto, and R. K. Kremer, J. Phys.: Cond.
Mat. 19, 145227 (2007).
5 N. Büttgen, H.-A. Krug von Nidda, L. E. Svistov, L. A.
Prozorova, A. Prokofiev, and W. Aßmus, Phys. Rev. B 76,
014440 (2007).
6 S.-L. Drechsler, O. Volkova, A. N. Vasiliev, N. Tristan, J.
Richter, M. Schmitt, H. Rosner, J. Málek, R. Klingeler, A.
A. Zvyagin, and B. Büchner, Phys. Rev. Lett. 98, 077202
(2007).
7 H.-J. Mikeska and A. K. Kolezhuk, Lect. Notes Phys. 645,
1 (2004).
8 A. V. Chubukov, Phys. Rev. B 44, 4693 (1991).
9 F. Heidrich-Meisner, A. Honecker, and T. Vekua, Phys.
Rev. B 74, 020403(R) (2006).
10 D. V. Dmitriev, V. Y. Krivnov, and J. Richter, Phys. Rev.
B 75, 0114424 (2007).
11 R. O. Kuzian and S.-L. Drechsler, Phys. Rev. B 75, 024401
(2007).
12 See, e.g., T. Hikihara and A. Furusaki, Phys. Rev. B 69,
064427 (2004).
13 D. C. Cabra, A. Honecker, and P. Pujol, Eur. Phys. J. B
13, 55 (2000).
14 H. J. Schulz, Phys. Rev. B 34, 6372 (1986).
15 A. Läuchli, G. Schmid, and S. Trebst, Phys. Rev. B 74,
144426 (2006).
16 K. Totsuka, Phys. Lett. A 228, 103 (1997).
17 D. C. Cabra, A. Honecker, and P. Pujol, Phys. Rev. B 58,
6241 (1998).
18 V. E. Korepin, N. M. Bogoliubov, and A. G. Izergin, Quan-
tum Inverse Scattering Method and Correlation Functions
(Cambridge University Press, Cambridge, England, 1993).
19 S. Qin, M. Fabrizio, L. Yu, M. Oshikawa, and I. Affleck,
Phys. Rev. B 56, 9766 (1997).
20 F. D. M. Haldane, Phys. Rev. Lett 47, 1840 (1981).
21 S. R. White, Phys. Rev. Lett. 69, 2863 (1992); Phys. Rev.
B 48, 10345 (1993).
22 C. Itoi and S. Qin, Phys. Rev. B 63, 224423 (2001).
23 A. A. Nersesyan, A. O. Gogolin, and F. H. L. Eßler, Phys.
Rev. Lett. 81, 910 (1998).
24 S. I. Matveenko and A. M. Tsvelik, private communication.
25 C. Gerhardt, K.-H. Mütter, and H. Kröger, Phys. Rev. B
57, 11504 (1998).
26 R. Bursill, G. A. Gehring, D. J. J. Farnell, J. B. Parkinson,
T. Xiang, and C. Zeng, J. Phys. Cond. Mat. 7, 8605 (1995).
ABSTRACT
  Magnetic field effects on the one-dimensional frustrated ferromagnetic chain
are studied by means of effective field theory approaches in combination with
numerical calculations utilizing Lanczos diagonalization and the density matrix
renormalization group method. The nature of the ground state is shown to change
from a spin-density-wave region to a nematic-like one upon approaching the
saturation magnetization. The excitation spectrum is analyzed and the behavior
of the single spin-flip excitation gap is studied in detail, including the
emergent finite-size corrections.

<|endoftext|><|startoftext|>
Evidence of Spatially Inhomogeous Pairing on the Insulating Side of a Disorder-Tuned
Superconductor-Insulator Transition
K. H. Sarwa B. Tan, Kevin A. Parendo, and A. M. Goldman
School of Physics and Astronomy, University of Minnesota, Minneapolis, MN 55455, USA
(Date textdate; Received textdate; Revised textdate; Accepted textdate; Published textdate)
Abstract
Measurements of transport properties of amorphous insulating InxOy thin films have been interpreted as evidence of the
presence of superconducting islands on the insulating side of a disorder-tuned superconductor-insulator transition. Although
the films are not granular, the behavior is similar to that observed in granular films. The results support theoretical models in
which the destruction of superconductivity by disorder produces spatially inhomogenous pairing with a spectral gap.
The interplay between localization and superconduc-
tivity can be investigated through studies of disordered
superconducting films [1], originally treated by Anderson
[2], and Abrikosov and Gor’kov [3], who considered the
low-disorder regime. Several approaches have been pro-
posed for strong disorder, including fermionic mean field
theories [4, 5, 6] and theories that focus on the univer-
sal critical properties near the superconductor-insulator
transition. The latter consider the transition to belong
to the dirty boson universality class [7]. When quantum
fluctuations are included in fermionic theories for high
levels of disorder a spatially inhomogeneous pairing am-
plitude is found which retains a nonvanishing spectral
gap [8]. For sufficiently disordered systems inhomoge-
neous pairing can also be brought about by thermal fluc-
tuations [9]. A similar inhomogeneous regime has also
been considered under the rubric of electronic microemul-
sions in the context of the metal-insulator transition of
two dimensional electron gases [10]. In this letter we pro-
vide evidence of a spatially inhomogeneous order param-
eter on the insulating side of a superconductor-insulator
transition driven by structural and/or chemical disorder.
Studies of disorder and magnetic field tuned
superconductor-insulator transitions have usually been
carried out on films that are either amorphous or gran-
ular. In the former the disorder is on an atomic scale,
and in the latter, on a mesoscopic scale in which case
the films consist of metallic grains or clusters connected
by tunneling, that are either embedded in an insulating
matrix, or on a bare substrate [1]. Amorphous films can
be produced when films of metal atoms such as Pb or Bi
are grown at liquid helium temperatures on substrates
precoated with a wetting layer of amorphous Ge or Sb
[11], or by careful vapor deposition of MoxGey, InxOy,
or TiN using a variety of techniques.
Granular films, are known to develop superconductiv-
ity in stages. If the grains are small and weakly con-
nected, the film is an insulator. For grains larger than
some characteristic size, and sufficiently close together,
“local superconductivity” will develop below some tem-
perature. The opening of a spectral gap in the density
of states of the grains [12] results in a relatively sharp
upturn in the resistance below this temperature, which
is usually close to the transition temperature of the bulk
material. For well enough coupled grains, there may be
a small drop in resistance at that temperature, followed
by this upturn. This is in contrast with the “global su-
perconductivity” that occurs when a sufficient fraction of
the grains or clusters are strongly enough Josephson cou-
pled to form a percolating superconducting path across
the film.
We have measured the temperature and magnetic field
dependence of the resistance, and nonlinear conductance-
voltage characteristics of amorphous InxOy films pre-
pared by electron-beam evaporation. These films are
not granular, but nevertheless exhibit local supercon-
ductivity at the lowest temperatures. At low temper-
atures, the application of a magnetic field results in a
dramatic rise in resistance exhibiting a maximum that
with decreasing temperatures is found at decreasingly
small fields. The conductance-voltage characteristics in
this high resistance regime are nonlinear in a manner
suggestive of single-particle tunneling between supercon-
ductors. We argue that these observations are consistent
with the presence of droplets, or islands of superconduc-
tivity, characterized by a nonvanishing superconducting
pair amplitude and coupled by tunneling. Many of the
droplets are Josephson coupled, but their density is not
high enough to produce a superconducting path across
the film.
The 22 nm thick films used in this study were deposited
at a rate of 0.4 nm/s by electron beam evaporation onto
(001) SrTiO3 epi-polished single crystal substrates. Plat-
inum electrodes, 10 nm in thickness, were deposited prior
to growth. The starting material was 99.999 % pure
In2O3. A shadow mask defined a Hall bar geometry in
which the effective area for four-terminal resistance mea-
surements was 500 x 500 µm2. As-grown films exhibited
sheet resistances of about 4600 Ω at room temperature
and about 23 kΩ at 10 K. By annealing at relatively low
temperatures (55-70 ◦C) in a high vacuum environment
(10−7 Torr), film resistances were lowered, and depending
upon the annealing time either insulating or supercon-
ducting behavior at low temperatures could be induced
Typeset by REVTEX 1
http://arxiv.org/abs/0704.0765v2
FIG. 1: Resistance vs. temperature for Films 1 and 2.
[13]. Low-temperature rather than high-temperature an-
nealing avoids changes in morphology that would result
in granular or microcrystalline films. As reported by
Gantmakher, et al. [14], at room temperature the re-
sistances of annealed films were found to be unstable.
However, at low temperatures (40-1400 mK) and in vac-
uum, they were stable. The films of the present study
had resistances of 2600 Ω at room temperature.
Film structure was studied using atomic force mi-
croscopy (AFM), X-ray diffraction (XRD) analysis, and
high resolution scanning electron microscopy (SEM).
From the XRD there was no indication of the presence
of crystalline In or In2O3. The SEM did not detect any
In inclusions, and could be correlated with AFM stud-
ies which revealed for a 22 nm thick film, roughness in
the form of surface features with a height of 8.5 nm, and
with bases about 18 nm in diameter. The conclusion
from these characterization efforts is that the films were
homogeneous and amorphous, and did not contain iso-
lated grains or In inclusions.
Measurements were carried out in an Oxford Kelvinox-
25 dilution refrigerator housed in a screen room, with
measuring leads filtered at room temperature using π-
section filters and RC filters. For measurements of resis-
tance, the applied current was set in the range of 10-100
pA, to avoid the possibility of heating. Figure 1 shows a
plot of R (T ) for two films which were studied in detail.
For each, dR/dT is negative at the lowest temperatures.
In the case of Film 1 there is a local minimum in R(T )
at about 350 mK. Both films exhibit a sharp upturn in
R(T ) between 200 and 300 mK, with the effects to be
discussed below, occurring for Film 1 at higher tempera-
tures than for Film 2. These behaviors are suggestive of
a regime of local superconductivity [12].
The sheet resistances of Films 1 and 2 were both ap-
proximately 78 kΩ at 40 mK. In a perpendicular magnetic
field of only 0.2 T, their sheet resistances increased by up
to a factor of 40 at 40 mK. The maximum in R(B) as
shown in Fig. 2(a) for Film 1 is followed, at the lowest
temperatures, by a relatively slow decrease in resistance
with increasing field. The resistance maximum moves to
higher fields, with increasing temperature. The behavior
of Film 2 resembled the higher temperature data for Film
1, presumably because Film 2 exhibited weaker traces of
superconductivity as evidenced by the absence of a lo-
cal minimum in R(T ) in the zero-field. This variation in
properties from film to film is expected, as small changes
in chemistry and/or morphology can have a large effect
on disordered film properties. The temperature depen-
dencies of the fields, Bpeak and resistances Rpeak are pre-
sented in Fig. 2(b). A qualitatively similar, but weaker
enhancement of resistance was previously reported for in-
sulating InxOy films by Gantmakher and coworkers [14].
A larger enhancement was reported for ultrathin insulat-
ing Be thin films [15]. However neither of these works
demonstrated the systematic effects shown in Fig. 2(b).
To probe the nature of the high resistance state, differ-
ential conductance-voltage characteristics were also stud-
ied [15, 16, 17, 18, 19, 20]. These are shown in Fig. 3
for Film 2 which was studied in detail. Film 1 exhibited
qualitatively similar features.
All of the nonlinear conductance-voltage characteris-
tics are reminiscent of the single-particle tunneling char-
acteristics of superconductor-insulator-superconductor
(SIS) tunneling junctions. The effects of electron heating
are found at voltages well above the observed conduc-
tance thresholds [21]. The fact that the low voltage non-
linearities vanish at temperatures above approximately
200 mK, suggests that they are associated with the pres-
ence of a nonvanishing pairing amplitude occurring in the
disconnected superconducting regions.
We can model these thin disordered films exhibiting
spatially inhomogeneous pairing as random networks of
tunneling junctions of various (random) levels of conduc-
tivity, connecting superconducting clusters imbedded in
an insulating matrix. Some of these junctions are su-
perconducting because they are Josephson coupled. As
a consequence there are “superclusters,” which are ag-
gregates of Josephson coupled smaller clusters that may
cover a macroscopic fraction of the film area. Charge
may flow through “superclusters” with zero electrical re-
sistance. However, as long as these do not span the film
the resistance will be determined by single-particle tun-
neling. The sheet resistance of the resultant network can
be inferred using the following simple argument [22]. Dis-
connect all of the junctions in the network whose conduc-
tance involves single particle tunneling, and then recon-
nect them one by one in ascending order of resistance.
A stage will be reached at which the next junction com-
pletes an infinite cluster connecting the ends of the net-
work. Let the normal state resistance of this last junction
FIG. 2: (a) Resistance vs. magnetic field for Film 1. The
temperatures are 40 (top), 80, 100, 120, 130, 140, 150, 170,
180, 200, 230, 250, 300, 350, 400, and 500 mK (bottom).
(b)The fields (left axis) and the resistances (right axis) of the
peaks in R(B) are plotted as a function of temperature. The
flattening of Rpeak at the lowest temperatures may be the
result of a failure to cool the electrons.
FIG. 3: Differential conductance vs. voltage of Film 2 at 100
mK for 0 (top), 0.01, 0.02, 0.03, 0.04, 0.06, 0.1, 0.175, 0.25,
0.5, and 1 T (bottom).
be Rn. The actual value will depend upon the nature of
the distribution of single-particle tunneling resistances in
the film. The measured normal-state sheet resistance of
the entire network will then be Rn, as this junction is the
bottleneck. Junctions with R > Rn are irrelevant since
they are always shunted by junctions with resistances
of order Rn. Junctions with R < Rn only form finite
clusters over macroscopic distances. They don’t affect
the conductivity because current must still pass through
junctions with resistance of order Rn to get from one
“supercluster” to the next. The action of a magnetic
field is to quench the Josephson coupling within the “su-
perclusters.” When this happens, the resistance at each
temperature will be governed by the new, higher, value
of the bottleneck resistance as there will no longer be any
Josephson-coupled “superclusters,” and the distribution
of junction resistances will shift towards higher values of
resistance.
The fact that the magnetic field that induces higher
resistance decreases with decreasing temperature is a
counter-intuitive result, implying a divergent magnetic
length scale in the zero temperature limit, possibly of
the form [φ0/B]
where φ0 is the flux quantum. This re-
sult suggests enhanced quantum fluctuations in the zero
temperature limit. A heuristic argument can be made
to demonstrate that this is plausible by treating the in-
homogeneous pairing state of the film as a collection of
superconducting grains or islands coupled by tunneling
junctions. Without the inclusion of quantum fluctuations
the argument may not capture all of the features of the
data. It is first useful to consider the magnetic field de-
pendence of the in-plane Josephson coupling between two
planar thin film square islands with an area L2. This is
a geometry resembling the grain boundary geometry of
high temperature superconductor junctions. A magnetic
field applied perpendicular to the plane will completely
penetrate both electrodes of such a junction. As a conse-
quence the minima of the diffraction pattern will be gov-
erned by the field corresponding to a single flux quantum
over the full area of the structure or B [L(2L+ d)] = φ0
where d is the width of the barrier or gap [23]. Since
L >> d, the field at the first minimum of the diffrac-
tion pattern would be found at a value proportional to
−2 . For a “supercluster” consisting of a square ar-
ray of square islands that are Josephson coupled, with
some degree of randomness in the coupling, one would
expect coherence to vanish at the first minimum. For a
random array and with clusters that are not square, one
might expect a similar dependence on L−2. If the char-
acteristic size of the islands increased with decreasing
temperature, which is a plausible assumption, the field
quenching the Josephson coupling, would be expected to
decrease as is observed. For the films studied, at the
lowest temperatures, the peak in the resistance occurs in
a field of 0.2 T, which would correspond to a length of
approximately 100 nm.
The fall off of the resistance at fields above that pro-
ducing the maximum slows with decreasing temperature,
consistent with the strengthening of the pairing ampli-
tude with decreasing temperature. The fact that this
remnant of superconductivity persists to fields up to 12
T, far above the bulk critical field of InxOy, implies that
the superconducting islands are much smaller than the
penetration depth. It should be possible to develop a
detailed percolation model of this effect, similar to that
developed for granular superconductors [24], which in-
cludes the quenching of the Josephson coupling by mag-
netic field and quantum fluctuations.
Although the resistance of the films of the present work
increases with decreasing temperatures in zero magnetic
field there is no guarantee that at some temperature lower
than the minimum value accessed in these measurements,
the resistance will not fall to zero. This could result
from the percolation of Josephson coupling across the
film as the size of the clusters increases. In that event
the inhomogeneous pairing implied by the data would be
more likely governed by a theory including both quantum
[8] and thermal [9] fluctuations.
The large peaks in the magnetoresistance found at
fields above the magnetic field-induced superconductor-
insulator transition of superconducting amorphous
InxOy thin films may result from a similar inhomogeneity
of the pairing amplitude, in that case induced by mag-
netic field rather than disorder. Such peaks were first re-
ported by Hebard and Paalanen [25] who suggested that
the state induced when superconductivity was quenched
was a Bose insulator, characterized by localized Cooper
pairs. They proposed that the peak was the signature of
a crossover to a Fermi insulating state of localized elec-
trons. This resistance peak has been the subject of more
recent studies of InxOy films [14, 16, 26], of microcrys-
talline TiN films where the high field limit appears to be
a “quantum metal” [18], and of high-Tc superconductors
[27]. The fact that inhomogeneous pairing can be in-
duced in disordered superconductors by magnetic fields
has been recently established using a Hubbard Model
[28].
The notion that disorder implies inhomogeneity of su-
perconducting order on some length scale was first dis-
cussed by Kowal and Ovadyahu [29], and as was men-
tioned earlier emerges naturally in a fermionic model of
the superconductor-insulator transition that exhibits a
disorder-tuned inhomogeneity of the pairing amplitude
[8]. The films of Kowal and Ovadyahu differ from those
of the present work in that they are presumably more
disordered, and thus further into the insulating regime.
Their magnetoresistance is always negative as there is no
Josephson coupling between islands and the main effect
of magnetic field is to weaken the inhomogeneous pairing
amplitude, leading to negative magnetoresistance.
This work was supported by the National Science
Foundation under grant no. NSF/DMR-0455121. The
authors would like to thank Zvi Ovadyahu for providing
samples and for critical comments, and Leonid Glazman
and Alex Kamenev for useful discussions.
[1] A. M. Goldman and N. Marković, Phys. Today 51 (11),
39 (1998).
[2] P. W. Anderson, J. Phys. Chem Solids 11, 26 (1959).
[3] A. A. Abrikosov and L. P. Gor’kov, Zh. Eksp. Teor. Fiz.
36, 319 (1959) [Sov. Phys. JETP 9, 220 (1959)].
[4] H. Fukuyama, H. Ebisawa, and S. Maekawa, J. Phys.
Soc. Jpn. 53, 3560 (1984).
[5] M. Ma and P. A. Lee, Phys. Rev. B 32, 5658 (1985).
[6] A. M. Finkel’stein, Physica B 197, 636 (1994).
[7] M. P. A. Fisher, Phys. Rev. Lett 65, 923 (1990).
[8] A. Ghosal, M. Randeria, and N. Trivedi, Phys. Rev. Lett.
81, 3940 (1998); A. Ghosal, M. Randeria, and N. Trivedi,
Phys. Rev. B 65, 014501 (2001).
[9] L. N. Bulaevskĭı, S. V. Panyukov, and M. V. Sadovskĭı,
Zh. Eksp. Teor. Fiz. 92, 672 (1987) [Sov. Phys. JETP
65, 380 (1987)].
[10] Boris Spivak and Steven A. Kivelson,
arXiv:cond-mat/0510422 v2.
[11] Myron Strongin, R. S. Thompson, O. F. Kammerer, and
J. E. Crow, Phys. Rev. B 1, 1078 (1970).
[12] B. G. Orr, H. M. Jaeger, and A. M. Goldman, Phys. Rev.
B 32, 7586 (1985).
[13] Z. Ovadyahu, J. Phys. C 19, 5187 (1986).
[14] V. F. Gantmakher, M. V. Golubkov, J. G. S. Lok, and
A. K. Geim, JETP 82, 951 (1996).
[15] E. Bielejec, J. Ruan, and W. Wu, Phys. Rev. B 63,
100502(R) (2001).
[16] G. Sambandamurthy, L. W. Engel, A. Johansson, and D.
Shahar, Phys. Rev. Lett. 92, 107005 (2004), G. Samban-
damurthy et al., Phys. Rev. Lett. 94, 017003 (2005).
[17] C. Christiansen, L. M. Hernandez, and A. M. Goldman,
Phys. Rev. Lett. 88, 037004 (2002).
[18] T. I. Baturina, C. Strunk, M. R. Baklanov, and A. Satta,
Phys. Rev. Lett. 98, 127003 (2007).
[19] W. Wu and P. W. Adams, Phys. Rev. B 50, 13065 (1994).
[20] R. P. Barber, Jr., Shih-Ying Hsu, J. M. Valles, Jr., R. C.
Dynes, and R. E. Glover III, Phys. Rev. B 73, 134516
(2006).
[21] M. E. Gershenson, Yu. B. Khavin, D. Reuter, P.
Schafmeister, and A. D. Wieck, Phys. Rev. Lett. 85, 1718
(2000).
[22] B. G. Orr, H. M. Jaeger, A. M. Goldman, and C. G.
Kuper, Phys. Rev. Lett. 56, 378 (1986).
[23] K. L. Ngai, Phys. Rev. 182, 555 (1969).
[24] Pedro A. Pury and Manuel O. Cáceres, Phys. Rev. B 55,
3841 (1997).
[25] A. F. Hebard and M. A. Paalanen, Phys. Rev. Lett. 65,
927 (1990); M. A. Paalanen, A. F. Hebard, and R. R.
Ruel, Phys. Rev. Lett. 69, 1604 (1992).
[26] M. Steiner and A. Kapitulnik, Physica C 422, 16 (2005).
[27] M. A. Steiner, G. Boebinger, and A. Kapitulnik, Phys.
Rev. Lett. 94, 107008 (2005).
[28] Yonatan Dubi, Yigal Meir, and Yshai Avishai, Nature
449, 876 (2007).
[29] D. Kowal and Z. Ovadyahu, Solid State Commun. 90,
783 (1994).
http://arxiv.org/abs/cond-mat/0510422
ABSTRACT
  Measurements of transport properties of amorphous insulating indium oxide
thin films have been interpreted as evidence of the presence of superconducting
islands on the insulating side of a disorder-tuned superconductor-insulator
transition. Although the films are not granular, the behavior is similar to
that observed in granular films. The results support theoretical models in
which the destruction of superconductivity by disorder produces spatially
inhomogenous pairing with a spectral gap.

<|endoftext|><|startoftext|>
Introduction
	How Non-local is de Broglie-Bohm?
	How non-local is Hooke's Law?
	 Entanglement and a priori Nonlocality
	Nonlocality involving violation of Bell's Inequality
	de Broglie-Bohm Trajectories for Massive Singlets
	Local vs. Nonlocal Velocity Prescriptions
	The Locality of de Broglie-Bohm Trajectories for Massive Singlets in Aligned Fields
	Computer Experiments
	Physical But Unfair Sampling Effects
	Photon EPR experiments
	Conclusion
	The Meaning of Nonlocality
	Résumé
	References
ABSTRACT
  We present a local interpretation of what is usually considered to be a
nonlocal de Broglie-Bohm trajectory prescription for an entangled singlet state
of massive particles. After reviewing various meanings of the term
``nonlocal'', we show that by using appropriately retarded wavefunctions (i.e.,
the locality loophole) this local model can violate Bell's inequality, without
making any appeal to detector inefficiencies.
  We analyze a possible experimental configuration appropriate to massive
two-particle singlet wavefunctions and find that as long as the particles are
not ultra-relativistic, a locality loophole exists and Dirac wave(s) can
propagate from Alice or Bob's changing magnetic field, through space, to the
other detector, arriving before the particle and thereby allowing a local
interpretation to the 2-particle de Broglie-Bohm trajectories.
  We also propose a physical effect due to changing magnetic fields in a
Stern-Gerlach EPR setup that will throw away events and create a detector
loophole in otherwise perfectly efficient detectors, an effect that is only
significant for near-luminal particles that might otherwise close the locality
loophole.

<|endoftext|><|startoftext|>
Ground-based Microlensing Surveys
Andrew Gould1, B. Scott Gaudi1, and David P. Bennett2
1. Overview
Microlensing is a proven extrasolar planet search method that has already yielded the de-
tection of four exoplanets. These detections have changed our understanding of planet
formation “beyond the snowline” by demonstrating that Neptune-mass planets with sepa-
rations of several AU are common. Microlensing is sensitive to planets that are generally
inaccessible to other methods, in particular cool planets at or beyond the snowline, very
low-mass (i.e. terrestrial) planets, planets orbiting low-mass stars, free-floating planets, and
even planets in external galaxies. Such planets can provide critical constraints on models
of planet formation, and therefore the next generation of extrasolar planet searches should
include an aggressive and well-funded microlensing component. When combined with the
results from other complementary surveys, next generation microlensing surveys can yield an
accurate and complete census of the frequency and properties of planets, and in particular
low-mass terrestrial planets. Such a census provides a critical input for the design of direct
imaging experiments.
Microlensing planet searches can be carried out from either the ground or space. Here we
focus on the former, and leave the discussion of space-based surveys for a separate paper.
We review the microlensing method and its properties, and then outline the potential of
next generation ground-based microlensing surveys. Detailed models of such surveys have
already been carried out, and the first steps in constructing the required network of 1-2m
class telescopes with wide FOV instruments are being taken. However, these steps are
primarily being taken by other countries, and if the US is to remain competitive, it must
commit resources to microlensing surveys in the relatively near future.
2. The Properties of Microlensing Planet Searches
If a foreground star (“lens”) becomes closely aligned with a more distant star (“source”), it
bends the source light into two images. The resulting magnification is a monotonic function of
the projected separation. For Galactic stars, the image sizes and separations are of order µas
and mas respectively, so they are generally not resolved. Rather “microlensing events” are
recognized from their time-variable magnification (Paczynski 1986), which typically occurs
on timescales tE of months, although it ranges from days to years in extreme cases. Presently
about 600 microlensing events are discovered each year, almost all toward the Galactic bulge.
If one of these images passes close to a planetary companion of the lens star, it further
1Department of Astronomy, The Ohio State University, 140 W. 18th Ave., Columbus, OH 43210, USA
2Department of Physics, University of Notre Dame, IN 46556, USA
http://arxiv.org/abs/0704.0767v1
– 2 –
perturbs the image and so changes the magnification. Because the range of gravitational
action scales ∝
M , where M is the mass of the lens, the planetary perturbation typically
lasts tp ∼ tE
mp/M , where mp is the planet mass. That is, tp ∼ 1 day for Jupiters and tp ∼
1.5 hours for Earths. Hence, planets are discovered by intensive, round-the-clock photometric
monitoring of ongoing microlensing events (Mao & Paczynski 1991; Gould & Loeb 1992)
2.1 Sensitivity of Microlensing
While, in principle, microlensing can detect planets of any mass and separation, orbiting
stars of any mass and distance from the Sun, the characteristics of microlensing favor some
regimes of parameter space.
• Sensitivity to Low-mass Planets: Compared to other techniques, microlensing is more
sensitive to low-mass planets. This is because the amplitude of the perturbation does not de-
cline as the planet mass declines, at least until mass goes below that of Mars (Bennett & Rhie
1996). The duration does decline as
mp (so higher cadence is required for small planets)
and the probability of a perturbation also declines as
mp (so more stars must be moni-
tored), but if a signal is detected, its magnitude is typically large ( & 10%), and so easily
characterized and unambiguous.
• Sensitivity to Planets Beyond the Snowline: Because microlensing works by perturb-
ing images, it is most sensitive to planets that lie at projected distances where the images
are the largest. This so-called “lensing zone” lies within a factor of 1.6 of the Einstein ring,
(4GM/c2)Dsx(1− x), where x = Dl/Ds and Dl and Ds are the distances to the lens
and source. At the Einstein ring, the equilibrium temperature is
TE = T⊕
)1/4( rE
)−1/2
→ 70K
0.5M⊙
[4x(1− x)]1/4 (1)
where we have adopted a simple model for lens luminosity L ∝ M5, and assumed Ds =
8 kpc. Hence, microlensing is primarily sensitive to planets in temperature zones similar to
Jupiter/Saturn/Uranus/Neptune.
• Sensitivity to Free Floating Planets: Because the microlensing effect arises directly
from the planet mass, the existence of a host star is not required for detection. Thus, mi-
crolensing maintains significant sensitivity at arbitrarily large separations, and in particular
is the only method that is sensitive to old, free-floating planets. See § 4.
• Sensitivity to Planets from 1 kpc to M31: Microlensing searches require dense star
fields and so are best carried out against the Galactic bulge, which is 8 kpc away. Given that
the Einstein radius peaks at x = 1/2, it is most sensitive to planets that are 4 kpc away, but
maintains considerable sensitivity provided the lens is at least 1 kpc from both the observer
and the source. Hence, microlensing is about equally sensitive to planets in the bulge and
disk of the Milky Way. However, specialized searches are also sensitive to closer planets and
to planets in other galaxies, particularly M31. See § 5.
• Sensitivity to Planets Orbiting a Wide Range of Host Stars: Microlensing is about
equally sensitive to planets independent of host luminosity, i.e., planets of stars all along the
main sequence, from G to M, as well as white dwarfs and brown dwarfs. By contrast, other
– 3 –
techniques are generally challenged to detect planets around low-luminosity hosts.
• Sensitivity to Multiple Planet Systems: In general, the probability of detecting two
planets (even if they are present) is the square of the probability of finding one, which
means it is usually very small. However, for high-magnification events, the planet-detection
probability is close to unity (Griest & Safizadeh 1998), and so its square is also near unity
(Gaudi et al. 1998). In certain rare cases, microlensing can also detect the moon of a planet
(Bennett & Rhie 2002).
2.2 Planet and Host Star Characterization
Microlensing fits routinely return the planet/star mass ratio q = mp/M and the projected
separation in units of the Einstein radius b = r⊥/rE (Gaudi & Gould 1997). Historically, it
was believed that, for the majority of microlensing discoveries, it would be difficult to obtain
additional information about the planet or the host star beyond measurements of q and b.
This is because of the well-known difficulty that the routinely-measured timescale tE is a
degenerate combination of M , Dl, and the velocity of the lens. In this regime, individual
constraints on these parameters must rely on a Bayesian analysis incorporating priors derived
from a Galactic model (e.g., Dong et al. 2006).
Experience with the actual detections has demonstrated that the original view was likely
shortsighted, and that one can routinely expect improved constraints on the mass of the
host and planet. In three of the four microlensing events yielding exoplanet detections, the
effect of the angular size of the source was imprinted on the light curve, thus enabling a
measurement of the angular size of the Einstein radius θE = rE/Dl. This constrains the
statistical estimate of M and Dl (and so mp and r⊥). In hindsight, one can expect this to be
a generic outcome. Furthermore, it is now clear that for a substantial fraction of events, the
lens light can be detected during and after the event, allowing photometric mass and distance
estimates, and so reasonable estimates of mp and r⊥ (Bennett et al. 2007). By waiting
sufficiently long (usually 2 to 20 years) one could use space telescopes or adaptive optics to
see the lens separating from the source, even if the lens is faint. Such an analysis has already
been used the constrain the mass of the host star of the first microlensing planet discovery
(Bennett et al. 2006), and similar constraints for several of the remaining discoveries are
forthcoming. Finally, in special cases it may also be possible to obtain information about
the three-dimensional orbits of the discovered planets.
3. Present-Day Microlensing Searches
Microlensing searches today still basically carry out the approach advocated by Gould & Loeb
(1992): Two international networks of astronomers intensively follow up ongoing microlens-
ing events that are discovered by two other groups that search for events. The one major
modification is that, following the suggestion of Griest & Safizadeh (1998), they try to focus
on the highest magnification events, which are the most sensitive to planets. Monitoring is
done with 1m (and smaller) class telescopes. Indeed, because the most sensitive events are
highly magnified, amateurs, with telescopes as small as 0.25m, play a major role.
– 4 –
Fig. 1.— (Left) Known extrasolar planets detected via transits (blue), RV (red), and
microlensing (green), as a function of their mass and equilibrium temperature. (Right)
Same as the right panel, but versus semimajor axis. The contours show the number of
detections per year from a NextGen microlensing survey.
To date, four secure planets have been detected, all with equilibrium temperatures 40K <
T < 70K. Two are Jupiter class planets and so are similar to the planets found by RV
at these temperatures (Bond et al. 2004; Udalski et al. 2005). However, two are Neptune
mass planets, which are an order of magnitude lighter than planets detected by RV at these
temperatures (Beaulieu et al. 2006; Gould et al. 2006). See Figure 1. This emphasizes the
main advantages that microlensing has over other methods in this parameter range. The
main disadvantage is simply that relatively few planets have been detected despite a huge
amount of work.
4. NextGen Microlensing Searches
Next-generation microlensing experiments will operate on completely different principles
from those at present, which survey large sections of the Galactic bulge one–few times per
night and then intensively monitor a handful of the events that are identified. Instead, wide-
field (∼ 4 deg2) cameras on 2m telescopes on 3–4 continents will monitor large (∼ 10 deg2)
areas of the bulge once every 10 minutes around-the-clock. The higher cadence will find
6000 events per year instead of 600. More important: all 6000 events will automatically be
monitored for planetary perturbations by the search survey itself, as opposed to roughly 50
events monitored per year as at present. These two changes will yield a roughly 100-fold
increase in the number of events probed and so in the number of planetary detections.
Two groups (led respectively by Scott Gaudi and Dave Bennett) have carried out detailed
– 5 –
Fig. 2.— Expectations from a NextGen ground-based microlensing survey. These results
represent the average of two independent simulations which include very different input
assumptions but differ in their predictions by only ∼ 0.3 dex. (Left) Number of planets
detected per year assuming every main-sequence (MS) star has a planet of a given mass and
semi-major axis (see §4). (Right) Same as left panel, but assuming every MS has two planets
distributed uniformly in log(a) between 0.4-20 AU. The arrows indicate the masses of the
four microlensing exoplanet detections.
simulations of such a survey, taking account of variable seeing and weather conditions as
well as photometry systematics, and including a Galactic model that matches all known
constraints. While these two independent simulations differ in detail, they come to similar
conclusions. Figure 1 shows the number of planets detected assuming all main-sequence
stars have a planet of a given mass and given semi-major axis. While, of course, all stars
do not have planets at all these different masses, Gould et al. (2006) have shown that the
two “cold Neptunes” detected by microlensing imply that roughly a third of stars have such
planets in the “lensing zone”, i.e. the region most sensitive for microlensing searches.
Microlensing sensitivity does decline at separations that are larger than the Einstein radius,
but then levels to a plateau, which remains constant even into the regime of free-floating
planets. In this case, the timescales are similar to those of bound-planet perturbations (1 day
for Jupiters, 1.5 hours for Earths) but there is no “primary event”. Again, typical amplitudes
are factor of a few, which makes them easily recognizable. If every star ejected f planets of
mass mp, the event rate would be Γ = 2 × 10−5f
mp/Mj yr
−1 per monitored star. Since
NextGen experiments will monitor 10s of millions of stars for integrated times of well over
a year, this population will easily be detected unless f is very small. Microlensing is the
only known way of detecting (old) free-floating planets, which may be a generic outcome of
– 6 –
planet formation (Goldreich et al. 2004; Juric & Tremaine 2007; Ford & Rasio 2007).
4.1 Transition to Next Generation
Although NextGen microlensing experiments will work on completely different principles,
the transition is actually taking place step by step. The Japanese/New Zealand group MOA
already has a 2 deg2 camera in place on their 1.8m NZ telescope and monitors about 4 deg2
every 10 minutes, while covering a much wider area every hour. The OGLE team has
funds from the Polish government to replace their current 0.4 deg2 camera on their 1.3m
telescope in Chile with a 1.7 deg2 camera. When finished, they will also densely monitor
several square degrees while monitoring a much larger area once per night. Astronomers in
Korea and Germany have each made comprehensive proposals to their governments to build
a major new telescope/camera in southern Africa, which would enable virtually round-the-
clock monitoring of several square degrees. Chinese astronomers are considering a similar
initiative. In the meantime, intensive followup of the currently surveyed fields is continuing.
5. Other Microlensing Planet Searches
While microlensing searches are most efficiently carried out toward the Galactic bulge, there
are two other frontiers that microlensing can broach over the next decade or so.
• Extragalactic Planets: Microlensing searches of M31 are not presently sensitive to plan-
ets, but could be with relatively minor modifications. M31’s greater distance implies that
only more luminous (hence physically larger) sources can give rise to detectable microlensing
events. To generate substantial magnification, the planetary Einstein ring must be larger
than the source, which generally implies that Jupiters are detectable, but Neptunes (or
Earths) are not (Covone et al. 2000; Baltz & Gondolo 2001). Nevertheless, it is astonishing
that extragalactic planets are detectable at all. To probe for M31 planets, M31 microlensing
events must be detected in real time, and then must trigger intensive followup observa-
tions of the type currently carried out toward the Galactic bulge, but with larger telescopes
(Chung et al. 2006). This capability is well within reach.
• Nearby microlensing events: In his seminal paper on microlensing, Einstein (1936)
famously dismissed the possibility that it would ever be observed because the event rate
for the bright stars visible in his day was too small. Nevertheless, a Japanese amateur
recently discovered such a “domestic microlensing event” (DME) of a bright (V ∼ 11.4),
nearby (∼ 1 kpc) star, which was then intensively monitored by other amateurs (organized
by Columbia professor Joe Patterson). While intensive observations began too late to detect
planets, Gaudi et al. (2007) showed that more timely observations would have been sensitive
to an Earth-mass planet orbiting the lens. In contrast to more distant lenses, DME lenses
would usually be subject to followup observations, including RV. This would open a new
domain in microlensing planet searches. Virtually all such DMEs could be found with two
“fly’s eye” telescopes, one in each hemisphere, which would combine 120 10 cm cameras on a
single mount to simultaneously monitor the π steradians above airmass 2 to V = 15. A fly’s
eye telescope would have many other applications including an all-sky search for transiting
– 7 –
planets and a 3-day warning system for Tunguska-type impactors. Each would cost ∼$4M.
6. Conclusion and Outlook
In our own solar system, the equilibrium-temperature range probed by microlensing (out
past the “snow line”) is inhabited by four planets, two gas giants and two ice giants. All
have similar-sized ice-rock cores and differ primarily in the amount of gas they have accreted.
Systematic study of this region around other stars would test predictive models of planet
formation (e.g. Ida & Lin 2004) by determining whether smaller cores (incapable of accreting
gas) also form. Such a survey would give clues as to why cores that reach critical gas-grabbing
size do or do not actually manage to accrete gas, and if so, how much. In the inner parts of
this region, RV probes the gas giants but not the ice giants nor, of course, terrestrial planets.
RV cannot make reliable measurements in the outer part of this region at all because the
periods are too long. Future astrometry missions (such as SIM) could probe the inner regions
down to terrestrial masses, but are also limited by their limited lifetime in the outer regions.
Hence, microlensing is uniquely suited to a comprehensive study of this region.
Although microlensing searches have so far detected only a handful of planets, these have
already changed our understanding of planet formation “beyond the snowline”. Next gen-
eration microlensing surveys, which would be sensitive to dozens of “cold Earths” in this
region, are well advanced in design conception and are starting initial practical implemen-
tation. These surveys play an additional crucial role as proving grounds for a space-based
microlensing survey, the results of which are likely to completely revolutionize our under-
standing of planets over a very broad range of masses, separations, and host star masses (see
the Bennett et al. ExoPTF white paper).
Traditionally, US astronomers have played a major role in microlensing planet searches.
For example, Bohdan Paczyński at Princeton essentially founded the entire field (Paczynski
1986) and co-started OGLE. Half a dozen US theorists have all contributed key ideas and led
the analysis of planetary events. The Ohio State and Notre Dame groups have played key
roles in inaugurating and sustaining the follow-up teams that made 3 of the 4 microlensing
planet detections possible.
Nevertheless, it must be frankly stated that the field is increasingly dominated by other
countries, often with GDPs that are 5–10% of the US GDP, for the simple reason that they
are outspending the US by a substantial margin. There are simply no programs that would
provide the $5–$10M required to be in the NextGen microlensing game. If US astronomers
still are in this game at all, it is because of the strong intellectual heritage that we bring,
augmented by the practical observing programs that we initiated when the entire subject
was being run on a shoestring. These historical advantages will quickly disappear as the
next generation of students is trained on NextGen experiments, somewhere else.
– 8 –
REFERENCES
Baltz, E. A., & Gondolo, P. 2001, ApJ, 559, 41
Beaulieu, J.-P., et al. 2006, Nature, 439, 437
Bennett, D. P., & Rhie, S. H. 1996, ApJ, 472, 660
Bennett, D. P., & Rhie, S. H. 2002, ApJ, 574, 985
Bennett, D. P., Anderson, J., Bond, I. A., Udalski, A., & Gould, A. 2006, ApJ, 647, L171
Bennett, D. P., Anderson, J., & Gaudi, B. S. 2006, ApJ, accepted (astro-ph/0611448)
Bond, I. A., et al. 2004, ApJ, 606, L155
Chung, S.-J., et al. 2006, ApJ, 650, 432
Covone, G., de Ritis, R., Dominik, M., & Marino, A. A. 2000, A&A, 357, 816
Dong, S., et al. 2006, ApJ, 642, 842
Einstein, A. 1936, Science, 84, 506
Ford, E. B., & Rasio, F. A. 2007, ApJ, submitted (astro-ph/0703163)
Gaudi, B. S., & Gould, A. 1997, ApJ, 486, 85
Gaudi, B. S., Naber, R. M., & Sackett, P. D. 1998, ApJ, 502, L33
Gaudi, B. S., et al. 2007, ApJ, submitted (astro-ph/0703125 )
Goldreich, P., Lithwick, Y., & Sari, R. 2004, ApJ, 614, 497
Gould, A., & Loeb, A. 1992, ApJ, 396, 104
Gould, A., et al. 2006, ApJ, 644, L37
Griest, K., & Safizadeh, N. 1998, ApJ, 500, 37
Ida, S., & Lin, D. N. C. 2004, ApJ, 604, 388
Juric, M., & Tremaine, S. 2007, ApJ, submitted (astro-ph/0703160)
Mao, S., & Paczynski, B. 1991, ApJ, 374, L37
Paczynski, B. 1986, ApJ, 304, 1
Udalski, A., et al. 2005, ApJ, 628, L109
This preprint was prepared with the AAS LATEX macros v5.2.
http://arxiv.org/abs/astro-ph/0611448
http://arxiv.org/abs/astro-ph/0703163
http://arxiv.org/abs/astro-ph/0703125
http://arxiv.org/abs/astro-ph/0703160
ABSTRACT
  Microlensing is a proven extrasolar planet search method that has already
yielded the detection of four exoplanets. These detections have changed our
understanding of planet formation ``beyond the snowline'' by demonstrating that
Neptune-mass planets with separations of several AU are common. Microlensing is
sensitive to planets that are generally inaccessible to other methods, in
particular cool planets at or beyond the snowline, very low-mass (i.e.
terrestrial) planets, planets orbiting low-mass stars, free-floating planets,
and even planets in external galaxies. Such planets can provide critical
constraints on models of planet formation, and therefore the next generation of
extrasolar planet searches should include an aggressive and well-funded
microlensing component. When combined with the results from other complementary
surveys, next generation microlensing surveys can yield an accurate and
complete census of the frequency and properties of planets, and in particular
low-mass terrestrial planets.

<|endoftext|><|startoftext|>
Introduction
Luminous infrared galaxies (LIGs) are characterized by ex-
treme IR luminosities LIR >∼ 1011L⊙ at mid- to far-infrared
(FIR) wavelengths. In their comprehensive spectroscopic sur-
vey of LIGs Kim et al. (1995) and Veilleux et al. (1995) have
shown a clear tendency for the more luminous objects to be
more Seyfert-like. The starburst and AGN are tightly connected
phenomena and the interaction between them is a matter of de-
bate.
Send offprint requests to: Ivanka Yankulova, e-mail:
yan@phys.uni-sofia.bg
⋆ Based on observations obtained at the Peak Terskol Observatory,
Caucasus, Russia.
Based on a large spectroscopic optical survey of bright
IRAS and X-ray sources from ROSAT All Sky Survey, Moran
et al. (1996) extracted low-redshift galaxies with optical spec-
tra characterized by the HII regions and X-ray luminosities
typical of AGNs and these objects were named Composite
Seyfert/Starburst galaxies. Other similar galaxies (i.e. with
bright X-ray emission together with the clear predominance of
a starburst in the optical and IR regime) have been found also in
the deep ROSAT fields (Boyle et al. 1995, Griffiths et al. 1996)
and in the Chandra and XMM-Newton deep fields (Rosati et al.
2001).
A significant part of the observed FIR-emission of these
composites could be associated with circumnuclear starburst
events. The nuclear X-ray source there is generally absorbed
http://arxiv.org/abs/0704.0768v1
2 Yankulova I., Golev V., and Jockers, K.: The luminous infrared composite Seyfert 2 galaxy NGC 7679 through ...
with column density of NH > 10
22 cm−2 and these values range
from 1022 cm−2 to higher than 1024 cm−2 for about 96 % of this
class of objects (Risaliti et al. 1999, Bassani et al. 1999). The
circumnuclear starburst should also play a major role in the ob-
scuration processes – see for details Levenson et al. (2001) and
references therein. However, there are Sy2 galaxies with col-
umn densities lower than 1022 cm−2. Panessa & Bassani (2002,
hereafter PB02) present a sample of 17 type 2 SyGs showing
such low absorption in X-rays. The Compton thin nature of
these sources is strongly suggested by some isotropic indica-
tors such as FIR and ø3 emission.
The fraction of Composite Seyfert/Starburst objects is esti-
mated to be in the range of 10% - 30% of the Sy2 population.
The simple formulation of the Unified model for SyGs is not
applicable in such sources. The observed absorption is likely
to originate at larger scales instead in the pc-scale molecular
torus. Probably the Broad Line Regions (BLRs) of these ob-
jects are covered by some obscuring dusty material.
NGC 7679 is a nearby (z = 0.0177) nearly face-on SB0
Seyfert 2 type galaxy in which starburst and AGN activities co-
exist. The IRAS fluxes show that the luminosity of NGC 7679
in the far infrared is about LFIR ≈ 1011L⊙. This object is in-
cluded in the large spectroscopic survey of 200 luminous IRAS
galaxies (Kim et al. 1995, Veilleux et al. 1995). NGC 7679 is
physically associated by a common stream of ionized gas with
the Sy2 galaxy NGC 7682 at ∼ 4.5 arcmin eastward (PA ≈
72◦) forming the pair Arp 216 (VV 329). The tidal interac-
tions between both galaxies together with the existence of a
bar in NGC 7679 could enhance the gas flow towards the nu-
clear regions and possibly trigger the starburst processes (Gu et
al. 2001).
The X-ray properties of the NGC 7679 based on the
BeppoSAX observations and on the ASCA archive were dis-
cussed by Della Ceca et al. (2001, hereafter DC01). Their
conclusion is that NGC 7679 is a Seyfert-starburst composite
galaxy which implies the clear predominance of an AGN in the
X-ray regime connected with a starburst in the optical and IR
regime. DC01 found that a simple power-law spectral model
with Γ ∼ 1.75 and small intrinsic absorption (NH < 4 × 1020
cm−2) provides a good description of the spectral properties of
NGC 7679 from 0.1 to 50 keV. The small X-ray absorption
and the absence of strong (EW ∼ 1 keV) Fe-lines suggest a
Compton thin type 2 AGN in NGC 7679 which clearly distin-
guishes this galaxy from the other LIG Seyferts.
The main goal of this article is to investigate both gas distri-
bution and ionization structure in the circumnuclear regions of
the luminous IR unabsorbed Seyfert galaxy NGC 7679 and to
look for tracers of the presence of a hidden Sy1-type nucleus.
Some information on the observations and data reduction
procedures is presented in Section 2. The results are presented
in Section 3 and discussed in Section 4. The combination of
the data taken from recent literature and our Fabry-Perot ob-
servations provides new insight in the circumnuclear region of
NGC 7679 and in the phenomena occurring there.
Table 1. Observiation details
image interference Fabry-Perot frames
frame filtera) tuned ×
wavelength exposure
λc/FWHM λFP time
(Å)/(Å) (Å) (s)
Hα 6662/55 6674.8 1 × 1800
2 × 900
[N II]λ6548 6662/55 6659.9 1 × 900
continuum 6719/33 6720.0 1 × 1800
1 × 900
ø3 λ5007 5094/44 5092.4 2 × 900
continuum 5002/41 4437.7 1 × 1200
Gunn rb) 6800/1110 1 × 60
BG 39/2b) 4720/700 2 × 1500
a) Used to separate Fabry-Perot working orders
b) Broad-band image taken without Fabry-Perot to reveal the mor-
phology
2. Observations and data reduction
NGC 7679 was observed by K. Jockers, T. Bonev, and T.
Credner with the 2m RCC reflector of the Peak Terskol
Observatory, Caucasus, Russia. The observations were carried
out in October 1996 with the Two-channel Focal Reducer of
the former Max-Planck-Institut für Aeronomie, Germany (now
Max-Planck-Institut für Sonnensystemforschung, MPS). This
instrument was primarily intended for cometary studies but it
has repeatedly been used for observations of active galactic nu-
clei (see for example Golev et al. 1995, 1996, and Yankulova
1999). The technical data and the present capabilities of the
MPS Two-channel Focal Reducer are described in Jockers
(1997) and Jockers et al. (2000).
All observations were taken in Fabry-Perot (FP) mode us-
ing tunable FP narrow-band imaging with spectral FWHM of
the Airy profile δλ in order of 3 - 4 Å. The details of obser-
vations are presented in Table 1 where the central wavelengths
λc and the effective width ∆λ of the interference filters used
to separate the Fabry-Perot interference orders, the wavelength
λFP at which the Fabry-Perot was tuned, and the exposures are
listed.
The overall “finesse” of the system ∆λ/δλ is ≈ 15, ∆λ is
the free spectral range of the FP. As one can see from Table 1
∆λ is comparable to the filter’s band width and therefore all FP
orders except the central one are efficiently suppressed. Two
exposures of NGC 7679 were obtained through each filter to
eliminate cosmic ray events and to increase the signal-to-noise
ratio. Flatfield exposures were obtained using dusk and dawn
Yankulova I., Golev V., and Jockers, K.: The luminous infrared composite Seyfert 2 galaxy NGC 7679 through ... 3
Fig. 1. Contours of continuum-subtracted narrow-band
ø3 λ5007-image superimposed on the gray-scale ø3
λ5007-emission distribution of the circumnuclear region
of NGC 7679. The background noise level is σ = 2.01 × 10−17
ergs cm−2 s−1 arcsec−2. The outermost contour is taken at 5σ
above the sky level and the next contours increase by a factor
2. Note East-West elongation and two extrema decentered
of about ∼ 4 arcsec from the position of the nucleus marked
by cross. North is up, East is to the left.
twilight for uniform illumination of the detector. No dark cor-
rection was required.
The images were reduced following the usual reduction
steps for narrow-band imaging. After flatfielding the frames
were aligned by rebinning to a common origin. The final align-
ment of all the images was estimated to be better than 0.1 px
(the scale is 1 px = 0.8 arcsec). A convolution procedure was
performed in order to match the Point-Spread Functions (PSFs)
of each line-continuum pair which unavoidably degrades the fi-
nal FWHM of the images to the mean value ≈ 3 − 3.3 arcsec
(shown as ’seeing’ in Fig. 1). At the distance of NGC 7679
one arcsec corresponds to a distance of about 340 pc assuming
H0 = 75 km sec
−1 Mpc−1.
3. Results
3.1. Narrow-band emission-line images
Gray-scale images of the narrow-band flux distribution of the
extended circumnuclear region of NGC 7679 in the ø3 λ5007,
Hα, and [N II]λ6548 emission lines with superimposed con-
tours are presented in Fig. 1, 2, and 3, respectively.
The ø3 λ5007 emission shown in Fig. 1 reveals a bright,
about 20 arcsec in size, extended emission-line region (EELR)
which is elongated approximately in East direction (PA ≈ 80◦±
10◦). This region is similar to the analogous EELRs observed
in many Sy2 type galaxies. Most probably it is powered by the
Fig. 2. Contours of continuum-subtracted narrow-band Hα-
image superimposed on the gray-scale Hα-emission distribu-
tion of the circumnuclear region of NGC 7679. The back-
ground noise level is σ = 2.77 × 10−18 ergs cm−2 s−1 arcsec−2.
The outermost contour is taken at 5σ above the sky level and
the next contours increase by a factor of
2. North is up, East
is to the left.
Fig. 3. Contours of continuum-subtracted narrow-band
[N II]λ6548-image superimposed on the gray-scale
[N II]λ6548-emission distribution of the circumnuclear region
of NGC 7679. The background noise level is σ = 4.75 × 10−18
ergs cm−2 s−1 arcsec−2. The outermost contour is taken at 5σ
above the sky level and the next contours increase by a factor
2. North is up, East is to the left.
4 Yankulova I., Golev V., and Jockers, K.: The luminous infrared composite Seyfert 2 galaxy NGC 7679 through ...
AGN-type activity of the nucleus. The emission-line peak of ø3
λ5007 is shifted at about ∼ 4 arcsec to the East with respect to
the center defined by the continuum emission and marked by
cross in Fig. 1.
At larger distances (∼ 37 arcsec) the ionized gas forms an
envelope which is extended along the direction PA ≈ 72◦ to
the NGC 7682, the counterpart of NGC 7679, as it was already
noted by Durret & Warin (1990).
In Fig. 2 we present our very deep and high-contrast Hα
continuum-subtracted image with numerous starburst regions
where because of both seeing and pixel size we are able to
see only elliptical central isophotes instead of the “double nu-
cleus” observed recently by Buson et al (2006). Our analysis
of the unpublished Hα images taken from the archive of the
Isaak Newton Group of telescopes at La Palma as well as the
archive images of Buson et al. (2006) from the ESO La Silla
NTT also revealed a “double nucleus” otherwise unseen in the
known broad-band images. The separation between the nuclear
counterparts (in fact one is the active nucleus itself and the
other one is a bright spiral-like extremely powerful starburst
region) is ≈ 3 arcsec. The existence of this “double nucleus” in
NGC 7679 could enhance the gas flows towards the nuclear
regions and possibly trigger the starburst process itself. The
“double nucleus” can be also seen at very different wavelength
range on 6 cm and 20 cm high-resolution VLA radio continuum
map of NGC 7679 published by Stine (1992). The angular dis-
tance and PA between two counterparts is quite the same. The
radio spectral index is −0.37 and steepens away from the cen-
ter which indicates that nonthermal emission leaks out of the
starburst region.
The low-excitation gas traced by the emission in Hα re-
veals different morphology as compared to that of the ø3 λ5007
emission. Inside of the region with radius of 6 – 8 arcsec from
the center the contours of the Hα emission are nearly circular.
Outside this region to the West of the main body of NGC 7679
a clearly outlined wide arc is observed at 16 arcsec (∼ 5 kpc)
from the center. To the East this arc converts into a gaseous
envelope which forms a part of a circumnuclear starforming
ring mentioned by Pogge (1989). This arc is not detected on
the narrow-band continuum image next to the Hα. The same
morphology in Hα + [N II] with higher spatial resolution was
observed by Buson et al. (2006).
The Fabry-Perot technique used by us makes possible to
disentangle [N II]λ6548 from Hα. The pure [N II]λ6548 emis-
sion (Fig. 3) shows extended structure ∼ 20 arcsec in diameter.
The starforming ring revealed by the Hα image is not seen here.
As a rule the gas component in the starforming ring is ionized
by stellar UV-emission and the [N II]λ6548 is weaker than that
one where the gas is ionized by power-low AGN continuum.
On the other hand, this could be an effect due to the shorter
exposure time of our [N II]λ6548 frame.
3.2. Narrow-band emission-line total fluxes
The total emission-line fluxes of Hα, ø3 λ5007and [N II]λ6583
were estimated from our flux calibrated images in an aperture
of 2 kpc (r <∼ 3 arcsec) like the one used by the authors cited
in Table 2. In this Table we have collected available measure-
ments of the emission lines observed by us up to now. Our
measured fluxes are in good agreement with those of Kim et
al. (1995) and differ from the measurements of Contini et al.
(1998). Flux values given by Contini et al. (1998) are twice
larger than ours and those given by Kim et al. (1995).
Recently Gu et al. (2006) measured the central flux in ø3
λ5007. We found a reasonable coincidence between their value
(1.55 × 10−14 ergs cm−2 s−1) and ours (1.94 × 10−14 ergs cm−2
s−1) in the much smaller aperture used by them.
We estimated the flux of the continuum near ø3 λ5007
within the central 2 kpc to be F(λcont) = 6.74×10−15 ergs cm−2
s−1Å. Then the equivalent width of the emission line ø3 λ5007
is EW(λ 5007) = 7.6 Å. Baskin and Loar (2005) have used the
photoionization code CLOUDY to calculate the dependence of
EW(λ 5007) on the electron density ne, the ionization parame-
ter U, and the covering factor CF. Following their Fig. 5 and
our estimation of EW(λ 5007) we derive for the covering factor
CF the range 0.016 ≤ CF ≤ 0.04 with the most probable value
CF ≈ 0.024.
There is a large quantity of absorbing matter in the central
region of NGC 7679 (Telesco et al. 1995) which modifies the
Balmer emission lines. The Balmer decrement reported by Kim
et al. (1995) in the central 2 kpc is F(Hα)/F(Hβ)≈ 17.4, but fol-
lowing Contini et al. (1998) this decrement is 8.5. Kewley et al.
(2000) give E(B − V)= 0.47 which results to F(Hα)/F(Hβ)=
5.04. In Table 2 the value of the parameter C is evaluated from
the measured Balmer decrement and from the assumption that
in AGNs F(Hα)/F(Hβ)= 3.1 and the optical depth τλ = C f (λ)
where f (λ) is the reddening curve (Osterbrock 1989). The ex-
tinction E(B − V) derived from the Balmer decrement is also
given in Table 2.
Contini et al. (1998) present measurements of emission-
lines fluxes made in the extranuclear region 9 arcsec off the nu-
cleus at PA = 207◦ in an aperture of 3 arcsec. We estimated the
emission-line fluxes from our images in the same aperture at
the same place in order to compare with those given by Contini
et al. (1998). The results are given in Table 2. The Contini’s
values are about 2 times larger than ours in the extranuclear
region as well as at the nucleus.
Moustakas & Kennicutt (2006) report total emission-line
fluxes of Hα and ø3 λ5007 in a wide rectangular aperture
30 × 80 arcsec oriented at PA = 90◦. Their Hα-flux F(Hα) =
(1.535± 0.062)× 10−12 ergs cm−2 s−1 coincides with our value
(1.52 × 10−12 ergs cm−2 s−1) in the same wide aperture after a
correction for extinction with E(B − V) = 0.065 used by them.
In ø3 λ5007 the coincidence is reasonably good (4.72 × 10−13
compared with ours 3.90 × 10−13 ergs cm−2 s−1).
3.3. The ionization map F(ø3 λ5007) / F(Hα)
Our flux-calibrated emission-line images are used to form the
F(ø3 λ5007)/F(Hα) ionization map in order to analyse the
mean level of ionization. This map is shown in the left panel of
Fig. 4. All pixels below 4σ of the background noise level were
suppressed before the division of the corresponding images.
The ionization map infers a presence of a maximum shifted
Yankulova I., Golev V., and Jockers, K.: The luminous infrared composite Seyfert 2 galaxy NGC 7679 through ... 5
Table 2. Measured emission lines fluxes in 2 kpc central aperture in NGC 7679
Emission Measured flux F(λ), ergs cm−2 s−1
2 kpc central aperture 9 arcsec off the nucleus
1 2 3 4 5 6 7
Hα 1.92 × 10−13 1.9 × 10−13 4.5 × 10−13 – 3.8 × 10−13 3.73 × 10−14 1.04 × 10−14
[N II]λ6548 9.96 × 10−14 1.08 × 10−13 1.86 × 10−13 – – 9.8 × 10−15 4.5 × 10−15
ø3 λ5007 5.2 × 10−14 5.3 × 10−14 8.8 × 10−14 – – 9 × 10−15 4.6 × 10−15
Hβ – 1.1 × 10−14 5.24 × 10−14 1.0 × 10−14 – 5.9 × 10−15 –
F(Hα)/F(Hβ) – 17.4 8.5 5.0 4.58 6.3 –
F(Hγ)/F(Hβ) – 0.24 0.32 0.4 0.3 – –
C – 4.93 2.88 1.6 1.12 2.02 –
E(B − V) – 1.45 0.85 0.47 0.33 0.65 –
Columns: 1 - this work; 2 - Kim et al. (1995); 3 - Contini et al. (1998); 4 - Kewley et al. (2000); 5 - Buson et al. (2006); 6 - Contini et al.
(1998); 7 - this work, PA = 207◦.
Fig. 4. F(ø3 λ5007) / F(Hα) ionization map of NGC 7679. All pixels below 4σ of the background noise level were suppressed
before image division (left). The ratio F(ø3 λ5007)/F(Hα) vs the axial distance from the nucleus along PA ≈ 80◦ (right). The
positions labeled 1 to 5 are equidistant with step size of 3 arcsec. We refer to them later in the text (see Fig. 6).
to the East at PA ≈ 80◦ with respect to the photometric cen-
ter defined by the integral light of the continuum images and
marked by cross on the figure.
A slice of this map along the PA ≈ 80◦ versus the axial dis-
tance from the nucleus is presented in the right panel of Fig. 4.
Below we will discuss in more detail the behaviour of the ion-
ization at positions 1 to 5.
4. Discussion
4.1. The ionizing flux from the central engine
In order to estimate the number of ionizing photons emitted
from the central engine, we made use of the recent X-ray obser-
vations of NGC 7679. This object was observed by ASCA and
BeppoSAX in 1998, and by XMM-Newton in 2005. A detailed
analysis of ASCA and BeppoSAX data sets is present in DC01.
6 Yankulova I., Golev V., and Jockers, K.: The luminous infrared composite Seyfert 2 galaxy NGC 7679 through ...
They show that a single absorbed power-law function (with a
photon index 1.75) fits the observed spectrum very well and the
X-ray absorption is relatively small (NH ≤ 4 × 1020 cm−2).
The data for the X-ray observations in 2005 were taken
from the XMM-Newton public archive. The corresponding X-
ray spectra for the PN and the two MOS detectors were ex-
tracted following the standard procedures using the XMM-
Newton Science Analysis System software (SAS version
7.0.0). A single absorbed power-law function gave a good fit
(χ2/do f = 201/191) to all the three spectra which were fitted
simultaneously. The small X-ray absorption in the nucleus of
NGC 7679 was confirmed, NH = 5.6[4.0 ÷ 7.5] × 1020 cm−2,
and no change in the shape of the spectrum was found, a pho-
ton index of 1.81[1.70÷1.92] (the 90%-confidence intervals are
given in brackets). The absorbing X-ray column density along
the line of sight is about an order of magnitude smaller than
that one estimated from the observed Balmer decrement which
is NH ∼ 8 × 1021 cm−2 and ∼ 5 × 1021 cm−2 following Kim et
al. (1995) and Contini et al. (1998), respectively.
Interestingly, the observed X-ray flux has decreased by a
factor ∼ 10 over a time period of ∼ 7 years: FX = 3.8 ×
10−13 and 5.8 × 10−13 ergs cm−2 s−1 correspondingly in the
0.1-2.0 keV and 2.0-10.0 keV energy intervals. Since, on the
one hand, there is only about 5% scatter of the fluxes for all
the three detectors (one PN and two MOS) around the average
values given above, and, on the other hand, NGC 7679 shows
an appreciable X-ray variability (DC01), it is then likely that
the detected decrease of the X-ray flux is real and not an instru-
mental effect.
The extrapolation of the DC01’s power law to the UV spec-
tral domain (that is to hν0 = 13.6 eV) yields F
ν = Fν0(ν0/ν)
where α = 0.75 and Fν0 = 2.0 × 10
−28 erg cm−2 s−1 Hz−1. The
same extrapolation for the XMM-Newton spectrum results in
Fntν = Fν0(ν0/ν)
α where α = 0.81 and Fν0 = 2.8 × 10
−29 erg
cm−2 s−1 Hz−1.
The number of ionizing photons with hν > 55 eV provided
by the central AGN source is defined as
Nion =
55 eV
F ntν
dν = 4πR2G
hν=55 eV
where RG is the distance to the NGC 7679. For the BeppoSAX
data this estimation is Nion ∼ 1052 ph s−1 and for the XMM-
Newton data Nion ∼ 1051 ph s−1. These values are averaged be-
tween all BeppoSAX and XMM-Newton bands, respectively.
The number of ionizing photons decrease from the BeppoSAX
time to the XMM-Newton time in the range of 1051 <∼ Nion <∼
1052 ph s−1.
4.2. Physical conditions in the circumnuclear region of
NGC 7679
The extended emission-line region in NGC 7679 has a rather
different morphology when observed in Hα (low ionization
emission line) as compared to ø3 λ5007 (high ionization emis-
sion line). The Hα image (Fig. 2) contains a compact circum-
nuclear region (∼ 20 arcsec in diameter) whose isophotes do
not infer any preferred direction. In contrast, the ø3 λ5007 im-
age (Fig. 1) of the circumnuclear region of NGC 7679 shows
elliptical isophotes extended along the PA ≈ 80◦ ± 10◦. Such
difference in morphology of the emission-line images signals
the presence of at least two distinct ionization components (see
for example Pogge 1989).
The extended morphology both of the ø3 λ5007 image
(Fig. 1) and of the ø3 λ5007/Hα flux ratio image (Fig. 4) sug-
gests an anisotropy of the radiation field. In order to check
whether the ionizing field is collimated or not we have to com-
pare the number of ionizing photons Nph, absorbed by the ex-
tended emission line gas with the number of ionizing photons
Nion, emitted by the central AGN engine. Usually, the hydro-
gen line flux F(Hα ) or F(Hβ ) is used to find Nph. But the
NGC 7679 high resolution Hα image reveals a central circum-
nuclear star-forming spiral ring capable of producing about ∼
75% of the optical line emission within a radius of ∼ 1 kpc
(Buson et al. 2006). For this reason it is not quite correct to use
the F(Hα) in order to make the Nph estimate.
Kauffmann et al. (2003) focus on the luminosity of the ø3
λ5007 as a tracer of AGN activity. We can estimate the number
Nph of ionizing photons with energy above hν = 55 eV from
the observed ø3 λ5007 luminosity after correction for extinc-
tion. A dust correction to ø3 based on the ratio F(Hα) / F(Hβ)
should be regarded as best approximation (Kauffmann et al.
2003). According to Draine & Lee (1984) (Fig.7 therein) the
optical depth is τ5007 = 0.96 C = 2.76. Here we adopt the
value of C= 2.88 following Contini et al. (1998) as a more com-
promising reddening value among the different Balmer decre-
ment assessments. Then the luminosity, corrected for extinc-
tion, Lcorr([O+2]λ5007) = 4.4 × 1041 ergs s−1. We note that
PB02 give 5.7×1041ergs s−1 for the ø3 λ5007 luminosity.
The total number of ionizing photons that must be available
to produce the observed ø3 λ5007 emission is given by the
expression
Nph =
+2, Te)L
corr([O+2]λ5007) CF−1
αeff5007(ne, Te) hν5007
≈ 2 × 1052 ph s−1 (2)
where αG(O
+2, Te) = 5.1 × 10−12 cm3 s−1 (Aldrovandi &
Pequignot 1973) is the recombination coefficient at Te ≈ 104 K
and αeff5007(ne, Te) = 1.1× 10
−9 cm3 s−1 is the effective recombi-
nation coefficient at ne = 10
5 cm−3 and Te = 10
4 K. This coeffi-
cient strongly depends on the electron density and temperature.
If we accept Te = 10
4 K then αeff5007(ne) = 5.14 × 10
−3A21/ne
cm3 s−1 where A21 = 0.021 s
−1. As the critical electron density
is ncre (5007) = 5×10
5 cm−3 we assume that the electron density
is not lower than ne ≈ 104 cm−3 in order to emit the ø3 λ5007.
Then the lower limit for Nph is ≈ 2×1051 ph s−1. For NGC 7679
the covering factor CF = 0.024.
The photon ratio Nph/Nion is a probe of the collimation hy-
pothesis. In the anisotropic case this ratio is considerably larger
than 1. Under the above assumptions about ne and Te we esti-
mate for NGC 7679 0.2 <∼ (Nph/Nion)hν>55 eV <∼ 20 but the lower
limit could increase if the luminosity L([O+2]λ5007) is inte-
grated over the whole image. The increase of the upper limit of
Yankulova I., Golev V., and Jockers, K.: The luminous infrared composite Seyfert 2 galaxy NGC 7679 through ... 7
Fig. 5. Spectral energy distribution (SED) from the radio to the
X-ray band of the composite Starburst/Sy2 galaxy NGC 7679
(open diamonds). The radio values at 6 cm and 20 cm are from
VLA (Stine, 1992). The X-ray band data are from ASCA and
BeppoSAX (DC02 and Risaliti, 2002). Filled diamonds repre-
sent recent X-ray observations taken from the XMM-Newton
archive. All other data are taken from NED. The SED has been
compared with a normal spiral galaxy template (dotted line)
taken from Elvis et al.(1994), with Starburst and Sy2 galaxy
templates (dashed line and thin solid line) taken from Schmitt
et al. (1997), and with Sy1 galaxy template (thick solid line)
taken from Mas-Hesse et al. (1995).
this ratio is due to the XMM-Newton data which are ∼8 times
lower than ASCA/BappoSAX ones.
Both the ratio (Nph/Nion)hν>55 eV and the presence of weak
and elusive broad Hα-wings (Kewley et al. 2000) indicate a
hidden AGN in the NGC 7679. Contrary, the NGC 7679 X-
ray spectrum is not highly absorbed and NH < 4 × 1020 cm−2
(see discussion in section 4.1). As a matter of fact Bian & Gu
(2006) recently found a very high detectability of hidden BLRs
(∼ 85%) for Compton-thin Sy2s with higher ø3 luminosity of
L([O+2]λ5007) > 1041 erg s−1.
We have to note that NGC 7679 resembles in many respects
the galaxy IRAS 12393+3520. In this galaxy direct X-ray evi-
dence suggests the presence of a hidden AGN (Guainazzi et al.,
2000). This homology can be seen in Fig. 5 where the spectral
energy distribution (SED) from the radio to the X-ray band of
NGC 7679 is shown.
The composite nature of NGC 7679 is clearly seen.
Whereas the starburst component dominates in the FIR-IR
range, the X-ray band emission is well below that of a typi-
cal Sy1. The extrapolation of the power-low X-ray spectrum to
13.6 eV shows a much lower value than the typical Sy2 emis-
sion at this wavelength. This again favors the idea about a hid-
den central engine. Guainazzi et al. (2000) suppose that a dusty
ionized absorber is able to obscure selectively the optical emis-
sion, leaving the X-rays almost unabsorbed.
Fig. 6. The ø3 λ5007/Hβ vs. [N II]λ6583/Hα diagnostic dia-
gram of Veilleux & Osterbrock (1987). The dashed and dotted
theoretical lines demarcate between Starbursts and AGNs ac-
cording to Kauffmann et al. (2003) and Kewley et al. (2001),
respectively. The line dividing between LINERs and SyGs is
taken according to PA02. The label “Comp” indicates the re-
gion of the diagram in which composite objects are expected to
be found. The diagnostic value measured by us is denoted by
thick triangle. See text for other designations.
4.3. Ionization structure in the circumnuclear region of
the NGC 7679
The ionization map (the right panel of Fig. 4) displays the
clear signature of highly-excited gas. The ø3 λ5007/Hα-ratio
increases in the direction of the counterpart galaxy NGC 7682
reaching a maximum of ≈ 2.5 at about 12 arcsec off the nu-
cleus. More than 15 years ago Durret & Warin (1990) also re-
ported about the presence of high-ionization gas in this direc-
tion (see their Fig.3a) but their result seemingly did not attract
attention.
On the other hand at PA ≈ 0◦ our map shows values around
ø3 λ5007/Hα ≈ 0.3 and the ionization in this direction is en-
tirely due to the young hot stars.
The ø3 λ5007/Hβ vs. [N II]λ6583/Hα diagnostic diagram
(Veilleux & Osterbrock, 1987) helps to delineate the different
ionization mechanisms maintaining the ionization of gaseous
component in AGNs and in Starbursts. In Fig. 6 such a dia-
gram is shown for NGC 7679. Kewley et al. (2001) distinguish
between Starbursts and AGN using a theoretical upper limit de-
rived from star forming models. This limit is shown as a dotted
line in Fig. 6. Objects with emission-line ratios above this limit
cannot be explained by any possible combination of parame-
ters in a star forming model. Kauffmann et al. (2003) published
an updated estimate for the starburst boundary derived from
the SDSS observations. In Fig. 6 this boundary is shown as a
dashed line. The location of the Composites is expected to lie
between these two lines (see e.g. Panessa et al., 2005).
8 Yankulova I., Golev V., and Jockers, K.: The luminous infrared composite Seyfert 2 galaxy NGC 7679 through ...
In Fig. 6 we plot the emission-line flux ratios of NGC 7679
measured in an aperture of 3 arcsec in steps of 3 arcsec both
along the PA≈ 80◦ (with crosses) and PA= 0◦ (with diamonds).
The labels 1 - 5 for PA≈ 80◦ correspond to the labels in the
right panel of Fig 4. Using spectra taken from the Smithsonian
Astrophysical Observatory data Center Z-Machine Archive
obtained with 3 arcsec slit width, we estimate the observed
F(Hα)/F(Hβ) ∼ 5 in NLR. On Fig. 6 positions 1 and 2 at PA
= 80◦ off the nucleus lie well within the region occupied by
the Sy 2 galaxies. The position 5, which is at the same distance
from the nucleus but in opposite direction, is located nearly on
the dividing line.
All points which refer to the PA = 0◦ are situated between
Kauffmann’s and Kewley’s demarcation lines in the region of
Composites.
In Fig. 6 we also plot with asterisks the nuclear diagnostic
ratios according to the data of authors presented in Table 2. The
thick triangle refers to the nucleus according to our measure-
ments under the assumption of F(Hα)/F(Hβ) = 8.5 (Contini et
al., 1998). The large scattering of nuclear values is probably
due to the variations of the strength of Hβ absorption line of
the star-forming stellar population.
4.4. Unabsorbed SyGs with and without hidden BLRs
The unabsorbed Sy2 galaxies with low absorption in X-rays
(NH < 10
22 cm−2) possess a hidden or nonhidden central en-
gine and BLRs. We have used the ø3 λ5007 emission to test the
presence of hidden or nonhidden AGN sources in unabsorbed
Sy2 galaxies in the sample of PB02 (14 objects) and Panessa et
al., 2005 (6 objects selected by Moran et al., 1996) in the same
way as it was done for NGC 7679 (Subsections 4.1 and 4.2).
We derive the ratio (Nph/Nion)hν>55 eV following equations (1)
and (2) under the assumtions of ne ≈ 5 × 104 cm−3 (which is
an order of magnitude smaller than the critical electron density
for the ø3 λ5007 emission), Te ≈ 104 K, and CF ≈ 10−2. These
assumptions refer to the inner circumnuclear clouds of AGNs.
The ratios are presented in Table 3. For the objects dis-
cussed in Panessa et al. (2005) the most popular (i.e. as in
NED) galaxy names are used. The Lcorr([O+2]λ5007) values
are taken from PB02 and Panessa et al. (2005). In the case of
NGC 7679 we have used both their and our determinations of
Lcorr([O+2]λ5007).
For three objects with estimated broad Hα component
LbroadHα (Panessa et al., 2005, Table 1 therein) we derive also the
number of recombinations Nrec resulting in the Hα emission.
We assume Te = 10
4 K and CF = 1 which leads to the estima-
tion of the lower limit of the value of Nrec. The Nrec/Nion lower
limits are also presented in Table 3.
One can see that 17 out of 20 objects of the unabsorbed
Sy2s discussed here reveal Nph/Nion)hν>55 eV > 0.3. This indi-
cates that the central AGN sources in a considerable part of
the unabsorbed Sy2s are obscured. The NGC 7679 does not
make an exception and also possesses a hidden AGN engine
suggested both by the ø3 λ5007 morphology and by the photon
deficiency.
Table 3. The photon deficiency for unabsorbed Sy2s discussed
by Panessa and Bassani (2002) and Panessa et al. (2005)
galaxy (Nph/Nion)hν>55 eV Nrec/Nion
(lower limit)
ESO 540-G001 4.2 13.0
CGCG 551-008 1.0
MCG -03-05-007 2.2
UGC 03134 19.5
IRAS 20051-1117 1.6 1.2
CGCG 303-017 1.3 2.0
IC 1631 0.3
NGC 2992 2.0
NGC 3147 0.4
NGC 4565 6.7
NGC 4579 0.2
NGC 4594 1.7
NGC 4698 0.3
NGC 5033 1.3
MRK 273x 0.4
NGC 5995 0.4
NGC 6221 0.02
NGC 6251 6.0
NGC 7590 0.4
NGC 7679 3.4 (2.0 from our data)
It is still not clear what kind of physical process is related to
the presence of hidden central engines in Sy2s. PB02 suggest
two scenarios for the unabsorbed Sy2s (i) the central engine
and their BLR must be hidden by an absorbing medium with
high value of the AV/NH ratio, and (ii) the BLR is very weak or
absent.
5. Conclusions
We present a new ø3 λ5007 emission - line image of the
circumnuclear region of NGC 7679 which shows elliptical
isophotes extended along the PA≈ 80◦ ± 10◦ in the direction to
the counterpart galaxy NGC 7682. The maximum of this emis-
sion is displaced by about 4 arcsec from the photometric center
defined by the continuum emission.
The ratio of the quantity of ionizing photons inferred from
the observed extinction corrected ø3 λ5007 luminosity to the
number of ionizing photons with hν > 55 eV provided by the
central AGN source (Nph/Nion)hν> 55 eV ≈ 0.2 − 20 as well as
the presence of weak and elusive Hα broad wings probably
indicate a hidden AGN.
The high ionization inferred by the flux ratio ø3 λ5007/Hα
in the direction of about PA≈ 80◦ ± 10◦ coincides with the di-
rection to the counterpart galaxy NGC 7682. It is possible that
the dust and gas in this direction has a direct view to the central
AGN engine. It suggests that starburst and dust decay in this di-
Yankulova I., Golev V., and Jockers, K.: The luminous infrared composite Seyfert 2 galaxy NGC 7679 through ... 9
rection have occurred because of tidal interaction between the
two galaxies.
In the direction PA≈ 0◦ the ionization is entirely caused by
hot stars.
A large part of the unabsorbed Compton-thin Sy2s with
higher ø3 luminosity (>∼ 1041 erg s−1) possesses a hidden AGN
source.
Acknowledgements. We are grateful to the referee, Lucio Buson, for
his valuable comments which improved both the content and the clar-
ity of this manuscript.
We would like to thank T. Bonev, Institute of Astronomy of
Bulgarian Academy of Sciences, for kindly providing the Fabry-Perot
observations and for useful discussions. We are grateful to S. Zhekov,
Space Research Institute of Bulgarian Academy of Sciences, for the
numerous fruitful discussions and especially for the analysis of the
X-ray properties of NGC 7679.
Our work was partially based on data from the La Palma ING,
ESO NTT, and XMM-Newton Archives.
This research has made use of the SIMBAD database, operated
at CDS, Strasbourg, France, and of the NASA/IPAC Extragalactic
Database (NED) which is operated by the Jet Propulsion Laboratory,
California Institute of Technology, under contract with the National
Aeronautics and Space Administration.
We acknowledge the support of the National Science Research
Fund by the grant No.F-201/2006.
References
Aldrovandi, S. M. V., & Pequignot, D. 1973, A&A, 25, 137
Bassani, I., Dadina, M., Maiolino, R.,et al. 1999, ApJS, 121,473
Baskin, A. & Laor, A. 2005, MNRAS, 358, 1043
Bian, W., & Gu, Q. 2006, ApJ accepted (astro-ph/0611199)
Boyle, B. J., McMahon, R. G., Wilkes, B. J.,& Elvis, M. 1995,
MNRAS, 276, 315
Buson, L. M., Cappellari, M., Corsini, E. M., Held, E. V., Lim, J., &
Pizzella, A. 2006, A&A, 447, 441
Condon, J., Huang, Z., Yin, Q., & Thuan, T. 1991, ApJ, 378, 65
Contini T., Considere S., & Davoust E. 1998, A&AS, 130, 285
Della Ceca, R., Pellegrini, S., Bassani, L., Beckmann, V., Cappi, M.,
Palumbo, G. G. C., Trinchieri, G., & Wolter, A. 2001, A&A, 375,
781 (DC01)
Draine, B. T., & Lee, H. M. 1984, ApJ, 285, 89
Durret, F., & Warin, F. 1990, A&A, 238, 15
Elvis M., Wilkes, B. J., McDowell, J. C., Green, R. F., Bechtold, J.,
Willner, S. P., Oey, M. S., Polomski, E., & Cutri, R. 1994, ApJS
95, 1
Golev, V., Yankulova, I., Bonev, T., & Jockers, K. 1995, MNRAS, 273,
Golev, V., Yankulova, I., & Bonev, T. 1996, MNRAS, 280, 29
Granato, G. L., & Danese, L. 1994, MNRAS, 268, 235
Griffiths, R. E., Della Ceca, R., Georgantopoulos, I., Boyle, B.,
Stewart, G., Shnks, T., & Fruscione, A. 1996 MNRAS, 281, 71
Gu, Q., Melnick, J., Fernandes, R. Cid, Kunth, D., Terlevich, E., &
Terlevich, R. 2006, MNRAS, 366, 480
Gu, Q. S., Huang, J. H., de Diego, J. A., Dultzin-Hacyan, D., Lei, S.
J., & Benitez, E. 2001, A&A, 374, 932
Guainazzi, M., Dennefeld, M., Piro, L., Boller, T., Rafanelli, P., &
Yamauchi, M. 2000, A&A, 355, 113
Heckman, T. M., Armus, L., & Miley, G. K. 1990, ApJS, 74, 833
Jockers, K. 1997, Experimental Astronomy, 7, 305
Jockers, K., Credner, T., Bonev, T., Kiselev, N., Korsun, P., Kulik, I.,
Rosenbush, V., Andrienko, A., Karpov, N., Sergeev, A., & Tarady,
V. 2000, Kinematika i Fizika Nebesnykh Tel, Suppl, No. 3, 13
Kauffmann, G., Heckman, T. M., Tremonti, C., et al. 2003, MNRAS,
346, 1055
Kewley, L. J., Heisler, C. A., Dopita, M. A., Sutherland, R. Norris, R.,
Reynolds, J., & Lumsden, S. 2000, ApJ, 530, 704
Kewley, L. J., Heisler, C. A., Dopita, M. A., & Lumsden, S. 2001,
ApJS, 132, 37
Kim, D.-C., Sanders, D. B., Veilleux, S., Mazzarella, J. M., & Soifer,
B. T. 1995, ApJS, 98, 129
Kotilainen, J. K., & Prieto, M. A. 1995, A&A, 295, 646
Levenson, N., Weaver, K., & Heckman, T. 2001, ApJ, 550, 230
Lipari, S., Bonatto, Ch., & Pastoriza, M. 1991, MNRAS, 253, 19
Mas-Hesse, J. M., Rodriguez-Pascual, P. M., Sanz Fernandez de
Cordoba, L., Mirabel, I. F., Wamsteker, W., Makino, F., & Otani,
C. 1995, A&A 298, 22
Moran, E. C., Halpern, J. P.,& Helfand, D. J. 1996, ApJS, 106, 341
Moustakas, J., & Kennicutt, R. C. 2006, ApJS, 164, 81
Osterbrock, D. 1989, Astrophysics of gaseous nebulae and active
galactic nuclei, University Science Books
Panessa, F., & Bassani, L. 2002, A&A, 394, 435 (PB02)
Panessa, F., Wolter, A., Pellegrini, S., Fruscione, A., Bassani, L., Della
Ceca, R., Palumbo, G., & Trinchieri, G. 2005, ApJ, 631, 707
Pier, E. A., & Krolik, J. 1992, ApJ, 401, 99
Pogge, R. W. 1989, AJ, 98, 124
Risaliti, G., Maiolino, R., & Salvati, M. 1999, ApJ, 522, 157
Risaliti, G. 2002, A&A, 386, 379
Rosati, P., & Chandra Deep Field South Team, 2001, A&AS,
Bull.AAS, 33, 1519
Sanders, D., Soifer, B., Elias, J., Madore, B., Matthews, K.,
Neugebauer, G., & Scoville, N. 1988, ApJ, 325, 74
Schmitt, H. R., Kinney, A. L., Calzetti, D., & Storchi Bergmann, T.
1997, AJ 114, 592
Simpson, C., Mulchaey, J. S., Wislon, A. S., Ward, M. J., & Alonso-
Herrero, A. 1996, ApJ, 457, L19
Simpson, C., Wislon, A. S., Bower, G., Heckman, T. M., Krolik, J. H.,
& Miley, G. K. 1997, ApJ, 474, 121
Smith, H. E., Lonsdale, C. J., & Londsdale C. J. 1998, ApJ, 492, 137
Stine, P. C. 1992, ApJS, 81, 49
Telesco, C. M., Dressel, L., & Wolstencroft,R. 1993, ApJ, 414, 120
Veilleux, S., Kim, D.-C., Sanders, D. B., Mazzarella, J. M., & Soifer,
B. T. 1995, ApJS, 98, 171
Veilleux, S., & Osterbrock, D. E. 1987, ApJS, 63, 295
Wilson, A. S., Braatz, J. A., Heckman, T. M., Krolik, J. H., & Miley,
G. K. 1993, ApJ, 419, L61
Yankulova, I. 1999, A&A, 344, 36
http://arxiv.org/abs/astro-ph/0611199
	Introduction
	Observations and data reduction
	Results
	Narrow-band emission-line images
	Narrow-band emission-line total fluxes
	The ionization map F(ø3 5007)/F(H)
	Discussion
	The ionizing flux from the central engine
	Physical conditions in the circumnuclear region of NGC 7679
	Ionization structure in the circumnuclear region of the NGC 7679
	Unabsorbed SyGs with and without hidden BLRs
	Conclusions
ABSTRACT
  NGC 7679 is a nearby luminous infrared Sy2 galaxy in which starburst and AGN
activities co-exist. The ionization structure is maintained by both the AGN
power-law continuum and starburst. The galaxy is a bright X-ray source
possessing a low X-ray column density N_H < 4 x 10^20 cm^{-2}. The Compton-thin
nature of such unabsorbed objects infers that the simple formulation of the
Unified model for SyGs is not applicable in their case. The main goal of this
article is to investigate both gas distribution and ionization structure in the
circumnuclear region of NGC 7679 in search for the presence of a hidden Sy1
nucleus, using the [O III] 5007 luminosity as a tracer of AGN activity. The [O
III] 5007 image of the NGC 7679 shows elliptical isophotes extended along the
PA ~ 80 deg in the direction to the counterpart galaxy NGC 7682. The maximum of
ionization by the AGN power-law continuum traced by [O III] 5007/Halpha ratio
is displaced by ~ 13 arcsec eastward from the nucleus. We conclude that the
dust and gas in the high ionization direction has a direct view to the central
AGN engine. This possibly results in dust/star-formation decay. A large
fraction of the unabsorbed Compton-thin Sy2s with [O III] luminosity > 10^41
erg s^{-1} possesses a hidden AGN source (abridged).

<|endoftext|><|startoftext|>
arXiv:0704.0769v1  [cond-mat.other]  5 Apr 2007
The Fermionic Density-functional at Feshbach Resonance
Michael Seidl
Institute of Theoretical Physics, University of Regensburg, D-93040 Regensburg, Germany
Rajat K. Bhaduri
Department of Physics and Astronomy, McMaster University, Hamilton, Canada L8S 4M1
(Dated: November 17, 2018)
We consider a dilute gas of neutral unpolarized fermionic atoms at zero temperature. The atoms
interact via a short-range (tunable) attractive interaction. We demonstrate analytically a curious
property of the gas at unitarity. Namely, the correlation energy of the gas, evaluated by second
order perturbation theory, has the same density dependence as the first order exchange energy,
and the two almost exactly cancel each other at Feshbach resonance irrespective of the shape of
the potential, provided (µrs) ≫ 1. Here (µ)−1 is the range of the two-body potential, and rs is
defined through the number density, n = 3/(4πr3s). The implications of this result for universality
is discussed.
I. INTRODUCTION
Consider a dilute gas ofN ≫ 1 neutral fermionic atoms
(massM) at T = 0 interacting with a short-range attrac-
tive potential. In general, the properties of the dilute gas
are determined by the number density n, and the scatter-
ing length a. The Hamiltonian of this N -particle system
reads
Ĥ = − ~
∇2i +
|ri − rj |
. (1)
Not written explicitly here, there is also an external po-
tential vext(r) that forces the N atoms to stay within
a large box with volume Ω [where vext(r) ≡ 0]. The
attractive interaction potential is assumed to have the
2-parameter form
v(r) = −v0f(µr) (2)
where v0 > 0 is the strength of the interaction, R0 =
is its range, and f(x) is a dimensionless function.
In the true ground state of the Hamiltonian (1) the
attractive atoms may form dimers or even clusters. We
are, however, looking for a metastable state where there
is a dilute gas of separated atoms with uniform density n,
satisfying the condition (µrs) ≫ 1 where n = NΩ =
Even then, for a weak v0, there will be BCS-type pair-
ing, followed by dimer formation as the strength of the
interaction increases. This was predicted long back by
Leggett [1], and has been observed experimentally [2].
For the density functional analysis of the uniform gas
at Feshbach resonance, we shall disregard the BCS con-
densed pairs in this paper.
To study the effect of the attractive interaction v(r), we
consider the corresponding atom-atom scattering prob-
lem in the relative s-state. Separating the center-of-mass
motion, we are left with the relative Hamiltonian
Ĥrel = −
− v0f(µr) . (3)
Keeping the range of the potential small enough such
that (µrs) ≫ 1, the strength v0 is adjusted such that the
potential can support a single bound state at zero energy.
This happens when the scattering length a→ ∞, leaving
no length scale from the interaction. Such a tuning of
the interaction is possible experimentally, and gives rise
to Feshbach resonance [3]. The scattering cross section
in the given partial wave (s-wave in our case) reaches the
unitary limit, and the gas is said to be at unitarity. It is
then expected to display universal behavior [4]. Note that
at Feshbach resonance, there is no length scale left other
than the inverse of the Fermi wave number kF , where
kF = (3π
2n)1/3. The energy per particle, E/N , as a
function of the density n, should therefore scale the same
way as the noninteracting kinetic energy, 3
2k2F /2M ∝
n2/3. There has been much interest amongst theorists to
calculate the properties of the gas in the unitary regime
(kF |a| ≫ 1). In particular, at T = 0, the energy per
particle of the gas is calculated to be
, (4)
where ξ ≃ 0.44 [5]. The experimental value of ξ is about
0.5, but with large error bars [6]. Recently, there have
been two Monte Carlo (MC) finite temperature calcula-
tions [7, 8] of an untrapped gas at unitarity, where various
thermodynamic properties as a function of temperature
have been computed. It is clear that at unitarity, the
kinetic and potential energies should scale the same way.
This has been assumed a priori in a previous density
functional treatment of a unitary gas [9]. However, such
a scaling behavior is not evident from the density func-
tionals for the direct, exchange and correlation energies
[10] (see sects. II and III). The aim of the present paper
is to examine this point in some detail. In particular, we
are able to show analytically that the leading contribu-
tion of the correlation energy (calculated in second order
perturbation theory), cancels the first order exchange en-
rgy almost exactly at Feshbach resonance. This happens
irrespective of the shape of the potential as specified by
f(µr), provided the condition (µrs) ≫ 1. We show that
http://arxiv.org/abs/0704.0769v1
our general Eq.(24) (derived later in the text) that en-
sures such a cancellation is satisfied at unitarity for a
variety of 2-parameter potentials, including the square
well and the delta-shell, as well as the smoothly varying
cosh−2(µr) and Gaussian potentials. This is the main re-
sult of the present work. The implications of this result
for universality is marginal. This is because these po-
tential energy terms, in the limit of (µrs) ≫ 1, are very
small compared to the kinetic energy [4]. For a moder-
ately large value like (µrs) ≃ 3 howevr, these terms are
comparable in magnitude to the kinetic energy (sect. IV).
Even then, the cancellation of the first order exchange,
and the second order perturbative terms leave the di-
rect first order term in tact. In the electron gas, this
(repulsive) term got cancelled by the interaction of the
electrons with the positive ionic background. There is no
such mechanism of cancellation here, unless we assume,
rather arbitrarily, that the short-range interatomic repul-
sion cancels this direct (attractive) contribution. Even
without any such assumptions, however, our main result
(Table I), applicable at Feshbach reonance, is interesting
from the angle of potential theory.
II. PERTURBATION EXPANSION
Treating the interaction (2) as a weak perturbation in
the Hamiltonian (1), the unperturbed energy E(0) is the
kinetic energy of a non-interacting Fermi gas,
E(0) = Nts(rs) ≡ N
. (5)
Here, kF =
and α3 = 4
. The corresponding ground
state |Φ0〉 is a Slater determinant of plane waves.
In terms of dimensionless coordinates xi = µri, the
Hamiltonian (1) can be written as
Ĥ = −1
∇2i − λ
|xi − xj |
, λ =
This suggests that the perturbation parameter is not re-
ally small at unitarity. For example, for the square-well
potential, the zero-energy single bound state occurs when
λ = π
(see sect. III). Nevertheless the low-order terms
can point to important information, even when the ex-
pansion is divergent [11]. In our problem, there are three
parameters, µ, v0, and rs. The unitarity condition re-
lates µ and v0, so two independent parameters are left.
One of these may be taken to be the small parameter
ζ = (µrs)
−1. The remaining free parameter v0 may be
chosen independently of ζ to fulfill the unitarity condi-
tion.
A. First order
Formally, the first-order correction,
E(1) = 〈Φ0|V̂int|Φ0〉, (7)
has a direct contribution U(rs, µ) = Nu(rs, µ) with
u(rs, µ) =
d3r′v
|r− r′|
(µrs)3
f2. (8)
Here, f2 =
dxx2f(x).
The other first-order contribution is the exchange en-
ergy Ex(rs, µ) = Nex(rs, µ), [4]
ex(rs, µ) = −
drj1(kF r)
2v(r). (9)
Here, j1(z) is a spherical Bessel function. Since v(r) is
short-range and kF is small in a dilute gas, we can use
the small-z expansion j1(z) =
+O(z3) to find
ex(rs, µ) =
(µrs)3
f2 +O(µrs)
−5. (10)
B. Second order
1. General expressions
Also the second-order correction,
E(2) = −
|〈Φn|V̂int|Φ0〉|2
En − E0
dir + e
, (11)
has a direct and an exchange contribution [12],
dir(rs, µ) = −
d3q f̃
d3k1 d
q · (q+ k1 − k2)
, (12)
e(2)ex (rs, µ) = +
d3q f̃
d3k1 d
|q+ k1 − k2|
q · (q+ k1 − k2)
. (13)
While v20(2M/~
2µ2) has the dimension energy, the inte-
gration variables q, k1, and k2 are dimensionless here.
The domain of the integral over d3k1 d
3k2 depends on q,
D : |k1|, |k2| < 1; |k1 + q|, |k2 − q| > 1. (14)
Furthermore, f̃(y) is a dimensionless transform of f(x),
f̃(y) =
dxx2 f(x) j0(yx)
dxx f(x) sin(yx). (15)
To recover Eqs. (8) and (9) of Ref. [12], put M = me,
v0 = −e2µ, and f(x) = 1x or f̃(y) =
, such that v(r) =
becomes the electronic Coulomb repulsion. (Note that
Ref. [12] uses Rydberg units, mee
4/2~2 = e2/2aB = 1.)
2. The limit µrs ≫ 1
For a dilute gas (small kF ) with short-range interaction
(large µ), Eqs. (12) and (13) can be evaluated in the limit
µ/kF ≡ αµrs ≫ 1 where α3 = 49π . Following Ref. [13],
we choose a number q1 such that 1 ≪ q1 ≪ µ/kF and
split the integrals over d3q into two parts,
d3q =
d3q +
d3q. (16)
In the first part with q < q1, we have q ≪ µ/kF and
|q + k1 − k2| ≪ µ/kF (note that |k1|, |k2| < 1 ≪ q1).
Therefore, we may expand f̃(y) = f2+O(y
2) in Eqs. (12)
and (13) and keep the leading term f2 only. The sum of
the two resulting q < q1 contributions reads
q<q1(rs, µ) = −
f22 ×
d3k1 d
q · (q+ k1 − k2)
. (17)
The number q1 can be chosen independently of µ/kF ≫
1, despite the condition 1 ≪ q1 ≪ µ/kF . Then, the
integral in Eq. (17) is a finite constant and we conclude
q<q1(rs, µ) = O(µrs)
−4. (18)
In the second part q > q1 ≫ 1 of the integral (16), we can
put q + k1 − k2 ≈ q, since |k1|, |k2| < 1. The resulting
contributions to Eqs. (12) and (13) add up to
q>q1(rs, µ) = −
where
d3k1 d
3k2 = (
)2 has been used. Now,
dy f̃(y)2 (20)
where y1 = kF q1/µ ≪ 1. If
dy f̃(y)2 in Eq. (20) did
not depend on y1, expression (19) did rigorously have the
order O(µrs)
−3. However, using the small-y expansion
f̃(y) = f2 +O(y
2), we have
dyf̃(y)2 = f22 y1 +O(y
Consequently, shifting the lower limit y1 of the integral
(20) to zero does not affect the leading-order contribution
to expression (19),
q>q1(rs, µ) = O(µrs)
−3. (21)
Therefore, the quantity (18) does not contribute to the
leading order of e
c = e
dir + e
ex which is purely due to
expression (19),
e(2)c (rs, µ) = −
(µrs)3
F +O(µrs)
−4 (22)
where F =
dyf̃(y)2.
III. DENSITY SCALING AT UNITARITY
If the perturbation expansion is convergent [12], the
total energy E(rs, µ) = Ne(rs, µ) of the gas can be ex-
pressed in the form
e(rs, µ) = ts(rs) + ex(rs, µ) +
e(n)c (rs, µ). (23)
At unitarity, when the relative Hamiltonian (3) has a
single bound state at zero energy, the exchange plus cor-
relation energy ex +
n=2 e
c should display the same
density scaling as the kinetic energy, ts(rs) ∝ r−2s ∝ ρ2/3.
This is obviously not the case with any one of the present
(leading-order) results (10) and (22). However, since the
exchange energy (10) and the second-order correlation
energy (22) have opposite signs, they can cancel each
other at some value of µ. This happens when
, (24)
where f2 =
dxx2 f(x) and F =
dyf̃(y)2. This
is the main result of our paper, and we check it by tak-
ing four different potentials. The results of this analysis,
summarized in Table I, are discussed in detail below.
Generally, we need an eigenfunction ψ(r) =
of the relative Hamiltonian (3) with eigenvalue zero.
Writing u(r) = φ(µr), the corresponding dimensionless
Schrödinger Equation reads
φ′′(x) = −λf(x)φ(x), λ ≡ Mv0
. (25)
Precisely, we wish to determine that particular value λuty
of λ for which this zero-energy solution is the only bound
state. Then, φ(x) must obey φ(0) = 0, φ′(x) < 0 for
x ≥ 0, and φ(x) → const. for x → ∞. In the following
examples (A-D), the solution φ(x) can be found analyti-
cally or numerically.
(A) Square-well potential of radius R0 = 1/µ:
v(r) = −v0Θ(R0 − r) , (26)
where Θ(z) denotes the Heavyside step function, Θ(z) =
1 for z > 0 and Θ(z) = 0 for z ≤ 0. By setting the
dimensionless variable µr = x, we see that f(x) = Θ(1−
x). The square-well potential (26) supports a single zero
energy bound state when the LHS of Eq.(24) is λuty =
π2/4. It may be easily checked analytically that for the
square-well potential (26), f2 =
and F = π
so that the
RHS of Eq.(24) is 5
, very close to its LHS, π2/4 = 2.47.
(B) Rosen-Morse hyperbolic potential [5]. This poten-
tial is given by
v(r) = −v0 sech2(µr) , (27)
which suppotrs a single zero energy bound state when
the LHS of Eq.(24) is λuty = 2 instead of π
2/4. For this
potential, it is easy to check that f2 = π
2/12. The quan-
tity F , however, has to be calculated numerically, and
is given by F = 0.596. Again, Eq.(24) is approximately
satisfied, since its RHS for this potential is 2.17.
(C) Delta-shell potential [14]. Consider the potential
v(r) = −η ~
δ(r −R0) ,
= −η ~
= −v0f(µr) . (28)
Thus, we have v0 = η
, µ = 1
, and f(x) = δ(x− 1).
So we get f2 = 1, f̃(y) =
sin y
, and F = π
. Hence the
RHS of Eq. (24) is unity. The LHS is (ηR0), which is
exactly unity when the s-state scattering length goes to
infinity [14]. Thus Eq.(24) is exactly obeyed in this case.
(D) Gaussian Potential.
v(r) = −v0 exp(−µ2r2) (29)
For this example, f(x) = exp(−x2) in Eq. (2). We find
π and F = 1
)3/2 so that the RHS of Eq. (24)
becomes πf2/2F = 2
3/2. Solving Eq. (25) numerically
for this f(x), we obtain a single bound state at zero
energy when the LHS of Eq. (24) is λuty = 0.949× 23/2,
close to 23/2.
Note, however, that contributions O(µrs)
−3 may also
come from higher order terms of the perturbation expan-
sion in section II, since that expansion is carried out with
respect to the parameter λ =Mv0/~
2µ2, but not 1/µrs.
IV. DISCUSSION
The dimensionless Hamiltonian ĥ from Eq. (6) depends
on the dimensionless paramaters
TABLE I: The moments f2 and F of four different profiles
f(x) for the potential (2). λuty is the value at unitarity of the
parameter λ in Eq. (25). At unitarity, the ratio Q of the LHS
of Eq. (24) to the RHS is always close to 1.
f(x) f2 F λuty Q
Θ(1− x) 1
0.987
sech(x)2 π
0.596 2 0.922
δ(1− x) 1 π
1 1.000
exp(−x2) 1
)3/2 2.684 0.949
and, not written explicitly, xs = µrs. The perturbation
expansion of the ground-state energy of ĥ reads
ε(xs, λ) =
εn(xs)λ
n. (31)
The ground-state energy of the original Hamiltonian Ĥ ,
with three independent parameters, is then given by
E(rs, µ, λ) =
ε(µrs, λ)
εn(µrs)λ
n. (32)
For µrs ≫ 1, we may expand
εn(µrs) =
(µrs)m
. (33)
From Eq. (5), we have ε02 = N
α−2 while ε0m = 0 for
m 6= 2. Eqs. (8) and (10) imply that ε1m = 0 for m < 3
and ε13 = N(− 32 +
)f2. Eventually, due to Eq. (22),
ε2m = 0 for m < 3 and ε23 = N(− 34π )2F .
So far as the unitary point is concerned, we are inter-
ested in a situation where kF |a| ≫ 1 ≫ kFR0 ∼ (µrs)−1.
In view of the fact that the perturbation series above does
not converge at unitarity, how significant is our low order
perturbation calculation in this situation ? Note that our
first order direct and exchange (potential) energy terms
given by Eqs. (8,10) are the same as those obtained in
the Hartree-Fock calculation (see, for example, Eq.(10)
of Heiselberg [4]). How big are these terms at unitarity
compared to the kinetic energy per particle ? Taking the
example of the square-well potential discussed earlier, it
is straight forward to show that our exchange term (10)
at Feshbach resonance is
ex(rs, µ) =
. (34)
For the square-well example,
(kF a) =
(µrs)
1− tan
. (35)
At unitarity, the RHS diverges for any finite value of
(µrs), how ever large. Even in the neighbourhood of
unitarity, it is possible to have (kF |a|) ≫ 1 for (µrs) ≫ 1.
From Eq.(34), we note that too large a choice for (µrs)
would make ex negligible against
EF . Instead, taking
a modestly large value, µrs = 3, we obtain the ration
of ex to kinetic energy per particle to be about 0.56.
Noting that ex has a different density-dependence than
the kinetic energy per particle, its cancellation with the
second order perturbative correlation term helps towards
scale invariance, but only if there is a mechanism for the
direct first order term to be cancelled.
We conclude by emphasizing that the new result in this
paper is displayed in Table 1, and should be of interest
from the point of view of potential theory.
The authors would like to thank Brandon van Zyl for
discussions. This research was financed by NSERC of
Canada.
[1] A.J. Leggett, in Modern Trends in the Theory of Con-
densed Matter, Springer-Verlag Lecture Notes, Vol. 115,
edited by A. Peklaski and J. Przystawa (Springer-Verlag,
Berlin, 1980), p.13
[2] C.A. Regal et al., Nature (London) 424, 47 (2003); M.W.
Zwierlein et al., Phys. Rev. Lett. 91, 250401 (2003); C.A.
Regal et al., Phys. Rev. Lett. 92, 040403 (2004); M.W.
Zwierlein et al., Nature (London) 435, 1046 (2005); G.
B. Partridge et al., Science 311, 503 (2006).
[3] S. Inouye et al., Nature (London) 392, 151 (1998); Ph.
Courteille et al., Phys. Rev. Lett 81, 69 (1998).
[4] G.A. Baker, Phys. Rev. C60, 054311 (1999); H. Heisel-
berg, Phys. Rev. A63, 043606 (2001); T.-L. Ho. Phys.
Rev. Lett. 92, 090402 (2004).
[5] J. Carlson, S.-Y. Chang, V. R. Pandharipande, and K. E.
Schmidt, Phys. Rev. Lett. 91, 050401 (2003); A. Perali,
P. Pieri, and G. C. Strinati, Phys. Rev. Lett. 93, 100404
(2004).
[6] M. Bartenstein et al., Phys. Rev. Lett. 92, 120401 (2004);
T. Bourdel et al., Phys. Rev. Lett. 93, 050401 (2004).
[7] A. Bulgac, J. E. Drut J.E., and P. Magierski, Phys. Rev.
Lett. 96, 090404 (2006).
[8] E. Burovski, N. Prokof’ev, B. Svistunov, and M. Troyer,
Phys. Rev. Lett. 96, 160402 (2006).
[9] T. Papenbrock, Phys. Rev. A72, 041603 (R) (2005);
A. Bhattacharyya and T. Papenbrock, Phys. Rev. A74,
041602 (R) (2006).
[10] R. G. Parr and W. Yang, Density-Functional Theory
of Atoms and Molecules (Oxford University Press, New
York, 1989); W. Kohn, Rev. Mod. Phys. 71, 1253 (1999).
[11] M. Seidl, J. P. Perdew, and S. Kurth, Phys. Rev. Lett.
84, 5070 (2000).
[12] M. Gell-Mann, K. A. Brueckner, Phys. Rev. 106, 364
(1957).
[13] L. Zecca, P. Gori-Giorgi, S. Moroni, and G. B. Bachelet,
Phys. Rev. B 70, 205 127 (2004).
[14] K. Gottfried, Quantum Mechanicsvol.I, (W. A. Ben-
jamin, Inc., New York, 1966). See sect. (15).
ABSTRACT
  We consider a dilute gas of neutral unpolarized fermionic atoms at zero
temperature.The atoms interact via a short range (tunable) attractive
interaction. We demonstrate analytically a curious property of the gas at
unitarity. Namely, the correlation energy of the gas, evaluated by second order
perturbation theory, has the same density dependence as the first order
exchange energy, and the two almost exactly cancel each other at Feshbach
resonance irrespective of the shape of the potential, provided $(\mu r_s) >>
1$. Here $(\mu)^{-1}$ is the range of the two-body potential, and $r_s$ is
defined through the number density $n=3/(4\pi r_s^3)$. The implications of this
result for universality is discussed.

<|endoftext|><|startoftext|>
Chemical Evolution
Francesca Matteucci
Department of Astronomy
University of Trieste
and Osservatorio Astronomico di Trieste (INAF)
Via G.B. Tiepolo, 11, 34124 Trieste
Italy
(matteucci@ts.astro.it)
http://arxiv.org/abs/0704.0770v1
Contents
1 Chemical Evolution page 1
1.1 Lecture I: basic assumptions and equations of chem-
ical evolution 1
1.1.1 The basic ingredients 1
1.1.2 The Star Formation Rate 2
1.1.3 The Initial Mass Function 3
1.1.4 The Infall Rate 4
1.1.5 The Outflow Rate 4
1.1.6 Stellar evolution and nucleosynthesis: the
stellar yields 5
1.1.7 Type Ia SN Progenitors 6
1.1.8 Yields per Stellar Generation 7
1.1.9 Analytical models 8
1.1.10 Numerical Models 9
1.2 Lecture II: the Milky Way and other spirals 11
1.2.1 The Galactic formation timescales 11
1.2.2 The two-infall model 12
1.2.3 Common Conclusions from MW Models 18
1.2.4 Abundance Gradients from Emission Lines 19
1.2.5 Abundance Gradients in External Galaxies 21
1.2.6 How to model the Hubble Sequence 21
1.2.7 Type Ia SN rates in different galaxies 24
1.2.8 Time-delay model for different galaxies 25
1.3 Lecture III: interpretation of abundances in dwarf
irregulars 27
1.3.1 Properties of Dwarf Irregular Galaxies 27
1.3.2 Galactic Winds 31
iv Contents
1.3.3 Results on DIG and BCG from purely
chemical models 32
1.3.4 Results from Chemo-Dynamical models: IZw18 34
1.4 Lecture IV: Elliptical galaxies-Quasars- ICM Enrich-
ment 38
1.4.1 Ellipticals 38
1.4.2 Chemical Properties 38
1.4.3 Scenarios for galaxy formation 39
1.4.4 Ellipticals-Quasars connection 41
1.4.5 The chemical evolution of QSOs 41
1.4.6 The chemical enrichment of the ICM 43
1.4.7 Conclusions on the enrichment of the ICM 46
References 48
Chemical Evolution
1.1 Lecture I: basic assumptions and equations of chemical
evolution
To build galaxy chemical evolution models one needs to elucidate a num-
ber of hypotheses and make assumptions on the basic ingredients.
1.1.1 The basic ingredients
• INITIAL CONDITIONS: whether the mass of gas out of which stars
will form is all present initially or it will be accreted later on. The
chemical composition of the initial gas (primordial or already enriched
by a pregalactic stellar generation).
• THE BIRTHRATE FUNCTION:
B(M, t) = ψ(t)ϕ(M) (1.1)
where:
ψ(t) = SFR (1.2)
is the star formation rate (SFR) and:
ϕ(M) = IMF (1.3)
is the initial mass function (IMF).
• STELLAR EVOLUTION AND NUCLEOSYNTHESIS: stellar yields,
yields per stellar generation
• SUPPLEMENTARY PARAMETERS : infall, outflow, radial flows.
2 Chemical Evolution
1.1.2 The Star Formation Rate
Here we will summarize the most common parametrizations for the SFR
in galaxies, as adopted by chemical evolution models:
• Constant in space and time and equal to the estimated present time
SFR. For example, for the local disk, the present time SFR is SFR=2-
5M⊙pc
−2Gyr−1 (Boissier& Prantzos, 1999).
• Exponentially decreasing:
SFR = νe−t/τ∗ (1.4)
with τ∗ = 5− 15 Gyr (Tosi, 1988). The quantity νis a parameter that
we call efficiency of SF since it represents the SFR per unit mass of
gas and is expressed in Gyr−1.
• The most used SFR is the Schmidt (1959) law, which assumes a de-
pendence on the gas density, in particular:
SFR = νσkgas (1.5)
where k = 1.4± 0.15, as suggested by a study of Kennicutt (1998)
of local star forming galaxies.
• Some variations of the Schmidt law with a dependence also on the
total mass have been suggested for example by Dopita & Ryder (1994).
This formulation takes into account the feedback mechanism acting
between supernovae ( SNe) and stellar winds injecting energy into the
interstellar medium (ISM) and the galactic potential well. In other
words, the SF process is regulated by the fact that in a region of
recent star formation the gas is too hot to form stars and it is easily
removed from that region. Before new stars could form the gas needs
to cool and collapse back into the star forming region and this process
depends on the potential well and therefore on the total mass density:
SFR = νσk1totσ
gas (1.6)
with k1 = 0.5 and k2 = 1.5.
• Kennicutt (1998) also suggested, as an alternative to the Schmidt law
to fit the data, the following relation:
SFR = 0.017Ωgasσgas ∝ R
−1σgas (1.7)
with Ωgas being the angular rotation speed of gas.
1.1 Lecture I: basic assumptions and equations of chemical evolution3
• Finally a SFR induced by spiral density waves was suggested by Wyse
& Silk (1989):
SFR = νV (R)R−1σ1.5gas (1.8)
with R being the galactocentric distance and V (R) the gas rotation
velocity.
1.1.3 The Initial Mass Function
The IMF is a probability function describing the distribution of stars
as a function of mass. The present day mass function is derived for
the stars in the solar vicinity by counting the Main Sequence stars as a
function of magnitude and then applying the mass-luminosty relation,
holding for Main Sequence stars, to derive the distribution of stars as
a function of mass. In order to derive the IMF one has then to make
assumptions on the past history of SF.
The derived IMF is normally approximated by a power law:
ϕ(M)dM = aM−(1+x)dM (1.9)
where ϕ(M) is the number of stars with masses in the interval M,
M+dM.
Salpeter (1955) proposed a one-slope IMF (x = 1.35) valid for stars
with M > 10M⊙. Multi-slope (x1, x2, ..) IMFs have been suggested
later on always for the solar vicinity (Scalo 1986,1998; Kroupa et al.
1993; Chabrier 2003). The IMF is generally normalized as:
∫ 100
Mϕ(M)dM = 1 (1.10)
where a is the normalization constant and the assumed interval of inte-
gration is 0.1− 100M⊙.
The IMF is generally considered constant in space and time with some
exceptions such as the IMF suggested by Larson (1998) with:
x = 1.35(1 +m/m1)
−1 (1.11)
where m1 is variable typical mass and is associated to the Jeans mass.
This IMF predicts then that m1 is a decreasing function of time.
4 Chemical Evolution
1.1.4 The Infall Rate
For the rate of gas accretion there are in the literature several parametriza-
tions:
• The infall rate is constant in space and time and equal to the present
time infall rate as measured in the Galaxy (∼ 1.0M⊙yr
• The infall rate is variable in space and time, and the most common
assumption is an exponential law (Chiosi 1980; Lacey & Fall 1985):
IR = A(R)e−t/τ(R) (1.12)
with τ(R) constant or varying with the galactocentric distance. The
parameter A(R) is derived by fitting the present day total surface
mass density, σtot(tG), at any specific galactocentric radius R.
• For the formation of the Milky Way two episodes of infall have been
suggested (Chiappini et al. 1997), where during the first infall episode
the stellar halo forms whereas during the second infall episode the
disk forms. This particular infall law gives a good representation of
the formation of the Milky Way. The proposed two-infall law is:
IR = A(R)e−t/τH(R) +B(R)e−(t−tmax)/τD(R) (1.13)
where τH(R) is the timescale for the formation of the halo which
can be costant or vary with galactocentric distance. The quantity
τD(R) is the timescale for the formation of the disk and is a function
of the galactocentric distance; in most of the models it is assumed to
increase with R (e.g. Matteucci & François, 1989).
• More recently, Prantzos (2003) suggested a gaussian law with a peak
at 0.1 Gyr and a FWHM of 0.04 Gyr for the formation of the stellar
halo.
1.1.5 The Outflow Rate
The so-called galactic winds occur when the thermal energy of the gas
in galaxies exceeds its potential energy. Generally, gas outflows are
called winds when the gas is lost forever from the galaxy. Only detailed
dynamical simulations can suggest whether there is a wind or just an
outflow of gas which will soon or later fall back again into the galaxy. In
chemical evolution models galactic winds can be sudden or continuous. If
they are sudden, the mass is assumed to be lost in a very short interval of
1.1 Lecture I: basic assumptions and equations of chemical evolution5
time and the galaxy is devoided from all the gas; if they are continuous,
one has to assume the rate of gas loss. Generally, in chemical evolution
models (Bradamante et al. 1998) and also in cosmological simulations
(Springel & Hernquist, 2003) it is assumed that the rate of gas loss is
several times the SFR:
W = −λSFR (1.14)
where λ is a free parameter with the meaning of wind efficiency. This
particular formulation for the galactic wind rate is confirmed by obser-
vational findings (see Martin, 1999).
1.1.6 Stellar evolution and nucleosynthesis: the stellar yields
Here we summarize the various contribution to the element production
by stars of all masses.
• Brown Dwarfs (M < ML, ML = 0.08 − 0.09M⊙) are objects which
never ignite H and their lifetimes are larger than the age of the Uni-
verse. They are contributing to lock up mass.
• Low mass stars (0.5 ≤ M/M⊙ ≤ MHeF ) (1.85-2.2M⊙) ignite He ex-
plosively but without destroying themselves and then become C-O
white dwarfs (WD). If M < 0.5M⊙ they become He WDs. Their
lifetimes range from several 109 years up to several Hubble times!
• Intermediate mass stars (MHeF ≤ M/M⊙ ≤ Mup) ignite He quies-
cently. The mass Mup is the limiting mass for the formation of a C-O
degenerate core and is in the range 5-9M⊙, depending on stellar evo-
lution calculations. Lifetimes are from several 107 to 109 years. They
die as C-O WDs if not in binary systems. If in binary systems they
can give rise to cataclysmic variables such as novae and Type Ia SNe.
• Massive stars (M > Mup). We distinguish here several cases:
-Mup ≤ M/M⊙ ≤ 10 − 12. Stars with Main Sequence masses in
this range end up as electron-capture SNe leaving neutron stars as
remnants. These SNe will appear as Type II SNe which show H in
their spectra.
-10 − 12 ≤ M/M⊙ ≤ MWR, (with MWR ∼ 20 − 40M⊙ being the
limiting mass for the formation of a Wolf-Rayet (WR) star). Stars in
this mass range end their life as core-collapse SNe (Type II) leaving a
neutron star or a black hole as remnants.
-MWR ≤ M/M⊙ ≤ 100. Stars in this mass range are probably
6 Chemical Evolution
exploding as Type Ib/c SNe which do not show H in their spectra.
Their lifetimes are of the order of ∼ 106 years.
• Very Massive Stars (M > 100M⊙), they should explode by means
of instability due to “pair creation” and they are called pair-creation
SNe. In fact, at T ∼ 2 · 109 K a large portion of the gravitational
energy goes into creation of pairs (e+, e−), the star becomes unstable
and explodes. They leave no remnants and their lifetimes are < 106
years. Probably these very massive stars formed only when the metal
content was almost zero (Population III stars, Schneider et al. 2004).
All the elements with mass number A from 12 to 60 have been formed
in stars during the quiescent burnings. Stars transform H into He and
then He into heaviers until the Fe-peak elements, where the binding
energy per nucleon reaches a maximum and the nuclear fusion reactions
stop.
H is transformed into He through the proton-proton chain or the CNO-
cycle, then 4He is transformed into 12C through the triple- α reaction.
Elements heavier than 12C are then produced by synthesis of α-
particles. They are called α-elements (O, Ne, Mg, Si and others).
The last main burning in stars is the 28Si -burning which produces
56Ni which then decays into 56Co and 56Fe. Si-burning can be quiescent
or explosive (depending on the temperature).
Explosive nucleosynthesis occurring during SN explosions mainly pro-
duces Fe-peak elements. Elements originating from s- and r-processes
(with A> 60 up to Th and U) are formed by means of slow or rapid (rel-
ative to the β- decay) neutron capture by Fe seed nuclei; s-processing
occurs during quiescent He-burning whereas r-processing occurs during
SN explosions.
1.1.7 Type Ia SN Progenitors
The Type Ia SNe, which do not show H in their spectra, are believed to
originate from WDs in binary systems and to be the major producers of
Fe in the Universe. The model proposed are basically two:
• Single Degenerate Scenario (SDS), with a WD plus a Main Se-
quence or Red Giant star, as originally suggested by Whelan and Iben
(1973). The explosion (C-deflagration) occurs when the C-O WD
reaches the Chandrasekhar mass, MCh =∼ 1.44M⊙, after accreting
material from thecompanion. In this model the clock to the explosion
is given by the lifetime of the companion of the WD (namely the less
1.1 Lecture I: basic assumptions and equations of chemical evolution7
massive star in the system). It is interesting to define the minimum
timescale for the explosion which is given by the lifetime of a 8M⊙
star, namely tSNIamin=0.03 Gyr (Greggio and Renzini 1983). Recent
observations in radio-galaxies by Mannucci et al. (2005;2006) seem to
confirm the existence of such prompt Type Ia SNe.
• Double Degenerate Scenario (DDS), where the merging of two C-
OWDs of mass∼ 0.7M⊙, due to loss of angular momentum as a conse-
quence of gravitational wave radiation, produces C-deflagration (Iben
and Tutukov 1984). In this case the clock to the explosion is given
by the lifetime of the secondary star, as above, plus the gravitational
time delay, namely the time necessary for the two WDs to merge. The
minimum time for the explosion is tSNIamin = 0.03+∆tgrav=0.04 Gyr
(see Tornambè 1989).
Some variations of the above scenarios have been proposed such as
the model by Hachisu et al. (1996; 1999), which is based on the sin-
gle degenerate scenario where a wind from the WD is considered. Such
a wind stabilizes the accretion from the companion and introduces a
metallicity effect. In particular, the wind, necessary to this model, oc-
curs only if the systems have metallicity ([Fe/H]< −1.0). This implies
that the minimum time for the explosion is larger than in the previous
cases. In particular, tSNIamin = 0.33 Gyr, which is the lifetime of the
more massive secondary considered (2.3M⊙) plus the metallicity delay
which depends on the assumed chemical evolution model.
1.1.8 Yields per Stellar Generation
Under the assumption of Instantaneous Recycling Approximation (IRA)
which states that all stars more massive than 1M⊙ die immediately,
whereas all stars with masses lower than 1M⊙ live forever, one can
define the yield per stellar generation (Tinsley, 1980);
mpimϕ(m)dm (1.15)
where pim is the stellar yield of the element i, namely the newly formed
and ejected element i by a star of mass m.
The quantity R is the so-called Returned Fraction:
(m−Mrem)ϕ(m)dm (1.16)
8 Chemical Evolution
and is the total mass of gas restored into the ISM by an entire stellar
generation.
1.1.9 Analytical models
The Simple Model for the chemical evolution of the solar neighbourhood
is the simplest approach to model chemical evolution. The solar neigh-
bourhood is assumed to be a cylinder of 1 Kpc radius centered around
the Sun.
The basic assumptions of the Simple Model are:
- the system is one-zone and closed, no inflows or outflows with the
total mass present since the beginning,
- the initial gas is primordial (no metals),
- instantaneous recycling approximation holds,
- the IMF, ϕ(m), is assumed to be constant in time,
- the gas is well mixed at any time (IMA)
The Simple Model fails in describing the evolution of the Milky Way
(G-dwarf metallicity distribution, elements produced on long timescales
and abundance ratios) and the reason is that at least two of the above
assumptions are manifestly wrong, epecially if one intends to model the
evolution of the abundance of elements produced on long timescales,
such as Fe. In particular the assumptions of the closed boxiness and the
However, it is interesting to know the solution of the Simple Model
and its implications. Be Xi the abundance by mass of an element i.
If Xi << 1, which is generally true for metals, we obtain the solution
of the Simple Model. This solution is obtained analytically by ignoring
the stellar lifetimes:
Xi = yiln(
) (1.17)
where µ =Mgas/Mtot and yi is the yield per stellar generation, as defined
above, otherwise called effective yield. In particular, the effective yield
is defined as:
yieff =
ln(1/G)
(1.18)
namely the yield that the system would have if behaving as the simple
closed-box model. This means that if yieff > yi, then the actual system
has attained a higher abundance for the element i at a given gas fraction
G. Generally, in the IRA, we can assume:
1.1 Lecture I: basic assumptions and equations of chemical evolution9
(1.19)
which means that the ratio of two element abundances are always
equal to the ratio of their yields. This is no more true when IRA is
relaxed. In fact, relaxing IRA is necessary to study in detail the evolution
of the abundances of single elements.
One can obtain analytical solutions also in presence of infall and/or
outflow but the necessary condition is to assume IRA. Matteucci &
Chiosi (1983) found solutions for models with outflow and infall and
Matteucci (2001) found it for a model with infall and outflow acting at
the same time. The main assumption in the model with outflow but no
infall is that the outflow rate is:
W (t) = λ(1−R)ψ(t) (1.20)
where λ ≥ 0 is the wind parameter.
The solution of this model is:
(1 + λ)
ln[(1 + λ)G−1 − λ] (1.21)
for λ = 0 the equation becomes the one of the Simple Model (1.17).
The solution of the equation of metals for a model without wind but
with a primordial infalling material (XAi = 0) at a rate:
A(t) = Λ(1−R)ψ(t) (1.22)
and Λ 6= 1 is :
[1− (Λ − (Λ− 1)G−1)−Λ/(1−Λ)] (1.23)
For Λ = 1 one obtains the well known case of extreme infall studied by
Larson (1972) whose solution is:
Xi = yi[1− e
−(G−1−1)] (1.24)
This extreme infall solution shows that when G→ 0 then Xi → yi.
1.1.10 Numerical Models
Numerical models relax IRA and close boxiness but generally retain the
constancy of ϕ(m) and the IMA.
10 Chemical Evolution
If Gi is the mass fraction of gas in the form of an element i, we can
write:
Ġi(t) = −ψ(t)Xi(t)
∫ MBm
ψ(t− τm)Qmi(t− τm)ϕ(m)dm
∫ MBM
∫ 0.5
f(γ)ψ(t− τm2)Qmi(t− τm2)dγ]dm
∫ MBM
ψ(t− τm)Qmi(t− τm)ϕ(m)dm
ψ(t− τm)Qmi(t− τm)ϕ(m)dm
+XAiA(t)−Xi(t)W (t) (1.25)
where B=1-A, A=0.05-0.09. The meaning of the A parameter is the
fraction in the IMF of binary systems with those specific features re-
quired to give rise to Type Ia SNe, whereas B is the fraction of all the
single stars and binary systems in the same mass range of definition of
the progenitors of Type Ia SNe. The values of A indicated above are cor-
rect for the evolution of the solar vicinity where an IMF of Scalo (1986,
1989) or Kroupa et al.(1993) is adopted. If one adopts a flatter IMF such
as the Salpeter (1955) one then A is different. In the above equations
the contribution of Type Ia SNe is contained in the third term on the
right hand side. The integral is made over a range of masses going from
3 to 16 M⊙ which represents the total masses of binary systems able to
produce Type Ia SNe in the framework of the SDS. There is also an inte-
gration over the mass distribution of binary systems; in particular, one
considers the function f(γ) where γ = M2
M1+M2
, with M1 and M2 being
the primary and secondary mass of the binary system, respectively (for
more details see Matteucci & Greggio 1986 and Matteucci 2001).The
functions A(t) and W(t) are the infall and wind rate, respectively. Fi-
nally, the quantity Qmi represents the stellar yields (both processed and
unprocessed material).
1.2 Lecture II: the Milky Way and other spirals 11
1.2 Lecture II: the Milky Way and other spirals
The Milky Way galaxy has four main stellar populations: 1) the halo
stars with low metallicities (the most common metallicity indicator in
stars is [Fe/H]= log(Fe/H)∗− log(Fe/H)⊙) and eccentric orbits, 2) the
bulge population with a large range of metallicities and is dominated
by random motions, 3) the thin disk stars with an average metallicity
< [Fe/H ] >=-0.5 dex and circular orbits, and finally 4) the thick stars
which possess chemical and kinematical properties intermediate between
those of the halo and those of the thin disk. The halo stars have average
metallicities of < [Fe/H ] >=-1.5 dex and a maximum metallicity of
∼ −1.0 dex although stars with [Fe/H] as high as -0.6 dex and halo
kinematics are observed. The average metallicity of thin disk stars is
∼ −0.6 dex, whereas the one of Bulge stars is ∼ −0.2 dex.
1.2.1 The Galactic formation timescales
The kinematical and chemical properties of the different Galactic stel-
lar populations can be interpreted in terms of the Galaxy formation
mechanism. Eggen et al. (1962) in a cornerstone paper suggested a
rapid collapse for the formation of the Galaxy lasting ∼ 3 · 108 years.
This suggestion was based on a kinematical and chemical study of so-
lar neighbourhood stars. Later on, Searle & Zinn (1979) proposed a
central collapse like the one proposed by Eggen et al. but also that the
outer halo formed by merging of large fragments taking place over a con-
siderable timescale > 1 Gyr. More recently, Berman & Suchov (1991)
proposed the so-called hot Galaxy picture, with an initial strong burst
of SF which inhibited further SF for few Gyr while a strong Galactic
wind was created.
From an historical point of view, the modelization of the Galactic
chemical evolution has passed through different phases that I summarize
in the following.
• SERIAL FORMATION
The Galaxy is modeled by means of one accretion episode lasting
for the entire Galactic lifetime, where halo, thick and thin disk form in
sequence as a continuous process. The obvious limit of this approach
is that it does not allow us to predict the observed overlapping in
metallicity between halo and thick disk stars and between thick and
thin disk stars, but it gives a fair representation of our Galaxy (e.g.
Matteucci & François 1989).
12 Chemical Evolution
• PARALLEL FORMATION
In this formulation, the various Galactic components start at the
same time and from the same gas but evolve at different rates (e.g.
Pardi et al. 1995). It predicts overlapping of stars belonging to the
different components but implies that the thick disk formed out of gas
shed by the halo and that the thin disk formed out of gas shed by the
thick disk, and this is at variance with the distribution of the stellar
angular momentum per unit mass (Wyse & Gilmore 1992), which
indicates that the disk did not form out of gas shed by the halo.
• TWO-INFALL FORMATION
In this scenario, halo and disk formed out of two separate infall
episodes (overlapping in metallicity is also predicted) (e.g. Chiappini
et al. 1997; Chang et al. 1999). The first infall episode lasted no more
than 1-2 Gyr whereas the second, where the thin disk formed, lasted
much longer with a timescale for the formation of the solar vicinity of
6-8 Gyr (Chiappini et al. 1997; Boissier& Prantzos 1999).
• STOCHASTIC APPROACH
Here the hypothesis is that in the early halo phases ([Fe/H] < −3.0
dex), mixing was not efficient and, as a consequence, one should ob-
serve in low metallicity halo stars the effects of pollution from single
SNe (e.g. Tsujimoto et al. 1999; Argast et al. 2000; Oey 2000). These
models predict a large spread for [Fe/H] < −3.0dex which is not ob-
served, as shown by recent data with metallicities down to -4.0 dex
(Cayrel et al. 2004; see later).
1.2.2 The two-infall model
The adopted SFR (see Figure 2.1) is eq.(1.6) with different SF efficiencies
for the halo and disk, in particular νH = 2.0Gyr
−1, νD = 1.0Gyr
respectively. A threshold density (σth = 7M⊙pc
−2) for the SFR is also
assumed in agreement with results from Kennicutt (1989; 1998).
In Figure 2.2 we show the predicted SN (II and Ia) rates by the two-
infall model. Note that the Type Ia SN rate is calculated according to
the SDS (Greggio & Renzini, 1983; Matteucci & Recchi, 2001). There
is a delay between the Type II SN rate and the Type Ia SN rate, and
while the Type II SN rate strictly follows the SFR, the Type Ia SN rate
is smoothly increasing.
François et al. (2004) compared the predictions of the two-infall model
for the abundance ratios versus metallicity relations ([X/Fe] vs. [Fe/H]),
with the very recent and very accurate data of the project “First Stars”
1.2 Lecture II: the Milky Way and other spirals 13
Fig. 1.1. The predicted SFR in the solar vicinity with the two-infall model.
Figure from Chiappini et al. (1997). The oscillating behaviour at late times
is due to the assumed threshold density for SF. The threshold gas density is
also responsible for the gap in the SFR seen at around 1 Gyr.
by Cayrel et al. (2004). They adopted yields from the literature both for
Type II and Type Ia SNe and noticed that while for some elements (O,
Fe, Si, Ca) the yields of Woosley & Weaver (1995) (hereafter WW95)
reproduce the data fairly well, for the Fe-peak elements and heaviers
none of the available yields give a good agreement. Therefore, they
varied empirically the yields of these elements in order to best fit the
data. In Figures 2.3 and 2.4 we show the predictions for α-elements (O,
Mg, Si, Ca, Ti, K) plus some Fe-peak elements and Zn.
In Figure 2.4 we show also the ratios between the yields derived em-
pirically by François et al. (2004) in order to obtain the excellent fits
shown in the figures, and those of WW95 for massive stars. For some
elements it was necessary to change also the yields from Type Ia SNe
relative to the reference ones which are those of Iwamoto et al. (1999)
(hereafter I99).
In Figure 2.5 we show the predictions of chemical evolution models for
12C and 14N compared with abundance data. The behaviour of C shows
a roughly constant [C/Fe] as a function of [Fe/H], although C seems to
14 Chemical Evolution
Fig. 1.2. The predicted Type II and Ia SN rate in the solar vicinity with the
two-infall model. Figure from Chiappini et al. (1997)
slightly increase at very low metallicities, indicating that the bulk of
these two elements comes from stars with the same lifetimes. The data
in these figures, especially those for N are old and do not contain very
metal poor stars. Newer data containing stars with [Fe/H] down to ∼
-4.0 dex (Spite et al. 2005; Israelian et al. 2004) indicate that the [N/Fe]
ratio continues to be high also at low metallicities, indicating a primary
origin for N produced in massive stars. We recall here that we define
primary a chemical elements which is produced in the stars starting
from the H and He, whereas we define secondary a chemical element
which is formed from heavy elements already present in the star at its
birth and not produced in situ. The model predictions shown in Figure
2.5 for C and N assume that the bulk of these elements is produced
by low and intermediate mass stars (yields from van den Hoeck and
Groenewegen, 1997) and that N is produced as a partly secondary and
partly primary element. The N production from massive stars has only
a secondary origin (yields from WW95). In Figure 2.5 we show also a
model prediction where N is considered as a primary element in massive
stars with the yields artificially increased. Recently, Chiappini et al.
1.2 Lecture II: the Milky Way and other spirals 15
Fig. 1.3. Predicted and observed [X/Fe] vs. [Fe/H] for several α- and Fe-peak-
elements plus Zn compared with a compilation of data. In particular the black
dots are the recent high resolution data from Cayrel et al. (2004). For the
other data see references in François et al. (2004). The solar value indicated in
the upper right part of each figure represents the predicted solar value for the
ratio [X/Fe]. The assumed solar abundances are those of Grevesse & Sauval
(1998) except that for oxygen for which we take the value of Holweger (2001).
16 Chemical Evolution
Fig. 1.4. Upper panel: predicted and observed [X/Fe] vs. [Fe/H] for several
elements as in Figure 2.3. In the bottom part of this Figure are shown the
ratios between the empirical yields and the yields by WW95 for massive stars.
Such empirical yields have been suggested by François et al. (2004) in order
to fit at best all the [X/Fe] vs. [Fe/H] relations. In the small panel at the
bottom right side are shown also the ratios between the empirical yields for
Type Ia SNe and the yields by I99.
1.2 Lecture II: the Milky Way and other spirals 17
Fig. 1.5. Upper panel: predicted and observed [C/Fe] vs. [Fe/H]. Models
from Chiappini et al. (2003a). Lower panel, predicted and observed [N/Fe]
vs. [Fe/H]. For references to the data see original paper.The thin and thick
continuous lines in both panels represent models with standard nucleosynthe-
sis, as described in the text, whereas the dashed line represents the predictions
of a model where N in massive stars has been considered as a primary element
with “ad hoc” stellar yields.
(2006) have shown that primary N produced by very metal poor fastly
rotating massive stars can well reproduce the observations.
In summary, the comparison between model predictions and abun-
dance data indicate the following scenario for the formation of heavy
elements:
• 12C and 14N are mainly produced in low and intermediate mass stars
(0.8 ≤ M/M⊙ ≤ 8). The amounts of primary and secondary N is
still uncertain and also the fraction of C produced in massive stars.
Primary N from massive stars seems to be required to reproduce the
N abundance in low metallicity halo stars.
• α-elements originate in massive stars: the nucleosynthesis of O is
rather well understood (there is agreement between different authors),
the yields from WW95 as functions of metallicity produce an excellent
agreement with the observations for this particular element.
18 Chemical Evolution
• Magnesium is generally underproduced by nucleosynthesis models.
Taking the yields of WW95 as a reference, the Mg yields should be
increased in stars with masses M ≤ 20M⊙ and decreased in stars
with M > 20M⊙ to fit the data. Silicon should be slightly increased
in stars with masses M > 40M⊙.
• Fe originates mostly in Type Ia SNe. The Fe yields in massive stars
are still uncertain, WW95 metallicity dependent yields overestimate
Fe in stars < 30M⊙. For this element, it is better to adopt the yields
of WW95 for solar metallicity.
• Fe-peak elements: the yields of Cr, Mn should be increased in stars
of 10-20 M⊙ relative to the yields of WW95, whereas the yield of
Co should be increased in Type Ia SNe, relative to the yields of I99,
and decreased in stars in the range 10-20M⊙, relative to the yields of
WW95. Finally, the yield of Ni should be decreased in Type Ia SNe.
• The yields of Cu and Zn from Type Ia SNe should be larger, relative to
the standard yields, as already suggested by Matteucci et al. (1993).
1.2.3 Common Conclusions from MW Models
Most of the chemical evolution models for the Milky Way existing in the
literature conclude that:
• The G-dwarf metallicity distribution can be reproduced only by as-
suming a slow formation of the local disk by infall. In particular, the
time-scale for the formation of the local disk should be in the range
τd ∼ 6 − 8 Gyr (Chiappini et al. 1997; Boissier and Prantzos 1999;
Chang et al. 1999; Chiappini et al. 2001; Alibès et al. 2001).
• The relative abundance ratios [X/Fe] vs. [Fe/H], interpreted as time-
delay between Type Ia and II SNe, suggest a timescale for the halo-
thick disk formation of τh ∼ 1.5-2.0 Gyr (Matteucci and Greggio 1986;
Matteucci and François, 1989; Chiappini et al. 1997). The external
halo and thick disk probably formed more slowly or have been accreted
(Chiappini et al. 2001).
• To fit abundance gradients, SFR and gas distribution along the Galac-
tic thin disk we must assume that the disk formed inside-out (Mat-
teucci & François, 1989; Chiappini et al. 2001; Boissier & Prantzos
1999; Alibés et al. 2001). Radial flows can help in forming the gra-
dients (Portinari & Chiosi 2000) but they are probably not the main
cause for them. A variable IMF along the Disk can in principle ex-
plain abundance gradients but it creates unrealistic situations: in fact,
1.2 Lecture II: the Milky Way and other spirals 19
in order to reproduce the negative gradients one should assume that
in the external and less metal rich parts of the Disk low mass stars
form preferentially (see Chiappini et al. 2000 for a discussion on this
point).
• The SFR is a strongly varying function of the galactocentric distance
(Matteucci & François 1989; Chiappini et al, 1997,2001; Goswami &
Prantzos 2000; Alibés et al. 2001).
1.2.4 Abundance Gradients from Emission Lines
There are two types of abundance determinations in HII regions: one is
based on recombination lines which should have a weak temperature de-
pendence of the nebula (He, C, N, O), the other is based on collisionally
excited lines where a strong dependence is intrinsic to the method (C, N,
O, Ne, Si, S, Cl, Ar, Fe and Ni). This second method has predominated
until now. A direct determination of the abundance gradients from HII
regions in the Galaxy from optical lines is difficult because of extinction,
so usually the abundances for distances larger than 3 Kpc from the Sun
are obtained from radio and infrared emission lines.
Abundance gradients can also be derived from optical emission lines
in Planetary Nebulae (PNe). However, the abundances of He, C and N
in PNe are giving only information on the internal nucleosynthesis of the
star. So, to derive gradients one should look at the abundances of O, S
and Ne, unaffected by stellar processes. In Figure 2.6 we show theoretical
predictions of abundance gradients along the disk of the Milky Way
compared with data from HII regions and B stars. The adopted model
is from Chiappini et al. (2001; 2003a) and is based on an inside-out
formation of the thin disk with the inner regions forming faster than
the outer ones, in particular τ(R) = 0.875R − 0.75 Gyr. Note that to
obtain a better fit for 12C, the yields of this element have been increased
artificially relative to those of WW95.
As already said, most of the models agree on the inside-out scenario
for the Disk formation, however not all models agree on the evolution of
the gradients with time. In fact, some models predict a flattening with
time (Boissier and Prantzos 1998; Alibès et al. 2001), whereas others
such as that of Chiappini et al. (2001) predict a steepening. The reason
for the steepening is that in the model of Chiappini et al. is included a
threshold density for SF,, which induces the SF to stop when the density
decreases below the threshold. This effect is particularly strong in the
external regions of the Disk, thus contributing to a slower evolution and
20 Chemical Evolution
0 5 10 15 20
0 5 10 15 20
0 5 10 15 20
0 5 10 15 20
0 5 10 15 20
0 5 10 15 20
0 5 10 15 20
0 5 10 15 20
Fig. 1.6. Upper panel: abundance gradients along the Disk of the MW. The
lines are the models from Chiappini et al. (2003a): these models differ by the
nucleosynthesis prescriptions. In particular, the dash-dotted line represents
a model with van den Hoeck & Groenewegen (1997, hereafter HG97) yields
for low-intermediate mass stars with η (mass loss parameter) constant and
Thielemann et al.’s (1996) yields for massive stars, the long- dashed thick line
has HG97 yields with variable η and Thielemann et al. yields, the long-dashed
thin line has HG97 yields with variable η but WW95 yields for massive stars.
It is interesting to note that in all of these models the yields of 12C in stars
> 40M⊙ have been artificially increased by a factor of 3 relative to the yields
of WW95. Lower panel: the temporal behaviour of abundance gradients along
the Disk as predicted by the best model of Chiappini et al. (2001). The upper
lines in each panel represent the present time gradient, whereas the lower ones
represent the gradient a few Gyr ago. It is clear that the gradients tend to
stepeen in time, a still controversial result.
1.2 Lecture II: the Milky Way and other spirals 21
therefore to a steepening of the gradients with time, as shown in Figure
2.6, bottom panel.
1.2.5 Abundance Gradients in External Galaxies
Abundance gradients expressed in dex/Kpc are found to be steeper in
smaller disks but the correlation disappears if they are expressed in
dex/Rd, which means that there is a universal slope per unit scale length
(ref). The gradients are generally flatter in galaxies with central bars
(ref). The SFR is measured mainly from Hα emission (Kennicutt, 1998)
and show a correlation with the total surface gas density (HI+H2), in
particular the suggested law is that of eq. (1.5).
In the observed gas distributions differences between field and clus-
ter spirals are found in the sense that cluster spirals have less gas,
probably as a consequence of stronger interactions with the environ-
ment.Integrated colors of spiral galaxies (Josey & Arimoto 1992; Jimenez
et al. 1998; Prantzos & Boissier 2000) indicate inside-out formation, as
also found for the milky Way.
As an example of abundance gradients in a spiral galaxy we show
in Figure 2.7 the observed and predicted gas distribution and abun-
dance gradients for the disk of M101. In this case the gas distribu-
tion and the abundance gradients are reproduced with systematically
smaller timescales for the disk formation relative to the MW (M101
formed faster), and the difference between the timescales of formation
of the internal and external regions is smaller (τM101 = 0.75R−0.5 Gyr,
Chiappini et al. 2003a)
To conclude this section we like to recall a paper by Boissier et al.
(2001) where a detailed study of the properties of disks is presented.
They conclude that more massive disks are redder, more metal rich and
more gas-poor than smaller ones. On the other hand their estimated SF
efficiency (defined as the SFR per unit mass of gas) seem to be similar
among different spirals: this leads them to conclude that more massive
disks are older than less massive ones.
1.2.6 How to model the Hubble Sequence
The Hubble Sequence can be simply thought as a sequence of objects
where the SFR proceeds faster in the early than in the late types (see
also Sandage, 1986).
We take the Milky Way galaxy, whose properties are best known, as a
22 Chemical Evolution
Fig. 1.7. Upper panel: predicted and observed gas distribution along the disk
of M101. The observed HI, H2 and total gas are indicated in the Figure. The
large open circles indicate the models: in particular, the open circles connected
by a continuous line refer to a model with central surface mass density of
1000M⊙pc
−2, while the dotted line refers to a model with 800M⊙pc
−2 and the
dashed to a model with 600M⊙pc
−2. Lower panel: predicted and observed
abundance gradients of C,N,O elements along the disk of M101.The models
are the lines and differ for a different threshold density for SF, being larger in
the dashed model. All the models are by Chiappini et al. (2003a).
1.2 Lecture II: the Milky Way and other spirals 23
reference galaxy and we change the SFR relatively to the Galactic one,
for which we adopt eq. (1.6). The quantity ν in eq. (1.6) is the efficiency
of SF which we assume to be characteristic of each Hubble type. In the
two-infall model for the Milky Way we adopt νhalo = 2.0Gyr
−1 and
νdisk = 1.0Gyr
−1 (see Figure 2.1). The choice of adopting a dependence
on the total surface mass density for the Galactic disk is due to the fact
that it helps in producing a SFR strongly varying with the galactocentric
distance, as required by the observed SFR and gas density distribution
as well as by the abundance gradients. In fact, the inside-out scenario
influences the rate at which the gas mass is accumulated by infall at
each galactocentric distance and this in turn influences the SFR.
For bulges and ellipticals we assume that the SF proceeds like in a
burst with very high star formation efficiency, namely:
SFR = νσk (1.26)
with k = 1.0 for the sake of simplicity; ν = 10 − 20Gyr−1 (see Mat-
teucci, 1994; Pipino & Matteucci 2004).
For irregular galaxies, on the other hand, we assume that the SFR
proceeds more slowly and less efficiently that in the Milky Way disk,
in particular we assume the same SF law as for spheroids but with
0.01 ≤ ν(Gyr−1) ≤ 0.1. Among irregular galaxies, a special position
is taken by the Blue Compact Galaxies (BCG) namely galaxies which
have blue colors as a consequence of the fact that they are forming stars
at the present time, have small masses, large amounts of gas and low
metallicities. For these galaxies, we assume that they suffered on average
from 1 to 7 short bursts, with the SF efficiency mentioned above (see
Bradamante et al. 1998 and next Lecture).
Finally, dwarf spheroidals are also a special cathegory, characterized
by old stars, no gas and low metallicities. For these galaxies we assume
that they suffered one long starburst lasting 7-8 Gyr or at maximum a
couple of extended SF periods, in agreement with their measued Color-
Magnitude diagram. It is worth noting that both ellipticals and dwarf
spheroidals should loose most of their gas and therefore one may con-
clude that galactic winds should play an important role in their evolu-
tion, although ram pressure stripping cannot be excluded as a mecha-
nism for gas removal. Also for these galaxies we assume the previous
SF law with k = 1 and ν = 0.01− 1.0Gyr−1. Lanfranchi & Matteucci,
(2003, 2004) developed more detailed models for dwarf spheroidals by
adopting the SF history suggested by the Color-Magnitude diagrams of
24 Chemical Evolution
Fig. 1.8. Predicted SFRs in galaxies of different morphological type. Figure
from Calura (2004). Note that for the elliptical galaxy the SF stops abruptly
as a consequence of the galactic wind.
single galaxies and with the same efficiency of SF as above. In Figure
2.8 we show the adopted SFRs in different galaxies and in Figure 2.9
the corresponding predicted Type Ia SN rates. For the irregular galaxy,
the predicted Type Ia SN rate refers to a specific galaxy, LMC, with a
SFR taken from observations (see Calura et al. 2003) with an early ans
a late burts of SF and low SF in between.
1.2.7 Type Ia SN rates in different galaxies
Following Matteucci & Recchi (2001) we define the typical timescale
for Type Ia SN enrichment as the time when the SN rate reaches the
maximum. In the following we will always adopt the SDS for the pro-
genitors of Type Ia SNe. A point that is not often understood is that
this timescale depends upon the progenitor lifetimes, IMF and SFR and
therefore is not universal. Sometimes in the literature the typical Type
Ia SN timescale is quoted as being universal and equal to 1 Gyr, whereas
this is just the timescale at which the Type Ia SNe start to be important
in the process of Fe enrichment in the solar vicinity.
Matteucci & Recchi (2001) showed that for an elliptical galaxy or a
bulge of spiral with a high SFR the timescale for Type Ia SN enrichment
1.2 Lecture II: the Milky Way and other spirals 25
Fig. 1.9. Predicted Type Ia SN rates for the SFRs of Figure 2.8. Figure from
Calura (2004). Note that for the irregular galaxy here the predictions are for
the LMC, where a recent SF burst is assumed.
is quite short, in particular tSNIa = 0.3 − 0.5 Gyr. For a spiral like
the Milky Way, in the two-infall model, a first peak is reached at 1.0-
1.5 Gyr (the time at which SNeIa become important as Fe producers
(Matteucci and Greggio 1986) while a second less important peak occurs
at tSNIa = 4 − 5 Gyr. For an irregular galaxy with a continuous but
very low SFR the timescale is tSNIa > 5 Gyr.
1.2.8 Time-delay model for different galaxies
As we have already seen, the time-delay between the production of oxy-
gen by Type II SNe and that of Fe by Type Ia SNe allows us to explain
the [X/Fe] vs. [Fe/H] relations in an elegant way. However, the [X/Fe]
vs. [Fe/H] plots depend not only on nucleosynthesis and IMF but also
on other model assumptions, such as the SFR, through the absolute Fe
abundance ([Fe/H]). Therefore, we should expect a different behaviour
in galaxies with different SF histories. In Figure 2.10 we show the pre-
dictions of the time-delay model for a spheroid like the Bulge, for the
solar vicinity and for a typical irregular magellanic galaxy.
As one can see in this Figure, we predict a long plateau, well above
the solar value, for the [α/Fe] ratios in the Bulge (and ellipticals), owing
26 Chemical Evolution
LMC (Hill et al. 2000)
DLA (Vladilo 2002)
Fig. 1.10. Predicted [α/Fe] ratios in galaxies with different SF histories. The
top line represents the predictions for the Bulge or for an elliptical galaxy
of the same mass (∼ 1010M⊙), the median line represents the prediction for
the solar vicinity and the lower line the prediction for an irregular magellanic
galaxy. The differences among the various models are in the efficiency of star
formation, being quite high for spheroids (ν = 20Gyr−1), moderate for the
Milky Way (ν = 1 − 2Gyr−1) and low for irregular galaxies (ν = 0.1Gyr−1).
The nucleosynthesis prescriptions are the same in all objects. The time-delay
between the production of α-elements and Fe, coupled with the different SF
histories produces the differences in the plots. Data for Damped-Lyman-α
systems, LMC and Bulge are shown for comparison.
to the fast Fe enrichment reached in these systems by means of Type II
SNe: when the Type Ia SNe start enriching substantially the ISM, at
0.3-0.5 Gyr, the gas Fe abundance is already solar. The opposite occurs
in Irregulars where the Fe enrichment proceeds very slowly so that when
Type Ia SNe start restoring the Fe in a substantial way (> 3 Gyr) the Fe
in the gas is still well below solar. Therefore, here we observe a steeper
1.3 Lecture III: interpretation of abundances in dwarf irregulars 27
slope for the [α/Fe] ratio. In other words, we have below solar [α/Fe]
ratios at below solar [Fe/H] ratios. This diagram is very important since
it allows us to recognize a galaxy type only by means of its abundances,
and therefore it can be used to understand the nature of high redshift
objects.
1.3 Lecture III: interpretation of abundances in dwarf
irregulars
They are rather simple objects with low metallicity and large gas con-
tent, suggesting that they are either young or have undergone discon-
tinuous star formation activity (bursts) or a continuous but not efficient
star formation. They are very interesting objects for studying galaxy
evolution. In fact, in ”bottom-up” cosmological scenarios they should
be the first self- gravitating systems to form and they could also be
important contributors to the population of systems giving rise to QSO-
absorption lines at high redshift (see Matteucci et al. 1997 and Calura
et al. 2002).
1.3.1 Properties of Dwarf Irregular Galaxies
Among local star forming galaxies, sometimes referred to as HII galax-
ies, most are dwarfs. Dwarf irregular galaxies can be divided into two
categories: Dwarf Irregular (DIG) and Blue Compact galaxies (BCG).
These latter have very blue colors due to active star formation at the
present time.
Chemical abundances in these galaxies are derived from optical emis-
sion lines in HII regions. Both DIG and BCG show a distinctive spread
in their chemical properties, altough this spread is decreasing with the
new more accurate data, but also a definite mass-metallicity relation.
From the point of view of chemical evolution, Matteucci and Chiosi
(1983) first studied the evolution of DIG and BCG by means of ana-
lytical chemical evolution models including either outflow or infall and
concluded that: closed-box models cannot account for the Z-log G(G =
Mgas/Mtot) distribution even if the number of bursts varies from galaxy
to galaxy and suggested possible solutions to explain the observed spread.
In other words, the data show a range of values of the metallicity for a
given G ratio, and this means that the effective yield is lower than that
of the Simple Model and vary from galaxy to galaxy.
The possible solutions suggested to lower the effective yield were:
28 Chemical Evolution
• a. different IMF’s
• b. different amounts of galactic wind
• c. different amounts of infall
In Figure 3.1 we show graphically the solutions a), b) and c). Concerning
the solution a), one simply varies the IMF, whereas solutions b) and c)
have been already descibed (eqs. 1.21 ans 1.23).
Later on, Pilyugin (1993) forwarded the idea that the spread observed
also in other chemical properties properties of these galaxies such as
in the He/H vs. O/H and N/O vs. O/H relations, can be due to
self-pollution of the HII regions, which do not mix efficiently with the
surrounding medium, coupled with “enriched” or “differential” galac-
tic winds, namely different chemical elements are lost at different rates.
Other models (Marconi et al. 1994; Bradamante et al. 1998) followed
the suggestions of differential winds and introduced the novelty of the
contribution to the chemical enrichment and energetics of the ISM by
SNe of different type (II, Ia and Ib).
Another important feature of these galaxies is the mass-metallicity
relation.
The existence of a luminosity-metallicity relation in irregulars and
BCG was suggested first by Lequeux et al. (1979), then confirmed by
Skillman et al. (1989) and extended also to spirals by Garnett & Shields
(1987). In particular, Lequeux et al. suggested the relation:
MT = (8.5± 0.4) + (190± 60)Z (1.27)
with Z being the global metal content. Recently, Tremonti et al. (2004)
analyzed 53000 local star-forming galaxies in the SDSS (irregulars and
spirals). Metallicity was measured from the optical nebular emission
lines. Masses were derived from fitting spectral energy distribution
(SED) models. The strong optical nebular lines of elements other than
H are produced by collisionally excited transitions. Metallicity was then
determined by fitting simultaneously the most prominent emission lines
([OIII], Hβ , [OII], Hα, [NII], [SII]). Tremonti et al. (2004) derived a re-
lation indicating that 12+log(O/H) is increasing steeply from M∗ going
from 108.5 to 1010.5 but flattening for M∗ > 10
10.5.
In particular, the Tremonti et al. relation is:
12 + log(O/H) = −1.492 + 1.847(logM∗)− 0.08026(logM∗)
2. (1.28)
This relation extends to higher masses the mass-metallicity relation
1.3 Lecture III: interpretation of abundances in dwarf irregulars 29
Fig. 1.11. The Z-logG diagram.Solutions a), b) and c) from top to bottom,
to lower the effective yield in DIG and BCG by Matteucci & Chiosi (1983).
Solution a) consists in varying the yield per stellar generation, here indicated
by pZ , just by changing the IMF. The solution b) and c) correspond to eqs.
(1.21) and (1.23), respectively.
30 Chemical Evolution
Fig. 1.12. Figure 3 from Erb et al. (2006) showing the mass-metallicity rela-
tion for star forming galaxies at high redshift. The data from Tremonti et al.
(2004) are also shown.
found for star forming dwarfs and contains very important information
on the physics governing galactic evolution. Even more recently, Erb
et al. (2006) found the same mass-metallicity relation for star-forming
galaxies at redshift z>2, with an offset from the local relation of ∼ 0.3
dex. They used Hα and [NII] spectra. In Figure 3.2 we show the figure
from Erb et al. (2006) for the mass-metallicity relation at high redshift
which includes the relation of Tremonti et al. (2004) for the local mass-
metallicity relation.
The most simple interpretation of the mass-metallicity relation is that
the effective yield increases with galactic mass. This can be achieved in
several ways, as shown in Fig. 3.1.: either by changing the IMF or the
stellar yields as a function of galactic mass, or by assuming that the
1.3 Lecture III: interpretation of abundances in dwarf irregulars 31
galactic wind is less efficient in more massive systems, or that the infall
rate is less efficient in more massive systems. One of the most common
interpretations of the mass-metallicity relation is that the effective yield
changes because of the occurrence of galactic winds, which should be
more important in small systems. Evidences for galactic winds exist for
dwarf irregular galaxies, as we will see next.
1.3.2 Galactic Winds
Papaderos et al. (1994) estimated a galactic wind flowing at a velocity of
1320 Km/sec for the irregular dwarf VIIZw403. The escape velocity es-
timated for this galaxy is ≃ 50 Km/sec. Lequeux et al. (1995) suggested
a galactic wind in Haro2=MKn33 flowing at a velocity of ≃ 200Km/sec,
also larger that the escape velocity of this object. More recently, Martin
(1996;1998) found also supershells in 12 dwarfs, including IZw18, which
imply gas outflow. Martin (1999) concluded that the galactic wind rates
are several times the SFR. Finally, the presence of metals in the ICM
(revealed by X-ray observations) and in the IGM (Ellison et al. 2000)
represents a clear indication of the fact that galaxies lose their metals.
However, we cannot exclude that the gas with metals is lost also by ram
pressure stripping, especially in galaxy clusters.
In models of chemical evolution of dwarf irregulars (e.g. Bradamante
et al. 1998) the feedback effects are taken into account and the condition
for the development of a wind is:
(Eth)ISM ≥ EBgas (1.29)
namely, that the thermal energy of the gas is larger or equal to its binding
energy. The thermal energy of gas due to SN and stellar wind heating
(Eth)ISM = EthSN + Ethw (1.30)
with the contribution of SNe being:
EthSN =
ǫSNRSN (t
‘)dt‘, (1.31)
while the contribution of stellar winds is:
Ethw =
∫ 100
ϕ(m)ψ(t‘)ǫwdmdt
‘ (1.32)
with ǫSN = ηSN ǫo and ǫo = 10
51erg (typical SN energy), and ǫw =
32 Chemical Evolution
ηwEw with Ew = 10
49erg (typical energy injected by a 20M⊙ star taken
as representative). ηw and ηSN are two free parameters and indicate
the efficiency of energy transfer from stellar winds and SNe into the
ISM, respectively, quantities still largely unknown. The total mass of
the galaxy is expressed as Mtot(t) = M∗(t) +Mgas(t) +Mdark(t) with
ML(t) =M∗(t) +Mgas(t) and the binding energy of gas is:
EBgas(t) =WL(t) +WLD(t) (1.33)
with:
WL(t) = −0.5G
Mgas(t)ML(t)
(1.34)
which is the potential well due to the luminous matter and with:
WLD(t) = −GwLD
Mgas(t)Mdark
(1.35)
which represents the potential well due to the interaction between dark
and luminous matter, where wLD ∼
S(1 + 1.37S), with S = rL/rD,
being the ratio between the galaxy effective radius and the radius of the
dark matter core. The typical model for a BCG has a luminous mass
of 108 − 109M⊙, a dark matter halo ten times larger than the luminous
mass and various values for the parameter S. The galactic wind in these
galaxies develops easily but it carries out mainly metals so that the total
mass lost in the wind is small.
1.3.3 Results on DIG and BCG from purely chemical models
Purely chemical models (Bradamante et al. 1998, Marconi et al. 1994)
for DIG and BCG have been computed in the last years by varying the
number of bursts, the time of occurrence of bursts tburst, the star forma-
tion efficiency, the type of galactic wind (differential or normal), the IMF
and the nucleosynthesis prescriptions. The best model of Bradamante
et al. (1998) suggests that the number of bursts should be Nbursts ≤ 10,
the SF efficiency should vary from 0.1 to 0.7 Gyr−1 for either Salpeter or
Scalo (1986) IMF (Salpeter IMF is favored). Metal enriched winds are
favored. The results of these models also suggest that SNe of Type
II dominate the chemical evolution and energetics of these galaxies,
whereas stellar winds are negligible. The predicted [O/Fe] ratios tend to
be overabundant relative to the solar ratios, owing to the predominance
of Type II SNe during the bursts, in agreement with observational data
1.3 Lecture III: interpretation of abundances in dwarf irregulars 33
(see Figure 3.5 upper panel). Models with strong differential winds and
Nburst=10 - 15 can however give rise to negative [O/Fe] ratios. The
main difference between DIGs and BCGs, in these models, is that the
BCGs suffer a present time burst, whereas the DIGs are in a quiescent
phase.
In Figure 3.3 we show some of the results of Bradamante et al. (1998)
compared with data on BCGs: it is evident from the Figure that the
spread in the chemical properties can be simply reproduced by different
SF efficiencies, which translate into different wind efficiencies.
In Fig 3.4 we show the results of the chemical evolution models of
Henry et al. (2000). These models take into account exponential infall
but not outflow. They suggested that the SF efficiency in extragalactic
HII regions must have been low and that this effect coupled with the
primary N production from intermediate mass stars can explain the
plateau in log(N/O) observed at low 12+log(O/H). Henry et al. (2000)
also concluded that 12C is mainly produced in massive stars (yields by
Maeder 1992) whereas 14N is mainly produced in intermediate mass
stars (yields by HG97). This conclusion, however, should be tested also
on the abundances of stars in the Milky Way, where the flat behaviour
of [C/Fe] vs. [Fe/H] from [Fe/H] =-2.2 up to [Fe/H]=0 suggest a similar
origin for the two elements, namely partly from massive stars and mainly
from low and intermediate mass ones (Chiappini et al. 2003b).
Concerning the [O/Fe] ratios we show results from Thuan et al. (1995)
in Figure 3.5, where it is evident that generally BCGs have overabundant
[O/Fe] ratios.
Very recently, an extensive study from SDSS of chemical abundances
from emission lines in a sample of 310 metal poor emission line galaxies
appeared (Izotov et al. 2006). The global metallicity in these galax-
ies ranges from ∼ 7.1(Z⊙/30) to ∼ 8.5(0.7Z⊙). The SDSS sample is
merged with 109 BCGs containing extremely low metallicity objects.
These data, shown in Figure 3.5 lower panel, substantially confirm pre-
vious ones, showing how α-elements do not depend on the O abundance
suggesting a common origin for these elements in stars withM > 10M⊙,
except for a slight increase of Ne/O with metallicity which is inter-
preted as due to a moderate dust depletion of O in metal rich galaxies.
An important finding is that all the studied galaxies are found to have
log(N/O) > −1.6, which indicates that none of these galaxies is a truly
young object, unlike the DLA systems at high redshift which show a
log(N/O) ∼ −2.3.
34 Chemical Evolution
Fig. 1.13. Upper panel : predicted Log(N/O) vs. 12 + log(O/H) for a model
with 3 bursts of SF separated by quiescent periods and different SF efficien-
cies here indicated with γ = ν. Lower panel: predicted log(C/O) vs. 12 +
log(O/H). The data in both panels are from Kobulnicky and Skillman (1996).
The models assume a dark matter halo ten times larger than the luminous
mass and S=0.3 ( Bradamante et al. 1998, see text).
1.3.4 Results from Chemo-Dynamical models: IZw18
IZw18 is the most metal poor local galaxy, thus resembling to a pri-
mordial object. Probably it did not experience more than two bursts
of star formation including the present one. The age of the oldest stars
1.3 Lecture III: interpretation of abundances in dwarf irregulars 35
Fig. 1.14. Figure from Henry et al. (2000): a comparison between numerical
models and data for extragalactic HII regions and stars (filled circles, filled
boxes and filled diamonds); M and S mark the position of the Galactic HII re-
gions and the Sun, respectively. Their best model is model B with an efficiency
of SF of ν = 0.03.
in this galaxy is still uncertain, although recently Tosi et al. (2006)
suggested an age possibly > 2 Gyr. The oxygen abundance in IZW18
is 12+log(O/H)= 7.17-7.26, ∼ 15-20 times lower than the solar oxygen
(12+ log(O/H)= 8.39, Asplund et al. 2005) and log N/O= -1.54/ -1.60
(Garnett et al. 1997).
Recently, FUSE provided abundances also for HI in IZw18: the evi-
dence is that the abundances in the HI are lower than in the HII (Aloisi
et al. 2003; Lecavelier des Etangs et al. 2003). In particular, Aloisi et
al. (2003) found the largest difference relative to the HII data.
Chemo-dynamical (2-D) models (Recchi et al. 2001) studied first the
case of IZw18 with only one burst at the present time and concluded
that the starburst triggers a galactic outflow. In particular, the metals
leave the galaxy more easily than the unprocessed gas and among the
enriched material the SN Ia ejecta leave the galaxy more easily than
other ejecta. In fact, Recchi et al. (2001) had reasonably assumed that
Type Ia SNe can transfer almost all of their energy to the gas, since
36 Chemical Evolution
Fig. 1.15. Upper panel: [O/Fe] vs. [Fe/H] observed in a sample of BCGs
by Thuan et al. (1995) (filled circles), open triangles and asterisks are disk
and halo stars shown for comparison.Figure from Thuan et al. (1995). Lower
panel: new data from Izotov et al. (2006). The large filled circles represent the
BCGs whereas the dots are the SDSS galaxies. Abundances in the left panel
are calculated as in Thuan et al. (1995) whereas those in the right panel are
calculated as in Izotov et al. (2006) (see original papers for details). Figure
from Izotov et al. (2006).
1.3 Lecture III: interpretation of abundances in dwarf irregulars 37
Fig. 1.16. Figure from Recchi et al. (2004): predicted abundances for the HII
region in IZw18 (dashed lines represent a model adopting the yields of Meynet
& Maeder (2002) for Z = 10−5, whereas the continuous line refers to a higher
metallicity (Z=0.004).Observational data are represented by the shaded areas.
they explode in an already hot and rarified medium after the SN II
explosions. As a consequence of this, they predicted that the [α/Fe]
ratios in the gas inside the galaxy should be larger than the [α/Fe]
ratios in the gas outside the galaxy. At variance with previous studies,
they found that most of the metals are already in the cold gas phase
after 8-10 Myr since the superbubble does not break immediately and
thermal conduction can act efficiently. In the following, Recchi et al.
(2004) extended the model to a two-burst case, always with the aim of
reproducing the characteristics of IZw18. The model well reproduces the
chemical properties of IZw18 with a relatively long episode of SF lasting
270 Myr plus a recent burst of SF still going on. In Figure 3.6 we show
the predictions of Recchi et al. (2004) for the abundances in the HII
regions of IZW18 and in Figure 3.7 those for the HI region, showing a
little difference between the HII and HI abundances, more in agreement
with the data of Lecavelier des Etangs et al. (2004).
38 Chemical Evolution
Fig. 1.17. Figure from Recchi et al. (2004): predicted abundances for the
HI region. The models are the same as in Figure 3.6. Observational data
are represented by the shaded areas. The upper shaded area in the panel for
oxygen and the lower shaded area in the panel for N/O represent the data of
Lecavelier des Etangs et al. (2003).
1.4 Lecture IV: Elliptical galaxies-Quasars- ICM Enrichment
1.4.1 Ellipticals
We recall here some of the most important properties of ellipticals or
early type galaxies (ETG) which are systems made of old stars with no
gas and no ongoing SF. The metallicity of ellipticals is measured only
by means of metallicity indeces obtained from their integrated spectra
which are very similar to those of K giants. In order to pass from metal-
licity indices to [Fe/H] one needs then to adopt a suitable calibration
often based on population synthesis models (Worthey, 1994). We also
summarize the most common scenarios for the formation of ellipticals.
1.4.2 Chemical Properties
The main properties of the stellar populations in ellipticals are:
• There exist the well-known Color-Magnitude and Color - σo (veloc-
1.4 Lecture IV: Elliptical galaxies-Quasars- ICM Enrichment 39
ity dispersion) relations indicating that the integrated colors become
redder with increasing luminosity and mass (Faber 1977; Bower et al.
1992). These relations are interpreted as a metallicity effect, although
a well known degeneracy exists between metallicity and age of the
stellar populations in the integrated colors (Worthey 1994).
• The indexMg2 is normally used as a metallicity indicator since it does
not depend much upon the age of stellar populations. There exists for
ellipticals a well defined Mg2–σo relation, equivalent to the already
discussed mass-metallicity relation for star forming galaxies (Bender
et al. 1993; Bernardi et al. 1998; Colless et al. 1999).
• Abundance gradients in the stellar populations inside ellipticals are
found (Carollo et al. 1993; Davies et al. 1993). Kobayashi & Arimoto
(1999) derived the average gradient for ETGs from a large compilation
of data and this is: ∆[Fe/H ]/∆r ∼ −0.3, with the average metallicity
in ETGs of < [Fe/H ] >∗∼ −0.3dex (from -0.8 to +0.3 dex).
• A very important characteristic of ellipticals is that their central dom-
inant stellar population (dominant in the visual light) shows an over-
abundance, relative to the Sun, of the Mg/Fe ratio, < [Mg/Fe] >∗> 0
(from 0.05 to + 0.3 dex) (Peletier 1989; Worthey et al. 1992; Weiss
et al. 1995; Kuntschner et al. 2001).
• In addition, the overabundance increases with increasing galactic mass
and luminosity, < [Mg/Fe] >∗ vs. σo, (Worthey et al. 1992; Mat-
teucci 1994; Jorgensen 1999; Kuntschner et al. 2001).
1.4.3 Scenarios for galaxy formation
The most common ideas on the formation and evolution of ellipticals
can be summarized as:
• they formed by an early monolithic collapse of a gas cloud or early
merging of lumps of gas where dissipation plays a fundamental role
(Larson 1974; Arimoto & Yoshii 1987; Matteucci & Tornambè 1987).
In this model SF proceeds very intensively until a galactic wind is
developed and SF stops after that. The galactic wind is devoiding the
galaxy from all its residual gas.
• They formed by means of intense bursts of star formation in merging
subsystems made of gas (Tinsley & Larson 1979). In this picture SF
stops after the last burst and gas is lost via ram pressure stripping or
galactic wind.
40 Chemical Evolution
Fig. 1.18. The relation [α/Fe] vs. velocity dispersion (mass) for ETGs. Figure
adapted from Thomas et al. (2002).The continuous line represents the predic-
tion of the model by Pipino & Matteucci (2004). The shaded area represents
the prediction of hierarchical models for the formation of ellipticals.The sym-
bols are the observational data.
• They formed by early merging of lumps containing gas and stars in
which some dissipation is present (Bender et al. 1993).
• They formed and continue to form in a wide redshift range and prefer-
entially at late epochs by merging of early formed stellar (e.g. Kauff-
mann et al. 1993;1996).
Pipino & Matteucci (2004), by means of recent revised monolithic
models taking into account the development of a galactic wind (see Lec-
ture III), computed the relation [Mg/Fe] versus mass (velocity disper-
sion) and compared it with the data by Thomas et al. (2002). Thomas
(1990) already showed how hierarchical semi-analitycal models cannot
1.4 Lecture IV: Elliptical galaxies-Quasars- ICM Enrichment 41
reproduce the observed [Mg/Fe] vs. mass trend, since in this scenario
massive ellipticals have longer periods of star formation than smaller
ones. In Figure 4.1, the original figure from Thomas et al. (2002) is
shown, where we have plotted also our predictions. In the Pipino &
Matteucci (2004) model it is assumed that the most massive galaxies as-
semble faster and form stars faster than less massive ones. The adopted
IMF is the Salpeter one. In other words, more massive ellipticals seem to
be older than less massive ones, in agreement with what found for spirals
(Boissier et al. 2001). In particular, in order to explain the observed
< [Mg/Fe] >∗> 0 in giant ellipticals the dominant stellar population
should have formed on a time scale no longer than 3-5 ·108 yr (Weiss et
al. 1995; Pipino & Matteucci 2004).
1.4.4 Ellipticals-Quasars connection
We know now that most if not all massive ETGs are hosting an AGN
for sometime during their life. Therefore, there is a strict link between
the quasar activity and the evolution of ellipticals.
1.4.5 The chemical evolution of QSOs
It is very interesting to study the chemical evolution of QSOs by means
of the broad emission lines in the QSO region. The first studies by Wills
et al. (1985) and Collin-Souffrin et al. (1986) found that the abundance
of Fe in QSOs, as measured from broad emission lines, turned out to be
∼ a factor of 10 more than the solar one and this represented a challenge
for chemical evolution model makers. Hamman & Ferland (1992) from
N V/C IV line ratios in QSOs derived the N/C abundance ratios and
inferred the QSO metallicities. They suggested that N is overabundant
by factors of 2-9 in the high redshift sources (z > 2). Metallicities 3-14
times the solar one were also suggested in order to produce such a high
N abundance, under the assumption of a mainly secondary N. To inter-
pret their data they built a chemical evolution model, a Milky Way- like
model, and suggested that these high metallicities are reached in only
0.5 Gyr, implying that QSOs are associated with vigourous star forma-
tion. At the same time, Padovani & Matteucci (1993) and Matteucci &
Padovani (1993) proposed a model for QSOs in which QSOs are hosted
by massive ellipticals. They assumed that after the occurrence of a galac-
tic wind the galaxy evolves passively and that for massess > 1011M⊙
the gas restored by the dying stars is not lost but it feeds the central
42 Chemical Evolution
black hole. They showed that in this context the stellar mass loss rate
can explain the observed AGN luminosities. They also found that solar
abundances in the gas are reached in no more than 108 years explaining
in a natural way the standard emission lines observed in high-z QSOs.
The predicted abundances could explain the data available at that time
and solve the problem of the quasi-similarity of QSO spectra at differ-
ent redshifts. Finally, they suggested also a criterium for establishing
the ages of QSOs on the basis of the [α/Fe] ratios observed from broad
emission lines (see also Hamman & Ferland 1993).
Much more recently, Maiolino et al. (2005, 2006) used more than
5000 QSO spectra from SDSS data to investigate the metallicity of the
broad emission line region in the redshift range 2 < z < 4.5 and over
the luminosity range −24.5 < MB < −29.5. They found substantial
chemical enrichment in QSOs already at z = 6. Models for ellipticals
by Pipino & Matteucci (2004) were used as a comparison with the data
and they well reproduce the data, as one can see in Figure 4.2. In this
Figure the evolution of the abundances of several chemical elements in
the gas of a typical elliptical are shown. The elliptical suffers a galactic
wind at around 0.4 Gyr since the beginning of star formation. This wind
devoids the galaxy of all the gas present at that time. After this time,
the SF stops and the galaxy evolves passively. All the gas restored after
the galactic wind event by dying stars can in principle feed the central
black hole, thus the abundances shown in Figure 4.2, after the time of
the wind, can be compared with the abundances measured in the broad
emission line region. As one can see, the predicted Fe abundance after
the galactic wind is always higher than the O one, owing to the Type Ia
SNe which continue to produce Fe even after the stop in the SF. On the
other hand, O and α-elements stop to be produced when the SF halts.
The comparison between the predicted abundances and those derived
from the QSO spectra, are in very good agreement and indicates ages
for these objects between 0.5 and 1 Gyr.
Finally, in the context of the joint formation of QSOs and ellipticals
we recall the work of Granato et al. (2001) who includes the energy
feedback from the central AGN in ellipticals. This feedback produces
outflows and stops the SF in a down-sizing fashion, in agreement with
the chemical properties of ETGs indicating a shorter period of SF for
the more massive objects.
1.4 Lecture IV: Elliptical galaxies-Quasars- ICM Enrichment 43
Fig. 1.19. The temporal evolution of the abundances of several chemical ele-
ments in the gas of an elliptical galaxy with luminous mass of 1011M⊙. Feed-
back effects are taken into account in the model (Pipino & Matteucci 2004), as
described in Lecture III. The downarrow indicates the time for the occurrence
of the galactic wind. After this time, the SF stops and the elliptical evolves
passively. All the abundances after the time for the occurrence of the wind are
those that we observe in the broad emission line region region. The shaded
area indicates the abundance sets which best fit the line ratios observed in the
QSO spectra. Figure from Maiolino et al. 2006.
1.4.6 The chemical enrichment of the ICM
The X-ray emission from galay clusters is generally interpreted as ther-
mal bremsstrahlung in a hot gas (107-108 K). There are several emission
lines (O, Mg, Si, S) including the strong Fe K-line at around 7keV which
was discovered by Mitchell et al. (1976). The iron is the best studied
element in clusters. For kT ≥ 3 keV the intracluster medium (ICM)
Fe abundance is constant and ∼ 0.3Fe⊙ in the central cluster regions;
44 Chemical Evolution
the existence of metallicity gradients seems evident only in some clusters
(see Renzini 2004). At lower temperatures, the situation is not so simple
and the Fe abundance seems to increase. The first works on chemical
enrichment of the ICM even preceeded the discovery of the Fe line (Gunn
& Gott 1972, Larson & Dinerstein 1975). In the following years other
works appeared such as those of Vigroux (1977), Himmes & Biermann
(1988) and Matteucci & Vettolani (1988). In particular, Matteucci &
Vettolani (1988) started a more detailed approach to the problem fol-
lowed by David et al. (1991), Arnaud (1992), Renzini et al. (1993),
Elbaz et al. (1995), Matteucci & Gibson (1995), Gibson & Matteucci
(1997), Lowenstein & Mushotzky (1996), Martinelli et al. (2000), Chiosi
(2000), Moretti et al. (2003). The majority of these papers assumed that
galactic winds (mainly from ellipticals and S0 galaxies) are responsible
for the ICM chemical enrichment. In fact, ETGs are the dominant type
of galaxy in clusters and Arnaud (1992) found a clear correlation be-
tween the mass of Fe in clusters and the total luminosity of ellipticals.
No such correlation was found for spirals in clusters. Alternatively, the
abundances in the ICM are due to ram pressure stripping (Himmes &
Biermann 1988) or derive from a chemical enrichment from pre-galactic
Pop III stars (White & Rees 1978).
In Matteucci & Vettolani (1988) the Fe abundance in the ICM rel-
ative to the Sun, XFe/XFe⊙ , was calculated as (MFe)pred/(Mgas)obs
to be compared with the observed ratio (XFe/XFe⊙)obs = 0.3 − 0.5
(Rothenflug & Arnaud 1985). They found a good agreement with the
observed Fe abundance in clusters if all the Fe produced by ellipticals
and S0, after SF has stopped, is eventually restored into the ICM and
if the majority of gas in clusters has a primordial origin. Low values for
[Mg/Fe] and [Si/Fe] were predicted at the present time, due to the short
period of SF in ETGs and to the Fe produced by Type Ia SNe. With
Salpeter IMF they found that the Type Ia SNe contribute ≥ 50% of the
total Fe in clusters. This leads to a bimodality in the [α/Fe] ratios in
the stars and in the gas in the ICM, since the stars have overabundances
of [α/Fe]> 0 whereas the ICM should have [α/Fe]≤ 0. The same con-
clusion was reached and more highlighted later by Renzini et al. (1993).
More recently, Pipino et al. (2002) computed the chemical enrichment
of the ICM as a function of redshift by considering the evolution of the
cluster luminosity function and an updated treatment of the SN feed-
back. They adopted Woosley & Weaver (1995) yields for Type II SNe
and Nomoto et al. (1997) W7 model for Type Ia SNe and a Salpeter
IMF. They also predicted solar or undersolar [α/Fe] ratios in the ICM.
1.4 Lecture IV: Elliptical galaxies-Quasars- ICM Enrichment 45
Fig. 1.20. Observed Fe abundance and predicted Fe abundance in the ICM as
a function of redshift: data from Tozzi et al. (2003), model (continuous line)
from Pipino et al. (2002), where the formation of ETGs was assumed to occur
at z=8.
The observational data on abundance ratios in clusters are still uncertain
and vary from cluster centers where they tend to be solar or undersolar
to the outer regions where they tend to be oversolar (e.g. Tamura et al.
2004). So, no firm conclusions can be drawn on this point. Concerning
the evolution of the Fe abundance in the ICM as a function of redshift,
most of the above mentioned models predict very little or no evolution of
the Fe abundance from z=1 to z=0 (Pipino et al. 2002). This prediction
seemed to be in good agreement with data from Tozzi et al. (2003) as
shown Figure. However, more recently, more data of Fe abundance for
high redshift clusters appeared showing a different behaviour.
In Figure 4.4 we show the data of Balestra et al. (2006) who claim
an increase, by at least a factor of two, of the Fe abundance in the
ICM from z=1 to z=0. Clearly, if we assume that only ellipticals have
contributed to the Fe abundance in the ICM, this effect is difficult to
explain unless we assume recent star formation in ellipticals. Another
possible explanation could be that spiral galaxies contribute to Fe when
46 Chemical Evolution
Fig. 1.21. New data (always relative to Fe) from Balestra et al. (2006) showing
an increase of the Fe abundance in the ICM from z=1 to z=0. Error bars refer
to 1σ confidence level. The big shaded area represents the rms dispersion.
Figure from Balestra et al. (2006).
they become S0 as a consequence of ram pressure stripping, and this
morphological transformation might have started just at z=1.
1.4.7 Conclusions on the enrichment of the ICM
From what said before we can conclude that:
• Elliptical galaxies are the dominant contributors to the abundances
and energetic content of the ICM. A constant Fe abundance of ∼
0.3Fe⊙ is found in the central regions of clusters hotter than 3keV
(Renzini 2004).
• Good models for the chemical enrichment of the ICM should repro-
duce the iron mass measured in clusters plus the [α/Fe] ratios in-
side galaxies and in the ICM as well as the Fe mass to light ratio
(IMLR= MFeICM /LB, with LB being the total blue luminosity of
member galaxies, as defined by Renzini et al. (1993). Abundance
1.4 Lecture IV: Elliptical galaxies-Quasars- ICM Enrichment 47
ratios are very powerful tools to impose constraints on the evolution
of ellipticals and of the ICM.
• Models which do not assume a top-heavy IMF for the galaxies in
clusters (a Salpeter IMF can reproduce at best the properties of local
ellipticals) predict [α/Fe]> 0 inside ellipticals and [α/Fe] ≤ 0 in the
ICM. Observed values are still too uncertain to draw firm conclusions
on this point.
Acknowledgements
This research has been supported by INAF (Italian National Institute
for Astrophysics), Project PRIN-INAF-2005-1.06.08.16
References
[1] Alibés, A., Labay, J. & Canal, R., 2001, A&A, 370, 1103
[2] Aloisi, A., Savaglio, S., Heckman, T. M., Hoopes, C. G., Leitherer, C. &
Sembach, K. R., 2003, ApJ, 595, 760
[3] Argast, D., Samland, M., Gerhard, O.E. & Thielemann, F.-K., 2000, A&A
356, 873
[4] Arimoto, N. & Yoshii, Y. 1987, A&A 173, 23
[5] Arnaud, M., Rothenflug, R., Boulade, O.,Vigroux, L. & Vangioni-Flam, E.,
1992, A&A, 254, 49
[6] Asplund, M., Grevesse, N. & Sauval, A.J., 2005, ASP (Astronomical Society
of the Pacific) Conf. Series, Vol. 336, p.55
[7] Balestra, I., Tozzi, P., Ettori, S., Rosati, P., Borgani, S., Mainieri, V.,
Norman, C. & Viola, M., 2006, A&A in press, astro-ph/0609664
[8] Barbuy, B. & Grenon, M., 1990. in :Bulges of Galaxies, eds. B.J. Jarvis &
D.M. Terndrup, ESO/CTO Workshop, p.83
[9] Barbuy, B.,Ortolani, S.& Bica, E., 1998, A&AS, 132, 333
[10] Bender, R., Burstein, D. & Faber, S. M., 1993, ApJ, 411, 153
[11] Berman, B.C. & Suchov, A.A., 1991, Astrophys. Space Sci. 184, 169
[12] Bernardi, M., Renzini, A., da Costa, L. N.., Wegner, G. & al., 1998, ApJ,
508, L143
[13] Boissier, S., Prantzos, N., 1999, MNRAS, 307, 857
[14] Boissier, S., Boselli, A., Prantzos, N. & Gavazzi, G., 2001, MNRAS, 321,
[15] Bradamante, F., Matteucci, F. & D’Ercole, A., 1998, A&A, 337, 338
[16] Calura, F. 2004 PhD Thesis, Trieste University
[17] Calura, F., Matteucci, F. & Vladilo, G., 2003, MNRAS, 340, 59
[18] Carollo, C. M., Danziger, I. J.& Buson, L., 1993, MNRAS, 265, 553
[19] Cayrel, R., Depagne, E., Spite, M., Hill, V., Spite, F., Franois, P., Plez,
B., Beers, T., & al., 2004, A&A, 416, 117
[20] Chabrier, G., 2003, PASP, 115, 763
[21] Chang, R.X., Hou, J.L., Shu, C.G. & Fu, C.Q., 1999, A&A 350, 38
[22] Chiappini, C,. Hirschi, R., Meynet, G., Ekstroem, S., Maeder, A. & Mat-
teucci, F., 2006, A&A, 449, L27
[23] Chiappini, C., Matteucci F. & Gratton R. 1997, ApJ, 477, 765
[24] Chiappini, C., Matteucci, F. & Meynet, G. 2003b, A&A, 410, 257
[25] Chiappini, C., Matteucci, F. & Padoan, P., 2000, ApJ, 528, 711
[26] Chiappini, C., Matteucci, F., & Romano, D., 2001, ApJ, 554, 1044
http://arxiv.org/abs/astro-ph/0609664
References 49
[27] Chiappini, C., Romano, D & Matteucci, F., 2003a, MNRAS, 339, 63
[28] Chiosi, C., 1980, A&A, 83, 206
[29] Chiosi, C., 2000, A&A 364, 423
[30] Colless, M., Burstein, D., Davies, R.L., McMahan, R. K., Saglia, R. P. &
Wegner, G., 1999, MNRAS, 303, 813
[31] Collin-Souffrin, S., Joly, M., Pequignot, D. & Dumont, S., 1986, A&A,
166, 27
[32] Davies, R. L., Sadler, E. M. & Peletier, R. F., 1993, MNRAS, 262, 650
[33] David, L.P., Forman, W., & Jones, C., 1991, ApJ, 376, 380
[34] Dopita, M.A.& Ryder, S.D., 1994, ApJ, 430, 163
[35] Eggen, O.J., Lynden-Bell, D. & Sandage, A.R., 1962, ApJ, 136, 748
[36] Elbaz, D., Cesarsky, C. J., Fadda, D., Aussel, H. & al., 1999, A&A, 351,
[37] Ellison, S.L., Songaila, A., Schaye, J. & Pettini, M., 2000, AJ, 120, 1175
[38] Erb, D. K., Shapley, A.E., Pettini, M., Steidel, C.C., Reddy, N.A.& Adel-
berger, K.L., 2006, ApJ, 644, 813
[39] François, P., Matteucci, F. Cayrel, R., Spite, M., Spite, F. & Chiappini,
C., 2004, A&A, 421, 613
[40] Garnett, D.R.& Shields, G.A., 1987, ApJ, 317, 82
[] Garnett, D.R., Skillman, E.D., Dufour, R.J.& Shields, G.A., 1997, ApJ, 481,
[41] Gibson, B.K. & Matteucci, F., 1997, ApJ, 475, 47
[42] Granato, G.L., Silva, L.,Monaco, P., Panuzzo, P., Salucci, P., De Zotti,
G.& Danese, L., 2001, MNRAS, 324, 757
[43] Greggio, L. & Renzini, A., 1983, A&A, 118, 217
[44] Grevesse, N., & Sauval, A.J., 1998, Space Science Reviews, Vol. 85, p.161
[45] Goswami, A. & Prantzos, N., 2000, A&A, 359, 191
[46] Gunn, J. E. & Gott, J. R. III, 1972, ApJ, 176, 1
[47] Holweger, H., 2001, Joint SOHO/ACE workshop ”Solar and Galactic
Composition”. Edited by Robert F. Wimmer-Schweingruber. Publisher:
American Institute of Physics Conference proceedings Vol. 598, p.23
[48] Hachisu, I., Kato, M. & Nomoto, K., 1996, ApJ, 470, L97
[49] Hachisu, I., Kato, M. & Nomoto, K., 1999, ApJ, 522, 487
[50] Hamman, F. & Ferland, G., 1993, ApJ, 418, 11
[51] Henry, R.B.C., Edmunds, M.G.& Koeppen, J., 2000, ApJ, 541, 660
[52] Himmes, A., & Biermann, P., A&A, 1988, 86, 11
[69] Iben, I.Jr. & Tutukov, A.V., 1984, ApJS, 54, 335
[54] Ishimaru, Y., & Arimoto, N., 1997, PASJ, 49, 1
[55] Izotov, Y. I., Stasinska, G., Meynet, G., Guseva, N. G. & Thuan, T. X.,
2006, A&A, 448, 955
[56] Jimenez, R., Padoan, P., Matteucci, F. & Heavens, A.F., 1998, MNRAS
299, 123
[57] Jorgensen, I., 1999, MNRAS, 306, 607
[58] Josey, S. A. & Arimoto, N., 1992, A&A, 255, 105
[59] Kauffmann, G., Charlot, S. & White, S. D. M., 1996, MNRAS 283, L117
[60] Kauffmann, G., White, S.D.M. & Guiderdoni, B., 1993, MNRAS, 264, 201
[61] Kennicutt, R.C. Jr., 1989, ApJ, 344, 685
[62] Kennicutt, R.C. Jr., 1998, ARAA, 36, 189
[63] Kobayashi, C. & Arimoto, N., 1999, ApJ, 527, 573
[64] Kodama, T., Yamada, T., Akiyama, M., Aoki, K., Doi, M., Furusawa,
H.,Fuse, T., Imanishi, M. & al., 2004, ApJ, 492, 461
50 References
[65] Kobulnicky, H.A. & Skillman, E.D., 1996, ApJ, 471, 211
[66] Kroupa, P., Tout, C.A. & Gilmore, G., 1993, MNRAS, 262, 545
[67] Kuntschner, H., Lucey, J. R., Smith, R. J., Hudson, M. J. & Davies, R.
L., 2001, MNRAS, 323, 625
[68] Hill, V., François, P., Spite, M., Primas, F., Spite, F., 2000, A&A, 364,
[69] Iben, I. Jr. & Tutukov, A., 1984, ApJ, 284, 719
[70] Iwamoto, K., Brachwitz, F., Nomoto, K., Kishimoto, N., Umeda, H., Hix,
W. R. & Thielemann, F-K., 1999, ApJS, 125, 439 (I99)
[71] Lacey, C.G. & Fall, S. M., 1985, ApJ, 290, 154
[72] Lanfranchi, G. & Matteucci, F., 2003, MNRAS, 345, 71
[73] Lanfranchi, G. & Matteucci, F., 2004, MNRAS, 351, 1338
[74] Larson, R.B., 1972, Nature, 236, 21
[75] Larson, R.B., 1974, MNRAS 169, 229
[76] Larson, R.B., 1976, MNRAS 176, 31
[77] Larson, R.B., 1998, MNRAS, 301, 569
[78] Larson, R.B., & Dinerstein, H.L., 1975, PASP, 87, 911
[79] Lecavelier des Etangs, A., Desert, J.-M. & Kunth, D., 2003, A&A, 413,
[80] Lequeux, J., Kunth, D., Mas-Hesse, J. M. & Sargent, W. L. W., 1995,
A&A 301, 18
[81] Lequeux, J.,Peimbert, M., Rayo, J. F., Serrano, A. & Torres-Peimbert, S.,
1979, A&A, 80, 155
[82] Loewenstein, M., & Mushotzky, F., 1996, ApJ, 466, 695
[83] Maeder, A., 1992, A&A, 264, 105
[84] Maiolino, R., Cox, P., Caselli, P., Beelen, A., Bertoldi, F., Carilli, C. L.,
Kaufman, M. J., Menten, K. M.& al., 2005, A&A, 440, L51
[85] Maiolino, R., Nagao, T., Marconi, A., Schneider, R., Pedani, M., Pipino,
A, Matteucci, F. & al., 2006, Mem. S.A.It. Vol. 77, 643
[86] Mannucci, F., Della Valle, M., Panagia, N., Cappellaro, E., Cresci, G.,
Maiolino, R., Petrosian, A. & Turatto, M., 2005, A & A, 433, 807
[87] Mannucci, F., Della Valle, M.& Panagia, N., 2006, MNRAS, 370, 773
[88] Marconi, G., Matteucci, F. & Tosi, M., 1994, MNRAS, 270, 35
[89] Martin, C.L., 1996, ApJ, 465, 680
[90] Martin, C.L., 1998, ApJ, 506, 222
[91] Martin, C.L., 1999, ApJ, 513, 156
[92] Martinelli, A., Matteucci, F. & Colafrancesco, S., 2000, A&A 354, 387
[93] Matteucci, F., 2001, The Chemical Evolution of the Galaxy, ASSL, Kluwer
Academic Publisher
[94] Matteucci, F.,1994, A&A, 288, 57
[95] Matteucci, F. & Chiosi, C., 1983, A&A 123, 121
[96] Matteucci, F. & François, P., 1989, MNRAS 239, 885
[97] Matteucci, F.& Gibson, B.K., 1995, A&A 304, 11
[98] Matteucci, F., Raiteri, C. M., Busso, M., Gallino, R. & Gratton, R., 1993,
A&A, 272, 421
[99] Matteucci, F. & Greggio, L., 1986, A&A ,154, 279
[100] Matteucci, F., Molaro, P. & Vladilo, G., 1997, A&A 321, 45
[101] Matteucci, F. & Padovani, P., 1993, ApJ, 419, 485
[102] Matteucci, F. & Recchi, S., 2001, ApJ 5,58, 351
[103] Matteucci, F.& Tornambé, A., 1987, A&A, 185, 51
[104] Matteucci, F., & Vettolani, G., 1988, A&A, 202, 21
References 51
[105] McWilliam, A. & Rich, R. M., 1994, ApJS, 91, 749
[106] Menanteau, F., Jimenez, R.& Matteucci, F., 2001, ApJ, 562, L23
[107] Meynet, G. & Maeder, A., 2002, A&A, 390, 561
[108] Moretti, A., Portinari, L. & Chiosi, C., 2003, A&A, 408, 431
[109] Nomoto, K., Hashimoto, M., Tsujimoto, T., Thielemann, F.-K. & al.,
1997, Nucl. Phys. A, 616, 79
[110] Oey, M. S., 2000, ApJ, 542, L25
[111] Padovani, P. & Matteucci, F., 1993, ApJ, 416, 26
[112] Papaderos, P., Fricke, K. J., Thuan, T. X. & Loose, H.-H., 1994, A&A
291, L13
[113] Pardi, M.C., Ferrini, F. & Matteucci, F., 1994, ApJ, 444, 207
[114] Peletier, R. 1989, PhD Thesis, University of Groningen, The Netherlands
[115] Pilyugin, I.S., 1993, A&A 277, 42
[116] Pipino, A., Matteucci, F., Borgani, S. & Biviano, A., 2002, NewAstr., 7,
[117] Pipino, A., Matteucci, F., 2004, MNRAS, 347, 968
[118] Pipino, A., Matteucci, F., 2006, MNRAS, 365, 1114
[119] Portinari, L. & Chiosi, C., 2000, A&A, 355, 929
[120] Prantzos, N., 2003, A&A, 404, 211
[121] Prantzos, N. & Boissier, S., 2000, MNRAS 313, 338
[122] Recchi, S., Matteucci, F. & D’Ercole, A., 2001, MNRAS 322, 800
[123] Recchi, S., Matteucci, F., D’Ercole, A. & Tosi, M., 2004, A&A, 426, 37
[124] Renzini, A., 2004, in Clusters of Galaxies: Probes of Cosmological Struc-
ture and Galaxy Evolution, eds. J.S. Mulchay, A. Dressler & Oemler, A.
(Cambridge University Press), p.260
[125] Renzini, A. & Ciotti, L., 1993, ApJ, 416, L49
[126] Renzini, A., Ciotti, L., D’Ercole, A. & Pellegrini, S., 1993, ApJ 416, L49
[127] Rothenflug, R. & Arnaud, M., 1985, A&A, 144, 431
[128] Salpeter, E.E., 1955, ApJ, 121, 161
[129] Sandage, A., 1986, A&A, 161, 89
[130] Scalo, J.M., 1986, Fund. Cosmic Phys. 11, 1
[131] Scalo, J.M., 1998, The Stellar Initial Mass Function, A.S.P. Conf. Ser.,
Vol. 142 p.201
[132] Schechter, P., 1976, ApJ, 203, 297
[133] Schmidt, M., 1959, ApJ, 129, 243
[134] Schmidt, M., 1963, ApJ, 137, 758
[135] Schneider, R., Salvaterra, R., Ferrara, A. & Ciardi, B., 2006, MNRAS,
369, 825
[136] Searle, L. & Zinn, R., 1978, ApJ, 225, 357
[137] Skillman, E.D, Terlevich, R. & Melnick, J., 1989, MNRAS, 240, 563
[138] Springel, V. & Hernquist, L., 2003, MNRAS, 339, 312
[139] Tamura, T., Kaastra, J.S., den Herder, J.W.A., Bleeeker, J.A.M. & Pe-
terson, J.R., 2004, A&A, 420, 135
[140] Thielemann, F.K., Nomoto, K. & Hashimoto, M., 1996, ApJ, 460, 408
[141] Thomas, D., Greggio, L., Bender, R., 1999, MNRAS, 302, 537
[142] Thomas, D., Maraston, C., Bender, R. & Mensez de Oliveira, C., 2005,
ApJ, 621, 673
[143] Thomas, D., Maraston, C.& Bender, R., 2002, in: R.E. Schielicke (ed.),
Reviews in Modern Astronomy, Vol.15, p.219
[144] Thuan, T.X., Izotov, Y.I., Lipovetsky, V.A., 1995, ApJ, 445, 108
[145] Tinsley, B.M., 1980, Fund. Cosmic Phys., Vol. 5, 287
52 References
[146] Tinsley, B.M. & Larson, R.B., 1979, MNRAS, 186, 503
[147] Tornambé, A., 1989, MNRAS, 239, 771
[148] Tosi, M., 1988, A&A, 197, 33
[149] Tosi, M., Aloisi, A., & Annibali, F., 2006, IAU Symp. N.35, p.19
[150] Tozzi, P., Rosati, P., Ettori, S., Borgani, S., Mainieri, V.& Norman, C.,
2003, ApJ, 593, 705
[151] Tremonti, C.A., Heckman, T. M., Kauffmann, G., Brinchmann, J., Char-
lot, S., White, S. D. M.; Seibert, M., Peng, E. W. & al., 2004, ApJ, 613,
[152] Tsujimoto, T., Shigeyama, T. & Yoshii, Y., 1999, ApJ 519,63
[153] van den Hoek, L.B. & Groenewegen, M.A.T., 1997, A&AS, 123, 305
(HG97)
[154] Vladilo, G., 2002, A&A, 391, 407
[155] Vigroux, L., 1977, A&A, 56, 473
[156] Weiss, A. Peletier, R. F. & Matteucci, F., 1995, A&A, 296, 73
[157] Whelan, J. & Iben, I. Jr., 1973, ApJ, 186, 1007
[158] White, S.D.M., & Rees, M.J., 1978, MNRAS 183, 341
[159] Wills, B.J., Netzer, H. & Wills, D., 1985, ApJ, 288, 94
[160] Worthey, G., 1994, ApJS, 95, 107
[161] Worthey, G. Faber, S. M. & Gonzalez, J. J., 1992, ApJ, 398, 69
[162] Worthey, G, Trager, S.C., Faber, S. M., 1995, ASP Conf. Ser., 86, 203
[163] Woosley, S.E. & Weaver, T.A., 1995, ApJS, 101, 181 (WW95)
[164] Wyse, R.F.G. & Gilmore, G., 1992, AJ, 104, 144
[165] Wyse, R. F. G.& Silk, J., 1989, ApJ, 339, 700
	Chemical Evolution
	Lecture I: basic assumptions and equations of chemical evolution
	The basic ingredients
	The Star Formation Rate
	The Initial Mass Function
	The Infall Rate
	The Outflow Rate
	Stellar evolution and nucleosynthesis: the stellar yields
	Type Ia SN Progenitors
	Yields per Stellar Generation
	Analytical models
	Numerical Models
	Lecture II: the Milky Way and other spirals
	The Galactic formation timescales
	The two-infall model
	Common Conclusions from MW Models
	Abundance Gradients from Emission Lines
	Abundance Gradients in External Galaxies
	How to model the Hubble Sequence
	Type Ia SN rates in different galaxies
	Time-delay model for different galaxies
	Lecture III: interpretation of abundances in dwarf irregulars
	Properties of Dwarf Irregular Galaxies
	Galactic Winds
	Results on DIG and BCG from purely chemical models
	Results from Chemo-Dynamical models: IZw18
	Lecture IV: Elliptical galaxies-Quasars- ICM Enrichment
	Ellipticals
	Chemical Properties
	Scenarios for galaxy formation
	Ellipticals-Quasars connection
	The chemical evolution of QSOs
	The chemical enrichment of the ICM
	Conclusions on the enrichment of the ICM
	References
ABSTRACT
  In this series of lectures we first describe the basic ingredients of
galactic chemical evolution and discuss both analytical and numerical models.
Then we compare model results for the Milky Way, Dwarf Irregulars, Quasars and
the Intra-Cluster- Medium with abundances derived from emission lines. These
comparisons allow us to put strong constraints on the stellar nucleosynthesis
and the mechanisms of galaxy formation.

<|endoftext|><|startoftext|>
Suppression of 1/fα noise in one-qubit systems
Pekko Kuopanportti,
Mikko Möttönen,
Ville Bergholm,
Olli-Pentti Saira,
Jun Zhang,
and K. Birgitta Whaley
Laboratory of Physi
s, Helsinki University of Te
hnology P. O. Box 4100, 02015 TKK, Finland
Low Temperature Laboratory, Helsinki University of Te
hnology, P.O. Box 3500, 02015 TKK, Finland
Department of Chemistry and Pitzer Center for Theoreti
al Chemistry, University of California, Berkeley, CA 94720
(Dated: O
tober 26, 2018)
We investigate the generation of quantum operations for one-qubit systems under 
lassi
al noise
with 1/fα power spe
trum, where 2 > α > 0. We present an e�
ient way to approximate the
noise with a dis
rete multi-state Markovian �u
tuator. With this method, the average temporal
evolution of the qubit density matrix under 1/fα noise 
an be feasibly determined from re
ently
derived deterministi
 master equations. We obtain qubit operations su
h as quantum memory and
the NOT gate to high �delity by a gradient based optimization algorithm. For the NOT gate, the

omputed �delities are qualitatively similar to those obtained earlier for random telegraph noise.
In the 
ase of quantum memory however, we observe a nonmonotoni
 dependen
y of the �delity on
the operation time, yielding a natural a

ess rate of the memory.
I. INTRODUCTION
In solid-state realization of qubits, material spe
i�
�u
tuations typi
ally indu
e the major 
ontribution to
the intrinsi
 noise. Mu
h e�ort has been fo
used on
the preservation of the state in a quantum memory in
the presen
e of 1/fα noise sin
e this is a ubiquitous
form of noise en
ountered in solid-state qubit appli
a-
tions [1, 2, 3℄. Both 
harge and spin qubits are sus-

eptible to noise of this form. For Josephson jun
tions,
both 
harge noise [4, 5℄ and 
riti
al 
urrent noise [6, 7℄
have been measured to have 1/fα power spe
tral densi-
ties. Similar 
harge �u
tuations are responsible for the
well-known 1/fα nature of low frequen
y noise in sin-
gle ele
tron transistors [8℄. Ba
kground 
harge �u
tua-
tions resulting in 1/fα noise spe
tra are 
onsidered to
be the most important sour
e of dephasing in Joseph-
son jun
tion qubits [4, 5, 9℄. Spin qubits su
h as those
formed from donor spins in semi
ondu
tors are sus
epti-
ble to nu
lear spin noise deriving from dipolar 
oupling
between environmental nu
lear spins. The nu
lear spin
bath 
ouples to the donor spins by hyper�ne intera
-
tions, whi
h renders the dynami
s of the nu
lear spins
to 
ause dephasing. Re
ent 
al
ulations for a phospho-
rus donor in sili
on show that the high frequen
y 
om-
ponent of the nu
lear spin noise is approximately de-
s
ribed by a 1/fα power spe
trum [10℄. Ele
tron spin
qubits implanted into sili
on [11℄ are also a�e
ted by
relaxation of dangling bonds deriving from oxygen va-

an
ies at the Si/SiO2 interfa
e. This gives rise to a
magneti
 noise with a 1/fα spe
trum that is the dom-
inant me
hanism for phase �u
tuations of donor spins
near the surfa
e [12℄. Another form of noise 
losely re-
lated to 1/fα noise is random telegraph noise (RTN),
whi
h arises from 
oupling of individual bistable �u
tu-
Ele
troni
 address: pekko.kuopanportti�tkk.�
ators to a qubit [2, 13, 14, 15, 16, 17, 18, 19℄.
Several approa
hes to suppress de
oheren
e based on
pulse design have been proposed in the literature. Among
them, dynami
al de
oupling s
hemes average out the un-
wanted e�e
ts of the environmental intera
tion through
the appli
ation of suitable 
ontrol pulses [20, 21℄. Appli-

ation of these s
hemes often involves hard pulses with
instantaneous swit
hings and unbounded 
ontrol ampli-
tudes, resulting in a range of validity restri
ted to time
s
ales for whi
h the pulse duration is mu
h less than the
noise 
orrelation time [22, 23℄. In Ref. [24℄, a dire
t pulse
optimization method restri
ted to bounded 
ontrol pulses
was developed for implementing one-qubit operations in a
noisy environment. This initial work on noise suppression
addressed the example of a single qubit system under the
in�uen
e of 
lassi
ally modeled random telegraph noise,
su
h as might arise from a single bistable �u
tuator.
In this paper, we extend the work of Ref. [24℄ to
the physi
ally relevant situation of 1/fα noise where
2 > α > 0. This kind of noise is known to result, for ex-
ample, from a set of bistable �u
tuators [25, 26, 27, 28℄,
i.e., RTN sour
es. We investigate two ways to approxi-
mate the 1/fα noise for 
omputer simulations, namely,
the sum of independent RTN �u
tuators and a single
dis
rete multi-state Markovian noise sour
e. We show
that the single �u
tuator provides a mu
h more e�
ient
way to model 1/fα noise than independent RTN �u
tu-
ators. Furthermore, the average temporal evolution of
the density matrix under this Markovian noise 
an be
exa
tly des
ribed by a set of deterministi
 master equa-
tions derived in Ref. [29℄. Using this approa
h, we avoid
the heavy 
omputational task arising from the numeri
al
evaluation of the density matrix averaged over a large
number of di�erent sample paths of the noise as 
om-
puted in Ref. [24℄. This framework will not only signif-
i
antly a

elerate the 
onvergen
e of the 
ontrol pulse
sequen
e optimization, but also allows further theoreti-

al analysis. Using these master equations, we employ
gradient based optimization pro
edures to obtain pulse
sequen
es that suppress 1/fα noise for quantum mem-
http://arxiv.org/abs/0704.0771v1
mailto:pekko.kuopanportti@tkk.fi
ory and for a NOT gate. Comparisons with 
omposite
pulses designed to eliminate systemati
 errors and with
refo
using pulses demonstrate that the numeri
ally opti-
mized pulse sequen
es yield the highest �delities.
The remainder of this paper is organized as follows.
In Se
. II, we show how to e�
iently approximate the
1/fα noise by a multi-state Markovian �u
tuator. In
Se
. III, we de�ne the �delity of qubit operations, re-
view the master equations des
ribing the average evolu-
tion of the qubit density matrix in the presen
e of the
noise and des
ribe the numerial optimization pro
edure.
Se
tions IV and V present optimized 
ontrol pulse se-
quen
es and the a
hieved �delities for quantum memory
and for the NOT gate, respe
tively. Finally, Se
. VI 
on-

ludes and indi
ates further appli
ations of the method.
II. ONE-QUBIT SYSTEM SUBJECT TO 1/fα
NOISE
We 
onsider a one-qubit system des
ribed by the e�e
-
tive Hamiltonian
a(t)σx +
η(t)σz , (1)
where a(t) ∈ [−amax, amax] is the external 
ontrol �eld
applied along the x dire
tion and η(t) is the 
lassi
al
noise signal perturbing the system along the z dire
tion.
The noise sour
e η(t) 
an be 
hara
terized by its auto-

orrelation fun
tion
C(t) ≡ 〈η(0)η(t)〉 = lim
∫ T/2
η(s)η(s+ t) ds, (2)
the Fourier transformation of whi
h de�nes the noise
power spe
tral density as
S(f) =
C(t)e−i2πftdt. (3)
For a single RTN sour
e with the amplitude ∆ and 
or-
relation time τc, the auto
orrelation fun
tion is given
by [30℄
(t) = ∆2e−2|t|/τc, (4)
and the 
orresponding power spe
tral density by
(f) =
1 + (πfτc)2
. (5)
A standard way to simulate 1/fα noise is to use
an ensemble of K independent un
orrelated RTN pro-

esses [1, 25, 27℄. Let ηk(t) be a symmetri
 RTN signal
swit
hing between values −∆k and ∆k with the 
orre-
lation time τk ≡ 1/γk, where γk is the transition rate
between the two states. The total noise pro
ess appears
in the Hamiltonian (1) as η(t) =
k=1 ηk(t). Sin
e the
RTN sour
es are independent, Eqs. (2) and (4) yield the
auto
orrelation fun
tion
C(t) =
−2|t|/τk =
−2γk|t|, (6)
and the 
orresponding power spe
tral density is given by
S(f) =
∆2kγk
γ2k + (πf)
. (7)
Introdu
ing the density of transition rates g(γ) and ex-
pressing the noise strength ∆ as a fun
tion of the tran-
sition rate, we 
an repla
e the summation in Eq. (7) by
an integration, whi
h yields
S(f) =
∫ γmax
∆2(γ)g(γ)γ
γ2 + (πf)2
dγ, (8)
where γmin and γmax are minimal and maximal transition
rates, respe
tively. Provided that
∆2(γ)g(γ) = 2A/γ, (9)
where A is a 
onstant, the power spe
tral density in
Eq. (8) be
omes [27℄
S(f) =
arctan
− arctan
, γmin ≪ πf ≪ γmax. (10)
Thus Eq. (10) yields an approximation to the 1/f power
spe
trum. To generate a general 1/fα power spe
tral
density for 2 > α > 0, we 
an 
hoose
∆2(γ)g(γ) = 2Aγ−α (11)
as shown in [27℄.
Although the above method yields a valid approxima-
tion for the 1/fα spe
trum, it is 
omputationally ine�-

ient. In parti
ular, the number of distin
t noise states
in
reases exponentially with the number of RTN �u
tua-
torsK, i.e., the number of terms in the sum of Eq. (7) ap-
proximating the 1/fα noise. Sin
e the size of the di�eren-
tial equation system des
ribing the average qubit dynam-
i
s in
reases linearly with the number of noise states [29℄,
in pra
ti
e one has to restri
t the 
omputation to a rather
small number of independent RTN �u
tuators.
To over
ome this problem, we present a 
on
eptually
di�erent way of generating the desired 1/fα noise spe
-
trum using a single multi-state Markovian �u
tuator.
Consider a 
ontinuous-time Markovian noise pro
ess with
M dis
rete noise states. Let Γkj denote the transition
rate from the jth state to the kth one. In order to pre-
serve total probability, we must have
Γjk = 0 for all k = 1, 2, . . . ,M. (12)
Let us assume that the transition rates are symmetri
,
i.e., Γ = ΓT . Under this assumption the noise pro
ess
has a steady-state solution in whi
h the di�erent noise
states are equally probable. In order for the noise to be
unbiased, i.e., 〈η〉 = 0, the amplitudes bk asso
iated with
the noise states must satisfy
bk = 0. (13)
Thus the auto
orrelation is given by
C(t) = 〈η(t)η(0)〉 =
bT eΓ|t|b. (14)
Sin
e Γ is symmetri
, we 
an diagonalize it with an or-
thogonal matrix V as Γ = V ΛV T , where the real diagonal
matrix Λ = diag{λk}Mk=1 
arries the eigenvalues of Γ in
a des
ending order. De�ning χ := 1√
V T b, we rewrite
Eq. (14) in the form of Eq. (6) as
C(t) = χT eΛ|t|χ =
λk|t|. (15)
In order to use this multi-state Markovian �u
tuator to
approximate 1/fα noise, we have to 
hoose the eigen-
values λk and the amplitudes χk su
h that Eq. (11) is
ful�lled. Moreover, we must 
onstru
t the orthogonal
matrix V su
h that Γ = V ΛV T satis�es Eq. (12), the
amplitudes bk satisfy Eq. (13), and the o�-diagonal ele-
ments of Γ must be non-negative.
One way to satisfy these requirements is to pi
k an
integer m ≥ 2 and set M = 2m and to 
hoose the eigen-
values as
{λk}Mk=1 = −2{0, γmin, γmin + δ, γmin + 2δ, . . . , γmax},
where γmax = (M − 2)δ+ γmin and 0 < δ ≤ γmin. Hen
e,
the distribution of the transition rates g(γ) is uniform
on [γmin, γmax]. Then we set V = H
, where H is the
Hadamard matrix
Expli
it 
al
ulation shows that these 
hoi
es ensure that
Eq. (12) is satis�ed. To ful�ll Eqs. (11) and (13), we set
χ1 = 0 and χk = γ
k for k = 2, . . . , M , where γk
is equal to γmin + (k − 2)δ. It 
an be shown that this

onstru
tion will also produ
e transition matri
es Γ with
non-negative o�-diagonal elements. Hen
e we have pro-
vided an e�
ient way to implement 1/fα noise. Note
that the M -state Markovian �u
tuator, Eq. (15), 
or-
responds formally to Eq. (6) with M − 1 non-vanishing
RTN �u
tuators. Thus we have a
hieved an exponential
improvement in the e�
ien
y of the noise approximation.
Alternatively, we 
an 
hoose the eigenvalues of Γ freely
and obtain a valid matrix V with numeri
al optimization,
0 0.2 0.4 0.6 0.8 1 1.2 1.4
[(2πf)/γ0]
FIG. 1: Logarithm of the power spe
tral density for �ve
independent RTN �u
tuators (dash-dotted line), a multi-state
Markovian sour
e 
orresponding to 31 RTN �u
tuators (solid
line), and an ideal 1/f noise (dotted line). The transition
rates of the RTN �u
tuators are in both 
ases distributed
uniformly on the interval [γ0, 30γ0].
whi
h may result in even more faithful approximation of
1/fα noise.
Figure 1 
ompares the approximation of the spe
-
tral density of 1/f noise generated by independent RTN
sour
es and by a multi-state Markovian sour
e. For the
RTN approa
h, we 
hoose 5 independent noise sour
es,
for whi
h the transition rates γk are uniformly distributed
in the range [γmin, γmax] = [γ0, 30γ0], and the strengths
are given by ∆k = 1/
γk. This yields a �u
tuator
with 32 distin
t noise states. For the multi-state �u
-
tuator, we 
hoose a 32-state noise sour
e, for whi
h
the nonzero eigenvalues λk of its transition rate ma-
trix Γ are distributed uniformly on [−60γ0,−2γ0], and
χk = 1/
−λk/2. Thus the 
ondition in Eq. (9) is satis-
�ed for both of the approa
hes and the multi-state noise
sour
e has an auto
orrelation fun
tion and power spe
-
tral density whi
h are equal to those for a 
ertain ensem-
ble of 31 RTN �u
tuators. We employ representations of
similar 
omputational 
omplexity here in order to be able
to assess the relative a

ura
y for a given 
omputational
e�ort.
Figure 1 shows that an ensemble of �ve RTN pro
esses
is not an a

urate model for 1/f noise, whereas a single
32-state Markovian noise sour
e is quite a

urate, espe-

ially in the range 3γ0 . ω . 16γ0. The poor quality
of the approximation with �ve RTN �u
tuators is due to
the small number of independent noise sour
es employed
here, whereas the 32-state Markovian �u
tuator 
ontains
more parameters and thereby introdu
es more �exibility
in the noise approximation. The frequen
y range over
whi
h the approximation is a

urate is relatively short
if one 
onsiders that the 1/f noise dete
ted in experi-
mental appli
ations often extends over several frequen
y
de
ades. The width of this frequen
y range 
an of 
ourse
be in
reased by in
reasing the width of the region from
whi
h the eigenvalues of the matrix Γ are 
hosen. In
this 
ase, however, the number of dis
rete levels in the
Markovian sour
e must also be in
reased to preserve the
desired a

ura
y. For the main purpose of demonstrating
the feasibility of the numeri
al optimization algorithm,
in the rest of this paper we will 
ontinue to approximate
1/fα noise by a single Markovian noise sour
e with 32
levels.
III. QUBIT DYNAMICS AND CONTROL
In Ref. [24℄, the temporal evolution of the qubit density
matrix was 
al
ulated by averaging over 104�105 unitary
quantum traje
tories, ea
h 
orresponding to a sample
noise path. To ensure a

ura
y, a large number of uni-
tary traje
tories are required, whi
h results in extensive

omputational e�ort. In Ref. [29℄, exa
t deterministi
master equations des
ribing the average temporal evo-
lution of quantum systems under Markovian noise were
derived.
Following Ref. [29℄, we introdu
e a 
onditional density
operator ρk(t) whi
h 
orresponds to the density operator
of the system averaged over all the noise sample paths
o

upying the kth state at the time instant t. The 
on-
ditional density operators are normalized su
h that the
tra
e of the operator ρk(t) yields the probability of the
kth noise state as Pk(t) = Tr [ρk(t)]. The total average
density operator 
an be expressed as
ρ(t) =
ρk(t). (16)
The dynami
s of ρk is obtained from the 
oupled master
equations [29℄
∂tρk(t) =
[Hk(t), ρk(t)] +
Γkjρj(t), (17)
whereHk(t) is the Hamiltonian of the system 
orrespond-
ing to the kth noise state, and Γkj the transition rate
from the jth state to the kth state, as de�ned in Se
. II.
Spe
i�
ally, in our one-qubit 
ase,
Hk(t) =
a(t)σx +
bkσz, (18)
where bk is the noise amplitude of the state k. We shall
use Ea {ρ} to denote the state ρ evolved under the in�u-
en
e of noise and the 
ontrol sequen
e a.
The �delity fun
tion quantifying the overlap between
the desired state ρf and the a
tual a
hieved �nal state is
de�ned as
φ(ρf , Ea {ρ0}) = Tr
Ea {ρ0}
, (19)
where ρ0 is the initial state of the system. To measure
how 
lose the evolution Ea is to the intended quantum
gate operation U , we 
al
ulate the average of the �delity
φ(Uρ0U
†, Ea {ρ0}) over all pure initial states ρ0, and ob-
tain the gate �delity fun
tion [24℄
Φ(U) =
k=x,y,z
†Ea {σk}
. (20)
We aim to �nd the optimal 
ontrol pulses whi
h max-
imize the �delity of the a
hieved quantum operation,
and hen
e apply a typi
al gradient based optimization
algorithm su
h as the gradient as
ent pulse engineering
(GRAPE) method developed in Ref. [31℄. If the 
ontin-
uous pulse pro�les are approximated by pie
ewise 
on-
stant fun
tions, the gradient of the �delity fun
tion with
respe
t to these 
onstant pulse values and durations 
an
be 
al
ulated by the 
hain rule. This gradient is further
used as a proportional adjustment to update the 
ontrol
pulse pro�le. The optimization pro
edure is terminated
when 
ertain desired a

ura
y is a
hieved. Note that
due to the non-
onvex nature of the problem, the gra-
dient based algorithm will only yield a lo
ally optimal
solution. We further employ a multitude of initial 
on-
ditions to �nd a 
ontrol pulse whi
h a
hieves the highest
�delity.
IV. QUANTUM MEMORY
In this se
tion, we fo
us on the implementation of
quantummemory, i.e., the identity operator. For the pur-
pose of 
omparison with the optimized pulse sequen
es,
we introdu
e four other kinds of 
ontrol s
hemes whi
h
generate the identity operator.
The �rst referen
e sequen
e is simply not to apply any
external 
ontrol pulse, i.e., a(t) = 0. This pulse has
no 
ompensation for de
oheren
e or error. The se
ond
referen
e sequen
e is a 
onstant 2π pulse given by
a2π(t) = amax, for t ∈ [0, 2π~/amax]. (21)
The third referen
e sequen
e is the 
omposite pulse se-
quen
e known as 
ompensation for o�-resonan
e with a
pulse sequen
e (CORPSE), whi
h was originally designed
to 
orre
t systemati
 errors in the implementation of one-
qubit quantum operations and to provide high order 
on-
trol proto
ols for systemati
 qubit bias, i.e., for the noise

orrelation time τc → ∞ [32, 33℄. For the identity oper-
ation, the CORPSE pulse sequen
e 
an be obtained as
aSC2π(t) =
amax, for 0 < t
′ < π
−amax, for π ≤ t′ ≤ 3π
amax, for 3π < t
′ < 4π,
where the dimensionless time t′ is de�ned as t′ = amaxt/~.
In the absen
e of noise, the CORPSE sequen
e gen-
erates the identity operator exa
tly although it requires
twi
e as long operation time as a 2π pulse, the se
ond
referen
e pulse above. In the presen
e of small system-
ati
 errors, the CORPSE sequen
e is mu
h more a

urate
than the 2π pulse. For example, 
onsider a state trans-
formation from the north pole ba
k to itself on the Blo
h
sphere. For η(t) ≡ ∆ in Eq. (1), the �delities de�ned in
Eq. (19) 
an be derived to be
φ2π = 1−
, (23)
SC2π = 1− 4π2
. (24)
We observe that the error in the �delity of the 2π pulse
is fourth order in the relative noise strength ∆/amax,
whereas for the CORPSE pulse sequen
e it is eighth or-
der. Thus the CORPSE sequen
e is mu
h more a

urate
than a 2π pulse in 
orre
ting the e�e
ts of systemati
errors on quantum memory.
The fourth standard pulse sequen
e whi
h we take as a
referen
e is the Carr-Pur
ell-Meiboom-Gill (CPMG) [34℄
sequen
e whi
h is designed to preserve qubit 
oheren
e.
In our 
ontext, this sequen
e 
onsists of a π/2 pulse fol-
lowed by multiple π pulses at intervals tp, followed by a
�nal π/2 pulse to bring the system ba
k to the original
state. This pulse sequen
e is designed for T2 measure-
ments on spins, starting from the |0〉 state. Thus one
does not expe
t a CPMG pulse sequen
e to perform as
well if the initial state is averaged over the Blo
h sphere
as is done to 
ompute a gate �delity.
We �rst present the �delities obtained for the iden-
tity operator using the various 
ontrol pulse options in
the presen
e of 1/f noise. The noise is generated here
by the single Markovian noise sour
e dis
ussed in Se
. II,
with transition rates distributed uniformly over the inter-
val [1/τc, 30/τc] . In Fig. 2, the �delities obtained from
optimized 
ontrol pulses, 2π pulse, CORPSE, CPMG,
and zero pulse sequen
es are plotted as fun
tions of the

hara
teristi
 
orrelation time τc of the approximate 1/f
noise. Here, CPMG1 and CPMG2 refer to two CPMG
types of pulses with the intervals between π pulses be-
ing π and 2π, respe
tively. The total duration for these
pulses are all 12π~/a
. The optimal 
ontrol pulse is de-
signed for 6π, and therefore we repeat it twi
e. Similarly,
we repeat the 2π pulse 6 times, the CORPSE sequen
e
3 times, the CPMG1 sequen
e 3 times, and the CPMG2
sequen
e twi
e. The optimal 
ontrol pulse yields 
learly
the highest �delity among all these pulses, whereas the
zero pulse sequen
e has the worst performan
e as there
are no 
orre
tion me
hanisms. Note that due to motional
narrowing, all 
urves approa
h unit �delity in the limit
τc → 0.
The memory a

ess rate is an important spe
i�
ation
in modern 
omputer te
hnology [35℄. In our 
ontext, it

orresponds to the total duration of the 
ontrol pulses.
Figure 3 shows the �delity as a fun
tion of the dura-
tion for the numeri
ally optimized 
ontrol pulses. Equa-
tion (1) implies that in the absen
e of noise, the quantum
system will generate an identity operator for a = amax
0
 5
 10
 15
 20
 25
 30
0.55
0.65
0.75
0.85
0.95
PSfrag repla
ements
τc/(~/amax)
FIG. 2: Fidelity of the quantum memory as a fun
tion of
the 
hara
teristi
 
orrelation time τc for optimized 
ontrol
pulses (bla
k solid), a 2π pulse (bla
k dash-dotted), CORPSE
pulse sequen
e (bla
k dotted), CPMG1 pulse sequen
e (bla
k
dashed), CPMG2 pulse sequen
e (gray solid), zero pulse se-
quen
e (gray dash-dotted). The operation time is 
hosen to be
12π~/a
. The noise is produ
ed by a single 32-state Marko-
vian sour
e with the average strength 〈|η|〉 = 0.125 × amax

orresponding to 31 RTN �u
tuators with the transition
rates uniformly distributed over the region [1/τ
, 30/τ
] and
strengths 
hosen as des
ribed in Se
. II.
0 1 2 3 4 5 6 7 8 9 10
PSfrag repla
ements
T/(π~/amax)
FIG. 3: Fidelity of the quantum memory as a fun
tion of the
operation time for 
ontrol pulses optimized at ea
h point. The
noise is produ
ed by a similar multi-state Markovian sour
e
as in Fig. 2, with τc = 3~/amax.
and the duration T = 2nπ/amax. In Fig. 3, we observe
that, despite an overall de
rease, there are peaks in the
�delity near these operation times. Thus we 
an regard
2nπ/amax as the natural periods for quantum memory,
and we always 
hoose the total duration of 
ontrol pulses

orrespondingly.
Here, we study the relation between the optimized �-
delities a
hieved above and the average noise strength
〈|η|〉, for a �xed value of the 
hara
teristi
 
orrelation
0
 0.1
 0.2
 0.3
 0.4
 0.5
 0.6
 0.7
PSfrag repla
ements
〈|η|〉
FIG. 4: Fidelity of the quantum memory as a fun
tion of the
average absolute noise strength for optimized 
ontrol pulses
(bla
k solid), 2π pulse (bla
k dash-dotted), CORPSE (bla
k
dotted), CPMG1 (bla
k dashed), CPMG2 (gray solid), and
zero (gray dash-dotted). The operation time is 
hosen to be
12π~/a
. Ex
ept for its strength, the noise is produ
ed by
a similar multi-state Markovian sour
e as in Fig. 2 with the

orrelation time τ
= 30~/a
time τc = 30~/amax. Figure 4 shows the �delity as a
fun
tion of the noise strength for the optimized 
ontrol
pulses, 2π pulse, the CORPSE, CPMG1, CPMG2, and
zero pulse sequen
es. At small values of 〈|η|〉 again, the
optimized 
ontrol pulses 
onsistently a
hieve higher �-
delities than all referen
e pulses. However, we note that
if the noise strength ex
eeds ∼0.4, the optimized pulse
sequen
e redu
es to the zero pulse sequen
e, i.e., any
nonzero pulse sequen
e will a
tually deteriorate the �-
delity performan
e.
The dis
ussion above is based on the spe
i�
 noise den-
sity spe
trum 1/fα with α = 1. Figure 5 shows the
�delities of quantum memory for four optimized 
ontrol
pulses, ea
h of whi
h is obtained for a di�erent value of α.
The noise is produ
ed here by a single multi-state Marko-
vian sour
e with average strength 〈|η|〉 = 0.125 × amax,
and the total duration for all 
ontrol pulses are �xed
to 6π. A systemati
 s
aling of the 
orrelation time axis
with respe
t to α is 
learly visible in Fig. 5. This phe-
nomenon is explained by the fa
t that the 
on
entra-
tion of the power spe
trum of 1/fα to high frequen
ies,
i.e., long 
orrelation times, in
reases with α. Hen
e, the

urves s
ale down in τ
with in
reasing α.
V. NOT GATE
In this se
tion, we fo
us on the generation of high-
�delityNOT gates, i.e., the σx operator, under 1/f noise.
As in the 
ase of quantum memory, we 
ompare the nu-
meri
ally optimized results with referen
e pulses. In this
0
 5
 10
 15
 20
 25
 30
0.95
0.955
0.96
0.965
0.97
0.975
0.98
0.985
0.99
0.995
PSfrag repla
ements
τc/(~/amax)
FIG. 5: Fidelity of the quantum memory a
hieved with op-
timized 
ontrol pulses as a fun
tion of the 
hara
teristi
 
or-
relation time τc for 1/f
noise with α = 1 (solid ), α = 1.25
(dotted), α = 1.5 (dash-dotted), and α = 1.75 (dashed). The
operation time is 
hosen to be 6π~/a
. The noise is pro-
du
ed by a similar multi-state Markovian sour
e as in Fig. 2
with variable values of the power α.

ase, our �rst referen
e pulse is the π pulse given by
aπ(t) = amax, for t ∈ [0, π~/amax], (25)
whi
h in the absen
e of noise is the most e�
ient way of
a
hieving a NOT gate. In addition, we will use the two

omposite pulse sequen
es CORPSE and short CORPSE
[32, 33℄ whi
h assume here the form
aCπ(t) =
amax, for 0 < t
′ < π/3
−amax, for π/3 ≤ t′ ≤ 2π
amax, for 2π < t
′ < 13π/3,
aSCπ(t) =
−amax, for 0 < t′ < π/3
amax, for π/3 ≤ t′ ≤ 2π
−amax, for 2π < t′ < 7π/3,
respe
tively. Both of these pulse sequen
es 
orre
t for
systemati
 error, CORPSE being more e�
ient. How-
ever, the operation time of short CORPSE is mu
h
shorter than that of CORPSE, and hen
e it 
an yield
higher �delities in the presen
e of noise.
Figure 6 shows the NOT gate �delities obtained by
the referen
e and optimized pulses in the presen
e of the
same 1/f noise as employed in the analysis of quantum
memory in Se
. IV. We observe that for long enough

orrelation times, the 
omposite pulse sequen
es provide
good error 
orre
tion. Furthermore, as observed earlier
for RTN [24℄, for intermediate 
orrelation times, short
CORPSE a
hieves the highest �delity among the refer-
en
e pulses. Figure 7 presents the pulse sequen
es ob-
tained from the numeri
al optimizations for three dif-
ferent values of the noise 
orrelation time τc. For the
optimized pulse sequen
e, we �nd a transition from an
approximately 
onstant pulse to a short CORPSE -like
pulse sequen
e at 
hara
teristi
 
orrelation time τ
50~/a
. This 
hange in optimal pulse sequen
e is
responsible for the apparent dis
ontinuity in the �rst
derivative of the �delity 
urve in Fig. 6.
0
 50
 100
 150
 200
0.955
0.96
0.965
0.97
0.975
0.98
0.985
0.99
0.995
τc/(~/amax)
FIG. 6: NOT gate �delities as fun
tions of the 
hara
teris-
ti
 noise 
orrelation time τc for a π pulse (dotted), CORPSE
(dash-dotted), short CORPSE (dashed), and gradient opti-
mized pulse sequen
e (solid). The 1/f noise is generated as
in Fig. 2.
These results for the generation of NOT gates under
1/f noise are qualitatively quite similar to the previous
results presented in Refs. [24, 29℄ for a single RTN. This
similarity is due to the fa
t that 1/f noise 
an be re-
garded as arising from a sum of independent RTN �u
-
tuators, ea
h of whi
h having a similar �delity depen-
den
e on their 
orrelation times. Note that the s
ale for
the referen
e 
orrelation time τc of the �delity obtained
in presen
e of 1/f noise in Fig. 6 is somewhat di�erent
from the 
orresponding s
ale for the 
orrelation time of
a single RTN sour
e, sin
e the 1/f noise involves an en-
semble of RTN �u
tuators with a range of 
orrelation
times.
VI. CONCLUSIONS
We have studied a single qubit under the in�uen
e of
1/fα noise for 2 > α > 0 and investigated how de
o-
heren
e due to this noise 
an be suppressed in the im-
plementation of single qubit operations. We presented
an e�
ient way to approximate the noise with a dis
rete
multi-state Markovian �u
tuator. Due to this �nding,
the average temporal evolution of the qubit density ma-
trix under 1/fα noise 
an be e�
iently determined from
a deterministi
 master equation.
Employing these exa
t deterministi
 master equations
des
ribing the temporal evolution of the qubit density
operator under Markovian noise, we applied a gradient
FIG. 7: Optimized pulse sequen
es yielding the highest gate
�delities for 
orrelation times (a) 45~/a
, (b) 100~/a
and (
) 150~/a

orresponding to Fig. 6.
based optimization pro
edure to sear
h for optimal 
on-
trol pulses implementing quantum operations. In par-
ti
ular, we studied the physi
al appli
ation of quantum
memory, i.e., the identity operator, whi
h is a funda-
mental 
on
ept in the realization of a quantum 
om-
puter. The optimized 
ontrol pulses signi�
antly im-
proved the �delity over several referen
e sequen
es su
h
as 2π, CORPSE, CPMG, and zero pulses. We observe
peaks on �delity 
urves 
orresponding to integer multi-
ples of 2π~/a
in the total durations of 
ontrol pulses,
where a
is the maximum magnitude of the external

ontrol �eld. We also studied the performan
e of opti-
mal 
ontrol pulses under 1/fα noise for several di�erent
values of 2 > α ≥ 1, and found a monotoni
 behavior in
the noise frequen
y as a fun
tion of α, i.e., the �delity

urves are s
aled down in the 
orrelation time for in
reas-
ing α. We also investigated how the �delities degraded as
the noise strength in
reases. For the generation of high-
�delity NOT gates, we obtained results showing qualita-
tively similar behavior to the previous results presented
in Refs. [24, 29℄ for a single RTN sour
e. In parti
u-
lar, just as for a single noise sour
e, in the presen
e of
1/fα noise we observed a transition in the optimal 
ontrol
pulse sequen
e from a 
onstant pulse to a CORPSE-like
sequen
e as the noise 
hara
teristi
 
orrelation time τc is
in
reased.
This approa
h of 
oupled master equations indexed by
noise states of the environment, together with an opti-
mization te
hnique for pulse design 
an be readily gen-
eralized to multiple qubits evolving in the presen
e of
1/fα noise and other Markovian noise sour
es. Further-
more, it 
an be used to develop realisti
 pulse sequen
es
for mitigation of nu
lear spin and surfa
e magneti
 noise
a
ting on donor spins implanted in sili
on [11℄, as well
as for suppression of ba
kground 
harge noise a
ting on
super
ondu
ting qubits [4℄. In future, we will study the
implementation of multi-qubit gates, e.g., the 
ontrolled
NOT gate, in noisy systems and the swapping of quan-
tum information from a noisy qubit to long term quan-
tum memory. We will also 
onsider more realisti
 noise
with 1/fα spe
trum over many frequen
y de
ades.
A
knowledgments
This work was supported by the A
ademy of Fin-
land, the National Se
urity Agen
y (NSA) under
MOD713106A and by the NSF ITR program under grant
number EIA-0205641. M. M. and V. B. a
knowledge the
Finnish Cultural Foundation, M. M. the Väisälä founda-
tion and Magnus Ehrnrooth Foundation for the �nan
ial
support. We thank J. Clarke for insightful dis
ussions.
[1℄ L. Faoro and L. Viola, Phys. Rev. Lett. 92, 117905
(2004).
[2℄ E. Paladino, L. Faoro, G. Fal
i, and R. Fazio, Phys. Rev.
Lett. 88, 228304 (2002).
[3℄ G. Fal
i, A. D'Arrigo, A. Mastellone, and E. Paladino,
Phys. Rev. A 70, 040101 (2004).
[4℄ O. Asta�ev, Y. A. Pashkin, Y. Nakamura, T. Yamamoto,
and J. S. Tsai, Phys. Rev. Lett. 96, 137001 (2006).
[5℄ O. Asta�ev, Y. A. Pashkin, Y. Nakamura, T. Yamamoto,
and J. S. Tsai, Phys. Rev. Lett. 93, 267007 (2004).
[6℄ F. C. Wellstood, C. Urbina, and J. Clarke, Apl. Phys.
Lett. 85, 5296 (2004).
[7℄ M. Mü
k, M. Korn, C. G. A. Mugford, J. B. Ky
ia, and
J. Clarke, Apl. Phys. Lett. 86, 012610 (2005).
[8℄ T. M. Eiles, R. L. Kautz, and J. M. Martinis, Apl. Phys.
Lett. 61, 237 (1992).
[9℄ Y. Nakamura, Y. A. Pashkin, T. Yamamoto, and J. S.
Tsai, Physi
a S
ripta 102, 155 (2002).
[10℄ R. de Sousa, unpublished, 
ond-mat/0610716 (2006).
[11℄ T. S
henkel, J. A. Liddle, A. Persaud, A. M. Tyryshkin,
S. A. Lyon, R. de Sousa, K. B. Whaley, J. B. J.
Shangkuan, and I. Chakarov, Apl. Phys. Lett. 8, 11201
(2006).
[12℄ R. de Sousa et al., unpublished (2007).
[13℄ Y. Nakamura, Y. A. Pashkin, T. Yamamoto, and J. S.
Tsai, Phys. Rev. Lett. 88, 047901 (2002).
[14℄ Y. M. Galperin, B. L. Altshuler, J. Bergli, and D. V.
Shantsev, Phys. Rev. Lett. 96, 097009 (2006).
[15℄ B. Savo, F. C. Wellstood, and J. Clarke, Appl. Phys.
Letts. 50, 1757 (1987).
[16℄ R. T. Wakai and D. J. V. Harlingen, Phys. Rev. Lett. 58,
1687 (1987).
[17℄ T. Fujisawa and Y. Hirayama, Appl. Phys. Lett. 77, 543
(2000).
[18℄ C. Kurdak, C.-J. Chen, D. C. Tsui, S. Parihar, S. Lyon,
and G. W. Weimann, Phys. Rev. Lett. 56, 9813 (1997).
[19℄ R. de Sousa, K. B. Whaley, F. K. Wilhelm, and J. von
Delft, Phys. Rev. Lett. 95, 247006 (2005).
[20℄ L. Viola and S. Lloyd, Phys. Rev. A 58, 2733 (1998).
[21℄ L. Viola, S. Lloyd, and E. Knill, Phys. Rev. Lett. 83,
4888 (1999).
[22℄ A. G. Kofman and G. Kurizki, Phys. Rev. Lett. 87,
270405 (2001).
[23℄ A. G. Kofman and G. Kurizki, Phys. Rev. Lett. 93,
130406 (2004).
[24℄ M. Möttönen, R. d. Sousa, J. Zhang, and K. B. Whaley,
Phys. Rev. A 73, 022332 (2006).
[25℄ M. B. Weissman, Rev. Mod. Phys. 60, 537 (1988).
[26℄ E. Paladino, L. Faoro, G. Fal
i, and R. Fazio, Phys. Rev.
Lett. 88, 228304 (2002).
[27℄ B. Kaulakys, V. Gontis, and M. Alaburda, Phys. Rev. E
71, 051105 (2005).
[28℄ Y. M. Galperin, B. L. Altshuler, J. Bergli, and D. V.
Shantsev, Phys. Rev. Lett. 96, 097009 (2006).
[29℄ O.-P. Saira, V. Bergholm, T. Ojanen, and M. Möttönen,
Phys. Rev. A 75, 012308 (2007).
[30℄ M. J. Kirton and M. J. Uren, Advan
es in Physi
s 38
(1989).
[31℄ N. Khaneja, T. Reiss, C. Kehlet, T. S
hulte-Herbrüggen,
and S. J. Glaser, J. Mag. Res. 172, 296 (2005).
[32℄ H. K. Cummins and J. A. Jones, New J. Phys. 2, 1 (2000).
[33℄ H. K. Cummins, G. Llewellyn, and J. A. Jones, Phys.
Rev. A 67, 042308 (2003).
[34℄ S. Meiboom and D. Gill, Rev. S
i. Instr. 29, 688 (1958).
[35℄ J. L. Hennessy and D. A. Patterson, Computer Ar
hi-
te
ture: A Quantitative Approa
h (Morgan Kaufmann,
2006).
ABSTRACT
  We investigate the generation of quantum operations for one-qubit systems
under classical noise with 1/f^\alpha power spectrum, where 2>\alpha > 0. We
present an efficient way to approximate the noise with a discrete multi-state
Markovian fluctuator. With this method, the average temporal evolution of the
qubit density matrix under 1/f^\alpha noise can be feasibly determined from
recently derived deterministic master equations. We obtain qubit operations
such as quantum memory and the NOT}gate to high fidelity by a gradient based
optimization algorithm. For the NOT gate, the computed fidelities are
qualitatively similar to those obtained earlier for random telegraph noise. In
the case of quantum memory however, we observe a nonmonotonic dependency of the
fidelity on the operation time, yielding a natural access rate of the memory.

<|endoftext|><|startoftext|>
Introduction
We consider the flow of an incompressible fluid in a open bounded set Ω ⊂ R2
during the time interval [0, T ]. The velocity field u : Ω × [0, T ] → R2 and the
pressure field p : Ω× [0, T ] → R satisfy the Navier-Stokes equations
∆u+ (u ·∇)u+∇p = f ,(1.1)
div u = 0 ,(1.2)
with the boundary and initial condition
u|∂Ω = 0 , u|t=0 = u0.
The terms ∆u and (u ·∇)u are respectively associated with the physical phenom-
ena of diffusion and convection. The Reynolds number Re measures the influence
of convection in the flow. For equations (1.1)–(1.2), finite element and finite dif-
ference methods are well known and mathematical studies are available (see [10]
for example). Numerous computations have also been conducted with finite vol-
ume schemes (e.g. [14] and [1]). However, in this case, few mathematical results
are available. Let us cite Eymard and Herbin [7] and Eymard, Latché and
Herbin [8]. In order to deal with the incompressibility constraint (1.2), these works
use a penalization method. Another way is to use the projection methods which
have been introduced by Chorin [4] and Temam [15]. This is the case in Faure
[9]. In this work, however, the mesh is made of squares, so that the geometry of
the problem is limited. Therefore, we introduce in what follows a finite volume
scheme on triangular meshes for equations (1.1)–(1.2), using a projection method.
An interesting feature of this scheme is that the unknowns for the velocity and
Received by the editors April 1, 2007 and, in revised form, April 1, 2007.
2000 Mathematics Subject Classification. 76M12, 76B99.
http://arxiv.org/abs/0704.0772v1
2 S. ZIMMERMANN
pressure are both piecewise constant (colocated scheme). It leads to an economic
computer storage, and allows an easy generalization of the scheme to the 3D case.
The layout of the article is the following. We first introduce (section 2) some no-
tations and hypotheses on the mesh. We define (section 2.2) the spaces we use to
approximate the velocity and pressure. We define also (section 2.3) the operators
we use to approximate the differential operators in (1.1)–(1.2). Combining this
with a projection method, we build the scheme in section 3. In order to provide
a mathematical analysis for the scheme, we prove in section 4 that the differential
operators in (1.1)–(1.2) and their discrete counterparts share similar properties. In
particular, the discrete operators for the gradient and the divergence are adjoint.
Also, the discrete gradient operator is a consistent approximation of its continuous
counterpart. The discrete operator for the convection term is positive, stable and
consistent. The discrete operator for the divergence satisfies an inf-sup (Babuška-
Brezzi) condition. From these properties we deduce in section 5 the stability of the
scheme.
We conclude with some notations. The spaces (L2, |.|) and (L∞, ‖.‖∞) are the
usual Lebesgue spaces and we set L20 = {q ∈ L2 ;
q(x) dx = 0}. Their vectorial
counterparts are (L2, |.|) and (L∞, ‖.‖∞) with L2 = (L2)2 and L∞ = (L∞)2. For
k ∈ N∗, (Hk, ‖·‖k) is the usual Sobolev space. Its vectorial counterpart is (Hk, ‖.‖k)
with Hk = (Hk)2. For k = 1, the functions of H1 with a null trace on the
boundary form the spaceH10. Also, we set ∇u = (∇u1,∇u2)T if u = (u1, u2) ∈ H1.
If X ⊂ L2 is a Banach space, we define C(0, T ;X) (resp. L2(0, T ;X)) as the
set of the applications g : [0, T ] → X such that t → |g(t)| is continous (resp.
square integrable). The norms ‖.‖C(0,T ;X) and ‖.‖L2(0,T ;X) are defined respectively
by ‖g‖C(0,T ;X) = supt∈[0,T ] |g(t)| and ‖g‖L2(0,T ;X) =
|g(t)|2 ds
. In all
calculations, C is a generic positive constant, depending only on Ω, u0 and f .
2. Discrete setting
First, we introduce the spaces and the operators needed to build the scheme.
2.1. The mesh. Let Th be a triangular mesh of Ω: Ω = ∪K∈ThK. For each
triangle K ∈ Th, we denote by |K| its area and EK the set of his edges. If σ ∈ EK ,
nK,σ is the unit vector normal to σ pointing outward of K.
The set of edges of the mesh is Eh = ∪K∈ThEK . The length of an edge σ ∈ Eh is |σ|
and its middle point xσ. The set of edges located inside Ω (resp. on its boundary)
is E inth (resp. Eexth ): Eh = E inth ∪ Eexth . If σ ∈ E inth , Kσ and Lσ are the triangles
sharing σ as an edge. If σ ∈ Eexth , only the triangle Kσ inside Ω is defined.
We denote by xK the circumcenter of a triangle K. We assume that the measure
of all interior angles of the triangles of the mesh are below π
, so that xK ∈ K. If
σ ∈ E inth (resp. σ ∈ Eexth ) we set dσ = d(xKσ ,xLσ ) (resp. dσ = d(xσ ,xKσ)). We
define for all edge σ ∈ Eh
(2.1) τσ =
The maximum circumradius of the triangles of the mesh is h. We assume ([6] p.
776) that there exists C > 0 such that
∀σ ∈ Eh, d(xKσ , σ) ≥ C|σ| and |σ| ≥ Ch.
It implies that there exists C > 0 such that
(2.2) ∀σ ∈ Eh , τσ ≥ C ,
A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 3
and for all triangles K ∈ Th we have (with σ ∈ EK and hK,σ the matching altitude)
(2.3) |K| = 1
|σ|hK,σ ≥
|σ| d(xK ,xσ) ≥ C h2.
Lastly, if K ∈ Th and L ∈ Th are two triangles sharing the edge σ ∈ E inth , we define
αK,L =
d(xL,xσ)
d(xK ,xL)
Let us notice that αK,L ∈ [0, 1] and αK,L + αL,K = 1.
2.2. The discrete spaces. We first define
P0 = {q ∈ L2 ; ∀K ∈ Th, q|K is a constant} , P0 = (P0)2.
For the sake of concision, we set for all qh ∈ P0 (resp. vh ∈ P0) and all triangle
K ∈ Th: qK = qh|K (resp. vK = vh|K). Although P0 6⊂ H1, we define the discrete
equivalent of a H1 norm as follows. For all vh ∈ P0 we set
(2.4) ‖vh‖h =
σ∈Eint
τσ |vLσ − vKσ |2 +
σ∈Eext
τσ |vKσ |2
where τσ is given by (2.1). We have [6] a Poincaré-like inequality for P0: there
exists C > 0 such that for all vh ∈ P0
(2.5) |vh| ≤ C ‖vh‖h.
We also have the following inverse inequality.
Proposition 2.1. There exists a constant C > 0 such that for all vh ∈ P0
h ‖vh‖h ≤ C |vh|.
Proof. According to (2.4)
h2 ‖vh‖2h =
σ∈Eint
h2 τσ |vLσ − vKσ |2 +
σ∈Eext
h2 τσ |vKσ |2.
We deduce from (2.2) and (2.3) that h2 τσ ≤ C |Kσ| and h2 τσ ≤ C |Lσ|. Thus,
since |vLσ − vKσ |2 ≤ 2
|vLσ |2 + |vKσ |2
, we get
h2 ‖vh‖2h ≤ C
σ∈Eint
|Kσ| |vKσ |2 + |Lσ| |vLσ |2
σ∈Eext
|Kσ| |vKσ |2.
Hence h2 ‖vh‖2h ≤ C
|K| |vK |2 ≤ C |vh|2.
From the norm ‖.‖h we deduce a dual norm. For all vh ∈ P0 we set
(2.6) ‖vh‖−1,h = sup
(vh,ψh)
‖ψh‖h
For all uh ∈ P0 and vh ∈ P0 we have (uh,vh) ≤ ‖uh‖−1,h ‖vh‖h. Now we introduce
some operators on P0 and P0. We define the projection operator ΠP0 : L
2 → P0
as follows. For all w ∈ L2, ΠP0w ∈ P0 is given by
(2.7) ∀K ∈ Th , (ΠP0w)|K =
w(x) dx.
We easily check that for all w ∈ L2 and vh ∈ P0 we have (ΠP0w,vh) = (w,vh).
It implies that ΠP0 is stable for the L
2 norm. We define also the interpolation
operator Π̃P0 : H
2 → P0. For all q ∈ H2, Π̃P0q ∈ P0 is given by
∀K ∈ Th , Π̃P0q|K = q(xK).
4 S. ZIMMERMANN
According to the Sobolev embedding theorem, q ∈ H2 is a.e. equal to a continuous
function. Therefore the definition above makes sense. We also set Π̃P0 = (Π̃P0 )
The operator Π̃P0 (resp. Π̃P0) is naturally stable for the L
∞ (resp. L∞) norm.
One also checks ([2] and [16]) that there exists C > 0 such that
(2.8) |v −ΠP0v| ≤ C h ‖v‖1 , |q − Π̃P0q| ≤ C h ‖q‖2
for all v ∈ H1 and q ∈ H2.
We introduce the finite element spaces
P d1 = {v ∈ L2 ; ∀K ∈ Th, v|K is affine} ,
1 = {vh ∈ P d1 ; ∀σ ∈ E inth , vh|Kσ(xσ) = vh|Lσ(xσ) ,
Pc1 = {vh ∈ (P d1 )2 ; vh is continuous and vh|∂Ω = 0}.
We have Pc1 ⊂ H10. We define the projection operator ΠPc1 : H
0 → Pc1. For all
v = (v1, v2) ∈ H10, ΠPc1v = (v
h) ∈ Pc1 is given by
∀φh = (φ1h, φ2h) ∈ Pc1 ,
∇vih,∇φih) =
∇vi,∇φih).
The operator ΠPc
is stable for the H1 norm and ([2] p. 110) there exists C > 0
such that for all v ∈ H1
(2.9) |v −ΠPc
v| ≤ C h ‖v‖1.
Let us address now the space Pnc1 . If qh ∈ Pnc1 , we have usually ∇qh 6∈ L2. Thus
we define the operator ∇̃h : Pnc1 → P0 by setting for all qh ∈ Pnc1 and all K ∈ Th
∇̃hqh|K =
∇qh dx.
The associated norm is given by
‖qh‖1,h =
|qh|2 + |∇̃hqh|2
We also have a Poincaré inequality: there exists C > 0 such that for all qh ∈ Pnc1 ∩L20
(2.10) |qh| ≤ C |∇̃hqh|.
We define the projection operator ΠPnc
. For all qh ∈ Pnc1 , ΠPnc1 qh is given by
(2.11) ∀φ ∈ L2 , (ΠPnc
qh, φ) = (qh, φ).
We have the following result.
Proposition 2.2. If qh ∈ P0, ΠPnc
qh is given by
∀σ ∈ E inth , (ΠPnc1 qh)(xσ) =
|Kσ|+ |Lσ|
qKσ +
|Kσ|+ |Lσ|
qLσ ,
∀σ ∈ Eexth , (ΠPnc1 qh)(xσ) = qKσ .
Proof. For all edge σ ∈ Eh, we define the function ψσ ∈ Pnc1 by setting
ψσ(xσ′ ) =
1 if σ = σ′,
0 otherwise.
Let us notice that ψσ vanishes outside Kσ ∪ Lσ if σ ∈ E inth and outside Kσ if
σ ∈ Eexth . Let σ ∈ E inth . Using a quadrature formula we get
(ΠPnc
qh, ψσ) =
(ΠPnc
qh)(xσ)
A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 5
(qh, ψσ) = qKσ
+ qLσ
For an edge σ ∈ Eexth we have (ΠPnc1 qh, ψσ) =
(ΠPnc
qh)(xσ) and (qh, ψσ) =
. By plugging these equations into (2.11) with φ = ψσ, we get the result.
We finally introduce the Raviart-Thomas spaces
={vh ∈ Pd1 ; ∀σ ∈ EK , vh|K · nK,σ is a constant, and vh · n|∂Ω = 0} ,
RT0 ={vh ∈ RTd0 ; ∀K ∈ Th, ∀σ ∈ EK , vh|Kσ · nKσ ,σ = vh|Lσ · nKσ ,σ}.
For all vh ∈ RT0, K ∈ Th and σ ∈ EK we set (vh ·nK,σ)σ = vh|K ·nK,σ. We define
the operator ΠRT0 : H
1 → RT0. For all v ∈ H1, ΠRT0v ∈ RT0 is given by
(2.12) ∀K ∈ Th , ∀σ ∈ EK , (ΠRT0v · nK,σ)σ =
v dσ.
One checks [3] that there exists C > 0 such that for all v ∈ H1
(2.13) |v −ΠRT0v| ≤ C h ‖v‖1.
The following result will be useful.
Proposition 2.3. For all v ∈ H1 such that divv = 0, we have ΠRT0v ∈ P0.
Proof. Let vh = ΠRT0v and K ∈ Th. According to [3] there exists aK ∈ R2 and
bK ∈ R such that: ∀x ∈ K , vh(x) = aK + bK x. Thus divvh|K = 2 bK . On the
other hand, according to the divergence formula and (2.12)
divv dx =
v · n dγ =
vh · n dγ =
divvh dx.
Hence bK = 0 and we get: ∀x ∈ K , vh(x) = aK .
2.3. The discrete operators. The equations (1.1)–(1.2) use the differential op-
erators gradient, divergence and laplacian. Using the spaces of section 2.2 we define
their discrete counterparts. The discrete gradient ∇h : P0 → P0 is built using a
linear interpolation on the edges of the mesh (see [16] for details). This kind of
construction has also be considered in [5]. We set for all qh ∈ P0 and all K ∈ Th
∇h qh|K =
σ∈EK∩E
αKσ,Lσ qKσ + αLσ,Kσ qLσ
σ∈EK∩E
|σ| qKσ nK,σ.(2.14)
We have the following result [16].
Proposition 2.4. If qh ∈ L20 is such that ∇hqh = 0, then qh = 0.
The discrete divergence operator divh : P0 → P0 is built so that it is adjoint to the
operator ∇h (proposition 4.6 below). We set for all qh ∈ P0 and all K ∈ Th
(2.15) divh vh|K =
σ∈EK∩E
αLσ ,Kσ vKσ + αKσ,Lσ vLσ
· nK,σ.
The first discrete laplacian ∆h : P0 → P0 ensures that the incompressibility con-
straint (1.2) is satisfied in a discrete sense (proposition 3.1). We set for all qh ∈ P0
(2.16) ∆hqh = divh(∇hqh).
6 S. ZIMMERMANN
The second discrete laplacian ∆̃h : P0 → P0 is the usual operator in finite volume
schemes [6]. We set for all vh ∈ P0 and all K ∈ Th
∆̃hvh|K =
σ∈EK∩E
τσ (vLσ − vKσ )−
σ∈EK∩E
τσ vKσ .
In order to approximate the convection term (u ·∇)u in (1.1) we define a bilinear
form b̃h : P0 ×P0 → P0 using the well-known upwind scheme ([6] p. 766). For all
uh ∈ P0, vh ∈ P0, and all K ∈ Th we have
(2.17) b̃h(uh,vh)
σ∈EK∩E
(uσ · nK,σ)+ vK + (uσ · nK,σ)− vLσ
We have set uσ = αLσ ,Kσ uKσ + αKσ ,Lσ uLσ and a
+ = max(a, 0), a− = min(a, 0)
for all a ∈ R. Lastly, we define the trilinear form bh : P0 × P0 × P0 → R2 as
follows. For all uh ∈ P0, vh ∈ P0, wh ∈ P0, we set
(2.18) bh(uh,vh,wh) =
|K|wK · b̃h(uh,vh)
3. The scheme
We have defined in section 2 the discretization in space. We now have to define
a discretization in time, and treat the incompressibility constraint (1.2). We use a
projection method to this end. This kind of method has been introduced byChorin
[4] and Temam [15]. The basic idea is the following. The time interval [0, T ] is
split with a time step k: [0, T ] =
n=0[tn, tn+1] with N ∈ N∗ and tn = n k for all
n ∈ {0, . . . , N}. For all m ∈ {2, . . . , N}, we compute (see equation (3.2) below) a
first velocity field ũmh ≃ u(tm) using only equation (1.1). We use a second-order
BDF scheme for the discretization in time. We then project ũmh (see equation (3.4)
below) over a subspace of P0. We get a a pressure field p
h ≃ p(tm) and a second
velocity field umh ≃ u(tm), which fulfills the incompressibilty constraint (1.2) in a
discrete sense. The algorithm goes as follows.
First, for all m ∈ {0, . . . , N}, we set fmh = ΠP0 f(tm). Since the operator ΠP0 is
stable for the L2-norm we get
(3.1) |fmh | = |ΠP0 f(tm)| ≤ |f(tm)| ≤ ‖f‖C(0,T ;L2).
We start with the initial values
u0h ∈ P0 ∩RT0 , u1h ∈ P0 ∩RT0 p1h ∈ P0 ∩ L20.
For all n ∈ {1, . . . , N}, (ũn+1h , p
h ) is deduced from (ũ
h) as follows.
• ũn+1h ∈ P0 is given by
(3.2)
3 ũn+1h − 4unh + u
∆̃hũ
h +b̃h(2u
h−un−1h , ũ
h )+∇hp
h = f
• pn+1h ∈ Pnc1 ∩ L20 is the solution of
(3.3) ∆h(p
h − p
divh ũ
• un+1h ∈ P0 is deduced by
(3.4) un+1h = ũ
∇h(pn+1h − p
A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 7
Existence and unicity of a solution to equation (3.2) is classical ([6] for example).
Let us show that equation (3.3) has also a unique solution. Let qh ∈ P0 ∩ L20 such
that ∆hqh = 0. According to proposition 4.6 we have for all qh ∈ P0
−(∆hqh, qh) = −
divh(∇hqh), qh
= (∇hqh,∇hqh) = |∇hqh|2.
Therefore we have ∇hqh = 0. Using proposition 2.4 we get qh = 0. We have
thus proved the unicity of a solution for equation (3.3). It is also the case for the
associated linear system. It implies that this linear system has indeed a solution.
Hence it is also the case for equation (3.3). Let us now prove that for all m ∈
{0, . . . , N}, umh fulfills (1.2) in a discrete sense.
Lemma 3.1. If vh ∈ RT0 ∩P0 then divh vh = 0.
Proof. Let K ∈ Th. Since vh ∈ RT0, definition (2.15) reads
divh vh|K =
|σ| (αLσ ,K + αK,Lσ )vK · nK,σ.
Since αKσ ,Lσ + αLσ,Kσ = 1 we conclude that
divh vh|K =
|σ|vK · nK,σ = vK ·
|σ|nK,σ
Proposition 3.1. For all m ∈ {0, . . . , N} we have divh umh = 0.
Proof. For m ∈ {0, 1} we have u0h ∈ P0 ∩ RT0 and u1h ∈ P0 ∩RT0. Applying
the lemma above we get the result. If m ∈ {2, . . . , N}, we apply the operator divh
to (3.3) and compare with (3.4).
4. Properties of the discrete operators
We prove that the differential operators in (1.1)–(1.2) and the operators defined
in section 2.3 share similar properties.
4.1. Properties of the discrete convective term. We define b̃ : H1 ×H1 →
L2. For all u ∈ H1 and v = (v1, v2) ∈ H1 we set
(4.1) b̃(u,v) =
div(v1 u), div(v2 u)
We show that the operator b̃h is a consistent approximation of b̃.
Proposition 4.1. There exists a constant C > 0 such that for all v ∈ H2 and all
u ∈ H2 ∩H10 satisfying divu = 0
‖ΠP0b̃(u,v) − b̃h(ΠRT0u, Π̃P0v)‖−1,h ≤ C h ‖u‖2 ‖v‖1.
Proof. Let uh = ΠRT0u and vh = Π̃P0v. According to proposition 2.3 we have
uh ∈ P0. Let K ∈ Th. According to the divergence formula and (2.7) we have
ΠP0b̃(u,v)|K =
σ∈EK∩E
v (u · n) dσ.
On the other hand, let us rewrite b̃h(uh,vh). Let σ ∈ EK ∩ E inth . Setting
vK,Lσ =
vK si (uh · nK,σ)σ ≥ 0
vLσ si (uh · nK,σ)σ < 0
one checks that vK (uσ ·nK,σ)++vLσ (uσ ·nK,σ)− = vK,Lσ (uσ ·nK,σ). By definition
uσ · nK,σ = αLσ,K (uK · nK,σ) + αK,Lσ (uLσ · nK,σ) ; since uh ∈ RT0 we get
8 S. ZIMMERMANN
uσ · nK,σ = (αLσ ,K + αK,Lσ ) (uK · nK,σ) = (uK · nK,σ) = (uh · nK,σ)σ. Using at
last (2.12), we deduce from (2.17)
b̃h(uh,vh)|K =
σ∈EK∩E
vK,Lσ (u · nK,σ) dσ.
ΠP0 b̃(u,v) − b̃h(uh,vh)
σ∈EK∩E
(v − vK,Lσ ) (u · n) dσ.
Let ψh ∈ P0. We have
ΠP0b̃(u,v) − b̃h(uh,vh),ψh
σ∈EK∩E
(v − vK,Lσ ) (u · n) dσ
σ∈Eint
(ψKσ −ψLσ)
(v − vKσ ,Lσ) (u · n) dσ.(4.2)
Let σ ∈ E inth . We want to estimate the integral over σ. Since we work in a two-
dimensional domain, we have the Sobolev injection H2 ⊂ L∞. Thus
(v − vKσ,Lσ) (u · n) dσ
∣∣∣∣ ≤ ‖u‖L∞
|v−vKσ ,Lσ | dσ ≤ C ‖u‖2
|v−vKσ ,Lσ | dσ.
Let us first assume that v ∈ C1. We set
xKσ,Lσ =
xKσ si (uh · nK,σ)σ ≥ 0
xLσ si (uh · nK,σ)σ < 0
If x ∈ σ, we have the following Taylor expansion
v(x)− vKσ,Lσ = v(x)−v(xKσ ,Lσ) =
∇v (tx+(1− t)xKσ ,Lσ) (x−xKσ ,Lσ) dt.
We have |x−xKσ ,Lσ | ≤ h. Thus, integrating over σ and using the Cauchy-Schwarz
inequality, we get
|v − vKσ ,Lσ | dσ ≤
|∇v (tx+ (1− t)xKσ ,Lσ)|2 h
t dt dσ
We then use the change of variable (t,x) → y = tx + (1 − t)xKσ ,Lσ . Let Dσ be
the quadrilateral domain given by the endpoints of σ, xKσ and xLσ . The domain
[0, 1]× σ becomes DKσ,Lσ with
DKσ,Lσ =
Dσ ∩Kσ si (uh · nK,σ)σ ≥ 0
Dσ ∩ Lσ si (uh · nK,σ)σ < 0
For all t ∈ [0, 1] we have h
t ≤ h t ≤ C d(xKσ ,Lσ , σ) t thanks to the hypothesis on
the mesh. We check easily that d(xKσ ,Lσ , σ) t dt dσ = dy. Thus we get
|v − vKσ ,Lσ | dσ ≤ C h
DKσ,Lσ
|∇v (y)|2 dy
A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 9
Since (C1)2 is dense in H2, this estimate still holds for v ∈ H2. Plugging this
estimate into (4.2) and using the Cauchy-Schwarz inequality we get
ΠP0 b̃(u,v) − b̃h(ΠRT0u, Π̃P0v),ψh
≤ C h ‖u‖H2
σ∈Eint
|ψLσ −ψKσ |
σ∈Eint
DKσ,Lσ
|∇v (y)|2 dy
so that
ΠP0 b̃(u,v)− b̃h(ΠRT0u, Π̃P0v),ψh
)∣∣∣ ≤ C h ‖u‖H2 ‖ψh‖1,h ‖v‖1. Using
then definition (2.6), we get the result.
Let us consider now the operator bh. Let u ∈ H1 and v ∈ L∞∩H1 with divu ≥ 0.
Integrating by parts we deduce from (4.1):
v · b̃(u,v) dx =
divu dx ≥ 0.
The discrete operator bh shares a similar property.
Proposition 4.2. Let uh ∈ P0 such that divh uh ≥ 0. For all vh ∈ P0 we have
bh(uh,vh,vh) ≥ 0.
Proof. Remember that for all edges σ ∈ E inth , two triangles Kσ et Lσ share σ as
an edge. We denote by Kσ the one such that uσ · nKσ,σ ≥ 0. Using the algebraic
identity 2 a (a− b) = a2 − b2 + (a− b)2 we deduce from (2.18)
2 bh(uh,vh,vh) = 2
σ∈Eint
|σ|vKσ · (vKσ − vLσ ) (uσ · nKσ,σ)
σ∈Eint
|vKσ|2 − |vLσ |2 + |vKσ − vLσ |2
(uσ · nKσ,σ)
so that 2 bh(uh,vh,vh) ≥
σ∈Eint
|vKσ|2−|vLσ |2
(uσ ·nKσ,σ). This sum can
be written as a sum over the triangles of the mesh. We get
2 bh(uh,vh,vh) ≥
|vKσ |2
σ∈EK∩E
|σ| (uσ · nKσ,σ).
Using finally definition (2.15) we get
2 bh(uh,vh,vh) ≥
|K| |vK |2 (divh uh)|K ≥ 0.
The following result states that the operator bh is stable for suitable norms.
Proposition 4.3. There exists a constant C > 0 such that for all vh ∈ P0, wh ∈
P0, uh ∈ P0 satisfying divh uh = 0
|bh(uh,vh,vh)| ≤ C |uh| ‖vh‖h ‖vh‖h.
Proof. For all triangle K ∈ Th and all edge σ ∈ EK ∩ E inth , we have
(uσ · nK,σ)+ vK + (uσ · nK,σ)− vLσ = (uσ · nK,σ)vK − |(uσ · nK,σ)| (vLσ − vK).
This way, we deduce from (4.7) bh(uh,vh,wh) = S1 + S2 with
vK ·wK
σ∈EK∩E
|σ| (uσ · nK,σ) ,
S2 = −
σ∈EK∩E
|σ| |uσ · nK,σ| (vLσ − vK).
10 S. ZIMMERMANN
By writing the sum over the edges as a sum over the triangles we get
S2 = −
σ∈Eint
|σ| |uσ · nK,σ| (vLσ − vK) · (wLσ −wK).
Using the Cauchy-Schwarz inequality we get
|S2| ≤ h ‖uh‖∞
σ∈Eint
|vLσ − vKσ |2
1/2 
σ∈Eint
|wLσ −wKσ |2
Since uh ∈ P0 we have the inverse inequality [6] h ‖uh‖∞ ≤ C |uh|. Using (2.2)
and (2.4) we have
σ∈Eint
|vLσ − vKσ |2 ≤ C
σ∈Eint
τσ |vLσ − vKσ |2 ≤ C ‖vh‖2h
σ∈Eint
|wLσ −wKσ |2 ≤ C ‖wh‖2h. Therefore |S2| ≤ C |uh| ‖vh‖h ‖wh‖h. On
the other hand we deduce from definition (2.15)
|K| (vK ·wK) (divh uh)|K = 0.
By combining the estimates for S1 and S2 we get the result.
4.2. Properties of the discrete gradient.
Proposition 4.4. There exists a constant C > 0 such that for all qh ∈ P0:
h |∇hqh| ≤ C |qh|.
Proof. Using (2.14) and the Minkowski inequality, we have for all triangle K ∈ Th
|K| |∇hqh |K |2 ≤
σ∈EK∩E
6 |σ|2
(q2K + q
σ∈EK∩E
6 |σ|2
q2K .
Let us sum over K ∈ Th. Since |σ| ≤ h, using (2.3), we get
|∇hqh|2 ≤
σ∈EK∩E
|K| q2K + |Lσ| q2Lσ
σ∈EK∩E
|K| q2K
Thus h2 |∇hqh|2 ≤ C
|K| q2K ≤ C |qh|2.
We now prove that ∇h is a consistent approximation of the gradient.
Proposition 4.5. There exists a constant C > 0 such that for all q ∈ H2
|ΠP0(∇q)−∇h(Π̃P0q)| ≤ C h ‖q‖2.
Proof. Let K ∈ Th. Using the gradient formula and definition (2.14) we get
ΠP0(∇q)−∇h(Π̃P0q)
∇q dx− |K| ∇h(Π̃P0q)
where we have set for all edge σ ∈ EK ∩ E inth
IσK =
αK,Lσ q(xK) + αLσ,K q(xLσ )
nK,σ dσ
A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 11
and for all edge σ ∈ EK ∩ Eexth : IσK =
q − q(xK)
nK,σ dσ. Squaring and using
(2.3) we get
ΠP0(∇q)−∇h(Π̃P0q)
|IσK |2 ≤
|IσK |2.
Summing over the triangles K ∈ Th we get
(4.3)
∣∣∣ΠP0(∇q)−∇h(Π̃P0q)
|IσK |2.
We must estimate the integral terms IσK . Let K ∈ Th. Let us first assume that
q ∈ C2(Ω). Let σ ∈ EK ∩ E inth . For x ∈ σ we have the following Taylor expansions
q(xK) = q(x)+∇q(x) · (xK −x)+
H(q) (txK +(1− t)x)(xK −x) · (xK −x) t dt ,
q(xLσ) = q(x)+∇q(x)·(xLσ −x)+
H(q) (txLσ+(1−t)x)(xLσ−x)·(xLσ−x) t dt ,
∇q(x) = ∇q(xK)−
txK + (1− t)x
(xK − x) dt.
Plugging the last expansion into the two others and integrating over σ we get
(4.4)
q(xK)− q
dσ = |σ| ∇q(xK) · (xK − xσ)−AσK +BσK ,
(4.5)
q(xLσ )− q
dσ = |σ| ∇q(xK) · (xLσ − xσ)−AσLσ + B
We have set for T ∈ {Kσ, Lσ}
(4.6) AσT =
∇∇q (txT + (1 − t)x) (xT − x) dt dσ ,
(4.7) BσT =
H(q) (txT + (1− t)x)(xT − x) · (xT − x) t dt dσ.
One can bound these terms as in the proof of proposition 4.1. We get
(4.8) |AσT |2 ≤ C h2
|∇∇q (y)|2 dy , |BσT |2 ≤ C h4
|H(q)(y)|2 dy.
Now, let us multiply (4.4) by −αK,Lσ nK,σ, (4.5) by −αLσ,K nK,σ and sum the
equalities. Since αLσ,K + αK,Lσ = 1 we have
−αLσ,K
q(xK)− q
nK,σ dσ − αK,Lσ
q(xLσ )− q
nK,σ dσ
αKσ,Lσ q(xK,σ) + αLσ,Kσ q(xL,σ)
nK,σ dσ = I
On the other hand
−αK,Lσ (xK −xσ) ·nK,σ −αLσ,K (xLσ −xσ) ·nK,σ = −αK,Lσ αLσ,K (dσ − dσ) = 0.
Therefore we get IσK = −αLσ,K
AσK +B
nK,σ −αK,Lσ
AσLσ +B
nK,σ. Using
estimates (4.8) we obtain
|IσK |2 ≤ C h4
(|H(q)(y)|2 + |∇∇q(y)|2) dy.
12 S. ZIMMERMANN
We now consider the case σ ∈ EK ∩ Eexth . For x ∈ σ we have
q(xK) = q(x)+∇q(x) · (xK − x)+
H(q)(txK + (1− t)x)(xK − x) · (xK − x)tdt.
Multiplying by nK,σ and integrating over σ, we get −IσK = JσK nK,σ+BσK nK,σ with
JσK =
∇q(x) · (xK − x) dx. Since |xK − x| ≤ h if x ∈ σ, using a trace theorem,
we have
|JσK | ≤ C h2 ‖∇q‖L∞(σ) ≤ C h2
|∇q(y)|2 + |∇∇q(y)|2
By combining this estimate with (4.8), we get
|IσK |2 ≤ 2 |Jσ|2 + 2 |BσK |2 ≤ C h4
|H(q)(y)|2 dy
+ C h4
(|∇q(y)|2 + |∇∇q(y)|2) dy.
The space C2(Ω) is dense in H2. Therefore the bounds for IσK still hold for q ∈ H2.
Plugging these bounds into (4.3) we get the result.
4.3. Properties of the discrete divergence. The operators divergence and gra-
dient are adjoint: if q ∈ H1 and v ∈ H1 with v · n|∂Ω = 0, we get (v,∇q) =
−(q, divv) by integrating by parts. For ∇h and divh we state
Proposition 4.6. If vh ∈ P0 and qh ∈ P0 we have: (vh,∇hqh) = −(qh, divh vh).
Proof. Using (2.14) one checks that (vh,∇hqh) =
qK (S1 + S2 + S3) with
σ∈EK∩E
|σ|αK,Lσ vK · nK,σ , S2 =
σ∈EK∩E
|σ|αK,Lσ vLσ · nLσ,σ ,
and S3 =
σ∈EK∩E
|σ|vK · nK,σ. Since αK,Lσ + αLσ,K = 1 we have
σ∈EK∩E
|σ| (1− αLσ ,K)vK · nK,σ
σ∈EK∩E
|σ|vK · nK,σ −
σ∈EK∩E
|σ|αLσ,K vK · nK,σ.
Since nLσ,σ = −nK,σ, we also have
σ∈EK∩E
|σ|αK,Lσ vLσ · nLσ,σ = −
σ∈EK∩E
|σ|αK,Lσ vLσ · nK,σ.
Therefore
(vh,∇hqh) = −
σ∈EK∩E
|σ| (αL,Kσ vK +αK,Lσ vLσ) ·nK,Lσ +
|σ|vK ·nK,Lσ .
Using definition (2.15) we get
(vh,∇hqh) = −
|K| divh vh|K +
|σ|vK · nK,Lσ .
Since
|σ|vK · nK,Lσ = vK ·
|σ|nK,Lσ = 0 we obtain finally
(vh,∇hqh) = −
qK |K| divh vh|K = −(qh, divh vh).
A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 13
The divergence operator and the spaces L20, H
0 satisfy the following property,
called inf-sup (or Babuška-Brezzi) condition (see [10] for example). There exists a
constant C > 0 such that
(4.9) inf
− (q, divv)
‖v‖1|q|
We will now prove that the operator divh and the spaces P0 ∩ L20, P0 satisfy an
analogous property. The proof is based on the following lemma.
Lemma 4.1. We assume that the mesh is uniform (i.e. the triangles of the mesh
are equilateral). Then we have for all qh ∈ P0
∇hqh = ∇̃h(ΠPnc
Proof. Since the mesh is uniform we have: ∀σ ∈ E inth , αKσ,Lσ = 12 . Let K ∈ Th.
Using definition (2.14) and the gradient formula we get
∇hqh − ∇̃h(ΠPnc
σ∈EK∩E
(qKσ + qLσ)nK,σ
σ∈EK∩E
|σ| qKσ nK,σ −
(ΠPnc
qh)nK,σ dσ.
Since qh ∈ P0 we deduce from proposition 2.2
qh dσ = |σ| (ΠPnc
qh)(xσ) =
(qKσ + qLσ) if σ ∈ E inth ,
|σ| qKσ if σ ∈ Eexth .
Plugging this into the equation above, we get ∇hqh|K = ∇̃h(ΠPnc
qh)|K .
Lemma 4.2. We assume that the mesh is uniform. There exists a constant C > 0
such that
∀ qh ∈ P0 ∩ L20 , sup
vh∈P0\{0}
− (qh, divh vh)
‖vh‖h
≥ C h ‖ΠPnc
qh‖1,h.
Proof. If qh = 0 the result is trivial. Let qh ∈ P0 ∩ L20\{0}. Let vh = ∇hqh ∈
P0\{0}. Using proposition 4.6 we have
−(qh, divhvh) = (vh,∇hqh) = |∇hqh|2 = |∇hqh| |vh|.
Let χΩ be the characteristic function of Ω. Putting ψ = χΩ in (2.11) we get
qh ∈ L20. So according to (2.10) and(4.1) we have
|∇hqh| =
∣∣∣∇̃h(ΠPnc
∣∣∣ ≥ C ‖ΠPnc
qh‖1,h.
On the other hand, according to proposition 2.1: |vh| ≥ C h ‖vh‖h. Therefore
−(qh, divhvh) ≥ C h ‖ΠPnc
qh‖1,h ‖vh‖h.
Proposition 4.7. We assume that the mesh is uniform. There exists a constant
C > 0 such that for all qh ∈ P0 ∩ L20
vh∈P0\{0}
− (qh, divh vh)
‖vh‖h
≥ C |ΠPnc
14 S. ZIMMERMANN
Proof. If qh = 0 the result is clear. Let qh ∈ P0 ∩ L20\{0}. According to (4.9)
there exists v ∈ H10 such that
(4.10) divv = −ΠPnc
qh and ‖v‖1 ≤ C |ΠPnc
We set vh = ΠPc
v. We want to estimate −
qh, divh(ΠP0vh)
. Since ∇hqh ∈ P0
we deduce from proposition 4.6
qh, divh(ΠP0vh)
= (ΠP0vh,∇hqh) = (vh,∇hqh).
Splitting the last term we get
(4.11) −
qh, divh(ΠP0vh)
= (v,∇hqh)− (v − vh,∇hqh).
One one hand, integrating by parts, we get
(v,∇hqh) = −(ΠPnc
qh, divv) +
(ΠPnc
qh) (v · nK,σ) dσ.
According to (4.10) we have −(ΠPnc
qh, divv) = |ΠPnc
qh|2. Moreover
(ΠPnc
qh) (v · nK,σ) dσ =
σ∈Eint
(ΠPnc
qh) (v · nKσ ,σ) dσ
since v|∂Ω = 0. Using [2] p.269 and (4.10) we have
∣∣∣∣∣∣
σ∈Eint
(ΠPnc
qh) (v · nK,σ) dσ
∣∣∣∣∣∣
≤ C h ‖v‖1 ‖ΠPnc
qh‖1,h
≤ C h |ΠPnc
qh| ‖ΠPnc
qh‖1,h.
So we get
(4.12) (v,∇hqh) ≥ (|ΠPnc
qh| − C h ‖ΠPnc
qh‖1,h) |ΠPnc
On the other hand, using lemma 4.1 and the Cauchy-Schwarz inequality
|(v − vh,∇hqh)| = |(v − vh, ∇̃h(ΠPnc
qh))| ≤ |v − vh| |∇̃h(ΠPnc
qh)|.
Using (2.9) and (4.10) we get
|v − vh| = |v −ΠPc
v| ≤ C h ‖v‖1 ≤ C h |ΠPnc
|(v − vh,∇hqh)| ≤ C h |ΠPnc
qh| |∇̃h(ΠPnc
qh)| ≤ C h |ΠPnc
qh| ‖ΠPnc
qh‖1,h.
Let us plug this estimate and (4.12) into (4.11). We get
qh, divh(ΠP0vh)
≥ (|ΠPnc
qh| − C h ‖ΠPnc
qh‖1,h) |ΠPnc
We now introduce the norm ‖.‖h. We have vh = ΠPc
v ∈ Pc1 ⊂ H1. Thus, using
[6] p. 776, we get ‖ΠP0vh‖h ≤ C ‖vh‖1. Since ΠPc1 is stable for the H
1 norm, we
deduce from (4.10)
‖vh‖1 = ‖ΠPc
v‖1 ≤ ‖v‖1 ≤ C |ΠPnc
Therefore ‖ΠP0vh‖h ≤ C |ΠPnc1 qh|. Using this inequality in (4.3) we obtain that
there exists constants C1 > 0 and C2 > 0 such that
qh, divh(ΠP0vh)
C1 |ΠPnc
qh| − C2 h ‖ΠPnc
qh‖1,h
‖ΠP0vh‖h.
We deduce from this
vh∈P0\{0}
− (qh, divh vh)
‖vh‖h
≥ C1 |ΠPnc
qh| − C2 h ‖ΠPnc
qh‖1,h.
A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 15
Let us combine this with lemma 4.2. Since
∀ t ≥ 0 , max
C t , C1 |ΠPnc
qh| − C2 t
≥ C C1
C + C2
|ΠPnc
we get the result.
4.4. Properties of the discrete laplacian. We first prove the coercivity of the
discrete laplacian.
Proposition 4.8. For all uh ∈ P0 and vh ∈ P0 we have
−(∆̃huh,uh) = ‖uh‖2h − (∆̃huh,vh) ≤ ‖uh‖h ‖vh‖h.
Proof. Using definition (2.3) and writing the sum over the triangles as a sum over
the edges, we have
−(∆̃huh,vh) = −
σ∈EK∩E
τσ (uLσ − uK)−
σ∈EK∩E
τσ uK
σ∈Eint
τσ (vLσ − vK) · (uLσ − uK) +
σ∈EK∩E
τσ uK · vK .
We get the first half of the result by taking vh = uh. On the other hand, using the
Cauchy-Schwarz inequality and the algebraic identity a b+c d ≤
a2 + c2
b2 + d2,
we get the second half.
If v ∈ H2, we have |∆v| ≤ ‖v‖2. The operator ∆h shares a similar property.
Proposition 4.9. There exists a constant C > 0 such that for all v ∈ H2
∣∣∣∆̃h(Π̃P0v)
∣∣∣ ≤ C ‖v‖2.
Proof. Let vh = Π̃P0v. Let K ∈ Th. According to definition (2.16)
(4.13) ∆̃hvh|K =
σ∈EK∩E
τσ (v(xLσ )− v(xK))−
σ∈EK∩E
τσ v(xK ).
Let us first assume that v = (v1, v2) ∈ (C∞0 )2. Let i ∈ {1, 2}. If σ ∈ EK ∩ E inth and
x ∈ σ we have the Taylor expansions
vi(xLσ ) = vi(x)+∇vi(x)·(xLσ−x)+
H(vi)(txLσ+(1−t)x)(xLσ−x)·(xLσ−x) t dt ,
vi(xK) = vi(x)+∇vi(x)·(xK−x)+
H(vi)(txK+(1−t)x)(xK−x)·(xK−x) t dt ,
∇vi(x) = ∇vi(xK)−
∇∇vi(txK + (1− t)x)(xK − x) dt.
The notation H(vi) refers to the hessian matrix of vi. Plugging the last expansion
into the two others and integrating over σ, we get
vi(xLσ )− vi(x)
dx = ∇vi(xK) · (xLσ − xσ)−A
vi(xK)− vi(x)
dx = ∇vi(xK) · (xK − xσ)−Aσ,iK +B
The terms A
T and B
T are the same as in (4.6) and (4.7), with vi instead of q.
We substract these equations. Since xLσ − xK = dσ nK,σ we infer from (2.1)
vi(xLσ )− vi(xK)
= ∇vi(xK) · nK,σ +
−Aσ,iLσ +B
16 S. ZIMMERMANN
Let us consider now the case σ ∈ EK ∩Eexth . If x ∈ σ we have the Taylor expansions
vi(xK) = vi(x)+∇vi(x)·(xK−x)+
H(vi)(txK+(1−t)x)(xK−x)·(xK−x) t dt ,
∇vi(x) = ∇vi(xK)−
∇∇vi(txK + (1− t)x)(xK − x) dt.
Since vi ∈ C∞0 we have vi(x) = 0. We plug the last expansion into the other and
integrate over σ. Since xK − xσ = −dσ nK,σ we deduce from (2.1)
−τσ vi(xK) = ∇vi(xK) · nK,σ +
Thus we get
σ∈EK∩E
vi(xLσ )− vi(xK)
σ∈EK∩E
τσ vi(xK)
∇vi(xK) ·
|σ|nK,σ +
where we have set for all edge σ ∈ EK ∩ E inth
−Aσ,iLσ +B
and for all edge σ ∈ EK ∩ Eexth : Riσ = 1dσ
K − B
. Since
|σ|nK,σ = 0,
setting Rσ = (R
σ), we get
σ∈EK∩E
v(xLσ )− v(xK )
σ∈EK∩E
τσ v(xK) =
Since the space (C∞0 )2 is dense in H2, one checks that this equation still holds for
v ∈ H2. Using (4.13) we infer from it
∣∣∣∆̃hvh
∣∣∣∆̃hvh|K
|Rσ|2.
Using estimates (4.6) and (4.7) we obtain
∣∣∣∆̃hvh
|∇∇vi|2 + |H(vi)|2
dx ≤ C ‖v‖22.
5. Stability of the scheme
We now use the results of section 4 to prove the stability of the scheme. We
first show an estimate for the computed velocity (theorem 5.1). We then state a
similar result for the increments in time (lemma 5.2). Using the inf-sup condition
(proposition 4.7), we infer from it some estimates on the pressure (theorem 5.2).
Lemma 5.1. For all m ∈ {0, . . . , N} and n ∈ {0, . . . , N} we have
(umh ,∇hpnh) = 0 , |umh |2 − |ũmh |2 + |umh − ũmh |2 = 0.
Proof. First, using propositions 3.1 and 4.6, we get
(umh ,∇hpnh) = −(pnh, divhumh ) = 0.
Thus we deduce from (3.4)
2 (umh ,u
h − ũmh ) = −
umh ,∇h(pmh − pm−1h )
A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 17
Using the algebraic identity 2 a (a− b) = a2 − b2 + (a− b)2 we get
2 (umh ,u
h − ũmh ) = |umh |2 − |ũmh |2 + |umh − ũmh |2 = 0.
We introduce the following hypothesis on the initial data.
(H1) There exists C > 0 such that |u0h|+ |u1h|+ k|∇hp1h| ≤ C.
Hypothesis (H1) is fulfilled if we set u0h = ΠRT0u0 and we use a semi-implicit
Euler scheme to compute u1h. We have the following result.
Theorem 5.1. We assume that the initial values of the scheme fulfill (H1). For
all m ∈ {2, . . . , N} we have
|umh |2 + k
‖ũnh‖2h ≤ C.
Proof. Let m ∈ {2, . . . , N} and n ∈ {1, . . . ,m− 1}. Taking the scalar product of
(3.2) with 4 k ũn+1h we get
3 ũn+1h − 4unh + u
, 4 k ũn+1h
− 4 k
(∆̃hũ
h , ũ
+4 k bh(2u
h − un−1h , ũ
h , ũ
h ) + 4 k (∇hp
h, ũ
h ) = 4 k (f
h , ũ
h ).(5.1)
First of all, using lemma 5.1, we get as in [12]
ũn+1h ,
3 ũn+1h − 4unh + u
= |un+1h |
2 − |unh|2 + 6 |ũn+1h − u
2 + |2un+1h − u
h |2 − |2unh − un−1h |
+|un+1h − 2u
h + u
According to proposition 4.8 we have − 4 k
(∆̃hũ
h , ũ
h ) =
‖ũn+1h ‖2h. Also,
using lemma 5.1 and (3.4), we have
4 k (∇hpnh, ũn+1h ) = 4 k (∇hp
h, ũ
h − u
(|∇pn+1h |
2 − |∇pnh|2 − |∇pn+1h −∇p
h|2).
Multiplying (3.4) by 4k∇h(pn+1h − pnh) and using the Young inequality we get
|∇(pn+1h − p
h)|2 ≤ 3 |un+1h − ũ
According to proposition 4.2 we have 4 k bh(2u
h − u
h , ũ
h , ũ
h ) ≥ 0. At last
using the Cauchy-Schwarz inequality, (2.5) and (3.1) we have
4 k (fn+1h , ũ
h ) ≤ 4 k |f
h | |ũ
h | ≤ C k ‖f‖C(0,T ;L2) ‖ũ
h ‖h.
Using the Young inequality we get
4 k (fn+1h , ũ
h ) ≤ 3 k ‖ũ
h + C k ‖f‖2C(0,T ;L2).
Let us plug these estimates into (5.1). We get
|un+1h |
2 − |unh|2 + |2un+1h − u
h|2 − |2unh − un−1h |
2 + |un+1h − 2u
h + u
+3 |ũn+1h − u
2 + k ‖ũn+1h ‖
(|∇hpn+1h |
2 − |∇hpnh|2) ≤ C k.
18 S. ZIMMERMANN
Summing from n = 1 to m− 1 we have
|umh |2 + |2umh − um−1h |
2 + 3
|ũn+1h − u
2 + k
‖ũn+1h ‖
|∇hpmh |2
≤ C + 4 |u1h|2 + |2u1h − u0h|2 + k2 |∇hp1h|2.
Using hypothesis (H1) we get the result.
We now want to estimate the computed pressure. From now on, we make the
following hypothesis on the data
f ∈ C(0, T ;L2) , ft ∈ L2(0, T ;L2) , u0 ∈ H2 ∩H10 , divu0 = 0.
For all sequence (qm)m∈N we define the sequence (δq
m)m∈N∗ by setting δq
qm − qm−1 for m ≥ 1. We set δ = (δ)2. If the data u0 and f fulfill a compatibility
condition [13] there exists a solution (u, p) to the equations (1.1)–(1.2) such that
u ∈ C(0, T ;H2) , ut ∈ C(0, T ;L2) , ∇p ∈ C(0, T ;L2).
We introduce the following hypothesis on the initial values of the scheme: there
exists a constant C > 0 such that
(H2) |u0h − u0|+
‖u1h − u(t1)‖∞ + |p1h − p(t1)| ≤ C h , |u1h − u0h| ≤ C k.
One checks easily that this hypothesis implies (H1). We have the following result.
Lemma 5.2. We assume that the initial values of the scheme fulfill (H2). Then
there exists a constant C > 0 such that for all m ∈ {1, . . . , N}
(5.2)
|δumh | ≤ C.
Proof. We prove the result by induction. The result holds for m = 1 thanks to
hypothesis (H2). Let us consider the case m = 2. We set ũ1h = u
h. Let u
h ∈ P0
given by
(5.3) u−1h = 4u
h − 3u1h +
h − 2 k b̃h(u0h, ũ1h)− 2 k∇hp1h − 2 k f1h .
We substract this equation from equation (3.4) written for n = 1. Since
b̃h(2u
h − u0h, ũ2h)− b̃h(u0h, ũ1h) = b̃(2u1h − 2u0h, ũ2h) + b̃h(u0h, δũ2h) ,
upon setting δu0h = u
h − u
h , we get
3 δũ2h − 4 δu1h + δu0h
∆̃h(δũ
h) + b̃h(2u
h − 2u0h, ũ2h) + b̃h(u0h, δũ2h) = δf2h .
Taking the scalar product with 4 k δũ2h we get
3 δũ2h − 4 δu1h + δu0h, δũ2h
∆̃h(δũ
h), δũ
+4 k bh(u
h, δũ
h, δũ
h) + 4 k bh(2u
h − 2u0h, ũ2h, δũ2h) = 4 k (δf2h , δũ2h).(5.4)
According to proposition 4.3 we have
4 k |bh(2u1h − 2u0h, ũ2h, δũ2h)| ≤ C k |2u1h − 2u0h| ‖ũ2h‖h ‖δũ2h‖h ;
so that, using hypothesis (H2)
∣∣bh(2u1h − 2u0h, ũ2h, δũ2h)
∣∣ ≤ C k2 ‖ũ2h‖h ‖δũ2h‖h.
From the Young inequality and theorem 5.1 we deduce
∣∣bh(2u1h − u0h, ũ2h, ũ2h − ũ1h)
∣∣ ≤ k
‖δũ2h‖2h + C k3 ‖ũ2h‖2h ≤
‖δũ2h‖2h + C k2.
A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 19
On the other hand
δf2h = f
h − f1h = ΠP0 f(t2)−ΠP0 f(t1) = ΠP0
(∫ t2
ft(s) ds
Since ΠP0 is stable for the L
2 norm, using the Cauchy-Schwarz inequality, we get
|δf2h | ≤
|ft(s)| ds ≤
(∫ t2
|ft(s)|2 ds
k ‖ft‖L2(0,T ;L2).
4 k |(δf2h , δũ2h)| ≤ 4 k |δf2h | |δũ2h| ≤ C k3/2 |δũ2h|.
So that, using (2.5) and the Young inequality
4 k |(δf2h , δũ2h)| ≤ C k3/2 ‖δũ2h‖h ≤
‖δũ2h‖2h + C k2.
The other terms in (5.4) are dealt with as in the prooof of theroem 5.1. We get
(5.5) |δu2h|2 ≤ |δu1h|2 + |2 δu1h − δu0h|2.
We know ((5.2) for m = 1) that |δu1h|2 ≤ C k2. It remains to estimate the term
|2 δu1h − δu0h|2. According to (5.3)
2 δu1h − δu0h = −δu1h +
h − 2 k b̃h(u0h,u1h)− 2 k∇hp1h − 2 k f1h ;
by taking the scalar product with 2 δu1h − u0h and using the Cauchy-Schwarz in-
equality we get
|2 δu1h − δu0h|2 ≤ 2 k
( |δu1h|
|∆̃hu1h|+ |∇hp1h|+ |f1h |
|2 δu1h − δu0h|
+ 2 k
∣∣b(u0h, ũ
h, 2 δu
h − δu0h)
∣∣ .(5.6)
Let us bound the terms between braces. First, we have
h = ∆̃h
u1h − Π̃P0u(t1)
+ ∆̃h
ΠP0u(t1)
On one hand, according to proposition 4.8
∣∣∣∆̃h
u1h − Π̃P0u(t1)
u1h − Π̃P0u(t1)
, ∆̃h
u1h − Π̃P0u(t1)
≤ ‖∆̃h
u1h − Π̃P0u(t1)
‖h ‖u1h − Π̃P0u(t1)‖h.
Applying proposition 2.1 we get
∣∣∣∆̃h
u1h − Π̃P0u(t1)
u1h − Π̃P0u(t1)
| |u1h − Π̃P0u(t1)|.
Using the embedding L∞ ⊂ L2 we have
|u1h − Π̃P0u(t1)| = |Π̃P0(u1h − u(t1))| ≤ ‖Π̃P0(u1h − u(t1))‖∞ ;
since Π̃P0 is stable for the L
∞ norm, we get using hypothesis (H2)
|u1h − Π̃P0u(t1)| ≤ ‖u1h − u(t1)‖∞ ≤ C h2.
Therefore
∣∣∣∆̃h
u1h − Π̃P0u(t1)
)∣∣∣ ≤ C. And according to proposition 4.9
∣∣∣∆̃h
ΠP0u(t1)
)∣∣∣ ≤ C ‖u(t1)‖ ≤ C ‖u‖C(0,T ;H2).
Hence |∆̃hu1h| ≤ C. Let us now bound the pressure term in (5.6). We have
∇hp1h = ∇h
p1h − Π̃P0p(t1)
Π̃P0p(t1)
− ΠP0∇p(t1)
+ΠP0∇p(t1).
20 S. ZIMMERMANN
According to proposition 4.4 we have
∣∣∣∇h
p1h − Π̃P0p(t1)
)∣∣∣ ≤ Ch |p
h − Π̃P0p(t1)|.
Using (2.8) we get
∣∣∣∇h
p1h − Π̃P0p(t1)
)∣∣∣ ≤ C ‖p(t1)‖2 ≤ C ‖p‖C(0,T ;H2).
Since P0 is stable for the L
2 norm we have |ΠP0∇p(t1)| ≤ |∇p(t1)| ≤ ‖p‖C(0,T ;H1).
Using proposition 4.5 to treat last term we get |∇hp1h| ≤ C. And according to
(3.1) and (5.2) for m = 1 we have
+ |f1h | ≤ C. We are left with the term∣∣bh(u0h, ũ1h, 2 δu1h − δu0h)
∣∣ in (5.6). We use the following splitting
b̃h(u
h) = b̃h(u
h −ΠRT0u0,u1h) + b̃h
ΠRT0u0,u
h − Π̃P0u(t1)
+ b̃h
ΠRT0u0, Π̃P0u(t1)
Let us take the scalar product with 2 δu1h − δu0h. We get
h, 2 δu
h − δu0h) = B1 +B2 +B3
B1 = bh(u
h −ΠRT0u0,u1h, 2 δu1h − δu0h) ,
B2 = bh
ΠRT0u0,u
h − Π̃P0u(t1), 2 δu1h − δu0h
ΠRT0u0, Π̃P0u(t1)
, 2 δu1h − δu0h
Applying propositions 2.1 and 4.3 we have
|B1| ≤
|u0h −ΠRT0u0| ‖u1h‖h |2 δu1h − δu0h|.
According to (2.8) and (2.13) we have have
|u0h−ΠRT0u0| = |ΠP0u0 −ΠRT0u0| ≤ |ΠP0u0 −u0|+ |u0−ΠRT0u0| ≤ C h ‖u0‖1.
According to proposition 4.8 and (2.5)
‖u1h‖2h = −(∆̃hu1h,u1h) ≤ |∆̃hu1h| |u1h| ≤ C |∆̃hu1h| ‖u1h‖h ;
since |∆̃hu1h| is bounded we get ‖u1h‖h ≤ C. Hence |B1| ≤ C |2 δu1h − δu0h|. In a
similar way, using propositions 2.1 and 4.3, we get
|B2| ≤
|ΠRT0u0| |u1h − Π̃P0u(t1)| |2 δu1h − δu0h|.
We have |ΠRT0u0| ≤ |ΠRT0u0 − u0| + |u0| ≤ C h ‖u0‖1 + |u0| ≤ C ‖u0‖1. Using
moreover (5) we get |B2| ≤ C |2 δu1h − δu0h|. Lastly using the following splitting
ΠRT0u0, Π̃P0u(t1)
ΠRT0u0, Π̃P0u(t1)
−ΠP0 b̃
u0,u(t1)
+ ΠP0 b̃
u0,u(t1)
we have B3 = B31 +B32 with
B31 =
ΠRT0u0, Π̃P0u(t1)
−ΠP0 b̃
u0,u(t1)
, 2δu1h − δu0h
B32 =
ΠP0 b̃
u0,u(t1)
, 2δu1h − δu0h
We have
|B31| ≤ ‖b̃h
ΠRT0u0, Π̃P0u(t1)
−ΠP0 b̃
u0,u(t1)
‖−1,h ‖2δu1h − δu0h‖h
A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 21
So that, using proposition 4.1 |B31| ≤ C h ‖u0‖2 ‖u(t1)‖2 ‖2 δu1h − δu0h‖h. Using
proposition 2.1 we obtain
|B31| ≤ C ‖u0‖2 ‖u‖C(0,T ;H2) |2 δu1h − δu0h|.
Let us now bound B32. Using the Cauchy-Schwarz inequality and the stability of
ΠP0 for the L
2 norm, we have
|B32| ≤
∣∣∣ΠP0 b̃
u0,u(t1)
)∣∣∣ |2 δu1h − δu0h| ≤
∣∣∣b̃
u0,u(t1)
)∣∣∣ |2 δu1h − δu0h|.
Integrating by parts, we deduce from (4.1)
∣∣∣b̃
u0,u(t1)
)∣∣∣ ≤
|u0 · ∇ui(t1)| ≤ |u0| ‖u(t1)‖2 ≤ C |u0| ‖u‖C(0,T ;H2).
Thus |B32| ≤ C |2 δu1h − δu0h|. By gathering the estimates for B1, B2, B3 we get
∣∣bh(u0h,u
h, 2 δu
h − δu0h)
∣∣ ≤ C.
Thus we have bounded the right-hand side in (5.6). We infer from it
|2 δu1h − δu0h| ≤ C k.
Plugging this estimate into (5.5) and using (5.2) for m = 1, we get (5.2) for m = 2.
Let m ∈ {3, . . . , N − 1}. We assume that the induction hypothesis is satisfied up
to rank n = m− 1. Let us substract equation (3.2) with the same for n− 1. Since
the operator b̃h is bilinear we get
3 δũn+1h − 4 δunh + δu
∆̃h(δũ
h ) + b̃h(2 δu
h − δun−1h , ũ
+ b̃h(2u
h − un−1h , δũ
h ) +∇h(δp
h) = δf
Let us take the scalar product with 4 k δũn+1h . We get
3 δũn+1h − 4 δunh + δu
, 4 k δũn+1h
− 4 k
∆̃h(δũ
h ), δũ
+4 k bh(2 δu
h − δun−1h , ũ
h , δũ
h ) + 4 k bh(2u
h − un−1h , δũ
h , δũ
∇h(δpnh), δũn+1h
= 4 k (δfn+1h , δũ
According to proposition 4.3 we have
∣∣4 k bh(2 δunh − δu
h , ũ
h , δũ
∣∣ ≤ C k |2 δunh − δu
h | ‖ũ
h ‖h ‖δũ
h ‖h.
Using the induction hypothesis we get
∣∣4 k bh(2 δunh − δu
h , ũ
h , δũ
∣∣ ≤ C k2 ‖ũn+1h ‖h ‖δũ
h ‖h.
Using the Young inequality and (5.1) we infer that
∣∣4 k bh(2 δunh − δun−1h , ũ
h , δũ
∣∣ ≤ k
‖δũn+1h ‖
h + C k
The other terms are treated like the case m = 2. We finally obtain (5.2).
Theorem 5.2. We assume that the initial values of the scheme fulfull (H2). There
exists a constant C > 0 such that for all m ∈ {2, . . . , N}
|ΠPnc
pnh|2 ≤ C.
22 S. ZIMMERMANN
Proof. Let m ∈ {2, . . . , N}. We set n = m− 1. Using the inf-sup condition (4.7)
and proposition 4.6, we get that there exists vh ∈ P0\{0} such that
(5.7) C ‖vh‖h |ΠPnc
h | ≤ −(p
h , divh vh) = (∇hp
h ,vh).
Plugging (3.4) into (3.2) we have
∇hpn+1h = −
3un+1h − 4unh + u
∆̃hũ
h − b̃h(2u
h − un−1h , ũ
h ) + f
so that
(∇hpn+1h ,vh) = −
3un+1h − 4unh + u
∆̃hũ
h ,vh
− bh(2unh − un−1h , ũ
h ,vh) + (f
h ,vh).
Using the Cauchy-Schwarz inequality, (2.5) and (3.1) we have
3un+1h − 4unh + u
)∣∣∣∣ ≤ C
3un+1h − 4unh + u
∣∣∣∣ ‖vh‖h
(fn+1h ,vh) ≤ |f
h | |vh| ≤ C |vh| ≤ C ‖vh‖h ,
Thanks to proposition 4.3 and theorem 5.1 we have
∣∣bh(2unh − un−1h , ũ
h ,vh)
2 |unh|+ |un−1h |
‖ũn+1h ‖h ‖vh‖h ≤ C ‖ũ
h ‖h ‖vh‖h.
And according to proposition 4.8 we have
∆̃hũ
h ,vh
≤ ‖ũn+1h ‖h ‖vh‖h. Thus
(∇hpn+1h ,vh) ≤ C + C
|3un+1h − 4unh + u
+ ‖ũn+1h ‖h
‖vh‖h.
Comparing with (5.7) we get
|ΠPnc
h | ≤ C + C
|3un+1h − 4unh + u
+ ‖ũn+1h ‖h
Squaring and summing from n = 1 to m− 1 we obtain
|ΠPnc
pnh|2 ≤ C + C k
|3un+1h − 4unh + u
+ C k
‖ũn+1h ‖
The last term on the right-hand side is bounded, thanks to theorem 5.1. And since
3un+1h − 4u
h + u
h = 3(u
h − u
h)− (unh − u
h ) = 3 δu
h − δu
we deduce from lemma 5.2
|3un+1h − 4unh + u
≤ C k
|δunh|2
References
[1] S. Boivin , F. Cayre, J. M. Herard, A finite volume method to solve the Navier-Stokes
equations for incompressible flows on unstructured meshes, Int. J. Therm. Sci., 39 (2000)
806-825.
[2] S. C. Brenner, L. R. Scott, The Mathematical Theory of Finite Element Methods, Springer,
2002.
[3] F. Brezzi, M. Fortin, Mixed and Hybrid Finite Element Methods, Springer-Verlag, 1991.
[4] J. Chorin, On the convergence of discrete approximations to the Navier-Stokes equations,
Math. Comp. 23 (1969) 341-353.
A COLOCATED FINITE VOLUME SCHEME FOR THE NAVIER-STOKES EQUATIONS 23
[5] R. Eymard, T. Gallouët, R. Herbin, A cell-centered finite-volume approximation for
anisotropic diffusion operators on unstructured meshes in any space dimension, IMA J. Nu-
mer. Anal. 26 (2006) 326-353.
[6] R. Eymard, T. Gallouët and R. Herbin, Finite volume methods. In Handbook of Numerical
Analysis, P.G. Ciarlet and J.L. Lions eds, North-Holland, 2000.
[7] R. Eymard and R. Herbin, A staggered finite volume scheme on general meshes for the
Navier-Stokes equations in two space dimensions, Int.J. Finite Volumes (2005).
[8] R. Eymard, J. C. Latché and R. Herbin, Convergence analysis of a colocated finite volume
scheme for the incompressible Navier-Stokes equations on general 2D or 3D meshes, preprint
LATP (2004).
[9] S. Faure, Stability of a colocated finite volume scheme for the Navier-Stokes equations, Num.
Methods Partial Differential Equations 21(2) (2005) 242-271.
[10] V. Girault and P. A. Raviart, Finite Element Methods for Navier-Stokes Equations: Theory
and Algorithms, Springer-Verlag, 1986.
[11] J.L. Guermond, Some implementations of projection methods for Navier-Stokes equations,
M2AN 30(5) (1996) 637-667.
[12] J. L. Guermond, Un résultat de convergence l’ordre deux en temps pour l’approximation des
équations de Navier-Stokes par une technique de projection, M2AN 33(1) (1999) 169-189.
[13] J. G. Heywood and R. Rannacher, Finite element approximation of the nonstationary Navier-
Stokes problem. I. Regularity of solutions and second-order error estimates for spatial dis-
cretization, SIAM J. Numer. Anal., 19(26) (1982) 275-311.
[14] D. Kim and H. Choi, A second-order time-accurate finit volume method for unsteady incom-
pressible flow on hybrid unstructured grids, J. Comput. Phys. 162 (2000) 411-428.
[15] R. Temam, Sur l’approximation de la solution des équations de Navier-Stokes par la méthode
de pas fractionnaires II, Arch. Ration. Mech. Anal. 33 (1969) 377-385.
[16] S. Zimmermann, Étude et implémentation de méthodes de volumes finis pour les fluides
incompressibles, PhD, Blaise Pascal University, 2006.
Department of Mathematics, Centrale Lyon University, 63177 Ecully, FRANCE
E-mail : Sebastien.Zimmermann@ec-lyon.fr
ABSTRACT
  We introduce a finite volume scheme for the two-dimensional incompressible
Navier-Stokes equations. We use a triangular mesh. The unknowns for the
velocity and pressure are both piecewise constant (colocated scheme). We use a
projection (fractional-step) method to deal with the incompressibility
constraint. We prove that the differential operators in the Navier-Stokes
equations and their discrete counterparts share similar properties. In
particular, we state an inf-sup (Babuska-Brezzi) condition. We infer from it
the stability of the scheme.

<|endoftext|><|startoftext|>
Introduction to
Econophysics (Cambridge University Press, Cambridge,
1999).
[2] J. P. Bouchaud and M. Potters, Theory of Financial Risk
and Derivative Pricing (Cambridge University Press,
Cambridge, 2003), 2nd ed.
[3] I. Kondor and J. Kertesz, eds., Econophysics: An Emerg-
ing Science (Kluwer, Dordrecht, 1999).
[4] A. Chatterjee and B. K. Chakrabarti, eds., Econophysics
of Stock and other Markets (Springer, Milan, 2006).
[5] T. Lux, Applied Financial Economics 6, 463 (1996).
[6] V. Plerou, P. Gopikrishnan, L. A. Nunes Amaral,
M. Meyer, and H. E. Stanley, Phys. Rev. E 60, 6519
(1999).
[7] R. K. Pan and S. Sinha, Europhys. Lett. 77, 58004
(2007).
[8] L. Laloux, P. Cizeau, J. P. Bouchaud, and M. Potters,
Phys. Rev. Lett. 83, 1467 (1999).
[9] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A.
Nunes Amaral, and H. E. Stanley, Phys. Rev. Lett. 83,
1471 (1999).
[10] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A.
Nunes Amaral, T. Guhr, and H. E. Stanley, Phys. Rev.
E 65, 066126 (2002).
[11] A. Utsugi, K. Ino, and M. Oshikawa, Phys. Rev. E 70,
026110 (2004).
[12] P. Gopikrishnan, B. Rosenow, V. Plerou, and H. E. Stan-
ley, Phys. Rev. E 64, 035106(R) (2001).
[13] D. H. Kim and H. Jeong, Phys. Rev. E 72, 046133 (2005).
[14] L. Giada and M. Marsili, Phys. Rev. E 63, 061101 (2001).
[15] H. M. Markowitz, Portfolio Selection: : Efficient Diver-
sification of Investments (John Wiely & Sons, Inc., New
York, 1959).
[16] R. N. Mantegna, Eur. Phys. Jour. B 11, 193 (1999).
[17] J. P. Onnela, A. Chakraborti, K. Kaski, and J. Kertesz,
Eur. Phys. Jour. B 30, 285 (2002).
[18] J. P. Onnela, A. Chakraborti, K. Kaski, J. Kertesz, and
A. Kanto, Phys. Rev. E 68, 056110 (2003).
[19] R. Morck, B. Yeung, and W. Yu, Journal of Financial
Economics 58, 215 (2000).
[20] D. Wilcox and T. Gebbie, Physica A 375, 584 (2007).
[21] S. Sinha and R. K. Pan, Econophysics of Stock and Other
Markets (Springer, Milan, 2006), chap. The power (Law)
of Indian markets: Analysing NSE and BSE trading
statistics, pp. 24–34.
[22] V. Kulkarni and N. Deo, Econophysics of Stock and Other
Markets (Springer, Milan, 2006), chap. A random matrix
approach to volatility in an Indian financial market, pp.
35–48.
[23] S. Cukur, M. Eryigit, and R. Eryigit, Physica A 376, 555
(2007).
[24] A. Durnev, K. Li, R. Morck, and B. Yeung, The Eco-
nomics of Transition 12, 593 (2004).
[25] R. K. Pan and S. Sinha, physics/0607014 (2006).
[26] P. Gopikrishnan, V. Plerou, L. A. Nunes Amaral,
M. Meyer, and H. E. Stanley, Phys. Rev. E 60, 5305
(1999).
[27] J. D. Noh, Phys. Rev. E 61, 5981 (2000).
[28] F. Lillo and R. N. Mantegna, Phys. Rev. E 72, 016219
(2005).
[29] H. E. Roman, M. Albergante, M. Colombo, F. Croccolo,
F. Marini, and C. Riccardi, Phys. Rev. E 73, 036129
(2006).
[30] Tech. Rep., National Stock Exchange (2004).
[31] http://www.nseindia.com/.
[32] http://finance.yahoo.com/.
[33] A. M. Sengupta and P. P. Mitra, Phys. Rev. E 60, 3389
(1999).
[34] L. Bachelier, Annales Scientifiques de l’École Normale
Supérieure Sér 3, 21 (1900).
TABLE I: The list of 201 stocks of NSE analyzed in this paper.
i Company Sector i Company Sector i Company Sector
1 UCALFUEL Automobiles Transport 68 IBP Energy 135 HIMATSEIDE Industrial
2 MICO Automobiles Transport 69 ESSAROIL Energy 136 BOMDYEING Industrial
3 SHANTIGEAR Automobiles Transport 70 VESUVIUS Energy 137 NAHAREXP Industrial
4 LUMAXIND Automobiles Transport 71 NOCIL Basic Materials 138 MAHAVIRSPG Industrial
5 BAJAJAUTO Automobiles Transport 72 GOODLASNER Basic Materials 139 MARALOVER Industrial
6 HEROHONDA Automobiles Transport 73 SPIC Basic Materials 140 GARDENSILK Industrial
7 MAHSCOOTER Automobiles Transport 74 TIRUMALCHM Basic Materials 141 NAHARSPG Industrial
8 ESCORTS Automobiles Transport 75 TATACHEM Basic Materials 142 SRF Industrial
9 ASHOKLEY Automobiles Transport 76 GHCL Basic Materials 143 CENTENKA Industrial
10 M&M Automobiles Transport 77 GUJALKALI Basic Materials 144 GUJAMBCEM Industrial
11 EICHERMOT Automobiles Transport 78 PIDILITIND Basic Materials 145 GRASIM Industrial
12 HINDMOTOR Automobiles Transport 79 FOSECOIND Basic Materials 146 ACC Industrial
13 PUNJABTRAC Automobiles Transport 80 BASF Basic Materials 147 INDIACEM Industrial
14 SWARAJMAZD Automobiles Transport 81 NIPPONDENR Basic Materials 148 MADRASCEM Industrial
15 SWARAJENG Automobiles Transport 82 LLOYDSTEEL Basic Materials 149 UNITECH Industrial
16 LML Automobiles Transport 83 HINDALC0 Basic Materials 150 HINDSANIT Industrial
17 VARUNSHIP Automobiles Transport 84 SAIL Basic Materials 151 MYSORECEM Industrial
18 APOLLOTYRE Automobiles Transport 85 TATAMETALI Basic Materials 152 HINDCONS Industrial
19 CEAT Automobiles Transport 86 MAHSEAMLES Basic Materials 153 CARBORUNIV Industrial
20 GOETZEIND Automobiles Transport 87 SURYAROSNI Basic Materials 154 SUPREMEIND Industrial
21 MRF Automobiles Transport 88 BILT Basic Materials 155 RUCHISOYA Industrial
22 IDBI Financial 89 TNPL Basic Materials 156 BHARATFORG Industrial
23 HDFCBANK Financial 90 ITC Consumer Goods 157 GESHIPPING Industrial
24 SBIN Financial 91 VSTIND Consumer Goods 158 SUNDRMFAST Industrial
25 ORIENTBANK Financial 92 GODFRYPHLP Consumer Goods 159 SHYAMTELE Telecom
26 KARURVYSYA Financial 93 TATATEA Consumer Goods 160 ITI Telecom
27 LAKSHVILAS Financial 94 HARRMALAYA Consumer Goods 161 HIMACHLFUT Telecom
28 IFCI Financial 95 BALRAMCHIN Consumer Goods 162 MTNL Telecom
29 BANKRAJAS Financial 96 RAJSREESUG Consumer Goods 163 BIRLAERIC Telecom
30 RELCAPITAL Financial 97 KAKATCEM Consumer Goods 164 INDHOTEL Services
31 CHOLAINV Financial 98 SAKHTISUG Consumer Goods 165 EIHOTEL Services
32 FIRSTLEASE Financial 99 DHAMPURSUG Consumer Goods 166 ASIANHOTEL Services
33 BAJAUTOFIN Financial 100 BRITANNIA Consumer Goods 167 HOTELEELA Services
34 SUNDARMFIN Financial 101 SATNAMOVER Consumer Goods 168 FLEX Services
35 HDFC Financial 102 INDSHAVING Consumer Goods 169 ESSELPACK Services
36 LICHSGFIN Financial 103 MIRCELECTR Consumer Discretonary 170 MAX Services
37 CANFINHOME Financial 104 SURAJDIAMN Consumer Discretonary 171 COSMOFILMS Services
38 GICHSGFIN Financial 105 SAMTEL Consumer Discretonary 172 DABUR Health Care
39 TFCILTD Financial 106 VDOCONAPPL Consumer Discretonary 173 COLGATE Health Care
40 TATAELXSI Technology 107 VDOCONINTL Consumer Discretonary 174 GLAXO Health Care
41 MOSERBAER Technology 108 INGERRAND Consumer Discretonary 175 DRREDDY Health Care
42 SATYAMCOMP Technology 109 ELGIEQUIP Consumer Discretonary 176 CIPLA Health Care
43 ROLTA Technology 110 KSBPUMPS Consumer Discretonary 177 RANBAXY Health Care
44 INFOSYSTCH Technology 111 NIRMA Consumer Discretonary 178 SUNPHARMA Health Care
45 MASTEK Technology 112 VOLTAS Consumer Discretonary 179 IPCALAB Health Care
46 WIPRO Technology 113 KECINTL Consumer Discretonary 180 PFIZER Health Care
47 BEML Technology 114 TUBEINVEST Consumer Discretonary 181 EMERCK Health Care
48 ALFALAVAL Technology 115 TITAN Consumer Discretonary 182 NICOLASPIR Health Care
49 RIIL Technology 116 ABB Industrial 183 SHASUNCHEM Health Care
50 GIPCL Energy 117 BHEL Industrial 184 AUROPHARMA Health Care
51 CESC Energy 118 THERMAX Industrial 185 NATCOPHARM Health Care
52 TATAPOWER Energy 119 SIEMENS Industrial 186 HINDLEVER Miscellaneous
53 GUJRATGAS Energy 120 CROMPGREAV Industrial 187 CENTURYTEX Miscellaneous
54 GUJFLUORO Energy 121 HEG Industrial 188 EIDPARRY Miscellaneous
55 HINDOILEXP Energy 122 ESABINDIA Industrial 189 KESORAMIND Miscellaneous
56 ONGC Energy 123 BATAINDIA Industrial 190 ADANIEXPO Miscellaneous
57 COCHINREFN Energy 124 ASIANPAINT Industrial 191 ZEETELE Miscellaneous
58 IPCL Energy 125 ICI Industrial 192 FINCABLES Miscellaneous
59 FINPIPE Energy 126 BERGEPAINT Industrial 193 RAMANEWSPR Miscellaneous
60 TNPETRO Energy 127 GNFC Industrial 194 APOLLOHOSP Miscellaneous
61 SUPPETRO Energy 128 NAGARFERT Industrial 195 THOMASCOOK Miscellaneous
62 DCW Energy 129 DEEPAKFERT Industrial 196 POLYPLEX Miscellaneous
63 CHEMPLAST Energy 130 GSFC Industrial 197 BLUEDART Miscellaneous
64 RELIANCE Energy 131 ZUARIAGRO Industrial 198 GTCIND Miscellaneous
65 HINDPETRO Energy 132 GODAVRFERT Industrial 199 TATAVASHIS Miscellaneous
66 BONGAIREFN Energy 133 ARVINDMILL Industrial 200 CRISIL Miscellaneous
67 BPCL Energy 134 RAYMOND Industrial 201 INDRAYON Miscellaneous
ABSTRACT
  To investigate the universality of the structure of interactions in different
markets, we analyze the cross-correlation matrix C of stock price fluctuations
in the National Stock Exchange (NSE) of India. We find that this emerging
market exhibits strong correlations in the movement of stock prices compared to
developed markets, such as the New York Stock Exchange (NYSE). This is shown to
be due to the dominant influence of a common market mode on the stock prices.
By comparison, interactions between related stocks, e.g., those belonging to
the same business sector, are much weaker. This lack of distinct sector
identity in emerging markets is explicitly shown by reconstructing the network
of mutually interacting stocks. Spectral analysis of C for NSE reveals that,
the few largest eigenvalues deviate from the bulk of the spectrum predicted by
random matrix theory, but they are far fewer in number compared to, e.g., NYSE.
We show this to be due to the relative weakness of intra-sector interactions
between stocks, compared to the market mode, by modeling stock price dynamics
with a two-factor model. Our results suggest that the emergence of an internal
structure comprising multiple groups of strongly coupled components is a
signature of market development.

<|endoftext|><|startoftext|>
Introduction
	Visual galaxy classification
	Linking morphology to cluster environment
	Linking morphology to local projected galaxy density
	Linking morphology to projected cluster mass
	Linking morphology to cluster radius
	Linking morphology to photometric classification
	Conclusions
ABSTRACT
  We present a morphological study of galaxies in the A901/902 supercluster
from the COMBO-17 survey. A total of 570 galaxies with photometric redshifts in
the range 0.155 < z_phot < 0.185 are visually classified by three independent
classifiers to M_V=-18. These morphological classifications are compared to
local galaxy density, distance from the nearest cluster centre, local surface
mass density from weak lensing, and photometric classification. At high local
galaxy densities, log(Sigma_10 /Mpc^2) > 1.5, a classical morphology-density
relation is found. A correlation is also found between morphology and local
projected surface mass density, but no trend is observed with distance to the
nearest cluster. This supports the finding that local environment is more
important to galaxy morphology than global cluster properties. The breakdown of
the morphological catalogue by colour shows a dominance of blue galaxies in the
galaxies displaying late-type morphologies and a corresponding dominance of red
galaxies in the early-type population. Using the 17-band photometry from
COMBO-17, we further split the supercluster red sequence into old passive
galaxies and galaxies with young stars and dust according to the prescription
of Wolf et al. (2005). We find that the dusty star-forming population describes
an intermediate morphological group between late-type and early-type galaxies,
supporting the hypothesis that field and group spiral galaxies are transformed
into S0s and, perhaps, ellipticals during cluster infall.

<|endoftext|><|startoftext|>
Introduction
For more than forty years, K-theory has been an essential tool in
studying rings and algebras [1, 7]. Given a ring R, a simple functorial
object associated to R is the abelian group K0(R). There are multi-
ple ways of defining K0(R), but the most useful characterization when
working with operator algebras is to define K0(R) in terms of idempo-
tents (or projections, if an involution is present) in matrix algebras over
R; i.e., elements e in Mk(R) for some k with the feature that e
2 = e
(p = p∗ = p2 in the involutive case). In this paper, we define, for each
natural number n ≥ 2, a group which we denote Kn0 (R). This group
is constructed from matrices e over R with the property that en = e;
we call such matrices n-potents. We define Kn0 (R) for all rings, unital
or not, and show that Kn0 determines a covariant functor from rings to
abelian groups.
Let Q(n− 1) be the cyclotomic field obtained from the rationals by
adjoining the (n− 1)-th roots of unity. We show that Kn0 is half-exact
on the subcategory of Q(n − 1)-algebras, and given any such algebra
A, we show that Kn0 (A) is isomorphic to a direct sum of n − 1 copies
of K0(A). Since a C-algebra A is a Q(n − 1)-algebra for all n, what-
ever invariants are contained in Kn0 (A) are already contained in K0(A).
However, K
0 for p 6= n may generate new groups for cyclotomic alge-
bras, e.g., K40(Q(4))
∼= Z⊕2Z (Theorem 3.15) which is not isomorphic
2010 Mathematics Subject Classification: 18F30, 19A99, 19K99.
http://arxiv.org/abs/0704.0775v2
2 EFTON PARK AND JODY TROUT
to K40 (Q(3))
∼= Z3. Thus, K40 distinguishes between the fields Q(3) and
Q(4), but idempotent, and also tripotent (n = 3), K-theory does not.
The paper is organized as follows. In Section 2, we define various
notions of equivalence on the set of n-potents, and explore the rela-
tionships between these equivalence relations. Most of our results in
this section mirror analogous facts about idempotents, but in many
cases the proofs differ or are more delicate for n-potents. In Section
3, we define n-potent K-theory and study its properties and compute
some examples. Finally, in Section 4, we consider n-homomorphisms
on rings and algebras [2, 3, 4], and show that n-potent K-theory is
functorial for such maps; this is a phenomenon that does not appear
in ordinary idempotent K-theory.
The authors thank Dana Williams and Tom Shemanske for their
helpful comments and suggestions.
Note: Unless stated otherwise, all rings and algebras have a unit;
i.e., a multiplicative identity, and all ring and algebra homomorphisms
are unital.
2. Equivalence of n-potents
Fix a natural number n ≥ 2. In this section, we develop the ba-
sic theory of n-potents, including various equivalence relations among
them. We begin by looking at n-potents over general rings, but even-
tually we will specialize to get a well-behaved theory.
Definition 2.1. Let R be a ring. An element e in R is called an
n-potent if en = e. For n = 2, 3, 4, we use the terms idempotent,
tripotent, and quadripotent, respectively. The set of all n-potents in
R is denoted Pn(R).
We begin with a very simple but useful fact about n-potents:
Lemma 2.2. Suppose e is an n-potent. Then en−1 is an idempotent.
Proof. (en−1)2 = en−1en−1 = enen−2 = een−2 = en−1. �
Definition 2.3. Let e and f be n-potents in a ring R. We say that
e and f are algebraically equivalent and write e ∼a f if there exist
elements a and b in R such that e = ab and f = ba. We say that e and
f are similar and write e ∼s f if there exists an invertible element z
in R with the property that f = zez−1.
Lemma 2.4. Suppose that e and f are algebraically equivalent n-
potents in a ring R. Then the elements a and b described in Definition
K0-THEORY WITH n-POTENTS 3
2.3 can be chosen so that
a = en−1a = afn−1 = en−1afn−1
b = fn−1b = ben−1 = fn−1ben−1.
Proof. Choose elements ã and b̃ in R so that ãb̃ = e and b̃ã = f . Set
a = en−1ãfn−1 and b = fn−1b̃en−1. Using Lemma 2.2, we have
ab = (en−1ãfn−1)(fn−1b̃en−1) = en−1ãfn−1b̃en−1
= en−1(ãb̃)nen−1 = en−1enen−1 = (en−1)2en = en−1e = en = e.
Similarly, ba = f . The two strings of equalities in the statement of the
lemma then follow easily. �
Proposition 2.5. The relations ∼a and ∼s are equivalence relations
on Pn(R).
Proof. The only nonobvious point to establish is that ∼a is transitive.
Let e, f , and g be elements of Pn(R), and suppose that e ∼a f ∼a g.
Choose elements a, b, c and d in R so that e = ab, f = ba = cd, and
g = dc, and set s = afn−2c and t = db. Then
st = afn−2cdb = afn−1b = a(ba)n−1b = (ab)n = en = e
ts = dbafn−2c = dfn−1c = d(cd)n−1c = (dc)n = gn = g.
Proposition 2.6. If e and f are similar n-potents in a ring R, then
they are algebraically equivalent.
Proof. Choose an invertible element z in R such that f = zez−1, and set
a = ez−1 and b = zen−1. Then ab = en = e and ba = zenz−1 = f . �
As is the case with idempotents, algebraic equivalence does not imply
similarity in general. However, we do have the following result, just as
for idempotents:
Proposition 2.7. Suppose that e and f are algebraically equivalent
n-potents in a ring R. Then
in the ring M2(R) of 2× 2 matrices over R.
4 EFTON PARK AND JODY TROUT
Proof. Choose elements a and b in R so that e = ab and f = ba; without
loss of generality, we assume that a and b satisfy the conclusions of
Lemma 2.4. Define
1− fn−1 b
afn−2 1− en−1
1− en−1 en−1
en−1 1− en−1
Straightforward computation yields that both u2 and v2 equal the iden-
tity matrix in M2(R), and thus each is its own inverse. Set z = uv.
Then we compute that
z−1 =
1− fn−1 b
afn−2 1− en−1
1− fn−1 b
afn−2 1− en−1
beafn−2 0
since beafn−2 = b(ab)a(ba)n−2 = (ba)n = fn = f. �
Definition 2.8. We say n-potents e and f in a ring R are orthogonal
if ef = fe = 0, in which case we write e ⊥ f .
The next result follows immediately by mathematical induction.
Proposition 2.9. Let e and f be orthogonal n-potents in a ring R.
Then (e+ f)k = ek + fk. In particular, e+ f is an n-potent.
Proposition 2.10. For i = 1, 2, let ei and fi be algebraically equivalent
n-potents in a ring R. Suppose that e1 and f1 are orthogonal to e2 and
f2, respectively. Then e1 + e2 and f1 + f2 are algebraically equivalent.
Proof. For i = 1, 2, choose ai and bi so that ei = aibi, fi = biai, and so
that ai and bi satisfy the conclusion of Lemma 2.4. Then
a1b2 = a1f
2 b2 = 0.
Similarly, b2a1, a2b1, and b1a2 are also zero. Thus
(a1 + a2)(b1 + b2) = a1b1 + a2b2 = e1 + e2
(b1 + b2)(a1 + a2) = b1a1 + b2a2 = f1 + f2,
whence e1 + e2 is algebraically equivalent to f1 + f2. �
Proposition 2.11. Let e and f be n-potents in a ring R.
K0-THEORY WITH n-POTENTS 5
(b) If e ⊥ f then
e+ f 0
Proof. Define
and b =
0 fn−1
en−1 0
in M2(R). Then
0 fn−1
en−1 0
0 fn−1
en−1 0
which establishes the first part of (a); to obtain the second part, simply
take f to be zero.
To prove (b), first observe that if e ⊥ f , then e + f is an n-potent
by Proposition 2.9. Define
and b =
en−1 fn−1
en−1 fn−1
en efn−1
fen−1 fn
en−1 fn−1
en + fn 0
e + f 0
whence the result follows. �
Later in this paper we will restrict our attention to n-potent K-
theory of cyclotomic algebras:
Definition 2.12. For each integer n ≥ 2, the cyclotomic field Q(n−1)
is the field obtained by adjoining the (n − 1)st primitive root of unity
ζn−1 = e
2πi/(n−1) to the field Q of rational numbers. A cyclotomic
algebra is a Q(n− 1)-algebra for some n ≥ 2.
Observe that Q(n− 1) ⊂ C, and therefore every C-algebra is canon-
ically a Q(n− 1)-algebra for all n.
Definition 2.13. Let F be a field and let A be an F-algebra with unit.
An n-partition of unity is an ordered n-tuple (e0, e1, . . . , en−1) of idem-
potents in A such that
(1) e0 + e1 + · · ·+ en−1 = 1;
6 EFTON PARK AND JODY TROUT
(2) e0, e1, . . . , en−1 are pairwise orthogonal; i.e., ejek = δjkek for all
0 ≤ j, k ≤ n− 1.
Note that e0 = 1 − (e1 + · · · + en−1) is completely determined by
e1, e2, . . . , en−1 and is thus redundant in the notation for an n-partition
of unity.
Cyclotomic algebras admit a distinguished n-partition of unity. Set
ω0 = 0 and let ωk = e
2πi(k−1)/(n−1) for 1 ≤ k ≤ n − 1. Note that
ω1, . . . , ωn−1 are the (n−1)st roots of unity, and Ωn = {ω0, ω1, . . . , ωn−1}
is the set of roots of the polynomial equation xn − x = 0.
Theorem 2.14. Let A be a Q(n − 1)-algebra with unit, and suppose
e is an n-potent in A. Then there exists a unique n-partition of unity
(e0, e1, . . . , en−1) in A such that
ωkek.
Proof. Let p0, p1, . . . , pn−1 ∈ Q(n− 1)[x] be the Lagrange polynomials
pk(x) =
j 6=k(x− ωj)
j 6=k(ωk − ωj)
In particular, p0(x) = 1 − xn−1. Each polynomial pk has degree n − 1
and satisfies pk(ωk) = 1 and pk(ωj) = 0 for all j 6= k. We claim that
for all numbers α ∈ Q(n− 1) ⊆ C,
pk(α) = p0(α) + · · ·+ pn−1(α) = 1
and that
(2) α =
ωkpk(α).
Indeed, these identities follow from the fact that these polynomial equa-
tions have degree n− 1 but are satisfied by the n distinct points in Ωn.
Now, given any ωni = ωi in Ωn it follows that pk(ωi)
2 = pk(ωi).
Hence, for any n-potent e ∈ A, if we define ek = pk(e), then each ek is
an idempotent in A, and Equation (1) implies that
pk(e) = 1.
These idempotents are pairwise orthogonal, because
ejek = pj(e)pk(e) = 0
K0-THEORY WITH n-POTENTS 7
for j 6= k. Finally,
ωkpk(e) =
by Equation (2). �
3. K0-theory with n-potents
We can now proceed to construct our n-potent K-theory groups.
Definition 3.1. Let R be a ring. For all k ≥ 1, let Pnk (R) denote the
set of n-potents in Mk(R), and let ik denote the inclusion
ik(a) =
ofMk(R) intoMk+1(R), as well as its restriction as a map from Pnk (R)
to Pnk+1(R). Define M∞(R) and Pn∞(R) to be the (algebraic) direct
limits
M∞(R) =
Mk(R), Pn∞(R) =
Pnk (R) = Pn(M∞(R)).
We define a binary operation ⊕ on Pn∞(R) as follows: let e and f be
elements of Pn∞(R), choose the smallest natural numbers k and ℓ such
that e ∈Mk(R) and f ∈Ml(R), and set
e⊕ f = diag(e, f) =
∈ Pnk+l(R) ⊂ Pn∞(R).
Definition 3.2. Let R be a ring, and define an equivalence relation ∼
on Pn∞(R) as follows: take e and f in Pn∞(R), and choose a natural
number k sufficiently large that e and f are elements of Pnk (R). Then
e ∼ f if e ∼a f in Mk(R). We let Vn(R) denote the set of equivalence
classes of ∼.
Note that if e = ab and f = ba in Mk(R), then
and therefore the equivalence relation described in Definition 3.2 is
well-defined.
8 EFTON PARK AND JODY TROUT
Note that for any n-potent e, f in M∞(R), we get
Thus, the binary operation ⊕ induces a binary operation + on Vn∞(R)
as follows: take e and f in Pn∞(R), and define
[e] + [f ] = [e⊕ f ] =
This operation is well-defined and commutative by Propositions 2.9
and 2.11.
The next proposition is straightforward and left to the reader.
Proposition 3.3. For every ring R and natural number n ≥ 2, Vn(R)
is an abelian monoid under the addition defined above, and whose iden-
tity element is the class of the zero n-potent. If α : R −→ S is a unital
ring homomorphism, then the induced map Vn(α) : Vn(R) −→ Vn(S)
given by
Vn(α)([(aij)]) = [(α(aij))]
is a well-defined homomorphism of abelian semigroups. The correspon-
dences R 7→ Vn(R) and α 7→ Vn(α) induce a covariant functor from the
category of rings and ring homomorphisms to the category of abelian
monoids and monoid homomorphisms.
Definition 3.4. Let R be a ring and let n ≥ 2 be a natural number.
We define Kn0 (R) to be the Grothendieck completion [6] of the abelian
monoid Vn(R). Given an n-potent e in Pn∞(R), we denote its class in
Kn0 (R) by [e].
In light of Propositions 2.6 and 2.7, we could have alternatively used
similarity to define Vn(R), and hence Kn0 (R).
Proposition 3.5. The assignments R 7→ Kn0 (R) determines a covari-
ant functor from the category of rings and ring homomorphisms to the
category of abelian groups and group homomorphisms.
Proof. Proposition 3.3 states that V is a covariant functor from the
category of rings to the category of abelian monoids, and Grothendieck
completion determines a covariant functor from the category of abelian
monoids to the category of abelian groups; we get the desired result by
composing these two functors. �
The following result shows that for (unital) algebras over a field of
characteristic 6= 2, the tripotent K-theory functor K30 offers us no new
invariants over ordinary idempotent K-theory. However, we will see
later (Theorem 3.15) that the situation is subtly different for K40 .
K0-THEORY WITH n-POTENTS 9
Theorem 3.6. Let F be a field with characteristic 6= 2. If A is a unital
algebra over F then there is a natural isomorphism
K30 (A)
K0(A)
of abelian groups.
Proof. If e = e3 ∈M∞(A) is a tripotent, then one can easily check that
(e2 + e) and e2 =
(e2 − e)
are (unique) idempotents in M∞(A) such that e = e1 − e2. It follows
that we have a natural bijection of abelain monoids
V3(A) → V2(A)⊕ V2(A)
[e] 7→ [e1]⊕ [e2]
with inverse map [e1]⊕[e2] 7→ [e1⊕−e2]. Since these maps are additive,
the result easily follows. �
While Kn0 (R) is well-defined for any ring R, to obtain a well-behaved
theory where the usual exact sequences exist, we must restrict our
attention to a smaller class of rings. The problem is that unlike the
situation for idempotents, it is not generally true that if e is an n-
potent, then so is 1− e. However, given an n-potent in an algebra over
the cyclotomic field Q(n− 1), there is an adequate substitute:
Definition 3.7. Let e be an n-potent in a Q(n − 1)-algebra A, and
write
as in the conclusion of Theorem 2.14. We define an n-potent
ω1(1− e1), ω2(1− e2), . . . , ωn−1(1− en−1)
∈Mn−1(A)
and call e⊥ the complementary n-potent of e.
Observe that if n = 2, this definition agrees with the usual one for
idempotents; i.e., e⊥ = 1− e. Note also that e⊕ e⊥ ∼s ω, where
ω = diag(ω11A, . . . , ωn1A) ∈Mn−1(Q(n− 1)) ⊆Mn−1(A).
Proposition 3.8 (Standard Picture of Kn0 (A)). Let n ≥ 2 be a natural
number and let A be a Q(n−1)-algebra. Then every element of Kn0 (A)
can be written in the form [e]−[ω], where e in an n-potent inMk(A) for
some natural number k and ω is a diagonal n-potent in Mk(Q(n− 1)).
10 EFTON PARK AND JODY TROUT
Proof. Start with an element [ẽ]− [f̃ ] in Kn0 (A), and take f̃⊥ to be the
complementary n-potent of f as defined in Definition 3.7. Then
[ẽ]− [f̃ ] =
[ẽ] + [f̃⊥]
[f̃ ] + [f̃⊥]
The n-potents f̃ and f̃⊥ are orthogonal, and therefore
[f̃ ] + [f̃⊥] = [f̃ + f̃⊥] = [ω],
where ω has the desired form. Finally we take e to be ẽ⊕ f̃⊥, and by
enlarging the matrix ω, we obtain the desired result. �
Proposition 3.9. Let n ≥ 2 and let A be a Q(n− 1)-algebra. Suppose
e and f are n-potents in M∞(A). Then [e] = [f ] in K
0 (A) if and only
if e⊕ ω is similar to f ⊕ ω for some n-potent ω in M∞(Q(n− 1)).
Proof. The “only if” direction is obvious. To show the inference in the
opposite direction, suppose that [e] = [f ] in Kn0 (A). By the definition
of the Grothendieck completion, e ⊕ ẽ is similar to f ⊕ ẽ for some n-
potent ẽ in M∞(A). Then e ⊕ ẽ ⊕ ẽ⊥ is similar to f ⊕ ẽ ⊕ ẽ⊥. But if
we write ẽ =
k=1 ωkẽk as in Theorem 2.14, then Proposition 2.11(b)
implies that
ẽ ∼s diag
ω1ẽ1, ω2ẽ2, . . . , ωn−1ẽn−1
Therefore ẽ ⊕ ẽ⊥ is similar to an n-potent in M∞(Q(n − 1)), and the
proposition follows. �
We next turn our attention to n-potent K-theory for nonunital alge-
bras. Given a nonunital Q(n − 1)-algebra A, we define its unitization
A+ as the unital Q(n−1)-algebra A+ = {(a, λ) : a ∈ A, λ ∈ Q(n−1)},
where addition and scalar multiplication are defined componentwise,
and multiplication is given by (a, λ)(b, τ) = (ab+ aτ + bλ, λτ).
Definition 3.10. Let A be a nonunital Q(n−1)-algebra, and let A+ be
its unitization. Let π : A+ −→ Q(n− 1) be the algebra homomorphism
π(a, λ) = λ. Then we define Kn0 (A) = ker π∗.
It is easy to see that π∗ is surjective, so by definition of K
0 (A) we
have a short exact sequence
0 // Kn0 (A)
// Kn0 (A
// Kn0 (Q(n− 1)) //
with splitting induced by the map ψ : Q(n − 1) −→ A+ defined by
ψ(λ) = (0, λ). In addition, it is easy to check that if A already has a
unit and we form A+, then ker π∗ is naturally isomorphic to our original
definition of Kn0 (A).
K0-THEORY WITH n-POTENTS 11
Proposition 3.11. Let A be a nonunital Q(n−1)-algebra. Then every
element of Kn0 (A) can be written in the form [e]− [s(e)], where e is an
n-potents in Mk(A
+) for some integer k ≥ 1, and s = ψ ◦π : A+ → A+
is the scalar mapping [6, Sect. 4.2.1].
Proof. Follows directly from Proposition 3.8 and Definition 3.10. �
Proposition 3.12 (Half-exactness). Every short exact sequence
0 // I
// A/I // 0
of Q(n− 1)-algebras, with A unital, induces an exact sequence
Kn0 (I)
// Kn0 (A)
// Kn0 (A/I)
of abelian n-potent K-theory groups.
Proof. Since q ◦ i = 0, we have by functoriality that q∗ ◦ i∗ = 0 and
so the image of Kn0 (I) under i∗ in K
0 (A) is contained in the kernel of
q∗. To show the reverse inclusion, suppose we have [e]− [λ] in Kn0 (A)
such that q∗
[e]− [λ]
= 0. Then [q(e)] = [q(λ)] = [λ] in Kn0 (A/I). By
Proposition 3.9, there exists an n-potent τ in M∞(Q(n− 1)) so that
q(e)⊕ τ ∼s λ⊕ τ.
Choose N sufficiently large so that we may view e, λ, and τ as N by
N matrices, and choose z in GL2N (A/I) so that
q(e)⊕ τ
z−1 = λ⊕ τ.
By Proposition 3.4.2 and Corollary 3.4.4 in [1], we can lift diag(z, z−1)
to an element u in GL4N(A). Set f = u(e⊕ τ)u−1. Then
q(f) = diag(z, z−1)(q(e)⊕ τ)diag(z−1, z) = λ⊕ τ,
and thus f and λ⊕ τ are in M4N (I+). Therefore
[e]− [λ] = [e⊕ τ ]− [λ⊕ τ ] = i∗([f ]− [λ⊕ τ ])
is in the image of Kn0 (I) under i∗ as desired. �
Note that our proof of Proposition 3.12 relies critically on Proposi-
tion 3.9, which in turn is proved using the standard picture of Kn0 (A).
We do not have a standard picture for Kk0 (A) when k 6= n, and it
seems likely to the authors that Kk0 is, in fact, not half-exact in this
case. However, we do not have a counterexample where half-exactness
fails to hold.
While it is not at all obvious from its definition, Kn0 (A) can be iden-
tified with a more familiar object.
12 EFTON PARK AND JODY TROUT
Theorem 3.13. Let n ≥ 2 be a natural number and let A be a not nec-
essarily unital Q(n− 1)-algebra. Then there is a natural isomorphism
Kn0 (A)
K0(A)
of abelian groups.
Proof. First consider the case where A is unital. We define a homo-
morphism ψ̃ : Vn(A) −→
V0(A)
in the following way: for each
n-potent e =
ωkek in M∞(A), set
ψ̃[e] =
[e1], [e2], . . . , [en−1]
It is easy to check that ψ̃ is additive and well-defined. Next, define a
homomorphism φ̃ :
)n−1 −→ Vn(A) by the formula
[f1], [f2], . . . , [fn−1]
ω1diag(f1, 0, 0, . . . , 0) + ω2diag(0, f2, 0, . . . , 0) + · · ·
+ ωn−1diag(0, 0, . . . , 0, fn−1)
Note that
[f1], [f2], . . . , [fn−1]
ω1diag(f1, 0, . . . , 0) + · · ·+ ωn−1diag(0, 0, . . . , fn−1)
[diag(f1, 0, . . . , 0)], [diag(0, f2, . . . , 0)] . . . [diag(0, 0, . . . , fn−1)]
[f1], [f2], . . . , [fn−1]
φ̃ψ̃[e] = φ
[e1], [e2], . . . , [en−1]
ω1diag(e1, 0, . . . , 0) + · · ·+ ωn−1diag(0, 0, . . . , en−1)
diag(ω1e1, ω2e2, . . . , ωn−1en−1)
= [e],
where the last equality is a consequence of Proposition 2.11(b). The
universal mapping property of the Grothendieck completion implies
that ψ̃ extends uniquely to an abelian group isomorphism
ψ : Kn0 (A) −→
K0(A)
and thus the theorem is true for unital Q(n− 1)-algebras.
K0-THEORY WITH n-POTENTS 13
Now suppose that A does not have a unit. Then we have the following
commutative diagram with exact rows:
0 // Kn0 (A)
// Kn0 (A
+) //
Kn0 (Q(n− 1)) //
0 // K0(A)
n−1 // K0(A
+)n−1 // K0(Q(n− 1))n−1 // 0
An easy diagram chase shows that there is a unique group iso-
morphism from Kn0 (A) to
K0(A)
that makes the diagram com-
mute. �
Since a complex algebra is a Q(n− 1)-algebra for all values of n, we
have the following immediate corollary.
Corollary 3.14. If A is a C-algebra, there are natural isomorphisms
Kn0 (A)
K0(A)
of abelian groups for all natural numbers n ≥ 2.
We now arrive at the result that suggests why we should consider all
Kn0 -functors for algebras over a cyclotomic field.
Theorem 3.15. Let Q(4) = Q[i] be the 4th cyclotomic field. Then we
have the following isomorphisms of abelian groups:
K20(Q(4))
∼= Z,
K30(Q(4))
∼= Z2,
K40(Q(4))
∼= Z⊕ 2Z,
K50(Q(4))
∼= Z4.
Thus, K40 (Q(4)) 6∼= Z3 ∼= K40 (Q(3)).
Proof. Since Q(4) is a field [7], we have K20 (Q(4)) = K0(Q(4))
∼= Z.
The field Q(4) has characteristic 0 6= 2, so Theorem 3.6 implies that
K30(Q(4))
K0(Q(4)
)2 ∼= Z2. Theorem 3.13 implies that we have an
isomorphism K50(Q(4))
K0(Q(4)
)4 ∼= Z4.
However, the spectrum of 4-potents is contained in
0, 1,−1
which is not contained inQ(4) since the two primitive 3rd roots of unity
ω = ζ3 = −12 +
i and ω̄ = ζ̄3 = −12 −
i are not in Q(4) = Q[i].
Given any 4-potent e ∈Mn(Q(4)) ⊂Mn(C) we can uniquely write
e = e1 + ωe2 + ω̄e3,
14 EFTON PARK AND JODY TROUT
where e1, e2, e3 are orthogonal idempotents in Mn(C) that sum to an
idempotent e1+e2+e3 = e
3 inMn(Q(4)) by Lemma 2.2. We thus have
e2 = e1 + ω̄e2 + ωe3
e3 = e1 + e2 + e3
because ω2 = ω̄, ω̄2 = ω, and ω3 = ω̄3 = 1. Since ω + ω̄ = −1, this
implies that the first idempotent
e1 = (e + e
2 + e3)/3 ∈Mn(Q(4))
and the sum of the last two idempotents
e2 + e3 = e
3 − e1 ∈Mn(Q(4))
are both inMn(Q(4)). Using a simple trace argument and the fact that
ω, ω̄ 6∈ Q(4), we conclude that
rank(e2) = trace(e2) = trace(e3) = rank(e3),
and so rank(e2 + e3) = trace(e2 + e3) = 2trace(e2) is even. We then
have a well-defined map
V4(Q(4)) → V2(Q(4))⊕ 2V2(Q(4)) ∼= N⊕ 2N
[e] 7→ [e1]⊕ [e2 + e3] ∼= trace(e1)⊕ 2 trace(e2);
this is because the classes of e1 and e1 + e2 are preserved by (stable)
similarity, and the K0-class of an idempotent in a matrix ring over a
number field (or a PID) is the rank (= trace). It is easy to check that
this map is injective (using e1 ⊥ e2 + e3 in Mn(Q(4))) and additive.
The only question is surjectivity. It suffices to show that there is a
4-potent e over Q(4) whose stable similarity class is mapped to the
generator 1⊕ 2 of N⊕ 2N. Consider the block diagonal matrix
1 0 0
0 0 i
0 i −1
 ∈M3(Q(4)),
which is easily checked to be quadripotent. The lower right quadripo-
tent 2× 2 invertible block has the desired eigenvalues ω and ω̄, and so
does not diagonalize over Q(4). The result now follows easily. �
4. n-Homomorphisms and Kn0 Functorality
We know from Proposition 3.5 that Kn0 is a covariant functor from
the category of (unital) rings and ring homomorphisms to the category
of abelian groups and group homomorphisms. However, Kn0 is actually
functorial for a more general class of ring mappings.
K0-THEORY WITH n-POTENTS 15
Definition 4.1. Let R and S be rings. An additive map (not neces-
sarily unital) φ : R −→ S is called an n-homomorphism if
φ(a1a2 · · · an) = φ(a1)φ(a2) · · ·φ(an)
for all a1, a2, . . . , an in R.
Obviously every (ring) homomorphism is an n-homomorphism, but
the converse is false in general. For example, an AEn-ring is a ring
R such that every additive map φ : R → R is an n-homomorphism.
Feigelstock [2, 3] classified all unital AEn-rings. The algebraic version
of n-homomorphism was introduced for complex algebras in [4] and has
been carefully studied in the case of C∗-algebras in [5].
Proposition 4.2. Let φ : R → S be an n-homomorphism between
unital rings. Then φ induces a group homomorphism
φ∗ : K
0 (R) −→ Kn0 (S).
Furthermore, the assignment R 7→ Kn0 (R) is a covariant functor from
the category of unital rings and n-homomorphisms to the category of
abelian groups and ordinary group homomorphisms.
Proof. For each natural number k, we extend φ to a map from Mk(R)
to Mk(S) by applying φ to each matrix entry; it is easy to check this
also gives us an n-homomorphism. Moreover, φ is compatible with
stabilization of matrices; the only nonobvious point to check is that φ
respects algebraic equivalence.
Let e and f be algebraically equivalent n-potents in Mk(R) for some
k, and choose a and b in Mk(R) so that e = ab and f = ba. Define
elements a′ = φ(ea)φ(f)n−2 and b′ = φ(b) in Mk(S). We compute:
a′b′ = φ(ea)φ(f)n−2φ(b) = φ((ea)fn−2b) =
φ(ea(ba)n−2b) = φ(e(ab)n−1) = φ(en) = φ(e).
A similar argument shows that b′a′ = φ(f). Therefore φ determines
a monoid homomorphism from Vn(R) to Vn(S), and hence a group
homomorphism φ∗ : K
0 (R) −→ Kn0 (S). We leave it to the reader to
make the straightforward computations to show that we have a covari-
ant functor. �
Note that while we have an isomorphism Kn0 (A)
K0(A)
Q(n− 1)-algebras, it is not at all clear from the right hand side of this
isomorphism that Kn0 (A) is functorial for n-homomorphisms.
16 EFTON PARK AND JODY TROUT
References
[1] B. Blackadar,K-theory for Operator Algebras, 2nd ed., MSRI Publication Series
5, Springer-Verlag, New York, 1998.
[2] S. Feigelstock, Rings whose additive endomorphisms are N -multiplicative, Bull.
Austral. Math. Soc. 39 (1989), no. 1, 11–14.
[3] S. Feigelstock, Rings whose additive endomorphisms are n-multiplicative. II,
Period. Math. Hungar. 25 (1992), no. 1, 21–26.
[4] M. Hejazian, M. Mirzavaziri, M.S. Moslehian, n-homomorphisms, Bull. Iranian
Math. Soc. 31 (2005), no. 1, 13-23.
[5] E. Park and J. Trout, On the Nonexistence of Nontrivial Involutive n-
homomorphisms of C∗-algebras, Trans. Amer. Math. Soc. 361 (2009), no. 4,
1949–1961
[6] M. Rordam, F. Larsen, N. Laustsen, An Introduction to K-theory for C∗-
algebras, London Mathematical Society Student Texts, vol. 49. Cambridge Uni-
versity Press, Cambridge, 2000.
[7] J. Rosenberg, Algebraic K-theory and Its Applications, Graduate Texts in Math-
ematics, vol. 147, Springer-Verlag, New York, 1994.
Box 298900, Texas Christian University, Fort Worth, TX 76129
E-mail address : e.park@tcu.edu
6188 Kemeny Hall, Dartmouth College, Hanover, NH 03755
E-mail address : jody.trout@dartmouth.edu
	1. Introduction
	2. Equivalence of n-potents
	3. K0-theory with n-potents
	4. n-Homomorphisms and K0n Functorality
	References
ABSTRACT
  Let $n \geq 2$ be an integer. An \emph{$n$-potent} is an element $e$ of a
ring $R$ such that $e^n = e$. In this paper, we study $n$-potents in matrices
over $R$ and use them to construct an abelian group $K_0^n(R)$. If $A$ is a
complex algebra, there is a group isomorphism $K_0^n(A) \cong
\bigl(K_0(A)\bigr)^{n-1}$ for all $n \geq 2$. However, for algebras over
cyclotomic fields, this is not true in general. We consider $K_0^n$ as a
covariant functor, and show that it is also functorial for a generalization of
homomorphism called an \emph{$n$-homomorphism}.

<|endoftext|><|startoftext|>
Spin coupling in zigzag Wigner crystals
A. D. Klironomos,1,2 J. S. Meyer,2 T. Hikihara,3 and K. A. Matveev1
Materials Science Division, Argonne National Laboratory, Argonne, Illinois 60439, USA
Department of Physics, The Ohio State University, Columbus, Ohio 43210, USA
Department of Physics, Hokkaido University, Sapporo 060-0810, Japan
(Dated: October 28, 2018)
We consider interacting electrons in a quantum wire in the case of a shallow confining potential and
low electron density. In a certain range of densities, the electrons form a two-row (zigzag) Wigner
crystal whose spin properties are determined by nearest and next-nearest neighbor exchange as well
as by three- and four-particle ring exchange processes. The phase diagram of the resulting zigzag
spin chain has regions of complete spin polarization and partial spin polarization in addition to
a number of unpolarized phases, including antiferromagnetism and dimer order as well as a novel
phase generated by the four-particle ring exchange.
PACS numbers: 73.21.Hb,71.10.Pm
I. INTRODUCTION
The deviations of the conductance from perfect quanti-
zation in integer multiples of G0 = 2e
2/h observed in bal-
listic quantum wires at low electron densities have gener-
ated great experimental and theoretical interest in recent
years.1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27
These conductance anomalies manifest themselves as
quasi-plateaus in the conductance as a function of
gate voltage at about 0.5 to 0.7 of the conductance
quantum G0, depending on the device. Although most
experiments are performed with electrons in GaAs
wires,1,2,3,4,5,6,7,8,9,10,11 a similar “0.7 structure” was
recently observed in devices formed in two-dimensional
hole systems.12,13,14 It is widely accepted that the origin
of the quasi-plateau lies in correlation effects, but a
complete understanding of this phenomenon remains
elusive.
Although some alternative interpretations have been
proposed,11,26,27 most commonly the experimental find-
ings are attributed to non-trivial spin properties of quan-
tum wires.1,4,5,6,7,8,9,10,14,15,16,17,18,19,20,21,22,23,24,25 In a
truly one-dimensional geometry the spin coupling is rel-
atively simple: electron spins are coupled antiferromag-
netically, and the low energy properties of the system
are described by the Luttinger liquid theory. The pic-
ture may change dramatically when transverse displace-
ments of electrons are important and the system be-
comes quasi-one-dimensional. In particular, the spon-
taneous spin polarization of the ground state, which was
proposed1,6,9,10,14,15,16 as a possible origin of the conduc-
tance anomalies, is forbidden in one dimension,28 but
allowed in this case.
The electron system in a quantum wire undergoes
a transition from a one-dimensional to a quasi-one-
dimensional state when the energy of quantization in the
confining potential is no longer large compared to other
important energy scales. In this paper we consider the
spin properties of a quantum wire with shallow confin-
ing potential. In such a wire the electron system be-
comes quasi-one-dimensional while the electron density
is still very low, and thus the interactions between elec-
trons are effectively strong. At very low densities, elec-
trons in the wire form a one-dimensional structure with
short-range crystalline order—the so-called Wigner crys-
tal. As the density increases, strong Coulomb interac-
tions cause deviations from one-dimensionality creating
a quasi-one-dimensional zigzag crystal with dramatically
different spin properties. In particular, ring exchanges
will be shown to play an essential role.
We find several interesting spin structures in the
zigzag crystal. In a sufficiently shallow confining po-
tential, in a certain range of electron densities, the 3-
particle ring exchange dominates and leads to a fully
spin-polarized ground state. At higher electron densities,
and/or in a somewhat stronger confining potential, the
4-particle ring exchange becomes important. We study
the phase diagram of the corresponding spin chain us-
ing the method of exact diagonalization, and find that
the 4-particle ring exchange gives rise to novel phases,
including one of partial spin polarization.
The paper is organized as follows. The formation of a
Wigner crystal in a quantum wire and its evolution into
a zigzag chain as a function of electron density are dis-
cussed in Sec. II. Spin interactions in a zigzag Wigner
crystal which arise through 2-particle as well as ring ex-
changes are introduced in Sec. III. The numerical calcu-
lation of the relevant exchange constants is presented in
Sec. IV. The results of the numerical calculation estab-
lish the existence of a ferromagnetic phase at intermedi-
ate densities and the dominance of the 4-particle ring ex-
change at high densities. Subsequently, a detailed study
of the zigzag chain with 4-particle ring exchange is pre-
sented in Sec. V. An attempt to construct the phase dia-
gram for a realistic quantum wire as a function of electron
density and interaction strength is presented in Sec. VI.
The paper concludes with a discussion of the relation of
our work to recent experiments, given in Sec. VII. A
brief summary of some of our results has been reported
previously in Ref. 29.
http://arxiv.org/abs/0704.0776v1
0 0.05 0.1 0.15 0.2 0.25
(a) ν=0.70
(b) ν=0.90
(c) ν=1.75
FIG. 1: Wigner crystal of electrons in a quantum wire. The
structure as determined by the dimensionless distance be-
tween rows d/r0 depends on the parameter ν proportional
to electron density (see text). As density grows, the one-
dimensional crystal (a) gives way to a zigzag chain (b,c).
II. WIGNER CRYSTALS IN QUANTUM WIRES
We consider a long quantum wire in which the elec-
trons are confined by some smooth potential in the direc-
tion transverse to the wire axis. Assuming a quadratic
dispersion and zero temperature, the kinetic energy of
an electron is typically of the order of the Fermi en-
ergy EF = (π~n)
2/8m, whereas the Coulomb interaction
energy is of the order of e2n/ǫ. Here, n is the (one-
dimensional) density of electrons, ǫ is the dielectric con-
stant of the host material, and m is the effective electron
mass. As the density of electrons is lowered, Coulomb
interactions become increasingly more important, and at
n ≪ a−1B they dominate over the kinetic energy, where
the Bohr radius is given as aB = ~
2ǫ/me2. (In GaAs its
value is approximately aB ≈100Å.)
In this low-density limit, the electrons can be treated
as classical particles. They will minimize their mutual
Coulomb repulsion by occupying equidistant positions
along the wire, forming a structure with short-range crys-
talline order—the so-called Wigner crystal, Fig. 1(a).
Unlike in higher dimensions, the long-range order in a
one-dimensional Wigner crystal is smeared by quantum
fluctuations, and only weak density correlations remain
at large distances.30 However, as will be shown in the
following sections, the coupling of electron spins is con-
trolled by electron interactions at distances of order 1/n,
where the picture of a one-dimensional Wigner crystal is
applicable. Henceforth, we speak of a Wigner crystal in
a quantum wire with this important distinction in mind.
Upon increasing the density, the inter-electron distance
diminishes, and the resulting stronger electron repulsion
will eventually overcome the confining potential Vconf ,
transforming the classical one-dimensional Wigner crys-
tal into a staggered or zigzag chain31,32, as depicted in
Fig. 1(b,c). From the comparison of the Coulomb inter-
action energy Vint(r) = e
2/ǫr with the confining potential
an important characteristic length scale emerges. Indeed,
the transition from the one-dimensional Wigner crystal
to the zigzag chain is expected to take place when dis-
tances between electrons are of the order of the scale r0
such that Vconf(r0) = Vint(r0).
It is therefore necessary to identify the electron equi-
librium configuration as a function of density. In order
to proceed in a quantitative way we consider a specific
model, namely a quantum wire with a parabolic confining
potential Vconf(y) = mΩ
2y2/2, where Ω is the frequency
of harmonic oscillations in the potential Vconf(y). Within
that model the characteristic length scale r0 is given as
2e2/ǫmΩ2
. (1)
It is convenient for the following discussion to measure
lengths in units of r0. To that respect we introduce a
dimensionless density
ν = nr0. (2)
Then minimization of the energy with respect to the
electron configuration31,32 reveals that a one-dimensional
crystal is stable for densities ν < 0.78, whereas a zigzag
chain forms at intermediate densities 0.78 < ν < 1.75.
(If density is further increased, structures with larger
numbers of rows appear.31,32) The distance d between
rows grows with density as shown in Fig. 1. Note that at
ν ≈ 1.46 the equilateral configuration is achieved. There-
fore, at higher densities—and in a curious contradiction
in terms—the distance between next-nearest neighbors is
smaller than the distance between nearest neighbors (see
Fig. 1(c)).
III. SPIN EXCHANGE
In order to introduce spin interactions in the Wigner
crystal, it is necessary to go beyond the classical limit.
In quantum mechanics spin interactions arise due to ex-
change processes in which electrons switch positions by
tunneling through the potential barrier that separates
them. The tunneling barrier is created by the exchanging
particles as well as all other electrons in the wire. The re-
sulting exchange energy is exponentially small compared
to the Fermi energy EF . Furthermore, as a result of
the exponential decay of the tunneling amplitude with
distance, only nearest neighbor exchange is relevant in a
one-dimensional crystal. Thus, the one-dimensional crys-
tal is described by the Heisenberg Hamiltonian H1 =∑
j J1SjSj+1, where the exchange constant J1 is posi-
tive and has been studied in detail recently.24,33,34,35 The
exchange constant being positive leads to a spin-singlet
ground state with quasi-long-range antiferromagnetic or-
der, in accordance with the Lieb-Mattis theorem.28
The zigzag chain introduced in the previous section
displays much richer spin physics. As the distance be-
tween the two rows increases as a function of density, the
distance between next-nearest neighbors becomes com-
parable to and eventually even smaller than the distance
between nearest neighbors, as illustrated in Fig. 1(b,c).
Consequently, the next-nearest neighbor exchange con-
stant J2 may be comparable to or larger than the nearest
neighbor exchange constant J1. Drawing intuition from
studies of the two-dimensional Wigner crystal,36,37,38,39
one comes to a further important realization regarding
the physics of the zigzag chain: in addition to 2-particle
exchange processes, ring exchange processes, in which
three or more particles exchange positions in a cyclic
fashion, have to be considered in this geometry.
It has long been established that, due to symmetry
properties of the ground state wavefunctions, ring ex-
changes of an even number of fermions favor antiferro-
magnetism, while those of an odd number of fermions
favor ferromagnetism.40 In a zigzag chain, the Hamilto-
nian reads
J1Pj j+1 + J2Pj j+2 − J3(Pj j+1 j+2 + Pj+2 j+1 j)
+J4(Pj j+1 j+3 j+2 + Pj+2 j+3 j+1 j)− . . .
, (3)
where Pj1...jl denotes the cyclic permutation operator of l
spins. Here the exchange constants are defined such that
all Jl > 0. Furthermore, only the dominant l-particle ex-
changes are shown. A more familiar form of the Hamilto-
nian in terms of spin operators is obtained by noting that
Pij =
+ 2SiSj and Pj1...jl = Pj1j2Pj2j3 . . . Pjl−1jl .
Using spin operators and considering the two-spin ex-
changes one obtains the Hamiltonian
H12 =
(J1SjSj+1 + J2SjSj+2) . (4)
The competition between the nearest neighbor and next-
nearest neighbor exchanges becomes the source of frus-
tration of the antiferromagnetic spin order and eventu-
ally leads to a gapped dimerized ground state at J2 >
0.24J1.
41,42,43,44
The simplest ring exchange involves three particles and
is therefore ferromagnetic. Including the 3-particle ring
exchange J3 in addition to the 2-particle exchanges, the
Hamiltonian of the corresponding spin chain retains a
simple form. The 3-particle ring exchange does not in-
troduce a new type of coupling, but rather modifies the
2-particle exchange constants.40 For a zigzag crystal we
find the effective 2-particle exchange constants
J̃1 = J1 − 2J3, (5)
J̃2 = J2 − J3. (6)
Thus the total Hamiltonian has the form
H123 =
J̃1SjSj+1 + J̃2SjSj+2
, (7)
where J̃1 and J̃2 can have either sign.
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
����������������������
J 2~ 0
FM AF
Dimers
FIG. 2: The phase diagram including nearest neighbor, next-
nearest neighbor, and 3-particle ring exchanges. The effective
couplings eJ1 and eJ2 are defined in the text. The shaded region
between the dimer and ferromagnetic phases corresponds to
the exotic phase predicted in Ref. 48.
Consequently, regions of negative (i.e. ferromagnetic)
nearest and/or next-nearest neighbor coupling become
accessible. The phase diagram of the Heisenberg spin
chain (7) with both positive and negative couplings has
been studied extensively.41,42,43,44,45,46,47,48,49,50 In ad-
dition to the antiferromagnetic and dimer phases dis-
cussed earlier, a ferromagnetic phase exists for J̃1 <
min{0,−4J̃2}.46 An exotic phase called the chiral-biaxial-
nematic phase has been predicted48 to appear for J̃1 < 0
and −0.25 < J̃2/J̃1 < −0.38. However, the nature of the
system in this parameter region is still controversial. The
phase diagram is drawn in Fig. 2.
Thus, depending on the relative magnitudes of the var-
ious exchange constants, different phases are realized.
Extensive studies of the two-dimensional Wigner crys-
tal have shown that, at low densities (or strong interac-
tions), the 3-particle ring exchange dominates over the
2-particle exchanges. As a result, the two-dimensional
Wigner crystal becomes ferromagnetic at sufficiently
strong interactions.36,39 Given that the electrons in a
two-dimensional Wigner crystal form a triangular lat-
tice, by analogy, one should expect a similar effect in
the zigzag chain at densities where the electrons form ap-
proximately equilateral triangles. More specifically, upon
increasing the density and consequently the distance be-
tween rows, one would expect the system to undergo a
phase transition from an antiferromagnetic to a ferromag-
netic phase. To establish this scenario conclusively, the
various exchange energies in the zigzag crystal have to be
determined. The system differs from the two-dimensional
crystal in two important aspects. (i) The electrons are
subject to a confining potential as opposed to the flat
background in the two-dimensional case. Even more im-
portantly, (ii) the electron configuration depends on den-
sity, cf. Fig. 1, as opposed to the ideal triangular lattice
in two dimensions. In the following section, we proceed
with a numerical study of the exchange energies for the
specific configurations of the zigzag Wigner crystal in a
parabolic confining potential.
IV. SEMICLASSICAL EVALUATION OF THE
EXCHANGE CONSTANTS
The effective strength of interactions is usually de-
scribed by the interaction parameter rs which measures
the relative magnitude of the interaction energy and the
kinetic energy and is of order the distance between elec-
trons measured in units of the Bohr radius. For quan-
tum wires, it is more appropriate to use the parameter
rΩ = r0/aB, which takes into account the confining po-
tential. Within our model, the interaction parameter rΩ
rΩ = 2
2ǫ2~2
. (8)
For rΩ ≫ 1, strong interactions dominate the physics
of the system, and a semiclassical description is appli-
cable. In order to calculate the various exchange con-
stants, we use the standard instanton method, success-
fully employed in the study of the two-dimensional36,37,38
and one-dimensional34,35 Wigner crystal. Within this
approach, the exchange constants are given by Jl =
J∗l exp (−Sl/~). Here Sl is the value of the Euclidean
(imaginary time) action, evaluated along the classical ex-
change path. By measuring length and time in units of r0
and T =
2/Ω, respectively, the action S[{rj(τ)}] can be
rewritten in the form S = ~η
rΩ, where the functional
η[{rj(τ)}] =
+ y2j
|rj−ri|
 (9)
is dimensionless.
Thus, we find the exchange constants in the form
Jl = J
l exp (−ηl
rΩ), (10)
where the dimensionless coefficients ηl depend only on
the electron configuration (cf. Fig. 1) or, equivalently,
on the density ν. The exponents ηl are calculated nu-
merically for each type of exchange by minimizing the
action (9) with respect to the instanton trajectories of
the exchanging electrons. This procedure is mathemati-
cally equivalent to solving a set of coupled, second order
in the imaginary time τ , differential equations for the
trajectories rj(τ). The boundary conditions at τ = ±∞
are, respectively, the original equilibrium configuration
and the configuration where the electrons have exchanged
positions according to the exchange process considered.
0.9 1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
FIG. 3: The exponents η1, η2, η3, and η4 as functions of the
dimensionless density ν.
ν η1 η2 η3 η4
1.0 1.050 2.427 1.254 1.712
1.1 1.161 2.169 1.261 1.605
1.2 1.255 1.952 1.275 1.532
1.3 1.337 1.754 1.287 1.469
1.4 1.406 1.566 1.293 1.398
1.5 1.456 1.376 1.278 1.299
1.6 1.471 1.169 1.215 1.135
1.7 1.391 0.901 1.022 0.784
TABLE I: The numerically calculated values of the density
dependent exponents ηl, see Eq. (10). The computation was
carried out including 12 moving spectator particles on either
side of the exchanging particles. Corrections to all ηl from
the remaining spectators do not exceed 0.1%.
In the simplest approximation only the exchanging
electrons are included in the calculation while all other
electrons, being frozen in place, create the background
potential. It turns out, however, that it is important
to take into account the motion of “spectators”—the
electrons in the crystal to the left and to the right of
the exchanging particles—during the exchange process.
The results presented here are obtained by successively
adding more spectators on both sides until the values ηl
converge. We find that including 12 moving spectators
on either side of the exchanging particles determines the
exponents to an accuracy better than 0.1%.
Figure 3 shows the calculated exponents for various ex-
changes as a function of dimensionless density ν and the
corresponding values are reported in Table I. At strong
interactions (rΩ ≫ 1), the exchange with the smallest
value of ηl is clearly dominant, and the prefactor J
l is of
secondary importance to our argument. At low densities,
when the zigzag chain is still close to one-dimensional, J1
(c) J3
(b) J(a) J 2
(d) J4
FIG. 4: The calculated particle trajectories for various ex-
changes at a representative density ν = 1.5. It is evident that
only a few near neighbors of the exchanging particles move
appreciably.
is the largest exchange constant, and the spin physics is
controlled by the nearest-neighbor exchange. In an inter-
mediate density regime, when the electron configuration
is close to equilateral triangles, the 3-particle ring ex-
change dominates. Thus, the numerical calculation con-
firms our original expectation, and a transition from an
antiferromagnetic to a ferromagnetic state takes place
upon increasing the density. Surprisingly, however, at
even higher densities the 4-particle ring exchange is the
dominant process. The role of the 4-particle ring ex-
change and the phase diagram of the associated zigzag
spin chain will be the subject of the following section.
More complicated exchanges have also been computed,
namely multi-particle (l ≥ 5) ring exchanges as well as
exchanges involving more distant neighbors. However,
the exchanges displayed in Fig. 3 were found to be the
dominant ones.29
It is important to note here that spectators contribute
to our results in an essential way. Allowing spectators to
move results not only in quantitative changes (namely a
reduction of the initially overestimated values ηl) but in
qualitative changes as well: at high densities, the dom-
inance of the 4-particle ring exchange J4 over the next-
nearest neighbor exchange J2 is obtained only if specta-
tors are taken into account. In particular, it is necessary
to include at least 6 moving spectators on each side of the
exchanging particles for J4 to take over at high densities.
The considerable effect that the spectators have on
the values of the exponents raises the question whether
a short-ranged interaction potential might cause further
quantitative or qualitative changes to the physical pic-
ture. In order to investigate that possibility we have
repeated the entire calculation for a modified Coulomb
interaction of the form
V (x) =
x2 + (2d)2
. (11)
This particular interaction accounts for the presence of a
metal gate, modeled by a conducting plane at a distance
d from the crystal. The gate screens the bare Coulomb
potential, modifying the electron-electron interaction at
long distances. Our calculation shows that this modifica-
tion affects the values of the exponents only weakly, even
when the gate is placed at a distance from the crystal
comparable to the inter-particle spacing. Qualitatively,
the physical picture remains the same, with the order of
dominance of the various exchanges unaffected through-
out the range of densities.
At the same time, it is particularly noteworthy that
(both for the screened and unscreened interaction) the
contribution of the spectator electrons saturates rapidly
as their number is increased. This is an indication that
the destruction of long-range order in the quasi-one-
dimensional Wigner crystal by quantum fluctuations will
not affect our conclusions. Figure 4 shows the particle
trajectories for the dominant exchanges at a represen-
tative density of ν = 1.5. The trajectories of both the
exchanging particles and a subset of the spectators are
shown, and their relative displacements can be readily
compared.
V. FOUR-PARTICLE RING EXCHANGE
We have shown in the preceding section that in a cer-
tain range of densities, the 4-particle ring exchange dom-
inates. Unlike the 3-particle exchange, the 4-particle
ring exchange not only modifies the nearest and next-
nearest neighbor exchange constants, but, in addition,
introduces more complicated spin interactions.40 For the
zigzag chain, we find
H4 = J4
SjSj+l + 2
(SjSj+1)(Sj+2Sj+3)
+(SjSj+2)(Sj+1Sj+3)− (SjSj+3)(Sj+1Sj+2)
. (12)
Not much is known about the physics of zigzag spin
chains with interactions of this type. We have stud-
ied this particular system described by the Hamiltonian
H = H123 + H4 using exact diagonalization, consider-
ing systems of N = 12, 16, 20, 24 sites. Periodic bound-
ary conditions have been imposed, and we have employed
the well-known Lanczos algorithm to calculate a few low-
energy eigenstates.
Figure 5 shows the total spin S of the ground state
as a function of the effective couplings J̃1/J4 and J̃2/J4
for the largest system considered, one with N = 24 sites.
The darkest region corresponding to maximal total spin is
the ferromagnetic phase, which occurs for large negative
couplings in direct analogy to the phase diagram for the
system without four-spin interactions (see Fig. 2). For all
system sizes that we have considered, the obtained phase
boundary is almost independent of the system size and
agrees very well with the conditions for ferromagnetism
J̃1 + 2J4 < 0, (13)
J̃1 + 4J̃2 + 10J4 < 0, (14)
FIG. 5: Total spin S of the ground state for a chain of N = 24
sites as a function of the effective couplings eJ1/J4 and eJ2/J4.
derived by treating the four-spin terms in the Hamilto-
nian (12) on a mean field level near the ferromagnetic
state.
A new phase of partial spin polarization appears adja-
cent to the ferromagnetic phase. The partially polarized
phase possesses a ground state total spin of S = 2 for
N = 12, S = 2 or 4 for N = 16, 20, and S = 4 for
N = 24; it appears that total spin of one third of the
saturated magnetization N/2 prevails throughout most
of that phase. The phase persists, to a significant extent,
in range and form as N increases. Therefore, we believe
it is not a finite size effect. We note here that it has been
shown rigorously that a model described by a Hamilto-
nian having a similar form to ours also exhibits a ground
state with partial spin polarization.51 On the other hand,
the scattered points corresponding to non-zero total spin
in the first quadrant (J̃1, J̃2 > 0) appear to shift posi-
tion as N increases and the size of the total spin remains
small, S ≤ 2, for all system sizes considered. We cannot
ascertain at this point whether they persist in a larger
system.
At large values of |J̃1|/J4 and |J̃2|/J4, one would ex-
pect to recover the phases present in the absence of J4.
Thus, the large white area in Fig. 5 corresponding to
total spin S = 0 should contain the antiferromagnetic
phase, analogues of the dimer phases observed in the
system without four-spin interactions, and possibly en-
tirely new phases as well. In order to distinguish between
these phases, we first calculate the overlap between the
ground state wavefunctions in our model and the ones
representing the dimer and antiferromagnetic phases in
the well-studied model with J4 = 0. The representative
ground state wavefunctions are obtained for the chain
with J4 = 0 and typical parameter sets of (J̃1, J̃2) cho-
sen deep in the dimer and antiferromagnetic phases of
the phase diagram shown in Fig. 2. The results for the
−6 −4 −2 0 2 4 6
−6 −4 −2 0 2 4 6
8 1.0
FIG. 6: Overlaps of the ground state wavefunctions in the
presence of the 4-particle ring exchange with the wavefunc-
tions representing (a) the dimer and (b) the antiferromag-
netic phase for J4 = 0. The representative ground states
(a) and (b) are obtained for ( eJ1, eJ2, J4) = (1, 10, 0) and
( eJ1, eJ2, J4) = (1,−10, 0), respectively.
chain with N = 24 sites are shown in Fig. 6. As can
be seen from the figure, the ground states for a broad
region of large positive J̃2/J4 have a significant overlap
with the representative ground state of the dimer phase
while the ground states for large positive J̃1/J4 and/or
negative J̃2/J4 resemble very much the one belonging to
the antiferromagnetic phase. This behavior indicates the
appearance of the expected dimer and antiferromagnetic
phases for large effective couplings |J̃1|/J4 and |J̃2|/J4.
We have confirmed the existence of these phases in the
corresponding parameter regimes by studying the associ-
ated structure factors.
In order to study and clarify the properties of the sys-
tem in more detail, we have calculated the excitation
energies
∆En(S,Q) = En(S,Q)− Egs, (15)
where En(S,Q) is the energy of n-th lowest level in the
subspace characterized by the total spin S and the mo-
−2 0 2 4 6
J2 / J4
J1 /J4 = 2, N = 24
: (0, 0)
: (0, π)
: (1, π)
FIG. 7: Excitation energies ∆En(S,Q) in the system of N =
24 sites for eJ1/J4 = 2 as functions of eJ2/J4. The two-lowest
levels are plotted for the subspaces of (S,Q) = (0, 0) and (0, π)
while only the lowest one is shown for all other subspaces.
The energies for (S,Q) = (0, 0), (0, π), and (1, π) are plotted
by thick solid, dotted, and dashed curves, respectively. The
energies of the levels belonging to other subspaces are shown
by thin gray curves.
mentum Q, and Egs is the ground state energy. Figure 7
shows the results for the system of size N = 24, obtained
along the vertical line in the phase diagram given by
J̃1/J4 = 2. At large positive J̃2/J4, the ground and first-
excited states belong to the subspace (S,Q) = (0, 0) and
(0, π), respectively.52 These states are expected to form
the ground state doublet of the dimer phase in the ther-
modynamic limit. For J̃2/J4 > (J̃2/J4)c,dim ∼ 3.5, one of
the dimer doublet states is the ground state and the sys-
tem is in the dimer phase. At smaller J̃2/J4, both states
of the dimer doublet shift upward and move away from
the low-energy regime, while other states decrease steeply
in energy and eventually become the ground state. We
therefore take the point (J̃2/J4)c,dim as the boundary of
the dimer phase. After the transition, the system enters
a region with exotic ground states and a large number of
low-lying excitations. We have numerically checked that
these exotic ground states have no or, at most, negligibly
small overlap with the ground state of either the dimer or
antiferromagnetic phases. When J̃2/J4 decreases further,
the exotic states leave the low-energy regime and the
system predictably enters the antiferromagnetic phase,
which occurs for J̃2/J4 < (J̃2/J4)c,AF ∼ 0.1.
Performing the same type of analysis for several pa-
rameter lines, we can estimate the phase boundaries
(J̃2/J4)c,dim and (J̃2/J4)c,AF as functions of J̃1/J4. In
the limit of large negative coupling J̃1/J4 → −∞, the
boundary of the dimer phase (J̃2/J4)c,dim approaches the
line J̃1 = −0.38J̃2, suggesting a smooth connection to
the behavior for J̃1 < 0 and J4 = 0 (cf. Ref. 48). In
−5 0 5
Dimers
FIG. 8: The phase diagram of the Heisenberg chain including
nearest neighbor, next-nearest neighbor, and 4-particle ring
exchanges. The expected phases consist of a ferromagnetic
and an antiferromagnetic phase as well as a dimer phase. In
addition, a novel region (4P ) dominated by the 4-particle ring
exchange appears. The latter includes a phase of partial spin
polarization (M). Triangles, squares and circles correspond
to the boundaries obtained for N = 16, 20, and 24 sites, re-
spectively. We note that although the phase of partial spin
polarization persists as the system size is increased, its bound-
ary with the 4P phase has a rather irregular size dependence
and is represented approximately in the figure.
a similar fashion, at large positive coupling J̃1/J4, we
find no indication for the appearance of exotic phases
after J̃1/J4 ≥ 6; the data of the energy spectrum and
the wavefunction overlaps show essentially the same be-
haviors as those at J̃1/J4 → ∞. We therefore conclude
that there occurs a direct transition between the dimer
and antiferromagnetic phases and estimate the transition
line using the method of level spectroscopy, according
to which the transition point is determined by the level
crossing between the first-excited states in the dimer and
antiferromagnetic phases.43
Combining all these phase boundaries and including
the boundaries of the ferromagnetic and partially spin
polarized phases which were obtained using the total spin
of the ground state as a criterion, we determine the phase
diagram in the J̃1/J4 versus J̃2/J4 plane. The result is
shown in Fig. 8. The phase diagram has similarities to
the one obtained without the four-spin interaction term,
see Fig. 2. In particular, the expected ferromagnetic, an-
tiferromagnetic, and dimer phases appear for large values
of the effective couplings, |J̃1|/J4 and |J̃2|/J4. But more
importantly, at not too large values of the effective cou-
plings, new phases appear as a direct result of the new
interaction term. We can identify a phase with partial
spin polarization and a region occupied by one or sev-
eral novel phases with total spin S = 0. In the region
where J4 dominates, the ground state has no similarity
at the level of wavefunctions with that of the conventional
phases. It is important to note that the region occupied
by the new phases becomes broader as the system size
N grows, indicating that it survives even in the thermo-
dynamic limit. From the analysis of the wavefunction
overlaps between the ground states, there are strong in-
dications that the novel unpolarized region might consist
of several different phases. Unfortunately, it has proven
difficult to clarify the nature of the new phases and, in
particular, discover the order parameters that character-
ize them based solely on the analysis of small systems.
Therefore, the issue is relegated to future studies. In
the absence of detailed understanding of its properties,
we collectively dub the region of the phase diagram the
“4P” phase.
VI. PHASE DIAGRAM FOR REALISTIC
QUANTUM WIRES
Having identified possible phases of the zigzag chain,
the most interesting question is which of the various
phases appearing in the phase diagram Fig. 8 are ac-
cessible in quantum wires. At finite rΩ, the calculations
of the exchange constants discussed in Sec. IV have to be
completed in an important way by computing the prefac-
tors Jl in Eq. (10). To that effect it is necessary to take
into account Gaussian fluctuations around the classical
exchange path. We employ the method introduced by
Voelker and Chakravarty38 which, for the sake of com-
pleteness, is outlined in the Appendix . The prefactors
have the form
J∗l =
AlFl rΩ
, (16)
where Fl is density dependent. The factor Al is used to
account for multiple classical trajectories corresponding
to the same exchange process (see Appendix). Table II
contains the values of Fl we calculated for the various
exchanges considered in this work. Note that, in order
to achieve a comparable level of convergence, a more ac-
curate determination of the instanton trajectories was
required for the calculation of the prefactors J∗l than for
the calculation of the exponents ηl. By including up to 28
moving spectators on either side of the exchanging par-
ticles, we have been able to achieve an accuracy better
than 2%.
We are now in a position to map out the areas of the
phase diagram of Fig. 8 that are encountered as one tra-
verses the density region of interest for a given rΩ. The
resulting phase diagram obtained with the calculated ex-
change energies is shown in Fig. 9. Since the semiclassical
approximation is applicable only at rΩ ≫ 1, we do not
extend the phase diagram to values of rΩ < 10. It turns
out that the spin polarized phases are only realized at
rΩ & 50. On the other hand, the novel “4P” phase is
ν F1 F2 F3 F4
1.0 1.12 ≃ 6 1.22 2.44
1.1 1.04 ≃ 4 1.03 1.73
1.2 1.05 2.38 0.97 1.28
1.3 1.08 1.86 0.97 1.15
1.4 1.19 1.71 1.02 1.13
1.5 1.40 1.63 1.14 1.18
1.6 1.80 1.51 1.26 1.19
1.7 2.07 1.07 0.81 0.50
TABLE II: The numerically calculated values of the density
dependent part Fl of the exchange energy prefactor J
, see
Eq. (16), calculated with mobile spectators. For all the num-
bers reported, the accuracy is better than 2%, except for F2 at
ν = 1.0, 1.1, for which extrapolated values with an estimated
error of ∼ 10% are shown.
1.1 1.2 1.3 1.4 1.5 1.6
AF 4P
FIG. 9: The phase diagram as a function of the dimension-
less density ν and interaction strength rΩ. The various phases
were obtained by first calculating the effective couplings eJ1/J4
and eJ2/J4 for a given point; subsequently, the correspond-
ing phase was determined utilizing the calculated boundaries
shown in Fig. 8 for a system of N = 24 sites.
expected to appear in a certain density range as long as
rΩ ≫ 1.
VII. DISCUSSION
In the preceding sections we have studied the coupling
of spins of electrons forming a zigzag Wigner crystal in a
parabolic confining potential. We have found that apart
from the 2-particle exchange couplings between the near-
est and next-nearest neighbor spins, the 3- and 4-particle
ring exchange processes have to be taken into account.
At relatively low electron densities, when the transverse
displacement of electrons is small compared to the dis-
tance between particles, Fig. 1(b), the nearest-neighbor
2-particle exchange dominates. In this regime the spins
form an antiferromagnetic ground state, with low-energy
excitations described by the Tomonaga-Luttinger theory.
At relatively high densities, when the transverse displace-
ments are large, Fig. 1(c), the 4-particle ring exchange
processes dominate. Since the ring exchange processes in-
volving even numbers of particles favor spin-unpolarized
states, the ground state of the system in this regime has
zero total spin. Finally, if the confining potential is suf-
ficiently shallow, so that the parameter rΩ & 50, there
is an intermediate density range in which the 3-particle
exchange processes are important, and the ground state
is spontaneously spin-polarized. These results are sum-
marized in Fig. 9.
We expect that the zigzag Wigner crystal state can be
realized in quantum wires. In order for the zigzag crys-
tal to form the confining potential of the wire should be
rather shallow, so that large values rΩ ≫ 1 of the pa-
rameter (8) could be achieved. The exact shape of the
confining potential in existing wires is not well known.
Using the quoted value of subband spacing ∼ 20 meV
we estimate that the parameter rΩ is of order unity in
cleaved-edge-overgrowth wires.53 The confining potential
in split-gate quantum wires tends to be more shallow.
For a typical value 1 meV of subband spacing we es-
timate rΩ ≈ 6. Finally, for p-type quantum wires13,54
with subband spacing ∼ 300 µeV we estimate rΩ ≈ 20.
These hole systems are the most promising devices for
observation of the zigzag Wigner crystal.
Given the relatively modest values of rΩ . 20 in the ex-
isting quantum wire structures, we do not expect that the
spontaneously spin-polarized ground state will be easily
observed in experiments. Instead, we expect that as the
density of charge carriers is increased, a transition from
antiferromagnetism to a state dominated by 4-particle
ring exchanges will occur. We have found that the ground
state in this phase has a complicated size dependence,
which makes it very difficult to identify its nature by
exact diagonalization of finite-size chains. To fully un-
derstand the spin properties in the high density regime,
further studies of zigzag ladders with ring exchange cou-
pling are needed.
Acknowledgments
We acknowledge helpful discussions with A. Läuchli
and T. Momoi. This work was supported by the U. S.
Department of Energy, Office of Science, under Contract
No. DE-AC02-06CH11357. T.H. was supported in part
by a Grant-in-Aid from the Ministry of Education, Cul-
ture, Sports, Science and Technology (MEXT) of Japan
(Grant Nos. 16740213 and 18043003). Part of the calcu-
lations were performed at the Ohio Supercomputer Cen-
ter thanks to a grant of computing time.
APPENDIX: CALCULATION OF THE
PREFACTORS
In order to find the prefactors J∗l in the expressions
for the exchange constants, fluctuations around the in-
stanton trajectory have to be taken into account. The
Euclidean (imaginary time) path integral for the propa-
gator G(R1,R2;T ) = 〈R1|e−TH |R2〉 can be written as
G(R1,R2;T ) =
∫ R(T )=R2
R(0)=R1
DR e−
S[R], (A.1)
where the Euclidean action is given by
S[R] =
+ V (R)
. (A.2)
Here R is a M -dimensional position vector, where M/2
is the total number of moving particles, including the
exchanging particles as well as the spectators. In the
semiclassical limit, the integral is dominated by the clas-
sical path Rcl(τ) that extremizes the action S for a
given exchange process. (The exponents η are given as
η = S[Rcl]/(~
rΩ).) The Gaussian quantum fluctua-
tions about the classical path can be taken into account
by defining fluctuation coordinates u(τ) ≡ R(τ)−Rcl(τ)
and subsequently expanding the action to second order.
We obtain for the propagator
G(R1,R2;T ) = F [Rcl]e
S[Rcl], (A.3)
F [Rcl] =
∫ u(T )=0
u(0)=0
Du(τ) e− 1~ δS[u(τ)], (A.4)
δS[u(τ)] =
2(τ) + uT (τ)H(τ)u(τ)
, (A.5)
Hkp(τ) =
∂2V (R)
∂Rk∂Rp
R=Rcl(τ)
. (A.6)
In the preceding formulas, R1 and R2 correspond to two
configurations of electrons that minimize the electrostatic
potential V (R) describing electron-electron interactions
as well as the confining potential. The exchange constant
is related to the ratio of the propagator for a particular
exchange process R1 → R2, divided by the propagator
for the trivial path Rcl(τ) = R1:
F [Rcl]
F [R1]
S[Rcl]. (A.7)
We start from the expression for the propagator in the
semiclassical limit and proceed by partitioning the time
interval [0, T ] into N subintervals (τ0, τ1), (τ1, τ2), . . . ,
(τN−1, τN ), with τ0 = 0, τN = T . The partition is cho-
sen sufficiently fine as to enable the approximation that
in each subinterval, the Hessian matrix H(τ) of the sec-
ond derivative of the potential can be considered time
independent, H(τ) ≃ H(τν) ≡ Hν . (In what follows, we
use the convention that for the fluctuation coordinates,
superscripts denote time subinterval, while subscripts de-
note spatial coordinate.) Subsequently the path integral
is calculated as a product of path integrals over the par-
titioned interval. Moreover, each individual path inte-
gral is that of a multidimensional harmonic oscillator,
for which analytic results exist. We then have
F [Rcl] =
du1 G1(u
1,u0; τ1 − τ0) . . .
duN−1 GN−1(u
N−1,uN−2; τN−1 − τN−2)GN (uN ,uN−1; τN − τN−1), (A.8)
and the propagator for each subinterval is
ν ,uν−1; τν − τν−1) =
∫ u(τν)=uν
u(τν−1)=uν−1
Du(τ) exp
2(τ) + uT (τ)Hνu(τ)
. (A.9)
Within each imaginary time subinterval, we define or-
thonormal eigenvectors qνµ =
k=1 U
k. The unitary
matrix Uν is such that Hν = UνΛν(Uν)T , with Λ a diag-
onal matrix of eigenvalues (λνµ)
2, µ = 1 . . .M , where M
is the number of spatial coordinates. Then one immedi-
ately obtains
ν ,qν−1; τν − τν−1) =
∫ q(τν)=qν
q(τν−1)=qν−1
Dq(τ) exp
2(τ) + qT (τ)Λνq(τ)
= F̄ [qcl]e
δS[qcl],
(A.10)
where qcl is the classical trajectory connecting q
ν−1 and qν . Considering the fluctuation part first, we obtain an
elementary path integral
F̄ [qcl] =
∫ q(τν)=0
q(τν−1)=0
Dq(τ) exp
dτ qT (τ)
, (A.11)
where
Bνµ =
~ sinh(λνµ∆τν)
, (A.12)
and ∆τν = τn − τn−1. The exponent δS[qcl] can now be
calculated explicitly
δS[qcl]
[(qνµ)
cl + (q
cl] cosh(λ
µ∆τν)
−2(qνµ)cl(qν−1µ )cl
. (A.13)
The subscript “cl” used for notational clarity will be sub-
sequently dropped from all expressions. With some addi-
tional algebra, the remaining integral is easily evaluated.
With the following definitions
Γνkp =
~ tanh(λνµ∆τν)
Uνpµ (A.14)
∆νkp =
~ sinh(λνµ∆τν)
Uνpµ, (A.15)
we find
F [Rcl] = (2π)
, (A.16)
where the M(N − 1) ×M(N − 1) matrix Ωνλkp has com-
ponents
Ωνλkp = (Γ
kp + Γ
kp )δ
ν,λ −∆νkpδν,(λ+1) −∆λkpδν,(λ−1).
(A.17)
The calculation of F [R1] is carried out in an identical
manner and the subscript “0” will be used to distinguish
the results pertaining to that calculation.
Finally, one has to account for the existence of an
eigenvalue of the matrix Ω which is identically zero in
the continuum limit and corresponds to the zero mode
associated with uniform translation of the instanton in
imaginary time. The procedure is standard55 and we
simply report the result for the prefactor here. One ob-
tains
G = T
Bνµ,0
detΩ0
det′ Ω
, (A.18)
where the primed determinant implies the exclusion of
the eigenvalue corresponding to the zero mode. Revert-
ing to the system of units used in this work, the prefactor
of the exchange energy is given by
J∗l =
Al rΩ
Bνµ,0
detΩ0
det′ Ω
(A.19)
The additional factor Al is used to account for multiple
classical trajectories corresponding to the same exchange
process, as happens for the case of nearest and next-
nearest neighbor exchanges (i.e., A1 = A2 = 2, whereas
Al = 1 for l ≥ 3).
The numerical implementation of the method outlined
above is straightforward. In particular, the quantity that
needs to be numerically calculated, once for each type of
exchange at all densities of interest, is
Bνµ,0
detΩ0
det′ Ω
. (A.20)
We note here that the eigenvalue corresponding to the
zero mode is easily calculated with the same procedure
used by Voelker and Chakravarty38. In the definition of
the prefactor, see Eqs. (A.4) and (A.5), one replacesH(τ)
with H(τ)− λ, with λ a free parameter. Subsequently, a
numerical search for the smallest eigenvalue that results
in 1/F (λ) = 0 is carried out. The smallest eigenvalue
corresponds to the zero mode, and for a finite partition
of the imaginary time interval it is a small but finite
number.
1 K. J. Thomas, J. T. Nicholls, M. Y. Simmons, M. Pepper,
D. R. Mace, and D. A. Ritchie, Phys. Rev. Lett. 77, 135
(1996).
2 A. Kristensen, J. B. Jensen, M. Zaffalon, C. B. Sørensen,
S. M. Reimann, P. E. Lindelof, M. Michel, and A. Forchel,
J. Appl. Phys. 83, 607 (1998).
3 A. Kristensen, H. Bruus, A. E. Hansen, J. B. Jensen,
P. E. Lindelof, C. J. Marckmann, J. Nyg̊ard,
C. B. Sørensen, F. Beuscher, A. Forchel, and M. Michel,
Phys. Rev. B 62, 10950 (2000).
4 K. J. Thomas, J. T. Nicholls, N. J. Appleyard, M. Y. Sim-
mons, M. Pepper, D. R. Mace, W. R. Tribe, and
D. A. Ritchie, Phys. Rev. B 58, 4846 (1998).
5 B. E. Kane, G. R. Facer, A. S. Dzurak, N. E. Lumpkin,
R. G. Clark, L. N. Pfeiffer, and K. W. West, Appl. Phys.
Lett. 72, 3506 (1998).
6 K. J. Thomas, J. T. Nicholls, M. Pepper, W. R. Tribe,
M. Y. Simmons, and D. A. Ritchie, Phys. Rev. B 61,
R13365 (2000).
7 D. J. Reilly, G. R. Facer, A. S. Dzurak, B. E. Kane,
R. G. Clark, P. J. Stiles, R. G. Clark, A. R. Hamil-
ton, J. L. O’Brien, N. E. Lumpkin, L. N. Pfeiffer, and
K. W. West, Phys. Rev. B 63, 121311(R) (2001).
8 S. M. Cronenwett, H. J. Lynch, D. Goldhaber-
Gordon, L. P. Kouwenhoven, C. M. Marcus, K. Hirose,
N. S. Wingreen, and V. Umansky, Phys. Rev. Lett. 88,
226805 (2002).
9 D. J. Reilly, T. M. Buehler, J. L. O’Brien, A. R. Hamilton,
A. S. Dzurak, R. G. Clark, B. E. Kane, L. N. Pfeiffer, and
K. W. West, Phys. Rev. Lett. 89, 246801 (2002).
10 R. Crook, J. Prance, K. J. Thomas, S. J. Chorley, I. Farrer,
D. A. Ritchie, M. Pepper, and C. G. Smith, Science 312,
1359 (2006).
11 R. de Picciotto, L. N. Pfeiffer, K. W. Baldwin, and
K. W. West, Phys. Rev. B 72, 033319 (2005).
12 R. Danneau, W. R. Clarke, O. Klochan, A. P. Micol-
ich, A. R. Hamilton, M. Y. Simmons, M. Pepper, and
D. A. Ritchie, Appl. Phys. Lett. 88, 012107 (2006).
13 O. Klochan, W. R. Clarke, R. Danneau, A. P. Micolich,
L. H. Ho, A. R. Hamilton, K. Muraki, and Y. Hirayama,
Appl. Phys. Lett. 89, 092105 (2006).
14 L. P. Rokhinson, L. N. Pfeiffer, and K. W. West, Phys.
Rev. Lett. 96, 156602 (2006).
15 C.-K. Wang and K.-F. Berggren, Phys. Rev. B 54, R14257
(1996); 57, 4552 (1998); A. A. Starikov, I. I. Yakimenko,
and K.-F. Berggren, Phys. Rev. B 67, 235319 (2003).
16 B. Spivak and F. Zhou, Phys. Rev. B 61, 16730 (2000).
17 V. V. Flambaum and M. Yu. Kuchiev, Phys. Rev. B 61,
R7869 (2000).
18 T. Rejec, A. Rams̆ak, and J. H. Jefferson, Phys. Rev. B
62, 12985 (2000).
19 H. Bruus, V. V. Cheianov, and K. Flensberg, Physica E
10, 97 (2001).
20 K. Hirose, S. S. Li, and N. S. Wingreen, Phys. Rev. B 63,
033315 (2001).
21 O. P. Sushkov, Phys. Rev. B 64, 155319 (2001); Phys. Rev.
B 67, 195318 (2003).
22 Y. Meir, K. Hirose, and N. S. Wingreen, Phys. Rev. Lett.
89, 196802 (2002).
23 Y. Tokura and A. Khaetskii, Physica E 12, 711 (2002).
24 K. A. Matveev, Phys. Rev. Lett. 92, 106801 (2004); Phys.
Rev. B 70, 245319 (2004).
25 T. Rejec and Y. Meir, Nature 442, 900 (2006).
26 H. Bruus and K. Flensberg, Semicond. Sci. Technol. 13,
A30 (1998).
27 G. Seelig and K. A. Matveev, Phys. Rev. Lett. 90, 176804
(2003).
28 E. Lieb and D. Mattis, Phys. Rev. 125, 164 (1962).
29 A. D. Klironomos, J. S. Meyer and K. A. Matveev, Euro-
phys. Lett. 74, 679 (2006).
30 H. J. Schulz, Phys. Rev. Lett. 71, 1864 (1993).
31 R. W. Hasse and J. P. Schiffer, Ann. Phys. 203, 419 (1990).
32 G. Piacente, I. V. Schweigert, J. J. Betouras, and
F. M. Peeters, Phys. Rev. B 69, 045324 (2004).
33 W. Häusler, Z. Phys. B 99, 551 (1996).
34 A. D. Klironomos, R. R. Ramazashvili, and K. A. Matveev,
Phys. Rev. B 72, 195343 (2005).
35 M. M. Fogler and E. Pivovarov, Phys. Rev. B 72, 195344
(2005); J. Phys.: Condens. Matter 18, L7 (2006).
36 M. Roger, Phys. Rev. B 30, 6432 (1984).
37 M. Katano and D. S. Hirashima, Phys. Rev. B 62, 2573
(2000).
38 K. Voelker and S. Chakravarty, Phys. Rev. B 64, 235125
(2001).
39 B. Bernu, L. Candido, and D. M. Ceperley, Phys. Rev.
Lett. 86, 870 (2001).
40 D. J. Thouless, Proc. Phys. Soc. London 86, 893 (1965).
41 C. K. Majumdar and D. K. Ghosh, J. Math. Phys. 10,
1388 (1969); 10, 1399 (1969).
42 F. D. M. Haldane, Phys. Rev. B 25, R4925 (1982).
43 K. Okamoto and K. Nomura, Phys. Lett. A 169, 433
(1992).
44 S. Eggert, Phys. Rev. B 54, R9612 (1996).
45 S. R. White and I. Affleck, Phys. Rev. B 54, 9862 (1996).
46 T. Hamada, J. Kane, S. Nakagawa, and Y. Natsume, J.
Phys. Soc. Jpn. 57, 1891 (1988).
47 T. Tonegawa and I. Harada, J. Phys. Soc. Jpn. 58, 2902
(1989).
48 A. V. Chubukov, Phys. Rev. B 44, 4693 (1991).
49 D. Allen, F. H. L. Essler, and A. A. Nersesyan, Phys. Rev.
B 61, 8871 (2000).
50 C. Itoi and S. Qin, Phys. Rev. B 63, 224423 (2001).
51 N. Muramoto and M. Takahashi, J. Phys. Soc. Jpn. 68,
2098 (1999).
52 To be precise, we have found that the ground state at
large eJ2/J4 belongs to the subspace (S,Q) = (0, 0) [(0, π)]
for N = 8m [8m + 4], where m is an integer, while the
first-excited state belongs to the subspace (S,Q) = (0, π)
[(0, 0)].
53 A. Yacoby, H. L. Stormer, N. S. Wingreen, L. N. Pfeiffer,
K. W. Baldwin, and K. W. West, Phys. Rev. Lett. 77,
4612 (1996).
54 A. J. Daneshvar, C. J. B. Ford, A. R. Hamilton, M. Y. Sim-
mons, M. Pepper, and D. A. Ritchie, Phys. Rev. B 55,
R13409 (1997).
55 S. Coleman, Aspects of Symmetry (Cambridge University
Press, New York, 1988).
ABSTRACT
  We consider interacting electrons in a quantum wire in the case of a shallow
confining potential and low electron density. In a certain range of densities,
the electrons form a two-row (zigzag) Wigner crystal whose spin properties are
determined by nearest and next-nearest neighbor exchange as well as by three-
and four-particle ring exchange processes. The phase diagram of the resulting
zigzag spin chain has regions of complete spin polarization and partial spin
polarization in addition to a number of unpolarized phases, including
antiferromagnetism and dimer order as well as a novel phase generated by the
four-particle ring exchange.

<|endoftext|><|startoftext|>
arXiv:0704.0777v1  [hep-th]  5 Apr 2007
CALT-68-2636
DAMTP-2007-25
UT-07-11
Decoupling Supergravity from the Superstring
Michael B. Green,1 Hirosi Ooguri,2,3 and John H. Schwarz2
1Department of Applied Mathematics and Theoretical Physics
Cambridge University, Cambridge CB3 0WA, UK
2California Institute of Technology, Pasadena, CA 91125, USA
3Department of Physics, University of Tokyo, Tokyo 113-0033, Japan
Abstract
We consider the conditions necessary for obtaining perturbative maximal supergrav-
ity in d dimensions as a decoupling limit of type II superstring theory compactified on
a (10 − d)-torus. For dimensions d = 2 and d = 3 it is possible to define a limit in
which the only finite-mass states are the 256 massless states of maximal supergravity.
However, in dimensions d ≥ 4 there are infinite towers of additional massless and finite-
mass states. These correspond to Kaluza–Klein charges, wound strings, Kaluza–Klein
monopoles or branes wrapping around cycles of the toroidal extra dimensions. We con-
clude that perturbative supergravity cannot be decoupled from string theory in dimensions
≥ 4. In particular, we conjecture that pure N = 8 supergravity in four dimensions is in
the Swampland.
March, 2007
http://arxiv.org/abs/0704.0777v1
There has recently has been some speculation that four-dimensional N = 8 super-
gravity might be ultraviolet finite to all orders in perturbation theory [1,2,3]. If true, this
would raise the question of whether N = 8 supergravity might be a consistent theory that
is decoupled from its string theory extension. A related issue is whether N = 8 supergrav-
ity can be obtained as a well-defined limit of superstring theory. Here we argue that such
a supergravity limit of string theory does not exist in four or more dimensions, irrespective
of whether or not the perturbative approximation is free of ultraviolet divergences.
In this paper, we will study limits of Type IIA superstring theory on a (10 − d)-
dimensional torus T 10−d for various d. One may regard the following analysis as analogous
to the study of the decoupling limit on Dp-branes (the limit where field theories on branes
decouple from closed string degrees freedom in the bulk) for various p [4,5]. The decoupling
limit on Dp-branes is known to exist for p ≤ 5. On the other hand, subtleties have been
found for p ≥ 6, where infinitely many new world-volume degrees of freedom appear in the
limit. This has been regarded as a sign that a field theory decoupled from the bulk does
not exist on Dp-branes for p ≥ 6. We will find similar subtleties for Type IIA theory on
T 10−d ×Rd for d ≥ 4.
It will be sufficient for our purposes to consider the torus T 10−d to be the product of
(10− d) circles, each of which has radius R. Numerical factors, such as powers of 2π, are
irrelevant to the discussion that follows and therefore will be dropped. In ten dimensions,
Newton’s constant is given by
G10 = g
2ℓ8s ,
where ℓs is the string scale and g is the string coupling constant. Thus, the effective Newton
constant in d dimensions is given by
Gd ≡ ℓd−2d =
R10−d
g2ℓ8s
R10−d
, (1)
where ℓd is the d-dimensional Planck length, so that
. (2)
We are interested in whether there is a limit of string theory that reduces to maximal
supergravity, which is defined purely in terms of the dynamics of the 256 states in the
massless supermultiplet. In other words, we are interested in the limit in which all the
excited string states, together with the Kaluza–Klein excitations and string winding states
associated with the (10− d)-torus, decouple. A necessary condition for this to happen is
that these states are all infinitely massive compared to the d-dimensional Planck scale ℓd.
This is achieved by taking
, and
, (3)
with ℓd fixed. This is compatible with keeping g fixed for d < 6. If the extra states do
decouple then the surviving states are the 256 massless states of maximal supergravity,
which is N = 8 supergravity when d = 4.
Let us now consider the spectrum of nonperturbative superstring excitations in this
limit. First consider a Dp-brane wrapping a p cycle of the torus. The mass of such a state
in d dimensions is
· ℓ1−
. (4)
When d ≤ 5, we also need to consider a NS5-brane wrapping a 5 cycle. This has a mass
given by
MNS5 =
g2ℓ6s
· ℓ2−d
. (5)
In order to obtain the pure supergravity theory with 32 supercharges in d dimensions, these
nonperturbative states also need to decouple, so their masses must satisfy Mp,MNS5 ≫
1/ℓd. In the case of d = 4 the nonperturbative BPS particle spectrum also includes
Kaluza–Klein monopoles, which are discussed in the next paragraph.
Before studying the limit in any dimension, d, we will discuss what to expect on
general ground. A Kaluza–Klein momentum state and a wrapped string state have masses
1/R and R/ℓ2s , respectively, and they are half-BPS objects that carry a single unit of
a conserved charge. In d-dimensions, their magnetic duals are (d − 4)-branes. The BPS
saturation condition together with the Dirac quantization condition implies quite generally
that the mass m of a BPS particle and the tension T of its magnetic dual (d − 4)-brane
are related by
mT ∼ 1
. (6)
1 In this limit, the string length ℓs provides a regularization scale for supergravity. Thus, if
string amplitudes depend sensitively on ℓs, it can be taken as evidence for ultraviolet divergences
in supergravity. This is seen explicitly, for example, in the one-loop four graviton amplitude,
which is ultraviolet divergent in nine dimensions. The corresponding string expression is finite
and its low-energy limit is sensitive to the presence of these massive states with momenta ∼ 1/ℓs.
Applying this to d = 4, we immediately conclude that there is no limit in four dimensions
where we can keep all BPS particles heavier than the Planck scale. In particular, magnetic
duals of Kaluza–Klein excitations, which are the well-known Kaluza–Klein monopoles, are
BPS states with masses ∼ R/ℓ24 → 0.2 Similarly, magnetic duals of wrapped strings are
NS5-branes wrapping 5-cycles of T 6, and their masses go as ℓ2s/Rℓ
4 → 0. Later, we will
discuss implications of these light states. When d ≥ 5, at least a subset of the BPS branes
become tensionless in the limit (3).
By contrast, in three dimensions it is possible to define a limit where all BPS particles
become infinitely massive simultaneously. In this case, magnetic duals of BPS particles
are (−1)-branes, namely instantons, and their Euclidean actions vanish in the limit. Thus,
one would expect nonperturbative effects to be very large in three dimensions even though
no singularity is apparent from the spectrum.
In two dimensions, there are no magnetic duals of BPS particles, and we expect that
there is a smooth limit where all BPS particles are massive and instanton actions remain
non-vanishing.
Now, let us look at each case in more detail. When d = 2, the conditions we want to
impose are
and MNS5 =
→ ∞. (7)
On the other hand, the string coupling constant is given by
. (8)
Thus, the desired limit can be taken by sending R → 0 while keeping the string coupling
constant finite. In this limit, all particle masses are much higher than the Planck mass,
except for the massless two-dimensionalN = 16 supergravity states [6]. However, Dp-brane
and NS5-brane instantons wrapping T 8 have Euclidean actions proportional to (ℓs/R)
3−p ∼
8 and (ℓs/R)
2 ∼ g− 14 , respectively. Though the actions all remain finite and non-zero in
the limit, their effects are not uniformly suppressed for small g. Thus, the resulting theory
may not have a weak coupling limit that is dominated by the perturbative contribution.
When d = 3, the conditions we need to impose are
and MNS5 =
→ ∞. (9)
2 If the torus has six independent radii Ri, the Kaluza–Klein monopole mass spectrum has the
form M2 =
(niRi/ℓ
Since we now have
· ℓ3, (10)
we can rewrite (9) as
and MNS5 =
→ ∞. (11)
Since p = 0, 2, 4, 6 in Type IIA theory, this can again be arranged by taking R → 0
keeping g finite.3 This is also compatible with the limit (3). Thus, all particle states
develop large masses and may decouple, except for those in three-dimensional N = 16
supergravity theory [7]. However, Dp-brane and NS5-brane instanton actions, which are
given by g
4 (R/ℓ3)
8 and g−
2 (R/ℓ3)
4 , vanish in the limit R → 0 for any finite value of
g. This means that nonperturbative effects are strong and it may be difficult to determine
the properties of the resulting three-dimensional supergravity.
In view of these observations, it is interesting that gravity theories formulated in
terms of a finite number of fields are known to exist in two and three dimensions. In
three dimensions, the relation with Chern-Simons gauge theory [8] suggests that pure
Einstein gravity is finite to all orders in perturbation theory. However, this theory has
no propagating degrees of freedom, and it is not known whether there is a finite quantum
gravity theory in three dimensions that includes propagating (scalar or spin-1/2) degrees
of freedom. Such degrees of freedom are present, of course, in the examples considered
here. The fact that we find limits of string theory compactifications with a finite number
of such propagating degrees of freedom in these dimensions may be encouraging, though
the implications of the nonperturbative instanton contributions need to be understood.
When d = 4, the conditions, (3), necessary for the extra modes to have infinite masses
and MNS5 =
→ ∞. (12)
Clearly, this cannot be realized simultaneously for all p = 0, 2, 4, 6. This is in accord with
the general argument given earlier, since a wrapped Dp-brane and a wrapped D(6−p)-brane
3 Note that, in the Type IIB theory, a wrapped D7-brane cannot be made heavy unless g ≫ 1.
This is not in contradiction with T-duality since g transforms under T-duality in such a way that
ℓp given by (1) remains invariant. T-duality along one of the circles on T
10−d transforms the
coupling g → gℓs/R so it diverges in the limit R → 0 with the original coupling constant, given
by (10), kept finite.
are electric–magnetic duals. Similarly, the magnetic duals of Kaluza–Klein excitations and
wrapped strings are Kaluza–Klein monopoles and wrapped NS5-branes, whose masses
behave as R/ℓ24 and ℓ
4, respectively. There are infinitely many such states since they
have arbitrary integer charges. In the limit R, ℓ2s/R → 0, there is no mass gap and the
spectrum becomes continuous.
To understand the implications of these infinitely many light states, we note that
among the elements of the four-dimensional U-duality group E7(Z) is the four-dimensional
S-duality transformation that interchanges the 28 types of electric charge with the corre-
sponding magnetic charges [9,10]. This duality is described by the following transforma-
tions of the moduli,
S : R → R̃ = ℓ
and ℓs → ℓ̃s =
. (13)
Note that this transformation inverts the radius R in four-dimensional Planck units (in
contrast to T-duality, which inverts R in string units). Since g is related to R and ℓs by
(2), this transformation acts as the inversion g → g̃ = 1/g, which maps BPS states into
each other. For example, a wrapped Dp-brane is interchanged with a wrapped D(6 − p)-
brane. Similarly, a Kaluza–Klein excitation is interchanged with a Kaluza–Klein monopole
(whereas T-duality would relate it to a wrapped F-string). Thus, in the dual frame in
which the compactification scale R̃ → ∞, the six-torus is decompactified. This explains
the continuous spectrum in the limit (3). The fact that an infinite set of states from the
nonperturbative sector become massless shows that the limit of interest does not result
in pure N = 8 supergravity in four dimensions. Rather, it results in 10-dimensional
decompactified string theory with the string coupling constant inverted. This is true in
both the type IIA and type IIB cases. The only way of avoiding this would be to relax (3),
in which case there would instead be extra finite-mass Kaluza–Klein or winding number
states, which would therefore not decouple.
One may regard our results on the limit of superstring compactification on T 10−d as
examples illustrating the conjectures formulated in [11,12] on the geometry of continuous
moduli parameterizing the string landscape. The conjectures concern consistent quantum
gravity theories with finite Planck scale in four or more dimensions. Among the conjectures
are the statements that, if a theory has continuous moduli, there are points in the moduli
space that are infinitely far away from each other, and an infinite tower of modes becomes
massless as a point at infinity is approached [12]. Since the limit considered in this paper
corresponds to a point in the moduli space of string compactifications at infinite distance
from a generic point in the middle of moduli space, the conjectures predict than an infinite
number of particles become massless in the limit. For d = 4, we have found that among
such particles are Kaluza–Klein monopoles, i.e., Kaluza–Klein modes on T 6 in the dual
frame in the limit R̃ → ∞. On the other hand, the moduli space of pureN = 8 supergravity
also contains infinite distance points, but it does not take account of new light particles
appear near these points. If the BPS particles required by string theory were included
one would have string theory and not N = 8 supergravity.4 Thus, the conjectures of
[12] imply that the N = 8 supergravity is in the Swampland. Similarly, there are many
superstring compactifications with N < 8 supersymmetry, and discarding stringy states in
these compactifications results in further supergravity theories in the Swampland.
It is interesting to see how scattering amplitudes behave in the limit (3). Consider
a four-dimensional graviton scattering amplitude where the graviton momenta are below
the four-dimensional Planck scale. According to (1) and (2), the ten-dimensional Planck
length, ℓ10, is given by
ℓ10 = g
4 ℓs = R
. (14)
After the S-duality transformation (13), the limit R → 0 turns into R̃ → ∞. Thus, we have
ℓ̃10 = R̃
→ ∞ in ten dimensions. Since ℓ̃10 ≪ R̃, the extra dimensions decompactify
and the theory is effectively ten-dimensional. Furthermore, if we take this limit keeping
the graviton momenta fixed (in units of the four-dimensional Planck mass), the scattering
process becomes trans-Planckian. Generically, we expect that it will involve formation and
evaporation of virtual black holes in ten dimensions.
The original motivation of this work was to investigate the relation between super-
string theory and N = 8 supergravity to see, in particular, under what conditions super-
gravity might be ultraviolet finite. What we have found is that in four or more dimensions
(d ≥ 4) there is no limit of compactified superstring theory in which the stringy effects
decouple and only the 256 massless supergravity fields survive below the four-dimensional
Planck scale. This is true whether or not there are ultraviolet divergences in supergravity
perturbation theory. Of course, there is a well-defined procedure for extracting UV finite
four-dimensional scattering amplitudes from perturbative string theory. This involves tak-
ing g → 0 first, before taking the limit (3). However, this procedure does not keep ℓ4 fixed,
and therefore it does not correspond to the limit considered in this paper.
4 One can imagine an alternative history in which type II superstring theory and M-theory
were discovered by properly interpreting the BPS solitons of N = 8 supergravity.
It might be instructive to compare the situation to that of the conifold limit of Calabi–
Yau compactified type II superstring theory studied by Strominger [13]. In that case,
certain terms in the low-energy effective theory that are independent of the string coupling
constant g, due to the decoupling of vector and hypermultiplet fields, can be computed in
string perturbation theory. One can estimate the singularity of these terms using the fact
that a brane wrapping a vanishing cycle describes a nonperturbative BPS particle that
becomes massless in the conifold limit. If one could identify analogous terms in N = 8
supergravity, one could transform the Feynman diagram computation in four-dimensional
supergravity into a corresponding computation in ten dimensions, which might give insight
into the question of ultraviolet finiteness.
The situation is qualitatively different in two and three dimensions (d = 2, 3), where
all non-supergravity states develop masses larger than the Planck scale in the limit (3),
and therefore they can decouple. In these cases only the 256 massless supergravity states
survive, and a self-contained quantum gravity theory may well exist decoupled from string
theory. We have found, however, that in the d = 3 case there are instantons with zero
action, which give rise to large nonperturbative contributions. In the d = 2 case the
instanton actions do not vanish in the limit (3), but not all of them are small when g is
small. Therefore the amplitudes may not be dominated by the perturbative contribution
in this case, too.
Acknowledgments
We thank Z. Bern, N. Dorey, C. Hull, J. Russo, N. Seiberg, A. Sen, M. Shigemori,
Y. Tachikawa, D. Tong, P. Vanhove and E. Witten for discussions. H.O. thanks the
hospitality of the particle theory group of the University of Tokyo.
H.O. and J.H.S. are supported in part by the DOE grant DE-FG03-92-ER40701. The
research of H.O. is also supported in part by the NSF grant OISE-0403366 and by the 21st
Century COE Program at the University of Tokyo.
References
[1] M. B. Green, J. G. Russo and P. Vanhove, “Non-renormalisation conditions in type
II string theory and maximal supergravity,” JHEP 0702, 099 (2007) [arXiv:hep-
th/0610299].
[2] Z. Bern, L. J. Dixon and R. Roiban, “Is N = 8 supergravity ultraviolet finite?,” Phys.
Lett. B 644, 265 (2007) [arXiv:hep-th/0611086].
[3] Z. Bern, J. J. Carrasco, L. J. Dixon, H. Johansson, D. A. Kosower and R. Roiban,
“Three-loop superfiniteness of N = 8 supergravity,” arXiv:hep-th/0702112.
[4] A. Sen, “D0-branes on Tn and matrix theory,” Adv. Theor. Math. Phys. 2, 51 (1998)
[arXiv:hep-th/9709220].
[5] N. Seiberg, “Why is the matrix model correct?,” Phys. Rev. Lett. 79, 3577 (1997)
[arXiv:hep-th/9710009].
[6] H. Nicolai and N. P. Warner, “The structure of N = 16 supergravity in two dimen-
sions,” Commun. Math. Phys. 125, 369 (1989).
[7] N. Marcus and J. H. Schwarz, “Three-dimensional supergravity theories,” Nucl. Phys.
B 228, 145 (1983).
[8] E. Witten, “(2+1)-dimensional gravity as an exactly soluble system,” Nucl. Phys. B
311, 46 (1988).
[9] C. M. Hull and P. K. Townsend, “Unity of superstring dualities,” Nucl. Phys. B 438,
109 (1995) [arXiv:hep-th/9410167].
[10] C. M. Hull, “String dynamics at strong coupling,” Nucl. Phys. B 468, 113 (1996)
[arXiv:hep-th/9512181].
[11] C. Vafa, “The string landscape and the swampland,” arXiv:hep-th/0509212.
[12] H. Ooguri and C. Vafa, “On the geometry of the string landscape and the swampland,”
[arXiv:hep-th/0605264].
[13] A. Strominger, “Massless black holes and conifolds in string theory,” Nucl. Phys. B
451, 96 (1995) [arXiv:hep-th/9504090].
http://arxiv.org/abs/hep-th/0610299
http://arxiv.org/abs/hep-th/0610299
http://arxiv.org/abs/hep-th/0611086
http://arxiv.org/abs/hep-th/0702112
http://arxiv.org/abs/hep-th/9709220
http://arxiv.org/abs/hep-th/9710009
http://arxiv.org/abs/hep-th/9410167
http://arxiv.org/abs/hep-th/9512181
http://arxiv.org/abs/hep-th/0509212
http://arxiv.org/abs/hep-th/0605264
http://arxiv.org/abs/hep-th/9504090
ABSTRACT
  We consider the conditions necessary for obtaining perturbative maximal
supergravity in d dimensions as a decoupling limit of type II superstring
theory compactified on a (10 -- d)-torus. For dimensions d = 2 and d = 3 it is
possible to define a limit in which the only finite-mass states are the 256
massless states of maximal supergravity. However, in dimensions d > 3 there are
infinite towers of additional massless and finite-mass states. These correspond
to Kaluza--Klein charges, wound strings, Kaluza--Klein monopoles or branes
wrapping around cycles of the toroidal extra dimensions. We conclude that
perturbative supergravity cannot be decoupled from string theory in dimensions
d > 3. In particular, we conjecture that pure N = 8 supergravity in four
dimensions is in the Swampland.

<|endoftext|><|startoftext|>
Introduction
Let G denote a connected and reductive group over an algebraically
closed field k, and let B denote a Borel subgroup of G. An equi-
variant embedding X of G is a G × G-variety which contains G =
(G × G)/diag(G) as an open G × G-invariant subset, where diag(G)
is the diagonal image of G in G × G. Any equivariant embedding X
of G contains finitely many B × B-orbits. In recent years the geom-
etry of closures of B × B-orbits has been studied by several authors.
The most general result was obtained in [H-T2] where it was proved
that B × B-orbit closures are normal, Cohen-Macaulay and have (F -
)rational singularities (actually, even stronger results were obtained).
In the present paper we will study (closed) subvarieties inX of the form
diag(G) ·V, where V denotes the closure of a B×B-orbit. Subvarieties
of equivariant embeddings of G of this form will be called G-Schubert
varieties.
When G is a semisimple group of adjoint type there exists a canonical
equivariant embedding X of G which is called the wonderful compact-
ification. The wonderful compactifications are of primary interest in
this paper. Actually, this work arose from the question of describing
the closures of the so-called G-stable pieces of X. The G-stable pieces
makes up a decomposition of X into locally closed subsets. They were
introduced by Lusztig in [L] where they were used to construct and
http://arxiv.org/abs/0704.0778v2
2 XUHUA HE AND JESPER FUNCH THOMSEN
study a class of perverse sheaves which generalizes his theory of charac-
ter sheaves on reductive groups. More precisely, these perverse sheaves
are the intermediate extensions of the so-called “character sheaves”
on a G-stable piece. This motivates the study of closures of G-stable
pieces which turns out to coincide with the set of G-Schubert varieties.
Before discussing the closures of G-stable pieces in details, let us
make a short digression and discuss some other motivations for study-
ing G-stable pieces and G-Schubert varieties (in wonderful compactifi-
cations):
(1) When G is a simple group, the boundary of the closure of the
unipotent subvariety of G in the wonderful compactification X,
is a union of certain G-Schubert varieties (see [He] and [H-T]).
Thus knowing the geometry of these G-Schubert varieties will
help us to understand the geometry of the closure of the unipo-
tent variety within X.
(2) Let Lie(G) denote the Lie algebra of a simple group G over a
field of characteristic zero. Let ≪,≫ denote a fixed symmet-
ric non-degenerate ad-invariant bilinear form. Let <,> be the
bilinear form on Lie(G)⊕ Lie(G) defined by
< (x, y), (x′, y′) >=≪ x, x′ ≫ − ≪ y, y′ ≫ .
In [E-L], Evens and Lu showed that each splitting Lie(G) ⊕
Lie(G) = l ⊕ l′, where l and l′ are Lagrangian subalgebras of
Lie(G)⊕ Lie(G), gives rise to a Poisson structure Πl,l′ on X. If
moreover, one starts with the Belavin-Drinfeld splitting, then
all the G-stable pieces/G-Schubert varieties and B×B−-orbits
of X are Poisson subvarieties, where B− is a Borel subgroup
opposite to B. Thus to understand the Poisson structure on
X corresponding to the Belavin-Drinfeld splitting, one needs
to understand the geometry of the G-stable pieces/G-Schubert
varieties. If we start with another splitting, then we obtain
a different Poisson structure on X and in order to understand
these Poisson structures, one needs to study the R-stable pieces
[L-Y] instead (see Section 12), which generalize both the G-
stable pieces and the B × B−-orbits.
The main technical ingredient in this paper is the positive character-
istic notion of Frobenius splitting. Frobenius splitting is a powerful tool
which has been proved to be very useful in obtaining strong geometric
conclusions for e.g. Schubert varieties and closures of B × B-orbits in
equivariant embeddings. In the present paper we obtain two types of
results related to G-Schubert varieties over fields of positive character-
istic. First of all, if we fix an equivariant embedding X of a reductive
group G then we prove that all G-Schubert varieties in X are simul-
taneously compatibly Frobenius split by a Frobenius splitting of X .
Secondly, concentrating on a single G-Schubert variety X, in a smooth
projective and toroidal embedding X , we prove that this admits a
stable Frobenius splitting along an ample divisor. Statements of this
form put strong conditions on the intertwined behavior of cohomology
groups of line bundles on X and its G-Schubert varieties. As this is re-
lated to geometric properties it therefore seems natural to expect that
G-Schubert varieties should have nice singularities. It therefore comes
as a complete surprise that G-Schubert varieties, in general, are not
even normal. We only provide a single example of this phenomenon
(in the wonderful compactification of a group of type G2), but expect
that this absence of normality is the general picture.
In obtaining the Frobenius splitting result mentioned above, we have
developed some general theory of how to construct Frobenius splitting
of varieties of the form G×PX (see Section 4.2 for the definition). This
part of the paper is influenced by the theory of B-canonical Frobenius
splitting as discussed in [B-K, Chap.4]; in particular the proof of [B-K,
Prop.4.1.17]. The presentation we provide is more general and makes
it possible to extract even better result from the ideas of B-canonical
Frobenius splittings. This theory is presented in Chapter 5 in a general-
ity which is more than necessary for obtaining the described Frobenius
splitting results for G-Schubert varieties. However, we hope that this
theory could be useful elsewhere and we certainly consider it to be of
independent interest.
This paper is organized in the following way. In Section 2 we intro-
duce notation, and in Section 3 we briefly define Frobenius splitting
and explain its fundamental ideas. Section 4 is devoted to some results
on linearized sheaves which should all be well known. In Section 5 we
study the Frobenius splitting of varieties of the form G ×P X for a
variety X with an action by a parabolic subgroup P . The main idea
is to decompose the Frobenius morphism on G×P X into maps associ-
ated to the Frobenius morphism on the base G/P and the fiber X of the
natural morphism G×P X → G/P . In Section 6 we relate B-canonical
Frobenius splittings to the material in Section 5. Section 7 contains
applications of Section 5 to general G × G-varieties. In section 8 we
define the G-stable pieces and G-Schubert varieties. In Section 9 we
apply the material of the previous sections to the class of equivariant
embeddings and obtain Frobenius splitting results for G-Schubert vari-
eties. Section 10 contains results related to cohomology of line bundles
on G-Schubert varieties. Section 11 contains an example of a non-
normal G-Schubert variety. Finally Section 12 contains generalizations
and variations of the previous sections.
We would like to thank the referee for a careful reading of this paper
and for numerous suggestions concerning the presentation.
4 XUHUA HE AND JESPER FUNCH THOMSEN
2. Notation
We will work over a fixed algebraically closed field k. The charac-
teristic of k will depend on the application. By a variety we mean a
reduced and separated scheme of finite type over k. In particular, we
allow a variety to have several irreducible components.
2.1. Group setup. We letG denote a connected linear algebraic group
over k. We fix a Borel subgroup B and a maximal torus T ⊂ B. The
notation P is used for a parabolic subgroup of G containing B. The
set of T -characters is denoted by X∗(T ) and we identify this set with
the set X∗(B) of B-characters.
2.2. Reductive case. In many cases we will specialize to the case
where G is reductive. In this case we will also use the following no-
tation : the set of roots determined by T is denoted by R ⊆ X∗(T )
while the set of positive roots determined by (B, T ) is denoted by R+.
The simple roots are denoted by α1, . . . , αl, and we let ∆ = {1, . . . , l}
denote the associated index set. The simple reflection associated to the
simple root αi is then denoted by si. The Weyl group W = NG(T )/T is
generated by the simple reflections si, for i ∈ ∆. The length of w ∈ W
will be denoted by l(w). For J ⊂ ∆, let WJ denote the subgroup of
W generated by the simple reflection associated with the elements in
J , and let W J (resp. JW ) denote the set of minimal length coset rep-
resentatives for W/WJ (resp. WJ\W ). The element in W of maximal
length will be denoted by w0, while w
0 is used for the same kind of
element in WJ . For any w ∈ W , we let ẇ denote a representative of w
in NG(T ). For J ⊂ ∆, let PJ ⊃ B denote the corresponding standard
parabolic subgroup and P−J ⊃ B
− denote its opposite parabolic. Let
LJ = PJ ∩P
J be the common Levi subgroup of PJ and P
J containing
T . Let UJ (resp. U
J ) denote the unipotent radical of PJ (resp. P
When J = ∅ we also use the notation U and U− for UJ and U
J respec-
tively. When G is semisimple and simply connected we may associate
a fundamental character ωi to each simple root αi. The sum of the
fundamental characters is then denoted by ρ. Then ρ also equals half
the sum of the positive roots.
3. The relative Frobenius morphism
In this section we collect some results related to the Frobenius mor-
phism and to the concept of Frobenius splitting. Compared to other
presentations on the same subject, this presentation differs only in its
emphasis on the set HomOX′
(FX)∗OX ,OX′
(to be defined below) and
not just the set of Frobenius splittings. Thus, the obtained results are
only small variations of already known results as can be found in e.g.
[B-K].
3.1. The Frobenius morphism. By definition a variety X comes
with an associated morphism
pX : X → Spec(k),
of schemes. Assume that the field k has positive characteristic p > 0.
Then the Frobenius morphism on Spec(k) is the morphism of schemes
Fk : Spec(k) → Spec(k),
which on the level of coordinate rings is defined by a 7→ ap. As k
is assumed to be algebraically closed the morphism Fk is actually an
isomorphism and we let F−1k denote the inverse morphism. Composing
pX with F
k we obtain a new variety
p′X : X → Spec(k),
with underlying scheme X . In the following we suppress the morphism
pX from the notation and simply use X as the notation for the variety
defined by pX . The variety defined by p
X is then denoted by X
The relative Frobenius morphism on X is then the morphism of
varieties :
FX : X → X
which as a morphism of schemes is the identity map on the level of
points and where the associated map of sheaves
X : OX′ → (FX)∗OX ,
is the p-th power map. A key property of the Frobenius morphism is
the relation
(1) (FX)
′ ≃ Lp
which is satisfied for every line bundle L on X (here L′ denotes the
corresponding line bundle on X ′).
3.2. Frobenius splitting. A variety X is said to be Frobenius split if
the OX′-linear map of sheaves :
X : OX′ → (FX)∗OX ,
has a section; i.e. if there exists an element
s ∈ HomOX′
(FX)∗OX ,OX′
such that the composition s ◦F
X is the identity endomorphism of OX′ .
The section s will be called a Frobenius splitting of X .
6 XUHUA HE AND JESPER FUNCH THOMSEN
3.3. Compatibility with line bundles and closed subvarieties.
Fix a line bundle L on X and a closed subvariety Y in X with sheaf
of ideals IY . Let Y
′ denote the closed subvariety of X ′ associated to Y
with sheaf of ideals denoted by IY ′ . The kernel of the natural morphism
HomOX′
(FX)∗L,OX′
→ HomOX′
(FX)∗(L⊗ IY ),OY ′
induced by the inclusion L ⊗ IY ⊂ L and the projection OX′ → OY ′ ,
will be denoted by EndLF (X, Y ). The associated space of global sections
will be denoted by EndLF (X, Y ). When Y = X we simply denote
EndLF (X, Y ) (resp. End
F (X, Y )) by End
F (X) (resp. End
F (X)). The
sheaf EndLF (X, Y ) is a subsheaf of End
F (X) consisting of the elements
compatible with Y . Moreover, there is a natural morphism
EndLF (X, Y )|Y → End
F (Y ),
where the notation |Y means restriction to Y .
If Y1, Y2, . . . , Ym is a collection of closed subvarieties of X then the
notation EndLF (X, Y1, . . . , Ym) (or sometimes End
F (X, {Yi}
i=1)) will de-
note the intersection of the subsheaves EndLF (X, Yi) for i = 1, . . . , m.
The set of global sections of the sheaf EndLF (X, Y1, . . . , Ym) will be de-
noted by EndLF (X, Y1, . . . , Ym).
When L = OX we remove L from all of the above notation. In
particular, the vectorspace EndF (X) denotes the set of morphisms from
(FX)∗OX to OX′ and thus contains the set of Frobenius splittings of X .
A Frobenius splitting s of X contained in EndF (X, {Yi}i) is said to be
compatible with the subvarieties Y1, . . . , Ym. When s is compatible in
this sense it induces a Frobenius splitting of each Yi for i = 1 . . . , m.
In this case we also say that s compatibly Frobenius splits Y1, . . . , Ym.
In concrete terms, this is equivalent to
(FX)∗IYi
⊂ IY ′i .
for all i.
Lemma 3.1. Let Y and Z denote closed subvarieties in X and let s
denote a global section of EndLF (X,Z, Y ).
(1) s ∈ EndLF (X, Y1) for every irreducible component Y1 of Y .
(2) If the scheme theoretic intersection Z ∩ Y is reduced then s is
contained in EndLF (X, Y ∩ Z).
Proof. Let Y1 denote an irreducible component of Y and let
J = s
(FX)∗(IY1 ⊗ L)
⊂ OX′ .
Let U denote the open complement (in X ′) of the irreducible compo-
nents of Y ′ which are different from Y ′1 . Then IY ′1 coincides with IY ′
on U and consequently J|U ⊂ (IY ′)|U as s is compatible with Y . In
particular, J|U ⊂ (IY ′1 )|U . We claim that this implies that J ⊂ IY ′1 :
let V denote an open subset of X ′ and let f be a section of J over
V . As J is a subsheaf of OX′ , we may consider f as a function on V ,
and it suffices to prove that f vanishes on Y ′1 ∩ V . If Y
1 ∩ V is empty
then this is clear. Otherwise, U ∩ V ∩ Y ′1 is a dense subset of Y
1 and it
suffices to prove that f vanishes on this set. But this follows from the
inclusion J|U ⊂ (IY ′1 )|U . As a consequence s is compatible with Y1. The
second claim follows as the sheaf of ideals of the intersection Z ∩ Y is
IY + IZ . �
The condition that Z ∩ Y is reduced, in Lemma 3.1, only ensures
that Z ∩ Y is a variety. When L = OX and s is a Frobenius splitting
this is always satisfied [B-K, Prop.1.2.1].
3.4. The evaluation map. Let k[X ′] denote the space of global reg-
ular functions on X ′. Evaluating an element s : (FX)∗OX → OX′ of
EndF (X) at the constant global function 1 on X defines an element in
k[X ′] which we denote by evX(s). This defines a morphism
evX : EndF (X) → k[X
with the property that evX(s) = 1 if and only if s is a Frobenius
splitting of X .
3.5. Frobenius D-splittings. Consider an effective Cartier divisor D
on X , and let σD denote the associated global section of the associated
line bundle OX(D). A Frobenius splitting s of X is said to be a Frobe-
nius D-splitting if s factorizes as
s : (FX)∗OX
(FX)∗σD
−−−−−→ (FX)∗OX(D)
−→ OX′ ,
for some element sD in End
OX(D)
. We furthermore say that the
Frobenius D-splitting s is compatible with a subvariety Y if sD is com-
patible with Y . The following result assures that, in this case, the
compatibility with closed subvarieties agrees with the usual definition
[R, Defn.1.2].
Lemma 3.2. Assume that s defines a Frobenius D-splitting of X.
Then sD is compatible with Y if and only if (i) s compatibly Frobe-
nius splits Y and (ii) the support of D does not contain any irreducible
components of Y .
Proof. The if part of the statement follows from [R, Prop.1.4]. So
assume that sD is compatible with Y . Then sD induces a morphism
sD : (FY )∗OX(D)|Y → OY ′ ,
satisfying sD((σD)|Y ) is the constant function 1 on Y
′. As a conse-
quence (σD)|Y does not vanish on any of the irreducible components
of Y . This proves part (ii) of the statement. Part (i) is clearly satis-
fied. �
8 XUHUA HE AND JESPER FUNCH THOMSEN
It follows that if s is compatible with Y and, moreover, defines a
Frobenius D-splitting of X then D ∩ Y makes sense as an effective
Cartier divisor on Y and, in this case, s induces a Frobenius D ∩ Y -
splitting of Y .
3.6. Stable Frobenius splittings along divisors. Let X(0) = X
and define recursively X(n) = (X(n−1))′ for n ≥ 1. Composing the
Frobenius morphisms on X(i) for i = 0, . . . , n, we obtain a morphism
X : X → X
with an associated map of sheaves
♯ : OX(n) → (F
X )∗OX .
Let, as in Section 3.5, D denote an effective Cartier divisor on X with
associated canonical section σD of OX(D). We say that X admits a
stable Frobenius splitting along D if there exists a positive integer n
and an element
s ∈ HomO
X )∗OX(D),OX(n)
such that the composed map
OX(n)
−−−−→ (F
X )∗OX
−−−−−−→ (F
X )∗OX(D)
−→ OX(n) ,
is the identity map on OX(n) . The element s is called a stable Frobenius
splitting of X along D. When Y is a closed subvariety of X we say
that the stable Frobenius splitting s is compatible with Y if
X )∗(IY ⊗ OX(D))
⊂ IY (n).
Notice that this condition necessarily implies that the support of D
does not contain any of the irreducible components of Y (cf. proof
of Lemma 3.2). Notice also that if X admits a Frobenius D-splitting
which is compatible with Y then X admits a stable Frobenius splitting
along D which is compatible with Y . The following is well known (see
e.g. [T, Lem.4.4])
Lemma 3.3. Let D1 and D2 denote effective Cartier divisors on X and
let Y denote a closed subvariety of X. Then X admits stable Frobenius
splittings along D1 and D2 which are compatible with Y if and only if
X admits a stable Frobenius splitting along D1+D2 which is compatible
with Y .
The following result explains one of the main applications of (stable)
Frobenius splitting. Remember that a line bundle L is nef if L⊗M is
ample whenever M is ample.
Proposition 3.4. Assume that X admits a stable Frobenius splitting
along an effective Cartier divisor D. Then there exists a positive integer
n such that for each line bundle L on X we have an inclusion of abelian
groups
Hi(X,L) ⊂ Hi(X,Lp
⊗ OX(D)).
In particular, if D is ample and L is nef, then Hi(X,L) = 0 for i > 0.
Moreover, if the stable Frobenius splitting of X is compatible with a
subvariety Y , D is ample and L is nef then the restriction morphism
H0(X,L) → H0(Y,L),
is surjective.
Proof. Argue as in the proof [R, Prop.1.13(i)]. �
3.7. Duality for FX. By duality (see [Har2, Ex.III.6.10]) for the finite
morphism FX we may to each quasi-coherent OX′-module F associate
an OX -module denoted by (FX)
!F and satisfying
(FX)∗(FX)
F = HomOX′
(FX)∗OX ,F
Actually, as FX is the identity on the level of points we may define
!F as the sheaf of abelian groups
HomOX′
(FX)∗OX ,F
with OX -module structure defined by
(g · φ)(f) = φ(gf),
for g, f ∈ OX and φ ∈ HomOX′
(FX)∗OX ,F
. When F = OX we
will also use the notation End!F (X) for (FX)
!OX . This sheaf is par-
ticularly nice when X is smooth as (FX)
!OX then coincides with the
line bundle ω
X , where ωX denotes the dualizing sheaf of X (see e.g.
[B-K, Sect.1.3]). If Y1, Y2, . . . , Ym is a collection of closed subvarieties
of X then End!F (X, Y1, . . . , Ym) (or End
F (X, {Yi}
i=1)) will denote the
subsheaf of End!F (X) consisting of the elements mapping the sheaf of
ideals IYi to IY ′i for all i = 1, . . . , m. We say that End
F (X, {Yi}
i=1) is
the subsheaf of elements compatible with Y1, . . . , Ym.
More generally, duality for FX implies that we have a natural iden-
tification
(FX)∗HomOX
G, (FX)
≃ HomOX′
(FX)∗G,F
whenever G (resp. F) is a quasicoherent sheaf on X (resp. X ′). This
leads to the identification
HomOX
G, (FX)
≃ HomOX′
(FX)∗G,F
where a morphism η : G → (FX)
!F is identified with the composed
morphism
η′ : (FX)∗G
(FX)∗η
−−−−→ (FX)∗(FX)
F ≃ HomOX′
(FX)∗OX ,F
Here the latter map is the natural evaluation map at the element 1
in OX . From now on we will specialize to the case where F = OX′
10 XUHUA HE AND JESPER FUNCH THOMSEN
and G equals a line bundle L on X . In this case, an element in
HomOX
L,End!F (X)
may also be considered as a global section of
the sheaf End!F (X)⊗ L
−1. For later use we emphasize
Lemma 3.5. Let η be an element in HomOX
L,End!F (X)
and let
η′ denote the corresponding element in HomOX′
(FX)∗L,OX′
by the
above identification. Then η′ factors through the morphism
(FX)∗L
(FX)∗η
−−−−→ (FX)∗End
F (X).
Moreover, the element η′ is compatible with a collection of closed sub-
varieties Y1, . . . , Ym of X if and only if the image of η is contained in
End!F (X, Y1, . . . , Ym).
Proof. The first part of the statement follows directly from the discus-
sion above. To prove the second statement we may assume that m = 1.
We use the notation Y = Y1. Let σ denote a section of L over an open
subset U of X , and consider s = η(σ) as a map
s : OX(U) → OX′(U
That s is compatible with Y means that s(f) vanishes on Y ′ whenever
f vanishes on Y for a function f on U . Alternatively, the evaluation
of f · s at 1, which coincides with η′(f · σ), should vanish on Y ′. In
particular, the image of η is contained in End!F (X, Y ) if and only if
the restriction of η′ to (FX)∗
IY ⊗ L
maps into IY ′ . This ends the
proof. �
We will also need the following remark
Lemma 3.6. Let D denote a reduced effective Cartier divisor on X
and L denote a line bundle on X. Let M = OX((p − 1)D) ⊗ L and
assume that we have an OX-linear morphism η : M → End
F (X). Let
σD denote the canonical section of OX(D) and consider the map
ηD : L → End
F (X),
induced by σ
D . Then the element
η′D ∈ HomOX′
(FX)∗L,OX′
induced by ηD, is compatible with the support of D. In particular, the
image of ηD is contained in End
F (X,D).
Proof. Notice that η′D is the composition
η′D : (FX)∗L
(FX)∗σ
−−−−−−→ (FX)∗M
−→ OX′ ,
where η′ is the element corresponding to η. Hence, the restriction of
η′D to L⊗ OX(−D) coincides with the map
(FX)∗
L⊗ OX(−D)
) (FX)∗σ
−−−−−→ (FX)∗M
−→ OX′ .
But the restriction of η′ to (cf. (1))
(FX)∗
OX(−pD)⊗M
≃ OX′(−D
′)⊗ (FX)∗M,
maps by linearity into OX′(−D
′). The in particular part follows by
Lemma 3.5. �
3.8. Push-forward operation. Assume that f : X → Z is a mor-
phism of varieties satisfying that the associated map f ♯ : OZ → f∗OX
is an isomorphism. Let f ′ : X ′ → Z ′ denote the associated morphism.
Then f ′∗ induces a morphism
f ′∗EndF (X) → EndF (Z).
If Y ⊂ X is a closed subset then the subsheaf f ′∗EndF (X, Y ) is mapped
to EndF (Z, f(Y )), where f(Y ) denotes the variety associated to the
closure of the image of Y . On the level of global sections this means
that every Frobenius splitting s of X induces a Frobenius splitting f ′∗s
of Z such that when s is compatible with Y then f ′∗s is compatible
with f(Y ). Likewise
Lemma 3.7. With notation as above, let L denote a line bundle on
Z and let s be an element of End
f∗(L)
. Then f ′∗s is an element of
EndLF
. Moreover, if s is compatible with a closed subvariety Y of
X then f ′∗s is compatible with f(Y ).
Proof. This follows easily from the fact that the sheaf of ideals of f(Y )
coincides with f∗IY [B-K, Lem.1.1.8]. �
4. Linearized sheaves
In this section we collect a number of well known facts about lin-
earized sheaves. The chosen presentation follows rather closely the
presentation in [Bri, Sect.2].
Let H denote a linear algebraic group over the field k and let X
denote a H-variety with H-action defined by σ : H × X → X . We
let p2 : H × X → X denote projection on the second coordinate. A
H-linearization of a quasi-coherent sheaf F on X is an OH×X -linear
isomorphism
φ : σ∗F → p∗2F,
satisfying the relation
(2) (µ× 1X)
∗φ = p∗23φ ◦ (1H × σ)
as morphisms of sheaves on H × H × X . Here µ : H × H → H
(resp. p23 : H × H × X → H × X) denotes the multiplication on
H (resp. the projection on the second and third coordinate). Based
on the fact that σ∗OX = p
2OX we see that the sheaf OX admits a
canonical linearization. In the following we will always assume that
OX is equipped with this canonical linearization.
12 XUHUA HE AND JESPER FUNCH THOMSEN
A morphism ψ : F → F′ of H-linearized sheaves is a morphism of
OX -modules commuting with the linearizations φ and φ
′ of F and F′,
i.e. φ′ ◦ σ∗(ψ) = p∗2(ψ) ◦ φ.
Linearized sheaves on X form an abelian category which we denote
by ShH(X).
4.1. Quotients and linearizations. Assume that the quotient q :
X → X/H exists and that q is a locally trivial principal H-bundle.
Then for G ∈ Sh(X/H), q∗G is naturally a H-linearized sheaf on X .
This defines a functor q∗ : Sh(X/H) → ShH(X). On the other hand,
for F ∈ ShH(X), q∗F has a natural action of H . Define a functor
qH∗ : ShH(X) → Sh(
X/H) by qH∗ (F) = (q∗F)
H the subsheaf of H-
invariants of q∗F. It is known that the functor q
∗ : Sh(X/H) → ShH(X)
is an equivalence of categories with inverse functor qH∗ .
In general, if H is a closed normal subgroup of G and X is a G-
variety such that the quotient X/H exists (as above), then X/H is a G/H-
variety and the functor q∗ : ShG/H(X/H) → ShG(X) is an equivalence
of categories with inverse functor qH∗ : ShG(X) → ShG/H(
X/H).
4.2. Induction equivalence. Consider now a connected linear alge-
braic group G and a parabolic subgroup P in G. Let X denote a
P -variety. Then G×X is a G× P -variety by the action
(g, p)(h, x) = (ghp−1, px),
for g, h ∈ G, p ∈ P and x ∈ X . Then the quotient, denoted by
G ×P X , of G × X by P exists and the associated quotient map q :
G×X → G×P X is a locally trivial principal P -bundle. The quotient
of G × X by G also exists and may be identified with the projection
p2 : G×X → X . In particular, we may apply the above consideration
to obtain equivalences of the categories ShP (X), ShG×P (G × X) and
ShG(G×P X). Notice that under this equivalence a P -linearized sheaf
F on X corresponds to the G-linearized sheaf IndGP (F) = (q∗p
P . In
particular, the space of global sections of IndGP (F) equals
IndGP (F)(G×P X) =
p∗2F(G×X)
k[G]⊗k F(X)
= IndGP (F(X)),
where the second equality follows by the Künneth formula. This also
explains the notation IndGP (F). Similarly, starting with a G-linearized
sheaf G on G×P X then the associated P -linearized line bundle on X
equals G′ = ((p2)∗q
∗G)G. However, by [Bri, Lemma 2(1)] the latter also
equals the simpler pull back i∗G by the P -equivariant map
i : X → G×P X,
sending x to q(1, x). In particular, we conclude that the functor i∗ :
ShG(G ×P X) → ShP (X) is an equivalence of categories with inverse
functor IndGP . Notice also that the space of global sections of G is
G-equivariantly isomorphic to
G(G×P X) = Ind
(i∗G)(X)
which follows by (3) above.
4.3. Duality. Assume that the field k has positive characteristic p > 0.
Regard X ′ as a H-variety in the canonical way and let F denote a H-
linearized sheaf on X ′. The sheaf (FX)
!F, defined in Section 3.7, is
then naturally a H-linearized sheaf on X . Moreover, the induced H-
linearization of (FX)∗(FX)
!F coincides with the natural H-linearization
HomOX′
(FX)∗OX ,F
When X is smooth the sheaf (FX)
!OX′ is canonically isomorphic to
X (cf. Section 3.7). We may use this isomorphism to define a H-
linearization of ω
X . Alternatively we may consider the natural H-
linearization of the dualizing sheaf ΩX of X and use this to define a
H-linearization of ω
X . It may be checked that the two stated ways
of defining a H-linearization of ω
X coincide.
5. Frobenius splitting of G×P X
Let G denote a connected linear algebraic group over an algebraically
closed field k of characteristic p > 0. Let P denote a parabolic subgroup
of G and let X denote a P -variety. In this section we want to consider
Frobenius splittings of the quotient Z = G×PX of G×X by P . We let
π : Z → G/P denote the morphism induced by the projection of G×X
on the first coordinate. When g ∈ G and x ∈ X we use the notation
[g, x] to denote the element in Z represented by (g, x).
5.1. Decomposing the Frobenius morphism. The Frobenius mor-
phism FZ admits a decomposition FZ = Fb ◦ Ff where Fb (resp. Ff )
is related to the Frobenius morphism on the base (resp. fiber) of π.
More precisely, define Ẑ and the morphisms π̂ and Fb as part of the
fiber product diagram
(4) Ẑ
Fb //
// (G/P)′
A local calculation shows that we may identify Ẑ with the quotient
G ×P X
′, where the P -action on the Frobenius twist X ′ of X is the
natural one. With this identification π̂ : G ×P X
′ → G/P is just the
14 XUHUA HE AND JESPER FUNCH THOMSEN
map [g, x′] 7→ gP . It also follows that the natural morphism (induced
by the Frobenius morphism on X)
Ff : G×P X → G×P X
makes the following diagram commutative
Fb //
// (G/P)′
5.2. Let M denote a P -linearized line bundle on X and let MZ =
IndGP (M) denote the associated G-linearized line bundle on Z. The
main aim of this section is to construct global sections of the sheaf
F (Z) = HomOZ′
(FZ)∗MZ ,OZ′
To this end we fix a P -character λ and let L denote the associated line
bundle on G/P (cf. Section 4). The pull back π̂∗L of L to Ẑ is then
denoted by LẐ . We then define the following sheaves
F (Z)f := HomOẐ
(Ff)∗MZ ,OẐ
F (Z)b := HomOZ′
(Fb)∗LẐ ,OZ′
with spaces of global sections denoted by End
F (Z)f and End
F (Z)b.
Notice that whenM is substituted with the P -linearized twistM(−λ) :=
M⊗ k−λ then
M(−λ)Z = MZ ⊗ π
∗(L−1) = MZ ⊗ (Ff)
and thus by the projection formula
(5) End
M(−λ)Z
F (Z)f = HomOẐ
(Ff)∗MZ ,LẐ
Sections of End
F (Z) are then constructed as compositions of global
sections of the sheaves End
M(−λ)Z
F (Z)f and End
F (Z)b. More precisely,
v ∈ HomO
(Ff)∗MZ ,LẐ
u ∈ HomOZ′
(Fb)∗LẐ ,OZ′
are global sections of the latter sheaves, then the composition u◦(Fb)∗v
defines a global section of End
F (Z).
5.3. An equivariant setup. We now give equivariant descriptions of
the sheaves End
F (Z)f and End
F (Z)b.
5.3.1. A description of End
F (Z)f . Now End
F (Z)f is a G-linearized
sheaf on Ẑ = G×P X
′. Let Y denote a P -stable subvariety of X and
let ZY = G ×P Y denote the associated subvariety of Z with sheaf of
ideals IZY ⊂ OZ . Let ẐY denote the subvariety G ×P Y
′ of Ẑ. Then
there is a natural morphism of G-linearized sheaves
F (Z)f → HomOẐ
(Ff )∗(MZ ⊗ IZY ),OẐY
induced by the inclusion IZY ⊂ OZ and the projection OẐ → OẐY . We
let End
F (Z,ZY )f denote the kernel of the above map and arrive at a
left exact sequence of G-linearized sheaves
0 → EndMZF (Z,ZY )f → End
F (Z)f → HomOẐ
(Ff)∗(MZ⊗IZY ),OẐY
In particular, the space of global sections of End
F (Z,ZY )f is identified
with the set of elements in End
F (Z)f which map (Ff )∗(MZ ⊗ IZY ) to
⊂ OẐ . Using the observations in Section 4.2 we can give another
description of the space of global sections of End
F (Z,ZY )f . Let i
X ′ → G×P X
′ denote the morphism i′(x′) = [1, x′]. Then the functor
i′ is exact on the category of G-linearized sheaves. We want to apply
this fact on the left exact sequence (6) above : notice first that
(i′)∗End
F (Z)f = HomOX′
(i′)∗(Ff )∗MZ ,OX′
where, moreover, (i′)∗(Ff)∗MZ = (FX)∗M. Thus (i
′)∗End
F (Z)f =
EndMF (X). Similarly,
(i′)∗HomO
(Ff)∗(MZ ⊗ IZY ),OẐY
= HomOX′ ((FX)∗(M⊗ IY ),OY ′).
In particular, we see that the P -linearized sheaf on X ′ corresponding to
the G-linearized sheaf End
F (Z,ZY )f equals the kernel of the natural
EndMF (X) → HomOX′ ((FX)∗(M⊗ IY ),OY ′),
i.e. it equals EndMF (X, Y ). By Section 4.2 the space of global sections
F (Z,ZY )f of End
F (Z,ZY )f is then G-equivariantly isomorphic
IndGP
EndMF (X, Y )).
Applying the above conclusions to the sheaf M(−λ) we find:
Proposition 5.1. There exists a G-equivariant isomorphism
M(−λ)Z
F (Z)f ≃ Ind
EndMF (X)⊗ kλ)
such that when Y is a closed P -stable subvariety of X then the subset of
elements of End
M(−λ)Z
F (Z)f which map (Ff)∗(MZ⊗IZY ) to (IẐY ⊗LẐ) ⊂
LẐ (cf. equation (5)) is identified with
M(−λ)Z
F (Z,ZY )f ≃ Ind
EndMF (X, Y )⊗ kλ).
16 XUHUA HE AND JESPER FUNCH THOMSEN
5.3.2. A description of End
F (Z)b. As π
′ in the fibre-diagram (4) is
flat the natural morphism (π′)∗(FG/P )∗L → (Fb)∗π̂
∗L is an isomor-
phism ([Har2, Prop.III.9.3]). Thus there is a natural isomorphism of
G-linearized sheaves
F (Z)b ≃ (π
′)∗HomO(G/P )′
(FG/P )∗L,O(G/P)′
= (π′)∗EndLF (
G/P).
Let V denote a closed subvariety of G/P . Then EndLF (
G/P , V ) is the
kernel of the natural map
EndLF (
G/P) → HomO(G/P )′
(FG/P )∗(IV ⊗ L),OOV ′
In particular, (π′)∗
EndLF (
G/P , V )
maps into the kernel of the induced
morphism
(7) End
F (Z)b → (π
′)∗HomO
(G/P )′
(FG/P )∗(IV ⊗ L),OOV ′
Let q : G→ G/P denote the quotient map. Then π̂−1(V ) identifies with
the quotient q−1(V )×P X
′. Moreover, as π′ is locally trivial it follows
that π̂∗(IV ) = Iq−1(V )×PX′. In particular, the sheaf
(π′)∗HomO(G/P )′
(FG/P )∗(IV ⊗ L),OOV ′
is isomorphic to
HomOZ′
(Fb)∗(Iq−1(V )×PX′ ⊗ LẐ),O(q−1(V )×PX)′
Thus we see that the kernel of (7) is the subsheaf End
F (Z, π
−1(V ))b
of elements which map (Fb)∗(Iq−1(V )×PX′ ⊗ LẐ) to I(q−1(V )×PX)′ . The
global sections of this subsheaf is denote by End
F (Z, π
−1(V ))b. In
conclusion
Proposition 5.2. The map π′ induces a G-equivariant morphism
(π′)∗ : EndLF (
G/P) → End
F (Z)b.
Moreover, when V is a closed subvariety of G/P then (π′)∗ maps the
subset EndLF (
G/P , V ) into End
F (Z, q
−1(V )×P X)b.
The following is also useful.
Lemma 5.3. Let Y denote a closed P -stable subvariety of X and fix
notation as above. Then each element of End
F (Z)b maps (Fb)∗(IẐY ⊗
LẐ) to I(ZY )′.
Proof. It suffices to show that the natural morphism
HomOZ′
(Fb)∗LẐ ,OZ′
→ HomOZ′
(Fb)∗(IẐY ⊗ LẐ),O(ZY )′
is zero. By linearity, this will follow if the natural morphism
I(ZY )′ ⊗ (Fb)∗LẐ → (Fb)∗(IẐY ⊗ LẐ),
is an isomorphism, which can be checked by a local calculation. �
5.4. Conclusions. By Proposition 5.1 an element v in the vectorspace
IndGP
EndMF (X) ⊗ kλ
defines an element in End
M(−λ)Z
F (Z)f . More-
over, by Proposition 5.2, an element u ∈ EndLF (
G/P) defines an element
(π′)∗(u) in End
F (Z)b. Thus by the discussion in Section 5.2 we obtain
a G-equivariant map
M,λ : End
G/P)⊗ IndGP
EndMF (X)⊗ kλ
→ End
F (G×P X),
defined by
M,λ(u⊗ v) = (π
′)∗u ◦ (Fb)∗v.
We can now prove
Theorem 5.4. Let X denote a P -variety and M denote a P -linearized
line bundle on X. Let L denote the equivariant line bundle on G/P
associated to the P -character λ. Then the G-equivariant map Φ1
defined above, satisfies
(1) When Y is a P -stable closed subvariety of X then the restriction
of Φ1
M,λ to the subspace :
EndLF (
G/P)⊗ IndGP
EndMF (X, Y )⊗ kλ
maps to End
F (G×P X,G×P Y ).
(2) When V denotes a closed subvariety of G/P then the restriction
of Φ1
M,λ to the subspace
EndLF (
G/P , V )⊗ IndGP
EndMF (X)⊗ kλ
maps to End
F (G ×P X, q
−1(V ) ×P X), where q : G → G/P
denotes the quotient map.
Proof. The first statement follows from Proposition 5.1 and Lemma
5.3. The second statement follows from Proposition 5.2 and Lemma
5.5 below. �
Lemma 5.5. Let V denote a closed subset of G/P . Then every element
of End
M(−λ)Z
F (Z)f will map (Ff)∗(MZ ⊗ Iπ−1(V )) to I(π̂)−1(V ) ⊗ LẐ .
Proof. It suffices to prove that the natural morphism
I(π̂)−1(V ) ⊗ (Ff)∗MZ → (Ff)∗
Iπ−1(V ) ⊗MZ
is an isomorphism, which can be checked by a local calculation. �
5.5. Identify IndGP
with the space of global sections of MZ (cf.
Equation (3)). Then we can define a G-equivariant morphism
(8) End
F (G×P X)⊗ Ind
→ EndF (G×P X),
by mapping s⊗σ, for σ a global section of MZ and s : (FZ)∗MZ → OZ′ ,
to the element
(FZ)∗OZ
(FZ )∗σ
−−−−→ (FZ)∗MZ
−→ OZ′ ,
18 XUHUA HE AND JESPER FUNCH THOMSEN
in EndF (G×PX). Combining Φ
M,λ with the morphism in (8) we obtain
a G-equivariant map
ΦM,λ : End
G/P)⊗ IndGP
EndMF (X)⊗kλ
⊗ IndGP
→ EndF (Z),
where an element u⊗ v⊗ σ in the domain is mapped to the composed
(9) (FZ)∗OZ
(FZ)∗σ
−−−−→ (FZ)∗MZ
(Fb)∗v
−−−→ (Fb)∗LẐ
(π′)∗u
−−−→ OZ′.
Notice that by Lemma 3.5 the map u ∈ EndLF (
G/P) factors as
(10) (FG/P )∗L
(FG/P )∗u
−−−−−→ (FG/P )∗ω
→ O(G/P)′ ,
where u! is some global section of the line bundle Ľ := ω
⊗L−1 as-
sociated to u (cf. Section 3.7), and the rightmost map is the evaluation
map with domain (FG/P )∗ω
= EndF (G/P). It follows that we may
extend (9) into a commutative diagram
(FZ)∗OZ
(FZ)∗σ //
(Fb)∗π̂
∗(u!)
(FZ)∗MZ
(Fb)∗v //
(Fb)∗π̂
∗(u!)
(Fb)∗LẐ
(π′)∗u
(Fb)∗π̂
∗(u!)
(FZ)∗π
∗Ľ // (FZ)∗(MZ ⊗ π
∗Ľ) //// (Fb)∗(π̂
rrrrrrrrrrr
where all the vertical maps are induced by multiplication by π̂∗(u!).
Likewise the lower horizontal maps are induced from the upper hori-
zontal maps by multiplication with π̂∗(u!). The triangle on the right is
induced from (10) by pull-back to Z ′.
Theorem 5.6. Let X denote a P -variety and M denote a P -linearized
line bundle on X. Let L denote the equivariant line bundle on G/P
associated to the P -character λ. Then the G-equivariant map ΦM,λ,
defined above, satisfies
(1) When Y is a P -stable closed subvariety of X then the restriction
of ΦM,λ to the subspace :
EndLF (
G/P)⊗ IndGP
EndMF (X, Y )⊗ kλ
⊗ IndGP
maps to EndF (G×P X,G×P Y ).
(2) When V denotes a closed subvariety of G/P then the restriction
of ΦM,λ to the subspace :
EndLF (
G/P , V )⊗ IndGP
EndMF (X)⊗ kλ
⊗ IndGP
maps to EndF (G ×P X, q
−1(V ) ×P X), where q : G → G/P
denotes the quotient map.
Moreover, let u ∈ EndLF (
G/P), v ∈ IndGP
EndMF (X) ⊗ kλ
and σ ∈
IndGP
. Then the element ΦM,λ(u⊗ v ⊗ σ) factorizes both as
(FZ)∗OZ
(FZ)∗σ
−−−−→ (FZ)∗MZ
−→ OZ′,
and as
(FZ)∗OZ
(FZ)∗(σ⊗π
−−−−−−−−→ (FZ)∗(MZ ⊗ π
−→ OZ′,
where s1 and s2 satisfies
i) If v is contained in IndGP
EndMF (X, Y ) ⊗ kλ
then s1 and s2 are
compatible with G×P Y .
ii) If u is contained in EndLF (
G/P , V ) then s1 is compatible with q
−1(V )×P
Proof. Part (1) and (2) follows directly from Theorem 5.4 and the defi-
nition of ΦM,λ. The existence of s1 and s2 follows by the diagram (11).
Finally the claims about the compatibility of s1 and s2 follows from
Theorem 5.4 and Lemma 5.3. �
5.6. We will now describe when an element in the image of ΦM,λ defines
a Frobenius splitting of Z. For this we consider the composed map
evZ◦ΦM,λ. Recall that an element s ∈ EndF (Z) is a Frobenius splitting
of Z if and only if evZ(s) is the constant function 1 on Z
Let u ∈ EndLF (
G/P), v ∈ IndGP
EndMF (X)⊗kλ
and σ ∈ IndGP
By Equation (9) the image of u⊗v⊗σ under evZ ◦ΦM,λ coincides with
the global section of OZ′ determined by the composed map
(12) OZ′
−→ (FZ)∗OZ
(FZ )∗σ
−−−−→ (FZ)∗MZ
(Fb)∗v
−−−→ (Fb)∗LẐ
(π′)∗u
−−−→ OZ′ .
We may divide this composition into two parts. The first part
−→ (FZ)∗OZ
(FZ)∗σ
−−−−→ (FZ)∗MZ
(Fb)∗v
−−−→ (Fb)∗LẐ
is defined by σ and v and defines a global section of LẐ . The corre-
sponding map
M,λ : Ind
EndMF (X)⊗ kλ
⊗ IndGP
→ IndGP
k[X ′]⊗ kλ
is the map induced by the morphism
(13) EndMF (X)⊗M(X) → k[X
mapping s : (FX)∗M → OX′ and τ a global section of M, to s(τ).
Notice that we here identify IndGP
k[X ′]⊗ kλ
with the space of global
sections of LẐ (cf. Equation (3)). The second part takes a global
section τ̃ of LẐ and an element u in End
G/P) to the global section of
OZ′ defined by
−→ (Fb)∗OẐ
(Fb)∗τ̃
−−−→ (Fb)∗LẐ
(π′)∗u
−−−→ OZ′.
20 XUHUA HE AND JESPER FUNCH THOMSEN
The corresponding map is
Φλ : End
G/P)⊗ IndGP
k[X ′]⊗ kλ
→ k[Z ′],
which maps u⊗ τ̃ , to ((π′)∗u)(τ̃) (cf. Proposition 5.2). The restriction
of Φλ :
(14) φλ : End
G/P)⊗ IndGP
is the map corresponding to Φλ in case X is the one point space Spec(k)
(in which case k[X ′] is just k). In combination this defines us a com-
mutative diagram
EndLF (
G/P)⊗ IndGP
EndMF (X)⊗ kλ
⊗ IndGP
Id⊗Φ2
ΦM,λ // EndF (Z)
EndLF (
G/P)⊗ IndGP
k[X ′]⊗ kλ
) Φλ // k[Z ′]
EndLF (
G/P)⊗ IndGP
φλ // k
evG/P
33ggggggggggggggggggggggggggggggg
wheremλ is the natural map which makes the lower part of the diagram
commutative. Notice that when k[X ′] = k, e.g. if X ′ is a complete and
irreducible variety, then φλ and Φλ coincides. Let χ denote the P -
character associated to the canonical G-linearization of ω−1G/P (cf. Sec-
tion 4.3). Then as noted earlier (Section 5.5) the G-module EndLF (
coincides with the space of global sections of Ľ = ω
⊗L−1 and thus
coincides with
(16) EndLF (
G/P) = IndGP
(p− 1)χ− λ
where we abuse notation and write (p− 1)χ− λ for the 1-dimensional
P -representation associated with the character (p− 1)χ−λ. It follows
that mλ is the natural multiplication map
(17) mλ : Ind
(p− 1)χ− λ
⊗ IndGP
→ IndGP
(p− 1)χ
which is surjective if the domain is nonzero, i.e. if L and ω
⊗ L−1
are effective line bundles on G/P [R-R, Thm.3].
The commutativity of the diagram (15) then implies:
Proposition 5.7. Let Ξ denote an element in the domain of ΦM,λ,
and assume that the image (Id⊗Φ2
M,λ)(Ξ) is contained in the subspace
EndLF (
G/P)⊗IndGP
(cf. diagram (15)). Then ΦM,λ(Ξ) is a Frobenius
splitting of Z if and only if φλ((Id ⊗ Φ
M,λ)(Ξ)) equals the constant 1.
In particular, if EndLF (
G/P) ⊗ IndGP
is nonzero and IndGP
contained in the image of Φ2
M,λ, then Z admits a Frobenius splitting.
Proof. The first part of the proof is just a restatement of the fact that
the diagram (15) is commutative. The second part follows by the sur-
jectivity of mλ and the fact that G/P admits a Frobenius splitting.
Corollary 5.8. Assume that X is irreducible and complete. If both
IndGP
and IndGP
(p − 1)χ − λ
are nonzero and Φ2
M,λ is surjective,
then Z admits a Frobenius splitting.
5.7. In many concrete situation the existence of a P -invariant ele-
ment in EndMF (X) ⊗ kλ is given. Notice that this is equivalent to a
G-invariant element v in IndGP
EndMF (X) ⊗ kλ
and thus ΦM,λ defines
a G-equivariant map
(18) EndLF (
G/P)⊗ IndGP
→ EndF (Z),
u⊗ σ 7→ ΦM,λ(u⊗ v ⊗ σ).
Similarly Φ2
M,λ defines a G-equivariant morphism
(19) IndGP
→ IndGP
k[X ′]⊗ kλ
which makes the diagram
(20) EndLF (
G/P)⊗ IndGP
// EndF (Z)
EndLF (
G/P)⊗ IndGP
k[X ′]⊗ kλ
) Φλ // k[Z ′]
commutative. We also note
Corollary 5.9. Assume that X is irreducible and complete and let v
denote a P -invariant element of EndMF (X)⊗ kλ. If the induced map
M,λ)|v⊗IndGP (M(X))
: IndGP
→ IndGP
is surjective then Z admits a Frobenius splitting. In particular, if
IndGP
is an irreducible G-representation then for Z to be Frobenius
split it suffices that the latter map is nonzero.
Proof. Apply Corollary 5.8. �
6. B-Canonical Frobenius splittings
In this section we continue the study of the Frobenius splitting prop-
erties of Z = G ×P X . The notation is kept as in Section 5 but we
restrict ourselves to the case where G is a connected, semisimple and
simply connected linear algebraic group. Moreover, we fix P = B,
M = OX and λ = −(p − 1)ρ. Recall that, in this setup, the dualizing
sheaf ωG/B is the G-linearized sheaf associated to the B-character 2ρ.
22 XUHUA HE AND JESPER FUNCH THOMSEN
Thus, with the notation in Section 5.6, we have χ = −2ρ. Recall also
the G-equivariant identity (see (16))
(21) EndLF (
G/B) ≃ IndGB((p− 1)χ− λ) = Ind
B(λ) = Ind
B((1− p)ρ).
The latter G-module is called the Steinberg module of G and will be
denoted by St. The Steinberg module is a simple and selfdual G-
module. A B-canonical Frobenius splitting ofX is then a B-equivariant
(22) θ : St⊗ k(p−1)ρ → EndF (X),
containing a Frobenius splitting in its image. Notice that a B-canonical
Frobenius splitting of X is not a Frobenius splitting as defined in Sec-
tion 3.2. However, there exists a unique nonzero lowest weight vector
v− of St such that θ(v−) is a Frobenius splitting in the sense of Sec-
tion 3.2. Moreover, as St is a simple G-module the map θ is uniquely
determined by θ(v−), and we may thus identify θ with θ(v−). In this
way θ(v−) will also be called a B-canonical Frobenius splitting of X .
The importance of B-canonical Frobenius splittings was first ob-
served by O. Mathieu in connection with good filtrations of G-modules.
We refer to [B-K, Chapter 4] for a general reference on B-canonical
Frobenius splittings.
6.1. Consider a B-canonical Frobenius splitting as in (22). By Frobe-
nius reciprocity this defines a map
St → IndGB
EndF (X)⊗ kλ
and as IndGB
k[X ]
contains k we may consider the inducedG-equivariant
morphism
θ̃ : St → IndGB
EndF (X)⊗ kλ
⊗ IndGB
k[X ]
Composing θ̃ with the map Φ2
M,λ of Section 5.6 we end up with a map
M,λ ◦ θ̃ : St → Ind
k[X ′]⊗ kλ
We claim
Lemma 6.1. The composed map Φ2
M,λ ◦ θ̃ is an isomorphism on its
image IndGB
Proof. We first prove that the image of Φ2
M,λ◦θ̃ is contained in Ind
For this let EndF (X)c denote the inverse image of k ⊂ k[X
′] under
the evaluation map evX . It suffices to prove that the image of θ is
contained in EndF (X)c. Notice that EndF (X)c is a B-submodule of
EndF (X) containing the set of Frobenius splittings of X . In particular,
the image of the lowest weight space of St under θ is contained in
EndF (X)c. Moreover, as St is an irreducible G-module it is generated
by the lowest weight space as a B-module. Thus, the image of θ will
be contained in the B-module EndF (X)c.
Now Φ2
M,λ ◦ θ̃ is a map from St to Ind
B(λ) = St. Thus, by Frobenius
reciprocity, it suffices to prove that Φ2
M,λ ◦ θ̃ is nonzero which is the
case as θ contains a Frobenius splitting in its image. �
Using Lemma 6.1 we can now combine the diagram (15) with the
map Φ2
M,λ ◦ θ̃ and obtain a commutative and G-equivariant diagram
(23) St⊗ St
≃ Id⊗(Φ2
Θ // EndF (G×B X)
evZ // k[Z ′]
EndLF (
G/B)⊗ IndGB
φλ // k
evG/B
88qqqqqqqqqqqqqq
where Θ is the map induced by θ̃ and ΦM,λ. By Proposition 5.7 it
follows that Θ(Ξ), for Ξ in St⊗ St, is a Frobenius splitting of Z if and
only if the image of Ξ under φλ and Id ⊗ (Φ
M,λ ◦ θ̃) equals 1. The
latter map from St⊗St to k will be denoted by φ. By construction φ is
G-equivariant. Moreover, mλ is surjective and evG/B is nonzero (as G/B
admits a Frobenius splitting) and thus φ is nonzero. As St is a simple
G-module it follows that
(24) φ : St⊗ St → k,
defines a nondegenerate G-invariant bilinear form on St. By Frobenius
reciprocity such a form is uniquely determined up to a nonzero con-
stant. In particular, this provides a very useful way to construct lots
of Frobenius splittings of Z.
Corollary 6.2. Let θ : St⊗ k(p−1)ρ → EndF (X) denote a B-canonical
Frobenius splitting of X. Then the induced morphism (defined above)
Θ : St⊗ St → EndF (G×B X),
satisfies the following
(1) The image Θ(ν) of an element ν in St⊗ St defines a Frobenius
splitting of G×BX up to a nonzero constant if and only if φ(ν)
is nonzero.
(2) If the image of θ is contained in EndF (X, Y ) for a B-stable
closed subvariety Y of X, then the image of Θ is contained in
EndF (G×B X,G×B Y ).
(3) Let v denote an element of St = EndLF (
G/B) which is compatible
with a closed subvariety V of G/B. For any element v′ ∈ St we
Θ(v ⊗ v′) ∈ EndF (G×B X, q
−1(V )×B X),
with q : G→ G/B denoting the quotient map.
24 XUHUA HE AND JESPER FUNCH THOMSEN
(4) Any element of the form Θ(v ⊗ v′) factorizes as
(FZ)∗OZ
(FZ )∗π
−−−−−→ (FZ)∗π
−→ OZ′ ,
where Z = G×B X and L is the line bundle on G/B associated
to the B-character (1 − p)ρ. Moreover, if the image of θ is
contained in EndF (X, Y ) then s is compatible with G×B Y .
Proof. All statements follows directly from Theorem 5.6 and the con-
siderations above. �
The first part (1) and (2) of the above result is well known (see e.g.
[B-K, Ex. 4.1.E(4)]). However, the second part (3) and (4) seems to
be new.
6.2. B-canonical Frobenius splitting when G is not semisim-
ple. Although Corollary 6.2 is only stated for connected, semisimple
and simply connected groups it also applies in other cases : assume
that G is a connected linear algebraic group containing a connected
semisimple subgroup H such that the induced map H/H∩B → G/B is
an isomorphism. E.g. this is satisfied for any parabolic subgroup of a
reductive connected linear algebraic group. Let qsc : Hsc → H denote
a simply connected cover of H . Then X admits an action of the par-
abolic subgroup Bsc := q
sc (B ∩ H) of Hsc. Furthermore, the natural
morphism
Hsc ×Bsc X → G×B X,
is then an isomorphism. We then say that X admits a B-canonical
Frobenius splitting if X , as a Bsc-variety, admits a Bsc-canonical Frobe-
nius splitting. In this case we may apply Corollary 6.2 to obtain Frobe-
nius splitting properties of G×B X .
6.3. Restriction to Levi subgroups. Return to the situation where
G is connected, semisimple and simply connected. Let J be a subset of
the set of simple roots ∆ and let GJ denote the commutator subgroup
of LJ . Then GJ is a connected, semisimple and simply connected linear
algebraic group with Borel subgroup BJ = GJ ∩B and maximal torus
TJ = T ∩ GJ . We let StJ denote the associated Steinberg module.
Notice that StJ = Ind
((1− p)ρJ ) where ρJ denotes the restriction of
ρ to BJ . The following should be well known but we do not know a
good reference.
Lemma 6.3. There exists a GJ -equivariant morphism
StJ → St,
such that the B−J -invariant line of StJ maps surjectively to the B
invariant line of St. In particular, if X is a G-variety admitting a B-
canonical Frobenius splitting then X admits a BJ -canonical Frobenius
splitting as a GJ -variety.
Proof. Let M denote the T -stable complement to the B-stable line in
St. Then M is B−-invariant and thus also B−J -invariant. The trans-
late ẇJ0M is then invariant under BJ and we obtain a BJ -equivariant
morphism
St → St/(ẇJ0M) ≃ k(1−p)ρJ .
By Frobenius reciprocity this defines a GJ -equivariant map St → StJ
such that the B-stable line of St maps onto the BJ -stable line of StJ .
Now apply the selfduality of StJ and St to obtain the desired map.
This proves the first part of the statement.
The second part follows easily by composing the obtained morphism
StJ → St with the B-canonical Frobenius splitting
St → EndF (X)⊗ k(1−p)ρ,
of X and noticing that the restriction of ρ to BJ is ρJ .
7. Applications to G×G-varieties
In this section we consider a linear algebraic group G satisfying the
conditions of Section 6.2, i.e. we assume that G contains a closed
connected semisimple subgroup H such that H/H∩B → G/B is an iso-
morphism. We also let Hsc denote the simply connected version of H
and let Bsc denote the associated Borel subgroup.
7.1. A well known result. Consider for a moment (i.e. in this sub-
section) the case where G is semisimple and simply connected. Re-
member that the G-linearized line bundle on G/B associated to the B-
character 2ρ coincides with the dualizing sheaf ωG/B. Let L denote the
line bundle on G/B associated to the B-character (1−p)ρ and recall from
Section 6 the notation St = IndGB((1−p)ρ) for the Steinberg module. As
the Steinberg module is a selfdual G-module we may fix a G-invariant
nonzero element v∆ in the tensorproduct St⊗ St. We may think of v∆
as a global section of the line bundle L⊠ L on (G/B)2 = G/B × G/B.
Identify G/B × G/B with G×B G/B by the isomorphism
G×B G/B → G/B × G/B,
[g, hB] 7→ (gB, ghB),
and let D denote the subvariety of G/B × G/B corresponding to G ×B
∂(G/B), where ∂(G/B) denotes the union of the codimension 1 Schubert
varieties in G/B. Then, by [B-K, proof of Thm.2.3.8], the zero scheme
of v∆ equals (p− 1)D. Consider then the natural morphism :
η : (L⊠ L)⊗ (L⊠ L) → ω
(G/B)2
= End!F ((
G/B)2)
and define
ηD : (L⊠ L) → End
G/B)2),
26 XUHUA HE AND JESPER FUNCH THOMSEN
as in Lemma 3.6, using the identification L ⊠ L = O(G/B)2
(p − 1)D
Then by Lemma 3.6 the image of ηD is contained in End
G/B)2, D)
and thus the associated element
η′D ∈ HomO((G/B)2)′
(F(G/B)2)∗(L⊠ L),O((G/B)2)′
is compatible with D. It follows
Lemma 7.1. The element in
EndL⊠LF ((
G/B)2) ≃ St⊠ St
defined by v∆ is compatible with the diagonal diag(G/B) in G/B × G/B.
Proof. We have to prove that η′D, defined above, is compatible with the
diagonal diag(G/B). As η′D is compatible with D it suffices to show that
EndL⊠LF ((
G/B)2, D) is contained in EndL⊠LF ((
G/B)2, diag(G/B)
. This fol-
lows by an application of Lemma 3.1 and an argument as at the end of
the proof of [B-K, Thm.2.3.1]. �
7.2. We return to the setup as in the beginning of this section. We
want to apply the results of the preceding sections to the case when
the group equals G×G. So let X denote a B ×B-variety and assume
that X admits a Bsc ×Bsc-canonical Frobenius splitting defined by
θ : (St⊠ St)⊗ (k(p−1)ρ ⊠ k(p−1)ρ) → EndF (X),
which is compatible with certain B×B-stable subvarieties X1, . . . , Xm,
i.e. the image of θ is contained in EndF (X,Xi) for all i. Then
Theorem 7.2. The variety (G × G) ×(B×B) X admits a diag(Bsc)-
canonical Frobenius splitting which compatibly Frobenius splits the sub-
varieties diag(G)×diag(B) X and (G×G)×(B×B) Xi for all i.
Proof. It suffices to consider the case where G = Hsc (cf. discussion
in Section 6.2). By Corollary 6.2 there exists a G × G-equivariant
morphism
Θ : (St⊠ St)⊗ (St⊠ St) → EndF ((G×G)×(B×B) X),
satisfying certain compatibility conditions. Let v∆ ∈ St ⊠ St be a
nonzero diag(G)-invariant element and let v ∈ St ⊠ St be arbitrary.
Then by Corollary 6.2 and Lemma 7.1 the element Θ
v∆ ⊗ v
is com-
patible with diag(G) ×diag(B) X and (G × G) ×(B×B) Xi for all i. In
particular, if we define the diag(G)-equivariant morphism
Θ∆ : St⊗ St → EndF ((G×G)×(B×B) X),
by Θ∆(v) = Θ
v∆ ⊗ v
, then every element in the image of Θ∆ is
compatible with diag(G) ×diag(B) X and (G × G) ×(B×B) Xi for all i.
Consider k(p−1)ρ as the highest weight line in St. Then the restriction
of Θ∆ to St⊗ k(p−1)ρ defines a diag(B)-canonical Frobenius splitting of
(G×G)×(B×B) X with the desired properties. �
Notice that by the general machinery of canonical Frobenius split-
tings (see e.g. [B-K, Prop.4.1.17]) the existence of a Frobenius splitting
of diag(G)×diag(B) X follows if X admits a diag(Bsc)-canonical Frobe-
nius splitting. In the above setup X only admits a Bsc ×Bsc-canonical
Frobenius splitting which is less restrictive. However, in contrast to the
situation when X admits a diag(Bsc)-canonical Frobenius splitting, the
present Frobenius splitting is not necessarily compatible with subvari-
eties of the form BẇB ×B X , with w denoting an element of the Weyl
group and BẇB denoting the closure of BẇB in G.
8. G-Schubert varieties in equivariant Embeddings
From now on, unless otherwise stated, we assume that G is a con-
nected reductive group.
8.1. Equivariant embeddings. Consider G as a G × G-variety by
left and right translation. An equivariant embedding X of G is then
a normal irreducible G × G-variety containing an open dense subset
which is G × G-equivariantly isomorphic to G. In particular, we may
identify G with an open subset of X , and the complement X \ G is
then called the boundary. As G is an affine variety the boundary is of
pure codimension 1 in X [Har, Prop.3.1]. Any equivariant embedding
of G is a spherical variety (with respect to the induced B × B-action)
and thus X contains finitely may B ×B-orbits.
8.2. Wonderful compactifications. When G = Gad is of adjoint
type there exists a distinguished equivariant embedding X of G which
is called the wonderful compactification (see e.g. [B-K, 6.1]).
The boundary X \ G is a union of irreducible divisors Xj , j ∈ ∆,
which intersect transversely. For a subset J ⊂ ∆, we denote the inter-
section ∩j∈JXj by XJ . As a (G×G)-variety, XJ is isomorphic to the
variety (G × G) ×P−
×P∆\J
Y, where Y denotes the wonderful com-
pactification of the group of adjoint type associated to L∆\J . Here the
×P∆\J -action on Y is defined by the quotient maps P∆\J → L∆\J
and P−
→ L∆\J . In particular, X∆ is G×G-equivariantly isomorphic
to the variety G/B− × G/B.
8.3. Toroidal embeddings. Let Gad denote the group of adjoint
type associated to G, and let X denote the wonderful compactifica-
tion of Gad. An embedding X of the reductive group G is then called
toroidal if the canonical map G→ Gad admits an extension X → X.
8.4. G-Schubert varieties. By a G-Schubert variety in an equivari-
ant embedding X we will mean a subvariety of the form diag(G) · V ,
for some B × B-orbit closure V . Notice that diag(G) · V is the image
of diag(G)×diag(B) V under the proper map
diag(G)×diag(B) X → X,
28 XUHUA HE AND JESPER FUNCH THOMSEN
[g, x] 7→ g · x,
and thus G-Schubert varieties are closed diag(G)-stable subvarieties of
If G = Gad and X = X is the wonderful compactification then a G-
Schubert variety in X∆ is diag(G)-equivariantly isomorphic to a variety
of the form G×BX(w), where X(w) denotes a Schubert variety in G/B.
In particular, this explains the name G-Schubert varieties as this is the
name used for varieties of the form G×B X(w).
In the rest of this section, we will relate G-Schubert varieties to
closures of so-called G-stable pieces. Our primary interest are G-stable
pieces in wonderful compactifications but below we will also describe
the toroidal case in general.
8.5. G-stable pieces in the wonderful compactification. LetG =
Gad denote a group of adjoint type and let X denote its wonderful com-
pactification. Let J ⊂ ∆ and identify XJ with (G× G)×P−
×P∆\J
as in Section 8.2. Using this identification it easily follows that there
exists a unique element in XJ which is invariant under U
J × UJ and
diag(LJ). We denote this element by hJ and note that as an element
of (G×G)×P−
×P∆\J
Y it equals [(e, e), eJ ], where e (resp. eJ) denotes
the identity element of G (resp. the adjoint group associated to L∆\J ).
For w ∈ W∆\J , we then let
XJ,w = diag(G)(Bw, 1) · hJ ,
and call XJ,w a G-stable piece of X. A G-stable piece is a locally closed
subset of X and by [L, section 12] and [He, section 2], we can use them
to decompose X as follows
w∈W∆\J
XJ,w.
Moreover, by the proof of [He2, Theorem 4.5], any G-Schubert variety
is a finite union of G-stable pieces. In particular, we may think of
G-Schubert varieties as closures of G-stable pieces.
8.6. G-stable pieces in arbitrary toroidal embeddings. We fix a
toroidal embedding X of G. The irreducible components of the bound-
ary X \G will be denoted by X1, . . . , Xn. For each G×G-orbit closure
Y in X we then associate the set
KY = {i ∈ {1, . . . , n} | Y ⊂ Xi},
where by definition KY = ∅ when Y = X . Then by [B-K, Prop.6.2.3],
Y = ∩i∈KYXi. Moreover, we define
I = {KY ⊂ {1, . . . , n} | Y a G×G-orbit closure in X },
and write XK := ∩i∈KXi for K ∈ I. Then (XK)K∈I are the set of
closures of G×G-orbits in X . Let now πX : X → X denote the given
extension of G → Gad. Then the closure of πX(XK) equals XP (K) for
some unique subset P (K) of ∆. This defines a map P : I → P(∆),
where P(∆) denotes the set of subsets of ∆.
As in [H-T2, 5.4], for K ∈ I we may choose a base point hK in the
open G×G-orbit of XK which maps to hP (K). By [H-T2, Proposition
5.3], XK is then naturally isomorphic to (G×G)×P−
×P∆\J
L∆\J · hK ,
where J = P (K) and L∆\J · hK is a toroidal embedding of a quotient
(L∆\J )/H by some subgroup H of the center of L∆\J .
For K ∈ I and w ∈ W∆\p(K), we then define
XK,w = diag(G)(Bw, 1) · hK ,
and call XK,w a G-stable piece of X . One can then show, in the same
way as in [He2, 4.3], that
w∈W∆\P (K)
XK,w.
Also similar to the proof of [He2, Theorem 4.5], for any B × B-orbit
closure V in X , the G-Schubert variety diag(G) · V is a finite union
of G-stable pieces. In particular, G-Schubert varieties are closures of
G-stable pieces.
9. Frobenius splitting of G-Schubert varieties
In this section, we assume that X is an equivariant embedding of G.
Let Gsc denote a simply connected cover of the semisimple commutator
subgroup (G,G) of G. We fix a Borel subgroup Bsc of Gsc which is
compatible with the Borel subgroup B in G. Similarly we fix a maximal
torus Tsc ⊂ Bsc.
Let X1, . . . , Xn denote the boundary divisors of X . The closure
within X of the B × B-orbit Bsjw0B ⊂ G will be denoted by Dj .
Then Dj is of codimension 1 in X . The translate (w0, w0)Dj of Dj will
be denoted by D̃j.
By earlier work we know
Theorem 9.1. [H-T2, Prop.7.1] The equivariant embedding X admits
a Bsc × Bsc-canonical Frobenius splitting which compatibly Frobenius
splits the closure of every B ×B-orbit.
As a direct consequence of Theorem 7.2 we then obtain
Corollary 9.2. The variety (G × G) ×(B×B) X admits a diag(Bsc)-
canonical Frobenius splitting which is compatible with all subvarieties
of the form (G×G)×(B×B) Y and diag(G)×diag(B)Y , for a B×B-orbit
closure Y in X.
Proposition 9.3. The equivariant embedding X admits a diag(Bsc)-
canonical Frobenius splitting which compatibly splits all G-Schubert va-
rieties in X.
30 XUHUA HE AND JESPER FUNCH THOMSEN
Proof. By Corollary 9.2 the variety Z = diag(G) ×diag(B) X admits
a diag(Bsc)-canonical Frobenius splitting which is compatible with all
subvarieties of the form diag(G)×diag(B) Y , with Y denoting a B ×B-
orbit closure in X . As X is a diag(G)-stable we may identify Z with
G/B ×X using the isomorphism
G×B X → G/B ×X,
[g, x] 7→ (gB, gx).
In particular, we see that the morphism
π : Z = diag(G)×diag(B) X → X,
[g, x] 7→ g · x,
is projective and that π∗(OZ) = OX . As a consequence (see Section 3.8)
the diag(Bsc)-canonical Frobenius splitting of Z induces a diag(Bsc)-
canonical Frobenius splitting of X which is compatible with all subva-
rieties of the form
π(diag(G)×diag(B) Y ) = diag(G) · Y,
i.e. with all the G-Schubert varieties in X . This ends the proof. �
As a direct consequence of Proposition 9.3, we conclude the following
vanishing result (see [B-K, Theorem 1.2.8]).
Corollary 9.4. Let X denote a projective equivariant embedding of G.
Let X denote a G-Schubert variety in X and let L denote an ample line
bundle on X. Then
Hi(X,L) = 0, i > 0.
Moreover, if X̃ ⊂ X is another G-Schubert variety, then the restriction
H0(X,L) → H0(X̃,L),
is surjective.
Later (i.e. Cor. 10.5) we will generalize the vanishing part of this
result to nef line bundle.
9.1. F-splittings along ample divisors. In this subsection we as-
sume that X is toroidal. The following structural properties of toroidal
embeddings can all be found in [B-K, Sect.6.2]. Let X0 denote the com-
plement in X of the union of the subsets BsiB− for i ∈ ∆. If we let T̄
denote the closure of T in X , then X0 admits a decomposition defined
by the following isomorphism
(25) U × U− × (T̄ ∩X0) → X0, (x, y, z) 7→ (x, y) · z.
Moreover, every G×G-orbit in X intersects (T̄ ∩X0) in a unique orbit
under the left action of T . Notice here that as T is commutative the
T × T -orbits and the (left) T -orbit in T will coincide.
Lemma 9.5. Let X denote a projective toroidal equivariant embedding
of G and let Y denote a G × G-orbit closure in X. Let K denote the
subset of {1, . . . , n} consisting of those j such that Y is contained in
the boundary component Xj. Then
Y ∩ (
j /∈K
(1, w0)Di),
has pure codimension 1 in Y and contains the support of an ample
effective Cartier divisor on Y .
Proof. Let XK = ∪j /∈KXj . We claim that Y \X
K coincides with the
open G × G-orbit Y0 of Y . Clearly Y0 is contained in Y \ X
K . On
the other hand, let U be a G×G-orbit in Y \XK . Then Xj contains
U if and only if j /∈ K. But every G × G-orbit closure in X is the
intersection of those Xj which contain it [B-K, Prop.6.2.3]. It follows
that the closure of Y0 and U coincide and thus U = Y0.
As X is normal we may choose a G × G-linearized very ample line
bundle L on X . Then H0(Y,L) is a finite dimensional (nonzero) rep-
resentation of G ×G, and it thus contains a nonzero element v which
is B × B−-invariant up to constants. The support of v is then the
union of B × B−-invariant divisors on Y . As Y0 ∩ (T̄ ∩X0) is a single
T × T -orbit it follows that
Y0 ∩X0 ≃ U × U
− × (Y0 ∩ (T̄ ∩X0)),
is an affine variety and a single B×B−-orbit. In particular, the support
of v is contained in
Y \ (Y0 ∩X0) = Y ∩ (X
(1, w0)Di).
This shows the second part of the statement. The first part follows as
Y0 ∩X0 is affine [Har, Prop.3.1]. �
Let now X denote a smooth projective toroidal embedding of G. As
the line bundles OX(Di) and OX(D̃i) are isomorphic it follows by [B-K,
Prop.6.2.6] that
(26) ω−1X ≃ OX
(Di + D̃i) +
Recall that a X is normal and G is semisimple and simply connected,
any line bundle on X will admit a unique G2sc = Gsc×Gsc-linearization.
In particular, if we let τi denote the canonical section of the line bundle
OX(Di), then we may consider τi as a B
sc = Bsc×Bsc-eigenvector of the
space of global sections of OX(Di). As in the proof of [B-K, Prop.6.1.11]
we find that the associated weight of τi equals ωi ⊠ −w0ωi, where ωi
denotes the i-th fundamental weight. Similarly, we may consider the
canonical section σj of OX(Xj) as a G
sc-invariant element.
32 XUHUA HE AND JESPER FUNCH THOMSEN
Let V denote a B ×B-orbit closure in X . As V is B ×B-stable the
subset Y = (G×G) ·V is closed in X . Thus we may consider Y as the
smallest G×G-invariant subvariety of X containing V . Now define K
as in Lemma 9.5 and let M denote the line bundle
M = OX
(p− 1)(
D̃i +
j /∈K
By Equation (26) and Lemma 3.6 it then follows that multiplication
with τ
i , for i ∈ ∆, and σ
j , for j ∈ K, defines a morphism of
B2sc-linearized line bundles
M → End!F
X, {Di, Xj}i∈∆,j∈K
⊗ kλ⊠λ,
where λ = (1 − p)ρ. By [H-T2, Prop.6.5] and Lemma 3.1 any element
in End!F (X) which is compatible with the closed subvarieties Di, i ∈ ∆,
and Xj, j ∈ K, is also compatible with V and Y . In particular, we
have defined a B2sc-equivariant map
(27) η : M → End!F (X, Y, V
⊗ kλ⊠λ,
which, by Lemma 3.5, is the same as a B2sc-invariant element η
EndMF
X, Y, V
⊗ kλ⊠λ. In particular, this defines us an element
(28) v ∈ Ind
EndMF
X, Y, V
⊗ kλ⊠λ
which is G2sc-invariant. We are then ready to use the ideas explained
in Section 5.7. First we use (18) to construct a morphism
(29) EndL⊠LF
(Gsc/Bsc)
⊗M(X) → EndF
G2sc ×B2sc X
(u, σ) 7→ ΦM,λ⊠λ(u⊗ v ⊗ σ),
where L is the Gsc-linearized line bundle on Gsc/Bsc associated to the
character λ = (1− p)ρ. Notice that we here have used that M(X) is a
G2sc-module.
Lemma 9.6. There exists a G2sc-equivariant map
(30) St⊠ St → M(X),
which maps the B−sc×B
sc-invariant line in St⊠St to a nonzero multiple
of the global section
j /∈K
j ∈ M(X),
where τ̃i denotes the canonical section of OX(D̃i).
Proof. As OX(D̃i) and OX(Di) are isomorphic as line bundles we may
consider the element
j /∈K
as a global section ofM. Then σ is a B2sc-eigenvector inM(X) of weight
(p − 1)ρ ⊠ (p − 1)ρ. In particular, σ induces a Bsc × Bsc-equivariant
k(p−1)ρ ⊠ k(p−1)ρ → M(X).
Applying Frobenius reciprocity and the selfduality of the Steinberg
module St, this defines the desired map
St⊠ St → M(X),
with the stated properties. �
Combining the map (29) with the map (30) in Lemma 9.6 we obtain
a G2sc-equivariant map
(31) Θ : EndL⊠LF
(Gsc/Bsc)
St⊠ St
→ EndF
G2sc ×B2sc X
We will now study when the map (31) describes a Frobenius splitting
of G2sc ×B2sc X . Consider the G
sc-equivariant map
(32) M(X) → St⊠ St,
σ 7→ Φ2
M,λ⊠λ(v ⊗ σ),
defined as the map (19) in Section 5.7. We claim
Lemma 9.7. The composition of the map (30) in Lemma 9.6 and the
map in (32) is an isomorphism on St⊠ St.
Proof. By Frobenius reciprocity it suffices to show that the described
composed map is nonzero. In particular, it suffices to show that
M,λ⊠λ(v ⊗ σ̃) 6= 0,
where σ̃ denotes the global section of M defined in Lemma 9.6. For
this we use the fact that the global section
(τiτ̃i)
X defines a Frobenius splitting of X (see e.g. [B-K, proof of
Thm.6.2.7]). As a consequence η(σ̃) is a Frobenius splitting ofX , where
η is the map defined in (27). Equivalently , the natural G2sc-equivariant
morphism
EndMF (X)⊗M(X) → k[X
′] = k,
defined in (13), will map η′ ⊗ σ̃ to 1. This induces a commutative
diagram
(33) Ind
EndMF
⊗ kλ⊠λ
⊗M(X)
M,λ⊠λ
++VVV
St⊗ St
EndMF (X)⊗ kλ⊠λ ⊗M(X)
// kλ⊠λ
34 XUHUA HE AND JESPER FUNCH THOMSEN
where the image of v⊗ σ̃ under the diagonal map is nonzero. This ends
the proof. �
Proposition 9.8. Let Θ denote the map defined in (31). The image
Θ(ν) of an element ν defines, up to a nonzero constant, a Frobenius
splitting of G2sc ×B2sc X if and only if the image of ν under the map
(34) φλ⊠λ : End
(Gsc/Bsc)
St⊠ St
defined in Section 5.6, is nonzero.
Proof. Apply Proposition 5.7 and Lemma 9.7.
With the identification EndL⊠LF
(Gsc/Bsc)
≃ St⊠ St the map φλ⊠λ,
defined in (34), must necessarily (up to a nonzero constant) be the
G2sc-invariant form on St⊠ St mentioned in Section 6.1. Let v∆ denote
the diag(G)-invariant element in EndL⊠LF
(Gsc/Bsc)2
defined in Section
7.1. Then the diag(G)-equivariant map
St⊗ St → k,
ν 7→ φλ⊠λ(v∆ ⊗ ν),
is nonzero and thus it must coincide (up to a nonzero constant) with
the Gsc-invariant form φ on St defined in (24).
Proposition 9.9. Fix notation as above and let D denote the effective
Cartier divisor
(p− 1)
(1, w0)Di +
j /∈K
on X. Then X admits a Frobenius D-splitting which is compatible with
the subvariety Y and the G-Schubert variety diag(G) · V .
Proof. Consider the diag(G)-equivariant morphism
Θ∆ : St⊠ St → EndF
G2sc ×B2sc X
ν 7→ Θ(v∆ ⊗ ν),
where Θ is the map in (31). By Lemma 9.8 the image Θ∆(ν) of an
element ν ∈ St⊗ St is a Frobenius splitting, up to a nonzero constant,
if and only if φ(ν) is nonzero. Here φ is the the map defined in (24).
Let v+ (resp. v−) denote a nonzero B (resp. B
−)-eigenvector of St
and let ν = v+ ⊗ v−. After possibly multiplying v+ with a constant
we may assume that s = Θ∆(ν) defines a Frobenius splitting of Z =
G2sc ×B2sc X . As v is compatible with Y and V (cf. (28)) it follows by
Theorem 5.6 and Lemma 7.1 that s factorizes as
(35) s : (FZ)∗OZ
(FZ )∗σ
−−−−→ (FZ)∗MZ
−→ OZ′,
where s1 is compatible with the subvarieties G
sc×B2sc V , G
sc×B2sc Y and
diag(Gsc)×diag(Bsc) X . Here MZ is the G
sc-linearized line bundle on Z
associated with the B2sc-linearized line bundle M on X as explained in
Section 5.2, and σ is the global section of MZ defined as the image of ν
under the map (30) in Lemma 9.6. Notice that as M is a G2sc-linearized
line bundle on the G2sc-variety X we may identify the global sections of
M and MZ . Actually , as X is a G
sc-variety the morphism
G2sc ×B2sc X →
Gsc/Bsc × Gsc/Bsc ×X,
[(g1, g2), x] 7→ (g1B, g2B, (g1, g2) · x),
is an isomorphism. Moreover, under this isomorphism, the line bundle
MZ is just the pull back of M under projection pX on the third coor-
dinate. Thus, by Lemma 9.6 it follows that σ is the pull back from X
of the effective Cartier divisor
D = (p− 1)
(1, w0)Di +
j /∈K
Applying the functor (pX)∗ to (35) we obtain the Frobenius D-splitting
(pX)∗s : (FX)∗OX
(FX)∗σD
−−−−−→ (FX)∗O(D)
(pX)∗s1
−−−−→ OX′
ofX where (pX)∗s1 is compatible with the subvarieties pX(G
sc×B2scY ) =
Y and pX(diag(Gsc)×diag(Bsc) V ) = diag(G) · V (by Lemma 3.7). This
ends the proof. �
Corollary 9.10. Let X denote a G-Schubert variety in a smooth pro-
jective toroidal embedding of a reductive group G. Then X admits a
stable Frobenius splitting along an ample divisor.
Proof. Apply Proposition 9.9, Lemma 9.5 and Lemma 3.3. �
10. Cohomology of line bundles
The main aim of this section is to obtain a generalizing the vanishing
part of Corollary 9.4 to nef line bundles. The concept of a rational
morphism is here central and for this we use [B-K, Sect.3.3] as a general
reference. First we recall :
Definition 10.1. A morphism f : Y → Z of varieties is a called a ra-
tional morphism if the induced map f ♯ : OZ → f∗OY is an isomorphism
and Rif∗OY = 0, i > 0.
The following criterion for a morphism to be rational will be very
useful ([R, Lem.2.11]).
Lemma 10.2. Let f : Y → Z denote a projective morphism of ir-
reducible varieties and let Ŷ denote a closed irreducible subvariety of
Y . Consider the image Ẑ = f(Ŷ ) as a closed subvariety of Z. Let L
denote an ample line bundle on Z and assume
(1) f ♯ : OZ → f∗OY is an isomorphism.
(2) Hi(Y, f ∗Ln) = Hi(Ŷ , f ∗Ln) = 0, for i > 0 and n≫ 0.
36 XUHUA HE AND JESPER FUNCH THOMSEN
(3) The restriction map H0(Y, f ∗Ln) → H0(Ŷ , f ∗Ln) is surjective
for n≫ 0.
Then the induced map f̂ : Ŷ → Ẑ is a rational morphism.
10.1. Toric variety. An equivariant embedding Z of the (reductive)
group T is called a toric variety (wrt. T ). Notice that, as T is commu-
tative, we may consider the T ×T -action on Z as just a T -action. The
following result should be well known but, as we do not know a good
reference, we include a proof.
Lemma 10.3. Let f : Y → Z denote a projective surjective morphism
of equivariant embeddings of T . Let T · z denote a T -orbit in Z and let
T · y denote a T -orbit in f−1(T · z) of minimal dimension. Then the
map T · y → T · z, induced by f , is an isomorphism.
Proof. Let T · z and T · y denote the closures of T · z and T · y in Z
and Y respectively. Then the induced map
f̂ : T · y → T · z,
is a projective morphism. Moreover, by the minimality assumption
on T · y, the inverse image f̂−1(T · z) equals T · y. In particular, the
induced morphism : T · y → T · z is projective. But any T -orbit in a
toric variety (wrt. to T ) is isomorphic to a torus T1 satisfying that the
cokernel of the induced map of character groups X∗(T1) → X
∗(T ) is a
free abelian group ([Ful, Sect.3.1]). In particular, the varieties T ·y and
T · z are tori and the cokernel of the induced map of character groups
X∗(T · z) → X∗(T · y) is a free abelian group. But T · y → T · z is
an affine projective morphism and thus it must be a finite morphism.
Thus the cokernel of X∗(T · z) → X∗(T · y) is a finite group and, as it
is already a free group, it must be trivial. This ends the proof as tori
are determined by their character groups. �
Lemma 10.4. Let X denote a projective embedding of a reductive
group G and let Y denote a G × G-orbit closure of X. Then there
exists a smooth toroidal embedding X̂ of G, a projective G-equivariant
morphism f : X̂ → X and a G×G-orbit closure Ŷ in X̂ such that the
induced morphism f : Ŷ → Y is a rational morphism.
Proof. Assume first that X is toroidal. By [B-K, Prop.6.2.5] there ex-
ists a smooth toroidal embedding X̂ of G with a projective morphism
f : X̂ → X . Let X0 denote the open subset of X introduced in the
beginning of Section 9.1, and let X̂0 denote the corresponding sub-
set of X̂ . Then the inverse image f−1(X0) coincides with X̂0 [B-K,
Prop.6.2.3(i)]. Let T (resp. T̂ ) denote the closure of T in X (resp. X̂).
Then T and T̂ are toric varieties [B-K, Prop.6.2.3], and the induced
map f : T̂ → T is a projective morphism of toric varieties. Thus also
the induced map
X̂0 ∩ T̂ → X0 ∩ T ,
is a projective morphism of toric varieties. As mentioned in Section 9.1
every G×G-orbit in X will intersect X0 ∩ T in a unique T -orbit. We
let T · x denote the open T -orbit in the intersection of Y with X0 ∩ T .
By Lemma 10.3 we may find a T -orbit T · x̂ in X̂0 ∩ T̂ which by f
is isomorphic to T · x, and we then define Ŷ to be the closure of the
G×G-orbit through x̂. By the isomorphism (25) we then conclude that
f induces a projective birational morphism Ŷ → Y . By [H-T2, Cor.8.4]
the orbit closure Y is normal and thus, by Zariski’s main theorem, we
conclude f∗OŶ = OY . By Lemma 10.2 (used on the morphism Ŷ → Y
and the closed non-proper subvariety Ŷ of Ŷ ) it now suffices to prove
Hi(Ŷ , f ∗L) = 0, i > 0,
for a very ample line bundle L on Y . This follows from [H-T2, Prop.7.2]
and ends the proof in the case when X is toroidal.
Consider now an arbitrary projective equivariant embedding X of
G. Let X̂ denote the normalization of the closure of the image of the
natural G×G-equivariant embedding
G→ X ×X,
where X denotes the wonderful compactification of Gad. Then X̂ is a
toroidal embedding of G with an induced projective equivariant mor-
phism f : X̂ → X . Let Ŷ denote any G×G-orbit closure in X̂. Then
f : Ŷ → f(Ŷ ) is a rational morphism [H-T2, Lem.8.3]. In particular,
we may find a G × G-orbit closure Ŷ of X̂ with an induced rational
morphism f : Ŷ → Y . Finally we may apply the first part of the proof
to Ŷ and X̂ and use that a composition of rational morphisms is again
a rational morphism. �
Corollary 10.5. Let X denote a projective embedding of a reductive
group G and let X denote a G-Schubert variety in X. Let Y = (G×G)·
X denote the minimal G × G-orbit closure of X containing X. When
L is a nef line bundle on X then
Hi(X,L) = 0, i > 0.
Moreover, when L is a nef line bundle on Y then the restriction mor-
phism
H0(Y,L) → H0(X,L),
is surjective.
Proof. Assume first that X is smooth and toroidal. Then by Propo-
sition 9.9, Lemma 9.5 and Lemma 3.3 the variety Y admits a stable
Frobenius splitting along an ample divisor which is compatibly with X.
Thus the statement follows in this case by Proposition 3.4.
38 XUHUA HE AND JESPER FUNCH THOMSEN
Let now X denote an arbitrary projective equivariant embedding of
G. Choose, using Lemma 10.4, a smooth projective toroidal embedding
X̂ with a projective equivariant morphism f : X̂ → X onto X , and a
G × G-orbit closure Ŷ in X̂ with an induced rational morphism onto
Y . Let V denote a B×B-orbit closure in Y such that X = diag(G) ·V .
As Y is the minimal G × G-orbit closure containing X it follows that
V will intersect the open G×G-orbit of Y . In particular, there exists
a B×B-orbit closure V̂ in X̂ which intersects the open G×G-orbit of
Ŷ and which maps onto V . In particular,
X̂ := diag(G) · V̂ ,
is a G-Schubert variety in X̂ which by f maps onto X. Moreover, Ŷ is
the minimal G×G-orbit closure containing X̂.
We claim that the induced morphism X̂ → X is a rational morphism.
To prove this we apply Lemma 10.2 to the rational morphism f : Ŷ →
Y . Choose an ample line bundle M on Y . Then it suffices to prove
(36) Hi(Ŷ , f ∗Mn) = Hi(X̂, f ∗Mn) = 0, i > 0, n > 0,
and that the restriction map
(37) H0(Ŷ , f ∗Mn) → H0(X̂, f ∗Mn),
is surjective for n > 0. But Mn is an ample, and thus nef, line bundle
on Y and therefore the pull back f ∗Mn is a nef line bundle on Ŷ ([Laz,
Ex. 1.4.4]). As X̂ is smooth and toroidal, the conclusion of the first
part of this proof then shows that conditions (36) and (37) are satisfied.
Now both X̂ → X and Ŷ → Y are rational morphisms. In particular,
we have identifications
Hi(Ŷ , f ∗L) ≃ Hi(Y,L), i ≥ 0,
Hi(X̂, f ∗L) ≃ Hi(X,L), i ≥ 0,
for any line bundle L on Y or, in the second equation, on X . When L is
a nef line bundle the pull back f ∗L is also nef ([Laz, Ex. 1.4.4]). Thus
as we have already completed the proof of the statement for smooth
toroidal embeddings, in particular for X̂ , this now ends the proof. �
By the proof of the above result we also find that any G-Schubert
variety X in a projective equivariant embedding of G, will admit a G-
equivariant rational morphism f : X̂ → X by a G-Schubert variety X̂
of some smooth projective toroidal embedding of G.
Remark 10.6. When X = X is the wonderful compactification of a
group G of adjoint type and L is a nef line bundle on X, then the
restriction morphism
H0(X,L) → H0(Y,L),
to any closed G×G-stable irreducible subvariety Y of X is surjective.
In particular, also the restriction morphism
H0(X,L) → H0(X,L),
to any G-Schubert variety X is surjective by the above result. We do
not know if the latter is true for arbitrary equivariant embeddings.
11. Normality questions
The obtained Frobenius splitting properties of G-Schubert varieties
in Section 9 and the cohomology vanishing results in Corollary 10.5
should be expected to have strong implications on the geometry of
these varieties. However, in this section we provide an example of a G-
Schubert variety in the wonderful compactification of a group of type
G2 which is not even normal. In fact, it seems that there are plenty of
such examples.
11.1. Some general theory. We keep the notations as in Section
8.5. For J ⊂ ∆ and w ∈ W∆\J , we let XJ,w denote the closure of XJ,w
in X. Let
K = max{K ′ ⊂ ∆ \ J ;wK ′ ⊂ K ′}.
By [He2, Prop. 1.12], we have a diag(G)-equivariant isomorphism
diag(G)×diag(PK) (PKẇ, PK)hJ ≃ XJ,w
induced by the inclusion of (PKẇ, PK)hJ in X. Let V denote the
closure of (PKẇ, PK)hJ within X. Then V is the closure of a B × B-
orbit and we find that the induced map
(38) f : diag(G)×diag(PK) V → XJ,w,
is a birational and projective morphism. Thus, by Zariski’s Main The-
orem, a necessary condition for XJ,w to be normal is that the fibers of
f are connected. Actually, in positive characteristic, connectedness of
the fibers is also sufficient forXJ,w to be normal. This follows asXJ,w is
Frobenius split (Prop. 9.3) and thus weakly normal [B-K, Prop.1.2.5].
11.2. An example of a non-normal closure. Let now, further-
more, G be a group of type G2. Let α1 denote the short simple root
and α2 denote the long simple root. The associated simple reflections
are denoted by s1 and s2. Let J = {α2} and w = s1s2 ∈ W
∆\J . In this
case K = ∅ and we obtain a birational map
f : diag(G)×diag(B) V ≃ XJ,w
where V is the closure of (Bẇ,B)hJ . By [Sp, Prop. 2.4], the part of
V which intersect the open G×G-orbit of XJ equals
(Bẇ′, B)hJ ∪
ws1≤w′
(Bẇ′, Bṡ1)hJ .
40 XUHUA HE AND JESPER FUNCH THOMSEN
In particular, x := (v̇, 1)hJ is an element of V , where v = s2s1s2. We
claim that the fiber of f over x is not connected. To see this let y
denote a point in the fiber over x. Then we may find g ∈ G and x̃ ∈ V
such that
y = [g, x̃].
By (39), x̃ = (bẇ′, b′)hJ for some b ∈ B, b
′ ∈ P∆\J and w
′ ≥ w. Then
(gbẇ′, gb′)hJ = (v̇, 1)hJ .
It follows that (v̇−1gbẇ′, gb′) lies in the stabilizer of hJ . In particular,
gb′ ∈ P∆\J and thus also g ∈ P∆\J . If g ∈ B then y = [1, x]. So assume
that g = u1(t)ṡ1 where u1 is the root homomorphism associated to α1.
Assume that t 6= 0. Then we may find b1 ∈ B and s ∈ k such that
g = u−1(s)b1 where u−1 is the root homomorphism associated to −α1.
x̃ = (g−1, g−1)(v̇, 1)hJ
= (b−11 u−1(−s)v̇, g
−1)hJ
= (b−11 v̇, g
−1)hJ
∈ (Bv̇, Bṡ1)hJ
where the third equality follows as v̇−1u−1(−s)v̇ is contained in the
unipotent radical of P−
. But (Bv̇, Bṡ1)hJ has empty intersection
with V (by (39)) which contradicts the assumption that t 6= 0. It
follows that the only possibilities for y are [1, x] and [ṡ1, (ṡ
1 v̇, ṡ
1 )hJ ].
As (ṡ−11 v̇, ṡ
1 ) is contained in V (by (39)) we conclude that the fiber
of f over x consists of 2 points; in particular the fiber is not connected
and thus XJ,w is not normal.
Remark 11.1. It seems likely that normalizations of G-Schubert vari-
eties should have nice singularities : If we let ZJ,w denote the normal-
ization of the closure of XJ,w, then the map (38) induces a birational
and projective morphism
f̃ : diag(G)×diag(PK) V → ZJ,w.
We expect that f̃ can be used to obtain global F -regularity of ZJ,w
(see [S] for an introduction to global F -regularity). In fact, by the
results in [H-T2] the B×B-orbit closure V is globally F -regular. Thus
diag(G)×diag(PK) V is locally strongly F -regular, and as
f̃∗Odiag(G)×diag(PK )V
= OZJ,w ,
it seems likely that ZJ,w is also locally strongly F -regular. Moreover,
similarly to Corollary 9.10 one may conclude that ZJ,w admits a stable
Frobenius splitting along an ample divisor. Thus ZJ,w is globally F -
regular if it is locally strongly F -regular. At the moment we do not
know if ZJ,w is locally strongly F -regular.
12. Generalizations
Fix notation as in Section 2. An admissible triple of G × G is by
definition a triple C = (J1, J2, θδ) consisting of J1, J2 ⊂ ∆, a bijection
δ : J1 → J2 and an isomorphism θδ : LJ1 → LJ2 that maps T to T
and the root subgroup Uαi to the root subgroup Uαδ(i) for i ∈ J1. To
each admissible triple C = (J1, J2, θδ), we associate the subgroup RC of
G×G defined by
RC = {(p, q) : p ∈ PJ1, q ∈ PJ2, θδ(πJ1(p)) = πJ2(q)},
where πJ : PJ → LJ , for a subset J ⊂ ∆, denotes the natural quotient
Let X denote an equivariant embedding of the reductive group G.
A RC-Schubert variety of X is then a subset of the form RC · V for
some B × B-orbit closure V in X . When G = Gad is a group of
adjoint type and X = X is the associated wonderful compactification
the set of RC-Schubert varieties coincides with closures of the set of RC-
stable pieces. By definition [L-Y, section 7], a RC-stable piece in the
wonderful compactification X of Gad is a subvariety of the form RC ·Y ,
where Y = (Bv1, Bv2) · hJ for some J ⊂ ∆, v1 ∈ W
J and v2 ∈
(notation as in Section 8.5). Notice that when J1 = J2 = ∆ and θδ
is the identity map then a RC-stable piece is the same as a G-stable
piece. On the other hand, when J1 = J2 = ∅, then a RC-stable piece
is the same as a B × B-orbit. Moreover, any RC-Schubert variety is a
finite union of RC-stable pieces [L-Y, Section 7].
The following is a generalization of Proposition 9.3 and Proposition
Proposition 12.1. Let C = (J1, J2, θδ) denote an admissible triple of
G×G and let X denote an equivariant embedding of G. Then X admits
a Frobenius splitting which compatible splits all RC-Schubert varieties
in X. If, moreover, X is a smooth, projective and toroidal embedding
and Y = XK = (G × G) · V , for some B × B-orbit closure V in X,
then X admits a Frobenius splitting along the Cartier divisor
D = (p− 1)
(wJ10 , 1)D̃i +
j /∈K
which is compatibly with Y and RC · V .
Proof. As the proof is similar to the proof of Proposition 9.3 and Propo-
sition 9.9 we only sketch the proof. In the following GJ , for a subset
J ⊂ ∆, denotes the commutator of the Levi subgroup in Gsc associated
to J . The Borel subgroup GJ ∩Bsc of GJ is denoted by BJ . Define XC
to be the G2J1-variety which as a variety is X but where the action is
twisted by the morphism
GJ1 ×GJ1
−−−→ GJ1 ×GJ2.
42 XUHUA HE AND JESPER FUNCH THOMSEN
Then the BJ1 × BJ2-canonical Frobenius splitting of X defined by
Theorem 9.1 and Lemma 6.3 induces a B2J1-canonical Frobenius split-
ting of XC. In particular, all subvarieties of XC which corresponds
to B × B-orbit closures in X will be compatibly Frobenius split by
this canonical Frobenius splitting. Now apply an argument as in the
proof of Proposition 9.3 and use the identification of RC · V ⊂ X with
diag(GJ1) · V ⊂ XC. This ends the proof of the first statement.
Assume now that X is a smooth, projective and toroidal embedding
and consider the B2sc-equivariant morphism
η : M → End!F (X, Y, V )⊗ k(1−p)ρ⊠(1−p)ρ,
defined in (27). Let YC and VC be defined similar to XC. Then η induces
a B2J1-equivariant morphism
ηC : M → End
F (XC, YC, VC)⊗ k(1−p)ρJ1⊠(1−p)ρJ1 .
Similar to the definition of v in (28) we obtain from ηC an element
vC ∈ Ind
EndF (XC, YC, VC)⊗ k(1−p)ρJ1⊠(1−p)ρJ1
and from this a G2J1-equivariant morphism
(40) End
(GJ1/BJ1)
⊗M(XC) → EndF
G2J1 ×B2J1
similar to (29). Here LJ1 is the line bundle on
GJ1/BJ1 associated to the
character (1 − p)ρJ1. Combining Lemma 6.3 and Lemma 9.6 we also
obtain a map
(41) StJ1 ⊠ StJ1 → M(XC),
with properties similar to the ones described in Lemma 9.6. As in (32)
we may also use vC to construct a morphism
M(XC) → StJ1 ⊠ StJ1 ,
such that the composition with (41) is an isomorphism on StJ1 ⊠ StJ1 .
Finally we may construct
ΘC : End
(GJ1/BJ1)
⊗ (StJ1 ⊠ StJ1) → EndF
G2J1 ×B2J1
similar to (31). In particular, a statement equivalent to Proposition
9.8 is satisfied for ΘC. Let v
+ (resp. v
− ) denote a highest (resp.
lowest) weight vector in StJ1 and let v
∆ denote the diag(GJ1)-invariant
element of End
(GJ1/BJ1)
. Imitating the proof of Proposition
9.9 we then find that ΘC(v
∆ ⊗ (v
+ ⊗ v
− )) is a Frobenius splitting of
G2J1×B2J1
XC (up to a nonzero constant). Moreover, the push forward of
this Frobenius splitting to X has the desired properties. We only have
to note that the effective Cartier associated to the image of vJ1+ ⊗ v
under the map (41) equals
D = (p− 1)
(wJ10 , 1)D̃i +
j /∈K
This ends the proof. �
We may also argue as in Corollary 10.5 to obtain
Corollary 12.2. Let X denote a projective embedding of a reductive
group G and let V denote the closure of a B × B-orbit in X. Let
Y = (G×G) · V and XC = RC · V . When L is a nef line bundle on XC
Hi(XC,L) = 0, i > 0.
Moreover, when L is a nef line bundle on Y then the restriction mor-
phism
H0(Y,L) → H0(XC,L),
is surjective.
Remark 12.3. In the case where k = C and X is the wonderful com-
pactification, the subvarieties (wJ10 , 1)D̃i, Xj and all the RC-Schubert
varieties are Poisson subvarieties with respect to the Poisson structure
on X corresponding to the splitting
Lie(G)⊕ Lie(G) = l1 ⊕ l2,
where l1 = Lie(RC) and l2 is a certain subalgebra of Ad(w
0 )Lie(B
Lie(B−). See [L-Y2, 4.5].
References
[Bri] M. Brion, Multiplicity-free subvarieties of flag varieties, Contemp. Math.
331 (2003), 13–23.
[B-K] M. Brion and S. Kumar, Frobenius Splittings Methods in Geometry and
Representation Theory, Progress in Mathematics (2004), Birkhäuser,
Boston.
[B-T] M. Brion and J. F. Thomsen, F -regularity of large Schubert varieties,
Amer. J. Math. 128 (2006), 949–962.
[E-L] S. Evens and J.-H. Lu, On the variety of Lagrangian subalgebras, I, II,
Ann. Sci. cole Norm. Sup. (4) 34 (2001), no. 5, 631–668; 39 (2006), no. 2,
347–379.
[Ful] W. Fulton, Introduction to Toric Varieties, Ann. Math. Studies, 131
(1993), Princeton University Press.
[Har] R. Hartshorne, Ample subvarieties of algebraic varieties, Lecture Notes in
Math. 156 (1970), Springer-Verlag.
[Har2] R. Hartshorne, Algebraic Geometry, GTM 52 (1977), Springer-Verlag.
[He] X. He, Unipotent variety in the group compactification, Adv. in Math. 203
(2006), 109-131.
[He2] X. He, The G-stable pieces of the wonderful compactification, Trans. Amer.
Math. Soc. 359 (2007), 3005-3024.
44 XUHUA HE AND JESPER FUNCH THOMSEN
[H-T] X. He and J. F. Thomsen, On the closure of Steinberg fibers in the won-
derful compactification, Transformation Groups, 11 (2006), no. 3, 427-438.
[H-T2] X. He and J.F.Thomsen, Geometry of B×B-orbit closures in equivariant
embeddings, math.RT/0510088.
[Laz] R. Lazarsfeld, Positivity in Algebraic Geometry I, classical setting: line
bundles and linear series, Ergebnisse der Mathematik und ihrer Grenzge-
biete. 3. Folge (2004), Springer-Verlag, Berlin.
[L] G. Lusztig, Parabolic character sheaves, I, II, Mosc. Math. J. 4 (2004),
no. 1, 153–179; no. 4, 869–896.
[L-Y] J.-H Lu and M. Yakimov, Partitions of the wonderful group compactifica-
tion, math.RT/0606579.
[L-Y2] J.-H Lu and M. Yakimov, Group orbits and regular partitions of Poisson
manifolds, math.SG/0609732.
[M-R] V.B. Mehta and A. Ramanathan, Frobenius splitting and cohomology van-
ishing for Schubert varieties, Ann. of Math. 122 (1985), 27–40.
[R] A. Ramanathan, Equations defining Schubert varieties and Frobenius split-
ting of diagonals, Inst. Hautes Études Sci. Publ. Math. 65 (1987), 61–90.
[R-R] S. Ramanan and A, Ramanathan, Projective normality of flag varieties
and Schubert varieties, Invent. Math. 79 (1985), 217–224.
[S] K. E. Smith, Globally F -regular varieties: Applications to vanishing the-
orems for quotients of Fano varieties, Michigan Math. J. 48 (2000), 553–
[Sp] T. A. Springer, Intersection cohomology of B×B-orbits closures in group
compactifications, J. Alg. 258 (2002), 71–111.
[T] J. F. Thomsen, Frobenius splitting of equivariant closures of regular con-
jugacy classes Proc. London Math. Soc. 93 (2006), 570–592.
Department of Mathematics, Stony Brook University, Stony Brook,
NY 11794, USA
E-mail address : hugo@math.sunysb.edu
Institut for matematiske fag, Aarhus Universitet, 8000 Århus C,
Denmark
E-mail address : funch@imf.au.dk
ABSTRACT
  Let $X$ be an equivariant embedding of a connected reductive group $G$ over
an algebraically closed field $k$ of positive characteristic. Let $B$ denote a
Borel subgroup of $G$. A $G$-Schubert variety in $X$ is a subvariety of the
form $\diag(G) \cdot V$, where $V$ is a $B \times B$-orbit closure in $X$. In
the case where $X$ is the wonderful compactification of a group of adjoint
type, the $G$-Schubert varieties are the closures of Lusztig's $G$-stable
pieces. We prove that $X$ admits a Frobenius splitting which is compatible with
all $G$-Schubert varieties. Moreover, when $X$ is smooth, projective and
toroidal, then any $G$-Schubert variety in $X$ admits a stable Frobenius
splitting along an ample divisors. Although this indicates that $G$-Schubert
varieties have nice singularities we present an example of a non-normal
$G$-Schubert variety in the wonderful compactification of a group of type
$G_2$. Finally we also extend the Frobenius splitting results to the more
general class of $\mathcal R$-Schubert varieties.

<|endoftext|><|startoftext|>
Introduction
	Model setup
	Orphan TeV flares
ABSTRACT
  With the anticipated launch of GLAST, the existing X-ray telescopes, and the
enhanced capabilities of the new generation of TeV telescopes, developing tools
for modeling the variability of high energy sources such as blazars is becoming
a high priority. We point out the serious, innate problems one zone
synchrotron-self Compton models have in simulating high energy variability. We
then present the first steps toward a multi zone model where non-local, time
delayed Synchrotron-self Compton electron energy losses are taken into account.
By introducing only one additional parameter, the length of the system, our
code can simulate variability properly at Compton dominated stages, a situation
typical of flaring systems. As a first application, we were able to reproduce
variability similar to that observed in the case of the puzzling `orphan' TeV
flares that are not accompanied by a corresponding X-ray flare.

<|endoftext|><|startoftext|>
Fusion of radioactive
Sn with
J. F. Liang, D. Shapira, J. R. Beene, C. J. Gross, R. L. Varner, A. Galindo-Uribarri,
J. Gomez del Campo, P. A. Hausladen, P. E. Mueller, D. W. Stracener
Physics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831
H. Amro, J. J. Kolata
Department of Physics, University of Notre Dame, Notre Dame, IN 46556
J. D. Bierman
Physics Department AD-51, Gonzaga University, Spokane, Washington 99258-0051
A. L. Caraley
Department of Physics, State University of New York at Oswego, Oswego, NY 13126
K. L. Jones
Department of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854
Y. Larochelle
Department of Physics and Astronomy, University of Tennessee, Knoxville, Tennessee 37966
W. Loveland, D. Peterson
Department of Chemistry, Oregon State University, Corvallis, Oregon 97331
(Dated: October 28, 2018)
Evaporation residue and fission cross sections of radioactive 132Sn on 64Ni were measured near
the Coulomb barrier. A large sub-barrier fusion enhancement was observed. Coupled-channel
calculations including inelastic excitation of the projectile and target, and neutron transfer are in
good agreement with the measured fusion excitation function. When the change in nuclear size and
shift in barrier height are accounted for, there is no extra fusion enhancement in 132Sn+64Ni with
respect to stable Sn+64Ni. A systematic comparison of evaporation residue cross sections for the
fusion of even 112−124Sn and 132Sn with 64Ni is presented.
PACS numbers: 25.60.-t, 25.60.Pj
I. INTRODUCTION
Fusion of heavy ions has been a topic of interests
for several decades[1]. One motivation is to understand
the reaction mechanisms so that the production yield of
heavy elements can be better estimated by model calcula-
tions. The formation of a compound nucleus is a complex
process. The projectile and target have to be captured
inside the Coulomb barrier and subsequently evolve into
a compact shape. In heavy systems, the dinuclear system
can separate during shape equilibration prior to passing
the saddle point. This quasifission process is considered
the primary cause of fusion hindrance[2, 3, 4].
At energies near and below the Coulomb barrier, the
structure of the participants plays an important role in
influencing the fusion cross section[5, 6, 7]. Sub-barrier
fusion enhancement due to nuclear deformation and in-
elastic excitation has been observed[8, 9, 10, 11, 12].
Coupled-channel calculations have successfully repro-
duced experimental data by including nuclear deforma-
tion and inelastic excitation. Nucleon transfer is another
important channel to be considered[13, 14].
Recently available radioactive ion beams offer the op-
portunity to study fusion under the influence of strong
nucleon transfer reactions. Several theoretical works have
predicted large enhancement of sub-barrier fusion involv-
ing neutron-rich radioactive nuclei[15, 16, 17, 18, 19].
In addition, the compound nucleus produced in such re-
actions is predicted to have a higher survival probabil-
ity and longer lifetimes. This is encouraging for super-
heavy element research. If high-intensity, neutron-rich
radioactive beams become available in the future, new
neutron-rich heavy nuclei may be synthesized with en-
hanced yields. The longer lifetime of new isotopes of
heavy elements would enable the study of their atomic
and chemical properties[20]. However, the current inten-
sity of the radioactive beams is several orders of mag-
nitude lower than that of stable beams. It is thus not
practical to use such beams for heavy element synthesis
experiments, but they do provide excellent opportuni-
ties for studying reaction mechanisms of fusion involving
neutron-rich radioactive nuclei.
Fusion enhancement, with respect to a one-
dimensional barrier penetration model prediction, has
been observed in experiments performed with neutron-
rich radioactive ion beams at sub-barrier energies[21, 22,
23, 24, 25]. For instance, the effect of large neutron excess
on fusion enhancement can be seen in 29,31Al+197Au[23].
http://arxiv.org/abs/0704.0780v2
However, when comparing reactions involving stable iso-
topes of the projectile or target, the fusion excitation
functions are very similar if the change in nuclear sizes is
accounted for.
This paper reports results of fusion excitation func-
tions measured with radioactive 132Sn on 64Ni. The dou-
bly magic (Z=50, N=82) 132Sn has eight neutrons more
than the heaviest stable 124Sn. Its N/Z ratio (1.64) is
larger than that of stable doubly magic nuclei 48Ca (1.4)
and 208Pb (1.54) which are commonly used for heavy
element production[26]. Evaporation residue (ER) and
fission cross sections were measured. The sum of ER and
fission cross sections are taken as the fusion cross section.
The experimental apparatus is described in Sect. II
and data reduction procedures in Sect. III. The results
and comparison with model calculations are presented in
Sect. IV. In Sect. V a comparison of ER and fusion cross
sections with those resulting from stable Sn isotopes on
64Ni is discussed. A summary is given in Sect. VI.
II. EXPERIMENTAL METHODS
The experiment was carried out at the Holifield Ra-
dioactive Ion Beam Facility. A 42 MeV proton beam
produced by the Oak Ridge Isochronous Cyclotron was
used to bombard a uranium carbide target. The fission
fragments were ionized by an electron beam plasma ion
source. The largest yield of mass A=132 fragments was
132Te. Therefore, it was necessary to suppress 132Te.
This was accomplished by introducing sulfur into the ion
source then selecting the mass 164 XS+ molecular ions
from the extracted beam. The 132Te to 132Sn ratio in the
ion beam was found to be suppressed by a large factor
(∼ 7 × 104) compared to that observed with the mass
132 atomic beam. The mass 164 SnS+ beam was con-
verted into a Sn− beam by passing it through a Cs va-
por cell where the molecular ion underwent breakup and
charge exchange[27]. The negatively charged Sn was sub-
sequently injected into the 25 MV electrostatic tandem
accelerator to accelerate the beam to high energies. The
measurement was performed at energies between 453 and
620 MeV. The average beam intensity was 50,000 parti-
cles per second (pps) with a maximum of 72,000 pps. The
ER cross sections measured between 453 and 560 MeV
have been reported previously[24].
The purity of the Sn beam was measured by an ioniza-
tion chamber mounted at zero degrees. Figure 1 displays
the energy loss spectra of a 560 MeV A=132 beam with
and without the sulfur purification. The dashed curves
are the results of fitting the spectrum with Gaussian dis-
tributions to estimate the composition of the beam. In
the upper panel, the beam is primarily 132Te without
sulfur in the ion source. When sulfur was introduced in
the ion source, the beam was 96% 132Sn, as shown in the
lower panel. The small amount of Sb and Te had a neg-
ligible impact on the measurement because their atomic
number is higher. Fusion of the target with these iso-
baric contaminants at sub-barrier energies should have
been suppressed due to the higher Coulomb barriers.
FIG. 1: (Color online) Composition of a 560 MeV mass
A=132 beam measured by the ionization chamber. Top panel:
The mass A=132 beam without purification where Te and Sb
are the major components of the beam. Bottom panel: Sulfur
was introduced into the target ion source and SnS was selected
by the mass separator. The dashed curves are results of fitting
the spectrum with three Gaussian distributions. The isobar
contaminants 132Sb and 132Te were suppressed considerably.
The apparatus for the fusion measurement is shown in
Fig. 2. A thick 64Ni target (1.0 mg/cm2) was used to
compensate for the low beam intensity. Since the com-
pound nucleus decays by particle evaporation and fission,
the evaporation residue (ER) and fission cross sections
were measured. The ERs were detected by the ioniza-
tion chamber at zero degrees and the fission fragments
were detected by an annular double-sided silicon strip
detector.
time−of−flight
Si det
beam defining
Timing
Timing
Timing
target
ionization
chamber
FIG. 2: (Color online) Apparatus for measuring fission and
evaporation residues cross sections induced by low intensity
beams in inverse kinematics.
The ERs were identified by the time-of-flight measured
with the microchannel plate timing detector located in
front of the ionization chamber and by energy loss in the
ionization chamber. The two microchannel plate timing
detectors located before the target were used to monitor
the beam intensity and to provide the timing reference
for the time-of-flight measurement. The microchannel
plate timing detector in front of the ionization chamber
was position sensitive and was used to monitor the beam
position. It was located 200 mm from the target and had
a 25 mm diameter Mylar foil. The ionization chamber
was filled with CF4 gas so that it could function at rates
up to 50,000 pps. Higher beam intensities occurred in
some of the fission measurements, requiring the ioniza-
tion chamber to be turned off. The data acquisition was
triggered by either the beam signal rate down scaled by a
factor of 1000, the coincidence of the delayed beam signal
and ER signal, or the silicon detector signal. A 350 MeV
Au beam that resembled ERs was measured by the ion-
ization chamber to calibrate the energy loss spectrum.
The ER cross section was obtained by taking the ratio of
the ER yield to the target thickness and the integrated
beam particles in the ionization chamber. A detailed de-
scription of the ER measurement technique used in this
experiment can be found in Ref. [28].
The annular double-sided silicon strip detector (Mi-
cron Semiconductor Design S2) was located 42 mm from
the target. It had 48 concentric strips on one side and 16
pie-shaped sectors on the other side. The inner diame-
ter was 35 mm and the outer diameter was 70 mm. The
thickness of the detector was 300 µm. The detection an-
gles spanned 15.6◦ to 39.6◦. The fission fragments were
identified by requiring a coincidence of events in the Si
detector and by the folding angle distributions of the de-
tected particles.
III. DATA REDUCTION PROCEDURES
A. Evaporation residues
Since this was an inverse kinematics reaction, the ERs
recoiled in the forward direction in a narrow cone. The
apparatus was designed to have high efficiency for detect-
ing ERs. The efficiency of the apparatus was estimated
by Monte Carlo simulations. The angular distribution of
the ERs was generated by statistical model calculations
using the code PACE2[29]. The input parameters for the
statistical model calculations will be discussed later in
this paper. The calculated efficiency for the lowest bom-
barding energy is 93±1%. It increases as the reaction
energy increases and reaches 98±1% at the highest en-
ergy.
A relatively thick target was used in this experiment.
The beam lost approximately 40 MeV after passing
through the target (13 MeV in the center of mass). For
this reason, the measured cross section is an average of
the contributions from the beam interacting throughout
the thickness of the target. The variation of ER cross
sections is not very large at energies above the Coulomb
barrier because the shape of the excitation function is al-
most flat. Therefore, the measured cross section is close
to that would be measured at an energy corresponding to
the middle of the target. However, at energies below the
barrier the ER cross section falls off exponentially. The
cross section near the entrance of the target has more
weight than that near the exit. Smooth curves fitting
the excitation function in this rapidly varying region were
used to determine the reaction energy associated with the
measured cross section.
An iterative method was used to determine the effec-
tive reaction energy for the thick target measurement .
First, the measured cross sections and the beam energies
calculated at the middle of the target were fitted by a
tensioned spline[30] where the smoothness of the curve
could be adjusted. The resulting curve was then used to
calculate the thick target cross section for each measure-
ment, according to
dE/dx
where σ(E) is the curve generated by the spline fit,
dE/dx is the stopping power of 132Sn in 64Ni, and ρ is the
target thickness. The integration limits were the energies
of the beam at the exit of the target and at the entrance
of the target. The energy, Ei, corresponding to the cross
section, σi, was obtained by interpolation using the fit-
ted curve. This set of energies was used as the input for
the next iteration of the fit. The result converged very
quickly. After five iterations, the energies differed from
the previous iteration energies by less than 0.2 MeV. The
validity of this method was checked by generating data
from a known function such as the Wong formula [31]
and folding in the effects of target thickness.
Comparing to the cross-section-weighted-average
method described in Ref. [28], the differences in energies
determined by these two methods are not noticeable
at high energies because the excitation function is
fairly flat. However, at energies below the barrier, the
energy determined by the cross-section-weighted-average
method is larger than that determined by the method
described above and disagrees with the measurement
in Ref. [32], as can be seen in Fig. 3. Furthermore,
it is found that using data generated from a known
function the effective energy obtained by the cross-
section-weighted-average method is shifted to too high
an energy in the exponential falloff region.
The uncertainty of the energy determination was esti-
mated by comparison with the method using the cross
section weighted average. The average uncertainty of
the effective reaction energy is 2.3 MeV in the region
where the excitation function is almost flat and increases
to 3.9 MeV in the exponential fall off region. The uncer-
tainty is larger, 5.8 MeV, for the lowest energy data point
because an extrapolation is required for calculating the
thick target cross section and the extrapolation region
is influenced by the location of the next higher energy
point.
To verify our measurement technique, the ER cross
sections for 124Sn+64Ni in inverse kinematics were mea-
sured and compared to those published by Freeman et al.
measured with a thin target[32]. It is noted that some
of our measurements were performed at energies differ-
ent from those of Ref. [32]. The comparison is shown in
Fig. 3. Our data (open triangles) are in good agreement
with those measured by Freeman et al. [32] (filled stars).
FIG. 3: (Color online) Comparison of ER cross sections for
64Ni+124Sn measured in this work and by Freeman et al.[32]
(filled stars). The filled circles are for energies determined
by the method described in Ref. [28] and the open triangles
are for energies determined by the method described in this
paper.
The solid circles are for energy determined by the cross-
section-weighted-average method described in Ref. [28].
B. Fission
Fission fragments were identified by requiring a coin-
cidence of two particles detected by the pie-shaped sec-
tors of the Si strip detector on either side of the beam.
Figure 4(a) and (b) present two-dimensional histograms
of particle energy and strip number of the Si detector
for coincident events taken from 560 MeV and 620 MeV
132Sn+64Ni, respectively. They were compared to the
kinematics calculation displayed in Fig. 4(c) and (d)
where the fission fragments, elastically scattered Sn and
Ni are shown by the solid, dash-dotted and dotted curves,
respectively. The angular range of the Si strip detector
is between the two vertical dashed lines. The elastically
scattered Ni and Sn appear in the upper right hand cor-
ner and center of the histogram, respectively. The fission
events are located in the gated area.
The folding angle distributions of the fragments were
used to distinguish fission from other reactions, such as
deep inelastic reactions. Since there are two solutions
for the kinematics of the inverse reaction, as shown in
Fig. 4(c) and (d), the fragment angular correlation is not
as simple as that in normal kinematics. Monte Carlo
simulations were performed to provide guidance. It was
assumed that only fusion-fission results from a full mo-
mentum transfer. The width of the mass distribution was
taken from the 58Ni+124Sn measurement[33]. The width
of the mass distribution was varied to estimate the uncer-
(d)(c)
Strip No.
0 10 20 30 40 50 60
Strip No.
0 10 20 30 40 50 60
FIG. 4: (Color online) (a) and (b) Two dimensional his-
tograms of energy and strip number for coincident events from
560 and 620 MeV 132Sn+64Ni, respectively, measured by the
annular double-sided silicon strip detector. The gated area
shows events from fission and other reactions. (c) and (d)
Kinematics of energy as a function of scattering angle for 560
and 620 MeV 132Sn+64Ni, respectively, elastic scattering and
fission fragments.The dash-dotted and dotted curves are for
the elastically scattered Sn and Ni, respectively whereas the
solid curve is for the fission fragments. The angular range of
the Si strip detector is between the two vertical dashed lines.
tainty of the simulation. The transition state model[34]
was used to predict the fission fragment angular distribu-
tion. In Fig. 5 the simulated fission fragment folding an-
gle distributions for 550 MeV 124Te+64Ni are compared
with a stable beam test measurement. The folding angle
distributions for one of the fragments detected in strip 2
(16.2◦), strip 22 (27.7◦), and strip 41 (36.8◦) are shown.
The gap in the spectra at strip 14, 30, 44, 46, and 47 are
malfunctioning strips in the detector.
The Monte Carlo simulated folding angle distributions
for fission are shown in the middle panels of Fig. 5 and
compared to those of measurements shown in the left
panels. For one of the fragments detected at forward
angles, strip 2 for example, the predicted angular dis-
tribution of the other fragment is similar to that of the
measurement. Most of these events are considered as re-
sulting from fission. For one of the fragments detected
near the middle part of the detector, strip 22 for instance,
there are differences between measurement and simula-
tion in the shapes of the angular distributions of the other
fragment. It is predicted that the other fission fragment
is distributed around strip 40. The measured distribution
spreads to more forward angles. For one of the fragments
detected at the backward angles, the yield of the other
fragment is predicted to be small and they are equally
FIG. 5: (Color online) Left panels: Folding angle distributions
for 550 MeV 124Te+64Ni for one of the fragments detected at
16.2◦ (strip 2), 27.7◦ (strip 22), and 36.8◦ (strip 41) by the
annular double-sided silicon strip detector. The elastic scat-
tering events are excluded. The dotted and dashed histograms
are the results of fitting the data with simulated fission and
deep inelastic collisions with Q=–20 MeV, respectively (see
text). Middle panels: Results of Monte Carlo simulations for
fission events. Right panels: Results of Monte Carlo simula-
tions for deep inelastic scattering events. The solid curves are
for reaction Q value of –10 MeV, the dashed curves are for
Q=–20 MeV, and the dotted curves are for Q=–40 MeV.
distributed between the middle part of the detector and
the outer edge of the detector. But the measured events
appear in the middle part of the detector. There are no
events in the region where fission events are expected.
These differences are attributed to the contribution from
other reaction mechanisms, most likely deep inelastic col-
lisions.
An attempt was made to simulate these deep inelastic
collision events. It was assumed that the mass of these
products were projectile- and target-like and the angular
distribution at forward angles followed a 1/sin(θ) depen-
dence. The right panels of Fig. 5 show the results of sim-
ulations performed for reaction Q values of –10 (solid),
–20 (dashed), and –40 MeV (dotted). It can be seen that
the overlap of fission and deep inelastic collisions becomes
larger at more backward angles. At strip 41 (36.8◦), deep
inelastic collisions account for all the events.
The relative contribution of fission and deep inelas-
tic collisions were obtained by fitting the simulated fold-
ing angle distributions to the measured distributions for
all the detector strips using the CERN library program
MINUIT[35]. In the fits, the normalization coefficients for
the simulated distributions were the only two variable
parameters. The results of the fits are shown in the left
panels of Fig. 5 by the dotted and dashed histograms for
fission and deep inelastic collisions with Q=–20 MeV, re-
spectively. The number of fission events in the measured
distributions were taken as the summed events in each
strip multiplied by the relative contribution of fission.
The folding angle distributions for 132Sn+64Ni are
shown in Fig. 6. Due to the low statistics, it was not prac-
tical to extract the fission events by fitting the folding
angle distributions. As an alternative, the fission events
were extracted by setting gates on the folding angle dis-
tributions using the simulated distributions as references.
This gating method was also tested with the 124Te+64Ni
measurement. The fission cross sections obtained by the
fitting method and the gating method agreed within 10%.
FIG. 6: (Color online) Left panels: Folding angle distributions
for 560 MeV 132Sn+64Ni for one of the fragments detected at
16.2◦ (strip 2), 27.7◦ (strip 22), and 36.8◦ (strip 41) by the
annular double-sided silicon strip detector. The elastic scat-
tering events are excluded. Middle panels: Results of Monte
Carlo simulations for fission events. Right panels: Results of
Monte Carlo simulations for deep inelastic scattering events.
The solid curves are for reaction Q value of –10 MeV, and the
dotted curves are for Q=–40 MeV.
The Monte Carlo simulation was also employed to cal-
culate the coincidence efficiency of the detector. The effi-
ciency increased from 5.7±0.9% at 530 MeV to 7.6±0.8%
at 620 MeV bombarding energy.
In the present work, the dynamic range of the ampli-
fiers was not sufficiently large resulting in the distortion
of the high energy signals. In the future, new amplifiers
that are more suitable for measuring the energy of fission
fragments will be used so that the mass ratio of reac-
tion products can be obtained to help distinguish fission
events from other reaction channels.
The formation of a compound nucleus depends on
whether the interacting nuclei are captured inside the
fusion barrier and whether the dinuclear system can sub-
sequently evolve into a compact shape. Quasifission oc-
curs when the dinuclear system fails to cross the saddle
level density parameter (a) A/8 MeV−1
af/an 1.04
diffuseness of spin distribution (∆l) 4 h̄
fission barrier Sierk[45]
TABLE I: Input parameters for statistical model calculations.
point to reach shape equilibrium. Since the beam inten-
sity was several orders of magnitude lower than that of
stable beams and the reaction was in inverse kinemat-
ics, making separation of fusion-fission and quasifission
very difficult, there was no attempt to distinguish quasi-
fission from fusion-fission in this work. Furthermore, the
experimental results are compared to barrier penetration
models which describe the capture process, making it un-
necessary to separate these two processes.
IV. COMPARISON WITH MODEL
CALCULATIONS
A. Statistical model
The compound nucleus formed in 132Sn+64Ni decays
by particle evaporation and fission. Statistical models
have successfully described compound nucleus decay for
a wide range of fusion reactions. The measured ER and
fission cross sections are compared with the predictions of
the statistical model code PACE2[29]. The input param-
eters were obtained by simultaneously fitting the data
from stable Sn on 64Ni[32, 36] and the measured fusion
cross sections[36] were used for the calculations. Fig-
ure 7(a), (b), and (c) displays the comparison of calcula-
tions and data for 112,118,124Sn+64Ni, respectively. The
calculations reproduce the measurements well except for
the ER cross sections of 112Sn+64Ni. Table I lists the in-
put parameters for the calculations. Without adjusting
the parameters, calculations for 132Sn+64Ni were per-
formed. The results are shown in Fig. 7(d). Very good
agreement between the calculation and the data can be
seen.
It is noted that some of the parameters used in our
calculations are different from those used by Lesko et
al. [36]. In their calculations, the code CASCADE[37] was
used. The mass of the nuclei in the decay chain was
calculated using the Myers droplet model[38]. The dif-
fuseness of the spin distribution was ∆l = 15h̄ and the
ratio of level density at the saddle point to the ground
state, af/an, was set to 1.0. In this work, a compilation
of measured masses[39], ∆l = 4h̄, and af/an = 1.04 were
used.
FIG. 7: (Color online) Comparison of measured ER (filled
circles) and fission (open circles) cross sections with statistical
model calculations. The solid and the dotted curves are the
calculated fission and ER cross sections, respectively, using
the measured fusion cross sections as input. (a) 64Ni+112Sn,
(b) 64Ni+118Sn, (c) 64Ni+124Sn (Freeman et al.[32]), and (d)
132Sn+64Ni (this work)
B. Coupled-channel calculation
In general, sub-barrier fusion enhancement can be de-
scribed by coupled-channel calculations. The fusion cross
section of 132Sn+64Ni, the sum of ER and fission cross
sections, is compared with coupled-channel calculations
using the code CCFULL[40]. The interaction potential
(V◦=82.46 MeV, r◦=1.18 fm, and a=0.691 fm) was taken
from the systematics of Broglia and Winther[41]. The
result of the calculations are compared with the data
in Fig. 8. The dotted curve is the prediction of a one-
dimensional barrier penetration model and it can be seen
that it substantially underpredicts the sub-barrier cross
sections. The coupled-channel calculation including in-
elastic excitation of 64Ni to the first 2+ and 3− states
and 132Sn to the first 2+ state is shown by the dashed
curve. The transition matrix elements, B(Eλ), of 64Ni
were obtained from Ref. [42, 43] and the B(E2) of 132Sn
was obtained from a recent measurement by Varner et
al.[44]. This calculation overpredicts the data at energies
near the barrier and underpredicts the data well below
the barrier.
The neutron transfer reactions have positive Q values
for transferring two to six neutrons from 132Sn to 64Ni.
Since there is no neutron transfer data available for this
reaction, the transfer coupling form factor is unknown.
Thus, the coupled-channel calculation including transfer
and inelastic excitation was performed with one effec-
FIG. 8: (Color online) Comparison of 132Sn+64Ni fusion
data (filled circles) with a one-dimensional barrier penetra-
tion model calculation (dotted curve). The coupled-channel
calculation including inelastic excitation of the projectile and
target is shown by the dashed curve and the calculation in-
cluding inelastic excitation and neutron transfer is shown by
the solid curve.
tive transfer channel using the Q value for two-neutron
transfer. The coupling constant was adjusted to fit the
data. The calculation with the coupling constant set to
0.48 is shown by the solid curve. It reproduces the data
very well except for the lowest energy data point which
has large uncertainties in energy and in cross section. A
better treatment of the transfer channels based on exper-
imental transfer data would help improve understanding
of the influence of transfer on fusion. Experimental neu-
tron transfer data on 132Sn+64Ni in the future would be
very useful.
V. DISCUSSION
The ER cross section can be described by
σER = πλ
(2l+ 1)σl,
where λ is the de Broglie wave length, lc the maximum
angular momentum for ER formation and σl the partial
cross section. The reduced ER cross sections for 64Ni
on stable-even Sn isotopes[32] are compared with that
for 132Sn+64Ni in Fig. 9. The reduced ER cross sec-
tion is defined as the ER cross section divided by the
kinematic factor πλ2. It can be seen that the ER cross
sections saturate at high energies as fission becomes a
significant fraction of the fusion cross section. In addi-
tion, the saturation value increases as the neutron excess
in Sn increases. This is consistent with the fact that the
fission barrier height increases for the more neutron-rich
compound nuclei.
FIG. 9: (Color online) The reduced ER cross section as a
function of the center of mass energy for 64Ni on stable even
Sn isotopes[32] and radioactive 132Sn.
In Fig. 10, the measured reduced ER cross sections for
Ni+Sn as a function of the calculated average mass of the
ERs, predicted by PACE2, are presented. In the same re-
action, the higher mass ERs are produced at lower beam
energies because of the lower excitation energies of the
compound nucleus. As the neutron excess in the com-
pound nucleus increases, neutron evaporation becomes
the dominant decay channel. The PACE2 calculation pre-
dicts that a compound nucleus made with Sn isotopes of
mass number greater than 120 decays essentially 100%
by neutron evaporation and Pt isotopes are the primary
ERs. The mass of the compound nucleus is different
when it is produced with different Sn isotopes. How-
ever, it can be seen that Pt of a particular mass can
be produced with different Sn isotopes if different num-
bers of neutrons are evaporated. The reaction with a
more neutron-rich Sn produces the same Pt isotope at a
higher rate. With 132Sn as the projectile, the ERs are so
neutron-rich that they cannot be produced by stable Sn
induced reactions. This suggests that it may be benefi-
cial to use neutron-rich radioactive ion beams to produce
new isotopes of heavy elements.
The fusion excitation functions of 64Ni on stable even
Sn isotopes[36] are compared with that of 132Sn+64Ni in
Fig. 11. In order to remove the effects of the difference
in nuclear sizes, the cross section is divided by πR2 with
R=1.2(A
t ) fm, where Ap (At) is the mass num-
ber of the projectile (target). The reaction energy in the
center of mass is divided by the barrier height predicted
by the Bass model[46]. It can be seen that the fusion of
132Sn and 64Ni is not enhanced with respect to the stable-
even Sn isotopes when the difference in nuclear sizes is
FIG. 10: (Color online) The reduced ER cross section as a
function of the calculated average mass of ERs predicted by
PACE2[29] for 64Ni on stable even Sn isotopes[36] and radioac-
tive 132Sn.
considered.
FIG. 11: (Color online) Comparison of fusion excitation func-
tions for 64Ni on stable even Sn isotopes[36] and radioactive
132Sn. The change in nuclear sizes are corrected by factor-
ing out the area and the Bass barrier height[46] in the cross
section and energy, respectively.
The lowest energy data point has large uncertainties.
The cross section seems enhanced comparing to the sta-
ble beam measurements in Fig. 9 and Fig. 11. A more
pronounced enhancement appears when the data point
is compared to our coupled-channel calculations (Fig. 8)
and to a time-dependent Hartree-Fock calculation[47].
To further explore if fusion is enhanced at this low energy
region, we plan to repeat the measurement with an im-
proved apparatus where the thickness of the Mylar foil in
the microchannel plate timing detector located in front
of the ionization chamber will be reduced. This will al-
low a better separation of the energy loss signals from
ERs and scattered beams in the ionization chamber at
low bombarding energies.
The Q values for transferring two to six neutrons from
132Sn to 64Ni are positive. It is necessary to include
neutron transfer in coupled-channel calculations to re-
produce experimental results. As the neutron excess in
the Ni isotopes decreases, the number of neutron transfer
channels with positive Q values increases for 132Sn+Ni.
In 132Sn+58Ni, the Q values for transferring one to six-
teen neutrons from 132Sn to 58Ni are positive and range
from 1.7 to 17.4 MeV. A large sub-barrier fusion enhance-
ment due to the coupling to neutron transfer is expected
to occur in 132Sn+58Ni. An experiment to measure the
fusion excitation function of 132Sn on 58Ni is in prepara-
tion.
Although 132Sn is unstable, its neutron separation en-
ergy is 7.3 MeV. This is not very low compared to stable
nuclei. The sub-barrier fusion enhancement observed in
132Sn+64Ni with respect to stable Sn nuclei can be ac-
counted for by the change in nuclear sizes. No extra en-
hancement was found. However, an increased ER yield
at energies above the barrier was observed as compared
to stable Sn. As the shell closure is crossed, the binding
energy for 133Sn decreases by a factor of two. The nu-
clear surface of 133Sn and even more neutron-rich Sn may
be more diffused. The number of neutron transfer chan-
nels with positive Q values increases by a factor of two
or more. Larger sub-barrier fusion enhancement beyond
the nuclear size effect may be expected.
VI. SUMMARY
Neutron-rich radioactive 132Sn beams were incident on
a 64Ni target to measure fusion cross sections near the
Coulomb barrier. With an average intensity of 5×104
pps beams and a high efficiency apparatus for ER detec-
tion, the uncertainty of the measured ER cross section is
small and comparable to that achieved in stable beam ex-
periments. The efficiency for fission fragment detection
was low but the detector had a very fine granularity. By
requiring a coincident detection of the fission fragments
and performing folding angle distribution analysis, fission
events were identified. The excitation functions of ER
and fission can be described by statistical model calcula-
tions using parameters that simultaneously fit the stable
even Sn isotopes on 64Ni fusion data. A large sub-barrier
fusion enhancement with respect to a one-dimensional
barrier penetration model prediction was observed. The
enhancement is attributed to the coupling of the projec-
tile and target inelastic excitation and neutron transfer.
The reduced ER cross sections at energies above the bar-
rier are larger for the 132Sn induced reaction than those
induced by stable Sn nuclei, as expected from the higher
fission barrier of the more neutron-rich compound nu-
cleus. For a specific mass of ER, reactions with a more
neutron-rich Sn have higher cross sections. When the
fusion excitation functions are compared on a reduced
scale, where the effects of nuclear size and barrier height
are factored out, no extra fusion enhancement is observed
in 132Sn+64Ni with respect to stable Sn induced fusion.
The fusion cross section measured at the lowest energy
seems to be enhanced. Experiments to investigate this
with an improved apparatus is planned.
VII. ACKNOWLEDGMENT
We would like to thank D. J. Hinde for helpful and
stimulating discussions. We wish to thank the HRIBF
staff for providing excellent radioactive beams and tech-
nical support. Research at the Oak Ridge National
Laboratory is supported by the U.S. Department of
Energy under contract DE-AC05-00OR22725 with UT-
Battelle, LLC. W.L. and D.P. are supported by the the
U.S. Department of Energy under grant no. DE-FG06-
97ER41026.
[1] W. Reisdorf, J. Phys. G 20, 1297 (1994).
[2] B. B. Back, Phys. Rev. C 31, 2104 (1985).
[3] J. Tōke et al., Nucl. Phys. A440, 327 (1985).
[4] D. J. Hinde and M. Dasgupta, Phys. Lett. B 622, 23
(2005).
[5] M. Beckerman, Rep. Prog. Phys. 51, 1047 (1988).
[6] M. Dasgupta, D. J. Hinde, N. Rowley, and A. M. Ste-
fanini, Annu. Rev. Nucl. Part. Sci. 48, 401 (1998).
[7] A. B. Balantekin and N. Takigawa, Rev. Mod. Phys. 70,
77 (1998).
[8] J. R. Leigh et al., Phys. Rev. C 52, 3151 (1995).
[9] J. D. Bierman, P. Chan, J. F. Liang, M. P. Kelly, A.
A. Sonzogni, and R. Vandenbosch, Phys. Rev. Lett. 76,
1587 (1996); Phys. Rev. C 54, 3068 (1996).
[10] C. R. Morton et al., Phys. Rev. Lett. 72, 4074 (1994).
[11] A. M. Stefanini et al., Phys. Rev. Lett. 74, 864 (1995).
[12] A. A. Sonzogni, J. D. Bierman, M. P. Kelly, J. P. Lestone,
J. F. Liang, and R. Vandenbosch, Phys. Rev. C 57, 722
(1998).
[13] A. M. Stefanini, D. Ackermann, L. Corradi, J. H. He, G.
Montagnoli, S. Beghini, F. Scarlassara, and G. F. Segato,
Phys. Rev. C 52, R1727 (1995).
[14] H. Timmers et al., Nucl. Phys. A633, 421 (1998).
[15] N. Takigawa, H. Sagawa, and T. Shinozuka, Nucl. Phys.
A538, 221c (1992).
[16] M. S. Hussein, Nucl. Phys. A531, 192 (1991).
[17] C. H. Dasso and R. Donangelo, Phys. Lett. B 276, 1
(1992).
[18] V. Yu. Denisov, Eur. Phys. J. A 7 87 (2000).
[19] V. I. Zagrebaev, Phys. Rev. C 67, 061601(R) (2003).
[20] S. Hofmann, Prog. Part. Nucl. Phys. 46, 293 (2001).
[21] K. E. Zyromski et al., Phys. Rev. C 55, R562 (1997).
[22] J. J. Kolata et al., Phys. Rev. Lett. 81, 4580 (1998).
[23] Y. X. Watanabe et al., Eur. Phys. J. A 10, 373 (2001).
[24] J. F. Liang et al., Phys. Rev. Lett. 91, 152701 (2003);
Phys. Rev. Lett. 96, 029903(E) (2006).
[25] J. F. Liang and C. Signorini, Int. J. Mod. Phys. E 14,
1121 (2005).
[26] S. Hofmann and G. Münzenberg, Rev. Mod. Phys. 72,
733 (2000).
[27] D. W. Stracener, Nucl. Instrum. and Methods B 204, 42
(2003).
[28] D. Shapira et al., Nucl. Instrum. and Methods A 551,
330 (2005).
[29] A. Gavron, Phys. Rev. C 21, 230 (1980).
[30] http://www.netlib.org/fitpack/.
[31] C. Y. Wong, Phys. Rev. Lett. 31, 766 (1973).
[32] W. S. Freeman et al., Phys. Rev. Lett. 50, 1563 (1983).
[33] F. L. H. Wolfs, Phys. Rev. C 36, 1379 (1987).
[34] R. Vandenbosch and J. R. Huizenga, Nuclear Fission,
Academic Press, New York, (1973).
[35] F. James, MINUIT reference manual (Version 94.1),
Program Library D506, CERN, (1998).
[36] K. T. Lesko et al., Phys. Rev. C 34, 2155 (1986).
[37] F. Pühlhofer, Nucl. Phys. A280, 267 (1977).
[38] W. D. Myers, Droplet Model of the Atomic Nucleus
(IFI/Plenum, New York, 1977).
[39] A. H. Wapstra, G. Audi, and C. Thibault, Nucl. Phys.
A729, 129 (2003).
[40] K. Hagino, N. Rowley, and A. T. Kruppa, Compu. Phys.
Commun. 123, 143 (1999).
[41] R. A. Broglia and A. Winther, Heavy Ion Reactions,
Addison-Wesley, (1991).
[42] S. Raman et al., At. Data Nucl. Tables 36, 1 (1987).
[43] R. H. Spear, At. Data Nucl. Tables 42, 55 (1989).
[44] R. L. Varner et al., Eur. Phys. J. A 25, s01, 391 (2005).
[45] A. J. Sierk, Phys. Rev. C 33, 2039 (1986).
[46] R. Bass, Nucl. Phys. A231, 45 (1974).
[47] A. S. Umar and V. E. Oberacker, Phys. Rev. C 74,
061601(R) (2006).
http://www.netlib.org/fitpack/
ABSTRACT
  Evaporation residue and fission cross sections of radioactive $^{132}$Sn on
$^{64}$Ni were measured near the Coulomb barrier. A large sub-barrier fusion
enhancement was observed. Coupled-channel calculations including inelastic
excitation of the projectile and target, and neutron transfer are in good
agreement with the measured fusion excitation function. When the change in
nuclear size and shift in barrier height are accounted for, there is no extra
fusion enhancement in $^{132}$Sn+$^{64}$Ni with respect to stable Sn+$^{64}$Ni.
A systematic comparison of evaporation residue cross sections for the fusion of
even $^{112-124}$Sn and $^{132}$Sn with $^{64}$Ni is presented.

<|endoftext|><|startoftext|>
Tri-layer superlattices: A route to magnetoelectric multiferroics?
Alison J. Hatt and Nicola A. Spaldin
Materials Department, University of California
(Dated: October 22, 2018)
We explore computationally the formation of tri-layer superlattices as an alternative approach for
combining ferroelectricity with magnetism to form magnetoelectric multiferroics. We find that the
contribution to the superlattice polarization from tri-layering is small compared to typical polar-
izations in conventional ferroelectrics, and the switchable ferroelectric component is negligible. In
contrast, we show that epitaxial strain and “negative pressure” can yield large, switchable polar-
izations that are compatible with the coexistence of magnetism, even in materials with no active
ferroelectric ions.
PACS numbers:
The simultaneous presence of ferromagnetism and fer-
roelectricity in magnetoelectric multiferroics suggests
tremendous potential for innovative device applications
and exploration of the fundamental physics of coupled
phenomena. However, the two properties are chemically
contra-indicated, since the transition metal d electrons
which are favorable for ferromagnetism disfavor the off-
centering of cations required for ferroelectricity [1]. Con-
tinued progress in this burgeoning field rests on the iden-
tification of alternative mechanisms for ferroelectricity
which are compatible with the existence of magnetism
[2, 3]. Mechanisms discovered to date include the incor-
poration of stereochemically active lone pair cations, for
example in BiMnO3 [4, 5] and BiFeO3 [6, 7], geometric
ferroelectricity in YMnO3 [8], BaNiF4 [9, 10] and related
compounds, charge ordering as in LuFe2O4 [11, 12], and
polar magnetic spin-spiral states, of which TbMnO3 is
the prototype [13]. However, there are currently no single
phase multiferroics with large and robust magnetization
and polarization at or near room temperature [14].
The study of ferroelectrics has been invigorated over
the last few years by tremendous improvements in the
ability to grow high quality ferroelectric thin films with
precisely controlled composition, atomic arrangements
and interfaces. In particular, the use of compositional
ordering that breaks inversion symmetry, such as the
layer-by-layer growth of three materials in an A-B-C-
A-B-C... arrangement, has produced systems with en-
hanced polarizations and large non-linear optical re-
sponses [15, 16, 17, 18]. Here we explore computation-
ally this tri-layering approach as an alternative route to
magnetoelectric multiferroics. Our hypothesis is that the
magnetic ions in such a tri-layer superlattice will be con-
strained in a polar, ferroelectric state by the symmetry of
the system, in spite of their natural tendency to remain
centrosymmetric. We note, however, that in previous
tri-layering studies, at least one of the constituents has
been a strong ferroelectric in its own right, and the other
constituents have often contained so-called second-order
Jahn-Teller ions such as Ti4+, which have a tendency to
off-center. Therefore factors such as electrostatic effects
from internal electric fields originating in the strong fer-
roelectric layers [19], or epitaxial strain, which is well
established to enhance or even induce ferroelectric prop-
erties in thin films with second-order Jahn-Teller ions
[6, 20, 21], could have been responsible for the enhanced
polarization in those studies.
We choose a [001] tri-layer superlattice of perovskite-
structure LaAlO3, LaFeO3 and LaCrO3 as our model
system (see Fig. 1, inset.) Our choice is motivated by
three factors. First, all of the ions are filled shell or filled
sub-shell, and therefore insulating behavior, a prerequi-
site for ferroelectricity, is likely. Second, the Fe3+ and
Cr3+ will introduce magnetism. And third, none of the
parent compounds are ferroelectric or even contain ions
that have a tendency towards ferroelectric distortions, al-
lowing us to test the influence of trilayering alone as the
driving force for ferroelectricity. For all calculations we
use the LDA+U method [22] of density functional the-
ory as implemented in the Vienna Ab-initio Simulation
Package (VASP) [23]. We use the projector augmented
wave (PAW) method [24, 25] with the default VASP po-
tentials (La, Al, Fe pv, Cr pv, O), a 6x6x2 Monkhorst-
Pack mesh and a plane-wave energy cutoff of 450 eV. Po-
larizations are obtained using the standard Berry phase
technique [26, 27] as implemented in VASP. We find that
U/J values of 6/0.6 eV and 5/0.5 eV on the Fe and Cr
ions respectively, are required to obtain insulating band
structures; smaller values of U lead to metallic ground
states. These values have been shown to give reasonable
agreement with experimental band gaps and magnetic
moments in related systems [28] but are somewhat lower
than values obtained for trivalent Fe and Cr using a con-
strained LDA approach [29]. We therefore regard them as
a likely lower limit of physically meaningful U/J values.
(Correspondingly, since increasing U often decreases the
covalency of a system, our calculated polarizations likely
provide upper bounds to the experimentally attainable
polarizations).
We begin by constraining the in-plane a lattice con-
stant to the LDA lattice constant of cubic SrTiO3 (3.85
Å) to simulate growth on a substrate, and adjust the out-
of-plane c lattice constant until the stress is minimized,
with the ions constrained in each layer to the ideal, high-
symmetry perovskite positions. We refer to this as our
reference structure. (The LDA (LDA+U) lattice con-
http://arxiv.org/abs/0704.0781v3
stants for cubic LaAlO3 (LaFeO3, LaCrO3) are 3.75, 3.85
and 3.84 Å, respectively. Thus, LaAlO3 is under tensile
strain and LaFeO3/LaCrO3 are unstrained.) The cal-
culated total density of states, and the local densities of
states on the magnetic ions, are shown in Figure 2; a band
gap of 0.32 eV is clearly visible. The polarization of this
reference structure differs from that of the corresponding
non-polar single-component material (for example pure
LaAlO3) at the same lattice parameters by 0.21 µC/cm
. Note, however, that this polarization is not switch-
able by an electric field since it is a consequence of the
tri-layered arrangement of the B-site cations. Next, we
remove the constraint on the high symmetry ionic posi-
tions, and relax the ions to their lowest energy positions
along the c axis by minimizing the Hellmann-Feynman
forces, while retaining tetragonal symmetry. We obtain
a ground state that is significantly (0.14 eV) lower in en-
ergy than the reference structure, but which has a simi-
lar value of polarization. Two stable ground states with
different and opposite polarizations from the reference
structure, the signature of a ferroelectric, are not ob-
tained. Thus it appears that tri-layering alone does not
lead to a significant switchable polarization in the ab-
sence of some additional driving force for ferroelectricity.
In all cases, the magnetic ions are high spin with negligi-
ble energy differences between ferro- and ferri-magnetic
orderings of the Fe and Cr ions; both arrangements lead
to substantial magnetizations of 440 and 110 emu/cm3
respectively. Such magnetic tri-layer systems could prove
useful in non-linear-optical applications, where a break-
ing of the inversion center is required, but a switchable
polarization is not.
Since epitaxial strain has been shown to have a strong
influence on the polarization of some ferroelectrics (such
as increasing the remanent polarization and Curie tem-
perature of BaTiO3 [20] and inducing room temperature
ferroelectricity in otherwise paraelectric SrTiO3 [21]) we
next explore the effect of epitaxial strain on the polar-
ization of La(Al,Fe,Cr)O3. To simulate the effects of epi-
taxial strain we constrain the value of the in-plane lat-
tice parameter, adjust the out of plane parameter so as
to maintain a constant cell volume, and relax the atomic
positions. The volume maintained is that of the calcu-
lated fully optimized structure, 167 Å3, which has an
in-plane lattice constant of 3.82 Å. As shown in Figure
3, we find that La(Al,Fe,Cr)O3 undergoes a phase transi-
tion to a polar state at an in-plane lattice constant of 3.76
Å, which corresponds to a (compressive) strain of -0.016
(calculated from (a‖−a0)/a0 where a‖ is the in-plane lat-
tice constant and a0 is the calculated equilibrium lattice
constant). A compressive strain of -0.016 is within the
range attainable by growing a thin film on a substrate
with a suitably reduced lattice constant.
We find that significant ferroelectric polarizations can
be induced in La(Al,Fe,Cr)O3 at even smaller strain val-
ues by using negative pressure conditions. We simulate
negative pressure by increasing all three lattice constants
and imposing the constraint a=b=c/3; such a growth
condition might be realized experimentally by growing
the film in small cavities on the surface of a large-lattice-
constant substrate, such that epitaxy occurs both hori-
zontally and vertically. As in the planar epitaxial strain
state, the system becomes strongly polar; this time the
phase transition to the polar state occurs at a lattice con-
stant of 3.85 Å, at which the strain is a negligible 0.001
relative to the lattice constant of the fully optimized sys-
In Fig. 1 we show the calculated energy versus dis-
tortion profile and polarization for negative pressure
La(Al,Fe,Cr)O3 with in-plane lattice constant = 3.95 Å,
well within the ferroelectric region of the phase diagram
shown in Fig. 3. The system has a characteristic ferro-
electric double well potential which is almost symmetric
in spite of the tri-layering; the two ground states have
polarizations of 38.9 and -39.9 µC cm−2 respectively, rel-
ative to the reference structure at the same lattice con-
stant. Since the energies of the two minima are almost
identical, the effective electric field Eeff=∆E/∆P, intro-
duced in Ref [15], is close to zero and there is no tendency
to self pole. The origin of the symmetry is seen in the
calculated Born effective charges (3.6, 3.5 and 3.3 for Al,
Fe and Cr respectively) which show that the system is
largely ionic, with the ions showing very similar trivalent
cationic behavior. A similar profile is observed under
planar epitaxial strain, although the planar strained sys-
tem is around 0.15 eV lower in energy than the negative
pressure system for the same in-plane lattice constant.
To decouple the effects of interfacial strain and tri-
layering we calculate the polarization as a function of
strain and negative pressure for the individual compo-
nents, LaAlO3, LaFeO3 and LaCrO3. We find that all
three single-phase materials become polar at planar epi-
taxial strains of -0.03 (LaAlO3), -0.02 (LaFeO3), and
-0.01 (LaCrO3). Likewise, all three components be-
come polar at negative pressure, under strains of +0.03
(LaAlO3), +0.001 (LaFeO3), and +0.001 (LaCrO3).
(The higher strains required in LaAlO3 reflect its smaller
equilibrium lattice constant.)
These results confirm our earlier conclusion that the
large polarizations obtained in strained and negative
pressure La(Al,Fe,Cr)O3 are not a result of the tri-
layering. We therefore suggest that many perovskite ox-
ides should be expected to show ferroelectricity provided
that two conditions imposed in our calculations are met:
First, the ionic radii of the cation sites in the high sym-
metry structure are larger than the ideal radii, so that
structural distortions are desirable in order to achieve an
optimal bonding configuration. This can be achieved by
straining the system epitaxially or in a “negative pres-
sure” configuration. And second, non-polar structural
distortions, such as Glazer tiltings [30], are de-activated
relative to polar, off-centering distortions. These have
been prohibited in our calculations by the imposition of
tetragonal symmetry; we propose that the symmetry con-
straints provided experimentally by hetero-epitaxy in two
or three dimensions should also disfavor non-polar tilting
and rotational distortions. A recent intriguing theoretical
prediction that disorder can be used to disfavor cooper-
ative tilting modes is awaiting experimental verification
[31].
Finally, we compare the tri-layered La(Al,Fe,Cr)O3
with the polarization of its individual components. Cal-
culated separately, the remnant polarizations of LaAlO3,
LaFeO3 and LaCrO3, all at negative pressure with
a=c=3.95 Å, average to 40.4 µC cm−2. This is only
slightly larger than the calculated polarizations of the
heterostructure, 38.9 and 39.9 µC cm−2, indicating that
tri-laying has a negligible effect on the polarity. This sur-
prizing result warrants further investigation into how the
layering geometry modifies the overall polarization.
In conclusion, we have shown that asymmetric layering
alone is not sufficient to produce a significant switchable
polarization in a La(Al,Fe,Cr)O3 superlattice, and we
suggest that earlier reports of large polarizations in other
tri-layer structures may have resulted from the intrinsic
polarization of one of the components combined with epi-
taxial strain. We find instead that La(Al,Fe,Cr)O3 and
its parent compounds can become strongly polar under
reasonable values of epitaxial strain and symmetry con-
straints, and that tri-layering serves to modify the re-
sulting polarization. Finally, we suggest “negative pres-
sure” as an alternative route to ferroelectricity and hope
that our prediction motivates experimental exploration
of such growth techniques.
This work was funded by the NSF IGERT program,
grant number DGE-9987618, and the NSF Division of
Materials Research, grant number DMR-0605852. The
authors thank Massimiliano Stengel and Claude Ederer
for helpful discussions.
[1] N. A. Hill, J. Phys. Chem. B 104, 6694 (2000).
[2] C. Ederer and N. A. Spaldin, Curr. Opin. Solid State
Mater. Sci. 9, 128 (2005).
[3] M. Fiebig, J. Phys. D: Appl. Phys. 38, R1 (2005).
[4] R. Seshadri and N. A. Hill, Chem. Mater. 13, 2892
(2001).
[5] A. M. dos Santos, S. Parashar, A. R. Raju, Y. S. Zhao,
A. K. Cheetham, and C. N. R. Rao, Solid State Commun.
122, 49 (2002).
[6] J. Wang, J. B. Neaton, H. Zheng, V. Nagarajan, S. B.
Ogale, B. Liu, D. Viehland, V. Vaithyanathan, D. G.
Schlom, U. V. Waghmare, et al., Science 299, 1719
(2003).
[7] J. B. Neaton, C. Ederer, U. V. Waghmare, N. A. Spaldin,
and K. M. Rabe, Phys. Rev. B 71, 014113 (2005).
[8] B. B. van Aken, T. T. M. Palstra, A. Filippetti, and N. A.
Spaldin, Nat. Mater. 3, 164 (2004).
[9] D. L. Fox and J. F. Scott, J. Phys. C 10, L329 (1977).
[10] C. Ederer and N. A. Spaldin, Physical Re-
view B (Condensed Matter and Materials
Physics) 74, 024102 (pages 8) (2006), URL
http://link.aps.org/abstract/PRB/v74/e024102 .
[11] N. Ikeda, H. Ohsumi, K. Ohwada, K. Ishii, T. Inami,
K. Kakurai, Y. Murakami, K. Yoshii, S. Mori, Y. Horibe,
et al., Nature 436, 1136 (2005).
[12] M. A. Subramanian, H. Tao, C. Jiazhong, N. S. Rogado,
T. G. Calvarese, and A. W. Sleight, Adv. mater. 18, 1737
(2006).
[13] T. Kimura, T. Goto, H. Shintani, K. Ishizaka, T. Arima,
and Y. Tokura, Nature 426, 55 (2003).
[14] R. Ramesh and N. A. Spaldin, Nat. Mater. 6, 21 (2007).
[15] N. Sai, B. Meyer, and D. Vanderbilt, Phys. Rev. Lett.
84, 5636 (2000).
[16] H. N. Lee, H. M. Christen, M. F. Chisholm, C. M.
Rouleau, and D. H. Lowndes, Nature 433, 395 (2005).
[17] M. P. Warusawithana, E. V. Colla, J. N. Eckstein, and
M. B. Weissman, Phys. Rev. Lett. 90, 036802 (2003).
[18] Y. Ogawa, H. Yamada, T. Ogasawara, T. Arima,
H. Okamoto, M. Kawasaki, and Y. Tokura, Phys. Rev.
Lett. 90, 217403 (2003).
[19] J. B. Neaton and K. M. Rabe, Appl. Phys. Lett. 82, 1586
(2003).
[20] K. J. Choi, M. Biegalski, Y. L. Li, A. Sharan, J. Schubert,
R. Uecker, P. Reiche, Y. B. Chen, X. Q. Pan, V. Gopalan,
et al., Science 306, 1005 (2004).
[21] J. H. Haeni, P. Irvin, W. Chang, R.Uecker, P. Re-
iche, Y. L. Li, S. Choudhury, W. Tian, M. E. Hawley,
B. Craigo, et al., Nature 430, 758 (2004).
[22] V. I. Anisimov, F. Aryasetiawan, and A. I. Liechtenstein,
J. Phys.: Condens. Mat. 9, 767 (1997).
[23] G. Kresse and J. Furthmüller, Phys. Rev. B 54, 11169
(1996).
[24] P. E. Blöchl, Phys. Rev. B 50, 17953 (1994).
[25] G. Kresse and D. Joubert, Phys. Rev. B 59, 1758 (1999).
[26] R. D. King-Smith and D. Vanderbilt, Phys. Rev. B 47,
1651 (1993).
[27] D. Vanderbilt and R. D. King-Smith, Phys. Rev. B 48,
4442 (1993).
[28] Z. Yang, Z. Huang, L. Ye, and X. Xie, Phys. Rev. B 60,
15674 (1999).
[29] I. Solovyev, N. Hamada, and K. Terakura, Phys. Rev. B
53, 7158 (1996).
[30] A. M. Glazer, Acta Crystallogr. B 28, 3384 (1972).
[31] D. I. Bilc and D. J. Singh, Phys. Rev. Lett. 96, 147602
(pages 4) (2006).
http://link.aps.org/abstract/PRB/v74/e024102
FIG. 1: Energy and polarization as a function of displacement
from the centrosymmetric structure for La(Al,Fe,Cr)O3 under
negative pressure with a = c/3 = 3.95 Å. Inset: Schematic
representation of the centrosymmetric unit cell (center) and
displacements of the metal cations corresponding to the en-
ergy minima. Displacements are exaggerated for clarity.
FIG. 2: Density of states for Fe and Cr ions in La(Al,Fe,Cr)O3
with U/J values of 6/0.6 eV and 5/0.5 eV respectively. The
dashed line at 0 eV indicates the position of the Fermi energy.
FIG. 3: Calculated polarizations of negative pressure (cir-
cles) and epitaxially strained (triangles) La(Al,Fe,Cr)O3 as a
function of change in (a) in-plane and (b) out-of-plane lattice
constants relative to the lattice constants of the fully relaxed
structures. The polarizations are reported relative to the ap-
propriate corresponding reference structures in each case.
ABSTRACT
  We explore computationally the formation of tri-layer superlattices as an
alternative approach for combining ferroelectricity with magnetism to form
magnetoelectric multiferroics. We find that the contribution to the
superlattice polarization from tri-layering is small compared to typical
polarizations in conventionalferroelectrics, and the switchable ferroelectric
component is negligible. In contrast, we show that epitaxial strain and
``negative pressure'' can yield large, switchable polarizations that are
compatible with the coexistence of magnetism, even in materials with no active
ferroelectric ions.

<|endoftext|><|startoftext|>
Introduction
A fundamental problem in numerical relativity is the need to solve Einstein’s equations
on spatially unbounded domains with finite computer resources. There are various
ways of addressing this issue. Most often, the spatial domain is truncated at a finite
distance and suitable boundary conditions are imposed at the artificial boundary.
A different approach is to compactify the domain by using spatial coordinates that
bring spatial infinity to a finite location on the computational grid. Another method
often used for wave-like problems (although it is not commonly used in numerical
relativity) includes so-called sponge layers which damp the waves near the outer
boundary of the computational domain. The purpose of this paper is to compare these
various methods by testing their ability to accurately reproduce dynamical solutions
of Einstein’s equations.
An ideal boundary treatment would produce a solution to Einstein’s equations
that is identical (within the computational domain) to the corresponding solution
obtained on an unbounded domain. In particular, no spurious gravitational radiation
or constraint violations should enter the computational domain through the artificial
Testing outer boundary treatments for the Einstein equations 2
boundary. We can use this principle to test the various boundary treatments in the
following way. First we compute a reference solution using a very large computational
domain, large enough that its boundary remains out of causal contact with the interior
spacetime region where comparisons are being made. Next we compute the same
solution using a domain truncated at a smaller distance where one of the boundary
treatments is used: we either impose boundary conditions there, compactify spatial
infinity, or add a sponge layer. Finally we compare the solution on the smaller domain
with the reference solution, measuring the reflections and constraint violations caused
by the boundary treatment. Assessing boundary conditions by comparing with a
reference solution on a much larger domain or a known analytic solution is a common
practice in computational science. For applications to numerical relativity see e.g. [1],
chapter 8 of [2], and [3, 4, 5].
The particular test problem used in this paper is a Schwarzschild black hole
with an outgoing gravitational wave perturbation. The interior of the black hole is
excised; all the characteristic fields propagate into the black hole (and out of the
computational domain) at the inner boundary and hence no boundary conditions
are needed there. Our numerical implementation uses a pseudo-spectral collocation
method. See Appendix A for details on the initial data, the numerical methods, and
the quantities that we compare between the solutions.
We perform all of these tests using a first-order generalized harmonic formulation
of the Einstein equations (see [6] and references therein). In section 2 we discuss
the construction of boundary conditions for this system that prevent the influx of
constraint violations, and that limit the spurious incoming gravitational radiation
by controlling the Newman-Penrose scalar Ψ0 at the boundary. We also improve
the boundary conditions on the gauge degrees of freedom by studying small gauge
perturbations of flat spacetime. We then evaluate the performance of these boundary
conditions on our test problem: measuring the reflections and constraint violations
caused by the computational boundary, and determining how these reflections vary
with the radius of the boundary.
Section 3 evaluates the performance of a variety of other widely used boundary
conditions on our test problem. First we test the simple boundary conditions that
freeze all the incoming characteristic fields at the boundary. We also test the
commonly used variant of this, the Sommerfeld boundary conditions, used in many
binary black hole simulations [7, 8, 9, 10, 11] based on the BSSN [12, 13] formulation
of Einstein’s equations. Finally in section 3 we evaluate the constraint-preserving
boundary conditions proposed by Kreiss and Winicour [14], which differ from those
discussed in section 2 mainly by our use of a physical boundary condition that controls
In section 4 we evaluate two boundary treatments that are alternatives to
imposing local boundary conditions at a finite outer boundary. The first is the spatial
compactification method used e.g. by Pretorius [15, 16, 17] in his groundbreaking
binary black hole evolutions. In this treatment a coordinate transformation maps
spatial infinity to a finite location on the computational grid. As waves travel out,
they become increasingly blue-shifted with respect to the compactified coordinates
and ultimately they fail to be resolved. Hence numerical dissipation is applied, which
damps away these short-wavelength features. We measure the reflections and the
constraint violations generated by the waves in our test problem as they interact with
this boundary treatment. Finally in section 4 we implement and test a sponge layer
method for Einstein’s equations.
Testing outer boundary treatments for the Einstein equations 3
One of the main objectives of current binary black hole simulations is the
computation of reliable waveforms for gravitational wave data analysis. Therefore it is
important to evaluate how the various boundary treatments affect the accuracy of the
extracted waveforms. In section 5, we compute the Newman-Penrose scalar Ψ4 (which
describes the outgoing waves) on an extraction sphere close to the outer boundary (or
compactified region, or sponge layer, respectively) and compare it with the analogous
Ψ4 from the reference solution. We also compare the measured reflections caused
by our Ψ0 controlling boundary condition with the analytical predictions of these
reflections made by Buchman and Sarbach [18, 19].
Finally we discuss the implications of our results in section 6, and we also describe
briefly a number of other boundary treatments which we do not test here.
2. Constraint-preserving boundary conditions
In this section, we briefly review the generalized harmonic form of the Einstein
evolution system used in our tests. The method of constructing constraint-preserving
boundary conditions (CPBCs) for this system is also discussed, and an improved
boundary condition for the gauge degrees of freedom is derived. The numerical
performance of these boundary conditions is evaluated using our test problem, and
the dependence of the spurious reflections as a function of the boundary radius is
measured.
2.1. The generalized harmonic evolution system
The formulation of Einstein’s equations employed here uses generalized harmonic
gauge conditions, in which the coordinates xa obey the wave equation
�xa = Ha(x, ψ), (1)
where � = ψab(∂a∂b − Γcab∂c) is the covariant scalar wave operator, with ψab the
spacetime metric and Γcab the associated metric connection. In this formulation of
the Einstein system the gauge source function Ha may be chosen freely as a function
of the coordinates and of the spacetime metric ψab (but not derivatives of ψab).
As is well known, the Einstein equations reduce to a set of coupled wave equations
when the gauge is specified by equation (1). We write this system in first-order form,
both in time and space, by introducing the additional variables Φiab ≡ ∂iψab and
Πab ≡ −tc∂cψab, where tc is the future directed unit normal to the t = const.
hypersurfaces. Here lower-case Latin indices from the beginning of the alphabet
denote four-dimensional spacetime quantities, whereas lower-case Latin indices from
the middle of the alphabet are spatial. The principal parts of these evolution equations
are given by ‡
∂tψab ' 0,
∂tΠab ' Nk∂kΠab −Ngki∂kΦiab − γ2Nk∂kψab, (2)
∂tΦiab ' Nk∂kΦiab −N∂iΠab +Nγ2∂iψab,
where ' indicates that purely algebraic terms have been omitted, gij is the spatial
metric of the t = const. slices, and N and N i are the lapse function and shift vector,
‡ The parameter γ1 of [6] is chosen to be −1, which ensures that the equations are linearly degenerate.
Testing outer boundary treatments for the Einstein equations 4
respectively. The parameter γ2 was introduced in [6] in order to damp violations of
the three-index constraint
Ciab ≡ ∂iψab − Φiab = 0. (3)
We also include terms of lower derivative order that are designed to damp violations
of the harmonic gauge constraint [20]
Ca ≡ −�xa +Ha = ψbcΓabc +Ha = 0. (4)
The system (2) is symmetric hyperbolic. The characteristic fields in the direction
ni (where nata = 0) are given by
u0ab = ψab, speed 0, (5)
u1±ab = Πab ± Φnab − γ2ψab, speed −N
n ±N, (6)
u2Aab = ΦAab, speed −Nn. (7)
For future reference, we also define
ũ1±ab ≡ Πab ± Φnab. (8)
Here and in the following, an index n denotes contraction with ni, while upper-
case Latin indices A,B, . . . are orthogonal to n, e.g. vA = PAivi where Pab ≡
ψab − nanb + tatb. For further details, we refer the interested reader to [6].
2.2. Construction of boundary conditions
Our construction of boundary conditions for the generalized harmonic evolution
system can be divided into three parts: constraint-preserving, physical, and gauge
boundary conditions.
In order to impose constraint-preserving boundary conditions, we derive the
subsidiary evolution system that the constraints (3) and (4) obey as a consequence of
the main evolution equations (2). The incoming modes of the subsidiary system are
then required to vanish at the boundary (cf. [21, 22, 23, 24, 25, 26, 27, 28, 29]). For
instance, the harmonic gauge constraint (4) obeys a wave equation
�Ca = (lower-order terms homogeneous in the constraints) (9)
and the corresponding incoming fields will involve first derivatives of Ca. In terms of
the incoming modes u1−ab (6) of the main evolution equations, the resulting constraint-
preserving boundary conditions can be written in the form
PC cdab ∂nu
cd ≡ ( 12PabP
cd − 2l(aPb)(ckd) + lalbkckd)∂nu1−cd
= (tangential derivatives), (10)
where PC is a projection operator of rank 4 (cf. [6]). Here ni now refers to the outward-
pointing unit spatial normal to the boundary, la = (ta + na)/
2, ka = (ta − na)/
= denotes equality at the boundary. If the shift vector points towards the exterior
at the boundary (Nn >̇ 0), the fields u2Aab (7) are incoming as well and we obtain a
boundary condition on them by requiring the components CnAab of the four-index
constraint
Cijab ≡ −2∂[iΦj]ab (11)
to vanish at the boundary.
An acceptable physical boundary condition should require that no gravitational
radiation enter the computational domain from the outside (except for backscatter
Testing outer boundary treatments for the Einstein equations 5
off the spacetime curvature, an effect that is a first-order correction in M/R).
Gravitational radiation may be described by the evolution system that the Weyl tensor
obeys by virtue of the Bianchi identities (see e.g. [27]). Our boundary condition
requires the incoming characteristic fields of this system to vanish at the outer
boundary. These incoming fields are proportional to the Newman-Penrose scalar Ψ0
(evaluated for a Newman-Penrose null tetrad containing the vectors la and ka). Hence
the physical boundary condition we use is [27, 22, 30, 29, 31]
= 0, (12)
which can be written in a form similar to (10),
PP cdab ∂nu
cd ≡ (Pa
d − 1
cd)∂nu
= (tangential derivatives). (13)
Here PP is a projection operator of rank 2 that is orthogonal to PC [6]. We remark that
(12) still causes some, albeit very small, spurious reflections of gravitational radiation.
It can be viewed as the lowest level in a hierarchy of perfectly absorbing boundary
conditions for linearized gravity [18, 19].
The constraint-preserving (10) and physical (13) boundary conditions together
constrain six components of the main incoming fields u1−ab . The remaining four
components correspond to gauge degrees of freedom. In the past we chose simply
to freeze those components in time [6],
PG cdab ∂tu
= 0, (14)
where PG ≡ I− PC − PP.
The initial-boundary value problem (IBVP) for the boundary conditions discussed
so far was shown in [32] to be boundary-stable, which is a (rather strong) necessary
condition for well posedness. These boundary conditions have been successfully used
in long-term stable evolutions of single and binary black hole spacetimes [6, 33, 34]. In
the following subsection, we present an improvement to the gauge boundary condition
(14) motivated by the evolution of gauge perturbations about flat spacetime.
2.3. Improved gauge boundary condition
Let us assume that near the outer boundary, the spacetime is close to Minkowski space
in standard coordinates (Ha = 0) so that the Einstein equations may be linearized
about that background. This assumption is reasonable because for the dominant
wavenumber of the outgoing pulse (k = 1.6/M) and the boundary radius we typically
consider (R = 41.9M), we have kR � 1 and R � M . Furthermore, we assume that
the outer boundary is a coordinate sphere of radius r = R.
We begin by noting that harmonic gauge does not fix the coordinates completely:
infinitesimal coordinate transformations
xa → xa + ξa (15)
are still allowed provided the displacement vector satisfies the wave equation,
�ξa = 0. (16)
Under such a coordinate transformation, the metric changes by
δψab = −2∂(aξb). (17)
A closer inspection [32] of the projection operator PG in (14) shows that the gauge
boundary conditions control the components laδψab of the perturbations, where
Testing outer boundary treatments for the Einstein equations 6
la ≡ (ta + na)/
2 is the outgoing null vector normal to the boundary. It is
interesting to observe that these components vanish in the ingoing radiation gauge
[35]. However, imposing radiation gauge on the entire spacetime is not possible in
spacetimes containing strong-field regions, which will always generate perturbations
laδψab that propagate into the far field. A reasonable condition to require then is that
these perturbations pass through the boundary without causing strong reflections.
Each Cartesian component of the vector laδψab obeys the scalar wave equation
�ψ = 0. (18)
Solutions to this equation can be written in the form
Ylm(θ, φ)ψl(t, r), (19)
where the Ylm are the standard spherical harmonics and the ψl are linear combinations
of outgoing (+) and incoming (−) solutions
ψ±l (t, r) = r
F±l (r ∓ t), (20)
F±l (x) being arbitrary functions. A boundary condition is needed on ψ that eliminates
the incoming part of these solutions. In [36], a hierarchy of boundary conditions
is constructed that accomplish this task for all l 6 L. This idea was applied to
the evolution of the Weyl curvature in [18] in order to construct improved physical
boundary conditions. For the gauge boundary conditions considered here, we restrict
ourselves to the L = 0 member of the hierarchy, which corresponds to the Sommerfeld
condition §
(∂t + ∂r + r
= 0. (21)
In contrast, our old gauge boundary condition that froze the incoming characteristic
field, as in (14), is given by
(∂t + ∂r + γ2)ψ
= 0, (22)
where γ2 is the constraint damping parameter.
This Sommerfeld boundary condition (21) is much less reflective than the freezing
condition (22). To see this, we consider a solution of the form
ψl = ψ
l + ρlψ
l (23)
with generating functions
F±l (x) = e
±ikx, (24)
where k ∈ R is the wave number. Substituting this solution into the boundary
conditions (21) resp. (22), we solve for the reflection coefficient ρl. Figure 1 shows
|ρl| for a typical range of wave numbers k and outer boundary radii R used for the
numerical tests in this paper. (The dominant wave number of the outgoing pulse is
k ≈ 1.6/M and in most cases, we place the outer boundary at R = 41.9M .) We
see that |ρl| is much smaller (by about 3 orders of magnitude) for the Sommerfeld
condition than for the freezing condition.
§ To avoid confusion, we remark that in [5, 14], the term ‘Sommerfeld condition’ is used in reference
to a condition of the form (∂t +∂r)u
= 0, i.e. without the extra r−1 term due to our polar coordinates.
Testing outer boundary treatments for the Einstein equations 7
0 0.5 1 1.5 2
l = 1
R = 41.9 M
50 100 150 200
R / M
l = 1
k = 1 / M
Figure 1. Predicted reflection coefficients ρl for freezing (dotted) and
Sommerfeld (solid) boundary conditions as functions of wave number k and outer
boundary radius R. The curves for different l are visually indistinguishable in the
freezing case. Note also that ρ0 = 0 for the Sommerfeld condition.
In the notation of the previous subsection, the improved gauge boundary
condition (21) reads (after taking a time derivative),
PG cdab ∂t[u
cd + (γ2 − r
−1)ψcd]
= 0. (25)
We remark that the extra terms in (25) as compared with the old condition (14) are
of lower derivative order, so that the high-frequency stability result of [32] extends
immediately to these modified gauge boundary conditions.
2.4. Numerical results
The numerical tests of the various boundary conditions performed in this paper are
described in some detail in Appendix A. Figure 2 compares the numerical performance
of our new CPBCs (10), (11), (13), (25) with our old ones (10), (11), (13), (14). The
outer boundary is placed at radius R = 41.9M for these particular tests. Shown are
the discrete L∞ and L2 norms of the difference ∆U between the numerical solution
and the reference solution, and also the violations of the constraints C (see Appendix
A.4 for precise definitions of these quantities). The reference solution has an outer
boundary at radius 961.9M and is computed using our old CPBCs; thus for t < 920M
the outer boundary of the reference solution is out of causal contact with the region
where ∆U and C are computed.
In the difference ∆U we see a reflection that originates when the wave reaches the
boundary at t ≈ R and then amplifies as it moves inward in the spherical geometry,
assuming its maximum at t ≈ 2R. This feature is much more prominent in the L∞
norm than in the L2 norm, which is why we display only the L∞ norm in subsequent
plots. The reflection is much smaller (by a factor of ≈ R/M) for the new boundary
conditions as compared with the old ones. Even at later times, the new boundary
conditions result in a smaller ∆U , which in contrast to the old conditions appears to
decrease as resolution is increased.
We would like to stress that ∆U is a coordinate dependent quantity. Hence a
smaller ∆U does not necessarily mean that the boundary treatment is ‘better’ in a
physically meaningful sense. If however the aim is to produce a solution that is as
Testing outer boundary treatments for the Einstein equations 8
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
Figure 2. Old (solid) vs. new (dotted) CPBCs. Four different resolutions are
shown: (Nr, L) = (21, 8), (31, 10), (41, 12), and (51, 14). The outer boundary is
at R = 41.9M .
close to the reference solution in the same coordinates, the choice of gauge boundary
conditions does become important. Gauge reflections can in principle also impair the
numerical accuracy of gauge-invariant quantities because much numerical resolution
is wasted on resolving the gauge reflections. This is particularly the case when the
gauge excitations in question are high-frequency modes such as those produced along
with the so-called ‘junk radiation’ in binary black hole initial data.
There is no discernible difference between the two sets of boundary conditions as
far as constraint violations are concerned, which is what we expect because both of
them are constraint-preserving.
We close this section by investigating the dependence of the reflections on the
radius of the outer boundary (figure 3). The amplitude of the first peak in ||∆U||∞
decreases as the boundary is moved outward, roughly like 1/R. At late times, there
appears to be a power-law growth of that quantity at a rate that increases slightly
with resolution. Inspection of the constraints (also in figure 3) and Ψ4 (figure 10)
suggests that this is a pure gauge effect. This blow-up is completely dominated by
the innermost domain, which contains a long-wavelength feature that is growing in
time. We speculate that this problem might be cured by a more clever choice of gauge
source function close to the black hole horizon.
Testing outer boundary treatments for the Einstein equations 9
0 200 400 600 800 1000
t / M
R/M = 21.9
161.9
201.9
121.9
 = 51, L = 14)
0 200 400 600 800 1000
t / M
 = 51, L = 14)
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
R = 121.9 M
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
R = 121.9 M
Figure 3. New CPBCs at different radii. Top half: all radii at the highest
resolution, bottom half: R = 121.9M at all resolutions. In the top right panel,
curves for all outer boundary radii coincide.
3. Alternate boundary conditions
In this section, we consider several alternate boundary conditions that are often used
in numerical relativity. All of these are local conditions imposed at a finite boundary
radius, then in section 4 we consider some additional non-local boundary treatments.
We run the alternate boundary conditions on our test problem and compare the results
with the CPBCs (using the new gauge boundary condition (25)).
3.1. Freezing the incoming fields
A very simple boundary condition is obtained by freezing in time all the incoming
fields at the boundary, i.e.,
= 0 (and ∂tu
= 0 if Nn >̇ 0). (26)
This boundary condition is attractive from a mathematical point of view because it
is of maximally dissipative type and hence, together with the symmetric hyperbolic
evolution equations (2), yields a strongly well-posed IBVP [37, 38, 39]. However, in
general this boundary condition is not compatible with the constraints.
Testing outer boundary treatments for the Einstein equations 10
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
Figure 4. Freezing (solid) vs. new CPBCs (dotted). Four different resolutions
are shown: (Nr, L) = (21, 8), (31, 10), (41, 12), and (51, 14). For freezing
boundary conditions, both ||∆U|| and C converge to a nonzero function with
increasing resolution. The outer boundary is at R = 41.9M .
The left side of figure 4 demonstrates that freezing boundary conditions cause a
significantly larger (by ≈ 3 orders of magnitude) initial reflection than our CPBCs.
The difference with respect to the reference solution remains large in the subsequent
evolution and unlike for the CPBCs does not decrease with increasing resolution.
Furthermore, the violations of the constraints (right side of figure 4) do not converge
away. This means that a solution to the Einstein equations is not obtained in the
continuum limit.
3.2. Sommerfeld boundary conditions
A boundary condition that is often imposed in conjunction with the BSSN [12, 13]
formulation of the Einstein equations is a Sommerfeld condition on all the components
of the spatial metric gij and extrinsic curvature Kij ,
(∂t + ∂r + r
gij − δij
= 0. (27)
This condition has been used for example in many recent binary black hole
simulations [7, 8, 9, 10, 11]. We cannot impose precisely the conditions (27) in our
simulations because there is no one-to-one relationship between gij and Kij , and the
incoming characteristic fields of our generalized harmonic formulation of Einstein’s
equations. Instead we consider the similar condition
(∂t + ∂r + r
−1)(ψab − ηab)
= 0 (28)
on all the components of the spacetime metric (ηab being the Minkowski metric). A
very similar boundary condition (without the r−1 term) has recently been used in the
generalized harmonic evolutions of [40].
In our formulation, boundary conditions are required not on the spacetime metric
itself but only on certain combinations of its derivatives. By taking a time derivative
of (28) and rewriting in terms of incoming characteristic fields, we obtain
ab + (γ2 − r
−1)ψab]
= 0. (29)
Testing outer boundary treatments for the Einstein equations 11
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
Figure 5. Sommerfeld (solid) vs. new CPBCs (dotted). Four different
resolutions are shown: (Nr, L) = (21, 8), (31, 10), (41, 12), and (51, 14). The
outer boundary is at R = 41.9M .
This then is our version of the Sommerfeld boundary condition (cf. (25)), to be imposed
on a spherical boundary in the far field (where linearized theory is assumed to be valid).
Because the BSSN formulations using (27) are usually second-order in space,
there is no analogue of our three-index constraint (3) in that system. To mimic this
situation in our tests of equation (29), we also impose a CPBC on u2Aab as discussed in
section 2.2, which together with our constraint damping terms ensures that violations
of the three-index constraint (3) are exponentially damped.
Our version of Sommerfeld boundary conditions performs similarly on our test
problem (figure 5) to the freezing boundary conditions (26) (figure 4). The initial pulse
of reflections is smaller by ≈ 2 orders of magnitude, but later ||∆U|| grows to a similar
level as for freezing boundary conditions. Again the constraints do not converge away,
although this non-convergence appears only at somewhat higher resolutions than in
the freezing case.
3.3. Kreiss-Winicour boundary conditions
Recently, Kreiss and Winicour [14] proposed a set of ‘Sommerfeld-like’ CPBCs for the
harmonic Einstein equations and showed that they result in an IBVP that is well-posed
in the generalized sense. Their boundary conditions were implemented and tested in
[5]; here we compare their performance with the various other boundary treatments.
The Kreiss-Winicour boundary conditions are obtained by requiring the harmonic
constraint to vanish at the boundary,
= 0. (30)
In our notation, this can be written as an algebraic condition on part of the incoming
fields u1−,
= Fa, (31)
where
[2k(cδad) − kaψcd],
lbu1+ab −
bcu1+bc + P
iju2ija − 12Pa
iψbcu2ibc (32)
Testing outer boundary treatments for the Einstein equations 12
− γ2ta +Ha.
The range of the projection operator PC
is identical with that of PC defined in (10).
For the unconstrained incoming fields ũ1− (i.e. u1− without the γ2 term, equation
(8)), Kreiss and Winicour [14] specify certain free boundary data qPab and q
ab. In our
notation,
PP cdab ũ
cd = q
ab, P
ab ũ
cd = q
ab. (33)
In the linearized wave and gauge wave tests of [5], these boundary data are obtained
from the known exact solutions. In the absence of an exact solution, it is suggested
that the data could be obtained from an exterior Cauchy-characteristic or Cauchy-
perturbative code. However, since we do not have such an exterior code, we compute
the boundary data from the background solution, i.e. Schwarzschild spacetime. As in
the Sommerfeld case (section 3.2), we use a constraint-preserving boundary condition
on u2Aab to emulate the second-order formulation of [5, 14], and this value of u
Aab is
then used to compute Fa in (32).
Figure 6 shows the numerical results for our test problem. The magnitude of the
initial reflections lies between that of freezing and Sommerfeld boundary conditions
and is somewhat smaller at later times, though still larger than for our CPBCs at the
higher resolutions. The constraints converge away with increasing resolution, as they
should for a boundary condition that is consistent with the constraints. In a numerical
simulation, violations of the constraints are in general present in the interior of the
computational domain. These propagate as described by the constraint evolution
system (9) and some may hit the outer boundary. The Dirichlet boundary conditions
(30) might be expected to cause more reflections of constraint violations than our
no-incoming-field conditions (10), however, no indications of this are seen in figure
6. Probably the constraint damping we use is sufficiently effective in eliminating the
source of these reflections.
We shall see in section 5.1 that the Kreiss-Winicour boundary conditions also
cause larger errors in the physical degrees of freedom than our CPBCs. Since the
main difference between the two sets of boundary conditions is our use of a physical
boundary condition ∂tΨ0
= 0, we conclude that such a condition is crucial in reducing
the reflections from the outer boundary.
4. Alternate approaches
So far we have only considered boundary conditions that are local algebraic or
differential conditions imposed at the boundary of some finite computational domain.
There are of course many ways of treating the outer boundary that do not fall into that
category. In this section, we evaluate two such approaches: spatial compactification
and sponge layers.
4.1. Spatial compactification
Spatial compactification is a method that has been widely used in numerical relativity,
for instance in [41, 42] or more recently in the generalized harmonic binary black hole
simulations of Pretorius [15, 16, 17].
The basic idea is to introduce spatial coordinates that map spacelike infinity to a
finite location. Here we consider mappings that are functions of coordinate radius only
(whereas Pretorius applies the mapping to each Cartesian coordinate separately). We
Testing outer boundary treatments for the Einstein equations 13
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
Figure 6. Kreiss-Winicour (solid) vs. new CPBCs (dotted). Four different
resolutions are shown: (Nr, L) = (21, 8), (31, 10), (41, 12), and (51, 14). The
outer boundary is at R = 41.9M .
have used two such mappings, named Tan and Inverse, as detailed in Appendix B.1.
Each map has a scaleR across which the mapping is (essentially) linear. The outermost
grid point is placed at a very large but finite uncompactified radius (r = 1017M). With
respect to the compactified radial coordinate, the characteristic speeds are below
numerical roundoff there and hence no boundary condition should be needed. The
following results were produced using constraint-preserving boundary conditions; we
have checked for one simulation that using no boundary condition at all yields results
that are visually indistinguishable from the ones presented here on the scales of figures
7, 8, and 10.
As the waves travel outward, they become more and more blue-shifted with
respect to the computational grid and are eventually no longer properly resolved.
However, some form of artificial numerical dissipation is applied that acts as a low-
pass filter and causes the waves to be damped as they become increasingly distorted.
We have experimented with various such filters; see Appendix B.1 for details. One of
them (referred to as number 2 in the following) is designed to emulate as closely as
possible the fourth-order Kreiss-Oliger dissipation used by Pretorius.
In the following numerical comparisons, we evaluate the differences with respect
to the reference solution only in the part of the domain where the compatification map
is essentially linear, i.e. for r 6 R. First we compare the various filtering methods at
a fixed resolution, using the Tan compactification mapping (figure 7). The filters that
are applied to the right side of the evolution equations (numbers 1 and 3, cf. table B1)
do somewhat better than those applied to the solution itself (numbers 2 and 4), and
the Exponential filters (numbers 3 and 4) are slightly better than the Kreiss-Oliger
filters (numbers 1 and 2). All of them are outperformed by the CPBCs (imposed at
r = R). For our closest approximation to the dissipation used by Pretorius (number
2), ||∆U|| is comparable to constraint-preserving boundary conditions at the peak of
reflections (at t ≈ 2R) but becomes larger by about 2 orders of magnitude at later
times. The compactification methods also generate considerable constraint violations.
Next we focus on the best filter (number 4) of the previous test but vary the
resolution (figure 8). We do see convergence of ||∆U|| initially but the convergence
degrades at later times. This is surprising at first because with increasing resolution,
Testing outer boundary treatments for the Einstein equations 14
0 200 400 600 800 1000
t / M
New CPBC
TAN, Filter 1
TAN, Filter 2
TAN, Filter 3
TAN, Filter 4
0 200 400 600 800 1000
t / M
New CPBC
TAN, Filter 1
TAN, Filter 2
TAN, Filter 3
TAN, Filter 4
Figure 7. Tan compactification with various filters vs. new CPBCs. Only the
highest resolution (Nr, L) = (51, 14) is shown. The compactification scale (and
the radius of the outer boundary in the CPBC case) is R = 41.9M .
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
Figure 8. Tan compactification with filter 4 (solid) vs. new CPBCs (dotted).
Four different resolutions are shown: (Nr, L) = (21, 8), (31, 10), (41, 12), and
(51, 14). The compactification scale (and the radius of the outer boundary in the
CPBC case) is R = 41.9M .
the waves travel a longer distance before they fail to be resolved. Note however that
the high-frequency filter is applied at each time step, as is done in the simulations
of Pretorius. For higher resolutions, the time steps are smaller because of the CFL
condition and the filter is applied more often, thus leading to a stronger damping of
the waves. This may well lead to the observed loss of convergence with increasing
resolution. The constraints appear to converge away in this test, although from figure
8 it appears that this will not persist for even higher resolutions.
We have also evaluated the Inverse mapping described in Appendix B.1. The
results are similar, but somewhat worse than the Tan mapping results shown here.
Testing outer boundary treatments for the Einstein equations 15
4.2. Sponge layers
A method that has been used for a long time in computational science, in particular
for spectral methods (see e.g. section 17.2.3 of [43] and references therein), involves
so-called sponge layers. A sponge layer is introduced by modifying the evolution
equations according to
∂tu = . . .− γ(r)(u− u0), (34)
where u0 refers to the unperturbed background solution (Schwarzschild spacetime in
our case) and the smooth sponge function γ(r) > 0 is large only close to the outer
boundary of the computational domain. (Here we use uncompactified coordinates as
in sections 2 and 3.) In this way, the waves are damped exponentially as they approach
the outer boundary. Details on our particular choice of γ(r) can be found in Appendix
We compare the sponge layer method with our CPBCs in figure 9. For the
CPBCs, the boundary is either placed at R = 41.9M (the outer edge of the sponge-
free region) or at R = 121.9M (the outer edge of the sponge). At early times (t . 2R),
the ||∆U||∞ of the sponge layer method lies between that of the CPBCs for the two
choices of outer boundary radius, whereas at later times, it is much larger than both
versions of CPBCs. The constraint violations in the sponge runs do not converge
away.
5. Physical gravitational waves
Perhaps the most important predictions of numerical relativity simulations at the
present time are the gravitational waveforms produced by astrophysical systems like
binary black holes. It is important therefore to understand how the accuracy of these
waveforms is affected by the choice of boundary treatment. Physical gravitational
radiation can be described by the Newman-Penrose scalars Ψ4 and Ψ0. The scalar Ψ4
is dominated by the outgoing radiation (its ingoing part is suppressed by a factor of
(kr)4, where k is the wavenumber), whereas Ψ0 is dominated by the ingoing radiation
(its outgoing part is suppressed by a factor of (kr)4). In this section we compare
the gravitational waves extracted from the various boundary treatment solutions on
a sphere of radius r = Rex, using the methods described in Appendix A.5.
We note that Ψ4 (Ψ0) has a coordinate-invariant meaning only in the limit as
future (past) null infinity is approached. The quantities computed at finite radius r will
differ from those observed at infinity by terms of the order O(1/r). In the particular
case of perturbed Schwarzschild spacetime considered here, a gauge-invariant wave
extraction method does exist even at finite radius (see e.g. [44] and references therein)
but we do not adopt it here. Our purpose in this paper is merely to measure the
effects on Ψ4 caused by the various boundary treatments.
5.1. Difference of Ψ4 with respect to the reference solution
We begin by evaluating ∆Ψ4 ≡ Ψ4 − Ψref4 , where Ψ4 is the Newman-Penrose scalar
computed using one of the various boundary methods and Ψref4 is the same quantity
computed from the reference solution at the same extraction radius. The curves shown
in figure 10 plot the maximum value of |∆Ψ4| over time intervals of length 20M (this
time filtering averages over the high frequency quasi-normal oscillations of the black
hole), normalized by the maximum value of |∆Ψ4| over the entire evolution. The
Testing outer boundary treatments for the Einstein equations 16
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
CPBC at R = 41.9 M
0 200 400 600 800 1000
t / M
∞ (Nr,L)=(21,8)
(31,10)
(41,12)
(51,14)
CPBC at R = 41.9 M
0 200 400 600 800 1000
t / M
,L)=(21,8)
(31,10)
(41,12)
(51,14)
CPBC at R = 121.9 M
0 200 400 600 800 1000
t / M
∞ (Nr,L)=(21,8)
(31,10)
(41,12)
(51,14)
CPBC at R = 121.9 M
Figure 9. Sponge layer method (solid) vs. new CPBCs at two different radii
(dotted). Four different resolutions are shown: (Nr, L) = (21, 8), (31, 10), (41, 12),
and (51, 14). The size of the sponge-free region is R = 41.9M and ||∆U||∞ is only
computed for r 6 R.
radius of the outer boundary (or the compactification scale, or the size of the sponge-
free region, respectively) used for these comparisons is R = 41.9M , and the radiation
is extracted nearby at Rex = 40M .
The first peak in |∆Ψ4| seen in figure 10 arises as the wave in our test problem
passes outward through the extraction sphere at t ≈ Rex. This peak is caused by
a presently unknown (probably gauge) interaction between the outer boundary (or
compactified region etc.) and the spacetime near the extraction sphere. We have
verified that this interaction and its influence on the peak in ∆Ψ4 goes away if we
move the outer boundary (or the extraction surface) so that they are not in causal
contact as the outgoing wave pulse passes the extraction surface.
Some of the outgoing radiation is reflected off the boundary. Most of this reflected
radiation is subsequently absorbed by the black hole, but some of it excites the hole,
which then emits quasi-normal mode radiation of exponentially decaying amplitude.
This exponential decay can be clearly seen for most of the boundary treatments.
In the case of freezing boundary conditions, nearly all of the outgoing quasi-
normal mode radiation is reflected from the boundary because the reflection coefficient
is nearly 1 for the wave number of the dominant mode, k = 0.376/M (cf. figure 1).
Testing outer boundary treatments for the Einstein equations 17
It then re-excites the black hole, which again radiates and so forth. On average the
amplitude of the reflections remains roughly constant in time for this case. This
behaviour is consistent with the result shown in figure 3 of [6] for a similar perturbed
black hole simulation.
For the Sommerfeld and Kreiss-Winicour boundary conditions, the reflections are
much smaller but still considerably larger (by 2 to 3 orders of magnitude) than for
our CPBCs. We attribute this difference largely to our use of the physical boundary
condition (12).
The spatial compactification method has the largest difference |∆Ψ4|, particularly
at early times t ∼ R (about 4 orders of magnitude larger than for the CPBCs). We
suspect that this may be a consequence of the use of artificial dissipation, as discussed
in section 4.1.
The sponge layer method has the smallest errors at early times. This is not
surprising because the outer boundary of the sponge layer is much further out at
R = 121.9M . However at later times when the waves begin to interact with the
sponge layer, this method causes reflections comparable in amplitude to those using
Sommerfeld boundary conditions.
We also note that at late times the level of |∆Ψ4| decreases significantly with
resolution for the CPBCs, but not generally for the other boundary treatments.
We think it is remarkable that the maximum relative error in the extracted
physical radiation is quite small (10−5 to 10−3) in these tests, even for the less
sophisticated boundary treatments such as the freezing or Sommerfeld boundary
conditions. This success is due in part to the fact that the extraction radius,
Rex = 40M , for this test problem is about ten wavelengths (of the initial radiation
pulse) away from the central black hole. Our results are likely to be more accurate
than those from typical binary black hole simulations, which place the outer boundary
at two or three wavelengths. This suggests that current binary black hole codes
using, for instance, Sommerfeld boundary conditions, can still produce waveforms
that are useful for some aspects of gravitational wave data analysis provided the
outer boundary is placed sufficiently far out. Data analysis applications needing
high precision waveforms, however, such as source parameter measurement or high-
amplitude supermassive binary black hole signal subtraction for LISA, will need to
use a more sophisticated boundary treatment that produces smaller errors in Ψ4.
5.2. Comparison with the predicted reflection coefficient
Buchman and Sarbach [18, 19] have recently developed a hierarchy of increasingly
absorbing physical boundary conditions for the Einstein equations by analyzing
the equations describing the evolution of the Weyl curvature on both a flat and
a Schwarzschild background spacetime. Their analysis predicts, in particular, the
reflection coefficient ρ (defined as the ratio of the ingoing to the outgoing parts of the
solution) that arises from the ∂tΨ0
= 0 physical boundary condition that we use.
For quadrupolar radiation (as in our numerical tests), this reflection coefficient is
given by equation (89) of [18],
ρ(kR) = 3
(kR)−4 +O(kR)−5, (35)
where k is the wave number of the gravitational radiation and R is the boundary
radius. (As explained at the beginning of section 2.3, we assume the background
spacetime to be flat; effects due to the backscattering would only enter at O(M/R).)
Testing outer boundary treatments for the Einstein equations 18
0 200 400 600 800 1000
t / M
,L)=(31,10)
(51,14)
Freezing
0 200 400 600 800 1000
t / M
,L)=(31,10)
(51,14)
Sommerfeld
0 200 400 600 800 1000
t / M
,L)=(31,10)
(51,14)
Kreiss-Winicour
0 200 400 600 800 1000
t / M
,L)=(31,10)
(51,14)
TAN compactification (filter 4)
0 200 400 600 800 1000
t / M
,L)=(31,10)
(51,14)
Sponge layer
Figure 10. Difference of Ψ4 for the various alternate methods (solid) vs. the
new CPBCs (dotted). Two resolutions are shown: (Nr, L) = (31, 10) and (51, 14).
The radius of the outer boundary (or the compactification scale, or the size of
the sponge-free region, respectively) is R = 41.9M and the waves are extracted
at Rex = 40M .
Testing outer boundary treatments for the Einstein equations 19
0 1 2 3 4 5 6 7
(k R)
R = 21.9 M
0 1 2 3 4 5 6 7
(k R)
R = 121.9 M
Figure 11. Comparison of the time Fourier transform of the measured Ψ0(t)
with 3
(kR)−4Ψ4, which is the predicted value using the reflection coefficient of
[18].
By evaluating Ψ0 and Ψ4 at the extraction radius of our test, we find that the ratio
Ψ0/Ψ4 agrees with their predicted ρ to leading order in 1/(kR). We note that the
tetrad we use for wave extraction (Appendix A.5) does not agree exactly with that
of [18]. However, the tetrads do agree for the unperturbed Schwarzschild solution,
so that the errors introduced into Ψ0 and Ψ4 due to our different choice of tetrad
are second-order small in perturbation theory and hence the comparison with [18] is
consistent.
For a numerical solution using our new CPBCs, we evaluate the Newman-Penrose
scalars Ψ0(t) and Ψ4(t) on extraction spheres located 1.9M inside the outer boundary.
In figure 11 we plot the time Fourier transforms of these quantities. We also plot
(kR)−4Ψ4, which by the above argument should agree with Ψ0 to leading order in
1/(kR). Figure 11 shows that the numerical agreement is reasonably good: roughly
at the expected level of accuracy. The overall dependence of the predicted reflection
coefficient ρ on k and R is captured very well. We surmise that the levelling off of
our numerical Ψ0 for k & 3 is due to numerical roundoff effects. (Note the magnitude
of Ψ0 at those frequencies.) For radii R & 200M , Ψ0 is at the roundoff level for all
frequencies.
6. Discussion
The purpose of this paper is to compare various methods of treating the outer
boundary of the computational domain. We evaluate the performance of several often-
used boundary treatments in numerical relativity by measuring the amount of spurious
reflections and constraint violations they generate. To this end, we consider as a test
problem an outgoing gravitational wave superimposed on a Schwarzschild black hole
spacetime. First we compute this numerical solution on a reference domain, large
enough that the influence of the outer boundary can be neglected. Then we repeat the
evolution on smaller domains using one of the boundary treatments, either imposing
local boundary conditions, compactifying the domain using a radial coordinate map,
or installing a sponge layer. We use a first-order generalized harmonic formulation
of the Einstein equations, although these boundary methods can be applied to other
Testing outer boundary treatments for the Einstein equations 20
formulations as well. We believe our results are fairly independent of the particular
formulation used.
Our main conclusion is that our version of constraint-preserving boundary
conditions performs better than any of the alternate treatments that we tested. Our
boundary conditions include a limitation on the influx of spurious gravitational waves
by freezing the Newman-Penrose scalar Ψ0 at the boundary. We also introduce and
test an improved boundary condition for the gauge degrees of freedom.
For some of the simple boundary conditions, such as freezing or Sommerfeld
conditions, we find constraint violations that do not converge away with increasing
resolution. The continuum limit does not satisfy Einstein’s equations in these cases.
Most of the alternate boundary conditions also generate considerable reflections as
measured by ∆U , the norm of the difference with respect to the reference solution. In
many cases, these reflections do not decrease significantly with increasing resolution.
The difference norm ∆U that we use to measure boundary reflections includes the
entire spacetime metric, not just the physical degrees of freedom. It is important then
to evaluate separately the effects of the various boundary treatments on the physical
degrees of freedom. We use the extracted outgoing radiation as approximated by
the Newman-Penrose scalar Ψ4 for this purpose. Here our conclusions are somewhat
different. Rather surprisingly, most of the boundary methods we consider generate
relatively small errors in Ψ4. This suggests that if gravitational waveforms are only
needed to an accuracy of, say, 1% (which is comparable to the discrepancies between
recent binary black hole simulations [45]) then even the simple Sommerfeld conditions
might be good enough. (For those, we find relative errors ∼ 10−5.) The largest
relative errors in Ψ4 we find (∼ 10−2) occur with our implementation of the spatial
compactification method used by Pretorius [15, 16, 17]. We attribute these largely to
the use of artificial dissipation. Undesirable effects of dissipation might be somewhat
less severe in binary black hole evolutions, which have much larger wavelengths
(λ ∼ 20 − 100M) than ours (λ ∼ 4M). Our tests suggest that the errors in Ψ4 can
be made to decrease significantly with resolution only by using more sophisticated
constraint preserving and physical boundary conditions. The importance of using a
physical boundary condition on Ψ0 is illustrated in particular by the difference between
the performance of our boundary conditions and those of Kreiss and Winicour [14].
Some caveats regarding the interpretation of our results must be stated. First,
the ratio of the dominant wavelength to the radius of the outer boundary is typically
much larger for binary black hole evolutions (where λ/R & 0.5) than for the simple
test problem considered here (where λ/R ∼ 0.1). Boundary treatments generally work
better for smaller λ/R, i.e. when the boundary is well out in the wave zone. Hence the
results presented here are likely to be more accurate than those from typical binary
black hole simulations. Second, we use spectral methods rather than finite-difference
methods, which are more commonly used in numerical relativity at this time. This
complicates the implementation of the kind of numerical dissipation that is crucial for
the spatial compactification method to work. While we have attempted to construct
a filter that mimics the finite-difference dissipation as closely as possible, a direct
comparison is clearly impossible. In finite-difference methods, the error introduced by
the type of numerical dissipation considered here is below the truncation error. Hence
tests similar to ours but performed with a finite-difference method would not be able
to detect the effect of dissipation.
There are several directions in which the present work could be extended. For
large values of the outer boundary radius, we observe a non-convergent power-
Testing outer boundary treatments for the Einstein equations 21
law growth of the error in our test problem when constraint-preserving boundary
conditions are used; the origin of this growth should be investigated further. It would
be interesting to implement and test the hierarchy of physical boundary conditions that
are perfectly absorbing for linearized gravity (including leading-order corrections due
to the curvature and backscatter) found recently by Buchman and Sarbach [18, 19].
Our boundary conditions could also be tested using known exact solutions such as
gauge waves, and comparisons could be made with the results found in [5].
For completeness we also mention a number of additional outer boundary
approaches that were not addressed in this paper, but would also be interesting future
extensions of this research. In [46, 47], boundary conditions for the full nonlinear
Einstein equations on a finite domain are obtained by matching to exact solutions of
the linearized field equations at the boundary. Alternatively, the interior code could
be matched to an ‘outer module’ that solves the linearized field equations numerically
[48, 49, 50, 51]. Other approaches involve matching the interior nonlinear Cauchy code
to an outer characteristic code (see [52] for a review) or using hyperboloidal spacetime
slices that can be compactified towards null infinity (see [53] for a review).
Appendix A. Details on the numerical test problem
Appendix A.1. Initial data
The initial data used for our numerical tests are the same as in [27]. The background
solution is a Schwarzschild black hole in Kerr-Schild coordinates,
ds2 = −dt2 + 2M
(dt+ dr)2 + dr2 + r2dΩ2. (A.1)
Throughout the paper, M refers to the bare black hole mass of the unperturbed
background. We superpose an odd-parity outgoing quadrupolar wave perturbation
constructed using Teukolsky’s method [54]. Its generating function is taken to be a
Gaussian G(r) = A exp[−(r − r0)2/w2] with A = 4× 10−3, r0 = 5M , and w = 1.5M .
The full non-linear initial value equations in the conformal thin sandwich formulation
are then solved to obtain initial data that satisfy the constraints [55]. This yields
initial values for the spatial metric, extrinsic curvature, lapse function, and shift
vector. We note that after the superposition, the resulting solution is still nearly
but not completely outgoing.
Our generalized harmonic formulation of Einstein’s equations requires initial data
for the full spacetime metric and its first time derivative. These can be computed from
the 3+1 quantities obtained above, provided we also choose initial values for the time
derivatives of the lapse function and shift vector. These initial time derivatives are
freely specifiable and are equivalent to the initial choice of the gauge source function
Ha; we choose ∂tN = 0 and ∂tN i = 0 at t = 0.
Appendix A.2. Numerical method
We use a pseudospectral collocation method as described for example in [27].
The computational domain for the test problem considered here is taken to
be a spherical shell extending from r = 1.9M (just inside the horizon) out to
some r = R. This domain is subdivided into spherical-shell subdomains of extent
∆r = 10M . On each subdomain, the numerical solution is expanded in Chebyshev
polynomials in the radial direction and in spherical harmonics in the angular directions
Testing outer boundary treatments for the Einstein equations 22
(where each Cartesian tensor component is expanded in the standard scalar spherical
harmonics). Typical resolutions are Nr ∈ {21, 31, 41, 51} coefficients per subdomain
for the Chebyshev series and l 6 L with L ∈ {8, 10, 12, 14} for the spherical harmonics.
We change the outer boundary radius R by changing the number of subdomains
while keeping the width ∆r of each subdomain fixed; this facilitates direct comparisons
between runs with different values of R. For example, the innermost four subdomains
of the reference solution (which has a total of 96 subdomains and R = 961.9M) are
identical to the four subdomains used to compute the solution with R = 41.9M .
The evolution equations are integrated in time using a fourth-order Runge-Kutta
scheme, with a Courant factor ∆t/∆xmin of at most 2.25, where ∆xmin is the smallest
distance between two neighbouring collocation points. As described in [27], the top
four coefficients in the tensor spherical harmonic expansion of each of our evolved
quantities is set to zero after each time step; this eliminates an instability associated
with the inconsistent mixing of tensor spherical harmonics in our approach.
We use two methods of numerically implementing boundary conditions; the choice
of method depends on the type of boundary conditions. Boundary conditions that can
be expressed as algebraic relations involving the characteristic fields are implemented
using a penalty method (see [56] and references therein; in the context of finite-
difference methods see also [57] and references therein). In particular, we use a penalty
method to implement the Kreiss-Winicour boundary conditions (cf. section 3.3)
and to impose boundary conditions at the internal boundaries between neighbouring
subdomains. Boundary conditions that are expressed in terms of the time derivatives
of the characteristic fields are implemented using the method of Bjørhus [58], where
the time derivatives of the incoming characteristic fields are replaced at the boundary
with the relevant boundary condition. All boundary conditions in this paper besides
those mentioned above are implemented using the Bjørhus method.
Appendix A.3. Gauge source functions
Our generalized harmonic formulation [6] of Einstein’s equations allows for gauge
source functions that depend arbitrarily on the coordinates and the spacetime metric:
Ha = Ha(t, x, ψ). The generalized harmonic evolution equations are equivalent to
Einstein’s equations only if the constraint (4) remains satisfied.
We choose the time derivatives of lapse and shift to be zero at the beginning of
the simulation; this determines the initial value of Ha via the constraint (4). For the
subsequent evolution, we hold this Ha fixed in time.
Appendix A.4. Error quantities
We use two different measurements of the errors in our solutions, which we monitor
during our numerical evolutions. First, given a numerical solution (ψab,Πab,Φiab),
the difference between that solution and the reference solution (ψ(ref)ab ,Π
(ref)
ab ,Φ
(ref)
iab ) is
computed with the following norm at each point in space,
δabδcd(M−2∆ψac∆ψbd + ∆Πac∆Πbd
+gij∆Φiac∆Φjbd)
, (A.2)
where ∆ψab means ψab−ψ(ref)ab , and similarly for ∆Πab and ∆Φiab. Second, we define
a quantity C that measures the violations in all of the constraints of our system,
δab(FaFb + gij(CiaCjb + gklδcdCikacCjlbd)
Testing outer boundary treatments for the Einstein equations 23
+M−2(CaCb + gijδcdCiacCjbd))
, (A.3)
where Fa and Cia are first derivatives of Ca defined in [6]. To compute global error
measures, a spatial norm ||·||, either the L∞ norm or the L2 norm, is applied separately
to ∆U and C.
The question often arises as to the significance of particular values of ||∆U|| and
||C||. For example, is a simulation with ||C|| = 10−2 good to one percent accuracy? To
make it easier to answer such questions, we normalize both ||∆U|| and ||C|| as follows,
and we always plot normalized quantities.
We divide ||∆U|| by a normalization factor ||∆U0||, defined as the difference
between a given solution at t = 0 and the unperturbed Schwarzschild background;
i.e., the quantity ||∆U0|| is computed from (A.2) using the unperturbed Schwarzschild
solution instead of the reference solution. Since ||∆U0|| is evaluated at t = 0, it depends
only on the initial data used in the simulation, and is a measure of the amplitude
of the superposed gravitational wave perturbation. For the initial data used here,
||∆U0||∞ = 6× 10−3 and ||∆U0||2 = 1.4× 10−4. The quantity ||∆U||/||∆U0|| is more
easily interpreted than ||∆U||; for example, ||∆U||/||∆U0|| is unity when the difference
from the reference solution is of the same size as the initial perturbation.
Similarly, the constraint energy norm ||C|| is divided by the norm of the first
derivatives ||∂U|| (at the respective time),
gijδabδcd(M−2∂iψac∂lψbd + ∂iΠac∂jΠbd
+gkl∂iΦkac∂jΦlbd)
. (A.4)
The constraints for our system are linear combinations of the first derivatives of the
fields, hence ||C||/||∂U|| ∼ 1 corresponds to a complete violation of the constraints.
Appendix A.5. Wave extraction
For evaluating gravitational waveforms, we compute the Newman-Penrose scalars
Ψ0 = −Cabcdlamblcmd, Ψ4 = −Cabcdkam̄bkcm̄d, (A.5)
where Cabcd is the Weyl tensor, la and ka are outgoing and ingoing null vectors
normalized according to laka = −1, ma is a complex unit null spatial vector orthogonal
to la and ka, and m̄a is the complex conjugate of ma. For perturbations of flat
spacetime, there is a standard choice for the vectors la, ka, and ma. In general curved
spacetimes, however, no such prescription for the tetrad exists that would produce
coordinate-independent quantities Ψ0 and Ψ4 at finite radius. We choose the null
vectors according to
la = 1√
(ta + na) , ka = 1√
(ta − na) , (A.6)
where ta is the future-pointing unit timelike normal to the t = const. slices and na is
the unit spacelike normal to the extraction sphere. Finally, we choose
sin θ
, (A.7)
where (r, θ, φ) are spherical coordinates on the r = Rex = const. extraction sphere.
Note that our choice of ma is not exactly null nor of unit magnitude at finite extraction
radius. However, the tetrad is orthonormal for the unperturbed Schwarzschild
solution, so that the errors introduced into Ψ0 and Ψ4 because of the lack of tetrad
orthonormality will be second-order small in perturbation theory.
Testing outer boundary treatments for the Einstein equations 24
The quantity Ψ4 corresponds to outgoing radiation in the limit of r → ∞,
t − r = const., i.e. as future null infinity is approached. Similarly Ψ0 corresponds
to ingoing radiation as past null infinity is approached. At finite extraction radius,
Ψ4 and Ψ0 will disagree with the waveforms observed at infinity by terms of the order
O(Rex)−1.
We decompose the quantities Ψ4 and Ψ0 in terms of spin-weighted spherical
harmonics of spin-weight −2 on the extraction surface. Since our perturbation is
an odd-parity quadrupole wave, the imaginary part of the (l = 2, m = 0) spherical
harmonic is by far the dominant contribution to Ψ4, and we only display that mode
in our plots. We normalize the curves in our graphs by the maximum (in time) value
of |Ψ4| at the extraction radius Rex, which for Rex = 40M is max |Ψ4| = 6× 10−4.
Appendix B. Details of the alternate approaches
In this appendix, we provide some more details on the alternate boundary treatments
discussed in section 4: spatial compactification and sponge layers.
Appendix B.1. Spatial compactification
We implement spatial compactification by introducing a radial coordinate
transformation x → r(x) that maps a compact ball on the computational grid with
x ∈ [0, xmax] to the full unbounded physical slice with r ∈ [0,∞]. We consider two
such mappings. The Tan mapping is similar to the one used by Pretorius [15, 16, 17]
and is given by
rTan(x) = R tan
, 0 6 x < 2R. (B.1)
The scale R determines the range in physical radius r across which the map is
essentially linear (see figure B1). When comparing compactification with other
boundary treatments, we compare quantities only in the region r < R. (The scale
R is equal to unity in the work of Pretorius. He uses mesh refinement to obtain the
appropriate resolution close to the origin, while we fix the resolution and choose the
scale R appropriately.) We also tested an Inverse map defined by
rInverse(x) =
x, 0 6 x 6 R ,
2R− x, R < x < 2R,
(B.2)
see figure B1. This map is only C1 at x = R, but we maintain spectral accuracy in
our tests by placing this surface at the boundary between spectral subdomains.
Dissipation is needed to remove the short wavelength components of the
waves as they travel outward on the compactified computational grid and become
unresolved. We apply this dissipation only in the radial direction, but everywhere
in the computational domain. In spectral methods, dissipation can be conveniently
implemented in the form of a spectral filter. This filter is applied by multiplying each
spectral expansion coefficient of index k by a function f(k). (See Appendix A.2 for
details on the pseudospectral method we use.) Higher values of k correspond to shorter
wavelengths in the numerical approximation; let kmax be the highest index used in the
spectral expansion. The first filter function we consider is the closest analogue in the
context of our spectral methods to Kreiss-Oliger [59] dissipation,
fKreiss-Oliger(k) = 1− � sin4
2kmax
, 0 6 � 6 1. (B.3)
Testing outer boundary treatments for the Einstein equations 25
0 1 2
x / R
INVERSE
0 0.5 1
k / k
Filters 1 and 2
Filters 3 and 4
Figure B1. Compactification mappings (left) and filter functions (right). The
dashed line indicates the boundary of the region in where the compactification
mapping is (essentially) linear.
Typical values of the parameter � used by Pretorius are � ∈ [0.2, 0.5]; we use � = 0.25.
This filter was derived via a comparison with finite-difference methods as follows.
In the finite-difference approach, a numerical solution u is represented on a set of
equidistant grid points xj . (It suffices to consider the one-dimensional case here.)
Some form of numerical dissipation is usually required for the finite-difference method
to be stable. The one that is most often used for second-order accurate methods is
fourth-order Kreiss-Oliger dissipation [59]. One possible implementation of this, used
e.g. by Pretorius, amounts to replacing
u→ F [u] ≡
u (B.4)
at each time step, where h is the grid spacing and D4 is the second-order accurate
centred finite difference operator approximating the fourth derivative,
D4ui = h
−4(uj−2 − 4uj−1 + 6uj − 4uj+1 + uj+2). (B.5)
Taking u to be a Fourier mode u(k)j = exp(ikxj), it follows that the mode is damped
by a frequency-dependent factor,
u(k) → F [u(k)] ≡
1− � sin4
2kmax
u(k), (B.6)
where kmax = π/(2h) is the Nyquist frequency. Thus we obtain the filter function
(B.3). Strictly speaking, the above analysis only applies to Fourier expansions and
not to the Chebyshev expansions we use. Nevertheless, we apply the filter in the
form (B.3) to our Chebyshev expansion coefficients. Note that in (B.6), each spectral
coefficient u(k) is filtered separately; this is not true for the analogous calculation for
a Chebyshev expansion.
We also use a different filter function, which we call the Exponential filter, that
is often used in spectral methods (see [60] and references therein),
fExponential(k) = exp
σkmax
. (B.7)
Testing outer boundary treatments for the Einstein equations 26
No. Type Parameters Applied to
1 Kreiss-Oliger � = 0.25 right side
2 Kreiss-Oliger � = 0.25 solution
3 Exponential σ = 0.76, p = 13 right side
4 Exponential σ = 0.76, p = 13 solution
Table B1. Details of the filtering methods
Typical values of the parameters are σ = 0.76 and p = 13. This choice of parameters
gives less dissipation at small values of k than the Kreiss-Oliger filter, and also ensures
that f(kmax) ≈ 10−16 is at the level of the numerical roundoff error.
There are various ways the filters can be applied in a numerical evolution. We
have experimented with two different methods. In the first method, the filter is applied
to the right side of the equations, i.e. the evolution equations ∂tu = S are modified
according to ∂tu = F [S], where F [S] is the filtered right side. In the second method,
the filter is instead applied to the solution itself, i.e. after each substep of the time
integrator (cf. Appendix A.2), the numerical solution u is replaced with its filtered
version F [u]. This second method is closest to how the Kreiss-Oliger filter is applied
by Pretorius.
For our numerical tests, we have used four different combinations of the various
options described above. They are summarized in table B1.
Appendix B.2. Sponge layers
For sponge layers we must specify a sponge profile function γ(r), as defined in (34). We
choose γ(r) to be nonzero only outside some sponge-free region of radius R, and when
comparing sponge layers with other boundary treatments, we compare quantities only
in the sponge-free region r < R.
The sponge profile function γ(r) we use is a Gaussian centred at the outer
boundary, which we choose to place at r = 3R,
γ(r) = γ0 exp
r − 3R
. (B.8)
The amplitude of the Gaussian is taken to be γ0 = 1. The width σ is chosen so that
γ(r) 6 10−16 (the numerical roundoff error) for r 6 R, which requires σ . R/3. In
our numerical example, we take R = 41.9M and σ = 13.3M . Hence σ is considerably
larger than the wavelength λ ≈ 4M of the gravitational wave, which is required in
order to avoid reflections from the sponge layer (cf. section 17.2.3 of [43]). Figure B2
shows a plot of this sponge profile.
Acknowledgments
We thank Luisa Buchman, Jan Hesthaven, Larry Kidder, Harald Pfeiffer, Olivier
Sarbach, and Jeff Winicour for helpful discussions concerning this work. The
numerical simulations presented here were performed using the Spectral Einstein
Code (SpEC) developed at Caltech and Cornell primarily by Larry Kidder, Mark
Scheel and Harald Pfeiffer. This work was supported in part by grants from the
Sherman Fairchild Foundation, and from the Brinson Foundation; by NSF grants
Testing outer boundary treatments for the Einstein equations 27
0 1 2 3
r / R
Figure B2. The sponge profile function γ(r). The dashed line indicates the
boundary of the region where γ is below the numerical roundoff error.
PHY-0099568, PHY-0244906, PHY-0601459, DMS-0553302 and NASA grants NAG5-
12834, NNG05GG52G.
References
[1] Novak J and Bonazzola S 2004 Absorbing boundary conditions for simulation of gravitational
waves with spectral methods in spherical coordinates J. Comput. Phys. 197 86–196
[2] Rinne O 2005 Axisymmetric Numerical Relativity Ph.D. thesis Univ. of Cambridge Preprint
http://www.arxiv.org/abs/gr-qc/0601064
[3] Lau S L 2004 Rapid evaluation of radiation boundarz kernels for time-domain wave propagation
on black holes: implementation and numerical tests Class. Quantum Grav. 21 4147–4192
[4] Babiuc M C, Szilágyi B and Winicour J 2006 Harmonic initial-boundary evolution in general
relativity Phys. Rev. D 73 064017
[5] Babiuc M C, Kreiss H O and Winicour J 2007 Constraint-preserving Sommerfeld conditions for
the harmonic Einstein equations Phys. Rev. D 75 044002
[6] Lindblom L, Scheel M A, Kidder L E, Owen R and Rinne O 2006 A new generalized harmonic
evolution system Class. Quantum Grav. 23 S447–S462
[7] Brügmann B, Tichy W and Jansen N 2004 Numerical simulation of orbiting black holes Phys.
Rev. Lett. 92 211101
[8] Campanelli M, Lousto C O, Marronetti P and Zlochower Y 2006 Accurate evolutions of orbiting
black-hole binaries without excision Phys. Rev. Lett. 96 111101
[9] Baker J G, Centrella J, Choi D I, Koppitz M and van Meter J 2006 Gravitational-wave extraction
from an inspiraling configuration of merging black holes Phys. Rev. Lett. 96 111102
[10] Diener P, Herrmann F, Pollney D, Schnetter E, Seidel E, Takahashi R, Thornburg J and Ventrella
J 2006 Accurate evolution of orbiting binary black holes Phys. Rev. Lett. 96 121101
[11] Herrmann F, Hinder I, Shoemaker D and Laguna P 2007 Unequal-mass binary black hole plunges
and gravitational recoil Class. Quantum Grav. 24 S33–S42
[12] Shibata M and Nakamura T 1995 Evolution of three-dimensional gravitational waves: Harmonic
slicing case Phys. Rev. D 52 5428
[13] Baumgarte T W and Shapiro S L 1998 Numerical integration of Einstein’s field equations Phys.
Rev. D 59 024007
[14] Kreiss H O and Winicour J 2006 Problems which are well-posed in a generalized sense with
applications to the Einstein equations Class. Quantum Grav. 16 S405–S420
[15] Pretorius F 2005 Numerical relativity using a generalized harmonic decomposition Class.
Quantum Grav. 22 425–452
[16] Pretorius F 2005 Evolution of binary black hole spacetimes Phys. Rev. Lett. 95 121101
[17] Pretorius F 2006 Simulation of binary black hole spacetimes with a harmonic evolution scheme
Class. Quantum Grav. 23 S529–S552
[18] Buchman L T and Sarbach O C A 2006 Towards absorbing outer boundaries in general relativity
Class. Quantum Grav. 23 6709–6744
Testing outer boundary treatments for the Einstein equations 28
[19] Buchman L T and Sarbach O C A 2007 Improved outer boundary conditions for Einstein’s field
equations Class. Quantum Grav. 24 S307–S326
[20] Gundlach C, Calabrese G, Hinder I and Mart́ın-Garćıa J M 2005 Constraint damping in the Z4
formulation and harmonic gauge Class. Quantum Grav. 22 3767–3774
[21] Stewart J M 1998 The Cauchy problem and the initial boundary value problem in numerical
relativity Class. Quantum Grav. 15 2865–2889
[22] Friedrich H and Nagy G 1999 The initial boundary value problem for Einstein’s vacuum field
equations Comm. Math. Phys. 201 619–655
[23] Iriondo M S and Reula O A 2002 Free evolution of self-gravitating, spherically symmetric waves
Phys. Rev. D 65 044024
[24] Calabrese G, Lehner L and Tiglio M 2002 Constraint-preserving boundary conditions in
numerical relativity Phys. Rev. D 65 104031
[25] Calabrese G and Sarbach O 2003 Detecting ill-posed boundary conditions in general relativity
J. Math. Phys. 44 3888–3899
[26] Calabrese G, Pullin J, Reula O, Sarbach O and Tiglio M 2003 Well posed constraint-preserving
boundary conditions for the linearized Einstein equations Comm. Math. Phys. 240 377–395
[27] Kidder L E, Lindblom L, Scheel M A, Buchman L T and Pfeiffer H P 2005 Boundary conditions
for the Einstein evolution system Phys. Rev. D 71 064020
[28] Bona C, Ledvinka T, Palenzuela-Luque C and Žáček M 2005 Constraint-preserving boundary
conditions in the Z4 numerical relativity formalism Class. Quantum Grav. 22 2615–2634
[29] Sarbach O and Tiglio M 2005 Boundary conditions for Einstein’s field equations: Analytical and
numerical analysis J. Hyp. Diff. Eq. 2 839–883
[30] Bardeen J M and Buchman L T 2002 Numerical tests of evolution systems, gauge conditions,
and boundary conditions for 1D colliding gravitational plane waves Phys. Rev. D 65 064037
[31] Nagy G and Sarbach O 2006 A minimization problem for the lapse and the initial-boundary
value problem for Einstein’s field equations Class. Quantum Grav. 16 S477–S504
[32] Rinne O 2006 Stable radiation-controlling boundary conditions for the generalized harmonic
Einstein equations Class. Quantum Grav. 23 6275–6300
[33] Scheel M A, Pfeiffer H P, Lindblom L, Kidder L E, Rinne O and Teukolsky S A 2006 Solving
Einstein’s equations with dual coordinate frames Phys. Rev. D 74 104006
[34] Pfeiffer H P, Brown D A, Kidder L E, Lindblom L, Lovelace G and Scheel M A 2007 Reducing
orbital eccentricity in binary black hole simulations Class. Quantum Grav. 24 S59–S81
[35] Chrzanowski P L 1975 Vector potential and metric perturbations of a rotating black hole Phys.
Rev. D 11 2042–2062
[36] Bayliss A and Turkel E 1980 Radiation boundary conditions for wave-like equations Comm. Pure
Appl. Math. 33 707–725
[37] Rauch J 1985 Symmetric positive systems with boundary characteristics of constant multiplicity
Trans. Am. Math. Soc. 291 167–187
[38] Secchi P 1996 The initial boundary value problem for linear symmetric hyperbolic systems with
characteristic boundary of constant multiplicity Diff. Int. Eq. 9 671–700
[39] Secchi P 1996 Well-posedness of characteristic symmetric hyperbolic systems
Arch. Rat. Mech. Anal. 134 155–197
[40] Szilágyi B, Pollney D, Rezzolla L, Thornburg J and Winicour J 2007 An explicit harmonic code
for black-hole evolution using excision Class. Quantum Grav. 24 S275–S293
[41] Garfinkle D and Duncan G 2001 Numerical evolution of Brill waves Phys. Rev. D 63 044011
[42] Choptuik M, Lehner L, Olabarrieta I, Petryk R, Pretorius F and Villegas H 2003 Towards the
final fate of an unstable black string Phys. Rev. D 68 044001
[43] Boyd J P 2001 Chebyshev and Fourier Spectral Methods 2nd ed (Dover publications)
[44] Pazos E, Dorband E N, Nagar A amd Palenzuela C, Schnetter E and Tiglio M 2007 How far
away is far enough for extracting numerical waveforms, and how much do they depend on the
extraction method? Class. Quantum Grav. 24 S341–S368
[45] Baker J G, Campanelli M, Pretorius F and Zlochower Y 2007 Comparisons of binary black hole
merger waveforms Class. Quantum Grav. 24 S25–S31
[46] Abrahams A M and Evans C R 1988 Reading off the gravitational radiation waveforms in
numerical relativity calculations: Matching to linearized gravity Phys. Rev. D 37 318
[47] Abrahams A M and Evans C R 1990 Gauge-invariant treatment of gravitational radiation near
the source: Analysis and numerical simulations Phys. Rev. D 42 2585
[48] Abrahams A M et al. 1998 Gravitational wave extraction and outer boundary conditions by
perturbative matching Phys. Rev. Lett. 80 1812–1815
[49] Rupright M E, Abrahams A M and Rezzolla L 1998 Cauchy-perturbative matching and outer
boundary conditions I: Methods and tests Phys. Rev. D 58 044005
Testing outer boundary treatments for the Einstein equations 29
[50] Rezzolla L, Abrahams A M, Matzner R A, Rupright M E and Shapiro S L 1999 Cauchy-
perturbative matching and outer boundary conditions: Computational studies Phys. Rev.
D 59 064001
[51] Zink B, Pazos E, Diener P and Tiglio M 2006 Cauchy-perturbative matching reexamined: Tests
in spherical symmetry Phys. Rev. D 73 084011
[52] Winicour J 2005 Characteristic evolution and matching Living Rev. Relativity 8(10)
[53] Frauendiener J 2004 Conformal infinity Living Rev. Relativity 7(1)
[54] Teukolsky S A 1982 Linearized quadrupole waves in general relativity and the motion of test
particles Phys. Rev. D 26 745–750
[55] Pfeiffer H P, Kidder L E, Scheel M A and Shoemaker D 2005 Initial data for Einstein’s equations
with superposed gravitational waves Phys. Rev. D 71 024020
[56] Hesthaven J S 2000 Spectral penalty methods Appl. Numer. Math. 33 23–41
[57] Schnetter E, Diener P, Dorband E N and Tiglio M 2006 A multi-block infrastructure for three-
dimensional time-dependent numerical relativity Class. Quantum Grav. 23 S553–S578
[58] Bjørhus M 1995 The ODE formulation of hyperbolic PDEs discretized by the spectral collocation
method SIAM J. Sci. Comput. 16 542–557
[59] Kreiss H O and Oliger J 1973 Methods for the approximate solution of time dependent problems
Global Atmospheric Research Programme (Publication Series No. 10)
[60] Gottlieb D and Hesthaven J S 2001 Spectral methods for hyperbolic problems J. Comput. Appl.
Math. 128 83–131
	Introduction
	Constraint-preserving boundary conditions
	The generalized harmonic evolution system
	Construction of boundary conditions
	Improved gauge boundary condition
	Numerical results
	Alternate boundary conditions
	Freezing the incoming fields
	Sommerfeld boundary conditions
	Kreiss-Winicour boundary conditions
	Alternate approaches
	Spatial compactification
	Sponge layers
	Physical gravitational waves
	Difference of 4 with respect to the reference solution
	Comparison with the predicted reflection coefficient
	Discussion
	Details on the numerical test problem
	Initial data
	Numerical method
	Gauge source functions
	Error quantities
	Wave extraction
	Details of the alternate approaches
	Spatial compactification
	Sponge layers
ABSTRACT
  Various methods of treating outer boundaries in numerical relativity are
compared using a simple test problem: a Schwarzschild black hole with an
outgoing gravitational wave perturbation. Numerical solutions computed using
different boundary treatments are compared to a `reference' numerical solution
obtained by placing the outer boundary at a very large radius. For each
boundary treatment, the full solutions including constraint violations and
extracted gravitational waves are compared to those of the reference solution,
thereby assessing the reflections caused by the artificial boundary. These
tests use a first-order generalized harmonic formulation of the Einstein
equations. Constraint-preserving boundary conditions for this system are
reviewed, and an improved boundary condition on the gauge degrees of freedom is
presented. Alternate boundary conditions evaluated here include freezing the
incoming characteristic fields, Sommerfeld boundary conditions, and the
constraint-preserving boundary conditions of Kreiss and Winicour. Rather
different approaches to boundary treatments, such as sponge layers and spatial
compactification, are also tested. Overall the best treatment found here
combines boundary conditions that preserve the constraints, freeze the
Newman-Penrose scalar Psi_0, and control gauge reflections.

<|endoftext|><|startoftext|>
Introduction
We consider the flow of an incompressible fluid in a polyhedral set Ω ⊂ R2 during the time interval [0, T ].
The velocity field u : Ω× [0, T ] → R2 and the pressure field p : Ω× [0, T ] → R satisfy the Navier-Stokes equations
∆u+ (u · ∇)u+∇p = f , (1.1)
div u = 0 , (1.2)
with the boundary and initial conditions
u|∂Ω = 0 , u|t=0 = u0.
The terms ∆u and (u·∇)u are associated with the physical phenomena of diffusion and convection, respectively.
The Reynolds number Re measures the influence of convection in the flow. For equations (1.1)–(1.2), finite
element and finite difference methods are well known and mathematical studies are available (see [9] for example).
Keywords and phrases: Incompressible fluids, Navier-Stokes equations, projection methods, finite volume.
1 17 rue Barrème - 69006 LYON. e-mail: Sebastien.Zimmermann@ec-lyon.fr
c© EDP Sciences, SMAI 1999
http://arxiv.org/abs/0704.0783v2
2 TITLE WILL BE SET BY THE PUBLISHER
For finite volume schemes, numerous computations have been conducted ( [12] and [1] for example). However,
few mathematical results are available in this case. Let us cite Eymard and Herbin [6] and Eymard,
Latché and Herbin [7]. In order to deal with the incompressibility constraint (1.2), these works use a
penalization method. Another way is to use the projection methods which have been introduced by Chorin [4]
and Temam [13]. This is the case in Faure [8] where the mesh is made of squares. In Zimmermann [14] the
mesh is made of triangles, so that more complex geometries can be considered. In the present paper the mesh
is also made of triangles, but we consider a different discretization for the pressure. It leads to a linear system
with a better-conditioned matrix. The layout of the article is the following. We first introduce in section 2
the discrete setting. We state (section 2.1) some notations and hypotheses on the mesh. We define (section
2.2) the spaces we use to approximate the velocity and pressure. We define also (section 2.3) the operators
we use to approximate the differential operators in (1.1)–(1.2). Combining this with a projection method, we
build the scheme in section 3. In order to provide a mathematical analysis, we show in section 4 that the
differential operators in (1.1)–(1.2) and their discrete counterparts share similar properties. In particular, the
discrete operators for the gradient and the divergence are adjoint. The discrete operator for the convection term
is positive, stable and consistent. The discrete operator for the divergence satisfy an inf-sup (Babuška-Brezzi)
condition. From these properties we deduce in section 5 the stability of the scheme.
We conclude with some notations. The spaces (L2, |.|) and (L∞, ‖.‖∞) are the usual Lebesgue spaces and we
set L20 = {q ∈ L
q(x) dx = 0}. Their vectorial counterparts are (L2, |.|) and (L∞, ‖.‖∞) with L
2 = (L2)2
and L∞ = (L∞). For k ∈ N∗, (Hk, ‖ ·‖k) is the usual Sobolev space. Its vectorial counterpart is (H
k, ‖.‖k) with
Hk = (Hk)2. For k = 1, the functions of H1 with a null trace on the boundary form the space H10. Also, we set
∇u = (∇u1,∇u2)
T if u = (u1, u2) ∈ H
1. If X ⊂ L2 is a Banach space, we define C(0, T ;X) (resp. L2(0, T ;X))
as the set of the applications g : [0, T ] → X such that t → |g(t)| is continuous (resp. square integrable). The
norm ‖.‖C(0,T ;X) is defined by ‖g‖C(0,T ;X) = sups∈[0,T ] |g(s)|. In all calculations, C is a generic positive constant,
depending only on Ω, u0 and f .
2. Discrete setting
First, we introduce the spaces and the operators needed to build the scheme.
2.1. The mesh
Let Th be a triangular mesh of Ω. The circumscribed circle of a triangle K ∈ Th is centered at xK and has
the diameter hK . We set h = maxK∈Th hK . We assume that all the interior angles of the triangles of the mesh
are less than π
, so that xK ∈ K. The set of the edges of the triangle K ∈ Th is EK . The symbol nK,σ denotes
the unit vector normal to an edge σ ∈ EK and pointing outward K. We denote by Eh the set of the edges of the
mesh. We distinguish the subset E inth ⊂ Eh (resp. E
h ) of the edges located inside Ω (resp. on ∂Ω). The middle
of an edge σ ∈ Eh is xσ and its length |σ|. For each edge σ ∈ E
h , let Kσ and Lσ be the two triangles having
σ in common. We set dσ = d(xKσ ,xLσ ). For all σ ∈ E
h , only the triangle Kσ located inside Ω is defined and
we set dσ = d(xKσ ,xσ). Then for all σ ∈ Eh we set τσ =
. As in [5] we assume the following on the mesh:
there exists C > 0 such that
∀σ ∈ Eh , dσ ≥ C |σ| and |σ| ≥ C h.
It implies that there exists C > 0 such that
∀σ ∈ E inth , τσ = |σ|/dσ ≥ C. (2.1)
2.2. The discrete spaces
We first define
P0 = {q ∈ L
2 ; ∀K ∈ Th, q|K is a constant} , P0 = (P0)
TITLE WILL BE SET BY THE PUBLISHER 3
For the sake of concision, we set for all qh ∈ P0 (resp. vh ∈ P0) and all triangle K ∈ Th: qK = qh|K (resp.
vK = vh|K). Although P0 6⊂ H
1, we define the discrete equivalent of a H1 norm as follows. For all vh ∈ P0
we set
‖vh‖h =
σ∈Eint
τσ |vLσ − vKσ |
σ∈Eext
τσ |vKσ |
. (2.2)
We have [5] a Poincaré-like inequality: there exists C > 0 such that for all vh ∈ P0
|vh| ≤ C ‖vh‖h. (2.3)
We also have [14] an inverse inequality: there exists C > 0 such that for all vh ∈ P0
h ‖vh‖h ≤ C |vh|. (2.4)
From the norm ‖.‖h we deduce a dual norm. For all vh ∈ P0 we set
‖vh‖−1,h = sup
(vh,ψh)
‖ψh‖h
. (2.5)
For all uh ∈ P0 and vh ∈ P0 we have (uh,vh) ≤ ‖uh‖−1,h ‖vh‖h. We define the projection operator ΠP0 :
L2 → P0 as follows. For all w ∈ L
2, ΠP0w ∈ P0 is given by
∀K ∈ Th , (ΠP0w)|K =
w(x) dx. (2.6)
We easily check that for all w ∈ L2 and vh ∈ P0 we have (ΠP0w,vh) = (w,vh). We deduce from this that ΠP0
is stable for the L2 norm. We define also the operator Π̃P0 : H
2 → P0. For all w ∈ H
2, Π̃P0w ∈ P0 is given by
∀K ∈ Th , Π̃P0w|K = w(xK).
According to the Sobolev embedding theorem, w ∈ H2 is a.e. equal to a continuous function. Therefore the
definition above makes sense. We introduce also the finite element spaces
P d1 = {v ∈ L
2 ; ∀K ∈ Th, v|K is affine} ,
Pnc1 = {vh ∈ P
1 ; ∀σ ∈ E
h , vh|Kσ (xσ) = vh|Lσ(xσ) ,
Pc1 = {vh ∈ (P
2 ; vh is continuous and vh|∂Ω = 0}.
We have Pc1 ⊂ H
0. We define ΠPc1 : H
0 → P
1. For all v = (v1, v2) ∈ H
0, ΠPc1v = (v
h) ∈ P
1 is given by
∀φh = (φ
h) ∈ P
∇vih,∇φ
∇vi,∇φ
The operator ΠPc
is stable for the H1 norm. One checks ( [2] p. 110) that there exists C > 0 such that for all
v ∈ H1
|v −ΠPc
v| ≤ C h ‖v‖1. (2.7)
Let us address now the space Pnc1 . If qh ∈ P
1 , we have usually ∇qh 6∈ L
2. Thus we define the operator
∇h : P
1 → P0 by setting for all qh ∈ P0 and all triangle K ∈ Th
∇hqh|K =
∇qh dx. (2.8)
4 TITLE WILL BE SET BY THE PUBLISHER
The associated norm is defined by
‖qh‖1,h =
2 + |∇hqh|
We have a Poincaré-like inequality : there exists C > 0 such that for all qh ∈ P
1 ∩ L
|qh| ≤ C |∇hqh|. (2.9)
We define the projection operator ΠPnc
. For all q ∈ H1, ΠPnc
q is given by
∀σ ∈ Eh ,
(ΠPnc
q) dσ =
q dσ.
One checks ( [2] p.110) that there exists C > 0 such that
|p−ΠPnc
p| ≤ C h ‖p‖1 ,
∣∣∣∇̃h(p−ΠPnc
∣∣∣ ≤ C ‖p‖1. (2.10)
Finally, we use the Raviart-Thomas spaces (see [3])
= {vh ∈ P
1 ; ∀σ ∈ EK , vh|K · nK,σ is a constant, and vh · n|∂Ω = 0} ,
RT0 = {vh ∈ RT
; ∀K ∈ Th, ∀σ ∈ EK , vh|Kσ · nKσ ,σ = vh|Lσ · nKσ ,σ}.
For all vh ∈ RT0, K ∈ Th and σ ∈ EK we set (vh ·nK,σ)σ = vh|K ·nK,σ. We define the operator ΠRT0 : H
RT0. For all v ∈ H
1, ΠRT0v ∈ RT0 is given by
∀K ∈ Th , ∀σ ∈ EK , (ΠRT0v · nK,σ)σ =
v dσ. (2.11)
2.3. The discrete operators
The equations (1.1)–(1.2) use the differential operators gradient, divergence and laplacian. Using the spaces
of section 2.2, we define their discrete counterparts. The discrete gradient ∇h : P
1 → P0 is defined by (2.8).
The discrete divergence operator divh : P0 → P
1 is built so that it is adjoint to the operator ∇h. We set for
all vh ∈ P0 and all triangle K ∈ Th
∀σ ∈ E inth , (divh vh)(xσ) =
3 |σ|
|Kσ|+ |Lσ|
(vLσ − vKσ ) · nK,σ ;
∀σ ∈ Eexth , (divh vh)(xσ) = −
3 |σ|
|Kσ|+ |Lσ|
vKσ · nK,σ. (2.12)
The first discrete laplacian ∆h : P
1 → P
1 ensures that the incompressibility constraint (1.2) is satisfied in a
discrete sense (see the proof of proposition 3.1 below). We set for all qh ∈ P
∆hqh = divh(∇hqh).
The second discrete laplacian ∆̃h : P0 → P0 is the usual operator in finite volume schemes [5]. We set for all
vh ∈ P0 and all triangle K ∈ Th
∆̃hvh|K =
σ∈EK∩E
τσ (vLσ − vKσ )−
σ∈EK∩E
τσ vKσ .
TITLE WILL BE SET BY THE PUBLISHER 5
In order to approximate the term (u · ∇)u in (1.1) we define a bilinear form b̃h : RT0 × P0 → P0 using the
well-known upwind scheme [5]. For all uh ∈ P0, vh ∈ P0, and all triangle K ∈ Th we set
b̃h(uh,vh)
σ∈EK∩E
(u · nK,σ)
σ vK + (u · nK,σ)
σ vLσ
. (2.13)
We have set a+ = max(a, 0), a− = min(a, 0) for all a ∈ R. Lastly, we define the trilinear form bh : RT0 ×P0 ×
P0 → R
2 as follows. For all uh ∈ RT0, vh ∈ P0, wh ∈ P0, we set
bh(uh,vh,wh) =
|K|wK · b̃h(uh,vh)
. (2.14)
3. The scheme
In order to deal with the incompressibility constraint (1.2) we use a projection method. This kind of method
has been introduced by Chorin [4] and Temam [13]. The basic idea is the following. The time interval [0, T ]
is split with a time step k: [0, T ] =
n=0[tn, tn+1] with N ∈ N
∗ and tn = n k for all n ∈ {0, . . . , N}. For all
m ∈ {2, . . . , N}, we compute (see equation (3.2) below) a first velocity field ũmh ≃ u(tm) using only equation
(1.1). We use a second-order BDF scheme for the discretization in time. We then project ũmh (see equation (3.4)
below) over a subspace of P0. We get a a pressure field p
h ≃ p(tm) and a second velocity field u
h ≃ u(tm),
which fulfills the incompressibility constraint (1.2) in a discrete sense. The algorithm goes as follows. For all
m ∈ {0, . . . , N}, we set fmh = ΠP0 f(tm). Since the operator ΠP0 is stable for the L
2-norm we get
|fmh | = |ΠP0 f(tm)| ≤ |f(tm)| ≤ ‖f‖C(0,T ;L2). (3.1)
We start with the initial values
u0h ∈ P0 ∩RT0 , u
h ∈ P0 ∩RT0 p
h ∈ P0 ∩ L
For all n ∈ {1, . . . , N}, (ũn+1h , p
h ) is deduced from (ũ
h , p
h) as follows.
• ũn+1h ∈ P0 is given by
3 ũn+1h − 4u
h + u
∆̃hũ
h + b̃h(2u
h − u
h , ũ
h ) +∇hp
h = f
h , (3.2)
• pn+1h ∈ P
1 ∩ L
0 is the solution of
h − p
divh ũ
h , (3.3)
• un+1h ∈ P0 is deduced by
un+1h = ũ
h − p
h). (3.4)
Existence and unicity of a solution to equation (3.2) is classical ( [5] for example). The convection term in (3.2)
is well defined thanks to the following result.
Proposition 3.1. For all m ∈ {0, . . . , N} we have umh ∈ RT0 .
Proof. If m ∈ {0, 1} the result holds by definition. If m ∈ {2, . . . , N} we apply the operator divh to (3.3) and
compare with (3.4). We get divh u
h = 0. Using definition (2.12) we get u
h ∈ RT0.
6 TITLE WILL BE SET BY THE PUBLISHER
Let us show that equation (3.3) also has a unique solution. Let qh ∈ P
1 ∩ L
0 such that ∆hqh = 0. According
to proposition 4.4 we have for all qh ∈ P0
−(∆hqh, qh) = −
divh(∇hqh), qh
= (∇hqh,∇hqh) = |∇hqh|
Therefore we have ∇hqh = 0, so that qh = 0 since qh ∈ L
0. We have thus proved the unicity of a solution for
(3.3). It is also the case for the associated linear system. It implies that this linear system has indeed a solution.
Hence it is also the case for equation (3.3). Note finally that since umh ∈ P0 ∩RT0, we have divu
h = 0 for all
m ∈ {0, . . . , N}. Hence the incompressibility condition (1.2) is fulfilled.
4. Properties of the discrete operators
We show that the differential operators in (1.1)–(1.2) and the operators defined in section 2.3 share similar
properties.
4.1. Properties of the discrete convective term
We define b̃ : H1 ×H1 → L2. For all u ∈ H1 and v = (v1, v2) ∈ H
1 we set b̃(u,v) =
div(v1 u), div(v2 u)
We show that the operator b̃h is a consistent approximation of b̃.
Proposition 4.1. There exists a constant C > 0 such that for all v ∈ H2 and all u ∈ H2 ∩ H10 satisfying
divu = 0
‖ΠP0 b̃(u,v)− b̃h(ΠRT0u, Π̃P0v)‖−1,h ≤ C h ‖u‖2 ‖v‖1.
Proof. We set uh = ΠRT0u and vh = Π̃P0v. Let K ∈ Th. According to the divergence formula and (2.6) we
ΠP0 b̃(u,v)|K =
σ∈EK∩E
v (u · n) dσ.
On the other hand, let us rewrite b̃h(uh,vh). Let σ ∈ EK ∩ E
h . Setting
vK,Lσ =
vK si (uh · nK,σ)σ ≥ 0
vLσ si (uh · nK,σ)σ < 0
one checks that vK (uh · nK,σ)
σ + vLσ (uh · nK,σ)
σ = vK,Lσ (uh · nK,σ)σ. Using (2.11), we deduce from (2.13)
b̃h(uh,vh)|K =
σ∈EK∩E
vK,Lσ (uh · nK,σ) dσ.
ΠP0 b̃(u,v)− b̃h(uh,vh)
σ∈EK∩E
(v − vK,Lσ) (uh · n) dσ.
Let ψh ∈ P0. We have
ΠP0 b̃(u,v) − b̃h(uh,vh),ψh
σ∈EK∩E
(v − vK,Lσ) (uh · n) dσ
σ∈Eint
(ψKσ −ψLσ)
(v − vKσ ,Lσ) (uh · n) dσ.
TITLE WILL BE SET BY THE PUBLISHER 7
Let σ ∈ E inth . We consider the quadrilateral Dσ defined by xKσ , xLσ and the vertex of σ. We set
DK,Lσ =
Dσ ∩K si (uh · nK,σ)σ ≥ 0
Dσ ∩ Lσ si (uh · nK,σ)σ < 0
Using a Taylor expansion and a density argument (see [14]) one checks that
|v − vKσ ,Lσ | dσ ≤ C h
DKσ,Lσ
|∇v (y)|2 dy
ΠP0 b̃(u,v) − b̃h(ΠRT0u, Π̃P0v),ψh
≤ C h ‖u‖H2
σ∈Eint
|ψLσ −ψKσ |
σ∈Eint
DKσ,Lσ
|∇v (y)|2 dy
so that
ΠP0b̃(u,v)− b̃h(ΠRT0u, Π̃P0v),ψh
)∣∣∣ ≤ C h ‖u‖H2 ‖ψh‖1,h ‖v‖1. Using then definition (2.5), we get
the result.
Let v ∈ L∞ ∩H1 and u ∈ H1 with divu ≥ 0 a.e. in Ω. Integrating by parts one checks that
v · b̃(u,v) dx =
divu dx ≥ 0. The operator bh shares a similar property.
Proposition 4.2. Let uh ∈ RT0 such that divuh ≥ 0. For all vh ∈ P0 we have
bh(uh,vh,vh) ≥ 0.
Proof. Remember that for all edges σ ∈ E inth , two triangles Kσ et Lσ share σ as an edge. We denote by Kσ
the one such that uσ · nKσ ,σ ≥ 0. Using the algebraic identity 2 a (a− b) = a
2 − b2 + (a − b)2 we deduce from
(2.14)
2 bh(uh,vh,vh) = 2
σ∈Eint
|σ|vKσ · (vKσ − vLσ) (uh · nKσ,σ)
σ∈Eint
|vKσ|
2 − |vLσ |
2 + |vKσ − vLσ |
(uh · nKσ,σ)
so that 2 bh(uh,vh,vh) ≥
σ∈Eint
|vKσ|
2 − |vLσ |
(uh · nKσ,σ). This sum can be written as a sum over
the triangles of the mesh. We get
2 bh(uh,vh,vh) ≥
|vKσ |
σ∈EK∩E
|σ| (uh · nKσ,σ).
Using finally the divergence formula we get
2 bh(uh,vh,vh) ≥
|K| |vK |
divuh dx ≥ 0.
The following result states that the operator bh is stable for suitable norms.
8 TITLE WILL BE SET BY THE PUBLISHER
Proposition 4.3. There exists a constant C > 0 such that for all vh ∈ P0, wh ∈ P0, uh ∈ P0 satisfying
divuh = 0
|bh(uh,vh,vh)| ≤ C |uh| ‖vh‖h ‖vh‖h.
Proof. For all triangle K ∈ Th and all edge σ ∈ EK ∩ E
h , we have
(uh · nK,σ)
σ vK + (uh · nK,σ)
σ vLσ = (uh · nK,σ)σ vK − |(uh · nK,σ)σ| (vLσ − vK).
Using this splitting, we deduce from (2.14) bh(uh,vh,wh) = S1 + S2 with
vK ·wK
σ∈EK∩E
|σ| (uh · nK,σ)σ ,
S2 = −
σ∈EK∩E
|σ| |(uh · nK,σ)σ| (vLσ − vK).
By writing the sum over the edges as a sum over the triangles we have
S2 = −
σ∈Eint
|σ| |(uh · nK,σ)σ| (vLσ − vK) · (wLσ −wK).
Using the Cauchy-Schwarz inequality we get
|S2| ≤ h ‖uh‖∞
σ∈Eint
|vLσ − vKσ |
1/2 
σ∈Eint
|wLσ −wKσ |
Since uh ∈ RT0 we have [5] the inverse inequality h ‖uh‖∞ ≤ C |uh|. Using (2.1) and (2.2) we get
σ∈Eint
|vLσ − vKσ |
2 ≤ C
σ∈Eint
τσ |vLσ − vKσ |
2 ≤ C ‖vh‖
and in a similar way
σ∈Eint
|wLσ −wKσ |
2 ≤ C ‖wh‖
h. Thus |S2| ≤ C |uh| ‖vh‖h ‖wh‖h. On the other hand,
according to the divergence formula
|K| (vK ·wK)
divuh dx = 0.
By gathering the estimates for S1 and S2 we get the result.
4.2. Properties of the discrete divergence
The operators gradient and divergence are adjoint: if q ∈ H1 , v ∈ H1 with v · n|∂Ω = 0, we get (v,∇q) =
−(q, divv) by integrating by parts. For ∇h and divh we state the following.
Proposition 4.4. For all vh ∈ P0 and qh ∈ P
1 we have: (vh,∇hqh) = −(qh, divh vh).
Proof. According to (2.8)
(vh,∇hqh) =
|K|vK · ∇hqh|K =
|σ|qh(xσ)nK,σ
TITLE WILL BE SET BY THE PUBLISHER 9
By writing this sum as a sum over the edges we get
(vh,∇hqh) = −
σ∈Eint
|σ| qh(xσ) (vLσ − vKσ) · nKσ,σ +
σ∈Eext
|σ| qh(xσ)vKσ · nKσ,σ. (4.1)
On the other hand, using a quadrature formula
−(qh, divh vh) = −
qh(xσ) (divh vh)(xσ).
By writing this sum as a sum over the edges of the mesh we get
−(qh, divh vh) = −
σ∈Eint
( |Kσ|
qh(xσ) (divhvh)(xσ)−
σ∈Eext
qh(xσ) (divh vh)(xσ).
Using definition (2.12) and comparing with (4.1) we get the result.
The divergence operator and the spaces L20, H
0 satisfy the following property, called inf-sup (or Babuška-Brezzi)
condition (see [9] for example). There exists a constant C > 0 such that
(q, divv)
‖v‖1|q|
≥ C. (4.2)
We will now show that the operator divh and the spaces P0 ∩ L
0, P0 satisfy an analogous property. The proof
uses the following lemma.
Lemma 4.1. There exists a constant C > 0 such that
∀ qh ∈ P
1 ∩ L
0 , sup
vh∈P0\{0}
(qh, divh vh)
‖vh‖h
≥ C h ‖qh‖1,h.
Proof. If qh = 0 the result is trivial. Let qh ∈ P
0\{0}. Let vh = ∇hqh ∈ P0\{0}. Using proposition 4.4
we have
−(qh, divhvh) = (vh,∇hqh) = |∇hqh|
2 = |∇hqh| |vh|.
Using (2.3) and (2.4) we get −(qh, divhvh) ≥ C h ‖qh‖1,h ‖vh‖h.
We now state the result.
Proposition 4.5. There exists a constant C > 0 such that for all qh ∈ P
1 ∩ L
vh∈P0\{0}
(qh, divh vh)
‖vh‖h
≥ C |qh|.
Proof. If qh = 0 the result is trivial. Let qh ∈ P
0\{0}. According to (4.2) there exists v ∈ H
0 such that
divv = −qh and ‖v‖1 ≤ C |qh|. (4.3)
We set vh = ΠPc
v. We want to estimate −
qh, divh(ΠP0vh)
. Since ∇hqh ∈ P0 we deduce from proposition
qh, divh(ΠP0vh)
= (ΠP0vh,∇hqh) = (vh,∇hqh).
By splitting the last term we get
qh, divh(ΠP0vh)
= (v,∇hqh)− (v − vh,∇hqh). (4.4)
10 TITLE WILL BE SET BY THE PUBLISHER
We bound the right-hand side of (4.4). Using (2.7) and (4.3) we have
|v − vh| = |v −ΠPc
v| ≤ C h ‖v‖1 ≤ C h |qh|.
Thus, using the Cauchy-Schwarz inequality, we get
|(v − vh,∇hqh)| ≤ C h |qh| |∇hqh| ≤ C h |qh| ‖qh‖1,h.
We estimate the other term as follows. Integrating by parts we get
(v,∇hqh) = −(qh, divv) +
qh (v · nK,σ) dσ.
We have −(qh, divv) = |qh|
2 thanks to (4.3). On the other hand
qh (v · nK,σ) dσ =
σ∈Eint
qh (v · nKσ,σ) dσ
since v|∂Ω = 0. Using [2] p.269 and (4.3) we have
∣∣∣∣∣
qh (v · nK,σ) dσ
∣∣∣∣∣
≤ C h ‖v‖1 ‖qh‖1,h ≤ C h |qh| ‖qh‖1,h.
Hence we get (v,∇hqh) ≥ (|qh| − C h ‖qh‖1,h) |qh|. Thus we deduce from (4.4)
qh, divh(ΠP0vh)
≥ (|qh| − C h ‖qh‖1,h) |qh|. (4.5)
We now introduce the norm ‖.‖h. We have vh = ΠPc
v ∈ Pc1 ⊂ H
1. From [5] p. 776 we deduce ‖ΠP0vh‖h ≤
C ‖vh‖1. Since ΠPc
is stable for the H1 norm, using (4.3), we get
‖vh‖1 = ‖ΠPc
v‖1 ≤ ‖v‖1 ≤ C |qh|.
Therefore ‖ΠP0vh‖h ≤ C |qh|. Using this inequality in (4.5) we obtain that there exists C1 > 0 and C2 > 0
such that
qh, divh(ΠP0vh)
≥ (C1 |qh| − C2 h ‖qh‖1,h) ‖ΠP0vh‖h.
We deduce from this
vh∈P0\{0}
(qh, divh vh)
‖vh‖h
≥ C1 |qh| − C2 h ‖qh‖1,h.
Let us combine this result with lemma 4.1. Since
∀ t ≥ 0 , max
C t , C1 |qh| − C2 t
C + C2
|qh| ,
we finally get the result.
4.3. Properties of the discrete laplacian
We recall from [14] the coercivity of the laplacian operator.
Proposition 4.6. For all uh ∈ P0 and vh ∈ P0 we have
−(∆̃huh,uh) = ‖uh‖
h , −(∆̃huh,vh) ≤ ‖uh‖h ‖vh‖h.
TITLE WILL BE SET BY THE PUBLISHER 11
5. Stability of the scheme
We first prove an estimate for the computed velocity (theorem 5.1). We show a similar result for the
increments in time (lemma 5.2). Using the inf-sup condition (proposition 4.5), we infer from it some estimates
on the pressure (theorem 5.2).
Lemma 5.1. For all m ∈ {0, . . . , N} et n ∈ {0, . . . , N} we have
(umh ,∇hp
h) = 0 , |u
2 − |ũmh |
2 + |umh − ũ
2 = 0.
Proof. First, using propositions 3.1 and 4.4, we get (umh ,∇hp
h) = −(p
h, divhu
h ) = 0. Also, we deduce from
(3.4)
2 (umh ,u
h − ũ
h ) = −
umh ,∇h(p
h − p
Using the algebraic identity 2 a (a− b) = a2 − b2 + (a− b)2 we get
2 (umh ,u
h − ũ
h ) = |u
2 − |ũmh |
2 + |umh − ũ
2 = 0.
We introduce the following hypothesis on the initial data.
(H1) There exists C > 0 such that |u0h|+ |u
h|+ k|∇hp
h| ≤ C.
Hypothesis (H1) is fulfilled if we set u0h = ΠRT0u0 and we use a semi-implicit Euler scheme to compute u
We have the following stability result.
Theorem 5.1. We assume that the initial values of the scheme fulfill (H1). For all m ∈ {2, . . . , N} we have
|umh |
2 + k
‖ũnh‖
h ≤ C. (5.1)
Proof. Let m ∈ {2, . . . , N} and n ∈ {1, . . . ,m− 1}. Taking the scalar product of (3.2) with 4 k ũn+1h we get
3 ũn+1h − 4u
h + u
, 4 k ũn+1h
(∆̃hũ
h , ũ
+4 k bh(2u
h − u
h , ũ
h , ũ
h ) + 4 k (∇hp
h, ũ
h ) = 4 k (f
h , ũ
h ). (5.2)
First of all, using lemma 5.1 and proceeding as in [10], we get
ũn+1h ,
3 ũn+1h − 4u
h + u
= |un+1h |
2 − |unh|
2 + |2un+1h − u
2 − |2unh − u
+ |un+1h − 2u
h + u
2 + 6 |ũn+1h − u
According to proposition 4.6 we have − 4 k
(∆̃hũ
h , ũ
h ) =
‖ũn+1h ‖
h. Also, according to lemma 5.1 and
(3.4)
4 k (∇hp
h, ũ
h ) = 4 k (∇hp
h, ũ
h − u
(|∇pn+1h |
2 − |∇pnh|
2 − |∇pn+1h −∇p
12 TITLE WILL BE SET BY THE PUBLISHER
Multiplying equation (3.4) by 4 k∇h(p
h − p
h) and using the Young inequality we get
|∇(pn+1h − p
2 ≤ 3 |un+1h − ũ
According to proposition 4.2, we have 4 k bh(2u
h − u
h , ũ
h , ũ
h ) ≥ 0. At last using the Cauchy-Schwarz
inequality, (2.3) and (3.1) we have
4 k (fn+1h , ũ
h ) ≤ 4 k |f
h | |ũ
h | ≤ C k ‖f‖C(0,T ;L2) ‖ũ
h ‖h.
Using the Young inequality we get
4 k (fn+1h , ũ
h ) ≤ 3 k ‖ũ
h + C k ‖f‖
C(0,T ;L2).
Thus we deduce from (5.2)
|un+1h |
2 − |unh |
2 + |2un+1h − u
2 − |2unh − u
2 + |un+1h − 2u
h + u
+3 |ũn+1h − u
2 + k ‖ũn+1h ‖
(|∇hp
2 − |∇hp
2) ≤ C k.
Summing from n = 1 to m− 1 we have
|umh |
2 + |2umh − u
2 + 3
|ũn+1h − u
2 + k
‖ũn+1h ‖
≤ C + 4 |u1h|
2 + |2u1h − u
2 + k2 |∇hp
Using hypothesis (H1) we get (5.1).
We now want to estimate the computed pressure. From now on, we make the following hypothesis on the data
f ∈ C(0, T ;L2) , ft ∈ L
2(0, T ;L2) , u0 ∈ H
2 ∩H10 , divu0 = 0.
One shows that if the data u0 and f fulfill a compatibility condition [11] there exists a solution (u, p) to the
equations (1.1)–(1.2) such that
u ∈ C(0, T ;H2) , ut ∈ C(0, T ;L
2) , ∇p ∈ C(0, T ;L2).
We introduce the following hypothesis on the initial values of the scheme: there exists a constant C > 0 such
(H2) |u0h − u0|+
‖u1h − u(t1)‖∞ + |p
h − p(t1)| ≤ C h , |u
h − u
h| ≤ C k.
One checks easily that this hypothesis implies (H1). We have the following result.
Lemma 5.2. We assume that the initial values of the scheme fulfill (H2). Then there exists a constant C > 0
such that for all m ∈ {1, . . . , N}
|umh − u
h | ≤ C.
Proof. Using proposition 4.1 one proceeds as in [14]. The difference lies in the way we bound the term ∇hp
We use the splitting
p1h = (p
h −ΠPnc1 p(t1)) + (ΠP
p(t1)− p(t1)) + p(t1).
TITLE WILL BE SET BY THE PUBLISHER 13
Using an inverse inequality [2] we have
p1h −ΠPnc1 p(t1)
)∣∣ ≤
∣∣p1h −ΠPnc1 p(t1)
(∣∣p1h − p(t1)
∣∣p(t1)−ΠPnc
p(t1)
∣∣) .
Using (2.10) and hypothesis (H2) we get
p1h −ΠPnc1 p(t1)
)∣∣ ≤ C ‖p(t1)‖1 ≤ C ‖p‖C(0,T ;H1).
According to (2.10) we also have
∣∣∇h(p(t1)−ΠPnc
p(t1))
∣∣ ≤ C ‖p(t1)‖1 ≤ C ‖p‖C(0,T ;H1). Lastly |∇p(t1)| ≤
‖p‖C(0,T ;H1). Thus we get |∇hp
h| ≤ C.
Theorem 5.2. We assume that the initial values of the scheme fulfull (H2). There exists a constant C > 0
such that for all m ∈ {2, . . . , N}
|pnh|
2 ≤ C.
Proof. Let m ∈ {2, . . . , N}. We set n = m− 1. Using the inf-sup condition (4.5) and proposition 4.4, we get
that there exists vh ∈ P0\{0} such that
C ‖vh‖h |p
h | ≤ −(p
h , divh vh) = (∇hp
h ,vh). (5.3)
Plugging (3.4) into (3.2) we have
h = −
3un+1h − 4u
h + u
∆̃hũ
h − b̃h(2u
h − u
h , ũ
h ) + f
so that
h ,vh) = −
3un+1h − 4u
h + u
∆̃hũ
h ,vh
− bh(2u
h − u
h , ũ
h ,vh) + (f
h ,vh).
Thanks to proposition 4.3 and theorem 5.1 we have
∣∣bh(2unh − u
h , ũ
h ,vh)
2 |unh|+ |u
‖ũn+1h ‖h ‖vh‖h ≤ C ‖ũ
h ‖h ‖vh‖h.
According to proposition 4.6 we have
∆̃hũ
h ,vh
≤ ‖ũn+1h ‖h ‖vh‖h. Using the Cauchy-Schwarz inequality,
(2.3) and (3.1) we have
(fn+1h ,vh) ≤ |f
h | |vh| ≤ C |vh| ≤ C ‖vh‖h
and in a similar way
3un+1h − 4u
h + u
)∣∣∣∣ ≤ C
3un+1h − 4u
h + u
∣∣∣∣ ‖vh‖h.
Thus we get
h ,vh) ≤ C + C
|3un+1h − 4u
h + u
+ ‖ũn+1h ‖h
‖vh‖h.
By comparing with (5.3) we get
|pn+1h | ≤ C + C
|3un+1h − 4u
h + u
+ ‖ũn+1h ‖h
14 TITLE WILL BE SET BY THE PUBLISHER
Squaring and summing from n = 1 to m− 1 we obtain
|pnh|
2 ≤ C + C k
|3un+1h − 4u
h + u
+ C k
‖ũn+1h ‖
The last term on the right-hand side is bounded, thanks to theorem 5.1. And since
3un+1h − 4u
h + u
h = 3(u
h − u
h)− (u
h − u
h ) = 3 δu
h − δu
we deduce from lemma 5.2
|3un+1h − 4u
h + u
≤ C k
|δunh |
References
[1] S. Boivin, F. Cayre, J. M Herard, A finite volume method to solve the Navier-Stokes equations for incompressible flows on
unstructured meshes, Int. J. Therm. Sci. 39 (2000) 806–825.
[2] S. C. Brenner and L. R. Scott, The mathematical theory of finite element methods, Springer, 2002.
[3] F. Brezzi and M. Fortin, Mixed and hybrid finite element methods, Springer-Verlag, 1991.
[4] J. Chorin, On the convergence of discrete approximations to the Navier-Stokes equations, Math. Comp. 23 (1969) 341–353.
[5] R. Eymard, T. Gallouët and R. Herbin, Finite volume methods, P.G. Ciarlet and J.L. Lions eds, North-Holland, 2000.
[6] R. Eymard and R. Herbin, A staggered finite volume scheme on general meshes for the Navier-Stokes equations in two space
dimensions, Int.J. Finite Volumes (2005).
[7] R. Eymard, J. C. Latché and R. Herbin, Convergence analysis of a colocated finite volume scheme for the incompressible
Navier-Stokes equations on general 2 or 3D meshes, SIAM J. Numer. Anal. 45(1) (2007) 1–36.
[8] S. Faure, Stability of a colocated finite volume scheme for the Navier-Stokes equations, Num. Meth. PDE 21(2) (2005) 242–271.
[9] V. Girault and P. A. Raviart, Finite Element Methods for Navier-Stokes equations: Theory and Algorithms, Springer, 1986.
[10] J. L. Guermond, Some implementations of projection methods for Navier-Stokes equations, M2AN 30(5) (1996) 637–667.
[11] J. G. Heywood and R. Rannacher, Finite element approximation of the nonstationary Navier-Stokes problem. I. Regularity of
solutions and second-order error estimates for spatial discretization, SIAM J. Numer. Anal. 19(26) (1982) 275–311.
[12] D. Kim and H. Choi, A second-order time-accurate finite volume method for unsteady incompressible flow on hybrid unstruc-
tured grids, J. Comp. Phys. 162 (2000) 411–428.
[13] R. Temam, Sur l’approximation de la solution des équations de Navier-Stokes par la méthode de pas fractionnaires II, Arch.
Rat. Mech. Anal. 33 (1969) 377–385.
[14] S. Zimmermann, Stability of a colocated finite volume for the incompressible Navier-Stokes equations, arXiv :0704.0772 (2006).
[15] S. Zimmermann, Étude et implémentation de méthodes de volumes finis pour les fluides incompressibles, PhD, Blaise Pascal
University, France (2006).
ABSTRACT
  We introduce a finite volume scheme for the two-dimensional incompressible
Navier-Stokes equations. We use a triangular mesh. The unknowns for the
velocity and pressure are respectively piecewise constant and affine. We use a
projection method to deal with the incompressibility constraint. We show that
the differential operators in the Navier-Stokes equations and their discrete
counterparts share similar properties. In particular we state an inf-sup
(Babuska-Brezzi) condition. Using these properties we infer the stability of
the scheme.

<|endoftext|><|startoftext|>
Introduction 1
2 The Gauge Theory 4
3 D(-1) Instantons 6
3.1 D3-D(-1) in flat space . . . . . . . . . . . . . . . . . . . . . . . 6
3.2 D(-1)-D3 at the C3/Z3-orientifold . . . . . . . . . . . . . . . 7
4 ADS-like superpotential 9
4.1 D3-D(-1) one-loop vacuum amplitudes . . . . . . . . . . . . . 9
4.2 Sp(6)× U(2) superpotential . . . . . . . . . . . . . . . . . . . 14
4.3 U(4) superpotential . . . . . . . . . . . . . . . . . . . . . . . 15
5 ED3-instantons 17
5.1 D3-ED3 one-loop vacuum amplitudes . . . . . . . . . . . . . 19
5.2 The superpotential . . . . . . . . . . . . . . . . . . . . . . . . 21
6 ADS superpotentials: a general analysis 22
7 Conclusions 24
1 Introduction
Our understanding of non perturbative effects in four dimensional supersym-
metric gauge theories (SYM) has dramatically improved in recent years. This
is due mainly to the observation that integrals over the moduli space of gauge
connections localize around a finite number of points[1]. These techniques
have been applied to the study of multi-instanton corrections to N = 1, 2, 4
supersymmetric gauge theories in R4 [2, 3, 4, 5, 6, 7, 8, 9, 10] (see [11, 12] for
reviews of multi-instanton techniques before localization and complete lists
of references). In the D-brane language language, the dynamics of the gauge
theory around the instanton background is described by an effective theory
governing the interactions of the lowest energy excitations of open strings
ending on a bound state of Dp-D(p+4) branes. For the case of N = 2, 4
SYM the multi-instanton action has been derived via string techniques in
[13, 14].
In [15], D-brane techniques have been applied to the computation of the
Affleck, Dine and Seiberg (ADS) superpotential [16, 17] for N = 1 SQCD
with gauge group SU(Nc) and Nf = Nc − 1 massless flavours and Sp(2Nc)
with 2Nf = 2Nc flavours. The N = 1 gauge theory is realized on the four-
dimensional intersection of Nc coloured and Nf flavour D6 branes. Chiral
matter comes from strings connecting the flavor and color D6 branes. Instan-
tons in the U(Nc) gauge theory are realized in terms of ED2 branes parallel to
the stack of Nc D6-branes. By careful integrating the supermoduli (massless
strings with at least one end on the ED2) the precise form of the ADS su-
perpotential was reproduced in the low energy, field theory limit α′ → 0. In
the recent literature ED2-brane instantons in intersecting D6-brane models
have received particular attention in connection with the possibility of gen-
erating a Majorana mass for right handed neutrinos and their superpartners
[18, 19, 20, 21, 22]. The field theory interpretation of this new instanton effect
is far from clear and it is the subject of active investigation. In this paper
we present a detailed derivation of these new non perturbative superpoten-
tials in N = 1 Z3-orientifold models. Investigations of stringy instantons on
N = 1 Z2 × Z2 orientifold singularities appeared recently in [23].
We study SYM gauge theories living on D3 branes located at a Z3-
orientifold singularity. There are two choices for the orientifold projection
[24, 25, 26, 27, 28] realized by two types of O3-planes1. They lead to anomaly
free2 chiral N = 1 gauge theories with gauge groups SO(N − 4)× U(N) or
Sp(N +4)×U(N) and three generations of chiral matter in the bifundamen-
tal and anti/symmetric representation of U(N). The archetype of this class
can be realized as a stack of 3N + 4 D3-branes and one O3−-plane sitting
on top of an R6/Z3 singularity. This system can be thought of as a T-dual
local description (near the origin) of the T 6/Z3 type I string vacuum found
in [40]. The lowest choices of N lead to U(4) or U(5) gauge theories with
three generations of chiral matter in the 6 and 10 + 5∗ that are clearly of
phenomenological interest in unification scenarios [45, 46, 47, 48] 3.
In [49] the U(4) case was studied and the form of the ADS-like super-
1We will only consider O3±-planes, not the more exotic Õ3
-planes [29, 30, 31].
2Factorizable U(1) anomalies are cancelled by a generalization of the Green-Schwarz
mechanism [32, 33, 34, 35, 36, 37, 38, 37] that may require the introduction of generalized
Chern-Simons couplings [39].
3Only the U(4) case can be realized in the compact Z3 orientifold. In general the Chan-
Paton group is SO(8 − 2n) × U(12 − 2n) ×Hn where Hn = U(n)
3, SO(2n), U(n), U(1)n
depending on the choice of Wilson lines [28, 41, 42, 43, 44].
potential was determined combining holomorphicity, U(1) anomaly, dimen-
sional analysis and flavour symmetry. Stringy instanton effects were also
considered. Very much as for worldsheet instantons in heterotic strings
[50, 51, 52, 53, 54], these genuinely stringy instantons give rise to super-
potentials that do not vanish at large VEV’s of the open string (charged)
‘moduli’.
Here we derive the non-perturbative superpotentials from a direct in-
tegration over the D-instanton super-moduli space. Gauge instantons are
described in terms of open strings ending on D(-1) branes while stringy in-
stantons are given by open strings ending on euclidean ED3 branes wrapping
a four cycle inside the Calabi Yau . The open strings connecting the stack
of D3 branes to D(-1) and ED3 branes have four and eight mixed Neumann-
Dirichelet directions respectively. This ensures that the bound state is su-
persymmetric. The superpotential receives contribution from disk, one-loop
annulus and Möbius amplitudes ending on the D(-1) or ED3 branes. We find
that ADS superpotentials are generated only for two gauge theory choices
U(4) and Sp(6)×U(2) inside the Z3-orientifold class. Stringy instantons leads
to Majorana masses in the U(4) case, Yukawa couplings in the U(6)×SO(2)
gauge theory and non-renormalizable couplings for U(2N + 4) × SO(2N)
gauge theories with N > 3.
The plan of the paper is as follows.
In section 2 we review the gauge theories coming from a stack of D3
branes at a C3/Z3 orientifold singularity. In Section 3 we consider non-
perturbative effects generated by D(-1) gauge instantons, corresponding to
ADS-like superpotential in the low energy limit. A detailed analysis of one-
loop vacuum amplitudes and the integrals over the supermoduli is presented
for SYM theories with gauge groups Sp(6) × U(2) and U(4). In section 5,
we consider stringy instanton effects generated by ED3-branes. Once again
a detailed analysis of the the one-loop string amplitudes and the integrals
over the supermoduli is presented. In section 6 we present a “complete” list
of N = 1 SYM theories with matter in the adjoint, fundamental, symmetric
and antisymmetric representation of the gauge groups (U, SO, Sp) which
exhibit a non perturbatively generated ADS superpotential.
We conclude with some comments and directions for future investigation
in Section 7.
2 The Gauge Theory
The low energy dynamics of the open strings living on a stack of N D3-branes
in flat space is described by a N = 4 U(N) SYM gauge theory. In the N = 1
language the fields are grouped into a vector multiplet V = (Aµ, λα, λ̄α̇) and
3 chiral multiplets ΦI = (φI , ψIα), I = 1, 2, 3 , all in the adjoint of the gauge
group.
We consider the D3-brane system at a R6/Z3 singularity. At the singular-
ity the N D3-branes group into stacks of Nn fractional branes with n = 0, 1, 2
labelling the conjugacy classes of Z3. The gauge group U(N) decomposes
n U(Nn). More precisely, denoting by γθ,N the projective embedding of
the orbifold group element θ ∈ Z3 in the Chan-Paton group and imposing
= 1 and γ†
= γ−1
one can write
N0×N̄0
, ωh 1
N1×N̄1
, ω̄h 1
N2×N̄2
) (2.1)
with N =
nNn. The resulting gauge theory can be found by projecting
the N = 4 U(N) gauge theory under the Z3 orbifold group action:
V → γ
V γ−1
ΦI → ω γ
ΦI γ−1
ω = e2πi/3 (2.2)
Keeping only invariant components under (2.2) one finds the N = 1 quiver
gauge theory
V : N0N̄0 +N1N̄1 +N2N̄2
ΦI : 3×
N0N̄1 +N1N̄2 +N2N̄0
(2.3)
with gauge group
n U(Nn) and three generations of bifundamentals. More
precisely V and ΦI are N ×N block matrices (N =
Nn) with non trivial
Nn × N̄m blocks given by (2.3). Under Z3 a block Nn × N̄m transform as
ωn−m. These non-trivial transformation properties are compensated by the
space-time eigenvalues of the corresponding field (ω0 for V and ω for ΦI )
making the corresponding component invariant under Z3.
Next we consider the effect of introducing an O3±-plane. Woldsheet
parity Ω flips open string orientations and act on Chan-Paton indices as
Nn ↔ N̄−n where subscripts are always understood mod 3. This prescription
leads to
Ω : N0 ↔ N̄0 N1 ↔ N̄2 (2.4)
The choices of O3±-planes correspond to keep states with eigenvalues Ω = ±1
and lead to symplectic or orthogonal gauge groups4.
We start by considering the O3− case. Keeping Ω = − components from
(2.3) one finds
V : 1
N0(N0 − 1) +N1N̄1
ΦI : 3×
N0N̄1 +
N1(N1 − 1)
(2.5)
This follows from (2.3) after identifying the mirror images N̄0 = N0, N̄2 = N1,
and antisymmetrizing the resulting block matrix. (2.5) describes the field
content of a N = 1 SYM with gauge group SO(N0)×U(N1) and three chiral
multiplets in the
( , ¯) + (•, )
For general N0, N1 the U(N1) gauge theory is anomalous. The anomaly
is a signal of the presence of a twisted RR tadpole [34, 35]. Focusing on
a local description near the orientifold singularity one can relax the global
tadpole cancellation condition [55, 56]. These models can be thought as
local descriptions of a more complicated Calabi Yau near a Z3 sigularity.
Cancellation of the twisted RR tadpole can be written as [40]
= −4 ⇒ N0 = N1 − 4 (2.6)
and ensures the cancellation of the irreducible four-dimensional anomaly
I(F ) ∼ [−N0 + (N1 − 4)] trF
3 = 0 (2.7)
Finally the running of the gauge coupling constants is governed by the β
functions with one-loop coefficients5
β0 = 3 ℓ(
N0(N0 − 1))− 3N1 ℓ(N0)
(N0 −N1 − 2) = −9 (IR free)
β1 = 3 ℓ(N1N̄1)− 3N0 ℓ(N̄1)− 3 ℓ(
N1(N1 − 1))
(−N0 +N1 + 2) = +9 (UV free) (2.8)
with βn refering to the n
th-gauge group. The last equalities arise after im-
posing the anomaly cancellation (2.6). As expected, β0 + β1 = 0 since the
ten-dimensional dilaton does not run.
4In the compact case, realized in terms of D9-branes and O9-plane on T 6/Z3, the
orthogonal choice is dictated by global tadpole cancellation. Turning on a quantized NS-
NS antisymmetric tensor [28, 41, 29] leads to symplectic groups.
5Here trRT
aT b = ℓ(R), i.e. ℓ(N) = 1
, ℓ(NN̄) = N and ℓ(1
N(N± 1)) = 1
(N ± 2).
The case Ω = + works in a similar way. The resulting N = 1
quiver has gauge group Sp(N0) × U(N1) and three chiral multiplets in the
[( , ¯) + (•, )]. The U(N1) is anomaly free for N0 = N1 + 4 and the one-
loop β function coefficients are given by β0 = +9 (UV free) and β1 = −9 (IR
free).
3 D(-1) Instantons
There are two sources of supersymmetric instanton corrections in the D3
brane gauge theory: D(-1)-instantons and Euclidean ED3-branes wrapping
four cycles on T 6/Z3. Both are point-like configurations in the space-time
and can be thought of as D(-1)-D3 and ED3-D3 bound states with four and
eight directions with mixed Neumann-Dirichlet boundary conditions.
3.1 D3-D(-1) in flat space
Gauge instantons in SYM can be efficiently described in terms of D(-1)-
branes living on the world-volume of D3-branes [57]. As before, we start
from the N = 4 case: a bound state of N D3 and k D(-1) branes in flat
space. In this formalism instanton moduli are described by the lowest energy
modes of open strings with at least one end on the D(-1)-brane stack. The
gauge theory dynamics around the instanton background can be described
in terms of the U(k) × U(N) 0-dimensional matrix theory living on the D-
instanton world-volume. In particular, the ADHM constraints [58] defining
the moduli space of self-dual YM connections follow from the F- and D-
flatness condition in the matrix theory [57].
The instanton moduli space is given by the D(-1)D3 field content
(aµ, θ
α , χa, D
c, θ̄Aα̇) kk̄
(wα̇, ν
A) kN̄
(w̄α̇, ν̄
A) Nk̄ (3.1)
with µ = 1, . . . , 4, α, α̇ = 1, 2 (vector/spinor indices of SO(4)), a = 1, . . . , 6,
A = 1, . . . , 4 (vector/spinor indices of SO(6)R), c = 1, . . . , 3. The matrices
aµ, χa describe the positions of the instanton in the directions parallel and
perpendicular to the D3-brane respectively, wα is given by the NS open D3-
D(-1) string (instanton sizes and orientations), Dc are auxiliary fields and
θAα , θ̄Aα̇, ν
A are the fermionic superpartners.
The D3-D(-1) action can be written as [59]
Sk,N = trk
SG + SK + SD
(3.2)
SG = −[χa, χb]
2 + iθ̄α̇A[χ
AB, θ̄
cDc (3.3)
SK = −[χa, aµ]
2 + χaw̄
α̇wα̇χa − iθ
αA[χABθ
α ] + 2iχAB ν̄
SD = i
−[aαα̇, θ
αA] + ν̄Awα̇ + w̄α̇ν
θ̄α̇A +D
w̄σcw − iη̄cµν [a
µ, aν ]
with χAB ≡
T aABχa, T
AB = (η
AB, iη̄
AB) given in terms of the t’Hooft symbols
and g20 = 4π(4π
2α′)−2 gs. The action (3.3) follows from the dimensional
reduction of the D5-D9 action in six dimensions down to zero dimension.
As a consequence our subsequent results hold up to some computable non
vanishing numerical constant.
In the presence of a v.e.v. for the six U(N) scalars ϕa in the D3-D3 open
string sector we must add to Sk,N
Sϕ = trk
w̄α̇(ϕaϕa + 2χaϕa)wα̇ + 2iν̄
AϕABν
(3.4)
The multi-instanton partition function is
Zk,N =
e−Sk,N−Sϕ =
VolU(k)
dχ dD da dθ dθ̄dw dν e−Sk,N−Sϕ
In the limit g0 ∼ (α
′)−1 → ∞, gravity decouples from the gauge theory and
the contributions coming from SG are suppressed. The fields θ̄α̇A, D
c become
Lagrange multipliers implementing the super ADHM constraints
θ̄α̇A : ν̄
Awα̇ + w̄α̇ν
A − [aαα̇, θ
αA] = 0
Dc : w̄σcw − iη̄cµν [a
µ, aν ] = 0 (3.5)
3.2 D(-1)-D3 at the C3/Z3-orientifold
Let us now consider in turn the Ω and then the Z3 projection.
The effect of introducing an O3±-plane in the D(-1)-D3 system corre-
sponds to keep open string states with eigenvalue ΩI = ±, Ω being the
worldsheet parity and I a reflection along the Neumann-Dirichlet directions
of the Dp-O3 system [8]. On D(-1) string modes, I acts as a reflection in the
spacetime plane
I : aµ → −aµ θ
α → −θ
α (3.6)
leaving all other moduli invariant. In addition consistency with the D3-O3
projection requires that the D(-1) strings are projected in the opposite way
with respect to the D3-branes[60] . From the gauge theory point of view this
corresponds to the well known fact that SO(N) and Sp(N) gauge instantons
have ADHM constraints invariant under Sp(k) and SO(k) respectively.
We start by considering the O3− case. After the ΩI projection the sur-
viving fields are
(aµ, θ
k(k− 1)
(Dc, χI , χ̄I , θ̄Aα̇)
k(k+ 1)
(wα̇, ν) kN . (3.7)
Since we are dealing with a SO(N) gauge theory the Dc moduli are projected
in the adjoint of Sp(k). This is also the case for all the other moduli even
under I while the odd ones, (aµ, θ
α ), turn out to be antisymmetric.
Let us now consider the Z3 projection. Out of the six χa one can form
three complex fields χI with eigenvalues ω under Z3 and their conjugate
χ̄I . To embed the Z3 projection into SU(4) we decompose the spinor index
A = (0, I), with I = 1, . . . , 3 and the zeroth direction along the surviving
N = 1 supersymmetry. The D3 and D(-1) gauge groups SO(N) and Sp(k)
break into SO(N0)×U(N1) and Sp(k0)×U(k1) respectively with N0 (k0) the
number of fractional D3 (D(-1)) branes invariant under Z3 and N1 (k1) those
transforming with eigenvalue ω. More precisely, the projective embedding
of the Z3 basic orbifold group element θ in the Chan-Paton group can be
written
N0×N0
, ωh 1
N1×N̄1
, ω̄h 1
N̄1×N1
k0×k0
, ωh 1
k1×k̄1
, ω̄h 1
k̄1×k1
) (3.8)
After projecting under Z3 the symmetric/antisymmetric matrices in (3.7)
break into km × k̄n, km × N̄n or Nm × k̄n each transforming with eigenvalue
ωm−n. In addition fields with up(down) index I transform like ω(ω̄). Keeping
only the invariant components one finds
(aµ; θ
k0(k0 − 1) + k1k̄1
k1(k1 − 1) + k0k̄1
(Dc; θ̄0α̇)
k0(k0 + 1) + k1k̄1
(χ̄I ; θ̄Iα̇)
k̄1(k̄1 + 1) + k0k1
k1(k1 + 1) + k0k̄1
(wα̇; ν0) k0N0 + k1N̄1 + k̄1N1
νI k0N̄1 + k̄1N0 + k1N1 (3.9)
Notice that the Z3 eigenvalues of the Chan-Paton indices in the r.h.s. of (3.9)
compensate for those of the moduli in the l.h.s. making the field invariant
under Z3. In addition (odd)even components under I are (anti)symmetrized
ensuring the invariance under ΩI.
The multi-instanton action follows from that of N = N0+2N1 D3 branes
and k = k0 + 2k1 D(-1) instanton in flat space (3.2) with U(N) and U(k)
matrices restricted to the invariant blocks (3.9).
The results for O3+ can be read off from (3.9) by exchanging symmetric
and antisymmetric representations.
4 ADS-like superpotential
4.1 D3-D(-1) one-loop vacuum amplitudes
Non-perturbative superpotentials can be computed from the instanton mod-
uli space integral [11, 13, 18]
SW = e
〈1〉D+〈1〉A+〈1〉M µβnkn
e−Sk,N−Sϕ (4.1)
The integration is over the instanton moduli space, M, 〈1〉D is the disk
amplitude and 〈1〉A,M are the one-loop vacuum amplitudes with at least
one end on the D(-1)-instanton. The factor µβnkn , µ being the energy scale,
comes from the quadratic fluctuations around the instanton background and
as we will see it combines with a similar contribution coming from the moduli
measure to give a dimensionless SW .
The terms in front of the integral in (4.1) combine into
SW = Λ
e−Sk,N−Sϕ (4.2)
Λknβn = e2πiknτn(µ) µβnkn τn(µ) = τn −
(4.3)
the one-loop renormalization group invariant and the running coupling con-
stant respectively and τn refers to the complexified coupling constant of the
nth gauge group. µ0 is a reference scale.
More precisely, the disk amplitude and one-loop amplitudes yields
e〈1〉D = e2πiknτn τn =
e〈1〉A+〈1〉M =
)−βnκn
+ . . . (4.4)
with dots refering to threshold corrections that will not be considered here.
To see (4.4) we should compute the following one-loop amplitudes
〈1〉A = −
Tr[(1 + (−)F )(1 + θ + θ2) qL0−a]
AD(−1)D3 = −A0,D(−1)D3 ln
+ . . .
〈1〉M = −
Tr[ ΩI (1 + (−)F )(1 + θ + θ2) qL0−a]
MD(−1) = −M0,D(−1) ln
+ . . . (4.5)
In the above formula µ enters as a UV regulator in the open string chan-
nel (see [61] for details) and A0,M0 are the massless contributions to the
amplitudes.
We start by considering the O3− projection. It is important to notice that
only the annulus with one end on the D(-1) and one on the D3 contributes
to these amplitudes since D(-1)-D(-1) amplitudes cancel due to the Riemann
identity. One finds
AD(−1),D3 =
trγθ,ktrγθ,N
ϑ[αβ ]
ϑ[αβ+hi ]
(k0 − k1)(N0 −N1) + . . .
MD(−1) =
trγθ2,k
ϑ[αβ ]
ϑ[αβ+hi ]
= −3(k0 − k1) + . . . (4.6)
The sum runs over the even spin structures and cαβ = (−)
2(α+β). The term
comes from the (b, c) and (β, γ) ghosts while the extra five thetas in the
numerator and denominator describe the contributions of the ten fermionic
and bosonic worldsheet degrees of freedom. We adopt the shorthand notation
h ] ≡ ϑ[
h ]/(2 cosπh) to describe the massive contribution of a periodic
boson to the partition function. hi = (
) denote the Z3-twists while
the extra 1
-shifts in the annulus account for the D(-1)-D3 open string twist
along Neuman-Dirichlet directions while 1
twists in the Möbius come from
the I-projection. In addition we used the fact that the contribution of the
unprojected sector is zero after using the Riemann identity while that of the
θ- and θ2-projected sectors are identical explaining the overall factor of 2.
The extra factor of 2 in the annulus comes from the two orientations of the
string. The second line displays the massless contributions. We use the Chan
Paton traces
1,k = k0 + 2k1 trγθ,k = k0 − k1
1,N = N0 + 2N1 trγθ,N = N0 −N1 (4.7)
that follows from (2.1) and the first few terms in the theta expansions
ϑ[0h] = 1 + q
2 2 cos 2πh+ . . . ϑ[
h ] = q
8 2 cosπh+ . . .
η = q
24 + . . . (4.8)
From (4.6) one finds
A0 +M0 =
(k0 − k1)(N0 −N1 − 2) = knβn (4.9)
with βn the one-loop β coefficients given in (2.8). Plugging (4.9) into (4.5)
results into (4.4). The fact that the β function coefficients are reproduced by
the instanton vacuum amplitudes is a nice test of the instanton field content
(3.9).
Now let us determine the dependence of the instanton measure on the
string scaleMs ∼ α
′ −1/2. The scaling of the various instanton moduli follows
from (3.3):
D, g0 ∼M
s χa, ϕa ∼ Ms wα̇, aµ ∼M
νA, θAα ∼M
s θ̄Aα̇ ∼M
s (4.10)
Collecting from (3.9) the number of components of the various moduli enter-
ing in the instanton measure one finds 6
e−Sk,N−Sϕ ∼ M−βnkns
knβn = −2nD − nχ + na + nw +
nθ̄ −
(k0 − k1)(N0 −N1 − 2) (4.11)
Notice that this factor precisely combines with that in (4.2) leading to a di-
mensionless SW as expected. This simple dimensional analysis can be used to
determine the form of the allowed ADS superpotentials in the gauge theory.
A superpotential is generated if and only if the integral over the instanton
moduli space reduce to an integral over x
0 describing the center of the in-
stanton and θα its superpartner. More precisely
SW = Λ
e−Sk,N−Sϕ = c
d4x0d
Λknβn
ϕknβn−3
(4.12)
where c is a numerical constant. Whether c is zero or not depends on the
presence or not of extra fermionic zero modes besides θ. Notice that the power
of ϕ is completely fixed requiring that SW is dimensionless. The precise form
of the superpotential requires the evaluation of the moduli space integral and
will be the subject of the next section. The superpotential follows from (4.12)
after promoting ϕI to the chiral superfield ΦI and x0, θα to the measure of
the superspace
SW = c
d4xd2θ
Λknβn
Φknβn−3
(4.13)
6We recall that fermionic differentials scale as the inverse of the dimension of the
fermion itself. This explains the extra minus sign in (4.11).
A superpotential of type (4.13) is generated whenever [63, 64, 16, 17]
〈λ2 ϕknβn−3〉 6= 0 (4.14)
Each scalar ϕ soaks two fermionic zero-modes and each gaugino λ one zero
mode7. The condition (4.14) translates into
dimMF = 2knβn − 4 (4.15)
with dimMF the fermionic dimension of the instanton super-moduli space.
The number of fermionic zero modes can be read off from (3.9)
dimMF = nθ + nν − nθ̄
= k0(3N1 +N0 − 2) + k1[2N1 + 3(N0 +N1 − 2)]
= k0(4N0 + 10) + k1(8N0 + 14) (4.16)
where we used the fact that θ̄α̇A enter as a Lagrangian multiplier imposing
the fermionic ADHM constraint and therefore subtracts degrees of freedom.
The last line in (4.16) follows from using the anomaly cancellation condition
N1 = N0 + 4. The result (4.16) is consistent with the Atiyah-Singer index
theorem that states
dimMF = 2k0
N0(N0 − 1)) + 3N1ℓ(N0)
ℓ(N1N̄1) + 3N0ℓ(N1) + 3ℓ(
N1(N1 − 1))
= k0(3N1 +N0 − 2) + k1[2N1 + 3(N0 +N1 − 2)] (4.17)
Combining (4.15) and (4.16) one finds
k1 − 7k0 − 1
k0 + 2k1
(4.18)
One can easily see that the only non-negative solution for N0 is
N0 = 0 k0 = 0 k1 = 1
We conclude that in the class of U(N0+4)×SO(N0) SYM theories describing
the low-energy dynamics of D3-branes on the Z3 orientifold only the U(4)
7This can be seen by explicitly solving the equations of motion of the gaugino and the
ϕ-field in the instanton background [59]. In particular the source for the scalar field comes
from the Yukawa coupling LY uk = gYMϕ
†ψλ in the gauge theory action.
theory with three chiral multiplets in the antisymmetric leads to an ADS-like
superpotential generated by gauge instantons.
The counting can be easily repeated for the Sp(N1+4)×U(N1) cases by
exchanging symmetric and antisymmetric representations in (3.9) as required
by the presence of the O3+-plane. The results are
knβn = 9(k0 − k1)
dimMF = k0(4N1 + 6) + k1(8N1 + 18)
3k0 − 9k1 − 1
k0 + 2k1
(4.19)
One can easily see that the only non-negative solution is
N1 = 2 k0 = 1 k1 = 0
We conclude that in this class, only the gauge theory Sp(6)×U(2) with three
chiral multiplets in the ( , ¯) + (•, ) admits an ADS-like superpotential
generated by instantons.
The aim of the rest of this section is to compute SW . The integral (4.12)
will be evaluated in turn for the Sp(6)× U(2) and U(4) case.
4.2 Sp(6)× U(2) superpotential
We first consider the O3+ case, i.e. the Sp(6) × U(2) gauge theory with
three chiral multiplets in the [(6, 2̄) + (1, 3)]. The instanton moduli is given
by (3.9) after flipping symmetric/antisymmetric representations in order to
deal with the symplectic projection. Plugging k0 = 1, k1 = 0, N0 = 6, N1 = 2
into (3.9) one finds the the surviving fields
aµ, w
, θ0α, ν
0u0 , νIu1 (4.20)
with u0 = 1, ..6, and u1 = 1, 2, whose position from lower to upper has been
switched in this section for notational convenience as we will momentarily
see. In particular both θ̄0α̇ andD
c are projected out (the D(-1) “gauge” group
is O(1) ≈ Z2 in this case) and therefore no ADHM constraint survives. The
instanton action reduces then to
S = SK + Sϕ = w
α̇ ϕ̄Iu0u1ϕ
I u1v0wα̇v0 + ν
Iu1ν0u0ϕ̄Iu0u1 (4.21)
Here and below we omit numerical coefficients that can be always reabsorbed
at the end in the definition of the scale. The integrations over wα̇u0 , ν
, νIu1
are gaussian and the final result, up to a non vanishing numerical constant,
can be written as
SW = Λ
d4ad2θ
det6×6 (ϕ̄Iu1,u0)
det6×6 (ϕ̄Iu1,u0ϕ
Iu1,v0)
d4ad2θ
det6×6 (ϕIu1,u0)
(4.22)
where we have exploited the possibility of combining I and u1 in one ‘bi-
index’ Iu1 so as to get a range of six values. For the sake of simplicity we
have dropped the subscript 0 denoting bare scalar fields. In the following
scalar fields entering in formulae involving Λ will be always understood to be
bare. The last step makes use of det(AB) = det(A)det(B).
4.3 U(4) superpotential
We now consider the O3− case, i.e. the U(4) gauge theory with three chiral
multiplets in the 6. Setting k0 = 0, k1 = 1, N0 = 0, N1 = 4 in (3.9) the
surviving fields can be written as
I[uv]
, ϕ̄I[uv](0) , aµ(0) , χ̄I(−2) , χ
(+2) , D
(0) , w
u(+1) , w̄
α̇(−1)
θ0α(0) , θ̄0α̇(0) , θ̄α̇I(−1) ; ν
u(+1) , ν̄
(−1) , ν
(+1) (4.23)
with u = 1, ..4 and the charge q under U(1)k1 is denoted in parentheses.
Plugging into (3.3) (after taking α′ → 0) one finds
S = SB + SF (4.24)
where
ν̄0uwuα̇ + w̄
θ̄α̇0 + ν
Iu wuα̇ θ̄
I + χ̄Iν
Iu + νIuϕ̄Iuvν̄
SB = w̄
α̇ϕ̄Iuwϕ
Iwvwα̇v + ϕ
Iuvwα̇uwvα̇χ̄I + ϕ̄Iuvw̄
uα̇w̄vα̇χ
I + w̄uα̇wuα̇χ̄Iχ
+Dc w̄σcw (4.25)
As before we omit numerical coefficients. The integral over Dc leads to a δ
function on the ADHM constraints
d8wd8w̄δ3(w̄σcw) =
dρ ρ9d12U (4.26)
In the r.h.s of (4.26) we have solved the ADHM constraints in favor of w and
U defined by
wuα̇ = ρUuα̇ w̄
uα̇ = ρ Ūuα̇ Ūuα̇Uuβ̇ = δ
(4.27)
The coset representatives Uα̇u parameterizes the SU(4)/SU(2) orientations
of the instanton inside the gauge group. The fermionic integrations lead to
the determinant
∆F = ρ
8 ǫu1u2u3u4ǫv1v2u5u6ǫv3v4v5v6Xu1v1u2v2 Xu3v3u4v4 Yu5v5 Yu6v6 (4.28)
Xu1v1u2v2 = ǫ
I1I2I3χ̄I1ϕ̄I2u1v1ϕ̄I3u2v2
Yuv = U
u Uα̇u (4.29)
The bosonic integrals are more involved. For arbitrary choices of the scalar
VEV’s ϕ̄I and ϕ
I , even along the flat directions of the potential, the inte-
gration over U represents a challenging if not a prohibitive task. Fortunately
choosing ϕIuv = ϕηIuv, ϕ̄Iuv = ϕ̄η
Iuv, the full ϕ-dependence can be factor-
ized. SU(4) gauge and SU(3) ‘flavor’ invariance can then be used to recover
the full answer. After the rescaling
ρ2 → ρ2/(ϕϕ̄) χI → ϕχI χ̄I → ϕ̄χ̄I (4.30)
The integral becomes
SW = Λ
d4x0d
(4.31)
with I the ϕ-independent integral
dρρ9 d12U d3χd3χ̄∆F e
S̃B = −ρ
2(1 + ηIuvYuvχI + η̄IuvȲ
uvχI + χ̄Iχ
I) (4.32)
and ∆F given again by (4.28) but now in terms of
Xu1v1u2v2 = ǫ
I1I2I3χ̄I1 η̄I2u1v1 η̄I3u2v2
Finally one can restore the gauge covariance of (4.31) by noticing that there
is a unique SU(4)c × SU(3)f singlet in the symmetric tensor of six ϕ
det3×3[ǫu1..u4ϕ
Iu1u2ϕJu3u4]
Therefore one can replace ϕ6 in (4.31) by this singlet. The superpotential
follows after replacing ϕI → ΦI
SW = c
d4xd2θ
det3×3[ǫu1..u4Φ
Iu1u2ΦJu3u4]
(4.33)
where c is a computable non-zero numerical coefficient.
5 ED3-instantons
Let us now consider the ED3-D3 system. We restrict ourselves to the compact
case T 6/Z3 and consider the ED3 fractional instanton wrapping a four-cycle
Cn inside T
6/Z3. We start by considering the O3
−-orientifold projection.
The zero modes of the Yang-Mills fields in the instanton background can
be described as before in terms of open strings with at least one end on
the ED3. Open strings connecting ED3 and D3 branes have 8 Neumann-
Dirichlet directions therefore the zero-mode dynamics of the ED3-D3 system
is equivalent to that of the D7-D(-1) bound state. The instanton action can
be found starting from that of the N = (8, 0) sigma model describing the
low energy dynamics of a D1-D9 bound state in type I [65] reduced down to
zero dimensions. In flat space the D(-1)-D7 action reads
S = trk
Sg + SK + SD
(5.1)
Sg = −[χ, χ̄]
2 + Θ̃ȧχΘ̃ȧ +DcDc
SK = −[χ,Xm][χ̄, Xm] + Θ
aχ̄Θa + ν(χ + ϕ)ν
SD = Θ̃
ȧXmΓ
a +DcΓ̂cmn[Xm, Xn] (5.2)
with m = 1, . . . , 8v, a = 1, . . . , 8s, ȧ = 1, . . . , 8c, c = 1, , . . . , 7. We denote
by ϕ = mI(Cn)ϕ
I , the gauge scalar parametrizing the position of the D3-
brane along the direction perpendicular to the 4-cycle Cn. Here Γ
ȧa, Γ̂
are gamma matrices of SO(8) and SO(7) respectively. The introduction
of the auxiliary fields Dc has broken the manifest SO(8) invariance of the
action that will be further broken by the Z3-projection. In (5.2), Xm and
χ, χ̄ describe the position of the D(-1)-instanton in the directions longitudinal
and perpendicular to the D7-brane respectively while Θa, Θ̃ȧ are the fermionic
superpartners grouped according to the their chirality along the Dirichlet-
Dirichlet χ-plane. Unlike the D(-1)-D3 case, in the case of 8 Neumann-
Dirichlet directions Ω acts in the same way on the D(-1) and D7 Chan-Paton
indices. This implies that Dc transform in the adjoint of SO(k) if we take
the D7 gauge symmetry to be SO(N). In addition I acts as
I : Xm → −Xm Θ
a → −Θa (5.3)
Fields with eigenvalues ΩI = − are then in the following representations of
SO(k)× SO(N)
(χ, χ̄, Dc, Θ̃ȧ) 1
k(k− 1)
(Xm,Θ
k(k+ 1)
ν kN (5.4)
Fields even under I transform in the adjoint of SO(k) while odd fields tran-
form in the symmetric representation. For k = 1, N = 32 the D(-1)-D7
system or equivalently the D1-D9 bound state describes the S-dual version
of the fundamental heterotic string on T 2. k > 1 bound states correspond to
multiple windings of the heterotic string [65].
The field Dc implements the one-real D and three complex F flatness
conditions
V = −
DcDc = −g20
m,n=1
[Xm, Xn]
2 = 0 (5.5)
Dc = −1
mn[Xm, Xn] (5.6)
An explicit choice of Γ matrices in D = 7 is given by (a = 1, 2, 3)
Γa8×8 = iσ1 ⊗ η
4×4 Γ
8×8 = iσ3 ⊗ η̄
4×4 Γ
8×8 = iσ2 ⊗ 14×4 (5.7)
As in section 3 Z3 acts both on spacetime and Chan-Paton indices. Chan-
Paton indices decompose as N → N0 + N1 + N̄1 and k → k0 + k1 + k̄1.
Spacetime indices on the other hand decompose as
8v = 4 + 2ω + 2ω̄
8s = 2 + 2ω + 4ω̄
8c = 2 + 2ω̄ + 4ω
7 = 3 + 2ω + 2ω̄ (5.8)
In addition χ, ν transform with eigenvalue ω under Z3. Combining with (5.4)
one finds the Z3-invariant components
χ, χ̄ 1
k1(k1 − 1) + k0k̄1 + h.c.
Dc 3(1
k0(k0 − 1) + k1k̄1) + 2
k1(k1 − 1) + k0k̄1 + h.c.
Θ̃ȧ 2
k0(k0 − 1) + k1k̄1
k̄1(k̄1 − 1) + k0k1)
k1(k1 − 1) + k0k̄1)
k0(k0 + 1) + k1k̄1
k1(k1 + 1) + k0k̄1 + h.c.
k0(k0 + 1) + k1k̄1
k1(k1 + 1) + k0k̄1)
k̄1(k̄1 + 1) + k0k1)
ν k0N̄1 + k1N1 + k̄1N0 (5.9)
5.1 D3-ED3 one-loop vacuum amplitudes
ED3 generated superpotentials can be computed following the same steps as
in section 4.1. The disk amplitude can be written as
e〈1〉D = e2πiknτ̃n τ̃n = i
4πV4(Cn)
g2n α
(C4 + C0 ∧R ∧ R) (5.10)
τ̃n describes the coupling of closed string moduli to the ED3 instanton wrap-
ping the 4-cycle Cn with volume V4(Cn). We remark that closed string states
in the Z3-twisted sectors flow in the ED3-ED3 cylinder amplitude and there-
fore τ̃n is function of both untwisted and twisted closed twisted moduli. This
is not surprising since the volume of the cycle depends also on the volume of
the exceptional cycles that the ED3 wraps.
The annulus and Möbius amplitudes are given by
AED3,D3 =
ϑ[αβ ]
2trγθ,ktrγθ,N
ϑ[αβ−2h1 ]
+ trγ
1,ktrγ1,N
ϑ[αβ ]
k0N1 −
k1(N0 +N1) + . . .
MED3 = −
ϑ[αβ ]
2trγθ2,k
ϑ[αβ−2h1 ]
+ trγ
ϑ[αβ ]
= 3k0 + k1 + . . . (5.11)
The origin of the various contributions is the same that in the D(-1)-D3
system. Now the D3-ED3 open strings have 8 Neumann-Dirichlet directions
explaining the extra 1
twists in the annulus amplitude. On the other side,
the I projection accounts for the 1
-shift in the Möbius amplitude. Notice
that unlike the D(-1)-D3 case, the unprojected amplitude tr1, now gives a
non-trivial contribution.
Collecting the contributions from (5.11) one finds
Λ̃knbn = µknbn e〈1〉D+〈1〉A+〈1〉M = µknbn e2πikn τ̃n(µ) (5.12)
τ̃n(µ) = τ̃n −
(5.13)
A0 +M0 = knbn =
k0(6−N1) +
k1(2−N0 −N1) (5.14)
The interpretation of the bn as the one-loop β function coefficients of the τ̃n
coupling, though tantalizing, is not clear to us. We will now check that knbn
reproduces the right scale dependence of the instanton measure. The scaling
of the various instanton moduli follows from (5.2):
D, g0 ∼M
s χ, χ̄, ϕ ∼Ms Xm ∼ M
ν, Θa ∼M−1/2s Θ̃
ȧ ∼ M3/2s (5.15)
Collecting from (5.9) the number of degrees of freedom entering in the
instanton supermoduli measure one finds
e−Sk,N ∼ M−knbns
knbn = −2nD − nχ + nX +
nΘ̃ −
k0(6−N1) +
k1(2−N0 −N1) (5.16)
As in the previous case we write the instanton generated superpotential as
the moduli space integral
SW = Λ̃
e−Sk,N−Sϕ =
d4x0d
2θ Λ̃knbn ϕ−knbn+3 (5.17)
After promoting ϕ→ Φ and x0, θα to the measure of the superspace one finds
the ED3 generated superpotential
d4xd2θ Λ̃knbn Φ−knbn+3 (5.18)
The main difference with respect to the D(-1) instantons is that now ϕ enters
into Sϕ (5.2) only through the coupling to the ν-fermions. This implies that
in order to get a non zero result from the fermionic integral in (5.17) only the
ν’s and the two fermionic zero modes θα ∈ Θ
a should survive the orientifold
projections. From (5.9) one can easily see that this implies k0 = 1, k1 = 0.
The same counting shows that no solutions are allowed in the Sp(N) case.
5.2 The superpotential
Here we evaluate the instanton moduli space integral for the SO(N0)×U(N1)
case. From our analysis above the relevant cases are k0 = 1, k1 = 0.
The surviving fields in (5.9) are
θα ∈ Θ
0 ∈ Xm νu (5.19)
with u = 1, ...N1. The instanton action reduces to
S = νuϕ
uvνv (5.20)
The superpotential is then given by the integral
SW = Λ̃
d4x d2θ dN1ν e−νϕν (5.21)
After integration over ν and lifting ϕ→ Φ to the superfield one finds
SW = c Λ̃
d4x d2θ ǫu1....uN1Φ
u1u2Φu3u4...ΦuN1−1uN1 (5.22)
where c is a non vanishing numerical constant. Notice that the result
is non-trivial only when N1 is even. The superpotentials (5.22) are non-
renormalizable for N1 > 6 and grow for large vacuum expectation values
where the low energy approximation breaks down. The only exceptions are
Majorana masses U(4) + 3
Yukawa couplings SO(2)× U(6) + 3 ( , ¯) + 3 (•, ) (5.23)
Notice that both instanton generated Yukawa couplings involve only the mat-
ter in the antisymmetric representation.
6 ADS superpotentials: a general analysis
Here we consider a general N = 1 gauge theory with gauge group U(N) and
nAdj, nf/n̄f , nS/n̄S, nA/n̄A number of chiral multiplets in the adjoint, fun-
damental, symmetric and anti-symmetric representations (and their complex
conjugates) respectively.
The cubic chiral anomaly, one-loop β function and number of fermionic
zero modes in the instanton background of the gauge theory can be written
Ianom = nf− + nS−(N + 4) + nA−(N − 4) = 0 (6.1)
β1−loop = 3N −NnAdj −
nf+ −
nS+(N + 2)−
nA+(N − 2)
dimMF = k [2N + 2NnAdj + nf+ + nS+(N + 2) + nA+(N − 2)]
nf± = nf ± n̄f nS± = nS ± n̄S nA± = nA ± n̄A (6.2)
The condition for an Affleck, Dine and Seiberg like superpotential [16, 17] to
be generated was determined in section 4.1 to be
dimMF = 2kβ − 4 (6.3)
Combining (6.1) and (6.3) one finds
β1−loop = 2N +
(6.4)
nf− = −nS−(N + 4)− nA−(N − 4)
nf+ = 2N −
− 2NnAdj − nS+(N + 2)− nA+(N − 2)
Remarkably the β function in a theory admitting an instanton generated su-
perpotential depends only on the rank of the gauge group. A simple inspec-
tion shows that a superpotential is generated only for k = 1 and nAdj = 0.
The complete list follows from a scan of any choice of nS±,nA± such that
n+ ≥ |n−| and n+ ≥ 0. One finds
U(N) +Nf ( + ¯ ) Nf ≤ N − 1
U(N) + + (N − 4)¯ +Nf ( + ¯ ) Nf ≤ 2
U(4) + 2 +Nf ( + ¯ ) Nf ≤ 1
U(4) + 3
U(5) + 2 + 2¯ (6.5)
The inequalities are saturated for gauge theories satisfying (6.3) and (6.4),
while the lower cases are found by decoupling quark-antiquark pairs via mass
deformations.
The generalization to SO(N)/Sp(N) gauge groups is straightforward. In
these cases there is no restriction coming from anomalies since representations
are real. The β function and the number of fermionic zero modes in the
instanton background are given by
β1−loop =
(N ± 2)− 1
nS(N + 2)−
nA(N − 2)
dimMF = k [N ± 2 + nf + nS(N + 2) + nA(N − 2)]
with upper sign for Sp(N) and lower sign for SO(N) gauge groups. Imposing
(6.3) one finds
β1−loop = N ± 2 +
(6.6)
nf = N ± 2−
− nS(N + 2)− nA(N − 2)
The list of solutions is even shorter
SO(N) +Nf Nf ≤ N − 3 k = 2
Sp(N) +Nf Nf ≤ N k = 1
Sp(N) + + 2 k = 1
(6.7)
Notice that k = 1, respectively k = 2, are the basic instantons in Sp(N),
respectively SO(N), since the instanton symmetry groups are in these cases
SO(k), respectively Sp(k).
7 Conclusions
In the present paper, we have given a detailed microscopic derivation of non-
perturbative superpotentials for chiral N = 1 D3-brane gauge theories living
at Z3-orientifold singularities. We considered both unoriented projections
leading to SO(N1− 4)×U(N1) and Sp(N1+4)×U(N1) gauge theories with
three generations of chiral matter in the representations ( , ¯) + (•, ) and
( , ¯) + (•, ) respectively.
The U(4) case was studied in details in [49] and describes the local physics
of type I theory near the origin of T 6/Z3 with SO(8)× U(12) gauge group
broken by Wilson lines. In the present T-dual setting, there are two sources
of non-perturbative effects: D(-1) and ED3 instantons. The former realize
the standard gauge instantons and lead to Affleck, Dine and Seiberg like
superpotentials. The latter lead to Majorana masses or non-renormalizable
superpotentials and were ignored till very recently [18, 19, 20, 21, 15, 49, 22,
Our explicit instanton computations confirm the form of ADS and stringy
superpotentials proposed in [49] on the basis of holomorphicity, dimensional
analysis U(1) anomaly and flavour symmetry. We show that ADS super-
potentials are generated only for the U(4) and Sp(6)× U(2) gauge theories
in the Z3-orientifold list. The precise form of the superpotential is derived
from an integration over the instanton super-moduli space. Like in [15], the
β function running of gauge couplings are reproduced from vacuum ampli-
tudes given in terms of annulus and Möbius amplitudes ending on the in-
stantons. The same analysis is performed for “stringy instantons” generated
by Euclidean ED3-branes (dual to ED1-strings in type I theory) wrapping
holomorphic four-cycles on T 6/Z3. A detailed microscopic analysis of the
multi-instanton super-moduli space encompasses massless open string states
with a least one end on the ED3-instanton. We show the generation of Ma-
jorana mass terms for the open string chiral multiplets in the U(4) case,
Yukwa couplings for the SO(2)×U(6) gauge theory and non-renormalizable
superpotentials for SO(N0)×U(N0 +4) gauge theories. The field theory in-
terpretation of the β function coefficients generated by the one-loop vacuum
amplitudes for open strings ending on the ED3-instantons is one of the most
interesting open question left by our instanton super-moduli space analysis.
As previously observed, the invariance under anomalous U(1)’s results from
a detailed balance between the charges of the open strings involved and the
axionic shift of a closed string R-R modulus from the twisted sector.
Our present analysis has some analogies with the recent ones [18, 19, 20,
21, 15, 22, 23] that have focussed on ED2-branes at D6-brane intersection.
As stressed in [49], one immediate advantage of the viewpoint advocated
here is the consistency of the local description. Indeed, imposing twisted
tadpole cancellation [34, 35] the models presented here and all closely related
settings of D-branes at singularities (not necessarily of the Zn kind) give
rise to anomaly free theories, while this is not necessarily the case for the
‘local’ models with intersecting D-branes. We can envisage the possibility of
extending our analysis to other Zn singularities [55, 56] or even to Gepner
models [66, 67, 68] where many if not all ingredients, such as the brane actions
from gauge kinetic functions including one-loop threshold effects [69, 70, 71,
72], are available.
In the present paper we have not addressed phenomenological implica-
tions of the stringy instanton effects we have analyzed in detail. We hope to
be able to investigate these issues in this or similar contexts with D-branes
at singularities, where the rigidity of the cycles is well understood and allows
for the correct number of fermionic zero-modes. Clearly additional (closed
string) fluxes neeeded for moduli stabilization [73, 74] may change some of
our present conclusions.
Acknowledgments
It is a pleasure to thank P. Anastasopoulos, R. Argurio, C. Bachas, M.
Bertolini, M. Billo, G. Ferretti, M.L. Frau, A. Kumar, E. Kiritsis, I. Kle-
banov, S. Kovacs, A. Lerda, L. Martucci, I. Pesandro, R. Russo and M.
Wijnholt for valuable discussions. Special thanks go to G. Pradisi for collab-
oration on the computation of the string amplitudes and useful exchanges.
During completion of this work, M.B. was visiting the Galileo Galilei In-
stitute in Arcetri (FI) and thanks INFN for hospitality and support. M.B.
is very grateful to the organizers and participants to the workshop “String
and M theory approaches to particle physics and cosmology” for creating
a very stimulating atmosphere. This work was supported in part by the
CNRS PICS no. 2530 and 3059, INTAS grant 03-516346, MIUR-COFIN
2003-023852, NATO PST.CLG.978785, the RTN grants MRTNCT- 2004-
503369, EU MRTN-CT-2004-512194, MRTN-CT-2004-005104 and by a Eu-
ropean Union Excellence Grant, MEXT-CT-2003-509661.
References
[1] N. A. Nekrasov, Seiberg-witten prepotential from instanton counting,
Adv. Theor. Math. Phys. 7 (2004) 831–864, [hep-th/0206161].
[2] R. Flume and R. Poghossian, An algorithm for the microscopic
evaluation of the coefficients of the seiberg-witten prepotential, Int. J.
Mod. Phys. A18 (2003) 2541, [hep-th/0208176].
[3] U. Bruzzo, F. Fucito, J. F. Morales, and A. Tanzini, Multi-instanton
calculus and equivariant cohomology, JHEP 05 (2003) 054,
[hep-th/0211108].
[4] A. S. Losev, A. Marshakov, and N. A. Nekrasov, Small instantons,
little strings and free fermions, hep-th/0302191.
[5] N. Nekrasov and A. Okounkov, Seiberg-witten theory and random
partitions, hep-th/0306238.
[6] R. Flume, F. Fucito, J. F. Morales, and R. Poghossian, Matone’s
relation in the presence of gravitational couplings, JHEP 04 (2004)
008, [hep-th/0403057].
[7] M. Marino and N. Wyllard, A note on instanton counting for n = 2
gauge theories with classical gauge groups, JHEP 05 (2004) 021,
[hep-th/0404125].
[8] F. Fucito, J. F. Morales, and R. Poghossian, Instantons on quivers and
orientifolds, JHEP 10 (2004) 037, [hep-th/0408090].
[9] F. Fucito, J. F. Morales, R. Poghossian, and A. Tanzini, N = 1
superpotentials from multi-instanton calculus, JHEP 01 (2006) 031,
[hep-th/0510173].
[10] S. Fujii, H. Kanno, S. Moriyama, and S. Okada, Instanton calculus and
chiral one-point functions in supersymmetric gauge theories,
hep-th/0702125.
[11] N. Dorey, T. J. Hollowood, V. V. Khoze, and M. P. Mattis, The
calculus of many instantons, Phys. Rept. 371 (2002) 231–459,
[hep-th/0206063].
[12] M. Bianchi, S. Kovacs and G. Rossi, “Instantons and supersymmetry”,
hep-th/0703142.
[13] M. Billo et. al., Classical gauge instantons from open strings, JHEP 02
(2003) 045, [hep-th/0211250].
[14] M. Billo, M. Frau, F. Fucito, and A. Lerda, Instanton calculus in r-r
background and the topological string, JHEP 11 (2006) 012,
[hep-th/0606013].
[15] N. Akerblom, R. Blumenhagen, D. Lust, E. Plauschinn, and
M. Schmidt-Sommerfeld, Non-perturbative sqcd superpotentials from
string instantons, hep-th/0612132.
[16] I. Affleck, M. Dine, and N. Seiberg, Supersymmetry breaking by
instantons, Phys. Rev. Lett. 51 (1983) 1026.
[17] I. Affleck, M. Dine, and N. Seiberg, Dynamical supersymmetry breaking
in supersymmetric qcd, Nucl. Phys. B241 (1984) 493–534.
[18] R. Blumenhagen, M. Cvetic, and T. Weigand, Spacetime instanton
corrections in 4d string vacua - the seesaw mechanism for d-brane
models, hep-th/0609191.
[19] M. Haack, D. Krefl, D. Lust, A. Van Proeyen, and M. Zagermann,
Gaugino condensates and d-terms from d7-branes, JHEP 01 (2007)
078, [hep-th/0609211].
[20] L. E. Ibanez and A. M. Uranga, Neutrino majorana masses from string
theory instanton effects, JHEP 03 (2007) 052, [hep-th/0609213].
[21] B. Florea, S. Kachru, J. McGreevy, and N. Saulina, Stringy instantons
and quiver gauge theories, hep-th/0610003.
[22] M. Cvetic, R. Richter, and T. Weigand, Computation of d-brane
instanton induced superpotential couplings: Majorana masses from
string theory, hep-th/0703028.
[23] R. Argurio, M. Bertolini, G. Ferretti, A. Lerda and C. Petersson,
Stringy instantons at orbifold singularities, hep-th/0704.0262.
[24] A. Sagnotti, Open strings and their symmetry groups,
hep-th/0208020.
[25] G. Pradisi and A. Sagnotti, Open string orbifolds, Phys. Lett. B216
(1989) 59.
[26] M. Bianchi and A. Sagnotti, On the systematics of open string
theories, Phys. Lett. B247 (1990) 517–524.
[27] M. Bianchi and A. Sagnotti, Twist symmetry and open string wilson
lines, Nucl. Phys. B361 (1991) 519–538.
[28] M. Bianchi, G. Pradisi, and A. Sagnotti, Toroidal compactification and
symmetry breaking in open string theories, Nucl. Phys. B376 (1992)
365–386.
[29] E. Witten, Toroidal compactification without vector structure, JHEP
02 (1998) 006, [hep-th/9712028].
[30] E. Dudas, Theory and phenomenology of type i strings and m-theory,
Class. Quant. Grav. 17 (2000) R41–R116, [hep-ph/0006190].
[31] C. Angelantonj and A. Sagnotti, Open strings, Phys. Rept. 371 (2002)
1–150, [hep-th/0204089].
[32] M. B. Green and J. H. Schwarz, Anomaly cancellation in
supersymmetric d=10 gauge theory and superstring theory, Phys. Lett.
B149 (1984) 117–122.
[33] A. Sagnotti, A note on the green-schwarz mechanism in open string
theories, Phys. Lett. B294 (1992) 196–203, [hep-th/9210127].
[34] L. E. Ibanez, R. Rabadan, and A. M. Uranga, Anomalous u(1)’s in
type i and type iib d = 4, n = 1 string vacua, Nucl. Phys. B542 (1999)
112–138, [hep-th/9808139].
[35] M. Bianchi and J. F. Morales, Anomalies and tadpoles, JHEP 03
(2000) 030, [hep-th/0002149].
[36] I. Antoniadis, E. Kiritsis, and J. Rizos, Anomalous u(1)s in type i
superstring vacua, Nucl. Phys. B637 (2002) 92–118, [hep-th/0204153].
[37] P. Anastasopoulos, 4d anomalous u(1)’s, their masses and their
relation to 6d anomalies, JHEP 08 (2003) 005, [hep-th/0306042].
[38] P. Anastasopoulos, Anomalous u(1)s masses in non-supersymmetric
open string vacua, Phys. Lett. B588 (2004) 119–126,
[hep-th/0402105].
[39] P. Anastasopoulos, M. Bianchi, E. Dudas, and E. Kiritsis, Anomalies,
anomalous u(1)’s and generalized chern-simons terms, JHEP 11
(2006) 057, [hep-th/0605225].
[40] C. Angelantonj, M. Bianchi, G. Pradisi, A. Sagnotti, and Y. S. Stanev,
Chiral asymmetry in four-dimensional open- string vacua, Phys. Lett.
B385 (1996) 96–102, [hep-th/9606169].
[41] M. Bianchi, A note on toroidal compactifications of the type i
superstring and other superstring vacuum configurations with 16
supercharges, Nucl. Phys. B528 (1998) 73–94, [hep-th/9711201].
[42] M. Cvetic, L. L. Everett, P. Langacker, and J. Wang, Blowing-up the
four-dimensional z(3) orientifold, JHEP 04 (1999) 020,
[hep-th/9903051].
[43] M. Cvetic and P. Langacker, D = 4 n = 1 type iib orientifolds with
continuous wilson lines, moving branes, and their field theory
realization, Nucl. Phys. B586 (2000) 287–302, [hep-th/0006049].
[44] M. Cvetic, A. M. Uranga, and J. Wang, Discrete wilson lines in n = 1
d = 4 type iib orientifolds: A systematic exploration for z(6)
orientifold, Nucl. Phys. B595 (2001) 63–92, [hep-th/0010091].
[45] A. M. Uranga, Chiral four-dimensional string compactifications with
intersecting d-branes, Class. Quant. Grav. 20 (2003) S373–S394,
[hep-th/0301032].
[46] E. Kiritsis, D-branes in standard model building, gravity and
cosmology, Fortsch. Phys. 52 (2004) 200–263, [hep-th/0310001].
[47] R. Blumenhagen, M. Cvetic, P. Langacker, and G. Shiu, Toward
realistic intersecting d-brane models, Ann. Rev. Nucl. Part. Sci. 55
(2005) 71–139, [hep-th/0502005].
[48] R. Blumenhagen, B. Kors, D. Lust, and S. Stieberger,
Four-dimensional string compactifications with d-branes, orientifolds
and fluxes, hep-th/0610327.
[49] M. Bianchi and E. Kiritsis, Non-perturbative and flux superpotentials
for type i strings on the z(3) orbifold, hep-th/0702015.
[50] M. Dine, N. Seiberg, X. G. Wen, and E. Witten, Nonperturbative
effects on the string world sheet, Nucl. Phys. B278 (1986) 769.
[51] M. Dine, N. Seiberg, X. G. Wen, and E. Witten, Nonperturbative
effects on the string world sheet. 2, Nucl. Phys. B289 (1987) 319.
[52] E. Witten, World-sheet corrections via d-instantons, JHEP 02 (2000)
030, [hep-th/9907041].
[53] C. Beasley and E. Witten, Residues and world-sheet instantons, JHEP
10 (2003) 065, [hep-th/0304115].
[54] C. Beasley and E. Witten, New instanton effects in string theory,
JHEP 02 (2006) 060, [hep-th/0512039].
[55] G. Aldazabal, L. E. Ibanez, F. Quevedo, and A. M. Uranga, D-branes
at singularities: A bottom-up approach to the string embedding of the
standard model, JHEP 08 (2000) 002, [hep-th/0005067].
[56] M. Buican, D. Malyshev, D. R. Morrison, M. Wijnholt, and
H. Verlinde, D-branes at singularities, compactification, and
hypercharge, JHEP 01 (2007) 107, [hep-th/0610007].
[57] M. R. Douglas, Branes within branes, hep-th/9512077.
[58] M. F. Atiyah, N. J. Hitchin, V. G. Drinfeld, and Y. I. Manin,
Construction of instantons, Phys. Lett. A65 (1978) 185–187.
[59] N. Dorey, T. J. Hollowood, V. V. Khoze, M. P. Mattis, and
S. Vandoren, Multi-instanton calculus and the ads/cft correspondence
in n = 4 superconformal field theory, Nucl. Phys. B552 (1999) 88–168,
[hep-th/9901128].
[60] E. G. Gimon and J. Polchinski, Consistency conditions for orientifolds
and d-manifolds, Phys. Rev. D54 (1996) 1667–1676, [hep-th/9601038].
[61] M. Bianchi and J. F. Morales, Rg flows and open/closed string duality,
JHEP 08 (2000) 035, [hep-th/0006176].
[62] F. Fucito, J. F. Morales, and A. Tanzini, D-instanton probes of
non-conformal geometries, JHEP 07 (2001) 012, [hep-th/0106061].
[63] G. Veneziano and S. Yankielowicz, An effective lagrangian for the pure
n=1 supersymmetric yang-mills theory, Phys. Lett. B113 (1982) 231.
[64] T. R. Taylor, G. Veneziano, and S. Yankielowicz, Supersymmetric qcd
and its massless limit: An effective lagrangian analysis, Nucl. Phys.
B218 (1983) 493.
[65] E. Gava, J. F. Morales, K. S. Narain, and G. Thompson, Bound states
of type I D-strings, Nucl. Phys. B528 (1998) 95–108,
[hep-th/9801128].
[66] C. Angelantonj, M. Bianchi, G. Pradisi, A. Sagnotti, and Y. S. Stanev,
Comments on gepner models and type i vacua in string theory, Phys.
Lett. B387 (1996) 743–749, [hep-th/9607229].
[67] T. P. T. Dijkstra, L. R. Huiszoon, and A. N. Schellekens,
Supersymmetric standard model spectra from rcft orientifolds, Nucl.
Phys. B710 (2005) 3–57, [hep-th/0411129].
[68] P. Anastasopoulos, T. P. T. Dijkstra, E. Kiritsis, and A. N.
Schellekens, Orientifolds, hypercharge embeddings and the standard
model, Nucl. Phys. B759 (2006) 83–146, [hep-th/0605226].
[69] I. Antoniadis, C. Bachas, and E. Dudas, Gauge couplings in
four-dimensional type i string orbifolds, Nucl. Phys. B560 (1999)
93–134, [hep-th/9906039].
[70] D. Lust and S. Stieberger, Gauge threshold corrections in intersecting
brane world models, hep-th/0302221.
[71] M. Bianchi and E. Trevigne, Gauge thresholds in the presence of
oblique magnetic fluxes, JHEP 01 (2006) 092, [hep-th/0506080].
[72] P. Anastasopoulos, M. Bianchi, G. Sarkissian, and Y. S. Stanev, On
gauge couplings and thresholds in type i gepner models and otherwise,
JHEP 03 (2007) 059, [hep-th/0612234].
[73] D. Lust, S. Reffert, E. Scheidegger, W. Schulgin, and S. Stieberger,
Moduli stabilization in type iib orientifolds. ii, hep-th/0609013.
[74] D. Lust, S. Reffert, E. Scheidegger, and S. Stieberger, Resolved toroidal
orbifolds and their orientifolds, hep-th/0609014.
ABSTRACT
  We give a detailed microscopic derivation of gauge and stringy instanton
generated superpotentials for gauge theories living on D3-branes at
Z_3-orientifold singularities. Gauge instantons are generated by D(-1)-branes
and lead to Affleck, Dine and Seiberg (ADS) like superpotentials in the
effective N=1 gauge theories with three generations of bifundamental and
anti/symmetric matter. Stringy instanton effects are generated by Euclidean
ED3-branes wrapping four-cycles on T^6/\Z_3. They give rise to Majorana masses
in one case and non-renormalizable superpotentials for the other cases. Finally
we determine the conditions under which ADS like superpotentials are generated
in N=1 gauge theories with adjoints, fundamentals, symmetric and antisymmetric
chiral matter.

<|endoftext|><|startoftext|>
Introduction 
Diamond-Like Carbon (DLC) films have been 
shown to demonstrate various tribological behaviors: in 
ultra-high vacuum (UHV), with either friction 
coefficients as low as 0.01 or less and very mild wear, 
or very high friction coefficients (>0.4) and drastic 
wear. These behaviors depend notably on gaseous 
environment, hydrogen content of the film [1], and on 
its viscoplastic properties [2,3]. A relation between 
superlow friction in UHV and viscoplasticity has indeed 
been established for a-C:H films and confirmed for a 
fluorinated sample (a-C:F:H). In this study, 
nanoindentation and nanoscratch tests were conducted 
in ambient air, using a nanoindentation apparatus, in 
order to evaluate tribological behaviors, as well as 
mechanical and viscoplastic properties of different 
amorphous carbon films. 
Experimental 
The samples were deposited on a Si (100) 
substrate by Plasma Enhanced Chemical Vapor 
Deposition (PECVD) process at different bias voltages, 
either from acetylene, cyclohexane precursors by d.c.-
PECVD, or hexafluorobenzene mixed with hydrogen 
precursor by r.f.-PECVD for the fluorinated sample (a-
C:F:H, noted FDLC). Details of the deposition process 
can be found in [4,5]. Thickness of the coating is 1µm, 
except for the FDLC, which is 0.4µm. 
Nanoindentation and nanoscratch tests were 
carried out in ambient air, at room temperature, with a 
MTS NanoIndenter® XP apparatus. A spherical (radius 
10µm) and a Berkovich diamond indenter were used. 
Mechanical and viscoplastic properties were evaluated 
from nanoindentation tests in continuous stiffness mode, 
using the Berkovich diamond indenter, with a maximum 
load of 100mN. As the load P is applied exponentially 
as a function of time, the ratio between loading rate P’ 
and load P is kept constant during indentation, and thus 
the strain rate ε  is also constant. Five different ratios 
P’/P, from 3.10-3 up to 3.10-1Hz, were used. 
Nanotribological evaluation of the samples was 
conducted from nanoscratch tests at ramping load (0.1 
to 10mN, 3 passes) and at constant load (5mN, 10 
passes) with spherical diamond indenter. 
Results 
The strain rate sensitivity of the materials is 
estimated and fitted by a Norton-Hoff law: 
xH H ε= ⋅  
where H is the hardness, H0  a constant, ε  the strain and 
x a constant called viscoplastic exponent (Table 1). 
Contrary to UHV, no evidence of correlation between 
friction coefficients in ambient air and viscoplasticity 
can be made. But even in this environment, some very 
low friction coefficient values, as low as 0.04 (FDLC), 
with very mild wear have been evidenced (Figure 1). 
Hardness H0 seems to be the key parameter: wear 
resistance in the air is improved with higher H0 and 
friction coefficient decreases with H0. Note that H0 is 
also roughly linked with the hydrogen content of the 
coating for the non fluorinated samples, as it has been 
shown in [2]. The number of passes seems also to lead 
to a decrease of friction coefficient. 
Sample H content 
(at. %) 
(GPa) 
x µ ramping 
load 
µ constant 
load 
Wear 
FDLC 5/18(F) 16 0.060 0.04-0.14 0.080 ~ none 
AC8 34 13 0.014 0.06-0.11 0.083 ~ none 
AC5 40 11 0.068 0.05-0.16 0.078 mild 
CY6.5 42 6.8 0.028 0.07-0.13 0.083 mild 
CY5 42 1.3 0.076 0.10-0.22 0.183 severe 
Table 1: Summary of nanofriction tests results 
and viscoplastic properties 
Conclusion 
This study shows that in ambient air, wear 
resistance and frictional behavior of a-C:H and a-C:F:H 
samples is improved with  hardness H0. In UHV, the 
achievement of super-low friction is linked with the 
viscoplastic character. Thus, intermediary coating, with 
high hardness and viscoplastic exponent, as a-C:F:H 
will demonstrate satisfactory tribological behavior both 
in ambient air and in UHV. 
References 
[1] C. Donnet et al., Tribo. Lett., 9 (2000) 137. 
[2] J. Fontaine et al., Tribo. Lett., 17 (2004) 709. 
[3] J. Fontaine et al., Thin Solid Films, (2005) in press. 
[4] C. Donnet et al., Surf. Coat. Tech., 94-95 (1997) 
456. 
[5] C. Donnet et al., Surf. Coat. Tech., 94-95 (1997) 
531. 
Figure 1: Constant load scratch micrograph
FDLC CY5
ABSTRACT
  Diamond-Like Carbon (DLC) films have been shown to demonstrate various
tribological behaviors: in ultra-high vacuum (UHV), with either friction
coefficients as low as 0.01 or less and very mild wear, or very high friction
coefficients (>0.4) and drastic wear. These behaviors depend notably on gaseous
environment, hydrogen content of the film [1], and on its viscoplastic
properties [2,3]. A relation between superlow friction in UHV and
viscoplasticity has indeed been established for a-C:H films and confirmed for a
fluorinated sample (a-C:F:H). In this study, nanoindentation and nanoscratch
tests were conducted in ambient air, using a nanoindentation apparatus, in
order to evaluate tribological behaviors, as well as mechanical and
viscoplastic properties of different amorphous carbon films.

<|endoftext|><|startoftext|>
IPPP/07/10
DCPT/07/20
Implication of the D0 Width Difference
On CP-Violation in D0-D̄0 Mixing
Patricia Ball
IPPP, Department of Physics, University of Durham, Durham DH1 3LE, UK
Abstract
Both BaBar and Belle have found evidence for a non-zero width difference in the D0-D̄0
system. Although there is no direct experimental evidence for CP-violation in D mixing
(yet), we show that the measured values of the width difference y ∼ ∆Γ already imply
constraints on the CP-odd phase in D mixing, which, if significantly different from zero,
would be an unambiguous signal of new physics.
∗Patricia.Ball@durham.ac.uk
http://arxiv.org/abs/0704.0786v2
The highlight of this year’s Moriond conference on electroweak interactions and unified
theories arguably was the announcement by BaBar and Belle of experimental evidence
for D0-D̄0 mixing [1, 2, 3], which was quickly followed by a number of theoretical anal-
yses [4, 5, 6, 7, 8, 9]. While Refs. [4, 7, 8, 9] focused on the constraints posed, by the
experimental results, on various new-physics models, Ref. [5] presented a first analysis
of the implications of these results for the fundamental parameters describing D mixing.
The purpose of this letter is to show that the present experimental results already imply
constraints on a sizeable CP-odd phase in D mixing, which could only be due to new
physics (NP).
To start with, let us shortly review the theoretical formalism of D mixing and the
experimental results, see Refs. [10, 11] for more detailed reviews. In complete analogy to
B mixing, D mixing in the SM is due to box diagrams with internal quarks andW bosons.
In contrast to B, though, the internal quarks are down-type. Also in contrast to B mixing,
the GIM mechanism is much more effective, as the contribution of the heaviest down-type
quark, the b, comes with a relative enhancement factor (m2b −m2s,d)/(m2s −m2d), but also
a large CKM-suppression factor |VubV ∗cb|2/|VusV ∗cs|2 ∼ λ8, which renders its contribution
to D mixing ∼ 1% and hence negligible. As a consequence, D mixing is very sensitive to
the potential intervention of NP. On the other hand, it is also rather difficult to calculate
the SM “background” to D mixing, as the loop-diagrams are dominated by s and d
quarks and hence sensitive to the intervention of resonances and non-perturbative QCD.
The quasi-decoupling of the 3rd quark generation also implies that CP violation in D
mixing is extremely small in the SM, and hence any observation of CP violation will be
an unambiguous signal of new physics, independently of hadronic uncertainties.
The theoretical parameters describing D mixing can be defined in complete analogy to
those for B mixing: the time evolution of the D0 system is described by the Schrödinger
equation
D0(t)
D̄0(t)
M − i Γ
D0(t)
D̄0(t)
with Hermitian matrices M and Γ. The off-diagonal elements of these matrices, M12
and Γ12, describe, respectively, the dispersive and absorptive parts of D mixing. The
flavour-eigenstates D0 = (cū), D̄0 = (uc̄) are related to the mass-eigenstates D1,2 by
|D1,2〉 = p|D0〉 ± q|D̄0〉 (2)
M12 − i2 Γ12
; (3)
|p|2 + |q|2 = 1 by definition.
The basic observables in D mixing are the mass and lifetime difference of D1,2, which
are usually normalised to the average lifetime Γ = (Γ1 + Γ2)/2:
x ≡ ∆M
M2 −M1
, y ≡ ∆Γ
Γ2 − Γ1
. (4)
In this letter we follow the sign convention of Ref. [5], according to which x is positive
by definition. The sign of y then has to be determined from experiment. In addition, if
there is CP-violation in the D system, one also has
6= 1, φ ≡ arg(M12/Γ12) 6= 0. (5)
While previously only bounds on x and y were known, both BaBar and Belle have
now found evidence for non-vanishing mixing in the D system. BaBar has obtained this
evidence from the measurement of the doubly Cabibbo-suppressed decay D0 → K+π−
(and its CP conjugate), yielding
y′ = (0.97± 0.44(stat)± 0.31(syst))× 10−2,
x′2 = (−0.022± 0.030(stat)± 0.021(syst))× 10−2, (6)
while Belle obtains
yCP = (1.31± 0.32(stat)± 0.25(syst))× 10−2 (7)
from D0 → K+K−, π+π− and
x = (0.80±0.29(stat)±0.17(syst))×10−2, y = (0.33±0.24(stat)±0.15(syst))×10−2 (8)
from a Dalitz-plot analysis of D0 → K0Sπ+π−. Here yCP → y in the limit of no CP
violation in D mixing, while the primed quantities x′, y′ are related to x, y by a rotation
by a strong phase δKπ:
y′ = cos δKπ − x sin δKπ, x′ = x cos δKπ + y sin δKπ. (9)
Limited experimental information on this phase has been obtainted at CLEO-c [12]:
cos δKπ = 1.09± 0.66 , (10)
which can be translated into δKπ = (0 ± 65)◦. An analysis with a larger data-set is
underway at CLEO-c, with an expected uncertainty of ∆ cos δKπ ≈ 0.1 in the next couple
of years [13]; BES-III is expected to reach ∆ cos δKπ ≈ 0.04 after 4 years of running
[14]. The experimental result (10) agrees with theoretical expectations, δKπ = 0 in the
SU(3)-limit and |δKπ| <∼ 15
◦ from a calculation of the amplitudes in QCD factorisation
[15]. Based on these experimental results, a preliminary HFAG-average was presented at
the 2007 CERN workshop “Flavour in the Era of the LHC” [13]:
x = (8.5+3.2
−3.1)× 10−3, y = (7.1+2.0−2.3)× 10−3. (11)
Adding errors in quadrature, this implies
= 1.2± 0.6. (12)
The exact relations between ∆M , ∆Γ, M12 and Γ12 are given by
(∆M)2 − 1
(∆Γ)2 = 4|M12|2 − |Γ12|2,
(∆M)(∆Γ) = 4Re(M∗
Γ12) = 4|M12||Γ12| cosφ . (13)
Eq. (13) implies x/y > 0 for |φ| < π/2 and x/y < 0 for π/2 < |φ| < 3π/2. In view of the
above experimental results, we assume |φ| < π/2 from now on.
As for the CP-violating observables, |q/p| 6= 1 characterises CP-violation in mixing
and can be measured for instance in flavour-specific decays D0 → f , where D̄0 → f is
possible only via mixing. The prime example is semileptonic decays with
ASL =
Γ(D0 → ℓ−X)− Γ(D̄0 → ℓ+X)
Γ(D0 → ℓ−X) + Γ(D̄0 → ℓ+X)
|q/p|2 − |p/q|2
|q/p|2 + |p/q|2
. (14)
Although the B factories may have some sensitivity to this asymmetry, its measurement
is severely impaired by the fact that D mixing proceeds only very slowly, resulting in a
large suppression factor of the mixed vs. the unmixed rate:
Γ(D0 → ℓ−X)
Γ(D0 → ℓ+X)
x2 + y2
2 + x2 + y2
≈ 6× 10−5. (15)
Both in the K and the B system the quantity
− 1 (16)
is very small, which however need not necessarily be the case forD’s. From (3) one derives
the general expression
4 + r2 + 4r sin φ
4 + r2 − 4r sinφ
with r = |Γ12/M12| and the weak phase φ defined in (5). In the B system, one has r ≪ 1
(the current up-to-date numbers are r ≈ 7 × 10−3 for Bd and r ≈ 5 × 10−3 for Bs [16]),
so that upon expansion in r
= 1 +
sin φ+O(r2). (18)
Note that this formula refers to the definition φ = arg(M12/Γ12), which differs by +π
from the one used in Ref. [16], φ = arg(−M12/Γ12). For the K system, one finds r ≈
|∆Γ/∆M | ≈ 2 from experiment, but now the phase φ turns out to be small, so that
= 1 +
4 + r2
φ+O(φ2) ≈ 1 + φ. (19)
In both cases, |q/p| ≈ 1 to a very good approximation. In the D system, however, there
is no natural hierarchy r ≪ 1, and of course one hopes that NP-effects induce |φ| ≫ 0. In
-1.5 -1. -0.5 0. 0.5 1. 1.5
Figure 1: |q/p|2, Eq. (20), as a function of the CP-odd phase φ for the central experimental
value r̃ = 7.1/8.5. Solid line: full expression, dashed line: first order expansion around φ = 0.
this case, and because x and y have been measured, while |M12| and |Γ12| are difficult to
calculate, it is convenient to express |q/p|D in terms of x, y, φ, using the exact relations
(13). From (3), and defining r̃ = y/x, we then obtain
2(1 + r̃2)
2(1 + r̃2)2 + 16r̃2 tan2 φ
+8r̃ tanφ secφ
(1 + r̃2)2 − (1− r̃2)2 sin2 φ
. (20)
Note that for finite xy and φ = ±π/2, |q/p| diverges because xy → 0 for φ → ±π/2 from
(13). In Fig. 1 we plot |q/p|2 as function of φ, for the central experimental value from
HFAG, r̃ = 7.1/8.5, Eq. (11). It is obvious that even for moderate values of φ the small-φ
expansion is not really reliable.
What is the currently available experimental information on CP-violating in D mixing,
i.e. |q/p| and φ? As already mentioned, the semileptonic CP-asymmetry (14) has not
been measured yet. What has been measured, though, is the effect of CP-violation on the
time-dependent rates of D0 → K+π− and D̄0 → K−π+. The BaBar collaboration has
parametrised these rates as
Γ(D0(t) → K+π−) ∝ e−Γt
x′2+ + y
(Γt)2
Γ(D̄0(t) → K−π+) ∝ e−Γt
+ y′2
(Γt)2
and fit the D0 and D̄0 samples separately. They find [2]
= (9.8± 6.4(stat)± 4.5(syst))× 10−3,
= (9.6± 6.1(stat)± 4.3(syst))× 10−3. (22)
Adding errors in quadrature, this means y′
= 1.0±1.1. BaBar also obtains values for
which we do not quote here, because the sensitivity to the quadratic term in (21) is
-1.5 -1. -0.5 0. 0.5 1.
-1.5 -1. -0.5 0. 0.5 1. 1.5
Figure 2: Left: y′+/y
as function of φ for x/y = 1.2 (solid line) and x/y = {0.6, 1.8} (dashed
lines), from Eq. (11). δKπ = 0. Right: y
as function of φ for x/y = 1.2 for δKπ = 0 (solid
line) and δKπ = ±65◦ (dashed lines).
less than that to the linear term in y′
D is the ratio of the doubly Cabbibo-suppressed
to the Cabibbo-favoured amplitude, R
D = |A(D0 → K+π−)/A(D0 → K−π+)|. δKπ is
the relative strong phase in the Cabibbo-favoured and suppressed amplitudes:
A(D0 → K+π−)
A(D̄0 → K+π−)
−iδKπ ; (23)
the minus-sign comes from the relative sign between the CKM matrix elements Vcd and
Vus. In the limit of no CP-violation in the decay amplitude, one has |A(D0 → K−π+)| =
|A(D̄0 → K+π−)|, which is expected to be a very good approximation, in view of the fact
that the decay is solely due to a tree-level amplitude. Then the relation of y′
to x, y and
φ is given by
{(y cos δKπ − x sin δKπ) cosφ+ (x cos δKπ + y sin δKπ) sinφ} ,
{(y cos δKπ − x sin δKπ) cosφ− (x cos δKπ + y sin δKπ) sinφ} . (24)
Presently, the experimental result for y′+/y
is compatible with 1, although with con-
siderable uncertainties. Any significant deviation from 1 would be a sign for new physics.
In Fig. 2 we plot y′+/y
as function of φ, for different values of x/y and δKπ. The figures
clearly show that the value of y′
is very sensitive to the phase φ, at least if δKπ is not
too close to −65◦, which corresponds to the nearly constant dashed line in Fig. 2b. The
reason for this dependence on δKπ becomes clearer if y
is expanded to first order in
= 1− 2φ x(x
2 + 2y2) cos δKπ + y
3 sin δKπ
(x2 + y2)(x sin δKπ − y cos δKπ)
+O(φ2) . (25)
For the central values of x and y, Eq. (11), this amounts to 1+3.4φ for δKπ = 0, 1− 3.3φ
for δKπ = +65
◦ and 1 + 0.45φ for δKπ = −65◦, which explains the shape of the curves in
Fig. 2b. Evidently it is important to reduce the uncertainty of δKπ, which, as mentioned
x/ySM
Figure 3: Plot of |∆Γ/∆ΓSM|, Eq. (26), as a function of x/ySM and φ.
earlier, will be achieved within the next few years. On the other hand, as shown in
Fig. 2a, y′
, which depends only on the ratio x/y, but not x and y separately, is not
very sensitive to the precise value of that ratio, but very much so to φ. The conclusion
is that, even if x/y itself cannot be determined very precisely, y′
will nonetheless be
a powerful tool to constrain φ, at least once δKπ will be known more precisely. Already
now very large values φ ∼ π/2 are excluded.
Another, more theory-dependent constraint on φ can be derived from the value of y.
This argument centers around the fact that (a) the experimental result (11) is at the top
end of theoretical predictions ySM ∼ 1% [17] and (b) new physics indicated by a non-zero
value of φ always reduces the lifetime difference, independently of the value of x. This
observation is similar to what was found, some time ago, for the Bs system [18]. In order
to derive it, we assume that new physics does not affect Γ12,
1 so that Γ12 = Γ
12 . We then
have 2|Γ12| = ∆ΓSM and hence |ySM| = |Γ12|/Γ. Using the relations (13), we can then
express the ratio |∆Γ/∆ΓSM| in terms of ySM, x and φ:
+ x2/ cos2 φ
. (26)
This implies that new physics always reduces the lifetime difference, independently of the
value of x (and any new physics in the mass difference). In particular one has y = 0
for φ = ±π/2 and x 6= 0, which follows from the 2nd relation (13). Eq. (26) is the
manifestation of the fact that one does not need to observe CP-violation in order to
constrain it. A famous example for this is the unitarity triangle in B physics, whose
sides are determined from CP-conserving quantities only, but nonetheless allow a precise
measurement of the size of CP-violation in the SM, via the angles and the area of the
triangle. In Fig. 3, we plot |∆Γ/∆ΓSM| as a function of r = x/ySM. The zero at φ = ±π/2
is clearly visible. The experimental value |y/ySM| = O(1) then excludes phases φ close
to ±π/2. In order to make more quantitative statements, apparently a more precise
calculation of ySM is needed.
1See, however, Ref. [19] for a discussion of the effect of tiny NP admixtures to Γ12.
-1.5 -1. -0.5 0. 0.5 1. 1.5
5.yCP
-1.5 -1. -0.5 0. 0.5 1. 1.5
Figure 4: Left: yCP/y as function of φ, for x/y = 1.2 (solid line) and x/y = {0.6, 1.8} (dashed
lines), see Eq. (12). Right: AΓ/yCP as function of φ.
Two more CP-sensitive observables related to D0 → K+K− have been measured by
the Belle collaboration [3]:
yCP =
[Γ(D0 → K+K−) + Γ(D̄0 → K+K−)]− 1
y cosφ+
x sin φ, (27)
[Γ(D0 → K+K−)− Γ(D̄0 → K+K−)]− 1
y cos φ+
x sin φ. (28)
The present experimental value of yCP is given in (7), that for AΓ is (0.01± 0.30(stat)±
0.15(syst)) × 10−2. Again, we can study the dependence of these observables on φ. In
Fig. 4a we plot the ratio yCP/y, which is a function of x/y and φ, in dependence on φ. As
it turns out, this quantity is far less sensitive to φ than y′
, the reason being that its
deviation from 1 is only a second-order effect in φ:
yCP = y
1 + φ2
x4 + x2y2 − y4
2(x2 + y2)2
+O(φ4)
. (29)
Hence, unless the experimental accuracy is dramatically increased, and because the results
on y′
and y/ySM already exclude a large CP-odd phase φ ≈ ±π/2, it is safe to interpret
yCP as measurement of y. In Fig. 4b we plot the quantity AΓ/yCP. Also here there is a
distinctive dependence on φ, with AΓ/y ∝ φ for small φ, but the effect is less dramatic
than that in y′
In conclusion, we find that the experimental results on D mixing reported by BaBar
and Belle already exclude extreme values of the CP-odd phase φ close to ±π/2. This
follows from the result for y, which is close to the top end of theoretical predictions and
can only be reduced by new physics, and from y′
∼ 1. While y′
− 1 vanishes in
the limit of no CP-violation, y ∼ ∆Γ is a CP-conserving observable, which demonstrates
the usefulness of such quantities in constraining CP-odd phases. Also yCP, AΓ and the
ratio AΓ/yCP can be useful in constraining φ. As long as there is no major breakthrough
in theoretical predictions for D mixing, which are held back by the fact that the D meson
is at the same time too heavy and too light for current theoretical tools to get a proper
grip on the problem, the long-distance SM contributions to x will completely obscure any
NP contributions and their detection. The observation of CP violation, however, presents
a theoretically clean way for NP to manifest itself and it is to be hoped that in the near
future, i.e. at the B factories or the LHC, at least one of the plentiful opportunities for
NP to show up in CP violation [20] will be realised.
Acknowledgments
This work was supported in part by the EU networks contract Nos. MRTN-CT-2006-
035482, Flavianet, and MRTN-CT-2006-035505, Heptools.
References
[1] M. Staric (Belle), talk given at 42nd Rencontres de Moriond, Electroweak Interactions
and Unified Theories, La Thuile, Italy, March 2007;
K. Flood (BaBar), talk given at the same conference.
[2] B. Aubert et al. [BABAR Collaboration], arXiv:hep-ex/0703020.
[3] K. Abe [Belle Collaboration], arXiv:hep-ex/0703036.
[4] M. Ciuchini et al., arXiv:hep-ph/0703204.
[5] Y. Nir, arXiv:hep-ph/0703235.
[6] P. Ball, arXiv:hep-ph/0703245.
[7] M. Blanke et al., arXiv:hep-ph/0703254.
[8] X. G. He and G. Valencia, arXiv:hep-ph/0703270.
[9] C. H. Chen, C. Q. Geng and T. C. Yuan, arXiv:0704.0601.
[10] G. Burdman and I. Shipsey, Ann. Rev. Nucl. Part. Sci. 53, 431 (2003)
[arXiv:hep-ph/0310076].
[11] D. Asner, review on D mixing in W. M. Yao et al. [Particle Data Group], J. Phys.
G 33 (2006) 1;
I. Shipsey, Int. J. Mod. Phys. A 21 (2006) 5381 [arXiv:hep-ex/0607070];
A. A. Petrov, Int. J. Mod. Phys. A 21 (2006) 5686 [arXiv:hep-ph/0611361].
http://arxiv.org/abs/hep-ex/0703020
http://arxiv.org/abs/hep-ex/0703036
http://arxiv.org/abs/hep-ph/0703204
http://arxiv.org/abs/hep-ph/0703235
http://arxiv.org/abs/hep-ph/0703245
http://arxiv.org/abs/hep-ph/0703254
http://arxiv.org/abs/hep-ph/0703270
http://arxiv.org/abs/0704.0601
http://arxiv.org/abs/hep-ph/0310076
http://arxiv.org/abs/hep-ex/0607070
http://arxiv.org/abs/hep-ph/0611361
[12] W. M. Sun [CLEO Collaboration], AIP Conf. Proc. 842 (2006) 693
[arXiv:hep-ex/0603031];
D. Asner et al. [CLEO Collaboration], Int. J. Mod. Phys. A 21 (2006) 5456
[arXiv:hep-ex/0607078].
[13] D. Asner, talk given at workshop Flavour Physics in the Era of the LHC, CERN,
March 07, http://mlm.home.cern.ch/mlm/FlavLHC.html.
[14] X. D. Cheng et al., arXiv:arXiv:0704.0120.
[15] D. N. Gao, Phys. Lett. B 645 (2007) 59 [arXiv:hep-ph/0610389].
[16] A. Lenz and U. Nierste, arXiv:hep-ph/0612167.
[17] H. Georgi, Phys. Lett. B 297, 353 (1992) [arXiv:hep-ph/9209291];
T. Ohl, G. Ricciardi and E. H. Simmons, Nucl. Phys. B 403, 605 (1993)
[arXiv:hep-ph/9301212];
I. I. Y. Bigi and N. G. Uraltsev, Nucl. Phys. B 592, 92 (2001) [arXiv:hep-ph/0005089];
A. F. Falk, Y. Grossman, Z. Ligeti and A. A. Petrov, Phys. Rev. D 65, 054034 (2002)
[arXiv:hep-ph/0110317];
A. F. Falk et al., Phys. Rev. D 69, 114021 (2004) [arXiv:hep-ph/0402204].
[18] Y. Grossman, Phys. Lett. B 380 (1996) 99 [arXiv:hep-ph/9603244].
[19] E. Golowich, S. Pakvasa and A. A. Petrov, arXiv:hep-ph/0610039.
[20] P. Ball, J. M. Frere and J. Matias, Nucl. Phys. B 572 (2000) 3
[arXiv:hep-ph/9910211];
P. Ball and R. Zwicky, JHEP 0604 (2006) 046 [arXiv:hep-ph/0603232];
P. Ball and R. Fleischer, Eur. Phys. J. C 48, 413 (2006) [arXiv:hep-ph/0604249];
P. Ball and R. Zwicky, Phys. Lett. B 642 (2006) 478 [arXiv:hep-ph/0609037];
P. Ball, G. W. Jones and R. Zwicky, Phys. Rev. D 75 (2007) 054004
[arXiv:hep-ph/0612081].
http://arxiv.org/abs/hep-ex/0603031
http://arxiv.org/abs/hep-ex/0607078
http://mlm.home.cern.ch/mlm/FlavLHC.html
http://arxiv.org/abs/0704.0120
http://arxiv.org/abs/hep-ph/0610389
http://arxiv.org/abs/hep-ph/0612167
http://arxiv.org/abs/hep-ph/9209291
http://arxiv.org/abs/hep-ph/9301212
http://arxiv.org/abs/hep-ph/0005089
http://arxiv.org/abs/hep-ph/0110317
http://arxiv.org/abs/hep-ph/0402204
http://arxiv.org/abs/hep-ph/9603244
http://arxiv.org/abs/hep-ph/0610039
http://arxiv.org/abs/hep-ph/9910211
http://arxiv.org/abs/hep-ph/0603232
http://arxiv.org/abs/hep-ph/0604249
http://arxiv.org/abs/hep-ph/0609037
http://arxiv.org/abs/hep-ph/0612081
ABSTRACT
  Both BaBar and Belle have found evidence for a non-zero width difference in
the $D^0$-$\bar D^0$ system. Although there is no direct experimental evidence
for CP-violation in $D$ mixing (yet), we show that the measured values of the
width difference $y\sim \Delta\Gamma$ already imply constraints on the
CP-violating phase in $D$ mixing, which, if significantly different from zero,
would be an unambiguous signal of new physics.

<|endoftext|><|startoftext|>
Introduction
We consider the flow of an incompressible fluid in a open bounded polyhedral
set Ω ⊂ R2 during the time interval [0, T ]. The velocity field u : Ω× [0, T ] →
2 and the pressure field p : Ω × [0, T ] → R satisfy the Navier-Stokes equa-
tions
∆u+ (u ·∇)u+∇p = f , (1)
div u = 0 , (2)
with the boundary and initial condition
u|∂Ω = 0 , u|t=0 = u0.
The terms ∆u and (u ·∇)u are associated with the physical phenomena of
diffusion and convection, respectively. The Reynolds number Re measures the
S. Zimmermann
17 rue Barrème, 69006 Lyon - FRANCE
Tel.: (+33)0472820337
E-mail: Sebastien.Zimmermann@ec-lyon.fr
http://arxiv.org/abs/0704.0787v1
2 Sébastien Zimmermann
influence of convection in the flow. For equations (1)–(2), finite element and
finite difference methods are well known and mathematical studies are avail-
able (see [9] for example). For finite volume schemes, numerous computations
have been conducted ([12] and [1] for example). However, few mathematical
results are available in this case. Let us cite Eymard and Herbin [5] and
Eymard, Latché and Herbin [6]. In order to deal with the incompress-
ibility constraint, these works use a penalization method. Another way is
to use the projection methods which have been introduced by Chorin [4]
and Temam [13]. This is the case in Faure [8] where the mesh is made of
squares. In Zimmermann [14] the mesh is made of triangles, which allows
more complex geometries. In the present paper the mesh is also made of tri-
angles, but we consider a different discretisation for the pressure. It leads to
a linear system with a better-conditioned matrix. The layout of the article is
the following. We first introduce (section 2.1) some notations and hypotheses
on the mesh. We define (section 2.2) the spaces we use to approximate the
velocity and pressure. We define also (section 2.3) the operators we use to
approximate the differential operators in (1)–(2). By combining this with a
projection method, we build the scheme in section 3. In order to provide a
mathematical analysis, we state in section 4 that the differential operators in
(1)–(2) and their discrete counterparts share similar properties. In particular,
the discrete operators for the gradient and the divergence are adjoint. We
then prove in section 5 the convergence of the scheme.
We conclude with some notations. We denote by χI the characteristic func-
tion of an interval I ⊂ R. We denote by C∞0 = C∞0 (Ω) the set of the functions
with a compact support in Ω. The spaces (L2, |.|) and (L∞, ‖.‖∞) are the
usual Lebesgue spaces and we set L20 = {q ∈ L2 ;
q dx = 0}. Their vectorial
counterparts are (L2, |.|) and (L∞, ‖.‖∞) with L2 = (L2)2 and L∞ = (L∞).
For k ∈ N∗, (Hk, ‖ · ‖k) is the usual Sobolev space. Its vectorial counterpart
is (Hk, ‖.‖k) with Hk = (Hk)2. For k = 1, the functions of H1 with a null
trace on the boundary form the space H10. Also, we set ∇u = (∇u1,∇u2)T
if u = (u1, u2) ∈ H1. If X ⊂ L2 is a Banach space, we define C(0, T ;X)
(resp. L2(0, T ;X)) as the set of the applications g : [0, T ] → X such that
t → |g(t)| is continous (resp. square integrable). The norm ‖.‖C(0,T ;X) is
defined by ‖g‖C(0,T ;X) = sups∈[0,T ] |g(s)|. Finally in all calculations, C is a
generic positive constant, depending only on Ω, u0 and f .
2 Discrete setting
First, we introduce the spaces and operators needed to build the mesh.
2.1 The mesh
Let Th be a triangular mesh of Ω: Ω = ∪K∈ThK. For each triangle K ∈ Th,
we denote by |K| its area and EK the set of his edges. If σ ∈ EK , nK,σ is the
unit vector normal to σ pointing outwards of K.
The set of edges of the mesh is Eh = ∪K∈ThEK . The length of an edge σ ∈ Eh
is |σ|. The set of edges inside Ω (resp. on the boundary) is E inth (resp. Eexth ):
Convergence of a finite volume scheme for the incompressible fluids 3
Eh = E inth ∪ Eexth . If σ ∈ E inth , Kσ and Lσ are the triangles sharing σ as an
edge. If σ ∈ Eexth , only the triangle Kσ inside Ω is defined.
We denote by xK the circumcenter of a triangle K. We assume that the
measure of all interior angles of the triangles of the mesh are below π
that xK ∈ K. If σ ∈ E inth (resp. σ ∈ Eexth ) we set dσ = d(xKσ ,xLσ ) (resp.
dσ = d(xσ,xKσ )). We define for all edge σ ∈ Eh: τσ =
. The maximum
circumradius of the triangles of the mesh is h. We assume that there exists
C > 0 such that
∀σ ∈ Eh, d(xKσ , σ) ≥ C|σ| and |σ| ≥ Ch.
It implies that there exists a constant C > 0 such that for all edge σ ∈ Eh
τσ ≥ C (3)
and for all triangles K ∈ Th we have (with σ ∈ EK and hK,σ the matching
altitude)
|K| = 1
|σ|hK,σ ≥
|σ| d(xK ,xσ) ≥ C h2. (4)
2.2 The discrete spaces
We first define
P0 = {q ∈ L2 ; ∀K ∈ Th, q|K is a constant} , P0 = (P0)2.
For the sake of concision, we set for all qh ∈ P0 (resp. vh ∈ P0) and all
triangle K ∈ Th: qK = qh|K (resp. vK = vh|K). Although P0 6⊂ H1, we
define the discrete equivalent of a H1 norm as follow. For all vh ∈ P0 we set
‖vh‖h =
σ∈Eint
τσ |vLσ − vKσ |2 +
σ∈Eext
τσ |vKσ |2
. (5)
We have [7] a discrete Poincaré inequality for P0: there exists C > 0 such
that for all vh ∈ P0
|vh| ≤ C ‖vh‖h. (6)
From the norm ‖.‖h we deduce a dual norm. For all vh ∈ P0 we set
‖vh‖−1,h = sup
(vh,ψh)
‖ψh‖h
. (7)
For all uh ∈ P0 and vh ∈ P0 we have (uh,vh) ≤ ‖uh‖−1,h ‖vh‖h. We
define the projection operator ΠP0 : L
2 → P0 as follows. For all w ∈ L2,
ΠP0w ∈ P0 is given by
∀K ∈ Th , (ΠP0w)|K =
w(x) dx. (8)
4 Sébastien Zimmermann
We easily check that for all w ∈ L2 and vh ∈ P0 we have (ΠP0w,vh) =
(w,vh). We deduce from this that ΠP0 is stable for the L
2 norm. We define
also the operator Π̃P0 : H
2 → P0. For all v ∈ H2, Π̃P0v ∈ P0 is given by
∀K ∈ Th , Π̃P0v|K = v(xK).
According to the Sobolev embedding theorem, v ∈ H2 is a.e. equal to a
continuous function. Therefore the definition above makes sense. One checks
[14] that there exists C > 0 such that
|v − Π̃P0v| ≤ C h ‖v‖2 (9)
for all v ∈ H1. We introduce also the finite element spaces
1 = {v ∈ L2 ; ∀K ∈ Th, v|K is affine} ,
Pnc1 = {vh ∈ P d1 ; ∀σ ∈ E inth , vh|Kσ (xσ) = vh|Lσ(xσ) , Pnc1 = (Pnc1 )2.
If qh ∈ Pnc1 , we have usually ∇qh 6∈ L2. Therefore we define the operator
∇h : Pnc1 → P0 by setting for all qh ∈ P0 and all triangle K ∈ Th
∇hqh|K =
∇qh dx. (10)
We define the projection operator ΠPnc
. For all q ∈ H1, ΠPnc
q is given by
∀σ ∈ Eh ,
(ΠPnc
q) dσ =
q dσ. (11)
We also set ΠPnc
= (ΠPnc
)2. One checks that there exists C > 0 such that
∣∣∇q −∇h(ΠPnc
∣∣ ≤ C h ‖q‖2 , |v −ΠPnc
v| ≤ C h ‖v‖1 , (12)
for all q ∈ H1 and v ∈ H1.
We also use the Raviart-Thomas spaces
= {vh ∈ Pd1 ; ∀σ ∈ EK , vh|K · nK,σ is constant, and vh · n|∂Ω = 0} ,
RT0 = {vh ∈ RTd0 ; ∀K ∈ Th, ∀σ ∈ EK , vh|Kσ · nKσ,σ = vh|Lσ · nKσ,σ}.
For all vh ∈ RT0, K ∈ Th and σ ∈ EK we set (vh · nK,σ)σ = vh|K · nK,σ.
We define the operator ΠRT0 : H
1 → RT0. For all v ∈ H1, ΠRT0v ∈ RT0
is given by
∀K ∈ Th , ∀σ ∈ EK , (ΠRT0v · nK,σ)σ =
v dσ. (13)
One checks [3] that there exists a constant C > 0 such that for all v ∈ H1
|v −ΠRT0v| ≤ C h ‖v‖1. (14)
Convergence of a finite volume scheme for the incompressible fluids 5
2.3 The discrete operators
The equations (1)–(2) use the differential operators gradient, divergence and
laplacian. Using the spaces of section 2.2, we now define their discrete coun-
terparts. The discrete gradient ∇h : Pnc1 → P0 is defined by (10). The
discrete divergence operator divh : P0 → Pnc1 is built so that it is adjoint
to the operator ∇h (proposition 3 below). We set for all vh ∈ P0 and all
triangle K ∈ Th
∀σ ∈ E inth , (divh vh)(xσ) =
3 |σ|
|Kσ|+ |Lσ|
(vLσ − vKσ ) · nK,σ ;
∀σ ∈ Eexth , (divh vh)(xσ) = −
3 |σ|
|Kσ|+ |Lσ|
vKσ · nK,σ. (15)
The first discrete laplacian ∆h : P
1 → Pnc1 is given by
∀ qh ∈ Pnc1 , ∆hqh = divh(∇hqh).
The second discrete laplacian ∆̃h : P0 → P0 is the usual operator in finite
volume schemes [7]. We set for all vh ∈ P0 and all triangle K ∈ Th
∆̃hvh|K =
σ∈EK∩E
τσ (vLσ − vKσ)−
σ∈EK∩E
τσ vKσ . (16)
In order to approximate the convection term (u · ∇)u in (1), we define a
bilinear form b̃h : RT0 ×P0 → P0 using the well-known [7] upwind scheme.
For all uh ∈ P0, vh ∈ P0, and all triangle K ∈ Th we set
b̃h(uh,vh)
σ∈EK∩E
(u · nK,σ)+σ vK + (u · nK,σ)−σ vLσ
We have set a+ = max(a, 0), a− = min(a, 0) for all a ∈ R. Lastly, we define
the trilinear form bh : RT0 × P0 × P0 → R as follows. For all uh ∈ RT0,
vh ∈ P0, wh ∈ P0, we set
bh(uh,vh,wh) =
|K|wK · b̃h(uh,vh)
3 The scheme
We have defined in section 2 the discretization in space. We now have to
define the discretization in time, and treat the incompressibility constraint
(2). We use a projection method to this end. This kind of method has been
introduced by Chorin [4] and Temam [13]. The time interval [0, T ] is split
with a time step k: [0, T ] =
n=0[tn, tn+1] with N ∈ N∗ et tn = n k for all
n ∈ {0, . . . , N}. We start with the initial values
u0h ∈ P0 ∩RT0 , u1h ∈ P0 ∩RT0 , p1h ∈ Pnc1 ∩ L20.
For all n ∈ {1, . . . , N}, (ũn+1h , p
h ) is deduced from (ũ
h) as
follows.
6 Sébastien Zimmermann
– ũn+1h ∈ P0 is given by
3 ũn+1h − 4unh + u
∆̃hũ
+ b̃h(2u
h − un−1h , ũ
h ) +∇hp
h = f
h , (17)
– pn+1h ∈ Pnc1 ∩ L20 is the solution of
h − p
divh ũ
– un+1h ∈ P0 is given by
un+1h = ũ
∇h(pn+1h − p
h). (18)
We have proven in [14] that the scheme is well defined. In particular the term
b̃h(2u
h − u
h ) in (17) is defined thanks to the following result.
Proposition 1 For m ∈ {0, . . . , N} we have umh ∈ RT0.
Note also that for m ∈ {0, . . . , N} we have divumh = 0, since umh ∈ P0. Thus
the incompressibility condition (2) is fullfilled.
4 Properties of the discrete operators
The operators defined in section 2.3 have the following properties [14].
Proposition 2 There exists a constant C > 0 such that for all uh ∈ RT0
satisfying divuh = 0, vh ∈ P0, wh ∈ P0:
|bh(uh,vh,vh)| ≤ C |uh| ‖vh‖h ‖wh‖h.
Proposition 3 For all vh ∈ P0 and qh ∈ Pnc1 : (vh,∇hqh) = −(qh, divh vh).
Proposition 4 For all uh ∈ P0 and vh ∈ P0: −(∆̃huh,uh) = ‖uh‖2h and
−(∆̃huh,vh) ≤ ‖uh‖h ‖vh‖h.
If v ∈ H1 we have |divv| ≤ ‖v‖1. The operator divh has a similar property.
Proposition 5 There exists a constant C > 0 such that for all vh ∈ P0
|divh vh| ≤ C ‖vh‖h.
Proof. Using a quadrature formula we have
|divh vh|2 =
|(divh vh)(xσ)|2 .
Convergence of a finite volume scheme for the incompressible fluids 7
Let K ∈ Th. Using definition and (4) we have
∀σ ∈ EK ∩ E inth , |(divh vh)(xσ)|
|K| |vLσ − vK |
∀σ ∈ EK ∩ Eexth , |(divh vh)(xσ)|
2 ≤ C
|vK |2.
Thus: |divh vh|2 ≤ C
σ∈EK∩E
|vLσ − vK |2 +
σ∈EK∩E
|vK |2
Writing the sum over the triangles as a sum over the edges, we get
|divh vh|2 ≤ C
σ∈Eint
τσ |vLσ − vK |2 +
σ∈Eext
τσ |vK |2
 ≤ C ‖vh‖2h.
Proposition 6 If uh ∈ P0 and vh ∈ P0 we have (∆̃huh,vh) = (uh, ∆̃hvh).
Proof. Using definition (2.3) one checks that
(∆̃huh,vh) =
σ∈Eint
τσ (vLσ − vKσ ) · (uLσ − uKσ)−
σ∈Eext
τσ vKσ · uKσ
= (∆̃hvh,uh).
Proposition 7 There exists C > 0 such that for all v ∈ H2 satisfying
∇v · n|∂Ω = 0
‖ΠP0(∆v)−∆h(Π̃P0v)‖−1,h ≤ C h ‖v‖2.
Proof. Let ψh ∈ P0. We have
ΠP0(∆q)−∆h(Π̃P0v),ψh
ΠP0(∆q)−∆h(Π̃P0v)
·ψK .
For all K ∈ Th, using (2.3) and the divergence formula, we get
ΠP0(∆v) −∆h(Π̃P0v)
) ∣∣∣
σ∈EK∩E
∇v · nK,σ dσ − τσ
v(xLσ )− v(xK)
Thus, by writing the sum over the triangles as a sum over the edges, we get
ΠP0(∆v)−∆h(Π̃P0v),ψh
σ∈Eint
(ψLσ −ψKσ)Rσ
with Rσ =
∇v · nKσ,σ − 1dσ
v(xLσ ) − v(xKσ )
dσ. We denote by Dσ
the quadrilatere defined by xKσ , xLσ and the endpoints of σ. Using a Taylor
expansion and a density argument, we get as in [7] that
|Rσ| ≤ C h
|H(vi)(y)|2 dy
8 Sébastien Zimmermann
Thus, using the Cauchy-Schwarz inequality, we get
ΠP0(∆q)−∆h(Π̃P0v),ψh
≤ C h
σ∈Eint
|ψLσ −ψKσ |
σ∈Eint
|H(vi)(y)|2 dy
According to (3)
σ∈Eint
|ψLσ −ψKσ |
2 ≤ C
σ∈Eint
τσ |ψLσ −ψKσ |
2 ≤ C ‖ψ‖2h.
ΠP0(∆v)−∆h(Π̃P0v),ψh
)∣∣∣ ≤ C h ‖ψh‖h ‖v‖2. Using (7) we get
the result.
5 Convergence of the scheme
We first recall the stability result that has been proven in [15]. We deduce
from it an estimate on the Fourier transform of the computed velocity (lemma
1). Using a result on space P0, we infer from it the convergence of the
scheme (theorem 2). One shows that if the data u0 et f fulfill a compati-
bility condition [11], there exists a solution (u, p) to equations (1)–(2) such
that u ∈ C(0, T ;H2) , ∇p ∈ C(0, T ;L2). We assume from now on that there
exists C > 0 such that
(HI) |u0h−u0|+
‖u1h−u(t1)‖∞+|p1h−p(t1)| ≤ C h , |u1h−u0h| ≤ C k.
Let us recall the following result [15].
Theorem 1 We assume that the initial values of the scheme fulfill (HI).
There exists a constant C > 0 such that for all m ∈ {2, . . . , N}
|umh |+ k
‖ũnh‖2h +
|umh − um−1h |+
‖ũnh − ũn−1h ‖
h ≤ C
|pnh|2 + k |∇hpmh |+ |∇h(pmh − pm−1h )| ≤ C.
From now on we set ũ1h = u
h for the sake of conveniance. One deduces from
hypothesis (HI) [14] that |p1h| ≤ C and ‖ũ1h‖h ≤ C. Now, let ε = max(h, k).
We study the behaviour of the scheme as ε → 0. We define the applications
uε : R → P0 , ũε : R → P0 , ũcε : R → P0, pε : R → Pnc1 and fε : R → P0 as
follows. For all n ∈ {0, . . . , N − 1} and all t ∈ [tn, tn+1] we set
uε(t) = u
h , ũε(t) = ũ
h , ũ
ε(t) = ũ
(t− tn) (ũn+1h − ũ
pε(t) = p
h, fε(t) = ΠP0 f(tn+1) ,
Convergence of a finite volume scheme for the incompressible fluids 9
and for all t 6∈ [0, T ] we set uε(t) = ũε(t) = ũcε(t) = fε(t) = 0, pε(t) = 0. We
recall that the Fourier transform v̂ of a function v ∈ L1(R) is defined by
∀ τ ∈ R, v̂(τ) =
e−2iπτt v(t) dt. (19)
We have the following result.
Lemma 1 Let 0 < γ < 1
. There exists C > 0 such that for all ε > 0
|τ |2γ |̂̃uε(τ)|2 dτ ≤ C.
Proof. Let χI be the characteristic function of an interval I ⊂ R. We define
the application gε : R → P0 as follows. For all t 6∈ [t1, T ] we set gε(t) = 0.
For all t ∈ [t1, T ], gε(t) ∈ P0 is the solution of
∆̃hgε = ∆̃hũε + fε − b̃h
2uε − uε(t− k), ũε
with Pε = −∇hpε− 2 ũε(t−k)−uεk +
ũε(t−2 k)−uε(t−k)
χ[t2,T ]. We have omitted
most of the time dependancies for the sake of concision. Let us estimate gε.
We have
−(∆̃hgε,gε) = −(∆̃hũε,gε)− (fε,gε) (20)
2uε − uε(t− k), ũε,gε
+ (Pε,gε).
According to proposition 4 we have
−(∆̃hgε,gε) = ‖gε‖2h , −(∆̃hũε,gε) ≤ ‖ũε‖h ‖gε‖h.
Using the Cauchy-Schwarz inequality and (6) we have
−(fε,gε) ≤ |fε| |gε| ≤ C |fε| ‖gε‖h.
According to proposition 2 and theorem 1
2uε − uε(t− k), ũε,gε
≤ C |2uε − uε(t− k)| ‖ũε‖h ‖gε‖h ≤ C ‖ũε‖h ‖gε‖h.
Using (18) we have
Pε = −
χ[t3,T ]
∇hpε+
χ[t3,T ]∇hpε(t−k)−
χ[t3,T ] ∇hpε(t−2 k).
Using proposition 3 and the Cauchy-Schwarz inequality we get
|(Pε,gε)| ≤ C
|pε|+ χ[t3,T ] |pε(t− k)|+ χ[t3,T ] |pε(t− 2 k)|
|divh gε|.
Using proposition 5 we have
|(Pε,gε)| ≤ C
|pε|+ χ[t3,T ] |pε(t− k)|+ χ[t3,T ] |pε(t− 2 k)|
‖gε‖h.
10 Sébastien Zimmermann
Let us plug these estimates into (20). By simplifying by ‖gε‖h and integrating
from t = t1 to T we get
‖gε‖h dt ≤ C + C
|pε| dt+
|fε| dt+
‖ũε‖h dt
According to the Cauchy-Schwarz inequality and theorem 1
|pε(t)| dt ≤
|pε(t)|2 dt
|pnh|2
Thanks to the stability of ΠP0 for the L
2 norm we have
|fε(t)| dt = k
|ΠP0 f(tn)| ≤ k
|f(tn)| ≤ k
‖f‖C(0,T ;L2) ≤ C.
And thanks to the Cauchy-Schwarz inequality and theorem 1
‖ũε(t)‖h dt ≤
‖ũε(t)‖2h dt ≤ C k
‖ũnh‖2h ≤ C.
Thus, since gε(t) = 0 for t ∈ [0, t1], we get
‖gε(t)‖h =
‖gε(t)‖h dt ≤ C.
Using definition (19) we obtain finally
∀ τ ∈ R , ‖ĝε(τ)‖h ≤ C. (21)
With this estimate we can now prove the result. Since the function ũcε is
piecewise C1 on R, and discontinous for t = 0 and t = T , equation (17) reads
dũcε
dũcε
(t− k) = ∆hgε +
(ũ0h δ0 − ũNh δT )−
(ũ1h δt1 − ũNh δT+k)
ũ1h − ũ0h
χ[0,t1] −
ũNh − ũ
χ[T,T+k]
where δ0, δt1 , δT and δT+k are Dirac distributions located respectively in 0,
t1, T and T + k. Let τ ∈ R. Applying the Fourier transform we get
−2iπτ
−2iπτk
̂̃uε(τ) = ∆hĝε(τ) +α
(ũ0h − ũNh e−2iπT )−
(ũ1h − ũNh e−2iπT )
e−2iπk − 1
ũ1h − ũ0h
ũNh − ũ
−2iπT
e−2iπk − 1
Convergence of a finite volume scheme for the incompressible fluids 11
Taking the scalar product with i ̂̃uε(τ) we get
e−2iπτk
|̂̃uε(τ)|2 = i
∆hĝε(τ), ̂̃uε(τ)
α, ̂̃uε(τ)
Let us bound the right-hand side. According to proposition 4 and (21)
∆hĝε(τ), ̂̃uε(τ)
)∣∣∣ ≤ ‖ĝε(τ)‖h ‖̂̃uε(τ)‖h ≤ C ‖̂̃uε(τ)‖h.
On the other hand, using theorem 1, one checks that α is bounded. Thus,
according to the Cauchy-Schwarz inequality and (6)
α, ̂̃uε(τ)
)∣∣∣ ≤ |α| |̂̃uε(τ)| ≤ C |α| ‖̂̃uε(τ)‖h ≤ C ‖̂̃uε(τ)‖h.
Hence we have
∀ τ ∈ R, |τ | |̂̃uε(τ)|2 ≤ C ‖̂̃uε(τ)‖h.
If τ 6= 0, multiplying this estimate by |τ |2 γ−1, we get |τ |2 γ |̂̃uε(τ)|2 ≤
C |τ |2 γ−1 ‖̂̃uε(τ)‖h. Using the Young inequality and integrating over {τ ∈
R ; |τ | > 1} we obtain
|τ |>1
|τ |2 γ |̂̃uε(τ)|2 dτ ≤
|τ |>1
|τ |4 γ−2 dτ + C
|τ |>1
‖̂̃uε(τ)‖2h dτ.
For |τ | ≤ 1 we have |τ |2 γ |̂̃uε(τ)|2 ≤ |̂̃uε(τ)|2 ≤ C ‖̂̃uε(τ)‖2h thanks to (6).
|τ |2 γ |̂̃uε(τ)|2 dτ ≤
|τ |>1
|τ |4 γ−2 dτ + C
‖̂̃uε(τ)‖2h dτ.
Since 4 γ−2 < −1 we have
|τ |>1
|τ |4 γ−2 dτ ≤ C. On the other hand, thanks
to the Parseval theorem and thorem 1
‖̂̃uε(τ)‖2h dτ ≤
‖̂̃uε(τ)‖2h dt ≤ k
‖ũnh‖2h ≤ C.
Hence the result.
We introduce the following spaces
H = {v ∈ L2 ; divv ∈ L2 et v · n|∂Ω = 0} , V = {v ∈ H10 ; divv = 0}.
We also set
((u,v)) =
(∇ui,∇vi) , b(u,v,w) = −
(vi,u · ∇wi)
for all u = (u1, u2) ∈ H1, v = (v1, v2) ∈ H1, w = (w1, w2) ∈ H1. We have
the following result.
12 Sébastien Zimmermann
Theorem 2 We assume that the initial values of the scheme fulfill hypothesis
(HI). We also assume that the space step h and the time step k are such
that h ≤ C kα with α > 1. Then we have uε → u in L2(0, T ;L2) with
u ∈ C(0, T ;H) ∩ L2(0, T ;V) ,
∈ L2(0, T ;L2). (22)
We also have u(0) = u0 and for all ψ ∈ C∞0 ([0, T ])
∀v ∈ V,
(u,v) + ((u,v)) + b(u,u,v) − (f ,v)
dt = 0. (23)
Proof. In what follows, sub-sequences of a sequence (vε)ε>0 will still be
noted (vε)ε>0 for the sake of convenience. All the limits are for ε → 0.
According to theorem 1 and hypothesis (HI) we have
‖uε‖2L2(0,T ;L2) = k (|u
h|2 + |u1h|2) + k
|unh|2 ≤ C.
We also deduce from (6), hypothesis (HI) and theorem 1
‖ũε‖2L2(0,T ;L2) = k |u
h|2 + k
|ũnh|2 ≤ C + C k
‖ũnh‖2h ≤ C.
A simple computation shows that there exists C > 0 such that
‖ũcε‖L2(0,T ;L2) ≤ C ‖ũε‖L2(0,T ;L2) ≤ C.
Thus the sequences (uε)ε>0, (ũε)ε>0 and (ũ
ε)ε>0 are bounded in L
2(0, T ;L2).
Therefore there exists u ∈ L2(0, T ;L2), ũ ∈ L2(0, T ;L2) and ũc ∈ L2(0, T ;L2)
such that, up to a sub-sequence, we have
uε ⇀ u , ũε ⇀ ũ , ũ
ε ⇀ ũ
c weakly in L2(0, T ;L2).
We claim that the limits u, ũ, ũc are the same. Indeed, let us consider uε−ũε.
Since unh − ũ
h = (u
h − u
h ) + (u
h − ũ
h ) we have
‖uε − ũε‖2L2(0,T ;L2) ≤ 2 k
|unh − un+1h |
2 + 2 k
|un+1h − ũ
According to theorem 1 we have k
n=0 |unh−u
h |2 ≤ C
n=0 k
3 ≤ C k2.
Thanks to (18) we also have
|un+1h − ũ
|∇h(pn+1h − p
h)|2 ≤ C
3 ≤ C k2.
Thus ‖uε − ũε‖L2(0,T ;L2) → 0 and u = ũ. One checks in a simililar way that
ũ = ũc. Now, using the Fourier transform, we prove the strong convergence
Convergence of a finite volume scheme for the incompressible fluids 13
of the sequence (uε)ε>0 in L
2(0, T ;L2). We set vε = uε −u. Let M > 0. We
use the splitting
|v̂ε(τ)|2 dτ =
|τ |≤M
|v̂ε(τ)|2 dτ +
|τ |>M
|v̂ε(τ)|2 dτ = IMε + JMε .
Let us estimate JMε . Since |v̂ε(τ)|2 ≤ 2 |ûε(τ)|2 + 2 |û(τ)|2 we have
JMε ≤ 2
|τ |>M
|ûε(τ)|2 dτ + 2
|τ |>M
|û(τ)|2 dτ.
According to lemma 1 we have
|τ |>M
|ûε(τ)|2 dτ ≤
|τ |>M
|τ |2γ |ûε(τ)|2 dτ ≤
|τ |>M
|û(τ)|2 dτ.
Therefore, for all ε > 0, we have JMε → 0 when M → ∞. We now consider
IMε . Let τ ∈ R. Since uε ⇀ u in L2(0, T ;L2), we deduce from definition (19)
̂̃uε(τ)⇀ û(τ) weakly in L2.
For all t ∈ R we have ũε(t) ∈ P0. From definition (19) we infer that ̂̃uε(τ) ∈
P0. Now, prolonging ̂̃uε(τ) by 0 outside Ω, we deduce from lemma 4 in [7]
that there exists a constant C > 0 such that
∀η∈ R2 , |̂̃uε(τ)(· + η)− ̂̃uε(τ)|2 ≤ ‖̂̃uε(τ)‖2h |η| (|η|+ C h).
Using definition (19), the Cauchy-Schwarz inequality and theorem 1, we have
‖̂̃uε(τ)‖2h ≤ C
‖ũε(t)‖2h dt ≤ C k
‖ũnh‖2h ≤ C.
Thus, using the compactness criterium given by theorem 1 in [7], we get
̂̃uε(τ) → û(τ) in L2. Thus ̂̃vε(τ) = ̂̃uε(τ)− û(τ) → 0 in L2. Therefore for all
M > 0 we have IMε → 0. Using the Parseval inequality, and gathering the
limits for IMε and J
ε , we get
|v̂ε(τ)|2 dτ =
|vε|2 dt =
|uε − u|2 dτ → 0.
We have proven that uε → u in L2(0, T ;L2).
We now check the properties of u. First, proceeding as in [7], one checks
easily that u ∈ L2(0, T ;H10). Now let q ∈ L2(0, T ; C∞0 ). According to (12) we
have ∇h(ΠPnc
q) → ∇q in L2(0, T ;L2). Since uε → u in L2(0, T ;L2) we get
∇h(ΠPnc
q),uε
→ (∇q,u) = −(q, divu).
14 Sébastien Zimmermann
On the other hand, according to propositions 1 and 3, we have for all ε > 0
∇h(ΠPnc
q),uε
= −(ΠPnc
q, divh uε) = 0.
Thus we have
q divu dt = 0 for all q ∈ L2(0, T ; C∞0 ). Since the space C∞0
is dense in L2, we get divu = 0. Hence u ∈ L2(0, T ;V). Let us now check
the regularity of
. Using hypothesis (HI), (6) and theorem 1, we have
dũcε
L2(0,T ;L2)
|u1h − u0h|2 + k
|δũnh|2
≤ C + C k
‖δũnh‖2h
Thus the sequence
dũcε
is bounded in L2(0, T ;L2). Since uε → u in
L2(0, T ;L2) with u ∈ L2(0, T ;H1), proceeding as in, we get
dũcε
weakly ∈ L2(0, T ;L2)
and u ∈ C(0, T ;H).
Let us now prove that u satisfies (23). For the sake of simplicity, we omit to
note some time dependencies. According to (17) we have for all t ∈ [t1, T ]
dũcε
dũcε
(t− k)− 1
∆hũε + b̃h
2uε − uε(t− k), ũε
χ[t3,T ]
∇hpε +
χ[t3,T ] ∇hpε(t− k)−
χ[t3,T ] ∇hpε(t− 2 k).
Let v ∈ V ∩ (C∞0 )2 and ψ ∈ C∞([0, T ]) with ψ(T ) = 0. We set vh = Π̃P0v.
Multiplying the former equation by ψ vh and integrating over [t1, T ] we get
dũcε
dũcε
(t− k),vh
dt− 1
ψ (∆̃hũε,vh) dt
2uε − uε(t− k), ũε,vh
ψ (fε,vh) dt
χψ (∇hpε,vh) dt (24)
with χ = −χ[t3,T−2 k] + 13 χ[t2,t3] −
χ[t1,t2] − 73 χ[T−k,T ] −
χ[T−2 k,T−k]. We
now check the limits of the terms in this equation. First, according to (9), we
have vh → v in L2. We will use this limit in the computations below without
mentioning it. Since ψ(T ) = 0 we obtain by integrating by parts
dũcε
dt− ψ(t1) (ũ1h,vh)−
ψ′ (ũcε,vh) dt
dũcε
(t− k),vh
dt = −ψ(0) (ũ0h,vh)−
∫ T−k
′(t+ k) (ũcε,vh) dt.
Convergence of a finite volume scheme for the incompressible fluids 15
According to hypothesis (HI) we have ũ0h = u
h → u0 in L2 and ũ1h =
u1h → u0 in L2. It implies that (u0h,vh) → (u0,v) and ψ(t1) (ũ1h,vh) =
ψ(k) (ũ1h,vh) → ψ(0) (u0,v). On the other hand
ψ′ (ũcε,vh) dt =
χ[t1,T ] ψ
′ (ũcε,vh) dt→
ψ′ (u,v) dt
and since χ[0,T−k] ψ
′(·+ k) → ψ′ in L∞(0, T )
∫ T−k
′(t+k) (ũcε,vh) dt =
χ[0,T−k] ψ
′(t+k) (ũcε,vh) dt→
′ (u,v) dt.
Thus we have
dũcε
dũcε
(t− k),vh
dt→ −ψ(0) (u,v)−
ψ′ (u,v) dt. (25)
Let us now consider the discrete laplacian. Using proposition 6 and the split-
ting ∆̃hvh =
∆̃hvh −ΠP0(∆̃v)
+ΠP0(∆̃v) we have
∆̃hũε,vh) dt = Aε +Bε
with Aε =
ũε, ∆̃h(Π̃P0v)−ΠP0(∆̃v)
dt,Bε =
ũε, ΠP0(∆v)
Since
|Aε| ≤ ‖∆̃h(Π̃P0v) −ΠP0(∆̃v)‖−1,h
ψ ‖ũε‖h dt ,
using proposition 7 and the Cauchy-Schwarz inequality, we get
|Aε| ≤ C h ‖v‖2
ψ ‖ũε‖h dt ≤ C h
ψ2 dt
)1/2(∫ T
‖ũε‖2h dt
Therefore, using theorem 1: |Aε| ≤ C h
n=1 ‖ũnh‖2h
≤ C h. Hence
Aε → 0. On the other hand, using an integration by parts, we have
ψ (ũε, ∆v) dt →
ψ (u, ∆v) dt = −
ψ ((u,v)) dt.
By gathering the limits for Aε and Bε we get
ψ (∆̃hũε,vh) dt → −
ψ ((u,v)) dt.
Let us now consider the pressure. We use the splitting
(∇hpε,vh) = (∇hpε,vh − v) + (∇hpε,v −ΠRT0v) + (∇hpε, ΠRT0v). (26)
16 Sébastien Zimmermann
First, integrating by parts, we have
(∇hpε, ΠRT0v) = −
pε, div (ΠRT0v)
pε (ΠRT0v · nK,σ).
Since divv = 0, using the divergence formula and definition (13), one checks
that div (ΠRT0v) = 0. Thus −
pε, div (ΠRT0v)
= 0. On the other hand
pε (ΠRT0v ·nK,σ) =
σ∈Eint
(ΠRT0v)σ ·nKσ ,σ)
(pε|Lσ −pε|Kσ ) dσ
and since pε ∈ Pnc1 we get
pε (ΠRT0v · nK,σ) = 0. Thus the last
term in (26) vanishes. To bound the other terms, we use the Cauchy-Schwartz
inequality together with estimates (9), (14) and theorem 1. We get
|(∇hpε,v − vh)|+ |(∇hpε,v −ΠRT0v)| ≤ C h |∇hpε| ‖v‖2 ≤ C
‖v‖2.
Plugging these estimates into (26) we get
|χψ (∇hpε,vh)| dt ≤ C hk . By
hypothesis we have h
≤ kα−1 with α− 1 > 0. Thus for ε = max(h, k) → 0
χψ (∇hpε,vh) dt→ 0.
Let us now consider the convection term. We set uε = 2uε − uε(t − k)
and want to find the limit of
ψ bh(uε, ũε,vh) dt. We use the splitting
−bh(uε, ũε,vh) + b(u,u,v) = Aε1 +Aε2 +Aε3 with
Aε1 = b(u− uε,u,v) , Aε2 = b(uε,u,v) −
div(ui uε), v
Aε3 =
div(ui uε), v
− bh(uε,uε,vh).
By definition Aε1 = −
ui, (u − uε) · ∇vi
. Using the Cauchy-Schwarz
inequality we get
ψ |Aε1| dt ≤ ‖ψ‖∞ ‖v‖W1,∞ ‖u‖L2(0,T ;L2) ‖u− uε‖L2(0,T ;L2).
Since uε → u in L2(0, T ;L2) we also have ‖u − uε‖L2(0,T ;L2) → 0. Thus∫ T
ψAε1 dt → 0. Let us now consider Aε2. Since uε · n|∂Ω = 0 we obtain by
integrating by parts b(uε,u,v) =
vi, div(ui uε)
. Thus
Aε2 =
vi − vih, div(ui uε)
vi − vih,uε · ∇ui
Convergence of a finite volume scheme for the incompressible fluids 17
Using the Cauchy-Schwarz inequality we get
ψ |Aε2| dt ≤ C ‖ψ‖L∞ ‖v − vh‖L∞ ‖uε‖L2(0,T ;L2) ‖u‖L2(0,T ;H1).
Using a Taylor expansion one checks that ‖v − vh‖L∞ ≤ ‖v‖W1,∞ h. We
recall also that ‖uε‖L2(0,T ;L2) ≤ C. Therefore
ψAε2 dt → 0. Let us now
bound Aε3. For all triangle K ∈ Th and all edge σ ∈ EK ∩ E inth , we set
ũεK,Lσ =
ũε|K if (uε · nK,σ) ≥ 0
ũε|Lσ if (uε · nK,σ) < 0
Using the divergence formula one checks that
Aε3 =
σ∈EK∩E
(u− ũεK,Lσ ) (uε · nK,σ) dσ.
By writing this sum as a sum on the edges we get
Aε3 =
σ∈Eint
(vKσ − vLσ) ·
(u− ũεKσ ,Lσ) (uε · nKσ,σ) dσ.
Thus, using definition (11) and a quadrature formula
σ∈Eint
(vKσ − vLσ )
(ΠPnc
u− ũεKσ,Lσ) (uε · nKσ,σ) dσ
σ∈Eint
(vKσ − vLσ ) |σ|
(ΠPnc
u)(xσ)− ũεKσ,Lσ
(uε · nKσ,σ)σ.
We have |σ| ≤ h and, using a Taylor expansion, one checks that |vKσ−vLσ | ≤
h ‖v‖W1,∞ . Thus, thanks to the Cauchy-Schwarz inequality, we get
|Aε3| ≤ C h2
σ∈Eint
|uε(xσ)|2
σ∈Eint
|(ΠPnc
u)(xσ)− ũεKσ ,Lσ |
Using (4) we get
|Aε3| ≤ C
σ∈EK∩E
|uε(xσ)|2
σ∈EK∩E
|(ΠPnc
u)(xσ)− ũε|K |2
Therefore, using a quadrature formula
|Aε3| ≤ C |uε| |ΠPnc1 u− uε| ≤ C |uε| |ΠPnc1 u− uε|.
18 Sébastien Zimmermann
By writing ΠPnc
u − uε = (ΠPnc
u − u) + (u − uε) and using (12), we get
|Aε3| ≤ C |uε| (h ‖u‖1+ |u−uε|). Thus, using the Cauchy-Schwarz inequality,
we have
|ψ| |Aε3| dt ≤ C ‖uε‖L2(0,T ;L2) (h ‖u‖L2(0,T ;H1) + ‖u− uε‖L2(0,T ;L2)).
ψAε3 dt → 0. By gathering the limits for Aε1, Aε2, Aε3 we obtain∫ T
bh(uε,uε,vh)− b(u,u,v)
dt→ 0. Since
b(u,u,v) dt → 0, we get
bh(uε,uε,vh), dt →
ψ b(u,u,v) dt.
Finally, since vh ∈ P0, we have: (fε,vh) = (ΠP0 f ,vh) = (f ,vh). Therefore
ψ (fε,vh) dt =
χ[t1,T ] ψ (f ,vh) dt→
ψ (f ,v) dt.
We now gather the limits we have obtained into (24). The space V∩(C∞0 )2 is
dense in V. Hence we obtain for all v ∈ V and ψ ∈ C∞([0, T ]) with ψ(T ) = 0
−ψ(0) (u0,v)−
′ (u,v) dt+
((u,v)) +b(u,u,v)− (f ,v)
dt = 0.
Taking ψ = φ ∈ C∞0 ([0, T ]), we have φ(0) = 0 and from the definition of the
derivative in the distributional sense
φ′ (u,v) dt = −
(u,v) dt.
Thus we have proven (23). At last, let us show that the initial condition
holds. We have proven before that
dũcε
weakly in L2(0, T ;L2).
Let v ∈ V ∩ (C∞0 )2 and ψ ∈ C∞([0, T ]) such that ψ(T ) = 0. We have
dũcε
dũcε
(t− k),vh
Integrating by parts the limit we get
dũcε
dũcε
(t− k),vh
dt → −ψ(0)
u(0),v
ψ′ (u,v) dt.
By comparing this limit with (25), we get ψ(0) (u(0) − u0,v) = 0 for all
ψ ∈ C∞([0, T ]) with ψ(T ) = 0. Therefore u(0) = u0. At last, note that we
have proven so far the convergence of a sub-sequence of (uε)ε>0 towards u.
But the application u such that (22), (23) and u(0) = u0 hold is unique ([13],
p. 254). Thus the whole sequence (uε)ε>0 converges towards u.
Convergence of a finite volume scheme for the incompressible fluids 19
References
1. Boivin, S., Cayre, F., Herard, J. M.: A finite volume method to solve the Navier-
Stokes equations for incompressible flows on unstructured meshes. Int. J. Therm.
Sci. 39 806–825 (2000).
2. Brenner, S. C., Scott, L.R.: The mathematical theory of finite element methods.
Springer, 2002.
3. Brezzi, F., Fortin, M.: Mixed and hybrid finite element methods. Springer-
Verlag, 1991.
4. Chorin, J.: On the convergence of discrete approximations to the Navier-Stokes
equations. Math. Comp. 23 341–353 (1969).
5. Eymard, R., Herbin, R.: A staggered finite volume scheme on general meshes
for the Navier-Stokes equations in two space dimensions. Int.J. Finite Volumes
(2005).
6. Eymard, R., Latché, J. C., Herbin, R.: Convergence analysis of a colocated finite
volume scheme for the incompressible Navier-Stokes equations on general 2 or
3D meshes. preprint LATP (2004).
7. Eymard, R., Gallouët, T., Herbin, R.: Finite volume methods. P.G. Ciarlet and
J.L. Lions eds, North-Holland, 2000.
8. Faure, S.: Stability of a colocated finite volume scheme for the Navier-Stokes
equations. Num. Meth. PDE 21(2) 242–271 (2005).
9. Girault, V., Raviart, P. A.: Finite Element Methods for Navier-Stokes equations:
Theory and Algorithms. Springer (1986).
10. Guermond, J. L.:Some implementations of projection methods for Navier-
Stokes equations. M2AN 30(5) 637–667 (1996).
11. Heywood, J. G., Rannacher, R.: Finite element approximation of the nonsta-
tionary Navier-Stokes problem. I. Regularity of solutions and second-order er-
ror estimates for spatial discretization. SIAM J. Numer. Anal. 19(26) 275–311
(1982).
12. Kim, D., Choi, H.: A second-order time-accurate finite volume method for
unsteady incompressible flow on hybrid unstructured grids. J. Comp. Phys. 162,
411–428 (2000).
13. Temam, R.: Sur l’approximation de la solution des équations de Navier-Stokes
par la méthode de pas fractionnaires II. Arch. Rat. Mech. Anal. 33 377–385
(1969).
14. Zimmermann, S.: Stability of a colocated finite volume for the incompressible
Navier-Stokes equations. preprint (2006).
15. Zimmermann, S.: Stability of a finite volume scheme for the incompressible
fluids. preprint (2006).
ABSTRACT
  We consider a finite volume scheme for the two-dimensional incompressible
Navier-Stokes equations. We use a triangular mesh. The unknowns for the
velocity and pressure are respectively piecewise constant and affine. We use a
projection method to deal with the incompressibility constraint. In a former
paper, the stability of the scheme has been proven. We infer from it its
convergence.

<|endoftext|><|startoftext|>
COMBINING SEVERAL ALGORITHMS INTO A SUPERIOR ONE
OPTIMAL SYNTHESIS OF MULTIPLE ALGORITHMS 
KERRY M. SOILEAU 
ksoileau@yahoo.com
JULY 27, 2004 
ABSTRACT 
In this paper we give a definition of “algorithm,” “finite algorithm,” “equivalent algorithms,” and what it means for a 
single algorithm to dominate a set of algorithms. We define a derived algorithm which may have a smaller mean 
execution time than any of its component algorithms. We give an explicit expression for the mean execution time 
(when it exists) of the derived algorithm. We give several illustrative examples of derived algorithms with two 
component algorithms. We include mean execution time solutions for two-algorithm processors whose joint density of 
execution times are of several general forms. For the case in which the joint density for a two-algorithm processor is a 
step function, we give a maximum-likelihood estimation scheme with which to analyze empirical processing time data. 
mailto:ksoileau@yahoo.com
1 INTRODUCTION 
It can categorically be said that no algorithm is unique. By this we mean that for a given 
task, invariably more than one algorithm exists which will accomplish that task. One 
strategy is to select one algorithm deemed generally superior to the rest, and to use that 
algorithm exclusively. This paper examines an alternative strategy. We ask, given two or 
more equivalent algorithms, is it ever possible to create a new derived algorithm whose 
mean execution time is less than that of all of the original algorithms? If so, how can such 
an algorithm be derived? 
First we define clearly what we mean by the term “algorithm:”  
Algorithm: An algorithm α  is a pair ( ),α αρ π , where :αρ Ω→Γ  is a Turing-
computable mapping of a countable set Ω  (tasks) into a countable set Γ  (outputs), and 
:απ Ω→  is a mapping of  into the positive real numbers. The function Ω αρ  specifies 
the algorithm’s output ( )αρ ω  when presented with the task ω∈Ω . The function απ  
specifies the execution time ( )απ ω  required to compute the output ( )αρ ω . Note that 
under this definition, given a task ω∈Ω , an algorithm will always produce a definite 
output, namely ( )αρ ω , and will always produce this output after a definite amount of 
time has passed, namely ( )απ ω . We do not address procedures which are 
nondeterministic or whose execution time is unpredictable. 
Definition: We say that an algorithm ( ),α αα ρ π=  is finite if and only if  
for every 
( )0 απ ω< < ∞
ω∈Ω . Note that “ ( ),α αα ρ π=  is finite” does not imply “ απ  is bounded.” For 
example, Quicksort and Bubblesort are finite. 
Definition: We say that two algorithms ( ),α αα ρ π=  and ( ),β ββ ρ π=  are equivalent if 
and only if Dom Dom α βρ ρ=  and ( ) ( )α βρ ω ρ ω=  for every ω∈Ω . Notice that 
equivalent algorithms may require different times to process a given task. For example, 
Quicksort and Bubblesort are equivalent. 
Definition: Let { }1 2, , , Nα α α  be a set of equivalent algorithms. We say that nα  
dominates { }1 2, , , Nα α α  if and only if for every ω∈Ω , ( ) (n iα α )π ω π ω≤  for every 
{ }1,2, ,i N∈ . 
Now suppose we are given a set of finite equivalent algorithms { }1 2, , , Nα α α  such that 
no nα  dominates { }1 2, , , Nα α α . Suppose further that there exists a probability space 
 over  such that ( ), , PΩ ℑ Ω
, , ,
Nα α α
π π π  are random variables. Let 
 be the joint density of the random variables 
, , , :
α α απ π π
, , ,
Nα α α
π π π . 
Definition of Derived Algorithm: From a set of finite equivalent algorithms 
{ }1 2, , , Nα α α , and a given point ( ) [ )
1 2 1, , , 0,
Nτ τ τ
− ∈ ∞ , the function 
 is defined as follows. For each 
, we define the random variable 
1 2 11 2 1
| | | :
NN Nτ τ τ
α α α α
⎡ Ω→ Γ⎣ ⎤⎦
( ) [ ) 11 2 1, , , 0,
Nτ τ τ
− ∈ ∞
, , , 1 2 1, , , :
NT α α απ π π τ τ τ − Ω→  
as follows: 
( )( )
, , , 1 2 1
1 2 2 3
1 2 2 2
1 2 1 1
, , ,
α α απ π π
τ τ τ ω
π ω ω
τ π ω ω
τ τ π ω ω
τ τ τ π ω ω
τ τ τ π ω ω
⎪ + + ∈⎪
⎪ + + + + ∈
⎪ + + + + ∈⎩
     (1) 
1 2 1
1 2 1
| | |
τ τ τ
ρ ω ω
ρ ω ω
ρ ω ω
α α α α ω
ρ ω ω
ρ ω ω
⎪ ∈⎪⎡ ⎤ = ⎨⎣ ⎦
   (2) 
where ( ) ( ) ( ){ }
1 21 2
; , , ,
S α α αω τ π ω τ π ω τ π ω= ∈Ω < < <  for 1 1n N≤ ≤ − . Each  
is the event consisting of the points 
ω∈Ω  on which none of the algorithms 1 2, , , nα α α  
completes processing within each algorithm’s permitted run time limit. The derived 
algorithm is then defined to be the pair 
.  ( )( )1 2 1 1 21 2 1 , , , 1 2 1| | | , , , ,N NN N NT α α ατ τ τ π π πα α α α τ τ τ−− −⎡ ⎤⎣ ⎦
( )( )
, , , 1 2 1, , ,
NT α α απ π π τ τ τ ω−  represents the time taken for the derived algorithm to 
execute when presented with the task ω , and 
1 2 11 2 1
| | |
NNτ τ τ N
α α α α
⎡ ⎤⎣ ⎦  represents the 
derived algorithm’s output when presented with the task ω .  
We may envision an implementation of this algorithm as follows. When presented with a 
task ω∈Ω , a timer is started, and 1α  is applied. If 1α  has not completed by time 1τ , 1α  
is abandoned and 2α  is applied. If 2α  has not completed by time 1 2τ τ+ , 2α  is 
abandoned and 3α  is applied, and so on. If 1Nα −  has not completed by time 
1 2 1Nτ τ τ −+ + + 1N, α −  is abandoned and Nα  is applied and (unlike the other algorithms) 
is allowed to run without time limit. ( )
ρ ω  is returned as output , where iα  is the 
algorithm which completed execution on the task ω∈Ω . 
The expected value (if it exists) of the random variable ( )
, , , 1 2 1, , ,
NT α α απ π π τ τ τ −  is 
given by the following  
Theorem 1: 
( ) ( ) ( )
( ) ( ) ( ) ( )
, , , 1 2 1 1 1
1 1 1
, , ,
n n n n N N
ET E S P S
E S S P S S E S P S
α α απ π π α
τ τ τ π
− − −
= Ω Ω
∼ ∼ 1−
Proof: Recall that 
( )( )
, , , 1 2 1
1 2 2 3
1 2 2 2
1 2 1 1
, , ,
α α απ π π
τ τ τ ω
π ω ω
τ π ω ω
τ τ π ω ω
τ τ τ π ω ω
τ τ τ π ω ω
⎪ + + ∈⎪
⎪ + + + + ∈
⎪ + + + + ∈⎩
It follows immediately that 
( ) ( )
( ) ( )
( ) ( )
( ) ( )
( ) ( )
, , , 1 2 1
1 1 2 1 2
1 2 2 3 2 3
1 2 2 2 1 2 1
1 2 1 1 1
, , ,
N N N N
N N N
E S P S
E S S P S S
E S S P S S
E S S P
E S P S
α α απ π π
τ τ τ
τ τ π
τ τ τ π
τ τ τ π
− − − −
− − −
= Ω Ω
+ + +
+ + + + +
+ + + + +
∼ NS S −∼
    (4) 
This may be written as  
 ( ) ( ) ( )
( ) ( ) ( ) ( )
( ) ( )
( ) ( )
( ) ( )
( ) ( ) ( ) ( )
, , , 1 2 1 1 1 1 1 2
1 2 1 2 1 2 2 3
2 3 2 3
1 2 2 2 1
2 1 2 1
1 2 1 1 1 1
, , ,
N N N
N N N N
N N N N
ET E S P S P S S
E S S P S S P S S
E S S P S S
P S S
E S S P S S
P S E S P S
α α απ π π α
τ τ τ π τ
π τ τ
τ τ τ
τ τ τ π
− − −
− − − −
− − − −
= Ω Ω +
+ + +
+ + + + +
+ + + + +
∼ ∼ ∼
∼ ∼ ∼
∼ ∼   
Telescoping sums yield
 ( ) ( ) ( )
( ) ( )
( ) ( )
( ) ( )
( ) ( )
( ) ( ) ( ) ( )( )
( ) ( ) ( )( ) ( )( )
, , , 1 2 1 1 1
1 2 1 2
2 3 2 3
2 1 2 1
1 1 2 2 3 2 1 1
2 2 3 2 1 1 1 1
, , ,
N N N N
N N N
N N N N N
ET E S P S
E S S P S S
E S S P S S
E S S P S S
E S P S
P S S P S S P S S P S
P S S P S S P S P S
α α απ π π α
τ τ τ π
− − − −
− − −
− − − − −
= Ω Ω
+ + + + +
+ + + + + +
∼ ∼ ∼
Next,
( ) ( ) ( )
( ) ( )
( ) ( ) ( ) ( )
( ) ( ) ( ) ( ) ( )
, , , 1 2 1 1 1
1 2 1 2
2 3 2 3 2 1 2 1
1 1 1 1 2 2 1 1
, , ,
N N N N
N N N N
ET E S P S
E S S P S S
E S S P S S E S S P S S
E S P S P S P S P S
α α απ π π α
τ τ τ π
π τ τ τ
− − −
− − − −
= Ω Ω
+ + +
+ + + + +
∼ ∼ ∼ ∼ −
whence 
( ) ( ) ( )
( ) ( ) ( ) ( ) (
, , , 1 2 1 1 1
1 1 1 1
, , ,
n n n n N N n
ET E S P S
)nE S S P S S E S P S P S
α α απ π π α
τ τ τ π
− − − −
= Ω Ω
+ +∑ ∑
∼ ∼ τ+
as desired.
2 CASE  (TWO ALGORITHMS) 2N =
In this case 
( )( )
( ) ( )
( ) ( )
π ω π ω
τ π ω τ π ω
+ <⎪⎩       
( ) ( ) ( ) ( ) ( ) ( )
1 21 2
, 1 1 1 1 1 1 1ET E S P S E S P S P Sα απ π α ατ π π τ= Ω Ω + +∼ ∼
 and 
( ){ }
;S αω τ π ω= ∈Ω < ,        (11) 
( ) ( ) ( ) ( )( ) ( )1 1 1 2 1 11 2, 1 1 1 1 1 1ET E P E Pα απ π α α α α α ατ π π τ π τ π τ π τ τ π= ≤ ≤ + < + <  (12) 
2.1 ( )
, 1ET α απ π τ  WHEN JOINT DENSITY IS ( )1 2, ,f x yα απ π  
In this case, 
( ) ( ) ( ) ( )
1 2 1 2 1 2
, 1 , 1 ,
0 0 0
,ET xf x y dydx y f x y dydx
α α α α α α
π π π π π π
∞ ∞ ∞
= + +∫ ∫ ∫ ∫ ,    (13) 
2.11 EXAMPLE 
Suppose the joint density of completion times for the two algorithms is given by 
( ) ( )(
, , 12 exp )f x y xy x yα απ π = − +        (14) 
Figure 1. 
,f α απ π  
Then 
( ) ( )( )( ) ( ) ( )
4 2 3 23 3
, 1 1 1 1 1 1 12 8erf 1 expET α απ π τ π τ τ τ τ τ τ= − + + + − + π   (15) 
Figure 2. 
,ET α απ π  
2.12 EXAMPLE 
Suppose the joint density of completion times for the two algorithms is given by 
( ) ( )
, , 48 exp 4 3f x y xy x yα απ π = − −
       
Figure 3. 
,f α απ π  
Then 
( ) ( ) ( )
, 1 1 14 6erf 2 3 exp 4ET α απ π τ π τ π τ= + −      (17) 
Figure 4. 
,ET α απ π  
2.13 EXAMPLE 
Suppose the joint density of completion times for the two algorithms is given by 
( ) ( )( )
( ) ( )( )1 2
, 2 2
exp 1 7
, .022179119694367830844
exp 7 1
f x y xy
α απ π
⎛ ⎞− − − −
⎜ ⎟= ⎜ ⎟
+ − − − −⎜ ⎟
   (18) 
Figure 5. 
,f α απ π  
Figure 6. 
,ET α απ π  
The minimum occurs at 1 2.492τ . Note that if 1 2.492τ , then ( )
, 1 2.854ET α απ π τ , 
while 
4.260E απ  and 2 4.260E απ . In this case the derived algorithm has better mean 
execution time than either of the original algorithms. Its mean execution time is 
approximately 33% less than that of either of the original algorithms. 
Notation: In the following, wherever 1 2, , , MB B B  is found in a context requiring a 
Boolean expression, it means the conjunction 1 2 MB B B∧ ∧ ∧  of the Boolean 
expressions 1 2, , , MB B B .
Notation: In the following, if B  is a Boolean expression, then ( ) .  
1  is tru
0  is fals
In particular we define ( )
a x b a x x b
≤ < ≡ ≤ <⎨
2.14 EXAMPLE 
Suppose the joint density of completion times for the two algorithms is given by 
( ) ( )( )
( )( )
, 1 3 4
5 8 2 4
f x y x y
α απ π
= ≤ < ≤ <
+ ≤ < ≤ <
      (19) 
Figure 7. 
,f α απ π  
Figure 8. 
,ET α απ π  
The minimum occurs at 1 3τ = . Note that if 1 3τ = , then ( )
, 1 4ET α απ π τ = , while 
4.25E απ =  and 2 4.25E απ = . In this case the derived algorithm has better mean 
execution time than either of the original algorithms. Its mean execution time is 
approximately 6%  less than that of either of the original algorithms. 
2.15 EXAMPLE 
Suppose the joint density of completion times for the two algorithms is given by 
( ) ( )
, , expf x y x yα απ π = − −
        
Figure 9. 
,f α απ π  
Figure 10. 
,ET α απ π  
Note that for any choice of 1τ , then ( )
, 1 1ET α απ π τ = , while 1 1E απ =  and 2 1E απ = . In 
this case the derived algorithm has exactly the same mean execution time as do the 
original algorithms, so a derived algorithm would be of no benefit. 
2.16 ( )
, 1ET α απ π τ  DOES NOT ALWAYS EXIST  
If we take ( )
( ) ( )1 2
f x y
x yα α
π π =
, then  
( ) ( ) ( )
1 2 1 2
, 1 ,
0 0 0
,xf x y dydx y f x y dydx
α α α α
π π π π
∞ ∞ ∞
+ + =∫ ∫ ∫ ∫     (21) 
so in this case ( )
, 1ET α απ π τ  does not exist. 
2.2 ( )
, 1ET α απ π τ  WHEN JOINT DENSITY IS OF THE FORM 
( ) ( ) (
1 2 1 2
, , )f x y f x f yα α α απ π π π=  
( ) ( ) ( )( )
( ) ( ) ( ) (
1 21 2 1
1 1 1 2 1
, 1 1 1
1 1 1 1
ET xf x dx P E
E P E P
α α α
π π π α α
α α α α α
τ τ π τ
π π τ π τ τ π τ π
= + < +
= ≤ ≤ + + <
2.3 ( )
, 1ET α απ π τ  WHEN JOINT DENSITY IS OF THE FORM 
( ) ( )
1 2 2
ax by cx dy
xpf x y x y
a b c dα απ π
+ + + +
+ + + +
If  then 0 , , ,a b c d≤ ( ) (
1 2 2
ax by cx dy )xpf x y x y
a b c dα απ π
+ + + +
= − −
+ + + +
 is a density 
function over ( ) . Accordingly, [ ) [ ), 0, 0,x y ∈ ∞ × ∞
( ) ( )
1 2 6 2 2 4 4 exp
1 2 2
a b c d c a b c d
a b c dα απ π
+ + + + − + − + − −
+ + + +
  (23) 
2.4 ( )
, 1ET α απ π τ  WHEN JOINT DENSITY IS OF THE FORM 
( ) ( )
( )( )
( ) ( )1 2
, 2 3 3
1 1 1 1
f x y d m n
x y x yα α
+ + + +
If , 0 , , then  ( )0 ,d m n≤ c≤ ( )
c d m n
+ =∑∑ 1
( ) ( )
( )( )
( ) ( )1 2
, 2 3 3
1 1 1 1
f x y d m n
x y x yα α
+ + + +
is a density function over ( ) [ ) [ ), 0, 0,x y ∈ ∞ × ∞ . A straightforward calculation yields  
( ) ( ) ( ) ( )
( ) ( ) ( )
0 2 0 01 1
1, , 0,
2 1 1 1
1 1 1
2, 1, ln 1
n m n n
ET d n d m n d
d m n d m n c
∞ ∞ ∞ ∞
= = = =
= + + +
+ + +
+ − − − +⎜ ⎟+⎝ ⎠+
∑ ∑ ∑ ∑
  (25) 
2.5 ( )
, 1ET α απ π τ  WHEN JOINT DENSITY IS OF THE FORM 
( ) ( )(
n n n n n
)f x y k a x b c y d
α απ π
= ≤ < ≤ <∑  
Theorem 2: If  and ( )( )
n n n n n
k b a d c
− −∑ = 0 nk<  for 1, 2, ,n N= , and 
( ) ( )(
n n n n n
)f x y k a x b c y d
α απ π
= ≤ < ≤ <∑ , then 
( ) ( )
( )( )
( )( )( )
( )( )
( )( )( )
( )( )
, 1 1 2
n n n
n n n n n
n n n n n n
n n n n n
n n n n n n n
n n n n n
ET k d c
k b a d c
k b d c d c
k b a d c
k b d c a d c
k b a d c
α απ π
⎜ ⎟= − −⎜ ⎟⎜ ⎟
− −⎜ ⎟
+ ⎜ ⎟
⎜ ⎟+ − + −⎜ ⎟
+ − −
+ + −
+ − −
      (26) 
Notation: In the following, if  is a Boolean expression, then . ( )B n
( )( )
s B n
≡∑ ∑ s
Proof: If  and 0( )( )
n n n n n
k b a d c
− −∑ = nk<  for 1, 2, ,n N= , then 
( ) ( )(
n n n n n
)f x y k a x b c y d
α απ π
= ≤ < ≤ <∑  is a density function over 
. Now note that ( ) [ ) [ ), 0, 0,x y ∈ ∞ × ∞
( ) ( ) ( ) ( )
1 2 1 2 1 2
, 1 , 1 ,
0 0 0
,ET xf x y dydx y f x y dydx
α α α α α α
π π π π π π
∞ ∞ ∞
= + +∫ ∫ ∫ ∫ ,      
( )( )
( ) ( )( )
n n n n n
n n n n n
x k a x b c y d dydx
y k a x b c y d dydx
= ≤ < ≤ <
+ + ≤ < ≤ <
     (27) 
( )( )
( )( )( )
( ) ( )
( ) ( )
1 0 0
n n n n n
n n n n n
n n n n n
n n n
k x a x b c y d dydx
k y a x b c y d dy
k d c x a x b dx
k a x b dx y dy
= ≤ < ≤ <
+ + ≤ < ≤ <
= − ≤ <
+ ≤ < +
∑ ∫ ∫
     (28) 
( ) ( )
( ) ( ) ( )
( ) ( )
( ) ( )
( ) ( )
( )( ) ( )
n n n n n
n n n
n n n n n
n n n n n
n n n n n
n n n n n n n
k d c x a x b dx
k a x b dx
k d c x a x b dx
k d c x a x b dx
k d c x a x b dx
k d c c d a x b dx
= − ≤ <
⎛ ⎞+ +
⎜ ⎟+ ≤ < −
= − ≤ <
+ − ≤ <
+ − ≤ <
+ − + + ≤ <
( )( ) ( )
( )( ) ( )
n n n n n n n
n n n n n n n
k d c c d a x b dx
k d c c d a x b dx
+ − + + ≤ <
+ − + + ≤ <
    (29) 
( ) ( ) ( )
( ) ( )
( )( ) ( )
( )( ) ( )
n n n n n
n n n n n
n n n n n n n
n n n n n n n
n n n n n
ET k d c x a x b dx
k d c x a x b dx
k d c c d a x b dx
k d c c d a x b dx
k d c xdx k d
= − ≤ <
+ − ≤ <
+ − + + ≤ <
+ − + + ≤ <
= − + −
∑ ∫ ( )
( )( )
( )( )
( )( ) ( ) ( )
( )( )( )
2 2 21 1 1
12 2 2
n n n
n n n n n
n n n n n
n n n n n n n n n
a b b
n n n n n n n
n n n
c xdx
k d c c d dx
k d c c d dx
k d c a k d c b a
k d c c d b a
k d c
< < ≤
+ − + +
+ − + +
= − − + −
+ − + + −
( )( )
n n n
c d b
+ + −∑
( )( )
( )( )( )
( )( )
( )( )( )
( )( )
n n n
n n n n n
n n n n n n
n n n n n
n n n n n n n
n n n n n
k d c
k b a d c
k b d c d c
k b a d c
k b d c a d c
k b a d c
⎜ ⎟= − −⎜ ⎟⎜ ⎟
− −⎜ ⎟
+ ⎜ ⎟
⎜ ⎟+ − + −⎜ ⎟
+ − −
+ + −
+ − −
       (31)
( ) ( )
( )( )
( )( )( )
( )( )
( )( )( )
( )( )
, 1 1 2
n n n
n n n n n
n n n n n n
n n n n n
n n n n n n n
n n n n n
ET k d c
k b a d c
k b d c d c
k b a d c
k b d c a d c
k b a d c
α απ π
⎜ ⎟= − −⎜ ⎟⎜ ⎟
− −⎜ ⎟
+ ⎜ ⎟
⎜ ⎟+ − + −⎜ ⎟
+ − −
+ + −
+ − −
      (32) 
In particular, 
( ) ( )
( )( )
( )( )( )
( )( )
( )( )( )
( )( )
n i n
n i n
n i n
i i n n n
a a b
n n n n n
n n n n n n
a a b
n n n n n
n n n n n n n
a a b
n n n n n
ET a a k d c
k b a d c
k b d c d c
k b a d c
k b d c a d c
k b a d c
α απ π
⎜ ⎟= − −⎜ ⎟⎜ ⎟
− −⎜ ⎟
+ ⎜ ⎟
⎜ ⎟+ − + −⎜ ⎟
+ − −
+ + −
+ − −
      (33) 
( ) ( )
( )( )
( )( )( )
( )( )
( )( )( )
( )( )
n i n
n i n
n i n
i i n n n
a b b
n n n n n
n n n n n n
a b b
n n n n n
n n n n n n n
a b b
n n n n n
ET b b k d c
k b a d c
k b d c d c
k b a d c
k b d c a d c
k b a d c
α απ π
⎜ ⎟= − −⎜ ⎟⎜ ⎟
− −⎜ ⎟
+ ⎜ ⎟
⎜ ⎟+ − + −⎜ ⎟
+ − −
+ + −
+ − −
      (34) 
It is straightforward to show that ( )
, 1ET α απ π τ  attains a global minimum at one of the 
points { }1 2 1 2, , , , ,Na a a b b bN . Indeed, ( )
, 1ET α απ π τ  is a continuous, piecewise 
quadratic function. Notice that the set of points of connection of the pieces is a subset of 
{ }1 2 1 2, , , , ,Na a a b b bN . Each piece is either linear or is quadratic with a negative 
second derivative. We can thus replace each quadratic piece with a linear piece 
connecting the endpoints of the quadratic piece, without altering the global minimum of  
, 1ET α απ π τ . After replacing each quadratic piece with the appropriate linear piece, we 
then have a continuous piecewise linear function whose global minimum is the same as 
that of ( )
, 1ET α απ π τ . But of course the global minimum of a continuous piecewise linear 
function is attained at one of its vertices. These vertices are a subset of the set of points of 
connection { }1 2 1 2, , , , ,Na a a b b bN , as desired. 
This global minimum is given by 
( ) ( ) ( )
( ) ( ) ( )
1 2 1 2 1 2
1 2 1 2 1 2
, 1 , 2 ,
, 1 , 2 ,
, , ,
, , , ,
ET a ET a ET a
ET b ET b ET b
α α α α α α
α α α α α α
π π π π π π
π π π π π π
⎪ ⎪⎩ ⎭
⎬      (35) 
This minimum can be computed in ( )2O N  time. 
MAXIMUM LIKELIHOOD ESTIMATION 
Suppose ( ) ( )(
n n n n n
)f x y k a x b c y d
α απ π
= ≤ < ≤ <∑   
with the following conditions: 
1.  for 1 , 0nk > n N≤ ≤
2.  for 1 , na b< n
n N≤ ≤
3.  for 1 , nc d< n N≤ ≤
4. The boxes  for 1[ ) [ ), ,n n n n nB a b c d≡ × n N≤ ≤ are disjoint, 
5. . ( )( )
n n n n n
k b a d c
− −∑ =
Then 
,f α απ π  is a joint density function. 
Suppose next that we have observed the performance of two equivalent algorithms α  
and β  over a (finite) sample set sΩ ⊂ Ω . That is, for each task sω∈Ω  we have 
observed the values ( )απ ω  and ( )βπ ω  representing the time that algorithms α  and β  
actually took to process the task ω . We now present a maximum-likelihood procedure to 
find the “best fitting” joint density function of the form 
( ) ( )(
n n n n n
)f x y k a x b c y d
α απ π
= ≤ < ≤ <∑ ,  
subject to the five conditions above. Let ( ) ( ) ( ){ }1 1 2 2, , , , , ,P Px y x y x y  be the data 
observed, where jx  and  are the durations required by algorithms jy α  and β  
respectively, to process j sω ∈Ω , for 1 j P≤ ≤  , with sP = Ω . Our performance function 
is defined as 
( ) ( ) ( )( ) 1 2
1 2 , 1 2
, , , , N
P P N
N m m n n m n n m n
g k k k f x y k a x b c y d k k k
α απ π
≡ = ≤ < ≤ < =∑∏ ∏ N
where 
( ) ( ) ( ) ( ){ }{ }1 1 2 2, , , , , , , ; ,j P P j j j jS x y x y x y x y a x b c y d≡ ∈ ≤ < ≤ <  
We form as usual the Lagrange multiplier equations  
( )( )1 0j j j j j
S b a d c
λ + − − =  for 1 j P≤ ≤ . We have immediately that 
( )( )
j j j j
b a d c
and recalling the constraint 
n n n n n
k b a d c
= − −∑ )  
we infer 
S Pλ λ
= − = −∑  
whence 
λ = −  
thus 
( )( )
j j j j
P b a d c
Substituting into (32), we get 
( )( )
, 1 1 2
n n n n n
ET d c
P b a d cα απ π
⎜ ⎟= − −⎜ ⎟− −⎜ ⎟
( )( )
( )( )
( )( )
( )( )( )
n n n n
n n n n n
n n n n n
n n n n n
b a d c
P b a d c
b d c d c
P b a d c
− −⎜ ⎟− −⎜ ⎟
+ ⎜ ⎟
⎜ ⎟+ − +⎜ ⎟− −⎜ ⎟
( )( )
( )( )
n n n n
n n n n n
b a d c
P b a d c
+ − −
− −∑  
( )( )
( )( )( )
n n n n n n
n n n n n
b d c a d c
P b a d c
− −∑ − −  
( )( ) ( )( )
n n n n
n n n n n
b a d c
P b a d c
− −∑ −  
( )( )
n n n n
a b n n n
n n n
P b a S
b d c
P b a
⎜ ⎟⎛ ⎞ ⎜ ⎟⎜ ⎟= − + ⎜ ⎟⎜ ⎟− ⎜ ⎟⎜ ⎟ + −⎝ ⎠ ⎜ ⎟−⎜ ⎟
( )( ) ( )
1 1 1
21 1 1
2 2 2
1 1 1
n n n n
N N N
n n n n n n n n
n n nn n
a a b b
d c b d c a b a
P P b a P
τ τ τ
= = =
≤ < < ≤
+ + + + − +
∑ ∑ ∑ n
1 1 1
, 1 1 12
1 1 1
n n n n n
N N N
n n nn
n n nn n n n
a b a a b
b d cS
ET S S
b a P b aα απ π
τ τ τ
τ τ τ
= = =
< < ≤ < <
− +⎜ ⎟= − + +⎜ ⎟− −⎜ ⎟
∑ ∑ ∑  
( )( ) ( ) ( )
2 2 2
n n n n
n n n n n n n n n nP P
n nn n
a b a b
b d c a S d c S b a
< < ≤ ≤
+ + − + + +
3 CONCLUSIONS 
In this paper, we asked the following questions: Given two or more equivalent 
algorithms, is it ever possible to create a new derived algorithm whose mean execution 
time is less than that of all of the original algorithms? If so, how can such an algorithm be 
derived? 
By giving examples in Section 2, we have shown that the answer to the first question is 
“yes.” In Section 1, we gave an explicit construction of the derived algorithm. 
ABSTRACT
  In this paper we give a definition of "algorithm," "finite algorithm,"
"equivalent algorithms," and what it means for a single algorithm to dominate a
set of algorithms. We define a derived algorithm which may have a smaller mean
execution time than any of its component algorithms. We give an explicit
expression for the mean execution time (when it exists) of the derived
algorithm. We give several illustrative examples of derived algorithms with two
component algorithms. We include mean execution time solutions for
two-algorithm processors whose joint density of execution times are of several
general forms. For the case in which the joint density for a two-algorithm
processor is a step function, we give a maximum-likelihood estimation scheme
with which to analyze empirical processing time data.

<|endoftext|><|startoftext|>
Draft version October 22, 2018
Preprint typeset using LATEX style emulateapj v. 08/22/09
NEW CLOSE BINARY SYSTEMS FROM THE SDSS–I (DATA RELEASE FIVE) AND THE SEARCH FOR
MAGNETIC WHITE DWARFS IN CATACLYSMIC VARIABLE PROGENITOR SYSTEMS
Nicole M. Silvestri
, Mara P. Lemagie
, Suzanne L. Hawley
, Andrew A. West
, Gary D. Schmidt
, James
Liebert
, Paula Szkody
, Lee Mannikko
, Michael A. Wolfe
, J. C. Barentine
, Howard J. Brewington
Michael Harvanek
, Jurik Krzesinski
, Dan Long
, Donald P. Schneider
, and Stephanie A. Snedden
(Received 2007 February 9)
Draft version October 22, 2018
ABSTRACT
We present the latest catalog of more than 1200 spectroscopically–selected close binary systems
observed with the Sloan Digital Sky Survey through Data Release Five. We use the catalog to search
for magnetic white dwarfs in cataclysmic variable progenitor systems. Given that approximately 25%
of cataclysmic variables contain a magnetic white dwarf, and that our large sample of close binary
systems should contain many progenitors of cataclysmic variables, it is quite surprising that we find
only two potential magnetic white dwarfs in this sample. The candidate magnetic white dwarfs, if
confirmed, would possess relatively low magnetic field strengths (BWD < 10 MG) that are similar to
those of intermediate–Polars but are much less than the average field strength of the current Polar
population. Additional observations of these systems are required to definitively cast the white dwarfs
as magnetic. Even if these two systems prove to be the first evidence of detached magnetic white
dwarf + M dwarf binaries, there is still a large disparity between the properties of the presently known
cataclysmic variable population and the presumed close binary progenitors.
Subject headings: binaries: close — cataclysmic variables — stars: low-mass — stars: magnetic fields
— stars: white dwarfs
1. INTRODUCTION
The evolution of stars in close binary systems leads
to interesting stellar end-products such as cataclysmic
variables (CVs), Type 1a supernovae, and helium–core
white dwarfs (WDs). The period in which an evolved
star ascends the asymptotic giant branch and engulfs
a close companion in its evolving atmosphere, referred
to as the common envelope phase, probably plays a
dominant role in the evolution of these systems and
as yet is poorly understood. The angular momentum
of the system is believed to aid in the eventual ejec-
tion of the common envelope to reveal the remnant WD
and close companion. After the common envelope has
been ejected, gravitational and magnetic braking work
to decrease the orbital separation of the detached system
(de Kool & Ritter 1993). This orbital evolution contin-
ues through to the CV phase. The effect of the common
envelope on the secondary star in these systems is an-
other aspect of close binary evolution which is not well
characterized. Plausible scenarios for the secondary com-
1 Department of Astronomy, University of Washington, Box
351580, Seattle, WA 98195, USA; nms@astro.washington.edu,
mlemagie@u.washington.edu, slh@astro.washington.edu,
szkody@astro.washington.edu, leeman@u.washington.edu,
maw2323@u.washington.edu.
2 Astronomy Department, 601 Campbell Hall, University of
California, Berkeley, CA 94720, USA; awest@astro.berkeley.edu
3 Department of Astronomy and Steward Observatory, Univer-
sity of Arizona, Tucson, AZ 85721, USA; schmidt@as.arizona.edu,
jliebert@as.arizona.edu.
4 Apache Point Observatory, P.O. Box 59, Sunspot, NM
88349, USA; jcb@apo.nmsu.edu, hbrewington@apo.nmsu.edu,
harvanek@apo.nmsu.edu, long@apo.nmsu.edu, sned-
den@apo.nmsu.edu.
5 Mt. Suhora Observatory, Cracow Pedagogical University, ul.
Podchorazych 2, 30-084 Cracow, Poland; jurek@apo.nmsu.edu.
6 Department of Astronomy, Penn State University, PA 16802
USA; dps@astro.psu.edu
panion range from accreting as much as 90% of its mass
during this phase to escaping relatively unscathed from
the common envelope, emerging in the same state as it
entered (see Livio 1996, and references therein).
Recently, studies of close binary systems with WD
companions (see for example Farihi et al. 2005b,a;
Pourbaix et al. 2005; Silvestri et al. 2006) have revealed
yet another puzzling property of these systems. None of
the WDs in close binary systems with low–mass, main se-
quence companions appear to be magnetic (Liebert et al.
2005). Close, non–interacting binary systems with WD
primaries are quite common and are believed to be the
direct progenitors to CVs (Langer et al. 2000, and ref-
erences therein). Magnetic WDs, stellar remnants with
magnetic fields in excess of ∼ 1 MG, comprise only a
small percentage of the isolated WD population (∼ 2%,
Liebert et al. 2005). Note that the 2% magnetic WD
fraction applies to magnitude–limited samples like the
Palomar–Green (Liebert et al. 1988). However, the same
paper notes that magnetic WDs may generally have
smaller radii than non–magnetic white dwarfs, due to
higher mass. In a given volume, the density of mag-
netic WDs may be ∼ 10% of all WDs (Liebert et al.
2003). The SDSS is also a magnitude limited sample
so we assume a similar expected value for the close bi-
naries. Our sample (as discussed in detail in §2) con-
tains 1253 potential close binary systems. Therefore we
assume approximately 24 of these binaries to harbor a
magnetic WD. Possible implications of the small radii
for magnetic WD + main sequence pairs will be dis-
cussed in §5. However, more than 25% of the WDs in
the currently identified CV population are classified as
magnetic, and many have magnetic fields in excess of 10
MG (see Wickramasinghe & Ferrario 2000).
Holberg et al. (2002) have compiled a list of 109 known
http://arxiv.org/abs/0704.0789v1
2 Silvestri et al.
WDs within 20pc (and complete to within 13pc) that
have nearly complete information about the presence
of a companion. Of the 109 WDs in their sample,
19 ± 4 have nondegenerate companions. Table 7 in
Kawka et al. (2007) lists all known magnetic WDs as of
June 2006. Of the magnetic WDs listed in their table,
149 have field strengths identifiable in SDSS-resolution
spectra (BWD ≥ 3 MG). If the magnetic WDs in the
Kawka et al. (2007) sample are assumed to be drawn
from a similar sample then 28 ± 5.3 would be expected
to have nondegenerate companions, and yet none have
been detected in the Kawka et al. (2007) sample. This is
nearly a 5σ deficit in magnetic WDs with nondegenerate
companions.
Holberg & Magargal (2005) looked at the 2MASS
JHKs photometry of 347 WDs in the Palomar–Green
sample. Of the 347 WDs, 254 had reliable infrared mea-
surements of at least J magnitude. Of these, 59 had
excesses indicative of a nondegenerate companion and
another 15 showed “probable” excesses (Liebert et al.
2005). This gives a WD+dM fraction of 23% (definite
excess) and 29% (including all probable excesses). If the
Kawka et al. (2007) sample had the same frequency of of
nondegenerate companions as the Palomar–Green sam-
ple, they should have 34 and 43, respectively. This is
nearly as 6σ deficit!
This apparent lack of magnetic WDs with main se-
quence companions is not restricted to studies of close
binaries. Low resolution spectroscopic surveys of more
than 500 common proper motion binary systems dis-
covered by Luyten et al. (1964); Luyten (1968, 1972)
and Giclas et al. (1971, 1978) revealed no magnetic WDs
paired with main sequence companions in these wide
pairs (Smith 1997; Silvestri et al. 2005). In addition,
Schmidt et al. (2003) and Vanlandingham et al. (2005)
have identified over 100 magnetic WDs in the Sloan
Digital Sky Survey (SDSS, Gunn et al. 1998; York et al.
2000; Stoughton et al. 2002; Pier et al. 2003; Gunn et al.
2006). As discussed by Liebert et al. (2005), this implies
essentially no overlap between the close binary and mag-
netic WD samples.
A new class of short–period, low accretion–rate polars
(LARPS) identified by Schmidt et al. (2005b) may ex-
plain, in part, these “missing” magnetic WD systems.
In these systems, the donor star has not filled its Roche
Lobe. The WD accretes material by capturing the stel-
lar wind of the secondary. These CVs have accretion
rates that are less than 1% of accretion rates normally
associated with CVs. The discovery of these systems
sheds some light on the whereabouts of magnetic WD
binaries, though as Schmidt et al. (2005b) point out,
this still does not explain the apparent lack of long–
period, detached magnetic WD systems. Thought to be
the first detached binary with a magnetic WD, SDSS
J121209.31+013627.7, a magnetic WD with a proba-
ble brown dwarf (L dwarf) companion (Schmidt et al.
2005a) has been shown to be one of these LARP systems
(Debes et al. 2006; Koen & Maxted 2006; Burleigh). To
date, magnetic WDs have only been found as isolated ob-
jects, in binaries with another degenerate object (WD or
neutron star companion), or in CVs; none have a clearly
main sequence companion.
In this study, we investigate a new large sample of close
binary systems in an effort to uncover these “missing”
magnetic WD binary systems. The sample comprises
more than 1200 close binary systems containing a WD
and main sequence star drawn from the SDSS, many of
which were originally presented in Silvestri et al. (2006,
hereafter, S06). We find that only two of the WDs in
these pairs appear to be magnetic. Even if confirmed,
neither of these WDs has magnetic field strength compa-
rable to those observed in the majority of magnetic (Po-
lar) CV systems. We confirm that the current CV and
close binary populations are indeed disparate and show
that more work is necessary to unravel this mystery.
In §2 we introduce the catalog of close binary sys-
tems through the public SDSS Data Release Five (DR5;
Adelman-McCarthy et al. 2007). We discuss our anal-
ysis techniques in §3 and we present our results in §2.
Our discussion and concluding remarks are given in §5
and §6, respectively.
2. THE SDSS CLOSE BINARY CATALOG THROUGH DR5
The combined properties of the majority of close
binaries in this paper are discussed in detail in
Raymond et al. (2003) and S06. The S06 cata-
log was based on a preliminary list of spectroscopic
plates released internally to the collaboration and as
such does not include objects from ∼ 200 plates re-
leased with the final public Data Release Four (DR4;
Adelman-McCarthy et al. 2006). The additional systems
from both DR4 and DR5 do not change the overall re-
sults from analysis performed in S06, hence no new anal-
ysis is presented here. We include this list in its en-
tirety to complete the DR4 catalog introduced by S06
and add over 300 new systems from the now public DR5
(Adelman-McCarthy et al. 2007). This completes the
catalog of close binary systems with a WD identified
through SDSS–I. More close binaries are being targeted
in the SDSS–II (SEGUE) survey which will continue to
increase the sample through 2008.
The list of 1253 potential close binary systems given
in Table 1 includes objects from all plates released with
the public DR5, thereby superseding the S06 DR4 cat-
alog. The technique used to search for these objects is
the same as described in S06. As with that study, we
do not include systems with low signal-to-noise ratios
(S/N < 5) and do not search for systems with non–DA
WDs. We emphasize that our sample is not complete
(or bias free) due to the selection effects imposed by our
detection methods and due to the sporadic targeting of
these objects in the SDSS spectroscopic survey as dis-
cussed in S06. Thus, our sample represents primarily
bright, DAWD +M dwarf binary systems. As evidenced
by Smolčić et al. (2004), there are potentially thousands
more WD + M dwarf binaries observed photometrically
in the SDSS but not targeted for spectroscopy. Our cat-
alog represents an interesting and statistically significant
sampling of these systems, the properties of which can
be used to test models of close binary evolution (see
Politano & Weiler 2006, for example).
The list of plate numbers from which this
sample has been drawn can be found at
http://das.sdss.org/DR5/data/spectro/1d 23/. This
plate list includes both “extra” and “special” plates.
The extra plates are repeat observations of survey plates
taken during normal operation. The special plates are
observations for special programs (e.g. SEGUE, F stars,
http://das.sdss.org/DR5/data/spectro/1d_23/
Magnetic WDs in CV Progenitors 3
Fig. 1.— Example of an M dwarf with excess blue flux (:+dM)
from Table 1. The companion is seen as little more than excess blue
flux in the M dwarf spectrum. Follow-up spectroscopy to resolve
the companion is necessary to rule out the presence of a magnetic
WD. Note: spectrum has been boxcar smoothed with a filter size
of seven.
main sequence turnoff stars, quasar selection efficiency,
etc.) that are not part of the original SDSS–I survey.
The first four columns of Table 1 list the SDSS identi-
fier, the plate number, fiber identification, and modified
Julian date (MJD) of the observation, followed by the
spectral type of the components (determined visually)
where Sp1 represents the blue object and Sp2 is the red
object. Columns 6 and 7 give the J2000 coordinates (in
decimal degrees) for the object. The next 15 columns
give the ugriz PSF photometry (Fukugita et al. 1996;
Hogg et al. 2001; Ivezić et al. 2004; Smith et al. 2002;
Tucker et al. 2006), photometric uncertainties (σugriz),
and reddening (Augriz). The magnitudes are not cor-
rected for Galactic extinction. Column 23 lists the SDSS
data release in which the object was discovered as well as
additional references in the literature. Additional notes
for the objects are listed in column 24.
The objects identified in Table 1 as :+dM are likely
M dwarfs with faint, cool WD companions. The dis-
covery spectra for these objects reveal little more than
excess blue flux at wavelengths shorter than 5000 Å, as
shown in Figure 1. It is possible that some of these pairs
may contain a magnetic WD; however, much higher S/N
spectra are required to adequately characterize the blue
component of these systems.
Similarly, the thirty nine objects identified as WD+:
or WD+:e (see Figure 8 of S06) have either some ex-
cess flux in the red or have emission at Balmer wave-
lengths indicative of a faint, active, low–mass or sub–
stellar companion. The companion to the magnetic WD
in Schmidt et al. (2005a) was first identified by emission
at Hα in the SDSS discovery spectrum. Other than the
emission at Hα this object had no other optical signa-
ture of a companion. We are performing followup obser-
vations using the ARC 3.5–m telescope at Apache Point
Observatory to obtain radial velocities and near–infrared
imaging of these objects to measure the orbital periods
and categorize the probable low–mass companion’s spec-
tral type. We have already confirmed that none of these
systems contain a magnetic WD.
3. THE SEARCH FOR MAGNETIC WDS
Schmidt et al. (2003) and Vanlandingham et al. (2005)
demonstrated that magnetic WDs with field strengths as
low as ∼ 3 MG can be effectively measured using SDSS
spectra. Visual inspection of the systems in our sample
reveals no obvious magnetic WDs in spectra with good
S/N (> 10) (Lemagie et al. 2004). Most are classical
Fig. 2.— A Typical WD+dM System: SDSS
J140723.03+003841.7, the superposition of a DA (hydrogen
atmosphere) WD and a M4 red dwarf star. Hα emission is
visible in many of these systems and is a result of chromspheric
activity on the surface of the M star, perhaps enhanced due to
the influence of the WD. The lack of broad Zeeman absorption
features in the hydrogen lines indicates that the magnetic field
strength of the WD is very low (compare with Figure 4).
WD + M dwarf close binaries as shown in Figure 2. Of
interest are the lower quality spectra, where the features
of the WD are less obvious because of low S/N and/or
contamination by the spectral features of the close M
dwarf companion. These effects make it difficult to iden-
tify small magnetic field effects on the WD absorption
features. Thus, relatively low magnetic fields (BWD < 10
MG) are not easily recognized in the combined spectrum.
3.1. The Simulated Magnetic Binary Systems
Given the difficulties associated with visually identi-
fying features in these systems, we developed a method
to search for the characteristic Zeeman splitting of the
DA WD absorption features that is also sensitive to low
magnetic field WDs. We use a program that attempts
to match absorption features in magnetic DA WD mod-
els (see Kemic 1974b,a; Schmidt et al. 2003, and refer-
ences therein for details on the models) through an it-
erative method of smoothing and searching the stellar
spectrum. To develop a robust program to search for
magnetic WDs in close binaries we first tested our pro-
gram on WDs of known magnetic field strength. We
used the magnetic DA WDs with field strengths between
1.5 MG ≤ BWD ≤ 30 MG from Schmidt et al. (2003)
and Vanlandingham et al. (2005) as our test sample. We
then constructed model spectra at every half–MG be-
tween 1.5 MG ≤ BWD ≤ 30 MG, each with magnetic
field inclinations of 30◦, 60◦, and 90◦. The program was
able to match (using a χ2 minimization) the magnetic
field strength of each of the magnetic WDs to within ±5
MG of the value quoted in Schmidt et al. (2003).
We then constructed a sample of simulated SDSS spec-
tra of magnetic binary systems. The simulated binaries
were created by adding the spectra of magnetic WDs
used in our initial test from Schmidt et al. (2003) and
Vanlandingham et al. (2005) to the M star templates of
Hawley et al. (2002). We first normalized all spectra at
a wavelength of 6500 Å, and then combined them with
flux ratios of 1:4 (WD:M dwarf) to 4:1 to replicate the
range of flux ratios observed in the close binary sample
(see Figure 3)7. This created a sample of binaries which
represent the average brightness and spectral type dis-
tribution of the majority of the systems in Table 1 (i.e.
7 Note that 6500 Å is the midpoint of the SDSS combined blue
and red spectra, as plotted in Figure 3. In reality the SDSS spectra
extend to below 3900 Å and to nearly 10000 Å.
4 Silvestri et al.
Fig. 3.— Comparison of simulated and observed pre–cataclysmic
variable (PCV) systems. Left Hand Column: Simulated magnetic
PCVs produced by adding WD spectra from Schmidt et al. (2003)
to M dwarf spectra from Hawley et al. (2002) with brightness ratios
as specified at 6500 Å. Right Hand Column: Observed PCVs from
Silvestri et al. (2006).
Fig. 4.— A Simulated System. Top Left Panel: A 13 MG mag-
netic WD from Schmidt et al. (2003). Top Right Panel: Template
M4 dwarf star from Hawley et al. (2002). Bottom Panel: addition
of the magnetic WD and template M dwarf, assuming equal flux
density at 6500 Å.
DA WDs and M0–M5 dwarfs).
Figure 4 is an example of one of the simulated magnetic
binary systems. The upper left hand panel is the SDSS
spectrum of a 13 MG magnetic WD, the upper right
hand panel is the spectrum of a template M4 dwarf star.
The bottom panel is the addition (superposition) of the
two spectra with a flux ratio of 1:1 at 6500 Å. As shown,
this WD with a relatively moderate magnetic field, when
combined with the spectrum of an average M dwarf, is
clearly detected at the resolution of the SDSS spectra (R
∼ 1800).
3.2. Results from the Simulated Systems
We found that detecting the presence of a WD mag-
netic field depends most strongly on the spectral type
and relative flux of the M dwarf companion. Due to the
selection effects of the close binary sample (see S06 for
details), the majority of the M dwarfs in these binaries
have spectral sub–types between M0–M4. In SDSS spec-
tra, early M dwarf spectral types contribute nearly as
much flux in the blue portion of the spectrum (4000–7000
Å) as they do in the red (7000–10000 Å). The spectrum
of the blue magnetic WD is then superimposed onto the
numerous blue molecular features of the M dwarf. This
makes the small absorption features stemming from the
subtle influence of a weak magnetic field difficult to de-
tect.
We plot a subset of our simulated pairs to demon-
strate some of these issues in Figure 5 and Figure 6.
In Figure 5 we selected four early–type template M
dwarfs (WD+M0 = open squares, WD+M1 = open
circles, WD+M2 = open triangles, and WD+M3 =
Fig. 5.— Left Hand Panel: Subset of the simulated binary sys-
tems comprised of early–M dwarfs from Hawley et al. (2002) paired
with magnetic WDs and literature values from Schmidt et al.
(2003) and Vanlandingham et al. (2005) with BWD ≤ 10 MG.
Right Hand Panel: Same M dwarfs from Left Panel paired with
magnetic WDs with BWD ≥ 10 MG. The measured values are
from our program. In both panels, the filled triangles represent
single WDs, open squares are WD+M0, open circles are WD+M1,
open triangles are WD+M2, and crosses are WD+M3. The solid
line has a slope of one and the dashed lines are ±5 MG. Refer to
§ 3.2 of the text for details.
Fig. 6.— Left Hand Panel: Subset of the simulated binary sys-
tems comprised of late-M dwarfs from Hawley et al. (2002) paired
with magnetic WDs and literature values from Schmidt et al.
(2003) and Vanlandingham et al. (2005) with BWD ≤ 10 MG.
Right Hand Panel: Same M dwarfs from Left Panel paired with
magnetic WDs with BWD ≥ 10 MG. The measured values are
from our program. In both panels, the filled triangles represent
single WDs, open squares are WD+M4, open circles are WD+M5,
open triangles are WD+M6, and crosses are WD+M7. The solid
line has a slope of one and the dashed lines are ±5 MG. Refer to
§ 3.2 of the text for details.
crosses) from Hawley et al. (2002) and added them to a
range of magnetic WDs from Schmidt et al. (2003) and
Vanlandingham et al. (2005). The quoted value from
Schmidt et al. (2003) for the magnetic field strength of
each of these WDs represents the “Literature BWD”
value on the x–axis. The “Measured BWD” is the value
returned by the program. Values returned by the pro-
gram that matched the literature values fall along the
solid line. The dashed lines represent ±5 MG of the
literature value. Figure 6 is the same except we add
the same magnetic WDs to later–type M dwarf tem-
plates (WD+M4 = open squares, WD+M5 = open cir-
cles, WD+M6 = open triangles, andWD+M7 = crosses).
The solid triangles represent the tests using the isolated
WD spectra.
In both Figures 5 and 6 the program returns the value
of the single WD to within ∼ ±2 MG for the large ma-
jority of the systems. The uncertainty of the fitted value
and the spread in values increases for magnetic fields of 3
MG or less when the magnetic WD is paired with an M
dwarf of comparable brightness. The flux minima associ-
ated with the Zeeman features for such low field strengths
are just barely resolvable in high S/N spectra of isolated
SDSS WDs (see Schmidt et al. 2003). The added com-
Magnetic WDs in CV Progenitors 5
Fig. 7.— Here, we plot the flux ratio (WD flux/ M dwarf
[dM]) versus the difference between the literature value (from
Schmidt et al. 2003; Vanlandingham et al. 2005) of the magnetic
field strength (BLit) and the measured magnetic field strength
(BMea) as determined from the WD Hα (top panel), Hβ (center
panel), and Hγ (bottom panel) absorption features. Error bars are
from the χ2 fit. Refer to § 3.2 of the text for details.
plexity of the M dwarf molecular features and the gen-
erally lower S/N spectra make it difficult to measure the
magnetic features for low magnetic field strengths. How-
ever, WDs with magnetic fields ≥ 4 MG were easily mea-
sured at all M dwarf spectral types.
In both Figures, the largest discrepancies between the
literature and measured values occur when the WD’s
magnetic field is between 12 MG ≤ BWD ≤ 18 MG;
this is true when the WD is paired with both early– and
late–type M dwarfs. Inspection of the model results in-
dicates that at these field strengths, the Zeeman features
overlap on wavelengths with strong M dwarf molecular
features, causing confusion in the identification of the fea-
ture. However, WD spectra with these and larger field
strengths are quite easily recognized visually so we are
confident that no systems with ≥ 10 MG have escaped
notice, though the exact value of the field strength would
be more uncertain.
In Figure 7, we demonstrate the effect of the relative
flux ratio (WD: M dwarf [dM]) on the identification of the
magnetic field strength of WDs in the simulated binary
sample. The Figure gives the relative flux ratio versus
the difference between the magnetic fields quoted in the
literature and those returned by the program. We use
the same BWD distribution in Figure 7 as used in Fig-
ure 5 and Figure 6. The literature values (BLit) are from
Schmidt et al. (2003) and Vanlandingham et al. (2005).
The three panels show ratios determined using Hα (top),
Hβ (center), and Hγ (bottom). The program consis-
tently returns the quoted BWD as determined from Hβ
until the flux contribution from the M dwarf is nearly
double the flux contribution from the WD. The program
returns the magnetic field from the Hα feature to within
±5 MG until the flux contribution from the M dwarf is
nearly 1.5× the flux from the WD. The BWD as mea-
sured by Hγ is consistently 15–25 MG larger than the
BWD value in the literature at any flux ratio. The con-
tribution of a relatively clean spectral region near Hβ, to-
gether with the fairly strong Zeeman signal at this wave-
length makes Hβ a reliable indicator of WD magnetic
field strength for binaries with flux ratios up to 1:2.
4. TWO POSSIBLE MAGNETIC WDS IN THE DR5 CLOSE
BINARY SAMPLE
The method employed by S06 to split the binary sys-
tem into its two component spectra through an itera-
tive method of fitting and subtracting WD model at-
mospheres and template M dwarf spectra was not used
Fig. 8.— Two potential magnetic DA WD + M dwarf pairs as
identified by our program. The tentative magnetic field strengths
are 8 MG ±5 MG (top) and 3 MG +5/−3 MG (bottom) as deter-
mined from the Hα and Hβ WD absorption features.
on these objects. There are no obviously strong mag-
netic WDs in the sample, suggesting that any possibly
magnetic WDs must possess relatively weak fields. The
subsequent fitting and subtraction of model WDs and
template M dwarfs adds noise to the spectrum which
would make detection of an already weak magnetic field
even more difficult. Also, we would be subtracting a
non–magnetic WD model from the spectrum of a poten-
tially magnetic WD in our attempt to improve the M
dwarf template fit. This adds absorption features where
none actually exist, further corrupting the WD spectrum.
Given these complications, we chose to work with the
original composite SDSS discovery spectra.
Table 2 lists the properties of the only two close bi-
nary systems flagged by our program as containing po-
tential magnetic WDs: SDSS J082828.18+471737.9 and
SDSS J125250.03−020608.1. The first four columns are
the same as for Table 1, followed by the R.A. and
Decl. (J2000 coordinates). The tentative magnetic field
strengths (in MG), inclination of the WD magnetic field
to the line of sight (in degrees) and the spectral types
of the components are listed in Columns 7–9. For each
of these systems the magnetic field strength estimate is
based upon a match to at least two of the three Balmer
features (Hα, Hβ, and/or Hγ) to within ±5 MG of the
model minima. The last six columns give the ugriz pho-
tometry and the SDSS data release for the objects. Refer
to Table 1 for a full listing of photometric errors, redden-
ing and alternate literature sources.
Figure 8 displays the spectra of these two objects,
which have relatively low S/N (∼ 5 at Hα) The iden-
tification of the magnetic field strength was determined
from the Hα and Hβ features in each spectrum, which
upon closer inspection may show some Zeeman splitting.
The best fit model for SDSS J082828.18+471737.9 has a
magnetic field strength of 8 MG and an inclination of 90◦,
while the best fit model for SDSS J125250.03−020608.1
has a magnetic field strength of 3 MG and an inclina-
tion of 90◦. Hβ appears to be distorted in both systems,
indicating a potential broadening of a few MG field, how-
ever Hγ and Hδ would show more splitting than Hβ but
both appear to be relatively sharp in comparison. Hβ
may be affected by TiO features from the M dwarf and
there does appear to be a minor glitch in the blue por-
tion of the spectrum, indication difficulty with SDSS flux
calibration.
5. DISCUSSION
Of the 1253 potential close binary systems in the DR5
catalog, there were 168 systems that we could not mea-
6 Silvestri et al.
sure with our program. These include the :+dM systems
and binaries with non-DA WDs. We were not able to
unambiguously determine if the :+dM systems have a
magnetic or a non–magnetic WD as the blue component
is barely visible in most of the :+dM SDSS spectra. Un-
til we can identify the companion, we can not make any
statement about magnetism in these objects. The :+dM
cases where a blue component is seen in the spectrum
which must be a WD, but too faint even to classify the
type may include (a) cases where the WD is simply very
cool, but also (b) magnetic WDs of suitably warmer ef-
fective temperature but with smaller radii. These need to
be reobserved in the blue with a spectrograph and tele-
scope of large aperture. We made no attempt to measure
the DB WDs because we lack viable magnetic DB WD
models; however, all of the DB spectra matched well to
non–magnetic DB WD models, so we believe it is un-
likely that any of the WDs in these pairs are magnetic.
We could not measure the pairs with DC WDs because
there are no features with which we can detect a magnetic
field and therefore cannot rule out magnetism without
employing polarimetry or other methods of identifying a
magnetic field in these objects.
Of the remaining binary systems, we find only two
that may contain WDs with weak magnetic fields. Our
automatic detection methods are sensitive to magnetic
fields between 3 MG ≤ BWD ≤ 30 MG; field strengths
larger than this are easily identified by visual inspec-
tion. Therefore, there is a significant shortage of close
binary systems that could be the progenitors of the large
Intermediate–Polar and Polar CV populations.
As mentioned in §1, Schmidt et al. (2005b) discuss six
newly identified low accretion rate magnetic binary sys-
tems as being the probable progenitors to magnetic CVs.
The magnetic field strengths of the WDs in these sys-
tems are fairly high, with most around 60 MG. These
objects are clearly pre-Polars and provide an obvious link
between post–common envelope, detached binaries and
Polars. The existence of these objects, however only adds
to the mystery. If observations of these objects are possi-
ble then why have no detached binary systems with large
magnetic field WDs been detected?
Perhaps selection effects are to blame. Schmidt et al.
(2005b) discuss the various selection effects associated
with targeting these pre–Polars with the SDSS. As is
the case with the majority of the close binary systems,
the pre–Polars were targeted by the SDSS QSO target-
ing pipeline (Richards et al. 2002) which accounts for the
narrow range of magnetic field strengths found in these
objects. In the case of significantly lower or higher mag-
netic field strengths, the pre–Polars resemble an ordinary
WD + M dwarf binary in color–color space and are re-
jected by the QSO targeting algorithm. It is possible that
this selection effect accounts for the lack of close binary
magnetic systems targeted by the SDSS as well. Arguing
against this explanation is the large number of detached
close binary systems in our sample, and the fact that the
pre–Polars were observed by the SDSS. It is quite sur-
prising that a detached system with a WD magnetic field
in the range required to detect these pre–Polars has not
been observed, if such objects exist.
Another selection effect discussed by Liebert et al.
(2005) argues that magnetic WDs, on average, are more
massive than non–magnetic WDs; this implies smaller
WD radii and therefore less luminous WDs. Faint, mas-
sive WDs in competition with the flux from an M star
companion might go undetected in an optical survey
because they are hidden by the more luminous, non–
degenerate companion. This would imply an unusually
small mass ratio (q = M2/M1) for the initial binary if
the progenitor of the magnetic white dwarf were mas-
sive (3-8 M⊙). Thus, the magnetics may usually have
been paired with an A-G star. However, the vast major-
ity of polars and intermediate polars with strongly mag-
netic primaries have M dwarf companions. Perhaps they
were whittled down from more massive stars by mass
transfer. The LARPS are selected for spectroscopy be-
cause of their peculiar colors, which arise because of the
isolated cyclotron harmonics. As Schmidt et al. (2005b,
2007) point out, the WDs in LARPS are generally rather
faint (cool) and, in one case, undetected. So the large
mass/small radius selection effect would also apply to
the pre–Polars which have been observed by SDSS.
6. CONCLUSIONS
We present a new sample of close binary systems
through the Data Release Five of the SDSS. This cat-
alog includes more than 1200 WD + M dwarf binary
systems and represents the largest catalog of its kind to
date.
We have fit magnetic DA WD models (see
Schmidt et al. 2003, and references therein) to the 1100
DA WD + M dwarf close binaries in the DR5 sample.
Only two have been found to potentially harbor a mag-
netic DA WD of low (BWD < 10 MG) magnetic field
strength. Neither of these potential magnetic WDs are
convincing cases, though follow–up spectroscopy to im-
prove the S/N or polarimetry on these objects should
be performed to completely rule out the presence of a
magnetic field.
The remaining ∼ 100 close binaries comprised of M
dwarfs with excess blue flux (:+dM) and binaries with
non–DA WDs require other means of detecting mag-
netic fields. Methods that are sensitive to magnetic
fields weaker than 3 MG should also be employed on this
sample to detect possible Intermediate–Polar progenitors
that may have escaped detection with our methods.
Even if future spectroscopic or polarimetric observa-
tions reveal the two DA WD candidates to be magnetic,
their field strengths will likely prove to be quite low. A
sample of two, detached, low magnetic field WD binaries
is not representative of the majority of known magnetic
WDs in CVs nor would it comprise an adequate progen-
itor population for the newly discovered magnetic pre–
Polars described in Schmidt et al. (2005b). The question
of where the progenitors to magnetic CVs are remains
unanswered by the current spectroscopically identified
close binary population.
This work was supported by NSF Grant AST 02–05875
(NMS, SLH), a University of Washington undergraduate
research grant (MPL), NSF grant AST 03–06080 (GDS),
and NSF grant AST 03–07321 (JL).
Funding for the SDSS and SDSS–II has been pro-
vided by the Alfred P. Sloan Foundation, the Partic-
ipating Institutions, the National Science Foundation,
the U.S. Department of Energy, the National Aeronau-
Magnetic WDs in CV Progenitors 7
tics and Space Administration, the Japanese Monbuka-
gakusho, the Max Planck Society, and the Higher Educa-
tion Funding Council for England. The SDSS Web Site
is http://www.sdss.org/.
The SDSS is managed by the Astrophysical Research
Consortium for the Participating Institutions. The Par-
ticipating Institutions are the American Museum of Nat-
ural History, Astrophysical Institute Potsdam, Univer-
sity of Basel, University of Cambridge, Case Western
Reserve University, University of Chicago, Drexel Uni-
versity, Fermilab, the Institute for Advanced Study, the
Japan Participation Group, Johns Hopkins University,
the Joint Institute for Nuclear Astrophysics, the Kavli
Institute for Particle Astrophysics and Cosmology, the
Korean Scientist Group, the Chinese Academy of Sci-
ences (LAMOST), Los Alamos National Laboratory, the
Max–Planck–Institute for Astronomy (MPIA), the Max–
Planck–Institute for Astrophysics (MPA), New Mexico
State University, Ohio State University, University of
Pittsburgh, University of Portsmouth, Princeton Uni-
versity, the United States Naval Observatory, and the
University of Washington.
http://www.sdss.org/
8 Silvestri et al.
REFERENCES
Abazajian, K., et al. 2003, AJ, 126, 2081
—. 2004, AJ, 128, 502
—. 2005, AJ, 129, 1755
Adelman-McCarthy, J. K., et al. 2006, ApJS, 162, 38
—. 2007, ApJS, submitted
(Burleigh), M. R., et al. 2006, MNRAS,
accepted,[astro-ph/0609366], accepted,[astro
de Kool, M., & Ritter, H. 1993, A&A, 267, 397
Debes, J. H., López-Morales, M., Bonanos, A. Z., & Weinberger,
A. J. 2006, ApJ, 647, L147
Eisenstein, D., et al. 2006, AJ, accepted [astro-ph/0606700],
accepted [astro
Farihi, J., Becklin, E. E., & Zuckerman, B. 2005a, ApJS, 161, 394
Farihi, J., Zuckerman, B., & Becklin, E. E. 2005b, Astronomische
Nachrichten, 326, 964
Fukugita, M., Ichikawa, T., Gunn, J. E., Doi, M., Shimasaku, K.,
& Schneider, D. P. 1996, AJ, 111, 1748
Giclas, H. L., Burnham, R., & Thomas, N. G. 1971, Lowell
proper motion survey Northern Hemisphere. The G numbered
stars. 8991 stars fainter than magnitude 8 with motions
≥0′′.26/year (Flagstaff, Arizona: Lowell Observatory, 1971)
Giclas, H. L., Burnham, Jr., R., & Thomas, N. G. 1978, Lowell
Observatory Bulletin, 8, 89
Gunn, J. E., et al. 1998, AJ, 116, 3040
—. 2006, AJ, 131, 2332
Hawley, S. L., et al. 2002, AJ, 123, 3409
Hogg, D. W., Finkbeiner, D. P., Schlegel, D. J., & Gunn, J. E.
2001, AJ, 122, 2129
Holberg, J. B., & Magargal, K. 2005, in ASP Conf. Ser. 334: 14th
European Workshop on White Dwarfs, ed. D. Koester &
S. Moehler, 419–+
Holberg, J. B., Oswalt, T. D., & Sion, E. M. 2002, ApJ, 571, 512
Ivezić, Ž., et al. 2004, Astronomische Nachrichten, 325, 583
Kawka, A., Vennes, S., Schmidt, G. D., Wickramasinghe, D. T.,
& Koch, R. 2007, ApJ, 654, 499
Kemic, S. B. 1974a, ApJ, 193, 213
—. 1974b, ApJ, 193, 213
Kleinman, S. J., et al. 2004, ApJ, 607, 426
Koen, C., & Maxted, P. F. L. 2006, MNRAS, 371, 1675
Langer, N., Deutschmann, A., Wellstein, S., & Höflich, P. 2000,
A&A, 362, 1046
Lemagie, M. P., Silvestri, N. M., Hawley, S. L., Schmidt, G. D.,
Liebert, J., & Wolfe, M. A. 2004, in Bulletin of the American
Astronomical Society, 1515
Liebert, J., Bergeron, P., & Holberg, J. B. 2003, AJ, 125, 348
Liebert, J. et al. 2005, AJ, 129, 2376
Liebert, J., et al. 1988, PASP, 100, 1302
Livio, M. 1996, in ASP Conf. Ser. 90: The Origins, Evolution, and
Destinies of Binary Stars in Clusters, ed. E. F. Milone & J.-C.
Mermilliod, 291
Luyten, W. J. 1968, Univ. Minnesota, Minneapolis,$ fasc. 1-57,$
1963-81,1963, 13, 1 (1968), 13, 1
—. 1972, Proper Motion Survey with the 48-inch Telescope,
Univ. Minnesota, 29, 1 (1972), 29, 1
Luyten, W. J., Anderson, J. H., & University Of
Minnesota. Observatory. 1964, Publications of the
Astronomical Observatory University of Minnesota
Pier, J. R., Munn, J. A., Hindsley, R. B., Hennessy, G. S., Kent,
S. M., Lupton, R. H., & Ivezić, Ž. 2003, AJ, 125, 1559
Politano, M., & Weiler, K. P. 2006, ApJ, 641, L137
Pourbaix, D., et al. 2005, A&A, 444, 643
Raymond, S. N., et al. 2003, AJ, 125, 2621
Richards, G. T., et al. 2002, AJ, 123, 2945
Schmidt, G. D., Szkody, P., Henden, A., Anderson, S. F., Lamb,
D. Q., Margon, B., & Schneider, D. P. 2007, ApJ, 654, 521
Schmidt, G. D., Szkody, P., Silvestri, N. M., Cushing, M. C.,
Liebert, J., & Smith, P. S. 2005a, ApJ, 630, L173
Schmidt, G. D., et al. 2003, ApJ, 595, 1101
—. 2005b, ApJ, 630, 1037
Schuh, S., & Nagel, T. 2006, in ASP Conf. Ser., The 15th
European Workshop on White Dwarfs, ed. R. Napiwotzki &
M. Burleigh, accepted [astro–ph/0610324]
Silvestri, N. M., Hawley, S. L., & Oswalt, T. D. 2005, AJ, 129,
Silvestri, N. M., et al. 2006, AJ, 131, 1674
Smith, J. A. 1997, Ph.D. Thesis, Florida Institute of Technology
Smith, J. A., et al. 2002, AJ, 123, 2121
Smolčić, V., et al. 2004, ApJ, 615, L141
Stoughton, C., et al. 2002, AJ, 123, 485
Tucker, D. L., et al. 2006, Astronomische Nachrichten, 327, 821
van den Besselaar, E. J. M., Roelofs, G. H. A., Nelemans, G. A.,
Augusteijn, T., & Groot, P. J. 2005, A&A, 434, L13
Vanlandingham, K. M., et al. 2005, AJ, 130, 734
Wickramasinghe, D. T., & Ferrario, L. 2000, PASP, 112, 873
York, D. G., et al. 2000, AJ, 120, 1579
http://arxiv.org/abs/astro-ph/0609366
http://arxiv.org/abs/astro-ph/0606700
http://arxiv.org/abs/astro--ph/0610324
Magnetic WDs in CV Progenitors 9
TABLE 1
The SDSS–I DR5 Catalog of Close Binary Systems.
Identifier Plate FiberID MJD Sp1+Sp2a R.A.b Decl. upsf σu Au gpsf σg Ag rpsf σr Ar ipsf σi Ai zpsf σz Az Refs
c Notesd
(SDSS J) (deg) (deg)
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) (16) (17) (18) (19) (20) (21) (22) (23) (24)
001029.87+003126.2 0388 545 51793 DZ:+dM 2.62448 00.52396 21.93 0.19 0.14 20.85 0.04 0.10 19.98 0.03 0.08 19.00 0.02 0.06 18.42 0.04 0.04 EDR
001726.63−002451.2 0687 153 52518 DA+dMe 4.36099 −00.41422 19.68 0.04 0.14 19.29 0.03 0.10 19.03 0.02 0.07 18.19 0.02 0.06 17.54 0.03 0.04 R03
001733.59+004030.4 0389 614 51795 DA+dM 4.38996 00.67511 22.10 0.40 0.13 20.79 0.14 0.10 19.59 0.03 0.07 18.17 0.02 0.05 17.39 0.02 0.04 EDR/R03
001749.24−000955.3 0389 112 51795 DA+dMe 4.45519 −00.16539 16.57 0.02 0.13 16.87 0.02 0.10 17.03 0.01 0.07 16.78 0.01 0.05 16.47 0.02 0.04 EDR/R03
002620.41+144409.5 0753 079 52233 DA+dMe 6.58505 14.73597 17.57 0.01 0.27 17.35 0.01 0.20 17.34 0.02 0.15 16.65 0.01 0.11 16.04 0.02 0.08 DR2
Note. — Table 1 is published in its entirety in the electronic edition of the AJ. A portion is shown here for guidance regarding its form and content. ugriz photometry has not been corrected for Galactic extinction.
Sp1: Spectral type of the WD, Sp2: Spectral type of the low–mass dwarf (see Silvestri et al. 2006, for details on Sp determination); e: emission detected visually.
R.A. and Decl. are J2000.0 equinox.
EDR: Stoughton et al. (2002); DR[1,2,3]: Abazajian et al. (2003, 2004, 2005); DR[4,5]: Adelman-McCarthy et al. (2006, 2007); R03: published in Raymond et al. (2003); K04: published in Kleinman et al. (2004); B05: published in van den Besselaar et al. (2005);
Sc05: published in Schmidt et al. (2005a); S06: published in Silvestri et al. (2006); E06: published in Eisenstein et al. (2006); P05: published in Pourbaix et al. (2005); KM: published in Koen & Maxted (2006); SN: published in Schuh & Nagel (2006); da06: R. da Silva
(priv. comm., 2006).
low: potential low gravity (log g < 7) white dwarf.
10 Silvestri et al.
TABLE 2
Two Potential Magnetic White Dwarf Binary Systems.
Identifier Plate Fiber MJD R.A. Decl. B i Sp1+Sp2 u g r i z Release
SDSS J (deg) (deg) (MG) (deg)
(1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15)
082828.18+471737.9 0549 338 51981 127.11742 +47.29387 8 90 DA+dM 20.41 20.35 20.33 19.58 19.02 DR1
125250.03−020608.1 0338 343 51694 193.20846 −02.10227 3 90 DA+dM 19.25 19.12 18.89 18.31 17.82 DR1
ABSTRACT
  We present the latest catalog of more than 1200 spectroscopically-selected
close binary systems observed with the Sloan Digital Sky Survey through Data
Release Five. We use the catalog to search for magnetic white dwarfs in
cataclysmic variable progenitor systems. Given that approximately 25% of
cataclysmic variables contain a magnetic white dwarf, and that our large sample
of close binary systems should contain many progenitors of cataclysmic
variables, it is quite surprising that we find only two potential magnetic
white dwarfs in this sample. The candidate magnetic white dwarfs, if confirmed,
would possess relatively low magnetic field strengths (B_WD < 10 MG) that are
similar to those of intermediate-Polars but are much less than the average
field strength of the current Polar population. Additional observations of
these systems are required to definitively cast the white dwarfs as magnetic.
Even if these two systems prove to be the first evidence of detached magnetic
white dwarf + M dwarf binaries, there is still a large disparity between the
properties of the presently known cataclysmic variable population and the
presumed close binary progenitors.

<|endoftext|><|startoftext|>
arXiv:0704.0790v3  [hep-th]  15 Nov 2007
Dynamical Casimir effect for gravitons in bouncing braneworlds
Marcus Ruser∗ and Ruth Durrer†
Département de Physique Théorique, Université de Genève,
24 quai Ernest Ansermet, 1211 Genève 4, Switzerland.
We consider a two-brane system in five-dimensional anti-de Sitter space-time. We study particle
creation due to the motion of the physical brane which first approaches the second static brane
(contraction) and then recedes from it (expansion). The spectrum and the energy density of the
generated gravitons are calculated. We show that the massless gravitons have a blue spectrum
and that their energy density satisfies the nucleosynthesis bound with very mild constraints on the
parameters. We also show that the Kaluza-Klein modes cannot provide the dark matter in an anti-
de-Sitter braneworld. However, for natural choices of parameters, backreaction from the Kaluza-
Klein gravitons may well become important. The main findings of this work have been published
in form of a Letter [R.Durrer and M.Ruser, Phys. Rev. Lett. 99, 071601 (2007), arXiv:0704.0756].
PACS numbers: 04.50.+h, 11.10.Kk, 98.80.Cq
I. INTRODUCTION
In recent times, the possibility that our observed
Universe might represent a hypersurface in a higher-
dimensional space-time has received considerable
attention. The main motivation for this idea is the
fact, that string theory [1, 2], which is consistent only
in ten spac-etime dimensions (or 11 for M–theory)
allows for solutions where the standard model particles
(like fermions and gauge bosons) are confined to some
hypersurface, called the brane, and only the graviton can
propagate in the whole space-time, the bulk [2, 3]. Since
gravity is not well constrained at small distances, the
dimensions normal to the brane, the extra dimensions,
can be as large as 0.1mm.
Based on this feature, Arkani-Hamed, Dimopoulos
and Dvali (ADD) proposed a braneworld model where
the presence of two or more flat extra-dimensions can
provide a solution to the hierarchy problem, the problem
of the huge difference between the Planck scale and the
electroweak scale [4, 5].
In 1999 Randall and Sundrum (RS) introduced a model
with one extra dimension, where the bulk is a slice of
five-dimensional anti de-Sitter (AdS) space. Such curved
extra dimensions are also referred to as warped extra
dimensions. While in the RS I model [6] with two flat
branes of opposite tension at the edges of the bulk the
warping leads to an interesting solution of the hierarchy
problem, it localizes four-dimensional gravity on a single
positive tension brane in the RS II model [7].
Within the context of warped braneworlds, cosmological
evolution, i.e., the expansion of the Universe, can be
understood as the motion of the brane representing our
Universe through the AdS bulk. Thereby the Lanczos-
Sen-Darmois-Israel-junction conditions [8, 9, 10, 11],
relate the energy-momentum tensor on the brane to the
∗Electronic address: marcus.ruser@physics.unige.ch
†Electronic address: ruth.durrer@physics.unige.ch
extrinsic curvature and hence to the brane motion which
is described by a modified Friedmann equation. At low
energy, however, the usual Friedmann equations for the
expansion of the Universe are recovered [12, 13].
Since gravity probes the extra dimension, gravita-
tional perturbations on the brane, i.e. in our Universe,
carry five-dimensional effects in form of massive four-
dimensional gravitons, the so-called Kaluza-Klein (KK)
tower. Depending on the particular brane trajectory,
these perturbations may be significantly amplified lead-
ing to observable consequences, for example, a stochastic
gravitational wave background. (For a review of stochas-
tic gravitational waves see [14].) This amplification
mechanism is identical to the dynamical Casimir effect
for the electromagnetic field in cavities with dynamical
walls (moving mirrors); see [15, 16, 17] and references
therein. In the quantum field theoretical language,
such an amplification corresponds to the creation of
particles out of vacuum fluctuations. Hence, in the same
way a moving mirror leads to production of photons,
the brane moving through the bulk causes creation of
gravitons. Thereby, not only the usual four-dimensional
graviton might be produced, but also gravitons of the
KK tower can be excited. Those massive gravitons are of
particular interest, since their energy density could dom-
inate the energy density of the Universe and spoil the
phenomenology if their production is sufficiently copious.
The evolution of cosmological perturbations under
the influence of a moving brane has been the subject of
many studies during recent years. Since one has to deal
with partial differential equations and time-dependent
boundary conditions, the investigation of the evolution
of perturbations in the background of a moving brane is
quite complicated. Analytical progress has been made
based on approximations like the “near brane limit” and
a slowly moving brane [18, 19, 20, 21].
The case of de Sitter or quasi-de Sitter inflation
on the brane has been investigated analytically in
[22, 23, 24, 25, 26]. In [25] it is demonstrated that dur-
http://arxiv.org/abs/0704.0790v3
mailto:marcus.ruser@physics.unige.ch
mailto:ruth.durrer@physics.unige.ch
yb(t)
FIG. 1: Two branes in an AdS5 spacetime, with y denot-
ing the fifth dimension and L the AdS curvature scale. The
physical brane is on the left at time dependent position yb(t).
While it is approaching the static brane its scale factor is
decreasing and when it moves away from the static brane it
is expanding [cf. Eq. (2.3)]. The value of the scale factor of
the brane metric as function of the extra dimension y is also
indicated.
ing slow-roll inflation (modeled as a period of quasi-de
Sitter expansion) the standard four-dimensional result
for the amplitude of perturbations is recovered at low
energies while it is enhanced at high energies.
However, most of the effort has gone into numerical
simulations [27, 28, 29, 30, 31, 32, 33, 34, 35, 36], in
particular in order to investigate the high-energy regime.
Thereby different coordinate systems have been used
for which the brane is at rest, and different numerical
evolution schemes have been employed in order to solve
the partial differential equation.
In this work we chose a different way of looking at
the problem. We shall apply a formalism used to
describe the dynamical Casimir effect to study the
production of gravitons in braneworld cosmology. This
approach and its numerical implementation offers many
advantages. The most important one is the fact that
this approach deals directly with the appearing mode
couplings by means of coupling matrices. (In [19] a
similar approach involving coupling matrices has been
used. However, perturbatively only, and not in the
complexity presented here.) Hence, the interaction
between the four-dimensional graviton and the KK
modes is not hidden within a numerical simulation but
can directly be investigated making it possible to reveal
the underlying physics in a very transparent way.
We consider a five-dimensional anti-de Sitter spacetime
with two branes in it; a moving positive tension brane
representing our Universe and a second brane which, for
definiteness, is kept at rest. This setup is depicted in
Fig. 1. For this model we have previously shown that
in a radiation dominated Universe, where the second,
fixed brane is arbitrarily far away, no gravitons are
produced [37].
The particular model which we shall consider is
strongly motivated by the ekpyrotic or cyclic Universe
and similar ideas [38, 39, 40, 41, 42, 43, 44, 45, 46].
In this model, roughly speaking, the hot big bang
corresponds to the collision of two branes; a moving
bulk brane which hits “our” brane, i.e. the observable
Universe. Within such a model, it seems to be possible to
address all major cosmological problems (homogeneity,
origin of density perturbations, monopole problem)
without invoking the paradigm of inflation. For more
details see [38] but also [39] for critical comments.
One important difference between the ekpyrotic model
and standard inflation is that in the latter one tensor
perturbations have a nearly scale invariant spectrum.
The ekpyrotic model, on the other hand, predicts a
strongly blue gravitational wave spectrum with spectral
tilt nT ≃ 2 [38]. This blue spectrum is a key test
for the ekpyrotic scenario since inflation always pre-
dicts a slightly red spectrum for gravitational waves.
One method to detect a background of primordial
gravitational waves of wavelengths comparable to the
Hubble horizon today is the polarization of the cosmic
microwave background. Since a strongly blue spectrum
of gravitational waves is unobservably small for large
length scales, the detection of gravitational waves in the
cosmic microwave background polarization would falsify
the ekpyrotic model [38].
Here we consider a simple specific model which is
generic enough to cover important main features of the
generation and evolution of gravitational waves in the
background of a moving brane whose trajectory involves
a bounce. First, the physical brane moves towards the
static brane, initially the motion is very slow. During
this phase our Universe is contracting, i.e. the scale
factor on the brane decreases, the energy density on
the brane increases and the motion becomes faster. We
suppose that the evolution of the brane is driven by a
radiation component on the brane, and that at some
more or less close encounter of the two branes which we
call the bounce, some high-energy mechanism which we
do not want to specify in any detail, turns around the
motion of the brane leading to an expanding Universe.
Modeling the transition from contraction to subsequent
expansion in any detail would require assumptions about
unknown physics. We shall therefore ignore results
which depend on the details of the transition. Finally
the physical brane moves away from the static brane
back towards the horizon with expansion first fast and
then becoming slower as the energy density drops. This
model is more similar to the pyrotechnic Universe of
Kallosh, Kofman and Linde [39] where the observable
Universe is also represented by a positive tension brane
rather than to the ekpyrotic model where our brane has
negative tension.
We address the following questions: What is the spec-
trum and energy density of the produced gravitons, the
massless zero mode and the KK modes? Can the gravi-
ton production in such a brane Universe lead to limits,
e.g. on the AdS curvature scale via the nucleosynthesis
bound? Can the KK modes provide the dark matter
or lead to stringent limits on these models? Similar
results could be obtained for the free gravi-photon and
gravi-scalar, i.e. when we neglect the perturbations of
the brane energy momentum tensor which also couple to
these gravity wave modes which have spin-1 respectively
spin-0 on the brane.
The reminder of the paper is organized as follows.
After reviewing the basic equations of braneworld cos-
mology and tensor perturbations in Sec. II, we discuss
the dynamical Casimir effect approach in Sec. III. In
Sec. IV we derive expressions for the energy density
and the power spectrum of gravitons. Thereby we show
that, very generically, KK gravitons cannot play the
role of dark matter in warped braneworlds. This is
explained by the localization of gravity on the moving
brane which we discuss in detail. Section V is devoted to
the presentation and discussion of our numerical results.
In Sec. VI we reproduce some of the numerical results
with analytical approximations and we derive fits for
the number of produced gravitons. We discuss our main
results and their implications for bouncing braneworlds
in Sec. VII and conclude in Sec. VIII. Some technical
aspects are collected in appendices.
The main and most important results of this rather long
and technical paper are published in the Letter [47].
II. GRAVITONS IN MOVING BRANEWORLDS
A. A moving brane in AdS5
We consider a AdS-5 spacetime. In Poincaré coordi-
nates, the bulk metric is given by
ds2 = gABdx
AdxB =
−dt2 + δijdxidxj + dy2
(2.1)
The physical brane (our Universe) is located at some time
dependent position y = yb(t), while the 2nd brane is at
fixed position y = ys (see Fig. 1). The induced metric on
the physical brane is given by
ds2 =
y2b (t)
dt2 + δijdx
= a2(η)
−dη2 + δijdxidxj
, (2.2)
where
a(η) =
yb(t)
(2.3)
is the scale factor and η denotes the conformal time of
an observer on the brane,
dt ≡ γ−1dt . (2.4)
We have introduced the brane velocity
v ≡ dyb
= − LH√
1 + L2H2
and (2.5)
1− v2
1 + L2H2 . (2.6)
Here H is the usual Hubble parameter,
H ≡ ȧ/a2 ≡ a−1H = −L−1γv , (2.7)
and an overdot denotes the derivative with respect to
conformal time η. The bulk cosmological constant Λ is
related to the curvature scale L by Λ = −6/L2. The
junction conditions on the brane lead to [37, 48]
(ρ+ T ) = 6
1 + L2H2
, (2.8)
(ρ+ P ) = − 2LḢ
1 + L2H2
. (2.9)
Here T is the brane tension and ρ and P denote the
energy density and pressure of the matter confined on
the brane. Combining (2.8) and (2.9) results in
ρ̇ = −3Ha(ρ+ P ) , (2.10)
while taking the square of (2.8) leads to
. (2.11)
These equations form the basis of brane cosmology and
have been discussed at length in the literature (for re-
views see [49, 50]). The last equation is called the mod-
ified Friedmann equation for brane cosmology [13]. For
usual matter with ρ+ P > 0, ρ decreases during expan-
sion and at sufficiently late time ρ ≪ T . The ordinary
four-dimensional Friedmann equation is then recovered if
and we set κ
= 8πG4 =
. (2.12)
Here we have neglected a possible four-dimensional cos-
mological constant. The first of these equations is the
RS fine tuning implying
κ5 = κ4 L . (2.13)
Defining the string and Planck scales by
= L3s , κ4 =
= L2Pl , (2.14)
respectively, the RS fine tuning condition leads to
. (2.15)
As outlined in the introduction, we shall be interested
mainly in a radiation dominated low-energy phase, hence
in the period where
ρ and |v| ≪ 1 so that γ ≃ 1 , dη ≃ dt .
(2.16)
In such a period, the solutions to the above equations are
of the form
a(t) =
|t|+ tb
, (2.17)
yb(t) =
|t|+ tb
, (2.18)
v(t) = − sgn(t)L
(|t|+ tb)2
≃ −HL . (2.19)
Negative times (t < 0) describe a contracting phase,
while positive times (t > 0) describe radiation dominated
expansion. At t = 0, the scale factor exhibits a kink and
the evolution equations are singular. This is the bounce
which we shall not model in detail, but we will have to in-
troduce a cutoff in order to avoid ultraviolet divergencies
in the total particle number and energy density which are
due to this unphysical kink. We shall show, that when
the kink is smoothed out at some length scale, the pro-
duction of particles (KK gravitons) of masses larger than
this scale is exponentially suppressed, as it is expected.
The (free) parameter tb > 0 determines the value of the
scale factor at the bounce ab, i.e. the minimal interbrane
distance, as well as the velocity at the bounce vb
ab = a(0) =
, |v(0)| ≡ vb =
. (2.20)
Apparently we have to demand tb > L which implies
yb(t) < L.
B. Tensor perturbations in AdS5
We now consider tensor perturbations on this back-
ground. Allowing for tensor perturbations hij(t,x, y) of
the spatial three-dimensional geometry at fixed y, the
bulk metric reads
ds2 =
−dt2 + (δij + 2hij)dxidxj + dy2
. (2.21)
Tensor modes satisfy the traceless and transverse condi-
tions, hii = ∂ih
j = 0. These conditions imply that hij has
only two independent degrees of freedom, the two polar-
ization states • = ×,+. We decompose hij into spatial
Fourier modes,
hij(t,x, y) =
(2π)3/2
•=+,×
eik·xe•ij(k)h•(t, y;k) ,
(2.22)
where e•ij(k) are unitary constant transverse-traceless po-
larization tensors which form a basis of the two polariza-
tion states • = ×,+. For hij to be real we require
h∗•(t, y;k) = h•(t, y;−k). (2.23)
The perturbed Einstein equations yield the equation of
motion for the mode functions h
, which obey the Klein-
Gordon equation for minimally coupled massless scalar
fields in AdS5 [25, 51, 52]
∂2t + k
2 − ∂2y +
(t, y;k) = 0 . (2.24)
In addition to the bulk equation of motion the modes also
satisfy a boundary condition at the brane coming from
the second junction condition,
LH∂th• −
1 + L2H2∂yh•
− γ (v∂t + ∂y)h•|yb =
aPΠ(T )
. (2.25)
Here Π(T )
denotes possible anisotropic stress perturba-
tions in the brane energy momentum tensor. We are in-
terested in the quantum production of free gravitons, not
in the coupling of gravitational waves to matter. There-
fore we shall set Π(T )
= 0 in the sequel, i.e. we make
the assumption that the Universe is filled with a perfect
fluid. Then, (2.25) reduces to 1
(v∂t + ∂y)h•|yb(t) = 0 . (2.26)
This is not entirely correct for the evolution of gravity
modes since at late times, when matter on the brane is
no longer a perfect fluid (e.g., free-streaming neutrinos)
and anisotropic stresses develop which slightly modify
the evolution of gravitational waves. We neglect this sub-
dominant effect in our treatment. (Some of the difficul-
ties which appear when Π(T )
6= 0 are discussed in [48].)
The wave equation (2.24) together with the boundary
condition (2.26) can also be obtained by variation of the
action
Sh = 2
yb(t)
|∂th•|2 − |∂yh•|2 − k2|h•|2
, (2.27)
which follows from the second order perturbation of the
gravitational Lagrangian. The factor 2 in the action is
1 In Equations (4) and (8) of our Letter [47] two sign mistakes
have creeped in.
due to Z2 symmetry. Indeed, Equation (2.26) is the only
boundary condition for the perturbation amplitude h•
which is compatible with the variational principle δSh =
0, except if h• is constant on the brane. Since this issue is
important in the following, it is discussed more detailed
in Appendix A.
C. Equations of motion in the late time/low energy
limit
In this work we restrict ourselves to relatively late
times, when
ρT ≫ ρ2 and therefore |v| ≪ 1. (2.28)
In this limit the conformal time on the brane agrees
roughly with the 5D time coordinate, dη ≃ dt and we
shall therefore not distinguish these times; we set t = η.
We want to study the quantum mechanical evolution
of tensor perturbations within a canonical formulation
similar to the dynamical Casimir effect for the electro-
magnetic field in dynamical cavities [15, 16, 17]. In or-
der to pave the way for canonical quantization, we have
to introduce a suitable set of functions allowing the ex-
pansion of the perturbation amplitude h• in canonical
variables. More precisely, we need a complete and or-
thonormal set of eigenfunctions φα of the spatial part
−∂2y + 3y∂y = −y
y−3∂y
of the differential opera-
tor (2.24). The existence of such a set depends on the
boundary conditions and is ensured if the problem is
of Sturm-Liouville type (see, e.g.,[53]). For the junc-
tion condition (2.26), such a set does unfortunately not
exist due to the time derivative. One way to proceed
would be to introduce other coordinates along the lines
of [54] for which the junction condition reduces to a sim-
ple Neumann boundary condition leading to a problem
of Sturm-Liouville type. This transformation is, however,
relatively complicated to implement without approxima-
tions and is the subject of future work.
Here we shall proceed otherwise, harnessing the fact that
we are interested in low energy effects only, i.e. in small
brane velocities. Assuming that one can neglect the
time derivative in the junction condition since |v| ≪ 1,
Eq. (2.25) reduces to a simple Neumann boundary con-
dition. We shall therefore work with the boundary con-
ditions
∂yh•|yb = ∂yh•|ys = 0 . (2.29)
Then, at any time t the eigenvalue problem for the spatial
part of the differential operator (2.24)
−∂2y +
φα(t, y) = −y3∂y
y−3∂yφα(t, y)
= m2α(t)φα(t, y) (2.30)
is of Sturm-Liouville type if we demand that the φα’s are
subject to the boundary conditions (2.29). Consequently,
the set of eigenfunctions {φα(t, y)}∞α=0 is complete,
φα(t, y)φα(t, ỹ) = δ(y − ỹ)y3 , (2.31)
and orthonormal with respect to the inner-product
(φα, φβ) = 2
yb(t)
φα(t, y)φβ(t, y) = δαβ . (2.32)
Note the factor 2 in front of both expressions which is
necessary in order to take the Z2 symmetry properly into
account.
The eigenvalues mα(t) are time-dependent and discrete
due to the time-dependent but finite distance between
the branes and the eigenfunctions φα(t, y) are time-
dependent in particular because of the time dependence
of the boundary conditions (2.29). The case α = 0
with m0 = 0 is the zero mode, i.e. the massless four-
dimensional graviton. Its general solution in accordance
with the boundary conditions is just a constant with re-
spect to the extra dimension, φ0(t, y) = φ0(t), and is fully
determined by the normalization condition (φ0, φ0) = 1:
φ0(t) =
ysyb(t)√
y2s − y2b (t)
. (2.33)
For α = i ∈ {1, 2, 3, · · · , } with eigenvalues mi > 0, the
general solution of (2.30) is a combination of the Bessel
functions J2 (mi(t) y) and Y2 (mi(t) y). Their particular
combination is determined by the boundary condition at
the moving brane. The remaining boundary condition at
the static brane selects the possible values for the eigen-
values mi(t), the KK masses. For any three-momentum
k these masses build up an entire tower of momenta in
the y-direction; the fifth dimension. Explicitely, the so-
lutions φi(t, y) for the KK modes read
φi(t, y) = Ni(t)y
2C2 (mi(t) y) (2.34)
Cν(miy) = Y1(miyb)Jν(miy)−J1(miyb)Yν(miy). (2.35)
The normalization reads
Ni(t, yb, ys) =
y2sC22(mi ys)− (2/(miπ))
(2.36)
where we have used that
C2(mi yb) =
πmi yb
. (2.37)
2 Note that we have changed the parameterization of the solutions
with respect to [37] for technical reasons. There, we also did not
take into account the factor 2 related to Z2 symmetry.
It can be simplified further by using
C2(mi ys) =
Y1(mi yb)
Y1(mi ys)
πmi ys
(2.38)
leading to
Y 21 (miys)
Y 21 (miyb)− Y 21 (miys)
. (2.39)
Note that it is possible to have Y 21 (mi ys)− Y 21 (mi yb) =
0. But then both Y 21 (miys) = Y
1 (miyb) = 0 and
Eq. (2.39) has to be understood as a limit. For that rea-
son, the expression (2.36) for the normalization is used
in the numerical simulations later on. Its denominator
remains always finite.
The time-dependent KK masses {mi(t)}∞i=1 are deter-
mined by the condition
C1 (mi(t)ys) = 0 . (2.40)
Because the zeros of the cross product of the Bessel func-
tions J1 and Y1 are not known analytically in closed form,
the KK-spectrum has to be determined by solving Eq.
(2.40) numerically 3. An important quantity which we
need below is the rate of change ṁi/mi of a KK mass
given by
m̂i ≡
= ŷb
m2i π
N2i (2.41)
where the rate of change of the brane motion ŷb is just
the Hubble parameter on the brane
ŷb(t) ≡
ẏb(t)
yb(t)
≃ −Ha = − ȧ
= −H . (2.42)
On account of the completeness of the eigenfunctions
φα(t, y) the gravitational wave amplitude h•(t, y;k)
subject to the boundary conditions (2.29) can now be
expanded as
h•(t, y;k) =
qα,k,•(t)φα(t, y) . (2.43)
The coefficients qα,k,•(t) are canonical variables describ-
ing the time evolution of the perturbations and the fac-
κ5/L3 has been introduced in order to render the
qα,k,•’s canonically normalized. In order to satisfy (2.23)
we have to impose the same condition for the canonical
variables, i.e.
q∗α,k,• = qα,−k,•. (2.44)
3 Approximate expressions for the zeros can be found in [55].
One could now insert the expansion (2.43) into the wave
equation (2.24), multiplying it by φβ(t, y) and integrating
out the y−dependence by using the orthonormality to de-
rive the equations of motion for the variables qα,k,•. How-
ever, as we explain in Appendix A, a Neumann boundary
condition at a moving brane is not compatible with a free
wave equation. The only consistent way to implement the
boundary conditions (2.29) is therefore to consider the
action (2.27) of the perturbations as the starting point
to derive the equations of motion for qα,k,•. Inserting
(2.43) into (2.27) leads to the canonical action
S = 1
|q̇α,k,•|2 − ω2α,k|qα,k,•|2
Mαβ (qα,k,•q̇β,−k,• + qα,−k,•q̇β,k,•)
+Nαβqα,k,•qβ,−k,•
. (2.45)
We have introduced the time-dependent frequency of a
graviton mode
ω2α,k =
k2 +m2α , k = |k| , (2.46)
and the time-dependent coupling matrices
Mαβ = (∂tφα, φβ) , (2.47)
Nαβ = (∂tφα, ∂tφβ) =
MαγMβγ (2.48)
which are given explicitely in Appendix B (see also [37]).
Consequently, the equations of motion for the canonical
variables are
q̈α,k,• + ω
α,kqα,k,• +
[Mβα −Mαβ ] q̇β,k,•
Ṁαβ −Nαβ
qβ,k,• = 0 . (2.49)
The motion of the brane through the bulk, i.e.
the expansion of the Universe, is encoded in the
time-dependent coupling matrices Mαβ , Nαβ. The
mode couplings are caused by the time-dependent
boundary condition ∂yh•(t, y)|yb = 0 which forces the
eigenfunctions φα(t, y) to be explicitly time-dependent.
In addition, the frequency of a KK mode ωα,k is also
time-dependent since the distance between the two
branes changes when the brane is in motion. Both time-
dependencies can lead to the amplification of tensor
perturbations and, within a quantum theory which is
developed in the next section, to graviton production
from vacuum.
Because of translation invariance with respect to the
directions parallel to the brane, modes with different k
do not couple in (2.49). The three-momentum k enters
the equation of motion for the perturbation only via the
frequency ωα,k, i.e. as a global quantity. Equation (2.49)
is similar to the equation describing the time-evolution
of electromagnetic field modes in a three-dimensional
dynamical cavity [16] and may effectively be described
by a massive scalar field on a time-dependent interval
[17]. For the electromagnetic field, the dynamics of the
cavity, or more precisely the motion of one of its walls,
leads to photon creation from vacuum fluctuations. This
phenomenon is usually referred to as dynamical Casimir
effect. Inspired by this, we shall call the production of
gravitons by the moving brane as dynamical Casimir
effect for gravitons.
D. Remarks and comments
In [37] we have already shown that in the limit where
the fixed brane is sent off to infinity, ys → ∞, only the
M00 matrix element survives with M00 = −H[1 + O(ǫ)]
and ǫ = yb/ys. M00 expresses the coupling of the zero
mode to the brane motion. Since all other couplings dis-
appear for ǫ → 0 all modes decouple from each other and,
in addition, the canonical variables for the KK modes de-
couple from the brane motion itself. This has led to the
result that at late times and in the limit ys ≫ yb, the KK
modes with non-vanishing mass evolve trivially, and only
the massless zero mode is coupled to the brane motion
q̈0,k,• +
k2 − Ḣ −H2
q0,k,• = 0 . (2.50)
Since φ0 ∝ 1/a [cf. Eqs. (4.2),(4.5)] we have found in [37]
that the gravitational zero mode on the brane h0,•(t;k) ≡√
κ5/L3q0,k,•φ0(t, yb) evolves according to
ḧ0,•(t;k) + 2Hḣ0,•(t;k) + k2h0,•(t;k) = 0 , (2.51)
which explicitely demonstrates that at low energies (late
times) the homogeneous tensor perturbation equation in
brane cosmology reduces to the four-dimensional tensor
perturbation equation.
An important comment is in order here concerning
the RS II model. In the limit ys → ∞ the fixed brane
is sent off to infinity and one ends up with a single
positive tension brane in AdS, i.e. the RS II model.
Even though we have shown that all couplings except
M00 vanish in this limit, that does not imply that this
is necessarily the case for the RS II setup. Strictly
speaking, the above arguments are only valid in a two
brane model with ys ≫ 1. Starting with the RS II model
from the beginning, the coupling matrices do in general
not vanish when calculated with the corresponding
eigenfunctions which can be found in, e.g., [22]. One
just has to be careful when taking those limits. But
what the above consideration demonstrates is that, if
the couplings of the zero mode to the KK modes vanish,
like in the ys ≫ 1 limit or in the low energy RS II model
as observed in numerical simulations (see below) the
standard evolution equation for the zero mode emerges
automatically from five-dimensional perturbation theory.
Starting from five-dimensional perturbation theory,
our formalism does imply the usual evolution equation
for the four-dimensional graviton in a FLRW-Universe
in the limit of vanishing couplings. This serves as a
very strong indication (but certainly not proof!) for
the fact that the approach based on the approximation
(2.29) and the expansion of the action in canonical
variables rather than the wave equation is consistent
and leads to results which should reflect the physics at
low energies. As already outlined, if one would expand
the wave equation (2.24) in the set of functions φα,
the resulting equation of motion for the corresponding
canonical variables is different from Eq. (2.49) and
cannot be derived from a Lagrangian or Hamiltonian
(see Appendix A). Moreover, in [30] the low energy RS
II scenario has been studied numerically including the
full junction condition (2.26) without approximations
(see also [27]). Those numerical results show that
the evolution of tensor perturbations on the brane is
four-dimensional, i.e. described by Eq. (2.51) derived
here analytically. Combining these observations gives
us confidence that the used approach based on the
Neumann boundary condition approximation and the
action as starting point for the canonical formulation is
adequate for the study of tensor perturbations in the
low energy limit. The many benefits this approach offers
will become visible in the following.
III. QUANTUM GENERATION OF TENSOR
PERTURBATIONS
A. Preliminary remarks
We now introduce a treatment of quantum generation
of tensor perturbations. This formalism is an advance-
ment of the method which is presented in [15, 16, 17]
for the dynamical Casimir effect for a scalar field and
the electromagnetic field to gravitational perturbations
in the braneworld scenario.
The following method is very general and not restricted
to a particular brane motion as long as it complies with
the low energy approach [cf. Eq. (2.28)]. We assume that
asymptotically, i.e. for t → ±∞, the physical brane
approaches the Cauchy horizon (yb → 0), moving very
slowly. Then, the coupling matrices vanish and the KK
masses are constant (for yb close to zero, Eq. (2.40) re-
duces to J1(miys) = 0):
Mαβ(t) = 0 , lim
mα(t) = const. ∀α, β .
(3.1)
In this limit, the system (2.49) reduces to an infinite set
of uncoupled harmonic oscillators. This allows to intro-
duce an unambiguous and meaningful particle concept,
i.e. notion of (massive) gravitons.
As a matter of fact, in the numerical simulations, the
brane motion has to be switched on and off at finite times.
These times are denoted by tin and tout, respectively. We
introduce vacuum states with respect to times t < tin < 0
and t > tout > 0. In order to avoid spurious effects influ-
encing the particle creation, we have to chose tin small,
respectively tout large enough such that the couplings are
effectively zero at these times. Checking the indepen-
dence of the numerical results on the choice of tin and
tout guarantees that these times correspond virtually to
the real asymptotic states of the brane configuration.
B. Quantization, initial and final state
Canonical quantization of the gravity wave amplitude
is performed by replacing the canonical variables qα,k,•
by the corresponding operators q̂α,k,•
ĥ•(t, y;k) =
q̂α,k,•(t)φα(t, y) . (3.2)
Adopting the Heisenberg picture to describe the quantum
time-evolution, it follows that q̂α,k,• satisfies the same
equation (2.49) as the canonical variable qα,k,•.
Under the assumptions outlined above, the operator
q̂α,k,• can be written for times t < tin as
q̂α,k,•(t < tin) = (3.3)
2ωinα,k
âinα,k,•e
−i ωinα,k t + âin†α,−k,•e
i ωinα,k t
where we have introduced the initial-state frequency
ωinα,k ≡ ωα,k(t < tin) . (3.4)
This expansion ensures that Eq. (2.44) is satisfied. The
set of annihilation and creation operators {âinα,k,•, â
α,k,•}
corresponding to the notion of gravitons for t < tin is
subject to the usual commutation relations
âinα,k,•, â
α′,k′,•′
= δαα′δ••′δ
(3)(k− k′) , (3.5)
âinα,k,•, â
α′,k′,•′
α,k,•, â
α′,k′,•′
= 0. (3.6)
For times t > tout, i.e. after the motion of the brane has
ceased, the operator q̂α,k,• can be expanded in a similar
manner,
q̂α,k,•(t > tout) = (3.7)
2ωoutα,k
âoutα,k,•e
−i ωout
t + â
out †
α,−k,•e
i ωout
with final state frequency
ωoutα,k ≡ ωα,k(t > tout) . (3.8)
The annihilation and creation operators {âoutα,k,•, â
out †
α,k,•}
correspond to a meaningful definition of final state gravi-
tons (they are associated with positive and negative fre-
quency solutions for t ≥ tout) and satisfy the same com-
mutation relations as the initial state operators.
Initial |0, in〉 ≡ |0, t < tin〉 and final |0, out〉 ≡ |0, t > tout〉
vacuum states are uniquely defined via 4
âinα,k,•|0, in〉 = 0 , âoutα,k,•|0, out〉 = 0 , ∀ α, k, • . (3.9)
The operators counting the number of particles defined
with respect to the initial and final vacuum state, respec-
tively, are
N̂ inα,k,• = â
α,k,•â
α,k,• , N̂
α,k,• = â
out †
α,k,•â
α,k,• . (3.10)
The number of gravitons created during the motion of
the brane for each momentum k, quantum number α and
polarization state • is given by the expectation value of
the number operator N̂outα,k,• of final-state gravitons with
respect to the initial vacuum state |0, in〉:
N outα,k,• = 〈0, in|N̂outα,k,•|0, in〉. (3.11)
If the brane undergoes a non-trivial dynamics between
tin < t < tout it is â
α,k,•|0, in〉 6= 0 in general, i.e.
graviton production from vacuum fluctuations takes
place.
From (2.22), the expansion (3.2) and Eqs.(3.3),
(3.7) it follows that the quantized tensor perturbation
with respect to the initial and final state can be written
ĥij(t < tin,x,y) =
(2π)3/2
âinα,k,• e
−i ωinα,k t
2ωinα,k
× u•ij,α(t < tin,x, y,k) + h.c. (3.12)
ĥij(t > tout,x,y) =
(2π)3/2
âoutα,k,• e
−i ωoutα,k t
2ωoutα,k
× u•ij,α(t > tout,x, y,k) + h.c. . (3.13)
We have introduced the basis functions
u•ij,α(t,x, y,k) = e
ik ·x e•ij(k)φα(t, y). (3.14)
which, on account of (e•ij(k))
∗ = e•ij(−k), satisfy
(u•ij,α(t,x, y,k))
∗ = u•ij,α(t,x, y,−k).
4 Note that the notations |0, t < tin〉 and |0, t > tout〉 do not mean
that the states are time-dependent; states do not evolve in the
Heisenberg picture.
C. Time evolution
During the motion of the brane the time evolution of
the field modes is described by the system of coupled
differential equations (2.49). To account for the inter-
mode couplings mediated by the coupling matrix Mαβ
the operator q̂α,k,• is decomposed as
q̂α,k,•(t) =
2ωinβ,k
âinβ,k,•ǫ
α,k(t) + â
β,−k,•ǫ
α,k (t)
(3.15)
The complex functions ǫ
α,k(t) also satisfy the system of
coupled differential equations (2.49). With the ansatz
(3.15) the quantized tensor perturbation at any time dur-
ing the brane motion reads
ĥij(t,x, y) = (3.16)
âinβ,k,•√
2ωinβ,k
α,k(t)u
ij,α(t,x, y,k) + h.c. .
Due to the time-dependence of the eigenfunctions φα,
the time-derivative of the gravity wave amplitude con-
tains additional mode coupling contributions. Using the
completeness and orthnormality of the φα’s it is readily
shown that
h•(t, y;k) =
p̂α,−k,•(t)φα(t, y) (3.17)
where
p̂α,−k,•(t) = ˙̂qα,k,•(t) +
Mβαq̂β,k,•(t). (3.18)
The coupling term arises from the time dependence of
the mode functions φα. Accordingly, the time derivative
hij reads
hij(t,x, y) =
âinβ,k,•√
2ωinβ,k
× (3.19)
× f (β)α,k(t)u
ij,α(t,x, y,k) + h.c.
where we have introduced the function
α,k(t) = ǫ̇
α,k(t) +
Mγα(t)ǫ
γ,k(t) . (3.20)
By comparing Eq. (3.12) and its time-derivative with
Eqs. (3.16) and (3.19) at t = tin one can read off the
initial conditions for the functions ǫ
α,k(tin) = δαβ Θ
α,k , (3.21)
α,k(tin) =
−iωinα,kδαβ −Mβα(tin)
Θinβ,k (3.22)
with phase
Θinα,k = e
−iωinα,k tin . (3.23)
The choice of this phase for the initial condition is in
principle arbitrary, we could as well set Θinα,k = 1. But
with this choice, ǫ
α,k(t) is independent of tin for t < tin
and therefore it is also at later times independent of tin
if only we choose tin sufficiently early. This is especially
useful for the numerical work.
D. Bogoliubov transformations
The two sets of annihilation and creation operators
{âinα,k,•, â
α,k,•} and {âoutα,k,•, â
out †
α,k,•} corresponding to the
notion of initial-state and final-state gravitons are re-
lated via a Bogoliubov transformation. Matching the
expression for the tensor perturbation Eq. (3.16) and
its time-derivative Eq. (3.19) with the final state expres-
sion Eq. (3.13) and its corresponding time-derivative at
t = tout one finds
âoutβ,k,• =
Aαβ,k(tout)âinα,k,• + B∗αβ,k(tout)â
α,−k,•
(3.24)
Aβα,k(tout) =
ωoutα,k
ωinβ,k
α,k(tout) +
ωoutα,k
α,k(tout)
(3.25)
Bβα,k(tout) =
Θoutα,k
ωoutα,k
ωinβ,k
α,k(tout)−
ωoutα,k
α,k(tout)
(3.26)
where we shall stick to the phase Θoutα,k defined like Θ
in (3.23) for completeness. Performing the matching at
tout = tin the Bogoliubov transformation should become
trivial, i.e. the Bogoliubov coefficients are subject to vac-
uum initial conditions
Aαβ,k(tin) = δαβ , Bαβ,k(tin) = 0. (3.27)
Evaluating the Bogoliubov coefficients (3.25) and (3.26)
for tout = tin by making use of the initial conditions
(3.21) and (3.22) shows the consistency. Note that the
Bogoliubov transformation (3.24) is not diagonal due
to the inter-mode coupling. If during the motion of
the brane the graviton field departs form its vacuum
state one has Bαβ,k(tout) 6= 0, i.e. gravitons have been
generated.
By means of Eq. (3.24) the number of generated
final state gravitons (3.11), which is the same for every
polarization state, is given by
N outα,k (t ≥ tout) =
•=+,×
〈0, in|N̂outα,k,•|0, in〉
|Bβα,k(tout)|2. (3.28)
Later we will sometimes interpret tout as a continuous
variable tout → t such that N outα,k → Nα,k(t), i.e. it
becomes a continuous function of time. We shall call
Nα,k(t) the instantaneous particle number [see Appendix
C 2], however, a physical interpretation should be made
with caution.
E. The first order system
From the solutions of the system of differential equa-
tions (2.49) for the complex functions ǫ
α,k, the Bogoli-
ubov coefficient Bαβ,k, and hence the number of cre-
ated final state gravitons (3.28), can now be calculated.
It is however useful to introduce auxiliary functions
α,k(t), η
α,k(t) through
α,k(t) = ǫ
α,k(t) +
ωinα,k
α,k(t) (3.29)
α,k(t) = ǫ
α,k(t)−
ωinα,k
α,k(t) . (3.30)
These are related to the Bogoliubov coefficients via
Aβα,k(tout) = (3.31)
ωoutα,k
ωinβ,k
∆+α,k(tout)ξ
α,k(tout) + ∆
α,k(tout)η
α,k(tout)
Bβα,k(tout) = (3.32)
Θoutα,k
ωoutα,k
ωinβ,k
∆−α,k(tout)ξ
α,k(tout) + ∆
α,k(tout)η
α,k(tout)
where we have defined
∆±α,k(t) =
ωinα,k
ωα,k(t)
, (3.33)
Using the second order differential equation for ǫ
α,k, it is
readily shown that the functions ξ
α,k(t), η
α,k(t) satisfy
the following system of first order differential equations:
α,k(t) = −i
a+αα,k(t)ξ
α,k(t)− a
αα,k(t)η
α,k(t)
c−αγ,k(t)ξ
γ,k(t) + c
αγ,k(t)η
γ,k(t)
(3.34)
α,k(t) = −i
a−αα,k(t)ξ
α,k(t)− a
αα,k(t)η
α,k(t)
c+αγ,k(t)ξ
γ,k(t) + c
αγ,k(t)η
γ,k(t)
(3.35)
a±αα,k(t) =
ωinα,k
ωα,k(t)
ωinα,k
 , (3.36)
c±γα,k(t) =
Mαγ(t)±
ωinα,k
ωinγ,k
Mγα(t)
. (3.37)
The vacuum initial conditions (3.27) entail the initial
conditions
α,k(tin) = 2 δαβ Θ
α,k , η
α,k(tin) = 0. (3.38)
With the aid of Eq. (3.32), the coefficient Bαβ,k(tout),
and therefore the number of produced gravitons, can be
directly deduced from the solutions to this system of
coupled first order differential equations which can be
solved using standard numerics.
In the next section we will show how interesting
observables like the power spectrum and the energy den-
sity of the amplified gravitational waves are expressed in
terms of the number of created gravitons. The system
(3.34, 3.35) of coupled differential equations forms the
basis of our numerical simulations. Details of the applied
numerics are collected in Appendix D.
IV. POWER SPECTRUM, ENERGY DENSITY
AND LOCALIZATION OF GRAVITY
A. Perturbations on the brane
By solving the system of coupled differential equations
formed by Eqs. (3.34) and (3.35) the time evolution of
the quantized tensor perturbation ĥij(t,x, y) can be com-
pletely reconstructed at any position y in the bulk. Ac-
cessible to observations is the imprint which the pertur-
bations leave on the brane, i.e. in our Universe. Of par-
ticular interest is therefore the part of the tensor pertur-
bation which resides on the brane. It is given by eval-
uating Eq. (2.22) at the brane position y = yb (see also
[36])
ĥij(t,x, yb) =
(2π)3/2
•=+,×
eik·xe•ij(k)ĥ•(t, yb,k) .
(4.1)
The motion of the brane (expansion of the Universe) en-
ters this expression via the eigenfunctions φα(t, yb(t)).
We shall take (4.1) as the starting point to define ob-
servables on the brane.
The zero-mode function φ0(t) [cf. Eq. (2.33)] does not
depend on the extra dimension y. Using Eq. (2.37), one
reads off from Eq. (2.34) that the eigenfunctions on the
brane φα(t, yb) are
φα(t, yb) = yb Yα(yb) =
Yα(a) (4.2)
where we have defined
Y0(a) =
y2s − y2b
and (4.3)
Yn(a) =
Y 21 (mnys)
Y 21 (mnyb)− Y 21 (mnys)
, (4.4)
for the zero- and KK modes, respectively. One immedi-
ately is confronted with an interesting observation: the
function Yα(a) behaves differently with the expansion of
the Universe for the zero mode α = 0 and the KK modes
α = n. This is evident in particular in the asymptotic
regime ys ≫ yb, i.e. yb → 0 (|t|, a → ∞) where, exploit-
ing the asymptotics of Y1 (see [55]), one finds
Y0(a) ≃ 1 , Yn(a) ≃
|Y1(mnys)| ≃
(4.5)
Ergo, Y0 is constant while Yn decays with the expansion
of the Universe as 1/a. For large n one can approximate
mn ≃ nπ/ys and Y1(mnys) ≃ Y1(nπ) ≃ (1/π)
2/n [55],
so that
Yn(a) ≃
, Y2n(a) ≃
πL2mn
2 ysa2
. (4.6)
In summary, the amplitude of the KKmodes on the brane
decreases faster with the expansion of the Universe than
the amplitude of the zero mode. This leads to interest-
ing consequences for the observable power spectrum and
energy density and has a clear physical interpretation: It
manifest the localization of usual gravity on the brane.
As we shall show below, KK gravitons which are traces
of the five-dimensional nature of gravity escape rapidly
from the brane.
B. Power spectrum
We define the power spectrum P(k) of gravitational
waves on the brane as in four-dimensional cosmology by
using the restriction of the tensor amplitude to the brane
position (4.1):
(2π)3
P(k)δ(3)(k− k′) (4.7)
•=×,+
0, in
∣∣∣ĥ•(t, yb;k)ĥ†•(t, yb;k′)
∣∣∣0, in
i.e. we consider the expectation value of the field operator
ĥ• with respect to the initial vacuum state at the position
of the brane y = yb(t). In order to get a physically mean-
ingful power spectrum, averaging over several oscillations
of the gravitational wave amplitude has to be performed.
Equation (4.7) describes the observable power spectrum
imprinted in our Universe by the four-dimensional spin-2
graviton component of the five-dimensional tensor per-
turbation.
The explicit calculation of the expectation value involv-
ing a “renormalization” of a divergent contribution is car-
ried out in detail in Appendix C 2. The final result reads
P(k) = 1
(2π)3
Rα,k(t)Y2α(a). (4.8)
The function Rα,k(t) can be expressed in terms of the
Bogoliubov coefficients (3.25) and (3.26) if one considers
tout as a continuous variable t:
Rα,k(t) =
Nα,k(t) +ONα,k(t)
ωα,k(t)
. (4.9)
Nα,k(t) is the instantaneous particle number [cf. Ap-
pendix C 1] and the function ONα,k(t) is defined in
Eq. (C9).
It is important to recall thatNα,k(t) can in general not be
interpreted as a physical particle number. For example
zero modes with wave numbers such that kt < 1 can-
not be considered as particles. They have not performed
several oscillations and their energy density cannot be
defined in a meaningful way.
Equivalently, expressed in terms of the complex functions
α,k, one finds
Rα,k(t) =
|ǫ(β)α,k(t)|2
ωinβ,k
ωα,k(t)
+Oǫα,k(t), (4.10)
with Oǫα,k given in Eq. (C10). Equation (4.8) together
with (4.9) or (4.10) holds at all times.
If one is interested in the power spectrum at early times
kt ≪ 1, it is not sufficient to take only the instantaneous
particle number Nα,k(t) in Eq. (4.9) into account. This
is due to the fact that even if the mode functions ǫ
are already oscillating, the coupling matrix entering the
Bogoliubov coefficients might still undergo a non-trivial
time dependence [cf. Eq. (6.16)]. In the next section
we shall show explicitly, that in a radiation dominated
bounce particle creation, especially of the zero mode,
only stops on sub-Hubble times, kt > 1, even if the mode
functions are plane waves right after the bounce [cf, e.g.,
Figs. 6, 7, 9]. Therefore, in order to determine the per-
turbation spectrum of the zero mode, one has to make
use of the full expression expression (4.10) and may not
use (4.11), given below.
At late times, kt ≫ 1 (t ≥ tout) when the brane moves
slowly, the couplings Mαβ go to zero and particle cre-
ation has come to an end, both functions ONα,k and Oǫα,k
do not contribute to the observable power spectrum after
averaging over several oscillations. Furthermore, the in-
stantaneous particle number then equals the (physically
meaningful) number of created final state gravitons N outα,k
and the KK masses are constant. Consequently, the ob-
servable power spectrum at late times takes the form
P(k, t ≥ tout) =
(2π)3
N outα,k
ωoutα,k
Y2α(a) , (4.11)
where we have used that κ5/L = κ4. Its dependence on
the wave number k is completely determined by the spec-
tral behavior of the number of created gravitons N outα,k .
It is useful to decompose the power spectrum in its zero-
mode and KK-contributions:
P = P0 + PKK . (4.12)
In the late time regime, using Eqs. (4.11) and (4.5), the
zero-mode power spectrum reads
P0(k, t ≥ tout) =
(2π)3
N out0,k . (4.13)
As expected for a usual four-dimensional tensor perturba-
tion (massless graviton), on sub-Hubble scales the power
spectrum decreases with the expansion of the Universe
as 1/a2.
In contrast, the KK mode power spectrum for late times,
given by
PKK(k, t ≥ tout) =
N outn,k
ωoutn,k
Y 21 (mnys),
(4.14)
decreases as 1/a4, i.e. with a factor 1/a2 faster than
P0. The gravity wave power spectrum at late times is
therefore dominated by the zero-mode power spectrum
and looks four dimensional. Contributions to it arising
from five-dimensional effects are scaled away rapidly as
the Universe expands due to the 1/a4 behavior of PKK.
In the limit of large masses mnys ≫ 1, n ≫ 1 and for
wave lengths k ≪ mn such that ωn,k ≃ mn, the late-time
KK-mode power spectrum can be approximated by
PKK(k, t ≥ tout) =
16π2ys
N outn,k (4.15)
where we have inserted Eq. (4.6) for Y2n(a).
Note that the formal summations over the particle num-
ber might be ill defined if the brane trajectory contains
unphysical features like discontinuities in the velocity. An
appropriate regularization is then necessary, for example,
by introducing a physically motivated cutoff.
C. Energy density
For a usual four-dimensional tensor perturbation hµν
on a backgroundmetric gµν an associated effective energy
momentum tensor can be defined unambiguously by (see,
e.g., [14, 56])
Tµν =
〈hαβ‖µhαβ‖ν〉 , (4.16)
where the bracket stands for averaging over several pe-
riods of the wave and “‖” denotes the covariant deriva-
tive with respect to the unperturbed background metric.
The energy density of gravity waves is the 00-component
of the effective energy momentum tensor. We shall use
the same effective energy momentum tensor to calculate
the energy density corresponding to the four-dimensional
spin-2 graviton component of the five-dimensional ten-
sor perturbation on the brane, i.e. for the perturbation
hij(t,x, yb) given by Eq. (4.1). For this it is important
to remember that in our low energy approach, and in
particular at very late times for which we want to cal-
culate the energy density, the conformal time η on the
brane is identical to the conformal bulk time t. The en-
ergy density of four-dimensional spin-2 gravitons on the
brane produced during the brane motion is then given by
[see also [36]]
κ4 a2
0, in| ˙̂hij(t,x, yb) ˙̂hij(t,x, yb)|0, in
. (4.17)
Here the outer bracket denotes averaging over several os-
cillations, which (in contrast to the power spectrum) we
embrace from the very beginning. The factor 1/a2 comes
from the fact that an over-dot indicates the derivative
with respect t. A detailed calculation is carried out in
Appendix C 3 leading to
(2π)3
ωα,kNα,k(t)Y2α(a) (4.18)
where againNα,k(t) is the instantaneous particle number.
At late times t > tout after particle creation has ceased,
the energy density is therefore given by
(2π)3
ωoutα,k N outα,k Y2α(a). (4.19)
This expression looks at first sight very similar to a
“naive” definition of energy density as integration over
momentum space and summation over all quantum num-
bers α of the energy ωoutα,k N outα,k of created gravitons.
(Note that the graviton number N outα,k already contains
the contributions of both polarizations [see Eq. (3.28)].)
However, the important difference is the appearance of
the function Y2α(a) which exhibits a different dependence
on the scale factor for the zero mode compared to the
KK modes.
Let us decompose the energy density into zero-mode and
KK contributions
ρ = ρ0 + ρKK . (4.20)
For the energy density of the massless zero mode one
then obtains
(2π)3
kN out0,k . (4.21)
This is the expected behavior; the energy density of stan-
dard four-dimensional gravitons scales like radiation.
On contrast, the energy density of the KK modes at late
times is found to be
ρKK =
(2π)3
ωoutn,k N outn,k m2nY 21 (mnys),
(4.22)
which decays like 1/a6. As the Universe expands, the en-
ergy density of massive gravitons on the brane is there-
fore rapidly diluted. The total energy density of gravita-
tional waves in our Universe at late times is dominated
by the standard four-dimensional graviton (massless zero
mode). In the large mass limit mnys ≫ 1,n ≫ 1 the KK-
energy density can be approximated by
ρKK ≃
2a6ys
(2π)3
N outn,k ωoutn,kmn . (4.23)
Due to the factor mn coming from the function Y2n, i.e.
from the normalization of the functions φn(t, y), for the
summation over the KK-tower to converge, the number
of produced gravitons N outn,k has to decrease faster than
1/m3n for large masses and not just faster than 1/m
one might naively expect.
D. Escaping of massive gravitons and localization
of gravity
As we have shown, the power spectrum and energy
density of the KK modes scale, at late times when par-
ticle production has ceased, with the expansion of the
Universe like
PKK ∝ 1/a4 , ρKK ∝ 1/a6. (4.24)
Both quantities decay by a factor 1/a2 faster than the
corresponding expressions for the zero-mode graviton. In
particular, the energy density of the KK particles on the
brane behaves effectively like stiff matter. Mathemat-
ically, this difference arises from the distinct behavior
of the functions Y0(a) and Yn(a) [cf. Eq. (4.5)] and is a
direct consequence of the warping of the fifth dimension.
But what is the underlying physics? As we shall discuss
now, this scaling behavior for the KK particles has
indeed a very appealing physical interpretation which is
in the spirit of the RS model.
First, the mass mn is a comoving mass. The (in-
stantaneous) ’comoving’ frequency or energy of a KK
graviton is ωn,k =
k2 +m2n, with comoving wave
number k. The physical mass of a KK mode measured
by an observer on the brane with cosmic time dτ = adt
is therefore mn/a, i.e. the KK masses are redshifted
with the expansion of the Universe. This comes from
the fact that mn is the wave number corresponding to
the y-direction with respect to the bulk time t which
corresponds to conformal time η on the brane and not to
physical time. It implies that the energy of KK particles
on a moving AdS brane is redshifted like that of massless
particles. From this alone one would expect that the
energy density of KK modes on the brane decays like
1/a4 (see also Appendix D of [22]).
Now, let us define the “wave function” for a gravi-
Ψα(t, y) =
φα(t, y)
(4.25)
which, by virtue of (φα, φα) = 1, satisfies
dyΨ2α(t, y) = 1 (4.26)
From the expansion of the gravity wave amplitude
Eq. (2.43) and the normalization condition it is clear that
Ψ2α(t, y) gives the probability to find a graviton of mass
mα for a given (fixed) time t at position y in the Z2-
symmetric AdS-bulk. Since φα satisfies Equation (2.30),
the wave function Ψα satisfies the Schrödinger like equa-
− ∂2yΨα +
Ψα = m
αΨα (4.27)
and the junction conditions (2.29) translate into
Ψα|y={yb,ys} = 0. (4.28)
In Fig. 2 we plot the evolution of Ψ21(t, y) under the
influence of the brane motion Eq. (2.18) with vb = 0.1.
For this motion, the physical brane starting at yb → 0 for
t → −∞ moves towards the static brane, corresponding
to a contracting Universe. After a bounce, it moves
back to the Cauchy horizon, i.e. the Universe expands.
The second brane is placed at ys = 10L and y ranges
from yb(t) to ys. We set Ψ
1 ≡ 0 for y < yb(t) . The
time-dependent KK mass m1 is determined numerically
from Eq. (2.40). As it is evident from this Figure, Ψ21
is effectively localized close to the static brane, i.e. the
weight of the KK-mode wave function lies in the region
of less warping, far from the physical brane. Thus the
probability to find a KK mode is larger in the region
with less warping. Since the effect of the brane motion
on Ψ21 is hardly visible in Fig. 2, we show the behavior
of Ψ21 close to the physical brane in Fig. 3. This shows
that Ψ21 peaks also at the physical brane but with an
amplitude roughly ten times smaller than the amplitude
at the static brane. While the brane, coming from
t → −∞, approaches the point of closest encounter Ψ21
slightly increases and peaks at the bounce t = 0 where,
as we shall show in the next Section, the production
of KK particles takes place. Afterwards, for t → ∞,
when the brane is moving back towards the Cauchy
horizon, the amplitude Ψ21 decreases again and so does
the probability to find a KK particle at the position of
the physical brane, i.e. in our Universe. The parameter
settings used in Figures 2 and 3 are typical parameters
which we use in the numerical simulations described
later on. However, the effect is illustrated much better
if the second brane is closer to the moving brane. In
Figure 4 we show Ψ21 for the same parameters as in
Figures 2 and 3 but now with ys = L. In this case, the
probability to find a KK particle on the physical brane
is of the same order as in the region close to the second
brane during times close to the bounce. However, as the
Universe expands, Ψ21 rapidly decreases at the position
of the physical brane.
From Eqs. (4.2) and (4.5) it follows that Ψ2n(t, yb) ∝ 1/a.
The behavior of the KK-mode wave function suggests
the following interpretation: If KK gravitons are created
on the brane, or equivalently in our Universe, they
escape from the brane into the bulk as the brane moves
back to the Cauchy horizon, i.e. when the Universe
undergoes expansion. This is the reason why the power
spectrum and the energy density imprinted by the KK
modes on the brane decrease faster with the expansion
of the Universe than for the massless zero mode.
The zero mode, on the other hand, is localized at
the position of the moving brane. The profile of φ0 does
not depend on the extra dimension, but the zero-mode
wave function Ψ0 does. Its square is
Ψ20(t, y) =
y2s − y2b
if ys ≫ yb ,
(4.29)
such that on the brane (y = yb) it behaves as
Ψ20(t, yb) ≃
. (4.30)
Equation (4.29) shows that, at any time, the zero
mode is localized at the position of the moving brane.
For a better illustration we show Eq. (4.29) in Fig. 5
for the same parameters as in Fig. 4. This is the
“dynamical analog” of the localization mechanism for
four-dimensional gravity discussed in [7].
To establish contact with [7] and to obtain a intu-
itive physical description, we rewrite the boundary value
problem (4.27), (4.28) as a Schrödinger-like equation
− ∂2yΨα(t, y) + V (y, t)Ψα(y, t) = mα(t)Ψα(y, t) (4.31)
V (y, t) =
yb(t)
δ(|y| − yb(t))
− 3a(t)
δ(|y| − yb(t)) , (4.32)
where we have absorbed the boundary condition at the
moving brane into the (instantaneous) volcano potential
V (y, t) and made use of Z2 symmetry. Similar to the
static case [7], at any time the potential (4.32) supports
a single bound state, the four-dimensional graviton
(4.29), and acts as a barrier for the massive KK modes.
The potential, ensuring localization of four-dimensional
gravity on the brane and the repulsion of KK modes,
moves together with the brane through the fifth dimen-
sion. Note that with the expansion of the Universe, the
“depth of the delta-function” becomes larger, expressing
the fact that the localization of four-dimensional gravity
becomes stronger at late times [cf. Eq. (4.30), Fig. 5].
In summary, the different scaling behavior for the
zero- and KK modes on the brane is entirely a conse-
quence of the geometry of the bulk space-time, i.e. of
the warping L2/y2 of the metric (2.1) 5. It is simply a
manifestation of the localization of gravity on the brane:
as time evolves, the KK gravitons, which are traces of
the five-dimensional nature of gravity, escape into the
bulk and only the zero mode which corresponds to the
usual four-dimensional graviton remains on the brane.
This, and in particular the scaling behavior (4.24),
remains also true if the second brane is removed, i.e. in
the limit ys → ∞, leading to the original RS II model.
By looking at (4.15) and (4.23) one could at first think
that then the KK-power spectrum and energy density
vanish and no traces of the KK gravitons could be
observed on the brane since both expressions behave
as 1/ys. But this is not the case since the spectrum of
KK masses becomes continuous. In the continuum limit
ys → ∞ the summation over the discrete spectrum mn
has to be replaced by an integration over continuous
masses m in the following way:
f(mn) −→
dmf(m) . (4.33)
f is some function depending on the spectrum, for
example f(mn) = N outn,k . The pre-factor 1/ys in (4.15)
and (4.23) therefore ensures the existence of the proper
continuum limit of both expressions.
Another way of seeing this is to repeat the same
calculations but using the eigenfunctions for the case
with only one brane from the beginning. Those are
δ-function normalized and can be found in, e.g., [22].
They are basically the same as (2.34) except that the
normalization is different since it depends on whether
the fifth dimension is compact or not. In particular, on
the brane, they have the same scale factor dependence
as (4.2).
At the end, the behavior found for the KK modes
should not come as a surprise, since the RS II model
has attracted lots of attention because of exactly this;
it localizes usual four-dimensional gravity on the brane.
As we have shown here, localization of standard four-
dimensional gravity on a moving brane via a warped
geometry automatically ensures that the KK modes
escape into the bulk as the Universe expands because
their wave function has its weight in the region of less
warping, resulting in an KK-mode energy density on the
brane which scales like stiff matter.
An immediate consequence of this particular scaling
behavior is that KK gravitons in an AdS braneworld
5 Note that it does not depend on a particular type of brane motion
and is expected to be true also in the high energy case which we
do not consider here.
FIG. 2: Evolution of Ψ21(t, y) = φ
1(t, y)/y
3 corresponding to
the probability to find the first KK graviton at time t at the
position y in the AdS-bulk. The static brane is at ys = 10L
and the maximal brane velocity is given by vb = 0.1.
FIG. 3: Evolution of Ψ21(t, y) as in Fig. 2 but zoomed into
the bulk-region close to the moving brane.
cannot play the role of dark matter. Their energy density
in our Universe decays much faster with the expansion
than that of ordinary matter which is restricted to reside
on the brane.
V. NUMERICAL SIMULATIONS
A. Preliminary remarks
In this section we present results of numerical simula-
tions for the bouncing model described by the equations
(2.17)-(2.19).
In the numerical simulations we set L = 1, i.e. all
FIG. 4: Evolution of Ψ21(t, y) for ys = L and vb = 0.1.
FIG. 5: Localization of four-dimensional gravity on a moving
brane: Evolution of Ψ20(t, y) for ys = L = 1 and vb = 0.1
which should be compared with Ψ21(t, y) shown in Fig. 4.
dimensionful quantities are measured in units of the
AdS5 curvature scale. Starting at initial time tin ≪ 0
where the initial vacuum state |0, in〉 is defined, the
system (3.34,3.35) is evolved numerically up to final time
tout. Thereby we set tin = −2πNin/k with 1 ≤ Nin ∈ N,
such that Θin0,k = 1 [cf. Eq. (3.23)]. This implies
0 (tin) = 2, i.e. independent of the three-dimensional
momentum k a (plane wave) zero-mode solution always
performs a fixed number of oscillations between tin and
the bounce at t = 0 [cf. Eq. (3.38)]. The final graviton
spectrum at N outα,k is calculated at late times tout ≫ 1
when the brane approaches the Cauchy horizon and
graviton creation has ceased. This quantity is physically
well defined and leads to the late-time power spectrum
(4.11) and energy density (4.19) on the brane. For
illustrative purposes, we also plot the instantaneous
particle number Nα,k,•(t) which also determines the
power spectrum at all times [cf Eq.(4.9)]. In this section
we shall use the term particle number respectively
graviton number for both, the instantaneous particle
number Nα,k,•(t) as well as the final state graviton
number N outα,k,•, keeping in mind that only the latter one
is physically meaningful.
There are two physical input parameters for the
numerical simulation; the maximal brane velocity vb (i.e.
tb) and the position of the static brane ys. The latter
determines the number of KK modes which fall within a
particular mass range. On the numerical side one has to
specify Nin and tout, as well as the maximum number of
KK modes nmax which one takes into account, i.e. after
which KK mode the system of differential equations is
truncated. The independence of the numerical results
on the choice of the time parameters is checked and the
convergence of the particle spectrum with increasing
nmax is investigated. More detailed information on
numerical issues including accuracy considerations are
collected in Appendix D.
One strong feature of the brane motion (2.18) is its kink
at the bounce t = 0. In order to study how particle
production depends on the kink, we shall compare
the motion (2.18) with the following motion which
has a smooth transition from contraction to expansion
(L = 1):
yb(t) =
(|t|+ tb − ts)−1 if |t| > ts
a+ (b/2)t2 + (c/4)t4 if |t| ≤ ts
(5.1)
with the new parameter ts in the range 0 < ts < tb. This
motion is constructed such that its velocity at |t| = ts
is the same as the velocity of the kink motion at the
bounce. This will be the important quantity determin-
ing the number of produced gravitons. For ts → 0 the
motion with smooth transition approaches (2.18). The
parameters a, b and c are obtained by matching the mo-
tions and the first and second derivatives. Matching also
the second derivative guarantees that possible spurious
effects contributing to particle production are avoided.
The parameter ts has to be chosen small enough, ts ≪ 1,
such that the maximal velocity of the smooth motion is
not much larger than vb in order to have comparable sit-
uations.
For reasons which will become obvious in the next two
sections we shall discuss the cases of long k ≪ 1 and
short wavelengths k ≫ 1, separately.
B. Generic results and observations for long
wavelengths k ≪ 1
Figure 6 displays the results of a numerical simula-
tion for three-momentum k = 0.01, static brane position
ys = 10 and maximal brane velocity vb = 0.1. Depicted is
FIG. 6: Evolution of the graviton number Nα,k,•(t) for the
zero mode and the first ten KK modes for three-momentum
k = 0.01 and vb = 0.1, ys = 10.
FIG. 7: Nn,k,•(t) for the zero mode and the first ten KK
modes for the parameters of Fig. 6, but without coupling of
the zero mode to the KK modes, i.e. Mi0 ≡ 0.
the graviton number for one polarizationNα,k,•(t) for the
zero mode and the first ten KK modes as well as the evo-
lution of the scale factor a(t) and the position of the phys-
ical brane yb(t). Initial and final times are Nin = 5 and
tout = 2000, respectively. The KK-particle spectrum will
be discussed in detail below. One observes that the zero-
mode particle number increases slightly with the expan-
sion of the Universe towards the bounce at t = 0. Close
to the bounce N0,k,•(t) increases drastically, shows a lo-
cal peak at the bounce and, after a short decrease, grows
again until the mode is sub-horizon (kt ≫ 1). Inside the
horizon N0,k,•(t) is oscillating around a mean value with
diminishing amplitude. This mean value which is reached
asymptotically for t → ∞ corresponds to the number of
generated final state zero-mode gravitonsN out0,k,•. Produc-
tion of KK-mode gravitons takes effectively place only at
the bounce in a step-like manner and the graviton num-
ber remains constant right after the bounce.
In Fig. 7 we show the numerical results obtained for the
same parameters as in Fig. 6 but without coupling of the
zero mode to the KK modes, i.e. Mi0 = 0 (and thus
also Ni0 = N0i = 0). One observes that the production
of zero-mode gravitons is virtually not affected by the
artificial decoupling 6. Note that even if M0j ≡ 0 (see
Eqs. B2), which is in general true for Neumann bound-
ary conditions, the zero mode q0,k,• couples in Eq. (2.49)
to the KK modes via N0j = M00Mj0 and through the
anti-symmetric combination Mαβ −Mβα.
In contrast, the production of the first ten KK modes is
heavily suppressed if Mi0 = 0. The corresponding final-
state graviton numbers N outn,k,• are reduced by four orders
of magnitude. This shows that the coupling to the zero
mode is essential for the production of massive gravitons.
Later we will see that this is true for light KK gravitons
only. If the KK masses exceed mi ∼ 1, they evolve in-
dependently of the four-dimensional graviton and their
evolution is entirely driven by the intermode couplings
Mij . It will also turn out that the time-dependence of
the KK mass mi plays only an inferior role for the gen-
eration of massive KK modes. On the other hand, the
effective decoupling of the evolution of the zero mode
from the KK modes occurs in general as long as k ≪ 1
is satisfied, i.e. for long-wavelengths. We will see that it
is no longer true for short wavelengths k ≫ 1.
The effective decoupling of the zero-mode evolution from
the KK modes makes it possible to derive analytical ex-
pressions for the number of zero-mode gravitons, their
power spectrum and energy density. The calculations
are carried out in section VIA
In summary we emphasize the important observation
that for long wavelengths the amplification of the four
dimensional gravity wave amplitude during the bounce
is not affected by the evolution of the KK gravitons. We
can therefore study the zero mode separately from the
KK modes in this case.
C. Zero mode: long wavelengths k ≪ 1
In Figure 8 we show the numerical results for the num-
ber of generated zero-mode gravitons N0,k,•(t) and the
evolution of the corresponding power spectrum P0(k) on
the brane for momentum k = 0.01, position of the static
brane ys = 10 and maximal brane velocity vb = 0.1. The
results have been obtained by solving the equations for
the zero mode alone, i.e. without the couplings to the KK
modes, since, as we have just shown, the evolution of the
6 Quantitatively it is N0,k,•(t = 2000) = 965.01 with and
N0,k,•(t = 2000) = 965.06 without Mi0. Note that this differ-
ence lies indeed within the accuracy of our numerical simulations
(see Appendix D.)
four-dimensional graviton for long wavelengths is not in-
fluenced by the KK modes. Thereby the power spectrum
is shown before and after averaging over several oscilla-
tions, i.e. employing Eq. (4.9) with and without the term
ON0,k, respectively. Right after the bounce where the gen-
eration of gravitons is initiated and which is responsible
for the peak in N0,k,• at t = 0, the number of gravitons
first decreases again. AfterwardsN0,k,• grows further un-
til the mode enters the horizon at kt = 1. Once on sub-
horizon scales kt ≫ 1, the number of produced gravitons
oscillates with a diminishing amplitude and asymptoti-
cally approaches the final state graviton number N out0,k,•.
During the growth of N0,k,• after the bounce, the power
spectrum remains practically constant. Within the range
of validity it is in good agreement with the analytical pre-
diction (6.22) yielding (L2(2π)3/κ4)P0(k, t) = 4vb(kL)2.
When particle creation has ceased, the full power spec-
trum Eq.(4.8) starts to oscillate with an decreasing am-
plitude. The time-averaged power spectrum obtained
by using Eq. (4.9) without the ON0,k-term is perfectly
in agreement with the analytical expression Eq. (6.20)
which gives (L2(2π)3/κ4)P0(k, t) = 2vb/t2. Note that at
early times, the time-averaged power spectrum behaves
not in the same way as the full one, demonstrating the
importance of the term ON0,k.
Figure 9 shows a summary of numerical results for the
number of created zero-mode gravitons N0,k,•(t) for dif-
ferent values of the three-momentum k. The maximum
velocity at the bounce is vb = 0.1 and the second brane is
at ys = 10. These values are representative. Other values
in accordance with the considered low-energy regime do
not lead to a qualitatively different behavior. Note that
the evolution of the zero mode does virtually not depend
on the value of ys as long as ys ≫ yb(0) (see below). Ini-
tial and final integration times are given by Nin = 5 and
tout = 20000, respectively.
For sub-horizon modes we compare the final graviton
spectra with the analytical prediction (6.17). Both are in
perfect agreement. On super-horizon scales where parti-
cle creation has not ceased yet N0,k,• is independent of
k. The corresponding time-evolution of the power spec-
tra P0(k, t) is depicted in Fig. 10. For the sake of clarity,
only the results for t > 0, i.e. after the bounce, are shown
in both figures.
The numerical simulations and the calculations of sec-
tion VIA reveal that the power spectrum for the four-
dimensional graviton for long wavelengths is blue on
super-horizon scales, as expected for an ekpyrotic sce-
nario.
The analytical calculations performed in section VIA
rely on the assumption that yb ≪ ys and tin → −∞.
Figure 11 shows the behavior of the number of generated
zero-mode gravitons of momentum k = 0.01 in depen-
dence on the inter-brane distance and the initial integra-
tion time. The brane velocity at the bounce is vb = 0.1
which implies that at the bounce the moving brane is
at yb(0) =
vb ≃ 0.316 (L = 1). In case of a close en-
counter of the two branes as for ys = 0.35, the production
FIG. 8: Time evolution of the number of created zero-mode
gravitons N0,k,•(t) and of the zero-mode power spectrum
(4.8): (a) for the entire integration time; (b) for t > 0 only.
Parameters are k = 0.01, ys = 10 and vb = 0.1. Initial and fi-
nal time of integration are given by Nin = 10 and tout = 4000,
respectively. The power spectrum is shown with and without
the term ON0,k,•, i.e. before and after averaging, respectively,
and compared with the analytical results.
FIG. 9: Numerical results for the time evolution of the num-
ber of created zero-mode gravitons N0,k,•(t) after the bounce
t > 0 for different three-momenta k. The maximal brane
velocity at the bounce is vb = 0.1 and the second brane is
positioned at ys = 10. In the final particle spectrum the nu-
merical values are compared with the analytical prediction
Eq. (6.17). Initial and final time of integration are given by
Nin = 5 and tout = 20000, respectively.
of massless gravitons is strongly enhanced compared to
the analytical result. But as soon as ys ≥ 1, (i.e. ys ≥ L)
the numerical result is very well described by the analyt-
ical expression Eq. (6.16) derived under the assumption
ys ≫ yb. For ys ≥ 10 the agreement between both is
very good. From panels (b) and (c) one infers that the
numerical result becomes indeed independent of the ini-
FIG. 10: Evolution of the zero-mode power spectrum after the
bounce t > 0 corresponding to the values and parameters of
Fig. 9. The numerical results are compared to the analytical
predictions Eqs. (6.20) and (6.22).
0 1000 2000
time t
analytical
=0.35
10 100 1000
2nd brane pos. y
0.1 1 10 100 1000
2nd brane pos. y
analytical
(a) (b)
FIG. 11: Dependence of the zero-mode particle number on
inter-brane distance and initial integration time for momen-
tum k = 0.01, maximal brane velocity vb = 0.1 in comparison
with the analytical expression Eq. (6.16). (a) Evolution of the
instantaneous particle number N0,k,•(t) with initial integra-
tion time given by Nin = 5 for ys = 0.35, 0.5 and 1. (b) Final
zero-mode graviton spectrum N0,k,•(tout = 2000) for various
values of ys and Nin. (c) Close-up view of (b) for large ys.
tial integration time when increasing Nin. Note that in
the limit Nin ≫ 1 the numerical result is slightly larger
than the analytical prediction but the difference between
both is negligibly small. This confirms the correctness
and accuracy of the analytical expressions derived in Sec-
tion VIA for the evolution of the zero-mode graviton.
0.01 0.1 1 10
Kaluza-Klein mass m
t nmax=60
= 0.3
= 0.1
 = 0.5●
FIG. 12: Final state KK-graviton spectra for k = 0.001, ys =
100, different maximal brane velocities vb and Nin = 1, tout =
400. The numerical results are compared with the analytical
prediction Eq. (6.34) (dashed line).
D. Kaluza-Klein-modes: long wavelengths k ≪ 1
Because the creation of KK gravitons ceases right
after the bounce [cf Fig. 6] one can stop the numerical
simulation and read out the number of produced KK
gravitons N outn,k,• at times for which the zero mode is still
super-horizon.
Even though Eq. (2.40) cannot be solved analytically,
the KK masses can be approximated by mn ≃ nπ/ys.
This expression is the better the larger the mass.
Consequently, for the massive modes the position of the
second brane ys determines how many KK modes belong
to a particular mass range ∆m.
In Figure 12 we show the KK-graviton spectra N outn,k,• for
three-momentum k = 0.001 and second brane position
ys = 100 for maximal brane velocities vb = 0.1, 0.3
and 0.5. For any velocity vb two spectra obtained
with nmax = 60 and 80 KK modes taken into account
in the simulation are compared to each other. This
reveals that the numerical results are stable up to a
KK mass mn ≃ 1. One infers that first, N outn,k,• grows
with increasing mass until a maximum is reached. The
position of the maximum shifts slightly towards larger
masses with increasing brane velocity vb. Afterwards,
N outn,k,• declines with growing mass. Until the maximum
is reached, the numerical results for the KK-particle
spectrum are very stable. This already indicates that
the KK-intermode couplings mediated by Mij are not
very strong in this mass range. In Figure 13 we show
the final KK-particle spectrum for the same parameters
as in Fig. 12 but for three-momentum k = 0.01 and
0.01 0.1 1 10
Kaluza-Klein mass m
=60, M
=0 for all i,j
 = 0.1
 = 0.5
 = 0.3
 = 0.9
FIG. 13: Final state KK-graviton spectra for k = 0.01, ys =
100, different vb and Nin = 1, tout = 400. The numerical
results are compared with the analytical prediction Eq. (6.34)
(dashed line). For vb = 0.3, 0.5 the spectra obtained without
KK-intermode and self-couplings (Mij ≡ 0 ∀ i, j) are shown
as well.
the additional velocity vb = 0.9
7. We observe the same
qualitative behavior as in Fig. 12. In addition we show
numerical results obtained for vb = 0.3 and 0.5 without
the KK-intermode and self couplings, i.e. we have set
Mij ≡ 0 ∀ i, j by hand. One infers that for KK masses,
depending slightly on the velocity vb but at least up to
mn ≃ 1, the numerical results for the spectra do not
change when the KK-intermode coupling is switched off.
Consequently, the evolution of light, i.e. mn <∼ 1, KK
gravitons is virtually not affected by the KK-intermode
coupling.
In addition we find that also the time-dependence of the
KK masses is not important for the production of light
KK gravitons which is explicitly demonstrated below.
Thus, production of light KK gravitons is driven by
the zero-mode evolution only. This allows us to find
an analytical expression, Eq. (6.34), for the number of
produced light KK gravitons in terms of exponential
integrals. The calculations which are based on several
approximations are performed in Section VIC.
In Figs. 12 and 13 the analytical prediction (6.34) for
the spectrum of final state gravitons has already been
included (dashed lines). Within its range of validity
it is in excellent agreement with the numerical results
obtained by including the full KK-intermode coupling.
It perfectly describes the dependence of N outn,k,• on the
three-momentum k and the maximal velocity vb. For
small velocities vb <∼ 0.1 it is also able to reproduce the
position of the maximum. This reveals that the KK-
7 Such a high brane velocity is of course not consistent with a
Neumann boundary condition Eq. (2.29) at the position of the
moving brane.
intermode coupling is negligible for light KK gravitons
and that their production is entirely driven by their
coupling to the four-dimensional graviton.
The analytical prediction is very precious for testing the
goodness of the parameters used in the simulations, in
particular the initial time tin (respectively Nin). Since it
has been derived for real asymptotic initial conditions,
tin → −∞, its perfect agreement with the numerical
results demonstrates that the values for Nin used in the
numerical simulations are large enough. No spurious
initial effects contaminate the numerical results.
Note, that the numerical values for N outn,k,• in the ex-
amples shown are all smaller than one. However, for
smaller values of k than the ones which we consider here
for purely numerical reasons, the number of generated
KK-mode particles is enhanced since N outn,k,• ∝ 1/k as
can be inferred from Eq. (6.34) in the limit k ≪ mn.
If we go to smaller values of ys, fewer KK modes
belong to a particular mass range. Hence, with the same
or similar number of KK modes as taken into account
in the simulations so far, we can study the behavior
of the final particle spectrum for larger masses. These
simulations shall reveal the asymptotical behavior of
N outn,k,• for mn → ∞ and therefore the behavior of the
total graviton number and energy density. Due to the
kink in the brane motion we cannot expect that the
energy density of produced KK-mode gravitons is finite
when summing over arbitrarily high frequency modes.
Eventually, we will have to introduce a cutoff setting the
scale at which the kink-approximation [cf. Eqs. (2.17)
- (2.19)] is no longer valid. This is the scale where the
effects of the underlying unspecified high-energy physics
which drive the transition from contraction to expansion
become important. The dependence of the final particle
spectrum on the kink will be studied later on in this
section in detail.
In Figures 14 and 15 we show final KK-graviton
spectra for ys = 10 and three-momentum k = 0.01 and
k = 0.1. The analytical expression Eq. (6.34) is depicted
as well and the spectra are always shown for at least
two values of nmax to indicate up to which KK mass
stability of the the numerical results is guaranteed.
Now, only two KK modes are lighter than m = 1. For
these modes the analytical expression Eq. (6.34) is valid
and in excellent agreement with the numerical results, in
particular for small brane velocities vb ∼ 0.1. As before,
the larger the velocity vb the more visible is the effect of
the truncation of the system of differential equations at
nmax.
For k = 0.01 the spectrum seems to follow a power law
decrease right after the maximum in the spectra. In
case of vb = 0.1 the spectrum is numerically stable up to
masses mn ≃ 20. In the region 5 <∼ mn <∼ 20 the spec-
trum is very well fitted by a power law N outn,k,• ∝ m−2.7n .
Also for larger velocities the decline of the spectrum
is given by the same power within the mass ranges
0.1 1 10
Kaluza-Klein mass m
 = 0.5
 = 0.3
 = 0.1
FIG. 14: Final state KK-graviton spectra for k = 0.01, ys =
10, different maximal brane velocities vb and Nin = 2, tout =
400. The numerical results are compared with the analytical
prediction Eq. (6.34) (dashed line).
0.1 1 10
Kaluza - Klein mass m
 = 0.5
 = 0.1
 = 0.3
FIG. 15: Final state KK-graviton spectra for k = 0.1, ys = 10,
different maximal brane velocities vb and Nin = 2, tout =
400. The numerical results are compared with the analytical
prediction Eq. (6.34) (dashed line).
where the spectrum is numerically stable. For k = 0.1,
however, the decreasing spectrum bends over at a mass
around mn ≃ 10 towards a less steep decline. This is
in particular visible in the two cases with vb = 0.1 and
0.3 where the first 100 KK modes have been taken into
account in the simulation. The behavior of the KK-mode
particle spectrum can therefore not be described by a
single power law decline for masses mn > 1. It shows
more complicated features instead, which depend on the
parameters. We shall demonstrate that this bending
over of the decline is related to the coupling properties of
the KK modes and to the kink in the brane motion. But
before we come to a detailed discussion of these issues,
let us briefly confront numerical results of different ys to
FIG. 16: Upper panel: Final state KK-particle spectra for
k = 0.01, vb = 0.1 and different ys = 3, 10, 30 and 100.
The analytical prediction Eq. (6.34) is shown as well (dashed
line). Lower panel: Energy ωoutn,kN
n,k,• of the produced fi-
nal state gravitons binned in mass intervals ∆m = 1 for
ys = 10, 30, 100.
demonstrate a scaling behavior.
In the upper panel of Figures 16 and 17 we com-
pare the final KK-spectra for several positions of the
second brane ys = 3, 10, 30 and 100 obtained for a
maximal brane velocity vb = 0.1 for k = 0.01 and
0.1, respectively. One observes that the shapes of the
spectra are identical. The bending over in the decline
of the spectrum at masses mn ∼ 1 is very well visible
for k = 0.1 and ys = 3, 10. For a given KK mode n
the number of particles produced in this mode is the
larger the smaller ys. But the smaller ys, the less KK
modes belong to a given mass interval ∆m. The energy
transferred into the system by the moving brane, which
is determined by the maximum brane velocity vb, is the
same in all cases. Therefore, the total energy of the pro-
duced final state KK gravitons of a given mass interval
∆m should also be the same, independent of how many
KK modes are contributing to it. This is demonstrated
in the lower panels of Figs. 16 and 17 where the energy
ωoutn,kN outn,k,•(in units of L) of the generated KK gravitons
binned in mass intervals ∆m = 1 is shown 8. One
observes that, as expected, the energy transferred into
the production of KK gravitons of a particular mass
range is the same (within the region where the numerical
results are stable), independent of the number of KK
modes lying in the interval. This is in particular evident
for ys = 30, 100. The discrepancy for ys = 10 is due
to the binning. As we shall discuss below in detail, the
8 The energy for the case ys = 3 is not shown because no KK mode
belongs to the first mass interval.
FIG. 17: Upper panel: Final state KK-particle spectra for
k = 0.1, vb = 0.1 and different ys = 3, 10, 30 and 100.
The analytical prediction Eq. (6.34) is shown as well (dashed
line). Lower panel: Energy ωoutn,kN
n,k,• of the produced fi-
nal state gravitons binned in mass intervals ∆m = 1 for
ys = 10, 30, 100.
particle spectrum can be split into two different parts.
The first part is dominated by the coupling of the zero
mode to the KK modes (as shown above), whereas the
second part is dominated by the KK-intermode couplings
and is virtually independent of the wave number k.
As long as the coupling of the zero mode to the KK
modes is the dominant contribution to KK-particle
production it is N outn,k,• ∝ 1/k [cf. Eq. (6.34)]. Hence,
Eoutn,k,• = ωoutn,kN outn,k,• ∝ 1/k if mn ≫ k. This explains why
the energy per mass interval ∆m is one order larger for
k = 0.01 (cf Fig. 16) than for k = 0.1 (cf Fig. 17) .
Let us now discuss the KK-spectrum for large masses.
The qualitative behavior of the spectrum N outn,k,• and the
mass at which the decline of the spectrum changes are
independent of ys. This is demonstrated in Figure 18
where KK-spectra for vb = 0.1, k = 0.1, ys = 10 [cf
Fig. 15] and ys = 3 [cf Fig. 17] are shown. The results
obtained by taking the full intermode coupling into
account are compared to results of simulations where
we have switched off the coupling of the KK modes to
each other as well as their self-coupling (Mij ≡ 0 ∀ i, j).
Furthermore we display the results for the KK-spectrum
obtained by taking only the KK-intermode couplings
into account, i.e. Mi0 = Mii = 0 ∀ i. One infers that
for the lowest masses the spectra obtained with all
couplings are identical to the ones obtained without
the KK-intermode (Mij = 0, i 6= j) and self-couplings
(Mii = 0). Hence, as already seen before, the primary
source for the production of light KK gravitons is their
coupling to the evolution of the four-dimensional gravi-
ton. In this mass range, the contribution to the particle
creation coming from the KK-intermode couplings is
very much suppressed and negligibly small.
1 10 100
Kaluza-Klein mass m
t full coupling
=0 for all i,j
=0, M
FIG. 18: KK-particle spectra for three-momentum k = 0.1,
maximum brane velocity vb = 0.1 and ys = 3 and 10 with
different couplings taken into account. The dashed lines indi-
cates again the analytical expression Eq. (6.34).
For masses mn ≃ 4 a change in the decline of the
spectrum sets in and the spectrum obtained without
the coupling of the KK modes to the zero mode starts
to diverge from the spectrum computed by taking all
the couplings into account. While the spectrum without
the KK-intermode couplings decreases roughly like a
power law N outn,k,• ∝ m−3n the spectrum corresponding
to the full coupling case changes its slope towards
a power law decline with less power. At this point
the KK-intermode couplings gain importance and the
coupling of the KK modes to the zero mode looses
influence. For a particular mass mc ≃ 9 the spectrum
obtained including the KK-intermode couplings only,
crosses the spectrum calculated by taking into account
exclusively the coupling of the KK modes to the zero
mode. After the crossing, the spectrum obtained by
using only the KK-intermode couplings approaches the
spectrum of the full coupling case. Both agree for large
masses. Thus for large masses mn > mc the production
of KK gravitons is dominated by the couplings of the
KK modes to each other and is not influenced anymore
by the evolution of the four-dimensional graviton. This
crossing defines the transition between the two regimes
mentioned before: for masses mn < mc the production
of KK gravitons takes place due to their coupling to
the zero mode Mi0, while it is entirely caused by the
intermode couplings Mij for masses mn > mc.
Decoupling of the evolution of the KK modes from
the dynamics of the four-dimensional graviton for
large masses implies that KK-spectra obtained for the
same maximal velocity are independent of the three-
momentum k. This is demonstrated in Fig. 19 where we
compare spectra obtained for vb = 0.1 and ys = 3 but
different k. As expected, all spectra converge towards
the same behavior for masses mn > mc.
1 2 3 4 5 10 20 30 1000.3
Kaluza-Klein masses m
t k=0.01
k=0.03
k=0.1
FIG. 19: Comparison of KK-particle spectra for ys = 3,
vb = 0.1 and three-momentum k = 0.01, 0.03, 0.1 and 1
demonstrating the independence of the spectrum on k for
large masses. nmax = 60 KK modes have been taken into
account in the simulations.
1 10 100
Kaluza-Klein mass m
t full coupling
=0 for all i, j
=0, i = j
=0, M
Mαβ = 0  for all α, β
FIG. 20: KK-particle spectra for three-momentum k = 0.1,
maximum brane velocities vb = 0.1 and ys = 3 for nmax = 40
obtained for different coupling combinations.
Figure. 20 shows KK-particle spectra for k = 0.1,vb = 0.1
and ys = 3 obtained for different couplings. This plot
visualizes how each particular coupling combination
contributes to the production of KK gravitons. It
shows, as already mentioned before but not shown
explicitly, that the Mii coupling which is the rate of
change of the corresponding KK mass [cf. Eqs. (2.41)
and (B4)] is not important for the production of KK
gravitons. Switching it off does not affect the final
graviton spectrum. We also show the result obtained
with all couplings but with α+ii(t) = ω
i,k and α
ii (t) = 0,
i.e. the time-dependence of the frequency [cf. Eq. (3.36)]
has been neglected. One observes that in this case the
spectrum for larger masses is quantitatively slightly
0.1 1 10 100
Kaluza-Klein mass m
t full coupling
=0 for all i, j
=0, M
k=0.01
k=0.1
k=0.1
FIG. 21: KK-particle spectra for ys = 10, vb = 0.1, nmax =
100 and three-momentum k = 0.01 and 0.1 with different
couplings taken into account. The thin dashed lines indicates
Eq. (6.34) and the thick dashed line Eq. (5.4).
different but has a identical qualitative behavior. If,
on the other hand, all the couplings are switched off
Mαβ ≡ 0 ∀α, β and only the time-dependence of the
frequency ωi,k is taken into account, the spectrum
changes drastically. Not only the number of produced
gravitons is now orders of magnitude smaller but also
the spectral tilt changes. For large masses it behaves as
Nn,k,• ∝ m−2n . Consequently, the time-dependence of
the graviton frequency itself plays only an inferior role
for production of KK gravitons.
The bottom line is that the main sources of the produc-
tion of KK gravitons is their coupling to the evolution of
the four-dimensional graviton (Mi0) and their couplings
to each other (Mij , i 6= j) for small and large masses,
respectively. Both are caused by the time-dependent
boundary condition. The time-dependence of the oscilla-
tor frequency ωj,k =
m2j(t) + k
2 is virtually irrelevant.
Note that this situation is very different from ordinary
inflation where there are no boundaries and particle
production is due entirely to the time dependence of the
frequency 9.
The behavior of the KK-spectrum, in particular
the mass mc at which the KK-intermode couplings start
to dominate over the coupling of the KK modes to
the zero mode depends only on the three-momentum
k = |k| and the maximal brane velocity vb. This is now
discussed. In Figure 21 we show KK-particle spectra
for ys = 10, vb = 0.1, nmax = 100 and three-momenta
k = 0.01 and 0.1. Again, the spectra obtained by taking
9 Note, however, that the time-dependent KK mass mj(t) enters
the intermode couplings.
0.1 1 10
Kaluza-Klein mass m
t full coupling
=0 for all i, j
=0, M
 = 0.1
 = 0.1
 = 0.3
 = 0.03
FIG. 22: KK-particle spectra for three-momentum k =
0.1,ys = 10 and maximum brane velocities vb = 0.03, 0.1 and
0.3 with nmax = 100. As in Fig. 21 different couplings
have been taken into account and thin dashed lines indicates
Eq. (6.34) and the thick dashed line Eq. (5.4).
all the couplings into account are compared to the case
where only the coupling to the zero mode is switched
on. One observes that for k = 0.01 the spectrum is
dominated by the coupling of the KK modes to the
zero mode up to larger masses than it is the case for
k = 0.1. For k = 0.01 the spectrum obtained taking
into account Mi0 only is identical to the spectrum
obtained with the full coupling up to mn ≃ 10. In case
of k = 0.1 instead, the spectrum is purely zero mode
dominated only up to mn ≃ 5. Hence, the smaller the
three-momentum k the larger is the mass range for
which the KK-intermode coupling is suppressed, and
the coupling of the zero mode to the KK modes is the
dominant source for the production of KK gravitons. As
long as the coupling to the zero mode is the primary
source of particle production, the spectrum declines
with a power law ∝ m−3n . Therefore, in the limiting case
k → 0 when the coupling of the zero mode to the KK
modes dominates particle production also for very large
masses it is N outn≫1,k→0,• ∝ 1/m3n.
Figure 22 shows KK-graviton spectra obtained for
the same parameters as in Fig. 21 but for fixed k = 0.1
and different maximal brane velocities vb. Again, the
spectra obtained by taking all the couplings into account
are compared with the spectra to which only the coupling
of the KK modes to the zero mode contributes. The
mass up to which the spectra obtained with different
couplings are identical changes only slightly with the
maximal brane velocity vb. Therefore, the dependence
of mc on the velocity is rather weak even if vb is changed
by an order of magnitude, but nevertheless evident.
This behavior of the spectrum can indeed be understood
qualitatively. In Section VIC we demonstrate that the
coupling strength of the KK modes to the zero mode
1 10 100
1 10 100
1 10 100
1 10 100
k=0.01 k=0.03
k=0.1 k=1
=25.54 mc=14.11
=8.24 mc=7.31
FIG. 23: KK-particle spectra for three-momentum k = 0.01,
0.03, 0.1 and 1 for ys = 3 and maximum brane velocity
vb = 0.1 with different couplings taken into account where
the notation is like in Fig. 22. From the crossing of the
Mii = Mij = 0- and Mii = Mi0 = 0 results we determine
the k-dependence of mc(k, vb). The thick dashed line indi-
cates Eq. (5.4).
at the bounce t = 0, where production of KK gravitons
takes place, is proportional to
. (5.2)
The larger this term the stronger is the coupling of
the KK modes to the zero mode, and thus the larger
is the mass up to which this coupling dominates over
the KK-intermode couplings. Consequently, the mass
at which the tilt of the KK-particle spectrum changes
depends strongly on the three-momentum k but only
weakly on the maximal brane velocity due to the square
root behavior of the coupling strength. This explains
qualitatively the behavior obtained from the numerical
simulations.
An approximate expression for mc(k, vb) can be
obtained from the numerical simulations. In Figure 23
we depict the KK-particle spectra for three-momentum
k = 0.01, 0.03, 0.1 and 1 for ys = 3 and maximum brane
velocity vb = 0.1 with different couplings taken into
account. The legend is as in Fig. 22. From the crossings
of the Mij = 0, i 6= j and Mii = Mi0 = 0 results one
can determine the k-dependence of mc. Note that the
spectra are not numerically stable for large masses, but
they are stable in the range where mc lies [cf., e.g.,
Fig. 25, for k = 0.1]. Using the data for k = 0.01, 0.03
and 0.1 one finds mc(k, vb) ∝ 1/
In Fig. 24 KK-graviton spectra are displayed for
k = 0.1, ys = 3 and maximal brane velocities
vb = 0.3, 0.2, 0.1, 0.08, 0.05 and 0.03 with different
couplings taken into account. It is in principle possible
to determine the vb-dependence of mc from the crossings
1 10 100
1 10 100
1 10 100
 = 0.3
=9.50
 = 0.2 v
 = 0.1
 = 0.08
=8.24
 = 0.05 v
 = 0.03
=9.04
=8.06 m
=7.72 m
=7.52
FIG. 24: KK-graviton spectra for three-momentum k =
0.1, ys = 3 and maximum brane velocities vb =
0.3, 0.2, 0.1, 0.08, 0.05 and 0.03 with different couplings taken
into account where the notation is like in Fig. 22. From the
crossing of the Mii = Mij = 0- and Mii = Mi0 = 0 results we
determine the vb-dependence of mc.
of the Mij = 0, i 6= j- and Mii = Mi0 = 0 results as
done for the k-dependence. However, the values for mc
displayed in the Figures indicate that the dependence
of mc on vb is very weak. From the given data it is not
possible to obtain a good fitting formula (as a simple
power law) for the vb-dependence of mc. (In the range
0.1 ≤ vb ≤ 0.3 a very good fit is mc = 1.12πv0.13b /
The reason is twofold. First of all, given the complicated
coupling structure, it is a priori not clear that a simple
power law dependence exists. Recall that also the ana-
lytical expression for the particle number Eq. (6.34) has
not a simple power law velocity dependence. Moreover,
for the number of modes taken into account (nmax = 40)
the numerical results are not stable enough to resolve
the weak dependence of mc on vb with a high enough
accuracy. (But it is good enough to perfectly resolve
the k-dependence.) The reason for the slow convergence
of the numerics will become clear below. As we shall
see, the corresponding energy density is dominated by
masses much larger than mc. Consequently the weak
dependence of mc on vb is not very important in that
respect and therefore does not need to be determined
more precisely. However, combining all the data we can
give as a fair approximation
mc(k, vb) ≃
π vαb
, with α ≃ 0.1. (5.3)
Taking α = 0.13 for 0.1 ≤ vb ≤ 0.3 and α = 0.08 for
0.03 ≤ vb ≤ 0.1 fits the given data reasonably well.
As we have seen, as long as the zero mode is the
dominant source of KK-particle production, the final
KK-graviton spectrum can be approximated by a power
law decrease m−3n . We can combine the presented
numerical results to obtain a fitting formula valid in this
regime:
N outn≫1,k≪1,• =
(Lmn)3
, for
< mn < mc. (5.4)
This fitting formula is shown in Figs. 21 22 and 23
and is in reasonable good agreement with the numerical
results. Since Eq. (5.4) together with (5.3) is an impor-
tant result, we have reintroduced dimensions, i.e. the
AdS scale L which is set to one in the simulations, in
both expressions.
Let us now investigate the slope of the KK-graviton
spectrum for masses mn → ∞ since it determines the
contribution of the heavy KK modes to the energy den-
sity. In Figure 25 we show KK-graviton spectra obtained
for three-momentum k = 0.1, second brane position
ys = 3 and maximal brane velocities vb = 0.01, 0.03
and 0.1. Up to nmax = 100 KK modes have been taken
into account in the simulations. One immediately is
confronted with the observation that the convergence
of the KK-graviton spectra for large mn is very slow.
This is since those modes, which are decoupled from the
evolution of the four-dimensional graviton, are strongly
affected by the kink in the brane motion. Recall that the
production of light KK gravitons with masses mn ≪ mc
is virtually driven entirely by the evolution of the
massless mode. Those light modes are not so sensitive
to the discontinuity in the velocity of the brane motion.
To be more precise, their primary source of excitation is
the evolution of the four-dimensional graviton but not
the kink which, as we shall discuss now, is responsible
for the production of heavy KK gravitons mn ≫ mc.
A discontinuity in the velocity will always lead to a di-
vergent total particle number. Arbitrary high frequency
modes are excited by the kink since the acceleration
diverges there. Due to the excitation of KK gravitons
of arbitrarily high masses, one cannot expect that the
numerical simulations show a satisfactory convergence
behavior which allows to determine the slope by fit-
ting the data. However, it is nevertheless possible to
give a quantitative expression for the behavior of the
KK-graviton spectrum for large masses. The studies of
the usual dynamical Casimir effect on a time-dependent
interval are very useful for this purpose.
For the usual dynamical Casimir effect it has been
shown analytically that a discontinuity in the velocity
will lead to a divergent particle number [57, 58]. In
Appendix E we discuss in detail the model of a massless
real scalar field on a time-dependent interval [0, y(t)] for
the boundary motion y(t) = y0 + v t with v = const,
and present numerical results for final particle spectra
(Fig. 34). For this motion it was shown in [58] that the
particle spectrum behaves as ∝ v2/ωn where ωn = nπ/y0
is the frequency of a massless scalar particle. This di-
vergent behavior is due to the discontinuities in the
velocity when the motion is switched on and off, and are
responsible for the slow convergence of the numerical
1 2 3 4 5 10 20 30 100
Kaluza-Klein mass m
t nmax = 40
 = 60
 = 80
=0.03
=0.01
FIG. 25: KK-particle spectra for k = 0.1, ys = 3 and max-
imal brane velocities vb = 0.01, 0.03, 0.1 up to KK masses
mn ≃ 100 compared with an 1/mn decline. The dashed lines
indicate the approximate expression (5.6) which describes the
asymptotic behavior of the final KK-particle spectra reason-
ably well, in particular for vb < 0.1.
results shown in Fig. 34 for this scenario.
At the kink in the brane-motion the total change of the
velocity is 2vb, similar to the case for the linear motion
where the discontinuous change of the velocity is 2v.
Consequently we may conclude that for large KK masses
mn ≫ mc for which the evolution of the KK modes is no
longer affected by their coupling to the four-dimensional
graviton the KK-graviton spectrum behaves as 10
N outn,k,• ∝
for mn ≫ mc . (5.5)
If we assume that the spectrum declines like 1/mn and
use that the numerical results for masses mn ≃ 20 are
virtually stable one finds N outn,k,• ∝ v2.08b /mn which de-
scribes the asymptotics of the numerical results well.
As for the dynamical Casimir effect for a uniform motion
discussed in Appendix E [cf. Fig. 34], the slow conver-
gence of the numerical results towards the 1/mn behav-
ior is well visible for large masses mn ≫ mc which do no
longer couple to the four-dimensional graviton. This is a
strong indication for the statement that the final gravi-
ton spectrum for large masses behaves indeed like (5.5).
It is therefore possible to give a single simple expression
for the final KK-particle spectrum for large masses which
10 Note that the discussion in Appendix E refers to Dirichlet bound-
ary conditions. For Neumann boundary conditions considered
here, the zero mode and its asymmetric coupling play certainly a
particular role. However, as we have shown, for large masses only
the KK-intermode couplings are important. Consequently, there
is no reason to expect that the qualitative behavior of the spec-
trum for large masses depends on the particular kind of boundary
condition.
-100 -50 0 50 100
time t
-100 -50 0 50 100
time t
1 10 50
KK mass m
1 10 50
KK mass m
=40 n
FIG. 26: Evolution of the zero-mode particle numberN0,k,•(t)
and final KK-graviton spectra N outn,k,• for ys = 3, maximal
brane velocity vb = 0.1 and three-momenta k = 10 and 30.
The dashed line in the upper plots indicate Eq. (6.17) (divided
by two) demonstrating the value of the number of produced
zero-mode gravitons without coupling to the KK modes.
comprises all the features of the spectrum even quantita-
tively reasonably well [cf. dashed lines in Fig. 25]
N outn,k,• ≃ 0.2
ωoutn,k ys
for mn ≫ mc . (5.6)
The 1/ys-dependence is compelling. It follows imme-
diately from the considerations on the energy and the
scaling behavior discussed above [cf. Figs. 16 and 17].
For completeness we now write 1/ωoutn,k instead of the KK
mass mn only, since what matters is the total energy
of a mode. Throughout this section this has not been
important since we considered only k ≪ 1 such that
ωoutn,k becomes independent of k for large masses mn ≫ k
[cf. Fig. 19].
E. Short wavelengths k ≫ 1
For short wave lengths k ≫ 1 (short compared to the
AdS-curvature scale L set to one in the simulations) a
completely new and very interesting effect appears. The
behavior of the four-dimensional graviton mode changes
drastically. We find that the zero mode now couples
to the KK gravitons and no longer evolves virtually
independently of the KK modes, in contrast to the
behavior for long wavelengths.
In Fig. 26 we show the evolution of the zero-mode
graviton number N0,k,•(t) and final KK-graviton spectra
N outn,k,• for ys = 3, maximal brane velocity vb = 0.1
and three-momenta k = 10 and 30. One observes that
the evolution of the four-dimensional graviton depends
on the number of KK modes nmax taken into account,
i.e. the zero mode couples to the KK gravitons. For
-100 -50 0 50 100
time t
0 10 20 30 40
0.02/(k-1.8)
analytical
FIG. 27: 4D-graviton number N0,k,•(t) for k = 3, 5, 10, 20 and
30 with ys = 3 and maximal brane velocity vb = 0.1. The
small plot shows the final graviton spectrum N out
0,k,• together
with a fit to the inverse law a/(k + b) [dashed line] and the
analytical fitting formula Eq. (6.23) [solid line]. For k = 10
and 30 the corresponding KK-graviton spectra are shown in
Fig. 26.
k = 10 the first 60 KK modes have to be included in the
simulation in order to obtain a numerically stable result
for the zero mode. In the case of k = 30 one already
needs nmax ≃ 100 in order to achieve numerical stability
for the zero mode.
Figure 27 displays the time-evolution of the number
of produced zero-mode gravitons N0,k,•(t) for ys = 3
and vb = 0.1. For large k the production of massless
gravitons takes place only at the bounce since these
short wavelength modes are sub-horizon right after
the bounce. Corresponding KK-particle spectra for
k = 10, 30 are depicted in Figs. 26 and 28. The insert
in Fig. 27 shows the resulting final four-dimensional
graviton spectrum N out0,k,•, which is very well fitted
by an inverse power law N out0,k,• = 0.02/(k − 1.8) 11.
Consequently, for k ≫ 1 the zero-mode particle number
N out0,k,• declines like 1/k only, in contrast to the 1/k2
behavior found for k ≪ 1.
The dependence of N out0,k,• on the maximal brane
velocity vb also changes. In Fig. 28 we show N0,k,•(t)
together with the corresponding KK-graviton spectra for
ys = 3, k = 5 and 10 in each case for different vb. Using
nmax = 60 KK modes in the simulations guarantees
numerical stability for the zero mode. The velocity
dependence of N out0,k,• is not given by a simple power law
as it is the case for k ≪ 1. This is not very surprising
11 The momenta k = 5, 10, 20, 30 and 40 have been used to obtain
the fit. Fitting the spectrum for k = 20, 30 and 40 to a power
law gives N out
0,k,•
∝ k−1.1.
-100 -50 0 50 100
time t
-100 -50 0 50 100
time t
1 10 100
KK mass m
1 10 100
KK mass m
 = 0.03
 = 0.05
 = 0.1
 = 0.03
 = 0.05
 = 0.1
 = 0.3
 = 0.1
 = 0.03
 = 0.03
 = 0.1
 = 0.3
FIG. 28: Zero-mode particle number N0,k,•(t) and corre-
sponding final KK-particle spectraN outn,k,• for ys = 3, k = 5, 10
and different maximal brane velocities vb. nmax = 60 guaran-
tees numerically stable solutions for the zero mode.
since now the zero mode couples strongly to the KK
modes [cf. Fig. 26]. For k = 10, for example, one finds
N out0,k,• ∝ v1.4b if vb <∼ 0.1.
As in the long wavelengths case, the zero-mode
particle number does not depend on the position of the
static brane ys even though the zero mode now couples
to the KK modes. This is demonstrated in Fig. 29 where
the evolution of the zero-mode particle number N0,k,•(t)
and the corresponding KK-graviton spectra with k = 10,
vb = 0.1 for the two values ys = 3 and 10 are shown.
One needs nmax = 60 for ys = 30 in order to obtain a
stable result for the zero mode which is not sufficient in
the case ys = 10. Only for nmax ≃ 120 the zero-mode
solution approaches the stable result which is identical
to the result obtained for ys = 3.
What is important is not the number of the KK modes
the four-dimensional graviton couples to, but rather
a particular mass mzm ≃ k. The zero mode couples
to all KK modes of masses below mzm no matter how
many KK modes are lighter. Recall that the value of ys
just determines how many KK modes belong to a given
mass interval ∆m since, roughly, mn ≃ nπ/ys. The
KK-spectra for k ≥ 1 show the same scaling behavior as
demonstrated for long wavelengths in Figs. 16 and 17.
The production of four-dimensional gravitons of
short wavelengths takes place on the expense of the
KK modes. In Fig. 30 we show the numerical results
for the final KK-particle spectra with vb = 0.1, ys = 3
and k = 3, 5, 10 and 30 obtained for different coupling
combinations. These spectra should be compared with
those shown in Fig. 23 for the long wavelengths case. For
k >∼ 10 the number of the produced lightest KK gravi-
tons is smaller in the full coupling case compared to the
situation where only the KK-intermode coupling is taken
-100 -50 0 50 100
time t
-100 -50 0 50 100
time t
1 10 50
KK mass m
1 10 50
KK mass m
FIG. 29: Zero-mode particle numberN0,k•(t) and correspond-
ing KK-graviton spectra for k = 10, vb = 0.1 and 2nd brane
positions ys = 3 and 10.
1 10 100
KK-mass m
1 10 100
KK-mass m
k=3 k=5
k=10 k=30
FIG. 30: Final KK-particle spectra N outn,k,• for vb = 0.1, ys = 3
and k = 3, 5, 10 and 30 and different couplings. Circles cor-
respond to the full coupling case, squares indicate the results
if Mij = Mii = 0, i.e. no KK-intermode couplings and dia-
monds correspond to Mi0 = 0, i.e. no coupling of KK modes
to the zero mode.
into account. In case k = 30, for instance, the numbers
of produced gravitons for the first four KK modes are
smaller for the full coupling case. This indicates that
the lightest KK modes couple strongly to the zero mode.
Their evolution is damped and graviton production in
those modes is suppressed. The production of zero-mode
gravitons on the other hand is enhanced compared to
the long wavelengths case. For short wavelengths, the
evolution of the KK modes therefore contributes to
the production of zero-mode gravitons. This may be
interpreted as creation of zero-mode gravitons out of
KK-mode vacuum fluctuations.
As in the long wavelengths case, the KK-particle
spectrum becomes independent of k if mn ≫ k and
10 20 100 200
frequency ω
k=40, n
k=30, n
k=20, n
k=10, n
k=5, n
FIG. 31: Final KK-particle spectra N outn,k,• for vb = 0.1,
ys = 3 and k = 5, 10, 20, 30 and 40. The dashed lines in-
dicate Eq. (6.35) for k = 10, 20, 30 and 40. For k ≥ 20, the
simple analytical expression (6.35) agrees quite well with the
numerical results.
the evolution of the KK modes is dominated by the
KK-intermode coupling. This is visible in Fig. 30 for
k = 3 and 5. Also the bend in the spectrum when the
KK-intermode coupling starts to dominate is observable.
For k = 10 and 30 this regime with mn ≫ k is not
reached.
As we have shown before, in the regime mn ≫ k
the KK-particle spectrum behaves as 1/ωoutn,k which will
dominate the energy density of produced KK gravitons.
If 1 ≪ mn <∼ k, however, the zero mode couples to
the KK modes and the KK-graviton spectrum does
not decay like 1/ωoutn,k. This is demonstrated in Fig. 31
where the number of produced final state gravitons
N outn,k,• is plotted as function of their frequency ωoutn,k for
parameters vb = 0.1, ys = 3 and k = 5, 10, 20, 30 and 40.
While for k = 5 the KK-intermode coupling dominates
for large masses [cf. Fig. 30] leading to a bending over
in the spectrum and eventually to an 1/ωoutn,k-decay, the
spectra for k = 20, 30 and 40 show a different behavior.
All the modes are still coupled to the zero mode leading
to a power-law decrease ∝ 1/(ωoutn,k)α with α ≃ 2. The
case k = 10 corresponds to an intermediate regime.
Also shown is the simple analytical expression given in
Eq. (6.35) which describes the spectra reasonably well
for large k (dashed line).
The KK-particle spectra in the region 1 ≪ mn <∼ k will
also contribute to energy density since the cutoff scale is
the same for the integration over k and the summation
over the KK-tower (see Section VID below).
1 2 3 4 5 10 20 30 100
Kaluza-Klein mass m
=0 (kink)
=0.005
=0.015
=0.05
 = 2.2 x 10
 exp(-0.1315 m
FIG. 32: KK-particle spectrum for ys = 3, vb = 0.1 and
k = 0.1 for the bouncing as well as smooth motions with
ts = 0.005, 0.015, and 0.05 to demonstrate the influence of the
bounce. nmax = 60 KK modes have been taken into account
in the simulations and the result for the kink motion is shown
as well.
F. A smooth transition
Let us finally investigate how the KK-graviton spec-
trum changes when the kink-motion (2.18) is replaced by
the smooth motion (5.1). In Fig. 32 we show the numeri-
cal results for the final KK-graviton spectrum for ys = 3,
vb = 0.1 and k = 0.1 for the smooth motion (5.1) with
ts = 0.05, 0.015 and 0.005. nmax = 60 modes have been
taken into account in the simulation and the results are
compared to the spectrum obtained with the kink-motion
(2.18). The parameter ts defines the scale Ls ≃ 2ts at
which the kink is smoothed, i.e. Ls corresponds to the
width of the transition from contraction to expansion.
The numerical results reveal that KK gravitons of masses
smaller than ms ≃ 1/Ls are not affected, but the pro-
duction of KK particles of masses larger than ms is
exponentially suppressed. This is in particular evident
for ts = 0.05 where the particle spectrum for masses
mn > 10 has been fitted to a exponential decrease. Going
to smaller values of ts, the suppression of KK-mode pro-
duction sets in for larger masses. For the example with
ts = 0.005 the KK-particle spectrum is identical to the
one obtained with the kink-motion within the depicted
mass range. In this case the exponential suppression of
particle production sets in only for masses mn > 100.
Note that the exponential decay of the spectrum for the
smooth transition from contraction to expansions also
shows that no additional spurious effects due to the dis-
continuities in the velocity when switching the brane dy-
namics on and off occur. Consequently, tin and tout are
appropriately chosen.
VI. ANALYTICAL CALCULATIONS AND
ESTIMATES
A. The zero mode: long wavelengths k ≪ 1/L
The numerical simulations show that the evolution of
the zero mode at large wavelengths is not affected by the
KK modes. To find an analytical approximation to the
numerical result for the zero mode, we neglect all the
couplings of the KK modes to the zero mode by setting
Mij = 0 ∀ i, j and keeping M00 only. Then only the evo-
lution equation for ǫ
0 ≡ δα0 ǫ is important; it decouples
and reduces to
ǫ̈+ [k2 + V(t)]ǫ = 0 , (6.1)
with “potential”
V = Ṁ00 −M200 . (6.2)
The corresponding vacuum initial conditions are
[cf. Eqs. (3.21), (3.22); here we do not consider the unim-
portant phase]
ǫ = 1 , lim
ǫ̇ = −ik. (6.3)
A brief calculation using the expression for M00 (cf. Ap-
pendix B) leads to
V = y
y2s − y2b
3y2b − 2y2s
y2s − y2b
(6.4)
= − y
y2s − y2b
y2s − y2b
. (6.5)
If one assumes that the static brane is much further away
from the Cauchy horizon than the physical brane, ys ≫
yb, it is simply
V = −H2 − Ḣ , (6.6)
and one recovers Eq. (2.50).
For the particular scale factor (2.17) one obtains
H = ȧ
sgn(t)
|t|+ tb
and (6.7)
Ḣ = 2δ(t)
(|t|+ tb)2
(6.8)
such that
Ḣ +H2 = 2δ(t)
. (6.9)
The δ-function in the last equation models the bounce.
Without the bounce, i.e. for an eternally radiation dom-
inated dynamics, one has V = 0 and the evolution equa-
tion for ǫ would be trivial. With the bounce, the potential
is just a delta-function potential with “height” propor-
tional to −2√vb/L
V = −
δ(t) , (6.10)
where vb is given in Eq (2.20). Equation (6.1) with poten-
tial (6.10) can be considered as a Schrödinger equation
with δ-function potential. Its solution is a classical text-
book problem.
Since the approximated potential V vanishes for all t < 0
one has, with the initial condition (6.3),
ǫ(t) = e−ikt , t < 0 . (6.11)
Assuming continuity of ǫ through t = 0 and integrating
the differential equation over a small interval t ∈ [0−, 0+]
around t = 0 gives
(6.12)
= ǫ̇(0+)− ǫ̇(0−)−
ǫ(0) . (6.13)
The jump of the derivative ǫ̇ at t = 0 leads to parti-
cle creation. Using ǫ(0+) = ǫ(0) = ǫ(0−) and ǫ̇(0+) =
ǫ̇(0−) +
ǫ(0) as initial conditions for the solution for
t > 0, one obtains
ǫ(t) = Ae−ikt +Beikt , t > 0 (6.14)
A = 1 + i
, B = −i
. (6.15)
The Bogoliubov coefficient B00 after the bounce is then
given by
B00(t ≥ 0) =
e−ikt
1 + i
ǫ(t)− i
ǫ̇(t)
(6.16)
where we have used that M00 = −H if ys ≫ yb. At
this point the importance of the coupling matrix M00
becomes obvious. Even though the solution ǫ to the dif-
ferential equation (6.1) is a plane wave right after the
bounce, |B00(t)|2 is not a constant due to the motion of
the brane itself. Only once the mode is inside the hori-
zon, i.e. H/k ≪ 1, |B00(t)|2 is constant and the number
of generated final state gravitons (for both polarizations)
is given by
N out0,k = 2|B00(kt ≫ 1)|2 = 2
|ǫ|2 + |ǫ̇|
(kL)2
(6.17)
where we have used that the Wronskian of ǫ, ǫ∗ is 2ik.
As illustrated in Fig. 9 the expression (6.17) is indeed in
excellent agreement with the (full) numerical results, not
only in its k-dependence but also the amplitude agrees
without any fudge factor. The evolution of the four-
dimensional graviton mode and the associated genera-
tion of massless gravitons with momentum k < 1/L can
therefore be understood analytically.
Note that the approximation employed here is only valid
if y2s − yb(0)2 ≫ yb(0)2. In the opposite limit, if ∆y ≡
ys − yb(0) ≪ yb(0) one can also derive an analytical ap-
proximation along the same lines. For k ≤ 1/∆y one
obtains instead of Eq. (6.17)
N out0,k =
2(k∆y)2
, (6.18)
if ∆y ≡ ys − yb(0) ≪ yb(0) , k∆y <∼ 1 .
In order to calculate the energy density, we have to take
into account that the approximation of an exactly ra-
diation dominated Universe with an instant transition
breaks down on small scales. We assume this break
down to occur at the string scale Ls, much smaller than
L [cf. Eqs. (2.14),(2.15)]. Ls is the true width of the
transition from collapse to expansion, which we have set
to zero in the treatment. Modes with mode numbers
k ≫ (2π)/Ls will not ’feel’ the potential and are not
generated. We therefore choose kmax = (2π)/Ls as the
cutoff scale. Then, with Eq (4.21), one obtains for the
energy density
2 π2a4
∫ 2π/Ls
dkk3N0,k . (6.19)
For small wave numbers, k < 1/L, we can use the above
analytical result for the zero-mode particle number.
However, as the numerical simulations have revealed, as
soon as k >∼ 1/L, the coupling of the four-dimensional
graviton to the KK modes becomes important and for
large wave numbers N out0,k decays only like 1/k. Hence
the integral (6.19) is entirely dominated by the upper
cutoff. The contributions from long wavelengths to the
energy density are negligible.
For the power spectrum, on the other hand, we
are interested in cosmologically large scales, 1/k ≃
several Mpc or more, but not in short wavelengths
kL ≫ 1 dominating the energy density. Inserting the
expression for the number of produced long wavelength
gravitons (6.17) into (4.11), the gravity wave power
spectrum at late times becomes
P0(k) =
(2π)3
for kt ≫ 1. (6.20)
This is the asymptotic power spectrum, when ǫ starts
oscillating, hence inside the Hubble horizon, kt ≫ 1.
On super Hubble scales, kt ≪ 1 when the asymptotic
out-state of the zero mode is not yet reached, one may
use Eq. (4.10) with
R0,k(t) =
|ǫ(t)|2 − 1
≃ 4vba
. (6.21)
For the ≃ sign we assume t ≫ L and t ≫ tb so that
one may neglect terms of order t/L in comparison to√
vb(t/L)
2. We have also approximated a = (t+ tb)/L ≃
t/L. Inserting this in Eq. (4.8) yields
P0(k) =
2 , kt ≪ 1 . (6.22)
Both expressions (6.20) and (6.22) are in very good
agreement with the corresponding numerical results, see
Figs. 9, 10 and 11.
B. The zero mode: short wavelengths k ≫ 1/L
As we have demonstrated with the numerical analysis,
as soon as k >∼ 1/L, the coupling of the zero mode to
the KK modes becomes important, and for large wave
numbers N out0,k,• ∝ 1/k. We obtain a good asymptotic
behavior for the four-dimensional graviton spectrum if
we set
N out0,k,• ≃
5(kL)
. (6.23)
This function and Eq. (6.17) (divided by two for one po-
larization) meet at kL = 5. Even though the approxi-
mation is not good in the intermediate regime it is very
reasonable for large k [cf. Fig. 27].
Inserting this approximation into Eq (6.19) for the energy
density, one finds that the integral is dominated entirely
by the upper cutoff, i.e. by the blue, high energy modes:
. (6.24)
The power spectrum associated with the short wave-
lengths k ≫ 1/L is not of interest since the gravity wave
spectrum is measured on cosmologically large scales only,
k ≪ 1/L.
C. Light Kaluza-Klein modes and long wavelengths
k ≪ 1/L
The numerics indicates that light (mn < 1) long wave-
length KK modes become excited mainly due to their
coupling to the zero mode. Let us take only this cou-
pling into account and neglect also the time-dependence
of the frequency, setting ωn,k(t) ≡ ωoutn,k = ωinn,k since it
plays an inferior role as shown by the numerics.
The Bogoliubov coefficients are then determined by the
equations
ξ̇n,k + iω
n,kξn,k =
2ωoutn,k
Sn(t; k) (6.25)
η̇n,k − iωoutn,kηn,k = −
2ωoutn,k
Sn(t; k) (6.26)
with the “source”
Sn(t; k) = (ξ0 − η0)Mn0 . (6.27)
We have defined ξn,k ≡ ξ(0)n,k, ηn,k ≡ η
n,k, ξ0 ≡ ξ
0,k, and
η0 ≡ η(0)0,k. This source is known, since the evolution of
the four-dimensional graviton is know. From the result
for ǫ above and the definition of ξ0 and η0 in terms of ǫ
and ǫ̇ one obtains
ξ0 − η0 =
−ik + 1|t|+ tb
e−itk , t < 0 (6.28)
ξ0 − η0 = 2
1− iktb
k2tb(t+ tb)
e−itk
k2tb(t+ tb)
eitk , t > 0 .(6.29)
Furthermore, if ys ≫ yb, one has [cf. Eq. (B3)]
Mn0 = 2
Y1(mnys)2
Y1(mnyb)2 − Y1(mnys)2
. (6.30)
Assuming ysmn ≫ 1 and ybmn ≪ 1 one can expand the
Bessel functions and arrives at
Mn0 ≃
ẏb = −
πmnL2
L sgn(t)
(|t|+ tb)2
To determine the number of created final state gravi-
tons we only need to calculate ηn,k [cf. Eq. (3.32) with
∆+n,k(|t| → ∞) = 1 and ∆
n,k(|t| → ∞) = 0],
N outn,k,• = |B0n,k(tout)|2 =
ωoutn,k
|ηn,k|2 (6.31)
The vacuum initial conditions require limt→−∞ ηn,k = 0
so that ηn,k is given by the particular solution
ηn,k(t) =
ωoutn,k
′; k)e−it
′ωoutn,kdt′ , (6.32)
and therefore
N outn,k,• =
4ωoutn,k
Sn(t; k)e
−itωout
n,kdt
(6.33)
where the integration range has been extended from −∞
to +∞ since the source is very localized around the
bounce. This integral can be solved exactly. A some-
what lengthy but straight forward calculation gives
N outn,k,• =
πm5nL
2ωoutn,kkys
∣∣∣2iRe
+k)tbE1(i(ω
n,k + k)tb)
+(ktb)
−1ei(ω
n,k−k)tbE1(i(ω
n,k − k)tb)
−ei(ω
+k)tbE1(i(ω
n,k + k)tb)
. (6.34)
Here E1 is the exponential integral, E1(z) ≡∫∞
t−1e−tdt . This function is holomorphic in the com-
plex plane with a cut along the negative real axis, and
the above expression is therefore well defined. Note that
this expression does not give rise to a simple dependence
of N outn,k on the velocity vb = (L/tb)2. In the preceding
section we have seen that, within its range of validity,
Eq. (6.34) is in excellent agreement with the numerical
results (cf., for instance, Figs. 12 and 13).
As already mentioned before, this excellent agreement
between the numerics and the analytical approximation
demonstrates that the numerical results are not contam-
inated by any spurious effects.
D. Kaluza-Klein modes: asymptotic behavior and
energy density
The numerical simulations show that the asymptotic
KK-graviton spectra (i.e. for masses mn ≫ 1) decay
like 1/ωoutn,k if mn ≫ k and like
1/ωoutn,k
with α ≃ 2
if mn <∼ k. The corresponding energy density on the
brane is given by the summation of Eq. (4.23) over all
KK modes up to the cutoff. Since the mass mn is simply
the momentum into the extra dimension, it is plausible
to choose the same cutoff scale for both, the k-integral
and the summation over the KK modes, namely 2π/Ls.
The main contribution to the four-dimensional particle
density and energy density comes from mn ∼ 2π/Ls and
k ∼ 2π/Ls, i.e. the blue end of the spectrum.
The large-frequency behavior of the final KK-spectrum
can be approximated by
N outn,k,• ≃
0.2v2b


ωoutn,k
if 1/L <∼ k <∼ mn
2(α−1)/2
(ωoutn,k)
if mn <∼ k <∼ 2π/Ls
(6.35)
with α ≃ 2 which is particularly good for large k. Both
expression match at mn = k and are indicated in Figures
25 and 31 as dashed lines. Given the complicated
coupling structure of the problem and the multitude of
features visible in the particle spectra, these compact
expressions describe the numerical results reasonable
well for all parameters. The deviation from the numer-
ical results is at most a factor of two. This accuracy
is sufficient in order to obtain a useful expression for
the energy density from which bounds on the involved
energy scales can be derived.
The energy density on the brane associated with
the KK gravitons is given by [cf. Eq. (4.23)]
ρKK ≃
πa6ys
dkk2 N outn,k,• ωoutn,k mn . (6.36)
Splitting the momentum integration into two integrations
from 0 to mn and mn to the cutoff 2π/Ls, and replacing
the sum over the KK masses by an integral one obtains
ρKK ≃ C(α)
π5v2b
. (6.37)
The power α in Eq. (6.35) enters the final result for the
energy density only through the pre-factor C(α) which is
of order unity.
VII. DISCUSSION
The numerical simulations have revealed many inter-
esting effects related to the interplay between the evolu-
tion of the four-dimensional graviton and the KK modes.
All features observed in the numerical results have been
interpreted entirely on physical grounds and many of
them are supported by analytical calculations and ar-
guments. Having summarized the results for the power
spectrum and energy densities in the preceding section,
we are now in the position to discuss the significance of
these findings for brane cosmology.
A. The zero mode
For the zero-mode power-spectrum we have found that
P0(k) =
k2 if kt ≪ 1
(La)−2 if kt ≫ 1 . (7.1)
Therefore, the gravity wave spectrum on large, super
Hubble scales is blue with spectral tilt
nT = 2 , (7.2)
a common feature of ekpyrotic and pre-big-bang models.
The amplitude of perturbations on scales at which
fluctuations of the Cosmic Microwave Background
(CMB) are observed is of the order of (H0/mPl)
2, i.e.
very suppressed on scales relevant for the anisotropies
of the CMB. The fluctuations induced by these Casimir
gravitons are much too small to leave any observable
imprint on the CMB.
For the zero-mode energy density at late times,
kt ≫ 1, we have obtained [cf Eq. (6.24)]
ρh0 ≃
. (7.3)
In this section we denote the energy density of the
zero mode by ρh0 in order not to confuse it with the
12 Note that even the transition from the summation over the KK-
tower to an integration according to (4.33) “eats up” the 1/ys
term in (6.36), the final energy density (6.37) depends on ys since
it explicitly enters the particle number.
present density of the Universe. Recall that Ls is the
scale at which our kinky approximation (2.17) of the
scale factor breaks down, i.e. the width of the bounce.
If this width is taken to zero, the energy density of
gravitons is very blue and diverges. This is not so
surprising, since the kink in a(t) leads to the generation
of gravitons of arbitrary high energies. However, as the
numerical simulations have shown, when we smooth the
kink at some scale Ls, the production of modes with
energies larger than ≃ 1/Ls is exponentially suppressed
[cf. Fig. 32]. This justifies the introduction of Ls as a
cutoff scale.
In the following we shall determine the density pa-
rameter of the generated gravitons today and compare
it to the Nucleosynthesis bound. For this we need the
quantities ab given in Eq (2.20) and
Here ab is the minimal scale factor andHb is the maximal
Hubble parameter, i.e. the Hubble parameter right after
the bounce. (Recall that in the low energy approximation
t = η.) During the radiation era, curvature and/or a cos-
mological constant can be neglected so that the density
ρrad =
a−4 =
. (7.4)
In order to determine the density parameter of the gen-
erated gravitons today, i.e., at t = t0, we use
Ωh0 =
ρh0(t0)
ρcrit(t0)
ρh0(t0)
ρrad(t0)
ρrad(t0)
ρcrit(t0)
ρh0(t0)
ρrad(t0)
Ωrad.
(7.5)
The second factor Ωrad is the present radiation density
parameter. For the factor ρh0/ρrad, which is time inde-
pendent since both ρh0 and ρrad scale like 1/a
4, we insert
the above results and obtain
Ωh0 =
Ωrad =
Ωrad (7.6)
Ωrad . (7.7)
The nucleosynthesis bound [14] requests that
Ωh0 <∼ 0.1Ωrad , (7.8)
which translates into the relation
(LPl/Ls)
(L/Ls) <∼ 0.1 (7.9)
which, at first sight, relates the different scales involved.
But since we have chosen the cutoff scale Ls to be
the higher-dimensional fundamental scale (string scale),
Equation (7.9) reduces to
vb <∼ 0.2 (7.10)
by virtue of Equation (2.15). All one has to require to
be consistent with the nucleosynthesis bound is a small
brane velocity which justifies the low energy approach.
In all, we conclude that the model is not severely con-
strained by the zero mode. This result itself is remark-
able. If there would be no coupling of the zero mode
to the KK modes for small wavelengths the number of
produced high energy zero-mode gravitons would behave
as ∝ k−2 as it is the case for long wavelengths. The
production of high energy zero-mode gravitons from KK
gravitons enhances the total energy density by a factor of
about L/Ls. Without this enhancement, the nucleosyn-
thesis bound would not lead to any meaningful constraint
and would not even require vb < 1.
B. The KK modes
As derived above, the energy density of KK gravitons
on the brane is dominated by the high energy gravitons
and can be approximated by [cf. Eq. (6.37)]
ρKK ≃
π5v2b
. (7.11)
Let us evaluate the constraint induced from the require-
ment that the KK-energy density on the brane be smaller
than the radiation density ρKK(t) < ρrad(t) at all times.
If this is not satisfied, back-reaction cannot be neglected
and our results are no longer valid. Clearly, at early times
this condition is more stringent than at late times since
ρKK decays faster then ρrad. Inserting the value of the
scale factor directly after the bounce where the produc-
tion of KK gravitons takes place, a−2b = vb, one finds,
using again the RS fine tuning condition (2.15),
≃ 100 v3b
. (7.12)
If we use the largest value for the brane velocity vb ad-
mitted by the nucleosynthesis bound vb ≃ 0.2 and re-
quire that ρKK/ρrad be (much) smaller than one for back-
reaction effects to be negligible, we obtain the very strin-
gent condition
. (7.13)
Let us first discuss the largest allowed value for L ≃
0.1mm. The RS-fine tuning condition (2.15) then deter-
mines Ls = (LL
1/3 ≃ 10−22 mm ≃ 1/(106 TeV). In
this case the brane tension is T = 6κ4/κ25 = 6L2Pl/L6s =
6/(LL3s) ∼ (10TeV)4. Furthermore, we have (L/Ls)2 ≃
1042 so that ys > L(L/Ls)
2 ≃ 1041mm ≃ 3 × 1015Mpc,
which is about 12 orders of magnitude larger than the
present Hubble scale. Also, since yb(t) ≪ L in the low
energy regime, and ys ≫ L according to the inequality
(7.13), the physical brane and the static brane are very
far apart at all times. Note that the distance between
the physical and the static brane is
dy = L log(ys/yb) >∼ L ≫ Ls .
This situation is probably not very realistic. Some high
energy, stringy effects are needed to provoke the bounce
and one expects these to be relevant only when the
branes are sufficiently close, i.e. at a distance of order
Ls. But in this case the constraint (7.13) will be violated
which implies that back-reaction will be relevant.
On the other hand, if one wants that ys ≃ L and
back-reaction to be unimportant, then Eq. (7.12) implies
that the bounce velocity has to be exceedingly small,
vb <∼ 10−15.
A way out of this conclusion is to assume that the
brane distance at the bounce, ∆y = ys − yb(0), becomes
of the order of the cutoff Ls or smaller. Then the pro-
duction of KK gravitons is suppressed. However, then
the approximation (6.18) has to be used to determine
the energy density of zero-mode gravitons which then
becomes
ρh0 ≃
(Ls∆y)
Setting ∆y ≃ Ls, the nucleosynthesis bound,
ρh0 <∼ 0.1ρrad, then yields the much more stringent limit
on the brane velocity,
v2b <
. (7.14)
One might hope to find a way out of these conclusions
by allowing the bounce to happen in the high energy
regime. But then vb ≃ 1 and the nucleosynthesis bound
is violated since too many zero-mode gravitons are pro-
duced. Even if one disregards this limit for a moment,
saying that the calculation presented here only applies in
the low energy regime, vb ≪ 1, the modification coming
from the high energy regime are not expected to allevi-
ate the bounds. In the high energy regime one may of
course have yb(t) ≫ L and therefore the physical brane
can approach the static brane arbitrarily closely without
the latter having to violate (7.13). Those results suggest
that even in the scenario of a bounce at low energies, the
back reaction from KK gravitons has to be taken into
account. But this does not need to exclude the model.
VIII. CONCLUSIONS
We have studied the evolution of tensor perturbations
in braneworld cosmology using the techniques developed
for the standard dynamical Casimir effect. A model
consisting of a moving and a fixed 3-brane embedded in
a five-dimensional static AdS bulk has been considered.
Applying the dynamical Casimir effect formulation
to the study of tensor perturbations in braneworld
cosmology represents an interesting alternative to
other approaches existing in the literature so far and
provides a new perspective on the problem. The explicit
use of coupling matrices allows us to obtain detailed
information about the effects of the intermode couplings
generated by the time-dependent boundary conditions,
i.e. the brane motion.
Based on the expansion of the tensor perturbations
in instantaneous eigenfunctions, we have introduced a
consistent quantum mechanical formulation of graviton
production by a moving brane. Observable quantities
like the power spectrum and energy density can be
directly deduced from quantum mechanical expectation
values, in particular the number of gravitons created
from vacuum fluctuations. The most surprising and
at the same time most interesting fact which this
approach has revealed is that the energy density of the
massive gravitons decays like 1/a6 with the expansion
of the Universe. This is a direct consequence of the
localization of gravity: five-dimensional aspects of it,
like the KK gravitons, become less and less ’visible’
on the brane with the expansion of the Universe. The
1/a6-scaling behavior remains valid also when the fixed
brane is sent off to infinity and one ends up with
a single braneworld in AdS, like in the original RS II
scenario. Consequently, KK gravitons on a brane moving
through an AdS bulk cannot play the role of dark matter.
As an explicit example, we have studied graviton
production in a generic, ekpyrotic-inspired model of
two branes bouncing at low energies, assuming that
the energy density on the moving brane is dominated
by a radiation component. The numerical results have
revealed a multitude of interesting effects.
For long wavelengths kL ≪ 1 the zero mode evolves
virtually independently of the KK modes. zero-mode
gravitons are generated by the self coupling of the zero
mode to the moving brane. For the number of produced
massless gravitons we have found the simple analytical
expression 2vb/(kL). These long wavelength modes
are the once which are of interest for the gravitational
wave power spectrum. As one expects for an ekpyrotic
scenario, the power spectrum is blue on super-horizon
scales with spectral tilt nT = 2. Hence, the spectrum
of these Casimir gravitons has much too little power
on large scales to affect the fluctuations of the cosmic
microwave background.
The situation changes completely for short wavelengths
kL ≫ 1. In this wavelength range, the evolution of the
zero mode couples strongly to the KK modes. Produc-
tion of zero-mode gravitons takes place on the expense of
KK-graviton production. The numerical simulation have
revealed that the number of produced short-wavelength
massless gravitons is given by 2vb/(5kL). It decays
only like 1/k instead of the 1/k2-behavior found for
long wavelengths. These short wavelength gravitons
dominate the energy density. Comparing the energy
density with the nucleosynthesis bound and taking the
cutoff scale to be the string scale Ls, we have shown that
the model is not constrained by the zero mode. As long
as vb <∼ 0.2, i.e. a low energy bounce, the nucleosynthesis
bound is not violated.
More stringent bounds on the model come from
the KK modes. Their energy density is dominated by
the high energy modes which are produced due to the
kink which models the transition from contraction to
expansion. Imposing the reasonable requirement that
the energy density of the KK modes on the brane be
(much) smaller than the radiation density at all times in
order for back reaction effects to be negligible, has led to
two cases. On the one hand, allowing the largest values
for the AdS curvature scale L ≃ 0.1mm and the bounce
velocity vb ≃ 0.2, back reaction can only be neglected
if the fixed brane is very far away from the physical
brane ys ∼ 1041mm. As we have argued, this is not very
realistic since some high energy, stringy effects provoking
the bounce are expected to be relevant only when the
branes are sufficiently close, i.e. ys ∼ Ls. On the other
hand, by only requiring that ys ≃ L ≫ Ls, the bounce
velocity has already to be exceedingly small, vb <∼ 10−15,
for back reaction to be unimportant. Therefore, one of
the main conclusions to take away from this work is that
back reaction of massive gravitons has to be taken into
account for a realistic bounce.
Many of the results presented here are based on
numerical calculations. However, since the used ap-
proach provides the possibility to artificially switch
on and off the mode couplings, we were able identify
the primary sources driving the time evolution of the
perturbations in different wavelength and KK mass
ranges. This has allowed us to understand many of the
features observed in the numerical results on analytical
grounds.
On the other hand, it is fair to say that most of the
presented results rely on the low energy approach, i.e. on
the approximation of the junction condition (generalized
Neumann boundary condition) by a Neumann boundary
condition. Even though we have given arguments for
the goodness of this approximation, it has eventually
to be confirmed by calculations which take the exact
boundary condition into account. This is the subject of
future work.
Acknowledgment
We thank Cyril Cartier who participated in the early
stages of this work and Kazuya Koyama and David Lan-
glois for discussions. We are grateful for the use of the
’Myrinet’-cluster of Geneva University on which most of
the quite intensive numerical computations have been
performed. This work is supported by the Swiss National
Science Foundation.
APPENDIX A: VARIATION OF THE ACTION
Let us consider the variation of the action (2.27) with
respect to h•. It is sufficient to study the action for a
fixed wave number k and polarization •
Sh•(k) =
yb(t)
|∂th•|2 − |∂yh•|2 − k2|h•|2
and we omit the normalization factor L3/κ5 as well as
the factor two related to Z2 symmetry. The variation of
(A1) reads
δSh•(k) =
yb(t)
(∂th•)(∂tδh
•) (A2)
−(∂yh•)(∂yδh∗•)− k2h•δh∗•
+ h.c. .
Here, T denotes a time interval within the variation is
performed and it is assumed in the following that the
variation vanishes at the boundaries of the time interval
T . Performing partial integrations and demanding that
the variation of the action vanishes leads to
0 = (A3)
yb(t)
− ∂2t h• + y3
− k2h•
[(v∂t + ∂y)h•] δh
•|yb(t) −
(∂yh•)δh
with v = dyb(t)/dt. The first term in curly brackets is the
wave operator (2.24). In order for h• to satisfy the free
wave equation (perturbation equation) (2.24) the term
in curly brackets in the second integral has to vanish.
Allowing for an evolution of h• on the branes, i.e. in
general δh•|brane 6= 0, enforces the boundary conditions
(v∂t + ∂y)h•|yb(t) = 0 and ∂yh•|ys = 0 , (A4)
hence, the junction condition (2.26). Consequently, any
other boundary conditions than (A4) are not compati-
ble with the free perturbation equation (2.24) under the
influence of a moving brane (provided δh• 6= 0 at the
branes).
APPENDIX B: COUPLING MATRICES
The use of several identities of Bessel functions leads
M00 = ŷb
y2s − y2b
, (B1)
M0j = 0 , (B2)
Mi0 =
φ0 = ŷb
y2s − y2b
, (B3)
Mii = m̂i , (B4)
Mij = M
ij +M
ij (B5)
MAij = (ŷb + m̂i)yb
2m2iNiNj
m2j −m2i
× (B6)
× [ys C2(mjys)J1(miys)− yb C2(mjyb)J1(miyb)]
where
J1(mi y) = [J2(miyb)Y1(miy)− Y2(miyb)J1(miy)]
MNij = NiNjmim̂i
dyy2C1(miy)C2(mjy). (B8)
This integral has to be solved numerically. Note that,
because of the boundary conditions, one has the identity
dyy2C1(miy)C2(mjy) = −
dyy2C1(miy)C0(mjy).
Furthermore, one can simplify
J1(mi yb) =
πmiyb
, J1(mi ys) =
πmiyb
Y1(miys)
Y1(miyb)
(B10)
where the limiting value has to be taken for the last term
whenever Y1(miyb) = Y1(miys) = 0.
APPENDIX C: ON POWER SPECTRUM AND
ENERGY DENSITY CALCULATION
1. Instantaneous vacuum
In Section III the in - out state approach to particle
creation has been presented. The definitions of the in
- and out- vacuum states Eq. (3.9) are unique and the
particle concept is well defined and meaningful.
If we interpret tout as a continuous time variable t,
we can write the Bogoliubov transformation Eq. (3.24)
âα,k,•(t) =
Aβα,k(t)âinβ,k,• + B∗βα,k(t)â
β,−k,•
where at any time we have introduced a set of operators
{âα,k•(t), â†α,k,•(t)}. Vacuum states defined at any time
can be associated with these operators via
âα,k,•(t)|0, t〉 = 0 ∀ α,k • . (C2)
Similar to Eq. (3.11) a ”particle number” can be intro-
duced through
Nα,k(t) =
〈0, in|â†α,k•(t)âα,k,•(t)|0, in〉
|Bβα,k(t)|2 . (C3)
We shall denote |0, t〉 as the instantaneous vacuum state
and the quantity Nα,k(t) as instantaneous particle num-
ber 13. However, even if we call it ”particle number” and
plot it in section V for illustrative reasons, we consider
only the particle definitions for the initial and final state
(asymptotic regions) outlined in section III as physically
meaningful.
2. Power spectrum
In order to calculate the power spectrum Eq. (4.7) we
need to evaluate the expectation value
(t, yb,k)ĥ
(t, yb,k
′)〉in = (C4)
φα(t, yb)φα′ (t, yb)〈q̂α,k,•(t)q̂†α′,k′,•(t)〉in
where we have introduced the shortcut 〈...〉in =
〈0, in|...|0, in〉. Using the expansion (3.15) of q̂α′,k′,•(t)
in initial state operators and complex functions ǫ
α,k(t)
one finds
〈q̂α,k,•(t)q̂†α′,k′,•(t)〉in =
α,k(t) ǫ
α′,k (t)
2ωinβ,k
δ(3)(k− k′).
From the initial conditions (3.21) it follows that the sum
in (C4) diverges at t = tin. This divergence is related to
the usual normal ordering problem and can be removed
by a subtraction scheme. However, in order to obtain a
well defined power spectrum at all times, it is not suffi-
cient just to subtract the term (1/2)(δαα′/ω
α,k)δ
(3)(k −
′) which corresponds to 〈q̂α,k,•(tin)q̂†α′,k′,•(tin)〉in in the
above expression. In order to identify all terms contained
in the power spectrum we use the instantaneous particle
concept which allows us to treat the Bogoliubov coeffi-
cients (3.25) and (3.26) as continuous functions of time.
First we express the complex functions ǫ
α,k in (C5) in
terms of Aγα,k(t) and Bγα,k(t). This is of course equiv-
alent to calculating the expectation value (C5) using
[cf. Eq.(3.7)]
q̂α,k,•(t) =
2ωα,k(t)
âα,k,•(t)Θα,k(t)
α,−k,•(t)Θ
α,k(t)
and the Bogoliubov transformation Eq. (C1). The result
consists of terms involving the Bogoliubov coefficients
and the factor (1/2)(δαα′/ωα,k(t))δ
(3)(k − k′), leading
potentially to a divergence at all times. This term cor-
responds to 〈0, t|q̂α,k,•(t)q̂†α′,k′,•(t)|0, t〉, and is related to
13 It could be interpreted as the number of particles which would
have been created if the motion of the boundary (the brane)
stops at time t.
the normal ordering problem (zero-point energy) with re-
spect to the instantaneous vacuum state |0, t〉. It can be
removed by the subtraction scheme
〈q̂α,k,•(t)q̂†α′,k′,•(t)〉in,phys (C7)
= 〈q̂α,k,•(t)q̂†α′,k′,•(t)〉in − 〈0, t|q̂α,k,•(t)q̂
α′,k′,•(t)|0, t〉
where we use the subscript “phys” to denote the physi-
cally meaningful expectation value.
Inserting this expectation value into (C4), and using
Eq. (4.2), we find
〈ĥ•(t, yb,k)ĥ•(t, yb,k′)〉in (C8)
Rα,k(t)Y2α(a)δ(3)(k− k′)
with Rα,k(t) defined in Eq. (4.9). The function ONα,k
appearing in Eq. (4.9) is explicitely given by
ONα,k = 2ℜ
Θ2α,kAβα,kB∗βα,k +Θα,k
α′ 6=α
ωα′,k
Yα′ (a)
Yα(a)
Θ∗α′,kB∗βαBβα′ +Θα′,kAβαB∗βα′
and Oǫα,k appearing in Eq. (4.10) reads
Oǫα,k =
β,α′ 6=α
Yα′(a)
Yα(a)
ωinβ,k
. (C10)
3. Energy density
In order to calculate the energy density we need to
evaluate the expectation value 〈 ˙̂hij(t,x, yb) ˙̂hij(t,x, yb)〉in.
Using (2.22) and the relation e•ij(−k) = (e•ij(k))∗ we ob-
〈 ˙̂hij(t,x, yb) ˙̂hij(t,x, yb)〉in =
(2π)3/2
(2π)3/2
(C11)
× 〈 ˙̂h
(t, yb,k)
′(t, yb,k
′)〉inei(k−k
′)·xe•ij(k)
′ ij(k′)
By means of the expansion (3.17) the expectation value
〈 ˙̂h
(t, yb,k)
′(t, yb,k
′)〉in becomes
〈 ˙̂h
(t, yb,k)
′(t, yb,k
′)〉in (C12)
〈p̂α,k,•(t)p̂†α′,k′,•′(t)〉inφα(t, yb)φα′ (t, yb).
From the definition of p̂α,k,•(t) in Eq. (3.18) it is clear
that this expectation value will in general contain terms
proportional to the coupling matrix and its square when
expressed in terms of ǫ
α,k. However, we are interested in
the expectation value at late times only when the brane
moves very slowly such that the mode couplings go to
zero and a physical meaningful particle definition can be
given. In this case we can set
〈p̂α,k,•(t)p̂†α′,k′,•′(t)〉in =
˙̂qα,k,•(t) ˙̂q
α′,k′,•′(t)
.(C13)
Calculating this expectation value by using Eq. (3.15)
leads to an expression which, as for the power spec-
trum calculation before, has a divergent part related to
the zero-point energy of the instantaneous vacuum state
(normal ordering problem). We remove this part by a
subtraction scheme similar to Eq (C7). The final result
reads
〈 ˙̂qα,k,•(t) ˙̂q†α′,k′,•′(t)〉in,phys (C14)
α,k(t)ǫ̇
α′,k′(t)√
ωinβ,kω
− ωα,k(t)δαα′
′δ(3)(k− k′).
Inserting this result into Eq. (C12), splitting the summa-
tions in sums over α = α′ and α 6= α′ and neglecting the
oscillating α 6= α′ contributions (averaging over several
oscillations), leads to
〈 ˙̂h
(t, yb,k)
′(t, yb,k
′)〉in (C15)
Kα,k(t)Y2α(a)δ••′δ(3)(k− k′)
where the function Kα,k(t) is given by
Kα,k(t) =
|ǫ̇(β)α,k(t)|2
ωinβ,k
− ωα,k(t) = ωα,k(t)Nα,k(t) ,
(C16)
and we have made use of Eq. (4.2). The relation be-
tween
β |ǫ̇
α,k(t)|2/ωinβ,k and the number of created par-
ticles can easily be established. Using this expression in
Eq. (C11) leads eventually to
〈 ˙̂hij(t,x, yb) ˙̂hij(t,x, yb)〉in (C17)
(2π)3
Kα,k(t)Y2α(a)
where we have used that the polarization tensors satisfy
e•ij(k)
e• ij(k)
= 2. (C18)
The final expression for the energy density Eq. (4.18) is
then obtained by exploiting that κ5/L = κ4.
APPENDIX D: NUMERICS
In order to calculate the number of produced gravitons
the system of coupled differential equations (3.34) and
(3.35) is solved numerically. The complex functions ξ
α,k are decomposed into their real and imaginary parts:
α,k = u
α,k + iv
α,k , η
α,k = x
α,k + iy
α,k. (D1)
The system of coupled differential equations can then be
written in the form (cf. Eq. (A2) of [16])
k (t) = Wk(t)X
k (t) (D2)
where
0,k ...u
nmax,k
0,k ...x
nmax,k
0,k ...v
nmax,k
0,k ...y
nmax,k
The matrix Wk(t) is given by Eq. (A4) of [16] but
here indices start at zero. The number of produced
gravitons can be calculated directly from the solutions
to this system using Eqs. (3.28) and (3.32). Note
that for a given truncation parameter nmax the above
system of size 4(nmax + 1) × 4(nmax + 1) has to be
solved nmax + 1 - times, each time with different initial
conditions (3.38).
The main difficulty in the numerical simulations is that
most of the entries of the matrix Wk(t) [Eq. (A4) of
[16]] are not known analytically. This is due to the fact
that Eq. (2.40) which determines the time-dependent
KK masses mi(t) does not have an (exact) analytical
solution. Only the 00-component of the coupling matrix
Mαβ is known analytically. We therefore have to deter-
mine the time-dependent KK-spectrum {mi(t)}nmaxi=1 by
solving Eq. (2.40) numerically. In addition, also the part
MNij [Eq. (B8)] has to be calculated numerically since
the integral over the particular combination of Bessel
functions can not be found analytically.
We numerically evaluate the KK-spectrum and the
integral MNij for discrete time-values ti and use spline
routines to assemble Wk(t). The system (D2) can
then be solved using standard routines. We chose the
distribution of the ti’s in a non-uniform way. A more
dense mesh close to the bounce and a less dense mesh at
early and late times. The independence of the numerical
results on the distribution of the ti’s is checked. In order
to implement the bounce as realistic as possible, we do
not spline the KK-spectrum very close to the bounce
but re-calculate it numerically at every time t needed in
the differential equation solver. This minimizes possible
artificial effects caused by using a spline in the direct
vicinity of the bounce. The same was done for MNij but
we found that splining MNij when propagating through
the bounce does not affect the numerical results.
Routines provided by the GNU Scientific Library (GSL)
[59] have been employed. Different routines for root
finding and integration have been compared. The code
has been parallelized (MPI) in order to deal with the
1 10 100
KK-mass m
1 10 100
KK-mass m
FIG. 33: Comparison of the final KK-graviton spectrum
n,k,• with the expression dn,k(tout) describing to what accu-
racy the diagonal part of the Bogoliubov relation (D4) is sat-
isfied. Left panel: ys = 3, k = 0.1, vb = 0.03 and nmax = 100
[cf. Fig. 25]. Right panel: ys = 3, k = 30, vb = 0.1 and
nmax = 100 [cf. Fig. 26].
intensive numerical computations.
The accuracy of the numerical simulations can be
assessed by checking the validity of the Bogoliubov
relations
Aβα,k(t)A∗βγ,k(t)− B∗βα,k(t)Bβγ,k(t)
= δαγ (D4)
Aβα,k(t)B∗βγ,k(t)− B∗βα,k(t)Aβγ,k(t)
= 0. (D5)
In the following we demonstrate the accuracy of the nu-
merical simulations by considering the diagonal part of
(D4). The deviation of the quantity
dα,k(t) = 1−
|Aβα,k(t)|2 − |Bβα,k(t)|2
from zero gives a measure for the accuracy of the
numerical result. We consider this quantity at final
times tout and compare it with the corresponding final
particle spectrum. In Fig. 33 we compare the final KK-
graviton spectrum N outn,k,• with the expression dn,k(tout)
for two different cases. This shows that the accuracy
of the numerical simulations is very good. Even if the
expectation value for the particle number is only of order
10−7 to 10−6, the deviation of dn,k(tout) from zero is at
least one order of magnitude smaller. This demonstrates
the reliability of our numerical simulations and that we
can trust the numerical results presented in this work.
APPENDIX E: DYNAMICAL CASIMIR EFFECT
FOR A UNIFORM MOTION
We consider a real massless scalar field on a time-
dependent interval [0, y(t)]. The time evolution of its
mode functions are described by a system of differential
equations like (2.49) where the specific form of Mαβ de-
pends on the particular boundary condition the field is
subject to. In [15, 17] a method has been introduced to
study particle creation due to the motion of the boundary
y(t) (i.e. the dynamical Casimir effect) fully numerically.
We refer the reader to these publications for further de-
tails.
If the boundary undergoes a uniform motion y(t) = 1+vt
(in units of some reference length) it was shown in [57, 58]
that the total number of created scalar particles diverges,
caused by the discontinuities in the velocity at the begin-
ning and the end of the motion. In particular, for Dirich-
let boundary conditions (no zero mode), it was found in
[58] that 〈0, in|N̂outn |0, in〉 ∝ v2/n if n > 6 and v ≪ 1.
Thereby in- and out- vacuum states are defined like in
the present work and the frequency of a mode function
is given by ωn = π n , n = 1, 2, ... . In Figure 34 we
show spectra of created scalar particles obtained numer-
ically with the method of [17] for this particular case.
One observes that, as for our bouncing motion, the con-
vergence is very slow since the discontinuities in the ve-
locity lead to the excitation of arbitrary high frequency
modes. Nevertheless, it is evident from Fig. 34 that the
numerically calculated spectra approach the analytical
prediction. The linear motion discussed here and the
brane-motion (2.18) are very similar with respect to the
discontinuities in the velocity. In both cases, the total
discontinuous change of the velocity is 2v and 2vb, re-
spectively. The resulting divergence of the acceleration
is responsible for the excitation and therefore creation of
particles of all frequency modes. Consequently we ex-
pect the same ∝ v2/ωn behavior for the bouncing mo-
tion (2.18). Indeed, comparing the convergence behavior
of the final graviton spectrum for vb = 0.01 shown in
Fig. 25 with the one of the scalar particle spectrum for
v = 0.01 depicted in Fig. 34 shows that both are very
similar.
[1] J. Polchinski, String theory. An introduction to the
bosonic string, Vol. I (Cambridge University Press, Cam-
bridge, UK, 1998).
[2] J. Polchinski, String theory. Superstring theory and be-
yond, Vol. II (Cambridge University Press, Cambridge,
UK, 1998).
[3] J. Polchinski, Phys. Rev. Lett. 75, 4724 (1995), hep-
th/9510017.
1 2 3 10 20 30 100 200
frequency mode n
v=0.1
v=0.01
v=0.05
v=0.02
FIG. 34: Spectra of massless scalar particles produced under
the influence of the uniform motion y(t) = 1+vt for velocities
v = 0.01, 0.02, 0.05 and 0.1. The numerical results are com-
pared to the expression Nn = 0.035v
2/n (dashed lines) which
agrees with the analytical prediction Nn ∝ v
[4] N. Arkani-Hamed, S. Dimopoulos, and G. R. Dvali, Phys.
Lett. B429, 263 (1998), hep-ph/9803315.
[5] N. Arkani-Hamed, S. Dimopoulos, and G. R. Dvali, Phys.
Rev. D 59, 086004 (1999), hep-ph/9807344.
[6] L. Randall and R. Sundrum, Phys. Rev. Lett. 83, 3370
(1999), hep-ph/9905221.
[7] L. Randall and R. Sundrum, Phys. Rev. Lett. 83, 4690
(1999), hep-th/9906064.
[8] C. Lanczos, Ann. Phys. (Leipzig) 74, 518 (1924).
[9] N. Sen, Ann. Phys. (Leipzig) 73, 365 (1924).
[10] G. Darmois, Mémorial des sciences mathématiques, fas-
cicule 25 chap. 5 (Gauthier-Villars, Paris, 1927).
[11] W. Israel, Nuovo Cimento B44, 1 (1966).
[12] P. Kraus, JHEP 12, 011 (1999), hep-th/9910149.
[13] P. Binetruy, C. Deffayet, U. Ellwanger, and D. Langlois,
Phys. Lett. B477, 285 (2000), hep-th/9910219.
[14] M. Maggiore. Phys. Rept. 331, 283 (2000), gr-
qc/9909001.
[15] M. Ruser, J. Opt. B: Quantum Semiclass. Opt. 7, S100
(2005), quant-ph/0408142.
[16] M. Ruser, Phys. Rev. A 73, 043811 (2006), quant-
ph/0509030.
[17] M. Ruser, J. Phys. A 39, 6711 (2006), quant-ph/0603097.
[18] R. A. Battye, C. van de Bruck, and A. Mennim, Phys.
Rev. D 69, 064040 (2004), hep-th/0308134.
[19] R. A. Battye and A. Mennim, Phys. Rev. D 70, 124008
(2004), hep-th/0408101.
[20] R. Easther, D. Langlois, R. Maartens, and D. Wands, J.
Cosmol. Astropart. Phys. 10 (2003) 014, hep-th/0308078.
[21] T. Kobayashi and T. Tanaka, J. Cosmol. Astropart.
Phys. 10 (2004) 015.
[22] D. S. Gorbunov, V. A. Rubakov, and S. M. Sibiryakov,
JHEP 10, 015 (2001), hep-th/0108017.
[23] T. Kobayashi, H. Kudoh, and T. Tanaka, Phys. Rev. D
68, 044025 (2003), gr-qc/0305006.
[24] R. Maartens, D. Wands, B. A. Bassett, and
I. P. C. Heard, Phys. Rev. D 62, 041301 (2000), hep-
ph/9912464.
[25] D. Langlois, R. Maartens, and D. Wands, Phys. Lett. B
489, 259 (2000), hep-th/0006007.
[26] A. V. Frolov and L. Kofman (2002), hep-th/0209133.
[27] T. Hiramatsu, K. Koyama, and A. Taruya, Phys. Lett.
B 578, 269 (2004), hep-th/0308072.
[28] T. Hiramatsu, K. Koyama, and A. Taruya, Phys. Lett.
B 609, 133 (2005), hep-th/0410247.
[29] T. Hiramatsu, Phys. Rev. D 73, 084008 (2006), hep-
th/0601105.
[30] K. Koyama, J. Cosmol. Astropart. Phys. 09, 10, (2004)
astro-ph/0407263.
[31] K. Ichiki and K. Nakamura, Phys. Rev. D 70, 064017
(2004), hep-th/0310282.
[32] K. Ichiki and K. Nakamura, astro-ph/0406606 (2004).
[33] T. Kobayashi and T. Tanaka, Phys. Rev. D 71, 124028
(2005), hep-th/0505065.
[34] T. Kobayashi and T. Tanaka, Phys. Rev. D 73, 044005
(2006), hep-th/0511186.
[35] T. Kobayashi and T. Tanaka Phys. Rev. D 73, 124031
(2006).
[36] S. Seahra, Phys. Rev. D 74, 044010 (2006), hep-
th/0602194.
[37] C. Cartier, R. Durrer, M. Ruser, Phys. Rev.D72, 104018
(2005), hep-th/0510155.
[38] J. Khoury, B. A. Ovrut, P.J. Steinhardt, and N. Turok,
Phys. Rev. D 64 123522 (2001), hep-th/0103239.
[39] R. Kallosh, L. Kovman and A. Linde, Phys. Rev. D 64
123523 (2001), hep-th/0104073.
[40] A. Neronov, J. High Energy Phys. 11, 007 (2001), hep-
th/0109090.
[41] P.J. Steinhardt, and N. Turok, Phys. Rev. D 65 126003
(2002), hep-th/0111098.
[42] J. Khoury, B. A. Ovrut, N. Seiberg, P.J. Steinhardt
and N. Turok, Phys. Rev. D 65 086007 (2002), hep-
th/0108187.
[43] J. Khoury, B. A. Ovrut, P.J. Steinhardt and N. Turok,
Phys. Rev. D 66 046005 (2002), hep-th/0109050.
[44] J. Khoury, P.J. Steinhardt and N. Turok, Phys. Rev.
Lett. 91 161301 (2003), astro-ph/0302012.
[45] J. Khoury, P.J. Steinhardt and N. Turok, Phys. Rev.
Lett. 92 031302 (2004), hep-th/0307132.
[46] A. Tolley, N. Turok, and P.J. Steinhardt, Phys. Rev. D
69 106005 (2004), hep-th/0306109.
[47] R. Durrer and M. Ruser, Phys. Rev. Lett. 99, 071601
(2007), arXiv:0704.0756.
[48] C. Cartier and R. Durrer, Phys. Rev. D71, 064022
(2005), hep-th/0409287.
[49] R. Maartens, Living Rev. Rel. 7, 7 (2004), gr-qc/0312059.
[50] R. Durrer, Braneworlds, at the XI Brazilian School of
Cosmology and Gravitation, Edt. M. Novello and S.E.
Perez Bergliaffa, AIP Conference Proceedings 782 (2005),
hep-th/0507006.
[51] S. W. Hawking, T. Hertog, and H. S. Reall, Phys. Rev.
D62, 043501 (2000), hep-th/0003052.
[52] S. W. Hawking, T. Hertog, and H. S. Reall, Phys. Rev.
D63, 083504 (2001), hep-th/0010232.
[53] M. A. Pinsky, Partial Differential Equations and
Boundary-Value Problems with Applications, McGraw-
Hill, inc. New York (1991).
[54] M. Crocce, D.A.R. Dalvit and F.D. Mazzitelli, Phys. Rev.
A66, 033811 (2002), quant-ph/0205104.
[55] M. Abramowitz and I. Stegun, Handbook of Mathematical
Functions, 9th Edition (Dover Publications, NY, 1970).
[56] N. Straumann, Ann. Phys. (Leipzig), Volume 15, Issue
10-11 , 701 (2006), hep-ph/0505249.
[57] G. T. Moore, J. Math. Phys. 11, 2679 (1970).
[58] M. Castagnino and R. Ferraro, Ann. Phys. 154, 1 (1984).
[59] http://www.gnu.org/software/gsl
ABSTRACT
  We consider a two-brane system in a five-dimensional anti-de Sitter
spacetime. We study particle creation due to the motion of the physical brane
which first approaches the second static brane (contraction) and then recedes
from it(expansion). The spectrum and the energy density of the generated
gravitons are calculated. We show that the massless gravitons have a blue
spectrum and that their energy density satisfies the nucleosynthesis bound with
very mild constraints on the parameters. We also show that the Kaluza-Klein
modes cannot provide the dark matter in an anti-de-Sitter braneworld. However,
for natural choices of parameters, backreaction from the Kaluza-Klein gravitons
may well become important. The main findings of this work have been published
in the form of a Letter [R. Durrer and M. Ruser, Phys. Rev. Lett. 99, 071601
(2007), arXiv:0704.0756].

<|endoftext|><|startoftext|>
Introduction
	Spectral analysis
	The sample
	Spectral fits
	Errors from the fit and error propagation
	Results
	Bursts detected also by other instruments
	Peak energy vs spectral index
	Correlation between Epk and E,iso
	Summary and Conclusions
	The observed spectra
ABSTRACT
  We study the spectral and energetics properties of 47 long-duration gamma-ray
bursts (GRBs) with known redshift, all of them detected by the Swift satellite.
Due to the narrow energy range (15-150 keV) of the Swift-BAT detector, the
spectral fitting is reliable only for fitting models with 2 or 3 parameters. As
high uncertainty and correlation among the errors is expected, a careful
analysis of the errors is necessary. We fit both the power law (PL, 2
parameters) and cut--off power law (CPL, 3 parameters) models to the
time-integrated spectra of the 47 bursts, and present the corresponding
parameters, their uncertainties, and the correlations among the uncertainties.
The CPL model is reliable only for 29 bursts for which we estimate the nuf_nu
peak energy Epk. For these GRBs, we calculate the energy fluence and the rest-
frame isotropic-equivalent radiated energy, Eiso, as well as the propagated
uncertainties and correlations among them. We explore the distribution of our
homogeneous sample of GRBs on the rest-frame diagram E'pk vs Eiso. We confirm a
significant correlation between these two quantities (the "Amati" relation) and
we verify that, within the uncertainty limits, no outliers are present. We also
fit the spectra to a Band model with the high energy power law index frozen to
-2.3, obtaining a rather good agreement with the "Amati" relation of non-Swift
GRBs.

<|endoftext|><|startoftext|>
Accepted for publication in the Astrophysical Journal, July 16, 2007
Preprint typeset using LATEX style emulateapj v. 08/22/09
THE RELATIONSHIP BETWEEN MOLECULAR GAS TRACERS AND KENNICUTT-SCHMIDT LAWS
Mark R. Krumholz
and Todd A. Thompson
Department of Astrophysical Sciences, Princeton University, Princeton, NJ 08544
Accepted for publication in the Astrophysical Journal, July 16, 2007
ABSTRACT
We provide a model for how Kennicutt-Schmidt (KS) laws, which describe the correlation between
star formation rate and gas surface or volume density, depend on the molecular line chosen to trace
the gas. We show that, for lines that can be excited at low temperatures, the KS law depends on how
the line critical density compares to the median density in a galaxy’s star-forming molecular clouds.
High critical density lines trace regions with similar physical properties across galaxy types, and this
produces a linear correlation between line luminosity and star formation rate. Low critical density
lines probe regions whose properties vary across galaxies, leading to a star formation rate that varies
superlinearly with line luminosity. We show that a simple model in which molecular clouds are treated
as isothermal and homogenous can quantitatively reproduce the observed correlations between galactic
luminosities in far infrared and in the CO(1 → 0) and HCN(1 → 0) lines, and naturally explains why
these correlations have different slopes. We predict that IR-line luminosity correlations should change
slope for galaxies in which the median density is close to the line critical density. This prediction
may be tested by observations of lines such as HCO+(1 → 0) with intermediate critical densities, or
by HCN(1 → 0) observations of intensely star-forming high redshift galaxies with very high densities.
Recent observations by Gao et al. hint at just such a change in slope. We argue that deviations from
linearity in the HCN(1 → 0)−IR correlation at high luminosity are consistent with the assumption of
a constant star formation efficiency.
Subject headings: ISM: clouds — ISM: molecules — stars: formation — galaxies: ISM — radio lines:
1. INTRODUCTION
Schmidt (1959, 1963) first proposed that the rate at
which a gas forms stars might follow a simple power
law correlation of the form ρ̇∗ ∝ ρNg , where ρ̇∗ is the
star formation rate per unit volume, ρg is the gas den-
sity, and N is generally taken to be in the range 1 − 2.
In the decades since, observations have revealed two
strong correlations that appear to be evidence for this
hypothesis. First, galaxy surveys reveal that the in-
frared luminosity of a galaxy, which traces the star for-
mation rate, varies with its luminosity in the CO(1 → 0)
line, which traces the total mass of molecular gas, as
LFIR ∝ L1.4−1.6CO (Gao & Solomon 2004a,b; Greve et al.
2005; Riechers et al. 2006a). Kennicutt (1998a,b) iden-
tified the closely-related correlation between gas surface
density Σg and star formation rate surface density Σ̇∗,
Σ̇∗ ∝ Σ1.4±0.15g , a relation that has come to be known
as the Kennicutt Law. Since over the bulk of the dy-
namic range of Kennicutt’s data galaxies are predomi-
nantly molecular, this is effectively a correlation between
molecular gas, as traced by CO(1 → 0) line emission,
and star formation. Spatially resolved observations of
galaxies confirm that, at least for molecule-rich galax-
ies where resolved CO(1 → 0) observations are possible,
star formation is more closely coupled with gas traced by
CO(1 → 0) than with atomic gas (Wong & Blitz 2002;
Heyer et al. 2004; Komugi et al. 2005; Kennicutt et al.
2007)
Electronic address: krumholz@astro.princeton.edu, thomp@astro.princeton.edu
1 Hubble Fellow
2 Lyman Spitzer Jr. Fellow
Second, Gao & Solomon (2004a,b) find that there is a
strong correlation between the IR luminosity of galax-
ies and emission in the HCN(1 → 0) line, which mea-
sures the mass at densities significantly greater than
that probed by CO(1 → 0). However, they find that
their correlation, which covers nearly three decades in
total galactic star formation rate, is linear: LFIR ∝
LHCN. Wu et al. (2005) show that this correlation ex-
tends down to individual star-forming clumps of gas in
the Milky Way, provided that their infrared luminosities
are >∼ 104.5 L⊙. Interestingly, however, Gao et al. (2007)
find a deviation from linearity in the IR-HCN correlation
for a sample of intensely star-forming high redshift galax-
ies. These sources show small but significant excesses of
infrared emission for their observed HCN emission.
The difference in power law indices between the LFIR−
LCO and LFIR −LHCN correlations is statistically signif-
icant, and, on its face, puzzling. An index near N = 1.5
seems natural if one supposes that a roughly constant
fraction of the gas present in molecular clouds will be
converted into stars each free-fall time. In this case one
expects ρ̇∗ ∝ ρ1.5g (Madore 1977; Elmegreen 1994). If
gas scale heights do not vary strongly from galaxy to
galaxy, this implies Σ̇∗ ∝ Σ1.5g as well, which is consis-
tent with the observed Kennicutt law. More generally,
since the dynamical timescale in a marginally Toomre-
stable (Q ≈ 1; see Martin & Kennicutt 2001) galactic
disk is of order Ω−1 ∝ (Gρg)−1/2, where Ω is the angular
frequency of the disk, an index close to N = 1.5 is ex-
pected if star formation is regulated by any phenomenon
that converts a fixed fraction of the gas into stars on this
time scale (Elmegreen 2002).
http://arxiv.org/abs/0704.0792v2
mailto:krumholz@astro.princeton.edu, thomp@astro.princeton.edu
On the other hand, Wu et al. (2005) suggest a simple
interpretation of the linear IR-HCN correlation. They ar-
gue that the individual HCN-emitting molecular clumps
that they identify in the Milky Way represent a funda-
mental unit of star formation. The linear correlation
between star formation rate and HCN luminosity across
galaxies arises because a measurement of the HCN lu-
minosity for a galaxy simply counts the number of such
structures present within it, each of which forms stars at
some roughly fixed rate regardless of its galactic environ-
ment. However, in this interpretation it is unclear why
the structures traced by HCN(1 → 0) emission should
form stars at the same rate in any galaxy. After all, one
could equally well argue that molecular clouds traced by
CO(1 → 0) are fundamental units of star formation, but
the non-linear IR-CO correlation clearly shows that these
objects do not form stars at a fixed rate per unit mass.
Moreover, the evidence presented by Gao et al. (2007)
that the linear IR-HCN correlation varies in extremely
luminous high redshift galaxies suggests that the rela-
tionship between HCN emission and star-formation may
be somewhat more complex.
In this paper we attempt to explain the origin of the
difference in slope between the CO and HCN correla-
tions with star formation rate, and more generally to
give a theoretical framework for understanding how cor-
relations between star formation rate and line luminosity,
which we generically refer to as Kennicutt-Schmidt (KS)
laws, depend on the tracer used to define them. Our cen-
tral argument is conceptually quite simple, and in some
sense represents a combination of the intuitive arguments
for CO and HCN given above.
Consider an observation of a galaxy in a molecular
tracer with critical density ncrit, which essentially mea-
sures the mass of gas at densities of ncrit or more, i.e. the
gas that is dense enough for that particular transition to
be excited. In galaxies where the median density of the
molecular gas is significantly larger than ncrit, this means
that the observation will detect the majority of the gas,
and the bulk of the emission will come from gas whose
density is near the median density. Since the gas den-
sity will vary from galaxy to galaxy, the star formation
rate per unit gas mass will vary as roughly ρ1.5g , with one
factor of ρg coming from the amount of gas available for
star formation, and an additional factor of ρ0.5g coming
from the dependence of the free-fall or dynamical time
on the density.
On the other hand, in galaxies where the median gas
density is small compared to the critical density for the
chosen transition, observations will pick out only high
density peaks. Since the density in these peaks is set by
ncrit, and not by the conditions in the galaxy, these peaks
are at essentially the same density in any galaxy where
they are observed, and the corresponding free-fall times
in these regions are constant as well. As a result, the star
formation rate per unit mass of gas traced by that line
is approximately the same in every galaxy, because the
corresponding free-fall time is the same in every galaxy.
In the rest of this paper, we give a quantitative version
of this intuitive argument, and then discuss its conse-
quences. In § 2 we develop a simple formalism to com-
pute the star formation rate and the molecular line lu-
minosity of galaxies, and in § 3 we use this formalism to
predict the correlation between star formation rate and
luminosity. We show that our predictions provide a very
good fit for a variety of observations, and make predic-
tions for future observations. We discuss the implications
of our work and its limitations in § 4, and summarize our
conclusions is § 5.
2. STAR FORMATION RATES AND LINE LUMINOSITIES
2.1. Cloud Properties
Consider a galaxy in which the star-forming molecu-
lar clouds have a volume-averaged mean molecular hy-
drogen number density n = ρg/µH2 , where ρg is the
volume-averaged mass density of the molecular clouds in
the galaxy and µH2 = 3.9×10−24 g is the mean mass per
hydrogen molecule for a gas of standard cosmic composi-
tion. Observations indicate that n varies by two to three
decades over the galaxies for which the Kennicutt and
Gao & Solomon correlations are measured, from n ≈ 50
cm−3 in normal spirals like the Milky Way (McKee
1999) up to n ≈ 104 cm−3 in the strongest starburst
systems in the local universe (e.g. Downes & Solomon
1998). There is strong evidence that densities in molec-
ular clouds follow a lognormal probability distribution
function (PDF; see reviews by Mac Low & Klessen 2004
and Elmegreen & Scalo 2004)
d lnx
(lnx− lnx)2
, (1)
where x = n/n is the molecular hydrogen number den-
sity n relative to the average density, σ is the width of
the lognormal, and lnx = −σ2/2. For this distribution
the median density is nmed = n exp(σ
2/2). Numerical
experiments show that for supersonic isothermal turbu-
lence σ2 ≈ ln
1 + 3M2/4
, where M is the 1D Mach
number of the turbulence (Nordlund & Padoan 1999;
Ostriker et al. 1999; Padoan & Nordlund 2002). Mach
numbers in star-forming molecular clouds range from
M ∼ 30 (McKee 1999) in normal spirals to M ∼ 100
in strong starbursts (Downes & Solomon 1998), imply-
ing that median densities in molecular clouds range from
∼ 103 cm−3 in normal spirals to ∼ 106 cm−3 in star-
bursts. Star forming clouds within a galaxy are approx-
imately isothermal, except very near strong sources of
stellar radiation, so we assume a fixed temperature T
for the clouds. Observationally, T ranges from roughly
10 K in normal spirals (McKee 1999) up to as much
about 50 K in strong starbursts (Downes & Solomon
1998; Gao & Solomon 2004b).
2.2. Star Formation Rates
First let us ask how quickly stars form in such a
medium. Krumholz & McKee (2005) give a model for
star formation regulated by supersonic turbulence in
which a population of molecular clouds of total mass Mcl
form stars at a rate Ṁ∗ = SFRffMcl/tff(n), where tff(n)
is the free-fall time evaluated at the mean density and
SFRff is a number of order 10
−2 that depends weakly on
M. We therefore estimate the star formation rate per
unit volume as a function of the mean density given by
ρ̇∗ ≈ SFRff
32Gµ3H2n
. (2)
Molecular Gas and Kennicutt-Schmidt Laws 3
We adopt the Krumholz & McKee result SFRff ≈
0.014(M/100)−0.32 for clouds with a fiducial virial ra-
tio of αvir = 1.3.
Alternately, Krumholz & Tan (2007) point out that
observed correlations between the star formation rate
and the luminosity in different density tracers imply that
over a 3− 4 decade range in density n,
Ṁ∗ ≈ 10−2
Mcl(> n)
tff(n)
, (3)
where Mcl(> n) is the mass of gas with a density of n
or higher, and Mcl = Mcl(> 0). For a given choice of
n this provides an alternative estimate of the star for-
mation rate which is purely empirical, and independent
of any particular theoretical model. However, the differ-
ence between the star formation rates predicted by (2)
and (3) is small. For gas with a lognormal PDF,
Mcl(> n) =
1 + erf
−2 lnx+ σ2
23/2σ
, (4)
and using this to evaluate equation (3) indicates that, for
Mach numbers in the observed range, the two prescrip-
tions (2) and (3) give about the same star formation rate
over a very broad range in x. For example, at M = 30
the two estimates agree to within a factor of 3 for den-
sities in the range 0.2 < x < 4 × 104. Given the scatter
inherent in observational estimates of the star formation
rate, a factor of 3 difference is not particularly signifi-
cant, so it matters little which prescription we adopt. In
practice, we will use equation (2).
2.3. Line Luminosities
Now we must compute the luminosity of molecular line
emission from the galaxy. Even for a cloud that is not
in local thermodynamic equilibrium (LTE), for optically
thin emission this calculation is straightforward. How-
ever, the molecular lines used most often in galaxy sur-
veys are generally optically thick. To handle the effect of
finite optical depth on molecule level populations and line
luminosities, we adopt an escape probability approxima-
tion and treat clouds as homogeneous spheres. This is
not fully consistent with our assumption that clouds have
lognormal density PDFs, since the escape probability for-
malism assumes a uniform level population throughout
the cloud, and the essence of our argument in this paper
turns on how the level population varies with density.
However, this approach gives us an approximate way of
incorporating the optical thickness of star-forming clouds
into our model, the only alternative to which for turbu-
lent media is full numerical simulation (e.g. Juvela et al.
2001). We therefore proceed by treating clouds as ho-
mogeneous in order to determine their escape probabili-
ties, and we then relax the assumption of homogeneity,
while keeping the escape probabilities fixed, in order to
determine level populations and cloud luminosities as a
function of density.
Consider a cloud of radius R in statistical equilibrium
but not necessarily in LTE. In the escape probability
approximation, the fraction fi of molecules of species S
in state i is given implicitly by the linear system
(nqji + βjiAji) fj =
(nqij + βijAij)
 fi (5)
fi=1, (6)
where qij is the collision rate for transitions from state
i to state j, Aij is the Einstein spontaneous emission
coefficient for this transition, βij is the cloud-averaged
escape probability for photons emitted in this transition,
the sums are over all quantum states, and we understand
that Aij = 0 for i ≤ j and qij = 0 for i = j.
Equations (5) and (6) allow us to compute the level
populations fi for given values of βij . To completely
specify the system, we must add an additional consis-
tency condition relating the values of βij to the level
populations. For a homogeneous spherical cloud, the es-
cape probability for a given line is related to the optical
depth from the center to the edge of the cloud τij by
(B. Draine, 2007, private communication)
βij ≈
1 + 0.5τij
, (7)
where τij is computed at the central frequency of the
line. In turn, the optical depth is related to the level
populations by
τij =
4(2π)3/2Mcs
nX(S)fjR
, (8)
where λij is the wavelength of transition i → j, gi and
gj are the statistical weights of states i and j, cs is
the isothermal sound speed of the gas, and X(S) is the
abundance of molecules of species S. Note that this
equation implicitly assumes that the cloud has a uni-
form Maxwellian velocity distribution with 1D disper-
sion Mcs, consistent with our treatment of the clouds
as homogeneous spheres. One additional complication is
that we do not directly know cloud radii for most exter-
nal galaxies, where observations cannot resolve individ-
ual molecular clouds. However, we often can diagnose
the optical depths of transitions by comparing line ratios
of molecular isotopomers of different abundances. We
therefore take τ10, the optical depth of the transition be-
tween the first excited state and the ground state, as
known. For a given level population this fixes the value
of R.
We solve equations (5)–(8) using Newton-Raphson it-
eration. In this procedure, we guess an initial set of es-
cape probabilities βij , and solve the linear system (5) and
(6) to find the corresponding initial level populations fi.
We then compute the optical depths τij from equation
(8). The guessed escape probabilities βij and the corre-
sponding optical depths τij generally will not satisfy the
consistency condition (7), so we then iterate over βij val-
ues using a Newton-Raphson approach, seeking βij for
which the level populations give optical depths τij such
that all elements of the matrix βij − 1/(1 + 0.5τij) are
equal to zero within some specified tolerance. We use
the LTE level populations and escape probabilities for
our initial guess, so that the iteration converges rapidly
when the system is close to LTE.
Once we have determined the escape probabilities βij ,
we compute the luminosity by holding the βij values fixed
but allowing the level populations to vary with density,
then integrating over the PDF. Thus, the total luminos-
TABLE 1
Model Parameters
Parameter Normal galaxy Intermediate Starburst Reference
T 10 20 50 1–4
M 30 50 80 1–4
X(CO) 2× 10−4 4× 10−4 8× 10−4 5
X(HCO+) 2× 10−9 4× 10−9 8× 10−9 6, 7
X(HCN) 1× 10−8 2× 10−8 4× 10−8 6–8
τCO(1→0) 10 20 40 9
τHCO+(1→0) 0.5 1.0 2.0 6, 7
τHCN(1→0) 0.5 1.0 2.0 6, 7
OPR 0.25 0.25 0.25 10
Note. — OPR = H2 ortho- to para-ratio. References:
1 – Solomon et al. (1987), 2 – Gao & Solomon (2004b), 3 –
Downes & Solomon (1998), 4 – Wu et al. (2005), 5 – Black (2000), 6 –
Nguyen et al. (1992), 7 – Wild et al. (1992), 8 – Lahuis & van Dishoeck
(2000), 9 – Combes (1991), 10 – Neufeld et al. (2006)
ity per unit volume in a particular line is
Lij = X(S)βijAijhνij
d lnx
d lnx, (9)
where νij is the line frequency, fi is an implicit function
of n given by the solution to equations (5) and (6), and
we assume that the abundance X(S) is independent of
n. The line luminosity per unit mass is Lij/(µH2n).
An IDL code that implements this calcu-
lation is available for public download from
http://www.astro.princeton.edu/∼krumholz/ astron-
omy.html.
3. CORRELATIONS AND KENNICUTT-SCHMIDT LAWS
3.1. Lines and Parameters
Using the formalism of § 2, we can now predict the
correlation between the star formation rate and the
luminosity of a galaxy in molecular lines. We make
these predictions for three representative molecular lines:
CO(1 → 0), HCO+(1 → 0), and HCN(1 → 0). For
the first and last of these transitions, there are exten-
sive observational surveys. We select HCO+(1 → 0)
in addition to these two because there is some obser-
vational data for it, and because its critical density of
ncrit = βHCO+4.6 × 104 cm−3 makes it intermediate
between CO(1 → 0), with ncrit = βCO560 cm−3, and
HCN(1 → 0), with ncrit = βHCN2.8 × 105 cm−3.3 Here
βS is the escape probability for the 1 → 0 transition of
species S. These critical densities are for T = 20 K. All
molecular data are taken from the Leiden Atomic and
Molecular Database4 (Schöier et al. 2005).
We make our calculations for three sets of fiducial pa-
rameters which we summarize in Table 1. The three sets
correspond roughly to typical conditions in normal disk
galaxies like the Milky Way, to starburst galaxies like
Arp 220, and to a case intermediate between the two. We
have selected parameters for each case to roughly model
the systematic variation of ISM parameters as one moves
3 Note that our critical density for HCN(1 → 0) is somewhat
larger than the value quoted by Gao & Solomon (2004a,b), proba-
bly because their calculation is based on somewhat different as-
sumptions about how to extrapolate from calculated rate coef-
ficients for HCN collisions with He to collisions with H2. See
Schöier et al. (2005) for details.
4 http://www.strw.leidenuniv.nl/∼moldata/
from normal disk galaxies to starbursts. Thus, we vary
the ISM temperature from 10− 50 K and the molecular
cloud Mach number from 30−80 as we move from Milky
Way-like molecular clouds to temperatures and Mach
numbers typical of starbursts (e.g. Downes & Solomon
1998). Similarly, starbursts, which preferentially occur
at galactic centers, have systematically larger metallici-
ties than galaxies like the Milky Way (e.g. Zaritsky et al.
1994; Yao et al. 2003; Netzer et al. 2005). To explore this
effect, we use abundances and 1 → 0 optical depths are
twice and four times as large for our intermediate and
starburst models, respectively, as for our normal galaxy
model.
3.2. Kennicutt-Schmidt Laws
We first plot, in Figure 1, the quantities L−1[dL(<
n)/d lnn] (solid lines) and M−1[dM(< n)/d lnn] (dot-
ted lines) as a function of density n for galaxies with
mean densities n = 102, 103, and 104 cm−3, for the
tracers CO(1 → 0), HCO+(1 → 0), and HCN(1 →
0), and for the Mach number and temperature corre-
sponding to our intermediate case in Table 1. Here
L(< n) and M(< n) are the luminosity and mass per
unit volume contributed by gas of density n or less,
i.e. L(< n) = X(S)βijAijhνij
∫ lnn
fin(dp/d lnn)d lnn,
M(< n) =
∫ lnn
µH2n(dp/d lnn)d lnn, L = L(< ∞), and
M = M(< ∞). Physically, L−1[dL(< n)/d lnn] and
M−1[dM(< n)/d lnn] represent the fractional contribu-
tion to the total line luminosity and the total mass that
comes from each unit interval in the logarithm of den-
sity. The plot shows what density range provides the
dominant contribution to the line luminosity in differ-
ent lines and for galaxies of differing mean densities, and
how the gas contributing light compares to the gas con-
tributing mass. Because the mass distribution is entirely
specified by n and M, the dotted lines are the same in
each of the three panels. Additionally, because of our
choice M = 50 (Table 1), the median density (the den-
sity corresponding to the peak in M−1[dM(< n)/d lnn])
is nmed ≈ 43n. In each panel, the critical density for
each molecule is identified by a vertical dashed line.
The top panel clearly shows that for the CO line, the
light and the mass track one another very closely, even
at the lowest densities. Thus, because nmed > ncrit, the
http://www.astro.princeton.edu/~krumholz/
Molecular Gas and Kennicutt-Schmidt Laws 5
Fig. 1.— Fractional contribution to the total luminosity
L−1[dL(< n)/d lnn] (solid lines) and mass M−1[dM(< n)/d lnn]
(dotted lines) versus density n for the lines CO(1 → 0) (top
panel), HCO+(1 → 0) (middle panel), and HCN(1 → 0)
(bottom panel). The three curves show the cases n = 102
cm−3, 103 cm−3, and 104 cm−3, from leftmost to rightmost.
We also show the critical density of each molecule, corrected
for radiative trapping (dashed vertical lines). These calcula-
tions use the parameters for the intermediate case listed in Table 1.
solid lines move in lock-step with the dashed lines as n
increases. In contrast, for HCN most of the luminosity
comes from densities near the critical density regardless
of the mass distribution. For the lowest n this means that
the line luminosity is entirely dominated by the high den-
sity tail of the mass distribution. As the median density
nmed varies by a factor of 100 (from 4.3× 103− 4.3× 105
cm−3), the peak of L−1[dL(< n)/d lnn] moves by just
a factor of a few in n. The HCO+ line is intermedi-
ate between CO and HCN. For n = 102 cm−3 and 103
cm−3, nmed
<∼ ncrit, and as with HCN most of the emis-
sion comes from near the critical density. For n = 104
cm−3, nmed > ncrit, and the light starts to follow the
mass, in a pattern similar to that for CO. Although Fig-
ure 1 shows only the intermediate case, the normal galaxy
and starburst cases give qualitatively identical results.
This confirms the intuitive argument given in § 1: high
critical density transitions trace regions of similar den-
sity in every galaxy, while low critical density transitions
trace regions whose density is close to the median density.
Now consider how the luminosity in a given line corre-
lates with the star formation rate in galaxies of varying
mean densities. For a given n, we can compute the vol-
ume density of star formation from equation (2) and the
line luminosity density from (9). To facilitate compari-
son with observations, rather than considering the total
line luminosity, we use the quantity L′ (Solomon et al.
1997), which is related to the luminosity L by
8πkBν2
L, (10)
converted to the units K km s−1 pc2.
Similarly, we can estimate the far infrared luminosity
from the star formation rate. There is a tight correla-
tion between far-IR emission and star formation, par-
ticularly for dense, dusty galaxies like those that make
up most of the dynamic range of the Kennicutt (1998a)
sample. To the extent that most or all of the light from
young stars is re-processed by dust before escaping the
galaxy, the bolometric luminosity integrated over the
wavelength range 8− 1000 µm, which we define as LFIR,
simply provides a calorimetric measurement of the total
energy output by young stars, and is therefore an excel-
lent tracer of recent star formation (Sanders & Mirabel
1996; Rowan-Robinson et al. 1997; Kennicutt 1998a,b;
Hirashita et al. 2003; Bell 2003; Iglesias-Páramo et al.
2006). We therefore estimate the FIR luminosity from
the star formation rate via
LFIR = ǫṀ∗c
2, (11)
where ǫ is an IMF-dependent constant. For consistency
with Kennicutt (1998a,b), we take ǫ = 3.8× 10−4. To be
precise and to facilitate comparison with observations, we
adopt the Sanders & Mirabel (1996) definition of LFIR
as a weighted sum of the luminosity in the 60 and 100
µm IRAS bands. This definition of the infrared luminos-
ity generically underestimates the total infrared luminos-
ity [8 − 1000]µm by a factor of 1.5 − 2 (Calzetti et al.
2000; Dale et al. 2001; Bell 2003). However, we use the
ǫ value appropriate for LFIR rather than for the total
IR luminosity because some of the observations to which
we wish to compare our model (see § 3.3) provide only
LFIR. Note that this choice for the connection between
the star formation rate and the infrared luminosity is not
fully consistent with our choice of the gas temperature
for the three sets of parameters — normal, intermediate,
and starburst — listed in Table 1, an issue we discuss in
more detail in § 4.3.
We plot the ratio of star formation rate to line lumi-
nosity, and infrared luminosity to line luminosity, as a
function of n in Figure 2. First consider the top panel,
which shows all three lines computed for the interme-
diate case. This again confirms our intuitive argument.
Since the luminosity per unit volume in the CO line is
roughly proportional to the mass density, and the star
formation rate / IR luminosity is proportional to mass
density to the 1.5 power, the ratios Ṁ∗/L
′ and LFIR/L
vary roughly as n0.5. A powerlaw fit to the data over the
range shown in Figure 2 gives an index of 0.57. In con-
trast, the ratio of star formation density to HCN luminos-
ity density is nearly constant for galaxies with n < 103
cm−3, and varies quite weakly with n up to densities
of 104 cm−3, values found in the densest starbursts. A
powerlaw fit from 10 cm−3 to 104 cm−3 gives an index of
0.17; from 10 cm−3 and 103 cm−3, the best fit powerlaw
index is 0.08. As in Figure 1, the slope of the Ṁ∗/L
curve for HCO+ represents an intermediate case, with
a roughly constant ratio of Ṁ∗/L
′ and LFIR/L
′ at low
n, rising to a slope comparable to that for CO at high
values of n.
Now consider the bottom three panels in Figure 2.
Each panel shows the ratio of star formation rate and
infrared luminosity to line luminosity for a single line,
computed for each of the three galaxy models. The most
Fig. 2.— Ratio of star formation rate or infrared luminosity to
line luminosity, as a function of mean density n. In the top panel
we show the lines CO(1 → 0) (solid line), HCO+(1 → 0) (dot-
dashed line), and HCN(1 → 0) (dashed line) for the intermediate
case in Table 1. In the next three panels we show the CO(1 → 0),
HCO+(1 → 0), and HCN(1 → 0) lines for the normal galaxy case
(dot-dashed line), intermediate case (solid line), and starburst case
(dashed line).
important point to take from these plots is that the choice
of galaxy model has little effect in most cases. The largest
differences are for HCN, where at n = 10 cm−3 the IR to
line ratio predicted for the intermediate case differs from
the normal galaxy case by a factor of 6.1, and from the
starburst case by a factor of 4.1. This variation comes
primarily from changes in the Mach number and the op-
tical depth between models. The higher Mach number of
the starburst model significantly increases the amount of
mass in the high overdensity tail of the probability dis-
tribution, while the higher optical depth lowers the ef-
fective critical density. Both of these effect increase the
amount of mass dense enough to emit in HCN(1 → 0)
and reduce Ṁ∗/L
′. At higher mean densities these effects
become less important and the models converge, so that
by n = 104 cm−3 the range in Ṁ∗/L
′ from the normal
to the starburst case is only a factor of 3.5.
Most importantly, our central conclusion that
Ṁ∗/L
HCN is roughly constant across galaxies, while
Ṁ∗/L
CO rises as roughly [L
0.5, still holds when we
consider how conditions vary across galaxies. Galaxies
with low mean densities n are generally closest to the
normal galaxy case, while those with high mean densi-
ties should be closest to the starburst case, and this sys-
tematic variation in galaxy properties with n still leaves
Ṁ∗/L
′ relatively flat for HCN, and varying with a slope
close to 0.5 for CO. From the normal galaxy case at
n = 10 cm−3 to the starburst case at n = 104 cm−3,
the value of Ṁ∗/L
′ varies by more than a factor of 50
for the CO(1 → 0), but by less than a factor of 3 for the
HCN(1 → 0).
3.3. Comparison with Observations
The calculations illustrated in Figure 2 demonstrate
the basic argument that one expects a roughly constant
star formation rate per unit line luminosity for high den-
sity tracers (e.g., HCN), and a star formation rate per
unit luminosity that rises like luminosity to the ∼ 0.5
for low density tracers (e.g., CO). However, in large sur-
veys one cannot always determine the mean density in
a galaxy, which would be required to construct an ob-
servational analog to Figure 2. Instead, we can use our
calculated dependence of star formation rate and line
luminosity on density to compare to observations as fol-
lows. Equation (9) gives the total molecular line lumi-
nosity per unit volume and equation (2) gives the star
formation rate, which we convert to an IR luminosity
via equation (11). For fixed assumed volume of molecu-
lar star-forming gas (Vmol) we can then predict the ex-
pected correlations between L′ in a given molecular line
and LFIR. The three panels of Figure 3 show our results
for LFIR as a function of L
CO, L
HCN, and L
for the
intermediate model (see Table 1) and for several values
of Vmol. Figure 4 shows how are results vary as a func-
tion of the assumed T and M. There, for fixed Vmol,
we compare our predictions for the intermediate model
with the normal and starburst models. In both figures we
compare our models to data culled from the literature.
From the work of Gao & Solomon (2004a,b),
Greve et al. (2005, their Fig. 7), Riechers et al. (2006b,
their Fig. 5), and Gao et al. (2007), as well as the
theoretical arguments in the preceding sections, we
expect a strong, but not linear, correlation between
the CO luminosity and the star formation rate —
as measured by LFIR — with the approximate form
LFIR ∝ L
CO . The left panel of Figure 3 shows the
CO data, the approximate correlation expected (solid
line segment; offset from the data for clarity) and the
theoretical prediction (solid lines) for a total volume of
molecular gas of Vmol = 10
7, 108, and 109 pc3. Because
at fixed LFIR, galaxies exhibit a dispersion in Vmol we
expect there to be intrinsic scatter in this correlation,
roughly bracketed by the range of Vmol plotted.
The middle and right panels of Figure 3 show the same
prediction for L′
and L′HCN. In these cases, because
the molecular line luminosity is nearly linearly propor-
tional to LFIR, the dependence on Vmol is much weaker
than for L′CO. However, systematic changes or differences
in the fiducial parameters for the calculation (see Table
1) introduce uncertainty and scatter into the correlation.
Figure 4 assesses this dependence. It compares the pre-
dictions of our model for normal (dot-dashed lines), inter-
mediate (solid lines), and starburst (dashed lines) galax-
ies, as defined in Table 1, for fixed Vmol = 10
8 pc3. Our
simple model reproduces the data rather well, and it pre-
dicts that generically there may be more intrinsic scatter
in the L′CO−LFIR correlation than in either L
HCN−LFIR
or L′
− LFIR.
Molecular Gas and Kennicutt-Schmidt Laws 7
Fig. 3.— LFIR (L⊙) versus L
(1 → 0) (left panel), L′
(1 → 0) (middle panel), and L′
(1 → 0) (K km s−1 pc2; right panel).
The lines in each panel derive from the model presented in this paper with a constant total volume of molecular material of 107, 108, and
109 pc3 (lowest to highest). The thick solid line segment shows power-law slopes to guide the eye. Data in the left and right panels are
from Gao & Solomon (2004a,b) (circles) and Gao et al. (2007) (open squares for detections, arrows for upper limits). The middle panel
combines data from Nguyen et al. (1992) (small circles with lines), Graciá-Carpio et al. (2006) (big circles), and Riechers et al. (2006b)
(open square; using the Gao et al. (2007) FIR luminosity and magnification factor). For all data, LFIR is defined based on a weighted sum
of the galaxy luminosity in the 60 and 100 µm IRAS bands, as described by Sanders & Mirabel (1996). For the Nguyen et al. (1992) data,
the uncertainties in LHCO+ indicated by the lines arise because Nguyen et al. provide both HCN(1 → 0) and HCO
+(1 → 0) intensities,
but the values for L′
derived from their work generally fall a factor of 2 − 3 below the L′
from Gao & Solomon (2004a,b) for the
same systems. This is probably because Nguyen et al. use a single beam pointing rather than integrating fully over extended sources, and
therefore miss some of the flux. We therefore show two values of L′
, connected by a line, for each Nguyen et al. data point: a smaller
value calculated directly from the data listed in their Table 2, and a larger value obtained by multiplying the L′HCN value of Gao & Solomon
for that galaxy by the ratio IHCO+/IHCN measured by Nguyen et al. If this ratio is constant over the source, this estimate should correctly
account for the flux outside the beam in the Nguyen et al. HCO+ observation.
Note that in both the middle and right panels of Fig-
ures 3 and 4, one expects a turn upward in the corre-
lation at high LFIR, a deviation from linearity. This
follows from the fact that in our model, at fixed Vmol,
systems with higher LFIR have higher average densities.
At sufficiently high LFIR we thus expect L
HCN−LFIR and
−LFIR to steepen, in analogy with the L′CO−LFIR
correlation. The data points with very high LFIR in Fig-
ures 3 and 4, which might be used to test this prediction
of our model, are gravitationally lensed, at high redshift,
and contaminated by bright AGN. It is therefore un-
clear if the deviation from linearity implied particularly
by the upper limits in L′HCN in the right panels of Fig-
ures 3 and 4 is a result of enhanced LFIR, caused by the
AGN emission (Carilli et al. 2005), or is instead a result
of less molecular line emission per unit star formation,
as our model implies (Fig. 2). Gao et al. (2007) note,
however, that in the three systems for which the contri-
bution from the AGN has been estimated (F10214+4724,
D. Downes & P. Solomon 2007, in preparation; Clover-
leaf, Weiß et al. 2003; APM 08279+5255, Weiß et al.
2005, 2007) the corrections are only significant for APM
08279+5255. This suggests that the data are so far con-
sistent with our interpretation, but clearly much more
data at high LFIR — or, more precisely for our purposes,
at high density — is required to test our predictions. We
discuss the issue of AGN contamination further in § 4.3.
As a final note, the data so far do support the utility
of HCO+ as a useful tracer of dense gas. Papadopoulos
(2007) has argued against the utility of HCO+ as a faith-
ful tracer of mass in starbursts on the basis that, since it
is an ion, its abundance is strongly dependent on the free-
electron abundance and might therefore vary strongly
between galaxies with different ionizing radiation back-
grounds. We cannot rule out this possibility given the
limited data set available, but we see no strong evi-
dence in favor of it from the data shown in Figures 3
and 4. As we have argued, HCO+(1 → 0) is particu-
larly useful because its critical density is between that
of CO(1 → 0) and HCN(1 → 0) and, thus, as Fig-
ure 3 and 4 show, the correlation between LFIR and
L′HCO+ should steepen from linear to super-linear over
the range of galaxies presented in the CO panels. A care-
ful, large-scale HCO+(1 → 0) survey similar to the work
of Gao & Solomon (2004a) on HCN(1 → 0) should reveal
these trends. Lines with similarly low excitation temper-
atures and intermediate critical densities like CS(1 → 0)
should behave analogously.
4. DISCUSSION
4.1. Implications for Kennicutt-Schmidt Laws and Star
Formation Efficiencies
Our results suggest that KS laws in different tracers
naturally fall into two regimes, although there is a broad
range of molecular tracers that are intermediate between
the two extremes. Tracers for which the critical density is
small compared to the median density in a galaxy repre-
sent one limit. In these tracers, the light faithfully follows
the mass, so the KS law measures a relationship between
total mass and star formation. In any model in which
star formation occurs at a roughly constant rate per dy-
namical time, this must produce a KS law in which the
star formation rate rises with density to a power of near
1.5, and the ratio of star formation to luminosity rises
as density to the 0.5 power. In terms of surface rather
than volume densities, this implies Σ̇∗ ∝ Σ
−1/2. If
we further add the observation that the scale heights h of
the star-forming molecular layers of galaxies are roughly
constant across galaxy types, one form of the observed
Kennicutt (1998a,b) star formation law follows immedi-
ately (Elmegreen 2002). Moreover, in a galactic disk,
Fig. 4.— The same as Figure 3, but with constant Vmol = 10
8 pc3, and for the model parameters corresponding to “starburst” (dashed),
“intermediate” (solid), and “normal” (dot-dashed) (see Table 1). Therefore, the middle solid line in each panel of Figure 3 is the same as
the solid line in each panel in this Figure.
h ∝ Σg/n and n ∝ Ω2/Q (e.g. Thompson et al. 2005);
since in star-forming disks the Toomre-Q is about unity
(Martin & Kennicutt 2001), substituting for h immedi-
ately gives Σ̇∗ ∝ ΣΩ, the alternate form of the Kennicutt
(1998a,b) law.
The other limit is tracers for which the critical density
is large compared to the median galactic density. These
tracers pick out a particular density independent of the
mean or median density in the galaxy, and thus all the
regions they identify have the same dynamical time re-
gardless of galactic environment. In this case the star
formation rate will simply be proportional to the total
mass of the observed regions, yielding a constant ratio of
star formation rate to mass, as is observed for HCN in
the local universe (Fig. 3, right panel; Gao & Solomon
2004a,b; Wu et al. 2005).
We predict that there should be a transition between
linear and super-linear scaling of LFIR with line luminos-
ity at the point where galaxies transition from median
densities that are smaller than the line critical density
to median densities larger than the critical density. The
HCO+(1 → 0) line, and other lines with similar critical
densities, e.g. CS(1 → 0) and SO(1 → 0), should show
this behavior for galaxies in the local universe. The ob-
served correlation between LHCO+ and LFIR appears to
be consistent with our prediction, although at present the
data are not of sufficient quality to distinguish between
a break and a single powerlaw relation. There are hints
that the very highest luminosity star-forming galaxies,
which all reside at high redshift and may well reach ISM
densities not found in any local systems, show such a
break in the IR-HCN correlation.
One important point to emphasize in this analysis is
that we have been able to explain the observed correla-
tions between line and infrared luminosities, and hence
between gas masses at various densities and star for-
mation rates, without resorting to the hypothesis that
the star formation process is fundamentally different in
galaxies of different properties. Although uncertainties
in both our model and the observations do not preclude
an order-unity change in the star formation efficiency or
SFRff as a function of LFIR, there is currently no evi-
dence for such a change in the data, contrary to claims
made by, e.g. Graciá-Carpio et al. (2006). In fact, all
of the observational trends are predicted by our sim-
ple model with constant star formation efficiency. This
is consistent with other lines of evidence that the frac-
tion of mass at a given density that turns into stars
is roughly 1% per free-fall time independent of density
(Krumholz & Tan 2007).
4.2. Does Star Formation Have a Fundamental Size or
Density Scale?
Based on the linear correlation between HCN(1 → 0)
luminosity and star formation rate, seen both in ex-
ternal galaxies and in individual molecular clumps in
the Milky Way, Gao & Solomon (2004a,b) and Wu et al.
(2005) propose that HCN(1 → 0) emission traces a fun-
damental unit of star formation. They explain the linear
IR-HCN correlation as a product of this; in their model,
HCN luminosity correlates linearly with star formation
rate because HCN luminosity simply counts the number
of such units.
Based on our analysis, we argue that this hypothesis is
only partially correct. We concur with Gao & Solomon
and Wu et al. that the HCN(1 → 0) luminosity of a
galaxy does simply reflect the mass of gas that is dense
enough to excite the HCN(1 → 0) line. However, our
analysis shows that this does not necessarily imply that
this density represents a special density in the star for-
mation process, or that objects traced by HCN(1 → 0)
represent a physically distinct class. We show that a
linear correlation between star formation rate and line
luminosity is expected for any line with a critical density
comparable to or larger than the median molecular cloud
density in the galaxies used to define the correlation. It
is possible that HCN(1 → 0)-emitting regions represent
a physically distinct scale of star formation as Wu et al.
propose, but one can explain the linear IR-HCN correla-
tion equally well if they are just part of the same contin-
uous medium as the regions traced by CO(1 → 0) and
by other transitions. Even the star-forming clouds them-
selves may simply be parts of a continuous distribution
of ISM structures occupying the entire galaxy, as argued
by Wada & Norman (2007). In this case there need be
no special density scales other than the mean and me-
dian densities for the star-forming clouds on their largest
scales, and the density at which star formation becomes
rapid, converting the mass into stars in of order a free-
Molecular Gas and Kennicutt-Schmidt Laws 9
fall time. This transition scale is unknown, but must
be considerably larger than the density traced by HCN
(Krumholz & Tan 2007).
4.3. Limitations and Cautions
4.3.1. Self-Consistency
As mentioned in § 3.2, our approach of leaving the gas
temperature T and Mach number M as free parame-
ters is not entirely consistent with our calculation of the
IR luminosity, since the IR luminosity and temperature
are of course related. In principle, with a model of how
the energy output from stars heats the dust and gas, to-
gether with a structural model connecting the energy and
momentum output from stars to the generation of turbu-
lence, it should be possible to self-consistently compute
both the gas temperature and the Mach number from the
volumetric star formation rate (see, e.g., Thompson et al.
2005). Such a model would return T andM as a function
of n and possibly other galaxy properties, while simulta-
neously predicting a set of Kennicutt-Schmidt laws.
If the line luminosity depended strongly on T or M,
or if one required knowledge of the temperature to com-
pute the infrared luminosity of a galaxy, we would would
have no alternative to constructing such a model if we
wished to explain the observed IR-line luminosity cor-
relation. However, we can avoid this by relying on the
observationally-calibrated star formation-IR correlation,
and because, as we show in Figures 2, 3, and 4, the line
luminosity varies quite weakly over a reasonable range of
T and M for our chosen lines. For this reason, any model
for computing T and M as a function of galaxy proper-
ties, if it were consistent with observations, would not
significantly alter the IR-line luminosity correlation we
derive. This is true, however, only for lines that require
low temperatures to excite. As we discuss in § 4.3.2,
lines that require higher temperatures to excite do de-
pend sensitively on the temparature in the galaxy, and a
model capable of predicting the IR-line luminosity cor-
relation for these lines must also include a calculation of
the temperature structure of the galaxy.
4.3.2. Isothermality
Our assumption of isothermality means that our anal-
ysis will only apply to molecular lines for which the tem-
perature Tup corresponding to the upper state energy is
< 10 K, low enough to be excited even in the coolest
molecular clouds in normal spiral galaxies. The reason
for this is that at temperatures larger than Tup, the lu-
minosity in a line generally varies at most linearly with
the temperature. As the similarity between the results
with our different galaxy models illustrates, changing the
temperature within the range of ∼ 10 − 50 K produces
only a factor of a few change in the luminosity of the
lines we have studied. In contrast, line luminosity re-
sponds exponentially to temperature changes when the
temperature is below the value corresponding to the up-
per state’s energy. This means that lines sensitive to
high temperatures pick out primarily the regions that
are warm enough for the line to be excited. Density has
only a secondary effect. The emission will therefore re-
flect the temperature distribution in star-forming clouds
more than the density distribution, an effect that our
isothermal assumption precludes us from treating. KS
laws in high temperature tracers are likely to find lin-
ear relationships between star formation rate and mass
regardless of the critical density of the molecule in ques-
tion because they will simply be correlating the mass
of dust warmed to >∼ 100 Kelvin, which is essentially
what is measured by LFIR, with the mass of gas warmed
to temperatures above Tup. However, our model will
not apply to these lines, and for this reason we do not
attempt to compare to observations using higher tran-
sitions of CO (3 → 2, 4 → 3, 5 → 4, 6 → 5, and
7 → 6, which have Tup = 33, 55, 83, 116, and 154 K,
respectively; Greve et al. 2005, Solomon & Vanden Bout
2005), CS(5 → 4) (Tup = 35 K; Plume et al. 1997), or
other high temperature tracers.
4.3.3. Molecular Abundances
We have not considered density-dependent variations
in molecule abundances. One potential source of vari-
ation in molecular abundance is freeze-out onto grain
surfaces at high densities and low temperatures (e.g.
Tafalla et al. 2004a,b). Chemodynamical models suggest
that freeze-out is not likely to become significant for ei-
ther carbonaceous or nitrogenous species until densities
n >∼ 106 cm−3 (Flower et al. 2006), but may become se-
vere at higher densities, so whether depletion is signif-
icant depends on what fraction of the total luminosity
would be contributed by gas of this density or higher
were there no freeze-out. Figure 1 suggests that freeze-
out is likely to modify the total galactic luminosity of
CO, HCO+, and HCN fairly little even at a mean ISM
density of n = 104 cm−3, but may have significant effects
for galaxies of larger mean densities or for lines for which
the critical densities is comparable to the freeze-out den-
sity. If freeze-out is significant, our conclusions will be
modified.
4.3.4. Atomic Gas
In the simple model developed here, we have neglected
the role of atomic gas entirely. Whether the density or
surface density of atomic gas plays a role in controlling
the star formation rate is subject to debate on both ob-
servational and theoretical grounds (Kennicutt 1998a,b;
Wong & Blitz 2002; Heyer et al. 2004; Komugi et al.
2005; Krumholz & McKee 2005; Kennicutt et al. 2007),
so it is unclear how much a limitation this omission re-
ally is. We can say with confidence that in molecule-rich
galaxies, which provide almost all the dynamic range of
both the Kennicutt (1998a,b) correlation and the cor-
relations illustrated in Figures 3 and 4, the atomic gas
plays almost no role simply because there is so little of
it. Thus, our predictions should be quite robust, except
perhaps at the very low luminosity ends of Figures 3 and
4.3.5. AGN Contributions
A final point is not so much a limitation of our work as
a cautionary note about comparing our model with ob-
servations. We have included in our model IR luminosity
only from star formation, and molecular line luminosity
only from molecules in cold star-forming clouds. How-
ever, an AGN may make a significant contribution to a
galaxy’s luminosity in the far infrared by direct heating of
dust grains, and in molecular lines via an X-ray dissocia-
tion region. Indeed, several of the systems with the high-
est IR luminosities in Figures 3 and 4 are contaminated
by AGN. As noted in §3.3, this complicates an assessment
of our prediction of an up-turn in the L′HCN − LFIR and
L′HCO+ − LFIR correlations at high luminosity. This de-
viation from linearity at high gas density (at fixed Vmol,
high LFIR) is an essential prediction of our model, but
testing it relies on a careful separation of the contribu-
tion of the AGN to both the IR and line luminosities (e.g.
Maloney et al. 1996). In fact, Carilli et al. (2005) discuss
the possibility that the AGN’s contribution to the IR lu-
minosity in these systems causes them to be above the
local linear L′HCN−LFIR correlation. Such a contamina-
tion would mimic the prediction of our model. However,
Gao et al. (2007) argue that the sub-millimeter galaxies
in their sample are not AGN dominated and that just
one of three quasars in their sample (APM 08279+5255)
has a large AGN IR component. See Gao et al. (2007)
for more discussion. For these reasons we contend that
although our model is consistent with the existing data,
the current evidence for a break in the L′HCN−LFIR cor-
relation should be viewed with caution and more data
in high density/luminosity systems is clearly required to
understand the role of AGN contamination in shaping
the correlation.
5. CONCLUSIONS
We provide a simple model for understanding how
Kennicutt-Schmidt laws, which relate the star formation
rate to the mass or surface density of gas as inferred from
some particular line, depend on the line chosen to define
the correlation. We show that for a turbulent medium
the luminosity per unit volume in a given line, provided
that line can be excited at temperatures lower than the
mean temperature in a galaxy’s molecular clouds, in-
creases faster than linearly with the density for molecules
with critical densities larger than the median gas density.
The star formation rate also rises super-linearly with the
gas density, and the combination of these two effects pro-
duces a close to linear correlation between star formation
rate and line luminosity. In contrast, the line luminosity
rises only linearly with density for lines with low critical
densities, producing a correlation between star formation
rate and line luminosity that is super-linear.
Based on this analysis, we construct a model for the
correlation between a galaxy’s infrared luminosity and its
luminosity in a particular molecular line. Our model is
extremely simple, in that it relies on an observationally-
calibrated IR-star formation rate correlation, it treats
molecular clouds as having homogenous density and ve-
locity distributions, temperatures, and chemical compo-
sitions, and it only very crudely accounts for variations
in molecular cloud properties across galaxies. Despite
these approximations, the model naturally explains why
some observed correlations between infrared luminosity
and line luminosity in galaxies are linear, and some are
super-linear. Using it, we are able to compute quantita-
tively the correlation between infrared and HCN(1 → 0)
line luminosity, and between IR and CO(1 → 0) line lu-
minosity. We show that our model provides a very good
fit to observations in these lines, and we are able to make
similar predictions for any molecular line that can be ex-
cited at low temperatures, as we demonstrate for the
example of HCO+(1 → 0). Moreover, we are able to ex-
plain the observed data without recourse to the hypoth-
esis that the star formation process is somehow different,
either more or less efficient, in different types of galaxies
or for media of different densities. Instead, our model is
able to explain the observed correlations using a simple,
universal star formation law.
One strong prediction of our model is that there should
be a break from linear to non-linear scaling in the HCN-
IR correlation at very high IR luminosity, and a similar
break in the HCO+-IR correlation at somewhat lower lu-
minosity. The data for HCO+ are consistent with this
prediction but do not yet strongly favor a break over pure
powerlaw behavior. However, there is some preliminary
evidence for a break in the IR-HCN correlation in high
redshift galaxies more luminous than any found in the
local universe, although with these high redshift obser-
vations it is difficult to rule out the alternative explana-
tion of the break as arising due to a progressively rising
AGN contribution to the IR luminosity (see §3.3 and
§4.3.5). Future galaxy surveys both in the local universe
and at high redshift may be used to test our predictions
for HCO+(1 → 0), HCN(1 → 0), and other molecular
lines.
We thank L. Blitz, B. Draine, A. Leroy, E. Rosolowsky,
and A. Socrates for helpful discussions, N. Evans and
the anonymous referee for useful comments on the
manuscript, and R. Kennicutt for kindly providing a
preprint of his submitted paper. We thank Y. Gao for
providing LFIR for the systems used in Figures 3 and 4.
MRK acknowledges support from NASA through Hubble
Fellowship grant #HSF-HF-01186 awarded by the Space
Telescope Science Institute, which is operated by the As-
sociation of Universities for Research in Astronomy, Inc.,
for NASA, under contract NAS 5-26555. TAT acknowl-
edges support from a Lyman Spitzer, Jr. Fellowship.
REFERENCES
Bell, E. F. 2003, ApJ, 586, 794
Black, J. H. 2000, in Astronomy, physics and chemistry of H
Calzetti, D., Armus, L., Bohlin, R. C., Kinney, A. L., Koornneef,
J., & Storchi-Bergmann, T. 2000, ApJ, 533, 682
Carilli, C. L., Solomon, P., Vanden Bout, P., Walter, F., Beelen,
A., Cox, P., Bertoldi, F., Menten, K. M., Isaak, K. G.,
Chandler, C. J., & Omont, A. 2005, ApJ, 618, 586
Combes, F. 1991, ARA&A, 29, 195
Dale, D. A., Helou, G., Contursi, A., Silbermann, N. A., &
Kolhatkar, S. 2001, ApJ, 549, 215
Downes, D. & Solomon, P. M. 1998, ApJ, 507, 615
Elmegreen, B. G. 1994, ApJ, 425, L73
—. 2002, ApJ, 577, 206
Elmegreen, B. G. & Scalo, J. 2004, ARA&A, 42, 211
Flower, D. R., Pineau Des Forêts, G., & Walmsley, C. M. 2006,
A&A, 456, 215
Gao, Y., Carilli, C. L., Solomon, P. M., & Vanden Bout, P. A.
2007, ApJ, in press, astro-ph/0703548
Gao, Y. & Solomon, P. M. 2004a, ApJS, 152, 63
—. 2004b, ApJ, 606, 271
Graciá-Carpio, J., Garćıa-Burillo, S., Planesas, P., & Colina, L.
2006, ApJ, 640, L135
Molecular Gas and Kennicutt-Schmidt Laws 11
Greve, T. R., Bertoldi, F., Smail, I., Neri, R., Chapman, S. C.,
Blain, A. W., Ivison, R. J., Genzel, R., Omont, A., Cox, P.,
Tacconi, L., & Kneib, J.-P. 2005, MNRAS, 359, 1165
Heyer, M. H., Corbelli, E., Schneider, S. E., & Young, J. S. 2004,
ApJ, 602, 723
Hirashita, H., Buat, V., & Inoue, A. K. 2003, A&A, 410, 83
Iglesias-Páramo, J., Buat, V., Takeuchi, T. T., Xu, K., Boissier,
S., Boselli, A., Burgarella, D., Madore, B. F., Gil de Paz, A.,
Bianchi, L., Barlow, T. A., Byun, Y.-I., Donas, J., Forster, K.,
Friedman, P. G., Heckman, T. M., Jelinski, P. N., Lee, Y.-W.,
Malina, R. F., Martin, D. C., Milliard, B., Morrissey, P. F.,
Neff, S. G., Rich, R. M., Schiminovich, D., Seibert, M.,
Siegmund, O. H. W., Small, T., Szalay, A. S., Welsh, B. Y., &
Wyder, T. K. 2006, ApJS, 164, 38
Juvela, M., Padoan, P., & Nordlund, Å. 2001, ApJ, 563, 853
Kennicutt, R. C. 1998a, ARA&A, 36, 189
—. 1998b, ApJ, 498, 541
Kennicutt, R. C., Calzetti, D., Walter, F., Helou, G., Hollenbach,
D. J., Armus, L., Bendo, G., Dale, D. A., Draine, B. T.,
Engelbracht, C. W., Gordon, K. D., Prescott, M. K. M., Regan,
M. W., Thornley, M. D., Bot, C., Brinks, E., de Blok, E., de
Mello, D., Meyer, M., Moustakas, J., Murphy, E. J., Sheth, K.,
& Smith, J. D. T. 2007, ApJ, submitted
Komugi, S., Sofue, Y., Nakanishi, H., Onodera, S., & Egusa, F.
2005, PASJ, 57, 733
Krumholz, M. R. & McKee, C. F. 2005, ApJ, 630, 250
Krumholz, M. R. & Tan, J. C. 2007, ApJ, 654, 304
Lahuis, F. & van Dishoeck, E. F. 2000, A&A, 355, 699
Mac Low, M. & Klessen, R. S. 2004, Reviews of Modern Physics,
76, 125
Madore, B. F. 1977, MNRAS, 178, 1
Maloney, P. R., Hollenbach, D. J., & Tielens, A. G. G. M. 1996,
ApJ, 466, 561
Martin, C. L. & Kennicutt, R. C. 2001, ApJ, 555, 301
McKee, C. F. 1999, in NATO ASIC Proc. 540: The Origin of
Stars and Planetary Systems, 29
Netzer, H., Lemze, D., Kaspi, S., George, I. M., Turner, T. J.,
Lutz, D., Boller, T., & Chelouche, D. 2005, ApJ, 629, 739
Neufeld, D. A., Melnick, G. J., Sonnentrucker, P., Bergin, E. A.,
Green, J. D., Kim, K. H., Watson, D. M., Forrest, W. J., &
Pipher, J. L. 2006, ApJ, 649, 816
Nguyen, Q.-R., Jackson, J. M., Henkel, C., Truong, B., &
Mauersberger, R. 1992, ApJ, 399, 521
Nordlund, Å. K. & Padoan, P. 1999, in Interstellar Turbulence,
Ostriker, E. C., Gammie, C. F., & Stone, J. M. 1999, ApJ, 513,
Padoan, P. & Nordlund, Å. 2002, ApJ, 576, 870
Papadopoulos, P. P. 2007, ApJ, 656, 792
Plume, R., Jaffe, D. T., Evans, N. J., Martin-Pintado, J., &
Gomez-Gonzalez, J. 1997, ApJ, 476, 730
Riechers, D. A., Walter, F., Carilli, C. L., Knudsen, K. K., Lo,
K. Y., Benford, D. J., Staguhn, J. G., Hunter, T. R., Bertoldi,
F., Henkel, C., Menten, K. M., Weiss, A., Yun, M. S., &
Scoville, N. Z. 2006a, ApJ, 650, 604
Riechers, D. A., Walter, F., Carilli, C. L., Weiss, A., Bertoldi, F.,
Menten, K. M., Knudsen, K. K., & Cox, P. 2006b, ApJ, 645,
Rowan-Robinson, M., Mann, R. G., Oliver, S. J., Efstathiou, A.,
Eaton, N., Goldschmidt, P., Mobasher, B., Serjeant, S. B. G.,
Sumner, T. J., Danese, L., Elbaz, D., Franceschini, A., Egami,
E., Kontizas, M., Lawrence, A., McMahon, R.,
Norgaard-Nielsen, H. U., Perez-Fournon, I., &
Gonzalez-Serrano, J. I. 1997, MNRAS, 289, 490
Sanders, D. B. & Mirabel, I. F. 1996, ARA&A, 34, 749
Schmidt, M. 1959, ApJ, 129, 243
—. 1963, ApJ, 137, 758
Schöier, F. L., van der Tak, F. F. S., van Dishoeck, E. F., &
Black, J. H. 2005, A&A, 432, 369
Solomon, P. M., Downes, D., Radford, S. J. E., & Barrett, J. W.
1997, ApJ, 478, 144
Solomon, P. M., Rivolo, A. R., Barrett, J., & Yahil, A. 1987, ApJ,
319, 730
Solomon, P. M. & Vanden Bout, P. A. 2005, ARA&A, 43, 677
Tafalla, M., Myers, P. C., Caselli, P., & Walmsley, C. M. 2004a,
A&A, 416, 191
—. 2004b, Ap&SS, 292, 347
Thompson, T. A., Quataert, E., & Murray, N. 2005, ApJ, 630,
Wada, K. & Norman, C. 2007, ApJ, in press, astro-ph/0701595
Weiß, A., Downes, D., Neri, R., Walter, F., Henkel, C., Wilner,
D. J., Wagg, J., & Wiklind, T. 2007, A&A, 467, 955
Weiß, A., Downes, D., Walter, F., & Henkel, C. 2005, A&A, 440,
Weiß, A., Henkel, C., Downes, D., & Walter, F. 2003, A&A, 409,
Wild, W., Harris, A. I., Eckart, A., Genzel, R., Graf, U. U.,
Jackson, J. M., Russell, A. P. G., & Stutzki, J. 1992, A&A,
265, 447
Wong, T. & Blitz, L. 2002, ApJ, 569, 157
Wu, J., Evans, N. J., Gao, Y., Solomon, P. M., Shirley, Y. L., &
Vanden Bout, P. A. 2005, ApJ, 635, L173
Yao, L., Seaquist, E. R., Kuno, N., & Dunne, L. 2003, ApJ, 588,
Zaritsky, D., Kennicutt, Jr., R. C., & Huchra, J. P. 1994, ApJ,
420, 87
ABSTRACT
  We provide a model for how Kennicutt-Schmidt (KS) laws, which describe the
correlation between star formation rate and gas surface or volume density,
depend on the molecular line chosen to trace the gas. We show that, for lines
that can be excited at low temperatures, the KS law depends on how the line
critical density compares to the median density in a galaxy's star-forming
molecular clouds. High critical density lines trace regions with similar
physical properties across galaxy types, and this produces a linear correlation
between line luminosity and star formation rate. Low critical density lines
probe regions whose properties vary across galaxies, leading to a star
formation rate that varies superlinearly with line luminosity. We show that a
simple model in which molecular clouds are treated as isothermal and homogenous
can quantitatively reproduce the observed correlations between galactic
luminosities in far infrared and in the CO(1->0) and HCN(1->0) lines, and
naturally explains why these correlations have different slopes. We predict
that IR-line luminosity correlations should change slope for galaxies in which
the median density is close to the line critical density. This prediction may
be tested by observations of lines such as HCO^+(1->0) with intermediate
critical densities, or by HCN(1->0) observations of intensely star-forming high
redshift galaxies with very high densities. Recent observations by Gao et al.
hint at just such a change in slope. We argue that deviations from linearity in
the HCN(1->0)-IR correlation at high luminosity are consistent with the
assumption of a constant star formation efficiency.

<|endoftext|><|startoftext|>
Friedmann Equations and Thermodynamics of Apparent Horizons
Yungui Gong1, 2, ∗ and Anzhong Wang2, †
College of Mathematics and Physics, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
GCAP-CASPER, Department of Physics, Baylor University, Waco, TX 76798, USA
With the help of a masslike function which has dimension of energy and equals to the Misner-Sharp
mass at the apparent horizon, we show that the first law of thermodynamics of the apparent horizon
dE = TAdSA can be derived from the Friedmann equation in various theories of gravity, including
the Einstein, Lovelock, nonlinear, and scalar-tensor theories. This result strongly suggests that the
relationship between the first law of thermodynamics of the apparent horizon and the Friedmann
equation is not just a simple coincidence, but rather a more profound physical connection.
PACS numbers: 98.80.-k,04.20.Cv,04.70.Dy
The derivation of the thermodynamic laws of black
holes from the classical Einstein equation suggests a deep
connection between gravitation and thermodynamics [1].
The discovery of the quantum Hawking radiation [2] and
black hole entropy which is proportional to the area of
the event horizon of the black hole [3] further supports
this connection and the thermodynamic (physical) inter-
pretation of geometric quantities. The interesting rela-
tion between thermodynamics and gravitation became
manifest when Jacobson derived Einstein equation from
the first law of thermodynamics by assuming the propor-
tionality of the entropy and the horizon area for all local
acceleration horizons [4].
In cosmology, like in black holes, for the cosmologi-
cal model with a cosmological constant (called de Sit-
ter space), there also exist Hawking temperature and en-
tropy associated with the cosmological event horizon, and
thermodynamic laws of the cosmological event horizon
[5]. In de Sitter space, the event horizon coincides with
the apparent horizon (AH). For more general cosmologi-
cal models, the event horizon may not exist, but the AH
always exists, so it is possible to have Hawking tempera-
ture and entropy associated with the AH. The connection
between the first law of thermodynamics of the AH and
the Friedmann equation was shown in [6]. Now, we must
ask if this interesting relation between gravitation and
thermodynamics exists in more general theories of grav-
ity, like Brans-Dicke (BD) theory and nonlinear gravi-
tational theory. In [7], the gravitational field equations
for the nonlinear theory of gravity were derived from the
first law of thermodynamics by adding some nonequi-
librium corrections. In this Letter, we show that equi-
librium thermodynamics indeed exists for more general
theories of gravity, provided that a new masslike function
is introduced.
To show our claim, we begin by reviewing the ther-
modynamics of the AH with the use of the Misner-
Sharp (MS) mass in Einstein and BD theories of gravity,
whereby we find the equilibrium thermodynamics fails
to hold for the BD theory. The Einstein equation can be
rewritten as the mass formulas with the help of the MS
mass M. The energy flow through the AH dE is related
with the MS mass. Since the MS mass M, the Hawking
temperature TA, and the entropy SA of the AH are geo-
metric quantities, the first law of thermodynamics of the
AH can be thought of as a geometric relation. Therefore,
we expect the geometric relation to hold in other gravi-
tational theory if it holds in Einstein theory. To achieve
this, we replace the MS mass M by a masslike function
M which equals to the MS mass M at the AH, we then
show that the connection between the first law of ther-
modynamics of the AH and the gravitational equations
holds in scalar-tensor and nonlinear theories of gravity
without adding nonequilibrium correction.
For a spherically symmetric space-time with the metric
ds2 = gabdx
adxb+ r̃2dΩ2, using the MS mass M = r̃(1−
gabr̃,ar̃,b)/2G [8], the a − b components of the Einstein
equation give the mass formulas [9, 10]
M,a = 4πr̃2(T ba − δ
aT )r̃,b, (1)
where the unit spherical metric is given by dΩ2 = dθ2 +
sin2 θdϕ2 and T = T aa . From now on, all the indices are
raised and lowered by the metric gab and the covariant
derivative is with respect to gab. The AH is
r̃A = arA = (H
2 + k/a2)−1/2. (2)
At the AH, the MS mass M = 4πr̃3Aρ/3, which can be
interpreted as the total energy inside the AH. Now we
use the (approximate) generator ka = (1, −Hr) of the
AH, which is null at the horizon, to project the mass
formulas. Since kar̃,a = 0, at the AH we find that
− dE = −ka∇aMdt = d(r̃A)/G = TAdSA, (3)
where the horizon temperature is TA = 1/(2πr̃A) and the
horizon entropy is SA = πr̃
A/G. On the other hand, us-
ing the mass formulae (1), we get the energy flow through
the AH
− dE = −ka∇aMdt = −4πr̃2T ba r̃,bk
= 4πr̃3AH(ρ+ p)dt. (4)
Therefore, the Friedmann equation gives rise to the first
law of thermodynamics −dE = TAdSA of the AH. From
http://arxiv.org/abs/0704.0793v2
the above definitions, we see that the relation −dE =
TAdSA is a geometrical relation which depends on the
only assumption of the Robertson-Walker metric. To
connect the geometrical quantity dE with the energy flow
through the AH, we need to use the Friedmann equa-
tions. Therefore, for any gravitational theory, if we can
write the gravitational field equation as Gµν = 8πGTµν
and regard the right-hand side as the effective energy-
momentum tensor, then we find the energy flow through
the AH, whereby we derive the first law of thermodynam-
ics of the AH −dE = TAdSA. For example, in the Jordan
frame of the scalar-tensor theory of gravity, if we take the
right-hand side of gravitational field equation as the total
effective energy-momentum tensor, then the Friedmann
equation can be regarded as a thermodynamic identity
at the AH [11].
The connection between the first law of thermodynam-
ics and the Friedmann equation at the AH was also found
for gravity with Gauss-Bonnet term, the Lovelock theory
of gravity [6], and the braneworld cosmology [12]. For
a general static spherically symmetric and stationary ax-
isymmetric space-times, it was shown that Einstein equa-
tion at the horizon give rise to the first law of thermody-
namics [13, 14]. For the Lovelock gravity, the interpreta-
tion of gravitational field equation as a thermodynamic
identity was proposed in [15].
Alternatively, the mass formulae (1) can be written as
the so-called unified first law ∇aM = AΨa + W∇aV
[16, 17], where W = (ρ − p)/2 and Ψa = T ba r̃,b +
Wr̃,a. Projecting the unified first law along the direc-
tion tangent to the AH (or trapping horizon in Hay-
ward’s terminology), the first law of thermodynamics
dM = TdS + WdV can be derived, where the hori-
zon temperature and entropy are given, respectively, by
T = ✷r̃/(4π) and S = A/(4G). Based on this result,
the connection between the Friedmann equation and the
first law of thermodynamics of the AH with the work
term was widely discussed for Einstein gravity, Love-
lock’s gravity, the scalar-tensor theory of gravity, the
nonlinear theory of gravity, and the braneworld scenario
[18, 19, 20, 21, 22, 23].
This connection between the Friedmann equation and
the first law of thermodynamics of the AH suggests the
unique role of the AH in thermodynamics of cosmology.
This may be used to probe the property of dark energy
[10, 24]. For example, if we assume that the temperature
of the dark components is T = bTA, then use the relation
T = (ρ+p)/s = (ρ+p)a3/σ, we find that the total energy
density of the dark components is given by
ρ = ρΛ + ρ0
, (5)
where ρ0 = σ
2b2Ga−60 /(6π), ρΛ = 3Λ/(8πG) is the en-
ergy density of the cosmological constant, σ is the con-
stant comoving entropy density, and s is the physical en-
tropy density. The right-hand side of the above equation
contains three different terms, which correspond to, re-
spectively, the cosmological constant, the stiff fluid, and
the pressureless matter. However, the coefficients of these
terms are not all independent. In fact, the current obser-
vational constraints tell us that the stiff fluid is negligibly
small, for which we must assume ρ0 ≪ 1. This in turn im-
plies that the pressureless matter given by the last term
is also negligibly small. So the pressureless matter in
the last term cannot account for dark matter. In other
words, the dark matter must not be in equilibrium with
the AH.
For the BD theory [25]
L = −
φR + ωgµν
∂µφ∂νφ
, (6)
the BD scalar φ plays the role of the gravitational con-
stant. The MS mass is [26]
M = φr̃(1− gabr̃,ar̃,b)/2. (7)
At the AH, M = φr̃A/2. The horizon entropy is SA =
πr̃2Aφ, so we get
TAdSA = r̃Adφ/2 + φdr̃A. (8)
On the other hand, we have
− dE = −ka∇aMdt = −r̃Adφ/2 + φdr̃A. (9)
Comparing Eqs. (8) with (9), we find that the equilibrium
thermodynamics −dE = TAdSA fails to hold for the BD
theory. Similarly, it can be shown that −dE = TAdSA
does not hold in the nonlinear and scalar-tensor theories
of gravity. It is exactly because of this that it was argued
nonequilibrium treatment might be needed.
As mentioned above, the mass, temperature and en-
tropy of the AH are all geometrical quantities, and the
first law of thermodynamics of the AH can be regarded
as a geometric relation. Now, the important question is
whether a mass function exists that serves as the bridge
between the Friedmann equation and the first law of ther-
modynamics of the AH without nonequilibrium correc-
tion. In the following, we show that the answer is affir-
mative. It has exactly the dimension of energy, and is
equal to the MS mass at the AH. To distinguish it with
the MS mass, we call it the masslike function.
To show our above claim, let us write the a − b com-
ponents of the Einstein equation as
M,a = −4πr̃2(T ba − δ
aT )r̃,b + r̃,a, (10)
where the mass-like function M is defined as
(1 + gabr̃,ar̃,b). (11)
At the AH, gabr̃,ar̃,b = 0 and the masslike function
M = r̃A/2G, which is equal to the MS mass. For
the Robertson-Walker metric we have gtt = −1, grr =
a2/(1 − kr2) and r̃ = ar. Then, the mass formulas (10)
yield the Friedmann equations
ρ, (12)
(ρ+ 3p). (13)
Combining Eqs. (12) and (13), we can derive the energy
conservation law ρ̇ + 3H(ρ + p) = 0. Thus, the mass
formulas (10) give rise to the full set of the cosmological
equations.
At the AH, the masslike function M = 4πr̃3Aρ/3, which
is the total energy inside the AH. The energy flow is
dE = ka∇aMdt = d(r̃A)/G = TAdSA. (14)
On the other hand, using the mass formulas (10), we get
the energy flow through the AH
dE = ka∇aMdt = −4πr̃2T ba r̃,bk
= 4πr̃3AH(ρ+ p)dt. (15)
Therefore, the Friedmann equation gives rise to the first
law of thermodynamics dE = TAdSA of the AH. While
this result is the same as that obtained by using the MS
mass, we show below that the equilibrium thermodynam-
ics can be derived for BD and nonlinear gravities by using
our newly defined masslike function, although it cannot
be done by using the MS mass, as shown above.
For the BD theory, the mass-like function is defined as
M ≡ φr̃(1 + gabr̃,ar̃,b)/2. (16)
At the AH, it reduces to the MS mass, M = M = φr̃A/2.
The a− b components of the gravitational field equation
become
M,a =− 4πr̃2(T ba − δ
aT )r̃,b + 2πr̃
+ (φr̃),a
ω + 2
r̃2φ,aφ,br̃
r̃2r̃,aφ,bφ
;b − r̃r̃,ar̃,bφ;b
r̃2φ;abr̃
r̃2✷r̃φ,a −
φ,a✷φ.
Applying to the Robertson-Walker metric, the above
equation gives the Friedmann equations
, (18)
(ρ+ 3p)−
. (19)
The mass formulas (17) or Eqs. (18) and (19) are not
sufficient to describe the full dynamics of the BD cosmol-
ogy. In the BD cosmology, we also need the equation of
motion of the BD scalar field φ in addition to Eqs. (18)
and (19), which is given by
φ̈+ 3Hφ̇ =
3 + 2ω
(ρ− 3p). (20)
From the definition of the masslike function (16), at the
AH we find
dE = M,ak
adt = r̃Adφ/2 + φdr̃A = TAdSA, (21)
where the entropy now is SA = πr̃
Aφ. Using the mass
formulas (17), we get the energy flow through the AH
3 + 2ω
r̃3AH [(ω + 2)ρ+ ωp] +
r̃3AH
− 2r̃3AH
r̃Aφ̇,
where we used Eq. (20) in deriving the above equation.
From Eqs. (18)-(20), the right-hand side of Eq. (22) can
be written as 1
r̃Aφ̇+ φ ˙̃rA. Therefore, we see that in BD
theory, the first law of thermodynamics of the AH dE =
TAdSA can be derived from the Friedmann equation.
The thermodynamic prescription can be easily ex-
tended to general scalar-tensor theory of gravity with the
Lagrangian
L = f(φ)R − gµν∂µφ∂νφ/2− V (φ). (23)
In this case, f(φ) plays the role of the gravitational con-
stant, so now we can define the mass-like function as
M ≡ f(φ)r̃
1 + gabr̃,ar̃,b
/2, (24)
and the horizon entropy as SA = πr̃
Af(φ). Then, using
these definitions, we can show that dE = M,ak
adt =
TAdSA.
For the nonlinear theory of gravity f(R), we can define
the masslike function as
f ′(R)r̃
1 + gabr̃,ar̃,b
, (25)
and the horizon entropy SA = πr̃
′(R), where f ′(R) =
df/dR. Again, it is easy to show that dE = M,ak
adt =
TAdSA. Therefore, the thermodynamics of the AH holds
for both the general scalar-tensor theory of gravity and
the nonlinear theory of gravity.
Now we show how to derive the first law of ther-
modynamics of the AH from the Friedmann equation
in the Lovelock gravity. The Lovelock Lagrangian is
n=0 cnLn [27], where
Ln = 2
µ1ν1···µnνn
α1β1···αnβn
Rα1β1µ1ν1 · · ·R
Using the Robertson-Walker metric, we obtain the Fried-
mann equations in N + 1 dimensional space-time
N(N − 1)
ρ, (26)
)i−1 (
N − 1
(ρ+ p),
where ĉ0 = c0/[N(N − 1)], ĉ1 = 1 and ĉi = ci
j=3(N +
1−j) for i > 1. The masslike function can now be defined
N(N − 1)ΩN r̃N
2r̃−2i −
= ΩN r̃
A ρ, (28)
where ΩN is the volume of unit N -dimensional sphere
and the last equality is evaluated at the AH. Note that
although the geometric form is different, the masslike
function at the AH has the same value as that in Ein-
stein theory of gravity, which is the total energy inside
the AH. The entropy of the AH is
i(N − 1)
N − 2i+ 1
ĉir̃
N+1−2i
A . (29)
From Eqs. (28) and (29), we can easily check that dE =
adt = TAdSA holds with the horizon temperature
TA = 1/(2πr̃A). Using the Friedmann Eqs. (26) and
(27), we find the energy flow through the AH is dE =
NΩNHr̃
A (ρ+p), which is the same as that in Einstein’s
gravity.
By properly defining the masslike function in each the-
ory of gravity, we find that the corresponding Friedmann
equations can be written in the form dE = TAdSA of
the first law of thermodynamics at the AH. In other
words, the thermodynamic description of the gravita-
tional dynamics is manifest through the mass formulas.
Therefore, the gravitational dynamics can be considered
as the thermodynamic identity dE = TAdSA. This is
true for a variety of theories of gravity, including the
Einstein, Lovelock, nonlinear, and scalar-tensor theories.
This non-trivial connection between the thermodynamics
of the AH and the Friedmann equation may represent a
generic connection, and it suggests the unique role that
the AH can play in the thermodynamics of cosmology.
Such a thermodynamic description of the AH can also be
used to probe other physical systems and properties, such
as the nature of dark energy and the thermodynamics of
black holes in each of these theories.
Finally, we would like to note that, although the newly
defined masslike function reduces to the MS mass at the
AH, the corresponding energy flows passing through the
horizon are different. This explains why our masslike
function gives rise to the first law of thermodynamics
in various theories of gravity, while the MS mass does
not. Because of the masslike function, the energy mo-
mentum tensor includes the contribution of gravitational
fields such as BD scalars, or curvature scalars in non-
linear theory of gravity, in addition to the matter fields.
This treatment allows a reinterpretation of the nonequi-
librium correction introduced in [7]. The studies of other
properties of the newly-defined masslike function, includ-
ing the physical and geometrical difference between the
MS mass and it are important and should be reported
somewhere else.
Y.G. Gong is supported by NNSFC under Grants
No. 10447008 and 10605042, CMEC under Grant No.
KJ060502, and SRF for ROCS, State Education Min-
istry. A. Wang’s work was partially supported by a VPR
fund from Baylor University.
∗ gongyg@cqupt.edu.cn
† anzhong˙wang@baylor.edu
[1] J.M. Bardeen, B. Carter and S.W. Hawking, Commun.
Math. Phys. 31, 161 (1973).
[2] S.W. Hawking, Commun. Math. Phys. 43, 199 (1975);
46, 206(E) (1976).
[3] J.D. Bekenstein, Phys. Rev. D 7, 2333 (1973).
[4] T. Jacobson, Phys. Rev. Lett. 75, 1260 (1995).
[5] G.W. Gibbons and S.W. Hawking, Phys. Rev. D 15, 2738
(1977).
[6] R.G. Cai and S.P. Kim, J. High Energy Phys. 02, 050
(2005).
[7] C. Eling, R. Guedens and T. Jacobson, Phys. Rev. Lett.
96, 121301 (2006).
[8] C.M. Misner and D.H. Sharp, Phys. Rev 136, B571
(1964).
[9] E. Poisson and W. Israel, Phys. Rev. D 41, 1796 (1990).
[10] Y.G. Gong, B. Wang and A. Wang, J. Cosmol. Astropart.
Phys. 01, 024 (2007).
[11] M. Akbar and R.G. Cai, Phys. Lett. B 635, 7 (2006).
[12] X.-H. Ge, Phys. Lett. B 651, 49 (2007).
[13] T. Padmanabhan, Class. Quantum Grav. 19, 5387
(2002).
[14] D. Kothawala, S. Sarkar and T. Padmanabhan,
gr-qc/0701002.
[15] A. Paranjape, S. Sarkar and T. Padmanabhan, Phys.
Rev. D 74, 104015 (2006).
[16] S.A. Hayward, Class. Quantum Grav. 15, 3147 (1998).
[17] S. A. Hayward, S. Mukohyama and M.C. Ashworth,
Phys. Lett. A 256, 347 (1999).
[18] M. Akbar and R.G. Cai, Phys. Rev. D 75, 084003 (2007).
[19] R.G. Cai and L.M. Cao, Phys. Rev. D 75, 064008 (2007).
[20] R.G. Cai and L.M. Cao, Nucl. Phys. B 785, 135 (2007).
[21] M. Akbar and R.G. Cai, Phys. Lett. B 648, 243 (2007).
[22] A. Sheykhi, B. Wang and R.G. Cai, Nucl. Phys. B 779,
1 (2007).
[23] A. Sheykhi, B. Wang and R.G. Cai, Phys. Rev. D 76,
023515 (2007).
[24] Y.G. Gong, B. Wang and A. Wang, Phys. Rev. D 75,
mailto:gongyg@cqupt.edu.cn
mailto:anzhong_wang@baylor.edu
http://arxiv.org/abs/gr-qc/0701002
123516 (2007).
[25] C. Brans and R.H. Dicke, Phys. Rev. 124, 925 (1961).
[26] N. Sakai and J.D. Barrow, Class. Quantum Grav. 18,
4717 (2001).
[27] D. Lovelock, J. Math. Phys. 12, 498 (1971).
ABSTRACT
  With the help of a masslike function which has dimension of energy and equals
to the Misner-Sharp mass at the apparent horizon, we show that the first law of
thermodynamics of the apparent horizon $dE=T_AdS_A$ can be derived from the
Friedmann equation in various theories of gravity, including the Einstein,
Lovelock, nonlinear, and scalar-tensor theories. This result strongly suggests
that the relationship between the first law of thermodynamics of the apparent
horizon and the Friedmann equation is not just a simple coincidence, but rather
a more profound physical connection.

<|endoftext|><|startoftext|>
Constraints on the Interactions between Dark Matter and Baryons from the X-ray
Quantum Calorimetry Experiment
Adrienne L. Erickcek1,2, Paul J. Steinhardt2,3, Dan McCammon4, and Patrick C. McGuire5
Division of Physics, Mathematics, & Astronomy,
California Institute of Technology, Mail Code 103-33, Pasadena, CA 91125, USA
Department of Physics, Princeton University, Princeton, NJ 08544, USA
Princeton Center for Theoretical Physics, Princeton University, Princeton, NJ 08544, USA
Department of Physics, University of Wisconsin, Madison, WI 53706, USA and
McDonnell Center for the Space Sciences, Washington University, St. Louis, M0 63130, USA
Although the rocket-based X-ray Quantum Calorimetry (XQC) experiment was designed for X-
ray spectroscopy, the minimal shielding of its calorimeters, its low atmospheric overburden, and its
low-threshold detectors make it among the most sensitive instruments for detecting or constraining
strong interactions between dark matter particles and baryons. We use Monte Carlo simulations to
obtain the precise limits the XQC experiment places on spin-independent interactions between dark
matter and baryons, improving upon earlier analytical estimates. We find that the XQC experiment
rules out a wide range of nucleon-scattering cross sections centered around one barn for dark matter
particles with masses between 0.01 and 105 GeV. Our analysis also provides new constraints on
cases where only a fraction of the dark matter strongly interacts with baryons.
PACS numbers: 95.35.+d, 12.60.-i, 29.40.Vj
I. INTRODUCTION
From Vera Rubin’s discovery that the rotation curves
of galaxies remain level to radii much greater than pre-
dicted by Keplerian dynamics [1] to the Wilkinson Mi-
crowave Anisotropy Probe (WMAP) measurement of
the cosmic microwave background (CMB) temperature
anisotropy power spectrum [2], observations indicate that
the luminous matter we see is only a fraction of the
mass in the Universe. The three-year WMAP CMB
anisotropy spectrum is best-fit by a cosmological model
with Ωm = 0.241 ± 0.034 and a baryon density that is
less than one fifth of the total mass density. The cold
collisionless dark matter (CCDM) model has emerged
as the predominant paradigm for discussing the missing
mass problem. The dark matter is assumed to consist
of non-relativistic, non-baryonic, weakly interacting par-
ticles, often referred to as Weakly Interacting Massive
Particles (WIMPs).
Although the CCDM model successfully predicts ob-
served features of large-scale structure at scales greater
than one megaparsec [3], there are indications that it may
fail to match observations on smaller scales. Numerical
simulations of CCDM halos [4, 5, 6, 7, 8, 9, 10, 11, 12]
imply that CCDM halos have a density profile that in-
creases sharply at small radii (ρ ∼ r−1.2 according to
Ref. [12]). These predictions conflict with lensing ob-
servations of clusters [13, 14] that indicate the presence
of constant-density cores. X-ray observations of clusters
have found cores in some clusters, although density cusps
have also been observed [15, 16, 17]. On smaller scales,
observations of dwarf and low-surface-brightness galaxies
[18, 19, 20, 21, 22, 23, 24] indicate that these dark mat-
ter halos have constant-density cores with lower densities
than predicted by numerical simulations. Observations
also indicate that cores are predominant in spiral galaxies
as well, including the Milky Way [25, 26, 27]. Numerical
simulations of CCDM halos also predict more satellite
halos than are observed in the Local Group [28, 29] and
fossil groups [30].
Astrophysical explanations for the discord between the
density profiles predicted by CCDM simulations and ob-
servations have been proposed: for instance, dynamical
friction may transform density cusps into cores in the
inner regions of clusters [31], and the triaxiality of galac-
tic halos may mask the true nature of their inner den-
sity profiles [32]. There are also models of substructure
formation that explain the observed paucity of satellite
halos [33, 34, 35, 36].
Another possible explanation for the apparent failure
of the CCDM model to describe the observed features
of dark matter halos is that dark matter particles scat-
ter strongly off one another. The discrepancies between
observations and the CCDM model are alleviated if one
introduces a dark matter self-interaction that is compa-
rable in strength to the interaction cross section between
neutral baryons [37, 38]:
= 8× 10−25 − 1× 10−23cm2 GeV−1, (1)
where σDD is the cross section for scattering between dark
matter particles and mdm is the mass of the dark mat-
ter particle. Numerical simulations have shown that in-
troducing dark matter self-interactions within this range
reduces the central slope of the halo density profile and
reduces the central densities of halo cores, in addition to
destroying the extra substructure [39, 40].
The numerical coincidence between this dark mat-
ter self-interaction cross section and the known strong-
interaction cross section for neutron-neutron or neutron-
proton scattering has reinvigorated interest in the pos-
sibility that dark matter interacts with itself and with
http://arxiv.org/abs/0704.0794v2
baryons through the strong nuclear force. We refer to
dark matter of this type as “strongly interacting dark
matter” where “strong” refers specifically to the strong
nuclear force. Strongly interacting dark matter candi-
dates include the dibaryon [41, 42], the Q-ball [43], and
O-helium [44].
Surprisingly, the possibility that the dark matter may
be strongly interacting is not ruled out. While there are
numerous experiments searching for WIMPs, they are
largely insensitive to dark matter that interacts strongly
with baryons. The reason is that WIMP searches are typ-
ically conducted at or below ground level based on the
fact that WIMPs can easily penetrate the atmosphere
or the Earth, whereas strongly interacting dark matter is
multiply scattered and thermalized by the time it reaches
ground level and its thermal kinetic energy is too small
to produce detectable collisions with baryons in WIMP
detectors. Consequently, there are few experiments ca-
pable of detecting strongly interacting dark matter di-
rectly. Starkman et al. [45] summarized the constraints
on strongly interacting dark matter from experiments
prior to 1990, and these constraints were later refined
[38, 46, 47, 48]. The strength of dark matter interactions
with baryons may also be constrained by galactic dynam-
ics [45], cosmic rays [45, 49], Big Bang nucleosynthesis
(BBN) [49], the CMB [50], and large-scale structure [50].
The X-ray Quantum Calorimetry (XQC) project
launched a rocket-mounted micro-calorimeter array in
1999 [51]. At altitudes above 165 km, the XQC detector
collected data for a little less than two minutes. Although
its primary purpose was X-ray spectroscopy, the limited
amount of shielding in front of the calorimeters and the
low atmospheric overburden makes the XQC experiment
a sensitive detector of strongly interacting dark matter.
In this article, we present a new numerical analysis
of the constraints on spin-independent interactions be-
tween dark matter particles and baryons from the XQC
experiment using Monte Carlo simulations of dark mat-
ter particles interacting with the XQC detector and the
atmosphere above it. Our work is a significant improve-
ment upon the earlier analytic estimates presented by
some of us in Refs. [38, 48] because it accurately models
the dark matter particle’s interactions with the atmo-
sphere and the XQC instrument. Our calculation here
also supersedes the analytic estimate by Zaharijas and
Farrar [52] because they only considered a small por-
tion of the XQC data and did not include multiple scat-
tering events nor the overburden of the XQC detector.
We restrict our analysis to spin-independent interactions
because the XQC calorimeters are not highly sensitive
to spin-dependent interactions. Only a small fraction of
the target nuclei in the calorimeters have non-zero spin;
consequently, the bound on spin-dependent interactions
between baryons and dark matter from the XQC experi-
ment is about four orders of magnitude weaker than the
bound on spin-independent interactions [52].
This article is organized as follows. In Section II we
summarize the specifications of the XQC detector. We
then review dark matter detection theory in Section III.
This Section includes a discussion of coherent versus in-
coherent scattering and how we account for the loss of
coherence in our analysis. A complete description of our
analysis follows in IV, and our results are presented in
Section V. Finally, in Section VI, we summarize our
findings and compare the constraints to strongly inter-
acting dark matter from the XQC experiment to those
from other experiments.
II. THE XQC EXPERIMENT
Calorimetry is the use of temperature deviations to
measure changes in the internal energy of a material. By
drastically reducing the specific heat of the absorbing
material, the use of cryogenics in calorimetry allows the
absorbing object to have a macroscopic volume and still
be sensitive to minute changes in energy. These detectors
are sensitive enough to register the energy deposited by a
single photon or particle and gave birth to the technique
of “quantum calorimetry,” the thermal measurement of
energy quanta.
The quantum calorimetry experiment [51] we use to
constrain interactions between dark matter particles and
baryons is the second rocket-born experiment in the XQC
(X-ray Quantum Calorimetry) Project, a joint under-
taking of the University of Wisconsin and the Goddard
Space Flight Center [53, 54]. It launched on March 28,
1999 and collected about 100 seconds of data at alti-
tudes between 165 and 225 km above the Earth’s surface.
The detector consisted of thirty-four quantum calorime-
ters operating at a temperature of 0.06 K; for detailed
information on the XQC detector functions, please refer
to Refs. [51, 54]. These detectors were separated from
the exterior of the rocket by five thin filter panes [51].
The small atmospheric overburden at this altitude and
the minimal amount of shielding in front of the calorime-
ters makes this experiment a promising probe of strongly
interacting dark matter.
The absorbers in the XQC calorimeters are composed
of a thin film of HgTe (0.96 µm thick) deposited on a sili-
con (Si) substrate that is 14 µm thick. The absorbers rest
on silicon spacers and silicon pixel bodies. Figs. 1 and
2 show side and top views of the detectors with the di-
mensions of each layer. Temperature changes in all four
components are measured by the calorimeter’s internal
thermometer. The calorimeters report the average tem-
perature over an integration time of 7 ms in order to
reduce the effect of random temperature fluctuations on
the measurement. Multiple scatterings by a dark matter
particle will register as a single event because the time it
takes the dark matter particle to make its way through
the calorimeter is small compared to the integration time.
The detector array consists of two rows of detec-
tors, with seventeen active calorimeters and one inactive
calorimeter in each row, and is located at the bottom of
a conical detector chamber. Within a 32-degree angle
   Si Substrate
0.5 mm x 2.0 mm
14 µm
12 µm
HgTe Absorber
0.96µm
      Si Pixel Body
0.25 mm x 1.0 mm
Si Spacer
0.245 mm x 0.245 mm
FIG. 1: A vertical cross section of an XQC calorimeter. The
relative thicknesses of the layers are drawn to scale, as are
their relative lengths, but the two scales are not the same. To
facilitate the display of the layers, the vertical dimension has
been stretched relative to the horizontal dimension.
2.0 mm
1.0 mm
Absorber Panel
  (HgTe on Si)
Pixel Body
     (Si)
   Spacer
     (Si)
  0.245 mm 
x 0.245 mm
FIG. 2: A top view of an XQC calorimeter. The absorber is
the top layer and underneath it lies the spacer, followed by
the pixel body. These dimensions are drawn to scale.
from the detector normal, the incoming particles only
pass through the aforementioned filters. The five filters
are located 2 mm, 6 mm, 9 mm, 11 mm and 28 mm above
the detectors. Each filter consists of a thin layer of alu-
minum (150 Å) supported on a parylene (CH) substrate
(1380 Å). The pressure inside the chamber is less than
10−6 Torr. At this level of evacuation, a dark matter
particle with a mass of 106 GeV and a baryon interac-
tion cross section of 106 barns, would have less than a
20% chance of colliding with an air atom in the cham-
ber. Therefore, we assume that the chamber is a perfect
vacuum in our analysis.
While the atmospheric pressure at the altitudes at
which the detector operated is about 10−8 times the at-
mospheric pressure at sea level, the atmospheric over-
burden of the XQC detector is still sufficient to scat-
ter incoming strongly interacting dark matter particles.
Simulating a dark matter particle’s path through the
atmosphere requires number-density profiles for all the
molecules in the atmosphere. These profiles were ob-
tained using the MSIS-E-90 model1 for the time (1999
1 Available at http://modelweb.gsfc.nasa.gov/models/msis.html
FIG. 3: The points depict the MSIS-E-90 density profiles for
the seven most prevalent constituents of the atmosphere above
the XQC detector, and the lines show the piecewise exponen-
tial fits used in our analysis.
March 28 9:00 UT) and location (White Sands Missile
Range, New Mexico) of the XQC rocket launch.
During the data collection period, the average altitude
of the XQC rocket was 201.747 km. At this altitude and
above, the primary constituents of the atmosphere are
molecular and atomic oxygen, molecular and atomic ni-
trogen, helium, atomic hydrogen, and argon. The MSIS-
E-90 model provides tables of the number densities of
each of these seven chemical species. In our analysis,
computational efficiency demanded that we fit analytic
functions to these data. We found exponential fits for
the density profiles in three altitude ranges: 200-300 km,
300-500 km and 500-1000 km. The error in the proba-
bility of a collision between a dark matter particle and
an element of the atmosphere introduced by using these
fits instead of the original data is 0.02%. Fig. 3 shows
the number density profiles provided by the MSIS-E-90
model and the exponential fits used to model the data.
The XQC detector collected data for a total of 150
seconds. During these 150 seconds of activity, the thirty-
four individual calorimeters were not all operational at
all times. Furthermore, events that could not be ac-
curately measured by the calorimeters and events at-
tributed to cosmic rays hitting the base of the detec-
tor array were removed from the XQC spectrum, and
these cuts also contribute to the dead time of the sys-
tem. Specifically, events that arrived too close together
for the calorimeters to accurately measure distinct ener-
gies were discarded. This criterion removed 12% of the
observed events and the resulting loss of sensitivity was
http://modelweb.gsfc.nasa.gov/models/msis.html
FIG. 4: Top panel: The XQC energy spectrum from 0 - 4
keV in 5 eV bins. This spectrum does not have non-linearity
corrections applied (see Ref. [51]), so the calibration lines at
3312 eV and 3590 eV appear slightly below their actual en-
ergies. The cluster of counts to the left of each calibration
peak result from X-rays passing through the HgTe layer and
being absorbed in the Si components where up to 12% of the
energy may then be trapped in metastable states. Bottom
panel: The XQC energy spectrum from 0 - 2.5 keV in 5 eV
bins. This spectrum, combined with the over-saturation rate
of 0.6 events per second with energies greater than 4000 eV,
was used in our analysis.
included in the dead time of the calorimeters. When a
cosmic ray penetrates the silicon base of the detector ar-
ray, the resulting temperature increase is expected to reg-
ister as multiple, nearly simultaneous, low-energy events
on nearby calorimeters. To remove these events from the
spectrum, we cut out events that were part of either a
pair of events in adjacent detectors or a trio of events
in any of the detectors that arrived within 3 ms of each
other and had energies less than 2.5 keV. This procedure
was expected to remove more than 97% of the events that
resulted from cosmic rays hitting the base of the array.
Nearly all of the events attributed to heating from cosmic
rays had energies less than 300 eV, and a high fraction
of the observed low-energy events were included in this
cut. For example, seventeen of the observed twenty-four
events with energies less than 100 eV were removed. The
expected loss of sensitivity due to events being falsely
attributed to cosmic rays was included in the calculated
dead time of the calorimeters. Once all the dead time is
accounted for, the 150 seconds of data collection is equiv-
alent to 100.7 seconds of observation with all thirty-four
calorimeters operational.
The XQC calorimeters are capable of detecting energy
deposits that exceed 20 eV, but full sensitivity is not
reached until the energy surpasses 36 eV, and for approx-
imately half of the detection time, the detector’s lower
threshold was set to 120 eV. The calorimeters cannot re-
solve energies above 4 keV, and the 2.5-4 keV spectrum is
dominated by the detector’s interior calibration source:
a ring of 2µCi 41Ca that generates Kα and Kβ lines at
3312 eV and 3590 eV, respectively. We refer the reader
to Ref. [51] for a complete discussion of the calibration of
the detector. These limitations restrict the useful portion
of the XQC spectrum to 0.03-2.5 keV. This spectrum is
shown in Fig. 4, along with the full spectrum from 0-4
keV. The XQC field of view was centered on a region of
the sky known to have an enhanced X-ray background
in the 100-300 eV range, possibly due to hot gas in the
halo, and this surge in counts can be seen in Fig. 4. In
addition to the information present in this spectrum, we
know that the XQC detector observed an average over-
saturation event rate of 0.6 per second. This corresponds
to a total of 60 events that deposited more than 4000 eV
in a calorimeter. In Section IVB, we describe how we use
the observed spectrum between 29 eV and 2500 eV and
the integrated over-saturation rate to constrain the total
cross section for elastic scattering between dark matter
particles and nucleons.
III. DETECTING DARK MATTER
A. Incidence of dark matter particles
The expected flux of dark matter particles into the de-
tector depends on the density of the dark matter halo in
the Solar System. Unfortunately, the local dark matter
density is unknown and the range of theoretical predic-
tions is wide. By constructing numerous models of our
galaxy with various dark matter density profiles and halo
characteristics, rejecting those models that contradict ob-
servations, and finding the distribution of local dark mat-
ter densities in the remaining viable models, Ref. [55] pre-
dicted that the local dark matter density is between 0.3
and 0.7 GeV cm−3 assuming that the dark matter halo
is flattened, and the predicted local density decreases as
the halo is taken to be more spherical. Another approach
[56] used numerical simulations of galaxies similar to our
own to find the dark matter density profile and then fit
the profile parameters to Galactic observations, predict-
ing a mean local dark matter density between 0.18 GeV
cm−3 and 0.30 GeV cm−3. Given that it lies in the in-
tersection of these two ranges, we use the standard value
of 0.3 GeV cm−3 for the local dark matter density in
our primary analysis. This assumption ignores the possi-
ble presence of dark matter streams or minihalos, which
do occur in numerical simulations [56] and could lead to
local deviations from the mean dark matter density.
We also assume that the velocities of the dark matter
particles with respect to the halo are isotropic and have
a bounded Maxwellian distribution: the probability that
a particle has a velocity within a differential volume in
velocity-space centered around a given velocity ~v is
P (~v) =
d3~v if v ≤ vesc,
0 if v > vesc.
where v0 is the dispersion velocity of the halo, vesc is the
Galactic escape velocity at the Sun’s position, and k is a
normalization factor [57]:
k = (πv20)
. (3)
Numerical simulations indicate that dark matter particle
velocities may not have an isotropic Maxwellian distri-
bution [56]. Ref. [58] examines how assuming a more
complicated velocity distribution would alter the flux of
dark matter particles into an Earth-based detector.
Given the flat rotation curve of the spiral disk at the
Sun’s radius and beyond and assuming a spherical halo,
the local dispersion speed v0 is the maximum rotational
velocity of the Galaxy vc [59]. Reported values for the
rotational speed include 222±20 km s−1 [60], 228±19 km
s−1 [61], 184± 8 km s−1 [62] and 230 ± 30 km s−1 [63].
Recent measurements of the Galaxy’s angular velocity
have yielded values of Ωgal = 28 ± 2 km s−1 kpc−1 [64]
and 32.8 ± 2 km s−1 kpc−1 [65]. If the Sun is located
8.0 kpc from the Galactic center, these angular velocities
correspond to tangential velocities 224± 16 km s−1 and
262 ± 16 km s−1 respectively. We adopt vc = 220 ± 30
km s−1 as a centrally conservative value for the Galaxy’s
circular velocity at the Sun’s location.
The final parameter we need to obtain the dark mat-
ter’s velocity distribution is the escape velocity in the So-
lar vicinity. The largest observed stellar velocity at the
Sun’s radius in the Milky Way is 475 km s−1, which es-
tablishes a lower bound for the local escape velocity [66].
Ref. [67] used the radial motion of Carney-Latham stars
to determine that the escape velocity is between 450 and
650 km s−1 to 90% confidence, and Ref. [68] obtained a
90% confidence interval of 498 to 608 km s−1 from ob-
servations of high-velocity stars. A kinematic derivation
of the escape velocity [59] gives
v2esc = 2v
1 + ln
, (4)
where R0 is the distance from the Sun to the center of the
Galaxy, and Rgal is radius of the Galaxy. Observations of
other galaxies suggest that our galaxy extends to about
100 kpc [59], and observations of Galactic satellites indi-
cate that the Galaxy’s flat rotation curve extends to at
least 110 kpc [63]. The commonly accepted value for the
Solar radius is R0 = 8.0 kpc [69]. Recent measurements
include R0 = 7.9± 0.3 kpc [70] and R0 = 8.01± 0.44 kpc
[71], and a compilation of measurements over the past
decade [71] yields an average value of R0 = 7.80 ± 0.33
kpc. To estimate the escape velocity, we use 100 kpc as
a conservative estimate of the Galactic radius and the
standard value R0 = 8.0 kpc. These parameters, com-
bined with vc = 220 km s
−1, predict an escape velocity
of 584 km s−1, which falls near the middle of the ranges
proposed in Refs. [67, 68].
The isotropic Maxwellian velocity distribution given
by Eq. (2) specifies the dark matter particles’ motion
relative to the halo. However, we are interested in their
motion relative to the XQC detector: ~vobserved = ~vdm −
~vdetector where the latter two velocities are measured with
respect to the halo. The velocity of the detector with
respect to the halo has three components: the velocity of
the Sun relative to halo, the velocity of the Earth with
respect to the Sun, and the velocity of the detector with
respect to the Earth.
When discussing these velocities, it is useful to de-
fine a Galactic Cartesian coordinate system. In Galactic
coordinates, the Sun is located at the origin, and the
xy-plane is defined by the Galactic disk. The x-axis
points toward the center of the Galaxy, and the y-axis
points in the direction of the Sun’s tangential velocity
as it revolves around the Galactic center. The z-axis
points toward the north Galactic pole and is antiparal-
lel to the angular momentum of the rotating disk. The
motion of the Sun through the halo has two compo-
nents. First, there is the Sun’s rotational velocity as
it orbits the Galactic center: vc in the y direction. Sec-
ond, there is the motion of the Sun relative to the spiral
disk [72]: ~v⊙ = (10.00 ± 0.36, 5.25 ± 0.62, 7.17 ± 0.38)
km s−1 in Galactic Cartesian coordinates. When the
Earth’s motion through the Solar System during its an-
nual orbit of the Sun is expressed in Galactic coor-
dinates [57], the resulting velocity at the time of the
XQC experiment (7.3 days after the vernal equinox) is
~vEarth = (29.14, 5.330,−3.597) km s−1.
The final consideration is the velocity of the detector
relative to the Earth. The maximum velocity attained by
the XQC rocket was less than 1.2 km s−1. This velocity is
insignificant compared to the motion of the Sun relative
to the halo. Moreover, the XQC detector collected data
while the rocket rose and while it fell, and the average
velocity of the rocket was only 0.104 km s−1. Therefore,
we neglect the motion of the rocket in the calculation of
the dark matter wind. Combining the motion of the Sun
and the Earth then gives the total velocity of the XQC
detector with respect to the halo during the experiment
in Galactic Cartesian coordinates: ~vdetector = (39.14 ±
0.36, 230.5 ± 30, 3.573 ± 0.38) km s−1. Subtracting the
velocity vector of the detector relative to the halo from
the velocity vector of the dark matter relative to the halo
gives the dark matter’s velocity relative to the detector in
Galactic coordinates. However, we want the dark matter
particles’ velocities in the coordinate frame defined by
the detector, where the z-axis is the field-of-view vector.
The XQC field of view was centered on l = 90◦, b = +60◦
in Galactic latitude and longitude [51], so the rotation
from Galactic coordinates to detector coordinates may
be described as a clockwise 30◦ rotation of the z-axis
around the x-axis, which is taken to be the same in both
coordinate systems.
B. Dark Matter Interactions
Calorimetry measures the kinetic energy transferred
from the dark matter to the absorbing material without
regard for the specific mechanism of the scattering or
any other interactions. Consequently, the dark matter
detection rate for a calorimeter depends only on the mass
of the dark matter particle and the total cross section for
elastic scattering between the dark matter particle and an
atomic nucleus of mass number A, which is proportional
to the cross section for dark matter interactions with a
single nucleon (σDn). The calorimeter measures the recoil
energy of the target nucleus (mass mT),
Erec =
2mTmdm
(mT +mdm)2
(1 − cos θCM), (5)
where mdm and vdm are dark matter particle’s mass and
velocity prior to the collision in rest frame of the target
nucleus and θCM is the scattering angle in the center-of-
mass frame.
If the momentum transferred to the nucleus, q2 =
2mTErec, is small enough that the corresponding de
Broglie wavelength is larger than the radius R of the
nucleus (qR ≪ ~), then the scattering is coherent. In co-
herent scattering, the scattering amplitudes for each in-
dividual component in the conglomerate body are added
prior to the calculation of the cross section, so the total
cross section is proportional to the square of the mass
number of the target nucleus. Including kinematic fac-
tors [45, 73], the cross section for coherent scattering off
a nucleus is given by
σcoh(A) = A
mred(DM,Nuc)
mred(DM, n)
σDn, (6)
where mred(DM,Nuc) is the reduced mass of the nucleus
and the dark matter particle, mred(DM, n) is the reduced
mass of a nucleon and the dark matter particle, and A is
the mass number of the nucleus. Coherent scattering is
isotropic in the center-of-mass frame of the collision.
Dark matter particles may be massive and fast-moving
enough that the scattering is not completely coherent
when the target nucleus is large [74]. When the scat-
tering is incoherent, the dark matter particle “sees” the
internal structure of the nucleus, and the cross section for
scattering is reduced by a “form factor,” which is a func-
tion of the momentum transferred to the nucleus during
the collision (q) and the nuclear radius (R):
F 2(q, R). (7)
Since q depends on the recoil energy, which in turn de-
pends on the scattering angle, incoherent scattering is
not isotropic.
In this discussion of coherence, we have neglected the
possible effects of the dark matter particle’s internal
structure by assuming that σDn is independent of recoil
energy. If the dark matter particle is not point-like then
σDn decreases as the recoil momentum increases due to a
loss of coherence within the dark matter particle. Inco-
herence within the dark matter particle has observational
consequences [75], but these effects depend on the size of
the dark matter particle. To avoid restricting ourselves
to a particular dark matter model, we assume that the
dark matter particle is small enough that nucleon scat-
tering is always coherent; when we discuss incoherence,
we are referring to the effects of the nucleus’s internal
structure.
According to the Born approximation, the form fac-
tor for nuclear scattering defined in Eq. (7) is the
Fourier transform of the nuclear ground-state mass den-
sity [57, 76]. The most common choice for the form factor
[74, 77] is F 2(q, R) = exp[−(qRrms)2/(3~2)], where Rrms
is the root-mean-square radius of the nucleus. For a solid
sphere, R2rms = (3/5)R
2, so this form factor is equivalent
to the form factor used in Ref. [52]. This form factor is
an accurate approximation of the Fourier transform of
a solid sphere for (qR)/~ ∼< 2, but it grossly underesti-
mates the reduction in σ for larger values of q [57]. The
maximum speed of a dark matter particle with respect
to the XQC detector is ∼ 800 km s−1 (escape velocity
+ detector velocity), and at that speed, the maximum
possible value of qR/~ for a collision with a Hg nucleus
(A = 200) is nearly ten for a 100 GeV dark matter parti-
cle, and the maximum possible value of qR/~ increases as
the mass of the dark matter particle increases. Clearly,
this approximation is not appropriate for a large portion
of the dark matter parameter space probed by the XQC
experiment.
Furthermore, a solid sphere is not a very realistic model
of the nucleus. A more accurate model of the nuclear
mass density is ρ(r) =
d3r′ρ0(r
′)ρ1(r− r′), where ρ0 is
constant inside a radius R20 = R
2 − 5s2 and zero beyond
that radius and ρ1 = exp[−r2/(2s2)], where s is a “skin
thickness” for the nucleus [78]. The resulting form factor
F (q, R) = 3
sin(qR/~)− (qR/~) cos(qR/~)
(qR/~)3
× exp
(qs/~)2
. (8)
We follow Ref. [57] in setting the parameters in Eq. (8):
s = 0.9 fm and
R2 = [(1.23A1/3 − 0.6)2 + 0.631π2 − 5s2] fm2, (9)
where A is the mass number of the target nucleus.
Despite its simple analytic form, the form factor given
by Eq. (8) is computationally costly to evaluate repeat-
edly. We use an approximation:
F 2 =
0.9 fm
if qR
9(0.81)
(qR/~)4
0.9 fm
if qR
The low-q approximation combines the standard approx-
imation for the solid sphere with the factor accounting
for the skin depth of the nucleus. The high-q approxi-
mation was derived from the asymptotic form of the first
spherical Bessel function and normalized so that the to-
tal cross section is as close as possible to the exact result.
The error in the total cross section due to the use of the
approximation is less than 1% for nearly all dark mat-
ter masses; the sole exception is mdm ∼ 10 − 100 GeV,
and even then the error is less than 5%. Unless other-
wise noted, we use this approximation for the form factor
throughout this analysis. We also assume that the dark
matter particle does not interact with nuclei in any way
other than elastic scattering.
IV. ANALYSIS OF XQC CONSTRAINTS
To obtain an accurate description of the XQC experi-
ment’s ability to detect strongly interacting dark matter
particles, we turned to Monte Carlo simulations. The
Monte Carlo code we wrote to analyze the XQC experi-
ment simulates a dark matter particle’s journey through
the atmosphere to the XQC detector chamber, its path
through the detector chamber to a calorimeter, and its in-
teraction with the sensitive components of the calorime-
ter. This latter portion of the code also records how much
energy the particle deposits in the calorimeter through
scattering. The results of several such simulations for the
same set of dark matter properties may be used to pre-
dict the likelihood that a given dark matter particle will
deposit a particular amount of energy into the calorime-
ter. These probabilities of various energy deposits predict
the recoil-energy spectrum the XQC detector would ob-
serve if the dark matter particles have a given mass and
nucleon-scattering cross section. This simulated spec-
trum may then be compared to the XQC data to find
which dark matter parameters are excluded by the XQC
experiment.
A. Generating Simulated Energy-Recoil Spectra
The basic subroutine in our Monte Carlo algorithm
is the step procedure. The step procedure begins with
a particle with a certain velocity vector and position in
a given material and moves the particle a certain dis-
tance in the material, returning its new position and ve-
locity. The step procedure also determines whether or
not a scattering event occurred during the particle’s trek
and updates the velocity accordingly. The number of ex-
pected collisions in a step of length l through a material
with target number density n is n× σtot × l, where σtot
is the total scattering cross section obtained by integrat-
ing Eq. (7) over the scattering angle, or equivalently, the
recoil momentum q:
σtot =
q2max
F 2(q, R) dq2, (11)
where qmax is the maximum possible recoil momentum.
The step length l is chosen so that it is at most a tenth
of the mean free path through the material, so the num-
ber of expected collisions is less than one and represents
the probability of a collision. After each step, a ran-
dom number between zero and one is generated using
the “Mersenne Twister” (MT) algorithm [79] and if that
random number is less than the probability of a collision,
the particle’s energy and trajectory are updated. First, a
recoil momentum is selected according to the probability
distribution P (q2) = F 2(q, R)σcoh/(q
maxσtot), where the
exact form factor is used for qR/~ > 2 so that the oscilla-
tory nature of the form factor is not lost. The recoil mo-
mentum determines the recoil energy and the scattering
angle in the center-of-mass frame through Eq. (5). The
scattering is axisymmetric around the scattering axis, so
the azimuthal angle is assigned a random value between
0 and 2π. The scattering angles are used to update the
particle’s trajectory, and its speed is decreased in accor-
dance with the kinetic energy transferred to the target
nucleus. The step subroutine repeats until the particle
exits the simulation, or its kinetic energy falls below 0.1
eV, or the energy deposited in the calorimeter exceeds
the saturation point of 4000 eV.
Our simulation treats the atmosphere as a 4.6×4.6 cm
square column with periodic boundary conditions, the
bottom face of which covers the top of the conical detec-
tor chamber described in Section II. This implementation
assumes that for every particle that exits one side of the
column, there is a particle that enters the column from
the opposite side with the same velocity. The infinite
extent of the atmosphere and its translational invariance
makes this assumption reasonable. The atmosphere col-
umn extends to an altitude 1000 km; increasing the at-
mosphere height beyond 1000 km has a negligible effect
on the total number of collisions in the atmosphere. The
simulation begins with a dark matter particle at the top
of the atmosphere column at a random initial position
on the 4.6×4.6 cm square. Its initial velocity with re-
spect to the dark matter halo is selected according to the
isotropic Maxwellian velocity distribution function given
by Eq. (2), and then the velocity relative to the detector
is found via the procedure described in Section IIIA.
The dark matter particle’s path from the top of the at-
mosphere to the detector is modeled using the step pro-
cedure described above. The simulation of the particle’s
interaction with the atmosphere ends if the particle’s al-
titude exceeds 1000 km or if the particle falls below the
height of the XQC rocket. We use the time-averaged alti-
tude (201.747 km) as the constant altitude of the rocket.
We made this simplification because it allows us to ig-
nore the periodic inactivity of each calorimeter and treat
the detector as thirty-four calorimeters that are active for
100.7 seconds. When the dark matter particle hits the
rocket, its path through the five filter layers is also mod-
eled using the step procedure, as is its path through the
calorimeters. In addition to being smaller than the mean
free path, the step length is chosen so that the particle’s
position relative to the boundaries of the detectors is ac-
curately modeled. The simulation ends when the dark
matter particle’s random-walk trajectory takes it out of
the detector chamber. As mentioned in Section II, the
calorimeter detects the sum of all the recoil energies if
the dark matter particle is scattered multiple times.
When the dark matter particle is unlikely to experience
more than one collision in the calorimeter, this simula-
tion is far more detailed than is required to accurately
predict the energy deposited by the dark matter parti-
cle. This is the case for the lightest (mdm ≤ 102 GeV)
and weakest-interacting (σDn ≤ 10−26 cm2) dark mat-
ter particles that the XQC calorimeters are capable of
detecting. Since the lightest dark matter particles are
also the most numerous, many Monte Carlo trials are
required to sample all the possible outcomes of a dark
matter particle’s encounter with the detector. The sim-
ulation described above is too computationally intensive
to run that many trials, so we used a faster and simpler
simulation to model the interactions of these dark matter
particles. This simulation assumes that the particle will
experience at most one collision in the atmosphere and
at most one collision in each filter layer and each layer of
the calorimeter. The simulation ends if the probability
of two scattering events in either the atmosphere or any
of the filter layers exceeds 0.1. Instead of tracking the
dark matter particle’s path through the atmosphere, the
total overburden for the atmosphere is used to determine
the probability that the dark matter particle scatters in
the atmosphere, and the particle only reaches the detec-
tor if its velocity vector points toward the detector after
the one allowed scattering event. Also, instead of the
small step lengths required to accurately model the ran-
dom walk of a strongly interacting particle, each layer is
crossed with a single step. These simplifications reduce
the runtime of the simulation by a factor of 100, making
it possible to run 1010 trials in less than one day.
B. Comparing the Simulations to the XQC Data
In order to compare the probability spectra produced
by our Monte Carlo routine to the results of the XQC
experiment, we must multiply the probabilities by the
number of dark matter particles that are encountered by
the initial surface of the Monte Carlo routine. When the
initial velocity of the dark matter particle is chosen, the
initial velocity may point toward or away from the detec-
tor; in the latter case, the trial ends immediately. Con-
FIG. 5: Simulated event spectra for dark matter particles with
masses of 1, 10 and 100 GeV and a total nucleon-scattering
cross section of 10−27.3 cm2. In addition to the events de-
picted in these spectra, the simulations predict 1300 ± 160
events with energies greater than 4000 eV when mdm = 10
GeV and 10, 000 ± 1200 such events when mdm = 100 GeV.
The histogram represents the XQC observations.
sequently, the Monte Carlo probability that the particle
deposits no energy in the calorimeter already includes the
probability that the dark matter particle does not have a
halo trajectory that takes it into the atmosphere. There-
fore, the probabilities resulting from the Monte Carlo
routine should be multiplied by the number of particles
in the volume swept out by the initial 4.6×4.6 cm2 square
surface during the 100.7f(E) seconds of observation time,
where f is the fraction of the observing time that the
XQC detector was sensitive to deposits of energy E. For
energies between 36 and 88 eV, f is 0.5083, and the value
of f increases to one over energies between 88 and 128 eV.
The detector was also slightly sensitive to lower energies:
between 29 and 35 eV, f increases from 0.3815 to 0.5083.
The normal of the initial surface points along the detec-
tor’s field of view, and the surface moves with the detec-
tor; using the detector velocity given in Section IIIA, the
number of dark matter particles encountered by the ini-
tial surface is Ndm = f × (ρdm/mdm)× [(2.5± 0.3)× 1010
cm3], where ρdm is the local dark matter density.
The simulated event spectra produced by our Monte
Carlo routine indicate that particles with masses less
than 1 GeV very rarely deposit more than 100 eV inside
the XQC calorimeters. Conversely, particles with masses
greater than 100 GeV nearly always deposit more than
4000 eV when they interact with the XQC calorimeters,
so constraints on σDn for these mdm values arise from the
FIG. 6: Simulated event spectra for 10-GeV dark mat-
ter particles with total nucleon-scattering cross sections of
10−21.6, 10−27.3 and 10−28.3 cm2. In addition to the events
depicted in these spectra, the simulations predict 140 ± 37
events with energies greater than 4000 eV when σDn = 10
−21.6
cm2, 1300 ± 160 such events when σDn = 10
−27.3 cm2, and
120±15 such events when σDn = 10
−28.3 cm2. The histogram
represents the XQC observations.
over-saturation (E ≥ 4000 eV) event rate. Fig. 5 shows
simulated spectra for three mdm values that lie between
these two extremes, along with a histogram that depicts
the XQC observations. Given an initial velocity of 300
km s−1 relative to the XQC detector, a 1-GeV particle
can only deposit up to 66 eV in a single collision with an
Si nucleus, so the spectrum for these particles is confined
to very low energies. Meanwhile, a 10-GeV particle and
a 100-GeV particle with the same initial velocity can de-
posit up to 900 eV and 44,000 eV, respectively, in a single
collision with an Hg nucleus. In fact, ignoring any loss
of coherence, all recoil energies between 0 and 44,000 eV
are equally likely during a collision between an Hg nu-
cleus and a 100-GeV dark matter particle. That’s why
the mdm = 100 GeV spectrum in Fig. 5 is flat below 2500
eV and why the simulations predict 10,000 events with
energies greater than 4000 eV for this value of mdm and
Fig. 6 shows how changing the total cross section for
elastic scattering off a nucleon affects the simulated spec-
tra generated by our Monte Carlo routine for a single
dark matter particle mass (mdm = 10 GeV). We see that
increasing σDn from 10
−28.3 cm2 to 10−27.3 cm2 increases
all of the counts by a factor of ten but leaves the ba-
sic shape of the spectrum unchanged. For much larger
values of σDn, however, the particle loses a considerable
Energy Range (eV) Counts Energy Range (eV) Counts
29 - 36 0 945 - 1100 31
36 - 128 11 1100 - 1310 30
128 - 300 129 1310 - 1500 29
300 - 540 80 1500 -1810 32
540 - 700 90 1810 - 2505 15
700 - 800 32 ≥ 4000 60
800 - 945 48
TABLE I: The binned XQC results used for comparison with
our Monte Carlo simulations.
amount of its energy while traveling through the atmo-
sphere. Consequently, high-energy recoil events become
less frequent, as shown by the spectrum for σDn = 10
−21.6
cm2. For larger values of σDn, too much energy is lost in
the atmosphere for the particle to be detectable by the
XQC experiment.
When comparing the simulated measurements to the
XQC data, we group the events into the thirteen energy
bins given in Table I. We generally use large bins be-
cause it reduces the fractional error in the probabilities
generated by our Monte Carlo routine by increasing the
probability of each bin: δpi/pi = 1/
pit, where t is the
number of trials and pi is the probability of an energy
deposit in the ith bin. Given that the number of trials is
limited by runtime constraints, increasing the bin size is
often the only way to obtain bin probabilities with δpi/pi
values much less than one. When choosing our binning
scheme, we attempted to maximize bin size while pre-
serving as many features of the observed spectrum as
possible. We also grouped all energies for which f 6= 1
into two bins; we ignore the variation in f within these
bins and set f = 0.3815 in the lowest-energy bin and
f = 0.5083 in the next-to-lowest bin.
Unfortunately, we do not know the number of X-ray
events in any of the bins listed in Table I. We considered
using a model to subtract off the X-ray background but,
given any model’s questionable accuracy, we decided not
to use it in our analysis. Our ignorance of the X-ray
background forces us to treat the number of observed
counts in each bin as an upper limit on the number of
dark matter events in that energy range. Consequently,
we define a parameter X2 that measures the extent of
the discrepancy between the simulated results for a given
mdm and σDn and the XQC observations while ignoring
bins in which the observed event count exceeds the pre-
dicted contribution from dark matter:
i=# of Bins
(Ei − Ui)2
with Ui < Ei
, (12)
where Ei = Ndm × pi is the number of counts in the ith
bin predicted by the Monte Carlo simulation and Ui is
the number of observed counts in the same bin. We use
a second Monte Carlo routine to determine how likely it
is that a set of observations would give a value of X2 as
large or larger than the one derived from the XQC data
given a mean signal described by the set of Ei derived
from the simulation Monte Carlo.
In the comparison Monte Carlo, a trial begins by gen-
erating a new set of Ei by sampling the error distribu-
tions of Ndm and pi. The distribution of Ndm values is
assumed to be Gaussian with the mean and standard de-
viation given above. The probability pi is derived from
pi× t events in the simulation Monte Carlo (recall that t
is the number of trials), so a new value for pi is generated
by sampling a Poisson distribution with a mean of pi × t
and dividing the resulting number by t. Once a new set
of Ei has been found, the routine generates a simulated
number of observed counts for each bin according to a
Poisson distribution with a mean of Ei. The value of the
X2 parameter for the new Ei and Ui is computed and
compared to the value for the original Ei and the XQC
observations, X2XQC. The number of trials needed to ac-
curately measure the probability P(X) thatX2 ≥ X2XQC
is determined by requiring that the variation in the mean
value of X2 over ten Monte Carlo simulations does not
exceed (100-C)%, where C% is the desired confidence
level and that the range P(X)±(5× the variation in P(X))
does not contain (100− C)/100.
V. RESULTS AND DISCUSSION
The XQC experiment rules out the enclosed region in
(mdm, σDn) parameter space shown in Fig. 7. The over-
burden from the atmosphere and the filtering layers as-
sures that there will be a limit to how strongly a dark
matter particle can interact with baryons and still reach
the XQC calorimeters; this overburden is responsible for
the top edge of the exclusion region. Conversely, if σDn is
too small, the dark matter particles will pass through the
calorimeters without interacting. The low-energy thresh-
old of the XQC calorimeters places a lower bound on the
excluded dark matter particle masses; ifmdm is too small,
then the recoil energies are undetectable. On the other
side of the mass range, the XQC detector is not sensitive
to mdm ∼> 105 GeV because the number density of such
massive dark matter particles is too small for the XQC
experiment to detect.
The exclusion region shown in Fig. 7 has a complicated
shape, but its features are readily explicable. As mdm in-
creases, the range of excluded σDn values shifts to lower
values and then moves up again. The downward shift for
mdm between 0.1 GeV and 100 GeV is due to the effects of
coherent nuclear scattering. Since σcoh increases with in-
creasing mdm for fixed σDn, a 100-GeV particle interacts
more strongly in the atmosphere and in the detector than
a 1-GeV particle with the same σDn. Consequently, both
the upper and lower boundaries of the excluded region
decrease with increasing mass for mdm ∼< 100 GeV. The
scattering of dark matter particles with larger masses is
incoherent, and the form factor discussed in Section III B
FIG. 7: The region of dark matter parameter space excluded
by the XQC experiment; σDn is the total cross section for scat-
tering off a nucleon and mdm is the mass of the dark matter
particle. This exclusion region follows from the assumption
that the local dark matter density is 0.3 GeV cm−3 and that
all of the dark matter shares the same value of σDn.
causes σtot to decrease as mass increases for fixed σDn.
Moreover, particles that are more massive than the target
nuclei have straighter trajectories than lighter dark mat-
ter particles due to smaller scattering angles in the de-
tector rest frame. The loss of coherence also contributes
because incoherent scattering makes small scattering an-
gles more probable. A straight trajectory is shorter than
a random walk, so the more massive particles interact
less in the atmosphere and the detector than the more
easily-deflected lighter particles. Due to both of these
effects, the upper and lower boundaries of the exclusion
region increase with increasingmdm for mdm ∼> 100 GeV.
The lower left corner of the exclusion region also has
two interesting features. First, the lower bound on the
excluded value of σDn decreases sharply as mdm increases
from 0.1 GeV to 0.5 GeV. A dark matter particle with the
maximum possible velocity with respect to the detector
(800 km s−1) must have a mass greater than 0.24 GeV
to be capable of depositing 29 eV in the calorimeter in
a single collision. Lighter particles are only detectable
if they scatter multiple times inside the calorimeter, and
multiple scatters require a higher value of σDn. Since
their analysis does not allow multiple collisions, the XQC
exclusion region found in Ref. [52] does not extend to
masses lower than 0.3 GeV for any value of σDn. Second,
there is a kink in the lower boundary at mdm = 10 GeV;
the constraint on σDn is not as strong for this mass. The
FIG. 8: The region of dark matter parameter space excluded
to 90% confidence by the XQC experiment for several values of
the local density of dark matter with a total nucleon scattering
cross section σDn and mass mdm. The four densities shown
are 0.3 GeV cm−3 (solid line), 0.15 GeV cm−3 (long dashed
line), 0.075 GeV cm−3 (short dashed line) and 0.03 GeV cm−3
(dotted line).
simulated spectra produced by our Monte Carlo routine
for mdm = 10 GeV and σDn ∼< 10−25 cm2 reveal that the
particle is most likely to deposit between 100 and 600
eV, as exemplified by the spectra depicted in Fig. 6. The
background in this energy range is very high, so the XQC
constraints are not as strict at these energies.
Altering the local density of dark matter that strongly
interacts with baryons changes the exclusion region.
Fig. 8 shows the 90%-confidence exclusion regions for
four values of the local density of dark matter particles
with strong baryon interactions: 0.3 GeV cm−3 (solid
line), 0.15 GeV cm−3 (long dashed line), 0.075 GeV cm−3
(short dashed line) and 0.03 GeV cm−3 (dotted line).
These different local densities could arise due to varia-
tions in the local dark matter density due to mini-halos
or streams. They also describe models where the dark
matter does not consist of a single particle species and
the dark matter that strongly interacts with baryons is
a fraction fd of the local dark matter. In that case, the
four exclusion regions in Fig. 8 correspond to fd =1, 0.5,
0.25, and 0.1.
Fig. 8 indicates that the top and left boundaries of the
XQC exclusion region are not highly sensitive to the dark
matter density. In particular, the upper left corner of the
exclusion region (0.01 ≤ mdm ≤ 0.1 GeV) is nearly un-
affected by lowering the dark matter density. This con-
sistency indicates our Monte Carlo-generated exclusion
region is smaller than the true exclusion region in this
corner. If the dark matter is light (mdm ∼< 0.1 GeV),
then the number of dark matter particles encountered by
the XQC detector is very large (Ndm ∼> 7 × 1010). As
previously mentioned, the upper left corner of the XQC
exclusion region results from multiple scattering events,
so the simpler version of our Monte Carlo code described
in Section IVA is not applicable. Consequently, it is not
possible to run more than 109 trials in a week, so each
scattering event in the simulation corresponds to more
than one scattering event in the detector for all the den-
sities shown in Fig. 8. Therefore, decreasing the density
does not change the result. If it were possible to run 1011
trials, then the upper left corner of the exclusion region
would expand and differences between the different den-
sity contours would emerge. Since the upper left corner
of the XQC exclusion region is already ruled out by astro-
physical constraints (see Fig. 9), we have not invested in
the computational time necessary to expand this corner.
The upper boundary of the exclusion region is also not
greatly affected by decreasing the particle density, even
when Ndm is small enough that the Monte Carlo routine
is capable of running more than Ndm trials (mdm ≥ 100
GeV). This robustness indicates that the overburden of
the XQC experiment effectively prevents all dark matter
particles with σDn values greater than the upper bound
of the exclusion region from reaching the detector, so
it does not matter how many particles are encountered.
Finally, as discussed previously, the lower portion of the
exclusion region’s left boundary (σDn ≤ 10−23 cm2) is
set by the energy threshold for detection and is therefore
independent of Ndm.
Examining the features of the excluded region allows
us to predict how the region may be expanded by a fu-
ture XQC-like experiment. Decreasing the overburden by
either increasing the rocket’s altitude or reducing the fil-
tering will push the top boundary of the excluded region
upwards. Decreasing the energy detection threshold will
extend the excluded region to lower masses. It may also
extend the exclusion region to higher values of σDn for all
masses since strongly interacting particles lose much of
their energy in the atmosphere and arrive at the calorime-
ter with too little energy to produce a detectable sig-
nal. Increasing the size or number of calorimeters would
increase the sensitivity and extend the excluded region
to lower values of σDn. Finally, increasing the observa-
tion time would increase Ndm, and that would extend the
right and bottom boundaries of the excluded region.
VI. CONCLUSION
The X-ray Quantum Calorimetry (XQC) experiment is
a powerful detector of dark matter that interacts strongly
with baryons due to its high altitude and minimal shield-
ing. The XQC measurements rule out a large range of
hitherto unconstrained dark matter masses and scatter-
FIG. 9: Plot of the scattering cross section for dark matter particles and nucleons (σDn) versus dark matter particle mass
(mdm) showing the new XQC limits along with other current experimental limits. The red XQC exclusion region is the same
as shown in Fig. 7, and the other experiments are discussed in the text. The dark gray region shows the maximal range of
dark matter self-interaction cross section consistent with the strongly self-interacting dark matter model of structure formation
[37, 38]. The square marks the value of the scattering cross section for neutron-nucleon interactions.
ing cross sections. The excluded range was first derived
in Refs. [38, 48] based on rough analytic estimates. In
this paper, we have improved upon these results using
detailed Monte Carlo simulations to predict how a dark
matter particle of a given mass and cross section for nu-
cleon scattering would interact with the XQC calorime-
ters. Unlike Ref. [52], our analysis includes the atmo-
sphere and the shielding of the detector, so our result in-
cludes the upper limit on excluded σDn values, which had
not yet been accurately determined. Our simulation also
models the internal geometry of the XQC detector and
the random walk of particles through it, which is not pos-
sible using the analytical approaches of Refs. [38, 48, 52].
The resulting exclusion region is significantly different
than its analytical predecessors. When multiple scatter-
ings are included, the XQC experiment is sensitive to
dark matter particles with masses below 0.3 GeV and
cross sections for nucleon scattering between 10−24 and
10−20 cm2. Unlike Ref. [52], we find that the XQC ex-
clusion region does not include σDn < 10
−29 cm2 for
dark matter masses less than 10 GeV. Ref. [52] obtained
a more restrictive upper bound because they assumed a
specific X-ray background while we treat all events as po-
tential dark matter interactions. At higher masses, the
lower boundary of our exclusion region is much higher
than in Refs. [38, 48] because they over-estimated the
XQC sensitivity by assuming coherent scattering. It also
appears that Refs. [38, 48] underestimated the atmo-
spheric and shielding overburden for the XQC detector
because our exclusion region does not extend to values of
σDn as large as those included in their exclusion region.
We also assume a lower local dark matter density than
Refs. [38, 48] (0.3 instead of 0.4 GeV cm−3), so some of
the shrinkage of the exclusion region may be attributed
to the reduction in the assumed number density of dark
matter particles.
Fig. 9 shows how the XQC exclusion region depicted in
Fig. 7 complements the exclusion regions from other ex-
periments that are sensitive to similar values of σDn and
mdm. For a summary of some of the other experimen-
tal constraints as of 1994, see Ref. [46]. The constraints
to σDn from Pioneer 11 [80], Skylab [81], and IMP7/8
[82] were interpreted by Refs. [38, 46, 48]. There have
been two balloon-borne searches for dark matter, the
IMAX experiment [46, 47] and the Rich, Rocchia & Spiro
(RRS) [83] experiment. Although underground detectors
are designed to detect WIMPs, DAMA [84, 85] does ex-
clude σDn values within the range of interest, and relevant
constraints may be derived from Edelweiss (EDEL) and
CDMS [86, 87].
All of the exclusion regions shown in Fig. 9 were de-
rived assuming that all the dark matter is strongly in-
teracting. A local dark matter density of 0.4 GeV cm−3
was assumed in the analysis of the exclusion regions from
Pioneer 11, Skylab and the RRS experiment, while all
the other exclusion regions were derived assuming a lo-
cal dark matter density of 0.3 GeV cm−3. Furthermore,
the derivations of all the shown exclusion regions other
than the XQC region and the EDEL+CDMS region as-
sume that the scattering between dark matter particles
and nuclei is coherent. Therefore, these exclusion regions
are likely too broad because they over-estimate the cross
section for nuclear scattering. A comparison of the XQC
exclusion region reported in Refs. [38, 48] and our exclu-
sion region indicates that assuming coherent scattering
extends the exclusion region for mdm ≥ 1000 GeV to
σDn values that are roughly A× smaller than the lower
boundary of our exclusion region, where A is the mass
number of the largest target nucleus.
Fig. 9 also shows the bound on σDn from the CMB
and large-scale structure (LSS) obtained when one as-
sumes prior knowledge of the Hubble constant H0 and
the cosmic baryon fraction (from BBN) [50]. This bound
is nominally stronger than the bound from disk stabil-
ity [45], but it is less direct in that it requires combining
different measurements and depends on the cosmologi-
cal model; consequently, we show both bounds in Fig. 9.
Measurements of primordial element abundances give an
upper limit of σDn/mdm ∼< 4 × 10−16 cm2 GeV−1 [49].
Since this upper bound lies well beyond the upper bound
from disk stability, we do not include it in Fig. 9. We
also do not display the constraints from cosmic rays [49]
because they are derived from inelastic interactions that
are model-dependent.
As shown in Fig. 9, the XQC experiment rules out
a wide region of (mdm, σDn) parameter space that was
not probed by prior dark matter searches. Of partic-
ular interest is the darkly shaded range of σDn values
that corresponds to the maximal range of dark matter
self-interaction cross sections consistent with the strongly
self-interacting dark matter model of structure formation
[37, 38]. If the dark matter consists of exotic hadrons
whose interactions with nucleons are comparable to their
self-interactions, then σDn for these particles would lie in
or near the darkly shaded region in Fig. 9. Previous esti-
mates of the XQC exclusion region [38, 48] indicated that
the XQC experiment rules out all the darkly shaded σDn
values for 1 ∼< mdm ∼< 104 GeV. Our analysis reveals that
this is not the case; portions of the darkly shaded region
for mdm ∼> 20 GeV are not excluded by the XQC ex-
periment, although they are ruled out by observations of
LSS and the CMB. The mass-σ combination correspond-
ing to nucleon-neutron scattering (the square in Fig. 9)
lies within the exclusion region of the XQC experiment,
and the only portion of the darkly shaded region that is
unconstrained corresponds to dark matter masses smaller
than 0.25 GeV.
It is important to note, however, that the cross section
for dark matter self-interactions need not be comparable
to the cross section for nucleon scattering; σDn could dif-
fer by a few orders of magnitude from the self-interaction
cross section (as is the case for Q-balls). Furthermore, no
interactions with baryons are required for self-interacting
dark matter to resolve the tension between the collision-
less dark matter model and observations of small-scale
structure.
Another XQC detector is scheduled to launch in the
upcoming year. This experiment will have twice the ob-
serving time of the XQC experiment used in this analy-
sis. As discussed in Section V, increasing the observing
time will extend the exclusion region to higher masses
and weaker interactions. The future XQC experiment
will also have a lower energy threshold (15 eV) and will
maintain sensitivity to all energies above this threshold
throughout the run. The increased sensitivity to low en-
ergies will shift the lower (σDn ≤ 10−23 cm2) left bound-
ary of the exclusion region to lower masses. A lower en-
ergy threshold of 15 eV will make the experiment sensi-
tive to single recoil events involving dark matter particles
more massive than 0.17 GeV, as discussed in Section V.
Clearly, the next-generation XQC experiment will be an
even more powerful probe of interactions between dark
matter particles and baryons than its predecessor.
Acknowledgments
A. L. E. would like to thank Robert Lupton and
Michael Ramsey-Musolf for useful discussions. D. M.
thanks the Wallops Flight Facility launch support team
and the many undergraduate and graduate students that
made this pioneering experiment possible. The authors
also thank Randy Gladstone for his assistance with the
atmosphere model. A. L. E. acknowledges the support
of an NSF Graduate Fellowship. P. J. S. is supported
in part by US Department of Energy grant DE-FG02-
91ER40671. P. C. M. acknowledges current support for
this project from a Robert M. Walker Senior Research
Fellowship in Experimental Space Science from the Mc-
Donnell Center for the Space Sciences, as well as prior
institutional support for this project from the Instituto
Nacional de Técnica Aeroespacial (INTA) in Spain, from
the University of Bielefeld in Germany, and from the Uni-
versity of Arizona.
[1] V. C. Rubin, N. Thonnard, and W. K. Ford, The Astro-
physical Journal 238, 471 (1980).
[2] D. N. Spergel, R. Bean, O. Dore’, M. R. Nolta, C. L.
Bennett, G. Hinshaw, N. Jarosik, E. Komatsu, L. Page,
H. V. Peiris, et al., ArXiv Astrophysics e-prints (2006),
astro-ph/0603449.
[3] N. A. Bahcall, J. P. Ostriker, S. Perlmutter, and P. J.
Steinhardt, Science 284, 1481 (1999).
[4] J. F. Navarro, C. S. Frenk, and S. D. M. White, The
Astrophysical Journal 462, 563 (1996).
[5] A. V. Kravtsov, A. A. Klypin, J. S. Bullock, and J. R.
Primack, The Astrophysical Journal 502, 48 (1998).
[6] B. Moore, T. Quinn, F. Governato, J. Stadel, and
G. Lake, Mon. Not. R. Astron. Soc. 310, 1147 (1999).
[7] S. Ghigna, B. Moore, F. Governato, G. Lake, T. Quinn,
and J. Stadel, The Astrophysical Journal 544, 616
(2000).
[8] C. Power, J. F. Navarro, A. Jenkins, C. S. Frenk, S. D. M.
White, V. Springel, J. Stadel, and T. Quinn, Mon. Not.
R. Astron. Soc. 338, 14 (2003), astro-ph/0201544.
[9] J. F. Navarro, E. Hayashi, C. Power, A. R. Jenkins, C. S.
Frenk, S. D. M. White, V. Springel, J. Stadel, and T. R.
Quinn, Mon. Not. R. Astron. Soc. 349, 1039 (2004),
astro-ph/0311231.
[10] E. Hayashi, J. F. Navarro, C. Power, A. Jenkins, C. S.
Frenk, S. D. M. White, V. Springel, J. Stadel, and T. R.
Quinn, Mon. Not. R. Astron. Soc. 355, 794 (2004), astro-
ph/0310576.
[11] J. Diemand, B. Moore, and J. Stadel, Mon. Not. R. As-
tron. Soc. 353, 624 (2004), astro-ph/0402267.
[12] J. Diemand, M. Zemp, B. Moore, J. Stadel, and C. M.
Carollo, Mon. Not. R. Astron. Soc. 364, 665 (2005),
astro-ph/0504215.
[13] J. A. Tyson, G. P. Kochanski, and I. P. dell’Antonio, The
Astrophysical Journal Letters 498, L107+ (1998).
[14] D. J. Sand, T. Treu, G. P. Smith, and R. S. Ellis, The
Astrophysical Journal 604, 88 (2004), astro-ph/0310703.
[15] H. Katayama and K. Hayashida, Advances in Space Re-
search 34, 2519 (2004), astro-ph/0405363.
[16] E. Pointecouteau, M. Arnaud, and G. W. Pratt, Astron-
omy and Astrophysics 435, 1 (2005), astro-ph/0501635.
[17] L. M. Voigt and A. C. Fabian, Mon. Not. R. Astron. Soc.
368, 518 (2006), astro-ph/0602373.
[18] R. A. Flores and J. R. Primack, The Astrophysical Jour-
nal Letters 427, L1 (1994).
[19] B. Moore, Nature 370, 629 (1994).
[20] A. Burkert, The Astrophysical Journal Letters 447,
L25+ (1995).
[21] W. J. G. de Blok and S. S. McGaugh, Mon. Not. R.
Astron. Soc. 290, 533 (1997).
[22] S. S. McGaugh and W. J. G. de Blok, The Astrophysical
Journal 499, 41 (1998), astro-ph/9801123.
[23] W. J. G. de Blok, S. S. McGaugh, and V. C. Rubin, The
Astronomical Journal 122, 2396 (2001).
[24] D. Marchesini, E. D’Onghia, G. Chincarini, C. Firmani,
P. Conconi, E. Molinari, and A. Zacchei, The Astrophys-
ical Journal 575, 801 (2002), astro-ph/0202075.
[25] J. J. Binney and N. W. Evans, Mon. Not. R. Astron. Soc.
327, L27 (2001), astro-ph/0108505.
[26] P. Salucci, Mon. Not. R. Astron. Soc. 320, L1 (2001),
astro-ph/0007389.
[27] J. D. Simon, A. D. Bolatto, A. Leroy, L. Blitz, and E. L.
Gates, The Astrophysical Journal 621, 757 (2005), astro-
ph/0412035.
[28] B. Moore, S. Ghigna, F. Governato, G. Lake, T. Quinn,
J. Stadel, and P. Tozzi, The Astrophysical Journal Let-
ters 524, L19 (1999).
[29] A. Klypin, A. V. Kravtsov, O. Valenzuela, and F. Prada,
The Astrophysical Journal 522, 82 (1999), astro-
ph/9901240.
[30] E. D’Onghia and G. Lake, The Astrophysical Journal
612, 628 (2004), astro-ph/0309735.
[31] A. A. El-Zant, Y. Hoffman, J. Primack, F. Combes, and
I. Shlosman, The Astrophysical Journal Letters 607, L75
(2004), astro-ph/0309412.
[32] E. Hayashi and J. F. Navarro, Mon. Not. R. Astron. Soc.
373, 1117 (2006), astro-ph/0608376.
[33] J. S. Bullock, A. V. Kravtsov, and D. H. Weinberg,
The Astrophysical Journal 539, 517 (2000), astro-
ph/0002214.
[34] A. J. Benson, C. S. Frenk, C. G. Lacey, C. M. Baugh,
and S. Cole, Mon. Not. R. Astron. Soc. 333, 177 (2002),
astro-ph/0108218.
[35] A. V. Kravtsov, O. Y. Gnedin, and A. A. Klypin, The As-
trophysical Journal 609, 482 (2004), astro-ph/0401088.
[36] B. Moore, J. Diemand, P. Madau, M. Zemp, and
J. Stadel, Mon. Not. R. Astron. Soc. 368, 563 (2006),
astro-ph/0510370.
[37] D. N. Spergel and P. J. Steinhardt, Physical Review Let-
ters 84, 3760 (2000).
[38] B. D. Wandelt, R. Davé, G. R. Farrar, P. C. McGuire,
D. N. Spergel, and P. J. Steinhardt, in Sources and De-
tection of Dark Matter and Dark Energy in the Universe,
edited by D. B. Cline (Springer-Verlag, Berlin, New York,
2001), p. 263, astro-ph/0006344.
[39] R. Davé, D. N. Spergel, P. J. Steinhardt, and B. D. Wan-
delt, The Astrophysical Journal 547, 574 (2001).
[40] K. Ahn and P. R. Shapiro, Mon. Not. R. Astron. Soc.
363, 1092 (2005), astro-ph/0412169.
[41] G. R. Farrar, Int. J. Theor. Phys. 42, 1211 (2003).
[42] G. R. Farrar and G. Zaharijas, Physical Review Letters
96, 041302 (2006), hep-ph/0510079.
[43] A. Kusenko and P. J. Steinhardt, Physical Review Letters
87, 141301 (2001), astro-ph/0106008.
[44] M. Y. Khlopov, Pisma Zh. Eksp. Teor. Fiz. 83, 3 (2006),
astro-ph/0511796.
[45] G. D. Starkman, A. Gould, R. Esmailzadeh, and S. Di-
mopoulos, Physical Review D 41, 3594 (1990).
[46] P. C. McGuire, Ph.D. thesis, University of Arizona
(1994).
[47] P. C. McGuire, T. Bowen, D. L. Barker, P. G. Halverson,
K. R. Kendall, T. S. Metcalfe, R. S. Norton, A. E. Pifer,
L. M. Barbier, E. R. Christian, et al., in AIP Conf. Proc.
336: Dark Matter, edited by S. S. Holt and C. L. Bennett
(1995), p. 53.
[48] P. C. McGuire and P. J. Steinhardt, in Proceedings of
the 27th International Cosmic Ray Conference, Ham-
burg, Germany (2001), p. 1566, astro-ph/0105567.
[49] R. H. Cyburt, B. D. Fields, V. Pavlidou, and B. Wandelt,
Physical Review D 65, 123503 (2002), astro-ph/0203240.
[50] X. Chen, S. Hannestad, and R. J. Scherrer, Physical Re-
view D 65, 123515 (2002), astro-ph/0202496.
[51] D. McCammon, R. Almy, E. Apodaca, W. Bergmann
Tiest, W. Cui, S. Deiker, M. Galeazzi, M. Juda,
A. Lesser, T. Mihara, et al., The Astrophysical Journal
576, 188 (2002).
[52] G. Zaharijas and G. R. Farrar, Physical Review D 72,
083502 (2005), astro-ph/0406531.
[53] D. McCammon, R. Almy, S. Deiker, J. Morgenthaler,
R. L. Kelley, F. J. Marshall, S. H. Moseley, C. K. Stahle,
and A. E. Szymkowiak, Nuclear Instruments and Meth-
ods in Physics Research A 370, 266 (1996).
[54] C. K. Stahle, R. L. Kelley, D. McCammon, S. H. Moseley,
and A. E. Szymkowiak, Nuclear Instruments and Meth-
ods in Physics Research A 370, 173 (1996).
[55] E. I. Gates, G. Gyuk, and M. S. Turner, The Astrophys-
ical Journal Letters 449, L123+ (1995).
[56] B. Moore, C. Calcáneo-Roldán, J. Stadel, T. Quinn,
G. Lake, S. Ghigna, and F. Governato, Phys. Rev. D
64, 063508 (2001), astro-ph/0106271.
[57] J. D. Lewin and P. F. Smith, Astroparticle Physics 6, 87
(1996).
[58] A. M. Green, Phys. Rev. D 68, 023004 (2003), astro-
ph/0304446.
[59] A. K. Drukier, K. Freese, and D. N. Spergel, Physical
Review D 33, 3495 (1986).
[60] F. J. Kerr and D. Lynden-Bell, Mon. Not. R. Astron.
Soc. 221, 1023 (1986).
[61] J. A. R. Caldwell and I. M. Coulson, The Astronomical
Journal 93, 1090 (1987).
[62] R. P. Olling and M. R. Merrifield, Mon. Not. R. Astron.
Soc. 297, 943 (1998).
[63] C. S. Kochanek, Astrophys. J. 457, 228 (1996), astro-
ph/9505068.
[64] M. J. Reid, A. C. S. Readhead, R. C. Vermeulen, and
R. N. Treuhaft, The Astrophysical Journal 524, 816
(1999).
[65] R. P. Olling and W. Dehnen, The Astrophysical Journal
599, 275 (2003), arXiv:astro-ph/0301486.
[66] K. M. Cudworth, The Astronomical Journal 99, 590
(1990).
[67] P. J. T. Leonard and S. Tremaine, Astrophys. J. 353,
486 (1990).
[68] M. C. Smith, G. R. Ruchti, A. Helmi, R. F. G. Wyse, J. P.
Fulbright, K. C. Freeman, J. F. Navarro, G. M. Seabroke,
M. Steinmetz, M. Williams, et al., Mon. Not. R. Astron.
Soc. 379, 755 (2007), arXiv:astro-ph/0611671.
[69] M. J. Reid, Annual Review of Astronomy and Astro-
physics 31, 345 (1993).
[70] D. H. McNamara, J. B. Madsen, J. Barnes, and B. F.
Ericksen, The Publications of the Astronomical Society
of the Pacific 112, 202 (2000).
[71] V. S. Avedisova, Astronomy Reports 49, 435 (2005).
[72] W. Dehnen and J. J. Binney, Mon. Not. R. Astron. Soc.
298, 387 (1998), astro-ph/9710077.
[73] M. W. Goodman and E. Witten, Physical Review D 31,
3059 (1985).
[74] A. Gould, The Astrophysical Journal 321, 571 (1987).
[75] G. Gelmini, A. Kusenko, and S. Nussinov, Physical Re-
view Letters 89, 101302 (2002), hep-ph/0203179.
[76] J. Engel, Physics Letters B 264, 114 (1991).
[77] D. Z. Freedman, Physical Review D 9, 1389 (1974).
[78] R. H. Helm, Physical Review 104, 1466 (1956).
[79] M. Matsumoto and T. Nishimura, ACM Transactions on
Modeling and Computer Simulation 8, 3 (1998).
[80] J. A. Simpson, T. S. Bastian, D. L. Chenette, R. B. McK-
ibben, and K. R. Pyle, Journal of Geophysical Research
85, 5731 (1980).
[81] E. K. Shirk and P. B. Price, Astrophys. J. 220, 719
(1978).
[82] R. A. Mewaldt, A. W. Labrador, C. Lopate, and R. B.
McKibben (2001), private communication.
[83] J. Rich, R. Rocchia, and M. Spiro, Physics Letters B
194, 173 (1987).
[84] C. Bacci, P. Belli, R. Bernabei, C. Dai, L. Ding, W. di
Nicolantonio, E. Gaillard, G. Gerbier, H. Kuang, A. In-
cicchitti, et al., Astroparticle Physics 2, 13 (1994).
[85] R. Bernabei, P. Belli, R. Cerulli, F. Montecchia, M. Am-
ato, G. Ignesti, A. Incicchitti, D. Prosperi, C. J. Dai,
H. L. He, et al., Physical Review Letters 83, 4918 (1999).
[86] I. F. M. Albuquerque and L. Baudis, Physical Review
Letters 90, 221301 (2003), astro-ph/0301188.
[87] I. F. M. Albuquerque and L. Baudis, Physical Review
Letters 91, 229903(E) (2003).
ABSTRACT
  Although the rocket-based X-ray Quantum Calorimetry (XQC) experiment was
designed for X-ray spectroscopy, the minimal shielding of its calorimeters, its
low atmospheric overburden, and its low-threshold detectors make it among the
most sensitive instruments for detecting or constraining strong interactions
between dark matter particles and baryons. We use Monte Carlo simulations to
obtain the precise limits the XQC experiment places on spin-independent
interactions between dark matter and baryons, improving upon earlier analytical
estimates. We find that the XQC experiment rules out a wide range of
nucleon-scattering cross sections centered around one barn for dark matter
particles with masses between 0.01 and 10^5 GeV. Our analysis also provides new
constraints on cases where only a fraction of the dark matter strongly
interacts with baryons.

<|endoftext|><|startoftext|>
Introduction
	Heavy-Light Staggered Chiral Perturbation Theory
	Generalizing Continuum PQ0.4exPT to S0.4exPT
	Form Factors for BP Decay
	Form factors for 3-flavor partially quenched S0.4exPT
	Full QCD Results
	Analytic terms
	Finite Volume Effects
	Conclusions
	Feynman Rules
	Integrals
	Wavefunction Renormalization Factors
	References
ABSTRACT
  We calculate the form factors for the semileptonic decays of heavy-light
pseudoscalar mesons in partially quenched staggered chiral perturbation theory
(\schpt), working to leading order in $1/m_Q$, where $m_Q$ is the heavy quark
mass. We take the light meson in the final state to be a pseudoscalar
corresponding to the exact chiral symmetry of staggered quarks. The treatment
assumes the validity of the standard prescription for representing the
staggered ``fourth root trick'' within \schpt by insertions of factors of 1/4
for each sea quark loop. Our calculation is based on an existing partially
quenched continuum chiral perturbation theory calculation with degenerate sea
quarks by Becirevic, Prelovsek and Zupan, which we generalize to the staggered
(and non-degenerate) case. As a by-product, we obtain the continuum partially
quenched results with non-degenerate sea quarks. We analyze the effects of
non-leading chiral terms, and find a relation among the coefficients governing
the analytic valence mass dependence at this order. Our results are useful in
analyzing lattice computations of form factors $B\to\pi$ and $D\to K$ when the
light quarks are simulated with the staggered action.

<|endoftext|><|startoftext|>
Introduction
Increasing attention is being paid to the dynamics of open quantum systems, that is,
to quantum systems acted on by an environment. Such systems are of interest for studies of
dissipative phenomena, decoherence, backgrounds to quantum computers and to precision
measurements, and theories of quantum measurement. A principal tool in studying open
quantum systems is the reduced density matrix, obtained from the pure state density matrix
by tracing over environment degrees of freedom, or in stochastic models where the environ-
ment is represented by a noise term in the Schrödinger equation, by averaging over the noise.
As is well-known, this transition from the pure state density matrix to the reduced density
matrix is not one-to-one, since information about the total system is lost. For example, in
stochastic models, there is known to be a continuum of different unravelings, or pure state
density matrix stochastic evolutions, that yield the same master equation for the reduced
density matrix. The question that we investigate here is the extent to which one can form
objects that refer only to the basis vectors of the system Hilbert space, but that nonetheless
recapture information that is lost in passing to the reduced density matrix. In the first
part of this paper (Sections 2 through 5), we discuss classical noise arising from fluctuations
defined by classical probability distributions. In the second part (Sections 6 through 9), we
give an analogous discussion of quantum noise, which appears in the physically important
case of a quantum system coupled to a quantum environment in an overall pure state. We
also give an extension, making contact with the discussion of the first part, to the case in
which the overall system is in a mixed state superposition of pure states. The final section
contains a discussion of quantum measurements that relates the material in the first and
second parts.
For the case of classical probability distributions, a relevant discussion appears in
Chapter 5 of the book The Theory of Open Quantum Systems by Breuer and Petruccione
[1], following up on earlier papers by those authors [2], by Wiseman [3] and by Mølmer,
Castin, and Dalibard [4]. In simplified form, Breuer and Petruccione introduce an ensemble
of pure state vectors |ψα〉, each drawn from the same system Hilbert space HS , with each
vector assumed to occur in the ensemble with probability wα,
wα = 1. Measurement
of a general self-adjoint operator R for a system prepared in |ψα〉 typically gives a range of
values, the mean of which given by 〈ψα|R|ψα〉. The mean or expectation over the ensemble
of pure state vectors is then given by
wα〈ψα|R|ψα〉 = TrρR , (1a)
with ρ the mixed state or reduced density matrix defined by
wα|ψα〉〈ψα| . (1b)
Breuer and Petruccione point out that there are three variances that are relevant. The
variance of measurements of R over all pure states in the ensemble is given by
Var(R) = Trρ(R − TrρR)2 = TrρR2 − (TrρR)2 . (2a)
This can be written as the sum of two non-negative terms,
Var(R) = Var1(R) + Var2(R) , (2b)
with Var1(R) the ensemble average of the variances of R within each pure state of the
ensemble,
Var1(R) =
wα[〈ψα|R
2|ψα〉 − 〈ψα|R|ψα〉
2] , (2c)
and with Var2(R) the variance of the pure state means of R over the ensemble,
Var2(R) =
wα〈ψα|R|ψα〉
2 − [
wα〈ψα|R|ψα〉]
2 . (2d)
Thus, Var1(R) is an ensemble average of the quantum variances of R, while Var2(R) is a
measure of the spread of the average values of R resulting from the statistical properties of
the ensemble. As Breuer and Petruccione note, neither of the subsidiary variances Var1,2
can be expressed as the density matrix expectation of some self-adjoint operator.
Our aim in the first part of this paper is to extend the formalism of ref [1] by utilizing
a density tensor hierarchy, which captures the statistical information that is lost in forming
the reduced density matrix of Eq. (1b). A density tensor, defined as an ensemble average
of density matrices, was first introduced by Mielnik [5], and was applied to discussions of
density functions on the space of quantum states and their application to thermalization of
quantum systems by Brody and Hughston [6]. These papers, in addition to introducing the
concept of a density tensor which is developed further here, also contain the important result
that in the case of a continuum probability distribution, the density tensor hierarchy gives
all of the information needed to reconstruct the probability function wα. In particular, the
variances Var1,2 for any observable, and more general statistical properties of the ensemble
as well, can be expressed as contractions of density tensor matrix elements with appropriate
matrix elements of the observable(s) of interest.
The basic construction of the density tensor hierarchy corresponding to a classical
probability distribution {wα} is given in Sec. 2. Here we generalize the reduced density
matrix of Eq. (1b) to a density tensor, formed by taking a product of pure state density matrix
elements, and averaging over the ensemble of pure states. When the ψα are independent of
α, this tensor reduces to an n-fold product of reduced density matrices, and so the difference
between the density tensor and this product is a measure of the statistical fluctuations in the
ensemble. In the generic case of non-trivial dependence of ψα on α, there are some general
statements that can be made. First of all, the order n density tensor is a symmetric tensor
in its pair indices, and it can be considered as a matrix operator acting on the n-fold tensor
product of the system Hilbert space HS with itself. The symmetry of the density tensor
allows construction of a generating function that on expansion gives the density tensors of
all orders. Additionally, as a consequence of the unit trace and idempotence conditions
obeyed by the pure state density matrix, the density tensor hierarchy satisfies a system of
descent equations, relating the order n tensor to the order n−1 tensor when any row index is
contracted with any column index. We show that the variances Var1,2 defined by Breuer and
Petruccione can be expressed in terms of appropriate contractions of density tensor elements
with operator matrix elements.
In subsequent sections we develop some concrete applications of the general formal-
ism for classical probability distributions. In Sec. 3, we consider an isotropic ensemble of
spin-1/2 pure state density matrices, construct the density tensors through order 3, ver-
ify the descent equations, and calculate the generating function. In Sec. 4 we apply the
formalism to a quantum system evolving under the influence of noise as described by a
stochastic Schrödinger equation, with the ensemble defined as the set of all histories of an
initial quantum state under the influence of the noise. Assuming white noise described by
the Itô calculus, we give the dynamics of the general density tensor in terms of the general
unraveling of the Lindblad equation constructed by Wiseman and Diósi [7], and show that
the order two and higher density tensors distinguish between inequivalent unravelings that
give the same reduced density matrix (i.e., the same order one density tensor). In Sec. 5
we develop an analogous formalism for the case of jump (piecewise deterministic process)
unravelings of the Lindblad equation.
We turn next to an analysis of a quantum system coupled to a quantum environment,
rather than to an external classical noise source. Here, one is confronted with the problem of
discussing the system dissipation associated with the system-environment interaction within
a single overall pure state of system plus environment (or in a thermal state that is a
weighted average of such pure states). Typically, in master equation derivations, the system-
environment interaction1 H has vanishing expectation in the environment, but its square H2
does not have a vanishing expectation, because the environment is not in an eigenstate of
H . The associated variance is then a measure of quantum fluctuations associated with the
environment state, and is the source of quantum “noise” driving the system dissipation.
Our aim in the second part of this paper is to generalize the formalism of the first part
to recapture information about this noise that is lost in the passage to the system reduced
density matrix. We do this in Sec. 6 by defining a density tensor hierarchy as the trace over
the environment of a product of environment operators constructed as the system matrix
elements of the total density matrix. Unlike the classical noise construction, which uses only
the system density matrix, the construction in the quantum noise case requires knowledge of
the full system plus environment density matrix, and so (except for the order one case) does
not give a system observable. It is nonetheless computable in any theory of the system plus
environment, and is of theoretical, rather than empirical, interest. Because the environment
1 What we call H is usually denoted by HI in the open systems literature. To avoid
confusion, all other Hamiltonians will carry subscripts, e.g., HS and HE for the system and
environment Hamiltonians, HTOT for the total Hamiltonian, etc.
operators entering the construction are non-commutative, this hierarchy is no longer totally
symmetric in its system index pairs, but by the cyclic permutation property of the trace, it
is symmetric under cyclic permutation of the system index pairs. Also, because the system
trace of these environment operators gives only the reduced environment density matrix,
rather than unity, there is in general no descent equation associated with taking this trace.
However, when indices of adjacent system operators are contracted, one gets the square of
the overall density matrix, and so there remains a set of descent relations connecting the
order (n) tensor to the order (n− 1) tensor. Finally, in the case of thermal (or other mixed)
overall states, we define the appropriate tensor as a weighted sum of pure state tensors, in
analogy with the definition of Sec. 2.
In subsequent sections, we give applications of the trace hierarchy formalism to
several classic problems discussed in the theory of quantum master equations. In Sec. 7 we
consider the quantum Brownian motion (and resulting decoherence) of a massive Brownian
particle in interaction with an independent particle bath of scatterers. In Sec. 8 we discuss
the tensor hierarchy corresponding to the weak coupling Born–Markov master equation,
and it specialization to the quantum optical master equation. Finally in Sec. 9, we give
an analogous discussion for the Caldeira–Leggett model of a particle in interaction with a
system of environmental oscillators.
We conclude with a discussion that bridges the considerations of the classical noise
and the quantum noise cases. In Sec. 10, we contrast two different Itô stochastic Schrödinger
equations, both of which have the same Lindblad, but only one of which leads to state vector
reduction. We relate this to the fact that the equation giving the time derivative of the
stochastic expectation of operator variances involves the order two density tensor, which
differs for the two cases. We discuss the analogous equation for the time dependence of the
variance of a “pointer operator” in the case of a quantum system coupled to a quantum
environment, and show why this does not lead to state vector reduction. Thus we see no
mechanism for quantum “noise” in a closed quantum system plus environment to provide a
resolution of the quantum measurement problem.
2. The density tensor for classical noise and its kinematical properties
We proceed to establish our notation and to define the density tensor hierarchy in
the classical noise case. We denote the pure state density matrix formed from the unit
normalized state |ψα〉 by ρα, with
ρα = |ψα〉〈ψα| , (3a)
and its general matrix element between states |i〉 and |j〉 of HS by
ρα;ij ≡ 〈i|ρα|j〉 . (3b)
The unit trace condition on ρα states that
Trρα = 〈ψα|ψα〉 = 1 , (3c)
and the idempotence condition on ρα states that
ρ2α = |ψα〉〈ψα||ψα〉〈ψα| = |ψα〉〈ψα| = ρα . (3d)
We now define the order n density tensor by
i1j1,i2j2,...,injn
wαρα;i1j1ρα;i2j2 ...ρα;injn = E[ρα;i1j1ρα;i2j2 ...ρα;injn ] , (4a)
with E[Fα] a shorthand for
E[Fα] =
wαFα . (4b)
Since
wαρα;ij =
wα〈i|ρα|j〉 , (5a)
we see that this is just the |i〉 to |j〉 matrix element of the reduced density matrix ρ defined
in Eq. (1b),
ij = 〈i|ρ|j〉 , (5b)
and so the density tensor of Eq. (4a) is a natural generalization of the usual reduced density
matrix. When the states |ψα〉 are independent of the label α, the definition of Eq. (4a)
simplifies to
i1j1,i2j2,...,injn
= ρi1j1ρi2j2 ...ρinjn , (5c)
and so the difference between Eq. (4a) and a product of reduced density matrix elements
is a reflection of the statistical structure of the ensemble. Since the factors within the
expectation E[...] on the right of Eq. (4a) are just ordinary complex numbers, the density
tensor is symmetric under interchange of any index pair iljl with any other index pair imjm.
Consequently, we can define a generating function for the density tensor by
G[aij ] = E[e
ρα;ijaij ] =
ai1j1 ...ainjn
i1j1,...,injn
, (5d)
where repeated indices i, j are summed. It will often be convenient to abbreviate ρα;ijaij by
ρα · a, so that the generating function becomes in this notation G[a] = E[e
ρα ·a].
Although the density tensor for n > 1 is not an operator on HS , it clearly has the
structure of an operator on the n-fold tensor product HS ⊗ HS ⊗ ... ⊗ HS . Motivated by
this, we will often find it convenient to write the definition of Eq. (4a) as
ρ(n) = E
, (5e)
with each factor ρα;ℓ acting on a distinct factor Hilbert space HS;ℓ in the tensor product
ℓ=1 HS;ℓ. One can pass easily back and forth from this notation to one in which the system
matrix indices are displayed explicitly.
Let us consider next the result of contracting any row index il with any column index
jk. There are two basic cases: (i) one can contract a row index il with its corresponding
column index jl, and (ii) one can contract a row index il with a column index jk with k 6= l.
Since the density tensor is symmetric in its index pairs, it suffices to consider only one
example of each case, since all others can be obtained by permutation. For the contraction
of i1 with j1 we find
δi1j1ρ
i1j1,i2j2,...,injn
= E[(Trρ)ρα;i2j2 ...ρα;injn ] = E[ρα;i2j2 ...ρα;injn ] = ρ
(n−1)
i2j2,...,injn
, (6a)
where we have used the unit trace condition of Eq. (3c). For the contraction of j1 with i2,
we find
δj1i2ρ
i1j1,i2j2,...,injn
= E[(ρ2)α;i1j2 ...ρα;injn ] = E[ρα;i1j2ρα;i3j3 ...ρα;injn ] = ρ
(n−1)
i1j2,i3j3,...,injn
, (6b)
where now we have used the idempotence condition of Eq. (3d). As an illustration of how
this works when all possible index pair contractions are considered, we give the complete set
of contractions reducing the second order density tensor to a first order density tensor,
δi1j1ρ
i1j1,i2j2
δi2j2ρ
i1j1,i2j2
δj1i2ρ
i1j1,i2j2
δj2i1ρ
i1j1,i2j2
Referring to the generating function of Eq. (5d), the general descent equations can be sum-
marized compactly by the two identities,
∂G[aij ]
=E[(Trρα)e
ρα;ijaij ] = G[aij ] ,
∂2G[aij ]
∂amr∂apq
=E[ρmrρrqe
ρα;ijaij ] = E[ρmqe
ρα;ijaij ] =
∂G[aij ]
When the density matrix ρ used to define the density tensor is a mixed state density matrix,
the trace descent relation of Eq. (6a) is unchanged, while the indempotency relation of
Eq. (6b) relates the contraction an order (n) tensor to an order (n− 1) tensor in which one
factor ρ is replaced by ρ2; this is not a member of the original hierarchy, but still gives a
useful relation for checking calculations.
To conclude this section, let us return to the variances introduced by Breuer and
Petruccione. In terms of the order one and order two density tensors, we evidently have
Var1(R) =ρ
(R2)j1i1 − ρ
i1j1,i2j2
Rj1i1Rj2i2 ,
Var2(R) =ρ
i1j1,i2j2
Rj1i1Rj2i2 − (ρ
Rj1i1)
Var(R) =ρ
(R2)j1i1 − (ρ
Rj1i1)
with Rji = 〈j|R|i〉. Clearly, other statistical properties of the ensemble are readily expressed
in terms of the density tensor hierarchy. For example, the ensemble average of the product
of the expectations of two different operators R and S is given by
wα〈ψα|R|ψα〉〈ψα|S|ψα〉 = ρ
i1j1,i2j2
Rj1i1Sj2i2 , (8b)
which can be used, together with information obtained from ρ(1), to calculate the covariance
and correlation of R and S.
3. Isotropic spin-1/2 ensemble
As a simple example of the density tensor formalism, let us follow Breuer and Petruc-
cione [1] and consider the case of an isotropic spin-1/2 ensemble. Let ~v be a vector in three
dimensions, and consider the ensemble of spin-1/2 pure state density matrices
ρ(~v) =
(1 + ~v · ~σ) , (9a)
with ~σ = (σ1, σ2, σ3) the standard Pauli matrices, and with a uniform probability distribution
of ~v over the unit sphere |~v | = 1 specified by
w(~v ) =
δ(|~v | − 1) . (9b)
(Clearly, ~v has the same significance as the label α used in the preceding section.) Defining
E[P (~v )] =
d3vw(~v )P (~v ) , (10a)
a standard calculation gives
E[1] = 1 , E[vsvt] =
δst , ... , (10b)
with all averages of odd powers of ~v vanishing. From Eq. (9a), we have
ρ(~v )ij =
(δij + vrσ
ij) , (11a)
and the general density tensor over this ensemble is defined by
i1j1,...,injn
= E[ρ(~v )i1j1 ...ρ(~v )injn ] . (11b)
From Eq. (10b), the first three tensors in this hierarchy are now easily found to be
δi1j1 ,
i1j1,i2j2
δi1j1δi2j2 +
~σi1j1 · ~σi2j2
i1j1,i2j2,i3j3
δi1j1δi2j2δi3j3 +
(δi1j1~σi2j2 · ~σi3j3 + δi2j2~σi1j1 · ~σi3j3 + δi3j3~σi1j1 · ~σi2j2)
Using the relations Tr~σ = 0 and (~σ 2)ij = 3δij , it is now easy to verify that the descent
relations of Eqs. (6a) and (6b) are satisfied by Eq. (12).
For the isotropic spin-1/2 ensemble, the generating function of Eq. (5d) becomes
G[aij ] = E[e
ρ(~v)ijaij ] , (13)
with ρ(~v )ij given by Eq. (11a). Defining the vector ~A by
~σijaij , (14a)
a simple calculation gives
G[aij ] = exp(
sinh | ~A |
| ~A |
= exp(
Tra)[1 +
( ~A 2)2
+ ...] , (14b)
from which one can read off the values of the low order density tensors given in Eq. (12).
The verification of the descent relations of Eq. (7b) for the generating function of Eq. (14b)
is given in Appendix A.
4. Itô stochastic Schrödinger equation
We consider next a state vector |ψ〉 with a time evolution described by a stochas-
tic Schrödinger equation, which is a frequently used model approximation to open system
dynamics. In this case the state vector and the corresponding pure state density matrix
ρ = |ψ〉〈ψ| are implicit functions of the noise, which takes a different sequence of values for
each history of the system. In the notation of Sec. 2, the different histories are labeled by
the subscript α, and the expectation of Eq. (4b) is an average over all possible histories. It
is customary, however, in discussing stochastic Schrödinger equations to omit the subscript
α, treating the history dependence of ρ as understood. So in this context, the definition of
Eq. (4a) becomes
i1j1,...,injn
= E[ρi1j1 ...ρinjn ] , (15)
with E[...] the stochastic expectation, and the generating function G[aij ] takes the same form
as given in Eq. (5d) but with the subscript α omitted.
Our aim in this section is to derive an equation of motion for the generating function,
which on expansion yields equations of motion for all density tensors ρ(n), taking as input
the general pure state density matrix evolution constructed by Wiseman and Diósi [7], that
corresponds to a given Lindblad form [8,9] for the time evolution of the reduced density
matrix ρ(1) = E[ρ]. We begin by recapitulating the results of ref [7]. The most general
evolution of a density matrix ρ that preserves Trρ = 1 and obeys the complete positivity
condition is the Lindblad form
dρ = dtLρ , (16a)
Lρ ≡ −i[HTOT, ρ] + ckρc
ck, ρ} , (16b)
with {, } denoting the anticommutator, and with the repeated index k summed. The set of
Lindblad operators ck describes the effects on the system of the reservoir or environment that
is modeled by an external classical noise. Wiseman and Diósi show that the most general
evolution of the pure state density matrix ρ for which E[dρ] reduces to Eqs. (16a) and (16b)
takes the form
dρ = dtLρ+ |dφ〉〈ψ|+ |ψ〉〈dφ| . (17a)
Here |dφ〉 is a state vector that is a pure noise term, so that
E[|dφ〉] = 0 , (17b)
that is orthogonal to |ψ〉, so that
〈ψ|dφ〉 = 0 , (17c)
and that obeys
|dφ〉〈dφ| = dtW . (17d)
The operator W is the Diósi transition rate operator [5] given by
W =Lρ− {ρ,Lρ}+ ρTr(ρLρ)
=(ck − 〈ck〉)ρ(ck − 〈ck〉)
, (18)
where 〈ck〉 is a shorthand for the quantum state expectation 〈ψ|ck|ψ〉 = Trρck. Although
|dφ〉〈dφ| is completely fixed, Wiseman and Diósi show that |dφ〉|dφ〉 is free, with different
choices for this and different phase choices for the ck corresponding to different pure state
evolutions (or “unravelings”) that yield the same evolution of Eqs. (16a) and (16b) for the
reduced density matrix ρ.
Wiseman and Diósi further show that |dφ〉 can be parameterized by complex Wiener
processes by writing
|dφ〉 = (ck − 〈ck〉)|ψ〉dξ
k , (19a)
E[dξk] = E[dξ
k] = 0 (19b)
and with
dξj(t)dξ
k(t) =dtδjk
dξj(t)dξk(t) =dtujk ,
(19c)
where ukj = ujk is a set of arbitrary complex numbers subject to the condition that the
norm of the complex matrix u ≡ [ujk] be less than or equal to 1. (See Eqs. (4.10) and (4.11)
of ref. [7].) In terms of this parameterization of |dφ〉, the pure state evolution of Eq. (17a)
takes the form
dρ = dtLρ+ (ck − 〈ck〉)ρdξ
k + ρ(ck − 〈ck〉)
†dξk , (19d)
and the corresponding stochastic Schrödinger equation for the wave function is [7]
d|ψ〉 =− iHψdt|ψ〉+ (ck − 〈ck〉)dξ
k|ψ〉 ,
−iHψ =− iHTOT −
kck − 2〈ck〉
∗ck + 〈ck〉
∗〈ck〉
(19e)
We proceed now to use pure state evolution of Eq. (19d) to calculate the evolution
equation for the generating function
G[aij ] = E[exp(ρijaij)] . (20a)
To calculate the differential of Eq. (20a), we use the Itô stochastic calculus rule for the
differential of a function f(w) of a stochastic variable w,
df(w) = dwf ′(w) +
(dw)2f ′′(w) . (20b)
Applying this to Eq. (20a), we get
dG[aij ] = E[(dρmramr +
dρmramrdρpqapq) exp(ρijaij)] . (20c)
Substituting Eq. (19d) for dρ, and using Eqs. (19a-c), together with the Itô calculus rule
E[dwf(w)] = 0, we get
dG[aij ] = dtE
amr(Lρ)mr +
amrapqCmr,pq
exp(ρijaij)
, (21a)
with the coefficient of the quadratic term in aij given by
Cmr,pq =Cpq,mr = dρmrdρpq
=〈m|(ck − 〈ck〉)ρ|r〉〈p|ρ(ck − 〈ck〉)
+〈m|ρ(ck − 〈ck〉)
†|r〉〈p|(ck − 〈ck〉)ρ|q〉
+〈m|(ck − 〈ck〉)ρ|r〉〈p|(cℓ − 〈cℓ〉)ρ|q〉u
+〈m|ρ(ck − 〈ck〉)
†|r〉〈p|ρ(cℓ − 〈cℓ〉)
†|q〉ukℓ
(21b)
This expression can be rearranged by using the identity, valid for general operators A,B,
general states |r〉, |m〉, and general pure state (idempotent) density matrix ρ,
ρA|r〉〈m|Bρ = ρ〈m|BρA|r〉 , (22a)
giving an alternative result for Cmr,pq
Cmr,pq =Wmqρpr +Wprρmq
+[(ck − 〈ck〉)ρ]mqu
kℓ[(cℓ − 〈cℓ〉)ρ]pr
+[ρ(ck − 〈ck〉)
†]prukℓ[ρ(cℓ − 〈cℓ〉)
†]mq ,
(22b)
where we have used Eq. (18) defining the operator W , and where we use the subscript
notation of Eq. (3b) for matrix elements, so that in general Amr = 〈m|A|r〉.
From the evolution equation of Eqs. (21a,b) and (22b) for the generating function,
by expansion in powers of a we can read off the evolution equation for the general density
tensor of order n. Employing now the condensed notation of Eq. (5e), in which matrix
indices are not indicated explicitly, we have
dρ(n) =dtE[
(ρ1...ρn)ℓ(Lρ)ℓ
ℓ<m=1
(ρ1...ρn)ℓmCℓm] .
(23a)
Here (ρ1...ρn)ℓ denotes the product
j=1 ρj with the factor ρℓ omitted, and similarly, (ρ1...ρn)ℓm
denotes the product
ρj with the factors ρℓ and ρm omitted.
2 The coefficient Cℓm is
given by
Cℓm = Cmℓ =[(ck − 〈ck〉)ρ]ℓ[ρ(ck − 〈ck〉)
+[ρ(ck − 〈ck〉)
†]ℓ[(ck − 〈ck〉)ρ]m
+[(ck − 〈ck〉)ρ]ℓ[(ck̄ − 〈ck̄〉)ρ]mu
+[ρ(ck − 〈ck〉)
†]ℓ[ρ(ck̄ − 〈ck̄〉)
†]mukk̄ ,
(23b)
which corresponds in an obvious way to Eq. (21b) when matrix elements are written explicitly
between states 〈m| and |r〉 in the Hilbert space labeled by ℓ, and between states 〈p| and |q〉
in the Hilbert space labeled by m. (No relation is implied between the m used as a state
label, and the m used as a Hilbert space label.) Since Cℓm in Eq. (23a), which depends
through the terms involving ukk̄ on the choice of unraveling, is multiplied by two powers of
a, it does not contribute to the evolution equation for the reduced density matrix ρ(1). So
as expected, the reduced density matrix evolution is given solely by the Lindblad term and
is independent of the choice of unraveling. Higher density tensors ρ(n), with n ≥ 2, have
evolution equations that receive contributions from Cℓm, and so contain information that
distinguishes between different unravelings of the Lindblad evolution.
As a simple illustration of how the tensors ρ(n) for n ≥ 2 distinguish between different
unravelings, let us consider the case of real noise, dξk = dξ
k, for which ujk = δjk, and with
2 For n = 1, (ρ1)1 = 1 and (ρ1)ℓm = 0, while for n = 2, (ρ1ρ2)12 = 1.
a single Lindblad c1, which we choose as either c1 = A or c1 = iA, with A a self-adjoint
operator. Both choices of c1 lead to the same Lindblad, since L is invariant under rephasing
of ck, but through the ukk̄ terms they lead to different expressions for Cmr,pq. When ck = A,
we find from Eq. (21b)
Cmr,pq =〈m|{A− 〈A〉, ρ}|r〉〈p|{A− 〈A〉, ρ}|q〉
=〈m|[ρ, [ρ, A]]|r〉〈p|[ρ, [ρ, A]]|q〉 ,
(24a)
while when ck = iA, we have instead
Cmr,pq = −〈m|[A, ρ]|r〉〈p|[A, ρ]|q〉 . (24b)
We will return to this example in Sec. 10.
Using the expression of Eq. (21a) for the time evolution of the generating function,
the descent equations of Eq. (7b) can be verified; this calculation is carried out in Appendix
5. Jump process Schrödinger equation
As our next density tensor application we consider the jump process (piecewise de-
terministic process, or PDP) Schrödinger equation, given by
d|ψ〉 = Adt|ψ〉+BkdNk|ψ〉 , (25a)
where a sum over k is understood, with A and the Bk general (non-self-adjoint) operators,
and with the dNk independent discrete random variables obeying
dNjdNk = δjkdNk , dNjdt = 0 . (25b)
Straightforward calculation shows that this process preserves the norm of |ψ〉 and the pure
state condition ρ2 = ρ = |ψ〉〈ψ|, provided that A and B obey the restrictions
〈A+ A†〉 =0,
〈Bk +B
kBk〉 =0 ,
(25c)
with no summation over k on the second line, which must hold individually for each value
of k. Corresponding to Eq. (25a), the density matrix obeys the evolution equation
dρ =(Aρ+ ρA†)dt+QkdNk ,
Qk =Bkρ+ ρB
+BkρB
(25d)
with a sum over k understood in the dNk term on the first line, but no sum over k understood
in the second line.
Let now E|ψ〉[...] denote an expectation conditioned on the current value of the wave
function being |ψ〉, and E[...] be the expectation value over the entire history of the jump
process (which leads to an ensemble of different current values of the wave function). We
wish to find restrictions on A, Bk, and on
E|ψ〉[dNk] ≡ vkdt , (26a)
such that the expectation of dρ takes the Lindblad form of Eq. (16b), that is,
E[dρ] =dtLρ
Lρ =− i[HTOT, ρ] + ckρc
kck, ρ} .
(26b)
Making the Ansatz
ck −Kk
− 1 , (27a)
withKk constants (this Ansatz includes both the standard quantum jump equation (Kk = 0),
and the orthogonal jump equation (Kk = 〈ck〉), as special cases; see Schack and Brun [10]
for a concise review), some calculation shows that the conditions of Eqs. (25c) and (26b) are
satisfied if we choose
vk =〈(ck −Kk)
†(ck −Kk)〉 ,
A =− iHTOT −
kck +
kck〉+ ckK
(〈ck〉K
k + 〈ck〉
∗Kk) .
(27b)
Let us now define the order n density tensor for the jump models by
ρ(n) = E[
ρℓ] , (28a)
where we use the condensed notation of Eq. (5e). For the differential of this, we find
dρ(n) =E[
(ρ1...ρn)ℓdρℓ +
ℓ<m=1
(ρ1...ρn)ℓmdρℓdρm
ℓ<m<p=1
(ρ1...ρn)ℓmpdρℓdρmdρp + ... + dρ1dρ2dρ3...dρn−1dρn] ,
(28b)
where all powers of dρ must be retained because dN2k = dNk. Using the conditional proba-
bility formula p(|ψ〉∩dNk) = p(dNk| |ψ〉)p(|ψ〉), we get the conditional expectation formula,
valid for an arbitrary function F of the state |ψ〉,
E[F (|ψ〉)dNk] = E[F (|ψ〉)E|ψ〉[dNk]] = E[F (|ψ〉)vk] . (29a)
Using this equation to evaluate the higher order terms in Eq. (28b), together with Eq. (26b)
for the leading term, we get
dρ(n) =dtE[
(ρ1...ρn)ℓ(Lρ)ℓ
ℓ<m=1
(ρ1...ρn)ℓmvk(Qk)ℓ(Qk)m
ℓ<m<p=1
(ρ1...ρn)ℓmpvk(Qk)ℓ(Qk)m(Qk)p + ...+ vk(Qk)1(Qk)2(Qk)3...(Qk)n−1(Qk)n] ,
(29b)
with a sum over k in each term containing vk.
Writing the corresponding generating function in compact notation as
G[a] = E[ea·ρ] , (30a)
the evolution equation for G is given , with the k sum now indicated explicitly, by
dG[a] =E[ea·dρea·ρ]− E[ea·ρ]
(a · dρ)p
)ea·ρ]
=dtE[(a · Lρ+
(a ·Qk)
)ea·ρ] .
(30b)
From Eq. (30b), and the identities
which follow, after some algebra, from Eqs. (16b), (25d),
(27a), and (27b)
{ρ,Lρ} =Lρ−
{ρ,Qk} =Qk −Q
(30c)
one can prove that Eq. (30b) obeys the descent equations, as shown in Appendix C.
6. The density tensor for quantum noise and its kinematical properties
Let us now consider a closed quantum system, consisting of a system S interacting
with an environment E . In such a situation, one does not have a classical probability distri-
bution wα and fluctuations associated with this probability distribution. Instead, one deals
with the system plus environment as the only pure state that is given, with the fluctuations
that are averaged over in deriving the master equation coming from quantum fluctuations
associated with the system-environment interaction. Weighted averages of the sort that we
have used in our definition of Eq. (4a) appear only when the total state is a mixture of pure
states, such as a thermal state, but in this case, important system quantum fluctuations still
occur in each pure state component of this mixture. In order to describe this more general
situation, we shall have to generalize our definition of a density tensor hierarchy.
To achieve this, we initially suppose the overall system plus environment to have the
pure state density matrix ρ. We denote the system basis states by |i〉 , as well as |j〉, and
denote the environment basis states by |ea〉, a = 1, 2, ..... A general density matrix element
has the form 〈e1i|ρ|e2j〉, and the standard reduced density matrix, with the environment
traced out, is defined by
ij = (TrEρ)ij =
〈ei|ρ|ej〉 . (31)
In order to recapture fluctuations that are averaged over in the trace in Eq. (31), we define
the density tensor ρ(n) by
i1j1,i2j2,...,injn
e1,e2,...,en
〈e1i1|ρ|e2j1〉〈e2i2|ρ|e3j2〉...〈en−1in−1|ρ|enjn−1〉〈enin|ρ|e1jn〉
=TrEρi1j1ρi2j2 ...ρinjn .
(32a)
Here we have defined ρiℓjℓ as the matrix, labeled by the system state labels iℓ, jℓ, acting on
the environment Hilbert space HE according to
(ρiℓjℓ)e1e2 = 〈e1|ρiℓjℓ |e2〉 = 〈e1iℓ|ρ|e2jℓ〉 . (32b)
The density tensor ρ(n) is again an operator on a tensor product of system Hilbert spaces
ℓ=1 HS;ℓ. Thus, in a condensed notation analogous to that of Eq. (5e), we can also write
Eq. (32a) as
ρ(n) = TrEρ1ρ2....ρn , (32c)
where ρℓ is an operator acting on HE ⊗HS;ℓ.
We have avoided using a product notation
ℓ=1 in Eq. (32c) because the factors ρiℓjℓ
in Eq. (32a) and ρℓ in Eq. (32c) are different operators on the environment for each ℓ and
thus do not commute. Hence the density tensor is not symmetric under permutation of its
pair indices iℓjℓ, but it is symmetric under cyclical permutation of the indices, as a result
of the cyclic symmetry of the trace. For n = 2, cyclic symmetry is equivalent to symmetry
under pair index interchange, and for n = 3, using the identity
TrABC = Tr
([A,B]C + {A,B}C) , (33)
cyclic symmetry is equivalent to the statement that the density tensor ρ(3) can be written
as the sum of two tensors ρ(3) = ρ(3S) + ρ(3A), with ρ(3S) completely symmetric, and ρ(3A)
completely antisymmetric, under pair index interchange. Also because the density tensor
is not totally symmetric in its pair indices, we cannot introduce a generating function by
imitating Eq. (5d)
Similarly, because of factor non-commutativity, the density tensor satisfies only a
subset of the descent equations of Eqs. (6a), (6b), and (7b). Contraction with δiℓjℓ does not
lead to a descent condition, since δiℓjℓρiℓjℓ is not unity, but rather TrSρ, the reduced density
matrix that acts on the environment when the system is traced out. Contraction of a general
jk with a general iℓ for k 6= ℓ gives nothing useful, since in general non-commuting factors
stand between ρk and ρℓ. However, when a column index jk is contracted with the adjacent
row index ik+1, the two density matrices to which they are attached are linked to form the
product ρ2 = ρ, and so we get the descent relation of Eq. (6b), and others related to it by
cyclic permutation symmetry,
δj1i2ρ
i1j1,i2j2,...,injn
(n−1)
i1j2,i3j3,...,injn
. (34)
As noted before, even when ρ2 6= ρ, the descent relation corresponding to Eq. (34) is still
useful for checking calculations. Since we cannot define a generating function as in Eq. (5d),
in the quantum noise case we do not have analogs of the descent equations in the form of
Eq. (7b); when verifying the descent equations in the various cases considered below, we will
work directly from Eq. (34).
We will also consider a more general definition of the density tensor, corresponding
to the case in which the system plus environment is in a mixed state composed of pure states
ρα with weights wα. Typically, α refers to an eigenvalue of a conserved quantum number of
the total system, such as the energy; when the environment is considered in the independent
particle approximation, with the system back reaction on the environment neglected, α then
can refer to the energies and momenta of each environmental particle. In this case we define
the density tensor by
i1j1,i2j2,...,injn
α;i1j1,i2j2,...,injn
, (35a)
α;i1j1,i2j2,...,injn
= TrEρα;i1j1ρα;i2j2 ...ρα;injn . (35b)
This definition gives information about both the quantum noise or fluctuations contained
within each ρα, and the classical noise or fluctuations associated with the probability distri-
bution wα. Note that in the mixed state case one could also define a density tensor that is
a direct analog of the classical noise definition of Sec. 2, by
(n);CL
i1j1,i2j2,...,injn
TrEρα;iℓjℓ (36)
which would give information only about the classical noise fluctuations associated with the
probability distribution wα. In the examples computed in the following sections, where a
weak coupling approximation is made, the definition of Eq. (36) typically contains no more
information than could be gotten from a product of n reduced density matrix factors, each
of the form
wαTrEρα;iℓjℓ .
As already noted, the density tensor ρ(n) is not measurable by any operation on
the system Hilbert space. Its construction requires knowledge of the full system plus envi-
ronment density matrix, which is not experimentally accessible for complex environments.
Nonetheless ρ(n) is computable in any theory of the system-environment interaction, and
we believe it to be of conceptual and theoretical interest, even if not of direct empirical
relevance.
We close out this section by noting that in the quantum noise case, there is no analog
of Eq.(8a), which relates the positive semidefinite variations Var1,2 to the density tensor ρ
in the classical noise case. The closest analog we find to the fluctuation formulas of Eq. (8a)
involves the n = 3 density tensor. The reason for this is that whereas E[1] = 1, the trace
over the environment of unity is the dimension of the environmental Hilbert space; to get a
unit trace over the environment we must include a factor of ρE ≡ TrSρ, the reduced density
matrix for the environment. This pushes up the order of the density tensor involved from 2
to 3. Specifically, let AS be an operator acting on HS , but which acts as the unit operator on
HE . In place of the expectations used in the classical noise discussion of Eqs. (1a) through
(2d), in the quantum noise case of system plus environment we consider the expression
AE ≡ TrSρAS , which is an operator on the environmental Hilbert space. The trace of this
operator over the environment is TrEAE = TrETrSρAS = TrS(TrEρ)AS = TrSρ
(1)AS , giving
the expectation of the operator AS when the environment is not observed. On the other
hand, the expectation of this operator formed from the environmental reduced density matrix
is TrEρEAE . The mean squared fluctuation of this operator over the environment is positive
semidefinite, and is given by
TrEρE(AE − TrEρEAE)
2 = TrEρEA
E − (TrEρEAE)
2 , (37a)
where we have used the fact that TrEρE = Trρ = 1. Reexpressing Eq. (37a) entirely in terms
of the pure state density matrix ρ, we have
TrETrSρ(TrSρAS)
2 − (TrETrSρTrSρAS)
=δj1i1ASj2i2ASj3i3ρ
i1j1,i2j2,i3j3
− (δj1i1ASj2i2ρ
i1j1,i2j2
(37b)
where we have used the fact that the right hand side of Eq. (37b) involves only the symmetric
part of the order 3 density tensor. Thus, as noted above, where a n = 2 density tensor appears
in Eq. (8a), a n = 3 density tensor appears in Eq. (37b), and where a n = 1 density tensor
appears in Eq. (8a), a n = 2 density tensor appears in Eq. (37b).
7. Collisional Brownian Motion
As our first application of Eqs. (32a-c) and Eqs. (35a,b), we consider the collisional
Brownian motion of a massive Brownian particle immersed in a bath of scattering particles.
We work in the approximation of neglecting recoil of the Brownian particle, and of treating
the bath as a collection of free particles of massm. We consider the pure state density matrix
corresponding to definite momenta {~ki} of the bath particles, calculate the corresponding
order n density tensor defined by Eqs. (32a-c), and then average over the thermal distribution
of the bath particles as in Eqs. (35a,b). Thus the initial density matrix for the total system,
corresponding to the factor ρℓ in Eq. (33d), is
ρTOTℓ = ρℓρE , (38a)
with ρℓ the initial density matrix of the Brownian particle, characterized by its coordinate
matrix elements 〈~Rℓ|ρℓ|~R
ℓ〉, and with ρE the product density matrix for the bath particles,
|~ki〉〈~ki| . (38b)
Since the bath particle scatterings are all independent, we focus on the effect of
the scattering of a single bath particle, of initial momentum ~k, on the Brownian particle,
which we take to be in a superposition of position eigenstates. Thus the initial state of the
Brownian particle and the bath particle that we are considering is
|I〉 =
~R 〉|~k 〉 , (39a)
corresponding to an initial state density matrix
ρI =|I〉〈I|
|~R 〉|~k〉〈~k|〈~R′| .
(39b)
The corresponding Brownian particle matrix element of ρI , which is still an operator on the
bath particle state, takes the form
〈~R|ρI |~R
′〉 = ρ(~R, ~R′)|~k〉〈~k| , (39c)
ρ(~R, ~R′) = c~Rc
. (39d)
Asymptotically, the effect of the scattering is to replace the initial state |I〉 by |F 〉 =
S|I〉, with S the scattering matrix. Substituting Eq. (39a), and using translation invariance
to relate the scattering matrix S with the Brownian particle at a general coordinate, to the
scattering matrix S0 with the Brownian particle at the origin, we get [11]
|F 〉 =S|I〉 =
c~RS|
~R 〉|~k 〉
~R 〉e−i
~kOP·~RS0e
i~kOP·~R|~k 〉 ,
(40a)
with ~kOP the momentum operator for the bath particle. The corresponding final density
matrix is then
ρF =|F 〉〈F |
|~R 〉e−i
~kOP·~RS0e
i~kOP·~R|~k 〉〈~k|e−i
~kOP·~R
i~kOP·~R
〈~R′| ,
(40b)
and the Brownian particle matrix element of ρF , which is again an operator acting on the
bath particle, is
〈~R|ρF |~R
′〉 = ρ(~R, ~R′)e−i
~kOP·~RS0e
i~kOP·~R|~k〉〈~k|e−i
~kOP·~R
i~kOP·~R
. (40c)
Substituting this expression into Eq. (33d), we get
~R1 ~R
,..., ~Rn ~R′n;F
ρ(~Rℓ, ~R
~k|e−i
~kOP·~R
i~kOP·~R
~kOP·~Rℓ+1S0e
i~kOP·~Rℓ+1 |~k〉
ρ(~Rℓ, ~R
i~kOP·(~R
−~Rℓ+1)S0|~k〉e
i~k·(~Rℓ+1−~R
(40d)
with ~Rn+1 = ~R1. The matrix element appearing in the final line of Eq. (40d) is one that
is familiar from the standard calculation of the reduced density matrix (that is, ρ
~R, ~R′
) for
collisional decoherence [12]. Writing
〈~k|S
i~kOP·(~R
−~Rℓ+1)S0|~k〉e
i~k·(~Rℓ+1−~R
) = 1 + f(~Rℓ+1 − ~R
ℓ) , (41a)
with f proportional to the square of the scattering amplitude, the product of matrix elements
in Eq. (40d) can be written, to second order accuracy in the scattering amplitude, as
[1 + f(~Rℓ+1 − ~R
ℓ)] ≃ 1 +
f(~Rℓ+1 − ~R
ℓ) . (41b)
We also note that Eq. (39c), when substituted into Eq. (32c), implies that the value of ρ(n)
before the scattering is
~R1 ~R
,..., ~Rn ~R′n;I
ρ(~Rℓ, ~R
ℓ) . (41c)
Thus when the approximation of Eq. (41b) is substituted into Eq. (40d), we get
~R1 ~R
,..., ~Rn ~R′n;F
~R1 ~R
,..., ~Rn ~R′n;I
f(~Rℓ+1 − ~R
~R1 ~R
,..., ~Rn ~R′n;I
. (41d)
At this point our work is essentially finished, since the remaining steps are identical
to the standard calculation [11,12,13] proceeding from the n = 1 case of Eq. (41d), and the
structure of Eq. (41d) makes it clear how to generalize the standard result for ρ(1) to the
case of general ρ(n). In brief, the standard procedure is to multiply the right hand side of
Eq. (41d) by the number of scattering particles, which combines with a normalizing factor of
the inverse volume to give an overall factor of N , the scattering particle density. The effect
of the thermal distribution µ(~k) of momenta ~k is taken into account by including an integral
d~kµ(~k), in accordance with the mixed state procedure of Eq. (35a). Finally, expressing
the S matrix in terms of the scattering amplitude f(~k′, ~k), and noting that the squared
delta function for energy conservation gives an overall factor of the elapsed time, Eq. (41d)
becomes, in the limit of small elapsed time, a formula for the time derivative of ρ(n). For the
n = 1 case, the standard answer obtained this way is
∂ρ(1)(t)~R~R′
= −F (~R − ~R′)ρ(1)(t)~R~R′ , (42a)
F (~R) = N
d~kµ(~k)
1− ei(
~k−n̂|~k|)·~R)
|f(n̂|~k|, ~k)|2 , (42b)
where n̂ is a unit vector which gives the direction of the scattered particle momentum
~k′ = n̂|~k|. To compare Eq. (42b) with the n = 1 case of Eq. (41d), we replace ~R by
~R2 = ~R1 and ~R
′ by ~R′1. Then we see that the generalization to n ≥ 1 is given by
∂ρ(n)(t)~R1 ~R′1,..., ~Rn ~R′n
F (~Rℓ+1 − ~R
(n)(t)~R1 ~R′1,..., ~Rn ~R′n
. (42c)
This is our final result for collisional Brownian motion, giving the evolution equation
obeyed by the order n density tensor . We see that it has the generic symmetries expected in
the quantum noise case: although not totally symmetric in its pair indices, ρ(n) is symmetric
under cyclic permutation of these indices. As additional checks, we see that for n = 2 the
factor involving F is
F (~R2 − ~R
1) + F (
~R1 − ~R
2) , (43a)
which is symmetric under the interchange 1 ↔ 2, while for n = 3 we have
F (~R2 − ~R
1) + F (
~R3 − ~R
2) + F (
~R1 − ~R
3) = F
S + FA ,
F S =
[F (~R2 − ~R
1) + F (
~R3 − ~R
2) + F (
~R1 − ~R
+F (~R1 − ~R
2) + F (
~R3 − ~R
1) + F (
~R2 − ~R
3)] ,
[F (~R2 − ~R
1) + F (
~R3 − ~R
2) + F (
~R1 − ~R
−F (~R1 − ~R
2)− F (
~R3 − ~R
1)− F (
~R2 − ~R
3)] ,
(43b)
with F S symmetric, and FA antisymmetric, under any of the pair interchanges 1 ↔ 2, or
1 ↔ 3, or 2 ↔ 3. Checking the descent equations is easy. Setting ~R′1 =
~R2, the term
F (~R2− ~R
1) in Eq. (42c) vanishes, so that on integrating over
~R′1 one is left on the right hand
side with a sum F (~R1 − ~R
n) + F (
~R3 − ~R
2) + ... that does not involve
~R′1, times
d~R′1ρ
(n)(t)~R1 ~R′1, ~R
,..., ~Rn ~R′n
, (43c)
and so the descent equation for ρ(n)(t) then implies the descent equation for its time deriva-
tive.
8. The weak coupling Born-Markov approximation and the quantum optical
master equation for the density tensor
We turn next to the density tensor extension of the standard weak coupling Born-
Markov approximation, that is used to give a master equation for the reduced density matrix
ρ(1) for a system S interacting with an environment E . We assume a total system plus envi-
ronment Hamiltonian HTOT = HE +HS +H , with HE and HS respectively the environment
and system Hamiltonians, and with H the system-environment interaction Hamiltonian.
(We omit the customary subscript I on the interaction Hamiltonian to avoid a proliferation
of subscripts.) We shall work in this section in interaction picture, in which the operators
carry the time dependence associated with HE and HS . Thus the interaction Hamiltonian
carries a time dependence H(t), and the density matrix obeys the equation of motion
dρ(t)
= −i[H(t), ρ(t)] (44a)
which can be integrated to give
ρ(t) = ρ(0)− i
ds[H(s), ρ(s)] . (44b)
Substituting Eq. (44b) back into Eq. (44a) gives the additional evolution equation
dρ(t)
= −i[H(t), ρ(0)]−
ds[H(t), [H(s), ρ(s)]] . (44c)
One then notes that up to an error of order H3, the time argument of the factor ρ(s) in the
double commutator term is irrelevant, so this factor can be approximated as ρ(t), giving
dρ(t)
= −i[H(t), ρ(0)]−
ds[H(t), [H(s), ρ(t)]] , (44d)
which is used as the starting point for the standard master equation derivation.
Our first step is to derive a suitable extension of Eq. (44d) for the product ρ1ρ2...ρn
that appears in Eq. (32c). By the chain rule, we have
d(ρ1ρ2...ρn)
ρ2...ρn + ρ1...
...ρn + ρ1....
. (45a)
For each undifferentiated factor on the right of Eq. (45a) we substitute Eq. (44b), and for
each time derivative factor we substitute Eq. (44c), with appropriate subscripts added. Let
us now organize the terms obtained this way according to the number of factors of H that
appear. Since Eq. (44c) contains at least one factor of H , there are no terms in Eq. (45a)
with no factors of H . The general term in Eq. (45a) with one factor of H comes from the
term in Eq. (44c) with one factor of H , multiplied by the product of the terms from Eq. (44b)
with no factors of H , giving
[H1(t), ρ1(0)]ρ2(0)...ρn(0)+ρ1(0)[H2(t), ρ2(0)]...ρn(0)+ ...+ρ1(0)ρ2(0)...[Hn(t), ρn(0)]
(45b)
The terms in Eq. (45a) with two factors of H are of two types: (1) the quadratic term in
H on the right of Eq. (44c) times factors of ρ(0), and (2) the linear term in H on the right
of Eq. (44c), multiplied by one factor of the linear term on the right of Eq. (44b), times
factors of ρ(0). We now note that up to an error of order H3, in terms that already contain
two factors of H we can replace all factors ρ(0) or ρ(s) by the corresponding ρ(t), since the
differences ρ(t) − ρ(s) and ρ(t) − ρ(0) are all of order H . Collecting everything, we get the
following formula, which gives the needed extension of Eq. (44d),
d(ρ1ρ2...ρn)
ρ1(0)...ρℓ−1(0)[Hℓ(t), ρℓ(0)]ρℓ+1(0)...ρn(0)
ρ1(t)...ρℓ−1(t)
ds[Hℓ(t), [Hℓ(s), ρℓ(t)]]ρℓ+1(t)...ρn(t)
{ρ1(t)...ρℓ−1(t)[Hℓ(t), ρℓ(t)]ρℓ+1(t)...ρm−1(t)
ds[Hm(s), ρm(t)]ρm+1(t)...ρn(t)
+ ρ1(t)...ρℓ−1(t)
ds[Hℓ(s), ρℓ(t)]ρℓ+1(t)...ρm−1(t)[Hm(t), ρm(t)]ρm+1(t)...ρn(t)}+O(H
(45c)
Taking the overall TrE of this expression then gives a formula for the time evolution of ρ
(n)(t)
as defined by Eq.(32c).
We now make two standard assumptions. First of all, we assume at that at the
initial time t = 0, the density matrix factorizes so that ρ(0) = ρEρS , with ρE and ρS
respectively density matrices for the environment and the system which commute with one
another, and with ρE a pure state density matrix obeying ρ
E = ρE . Secondly, we assume
that 〈H〉E = TrEρEH = 0, that is, we take the interaction Hamiltonian to have a vanishing
expectation in the initial environmental state. As a result of these two assumptions, the
environmental trace of the first term on the right hand side of Eq. (45c) vanishes, since
TrEρ1(0)...ρℓ−1(0)[Hℓ(t), ρℓ(0)]ρℓ+1(0)...ρn(0)
=ρS1...ρSℓ−1[(TrEρEHℓ(t)), ρSℓ]ρSℓ+1...ρSn = 0 .
(46a)
The remaining terms in Eq. (45c) all have two factors of H . Since ρ(t) and ρ(0) differ by
one power of H , in these terms, up to an error of order H3, we can replace all factors ρ(t)
by the factorized approximation
ρ(t) ≃ ρ(0) = ρEρS = ρETrEρ(0) ≃ ρETrEρ(t) = ρEρ
(1)(t) . (46b)
With these simplifications, and remembering that system operator factors ρ
with different
index values ℓ act on different Hilbert spaces HS;ℓ and so commute, Eq. (45c) becomes an
extended version of the Redfield equation,
dρ(n)(t)/dt = −
1 (t)...ρ
n (t))ℓTrEρ
ds[Hℓ(t), [Hℓ(s), ρ
(t)ρE ]]
1 (t)...ρ
n (t))ℓmTrE
n−(m−ℓ)−1
× {[Hℓ(t), ρ
(t)ρE ]ρ
m−ℓ−1
E [Hm(s), ρ
m (t)ρE ] + [Hℓ(s), ρ
(t)ρE ]ρ
m−ℓ−1
E [Hm(t), ρ
m (t)ρE ]} .
(46c)
This is converted to the Born-Markov equation by setting s→ t− s, and then extending the
upper limit of the s integration from t to ∞, giving
dρ(n)(t)/dt = −
1 (t)...ρ
n (t))ℓTrEρ
ds[Hℓ(t), [Hℓ(t− s), ρ
(t)ρE ]]
1 (t)...ρ
n (t))ℓmTrE
n−(m−ℓ)−1
× {[Hℓ(t), ρ
(t)ρE ]ρ
m−ℓ−1
E [Hm(t− s), ρ
m (t)ρE ] + [Hℓ(t− s), ρ
(t)ρE ]ρ
m−ℓ−1
E [Hm(t), ρ
m (t)ρE ]} .
(46d)
We now note that Eq. (46d) can be further simplified, by taking account of the fact
that whenever an H factor is sandwiched between factors of ρE it vanishes, since ρEHρE =
ρE〈H〉E = 0. This eliminates all terms in the sum over ℓ,m that are not adjacent in a cyclic
sense, i.e., that do not either have m = ℓ + 1, ℓ = 1, ..., n − 1, or ℓ = 1, m = n. The latter,
by use of the cyclic properties of the trace, can be rearranged to give the ℓ = n term of the
former set. We thus get a simplified set of Born-Markov equations. For n = 1, we get the
usual starting point for the Born-Markov master equation derivation,
dρ(1)(t)/dt = −TrE
ds[H(t)H(t− s)ρ(1)(t)ρE + ρ
(1)(t)ρEH(t− s)H(t)
−H(t)ρ(1)(t)ρEH(t− s)−H(t− s)ρ
(1)(t)ρEH(t)] ,
(47a)
and for n ≥ 2, with the subscript n+ 1 identified with 1,
dρ(n)(t)/dt = −TrEρE
× {(ρ
1 (t)...ρ
n (t))ℓ[Hℓ(t)Hℓ(t− s)ρ
(t) + ρ
(t)Hℓ(t− s)Hℓ(t)]
1 (t)...ρ
n (t))ℓℓ+1[ρ
(t)Hℓ(t)Hℓ+1(t− s)ρ
ℓ+1(t) + ρ
(t)Hℓ(t− s)Hℓ+1(t)ρ
ℓ+1(t)]} .
(47b)
At this point it is useful to check (and we have done so) that the descent equations are
satisfied by Eqs. (47a) and (47b).
The remainder of the derivation follows closely the standard master equation deriva-
tion, in the rotating wave approximation, that proceeds from Eq. (47a), so we will only give
a sketch. For further details, and in particular a discussion of the physical justification for
the approximations involved, see Sec. 3.3 of ref [1] and also ref [13]. One assumes that Hℓ(t)
has the form
Hℓ(t) =
eiωtA
ℓα(ω)Bα(t) , (48a)
with A
ℓα acting only in the system Hilbert space HS;ℓ and with Bα acting only in the
environment Hilbert space HE , and with the Hermiticity properties A
(ω) = Aℓα(−ω) and
B†α(t) = Bα(t). Since Eqs. (47a,b) are quadratic in H , one uses Eq. (48a) twice; for each
Hk(t− s) (regardless of the value of the index k) one writes
Hk(t− s) =
e−iω(t−s)Akβ(ω)Bβ(t− s) , (48b)
and for each Hk(t) (again regardless of the value of k) one writes
Hk(t) =
(ω′)B†α(t) . (48c)
The rotating wave approximation then consists of neglecting terms in the double sum with
ω′ 6= ω, so that only the diagonal terms ω′ = ω are left. From the trace over the environment,
and the integral over s, one gets correlators of the form
dseiωs〈B†α(t)Bβ(t− s)〉E ≡ Γαβ(ω) ,
dseiωs〈Bβ(t− s)B
α(t)〉E = Γαβ(−ω)
(49a)
where in the second line we have used the definition of the first line and the adjointness
properties of the integrand. It is also customary to decompose the reservoir correlation
function Γαβ into self-adjoint and anti-self-adjoint parts, according to
Γαβ(ω) =
γαβ(ω) + iSαβ(ω) . (49b)
Proceeding in this fashion, after some algebra one gets the final result, which can be
written as an equation for all n ≥ 1 by including a δn1 to take account of the special nature
of the n = 1 equation,
dρ(n)(t)/dt =
1 (t)...ρ
n (t))ℓi[ρ
Sαβ(ω)A
(ω)Aℓβ(ω)]
γαβ(ω)
1 (t)...ρ
n (t))ℓ
δn1Aℓβ(ω)ρ
(ω)Aℓβ(ω), ρ
1 (t)...ρ
n (t))ℓℓ+1ρ
(ω)Aℓ+1β(ω)ρ
ℓ+1(t)
(50a)
Despite the fact the the n = 1 and n ≥ 2 density tensors have a different structure, the
descent equations are satisfied by Eq. (50a), as verified in Appendix D.
Finally, we note that Eq. (50a) is readily converted to the quantum optical master
equation and its density tensor generalizations, by taking α to be a three-vector index,
so that Aα becomes ~A, which is related to the dipole operator by Eq. (3.182) of ref [1].
Also, one takes Sαβ(ω) = δαβS(ω), with S(ω) given by Eq. (3.205) of ref [1], and γαβ(ω) =
(4ω3/3)[1 + N(ω)]δαβ , with N(ω) = 1/(e
βω − 1) the photon number operator. One gets in
this way the density tensor generalization of the quantum optical master equation,
dρ(n)(t)/dt =
1 (t)...ρ
n (t))ℓi[ρ
S(ω) ~A
(ω) · ~Aℓ(ω)]
(4ω3/3)[1 +N(ω)]
1 (t)...ρ
n (t))ℓ
δn1 ~Aℓ(ω) · ρ
ℓ (t)
ℓ (ω)−
ℓ (ω) ·
~Aℓ(ω), ρ
ℓ (t)}
1 (t)...ρ
n (t))ℓℓ+1ρ
(t) ~A
(ω) · ~Aℓ+1(ω)ρ
ℓ+1(t)
(50b)
which is our final result of this section.
9. The Caldeira–Leggett model master equation for the density tensor
The Caldeira–Leggett model [14] describes the damping of the one-dimensional mo-
tion of a Brownian particle of mass m, moving in a potential V (x), and interacting with
an environment consisting of harmonic oscillators with masses mo and frequencies ωo, and
annihilation operator bo. The interaction Hamiltonian is assumed to be a linear coupling
H = −xB, with
κoxo =
κo(bo + b
o)/(2moωo)
2 (51a)
a weighted sum of the harmonic oscillator coordinates. A counter-term formally of order H2,
Hc = x
2moωo
≡ x2C , (51b)
is included in the calculation, so that the total Hamiltonian is
HTOT = HE +HS +H +Hc , (52a)
with HE and HS respectively the oscillator and particle Hamiltonians,
obo +
+ V (x) .
(52b)
Our aim will be to get a description of the effect on the particle motion of the couplings
to the oscillator environment, in the high temperature limit. Our derivation of the density
tensor generalization of the high temperature master equation closely follows that of Sec.
3.6 of ref [1], to which the reader is referred for a discussion of the physical motivation of
the approximations involved.
Since the environmental expectation of the interaction Hamiltonian H vanishes, we
can proceed directly from the simplified Born-Markov equation of Eqs. (47a) and (47b). The
first step is to transform the density matrix ρ(t) back to Schrödinger picture; it is easy to
see that the effect of this is to replace H(t) by H = H(0), to replace H(t − s) by H(−s)
(with H(−s) still in the interaction picture), and to change d/dt to D/dt, defined by
Dρ(n)(t)/dt = dρ(n)(t)/dt+ i
TrEρ1(t)...ρℓ−1(t)[p
ℓ/(2m) + V (xℓ), ρℓ(t)]ρℓ+1(t)...ρn(t) .
(53a)
It is also necessary to explicitly include commutators arising from the counter term, which is
easy since this term is treated as being already quadratic in H . For the analog of Eq. (47a)
for the special case n = 1, we find
Dρ(1)(t)/dt = −i[Hc, ρ
(1)(t)]− TrE
ds[HH(−s)ρ(1)(t)ρE + ρ
(1)(t)ρEH(−s)H
−Hρ(1)(t)ρEH(−s)−H(−s)ρ
(1)(t)ρEH ] ,
(53b)
and for the analog of Eq. (47b) for n ≥ 2, we have
dρ(n)(t)/dt =− i
1 (t)...ρ
n (t))ℓ[Hcℓ, ρ
ℓ (t)]
−TrEρE
1 (t)...ρ
n (t))ℓ[HℓHℓ(−s)ρ
(t) + ρ
(t)Hℓ(−s)Hℓ]
1 (t)...ρ
n (t))ℓℓ+1[ρ
(t)HℓHℓ+1(−s)ρ
ℓ+1(t) + ρ
(t)Hℓ(−s)Hℓ+1ρ
ℓ+1(t)]} .
(53c)
We next note that
Hℓ = −xℓ(0)B(0) , Hℓ(−s) = −xℓ(−s)B(−s) , (54a)
where, using the assumption that the system evolution is slow compared to the oscillator
time scale, we approximate xℓ(−s) by its free particle dynamics,
xℓ(−s) ≃ xℓ −
s . (54b)
Since the right hand sides of Eqs. (53b,c) are quadratic in H , the operator B giving the
coupling to the oscillators appears, after the environmental trace is taken, only through the
correlators
D(s) ≡i〈[B(0), B(−s)]〉E ,
D1(s) ≡〈{B(0), B(−s)}〉E ,
(55a)
so that we have
〈B(0)B(−s)〉E =
[D1(s)− iD(s)] ,
〈B(−s)B(0)〉E =
[D1(s) + iD(s)] .
(55b)
These correlators appear in the following integrals, which are evaluated or approximated in
Sec. 3.6.2 of ref [1],
dsD(s) =2C ,
dsD1(s) =4mγkBT ,
dssD(s) =2mγ ,
dssD1(s) =4mγkBT/Ω ≃ 0 ,
(55c)
with C the constant defined by the counter term of Eq. (51b), with γ a constant determined
by the harmonic oscillator spectral density, with kB and T respectively the Boltzmann con-
stant and environment temperature, and with Ω a frequency cutoff. For a spectral density
J(ω) with a Lorentz-Drude cutoff function, one has
J(ω) =
2moωo
δ(ω − ωo) =
Ω2 + ω2
. (55d)
This completes the specification of the calculation; the rest is just the algebra of
assembling all the pieces, and so we pass directly to the result. For n = 1, we get the
Caldeira–Leggett master equation,
Dρ(1)(t)/dt = −iγ[x, {p, ρ(1)(t)}]− 2mγkBT [x, [x, ρ
(1)(t)]] . (56a)
For the density tensors with n ≥ 2, we correspondingly get
Dρ(n)/dt =
1 (t)...ρ
n (t))ℓ
−2mγkBT{x
ℓ , ρ
ℓ (t)}+ iγ
ℓ (t)pℓxℓ − xℓpℓρ
ℓ (t)
1 (t)...ρ
n (t))ℓℓ+1[4mγkBTρ
(t)xℓxℓ+1ρ
ℓ+1(t) + iγρ
(t)(xℓpℓ+1 − pℓxℓ+1)ρ
ℓ+1(t)] .
(56b)
We also note that the term proportional to γ on the first line of Eq. (56b) can be written in
the alternative form,
(t)pℓxℓ − xℓpℓρ
(t) +
(t), {xℓ, pℓ}] . (56c)
Equations (56a) and (56b) are our final results for the Caldeira–Leggett model. As was
the case for the master equations derived in the preceding section, despite the differences
between the structure of the n = 1 and the n ≥ 2 equations, the descent equations are
satisfied, as verified in Appendix E.
10. An application to state vector reduction
We turn now to considerations that bridge the discussions given above in the classical
and quantum noise cases. We begin with an analysis of two Itô stochastic Schrödinger
equations,
d|ψ〉 = −
(A− 〈A〉)2dt|ψ〉+ (A− 〈A〉)dWt|ψ〉 , (57a)
d|ψ〉 = −
A2dt|ψ〉+ iAdWt|ψ〉 , (57b)
with dWt a real Brownian noise obeying dW
t = dt, and where we have dropped the Hamil-
tonian term. These lead to the respective density matrix evolution equations
dρ = −
[A, [A, ρ]]dt+ [ρ, [ρ, A]]dWt , (58a)
dρ = −
[A, [A, ρ]]dt+ i[A, ρ]dWt , (58b)
which correspond to the same Lindblad type evolution equation for the expectation E[ρ],
dE[ρ] = LE[ρ]dt , Lρ = −
[A, [A, ρ]] . (58c)
Let us now consider the effect of the stochastic evolutions of Eqs. (58a,b,c) on the
expectation of the variance V = Var(A) of the operator A,
V =TrρA2 − (TrρA)2 ,
E[V ] =TrE[ρ]A2 −E[ρi1j1ρi2j2 ]Aj1i1Aj2i2
=Trρ(1)A2 − ρ
i1j1,i2j2
Aj1i1Aj2i2 ,
(59a)
where in the final line we have used the density tensor definition of Eq. (15). For the time
evolution of E[V ] we have
dE[V ]/dt =Tr(LE[ρ])A2 − dρ
i1j1,i2j2
Aj1i1Aj2i2
=Tr(LE[ρ])A2 − 2E[ρi1j1(Lρ)i2j2 ]Aj1i1Aj2i2 − E[Ci1j1,i2j2 ]Aj1i1Aj2i2 ,
(59b)
where we have used Eq. (58c) in the first line and Eq. (23a) in the second line. Since the
cyclic property of the trace implies that Tr[A, [A, ρ]]A = 0 , the terms in Eq. (59b) involving
the Lindblad L all vanish, and so the time derivative of E[V ] comes entirely form the final
term,
dE[V ]/dt = −E[Ci1j1,i2j2 ]Aj1i1Aj2i2 , (59c)
and thus is determined by the evolution equation for the second order density tensor. This
is why the state vector evolutions of Eqs. (57a) and (57b), or equivalently the density matrix
evolutions of Eqs. (58a) and (58b), lead to very different results for the evolution of the
variance of the operator A. The tensor Ci1j1,i2j2 corresponding to Eqs. (57b) and (58b) is
given in Eq. (24b), and since the cyclic property of the trace implies that Tr[A, ρ]A = 0, one
has dE[V ]/dt = 0 for this evolution. On the other hand, the tensor Ci1j1,i2j2 corresponding
to Eqs. (57a) and (58a) is given in Eq. (24a), and through Eq. (59c) implies that
dE[V ]/dt = −E[(Tr[ρ, [ρ, A]]A)2] = −E[(Tr([ρ, A])2)2] , (59d)
which is negative definite. Starting from Eq. (59d), some simple inequalities imply that the
stochastic evolution of Eqs. (57a) and (58a) drives the variance of A to zero as t→ ∞, and
hence reduces the state vector to an eigenstate of A, as discussed in detail in refs [15].
Let us now consider a quantum system S, consisting of a microscopic system coupled
to a macroscopic measuring apparatus, interacting with a quantum environment E , with the
totality forming a closed system. A general result [16], using just the linearity of quantum
mechanics, shows that state vector reduction cannot occur in this case. To understand this
result through an analysis similar to that just given for Eqs. (57a,b), let us consider the
behavior of the variance of a system operator A which is a good “pointer observable”. By
definition, a system operator commutes with the environment Hamiltonian HE , and since
the system in this case includes the apparatus and so is macroscopic, the pointer observable
also obeys [17] [A,H ] = 0, with H the system-environment interaction Hamiltonian. Let us
now write the density matrix evolution in Schrödinger picture,
dρ/dt = −i[HTOT, ρ] = −i[HS +HE +H, ρ] , (60a)
with HS the system Hamiltonian. We consider the system evolution after a brief interaction
has entangled the apparatus states with the microscopic subsystem quantum states that are
to be distinguished by the pointer reading. For the time evolution of the variance of the
pointer observable A, we have
dV/dt = Tr(dρ/dt)A2 − 2(TrρA)(Tr(dρ/dt)A) , (60b)
which substituting Eq. (60a), and using the cyclic property of the trace and the fact that A
commutes with both HE and H , simplifies to
dV/dt = iTrρ[HS , A
2]− 2i(TrρA)(Trρ[HS , A]) . (60c)
This can be further simplified by using the definition of the reduced density matrix ρ(1) =
TrEρ, together with the fact that the commutators in Eq. (60c) involve only system operators,
giving
dV/dt = iTrSρ
(1)[HS , A
2]− 2i(TrSρ
(1)A)(TrSρ
(1)[HS , A]) . (60d)
We see that, unlike the Itô equation case discussed above, the time derivative of V here is
determined by ρ(1), rather than by ρ(2).
Let us now take the pointer observable to be a pointer center of mass coordinate
A = X , in which case, once the entanglement of the pointer with the microsystem being
measured has been established, the relevant part of the system Hamiltonian HS is P
2/(2M),
with P the total momentum operator for the pointer of macroscopic mass M . Evaluating
the commutators, and writing TrSρ
(1)O = 〈O〉, we see that
dV/dt =(1/M)〈{P,X}〉 − (2/M)〈X〉〈P 〉
=(1/M)〈{X − 〈X〉, P − 〈P 〉}〉 .
(61a)
By the Schwartz inequality, the right hand side of Eq. (61a) is bounded by
(2/M)〈(X − 〈X〉)2〉
2 〈(P − 〈P 〉)2〉
2 = (2/M)∆X∆P . (61b)
Let us now determine the minimum value of the bound of Eq. (61b) that is compatible
with the parameters of a feasible measurement. Since the uncertainty principle implies that
∆X∆P ≥ 1/2, we get a least upper bound on Eq. (61b) by substituting ∆X∆P ∼ 1/2.
This shows that |dV/dt| can be made as small as ∼ 1/M , which since M is macroscopic, can
be made essentially arbitrarily small.3 Hence the variance of the pointer variable A stays
essentially constant, and is not forced to reduce to zero in the course of the measurement.
We conclude, in agreement with the arguments of [16], that a quantum apparatus
interacting with a quantum environment does not act like the stochastic equation of Eq. (57a)
in terms of reducing the state vector. Although a quantum environment acts on a quantum
system with a form of “noise”, our analysis of the density tensor hierarchy in the classical and
3 Restoring factors of Planck’s constant, |dV/dt| can be as small as h̄/M , for which
the reduction time dt is at least of order MdV/h̄. For M ∼ 1024mproton and dV ∼ (1cm)
this gives dt ∼ M(1cm)2/h̄ ∼ 1027s ∼ 1010 times the age of the universe. Note that our
argument places no restriction on the mean pointer momentum 〈P 〉 that establishes the time
needed to attain one or the other of the measurement outcomes X starting from the initial
pointer position.
quantum noise cases shows that structures with different kinematical symmetries,4 different
dynamical evolutions, and different implications for the measurement process are involved.
As a result, the quantum noise in a closed quantum system does not mimic the action of the
classical noise in objective reduction models, and cannot be invoked to give a resolution of the
quantum measurement problem within the framework of unmodified quantum mechanics.
Acknowledgments
I wish to thank Angelo Bassi, Todd Brun, Lajos Diósi, Larry Horwitz, and Lane
Hughston for instructive conversations over a number of years that helped motivate this
study, and Francesco Petruccione for the gift some years ago of a copy of ref [1]. This work
was supported in part by the Department of Energy under Grant #DE–FG02–90ER40542.
4 The dissimilarities between the symmetries of the classical noise and quantum noise
hierarchies are least for the order two density tensor. In the order two case, cyclic symmetry
is equivalent to full permutation symmetry, and so the index symmetry properties are the
same in the classical and quantum noise cases, and as a consequence the descent equations
in the quantum noise case correspond to the idempotence descent equations in the classical
noise case. Only the classical noise descent equation implied by the unit trace condition has
no precise quantum noise counterpart: in the classical case, one has
δi1j1ρ
i1j1,i2j2
= δi1j1E[ρi1j1ρi2j2 ] = E[ρi2j2 ] = ρ
whereas in the quantum noise case one instead has
δi1j1ρ
i1j1,i2j2
= δi1j1TrEρi1j1ρi2j2 = TrEρEρi2j2 6= TrEρi2j2 = ρ
with ρE = TrSρ the reduced density matrix of the environment with the system traced out.
Appendix A: Descent equations for the
isotropic spin-1/2 ensemble
Let us write the generating function of Eq. (14b) as
G[aij ] =fg ,
f(x) = sinh x
2 , x = ~A 2 ,
Tra .
Then, we find
δmrG+ gf
′ ~A · ~σmr ,
∂amr∂apq
δpqgf
′ ~A · ~σmr
+gf ′′ ~A · ~σmr ~A · ~σpq +
gf ′~σmr · ~σpq .
Here the primes denote derivatives of f with respect to x, and in this notation f obeys the
second order differential equation
xf ′′ +
f ′ =
f . (A3)
Contracting the first expression in Eq. (A2) with δmr , and using the tracelessness of the Pauli
matrices, gives the first equation in Eq. (7b). Contracting the second expression in Eq. (A2)
with δrp, and using the differential equation of Eq. (A3) together with the Pauli matrix
identities (~σ 2)mq = 3δmq and σ
iσj = δij+ iǫijkσk, which implies ~A ·~σmp ~A ·~σpq = ~A
2δmq , gives
the second equation in Eq. (7b).
Appendix B: Descent equations for the
Itô stochastic Schrödinger equation
We wish here to verify that
dG[a] = dtE
amr(Lρ)mr +
amrapqCmr,pq
obeys the descent equations of Eq. (7b). Since
δmr(Lρ)mr = δmrCmr,pq = δpqCmr,pq = 0 , (B2a)
we have
∂dG[a]
= dtE
amr(Lρ)mr +
amrapqCmr,pq
(Trρ)eρ·a
= dG[a] , (B2b)
giving the first identity in Eq. (7b). Next we calculate
∂dG[a]
(Lρ)mq +
auv(Cmq,uv + Cuv,mq)
auv(Lρ)uv +
auvarsCuv,rs
(B3a)
while for the contraction of the second variation we have (with indices m, q implicit on the
right hand side)
∂2dG[a]
∂amr∂arq
= dtE
S1 + S2 + auv(T1uv + T2uv)
, (B3b)
(Cmr,rq + Crq,mr) ,
S2 ={Lρ, ρ}mq ,
T1uv =
[(Cmr,uv + Cuv,mr)ρrq + ρmr(Cuv,rq + Crq,uv)] ,
T2uv =
(Lρ)uv +
arsCuv,rs
ρmq ,
(B3c)
We see immediately that auvT2uv gives all of the second line of Eq. (B3a). From Eqs. (16b)
and (18) we find
{Lρ, ρ}mq = (Lρ)mq − [(ck − 〈ck〉)ρ(ck − 〈ck〉)
†]mq − ρmq〈(ck − 〈ck〉)
†(ck − 〈ck〉)〉 , (B4a)
while from Eq. (21b) we have
(Cmr,rq + Crq,mr) = [(ck − 〈ck〉)ρ(ck − 〈ck〉)
†]mq + ρmq〈(ck − 〈ck〉)
†(ck − 〈ck〉)〉 . (B4b)
Hence S1 + S2 = (Lρ)mq, giving the Lρ part of the first line of Eq. (B3a). Finally, again
using Eq. (21b) we find that
[(Cmr,uv + Cuv,mr)ρrq + ρmr(Cuv,rq + Crq,uv)] =
(Cmq,uv + Cuv,mq) , (B4c)
and so auvT1uv gives the remainder of the first line of Eq. (B3a), completing the check of the
descent equations.
Appendix C: Descent equations for the
jump Schrödinger equation
We verify here that
dG[a] = dtE
(a · Lρ+
(a ·Qk)
)ea·ρ
(C1a)
obeys the descent equations of Eq. (7b). Since Tr(Lρ) = 0 and TrQk = 〈Bk+B
Bk〉 = 0,
we have
∂dG[a]
= dtE
(a · Lρ+
(a ·Qk)
)(Trρ)ea·ρ
, (C1b)
checking the first line of Eq. (7b). Next we calculate the first variation of G,
∂dG[a]
(Lρ)mq +
(a ·Qk)
(p− 1)!
(Qk)mq
+(a · Lρ+
(a ·Qk)
)ρmqe
(C2a)
and the contracted second variation,
∂2dG[a]
∂amr∂arq
= dtE
(S1 + S2 + S3 + S4)e
, (C2b)
(a ·Qk)
(p− 2)!
(Q2k)mq ,
S2 ={Lρ, ρ}mq ,
(a ·Qk)
(p− 1)!
{Qk, ρ}mq ,
a · Lρ+
(a ·Qk)
ρmq .
(C2c)
We see immediately that S4 gives all of the second line of Eq. (C2a). From Eq. (30c), which
we rewrite here,
{Lρ, ρ} =Lρ−
{Qk, ρ} =Qk −Q
(C3a)
we see that the Lρ part of S2 and the Qk part of {Qk, ρ} in S3 give the first line of Eq. (C2a).
To complete the verification, we must show that S1 cancels against the remainder of S2+S3,
which is
(a ·Qk)
(p− 1)!
(Q2k)mq . (C3b)
But separating off the p = 2 term of S1, and making the change of variable p → p + 1
in the remaining sum, we see that S1 is exactly the negative of Eq. (C3b), completing the
argument.
Appendix D: Descent equations for the Born-Markov
master equation
We wish here to verify that Eq. (50a) obeys the descent equations of Eq. (34). We
separate the verification into two parts, first checking the descent from n = 2 to n = 1,
and then checking the descent from general n > 2 to n − 1. For the n = 2 density tensor
time derivative, writing out all terms in Eq. (50a) explicitly, and using the fact that since
operators labeled with subscripts 2 and 1 act on different Hilbert spaces, the order in which
they are written is irrelevant, we have
dρ(2)(t)/dt =i[ρ
1 (t),
Sαβ(ω)A
1α(ω)A1β(ω)]ρ
2 (t)
1 (t)i[ρ
2 (t),
Sαβ(ω)A
2α(ω)A2β(ω)]
γαβ(ω)
1α(ω)A1β(ω), ρ
1 (t)}ρ
2 (t)
γαβ(ω)ρ
1 (t)
2α(ω)A2β(ω), ρ
2 (t)}
γαβ(ω)[ρ
1 (t)A
1α(ω)A2β(ω)ρ
2 (t) + A1β(ω)ρ
1 (t)ρ
2 (t)A
2α(ω)] .
(D1a)
Contracting the column index associated with the subscript 1 with the row index
associated with the subscript 2, and dropping the subscripts since all operators now act in
the same Hilbert space, we get
dρ(2)(t)/dt→i[ρ(1)(t),
Sαβ(ω)A
α(ω)Aβ(ω)]ρ
(1)(t)
+ρ(1)(t)i[ρ(1)(t),
Sαβ(ω)A
α(ω)Aβ(ω)]
γαβ(ω)
{A†α(ω)Aβ(ω), ρ
(1)(t)}ρ(1)(t)
γαβ(ω)ρ
(1)(t)
{A†α(ω)Aβ(ω), ρ
(1)(t)}
γαβ(ω)[ρ
(1)(t)A†α(ω)Aβ(ω)ρ
(1)(t) + Aβ(ω)(ρ
(1)(t))2A†α(ω)]
=i[(ρ(1)(t))2,
Sαβ(ω)A
α(ω)Aβ(ω)]
γαβ(ω)
Aβ(ω)(ρ
(1)(t))2A†α(ω)−
{(ρ(1)(t))2, A†α(ω)Aβ(ω)}
(D1b)
which has the structure of dρ(1)(t)/dt and so verifies the 2 → 1 descent.
To verify the n→ n− 1 descent we make some simplifications in notation. We omit
all superscripts (1), since this leads to no ambiguities, as well as all time arguments (t) and
all frequency arguments (ω). We also abbreviate
Sαβ(ω)A
(ω)Aℓβ(ω) ,
γαβ(ω)A
(ω)Aℓβ(ω) .
(D2a)
Our general strategy is to split the sum
ℓ=1 containing (ρ1...ρn)ℓ into
ℓ=2 plus the ℓ = 1
and the ℓ = n terms, and to split the sum
ℓ=1 containing (ρ1...ρn)ℓℓ+1 into
ℓ=2 plus the
ℓ = 1, ℓ = n− 1, and ℓ = n terms. For the part of dρ(n)/dt involving Lℓ, we have
(ρ1...ρn−1)ℓρni[ρℓ, Lℓ] + (ρ2...ρn)i[ρ1, L1] + (ρ1...ρn−1)i[ρn, Ln] , (D2b)
which on contracting the column index associated with the subscript n with the row index
associated with the subscript 1, and relabeling all quantities that had subscript n with
subscript 1, since they act now in the same Hilbert space, gives
(ρ21ρ2...ρn−1)ℓi[ρℓ, Lℓ] + ρ2...ρn−1i(ρ1[ρ1, L1] + [ρ1, L1]ρ1)
(ρ21ρ2...ρn−1)ℓi[ρℓ, Lℓ] + ρ2...ρn−1i[ρ
1, L1] ,
(D2c)
which has the correct structure for the corresponding part of dρ(n−1)/dt, with ρ1 replaced by
ρ21. The remainder of dρ
(n)/dt is
(ρ1...ρn)ℓ
{Mℓ, ρℓ} − (ρ2...ρn)
{M1, ρ1} − (ρ1...ρn−1)
{Mn, ρn}
(ρ1...ρn)ℓℓ+1ρℓA
Aℓ+1βρℓ+1 + ρ3...ρnρ1A
1αA2βρ2
+ ρ1...ρn−2ρn−1A
n−1αAnβρn + ρ2...ρn−1ρnA
nαA1βρ1
(D3a)
Again, contracting the column index associated with the subscript n with the row index
associated with the subscript 1, and relabeling all quantities that had subscript n with
subscript 1, since they act now in the same Hilbert space, gives
(ρ21...ρn−1)ℓ
{Mℓ, ρℓ} − (ρ2...ρn−1)
ρ1M1ρ1(∗) +
{M1, ρ
(ρ21...ρn−1)ℓℓ+1ρℓA
Aℓ+1βρℓ+1 + (ρ3...ρn−1)ρ
1αA2βρ2
+ ρ2...ρn−2ρn−1A
n−1αA1βρ
1 + ρ2...ρn−1ρ1A
1αA1βρ1(∗)
(D3b)
which on canceling the terms marked with (∗) gives the corresponding part of dρ(n−1)/dt,
with ρ1 replaced by ρ
1. This completes the verification of the n→ n− 1 descent.
Appendix E: Descent equations for the
Caldeira–Leggett model
We verify here that Eqs. (56a) and (56b) obey the descent equations of Eq. (34).
As in the preceding appendix, we simplify the notation by omitting all superscripts (1) and
all time arguments (t). We first verify the n = 2 to n = 1 descent. For the n = 2 case of
Eq. (56b), we have
Dρ(2)/dt =ρ2[−2mγkBT (x
1ρ1 + ρ1x
1) + iγ(ρ1p1x1 − x1p1ρ1)]
+ρ1[−2mγkBT (x
2ρ2 + ρ2x
2) + iγ(ρ2p2x2 − x2p2ρ2)]
+4mγkBT (ρ1x1x2ρ2 + ρ2x2x1ρ1) + iγ[ρ1(x1p2 − p1x2)ρ2 + ρ2(x2p1 − p2x1)ρ1] .
(E1A)
Contracting the column index associated with the subscript 1 with the row index associated
with the subscript 2, and dropping subscripts since all operators now act in the same Hilbert
space, we get
Dρ(2)/dt→− 2mγkBT (x
2ρ2 + ρx2ρ) + iγ(ρpxρ− xpρ2)
− 2mγkBT (ρx
2ρ+ ρ2x2) + iγ(ρ2px− ρxpρ)
+4mγkBT (ρx
2ρ+ xρ2x) + iγ[ρ(xp− px)ρ+ pρ2x− xρ2p] .
(E1B)
We see that the terms that have an operator sandwiched between two factors of ρ cancel,
leaving only terms involving ρ2, which have the form of Eq. (56a) with ρ replaced by ρ2.
To check the n > 2 to n−1 descent, we split the sums that occur in the same manner
as in Appendix D. We thus write Eq. (56b) in the form
Dρ(n)/dt =
(ρ1...ρn)ℓ[−2mγkBT{x
ℓ , ρℓ}+ iγ(ρℓpℓxℓ − xℓpℓρℓ)]
+ρ2...ρn[−2mγkBT{x
1, ρ1}+ iγ(ρ1p1x1 − x1p1ρ1)]
+ρ1...ρn−1[−2mγkBT{x
n, ρn}+ iγ(ρnpnxn − xnpnρn)]
(ρ1...ρn)ℓℓ+1[4mγkBTρℓxℓxℓ+1ρℓ+1 + iγρℓ(xℓpℓ+1 − pℓxℓ+1)ρℓ+1]
+ρ3...ρn[4mγkBTρ1x1x2ρ2 + iγρ1(x1p2 − p1x2)ρ2]
+ρ1...ρn−2[4mγkBTρn−1xn−1xnρn + iγρn−1(xn−1pn − pn−1xn)ρn]
+ρ2...ρn−1[4mγkBTρnxnx1ρ1 + iγρn(xnp1 − pnx1)ρ1] .
We now contract the column index associated with the subscript n with the row index
associated with the subscript 1, and relabel all quantities that had subscript n with subscript
1, since they act now in the same Hilbert space. As is readily seen by inspection of Eq. (E2),
this gives Eq. (56b) with n replaced by n−1 and with ρ1 replaced by ρ
1, together with terms
of the wrong structure, that grouped together give (4−2−2)ρ2...ρn−1mγkBTρ1x
1ρ1 = 0 and
(1 − 1)ρ2...ρn−1iγρ1(x1p1 − p1x1)ρ1 = 0, which thus vanish. This completes the verification
of the descent equation for Eq. (56b).
References
[1] Breuer H-P and Petruccione F (2002) The Theory of Open Quantum Systems (Oxford:
Oxford University Press)
[2] Breuer H-P and Petruccione F (1996) Phys. Rev. A 54 1146
[3] Wiseman H M (1993) Phys. Rev. A 47 5180
[4] Mølmer K, Castin Y and Dalibard J (1993) J. Opt. Soc. Am. B 10 524
[5] Mielnik, B (1974) Commun. Math. Phys. 37 221; see especially p 240. I wish to thank
Lane Hughston for bringing this reference, and ref [6] as well, to my attention.
[6] Brody, D C and Hughston, L P (1999) J. Math. Phys. 40 12, Eqs. (31) and (32); Brody,
D C and Hughston, L P (1999) Proc. Roy. Soc. A 455 1683, Sec. 2(e); Brody, D C and
Hughston, L P (2000) J. Math. Phys. 41, 2586, Eq. (9) and subsequent discussion.
[7] Wiseman H M and Diósi L (2001) Chem. Phys. 268 91. See also Diósi L (1986) Phys.
Lett. A 114 451 for the transition rate operator.
[8] Lindblad G (1976) Commun. Math. Phys. 48 119
[9] Gorini V, Kossakowski A and Sudarshan E C G (1976) J. Math. Phys. 17 821
[10] Schack R and Brun T A (1997) Comp. Phys. Commun. 102 210
[11] Gallis M R and Fleming G N (1990) Phys. Rev. A 42 38
[12] Diósi L (1995) Europhys. Lett. 30 63; Dodd P J and Halliwell J J (2003) Phys. Rev. D
67 105018; Hornberger K and Sipe J E (2003) Phys. Rev. A 68 012105; Adler S L (2006)
J. Phys. A: Math. Gen. 39 14067
[13] Hornberger K (2006) Introduction to decoherence theory, arXiv: quant-ph/0612118
[14] Caldeira A O and Leggett A J (1983) Physica A 121 587
[15] Ghirardi G C, Pearle P and Rimini A (1990) Phys. Rev. A 42 78; Hughston L P (1996)
Proc. Roy. Soc. A 452 953; Adler S L and Horwitz L P (2000) J. Math. Phys. 41 2485;
Adler S L, Brody D C, Brun T A and Hughston L P (2001) J Phys. A: Math. Gen. 34
8795; Adler S L (2004) Quantum Theory as an Emergent Phenomenon (Cambridge UK:
http://arxiv.org/abs/quant-ph/0612118
Cambridge University Press) Sec. 6.2
[16] Bassi A and Ghirardi G C Phys. Lett. A 275 373
[17] Zurek W H (1981) Phys. Rev. D 24 1516; Schlosshauer M (2004) Rev. Mod. Phys. 75
1267, p. 1280
[18] For reviews of stochastic reduction models, see Bassi A and Ghirardi G C (2003) Phys.
Reports 379 257; Pearle P (1999) Collapse models, in Open Systems and Measurements in
Relativistic Quantum Field Theory, Lecture Notes in Physics 526, Breuer H-P and Petruc-
cione F eds. (Berlin: Springer-Verlag)
ABSTRACT
  We introduce a density tensor hierarchy for open system dynamics, that
recovers information about fluctuations lost in passing to the reduced density
matrix. For the case of fluctuations arising from a classical probability
distribution, the hierarchy is formed from expectations of products of pure
state density matrix elements, and can be compactly summarized by a simple
generating function. For the case of quantum fluctuations arising when a
quantum system interacts with a quantum environment in an overall pure state,
the corresponding hierarchy is defined as the environmental trace of products
of system matrix elements of the full density matrix. Only the lowest member of
the quantum noise hierarchy is directly experimentally measurable. The unit
trace and idempotence properties of the pure state density matrix imply descent
relations for the tensor hierarchies, that relate the order $n$ tensor, under
contraction of appropriate pairs of tensor indices, to the order $n-1$ tensor.
As examples to illustrate the classical probability distribution formalism, we
consider a quantum system evolving by It\^o stochastic and by jump process
Schr\"odinger equations. As examples to illustrate the corresponding trace
formalism in the quantum fluctuation case, we consider collisional Brownian
motion of an infinite mass Brownian particle, and the weak coupling Born-Markov
master equation. In different specializations, the latter gives the hierarchies
generalizing the quantum optical master equation and the Caldeira--Leggett
master equation. As a further application of the density tensor, we contrast
stochastic Schr\"odinger equations that reduce and that do not reduce the state
vector, and discuss why a quantum system coupled to a quantum environment
behaves like the latter.

<|endoftext|><|startoftext|>
Scalar self-force on eccentric geodesics in Schwarzschild spacetime: A time-domain
computation
Roland Haas
Department of Physics, University of Guelph, Guelph, Ontario, Canada N1G 2W1
(Dated: April 3, 2007)
We calculate the self-force acting on a particle with scalar charge moving on a generic geodesic
around a Schwarzschild black hole. This calculation requires an accurate computation of the retarded
scalar field produced by the moving charge; this is done numerically with the help of a fourth-order
convergent finite-difference scheme formulated in the time domain. The calculation also requires
a regularization procedure, because the retarded field is singular on the particle’s world line; this
is handled mode-by-mode via the mode-sum regularization scheme first introduced by Barack and
Ori. This paper presents the numerical method, various numerical tests, and a sample of results for
mildly eccentric orbits as well as “zoom-whirl” orbits.
PACS numbers: 04.25.-g, 04.40.-b, 41.60.-m, 45.50.-j, 02.60.Cb, 02.70.Bf
I. INTRODUCTION
The inspiral and capture of solar-mass compact objects
by supermassive black holes is one of the most promis-
ing and interesting sources of gravitational radiation to
be detected by the future space-based gravitational-wave
antenna LISA [1]. For these extreme mass-ratio inspirals,
one can treat the compact object as a point mass and de-
scribe its influence on the spacetime perturbatively. Go-
ing beyond the test mass limit, its motion is no longer
along a geodesic of the unperturbed spacetime of the cen-
tral black hole; it is a geodesic of the perturbed space-
time created by the presence of the moving body. When
viewed from the unperturbed spacetime, the small body
is said to move under the influence of its gravitational
self-force. The self-force induces radiative losses of energy
and angular momentum, which will eventually drive the
object into the black hole. To describe the motion of the
body, including its inspiral toward the black hole, we seek
to evaluate the self-force and calculate its effect on the
motion. One way of doing this uses the mode-sum reg-
ularization procedure introduced by Barack and Ori [2].
(For a comprehensive introduction of the problem, see
the special issue of Classical and Quantum Gravity [3].)
In this paper, in an effort to build expertise to calculate
the gravitational self-force, we retreat to the technically
simpler problem of a point particle of mass m endowed
with a scalar charge q orbiting a Schwarzschild black hole
of mass M . Following up on a previous paper [4], we
implement the numerical part of the regularization pro-
cedure for generic orbits with a time-domain integration
of the scalar-wave equation.
A. The problem
Our goal is to calculate the regularized self-force acting
on a scalar point charge in orbit around a Schwarzschild
black hole. In analogy with the gravitational case, where
in a first-order (in m/M) perturbative calculation the
particle moves on a geodesic of the background space-
time, we take the orbit of the particle to be a geodesic and
calculate the self-force as a vector field on this geodesic.
We start by writing the Schwarzschild metric using the
tortoise coordinate r∗ = r + 2M ln
ds2 = f
−dt2 + dr∗2
+ r2dΩ2, (1.1)
where f =
1− 2M
, dΩ2 =
dθ2 + sin2 θdφ2
is the
metric on a two-sphere, and t, r, θ and φ are the usual
Schwarzschild coordinates. Our task is to solve the scalar
wave equation
gαβ∇α∇βΦ(x) = −4πµ(x), (1.2)
µ(x) = q
δ4(x, z(τ))dτ , (1.3)
where ∇α is the covariant derivative compatible with the
metric gαβ , Φ(x) is the scalar field created by a scalar
charge q which moves along a world line γ : τ 7→ z(τ)
parametrized by proper time τ . The source term µ(x)
appearing on the right-hand side is written in terms of a
scalarized four-dimensional Dirac δ-function δ4(x, x
′) :=
δ(x0 − x′0)δ(x1 − x′1)δ(x2 − x′2)δ(x3 − x′3)/
− det(gαβ).
Because of the singularity in the source term, the re-
tarded solution to Eq. (1.2) is singular on the world line,
and the näıve expression for the self-force,
Fα(τ) = q∇αΦ(z(τ)), (1.4)
must be regularized. Following DeWitt and Brehme [5],
Mino, Sasaki, Tanaka [6], Quinn and Wald [7], Quinn [8]
carried out this regularization for the electromagnetic,
scalar and gravitational radiation reaction. In later work,
Detweiler and Whiting [9] introduced a very useful de-
composition of the retarded solution of Eq. (1.2) in terms
of a singular part ΦS and a regular remainder ΦR:
Φ = ΦS +ΦR. (1.5)
ΦR is regular and differentiable at the position of the par-
ticle, satisfies the homogeneous wave equation associated
http://arxiv.org/abs/0704.0797v2
with Eq. (1.2), and is solely responsible for the self-force
acting on the particle. ΦS , on the other hand, satisfies
Eq. (1.2), is just as singular at the particle’s position as
the retarded solution, and produces no force on the par-
ticle. Rearranging Eq. (1.5) and differentiating once, we
can write the regularized self-force as
Fα := q∇αΦR = q
∇αΦ−∇αΦS
. (1.6)
In a previous paper [4], we described our implemen-
tation of the regularization procedure to find a mode-
sum representation of ∇αΦS along a generic geodesic of
the Schwarzschild spacetime. Schematically, we intro-
duce a tetrad eα
and decompose the tetrad components
Φ(µ) := e
∇αΦ of the field gradient in terms of ordinary
scalar spherical harmonics Yℓm:
Φ(µ)(t, r, θ, φ) =
Φℓm(µ)(t, r)Yℓm(θ, φ). (1.7)
Each mode Φℓm
(t, r) is finite at the position of the par-
ticle, but their sum diverges on the world line. In [4], we
derive analytic expressions for the mode-sum decompo-
sition of ΦS(µ),
ΦS(µ) =q
ΦS(µ),ℓ (1.8)
ΦS(µ),ℓ = A(µ)
+B(µ) +
(ℓ− 1
)(ℓ + 3
+ · · · , (1.9)
where the coefficients A(µ), B(µ), C(µ), and D(µ) are in-
dependent of ℓ; they are listed in Appendix B for conve-
nience.
As each mode of Φ is finite, it is straightforward to
compute the modes of the retarded solution using nu-
merical methods, and we will describe how this was done
in Sec. IV. We use the numerical solutions in Eq. (1.6)
to calculate the regularized self-force, regularizing mode-
by-mode:
ΦR(µ) =
Φ(µ),ℓ − ΦS(µ),ℓ
, (1.10)
where Φ(µ),ℓ :=
Yℓm (no summation over ℓ im-
plied).
For numerical purposes it is convenient to define ψℓm
Φ(x) =
ℓm, (1.11)
where Yℓm are the usual scalar spherical harmonics. Af-
ter substituting in Eq. (1.2), this yields a reduced wave
equation for the multipole moments ψℓm:
−∂2tψℓm + ∂2r∗ψℓm − Vℓψℓm =
− 4πq
Ȳℓm(π/2, φ0)δ(r
∗ − r∗0), (1.12)
where
Vℓ = f
ℓ (ℓ+ 1)
. (1.13)
An overbar denotes complex conjugation, E = −ut is the
particle’s conserved energy per unit mass, and uα = dz
is its four velocity. Quantities bearing a subscript “0” are
evaluated at the particle’s position; they are functions of
τ that are obtained by solving the geodesic equation
uβ∇βuα = 0 (1.14)
in the background spacetime. Without loss of general-
ity, we have confined the motion of the particle to the
equatorial plane θ = π
Once we have numerically solved Eq. (1.12), we ex-
tract numerical estimates for ψℓm, ∂tψℓm and ∂r∗ψℓm,
which can then be used to find Φℓm, ∂tΦℓm and ∂rΦℓm.
These—together with the translation table displayed in
Eqs. (1.23)–(1.26) of [4], reproduced in Appendix A—
allow us to find the tetrad components Φ(µ)ℓm with re-
spect to the tetrad defined by Eqs. (1.18)–(1.21) of [4].
Eventually we regularize the multipole coefficients
Φ(µ)ℓ =
Φ(µ)ℓm(t0, r0)Y
ℓm(π/2, φ0) (1.15)
using Eq. (1.10); this involves the regularization param-
eters listed in Eqs. (1.30)–(1.45) of [4], which are repro-
duced in Appendix B.
B. Organization of this paper
In Sec. II we introduce the main ideas behind the
discretization scheme used in the numerical simulation.
Sec. III describes the choices we make in order to handle
the problems of specifying initial data and proper bound-
ary conditions. The next section—Sec. IV—provides de-
tails on the concrete implementation of the ideas put
forth in Secs. II and III. In Sec. V we describe the tests
we performed in order to validate our implementation of
the numerical method. Sec. VI finally presents sample
results for a small number of representative simulations.
C. Future work
This work, which deals with a scalar charge moving in
the Schwarzschild spacetime, is not intended to produce
physically or astrophysically interesting results. Instead,
its goal is to help us evaluate the merits of several strate-
gies that could be used to tackle the more interesting (and
difficult) problems of electromagnetism and gravity.
One future project we are currently exploring is to ap-
ply the formalism developed so far to the electromagnetic
self-force acting on an electric charge. Beyond the tech-
nical complication of having to deal with a vector field
instead of a single scalar quantity, we are also faced with
the reality of having to impose a gauge (in our case: the
Lorenz gauge) and to eliminate (or at least control) gauge
violations in the numerical simulation. The first step,
namely, the calculation of the regularization parameters
A(µ), B(µ), C(µ), and D(µ) for the self-force, is currently
underway. Also underway is the calculation of the regu-
larization parameters for he gravitational self-force.
Another project is the implementation of a scheme to
use the calculated self-force to update the orbital pa-
rameters of a particle on its inspiral toward the black
hole. The standard proposed approach to this problem
in the past has been to calculate the self-force on a set
of geodesics which are momentarily tangent to the par-
ticle’s trajectory. The self-force calculated in this way
is then used to update the orbital elements. This “after
the fact” calculation of the motion requires one to build
(in advance) a large database of self-force values for the
anticipated set of orbital parameters that the particle’s
trajectory will assume during its inspiral. Alternatively,
and conceptually more simply, the self-force could be cal-
culated self-consistently along the real, accelerated tra-
jectory. Such an approach requires changes in the expres-
sions of the regularization parameters, which so far have
been derived only for geodesic orbits. We are currently
investigating the merits of such an approach.
II. NUMERICAL METHOD
In this section we describe the algorithm used to inte-
grate the reduced wave equation [Eq. (1.12)] numerically.
For the most part we use the fourth-order algorithm in-
troduced by Lousto [10], with some modifications to suit
our needs. We choose to implement a fourth-order con-
vergent code because second-order convergence for the
potential Φ, while much easier to achieve, would guaran-
tee only first-order convergence for ∇αΦ, the quantity in
which we are ultimately interested. With a fourth-order
convergent code we can expect to achieve third-order con-
vergence for ∇αΦ, which is required for an accurate es-
timation of the self-force. Numerical experiments, how-
ever, show that in practice we do achieve fourth-order
convergence for the derivatives of Φ, a fortunate outcome
that we exploit but cannot explain.
From now on, we will suppress the subscripts ℓ and m
on Vℓ and ψℓm for convenience of notation. The wave
equation consists of three parts: the wave-operator term
(∂2r∗ − ∂2t )ψ and the potential term V ψ on the left-hand
side, and the source term on the right-hand side of the
equation. Of these, the wave operator turns out to be
easiest to handle, and the source term does not create a
substantial difficulty. The term involving the potential
V turns out to be the most difficult one to handle.
Following Lousto we introduce a staggered grid with
step sizes ∆t = 1
∆r∗ ≡ h, which follows the characteris-
tic lines of the wave operator in Schwarzschild spacetime;
see Fig. 1 for a sketch of a typical grid cell. The basic
idea behind the method is to integrate the wave equation
over a unit cell of the grid, which nicely deals with the
Dirac-δ source term on the right-hand side. To this end,
we introduce the Eddington-Finkelstein null coordinates
v = t + r∗ and u = t − r∗ and use them as integration
variables.
A. Differential operator
Rewriting the wave operator in terms of u and v, we
find −∂2t + ∂2r∗ = −4∂u∂v, which allows us to evaluate
the integral involving the wave operator exactly. We find
−4∂u∂vψ du dv =− 4[ψ(t+ h, r∗) + ψ(t− h, r∗)
− ψ(t, r∗ − h)− ψ(t, r∗ + h)].
(2.1)
B. Source term
If we integrate over a cell traversed by the particle, then
the source term on the right-hand side of the equation
will have a non-zero contribution. Writing the source
term as G(t, r∗)δ(r∗ − r∗0(t)) with
G(t, r∗) = −4πq f
Ȳℓm(π/2, φ0), (2.2)
we find
Gδ(r∗ − r∗0(t)) du dv =−
f0(t)
r0(t)
× Ȳℓm(π/2, φ0(t)) dt,
(2.3)
where t1 and t2 are the times at which the particle enters
and leaves the cell, respectively. While we do not have
an analytic expression for the trajectory of the particle
(except when the particle follows a circular orbit), we can
numerically integrate the first-order ordinary differential
equations that govern the particle’s motion to a precision
that is much higher than that of the partial differential
equation governing ψ. In this sense we treat the integral
over the source term as exact. To evaluate the integral
we adopt a four-point Gauss-Legendre scheme, which has
an error of order h8.
C. Potential term
The most problematic term—from the point of view of
implementing an approximation of sufficiently high or-
der in h—turns out to be the term V ψ in Eq. (1.12).
Since this term does not contain a δ-function, we have to
approximate the double integral
V ψ du dv (2.4)
t0 − h
t0 + h
t0 − 2h
0 − 2h r
0 + 2h
0 − 3h r
0 + 3hr
0 + hr
0 − h
13 34
87 9 10
FIG. 1: Points used to calculate the integral over the potential
term for vacuum cells. Grid points are indicated by blue cir-
cles while red cross-hairs indicate points in between two grid
points. We calculate field values at points that do not lie on
the grid by employing the second-order algorithm described
in [10].
up to terms of order h6 for a generic cell in order to
achieve an overall O(h4) convergence of the scheme.
Here we have to treat cells traversed by the particle
(“sourced” cells) differently from the generic (“vacuum”)
cells. While much of the algorithm can be transferred
from the vacuum cells to the sourced cells, some modifica-
tions are required. We will describe each case separately
in the following subsections.
1. Vacuum case
To implement Lousto’s algorithm to evolve the field
across the vacuum cells, we use a double Simpson rule to
compute the integral Eq. (2.4). We introduce the nota-
g(t, r∗) = V (r∗)ψ(t, r∗) (2.5)
and label our points in the same manner (see Fig. 1) as
in [10]:
g du dv =
[g1 + g2 + g3 + g4 + 4(g12+
g24 + g34 + g13) + 16g0] +O(h
6). (2.6)
Here, for example, g1 is the value of g at the grid point
labeled 1, and g12 is the value of g at the off-grid point
labeled 12, etc. Deviating from Lousto’s algorithm, we
choose to calculate g0 using an expression different from
that derived in [10]. Unlike Lousto’s approach, our ex-
pression exclusively involves points that are within the
past light cone of the current cell. We find
8V4 ψ4 + 8V1 ψ1 + 8V2 ψ2 − 4V6 ψ6 − 4V5 ψ5
+ V10 ψ10 + V7 ψ7 − V9 ψ9 − V8 ψ8
+O(h4). (2.7)
In order to evaluate the term in parentheses in
Eq. (2.6), we again use a variant of the equations given
in [10]. Lousto’s equations (33) and (34),
g13 + g12 =V (r
0 − h/2) (ψ1 + ψ0)
V (r∗0 − h/2)
+O(h4),
(2.8)
g24 + g34 =V (r
0 + h/2) (ψ0 + ψ4)
V (r∗0 + h/2)
+O(h4)
(2.9)
contain isolated occurrences of ψ0, the value of the field
at the central point. Since Eq. (2.7) only allows us to
find g0 = V0ψ0, finding ψ0 would involve a division by
V0, which will be numerically unstable very close to the
event horizon where V0 ≈ 0. Instead we choose to express
the potential term appearing in the square brackets as a
Taylor series around r∗0 . This allows us to eliminate the
isolated occurrences of ψ0, and we find
g13+g12 + g24 + g34 = 2V (r
V (r∗0)
+ V (r∗0 − h/2)ψ1
V (r∗0 − h/2)
+ V (r∗0 + h/2)ψ4
V (r∗0 + h/2)
V (r∗0 − h/2)− 2V (r∗0)
+ V (r∗0 + h/2)
(ψ1 + ψ4) +O(h
4). (2.10)
Because of the
factor in Eq. (2.6), this allows us
to reach the required O(h6) convergence for a generic
vacuum cell. This—given that there is a number of order
N = 1/h2 of such cells—yields the desired overall O(h4)
convergence of the full algorithm, at the end of the N
steps required to finish the simulation.
2. Sourced cells
For vacuum cells, the algorithm described above is
the complete algorithm used to evolve the field forward
in time. For cells traversed by the particle, however,
we have to reconsider the assumptions used in deriving
Eqs. (2.7) and (2.10). When deriving Eq. (2.10) we have
employed the second-order evolution algorithm (see [10]),
in which the single step equation
ψ3 =− ψ2 +
(ψ1 + ψ4) (2.11)
O(h3)O(h4)O(h5) O(h5)O(h4)
traje
tory
t0 + h
t0 − h
t0 − 2h
FIG. 2: Cells affected by the passage of the particle, showing
the reduced order of the single step equation
is accurate only to O(h3) for cells traversed by the parti-
cle. For these cells, therefore, the error term in Eq. (2.10)
is O(h3) instead of O(h4). As there is a number of or-
der N ′ = 1/h of cells that are traversed by the parti-
cle in a simulation run, the overall error—after including
factor in Eq. (2.6)—is of order h4. We can
therefore afford this reduction of the convergence order
in Eq. (2.10)
Equation (2.7), however, is accurate only to O(h) for
cells traversed by the particle. Again taking the
factor into account, this renders the overall algorithm
O(h2). Figure 2 shows the cells affected by the particle’s
traversal and the reduced order of the single step equa-
tion for each cell. Cells whose convergence order is O(h5)
or higher do not need modifications, since there is only a
number N ′ = 1/h of such cells in the simulation. We are
therefore concerned about cells neighboring the particle’s
trajectory and those traversed by the particle.
a. Cells neighboring the particle These cells are not
traversed by the particle, but the particle might have
traversed cells in their past light-cone, which are used in
the calculation of g0 in Eq. (2.7). For these cells, we use
a one-dimensional Taylor expansion of g(t, r∗) within the
current time-slice t = t0,
5V (r∗0 − h)ψ(t0, r∗0 − h)
+ 15V (r∗0 − 3h)ψ(t0, r∗0 − 3h)
− 5V (r∗0 − 5h)ψ(t0, r∗0 − 5h)
+ V (r∗0 − 7h)ψ(t0, r∗0 − 7h)
+O(h4) (2.12)
for the cell on the left-hand side, and
5V (r∗0 + h)ψ(t0, r
0 + h)
+ 15V (r∗0 + 3h)ψ(t0, r
0 + 3h)
− 5V (r∗0 + 5h)ψ(t0, r∗0 + 5h)
+ V (r∗0 + 7h)ψ(t0, r
0 + 7h)
+O(h4) (2.13)
for the cell on the right-hand side, where (t0, r
0) is the
center of the cell traversed by the particle. Both of these
are more accurate than is strictly necessary; we would
t0 − h
t0 + h
0 + 2hr
0 − 2h r
0 − h r
0 + h
(3a) (3b)
(2a) (2b)
FIG. 3: Typical cell traversal of the particle. We split the
domain into sub-parts indicated by the dotted line based on
the time the particle enters (at t1) and leaves (at t2) the cell.
The integral over each sub-part is evaluated using an iterated
two-by-two point Gauss-Legendre rule.
need error terms of order h3 to achieve the desired over-
all O(h4) convergence of the algorithm. Keeping the ex-
tra terms, however, improves the numerical convergence
slightly.
b. Cell traversed by the particle We choose not to
implement a fully explicit algorithm to handle cells tra-
versed by the particle, because this would increase the
complexity of the algorithm by a significant factor. In-
stead we use an iterative approach to evolve the field
using the integrated wave equation
−4(ψ3+ψ2 − ψ1 − ψ4)−
V ψ du dv =
− 8πq
f0(t)
r0(t)
Ȳℓm(π/2, φ0(t)) dt. (2.14)
In this equation the integral involving the source term
can be evaluated to any desired accuracy at the begin-
ning of the iteration, because the motion of the particle
is determined by a simple system of ordinary differential
equations, which are easily integrated with reliable nu-
merical methods. It remains to evaluate the integral over
the potential term, which we do iteratively. Schemati-
cally the method works as follows:
• Make an initial guess for ψ3 using the second-order
algorithm. This guess is correct up to terms of
O(h3).
• Match a second-order piecewise interpolation poly-
nomial to the six points that make up the past light-
cone of the future grid point, including the future
point itself.
• Use this approximation for ψ to numerically calcu-
V ψ du dv,
using two-by-two point Gauss-Legendre rules for
the six sub-parts indicated in Fig. 3.
• Update the future value of the field and repeat the
process until the iteration has converged to a re-
quired degree of accuracy.
trajectory
FIG. 4: Numerical domain evolved during the simulation. We
impose an inner boundary condition close to the black whole
where we can implement it easily to the accuracy of the un-
derlying floating point format. Far away from the black hole,
we evolve the full domain of dependence of the initial data
domain without imposing boundary conditions.
III. INITIAL VALUES AND BOUNDARY
CONDITIONS
As is typical for numerical simulations, we have to pay
careful attention to specifying initial data and appropri-
ate boundary conditions. These aspects of the numerical
method are highly non-trivial problems in full numerical
relativity, but they can be solved or circumvented with
moderate effort in the present work.
A. Initial data
In this work we use a characteristic grid consisting of
points lying on characteristic lines of the wave operator
to evolve ψ forward in time. As such, we need to specify
characteristic initial data on the lines u = u0 and v = v0
shown in Fig. 4. We choose not to worry about specifying
“correct” initial data, but instead arbitrarily choose ψ to
vanish on u = u0 and v = v0:
ψ(u = u0, v) = ψ(u, v = v0) = 0. (3.1)
This is equivalent to adding spurious initial waves in the
form of a homogeneous solution of Eq. (1.12) to the cor-
rect solution. This produces an initial wave burst that
moves away from the particle with the speed of light,
and quickly leaves the numerical domain. Any remain-
ing tails of the spurious initial data decay as t−(2ℓ+2) as
shown in [11] and become negligible after a short time.
We conclude that the influence of the initial-wave con-
tent on the self-force becomes negligible after a time of
the order of the light-crossing time of the particle’s orbit.
B. Boundary conditions
On the analytical side we would like to impose ingoing
boundary conditions at the event horizon r∗ → −∞ and
outgoing boundary conditions at spatial infinity r∗ → ∞,
r∗→−∞
∂uψ =0, lim
∂vψ =0. (3.2)
Because of the finite resources available to a computer
we can only simulate a finite region of the spacetime,
and are faced with the reality of implementing boundary
conditions at finite values of r∗. Two solutions to this
problem present themselves:
1. choose the numerical domain to be the domain of
dependence of the initial data surface. Since the
effect of the boundary condition can only propagate
forward in time with at most the speed of light,
this effectively hides any influence of the boundary.
This is what we choose to do in order to deal with
the outer boundary condition.
2. implement boundary conditions sufficiently “far
out” so that numerically there is no difference be-
tween imposing the boundary condition there or at
infinity. Since the boundary conditions depend on
the vanishing of the potential V (r) appearing in the
wave equation, this will happen once 1−2M/r ≈ 0.
Near the horizon r ≈ 2M(1 + exp(r∗/2M)), so
this will happen—to numerical accuracy—for mod-
estly large (negative) values of r∗ ≈ −73M . We
choose to implement the ingoing waves condition
∂uψℓm = 0 there.
IV. IMPLEMENTATION
Making more precise the ideas developed in the pre-
ceding sections, we implement the following numerical
scheme.
A. Particle motion
Following Darwin [12] we introduce the dimensionless
semi-latus rectum p and the eccentricity e such that for
a bound orbit around a Schwarzschild black hole of mass
1 + e
, r2 =
(4.1)
are the radial positions of the periastron and apastron,
respectively. Energy per unit mass and angular momen-
tum per unit mass are then given by
(p− 2− 2e)(p− 2 + 2e)
p (p− 3− e2)
, L2 =
p− 3− e2
(4.2)
Together with these definitions it is useful to introduce
an orbital parameter χ such that along the trajectory of
the particle,
r(χ) =
1 + e cosχ
, (4.3)
where χ is single-valued along the orbit. We can then
write down first-order differential equations for χ(t) and
the azimuthal angle φ(t) of the particle,
(p− 2− 2e cosχ)(1 + e cosχ)(1 + e cosχ)
(Mp2)
p− 6− 2e cosχ
(p− 2− 2e)(p− 2 + 2e)
, (4.4)
(p− 2− 2e cosχ)(1 + e cosχ)2
p3/2M
(p− 2− 2e)(p− 2 + 2e)
. (4.5)
We use the embedded Runge-Kutta-Fehlberg (4, 5) algo-
rithm provided by the GNU Scientific Library routine
gsl odeiv step rkf45 and an adaptive step-size control
to evolve the position of the particle forward in time.
Intermediate values of the particle’s position are found
using a Hermite interpolation of the nearest available cal-
culated positions.
B. Initial data
We do not specify initial data. The field is set to zero
on the initial characteristic slices, u = u0 and v = v0.
C. Boundary conditions
We adjust the outer boundary of the numerical do-
main at each time-step so that we cover the domain of
dependence of the initial characteristic surfaces and the
particle’s world line. The resulting numerical domain was
already shown in Fig. 4.
Near the event horizon, at r∗ ≈ −73M , we implement
an ingoing-wave boundary condition by imposing
ψ(t+ h, r∗) = ψ(t, r∗ − h). (4.6)
This allows us to drastically reduce the number of cells
in the numerical domain, and consequently the running
time of the simulation.
D. Evolution in vacuum
Cells not traversed by the particle are evolved using
Eqs. (2.1), (2.6) – (2.10). Explicitly written out, we use
ψ3 = −ψ2
(V0 + V1) +
V0 (V0 + V1)
(V0 + V4) +
V0 (V0 + V4)
(g12 + g24 + g34 + g13 + 4g0),
(4.7)
where g0 is given by Eq. (2.7) and the sum g12 + g24 +
g34 + g13 is given by Eq. (2.10).
E. Cells next to the particle
Vacuum cells close to the current position of the parti-
cle require a different approach to calculate g0, since the
cells in their past light cone could have been traversed
by the particle. We use Eqs. (2.12) and (2.13) to find
g0 in this case. Other than this modification, the same
algorithm as for generic vacuum cells is used.
F. Cells traversed by the particle
We evolve cells traversed by the particle using the it-
erative algorithm described in Sec. II C 2. Here
ψ3 =− ψ1 + ψ2
+ ψ4 −
V ψ du dv
f0(t)
r0(t)
Ȳℓm(π/2, π0(t)) dt, (4.8)
where the initial guess for the iterative evolution of∫∫
V ψ du dv is obtained using the second order algo-
rithm of Lousto and Price [13],
ψ3 =− ψ1 +
× [ψ2 + ψ4]
f0(t)
r0(t)
Ȳℓm(π/2, π0(t)) dt. (4.9)
Successive iterations use a four-point Gauss-Legendre
rule to evaluate the integral of V ψ; this requires a second-
order polynomial interpolation of the current field values
as described in Appendix C.
G. Extraction of the field data at the particle
In order to extract the value of the field and its first
derivatives at the position of the particle, we again use
a polynomial interpolation at the points surrounding the
particle’s position. Using a fourth-order polynomial, as
described in Appendix C, we can estimate ψ, ∂tψt, and
∂r∗ψ at the position of the particle up to errors of order
h4. As was briefly mentioned in Sec. II, we would expect
an error term of order h3 for ∂tψt and ∂r∗ψ. The O(h
accuracy we actually achieve by using a fourth-order (in-
stead of a third-order) piecewise polynomial shows up
clearly in a regression plot such as Fig. 7.
H. Regularization of the mode sum
We use the calculated multipole moments ψℓm to con-
struct the multipole moments Φℓm, and first derivatives
∂tΦℓm and ∂rΦℓm, of the scalar field. These, in turn, are
used to calculate the tetrad components Φ(0)ℓm, Φ(+)ℓm,
Φ(−)ℓm, and Φ(3)ℓm of the field gradient according to
Eqs. (1.23)–(1.26) of [4], which are reproduced in Ap-
pendix A. These multipoles then give rise to the multi-
pole coefficients of the retarded field,
Φ(µ)ℓ(t, r, θ, φ) =
Φ(µ)ℓm(t, r)Yℓm(θ, φ), (4.10)
which are subjected to the regularization procedure de-
scribed by Eq. (1.29) of [4],
ΦR(µ)(t, r0, π/2, φ0) = lim
Φ(µ)ℓ(t, r0 +∆, π/2, φ0)
(ℓ+ 1/2)A(µ) +B(µ)
(ℓ + 1/2)
(ℓ− 1/2)(ℓ+ 3/2)
+ · · ·
, (4.11)
using the regularization parameters A(µ), B(µ), C(µ), and
D(µ) tabulated in Appendix B.
Finally we reconstruct the vector components of the
field gradient using Eqs. (1.47)–(1.48) of [4],
ΦRt =
(0), (4.12)
ΦRr =
ΦR(+)e
−iφ0 +ΦR(−)e
, (4.13)
ΦRθ = −r0ΦR(3), (4.14)
ΦRφ = −
ΦR(+)e
−iφ0 − ΦR(−)e
, (4.15)
and calculate the self-force
Fα = qΦ
α . (4.16)
We recall the discussion in Sec. I A concerning the def-
inition of ΦR, its connection to the self-force acting on
the particle, and its regularity at the particle’s position.
V. NUMERICAL TESTS
In this section we present the tests we have performed
to validate our numerical evolution code. First, in order
to check the fourth-order convergence rate of the code,
we perform regression runs with increasing resolution for
both a vacuum test case, where we seeded the evolution
with a Gaussian wave packet, and a case where a particle
is present. As a second test, we compute the regularized
self-force for several different combinations of orbital el-
ements p and e and check that the multipole coefficients
decay with ℓ as expected. This provides a very sensi-
tive check on the overall implementation of the numerical
scheme, as well as the analytical calculations that lead to
the regularization parameters. Finally, we calculate the
self-force for a particle on a circular orbit and show that
it agrees with the results presented in [4, 14].
A. Convergence tests: Vacuum
As a first test of the validity of our numerical code we
estimate the convergence order by removing the particle
and performing regression runs for several resolutions.
We use a Gaussian wave packet as initial data,
ψ(u = u0, v) = exp(−[v − vp]2/[2σ2]), (5.1)
ψ(u, v = v0) = 0, (5.2)
where vp = 75M and σ = 10M , v0 = −u0 = 6M +
2M ln 2, and we extract the field values at r∗ = 20M .
Several such runs were performed, with varying resolu-
tion of 2, 4, 8, 16, and 32 grid points per M . Figure 5
shows ψ(2h)−ψ(h) rescaled by appropriate powers of 2,
so that in the case of fourth-order convergence the curves
would lie on top of each other. As can be seen from the
plots, they do, and the vacuum portion of the code is
indeed fourth-order convergent.
B. Convergence tests: Particle
While the convergence test described in section VA
clearly shows that the desired convergence is achieved
for vacuum evolution, it does not test the parts of the
code that are used in the integration of the inhomoge-
neous wave equation. To test these we perform a second
set of regression runs, this time using a non-zero charge
q. We extract the field at the position of the particle,
thus also testing the implementation of the extraction
algorithm described in section IVG. For this test we
choose the ℓ = 6, m = 4 mode of the field generated
by a particle on a mildly eccentric geodesic orbit with
p = 7, e = 0.3. As shown in Fig. 6 the convergence
is still of fourth order, but the two curves no longer lie
precisely on top of each other at all times. The region
before t ≈ 100M is dominated by the initial wave burst
and therefore does not scale as expected, yielding two
very different curves. In the region 300M . t . 400M
the two curves lie on top of each other, as expected for a
fourth-order convergent algorithm. In the region between
t ≈ 200M and t ≈ 300M , however, the dashed curves
-8.0e-07
-6.0e-07
-4.0e-07
-2.0e-07
0.0e+00
2.0e-07
4.0e-07
6.0e-07
8.0e-07
 0  20  40  60  80  100  120
δ16-8
δ32-16
FIG. 5: Convergence test of the numerical algorithm in the
vacuum case. We show differences between simulations using
different step sizes h = 0.5M (ψ2), h = 0.25M (ψ4), h =
0.125M (ψ8), h = 0.0625M (ψ16), and h = 0.03125M (ψ32).
Displayed are the rescaled differences δ4−2 = ψ4 −ψ2, δ8−4 =
24(ψ8 − ψ4), δ16−8 = 4
4(ψ8 − ψ4), and δ32−16 = 8
4(ψ8 − ψ4)
for the real part of the ℓ = 2, m = 2 mode at r∗ ≈ 20M .
The maximum value of the field itself is of the order of 0.1,
so that the errors in the field values are roughly five orders
of magnitude smaller than the field values themselves. We
can see that the convergence is in fact of fourth-order, as the
curves lie nearly on top of each other, with only the lowest
resolution curve δ4−2 deviating slightly.
have slightly smaller amplitudes than the solid one, indi-
cating an order of convergence different from (but close
to) four.
To explain this behavior we have to examine the terms
that contribute significantly to the error in the simula-
tion. The numerical error is almost completely domi-
nated by that of the approximation of the potential term∫∫
V ψ du dv in the integrated wave equation. For vac-
uum cells the error in this approximation scales as h6,
where h is the step size. For cells traversed by the parti-
cle, on the other hand, the approximation error depends
also on the difference t2−t1 of the times at which the par-
ticle enters and leaves the cell. This difference is bounded
by h but does not necessarily scale as h. For example, if
a particle enters a cell at its very left, then scaling h by 1
would not change t2 − t1 at all, thus leading to a scaling
behavior that differs from expectation.
To investigate this further we conducted test runs
of the simulation for a particle on a circular orbit at
r = 6M . In order to observe the expected scaling behav-
ior, we have to make sure that the particle passes through
the tips of the cell it traverses. When this is the case, then
t2 − t1 ≡ h and a plot similar to the one shown in Fig. 6
shows the proper scaling behavior. As a further test we
artificially reduced the convergence order of the vacuum
algorithm to two by implementing the second-order algo-
rithm described in [10]. By keeping the algorithm that
deals with sourced cells unchanged, we reduced the rela-
-4e-05
-2e-05
 2e-05
 4e-05
 100  200  300  400  500
δ16-8
FIG. 6: Convergence test of the numerical algorithm in the
sourced case. We show differences between simulations using
different step sizes of 4 (ψ4), 8 (ψ8), 16 (ψ16), and 32 (ψ32)
cells per M . Displayed are the rescaled differences δ8−4 =
ψ8 −ψ4, etc. (see caption of Fig. 5 for definitions) of the field
values at the position of the particle for a simulation with
ℓ = 6, m = 4 and p = 7, e = 0.3. We see that the convergence
is approximately fourth-order.
-4e-05
-2e-05
 2e-05
 4e-05
 100  200  300  400  500
δ32-16
δ16-8
FIG. 7: Convergence test of the numerical algorithm in the
sourced case. We show differences between ∂rΦ for simula-
tions using different step sizes of 4 (Φr,4), 8 (Φr,8), 16 (Φr,16),
and 32 (Φr,32) cells per M . Displayed are the rescaled differ-
ences δ8−4 = Φr,8 − Φr,4 etc. of the values at the position of
the particle for a simulation with ℓ = 6, m = 4 and p = 7,
e = 0.3. Although there is much noise caused by the piece-
wise polynomials used to extract the data, we can see that
the convergence is approximately fourth-order.
tive impact on the numerical error. This, too, allows us
to recover the expected (second-order) convergence. Fig-
ures 8 and 9 illustrate the effects of the measures taken
to control the convergence behavior.
-3.000e-05
-2.000e-05
-1.000e-05
0.000e+00
1.000e-05
2.000e-05
3.000e-05
 200  210  220  230  240  250  260  270  280
δ64-32
δ32-16
δ16-8
-3.000e-05
-2.000e-05
-1.000e-05
0.000e+00
1.000e-05
2.000e-05
3.000e-05
 200  210  220  230  240  250  260  270  280
FIG. 8: Behavior of convergence tests for a particle in circular
orbit at r = 6M . We show differences between simulations of
the ℓ = 2, m = 2 multipole moment using different step sizes
of 2 (ψ2), 4 (ψ4), 8 (ψ8), 16 (ψ16), 32 (ψ32) and 64 (ψ64) cells
per M . Displayed are the real part of the rescaled differences
δ4−2 = (ψ4 −ψ2) etc. of the field values at the position of the
particle, defined as in Fig. 5. The values have been rescaled
so that—for fourth order convergence—the curves should all
coincide. The upper panel corresponds to a set of simulations
where the particle traverses the cells away from their tips.
The curves do not coincide perfectly with each other, seem-
ingly indicating a failure of the convergence. The lower panel
was obtained in a simulation where the particle was carefully
positioned so as to pass through the tips of each cell it tra-
verses. This set of simulations passes the convergence test
more convincingly.
C. High-ℓ behavior of the multipole coefficients
Inspection of Eq. (4.11) reveals that a plot of Φ(µ)ℓ as
a function of ℓ (for a selected value of t) should display
a linear growth in ℓ for large ℓ. Removing the A(µ) term
should produce a constant curve, removing the B(µ) term
(given that C(µ) = 0) should produce a curve that decays
as ℓ−2, and finally, removing the D(µ) term should pro-
duce a curve that decays as ℓ−4. It is a powerful test of
the numerical methods to check whether these expecta-
tions are borne out by the numerical data. Fig. 10 plots
the remainders as obtained from our numerical simula-
tion, demonstrating the expected behavior. It displays,
on a logarithmic scale, the absolute value of ReΦR(+)ℓ, the
real part of the (+) component of the self-force. The orbit
is eccentric (p = 7.2, e = 0.5), and all components of the
self-force require regularization. The first curve (in trian-
gles) shows the unregularized multipole coefficients that
increase linearly in ℓ, as confirmed by fitting a straight
line to the data. The second curve (in squares) shows par-
tially regularized coefficients, obtained after the removal
of (ℓ + 1/2)A(µ); this clearly approaches a constant for
large values of ℓ. The curve made up of diamonds shows
the behavior after removal of B(µ); because C(µ) = 0, it
decays as ℓ−2, a behavior that is confirmed by a fit to
-8.000e-05
-4.000e-05
0.000e+00
4.000e-05
8.000e-05
 200  210  220  230  240  250  260  270  280
-8.000e-05
-4.000e-05
0.000e+00
4.000e-05
8.000e-05
 200  210  220  230  240  250  260  270  280
-8.000e-05
-4.000e-05
0.000e+00
4.000e-05
8.000e-05
 200  210  220  230  240  250  260  270  280
δ64-32
δ32-16
δ16-8
FIG. 9: Behavior of convergence tests for a particle in circular
orbit at r = 6M . We show differences between simulations
of the ℓ = 2, m = 2 multipole moment using different step
sizes of 8 (ψ8), 16 (ψ16), 32 (ψ32), and 64 (ψ64) cells per M .
Displayed are the real part of the rescaled differences δ16−8 =
ψ16−ψ8 etc. of the field values at the position of the particle,
defined as in Fig. 5. The values have been rescaled so that—
for second order convergence—the curves should all coincide.
The upper two panels correspond to simulations where the
second order algorithm was used throughout. For the topmost
one, care was taken to ensure that the particle passes through
the tip of each cell it traverses, while in the middle one no
such precaution was taken. Clearly the curves in the middle
panel do not coincide with each other, indicating a failure
of the second-order convergence of the code. The lower panel
was obtained in a simulation using the mixed-order algorithm
described in the text. While the curves still do not coincide
precisely, the observed behavior is much closer to the expected
one than for the purely second order algorithm.
the ℓ ≥ 5 part of the curve. Finally, after removal of
D(µ)/[(ℓ − 12 ) (ℓ +
)] the terms of the sum decrease in
magnitude as ℓ−4 for large values of ℓ, as derived in [15].
Each one of the last two curves would result in a con-
verging sum, but the convergence is much faster after
subtracting the D(µ) terms. We thereby gain more than
2 orders of magnitude in the accuracy of the estimated
Figure 10 provides a sensitive test of the implemen-
tation of both the numerical and analytical parts of the
calculation. Small mistakes in either one will cause the
difference in Eq. (4.11) to have a vastly different behav-
 0  2  4  6  8  10  12  14
ReΦ(+)
ReΦ(+)-A
ReΦ(+)-A-B
ReΦ(+)-A-B-D
FIG. 10: Multipole coefficients of the dimensionless self-force
ReΦR(+) for a particle on an eccentric orbit (p = 7.2, e =
0.5). The coefficients are extracted at t = 500M along the
trajectory shown in Fig. 12. The plots show several stages of
the regularization procedure, with a closer description of the
curves to be found in the text.
 0  2  4  6  8  10  12  14  16  18  20
FIG. 11: Multipole coefficients of ΦR(0) for a particle on a circu-
lar orbit. Note that ΦR(0)ℓ is linked to Φ
t via Φ
The multipole coefficients decay exponentially with ℓ until
ℓ ≈ 16, at which point numerical errors start to dominate.
D. Self-force on a circular orbit
For the case of a circular orbit, the regularization pa-
rameters A(0), B(0), and D(0) all vanish identically, so
that the (0) (or alternatively the t) component of the
self-force does not require regularization. Figure 11 thus
shows only one curve, with the magnitude of the multi-
pole coefficients decaying exponentially with increasing
As a final test, in Table I we compare our result for the
self-force on a particle in a circular orbit at r = 6M to
those obtained in [4, 14] using a frequency-domain code.
For a circular orbit, a calculation in the frequency domain
TABLE I: Results for the self-force on a scalar particle with
scalar charge q on a circular orbit at r0 = 6M . The
first column lists the results as calculated in this work us-
ing time-domain numerical methods, while the second and
third columns list the results as calculated in [4, 14] using
frequency-domain methods. For the t and φ components the
number of digits is limited by numerical roundoff error. For
the r component the number of digits is limited by the trun-
cation error of the sum of multipole coefficients.
This work: Previous work: Diaz-Rivera
time-domain frequency-domain [4] et. al. [14]
ΦRt 3.60339 × 10
−4 3.60907254 × 10−4
1.6767 × 10−4 1.67730 × 10−4 1.6772834 × 10−4
ΦRφ −5.30424 × 10
−3 −5.30423170 × 10−3
is more efficient, and we expect the results of [4, 14] to
be much more accurate than our own results. This fact
is reflected in the number of regularization coefficients
we can reliably extract from the numerical data, before
being limited by the accuracy of the numerical method:
the frequency-domain calculation found usable multipole
coefficients up to ℓ = 20, whereas our data for ΦR(0)ℓ is
dominated by noise by the time ℓ reaches 16. Figure 11
shows this behavior.
E. Accuracy of the numerical method
Several figures of merit can be used to estimate the
accuracy of numerical values for the self-force.
An estimate for the truncation error arising from cut-
ting short the summation in Eq. (4.11) at some ℓmax can
be calculated by considering the behavior of the remain-
ing terms for large ℓ. Detweiler et. al. [15] showed that
the remaining terms scale as ℓ−4 for large ℓ. They find
the functional form of the terms to be
EP3/2
(2ℓ− 3)(2ℓ− 1)(2ℓ+ 3)(2ℓ+ 5)
, (5.3)
where P3/2 = 36
2. We fit a function of this form to the
tail end of a plot of the multipole coefficients to find the
coefficient E in Eq. (5.3). Extrapolating to ℓ → ∞ we
find that the truncation error is
ℓ=ℓmax
[Eq. (5.3)] (5.4)
2Eℓmax
(2ℓmax + 3)(2ℓmax + 1)(2ℓmax − 1)(2ℓmax − 3)
(5.5)
where ℓmax is the value at which we cut the summation
short. For all but the special case of the (0) component
for a circular orbit, for which all regularization parame-
ters vanish identically, we use this approach to calculate
an estimate for the truncation error.
A second source of error lies in the numerical calcula-
tion of the retarded solution to the wave equation. This
error depends on the step size h used to evolve the field
forward in time. For a numerical scheme of a given con-
vergence order, we can estimate this discretization error
by extrapolating the differences of simulations using dif-
ferent step sizes down to h = 0. This is what was done
in the graphs shown in Sec. VB.
We display results for mildly eccentric orbits. A high
eccentricity causes ∂rΦ (displayed in Fig. 7) to be plagued
by high frequency noise produced by effects similar to
those described in Sec. VB. This makes it impossible to
reliably estimate the discretization error for these orbits.
We do not expect this to be very different from the errors
for mildly eccentric orbits.
Finally we compare our final results for the self-force
Fα to “reference values”. For circular orbits, frequency-
domain calculations are much more accurate than our
time-domain computations. We thus compare our results
to the results obtained in [4]. Table II lists typical values
for the various errors listed above.
error estimation mildly eccentric orbit
truncation error (M
Φ(+)) ≈ 2× 10
discretization error (M
∂rΦℓm) ≈ 10
comparison with reference values circular orbit
Ft 0.2%
Fr 0.04%
Fφ 2× 10
TABLE II: Estimated values for the various errors in the com-
ponents of the self-force as described in the text. We show
the truncation and discretization errors for a mildly eccentric
orbit and the total error for a circular orbit. The truncation
error is calculated using a plot similar to the one shown in
Fig. 16. The discretization error is estimated using a plot
similar to that in Fig. 7 for the ℓ = 2, m = 2 mode, and the
total error is estimated as the difference between our values
and those of [4]. We use p = 7.2 , e = 0.5 for the mildly
eccentric orbit. Note that we use the tetrad component Φ(+)
for the truncation error and the vector component ∂rΦ for
the discretization error. Both are related by the translation
table Eqs. (A6) – (A9), we expect corresponding errors to be
comparable for Φ(+) and ∂rΦ.
VI. SAMPLE RESULTS
In this section we describe some results of our numer-
ical calculation.
A. Mildly eccentric orbit
We choose a particle on an eccentric orbit with p = 7.2,
e = 0.5 which starts at r = pM/(1−e2), halfway between
 15  10  5  0  5  10
trajectory for p=7.2, e=0.5
FIG. 12: Trajectory of a particle with p = 7.2, e = 0.5. The
cross-hair indicates the point where the data for Fig. 10 was
extracted.
-0.014
-0.012
-0.01
-0.008
-0.006
-0.004
-0.002
 0.002
 100  200  300  400  500  600  700  800  900  1000
time/M
FIG. 13: Regularized dimensionless self-force M
and M
Fφ on a particle on an eccentric orbit with p = 7.2,
e = 0.5.
periastron and apastron. The field is evolved for 1000M
with a resolution of 16 grid points per M , both in the t
and r∗ directions, for ℓ = 0. Higher values of ℓ (and thus
m) require a corresponding increase in the number of
grid points used to achieve the same fractional accuracy.
Multipole coefficients for 0 ≤ ℓ ≤ 15 are calculated and
used to reconstruct the regularized self-force Fα along
the geodesic. Figure 13 shows the result of the calcula-
tion. For the choice of parameters used to calculate the
force shown in Fig. 13, the error bars corresponding to
the truncation error (which are already much larger than
than the discretization error) would be of the order of
the line thickness and have not been drawn.
Already for this small eccentricity, we see that the self-
force is most important when the particle is closest to the
black hole (ie. for 200M . t . 400M and 600M . t .
 60  50  40  30  20  10  0  10  20  30  40  50
trajectory for p=7.8001, e=0.9
FIG. 14: Trajectory of a particle on a zoom-whirl orbit with
p = 7.8001, e = 0.9. The cross-hairs indicate the positions
where the data shown in Fig. 16 and 17 was extracted.
800M); the self-force acting on the particle is very small
once the particle has moved away to r ≈ 15M .
B. Zoom-whirl orbit
Highly eccentric orbits are of most interest as sources
of gravitational radiation. For nearly parabolic orbits
with e . 1 and p & 6+2e, a particle revolves around the
black hole a number of times, moving on a nearly circu-
lar trajectory close to the event horizon (“whirl phase”),
before moving away from the black hole (“zoom phase”).
During the whirl phase the particle is in the strong field
region of the black hole, emitting copious amounts of
radiation. Figures 14 and 15 show the trajectory of a
particle and the force on such an orbit with p = 7.8001,
e = 0.9. Even more so than for the mildly eccentric
orbit discussed in Sec. VIA, the self-force (and thus the
amount of radiation produced) is much larger while the
particle is close to the black hole than when it zooms out.
Defining energy E per unit mass and angular momen-
tum L per unit mass in the usual way,
E = −
uα, L =
uα, (6.1)
and following eg. the treatment of Wald [16], Ap-
pendix C, it is easy to see that the rates of change Ė
and L̇ (per unit proper time) are directly related to com-
ponents of the acceleration aα (and therefore force) ex-
perienced by the particle via
Ė = −at, L̇ = aφ. (6.2)
The self-force shown in Fig. 15 therefore confirms our
näıve expectation that the self-force should decrease both
the energy and angular momentum of the particle as ra-
diation is emitted.
-0.025
-0.02
-0.015
-0.01
-0.005
 0.005
 500  1000  1500  2000  2500  3000
time/M
-0.002
-0.001
 0.001
 0.002
 0.003
 0.004
 2000 2050 2100 2150 2200
FIG. 15: Self-force acting on a particle. Shown is the dimen-
sionless self-force M
Fr and
Fφ on a zoom-whirl
orbit with p = 7.8001, e = 0.9. The inset shows a magni-
fied view of the self-force when the particle is about to enter
the whirl phase. No error bars showing an estimate error are
shown, as the errors shown eg. in Table II are to small to
show up on the graph. Notice that the self-force is essentially
zero during the zoom phase 500M . t . 2000M and reaches
a constant value very quickly after the particle enters into the
whirl phase.
It is instructive to have a closer look at the force acting
on the particle when it is within the zoom phase, and also
when it is moving around the black hole on the nearly cir-
cular orbit of the whirl phase. In Fig. 16 and Fig. 17 we
show plots of Φ(0)ℓ vs. ℓ after the removal of the A(µ),
B(µ), and D(µ) terms. While the particle is still zooming
in toward the black hole, Φ(0)ℓ behaves exactly as for the
mildly eccentric orbit described in Sec. VIA over the full
range of ℓ plotted; ie. the magnitude of each term scales
as ℓ0, ℓ−2 and ℓ−4, after removal of the A(µ), B(µ), and
D(µ) terms respectively. Close to the black hole, on the
other hand, the particle moves along a nearly circular tra-
jectory. If the orbit were perfectly circular for all times,
ie. ṙ ≡ 0, then the (0) component would not require reg-
ularization at all, and the multipole coefficients would
decay exponentially, resulting in a straight line on the
semi-logarithmic plot shown in Fig. 17. As the real orbit
is not precisely circular, curves eventually deviate from a
straight line. Removal of the A(µ) term is required almost
immediately (beginning with ℓ ≈ 3), while the D(µ) term
starts to become important only after ℓ ≈ 11. This shows
that there is a smooth transition from the self-force on a
circular orbit, which does not require regularization for
the t and φ components, to that of a generic orbit, for
which all components of the self-force require regulariza-
tion.
 0  2  4  6  8  10  12  14
Φ(0)-A
Φ(0)-A-B
Φ(0)-A-B-D
FIG. 16: Multipole coefficients of M
ReΦR(0) for a particle on
a zoom-whirl orbit (p = 7.8001, e = 0.9). The coefficients are
extracted at t = 2000M as the particle is about to enter the
whirl phase. As ṙ is non-zero, all components of the self-force
require regularization and we see that the dependence of the
multipole coefficients on ℓ is as predicted by Eq. 1.9. After the
removal of the regularization parameters A(µ), B(µ), and D(µ)
the remainder is proportional to ℓ0, ℓ−2 and ℓ−4 respectively.
 0  2  4  6  8  10  12  14
Φ(0)-A
Φ(0)-A-B
Φ(0)-A-B-D
FIG. 17: Multipole coefficients of ReΦR(0) for a particle on
a zoom-whirl orbit (p = 7.8001, e = 0.9). The coefficients
are extracted at t = 2150M while the particle is in the whirl
phase. The orbit is nearly circular at this time, causing the
dependence on ℓ after removal of the regularization parame-
ters to approximate that of a true circular orbit.
Acknowledgments
We thank Eric Poisson and Eran Rosenthal for useful
discussions and suggestions. This work was supported by
the Natural Sciences and Engineering Council of Canada.
APPENDIX A: TRANSLATION TABLES
We quote the results of [4] for the translation table be-
tween the modes Φℓm and the tetrad components Φ(µ)ℓm
with respect to the pseudo-Cartesian basis
eα(0) =
, 0, 0, 0
, (A1)
eα(1) =
f sin θ cosφ,
cos θ cosφ,− sinφ
r sin θ
, (A2)
eα(2) =
f sin θ sinφ,
cos θ sinφ,
r sin θ
, (A3)
eα(3) =
f cos θ,−1
sin θ, 0
, (A4)
and the complex combinations eα
:= eα
± ieα
eα(±) =
f sin θe±iφ,
cos θe±iφ,
±ie±iφ
r sin θ
. (A5)
With these, the spherical-harmonic modes Φ(µ)ℓm(t, r)
are given in terms of Φℓm(t, r) by
Φ(0)ℓm =
Φℓm, (A6)
Φ(+)ℓm =−
(ℓ+m− 1)(ℓ +m)
(2ℓ− 1)(2ℓ+ 1)
− ℓ− 1
Φℓ−1,m−1
(ℓ−m+ 1)(ℓ −m+ 2)
(2ℓ+ 1)(2ℓ+ 3)
Φℓ+1,m−1, (A7)
Φ(−)ℓm =
(ℓ −m− 1)(ℓ−m)
(2ℓ− 1)(2ℓ+ 1)
− ℓ− 1
Φℓ−1,m+1
(ℓ+m+ 1)(ℓ +m+ 2)
(2ℓ+ 1)(2ℓ+ 3)
Φℓ+1,m+1, (A8)
Φ(3)ℓm =
(ℓ −m)(ℓ+m)
(2ℓ− 1)(2ℓ+ 1)
Φℓ−1,m
(ℓ−m+ 1)(ℓ +m+ 1)
(2ℓ+ 1)(2ℓ+ 3)
Φℓ+1,m. (A9)
APPENDIX B: REGULARIZATION
PARAMETERS
For completeness we list the regularization parameters
as calculated in [4]. Quantities bearing a subscript “0”
are evaluated at the particle’s position.
A(0) =
0 + L
sign(∆), (B1)
A(+) = −eiφ0
0 + L
sign(∆), (B2)
A(3) = 0, (B3)
where f0 := 1 − 2M/r0 and sign(∆) is equal to +1 if
∆ > 0 and to −1 if ∆ < 0. We have, in addition, A(−) =
Ā(+), A(1) = Re[A(+)], and A(2) = Im[A(+)].
We also use
B(0) = −
Er0ṙ0√
0 + L
2)3/2
E + Er0ṙ0
0 + L
2)3/2
B(+) = e
Bc(+) − iB
, (B5)
Bc(+) =
0 + L
2)3/2
r20 + L
0 + L
2)3/2
f0 − 1
r20 + L
K, (B6)
Bs(+) = −
f0)ṙ0
r20 + L
E + (2−
f0)ṙ0
r20 + L
B(3) = 0. (B8)
In addition, B(−) = B̄(+), B(1) = Re[B(+)] =
cosφ0 + B
sinφ0, and B(2) = Im[B(+)] =
Bc(+) sinφ0 −B
(+) cosφ0.
Here, the rescaled elliptic integrals E and K are defined
E := 2
∫ π/2
(1− k sin2 ψ)1/2 dψ = F
; 1; k
K := 2
∫ π/2
(1− k sin2 ψ)−1/2 dψ = F
; 1; k
(B10)
in which k := L2/(r20 + L
We also use
C(µ) = 0 (B11)
D(0) = −
Er30(r
0 − L2)ṙ30
0 + L
2)7/2
E(r70 + 30Mr
0 − 7L2r50 + 114ML2r40 + 104ML4r20 + 36ML6)ṙ0
16r40
0 + L
2)5/2
Er30(5r
0 − 3L2)ṙ30
0 + L
2)7/2
E(r50 + 16Mr
0 − 3L2r30 + 42ML2r20 + 18ML4)ṙ0
16r20
0 + L
2)5/2
K, (B12)
D(+) = e
Dc(+) − iD
, (B13)
Dc(+) =
r30(r
0 − L2)ṙ40
0 + L
2)7/2
− r0ṙ
4(r20 + L
2)3/2
(3r70 + 6Mr
0 − L2r50 + 31ML2r40 + 26ML4r20 + 9ML6)ṙ20
0 + L
2)5/2
(3r70 + 8Mr
0 + L
2r50 + 26ML
2r40 + 22ML
4r20 + 8ML
16r60(r
0 + L
2)3/2
0 + 2Mr
0 + 4ML
r20 + L
0 − 3L2)ṙ40
0 + L
2)7/2
8(r20 + L
2)3/2
− (7r
0 + 12Mr
0 − L2r30 + 46ML2r20 + 18ML4)ṙ20
16r20
0 + L
2)5/2
− (7r
0 + 6Mr
0 + 6L
2r30 + 12ML
2r20 + 4ML
16r40(r
0 + L
2)3/2
r20 + L
K, (B14)
Ds(+) =
r20(r
0 − 7L2)(
f0 − 2)ṙ30
0 + L
2)5/2
− (2r
0 +Mr
0 + 5L
2r50 + 10ML
2r40 + 29ML
4r20 + 14ML
6)ṙ0
8r50L(r
0 + L
2)3/2
(r50 −Mr40 + 4L2r30 − 5ML2r20 + 2ML4)ṙ0
4r30L
0 + L
2)3/2
r20(r
0 − 3L2)(
f0 − 2)ṙ30
0 + L
2)5/2
(4r50 + 2Mr
0 + 7L
2r30 + 10ML
2r20 + 14ML
4)ṙ0
16r30L(r
0 + L
2)3/2
− (2r
0 − 2Mr20 + 5L2r0 − 8ML2)ṙ0
0 + L
2)3/2
K, (B15)
D(3) = 0. (B16)
And finally, D(−) = D̄(+), D(1) = Re[D(+)] =
cosφ0 + D
sinφ0, and D(2) = Im[D(+)] =
Dc(+) sinφ0 −D
(+) cosφ0.
APPENDIX C: PIECEWISE POLYNOMIALS
In two places in the numerical simulation we introduce
piecewise polynomials to approximate the scalar field ψℓm
across the world line, where it is continuous but not dif-
ferentiable. By a piecewise polynomial we mean a poly-
nomial of the form
p(t, r∗) =


n,m=0
unvm if r∗(u, v) > r∗0
n,m=0
unvm if r∗(u, v) < r∗0
, (C1)
where u = t−r∗, v = t+r∗ are characteristic coordinates,
r∗0 is the position of the particle at the time t(u, v), and
N is the order of the polynomial, which for our purposes
is N = 4 or less. The two sets of coefficients cnm and
c′nm are not independent of each other, but are linked
via jump conditions that can be derived from the wave
equation [Eq. (1.12)]. To do so, we rewrite the wave
equation in the characteristic coordinates u and v and
reintroduce the integral over the world line on the right-
hand side,
−4∂u∂vψ − V ψ =
Ŝ(τ)δ(u − up)δ(v − vp) dτ , (C2)
where Ŝ(τ) = −8πq Ȳℓm(π/2,φp(τ))
rp(τ)
is the source term and
quantities bearing a subscript p are evaluated on the
world line at proper time τ .
Here and in the following we use the notation
[∂nu∂
v ψ] = lim
[∂nu∂
v ψ(t0, r
0 + ǫ)− ∂nu∂mv ψ(t0, r∗0 − ǫ)]
to denote the jump in ∂nu∂
v ψ across the world line. First,
we notice that the source term does not contain any
derivatives of the Dirac δ-function, causing the solution
ψ to be continuous. This means that the zeroth-order
jump vanishes: [ψ] = 0. Our task is then to find the re-
maining jump conditions at a point (t0, r
0) for n,m ≤ 4.
Alternatively, instead of crossing the world line along a
line t = t0 = const we can also choose to cross along
lines of u = u0 = const or v = v0 = const, noting that
for a line of constant v the coordinate u runs from u0+ ǫ
to u0 − ǫ to cross from the left to the right of the world
line. Figure 18 provides a clearer description of the paths
taken.
(u0 − ǫ, v0)
(u0, v0 + ǫ)(u0 + ǫ, v0)
(u0, v0 − ǫ)
(t0, r
0) = (u0, v0)
FIG. 18: Paths taken in the calculation of the jump condi-
tions. (u0, v0) denotes an arbitrary but fixed point along the
world line γ. The wave equation is integrated along the lines
of constant u or v indicated in the sketch. Note that in order
to move from the domain on the left to the domain on the
right, u has to run from u0 + ǫ to u0 − ǫ. Where appropriate
we label quantities connected to the domain on the left by a
subscript “−” and quantities connected to the domain on the
right by “+”.
In order to find the jump [∂uψ] we integrate the wave
equation along the line u = u0 from v0 − ǫ to v0 + ǫ
∫ v0+ǫ
∂u∂vψdv −
∫ v0+ǫ
V ψdv =
Ŝ(τ)δ(u0 − up)
∫ v0+ǫ
δ(v − vp)dv dτ ,
which, after involving
∫ v0+ǫ
δ(v − vp)dv = θ(vp − v0 +
ǫ)θ(v0 − vp + ǫ) and δ(g(x)) = δ(x− x0)/ |g′(x0)|, yields
[∂uψ] = −
E − ṙ0
Ŝ(τ0), (C5)
where the overdot denotes differentiation with respect to
proper time τ .
Similarly, after first taking a derivative of the wave
equation with respect to v and integrating from u0+ ǫ to
u0 − ǫ, we obtain
∫ u0−ǫ
vψdu−
∫ u0−ǫ
V ψdu =
Ŝ(τ)
∫ u0−ǫ
δ(u− up)du δ′(v0 − vp)dτ .
We find
E + ṙ0
E + ṙp
Ŝ(τ)
|τ=τ0
. (C7)
Systematically repeating this procedure we find expres-
sions for the jumps in all the derivatives that are purely
in the u or v direction. Table III lists these results. Jump
[ψ] =0
[∂uψ] =−
bS(τ0), [∂vψ] =
bS(τ0)
bS(τ )
|τ=τ0
bS(τ )
|τ=τ0
V ξ0ξ̄
0 [∂uψ]−
bS(τ )
|τ=τ0
V ξ̄0ξ
0 [∂vψ] +
bS(τ )
|τ=τ0
|τ=τ0
+ 3ξ0ξ̄
0 ∂uV + ξ
0 ∂vV
[∂uψ] +
bS(τ )
|τ=τ0
|τ=τ0
+ 3ξ̄0ξ
0 ∂vV + ξ̄
0 ∂uV
[∂vψ]−
bS(τ )
|τ=τ0
TABLE III: Jump conditions for the derivatives purely in the
u or v directions. ṙ and r̈ are the particle’s radial velocity and
acceleration, respectively. They are obtained from the equa-
tion of motion for the particle. ξ̄ := E−ṙ
and ξ := E+ṙ
introduced for notational convenience. Quantities bearing a
subscript p are evaluated on the particle’s world line, while
quantities bearing a subscript 0 are evaluated at the parti-
cle’s current position. Derivatives of V with respect to either
u or v are evaluated as ∂uV = −
f∂rV and ∂vV =
f∂rV ,
respectively.
conditions for derivatives involving both u and v are ob-
tained directly from the wave equation [Eq. (C2)]. We
see that
[∂u∂vψ] = 0, (C8)
and taking an additional derivative with respect to u on
both sides reveals that
∂2u∂vψ
V [∂uψ] . (C9)
Systematically repeating this procedure we can find jump
conditions for each of the mixed derivatives by evaluating
∂n+1u ∂
[∂nu∂
v (V ψ)] , (C10)
where n,m ≥ 0 and derivatives of V with respect to
either u or v are evaluated as ∂uV = − 12f∂rV and ∂vV =
f∂rV , respectively.
The results of Table III and Eq. (C10) allow us to
express the coefficients of the left-hand polynomial in
Eq. (C1) in terms of the jump conditions and the co-
efficients of the right-hand side:
c′nm = cnm − [∂nu∂mv ψ] . (C11)
For N = 4 this leaves us with 25 unknown coefficients
cnm which can be uniquely determined by demanding
that the polynomial match the value of the field on the
25 grid points surrounding the particle. When we are
interested in integrating the polynomial, as in the case of
the potential term in the fourth-order algorithm, we do
not need all these terms. Instead, in order to calculate
e.g. the integral
V ψ du dv up to terms of order h5,
as is needed to achieve overall O(h4) convergence, it is
sufficient to include only terms such that n+m ≤ 2, thus
reducing the number of unknown coefficients to 6. In this
case Eq. (C1) becomes
p(t, r∗) =
m+n≤2
unvm if r∗(u, v) > r∗0
m+n≤2
unvm if r∗(u, v) < r∗0
. (C12)
The six coefficients can then be determined by matching
the polynomial to the field values at the six grid points
which lie within the past light cone of the grid point
whose field value we want to calculate.
[1] The LISA web site is located at
http://lisa.jpl.nasa.gov/.
[2] L. Barack and A. Ori, Phys. Rev. D 61, 061502 (2000),
gr-qc/9912010.
[3] C. Lousto, Class. Quantum Grav. 22, S543 (2005).
[4] R. Haas and E. Poisson, Phys. Rev. D 74,
044009 (pages 29) (2006), gr-qc/0605077, URL
http://link.aps.org/abstract/PRD/v74/e044009 .
[5] B. S. DeWitt and R. W. Brehme, Annals of Physics 9,
220 (1960).
[6] Y. Mino, M. Sasaki, and T. Tanaka, Phys. Rev. D 55,
3457 (1997), gr-qc/9606018.
[7] T. C. Quinn and R. M. Wald, Phys. Rev. D 56, 3381
(1997), gr-qc/9610053.
[8] T. C. Quinn, Phys. Rev. D 62, 064029 (2000), gr-
qc/0005030.
[9] S. Detweiler and B. F. Whiting, Phys. Rev. D 67, 024025
(2003), gr-qc/0202086.
[10] C. O. Lousto, Class. Quant. Grav. 22, S543 (2005), gr-
qc/0503001.
[11] R. H. Price, Phys. Rev. D 5, 2419 (1972).
[12] C. G. Darwin, Proc. R. Soc. A 249, 180 (1959).
http://lisa.jpl.nasa.gov/
http://link.aps.org/abstract/PRD/v74/e044009
[13] C. O. Lousto and R. H. Price, Phys. Rev. D 56, 6439
(1997), gr-qc/9705071.
[14] L. M. Diaz-Rivera, E. Messaritaki, B. F. Whit-
ing, and S. Detweiler, Physical Review D (Par-
ticles, Fields, Gravitation, and Cosmology) 70,
124018 (pages 14) (2004), gr-qc/0410011, URL
http://link.aps.org/abstract/PRD/v70/e124018.
[15] S. Detweiler, E. Messaritaki, and B. F. Whiting, Phys.
Rev. D 67, 104016 (2003), gr-qc/0205079.
[16] R. M. Wald, General relativity (University of Chicago
Press, Chicago, 1984), ISBN 0226870324.
http://link.aps.org/abstract/PRD/v70/e124018
ABSTRACT
  We calculate the self-force acting on a particle with scalar charge moving on
a generic geodesic around a Schwarzschild black hole. This calculation requires
an accurate computation of the retarded scalar field produced by the moving
charge; this is done numerically with the help of a fourth-order convergent
finite-difference scheme formulated in the time domain. The calculation also
requires a regularization procedure, because the retarded field is singular on
the particle's world line; this is handled mode-by-mode via the mode-sum
regularization scheme first introduced by Barack and Ori. This paper presents
the numerical method, various numerical tests, and a sample of results for
mildly eccentric orbits as well as ``zoom-whirl'' orbits.

<|endoftext|><|startoftext|>
Introduction
In [6], the authors and Bruner described a proof of the following theorem, along
with some additional nonimmersion results.
Theorem 1.1. ([6, 1.1]) Assume that M is divisible by the smallest 2-power greater
than or equal to h.
• If α(M) = 4h − 1, then P 8M+8h+2 cannot be immersed in ( 6⊆)
16M−8h+10.
• If α(M) = 4h− 2, then P 8M+8h 6⊆ R16M−8h+12.
Here and throughout, α(M) denotes the number of 1’s in the binary expansion of M ,
and P n denotes real projective space.
Date: April 5, 2007.
2000 Mathematics Subject Classification. 57R42, 55N20.
Key words and phrases. immersion, projective space, elliptic cohomology.
We thank Steve Wilson for causing us to take a look at these matters.
http://arxiv.org/abs/0704.0798v1
2 DONALD M. DAVIS AND MARK MAHOWALD
In [6], the theorem is followed by a comment that this is new provided α(M) ≥ 6,
i.e., h ≥ 2, and the first new result occurs for P 1536. In this note, we point out
that 1.1 is valid when h = 1, and these results are new when M is even, including
new nonimmersions of P n for n as small as 56. A remark in [6, p.66] that the
nonimmersions when h = 1 were implied by earlier work of the authors was incorrect.
Letting h = 1 in 1.1, we have the following result.
Corollary 1.2. a. If α(M) = 3, then P 8M+10 6⊆ R16M+2.
b. If α(M) = 2, then P 8M+8 6⊆ R16M+4.
Part (a) is new when M is even. It is 2 better than the previous best result, proved
in [4], and the nonembedding result that it implies is also new, 1 better than the
previous best, proved in [3]. In [7], a table of known nonimmersions, immersions,
nonembeddings, and embeddings of P n is presented, arranged according to n = 2i+d
with 0 ≤ d < 2i and d < 64. Part (a) enters the table with a new result for d = 58,
applying first to P 122.
If M is even, 1.2.b is new, 1 better than the previous best result, of [12], and the
nonembedding result implied is also new. It enters [7] at d = 24 and 40, with a new
result for P n with n as small as 56. The result of 1.2.b with M = 2i + 1 was also
proved very recently by Kitchloo and Wilson in [15]. This result for P 2
k+16, 2 better
than the previous result of [4] and also new as a nonembedding, enters [7] at d = 16,
and applies for n as small as 48.
In Section 2, we present a self-contained proof of Corollary 1.2. The primary reason
for doing this, which amounts to a reproof of part of [6, 1.1], is that the proof of the
general case in [6] requires some extremely elaborate arguments and calculations. Our
proof here, which is just for the case h = 1, is much more comprehensible.
The proof in [6] contained an oversight which we shall correct here. The argument
there was that an immersion of RP n in Rn+k implies existence of an axial map P n×
f−→ Pm+k for an appropriate value of m, and obtains a contradiction for certain
n, m, and k by consideration of tmf∗(f). Here tmf is the spectrum of topological
modular forms, which was discussed in [6]. A class X ∈ tmf8(P n) was described,
along with X1 = X × 1 and X2 = 1 × X in tmf8(P n × Pm). It was asserted that
f ∗(X) = X1+X2, and a contradiction obtained by showing that, for certain values of
NONIMMERSIONS IMPLIED BY TMF, REVISITED 3
the parameters, we might have Xℓ = 0 but (X1+X2)
ℓ 6= 0. We recently realized that
it is conceivable that f ∗(X) might contain other terms coming from tmf8(P n ∧ Pm).
In Section 3 (see Theorem 3.7) we perform a complete calculation of tmf∗(P∞×P∞)
in positive gradings divisible by 8, and in Section 4 we use it to show that effectively
f ∗(X) = u(X1 + X2), where u is a unit in tmf
∗(P∞ × P∞), which enables us to
retrieve all the nonimmersions of [6].
In Section 5, we compute tmf∗(CP∞ × CP∞) in positive gradings. The original
purpose of doing this was, prior to our obtaining the argument of Section 4, to see
whether we might mimic the argument of [2] and [8] to conclude that if f is an
axial map, then f ∗(X) might necessarily equal u(X1 − X2), where u is a unit in
tmf∗(CP × CP ). This approach to retrieving the nonimmersions of [6] did not yield
the desired result, but the later approach given in Section 4 did. Nevertheless the nice
result for tmf∗(CP∞ × CP∞) obtained in Theorem 5.19 should be of independent
interest.
2. Proof of Corollary 1.2
We begin by proving 1.2.a. The following standard reduction goes back at least to
[14]. If P 8M+10 ⊆ R16M+2, then gd((2L+3 − 8M − 11)ξ8M+10) ≤ 8M − 8, hence this
bundle has (2L+3−16M −3) linearly independent sections, and thus there is an axial
P 8M+10 × P 2L+3−16M−4 f−→P 2L+3−8M−12.
The bundle here is the stable normal bundle, L is a sufficiently large integer, and gd
refers to geometric dimension. Let X , X1, and X2 be elements of tmf
8(−) described
in [6] and also in Section 1. In Section 4, we will show that we may assume that
f ∗(X) = X1+X2, as was done in [6], since this is true up to multiplication by a unit.
Since tmf2
L+3−8M−8(P 2
L+3−8M−12) = 0, we have
0 = f ∗(0) = f ∗(X2
L−M−1) = (X1+X2)
2L−M−1 ∈ tmf2L+3−8M−8(P 8M+10×P 2L+3−16M−4).
Expanding, we obtain
2L−M−1
XM+11 X
2L−2M−2
2L−M−1
XM1 X
2L−2M−1
2 as the only
terms which are possibly nonzero. Next we note that, with all u’s representing odd
integers,
2L−M−1
= 2α(M)−ν(M+1)u2 = 2
3−ν(M+1)u2,
4 DONALD M. DAVIS AND MARK MAHOWALD
where we have used α(M) = 3 at the last step. Here and throughout, ν(2eu) = e.
Similarly,
2L−M−1
= 2α(M)u4 = 2
3u4. Thus an immersion implies that in
L+3−8M−8(P 8M+10 × P 2L+3−16M−4), we have
23−ν(M+1)u2X
2L−2M−2
2 + 2
2L−2M−1
2 = 0.
(2.1)
We recall [6, 2.6], which states that there is an equivalence of spectra P k+8b+8 ∧
tmf ≃ Σ8P kb ∧ tmf. Combining this with duality, we obtain tmf
8M+8(P 8M+10) ≈
tmf−1(P−3) ≈ Z/8, and so 8XM+11 X2
L−2M−2
2 = 0. Here and throughout, Pn = P
RP∞/RP n−1. Similarly tmf2
L+3−16M−8(P 2
L+3−16M−4) ≈ tmf7(P3) ≈ Z/16, and hence
16XM1 X
2L−2M−1
2 = 0. Duality also implies
L+3−8M−8(P 8M+10 × P 2L+3−16M−4) ≈ tmf14(P−3 ∧ P3).
Calculations such as E2(tmf∗(P−3∧P3)), the E2-term of the Adams spectral sequence
(ASS), were made by Bruner’s minimal-resolution computer programs in our work on
[6]. This one is in a small enough range to actually do by hand. The result is given
in Diagram 2.2.
Diagram 2.2. E2(tmf∗(P−3 ∧ P3)), ∗ ≤ 15
0 3 7 11 15
r r r r r�
r r r
The Z/8 ⊕ Z/16 arising from filtration 0 in grading 14 in 2.2 is not hit by a
differential from the class in (15, 0) because, as explained in the last paragraph of page
54 of [6], the class in (15, 0) corresponds to an easily-constructed nontrivial map. The
monomials XM+11 X
2L−2M−2
2 and X
2L−2M−1
2 are detected in mod-2 cohomology,
NONIMMERSIONS IMPLIED BY TMF, REVISITED 5
and so their duals emanate from filtration 0. We saw in the previous paragraph that
8 and 16, respectively, annihilate these monomials, and hence also their duals. Since
the chart shows that the subgroup of tmf14(P−3∧P3) generated by classes of filtration
0 is Z/8 ⊕ Z/16, we conclude that 8 and 16, respectively, are the precise orders of
the monomials. In particular, the order of XM1 X
2L−2M−1
2 is 16, and hence the class
in (2.1) is nonzero since it has a term 8uXM1 X
2L−2M−1
2 , and so (2.1) contradicts the
hypothesized immersion.
Part b of 1.2 is proved similarly. If P 8M+8 immerses in R16M+4, then there is an
axial map
P 8M+8 × P 2L+3−16M−6 f−→P 2L+3−8M−10,
and hence, up to odd multiples,
22−ν(M+1)XM+11 X
2L−2M−2
2 + 2
2XM1 X
2L−2M−1
2 (2.3)
= 0 ∈ tmf2L+3−8M−8(P 8M+8 ∧ P 2L+3−16M−6),
since α(M) = 2. We have tmf8M+8(P 8M+8) ≈ tmf−1(P−1) ≈ Z/2, and
L+3−16M−8(P 2
L+3−16M−6) ≈ tmf−1(P−3) ≈ Z/8.
Thus the two monomials in (2.3) have order at most 2 and 8, respectively. On
the other hand, the group in (2.3) is isomorphic to tmf6(P−1 ∧ P−3). A minimal
resolution calculation easier than the one in Diagram 2.2 shows that tmf6(P−1∧P−3)
has Z/2⊕Z/8 emanating from filtration 0 (and another Z/2⊕Z/8 in higher filtration).
The monomials of (2.3) are generated in filtration 0, and since the above upper bound
for their orders equals the order of the subgroup generated by filtration-0 classes, we
conclude that the orders of the monomials in (2.3) are precisely 2 and 8, respectively,
and so the term 4XM1 X
2L−2M−1
2 in (2.3) is nonzero, contradicting the immersion.
3. tmf-cohomology of P∞ × P∞
In this section, we compute tmf∗(P∞) and tmf8∗(P∞ × P∞) in positive gradings.
These will be used in the next section in studying the axial class in tmf-cohomology.
There is an element c4 ∈ π8(tmf) which reduces to v41 ∈ π8(bo); it has Adams
filtration 4. It acts on tmf∗(X) with degree −8. Recall also that π∗(bo) = bo∗ is as
depicted in 5.1. We denote bo∗ = bo−∗. We use P1 and P
∞ interchangeably.
6 DONALD M. DAVIS AND MARK MAHOWALD
Theorem 3.1. There is an element X ∈ tmf8(P1) of Adams filtration 0, described
in [6], such that, in positive dimensions divisible by 8, tmf∗(P1) is isomorphic as an
algebra over Z(2)[c4] to Z(2)[c4][X ]. In particular, each tmf
8i(P1) with i > 0 is a free
abelian group with basis {cj4X i+j : j ≥ 0}. There is a class L ∈ t0(P1) such that
• tmf0(P1) is a free abelian group with basis {L, cj4Xj : j ≥ 1},
• L2 = 2L and LX = 2X.
Moreover, in positive dimensions tmf∗(P1) is isomorphic as a graded abelian group to
bo∗[X ], and is depicted in Diagram 3.6.
Remark 3.2. A complete description of tmf∗(P1) as a graded abelian group could
probably be obtained using the analysis in the proof which follows, together with the
computation of the E2-term of the ASS converging to tmf∗(P−1), which was given in
[10]. However, this is quite complicated and unnecessary for this paper, and so will
be omitted.
Proof. We begin with the structure as graded abelian group. There are isomorphisms
tmf∗(P1) ≈ lim← tmf
∗(P n1 ) ≈ lim← tmf−∗−1(P
−n−1) = tmf−∗−1(P
(3.3)
Since H∗(tmf;Z2) ≈ A//A2, there is a spectral sequence converging to tmf∗(X) with
E2(X) = ExtA2(H
∗X,Z2). Here A2 is the subalgebra of the mod 2 Steenrod algebra
A generated by Sq1, Sq2, and Sq4. Also Z2 = Z/2.
We compute E2(P
−∞) from the exact sequence
→ Es−1,t2 (P∞−1)→ E
−∞)→ E
q∗−→ Es,t2 (P∞−1)→ .
(3.4)
It was proved in [17] that
ExtA2(P
−∞,Z2) ≈
ExtA1(Σ
Z2,Z2).
Here we have initiated a notation that Pmn := H
∗(Pmn ). A complete calculation of
ExtA2(P
−1,Z2) was performed in [10], but all we need here are the first few groups.
We can now form a chart for E2(P
−∞) from (3.4), as in Diagram 3.5, where ◦ indicate
elements of ExtA2(P
−1,Z2) suitably positioned, and lines of negative slope correspond
to cases of q∗ 6= 0 in (3.4).
NONIMMERSIONS IMPLIED BY TMF, REVISITED 7
Diagram 3.5. tmf∗(P
−∞), −17 ≤ ∗ ≤ 2
−17 −9 −1
· · ·
✻ ✻ ✻
r r r
r r r
✻ ✻ ✻
r r r
r r r
r r r
r r r
r r r
Dualizing, we obtain Diagram 3.6 for the desired tmf∗(P∞1 ).
Diagram 3.6. tmf∗(P∞1 ), ∗ ≥ −2
0 8 16
✻ ✻ ✻✻ ✻ ✻
r r r
r r r
· · ·
Naming of the generators X i is clear since X has filtration 0. The free action of c4
is also clear. The class L is (up to sign) the composite P1
λ−→ S0 → tmf, where λ is
the well-known Kahn-Priddy map. Thus L is the image of a class L̂ ∈ π0(P1). Lin’s
theorem ([16]) says that π0(P1) ≈ Z∧2 , generated by L̂. Since π0(P1)→ ko0(P1) is an
isomorphism, and, since (1 − ξ)2 = 2(1 − ξ) for a generator (1 − ξ) of ko0(P1), we
obtain L̂2 = 2L̂, and hence also for L. We chose the generator to be (1 − ξ) rather
than (ξ − 1) to avoid minus signs later in the paper.
8 DONALD M. DAVIS AND MARK MAHOWALD
To prove the claim about LX , first note that, by the structure of tmf8(P1), we
must have LX = p(c4X)X for some polynomial p. Multiply both sides by L and
apply the result about L2 to get 2LX = p(c4X)LX , hence 2p = p
2, from which we
conclude p = 2.
In tmf∗(P1 × P1), for i = 1, 2, let Li and Xi denote the classes L and X in the ith
factor. Note that there is an isomorphism as tmf∗-modules, but not as rings,
tmf∗(P1 × P1) ≈ tmf∗(P1 ∧ P1)⊕ tmf∗(P1 × ∗)⊕ tmf∗(∗ × P1).
Theorem 3.7. In positive dimensions divisible by 8, tmf∗(P1 ∧ P1) is isomorphic
as a graded abelian group to a free abelian group on monomials X i1X
2 with i, j > 0
direct sum with a free Z[c4]-module with basis {L1X i2, X i1L2 : i ≥ 1}. The product
and Z[c4]-module structure is determined from 3.1 and
c4(X1X2) = (c4X1)X2 = X1(c4X2) =
4(L1X
1 L2),
for certain integers γi with γ0 divisible by 8.
The proof of this theorem involves a number of subsidiary results. They and it
occupy the remainder of this section. We will use duality and exact sequences similar
to (3.4). But to get started, we need ExtA2(P ⊗ P,Z2). Here we have begun to
abbreviate P := P∞−∞. We begin with a simple lemma. Throughout this section, x1
and x2 denote nonzero elements coming from the factors in H
1(RP × RP ;Z2).
Lemma 3.8. ([9]) There is a split short exact sequence of A-modules
0→ Z2 ⊗P→ P⊗P→ (P/Z2)⊗P→ 0.
Proof. The Z2 is, of course, the subgroup generated by x
0, which is an A-submodule.
A splitting morphism P⊗P g−→Z2 ⊗P is defined by g(xi1 ⊗ x
2) = x
1 ⊗ x
2 . This is
A-linear since
g(Sqk(xi1⊗x
2)) =
x01⊗x
i+j+k
x01⊗x
i+j+k
2 = Sq
k g(xi1⊗x
The following result is more substantial. We will prove it at the end of this section.
NONIMMERSIONS IMPLIED BY TMF, REVISITED 9
Proposition 3.9. There is a short exact sequence of A2-modules
0→ C → (P/Z2)⊗P→ B → 0,
where C has a filtration with
Fp(C)/Fp−1(C) ≈ Σ8pA2/ Sq2, p ∈ Z,
and B has a filtration with
Fp(B)/Fp−1(B) ≈
Z copies
Σ4p−2A2/ Sq
1, p ∈ Z.
The generator of Fp(C)/Fp−1(C) is x
2 ; a basis over Z2 for C is
{x21xi+22 +x41xi2, x41xi2+x81xi−42 , i ∈ Z}∪{x11xi−12 +x21xi−22 , i 6≡ 0 (8)}∪{x11x
2 , p ∈ Z}.
A minimal set of generators as an A2-module for the filtration quotients of B is
{x8i−11 x
2 : i, j ∈ Z}.
Corollary 3.10. A chart for Ext
(P ⊗ P,Z2) in 8p − 3 ≤ t − s ≤ 8p + 4 is as
suggested in Diagram 3.11, for all integers p. The big batch of towers in each grading
≡ 2 (4) represents an infinite family of towers. The pattern of the other classes is
repeated with vertical period 4. Thus, for example, in 8p−1 there is an infinite tower
emanating from filtration 4i for each i ≥ 0.
10 DONALD M. DAVIS AND MARK MAHOWALD
Diagram 3.11. Ext
(P⊗P,Z2) in 8p− 3 ≤ t− s ≤ 8p+ 4
8p+ −2 0 2 4
✻✻✻✻✻✻✻✻✻✻✻ ✻✻✻✻✻✻✻✻✻✻✻✻ ✻✻
✻✻ ✻✻
Proof of Corollary 3.10. We first note that ExtA2(P,Z2) is identical to the left portion
of Diagram 3.5 extended periodically in both directions. Also, ExtA2(A2/ Sq
1,Z2) ≈
ExtA0(Z2,Z2) is just an infinite tower, and
ExtA2(A2/ Sq
2,Z2) ≈ ExtA1(A1/ Sq2,Z2)
is given as in Diagram 3.14. We will show at the end of this proof that
ExtA2(C,Z2) ≈
ExtA2(Σ
8pA2/ Sq
2,Z2) (3.12)
and similarly
ExtA2(B,Z2) ≈
ExtA2(Σ
4p−2A2/ Sq
1,Z2).
These would follow by induction on p once you get started, but since p ranges over
all integers, that is not automatic.
Thus ExtA2(P⊗P,Z2) is formed from
ExtA2(P,Z2)⊕
ExtA2(Σ
8pA2/ Sq
2,Z2)⊕
ExtA2(Σ
4p−2A2/ Sq
1,Z2),
NONIMMERSIONS IMPLIED BY TMF, REVISITED 11
using the sequences in 3.8 and 3.9. The Ext sequence of 3.8 must split, and there are
no possible boundary morphisms in the Ext sequence of 3.9, yielding the claim of the
corollary.
To prove (3.12), let (s, t) be given, and choose p0 so that 8p0 < t− 23s+ 2. Since
the highest degree element in A2 is in degree 23, Ext
(Fp0(C),Z2) = 0. Actually a
much sharper lower vanishing line can be established, but this is good enough for our
purposes. Thus, for this (s, t),
(Fp1(C),Z2) ≈
(Σ8p−2A2/ Sq
2) (3.13)
for p1 ≤ p0, as both are 0. Let p1 be minimal such that (3.13) does not hold. Then
comparison of exact sequences implies that
s−1,t
(Fp1−1(C),Z2)→ Ext
(Fp1(C)/Fp1−1(C),Z2)
must be nonzero. But one or the other of these groups is always 0,1 as both charts
(Fp1−1(C),Z2) and Ext
(Fp1(C)/Fp1−1(C),Z2) are copies of Diagram 3.14 dis-
placed by 4 vertical units from one another. Thus (3.13) is true for all p1, and hence
(3.12) holds. A similar proof works when C is replaced by B.
Diagram 3.14. ExtA2(A2/ Sq
2,Z2)
· · ·
Now we can prove a result which will, after dualizing, yield Theorem 3.7. The
groups ExtA1(Z2,Z2) to which it alludes are depicted in 5.1. The content of this
result is pictured in Diagram 3.18.
Proposition 3.15. In dimensions t− s ≡ 2 mod 4 with t− s ≤ −10, ExtA2(P−2−∞⊗
P−2−∞,Z2) consists of i infinite towers emanating from filtration 0 in dimensions −8i−
1Actually this is not quite true; for one family of elements we need to use h0-
naturality.
12 DONALD M. DAVIS AND MARK MAHOWALD
6 and −8i − 10, together with the relevant portion of two copies of ExtA1(Z2,Z2)
beginning in filtration 1 in each dimension −8i − 2. The generators of the towers in
−8i− 10 correspond to cohomology classes x−91 x−8i−12 , . . . , x−8i−11 x−92 . The generators
of the two copies of ExtA1(Z2,Z2) in −8i−2 arise from h0 times classes corresponding
to x−11 x
2 and x
−8i−1
Proof. Using exact sequences like (3.4) on each factor, we build Ext
(P−2−∞⊗P−2−∞,Z2)
fromA := Ext
(P⊗P,Z2), B := Ext∗−1,∗A2 (P
−1⊗P,Z2), C := Ext
∗−1,∗
(P⊗P∞−1,Z2),
and D := Ext
∗−2,∗
(P∞−1 ⊗ P∞−1,Z2), with possible d1-differential from A and into D.
In the range of concern, t− s ≤ −9, the D-part will not be present, and the part of
Diagram 3.11 in dimension 6≡ 2 mod 4 will not be involved in d1. Using [17] for B
and C, the relevant part, namely the portion of A in dimension ≡ 2 mod 4, together
with B and C, is pictured in Diagram 3.16.
Diagram 3.16. Portion of A+B+C
✻✻✻✻✻✻✻✻✻✻ ✻✻✻✻✻✻✻✻✻✻ ✻✻✻✻✻✻✻✻✻✻
−2 2 68p+
rr rrr
In dimension 8p−2, the towers in A arise from all cohomology classes x−8i−11 x
−8j−1
with i+ j = −p, while in dimension 8p+ 2, they arise from x8i−11 x
2 ∼ x8i+31 x
The finite towers in B arise from x4i−11 x
2 with i ≥ 0, and those from C from
x8i−11 x
2 with j ≥ 0. The homomorphism
Ext0A2(P⊗P,Z2)→ Ext
(P∞−1 ⊗P,Z2)⊕ Ext0A2(P⊗P
−1,Z2),
NONIMMERSIONS IMPLIED BY TMF, REVISITED 13
which is equivalent to the d1-differential mentioned above, sends classes to those with
the same name. In dimension ≤ −10, this is surjective, with kernel spanned by classes
with both components < −1. In dimension −8i−6 and −8i−10, there will be i such
classes. We illustrate by listing the classes in the first few gradings:
−14 : x−91 x−52 ∼ x−51 x−92
−18 : x−91 x−92
−22 : x−171 x−52 ∼ x−131 x−92 , x−91 x−132 ∼ x−51 x−172
−26 : x−171 x−92 , x−91 x−172 .
These kernel classes yield infinite towers emanating from filtration 0.
For each p < 0, the towers arising from x
2 , j ≥ 0, in A combine with those
in the p-summand of
ExtA1(Σ
8p−1P∞−1,Z2)
as in Diagram 3.17 to yield one of the copies of ExtA1(Z2,Z2) arising from filtration
1. An identical picture results when the factors are reversed.
Diagram 3.17. Part of ExtA2(P
−∞ ⊗P−2−∞,Z2)
✻ ✻ ✻
Putting things together, we obtain that in dimensions less than −8, ExtA2(P−2−∞⊗
P−2−∞,Z2) consists of a chart described in Proposition 3.15 and partially illustrated
in Diagram 3.18 together with the classes in Diagram 3.11 which are not part of the
infinite sums of towers in dimension ≡ 2 mod 4.
14 DONALD M. DAVIS AND MARK MAHOWALD
Diagram 3.18. Illustration of Proposition 3.15
−26 −18 −10
✻✻✻✻✻✻ ✻ ✻✻✻
✻✻✻✻ ✻✻
The only possible differentials in the Adams spectral sequence of P−2−∞∧P−2−∞∧ tmf
involving the classes in dimensions 8p − 2 with p < 0 are from the towers in 8p − 1
in Diagram 3.11, but these differentials are shown to be 0 as in [6, p.54]. Similarly to
(3.3), we have
tmf∗(P1 ∧ P1) ≈ tmf−∗−2(P−2−∞ ∧ P−2−∞),
and so we obtain a turned-around version of Diagram 3.18, of the same general sort
as Diagram 3.6, as a depiction of a relevant portion of tmf∗(P1∧P1), with the labeled
columns in Diagram 3.18 corresponding to cohomology gradings 24, 16, and 8.
The classes X i1X
2 described in Theorem 3.7 are detected by the S-duals of the
classes from which the filtration-0 towers in dimensions 8p− 2 in Diagram 3.18 arise,
and so they can be chosen to be the corresponding elements of tmf8∗(P1 ∧P1). Simi-
larly the classes L1X
2 and X
1L2 have Adams filtration 1, and so one would anticipate
that they represent the duals of the generators of the two towers in dimension 8p− 2
with p < 0 in Diagram 3.18. This seems a bit harder to prove using the Adams spectral
sequence; however, the Atiyah-Hirzebruch spectral sequence shows this quite clearly.
The class X i1 is detected by H
8i(P1; π0(tmf)), while L is detected by H
1(P1; π1(tmf)).
NONIMMERSIONS IMPLIED BY TMF, REVISITED 15
Under the pairing, their product is detected in H8i+1(P1; π1(tmf)), clearly of Adams
filtration 1.
The last part of Theorem 3.7 deals with the action of c4 on the monomials X
Since tmf is a commutative ring spectrum, tmf∗(P1 ∧ P1) is a graded commutative
algebra over tmf∗. The action c4(X1X2) must be of the form
i≥0 γic
4(L1X
as these are the only elements in tmf8(P1∧P1), and the class must be invariant under
reversing factors. The divisibility of γ0 by 8 follows since c4 has Adams filtration 4.
Having just completed the proof of Theorem 3.7, we conclude this section with the
postponed proof of Proposition 3.9.
Proof of Proposition 3.9. Let C denote the A2-submodule of (P/Z2) ⊗ P generated
by all x11x
2 , p ∈ Z. Note that Sq2(x11x
2 ) = Sq
4 Sq6(x11x
2 ). Thus a basis
of A2/ Sq
2 acting on all x11x
2 spans C. The 24 elements in a basis of A/ Sq
acting on x11x
2 yield x
2 + x
2 + x
2 + x
2 , x
2 + x
2 , x
2 , x
2 , x
2 , x
2 , x
2 , x
2 , x
2 , x
2 , x
2 , x
2 , x
2 , x
2 , x
2 , x
and x41x
2 + x
2 . These classes with second components shifted by all multiples of
8 exactly comprise the basis for C described in the proposition.
The procedure to establish the structure of B = ((P/Z2)⊗P)/C is similar but more
elaborate. For the 32 elements θ in a basis of A2/ Sq
1, we list θ(x−11 x
2 ) and θ(x
Then we show that these, with each component allowed to vary by multiples of 8,
together with C, fill out all of (P/Z2)⊗P.
It is convenient to let Q denote the quotient of (P/Z2)⊗P by C and all elements
θ(x8i−11 x
2 ) and θ(x
2 ). We will show Q = 0. This will complete the proof of
Proposition 3.9, implying in particular that Sq1(x8i−11 x
2 ) and Sq
1(x8i−11 x
2 ) are
decomposable over A2.
A separate calculation is performed for each mod 8 value of the degree. Here we
use repeatedly that the A2-action on x
i depends only on i mod 8. We illustrate with
the case in which degree ≡ 0 mod 8. The other 7 congruences are handled similarly,
although some are a bit more complicated.
A basis of A2/ Sq
1 in degree ≡ 2 mod 8 acting on x−11 x−12 yields the following
elements: x−11 x
2 + x
2 + x
2 , x
2 + x
2 + x
2 + x
2 + x
2 + x
16 DONALD M. DAVIS AND MARK MAHOWALD
and x41x
2 + x
2. A basis of A2/ Sq
1 in degree ≡ 6 mod 8 acting on x−11 x32 yields
the following elements: x21x
2 + x
2 + x
2 + x
2 + x
2 + x
2 + x
2 + x
2, and x
2 + x
2. Because we allow both components to vary
by multiples of 8, we will list just the first component of the ordered pairs. These
are considered as relations in Q. Thus the relation R1 below really means that all
x8i−11 x
2 + x
2 + x
2 become 0 in Q.
R1 : X−1 +X0 +X1,
R2 : X2 +X6,
R3 : X−1 +X3 +X4 +X5 +X9,
R4 : X4 +X12,
R5 : X2 +X3 +X4 +X5,
R6 : X−1 +X2 +X5,
R7 : X4 +X6 +X10 +X12,
R8 : X8 +X16.
We will use these relations to show that all classes (in degree ≡ 0 mod 8) are 0 in
Q. First, R8 implies that all classes X8i are congruent to one another. Since X0 is
0 in the quotient due to P/Z2, we conclude that all classes X8i are 0 in Q. Next,
R4 implies that all X8i+4 are congruent to one another. Since X4 +X8 ∈ C, and we
have just shown that X8 ≡ 0 in Q, we deduce that all X8i+4 are 0 in Q. Now we
use R2 + R7 to see that all X8i+2 + X8i+4 are congruent to one another, then that
X2 +X4 ∈ C to deduce all X8i+2 +X8i+4 ≡ 0, and finally the result of the previous
sentence to conclude all X8i+2 ≡ 0. Then R2 implies all X8i+6 ≡ 0. Now R1+R3+R5,
together with relations previously obtained, implies all X8i+1 are congruent to one
another, and since X1 ∈ C, we conclude all X8i+1 ≡ 0. Finally R1 implies X8i−1 ≡ 0,
R6 implies X8i+5 ≡ 0, and then R3 implies X8i+3 ≡ 0.
4. Careful treatment of axial class
In this section, we fill the gap in the proof in [6] of its Theorem 1.1 by careful consid-
eration of the possible “other terms” in the axial class discussed in the Introduction.
We show that, at least as far as the monomials cX i1X
2 in its powers are concerned,
NONIMMERSIONS IMPLIED BY TMF, REVISITED 17
the axial class equals u(X1+X2), where u is a unit in tmf
0(RP∞×RP∞). Thus the
ℓth power of the axial class is nonzero in tmf8ℓ(RP n×RPm) if and only if (X1+X2)ℓ
is nonzero there, and the latter is the condition which yielded the nonimmersions of
[6, 1.1]. Thus we have a complete proof of [6, 1.1].
If P n × Pm f−→Pm+k is an axial map, then there is a commutative diagram
P n × Pm f−−−→ Pm+k
P∞ × P∞ g−−−→ P∞,
where g is the standard multiplication of P∞, since P∞ = K(Z2, 1). Since X ∈
tmf8(Pm+k) has been chosen to extend over P∞, we obtain that f ∗(X) is the restric-
tion of g∗(X). By Theorem 3.7 and the symmetry of g, we must have
g∗(X) = X1 +X2 +
4(L1X
1 L2), (4.1)
for some integers κi. This is what we call the “axial class.” Then g
∗(Xℓ) equals the
ℓth power of (4.1). Using the formulas for L2i , LiXi, and c4(X1X2) in 3.1 and 3.7 and
the binomial theorem, this ℓth power can be written in terms of the basis described
in 3.7. If some κi’s are nonzero, the coefficients of X
2 in g
∗(Xℓ) will not equal
, as was claimed in [6]. We will study this possible deviation carefully.
One simplification is to treat L1 and L2 as being just 2. Note that Li acts like 2
when multiplying by Xi, and if, for example, L1 is present without X1, then the terms
ci4L1X
2 cannot cancel our X
2-classes because both are separate parts of the basis.
You have to carry the terms along, because they might get multiplied by an X1, and
then it is as if L1 = 2. We will incorporate this important simplification throughout
the remainder of this section.
For example, one easily checks that, using L21 = 2L1 and L1X1 = 2X1, we obtain
(X1 +X2 + L1X2)
4 = (X1 + 3X2)
4 − 80X42 + 40L1X42 .
The exponent of 2 in each monomial of (X1 + 3X2)
4 − 80X42 is the same as that in
(X1 +X2)
4, and L1X
2 is a separate basis element.
With this simplification, the axial class in (4.1) becomes
X1 +X2 + 2
2 ) (4.2)
18 DONALD M. DAVIS AND MARK MAHOWALD
for some integers κi. There was another term 2κ0(X1+X2), but it can be incorporated
into the leading (X1 +X2). The odd multiple that it can create is not important.
From Theorem 3.7, we have
c4(X1X2) = 16(X1 +X2) + 2
(4.3)
for some integers γk. The 16 comes from γ0 = 8 and Li = 2. Actually we don’t really
know that γ0 = 8, even just up to multiplication by a unit, but it is divisible by 8
and the possibility of equality must be allowed for. This gives
2 ) = 16(X
2 ) + 2
i+k+1
j+k+1
(4.4)
Here we use that in a graded tmf∗-algebra tmf
∗(X) with even-degree elements, c(xy) =
cx · y, for c ∈ tmf∗ and x, y ∈ tmf∗(X).
There is an iterative nature to the action of c4 in (4.4), but the leading coefficient
16 enables us to keep track of 2-exponents of leading terms in the iteration. (As
observed above, the leading coefficient might be an even multiple of 16, which would
make the terms even more highly 2-divisible. We assume the worst, that it equals
16.) We obtain the following key result about the action of c4 on monomials in X1
and X2.
Theorem 4.5. There are 2-adic integers Ai such that
24+iAi
Remark 4.6. This formula will be evaluated on (i.e. multiplied by) monomialsXk1X
One might worry that the negative powers of X1 or X2 in 4.5 will cause nonsensical
negative powers in c4X
2. This will, in fact, not occur because the monomials on
which we act always have total degree greater than the dimension of either factor.
Thus if, after multiplication by c4, a term with negative exponent of Xi appears, then
the accompanying X
3−i-term will be 0 for dimensional reasons.
Proof of Theorem 4.5. The defining equation (4.3) may be written as, with θ =
X1X2 and z =
X1/X2,
θ = 16(z + z−1) +
i(zi+1 + z−(i+1)). (4.7)
NONIMMERSIONS IMPLIED BY TMF, REVISITED 19
Let pi = z
i + z−i. We will show that
24+iAip2i+1 (4.8)
for certain 2-adic integers Ai, which interprets back to the claim of 4.5.
Note that pipj = pi+j + p|i−j|, and hence
pe11 · · · p
k = pΣiei + L,
where L is a sum of integer multiples of pj with j <
iei and j ≡
iei mod 2.
We will ignore for awhile the coefficients γi which occur in (4.7). This is allowable
if we agree that when collecting terms, we only make crude estimates about their
2-divisibility. We have
θ = 16p1 + 2θp2 + 2θ
2p3 + 2θ
3p4 + · · ·
= 16p1 + 2p2(16p1 + 2p2(16p1 + · · · ) + 2p3(16p1 + · · · )2 + · · · )
+2p3(16p1 + 2p2(16p1 + · · · ) + · · · )2 + · · · .
Note that the only terms that actually get evaluated must end with a 16p1 factor.
Now let T1 = 16p1 and, for i ≥ 2, let Ti = 2θi−1pi. Each term in the expansion of
θ involves a sequence of choices. First choose Ti for some i ≥ 1, and then if i > 1
choose (i−1) factors Tj , one from each factor of θi−1. For each of these Tj with j > 1,
choose j − 1 additional factors, and continue this procedure. This builds a tree, and
we don’t get an explicit product term until every branch ends with T1. Each selected
factor Tj with j > 1 contributes a factor 2pj. There will also be binomial coefficients
and the omitted γi’s occurring as additional factors.
For example, Diagram 4.9 illustrates the choices leading to one term in the expan-
sion of θ. This yields the term 2p2 · 2p4 · 16p1 · 2p2 · 16p1 · 2p3 · 16p1 · 2p2 · 16p1, which
equals 221(p17 +L), where L is a sum of pi with i < 17 and i odd. By induction, one
sees in general that the sum of the subscripts emanating from any node, including
the subscript of the node itself, is odd.
20 DONALD M. DAVIS AND MARK MAHOWALD
Diagram 4.9. A possible choice of terms
T2 T4 T2 T1
T2 T1
The important terms are those in which T2 is chosen k times (k ≥ 0) and then T1 is
chosen. These give (2p2)
kp1 with no binomial coefficient. This term is 2
k+4(p2k+1+L).
Note that a term 2k+4p2i+1 with i < k obtained from L will be more 2-divisible than
the 2i+4p2i+1 term that was previously obtained. Thus it may be incorporated into
the coefficient of that term.
All other terms will be more highly 2-divisible than these. For example, the first
would arise from choosing T3 then two copies of T1. This would give 2p3 ·24p1 ·24p1 =
29p5+L, and the 29p5 can be combined with the 26p5 obtained from choosing T2 then
T2 then T1. Incorporating γi’s may make terms even more divisible, but the claim of
(4.8) is only that p2i+1 occurs with coefficient divisible by 2
Now we incorporate 4.5 into (4.2) to obtain the following key result, which we prove
at the end of the section.
Theorem 4.10. The monomials ciX
2 in the nth power of the axial class in
tmf8n(RP∞ × RP∞) are equal to those in the nth power of
(X1 +X2)
24+iαi
, (4.11)
where u is an odd 2-adic integer and αi are 2-adic integers.
The factor which accompanies (X1+X2) in (4.11) is a unit in tmf
∗(RP∞×RP∞);
we referred to it earlier as u. Indeed, its inverse is a series of the same form, obtained
by solving a sequence of equations. This justifies the claim in the first paragraph of
this section regarding retrieval of the nonimmersions of [6, 1.1].
We must also observe that restriction to tmf8ℓ(RP n × RPm) of the non-X i1Xℓ−i2
parts of the basis of tmf8ℓ(RP∞ × RP∞) cannot cancel the X i1Xℓ−i2 terms essential
for the nonimmersion. This is proved by noting that these elements such as L1X
NONIMMERSIONS IMPLIED BY TMF, REVISITED 21
and ci4L1X
2 will restrict to a class of the same name in tmf
8ℓ(RP n × RPm), and
will be 0 there for dimensional reasons, since 8ℓ > n.
Proof of Theorem 4.10. Let g∗(X) denote the axial class as in (4.1). From (4.2) and
4.5, the difference g∗(X)− (X1 +X2) equals
We let z =
X1/X2 and pj = z
j + z−j as in the proof of 4.5.
The summand with i = 2t becomes
2κi(X1 +X2)
X t1X
2jAjp2j+1
= 2κi(X1 +X2)(p2t + L)24i
k(p2k+i + L).
Here k is a sum of j-values taken from the various factors in the ith power. Also, in
pj + L, L denotes a combination of pt’s with t < j. Noting (p2t + L)(p2k+i + L) =
p2k+2i + L, this becomes
2(X1 +X2)2
k(p2k+2i + L). (4.12)
The argument when i = 2t + 1 is similar but slightly more complicated because
(X i+11 +X
2 ) is not divisible by (X1 +X2). We obtain
X i+11 +X
X1X2)2t+1
2jAjp2j+1
For one of the factors of the ith power, say the first, we treat p2j+1 as
X1+X2√
(p2j +L).
The expression then becomes
2(X1 +X2)pi+12
k(p2k+i−1 + L),
where k is obtained as in the previous case. We again obtain (4.12).
Thus when g∗(X) − (X1 +X2) is written as (X1 + X2)
βjp2j , the coefficient βj
satisfies ν(βj) ≥ (j − 1) + 4 + 1. Here the (j − 1) + 4 comes from the case i = 1,
k = j− 1 in (4.12), and the extra +1 is the factor 2 which has been present all along.
This yields the claim of (4.11).
22 DONALD M. DAVIS AND MARK MAHOWALD
5. tmf-cohomology of CP∞ × CP∞
In [2], [4], and [8], it was noted, first by Astey, that the axial class using BP (or
BP 〈2〉) was u(X2 − X1), where u is a unit in BP ∗(P∞ ∧ P∞). In this section, we
review that argument and consider the possibility that it might be true when BP
is replaced by tmf, which would render the considerations of the previous section
unnecessary. To do this, we calculate tmf∗(CP∞) and tmf∗(CP∞×CP∞) in positive
dimensions. (See Theorems 5.15 and 5.19.) Although our conclusion will be that
Astey’s BP -argument cannot be adapted to tmf, nevertheless these calculations may
be of independent interest.
We begin by reviewing Astey’s argument. There is a commutative diagram, in
which RP = RP∞ and CP = CP∞
dR−−−→ RP × RP mR−−−→ RP
dC−−−→ CP × CP CP
1×(−1)
CP × CP mC−−−→ CP
The generator XR ∈ BP 2(RP ) satisfies XR = h∗(X). We also have that mC ◦ (1 ×
(−1))◦dC is null-homotopic. The key fact, which will fail for tmf, is BP ∗(CP×CP ) ≈
BP ∗[X1, X2].
The axial class is m∗R(XR). It equals (h× h)∗(1× (−1))∗m∗C(X). But
(1× (−1))∗m∗C(X) ∈ ker(d∗C).
By the above “key fact,” d∗C is the projection BP
∗[X1, X2]→ BP ∗[X ] in which each
Xi 7→ X . The kernel of this projection is the ideal (X2 −X1). To see this, just note
that in grading 2n a kernel element must be
2 with
ci = 0, and hence is
2 −Xn1 ) =
1(X2 −X1)
n−i−1−j
Thus (1 × (−1))∗m∗C(X) = (X2 − X1)u for some u ∈ BP ∗(CP × CP ). This u is
a unit by consideration of its reduction to H∗(−;Z), as in [2]. Since h∗(u) will then
be a unit in BP ∗(RP ×RP ) and h∗(Xi) = XRi, we obtain the claim about the axial
class being a unit times XR2 −XR1.
NONIMMERSIONS IMPLIED BY TMF, REVISITED 23
In order to see if there is any chance of adapting this to tmf, we compute tmf∗(CP∞)
and tmf∗(CP∞ × CP∞) in positive gradings. We begin with the relevant Ext calcu-
lations.
Let bo = Ext
(Z2,Z2). Recall that a chart for this is given as in Diagram 5.1,
extended with period (t− s, s) = (8, 4).
Diagram 5.1. Ext
(Z2,Z2)
0 4 8
· · ·
Let M10 denote the A2-module 〈1, Sq4, Sq2 Sq4, Sq4 Sq2 Sq4〉.
Lemma 5.2. There is an additive isomorphism
(M10,Z2) ≈ bo[v2],
where v2 ∈ Ext1,7(−).
Thus the chart for Ext
(M10,Z2) consists of a copy of bo shifted by (t − s, s) =
(6i, i) units for each i ≥ 0.
Proof. There is a short exact sequence of A2-modules
0→ Σ7M10 → A2//A1 → M10 → 0.
This yields a spectral sequence which builds Ext
(M10,Z2) from
∗−i,∗−7i
(A2//A1,Z2).
Since Ext
(A2//A1,Z2) ≈ bo, one easily checks that there are no possible differen-
tials in this spectral sequence.
Let Cmn = H
∗(CPmn ;Z2).
24 DONALD M. DAVIS AND MARK MAHOWALD
Theorem 5.3. There is an additive isomorphism
(C∞−∞,Z2) ≈
Σ8p−2bo[v2].
Of course Σ applied to a module or an Ext group just means to increase the t-grading
by 1.
Proof. There is a filtration of C∞−∞ with Fp/Fp−1 ≈ Σ8p−2M10 for p ∈ Z. We have
Sq2 ι8p−2 = Sq
4 Sq2 Sq4 ι8p−10. The same argument used in the last paragraph of the
proof of Corollary 3.10 works to initiate an inductive proof of the Ext-isomorphism
claimed in the theorem.
Corollary 5.4. In gradings (t− s) less than −1,
(C−2−∞,Z2) ≈
Σ8p−2bo[v2].
Proof. There is an exact sequence
→ Exts−1,tA2 (C
−1,Z2)→ Ext
(C−2−∞,Z2)→ Ext
(C∞−∞,Z2)
q∗−→ Exts,tA2(C
−1,Z2)→ .
The result is immediate from this and 5.3, since q∗ sends the initial tower in F0/F−1
isomorphically to the initial tower in ExtA2(C
−1,Z2).
The A-modules C∞1 and Σ
2C−2−∞ are dual. Thus, by [9, Prop 4],
(Z2,C
1 ) ≈ Ext
(Σ2C−2−∞,Z2).
There is a ring structure on Ext
(Z2,C
1 ). We deduce the following result, which
is pictured in Diagram 5.12.
Corollary 5.5. In (t− s) gradings ≤ 0, there is a ring isomorphism
(Z2,C
1 ) ≈ bo[v2][X ],
where X ∈ Ext0,−8.
Proof. We apply the duality isomorphism to 5.4. The multiplicative structure is
obtained from the observation that the powers of the class in Ext0,−8 equal the class
in Ext0,−8i for each i > 0.
NONIMMERSIONS IMPLIED BY TMF, REVISITED 25
The Ext groups computed here are the E2-term of the ASS converging to tmf
−∗(CP∞).
We will consider the differentials in this spectral sequence after performing the Ext
calculation relevant for tmf∗(CP∞ × CP∞).
Now we consider C−2−∞⊗C−2−∞. Now x1 and x2 denote elements of H2(CP ;Z2). Let
E2 denote the exterior subalgebra generated by the Milnor primitives of grading 1, 3,
and 7. Note that A2//E2 has a basis with elements of grading 0, 2, 4, 6, 6, 8, 10, and
12. Finally we note that for any j ≡ −2 mod 8 with j ≤ −10, there is a nontrivial
A2-morphism C
ρ−→ΣjZ2.
Lemma 5.6. Let
K = ker(C−2−∞ ⊗C−2−∞
ρ−→C−2−∞ ⊗ Σ−10Z2).
Let S denote the set of all classes x8i−21 x
2 with i ≤ −1 and j ≤ −2, together with
the classes x8i−21 x
2 with i ≤ −1 and j ≤ −1. Then K is the direct sum of a free
A2//E2-module on S with a single relation Sq
4 Sq2 Sq4(x−101 x
2 ) = 0.
Proof. Since the generators of E2 have odd grading, A2//E2 acts on any element of
these evenly-graded modules. The action of A2//E2 on x
2 yields the additional
elements x−21 x
2 , x
2 , x
2 , x
x−21 x
2 , and x
2. The action of A2//E2 on x
2 yields the
additional elements x01x
2 + x
2, and x
2 + x
2. Each exponent can be decreased by any multiple of 8.
One can easily check that in each grading all classes in C−2−∞ ⊗C−2−∞ are obtained
exactly once from the described elements in K together with C−2−∞ ⊗ Σ−10Z2. There
are four cases, for the four even mod 8 values. We illustrate with the case of grading
4 mod 8. We will just consider the specific value −28, but it will be clear that it
generalizes to all gradings ≡ 4 mod 8. Letting Xi denote xi1x−28−i2 , we have:
(1) From generators in −28, we obtain just X−10 in K. The class
X−18 is in C
−∞ ⊗ Σ−10Z2.
(2) From generators in −32, we obtain X−8 + X−6, X−16 + X−14,
and X−24 +X−22.
(3) From generators in −36, we obtain X−8+X−4 and X−16+X−12.
(4) From generators in −40, we obtain X−4, X−12 +X−8, X−20 +
X−16, and X−24.
26 DONALD M. DAVIS AND MARK MAHOWALD
Note in (4) that X0 and X−28 do not appear because each component must be ≤ −4
and the components sum to −28.
One easily checks that the 11 classes listed above, including X−18, form a basis for
the space spanned by X−4, . . . , X−24, in an orderly fashion that clearly generalizes to
any grading ≡ 4 mod 8. A similar argument works in the other three congruences.
There are some minor variations in the top few dimensions.
Now we dualize. There is a pairing
ExtA2(Z2,C
1 )⊗ ExtA2(Z2,C∞1 )→ ExtA2(Z2,C∞1 ⊗C∞1 ).
Let Xi denote the class in grading −8 coming from the ith factor. Then we obtain
Theorem 5.7. The algebra Ext
(Z2,C
1 ⊗ C∞1 ) in gradings ≤ −8 is isomorphic
to Z2[X1, X2]〈X1X2, y−12〉 with y2−12 = X21X2 + X1X22 . The monomials of the form
X i1X
2y−12 are acted on freely by Z2[v0, v1, v2]. Let Sn denote the Z2-vector space with
basis the monomials X i1X
2 , and define a homomorphism ǫ : Sn → Z2 by sending
each monomial to 1. Then Z2[v0, v1, v2] acts freely on ker(ǫ), while bo[v2] acts freely
on Sn/ ker(ǫ). Thus in dimensions t − s ≤ −8 Ext∗,∗A2 (Z2,C
1 ⊗ C∞1 ) has, for each
i > 0, i copies of Σ−8i−4Z2[v0, v1, v2] and i copies of Σ
−8i−16Z2[v0, v1, v2], and also one
copy of Σ−8i−8bo[v2].
Here Z2[X1, X2]〈X1X2, y−12〉means a free Z2[X1, X2]-module on basis {X1X2, y−12}
Proof. The structure as graded abelian group is straightforward from Lemma 5.6,
Corollary 5.5, and the duality isomorphism
(Z2,C
1 ⊗C∞1 ) ≈ Ext
∗,∗−4
(C−2−∞ ⊗C−2−∞,Z2).
We use that ExtA2(A2//E2,Z2) ≈ Z2[v0, v1, v2]. The reason that we only assert the
structure in dimension ≤ −8 is due to the Σ−10 in the cokernel part of Lemma 5.6, and
that Theorem 5.5 was only valid in dimension ≤ 0. In the range under consideration,
the relation on the top class in Lemma 5.6 does not affect Ext.
The ring structure in filtration 0 comes from HomA2(Z2,C
1 ⊗C∞1 ) being isomorphic
to elements of C∞1 ⊗C∞1 annihilated by Sq2 and Sq4, which has as basis all elements
x4i1 ⊗ x
2 and (x
1 ⊗ x
2 )(x
1 ⊗ x22 + x21 + x42).
NONIMMERSIONS IMPLIED BY TMF, REVISITED 27
Now we show that Ext
1,−8n+2
(Z2,C
1 ⊗ C∞1 ) = Z2, and h1 times each mono-
mial in Ext
0,−8n
(Z2,C
1 ⊗ C∞1 ) equals the nonzero element here. An element in
1,−8n+2
(Z2,C
1 ⊗C∞1 ) = Z2 is an equivalence class of morphisms
Σ2A2 ⊕ Σ4A2
h−→C∞1 ⊗C∞1
which increase grading by 8n− 2, and yield a trivial composite when preceded by
Σ4A2 ⊕ Σ8A2
Sq2 Sq6
0 Sq4
−−−−−−−−−→ Σ2A2 ⊕ Σ4A2.
Morphisms h which can be factored as
Σ2A2 ⊕ Σ4A2
Sq2,Sq4−−−−→ A2
k−→C∞1 ⊗C∞1 (5.8)
are equivalent to 0 in Ext.
We illustrate with the case n = 3. There are A2-morphisms increasing grading by
22 sending either Σ2A2 or Σ
4A2 to any one of the following classes:
2 , x
2 , x
(5.9)
The classes are listed in this order because any two adjacent monomials are equivalent
using as k in (5.8) the morphism sending the generator to the indicated classes in
succession:
2 , x
For example, (Sq2, Sq4)(x11x
2 ) = (x
2 , x
2 ). Thus all classes in (5.9) are equiva-
lent to one another.
That h1 times any monomialX
2 equals this nonzero element of Ext
1,8n+2
(Z2,C
C∞1 ) follows from usual Yoneda product consideration. If 0 ← Z2 ← C0 ← C1 ←
is the beginning of a minimal A2-resolution, with C1 = Σ
1A2 ⊕ Σ2A2 ⊕ Σ4A2,
then h1X
2 is represented by the composite C1 → C0 → C∞1 ⊗ C∞1 sending
ι2 7→ ι 7→ X i1Xn−i2 , and this is equivalent to the element described in the previous
paragraph.
Here is a schematic way of picturing Theorem 5.7. We first list the generators in
grading greater than −32. Then for each of the two types of generators, we list the
structure arising from them in the first 10 dimensions. The bo[v2]-structure in the
28 DONALD M. DAVIS AND MARK MAHOWALD
left half of Diagram 5.11 arises from one tower in dimensions −24 and −16, while the
Z2[v0, v1, v2]-structure in the right half of diagram 5.11 arises from the other towers
in Diagram 5.10.
Diagram 5.10. Generators of ExtA2(Z2,C
1 ⊗C∞1 )
−28 −24 −20 −16 −12
✻✻✻ ✻✻ ✻✻ ✻ ✻
Diagram 5.11. Structure on two types of generators
0 010 10
✻ ✻ ✻ ✻ ✻ ✻ ✻ ✻✻ ✻ ✻
Now we consider the differentials in the ASS converging to tmf∗(CP∞) and then for
tmf∗(CP∞ ∧ CP∞). The gradings are negated when considered as tmf-cohomology
groups. Corollary 5.5 gives the E2-term converging to [Σ
∗CP∞1 , tmf] ≈ tmf
−∗(CP∞1 ).
We will maintain the homotopy gradings until just before the end. In diagram 5.12,
we depict a portion of the E2-term of this ASS in gradings −16 to 1. There are also
classes in higher filtration arising from powers of v41 and v2 acting on generators in
lower grading. The elements indicated by •’s are involved in differentials, as explained
later.
NONIMMERSIONS IMPLIED BY TMF, REVISITED 29
Diagram 5.12. A portion of E2 for [Σ
∗CP∞, tmf]
−16 −8 0
We will prove the following key result about differentials in this ASS.
Theorem 5.13. The nonzero differentials in the ASS converging to [Σ∗CP∞, tmf],
∗ < 1, are given by
−2k+1) = hǫ+11 v
for ǫ = 0, 1, i, j ≥ 0, k ≥ 1.
Here h1, v
1, and v2 have the usual Ext
s,t gradings (s, t) = (1, 2), (4, 12), and (1, 7),
respectively.
Diagram 5.12 pictures the situation for k = 1 and small values of i and j. The
elements indicated by •’s are involved in the differentials. The resulting picture is
nicer if the filtrations of all classes built on X−2k+1 are increased by 1. There is
a nontrivial extension (multiplication by 2) in dimension −6 due to the preceding
differential. This is equivalent to the way that bu∗ is formed from bo∗ and Σ
2bo∗.
We obtain Diagram 5.14 from Diagram 5.12 after the differentials, extensions, and
filtration shift are taken into account.
30 DONALD M. DAVIS AND MARK MAHOWALD
Diagram 5.14. Diagram 5.12 after differentials and filtration shift
−16 −8 0
The regular sequence of towers in the chart beginning in filtration 1 in dimension −10
is interpreted as vi1v2, i ≥ 0.
After negating dimensions to switch to cohomology indexing, we obtain the follow-
ing result, which is immediate from 5.13 after the extensions such as just seen are
taken into account.
Theorem 5.15. In positive gradings, there is an isomorphism of graded abelian
groups
tmf∗(CP∞1 ) ≈ Z(2)[Z16](bo∗ ⊕ v2Z(2)[v1, v2]).
Here Z16 ∈ tmf16(CP∞1 ), and |v1| = −2 and |v2| = −6.
Recall that bo∗ = bo−∗ with bo∗ as suggested in 5.1. Much of the ring structure
of tmf∗(CP∞1 ) is described in 5.15, since bo∗ and v2Z(2)[v1, v2] are rings, and it is
quite clear how to multiply an element in bo∗ by one in v2Z(2)[v1, v2]. Because of the
filtration shift that led to the identification of some of the classes in v2Z(2)[v1, v2], we
hesitate to make any complete claims about the ring structure.
A complete computation of tmf∗(CP∞) was made in [5]. See there especially
Theorem 7.1 and Diagram 7.1. At first glance, the two descriptions appear quite
different, but they seem to be compatible.
NONIMMERSIONS IMPLIED BY TMF, REVISITED 31
Proof of Theorem 5.15. We first prove that there is a nontrivial class in [Σ−16CP, tmf]
detected in filtration 0. This is obtained using the virtual bundle 8(H−1)−(H3−H),
where H denotes the complex Hopf bundle. Considered as a real bundle θ, this bundle
satisfies w2(θ) and p1(θ) = 0. Here we use from [18] that p1 generates the infinite
cyclic summand in H4(BSO;Z) and satisfies r∗(p1) = c
1 − 2c2 under BU
r−→BSO,
and ρ∗(p1) = 2e1 under BSpin
ρ−→ BSO, where H4(BSpin;Z) is an infinite cyclic
group generated by e1. The total Chern class of 9H −H3 is
(1 + x)9(1 + 3x)−1 = 1 + 6x+ 18x2 + · · · ,
and hence
r∗(p1(θ)) = (c1(9H −H3))2 − 2c2(9H −H3) = (6x)2 − 2 · 18x2 = 0.
Thus e1(θ) = 0, hence CP
∞ θ−→BSpin → K(Z, 4) is trivial, and so θ lifts to a map
CP∞ → BO[8]. Hence its Thom spectrum induces a degree-1 map T (θ) → MO[8].
Since ψ3(H) = H3 −H , by [19] θ is J(2)-equivalent to 8(H − 1), and hence its Thom
spectrum is T (8(H − 1)) = Σ−16CP∞8 . Using the Ando-Hopkins-Rezk orientation
([1]) MO[8]→ tmf, we obtain our desired class as the composite
Σ−16CP∞1
col−→ Σ−16CP∞8
T (θ)−−→MO[8]→ tmf . (5.16)
We will deduce our differentials from the d3-differential E
3 → E
3 in the ASS
converging to π∗(tmf). This can be seen in [13, p.537] or [11, Thm 2.2]. See Remark
5.17 for additional explanation. It is not difficult to show that, with M10 as in 5.2,
the morphism
(Z2,Z2)→ Exts,tA2(M10,Z2)
induced by the nontrivial A2-map M10 → Z2 sends the Z2 in Ext7,23A2 (Z2,Z2) which is
not part of the infinite tower to h21v
We prefer to think about the ASS for tmf∗(Σ
2CP−2−∞), which, as we have noted,
is isomorphic to that of [Σ∗CP∞1 , tmf]. The E2-term was described in 5.4. Let
S−16 → Σ2CP−2−∞ ∧ tmf correspond to the map in (5.16). Since E2(CP−2−∞ ∧ tmf)
in negative dimensions is built from copies of ExtA2(M10,Z2), we deduce from the
previous paragraph that h21v
1v2g−16 in the ASS for tmf∗(Σ
2CP−2−∞) must be hit by a
d2- or d3-differential, since it is the image of a class hit by a d3. The only possibility is
that it be d2 from h1v
1g−8, as indicated by the dotted line in Diagram 5.12. Naturality
32 DONALD M. DAVIS AND MARK MAHOWALD
of differentials with respect to h1 and v
1 implies the differentials of 5.13 for ǫ = 0, 1,
all i, j = 0, and k = 1. Using the diagonal map of CP∞1 and the multiplication
of tmf, powers of (5.16) give similar nontrivial elements in [Σ−16kCP∞1 , tmf] for all
k ≥ 1, and by the argument just presented, we establish the differentials of 5.13 for
all k (with j = 0 still).
The only possible differentials on v2g−16 would be some dr with r > 2 hitting an
element which is acted on nontrivially by h1. However h1v2g−16 has become 0 in
E3 since it was hit by a d2-differential. Thus a nonzero differential on v2g−16 would
contradict naturality of differentials with respect to h1-action. Hence there is a map
S−10 → Σ2CP−2−∞ ∧ tmf hitting v2g−16, and the argument of the previous paragraph
implies that d2(h1v
1v2g−8) = h
2g−16 and then other related differentials. This
now establishes the differentials of 5.13 when j = 1, and sets in motion an inductive
argument to establish these differentials for all j ≥ 1.
No further differentials in the spectral sequence are possible, by dimensional and
h1-naturality considerations.
Remark 5.17. The proof of the key d3-differential in the ASS of tmf from the 17-
stem to the 16-stem, which was cited above, has not had a thorough proof in the
literature. Giambalvo’s original argument was incorrect and his correction merely
refers to “a homotopy argument.” The current authors cited Giambalvo’s result in
[11] without additional argument. We provide some more detail here regarding this
differential.
The relevant portion of the ASS of tmf appears in Diagram 5.18. In [13] and [11],
this was pictured as the ASS of MO[8], but through dimension 18,
∗(MO[8]),Z2) ≈ Ext∗,∗A2(Z2 ⊕ Σ
Z2,Z2).
One way of obtaining the differentials from 15 to 14, as in [13], is to note that the [8]-
cobordism group of 14-dimensional manifolds is Z2, and so the top two elements must
be killed by differentials. It is not difficult to compute in Ext the Massey product
formula B = 〈A, h0, h1〉, where A and B are as in Diagram 5.18. This can be seen
as v41 times a similar formula between classes in dimensions 6 and 8. Since A is 0 in
homotopy, the associated Toda bracket formula says that B must be divisible by η.
NONIMMERSIONS IMPLIED BY TMF, REVISITED 33
But only 0 can be divisible by η in dimension 16 here. Thus B must be killed by a
differential, and the depicted way is the only way this can happen.
Diagram 5.18. Portion of ASS of tmf
14 16 18
The differentials in the ASS converging to tmf∗(CP
−∞∧CP−2−∞) are implied by the
same considerations that worked for CP−2−∞. The Z2[v0, v1, v2]-parts in Theorem 5.7
cannot support differentials by dimensionality and h1-naturality. For the bo-like part,
we prefer thinking about it as [Σ∗+4CP∞1 ∧ CP∞1 , tmf] ≈ tmf−∗−4(CP∞1 ∧ CP∞1 ),
where the product structure is more apparent.
Let Zn denote the nonzero element of Ext
0,−8n
(Z2,C
1 ⊗C∞1 )/ ker(h1). By Theorem
5.7, Zn can be represented by X
2 for any 1 ≤ i < n. If n is even and n ≥ 4,
choosing i even, Zn is an infinite cycle because it is an external product of infinite
cycles. Hence by the proof of Theorem 5.13,
2Z2k−1) = h
2 Z2k
for ǫ = 0, 1, i, j ≥ 0, and k ≥ 2.
Finally, X1X2 is an infinite cycle since there is nothing that it can hit. Also,
h1v2X1X2 and h
1v2X1X2 are not hit by differentials since Ext
(Z2,C
1 ⊗C∞1 ) = 0
by Theorem 5.7. We obtain the following.
Theorem 5.19. In grading ≥ 10, there is an isomorphism of graded abelian groups
tmf∗(CP∞1 ∧CP∞1 ) ≈ yZ(2)[v1, v2, X1, X2]⊕
In·Z(2)[v1, v2]⊕Z(2)[Z](bo∗⊕v2Z(2)[v1, v2]),
34 DONALD M. DAVIS AND MARK MAHOWALD
where |y| = 12, |Xi| = 8, |Z| = 16, |v1| = −2, and |v2| = −6. Here In = ker(Fn
ǫ−→Z),
where Fn is a free abelian group with basis {X i1Xn−i2 : 1 ≤ i < n}, and ǫ(X i1Xn−i2 ) = 1.
Thus In consists of all polynomials of grading n with sum of coefficients equal to 0.
We could have extended the description in 5.19 down to grading 8, but the description
would have been slightly more complicated, since it would include h1v2Z and h
1v2Z.
The motivation for this section was to see if perhaps
ker(tmf∗(CP∞ × CP∞) d
−→ tmf∗(CP∞))
might be something nice like the I(X1 − X2) which was the case for BP ∗(−). In
Theorem 5.19, we described tmf∗(CP∞ ∧ CP∞). To obtain tmf∗(CP∞ × CP∞), we
add on two copies of tmf∗(CP∞), which was described in 5.15. Denote by Z1 and
Z2 the generators in tmf
16(CP∞ × CP∞). Monomials Z i1Zn−i2 should equal Zn of
5.19 plus perhaps elements of I2n of 5.19. The class y of 5.19 plus perhaps a sum of
elements of higher filtration is in ker(d∗) and not in the ideal generated by (Z1−Z2).
Thus, as expected, ker(d∗) does not have the nice form that it did for BP ∗(−), and
so we cannot use this argument to show that the axial class in tmf∗(RP∞ × RP∞)
is u(X1 − X2). However, we showed something like this by a completely different
method in Theorem 4.10. We feel that the results obtained in Theorems 5.15 and
5.19 should be of independent interest.
References
[1] M. Ando, M. J. Hopkins, and C. Rezk, Multiplicative orientations of
KO-theory and of the spectrum of topological modular forms, preprint,
www.math.uiuc.edu/∼mando/papers/koandtmf.pdf.
[2] L. Astey, Geometric dimension of vector bundles over real projective spaces,
Quar Jour Math Oxford 31 (1980) 139-155.
[3] , A cobordism obstruction to embedding manifolds, Ill Jour Math 31
(1987) 344-350.
[4] L. Astey and D. M. Davis, Nonimmersions of real projective spaces implied by
BP , Bol Soc Mat Mex 25 (1980) 15-22.
[5] T. Bauer, Elliptic cohomology and projective spaces–a computation, preprint.
wwwmath.uni-muenster.de/u/tbauer/cpinfty.pdf.
[6] R. R. Bruner, D. M. Davis, and M. Mahowald, Nonimmersions of real projec-
tive spaces implied by tmf, Contemp Math 293 (2002) 45-68.
[7] D. M. Davis, Table of immersions and embeddings of real projective spaces,
http://www.lehigh.edu/∼dmd1/immtable.
[8] , A strong nonimmersion theorem for real projective spaces, Annals
of Math 120 (1984) 517-528.
NONIMMERSIONS IMPLIED BY TMF, REVISITED 35
[9] , On the Segal Conjecture for Z2 × Z2, Proc Amer Math Soc 83
(1981) 619-622.
[10] D. M. Davis and M. Mahowald, Ext over the subalgebra A2 of the Steenrod
algebra for stunted projective spaces, Can Math Soc Conf Proc 2 (1982) 297-
[11] , A new spectrum related to 7-connected cobordism, Springer-Verlag
Lecture Notes in Math 1370 (1989) 126-134.
[12] D. M. Davis and V. Zelov, Some new embeddings and nonimmersions of real
projective spaces, Proc Amer Math Soc 128 (2000) 3731-3740.
[13] V. Giambalvo, On 〈8〉-cobordism, Ill Jour Math 15 (1971) 533-541. Correction
in Ill Jour Math 16 (1972) 704.
[14] I. M. James, On the immersion problem for real projective spaces, Bull Amer
Math Soc 69 (1963) 231-238.
[15] N. Kitchloo and W. S. Wilson, The second real Johnson-Wilson theory and
nonimmersions of RPn, preprint.
[16] W. H. Lin, On conjectures of Mahowald, Segal, and Sullivan, Math Proc Camb
Phil Soc 87 (1980) 449-458.
[17] W. H. Lin, D. M. Davis, M. Mahowald, and J. F. Adams, Calculation of Lin’s
Ext groups, Math Proc Camb Phil Soc 87 (1980) 459-469.
[18] J. Milnor and J. D. Stasheff, Characteristic classes, Princeton Univ Press
(1974).
[19] D. Sullivan, Genetics of homotopy theory and the Adams conjecture, Annals of
Math 100 (1974) 1-79.
Lehigh University, Bethlehem, PA 18015, USA
E-mail address : dmd1@lehigh.edu
Northwestern University, Evanston, IL 60208, USA
E-mail address : mark@math.northwestern.edu
ABSTRACT
  In a 2002 paper, the authors and Bruner used the new spectrum tmf to obtain
some new nonimmersions of real projective spaces. In this note, we
complete/correct two oversights in that paper.
  The first is to note that in that paper a general nonimmersion result was
stated which yielded new nonimmersions for RP^n with n as small as 48, and yet
it was stated there that the first new result occurred when n=1536. Here we
give a simple proof of those overlooked results.
  Secondly, we fill in a gap in the proof of the 2002 paper. There it was
claimed that an axial map f must satisfy f^*(X)=X_1+X_2. We realized recently
that this is not clear. However, here we show that it is true up multiplication
by a unit in the appropriate ring, and so we retrieve all the nonimmersion
results claimed in the original paper.
  Finally, we present a complete determination of tmf^{8*}(RP^\infty\times
RP^\infty) and tmf^*(CP^\infty\times CP^\infty) in positive dimensions.

<|endoftext|><|startoftext|>
Spin Evolution of Accreting Neutron Stars: Nonlinear Development of the R-mode
Instability
Ruxandra Bondarescu, Saul A. Teukolsky, and Ira Wasserman
Center for Radiophysics and Space Research, Cornell University, Ithaca, NY 14853
The nonlinear saturation of the r-mode instability and its effects on the spin evolution of Low
Mass X-ray Binaries (LMXBs) are modeled using the triplet of modes at the lowest parametric
instability threshold. We solve numerically the coupled equations for the three mode amplitudes in
conjunction with the spin and temperature evolution equations. We observe that very quickly the
mode amplitudes settle into quasi-stationary states that change slowly as the temperature and spin
of the star evolve. Once these states are reached, the mode amplitudes can be found algebraically and
the system of equations is reduced from eight to two equations: spin and temperature evolution. The
evolution of the neutron star angular velocity and temperature follow easily calculated trajectories
along these sequences of quasi-stationary states. The outcome depends on whether or not the
star will reach thermal equilibrium, where the viscous heating by the three modes is equal to
the neutrino cooling (H = C curve). If, when the r-mode becomes unstable, the star spins at a
frequency below the maximum of the H = C curve, then it will reach a state of thermal equilibrium.
It can then either (1) undergo a cyclic evolution with a small cycle size with a frequency change
of at most 10%, (2) evolve toward a full equilibrium state in which the accretion torque balances
the gravitational radiation emission, or (3) enter a thermogravitational runaway on a very long
timescale of ≈ 106 years. If the star does not reach a state of thermal equilibrium, then a faster
thermal runaway (timescale of ≈ 100 years) occurs and the r-mode amplitude increases above the
second parametric instability threshold. Following this evolution requires more inertial modes to
be included. The sources of damping considered are shear viscosity, hyperon bulk viscosity and
viscosity within the core-crust boundary layer. We vary proprieties of the star such as the hyperon
superfluid transition temperature Tc, the fraction of the star that is above the threshold for direct
URCA reactions, and slippage factor, and map the different scenarios we obtain to ranges of these
parameters. We focus on Tc & 5 × 10
9 K where nonlinear effects are important. Wagoner [1] has
shown that a very low r-mode amplitude arises at smaller Tc. For all our bounded evolutions the
r-mode amplitude remains small ∼ 10−5. The spin frequency of accreting neutron stars is limited by
boundary layer viscosity to νmax ≈ 800Hz[Sns/(M1.4R6)]
4/11T
−2/11
. Fast rotators are allowed for
[Sns/(M1.4R6)]
4/11T
−2/11
∼ 1 and we find that in this case the r-mode instability would be active
for about 1 in 1000 LMXBs and that only the gravitational waves from LMXBs in the local group
of galaxies could be detected by advanced LIGO interferometers.
PACS numbers: 04.40.Dg, 04.30.Db, 97.10.Sj, 97.60.Jd
I. INTRODUCTION
R-modes are oscillations in rotating fluids that are due
to the Coriolis effect. They are subject to the classical
Chandrashekar-Friedman-Shutz (CFS) instability [2, 3],
which is driven by the gravitational radiation backreac-
tion force. Andersson [4] and Friedman and Morsink [5]
showed that, in the absence of fluid dissipation, r-modes
are linearly unstable at all rotation rates. However, in
real stars there is a competition between internal viscous
dissipation and gravitational driving [6] that depends on
the angular velocity Ω and temperature T of the star.
Above a critical curve in the Ω−T plane the n = 3,m = 2
mode, referred to as ’the r-mode’ in this work, becomes
unstable. At first, an unstable r-mode grows exponen-
tially, but soon it may enter a regime where other in-
ertial modes that couple to the r-mode become excited
and nonlinear effects become important. Roughly speak-
ing, nonlinear effects first become significant as the am-
plitude passes its first parametric instability threshold,
which is very low (∼ 10−5). Modeling and understand-
ing the nonlinear effects is crucial in determining (1) the
final saturation amplitude of the r-mode and (2) the lim-
iting spin frequency that neutron stars can achieve. The
r-mode amplitude and the duration of the instability are
among the main factors that determine whether the as-
sociated gravitational radiation could be detectable by
laser interferometers on Earth.
The r-mode instability has been proposed as an expla-
nation for the sub-breakup spin rates of both Low Mass
X-ray Binaries (LMXBs) [7, 8] and young, hot neutron
stars [6, 9]. The idea that gravitational radiation could
balance accretion was proposed independently by Bild-
sten [7] and Andersson et al. [8]. Cook, Shapiro and
Teukolsky [10, 11] model the recycling of pulsars to mil-
lisecond periods via accretion from a Keplerian disk onto
a bare neutron star with M = 1.4M⊙ when Ω = 0. De-
pending on the equation of state they found that spin
frequencies of between ≈ 670 Hz and 1600 Hz could be
achieved before mass shedding or radial instability set
in (these calculations predated the realization that the
r-mode instability could limit the spin frequency). For
comparison, the highest observed spin rate of millisec-
ond pulsars is 716 Hz for PSR J1748-2446ad [12, 13].
http://arxiv.org/abs/0704.0799v3
PSR B1937+21, which was discovered in 1982, was the
previous fastest known radio pulsar with a spin rate of
642 Hz [14]; that this “speed” record stood for 24 years
suggests that neutron stars rotating this fast are rare.
Moreover, based on a Bayesian statistical analysis of the
spin frequencies of the 11 nuclear-powered millisecond
pulsars whose spin periods are known from burst oscil-
lations, Chakrabarty et al. [15] claimed a cutoff limit of
νmax = 760 Hz (95% confidence); A more recent analy-
sis, which added two more pulsars to the sample, found
νmax = 730 Hz [16].
At first sight, one might conclude that mass shedding
or radial instability sets νmax, and that it is just above
the record ν = 716 Hz determined for PSR J1748-2446ad.
However, the nuclear equations of state consistent with
this picture all have rather large radii ≈ 16 − 17 km
for non-rotating 1.4 M⊙ models; see Table 1 in Cook et
al. [10]. For these equations of state, the r-mode insta-
bility should lead to νmax somewhat below 716 Hz; see
Eq. (33) in Sec. V below. Thus, the r-mode instability
may prevent recycling by accretion from reaching mass
shedding or radial instability. In other words, the de-
tection of the 716 Hz rotator is consistent with accretion
spin-up mitigated by the r-mode instability only for equa-
tions of state for which mass shedding or radial instability
would permit even faster rotation. Ultimately, this may
be turned into useful constraints on nuclear equations of
state. However, at present the uncertainty in the physics
of internal dissipation is a significant hindrance in estab-
lishing such constraints.
Since a physical model to follow the nonlinear phase of
the evolution was initially unavailable, Owen et al. [17]
proposed a simple one-mode evolution model in which
they assumed that nonlinear hydrodynamics effects satu-
rate the r-mode amplitude at some arbitrarily fixed value.
According to their model, once this maximum allowed
amplitude is achieved, the r-mode amplitude remains
constant and the star spins down at this fixed ampli-
tude (see Eqs. (3.16) and (3.17) in Ref. [17]). They used
this model to study the impact of the r-mode instability
on the spin evolution of young hot neutron stars assum-
ing normal matter. In their calculation they include the
effects of shear viscosity and n-p-e bulk viscosity. They
found that the star would cool to approximately 109 K
and spin down from a frequency close to the Kepler fre-
quency to about 100 Hz in a period of ∼ 1 yr [17].
Most subsequent investigations that did not perform
direct hydrodynamic simulations used the one-amplitude
model of Ref. [17] for studying the r-mode instability.
Levin [18] used this model to study the limiting effects of
the r-mode instability on the spin evolution of LMXBs,
assuming an r-mode saturation amplitude of ∼ 1; he
adopted a modified shear viscosity to match the maxi-
mum LMXB spin frequency of 330 Hz known in 1999.
Levin found that the neutron star followed a cyclic evo-
lution in the Ω − T phase plane. The star spins up for
several million years until it crosses the r-mode stability
curve, whereupon the r-mode becomes unstable and the
star is viscously heated for a fraction of a year until the
r-mode reaches its saturation amplitude (∼ 1). At this
point the spin and r-mode amplitude evolution equations
are changed, following the prescription of Ref. [17] to en-
sure constant amplitude. The star then spins down by
emitting gravitational radiation for another fraction of a
year until it crosses the r-mode stability curve again and
the instability shuts off. The time period during which
the r-mode is unstable was found to be about 10−6 times
shorter than the spin-up time, and Levin concluded that
it is unlikely that any neutron stars in LMXBs in our
galaxy are currently spinning down and emitting gravita-
tional radiation. However, following work by Arras et al.
[19] showing that nonlinear effects become significant at
small r-mode amplitude, Heyl [20] varied the saturation
amplitude, and found that the duration of the spin-down
depends sensitively on it. He predicted that the unstable
phase could be as much as 30% of the cyclic evolution for
an r-mode saturation amplitude of α ≈ 10−5, and that
this would make some of the fastest spinning LMXBs in
our galaxy detectable by interferometers on Earth.
Jones [21] and Lindblom and Owen [22] pointed out
that if the star contains exotic particles such as hyperons
(massive nucleons where an up or down quark is replaced
with a strange quark), internal processes could lead to a
very high coefficient of bulk viscosity in the cores of neu-
tron stars. While this additional high viscosity coefficient
could eliminate the instability altogether in newly born
neutron stars [21, 22, 23, 24], Nayyar and Owen [24] pro-
posed that it would enhance the probability of detection
of gravitational radiation from LMXBs by blocking the
thermal runaway.
The cyclic evolution found by Levin [18] and gener-
alized by Heyl [20] arises when shear or boundary layer
viscosity dominates the r-mode dissipation. In the evo-
lutionary picture of Nayyar and Owen [24], the r-mode
first becomes unstable at a temperature where shear
and boundary layer viscosity dominate, but the result-
ing thermal runaway halts once hyperon bulk viscosity
becomes dominant. The key feature behind the runaway
is that shear and boundary layer viscosities both decrease
with increasing temperature, so the instability speeds up
as the star grows hotter. However, if the bulk viscosity
is sufficiently large the star can cross the r-mode stabil-
ity curve at a point where the viscosity is an increas-
ing function of temperature. Such scenarios were stud-
ied by Wagoner [1] for hyperon bulk viscosty with low
hyperon superfluid transition temperature; similar evo-
lution was found for strange stars by Andersson, Jones
and Kokkotas [25]. In this picture, the star evolves near
the r-mode stability curve until an equilibrium between
accretion spin-up and gravitational radiation spin-down
is achieved. The value of the r-mode amplitude remains
below the lowest instability threshold found by Brink et
al. [26, 27, 28] for modes with n < 30, and hence in this
regime nonlinear effects may not play a role.
Schenk et al. [29] developed a formalism to study the
nonlinear interaction of the r-mode with other inertial
modes. They assumed a small r-mode amplitude and
treated the oscillations of the modes with weakly nonlin-
ear perturbation theory via three-mode couplings. This
assumption was tested by Arras et al. [19] and Brink
et al. [26, 27, 28]. Arras et al. proposed that a turbu-
lent cascade would develop in the strong driving regime.
They estimated that r-mode amplitude was small and
could have values between 10−1 − 10−4. Brink et al.
modeled the star as incompressible and calculated the
coupling coefficients analytically. They computed the in-
teraction of about 5000 modes via approximatively 1.3
million couplings of the 109 possible couplings among
the modes with n ≤ 30. The couplings were restricted to
mode triplets with a fractional detuning δω/(2Ω) < 0.002
since near-resonances promote modal excitation at very
small amplitudes. Brink et al. showed that the nonlinear
evolution saturates at a very small amplitude, generally
comparable to the lowest parametric instability thresh-
old that controls the initiation of energy sharing among
the sea of inertial modes. However, Brink et al. did not
model accretion spin-up or neutrino cooling in their cal-
culation and only included minimal dissipation via shear
viscosity.
In this paper we begin a more complete study of the
saturation of the r-mode instability including accretion
spin up and neutrino cooling. We use a simple model in
which we parameterize uncertain properties of the star
such as the rate at which it cools via neutrino emission
and the rate at which the energy in inertial modes dis-
sipates via boundary layer effects [30] and bulk viscos-
ity. In order to exhibit the variety of possible nonlinear
behaviors, we explore a range of models with different
neutrino cooling and viscous heating coefficients by vary-
ing the free parameters of our model. In particular, we
vary: (1) the slippage factor Sns, which regulates the
boundary layer viscosity, between 0 and 1 (see for exam-
ple [31, 32, 33] for some models of the interaction between
the oscillating fluid core and an elastic crust) ; (2) the
fraction of the star that is above the density threshold for
direct URCA reactions fdU, which is taken to be between
0 (0% of the star cools via direct URCA) and 1 (100% of
the star is subjected to direct URCA reactions), and in
general depends on the equation of state used; and (3) the
hyperon superfluidity temperature Tc, which is believed
to be between 109− 1010 K (We use a single, effective Tc
rather than modelling its spatial variation.) We focus on
Tc & 5×109 K for which nonlinear effects are important.
For low Tc . 3 × 109 K, Wagoner [1] showed that the
evolution reaches a steady state at amplitudes below the
lowest parametric instability threshold found by Brink
et al. [28]. It is important to note that all our evolu-
tions start on the part of the r-mode stability curve that
decreases with temperature and that the bulk viscosity
does not play a role in any of our bound evolutions.
We include three modes: the r-mode at n = 3 and the
two inertial modes at n = 13 and n = 14 that become
unstable at the lowest parametric instability threshold
found by Brink et al. [28]. We evolve the coupled equa-
tions for the three-mode system numerically in conjunc-
tion with the spin and temperature evolution equations.
The lowest parametric instability threshold provides a
physical cutoff for the r-mode amplitude. In all cases we
investigate, the growth of the r-mode is initially halted
by energy transfer to the two daughter modes. We ob-
serve that the mode amplitudes settle into a series of
quasi-stationary states within a period of a few years af-
ter the spin frequency of the star has increased above the
r-mode stability curve. These quasi-stationary states are
algebraic solutions of the three-mode amplitude equa-
tions (see Eqs. (6)) and change slowly as the spin and
the temperature of the star evolve. Using these solutions
for the mode amplitudes, one can reduce the eight evo-
lution equations (six for the real and imaginary parts of
the mode amplitudes, which are complex [29]; one for the
spin, and one for the temperature) to two equations gov-
erning the rotational frequency and the temperature of
the star. Our work can be regarded as a minimal physical
model for modeling amplitude saturation realistically.
The outcome of the evolution is crucially dependent
on whether the star can reach a state of thermal equilib-
rium. This can be predicted by finding the curve where
the viscous heating by the three modes balances the neu-
trino cooling, referred to below as the Heating = Cooling
(H = C) curve. TheH = C curve can be calculated prior
to carrying out an evolution using the quasi-stationary
solutions for the mode amplitudes. If the spin frequency
of the star upon becoming unstable is below the peak
of the H = C curve, then the star will reach a state of
thermal equilibrium. When such a state is reached we
find several possible scenarios. The star can: (1) un-
dergo a cyclic evolution; (2) reach a true equilibrium in
which the accretion torque is balanced by the rate of loss
of angular momentum via gravitational radiation; or (3)
evolve in thermal equilibrium until it reaches the peak of
the H = C curve, which occurs on a timescale of about
106 yr, and subsequently enter a regime of thermal run-
away. On the other hand, if the star cannot find a state
of thermal equilibrium, then it enters a regime of ther-
mogravitational runaway within a few hundred years of
crossing the r-mode stability curve. When this happens,
the r-mode amplitude increases beyond the second para-
metric instability, and more inertial modes would need
to be included to correctly model the nonlinear effects.
This will be done in a later paper.
This paper focuses on showing how nonlinear mode
couplings affect the evolution of the temperature and spin
frequency of a neutron star once it becomes prone to the
r-mode CFS instability. We do this in the context of three
mode coupling, which may be sufficient for large enough
dissipation. To illustrate the types of behavior that arise,
we adopt a very specific model in which the mode fre-
quencies and couplings are computed for an incompress-
ible star, modes damp via shear viscosity, boundary layer
viscosity and hyperon bulk viscosity, and the star cools
via a mixture of fast and slow processes. This model in-
volves several parameters that are uncertain, and we vary
these to find ‘phase diagrams’ in which different generic
types of behavior are expected. Moreover, the model it-
self is simplified: (1) A more realistic treatment of the
modes could include buoyant forces, and also mixtures of
superfluids or of superfluid and normal fluid in different
regions. (2) Dissipation rates, particularly from bulk vis-
cosity, depend on the composition of high density nuclear
matter, which could differ from what we assume.
Nevertheless, although the quantitative details may
differ from what we compute, we believe that many fea-
tures of our calculations ought to be robust. More sophis-
ticated treatment of the modes of the star will still find a
dense set of modes confined to a relatively small range of
frequencies. Most importantly, this set will exhibit nu-
merous three mode resonances, which is the prerequisite
for strong nonlinear effects at small mode amplitudes.
Thus, whenever the unstable r-mode can pass its lowest
parametric instability threshold, it must start exciting its
daughters. Whether or not that occurs depends on the
temperature dependence of the dissipation rate of the r-
mode; for the models considered here, where bulk viscos-
ity is relatively unimportant, soon after the star becomes
unstable its r-mode amplitude passes its first paramet-
ric instability threshold. Once that happens, the generic
types of behavior we find - cycles, steady states, slow
and fast runaway - ought to follow suit. The details of
when different behaviors arise will depend on the precise
features of the stellar model, but the principles we out-
line here (parametric instability, quasisteady evolution,
competition between heating and cooling) ought to ap-
ply quite generally.
In Sec. II we describe the evolution equations of the
three modes, the angular frequency and the temperature
of the neutron star. We first show how the equations of
motion for the modes of Schenk et al. couple to the rota-
tional frequency of the star in the limit of slow rotation.
We then give a short review of the parametric instability
threshold and the quasi-stationary solutions of the three-
mode system. The thermal and spin evolution of the star
is discussed next. This is followed by a description of
the driving and damping rates used. Sec. III provides
an overview of the results, which includes a discussion
of each evolution scenario and of the initial conditions
and input physics that lead to each scenario. Sec. IVA
discusses cyclic evolution in more detail. An evolution
that leads to an equilibrium steady state is presented
next in Sec. IVB. The two types of thermal runaway are
then discussed in Sec. IVC. The prospects for detecting
gravitational radiation for the evolutions in which the
three-mode system correctly models the nonlinear effects
are considered in Sec. V. We summarize the results in
the conclusion. Appendix A sketches a derivation of the
equations of motion for the three modes and Appendix
B contains a stability analysis of the evolution equations
around the thermal equilibrium state.
II. EVOLUTION EQUATIONS
A. Three mode system: coupling to uniform
rotation
In this section we review the equations of motion for
the three-mode system in the limit of slow rotation. In
terms of rotational phase τ for the time variable with
dτ = Ω dt Eq. (2.49) of Schenk et al. [29] can be rewritten
= iω̃αCα +
2iω̃ακ̃√
CβCγ , (1)
= iω̃βCβ −
2iω̃βκ̃√
= iω̃γCγ −
2iω̃γκ̃√
Here the scaled frequency ω̃j is defined to be ω̃j = ωj/Ω,
the dissipation rates of the daughter modes are γβ and γγ ,
γα is the sum of the driving and damping rates of the r-
mode γα = γGR−γαv, and the dimensionless coupling is
κ̃ = κ/(MR2Ω2). These amplitude variables are complex
and can be written in terms of the variables of Ref. [29]
as Cj(t) =
Ω(t)cj(t) (see Appendix A for a derivation
of Eqs. 1). The index j loops over the three modes j =
α, β, γ, where α labels the r-mode or parent mode and β
and γ label the two daughter modes in the mode triplet.
When the daughter mode amplitudes are much smaller
than that of the parent mode, one can approximate the
parent mode amplitude as constant. Under this assump-
tion one performs a linear stability analysis on Eqs. (1)
and finds the r-mode amplitude when the two daughter
modes become unstable (see Eqs. (B5-B7) of Ref. [28]
for a full derivation). This amplitude is the parametric
instability threshold
|Cα|2 =
4κ̃2ω̃βω̃γΩ
1 + Ω2
γβ + γγ
, (2)
where the fractional detuning is δω̃ = ω̃α − ω̃β − ω̃γ .
Thorough explorations of the phase space of damped
three-mode systems were performed by Dimant [34] and
Wersinger et al. [35].
For the three modes at the lowest parametric instabil-
ity threshold, ω̃α ≈ 0.66, ω̃β ≈ 0.44, ω̃γ ≈ 0.22, κ̃ ≈ 0.19
and |δω̃| ≈ 3.82 × 10−6. Note that ω̃ is twice the w of
Brink et al. [26, 27, 28]. Here β labels the mode with
n = 13,m = −3 and γ labels the n = 14,m = 1 mode.
The amplitude the r-mode has to reach before exciting
these two daughter modes is |Cα| ≈ 1.5× 10−5
Ω [28].
We next rescale the rotational phase τ by the fractional
detuning as τ̃ = τ |δω̃| and the mode amplitudes by
|Cα|0 =
|δω̃|
ω̃βω̃γ
, |Cβ |0 =
|δω̃|
ω̃αω̃γ
, (3)
|Cγ |0 =
|δω̃|
ω̃βω̃α
which for the r-mode is, up to a factor of
Ω/Ωc,
the no-damping limit of the parametric instability thresh-
old below which no oscillations will occur. The coupled
equations become
|δω̃|
C̄α +
|δω̃|Ω̃
C̄α −
C̄βC̄γ , (4)
|δω̃|
C̄β −
|δω̃|Ω̃
C̄β −
C̄αC̄
|δω̃|
C̄γ −
|δω̃|Ω̃
C̄γ −
C̄αC̄
with C̄j = Cj/|Cj |0 and γ̃j = γj/Ωc being the newly
rescaled amplitudes and dissipation/driving rates, re-
spectively.
1. Quasi-Stationary Solution
In terms of amplitudes and phase variables Cj =
|Cj |eiφj Eqs. (4) can be rewritten as
d|C̄α|
Ω̃|δw̃|
|C̄α| −
sinφ|C̄β ||C̄γ |
, (5)
d|C̄β |
= − γ̃β
Ω̃|δw̃|
|C̄β |+
sinφ|C̄α||C̄γ |
d|C̄γ |
Ω̃|δw̃|
|C̄γ |+
sinφ|C̄α||C̄β |
|δω̃|
− cosφ
|C̄β ||C̄γ |
|C̄α|
− |C̄α||C̄γ |
|C̄β |
− |C̄β ||C̄α|
|C̄γ |
where we have defined the relative phase difference as
φ = φα − φβ − φγ . These equations have the stationary
solution
|C̄α|2 =
4γ̃βγ̃γ
Ω̃|δω̃|2
tan2 φ
, (6)
|C̄β |2 =
4γ̃αγ̃γ
Ω̃|δω̃|2
tan2 φ
|C̄γ |2 =
4γ̃αγ̃β
Ω̃|δω̃|2
tan2 φ
tanφ =
γ̃β + γ̃γ − γ̃α
Ω̃|δω̃|
Note that in the limit in which γβ+γγ >> γα the station-
ary solution for the r-mode amplitude |Cα| is the same
as the parametric instability threshold.
B. Temperature and Spin Evolution
The spin evolution equation is obtained from conser-
vation of total angular momentum J , where
J = IΩ + Jphys. (7)
Following Eq (K39-K42) of Schenk et al. [29] the physical
angular momentum of the perturbation can be written as
ΩJphys =
C⋆BCA
d3xρ[(Ω̂× ξ⋆B) · (Ω̂× ξA) (8)
− i (ω̃A + ω̃B)
ξ⋆B · (Ω̂× ξA)].
Since the eigenvectors ξA ∝ eimAφ the cross-terms will
vanish for modes with different magnetic quantum num-
bers m as
ei(mA−mB)φdφ = 0 for mA 6= mB. Eq. (8)
can be re-written for our triplet of modes as
Jphys = MR
2(kαα|Cα|2 + kββ|Cβ |2 + kγγ |Cγ |2), (9)
where kαα is defined as
kαα =
d3xρ[(Ω̂×ξ⋆α) · (Ω̂×ξα)− iω̃αξ⋆α · (Ω̂×ξα)]
and similarly for kββ and kγγ . In terms of the scaled vari-
ables C̄j = Cj/|Cj |0 (with |Cj |0 defined in Eq. (3)) the
angular momentum of the perturbation can be written
Jphys =
MR2Ωc|δω̃|2
(4k̃)2ω̃αω̃βω̃γ
(kαα|C̄α|2ω̃α (11)
+kββ|C̄β |2ω̃β + kγγω̃γ |C̄γ |2).
We chose the same normalization for the eigenfuctions
as Refs. [19, 26, 27, 28, 29] so that at unit amplitude all
modes have the same energy ǫα = MR
2Ω2. The energy
of a mode α is Eα = MR
2Ω2|cα|2 = MR2Ω|Cα|2. The
rotating frame energy is the same as the canonical energy
and physical energy [29]. The canonical angular momen-
tum and the canonical energy of the perturbation satisfy
the general relation Ec = −(ω/m)Jc [3].
Angular momentum is gained because of accretion and
lost via gravitational waves emission
= 2γGRJc rmode + Ṁ
GMR, (12)
where Jc rmode = −(mα/ωα)ǫα|cα|2 = −3MR2Ω|cα|2 =
−3MR2|Cα|2. Eq. (12) can be rewritten in terms of the
scaled variables C̄j as
= −6γ̃GR
MR2Ωc|δω̃|
(4k̃)2ω̃βω̃γ
|C̄α|2 +
ΩcΩ̃|δω̃|
. (13)
Thermal energy conservation gives the temperature evo-
lution equation
C(T )
2Ejγj +KnṀc
2 − Lν(T ), (14)
= 2MR2Ω(γα v|Cα|2 + γβ |Cβ |2
+ γγ |Cγ |2) +KnṀc2 − Lν(T ).
The three terms on the right hand side of the equa-
tion represent viscous heating, nuclear heating and neu-
trino cooling. The specific heat is taken to be C(T ) ≈
1.5 × 1038 T8 erg K−1, where T = T8 × 108 K. Nu-
clear heating occurs because of pycnonuclear reactions
and neutron emission in the inner crust [36]. At large
accretion rates such as that of the brightest LMXBs of
Ṁ ≈ 10−8M⊙/yr, the accreted helium and hydrogen
burns stably and most of the heat released in the crust is
conducted into the core of the neutron star, where neu-
trino emission is assumed to regulate the temperature of
the star [36, 37]. The nuclear heating constant is taken
to be Kn ≈ 1×10−3 [36]. Following Ref. [1], we take the
neutrino luminosity to be
Lν = LdUT
8RdU(T/Tp) + LmUT
8RmU(T/Tp) (15)
+ Le−iT
8 + Ln−nT
8 + LCpT
where the constants for the modified and direct URCA re-
actions are defined by LmU = 1.0×1032 erg sec−1, LdU =
fdU × 108LmU [38, 39], and the electron-ion, neutron-
neutron neutrino bremsstrahlung and Cooper pairing of
neutrons are given by Le−i = 9.1 × 1029 erg sec−1 [36],
Ln−n ≈ 0.01LmU, LCp = 8.9 × 1031 erg sec−1 [40]. The
fraction of the star fdU that is above the density thresh-
old for direct URCA reactions is in general dependent on
the equation of state [41] and in this work we treat fdU
a free parameter with values between 0 and 1.
The proton superfluid reduction factors for the modi-
fied and direct URCA reactions are taken from Ref. [39]
(see Eqs. (32) and (51) in Ref. [39]):
RdU(T/Tp) =
0.2312 +
(0.76880)2 + (0.1438v)2
× exp
3.427−
(3.427)2 + v2
RmU(T/Tp) =
0.2414 +
(0.7586)2 + (0.1318v)2
× exp
5.339−
(5.339)2 + (2v)2
where the dimensionless gap amplitude v for the singlet
type superfluidity is given by
1.456− 0.157
+ 1.764
. (17)
Similar to Ref. [1], we use Tp = 5.0× 109 K. In terms of
the scaled variables Eq. (14) becomes
C(T )
2MR2Ω2c |δω̃|
(4κ̃)2ω̃αω̃βω̃γ
(ω̃αγ̃α v|C̄α|2 + ω̃βγ̃β |C̄β |2(18)
+ω̃γγ̃γ |C̄γ |2) +
KnṀc
2 − Lν(T )
ΩcΩ̃|δω̃|
C. Temperature and Spin Evolution with the
Mode Amplitudes in Quasi-Stationary States
Assuming that the amplitudes evolve through a series
of spin- and temperature-dependent steady states, i.e.,
dCi/dτ̃ ≈ 0, the spin and thermal evolution equations
can be rewritten by taking J ≈ IΩ and using Eqs. (6) in
Eq. (13).
= − 6γ̃GR
Ω̃2|δω̃|
γ̃β γ̃γ
4k̃2Ĩω̃βω̃γ
tan2 φ
MR2ĨΩ̃|δω̃|
where Ĩ = I/(MR2). The thermal evolution of the sys-
tem is given by
C(T )
2MR2Ω2c
(4κ̃)2ω̃αω̃βω̃γ
γ̃αγ̃β γ̃γ
Ω̃|δω̃|
ω̃αγ̃α,v
+ ω̃β (20)
+ω̃γ)
tan2 φ
KnṀc
2 − Lν(T )
ΩcΩ̃|δω̃|
By setting the right hand side of the above equation to
zero, one can find the Heating = Cooling (H = C) curve.
Below, we find that Eqs. (19)-(20) describe the evolu-
tion very well throughout the unstable regime. These
equations are a minimal physical model for the effects of
nonlinear coupling on r-mode evolution.
D. Sources of Driving and Dissipation
The damping mechanisms are shear viscosity, bound-
ary layer viscosity and hyperon bulk viscosity; for modes
j = α, β, γ we write
γj v(Ω, T ) = γj sh(T ) + γj bl(Ω, T ) + γj hb(Ω, T ). (21)
The r-mode is driven by gravitational radiation and
damped by these dissipation mechanisms, while the pair
of daughter modes (n = 13,m = −3 labeled as β and
n = 14,m = 1 labeled as γ) is affected only by the vis-
cous damping. Brink et al. [26, 27, 28] determined that
this pair of modes is excited at the lowest parametric
instability threshold. Their model uses the Bryan [42]
modes of an incompressible star, which has the advan-
tage that the mode eigenfrequencies (and eigenfunctions)
are known analytically. This enables them to find near
resonances efficiently. We are using their results, but we
include more realistic effects such as bulk viscosity, whose
effect vanishes in the incompressible limit (Γ1 → ∞ in
Eq. (29))
For our benchmark calculations, we adopt the neutron
star model of Owen et al. Ref. [6] (n = 1 polytrope,
M = 1.4M⊙, Ωc = 8.4 × 103 rad sec−1 and R = 12.53
km) and use their gravitational driving rate and shear
viscous damping rate for the r-mode
γGR(Ω) ≃
sec−1, (22)
γα sh(T ) ≃
where τsh = 2.56 × 106 sec. (In Sec. V we consider ap-
proximate scalings with M and R.)
The damping rate due to shear viscosity for the two
daughter modes is calculated using the Bryan modes for
a star with the same mass and radius
γβ sh(T ) ≃ 3.48× 10−4 sec−1
, (23)
γγ sh(T ) ≃ 4.52× 10−4 sec−1
The geometric contribution γsh/η of the individual modes
increases significantly with the degree n of the mode scal-
ing approximatively like n3 for large n (see Eq. (29) of
Brink et al. [27] for an analytic fit to the shear damping
rates computed for the 5,000 modes in their network),
and hence the inertial modes with n = 13 and n = 14
have shear damping rates about three orders of magni-
tude larger than that of the r-mode.
The damping due to boundary layer viscosity is calcu-
lated using Eq. (4) of Ref. [30],
γα bl(T,Ω) ≃ 0.009 sec−1 S2ns
, (24)
γβ bl(T,Ω) ≃ 0.028 sec−1 S2ns
γγ bl(T,Ω) ≃ 0.021 sec−1S2ns
Analogous to Wagoner [1], we allow the slippage fac-
tor Sns to vary. The slippage factor is defined by Refs.
[1, 31, 45] to be S2ns = (2S
n + S
s )/3, with Sn being the
fractional difference in velocity of the normal fluid be-
tween the crust and the core [31] and Ss the fractional
degree of pinning of the vortices in the crust [45]. Note
that γβ bl and γγ bl are both greater than 2 × γα bl and
can easily be comparable to γGR in the unstable regime.
The damping rate due to bulk viscosity produced by
out-of-equilibrium hyperon reactions for the r-mode is
found by fitting the results of Nayyar and Owen [24].
This rate is taken to have a form similar to that taken
by Wagoner [1]
γα hb = fhb
t−20α τ(T )Ω̃
1 + (ω̃αΩτ(T ))2
, (25)
and for the daughter modes
γβ hb = fhb
t−20β τ(T )ω̃
1 + (ω̃βΩτ(T ))2
, (26)
and similarly foe γγ hb. The relaxation timescale
τ(T ) =
Rhb(T/Tc)
The reduction factor is taken to be the product of two
single-particle reduction factors [23, 24]
Rhb single(T/Tc) =
a5/4 + b1/2
0.5068−
0.50682 + y2
where a = 1 + 0.3118y2, b = 1 + 2.556y2 and y =
1.0− T/Tc(1.456 − 0.157
Tc/T + 1.764Tc/T ). The
constants t1 ≈ 10−4 sec and t0α ≈ 0.00058 sec are found
by fitting the results of Ref. [24]. The factor fhb allows
for physical uncertainties; we take fhb = 1 throughout
the body of the paper since Tc , which enters γj hb ex-
ponentially, is also uncertain. For the daughter modes,
the dissipation energy due to bulk viscosity is calculated
using the modes for the incompressible star. In the slow
rotation limit, it is given to leading order in Γ−21 by
− ĖB j =
ξj · ∇p
. (29)
This approximation was proposed by Cutler and Lind-
blom [43] and adopted by Kokkotas and Stergioulas [44]
for the r-mode and by Brink et al. [27] for the inertial
modes. The adiabatic index Γ1 is regarded as a parame-
ter; we use Γ1 ≈ 2. The damping rate is
γj hb = −
ĖB j
, (30)
where ǫ = MR2Ω2 is the mode’s energy in the rotat-
ing frame at unit amplitude and j = β, γ. Using this
procedure, we calculate
t0β ≈ 1.4× 10−5 sec, (31)
t0γ ≈ 1.0× 10−5 sec.
III. SUMMARY OF RESULTS
Fig. 1(a) shows possible evolutionary trajectories of a
neutron star in the angular velocity-temperature Ω̃− T8
plane, where T = T8×108 K is the core temperature, and
Ω̃ = Ω/Ωc = Ω/
πGρ̄ with ρ̄ the mean density of the
neutron star. Fig. 1(b) displays the regions in fdU − Sns
in which the trajectories occur. Here fdU represents the
fraction of the star that is above the density threshold
for direct URCA reactions and Sns is the slippage factor
that reduces the relative motion between the crust and
the core taking into account the elasticity of the crust
[31]. The stability regions are shown at fixed hyperon
superfluidity temperature, Tc = 5.0 × 109 K. The initial
part of the evolution is similar in all scenarios and can
be divided into phases.
Phase 0. Spin up below the r-mode stability curve at
T8 = T8 in such that nuclear heating balances neutrino
cooling.
Phase 1. Linear regime. The r-mode amplitude grows
exponentially. The phase ends when the r-mode reaches
the parametric instability.
Phase 2. The triplet coupling leads to quasi-steady
mode amplitudes. The star is secularly heated at
approximately constant Ω because of viscous dissipation
in all three modes.
Phase 3. Several trajectories are possible depending on
FIG. 1: (a)Typical trajectories for the four observed evolu-
tion scenarios are shown in the Ω̃ - T8 phase space, where
Ω̃ = Ω/Ωc. The dashed lines (H = C curves) represent
the points in the Ω̃ − T8 phase space where the dissipative
effects of the heating from the three-modes exactly compen-
sate the neutrino cooling for the given set of parameters (Sns,
fdU, Tc, ...) of each evolution. (b)The corresponding sta-
bility regions for which these scenarios occur are plotted at
fixed hyperon superfluidity temperature Tc = 5.0 × 10
while varying fdU and Sns. The position of the initial angu-
lar velocity and temperature (Ω̃in, T8 in) with respect to the
maximum of this curve determines the stability of the evo-
lution. (I) Ω̃in > Ω̃H=C max. Trajectory R1. Fast Runaway
Region. After the r-mode becomes unstable the star heats
up, does not find a thermal equilibrium state and continues
heating up until a thermogravitational runaway occurs. (II)
Ω̃in < Ω̃H=C max. The evolutions are either stable or, if there
is a runaway, it occurs on timescales comparable to the ac-
cretion timescale. The possible trajectories are (1)Trajectory
C. Cycle Region. (2) Trajectories S1 and S2. Steady State
Region. (3) Trajectory R2. Slow Runaway Region.
how the previous phase ends.
a. Fast Runaway. The star fails to reach thermal
equilibrium when the trajectory passes over the peak of
the Heating = Cooling (H = C) curve. This leads to
rapid runaway. The daughter modes damp eventually
as bulk viscosity becomes important, and the r-mode
grows exponentially until the trajectory hits the r-mode
stability curve again. This scenario ends as predicted
by Nayyar and Owen [24]. However, the r-mode passes
its second parametric instability threshold soon after it
starts growing again. This requires the inclusion of more
modes to follow the evolution, which is the subject of
future work.
b. The star reaches thermal equilibrium. There are then
three possibilities:
(i) Cycle. The star cools and spins down slowly,
descending the H = C curve until it crosses the r-mode
stability curve again. At this point the instability shuts
off. The star cools back to T8 in at constant Ω̃ and
then the cycle repeats itself. At Tc = 5.0 × 109 K this
scenario occurs for values of Sns < 0.50 and large enough
values of fdU. However, if Tc is larger, the cycle region
in the fdU-Sns phase space increases dramatically (see
Fig. 9(a)). Note that our cycles are different from those
obtained by Levin [18] in that the spin-down phase
does not start when the r-mode amplitude saturates
(or in our case when it reaches the parametric insta-
bility threshold), but rather when the system reaches
thermal equilibrium. The r-mode amplitude does not
grow significantly above its first parametric instability
threshold, remaining close to ∼ 105 and so the part
of the cycle in which the r-mode is unstable also lasts
longer than in Ref. [18]. Also, our cycles are narrow.
During spin-down the temperature changes by less than
20 % and Ω̃ changes by less than 10% of the initial value.
(See Sec. 2 for a detailed example.)
(ii) Steady State. For small Sns and large enough
fdU (fdU & 5 × 10−5, Sns . 0.04; see Fig. 1(b)) the
star evolves towards an Ω̃ equilibrium. The trajectory
either ascends or descends the H = C curve (spins
up and heats or spins down and cools). The evolution
stops when the accretion torque equals the gravitational
radiation emission.
(iii) Slow Runaway. For small Sns and very small fdU
(Sns . 0.03, fdU < 5×10−5) the star ascends the H = C
curve until the peak is overcome and subsequently a
runaway occurs. The daughter modes eventually damp
and the r-mode grows exponentially until it crosses its
second parametric instability threshold and more modes
need to be included.
Bulk viscosity only affects the runaway evolutions; the
cyclic and steady state evolutions found here would be
the same if there were no hyperon bulk viscosity. For
large Tc ∼ 1010, or for models with no hyperons at all,
there would be no runaway region (See Fig. 9(a) for an
fdU − Sns scenario space with a larger Tc = 6.5× 109 K
where the fast runaway region has shrunk dramatically
FIG. 2: Two cyclic trajectories in the Ω̃ − T8 plane are dis-
played for a star with Tc = 5.0 × 10
9 K and (a) fdU = 0.15
and Sns = 0.10, and (b) fdU = 0.142 and Sns = 0.35, which is
close to the border between the stable and unstable region (see
Fig. 1(b)). The thick solid line labeled as the Heating = Cool-
ing (H = C) curve is the locus of points in this phase space
where the neutrino cooling is equal to the viscous heating due
to the unstable modes. The other solid line representing the
r-mode stability curve is defined by setting the gravitational
driving rate equal to the viscous damping rate. The part of
the curve that decreases with T8 is dominated by boundary
layer and shear viscosity, while the part of the curve that has
a positive slope is dominated by hyperon bulk viscosity. In
portion a1 → b1 of the trajectory the star heats up at con-
stant Ω̃. Part b1 → c1 represents the spin down stage, which
occurs when the viscous heating is equal to the neutrino cool-
ing. c1 → d1 shows the star cooling back to the initial T8.
Segment d1 → a1 displays the accretional spin-up of the star
back to the r-mode stability curve. The cycle a2 → d2 pro-
ceeds in the same way. This cycle is close to the peak of the
H = C curve. Configurations above this peak will run away.
and the slow runaway region has disappeared.)
IV. POSSIBLE EVOLUTION SCENARIOS
In this section we examine examples of the different
types of evolution in more detail. We assume Ṁ =
10−8M⊙/yr and Tc = 5.0× 109 K.
A. Cyclic Evolution
In this sub-section we present the features of typical
cyclic trajectories of neutron stars in the angular velocity
temperature plane in more detail. We focus on two cases:
(C1) Sns = 0.10 and fdU = 0.15 and (C2) Sns = 0.35
and fdU = 0.142. In this scenario the 3-mode system is
sufficient to model the nonlinear effects and successfully
stops the thermal runaway. The numerical evolution is
started once the star reaches the r-mode stability curve.
The initial temperature of the star is at the point where
nuclear heating equals neutrino cooling in Eq. (18) that
is approximately T8 in ≈ 3.29 for both cases. The initial
Ω is the angular velocity that corresponds to this tem-
perature on the r-mode stability curve, which differs for
the different Sns (Ω̃in = 0.183 for C1 and Ω̃in = 0.288 for
Figs. 2(a) and (b) display the cyclic evolution for tra-
jectories C1 and C2 of Fig. 1(b). In leg a1 → b1 of the
trajectory the r-mode and, once the r-mode amplitude
increases above the first parametric instability thresh-
old, the two daughter modes it excites, viscously heat
up the star until point b1 when the neutrino cooling bal-
ances the viscous dissipation. This part of the evolu-
tion occurs at constant angular velocity over a period
of theat−up ≈ 100 yr and a total temperature change
(∆T )a1−b1 ≈ 0.80 (≈ 24% of T8 in). The points where
the viscous heating compensates the neutrino cooling are
represented by the Heating = Cooling (H = C) curve.
This is determined by setting Eq. 18 to zero and using
the quasi-stationary solutions given by Eq. (6) for the
three modes on the right hand side. The star continues
to evolve on the H = C curve for part b1 → c1 of the
trajectory as it spins down and cools down back to the r-
mode stability curve. This spin-down stage lasts a time
tspin−downb1−c1 ≈ 23, 000 yr that is much longer than
the heat-up period. This timescale is very sensitive to
changes in the slippage factor and can reach 106 yr for
smaller values of Sns that are close to boundary of the
steady state region. The cycle is very narrow in angular
velocity with a total angular velocity change of less than
4%, (∆Ω̃)b1−c1 ≈ 0.0066. The temperature also changes
by only about 2%, (∆T8)b1−c1 ≈ 0.08 in this spin-down
period. Segment c1 → d1 represents the cooling of the
star to the initial temperature on a timescale of ∼ 2, 000
yr. In part d1 → a1 the star spins up by accretion at
constant temperature back to the original crossing point
on the r-mode stability curve. This last part of the tra-
jectory is the longest-lasting one, taking ≈ 200, 000 yr
at our chosen Ṁ of 10−8M⊙yr
−1. The cycle C2 in Fig.
2(b) proceeds in a similar fashion. It is important to note
that this configuration is close to the border between the
“FAST RUNAWAY” and “CYCLE” regions and there-
fore close to the peak of the H = C curve. Configura-
tions above this peak (e.g., with the same fdU and higher
Sns) will go through a fast runaway.
Fig. 3(a) shows the evolution of the three modes in
the first few years after the star first reaches the r-
mode stability curve. In this region the r-mode is un-
stable and initially grows exponentially. Once it has in-
creased above the first parametric instability threshold
the daughter modes are excited. The oscillations of the
three modes display some of the typical dynamics of a
driven three-mode system. When the r-mode transfers
energy to the daughter modes they increase exponen-
tially while the r-mode decreases. Similarly, when daugh-
ter modes decrease the r-mode increases. The viscosity
damps the oscillations and the r-mode amplitude settles
at a value close to the parametric instability threshold.
Fig. 3(b) displays the evolution of the r-mode ampli-
tude divided by the parametric instability threshold on
a longer timescale. It can be seen that the r-mode never
grows significantly beyond this first threshold. Fig. 3(c)
shows the evolution of the parametric instability thresh-
old as a function of time. The threshold increases as the
temperature increases and the star is viscously heated by
the three modes. When the star spins down in thermal
equilibrium, the threshold decreases to a value close to
its initial value.
B. Steady State Evolution
This sub-section focuses on evolutions that lead to a
steady equilibrium state in which the rate of accretion
of angular momentum is balanced by the rate of loss
via gravitational radiation emission. This scenario is re-
stricted to stars with small slippage factor (Sns . 0.04,
see Fig. 1(b)) and boundary layer viscosity. A typical
trajectory of a star that reaches such an equilibrium is
shown in Fig. 4. As always, we start the evolution at the
point on the r-mode stability curve at which the nuclear
heating balances neutrino cooling. Above the r-mode sta-
bility curve the gravitational driving rate is greater than
the viscous damping rate and the r-mode grows exponen-
tially until nonlinear effects become important. In this
case, as in the cyclic evolution, the triplet of modes at
the lowest parametric instability threshold is sufficient to
stop the thermal runaway. The r-mode remains close to
the first instability threshold for the length of the evo-
lution and after a few oscillations the three modes settle
into their quasi-stationary states, which change only sec-
ularly as the spin and temperature of the star evolve.
The modes heat the star viscously at constant Ω̃ in seg-
ment a → b of the trajectory for theat−up ≈ 1, 100 yr.
At point b, the star reaches a state of thermal balance.
In leg b → c the star continues its evolution in thermal
equilibrium and slowly spins up due to accretion until
the angular velocity evolution also reaches an equilib-
5.12 5.13
τ [x 10
0.422
0.424
0.426
0.428
0.432
|cα Th|
0.4 0.5 0.6 0.7 0.8 0.9 1
τ [x 10
0 0.5 1 1.5 2 2.5 3 3.5 4
τ [x 10
0.425
0.435
FIG. 3: (a)The amplitudes of the r-mode |Cα| and of the
n = 13, m = −3 and n = 14, m = 1 inertial modes |Cβ | and
|Cγ | are shown as a function of time for a star that executes
a cyclic evolution (same parameters as in Fig. 2). The low-
est parametric instability threshold is also displayed. (b)The
ratio of the r-mode amplitude to the parametric instability
threshold is plotted as a function of time. It can be seen that
once the r-mode crosses the parametric instability threshold
it remains close to it for the rest of the evolution. (c)The
parametric instability threshold is displayed as a function of
time. Its value changes as the angular velocity and tempera-
ture evolve.
FIG. 4: The trajectory of a neutron star in the Ω̃− T8 phase
space is shown for a model with Tc = 5.0× 10
9 K, fdU = 0.03
and Sns = 0.03 that reaches an equilibrium steady state. The
star spins up until it crosses the r-mode stability curve and
the r-mode becomes unstable. The r-mode then quickly grows
to the first parametric instability threshold and excites the
daughter modes. In leg a → b of the trajectory the star is
viscously heated by the mode triplet until the system reaches
thermal equilibrium. Segment b → c shows the star contin-
uing to heat and spin up in thermal equilibrium until the
accretion torque is balanced by the gravitational radiation
emission. The r-mode stability curve represents the points
in phase space where the viscous driving rate is equal to the
gravitational driving rate. The H=C curve is the locus of
points where the viscous dissipation due to the mode triplet
balances the neutrino cooling.
FIG. 5: The (Ω̃, T8) initial values (region delimited by the
solid line) that lead to equilibrium steady states and their
corresponding final steady state values (region enclosed by
the dashed line) are shown. Since both the initial and final
values of T8 are low, these evolutions are roughly independent
of Tc.
rium. The timescale to reach an equilibrium steady state
is tsteady ≈ 3.5× 106 yr for this set of parameters.
Fig. 5 displays the possible initial values for the angu-
lar velocity Ω̃ and temperature T8 of the star that lead to
a balancing between the accreted angular momentum and
the angular momentum emitted in gravitational waves.
The fraction of the star that is above the threshold for
direct URCA reactions and the slippage factor are varied
within the corresponding “STEADY STATE” region of
Fig. 1(b). The final equilibrium values are also displayed
and cluster in a narrower region than the initial values.
Because viscosity is so small in this regime, the values
of Ω also tend to be small. Thus, although an interest-
ing physical regime, this case is most likely not relevant
to recycling by accretion to create pulsars with spin fre-
quencies as large as 716 Hz. Note that a steady state
can be achieved when Sns = 0. This is the probable end
state of the problem first calculated by Levin [18]. The
reason we do not find a cycle at low Sns is twofold: (1)
the shear viscosity we are using is lower (shear viscosity
in Ref. [18] is amplified by a factor of 244), and (2) the
nonlinear couplings keep all mode amplitudes small.
C. Thermal Runaway Evolutions
We now consider evolutions in which the three-mode
system is not sufficient to halt the thermal runaway. We
observe two such scenarios. In the first scenario, the
star is unable to reach thermal equilibrium. The run-
away occurs on a period much shorter than the accretion
timescale and so the whole evolution is at approximately
constant angular frequency. In the second scenario, the
star reaches a state of thermal equilibrium but the spin
evolution does not reach a steady state. The star contin-
ues to spin up by accretion until it climbs to the peak of
the H = C curve, thermal equilibrium fails and a run-
away occurs.
1. Fast Runaway
A typical trajectory of a star that goes through a rapid
thermal runaway is displayed in Fig. 6. This star has
Sns = 0.25 and fdU = 0.058. Initially, the growth of the
r-mode is halted by the two daughter modes once the
lowest parametric instability threshold is crossed, and
the three modes settle in the (Ω,T )-dependent quasi-
stationary states of Eqs. (6). They viscously heat up the
star until hyperon bulk viscosity becomes important for
the daughter modes. As the amplitudes of the daughter
modes decrease the coupling is no longer strong enough
to drain enough energy to stop the growth of the r-mode.
The daughter modes are completely damped and the r-
mode increases exponentially. The system goes back to
the one-mode evolution described by Ref. [24].
Fig. 6(a) and (b) compare both the temperature evolu-
tion and the trajectory in the Ω̃−T8 plane of the star for a
0 0.5 1 1.5 2 2.5 3 3.5
τ [x 10
Full Amplitudes-Ω-T Evolution
Steady State Evolution
2nd Parametric Instability Threshold
3 3.5 4 4.5 5 5.5 6 6.5
r-mode Stability Curve
Full Amplitude-Ω-T Evolution
Steady State Evolution
2nd Parametric Instability Threshold
FIG. 6: This plot compares the full evolution resulting from
solving Eqs. (4),(13),(18) with the reduced Ω − T evolution
that assumes the amplitudes go through a series of steady
states Eqs. (19)-(20) for a model with Tc = 5.0 × 10
fdU = 0.058 and Sns = 0.25. (a) The temperature is dis-
played as a function of time for the two different methods.
(b) The angular velocity Ω̃ = Ω/Ωc is shown as a function
of temperature. The evolution occurs at constant spin fre-
quency. It can be seen that the steady-state amplitude ap-
proximation is extremely good. The ‘X’ shows the point at
which the r-mode crosses its second lowest parametric insta-
bility threshold, where additional dissipation would become
operative.
simulation solving the full set of equations to a simulation
that assumes quasi-stationary solutions for the three am-
plitudes and evolves only the angular velocity and tem-
perature of the star. It can be seen that the steady state
approximation is very good until the thermal runaway
occurs. Afterward, the temperature evolution of the re-
duced equations is offset slightly from the quasi-steady
result and intersects the r-mode instability curve sooner.
This evolution is similar to that described by Nayyar and
Owen [24]. However, the r-mode crosses its second low-
FIG. 7: The trajectory of a neutron star in the Ω̃ − T8
phase space is shown for a model with Tc = 5.0 × 10
fdU = 4.0 × 10
−5 and Sns = 0.02 that goes through a slow
thermogravitational runaway. Portion a → b of the trajectory
shows the mode triplet heating up the neutron star through
boundary layer and shear viscosity until the system reaches
thermal equilibrium. Segment b → c represents the accre-
tional spin-up of the star in thermal equilibrium. The dotted-
dashed line is the locus of points where the viscous dissipation
of the mode triplet is equal to the neutrino cooling, and is la-
beled as the H = C curve. The star reaches the maximum of
this curve and fails to reach an equilibrium between the ac-
cretion torque and gravitational emission. It then continues
heating at constant angular velocity and crosses its second
lowest parametric instability threshold, at which point more
modes would need to be included to make the evolution accu-
rate. Eventually the star reaches the r-mode stability curve
again.
est parametric instability much earlier in the evolution
(see the ‘X’ in the figure), and at that point more modes
need to be included to model the instability accurately.
Thus, we cannot be sure that a runaway must occur in
this case. We shall return to this issue in a subsequent
paper.
2. Slow Runaway
In this section we examine evolutions in which the neu-
tron star has both a very small slippage factor, Sns .
0.03, and only a small percentage of the star is above the
threshold for direct URCA reactions, fdU < 5 × 10−5.
A trajectory for this kind of evolution is displayed in
Fig. 7. After the star crosses the r-mode stability curve,
the r-mode increases beyond the first parametric insta-
bility threshold, and its growth is temporarily stopped
by energy transfer to the daughter modes. As in the
previous scenarios, the star is viscously heated by the
mode triplet at constant Ω in part a → b of the trajec-
tory on a timescale of about 5, 000 yr. At point b, it
FIG. 8: The spin-down timescale is shown as slippage fac-
tor Sns and fraction of the star subject to direct URCA fdU
for cyclic evolutions are varied for a fixed hyperon critical
temperature of Tc = 5.0 × 10
9 K. This timescale dominates
the heat-up timescale and hence represents the time the star
spends above the r-mode instability curve. It increases as the
viscosity is lowered and the star gets closer to the steady state
region.
reaches thermal equilibrium. In leg b → c of the tra-
jectory, the star continues its evolution by ascending the
H = C curve and spinning up because of accretion for
about 2 × 106 yr without finding an equilibrium state
for the angular momentum evolution. Once it reaches
the peak of the H = C curve, the cooling is no longer
sufficient to stop the temperature from increasing expo-
nentially and a thermal runaway occurs. The cross mark
‘X’ on the trajectory shows the point at which the r-mode
amplitude crosses its second lowest parametric instabil-
ity threshold. At this stage more inertial modes need to
be included to model the rest of this evolution correctly.
As for the cases that evolve to steady states, these long-
timescale runaways tend to occur at low spin rates.
V. PROBABILITY OF DETECTION
Fig. 8 shows how the time the star spends above the
r-mode stability curve changes when Sns and fdU are var-
ied. For large enough values of Sns the boundary layer
viscosity dominates. In this region of phase space the
spin-down timescale can be approximated by
tspin−down =
dΩ̃ (32)
FIG. 9: (a)The stability regions are plotted at fixed hyperon
superfluidity temperature Tc = 6.5×10
9 K, while varying fdU
and Sns. The steady state region remains roughly the same
as in Fig. 1(b), the slow run-away region disappears, and the
cycle region increases dramatically while shrinking the fast-
runaway region. (b) The spin-down timescale is shown for the
cyclic evolutions in part (a).
Ĩτ0GR
(4κ̃)2ω̃βω̃γ
|δω̃|2
|C̄α|2
<Ω̃>6
≈ 250 yr ∆νkHz
<νkHz>7
M1.4R
|cthα |
where M1.4 = M/(1.4M⊙), R6 = R/(10
6cm),
νkHz = ν/1kHz, Ĩ = 0.261 [17], the r-mode am-
plitude at its parametric instability threshold |cthα | ≈
|δω̃|/(4κ̃
ω̃βω̃γ) ≈ 1.5×10−5, and C̄α =
Ω̃|cα|/|cα|th.
This approximation agrees with spin-up timescales ob-
tained from our simulations to ∼ 25%.
The maximum ν is approximately the same as the ini-
tial frequency, and can be determined by equating the
driving and damping rate of the r-mode, since it is on
the r-mode stability curve
νmax ≈ 800Hz
M1.4R6
)4/11
. (33)
Thus, the spin-down timescale is very sensitive to
the slippage factor tspin−down ∝ S−24/11ns (∆νkHz/νkHz).
The dependences on fdU and accretion rate Ṁ are
much weaker; a rough approximation, obtained by
matching direct URCA cooling and nuclear heat-
ing, is T8 in ∝ Ṁ1/6f−1/6dU R
1.4 , and νmax ∝
dU Ṁ
−1/33R
−34/99
1.4 . The gravitational
wave amplitude measured at distance d [46, 47] is
h ≈ 1.6R
τ0GRc
Ω̃3|cα| (34)
≈ 3× 10−25
10kpc
M1.4R
Taking ν ≈ νmax gives
h ∝ S12/11ns M
−1/33
1.4 R
dU Ṁ
−1/11. (35)
The maximum distance at which sources could be de-
tected by Advanced LIGO interferometers, assuming
hmin = 10
−27, [46] is
dmax ≈ 3Mpc
10−27
M1.4R
|cthα |
≈ 1.5Mpc
10−27
S12/11ns M
−1/11
1.4 R
21/11
× T−6/118
|cthα |
Eqs. (33) and (36) imply that gravitational radiation
from the r-mode instability may only be detectable for
sources in the Local Group of galaxies. Eq. (33) implies
that for accretion to be able to spin up neutron stars to
ν & 700 Hz, we must require (Sns/M1.4R6
T8in)
4/11 & 1.
Assuming this to be true, dmax . 1-1.5 Mpc. However,
tspin−down ≈ 1000 yr at most, making detection unlikely
for any given source. Moreover, unless Sns can differ sub-
stantially from one neutron star to another, only those
with ν given by Eq. (33) can be r-mode unstable. Slower
rotators, including almost all LMXBs, are still in their
stable spin-up phases.
Still more seriously, Fig. 1(b) shows that spin cycles
are only possible for Sns . 0.50, assuming Tc ≈ 5.0× 109
K; Eq. (33) then implies ν . 450 Hz. This would restrict
detectable gravitational radiation to galactic sources, al-
though the duration of the unstable phase could be
longer.
Within the context of our three mode calculation,
Sns > 0.50, which is needed for explaining the fastest pul-
sars, would imply fast runaway. There are two possible
resolutions to this problem. One is that including addi-
tional modes prevents the runaway; we shall investigate
this in subsequent papers. The second is that Tc is larger,
or that neutron stars do not contain hyperons (e.g., be-
cause they are insufficiently dense). Fig. 9(a) shows the
same phase plane as Fig. 1(b) but with Tc = 6.5×109 K,
and Fig. 9(b) shows the results for tspin−down analogous
to Fig. 8. Larger Tc permits spin cycles for higher values
of Sns (and hence ν), but the time spent in the unstable
regime is shorter.
VI. CONCLUSIONS
In this paper, we model the nonlinear saturation of un-
stable r-modes of accreting neutron stars using the triplet
of modes formed from the n = 3,m = 2 r-mode and the
the first two near resonant modes that become unstable
(n = 13,m = −3 and n = 14,m = 1) by coupling to
the r-mode. This is the first treatment of the spin and
thermal evolution including the nonlinear saturation of
the r-mode instability to provide a physical cutoff by en-
ergy transfer to other modes in the system. The model
includes neutrino cooling and shear, boundary layer and
hyperon bulk viscosity. We allow for some uncertainties
in neutron star physics that is not yet understood by
varying the superfluid transition temperature, the slip-
page factor that regulates the boundary layer viscosity,
and the fraction of the star that is above the density
threshold for direct URCA reactions. In all our evolu-
tions we find that the mode amplitudes quickly settle
into a series of quasi-stationary states that can be calcu-
lated algebraically, and depend weakly on angular veloc-
ity and temperature. The evolution continues along these
sequences of quasi-steady states as long as the r-mode is
in the unstable regime. The spin and temperature of
the neutron star can follow several possible trajectories
depending on interior physics. The first part of the evo-
lution is the same for all types of trajectories: the star
viscously heats up at constant angular velocity.
If thermal equilibrium is reached, we find several pos-
sible scenarios. The star may follow a cyclic evolution,
and spin down and cool in thermal equilibrium until the
r-mode enters the stable regime. It subsequently cools
at constant Ω until it reaches the initial temperature.
At this point the star starts spinning up by accretion
until the r-mode becomes unstable again and the cycle
is repeated. The time the star spends in the unstable
regime is found to vary between a few hundred years
(large Sns ∼ 1) and 106 yr (small Sns ∼ 0.05). Our
cycles are different from those previously found by Ref.
[18] in that our amplitudes remain small, ∼ 10−5, which
slows the viscous heating and causes the star to spend
more time in the regime where the r-mode instability is
active. Furthermore, we find that the star stops heating
when it reaches thermal equilibrium and not when the r-
mode reaches a maximum value. The cycles we find are
narrow with the spin frequency of the star changing less
than 10% even in the case of high spin rates ∼ 750 Hz.
Other possible trajectories are an evolution toward a full
steady state in which the accretion torque balances the
gravitational radiation emission, and a very slow thermo-
gravitational runaway on a timescale of ∼ 106 yr. These
scenarios occur for very low viscosity (Sns . 0.04). Al-
though theoretically interesting, they do not allow for
very fast rotators of ∼ 700 Hz.
Alternatively, if the star does not reach thermal equi-
librium, we find that it continues heating up at constant
spin frequency until it enters a regime in which the r-
mode is no longer unstable. This evolution is similar
to that predicted by Nayyar and Owen [24]. However,
the r-mode grows above its second parametric instability
threshold fairly early in its evolution and at this point
more inertial modes should be excited and the three-
mode model becomes insufficient. Modeling this scenario
accurately is subject of future work.
We have focused on cases with Tc & 5× 109 K. These
are cases for which the nonlinear effects are substantial.
In this regime, hyperon bulk viscosity is not important
except for thermal runaways where we expect other mode
couplings, ignored here, to play important roles. Fast ro-
tation requires large dissipation, as has long been recog-
nized [18, 30] and these models can only achieve ν & 700
Hz if boundary layer viscosity is very large. Alterna-
tively, at lower Tc . 3 × 109 K, large rotation rates can
be achieved at r-mode amplitudes below the first para-
metric instability threshold [1]. Nayyar and Owen found
that increasing the mass of the star for the same equation
of state makes the hyperon bulk viscosity become impor-
tant at lower temperatures [24]. Conceivably, there are
accreting neutron stars with relatively low masses that
have lower central densities and small hyperon popula-
tions. These could evolve as detailed here and only spin
up to modest frequencies. Hyperons could be more im-
portant in more massive neutron stars leading to larger
spin rates and very small steady state r-mode amplitude
as found by Wagoner [1].
Our models imply small r-mode amplitudes of ∼ 10−5
and therefore gravitational radiation detectable by ad-
vanced LIGO interferometers only in the local group
of galaxies up to a distance of a few Mpc. The r-
mode instability puts a fairly stringent limit on the
spin frequencies of accreting neutron stars of νmax ≈
800Hz[Sns/(M1.4R6)]
4/11T
−2/11
8 . In order to allow for
fast rotators of & 700 Hz in our models a large bound-
ary layer viscosity with (Sns/M1.4R6
T8in)
4/11 ∼ 1 is
required. Slippage factors of order ∼ 1 lead to time peri-
ods on which the r-mode is unstable with a timescale of at
most 1000 yr, which is about 10−3 times shorter than the
accretion timescale. This would mean that only about 1
in 1000 LMXBs in the galaxy are possible LIGO sources.
However, lower slippage factors lead to a longer duration
of the gravitational wave emission, but also lower fre-
quencies. We also note that in this model we have con-
sidered only very fast accretors with Ṁ ∼ 10−8M⊙yr−1
and most LMXBs in our galaxy accrete at slower rates.
Investigations with more accurate nuclear heating models
are a subject for future work.
Our analysis could be made more realistic in several
ways, such as by including the effects of magnetic fields,
compressibility, multi-fluid composition [48], superfluid-
ity, superconductivity, etc. These features would render
the model more realistic, but its generic features ought
to persist, since the upshot would still be a dense set of
mode frequencies exhibiting three mode resonances and
parametric instabilities with low threshold amplitudes.
Although the behavior of the star would differ quanti-
tatively in a model different from ours in detail, we ex-
pect the qualitative behaviors we have found to be ro-
bust, as they are well described by quasi-stationary mode
evolutions whose slow variations are determined by com-
petitions between dissipation and neutrino cooling, and
accretion spin-up and gravitational radiation spin-down.
In our model, it seems that three mode evolution involv-
ing interactions of the r-mode with two daughters at the
lowest parametric instability threshold is often sufficient
to quench the instability. Our treatment is inadequate to
follow what happens when the system runs away; for this,
coupling to additional modes is essential. For this regime,
a generalization of the work of Brink et al. [26, 27, 28]
that includes accretion spin-up, viscous heating and neu-
trino cooling would be needed. Such a calculation is
formidable even in a “simple” model involving coupled
inertial modes of an incompressible star.
Acknowledgments
It is a pleasure to thank Jeandrew Brink and Éanna
Flanagan for useful discussions. RB would especially
like to thank Jeandrew for useful discussions, encourage-
ment and advice at the beginning of this project, with-
out which the project would not have been started. RB
is very grateful to Gregory Daues for steady encourage-
ment and support for the duration of this project, and
also to Gabrielle Allen and Ed Seidel. This research was
funded by grants NSF AST-0307273, NSF AST-0606710
and NSF PHY-0354631.
APPENDIX A
This appendix will sketch the derivation of Eqs. (1)
from the Lagrangian density. We follow closely Appendix
A in Schenk et al., which contains the derivation of the
equations of motion for constant Ω.
The Lagrangian density as given by Eq. (A1) in Schenk
et al. [29] is
L = 1
ξ̇ · ξ̇ + 1
ξ̇ ·B · ξ − 1
ξ ·C · ξ + aext(t) · ξ, (A-1)
where the operators B · ξ = 2Ω× ξ and
ρ(C · ξ)i = −∇i(Γ1p∇jξj) +∇ip∇jξj + ρ∇iδφ (A-2)
− ∇jp∇iξj + ρξj∇j∇iφ+ ρξj∇j∇iφrot
with φrot = −(1/2)(Ω × x)2. We are interested in a
situation where the uniform angular velocity of the star
changes slowly on the timescale of the rotation period
itself. In order to remove the time dependence we define
the new displacement and time variables
, dτ = Ωdt. (A-3)
In terms of these new variables the Lagrangian density
can be written as
L̃ = 1
ξ̃′ · ξ̃′ + 1
ξ̃′ · (B̃ · ξ̃) + (
|ξ̃|2 (A-4)
ξ̃ · C̃ · ξ̃ + aext(t)
· ξ̃,
where the primes denote derivatives with respect to τ ,
B̃ = Ω−1B and C̃ = Ω−2C. The momentum canonically
conjugate to ξ̃ is
= ξ̃′ + Ω̂× ξ̃. (A-5)
The associated Hamiltonian density is
B̃ · ξ̃
|ξ̃|2 +
ξ̃ · C̃ · ξ̃ −
· ξ̃.
(A-6)
Hamilton’s equations of motions can be written as
ζ̃′ = T · ζ̃ + F(τ), (A-7)
where
the operator T is T = T0 + T1 with
B̃2 − C̃ − 1
Ω)′′√
F(τ) =
We assume solutions of the form ζ̃(τ,x) = eiω̃tζ̃(x). Spe-
cializing to the case of no forcing term aext = 0 leads to
the eigenvalue equation
(T0 − iω̃)ζ̃(x) = 0. (A-8)
Since the operator T0 is not Hermitian it will have dis-
tinct right and left eigenvectors. Similar to Schenk et
al. [29] we label the right eigenvectors of T as ζ̃A, and
the associated eigenfrequencies as ω̃A = ωA/Ω, and the
eigenvalue equation above becomes
(T0 − iω̃A)ζ̃A(x) = 0. (A-9)
The left eigenvectors χA satisfy
0 − iω̃⋆A)χ̃A = 0, (A-10)
where
B̃2 − C̃
For simplicity, in this appendix we specialize to the case
of no Jordan chains when the set of right eigenvectors
forms a complete basis. The orthonormality relation be-
tween right and left eigenvectors is
χ̃A, ζ̃B
d3xρ(x)χ̃
A · ζ̃B = δAB. (A-11)
We can expand ζ(τ,x) in this basis as
ζ(τ,x) =
CA(τ)ζA(x), (A-12)
where the coefficients CA are given by the inverse of this
mode expansion
CA(τ) =
χ̃A, ζ̃(τ,x)
. (A-13)
Using Eqs. (B-2,A-9,A-11) in Eq. (A-7) leads to the equa-
tions of motion for the mode amplitudes
C′A − iω̃ACA = g(τ)
(A-14)
+ 〈χ̃A, F (τ)〉 ,
where g(τ) = (
Ω)′′/
Ω. Following Sec. IV of Schenk
et al. [29] we replace the externally applied acceleration
by the nonlinear acceleration given by Eq. (4.2) of Ref.
[29]. The inner product can be written in terms of the
displacement variable ξ̃. The left eigenvectors are
χ̃A =
where τ̃A can be chosen to be proportional to ξ̃A because
they satisfy the same matrix equation.
τ̃A = −iξ̃A/b̃A, (A-15)
which corresponds to Eq. (A-45) in Schenk et al. [29]
with the proportionality constant b̃A = Ω
−1bA =
MR2/ω̃A (also given by Eq. (2.36) of Ref. [29]).
The equations of motion for the mode amplitudes be-
C′A − iω̃ACA =
ig(τ)
d3xξ̃⋆A · ξ̃B(A-16)
κ̃⋆ABCC
where the nonlinear coupling κ̃ABC = κABC/(MR
and κABC is explicitly give by Eq. (4.20) of Ref. [29]. The
g(τ) integral mixes only modes with mA = mB because
of the eimφ dependence of the displacement eigenvectors
ξ̃. (
dφei(mA−mB)φ = 0 if mA 6= mB.) So, this term
will be zero for our mode triplet. Also, in the case of a
single mode triplet there is only one coupling and Eqs.
(A-16) take the form of Eqs. (1).
APPENDIX B
In this appendix we study the behavior of the mode
amplitudes and temperature near equilibrium assuming
constant angular velocity. We are performing a first order
expansion of Eqs. (5) and (18). Similar to Ref. [49], each
of the five variables is expanded about its equilibrium
(Xj)e as follows
Xj(τ̃ ) = {|C̄α|, |C̄β |, |C̄γ |, φ, T8} = (Xj)e[1 + ζj(τ̃ )]
(B-1)
where the perturbation |ζj | << 1 and j = α, β, γ, T . The
expansion leads to a first order differential equation for
each ζj
(γ̃α)e
Ω̃|δω̃|
ζα − ζβ − ζγ −
ζφ (B-2)
(γ̃β)e
Ω̃|δω̃|
ζα − ζβ + ζγ +
(γ̃γ)e
Ω̃|δω̃|
ζα + ζβ − ζγ +
φe tanφe
γ̃α + γ̃β + γ̃γ
Ω̃|δω̃|
−γ̃α − γ̃β + γ̃γ
Ω̃|δω̃|
−γ̃α + γ̃β − γ̃γ
Ω̃|δω̃|
(γ̃α − γ̃β − γ̃γ)e
Ω̃|δω̃|
MR2Ω2c γ̃αγ̃β γ̃γ
2κ̃2ω̃αω̃βω̃γΩ̃|δω̃|C(Te)T8e
tanφ2e
γ̃α v
ζα + ω̃βζβ + ω̃γζγ
+ T8e
+ ω̃β
+ ω̃γ
ΩcΩ̃|δω̃|C(Te)
where the equilibrium amplitudes |Cj |e have been written
in terms of the corresponding driving and damping rates
using Eqs. (6). Eq. (B-2) can be written in matrix form
= Aijζi. (B-3)
Let ζj ∝ exp(λτ̃ ). The determinant ||Aij − λδij || = 0
leads to the eigenvalue equation
λ5 + a4λ
4 + a3λ
3 + a2λ
2 + a1λ+ a0 = 0. (B-4)
The coefficients aj with j = 0, 4 are
a4 = 2 tanφe =
γ̃β + γ̃γ − γ̃α
Ω̃|δω̃|
, (B-5)
tanφ2e
γ̃2β + γ̃
γ + γ̃
(Ω̃|δω̃|)2
+ tanφ2e − 1,
γ̃αγ̃β γ̃γ
(Ω̃|δω̃|)3
tanφ2e
4γ̃αγ̃β γ̃γ
(Ω̃|δω̃|)3
tanφe
+ tanφ
2MR2Ω2c
κ̃2ω̃αω̃βω̃γC(Te)
(γ̃αγ̃β γ̃γ)
(Ω̃|δω̃|)4
tanφe
tanφ2e
4γ̃αγ̃β γ̃γ
(Ω̃|δω̃|)3
tanφe
Ω̃|δω̃|C(Te)
The eigenvalues can be approximated as
λ1,2 ≈ −
− ǫ± i
ǫ2 + w2
, (B-6)
λ3,4 ≈ ǫ± iw,
λ5 ≈ −
where ǫ = (a2 − a3a4)/a4 and w =
a1/a3. The system
is unstable when a2 − a3a4 > 0 or a0 < 0. The first
two eigenvalues will have a negative real part as long as
γ̃β + γ̃γ > γ̃α. If the heating compensates the cooling of
the star a0 ≈ 0 and becomes negative if the star can not
reach thermal equilibrium. The other critical stability
condition a2 − a3a4 = 0 can be written as
Ω̃|δω̃|
[1+Γβ+Γγ−(Γ2β+Γ2γ)−(Γβ−Γγ)2(Γβ+Γγ)] = 0,
(B-7)
where Γβ = γβ/γα and Γγ = γγ/γα. Note that we have
ignored the smaller terms of orderO([γ̃α/(Ω̃|δω̃|)]5). This
condition can be rewritten by defining variables D1 =
Γβ + Γγ and D2 = Γβ − Γγ
2 + 2D1 −D21 −D22 − 2D22D1 = 0. (B-8)
If D2 = 0 then the equation has one solutionD1 = 1+
for D1 > 2, which corresponds to Γ = Γβ = Γγ = 1.37
and matches the result of Wersinger et al. [35]. For the
viscosity we consider (see Sec. II D) a2 − a3a4 < 0.
[1] R. Wagoner, Astrophys. J 578, L63 (2002).
[2] S. Chandrasekhar, Phys. Rev. Lett. 24, 611 (1970).
[3] J. L. Friedman and B. F. Schutz, Astrophys. J. 222, 281
(1978). J. L. Friedman and B. F. Schutz, Astrophys. J.
221, 937 (1978).
[4] N. Andersson, Astrophys. J. 502, 708 (1998).
[5] J. Friedman and S. Morsink, Astrophys. J. 502,
714(1998).
[6] L. Lindblom, B. J. Owen, and S. M. Morsink, Phys. Rev.
Lett 80, 4843 (1998).
[7] L. Bildsten, Astrophys. J. 501, L89 (1998).
[8] N. Andersson, K. D. Kokkotas, N. Stergioulas, Astro-
phys. J. 516, 307 (1999).
[9] N. Andersson, K. Kokkotas, and B. F. Schutz, Astrophys.
J. 510, 846 (1999).
[10] G. B. Cook, S. L. Shapiro, and S. A. Teukolsky, Astro-
phys. J 423 L117 (1994).
[11] G. B. Cook, S. L. Shapiro, and S. A. Teukolsky, Astro-
phys. J 424 823 (1994).
[12] J. W. T. Hessels et al. Science 311 1901 (2006).
[13] J. E. Grindlay, Science 311, 1876 (2006).
[14] D. C. Backer et al., Nature 300, 615 (1982).
[15] D. Chakrabarty et al., Nature 424, 42 (2003).
[16] D. Chakrabarty, Astron. Soc. Pac. Conf. Series 328, 279
(2005).
[17] B. J. Owen et al., Phys. Rev. D 58, 084020 (1998).
[18] Y. Levin, Astrophys. J 517, 328 (1999).
[19] P. Arras et al., Astrophys. J 591, 1129 (2003).
[20] J. Heyl, Astrophys. J 574, L57 (2002).
[21] P.B. Jones, Astrophys. Lett. 5, 33 (1970). P.B. Jones,
Proc, Roy. Soc. (London) A323, 111 (1971). P.B. Jones,
Phys. Rev. Lett. 86, 1384 (2001). P.B. Jones, Phys. Rev.
D64, 084003 (2001).
[22] L. Lindblom and B. J. Owen, Phys. Rev. D65, 063006
(2002), astro-ph/0110558.
[23] P. Haensel, K. P. Levenfish, and D. G. Yakovlev, Astron.
and Astrophys. 381, 1080 (2002), astro-ph/0110575.
[24] M. Nayyar and B. J. Owen, Phys. Rev. D 73 (2006)
084001, astro-ph/0512041.
[25] N. Andersson, D. I. Jones, and K. D. Kokkotas, MNRAS
337, 1224 (2002).
[26] J. Brink, S. A. Teukolsky, and I. Wasserman, Phys. Rev.
D70 (2004) 121501, gr-qc/0406085.
[27] J. Brink, S. A. Teukolsky, and I. Wasserman, Phys.Rev.
D70 (2004) 124017, gr-qc/0409048.
[28] J. Brink, S. A. Teukolsky, and I. Wasserman, Phys.Rev.
D71 (2005) 064029, gr-qc/0410072.
[29] A. K. Schenk, P. Arras, E. E. Flanagan, S. A. Teukol-
sky, I. Wasserman, Phys.Rev. D65 (2001) 024001,
gr-qc/0101092.
[30] L. Bildsten and G. Ushomirsky, Astrophys. J 529, L33
(2000).
[31] Y. Levin and G. Ushomirsky, MNRAS 322, 515 (2001).
[32] S. Yohida and U. Lee, Astrophys. J 546, 1121 (2001).
[33] K. Glampedakis and N. Andersson, astro-ph/0607105,
astro-ph/0411750.
[34] Y. S. Dimant, Phys. Rev. Lett. 84, 622 (2000).
[35] J. Wersinger, J. Finn, and E. Ott, Phys. Fluids 23, 1142
(1980).
[36] E. F. Brown, Ap. J 531, 988 (2000).
[37] H. Schatz, Phys. Rep. 294, 167 (1998).
[38] D. G. Yakovlev and K. P. Levenfish, Astron. Astrophys.
297, 717 (1995).
[39] D. G. Yakovlev and K. P. Levenfish, and Yu. A. Shibanov,
Soviet Phys.-Uspekhi, 42, 737 (1999).
[40] D. G. Yakovlev, A. D. Kaminker, and O. Y. Gnedin,
A&A, 379, L5 (2001).
[41] D. G. Yakovlev, and C. J. Pethick, Ann. Rev. Astron.
Astrophysics, 42, 169 (2004).
[42] G. Bryan, Philos. Trans. R. Soc. London A180, 187
(1889).
[43] C. Cutler and L. Lindblom, Astrophys. J 314, 234 (1987).
[44] K. D. Kokkotas and N. Stergioulas, Astron. and Astro-
phys. 341, 110 (1999).
[45] J. B. Kinney and G. Mendell, Phys.Rev. D67 024032
(2003).
[46] P. R. Brady, T. Creighton, C. Cutler, B. F. Schutz,
Phys. Rev. D 57, 2101 (1998), gr-qc/9702050. P. R.
Brady, T. Creighton, Phys. Rev. D 61, 082001 (2000),
gr-qc/9812014.
[47] B. J. Owen and L. Lindblom, Class.Quant.Grav. 19,
1247-1254 (2002), gr-qc/0111024.
[48] R. Prix, G. L. Comer, and N. Andersson, MNRAS 348,
625 (2004). N. Andersson and G. L. Comer, MNRAS
328,1129 (2001). N. Andersson, G. L. Comer and R. Prix,
MNRAS 354, 101 (2004).
[49] R. V. Wagoner, J. F. Hennawi, J. Liu, Proceedings of the
20th Texas Symposium on Relativistic Astrophysics, 781
(2001), astro-ph/0107229.
http://arxiv.org/abs/astro-ph/0110558
http://arxiv.org/abs/astro-ph/0110575
http://arxiv.org/abs/astro-ph/0512041
http://arxiv.org/abs/gr-qc/0406085
http://arxiv.org/abs/gr-qc/0409048
http://arxiv.org/abs/gr-qc/0410072
http://arxiv.org/abs/gr-qc/0101092
http://arxiv.org/abs/astro-ph/0607105
http://arxiv.org/abs/astro-ph/0411750
http://arxiv.org/abs/gr-qc/9702050
http://arxiv.org/abs/gr-qc/9812014
http://arxiv.org/abs/gr-qc/0111024
http://arxiv.org/abs/astro-ph/0107229
ABSTRACT
  The nonlinear saturation of the r-mode instability and its effects on the
spin evolution of Low Mass X-ray Binaries (LMXBs) are modeled using the triplet
of modes at the lowest parametric instability threshold. We solve numerically
the coupled equations for the three mode amplitudes in conjunction with the
spin and temperature evolution equations. We observe that very quickly the mode
amplitudes settle into quasi-stationary states. Once these states are reached,
the mode amplitudes can be found algebraically and the system of equations is
reduced from eight to two equations: spin and temperature evolution.
Eventually, the system may reach thermal equilibrium and either (1) undergo a
cyclic evolution with a frequency change of at most 10%, (2) evolve toward a
full equilibrium state in which the accretion torque balances the gravitational
radiation emission, or (3) enter a thermogravitational runaway on a very long
timescale of about $10^6$ years. Alternatively, a faster thermal runaway
(timescale of about 100 years) may occur. The sources of damping considered are
shear viscosity, hyperon bulk viscosity and boundary layer viscosity. We vary
proprieties of the star such as the hyperon superfluid transition temperature
T_c, the fraction of the star that is above the threshold for direct URCA
reactions, and slippage factor, and map the different scenarios we obtain to
ranges of these parameters. For all our bound evolutions the r-mode amplitude
remains small $\sim 10^{-5}$. The spin frequency is limited by boundary layer
viscosity to $\nu_{max} \sim 800 Hz [S_{ns}/(M_{1.4} R_6)]^{4/11} T_8^{-2/11}$.
We find that for $\nu > 700$ Hz the r-mode instability would be active for
about 1 in 1000 LMXBs and that only the gravitational waves from LMXBs in the
local group of galaxies could be detected by advanced LIGO interferometers.

<|endoftext|><|startoftext|>
Introduction
Quantum information processing [23] offers potential improvements in a va-
riety of applications. Computational advantages [26, 14] of quantum com-
puters with many qubits have received the most attention but are difficult to
implement physically. On the other hand, technology for manipulating and
communicating just a few qubits could be sufficient to create new economic
mechanisms by altering the information security and strategic incentives of
the underlying game.
Examples of quantum mechanisms include the prisoner’s dilemma [10,
11, 7, 8], coordination [17, 21] and public goods provisioning [3]. In partic-
ular, a quantum mechanism can significantly reduce the free-rider problem
without a third-party enforcer or repeated interactions, both in theory and
practice [2].
In this paper, we examine quantum mechanisms for another economic
scenario: resource allocation by auction [28]. While traditional auction
mechanisms can efficiently allocate resources in many cases, quantum auc-
tion protocols offer improvements in preserving privacy of the losing bids
and dealing with scenarios in which bidders care about what other bidders
win when multiple items are auctioned. Specifically, using quantum super-
positions to represent bids prevents the auctioneer and other bidders from
viewing the bids during the auction without disrupting the auction process.
Furthermore, the auction result reveals nothing but the winning bid and
allocation.
The first part of the paper introduces a general quantum auction protocol
for various pricing and allocation rules, multiple unit auctions, combinatorial
auctions and partnership bids. For simplicity, we focus on the sealed-bid
first-price auction. In this auction, each bidder has one opportunity to
submit a bid. The winner is the highest bidder, who pays the amount bid
for the item. This auction has been well studied both theoretically [28] and
experimentally [5, 4], and contrasts with iterative auctions in which bidders
can incrementally increase their bids depending on how others bid.
If the auction is not well-matched to the bidders preferences, it can intro-
duce perverse incentives and result in poor outcomes, such as lost revenue
for the seller or economically inefficient allocations where items are not al-
located to those who value them most. Thus it is important to examine
incentives introduced with a proposed auction design. In particular, our
auction protocol involves quantum search, which introduces incentive issues
beyond those examined in prior quantum games [11].
A full analysis of incentive issues is complicated, even for classical auc-
tions. In this paper we focus on two incentive issues arising from the quan-
tum auction protocol. The first incentive issue arises from the possibility
of manipulating the search outcome by altering amplitudes associated with
different bids. We show how to revise an adiabatic search method to correct
this incentive problem, thereby preserving the classical Nash equilibrium.
From a quantum algorithm perspective, this construction of the search il-
lustrates how incentive issues affect algorithm design, in contrast to the
more common concern with computational efficiency in quantum informa-
tion processing.
Second, the quantum search for the highest bid is probabilistic, i.e., does
not always return the highest bid. While the probability of finding the
correct answer can be made as high as one wishes by using more iterations
of the search, the small residue probability of awarding the item to someone
other than the highest bidder may change bidding behavior. As a step
toward addressing the effect of probabilistic outcomes, we show that, with
sufficient steps in the quantum search, altering choices from those of the
corresponding deterministic auction gives at most a small improvement for
that bidder.
The paper is organized as follows. Sec. 2 describes the quantum auction
and the bidding language encoding bids in quantum states. Sec. 3 describes
the quantum search method to find the maximum bid. After these sections
describing the auction protocol, in Sec. 4 we turn to strategic issues raised
by the quantum nature of the auction beyond those in the corresponding
classical auctions. Then, in Sec. 5 we give a game theory analysis of some
of these strategic possibilities and describe how simple modifications of the
quantum search improves the auction outcome, in theory. Sec. 6 generalizes
the results to auctions of multiple items, including combinatorial auctions.
Sec. 7 describes scenarios for which the quantum protocol offers likely eco-
nomic advantages in terms of information security and ability to compactly
express complex dependencies among items and bidders. Finally, Sec. 8 sum-
marizes the quantum auction protocol and highlights a number of remaining
economic questions.
2 Quantum Auction Protocol
In our auction protocol, each bidder selects an operator that produces the de-
sired bid from a prespecified initial state. The auctioneer repeatedly asks the
bidders to apply their individual operators in a distributed implementation
of a quantum search to find the winning bid. More specifically, the quantum
auction protocol for sealed-bid auctions involves the following steps:
1. Auctioneer announces conventional aspects of the auction: type of
auction (e.g., first or second price and any reservation prices), the
good(s) for sale, the allowed price granularity (e.g., if bids can specify
values to the penny, or only to the dollar), and the criterion used to
determine the winner(s), e.g., maximizing revenue for the seller
2. Auctioneer announces how quantum states will be interpreted, i.e., as
specifying a price if only one good is for sale, or a combination of price
and a set of goods if combinations are for sale; and also announces
the initial quantum state. This state uses p qubits for each bidder.
Auctioneer announces the quantum search procedure.
3. Each bidder selects an operator on p qubits. Bidders keep their choice
of operator private.
4. Auctioneer produces a set of particles implementing p qubits for each
bidder, initializing the set to the announced initial state.
5. Auctioneer and bidders perform a distributed search for the winner
Fig. 1 illustrates this procedure for two bidders and repeating the steps
of the search twice. Realistic search involves a larger number of steps. In
contrast with other quantum games, e.g., public goods, that involve just
one round of interaction, the search required to identify the winners involves
multiple rounds of interaction among the participants. The required number
of iterations depends on the search method. In practice, the auctioneer could
pick the number of iterations based on prior experience with similar auctions,
or from simulating several test cases using valuations randomly drawn from
a plausible distribution of values for the auction items. Alternatively, the
auctioneer could repeat the procedure several times (possibly with steps
from each repetition interleaved in a random order) and use the best result
from these repetitions.
auctioneer
auctioneer
auctioneer
bidder 1 bidder 2
bidder 1 bidder 2
start
measure state
announce result
Figure 1: Schematic diagram of distributed search procedure, showing re-
peated interactions between auctioneer and bidders, in this case two bidders
and two steps of the distributed search.
number of bidders n
number of items in auction m
number of qubits per bidder p
state of qubits for bidder j ψj
state of all qubits Ψ = ψ1 ⊗ . . .⊗ ψn
Table 1: Notation for the quantum auction.
This auction protocol uses a distributed search so bidders’ operator
choices remain private. Specifically, the search operation requiring input
from the bidders is applied locally by each bidder, giving the overall opera-
U = U1 ⊗ U2 ⊗ . . .⊗ Un (1)
where n is the number of bidders and Ui the operator of bidder i.
3 Quantum Auction Implementation
A quantum auction requires finding the winning bid and corresponding bid-
der. This procedure has two components: the interpretation of the qubits
as bids, and the search procedure to find the winner. The following two sub-
sections discuss these components in the context of a single-item auction.
Sec. 6 generalizes this discussion to multiple items.
3.1 Creation and interpretation of quantum bids
We define a bid as the amount a bidder indicates he is willing to pay for
the item. An allocation is a list of bids, one from each bidder. The quan-
tum auction protocol manipulates superpositions of allocations. We use an
allocation rule to indicate how allocations specify a winner and amount paid.
Example 1. Consider an auction of one item with three bidders, willing to
pay $1, $3 and $10 for the item, respectively. We represent these bids as
|$1〉, |$3〉 and |$10〉, and the corresponding allocation as the product of these
states, i.e., |$1, $3, $10〉 with the ordering in the allocation understood to
correspond to the bidders. A simple allocation rule selects the highest bidder
as the winner, who pays the high bid. In this example, this rule results in
the third bidder winning, and paying $10 for the item.
Each bidder gets p qubits and can only operate on those bits. Thus each
bidder has 2p possible bid values, and can create superpositions of these
values. A superposition of bids specifies set of distinct bids, with at most
one allowed to win. The amplitudes of the superposition affect the likelihood
of various outcomes for the auction. For a single-item auction, a bidder
will typically have only one bid. As discussed below, more complicated
superpositions are useful for information hiding. Specifically, bidder j selects
an operator Uj on p qubits to apply to the initial state for that bidder’s
qubits ψinit specified by the auctioneer. The resulting state, ψj = Ujψinit,
is a superposition of bids, each of the form
where b
i is bidder j’s
bid for the item. The subscript i indicates one of the possible bids that can
be specified with p qubits according to the announced interpretation of the
bits.
We define the subspace used by bidder j as the set of states spanned by
the basis eigenvectors in ψj . Only these basis vectors appear in allocations
relevant for the search. As bidders apply their operators during the search,
the superposition of allocations remains within the subspace of each bidder.
In this case, where each bidder applies an operator only to their own qubits,
the superposition of allocations is always a factored form, i.e., Ψ = ψ1 ⊗
. . . ⊗ ψn. More generally, groups of bidders could operate jointly on their
qubits, entangling their bids in the allocations as discussed in Sec. 7.
To exploit information hiding properties of superpositions, the state re-
vealed at the end of the search should specify only the bidder who wins the
item and the corresponding bid. To achieve this, instead of a direct repre-
sentation of bids, we interpret bids formed from the p qubits available to a
bidder as containing a special null value, ∅, indicating a bid for nothing.
This null bid has additional benefits in multiple item settings, as discussed
in Sec. 6 and Sec. 7.
Example 2. Consider bidder j with two qubits and the initial state ψinit =
|00〉 corresponding to the vector (1, 0, 0, 0), which is interpreted as the null
bid. The other bid states are |01〉, |10〉 and |11〉 corresponding to vectors
(0,1,0,0), (0,0,1,0) and (0,0,0,1). These three states are interpreted as three
bid values in some preannounced way, e.g., $1, $2 and $3, respectively.
The operator
1 0 1 0
0 1 0 1
1 0 −1 0
0 1 0 −1
gives the initial state ψj = Ujψinit as (|00〉+|10〉)/
2 and specifies the search
subspace whose basis is the first and third columns of Uj in this example.
Thus the possible allocations involve only |00〉 and |10〉 for this bidder, cor-
responding to the null bid and a bid of $2, respectively.
In the presence of a null bid, we consider an allocation to be a feasible
if it contains exactly one bid not equal to ∅. The corresponding allocation
rule assigns no winner to infeasible allocations and, for feasible allocations,
the winner is the single bidder in the allocation whose bid is not ∅, and he
pays the amount bid. This allocation rule corresponds to a first-price single-
item auction, except there can be no winner, analogous to the situation in
auctions with a reservation price when no bidder exceeds that price.
3.2 Distributed Search
The auctioneer must find the best state according to an announced crite-
rion, e.g., maximum revenue. Specifically, the auctioneer has a evaluation
function F assigning a quality value to each allocation. The function F
assigns a lower value to infeasible allocations than to any feasible one. An
example is F equal to the revenue produced by the allocation (if feasible)
and otherwise is −1.
The auctioneer uses quantum search to find the allocation in the subspace
selected by the bidders giving the maximum value for F (e.g., a feasible
allocation giving the most revenue to the auctioneer). This could be done via
repeated uses of a decision-problem quantum search [14, 1] as a subroutine
within a search for the minimum threshold value of F giving a solution to the
decision problem, e.g., with a classical binary search on threshold values or
using results of prior iterations of the decision problem [9]. Alternatively, we
could use a method giving the maximum value directly (e.g., adiabatic [12]
if run for a sufficiently long time or heuristic methods [15, 16] based on some
prior knowledge of the distribution of bidders values). For definiteness, we
focus on the adiabatic method.
The adiabatic search is conventionally described as searching for the
minimum cost state. We use this convention by defining a state’s cost to be
the negative of the evaluation function F . The adiabatic search procedure,
if run sufficiently slowly, changes the initial superposition into a final super-
position in such a way that the amplitude in each initial eigenstate maps
to the same amplitude in the corresponding final eigenstate, up to a phase
factor (for nondegenerate eigenstates). We refer to this mapping of initial
to final eigenstates as a perfect search. In practice, with a finite time for
the search, there will be some transfer of amplitude among the eigenstates
so the search will not be perfect in the sense defined here. Instead the auc-
tion outcome is probabilistic: the auction will not always produce the best
outcome when starting from the ground state. For example, an auction in-
tending to find the highest bid could sometimes produce the second highest
bid instead. Conventionally, the search operations are chosen so the uniform
superposition is the lowest cost initial eigenstate. In our case, bidders are
free to choose their operators and need not create uniform superpositions.
A discrete implementation of adiabatic search consists of the following
steps:
• The auctioneer selects a number of search steps S and parameter ∆.
These need not be announced to the bidders.
• The auctioneer initializes the state of all np qubits to Ψinit = ψinit ⊗
. . . ⊗ ψinit = |0, . . . , 0〉, with n factors of ψinit in the product, and
ψinit = |0〉 is the initial state for the p qubits for a single bidder.
• The auctioneer sends these initialized qubits to the bidders who use
their individual operators and then return the qubits to the auctioneer,
jointly creating the state
Ψ0 = UΨinit (3)
• For s = 1, . . . , S, the auctioneer and bidders update the state to
Ψs = UD(f)U
†P (f)Ψs−1 (4)
with f = s/S the fraction of steps completed. The bid operator U and
its adjoint U † are performed by sending bits to the bidders as described
in Sec. 2. The diagonal matrices D(f) and P (f) are described below.
• The auctioneer measures the state ΨS, resulting in specific values for
all the bits, from which the winner and prices are determined by the
allocation rule described in Sec. 3.1.
The diagonal matrix P (f) adjusts the phases of the amplitudes according
to the cost associated with each allocation. In particular, using the cost
c(x) = −F (x) for allocation |x〉, we have
Pxx(f) = exp (−ifc(x)∆) (5)
Similarly, the diagonal matrix D(f) adjusts amplitude phases as defined by
a function d(x):
Dxx(f) = exp (−i(1− f)d(x)∆) (6)
The key property of d(x) is assigning the smallest value, e.g., 0, to |0〉,
thereby making the first column of U the ground state eigenvector. Aside
from this key property, the choice of d(x) is somewhat arbitrary. The con-
ventional choice in the adiabatic method uses the Hamming weight of the
state, i.e., d(x) equal to the number of 1 bits in the binary representation of
x. However, as described in Sec. 5, other choices for d(x) can improve the
incentive properties of the auction.
The discrete-step implementation of the continuous adiabatic method [12]
involves the limits ∆ → 0 and S∆ → ∞, in which case the final state ψS
has high probability to be the lowest cost state. In practice, this outcome
can often be achieved with considerably fewer steps using a fixed value of
∆, corresponding to a discrete version of the adiabatic method [16].
4 Strategies with Quantum Operators
Ideally, an auction achieves the economic objective of its design (e.g. maxi-
mum revenue for the seller). In practice, an auction design may not provide
incentives for participants to behave so as to achieve this objective. Usu-
ally auction designs are examined under the assumption of self-interested
rational participants. In conventional auctions, strategic issues include mis-
representation of the true value, collusion among bidders and false name
bidding (where a single bidder submits bids under several aliases). Some
of these issues can be addressed with suitable auction rules, e.g., second
price auctions encourage truthful reporting of values. Developing suitable
designs of classical auctions in a wide range of economic contexts remains a
challenging problem [28].
Quantum auctions raise strategic issues beyond those of classical auc-
tions. In our case, every step of the adiabatic search requires each bidder
to perform an operation on their qubits. Ideally, the bidder should use the
same operator U for creating ψinit as in every step of the search in Eq. (4). In
addition, bidders should include the null bid in their subspaces. In the clas-
sical first-price sealed-bid auction, the bidder makes one choice: the amount
to bid. In our quantum setting, this choice amounts to selecting the sub-
space to use with the quantum search. The remaining freedom to select U ,
and possibly a different U for each step in the search, are additional choices
provided by the quantum auction.
Bidders may be tempted to exploit the flexibility of choosing operators in
several general ways. First, they could use a subspace not including the null
bid. Second, they could use a different operator for creating ψinit than they
use in the rest of the search, thereby producing an altered initial amplitude
that is not the ground state eigenvector. Third they could change operators
during the search. If any such changes give significant probability for low
bids to win, bidders would be tempted to make such changes and include
a low bid in their subspace, hoping to profit significantly by winning the
auction with a low bid.
The remainder of this section describes some strategic issues unique to
quantum auctions and possible solutions. We further discuss a game theory
analysis of some of these issues in Sec. 5.
4.1 Selecting the Subspace
The use of the null bid in our protocol raises the strategic issue illustrated
in the following example:
Example 3. Consider an auction of a single item with two bidders Alice
and Bob. Using operators producing uniform amplitudes for the sake of
illustration, they ought to apply operators that create
(|∅〉+ |bA〉) and 1√
(|∅〉+ |bB〉)
respectively, where bA and bB are their desired bids. The initial superposition
for all the qubits is the product of these individual superpositions, i.e., Ψ0 is
(|∅,∅〉+ |bA,∅〉+ |∅, bB〉+ |bA, bB〉)
If bidders use these same operators during the search, the search algorithm
finds the highest revenue allocation, i.e., giving the item to the highest bid-
der. Suppose instead Bob picks an operator with a one-dimensional subspace,
producing an initial state |bB〉 rather than including ∅. The product super-
position is then
(|∅, bB〉+ |bA, bB〉)
Since the search remains in this subspace and the second allocation is infea-
sible, the search will return |∅, bB〉 no matter what Alice bids. Thus Bob
always wins the item, and can win using the lowest possible bid.
This example shows bidders have an incentive to exclude the null set
from their subspace. If all bidders make this choice, there will be no feasible
allocations in the joint subspace and the auction will always give no winner.
For auctions with more than two bidders, selecting subspaces excluding ∅ is
a weak Nash Equilibrium for the quantum auction because any other choice
by a single bidder still results in no feasible allocations.
4.2 Altering Initial Amplitudes
Strategic choices for bidders also arise from the search procedure itself, even
when using the correct subspace consisting of ∅ and the desired bid. In
particular, the probabilistic outcome of the search means the optimal bid
according to the auction criterion (e.g., highest revenue) will not always win.
For the adiabatic search method, bidders could try to arrange for especially
tiny eigenvalue gaps between the state corresponding to the best outcome
and another state allowing them to win with a low bid. A sufficiently small
gap could make the number of steps the auctioneer selects insufficient to give
the optimal state with high probability and instead give a significant chance
of producing the more favorable outcome. However, because the eigenvalues
are a complicated function of the operators of all bidders, and individual
bidders do not know the choices made by others, it will be difficult for a
bidder to determine how to make such especially small gaps and do so in a
way that gives a favorable outcome. Nevertheless, even fairly small proba-
bilities for not finding the optimal state could alter the strategic behavior
of the bidders.
A more direct way a bidder can arrange for a low bid to win is by altering
the initial state of the adiabatic search to start not in the ground state but
in an eigenvector corresponding to one of the first few eigenvalues above
the ground state. The adiabatic search takes such eigenvectors, with high
probability, to an outcome in which a bid lower than the highest wins. While
a single bidder cannot create an arbitrary initial condition, one bidder can
ensure that it is not the ground state. For example, a bidder could chose an
operator that gives a nonuniform amplitude for the initial state, in particular
(|∅〉−|bA〉)/
2, while using the uniform state (|∅〉+|bA〉)/
2 as the ground
state through the remainder of the search in Eq. (4). This can result in
significant probability for a low bid to win, and so a bidder is tempted to
deviate from the nominal operator choice.
Fig. 2 illustrates this behavior. Instead of starting in the ground state,
0 0.2 0.4 0.6 0.8 1
high bid wins
low bid wins
infeasible
Figure 2: Correspondence between initial basis and the possible allocations
for a single item auction with two bidders in the standard adiabatic search.
During the search, as f increases from 0 to 1, the eigenvalues of the four
states change as shown schematically in the figure. The states for f = 0
correspond to both bidders starting with the ground state, |00〉, the two
states obtained if one of the bidders starts with a different superposition,
|01〉 and |10〉 (“single-bidder deviation states”), and the state of both bidders
starting with different superpositions, |11〉 (“2-bidder deviation state”).
the bidder’s choice gives the initial state as a linear combination of the
ground state and the single-deviation state for that bidder, denoted as |01〉
or |10〉 for the two bidders in Fig. 2. Here a “single deviation” state is
one that a single bidder can create, i.e., by operating on just the qubits
available to that bidder. The adiabatic search splits the degeneracy, thereby
giving some probability for the lowest bid to win and some probability for
an infeasible allocation.
More generally, bidder i uses this strategy by selecting two different
operators U initi and Ui to use for forming the initial state and during the
search, respectively. These choices result in different joint operators, in
Eq. (1), used in Eq. (3) and (4).
As with selecting a subspace without ∅, if many or all bidders make
this choice, the initial state will have significant amplitude in eigenvectors
corresponding to large eigenvalues, which produce infeasible outcomes and
0 0.2 0.4 0.6 0.8 1
high bid wins
low bid wins
infeasible
Figure 3: Correspondence between the initial basis and the possible alloca-
tions for a single item auction with two bidders in the search with permuted
initial eigenvalues.
hence a high probability for no winner. Thus with standard adiabatic search,
if everyone uses the same operator for both initialization and search, then
each bidder is tempted to use a different initialization operator and bid low,
gaining a chance to win with a low bid. However, if multiple bidders attempt
this, the outcome will most likely be an infeasible state, with no winner.
We can address this problem by reordering the eigenvalues given by the
d(x) function in Eq. (6) so that any change in initial operator by a single
bidder increases probability of infeasible allocation but not the probability of
any feasible allocation with a bid lower than the highest bid. This is possible
because bidders only have access to their own bits, so can only form initial
superpositions from a limited set of basis vectors. Fig. 3 illustrates the
resulting situation. We give an analysis of this approach in Sec. 5.2.
4.3 Changing Operator During Search
The distributed search of Eq. (4) has each bidder using the same operator
for every step of the search. Thus bidders may gain some advantage by
altering their operator during the steps of the search. Gradually changing
the operator during the search amounts to a different path from initial to
final Hamiltonian during the adiabatic search. Thus, provided the auction-
eer uses enough steps, such changes will have at most a minor effect on the
outcome probabilities unless the bidder can arrange for particularly small
eigenvalue gaps among favorable states. Such arrangement is difficult, par-
ticularly since the bidder does not know the choices of other bidders and
the auctioneer could treat the bits from the bidders in an arbitrary, unan-
nounced order.
More significant changes in outcome is possible with sudden, large changes
in the operator during search. Since the use of bidders operators gradually
decreases during the search (i.e., Dxx(f) given in Eq. (6) approaches the
identity operator as f approaches 1), the most problematic situation is for
an abrupt change in operator at the beginning of the search. After such
a change, the adiabatic search continues its gradual change of states, but
now instead of starting in the ground state, it will instead have a linear
combination of various states obtained by mapping the original basis onto
the basis after the change.
5 Quantum Auction Design
In this section, we focus on mechanism design to reduce incentive issues
arising from the quantum aspects of the auction. We analyze incentive is-
sues with the Nash equilibrium (NE) concept commonly used to evaluate
auctions [28]. A given set of behaviors for the bidders is an equilibrium
if no single bidder can gain an advantage (i.e., higher expected payoff) by
switching to another behavior. Specifically, Sec. 5.1 describes an approach
to encouraging bidders to include the null set in their bids. In Sec. 5.2 we
show that using the ground state eigenvector is a NE provided bidders do
not change the operators during the search. Sec. 5.3 then discusses how
the auctioneer can discourage bidders from changing operators. Sec. 5.4 de-
scribes how the auction can be made symmetric across the different bidders.
We focus on single-item auctions in this section, but the ideas extend to
quantum combinatorial auctions, as described in Sec. 6.
5.1 Checking for the Null Set
One approach to the incentive to exclude the null set, described in Sec. 4.1,
is for the auctioneer to perform a second search: for the allocation with the
most ∅ values. This search uses the same distributed protocol of Eq. (4)
but with separate qubits and a different cost function to define P (f), i.e.,
setting c(x) to the number of non-∅ values in the allocation x. Interleaving
the additional search in a random order within the steps of the search for
the winning bid prevents bidders from knowing which search a given step
belongs to. So bidders could not consistently select different operators for
the two searches.
If all bidders include ∅ in their selected subspace, this additional search
returns |∅,∅, . . .〉. Any bidder found not to have included ∅ could be ex-
cluded from winning the auction. At this point the auctioneer could either
announce there is no winner, or restart the auction for the remaining bid-
ders without announcing this restart. The adiabatic search has a small but
nonzero probability of returning the wrong result, which would then incor-
rectly conclude some bidder did not include ∅. As long as the probability of
such errors is smaller than the error probability of the search for the winner,
these errors should not greatly affect the incentive structure of the mech-
anism. Alternatively, the auctioneer could use a search completing with
probability one in a finite number of steps, i.e., with different choices of D
and P in Eq. (4), the auctioneer could implement Grover’s algorithm [14]
to search for the allocation |∅,∅, . . .〉 in the joint subspace of the bidders.
Since the auctioneer does not know the size of the subspaces selected by the
bidders, the auctioneer would need to try various numbers of steps [1] before
concluding |∅,∅, . . .〉 is not in the selected subspaces. Unlike the adiabatic
search, failure would only indicate some bidder had not included ∅, but not
which one. Thus the auctioneer’s only alternative in this case is to announce
the auction has no winner.
While this approach removes the immediate benefit of not including the
null bid, its affect on broader strategic issues in the full auction is an open
question.
5.2 The First-Price Sealed-Bid Auction
In this section we examine the incentive structure of the auction with per-
muted eigenstates described in Sec. 4.2. We first review how game the-
ory applies to auctions. We then consider the quantum auction when the
search runs long enough to give successful completion almost always (“per-
fect search”). Finally, we consider the more realistic case of search with
small, but not negligible, probability for non-optimal outcomes.
5.2.1 A Game Theory Approach to Auctions
Game theory is a common approach to evaluating auctions [20, 28]. Con-
sider n people bidding for an item, with person i having value vi for the
item. Unlike discrete choice games, such as the prisoner’s dilemma, a strat-
egy for a private value auction involves a bidding function b(v), mapping
a bidder’s value to a corresponding bid. Theoretical analysis of auctions
usually involves identifying a NE strategy, if any. This is a strategy for all
players such that no bidder gains by changing this strategy given everyone
else is using it. This focus on possible changes by a single bidder assumes
bidders do not collude.
A primary issue for auction behavior is how much participants know
about other bidders’ values. Such knowledge can affect the choice of bid.
The most popular model of such knowledge is independent private values,
where the vi are independently drawn from the same distribution. Each
bidder knows his own value, but not the values of other bidders. However,
the distribution from which values come is common knowledge, i.e., known
to all bidders, each bidder knows the others know this fact, and so on. A
final ingredient for the analysis is an assumption of bidders’ goals. For
illustration, we use the common assumption that bidders are risk neutral
expected utility maximizers, and within the context of the auction, utility
is proportional to profit.
We illustrate this approach for a first-price sealed-bid auction, in which
each bidder submits a single bid without seeing any of the other bids. This
corresponds to the auction scenario considered in this paper. The bidder
with the highest bid gets the item and pays the amount of his bid. Thus if
bidder i bids bi, his profit is vi−bi if he wins the auction and zero otherwise.
To avoid possibly losing money, bidders should ensure bi ≤ vi, and bids are
required to be nonnegative.
In the symmetric case where bidders’ values all come from the same
distribution, a NE is a bidding function b(v). A bidder’s expected payoff
is (v − b(v))P (b) where v is his value, b is his bidding function and P (b)
is the probability of winning if he is using b(v) (which is also the function
others use in equilibrium). Let F be the cumulative distribution of values,
i.e., probability a value is at most v, and n be the number of bidders. The
equilibrium condition leads to a differential equation satisfied by b(v) [28].
As a simple example, when v is uniformly distributed between 0 and 1,
F (v) = v and the NE is b(v) = (n − 1)v/n. Thus, in the equilibrium
strategy, a bidder bids somewhat less than his value and the bid gets closer
to the value when there is more competition, i.e., larger n.
If bidders have differing value distributions, a NE involves a set of bidding
functions, {bi(v)}. An auction may have multiple equilibria.
5.2.2 Behavior with Perfect Search
With perfect search and non-colluding bidders, if bidders use the same opera-
tors for every step of the search, including initialization, and pick a subspace
with the null bid then the adiabatic search described in Sec. 3.2 finds the
highest revenue state. We now show that the auctioneer can choose eigenval-
ues for the search so that bidders have no incentive to create an initial state
different from the ground state. This choice corresponds to the auctioneer
selecting an appropriate function d(x) in Eq. (6).
Suppose bidder i uses operator Ui, giving the overall operator U with
Eq. (1). Suppose all bidders except bidder 1 use the same operator to create
the initial state as they use for the subsequent search. But bidder 1 uses two
operators: U init1 to form the initial state and U1 for the search. Thus the
initial state produced by bidder 1, ψ1 = U
1 ψinit, i.e., the first column of
U init1 , is not necessarily equal to the first column of U1 that bidder 1 uses for
the subsequent search. Instead, ψ1 may have contributions from all columns
of U1, i.e.,
αi |i〉 (7)
where |i〉 corresponds to column i, ranging from 0 to 2p−1, of U1. Combining
with the initial state of all other bidders, Eq. (3) gives Ψ0 =
i αi |i, 0, . . . , 0〉,
instead of the initial ground state |0, 0, . . . , 0〉.
Significantly, because a bidder can only operate on the p qubits from the
auctioneer and not on any of the qubits sent to other bidders, a single bidder
can only create a limited set of “single-deviation” initial states. In the case
of bidder 1, these states all have the form |i, 0, . . . , 0〉. Similarly, if bidder
j is the one using different initial and search operators, the states all have
the form |. . . , 0, i, 0, . . .〉, where only the jth position can be nonzero. Thus,
among the 2np basis states in the full search space, aside from the correct
ground state, only n(2p−1) are possible states some single bidder can create
when all other bidders use the same operator for initialization and search.
More generally, k bidders can create superpositions of (2p − 1)k basis
states in which none of them use the ground state initially, by selecting
different operators for initialization and search. Thus there are
(2p − 1)k (8)
k-deviation states that some set of k bidders can create, while the other
n− k bidders use the ground state.
Our formulation has n(2p−1) feasible allocations, i.e., situations in which
exactly one of the bidders has a non-∅ bid while all other bidders have ∅.
To see this, each of the n bidders could have the non-∅ bid, and this bid
could have any of 2p − 1 values (since the remaining value for the bidder’s
bits represents ∅). The remaining n− 1 bidders have only one choice each,
i.e., ∅.
Suppose the auctioneer selects d(x) such that d(|0, . . . , 0〉) = 0 is the
lowest eigenvalue and d(x) for all single-deviation states x is the largest
value, with intermediate values for all other states. Provided the number
of infeasible allocations is at least equal to the number of single-deviation
states, a perfect search will then map every single-deviation state to an
infeasible allocation, resulting in no winner for the auction. This condition
amounts to
2np − n(2p − 1) ≥ n(2p − 1) (9)
The following claim shows that Eq. (9) always is true in an auction scenario.
Claim 1. Eq. (9) is true for all integers n, p ≥ 1
Proof. When p = 1, Eq. (9) reduces to 2n−1 ≥ n, which is true for all n ≥ 1.
We prove a stronger condition for p ≥ 2, namely there are enough in-
feasible states to handle up to n − 1 bidders deviating. Using Eq. (8), this
stronger condition is
2np − n(2p − 1) ≥
(2p − 1) = 2np − 1− (2p − 1)n (10)
with the k = 1 term in the sum corresponding to the right-hand side of
Eq. (9). Writing x ≡ 2p − 1, Eq. (10) becomes f(x, n) ≡ xn − nx+ 1 ≥ 0.
Since p ≥ 2, we have x ≥ 3. For this range of x and for n ≥ 1, f(x, n) is
monotonically increasing in both arguments. To see f is monotonic for x, the
derivative of f(x, n) with respect to x is n(xn−1 − 1) which is nonnegative
since n ≥ 1 and x > 1. Similarly, the derivative with respect to n is
x(xn−1 ln(x) − 1) which is at least 3(ln(3) − 1) > 0 since n ≥ 1 and x ≥ 3.
Thus for the relevant range of n and x, f(x, n) ≥ f(3, 1) = 1 so Eq. (10) is
true for all n ≥ 1 and p ≥ 2.
Combining these cases for p = 1 and p ≥ 2 establishes the claim.
Using this claim, we demonstrate the permuted eigenvalue choices re-
move incentives to alter the initial amplitudes:
Theorem 1. If (a) auctioneer chooses eigenvalues as described above, (b)
{b∗i (v)}ni=1 is an equilibrium for the first-price classical auction, and (c) bid-
ders include the null set as part of their bids and use the same operator in
each step in the search except, possibly, for the initial state, then the strategy
of using bidding functions {b∗i (v)}ni=1 and the same operator for their initial
state as they use in the search is a NE for corresponding quantum auction.
Proof. Without loss of generality, suppose only bidder 1 deviates and all the
other bidders use {b∗i (v)}ni=2 and the same operator for initialization and
search. Then, as described above, the initial state Ψ0 is
i αi |i, 0, . . . , 0〉
for some choice of amplitudes αi, with i ranging from 0 to 2
p − 1.
A perfect adiabatic search maps each of these states to a corresponding
allocation. In particular, with d(|0, . . . , 0〉) having the smallest value of
the function d(x), the lowest cost allocation is produced with probability
|α0|2. This allocation corresponds to the highest bid winning. Moreover,
each |i, 0, . . . , 0〉 with i 6= 0 has the largest value of d(x), and so, because
of Eq. (9), maps to an infeasible allocation, giving no winner and hence no
value to bidder 1.
Hence the expected value for bidder 1 is |α0|2V where V is the value of
the expected profit of the corresponding classical auction to bidder 1. Since
|α0|2V ≤ V , bidder 1 cannot gain from such a deviation.
Furthermore, there is no gain from deviating from the bidding function
b∗1(v) since it will only decrease V , because, by assumption, {b∗i (v)}ni=1 is a
NE for the corresponding classical auction.
Because of Eq. (9), this discussion applies to deviations by any single
bidder, not just bidder 1. Thus, using bidding function {b∗i (v)}ni=1 and using
the same operator for their initial state as they use in the search is a NE.
The stronger condition, Eq. (10), shows that the number of infeasible
states is enough to give no winner for any choice of initial amplitudes that
up to n − 1 bidders can produce, provided p ≥ 2. Thus if an auctioneer
implements a collusion-proof classical auction with the quantum protocol
and assigns infeasible states as described then the resulting quantum auction
is collusion-proof up to n− 1 bidders for initial amplitude deviations.
The choice for d(x) satisfying the above requirements is not unique.
As one example, let x be the state index in the full search space, running
from 0 to 2np − 1. Consider x as written as a series of n base-2p numbers,
|x1, x2, . . . , xn〉. Define
d(x) = −r(x) (mod n+ 1) (11)
where r(x) is number of nonzero values among x1, x2, ..., xn. The mod oper-
ation gives all d(x) values in the range 0 to n. For the initial ground state,
x = |0, . . . , 0〉, r(x) = 0 so d(x) = 0, and this is the smallest possible value.
Single-deviation states have exactly one of the xi nonzero, giving r(x) = 1
and d(x) = n, the largest possible value. More generally, all k-deviation
states have r(x) = k so d(x) = n + 1 − k. This function definition gives
values directly from the representation of the state x, so, in particular, the
auctioneer can implement it without any knowledge of the subspaces selected
by the bidders.
The assumption of perfect search is a sufficient but not necessary con-
dition for the proof of Theorem 1. The necessary conditions are more com-
plicated because we only need that every single bidder deviation maps to a
linear combination of infeasible states. Thus mixing among different single-
deviation states during search (e.g., due to small eigenvalue gaps among
those states), or among states corresponding to two or more bidders deviat-
ing, does not affect the proof.
5.2.3 Bounded Number of Search Steps
Theorem 1 shows the quantum auction has the same NE as the classical
first price auction if the search is perfect and each bidder uses the same
operator for every search step of Eq. (4). Since adiabatic search, run for a
finite number of steps, is not perfect we examine the effect on the NE of
an imperfect search. We show that the NE for perfect search, i.e., bidding
as in a classical first price auction and using the same operator initially
and during the search, is an ǫ-equilibrium for the auction with imperfect
search. Furthermore, ǫ converges to zero as the number of search steps goes
to infinity. A strategic profile is an ǫ-equilibrium [24] if for every player,
the gains of unilateral defecting to another strategy is at most ǫ. This
weaker equilibrium concept is useful in our case because determining how to
exploit imperfect search is computationally difficult. Specifically, with the
small eigenvalue gaps and degeneracy it is hard to know whether imperfect
search benefits a particular bidder. Thus computational cost will likely
outweigh the small possible gain. In this situation, an ǫ-equilibrium is a
useful generalization of NE.
We must prove that for any ǫ there exists an N so that if the search
process uses at least N steps, the equilibrium of the game with a perfect
search is also an ǫ-equilibrium when using the actual search. To do so,
we bound the possible gain from deviation based on prior knowledge of the
range of possible bidder values. That is, we assume the distribution of values
has a finite upper bound v̄. In our context, one such bound is the maximum
bid value expressible by the announced interpretation of each bidders qubits.
Theorem 2. If the conditions of Theorem 1 are met, and assuming the pos-
sible bidder values are bounded by v̄, for any ǫ > 0, there exists an N so that
the NE in the quantum auction with a perfect search, shown in Theorem 1,
is also an ǫ-equilibrium of the same auction with an imperfect search using
N search steps.
Proof. Let ph be the probability of the highest bid wins. Let pinf be the
probability of reaching an infeasible state. Then po = 1 − ph − pinf is the
probability of a bid other than the highest bid wins.
With the adiabatic search, with nonzero eigenvalue gaps, the probability
of correctly mapping the initial to final states converges to one as the number
of search steps increases. Thus for any δ > 0, there always exists a N where
po is at most δ.
We define an equilibrium expected payoff function for bidder i with value
v as π∗i (v), when all bidders use their equilibrium bidding functions.
Without loss of generality, from the perspective of bidder i with value
v, the probability of achieving the equilibrium payoff, π∗i (v), if that bidder
does not deviate is 1 − δ. Thus the expected payoff of deviating is at most
πdeviatei (v) ≤ (1− δ)π∗i (v) + δv̄ because (a) the most any bidder can gain is
bounded by v̄, and (b) with probability 1− δ the auction either produces no
profit (pinf) or is identical to a classical auction (ph).
The expected gain g from deviating is the expected payoff from deviating
minus the expected payoff with no deviation, i.e., g = πdeviatei (v) − π∗i (v) ≤
δ(v̄ − π∗i (v)), which in turn is at most δv̄. Thus for any choice of δ, there
always exists an N where the maximum deviation benefit is at most δv̄.
For any ǫ > 0, using δ = ǫ/v̄ in the above discussion shows there always
exists an N where the deviation is at most ǫ.
5.3 Testing for Changed Operators During Search
One approach to the incentive issue of changing operators during search,
described in Sec. 4.3, is for the auctioneer to test the bidders by randomly
inserting additional probe steps in the search.
Specifically, suppose at any step of the search the auctioneer, with some
probability, decides to check a bidder by sending a new set of qubits in a
known state |φ〉, while storing the qubits for the search until a subsequent
step. For the test step, the auctioneer sets D or P to the identity operator.
The state returned by the bidder is then U ′iU
i |φ〉 or U
i Ui |φ〉, depending on
which part of the search step in Eq. (4) the auctioneer is testing. Without
loss of generality, we consider the former case.
Ideally, the bidder uses the same operator, so U ′i = Ui and U
i is the
identity. Suppose the test state is formed from some operator V , randomly
selected by the auctioneer, |φ〉 = V |0〉. If U ′iU
i is not the identity, the re-
turned state has the form α |φ〉+β |φ⊥〉, where |φ⊥〉 is some state orthogonal
to |φ〉 and |α|2 + |β|2 = 1. The auctioneer then applies V †, giving
α |0〉+ β |a〉 (12)
for some value a 6= 0. The auctioneer then measures this state, getting
0 with probability |α|2, indicating the bidder passes the test. Otherwise,
the auctioneer observes a different value, indicating the bidder changed the
operator.
Hence the chance of getting caught depends on how often the auctioneer
checks, and how big a change the bidder makes in the operator. Larger
operator changes are more likely to be caught. This testing behavior is
appropriate as small changes are not likely to have much affect on the search
outcome, and instead simply act as an alternate adiabatic path from initial
to final states. This technique is especially useful for risk averse bidders
since then even a small chance to be caught might be enough to prevent
bidders from wanting to change operators.
5.4 Assigning Eigenvalues to Subspaces
Quantum search acts on the full space of superpositions of the available
qubits, i.e., in our case to all 2np configurations of items and bids. In the
auction context, bidders choose operators to restrict the search to a subspace
of possible bids, namely the ones they wish to make. Conceptually, the
search described above is then restricted to the subspace selected by the
bidders.
The search can also be viewed as taking place in the full space of 2np
configurations. The operator U appearing in the search algorithm is block
diagonal (up to a permutation of the basis states), with only the block
operating on the selected subspace relevant for the search outcome. This
view of the search is that of the auctioneer, who has no prior knowledge
of the subspace selected by each bidder. The operator U is not known
to any single individual: instead its implementation is distributed among
the bidders, with each bidder implementing a part of the overall operator.
The auctioneer chooses the eigenvalues for the initial Hamiltonian and the
ordering for the qubits assigned to each bidder. These choices, which could
change during the search, affect the incentive structure of the auction as
described in Sec. 5.2.
This section describes how the auctioneer’s choice of d(x) can give the
same eigenvalues when restricted to the subspace actually selected for the
search. For simplicity, we suppose each bidder uses a 2-dimensional sub-
space, consisting of ∅ and the desired bid for the single item. While not
essential for the NE results discussed above, uniformity with respect to sub-
space choices means bidders are treated uniformly, so convergence of the
search is independent of the order in which the auctioneer considers the
bidders.
5.4.1 An Example
Consider n = 2 bidders, each with p = 2 bits, representing 4 values: ∅
and three bid values 1, 2, 3. A set of 2-bit operators to form a uniform
superposition of the form (|∅〉+ |b〉)/
2 where b is the bid value, 1, 2 or 3,
is 1/
2 times
1 −1 0 0
1 1 0 0
0 0 1 −1
0 0 1 1
1 0 −1 0
0 1 0 −1
1 0 1 0
0 1 0 1
1 0 0 −1
0 1 1 0
0 −1 1 0
1 0 0 1
which we can denote as A1, A2, A3, respectively, with the first columns giving
the uniform superposition of the three possible bid values. If the bidders
select bids b1, b2, respectively, the overall operator for the search is U =
Ab1 ⊗ Ab2 , used in Eq. (4) to perform each step of the search. Thus in this
case there are 9 possible subspaces the two bidders can jointly select. Up to
a permutation, U is block diagonal with the block containing the nonzero
entries of the first column, and hence all the nonzero amplitude during the
search, equal to
1 −1 −1 1
1 1 −1 −1
1 −1 1 −1
1 1 1 1
The search using U in the full 4-bit space is thus equivalent to one taking
place in the 2-bit subspace selected by the two bidders using this operator
The auctioneers’ choice of eigenvalues, i.e., the function d(x) used in
Eq. (6) should ensure the uniform superposition within the subspace defined
by the two bidders has the lowest value, say 0, and all other eigenstates have
larger values.
One possibility is the standard choice for the diagonal values d(x) when
searching in the full space of 24 states defined by the np = 4 bits, namely
the Hamming weight of each state, i.e., the number of 1 bits in its binary
representation, ranging from 0 to 4.
An alternative approach is picking d(x) so eigenvalues for the four states
appearing in V have the same values as they would have with using the
Hamming weight for a 2-bit search, ranging from 0 to 2. Doing so requires
selecting the eigenvalues to match the corresponding Hamming weights for
any choices the bidders make among A1, A2, A3. In this example, each bidder
has 2 qubits, so can represent 4 states, which we denote as |0〉 , . . . , |3〉. The
states for both bidders are products of these individual states, |0, 0〉 , . . . , |3, 3〉.
Examining the 9 possible cases for U , shows a consistent set of choices is
d(|x, y〉) equal to the number of nonzero values among x, y. With this d(x),
the adiabatic search in the subspace selected by the bidders is identical to
the standard adiabatic search for two bits. This choice treats both bidders
identically.
In this case we see the auctioneer can arrange the adiabatic search to
operate symmetrically no matter what choice of subspace each bidder makes
(i.e., no matter what value each bidder decides to bid). Thus from the point
of view of the bidders, the search, in effect, takes place within the subspace
of possible values defined by their bid selections.
5.4.2 General Case
For arbitrary numbers of bidders n and bits p, we consider a single-item
auction so each bidder would, ideally, pick an operator giving just two terms,
with b(j) the bid of bidder j for the single item and no bits needed to specify
which item the bidder is interested in. The choice of b(j) corresponds to
the bidder picking a 2-dimensional subspace of the 2p possible states. The
product of these subspaces gives a subspace S of all np qubits used in the
auction. The subspace S has dimension 2n and its states xS can be viewed
as strings of n bits. More specifically, we suppose bidder j implements the
operator Uj such that the rows and columns corresponding to ∅ and b
have nonzero values only for positions ∅ and b(j). That is, the elements of
Uj for these two values form a 2× 2 unitary matrix.
If the auctioneer knew the subspace S, the eigenvalue function d(x) used
in Eq. (6) could be selected to match any desired function dS(xS) of the
states in xS ∈ S. Without such knowledge, this is possible only for some
choices for dS .
Theorem 3. Provided dS(xS) depends only on the Hamming weight of the
states xS, a single choice of d(x) in the full space corresponds to dS(xS) in
all possible subspaces the bidders could select that include the null set.
Proof. Consider the full operator U given by Eq. (1). For the element Ux,y,
express the np bits defining the states x and y as sequences of p-bit values,
x1, . . . , xn and y1, . . . , yn, respectively, with each xi and yi between 0 and
2p − 1. From Eq. (1),
Ux,y =
(Ui)xi,yi
The matrix U is of size 2np × 2np while each Ui is of size 2p × 2p.
Consider the first column of U , i.e., y = 0. Ux,0 is nonzero only for those
x such that all the (Ui)xi,0 are nonzero. For this to be the case, each xi is
either 0 (corresponding to |∅〉 for that bidder’s superposition) or xi = b(i),
i.e., the bid value. Similarly, for all columns with each yi equal to 0 or b
These values for x, y are precisely the states in the selected subspace of the
bidders, S. For these choices of xi, yi, we can map 0 (i.e., p bits all equal to
zero) to the single bit 0, and each b(i) (specified by values for p bits) to the
single bit 1. This establishes a one-to-one mapping from states in the full
space, of np bits corresponding to the product of bidders’ superpositions,
to states in the subspace treated as n-bit vectors. Thus a function dS(xS)
applied to the subspace that depends on the Hamming weight, i.e., the
number of 1 bits in xS, is the same as a function on the full space depending
on the number of nonzero xi values in x = x1, . . . , xn.
We must show that a single choice of function d(x) in the full space
maps to the desired dS(xS) in any choice of bidder subspaces. To see this
is the case, consider any state in the full space x = x1, . . . , xn. Among
these xi, suppose h are nonzero, denoted by xa1 , . . . , xah . This state x
will appear in all selected subspaces in which bidder aj bids b
(aj) = xaj ,
for j = 1, . . . , h, and the remaining bidders have any choice of bid. That
is, x appears in (2p − 1)n−h possible subspaces S. Since x has exactly h
nonzero values, in each of these possible subspaces it maps to a state xS
with exactly h bits equal to 1, i.e., it has the same Hamming weight, h,
in all possible subspaces in which it appears. Thus any choice of function
dS(xS) depending only on the Hamming weight of xS will have the same
value in all these possible subspaces. This observation allows the auctioneer
to select that common value as the value for d(x), consistently giving the
desired eigenvalue function for any possible subspace. Since this holds for
all values of h, the auctioneer can operate in the full space with identical
search behavior no matter what subspace the bidders select.
For the auctioneer to operate without knowledge of the actual subspace
selected by the bidders and treat bidders identically, we need d(x) to map
to the same function on any subspace selected. In this case, the search
proceeds exactly as if the auctioneer did know the subspace choices made
by the bidders. The theorem gives one type of function for in which this is
the case. In particular, Eq. (11) is an example of a function satisfying this
theorem.
6 Multiple Items and Combinatorial Auction
While the paper focuses on the single item first-price sealed-bid auction, the
quantum protocol can apply to multiple items by changing the interpretation
of the bids, i.e., the bidding language. Such changes affect the counting of
deviation and feasible states, so we must check the validity of Theorem 1.
In the single item case, each bidder uses the p qubits to specify the bid
amount. With multiple items, the bid must specify both the items of interest
and a bid amount for the items. Various bidding languages can encode this
information.
For multiple items, we divide the p qubits allocated to each bidder into
two parts: pitem bits to denote a bundle of items and pprice bits to denote bid
value (so p = pitem+pprice). Since qubits are expensive, a succinct represen-
tation of items is best. Depending on the type of auction, we have various
choices with different efficiency in using bits. For example, the pitem item
bits could indicate the item in the bid, allowing pitem qubits to specify up to
2pitem different items. Another case is multiple units of a single item, so pitem
could specify how many units a bidder wants (with the understanding the
bid is for all those units not a partial amount) so the bits could specify 2pitem
different numbers. In the general case, bids are on arbitrary sets of items or
bundles, and we represent a bundle with m bits, 1 if the corresponding item
is a part of the bundle and 0 otherwise, i.e., m = pitem. We focus on this
general case in the remainder of the section. Allowing bids on sets of items
is called a combinatorial auction [6].
With multiple items, the bid operator ψj = Ujψinit gives a superposition
of bids of the form
i , b
where b
i is bidder j’s bid for a bundle of
items I
. In this notation, the null bid is |∅, b〉, and the specified amount
b is irrelevant so we take it to be zero in the examples. A superposition
specifies a set of distinct bids, with at most one allowed to win.
Example 4. Consider a combinatorial auction with two items X, Y and
integer prices ranging from 0 to 3. With p = 4 bits for each bidder, using 2
bits each to specify item bundles and prices, is sufficient to specify the bids.
The full space for a bidder has dimension 2p = 16, consisting of 4 possible
item bundle choices and 4 price choices. Suppose a bidder places a bid
(|∅, 0〉+ |X, 1〉 + |(X,Y ), 2〉)
i.e., a bid of 1 for item X alone, and 2 for the bundle of both items. In this
case, the bidder is not interested in item Y by itself. The dimension of the
subspace of this bid is 3. Another example is the bid
(|∅, 0〉 + |X, 1〉 + |X, 3〉 + |(X,Y ), 4〉)
The dimension of the subspace is 4. This superposition has multiple bids on
the same item X.
This bidding language is both expressive and compact. For instance, a
superposition of bundles of items readily expresses exclusive-or preferences,
where a bidder wants at most one of the bundles. It is also compact because
superpositions allow the bidder to use exactly the same qubits to place
no bid (i.e., ∅) and to place all the exponential number of bundles in a
combinatorial auction.
An allocation, as defined in Sec. 3.1, is a list of bids, one from each
bidder. With multiple items, an allocation is feasible if the item sets are
pairwise disjoint. As in the single item case, we consider the allocation
when all item sets are empty as infeasible. The value of a feasible allocation
is the sum total of the bid values of the different bids in the allocation. The
number of feasible states is ((n + 1)m − 1)2npprice . This is because we can
assign m items among n bidders where all items need not be allocated in
(n+1)m ways. The factor n+1 allows for some items to remain unallocated.
Since the allocation when all bidders place the null bid is an infeasible state,
we subtract 1. Each bidder can specify 2pprice different prices for the bundle
giving 2npprice possible choices for n bidders. Note that the number of feasible
states for a single item, m = 1, is different from that in Sec. 5.2 because
here we have changed the bidding language to represent items also.
The null bid in our protocol simplifies the evaluation of allocations for
combinatorial auctions. To see this, consider a protocol without the null bid.
In a single item case, F (x) for any allocation vector x would be maximum of
the bids placed by the different bidders on the item, which is fairly easy to
compute. But in the case of multiple items, there could be several allocations
for a vector x. For example suppose Alice bids on the set {A,B} and Bob
bids on {B,C}. Without the null set then both bids appear in the same
state and have to be evaluated by F (x). The possible allocation to the
bidders are
1. none to either
2. {A,B} to Alice
3. {B,C} to Bob, and
4. {A,B} to Alice and {B,C} to Bob (which is infeasible)
F (x) will have to compute the maximum of the values in all these states.
This is computationally complex when there are many items. By contrast,
the bidding language with the null bid avoids this combinatorial evaluation
within the search function F (x).
As in the case of single item auctions, we restrict ourselves to a one-shot
sealed bid classical combinatorial auction that we implement in a quantum
setting. The total number of states is 2pn and the total number of single
bidder deviations states is n(2p − 1). These expressions are the same as the
single item case. The condition for all single-deviation states to be mapped
to infeasible allocations, resulting in no winner, is
2np − ((n + 1)m − 1)2npprice ≥ n(2p − 1) (13)
This condition holds for cases relevant for auctions as seen in the following
claim.
Claim 2. Eq. (13) is true for all integers m, pprice ≥ 1 and n ≥ 2.
Proof. Recall p = m + pprice. We prove a stronger condition for integers
n,m ≥ 2, i.e., there exists enough infeasible states to handle joint deviations
up to n − 1 bidders. The number of k-bidder deviation states is the same
as the single-item case, i.e., Eq. (8). Thus this stronger condition, with the
same right-hand side as Eq. (10), is
2np − ((n + 1)m − 1)2npprice ≥ 2np − 1− (2p − 1)n (14)
Hence Eq. (14) is true if
(2p − 1)n ≥ ((n+ 1)m − 1)2npprice
⇔ 2ppricen(2m − 2−pprice)n ≥ ((n+ 1)m − 1)2npprice
⇔ (2m − 2−pprice)n ≥ (n+ 1)m − 1
Since 2−pprice ≤ 1, Eq. (14) is true if
(2m − 1)n ≥ (n+ 1)m − 1
which is true if
(2m − 1)
m ≥ (n+ 1)
Let f(m) ≡ (2m − 1)
m and g(n) ≡ (n + 1)
n . We establish the required
inequality, f(m) ≥ g(n), by showing f(m) is increasing in m when m ≥ 2,
g(n) is decreasing in n when n ≥ 2 and noting f(2) = g(2) =
Taking the derivative of f(m) with respect to m, we get,
(2m − 1)
2m ln(2)
2m − 1
ln(2m − 1)
This is positive if and only if
2m − 1
log2(2
m − 1)
This is true because log2(2
m − 1) < log2(2m) = m and hence both fractions
in the expression are greater than 1. Thus, f(m) is increasing for all m ≥ 2.
Taking derivative of g(n) with respect to n, we get,
(n+ 1)
1 + n
− ln(1 + n)
This is negative if and only if
ln(1 + n)
1 + n
This is true for n ≥ 2. Thus g(n) is decreasing in n for n ≥ 2.
Thus we have shown that Eq. (13) is true for n,m ≥ 2. It can be easily
checked that Eq. (13), is not true for n = 1 and true when n = 2 and
m = 1.
Thus, if a classical combinatorial auction has a NE then the correspond-
ing quantum auction protocol also has a NE with respect to initial state
deviations. Also there is an ǫ-equilibrium of the same auction with an im-
perfect search using N search steps. Moreover, the stronger condition of
Eq. (14) shows that in auctions with at least two bidders (n > 1), there are
enough infeasible states to give no winner for any deviation of initial ampli-
tudes that up to n− 1 can produce. Thus no groups, up to size n− 1, can
collude to benefit from initial amplitude deviations in the quantum auction.
7 Applications of Quantum Auctions
Two properties of quantum information may provide benefits to auctioneers
and bidders: the ability to compactly express complicated combinations of
preferences via superpositions and entanglement and the destruction of the
quantum state upon measurement. This section describes some economic
scenarios that could benefit from these properties.
As one economic application, quantum auctions provide a natural way to
solve the allocative externality problem [18, 25]. In this situation, a bidder’s
value for an item depends on the items received by other bidders. For
example, consider companies bidding on a big government project requiring
multiple companies to work on different parts. Allocative externality refers
to the issue that the costs for a company which wins a contract for one part
depends on which other companies win other parts. So company A may
be willing to bid more aggressively if it knows that company B will work
on related parts. Multiple simultaneous auctions for separate parts will not
handle these interdependencies and thus will be inefficient. One possible
solution is to let companies form partnership bids. That is joint bids that
are accepted together or not at all. Quantum information processing allows
for a natural way of forming partnership bids via entanglement. With the
protocol described in Sec. 6, multiple bids can be entangled so they will
either all be accepted together or none will be. Furthermore, quantum
auctions may provide more flexibility with respect to information privacy of
partnership bids than classical methods.
Specifically, with multiple items, groups of bidders could select joint op-
erators on their combined qubits, allowing them to express joint constraints
(e.g., where they either all win their specified items or none of them do)
without any of the other bidders or auctioneer knowing this choice. The
bidders do so by creating an entangled state instead of the factored form
for their qubits. Thus employing quantum entanglement provides bidders a
natural way for expressing any allocative externality. This possibility shows
bidding languages based on qubits are highly expressive and compact be-
cause bidders can use the same bits to express their individual bids and joint
bids via entanglement.
Example 5. Alice and Bob could jointly form the state
(|∅, 0,∅, 0〉 + |IA, bA, IB , bB〉+ |IC , bC , ID, bD〉) (15)
to represent the bidders willing to pay bA and bB for items IA and IB, or
to pay bC and bD for items IC and ID, but they are not willing to buy other
combinations, such as IA for Alice and ID for Bob.
In this scenario, a direct representation of bids, i.e., without a null bid,
would not guarantee the joint preferences are satisfied for all entangled bid-
ders or none of them. That is, without null bids, the superposition could
not express the joint preference through entanglement.
A group of k bidders operating jointly on their qubits to form entangled
bids could also produce initial amplitudes involving up to k-bidder deviation
states. However the discussion with Eq. (14) on multiple item auctions shows
our protocol can handle all deviation states a group of up to n−1 bidders can
produce, i.e., by mapping them to infeasible outcomes. Thus the additional
expressivity used for joint bids does not introduce additional opportunities
for collusion to change the outcomes via initial amplitude selection.
A second economic application for quantum auctions arises from their
privacy guarantee for losing bids. This property is economically useful when
bidders have incentives to hide information. An example is a scenario in
which companies are bidding for government contracts year after year. A
company’s bid usually contains information about its cost structure. If there
is reasonable expectation that the losing bids will be revealed, a company
may want to bid less aggressively to reduce the amount of information passed
to its competition for use in future auctions. This will lead to a less efficient
auction than if bidders reveal their true values. In this situation, a privacy
guarantee on the losing bids enables bidders to bid with less inhibition.
More generally, this privacy issue is only relevant when there are additional
interactions between these companies after the auction is concluded, such as
future auctions or negotiations where participants may be at a disadvantage
if their values are known to others.
This strong privacy property is unique to quantum information process-
ing. Privacy can be enforced via cryptographic methods for multi-player
computation [13], and in an auction can keep losing bids secret [22]. How-
ever, the information on the bids, and the key to decrypt them, remains
after the auction completes. People who have access to the key may be
legally compelled to reveal the information or choose to sell it. So while
cryptography can be secured computationally, it cannot guarantee the in-
tegrity of the person(s) who have the means to decrypt the information.
On the other hand, the quantum method destroys losing bids during the
search for the winning one and it is physically impossible to reconstruct
the bids after the auction process. Similarly, some of the other properties
of quantum auctions, such as correlations for partnership bids, can be pro-
vided classically [19]. Moreover, quantum mechanisms are readily simulated
classically [27] (as long as they involve at most 20 to 30 qubits). However,
these classical approaches lack the information security of quantum states.
More study is needed to determine scenarios where the privacy property of
the quantum protocol is significant.
8 Discussion
This paper describes a quantum protocol for auctions, gives a game theory
analysis of some strategic issues the protocol raises and suggests economic
scenarios that could benefit from these auctions. These include the privacy
of bids and the possibility of addressing allocative externalities. The search
used in our protocol can use arbitrary criteria for evaluating allocations,
thereby implementing other types of auctions with quantum states. Thus
while we focus our attention on the first-price sealed-bid auction, the pro-
tocol is more general: it can implement other pricing and allocation rules,
as well as multiple-unit-multiple-item auctions with combinatorial bids. For
example we can use this protocol in a multiple stage, iterative auction. In
fact, the protocol supports general bidding languages.
Encoding bids in quantum states raises new game theory issues because
the bidders’ strategic choices include specifying amplitudes in the quantum
states. The auction is not only probabilistic, but the winning probability
is not just a function of the amount bid. Instead a bidder can change the
probability of winning by altering the amplitudes of the quantum states
encoding his bid. For example, in the context of the first-price sealed-bid
auction, the auction does not guarantee the allocation of the item to the
highest bidder.
We show that the correct design of the protocol can solve a specific
version of this incentive problem. The salient design feature is an incentive
compatible mechanism so that bidders do not want to cheat, as opposed to
an algorithmic secure protocol that prevents bidders from cheating. Thus,
our design is an example of a quantum algorithm, in this case adiabatic
search, tuned to improve incentive issues rather than the usual focus in
quantum information processing on computation or security properties of
algorithms.
In addition, we show that the Nash equilibrium of the corresponding
classical first-price sealed-bid auction is an ǫ-equilibrium of the quantum
auction and that ǫ converges to zero when the quantum search associated
with the protocol uses an increasing number of steps, under the conditions
listed in Theorem 1. This result is with respect to changes in the initial
state of the search. It remains to be seen whether other bidder strategies
give some unilateral benefit, requiring further adjustments to the auction
design.
There are multiple directions for future work. First, we plan a series
of human subject experiments on whether people can indeed bid effectively
in the simple quantum auction scenario described in this paper. As with
previous experiments with a quantum public goods mechanism [2], such ex-
periments are useful tests of the applicability of game theory in practice,
and also suggest useful training and decision support tools. In particular,
people’s behavior in a quantum auction could differ from game theory predic-
tions that people select a Nash equilibrium based on idealized assumptions
of human rationality and full ability to evaluate consequences of strategic
choices with uncertainty.
Second, we plan to extend studies of quantum auctions to more com-
plicated economic scenarios, such as one with allocative externality. Our
analysis considers a single auction. An interesting extension is to a series of
auctions for similar items. If auctions are repeated, the game theory anal-
ysis is more complicated [28]. In particular, privacy concerns become more
significant since information revealed by a bidder’s behavior in one auction
may benefit other bidders in later auctions.
The quantum auction destroys all information about the losing bids. As
a result, it is not possible to conduct after-the-fact audits to verify that
the auction has been conducted correctly. Is there a way to modify the
mechanism to enable audits while preserving some of the privacy guaran-
tees? Security is another interesting issue. For example, there may be third
parties, aside from the auctioneer and bidders who are interested in inter-
cepting and changing bits in transit. Auctioneers may have incentives to
detect a bidder’s bid or skew auction results. The question is whether we
can build security around the protocol to prevent or at least detect these
types of attacks.
Similarly, many economics issues surrounding the protocol remain to be
resolved. For example, people behave as if they are risk averse in auction
situations [5, 4] which can change the predictions of game theory. Another
issue arises from the possibility of multiple Nash equilibria. We have only
shown that the desirable outcome is an equilibrium. The quantum protocol
can also have other equilibria. Since the Nash equilibrium concept alone
does not indicate how people select one equilibrium over another, additional
study is needed to determine when the desirable outcome is likely to occur.
Our protocol makes only limited use of quantum states, in particular
encoding bids in the subspace selected by the bidders but not using the
amplitudes separately. Thus it would be interesting to examine extensions to
the protocol exploiting the wider range of options for bidders. For example,
a protocol might use amplitudes of superpositions to indicate a bidder’s
probabilistic preferences, say, as in constructing a portfolio of items with
various expected values and risks. Such portfolios could be useful if bidders
have some uncertainty in their values (e.g., in bidding for oil field exploration
rights) rather than the standard private value framework considered in this
paper, where bidders know their own values for the items. With uncertain
values, probabilistic bids could allow bidders to match their risk preferences
along with their value estimates within the auction process.
As a final note, the number of qubits necessary to conduct an auction is
small compared to the requirement of complex computations such as factor-
ing. For example, if each bidder uses 7 bits (corresponding to 27 or about
100 bid values) and there are 3 bidders, about 25 qubits are needed, consid-
erably less than thousands needed for factoring interesting-sized numbers.
Thus with the advancement of quantum information processing technologies,
economics mechanisms could be early feasible applications.
Acknowledgments
We have benefited from discussions with Raymond Beausoleil, Saikat Guha, Philip
Kuekes, Andrew Landahl and Tim Spiller. This work was supported by DARPA
funding via the Army Research Office contract #W911NF0530002 to Dr. Beau-
soleil. This paper does not necessarily reflect the position or the policy of the
Government funding agencies, and no official endorsement of the views contained
herein by the funding agencies should be inferred.
References
[1] Michel Boyer, Gilles Brassard, Peter Hoyer, and Alain Tapp. Tight bounds
on quantum searching. In T. Toffoli et al., editors, Proc. of the Workshop on
Physics and Computation (PhysComp96), pages 36–43, Cambridge, MA, 1996.
New England Complex Systems Institute.
[2] Kay-Yut Chen and Tad Hogg. How well do people play a quantum prisoner’s
dilemma? Quantum Information Processing, 5:43–67, 2006.
[3] Kay-Yut Chen, Tad Hogg, and Raymond Beausoleil. A quantum treatment of
public goods economics. Quantum Information Processing, 1:449–469, 2002.
arxiv.org preprint quant-ph/0301013.
[4] Kay-Yut Chen and Charles R. Plott. Nonlinear behavior in sealed bid first
price auctions. Games and Economic Behavior, 25:34–78, 1998.
[5] James C. Cox, Vernon L. Smith, and James M. Walker. Theory and individual
behavior of first-price auctions. Journal of Risk and Uncertainty, 1:61–99,
1988.
[6] Peter Cramton, Yoav Shoham, and Richard Steinberg, editors. Combinatorial
Auctions. MIT Press, 2006.
[7] Jiangfeng Du et al. Entanglement enhanced multiplayer quantum games.
Physics Letters A, 302:229–233, 2002. arxiv.org preprint quant-ph/0110122.
[8] Jiangfeng Du et al. Experimental realization of quantum games on a quan-
tum computer. Physical Review Letters, 88:137902, 2002. arxiv.org preprint
quant-ph/0104087.
[9] Christoph Durr and Peter Hoyer. A quantum algorithm for finding the mini-
mum. arxiv.org preprint quant-ph/9607014, 1996.
[10] J. Eisert, M. Wilkens, and M. Lewenstein. Quantum games and quantum
strategies. Physical Review Letters, 83:3077–3080, 1999. arxiv.org preprint
quant-ph/9806088.
[11] Jens Eisert and Martin Wilkens. Quantum games. J. Modern Optics, 47:2543–
2556, 2000. arxiv.org preprint quant-ph/0004076.
[12] Edward Farhi et al. A quantum adiabatic evolution algorithm applied to
random instances of an NP-complete problem. Science, 292:472–476, 2001.
[13] O. Goldreich. Secure multi-party computation. working draft version 1.1, 1998.
Available at philby.ucsd.edu/cryptolib/books.html.
[14] Lov K. Grover. Quantum mechanics helps in searching for a needle in a
haystack. Physical Review Letters, 79:325–328, 1997. arxiv.org preprint
quant-ph/9706033.
[15] Tad Hogg. Quantum search heuristics. Physical Review A, 61:052311, 2000.
Preprint at publish.aps.org/eprint/gateway/eplist/aps1999oct19 002.
[16] Tad Hogg. Adiabatic quantum computing for random satisfiability problems.
Physical Review A, 67:022314, 2003. arxiv.org preprint quant-ph/0206059.
http://arxiv.org/abs/quant-ph/0301013
http://arxiv.org/abs/quant-ph/0110122
http://arxiv.org/abs/quant-ph/0104087
http://arxiv.org/abs/quant-ph/9607014
http://arxiv.org/abs/quant-ph/9806088
http://arxiv.org/abs/quant-ph/0004076
http://arxiv.org/abs/quant-ph/9706033
http://arxiv.org/abs/quant-ph/0206059
[17] Bernardo A. Huberman and Tad Hogg. Quantum solution of coordination
problems. Quantum Information Processing, 2:421–432, 2003. arxiv.org
preprint quant-ph/0306112.
[18] Philippe Jehiel and Benny Moldovanu. Allocative and informational external-
ities in auctions and related mechanisms. Technical Report SFB/TR 15 142,
Free University of Berlin. available at ideas.repec.org/p/trf/wpaper/142.html
LOCATION =.
[19] David A. Meyer. Quantum communication in games. In S. M. Barnett et al.,
editors, Quantum Communication, Measurement and Computing, volume 734,
pages 36–39. AIP Conference Proceedings, 2004.
[20] Paul R. Milgrom and Robert J. Weber. A theory of auctions and competitive
bidding. Econometrica, 50:1089–1122, 1982.
[21] Pierfrancesco La Mura. Correlated equilibria of classical strategic games with
quantum signals. arxiv.org preprint quant-ph/0309033, Sept. 2003.
[22] Moni Naor, Benny Pinkas, and Reuben Sumner. Privacy perserving auctions
and mechanism design. In Proc. of the ACM Conference on Electronic Com-
merce, pages 129–139, NY, 1999. ACM Press.
[23] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum
Information. Cambridge Univ. Press, 2000.
[24] Roy Radner. Collusive behavior in noncooperative epsilon-equilibria of
oligopolies with long but finite lives. J. of Economic Theory, 22:136–154,
1980.
[25] Martin Ranger. The generalized ascending proxy auction in the presence of
externalities. Technical report, Social Science Research Network, July 2005.
available at ssrn.com/abstract=834785.
[26] Peter W. Shor. Algorithms for quantum computation: Discrete logarithms and
factoring. In S. Goldwasser, editor, Proc. of the 35th Symposium on Founda-
tions of Computer Science, pages 124–134, Los Alamitos, CA, November 1994.
IEEE Press.
[27] S. J. van Enk and R. Pike. Classical rules in quantum games. Physical Review
A, 66:024306, 2002.
[28] Robert Wilson. Strategic analysis of auctions. In Robert Aumann and Sergiu
Hart, editors, Handbook of Game Theory with Economics Applications, vol-
ume 1. Elsevier, 1992. Chapter 8.
http://arxiv.org/abs/quant-ph/0306112
http://arxiv.org/abs/quant-ph/0309033
	Introduction
	Quantum Auction Protocol
	Quantum Auction Implementation
	Creation and interpretation of quantum bids
	Distributed Search
	Strategies with Quantum Operators
	Selecting the Subspace
	Altering Initial Amplitudes
	Changing Operator During Search
	Quantum Auction Design
	Checking for the Null Set
	The First-Price Sealed-Bid Auction
	A Game Theory Approach to Auctions
	Behavior with Perfect Search
	Bounded Number of Search Steps
	Testing for Changed Operators During Search
	Assigning Eigenvalues to Subspaces
	An Example
	General Case
	Multiple Items and Combinatorial Auction
	Applications of Quantum Auctions
	Discussion
ABSTRACT
  We present a quantum auction protocol using superpositions to represent bids
and distributed search to identify the winner(s). Measuring the final quantum
state gives the auction outcome while simultaneously destroying the
superposition. Thus non-winning bids are never revealed. Participants can use
entanglement to arrange for correlations among their bids, with the assurance
that this entanglement is not observable by others. The protocol is useful for
information hiding applications, such as partnership bidding with allocative
externality or concerns about revealing bidding preferences. The protocol
applies to a variety of auction types, e.g., first or second price, and to
auctions involving either a single item or arbitrary bundles of items (i.e.,
combinatorial auctions). We analyze the game-theoretical behavior of the
quantum protocol for the simple case of a sealed-bid quantum, and show how a
suitably designed adiabatic search reduces the possibilities for bidders to
game the auction. This design illustrates how incentive rather that
computational constraints affect quantum algorithm choices.

<|endoftext|><|startoftext|>
Geometric Phase and Superconducting Flux Quantization
Geometric Phase and Superconducting Flux Quantization 
Walter A. Simmons & Sandip S. Pakvasa 
Department of Physics and Astronomy 
University of Hawaii at Manoa 
Honolulu, Hi  96822 
Abstract 
In a ring of s-wave superconducting material the magnetic flux is 
quantized in units of 0 2
Φ = .  It is well known from the theory of 
Josephson junctions that if the ring is interrupted with a piece of d-
wave material, then the flux is quantized in one-half of those units 
due to a additional phase shift of π  .  We reinterpret this 
phenomenon in terms of geometric phase. 
We consider an idealized hetero-junction superconductor with pure s-
wave and pure d-wave electron pairs. We find, for this idealized 
configuration, that the phase shift of π  follows from the discontinuity 
in the geometric phase and is thus a fundamental consequence of 
quantum mechanics. 
Geometric phase has been contained in quantum mechanics since 
the foundations of the field were set down in the early twentieth 
century; however, the phase and its importance were not recognized 
for some time.  Pancharatnam1 discovered the classical geometric 
phase in optics in 1956 and Berry’s important 1987 quantum 
mechanics paper2 stimulated the rapid development of the field.  By 
1992, Anandan3, in a review article in Nature, was able to conclude 
that the phase had been convincingly demonstrated.  The first 
application of geometric phase to Josephson Junctions was carried 
out by Anandan and Pati in 1997.  They showed that the zero voltage 
tunneling supercurrent is geometric in nature and that it is 
proportional to the speed of the state vector in projective Hilbert 
space.4
The appearance of a phase discontinuity of π±  arising from 
geometric phase, under certain circumstances, was shown5 to be a 
general feature of quantum mechanics in 2003, but has so far found 
only limited application6.   Here we show that a well known 
phenomenon7,8 in superconductivity, the quantization of magnetic 
flux in one half of the usual unit9, which is 0 2
Φ = , can be interpreted 
as an effect of the discontinuity in geometric phase.  This phase shift 
in superconductors has been understood in terms of the physics of 
the Josephson junctions and the result has been applied to high 
temperature superconductors10 in order to test the idea that they 
involve d-wave electron pairs.   Our application considers an 
idealized hetero-junction superconductor with pure s-wave and pure 
d-wave electron pairs.  We find, for this idealized configuration, that 
the phase shift of π  follows from the discontinuity in the geometric 
phase4 and is thus a fundamental consequence of quantum 
mechanics. 
Applications of quantum geometric phase have been made in nearly 
every branch of physics, from fundamental material science11,12 to 
quantum computing with superconducting nanocircuits13, as well as 
in chemistry14, and it has been suggested that phase may become 
important in biology15.   
Since the phase has been long present, but not fully recognized, 
some applications entail reinterpreting known phenomena in terms of 
the phase.  An important and illustrative example is the 
reinterpretation of the so called Guoy effect in optics as a geometrical 
phase16, which we will summarize below.   
An idealized superconducting ring, which consists of a composite of 
s-wave material and d-wave material, will, in the absence of external 
electromagnetic fields, exhibit quantization of the magnetic flux in 
units of one half of the usual unit, 0 2
Φ = .  This half-unit quantization 
of the magnetic flux will occur whenever there is an odd number of 
phase shifts of magnitude π  in the circuit.  The theoretical argument6 
for the π  phase shift in a composite ring was based upon the 
dynamics of the Josephson junction and on thermodynamics, and 
has experimental support.   
We next explain the quantum mechanics of the phase shift and then 
we shall proceed to reinterpret the s-wave/d-wave superconductor 
hetro-junction. 
It has long been known17 that when a light beam converges to a 
focus, then diverges again, the light experiences a phase shift whose 
magnitude depends upon the details.  For example, for a beam with a 
Gaussian profile and a very small waistline at the focus, an abrupt 
phase shift of 
π  occurs for each of the two transverse directions, for 
a total phase shift of π .  This result, which follows from standard 
classical electrodynamics, has been reinterpreted in terms of 
geometric phase15.  For one transverse direction, the complex 
curvature of the wave reverses at the focus; the geometric phase, 
which is directly related to the curvature, changes by 
π .   
That optical example of geometric phase is closely analogous to a 
well-known18 phase flip of π± , which occurs in optics when the 
polarization of a light beam is rotated from some initial state, through 
and beyond, another polarization state that is orthogonal to the initial 
state. 
The latter example of the geometric phase discontinuity has been 
shown to occur rather generally in quantum mechanics4.  This can be 
understood by considering the behavior of a complex quantum state 
vector as it is impelled through a series of states in Hilbert space by 
an external force.  Suppose the initial state is iΨ , the final state is 
fΨ , and some intermediate state 0Ψ  is orthogonal to the initial 
state, 0 0iΨ Ψ = . In the complex plane, the trajectory is a sequence 
of projections of each state upon its subsequent state.  The trajectory 
from the initial to the final state passes through the origin (with a 
positive or negative infinitesimal imaginary part); the phase goes 
through an inverse-tangent singularity and changes by π± .   
Finally, turning to superconductivity, we adopt the theoretical 
framework19, in which superconductivity is viewed as a consequence 
of the breaking of gauge invariance entailing the formation of Cooper 
pairs.  We consider a ring of material in which the supercurrent is 
carried by s-wave Cooper pairs.  The ring is interrupted by a section 
of material in which d-wave Cooper pairs carry the supercurrent.  The 
supercurrent passing through the idealized hetro-junction 
experiences a shift of π±  in the geometric phase due to the 
orthogonality of d-wave and s-wave.  Since the orientation of the d-
wave relative to the s-wave is not meaningful, we have no sum over 
dimensions as in the optical beam analogy; a shift of π±  is the result 
of a single inserted section of material and the half-unit magnetic flux 
quantization follows.   
Since the 1997 work of Anandan and Pati, it has been known that the 
zero-voltage current in a tunneling supercurrent arises from the 
geometry of Hilbert space and is independent of the specific 
Hamiltonian, (which is a general feature of geometric phase20).   
Recently, experiments on hetero-junctions have supported the idea 
that high  materials are d-wave superconductors.  While we are not 
discussing here realistic models of high materials here, our results 
show that a phase shift of 
π±  in s-wave/d-wave hetero-junctions 
arises from the fundamentals of quantum mechanics.   
                                      
1 Pancharatnam, S., The Proceedings of the Indian Academy of Sciences, Vol XLIV, No. 5, Sec. 
A, 247 (1956) in Collected Works of S. Pancharatnam, Oxford University Press, London (175). 
                                                                                                                
2 Berry, M.V. “Quantal Phase Factors Accompanying Adiabatic Changes”, Proc. R. Soc. Lond. 
A392, 45, (1984). 
3 Anandan, J. “The geometric phase”, Nature 360, 307 (1992). 
4  Anandan, J. & Pati, A.K., “Geometry of the Josephson effect” Physics Letters A 231, 29 (1997). 
5 Mukunda, et al “Bargmann invariants, null phase curves, and a theory of the geometric phase”, 
Phys. Rev. A 67, 042114 (2003). 
6 Simon, R. and Mukunda, N., “Bargmann Invariant and the Geometry of the Guoy Effect”, Phys. 
Rev. Letters 70, 880 (1993). 
7 Bulaevskii, L. N. , Kuzii, V. V. & Sobyanin, A. A. Superconducting system with weak coupling to 
the current in the ground state. JETP Lett. 25, 290–294 (1977). 
8 Tsuei, C.C., and Kirtley, J.R. “Paring symmetry in cuprate superconductors” Rev. Mod. Phys 72, 
969 (2000).  For more recent results, see Kirtley, et al, Nature 373, 225 (2005) 
9 Ashcroft, N.W. & Mermin, N.D., Solid State Physics, Holt, Rinehart and Winston (1976). 
10 Hilgenkamp, H., Ariando, Smilde, H.-J. H., Blank, D. H. A., Rijnders, G., Rogalla, H., Kirtley, J. 
R., and Tsuei, C. C., “Ordering and manipulation of the magnetic moments in large-scale 
superconducting pi-loop arrays”, Nature 422, 50 (2003). 
11 Zak, J. “Berry’s Phase for Energy Bands in Solids”, Phys. Rev. Lett. 62, 2747 (1989). 
12 Resta, R. “Manifestations of Berry’s phase in molecules and condensed matter”, 
J.Phys.Condens. Matter 12, R107 (2000). 
13 Falci,G., Fazio, R., Palma, G.M., Siewert, J., and Vedral, V. “Detection of geometric phases in 
superconducting nanocircuits”, Nature 407, 355 (2000). 
14 Mead, C.A. “The geometric phase in molecular systems”, Rev. Mod. Phys. 64, 51 (1992). 
15 Kagan, M.L., Kepler, T. B.  & Epstein, I.R., “Geometric phase shifts in chemical oscillators”, 
Nature 349, 506 (1991). 
16 Simon, R. and Mukunda, N., “Bargmann Invariant and the Geometry of the Guoy Effect”, Phys. 
Rev. Letters 70, 880 (1993). 
17 Siegman, A.E., Lasers, University Science Books, Mill Valley, California (1986). 
18 Bhandari, R. “Polarization of light an the topological phases”, Physics Reports, 281, 1 (1997). 
19 Weinberg, S., Quantum Theory of Fields II, Cambridge University Press (1996).   
20 Aharonov, Y. & Anandan, J., “Phase Change during a Cyclic Quantum Evolution”, Phys. Rev. 
Lett 58, 1593 (1987). 
ABSTRACT
  In a ring of s-wave superconducting material the magnetic flux is quantized
in units of $\Phi_0 = \frac{h}{2e}$. It is well known from the theory of
Josephson junctions that if the ring is interrupted with a piece of d-wave
material, then the flux is quantized in one-half of those units due to a
additional phase shift of $\pi$. We reinterpret this phenomenon in terms of
geometric phase.
  We consider an idealized hetero-junction superconductor with pure s-wave and
pure d-wave electron pairs. We find, for this idealized configuration, that the
phase shift of $\pi$ follows from the discontinuity in the geometric phase and
is thus a fundamental consequence of quantum mechanics.

<|endoftext|><|startoftext|>
Equation-Free Implementation of Statistical Moment Closures
Francis J. Alexander and Gregory Johnson
Los Alamos National Laboratory, P.O.Box 1663,
Los Alamos, NM, 87545.
Gregory L. Eyink
Department of Mathematical Sciences
Johns Hopkins University
Baltimore, MD 21218
Ioannis G. Kevrekidis
Department of Chemical Engineering and PACM
Princeton University
Princeton, NJ 08544
We present a general numerical scheme for the practical implementation of statistical moment clo-
sures suitable for modeling complex, large-scale, nonlinear systems. Building on recently developed
equation-free methods, this approach numerically integrates the closure dynamics, the equations of
which may not even be available in closed form. Although closure dynamics introduce statistical
assumptions of unknown validity, they can have significant computational advantages as they typi-
cally have fewer degrees of freedom and may be much less stiff than the original detailed model. The
closure method can in principle be applied to a wide class of nonlinear problems, including strongly-
coupled systems (either deterministic or stochastic) for which there may be no scale separation. We
demonstrate the equation-free approach for implementing entropy-based Eyink-Levermore closures
on a nonlinear stochastic partial differential equation.
PACS numbers:
INTRODUCTION
Accurate, fast simulations of complex, large-scale, non-
linear systems remain a challenge for computational sci-
ence and engineering, despite extraordinary advances in
computing power. Examples range from molecular dy-
namics simulations of proteins [1], [2] and glasses [3], to
stochastic simulations of cellular biochemistry [4, 5], to
global-scale, geophysical fluid dynamics [6]. Often for the
systems under consideration there is no obvious scale sep-
aration, and their many degrees of freedom are strongly
coupled. The complex and multiscale nature of these pro-
cesses therefore makes them extremely difficult to model
numerically. To make matters worse, one is often in-
terested not in a single, time-dependent solution of the
equations governing these processes, but rather in ensem-
bles of solutions consisting of multiple realizations (e.g.,
sampling noise, initial conditions, and/or uncertain pa-
rameters). Often real-time answers are needed (e.g., for
control, tracking, filtering). These demands can easily
exceed the computational resources available not only
now but also for the foreseeable future.
In principle, all statistical information for the problem
under investigation is contained in solutions to the Liou-
ville (if deterministic)/Kolmogorov (if stochastic) equa-
tions. These are partial differential equations in a state
space of high (possibly infinite) dimension. A straightfor-
ward discretization of the Liouville / Kolmogorov equa-
tions is therefore impractical. An ensemble approach to
solving these equations can be taken; however, quite of-
ten, the practical application of the ensemble approach
is also problematic. Generating a sufficient number of
independent samples for statistical convergence can be a
challenge. For some problems, computing even one real-
ization may be prohibitive.
The traditional approach to making these prob-
lems computationally tractable is to replace the Liou-
ville/Kolmogorov equation by a (small) set of equations
(PDEs or ODEs) for a few, low order statistical moments
of its solution. When taking this approach for nonlinear
systems, one must make an approximation, a closure, for
the dependence of higher order moments on lower order
moments. Typically the form of the closure equation is
based on expert knowledge, empirical data, and/or phys-
ical insight. For example, in the superposition approxi-
mation and its extensions [7] for dense liquids and plas-
mas, both quantum or classical, one approximates third
order moments as functions of second order moments.
Moment closure methods of this type have been applied
to a number of areas including fluid turbulence (see [8]
and references therein, and also the work of Chorin et.
al.). Of course, as with any approximation strategy, the
quality of the resulting reduced description depends on
the approximations made – poor closures lead to poor
answers/predictions. In addition to replacing the ensem-
ble with a small set of equations for low order moments,
these equations are typically easier to solve. They are
deterministic and generally far less stiff than the original
http://arxiv.org/abs/0704.0804v1
equations.
A less exploited variant of this approximation scheme
is the probability density function (PDF) based moment-
closure approach. For PDF moment closures one makes
an ansatz for the system statistics guided by available in-
formation (e.g., symmetries). One then uses this ansatz
in conjunction with the original dynamical equations to
derive moment equations. Such PDF-based closures have
been developed for reacting scalars advected by turbu-
lence [10], phase-ordering dynamics [11] and a variety of
other systems. This approach to moment-closure is a
close analogue of the Rayleigh-Ritz method frequently
used in solving the quantum-mechanical Schroedinger
equation, by exploiting an ansatz for the wave-function.
For a formal development of this point of view, see [12].
One of the obstacles to applying moment closures is
that often the closure equations are too complicated to
write down explicitly, even with the availability of com-
puter algebra / symbolic computation systems. This
is especially true for large-scale, complex systems, e.g.
global climate models. Because of their great complexity,
even if one could in principle derive the closure equations
analytically, this procedure would be extremely difficult
and time-intensive. Moreover, each time a model is up-
dated, as climate and ocean models regularly are, the
closure equations would have to be rederived. In other
cases it may simply be impossible to determine the clo-
sure equations analytically. This is especially likely when
PDF’s are not Gaussian, which is the case for most use-
ful closures. Monte Carlo or other numerical methods
may be needed in order to evaluate integrals for the mo-
ments [13]. In addition, there may be situations where
neither analytic nor numerical/MC integration will yield
the closure equations due to the black-box nature of the
available numerical simulator such as a compiled numer-
ical code with an inaccessible source. Clearly, a need ex-
ists for a robust approach to the general closure protocol
which circumvents analytical difficulties.
We address that need here by combining PDF closures
with equation free modeling [14] [15]. The basic premise
of the equation-free method is to use an ensemble of short
bursts of simulation of the original dynamical system to
estimate, on demand, the time-evolution of the the clo-
sure equations that we may not explicitly have. The
equation-free approach extends the applicability of sta-
tistical closures beyond the rare cases where they can be
expressed in closed form. This hybrid strategy may be
faster than the brute-force solution of a large ensemble of
realizations of the dynamical equations since the closure
version is generally smoother than the original problem.
This paper is organized as follows. In Section 2 we
describe the general features of PDF-based moment clo-
sures. In Section 3 we explain how to implement the
equation-free approach with these closures. We then, in
Section 4, apply these ideas for a specific dynamical sys-
tem, the stochastic Ginzburg-Landau (GL) equations us-
ing a particular PDF-based closure scheme, the entropy
method of Eyink and Levermore [16]. We conclude with
a discussion of closure quality, computational issues, and
the application of our approach to large-scale systems.
PDF-BASED MOMENT CLOSURES
We consider the very general class of dynamical sys-
tems, including maps, formally represented by
Ẋ = U(X(t),N(t), t) (1)
Xt+1 = Ut(Xt,Nt) (2)
where N(t) is a stochastic process with prescribed statis-
tics. The stochastic component arises from unknown pa-
rameters, random forcing, neglected degrees of freedom
and/or random initial conditions. This class includes
both deterministic and stochastic systems with discrete
and/or continuous states. Queueing systems, molecular
dynamics, and stochastic PDEs are just some of the many
examples that fall into this category.
For concreteness in this paper we restrict ourselves to
a special case of equation (2), namely, situations where
N(t) is a Markov process (Brownian motion, Poisson pro-
cess, etc.) and—more specifically still—Itô stochastic dif-
ferential equations of the form:
dX = U(X, t)dt +
2S(X, t)dW(t). (3)
The deterministic component of the state, X, is gov-
erned by the continuously differentiable vector field, U :
N × R → RN . For many problems of interest (e.g., cli-
mate) U is a highly nonlinear function. The noise com-
ponent is modeled by the standard mean 0, covariance
matrix I Wiener process, W ∈ RN , possibly modulated
by a state-dependent matrix S : RN×R → RN×N . Equa-
tion (3) encompasses a wide class of systems including
deterministic (S = 0) ones.
In many cases one is interested in knowing the low
order statistics of equation (3), for example an instanta-
neous mean value or possibly multi-point covariance of
X. These statistics can be obtained by averaging over
an ensemble of stochastic systems, solving equation (3).
They can also be obtained via the forward Kolmogorov
equation for the probability density function P (X, t):
∂tP = L∗(t)P, (4)
where P satisfies the conditions: P (X, t) ≥ 0, and
P (X, t) dX = 1, and where L∗ is the generator of the
Markov process. In the case of equation (3) this operator
takes the form
L∗(t)ψ(X) = −∇X·(U(X, t)ψ(X))+∇2X : (D(X, t)ψ(X)).
The forward Kolmogorov equation then becomes a
Fokker-Planck equation
∂tP +∇X · (UP ) = ∇2X : (DP ) (6)
where D(X, t) = S(X, t)S(X, t)T is the nonnegative-
definite diffusion matrix arising from the noise term. Un-
like the original dynamical equation (3), the forward Kol-
morogov equation (FKE) is both linear and determinis-
tic. Dealing with it, therefore, has apparent advantages
over the original ensemble of stochastic systems simu-
lations. The price to pay for these advantages is that
the FKE lives in a typically high, potentially infinite-
dimensional, space. When equation (3) is a nonlinear
PDE, numerical solution to the FKE is usually ruled out.
For computational purposes, we would therefore like
to reduce the FKE (if possible and useful) to a small
system of ordinary differential equations. This reduc-
tion should simplify the computation as much as pos-
sible while retaining fidelity to the original dynamical
processes. The reduction proceeds by taking moments of
the FKE with respect to a vector-valued function ξ(X, t)
from RN × R+ → RM . The ξ selected should include
the relevant variables in the system (slow modes, con-
served quantities, etc.). The moments µ(t) of ξ(X, t) are
defined by
µ(t) =
ξ(X, t)P (X, t)dX (7)
and give rise to
µ̇(t) =
ξ̇(X, t)P (X, t)dX, (8)
where
ξ̇(X, t) = ∂tξ(X, t) + L(t)ξ(X, t) (9)
and L is the adjoint of L∗ or the backward Kolmogorov
operator. The result (8) can be obtained by averaging
over an ensemble of realizations of the stochastic dynam-
ics (3). In general, however, (8) is not a closed equa-
tion for the moments, µ. One can close this equation by
choosing a PDF, P (X, t,µ), which itself is a function of
the moments µ.
µ̇(t) = V(µ, t) ≡
ξ̇(X, t)P (X, t,µ)dX. (10)
Alternatively, one can select a family of probability den-
sities P (X, t,α), specified by parameters α = α(µ, t)
rather than directly by the moments µ. This is analogous
to specifying the temperature in the canonical ensemble
as opposed to the average energy. The equivalence of
these approaches is guaranteed provided that the param-
eters and moments can be determined uniquely from one
another. The translation between the parameters and
their corresponding moments can be carried out by one
of several methods. In some cases one may require Monte
Carlo evaluation of the resulting integrals.
If the moments and/or parameters are selected
judiciously, one hopes that the approximate PDF
P (X, t,α(µ)(t)) will be close to the exact solution of
the Liouville/Kolmogorov equation (4). The mapping
closure approach of Chen et al [10] and the Gaussian
mapping method of Yeung et al. [11] are based on this
type of parametric PDF closure [19]. In fact, perhaps
the most familiar application of the parametric approach
is the use of the Rayleigh-Ritz method in quantum me-
chanical calculations. This is the essential approach of
our paper.
EQUATION-FREE COMPUTATION
Although we now have obtained a closed moment equa-
tion (equation 10), we still need to determine the dynam-
ical vector field V. As explained above, this step can be a
serious obstacle to the practical implementation of PDF-
based moment-closure (PDFMC). A method to calculate
V is desirable that (i) does not require a radical revision
each time the underlying code or model changes, and (ii)
is relatively insensitive to the complexity of the PDFMC.
The equation-free approach of Kevrekidis and collabora-
tors [14] meets those requirements. It permits one to
work with much more sophisticated, physically realistic
closures.
Equation-free computation is motivated by the simple
observation that numerical computations involving the
closure equations ultimately do not require closed for-
mulae for the closure equations. Instead, one must only
be able to sample an ensemble of system states X dis-
tributed according to the closure ansatz P (X, t;α) and
then evolve each of these via equation (3) for short inter-
vals of time. Such sampling and subsequent dynamical
evolution would be necessary to calculate the statistics
of interest even when not using a closure strategy. It is
sufficient to have a (possibly black-box) subroutine avail-
able which, given a specific state variable X(t) as input,
returns the value of the state X(t + δt) after a short
time δt. The ensemble of systems, each of which satis-
fies equation (3), is evolved over a time interval δt. The
moments/parameters µ or α are determined at the be-
ginning and end of this interval and the time derivative
µ̇ is estimated from the results of these short ensemble
runs. This “coarse timestepper” can be used to estimate
locally the right hand side of the closure evolution equa-
tions, namely V(µ, t).
Coarse projective forward Euler (arguably the simplest
of equation-free algorithms) which we will use below il-
lustrates the approach succinctly: Starting from a set
of coarse-grained initial conditions specified by moments
µ(t) we first (a) lift to a consistent fine scale descrip-
tion, that is, sample the PDF ansatz P (X, t;α(t)) to
generate ensembles of initial conditions X for equation
(3) consistent with the set µ(t); (b) starting with these
consistent initial conditions we evolve the fine scale de-
scription for a (relatively short) time δt; we subsequently
restrict back to coarse observables by evaluating the mo-
ments µ(t+ δt) as ensemble-averages and (d) use the re-
sults to estimate locally the time derivative dµ/dt. This
is precisely the right hand-side of the explicitly unavail-
able closure, obtained not through a closed form formula,
but rather through short, judicious computational exper-
iments with the original fine scale dynamics/code. Given
this local estimate of the coarse-grained observable time
derivatives, we can now exploit the smoothness of their
evolution in time (in the form of Taylor series) and take
a single long projective forward Euler step:
µ(t+∆t) = µ(t) + ∆t
µ(t+ δt)− µ(t)
. (11)
The procedure then repeats itself: lifting, fine scale evo-
lution, restriction, estimation, and then (connecting with
continuum traditional numerical analysis) a new for-
ward Euler step. Beyond coarse projective forward Eu-
ler, many other coarse initial-value solvers (e.g. coarse
projective Adams-Bashforth, and even implicit coarse
solvers) have been implemented; the stability and accu-
racy study of such algorithms is progressing [14]. These
developments allow us to construct a nonintrusive imple-
mentation of PDF moment closures, nonintrusive in the
sense that we compute with the closures without explic-
itly obtaining them, but rather by intelligently chosen
computational experiments with the original, fine-scale
problem.
There is, however, an obvious objection to the
equation-free implementation of moment-closures. Using
the same ingredients, one can clearly obtain an estimate
of any statistics of interest (for example, the moment-
averages µ(t)) without the need of making any closure
assumptions whatsoever. This can be done by the much
simpler method of direct ensemble averaging. That is,
one can sample an ensemble of initial conditions X from
any chosen distribution P0(X), evolve each of these real-
izations according to the fine-scale dynamics of equation
(3), and then evaluate any statistics of interest at time
t by averaging over the ensemble of solutions X(t). It
would seem that this direct ensemble approach is much
more straightforward and accurate than the equation-free
implementation of a moment-closure, which introduces
additional statistical hypotheses.
The response to this important objection is that the
fine-scale dynamics (3) is often very stiff for the appli-
cations considered, in which the system contains many-
degrees-of-freedom interacting on a huge range of length-
and time-scales. In contrast, the closure equation (10) is
much less stiff, because of statistical-averaging, and its
solutions µ(t) are much smoother in time (and space).
Thus, to evolve an ensemble of solutions of the fine-scale
dynamics (3) from an initial time t0 to a final time t0+T
would require O(T/δt) integration steps, where the time-
step δt is required to be very small by the intrinsic stiff-
ness of the micro-dynamics. In the closure approach, the
evolution of the moment equations (10) from time t0 to
time t0+T requires only O(T/∆t) integration steps, with
(hopefully) ∆t ≫ δt. Each of these closure integration
steps by an increment ∆t requires in the equation-free
approach just one (or just a few) fine-scale integration
step by an increment δt. Thus, there is an over-all savings
by a (hopefully) large factor O(∆t/δt). This crude esti-
mate is based on a single step coarse projective forward
Euler algorithm; clearly, more sophisticated projective
integration algorithms can be used.
In all of them, however, the computational savings are
predicated on the smoothness of the closure equations,
and are governed by the ratio of the time that it takes
to obtain a good local estimate of dµ/dt from full direct
simulation to the time that we can (linearly or even poly-
nomially) extrapolate µ(t) in time. It is also worth not-
ing that a variety of additional computational tasks, be-
yond projective integration (e.g. accelerated fixed point
computation) can be performed within the equation-free
framework
In the next section we show by a concrete example how
significant computational economy can be achieved with
statistical moment closures implemented in the equation-
free framework.
A NUMERICAL EXAMPLE
We illustrate here the equation-free implementation
of moment-closures for a canonical equation of phase-
ordering kinetics [17], the stochastic time-dependent
Ginzburg-Landau (TDGL) equation in one spatial di-
mension. This is written as
∂φ(x, t)
= D∆φ(x, t) − V ′(φ(x, t)) + η(x, t) (12)
where φ(x, t) represents a local order parameter, e.g. a
magnetization. The noise has mean zero and covariance
〈η(x, t)η(x′, t′)〉 = 2kT δ(x−x′)δ(t− t′). The potential V
shall be chosen as
V (φ) =
to represent a single quartic/quadratic well. This
stochastic dynamics has an invariant measure which is
formally of Hamiltonian form P∗[φ] ∝ exp(−H [φ]/kT )
where
H [φ] =
D|∇φ(x)|2 + V (φ(x))] dx. (13)
The Gibbsian measure P∗[φ] is approached at long times
for any random distribution P0[φ] of initial states.
One of the simplest dynamical quantities of interest is
the bulk magnetization φ(t) = (1/V )
φ(x, t)dx, where
V is the total volume. If the initial statistics are space-
homogeneous, then the ensemble average µ(t) = 〈φ(t)〉 is
also given by µ(t) = 〈φ(x, t)〉 for any space point x. Equa-
tion (12) leads to a hierarchy of equations for statistical
moments of φ(x, t). For example, the first moment satis-
fies the equation
∂〈φ(x, t)〉
= ∆〈φ(x, t)〉 − 〈φ(x, t)〉 − 〈φ3(x, t)〉. (14)
The evolution of the mean total magnetization is thus
a function of the mean cubic total magnetization. One
could write a time evolution equation for 〈φ3〉, but it
would involve a higher order term 〈φ5〉, and so on. Each
equation contains higher moments and therefore the hi-
erarchy does not close.
To close the equation for µ(t) we assume a parametric
PDF of the form P [φ;α] ∝ exp(−H [φ;α]/kT ) where
H [φ;α] = H [φ] + α
φ(x) dx
is a perturbation of the Hamiltonian (13) by a term pro-
portional to the moment variable ξ[φ] = (1/V )
φ(x) dx.
This is a special case of a general “entropy-based” clo-
sure prescription proposed by Eyink and Levermore [16].
This closure scheme guarantees that α(t) → 0 at long
times and therefore the PDF ansatz P [φ;α(t)] relaxes to
the correct stationary distribution P∗[φ] of the stochastic
process. The determination of the parameter α given the
moment µ is here accomplished by Legendre transform
α = argmaxα[αµ− F (α)], (15)
where the “moment-generating function” F (α) =
log〈exp[α
φ(x) dx]〉∗ and 〈·〉∗ denotes average with re-
spect to the invariant measure P∗[φ]. The numerical op-
timization required for the Legendre transform is well-
suited to gradient descent algorithms such as the conju-
gate gradient method, since
(∂/∂α)[αµ− F (α)] = µ− µ(α),
where µ(α) = 〈ξ〉α is the average of the moment-function
in the PDF ansatz P [φ;α]. In simple cases, F (α) and
µ(α) = F ′(α) may be given by closed analytical expres-
sions. If not, then both of these averages may be deter-
mined together by Monte Carlo sampling techniques.
In the numerical calculations below, we discretize equa-
tion (12) using a forward Euler-Maruyama stochastic in-
tegrator and 3-point stencil for the Laplacian (other dis-
cretizations are possible).
φ(x, t + δt) = φ(x, t)− δt[φ(x, t) + φ3(x, t)] + (16)
(δx)2
[φ(x + δx, t)− 2φ(x, t) + φ(x− δx, t)] +
2kT (δt/δx)N(x, t)
where N(x, t) are independent, identically distributed
standard normal random variables for each space-time
point (x, t). The invariant distribution of the stochastic
dynamics space-discretized in this manner has a Gibbsian
form ∝ exp(−Hδ/kT ) with discrete Hamiltonian
〈x,x′〉
(φ(x) − φ(x′))2 (17)
φ2(x) +
φ4(x)]
where 〈x, x′〉 are nearest-neighbor pairs. The closure
ansatz can be adopted in the consistently discretized form
Pδ[φ;α] ∝ exp(−Hδ[φ;α]/kT ) where
Hδ[φ;α] = Hδ[φ] + α
δxφ(x).
In this numerical experiment, we integrate an N =
1000 member ensemble of solutions of equation (17),
and measure the ensemble-averaged, global magnetiza-
tion µ(t) = 〈φ(t)〉 = (1/V )
〈φ(x, t)〉 at each time-
step. With this we compare the results of the entropy-
based closure simulation implemented by the equation-
free framework using also an ensemble with N = 1000
samples. In this concrete example, the projective inte-
gration scheme works as follows: Suppose we are given
the parameter α(t) at time t. The mean µ(t) is first cal-
culated from the parametric ensemble at time t by Monte
Carlo sampling. Next all N samples are integrated over
a short time-step δt to create a time-advanced ensemble.
From this ensemble µ(t + δt) is calculated, which yields
an estimate of the local time derivative.
µ̇app(t) = [µ(t+ δt)− µ(t)]/δt.
A large, projective Euler time-step of the moment aver-
age is then taken via
µ(t+∆t) = µ(t) + ∆t µ̇app(t).
The parameter is finally updated by using the Legendre
transform inversion to obtain α(t+∆t) from the known
value µ(t + ∆t). The cycle may now be repeated to in-
tegrate the closure equations by successive time-steps of
length ∆t.
A critical issue in general application of projective inte-
gration is the criterion to determine the projective time-
step ∆t. For stiff problems with time-scale separation,
the projective time step for stability purposes is of the
order of (1/fastest “slow group” eigenvalues), while the
“preparatory” simulation time is of the order of (1/slow-
est “fast group” eigenvalue). Variants of the approach
have been developed for problems with several gaps in
their spectrum [18]. Accuracy considerations in real-time
projective step selection can, in principle, be dealt with
in the traditional way for integrators with adaptive step-
size selection and error control: through on-line a poste-
riori error estimates. An additional “twist” arises from
the error inherent in the estimation of the (unavailable)
reduced time derivatives from the ensemble simulations;
issues of variance reduction and even on-line hypothe-
sis testing (are the data consistent with a local linear
model?) must be considered. These are important re-
search issues that are currently explored by several re-
search groups. Nevertheless, the main factor in computa-
tional savings comes from the effective smoothness of the
unavailable closed equation: the separation of time scales
between the low-order statistics we follow and the higher
order statistics whose effect we model (and, eventually,
the time scales of the direct simulation of the original
model).
Figure 1 is a plot comparing Projective Integration
with Entropy Closure and direct Ensemble Integration
with equation (12) for diffusion constant D = 1000.0 We
have selected both the “fine-scale” integration step δt and
the “coarse-scale” projective integration step ∆t to be as
large as possible, consistent with stability and accuracy.
Thus, only steps small enough to avoid numerical blow-
ups were considered. Then, values were selected both
for δt and for ∆t so that the numerical integrations with
those time-steps differed by at most a few percent from
fully converged integrations with very small steps. In this
manner, the time step required for the Euler-Maruyama
integration of (12) was determined to be δt = 0.0004. On
the other hand, for projective integration of the closure
equation a time step ∆t = 0.01 could be taken. This
indicates a gain in time step by a factor of 25, which is
also roughly the speed-up in the algorithm or savings in
CPU time. The present example is not as stiff as equa-
tions that appear in more realistic applications, with a
very broad range of length- and time-scales, where even
greater computational economies might be expected.
In general, the moment-closure results need not agree
so well with those of the direct ensemble approach, even
when both are converged. In the example presented here,
there is good agreement because the closure effectively
captures the one-point PDF (see Fig.2). This one-point
PDF is the only statistical quantity that enters into Equa-
tion (14) as long as the statistics are homogeneous and
the Laplacian term vanishes.
CONCLUSIONS
In this paper, we have described how one can combine
recently developed equation-free methods with statisti-
cal moment closures to model nonlinear problems. With
this method we can numerically integrate complex non-
linear systems, for which closure equations may not be
available in closed form. In the example presented here
the specific entropy-based closure we selected has an H-
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8
FIG. 1: Mean total field as a function of time. Line (sym-
bols): traditional (coarse projective) integration, respectively.
See the text for a description of the stepsize selection.
FIG. 2: Comparison of the time dependent PDF’s of the local
field φ(x, t) for the exact solution (blue) and for the projective
integration / closure solution (red).
theorem which guarantees relaxation to the equilibrium
state of the original dissipative dynamics. However, we
stress that the general approach outlined above can be
used with a variety of closure methods.
The equation-free method has the potential to enhance
the flexibility, power, and applications set of the statis-
tical moment closure approach. Since little or no an-
alytic work is required, the sophistication of statistical
moment closures can greatly enhanced beyond Gaussian
PDF ansätze. The “practical usefulness” criterion for
parametric PDF models that they permit analytical cal-
culations is replaced by the criterion that they can be
efficiently sampled. We believe that this approach can
significantly increase the usefulness of closure methods.
In order to model systems like global climate, oceans,
and reaction diffusion processes in systems biology, one
will have to construct more complex closures. These
will likely include higher order moments, correlation
functions of the relevant variables, highly non-Gaussian
statistics, etc. As the closures become more complex,
the lifting step will require more efficient sampling ap-
proaches. One will likely have to use nonlocal, acceler-
ated sampling methods. One will also likely employ the
latest in adaptive time and adaptive mesh methods to
optimize performance for large-scale problems.
ACKNOWLEDGEMENTS
This work, LA-UR-07-2218, was carried out in part
at Los Alamos National Laboratory under the auspices
of the US National Nuclear Security Administration of
the US Department of Energy. It was supported un-
der contract number DE-AC52-06NA25396. The work
of IGK was partially supported by DARPA and by and
US DOE(CMPD). G. Eyink was supported by NSF-ITR
grant, DMS-0113649.
[1] T. Schlick, R. D. Skeel, A. T. Brunger, L. V. Kale, J. A.
Board, Jr. , J. Hermans, and K. Schulten J. Comp. Phys.
151, 9, (1999).
[2] M. Karplus and J. A. McCammon, Nature, Structural
and Molecular Biology, 9 , 646, (2002).
[3] P. G. Debenedetti and F. H. Stillinger, Nature, 410, 259,
(2001).
[4] D. T. Gillespie, J. Phys. Chem., 81 , 2340, (1977).
[5] D. J. Wilkinson, Stochastic Modeling for Systems Biol-
ogy, Chapman & Hall / CRC Press, Boca Raton, (2006).
[6] A. J. Majda and X. Wang, Nonlinear Dynamics and Sta-
tistical Theories for Basic Geophysical Flows, Cambridge
Univeristy Press, Cambridge UK, 2006
[7] J. P. Hansen and I. R. MacDonald, Theory of Simple
Liquids, Academic, New York, (1986).
[8] S. B. Pope, Turbulent Flows, Cambridge University
Press, Cambridge, UK, (2000).
[9] A. J. Chorin, O. H. Hald, and R. Kupferman, Proceedings
of the National Academy of Sciences of the United States
of America, 97, 2968, (2000).
[10] H. Chen, S. Chen, and R. H. Kraichnan, Phys. Rev. Lett.,
63, 2657–2660, 1989.
[11] C. Yeung, Y. Oono, and A. Shinozaki Phys. Rev. E, 49,
2693 (1994)
[12] G. L. Eyink, Phys. Rev. E 54 (1996) 3419–3435.
[13] C.D. Levermore, J. Stat. Phys., 86 (1996), 1021–1065.
[14] I. G. Kevrekidis, C. W. Gear, J. M. Hyman, P. G.
Kevrekidis, O. Runborg and K. Theodoropoulos, Comm.
Math. Sciences 1(4) pp.715-762 (2003); S. L. Lee and C.
W. Gear, J. Comp. App. Math., 201, 258, (2007).
[15] I. G. Kevrekidis, C. William Gear and G. Hummer,
A.I.Ch.E Journal, 50(7) pp.1346-1354 (2004)
[16] G. L. Eyink and C. D. Levermore (preprint) Entropy-
Based Closures of Nonlinear Stochastic Dynamics. sub-
mitted to ”Communications in Mathematical Sciences”
(2006).
[17] A. J. Bray, Adv. in Phys., 43, 357, (1994)
[18] C. W. Gear and I. G. Kevrekidis, J. Comp. Phys., 187,
95, (2003)
[19] In the case of [10] the dynamics is an advection-reaction-
diffusion equation for a scalar concentration field X(t) =
{θ(x, t) : x ∈ Rd}. The moment functions are the “fine-
grained PDF” ξϑ,x[X, t] = δ(θ(x, t) − ϑ), labelled by
space point x and scalar value ϑ. The moment average
µϑ,x(t) = 〈δ(θ(x, t) − ϑ)〉 is the 1-point PDF p(ϑ;x, t)
which gives the distribution of scalar values ϑ at space-
time point (x, t). The parametric model P [X;α, t] is
the distribution over scalar fields obtained by the ansatz
θ(x, t) = X(θ0(x, t),x, t) where θ0(x, t) is a reference ran-
dom field of known (Gaussian) statistics and X(·,x, t) :
R → R is a “mapping function”. The latter function is
the “parameter” αϑ0,x(t) = X(ϑ0,x, t) which determines
(and is determined by) the “moment” µϑ,x(t) from the re-
lation p(X(ϑ0,x, t);x, t)|∂X/∂ϑ0| = p0(ϑ0,x, t). Here p0
is the 1-point PDF of the reference Gaussian field θ0(x, t).
The approach of [11] is similar. The problem is phase-
ordering dynamics as given, for example, by our equation
(12) and X(t) = {φ(x, t) : x ∈ Rd}. The moment func-
tions are the quadratic products ξr[X, t] = φ(r, t)φ(0, t),
labelled by the displacement r ∈ Rd and the moment av-
erages µr(t) are the spatial correlation function C(r, t).
The parametric model P [X;α, t] is the distribution ob-
tained by the ansatz φ(x, t) = f(u(x, t)) where u(x, t) is a
homogeneous Gaussian random field with mean zero and
covariance G(r, t) = 〈u(r, t)u(0, t)〉 and f(z) is the sta-
tionary planar interface solution of the TDGL equation
(12). In this case, it is the auxiliary correlation function
G(r, t) which plays the role of the “parameter” αr(t). It
is shown in [11] for various cases how this function may
be uniquely related to the “moment” µr(t) = C(r, t).
ABSTRACT
  We present a general numerical scheme for the practical implementation of
statistical moment closures suitable for modeling complex, large-scale,
nonlinear systems. Building on recently developed equation-free methods, this
approach numerically integrates the closure dynamics, the equations of which
may not even be available in closed form. Although closure dynamics introduce
statistical assumptions of unknown validity, they can have significant
computational advantages as they typically have fewer degrees of freedom and
may be much less stiff than the original detailed model. The closure method can
in principle be applied to a wide class of nonlinear problems, including
strongly-coupled systems (either deterministic or stochastic) for which there
may be no scale separation. We demonstrate the equation-free approach for
implementing entropy-based Eyink-Levermore closures on a nonlinear stochastic
partial differential equation.

<|endoftext|><|startoftext|>
Introduction
	System Model
	Relay Selection Via Limited Feedback
	Performance Impact of Varying System Parameters
	BER Analysis
	Conclusion
	References
ABSTRACT
  It has been shown that a decentralized relay selection protocol based on
opportunistic feedback from the relays yields good throughput performance in
dense wireless networks. This selection strategy supports a hybrid-ARQ
transmission approach where relays forward parity information to the
destination in the event of a decoding error. Such an approach, however,
suffers a loss compared to centralized strategies that select relays with the
best channel gain to the destination. This paper closes the performance gap by
adding another level of channel feedback to the decentralized relay selection
problem. It is demonstrated that only one additional bit of feedback is
necessary for good throughput performance. The performance impact of varying
key parameters such as the number of relays and the channel feedback threshold
is discussed. An accompanying bit error rate analysis demonstrates the
importance of relay selection.

<|endoftext|><|startoftext|>
Introduction
This paper describes the Fourth Edition of the Sloan Digital Sky Survey (SDSS; York et al. 2000)
Quasar Catalog. Previous versions of the catalog (Schneider et al. 2002, 2003, 2005; hereafter Papers I, II,
and III) were published with the SDSS Early Data Release (EDR; Stoughton et al. 2002), the SDSS First
Data Release (DR1; Abazajian et al. 2003), and the SDSS Third Data Release (DR3; Abazajian et al. 2005),
and contained 3,814, 16,713, and 46,420 quasars, respectively. The current catalog is the entire set of quasars
from the SDSS-I Quasar Survey; the SDSS-I was completed on 30 June 2005 and the Fifth Data Release
(DR5; Adelman-McCarthy et al. 2007) was made public on 30 June 2006. The catalog contains 77,429
quasars, the vast majority of which were discovered by the SDSS. The SDSS Quasar Survey is continuing
via the SDSS-II Legacy Survey, which is is an extension of the SDSS-I.
The catalog in the present paper consists of the DR5 objects that have a luminosity larger than
Mi = −22.0 (calculated assuming an H0 = 70 km s
−1 Mpc−1, ΩM = 0.3, ΩΛ = 0.7 cosmology [Spergel et
al. 2006], which will be used throughout this paper), and whose SDSS spectra contain at least one broad
emission line (velocity FWHM larger than ≈ 1000 km s−1) or have interesting/complex absorption-line fea-
tures. The catalog also has a bright limit of i ≈ 15.0. The quasars range in redshift from 0.08 to 5.41;
78% have redshifts below 2.0.
The objects are denoted in the catalog by their DR5 J2000 coordinates; the format for the object name
is SDSS Jhhmmss.ss+ddmmss.s. Since the image data used for the astrometric information can change
between data releases (e.g., a region with poor seeing that is included in an early release is superseded by a
newer observation in good seeing), the coordinates for an object can change at the 0.1′′ to 0.2′′ level; hence
– 3 –
the designation of a given source can change between data releases. Except on very rare occasions (see §5.1),
this change in position is much less than 1′′. When merging SDSS Quasar Catalogs with previous databases
one should always use the coordinates, not object names, to identify unique entries.
The DR5 catalog does not include classes of Active Galactic Nuclei (AGN) such as Type 2 quasars,
Seyfert galaxies, and BL Lacertae objects; studies of these sources in the SDSS can be found in Zakamska et
al. (2003) (Type 2), Kauffmann et al. (2003) and Hao et al. (2005) (Seyferts), and Collinge et al. (2005) and
Anderson et al. (2007) (BL Lacs). Spectra of the highest redshift SDSS quasars (z > 5.7; e.g., Fan et al. 2003,
2006a) were not acquired as part of the SDSS quasar survey (the objects were identified as candidates in
the SDSS imaging data, but the spectra were not obtained with the SDSS spectrographs), so they are not
included in the catalog.
The observations used to produce the catalog are presented in Section 2; the construction of the catalog
and the catalog format are discussed in Sections 3 and 4, respectively. Section 5 presents an overview of the
catalog, and a summary is given in Section 6. The catalog is presented in an electronic table in this paper
and can also be found at an SDSS public web site.1
2. Observations
2.1. Sloan Digital Sky Survey
The Sloan Digital Sky Survey uses a CCD camera (Gunn et al. 1998) on a dedicated 2.5-m telescope
(Gunn et al. 2006) at Apache Point Observatory, New Mexico, to obtain images in five broad optical bands
(ugriz; Fukugita et al. 1996) over approximately 10,000 deg2 of the high Galactic latitude sky. The sur-
vey data-processing software measures the properties of each detected object in the imaging data in all
five bands, and determines and applies both astrometric and photometric calibrations (Pier et al., 2003;
Lupton et al. 2001; Ivezić et al. 2004). Photometric calibration is provided by simultaneous observations
with a 20-inch telescope at the same site (see Hogg et al. 2001, Smith et al. 2002, Stoughton et al. 2002, and
Tucker et al. 2006). The SDSS photometric system is based on the AB magnitude scale (Oke & Gunn 1983).
The catalog contains photometry from 204 SDSS imaging runs acquired between 19 September 1998
(Run 94) and 13 May 2005 (Run 5326).
2.2. Target Selection
The SDSS filter system was designed to identify quasars at redshifts between zero and approximately
six; most quasar candidates are selected based on their location in multidimensional SDSS color-space. The
Point Spread Function (PSF) magnitudes are used for the quasar target selection, and the selection is based
on magnitudes and colors that have been corrected for Galactic extinction (using the maps of Schlegel,
Finkbeiner, & Davis 1998). An i magnitude limit of 19.1 is imposed for candidates whose colors indicate a
probable redshift of less than≈ 3.0 (selected from the ugri color cube); high-redshift candidates (selected from
the griz color cube) are accepted if i < 20.2 and the source is unresolved. The errors on the i measurements
are typically 0.02–0.03 and 0.03–0.04 magnitudes at the brighter and fainter limits, respectively. In addition
1http://www.sdss.org/dr5/products/value added/qsocat dr5.html
http://www.sdss.org/dr5/products/value$_$added/qsocat$_$dr5.html
– 4 –
to the multicolor selection, unresolved objects brighter than i = 19.1 that lie within 2.0′′ of a FIRST radio
source (Becker, White, & Helfand 1995) are also identified as primary quasar candidates. Target selection also
imposes a maximum brightness limit (i ≈ 15.0) on quasar candidates; the spectra of objects that exceed this
brightness could contaminate the adjacent spectra on the detectors of the SDSS spectrographs. A detailed
description of the quasar selection process and possible biases can be found in Richards et al. (2002a).
The primary sample described above was supplemented by quasars that were targeted by the following
SDSS spectroscopic target selection algorithms: Galaxy and Luminous Red Galaxy (Strauss et al. 2002
and Eisenstein et al. 2001), X-ray (object near the position of a ROSAT All-Sky Survey [RASS; Voges et
al. 1999, 2000] source; see Anderson et al. 2003), Star (point source with a color typical of an interesting
class of star), or Serendipity (unusual color or FIRST matches). The SDSS is designed to be complete
in the Galaxy, Luminous Red Galaxy and Quasar programs, (in practice various limitations reduce the
completeness to about 90%) but no attempt at completeness was made for the other categories. Most of
the DR5 quasars that fall below the magnitude limits of the quasar survey were selected by the serendipity
algorithm (see §5).
While the bulk of the catalog objects targeted as quasars were selected based on the algorithm of
Richards et al. (2002a), during the early years of the SDSS the quasar selection software was undergoing
constant modification to improve its efficiency. All of the sources in Papers I and II, and some of the
Paper III objects, were not identified with the final selection algorithm. Once the final target selection
software was installed, the algorithm was applied to the entire SDSS photometric database. Each DR5
quasar has two spectroscopic target selection flags listed in the catalog: BEST, which refers to the final
algorithm, and TARGET, which is the target flag used in the actual spectroscopic targeting. There are also
two sets of photometric measurements for each quasar: BEST, which refers to the measurements with the
latest photometric software on the highest quality data, and TARGET, which are the values used at the
time of the spectroscopic target selection.
Extreme care must be exercised when constructing statistical samples from this catalog; if one uses the
values produced by only the latest version of the selection software, not only must one drop the catalog
quasars that were not identified as quasar candidates by the final selection software, one must also account
for quasar candidates produced by the final version that were not observed in the SDSS spectroscopic survey
(this can occur in regions of sky whose spectroscopic targets were identified by early versions of the selection
software). The selection for the UV-excess quasars, which comprise the majority (≈ 80%) of the objects in
the DR5 Catalog, has remained reasonably uniform; the changes to the selection algorithm were primarily
designed to increase the effectiveness of the identification of 3.0 < z < 3.8 quasars. Extensive discussion of
the completeness and efficiency of the selection can be found in Richards et al. (2002a) and Vanden Berk et
al. (2005); Richards et al. (2006) describes the process for the construction of statistical SDSS quasar samples
(see also Adelman-McCarthy et al. 2007). The survey efficiency (the ratio of quasars to quasar candidates)
for the ultraviolet excess-selected candidates, which comprise the bulk of the quasar sample, is about 77%.
(The catalog contains information on which objects can be used in a uniform sample; see Section 4.)
2.3. Spectroscopy
Spectroscopic targets chosen by the various SDSS selection algorithms (i.e., quasars, galaxies, stars,
serendipity) are arranged onto a series of 3◦ diameter circular fields (Blanton et al. 2003). Details of the
spectroscopic observations can be found in York et al. (2000), Castander et al. (2001), Stoughton et al. (2002),
– 5 –
and Paper I. A total of 1458 spectroscopic fields, taken between 5 March 2000 and 14 June 2005, provided
the quasars for the DR5 quasar catalog; the locations of the plate centers can be found from the information
given by Adelman-McCarthy et al. (2007). The DR5 spectroscopic program attempted to cover, in a well-
defined manner, an area of ≈ 5740 deg2. Spectroscopic plate 716 was the first spectroscopic observation
that was based on the final version of the quasar target selection algorithm of Richards et al. (2002a); the
detailed tiling information in the SDSS database must be consulted to identify those regions of sky targeted
with the final selection algorithm (see Richards et al. 2006).
The two SDSS double spectrographs produce data covering 3800–9200 Å at a spectral resolution of ≃ 2000.
The data, along with the associated calibration frames, are processed by the SDSS spectroscopic pipeline
(see Stoughton et al. 2002). The calibrated spectra are classified into various groups (e.g., star, galaxy,
quasar), and redshifts are determined by two independent software packages. Objects whose spectra cannot
be classified by the software are flagged for visual inspection. Figure 1 shows the calibrated SDSS spectra
of four previously unknown catalog quasars representing a range of properties. The processed DR5 spectra
have not been corrected for Galactic extinction.
3. Construction of the SDSS DR5 Quasar Catalog
The quasars in the catalog were drawn from three sets of SDSS observations: 1) the primary survey area,
2) “Bonus” plates, which are spectroscopic observations of regions near to, but outside of, the primary survey
area, and 3) “Special” plates, where the spectroscopic targets were not chosen by the standard SDSS target
selection algorithms (e.g., a set of plates to investigate the structure of the Galaxy; see Adelman-McCarthy
et al. 2006).
The DR5 quasar catalog was constructed, as were the previous editions, in three stages: 1) Creation of a
quasar candidate database, 2) Visual examination of the spectra of the quasar candidates, and 3) Application
of luminosity and emission-line velocity width criteria. All three tasks were initially done without reference
to the material in the previous SDSS Quasar Catalogs, although the results of each task were compared to
the Paper III database (e.g., the construction of the quasar database was not viewed as complete until it
was understood why any Paper III quasars were not included).
3.1. Creation of the DR5 Quasar Candidate Database
This catalog of bona-fide quasars, that have redshifts checked by eye and luminosities and line widths
that meet the formal quasar definition, is constructed from a larger “master” table of confirmed quasars and
quasar candidates. This master table was created using an SQL query to the public SDSS-DR5 database (i.e.,
the Catalog Archive Server [CAS]; http://cas.sdss.org/astrodr5/). Two versions of the photometric database
exist, which contain the properties of objects when targeted for spectroscopic observations (TARGET) and as
determined in the latest processing (BEST). These databases are divided into multiple tables and subtables
to facilitate access to only the most relevant data for a particular use. In the case of the quasar catalog
construction, we have made use of the PhotoObjAll and SpecObjAll tables, which contain, respectively, the
photometric information for all SDSS sources and for all SDSS spectra. In the case of PhotoObjAll, both
the TARGET and BEST versions are queried. These tables include duplicate observations of objects and
observations of objects that lie outside of the formal SDSS area (as compared to the PhotoObj and SpecObj
tables, which include only sources in the formal SDSS area), and are the most complete database files. For
http://cas.sdss.org/astrodr5/
– 6 –
example, in PhotoObjAll, two (or more) observations of a single object may exist; if so, one is classified as
PRIMARY, the other(s) as SECONDARY.
This master table contains all objects identified as quasar candidate targets for spectroscopy in either
the TARGET or BEST photometric databases. Quasar candidates are those objects which have had one or
more of the following flags set by the algorithm described by Richards et al. (2002a):
TARGET QSO HIZ OR TARGET QSO CAP OR TARGET QSO SKIRT OR TARGET QSO FIRST CAP
OR TARGET QSO FIRST SKIRT
( = 0x0000001F, except for the “special” plates [see Adelman-McCarthy et al. 2006, 2007], where additional
care is required in interpreting the flags).
Objects flagged as TARGET QSO MAG OUTLIER and TARGET QSO REJECT are not included, as these flags are
meant only for diagnostic purposes. (In the CAS documentation and the EDR paper, TARGET QSO MAG OUTLIER
is called TARGET QSO FAINT.) Furthermore, the master table includes any objects with spectra that have been
classified by the spectroscopic pipeline as quasars (specClass=QSO or HIZ QSO), that have UNKNOWN type,
or that have redshifts greater than 0.6. (On rare occasions the spectroscopic pipeline measures the correct
redshift for a quasar but classifies the object as a galaxy.)
The query was run on the union of the database tables Target..PhotoObjAll, Best..SpecObjAll, and
Best..PhotoObjAll. Multiple entries for a given object are retained at this stage.
Ten objects in the DR3 Quasar Catalog were missed by this query. One omission was due to an
“unmapped” fiber (a spectrum of a quasar was obtained, but because of a failure in the mapping of fiber
number to location in the sky, we are no longer certain of the celestial position of the object); the other nine
were low-redshift AGN that were not classified as quasars by the spectroscopic pipeline (this result provides
an estimate of the incompleteness produced by the query). We were able to identify the information for all
ten quasars in the database and add the material to the master table.
Four automated cuts were made to the master table database of 329,884 candidates 2: 1) Objects
targeted as quasars but whose spectra had not yet been obtained by the closing date of DR5 (124,447
objects), 2) Candidates classified with high confidence as “stars” by the spectroscopic pipeline that had
redshifts less than 0.002 (33,653), 3) Objects whose photometric measurements have not been loaded into
the CAS (3106) and 4) Multiple spectra (coordinate agreement better than 1.0′′) of the same object (40,007).
In cases of duplicate spectra of an object, the “science primary” spectrum is selected (i.e., the spectrum was
obtained as part of normal science operations); when there is more than one science primary observation
(or when none of the spectra have this flag set), the spectrum with the highest signal-to-noise ratio (S/N) is
retained (see Stoughton et al. 2002 for a description of the science primary flag). These actions produced a
list of 128,671 unique quasar candidates.
3.2. Visual Examination of the Spectra
The SDSS spectra of the remaining quasar candidates were manually inspected by several of the authors
(DPS, PBH, GTR, MAS, and SFA); as in previous papers in this series, we found that the spectroscopic
2The master table is known as the QSOConcordanceALL table, which can be found in the SDSS database; see
http://cas.sdss.org/astrodr5/en/help/browser/description.asp?n=QsoConcordanceAll&t=U.
http://cas.sdss.org/astrodr5/en/help/browser/description.asp?n=QsoConcordanceAll
– 7 –
pipeline redshifts and classifications of the overwhelming majority of the objects are accurate. Tens of
thousands of objects were dropped from the list because they were obviously not quasars (these objects
tended to be low S/N stars, unusual stars, and a mix of absorption-line and narrow emission-line objects);
this large number of candidates that are not quasars is due to the inclusive nature of our initial database
query. Spectra for which redshifts could not be determined (low signal-to-noise ratio or subject to data-
processing difficulties) were also removed from the sample. This visual inspection resulted in the revisions
of the redshifts of 863 quasars; the changes in the individual redshifts were usually quite substantial, due to
the spectroscopic pipeline misidentifying emission lines.
An independent determination of the redshifts of 5,865 quasars with redshifts larger than 2.9 in the
catalog was performed by Shen et al. (2007). The redshift differences between the two sets of measurements
follow a Gaussian distribution (with slightly extended wings), with a mean of 0.002 and a dispersion of 0.01.
The catalog contains numerous examples of extreme Broad Absorption Line (BAL) Quasars (see Hall
et al. 2002); it is difficult if not impossible to apply the emission-line width criterion for these objects, but
they are clearly of interest, have more in common with “typical” quasars than with narrow-emission line
galaxies, and have historically been included in quasar catalogs. We have included in the catalog all objects
with broad absorption-line spectra that meet the Mi < −22.0 luminosity criterion.
3.3. Luminosity and Line Width Criteria
As in Papers II and III, we adopt a luminosity limit of Mi = −22.0. The absolute magnitudes were
calculated by correcting the BEST i measurement for Galactic extinction (using the maps of Schlegel,
Finkbeiner, & Davis 1998) and assuming that the quasar spectral energy distribution in the ultraviolet-
optical can be represented by a power law (fν ∝ ν
α), where α = −0.5 (Vanden Berk et al. 2001). (In the
134 cases where BEST photometry was not available, the TARGET measurements were substituted for the
absolute magnitude calculation.) This approach ignores the contributions of emission lines and the observed
distribution in continuum slopes. Emission lines can contribute several tenths of a magnitude to the k-
correction (see Richards et al. 2006), and variations in the continuum slopes can introduce a magnitude or
more of error into the calculation of the absolute magnitude, depending upon the redshift. The absolute
magnitudes will be particularly uncertain at redshifts near and above five, when the Lyman α emission line
(with a typical observed equivalent width of ≈ 400− 500 Å) and strong Lyman α forest absorption enter
the i bandpass.
Quasars near the Mi = −22.0 luminosity limit are often not enormously brighter in the i-band than the
starlight produced by the host galaxy. Although the PSF-based SDSS photometry presented in the catalog
are less susceptible to host galaxy contamination than are fixed-aperture measurements, the nucleus of the
host galaxy can still contribute appreciably to this measurement for the lowest luminosity entries in the
catalog (see Hao et al. 2005). An object of Mi = −22.0 will reach the i = 19.1 “low-redshift” selection limit
at a redshift of ≈ 0.4.
After visual inspection and application of the luminosity criterion had reduced the number of quasar
candidates to under 80,000 objects, the remaining spectra were processed with an automated line-measuring
routine. The spectra for objects whose maximum line width was less than 1000 km s−1 were visually
examined; if the measurement was deemed to be an accurate reflection of the line (automated routines
occasionally have spectacular failures when dealing with complex line profiles), the object was removed from
the catalog.
– 8 –
4. Catalog Format
The DR5 SDSS Quasar Catalog is available in three types of files at the SDSS public web site listed in
the introduction: 1) a standard ASCII file with fixed-size columns, 2) a gzipped compressed version of the
ASCII file (which is smaller than the uncompressed version by a factor of more than four), and 3) a binary
FITS table format. The following description applies to the standard ASCII file. All files contain the same
number of columns, but the storage of the numbers differs slightly in the ASCII and FITS formats; the FITS
header contains all of the required documentation. Table 1 provides a summary of the information contained
in each of the columns in the ASCII catalog.
The standard ASCII catalog (Table 2 of this paper) contains information on 77,429 quasars in a 36 MB
file. The DR5 format is similar to that of DR3 with a few minor differences.
The first 80 lines consist of catalog documentation; this is followed by 77,429 lines containing information
on the quasars. There are 74 columns in each line; a summary of the information is given in Table 1 (the
documentation in the ASCII catalog header is essentially an expansion of Table 1). At least one space
separates all the column entries, and, except for the first and last columns (SDSS designation and the object
name if previously known), all entries are reported in either floating point or integer format.
Notes on the catalog columns:
1) The DR5 object designation, given by the format SDSS Jhhmmss.ss+ddmmss.s; only the final 18 char-
acters are listed in the catalog (i.e., the “SDSS J” for each entry is dropped). The coordinates in the object
name follow IAU convention and are truncated, not rounded.
2–3) The J2000 coordinates (Right Ascension and Declination) in decimal degrees. The positions for the
vast majority of the objects are accurate to 0.1′′ rms or better in each coordinate; the largest expected errors
are 0.2′′ (see Pier et al 2003). The SDSS coordinates are placed in the International Celestial Reference
System, primarily through the United States Naval Observatory CCD Astrograph Catalog (Zacharias et
al. 2000), and have an rms accuracy of 0.045′′ per coordinate.
4) The quasar redshifts. A total of 863 of the CAS redshifts were revised during our visual inspection.
A detailed description of the redshift measurements is given in Section 4.10 of Stoughton et al. (2002).
A comparison of 299 quasars observed at multiple epochs by the SDSS (Wilhite et al. 2005) found an rms
difference of 0.006 in the measured redshifts for a given object. It is well known that the redshifts of individual
broad emission lines in quasars exhibit significant offsets from their systemic redshifts (e.g., Gaskell 1982,
Richards et al. 2002b, Shen et al. 2007); the catalog redshifts attempt to correct for this effect in the ensemble
average (see Stoughton et al. 2002).
5–14) The DR5 PSF magnitudes and errors (not corrected for Galactic extinction) from BEST photometry
for each object in the five SDSS filters. Some of the relevant imaging scans, such as special scans through
M31 (see the DR4 and DR5 papers) were never loaded into the CAS, therefore the BEST photometry is not
available for them. Thus there are 134 quasars which have entries of “0.000” for their BEST photometric
measurements.
The effective wavelengths of the u, g, r, i, and z bandpasses are 3541, 4653, 6147, 7461, and 8904 Å, re-
spectively (for an α = −0.5 power-law spectral energy distribution using the definition of effective wavelength
given in Schneider, Gunn, & Hoessel 1983). The photometric measurements are reported in the natural sys-
tem of the SDSS camera, and the magnitudes are normalized to the AB system (Oke & Gunn 1983). The
measurements are reported as asinh magnitudes (Lupton, Gunn, & Szalay 1999); see Adelman-McCarthy et
– 9 –
al. (2007) for additional discussion and references for the accuracy of the photometric measurements. The
TARGET PSF photometric measurements are presented in columns 63–72.
15) The Galactic extinction in the u band based on the maps of Schlegel, Finkbeiner, & Davis (1998). For
an RV = 3.1 absorbing medium, the extinctions in the SDSS bands can be expressed as
Ax = Cx E(B − V )
where x is the filter (ugriz), and values of Cx are 5.155, 3.793, 2.751, 2.086, and 1.479 for ugriz, respectively
(Ag, Ar, Ai, and Az are 0.736, 0.534, 0.405, and 0.287 times Au).
16) The logarithm of the Galactic neutral hydrogen column density along the line of sight to the quasar.
These values were estimated via interpolation of the 21-cm data from Stark et al. (1992), using the COLDEN
software provided by the Chandra X-ray Center. Errors associated with the interpolation are typically
expected to be less than ≈ 1× 1020 cm−2 (e.g., see §5 of Elvis, Lockman, & Fassnacht 1994).
17) Radio properties. If there is a source in the FIRST catalog within 2.0′′ of the quasar position, this
column contains the FIRST peak flux density at 20 cm encoded as an AB magnitude
AB = −2.5 log
3631 Jy
(see Ivezić et al. 2002). An entry of “0.000” indicates no match to a FIRST source; an entry of “−1.000”
indicates that the object does not lie in the region covered by the final catalog of the FIRST survey. The
catalog contains 6226 FIRST matches; 5729 DR5 quasars lie outside of the FIRST area.
18) The S/N of the FIRST source whose flux is given in column 17.
19) Separation between the SDSS and FIRST coordinates (in arc seconds).
20) In cases when the FIRST counterpart to an SDSS source is extended, the FIRST catalog position of the
source may differ by more than 2′′ from the optical position. A “1” in column 20 indicates that no matching
FIRST source was found within 2′′ of the optical position, but that there is significant detection (larger
than 3σ) of FIRST flux at the optical position. This is the case for 2440 SDSS quasars.
21) A “1” in column 21 identifies the 1596 sources with a FIRST match in either columns 17 or 20 that also
have at least one FIRST counterpart located between 2.0′′ (the SDSS-FIRST matching radius) and 30′′ of
the optical position. Based on the average FIRST source surface density of 90 deg−2, we expect 50–60 of
these matches to be chance superpositions.
22) The logarithm of the vignetting-corrected count rate (photons s−1) in the broad energy band (0.1–2.4 keV)
in the ROSATAll-Sky Survey Faint Source Catalog (Voges et al. 2000) and the ROSATAll-Sky Survey Bright
Source Catalog (Voges et al. 1999). The matching radius was set to 30′′; an entry of “−9.000” in this column
indicates no X-ray detection. There are 4133 RASS matches in the DR5 catalog.
23) The S/N of the ROSAT measurement.
24) Separation between the SDSS and ROSAT All-Sky Survey coordinates (in arc seconds).
25–30) The JHK magnitudes and errors from the Two Micron All Sky Survey (2MASS; Skrutskie et al.
2006) All-Sky Data Release Point Source Catalog (Cutri et al. 2003) using a matching radius of 2.0′′. A
– 10 –
non-detection by 2MASS is indicated by a “0.000” in these columns. Note that the 2MASS measurements
are Vega-based, not AB, magnitudes. The catalog contains 9824 2MASS matches.
31) Separation between the SDSS and 2MASS coordinates (in arc seconds).
32) The absolute magnitude in the i band calculated by correcting for Galactic extinction and assuming
H0 = 70 km s
−1 Mpc−1, ΩM = 0.3, ΩΛ = 0.7, and a power-law (frequency) continuum index of −0.5.
33) The ∆(g− i) color, which is the difference in the Galactic extinction corrected (g− i) for the quasar and
that of the mean of the quasars at that redshift. If ∆(g − i) is not defined for the quasar, which occurs for
objects at either z < 0.12 or z > 5.12 the column will contain “−9.000”. See Section 5.2 for a description of
this quantity.
34) Morphological information. If the SDSS photometric pipeline classified the image of the quasar as a
point source, the catalog entry is 0; if the quasar is extended, the catalog entry is 1.
35) The SDSS SCIENCEPRIMARY flag, which indicates whether the spectrum was taken as a normal science
spectrum (SCIENCEPRIMARY= 1) or for another purpose (SCIENCEPRIMARY= 0). The latter category contains
Quality Assurance and calibration spectra, or spectra of objects located outside of the nominal survey area.
Over 90% of the DR5 entries (69,762 objects) are SCIENCEPRIMARY = 1.
36) This flag provides information on whether the photometric object is designated PRIMARY (1), SECONDARY (2),
or FAMILY (3; these are blended objects that have not been deblended). During target selection, only PRIMARY
objects are considered (except on occasion for objects located in fields that are not part of the nominal sur-
vey area); however, differences between TARGET and BEST photometric pipeline versions make it possible
that the BEST photometric object belonging to a spectrum is either not detected at all, or is a non-primary
object (see §3.1 above). Over 99% of the catalog entries are PRIMARY; 613 quasars are SECONDARY and 9 are
FAMILY. There are 124 quasars with an entry of “0” in this column; each of these is an object that lacks
BEST photometry. For statistical analysis, one should use only PRIMARY objects; SECONDARY and FAMILY
objects are included in the catalog for the sake of completeness with respect to confirmed quasars.
37) The “uniform selection” flag, either 0 or 1; a “1” indicates that the object was identified as a primary
quasar target (37,574 catalog entries) with the final target selection algorithm as given by Richards et
al. (2002a). These objects constitute a statistical sample.
38) The 32-bit SDSS target-selection flag from BEST processing (PRIMTARGET; see Table 26 in Stoughton et
al. 2002 for details); this is the flag produced by running the selection algorithm of Richards et al. (2002a) on
the most recent processing of the image data. The target-selection flag from TARGET processing is found
in column 55.
39–45) The spectroscopic target selection breakdown (BEST) for each object. The target selection flag
in column 38 is decoded for seven groups: Low-redshift quasar, High-redshift quasar, FIRST, ROSAT,
Serendipity, Star, and Galaxy An entry of “1” indicates that the object satisfied the given criterion (see
Stoughton et al. 2002). Note that an object can be, and often is, targeted by more than one selection
algorithm. The last two columns in Table 3 presents the number of quasars identified by the individual
BEST target selection algorithm; the column labeled “Sole” indicates the number of objects that were
detected by only one of the seven listed selection algorithms.
46–47) The SDSS Imaging Run number and the Modified Julian Date (MJD) of the photometric observation
used in the catalog. The MJD is given as an integer; all observations on a given night have the same integer
MJD (and, because of the observatory’s location, the same UT date). For example, imaging run 94 has an
– 11 –
MJD of 51075; this observation was taken on 1998 September 19 (UT).
48–50) Information about the spectroscopic observation (Modified Julian Date, spectroscopic plate number,
and spectroscopic fiber number) used to determine the redshift. These three numbers are unique for each
spectrum, and can be used to retrieve the digital spectra from the public SDSS database.
51–54) Additional SDSS processing information: the photometric processing rerun number; the camera
column (1–6) containing the image of the object, the field number of the run containing the object, and the
object identification number (see Stoughton et al. 2002 for descriptions of these parameters).
55) The 32-bit SDSS target selection flag from the TARGET processing, i.e., the value that was used when
the spectroscopic plate was drilled. This may not match the BEST target selection flag because a different
versions of the selection algorithm were used, the selection was done with different image data (superior
quality data of the field was obtained after the spectroscopic observations were completed), or different
processings of the same data were used. Objects with no TARGET flag were either identified as quasars
as a result of Quality Assurance observations and/or from special plates with somewhat different targeting
criteria (see Adelman-McCarthy 2006).
56–62) The spectroscopic target-selection breakdown (TARGET) for each object; this is the same convention
as followed in columns 39–45 for the BEST target-selection flag.
63–72) The DR5 PSF magnitudes and errors (not corrected for Galactic reddening) from TARGET photom-
etry.
73) The 64-bit integer that uniquely describes the spectroscopic observation that is listed in the catalog
(SpecObjID).
74) Name of object in the NASA/IPAC Extragalactic Database (NED). If there is a source in the NED
quasar database within 5.0′′ of the quasar position, the NED object name is given in this column. The NED
quasar database contains over 100,000 objects. Occasionally NED will list the SDSS name for objects that
were not discovered by the SDSS.
5. Catalog Summary
The 77,429 objects in the catalog represent an increase of 31,009 quasars over the Paper III database;
of the entries in the new catalog, 74,297 (96.0%) were discovered by the SDSS (with the caveat that NED
is not complete). The catalog quasars span a wide range of properties: redshifts from 0.078 to 5.414,
14.94 < i < 22.36 (506 objects have i > 20.5; only 26 have i > 21.0), and−30.27 < Mi < −22.00. The catalog
contains 6226, 4133, and 9824 matches to the FIRST, RASS, and 2MASS catalogs, respectively. The RASS
and 2MASS catalogs cover essentially all of the DR5 area, but 5729 (7%) of the entries in the DR5 catalog
lie outside of the FIRST region.
Figure 2 displays the distribution of the DR5 quasars in the i-redshift plane (the 26 objects with i > 21
are not plotted). Objects which NED indicates were previously discovered by investigations other than the
SDSS are indicated with open circles. The curved cutoff on the left hand side of the graph is produced by
the minimum luminosity criterion (Mi < −22.0). The ridge in the contours at i ≈ 19.1 for redshifts below
three reflects the flux limit of the low-redshift sample; essentially all of the large number of z < 3 points with
i > 19.1 are quasars selected via criteria other than the primary multicolor sample.
– 12 –
A histogram of the catalog redshifts is shown in the upper curve in Figure 3. A clear majority of quasars
have redshifts below two (the median redshift is 1.48, the mode is ≈ 1.85), but there is a significant tail of
objects extending out to redshifts beyond five (zmax = 5.41). The dips in the curve at redshifts of 2.7 and 3.5
arise because the SDSS colors of quasars at these redshifts are similar to the colors of stars; we decided
to accept significant incompleteness at these redshifts rather than be overwhelmed by a large number of
stellar contaminants in the spectroscopic survey. Improvements in the quasar target selection algorithm
since the initial editions of the SDSS Quasar Catalog have increased the efficiency of target selection at
redshifts near 3.5 (compare Figure 3 with Paper II’s Figure 4; see Richards et al. 2002a for a discussion of
the incompleteness of the SDSS Quasar Survey).
This structure in the catalog redshift histogram can be understood by careful modelling of the selection
effects (e.g., accounting for emission line effects and using only objects selected in regions whose spectroscopic
observations were chosen with the final version of the quasar target selection algorithm; also see Figure 8
in Richards et al. 2006). Repeating the analysis of Richards et al. (2006) for the DR5 sample reveals
no structure in the redshift distribution after selection effects have been included (see lower histogram in
Figure 3); this is in contrast to the reported redshift structure found in the SDSS quasar survey by Bell &
McDiarmid (2006). To construct the lower histogram we have partially removed the effect of host galaxy
contamination (by excluding extended objects), limited the sample to a uniform magnitude limit of i < 19.1
(accounting for emission-line effects), and have corrected for the known incompleteness near z ∼ 2.7 and
z ∼ 3.5 due to quasar colors lying close to or in the stellar locus. Accounting for selection effects significantly
reduces the number of objects as compared with the raw, more heterogeneous catalog, but the smaller, more
homogeneous sample is what should be used for statistical analyses.
The distribution of the observed i magnitude (not corrected for Galactic extinction) of the quasars is
given in Figure 4. The sharp drops in the histogram at i ≈ 19.1 and i ≈ 20.2 are due to the magnitude limits
in the low and high-redshift samples, respectively.
Figures 5 and 6 display the distribution of the absolute i magnitudes of the catalog quasars. There
is a roughly symmetric peak centered at Mi = −26 with a FWHM of approximately one magnitude. The
histogram declines sharply at high luminosities (only 1.5% of the objects haveMi < −28.0) and has a gradual
decline toward lower luminosities, partially due to host-galaxy contribution.
A summary of the spectroscopic selection, for both the TARGET and the BEST algorithms, is given in
Table 3. We report seven selection classes in the catalog (columns 39 to 45 for BEST, 56–62 for TARGET).
Each selection version has two columns, the number of objects that satisfied a given selection criterion and
the number of objects that were identified only by that selection class. About two-thirds of the catalog entries
were selected based on the SDSS quasar selection criteria (either a low-redshift or high-redshift candidate,
or both). Slightly more than half of the quasars in the catalog are serendipity-flagged candidates, which is
also primarily an “unusual color” algorithm; about one-fifth of the catalog was selected by the serendipity
criteria alone.
Of the 50,093 DR5 quasars that have Galactic-absorption corrected TARGET i magnitudes brighter
than 19.1, 48,593 (97.0%) were identified by the TARGET quasar multicolor selection; if one combines
TARGET multicolor and FIRST selection (the primary quasar target selection criteria), all but 1015 of
the i < 19.1 objects were selected. (The spectra of many of the last category of objects were obtained in
observations that were not part of the primary survey.) The numbers are similar if one uses the BEST
photometry and selection, although the completeness is not quite as high as with TARGET values.
– 13 –
5.1. Discrepancies Between the DR5 and Other Quasar Catalogs
The DR3 database is entirely contained in that of DR5, but there are 66 quasars from Paper III (out
of 46,420 objects) that do not have a counterpart within 1.0′′ of a DR5 quasar. Three of these “missing”
quasars are in the DR5 list; changes in celestial position of 1.1′′, 1.8′′, and 5.3′′ between DR3 and DR5 caused
these quasars to be missed with the 1.0′′ matching criterion. The other 63 cases (0.14% of the DR3 total)
were individually investigated. Three DR3 objects were dropped because the latest photometry reduced
their luminosities below the catalog limit. The remaining 60 objects were removed because 1) the visual
examination of the spectrum either convinced us that the object was not a quasar or that the S/N was
insufficient to assign a redshift with confidence or 2) The widest line in the latest fit to the spectrum had a
FWHM of less than 1000 km s−1. It should be noted that there have been no changes to the DR3 spectra in
the DR5 database; the missing objects reflect the inherent uncertainties involved with interpreting objects
that either lie near survey cutoffs or have spectra of marginal S/N.
There are 40 and 136 DR5 quasars that have redshifts that differ by more than 0.1 from the DR3 and
NED values, respectively (there is, of course, considerable overlap in these two groups). In all cases the DR5
measurements are considered more reliable than those presented in previous publications. The 40 objects
with |zDR5 − zDR3| > 0.1 are listed in Table 4.
5.2. Quasar Colors
It has long been known that the majority of quasars inhabit a restricted range in photometric color,
and the large sample size and accurate photometry of the SDSS revealed a relatively tight color-redshift
correlation for quasars (Richards et al. 2001). This SDSS color relation, recently presented in Hopkins et
al. (2004), has led to considerable success in assigning photometric redshifts to quasars (e.g., Weinstein et
al. 2004 and references therein). All photometric measurements used in these analyses have been corrected
for Galactic extinction.
The dependence of the four standard SDSS colors on redshift for the DR5 quasars is given in Figure 7.
The dashed line in each panel is the modal relation for the DR5 quasars; the modal relations are tabulated
in Table 5, along with the values for (g− i). The figures show an impressively tight correlation of color with
redshift, although the scatter dramatically increases when the Lyman α forest dominates the bluer of the
passbands used to form the color. The distribution near the modal curve is roughly symmetric, but there is
clearly a significant population of “red” quasars that has no “blue” counterpart.
This table is an improvement over previous work in that it is based on a larger sample size (a factor
of four increase since this relation was last published) and provides higher redshift resolution (0.01, except
near the extrema). As in Hopkins et al. (2004), we compute the mode, rather than the mean or median,
as the most representative quantity. However, a formal computation of the mode requires binning the data
both in redshift and by color within redshift bins; therefore we estimated the mode from the mean and
the median. Typically, the mode is estimated as (3 × median−2 × mean), but we found empirically that
(2 × median−mean) appeared to work better for this sample in terms of tracing the modal “ridgeline” with
redshift.
For each of the DR5 quasars we provide the quantity ∆(g − i), which is defined by
∆(g − i) = (g − i)QSO − 〈(g − i)〉redshift
– 14 –
where 〈(g − i)〉redshift is the entry in Table 5 for the redshift of the quasar. This “differential color” provides
an estimate of the continuum properties of the quasar (values above zero indicate that the object has a
redder continuum than the typical quasar at that redshift).
5.3. Bright Quasars
Although the spectroscopic survey is limited to objects fainter than i ≈ 15, the SDSS continues to
discover a number of “PG-class” (Schmidt & Green 1983) objects. The DR5 catalog contains 81 entries with
i < 16.0; 14 of the quasars are not in the NED database or attributed to the SDSS by NED. The spectrum of
the brightest post-DR3 discovery, SDSS J165551.37+214601.8 (i = 15.62, z = 0.15), is presented in Figure 1.
Three of the SDSS-discovered objects in this catalog have been added since Paper III.
5.4. Luminous Quasars
There are 103 catalog quasars with Mi < −29.0 (3C 273 has Mi ≈ −26.6 in our adopted cosmology);
61 were discovered by the SDSS, and 18 are published here for the first time. The redshifts of these
quasars lie between 1.3 and 5.0. The most luminous quasar in the catalog is 2MASSI J0745217+473436
(= SDSS J074521.78+473436.2), at Mi = −30.27 and z = 3.22. Spectra of the two most luminous post-DR3
discoveries, with absolute i magnitudes of −29.94 and −29.65, are displayed in the upper two panels of
Figure 1. The spectra of both quasars possess a considerable number of absorption features redward of the
Lyman α emission line.
5.5. Broad Absorption Line Quasars
The SDSS quasar selection algorithm has proven to be effective at finding a wide variety of Broad
Absorption Line (BAL) Quasars. An EDR sample of 118 BAL quasars was presented by Tolea, Krolik, &
Tsvetanov (2002). There have been two editions of the SDSS BAL Quasar Catalog; the first, associated with
Paper I, contained 224 BAL quasars (Reichard et al. 2003); the second was based on the Paper III catalog
and presents 4787 BAL quasars (Trump et al. 2006). BAL quasars are usually recognized by the presence
of C IV absorption features, which are only visible in SDSS spectra at z > 1.6, thus the frequency of the
BAL quasar phenomenon cannot be found from simply taking the ratio of BAL quasars to total number of
quasars in the SDSS catalog. The SDSS has discovered a wide variety of extreme BAL quasars (see Hall
et al. 2002); the lower right panel in Figure 1 presents the spectrum of an unusual FeLoBAL quasar with
strong Balmer absorption (see Hall 2007 for a discussion of this object).
5.6. Quasars with Redshifts Below 0.15
The catalog contains 109 quasars with redshifts below 0.15. All of these objects are of low luminosity
(Mi > −24.0, only three have Mi < −23.5) because of the i ≈ 15.0 limit for the spectroscopic sample. About
three-quarters of these quasars (83) are extended in the SDSS image data. A total of 40 of the z < 0.15
quasars were found by the SDSS; 21 have been added since Paper III.
– 15 –
5.7. High-Redshift (z ≥ 4) Quasars
At first light of the SDSS, the most distant known quasar was PC 1247+3406 at redshift of 4.897
(Schneider, Schmidt, and Gunn 1991), which had been discovered seven years earlier. Within a year of
operation, the SDSS had discovered quasars with redshifts above five (Fan et al. 1999, 2000); the DR5
catalog contains 60 objects with redshifts greater than that of PC 1247+3406.
In recent years the SDSS has identified quasars out to a redshift of 6.4 (Fan et al. 2003, 2006b). Quasars
with redshifts larger than ≈ 5.7, however, cannot be found by the SDSS spectroscopic survey because at
these redshifts the observed wavelength of the Lyman α emission line is redward of the i band; at this point
quasars become single-filter (z) detections. At the typical z-band flux levels for redshift six quasars, there
are simply too many “false-positives” to undertake automated targeting. The largest redshift in the DR5
catalog is SDSS J023137.65−072854.5 at z = 5.41, which was originally described by Anderson et al. (2001).
The DR5 catalog contains 891 quasars with redshifts larger than four; 36 entries have redshifts above
five (11 above z = 5.2), which is more than a factor of two increase since Paper III. The spectra of the 20
highest redshift post-DR3 objects (all with redshifts greater than or equal to 4.99) are displayed in Figure 8.
These redshift five spectra display a striking variety of emission line properties, and include an impressive
BAL at z = 5.27.
We have used archival data from Chandra, ROSAT , and XMM-Newton to check for new X-ray detections
of z > 4 quasars with unusual emission-line or absorption-line properties; we do not report all z > 4 X-ray
detections here as there are now more than 110 already published.3 We found three remarkable z > 4
X-ray detections: the z = 4.26 BAL quasar SDSS J133529.45+410125.9, the z = 4.11 BAL quasar
SDSS J142305.04+240507.8, and the z = 4.50 quasar SDSS J150730.63+553710.8, which shows remarkably
strong C iv emission. None of these objects has sufficient counts for detailed X-ray spectral analysis, but we
have computed their point-to-point spectral slopes between rest-frame 2500 Å and 2 keV (αox), adopting the
assumptions in §2 of Brandt et al. (2002). SDSS J133529.45+410125.9 and SDSS J142305.04+240507.8 were
serendipitously detected in archival Chandra ACIS observations and have αox = −2.19 and αox = −1.52,
respectively. Comparing these values to the established relation between αox and 2500 Å luminosity (e.g.,
Steffen et al. 2006), we find that SDSS J133529.45+410125.9 is notably X-ray weak, indicating likely X-ray
absorption as is often seen in BAL quasars (e.g., Gallagher et al. 2006) including those at z > 4 (Vignali
et al. 2005). In contrast, the level of X-ray emission from SDSS J142305.04+240507.8 is consistent with
that from normal, non-BAL quasars; its relatively narrow UV absorption, for a BAL quasar, may indicate
a relatively small column density of obscuring material. SDSS J150730.63+553710.8 is weakly detected in a
ROSAT PSPC observation and has αox = −1.47; this level of X-ray emission is nominal for a quasar of its
luminosity.
We have also checked all quasars with z > 5 for new X-ray detections and found none; 21 quasars with
z > 5 have previously reported X-ray detections.
5.8. Close Pairs
The mechanical constraint that SDSS spectroscopic fibers must be separated by 55′′ on a given plate
makes it difficult for the spectroscopic survey to confirm close pairs of quasars. In regions that are covered
3See http://www.astro.psu.edu/users/niel/papers/highz-xray-detected.dat for a list of X-ray detections and references.
http://www.astro.psu.edu/users/niel/papers/highz-xray-detected.dat
– 16 –
by more than one plate, however, it is possible to obtain spectra of both components of a close pair;
there are 346 pairs of quasars in the catalog with angular separation less than an arcminute (34 pairs
with separations less than 20′′). Most of the pairs are chance superpositions, but there are many sets
whose components have similar redshifts, suggesting that the quasars may be physically associated. The
typical uncertainty in the measured value of the redshift difference between two quasars is 0.02; the catalog
contains 18 quasar pairs with separations of less than an arcminute and with ∆z < 0.02. These pairs, which
are excellent candidates for binary quasars, are listed in Table 6. Hennawi et al. (2006) identified over 200
quasar pairs in the SDSS, primarily through spectroscopic observations of SDSS quasar candidates (based on
photometric measurements) near known SDSS quasars; statistical arguments based on a correlation-function
analysis suggests that most of these pairs are indeed physically associated.
5.9. Morphology
The images of 3498 of the DR5 quasars are classified as extended by the SDSS photometric pipeline; 3291 (94%)
have redshifts below one (there are nine resolved z > 3.0 quasars). The majority of the large-redshift “re-
solved” quasars are probably measurement errors, but this sample may also contain a mix of chance su-
perpositions of quasars and foreground objects or possibly some small angle separation gravitational lenses
(indeed, several lenses are present in the resolved quasar sample; see Paper II and Oguri et al. 2006).
5.10. Matches with Non-optical Catalogs
A total of 6226 catalog objects are FIRST sources (defined by a SDSS-FIRST positional offset of less
than 2.0′′). Note that 226 of the objects were selected (with TARGET) solely because they were FIRST
matches (all unresolved SDSS sources brighter than i = 19.1 that lie within 2.0′′ of a FIRST source are
targeted by the quasar spectroscopic selection algorithm). Extended radio sources may be missed by this
matching. The upper left panel in Figure 9 contains a histogram of the angular offsets between the SDSS
and FIRST positions; the solid line is the expected distribution assuming a 0.21′′ 1σ Gaussian error in the
relative SDSS/FIRST positions (found by fitting the points with a separation less than 1.0′′. The small-angle
separations are well-fit to the Rayleigh distribution, but outside of about 0.5′′ there is an obvious excess of
observed separations. The number of chance superpositions was estimated by shifting the quasar positions
by ±200′′ in declination and matching the new coordinates to the FIRST catalog; only about 0.1% of the
reported FIRST matches are false. The large “tail” of this distribution is not likely to be due to measurement
errors but probably arises from extended radio emission that may not be precisely centered on the optical
image. To recover radio quasars that have offsets of more than 2.0′′, we separately identify all objects with a
greater than 3σ detection of FIRST flux at the optical position (2440 sources). For these objects as well as
those with a FIRST catalog match within 2′′, we perform a second FIRST catalog search with 30′′ matching
radius to identify possible radio lobes associated with the quasar, finding such matches for 1596 sources.
Matches with the ROSAT All-Sky Survey Bright and Faint Source Catalogs were made with a maximum
allowed positional offset of 30′′; this is the positional coincidence required for the SDSS ROSAT target
selection code. The DR5 catalog contains 4133 RASS matches; approximately 1.3% are expected to be false
identifications based on an analysis similar to that described in the previous paragraph. The SDSS-RASS
offsets for the DR5 sample are presented in the upper right panel of Figure 9; the solid curve, which is the
predicted distribution for a 1σ positional error of 11.1′′ (fit using all of the points), matches the data quite
– 17 –
well.
JHK photometric measurements for 9824 DR5 quasars were found by using a matching radius of 2.0′′ in
the 2MASS All-Sky Data Release Point Source Catalog. No infrared information was used to select the SDSS
spectroscopic targets. The positional offset histogram, given in the lower left panel of Figure 9, is considerably
tighter than that for the FIRST matches, although the Rayleigh fit to the separations less than 1.0′′ is
virtually identical to the FIRST distribution (1σ of 0.21′′). There are very few 2MASS identifications with
offsets between 1′′ and 2′′; virtually all of the infrared matches are correct.
6. Summary
The lower right panel in Figure 9 charts the progress of the SDSS Quasar Survey, denoted by the
number of spectroscopically-confirmed quasars, over the duration of SDSS-I. Although SDSS-I has now
been completed, the SDSS Quasar Survey is continuing under the SDSS-II project. By necessity the SDSS
spectroscopy lags the SDSS imaging; at the conclusion of SDSS-I more than 2000 square degrees of SDSS
image data in the Northern Galactic Cap lacked spectroscopic coverage (Adelman-McCarthy et al. 2007).
A future edition of the SDSS Quasar Catalog will incorporate the observations from SDSS-II and should
contain approximately 100,000 quasars.
The publication of this catalog marks the completion of the SDSS-I Quasar Survey, and we dedicate this
work to the memory of John N. Bahcall. John was the initial co-chair of the SDSS Quasar Working Group, a
position he held for nearly a decade. He played a key role in the formation of the SDSS Collaboration and the
design of the SDSS Quasar Survey, and was a mentor to many of the members of the Quasar Working Group.
We would like to thank Todd Boroson for suggesting several redshift adjustments to some of the DR3 Quasar
Catalog redshifts. This work was supported in part by National Science Foundation grants AST-0307582
and AST-0607634 (DPS, DVB, JW), AST-0307384 (XF), and AST-0307409 (MAS), and by NASA LTSA
grant NAG5-13035 (WNB, DPS). PBH acknowledges support by NSERC, and GTR was supported in part
by a Gordon and Betty Moore Fellowship in Data Intensive Sciences at JHU. XF acknowledges support from
an Alfred P. Sloan Fellowship and a David and Lucile Packard Fellowship in Science and Engineering. SJ
was supported by the Max-Planck-Gesellschaft (MPI für Astronomie) through an Otto Hahn fellowship. CS
was supported by the U.S. Department of Energy under contract DE-AC02-76CH03000.
Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participat-
ing Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics
and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Educa-
tion Funding Council for England. The SDSS Web site is http://www.sdss.org/.
The SDSS is managed by the Astrophysical Research Consortium (ARC) for the Participating Institu-
tions. The Participating Institutions are the American Museum of Natural History, Astrophysical Institute
of Potsdam, University of Basel, Cambridge University, Case Western Reserve University, University of
Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation Group,
Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for Particle As-
trophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los
Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute
for Astrophysics (MPA), New Mexico State University, Ohio State University, University of Pittsburgh,
University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of
http://www.sdss.org/
– 18 –
Washington.
This research has made use of 1) the NASA/IPAC Extragalactic Database (NED) which is operated
by the Jet Propulsion Laboratory, California Institute of Technology, under contract with the National
Aeronautics and Space Administration, and 2) data products from the Two Micron All Sky Survey, which is
a joint project of the University of Massachusetts and the Infrared Processing and Analysis Center/California
Institute of Technology, funded by the National Aeronautics and Space Administration and the National
Science Foundation.
– 19 –
REFERENCES
Abazajian, K., et al. 2003, AJ, 126, 2081 (DR1)
Abazajian, K., et al. 2005, AJ, 129, 1755 (DR3)
Adelman-McCarthy, J., et al. 2006, ApJS, 162, 38 (DR4)
Adelman-McCarthy, J., et al. 2007, ApJS, in press (DR5)
Anderson, S.F., et al. 2001, AJ, 122, 503
Anderson, S.F., et al. 2003, AJ, 126, 2209
Anderson, S.F., et al. 2007, AJ, 133, 313
Becker, R.H., White, R.L., & Helfand, D.J. 1995, ApJ, 450, 559
Bell, M. B., & McDiarmid, D. 2006, ApJ, 648, 140
Blanton, M.R., Lupton, R.H., Maley, F.M., Young, N., Zehavi, I., & Loveday, J. 2003, AJ, 125, 2276
Brandt, W.N., et al. 2002, ApJ, 569, L5
Castander, F.J., et al. 2001, AJ, 121, 2331
Collinge, M., et al. 2005, AJ, 129, 2542; Erratum AJ 131, 3135
Cutri, R.M., Skrutskie, M.F., van Dyk, S., Beichman, C.A., et al. 2003, VizieR On-line Data Catalog: II/246,
University of Massachusetts and Infrared Processing and Analysis Center
Eisenstein, D.J., et al. 2001, AJ, 122, 2267
Elvis M., Lockman F.J., & Fassnacht C., 1994, ApJS, 95, 413
Fan, X., et al. 1999, AJ, 118, 1
Fan, X., et al. 2000, AJ, 119, 1
Fan, X., et al. 2003, AJ, 125, 1649
Fan, X., et al. 2006a, AJ, 131, 1203
Fan, X., et al. 2006b, AJ, 132, 171
Fukugita, M., Ichikawa, T., Gunn, J.E., Doi, M., Shimasaku, K., & Schneider, D.P. 1996, AJ, 111, 1748
Gallagher, S.C., Brandt, W.N., Chartas, G., Priddey, R., Garmire, G.P., & Sambruna, R.M. 2006, ApJ, 644,
Gaskell, C.M. 1982, ApJ, 263, 79
Gunn, J.E., et al. 1998, AJ, 116, 3040
Gunn, J.E., et al. 2006, AJ, 131, 2332
Hall, P.B. 2007, AJ, 133, 1271
– 20 –
Hall, P.B., et al. 2002, ApJS, 141, 267
Hennawi, J., et al. 2006, 131, 1
Hao, L., et al. 2005, AJ, 129, 1783
Hogg, D.W., Schlegel, D.J., Finkbeiner, D.P., & Gunn, J.E. 2001, AJ, 122, 2129
Hopkins, P.F., et al. 2004, AJ, 128, 1112
Ivezić, Ž., et al. 2002, AJ, 124, 2364
Ivezić, Ž., et al. 2004, AN, 325, 583
Kauffmann, G., et al. 2003, MNRAS, 346, 1055
Lupton, R.H., Gunn, J.E., Ivezić, Ž., Knapp, G.R., Kent, S., & Yasuda, N. 2001, in ASP Conf. Ser. 238,
Astronomical Data Analysis Software and Systems, ed. F.R. Harnden, F.A. Primini, & H.E. Payne
(San Francisco:ASP), 269
Lupton, R.H., Gunn, J.E., & Szalay, A. 1999, AJ, 118, 1406
Oke, J.B., & Gunn, J.E., 1983, ApJ., 266, 713 Oguri, M., et al. 2006, AJ, 132, 999
Pier, J.R., Munn, J.A., Hindsley, R.B., Hennessy, G.S., Kent, S.M., Lupton, R.H., & Ivezić, Ž., 2003, AJ,
125, 1559
Reichard, T.A., et al. 2003, AJ, 125, 1711
Richards, G.T., et al. 2001, AJ, 121, 2308
Richards, G.T., et al. 2002a, AJ, 123, 2945
Richards, G.T., et al. 2002b, AJ, 124,1
Richards, G.T., et al. 2006, AJ, 131, 2766
Schlegel, D.J., Finkbeiner, D.P., & Davis, M. 1998, ApJ, 500, 525
Schmidt, M., & Green, R.F. 1983, ApJ, 269, 352
Schneider, D.P., Gunn, J.E., & Hoessel, J.G. 1983, ApJ, 264, 337
Schneider, D.P., Schmidt, M., and Gunn, J.E., 1991, AJ, 102, 837
Schneider, D.P., et al. 2002, AJ, 123, 567 (Paper I)
Schneider, D.P., et al. 2003, AJ, 126, 2579 (Paper II)
Schneider, D.P., et al. 2005, AJ, 130, 367 (Paper III)
Shen, Y., et al. 2007, AJ, in press.
Skrutskie, M.F., et al. 2006, AJ, 131, 1163
Smith, J.A., et al. 2002, AJ, 123, 2121
– 21 –
Spergel, D.N., et al. 2006, ApJ, submitted (astro-ph/0603449)
Stark A.A., Gammie C.F., Wilson R.W., Bally J., Linke R.A., Heiles, C., & Hurwitz, M. 1992, ApJS, 79, 77
Steffen, A.T., Strateva, I.V., Brandt, W.N., Alexander, D.M., Koekemoer, A.M., Lehmer, B.D., Schneider,
D.P., & Vignali, C. 2006, AJ, 131, 2826
Stoughton, C., et al. 2002, AJ, 123, 485 (EDR)
Strauss, M.A., et al. 2002, AJ, 124, 1810
Tolea, A., Krolik, J.H., & Tsvetanov, Z. 2002, ApJ, 578, 31
Trump, J.R., et al. 2006, ApJS, 165, 1
Tucker, D., et al. 2006, AN, 327, 821
Vanden Berk, D.E., et al. 2001, AJ, 122, 549
Vanden Berk, D.E., et al. 2005, AJ, 129, 2047
Vignali, C., Brandt, W.N., Schneider, D.P., & Kaspi, S. 2005, AJ, 129, 2519
Voges, W., et al. 1999, A & A, 349, 389
Voges, W., et al. 2000, IAUC, 7432
Weinstein, M.A., et al. 2004, ApJS, 155, 243
Wilhite, B.C., et al. 2005, AJ, 633, 638
York, D.G., et al. 2000, AJ, 120, 1579
Zacharias, N., et al. 2000, AJ, 120, 2131
Zakamska, N.L., et al. 2003, AJ, 128, 1002
This preprint was prepared with the AAS LATEX macros v5.2.
http://arxiv.org/abs/astro-ph/0603449
– 22 –
0 160843.90+071508.6
z = 2.88
163909.11+282447.1
z = 3.82
Wavelength (A)
4000 5000 6000 7000 8000 9000
165551.37+214601.8
z = 0.15
Wavelength (A)
4000 5000 6000 7000 8000 9000
5 125942.80+121312.6
z = 0.75
Fig. 1.— SDSS spectra of four previously unreported quasars. The spectral resolution of the data ranges
from 1800 to 2100; a dichroic splits the beam at 6150 Å. The data have been rebinned to 5 Å pixel−1 for
display purposes. The upper two panels display the two most luminous of the newly discovered quasars;
both objects have Mi < −29.6. SDSS J165551.37+214601.8 is the brightest (i = 15.62) of the new quasars;
SDSS J125942.80+121312.6 is an unusual FeLoBAL quasar with Balmer-line absorption.
– 23 –
Fig. 2.— The observed i magnitude as a function of redshift for the 77,429 objects in the catalog. Open
circles indicate quasars in NED that were recovered but not discovered by the SDSS. The 26 quasars with
i > 21 are not plotted. The distribution is represented by a set of linear contours when the density of points
in this two-dimensional space causes the points to overlap. The steep gradient at i ≈ 19 is due to the flux
limit for the targeted low-redshift part of the survey; the dip in the counts at z ≈ 2.7 arises because of the
high incompleteness of the SDSS Quasar Survey at redshifts between 2.5 and 3.0 (also see Figure 3).
– 24 –
Fig. 3.— The redshift histogram of the catalog quasars. The redshifts range from 0.08 to 5.41; the median
redshift of the catalog is 1.48. The redshift bins have a width of 0.05. The dips at redshifts of 2.7 and 3.5
are caused by the reduced efficiency of the selection algorithm at these redshifts. The lower histogram is the
redshift distribution of the i < 19.1 sample after correction for selection effects (see Section 5).
– 25 –
 i mag
16 18 20 22
Fig. 4.— The i magnitude (not corrected for Galactic absorption) histogram of the 77,429 catalog quasars.
The magnitude bins have a width of 0.108. The sharp drop that occurs at magnitudes slightly fainter than 19
is due to the flux limit for the low-redshift targeted part of the survey. Quasars fainter than the i = 20.2
high-redshift selection limit were found via other selection algorithms, primarily serendipity. The SDSS
Quasar survey has a bright limit of i ≈ 15.0 imposed by the need to avoid saturation in the spectroscopic
observations.
– 26 –
Fig. 5.— The absolute i magnitude as a function of redshift for the 77,429 objects in the catalog. Open
circles indicate quasars in NED that were recovered but not discovered by the SDSS. The distribution is
represented by a set of linear contours when the density of points in this two-dimensional space causes the
points to overlap. The steep gradient that runs through the midst of the quasar distribution is produced by
the i ≈ 19 flux limit for the targeted low-redshift part of the survey.
– 27 –
Absolute i Magnitude
-30 -28 -26 -24 -22
Fig. 6.— The luminosity distribution of the catalog quasars. The absolute magnitude bins have a width
of 0.114. The most luminous quasar in the catalog has Mi ≈ −30.3. In the adopted cosmology 3C 273 has
Mi ≈ −26.6.
– 28 –
Fig. 7.— The quasar color-redshift relation for the DR5 quasars (photometry corrected for Galactic ex-
tinction). Contours are used to represent the distribution when the density of points causes the points to
overlap. The panels present the four standard SDSS colors; the dashed gray lines are the modal relations
presented in Table 5. The influence of emission lines on the colors is readily apparent (in particular Hα in
the (i− z) panel). The tightness of the correlations breaks down when the Lyman α forest region dominates
the bluer of the two passbands (e.g., above redshifts of 2.2 in the (u− g) relation).
– 29 –
005421.42-010921.6
z = 5.09
073103.12+445949.4
z = 5.00
4 084627.84+080051.7
z = 5.03
090245.76+085115.9
z = 5.23
092216.81+265359.0
z = 5.03
111920.64+345248.1
z = 5.01
113246.50+120901.6
z = 5.17
114657.79+403708.6
z = 5.01
115424.73+134145.7
z = 5.01
120207.78+323538.8
z = 5.29
123333.48+062234.2
z = 5.29
133412.56+122020.7
z = 5.13
133728.81+415539.8
z = 5.01
134015.03+392630.7
z = 5.03
134040.24+281328.1
z = 5.34
134141.45+461110.2
z = 5.00
Wavelength (A)
6000 7000 8000 9000
142325.92+130300.6
z = 5.04
Wavelength (A)
6000 7000 8000 9000
4 144350.66+362315.1
z = 5.27
Wavelength (A)
6000 7000 8000 9000
3 162629.19+285857.5
z = 5.02
Wavelength (A)
6000 7000 8000 9000
165902.11+270935.1
z = 5.31
Fig. 8.— SDSS spectra of the 20 new quasars with the highest redshifts (z ≥ 4.99). The spectra have
been rebinned to 10 Å pixel−1 for display purposes. The wavelength region below 6000 Å has been removed
because of the lack of signal below rest frame wavelengths of 1000 Å in these objects. Five of the quasars
have redshifts larger than 5.25.
– 30 –
FIRST/SDSS Offset (")
0.0 0.5 1.0 1.5 2.0
RASS/SDSS Offset (")
0 10 20 30
2MASS/SDSS Offset (")
0.0 0.5 1.0 1.5 2.0
Modified Julian Date   (51600.0 = UTC 2000 Feb 26.0)
51500 52000 52500 53000 53500
Fig. 9.— a) Offsets between the 6226 SDSS and FIRST matches; the matching radius was set to 2.0′′. The
smooth curve is the expected distribution for a set of matches if the offsets between the objects are described
by a Rayleigh distribution with σ = 0.21′′ (best fit for points with separations of less than 1.0′′). b) Offsets
between the 4133 SDSS and RASS FSC/BSC matches; the matching radius was set to 30′′. The smooth
curve is the Rayleigh distribution fit (σ = 11.1′′) to all of the points. c) Offsets between the 9824 SDSS
and 2MASS matches; the matching radius was set to 2′′. The smooth curve is a Rayleigh distribution with
σ = 0.21′′ based on the points with separations smaller than 1.0′′. d) The cumulative number of DR5 quasars
as a function of time. The horizontal axis runs from February 2000 to June 2005. The periodic structure in
the curve is caused by the yearly summer maintenance schedule. The total number of objects in the catalog
is 77,429.
– 31 –
Table 1. SDSS DR5 Quasar Catalog Format
Column Format Description
1 A18 SDSS DR5 Designation hhmmss.ss+ddmmss.s (J2000)
2 F11.6 Right Ascension in decimal degrees (J2000)
3 F11.6 Declination in decimal degrees (J2000)
4 F7.4 Redshift
5 F7.3 BEST PSF u magnitude (not corrected for Galactic extinction)
6 F6.3 Error in BEST PSF u magnitude
7 F7.3 BEST PSF g magnitude (not corrected for Galactic extinction)
8 F6.3 Error in BEST PSF g magnitude
9 F7.3 BEST PSF r magnitude (not corrected for Galactic extinction)
10 F6.3 Error in BEST PSF r magnitude
11 F7.3 BEST PSF i magnitude (not corrected for Galactic extinction)
12 F6.3 Error in BEST PSF i magnitude
13 F7.3 BEST PSF z magnitude (not corrected for Galactic extinction)
14 F6.3 Error in BEST PSF z magnitude
15 F7.3 Galactic extinction in u band
16 F7.3 logNH (logarithm of Galactic H I column density)
17 F7.3 FIRST peak flux density at 20 cm expressed as AB magnitude;
0.0 is no detection, −1.0 source is not in FIRST area
18 F8.3 S/N of FIRST flux density
19 F7.3 SDSS-FIRST separation in arc seconds
20 I3 > 3σ FIRST flux at optical position but no FIRST counterpart within 2′′ (0 or 1)
21 I3 FIRST source located 2′′-30′′ from optical position (0 or 1)
22 F8.3 log RASS full band count rate; −9.0 is no detection
23 F7.3 S/N of RASS count rate
24 F7.3 SDSS-RASS separation in arc seconds
25 F7.3 J magnitude (2MASS); 0.0 indicates no 2MASS detection
26 F6.3 Error in J magnitude (2MASS)
27 F7.3 H magnitude (2MASS); 0.0 indicates no 2MASS detection
28 F6.3 Error in H magnitude (2MASS)
29 F7.3 K magnitude (2MASS); 0.0 indicates no 2MASS detection
30 F6.3 Error in K magnitude (2MASS)
31 F7.3 SDSS-2MASS separation in arc seconds
32 F8.3 Mi (H0 = 70 km s
−1 Mpc−1, ΩM = 0.3, ΩΛ = 0.7, αν = −0.5)
33 F7.3 ∆(g − i) = (g − i)− 〈(g − i)〉redshift (Galactic extinction corrected)
34 I3 Morphology flag 0 = point source 1 = extended
35 I3 SDSS SCIENCEPRIMARY flag (0 or 1)
36 I3 SDSS MODE flag (blends, overlapping scans; 1, 2, or 3)
37 I3 Selected with final quasar algorithm (0 or 1)
38 I12 Target Selection Flag (BEST)
39 I3 Low-z Quasar selection flag (0 or 1)
40 I3 High-z Quasar selection flag (0 or 1)
41 I3 FIRST selection flag (0 or 1)
– 32 –
Table 1—Continued
Column Format Description
42 I3 ROSAT selection flag (0 or 1)
43 I3 Serendipity selection flag (0 or 1)
44 I3 Star selection flag (0 or 1)
45 I3 Galaxy selection flag (0 or 1)
46 I6 SDSS Imaging Run Number of photometric measurements
47 I6 Modified Julian Date of imaging observation
48 I6 Modified Julian Date of spectroscopic observation
49 I5 Spectroscopic Plate Number
50 I5 Spectroscopic Fiber Number
51 I4 SDSS Photometric Processing Rerun Number
52 I3 SDSS Camera Column Number
53 I5 SDSS Field Number
54 I5 SDSS Object Number
55 I12 Target Selection Flag (TARGET)
56 I3 Low-z Quasar selection flag (0 or 1)
57 I3 High-z Quasar selection flag (0 or 1)
58 I3 FIRST selection flag (0 or 1)
59 I3 ROSAT selection flag (0 or 1)
60 I3 Serendipity selection flag (0 or 1)
61 I3 Star selection flag (0 or 1)
62 I3 Galaxy selection flag (0 or 1)
63 F7.3 TARGET PSF u magnitude (not corrected for Galactic extinction)
64 F6.3 TARGET Error in PSF u magnitude
65 F7.3 TARGET PSF g magnitude (not corrected for Galactic extinction)
66 F6.3 TARGET Error in PSF g magnitude
67 F7.3 TARGET PSF r magnitude (not corrected for Galactic extinction)
68 F6.3 TARGET Error in PSF r magnitude
69 F7.3 TARGET PSF i magnitude (not corrected for Galactic extinction)
70 F6.3 TARGET Error in PSF i magnitude
71 F7.3 TARGET PSF z magnitude (not corrected for Galactic extinction)
72 F6.3 TARGET Error in PSF z magnitude
73 I21 Spectroscopic Identification flag (64-bit integer)
74 1X, A25 Object Name for previously known quasars
“SDSS” designates previously published SDSS object
Table 2. The SDSS Quasar Catalog IVa
Object (SDSS J) R.A. (deg) Dec (deg) Redshift u g r i z
000006.53+003055.2 0.027228 0.515349 1.8227 20.389 0.066 20.468 0.034 20.332 0.037 20.099 0.041 20.053 0.121
000008.13+001634.6 0.033898 0.276304 1.8365 20.233 0.054 20.200 0.024 19.945 0.032 19.491 0.032 19.191 0.068
000009.26+151754.5 0.038605 15.298476 1.1986 19.921 0.042 19.811 0.036 19.386 0.017 19.165 0.023 19.323 0.069
000009.38+135618.4 0.039088 13.938447 2.2400 19.218 0.026 18.893 0.022 18.445 0.018 18.331 0.024 18.110 0.033
000009.42−102751.9 0.039269 −10.464428 1.8442 19.249 0.036 19.029 0.027 18.980 0.021 18.791 0.018 18.751 0.047
aTable 2 is presented in its entirety in the electronic edition of the Astronomical Journal. A portion is shown here for guidance regarding its form and
content. The full catalog contains 74 columns of information on 77,429 quasars.
– 34 –
Table 3. Spectroscopic Target Selection
TARGET TARGET BEST BEST
Sole Sole
Class Selected Selection Selected Selection
Low-z 49010 16422 46460 14444
High-z 16383 5327 16757 4411
FIRST 3501 226 3619 209
ROSAT 4817 380 4918 492
Serendipity 42109 15729 41042 15950
Star 1970 187 820 162
Galaxy 536 99 601 80
– 35 –
Table 4. Quasars with |zDR5 − zDR3| > 0.1
SDSS J zDR5 SDSS J zDR5
005508.55−105206.2 1.381 133028.12+600811.7 1.992
013413.55+142900.1 1.195 133951.94+481651.3 0.911
031712.23−075850.3 2.696 134048.37+433359.8 2.069
075052.59+300334.1 3.990 135833.05+634122.6 3.180
075132.75+350535.0 2.077 140012.65+595823.3 2.061
083503.79+322242.0 0.728 140223.63+463604.9 0.925
085339.64+372203.6 1.950 140327.91+613654.2 2.023
090902.73+355334.8 1.638 141230.28+471103.7 2.078
091025.25+365921.3 2.004 142010.28+604722.3 1.345
092415.87+424632.2 0.559 143702.47+613437.0 2.064
093557.85+005528.1 1.301 144939.30+534212.1 1.805
093935.08−000801.1 0.909 151307.26−000559.3 2.030
094326.48+460226.8 2.093 151422.99+481936.3 2.071
100415.17+415802.6 1.977 153257.67+422047.1 1.950
102117.71+623010.1 1.949 160320.97+315248.3 0.727
103039.95+510923.3 1.649 165806.76+611858.9 2.631
103219.66+563456.8 2.017 170929.58+323826.9 1.902
115917.62+100921.5 2.028 205058.45+004709.9 0.932
124345.10+492645.3 1.982 212744.12+005720.3 4.386
131810.57+585416.9 1.900 225246.43+142525.8 4.904
– 36 –
Table 5. Quasar Colors as a Function of Redshifta
zbin 〈z〉 NQSO (g − i) (u − g) (g − r) (r − i) (i − z)
0.18 0.181 183 0.567 −0.065 0.197 0.379 −0.037
0.21 0.210 290 0.580 0.032 0.223 0.355 −0.034
0.24 0.240 394 0.513 0.000 0.236 0.267 0.115
0.27 0.270 406 0.289 0.055 0.231 0.077 0.397
0.30 0.301 484 0.236 0.067 0.219 0.033 0.472
aTable 5 is presented in its entirety in the electronic edition of the Astro-
nomical Journal. A portion is shown here for guidance regarding its form
and content.
– 37 –
Table 6. Candidate Binary Quasars
Quasar 1 Quasar 2 z1 z2 ∆θ
001201.87+005259.7 001202.35+005314.0 1.652 1.642 16.0
011757.99+002104.1 011758.83+002021.4 0.612 0.613 44.5
014110.40+003107.1 014111.62+003145.9 1.879 1.882 42.9
024511.93−011317.5 024512.12−011313.9 2.463 2.460 4.5
025813.65−000326.4 025815.54−000334.2 1.316 1.321 29.4
025959.68+004813.6 030000.57+004828.0 0.892 0.900 19.6
074336.85+205512.0 074337.28+205437.1 1.570 1.565 35.5
074759.02+431805.4 074759.66+431811.5 0.501 0.501 9.2
082439.83+235720.3 082440.61+235709.9 0.536 0.536 14.9
085625.63+511137.0 085626.71+511117.8 0.543 0.543 21.8
090923.12+000203.9 090924.01+000211.0 1.884 1.865 15.0
095556.37+061642.4 095559.02+061701.8 1.278 1.273 44.0
110357.71+031808.2 110401.48+031817.5 1.941 1.923 57.3
111610.68+411814.4 111611.73+411821.5 2.980 2.971 13.8
113457.73+084935.2 113459.37+084923.2 1.533 1.525 27.1
121840.47+501543.4 121841.00+501535.8 1.457 1.455 9.1
165501.31+260517.5 165502.02+260516.5 1.881 1.892 9.6
215727.26+001558.4 215728.35+001545.5 2.540 2.553 20.8
aThe quasar pairs were selected by a redshift difference of less than
0.02 and an angular separation less than 60′′.
	Introduction
	Observations
	Sloan Digital Sky Survey
	Target Selection
	Spectroscopy
	Construction of the SDSS DR5 Quasar Catalog
	Creation of the DR5 Quasar Candidate Database
	Visual Examination of the Spectra
	Luminosity and Line Width Criteria
	Catalog Format
	Catalog Summary
	Discrepancies Between the DR5 and Other Quasar Catalogs
	Quasar Colors
	Bright Quasars
	Luminous Quasars
	Broad Absorption Line Quasars
	Quasars with Redshifts Below 0.15
	High-Redshift (z 4) Quasars
	Close Pairs
	Morphology
	Matches with Non-optical Catalogs
	Summary
ABSTRACT
  We present the fourth edition of the Sloan Digital Sky Survey (SDSS) Quasar
Catalog. The catalog contains 77,429 objects; this is an increase of over
30,000 entries since the previous edition. The catalog consists of the objects
in the SDSS Fifth Data Release that have luminosities larger than M_i = -22.0
(in a cosmology with H_0 = 70 km/s/Mpc, Omega_M = 0.3, and Omega_Lambda = 0.7)
have at least one emission line with FWHM larger than 1000 km/s, or have
interesting/complex absorption features, are fainter than i=15.0, and have
highly reliable redshifts. The area covered by the catalog is 5740 sq. deg. The
quasar redshifts range from 0.08 to 5.41, with a median value of 1.48; the
catalog includes 891 quasars at redshifts greater than four, of which 36 are at
redshifts greater than five. Approximately half of the catalog quasars have i <
19; nearly all have i < 21. For each object the catalog presents positions
accurate to better than 0.2 arcsec. rms per coordinate, five-band (ugriz)
CCD-based photometry with typical accuracy of 0.03 mag, and information on the
morphology and selection method. The catalog also contains basic radio,
near-infrared, and X-ray emission properties of the quasars, when available,
from other large-area surveys. The calibrated digital spectra cover the
wavelength region 3800--9200A at a spectral resolution of ~2000. The spectra
can be retrieved from the public database using the information provided in the
catalog. The average SDSS colors of quasars as a function of redshift, derived
from the catalog entries, are presented in tabular form. Approximately 96% of
the objects in the catalog were discovered by the SDSS.

<|endoftext|><|startoftext|>
Introduction and Historical Perspective 3
2 QCD and the Nuclear Force 5
3 Effective Field Theory for Low-Energy QCD 5
3.1 Symmetries of Low-Energy QCD . . . . . . . . . . . . . . . . 6
3.1.1 Chiral Symmetry . . . . . . . . . . . . . . . . . . . . . 6
3.1.2 Explicit Symmetry Breaking . . . . . . . . . . . . . . 9
3.1.3 Spontaneous Symmetry Breaking . . . . . . . . . . . . 9
3.2 Chiral Effective Lagrangians Involving Pions . . . . . . . . . 10
3.3 Nucleon Contact Lagrangians . . . . . . . . . . . . . . . . . . 12
4 Nuclear Forces from EFT: Overview 13
4.1 Chiral Perturbation Theory and Power Counting . . . . . . . 14
4.2 The Hierarchy of Nuclear Forces . . . . . . . . . . . . . . . . 14
∗Lecture series presented at the DAE-BRNS Workshop on Physics and Astrophysics
of Hadrons and Hadronic Matter, Visva Bharati University, Santiniketan, West Bengal,
India, November 2006.
5 Two-Nucleon Forces 16
5.1 Pion-Exchange Contributions in ChPT . . . . . . . . . . . . 16
5.1.1 Zeroth Order (LO) . . . . . . . . . . . . . . . . . . . 17
5.1.2 Second Order (NLO) . . . . . . . . . . . . . . . . . . 17
5.1.3 Third Order (NNLO) . . . . . . . . . . . . . . . . . . 19
5.1.4 Fourth Order (N3LO) . . . . . . . . . . . . . . . . . . 20
5.1.5 Iterated One-Pion-Exchange . . . . . . . . . . . . . . 20
5.2 NN Scattering in Peripheral Partial Waves Using the Pertur-
bative Amplitude . . . . . . . . . . . . . . . . . . . . . . . . 22
5.3 NN Contact Potentials . . . . . . . . . . . . . . . . . . . . . 28
5.3.1 Zeroth Order . . . . . . . . . . . . . . . . . . . . . . . 29
5.3.2 Second Order . . . . . . . . . . . . . . . . . . . . . . . 30
5.3.3 Fourth Order . . . . . . . . . . . . . . . . . . . . . . . 30
5.4 Constructing a Chiral NN Potential . . . . . . . . . . . . . . 31
5.4.1 Conceptual Questions . . . . . . . . . . . . . . . . . . 31
5.4.2 What Order? . . . . . . . . . . . . . . . . . . . . . . . 33
5.4.3 Charge-Dependence . . . . . . . . . . . . . . . . . . . 34
5.4.4 A Quantitative NN Potential at N3LO . . . . . . . . 36
6 Many-Nucleon Forces 39
6.1 Three-Nucleon Forces . . . . . . . . . . . . . . . . . . . . . . 40
6.2 Four-Nucleon Forces . . . . . . . . . . . . . . . . . . . . . . . 42
7 Conclusions 42
A Fourth Order Two-Pion Exchange Contributions 44
A.1 One-loop diagrams . . . . . . . . . . . . . . . . . . . . . . . . 44
A.1.1 c2i contributions. . . . . . . . . . . . . . . . . . . . . . 44
A.1.2 ci/MN contributions. . . . . . . . . . . . . . . . . . . . 44
A.1.3 1/M2N corrections. . . . . . . . . . . . . . . . . . . . . 45
A.2 Two-loop contributions. . . . . . . . . . . . . . . . . . . . . . 46
B Partial Wave Decomposition of the Fourth Order Contact
Potential 48
1 Introduction and Historical Perspective
The theory of nuclear forces has a long history (cf. Table 1). Based upon
the seminal idea by Yukawa [1], first field-theoretic attempts to derive the
nucleon-nucleon (NN) interaction focused on pion-exchange. While the one-
pion exchange turned out to be very useful in explaining NN scattering data
and the properties of the deuteron [2], multi-pion exchange was beset with
serious ambiguities [3, 4]. Thus, the “pion theories” of the 1950s are gen-
erally judged as failures—for reasons we understand today: pion dynamics
is constrained by chiral symmetry, a crucial point that was unknown in the
1950s.
Historically, the experimental discovery of heavy mesons [5] in the early
1960s saved the situation. The one-boson-exchange (OBE) model [6, 7]
emerged which is still the most economical and quantitative phenomenol-
ogy for describing the nuclear force [8, 9]. The weak point of this model,
however, is the scalar-isoscalar “sigma” or “epsilon” boson, for which the
empirical evidence remains controversial. Since this boson is associated with
the correlated (or resonant) exchange of two pions, a vast theoretical effort
that occupied more than a decade was launched to derive the 2π-exchange
contribution to the nuclear force, which creates the intermediate range at-
traction. For this, dispersion theory as well as field theory were invoked
producing the Stony Brook [10], Paris [11, 12], and Bonn [7, 13] potentials.
The nuclear force problem appeared to be solved; however, with the
discovery of quantum chromodynamics (QCD), all “meson theories” were
relegated to models and the attempts to derive the nuclear force started all
over again.
The problem with a derivation from QCD is that this theory is non-
perturbative in the low-energy regime characteristic of nuclear physics, which
makes direct solutions impossible. Therefore, during the first round of new
attempts, QCD-inspired quark models [14] became popular. These models
are able to reproduce qualitatively and, in some cases, semi-quantitatively
the gross features of the nuclear force [15, 16]. However, on a critical note,
it has been pointed out that these quark-based approaches are nothing but
another set of models and, thus, do not represent any fundamental progress.
Equally well, one may then stay with the simpler and much more quantita-
tive meson models.
A major breakthrough occurred when the concept of an effective field
theory (EFT) was introduced and applied to low-energy QCD. As outlined
by Weinberg in a seminal paper [17], one has to write down the most general
Lagrangian consistent with the assumed symmetry principles, particularly
Table 1: Seven Decades of Struggle: The Theory of Nuclear Forces
1935 Yukawa: Meson Theory
The “Pion Theories”
1950’s One-Pion Exchange: o.k.
Multi-Pion Exchange: disaster
Many pions ≡ multi-pion resonances:
1960’s σ, ρ, ω, ...
The One-Boson-Exchange Model: success
Refined meson models, including
1970’s sophisticated 2π exchange contributions
(Stony Brook, Paris, Bonn)
Nuclear physicists discover
1980’s QCD
Quark Cluster Models
Nuclear physicists discover EFT
1990’s Weinberg, van Kolck
and beyond Back to Pion Theory!
But, constrained by Chiral Symmetry: success
the (broken) chiral symmetry of QCD. At low energy, the effective degrees
of freedom are pions and nucleons rather than quarks and gluons; heavy
mesons and nucleon resonances are “integrated out”. So, the circle of his-
tory is closing and we are back to Yukawa’s meson theory, except that we
have learned to add one important refinement to the theory: broken chiral
symmetry is a crucial constraint that generates and controls the dynamics
and establishes a clear connection with the underlying theory, QCD.
Following the first initiative by Weinberg [18], pioneering work was per-
formed by Ordóñez, Ray, and van Kolck [19, 20] who constructed a NN
potential in coordinate space based upon chiral perturbation theory at next-
to-next-to-leading order. The results were encouraging and many researchers
became attracted to the new field [21, 22, 23, 24, 25, 26, 27]. As a conse-
quence, nuclear EFT has developed into one of the most popular branches
of modern nuclear physics [28, 29].
It is the purpose of these lectures to describe in some detail the recent
progress in our understanding of nuclear forces in terms of nuclear EFT.
2 QCD and the Nuclear Force
Quantum chromodynamics (QCD) is the theory of strong interactions. It
deals with quarks, gluons and their interactions and is part of the Standard
Model of Particle Physics. QCD is a non-Abelian gauge field theory with
color SU(3) the underlying gauge group. The non-Abelian nature of the
theory has dramatic consequences. While the interaction between colored
objects is weak at short distances or high momentum transfer (“asymptotic
freedom”); it is strong at long distances ( >∼ 1 fm) or low energies, leading
to the confinement of quarks into colorless objects, the hadrons. Conse-
quently, QCD allows for a perturbative analysis at large energies, whereas it
is highly non-perturbative in the low-energy regime. Nuclear physics resides
at low energies and the force between nucleons is a residual QCD interac-
tion. Therefore, in terms of quarks and gluons, the nuclear force is a very
complicated problem.
3 Effective Field Theory for Low-Energy QCD
The way out of the dilemma of how to derive the nuclear force from QCD
is provided by the effective field theory (EFT) concept. First, one needs to
identify the relevant degrees of freedom. For the ground state and the low-
energy excitation spectrum of an atomic nucleus as well as for conventional
nuclear reactions, quarks and gluons are ineffective degrees of freedom, while
nucleons and pions are the appropriate ones. Second; to make sure that this
EFT is not just another phenomenology, the EFT must observe all relevant
symmetries of the underlying theory. This requirement is based upon a ‘folk
theorem’ by Weinberg [17]:
If one writes down the most general possible Lagrangian, in-
cluding all terms consistent with assumed symmetry principles,
and then calculates matrix elements with this Lagrangian to any
given order of perturbation theory, the result will simply be the
most general possible S-matrix consistent with analyticity, per-
turbative unitarity, cluster decomposition, and the assumed sym-
metry principles.
Thus, the EFT program consists of the following steps:
1. Identify the degrees of freedom relevant at the resolution scale of (low-
energy) nuclear physics: nucleons and pions.
2. Identify the relevant symmetries of low-energy QCD and investigate if
and how they are broken.
3. Construct the most general Lagrangian consistent with those symme-
tries and the symmetry breaking.
4. Design an organizational scheme that can distinguish between more
and less important contributions: a low-momentum expansion.
5. Guided by the expansion, calculate Feynman diagrams to the the de-
sired accuracy for the problem under consideration.
We will now elaborate on these steps, one by one.
3.1 Symmetries of Low-Energy QCD
In this section, we will give a brief introduction into (low-energy) QCD, its
symmetries and symmetry breaking. A more detailed introduction can be
found in the excellent lecture series by Scherer and Schindler [30].
3.1.1 Chiral Symmetry
The QCD Lagrangian reads
LQCD = q̄(iγµDµ −M)q −
Gµν,aGµνa (1)
with the gauge-covariant derivative
Dµ = ∂µ + ig
Aµ,a (2)
and the gluon field strength tensor
Gµν,a = ∂µAν,a − ∂νAµ,a − gfabcAµ,bAν,c . (3)
In the above, q denotes the quark fields and M the quark mass matrix.
Further, g is the strong coupling constant and Aµ,a are the gluon fields.
The λa are the Gell-Mann matrices and the fabc the structure constants
of the SU(3)color Lie algebra (a, b, c = 1, . . . , 8); summation over repeated
indices is always implied. The gluon-gluon term in the last equation arises
from the non-Abelian nature of the gauge theory and is the reason for the
peculiar features of the color force.
On a typical hadronic scale, i.e., on a scale of low-mass hadrons which are
not Goldstone bosons, e.g., mρ = 0.78 GeV ≈ 1 GeV; the masses of the up
(u), down (d), and—to a certain extend—strange (s) quarks are small [31]:
mu = 2± 1 MeV (4)
md = 5± 2 MeV (5)
ms = 95± 25 MeV (6)
It is therefore of interest to discuss the QCD Lagrangian in the limit of
vanishing quark masses:
L0QCD = q̄iγ
µDµq −
Gµν,aGµνa . (7)
Defining right- and left-handed quark fields,
qR = PRq , qL = PLq , (8)
(1 + γ5) , PL =
(1− γ5) , (9)
we can rewrite the Lagrangian as follows:
L0QCD = q̄Riγ
µDµqR + q̄LiγµDµqL −
Gµν,aGµνa . (10)
Restricting ourselves now to up and down quarks, we see that L0QCD is
invariant under the global unitary transformations
7−→ exp
−iΘRi
7−→ exp
−iΘLi
, (12)
where τi (i = 1, 2, 3) are the generators of SU(2)flavor, the usual Pauli spin
matrices. The right- and left-handed components of massless quarks do not
mix. This is SU(2)R × SU(2)R symmetry, also known as chiral symmetry.
Noether’s Theorem implies the existence of six conserved currents; three
right-handed currents
i = q̄Rγ
qR with ∂µR
i = 0 (13)
and three left-handed currents
i = q̄Lγ
qL with ∂µL
i = 0 . (14)
It is useful to consider the following linear combinations; namely, three vec-
tor currents
i = R
i + L
i = q̄γ
q with ∂µV
i = 0 (15)
and three axial-vector currents
i = R
i − L
i = q̄γ
q with ∂µA
i = 0 , (16)
which got their names from the fact that they transform as vectors and
axial-vectors, respectively. Thus, the chiral SU(2)L × SU(2)R symmetry is
equivalent to SU(2)V ×SU(2)A, where the vector and axial-vector transfor-
mations are given respectively by
7−→ exp
−iΘVi
7−→ exp
−iΘAi γ5
. (18)
Obviously, the vector transformations are isospin rotations and, therefore,
invariance under vector transformations can be identified with isospin sym-
metry.
There are the six conserved charges,
QVi =
d3x V 0i =
d3x q†(t, ~x)
q(t, ~x) with
= 0 (19)
QAi =
d3x A0i =
d3x q†(t, ~x)γ5
q(t, ~x) with
= 0 , (20)
which are also generators of SU(2)V × SU(2)A.
3.1.2 Explicit Symmetry Breaking
The mass term −q̄Mq in the QCD Lagrangian Eq. (1) breaks chiral sym-
metry explicitly. To better see this, let’s rewrite M,
(mu +md)
(mu −md)
(mu +md) I +
(mu −md) τ3 . (23)
The first term in the last equation in invariant under SU(2)V (isospin sym-
metry) and the second term vanishes for mu = md. Thus, isospin is an exact
symmetry if mu = md. However, both terms in Eq. (23) break SU(2)A.
Since the up and down quark masses are small as compared to the typical
hadronic mass scale of ≈ 1 GeV [cf. Eqs. (4) and (5)], the explicit chiral
symmetry breaking due to non-vanishing quark masses is very small.
3.1.3 Spontaneous Symmetry Breaking
A (continuous) symmetry is said to be spontaneously broken if a symmetry
of the Lagrangian is not realized in the ground state of the system. There is
evidence that the chiral symmetry of the QCD Lagrangian is spontaneously
broken—for dynamical reasons of nonperturbative origin which are not fully
understood at this time. The most plausible evidence comes from the hadron
spectrum. From chiral symmetry, one would naively expect the existence
of degenerate hadron multiplets of opposite parity, i.e., for any hadron of
positive parity one would expect a degenerate hadron state of negative parity
and vice versa. However, these “parity doublets” are not observed in nature.
For example, take the ρ-meson, a vector meson with negative parity (1−)
and mass 776 MeV. There does exist a 1+ meson, the a1, but it has a mass
of 1230 MeV and, thus, cannot be perceived as degenerate with the ρ. On
the other hand, the ρ meson comes in three charge states (equivalent to
three isospin states), the ρ± and the ρ0 with masses that differ by at most
a few MeV. In summary, in the QCD ground state (the hadron spectrum)
SU(2)V (isospin symmetry) is well observed, while SU(2)A (axial symmetry)
is broken. Or, in other words, SU(2)V ×SU(2)A is broken down to SU(2)V .
A spontaneously broken global symmetry implies the existence of (mass-
less) Goldstone bosons with the quantum numbers of the broken generators.
The broken generators are the QAi of Eq. (20) which are pseudoscalar. The
Goldstone bosons are identified with the isospin triplet of the (pseudoscalar)
pions, which explains why pions are so light. The pion masses are not ex-
actly zero because the up and down quark masses are not exactly zero either
(explicit symmetry breaking). Thus, pions are a truly remarkable species:
they reflect spontaneous as well as explicit symmetry breaking.
3.2 Chiral Effective Lagrangians Involving Pions
The next step in our EFT program is to build the most general Lagrangian
consistent with the (broken) symmetries discussed above. An elegant formal-
ism for the construction of such Lagrangians was developed by Callan, Cole-
man, Wess, and Zumino (CCWZ) [32] who worked out the group-theoretical
foundations of non-linear realizations of chiral symmetry. The Lagrangians
given below are built upon the CCWZ formalism.
As discussed, the relevant degrees of freedom are pions (Goldstone bosons)
and nucleons. Since the interactions of Goldstone bosons must vanish at zero
momentum transfer and in the chiral limit (m→ 0), the low-energy expan-
sion of the Lagrangian is arranged in powers of derivatives and pion masses.
This is chiral perturbation theory (ChPT).
The Lagrangian consists of one part that deals with the interaction
among pions, Lππ, and another one that describes the interaction between
pions and the nucleon, LπN :
Leff = Lππ + LπN (24)
Lππ = L(2)ππ + L
ππ + . . . (25)
LπN = L
πN + L
πN + L
πN + . . . , (26)
where the superscript refers to the number of derivatives or pion mass inser-
tions (chiral dimension) and the ellipsis stands for terms of higher dimension.
The leading order (LO) ππ Lagrangian is given by [33]
L(2)ππ =
∂µU∂µU
† +m2π(U + U
and the LO relativistic πN Lagrangian reads [34]
L(1)πN = Ψ̄
iγµDµ −MN +
γµγ5uµ
Ψ (28)
Dµ = ∂µ + Γµ (29)
(ξ†∂µξ + ξ∂µξ
τ · (π × ∂µπ) + . . . (30)
uµ = i(ξ
†∂µξ − ξ∂µξ†) = −
τ · ∂µπ + . . . (31)
U = ξ2 = 1 +
τ · π −
(τ · π)3 +
8α− 1
π4 + . . . (32)
In Eq. (28) the chirally covariant derivative Dµ is applied which introduces
the “gauge term” Γµ (also known as chiral connection), a vector current that
leads to a coupling of pions with the nucleon. Besides this, the Lagrangian
includes a coupling term which involves the axial vector uµ. The SU(2)
matrix U = ξ2 collects the Goldstone pion fields.
In the above equations, MN denotes the nucleon mass, gA the axial-
vector coupling constant, and fπ the pion decay constant. Numerical values
will be given later.
The coefficient α that appears in Eq. (32) is arbitrary. Therefore, dia-
grams with chiral vertices that involve three or four pions must always be
grouped together such that the α-dependence drops out (cf. Fig. 4, below).
We apply the heavy baryon (HB) formulation of chiral perturbation the-
ory [35] in which the relativistic πN Lagrangian is subjected to an expansion
in terms of powers of 1/MN (kind of a nonrelativistic expansion), the lowest
order of which is
L̂(1)πN = N̄
iD0 −
~σ · ~u
i∂0 −
τ · (π × ∂0π)−
τ · (~σ · ~∇)π
N + . . . (33)
In the relativistic formulation, the nucleon is represented by a four-component
Dirac spinor field, Ψ, while in the HB version, the nucleon, N , is a Pauli
spinor; in addition, all nucleon fields include Pauli spinors describing the
isospin of the nucleon.
At dimension two, the relativistic πN Lagrangian reads
L(2)πN =
ciΨ̄O
i Ψ . (34)
The various operators O(2)i are given in Ref. [36]. The fundamental rule by
which this Lagrangian—as well as all the other ones—are assembled is that
they must contain all terms consistent with chiral symmetry and Lorentz
invariance (apart from other trivial symmetries) at a given chiral dimension
(here: order two). The parameters ci are known as low-energy constants
(LECs) and are determined empirically from fits to πN data.
The HB projected πN Lagrangian at order two is most conveniently
broken up into two pieces,
L̂(2)πN = L̂
πN, fix + L̂
πN, ct , (35)
L̂(2)πN, fix = N̄
~D · ~D + i
{~σ · ~D, u0}
N (36)
L̂(2)πN, ct = N̄
2 c1m
π (U + U
u20 + c3 uµu
~σ · (~u× ~u)
N . (37)
Note that L̂(2)πN, fix is created entirely from the HB expansion of the relativistic
L(1)πN and thus has no free parameters (“fixed”), while L̂
πN, ct is dominated
by the new πN contact terms proportional to the ci parameters, besides
some small 1/MN corrections.
At dimension three, the relativistic πN Lagrangian can be formally writ-
ten as
L(3)πN =
diΨ̄O
i Ψ , (38)
with the operators, O(3)i , listed in Refs. [36, 37]; not all 23 terms are of
interest here. The new LECs that occur at this order are the di. Similar
to the order two case, the HB projected Lagrangian at order three can be
broken into two pieces,
L̂(3)πN = L̂
πN, fix + L̂
πN, ct , (39)
with L̂(3)πN, fix and L̂
πN, ct given in Refs. [36, 37].
3.3 Nucleon Contact Lagrangians
Nucleon contact interactions consist of four nucleon fields (four nucleon legs)
and no meson fields. Such terms are needed to renormalize loop integrals,
to make results reasonably independent of regulators, and to parametrize
the unresolved short-distance contributions to the nuclear force. For more
about contact terms, see Sec. 5.3.
Because of parity, nucleon contact interactions come only in even num-
bers of derivatives, thus,
LNN = L
NN + L
NN + L
NN + . . . (40)
The lowest order (or leading order) NN Lagrangian has no derivatives
and reads [18]
L(0)NN = −
CSN̄NN̄N −
CT N̄~σNN̄~σN , (41)
where N is the heavy baryon nucleon field. CS and CT are unknown con-
stants which are determined by a fit to the NN data. The second order NN
Lagrangian is given by [19]
L(2)NN = −C
1[(N̄ ~∇N)
2 + (~∇NN)2]− C ′2(N̄ ~∇N) · (~∇NN)
−C ′3N̄N [N̄ ~∇
2N + ~∇2NN ]
−iC ′4[N̄ ~∇N · (~∇N × ~σN) + (~∇N)N · (N̄~σ × ~∇N)]
−iC ′5N̄N(~∇N · ~σ × ~∇N)− iC
6(N̄~σN) · (~∇N × ~∇N)
−(C ′7δikδjl + C
8δilδkj + C
9δijδkl)
×[N̄σk∂iNN̄σl∂jN + ∂iNσkN∂jNσlN ]
−(C ′10δikδjl + C
11δilδkj + C
12δijδkl)N̄σk∂iN∂jNσlN
C ′13(δikδjl + δilδkj)
+C ′14δijδkl)[∂iNσk∂jN + ∂jNσk∂iN ]N̄σlN . (42)
Similar to CS and CT , the C ′i are unknown constants which are fixed in a
fit to the NN data. Obviously, these contact Lagrangians blow up quite a
bit with increasing order, which why we do not give L(4)NN explicitly here.
4 Nuclear Forces from EFT: Overview
In the beginning of Sec. 3, we spelled out the steps we have to take to
accomplish our EFT program for the derivation of nuclear forces. So far,
we discussed steps one to three. What is left are steps four (low-momentum
expansion) and five (Feynman diagrams). In this section, we will say more
about the expansion we are using and give an overview of the Feynman
diagrams that arise order by order.
4.1 Chiral Perturbation Theory and Power Counting
In ChPT, we analyze contributions in terms of powers of small momenta
over the large scale: (Q/Λχ)ν , where Q stands for a momentum (nucleon
three-momentum or pion four-momentum) or a pion mass and Λχ ≈ 1 GeV
is the chiral symmetry breaking scale (hadronic scale). Determining the
power ν at which a given diagram contributes has become known as power
counting. For a non-iterative contribution involving A nucleons, the power
ν is given by
ν = −2 + 2A− 2C + 2L+
∆i , (43)
∆i ≡ di +
− 2 , (44)
where C denotes the number of separately connected pieces and L the num-
ber of loops in the diagram; di is the number of derivatives or pion-mass
insertions and ni the number of nucleon fields involved in vertex i; the sum
runs over all vertices contained in the diagram under consideration. Note
that for an irreducible NN diagram (A = 2), the above formula reduces to
ν = 2L+
∆i (45)
The power ν is bounded from below; e.g., for A = 2, ν ≥ 0. This fact is
crucial for the power expansion to be of any use.
4.2 The Hierarchy of Nuclear Forces
Chiral perturbation theory and power counting imply that nuclear forces
emerge as a hierarchy ruled by the power ν, Fig. 1.
The NN amplitude is determined by two classes of contributions: con-
tact terms and pion-exchange diagrams. There are two contacts of order
Q0 [O(Q0)] represented by the four-nucleon graph with a small-dot vertex
shown in the first row of Fig. 1. The corresponding graph in the second row,
four nucleon legs and a solid square, represents the seven contact terms of
O(Q2). Finally, at O(Q4), we have 15 contact contributions represented by
a four-nucleon graph with a solid diamond.
Now, turning to the pion contributions: At leading order [LO, O(Q0),
ν = 0], there is only the well-known static one-pion exchange (1PE), second
diagram in the first row of Fig. 1. Two-pion exchange (2PE) starts at next-
to-leading order (NLO, ν = 2) and all diagrams of this leading-order two-
pion exchange are shown. Further 2PE contributions occur in any higher
+... +... +...
2N Force 3N Force 4N Force
N LO3
Figure 1: Hierarchy of nuclear forces in ChPT. Solid lines represent nucleons
and dashed lines pions. Further explanations are given in the text.
order. Of this sub-leading 2PE, we show only two representative diagrams at
next-to-next-to-leading order (NNLO) and three diagrams at next-to-next-
to-next-to-leading order (N3LO).
Finally, there is also three-pion exchange, which shows up for the first
time at N3LO (two loops; one representative 3π diagram is included in
Fig. 1). At this order, the 3π contribution is negligible [38].
One important advantage of ChPT is that it makes specific predictions
also for many-body forces. For a given order of ChPT, two-nucleon forces
(2NF), three-nucleon forces (3NF), . . . are generated on the same footing (cf.
Fig. 1). At LO, there are no 3NF, and at NLO, all 3NF terms cancel [18,
39]. However, at NNLO and higher orders, well-defined, nonvanishing 3NF
occur [39, 40]. Since 3NF show up for the first time at NNLO, they are
weak. Four-nucleon forces (4NF) occur first at N3LO and, therefore, they
are even weaker.
5 Two-Nucleon Forces
In this section, we will elaborate in detail on the two-nucleon force contri-
butions of which we have given a rough overview in the previous section.
5.1 Pion-Exchange Contributions in ChPT
The effective pion Lagrangians presented in Sec. 3.2 are the crucial ingredi-
ents for the evaluation of the pion-exchange contributions to the NN inter-
action. We will derive these contributions now order by order.
We will state our results in terms of contributions to the momentum-
space NN amplitude in the center-of-mass system (CMS), which takes the
general form
V (~p ′, ~p) = VC + τ 1 · τ 2WC
+ [VS + τ 1 · τ 2WS ] ~σ1 · ~σ2
+ [VLS + τ 1 · τ 2WLS ]
−i~S · (~q × ~k)
+ [VT + τ 1 · τ 2WT ] ~σ1 · ~q ~σ2 · ~q
+ [VσL + τ 1 · τ 2WσL ] ~σ1 · (~q × ~k ) ~σ2 · (~q × ~k ) , (46)
where ~p ′ and ~p denote the final and initial nucleon momenta in the CMS,
respectively; moreover,
~q ≡ ~p ′ − ~p is the momentum transfer,
~k ≡ 1
(~p ′ + ~p) the average momentum,
~S ≡ 1
(~σ1 + ~σ2) the total spin,
and ~σ1,2 and τ 1,2 are the spin and isospin operators, respectively, of nucleon
1 and 2. For on-energy-shell scattering, Vα and Wα (α = C, S, LS, T, σL)
can be expressed as functions of q and k (with q ≡ |~q| and k ≡ |~k|), only.
Our formalism is similar to the one used by the Munich group [22, 41, 42]
except for two differences: all our momentum space amplitudes differ by an
over-all factor of (−1) and our spin-orbit potentials, VLS and WLS , differ by
an additional factor of (−2). Our conventions are more in tune with what
is commonly used in nuclear physics.
In all expressions given below, we will state only the nonpolynomial con-
tributions to the NN amplitude. Note, however, that dimensional regular-
ization typically generates also polynomial terms. These polynomials are
absorbed by the contact interactions to be discussed in a later section and,
therefore, they are of no interest here.
5.1.1 Zeroth Order (LO)
At order zero [ν = 0, O(Q0), lowest order, leading order, LO], there is only
the well-known static one-pion exchange, second diagram in the first row of
Fig. 1 which is given by:
V1π(~p
′, ~p) = −
τ 1 · τ 2
~σ1 · ~q ~σ2 · ~q
q2 +m2π
. (48)
At first order [ν = 1, O(Q)], there are no pion-exchange contributions
(and also no contact terms).
5.1.2 Second Order (NLO)
Non-vanishing higher-order graphs start at second order (ν = 2, next-to-
leading order, NLO). The most efficient way to evaluate these loop diagrams
is to use covariant perturbation theory and dimensional regularization. This
is the method applied by the Munich group [22, 41, 42]. One starts with
the relativistic versions of the πN Lagrangians (cf. Sec. 3.2) and sets up
four-dimensional (covariant) loop integrals. Relativistic vertices and nucleon
propagators are then expanded in powers of 1/MN . The divergences that
occur in conjunction with the four-dimensional loop integrals are treated by
means of dimensional regularization, a prescription which is consistent with
chiral symmetry and power counting. The results derived in this way are
the same obtained when starting right away with the HB versions of the
πN Lagrangians. However, as it turns out, the method used by the Munich
group is more efficient in dealing with the rather tedious calculations.
Two-pion exchange occurs first at second order, also know as leading-
order 2π exchange. The graphs are shown in the first row of Fig. 2. Since
a loop creates already ν = 2, the vertices involved at this order can only be
from the leading/lowest order Lagrangian L̂(1)πN , Eq. (33), i. e., they carry
only one derivative. These vertices are denoted by small dots in Fig. 2.
Concerning the box diagram, we should note that we include only the non-
iterative part of this diagram which is obtained by subtracting the iter-
ated 1PE contribution Eq. (65) or Eq. (66), below, but using M2N/Ep ≈
(N LO)2
(NLO)
Figure 2: Two-pion exchange contributions to the NN interaction at order two
and three in small momenta. Solid lines represent nucleons and dashed lines pions.
Small dots denote vertices from the leading order πN Lagrangian L̂(1)πN , Eq. (33).
Large solid dots are vertices proportional to the LECs ci from the second order
Lagrangian L̂(2)πN, ct, Eq. (37). Symbols with an open circles are relativistic 1/MN
corrections contained in the second order Lagrangian L̂(2)πN , Eqs. (35). Only a few
representative examples of 1/MN corrections are shown and not all.
M2N/Ep′′ ≈ MN at this order (NLO). Summarizing all contributions from
irreducible two-pion exchange at second order, one obtains [22]:
WC = −
384π2f4π
4m2π(5g
A − 4g
A − 1) + q
2(23g4A − 10g
A − 1)
48g4Am
, (49)
VT = −
VS = −
3g4AL(q)
64π2f4π
, (50)
where
L(q) ≡
w + q
4m2π + q2 . (52)
5.1.3 Third Order (NNLO)
The two-pion exchange diagrams of order three (ν = 3, next-to-next-to-
leading order, NNLO) are very similar to the ones of order two, except that
they contain one insertion from L̂(2)πN , Eq. (35). The resulting contributions
are typically either proportional to one of the low-energy constants ci or
they contain a factor 1/MN . Notice that relativistic 1/MN corrections can
occur for vertices and nucleon propagators. In Fig. 2, we show in row 2
the diagrams with vertices proportional to ci (large solid dot), Eq. (37),
and in row 3 and 4 a few representative graphs with a 1/MN correction
(symbols with an open circle). The number of 1/MN correction graphs
is large and not all are shown in the figure. Again, the box diagram is
corrected for a contribution from the iterated 1PE. If the iterative 2PE of
Eq. (65) is used, the expansion of the factor M2N/Ep = MN − p
2/2MN + . . .
is applied and the term proportional to (−p2/2MN ) is subtracted from the
third order box diagram contribution. Then, one obtains for the full third
order contribution [22]:
16πf4π
16MNw2
2m2π(2c1 − c3)− q
× w̃2A(q)
, (53)
128πMNf4π
3g2Am
4m2π + 2q
2 − g2A(4m
π + 3q
w̃2A(q)
, (54)
VT = −
9g4Aw̃
2A(q)
512πMNf4π
, (55)
WT = −
g2AA(q)
32πf4π
(10m2π + 3q
, (56)
VLS =
3g4Aw̃
2A(q)
32πMNf4π
, (57)
WLS =
g2A(1− g
32πMNf4π
w2A(q) , (58)
A(q) ≡
arctan
2m2π + q2 . (60)
As discussed in Sec. 5.1.5, below, we prefer the iterative 2PE defined in
Eq. (66), which leads to a different NNLO term for the iterative 2PE. This
changes the 1/MN terms in the above potentials. The changes are obtained
by adding to Eqs. (53)-(56) the following terms:
VC = −
256πf4πMN
2 + ω̃4A(q)) (61)
128πf4πMN
2 + ω̃4A(q)) (62)
VT = −
512πf4πMN
(mπ + ω
2A(q)) (63)
WT = −
WS = −
256πf4πMN
(mπ + ω
2A(q)) (64)
5.1.4 Fourth Order (N3LO)
This order, which may also be denoted by next-to-next-to-next-to-leading
order (N3LO), is very involved. Three-pion exchange (3PE) occurs for the
first time at this order. The 3PE contribution at N3LO has been calculated
by the Munich group and found to be negligible [38]. Therefore, we will
ignore it.
The 2PE contributions at N3LO can be subdivided into two groups, one-
loop graphs, Fig. 3, and two-loop diagrams, Fig. 4. Since these contributions
are very complicated, we have moved them to Appendix A.
5.1.5 Iterated One-Pion-Exchange
Besides all the irreducible 2PE contributions presented above, there is also
the reducible 2PE which is generated from iterated 1PE. This “iterative
2PE” is the only 2PE contribution which produces an imaginary part. Thus,
one wishes to formulate this contribution such that relativistic elastic uni-
tarity is satisfied. There are several ways to achieve this.
Kaiser et al. [22] define the iterative 2PE contribution as follows,
(KBW)
2π,it (~p
′, ~p) =
d3p′′
(2π)3
V1π(~p ′, ~p ′′)V1π(~p ′′, ~p)
p2 − p′′2 + i�
Q4 (N LO)3
Figure 3: One-loop 2π-exchange contributions to the NN interaction at order
four. Basic notation as in Fig. 2. Symbols with a large solid dot and an open circle
denote 1/MN corrections of vertices proportional to ci. Symbols with two open
circles mark relativistic 1/M2N corrections. Both corrections are part of the third
order Lagrangian L̂(3)πN , Eq. (39). Representative examples for all types of one-loop
graphs that occur at this order are shown.
with V1π given in Eq. (48).
Since we adopt the relativistic scheme developed by Blankenbecler and
Sugar [43] (BbS) (see beginning of Sec. 5.4), we prefer the following for-
mulation which is consistent with the BbS approach (and, of course, with
relativistic elastic unitarity):
2π,it (~p
′, ~p) =
d3p′′
(2π)3
V1π(~p ′, ~p ′′)V1π(~p ′′, ~p)
p2 − p′′2 + i�
. (66)
The iterative 2PE contribution has to be subtracted from the covariant
box diagram, order by order. For this, the expansion M2N/Ep = MN −
p2/2MN + . . . is applied in Eq. (65) and M2N/Ep′′ = MN −p
′′2/2MN + . . . in
Eq. (66). At NLO, both choices for the iterative 2PE collapse to the same,
� � � �
� � � �
� � �
(N LO)3
Figure 4: Two-loop 2π-exchange contributions at order four. Basic notation as in
Fig. 2. The oval stands for all one-loop πN graphs some of which are shown in the
lower part of the figure. The solid square represents vertices proportional to the
LECs di which are introduced by the third order Lagrangian L
πN , Eq. (38). More
explanations are given in the text.
while at NNLO there are obvious differences.
5.2 NN Scattering in Peripheral Partial Waves Using the
Perturbative Amplitude
After the tedious mathematics of the previous section, it is time for more
tangible affairs. The obvious question to address now is: How does the
derived NN amplitude compare to empirical information? Since our deriva-
tion includes only one- and two-pion exchanges, we are dealing here with
the long- and intermediate-range part of the NN interaction. This part of
the nuclear force is probed in the peripheral partial waves of NN scattering.
Thus, in this section, we will calculate the phase shifts that result from the
NN amplitudes presented in the previous section and compare them to the
empirical phase shifts as well as to the predictions from conventional meson
theory. Besides the irreducible two-pion exchanges derived above, we must
also include 1PE and iterated 1PE.
In this section [44], which is restricted to just peripheral waves, we
will always consider neutron-proton (np) scattering and take the charge-
dependence of 1PE due to pion-mass splitting into account, since it is ap-
preciable. With the definition
V1π(mπ) ≡ −
~σ1 · ~q ~σ2 · ~q
q2 +m2π
, (67)
the charge-dependent 1PE for np scattering is
1π (~p
′, ~p) = −Vπ(mπ0) + (−1)
I+1 2Vπ(mπ±) , (68)
where I denotes the isospin of the two-nucleon system. We use mπ0 =
134.9766 MeV, mπ± = 139.5702 MeV [31], and
2MpMn
Mp +Mn
= 938.9182 MeV . (69)
Also in the iterative 2PE, we apply the charge-dependent 1PE, i.e., in
Eq. (66) we replace V1π with V
The perturbative relativistic T-matrix for np scattering in peripheral
waves is
T (~p ′, ~p) = V (np)1π (~p
′, ~p) + V (EM,np)2π,it (~p
′, ~p) + V2π,irr(~p
′, ~p) , (70)
where V2π,irr refers to any or all of the irreducible 2PE contributions pre-
sented in Sec. 5.1, depending on the order at which the calculation is con-
ducted. In the calculation of the irreducible 2PE, we use the average pion
mass mπ = 138.039 MeV and, thus, neglect the charge-dependence due to
pion-mass splitting. The charge-dependence that emerges from irreducible
2π exchange was investigated in Ref. [45] and found to be negligible for
partial waves with L ≥ 3.
For the T -matrix given in Eq. (70), we calculate phase shifts for partial
waves with L ≥ 3 and Tlab ≤ 300 MeV. At order four in small momenta,
partial waves with L ≥ 3 do not receive any contributions from contact inter-
actions and, thus, the non-polynomial pion contributions uniquely predict
the F and higher partial waves. We use fπ = 92.4 MeV [31] and gA = 1.29.
Via the Goldberger-Treiman relation, gA = gπNN fπ/MN , our value for gA
is consistent with g2πNN/4π = 13.63± 0.20 which is obtained from πN and
NN analysis [46, 47].
The LECs used in this calculation are shown in Table 2, column “NN
periph. Fig. 5”. Note that many determinations of the LECs, ci and d̄i, can
be found in the literature. The most reliable way to determine the LECs
Table 2: Low-energy constants, LECs, used for a NN potential at N3LO,
Sec. 5.4.4, and in the calculation of the peripheral NN phase shifts shown
in Fig. 5 (column “NN periph. Fig. 5”). The ci belong to the dimension-
two πN Lagrangian, Eq. (37), and are in units of GeV−1, while the d̄i are
associated with the dimension-three Lagrangian, Eq. (38), and are in units of
GeV−2. The column “πN empirical” shows determinations from πN data.
LEC NN potential NN periph. πN
at N3LO Fig. 5 empirical
c1 –0.81 –0.81 −0.81± 0.15a
c2 2.80 3.28 3.28± 0.23b
c3 –3.20 –3.40 −4.69± 1.34a
c4 5.40 3.40 3.40± 0.04a
d̄1 + d̄2 3.06 3.06 3.06± 0.21b
d̄3 –3.27 –3.27 −3.27± 0.73b
d̄5 0.45 0.45 0.45± 0.42b
d̄14 − d̄15 –5.65 –5.65 −5.65± 0.41b
aTable 1, Fit 1 of Ref. [48].
bTable 2, Fit 1 of Ref. [37].
from empirical πN information is to extract them from the πN amplitude
inside the Mandelstam triangle (unphysical region) which can be constructed
with the help of dispersion relations from empirical πN data. This method
was used by Büttiker and Meißner [48]. Unfortunately, the values for c2 and
all d̄i parameters obtained in Ref. [48] carry uncertainties, so large that the
values cannot provide any guidance. Therefore, in Table 2, only c1, c3, and
c4 are from Ref. [48], while the other LECs are taken from Ref. [37] where
the πN amplitude in the physical region was considered. To establish a link
between πN and NN , we apply the values from the above determinations
in our calculations of the NN peripheral phase shifts. In general, we use
the mean values; the only exception is c3, where we choose a value that is,
in terms of magnitude, about one standard deviation below the one from
Ref. [48]. With the exception of c3, phase shift predictions do not depend
sensitively on variations of the LECs within the quoted uncertainties.
In Fig. 5, we show the phase-shift predictions for neutron-proton scat-
tering in F waves for laboratory kinetic energies below 300 MeV (for G and
H waves, see Ref. [26]). The orders displayed are defined as follows:
• Leading order (LO) is just 1PE, Eq. (68).
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
Figure 5: F -wave phase shifts of neutron-proton scattering for laboratory kinetic
energies below 300 MeV. We show the predictions from chiral pion exchange at lead-
ing order (LO), next-to-leading order (NLO), next-to-next-to-leading order (N2LO),
and next-to-next-to-next-to-leading order (N3LO). The solid dots and open circles
are the results from the Nijmegen multi-energy np phase shift analysis [49] and the
VPI single-energy np analysis SM99 [50], respectively.
• Next-to-leading order (NLO) is 1PE, Eq. (68), plus iterated 1PE,
Eq. (66), plus the contributions of Sec. 5.1.2 (order two), Eqs. (49)
and (50).
• Next-to-next-to-leading order (denoted by N2LO in the figures) con-
sists of NLO plus the contributions of Sec. 5.1.3 (order three), Eqs. (53)-
(58) and (61)-(64).
• Next-to-next-to-next-to-leading order (denoted by N3LO in the fig-
ures) consists of N2LO plus the contributions of Sec. 5.1.4 (order four),
Eqs. (99)-(112) and (115)-(124).
It is clearly seen in Fig. 5 that the leading order 2π exchange (NLO) is
a rather small contribution, insufficient to explain the empirical facts. In
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
Figure 6: F -wave phase shifts of neutron-proton scattering for laboratory kinetic
energies below 300 MeV. We show the results from one-pion-exchange (OPE),
and one- plus two-pion exchange as predicted by ChPT at next-to-next-to-next-
to-leading order (N3LO) and by the Bonn Full Model [13] (Bonn). Note that the
“Bonn” curve does not include the repulsive ω and πρ exchanges of the full model,
since this figure serves the purpose to compare just predictions by different mod-
els/theories for the π + 2π contribution to the NN interaction. Empirical phase
shifts (solid dots and open circles) as in Fig. 5.
contrast, the next order (N2LO) is very large, several times NLO. This is
due to the ππNN contact interactions proportional to the LECs ci that are
introduced by the second order Lagrangian L(2)πN , Eq. (34). These contacts
are supposed to simulate the contributions from intermediate ∆-isobars and
correlated 2π exchange which are known to be large (see, e. g., Ref. [13]).
At N3LO a clearly identifiable trend towards convergence emerges. Ob-
viously, 1F3 and 3F4 appear fully converged. However, in 3F2 and 3F3, N3LO
differs noticeably from NNLO, but the difference is much smaller than the
one between NNLO and NLO. This is what we perceive as a trend towards
convergence.
In Fig. 6, we conduct a comparison between the predictions from chi-
ral one- and two-pion exchange at N3LO and the corresponding predictions
from conventional meson theory (curve ‘Bonn’). As representative for con-
ventional meson theory, we choose the Bonn meson-exchange model for the
NN interaction [13], since it contains a comprehensive and thoughtfully con-
structed model for 2π exchange. This 2π model includes box and crossed
box diagrams with NN , N∆, and ∆∆ intermediate states as well as di-
rect ππ interaction in S- and P -waves (of the ππ system) consistent with
empirical information from πN and ππ scattering. Besides this the Bonn
model also includes (repulsive) ω-meson exchange and irreducible diagrams
of π and ρ exchange (which are also repulsive). However, note that in the
phase shift predictions displayed in Fig. 6, the “Bonn” curve includes only
the 1π and 2π contributions from the Bonn model; the short-range contri-
butions are left out since the purpose of the figure is to compare different
models/theories for π + 2π. In all waves shown we see, in general, good
agreement between N3LO and Bonn. In 3F2 and 3F3 above 150 MeV and
in 3F4 above 250 MeV the chiral model at N3LO is more attractive than
the Bonn 2π model. Note, however, that the Bonn model is relativistic and,
thus, includes relativistic corrections up to infinite orders. Thus, one may
speculate that higher orders in ChPT may create some repulsion, moving
the Bonn and the chiral predictions even closer together [51].
The 2π exchange contribution to the NN interaction can also be de-
rived from empirical πN and ππ input using dispersion theory, which is
based upon unitarity, causality (analyticity), and crossing symmetry. The
amplitude NN̄ → ππ is constructed from πN → πN and πN → ππN
data using crossing properties and analytic continuation; this amplitude is
then ‘squared’ to yield the NN̄ amplitude which is related to NN by cross-
ing symmetry [52]. The Paris group [11, 12] pursued this path and calcu-
lated NN phase shifts in peripheral partial waves. Naively, the dispersion-
theoretic approach is the ideal one, since it is based exclusively on empirical
information. Unfortunately, in practice, quite a few uncertainties enter into
the approach. First, there are ambiguities in the analytic continuation and,
second, the dispersion integrals have to be cut off at a certain momentum
to ensure reasonable results. In Ref. [13], a thorough comparison was con-
ducted between the predictions by the Bonn model and the Paris approach
and it was demonstrated that the Bonn predictions always lie comfortably
within the range of uncertainty of the dispersion-theoretic results. There-
fore, there is no need to perform a separate comparison of our chiral N3LO
predictions with dispersion theory, since it would not add anything that we
cannot conclude from Fig. 6.
Finally, we need to compare the predictions with the empirical phase
shifts. In F waves the N3LO predictions above 200 MeV are, in general,
too attractive. Note, however, that this is also true for the predictions by
the Bonn π + 2π model. In the full Bonn model, besides π + 2π, (repul-
sive) ω and πρ exchanges are included which move the predictions right
on top of the data. The exchange of a ω meson or combined πρ exchange
are 3π exchanges. Three-pion exchange occurs first at chiral order four.
It has be investigated by Kaiser [38] and found to be negligible, at this
order. However, 3π exchange at order five appears to be sizable [53] and
may have impact on F waves. Besides this, there is the usual short-range
phenomenology. In ChPT, this short-range interaction is parametrized in
terms of four-nucleon contact terms (since heavy mesons do not have a place
in that theory). Contact terms of order four (N3LO) do not contribute to
F -waves, but order six does. In summary, the remaining small discrepan-
cies between the N3LO predictions and the empirical phase shifts may be
straightened out in fifth or sixth order of ChPT.
5.3 NN Contact Potentials
In conventional meson theory, the short-range nuclear force is described by
the exchange of heavy mesons, notably the ω(782). The qualitative short-
distance behavior of the NN potential is obtained by Fourier transform of
the propagator of a heavy meson,
ei~q·~r
m2ω + ~q2
e−mωr
. (71)
ChPT is an expansion in small momenta Q, too small to resolve struc-
tures like a ρ(770) or ω(782) meson, because Q � Λχ ≈ mρ,ω. But the
latter relation allows us to expand the propagator of a heavy meson into a
power series,
m2ω +Q2
−+ . . .
, (72)
where the ω is representative for any heavy meson of interest. The above
expansion suggests that it should be possible to describe the short distance
part of the nuclear force simply in terms of powers of Q/mω, which fits in
well with our over-all power scheme since Q/mω ≈ Q/Λχ.
A second purpose of contact terms is renormalization. Dimensional reg-
ularization of the loop integrals of pion-exchanges (cf. Sec. 5.1) typically
generates polynomial terms with coefficients that are, in part, infinite or
scale dependent. Contact terms pick up infinities and remove scale depen-
dence.
The partial-wave decomposition of a power Qν has an interesting prop-
erty. First note that Q can only be either the momentum transfer between
the two interacting nucleons q or the average momentum k [cf. Eq. (47) for
their definitions]. In any case, for even ν,
Qν = f ν
(cos θ) , (73)
where fm stands for a polynomial of degree m and θ is the CMS scattering
angle. The partial-wave decomposition of Qν for a state of orbital-angular
momentum L involves the integral
QνPL(cos θ)d cos θ =
(cos θ)PL(cos θ)d cos θ , (74)
where PL is a Legendre polynomial. Due to the orthogonality of the PL,
L = 0 for L >
. (75)
Consequently, contact terms of order zero contribute only in S-waves, while
order-two terms contribute up to P -waves, order-four terms up to D-waves,
etc..
We will now present, one by one, the various orders of NN contact
terms together with their partial-wave decomposition [54]. Note that, due
to parity, only even powers of Q are allowed.
5.3.1 Zeroth Order
The contact potential at order zero reads:
V (0)(~p′, ~p) = CS + CT ~σ1 · ~σ2 (76)
Partial wave decomposition yields:
V (0)(1S0) = C̃1S0 = 4π (CS − 3CT )
V (0)(3S1) = C̃3S1 = 4π (CS + CT ) (77)
5.3.2 Second Order
The contact potential contribution of order two is given by:
V (2)(~p′, ~p) = C1q
2 + C2k
2 + C4k
~σ1 · ~σ2
−i~S · (~q × ~k)
+ C6(~σ1 · ~q) (~σ2 · ~q)
+ C7(~σ1 · ~k) (~σ2 · ~k) (78)
Second order partial wave contributions:
S0) = C1S0(p
C2 − 3C3 −
C4 − C6 −
P0) = C3P0 pp
C5 + 2C6 −
P1) = C1P1 pp
C2 + 2C3 −
P1) = C3P1 pp
S1) = C3S1(p
C2 + C3 +
S1 −3 D1) = C3S1−3D1p
P2) = C3P2 pp
5.3.3 Fourth Order
The contact potential contribution of order four reads:
V (4)(~p′, ~p) = D1q
4 +D2k
4 +D3q
2k2 +D4(~q × ~k)2
4 +D6k
4 +D7q
2k2 +D8(~q × ~k)2
~σ1 · ~σ2
2 +D10k
−i~S · (~q × ~k)
2 +D12k
(~σ1 · ~q) (~σ2 · ~q)
2 +D14k
(~σ1 · ~k) (~σ2 · ~k)
+ D15
~σ1 · (~q × ~k) ~σ2 · (~q × ~k)
The rather lengthy partial-wave expressions of this order have been relegated
to Appendix B.
5.4 Constructing a Chiral NN Potential
5.4.1 Conceptual Questions
The two-nucleon system is non-perturbative as evidenced by the presence
of a shallow bound state (the deuteron) and large scattering lengths. Wein-
berg [18] showed that the strong enhancement of the scattering amplitude
arises from purely nucleonic intermediate states. He therefore suggested to
use perturbation theory to calculate the NN potential and to apply this
potential in a scattering equation to obtain the NN amplitude. We adopt
this prescription.
Since the irreducible diagrams that make up the potential are calculated
using covariant perturbation theory (cf. Sec. 5.1), it is consistent to start
from the covariant Bethe-Salpeter (BS) equation [55] describing two-nucleon
scattering. In operator notation, the BS equation reads
T = V + V G T (81)
with T the invariant amplitude for the two-nucleon scattering process, V the
sum of all connected two-particle irreducible diagrams, and G the relativistic
two-nucleon propagator. The BS equation is equivalent to a set of two
equations
T = V + V g T (82)
V = V + V (G − g)V (83)
= V + V1π (G − g)V1π + . . . , (84)
where g is a covariant three-dimensional propagator which preserves rela-
tivistic elastic unitarity. We choose the propagator g proposed by Blanken-
becler and Sugar (BbS) [43] (for more details on relativistic three-dimensional
reductions of the BS equation, see Ref. [7]). The ellipsis in Eq. (84) stands
for terms of irreducible 3π and higher pion exchanges which we neglect.
Note that when we speak of covariance in conjunction with (heavy baryon)
ChPT, we are not referring to manifest covariance. Relativity and relativis-
tic off-shell effects are accounted for in terms of a Q/MN expansion up to
the given order. Thus, Eq. (84) is evaluated in the following way,
V ≈ V(on-shell) + V1π G V1π − V1π g V1π , (85)
where the pion-exchange content of V(on-shell) is V1π+V ′2π with V1π the on-
shell 1PE given in Eq. (48) and V ′2π the irreducible 2π exchanges calculated
in Sec. 5.1, but without the box. V1π denotes the relativistic (off-shell) 1PE.
Notice that the term (V1π G V1π−V1π g V1π) represents what has been called
“the (irreducible part of the) box diagram contribution” in Sec. 5.1 where
it was evaluated at various orders.
The full chiral NN potential V is given by irreducible pion exchanges
Vπ and contact terms Vct,
V = Vπ + Vct (86)
Vπ = V1π + V2π + . . . , (87)
where the ellipsis denotes irreducible 3π and higher pion exchanges which
are omitted. Two-pion exchange contributions appear in various orders
V2π = V
2π + V
2π + V
2π + . . . (88)
as calculated in Sec. 5.1. Contact terms come in even orders,
Vct = V
ct + V
ct + V
ct + . . . (89)
and were presented in Sec. 5.3. The potential V is calculated at a given order.
For example, the potential at NNLO includes 2PE up to V (3)2π and contacts
up to V (2)ct . At N
3LO, contributions up to V (4)2π and V
ct are included.
The potential V satisfies the relativistic BbS equation, Eq. (82). Defining
V̂ (~p ′, ~p) ≡
(2π)3
V (~p ′, ~p)
T̂ (~p ′, ~p) ≡
(2π)3
T (~p ′, ~p)
with Ep ≡
M2N + ~p
2 (the factor 1/(2π)3 is added for convenience), the
BbS equation collapses into the usual, nonrelativistic Lippmann-Schwinger
(LS) equation,
T̂ (~p ′, ~p) = V̂ (~p ′, ~p) +
d3p′′ V̂ (~p ′, ~p ′′)
p2 − p′′2 + i�
T̂ (~p ′′, ~p) . (92)
Since V̂ satisfies Eq. (92), it can be used like a usual nonrelativistic
potential, and T̂ is the conventional nonrelativistic T-matrix.
Iteration of V̂ in the LS equation requires cutting V̂ off for high momenta
to avoid infinities, This is consistent with the fact that ChPT is a low-
momentum expansion which is valid only for momenta Q � Λχ ≈ 1 GeV.
Thus, we multiply V̂ with a regulator function
V̂ (~p ′, ~p) 7−→ V̂ (~p ′, ~p) e−(p
′/Λ)2n e−(p/Λ)
≈ V̂ (~p ′, ~p)
+ . . .
with the ‘cutoff parameter’ Λ around 0.5 GeV. Equation (94) provides an
indication of the fact that the exponential cutoff does not necessarily affect
the given order at which the calculation is conducted. For sufficiently large
n, the regulator introduces contributions that are beyond the given order.
Assuming a good rate of convergence of the chiral expansion, such orders are
small as compared to the given order and, thus, do not affect the accuracy at
the given order. In our calculations we use, of course, the full exponential,
Eq. (93), and not the expansion. On a similar note, we also do not expand
the square-root factors in Eqs. (90-91) because they are kinematical factors
which guarantee relativistic elastic unitarity.
5.4.2 What Order?
Since in nuclear EFT we are dealing with a perturbative expansion, at some
point, we have to raise the question, to what order of ChPT we have to go
to obtain the precision we need. To discuss this issue on firm grounds, we
show in Table 3 the χ2/datum for the fit of the world np data below 290
MeV for a family of np potentials at NLO and NNLO. The NLO potentials
produce the very large χ2/datum between 67 and 105, and the NNLO are
between 12 and 27. The rate of improvement from one order to the other
is very encouraging, but the quality of the reproduction of the np data at
NLO and NNLO is obviously insufficient for reliable predictions.
Table 3: χ2/datum for the reproduction of the 1999 np database [56] by
families of np potentials at NLO and NNLO constructed by the Juelich
group [57].
Tlab bin # of np — Juelich np potentials —
(MeV) data NLO NNLO
0–100 1058 4–5 1.4–1.9
100–190 501 77–121 12–32
190–290 843 140–220 25–69
0–290 2402 67–105 12–27
Based upon these facts, it has been pointed out in 2002 by Entem and
Machleidt [25, 26] that one has to proceed to N3LO. Consequently, the first
N3LO potential was published in 2003 [27].
At N3LO, there are 24 contact terms (24 parameters) which contribute to
the partial waves with L ≤ 2 (cf. Sec. 5.3). In Table 4, column ‘Q4/N3LO’,
we show how these terms/parameters are distributed over the various partial
waves. For comparison, we also show the number of parameters used in
the Nijmegen partial wave analysis (PWA93) [49] and in the high-precision
CD-Bonn potential [9]. The table reveals that, for S and P waves, the
number of parameters used in high-precision phenomenology and in EFT at
N3LO are about the same. Thus, the EFT approach provides retroactively
a justification for what the phenomenologists of the 1990’s were doing. At
NLO and NNLO, the number of parameters is substantially smaller than for
PWA93 and CD-Bonn, which explains why these orders are insufficient for
a quantitative potential. This fact is also clearly reflected in Fig. 7 where
phase shifts are shown for potentials constructed at NLO, NNLO, and N3LO.
5.4.3 Charge-Dependence
For an accurate fit of the low-energy pp and np data, charge-dependence
is important. We include charge-dependence up to next-to-leading order of
the isospin-violation scheme (NLØ, in the notation of Ref. [58]). Thus, we
include the pion mass difference in 1PE and the Coulomb potential in pp
scattering, which takes care of the LØ contributions. At order NLØ, we
have the pion mass difference in 2PE at NLO, πγ exchange [59], and two
charge-dependent contact interactions of order Q0 which make possible an
accurate fit of the three different 1S0 scattering lengths, app, ann, and anp.
Table 4: Number of parameters needed for fitting the np data in phase-shift
analysis and by a high-precision NN potential versus the number of NN
contact terms of EFT based potentials at different orders.
Nijmegen CD-Bonn — Contact Potentials —
partial-wave high-precision Q0 Q2 Q4
analysis [49] potential [9] LO NLO/NNLO N3LO
1S0 3 4 1 2 4
3S1 3 4 1 2 4
3S1-3D1 2 2 0 1 3
1P1 3 3 0 1 2
3P0 3 2 0 1 2
3P1 2 2 0 1 2
3P2 3 3 0 1 2
3P2-3F2 2 1 0 0 1
1D2 2 3 0 0 1
3D1 2 1 0 0 1
3D2 2 2 0 0 1
3D3 1 2 0 0 1
3D3-3G3 1 0 0 0 0
1F3 1 1 0 0 0
3F2 1 2 0 0 0
3F3 1 2 0 0 0
3F4 2 1 0 0 0
3F4-3H4 0 0 0 0 0
1G4 1 0 0 0 0
3G3 0 1 0 0 0
3G4 0 1 0 0 0
3G5 0 1 0 0 0
Total 35 38 2 9 24
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
Figure 7: Phase parameters for np scattering as calculated from NN potentials at
different orders of ChPT. The dotted line is NLO [57], the dashed NNLO [57], and
the solid N3LO [27]. Partial waves with total angular momentum J ≤ 2 are dis-
played. Solid dots represent the Nijmegen multienergy np phase shift analysis [49]
and open circles are the GWU/VPI single-energy np analysis SM99 [50].
5.4.4 A Quantitative NN Potential at N3LO
NN Scattering. The fitting procedure starts with the peripheral partial
waves because they depend on fewer parameters. Partial waves with L ≥ 3
are exclusively determined by 1PE and 2PE because the N3LO contacts
contribute to L ≤ 2 only. 1PE and 2PE at N3LO depend on the axial-
vector coupling constant, gA (we use gA = 1.29), the pion decay constant,
fπ = 92.4 MeV, and eight low-energy constants (LECs) that appear in the
dimension-two and dimension-three πN Lagrangians, Eqs. (37) and (38).
In the fitting process, we varied three of them, namely, c2, c3, and c4. We
found that the other LECs are not very effective in the NN system and,
therefore, we kept them at the values determined from πN (cf. Table 2).
The most influential constant is c3, which has to be chosen on the low side
(slightly more than one standard deviation below its πN determination) for
an optimal fit of the NN data. As compared to a calculation that strictly
uses the πN values for c2 and c4, our choices for these two LECs lower
Table 5: χ2/datum for the reproduction of the 1999 np database [56]
by various np potentials. (Numbers in parentheses are the values of cutoff
parameters in units of MeV used in the regulators of the chiral potentials.)
Tlab bin # of np Idaho Juelich Argonne
(MeV) data N3LO [27] N3LO [60] V18 [61]
(500–600) (600/700–450/500)
0–100 1058 1.0–1.1 1.0–1.1 0.95
100–190 501 1.1–1.2 1.3–1.8 1.10
190–290 843 1.2–1.4 2.8–20.0 1.11
0–290 2402 1.1–1.3 1.7–7.9 1.04
Table 6: χ2/datum for the reproduction of the 1999 pp database [56] by
various pp potentials. Notation as in Fig. 5.
Tlab bin # of np Idaho Juelich Argonne
(MeV) data N3LO [27] N3LO [60] V18 [61]
(500–600) (600/700–450/500)
0–100 795 1.0–1.7 1.0–3.8 1.0
100–190 411 1.5–1.9 3.5–11.6 1.3
190–290 851 1.9–2.7 4.3–44.4 1.8
0–290 2057 1.5–2.1 2.9–22.3 1.4
the 3F2 and 1F3 phase shifts bringing them into closer agreement with the
phase shift analysis. The other F waves and the higher partial waves are
essentially unaffected by our variations of c2 and c4. Overall, the fit of all
J ≥ 3 waves is very good.
We turn now to the lower partial waves. Here, the most important fit
parameters are the ones associated with the 24 contact terms that contribute
to the partial waves with L ≤ 2. In addition, we have two charge-dependent
contacts which are used to fit the three different 1S0 scattering lengths, app,
ann, and anp.
In the optimization procedure, we fit first phase shifts, and then we
refine the fit by minimizing the χ2 obtained from a direct comparison with
the data. The χ2/datum for the fit of the np data below 290 MeV is shown
in Table 5, and the corresponding one for pp is given in Table 6. These tables
show that at N3LO a χ2/datum comparable to the high-precision Argonne
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
0 100 200 300
Lab. Energy (MeV)
Figure 8: Neutron-proton phase parameters as described by two potentials at
N3LO. The solid curve is calculated from the Idaho N3LO potential [27] while the
dashed curve is from the Juelich [60] one. Solid dots and open circles as in Fig. 7.
V18 [61] potential can, indeed, be achieved. The “Idaho” N3LO potential [27]
produces a χ2/datum = 1.1 for the world np data below 290 MeV which
compares well with the χ2/datum = 1.04 by the Argonne potential. In 2005,
also the Juelich group produced several N3LO NN potentials [60], the best
of which fits the np data with a χ2/datum = 1.7 and the worse with a
χ2/datum = 7.9 (see Table 5). While 7.9 is clearly unacceptable for any
meaningful application, a χ2/datum of 1.7 is reasonable, although it does
not meet the precision standard that few-nucleon physicists established in
the 1990’s.
Turning to pp, the χ2 for pp data are typically larger than for np be-
cause of the higher precision of pp data. Thus, the Argonne V18 produces a
χ2/datum = 1.4 for the world pp data below 290 MeV and the best Idaho
N3LO pp potential obtains 1.5. The fit by the best Juelich N3LO pp poten-
tial results in a χ2/datum = 2.9 which, again, is not quite consistent with
the precision standards established in the 1990’s. The worst Juelich N3LO
pp potential produces a χ2/datum of 22.3 and is incompatible with reliable
predictions.
Table 7: Deuteron properties as predicted by various NN potentials are com-
pared to empirical information. (Deuteron binding energy Bd, asymptotic S state
AS , asymptotic D/S state η, deuteron radius rd, quadrupole moment Q, D-state
probability PD; the calculated rd and Q are without meson-exchange current con-
tributions and relativistic corrections.)
Idaho Juelich
N3LO [27] N3LO [60] CD-Bonn[9] AV18[61] Empiricala
(500) (550/600)
Bd (MeV) 2.224575 2.218279 2.224575 2.224575 2.224575(9)
AS (fm−1/2) 0.8843 0.8820 0.8846 0.8850 0.8846(9)
η 0.0256 0.0254 0.0256 0.0250 0.0256(4)
rd (fm) 1.975 1.977 1.966 1.967 1.97535(85)
Q (fm2) 0.275 0.266 0.270 0.270 0.2859(3)
PD (%) 4.51 3.28 4.85 5.76
aSee Table XVIII of Ref. [9] for references; the empirical value for rd is from Ref. [62].
Phase shifts of np scattering from the best Idaho (solid line) and Juelich
(dashed line) N3LO np potentials are shown in Figure 8. The phase shifts
confirm what the corresponding χ2 have already revealed.
The Deuteron. The reproduction of the deuteron parameters is shown
in Table 7. We present results for two N3LO potentials, namely, Idaho [27]
and Juelich [60]. Remarkable are the predictions by the chiral potentials for
the deuteron radius which are in good agreement with the latest empirical
value obtained by the isotope-shift method [62]. All NN potentials of the
past (Table 7 includes two representative examples, namely, CD-Bonn [9]
and AV18 [61]) fail to reproduce this very precise new value for the deuteron
radius.
In Fig. 9, we display the deuteron wave functions derived from the N3LO
potentials and compare them with wave functions based upon conventional
NN potentials from the recent past. Characteristic differences are notice-
able; in particular, the chiral wave functions are shifted towards larger r
which explains the larger deuteron radius.
6 Many-Nucleon Forces
As noted before, an important advantage of the EFT approach to nuclear
forces is that it creates two- and many-nucleon forces on an equal footing.
0 1 2 3 4 5 6
r (fm)
Figure 9: Deuteron wave functions: the family of larger curves are S-waves,
the smaller ones D-waves. The thick lines represent the wave functions
derived from chiral NN potentials at order N3LO (thick solid: Idaho [27],
thick dashed: Juelich [60]). The thin dashed, dash-dotted, and dotted lines
refer to the wave functions of the CD-Bonn [9], Nijm-I [8], and AV18 [61]
potentials, respectively.
6.1 Three-Nucleon Forces
The first non-vanishing 3NF terms occur at NNLO and are shown in Fig. 10
(cf. also Fig. 1, row ‘Q3/NNLO’, column ‘3N Force’). There are three di-
agrams: the 2PE, 1PE, and 3N-contact interactions [39, 40]. The 2PE
3N-potential is given by
V 3NF2PE =
i 6=j 6=k
(~σi · ~qi)(~σj · ~qj)
(q2i +m
ijk τ
j (95)
with ~qi ≡ ~pi′ − ~pi, where ~pi and ~pi′ are the initial and final momenta of
nucleon i, respectively, and
ijk = δ
4c1m2π
~qi · ~qj
�αβγ τ
k ~σk · [~qi × ~qj ] . (96)
The vertex involved in this 3NF term is the two-derivative ππNN vertex
(solid square in Fig. 10) which we encountered already in the 2PE contribu-
tion to the NN potential at NNLO. Thus, there are no new parameters and
1 2 3
Figure 10: The three-nucleon force at NNLO (from Ref. [40]).
the contribution is fixed by the LECs used in NN . The 1PE contribution is
V 3NF1PE = D
i 6=j 6=k
~σj · ~qj
q2j +m
(τ i · τ j)(~σi · ~qj) (97)
and, finally, the 3N contact term reads
V 3NFct = E
j 6=k
τ j · τ k . (98)
The last two 3NF terms involve two new vertices (that do not appear in the
2N problem), namely, the πNNNN vertex with parameter D and a 6N ver-
tex with parameters E. To pin them down, one needs two observables that
involve at least three nucleons. In Ref. [40], the triton binding energy and
the nd doublet scattering length 2and were used. Alternatively, one may also
choose the binding energies of 3H and 4He [63]. Once D and E are fixed, the
results for other 3N, 4N, . . . observables are predictions. In Refs. [64, 63],
the first calculations of the structure of light nuclei (6Li and 7Li) were re-
ported. Recently, the structure of nuclei with A = 10−13 nucleons has been
calculated using the ab initio no-core shell model and applying chiral two
and three-nucleon forces [65]. The results are very encouraging. Concerning
the famous ‘Ay puzzle’, the above 3NF terms yield some improvement of
the predicted nd Ay, however, the problem is not solved [40].
Note that the 3NF expressions given in Eqs. (95)-(98) above are the ones
that occur at NNLO, and all calculations to date have included only those.
Since we have to proceed to N3LO for sufficient accuracy of the 2NF, then
consistency requires that we also consider the 3NF at N3LO. The 3NF at
N3LO is very involved as can be seen from Fig. 11, but it does not depend
on any new parameters. It is presently under construction [66]. So, for the
moment, we can only hope that the Ay puzzle may be solved by a complete
calculation at N3LO.
Figure 11: Three-nucleon force contributions at N3LO (from Ref. [66]).
6.2 Four-Nucleon Forces
In ChPT, four-nucleon forces (4NF) appear for the first time at N3LO
(ν = 4). Thus, N3LO is the leading order for 4NF. Assuming a good rate of
convergence, a contribution of order (Q/Λχ)4 is expected to be rather small.
Thus, ChPT predicts 4NF to be essentially insignificant, consistent with ex-
perience. Still, nothing is fully proven in physics unless we have performed
explicit calculations. Very recently, the first such calculation has been per-
formed: The chiral 4NF, Fig. 12, has been applied in a calculation of the
4He binding energy and found to contribute a few 100 keV [68]. It should
be noted that this preliminary calculation involves many approximations,
but it certainly provides the right order of magnitude of the result, which is
indeed very small as compared to the full 4He binding energy of 28.3 MeV.
7 Conclusions
The theory of nuclear forces has made great progress since the turn of the
millennium. Nucleon-nucleon potentials have been developed that are based
on proper theory (EFT for low-energy QCD) and are of high-precision, at
the same time. Moreover, the theory generates two- and many-body forces
on an equal footing and provides a theoretical explanation for the empirically
known fact that 2NF � 3NF � 4NF . . . .
(a) (b) (c) (d)
(e) (f) (g) (h)
(i) (j) (k) (l)
(m) (n) (o) (p)
Figure 12: The four-nucleon force at N3LO (from Ref. [67]).
At N3LO [26, 27], the accuracy can be achieved that is necessary and
sufficient for reliable microscopic nuclear structure predictions. First cal-
culations applying the N3LO NN potential [27] in the conventional shell
model [69, 70], the ab initio no-core shell model [71, 72, 73], the coupled
cluster formalism [74, 75, 76, 77, 78], and the unitary-model-operator ap-
proach [79] have produced promising results.
The 3NF at NNLO is known [39, 40] and has been applied in few-nucleon
reactions [40, 80, 81] as well as the structure of light nuclei [64, 63, 65]. How-
ever, the famous ‘Ay puzzle’ of nucleon-deuteron scattering is not resolved
by the 3NF at NNLO. Thus, one important outstanding issue is the 3NF at
N3LO, which is under construction [66].
Another open question that needs to be settled is whether Weinberg
power counting, which is applied in all current NN potentials, is consistent.
This controversial issue is presently being debated in the literature [82, 83].
Acknowledgements
It is a pleasure to thank the organizers of this workshop, particularly,
Ananda Santra, for their warm hospitality. I gratefully acknowledge numer-
ous discussions with my collaborator D. R. Entem. This work was supported
in part by the U.S. National Science Foundation under Grant No. PHY-
0099444.
A Fourth Order Two-Pion Exchange Contributions
The fourth order 2PE contributions consist of two classes: the one-loop
(Fig. 3) and the two-loop diagrams (Fig. 4).
A.1 One-loop diagrams
This large pool of diagrams can be analyzed in a systematic way by intro-
ducing the following well-defined subdivisions.
A.1.1 c2i contributions.
The only contribution of this kind comes from the football diagram with
both vertices proportional to ci (first row of Fig. 3). One obtains [41]:
3L(q)
16π2f4π
w2 + c3w̃
2 − 4c1m2π
, (99)
WT = −
2L(q)
96π2f4π
. (100)
A.1.2 ci/MN contributions.
This class consists of diagrams with one vertex proportional to ci and one
1/MN correction. A few graphs that are representative for this class are
shown in the second row of Fig. 3. Symbols with a large solid dot and an
open circle denote 1/MN corrections of vertices proportional to ci. They are
part of L̂(3)πN , Eq. (39). The result for this group of diagrams is [41]:
VC = −
g2A L(q)
32π2MNf4π
(c2 − 6c3)q4 + 4(6c1 + c2 − 3c3)q2m2π
+6(c2 − 2c3)m4π + 24(2c1 + c3)m
, (101)
WC = −
2L(q)
192π2MNf4π
g2A(8m
π + 5q
2) + w2
, (102)
WT = −
c4L(q)
192π2MNf4π
g2A(16m
π + 7q
2)− w2
, (103)
VLS =
8π2MNf4π
w2L(q) , (104)
WLS = −
c4L(q)
48π2MNf4π
g2A(8m
π + 5q
2) + w2
. (105)
A.1.3 1/M2N corrections.
These are relativistic 1/M2N corrections of the leading order 2π exchange
diagrams. Typical examples for this large class are shown in row 3–6 of
Fig. 3. This time, there is no correction from the iterated 1PE, Eq. (65) or
Eq. (66), since the expansion of the factor M2N/Ep does not create a term
proportional to 1/M2N . The total result for this class is [42],
VC = −
32π2M2Nf
2m8πw
−4 + 8m6πw
−2 − q4 − 2m4π
(106)
WC = −
768π2M2Nf
q4 + 3m2πq
2 + 3m4π − 6m
−k2(8m2π + 5q
+ 4g4A
k2(20m2π + 7q
2 − 16m4πw
−2) + 16m8πw
+12m6πw
−2 − 4m4πq
2w−2 − 5q4 − 6m2πq
2 − 6m4π
− 4k2w2
16g4Am
, (107)
VT = −
g4A L(q)
32π2M2Nf
q2 +m4πw
, (108)
WT = −
1536π2M2Nf
7m2π +
q2 + 4m4πw
− 32g2A
m2π +
, (109)
VLS =
g4A L(q)
4π2M2Nf
q2 +m4πw
, (110)
WLS =
256π2M2Nf
16g2A
m2π +
4m4πw
q2 − 9m2π
, (111)
VσL =
g4A L(q)
32π2M2Nf
. (112)
A.2 Two-loop contributions.
The two-loop contributions are quite intricate. In Fig. 4, we attempt a
graphical representation of this class. The gray disk stands for all one-loop
πN graphs which are shown in some detail in the lower part of the figure.
Not all of the numerous graphs are displayed. Some of the missing ones
are obtained by permutation of the vertices along the nucleon line, others
by inverting initial and final states. Vertices denoted by a small dot are
from the leading order πN Lagrangian L̂(1)πN , Eq. (33), except for the four-
pion vertices which are from L(2)ππ , Eq. (27). The solid square represents
vertices proportional to the LECs di which are introduced by the third
order Lagrangian L(3)πN , Eq. (38). The di vertices occur actually in one-
loop NN diagrams, but we list them among the two-loop NN contributions
because they are needed to absorb divergences generated by one-loop πN
graphs. Using techniques from dispersion theory, Kaiser [41] calculated the
imaginary parts of the NN amplitudes, Im Vα(iµ) and Im Wα(iµ), which
result from analytic continuation to time-like momentum transfer q = iµ−0+
with µ ≥ 2mπ. From this, the momentum-space amplitudes Vα(q) and
Wα(q) are obtained via the subtracted dispersion relations:
VC,S(q) = −
ImVC,S(iµ)
µ5(µ2 + q2)
, (113)
VT (q) =
ImVT (iµ)
µ3(µ2 + q2)
, (114)
and similarly for WC,S,T .
In most cases, the dispersion integrals can be solved analytically and the
following expressions are obtained [26]:
VC(q) =
3g4Aw̃
2A(q)
1024π2f6π
(m2π + 2q
2mπ + w̃
2A(q)
+ 4g2Amπw̃
(115)
WC(q) = W
C (q) +W
C (q) , (116)
C (q) =
18432π4f6π
192π2f2πw
2g2Aw̃
(g2A − 1)w
6g2Aw̃
2 − (g2A − 1)w
384π2f2π
w̃2(d̄1 + d̄2) + 4m
+L(q)
4m2π(1 + 2g
A) + q
2(1 + 5g2A)
(5 + 13g2A) + 8m
π(1 + 2g
(117)
C (q) = −
ImW (b)C (iµ)
µ5(µ2 + q2)
, (118)
where
ImW (b)C (iµ) = −
3µ(8πf2π)3
g2A(2m
π − µ
2) + 2(g2A − 1)κ
− 3κ2x2 + 6κx
m2π + κ2x2 ln
m2π + κ2x2
µ2 − 2κ2x2 − 2m2π
m2π + κ2x2
; (119)
VT (q) = V
T (q) + V
T (q)
VS(q) = −
S (q) + V
S (q)
, (120)
T (q) = −
S (q) = −
2L(q)
32π2f4π
(d̄14 − d̄15) (121)
T (q) = −
S (q) =
ImV (b)T (iµ)
µ3(µ2 + q2)
, (122)
where
ImV (b)T (iµ) = −
2g6Aκ
µ(8πf2π)3
dx(1− x2)
m2π + κ2x2
 ; (123)
WT (q) = −
WS(q) =
2A(q)
2048π2f6π
w2A(q) + 2mπ(1 + 2g
, (124)
where κ ≡
µ2/4−m2π.
Note that the analytic solutions hold modulo polynomials. We have
checked the importance of those contributions where we could not find an
analytic solution and where, therefore, the integrations have to be performed
numerically. It turns out that the combined effect on NN phase shifts from
C , V
T , and V
S is smaller than 0.1 deg in F and G waves and smaller
than 0.01 deg in H waves, at Tlab = 300 MeV (and less at lower energies).
This renders these contributions negligible. Therefore, we omit W (b)C , V
and V (b)S in the construction of chiral NN potentials at order N
In Eqs. (117) and (121), we use the scale-independent LECs, d̄i, which
are obtained by combining the scale-dependent ones, dri (λ), with the chiral
logarithm, ln(mπ/λ), or equivalently d̄i = dri (mπ). The scale-dependent
LECs, dri (λ), are a consequence of renormalization. For more details about
this issue, see Ref. [37].
B Partial Wave Decomposition of the Fourth Or-
der Contact Potential
The contact potential contribution of order four, Eq. (80), decomposes into
partial-waves as follows.
V (4)(1S0) = D̂1S0(p
′4 + p4) +D1S0p
V (4)(3P0) = D3P0(p
′3p+ p′p3)
V (4)(1P1) = D1P1(p
′3p+ p′p3)
V (4)(3P1) = D3P1(p
′3p+ p′p3)
V (4)(3S1) = D̂3S1(p
′4 + p4) +D3S1p
V (4)(3D1) = D3D1p
V (4)(3S1 −3 D1) = D̂3S1−3D1p
4 +D3S1−3D1p
V (4)(1D2) = D1D2p
V (4)(3D2) = D3D2p
V (4)(3P2) = D3P2(p
′3p+ p′p3)
V (4)(3P2 −3 F2) = D3P2−3F2p
V (4)(3D3) = D3D3p
′2p2 (125)
The coefficients in the above expressions are given by:
D̂1S0 = D1 +
D3 − 3D5 −
D7 −D11 −
D12 −
D1S0 =
D4 − 10D5 −
D7 − 2D8 −
D12 −
D13 −
D14 −
D3P0 = −
D10 +
D11 +
D12 −
D1P1 = −
D2 + 4D5 −
D11 −
D3P1 = −
D10 − 2D11 −
D12 +
D̂3S1 = D1 +
D3 +D5 +
D11 +
D12 +
D3S1 =
D12 +
D13 +
D14 +
D3D1 =
D10 −
D11 +
D12 +
D13 −
D14 −
D̂3S1−3D1 = −
D11 −
D12 −
D13 −
D3S1−3D1 = −
D11 +
D12 +
D13 −
D14 +
D1D2 =
D12 +
D13 −
D14 +
D3D2 =
D10 +
D11 −
D12 −
D13 +
D14 +
D3P2 = −
D10 −
D11 +
D13 +
D3P2−3F2 =
D11 −
D12 +
D13 −
D3D3 =
D10 −
D15 (126)
References
[1] H. Yukawa, Proc. Phys. Math. Soc. Japan 17, 48 (1935).
[2] Prog. Theor. Phys. (Kyoto), Supplement 3 (1956).
[3] M. Taketani, S. Machida, and S. Onuma, Prog. Theor. Phys. (Kyoto)
7, 45 (1952).
[4] K. A. Brueckner and K. M. Watson, Phys. Rev. 90, 699; 92, 1023
(1953).
[5] A. R. Erwin et al., Phys. Rev. Lett. 6, 628 (1961); B. C. Maglić et al.,
ibid. 7, 178 (1961).
[6] Prog. Theor. Phys. (Kyoto), Supplement 39 (1967); R. A. Bryan and
B. L. Scott, Phys. Rev. 177, 1435 (1969); M. M. Nagels et al., Phys.
Rev. D 17, 768 (1978).
[7] R. Machleidt, Adv. Nucl. Phys. 19, 189 (1989).
[8] V. G. J. Stoks et al., Phys. Rev. C 49, 2950 (1994).
[9] R. Machleidt, Phys. Rev. C 63, 024001 (2001).
[10] A. D. Jackson, D. O. Riska, and B. Verwest, Nucl. Phys. A249, 397
(1975).
[11] R. Vinh Mau, in Mesons in Nuclei, edited by M. Rho and D. H. Wilkin-
son (North-Holland, Amsterdam, 1979), Vol. I, p. 151.
[12] M. Lacombe, B. Loiseau, J. M. Richard, R. Vinh Mau, J. Côté, P. Pires,
and R. de Tourreil, Phys. Rev. C 21, 861 (1980).
[13] R. Machleidt, K. Holinde, and Ch. Elster, Phys. Rep. 149, 1 (1987).
[14] F. Myhrer and J. Wroldsen, Rev. Mod. Phys. 60, 629 (1988).
[15] D. R. Entem, F. Fernandez, and A. Valcarce, Phys. Rev. C 62, 034002
(2000).
[16] G. H. Wu, J. L. Ping, L. J. Teng, F. Wang, and T. Goldman, Nucl.
Phys. A673, 273 (2000).
[17] S. Weinberg, Physica 96A, 327 (1979).
[18] S. Weinberg, Phys. Lett. B 251, 288 (1990); Nucl. Phys. B363, 3
(1991); Phys. Lett. B 295, 114 (1992).
[19] C. Ordóñez, L. Ray, and U. van Kolck, Phys. Rev. Lett. 72, 1982 (1994);
Phys. Rev. C 53, 2086 (1996).
[20] U. van Kolck, Prog. Part. Nucl. Phys. 43, 337 (1999).
[21] L. S. Celenza et al., Phys. Rev. C 46, 2213 (1992); C. A. da Rocha et
al., ibid. 49, 1818 (1994); D. B. Kaplan et al., Nucl. Phys. B478, 629
(1996).
[22] N. Kaiser, R. Brockmann, and W. Weise, Nucl. Phys. A625, 758 (1997).
[23] N. Kaiser, S. Gerstendörfer, and W. Weise, Nucl. Phys. A637, 395
(1998).
[24] E. Epelbaum et al., Nucl. Phys. A637, 107 (1998); A671, 295 (2000).
[25] D. R. Entem and R. Machleidt, Phys. Lett. B 524, 93 (2002).
[26] D. R. Entem and R. Machleidt, Phys. Rev. C 66, 014002 (2002).
[27] D. R. Entem and R. Machleidt, Phys. Rev. C 68, 041001 (2003).
[28] R. Machleidt and D. R. Entem, J. Phys. G: Nucl. Phys. 31, S1235
(2005).
[29] P. F. Bedaque and U. van Kolck, Ann. Rev. Nucl. Part. Sci. 52, 339
(2002).
[30] S. Scherer and M. R. Schindler, arXiv:hep-ph/0505265.
[31] Review of Particle Physics, J. Phys. G: Nucl. Part. Phys. 33, 1 (2006).
[32] S. Coleman, J. Wess, and B. Zumino, Phys. Rev. 177, 2239 (1969); C.
G. Callan, S. Coleman, J. Wess, and B. Zumino, ibid. 177, 2247 (1969).
[33] J. Gasser and H. Leutwyler, Ann. Phys. 158, 142 (1984).
[34] J. Gasser, M. E. Sainio, and A. Švarc, Nucl. Phys. B307, 779 (1988).
http://arxiv.org/abs/hep-ph/0505265
[35] V. Bernard, N. Kaiser, and U.-G. Meißner, Int. J. Mod. Phys. E 4, 193
(1995).
[36] N. Fettes, U.-G. Meißner, M. Mojžǐs, and S. Steininger, Ann. Phys.
(N.Y.) 283, 273 (2000); 288, 249 (2001).
[37] N. Fettes, U.-G. Meißner, and S. Steiniger, Nucl. Phys. A640, 199
(1998).
[38] N. Kaiser, Phys. Rev. C 61, 014003 (1999); 62, 024001 (2000).
[39] U. van Kolck, Phys. Rev. C 49, 2932 (1994).
[40] E. Epelbaum et al., Phys. Rev. C 66, 064001 (2002).
[41] N. Kaiser, Phys. Rev. C 64, 057001 (2001).
[42] N. Kaiser, Phys. Rev. C 65, 017001 (2002).
[43] R. Blankenbecler and R. Sugar, Phys. Rev. 142, 1051 (1966).
[44] This section closely follows Ref. [26].
[45] G. Q. Li and R. Machleidt, Phys. Rev. C 58, 3153 (1998).
[46] V. Stoks, R. Timmermans, and J. J. de Swart, Phys. Rev. C 47, 512
(1993).
[47] R. A. Arndt, R. L. Workman, and M. M. Pavan, Phys. Rev. C 49, 2729
(1994).
[48] P. Büttiker and U.-G. Meißner, Nucl. Phys. A668, 97 (2000).
[49] V. G. J. Stoks, R. A. M. Klomp, M. C. M. Rentmeester, and J. J. de
Swart, Phys. Rev. C 48, 792 (1993).
[50] R. A. Arndt, I. I. Strakovsky, and R. L. Workman, SAID, Scattering
Analysis Interactive Dial-in computer facility, George Washington Uni-
versity (formerly Virginia Polytechnic Institute), solution SM99 (Sum-
mer 1999); for more information see, e. g., R. A. Arndt, I. I. Strakovsky,
and R. L. Workman, Phys. Rev. C 50, 2731 (1994).
[51] In fact, preliminary calculations, which take an important class of dia-
grams of order five into account, indicate that the N4LO contribution
may prevailingly be repulsive (N. Kaiser, private communication).
[52] G. E. Brown and A. D. Jackson, The Nucleon-Nucleon Interaction,
(North-Holland, Amsterdam, 1976).
[53] N. Kaiser, Phys. Rev. C 63, 044010 (2001).
[54] K. Erkelenz, R. Alzetta, and K. Holinde, Nucl. Phys. A176, 413 (1971);
note that there is an error in equation (4.22) of this paper where it
should read
−W JLS = 2qq
′ J − 1
2J − 1
J−2,(0)
LS −A
+W JLS = 2qq
′ J + 2
2J + 3
J+2,(0)
LS −A
[55] E. E. Salpeter and H. A. Bethe, Phys. Rev. 84, 1232 (1951).
[56] The 1999 NN data base is defined in Ref. [9].
[57] E. Epelbaum, W. Glöckle, and U.-G. Meißner, Eur. Phys. J. A19, 401
(2004).
[58] M. Walzl et al., Nucl. Phys. A693, 663 (2001).
[59] U. van Kolck et al., Phys. Rev. Lett. 80, 4386 (1998).
[60] E. Epelbaum, W. Glöckle, and U.-G. Meißner, Nucl. Phys. A747, 362
(2005).
[61] R. B. Wiringa et al., Phys. Rev. C 51, 38 (1995).
[62] A. Huber et al., Phys. Rev. Lett. 80, 468 (1998).
[63] A. Nogga, P. Navratil, B. R. Barrett, and J. P. Vary, Phys. Rev. C 73,
064002 (2006).
[64] A. Nogga et al., Nucl. Phys. A737, 236 (2004).
[65] P. Navratil, V. G. Gueorguiev, J. P. Vary, W. E. Ormand, and A.
Nogga, arXiv:nucl-th/0701038.
[66] U.-G. Meißner, Proc. 18th International Conference on Few-Body Prob-
lems in Physics, Santos, SP, Brazil, August 2006, to be published in
Nucl. Phys. A.
[67] E. Epelbaum, Phys. Lett. B 639, 456 (2006).
http://arxiv.org/abs/nucl-th/0701038
[68] D. Rozpedzik et al., Acta Phys. Polon. B37, 2889 (2006); arXiv:nucl-
th/0606017.
[69] L. Coraggio et al., Phys. Rev. C 66. 021303 (2002).
[70] L. Coraggio et al., Phys. Rev. C 71. 014307 (2005).
[71] P. Navrátil and E. Caurier (2004) Phys. Rev. C 69 014311.
[72] C. Forssen et al., Phys. Rev. C 71, 044312 (2005).
[73] J.P. Vary et al., Eur. Phys. J. A 25 s01, 475 (2005).
[74] K. Kowalski et al., Phys. Rev. Lett. 92, 132501 (2004).
[75] D.J. Dean and M. Hjorth-Jensen (2004) Phys. Rev. C 69 054320.
[76] M. Wloch et al., J. Phys. G 31, S1291 (2005); Phys. Rev. Lett. 94,
21250 (2005).
[77] D.J. Dean et al., Nucl. Phys. 752, 299 (2005).
[78] J.R. Gour et al., Phys. Rev. C 74, 024310 (2006).
[79] S. Fujii, R. Okamato, and K. Suzuki, Phys. Rev. C 69, 034328 (2004).
[80] K. Ermisch et al., Phys. Rev. C 71, 064004 (2005).
[81] H. Witala, J. Golak, R. Skibinski, W. Glöckle, A. Nogga, E. Epelbaum,
H. Kamada, A. Kievsky, and M. Viviani, Phys. Rev. C 73, 044004
(2006).
[82] A. Nogga, R. G. E. Timmermans, and U. van Kolck, Phys. Rev. C 72,
054006 (2005).
[83] E. Epelbaum, U.-G. Meißner, arXiv:nucl-th/0609037.
http://arxiv.org/abs/nucl-th/0606017
http://arxiv.org/abs/nucl-th/0606017
http://arxiv.org/abs/nucl-th/0609037
	Introduction and Historical Perspective
	QCD and the Nuclear Force
	Effective Field Theory for Low-Energy QCD 
	Symmetries of Low-Energy QCD
	Chiral Symmetry
	Explicit Symmetry Breaking
	Spontaneous Symmetry Breaking
	Chiral Effective Lagrangians Involving Pions  
	Nucleon Contact Lagrangians
	Nuclear Forces from EFT: Overview
	Chiral Perturbation Theory and Power Counting
	The Hierarchy of Nuclear Forces
	Two-Nucleon Forces
	Pion-Exchange Contributions in ChPT 
	Zeroth Order (LO) 
	Second Order (NLO) 
	Third Order (NNLO) 
	Fourth Order (N3LO) 
	Iterated One-Pion-Exchange 
	NN Scattering in Peripheral Partial Waves Using the Perturbative Amplitude 
	NN Contact Potentials 
	Zeroth Order
	Second Order
	Fourth Order
	Constructing a Chiral NN Potential 
	Conceptual Questions
	What Order?
	Charge-Dependence
	A Quantitative NN Potential at N3LO 
	Many-Nucleon Forces
	Three-Nucleon Forces
	Four-Nucleon Forces
	Conclusions
	Fourth Order Two-Pion Exchange Contributions
	One-loop diagrams
	ci2 contributions.
	ci/MN contributions.
	1/MN2 corrections.
	Two-loop contributions.
	Partial Wave Decomposition of the Fourth Order Contact Potential
ABSTRACT
  In this lecture series, I present the recent progress in our understanding of
nuclear forces in terms of chiral effective field theory.

<|endoftext|><|startoftext|>
On a Conjecture of E. M. Stein on
the Hilbert Transform on Vector
Fields
Michael Lacey and Xiaochun Li
Michael Lacey, School of Mathematics, Georgia Insti-
tute of Technology, Atlanta GA 30332
Xiaochun Li, Department of Mathematics, University
of Illinois, Urbana IL 61801
E-mail address : xcli@math.uiuc.edu
http://arxiv.org/abs/0704.0808v3
1991 Mathematics Subject Classification. Primary 42A50, 42B25
The authors are supported in part by NSF grants. M.L. was
supported in part by the Guggenheim Foundation.
Abstract. Let v be a smooth vector field on the plane, that is a
map from the plane to the unit circle. We study sufficient condi-
tions for the boundedness of the Hilbert transform
Hv,ǫ f(x) := p.v.
f(x− yv(x)) dy
where ǫ is a suitably chosen parameter, determined by the smooth-
ness properties of the vector field. It is a conjecture, due to
E.M. Stein, that if v is Lipschitz, there is a positive ǫ for which
the transform above is bounded on L2. Our principal result gives
a sufficient condition in terms of the boundedness of a maximal
function associated to v, namely that this new maximal function
be bounded on some Lp, for some 1 < p < 2. We show that
the maximal function is bounded from L2 to weak L2 for all Lips-
chitz vector fields. The relationship between our results and other
known sufficient conditions is explored.
Contents
Preface vii
Chapter 1. Overview of Principal Results 1
Chapter 2. Connections to Besicovitch Set and Carleson’s
Theorem 7
Besicovitch Set 7
The Kakeya Maximal Function 7
Carleson’s Theorem 8
The Weak L2 Estimate in Theorem 1.15 is Sharp 10
Chapter 3. The Lipschitz Kakeya Maximal Function 11
The Weak L2 Estimate 11
An Obstacle to an Lp estimate, for 1 < p < 2 22
Bourgain’s Geometric Condition 23
Vector Fields that are a Function of One Variable 27
Chapter 4. The L2 Estimate for Hilbert Transform on Lipschitz
Vector Fields 31
Definitions and Principle Lemmas 31
Truncation and an Alternate Model Sum 36
Proofs of Lemmata 39
Chapter 5. Almost Orthogonality Between Annuli 63
Application of the Fourier Localization Lemma 63
The Fourier Localization Estimate 79
References 87
Preface
This memoir is devoted to a question in planar Harmonic Analysis,
a subject which is a circle of problems all related to the Besicovitch set.
This anomalous set has zero Lebesgue measure, yet contains a line seg-
ment of unit length in each direction of the plane. It is a known, since
the 1970’s, that such sets must necessarily have full Hausdorff dimen-
sion. The existence of these sets, and the full Hausdorff dimension, are
intimately related to other, independently interesting issues [26]. An
important tool to study these questions is the so-called Kakeya Maxi-
mal Function, in which one computes the maximal average of a function
over rectangles of a fixed eccentricity and arbitrary orientation.
Most famously, Charles Fefferman showed [10] that the Besicovitch
set is the obstacle to the boundedness of the disc multiplier in the
plane. But as well, this set is intimately related to finer questions of
Bochner-Riesz summability of Fourier series in higher dimensions and
space-time regularity of solutions of the wave equation.
This memoir concerns one of the finer questions which center around
the Besicovitch set in the plane. (There are not so many of these
questions, but our purpose here is not to catalog them!) It concerns
a certain degenerate Radon transform. Given a vector field v on R2,
one considers a Hilbert transform computed in the one dimensional line
segment determined by v, namely the Hilbert transform of a function
on the plane computed on the line segment {x+ tv(x) | |t| ≤ 1}.
The Besicovitch set itself says that choice of v cannot be just mea-
surable, for you can choose the vector field to always point into the set.
Finer constructions show that one cannot take it to be Hölder continu-
ous of any index strictly less than one. Is the sharp condition of Hölder
continuity of index one enough? This is the question of E. M. Stein,
motivated by an earlier question of A. Zygmund, who asked the same
for the question of differentiation of integrals.
The answer is not known under any condition of just smoothness
of the vector field. Indeed, as is known, and we explain, a positive
answer would necessarily imply Carleson’s famous theorem on the con-
vergence of Fourier series, [6]. This memoir is concerned with reversing
viii PREFACE
this implication: Given the striking recent successes related to Car-
leson’s Theorem, what can one say about Stein’s Conjecture? In this
direction, we introduce a new object into the study, a Lipschitz Kakeya
Maximal Function, which is a variant of the more familiar Kakeya Max-
imal Function, which links the vector field v to the ‘Besicovitch sets’
associated to the vector field. One averages a function over rectan-
gles of arbitrary orientation and—in contrast to the classical setting—
arbitrary eccentricity. But, the rectangle must suitably localize the
directions in which the vector field points. This Maximal Function ad-
mits a favorable estimate on L2, and this is one of the main results of
the Memoir.
On Stein’s Conjecture, we prove a conditional result: If the Lips-
chitz Kakeya Maximal Function associated with v maps is an estimate
a little better than our L2 estimate, then the associated Hilbert trans-
form is indeed bounded. Thus, the main question left open concerns
the behavior of these novel Maximal Functions.
While the main result is conditional, it does contain many of the
prior results on the subject, and greatly narrows the possible avenues
of a resolution of this conjecture.
The principal results and conjectures are stated in the Chapter 1;
following that we collect some of the background material for this sub-
ject, and prove some of the folklore results known about the subject.
The remainder of the Memoir is taken up with the proofs of the The-
orems stated in the Chapter 1.
Acknowledgment. The efforts of a strikingly generous referee has
resulted in corrections of arguments, and improvements in presentation
throughout this manuscript. We are indebted to that person.
Michael T. Lacey and Xiaochun Li
CHAPTER 1
Overview of Principal Results
We are interested in singular integral operators on functions of two
variables, which act by performing a one dimensional transform along
a particular line in the plane. The choice of lines is to be variable.
Thus, for a measurable map, v from R2 to the unit circle in the plane,
that is a vector field, and a Schwartz function f on R2, define
Hv,ǫ f(x) := p.v.
f(x− yv(x)) dy
This is a truncated Hilbert transform performed on the line segment
{x+ tv(x) : |t| < 1}. We stress the limit of the truncation in the defi-
nition above as it is important to different scale invariant formulations
of our questions of interest. This is an example of a Radon transform,
one that is degenerate in the sense that we seek results independent of
geometric assumptions on the vector field. We are primarily interested
in assumptions of smoothness on the vector field.
Also relevant is the corresponding maximal function
(1.1) Mv,ǫ f := sup
0<t≤ǫ
(2t)−1
|f(x− sv(x))| ds
The principal conjectures here concern Lipschitz vector fields.
Zygmund Conjecture 1.2. Suppose that v is Lipschitz. Then, for
all f ∈ L2(R2) we have the pointwise convergence
(1.3) lim
(2t)−1
f(x− sv(x)) ds = f(x) a. e.
More particularly, there is an absolute constant K > 0 so that if ǫ−1 =
K‖v‖Lip, we have the weak type estimate
(1.4) sup
λ|{Mv,ǫ f > λ}|1/2 . ‖f‖2 .
The origins of this question go back to the discovery of the Besicov-
itch set in the 1920s, and in particular, constructions of this set show
that the Conjecture is false under the assumption that v is Hölder con-
tinuous for any index strictly less than 1. These constructions, known
2 1. OVERVIEW OF PRINCIPAL RESULTS
since the 1920’s, were the inspiration for A. Zygmund to ask if integrals
of, say, L2(R2) functions could be differentiated in a Lipschitz choice
of directions. That is, for Lipschitz v, and f ∈ L2, is it the case that
(2ǫ)−1
f(x− yv(x)) dy = f(x) a.e.(x)
These and other matters are reviewed in the next chapter.
Much later, E. M. Stein [25] raised the singular integral variant of
this conjecture.
E.M. Stein Conjecture 1.5. There is an absolute constant K > 0
so that if ǫ−1 = K‖v‖Lip, we have the weak type estimate
(1.6) sup
λ|{|Hv,ǫ f | > λ}|1/2 . ‖f‖2 .
These are very difficult conjectures. Indeed, it is known that if
the Stein Conjecture holds for, say, C2 vector fields, then Carleson’s
Theorem on the pointwise convergence of Fourier series [6] would follow.
This folklore result is recalled in the next Chapter.
We will study these questions using modifications of the phase plane
analysis associated with Carleson’s Theorem [15–20] and a new tool,
which we term a Lipschitz Kakeya Maximal Function.
Associated with the Besicovitch set is the Kakeya Maximal Func-
tion, a maximal function over all rectangles of a given eccentricity.
A key estimate is that the L2 −→ L2,∞ norm of this operator grows
logarithmically in the eccentricity, [27, 28].
Associated with a Lipschitz vector field, we define a class of maxi-
mal functions taken over rectangles of arbitrary eccentricity, but these
rectangles are approximate level sets of the vector field. Perhaps sur-
prisingly, these maximal functions admit an L2 bound that is indepen-
dent of eccentricity. Let us explain.
A rectangle is determined as follows. Fix a choice of unit vectors in
the plane (e, e⊥), with e⊥ being the vector e rotated by π/2. Using these
vectors as coordinate axes, a rectangle is a product of two intervals
R = I × J . We will insist that |I| ≥ |J |, and use the notations
(1.7) L(R) = |I|, W(R) = |J |
for the length and width respectively of R.
The interval of uncertainty of R is the subarc EX(R) of the unit
circle in the plane, centered at e, and of length W(R)/ L(R). See Fig-
ure 1.1.
1. OVERVIEW OF PRINCIPAL RESULTS 3
0 EX(R) R
Figure 1.1. An example eccentricity interval EX(R).
The circle on the left has radius one.
We now fix a Lipschitz map v of the plane into the unit circle. We
only consider rectangles R with
(1.8) L(R) ≤ (100‖v‖Lip)−1 .
For such a rectangle R, set V(R) = R ∩ v−1(EX(R)). It is essential to
impose a restriction of this type on the length of the rectangles, for with
out it, one can modify constructions of the Besicovitch set to provide
examples which would contradict the main results and conjectures of
this work.
For 0 < δ < 1, we consider the maximal functions
(1.9) Mv,δ f(x)
= sup
|V(R)|≥δ|R|
1R(x)
|f(y)| dy.
That is we only form the supremum over rectangles for which the vector
field lies in the interval of uncertainty for a fixed positive proportion δ
of the rectangle, see Figure 1.2.
Weak L2 estimate for the Lipschitz Kakeya Maximal Function
1.10. The maximal function Mδ,v is bounded from L
2(R2) to L2,∞(R2)
with norm at most . δ−1/2. That is, for any λ > 0, and f ∈ L2(R2),
this inequality holds:
(1.11) λ2|{x ∈ R2 : Mδ,v f(x) > λ}| . δ−1‖f‖22 .
The norm estimate in particular is independent of the Lipschitz vector
field v.
A principal Conjecture of this work is:
Conjecture 1.12. For some 1 < p < 2, and some finite N and all
0 < δ < 1 and all Lipschitz vector fields v, the maximal function Mδ,v
is bounded from Lp(R2) to Lp,∞(R2) with norm at most . δ−N .
4 1. OVERVIEW OF PRINCIPAL RESULTS
Figure 1.2. A rectangle, with the vector field pointing
in the long direction of the rectangle at three points.
We cannot verify this Conjecture, only establishing that the norm
of the operator can be controlled by a slowly growing function of ec-
centricity.
In fact, this conjecture is stronger than what is needed below. Let us
modify the definition of the Lipschitz Kakeya Maximal Function, by re-
stricting the rectangles that enter into the definition to have an approx-
imately fixed width. For 0 < δ < 1, and choice of 0 < w < 1
‖v‖Lip,
parameterizing the width of the rectangles we consider, define
(1.13) Mv,δ,w f(x)
= sup
|V(R)|≥δ|R|
w≤W(R)≤2w
1R(x)
|f(y)| dy.
We can restrict attention to this case as the primary interest below
is the Hilbert transform on vector fields applied to functions with fre-
quency support in a fixed annulus. By Fourier uncertainty, the width
of the fixed annulus is the inverse of the parameter w above.
Conjecture 1.14. For some 1 < p < 2, and some finite N and all
0 < δ < 1, all Lipschitz vector fields v and 0 < w < 1
‖v‖Lip the
maximal function Mδ,v,w is bounded from L
p(R2) to Lp,∞(R2) with norm
at most . δ−N .
These conjectures are stated as to be universal over Lipschitz vec-
tor fields. On the other hand, we will state conditional results below
in which we assume that a given vector field satisfies the Conjecture
above, and then derive consequences for the Hilbert transform on vec-
tor fields. We also show that e. g. real-analytic vector fields [3] satisfy
these conjectures.
We turn to the Hilbert transform on vector fields. As it turns out,
it is useful to restrict functions in frequency variables to an annulus.
Such operators are given by
St f(x) =
1/t≤|ξ|≤2/t
f̂(ξ) eiξ·x dξ .
The relevance in part is explained in part by this result of the authors
[15], valid for measurable vector fields.
1. OVERVIEW OF PRINCIPAL RESULTS 5
Theorem 1.15. For any measurable vector field v we have the L2 into
λ|{|Hv,∞ ◦ St f | > λ}|1/2 . ‖f‖2 .
The inequality holds uniformly in t > 0.
It is critical that the Fourier restriction St enters in, for otherwise
the Besicovitch set would provide a counterexample, as we indicate in
the first section of Chapter 2. This is one point at which the differ-
ence between the maximal function and the Hilbert transform is strik-
ing. The maximal function variant of the estimate above holds, and
is relatively easy to prove, yet the Theorem above contains Carleson’s
Theorem on the pointwise convergence of Fourier series as a Corollary.
The weak L2 estimate is sharp for measurable vector fields, and so
we raise the conjecture
Conjecture 1.16. There is a universal constant K for which we have
the inequalities
(1.17) sup
0<t<‖v‖Lip
‖Hv,ǫ ◦ St‖2→2 <∞ ,
where ǫ = ‖v‖Lip/K.
Modern proofs of the pointwise convergence of Fourier series use the
so-called restricted weak type approach, invented by Muscalu, Tao and
Thiele in [21]. This method uses refinements of the weak L2 estimates,
together with appropriate maximal function estimates, to derive Lp
inequalities, for 1 < p < 2. In the case of Theorem 1.15—for which
this approach can not possibly work—the appropriate maximal func-
tion is the maximal function over all possible line segments. This is
the unbounded Kakeya Maximal function for rectangles with zero ec-
centricity. One might suspect that in the Lipschitz case, there is a
bounded maximal function. This is another motivation for our Lips-
chitz Kakeya Maximal Function, and our main Conjecture 1.12. We
illustrate how these issues play out in our current setting, with this
conditional result, one of the main results of this memoir.
Theorem 1.18. Assume that Conjecture 1.14 holds for a choice of
Lipschitz vector field v. Then we have the inequalities
(1.19) ‖Hv,ǫ ◦ St‖2 . 1 , 0 < t < ‖v‖Lip .
Here, ǫ is as in (1.17). Moreover, if the vector field as 1+η derivatives,
we have the estimate
(1.20) ‖Hv,ǫ‖2 . (1 + log‖v‖C1+η)2 .
In this case, ǫ = K/‖v‖C1+η and η > 0.
6 1. OVERVIEW OF PRINCIPAL RESULTS
While this is a conditional result, we shall see that it sheds new
light on prior results, such as one of Bourgain [3] on real analytic vector
fields. See Proposition 3.30, and the discussion of that Proposition.
The authors are not aware of any conceptual obstacles to the fol-
lowing extension of the Theorem above to be true, namely that one can
establish Lp estimates, for all p > 2. As our argument currently stands,
we could only prove this result for p sufficiently close to 2, because of
our currently crude understanding of the underlying orthogonality ar-
guments.
Conjecture 1.21. Assume that Conjecture 1.14 holds for a choice of
vector field v with 1 + η > 1 derivatives, then we have the inequalities
below
(1.22) ‖Hv,ǫ‖p . (1 + log‖v‖C1+η)2 , 2 < p <∞ .
In this case, ǫ = K/‖v‖C1+η .
For a brief remark on what is required to prove this conjecture, see
Remark 4.65.
The results of Christ, Nagel, Stein and Wainger [7] apply to certain
vector fields v. This work is a beautiful culmination of the ‘geometric’
approach to questions concerning the boundedness of Radon trans-
forms. Earlier, a positive result for analytic vector fields followed from
Nagel, Stein and Wainger [22]. E.M. Stein [25] specifically raised the
question of the boundedness of Hv for smooth vector fields v. And
the results of D. Phong and Stein [23, 24] also give results about Hv.
J. Bourgain [3] considered real–analytic vector field. N. H. Katz [13]
has made an interesting contribution to maximal function question.
Also see the partial results of Carbery, Seeger, Wainger and Wright [5].
CHAPTER 2
Connections to Besicovitch Set and Carleson’s
Theorem
Besicovitch Set
The Besicovitch set is a compact set that contains a line segment of
unit length in each direction in the plane. Anomalous constructions of
such sets show that they can have very small measure. Indeed, given
ǫ > 0 one can select rectangles R1, . . . , Rn, with disjoint eccentricities,
|EX(R)| ≃ n−1, and of unit length, so that |B| ≤ ǫ for B :=
n=1Rj.
On the other hand, letting ej ∈ EX(Rj), one has that the rectangles
Rj + ej are essentially disjoint. See Figure 2.1. Call the ‘reach’ of the
Besicovitch set
Reach :=
Rj + ej .
This set has measure about one. On the Reach, one can define a vector
field with points to a line segment contained in the Besicovitch set.
Clearly, one has
|Hv 1B(x)| ≃ 1 , x ∈ Reach .
Further, constructions of this set permit one to take the vector field
to be Lipschitz continuous of any index strictly less than one. And
conversely, if one considers a Besicovitch set associated to a vector field
of sufficiently small Lipschitz norm, of index one, the corresponding
Besicovitch set must have large measure. Thus, Lipschitz estimates
are critical.
The Kakeya Maximal Function
The Kakeya maximal function is typically defined as
(2.1) MK,ǫ f(x) := sup
|EX(R)|≥ǫ
1R(x)
|f(y)| dy , ǫ > 0 .
One is forced to take ǫ > 0 due to the existence of the Besicovitch set.
It is a critical fact that the norm of this operator admits a norm bound
on L2 that is logarithmic in ǫ. See Córdoba and Fefferman [8], and
8 2. CONNECTIONS
Figure 2.1. A Besicovitch Set on the left, and it’s
Reach on the right.
Strömberg [27, 28]. Subsequently, there have been several refinements
of this observation, we cite only Nets H. Katz [12], Alfonseca, Soria
and Vargas [1], and Alfonseca [2]. These papers contain additional
references. For the L2 norm, the following is the sharp result.
Theorem 2.2. We have the estimate below valid for all 0 < ǫ < 1.
‖MK,ǫ‖2→2 . 1 + log 1/ǫ .
The standard example of taking f to be the indicator of a small
disk show that the estimate above is sharp, and that the norm grows
as an inverse power of ǫ for 1 < p < 2.
Carleson’s Theorem
We explain the connection between the Hilbert transform on vector
fields and Carleson’s Theorem on the pointwise convergence of Fourier
series. Since smooth functions have a convergent Fourier expansion,
the main point of Carleson’s Theorem is to provide for the control of
an appropriate maximal function. We recall that maximal function in
this Theorem.
Carleson’s Theorem 2.3. For all measurable functions N : R −→
R, the operator below maps L2 into itself.
CNf(x) := p.v.
eiN(x)y f(x− y)dy
The implied operator norm is independent of the choice of measurable
N(x).
CARLESON’S THEOREM 9
v(x1)
N(x1)
ξ2 = J
σ(ξ1 −N(x1))
Figure 2.2. Deducing Carleson’s Theorem from Stein’s Conjecture.
For fixed function f , an appropriate choice of N will give us
∣∣∣p.v.
eiNy f(x− y)dy
∣∣∣ . |CNf(x)| .
Thus, in the Theorem above we have simply linearized the supremum.
Also, we have stated the Theorem with the un-truncated integral. The
content of the Theorem is unchanged if we make a truncation of the
integral, which we will do below.
Let us now show how to deduce this Theorem from an appropriate
bound on certain bound on Hilbert transforms on vector fields. (This
observation is apparently due to R.Coifman from the 1970’s.)
Proposition 2.4. Assume that we have, say, the bound
‖Hv,1‖2→2 . 1 ,
assuming that ‖v‖C2 ≤ 1. It follows that the Carleson maximal operator
is bounded on L2(R).
Proof. The Proposition and the proof are only given in their most
obvious formulation. Set σ(ξ) =
iξy dy
. For a C2 function N :
R −→ R we deduce that the operator with symbol σ(ξ − N(x)) maps
L2(R) into itself with norm that is independent of the C2 norm of
the function N(x). A standard limiting argument then permits one
to conclude the same bound for all measurable choices of N(x), as is
required for the deduction of Carleson’s inequality.
This argument is indicated in Figure 2.2. Take the vector field to
be v(x1, x2) = (1,−N(x1)/n) where n is chosen much larger than the
10 2. CONNECTIONS
C2 norm of the function N(x1). Then, Hv,1 is bounded on L
2(R2) with
norm bounded by an absolute constant. The symbol of Hv,1 is
σ(ξ1, ξ2) = σ(ξ1 − ξ2N(x1)/n) .
The trace of this symbol along the line ξ2 = J defines a symbol of a
bounded operator on L2(R). Taking J very large, we obtain a very good
approximation to symbol σ(ξ1 − ξ2N(x1)/n), deducing that it maps
L2(R) into itself with a bounded constant. Our proof is complete. �
The Weak L2 Estimate in Theorem 1.15 is Sharp
An example shows that under the assumption that the vector field
is measurable, the sharp conclusion is that Hv ◦ S1 maps L2 into L2,∞.
And a variant of the approach to Carleson’s theorem by Lacey and
Thiele [20] will prove this norm inequality. This method will also show,
under only the measurability assumption, that Hv S1 maps L
p into itself
for p > 2, as is shown by the current authors [15]. The results and
techniques of that paper are critical to this one.
CHAPTER 3
The Lipschitz Kakeya Maximal Function
The Weak L2 Estimate
We prove Theorem 1.10, the weak L2 estimate for the maximal
function defined in (1.9), by suitably adapting classical covering lemma
arguments.
The Covering Lemma Conditions. We adopt the covering lemma
approach of Córdoba and R. Fefferman [8]. To this end, we regard the
choice of vector field v and 0 < δ < 1 as fixed. Let R be any finite
collection of rectangles obeying the conditions (1.8) and |V(R)| ≥ δ|R|.
We show that R has a decomposition into disjoint collections R′ and
R′′ for which these estimates hold.
. δ−1
,(3.1)
R∈R′′
∣∣∣ .
(3.2)
The first of these conditions is the stronger one, as it bounds the L2
norm squared by the L1 norm; the verification of it will occupy most
of the proof.
Let us see how to deduce Theorem 1.10. Take λ > 0 and f ∈ L2
which is non negative and of norm one. Set R to be all the rectangles
R of prescribed maximum length as given in (1.8), density with respect
to the vector field, namely |V(R)| ≥ δ|R|, and
f(y) dy ≥ λ|R| .
We should verify the weak type inequality
(3.3) λ
. δ−1/2 .
12 3. LIPSCHITZ KAKEYA
Apply the decomposition to R. Observe that
. δ−1/2
Here of course we have used (3.1). This implies that
. δ−1/2.
Therefore clearly (3.3) holds for the collection R′.
Concerning the collection R′′, apply (3.2) to see that
R∈R′′
. δ−1/2 .
This completes our proof of (3.3).
The remainder of the proof is devoted to the proof of (3.1) and
(3.2).
The Covering Lemma Estimates.
Construction of R′ and R′′. In the course of the proof, we will need
several recursive procedures. The first of these occurs in the selection
of R′ and R′′.
We will have need of one large constant κ, of the order of say 100,
but whose exact value does not concern us. Using this notation hides
distracting terms.
Let Mκ be a maximal function given as
Mκ f(x) = sup
|f(y)| dy , sup
|f(x+ σω)| dσ
Here, Q is the unit square in plane, and Ω is a set of uniformly dis-
tributed points on the unit circle of cardinality equal to κ. It follows
from the usual weak type bounds that this operator maps L1(R2) into
weak L1(R2).
To initialize the recursive procedure, set
R′ ← ∅ ,
STOCK← R .
THE WEAK L
ESTIMATE 13
R′ κR
Figure 3.1. The rectangle R′ would have been removed
from STOCK upon the selection of R as a member of R′.
The main step is this while loop. While STOCK is not empty, select
R ∈ STOCK subject to the criteria that first it have a maximal length
L(R), and second that it have minimal value of |EX(R)|. Update
R′ ←R′ ∪ {R}.
Remove R from STOCK. As well, remove any rectangle R′ ∈ STOCK
which is also contained in
1κR ≥ κ−1
As the collection R is finite, the while loop will terminate, and at
this point we set R′′ def= R−R′. In the course of the argument below,
we will refer the order in which rectangles were added to R′.
With this construction, it is obvious that (3.3) holds, with a bound
that is a function of κ. Yet, κ is an absolute constant, so this depen-
dence does not concern us. And so the rest of the proof is devoted to
the verification of (3.1).
An important aspect of the qualitative nature of the interval of
eccentricity is encoded into this algorithm. We will choose κ so large
that this is true: Consider two rectangles R and R′ with R ∩ R′ 6= ∅,
L(R) ≥ L(R′), W(R) ≥ W(R′), |EX(R)| ≤ |EX(R′)| and EX(R) ⊂
10EX(R′) then we have
(3.4) R′ ⊂ κR .
See Figure 3.1.
Uniform Estimates. We estimate the left hand side of (3.1). In
so doing we expand the square, and seek certain uniform estimates.
14 3. LIPSCHITZ KAKEYA
Expanding the square on the left hand side of (3.1), we can estimate
l.h.s. of (3.1) ≤
|R|+ 2
(ρ,R)∈P
|ρ ∩R|
where P consists of all pairs (ρ, R) ∈ R′×R′ such that ρ∩R 6= ∅, and
ρ was selected to be a member of R′ before R was. It is then automatic
that L(R) ≤ L(ρ). And since the density of all tiles is positive, it follows
that dist(EX(ρ),EX(R)) ≤ 2‖v‖LipL(ρ) < 150 .
We will split up the collection P into sub-collections {SR : R ∈ R′}
and {Tρ : ρ ∈ R′}.
For a rectangle R ∈ R′, we take SR to consist of all rectangles ρ
such that (a) (ρ, R) ∈ P; and (b) EX(ρ) ⊂ 10EX(R). We assert that
(3.5)
|R ∩ ρ| ≤ |R|, R ∈ R.
This estimate is in fact easily available to us. Since the rectangles
ρ ∈ SR were selected to be in R′ before R was, we cannot have the
inclusion
(3.6) R ⊂
1κρ > κ
Now the rectangle ρ are also longer. Thus, if (3.5) does not hold, we
would compute the maximal function of
in a direction which is close, within an error of 2π/κ, of being orthog-
onal to the long direction of R. In this way, we will contradict (3.6).
The second uniform estimate that we need is as follows. For fixed
ρ, set Tρ to be the set of all rectangles R such that (a) (ρ, R) ∈ P and
(b) EX(ρ) 6⊂ 10EX(R). We assert that
(3.7)
|R ∩ ρ| . δ−1|ρ|, ρ ∈ R′.
This proof of this inequality is more involved, and taken up in the next
subsection.
Remark 3.8. In the proof of (3.7), it is not necessary that ρ ∈ R′.
Writing ρ = Iρ × Jρ, in the coordinate basis e and e⊥, we could take
any rectangle of the form I × Jρ.
THE WEAK L
ESTIMATE 15
. . .
Figure 3.2. Proof of Lemma 3.9
These two estimates conclude the proof of (3.1). For any two dis-
tinct rectangles ρ, R ∈ P, we will have either ρ ∈ SR or R ∈ Tρ. Thus
(3.1) follows by summing (3.5) on R and (3.7) on ρ.
The Proof of (3.7). We do not need this Lemma for the proof of
(3.7), but this is the most convenient place to prove it.
Lemma 3.9. Let S be any finite collection of rectangles with L(R) ≤
2 L(R′), and with |V(R)| ≥ δ|R| for all R,R′ ∈ S. Then it is the case
(3.10)
≤ 2δ−1 .
Proof. Fix a point x at which we give an upper bound on the
sum above. Let C(x) be any circle centered at x. We shall show that
there exists at most one R in S such that V(R) ∩ C(x) 6= ∅. By the
assumption |V(R)| ≥ δ|R| this proves the Lemma.
We prove this last claim by contradiction of the Lipschitz assump-
tion on the vector field v. Assume that there exist at least two rectan-
gles R,R′ ∈ S for which the sets V(R) and V(R′) intersect C(x). Thus
there exist y and y′ in C(x) such that v(y) ∈ EX(R) and v(y′) ∈ EX(R′).
Since v is Lipschitz, we have
|v(y)− v(y′)| ≤ ‖v‖Lip|y − y′|
≤ 4‖v‖Lip L(R)|v(y)− v(y′)| ,
but this is a contradiction to our assumption (1.8). See Figure 3.2. �
We fix ρ, and begin by making a decomposition of the collection Tρ.
Suppose that the coordinate axes for ρ are given by eρ, associated with
the long side of R, and e⊥ρ , with the short side. Write the rectangle
as a product of intervals Iρ × J , where |Iρ| = L(ρ). Denote one of the
endpoints of J as α. See Figure 3.3.
16 3. LIPSCHITZ KAKEYA
Figure 3.3. Notation for the proof of (3.7).
For rectangles R ∈ Tρ, let IR denote the orthogonal projection R
onto the line segment 2Iρ×{α}. Subsequently, we will consider different
subsets of this line segment. The first of these is as follows. For R ∈ Tρ,
let VR be the projection of the set V(R) onto 2Iρ × {α}. The angle θ
between ρ and R is at most |θ| ≤ 2‖v‖Lip L(ρ) ≤ 150 . It follows that
(3.11) 1
L(R) ≤ |IR| ≤ 2 L(R), and δ L(R) . |VR|.
A recursive mechanism is used to decompose Tρ. Initialize
STOCK← Tρ ,
U ← ∅ .
While STOCK 6= ∅ select R ∈ STOCK of maximal length. Update
U ← U ∪ {R},
U(R)← {R′ ∈ STOCK : VR ∩ VR′ 6= ∅}.
STOCK← STOCK− U(R).
(3.12)
When this while loop stops, it is the case that Tρ =
R∈U U(R).
With this construction, the sets {VR : R ∈ U} are disjoint. By
(3.11), we have
(3.13)
L(R) . δ−1 L(ρ) .
The main point, is then to verify the uniform estimate
(3.14)
R′∈U(R)
|R′ ∩ ρ| . L(R) ·W(ρ) , R ∈ U .
Note that both estimates immediately imply (3.7).
THE WEAK L
ESTIMATE 17
Figure 3.4. Proof of Lemma 3.15: The rectangles
R,R′ ∈ U(ρ), and so the angles R and R′ form with
ρ are nearly the same.
Proof of (3.14). There are three important, and more technical,
facts to observe about the collections U(R).
For any rectangle R′ ∈ U(R), denote its coordinate axes as eR′ and
e⊥R′ , associated to the long and short sides of R
′ respectively.
Lemma 3.15. For any rectangle R′ ∈ U(R) we have
|eR′ − eR| ≤ 12 |eρ− eR|
Proof. There are by construction, points x ∈ V(R) and x′ ∈ V(R′)
which get projected to the same point on the line segment Iρ × {α}.
See Figure 3.4. Observe that
|eR′ − eR| ≤ |EX(R′)|+ |EX(R)|+ |v(x′)− v(x)|
≤ |EX(R′)|+ |EX(R)|+ ‖v‖Lip · L(R) · |eρ− eR|
≤ |EX(R′)|+ |EX(R)|+ 1
|eρ− eR|
Now, |EX(R)| ≤ 1
|eρ− eR|, else we would have ρ ∈ SR. Likewise,
|EX(R′)| ≤ 1
|eR′ − eR|. And this proves the desired inequality.
Lemma 3.16. Suppose that there is an interval I ⊂ Iρ such that
(3.17)
R′∈U(R)
L(R′)≥8|I|
|R′ ∩ I × J | ≥ |I × J | .
Then there is no R′′ ∈ U(R) such that L(R′′) < |I| and R′′∩4I×J 6= ∅.
Proof. There is a natural angle θ between the rectangles ρ and R,
which we can assume is positive, and is given by |eρ− eR|. Notice that
we have θ ≥ 10|EX(R)|, else we would have ρ ∈ SR, which contradicts
our construction.
18 3. LIPSCHITZ KAKEYA
4I × J
I × J
Figure 3.5.
Moreover, there is an important consequence of Lemma 3.15: For
any R′ ∈ U(R), there is a natural angle θ′ between R′ and ρ. These
two angles are close. For our purposes below, these two angles can be
regarded as the same.
For any R′ ∈ U(R), we will have
|κR′ ∩ ρ|
|I × J | ≃ κ
W(R′) ·W(ρ)
θ|I|W(ρ)
W(R′)
θ · |I| .
Recall Mκ is larger than the maximal function over κ uniformly
distributed directions. Choose a direction e′ from this set of κ directions
that is closest to e⊥ρ . Take a line segment Λ in direction e
′ of length
κθ|I|, and the center of Λ is in 4I × J . See Figure 3.5. Then we have
|κR′ ∩ Λ|
|Λ| ≥
W(R′)
θ · |I|
Thus by our assumption (3.17),
R′∈U(R)
|κR′ ∩ Λ| & 1 .
That is, any of the lines Λ are contained in the set
1κR′ > κ
Clearly our construction does not permit any rectangle R′′ ∈ U(R)
contained in this set. To conclude the proof of our Lemma, we seek a
contradiction. Suppose that there is an R′′ ∈ U(R) with L(R′′) < |I|
and R′′ intersects 2I × J . The range of line segments Λ we can permit
THE WEAK L
ESTIMATE 19
4I × J
Figure 3.6. The proof of Lemma 3.18
is however quite broad. The only possibility permitted to us is that
the rectangle R′′ is quite wide. We must have
W(R′′) ≥ 1
|Λ| = κ
· θ · |I|.
This however forces us to have |EX(R′′)| ≥ κ
θ. And this implies that
ρ ∈ SR′′ , as in (3.5). This is the desired contradiction.
Our third and final fact about the collection U(R) is a consequence
of Lemma 3.15 and a geometric observation of J.-O. Stromberg [27,
Lemma 2, p. 400].
Lemma 3.18. For any interval I ⊂ IR we have the inequality
(3.19)
R′∈U(R)
L(R′)≤|I|≤
κL(R′)
|R′ ∩ I × J | ≤ 5|I| ·W(ρ) .
Proof. For each point x ∈ 4I × J , consider the square S centered
at x of side length equal to
κ · |I| · |eR− eρ|. See Figure 3.6. It is
Stromberg’s observation that for R′ ∈ U(R) we have
|κR′ ∩ I × J |
|I × J | ≃
|S ∩ κR′|
with the implied constant independent of κ. Indeed, by Lemma 3.15,
we have that
|κR′ ∩ I × J |
|I × J | ≃
κW(R′)
|eR− eρ| · |I|
≃ κW(R
′) · |I| · |eR− eρ|
(|eR− eρ| · |I|)2
≃ |S ∩ κR
|S| ,
20 3. LIPSCHITZ KAKEYA
as claimed.
Now, assume that (3.19) does not hold and seek a contradiction.
Let U ′ ⊂ U(R) denote the collection of rectangles R′ over which the
sum is made in (3.19). The rectangles in U ′ were added in some order
to the collection R′, and in particular there is a rectangle R0 ∈ U ′ that
was the last to be added to U ′. Let U ′′ be the collection U ′−{R0}. We
certainly have ∑
R′∈U ′′
|R′′ ∩ I × J | ≥ 4|I × J |.
Since we cannot have ρ ∈ SR0 , Stromberg’s observation implies that
R′∈U ′′
1κR′ > κ
Here, we rely upon the fact that the maximal function Mκ is larger than
the usual maximal function over squares. But this is a contradiction
to our construction, and so the proof is complete. �
The principal line of reasoning to prove (3.14) can now begin with
it’s initial recursive procedure. Initialize
C(R′)← R′ ∩ ρ .
We are to bound the sum
R′∈U(R)|C(R′)|. Initialize a collection of
subintervals of IR to be
I ← ∅
WHILE there is an interval I ⊂ IR satisfying∑
R′∈V(I)
|C(R′) ∩ I × J | ≥ 40|I| ·W(ρ) ,(3.20)
V(I) = {R′ ∈ U(R) | |C(R′) ∩ I × J | 6= ∅ , L(R′) ≥ 8|I|} ,(3.21)
we take I to be an interval of maximal length with this property, and
update
I ← I ∪ {I} ;
C(R′, I) = C(R′) ∩ I × J , R′ ∈ V(I);
C(R′)← C(R′)− I × J, R′ ∈ V(I) .
[We remark that this last updating is not needed in the most important
special case when all rectangles have the same width. But the case we
are considering, rectangles can have variable widths, so that |C(R)| can
be much larger than any |I| · |J | that would arise from this algorithm.]
Once the WHILE loop stops, we have
R′ ∩ ρ = C(R′) ∪
{C(R′, I) | I ∈ I , R′ ∈ V(I)} .
THE WEAK L
ESTIMATE 21
Here the union is over pairwise disjoint sets.
We first consider the collection of sets {C(R′) | R′ ∈ U(R)} that
remain after the WHILE loop has finished. Since we must not have
R′ ⊂ 1/4κ · ρ, it follows that the minimum value of L(R′) is 1
W(ρ).
Thus, if in (3.20), we consider an interval I of length 1
W(ρ), the
condition L(R′) ≥ 8|I| in the definition of V(I) in (3.21) is vacuous.
Thus, we necessarily have
R′∈V(I)
|C(R′) ∩ I × J | ≤ 40|I| ·W(ρ) .
For if this inequality failed, the WHILE loop would not have stopped.
We can partition IR by intervals of length close to
W(ρ), showing
that we have ∑
R′∈U(R)
|C(R′)| . |IR| ·W(ρ) .
Turning to the central components of the argument, namely the
bound for the terms associated with the intervals in I, consider I ∈ I.
The inequality (3.20) and Lemma 3.18 implies that each I ∈ I must
have length |I| ≤ κ−1/2|Iρ|. But we choose intervals in I to be of
maximal length. Thus,
R′∈V(I)
|C(R′, I)| ≤ 100 · |I| ·W(ρ) .
(3.22)
Indeed, suppose this last inequality fails. Let I ⊂ Ĩ ⊂ Iρ be an interval
twice as long as I. By Lemma 3.18, we conclude that
R′∈V(I)
L(R′)≤8|eI|
|R′ ∩ Ĩ × J | ≤ 10|I| ·W(ρ) .
Notice that we are restricting the sum on the left by the length of |Ĩ|.
Therefore, we have the inequalities
R′∈V(I)
L(R′)>8|eI|
|C(R′, I)| ≥ 90 · |I| ·W(ρ) > 40 · |Ĩ| ·W(ρ) .
That is, Ĩ would have been selected, contradicting our construction.
Lemmas 3.16 and 3.18 place significant restrictions on the collection
of intervals I. If we have I 6= I ′ ∈ I with 3
I ∩ 3
I ′ 6= ∅, then we must
have e.g.
κ|I ′| < |I|, as follows from Lemma 3.18. Moreover, V(I ′)
must contain a rectangle R′ with L(R′) < |I|. But this contradicts
Lemma 3.16.
22 3. LIPSCHITZ KAKEYA
Therefore, we must have
|I| . |IR| . L(R).
With (3.22), this completes the proof of (3.14).
An Obstacle to an Lp estimate, for 1 < p < 2
We address one of the main conjectures of this memoir, namely
Conjecture 1.12. Let us first observe
Proposition 3.23. We have the estimate below valid for all 0 < w <
‖v‖Lip.
λ|{Mv,w f > λ}|2/3 . δ−1/3(1 + logw−1‖v‖Lip)1/3‖f‖3/2
Proof. Let ‖v‖Lip = 1. This just relies upon the fact that with
0 < w < 1
fixed, there are only about log 1/w possible values of L(R).
This leads very easily to the following two estimates. Following the
earlier argument, consider an arbitrary collection of rectangles R with
each R ∈ R satisfying (1.8) and |V(R)| ≥ δ|R|. We can then decompose
R into disjoint collections R′ and R′′ for which these estimates hold.
. δ−1(log 1/w)
,(3.24)
R∈R′′
∣∣∣ .
(3.25)
Compare to (3.1) and (3.2). Following the same line of reasoning that
was used to prove (3.3), we prove our Proposition.
We can devise proofs of smaller bounds on the norm of the maximal
function than that given by this proposition. But no argument that we
can find avoids the some logarithmic term in the width of the rectangle.
Let us illustrate the difficulty in the estimate with an object pointed
out to us by Ciprian Demeter. We term it a pocketknife, and it is
pictured in Figure 3.7.
A pocketknife comes with a handle, namely a rectangle Rhandle that
is longer than any other rectangle in the pocketknife. We call a collec-
tion of rectangles B a set of blades if these two conditions are met. In
the first place,
(3.26) Rhandle ∩
R 6= ∅ .
BOURGAIN’S GEOMETRIC CONDITION 23
handle
blades
. . .
hinges
Figure 3.7. A pocketknife.
In the second place, we have
angle(R,Rhandle) ≃ angle(R′, Rhandle) , R, R′ ∈ B .
Let θ(B) denote the angle between Rhandle and the rectangles in the
blade B. We refer to as a hinge a rectangle of dimensions w/θ(B) by w,
in the same coordinate system of Rhandle that contains the intersection
in (3.26).
Now, let B be a collection of blades for the handle Rhandle. Our proof
of the weak L2 estimate for the Lipschitz Kakeya Maximal function
shows that we can assume
♯B · w2 · θ(B)−1 . |Rhandle| .
This is essentially the estimate (3.5).
But, to follow the covering lemma approach to the L3/2 estimate
for the maximal function, we need to control
(♯B)2 · w2 · θ(B)−1 .
We can only find control of expressions of this type in terms of some
slowly varying function of w−1.
Bourgain’s Geometric Condition
Jean Bourgain [3] gives a geometric condition on the Lipschitz vec-
tor field that is sufficient for the L2 boundedness of the maximal func-
tion associated with v. We describe the condition, and show how it
immediately proves that the corresponding Lipschitz Kakeya maximal
function admits a weak type bound on L1. In particular our Conjec-
ture 1.12 holds for these vector fields.
To motivate Bourgain’s condition, let us recall the earlier condition
considered by Nagel, Stein and Wainger in [22]. This condition imposes
24 3. LIPSCHITZ KAKEYA
a restriction on the maximum and minimum curvatures of the integral
curves of the vector field through the assumption that
supx∈Ω det[∇v(x)v(x), v(x)]
infx∈Ω det[∇v(x)v(x), v(x)]
Here, Ω is a domain in R2, and one can achieve an upper bound on the
norm of the maximal function associated to v, appropriately restricted
to Ω, in terms of this ratio.
Bourgain’s condition permits the vector field to have integral curves
which are flat. Suppose that v is defined on all of R2. Define
(3.27) ω(x; t) := |det[v(x+ tv(x)), v(x)]| , |t| ≤ 1
‖v‖Lip .
Assume a uniform estimate of the following type: For absolute con-
stants 0 < c, C <∞ and 0 < ǫ0 < 12‖v‖Lip,
(3.28) |{|t| ≤ ǫ | ω(x; t) < τ sup
|s|≤ǫ
ω(x, s)}| ≤ Cτ cǫ ,
this condition holding for all x ∈ R2, 0 < τ < 1 and 0 < ǫ < ǫ0.
The interest in this condition stems from the fact [3] that real-
analytic vector fields satisfy it. Also see Remark 3.35. Bourgain proved:
Theorem 3.29. Assume that (3.28) holds. Then, the maximal opera-
tor Mv,ǫ0 defined in (1.1) maps L
2 into itself.
This paper claims that the same methods would prove the bounds
‖Mv,ǫ0‖p . ‖f‖p , 1 < p <∞ .
And suggests that similar methods would apply to the localized Hilbert
transform with respect to these vector fields.
Here, we prove
Proposition 3.30. Assume that (3.28) holds. Then, the Lipschitz
Kakeya Maximal Functions
Mv,δ,w , 0 < δ < 1 , 0 < w < ǫ0
defined in (1.13) satisfy the weak L1 estimate
λ|{Mv,δ,w f > λ}| . δ−1(1 + log 1/δ)‖f‖1 .
The implied constants depend upon the constants in (3.28).
That is, these vector fields easily fall within the scope of our anal-
ysis. As a corollary to Theorem 1.18, we see that Hv maps L
2 into
itself.
BOURGAIN’S GEOMETRIC CONDITION 25
Figure 3.8. Proof of (3.31).
Proof. Let us assume that ‖v‖Lip = 1. Fix δ > 0 and 0 < w < ǫ0.
Let R be the class of rectangles with L(R) < κ and satisfying |V(R)| ≥
δ|R|.
Say that R′ ⊂ R has scales separated by s > 3 iff for R,R′ ∈ R′ the
condition 4 L(R) < L(R′) implies that 2s L(R) < L(R′). One sees that
R can be decomposed into ≃ s sub-collections with scales separated
by s.
The fortunate observation is this: Assuming (3.28), and taking s ≃
log 1/δ, any subset R′ ⊂ R with scales separated by s further enjoys
this property: If R,R′ ∈ R′ with C EX(R) ∩ C EX(R′) = ∅, with C a
fixed constant, then
(3.31) L(R) ≃ L(R′) or R ∩R′ = ∅ .
Let us see why this is true, arguing by contradiction. Thus we
assume that L(R′) ≤ 2−s L(R), R∩R′ 6= ∅ and C EX(R)∩C EX(R′) = ∅.
Since the rectangles have an essentially fixed width, it follows that
2|EX(R′)| ≥ |EX(R)|. Fix a line ℓ in the long direction of 2R with
|{x ∈ ℓ | v(x) ∈ V(R)}| ≥ δ
|ℓ| = δ
L(R) .
Let x0 be in the set above, x
0 ∈ V (R′) and x′0 is the projection of x′′0
onto the line ℓ. See Figure 3.8. Observe that we can estimate
|v(x′′0)− v(x′0)| ≤ 2|v(x0)− v(x′′0)| L(R′)(3.32)
Therefore, for C sufficiently large, we have
|v(x0)− v(x′0)| ≥
∣∣|v(x′0)− v(x′′0)| − |v(x′′0)− v(x0)|
≥ |v(x′′0)− v(x0)|(1− 2 L(R′))
≥ |EX(R′)|
provided C is large enough.
Now, after a moments thought, one sees that
|det[v(x0), v(x′0)]| ≃ angle(v(x0), v(x′0)) .
26 3. LIPSCHITZ KAKEYA
Therefore, for any x ∈ ℓ
s≤L(R)
ω(x; s) & |EX(R′)| .
But the vector field satisfies (3.28), which we will apply with
τ ≃ EX(R)
EX(R′)
≃ L(R
It follows that
L(R) ≤ |{x ∈ ℓ | ω(x; s) ≤ cτ |EX(R′)|}|
≤ |{x ∈ ℓ | ω(x; s) ≤ τ sup
|s|≤ǫ
ω(x, s)}|
. τ c L(R) .
Therefore, we see
(δ/2)1/c .
L(R′)
which is a contradiction to R′ have scales separated by s, and s ≃
1 + log 1/δ.
Let us see how to prove the Proposition now that we have proved
(3.31). Take s ≃ log 1/δ, and a finite sub-collection R′ ⊂ R of rectan-
gles with scales separated by s. We may take a further subset R′′ ⊂ R′
such that ∥∥∥
R′′∈R′′
. δ−1 ,(3.33)
R∈R′−R′′
∣∣∣ .
R∈R′′
|R′′| .(3.34)
These are precisely the covering estimates needed to prove the weak
L1 estimate claimed in the proposition.
But, in choosing R′′ to satisfy (3.33), it is clear that we need only
be concerned about rectangles with a fixed length, and the separation
in scales are (3.31) will control rectangles of distinct lengths.
The procedure that we apply to select R′′ is inductive. Set
R′′ ← ∅ ,
S ← ∅ ,
STOCK←R′ .
WHILE STOCK 6= ∅, select R ∈ STOCK with maximal length, and
update R′′ ← R′′ ∪ {R}, as well as STOCK ← STOCK − {R}. In
addition, for any R′ ∈ STOCK with R′ ⊂ 4CR, where C ≥ 1 is the
VECTOR FIELDS THAT ARE A FUNCTION OF ONE VARIABLE 27
constant that insures that (3.31) holds, remove these rectangles from
STOCK and add them to S.
Once the WHILE loop stops, we will have STOCK = ∅ and we have
our decomposition of R′. By construction, it is clear that (3.34) holds.
We need only check that (3.33) holds. Now, consider R,R′ ∈ R′, with
two rectangles have their scales separated, thus 2s L(R′) < L(R). If it
is the case that R∩R′ 6= ∅ and C EX(R)∩C EX(R′) 6= ∅, then R would
been selected to be in R′ first, whence R′ would have been placed in
Therefore, C EX(R) ∩ C EX(R′) = ∅, but then (3.31) implies that
R ∩ R′ = ∅. Thus, the only contribution to the L∞ norm in (3.33)
can come from rectangles of about the same length. But Lemma 3.9
then implies that such rectangles can overlap only about δ−1 times.
Our proof is complete. (As the interest in (3.28) is in small values of
c, it will be more efficient to use Lemma 3.9 to handle the case of the
rectangles having approximately the same length.) �
Remark 3.35. To conclude that the Hilbert transform on vector fields
is bounded, one could weaken Bourgain’s condition (3.28) to
|{|t| ≤ ǫ | ω(x; t) < τ sup
|s|≤ǫ
ω(x, s)}| ≤ C exp(−(log 1/τ)c)ǫ .
This inequality is to hold universally in x ∈ R2, 0 < τ < 1, and
0 < ǫ < ‖v‖Lip. This is of interest for 0 < c < 1. The proof above
can be modified to show that the maximal functions Mv,δ,w satisfy the
weak L1 inequality, with constant at most . δ−1−1/c.
Vector Fields that are a Function of One Variable
We specialize to the vector fields that are a function of just one real
variable. Assume that the vector field v is of the form
(3.36) v(x1, x2) = (v1(x2), v2(x2)) ,
and for the moment we do not impose the condition that the vector
field take values in the unit circle. The point is simply this: If we are
interested in transforms where the kernel is not localized, the restriction
on the vector field is immaterial. Namely, for any vector field v
Hv,∞ f(x) = p.v.
f(x− yv(x)) dy
= p.v.
f(x− yṽ(x)) dy
, ṽ(x) =
|v(x)| .
We return to a theme implicit in the proof of Proposition 2.4. This
proof only relies upon vector fields that are only a function of one
28 3. LIPSCHITZ KAKEYA
variable. Thus, it is a significant subcase of the Stein Conjecture to
verify it for Lipschitz vector fields of just one variable. Indeed, the
situation is this.
Proposition 3.37. Suppose that a choice of vector field v(x1, x2) =
(1, v1(x1)) is just a function of, say, the first coordinate. Then, Hv,∞
maps L2(R2) into itself.
Proof. The symbol of Hv,∞ is
sgn(ξ1 + ξ2v1(x1)) .
For each fixed ξ2, this is a bounded symbol. And in the special case
of the L2 estimate, this is enough to conclude the boundedness of the
operator. �
It is of interest to extend this Theorem in any Lp, for p 6= 2, for
some reasonable choice of vector fields.
The corresponding questions for the maximal function are also of
interest, and here the subject is much more developed. The paper [5]
studies the maximal function Mv,∞. They proved the boundedness of
this maximal function on Lp, p > 1, assuming that the vector field was
of the form v(x) = (1, v2(x)), that D v2 was positive, and increasing,
and satisfied a third more technical condition. More recently, [14] has
showed that the third condition is not needed. Namely the following is
true.
Theorem 3.38. Assume that v(x) = (1, v2(x)), and moreover that
D v2 ≥ 0 and is monotonically increasing. Then, Mv,∞ is bounded on
Lp, for 1 < p <∞.
These vector fields present far fewer technical difficulties than a gen-
eral Lipschitz vector field, and there are a richer set of proof techniques
that one can bring to bear on them, as indicated in part in the proof
of Proposition 3.37. The papers [5, 14] cleverly exploit the Plancherel
identity (in the independent variable), and other orthogonality consid-
erations to prove their results.
These considerations are not completely consistent with the domi-
nant theme of this monograph, in which the transforms are localized.
Nevertheless, it would be interesting to explore methods, possibly mod-
ifications of this memoir, that could provide an extension of Proposi-
tion 3.37.
In this direction, let us state a possible direction of study. The
definition of the the sets V(R) for vector fields of magnitude 1 is given
as V(R) = R ∩ v−1(EX(R)). For vector fields of arbitrary magnitude,
VECTOR FIELDS THAT ARE A FUNCTION OF ONE VARIABLE 29
we define these sets to be
V(R) = {x ∈ R | v(x)|v(x)| ∈ EX(R)} .
Define a maximal function—an extension of our Lipschitz Kakeya Max-
imal Function—by
(3.39) M̃v,δf(x) = sup
|V(R)|≥δ|R|
|R|−1
f(y) dy .
In this definition, we require the rectangles to have density δ, but do
not restrict their eccentricities, or lengths.
Conjecture 3.40. Assume that the vector field is of the form v(x) =
(1, v2(x2)), and the derivative D v ≥ 0 and is monotone. Then for all
0 < δ < 1, we have the estimate
‖M̃v,δ‖p . δ−1 , 1 < p <∞ .
One can construct examples which show that the L1 to weak L1
norm of the maximal function is not bounded in terms of δ. Indeed,
recalling the ‘pocketknife’ examples of Figure 3.7, we comment that one
can construct examples of vector fields with these properties, which we
describe with the terminology associated with the pocketknife exam-
ples.
• The width of all rectangles are fixed. And all rectangles have
density δ.
• The ‘handle’ of the pocketknife has positive angle θ with the
x1 axis.
• There is ‘hinge’ whose blades have angles which are positive,
and greater than θ. The number of blades can be unbounded,
as the width of the rectangles decreases to zero.
The assumption that the vector field is only a function of x2 then
greatly restricts, but does not completely forbid, the existence of addi-
tional hinges. So the combinatorics of these vector fields, as expressed
in the Lipschitz Kakeya Maximal Function, are not so simple.
CHAPTER 4
The L2 Estimate for Hilbert Transform on
Lipschitz Vector Fields
We prove one of our main conditional results about the Hilbert
transform on Lipschitz vector fields, the inequality (1.19) which is the
estimate at L2, for functions with frequency support in an annulus,
assuming an appropriate estimate for the Lipschitz Kakeya Maximal
Function.
We begin the proof by setting notation appropriate for phase plane
analysis for functions f on the plane supported on an annulus. With
this notation, we can define appropriate discrete analogs of the Hilbert
transform on vector fields. The Lemmas 4.22 and 4.23 are the combi-
natorial analogs of our Theorem 1.15. We then take up the proofs. The
main step in the proof is Lemma 4.50 which combines the (standard)
orthogonality considerations with the conjectures about the Lipschitz
Kakeya Maximal Functions.
Definitions and Principle Lemmas
Throughout this chapter, κ will denote a fixed small positive con-
stant, whose exact value need not concern us. κ of the order of 10−3
would suffice. The following definitions are as in the authors’ previous
paper [15].
Definition 4.1. A grid is a collection of intervals G so that for all
I, J ∈ G, we have I ∩ J ∈ {∅, I, J}. The dyadic intervals are a grid. A
grid G is central iff for all I, J ∈ G, with I ⊂6= J we have 500κ−20I ⊂ J .
The reader can find the details on how to construct such a central
grid structure in [11].
Let ρ be rotation on T by an angle of π/2. Coordinate axes for R2
are a pair of unit orthogonal vectors (e, e⊥) with ρ e = e⊥.
Definition 4.2. We say that ω ⊂ R2 is a rectangle if it is a product
of intervals with respect to a choice of axes (e, e⊥) of R
2. We will
say that ω is an annular rectangle if ω = (−2l−1, 2l−1)× (a, 2a) for an
integer l with 2l < κa, with respect to the axes (e, e⊥). The dimensions
of ω are said to be 2l × a. Notice that the face (−2l−1, 2l−1) × a is
32 4. L
ESTIMATE FOR Hv
es Rs
Figure 4.1. The two rectangles ωs and Rs whose prod-
uct is a tile. The gray rectangles are other possible loca-
tions for the rectangle Rs.
tangent to the circle |ξ| = a at the midpoint to the face, (0, a). We
say that the scale of ω is scl(ω) := 2l and that the annular parameter
of ω is ann(ω) := a. In referring to the coordinate axes of an annular
rectangle, we shall always mean (e, e⊥) as above.
Annular rectangles will decompose our functions in the frequency
variables. But our methods must be sensitive to spatial considerations;
it is this and the uncertainty principle that motivate the next definition.
Definition 4.3. Two rectangles R and R are said to be dual if they
are rectangles with respect to the same basis (e, e⊥), thus R = r1 × r2
and R = r1 × r2 for intervals ri, ri, i = 1, 2. Moreover, 1 ≤ |ri| · |ri| ≤ 4
for i = 1, 2. The product of two dual rectangles we shall refer to as a
phase rectangle. The first coordinate of a phase rectangle we think of
as a frequency component and the second as a spatial component.
We consider collections of phase rectangles AT which satisfy these
conditions. For s, s′ ∈ AT we write s = ωs × Rs, and require that
ωs is an annular rectangle,(4.4)
Rs and ωs are dual,(4.5)
The rectangles Rs are from the product of central grids.(4.6)
{1000κ−100R | ωs × R ∈ AT } covers R2, for all ωs.(4.7)
ann(ωs) = 2
j for some integer j,(4.8)
♯{ωs | scl(s) = scl, ann(s) = ann} ≥ c
,(4.9)
scl(s) ≤ κann(s).(4.10)
DEFINITIONS AND PRINCIPLE LEMMAS 33
0 ρωs1
Figure 4.2. An annular rectangular ωs, and three as-
sociated subintervals of ρωs1, ωs1, and ωs2.
We assume that there are auxiliary sets ωs,ωs1,ωs2 ⊂ T associated to
s—or more specifically ωs—which satisfy these properties.
Ω := {ωs,ωs1,ωs2 | s ∈ AT } is a grid in T,(4.11)
ωs1 ∩ ωs2 = ∅, |ωs| ≥ 32(|ωs1|+ |ωs2|+ dist(ωs1,ωs2))(4.12)
ωs1 lies clockwise from ωs2 on T,(4.13)
|ωs| ≤ K
scl(ωs)
ann(ωs)
,(4.14)
{ ξ|ξ| | ξ ∈ ωs} ⊂ ρωs1.(4.15)
In the top line, the intervals ωs1 and ωs2 are small subintervals of
the unit circle, and we can define their dilate by a factor of 2 in an
obvious way. Recall that ρ is the rotation that takes e into e⊥. Thus,
eωs ∈ ωs1. See the figures Figure 4.1 and Figure 4.2 for an illustration
of these definitions.
Note that |ωs| ≥ |ωs1| ≥ scl(ωs)/ann(ωs). Thus, eωs is in ωs1, and
ωs serves as ‘the angle of uncertainty associated to Rs.’ Let us be
more precise about the geometric information encoded into the angle
of uncertainty. Let Rs = rs × rs⊥ be as above. Choose another set of
coordinate axes (e′, e′⊥) with e
′ ∈ ωs and let R′ be the product of the
intervals rs and rs⊥ in the new coordinate axes. Then K
′ ⊂ Rs ⊂
′ for an absolute constant K0 > 1.
We say that annular tiles are collections AT satisfying the condi-
tions (4.4)—(4.15) above. We extend the definition of e⊥, eω⊥, ann(ω)
and scl(ω) to annular tiles in the obvious way, using the notation es,
es⊥, ann(s) and scl(s).
34 4. L
ESTIMATE FOR Hv
A phase rectangle will have two distinct functions associated to it.
In order to define these functions, set
Ty f(x) := f(x− y), y ∈ R2 (Translation operator)
Modξ f(x) := e
iξ·x f(x), ξ ∈ R2 (Modulation operator)
R1×R2 f(x1, x2) :=
(|R1||R2|)1/p
(Dilation operator).
In the last display, 0 < p ≤ ∞, and R1 × R2 is a rectangle, and the
coordinates (x1, x2) are those of the rectangle. Note that the definition
depends only on the side lengths of the rectangle, and not the location.
And that it preserves Lp norm.
For a function ϕ and tile s ∈ AT set
(4.16) ϕs := Modc(ωs)Tc(Rs)D
We shall consider ϕ to be a Schwartz function for which ϕ̂ ≥ 0 is
supported in a small ball, of radius κ, about the origin in R2, and is
identically 1 on another smaller ball around the origin. (Recall that κ
is a fixed small constant.)
We introduce the tool to decompose the singular integral kernels.
In so doing, we consider a class of functions ψt, t > 0, so that
Each ψt is supported in frequency in [−θ − κ,−θ + κ].(4.17)
|ψt(x)| . CN(1 + |x|)−N , N > 1 .(4.18)
In the top line, θ is a fixed positive constant so that the second half of
(4.19) is true.
Define
φs(x) :=
ϕs(x− yv(x))ψs(y) dy
(v(x))
ϕs(x− yv(x))ψs(sy) dy.
(4.19)
ψs(y) := scl(s)ψscl(s)(scl(s)y).(4.20)
An essential feature of this definition is that the support of the integral
is contained in the set {v(x) ∈ ωs2}, a fact which can be routinely
verified. That is, we can insert the indicator 1
(v(x)) without loss
of generality. The set ωs2 serves to localize the vector field, while ωs1
serves to identify the location of ϕs in the frequency coordinate.
DEFINITIONS AND PRINCIPLE LEMMAS 35
The model operator we consider acts on a Schwartz functions f ,
and it is defined by
(4.21) Cannf :=
s∈AT (ann)
scl(s)≥‖v‖Lip
〈f, ϕs〉φs.
In this display, AT (ann) := {s ∈ AT | ann(s) = ann}, and we have
deliberately formulated the operator in a dilation invariant manner.
Lemma 4.22. Assume that the vector field is Lipschitz, and satisfies
Conjecture 1.14. Then, for all ann ≥ ‖v‖−1Lip, the operator Cann extends
to a bounded map from L2 into itself, with norm bounded by an absolute
constant.
We remind the reader that for 2 < p <∞ the only condition needed
for the boundedness of Cann is the measurability of the vector field, a
principal result of Lacey and Li [15]. It is of course of great importance
to add up the Cann over ann. The method we use for doing this are
purely L2 in nature, and lead to the estimate for C :=
j=1 C2j .
Lemma 4.23. Assume that the vector field is of norm at most one
in Cα for some α > 1, and satisfies Conjecture 1.14. Then C maps
L2 into itself. In addition we have the estimate below, holding for all
values of scl.
(4.24)
ann=−∞
s∈AT (ann)
scl(s)=scl
〈f, ϕs〉φs
. (1 + log(1 + scl−1‖v‖Cα)).
Moreover, these operators are unconditionally convergent in s ∈ AT .
These are the principal steps towards the proof of Theorem 1.15.
In the course of the proof, we shall not invoke the additional notation
needed to account for the unconditional convergence, as it is entirely
notational. They can be added in by the reader.
Observe that (4.24) is only of interest when scl < ‖v‖Cα. This
inequality depends critically on the fact that the kernel sclψ(scly) has
mean zero. Without this assumption, this inequality is certainly false.
The proof of Theorem 1.15 from these two lemmas is an argument
in which one averages over translations, dilations and rotations of grids.
The specifics of the approach are very close to the corresponding argu-
ment in [15]. The details are omitted.
The operators Cann and C are constructed from a a kernel which is a
smooth analog of the truncated kernel p.v. 1
1{|t|≤1}. Nevertheless, our
36 4. L
ESTIMATE FOR Hv
main theorem follows,1 due to the observation that we can choose a
sequence of Schwartz kernels ψ(1+κ)n , for n ∈ Z, which satisfy (4.17)
and (4.18), and so that for
K(t) :=
an(1 + κ)
nψ(1+κ)n((1 + κ)
we have p.v. 1
1{|t|≤1} = K(t)−K(t). Here, for n ≥ 0 we have |an| . 1.
And for n < 0, we have |an| . (1+κ)n. The principal sum is thus over
n ≥ max(0, ‖v‖Cα), and this corresponds to the operator C. For those
n < max(0, ‖v‖Cα), we use the estimate (4.24), and the rapid decay of
the coefficient an.
Truncation and an Alternate Model Sum
There are significant obstacles to proving the boundedness of the
model sum Cann on an Lp space, for 1 < p < 2. In this section, we
rely upon some naive L2 estimates to define a new model sum which is
bounded on Lp, for some 1 < p < 2.
Our next Lemma is indicative of the estimates we need. For choices
of scl < κann, set
AT (ann, scl) := {s ∈ AT (ann) | scl(s) = scl}.
Lemma 4.25. For measurable vector fields v and all choices of ann
and scl. ∥∥∥
s∈AT (ann,scl)
〈f, ϕs〉φs
. ‖f‖2
Proof. The scale and annulus are fixed in this sum, making the
Bessel inequality
s∈AT (ann,scl)
|〈f, ϕs〉|2 . ‖f‖22
evident. For any two tiles s and s′ that contribute to this sum, if ωs 6=
ωs′, then φs and φs′ are disjointly supported. And if ωs = ωs′, then
Rs and R
s are disjoint, but share the same dimensions and orientation
in the plane. The rapid decay of the functions φs then gives us the
1In the typical circumstance, one uses a maximal function to pass back and
forth between truncated and smooth kernels. This route is forbidden to us; there
is no appropriate maximal function to appeal to.
TRUNCATION AND AN ALTERNATE MODEL SUM 37
estimate
s∈AT (ann,scl)
〈f, ϕs〉φs
s∈AT (ann,scl)
|〈f, ϕs〉|2
. ‖f‖2
Consider the variant of the operator (4.21) given by
(4.26) Φf =
s∈AT (ann)
scl(s)≥κ−1‖v‖Lip
〈f, ϕs〉φs.
As ann is fixed, we shall begin to suppress it in our notations for oper-
ators. The difference between Φ and Cann is the absence of the initial
. log(1+ ‖v‖Lip) scales in the former. The L2 bound for these missing
scales is clearly provided by Lemma 4.25, and so it remains for us to
establish
(4.27) ‖Φ‖2 . 1,
the implied constant being independent of ann, and the Lipschitz norm
of v.
It is an important fact, the main result of Lacey and Li [15], that
(4.28) ‖Φ‖p . 1, 2 < p <∞.
This holds without the Lipschitz assumption.
We are now at a point where we can be more directly engaged with
the construction of our alternate model sum. We only consider tiles
with κ−1‖v‖Lip ≤ scl(s) ≤ κann. A parameter is introduced which is
used to make a spatial truncation of the functions ϕs; it is
(4.29) γ2s := 100
−2 scl(s)
‖v‖Lip
Write ϕs = αs + βs where αs = (Tc(Rs)D
ζ)ϕs, and ζ is a smooth
Schwartz function supported on |x| < 1/2, and equal to 1 on |x| < 1/4.
Write for choices of tiles s,
(4.30) ψs(y) = ψs−(y) + ψs+(y)
where ψs−(y) is a Schwartz function on R, with
supp(ψs−) ⊂
γs(scl(s))
−1[−1, 1] ,
38 4. L
ESTIMATE FOR Hv
and equal to ψscl(s)(y) for |y| < 14γs(scl(s))
−1. Then define
(4.31) as±(x) = 1ωs2(v(x))
φs(x− yv(x))ψs±(y) dy.
Thus, φs = as− + as+. Recalling the notation Sann in Theorem 1.15,
define
(4.32) A± f :=
s∈AT (ann)
scl(s)≥κ−1‖v‖Lip
〈Sann f, αs〉as±
We will write Φ = ΦSann = A+ +A− +B, where B is an operator
defined in (4.35) below. The main fact we need concerns A−.
Lemma 4.33. There is a choice of 1 < p0 < 2 so that
‖A−‖p . 1, p0 < p <∞.
The implied constant is independent of the value of ann, and the Lips-
chitz norm of v.
The proof of this Lemma is given in the next section, modulo three
additional Lemmata stated therein. The following Lemma is important
for our approach to the previous Lemma. It is proved below.
Lemma 4.34. For each choice of κ−1‖v‖Lip < scl < κann, we have the
estimate ∑
s∈AT (ann,scl)
|〈Sann f, αs〉|2 . ‖f‖22.
Define
(4.35) B f :=
s∈AT (ann)
scl(s)≥κ−1‖v‖Lip
〈Sann f, βs〉φs
Lemma 4.36. For a Lipschitz vector field v, we have
‖B‖p . 1, 2 ≤ p <∞.
Proof. For choices of integers κ−1‖v‖Lip ≤ scl < κann, consider
the vector valued operator
Tj,k f :=
{〈Sann f, βs〉√
1{v(x)∈ωs2}Tc(Rs)D
(1 + | · |2)N )(x)
| s ∈ AT (ann, scl)
where N is a fixed large integer.
PROOFS OF LEMMATA 39
Recall that βs is supported off of
γsRs. This is bounded linear op-
erator from L∞(R2) to ℓ∞(AT (ann, scl)). It has norm. (scl/‖v‖Lip)−10.
Routine considerations will verify that
Tj,k : L
2(R2) −→ ℓ2(AT (ann, scl))
with a similarly favorable estimate on its norm. By interpolation, we
achieve the same estimate for Tj,k from L
p(R2) into ℓp(AT (ann, scl)),
2 ≤ p <∞.
It is now very easy to conclude the Lemma by summing over scales
in a brute force way, and using the methods of Lemma 4.25. �
We turn to A+, as defined in (4.32).
Lemma 4.37. We have the estimate
‖A+‖p . 1 2 ≤ p <∞.
Proof. We redefine the vector valued operator Tj,k to be
Tj,k f :=
{〈Sann f, αs〉√
1{v(x)∈ωs2}Tc(Rs)D
(1 + |x|2)N
| s ∈ AT (ann, scl)
where N is a fixed large integer. This operator is bounded from
Lp(R2) −→ ℓp(AT (ann, scl)) , 2 ≤ p <∞
Its norm is at most . 1.
But, for s ∈ AT (ann, scl), we have
(4.38) |as+| . (scl/‖v‖Lip)−10|Rs|−1/2(M1Rs)100.
Here M denotes the strong maximal function in the plane in the coordi-
nates determined by Rs. This permits one to again adapt the estimate
of Lemma 4.25 to conclude the Lemma. �
Now we conclude that ‖Φ‖2 . 1. And since Φ = A− +A++B, it
follows from the Lemmata of this section.
Proofs of Lemmata
Proof of Lemma 4.33. We have Φ = A−+A+ +B, so from
(4.28), Lemma 4.36 and Lemma 4.37, we deduce that ‖A−‖p . 1 for
all 2 < p <∞. It remains for us to verify that A− is of restricted weak
type p0 for some choice of 1 < p0 < 2. That is, we should verify that
for all sets F,G ⊂ R2 of finite measure
(4.39) |〈A− 1F , 1G〉| . |F |1/p|G|1−1/p, p0 < p < 2.
40 4. L
ESTIMATE FOR Hv
Since A− maps L
p into itself for 2 < p < ∞, it suffices to consider
the case of |F | < |G|. Since we assume only that the vector field is
Lipschitz, we can use a dilation to assume that 1 < |G| < 2, and so
this set will not explicitly enter into our estimates.
We fix the data F ⊂ R2 of finite measure, ann, and vector field v
with ‖v‖Lip ≤ κann. Take p0 = 2 − κ2. We need a set of definitions
that are inspired by the approach of Lacey and Thiele [20], and are
also used in Lacey and Li [15]. For subsets S ⊂ Av := {s ∈ AT (ann) |
κ−1‖v‖Lip ≤ scl(s) < κann}, set
〈Sann 1F , αs〉as−
Set χ(x) = (1 + |x|)−1000/κ. Define
(4.40) χ
:= χ(p)s = Tc(Rs)D
χ, 0 ≤ p ≤ ∞.
And set χ̃
s = 1γsRsχ
Remark 4.41. It is typical to define a partial order on tiles, following
an observation of C. Fefferman [9]. In this case, there doesn’t seem to
be an appropriate partial order. Begin with this assumption on the
order relation ‘<’ on tiles:
(4.42)
If ωs ×Rs ∩ ωs′ ×Rs′ 6= ∅, then s and s′ are comparable under ‘<’.
It follows from transitivity of a partial order that that one can have
tiles s1, . . . , sJ , with sj+1 < sj for 1 ≤ j < J , J ≃ log(‖v‖Lip · ann), and
yet the rectangles RsJ and Rs1 can be far apart, namely RsJ ∩ (cJ)Rs1 ,
for a positive constant c. See Figure 4.3. (We thank the referee for
directing us towards this conclusion.) Therefore, one cannot make
the order relation transitive, and maintain control of the approximate
localization of spatial variables, as one would wish. The partial order
is essential to the argument of [9], but while it is used in [20], it is not
essential to that argument.
We recall a fact about the eccentricity. There is an absolute con-
stant K ′ so that for any two tiles s, s′
(4.43) ωs ⊃ ωs′ , Rs ∩ Rs′ 6= ∅ implies Rs ⊂ K ′Rs′ .
Figure 3.1 illustrates this in the case where the two rectangles Rs and
Rs′ have different widths, which is not the case here.
We define an order relation on tiles by s . s′ iff ωs ) ωs′ and
Rs ⊂ κ−10Rs′. Thus, (4.42) holds for this order relation, and it is
certainly not transitive.
PROOFS OF LEMMATA 41
Figure 4.3. The rectangles Rs1 , . . . , RsJ of Remark 4.41.
A tree is a collection of tiles T ⊂ Av, for which there is a (non–
unique) tile ωT × RT ∈ AT (ann) with Rs ⊂ 100κ−10RT, and ωs ⊃ ωT
for all s ∈ T. Here, we deliberately use a somewhat larger constant
100κ−10 than we used in the definition of the order relation ‘..’
For j = 1, 2, call T a i–tree if the tiles for all s, s′ ∈ T, if scl(s) 6=
scl(s′), then ωsi∩ωs′i = ∅. 1–trees are especially important. A few tiles
in such a tree are depicted in Figure 4.4.
Remark 4.44. This remark about the partial order ‘.’ and trees is
useful to us below. Suppose that we have two trees T, with top s(T)
and T′ with top s(T′). Suppose in addition that s(T′) . s(T). Then,
it is the case that T ∪ T′ is a tree with top s(T). Indeed, we must
necessarily have ωT ( ωT, since the Rs are from products of a central
grid. Also, 100κ−1RT′ ⊂ 100κ−1RT. And so every tile in T′ could also
be a tile in T.
Our proof is organized around these parameters and functions as-
sociated to tiles and sets of tiles. Of particular note here are the first
definitions of ‘density,’ which have to be formulated to accommodate
the lack of transitivity in the partial order. Note that in the first defi-
nition, the supremum is taken over tiles s′ ∈ AT of the same annular
parameter as s. We choose the collection AT as it is ‘universal,’ cover-
ing all scales in a uniform way, due to different assumptions including
42 4. L
ESTIMATE FOR Hv
(4.7).
dense(S) := sup
s′∈AT
G∩v−1(ωs′ )
s′ dx | ∃ s , s′′ ∈ S :
ωs ⊃ ωs′ ⊃ ωs′′ , Rs ⊂ 100κ−10Rs′ ,
Rs′ ⊂ 100κ−10Rs′
(4.45)
∆(T)2 :=
|〈Sann 1F , αs〉|2
1Rs , T is a 1–tree,(4.46)
size(S) := sup
T is a 1–tree
∆(T) dx.(4.47)
Observe that dense(S) only really applies to ‘tree-like’ sets of tiles, and
that—and this is important—the tile s′ that appear in (4.45) are not
in S, but only assumed to be in AT . Finally, note that
dense(s) ≃
G∩v−1(ωs)
χ̃(1)s dx
with the implied constants only depending upon κ, χ, and other fixed
quantities.
Observe these points about size. First, it is computed relative to
the truncated functions αs, recall (4.29). Second, that for p > 1,
(4.48) ‖∆(T)‖p . |F |1/p ,
because of a standard Lp estimate for a Littlewood-Paley square func-
tion. Third, that size(Av(ann)) . 1. And fourth, that one has an
estimate of John-Nirenberg type.
Lemma 4.49. For a 1-tree T we have the estimate
‖∆(T)‖p . size(T)|RT|1/p , 1 < p <∞ .
Proofs of results of this type are well represented in the literature.
See [4, 11].
Given a set of tiles, say that count(S) < A iff S is a union of trees
T ∈ T for which ∑
|sh(T)| < A.
We will also use the notation count(S) . A, implying the existence of
an absolute constant K for which count(S) ≤ KA.
The principal organizational Lemma is
PROOFS OF LEMMATA 43
Figure 4.4. A few possible tiles in a 1–tree. Rectangles
ωs are on the left in different shades of gray. Possible
locations of Rs are in the same shade of gray.
Lemma 4.50. Any finite collection of tiles S ⊂ Av is a union of four
subsets
Slight, Ssmall, S
large, ℓ = 1, 2.
They satisfy these properties.
size(Ssmall) <
size(S),(4.51)
dense(Slight) <
dense(S),(4.52)
and both Sℓlarge are unions of trees T ∈ T ℓ, for which we have the
estimates
count(S1large) .
size(S)
−2−κ|F |
size(S)−p dense(S)−M |F |
+ size(S)
dense(S)−1
dense(S)−1
(4.53)
count(S2large) .
size(S)
(log 1/ size(S))3|F |
size(S)κ/50 dense(S)−1
(4.54)
What is most important here is the middle estimate in (4.53). Here,
p is as in Conjecture 1.14, and M > 0 is only a function of N in that
Conjecture.
The estimates that involve size(S)−2|F | are those that follow from
orthogonality considerations. The estimates in dense(S)−1 are those
that follow from density considerations which are less complicated.
However, in the second half of (4.54), the small positive power of size
is essential for us. All of these estimates are all variants of those in
[20].
The middle estimate of (4.53) is not of this type, and is the key
ingredient that permits us to obtain an estimate below L2. Note that
44 4. L
ESTIMATE FOR Hv
it gives the best bound for collections with moderate density and size.
For it we shall appeal to our assumed Conjecture 1.14.
Logarithms, such as those that arise in (4.54), arise from our trun-
cation arguments, associated with the parameters γs in (4.29).
For individual trees, we need two estimates.
Lemma 4.55. If T is a 1–tree with −
∆(T) ≥ σ, then we have
(4.56) |F ∩ σ−κRT| & σ1+κ|RT|.
Lemma 4.57. For trees T we have the estimate
(4.58)
|〈Sann1F , αs〉〈as−, 1G〉| . Ψ
dense(T) size(T)
|sh(T)|.
Here Ψ(x) = x|log cx|, and inside the logarithm, c is a small fixed
constant, to insure that c dense(T) · size(T) < 1
, say.
Sum(S) :=
|〈Sann1F , αs〉〈as−, 1G〉|
We want to provide the bound Sum(Av) . |F |1/p for p0 < p < 2. We
have the trivial bound
(4.59) Sum(S) . Ψ
dense(S) size(S)
count(S).
It is incumbent on us to provide a decomposition of Av into sub-
collections for which this last estimate is effective.
By inductive application of our principal organizational Lemma 4.50,
Av is the union of Sℓδ,σ, ℓ = 1, 2 for δ, σ ∈ 2 := {2n | n ∈ Z , n ≤ 0},
satisfying
dense(Sℓδ,σ) . δ,(4.60)
size(Sℓδ,σ) . σ,(4.61)
count(Sℓδ,σ) .
min(σ−2−κ|F |, δ−Mσ−p|F |+ σ1/κδ−1, δ−1) ℓ = 1,
min(σ−2(log 1/σ)3|F |, δ−1σκ/50) ℓ = 2
(4.62)
Using (4.59), we see that
Sum(S1δ,σ) . min(Ψ(δ)σ
−1−κ|F |, δ−M+1σ−p+1|F |+ σ1/κ+1, σ)
Sum(S2δ,σ) . min(Ψ(δ)σ
−1(log 1/σ)4|F |, σ1+κ/50)
(4.63)
One can check that for ℓ = 1, 2,
(4.64)
δ,σ∈2
Sum(Sℓδ,σ) . |F |1/p, p0 < p < 2.
PROOFS OF LEMMATA 45
This completes the proof of Lemma 4.33, aside from the proof of
Lemma 4.50.
Proof of (4.64). We can assume that |G| = 1, and that |F | ≤ 1,
for otherwise the result follows from the known Lp estimates, for p > 2
and measurable vector fields, see Theorem 1.15.
The case of ℓ = 2 in (4.64) is straightforward. Notice that in (4.63),
for ℓ = 2, the two terms in the minimum are roughly comparable,
ignoring logarithmic terms, for
δ|F | ≃ σ2+κ/50 .
Therefore, we set
T1 = {(δ, σ) ∈ 2× 2 | δ|F | ≤ σ2+κ/50 ≤ |F |} ,
T2 = {(δ, σ) ∈ 2× 2 | σ2+κ/50 ≤ δ|F |}
and T3 = 2× 2− T1 − T2.
We can estimate
(δ,σ)∈T1
Sum(S2δ,σ) .
(δ,σ)∈T1
Ψ(δ)σ−1(log 1/σ)4|F |
σ2+κ/50≤|F |
σ1+κ/75
. |F |1/p0 , p0 =
2 + κ/50
1 + κ/75
< 2 .
Notice that we have absorbed harmless logarithmic terms into a slightly
smaller exponent in σ above.
The second term is
(δ,σ)∈T2
Sum(S2δ,σ) .
(δ,σ)∈T2
σ1+κ/50
(δF )1/p1 , p1 =
2 + κ/50
1 + κ/50
< 2 ,
. |F |1/p1 .
46 4. L
ESTIMATE FOR Hv
The third term is
(δ,σ)∈T3
Sum(S2δ,σ) .
(δ,σ)∈T3
Ψ(δ)σ−1(log 1/σ)4|F |
σ2+κ/50≥|F |
σ−1|F |1−κ/75
. |F |1/p0 .
Here, we have again absorbed harmless logarithms into a slightly smaller
power of |F |, and p0 < 2 is as in the first term.
The novelty in this proof is the proof of (4.64) in the case of ℓ = 1.
We comment that if one uses the proof strategy just employed, that
is only relying upon the first and last estimates from the minimum in
(4.63), in the case of ℓ = 1, one will only show that |F |1/2.
In the definitions below, we will have a choice of 0 < τ < 1, where
τ = τ(M, p) ≃ M−1·(2−p) will only depend uponM and p in (4.63). (τ
enters into the definition of T4 and T5 below.) The choice of 0 < κ < τ
will be specified below.
T1 = {(δ, σ) ∈ 2× 2 | |F |
(2+κ)(1+κ) ≤ σ} ,
T2 = {(δ, σ) ∈ 2× 2 | σ < |F |
2−κ , δ ≥ σ1/κ} ,
T3 = {(δ, σ) ∈ 2× 2 | σ < |F |
2−κ , δ > σ1/κ} ,
T4 = {(δ, σ) ∈ 2× 2 | |F |
2−κ ≤ σ < |F |
(2+κ)(1+κ) , δ > στ} ,
T5 = {(δ, σ) ∈ 2× 2 | |F |
2−κ ≤ σ < |F |
(2+κ)(1+κ) , δ ≤ στ} ,
T (T ) =
(δ,σ)∈T
Sum(S1δ,σ) .
Note that for T1 we can use the first term in the minimum in (4.63).
T (T1) .
(δ,σ)∈T1
δσ−1−κ|F |
σ≥|F |
(2+κ)(1+κ)
σ−1−κ|F |
. |F |1−
2+κ .
This last exponent on |F | is strictly larger than 1
as desired. The point
of the definition of T1 is that when it comes time to use the middle term
PROOFS OF LEMMATA 47
of the minimum for ℓ = 1 in (4.63), we can restrict attention to the
δ−M+1σ−p+1|F | .
For the collection T2, use the last term in the minimum in (4.63).
T (T2) .
(δ,σ)∈T2
σ≤|F |
σ log 1/σ
. |F |
2−κ/2 .
Again, for 0 < κ < 1, the exponent on |F | above is strictly greater
than 1/2.
The term T3 can be controlled with the first term in the minimum
in (4.63).
T (T3) .
(δ,σ)∈T3
δσ−1−κ|F |
σ|F | . |F | .
The term T4 is the heart of the matter. It is here that we use the
middle term in the minimum of (4.63), and that the role of τ becomes
clear. We estimate
T (T4) .
(δ,σ)∈T4
δ−Mσ−p+1|F |
δ≥|F |τ
δ−M |F |1−
. |F |1−
−Mτ .
Recall that 1 < p < 2, so that 0 < p − 1 < 1. Therefore, for 0 < κ
sufficiently small, of the order of 2− p, we will have
1− p− 1
2− κ >
+ 2−p
Therefore, choosing τ ≃ (2 − p)/M will leave us with a power on |F |
that is strictly larger than 1
The previous term did not specify κ > 0. Instead it shows that
for 0 < κ < 1 sufficiently small, we can make a choice of τ , that is
48 4. L
ESTIMATE FOR Hv
independent of κ, for which T (T4) admits the required control. The
bound in the last term will specify a choice of κ on us. We estimate
T (T5) .
(δ,σ)∈T5
δσ−1−κ|F |
σ≥|F |
σ−1−κ|F |1+τ
. |F |1+τ+
Choosing κ = τ/6 will result in the estimate
which is as required, so our proof is finished. �
Remark 4.65. The resolution of Conjecture 1.21 would depend upon
refinements of Lemma 4.50, as well as using the restricted weak type
approach of [21].
Proof of Lemma 4.34. We only consider tiles s ∈ AT (ann, scl),
and sets ω ∈ Ω which are associated to one of these tiles. For an
element a = {as} ∈ ℓ2(AT (ann, scl)),
s :ωs=ω
asSannαs
For |ωs| = |ωs′|, note that dist(ωs,ωs′) is measured in units of scl/ann.
By a lemma of Cotlar and Stein, it suffices to provide the estimate
′‖2 . ρ−3, ρ = 1 +
dist(ω,ω′).
Now, the estimate ‖T
‖2 . 1 is obvious. For the case ω 6= ω′, by
Schur’s test, it suffices to see that
(4.66) sup
s′ :ωs′=ω
s :ωs=ω
|〈Sannαs, Sannαs′〉| . ρ−3.
For tiles s′ and s as above, recall that 〈ϕs, ϕs′〉 = 0, note that
|Rs′ ∩ Rs|
ann dist(ω,ω′)
≃ ρ−1,
and in particular, for a fixed s′, let Ss′ be those s for which ρRs∩ρRs′ 6=
∅. Clearly,
card(Ss′) .
|ρRs|
|2ρRs′ ∩ 2ρRs|
ρ ≃ ρ2
PROOFS OF LEMMATA 49
If for r > 1, rRs ∩ rRs′ = ∅, then it is routine to show that
|〈Sannαs, Sannαs′〉| . r−10
And so we may directly sum over those s 6∈ Ss′ ,
s 6∈Ss′
|〈Sannαs, Sannαs′〉| . ρ−3.
For those s ∈ Ss′, we estimate the inner product in frequency vari-
ables. Recalling the definition of αs = (Tc(Rs)D
ζ)ϕs, we have
α̂s = (Mod−c(Rs) D
ζ̂) ∗ ϕ̂s.
Recall that ζ is a smooth compactly supported Schwartz function. We
estimate the inner product
|〈Ŝannαs, Ŝannαs′〉|
without appealing to cancellation. Since we choose the function λ̂ to
be supported in an annulus 1
< |ξ| < 3
so that λ̂ann = λ̂(·/ann) is
supported in the annulus 1
ann < |ξ| < 3
ann. We can restrict our
attention to this same range of ξ. In the region |ξ| > ann/4, suppose,
without loss of generality, that ξ is closer to ωs than ωs′. Then since
ωs and ωs′ are separated by an amount & anndist(ω,ω
|α̂s(ξ)α̂s′(ξ)| . χ(2)ωs (ξ)χ
dist(ω,ω)
. χ(2)ωs (ξ)χ
(ξ)ρ−20.
Here, χ is the non–negative bump function in (4.40). Hence, we have
the estimate ∫
|λ̂ann(ξ)|2|α̂s(ξ)α̂s′(ξ)|dξ . ρ−10.
This is summed over the . ρ2 possible choices of s ∈ Ss′ , giving the
estimate ∑
s∈Ss′
|〈Sannαs, Sannαs′〉| . ρ−8 . ρ−3.
This is the proof of (4.66). And this concludes the proof of Lemma 4.34.
Proof of the Principal Organizational Lemma 4.50. Recall
that we are to decompose S into four distinct subsets satisfying the
favorable estimates of that Lemma. For the remainder of the proof set
dense(S) := δ and size(S) := σ. Take Slight to be all those s ∈ S for
which there is no tile s′ ∈ AT of density at least δ/2 for which s . s′.
It is clear that this set so constructed has density at most δ/2, that
this is a set of tiles, and that S1 := S− Slight is also .
50 4. L
ESTIMATE FOR Hv
The next Lemma and proof comment on the method we use to
obtain the middle estimate in (4.53) which depends upon the Lipschitz
Kakeya Maximal Function Conjecture 1.14. It will be used to obtain
the important inequality (4.82) below.
Lemma 4.67. Suppose we have a collection of trees T ∈ T , with these
conditions.
a: For T ∈ T there is a 1-tree T1 ⊂ T with
(4.68)
∆(T1) dx ≥ κσ ,
b: Each tree has top element s(T) := ωT×RT of density at least
c: The collections of tops {s(T) | T ∈ T } are pairwise incompa-
rable under the order relation ‘.’.
d: For all T ∈ T , γT = γωT×RT ≥ κ−1/2σ−κ/5N . Here, N is the
exponent on δ in Conjecture 1.14.
Then we have
|RT| . δ−Np−1σ−p(1+κ/4)|F |+ σ1/κδ−1.(4.69)
|RT| . δ−1.(4.70)
Concerning the role of γT, recall from the definition, (4.29), that
γs is a quantity that grows as does the ratio scl(s)/‖v‖Lip, hence there
are only . log σ−1 scales of tiles that do not satisfy the assumption d
above.
Proof. Our primary interest is in (4.69), which is a consequence of
our assumption about the Lipschitz Kakeya Maximal Functions, Con-
jecture 1.12.
s(T) := ωT × σ−κ/10NRT .
Let us begin by noting that
κ−1‖v‖Lip ≤ scl(s(T)) ≤ κann(s(T)), T ∈ T ,(4.71)
dense(s(T)) ≥ δσκ/10N , T ∈ T ,(4.72)
|F ∩Rs(T)| ≥ σ1+κ/4N |Rs(T)| .(4.73)
The conclusion (4.71) is straightforward, as is (4.72). The inequality
(4.73) follows from Lemma 4.55.
PROOFS OF LEMMATA 51
Note that the length of σ−κ/10NRT satisfies
σ−κ/10N L(RT) ≤ γT L(RT)
scl(s)
‖v‖Lip
≤ (100‖v‖Lip)−1 .
(4.74)
This is the condition (1.8) that we impose in the definition of the
Lipschitz Kakeya Maximal Functions.
Observe that we can regard ann(s(T)) ≃ σκ/10ann as a constant
independent of T.
The point of these observations is that our assumption about the
Lipschitz Kakeya Maximal Function applies to the maximal function
formed over the set of tiles {s(T) | T ∈ T }. And it will be applied
below.
Let Tk be the collection of trees so that T ∈ Tk if k ≥ 0 is the
smallest integer such that
(4.75) |(2kRT) ∩ v−1(ωT) ∩G| ≥ 220k/κ
2−1δ|RT| .
Then since the density of s(T) for every tree T ∈ T is at least δ, we
have T =
k=0 Tk. We can apply Conjecture 1.12 to these collections,
with the value of δ in that Conjecture being 220k/κ
2−1δ.
For each Tk, we decompose it by the following algorithm. Initialize
T selectedk ← ∅, T stockk ← Tk .
While T stockk 6= ∅, select T ∈ T stockk such that scl(s(T)) is minimal.
Define Tk(T) by
Tk(T) = {T′ ∈ Tk : (2kRT) ∩ (2kRT′) 6= ∅ and ωT ⊂ ωT′} .
Update
T selectedk ← T selectedk ∪ {T} , T stockk ← T stockk \Tk(T) .
Thus we decompose Tk into
T∈T selected
T′∈Tk(T)
{T′} .
And ∑
|RT| =
T∈T selected
T′∈Tk(T)
|RT′| .
Notice that RT′ ’s are disjoint for all T
′ ∈ Tk(T) and they are contained
in 5(2kRT). This is so, since the tops of the trees are assumed to be
incomparable with respect to the order relation ‘.’ on tiles.
52 4. L
ESTIMATE FOR Hv
Thus we have
|RT| .
T∈T selected
22k|RT|
. δ−12−10k/κ
T∈T selected
|(2kRT) ∩ v−1(ωT) ∩G| .
Observe that (2kRT)∩v−1(ωT)’s are disjoint for all T ∈ T selectedk . This
and the fact that |G| ≤ 1 proves (4.70). To argue for (4.69), we see
|RT| . δ−12−10k/κ
T∈T selected
(2kRT) ∩ v−1(ωT) ∩G
. δ−12−10k/κ
(2kRT) ∩G
∣∣∣∣ .
At this point, Conjecture 1.12 enters. Observe that we can estimate
∣∣∣ . |{Mδ′,v,(σκ/10ann)−1 1F > σ1+κ/4N}|
. (δ′)−Npσ−p(1+κ/4N)|F |.
. (δ)−Np2−kσ−p(1+κ/4N)|F |.
(4.76)
Here, δ′ = 220k/κ
2−1δ, the choice of δ′ permitted to us by (4.75), and
we have used (4.73) in the first line, to pass to the Lipschitz Kakeya
Maximal Function.
Hence,
|sh(T)| .
. δ−1
2−10k/κ
(2kRT) ∩G
. δ−1
k : 1≤2k≤σ−κ/10
(2kRT)
+ δ−1
k : 2k>σ−κ/10
2−10k/κ
2 |G| .
On the first sum in the last line, we use (4.76), and on the second, we
just sum the geometric series, and recall that |G| = 1.
PROOFS OF LEMMATA 53
We can now begin the principal line of reasoning for the proof of
Lemma 4.50.
The Construction of S1large. We use an orthogonality, or TT
∗ argu-
ment that has been used many times before, especially in [20] and [15].
(There is a feature of the current application of the argument that is
present due to the fact that we are working on the plane, and it is
detailed by Lacey and Li [15].)
We may assume that all intervals ωs are contained in the upper
half of the unit circle in the plane. Fix S ⊂ Av, and σ = size(S).
We construct a collection of trees T 1large for the collection S1, and
a corresponding collection of 1–trees T 1,1large, with particular properties.
We begin the recursion by initializing
T 1large ← ∅, T
large ← ∅,
S1large ← ∅, Sstock ← S1.
In the recursive step, if size(Sstock) < 1
σ1+κ/100, then this recursion
stops. Otherwise, we select a tree T ⊂ Sstock such that three conditions
are met.
a: The top of the tree s(T) (which need not be in the tree)
satisfies dense(s(T)) ≥ δ/4.
b: T contains a 1–tree T1 with
(4.77) −
∆(T1) dx ≥ 1
σ1+κ/100 .
c: And that ωT is in the first place minimal and and in the
second most clockwise among all possible choices of T. (Since
all ωs are in the upper half of the unit circle, this condition
can be fulfilled.)
We take T to be the maximal tree in Sstock which satisfies these condi-
tions.
We then update
T 1large ← {T} ∪ Tlarge, T
large ← {T1} ∪ T
large,
S1large ← T ∪ S1large Sstock ← Sstock −T.
The recursion then repeats. Once the recursion stops, we update
S1 ← Sstock
It is this collection that we analyze in the next subsection.
Note that it is a consequence of the recursion, and Remark 4.44,
that the tops of the trees {s(T) | T ∈ T 1large} are pairwise incomparable
under ..
54 4. L
ESTIMATE FOR Hv
The bottom estimate of (4.53) is then immediate from the construc-
tion and (4.70).
First, we turn to the deduction of the first estimate of (4.53). Let
T 1,(1)large be the set
T 1,(1)large =
T ∈ T 1large :
|〈Sann 1F , βs〉|2 < 116σ
2+κ/50|RT|
And let T 1,(2)large be the set
T 1,(2)large =
T ∈ T 1large :
|〈Sann 1F , βs〉|2 ≥ 116σ
2+κ/50|RT|
In the inner products, we are taking βs, which is supported off of γsRs.
Since T ∈ T 1large satisfies
(4.78) −
∆(T) dx ≥ 1
σ1+κ/100 ,
we have ∑
|〈Sann 1F , αs〉|2 ≥ 14σ
2+κ/50|RT| .
Thus, if T ∈ T 1,(1)large , we have
|〈1F , ϕs〉|2 ≥ 18σ
2+κ/50|RT| .
The replacement of αs by ϕs in the inequality above is an important
point for us. That we can then drop the Sann is immediate.
With this construction and observation, we claim that
(4.79)
T∈T 1,(1)
large
|RT| . (log 1/σ)2σ−2−κ/50|F |.
Proof of (4.79). This is a variant of the the argument for the
‘Size Lemma’ in [15], and so we will not present all details. Begin by
making a further decomposition of the trees T ∈ T 1,(1)large . To each such
tree, we have a 1-tree T1 ⊂ T which satisfies (4.77). We decompose
T1. Set
T1(0) =
s ∈ T1 | |〈f, ϕs〉|√
< σ1+κ/100
T1(j) =
s ∈ T1 | 4j−1σ1+κ/100 ≤ |〈f, ϕs〉|√
< 4jσ1+κ/100
1 ≤ j ≤ j0 = C log 1/σ .
PROOFS OF LEMMATA 55
Now, set T (j) to be those T ∈ T 1,(1)large for which
(4.80)
s∈T1(j)
|〈f, ϕs〉|2 ≥ (2j0)−1σ2+κ/50|RT| , 0 ≤ j ≤ j0 .
It is the case that each T ∈ T 1,(1)large is in some T (j), for 0 ≤ j ≤ j0.
The central case is that of j = 0. We can apply the ‘Size Lemma’
of [15] to deduce that
T∈T (0)
|RT| ≤ (2j0)σ−2−κ/50
T∈T (0)
s∈T1(0)
|〈f, ϕs〉|2
. (log 1/σ)σ−2−κ/50|F | .
The point here is that to apply the argument in the ‘Size Lemma’ one
needs an average case estimate, namely (4.80), as well as a uniform
control, namely the condition defining T1(0). This proves (4.79) in
this case.
For 1 ≤ j ≤ j0, we can apply the ‘Size Lemma’ argument to the
individual tiles in the collection
{T1(j) | T ∈ T (j)} .
The individual tiles satisfy the definition of a 1-tree. And the defining
condition of T1(j) is both the average case estimate, and the uniform
control needed to run that argument. In this case we conclude that
T∈T (j)
s∈T1(j)
|〈f, ϕs〉|2 . |F | .
Thus, we can estimate
T∈T (j)
|RT| . (log 1/σ)σ−1−κ/50|F | .
This summed over 1 ≤ j ≤ j0 = C log 1/σ proves (4.79). �
For T 1,(2)large , we have
T∈T 1,(2)
large
|RT| . σ−2−κ/50
scl≥κ−1‖v‖Lip
s:scl(s)=scl
|〈Sann 1F , βs〉|2
. σ−2−κ/50|F |
scl≥κ−1‖v‖Lip
‖v‖Lip
. σ−2−κ/50|F | ,
56 4. L
ESTIMATE FOR Hv
since βs has fast decay. The Bessel inequality in the last display can
be obtained by using the same argument in the proof of Lemma 4.34.
Hence we get
(4.81)
T∈T 1,(2)
large
|RT| . σ−2−κ/50|F |.
Combining (4.79) and (4.81), we obtain the first estimate of (4.53).
Second, we turn to the deduction of the middle estimate of (4.53),
which relies upon the Lipschitz Kakeya Maximal Function. Let T 1,goodlarge
be the set
T ∈ T 1large : γT ≥ κ−1/2σ−κ/5N
And let T 1,badlarge be the set
T ∈ T 1large : γT < κ−1/2σ−κ/5N
The ‘good’ collection can be controlled by facts which we have already
marshaled together. In particular, we have been careful to arrange the
construction so that Lemma 4.67 applies. By the main conclusion of
that Lemma, (4.69), we have
(4.82)
T∈T 1,good
large
|RT| . δ−Mσ−1−3κ/4|F |+ σ1/κδ−1 .
Here, M is a large constant that only depends upon N in Conjec-
ture 1.14.
For T ∈ T 1,badlarge , there are at most K = O(log(σ−κ)) many possible
scales for scl(ωT × RT). Let scl(T) = scl(ωT × RT). Thus we have
T∈T 1,bad
large
|RT| .
T:scl(T)=2mκ−1‖v‖Lip
|RT| .
Since T satisfies (4.78), we have
|F ∩ γTRT| & σ1+κ/2|RT| .
Thus, we get
T∈T 1,bad
large
|RT| . σ−1−κ/2
T:scl(T)=2mκ−1‖v‖Lip
1σ−κRT(x)dx .
PROOFS OF LEMMATA 57
For the tiles with a fixed scale, we have the following inequality, which
is a consequence of Lemma 4.25.
T:scl(T)=2mκ−1‖v‖Lip
1σ−κRT
. σ−κ/5δ−1 .
Hence we obtain
(4.83)
T∈T 1,bad
large
|RT| . δ−1σ−1−3κ/4|F | .
Combining (4.82) and (4.83), we obtain the middle estimate of
(4.53). Therefore, we complete the proof of (4.53).
The Construction of S2large. It is important to keep in mind that we
have only removed trees of nearly maximal size, with tops of a given
density. In the collection of tiles that remain, there can be trees of
large size, but they cannot have a top with nearly maximal density.
We repeat the TT∗ construction of the previous step in the proof, with
two significant changes.
We construct a collection of trees T 2large from the collection S1, and
a corresponding collection of 1–trees T 2,1large, with particular properties.
We begin the recursion by initializing
T 2large ← ∅ , T
large ← ∅ ,
S2large ← ∅ , Sstock ← S1 .
In the recursive step, if size(Sstock) < σ/2, then this recursion stops.
Otherwise, we select a tree T ⊂ Sstock such that two conditions are
a: T satisfies ‖∆(T)‖2 ≥ σ2 |RT|
b: ωT is both minimal and most clockwise among all possible
choices of T.
We take T to be the maximal tree in Sstock which satisfies these condi-
tions. We take T1 ⊂ T to be a 1–tree so that
(4.84) −
∆(T1) dx ≥ κσ .
This last inequality must hold by Lemma 4.49.
We then update
T 2large ← {T} ∪ Tlarge, T
large ← {T1} ∪ T
large,
Sstock ← Sstock −T.
The recursion then repeats.
58 4. L
ESTIMATE FOR Hv
Once the recursion stops, it is clear that the size of Sstock is at most
σ/2, and so we take Ssmall := Sstock.
The estimate ∑
T∈T 2
large
|RT| . σ−2|F |
then is a consequence of the TT ∗ method, as indicated in the previous
step of the proof. That is the first estimate claimed in (4.54).
What is significant is the second estimate of (4.54), which involves
the density. The point to observe is this. Consider any tile s of density
at least δ/2. Let Ts be those trees T ∈ T 2large with top ωs(T) ⊃ ωs and
Rs(T) ⊂ KRs. By the construction of S1large, we must have
∆(T1) dx ≤ σ1+κ/100 ,
for the maximal 1–tree T 1 contained in
T∈Ts T. But, in addition, the
tops of the trees in T 2large are pairwise incomparable with respect to the
order relation ‘.,’ hence we conclude that
|RT| . σ2+κ/50|Rs|.
Moreover, by the construction of Slight, for each T ∈ T 2large we must be
able to select some tile s with density at least δ/2 and ωs(T) ⊃ ωs and
Rs(T) ⊂ KRs.
Thus, we let S∗ be the maximal tiles of density at least δ/2. Then,
the inequality (4.70) applies to this collection. And, therefore,
T∈T 2
large
|RT| ≤ σκ/50
|Rs| . σκ/50δ−1.
This completes the proof of second estimate of (4.54). �
The Estimates For a Single Tree.
The Proof of Lemma 4.55. It is a routine matter to check that for
any 1–tree we have ∑
|〈f, ϕs〉|2 . ‖f‖22.
Indeed, there is a strengthening of this estimate relevant to our concerns
here. Recalling the notation (4.40), we have
(4.85)
|〈f, ϕs〉|2
]1/2∥∥∥
. ‖χ(∞)RT f‖p , 1 < p <∞ .
PROOFS OF LEMMATA 59
This is variant of the Littlewood-Paley inequalities, with some addi-
tional spatial localization in the estimate.
Using this inequality for p = 1 + κ/100 and the assumption of the
Lemma, we have
σ1+κ/100 ≤
∆T dx
]1+κ/100
1+κ/100
≤ |RT|−1
|〈f, ϕs〉|2
]1/2∥∥∥
1+κ/100
1+κ/100
. |RT|−1
dx.(4.86)
This inequality can only hold if |F ∩ σ−κRT| ≥ σ1+κ|RT|. �
The Proof of Lemma 4.57. This Lemma is closely related to the
Tree Lemma of [15]. Let us recall that result in a form that we need
it. We need analogs of the definitions of density and size that do not
incorporate truncations of the various functions involved. Define
dense(s) :=
G∩v−1(ωs)
(x) dx.
(Recall the notation from (4.40).)
dense(T) := sup
dense(s).
Likewise define
size(T) := sup
′ is a 1–tree
|RT′|−1
|〈1F , ϕs〉|2
Then, the proof of the Tree Lemma of [15] will give us this inequality:
For T a tree,
(4.87)
|〈Sann 1F , ϕs〉〈φs, 1G〉| . dense(T) size(T)|RT|.
Now, consider a tree T with dense(T) = δ, and size(T) = σ, where
we insist upon using the original definitions of density and size. If
in addition, γs ≥ K(σδ)−1 for all s ∈ T, we would then have the
inequalities
dense(T) . δ,
size(T) . σ,
60 4. L
ESTIMATE FOR Hv
This places (4.87) at our disposal, but this is not quite the estimate we
need, as the functions ϕs and φs that occur in (4.87) are not truncated
in the appropriate way, and it is this matter that we turn to next.
Recall that
ϕs = αs + βs ,
αs(x− yv(x))ψs(y) dy = αs−(x) + αs+(y) .
One should recall the displays (4.30), (4.31), and (4.38).
As an immediate consequence of the definition of βs, we have∫
|βs(x)| dx . γ−2s
|Rs|.
Hence, if we replace ϕs by βs, we have
|〈Sann 1F , βs〉〈φs, 1G〉| .
|Rs||〈φs, 1G〉|
γ−1s |Rs|
. σδ|RT|.
And by a very similar argument, one sees corresponding bounds, in
which we replace the φs by different functions. Namely, recalling the
definitions of as± in (4.31) and estimate (4.38), we have
|Rs||〈as+, 1G〉| . σ
(‖v‖Lip
scl(s)
(x) dx(4.88)
(‖v‖Lip
scl(s)
. σδ|RT| .
Similarly, we have
|Rs||〈φs − as+ − as−, 1G〉| . σδ|RT|,
Putting these estimates together proves our Lemma, in particular (4.58),
under the assumption that γs ≥ K(σδ)−1 for all s ∈ T.
Assume that T is a tree with scl(s) = scl(s′) for all s, s′ ∈ T. That
is, the scale of the tiles in the tree is fixed. Then, T is in particular a 1–
tree, so that by an application of the definitions and Cauchy–Schwartz,
|〈Sann 1F , αs〉〈as−, 1G〉| ≤ δ
|〈Sann 1F , αs〉|
≤ δσ|RT|.
PROOFS OF LEMMATA 61
But, γs ≥ 1 increases as does
scl(s). Thus, any tree T with γs ≤
K(σδ)−1 for all s ∈ T, is a union of O(|log δσ|) trees for which the last
estimate holds. �
CHAPTER 5
Almost Orthogonality Between Annuli
Application of the Fourier Localization Lemma
We are to prove Lemma 4.23, and in doing so rely upon a technical
lemma on Fourier localization, Lemma 5.56 below. We can take a
choice of 1 < α < 9
, and assume, after a dilation, that ‖v‖Cα = 1.
The first inequality we establish is this.
Lemma 5.1. Using the notation of of Lemma 4.23, and assuming that
‖v‖Cα . 1, we have the estimate ‖C‖2 . 1, where
ann≥1
where the Cann are defined in (4.21).
We have already established Lemma 4.22, and so in particular know
that ‖Cann‖2 . 1. Due to the imposition of the Fourier restriction in
the definition of these operators, it is immediate that CannC∗ann′ ≡ 0 for
ann 6= ann′. We establish that
Cann′‖2 . max(ann, ann′)−δ ,
δ = 1
(α− 1) , |log ann(ann′)−1| > 3 .
(5.2)
Then, it is entirely elementary to see that C is a bounded operator. Let
Pann be the Fourier projection of f onto the frequencies ann < |ξ| <
2ann. Observe,
‖Cf‖22 =
ann≥1
Cann Pann f
ann≥1
ann′>1
〈Cann Pann f, Cann′ Pann′ f〉
≤ 2‖f‖2
ann≥1
ann′>1
Cann′ Pann′ f‖2
. ‖f‖22
ann≥1
ann′>1
max(ann, ann′)−δ
. ‖f‖22.
64 5. ALMOST ORTHOGONALITY
There are only O(log ann) possible values of scl that contribute to
Cann, and likewise for Cann′. Thus, if we define
(5.3) Cann,sclf =
s∈AT (ann)
scl(s)=scl
〈f, ϕs〉φs ,
it suffices to prove
Lemma 5.4. Using the notation of of Lemma 4.23, and assuming that
‖v‖Cα . 1, we have
ann,sclCann′,scl′‖2 . (max(ann, ann′))−δ .
Here, we can take δ′ = 1
(α − 1), and the inequality holds for all
|log ann(ann′)−1| > 3, 1 < scl ≤ ann and 1 < scl′ ≤ ann′.
Proof of Lemma 4.23. In this proof, we assume that Lemma 5.1
and Lemma 5.4 are established. The first Lemma clearly establishes
the first (and more important) claim of the Lemma.
Let us prove the inequality (4.24). Using the notation of this sec-
tion, this inequality is as follows.
(5.5)
ann=−∞
Cann,scl
. (1 + log(1 + scl−1‖v‖Cα)).
This inequality holds for all choices of Cα vector fields v.
Note that Lemma 5.4 implies immediately
ann=3
Cann,scl
. 1 , ‖v‖Cα = 1 .
We are however in a scale invariant situation, so that this inequality
implies this equivalent form, independent of assumption on the norm
of the vector field.
(5.6)
ann≥8‖v‖Cα
Cann,scl
. 1 .
On the other hand, Lemma 4.25, implies that independent of any
assumption other than measurability, we have have the inequality
‖Cann,scl‖2 . 1 .
To prove (5.5), use the inequality (5.6), and this last inequality together
with the simple fact that for a fixed value of scl, there are at most
. 1 + log(1 + scl−1‖v‖Cα)) values of ann with scl ≤ ann ≤ 8‖v‖Cα.
APPLICATION OF THE FOURIER LOCALIZATION LEMMA 65
We use the notation
AT (ann, scl) := {s ∈ AT (ann) : scl(s) = scl} ,
Observe that as the scale is fixed, we have a Bessel inequality for the
functions {ϕs | s ∈ AT (ann, scl)}. Thus,
ann,sclCann′,scl′f‖22 =
s∈AT (ann,scl)
s∈AT (ann′,scl′)
〈φs, φs′〉〈ϕs′, f〉ϕs
s∈AT (ann,scl)
s∈AT (ann′,scl′)
〈φs, φs′〉〈ϕs′, f〉
At this point, the Schur test suggests itself, and indeed, we need a
quantitative version of the test, which we state here.
Proposition 5.7. Let A = {ai,j} be a matrix acting on ℓ2(N) by
ai,jxj
Then, we have the following bound on the operator norm of A.
‖A‖2 . sup
|ai,j| · sup
|ai,j|
We assume that 1 ≤ ann < 1
′. For a subset S ⊂ AT (ann, scl)×
AT (ann′, scl′) Consider the operator and definitions below.
AS f =
(s,s′)∈S
〈φs, φs′〉〈ϕs′, f〉ϕs ,
FL(s,S) =
s′∈AT (ann′,scl′)
|〈φs, φs′〉| ,
FL(S) = sup
FL(s,S) .
Here ‘FL’ is for ‘Fourier Localization’ as this term is to be controlled
by Lemma 5.56. We will use the notations FL(s′,S), and FL′(S),
which are defined similarly, with the roles of s and s′ reversed. By
Proposition 5.7, we have the inequality
(5.8) ‖AS‖22 . FL(S) · FL′(S) .
We shall see that typically FL(S) will be somewhat large, but is bal-
anced out by FL′(S).
We partition AT (ann, scl) × AT (ann′, scl′) into three disjoint sub-
collections Su, u = 1, 2, 3, defined as follows. In this display, (s, s′) ∈
66 5. ALMOST ORTHOGONALITY
AT (ann, scl)×AT (ann′, scl′).
(s, s′) | scl
≥ scl
,(5.9)
(s, s′) | scl
, scl < scl′
,(5.10)
(s, s′) | scl
, scl′ < scl
.(5.11)
A further modification to these collections must be made, but it
is not of an essential nature. For an integer j ≥ 1, and (s, s′) ∈ Su,
for u = 1, 2, 3, write (s, s′) ∈ Su,j if j is the smallest integer such that
2j+2Rs ∩ 2j+2Rs′ 6= ∅.
We apply the inequalities (5.8) to the collections Su,j, to prove the
inequalities
(5.12) ‖ASu,j‖2 . 2−j(ann′)−δ
where δ′ = 1
(α − 1). This proves Lemma 5.4, and so completes the
proof of Lemma 5.1.
In applying (5.8) it will be very easy to estimate FL(s,S), with
a term that decreases like say 2−10j. The difficult part is to estimate
either FL(s,S) or FL′(S) by a term with decreases faster than a small
power of (ann′)−1. for which we use Lemma 5.56.
Considering a term 〈φs, φs′〉, the inner product is trivially zero if
ωs ∩ ωs′ = ∅. We assume that this is not the case below. To apply
Lemma 5.56, fix e ∈ ω′s ∩ωs. Let α be a Schwartz function on R with
α̂ supported on [ann′, 2ann′], and identically one on 3
[ann′, 2ann′]. Set
β̂(θ) := α̂(θ − 3
′). We will convolve φs with β in the direction e,
and φs′ with α also in the direction e, thereby obtaining orthogonal
functions.
Define
Ie g(x) =
g(x− ye)β(y) dy,(5.13)
∆s = φs − Ie φs
∆s′ = φs′ − Ie φs′
(5.14)
By construction, we have
〈φs, φs′〉 = 〈Ie φs +∆s, Ie φs′ +∆s′〉
= 〈Ie φs,∆s′〉+ 〈∆s, Ie φs′〉+ 〈∆s,∆s′〉 .
APPLICATION OF THE FOURIER LOCALIZATION LEMMA 67
It falls to us to estimate terms like
s∈Sℓ,j
|〈∆s, Ie φs′〉|,(5.15)
s∈Sℓ,j
|〈Ie φs,∆s′〉|,(5.16)
s∈Sℓ,j
|〈∆s,∆s′〉|.(5.17)
as well as the dual expressions, with the roles of s and s′ reversed.
The differences ∆s and ∆s′ are frequently controlled by Lemma 5.56.
Concerning application of this Lemma to ∆s, observe that
Mod−c(ωs)∆s = Mod−c(ωs) φs −
[Mod−c(ωs) φs(x− ye)]β̃(y) dy
where β̃(y) = e(c(ωs)·e)y β(y). Now the Fourier transform of β is identi-
cally one in a neighborhood of the origin of width comparable to ann′,
where as |c(ωs) · e| is comparable to ann. Since we can assume that
′ > ann+3, say, the function β̃ meets the hypotheses of Lemma 5.56,
namely it is Schwarz function with Fourier transform identically one
in a neighborhood of the origin, and the width of that neighborhood
is comparable to ann′. And so ∆s is bounded by the bounded by
the three terms in (5.57)—(5.59) below. In these estimates, we take
2k ≃ ann′ > 1. By a similar argument, one sees that Lemma 5.56 also
applies to ∆s′.
We will let ∆s,m, for m = 1, 2, 3, denote the terms that come from
(5.57), (5.58), and (5.59) respectively. We use the corresponding no-
tation for ∆s,m, for m = 1, 2, 3. A nice feature of these estimates, is
that while ∆s and ∆s′ depend upon the choice of e ∈ ωs′ ∩ ωs, the
upper bounds in the first two estimates do not depend upon the choice
of e. While the third estimate does, the dependence of the set Fs on
the choice of e is rather weak.
In application of (5.58), the functions ∆s,2 will be very small, due
to the term (ann′)−10 which is on the right in (5.58). This term is so
much smaller than all other terms involved in this argument that these
terms are very easy to control. So we do not explicitly discuss the case
of ∆s,2, or ∆s′,2 below.
In the analysis of the terms (5.15) and (5.16), we frequently only
need to use an inequality such as |Ie φs′| . χ(2)Rs′ . When it comes to the
analysis of (5.17), the function ∆s′ obeys the same inequality, so that
68 5. ALMOST ORTHOGONALITY
these sums can be controlled by the same analysis that controls (5.15),
or (5.16). So we will explicitly discuss these cases below.
In order for 〈φs, φs′〉 6= ∅, we must necessarily have ωs ∩ ωs′ 6= ∅.
Thus, we update all Sℓ,j as follows.
Sℓ,j ← {(s, s′) ∈ Sℓ,j | ωs ∩ ωs′ 6= ∅} .
The Proof of (5.12) for S1,j, j ≥ 1. Recall the definition of S1,j
from (5.9). In particular, for (s, s′) ∈ S1,j , we must have ωs ⊂ ωs′.
We will use the inequality (5.8), and show that for 0 < ǫ < 1,
FL(S1,j) . 2−10j(ann′)−eα
′ · ann′
scl · ann(5.18)
FL′(S1,j) . 22j(ann′)ǫ ·
scl · ann
′ · ann′ .(5.19)
Notice that in the second estimate, we permit some slow increase in
the estimates as a function of 2j and ann′. But, due to the form of the
estimate of the Schur test in (5.8), this slow growth is acceptable.
The terms inside the square root in these two estimates cancel out.
These inequalities conclude the proof of the inequality (5.12) for the
collection S1,j , j ≥ 1.
We prove (5.18). For this, we use Lemma 5.56. That is, we should
bound the several terms
s′ : (s,s′)∈S1,j
|〈∆s, Ie φs′〉| ,(5.20)
s′ : (s,s′)∈S1,j
|〈Ie φs,∆s′〉| ,(5.21)
s′ : (s,s′)∈S1,j
|〈∆s,∆s′〉| .(5.22)
Here ∆s and ∆s′ are as in (5.14). And, Ie is defined as in (5.13). We
can regard the tile s as fixed, and so fix a choice of e ∈ ωs. In the next
two cases, we will need to estimate the same expressions as above. In
all three cases, Lemma 5.56 is applied with 2k ≃ ann′, and we can take
ǫ in this Lemma to be ǫ = 1
(α− 1). For ease of notation, we set
(5.23) α̃ = (α− 1)(1− ǫ)2 − ǫ > 0
As we have already mentioned, we do not explicitly discuss the
upper bound on the estimate for (5.22).
APPLICATION OF THE FOURIER LOCALIZATION LEMMA 69
The Upper Bound on (5.20). We write ∆s = ∆s,1 + ∆s,2 + ∆s,3,
where these three terms are those on the right in (5.57)—(5.59) respec-
tively. Note that
(5.24) |Ie φs′| . χ(2)Rs′ ,
since Ie is convolution in the long direction of Rs′ , at the scale of
(ann′)−1, which is much smaller than the length of Rs′ in the direc-
tion e. Therefore, we can estimate the term in (5.20) by
s′ : (s,s′)∈S1,j
|〈∆s,1, Ie φs′〉| . (ann′)−eα2−10j
s′ : (s,s′)∈S1,j
. (ann′)−eα2−10j
′ · ann′
scl · ann .(5.25)
This is as required to prove (5.18) for these sums.
For the terms associated with ∆s,3, we have
s′ : (s,s′)∈S1,j
|〈∆s,3, Ie φs′〉| .
s′ : (s,s′)∈S1,j
|Rs|−1/2 · χ(2)R′s dx
. 2−10j|Fs|
ann′ · scl′ · ann · scl
. 2−10j(ann′)−α+ǫ
′ · ann′
scl · ann .
That is, we only rely upon the estimate (5.60). This completes the
analysis of (5.20). (As we have commented above, we do not explicitly
discuss the case of ∆s,2.)
The Upper Bound for (5.21). Since ωs ⊂ ωs′ , the only facts about
∆s′ we need are
(ann′)ǫR′s
|∆s′| dx . (ann′)−eα+ǫ
′ · ann′ ,
|∆s′(x)| . (ann′)−eαχ(2)Rs′ (x) , x 6∈ (ann
′)ǫRs′ .
(5.26)
Indeed, this estimate is a straightforward consequence of the various
conclusions of Lemma 5.56. (We will return to this estimate in other
cases below.)
70 5. ALMOST ORTHOGONALITY
These inequalities, with |Ie φs| . χ(2)Rs , permit us to estimate
(5.21) . 2−20j |Rs|−1/2
s′ : (s,s′)∈S1,j
(ann′)ǫR′s
|∆s′| dx
. 2−20j(ann′)−eα
scl · ann
′ · ann′ × ♯{s
′ : (s, s′) ∈ S1,j}
. 2−20j(ann′)−eα
′ · ann′
scl · ann .
which is the required estimate. Here of course we use the estimate
♯{s′ : (s, s′) ∈ S1,j} . 22j
′ · ann′
scl · ann .
We now turn to the proof of (5.19), where it is important that we
justify the small term √
scl · ann
′ · ann′
on the right in (5.19). We estimate the terms dual to (5.20)—(5.22),
namely
s : (s,s′)∈S1,j
|〈∆s, Ies φs′〉| ,(5.27)
s : (s,s′)∈S1,j
|〈Ies φs,∆s′〉| ,(5.28)
s : (s,s′)∈S1,j
|〈∆s,∆s′〉| .(5.29)
Here, for each choice of tile s, we make a choice of es ∈ ωs ⊂ ωs′.
The Upper Bound on (5.27). We have an inequality analogous to
(5.24).
(5.30) |Ies φs′| . χ
Note that as we can view s′ as fixed, all the tiles {s : (s, s′) ∈ S1,j}
have the same approximate spatial location. Let us single out a tile s0
in this collection. Then, for all s, we have Rs ⊂ 2j+2Rs0 .
Recalling the specific information about the support of the functions
of ∆s from (5.57), (5.59) and (5.61), it follows that
s : (s,s′)∈S1,j
|∆s| . 22j(ann′)ǫχ(2)2j+2Rs0 .
APPLICATION OF THE FOURIER LOCALIZATION LEMMA 71
In particular, we do not claim any decay in ann′ in this estimate. (The
small growth of (ann′)ǫ above arises from the overlapping supports of
the functions ∆s, as detailed in Lemma 5.56.) Therefore, we can esti-
s : (s,s′)∈S1,j
|〈∆s, Ies φs′〉| . 22j(ann′)ǫ
2jRs0
. 2−10j(ann′)ǫ
scl · ann
′ · ann′ .
This is as required in (5.19).
Remark 5.31. It is the analysis of the term
s : (s,s′)∈S1,j
|〈∆s,3, Ies φs′〉|
which prevents us from obtaining a decay in ann′, at least in some
choices of the parameters scl , ann , scl′, and ann′.
The Upper Bound on (5.28). The fact about ∆s′ we need is the
simple inequality |∆s′| . χ(2)Rs′ .
As in the previous case, we turn to the fact that all the tile {s :
(s, s′) ∈ S1,j} have the same approximate spatial location. Single out
a tile s0 in this collection, so that Rs ⊂ 2j+2Rs0 for all such s.
Our claim is that
(5.32)
s : (s,s′)∈S1,j
|Ies φs| . 22jχ
22jRs
(We will have need of related inequalities below.) Suppose that s ∈
{s : (s, s′) ∈ S1,j}. These intervals all have the same length, namely
scl/ann. And x 6∈ supp(φs) implies v(x) 6∈ ωs, so that by the Lipschitz
assumption on the vector field
dist(x, supp(φs)) & dist(v(x),ωs) .
This means that
(5.33) |Ies φs(x)| . χ
1 + ann′ · dist(v(x),ωs)
Here, we recall that the operator Ie is dominated by the operator which
averages on spatial scale (ann′)−1 in the direction e. Moreover, we have
(5.34) ann′ · dist(ωs,ω) & scl .
Here, we partition the unit circle into disjoint intervals ω ∈ Ω of length
|ω| ≃ scl/ann, so that for all s ∈ {s : (s, s′) ∈ S1,j}, we have ωs ∈ Ω.
72 5. ALMOST ORTHOGONALITY
Figure 5.1. The relative positions of Rs and Rs′ in for
pairs (s, s′) ∈ Sℓ, for ℓ = 2 and ℓ = 3 respectively.
In fact, the term on the left in (5.34) can be taken to be integer
multiples of scl. Combining these observations proves (5.32). Indeed,
we can estimate the term in (5.32) as follows. For x, fix ω ∈ Ω with
v(x) ∈ ω. Then,
s : (s,s′)∈S1,j
|Ies φs| .
s : (s,s′)∈S1,j
1 + ann′ · dist(ω,ωs)
The important point is that the term involving the distance allows us
to sum over the possible values of ωs ⊂ ωs′ to conclude (5.32).
To finish this case, we can estimate
s : (s,s′)∈S1,j
|〈Ies φs,∆s′〉| . 2−10j
scl · ann
′ · ann′ .
This completes the upper bound on (5.28).
The Proof (5.12) for S2,j, j ≥ 1. In this case, note that the
assumptions imply that we can assume that ωs′ ⊂ ωs, and that di-
mensions of the rectangle Rs′ are smaller than those for Rs in both
directions. See Figure 5.
We should show these two inequalities, in analogy to (5.18) and
(5.19).
FL(S2,j) . 2−10j(ann′)−eα
′ · ann′
scl · ann(5.35)
FL′(S2,j) . 2−10j(ann′)−eα ·
scl · ann
′ · ann′ .(5.36)
Here, α̃ is as in (5.23).
APPLICATION OF THE FOURIER LOCALIZATION LEMMA 73
For the proof of (5.35), we should analyze the sums
s′ : (s,s′)∈S2,j
|〈∆s, Ies′ φs′〉| ,(5.37)
s′ : (s,s′)∈S2,j
|〈Ies′ φs,∆s′〉| ,(5.38)
s′ : (s,s′)∈S2,j
|〈∆s,∆s′〉| .(5.39)
These inequalities are in analogy to (5.20)—(5.22), and es′ ∈ ωs′ ⊂ ωs.
The Upper Bound on (5.37). Fix the tile s. Fix a translate Rs of
Rs with 2
jRs ∩ 2jRs = ∅, but 2j+1Rs ∩ 2j+2Rs 6= ∅. Let us consider
(5.40) S2,j = {(s, s′) ∈ S2,j | Rs′ ⊂ Rs}
and we restrict the the sum in (5.37) to this collection of tiles. Note
that with . 22j choices of Rs, we can exhaust the collection S2,j . So
we will prove a slightly stronger estimate in the parameter 2j for the
restricted collection S2,j.
The point of this restriction is that we can appeal to an inequality
similar to (5.32). Namely,
(5.41)
s′ : (s,s′)∈S2,j
|Ie′s φs′| .
′ · ann′
scl · ann χ
Note that the term in the square root takes care of the differing L2
normalizations of φs′ and χ
. Indeed, the proof of (5.32) is easily
modified to give this inequality.
Next, we observe that the analog of (5.26) holds for ∆s. Just replace
s′ in (5.26) with s. It is a consequence that we have
s′ : (s,s′)∈S2,j
|〈∆s, Ies′ φs′〉| . 2
−12j(ann′)−eα
′ · ann′
scl · ann .
This is enough to finish this case.
The Upper Bound on (5.38). Let us again appeal to the notations
Rs and S2,j as in (5.40).
We have the estimates
|Ies′ φs| . χ
74 5. ALMOST ORTHOGONALITY
As for the sum over ∆s′ , we have an analog of the estimates (5.26).
Namely,
s′ : (s,s′)∈S2,j
(ann′)ǫRs
|∆s′,1| dx . (ann′)−eα+ǫ
′ · ann′
scl · ann
s′ : (s,s′)∈S2,j
|∆s′,1| .
′ · ann′
scl · ann χ
, x 6∈ (ann′)ǫRs .
Note that we again have to be careful to accommodate the different
normalizations here. The proof of (5.26) can be modified to prove this
estimate.
Putting these two estimates together clearly proves that
s′ : (s,s′)∈S2,j
|〈Ies′ φs,∆s′〉| . 2
−10j(ann′)−eα
′ · ann′
scl · ann ,
as is required.
We now turn to the proof of the inequality (5.36), which will follow
from appropriate upper bounds on the sums below.
s : (s,s′)∈S2,j
|〈∆s, Ie φs′〉| ,(5.42)
s : (s,s′)∈S2,j
|〈Ie φs,∆s′〉| ,(5.43)
s : (s,s′)∈S2,j
|〈∆s,∆s′〉| .(5.44)
Here, we can regard s′ as a fixed tile, and e ∈ ωs′ ⊂ ωs. In this case,
observe that we have the inequality
(5.45) ♯{s : (s, s′) ∈ S1,j} . 22j .
This is so since Rs has larger dimensions in both directions than does
The Upper Bound on (5.42). We use the decomposition of ∆s =
∆s,1 +∆s,2 +∆s,3. In the first case, we can estimate
s : (s,s′)∈S2,j
|〈∆s,1, Ie φs′〉| . 22j sup|〈∆s,1, Ie φs′〉|
. 2−10j(ann′)−eα
scl · ann
′ · ann′ .
APPLICATION OF THE FOURIER LOCALIZATION LEMMA 75
For the last case, of ∆s,3, we estimate
s : (s,s′)∈S2,j
|〈∆s,3, Ie φs′〉| . 22j sup|〈∆s,3, Ie φs′〉|
. 22j min
|Fs| ·
scl · ann · scl′ · ann′ ,
2−30j
scl · ann
′ · ann′
Examining the two terms of the minimum, note that by (5.61),
|Fs| ·
scl · ann · scl′ · ann′ . (ann′)−α+ǫ
′ · ann′
scl · ann
. (ann′)−α+1+ǫ
′ · ann′ · scl · ann
. (ann′)−α+1+ǫ
scl · ann
′ · ann′ .
Here it is essential that we have the estimate (5.60) as stated, with
|Fs| . (ann′)−α+ǫ|Rs|. This is an estimate of the desired form, but
without any decay in the parameter j. The second term in the min-
imum does have the decay in j, but does not have the decay in ann′.
Taking the geometric mean of these two terms finishes the proof, pro-
vided (α−ǫ)/2 > α̃, which we can assume by taking α sufficiently close
to one.
The Upper Bound on (5.43). Using the inequality |Ie φs| . χ(2)Rs ,
and the inequalities (5.26) and (5.45), it is easy to see that
s : (s,s′)∈S2,j
|〈Ie φs,∆s′〉| . 2−12j(ann′)−eα
scl · ann
′ · ann′ .
This is the required estimate.
The Proof of (5.12) for S3,j, j ≥ 1. In this case, we have that
the length of the rectangles Rs′ are greater than those of the rectangles
Rs, as depicted in Figure 5. We show that
FL(S3,j) . (ann′)ǫ
′ · ann′
scl · ann(5.46)
FL′(S3,j) . 2−10j(ann′)−eα
scl · ann
′ · ann′ .(5.47)
In particular, we do not claim any decay in the term FL(S3,j), in fact
permitting a small increase in the parameter ann′. Recall that 0 < ǫ < 1
is a small quantity. See (5.23). But due to the form of the estimate in
76 5. ALMOST ORTHOGONALITY
Proposition 5.7, with the decay in 2j and ann′ in the estimate (5.47),
these two estimates still prove (5.12) for S3,j .
For the proof of (5.46), we analyze the sums
s′ : (s,s′)∈S3,j
|〈∆s, Ies′ φs′〉| ,(5.48)
s′ : (s,s′)∈S3,j
|〈Ies′ φs,∆s′〉| ,(5.49)
s′ : (s,s′)∈S3,j
|〈∆s,∆s′〉| .(5.50)
Here, es′ ∈ ωs′ ⊂ ωs.
The Upper Bound on (5.48). Regard s as fixed. We employ a vari-
ant of the notation established in (5.40). Let R̃s be a rectangle with
in the same coordinates axes as Rs. In the direction es, let it have
length 1/scl′, that is the (longer) length of the rectangles Rs′, and let
it have the same width of Rs. Further assume that 2
jRs ∩ R̃s = ∅ but
2j+4Rs ∩ R̃s 6= ∅. (There is an obvious change in these requirements
for j = 1.) Then, define
S̃3,j = {(s, s′) ∈ S3,j | Rs′ ⊂ R̃s} .
With . 22j choices of R̃s, we can exhaust the collection S3,j . Thus, we
prove a slightly stronger estimate in the parameter 2j for the collection
S̃3,j.
The main point here is that we have an analog of the estimate
(5.32):
s′ : (s,s′)∈ eS3,j
|Ie φs′| .
The term in the square root takes into account the differing L2 normal-
izations between the φs′ and χ
. The proof of (5.32) can be modified
to prove the estimate above.
We also have the analogs of the estimate (5.26). Putting these two
together proves that
s′ : (s,s′)∈S3,j
|〈∆s, Ies′ φs′〉| . 2
−10j(ann′)−eα
. 2−10j(ann′)−eα
′ · ann′
scl · ann .
APPLICATION OF THE FOURIER LOCALIZATION LEMMA 77
That is, we get the estimate we want with decay in ann′, we do not
claim in general.
The Upper Bound on (5.49). We use the inequality
|Ies′ φs| . χ
And we use the decomposition ∆s′ = ∆s′,1 +∆s′,2 +∆s′,3.
For the case of ∆s′,1, we have ωs′ ⊂ ωs. And the supports of the
functions ∆s′ are well localized with respect to the vector field. See
(5.57). Thus, in particular we have
s′ : (s,s′)∈S3,j
|∆s′| . (ann′)ǫ
′ · ann′ .
Hence, we have
s′ : (s,s′)∈S3,j
|〈χ(2)Rs ,∆s′〉| . (ann
′ · ann′
scl · ann
which is the desired estimate.
Remark 5.51. It is the analysis of the sum
s′ : (s,s′)∈S3,j
|〈Ie φs,∆s′,3〉|
that prevents us from obtaining decay in the parameter ann′ for cer-
tain choices of parameters scl , ann , scl′ and ann′. This is why we have
formulated (5.46) the way we have.
For the proof of (5.47), we analyze the sums
s : (s,s′)∈S3,j
|〈∆s, Ie φs′〉| ,(5.52)
s : (s,s′)∈S3,j
|〈Ie φs,∆s′〉| ,(5.53)
s : (s,s′)∈S3,j
|〈∆s,∆s′〉| .(5.54)
Here es′ ∈ ωs′ ⊂ ωs, and one can regard the interval ωs′ as fixed. It is
essential that we obtain the decay in 2j and ann′ in these cases.
Indeed, these cases are easier, as the sum is over s. For fixed s′,
there is a unique choice of interval ωs ⊃ ωs′. And the rectangles Rs
are shorter than Rs′, but wider. Hence,
(5.55) ♯{s : (s, s′) ∈ S3,j} . 22j
78 5. ALMOST ORTHOGONALITY
The Upper Bound on (5.52). We use the decomposition ∆s = ∆s,1+
∆s,2 +∆s,3, and the inequality |Ies′ φs′, | . χ
For the sum associated with ∆s,1, we have
s : (s,s′)∈S3,j
|〈∆s,1, Ies′ φs′〉| . (ann
′)−eα
s : (s,s′)∈S3,j
〈χ(2)Rs , χ
. 2−12j(ann′)−eα
′ · ann
scl · ann′
× ♯{s : (s, s′) ∈ S3,j}
. 2−10j(ann′)−eα
′ · ann′
scl · ann .
This is the required estimate.
For the sum associated with ∆s,3, the critical properties are those
of the corresponding sets Fs, described in (5.60) and (5.61). Note that
the sets ∑
s : (s,s′)∈S3,j
1Fs . (ann
′)2ǫ .
On the other hand,
s : (s,s′)∈S3,j
|Fs| . 22j
′ sup
s : (s,s′)∈S3,j
. 22j(ann′)−α+ǫ
′ · ann .
Here, we have used the estimate (5.55).
This permits us to estimate
s : (s,s′)∈S3,j
|〈∆s,3, Ies′ φs′〉| . 2
−10j(ann′)−α+3ǫ
scl · ann′
′ · ann
Note that the parity between the ‘primes’ is broken in this estimate.
By inspection, one sees that this last term is at most
. 2−10j(ann′)−eα
′ · ann′
scl · ann .
Indeed, the claimed inequality amounts to
(ann′)−α+3ǫscl . (ann′)−eαscl′ .
We have to permit scl′ to be as small as 1, whereas scl can be as big
as ann. But α > 1, and ann < ann′, so the inequality above is trivially
true. This completes the analysis of (5.52).
THE FOURIER LOCALIZATION ESTIMATE 79
The Upper Bound on (5.53). We only need to use the inequality
|Ies′ φs| . χ
, and the inequalities (5.26). It follows that
s : (s,s′)∈S3,j
|〈Ies′ φs,∆s′〉| .
s : (s,s′)∈S3,j
〈χ(2)Rs , |∆s′|〉
. 2−10j(ann′)−eα
′ · ann′
scl · ann .
The Fourier Localization Estimate
The precise form of the inequalities quantifying the Fourier local-
ization effect follows.
Fourier Localization Lemma 5.56. Let 1 < α < 2, ǫ < (α− 1)/20,
and v be a vector field with ‖v‖Cα ≤ 1. Let s be a tile with
1 < scl(s) = scl ≤ ann(s) = ann < 1
fs = Mod−c(ωs) φs
Let ζ be a smooth function on R, with 1(−2,2) ≤ ζ̂ ≤ 1(−3,3) and set
ζ2k(y) = 2
kζ(y2k). We have this inequality valid for all unit vectors e
with |e− es| ≤ |ωs|.
∣∣∣fs(x)−
fs(x− ye)ζ2k(y) dy
.(scl2(α−1)k)−1+ǫχ
(x)1fωs(v(x))(5.57)
+ (2kscl)−10χ
(x)(5.58)
+ |Rs|−1/21Fs(x) ,(5.59)
where ω̃s is a sub arc of the unit circle, with ω̃s = λωs, and 1 < λ <
2ǫk. Moreover, the sets Fs ⊂ R2 satisfy
|Fs| . 2−(α−ǫ)k(1 + scl−1)α−1|Rs|,(5.60)
Fs ⊂ 2ǫkRs ∩ v−1(ω̃s) ∩
∂(v ·es⊥)
∣∣∣ > 2(1−ǫ)k
.(5.61)
The appearance of the set Fs is explained in part because the only
way for the function φs to oscillate quickly along the direction es is
that the vector field moves back and forth across the interval ωs very
quickly. This sort of behavior, as it turns out, is the only obstacle to
the frequency localization described in this Lemma.
Note that the degree of localization improves in k. In (5.57), it
is important that we have the localization in terms of the directions
80 5. ALMOST ORTHOGONALITY
of the vector field. The terms in (5.58) will be very small in all the
instances that we apply this lemma. The third estimate (5.59) is the
most complicated, as it depends upon the exceptional set. The form of
the exceptional set in (5.61) is not so important, but the size estimate,
as a function of α > 1, in (5.60) is.
Proof. We collect some elementary estimates. Throughout this
argument, ~y := y e ∈ R2.
(5.62)
|y|>t2−k
|y2k||ζ2k(y)| dy . t−N , t > 1.
This estimate holds for all N > 1. Likewise,
(5.63)
|u|>tscl−1
|uscl||sclψ(scl u)| du . t−N , t > 1.
More significantly, we have for all x ∈ R2,
(5.64)
eiξ0y ϕ
(x− ~y)ζ2k(y) dy = ϕ
(x) − 2k < ξ0 < 2k,
where ϕ
= Tc(Rs)D
ϕ. This is seen by taking the Fourier transform.
Likewise, by (4.17), for vectors v0 of unit length,∫
e−2πiuλ0 ϕ
(x− uv0)sclψ(scl u) du 6= 0
implies that
(5.65) scl ≤ λ0 + ξ · v0 ≤ 98scl, for some ξ ∈ supp(ϕ̂
At this point, it is useful to recall that we have specified the fre-
quency support of ϕ to be in a small ball of radius κ in (4.16). This
has the implication that
(5.66) |ξ · es| ≤ κscl, |ξ · es⊥| ≤ κann ξ ∈ supp(ϕ̂(2)Rs )
We begin the main line of the argument, which comes in two stages.
In the first stage, we address the issue of the derivative below exceeding
a ‘large’ threshold.
e ·Dv(x) · es⊥ =
∂v · es⊥
We shall find that this happens on a relatively small set, the set Fs
of the Lemma. Notice that due to the eccentricity of the rectangle
Rs, we can only hope to have some control over the derivative in the
long direction of the rectangle, and e essentially points in the long
direction. We are interested in derivative in the direction es⊥ as that is
THE FOURIER LOCALIZATION ESTIMATE 81
the direction that v must move to cross the interval ωs2. A substantial
portion of the technicalities below are forced upon us due to the few
choices of scales 1 ≤ scl ≤ 2εk, for some small positive ε.1
Let 0 < ε1, ε2 < ǫ to be specified in the argument below. In partic-
ular, we take
0 < ε1 ≤ min
, κα−1
, 0 < ε2 <
(α− 1).
We have the estimate
|fs(x)|+
fs(x− ~y)ζ2k(y) dy
∣∣∣ . 2−10kχ(2)Rs (x), x 6∈ 2
ε1kRs.
This follows from (5.62) and the fact that the direction e differs from
es by an no more than the measure of the angle of uncertainty for Rs.
This is as claimed in (5.58). We need only consider x ∈ 2ε1kRs.
Let us define the sets Fs, as in (5.59). Define
λs :=
2ε1k scl
< 2−2ε1k
8 otherwise
Let λωs denote the interval on the unit circle with length λ|ωs|, and
the same center as ωs.
2 This is our ω̃ of the Lemma; the set Fs of the
Lemma is
(5.67) Fs := 2
ε1kRs ∩ v−1(λsωs) ∩
∂(v ·es⊥)
∣∣∣ > 2(1−ε2)k
And so to satisfy (5.61), we should take ε1 < 1/1200.
Let us argue that the measure of Fs satisfies (5.60). Fix a line ℓ in
the direction of e. We should see that the one dimensional measure
(5.68) |ℓ ∩ Fs| . 2−k(α−ǫ)(1 + scl−1)α−1scl−1.
For we can then integrate over the choices of ℓ to get the estimate in
(5.60).
The set ℓ ∩ Fs is viewed as a subset of R. It consists of open
intervals An = (an, bn), 1 ≤ n ≤ N . List them so that bn < an+1
for all n. Partition the integers {1, 2, . . . , N} into sets of consecutive
integers Iσ = [mσ, nσ]∩N so that for all points x between the left-hand
endpoint of Amσ and the right-hand endpoint of Anσ , the derivative
∂(v · es⊥)/∂e has the same sign. Take the intervals of integers Iσ to be
maximal with respect to this property.
1The scales of approximate length one are where the smooth character of the
vector field helps the least. The argument becomes especially easy in the case that√
ann ≤ scl, as in the case, |ωs| & scl−1.
2We have defined λs this way so that λsωs makes sense.
82 5. ALMOST ORTHOGONALITY
For x ∈ Fs, the partial derivative of v, in the direction that is
transverse to λsωs, is large with respect to the length of λsωs. Hence,
v must pass across λsωs in a small amount of time:
|Am| . 2−(1−ε1−ε2)k for all σ.
Now consider intervals Anσ and A1+nσ = Amσ+1. By definition,
there must be a change of sign of ∂v(x) · es⊥/∂e between these two
intervals. And so there is a change in this derivative that is at least as
big as 2(1−ε2)k scl
. The partial derivative is also Hölder continuous of
index α − 1, which implies that Anσ and Amσ+1 cannot be very close,
specifically
dist(Anσ , Amσ+1) ≥
2(1−ε2)k
As all of the intervals An lie in an interval of length 2
ε1kscl
−1, it follows
that there can be at most
1 ≤ σ . 2ε1kscl−1
2(1−ε2)k
)−α+1
intervals Iσ. Consequently,
|ℓ ∩ Fs| . 2−(1−2ε1−ε2+(1−ε2)(α−1))kscl−1
. 2−(α−2ε1−2ε2)kscl−α+1scl−1
We have already required 0 < ε1 <
and taking 0 < ε2 <
achieve the estimate (5.68). This completes the proof of (5.60).
The second stage of the proof begins, in which we make a detailed
estimate of the difference in question, seeking to take full advantage of
the Fourier properties (5.62)—(5.65), as well as the derivative informa-
tion encoded into the set Fs.
We consider the difference in (5.57) in the case of x ∈ 2ε1kRs −
v−1(λsωs). In particular, x is not in the support of fs, and due to the
smoothness of the vector field, the distance of x to the support of fs is
at least
& 2ε1k
so that by (5.62), we can estimate
∣∣∣fs(x)−
fs(x− ~y)ζ2k(y) dy
∣∣∣ . (2ε1kscl)−N |Rs|−1/2
which is the estimate (5.58).
THE FOURIER LOCALIZATION ESTIMATE 83
We turn to the proof of (5.57). For x ∈ 2ε1kRs ∩ v−1(λsωs), we
always have the bound
∣∣∣fs(x)−
fs(x− ~y)ζ2k(y) dy
∣∣∣ . 210ε1k/κχ(2)Rs (x)
101λsωs(x).
It is essential that we have |e − es| ≤ |ωs| for this to be true, and κ
enters in on the right hand side through the definition (4.40).
We establish the bound
∣∣∣fs(x)−
fs(x− ~y)ζ2k(y) dy
∣∣∣ . (scl2(α−1)k)−1|Rs|−1/2,
x ∈ 2ε1kRs ∩ v−1(λsωs) ∩ F cs .
(5.69)
We take the geometric mean of these two estimates, and specify that
0 < ε1 < κ
to conclude (5.57).
It remains to consider x ∈ 2ε1kRs ∩ v−1(λsωs) ∩ F cs , and now some
detailed calculations are needed. To ease the burden of notation, we
exp(x) := e−2πiuc(ωs)·v(x), Φ(x, x′) = ϕ
(x− uv(x′)),
with the dependency on u being suppressed, and define
w(du, d~y) := sclψ(scl u)ζ2k(y) du d~y.
In this notation, note that
fs = Mod−c(ωs) φs
ec(ωs)(x−uv(x)−c(ωs))x ϕ
(x− uv(x)) sclψ(sclu) du
exp(x)Φ(x, x, )sclψ(sclu) du
exp(x)Φ(x, x, )w(du, d~y) ,
since ζ has integral on R2. In addition, we have
fs(x− ~ye)ζ2k(~y) d~y =
ec(ωs)(x−uv(x−~y)−c(ωs))x
× ϕ(2)Rs (x− uv(x− ~y)) sclψ(sclu) du d~y
exp(x− ~y)Φ(x− ~y, x− ~y)w(du, d~y) .
84 5. ALMOST ORTHOGONALITY
We are to estimate the difference between these two expressions, which
is the difference of
Diff1(x) :=
exp(x)Φ(x, x)− exp(x− ~y)Φ(x− ~y, x)w(du, d~y)
Diff2(x) :=
exp(x− ~y){Φ(x− ~y, x− ~y)− Φ(x− ~y, x)}w(du, d~y)
The analysis of both terms is quite similar. We begin with the first
term.
Note that by (5.64), we have
Diff1(x) =
{exp(x)− exp(x− ~y)}Φ(x− ~y, x)w(du, d~y).
We make a first order approximation to the difference above. Observe
exp(x)− exp(x− ~y) = exp(x){1− exp(x− ~y)exp(x)}
= exp(x){1− e−2πiu[c(ωs)·Dv(x)·e]~y}(5.70)
+O(|u|ann|~y|α).
In the Big–Oh term, |u| is typically of the order scl−1, and |~y| is of the
order 2−k. Hence, direct integration leads to the estimate of this term
|u|ann|y|α|Φ(x− ~y, x)|·|w(du, d~y)|
.|Rs|−1/2
.|Rs|−1/2(scl2(α−1)k)−1.
This is (5.69).
The term left to estimate is
Diff ′1(x) :=
exp(x)(1−e−2πiu[c(ωs)·Dv(x)·e]~y)
Φ(x− ~y, x)w(du, d~y) .
Observe that by (5.64), the integral in y is zero if
|u[c(ωs) ·Dv(x) · e]| ≤ 2k.
Here we recall that c(ωs) =
ann es⊥. By the definition of Fs, the
partial derivative is small, namely
|es⊥ ·Dv(x) · e| . 2(1−ε1)k
THE FOURIER LOCALIZATION ESTIMATE 85
Hence, the integral in y in Diff ′1(x) can be non–zero only for
scl|u| & 2ε1k.
By (5.63), it follows that in this case we have the estimate
|Diff ′1(x)| . 2−2k|Rs|−1/2
This estimate holds for x ∈ 2ε1kRs∩v−1(λsωs)∩F cs and this completes
the proof of the upper bound (5.69) for the first difference.
We consider the second difference Diff2. The term v(x− ~y) occurs
twice in this term, in exp(x − ~y), and in Φ(x − ~y, x − ~y). We will use
the approximation (5.70), and similarly,
Φ(x− ~y, x−~y)− Φ(x− ~y, x)
(x− ~y − uv(x− ~y))− ϕ(2)Rs (x− ~y − uv(x))
(x− ~y − uv(x)− uDv(x)~y)
− ϕ(2)Rs (x− ~y − uv(x)) +O(ann |u||y|
= ∆Φ(x, ~y) +O(ann |u||~y|α)
The Big–Oh term gives us, upon integration in u and ~y, a term that is
no more than
. |Rs|−1/2
2−αk . |Rs|−1/2(scl2(α−1)k)−1.
This is as required by (5.69).
We are left with estimating
Diff ′2(x) :=
e−2πiuc(ωs)·(v(x)−Dv(x)·~y)∆Φ(x, ~y)w(du, d~y).
By (5.64), the integral in y is zero if both of these conditions hold.
|uc(ωs)Dv(x) · e| < 2k,
|[uc(ωs)Dv(x)− ξ − uξDv(x)] · e| < 2k, ξ ∈ supp(ϕ̂(2)Rs )
Both of these conditions are phrased in terms of the derivative which
is controlled as x 6∈ Fs. In fact, the first condition already occurred in
the first case, and it is satisfied if
scl|u| . 2ε1k.
Recalling the conditions (5.66), the second condition is also satisfied
for the same set of values for u. The application of (5.63) then yields a
very small bound after integrating |u| & 2ε1kscl−1. This completes the
proof our technical Lemma. �
REFERENCES 87
References
[1] Angeles Alfonseca, Fernando Soria, and Ana Vargas, A remark on maximal
operators along directions in R2, Math. Res. Lett. 10 (2003), no. 1, 41–
49.MR1960122 (2004j:42010) ↑8
[2] Angeles Alfonseca, Strong type inequalities and an almost-orthogonality princi-
ple for families of maximal operators along directions in R2, J. London Math.
Soc. (2) 67 (2003), no. 1, 208–218.MR1942421 (2003j:42015) ↑8
[3] J. Bourgain, A remark on the maximal function associated to an analytic vec-
tor field, Analysis at Urbana, Vol. I (Urbana, IL, 1986–1987), 1989, pp. 111–
132.MR 90h:42028 ↑4, 6, 23, 24
[4] Camil Muscalu, Terence Tao, and Christoph Thiele, Uniform estimates on
paraproducts, J. Anal. Math. 87 (2002), 369–384. Dedicated to the memory of
Thomas H. Wolff. MR 1945289 (2004a:42023) ↑42
[5] Anthony Carbery, Andreas Seeger, Stephen Wainger, and James Wright,
Classes of singular integral operators along variable lines, J. Geom. Anal. 9
(1999), no. 4, 583–605.MR 2001g:42026 ↑6, 28
[6] Lennart Carleson, On convergence and growth of partial sumas of Fourier se-
ries, Acta Math. 116 (1966), 135–157.MR 33 #7774 ↑vii, 2
[7] Michael Christ, Alexander Nagel, Elias M. Stein, and Stephen Wainger, Sin-
gular and maximal Radon transforms: analysis and geometry, Ann. of Math.
(2) 150 (1999), no. 2, 489–577.MR 2000j:42023 ↑6
[8] A. Córdoba and R. Fefferman, On differentiation of integrals, Proc. Nat. Acad.
Sci. U.S.A. 74 (1977), no. 6, 2211–2213.MR0476977 (57 #16522) ↑7, 11
[9] Charles Fefferman, Pointwise convergence of Fourier series, Ann. of Math. (2)
98 (1973), 551–571.MR 49 #5676 ↑40
[10] , The multiplier problem for the ball, Ann. of Math. (2) 94 (1971), 330–
336.MR 45 #5661 ↑vii
[11] Loukas Grafakos and Xiaochun Li, Uniform bounds for the bilinear Hilbert
transform, I, Ann. of Math. 159 (2004), 889–933. ↑31, 42
[12] Nets Hawk Katz, Maximal operators over arbitrary sets of directions, Duke
Math. J. 97 (1999), no. 1, 67–79.MR 2000a:42036 ↑8
[13] , A partial result on Lipschitz differentiation, Harmonic analysis at
Mount Holyoke (South Hadley, MA, 2001), 2003, pp. 217–224.1 979 942 ↑6
[14] Joonil Kim, Maximal Average Along Variable Lines (2006). ↑28
[15] Michael T. Lacey and Xiaochun Li, Maximal theorems for the directional
Hilbert transform on the plane, Trans. Amer. Math. Soc. 358 (2006), no. 9,
4099–4117 (electronic). MR 2219012 ↑2, 4, 10, 31, 35, 37, 40, 53, 54, 55, 59
[16] Michael T. Lacey and Christoph Thiele, Lp estimates on the bilinear Hilbert
transform for 2 < p < ∞, Ann. of Math. (2) 146 (1997), no. 3, 693–
724.MR1491450 (99b:42014) ↑2
[17] , On Calderón’s conjecture for the bilinear Hilbert transform, Proc.
Natl. Acad. Sci. USA 95 (1998), no. 9, 4828–4830 (electronic). MR 1619285
(99e:42013) ↑2
[18] Michael Lacey and Christoph Thiele, On Calderón’s conjecture, Ann. of Math.
(2) 149 (1999), no. 2, 475–496. MR 1689336 (2000d:42003) ↑2
[19] Michael T. Lacey and Christoph Thiele, Lp estimates for the bilinear Hilbert
transform, Proc. Nat. Acad. Sci. U.S.A. 94 (1997), no. 1, 33–35. MR 1425870
(98e:44001) ↑2
88 5. ALMOST ORTHOGONALITY
[20] Michael Lacey and Christoph Thiele, A proof of boundedness of the Carleson
operator, Math. Res. Lett. 7 (2000), no. 4, 361–370.MR 2001m:42009 ↑2, 10,
40, 43, 53
[21] Camil Muscalu, Terence Tao, and Christoph Thiele, Multi-linear operators
given by singular multipliers, J. Amer. Math. Soc. 15 (2002), no. 2, 469–496
(electronic).MR 2003b:42017 ↑5, 48
[22] Alexander Nagel, Elias M. Stein, and Stephen Wainger, Hilbert transforms
and maximal functions related to variable curves, Harmonic analysis in Eu-
clidean spaces (Proc. Sympos. Pure Math., Williams Coll., Williamstown,
Mass., 1978), Part 1, 1979, pp. 95–98.MR 81a:42027 ↑6, 23
[23] D. H. Phong and Elias M. Stein, Hilbert integrals, singular integrals, and Radon
transforms. II, Invent. Math. 86 (1986), no. 1, 75–113.MR 88i:42028b ↑6
[24] , Hilbert integrals, singular integrals, and Radon transforms. I, Acta
Math. 157 (1986), no. 1-2, 99–157.MR 88i:42028a ↑6
[25] Elias M. Stein, Problems in harmonic analysis related to curvature and oscil-
latory integrals, Proceedings of the International Congress of Mathematicians,
Vol. 1, 2 (Berkeley, Calif., 1986), 1987, pp. 196–221.MR 89d:42028 ↑2, 6
[26] E. M. Stein, Harmonic analysis: real-variable methods, orthogonality, and os-
cillatory integrals, Princeton Mathematical Series, vol. 43, Princeton Univer-
sity Press, Princeton, NJ, 1993. With the assistance of Timothy S. Murphy;
Monographs in Harmonic Analysis, III.MR 95c:42002 ↑vii
[27] Jan-Olov Strömberg,Maximal functions associated to rectangles with uniformly
distributed directions, Ann. Math. (2) 107 (1978), no. 2, 399–402.MR0481883
(58 #1978) ↑2, 8, 19
[28] , Weak estimates on maximal functions with rectangles in certain di-
rections, Ark. Mat. 15 (1977), no. 2, 229–240.MR0487260 (58 #6911) ↑2, 8
	Preface
	Chapter 1. Overview of Principal Results
	Chapter 2. Connections to Besicovitch Set and Carleson's Theorem
	Besicovitch Set
	The Kakeya Maximal Function
	Carleson's Theorem
	The Weak  L 2  Estimate in Theorem 1.15 is Sharp
	Chapter 3. The Lipschitz Kakeya Maximal Function
	The Weak  L 2  Estimate
	An Obstacle to an  Lp estimate, for  1<p<2
	Bourgain's Geometric Condition
	Vector Fields that are a Function of One Variable
	Chapter 4. The  L2 Estimate for Hilbert Transform on Lipschitz Vector Fields
	Definitions and Principle Lemmas
	Truncation and an Alternate Model Sum 
	Proofs of Lemmata 
	Chapter 5. Almost Orthogonality Between Annuli
	Application of the Fourier Localization Lemma
	The Fourier Localization Estimate
	References
ABSTRACT
  Let $ v$ be a smooth vector field on the plane, that is a map from the plane
to the unit circle. We study sufficient conditions for the boundedness of the
Hilbert transform
  \operatorname H_{v, \epsilon}f(x) := \text{p.v.}\int_{-\epsilon}^ \epsilon
f(x-yv(x)) \frac{dy}y where $ \epsilon $ is a suitably chosen parameter,
determined by the smoothness properties of the vector field. It is a
conjecture, due to E.\thinspace M.\thinspace Stein, that if $ v$ is Lipschitz,
there is a positive $ \epsilon $ for which the transform above is bounded on $
L ^{2}$. Our principal result gives a sufficient condition in terms of the
boundedness of a maximal function associated to $ v$. This sufficient condition
is that this new maximal function be bounded on some $ L ^{p}$, for some $
1<p<2$. We show that the maximal function is bounded from $ L ^{2}$ to weak $ L
^{2}$ for all Lipschitz maximal function. The relationship between our results
and other known sufficient conditions is explored.

<|endoftext|><|startoftext|>
Wide Field Surveys and Astronomical Discovery Space
A.Lawrence
Institute for Astronomy, SUPA∗, University of Edinburgh,
Royal Observatory, Blackford Hill, Edinburgh EH9 3HJ
A review for publication in Astronomy and Geophysics
Feb 27th 2007
Abstract
I review the status of science with wide field surveys. For many decades surveys have been the
backbone of astronomy, and the main engine of discovery, as we have mapped the sky at every
possible wavelength. Surveys are an efficient use of resources. They are important as a fundamental
resource; to map intrinsically large structures; to gain the necessary statistics to address some
problems; and to find very rare objects. I summarise major recent wide field surveys - 2MASS,
SDSS, 2dfGRS, and UKIDSS - and look at examples of the exciting science they have produced,
covering the structure of the Milky Way, the measurement of cosmological parameters, the creation
of a new field studying substellar objects, and the ionisation history of the Universe. I then look
briefly at upcoming projects in the optical-IR survey arena - VISTA, PanSTARRS,WISE, and LSST.
Finally I ask, now we have opened up essentially all wavelength windows, whether the exploration
of survey discovery space is ended. I examine other possible axes of discovery space, and find them
mostly to be too expensive to explore or otherwise unfruitful, with two exceptions : the first is the
time axis, which we have only just begun to explore properly; and the second is the possibility of
neutrino astrophysics.
1 Why are wide field surveys important ?
Some astronomical experiments are direct, in that measurements are made of some piece of sky, and
these measurements are then used for a specific scientific analysis. The essence of a survey however is
that extracting science is a two step process. First we summarise the sky, usually by taking an image
and then running pattern recognition software to produce a catalogue of objects each with a set of
measured parameters. When this summary is made, we can then do the science with the catalogue;
the archive becomes the sky. There are many such archives, distributed around the world in online
structured databases; querying such databases is a growing mode of scientific analysis. This of course is
why survey databases have played such a central role in the worldwide Virtual Observatory initiatives.
Why is this two-step process a good thing to do ? Firstly, it is cost effective, because we can performmany
experiments using the same data. Secondly, surveys are a resource that can support other experiments.
This can mean for example creating samples of objects which are ‘followed up’, i.e. observed in detail,
on other facilities (eg getting spectra of galaxy samples). Conversely, interesting objects discovered
by other experiments can be matched against objects in the standard survey catalogues, so that one
quickly has the optical flux of a new gamma-ray source. (Should this be called follow-down ?). Finally,
surveying the sky can produce surprises. First looks in new corners of parameter space have often
∗Scottish Universities Physics Alliance
http://arxiv.org/abs/0704.0809v1
Wide Field Surveys : A.Lawrence 2
Table 1: Examples of major astronomical surveys from recent decades
Type Survey Examples
Radio 3C, PKS, 4C, FIRST
IR IRAS-PSC, ELAIS, 2MASS, UKIDSS
Optical APM, SuperCOSMOS, SDSS, CFHTLS
X-ray 3U, 2A, HEAO-A, 1-XMM
z-surveys CfA-z, QDOT, 2dFGRS, SDSS-z
discovered completely new populations of objects. Historically, surveys have been the main engine of
discovery for astronomy.
Why are wide angle surveys important, as opposed to the deepest possible pencil beams ? The key point
here is that in Euclidean space, time spent surveying more area increases volume much faster than time
spent going deeper. (The argument that wide angles produce large samples faster breaks down when the
differential source count slope is flatter than 1, which for example occurs for galaxies fainter than about
B∼ 23. Also of course, sometimes, one simply has to go deep, for example to survey at some given large
redshift.) Many astronomical problems need large samples of objects to address them. Sometimes this
is because one wants accurate function estimation – for example to test theories of structure formation,
one wants to estimate the galaxy clustering power spectrum to an accuracy of around 5% in many bins
over a wide range of scale. Sometimes large samples are needed to recover a very weak signal from
noise – for example the net alignment of many random galaxy ellipticities produced by weak lensing
by intervening dark matter. The second reason for maximising volume as quickly as possible is to find
rare objects, such as the hoped for Y dwarfs and z = 7 quasars; to a given depth there might be only a
handful over the whole sky. Finally, some objects of astronomical study simply have intrinsically large
angular scale - for example the Milky Way, the galaxy clustering dipole, or open clusters of stars, which
can be tens of degrees across.
2 Major surveys
Surveys are the core of astronomy. This has always been true of course, from Ptolemy through the New
General Catalogue, to the Carte du Ciel, but it has been certainly been the case in the last few decades.
Table 1 lists some of the best known imaging surveys in each wavelength regime. (I have also included a
few redshift surveys as a distinct set). This is only a selection, and is biased towards my own favourites,
so apologies to those whose own surveys aren’t listed. The point to note is that these names are as
immediately recogniseable to every astronomer as are the names of famous telescopes and satellites -
Palomar, AAT, Ariel-V, etc. The data in these catalogues are of everyday use and have been the source
of many discoveries. Many of the older surveys were classic examples of opening a completely new
window on the Universe - 3C, IRAS, and 3U in particular, though I think it is also fair to include the
CfA redshift survey in this category, as it gave us the first real feel for the three dimensional structure of
the Universe, with bubbles, filaments, and walls. The 1-XMM catalogue is slightly different, in that it
wasn’t planned as a coherent single survey, but is the uniformly processed summation of XMM pointings
over the sky.
Over the last 5-10 years the most important major new surveys have been in the optical-IR - 2MASS,
SDSS, 2dFGRS, and now UKIDSS, which started in 2005. I will summarise each of these briefly in turn.
Some highlight science results are in the next section.
The Two Micron All Sky Survey : 2MASS.
Wide Field Surveys : A.Lawrence 3
Figure 1: All sky distribution of 2MASS catalogues. Point sources are shown as white dots. Extended
sources are coloured according to estimated redshift, based either on known values, or estimated from K
magnitude. Blue are the nearest sources (z < 0.01); green are at moderate distances (0.01 < z < 0.04)
and red are the most distant sources that 2MASS resolves (0.04 < z < 0.1). Taken from Jarrett (2004).
2MASS broke new ground, as it was the first real sky survey at near infa-red wavelengths. At near-
IR wavelengths we see roughly the same Universe as in the visible light regime, but with some key
improvements. Extinction is much less; we can see pretty much clean through the Milky Way, and can
find reddened versions of objects such as quasars. Cooler objects such as brown dwarfs can be found,
with the most extreme objects essentially invisible in standard optical bands. Cleaner galaxy samples
can be constructed, with high redshift objects easier to find. Colour combinations with optical bands
have proved especially good at finding rare objects, such as the new T-dwarf class of brown dwarfs.
2MASS used two dedicated 1.3m telescopes, in Mt Hopkins, Arizona, and CTIO, Chile. Each telescope
was equipped with a three-channel camera, each channel consisting of a 256×256 HgCdTe array, so
that observations could be made simultaneously at J (1.25 microns), H (1.65 microns), and Ks (2.17
microns). One interesting innovation was the use of large pixels, maximising survey speed, requiring
micro-stepping to improve sampling. The survey started in June 1997 June and completed in February
2001. The full data release occurred in March 2003, including both an Atlas of images and a catalogue of
almost half a billion sources. To a point source limit of 10σ, the catalogue depth is J=16 H=15 Ks=14.7,
almost five orders of magnitude deeper than any comparable IR survey. However, for the colours of
many astronomical objects, this is still two orders of magnitude shallower than modern optical surveys.
The core reference for 2MASS is Skrutskie et al. 2006). Further information can be found at the IPAC
(http://www.ipac.caltech.edu/2mass/) and UMASS(http://pegasus.phast.umass.edu/) sites. Data ac-
cess is through the IRSA system at http://irsa.ipac.caltech.edu/ .
The Sloan Digital Sky Survey : SDSS.
The SDSS project has produced a survey of 8,000 square degrees of sky at visible light wavelengths,
approximately two magnitudes deeper than the historic Schmidt surveys, and in addition has carried out
a spectroscopic survey of objects selected from the imaging survey. The project used a dedicated 2.5m
telescope at Apache Point, New Mexico, and a camera covering 1.5 square degrees. Survey operations
used a novel drift scan approach; the 30 CCDs on the camera are arranged in five rows each sensitive
to a separate filter band (u,g,r,i,z); the telescope is parked in a given position and the sky allowed to
drift past. The spectroscopic survey is carried out using a 600-fibre system, on separate nights spliced
into the imaging programme. This then required the data processing pipeline to keep up in almost real
time. Public data access has been announced in a series of staged releases, culminating in June 2006
with DR5, which contains a catalogue of around 200 million objects, and spectra for around a million
http://www.ipac.caltech.edu/2mass/
Wide Field Surveys : A.Lawrence 4
galaxies, quasars, and stars. An extended programme, cunningly called SDSS-II, has now commenced,
and is expected to continue through 2008.
SDSS has been arguably the most successful survey project of recent times, with many hundreds of
scientific papers based directly on its data, and having an impact on a very large range of astronomical
topics - large scale structure, the highest redshift quasars, the structure of the Milky Way, and many
other things besides. This may seem surprising, as visible light sky surveys covering the whole sky have
been available for decades, and available as digitised queryable online databases for some years (eg the
Digitised Sky Survey (DSS : see http://archive.stsci.edu/dss/) or the SuperCOSMOS Science Archive
(SSA : see http://surveys.roe.ac.uk/ssa). There are several reasons for the success of SDSS. The first
reason is of course the spectroscopic database, matched only by 2dF (see below). The second reason
is the wider wavelength range, with filters carefully chosen and calibrated to optimise various kinds of
search. The third reason is the improvement in quality - not only is SDSS a magnitude or two deeper
than the Schmidt surveys, but the seeing is markedly better. The fourth reason, shared by 2MASS,
is the quality of the online interface - well calibrated, reliable, and documented data were available
promptly, and with the ability to do online analysis rather than just downloading data. This has made
it easy for astronomers all over the world to jump in and benefit from SDSS.
The core reference for SDSS is York et al. 2000). Further information can be found at http://www.sdss.org,
which also contains links to data access via SkyServer.
The UKIRT Infrared Deep Sky Survey : UKIDSS.
UKIDSS is the near-infrared equivalent of the SDSS, covering only part of the sky, but many times
deeper than 2MASS. The project has been designed and implemented by a private consortium, but on
behalf of the whole ESO member community, and after a short delay, the world. It uses the Wide Field
Camera (WFCAM) on the UK Infrared Telescope (UKIRT) in Hawaii, and is taking roughly half the
UKIRT time over 2005-2012. WFCAM has an instantaneous field of view of 0.21 sq.deg, much larger
than any previous large facility IR camera. Put together with a 4m telescope, this makes possible an
ambitiuous survey. It is estimated that the effective volume of UKIDSS will be 12 times that of 2MASS,
and the effective amount of information collected 70 times larger. UKIDSS is not a single survey, but a
portfolio of five survey components. Three of these are wide shallow surveys, to K∼ 18−19, and covering
a total of ∼ 7000 sq.deg - the Galactic Plane Survey (GPS); the Galactic Clusters Survey (GCS); and
the high latitude Large Area Survey (LAS). Then there is a Deep Extragalactic Survey (DXS), covering
35 sq.deg to K ∼ 21, and an Ultra Deep Survey (UDS), covering 0.77 sq.deg. to K ∼ 23. In all cases,
there is the maximum possible overlap with other multiwavelength surveys and key areas, such as SDSS,
the Lockman Hole, and the Subaru Deep Field.
The aim of UKIDSS is to provide a public legacy database, but the design was targeted at some specific
goals - for example, to measure the substellar mass function, and its dependence on metallicity; to find
quasars at z = 7; to discover Population II brown dwarfs if they exist; to measure galaxy clustering at
z = 1 and z = 3 with the same accuracy as at z = 0; and to determine the epoch of spheroid formation.
Like SDSS, data are being released in a series of stages. At each stage the data are public to astronomers
in all ESO member states, and world-public eighteen months later. Data are made available through a
queryable interface at the WFCAM Science Archive (WSA : http://surveys.roe.ac.uk/wsa). Three data
releases have occured so far – the “Early Data Release”, and Data Releases One and Two (DR1 and
DR2) which contain approximately 10% of the likely full dataset.
UKIDSS is summarised in Lawrence et al. (2007) , and technical details of the releases are described in
Dye et al. (2006) and Warren et al. (2007).
Redshift surveys : 2MRS/6dFGRS; 2dFGRS and SDSS-z.
Systematic redshift surveys based on galaxy catalogues from imaging surveys were one of the big success
stories of the 1970s–90s, culminating in the all-sky z-survey based on the IRAS galaxies, the PSC-z
(Saunders et al. 2000). The most ambitious surveys to date have however been carried out over the last
five years. The first example is the construction of a complete all-sky redshift survey based on galaxies in
http://archive.stsci.edu/dss/
http://www.sdss.org
http://surveys.roe.ac.uk/wsa
Wide Field Surveys : A.Lawrence 5
the 2MASS Extended Source Catalog (XCS) to a depth of KS=12.2, containing roughly 100,000 galaxies.
In the south, observations are carried out at the UK Schmidt, as part of the 6dfGRS project (Jones et
al. 2004, http://www.aao.gov.au/local/www/6df/); in the North observations are being carried out by
a CfA team at Mt Hopkins, Arizona (see http://cfa-www.harvard.edu/∼huchra/2mass/). The survey
is part way through, but has already been used to measure the dipole anisotropy of the local universe
(Erdodgu et al. 2005).
Two very successful projects have completed redshift surveys of smaller area, but reaching considerably
deeper, containing hundreds of thousands of galaxies. The first, in the Northern sky, is SDSS-z, the
spectroscopic component of SDSS, as described above. The second, in the southern sky, is the 2dF
Galaxy Redshift Survey (2dFGRS; Colless et al. 2001). This was based on galaxies selected from the
APM digitisation of UK Schmidt plates (Maddox et al. 199x), and observed using the Two Degree
Field (2dF) facility at the Anglo-Australian Telescope, which has 400 independent fibres. The 2dFGRS
obtained spectra for 245591 objects, mainly galaxies, brighter than a nominal extinction-corrected mag-
nitude limit of bJ=19.45, covering 1500 square degrees in three regions. The final data release was in
June 2003. More information, and data access, is available at http://www.mso.anu.edu.au/2dFGRS/.
These two surveys have produced a range of science, but have concentrated on making the best possible
measurement of the power spectrum of galaxy clustering, and together with WMAP and supernova
programme results, have produced the definitive estimates of the cosmological parameters, leading to
the current ‘concordance cosmology’.
3 Recent survey science highlights
I have picked out a handful of results from the optical-IR surveys of the last few years, including the
first results from UKIDSS, to illustrate the power of the survey approach.
Panoramic mapping : the structure of the Milky Way.
Two topics which clearly benefit from a map covering 4π sr, and with low extinction, are the structure
of the Milky Way, and the structure of the local extragalactic universe. Figure 1, taken from Jarrett
(2004), illustrates the impact 2MASS has on both these topics, showing both the Point Source Catalog
(mostly stars) and the Extended Source Catalog (mostly galaxies at z<0.1). For the first time, we can
see the Milky Way looking like other external galaxies, with disc, bulge, and dust lane. Some of the
most important scientific results however have come from looking at subsets of the stellar population.
Figure 2, from Majewski et al. (2003), shows the sky distribution of M giants selected from the 2MASS
PSC, a selection which traces very large scale structures while removing the dilution of local objects,
using a few thousand stars out of the catalogue of half a billion. From APM star counts we already knew
of the existence of the Sagittarius dwarf, swallowed by the Milky Way (Ibata, Irwin and Gilmore 1994),
but now we can see its complete structure including an extraordinary 150 degree tidal tail. Its orbital
plane shows no precession, indicating that the Galactic potential within which it moves is spherical. The
Earth is currently close to the debris, which means that some very nearby stars are actually members of
the Sagittarius dwarf system. Interestingly, Sagittarius seems to contribute over 75% of of high latitude
halo M giants, with no evidence for M giant tidal debris from the Magellanic clouds.
SDSS, although not a panoramic survey, has also been very important for Galactic structure and stel-
lar populations, with the five widely spread bands making it possible to derive stellar types and so
photometric parallaxes. Juric et al. (2005) derive such parallaxes for 48 million stars. They fit a com-
bination of oblate halo, thin disk, thick disk, but also find significant ‘localised overdensities’, including
the known Monoceros stream, but also a new enhancement towards Virgo that covers 1000 sq.deg. This
then maybe another dwarf galaxy swallowed by the Milky Way.
Large sample statistics : galaxy clustering and the cosmological parameters.
http://www.aao.gov.au/local/www/6df/
http://www.mso.anu.edu.au/2dFGRS/
Wide Field Surveys : A.Lawrence 6
Figure 2: Smoothed maps of the sky in equatorial coordinates showing the 2MASS point source catalogue
optimally filtered to show the Sagittarius dwarf; southern arc (top), and the Sagittarius dwarf northern
arm (bottom). Two cycles around the sky are plotted to demonstrate the continuity of features. The
top panel uses 11 < Ks < 12 and 1.00 < J − Ks < 1.05. The bottom panel uses 12 < Ks < 13 and
1.05 < J −Ks < 1.15. Taken from Majewski et al. (2003).
Figure 3: Cone diagram showing projected distribution of galaxies in 2dFGRS. Taken from Peacock
(2002).
Wide Field Surveys : A.Lawrence 7
Figure 4: (a) Power spectrum from 2dFGRS, compared to various model predictions. Taken from Per-
cival et al. (2001). (b) Correlation function of Luminous red Galaxies in the SDSS-z sample, showing
the first baryon acoustic oscillation peak. Taken from Eisenstein et al. (2005)
The SDSS-z and 2dFGRS surveys illustrate the power of the survey approach in two ways. First,
significant volume is needed to map out large scale structures and overcome shot noise on the largest
scales. Figure 3 is a cone diagram for all the 2dFGRS galaxies, showing the richness of structure that
is only possible to map out with both a large volume and density. Second, large numbers are needed to
make a good enough estimate of the power spectrum of galaxy clustering. This is illustrated in Fig 4a,
which shows the power spectrum derived from 2dFGRS compared to various model predictions (data
from Percival et al. (2001), figure from Peacock (2002)). To distinguish models with differing matter
density in the interesting range requires accuracy of a few percent over a very wide range of scales;
to have a chance of measuring small scale features predicted by models including a significant baryon
fraction requires many samples across this wide range, with of the order 103 galaxies per bin to achieve
the required accuracy. These wiggles are due to acoustic oscillations in the baryon component of the
universe at early times. In the Percival et al. paper, only a limit could be placed on these oscillations,
but they were statistically detected in the fimal 2dFGRS data (Cole et al. 2005). However, in another
good example of filtering out a tracer sub-sample from a very large sample, the first baryon peak was
much more clearly seen in the correlation function of Luminous Red Galaxies (LRGs) selected from
SDSS-z (Eisenstein et al. 2005; Huetsi 2005; see Fig 4).
2dFGRS and SDSS-z were the first redshift surveys to have large enough scale and depth to overlap
the fluctuation measurements from the CMB, enabling degeneracies in the estimation of cosmological
parameters to be broken, and accuracy to be increased by a factor of several. Several key papers made
joint analyses of the galaxy and CMB datasets (Percival et al. 2002; Efstathiou et al. 2002; Tegmark et
al. 2003; Pope et al. 2004) arriving at broadly consistent answers. We now know what kind of universe
we live in : a geometrically flat universe dominated by vacuum energy (75%), with some kind of cold
dark matter at about 21% and ordinary baryons 4%. The equation of state parameter for the dark
energy has been limited to w < −0.52 (Percival et al. 2002), and the total mass of the neutrinos to
m <1 eV (Tegmark et al. 2003; Elgaroy et al. 2002)
The Deep eXtragalactic Survey (DXS) of UKIDSS will produce a galaxy survey over a volume as large
as that of 2dFGRS or SDSS, but at z = 1. A redshift survey of this sample is a prime target for future
work.
Rare objects : Brown Dwarfs.
Infrared surveys have transformed the study of the substellar regime, blurring our idea of what it means
Wide Field Surveys : A.Lawrence 8
7000 7500 8000 8500 9000 9500 10000 10500
Wavelength
Dashed: LBQS composite
Dotted: Telfer continuum
SDSS J1030+0524
z=6.28
Figure 5: (a) High resolution spectrum of high redshift quasar found in SDSS. Gunn-Peterson troughs
due to Lyα and Lyβ are the black sections from 8500Å to 9000Å and from 7000Å to 7500Å. Taken
from White et al. (2003); original discovery spectrum in Becker et al. (2001). (b) Spectral energy
distributions for a z = 7 quasar and a T-dwarf, compared to filter passbands from SDSS and UKIDSS.
Taken from Lawrence et al. (2007).
to be a star. For many years, until the first discovery of the very faint IR companion of GL 229 (i.e.
GL229B) by Nakajima et al. (1995), the possibility of star-like objects which never ignite nuclear burning
was only a speculation. Within a year of the start of 2MASS, Kirkpatrick et al. (1999) had found 20
brown dwarfs in the field, increasing the number of known brown dwarfs by a factor of four, and had
defined two new stellar spectral types - L and T. (These strange designations were determined by the
fact that various odd stellar types had already used up nearly all the other letters of the alphabet.)
The transition from M to L was defined by the change of key atmospheric spectral features from those
of metal oxides to metal hydrides and neutral metals; the transition from L to T by the appearance of
molecular features such as methane - as seen in solar system planets. The effective temperature for L
dwarfs is in the range T∼1500 – 2000 K, and for T-dwarfs T∼1000 –1500 K. As of the time of writing,
almost 600 brown dwarfs are known. Most of these are L-dwarfs, but almost 60 T-dwarfs have now been
found in a series of 2MASS papers (see Ellis et al. 2005 and references therein).
The much deeper UKIDSS search is expected to make significant further advances in two ways. The
first is by pushing to ever cooler and fainter objects, hopefully finding examples of a putative new stellar
class labelled ‘Y dwarfs’ (the last useable letter left ... see Hewett et al. 2006), finding T-dwarfs further
than 10pc, and plausibly finding Population II brown dwarfs if they exist. The second advance expected
from UKIDSS is the determination of the substellar mass function, through the Galactic Clusters Survey
(GCS), and testing whether it is universal or not. These hopes are already being borne out by early
UKIDSS results; Warren et al. (2007b) report the discovery of the coolest known star, classified as T8.5;
and in early results from the GCS programme, Lodieu et al. (2007) have found 129 new brown dwarfs in
Upper Sco, a significant fraction of all known brown dwarfs, including a dozen below 20 Jupiter masses,
finding the mass function in the range 0.3 – 0.01 solar masses to have a slope of index α = 0.6 ± 0.1.
Rare objects : the ionisation history of the Universe.
An excellent example of the ‘needle in a haystack’ search is looking for very high redshift quasars.
Only the most extremely luminous quasars are detectable at these distances, but the space density of
such objects is very low; even in a survey with thousands of square degrees there may be only a few
present. Luminous and high redshift quasars are interesting for a variety of reasons, but a key target
for four decades has been their use as beacons to detect the re-ionisation of the inter-galactic medium.
The baryon content of the early universe must have become neutral as it cooled down, but something
subsequently re-ionised it, as attempts to find the expected ‘Gunn-Peterson trough’ (Gunn and Peterson
1965) in the spectra of high redshift quasars had failed for many years. This finally changed in 2001 as
SDSS broke the z = 6 quasar redshift barrier (Fan et al. 2001) and Becker et al. (2001) made the first
Wide Field Surveys : A.Lawrence 9
detection of a Gunn-Peterson trough at z = 6.28. Figure 5 shows the improved spectrum from White
et al. (2003).
Unfortunately this exciting result seemed to conflict with the CMB measurements by the WMAP year-1
data. The degree of scattering required implied that ionisation had already taken place by z=11 – 30
(Kogut et al. 2003). Rather than being seen as a contradiction, it seems likely that that re-ionisation was
not a single sharp-edged event, but an extended and very likely complex affair, perhaps with multiple
stages and even spatial inhomogeneity (see White et al. 2003). This opens an entire new field of
investigation for understanding the history of the early universe. Rather than a single object locating
the transition edge, it is now important to find as many beacons as possible at z>6, and to find some
beacons in the range z = 7–8. This is one of the key aims of UKIDSS, in combination with SDSS data,
looking for z-dropouts. A problem however is that JHK colours of high-z quasars and T-dwarfs become
very similar. For this reason, UKIDSS is using a Y-band filter centred at 1.0 µm. Figure 6 illustrates
the point, comparing the spectrum of a quasar redshifted to z = 7 with that of a T-dwarf brown dwarf.
4 Next steps in optical-IR surveys
Three key optical-IR survey projects are to begin soon (VISTA, PanSTARRS, and WISE), with the
ultimate in wide-field surveys (LSST) now in the planning stage. Here I briefly summarise each of these.
VISTA.
The Visible and Infrared Survey Telescope for Astronomy (VISTA) is a 4m aperture dedicated survey
telescope on Paranal in Chile. It was originally a UK project, aimed ta bothe optical and IR surveys,
but became an IR-only ESO telescope during the accession of the UK to ESO. The infrared camera
operates at Z, Y , J , H , and KS , and contains 16 arrays each of which has 2048×2048 0.33” pixels,
covering 0.6 sq.deg. in each shot. VISTA therefore operates in the same parameter space as UKIDSS,
but will survey three times faster, and furthermore, 100% of the time is dedicated to IR surveys. The
majority (75%) of the telescope time is reserved for large public surveys. At the time of writing, these
surveys are in the process of final approval, but are likely to include a complete hemisphere survey to
K=18.5, surveys of the Galactic Bulge and the Milky Way, a thousand sq.deg. survey to K=19.5, a 30
sq.deg. survey to K=21.5, and a 1 sq.deg. survey to K=23. VISTA is expected to begin operations in
late 2007. The VISTA web page is at http://www.vista.ac.uk/, and a recent reference is McPherson et
al. (2006).
PanSTARRS.
The power of a survey facility is measured by its étendue, the product of collecting area times field
of view. The cost of a telescope, and the difficulty of producing very wide fields, scales steeply with
telescope aperture. The idea behind the ‘Panoramic Survey Telescope and Rapid Response System’
(PanSTARRS), a University of Hawaii project, is to produce the maximum étendue per unit cost by
building several co-operating wide field telescopes of moderate size. The design has four 1.8m telescopes
each with a mosaic array of 64×64 CCD chips covering 7 sq.deg, which will produce an étendue an order
of magnitude larger than the SDSS facility. As well as enabling one to produce deep surveys faster,
this makes it plausible to cover very large areas of sky repeatedly - thousands of square degrees per
night. The prime aim of PanSTARRS is to detect potentially hazardous NEOs, but it will also be used
for stellar transits, microlensing studies, and locating distant supernovae to constrain the dark energy
problem. The accumulated sky survey will be many times deeper than SDSS, and the expected image
quality and stability from Hawaii should allow the best ever mapping of dark matter via weak lensing
distortions.
A prototype single PanSTARRS system (‘PS1’) has recently been built and is being commissioned at
the time of writing. The operation and science analysis for PS1 involves an extended ‘PS1 Science
Consortium’ with additional partners, from the US, Uk and Germany. Over three years, it is expected
http://www.vista.ac.uk/
Wide Field Surveys : A.Lawrence 10
 13  13.5  14  14.5  15
Log(ν) Hz
Sky survey comparisons : Extragalactic
PS1UHSWISE
LRG z=2
quasar z=2
quasar z=7
 13  13.5  14  14.5  15
Log(ν) Hz
Sky survey comparisons : Brown dwarfs
PS1UHSWISE
T=1000K
D=30pc
M=10MJ, T=5Gyr
D=1pc
Figure 6: Spectral energy distributions of various objects compared to 5σ sensitivities of key sky surveys.
Green triangles are JHK sensitivities for a proposed extension to the UKIDSS Large Area Survey, the
UKIRT Hemisphere SUrvey (UHS). Blue circles are for the WISE mission, from Mainzer et al 2005.
Red circles (PS1) are for the PanSTARRS-1 3π survey, taken from project documentation. The left hand
frame compares extragalactic objects - a giant elliptical at z = 2; the mean quasar continuum SED from
Elvis et al 1994 redshifted to z = 2; and a high redshift quasar spectrum redshifted to z = 7. The right
hand frame compares two model brown dwarf spectra, from Burrows et al (2003, 2006). The red line
(lower curve at low frequency) is for an object with effective temperature of 1000K and surface gravity
of 4.5, placed at a distance of 50pc. The black line (upper curve at low frequency) is for an object with
mass of 10 Jupiter masses and age 5 Gyr, placed at a distance of 1pc.
to produce a 3π steradian survey at grizy to z=23 with 12 visits, a Medium Deep Survey visiting 12 7
sq.deg. fields with a 4 day cadence, building up a survey to z=26, and special stellar transit campaigns
and microlensing monitoring of M31. The data will become public at the end of this science programme.
Information about PanSTARRS can be found at http://pan-starrs.ifa.hawaii.edu/public/
WISE.
The Wide Field Infrared Survey Explorer (WISE) is a NASA MIDEX mission scheduled for launch in
2009 that will fill the gap between UKIDSS/VISTA in the near-IR and IRAS and Akari in the far-IR,
surveying the sky in four bands simultaneously (3.3, 4.7, 12, and 23µm). The sky survey at 3 and 5µm
is completely new territory; as 12 and 23µm WISE covers the same territory as IRAS but will be a
thousand times deeper. WISE carries a 40cm cooled telescope with a 47 arcmin field of view. It is
designed to have a relatively short lifetime - 7 months - but in this time will make a mid-infrared survey
of the entire sky in all four bands. WISE will produce significant advances in a number of areas, but
especially for objects expected to have temperatures in the hundreds of degrees - the very coolest brown
dwarfs, protoplanetary discs, solar system bodies, and obscured quasars. Information about WISE can
be found at http://wise.ssl.berkeley.edu/ and in Mainzer et al. (2006).
The depths of PS1, UKIDSS-VISTA, and WISE surveys are well matched and will produce a stunning
sky survey dataset over a factor of a hundred in wavelength. This is illustrated in Fig. 6, taken from a
recent proposal to extend UKIDSS to a complete hemisphere survey.
LSST.
The Large Synoptic Survey Telescope (LSST) aims at the maximum possible étendue, aiming at the
same kind of science as PanSTARRS - hazardous NEOs, GRBs, supernovae, dark matter mapping via
weak lensing - but a factor of several faster; it should be able to produce a survey equivalent to SDSS
every few days. The design has an 8.4m telescope with a 10 sq.deg. field of view. The planned standard
mode of use is to take 15 second exposures, and keep moving, covering the whole sky visible from LSST
http://pan-starrs.ifa.hawaii.edu/public/
http://wise.ssl.berkeley.edu/
Wide Field Surveys : A.Lawrence 11
in bands ugrizy once every three days. This produces 15TB of imaging data every night. The aim is
to keep up with this flow in quasi-real time, producing alerts for transient objects within minutes. This
requires approximately 60 TFlops of processing power - a huge amount today, but following Moore’s
Law, very likely to be equivalent to merely the 500th most powerful computer in the world by 2012..
The LSST data management plan has a hierarchy of archive and data centres, reminiscent of the LHC
Grid, with the primary mission facility acting like a ‘beamline’, where a variety of research groups can
rent space for their own experiments on the data flowing past. The LSST site has now been chosen
(Cerro Pachon in Chile), but the project is not yet fully funded. More information can be found at
http://www.lsst.org
5 The end of survey discoveries ?
In 1950, the universe seemed to consist of stars, and a sprinkling of dust. Over the last fifty years, the
actual diverse and bizarre contents of the universe have been successively revealed as we surveyed the sky
at a series of new wavelengths. Radio astronomy has shown us radio galaxies and pulsars; microwave
observations have given us molecular clouds and the Big Bang fossil background; IR astronomy has
shown us ultraluminous starburst galaxies and brown dwarfs; X-ray astronomy has given us collapsed
object binaries and the intra-cluster medium; and submm astronomy has shown us debris disks and
the epoch of galaxy formation. As well as revealing strange new objects, these surveys revealed new
states of matter (relativistic plasma, degenerate matter, black holes) and new physical processes (bipolar
ejection, matter-antimatter annihilation). Having opened up gamma-rays and the submm with GRO
and SCUBA, there are no new wavelength windows left. Has this amazing journey of discovery now
finished ?
Wavelength is not the only possible axis of survey discovery space. Let us step through some other axes
and examine their possibilities. In doing this, we will to some extent go over ground already trodden by
Harwit (2003), but with a particular emphasis on surveys rather than discovery space in general, and
with an eye to what is economically plausible.
Photon Flux. Historically, going ever deeper has been as productive as opening new wavelength
windows, the classic example of course being the existence of the entire extragalactic universe, which
did not become apparent until reaching ten thousand times fainter than naked eye observations, requiring
both large telescopes and the ability to integrate. We can now see things ten billion times fainter than the
naked eye stars. However, we have reached the era of diminishing returns. The flux reached by a telescope
is inversely proportional to diameter D but its cost is proportional to D3. Significant improvements can
now only be achieved with world-scale facilities, and orders of magnitude improvements are unthinkable.
The easy wins have been covered already - our detectors now achieve close to 100% quantum efficiency;
we have gone into space and reduced sky background to a minimum; and multi-night integrations have
been used many times. We will keep building bigger telescopes, but it no longer seems the fast track to
discovery.
Spectral resolution. Detailed spectroscopy of individual objects is of course the key technique of
modern astrophysics. Spectroscopic surveys of samples drawn from imaging surveys have been carried
out at many wavelengths, and have been particularly important for measuring redshift and so mapping
the Universe in 3D; we were not expecting the voids, bubbles and walls that we found in the galaxy
distribution in the 1980s. This industry will continue, but there is no obvious new barrier to break.
Narrow band imaging surveys centred on specific atomic or molecular features (21cm HI, CO, Hα) have
been fruitful, but again its not obvious there is anywhere new to go.
Polarization. Polarisation measurements of individual objects are a very important physical diagnostic,
but are polarisation surveys plausible ? Surveys of samples of known objects to the 0.1% level have
been done, with interesting results but no big surprises. Perhaps blank field imaging surveys in four
Stokes parameters would turn up unexpected highly polarised objects ? This has essentially been done
http://www.lsst.org
Wide Field Surveys : A.Lawrence 12
in radio astronomy but not at other wavelengths.
Spatial resolution. This is the dominant big-project target of the next few decades, and of course is
the real point of Extremely Large Telescopes. Put together with multi-conjugate Adaptive Optics, we
hope to achieve both depth and milli-arcsec resolution at the same time. However, the royal road to
high spatial resolution is through interferometry. Surveys with radio interferometers in the twentieth
century showed the existence of masers in space, and bulk relativistic outflow. In the twenty first
century we will be doing microwave interferometry on the ground (ALMA) and IR interferometry in
space (TPF/DARWIN), hoping to directly detect Earth-like planets around nearby stars. So there is
excitement for at least some time to come; however, as with photon flux, we are hitting an economic
brick wall. Significantly bigger and better experiments will be a very long time coming.
Time. The observation of temporal changes has repeatedly brought about revolutionary changes in
astronomy, the classic examples being Tycho’s supernova, and the measurement of parallax. The last
two decades has seen a renaissance in this area, with an impressive number of important discoveries
from relatively cheap monitoring experiments - the discovery of extrasolar planets from velocity wobbles
and transits; the discovery of the accelerating universe and dark energy from supernova campaigns; the
location of substellar objects from survey proper motions; the existence of Trans-Neptunian Objects,
and Near Earth Objects; the final pinning down of gamma-ray burst counterparts; and the limits on
dark matter candidates from micro-lensing events. The next decade or two will see more ambitious pho-
tometric monitoring experiments, such as PanSTARRS and LSST, and a series of astrometric missions,
culminating in GAIA, which will see external galaxies rotating. Overall, the ‘time window’ is well and
truly opened up. However, the temporal frequency axis is far from fully explored. My instinct is that
this technique will continue to produce surprises for some time.
Non-light channels : particles. Cosmic ray studies have been important for many decades, but
you can’t really do surveys - indeed the central mystery has alway been where they come from. Dark
matter experiments are confronting what is arguably the most important problem in physics, let alone
astrophysics, but again no survey is plausible. The big hope is neutrino astrophysics. Neutrinos should
emerge from deep in the most fascinating places that we could otherwise never see - supernova cores, the
centres of stars, the interior of quasar accretion discs. Measurement of solar neutrinos has solved a long
standing problem, and set a challenge for particle physics - but what about the rest of the Universe ?
New experiments such as ANTARES (under the sea) and AMANDA (under the ice) seem to be clearly
detecting cosmic neutrinos, but no distinct sources have yet emerged. Possibly the next generation
(ICECUBE) will get there. This looks like the best bet for genuinely unexpected discoveries in the
twenty first century.
Non-light channels : gravitational waves. Like neutrinos, we know that gravitational waves have
to be there somewhere, and their existence has been indirectly proved by the famous binary pulsar
timing experiment. However after many years of exquisite technical development, we still have no
direct detection of a gravitational wave. The space interferometer mission LISA should finally detect
gravitational waves, unless current predictions are badly wrong. However even LISA will not produce a
genuine survey. We will detect many events and understand more astrophysics, but will have essentially
no idea where they came from, except that hopefully some will correlate with Gamma-ray bursts. If we
see totally unexpected signals, it will be very hard to know what to do next.
Hyper-space planes : the Virtual Observatory. As we explore the various possible axes one by
one, many if not most of them are running out of steam, or too expensive to pursue. But we are a
long way short of exploring the whole space - for example narrow line imaging in all Stokes parameters
versus time. This exploration does not necessarily need complex new experiments. More survey-quality
datasets come on line every year. As formats, access and query protocols, and analysis tool interaction
protocols all get standardised, the virtual universe becomes easier for the e-astronomer to explore, and
unexpected results will emerge. This, of course, is the agenda of the worldwide Virtual Observatory
initiative.
Wide Field Surveys : A.Lawrence 13
6 Conclusions
Surveys are perhaps the most cost effective and productive way of doing astronomy. In recent years
optical, infra-red, and redshift surveys have produced spectacular results in determining cosmological
parameters, finding the smallest stellar objects, decoding the history of the Milky Way, and much else
besides. Surveys underway now and over the next few years should also produce impressive science.
Having been the main engine of discovery for decades, there is a worry now that we have already explored
every axis of discovery space. The best hopes for unexpected discoveries may be in massive time domain
surveys, in neutrino astrophysics, and in exploring the full multi-dimensional space through the Virtual
Observatory.
Wide Field Surveys : A.Lawrence 14
7 REFERENCES
Becker, R.H. Fan, X., White, R.L., 2001, AJ, 122, 2850.
Burrows, A., Sudarsky, D., Lurine, J.I., 2003 ApJ, 596, 587.
Burrows, A., Sudarsky, D., Hubeny, I., 2006 ApJ, 640, 1063.
Cole, S., Percival, W.J., Peacock, J.A. et al. 2005, MNRAS, 363 505.
Colless M.M., Dalton G.B., Maddox S.J., et al. 2001, MNRAS, 328, 1039.
Dye, S., Warren, S.J., Hambly, N.C., et al. 2006, MNRAS, 372, 1227.
Eisenstein, D.J., Zehavi, I., Hogg, D.W., et al. 2005, ApJ, 633, 560.
Efstathiou G., Moody S., Baugh C., et al. 2002, MNRAS, 330, 29.
Elgary, ., Lahav, O., Percival, W.J., et al. 2002, Phys.Rev.Lett., 89, 1301.
Ellis, S.C., Tinney, C.G., Burgasser, A.J., Kirkpatrick, J.D., McElwain, M.W., 2005, AJ, 130, 2347.
Elvis, M, Wilkes, B.J., McDowell, J.C., Green, R.F., Bechtold, J., Willner, S.P., Oey, M.S., Polomski,
E. Cutri, R., 1994, ApJSupp, 95, 1.
Erdogdu, P., Huchra, J.P., Lahav, O., et al. 2006, MNRAS, 368, 1515.
Fan, X., Narayanan, V.K., Lupton, R.H., et al. 2001, AJ, 122, 2833.
Gunn, J.E., Peterson, B.A., 1965, ApJ, 142, 1633.
Harwit, M., 2003, Physics Today, November 2003, 38.
Hewett, P.C., Warren, S.J., Leggett, S.K., Hodgkin, S.L., 2006 MNRAS, 367, 454.
Huetsi, G., 2005, A&A, 449, 891.
Ibata, R.A., Gilmore, G., Irwin, M.I. 1994 Nature, 370, 194.
Jarrett, T.H. 2004, PASA, 21, 396.
Jones, D.H., Saunders, W., Colless, M., et al. 2004, MNRAS, 355, 747.
Juric, M., Ivezic, Z., Brooks, A., et al. 2005, ApJ submitted (astro-ph/0510520)
Kirkpatrick, J.D., Reid, I.N., Liebert, J., Cutri, R.M., Nelson, B., Beichman, C.A., Dahn, C.C., Monet,
D.G., Gizis, J.E., Skrutskie, M.F., 1999, ApJ, 519, 802.
Kogut, A., Spergel, D.N., Barnes, C., et al. 2003 ApJSupp, 148, 161.
Lawrence, A., Warren, S.J., Almaini, O., et al. 2006 MNRAS submitted (astro-ph/0604426)
Lodieu, N., Hambly, N.C., Jameson, R.F., Hodgkin, S.T., Carraro, G., Kendall, T.R., 2007, MNRAS,
374, 372.
Maddox, S.J., Sutherland, W.J., Efstathiou, G., Loveday, J., 1990, MNRAS, 243, 692.
http://arxiv.org/abs/astro-ph/0510520
http://arxiv.org/abs/astro-ph/0604426
Wide Field Surveys : A.Lawrence 15
Mainzer, A.K., Eisenhardt, P., Wright, E.L., Liu, F-C., Irace, W., Heinrichsen, I., Cutri, R., Duval, V.,
2006, Proc SPIE, 6256, 61.
Majewski, S.R., Skrutskie, M.F., Weinberg, M.D., Ostheimer, J.C. 2003, 599, 1082.
McPherson, A.M., Born, A., Sutherland, S., Emerson, J., Little, B., Jeffers, P., Stewart, M., Murray,
J., Ward, K., 2006, Proc SPIE 6267, 7.
Nakajima, T., Oppenheimer, B.R., Kulkarni, S.R., Golimowski, D.A., Matthews, K., Durrance, S.T.,
1995 Nature 378 463.
Percival W.J., Baugh C.M., Bland-Hawthorn J., et al. 2001, MNRAS, 327, 1297.
Percival W.J., Sutherland W.J., Peacock J.A., et al. 2002, MNRAS 337, 1068
Peacock J.A., 2002, ASP Conf Series, 283, 19.
Pope, Adrian C.; Matsubara, Takahiko; Szalay, Alexander S., et al. 2004, ApJ, 607, 655.
Saunders, W., Sutherland, W.J., Maddox, S.J., et al. 2000, MNRAS, 317,55.
Skrutskie M.F., Cutri R.M., Stiening, et al. 2006, AJ, 131, 1163.
Tegmark, M., Blanton, M., Strauss, M., et al. 2004, ApJ, 606, 702.
Warren, S.J., Hambly, N.C., Dye, S., et al. 2007a, MNRAS in press (astro-ph/0610191)
Warren, S.J., Mortlock, D.J., Legget, S.K., et al. 2007b, MNRAS submitted
White, R.L., Becker, R.H., Fan, X., Strauss, M.A., 2003 AJ, 126, 1.
York, D.G., Adelman, J., Anderson, J.E., et al. 2000, AJ, 120, 1579
http://arxiv.org/abs/astro-ph/0610191
	Why are wide field surveys important ?
	Major surveys
	Recent survey science highlights
	Next steps in optical-IR surveys
	The end of survey discoveries ?
	Conclusions
	REFERENCES
ABSTRACT
  I review the status of science with wide field surveys. For many decades
surveys have been the backbone of astronomy, and the main engine of discovery,
as we have mapped the sky at every possible wavelength. Surveys are an
efficient use of resources. They are important as a fundamental resource; to
map intrinsically large structures; to gain the necessary statistics to address
some problems; and to find very rare objects. I summarise major recent wide
field surveys - 2MASS, SDSS, 2dfGRS, and UKIDSS - and look at examples of the
exciting science they have produced, covering the structure of the Milky Way,
the measurement of cosmological parameters, the creation of a new field
studying substellar objects, and the ionisation history of the Universe. I then
look briefly at upcoming projects in the optical-IR survey arena - VISTA,
PanSTARRS, WISE, and LSST. Finally I ask, now we have opened up essentially all
wavelength windows, whether the exploration of survey discovery space is ended.
I examine other possible axes of discovery space, and find them mostly to be
too expensive to explore or otherwise unfruitful, with two exceptions : the
first is the time axis, which we have only just begun to explore properly; and
the second is the possibility of neutrino astrophysics.

<|endoftext|><|startoftext|>
Introduction
Measurement of polarization anisotropies in the Cosmic Microwave Back-
ground (CMB) is one of the great challenges in cosmology today. Very sensitive
measurements of these anisotropies, particularly at large angular scales, will
Preprint submitted to Elsevier Science 30 October 2018
http://arxiv.org/abs/0704.0810v2
provide unique constraints on the influence of gravitational waves on the pro-
duction of structure in the very early Universe and information on the epoch
of reionization.
Several experiments are running or in the planning stages, and long term devel-
opment for a future space mission attacking CMB polarization is underway. To
date, nearly all of the effort has been directed towards maximizing the number
of detectors in the focal plane to achieve the required sensitivity. Relatively
little work is going into sub-orbital efforts to constrain polarization fluctua-
tions at the largest angular scales, those most interesting for their impact on
understanding the inflationary epoch and ionization history of the universe.
This is primarily because of an unproven perception that very low multipoles
will not be accessible to any but space-based missions. Indeed, large scale
polarization has been searched for with ground based experiments over the
last 30 years. The COsmic Foreground Explorer (COFE) is a balloon-borne
instrument to measure the low frequency and low-ℓ characteristics of some
dominant polarized foregrounds. Good understanding of these foregrounds is
critical both for interpreting recent results, e.g. Spergel et al. (2006), and for
appropriately planning future CMB missions. The experiment also explores
low-ℓ limits to CMB polarization measurements at moderate frequencies from
non-space based platforms. We believe that balloon and ground-based mea-
surements to characterize in detail the polarized microwave sky are essential
to prepare a future space mission dedicated to CMB B-modes.
2 Science
The CMB radiation field is an observable that provides direct information
from the early Universe. The temperature and polarization characteristics of
this field impose constraints on cosmological scenarios relevant to understand
the origin and the structure of the Universe. Accurate measurements of the
CMB are vital to improve our understanding about geometry, mass-energy
composition, and reionization of the Universe. Ultimately, the CMB could
also provide indirect detection of a stochastic gravitational background and
information from the inflationary epoch itself. Having this big picture in mind,
several CMB experiments are now trying to constrain the tensor-to-scalar ratio
value and to detect the B-mode signature.
Among all practical limitations to primordial tensor amplitude detection, con-
tamination due diffuse microwave foreground polarized emission is certainly
the fundamental one. This emission presents spatial and frequency variations
that are not well known, and the residuals from foreground subtraction are
restricting our knowledge of CMB polarization. This is particularly true for fu-
ture B-mode experiments that will benefit if accurate determinations of spatial
and spectral characteristics of polarized foreground are made. For this reason,
multifrequency measurements of the polarized foregrounds in the microwaves
is now recognized as a key objective within the CMB community.
At low frequencies, foregrounds include synchrotron, free-free, and possible
spinning dust emission. Synchrotron dominates the low frequency range of the
microwave sky. Its emission is caused by relativistic charged particles interact-
ing with the Galactic magnetic field and can be highly polarized. Synchrotron
measurements provide better understanding of the Galactic magnetic field
structure and the density of relativistic electrons across the Galaxy. Free-free
emission becomes more important in the microwave intermediate frequency
range, and it is due to electron-ion scattering. Free-free is expected to be un-
polarized but this might not be true at the edges of HII clouds. Electrical
dipole emission from spinning dust has also been suggested by recent obser-
vations at low microwave frequencies, e.g. Finkbeiner et al. (2004).
COFE is a balloon-borne microwave polarimeter to measure spatial and low-
frequency characteristics of diffuse polarized foregrounds. This is an important
effort toward characterizing the polarized foregrounds for future CMB experi-
ments, in particular the ones that aim to detect primordial gravitational wave
signatures in the CMB polarization angular power spectrum.
3 Instrumentation
3.1 Telescope
Amodified BEAST telescope design is the basis for the COFE optics (Childers et al.,
2005; Figueiredo et al., 2005; Meinhold, P. R. et al., 2005; Mej́ıa, J. et al., 2005;
O’Dwyer, I. J. et al., 2005). It consists of an off-axis Gregorian configuration
obeying the DragoneMizuguchi condition (Dragone, 1978; Mizuguchi et al.,
1978). The telescope is optimized for minimal cross-polarization contamina-
tion and maximum focal plane area. The primary reflector is a 2.2 m off-axis
parabolic reflector. The incoming radiation is reflected off of the primary re-
flector towards a polarization modulating wave plate then to the secondary
reflector. The 0.9 m ellipsoidal secondary reflects the incoming radiation to-
ward the array of scalar feed horns that couple the radiation to an array of
cryogenic low noise amplifiers. The telescope will be mounted in a gondola that
has been simplified from a standard balloon-borne design due to the very light
carbon fiber optical elements. A schematic of the optics is shown in Figure 1.
3.2 Polarization modulator
COFE will employ a low-loss reflective polarization modulator for measur-
ing both Q and U simultaneously. It consists of a linear polarizing wire grid
mounted in front of a reflecting plate. The wire grid decomposes the input wave
into components, parallel and perpendicular to the wires, reflecting the par-
allel component with low loss. The perpendicular component passes through
the wire grid and reflects off the back short, passes through the grid again
and recombines with the parallel component. The distance between the plate
and the grid introduces a phase shift between the two components, effectively
rotating the plane of polarization of the input wave. A schematic of the polar-
ization modulator is shown in Figure 2. Rotating the grid chops between the
two polarization states four times per revolution as shown in Figure 3.
Tests of this modulator were performed at 41.5 GHz, using a 70 cm telescope.
We measured beam patterns for the rotated polarization states and integrated
for extended periods on the sky in Santa Barbara, CA. We were able to deter-
mine a 1/f knee lower than 50 mHz and very stable long term offsets. We also
demodulated sky data to the two different states and calculated the correct
combined sensitivity, as seen in Figure 4.
The polarization modulator has a broad bandwidth. We achieved 22 dB isola-
tion at 20% bandwidth. The radiometric loss of the elements in the modulator
can easily be made very low (of order 0.11%) up to relatively high frequencies.
The system works for a very wide range of frequency bands.
3.3 Receiver
COFE will use InP MMIC 1 amplifiers integrated into simple total power re-
ceivers. All of the RF gain will be integrated into a small compact module
inside the vacuum chamber. The module will contain 3 to 4 amplifiers (∼ 75
dB of gain), band pass filter, cryogenic detector diode, and an audio ampli-
fier. The module avoids the need for cryo/vacuum waveguide feedthrus on the
dewar simplifying the overall design. The audio amplifiers will be within the
cryostat vacuum vessel for simplicity and noise reasons, but will be at ambient
temperatures. COFE has a modest number of feeds required, and no ortho-
mode transducers or hybrid tees, so the passive components are minimal. A
schematic of the receiver is shown in Figure 5.
1 Indium Phosphide Monolithic Microwave Integrated Circuit
3.4 Data acquisition/demodulation
Data acquisition will use the same technique we have been using in our test
system, namely synchronous sampling of analog integrators. We oversample
the data by a large factor and perform the demodulation of Q and U Stokes
parameters (and other modes for systematic error analysis) in software. This
yields the most information and allows a variety of post-flight tests includ-
ing null signal analysis and analysis of the DC or total power components
(contaminated with 1/f , but still useful for systematic tests).
3.5 Ground-based B-machine prototype
A prototype polarimeter for a B-mode project, named B-machine, is being
deployed at the WMRS 2 Barcroft facility, CA (118◦14′ W longitude, 37◦35′ N
latitude, 3800 m altitude). The WMRS facility is an excellent site for mi-
crowave observation because of a cold microwave zenith temperature, low
precipitable water vapor, and a high percentage of clear days (Marvil et al.,
2006). Many of the components that will be used by the B-machine prototype
are useful for COFE as well. For example, the prototype will allow systematic
checks of the polarization modulator, and COFE scan strategy. The B-machine
prototype will be able to yield some basic higher multipole results on the fore-
grounds as well as the polarization signature and establish a data analysis
pipeline.
The prototype possesses telescope and detector technology identical to COFE.
It has 2 Ka-band and 6 Q-band channels centered at 31 and 41.5 GHz with
FWHM resolution of 28′ and 20′ respectively. The receiver has been previ-
ously used in anisotropy measurements (Childers et al., 2005). The telescope
runs at constant elevation while continuously scanning the sky in azimuth. A
photograph of B-machine prototype is shown in Figure 6.
4 Performance
For any sub-orbital CMB experiment, minimizing atmospheric contamination
is important. For the COFE bands, total atmospheric emission at our target
altitude of 35 km is less than 1 mK. Common broad band bolometric at-
mospheric antenna temperature contributions at balloon altitudes are several
hundred mK or more. Since the effective CMB antenna temperature drops
2 White Mountain Research Station
with frequency, our effective atmospheric signal is approximately 1000 times
less than for a bolometric balloon-borne system. Hence low-ℓ information from
a balloon-borne system is very clean by comparison. Figure 7 shows the atmo-
sphere and predicted foreground emission over a range of frequencies interest-
ing for CMB work (the foreground prediction is calculated from Bennett et al.
(2003)).
4.1 Receiver bands and expected receiver sensitivity
Receiver sensitivity can be estimated according to the radiometer equation
σT = K
Tsys + Tsky
∆ν · τ
, (1)
where σT is the root-mean-square noise, Tsys is the system noise temperature,
Tsky is the sky antenna temperature, ∆ν is the bandwidth, τ is the integration
time, and K is the sensitivity constant of the receiver.
For COFE and B-machine prototype the sensitivity constant of each receiver is
K = π
. The signal is sine wave modulated, reducing the sensitivity by a factor
as compared with a standard Dicke receiver, with an addition factor of
from the the standard definition for Q and U in the Rayleigh-Jeans regime
of the CMB spectrum. Table 1 shows our estimation of the sensitivity of each
receiver.
Table 1 – Instrument parameters.
COFE B-machine
Central frequency (GHz) 10 15 20 31 41.5
FWHM beam (arcmin) 83 55 42 28 20
Tsys (K) 8 10 12 25 27
Tsky (K) at target altitude
3 2.5 2.4 2.3 6.4 13.0
Bandwidth (GHz) 4 4 5 10 7
Number of receivers 3 6 10 2 6
Sensitivity per receiver (µK
s) 261 308 318 493 751
Aggregate sensitivity (µK
s) 151 126 100 348 307
3 For COFE and B-machine (ground based), we compute expected Tsky antenna
temperature at target altitude of 35 km and 3.8 km, respectively.
By increasing the number of receivers, future ground-based or balloon-borne
experiments can significantly improve aggregate sensitivity. For instance, 30
detectors could reach 61 µK
s and 107 µK
s at 30 and 40 GHz, respectively.
4.2 Scan strategy, sky coverage and expected map sensitivity
COFE uses a simple scan strategy to cover the largest available sky area in
each flight. The telescope will be pointed nominally 45◦ from the horizon to
minimize ground and balloon pickup, and the gondola will rotate constantly
at approximately 1/2 rpm. Data acquisition sample rate will be synchronized
with the polarization rotator (at ∼ 30 Hz). For instance, using this strategy,
a 24 hour flight from Fort Sumner, NM, allows to cover 59% of the sky area
with a median aggregate pixel sensitivity of 92 µK/deg2, 77 µK/deg2, and 61
µK/deg2 at 10 GHz, 15 GHz, and 20 GHz respectively. COFE will acquire
data from nearly all of the sky (∼ 93%). This will be achieved in a set of 12
and/or 24 hour flights from the Northern and Southern Hemispheres. Figure
8 provides estimates for sensitivity per square degree pixel over the whole sky
for our flight plans. Figure 9 illustrates the expected sky coverage.
The B-machine prototype focuses on higher multipoles but uses a similar scan-
ning strategy from the ground. For a conservative 60 day observing campaign
at WMRS, we expect to cover 56% of the sky with an median aggregate sensi-
tivity of 27 µK/deg2, and 23 µK/deg2 at 31 GHz and 41.5 GHz, respectively.
5 Conclusion
Over the next few years we will field a balloon-borne telescope to map more
than 90% of the sky. Both polarization anisotropy and polarized foregrounds
will be measured over several bands. This is an important effort toward char-
acterizing the polarized foregrounds for future CMB experiments.
In addition to foreground detection, COFE will better characterize the po-
larization modulation capability for measuring Q and U simultaneously. As
discussed earlier, a large scale ground-based campaign will capitalize on the
technology that has been developed by COFE and B-machine prototype.
It is clear that our current understanding of the polarization foregrounds limits
our ability to make accurate observations of the B-mode signature. COFE will
lessen the effect that incomplete models of foregrounds will have on future
experiments.
6 Acknowledgments
We acknowledge support from the National Aeronautics and Space Admin-
istration (NASA), and the California Space Institute (CalSpace). T.V. and
C.A.W. acknowledge CNPq Grants 305219/2004-9 and 307433/2004-8, respec-
tively. Some of the results have been derived using the HEALPix 4 (Górski et al.,
2005) package.
References
Bennett, C. L. et al. 2003, ApJS, 148, 97
Childers, J. et al. 2005, ApJS, 158, 124
Dragone, C., 1978, The Bell System Technical Journal, 57, 7, 2663
Figueiredo, N. et al. 2005, ApJS, 158, 118
Finkbeiner, D. P. et al. 2004, ApJ, 617, 350
Górski, K. M. et al. 2005, ApJ, 622, 759
Marvil, J. et al. 2006, New Astronomy, 11, 218
Meinhold, P. R. et al. 2005, ApJS, 158, 101
Mej́ıa, J. et al. 2005, ApJS, 158, 109
Mizuguchi, Y., Akagawa, M., and Yokoi, H., 1978, Electronics and Communi-
cations In Japan, 61, 58
O’Dwyer, I. J. et al. 2005, ApJS, 158, 93
Spergel, D. N. et al. 2006, astro-ph/0603449
4 http://healpix.jpl.nasa.gov
http://arxiv.org/abs/astro-ph/0603449
Fig. 1. Optical schematic for COFE and B-machine prototype telescopes, an off-axis
Gregorian configuration optimized for minimal cross-polarization contamination. A
2.2 m parabolic reflector primary, a 0.9 m ellipsoidal secondary, and a 0.3 m rotator
grid are shown.
Fig. 2. Schematic of the polarization modulator. The input wave is decomposed
into its two linear polarization states, parallel and perpendicular to the wires (rep-
resented by dots just above the conducting reflector). The perpendicular component
is phase shifted from the extra path length. When added back to the parallel com-
ponent, the plane of polarization of the input wave is rotated.
Fig. 3. Sample signal from a polarized thermal source. A single revolution of the
modulator is shown, along with the reference signal to be used for demodulation.
Commutating using this signal yields Q, for instance, while demodulating with a
reference phase shifted by π/4 gives U .
Fig. 4. Sample data from our room temperature radiometer viewing the sky at 41.5
GHz. The undemodulated PSD displays the 1/f knee of the HEMT radiometer of
10 Hz and a white noise of 5.4 mK
s. The demodulated data have no visible 1/f
and a white noise level consistent with expectation.
Fig. 5. Radiometer layout for COFE.
Fig. 6. Picture of prototype telescope to be deployed at WMRS.
Fig. 7. Atmosphere, CMB, and predicted foreground emission from 5 to 300 GHz.
COFE bands run from 10 to 20 GHz. The zenith atmosphere emission is shown at
3.8 and 35 km. The atmospheric emission and lines are mainly due to H2O, O2,
and O3. For the target altitude of 35 km, we expect well under 1 mK total emission
from the atmosphere. Foreground spectral index β for free-free, synchrotron, and
dust were assumed, respectively, as −2.15, −2.7, and 2.2.
Fig. 8. Integrated histogram of anticipated aggregate sensitivity per 1 deg2 pixel
assuming a 24 hour flight from the Northern Hemisphere (Fort Sumner, NM) and
a 24 hour flight from the Southern Hemisphere (Alice Springs, Australia). For each
COFE band, we plot the fraction of the entire sky measured with better than a
given aggregate sensitivity. The change of the curves slope is due to the fact that
35% of the sky can be observed from both hemispheres using COFE scan strategy.
Fig. 9. Sky coverage for COFE assuming a 24 hour flight from the Northern Hemi-
sphere (Fort Sumner, NM) and a 24 hour flight from the Southern Hemisphere (Alice
Springs, Australia). The region observed contains nearly the entire sky (93%). The
darker strip shows the overlap between the two observations. For illustration pur-
poses, we show the diffuse Galactic structure obtained adding synchrotron, free-free
and dust maps at 23 GHz (Bennett et al., 2003).
	Introduction
	Science
	Instrumentation
	Telescope
	Polarization modulator
	Receiver
	Data acquisition/demodulation
	Ground-based B-machine prototype
	Performance
	Receiver bands and expected receiver sensitivity
	Scan strategy, sky coverage and expected map sensitivity
	Conclusion
	Acknowledgments
ABSTRACT
  The COsmic Foreground Explorer (COFE) is a balloon-borne microwave polarime-
ter designed to measure the low-frequency and low-l characteristics of dominant
diffuse polarized foregrounds. Short duration balloon flights from the Northern
and Southern Hemispheres will allow the telescope to cover up to 80% of the sky
with an expected sensitivity per pixel better than 100 $\mu K / deg^2$ from 10
GHz to 20 GHz. This is an important effort toward characterizing the polarized
foregrounds for future CMB experiments, in particular the ones that aim to
detect primordial gravity wave signatures in the CMB polarization angular power
spectrum.

<|endoftext|><|startoftext|>
To appear in “Massive Stars: Fundamental Parameters and Circumstellar Interactions (2007)”RevMexAA(SC)
JET INTERACTIONS IN MASSIVE X-RAY BINARIES
Gustavo E. Romero 1,2
RESUMEN
Los sistemas binarios masivos de rayos X están formados por un objeto compacto que acreta materia del viento
estelar de una estrella donante de tipo temprano. En algunos de estos sistemas, llamados microcuásares, chorros
de part́ıculas relativistas son eyectados desde las cercańıas del objeto compacto. Estos chorros interactúan
con el campo de fotones de la estrella compañera, con el viento estelar, y, a grandes distancias, con el medio
interestelar. En este trabajo se resumirán los principales resultados de tales interacciones con especial énfasis en
la producción de fotones de alta enerǵıa y neutrinos. El caso de algún sistema particular, como ser LS I +61 303,
será discutido con algún detalle. Además, se presentarán las perspectivas futuras para observaciones a diferentes
longitudes de onda para este tipo de objetos.
ABSTRACT
Massive X-ray binaries are formed by a compact object that accretes matter from the stellar wind of an
early-type donor star. In some of these systems, called microquasars, relativistic jets are launched from the
surroundings of the compact object. Such jets interact with the photon field of the companion star, the stellar
wind, and, at large distances, with the interstellar medium. In this paper I will review the main results of such
interactions with particular emphasis on the production of high-energy photons and neutrinos. The case of
some specific systems, like LS I +61 303, will be discussed in some detail. Prospects for future observations at
different wavelengths of this type of objects will be presented.
Key Words: GAMMA RAYS: THEORY — GAMMA RAYS: OBSERVATIONS — JETS AND OUT-
FLOWS — STARS: BINARIES — STARS: MICROQUASARS
1. INTRODUCTION
Massive stars use to form binary systems. In such
systems one of the stars evolves faster than the other.
At the end of the lifetime of this star a supernova
explosion will occur, and either a neutron star or a
black hole will be left behind. If the system is not
disrupted by the explosion, the compact object will
start to accrete matter from the stellar wind of its
early-type companion. Since the matter has angu-
lar momentum, it will form an accretion disk around
the compact star. The matter will be heated in the
disk, losing angular momentum and falling into the
potential well. The hot disk will cool through the
emission of X-rays. We say then that a massive X-
ray binary (HMXRB) is born. There are around 120
HMXRBs detected in the Galaxy so far (Liu et al.
2006). Some of these systems present non-thermal
radio emission. This emission is thought to be syn-
chrotron radiation produced by relativistic electrons
in a jet that is somehow ejected from the surround-
1Facultad de Ciencias Astronómicas y Geof́ısicas, Univer-
sidad Nacional de La Plata, Paseo del Bosque, 1900 La Plata,
Argentina (romero@fcaglp.unlp.edu.ar).
2Instituto Argentino de Radioastronomı́a, Casilla de
Correos No. 5, (1894) Villa Elisa, Buenos Aires, Argentina
(romero@irma.iar.unlp.edu.ar).
ings of the compact object. When the jet is resolved
at radio wavelengths through interferometric tech-
niques or at X-rays, the HMXRB is called a high-
mass microquasar (Mirabel et al. 1992).
The word ‘microquasar’ (MQ) was coined to em-
phasize the similarities between galactic jet sources
and extragalactic quasars (Mirabel & Rodŕıguez
1998). These similarities, although important,
should not make us to overlook the also important
differences between both types of objects. The main
difference is, of course, the presence of a donor star
in the case of MQs. In high-mass MQs, this star pro-
vides a strong photon field, a matter field in the form
of a stellar wind, and a gravitational field that can
act upon the accretion disk producing a torque and
inducing its precession. The photon and matter field
constitute targets for the relativistic particles in the
jet. The interaction of the jets with these fields can
produce a variety of phenomena that are absent in
the case of extragalactic jets. The aim of the present
article is to review these phenomena.
2. WHAT IS A MICROQUASAR?
A microquasar is an accreting X-ray binary sys-
tem with non-thermal jets. The basic ingredients of
a MQ are shown in Figure 1. They are the compact
http://arxiv.org/abs/0704.0811v1
2 GUSTAVO E. ROMERO
ACCRETION DISC
(hard X-rays)
‘Corona’
CORONA
Accretion
      disc
Accreting
neutron star
or black
(radio - ?)
(optical -
soft X-rays)
Mass-
donating companion
star (IR-optical)
Mass-flow
> 1radio   infrared  optical soft-X  hard-X  gamma-ray
COMPANION
Fig. 1. Sketch showing the different components of a
microquasar and the energy bands at which they emit.
Not to scale. From Fender & Maccarone (2004).
object, the donor star, the accretion disk, the jets,
which usually are relativistic or mildly relativistic,
and a region of hot plasma called the ‘corona’ that
surrounds the compact object. If the star is an early-
type, hot star, the accretion can proceed through
capture of the wind material. In the case of low-
mass stars and in some close systems, the accretion
occurs through the overflow of the Roche lobe. In
what follows we will focus only on high-mass MQs.
The donor star can produce radiation from the IR
up to UV energies. The accretion disk produces soft
X-rays, whereas the corona is responsible for hard
X-rays that are likely generated by Comptonization
of disk photons. The emission of the jets goes from
radio wavelengths to, in some cases like LS 5039,
gamma-rays. MQs, like blazars, can emit along the
entire electromagnetic spectrum. Their spectral en-
ergy distribution (SED) is complex, being the result
of a number of different radiative processes occurring
on different size-scales in the MQs.
MQs present different spectral states at X-rays.
The two basic state are the ‘soft’ state and the ‘hard’
state. In the former the SED is dominated by a grey-
body peak around E ∼ 1 keV, probably due to the
contribution of the accretion disk, which extends in
this state all the way down to the last stable or-
bit around the compact object. In the hard state
the peak in the X-rays is shifted toward lower ener-
gies and a strong and hard power-law component is
present up to energies ∼ 150 keV, in some cases even
beyond. This emission is usually interpreted as soft
X-ray Comptonization in the corona (e.g. Ichimaru
1977), although some authors have suggested that it
could be produced in the jet through external inverse
Compton (IC) interactions (Georganopoulos et al.
2002) or through synchrotron mechanism (Markoff
et al. 2001, 2003).
The sources spend most of the time in the hard
state. It is in this state when a steady, self-absorbed
radio jet is usually observed. The transition form one
state to the other is commonly accompanied by the
ejection of superluminal components, that can be de-
tected as moving radio blobs (Mirabel & Rodŕıguez
1994, Fender et al. 2004).
3. WHAT ARE JETS MADE OF?
One of the most important open issues concern-
ing MQs is the nature of the matter content of
the jets. We know for sure that relativistic lep-
tons with a power-law distribution are present in
the jets since we can detect and measure their syn-
chrotron radiation. The relativistic outflow can be
made of relativistic electron-positron pairs, or al-
ternatively it could be a relativistic proton-electron
plasma. Another possibility is a plasma formed by a
cold electron-proton fluid, where the particles would
have a thermal distribution, plus a relativistic con-
tent, locally accelerated by shocks (Bosch-Ramon,
Romero & Paredes 2006). In this kind of jets, the
bulk of the momentum is carried out by the cold
plasma, which additionally confines the relativistic
component.
In any case, the large perturbations observed in
the interstellar medium (ISM) around some MQs like
Cygnus X-1 (Gallo et al. 2005) and SS 433 (Dubner
et al. 1998), strongly suggest that the jets are bari-
onic loaded. The direct detection of iron lines in the
case of the jets of SS 433 (Kotani et al. 1994, 1996;
Migliari et al. 2002) clearly confirms that they con-
tain hadrons, at least in this particular object. Since
there seems to be a clear correlation between the ac-
cretion and ejection of matter in MQs (Mirabel et al.
1998), it is natural to assume that the content of the
jets does not basically differ in nature from that of
the accreting matter. All these considerations make
quite likely the presence of relativistic protons in the
jets of MQs. Hence, their radiative signatures can
not be neglected in a serious analysis of the radia-
tive processes in these sources.
4. JET INTERACTIONS
What does happen when a relativistic jet pass
through the medium that surrounds a hot, massive
star?. The radiation field of the star penetrates freely
into the jet and the dominant UV photons will in-
teract with relativistic particles in the outflow. The
interaction of the stellar wind with the jet will form
JET INTERACTIONS IN HMXRBS 3
a boundary layer where shocks will likely be formed,
but some level of fluid mixing is expected to occur.
The interaction between relativistic particles from
the jet and thermal particles of the wind will take
place, producing high-energy emission. We can sep-
arate the microscopic jet-stellar environment inter-
actions in two groups, according to whether they are
of leptonic or hadronic nature. Of course, both types
of reactions will occur in a specific system, but ac-
cording to the given conditions, one type or the other
might dominate the high-energy output of the MQ.
Let us briefly discuss both cases.
4.1. Leptonic interactions
Relativistic electrons and positrons in the jet will
IC scatter soft photons up to high energies. The
origin of these photons can be diverse: stellar UV
photons, X-ray photons from the accretion disk and
the hot corona around the compact object, or non-
thermal photons produced in the jet by synchrotron
mechanism. At high energies, the interaction en-
ters in the Klein-Nishina regime, where the cross
section decreases dramatically. Opacity effects to
gamma-ray propagation due to the presence of the
local photon fields can result in the generation of IC
cascades within the binary system (Bednarek 2006a,
Orellana et al. 2007). Relativistic leptons can in-
teract with cold protons and nuclei from the stellar
wind producing high-energy emission through rela-
tivistic Bremsstrahlung. A number of papers have
been devoted to leptonic interactions in MQs in re-
cent years, for instance, Atoyan & Aharonian (1999),
Markoff et al. (2001, 2003), Georganopolous et al.
(2002), Kaufman Bernadó et al. (2002), Romero et
al. (2002), Bosch-Ramon & Paredes (2004), Bosch-
Ramon et al. (2005a, 2006), Paredes et al. (2006),
Dermer & Böttcher (2006), Gupta et al. (2006),
Bednarek (2006b), etc. The reader is referred to
these papers and references therein for detailed dis-
cussions.
In Figure 2 we show the broadband SED ex-
pected from leptonic interactions in a high-mass MQ.
The different contributions are indicated. It can be
seen that the synchrotron emission can extend up to
MeV energies and that in the GeV-TeV range the
dominant process is IC upscattering of stellar pho-
tons. Figure 3 shows a detail of the SED at high
energies. Notice that absorption by photon-photon
annihilation has been taken into account in the fi-
nal curve, yielding a soft spectrum around 100 GeV
(Bosch-Ramon et al. 2006).
−5 −3 −1 1 3 5 7 9 11 13
log(photon energy [eV])
observed SED
IC emission
seed photons
ext. Bremsstr.
int. Bremsstr.
star IC
corona IC
disk IC
sync.
corona
Fig. 2. Spectral energy distribution of a high-mass MQ.
The different contributions to the total SED are shown.
From Bosch-Ramon, Romero & Paredes (2006).
6 7 8 9 10 11 12 13
log(photon energy [eV])
observed SED
star IC
synchrotron
corona IC
Fig. 3. High-energy emission from high-mass MQ. Cour-
tesy of V. Bosch-Ramon.
4.2. Hadronic interactions
The main reaction for proton cooling in a high-
mass MQs is pp interaction, through the channels
pp → p + p + π0 and pp → p + p + ξ(π+ + π−),
where ξ is the π± multiplicity. The neutral pions
decay yielding gamma rays, π0 → γ + γ, whereas
the charged pions produce neutrinos and e± pairs:
π± → µ±νµν̄µ → e
±νeν̄e. The gamma-ray spectrum
will mimic at high-energies the spectrum of the par-
ent relativistic proton population. In general, since
proton losses are not as severe as electron losses in
the inner region of the source, we could expect a
higher energy cutoff in hadronic-dominated sources.
Models for hadronic MQs have been developed
4 GUSTAVO E. ROMERO
 7  8  9  10  11  12  13  14
Log E [eV]
θ = 0º
θ = 30º
θ = 60º
θ = 90º
 0  60  120  180
θ [º]
Beaming factor
Fig. 4. Spectral energy distributions for the hadronic
emission of a high-mass MQ with a smooth stellar wind.
Different curves correspond to different viewing angles.
From Orellana et al. (2007).
by Romero et al. (2003), Romero et al. (2005),
Romero & Orellana (2005) and Orellana & Romero
(2007). Neutrino production in this kind of models
is discussed by Romero & Orellana (2005), Aharo-
nian et al. (2006), Benarek (2005) and Christiansen
et al (2006). For photo-pion production of neutri-
nos, under rather extreme conditions, see Levinson
& Waxman (2001).
Figure 4 shows different SED obtained from pp
interactions for a high-mass MQ with a smooth
spherical wind (Orellana et al. 2007). The vari-
ous curves correspond to different viewing angles.
The total jet power in relativistic protons is Lrel
6 × 1036 erg s−1. The jet is assumed to be perpen-
dicular to the orbital plane, but this constraint can
be relaxed to allow, for instance, for a precessing jet.
Actually, in some systems, the jet could point even
in the direction of the star (Butt et al. 2003, Romero
& Orellana 2005). In such a case, jet-induced nucle-
osynthesis can occur in the stellar atmosphere. The
power of the stellar wind might be, for some stars,
strong enough as to stop the jet creating a stand-
ing shock between the compact object and the star.
Protons and electrons might be re-accelerated there
up to very high energies, producing a detectable TeV
source.
All existing models for hadronic MQs assume a
smooth wind from the star3. However, it would
be quite possible that the wind have some struc-
ture, for instance in the form of clumps, a fact that
3See, nonetheless, the paper by Aharonian & Atoyan
(1996) that, although not framed in the context of MQs, dis-
cusses the interaction of a proton beam with a cloudy medium
around a star.
would lead to gamma-ray variability on short time
scales. If such a variability could be detected by fu-
ture Cherenkov telescope arrays, it might be used to
infer the structure of the wind. The jet would act as
a kind of lantern illuminating the wind in gamma-
rays to the observer.
Hadronic jets can propagate through the ISM
producing hot spots similar to those observed in the
case of extragalactic sources (Bosch-Ramon et al.
2005b). Particles re-accelerated at the termination
point of the jets, can diffuse in the ambient medium,
interacting with diffuse material and producing ex-
tended high-energy sources.
5. THE CONTROVERSIAL CASE OF
LS I +61 303
LS I +61 303 is a puzzling Be/X-ray binary,
which displays gamma-ray variability at high ener-
gies. The nature of the compact object and the
origin of the high-energy emission is unclear. The
detection of jet-like radio features by Massi et al.
(2001, 2004) led to the classification of this source as
a MQ. This has been recently challenged by Dhawan
et al. (2006), who observed the source with the
VLBA at different orbital phases concluding that
the direction the jet-like feature during the perias-
tron passage (opposed to the primary star) supports
the scenario of a colliding wind model where the com-
pact object is an energetic pulsar (wind power∼ 1036
erg/s). The system has been detected by the MAGIC
telescope at E > 200 GeV. The variability is modu-
lated with the orbital period. Contrary to the expec-
tations the maximum of the gamma-ray emission oc-
curred well after the periastron passage. The source
was not clearly detected during the periastron (Al-
bert et al. 2006). The cause of this could be gamma-
ray absorption in the combined photon field of the Be
star and its decretion disk (e.g. Orellana & Romero
2007). Figure 5 shows the electromagnetic cascades
that might develop close to the periastron passage
(which occurs at phase 0.23). According to these
simulations the source should be detectable during
the periastron, but with longer exposures, and the
spectrum will be softer than what was observed at
phases 0.6-0.7.
The pulsar/Be binary interpretation goes not free
of severe problems. The flux at MeV-GeV energies
observed by EGRET (Kniffen et al. 1997) accounts
for a luminosity of ∼ 1036 erg/s, which would imply
an impossible conversion efficiency from wind power
to gamma-rays of ∼ 1. In addition, since the pulsar
wind would be orders of magnitude stronger than the
slow Be equatorial wind, the observed ‘cometary tail’
JET INTERACTIONS IN HMXRBS 5
Fig. 5. Electromagnetic cascades at different phases de-
veloped close to the periastron passage of the X-ray bi-
nary LS I +61 303 (from Orellana & Romero 2007).
radio feature, if interpreted as synchrotron radiation
from electrons accelerated at the colliding wind re-
gion, should point out toward the primary star, and
not opposite to it.
It is clear that LS I +61 303 is a interesting and
peculiar system that deserves more intensive studies
in the near future.
6. CONCLUSIONS
MQs are outstanding natural laboratories to
study a variety of physical phenomena such as par-
ticle acceleration, accretion physics, and particle in-
teractions. Observations of gamma-ray emission of
high-mass MQs can be used to probe the structure
of stellar winds and the nature of the matter content
of relativistic jets.
How many MQs are there in the Galaxy?. It is
difficult to answer this questions, but it seems possi-
ble that a significant number of the yet-unidentified
variable gamma-ray sources located on the galactic
plane (Romero et al. 1999) could be associated with
high-mass MQs (Romero et al. 2004, Bosch-Ramon
et al. 2005a). In the next few years, new Cherenkov
telescope arrays like HESS II, MAGIC II, and VER-
ITAS, along with the satellite observatories AGILE
and GLAST, will continue detecting these extraor-
dinary objects at high energies and helping to pene-
trate into their mysteries.
Acknowledgments
This work has been supported by the Agencies
CONICET (PIP 5375) and ANPCyT (PICT 03-
13291 BID 1728/OC-AR). I thank the organizers for
a wonderful meeting and a warm hospitality.
REFERENCES
Albert, J. et al. (MAGIC coll.) 2006, Science, 312, 1771
Aharonian, F. A., & Atoyan, A. M. 1996, Space Sci. Rev.,
75, 357
Aharonian, F. A., Anchordoqui, L. A., Khangulyan, D.,
& Montaruli, T. 2006, Journal of physics: conference
series, 39, 408
Atoyan, A. M., & Aharonian, F. A. 1999, MNRAS, 302,
Bednarek, W. 2005, ApJ, 631, 466
Bednarek, W. 2006a, MNRAS, 368, 579
Bednarek, W. 2006b, MNRAS, 371, 1737
Bosch-Ramon, V. & Paredes, J. M. 2004, A&A, 417, 1075
Bosch-Ramon, V., Romero, G. E., & Paredes, J. M.
2005a, A&A, 429, 267
Bosch-Ramon, V., Aharonian, F. A., & Paredes, J. M.
2005b, A&A, 432, 609
Bosch-Ramon, V., Romero, G. E., & Paredes, J. M. 2006,
A&A, 447, 263
Butt, Y.M., Maccarone, T.J., & Prantzos, N. 2003, ApJ,
587, 748
Christiansen, H. R., Orellana, M., & Romero, G. E. 2006,
PhRvD, 73, 063012
Dhawan, V., Mioduszewski, A., & Rupen, M., 2006, in
Proc. of the VI Microquasar Workshop, Como-2006
Dermer, C., & Böttcher, M. 2006, ApJ, 643, 1081
Dubner, G. M., Holdaway, M., Goss, W. M., & Mirabel,
I. F. 1998, AJ, 116, 1842
Fender R., & Maccarone T. 2004, in: Cosmic Gamma-
Ray Sources, ed. K.S. Cheng & G.E. Romero, Kluwer
Academic Publishers, Dordrecht, p.205
Fender, R. P., Belloni, T. M., & Gallo, E. 2004, MNRAS,
355, 1105
Gallo, E., Fender, R., Kaiser, C. 2005, Nature, 436, 819
Georganopoulos, M., Aharonian, F. A., & Kirk, J. G.
2002, A&A, 388, L25
Gupta, S., Böttcher, M., & Dermer, C. D. 2006, ApJ,
644, 409
Ichimaru, S. 1977, ApJ, 214, 840
Kaufman Bernadó, M. M., Romero, G. E., & Mirabel, I.
F. 2002, A&A, 385, L10
Kniffen, D.A., et al., 1997, ApJ, 486, 126
Kotani, T., Kawai, N., Aoki, T., et al. 1994, PASJ, 46,
Kotani, T., Kawai, N., Matsuoka, M., & Brinkmann, W.
1996, PASJ, 48, 619
Levinson, A., & Waxman, E. 2001, PhRvL, 87, 171101
Liu, Q.Z., van Paradijs, J., & van den Heuvel, E. P. J
2006, A&A, 455, 1165
Markoff, S., Falcke, H., & Fender, R. P. 2001, A&A, 372,
Markoff, S., Nowak, M., Corbel, S., et al. 2003, A&A,
397, 645
Massi, M., et al. 2001, A&A, 376, 217
Massi, M., et al. 2004, A&A, 414, L1
Migliari, S., Fender, R. & Méndez, M. 2002, Science, 297,
Mirabel, I. F., Rodriguez, L. F., Cordier, B., Paul, J., &
Lebrun, F. 1992, Nature, 358, 215
Mirabel, I. F., & Rodŕıguez, L. F. 1994, Nature, 371, 46
6 GUSTAVO E. ROMERO
Mirabel, I. F., & Rodŕıguez, L. F. 1998, Nature, 392, 673
Mirabel, I. F., Dhawan, V., & Chaty, S. et al. 1998, A&A,
330, L9
Orellana, M., & Romero, G. E. 2007, Ap&SS, in press
Orellana, M., Bordas, P., Bosch-Ramon, V., et al. 2007,
A&A, submitted
Paredes, J. M., Bosch-Ramon, V., & Romero, G. E. 2006,
A&A, 451, 259
Romero, G.E., Benaglia, P., Torres, D.F. 1999, A&A,
348, 868
Romero, G.E., Kaufman Bernadó, M.M., & Mirabel, I.F.
2002, A&A, 393, L61
Romero, G. E., Torres, D. F., Kaufman Bernadó, M. M.,
& Mirabel, I. F. 2003, A&A, 410, L1
Romero, G. E., Grenier, I. A., Kaufman Bernadó, M.M.,
Mirabel, I.F., & Torres, D. F. 2004, ESA-SP, 552, 703
Romero, G.E., & Orellana, M. 2005, A&A, 439, 237
Romero, G.E., Christiansen, H.R., & Orellana, M. 2005
ApJ, 632, 1093
ABSTRACT
  Massive X-ray binaries are formed by a compact object that accretes matter
from the stellar wind of an early-type donor star. In some of these systems,
called microquasars, relativistic jets are launched from the surroundings of
the compact object. Such jets interact with the photon field of the companion
star, the stellar wind, and, at large distances, with the interstellar medium.
In this paper I will review the main results of such interactions with
particular emphasis on the production of high-energy photons and neutrinos. The
case of some specific systems, like LS I +61 303, will be discussed in some
detail. Prospects for future observations at different wavelengths of this type
of objects will be presented.

<|endoftext|><|startoftext|>
Introduction
	Observations and stellar sample
	Observations and calibration of the spectra
	Stellar sample
	Pseudo-continuum
	Definition 
	Relation with (B-V)
	Calibration
	Equivalent width
	Definition 
	Relation with colour index (B-V)
	Comparison
	Determination of (B-V)
	Activity indexes and chromospheric activity
	N index 
	Chromospheric activity
	Summary
ABSTRACT
  We study the sodium D lines (D1: 5895.92 \AA; D2: 5889.95 \AA) in late-type
dwarf stars. The stars have spectral types between F6 and M5.5 (B-V between
0.457 and 1.807) and metallicity between [Fe/H] = -0.82 and 0.6. We obtained
medium resolution echelle spectra using the 2.15-m telescope at the argentinian
observatory CASLEO. The observations have been performed periodically since
1999. The spectra were calibrated in wavelength and in flux. A definition of
the pseudo-continuum level is found for all our observations. We also define a
continuum level for calibration purposes. The equivalent width of the D lines
is computed in detail for all our spectra and related to the colour index (B-V)
of the stars. When possible, we perform a careful comparison with previous
studies. Finally, we construct a spectral index (R_D') as the ratio between the
flux in the D lines, and the bolometric flux. We find that, once corrected for
the photospheric contribution, this index can be used as a chromospheric
activity indicator in stars with a high level of activity. Additionally, we
find that combining some of our results, we obtain a method to calibrate in
flux stars of unknown colour.

<|endoftext|><|startoftext|>
Introduction
Bosonic systems at very low temperature are characterized by the fact that a macroscopic fraction
of the particles collapses into a single one-particle state. Although this phenomenon, known as
Bose-Einstein condensation, was already predicted in the early days of quantum mechanics, the first
empirical evidence for its existence was only obtained in 1995, in experiments performed by groups
led by Cornell and Wieman at the University of Colorado at Boulder and by Ketterle at MIT (see
[2, 4]). In these important experiments, atomic gases were initially trapped by magnetic fields and
cooled down at very low temperatures. Then the magnetic traps were switched off and the consequent
time evolution of the gas was observed; for sufficiently small temperatures, the particles remained
close together and the gas moved as a single particle, a clear sign for the existence of condensation.
In the last years important progress has also been achieved in the theoretical understanding of
Bose-Einstein condensation. In [10], Lieb, Yngvason, and Seiringer considered a trapped Bose gas
consisting of N three-dimensional particles described by the Hamiltonian
(−∆j + Vext(xj)) +
Va(xi − xj), (1.1)
where Vext is an external confining potential and Va(x) is a repulsive interaction potential with
scattering length a (here and in the rest of the paper we use the notation ∇j = ∇xj and ∆j = ∆xj).
Letting N → ∞ and a→ 0 with Na = a0 fixed, they showed that the ground state energy E(N) of
(1.1) divided by the number of particle N converges to
N→∞, Na=a0
= min
ϕ∈L2(R3): ‖ϕ‖=1
EGP(ϕ)
where EGP is the Gross-Pitaevskii energy functional
EGP(ϕ) =
|∇ϕ(x)|2 + Vext(x)|ϕ(x)|
2 + 4πa0|ϕ(x)|
. (1.2)
http://arxiv.org/abs/0704.0813v1
Later, in [9], Lieb and Seiringer also proved that trapped Bose gases characterized by the Gross-
Pitaevskii scaling Na = a0 = const exhibit Bose-Einstein condensation in the ground state. More
precisely, they showed that, if ψN is the ground state wave function of the Hamiltonian (1.1) and
N denotes the corresponding one-particle marginal (defined as the partial trace of the density
matrix γN = |ψN 〉〈ψN | over the last N − 1 particles, with the convention that Tr γ
N = 1 for all N),
N → |φGP〉〈φGP| as N → ∞ . (1.3)
Here φGP ∈ L
2(R3) is the minimizer of the Gross-Pitaevskii energy functional (1.2). The interpre-
tation of this result is straightforward; in the limit of large N , all particles, apart from a fraction
vanishing asN → ∞, are in the same one-particle state described by the wave-function φGP ∈ L
2(R3).
In this sense the ground state of (1.1) exhibits complete Bose-Einstein condensation into φGP.
In joint works with L. Erdős and H.-T. Yau (see [5, 6, 7]), we prove that the Gross-Pitaevskii
theory can also be used to describe the dynamics of Bose-Einstein condensates. In the Gross-
Pitaevskii scaling (characterized by the fact that the scattering length of the interaction potential is
of the order 1/N) we show, under some conditions on the interaction potential and on the initial N -
particle wave function, that complete Bose-Einstein condensation is preserved by the time evolution.
Moreover we prove that the dynamics of the condensate wave function is governed by the time-
dependent Gross-Pitaevskii equation associated with the energy functional (1.2).
As an example, consider the experimental set-up described above, where the dynamics of an
initially confined gas is observed after removing the traps. Mathematically, the trapped gas can be
described by the Hamiltonian (1.1), where the confining potential Vext models the magnetic traps.
When cooled down at very low temperatures, the system essentially relaxes to the ground state ψN
of (1.1); from [9] it follows that at time t = 0, immediately before switching off the traps, the system
exhibits complete Bose-Einstein condensation into φGP in the sense (1.3). At time t = 0 the traps
are turned off, and one observes the evolution of the system generated by the translation invariant
Hamiltonian
HN = −
Va(xi − xj) .
Our results (stated in more details in Section 3 below) imply that, if ψN,t = e
−iHN tψN is the time
evolution of the initial wave function ψN and if γ
N,t denotes the one-particle marginal associated
with ψN,t, then, for any fixed time t ∈ R,
N,t → |ϕt〉〈ϕt| as N → ∞
where ϕt is the solution of the nonlinear time-dependent Gross-Pitaevskii equation
i∂tϕt = −∆ϕt + 8πa0|ϕt|
2ϕt (1.4)
with the initial data ϕt=0 = φGP. In other words, we prove that at arbitrary time t ∈ R, the
system still exhibits complete condensation, and the time-evolution of the condensate wave function
is determined by the Gross-Pitaevskii equation (1.4).
The goal of this manuscript is to illustrate the main ideas of the proof of the results obtained in
[5, 6, 7]. The paper is organized as follows. In Section 2 we define the model more precisely, and
we give a heuristic argument to explain the emergence of the Gross-Pitaevskii equation (1.4). In
Section 3 we present our main results. In Section 4 we illustrate the general strategy used to prove
the main results and, finally, in Sections 5 and 6 we discuss the two most important parts of the
proof in some more details.
2 Heuristic Derivation of the Gross-Pitaevskii Equation
To describe the interaction among the particles we choose a positive, spherical symmetric, compactly
supported, smooth function V (x). We denote the scattering length of V by a0.
Recall that the scattering length of V is defined by the spherical symmetric solution to the zero
energy equation (
V (x)
f(x) = 0 f(x) → 1 as |x| → ∞ . (2.1)
The scattering length of V is defined then by
a0 = lim
|x|→∞
|x| − |x|f(x) .
This limit can be proven to exist if V decays sufficiently fast at infinity. Note that, since we assumed
V to have compact support, we have
f(x) = 1−
(2.2)
for |x| sufficiently large. Another equivalent characterization of the scattering length is given by
8πa0 =
dxV (x)f(x) . (2.3)
To recover the Gross-Pitaevskii scaling, we define VN (x) = N
2V (Nx). By scaling it is clear that
the scattering length of VN equals a = a0/N . In fact if f(x) is the solution to (2.1), it is clear that
fN (x) = f(Nx) solves (
VN (x)
fN (x) = 0 (2.4)
with the boundary condition fN (x) → 1 as |x| → ∞. From (2.2), we obtain
fN (x) = 1−
N |x|
for |x| large enough. In particular the scattering length a of VN is given by a = a0/N .
We consider the dynamics generated by the translation invariant Hamiltonian
−∆j +
VN (xi − xj) (2.5)
acting on the Hilbert space L2s(R
3N ,dx1 . . . dxN), the bosonic subspace of L
2(R3N ,dx1 . . . dxN) con-
sisting of all permutation symmetric functions (although it is possible to extend our analysis to
include an external potential, to keep the discussion as simple as possible we only consider the
translation invariant case (2.5)). We consider solutions ψN,t of the N -body Schrödinger equation
i∂tψN,t = HNψN,t . (2.6)
Let γN,t = |ψN,t〉〈ψN,t| denote the density matrix associated with ψN,t, defined as the orthogonal
projection onto ψN,t. In order to study the limit N → ∞, we introduce the marginal densities of
γN,t. For k = 1, . . . , N , we define the k-particle density matrix γ
N,t associated with ψN,t by taking
the partial trace of γN,t over the last N − k particles. In other words, γ
N,t is defined as the positive
trace class operator on L2s(R
3k) with kernel given by
N,t(xk;x
dxN−k ψN,t(xk,xN−k)ψN,t(x
k,xN−k) . (2.7)
Here and in the rest of the paper we use the notation x = (x1, x2, . . . , xN ), xk = (x1, x2, . . . , xk),
x′k = (x
2, . . . , x
k), and xN−k = (xk+1, xk+2, . . . , xN ).
We consider initial wave functions ψN,0 exhibiting complete condensation in a one-particle state
ϕ. Thus at time t = 0, we assume that
N,0 → |ϕ〉〈ϕ| as N → ∞ . (2.8)
It turns out that the last equation immediately implies that
N,0 → |ϕ〉〈ϕ|
⊗k as N → ∞ (2.9)
for every fixed k ∈ N (the argument, due to Lieb and Seiringer, can be found in [9], after Theorem 1).
It is also interesting to notice that the convergence (2.8) (and (2.9)) in the trace class norm is
equivalent to the convergence in the weak* topology defined on the space of trace class operators on
3 (or R3k, for (2.9)); we thank A. Michelangeli for pointing out this fact to us (the proof is based
on general arguments, such as Grümm’s Convergence Theorem).
Starting from the Schrödinger equation (2.6) for the wave function ψN,t, we can derive evolution
equations for the marginal densities γ
N,t. The dynamics of the marginals is governed by a hierarchy
of N coupled equations usually known as the BBGKY hierarchy.
N,t =
−∆j, γ
VN (xi − xj), γ
+ (N − k)
Trk+1
VN (xj − xk+1), γ
(k+1)
(2.10)
Here Trk+1 denotes the partial trace over the (k + 1)-th particle.
Next we study the limit N → ∞ of the density γ
N,t for fixed k ∈ N. For simplicity we fix k = 1.
From (2.10), the evolution equation for the one-particle density matrix, written in terms of its kernel
N,t(x1;x
1) is given by
N,t(x1, x
−∆1 +∆
N,t(x1;x
+ (N − 1)
VN (x1 − x2)− VN (x
1 − x2)
N,t(x1, x2;x
1, x2) .
(2.11)
Suppose now that γ
∞,t and γ
∞,t are limit points (with respect to the weak* topology) of γ
N,t and,
respectively, γ
N,t as N → ∞. Since, formally,
(N − 1)VN (x) = (N − 1)N
2V (Nx) ≃ N3V (Nx) → b0δ(x) with b0 =
dxV (x)
as N → ∞, we could naively expect the limit points γ
∞,t and γ
∞,t to satisfy the limiting equation
∞,t(x1;x
−∆1 +∆
∞,t(x1;x
1) + b0
δ(x1 − x2)− δ(x
1 − x2)
∞,t(x1, x2;x
1, x2) .
(2.12)
From (2.9) we have, at time t = 0,
∞,0(x1;x
1) = ϕ(x1)ϕ(x
∞,0(x1, x2;x
2) = ϕ(x1)ϕ(x2)ϕ(x
1)ϕ(x
(2.13)
If condensation is really preserved by the time evolution, also at time t 6= 0 we have
∞,t(x1;x
1) = ϕt(x1)ϕt(x
∞,t(x1, x2;x
2) = ϕt(x1)ϕt(x2)ϕt(x
1)ϕt(x
(2.14)
Inserting (2.14) in (2.12), we obtain the self-consistent equation
i∂tϕt = −∆ϕt + b0|ϕt|
2ϕt (2.15)
for the condensate wave function ϕt. This equation has the same form as the time-dependent Gross-
Pitaevskii equation (1.4), but a different coefficient in front of the nonlinearity (b0 instead of 8πa0).
The reason why we obtain the wrong coupling constant in (2.15) is that going from (2.11) to
(2.12), we took the two limits
(N − 1)VN (x) → b0δ(x) and γ
N,t → γ
∞,t (2.16)
independently from each other. However, since the scattering length of the interaction is of the order
1/N , the two-particle density γ
N,t develops a short scale correlation structure on the length scale
1/N , which is exactly the same length scale on which the potential VN varies. For this reason the
two limits in (2.16) cannot be taken independently. In order to obtain the correct Gross-Pitaevskii
equation (1.4) we need to take into account the correlations among the particles, and the short scale
structure they create in the marginal density γ
To describe the correlations among the particles we make use of the solution fN (x) to the zero
energy scattering equation (2.4). Assuming that the function fN (xi−xj) gives a good approximation
for the correlations between particles i and j, we may expect that the one- and two-particle densities
associated with the evolution of a condensate are given, for large but finite N , by
N,t(x1;x
1) ≃ ϕt(x1)ϕt(x
N,t(x1, x2;x
2) ≃ fN (x1 − x2)fN (x
1 − x
2)ϕt(x1)ϕt(x2)ϕt(x
1)ϕt(x
(2.17)
Inserting this ansatz into (2.11), we obtain a new self-consistent equation
i∂tϕt = −∆ϕt +
(N − 1)
dxfN (x)VN (x)
= −∆ϕt +
dxf(Nx)V (Nx)
= −∆ϕt + 8πa0|ϕt|
(2.18)
because of (2.3). This is exactly the Gross-Pitaevskii equation (1.4), with the correct coupling
constant in front of the nonlinearity.
Note that the presence of the correlation functions fN (x1−x2) and fN (x
2) in (2.17) does not
contradict complete condensation of the system at time t. On the contrary, in the weak limit N → ∞,
the function fN converges to one, and therefore γ
N,t and γ
N,t converge to |ϕt〉〈ϕt| and |ϕt〉〈ϕt|
respectively. The correlations described by the function fN can only produce nontrivial effects on
the macroscopic dynamics of the system because of the singularity of the interaction potential VN .
From this heuristic argument it is clear that, in order to obtain a rigorous derivation of the Gross-
Pitaevskii equation (2.18), we need to identify the short scale structure of the marginal densities and
prove that, in a very good approximation, it can be described by the function fN as in (2.17). In
other words, we need to show a very strong separation of scales in the marginal density γ
N,t (and,
more generally, in the k-particle density γ
N,t) associated with the solution of the N -body Schrödinger
equation; the Gross-Pitaevskii theory can only be correct if γ
N,t has a regular part, which factorizes
for large N into the product of k copies of the orthogonal projection |ϕt〉〈ϕt|, and a time independent
singular part, due to the correlations among the particles, and described by products of the functions
fN (xi − xj), 1 ≤ i, j ≤ k.
3 Main Results
To prove our main results we need to assume the interaction potential to be sufficiently weak. To
measure the strength of the potential, we introduce the dimensionless quantity
α = sup
|x|2V (x) +
V (x) . (3.1)
Apart from the smallness assumption on the potential, we also need to assume that the correlations
characterizing the initial N -particle wave function are sufficiently weak. We define therefore the
notion of asymptotically factorized wave functions. We say that a family of permutation symmetric
wave functions ψN is asymptotically factorized if there exists ϕ ∈ L
2(R3) and, for any fixed k ≥ 1,
there exists a family ξ
(N−k)
N ∈ L
3(N−k)) such that
∥∥∥ψN − ϕ⊗k ⊗ ξ
(N−k)
∥∥∥→ 0 as N → ∞ . (3.2)
It is simple to check that, if ψN is asymptotically factorized, then it exhibits complete Bose-Einstein
condensation in the one-particle state ϕ (in the sense that the one-particle density associated with
ψN satisfy γ
N → |ϕ〉〈ϕ| as N → ∞). Asymptotic factorization is therefore a stronger condition
than complete condensation, and it provides more control on the correlations of ψN .
Theorem 3.1. Assume that V (x) is a positive, smooth, spherical symmetric, and compactly sup-
ported potential such that α (defined in (3.1)) is sufficiently small. Consider an asymptotically
factorized family of wave functions ψN ∈ L
3N ), exhibiting complete Bose-Einstein condensation
in a one-particle state ϕ ∈ H1(R3), in the sense that
N → |ϕ〉〈ϕ| as N → ∞ (3.3)
where γ
N denotes the one-particle density associated with ψN . Then, for any fixed t ∈ R, the one-
particle density γ
N,t associated with the solution ψN,t of the N -particle Schrödinger equation (2.6)
satisfies
N,t → |ϕt〉〈ϕt| as N → ∞ (3.4)
where ϕt is the solution to the time-dependent Gross-Pitaevskii equation
i∂tϕt = −∆ϕt + 8πa0|ϕt|
2ϕt (3.5)
with initial data ϕt=0 = ϕ.
The convergence in (3.3) and (3.4) is in the trace norm topology (which in this case is equivalent
to the weak* topology defined on the space of trace class operators on R3). Moreover, from (3.4) we
also get convergence of higher marginal. For every k ≥ 1, we have
N,t → |ϕt〉〈ϕt|
⊗k as N → ∞.
Theorem 3.1 can be used to describe the dynamics of condensates satisfying the condition of
asymptotic factorization. The following two corollaries provide examples of such initial data.
The simplest example of N -particle wave function satisfying the assumption of asymptotic fac-
torization is given by a product state.
Corollary 3.2. Under the assumptions on V (x) stated in Theorem 3.1, let ψN (x) =
j=1 ϕ(xj) for
an arbitrary ϕ ∈ H1(R3). Then, for any t ∈ R,
N,t → |ϕt〉〈ϕt| as N → ∞
where ϕt is a solution of the Gross-Pitaevskii equation (3.5) with initial data ϕt=0 = ϕ.
The second application of Theorem 3.1 gives a mathematical description of the results of the
experiments depicted in the introduction.
(−∆j + Vext(xj)) +
VN (xi − xj) (3.6)
with a confining potential Vext. Let ψN be the ground state of H
N . By [9], ψN exhibits complete
Bose Einstein condensation into the minimizer φGP of the Gross-Pitaevskii energy functional EGP
defined in (1.2). In other words
N → |φGP〉〈φGP| as N → ∞ .
In [5], we demonstrate that ψN also satisfies the condition (3.2) of asymptotic factorization. From
this observation, we obtain the following corollary.
Corollary 3.3. Under the assumptions on V (x) stated in Theorem 3.1, let ψN be the ground state
of (3.6), and denote by γ
N,t the one-particle density associated with the solution ψN,t = e
−iHN tψN
of the Schrödinger equation (2.6). Then, for any fixed t ∈ R,
N,t → |ϕt〉〈ϕt| as N → ∞
where ϕt is the solution of the Gross-Pitaevskii equation (3.5) with initial data ϕt=0 = φGP.
Although the second corollary describes physically more realistic situations, also the first corollary
has interesting consequences. In Section 2, we observed that the emergence of the scattering length in
the Gross-Pitaevskii equation is an effect due to the correlations. The fact that the Gross-Pitaevskii
equation describes the dynamics of the condensate also if the initial wave function is completely
uncorrelated, as in Corollary 3.2, implies that the N -body Schrödinger dynamics generates the
singular correlation structure in very short times. Of course, when the wave function develops
correlations on the length scale 1/N , the energy associated with this length scale decreases; since
the total energy is conserved by the Schrödinger evolution, we must conclude that together with
the short scale structure at scales of order 1/N , the N -body dynamics also produces oscillations on
intermediate length scales 1/N ≪ ℓ ≪ 1, which carry the excess energy (the difference between the
energy of the factorized wave function and the energy of the wave function with correlations on the
length scale 1/N) and which have no effect on the macroscopic dynamics (because only variations
of the wave function on length scales of order one and order 1/N affect the macroscopic dynamics
described by the Gross-Pitaevskii equation).
4 General Strategy of the Proof and Previous Results
In this section we illustrate the strategy used to prove Theorem 3.1. The proof is divided into three
main steps.
Step 1. Compactness of γ
N,t. Recall, from (2.7), the definition of the marginal densities γ
associated with the solution ψN,t = exp(−iHN t)ψN of the N -body Schrödinger equation. By defini-
tion, for any N ∈ N and t ∈ R, γ
N,t is a positive operator in L
k = L
1(L2(R3k)) (the space of trace
class operators on L2(R3k)) with trace equal to one. For fixed t ∈ R and k ≥ 1, it follows by standard
general argument (Banach-Alaouglu Theorem) that the sequence {γ
N,t}N≥k is compact with respect
to the weak* topology of L1k. Note here that L
k has a weak* topology because L
k = K
k, where
Kk = K(L
2(R3k)) is the space of compact operators on L2(R3k). To make sure that we can find
subsequences of γ
N,t which converge for all times in a certain interval, we fix T > 0 and consider the
space C([0, T ],L1k) of all functions of t ∈ [0, T ] with values in L
k which are continuous with respect
to the weak* topology on L1k. Since Kk is separable, it follows that the weak* topology on the unit
ball of L1k is metrizable; this allows us to prove the equicontinuity of the densities γ
N,t, and to obtain
compactness of the sequences {γ
N,t}N≥k in C([0, T ],L
Step 2. Convergence to an infinite hierarchy. By Step 1 we know that, as N → ∞, the
family of marginal densities ΓN,t = {γ
k=1 has at least one limit point Γ∞,t = {γ
∞,t}k≥1 in⊕
k≥1C([0, T ],L
k) with respect to the product topology. Next, we derive evolution equations for
the limiting densities γ
∞,t. Starting from the BBGKY hierarchy (2.10) for the family ΓN,t, we prove
that any limit point Γ∞,t satisfies the infinite hierarchy of equations
∞,t =
−∆j, γ
+ 8πa0
Trk+1
δ(xj − xk+1), γ
(k+1)
(4.1)
for k ≥ 1. It is at this point, in the derivation of this infinite hierarchy, that we need to identify the
singular part of the densities γ
(k+1)
N,t . The emergence of the scattering length in the second term on
the right hand side of (4.1) is due to short scale structure of γ
(k+1)
N,t .
It is worth noticing that the infinite hierarchy (4.1) has a factorized solution. In fact, it is simple
to see that the infinite family
t = |ϕt〉〈ϕt|
⊗k for k ≥ 1 (4.2)
solves (4.1) if and only if ϕt is a solution to the Gross-Pitaevskii equation (3.5).
Step 3. Uniqueness of the solution to the infinite hierarchy. To conclude the proof of Theorem 3.1,
we show that the infinite hierarchy (4.1) has a unique solution. This implies immediately that the
densities γ
N,t converge; in fact, a compact sequence with at most one limit point is always convergent.
Moreover, since we know that the factorized densities (4.2) are a solution, it also follows that, for
any k ≥ 1,
N,t → |ϕt〉〈ϕt|
⊗k as N → ∞
with respect to the weak* topology of L1k.
Similar strategies have been used to obtain rigorous derivations of the nonlinear Hartree equation
i∂tϕt = −∆ϕt + (v ∗ |ϕt|
2)ϕt (4.3)
for the dynamics of initially factorized wave functions in bosonic many particle mean field models,
characterized by the Hamiltonian
HmfN =
−∆j +
v(xi − xj) . (4.4)
In this context, the approach outlined above was introduced by Spohn in [11], who applied it to
derive (4.3) in the case of a bounded potential v. In [8], Erdős and Yau extended Spohn’s result to
the case of a Coulomb interaction v(x) = ±1/|x| (partial results for the Coulomb case, in particular
the convergence to the infinite hierarchy, were also obtained by Bardos, Golse, and Mauser, see [3]).
More recently, Adami, Golse, and Teta used the same approach in [1] for one-dimensional systems
with dynamics generated by a Hamiltonian of the form (4.4) with an N -dependent pair potential
vN (x) = N
βV (Nβx), β < 1. In the limit N → ∞, they obtain the nonlinear Schrödinger equation
i∂tϕt = −∆ϕt + b0|ϕt|
2ϕt with b0 =
V (x)dx .
Notice that the Hamiltonian (2.5) has the same form as the mean field Hamiltonian (4.4), with
an N -dependent pair potential vN (x) = N
3V (Nx). Of course, one may also ask what happens if
we consider the mean field Hamiltonian (4.4) with the N -dependent potential vN (x) = N
3βV (Nβx),
for β 6= 1. If β < 1, the short scale structure developed by the solution of the Schrödinger equation
is still characterized by the length scale 1/N (because the scattering length of N3β−1V (Nβx) is still
of order 1/N); but this time the potential varies on much larger scales, of the order N−β ≫ N−1.
For this reason, if β < 1, the scattering length does not appear in the effective macroscopic equation
(8πa0 is replaced by b0 =
dxV (x)). In [6] (and previously in [5] for 0 < β < 1/2) we prove in fact
that Corollary 3.2 can be extended to include the case 0 < β < 1 as follows.
Theorem 4.1. Suppose ψN (x) =
j=1 ϕ(xj), for some ϕ ∈ H
1(R3). Let ψN,t = e
−iHβ,N tψN with
the mean-field Hamiltonian
Hβ,N =
−∆j +
N3βV (Nβ(xi − xj))
for a positive, spherical symmetric, compactly supported, and smooth potential V such that α (defined
in (3.1)) is sufficiently small. Let γ
N,t be the one-particle density associated with ψN,t. Then, if
0 < β ≤ 1 we have, for any fixed t ∈ R, γ
N,t → |ϕt〉〈ϕt| as N → ∞. Here ϕt is the solution to the
nonlinear Schrödinger equation
i∂tϕt = −∆ϕt + σ|ϕt|
with initial data ϕt=0 = ϕ and with
8πa0 if β = 1
b0 if 0 < β < 1
5 Convergence to the Infinite Hierarchy
In this section we give some more details concerning Step 2 in the strategy outlined above. We
consider a limit point Γ∞,t = {γ
∞,t}k≥1 of the sequence ΓN,t = {γ
k=1 and we prove that Γ∞,t
satisfies the infinite hierarchy (4.1). To this end we use that, for finite N , the family ΓN,t satisfies the
BBGKY hierarchy (2.10), and we show the convergence of each term in (2.10) to the corresponding
term in the infinite hierarchy (4.1) (the second term on the r.h.s. of (2.10) is of smaller order and
can be proven to vanish in the limit N → ∞).
The main difficulty consists in proving the convergence of the last term on the right hand side
of (2.10) to the last term on the right hand side of (4.1). In particular, we need to show that in the
limit N → ∞ we can replace the potential (N − k)N2V (N(xj −xk+1)) ≃ N
3V (Nx) in the last term
on the r.h.s. of (2.10) by 8πa0δ(xj − xk+1) . In terms of kernels we have to prove that
dxk+1
N3V (N(xj − xk+1))− 8πa0δ(xj − xk+1)
(k+1)
N,t (xk, xk+1,x
k, xk+1) → 0 (5.1)
as N → ∞. It is enough to prove the convergence (5.1) in a weak sense, after testing the expression
against a smooth k-particle kernel J (k)(xk;x
k). Note, however, that the observable J
(k) does not
help to perform the integration over the variable xk+1.
The problem here is that, formally, the N -dependent potential N3V (N(xj − xk+1)) does not
converge towards 8πa0δ(xj − xk+1) as N → ∞ (it converges towards b0δ(xj − xk+1), with b0 =∫
dxV (x)). Eq. (5.1) is only correct because of the correlations between xj and xk+1 hidden in the
density γ
(k+1)
N,t . Therefore, to prove (5.1), we start by factoring out the correlations explicitly, and
by proving that, as N → ∞,
dxk+1
N3V (N(xj − xk+1))fN (xj − xk+1)− 8πa0δ(xj − xk+1)
) γ(k+1)N,t (xk, xk+1,x
k, xk+1)
fN (xj − xk+1)
(5.2)
where fN (x) is the solution to the zero energy scattering equation (2.4). Then, in a second step, we use
the fact that fN → 1 in the weak limitN → ∞, to prove that the ratio γ
(k+1)
N,t /fN (xj−xk+1) converges
to the same limiting density γ
(k+1)
∞,t as γ
(k+1)
N,t . Eq. (5.2) looks now much better than (5.1) because,
formally, N3V (N(xj − xk+1))fN (xj − xk+1) does converge to 8πa0δ(xj − xk+1). To prove that (5.2)
is indeed correct, we only need some regularity of the ratio γ
(k+1)
N,t (xk, xk+1;x
k, xk+1)/fN (xj − xk+1)
in the variables xj and xk+1. In terms of the N -particle wave function ψN,t we need regularity of
ψN,t(x)/fN (xi−xj) in the variables xi, xj, for any i 6= j. To establish the required regularity we use
the following energy estimate.
Proposition 5.1. Consider the Hamiltonian HN defined in (2.5), with a positive, spherical sym-
metric, smooth and compactly supported potential V . Suppose that α (defined in (3.1)) is sufficiently
small. Then there exists C = C(α) > 0 such that
〈ψ,H2Nψ〉 ≥ CN
∣∣∣∣∇i∇j
fN(xi − xj)
. (5.3)
for all i 6= j and for all ψ ∈ L2s(R
3N ,dx).
Making use of this energy estimate it is possible to deduce strong a-priori bounds on the solution
ψN,t of the Schrödinger equation (2.6). These bounds have the form
∣∣∣∣∇i∇j
ψN,t(x)
fN(xi − xj)
≤ C (5.4)
uniformly in N ∈ N and t ∈ R. To prove (5.4) we use that, by (5.3), and because of the conservation
of the energy along the time evolution,
∣∣∣∣∇i∇j
ψN,t(x)
fN (xi − xj)
≤ CN−2〈ψN,t,H
NψN,t〉 = CN
−2〈ψN,0,H
NψN,0〉 . (5.5)
From (5.5) and using an approximation argument on the initial wave function to make sure that the
expectation of H2N at time t = 0 is of the order N
2, we obtain (5.4).
The bounds (5.4) are then sufficient to prove the convergence (5.1) (using a non-standard Poincaré
inequality; see Lemma 7.2 in [6]).
Remark that the a-priori bounds (5.4) do not hold true if we do not divide the solution ψN,t of
the Schrödinger equation by fN(xi − xj) (replacing ψN,t(x)/fN (xi − xj) by ψN (x) the integral in
(5.4) would be of order N). It is only after removing the singular factor fN (xi − xj) from ψN,t(x)
that we can prove useful bounds on the regular part of the wave function.
It is through the a-priori bounds (5.4) that we identify the correlation structure of the wave
function ψN,t and that we show that, when xi and xj are close to each other, ψN,t(x) can be
approximated by the time independent singular factor fN(xi − xj), which varies on the length scale
1/N , multiplied with a regular part (regular in the sense that it satisfy the bounds (5.4)). It is
therefore through (5.4) that we establish the strong separation of scales in the wave function ψN,t
and in the marginal densities γ
N,t which is of fundamental importance for the Gross-Pitaevskii theory.
Since it is quite short and it shows why the solution fN (xi − xj) to the zero energy scattering
equation (2.1) can be used to describe the two-particle correlations, we reproduce in the following
the proof Proposition 5.1. Note that this is the only step in the proof of our main theorem where the
smallness of constant α, measuring the strength of the interaction potential, is used. The positivity
of the interaction potential, on the other hand, also plays an important role in many other parts of
the proof.
Proof of Proposition 5.1. We decompose the Hamiltonian (2.5) as
hj with hj = −∆j +
i 6=j
VN (xi − xj) .
For an arbitrary permutation symmetric wave function ψ and for any fixed i 6= j, we have
〈ψ,H2Nψ〉 = N〈ψ, h
iψ〉+N(N − 1)〈ψ, hihjψ〉 ≥ N(N − 1)〈ψ, hihjψ〉 .
Using the positivity of the potential, we find
〈ψ,H2Nψ〉 ≥ N(N − 1)
−∆i +
VN (xi − xj)
−∆j +
VN (xi − xj)
. (5.6)
Next, we define φ(x) by ψ(x) = fN (xi−xj)φ(x) (φ is well defined because fN (x) > 0 for all x ∈ R
note that the definition of the function φ depends on the choice of i, j. Then
fN (xi − xj)
∆i (fN (xi − xj)φ(x)) = ∆iφ(x) +
(∆fN )(xi − xj)
fN (xi − xj)
φ(x) +
∇fN (xi − xj)
fN (xi − xj)
∇iφ(x) .
From (2.1) it follows that
fN (xi − xj)
−∆i +
VN (xi − xj)
fN (xi − xj)φ(x) = Liφ(x)
and analogously
fN (xi − xj)
−∆j +
VN (xi − xj)
fN (xi − xj)φ(x) = Ljφ(x)
where we defined
Lℓ = −∆ℓ + 2
∇ℓ fN (xi − xj)
fN (xi − xj)
∇ℓ, for ℓ = i, j .
Remark that, for ℓ = i, j, the operator Lℓ satisfies
dx f2N (xi−xj) Lℓ φ(x) ψ(x) =
dx f2N (xi−xj) φ(x) Lℓ ψ(x) =
dx f2N (xi−xj) ∇ℓ φ(x) ∇ℓ ψ(x) .
Therefore, from (5.6), we obtain
〈ψ,H2Nψ〉 ≥ N(N − 1)
dx f2N (xi − xj) Li φ(x)Lj φ(x)
= N(N − 1)
dx f2N (xi − xj) ∇iφ(x)∇iLj φ(x)
= N(N − 1)
dx f2N (xi − xj) ∇iφ(x)Lj ∇iφ(x)
+N(N − 1)
dx f2N(xi − xj) ∇iφ(x) [∇i, Lj ]φ(x)
= N(N − 1)
dx f2N (xi − xj) |∇j∇iφ(x)|
+N(N − 1)
dx f2N(xi − xj)
∇fN(xi − xj)
fN (xi − xj)
∇iφ(x)∇jφ(x) .
(5.7)
To control the second term on the right hand side of the last equation we use bounds on the function
fN , which can be derived from the zero energy scattering equation (2.1):
1− Cα ≤ fN (x) ≤ 1, |∇fN (x)| ≤ C
, |∇2fN(x)| ≤ C
(5.8)
for constants C independent of N and of the potential V (recall the definition of the dimensionless
constant α from (3.1)). Therefore, for α < 1,
dx f2N (xi − xj)
∇fN(xi − xj)
fN (xi − xj)
∇iφ(x)∇jφ(x)
|xi − xj |
|∇iφ(x)| |∇jφ(x)|
|xi − xj |2
|∇iφ(x)|
2 + |∇jφ(x)|
dx |∇i∇jφ(x)|
(5.9)
where we used Hardy inequality. Thus, from (5.7), and using again the first bound in (5.8), we obtain
〈ψ,H2Nψ〉 ≥ N(N − 1)(1 −Cα)
dx |∇i∇jφ(x)|
which implies (5.3).
6 Uniqueness of the Solution to the Infinite Hierarchy
In this section we discuss the main ideas used to prove the uniqueness of the solution to the infinite
hierarchy (Step 3 in the strategy outlined in Section 4).
First of all, we need to specify in which class of family of densities Γt = {γ
t }k≥1 we want to prove
the uniqueness of the solution to the infinite hierarchy (4.1). Clearly, the proof of the uniqueness
is simpler if we can restrict our attention to smaller classes. But of course, in order to apply the
uniqueness result to prove Theorem 3.1, we need to make sure that any limit point of the sequence
ΓN,t = {γ
k=1 is in the class for which we can prove uniqueness.
We are going to prove uniqueness for all families Γt = {γ
t }k≥1 ∈
C([0, T ],L1k) with
t ‖Hk := Tr
∣∣∣(1−∆1)1/2 . . . (1−∆k)1/2 γ
t (1−∆k)
1/2 . . . (1−∆1)
∣∣∣ ≤ Ck (6.1)
for all t ∈ [0, T ] and for all k ≥ 1 (with a constant C independent of k).
The following proposition guarantees that any limit point of the sequence ΓN,t satisfies (6.1).
Proposition 6.1. Assume the same conditions as in Proposition 5.1. Suppose that Γ∞,t = {γ
∞,t}k≥1
is a limit point of ΓN,t = {γ
k=1 with respect to the product topology on
k≥1C([0, T ],L
k). Then
∞,t ≥ 0 and there exists a constant C such that
Tr (1−∆1) . . . (1−∆k)γ
∞,t ≤ C
k (6.2)
for all k ≥ 1 and t ∈ [0, T ].
Because of Proposition 6.1, it is enough to prove the uniqueness of the infinite hierarchy (4.1) in
the following sense.
Theorem 6.2. Suppose that Γ = {γ(k)}k≥1 is such that
‖γ(k)‖Hk ≤ C
k (6.3)
for all k ≥ 1 (the norm ‖.‖Hk is defined in (6.1)). Then there exists at most one solution Γt =
t }k≥1 ∈
C([0, T ],Lk) of (4.1) such that Γt=0 = Γ and
t ‖Hk ≤ C
k (6.4)
for all k ≥ 1 and all t ∈ [0, T ] (with the same constant C as in (6.3)).
In the next two subsections we explain the main ideas of the proofs of Proposition 6.1 and
Theorem 6.2.
6.1 Higher Order Energy Estimates
The main difficulty in proving Proposition 6.1 is the fact that the estimate (6.2) does not hold true
if we replace γ
∞,t by the marginal density γ
N,t. More precisely,
Tr (1−∆1) . . . (1−∆k)γ
N,t ≤ C
k (6.5)
cannot hold true with a constant C independent of N . In fact, for finite N and k > 1, the k-particle
density γ
N,t still contains the short scale structure due to the correlations among the particles.
Therefore, when we take derivatives of γ
N,t as in (6.5), the singular structure (which varies on a
length scale of order 1/N) generates contributions which diverge in the limit N → ∞.
To overcome this problem, we cutoff the wave function ψN,t when two or more particles come
at distances smaller than some intermediate length scale ℓ, with N−1 ≪ ℓ ≪ 1 (more precisely, the
cutoff will be effective only when one or more particles come close to one of the variable xj over
which we want to take derivatives). For fixed j = 1, . . . , N , we define θj ∈ C
∞(R3N ) such that
θj(x) ≃
1 if |xi − xj| ≫ ℓ for all i 6= j
0 if there exists i 6= j with |xi − xj | . ℓ
It is important, for our analysis, that θj controls its derivatives (in the sense that, for example,
|∇iθj| ≤ Cℓ
j ); for this reason we cannot use standard compactly supported cutoffs, but instead
we have to construct appropriate functions which decay exponentially when particles come close
together. Making use of the functions θj(x), we prove the following higher order energy estimates.
Proposition 6.3. Choose ℓ ≪ 1 such that Nℓ2 ≫ 1. Suppose that α is small enough. Then there
exist constants C1 and C2 such that, for any ψ ∈ L
3N ),
〈ψ, (HN + C1N)
kψ〉 ≥ C2N
dx θ1(x) . . . θk−1(x) |∇1 . . .∇kψ(x)|
2 . (6.6)
The meaning of the bounds (6.6) is clear. We can control the L2-norm of the k-th derivative
∇1 . . .∇kψ by the expectation of the k-th power of the energy per particle, if we only integrate over
configurations where the first k − 1 particles are “isolated” (in the sense that there is no particle at
distances smaller than ℓ from x1, x2, . . . , xk−1). In this sense the energy estimate in Proposition 5.1
(which, compared with Proposition 6.3, is restricted to k = 2) is more precise than (6.6), because
it identifies and controls the singularity of the wave function exactly in the region cutoff from the
integral on the right side of (6.6). The point is that, while Proposition 5.1 is used to identify the
two-particle correlations in the marginal densities γ
N,t (which are essential for the emergence of
the scattering length a0 in the infinite hierarchy (4.1)), we only need Proposition 6.3 to establish
properties of the limiting densities; this is why we can introduce cutoffs in (6.6), provided we can
show their effect to vanish in the limit N → ∞.
Note that we can allow one “free derivative”; in (6.6) we take the derivative over xk although
there is no cutoff θk(x). The reason is that the correlation structure becomes singular, in the L
sense, only when we derive it twice (if one uses the zero energy solution fN introduced in (2.1) to
describe the correlations, this can be seen by observing that ∇fN (x) ≃ 1/|x|, which is locally square
integrable). Remark that the condition Nℓ2 ≫ 1 is a consequence of the fact that, if ℓ is too small,
the error due to the localization of the kinetic energy on distances of order ℓ cannot be controlled.
The proof of Proposition 6.3 is based on induction over k; for details see Section 9 in [6].
From the estimates (6.6), using the preservation of the expectation ofHkN along the time evolution
and a regularization of the initial N -particle wave function ψN , we obtain the following bounds for
the solution ψN,t = e
−iHN tψN of the Schrödinger equation (2.6).
dx θ1(x) . . . θk−1(x) |∇1 . . .∇kψN,t(x)|
≤ Ck (6.7)
uniformly in N and t, and for all k ≥ 1. Translating these bounds in the language of the density
matrix γN,t, we obtain
Tr θ1 . . . θk−1∇1 . . .∇kγN,t∇
1 . . .∇
k ≤ C
k . (6.8)
The idea now is to use the freedom in the choice of the cutoff length ℓ. If we fix the position of all
particles but xj, it is clear that the cutoff θj is effective in a volume at most of the order Nℓ
3. If
we choose now ℓ such that Nℓ3 → 0 as N → ∞ (which is of course compatible with the condition
that Nℓ2 ≫ 1), then we can expect that, in the limit of large N , the cutoff becomes negligible. This
approach yields in fact the desired results; starting from (6.8), and choosing ℓ such that Nℓ3 ≪ 1,
we can complete the proof of Proposition 6.1 (see Proposition 6.3 in [6] for more details).
6.2 Expansion in Feynman Graphs
To prove Theorem 6.2, we start by rewriting the infinite hierarchy (4.1) in the integral form
γt = U
(k)(t)γ0 + 8iπa0
ds U (k)(t− s)Trk+1
δ(xj − xk+1), γ
(k+1)
= U (k)(t)γ0 +
ds U (k)(t− s)B(k)γ(k+1)s ,
(6.9)
where U (k)(t) denotes the free evolution of k particles,
U (k)(t)γ(k) = eit
∆jγ(k)e−it
and the collision operator B(k) maps (k+1)-particle operators into k-particle operators according to
B(k)γ(k+1) = 8iπa0
Trk+1
δ(xj − xk+1), γ
(k+1)
(6.10)
(recall that Trk+1 denotes the partial trace over the (k + 1)-th particle).
Iterating (6.9) n times we obtain the Duhamel type series
t = U
(k)(t)γ
m,t + η
n,t (6.11)
m,t =
ds1 . . .
∫ sm−1
dsm U
(k)(t− s1)B
(k)U (k+1)(s1 − s2)B
(k+1) . . . B(k+m−1)U (k+m)(sm)γ
(k+m)
· · ·
ds1 . . .
∫ sm−1
dsm U
(k)(t− s1)Trk+1
δ(xj1 − xk+1),
U (k+1)(s1 − s2)Trk+2
δ(xj2 − xk+2), . . .Trk+m
δ(xjm − xk+m),U
(k+m)(sm)γ
(k+m)
. . .
(6.12)
and the error term
n,t =
ds2 . . .
∫ sn−1
dsn U
(k)(t− s1)B
(k)U (k+1)(s1 − s2)B
(k+1) . . . B(k+n−1)γ(k+m)sn .
(6.13)
Note that the error term (6.13) has exactly the same form as the terms in (6.12), with the only
difference that the last free evolution is replaced by the full evolution γ
(k+m)
2k+2m leaves2k roots
Vertices:
Figure 1: A Feynman graph in Fm,k and its two types of vertices
To prove the uniqueness of the infinite hierarchy, it is enough to prove that the error term (6.13)
converges to zero as n → ∞ (in some norm, or even only after testing it against a sufficiently large
class of smooth observables). The main problem here is that the delta function in the collision
operator B(k) cannot be controlled by the kinetic energy (in the sense that, in three dimensions, the
operator inequality δ(x) ≤ C(1 − ∆) does not hold true). For this reason, the a-priori estimates
t ‖Hk ≤ C
k are not sufficient to show that (6.13) converges to zero, as n→ ∞. Instead, we have
to make use of the smoothing effects of the free evolutions U (k+j)(sj − sj+1) in (6.13) (in a similar
way, Stricharzt estimates are used to prove the well-posedness of nonlinear Schrödinger equations).
To this end, we rewrite each term in the series (6.11) as a sum of contributions associated with certain
Feynman graphs, and then we prove the convergence of the Duhamel expansion by controlling each
contribution separately.
The details of the diagrammatic expansion can be found in Section 9 of [5]. Here we only present
the main ideas. We start by considering the term ξ
m,t in (6.12). After multiplying it with a compact
k-particle observable J (k) and taking the trace, we expand the result as
Tr J (k)ξ
m,t =
Λ∈Fm,k
KΛ,t (6.14)
where KΛ,t is the contribution associated with the Feynman graph Λ. Here Fm,k denotes the set of
all graphs consisting of 2k disjoint, paired, oriented, and rooted trees with m vertices. An example
of a graph in Fm,k is drawn in Figure 1. Each vertex has one of the two forms drawn in Figure 1,
with one “father”-edge on the left (closer to the root of the tree) and three “son”-edges on the right.
One of the son edge is marked (the one drawn on the same level as the father edge; the other two
son edges are drawn below). Graphs in Fm,k have 2k + 3m edges, 2k roots (the edges on the very
left), and 2k + 2m leaves (the edges on the very right). It is possible to show that the number of
different graphs in Fm,k is bounded by 2
4m+k.
The particular form of the graphs in Fm,k is due to the quantum mechanical nature of the
expansion; the presence of a commutator in the collision operator (6.10) implies that, for every
B(k+j) in (6.12), we can choose whether to write the interaction on the left or on the right of the
density. When we draw the corresponding vertex in a graph in Fm,k, we have to choose whether to
attach it on the incoming or on the outgoing edge.
Graphs in Fm,k are characterized by a natural partial ordering among the vertices (v ≺ v
the vertex v is on the path from v′ to the roots); there is, however, no total ordering. The absence
of total ordering among the vertices is the consequence of a rearrangement of the summands on
the r.h.s. of (6.12); by removing the order between times associated with non-ordered vertices we
significantly reduce the number of terms in the expansion. In fact, while (6.12) contains (m+ k)!/k!
summands, in (6.14) we are only summing over 24m+k contributions. The price we have to pay is
that the apparent gain of a factor 1/m! because of the ordering of the time integrals in (6.12) is lost
in the new expansion (6.14). However, since the time integrations are already needed to smooth out
singularities, and since they cannot be used simultaneously for smoothing and for gaining a factor
1/m!, it seems very difficult to make use of the apparent factor 1/m! in (6.12). In fact, we find that
the expansion (6.14) is better suited for analyzing the cumulative space-time smoothing effects of
the multiple free evolutions than (6.12).
Because of the pairing of the 2k trees, there is a natural pairing between the 2k roots of the
graph. Moreover, it is also possible to define a natural pairing of the leaves of the graph (this is
evident in Figure 1); two leaves ℓ1 and ℓ2 are paired if there exists an edge e1 on the path from ℓ1
back to the roots, and an edge e2 on the path from ℓ2 to the roots, such that e1 and e2 are the two
unmarked son-edges of the same vertex (or, if there is no unmarked sons in the path from ℓ1 and ℓ2
to the roots, if the two roots connected to ℓ1 and ℓ2 are paired).
For Λ ∈ Fm,k, we denote by E(Λ), V (Λ), R(Λ) and L(Λ) the set of all edges, vertices, roots
and, respectively, leaves in the graph Λ. For every edge e ∈ E(Λ), we introduce a three-dimensional
momentum variable pe and a one-dimensional frequency variable αe. Then, denoting by γ̂
(k+m)
0 and
by Ĵ (k) the kernels of the density γ
(k+m)
0 and of the observable J
(k) in Fourier space, the contribution
KΛ,t in (6.14) is given by
KΛ,t =
e∈E(Λ)
dpedαe
αe − p2e + iτeµe
v∈V (Λ)
× exp
e∈R(Λ)
τe(αe + iτeµe)
 Ĵ (k)
{pe}e∈R(Λ)
(k+m)
{pe}e∈L(Λ)
(6.15)
Here τe = ±1, according to the orientation of the edge e. We observe from (6.15) that the momenta
of the roots of Λ are the variables of the kernel of J (k), while the momenta of the leaves of Λ are the
variables of the kernel of γ
(k+m)
0 (this also explain why roots and leaves of Λ need to be paired).
The denominators (αe−p
e+iτeµe)
−1 are called propagators; they correspond to the free evolutions
in the expansion (6.12) and they enter the expression (6.15) through the formula
eit(α+iµ)
α− p2 + iµ
(here and in (6.15) the measure dα is defined by dα = d′α/(2πi) where d′α is the Lebesgue measure
on R).
The regularization factors µe in (6.15) have to be chosen such that µfather =
e= son µe at every
vertex. The delta-functions in (6.15) express momentum and frequency conservation (the sum over
e ∈ v denotes the sum over all edges adjacent to the vertex v; here ±αe = αe if the edge points
towards the vertex, while ±αe = −αe if the edge points out of the vertex, and analogously for ±pe).
An analogous expansion can be obtained for the error term η
n,t in (6.13). The problem now
is to analyze the integral (6.15) (and the corresponding integral for the error term). Through an
appropriate choice of the regularization factors µe one can extract the time dependence of KΛ,t and
show that
|KΛ,t| ≤ C
k+m tm/4
e∈E(Γ)
dαedpe
〈αe − p2e〉
v∈V (Γ)
∣∣∣Ĵ (k)
{pe}e∈R(Γ)
) ∣∣∣
∣∣∣γ̂(k+m)0
{pe}e∈L(Γ)
) ∣∣∣
(6.16)
where we introduced the notation 〈x〉 = (1 + x2)1/2.
Because of the singularity of the interaction at zero, we may be faced here with an ultraviolet
problem; we have to show that all integrations in (6.16) are finite in the regime of large momenta
and large frequency. Because of (6.3), we know that the kernel γ̂
(k+m)
0 ({pe}e∈L(Λ)) in (6.16) provides
decay in the momenta of the leaves. From (6.3) we have, in momentum space,
dp1 . . . dpn (p
1 + 1) . . . (p
n + 1) γ̂
0 (p1, . . . , pn; p1, . . . , pn) ≤ C
for all n ≥ 1. Power counting implies that
(k+m)
0 ({pe}e∈L(Λ))| .
e∈L(Λ)
−5/2 . (6.17)
Using this decay in the momenta of the leaves and the decay of the propagators 〈αe−p
−1, e ∈ E(Λ),
we can prove the finiteness of all the momentum and frequency integrals in (6.15). Heuristically, this
can be seen using a simple power counting argument. Fix κ≫ 1, and cutoff all momenta |pe| ≥ κ and
all frequencies |αe| ≥ κ
2. Each pe-integral scales then as κ
3, and each αe-integral scales as κ
2. Since
we have 2k + 3m edges in Λ, we have 2k + 3m momentum- and frequency integrations. However,
because of the m delta functions (due to momentum and frequency conservation), we effectively only
have to perform 2k + 2m momentum- and frequency-integrations. Therefore the whole integral in
(6.15) carries a volume factor of the order κ5(2k+2m) = κ10k+10m. Now, since there are 2k + 2m
leaves in the graph Λ, the estimate (6.17) guarantees a decay of the order κ−5/2(2k+2m) = κ−5k−5m.
The 2k + 3m propagators, on the other hand, provide a decay of the order κ−2(2k+3m) = κ−4k−6m.
Choosing the observable J (k) so that Ĵ (k) decays sufficiently fast at infinity, we can also gain an
additional decay κ−6k. Since
κ10k+10m · κ−5k−5m−4k−6m−6k = κ−m−5k ≪ 1
for κ ≫ 1, we can expect (6.15) to converge in the large momentum and large frequency regime.
Remark the importance of the decay provided by the free evolution (through the propagators);
without making use of it, we would not be able to prove the uniqueness of the infinite hierarchy.
This heuristic argument is clearly far from rigorous. To obtain a rigorous proof, we use an
integration scheme dictated by the structure of the graph Λ; we start by integrating the momenta
and the frequency of the leaves (for which (6.17) provides sufficient decay). The point here is that
when we perform the integrations over the momenta of the leaves we have to propagate the decay
to the next edges on the left. We move iteratively from the right to the left of the graph, until we
reach the roots; at every step we integrate the frequencies and momenta of the son edges of a fixed
vertex and as a result we obtain decay in the momentum of the father edge. When we reach the
roots, we use the decay of the kernel Ĵ (k) to complete the integration scheme. In a typical step, we
α upuα rpr
Figure 2: Integration scheme: a typical vertex
consider a vertex as the one drawn in Figure 2 and we assume to have decay in the momenta of the
three son-edges, in the form |pe|
−λ, e = u, d,w (for some 2 < λ < 5/2). Then we integrate over
the frequencies αu, αd, αw and the momenta pu, pd, pw of the son-edges and as a result we obtain a
decaying factor |pr|
−λ in the momentum of the father edge. In other words, we prove bounds of the
dαudαddαwdpudpddpw
|pu|λ|pd|
λ|pw|λ
δ(αr = αu + αd − αw)δ(pr = pu + pd − pw)
〈αu − p
u〉〈αd − p
d〉〈αw − p
const
|pr|λ
. (6.18)
Power counting implies that (6.18) can only be correct if λ > 2. On the other hand, to start the
integration scheme we need λ < 5/2 (from (6.17) this is the decay in the momenta of the leaves,
obtained from the a-priori estimates). It turns out that, choosing λ = 2 + ε for a sufficiently
small ε > 0, (6.18) can be made precise, and the integration scheme can be completed.
After integrating all the frequency and momentum variables, from (6.16) we obtain that
|KΛ,t| ≤ C
k+m tm/4
for every Λ ∈ Fm,k. Since the number of diagrams in Fm,k is bounded by C
k+m, it follows immediately
that ∣∣∣Tr J (k) ξ(k)m,t
∣∣∣ ≤ Ck+mtm/4 .
Note that, from (6.12), one may expect ξ
m,t to be proportional to t
m. The reason why we only get
a bound proportional to tm/4 is that we effectively use part of the time integration to control the
singularity of the potentials.
Note that the only property of γ
(k+m)
0 used in the analysis of (6.15) is the estimate (6.3), which
provides the necessary decay in the momenta of the leaves. However, since the a-priori bound (6.4)
hold uniformly in time, we can use a similar argument to bound the contribution arising from the
error term η
n,t in (6.13) (as explained above, also η
n,t can be expanded analogously to (6.14), with
contributions associated to Feynman graphs similar to (6.15); the difference, of course, is that these
contributions will depend on γ
(k+n)
s for all s ∈ [0, t], while (6.15) only depends on the initial data).
Thus, we also obtain ∣∣∣Tr J (k) η(k)n,t
∣∣∣ ≤ Ck+n tn/4 . (6.19)
This bound immediately implies the uniqueness. In fact, given two solutions Γ1,t = {γ
1,t }k≥1 and
Γ2,t = {γ
2,t }k≥1 of the infinite hierarchy (4.1), both satisfying the a-priori bounds (6.4) and with
the same initial data, we can expand both in a Duhamel series of order n as in (6.11). If we fix
k ≥ 1, and consider the difference between γ
1,t and γ
2,t , all terms (6.12) cancel out because they
only depend on the initial data. Therefore, from (6.19), we immediately obtain that, for arbitrary
(sufficiently smooth) compact k-particle operators J (k),
∣∣∣TrJ (k)
1,t − γ
)∣∣∣ ≤ 2Ck+n tn/4
Since it is independent of n, the left side has to vanish for all t < 1/C4. This proves uniqueness
for short times. But then, since the a-priori bounds hold uniformly in time, the argument can be
repeated to prove uniqueness for all times.
References
[1] Adami, R.; Golse, F.; Teta, A.: Rigorous derivation of the cubic NLS in dimension one. Preprint:
Univ. Texas Math. Physics Archive, www.ma.utexas.edu, No. 05-211.
[2] M.H. Anderson, J.R. Ensher, M.R. Matthews, C.E. Wieman, and E.A. Cornell, Science 269
(1995), 198.
[3] Bardos, C.; Golse, F.; Mauser, N.: Weak coupling limit of the N -particle Schrödinger equation.
Methods Appl. Anal. 7 (2000), 275–293.
[4] K. B. Davis, M. -O. Mewes, M. R. Andrews, N. J. van Druten, D. S. Durfee, D. M. Kurn and
W. Ketterle, Phys. Rev. Lett. 75 (1995), 3969.
[5] Erdős, L.; Schlein, B.; Yau, H.-T.: Derivation of the cubic non-linear Schrödinger equation from
quantum dynamics of many-body systems. Invent. Math. 167 (2007), no. 3, 515-614.
[6] Erdős, L.; Schlein, B.; Yau, H.-T.: Derivation of the Gross-Pitaevskii equation for the dynamics
of Bose-Einstein condensate. To appear in Ann. of Math. Preprint arXiv:math-ph/0606017.
[7] Erdős, L.; Schlein, B.; Yau, H.-T.: Rigorous derivation of the Gross-Pitaevskii equation. Phys.
Rev. Lett. 98 (2007), no. 4, 040404.
[8] Erdős, L.; Yau, H.-T.: Derivation of the nonlinear Schrödinger equation from a many body
Coulomb system. Adv. Theor. Math. Phys. 5 (2001), no. 6, 1169–1205.
[9] Lieb, E.H.; Seiringer, R.: Proof of Bose-Einstein condensation for dilute trapped gases. Phys.
Rev. Lett. 88 (2002), no. 17, 170409.
[10] Lieb, E.H.; Seiringer, R.; Yngvason, J.: Bosons in a trap: a rigorous derivation of the Gross-
Pitaevskii energy functional. Phys. Rev A 61 (2000), no. 4, 043602.
[11] Spohn, H.: Kinetic Equations from Hamiltonian Dynamics. Rev. Mod. Phys. 52 (1980), no. 3,
569–615.
http://arxiv.org/abs/math-ph/0606017
	Introduction
	Heuristic Derivation of the Gross-Pitaevskii Equation
	Main Results
	General Strategy of the Proof and Previous Results
	Convergence to the Infinite Hierarchy
	Uniqueness of the Solution to the Infinite Hierarchy
	Higher Order Energy Estimates
	Expansion in Feynman Graphs
ABSTRACT
  We report on some recent results concerning the dynamics of Bose-Einstein
condensates, obtained in a series of joint papers with L. Erdos and H.-T. Yau.
Starting from many body quantum dynamics, we present a rigorous derivation of a
cubic nonlinear Schroedinger equation known as the Gross-Pitaevskii equation
for the time evolution of the condensate wave function.

<|endoftext|><|startoftext|>
Introduction.— It is known since the ground-breaking
work of Berry on geometric phases [1] that artificial gauge
potentials can be induced if the spatial dynamics of a sys-
tem that obeys a wave equation is confined in a certain
way. For instance, if the internal Hamiltonian of neu-
tral atoms contains an energy barrier but the spin eigen-
states are spatially varying, gauge field dynamics can be
induced [2]. In the limit of ray optics, moving atomic en-
sembles could simulate the propagation of light around
a black hole or generate topological phase factors of the
Aharonov-Bohm type [3], and inhomogeneous dielectric
media could generally exhibit geometric effects such as
an optical spin-Hall effect and the optical Magnus force
In this paper, we propose to use electromagnetically in-
duced transparency (EIT) to generate an artificial vector
potential for the paraxial dynamics of signal photons that
simulates quantum dynamics of charged particles in a
static electromagnetic field. Not only the ray of light but
also its mode structure is affected, resulting in a paraxial
wave equation that is equivalent to the Schrödinger equa-
tion for charged particles. Furthermore, the form of the
artificial vector potential can be easily controlled through
spatial variations in the control fields. We suggest con-
figurations that generate homogeneous quasi-electric and
magnetic fields as well as a vector potential of Aharonov-
Bohm type.
Although the treatment in this paper is based on EIT,
the effect presented here is more general: it will occur in
any medium that supports a set of discrete eigenmodes
for a propagating signal fields with different indices of
refraction. If the parameters governing these eigenmodes
vary in space, the signal modes will adiabatically follow,
acquiring geometric phases that affect their paraxial dy-
namics.
Review of EIT with multi-Λ atoms.— The effect takes
place in an atomic multi-Λ system, in which two ground
states are coupled to Q excited states by Q pairs of con-
trol (Ωq) and signal (âq) fields (Fig. 1). An experimen-
tally relevant example of such system is the fundamental
D1 transition in atomic rubidium, where both the ground
and excited levels are split into two hyperfine sublevels
[5]. We assume that the detunings are small so each sig-
nal field âq interacts only with the respective transition
|B〉 ↔ |Aq〉 with the associated atomic operator σ̂B,Aq
and vacuum Rabi frequency gq. In this case, the parax-
ial wave equation for each signal mode can be cast into
the form
âq = iNgqσ̂B,Aq , (1)
where the wave propagates along the z axis, ∆⊥ = ∂
∂2y , N is the number of atoms and k is the wavevector
which we assume approximately independent of q. In
Ref. [6] we have constructed a unitary transformation
âq =
Wqs b̂s (2)
that maps the original field modes aq to a new set of
modes b̂q, such that one and only one of the new modes,
b̂Q =
R∗q âq, (3)
(where Rq ≡ Ωq/(gqΩ⊥) and Ω⊥ ≡
q=1 |Ωq/gq|2 de-
pend on the control fields) couples only to an atomic dark
state and experiences EIT [6, 7, 8]. All other superposi-
tions of field modes are absorbed. This transformation is
given explicitly by Wqq′ = γwqw
q′−δqq′ , with γ = RQ+1
and wq = γ
−1(δQq +Rq).
The EIT mode b̂Q interacts with the multi-Λ atoms
in the same fashion as does the signal field in a regu-
lar 3-level system. While propagating through the EIT
medium, it gives rise to a dark-state polariton associ-
ated with zero interaction energy [9]. All other modes
couple to atomic states whose energy levels are Stark-
shifted by the interaction with either the pump field or
the other signal modes b̂q (q 6= Q). The resulting energy
gap guarantees that, if the amplitudes and phases of the
control fields are slowly changed, the composition of the
dark-state polariton, and hence the EIT mode b̂Q, will
adiabatically follow. It has been proposed [6] and ex-
perimentally demonstrated [5] that a variation in time of
the control fields can therefore be used to adiabatically
http://arxiv.org/abs/0704.0814v2
FIG. 1: Multi Λ-system: Q excited states |Aq〉 are each cou-
pled by a classical control field Ωq to the ground state |C〉
and by a quantized field âq with detuning δ to state |B〉.
transfer optical states between signal modes. In this pa-
per, we focus on spatial propagation of the EIT mode
under control fields that are constant in time, but varied
in space.
Derivation of the gauge potential.— We proceed by ex-
pressing Eq. (1) in terms of the new signal modes b̂q.
Employing the vector notation ~a = {â1, · · · , âQ} and
~σB,A = {g1σ̂B,A1 , · · · , gQσ̂B,AQ} we get
b = iN~σB,A. (4)
Throughout the paper, the double arrow denotes a Q ×
Q matrix. Because
W depends on space and time, the
differential operators have to be applied to both
W and
b. As a result, transformation (2) brings about additional
terms into the equation of motion, that can be written
in form of a minimal coupling scheme by introducing the
Hermitian gauge field
i ≡ i
W †∂i
W, (5)
where i = t, x, y, z. We multiply both sides of Eq. (4) by
W † and exploit the unitarity of
W to show that ∂i
W † =
W †(∂i
W † from which it follows that −
W †∂2i
i + i∂iA
i. The dynamic equation for the b̂ modes can
then be written as
i∂t +A
b = −
ic∂z + cA
b (6)
(−i∇⊥ −A
2~̂b−
W †N~σBA
with ∇⊥ = (∂x, ∂y). This equation has the structure of
a 2+2 dimensional field theory with minimal coupling.
Under the assumption that the control fields do not
depend on t and z we can make a temporal Fourier trans-
formation of the slowly varying amplitudes, which results
in the paraxial wave equation
b(δ) =
(−i∇⊥ −A
~σBA(δ). (7)
The gauge potential is given explicitly by
⊥ = i
R∗q(∇⊥Rq)~w~w† − iγ(∇⊥ ~w)~w† + iγ∗ ~w∇⊥ ~w† .
The full matrix A
⊥ is a pure gauge: it has emerged solely
as a consequence of the unitary transformation (2), which
reflects our choice to describe the system in terms of the
new modes b̂q rather than the original modes âq. How-
ever, this choice is motivated by the fact that the EIT
mode b̂Q is the only mode that is not absorbed. Ab-
sorption of other modes b̂q (with q 6= Q) means that
the index of refraction for these modes has a significant
imaginary part. This separates the EIT mode b̂Q from
other b-modes and ensures that it will adiabatically follow
variations of the control fields. Therefore, when analyz-
ing the evolution of b̂Q, we can neglect the off-diagonal
terms in the matrix (−i∇⊥ −A
2 in Eq. (7) and write
i∂z b̂Q(δ) = −(
~σBA)Q(δ)−
b̂Q(δ) (9)
(−i∇⊥ −A⊥)Qq(−i∇⊥ −A⊥)qQb̂Q(δ).
This equation does not include the whole matrix A
Consequently, this potential no longer acts like a pure
gauge but attains physical significance in determining the
spatial dynamics of the EIT mode.
The first term on the right-hand side of Eq. (9), re-
sponsible for the interaction of the light field with the
EIT medium, takes the same form as the susceptibility
of EIT in a single Λ-system. Neglecting decoherence, we
can write it as [6] (
W † N
~σBA)Q(δ) =
b̂Q, with the
EIT group velocity vEIT = cΩ
2/(Ng2). Note that vEIT
depends on the spatial position because Ω does. This
transforms Eq. (9) to
i∂z b̂Q =
(−i∇⊥ −AQQ)2 −
b̂Q (10)
AQQ = i
R∗q∇⊥Rq = −
|Rq|2∇⊥Arg(Rq),
|(A⊥)Qq|2 = −A2QQ +
|∇⊥Rq|2 (11)
being, respectively, the “quasi-vector” and “quasi-scalar”
potentials.
We see that the paraxial spatial evolution of the EIT
signal mode is governed by the equation that is identi-
cal (up to coefficients) to the Schrödinger equation of a
charged particle in an electromagnetic field. This is the
main result of this work. By arranging the control field in
a certain configuration, one can control the spatial prop-
agation of the signal mode through the EIT medium.
Some steering of the EIT mode is possible even in a
single-Λ system by affecting the term δ/vEIT in Eq. (10),
which results in nonuniform refraction for this mode
[10, 11]. The action of quasi-gauge fields (11) is fun-
damentally different: deflection of the signal field occurs
not due to refraction (the refraction index on resonance
is 1), but due to adiabatic following.
The case of two control fields: homogeneous electric
and magnetic quasi-fields.— Of particular practical im-
portance is the simplest non-trivial case with Q = 2.
We parametrize the control fields by writing R1,2 =
1/2±Rei(φ±θ). The corresponding Rabi frequencies
are then Ωi = h(x, y) giRi, with h(x, y) being an arbi-
trary common prefactor. This parametrization yields the
gauge potentials
AQQ = −∇⊥φ− 2R∇⊥θ; (12)
(∇⊥R)2
1− 4R2 + (∇⊥θ)
2(1 − 4R2).
Similarly to usual electrodynamics we can use a gauge
transformation [12], A′QQ = AQQ + ∇⊥f , to eliminate
the term ∇⊥φ from Eq. (12). The common phase φ of
the control fields therefore does not contribute and can
be set to zero.
A simple way to generate a term that corresponds to a
one-dimensional scalar potential V (x) for a Schrödinger
particle is to choose R = 0 and θ =
2kV (x′)
This choice of control fields leads to AQQ = 0 and Φ =
2kV (x).
For the special case of a constant electric quasi-field
along the x axis, V (x) = −Fx and subsequently
θ = −
4kF |x|3/3, (13)
where x < 0 is assumed for the region of interest. A res-
onant (δ = 0) Gaussian solution to Eq. (10) is displayed
in Fig. 2(a). The center of the Gaussian beam is shifted
by an amount xctr = Fz
2/2k, which is equivalent to the
motion of a charged particle in a constant electric field.
The control field phase profile (13) can be implemented
using, for example, a phase plate. The assumption that
the control fields do not depend on z implies that the
Fresnel number for these fields must be above 1, i.e. that
the characteristic transverse distance over which these
fields significantly change must be larger than ∼
where L is the EIT cell length. This imposes a limi-
tation on the magnitude of the electric quasi-field: from
Eq. (13) we find F <∼ λ−1/2L−3/2 and thus xctr <∼
Assuming that the signal field also has a Fresnel number
of at least 1, and thus satisfies 2zR >∼ L (with Rayleigh
length zR = kw
2/2, w being the signal beam width at
the cell entrance), we find that in a realistic experiment,
the maximum possible signal beam displacement due to
the quasi-electric field is on the order of the signal beam
width w.
To generate a homogeneous magnetic quasi-field along
the z-axis the quantity B = ∇ × AQQ = 2∇⊥θ × ∇⊥R
should be constant. However, it seems difficult to si-
multaneously achieve a vanishing electric quasi-field E =
−∇⊥Φ. A choice that minimizes the electric quasi-
field around the origin is given by θ =
B/2x and
B/2 y. The quasi-potentials then become AQQ =
−B y ex, which corresponds to the Landau gauge in stan-
dard electrodynamics, and Φ = B + 2B3y4 + O(y6). If
Φ is neglected, a Gaussian solution to the paraxial wave
equation is given by
bQ = N cscu(z) exp
cotu(z)∆x2 +∆x · pc
∆x∆y − 1
pc,xpc,y +
ycpc,y
where we have set ∆x ≡ (x − xc, y − yc), u(z) ≡
Bz/(2k) − i tanh−1(2η) and η ≡ Bw2/4. Here xc =
x0 + (k/B)(x
0 sin(Bz/k) + x̃
0(1 − cos(Bz/k)) denotes
the classical spiral trajectory of a charged particle in a
magnetic field, with x′c = dxc/dz, initial position x0 and
initial velocity x′0. For convenience we also have defined
x̃′0 = (y
0,−x′0) and the classical canonical momentum
pc. We remark that pc,x is a constant of motion. The
evolution of the signal mode is displayed in Fig. 2(b).
A surprising feature of solution (14) is that the diffrac-
tive divergence of the signal beam is reduced: the width
squared of the Gaussian,
Re(iB cotu)
1 + 4η2 − (1 − 4η2) cos(Bz
varies periodically with z instead of monotonically in-
creasing. This effect is known for electron wavepackets
[13] and can be understood as a consequence of the cir-
cular motion of particles in a magnetic field: instead of
dispersing, two-dimensional particles in a magnetic field
will simply move on circles of different size (depending on
their velocity), but with the same angular velocity. The
particle cloud will therefore not spread but “breathe”.
It remains to show that non-adiabatic coupling to other
modes can be suppressed for realistic experimental pa-
rameters. This is the case if the strength of the gauge
field terms coupling bQ to other modes, which for the
quasi-magnetic field are of the order B/(2k), are much
smaller than the difference in the respective linear sus-
ceptibilities χ1. For the EIT mode bQ, χ1 = δ/vEIT
with vEIT defined above Eq. (10); for the other modes
it can be approximated by the susceptibility of a two-
level medium, χ1 = −4Ng2(δ − iγ/2)/(cγ2). Evalu-
ating this relation at resonance leads to the condition
η ≪ (kw)2n3π/2, with n ≡ N/(V k3) being the number
of atoms in the volume k−3, which can easily be fulfilled
in an experiment.
Aharonov-Bohm potential for photons.— One of the
most intriguing phenomena of charged quantum particles
FIG. 2: Paraxial propagation of a signal beam over twice the
Rayleigh length in the presence (solid) and absence (grey) of
a constant (a) electric field along the x axis and (b) magnetic
field along z. The dashed line represents the center of the
grey beam. The effect of the fields is somewhat exaggerated.
in electromagnetic fields is the Aharonov-Bohm (AB) ef-
fect [14]. Its two astonishing features are (i) a phase shift
induced by the vector potential in a region in which elec-
tric and magnetic fields are absent, and (ii) its topological
nature: the phase shift does not depend on the particle
trajectory as long as it encloses a magnetic flux. Because
(unlike genuine electromagnetism) the potential (5) is a
differential function of the control fields, it is impossible
to simulate feature (i) with quasi-charged photons. How-
ever, we will show here that a mathematically equivalent
topological phase shift does exist for the optical case.
To generate an AB potential for photons we propose
to use two counter-rotating Laguerre-Gaussian control
fields, i.e., fields that possess an orbital angular momen-
tum. If these control fields are spatially wider than the
signal fields, the corresponding Rabi frequencies can be
approximated in cylindrical coordinates (r, ϕ) by Ω1 =
g1s1re
iϕ and Ω2 = g2s2re
−iϕ. The gauge potentials (12)
then become AQQ = −2R/r ~eϕ and Φ = (1 − 4R2)/r2,
with R = 1
(|s1|2 − |s2|2)/(|s1|2 + |s2|2). The potential
AQQ corresponds exactly to an Aharonov-Bohm poten-
tial for charged particles as it is created by a solenoid.
Solutions of the paraxial wave equation (10) can be
found in cylindrical coordinates by expanding the field
mode as bQ = r
m∈ZZ Bm(z, r) exp(imϕ). Because
of Ω ∼ r, the EIT group velocity can be written as vEIT =
ṽ r2 with ṽ ≡ c
|s1|2 + |s2|2/N . Exact solutions are
given by Bessel functions,
Bm = e
−iκ2z/(2k)
κrJν(κr) + βm
κrYν(κr)
with ν =
1 +m2 + 4Rm− 2kṽδ. For monochromatic
signal fields this corresponds to a rotation of the trans-
verse mode structure. For R = ±1/2 the potential trans-
fers a unit amount of angular momentum to the signal
light, but generally the amount can vary continuously be-
tween −h̄ and h̄. Signal photons in the EIT mode there-
fore form a two-dimensional bosonic quantum system in
an Aharonov-Bohm potential.
Conclusion.— We showed that EIT in a multi-Λ sys-
tem can be used to generate a variety of geometric ef-
fects on propagating signal pulses that mimic the be-
havior of a charged particle in an electromagnetic field.
We found specific arrangements of two spatially inhomo-
geneous pump fields in a double-Λ system which gener-
ate quasi-gauge potentials which correspond to constant
electric and magnetic fields. Furthermore topological ef-
fects like the Aharonov-Bohm phase shift can be induced.
The latter is significantly different from the proposal of
Ref. [3] in that it is based on spatially inhomogeneous
pump fields rather than the Doppler effect in moving me-
This paper investigated EIT in systems with two
ground levels. In such a system, there is only one EIT
mode, which results in an Abelian U(1) gauge theory,
making the physics analogous to electromagnetism. By
extending to multiple ground levels, it may be possible to
obtain multiple EIT modes and model non-Abelian gauge
potentials. This will be explored in a future publication.
We thank David Feder and Alexis Morris for fruit-
ful discussions. This work was supported by iCORE,
NSERC, CIAR, QuantumWorks and CFI.
[1] M. V. Berry, Proc. R. Soc. Lond. A 392, 45 (1984).
[2] R. Dum and M. Olshanii, Phys. Rev. Lett. 76, 1788
(1996); J. Ruseckas et al., Phys. Rev. Lett. 95, 010404
(2005); K. Osterloh et al., Phys. Rev. Lett. 95, 010403
(2005).
[3] U. Leonhardt and P. Piwnicki, Phys. Rev. A 60, 4301
(1999).
[4] S. Murakami, N. Nagaosa, and S.-C. Zhang, Science 301,
1348 (2003); M. Onoda, S. Murakami, and N. Nagaosa,
Phys. Rev. E 74, 066610 (2006); K. Y. Bliokh and Y. P.
Bliokh, Phys. Rev. Lett. 96, 073903 (2006); K. Bliokh,
Phys. Rev. Lett. 97, 043901 (2006); C. Duval, Z. Hor-
vath, and P. Horvathy, J.Geom.Phys. 57, 925 (2007);
C. Duval, Z. Horvathy, and P. A. Horvathy, Phys. Rev. D
74, 021701 (2006); S. Raghu and F. D. M. Haldane,
cond-mat/0602501.
[5] F. Vewinger et al., quant-ph/0611181.
[6] J. Appel, K.-P. Marzlin, and A. I. Lvovsky, Phys. Rev.
A 73, 013804 (2006).
[7] X.-J. Liu, H. Jing, and M.-L. Ge, Eur. Phys. J. D 40,
297 (2006); see also quant-ph/0403171.
[8] S. A. Moiseev and B. S. Ham, Phys. Rev. A 73, 033812
(2006).
[9] M. Fleischauer and M. D. Lukin, Phys. Rev. A 65,
022314 (2002).
[10] A. G. Truscott et al., Phys. Rev. Lett. 82, 1438 (1999);
R. Kapoor and G. S. Agarwal, Phys. Rev. A 61, 053818
(2000).
[11] L. Karpa and M. Weitz, Nature Phys. 2, 332 (2006).
[12] Note that this gauge transformation acts on the EIT
mode b̂Q only and is therefore different from the gauge
transformation discussed above.
[13] H. Takagi, M. Ishida, and N. Sawaki, Jpn. J. Appl. Phys
40, 1973 (2001).
[14] Y. Aharonov and D. Bohm, Phys. Rev. 115, 485 (1959).
http://arxiv.org/abs/cond-mat/0602501
http://arxiv.org/abs/quant-ph/0611181
http://arxiv.org/abs/quant-ph/0403171
ABSTRACT
  The Schrodinger motion of a charged quantum particle in an electromagnetic
potential can be simulated by the paraxial dynamics of photons propagating
through a spatially inhomogeneous medium. The inhomogeneity induces geometric
effects that generate an artificial vector potential to which signal photons
are coupled. This phenomenon can be implemented with slow light propagating
through an a gas of double-Lambda atoms in an electromagnetically-induced
transparency setting with spatially varied control fields. It can lead to a
reduced dispersion of signal photons and a topological phase shift of
Aharonov-Bohm type.

<|endoftext|><|startoftext|>
Introduction
The engineering of quantum states of light fields and oscillators became an in-
teresting topic in the last years, due to its applications in : (i) fundamentals
of quantum mechanics (preparation of Schrodinger-cat states [1], their super-
position [2] and measurement of their decoherence [3], etc.); (ii) determination
of certain properties of a system (phase distribution P(θ) [4], Wigner [5] and
Husimi [6] functions, etc.); (iii) proposals for practical applications (quantum
lithography [7], quantum communication [8] - e.g., via hole-burning in Fock
∗corresponding author, e-mail : sbd@cbpf.br
http://arxiv.org/abs/0704.0815v2
space [9] - quantum teleportation [10], etc). However, a difficult situation ap-
pears when one wants to prepare a state of a system offering hard access [11].
In this case the difficulty may be circumvented by coupling the system having
hard access to a second system offering easy access, in which a desired state is
prepared with subsequent transfer to the first one. The success of this operation
depends on the model-Hamiltonian and on the initial state describing the whole
system.
Although the problem of two interacting harmonic oscillators has been ex-
haustively studied in the literature, the discussion about exchange of nonclas-
sical states between them is scarce. The coupled quantum oscillation problem
was considered earlier in [12, 13, 14], where the authors of those papers were
interested only in the energy of the system. Later on, in Ref [15] a full exchange
between quantum two-mode harmonic oscillators was presented, however the
issue was only concerned with the particular transfer of coherent states. In Ref.
[16] we have studied the transfer of certain properties (statistics and squeezing)
and in Ref. [17] we have studied the transfer of the most relevant part of
the state of a sub-system to another, through the simultaneous transfer of the
number and phase distributions, Pn and P (θ)
1 [17]; the solutions were found
numerically since the models were not exactly soluble.
In the present work we employ a distinct model-Hamiltonian, allowing us to
treat the problem analytically permitting us to analyze the transfer of generic
states. We show in which way one can get exact exchange of the states between
two interacting sub-systems. Exchange of states means simultaneous transfer
of states in two opposite directions ; so, it is more significant than the transfer
of states in one direction as studied in [17]. In the present case the transfer of a
state from the “easy-oscillator” to the “hard-oscillator” is observed by simply
monitoring the state of the easy-oscillator during the time evolution of the whole
system. For brevity, hereafter the easy- and the hard-oscillator will be referred
to as O1 and O2, respectively.
The Sect. II introduces the model-Hamiltonian allowing us to obtain the
evolution operator for this coupled system. In the Sect. III we consider differ-
ent types of initial states describing the entire system to study the mentioned
effect between the O1 and the O2 ( Sub-Sects. (A), (B),and (C) ), includ-
ing superpositions of states representing the qubits |0〉 and |1〉. The Sect. IV
contains the comments and conclusion.
2 Model-Hamiltonian: evolution operator
We start from the Hamiltonian
H/h̄ = ω1a
1 a1 + ω2a
2 a2 + λ
a+1 a2 + a1a
, (1)
1Since the number and phase are canonically conjugate operators they are complementary,
in the sense that simultaneous transfer of number and phase distributions, Pn and P (θ),
concerns the transfer of the major part of the state describing a system.
where a+i (ai) stands for the raising (lowering) operator of the i− th oscillator,
i = 1, 2; ωi and λ are real parameters standing for the i-th oscillator frequency
and coupling constant, respectively. The equations of motion for the operators
a1(t) and a2(t) can be solved analytically,
a1(t) =
c2e−iω
t + s2e−iω
a1(0) + cs
t − e−iω
a2(0), (2)
a2(t) =
c2e−iω
t + s2e−iω
a2(0) + cs
t − e−iω
a1(0),
where,
ω′1 = ω1 + λ
, (3)
ω′2 = ω2 − λ
x2 + 1
x2 + 1
ω1 − ω2
. (5)
The parameter s and c satisfy the condition c2+s2 = 1, they define the auxiliary
operators
a′1 = c a1 + s a2 , (6)
a′2 = −s a1 + c a2 ,
which decouple the above Hamiltonian. The following relations also hold:
ω′1 + ω
2 = ω1 + ω2 , (7)
ω′1 − ω′2 =
It is convenient for our purposes to find the time dependent state vector or
density operator in the Schrodinger picture. One formal prescription is to work
with Wigner representation of the state and obtain the time-dependent density
operator from the Wigner function[19], for which the time evolution is easily
obtained. However, it is a hard task to restore analytical or numerical values
for the density matrix ρ(t) in the Fock basis from the time dependent Wigner
function. To overcome this difficult we will show that for the Hamiltonian given
by Eq.(1) there is an analytical expression for the evolution operator U(t), which
defines the solution of the Schrodinger equation, allowing us to get directly the
matrix ρ(t) in the Fock basis. This kind of approach was already used in Ref
[18], but only treating the system in the resonant case (ω1 = ω2). In [18] the
author studied the transfer of state starting from the particular one photon
state. Our results permit one to obtain an analytical expression for the matrix
element U(t), for the Hamiltonian (1) not restricted to the resonant case and
permitting easy application to a generic initial state. Consequently, the problem
of transfer of states can be more comfortably discussed using the present results.
To obtain the operator U(t), we define the (auxiliary) unitary operator
Us(t) which is associated to a rotation and decouples the Hamiltonian,
U−1s ai Us = a
i . (8)
We have,
U−1s = U−s , (9)
in view of the reverse transformation
a1 = c a
1 − s a′2 , (10)
a2 = s a
1 + c a
We denote {|n1, n2〉0} as representing the Fock′s basis, eigenvectors of the
(old) number operator Ni = a
i ai , whereas {|n1, n2〉s} is the same for the
(new) number operator Ni(s) = a
i. We have,
Us|n1, n2〉s = |n1, n2〉0, (11)
|n1, n2〉s = U−s|n1, n2〉0.
If we represent Us in the Fock
′s basis {|n1, n2〉0}, we obtain
n1, n2
m1, m2
= 0〈n1, n2|Us|m1,m2〉0 (12)
= s〈n1, n2|m1,m2〉0.
Next, to reconstruct the operator Us in the Fock’s basis, we start from
s〈n1, n2| a′1|m1,m2〉0 = s〈n1, n2| (c a1 + s a2) |m1,m2〉0, (13)
Since the operators a′i act on the basis {|n1, n2〉s} whereas the ai act on the
basis {|n1, n2〉0}, we get
n1 + 1s〈n1 + 1, n2|m1,m2〉0 = c
m1 s〈n1, n2|m1 − 1,m2〉0 (14)
m2 s〈n1, n2|m1,m2 − 1〉0,
which, after using the Eq.(12), leads to
n1, n2
m1, m2
n1−1,n2
m1−1,m2
n1−1,n2
m1,m2−1
, (15)
and similarly, repeating the procedure for the operator a′2, we find
n1, n2
m1, m2
n1,n2−1
m1−1,m2
n1,n2−1
m1,m2−1
. (16)
Using the Eqs. (15), (16) plus the unitary condition U †sUs = UsU
s = 1 we
obtain, after a lengthy calculation, the expression
n1, n2
m1, m2
= δn1+n2, m1+m2
n1!n2!
m1!m2!
(−1)n2 cm1−n2 sm2+n2 (17)
min(n2,m2)
k=max(0,m2−n1)
(−1)−k
n2 − k
(U−s)
n1, n2
m1, m2
= (−1)m2−n2 (Us)n1, n2m1, m2 . (18)
The time evolution operator U(t) may be written in the basis {|n1, n2〉s} as
U(t) =
k1,k2
|k1, k2〉s e−i(k1ω
+ k2ω
s〈k1, k2| , (19)
for H is diagonal in this basis. Finally from the Eqs.(12) and (19) we obtain
the expression
n1, n2
m1, m2
k1,k2
e−i(k1ω
+ k2ω
)t (U−s)
n1, n2
k1, k2
(U−s)
m1, m2
k1, k2
, (20)
restricted to n1 + n2 = k1 + k2 = m1 +m2 , whereas U
n1, n2
m1, m2
= 0 otherwise.
The evolution operator obtained in Eq.(20) allows us to study the time evo-
lution of the whole state describing our bipartite system composed by coupled
oscillators, represented by the Hamiltonian in the Eq.(1). In the next section
we will study the exchange of states between these oscillators and, as a natural
assumption, we will suppose the O2 initially in its ground state |0〉. The O1 is
assumed to be previously prepared in various initial states, firstly starting from
an arbitrary state |φ〉.
3 Exchange of generic state
Let us consider that the whole (bipartite) system is initially in the state
|Ψ(0)〉 = |φ〉 ⊗ |0〉 , (21)
whose components in the Fock’s basis are given by,
|Ψ(0)〉 =
Cn, 0(0)|n, 0〉 , (22)
since Cn1, n2(0) = 0 for n2 6= 0. In the Schrodinger representation, the coeffi-
cients Cn1,n2(t) are obtained from Cn1,n2(t) = 〈n1, n2|U(t)|Ψ(0)〉, which, using
Eq. (22) and the constraint n1 + n2 = n, results in the form
Cn1,n2(t) = Cn1+n2,0(0)U(t)
n1,n2
n1+n2,0
. (23)
In particular, we have that
Cn,0(t) = Cn,0(0)U(t)
n,0 , (24)
C0,n(t) = Cn,0(0)U(t)
n,0 . (25)
The exchange of states between the oscillators will occur after an instant τ ,when
C0,n(τ ) = Cn,0(0) and
|Ψ(τ)〉 =
C0, n(τ )|0, n〉 , (26)
or, |Ψ(τ )〉 = |0〉 ⊗ |φ〉. This shows that exchange of states allows us to verify
the transfer of states to the O2 by monitoring the time evolution of the O1.
From the Eqs. (17) and (18) we have,
n−l,l =
(n− l)!l!
cn−l sl , (27)
n−l,l =
(n− l)!l! (−1)
cl sn−l . (28)
The substitution of the Eqs. (27) and (28) in the Eq. (20) results
n,0 = (−1)
(n− l)!l!
(−1)n−l cnsne−i (n−l) ω
t e−i l ω
t . (29)
where we recognize the Newton’s binomial expression,
n,0 = (−1)
e−i ω
t − e−i ω
or, replacing the auxiliary parameters ω′1, ω
2 by ω1, ω2 and λ (cf. Eq. (7)),
n,0 = e
ω1+ω2
−2 i s c sin( λ
, (31)
and, consequently,
C0,n(t) = Cn,0(0) e−i
ω1+ω2
−2 i s c sin( λ
. (32)
In a similar way we get,
Cn,0(t) = Cn,0(0) e−i
ω1+ω2
c2e−i
t + s2ei
. (33)
From Eq. (32) we see that a partial exchange of states will occur when
λt/sc = (2k + 1)π, i.e., in the time intervals τk = (sc/λ) (2k + 1)π. The effect
attains the highest efficiency when the product sc is maximum, i.e., when s =
c = 1/
2 and τk = (k + 1/2)π/λ. According to the Eq. (4) this implies x = 0
and the resonance condition ω1 = ω2 = ω (cf. Eq. (5)),
C0,n(τk) = (−i)n Cn,0(0) e−i ω n τk . (34)
However, we note that even at resonance we obtain no exchange of states, due to
the presence of the phase factor exp
ωτk +
affecting the coefficients
of the state describing both oscillators in the Fock’s representation. In this gen-
eral case we obtain
∣C0,n(τk)
∣Cn,0(0)
∣, which means exchange of statistics
between the two oscillators. This can also be seen comparing both reduced
density matrix, ρ
m1, m2(τk) and ρ
m1, m2(0), in the Fock’s representation,
ρ(2)m1, m2(τk) = e
−i (ωτk+π2 ) (m1−m2) ρ(1)m1, m2(0) , (35)
which exhibits the distinction between their off-diagonal elements. As well
known, while the state of a system offers its complete description, the same
is not true for the statistics, which contains only partial informations of the
system.
3.1 The complete exchange of state
It is shown in the last section that it is not possible to have a complete exchange
of states for a generic initial state because the phases are not transferred (see
Eq.35). Here we show that when the state of oscillator O1 is given by the super-
position C0|0〉+ CN |N〉 whereas O2 is in the vacuum state, complete exchange
of states occurs. Note that this state includes in particular the important case
C0|0〉+ C1|1〉 using the qubits |0〉, |1〉 having potential applications in quantum
communication [20] and in quantum computation [21]. It was shown that this
state exhibits squeezed fluctuations [22].
Next, let us consider the whole system initially in the superposed state
|Ψ(0)〉 = C0,0(0)|0, 0〉+ CN,0(0)|N, 0〉 . (36)
In this case we verify perfect exchange of states between the oscillators for a
convenient choice of the parameters involved. Assuming the resonance condition
in the Eq.(32) we have, for C0,0(t) = C0,0(0),
C0,N (t) = CN,0(0) e−i (ω t+π/2)N sinN (λ t) . (37)
Partial exchange of states will occur when t = τ0 = π/(2λ),which results in
C0,N (τk) = C
N,0(0) e−i π/2(ω/λ+1)N , (38)
whose meaning is the exchange of statistics. The exchange of states becomes
complete (exact) when C0,N (τk) = C
N,0(0), namely, when
, (39)
with m integers. Taking m = 1 and ω in the microwave domain (ω ∼ 109Hz)
the time spent to transfer the state C0|0 > + C1|1 > from the O1 to the O2
results τ0 = π/(2λ) ∼ 10−9s, since λ = ω/3 (cf. Eq.(39)), which is smaller than
the typical decoherence time for such systems (τd ∼ 10−3s), as it should.
Note that the previous initial state C0|0〉+CN |N〉 describing the O1 includes
the Fock states |N〉, obtained from C0 = 0 and CN = 1. In this case exact
exchange of states no longer requires the Eq. (39). The reason comes from the
phase factor appearing in the Eq. (39), now becoming a global phase with no
physical relevance. In this case the exchange of states is exact for any instant
tk = τ0 + 2πk/λ.
4 Comments and Conclusion
An analytical procedure applied to a convenient model-Hamiltonian describing
two coupled oscillators allows us to get the exact evolution operator for the
entire system (Sect. II). This approach, through the use of distinct initial states
and parameters (Sub-Sects. (A), (B) of Sect. III), makes easy the study of
exchange of states between such sub-systems. In all cases we have shown that
the fidelity of the process is maximum when the resonance condition, ω1 =
ω2, is attained. Assuming the O2 always in the vacuum state we find, sub-
Section by sub-Section, that: (A) partial exchange of states is achieved when
the initial state of the O1 is arbitrary, for the time intervals t = τk = (k +
1/2)π/λ; the efficiency of partial exchange is maximum when the product sc
is maximum (sc = 1/2); however, while the occurrence of exchange of states is
partial, exchange of statistics is obtained exactly, as shown in the Eqs. (34),
(35); (B) exact exchange of states occurs when the O1 starts from the initial
superposed state C0|0〉 + CN |N〉, in the time intervals tk = τ0 + 2πk/λ, with
the requirement in Eq. (39). If the Eq.(39) is not obeyed, exchange of states
will occur at the same time intervals, but now the effect is only partial; Exact
exchange of states is also found in the particular case of (B), setting C0 = 0
and CN = 1, which means the O1 starting from a Fock state |N〉. In this case
the exchange of states occurs exactly at the same time intervals found in (B),
no matter the Eq. (39) is obeyed or not.
As final remarks we mention that exchange of states and its efficiency could
be investigated for other model-Hamiltonians and, as explained before, the ef-
fect goes beyond those studied in [16] and [17]. To our knowledge, exchange of
states in coupled systems and even exchange of certain properties, are subjects
receiving little attention in the literature [23] - with the remarkable exception
of quantum teleportation [21], an effect having a very distinct nature (requiring
the presence of quantum channels and entangled states), which occurs in the
absence of coupling between the two sub-systems. In the context of teleporta-
tion, exchange of states appears with the name ”identity interchange” [24] and
”two-way teleportation” [25].
4.1 Acknowledgements
The authors thank the CNPq (SBD, BB) and FAPERJ (DPJ) for the partial
supports.
4.2 References
References
[1] B.Yurke, D. Stoler, Phys. Rev. Lett. 57 (1986) 13.
[2] L. Davidovich et al., Phys. Rev. Lett. 71 (1993) 2360.
[3] M. Brune et al., Phys. Rev. Lett. 77 (1996) 4887; D.M. Meekhof et al.,
Phys. Rev. Lett. 76 (1996) 1796.
[4] D.T. Pegg, S. M. Barnett, Phys. Rev. Lett. 76 (1996) 4148.
[5] L.G. Lutterbach and L. Davidovich, Phys. Rev. Lett. 78 (1997) 2547.
[6] M. H.Y. Moussa, B. Baseia, Phys. Lett. A 238 (1998) 223.
[7] G. Bjork, L.L. Sanchez-Soto, J. D. Soderholm, Phys. Rev. Lett. 86 (2001)
4516.
[8] See, e.g., S.L. Braustein, P. van Loock, Rev. Mod. Phys. 77 (2005) 513.
[9] B. Baseia, J.M.C. Malbouisson, Chinese Phys. Lett. 18 ,1467 (2001); Phys.
Lett. A 290 (2001) 234; A.T. Avelar, B. Baseia, Opt. Commun. 239 (2004)
281; Phys. Rev. A 72 (2005) 67508; B. Escher et al., Phys. Rev. A 70 (2004)
025801.
[10] B. Julsgaard et al., Nature, 413 (2001) 400, and references therein.
[11] F. Dietrich et al., Phys. Rev. Lett. 62 (1989) 403; D.J. Heizein et al., Phys.
Rev. Lett. 66 (1991) 2080.
[12] J. Tucker and D. F. Walls, Ann. Phys. (N.Y.) 52, 1 (1969).
[13] E.Y.C. Lu, Phys. Rev. A 8, 1053 (1973).
[14] M.S. Abdalla, J. Phys. A: Math. Gen. 29, 1997 (1996).
[15] Marcos C de Oliveira et al, Journal Optics B 1 (1999) 610.
[16] H. Rodrigues et al., Physica A 311 (2002)188.
[17] D. Portes Jr., et al., Physica A 329 (2003) 391.
[18] Lee E. Estes, Thomas H. Keil, and Lorenzo M. Narducci, Physical Review
175,1 (1968) 286.
[19] B. R. Mollow, Physical Review 162,5 (1967) 1256.
[20] S. J. van Enk, J. I. Cirac, P. Zoller, Phys. Rev. Lett. 78 (1997) 4293.
[21] P. W. Shor, Phys. Rev. A 52 (1995) R2493.
[22] K. Wodkiewicz et al, Phys. Rev. A 35, (1987) 2567.
[23] A. S. M. de Castro, V.V. Dodonov , J. Opt. B: Quantum Semiclass. Opt.
4 (2002) 191.
[24] M. H. Y. Moussa, Phys. Rev. A 55, (1997) R3287.
[25] L. Vaidman, N. Yoran, Phys. Rev. A 59 (1999) 116.
	 Introduction
	Model-Hamiltonian: evolution operator
	Exchange of generic state
	The complete exchange of state
	Comments and Conclusion
	Acknowledgements
	References
ABSTRACT
  Exchange of quantum states between two interacting harmonic oscillator along
their evolution time is discussed. It is analyzed the conditions for such
exchange starting from a generic initial state and demonstrating that the
effect occurs exactly only for the particular states C0|0>+Cn|N>, which
includes the interesting qubits components |0>,|1>. It is also determined the
relation between the coupling constant and characteristic frequencies of the
oscillators to have the complete exchange.

<|endoftext|><|startoftext|>
Introduction
Recent studies of luminous radio quasars indicate that the power of the radio jet can
exceed the bolometric luminosity associated with the accretion flow thermal emission (Punsly
2006b, 2007). This has proven to be quite challenging for current 3-D numerical simulations
of MHD black hole magnetospheres. Based on table 4 of Hawley and Krolik (2006) and the
related discussion of Punsly (2006b, 2007), the most promising 3-D simulations for achieving
this level of efficiency are those of the highest spin, a/M ≈ 1 (where the black hole mass, M ,
and the angular momentum per unit mass, a, are in geometrized units). More generally, such
high spins have been inferred in some black hole systems based on observational constraints
(McClintock et al 2006). Thus, there is tremendous astronomical relevance to these highest
http://arxiv.org/abs/0704.0816v1
– 2 –
spin configurations, in particular the physical origin of the relativistic Poynting jet. The first
generation of long term 3-D simulations produced one Poynting flux powerhouse, the a/M =
0.995 simulation, KDE (De Villiers et al 2003, 2005a; Hirose et al 2004; Krolik et al 2005).
The source of most of the Poynting flux was clearly shown to be outside the event horizon
in KDE (Punsly 2006a). However, without access to the original data, the details of the
physical mechanism could not be ascertained. A second generation of 3-D simulations were
developed in Hawley and Krolik (2006), the highest spin case was KDJ, a/M = 0.99, with
by far the most powerful Poynting jet within the new family of simulations; three times the
Poynting flux (in units of the accretion rate of mass energy) of the next closest simulation
KDH, a/M = 0.95. The last three data dumps, at simulation times, t = 9840 M, t = 9920 M
and t = 10000 M, were generously made available to this author. The late time behavior of
the simulations is established after t = 2000 M (when the large transients due to the funnel
formation have died off) making these data dumps of particular interest for studying the
Poynting jet (Hawley and Krolik 2006). This paper studies the origin of the Poynting jet
at these late times.
The analysis of the data from the KDJ simulation clearly indicates that the Poynting
flux in the outgoing jet is dominated by large flares. Typically, one expects the turbulence
in the field variables to mask the dynamics of Poynting flux creation in an individual time
slice of one of the 3-D simulations (Punsly 2006a). Surprisingly, the flares are of such a large
magnitude that they clearly standout above the background field fluctuations as evidenced
by figure 1. The flares are created in the equatorial accretion flow deep in the egosphere
between the inner calculational boundary at r=1.203 M and r= 1.6 M (the event horizon is at
r= 1.141 M). Powerful beams of Poynting flux emerge perpendicular to the equatorial plane
in the ergospheric flares and much of the energy flux is diverted outward along approximately
radial trajectories that are closely aligned with the poloidal magnetic field direction in the jet
(see figure 1). The situation is unsteady, whenever some vertical magnetic flux is captured
in the accretion flow it tends to be asymetrically distributed and concentrated in either
the northern or southern hemisphere. This hemisphere then receives a huge injection of
electromagnetic energy on time scales ∼ 60M .
The source of Poynting flux in KDJ resembles a nonstationary version of the ergospheric
disk (see Punsly and Coroniti (1990) and chapter 8 of Punsly (2001) for a review). The
ergospheric disk is modeled in the limit of negligible accretion and it is the most direct
manifestation of gravitohydromagnetics (GHM) Punsly (2001). A GHM dynamo arises
when the magnetic field impedes the inflow of gas in the ergosphere, i.e., vertical flux in an
equatorial accretion flow. The strong gravitational force will impart stress to the magnetic
field in an effort to move the plasma through the obstructing flux. In particular, the metric
induced frame dragging force will twist up the field azimuthally. These stresses are coupled
– 3 –
into the accretion vortex around a black hole by large scale magnetic flux, and propagate
outward as a relativistic Poynting jet. The more obstinate the obstruction, the more powerful
the jet. There are two defining characteristics that distinguish the GHM dynamo from a
Blandford-Znajek (B-Z) process, Blandford and Znajek (1977), on field lines that thread the
ergopshere:
1. The B-Z process is electrodynamic so there is no source within the ergosphere, it
appears as if the energy flux is emerging from the horizon. In the GHM mechanism,
the source of Poynting flux is in the ergospheric equatorial accretion flow.
2. In a B-Z process in a magnetosphere shaped by the accretion vortex, the field line
angular velocity is, ΩF ≈ ΩH/2 (where ΩH is the angular velocity of the horizon)
near the pole and decreases with latitude to ≈ ΩH/5 near the equatorial plane of
the inner ergosphere (Phinney 1983). In GHM, since the magnetic flux is anchored
by the inertia of the accretion flow in the inner ergosphere, frame dragging enforces
dφ/dt ≈ ΩH . One therefore has the condition, ΩF ≈ ΩH .
In order to understand the physical origin of the Poynting flux, these two issues are studied
below.
2. The KDJ Simulation
The simulation is performed in the Kerr metric (that of a rotating, uncharged black
hole), gµν . Calculations are carried out in Boyer-Lindquist (B-L) coordinates (r, θ, φ, t). The
reader should refer to Hawley and Krolik (2006) for details of the simulation. We only
give a brief overview. The initial state is a torus of gas in equilibrium that is threaded
by concentric loops of weak magnetic flux that foliate the surfaces of constant pressure.
The magnetic loops are twisted azimuthally by the differentially rotating gas. This creates
significant magnetic stress that removes angular momentum from the gas, initiating a strong
inflow that is permeated by magneto-rotational instabilities (MRI). The end result is that
after t = a few hundred M, accreted poloidal magnetic flux gets trapped in the accretion
vortex or funnel (with an opening angle of ∼ 60◦ at the horizon tapering to ∼ 35◦ at
r > 20M). This region is the black hole magnetosphere and it supports a Poynting jet. The
surrounding accretion flow is very turbulent.
In order to understand the source of the strong flares of radial Poynting flux, one
needs to merely consider the conservation of global, redshifted, or equivalently the B-L
coordinate evaluated energy flux (Thorne et al 1986). In general, the divergence of the
– 4 –
Fig. 1.— The source of Poynting flux. The left hand column is Sθ and the right hand column
is Sr in KDJ, both averaged over azimuth, at (from top to bottom) t= 9840 M, t = 9920
M and t= 10000 M. The relative units (based on code variables) are in a color bar to right
of each plot for comparison of magnitudes between the six plots. The contours on the Sθ
plots are of the density, scaled from the peak value within the frame at relative levels 0.5
and 0.1. The contours on the Sr plots are of Sθ scaled from the peak within the frame at
relative levels 0.67 and 0.33. The inside of the inner calculational boundary (r=1.203 M)
is black. The calculational boundary near the poles is at 8.1◦ and 171.9◦. Notice that any
contribution from an electrodynamic effect associated with the horizon appears minimal.
The white contour is the stationary limit surface. There is no data clipping, so plot values
that exceed the limits of the color bar appear white.
– 5 –
Fig. 2.— The central engine. The left hand column is Bθ and the right hand column is ΩF
in KDJ, both averaged over azimuth, at (from top to bottom) t= 9840 M, t = 9920 M and
t= 10000 M. The relative units (based on code variables) are in a color bar to right of each
plot for comparison of magnitudes between the plots. The calculational boundaries are the
same as figure 1. The contours on the Bθ plots are of the density, scaled from the peak value
within the frame at relative levels 0.5 and 0.1. There is no data clipping, so plot values that
exceed the limits of the color bar appear white.
– 6 –
time component of the stress-energy tensor in a coordinate system can be expanded as,
T νt ;ν = (1/
−g)[∂(
−g T νt )/∂(x
ν)] + Γ
µ , where Γ
t β is the connection coefficient and
g = −(r2 + a2 cos2 θ)2 sin2 θ is the determinant of the metric. However, the Kerr metric has
a Killing vector (the metric is time stationary) dual to the B-L time coordinate. Thus, there
is a conservation law associated with the time component of the divergence of the stress-
energy tensor. Consequently, if one expands out the inhomogeneous connection coefficient
term in the expression above, it will equate to zero. The conservation of energy evaluated
in B-L coordinates reduces to, ∂(
−g T νt )/∂(x
ν) = 0, where the four-momentum −T νt has
two components: one from the fluid, −(T νt )fluid, and one from the electromagnetic field,
−(T νt )EM. The reduction to a homogeneous equation with only partial derivatives is the
reason why the global conservation of energy can be expressed in integral form in (3.70) of
Thorne et al (1986). It follows that the poloidal components of the redshifted Poynting flux
are Sθ = −
−g (T θt )EM and S
r = −
−g (T rt )EM. We can use these simple expressions to
understand the primary source of the Poynting jet in KDJ. Figure 1 is a plot of Sθ (on the
left) and Sr (on the right) in KDJ at the last three time steps of data collection. Each frame
is the average over azimuth of each time step. This greatly reduces the fluctuations as the
accretion vortex is a cauldron of strong MHD waves. The individual φ = constant slices show
the same dominant behavior, however it is embedded in large MHD fluctuations. On the left
hand column of figure 1, density contours have been superimposed on the images to indicate
the location of the equatorial accretion flow. The density is evaluated in B-L coordinates
with contours at 0.5 and 0.1 of the peak value within r < 2.5M . Notice that in all three left
hand frames, Sθ is created primarily in regions of very high accretion flow density. In all
three of the right hand frames of figure 1, there is an enhanced Sr that emanates from the
ergosphere (defined by the interior of the stationary limit, rs = M +
M2 − a2 cos2 θ, note
that there are 40 grid points between r = 1.203M and rs at θ = π/2). This radial energy
beam diminishes precipitously just outside the horizon, near the equatorial plane in all three
time steps. The region in which Sr diminishes is adjacent to a region of strong Sθ that orig-
inates in the inertially dominated accretion flow in the inner ergosphere, 1.2M < r < 1.6M
(this region is resolved by 28 radial grid zones). In fact, if one looks at the conservation of
energy equation, the term ∂(Sθ)/∂θ is sufficiently large to be the source of ∂(Sr)/∂r at the
base of the radial beam in all three frames. This does not preclude the transfer of energy
to and from the plasma. It merely states that the magnitude is sufficient to source Sr. In
general, the hydrodynamic energy flux is negligible in the funnel. In order to illustrate this,
contours of Sθ are superimposed on the color plots of Sr. The contour levels are chosen to
be 2/3 and 1/3 of the maximum value of Sθ emerging from the dense equatorial accretion
flow. One clearly sees Sθ switching off where Sr switches on. We conclude that a vertical
Poynting flux created in the equatorial accretion flow is the source of the strong beams of
Sr. This establishes condition 1 of the Introduction.
– 7 –
The left column of figure 2 contains plots of the magnetic field component, Bθ ≡ Frφ, at
the three time steps. At every location in which Sθ is strong in figure 1, there is a pronounced
enhancement in Bθ in figure 2. Recall that the sign of Sθ is not determined by the sign of Bθ.
These intense flux patches penetrate the inertially dominated equatorial accretion flow in all
three frames. The density contours indicate that the regions of enhanced vertical field greatly
disrupt the equatorial inflow. As noted in the introduction, a GHM interaction is likely to
occur when the magnetic field impedes the inflow in the ergosphere. The regions of large
Bθ are compact compared to the global field configuration of the jet, only ∼ 1.0M − 2.0M
long. Considering the turbulent, differentially rotating plasma in which they are embedded,
these are most likely highly enhanced regions of twisted magnetic loops created by the MRI.
The strength of Bθ at the base of the flares is comparable to, or exceeds the radial magnetic
field strength. The situation is clearly very unsteady and vertical flux is constantly shifting
from hemisphere to hemisphere. The time slice t = 10000 M, although primarily a southern
hemisphere event, also has a significant contribution in the northern hemisphere (see the
blue fan-like plume of vertical Poynting flux in figure 1). The GHM interaction is provided
by the vertical flux that links the equatorial plasma to the relatively slowly rotating plasma
of the magnetosphere within the accretion vortex. The vertical flux transmits huge torsional
stresses from the accretion flow to the magnetosphere.
Further corroboration of this interpretation can be found by looking at the values of ΩF
in the vicinity of the Sr flares. In a non-axisymmetric, non-time stationary flow, there is still
a well defined notion of ΩF : the rate at which a frame of reference at fixed r and θ would
have to rotate so that the poloidal component of the electric field, E⊥, that is orthogonal
to the poloidal magnetic field, BP , vanishes. This was first derived in Punsly (1991) (see
the extended discussion in Punsly (2001) for the various physical interpretations), and has
recently been written out in B-L coordinates in Hawley and Krolik (2006) in terms of the
plasma three-velocity, vi and the Faraday tensor as
ΩF = v
φ − Fθr
rFφθ + gθθv
(Fφθ)2grr + (Frφ)2gθθ
. (2-1)
This expression was studied in the context of the simulation KDH, a/M = 0.95, in Hawley and Krolik
(2006). They found that a long term time and azimuth average yielded ΩF ≈ 1/3ΩH and
there was no enhancement at high latitudes as was anticipated by Phinney (1983). The t
= 10000 M time slice of KDH was generously provided to this author. At t = 10000 M,
there are no strong flares emerging from the equatorial accretion flow. Inside the funnel at
r < 10M , at t=10000 M, 0 < ΩF < 0.5ΩH .
The right hand column of figure 2 is ΩF plotted at three different time steps for KDJ.
By comparison to figure 1, notice that each flare in Sr is enveloped by a region of enhanced
– 8 –
ΩF , typically 0.7ΩH < ΩF < 1.2ΩH . The regions of the funnel outside the ergosphere are
devoid of large flares in Sr and typically have 0 < ΩF < 0.5ΩH , similar to what is seen in
KDH.. Unlike KDH, there are huge enhancements in ΩF at lower latitudes in the funnel.
It seems reasonable to associate this large difference in the peak values of ΩF in KDJ and
KDH (at t= 10000 M) with the spatially and temporally coincident flares in Sr that occur
in KDJ. Furthermore, this greatly enhanced value of ΩF indicates a different physical origin
for ΩF in the flares than for the remainder of the funnel or in KDH at t = 10000 M. The
most straightforward interpretation is that it is a direct consequence of the fact that the
flares originate on magnetic flux that is locked into approximate corotation with the dense
accreting equatorial plasma (i.e., the inertially dominated equatorial plasma anchors the
magnetic flux). In the inner ergosphere, frame dragging enforces 0.7ΩH < dφ/dt < 1.0ΩH
on the accretion flow. This establishes condition 2 of the Introduction.
3. Discussion
In this Letter we showed that in the last three data dumps of the 3-D MHD numerical
simulation, KDJ, the dominant source of Poynting flux originated near the equatorial plane
deep in the ergopshere. The phenomenon is unsteady and is triggered by large scale vertical
flux that is anchored in the inertially dominated equatorial accretion flow. The situation
typifies the ergospheric disk in virtually every aspect, even though there is an intense accre-
tion flow. There is one exception, unlike the ergospheric disk, the anchoring plasma rarely
achieves the global negative energy condition that is defined by the four-velocity, −Ut < 0,
because of the flood of incoming positive energy plasma from the accretion flow. The plasma
attains −Ut < 0 only near the base of the strongest flares seen in the φ = constant slices.
The switch-on of a powerful beam of Sr outside the horizon at r ≈ 1.3M in the a/M =
0.995 simulation, KDE, of Krolik et al (2005) was demonstrated in Punsly (2006a). It
seems likely the the source of Sr in KDE is Sθ from an ergopsheric disk. The ergospheric
disk appears to switch on at a/M > 0.95 as evidenced by the factor of 3 weaker Poynting
flux in KDH. Furthermore, if the funnel opening angle at the horizon in KDH at t= 10000
M is typical within ±5◦ then figure 5 and table 4 of Hawley and Krolik (2006) indicate
that only 35% to 40% of the funnel Poynting flux at large distances is created outside the
horizon during the course of the simulation. A plausible reason is given by the plots of Bθ
in figure 2. The vertical magnetic flux at the equatorial plane is located at r < 1.55M . The
power in the ergospheric disk jet ∼ [Bθ(SA)(ΩH)]2, where SA is the proper surface area
of the equatorial plane threaded by vertical magnetic flux (Semenov et al 2004; Punsly
2001). The proper surface area in the ergospheric equatorial plane increases dramatically
– 9 –
at high spin, diverging at a = M . For example, between the inner calculational boundary
and 1.55 M the surface area is only significant for a/M > 0.95 and grows quickly with a/M ,
exceeding twice the surface area of the horizon for a/M = 0.99. Thus, if Bθ in the inner
ergosphere were independent of spin to first order, then a strong ergospheric disk jet would
switch-on in the 3-D simulations at a/M > 0.95. Note that if the inner boundary were truly
the event horizon instead of the inner calculational boundary then this argument would
indicate that the ergospheric disk would likely be very powerful even at a/M = 0.95 and the
switch-on would occur at a/M ≈ 0.9. The implication is that a significant amount of large
scale magnetic flux threading the equatorial plane of the ergopshere (which implies a large
black hole spin based on geometrical considerations) catalyzes the formation of the most
powerful Poynting jets around black holes. Thus, we are now considering initial conditions
in simulations that are conducive to producing significant vertical flux in the equatorial plane
of the ergosphere.
It should be noted that 2-D simulations from a similar initial state of torii threaded by
magnetic loops have been studied in McKinney and Gammie (2004). However, the magnetic
flux evolution can be much different in this setting as discussed in Punsly (2006a) and poloidal
flux configurations conducive to GHM could be highly suppressed. In summary, there are
no interchange instabilities, so flux tubes cannot pass by each other or move around each
other in the extra degree of freedom provided by the azimuth. Thus, there is a tendency
for flux tubes to get pushed into the hole by the accretion flow. This is in contrast to the
formation of the ergospheric disk in Punsly and Coroniti (1990) in which buoyant flux tubes
are created by reconnection at the inner edge of the ergospheric disk and recycle back out
into the outer ergosphere by interchange instabilities. Ideally, a full 3-D simulation with a
detailed treatment of resistive MHD reconnection is preferred for studying the relevant GHM
physics.
I would like to thank Jean-Pierre DeVilliers for sharing his deep understanding of the
numerical code and these simulations. I was also very fortunate that Julian Krolik and John
Hawley were willing to share their data in the best spirit of science.
REFERENCES
Blandford, R. and Znajek, R. 1977, MNRAS. 179, 433
De Villiers, J-P., Hawley, J., Krolik, 2003, ApJ 599 1238
De Villiers, J-P., Hawley, J., Krolik, J.,Hirose, S. 2005, ApJ 620 878
– 10 –
De Villiers, J-P., Staff, J., Ouyed, R.. 2005, astro-ph 0502225
Hawley, J., Krolik, K. 2006, ApJ 641 103
Hirose, S., Krolik, K., De Villiers, J., Hawley, J. 2004, ApJ 606, 1083
Krolik, K., Hawley, J., Hirose, S. 2005, ApJ 622, 1008
McKinney, J. and Gammie, C. 2004, ApJ 611 977
McClintock, J.E. et al 2006, ApJ 652, 518
Phinney, E.S. 1983, PhD Dissertation University of Cambridge.
Punsly, B., Coroniti, F.V. 1990, ApJ 354 583
Punsly, B. 1991, ApJ 372 424
Punsly, B. 2001, Black Hole Gravitohydromagnetics (Springer-Verlag, New York)
Punsly, B. 2006, MNRAS 366 29
Punsly, B. 2006, ApJL 651 L17
Punsly, B. 2007, MNRAS 374 10
Semenov, V., Dyadechkin, S. and Punsly, B. 2004, Science 305978
Thorne, K., Price, R. and Macdonald, D. 1986, Black Holes: The Membrane Paradigm (Yale
University Press, New Haven)
This preprint was prepared with the AAS LATEX macros v5.2.
http://arxiv.org/abs/astro-ph/0502225
	Introduction
	The KDJ Simulation
	Discussion
ABSTRACT
  This Letter reports on 3-dimensional simulations of Kerr black hole
magnetospheres that obey the general relativistic equations of perfect
magnetohydrodynamics (MHD). In particular, we study powerful Poynting flux
dominated jets that are driven from dense gas in the equatorial plane in the
ergosphere. The physics of which has been previously studied in the simplified
limit of an ergopsheric disk. For high spin black holes, $a/M > 0.95$, the
ergospheric disk is prominent in the 3-D simulations and is responsible for
greatly enhanced Poynting flux emission. Any large scale poloidal magnetic flux
that is trapped in the equatorial region leads to an enormous release of
electromagnetic energy that dwarfs the jet energy produced by magnetic flux
threading the event horizon. The implication is that magnetic flux threading
the equatorial plane of the ergosphere is a likely prerequisite for the central
engine of powerful FRII quasars.

<|endoftext|><|startoftext|>
Introduction
	2. Tableau facts and proof of the Main Theorem
	2.1. Tableau sliding
	2.2. Proof of the rule
	3. An extended example of the main theorem
	Acknowledgments
	References
ABSTRACT
  The classical Littlewood-Richardson coefficients C(lambda,mu,nu) carry a
natural $S_3$ symmetry via permutation of the indices. Our "carton rule" for
computing these numbers transparently and uniformly explains these six
symmetries; previously formulated Littlewood-Richardson rules manifest at most
three of the six.

<|endoftext|><|startoftext|>
Submitted to Physical Review Letters
Two-scale structure of the electron dissipation region during collisionless magnetic
reconnection
M. A. Shay∗
Department of Physics & Astronomy, 217 Sharp Lab, University of Delaware, Newark, DE 19716
J. F. Drake, M. Swisdak
University of Maryland, College Park, MD, 20742
(Dated: November 1, 2018)
Particle in cell (PIC) simulations of collisionless magnetic reconnection are presented that demon-
strate that the electron dissipation region develops a distinct two-scale structure along the outflow
direction. The length of the electron current layer is found to decrease with decreasing electron
mass, approaching the ion inertial length for a proton-electron plasma. A surprise, however, is that
the electrons form a high-velocity outflow jet that remains decoupled from the magnetic field and
extends large distances downstream from the x-line. The rate of reconnection remains fast in very
large systems, independent of boundary conditions and the mass of electrons.
PACS numbers: Valid PACS appear here
Magnetic reconnection drives the release of magnetic
energy in explosive events such as disruptions in labo-
ratory experiments, magnetic substorms in the Earth’s
magnetosphere and flares in the solar corona. Recon-
nection in these events is typically collisionless because
reconnection electric fields exceed the Dreicer runaway
field. Since magnetic field lines reconnect in a boundary
layer, the “dissipation region”, whose structure may limit
the rate of release of energy, understanding the structure
of this boundary layer and its impact on reconnection
is critical to understanding the observations. Because
of their ability to carry large currents the dynamics of
electrons continues to be a topic of interest. Early sim-
ulations of reconnection suggested that the rate of re-
connection was not sensitive to electron dynamics [1, 2]
and this insensitivity was attributed to the coupling to
whistler dynamics at the small spatial scales of the dis-
sipation region [3, 4]. The results of more recent kinetic
PIC simulations have called into question these results
by suggesting that the electron current layer stretches
along the outflow direction and the rate of reconnection
drops[5, 6]. The fast rates of reconnection obtained from
earlier simulations[1, 3, 7] were attributed to the influ-
ence of periodicity[5].
We present particle-in-cell (PIC) simulations with var-
ious electron masses and computational domain sizes and
an analytic model that demonstrate that collisionless re-
connection remains fast even in very large collisionless
systems. The reconnection rate stabilizes before the pe-
riodicity of the boundary conditions can impact the dy-
namics. The electron current layer develops a distinct
two-scale structure along the outflow direction that had
not been identified in earlier simulations. The out-of-
plane electron current driven by the reconnection elec-
∗Electronic address: shay@udel.edu;
URL: http://www.physics.udel.edu/~shay
tric field has a length that decreases with the electron
mass, scaling as (me/mi)
3/8, which extrapolates to about
an ion inertial length di = c/ωpi for the electron-proton
mass ratio. The surprise is that a jet of outflowing elec-
trons with velocity close to the electron Alfven speed cAe
extends up to several 10’s of di from the x-line. Remark-
ably, the electrons are able to jet across the magnetic field
over such enormous distances because momentum trans-
port transverse to the jet effectively “blocks” the flow of
the out-of-plane current in this region. The momentum
transport causing this “current blocking” effect has the
same source (the off diagonal pressure tensor[1]), but is
much stronger than that which balances the reconnection
electric field at the x-line.
Our simulations are performed with the particle-in-cell
code p3d [8, 9]. The results are presented in normal-
ized units: the magnetic field to the asymptotic value of
the reversed field, the density to the value at the cen-
ter of the current sheet minus the uniform background
density, velocities to the Alfvén speed vA, lengths to
the ion inertial length di, times to the inverse ion cy-
clotron frequency Ω−1ci , and temperatures to miv
A. We
consider a system periodic in the x− y plane where flow
into and away from the x-line are parallel to ŷ and x̂,
respectively. The reconnection electric field is parallel
to ẑ. The initial equilibrium consists of two Harris cur-
rent sheets superimposed on a ambient population of uni-
form density. The reconnection magnetic field is given by
Bx = tanh[(y − Ly/4)/w0] − tanh[(y − 3Ly/4)/w0] − 1,
where w0 and Ly are the half-width of the initial current
sheets and the box size in the ŷ direction. The electron
and ion temperatures, Te = 1/12 and Ti = 5/12, are ini-
tially uniform. The initial density profile is the usual Har-
ris form plus a uniform background of 0.2. The simula-
tions presented here are two-dimensional,i.e., ∂/∂z = 0.
Reconnection is initiated with a small initial magnetic
perturbation that produces a single magnetic island on
each current layer.
We have explored the dependence of the rate of recon-
http://arxiv.org/abs/0704.0818v1
mailto:shay@udel.edu
http://www.physics.udel.edu/~shay
FIG. 1: (color online). Reconnection electric field versus time:
(a) 204.8 × 102.4, (b) 102.4 × 51.2, (c) 51.2 × 25.6. w0 is the
initial current sheet width.
nection on the system size in a series of simulations with
three different system sizes and three different mass ra-
tios. For mi/me = 25, the grid scale ∆ = 0.05 and the
speed of light c = 15. For mi/me = 100, ∆ = 0.025 and
c = 20. For mi/me = 400, ∆ = 0.0125 and c = 40. The
reconnection rate versus time is plotted for our simula-
tions in Fig. 1. The reconnection rate is determined by
taking the time derivative of the total magnetic flux be-
tween the x-line and the center of the magnetic island.
The rate increases with time, undergoes a modest over-
shoot that is more pronounced in the smaller domains,
and approaches a quasi-steady rate of around 0.14, in-
dependent of the domain size. Earlier suggestions [5]
that reconnection rates would plunge until elongated cur-
rent layers spawned secondary magnetic islands are not
borne out in these simulations. The rates of reconnec-
tion approach constant values even in the absence of sec-
ondary islands, which for anti-parallel reconnection typ-
ically only occur transiently due to initial conditions[10].
Even these transient islands can be largely eliminated by
a suitable choice of the initial current layer width w0 (a
larger value of w0 is required for the larger domains).
A critical issue is whether the periodicity in the x direc-
tion can influence the rate of reconnection [5]. In each of
the simulations we have identified the time at which the
ion outflows from the x-line meet at the center of the mag-
netic island. This occurs at t ≈ 155 for the largest simu-
lation shown in Fig. 1a. The plasma at the x-line can not
be affected by the downstream conditions until t ≈ 255,
when a pressure perturbation can propagate back up-
stream to the x-line at the magnetosonic speed. This is
well after the end of the simulation. The electrons are
ejected from the x-line at a velocity of around cAe ≫ cA
and therefore might be able to follow field lines back to
the x-line. During the traversal time δt = Lx/cAe, the
amount of reconnected flux is vinB0L/cAe, where vin is
the inflow velocity into the x-line. Using the conservation
of the canonical momentum in the z-direction, the condi-
tion that an electron with a velocity cAe can not cross this
flux to access the x-line reduces to L > di(cA/vin) ∼ 7di,
which is easily satisfied for the simulations in Fig. 1. The
fact that the reconnection rates for all of the simulation
domains in Fig. 1 are essentially identical further sup-
ports this conclusion.
Also shown in Fig. 1 in the dashed lines are the rates of
reconnection for mi/me = 100 in (b) and mi/me = 400
in (c). Consistent with simulations in smaller domains [1,
2], the rate of reconnection is insensitive to the electron
mass.
We now proceed to explore the structure of the electron
current layer. Shown in Fig. 2 is a blow-up around the x-
line of the out-of-plane electron velocity for mi/me = 25
and two simulation domains, 204.8 × 102.4 in (a) and
51.2 × 25.6 in (b), and for mi/me = 400 in a simula-
tion domain of 51.2 × 25.6 in (c). All of the data is
taken in the phase where the reconnection rate and the
lengths of the region of intense out-of-plane current are
stationary. Reconnection forms intense current layers
that have a well-defined length (half widths of around
7di and independent of the size of computational domain
formi/me = 25) and then open up forming the open out-
flow jet that characterizes Hall reconnection [3, 7]. The
current layer in the case of mi/me = 400 in Fig. 2c is dis-
tinctly shorter than the smaller mass ratio current layers
in Fig. 2a,b, suggesting that the length of the electron
current layer depends on the electron mass and would be
shorter for realistic proton-electron mass ratios.
Shown in Fig. 3a is a blow-up around the x-line of
the electron outflow velocity vex for the mi/me = 25,
204.8 × 102.4 run corresponding to Fig. 2a. In contrast
with the out-of-plane current the electrons form an out-
flow jet that extends a very large distance downstream
from the x-line. This outflow jet continued to grow in
length until the end of the simulation. This simulation,
along with others at differing mass ratios, reveals that the
peak outflow velocity is very close to the electron Alfven
speed [8, 11]. One might expect that because of the colli-
mation of the outflow jet and its length, the reconnection
rate would drop. However, this is not the case. While
there is an intense jet in the core of the reconnection
exhaust, the exhaust as a whole quickly begins to open
up downstream of the current layer (Jz). The jet itself
therefore does not act as a nozzle to limit the rate of
FIG. 2: (color online). Blowups around the x-line of the out-
of-plane electron velocity for: (a)mi/me = 25, simulation size
204.8× 102.4, (b) mi/me = 25, 51.2× 25.6, and (c) mi/me =
400, 51.2 × 25.6.
FIG. 3: (color online). Blowups around the x-line for system
size 204.8× 102.4 with mi/me = 25. (a) The electron outflow
velocity vex. (b) Momentum flux vectors, Γ = pexzx̂ + peyzŷ
(vectors in box surrounding x-line are multiplied by 20), with
a background color plot of | (Ez + (ve ×B/c)z)/Ez | .
reconnection: the rate of reconnection remains constant
even as the length of the outflow jet varies in time.
To understand how the electrons can form such an ex-
tended outflow jet while the out-of-plane current layer
remains localized, we examine the out-of-plane compo-
nent of the fluid electron momentum equation along the
symmetry line of the outflow direction. In steady state
Ez = −
mevex
vexBy −
∇ · Γ, (1)
where ve is the electron bulk velocity, Γ = pexzx̂+peyzŷ
is the flux of z-directed electron momentum in the recon-
nection plane (not including convection of momentum)
with pe the electron pressure tensor. In Fig. 4a we plot
all of the terms in this equation along a cut though the
x-line along the outflow direction from a simulation with
mi/me = 100 and Lx × Ly = 102.4 × 51.2. The data
has been averaged between t = 116.2 and t = 117.0.
The electric field (black) is balanced by the sum (red)
of the electron inertia (dashed blue), the Lorentz force
(solid blue) and the divergence of the momentum flux
(green). The major contributions to momentum balance
come from the Lorentz force and the divergence of the
momentum flux. At the x-line the electric field drive
is balanced by the momentum transport [1, 12]. The
surprise is that the Lorentz force, rather than simply in-
creasing downstream from the x-line to balance the re-
connection electric field, instead strongly overshoots the
reconnection electric field far downstream of x-line. This
tendency was seen in earlier simulations [12] but there
was no clear separation of scales because of the small size
of these earlier simulations. Downsteam from the x-line
the electrons are streaming much faster than the mag-
netic field lines. Thus, in a reference frame of the moving
electrons the z-directed electric field has reversed direc-
tion compared with the x-line. This electric field tries to
drive a current opposite to that at the x-line. Evidence
for this reversed current appears downstream of the x-line
in Fig. 2c. In spite of the strength of the effective elec-
tric field, the reversed current carried by the electrons
is small. As at the x-line, the momentum transfer to
electrons in this extended outflow region is balanced by
momentum transport. The momentum flux around the
x-line is shown as a 2-D vector plot in Fig. 3 for the same
run as in (a). The momentum flux has been multiplied by
20 in the box surrounding the x-line. The data for this
figure has been averaged between t = 172.5 and 174.5.
The background color plot is of | (Ez+(ve×B/c)z)/Ez |,
which is & 1 where the electrons are not frozen-in. Evi-
dent is the outward flow of momentum around the x-line
and the much stronger outward flow of negative momen-
tum in an extended downstream region. The momentum
transport is so large that the out-of-plane current down-
stream is effectively “blocked”. The force associated with
this “blocking effect” drives the flow of the large-scale jet
of electrons downstream of the x-line.
We define the length ∆x of the inner dissipation re-
gion as the distance from the x-line to the point where
the Lorentz force vexBy/c crosses the reconnection elec-
tric field Ez. At this location the effective out-of-plane
electric field seen by the electrons reverses sign, causing
the electron current jez to be driven in reverse, which
allows the separatrices to open up. Thus, the inner dis-
sipation region defines the spatial extent of the magnetic
nozzle that develops during reconnection. Since the sim-
ulations presented in this paper use artificial values of
me, it is essential to understand the me scaling of ∆x so
that this important length can be calculated for a proton-
electron plasma. The momentum equation of electrons
in the outflow direction yields a steady state equation for
ex) =
vezBy, (2)
where vez ∼ cAe. Thus, the profile of By along the out-
flow direction and its dependence on me must be deter-
mined. This profile is shown formi/me = 25 (system size
102.4× 51.2), 100 (102.4× 51.2) and 400 (51.2× 25.6) in
Fig. 4b. Surprisingly, the profile of By is apparently in-
dependent of me. Our original expectation was because
of the continuity of the flow of magnetic flux into and
out of the x-line that By ∼ B0vin/cAe ∝ m
e , where the
outflow velocity eventually rises to cAe. However, since
the electrons are not frozen into the magnetic field until
far downstream, the expected scaling fails. To calculate
vex we approximate By by a linear ramp and integrate
Eq. (2). Setting the Lorentz force equation to the re-
connection electric field, we then obtain an equation for
)3/8 (
)1/2 (
diBy′
di. (3)
For the three simulations shown in Fig. 4b the simula-
tions yield 2.9di, 1.8di and 1.0di for mi/me = 25, 100
and 400, respectively, which is in reasonable accord with
the scaling. Extrapolating to a mass-ratio of 1836, we
predict ∆x ∼ 0.6di. In contrast the outer dissipation
region can extend to 10’s of di.
We have shown that the electron current layer that
forms during reconnection stabilizes at a finite length,
independent of the periodicity of the simulation domain,
and aside from transients from initial conditions remains
largely stable to secondary island formation. Reconnec-
tion remains fast with normalized reconnection rates of
around 0.14. The length of the electron current layer ∆x
scales as m
e . Since the width δ of the current layer
scales with the electron skin depth c/ωpe, the aspect-
ratio δ/∆x ∝ (me/mi)
1/8. Extrapolating from our
mi/me = 400 simulations to mi/me = 1836 should not
significantly change the aspect-ratio and we therefore ex-
pect the current layer to remain stable for real mass ra-
tios.
The structure of the current layer is important to
the design of NASA’s magnetospheric multiscale mission
(MMS), which will be the first mission with the time reso-
lution to measure the electron current layers that develop
during reconnection. The length of the out-of-plane elec-
tron current layer projects to around c/ωpi for a proton-
electron plasma while the the outflow jet, which supports
a strong Hall (out-of-plane) magnetic field, extends 10’s
of c/ωpi from the x-line.
FIG. 4: (color online). Results for simulation size 102.4×51.2
with mi/me = 25 and 100; and 51.2×25.6 with mi/me = 400.
(a) Cuts through the x-line of the contributions to Ohm’s law
for mi/me = 100. 1 → −me/eve · ∇vez, 2 → −ẑ · ve × B/e,
3 → −ẑ · (∇ ·Pe)/(nee), 4 → sum of 1,2,3. (b) Cuts through
x-line of By for the three different mi/me.
This work was supported in part by NSF, NASA and
Acknowledgments This work was supported in part
by NASA and the NSF. Computations were carried out
at the National Energy Research Scientific Computing
Center.
[1] M. Hesse et al., Phys. Plasmas 6, 1781 (1999).
[2] M. A. Shay and J. F. Drake, Geophys. Res. Lett. 25,
3759 (1998).
[3] J. Birn et al., J. Geophys. Res. 106, 3715 (2001).
[4] B. N. Rogers et al., Phys. Rev. Lett. 87, 195004 (2001).
[5] W. Daughton et al., Phys. Plasmas 13, 072101 (2006).
[6] K. Fujimoto, Phys. Plasmas 13, 072904 (2006).
[7] M. A. Shay et al., Geophys. Res. Lett. 26, 2163 (1999).
[8] M. A. Shay et al., J. Geophys. Res. 106, 3751 (2001).
[9] A. Zeiler et al., J. Geophys. Res. 107, 1230 (2002),
doi:10.1029/2001JA000287.
[10] J. F. Drake et al., Geophys. Res. Lett. 33, L13105 (2006),
doi:10.1029/2006GL025957.
[11] M. Hoshino et al., J. Geophys. Res. 106, 25979 (2001).
[12] P. L. Pritchett, J. Geophys. Res. 106, 3783 (2001).
ABSTRACT
  Particle in cell (PIC) simulations of collisionless magnetic reconnection are
presented that demonstrate that the electron dissipation region develops a
distinct two-scale structure along the outflow direction. The length of the
electron current layer is found to decrease with decreasing electron mass,
approaching the ion inertial length for a proton-electron plasma. A surprise,
however, is that the electrons form a high-velocity outflow jet that remains
decoupled from the magnetic field and extends large distances downstream from
the x-line. The rate of reconnection remains fast in very large systems,
independent of boundary conditions and the mass of electrons.

<|endoftext|><|startoftext|>
Accepted by The Astrophysical Journal
Preprint typeset using LATEX style emulateapj v. 03/07/07
POSITION–VELOCITY DIAGRAMS FOR THE MASER EMISSION COMING FROM A KEPLERIAN RING
Lucero Uscanga,
Centro de Radioastronomı́a y Astrof́ısica, Universidad Nacional Autónoma de México and
Apartado Postal 3-72, 58089 Morelia, Michoacán, Mexico
Jorge Cantó,
Instituto de Astronomı́a, Universidad Nacional Autónoma de México and
Apartado Postal 70-264, 04510 México, DF, Mexico
Alejandro C. Raga
Instituto de Ciencias Nucleares, Universidad Nacional Autónoma de México and
Apartado Postal 70-543, 04510 México, DF, Mexico
Accepted by The Astrophysical Journal
ABSTRACT
We have studied the maser emission from a thin, planar, gaseous ring in Keplerian rotation around
a central mass observed edge-on. The absorption coefficient within the ring is assumed to follow a
power law dependence with the distance from the central mass as, κ = κ0r
−q. We have calculated
position-velocity diagrams for the most intense maser features, for different values of the exponent q.
We have found that, depending on the value of q, these diagrams can be qualitatively different. The
most intense maser emission at a given velocity can either come mainly from regions close to the inner
or outer edges of the amplifying ring or from the line perpendicular to the line of sight and passing
through the central mass (as is commonly assumed). Particularly, when q > 1 the position-velocity
diagram is qualitatively similar to the one observed for the water maser emission in the nucleus of the
galaxy NGC 4258. In the context of this simple model, we conclude that in this object the absorption
coefficient depends on the radius of the amplifying ring as a decreasing function, in order to have
significant emission coming from the inner edge of the ring.
Subject headings: galaxies: individual (NGC 4258) — galaxies: nuclei — masers
1. INTRODUCTION
The a priori probability of seeing a thin disk nearly
edge-on is very small. It is given by p ≃ 0.125 (h/R)2,
where h is the thickness of the disk and R is its radius.
Typically h/R ≃ 0.01 and thus p ≃ 1.25 × 10−5. Sur-
prisingly however, the maser emission observed in sev-
eral cosmic sources has been successfully modeled as
coming from a ring or truncated disk in Keplerian ro-
tation (around a massive object) seen edge-on. For in-
stance: circumstellar disks in star-forming regions as in
S255 (Cesaroni 1990) and MWC 349 (Ponomarev et al.
1994), and also circumnuclear disks around black holes
of galactic nuclei as in NGC 4258 (Watson & Wallin
1994; Miyoshi et al. 1995). In general, the maser emis-
sion from a Keplerian disk observed edge-on produces a
triple-peaked spectrum (Elmegreen & Morris 1979); but
Ponomarev et al. (1994) showed that, there is a transi-
tion from triple- to double-peaked spectra as the width
of the amplifying ring decreases.
NGC 4258 is a Seyfert 2/LINER located at a dis-
tance of 7.2 ± 0.3 Mpc (Herrnstein et al. 1999). The
water maser emission (22 GHz) toward this galaxy was
first detected by Claussen et al. (1984). Shortly after-
wards it was shown that the water masers are confined
in a very small region (∼1.3 pc) at the center of NGC
4258 (Claussen & Lo 1986). Subsequently, Nakai et al.
(1993) discovered water maser emission with velocity off-
Electronic address: l.uscanga@astrosmo.unam.mx
Electronic address: raga@nucleares.unam.mx
sets ±1000 km s−1 from the already known emission at
the galactic systemic velocity of ≃472 km s−1. They sug-
gested that the high-velocity emission could arise from
masers orbiting a massive central black hole, or ejected
in a bipolar outflow. Using the Very Long Baseline Array
(VLBA), Miyoshi et al. (1995) simultaneously observed
the systemic and high-velocity water maser emission in
NGC 4258, finding that the spatial distribution and line-
of-sight velocities of the water masers trace a thin molec-
ular ring in Keplerian rotation around a massive black
hole of 3.6×107 M⊙ seen nearly edge-on. The position-
velocity (PV) diagram for the maser emission shows dis-
tinct Keplerian orbits (with deviations < 1%) defined by
the high-velocity maser emission that arises on the ring
diameter perpendicular to the line of sight, as well as a
line traced by the systemic maser emission that arises
from material on the inner edge of the amplifying ring,
this linear dependence is a consequence of the change in
the line-of-sight projection of the rotation velocity.
By monitoring both systematic and high-velocity wa-
ter maser emission of NGC 4258 over periods of sev-
eral years with different radio telescopes, a signif-
icant centripetal acceleration was observed only for
the maser features near the galactic systemic velocity.
The systemic maser features drift at a mean rate of
∼9 km s−1yr−1 (Haschick et al. 1994; Greenhill et al.
1995; Nakai et al. 1995; Bragg et al. 2000) while the
high-velocity maser features drift by .1 km s−1yr−1
(Greenhill et al. 1995). In a recent spectroscopic
study, Bragg et al. (2000) detected accelerations for
http://arxiv.org/abs/0704.0819v1
mailto:l.uscanga@astrosmo.unam.mx
mailto:raga@nucleares.unam.mx
2 USCANGA, CANTÓ, & RAGA
the high-velocity features in the range of −0.77 to
0.38 km s−1 yr−1. These measurements indicate that
the systemic water masers lie within a relatively nar-
row range of radii, on the near side of the ring at the
proximity of its inner edge, while the high-velocity wa-
ter masers are located near the ring diameter (between
−13.o6 and 9.o3 of the mid-line, Bragg et al. 2000). In
addition, the deviation of the high-velocity masers from
a straight line passing through the systemic masers in
the plane of the sky suggests that the rotating disk is
slightly warped (Herrnstein et al. 1996, 1999, 2005).
Previously, Watson & Wallin (1994) demonstrated
that the maser emission from a rapidly rotating, thin
Keplerian ring viewed edge-on can reproduce the general
features of the observed 22 GHz radiation from the nu-
cleus of NGC 4258, including the high-velocity satellites.
However, it is important to point out that their assump-
tion of a uniform absorption coefficient within the ampli-
fying ring results in a PV diagram for the most intense
masers that is qualitatively different from the observed
one. While their model predicts that the maser emission
at velocities around the systemic velocity of the galaxy
comes mainly from the outer edge of the ring, the ob-
servations indicate that this emission is actually coming
from the inner parts of the truncated disk.
In this paper we show that this discrepancy can be
resolved if the absorption coefficient decreases with dis-
tance from the central mass. The model is presented in
§2. The main results are described in §3. Finally, the
conclusions are discussed in §4.
2. MODEL
We study the maser emission that arises from a thin,
planar, gaseous ring in Keplerian rotation around a mas-
sive central object when it is observed edge-on. The mas-
ing gas is located between R0 and R, the inner and outer
radii of the ring, respectively. For simplicity, we assume
that the disk is transparent to the maser radiation at
radii smaller than R0 and greater than R, although the
inner region is probably thermalized due to the higher
gas density, and actually it would absorb a significant
fraction of the maser radiation produced in the far side
of the ring (see §4). The absorption coefficient is assumed
to follow a power law function of the distance from the
central mass within the amplifying ring as, κ = κ0r
The distances are measured in units of R, and the veloc-
ities are measured in units of vout, the rotation velocity
at the outer edge of the ring (see Figure 1).
For the case of an unsaturated maser and neglecting
the spontaneous emission, the intensity of the maser ra-
diation from a line of sight with impact parameter y at
a velocity vr is
I(vr, y) = I0e
τ(vr,y) , (1)
where the optical depth or gain along the line of sight is
given by
τ(vr , y) = 2 κ0
∫ xmax
(x2+y2)−q/2 exp
−(v − vr)2
xmin =
r20 − y2 for 0 ≤ |y| ≤ r0 ,
0 for r0 < |y| ≤ 1 ,
xmax =
1− y2 .
The line-of-sight velocity component of the gas at the
position (x, y) can be expressed as v = y/(x2 + y2)3/4.
Here I0 and ∆vD are the background intensity and the
Doppler width, respectively, which are supposed to be
uniform inside the amplifying ring. The Doppler width
∆vD is related with the FWHM of the velocity distribu-
tion of the emitting particles as ∆vD = FWHM/
4 ln 2.
We have numerically solved equations (1) and (2), and
we have also calculated the y-positions (impact parame-
ters) of maximum maser intensity for each specific value
of the velocity vr. When we have found two local max-
ima, we have kept both. With this information, we have
constructed the PV diagrams using the positions of the
observer’s line of sight with maximum emission at each
velocity. This way to construct the PV diagrams was
previously used by Uscanga et al. (2005).
We show the results using the following values for the
model parameters which seem to be appropriate for mod-
eling maser emission in the galaxy NGC 4258. The back-
ground intensity is I0 = 1.3 × 10−5 Jy beam−1, corre-
sponding to a radio continuum source with a temper-
ature of 106 K (Watson & Wallin 1994). The dimen-
sionless inner radius r0 = 0.51, using the estimated val-
ues for the inner and outer radii of 4.1 and 8.0 mas re-
spectively, given by Miyoshi et al. (1995). The Doppler
width ∆vD = 0.007vout which combined with an outer
rotation velocity of 770 km s−1 (Miyoshi et al. 1995),
gives a Doppler width ≃ 5 km s−1, similar to the value
used by Watson & Wallin (1994). We have used some
representative values of the exponent q, specifically q =
0, 1/2, 15/8, 5 for Models I, II, III and IV, respec-
tively. In Model I, we study the simplest situation of a
uniform absorption coefficient. In Model III, we choose
q = 15/8, that corresponds to the density dependence
with the radius of an accretion disk, i.e., Frank et al.
(1992). Finally, in Models II and IV, we explore two
other different values of the exponent q in order to study
how it changes the results. In all the models, the value
for the absorption coefficient κ0 is mainly determined
by the requirement that the intensity at the peak of the
central component (13 Jy beam−1) is compatible with
the observational data when the background intensity is
1.3×10−5 Jy beam−1. Other values of I0, r0, ∆vD, and
κ0 give qualitatively similar results.
We present the results in the next section; but let us
first discuss briefly some important concepts in order to
understand these results. In general, the observed emis-
sion at a given velocity coming from a specific position in
a nebula has contributions of the whole material along
the line of sight. However, when the velocity gradient
along the line of sight is greater than the dispersion ve-
locity (thermal or turbulent) of the emitting material,
the main contribution to the emission is actually coming
from a narrow region around the point with a line-of-
sight velocity equal to the observation velocity. The esti-
mated width of the region is 2l, where l is the correlation
distance defined as
l ≡ ∆vD
|dv/dx|
, (3)
here dv/dx is the line-of-sight velocity gradient. In this
approximation, known as Sobolev’s approximation or the
approximation of high velocity gradient, the observed in-
tensity is given by the following expression
I(vr) = I0(vr)e
−τ(vr) + S(vr)(1 − e−τ(vr)) , (4)
where I0 is the background intensity, S is the source func-
tion and τ is the optical depth given by
τ(vr) = κ(2l) , (5)
where κ is the absorption coefficient.
For the case of maser emission, the value of κ is in-
trinsically negative and τ is also negative, therefore the
factor e−τ(vr) becomes an amplification factor. Because
of this reason, the relative contribution at a given veloc-
ity of the correlation region is even more important with
respect to the remainder of the emitting material than
in the case of non-maser emission. Consequently, the ap-
proximation given by equations (4) and (5) is suitable
for maser emission.
As shown in the next section, for a gaseous ring of in-
ner radius R0 and outer radius R in Keplerian rotation
and seen edge-on, the emission either comes preferen-
tially from the inner or outer edges of the ring or from
the line perpendicular to the line of sight and passing
through the ring center. In the first two cases, it is easy
to show that the expected PV diagram will be a straight
line. When the emission comes from the outer edge, the
slope of the straight line is equal to one (measuring the
distances in units of the outer radius of the ring and the
velocities in units of the rotation velocity at that point),
whereas if the emission comes from the inner edge, the
slope of the straight line is equal to 1/r
0 . On the other
hand, when the emission arises from the line perpendic-
ular to the line of sight, the PV diagram will be a curve
with the form 1/y1/2, where y is the impact parameter
of the observation (see Figure 2).
3. RESULTS
The PV diagrams for the maser emission peak are
point-symmetric, consequently we only discuss positive
velocities from now on (see Figures 3 and 4).
• Model I (q = 0) – With this value of the expo-
nent q, we are considering the simplest situation,
a uniform or constant absorption coefficient. The
strongest maser emission either comes mostly from
the outer edge of the ring at velocities lower than
1, or from the mid-line of the ring perpendicular
to the line of sight and passing through the cen-
tral mass at greater velocities. The filled squares,
circles, and triangles mark the regions of strongest
maser emission at each velocity.
• Model II (q = 1/2) – The results are qualitatively
similar to those of Model I.
• Model III (q = 15/8) – This value of exponent q
corresponds to the density dependence with radius
of an accretion disk (ρ ∝ r−15/8). The strongest
maser emission either comes mainly from the inner
edge of the ring at low velocities (velocities near
the systemic velocity), or from the outer edge at
velocities close to 1. On the other hand, at ve-
locities greater than 1, the most intense emission
comes predominantly from the mid-line of the ring
perpendicular to the line of sight.
• Model IV (q = 5) – The strongest maser emission
comes mainly from the inner edge of the ring at
velocities lower than 1. At greater velocities, the
most intense emission can either come mainly from
the inner edge or from the mid-line of the ring per-
pendicular to the line of sight.
In summary, from the results of Models I–IV (see Fig-
ure 3), we found that the most intense maser emission
can be around the inner or outer edges of the ring, or
the mid-line of the ring perpendicular to the line of sight
depending on the velocity and also on the value of q. In
fact, the PV diagrams are qualitatively different when
q < 1 or q > 1. In the first case, for q < 1 (including
the simplest situation with a uniform absorption coeffi-
cient, q = 0) and vr < 1, the PV diagram corresponds
to a straight line with slope 1; for vr > 1, the diagram
corresponds to a Keplerian curve. In the second case,
for q > 1 and vr < 1, the PV diagram corresponds to a
straight line with a slope that depends on the inner ra-
dius of the ring. At velocities close to 1 the slope changes
to 1; for vr > 1, the diagram corresponds to a Keplerian
curve and also a straight line with a slope that depends
on the inner radius under circumstances such as in Model
It is also important to realize that when q > 1, the
optical depth or gain presents two local maxima within
a certain velocity range. Either local maxima may be a
global maximum. For vr < 1, the local maximum can be
either at the inner and/or outer edges of the ring, while
for vr > 1, they are located at the inner edge and/or
mid-line of the ring (see Figure 5).
As shown in the top panels of Figure 5 (Model III,
q = 15/8), the relative difference between the two lo-
cal maxima is not very significant. However, when the
value of q is higher (like in Model IV, q = 5 shown in
the bottom panels) the relative difference becomes more
important.
In order to estimate the velocity vc at which the global
maximum of the optical depth changes its locus, we have
calculated analytical approximations for the largest value
of the optical depth or gain that corresponds to the max-
imum intensity at the inner and outer edges of the ring,
and also at the mid-line of the ring perpendicular to the
line of sight. The detailed calculations are presented in
the Appendix. The following equations give the local
maximum depth as a function of the velocity in each
neighborhood
τ(vr) ≃
πκ0∆vD
1− r0v2r
inner edge , (6)
τ(vr) ≃
πκ0∆vD
1− v2r
outer edge , (7)
τ(vr) ≃
r mid-line . (8)
The velocity vc is estimated by combining equations
(6) and (7), or equations (6) and (8) according to the
value of vc (when vc < 1 or vc > 1, respectively). The
results are
2(1−q)
0 − 1
2(1−q)
0 − r0
for vc < 1, (9)
4 USCANGA, CANTÓ, & RAGA
2(1−q)
0 − v
c + r0v
c = 0 for vc > 1, (10)
which are presented in Figure 6 for some representative
values of the exponent q.
The bottom plot of Figure 6 shows vc as function of
the inner radius r0 for different values of q, from equation
(9a). For q < 1, there is no solution to equation (9a).
When q = 1, vc = 0 for any value of r0. That is, the
optical depth has a maximum and its locus is around
the outer edge of the ring, and vc is meaningless as we
have defined it. When q > 1, the optical depth presents
two local maxima, and vc is different from zero and its
value depends on r0. This velocity corresponds to the
value at which the locus of the global maximum changes
from the inner edge to the outer edge of the ring. As a
consequence, there is a slope change in the PV diagrams
at velocities lower than the rotation velocity at the outer
edge of the ring. For instance, when q = 15/8 and r0 =
0.51, the slope change occurs at vc = 0.906. In other
words, the locus of the global maximum of the optical
depth changes from the inner to the outer edge of the
ring at this velocity vc.
The top plot of Figure 6 also shows vc as function of
the inner radius r0 using specific values of q and ∆vD in
equation (9b); in this case, 5 and 0.007vout, respectively.
As an example, when ∆vD = 0.007vout, q = 5 and r0 =
0.51, then vc = 1.077. Stated differently, at that velocity
vc, the largest value of both the optical depth and the
intensity changes its locus from the inner edge to the
mid-line of the ring perpendicular to the line of sight.
The remarkable water maser emission in the nucleus
of the galaxy NGC 4258 traces a PV diagram where the
detected emission around the systemic velocity of the
galaxy comes from the inner edge of the amplifying ring;
this emission delineates a straight line just as the straight
line that connects points C and D in Figure 2 (see Figure
3 of Miyoshi et al. 1995). According to our model re-
sults, this implies that the absorption coefficient within
the molecular ring of NGC 4258 is not uniform, instead
it must be a decreasing function of the distance from the
central mass, i.e., κ = κ0r
−q with q > 1. Moreover, the
observed red/blue-shifted emission at high velocities that
arises from the mid-line of the ring perpendicular to the
line of sight traces a Keplerian curve such as is indicated
by the model results (see Figure 7). Simply stated, when
q > 1 the PV diagram is qualitatively similar to the one
observed for the water maser emission detected in the
nucleus of NGC 4258.
As an example, in Figure 7 we show a comparison be-
tween the results of Model III (q = 15/8) and the water
maser emission in NGC 4258. The detected emission
arises from the inner edge of the amplifying ring and the
mid-line perpendicular to the line of sight. The locus of
the observed maser emission coincides with the locus of
the most intense maser emission as indicated by the sizes
of the circles in Figure 7. The model results indicate that
there is emission coming from the outer edge of the ring
at velocities close to 1, nevertheless the sizes of the circles
indicate that this emission is very weak. Maybe maser
emission is not detected from this locus for this reason.
Additionally, our model results also indicate that the
intense maser emission at the inner edge of the ring ex-
tends neither to velocities very different from the sys-
temic velocity nor to impact parameters very different
from zero, as is indicated by the size of the circles in the
PV diagram shown in Figure 7. Furthermore, accord-
ing to the size of the circles, the other locus of intense
maser emission is the mid-line of the ring perpendicular
to the line of sight, precisely the locus of the red/blue-
shifted maser emission at high velocities that describes
Keplerian curves in the PV diagram.
4. DISCUSSION AND CONCLUSIONS
In our model, we have assumed that the gas in the re-
gion inside the masing ring is transparent to the maser
radiation. This implies that the most intense maser emis-
sion at low velocities (velocities near the systemic veloc-
ity) comes mainly from the outer edge of ring (for q < 1)
or from the inner edge (for q > 1), either the near or
far side of the ring, as is indicated in Figure 4. Mea-
surements of positive acceleration of the maser emission
around the systemic velocity show that this emission cer-
tainly comes from the near side of the ring at the prox-
imity of its inner edge (e.g., Greenhill et al. 1995). If we
suppose that the gas inside the masing ring is thermal-
ized probably due to its higher density then an important
fraction of the maser emission from the backside of the
ring would be absorbed and the detected emission would
come from the front side of the ring at the outer or inner
edge depending on the value of q. For instance, consider-
ing absorption and emission from the gas located inside
the masing ring, the difference in the intensity for a line
of sight that passes through both the inner absorbing
region and the front side of the masing ring from the in-
tensity for a line of sight that passes through both the
backside of the masing ring and the inner absorbing re-
gion is S(1−e−τ2)(eτ1−1) where S is the source function
of the gas inside the masing ring, τ2 is the optical depth
in this region, and τ1 is the optical depth for the front
side of the ring. If τ2 >> τ1, then the detected emission
would be the radiation amplified by the front side of the
masing ring.
Also, we have made a simplifying assumption about
the geometry of the masing ring in NGC 4258, consider-
ing that the amplifying ring is strictly flat. Despite the
observations indicate an apparent warp in the maser dis-
tribution of this galaxy, Kartje et al. (1999) presented a
model in which the disk does not require to be physically
warped in order to the masing gas become exposed to
the central continuum radiation. In this scenario, dusty
clouds provide the shielding of the high-energy contin-
uum, which is required for the gas to remain molecu-
lar. They found that a flat-disk model of the irradiated
ring could be applied to a source like NGC 4258 only
if the water abundance is higher than the value implied
by equilibrium photoionization-driven chemistry. A very
important result from their study (based on radiative
and kinematic considerations) was that, even if the disk
in NGC 4258 is warped, the maser-emitting gas must be
clumpy, instead of homogeneous as in the scenario pre-
viously proposed by Neufeld & Maloney (1995).
An important result of our model shows that the as-
sumption, commonly used, that considers a uniform or
constant absorption coefficient within the masing ring
in Keplerian rotation around the nucleus of NGC 4258
is not appropriate. For example, Wallin et al. (1998)
supposed that κ was constant considering that the locus
of the maser emission from NGC 4258 was determined
mainly by the velocity gradients in a Keplerian velocity
field indicating some uniformity of κ, at least on length
scales comparable to the coherence or correlation length
resulting from the Keplerian velocity gradients. On the
contrary, from our analysis, we conclude that a constant
absorption coefficient would result in a PV diagram qual-
itatively different from the observed one, since the most
intense maser emission would come predominantly from
a narrow region close to the outer edge of the ring instead
of a narrow region close to the inner edge of the ring, as
indicated by the observations. Necessarily, the absorp-
tion coefficient must be a decreasing function of distance
from the central mass (i.e., κ = κ0r
−q with q > 1) to
have significant emission coming from the inner edge of
the amplifying ring and hence explain the form of the PV
diagram delineated by the water masers in NGC 4258.
When comparing our edge-on disk model with the
observations of NGC 4258, it is clear that we need a
κ ∝ r−q radial dependence for the absorption coefficient
with q > 1 (so as to favour the emission from the in-
ner edge of the disk, see above) in order to reproduce
the observations. In reality, the fact that the disk of
NGC 4258 is warped introduces geometrical effects which
might favour the inner disk edge emission (over the one of
the outer edge). One will need to compute more complex,
3D transfer models to see whether or not these geomet-
rical effects are sufficient to explain the PV diagrams of
the NGC 4258 masers without introducing the radially
dependent absorption coefficient which is required by the
edge-on disk models described in the present paper.
J. C. and A. C. R. acknowledge support from CONA-
CyT grants 41320 and 43103, and DGAPA-UNAM. L.
U. acknowledges support from DGAPA-UNAM. We sin-
cerely thank J. M. Torrelles and Y. Gómez for useful
comments, which contributed to improve an earlier ver-
sion of this manuscript. L. U. gives special thanks to
M. R. Pestalozzi and M. Elitzur for valuable comments
on this work. We also thank an anonymous referee for
helpful comments on the manuscript.
APPENDIX
ANALYTICAL APPROXIMATIONS FOR THE OPTICAL DEPTH
In this appendix, we describe how to obtain the analytical approximations for the optical depth given by equations
(6)–(8).
First, we define w = (v − vr)/∆vD then we can change the variables in equation (2) and rewrite it as
τ(vr , y) = 2κ0
∫ wmax
(x2 + y2)−q/2
exp(−w2)dw , (A1)
where
wmin =
0 − vr
wmax =
y − vr
Note that wmin > wmax since r0 ≤ 1. Using the expressions for the line-of-sight velocity v = y/(x2 + y2)3/4 and the
previously defined variable w = (v− vr)/∆vD, we can write x =
(y/(vr + w∆vD))
4/3 − y2
. At zero order around
w = 0 we obtain
(x2 + y2)−q/2
≃ − 2∆vD(y/vr)
2(2−q)/3
(y/vr)4/3 − y2
. (A2)
Additionally
∫ wmax
exp(−w2)dw =
erf(wmax)− erf(wmin)
, (A3)
where erf(w) is the error function, defined as erf(w) ≡ (2/
exp(−t2)dt. Finally, substituting equations (A2)
and (A3) into (A1), we obtain the approximation for the optical depth
τ(vr , y) ≃
πκ0∆vD(y/vr)
2(2−q)/3
(y/vr)4/3 − y2
erf(wmin)− erf(wmax)
. (A4)
Around the inner edge of the ring, vr ≃ y/r3/20 and the maximum value of [erf(wmin)− erf(wmax)] = 2. Substituting
these approximations into equation (A4), we obtain equation (6). Similarly, around the outer edge of the ring, vr ≃ y,
and the maximum value of [erf(wmin)− erf(wmax)] also equals 2. Then, equation (A4) reduces to equation (7) for the
optical depth at the outer edge of the ring.
In order to find an approximation for the local maximum of optical depth at the mid-line of the ring perpendicular
to the line of sight, we expand the expression for the velocity along the line of sight around x = 0 to obtain
v ≃ 1
, (A5)
6 USCANGA, CANTÓ, & RAGA
therefore
= −∆vD , (A6)
and thus
5/4 . (A7)
Then, using Sobolev’s approximation
τ = κ0(x
2 + y2)−q/2(2x) , (A8)
and substituting equation (A7) into (A8), we obtain the following expression
τ ≃ κ0
5/2 + y2
)−q/2 4
5/4 , (A9)
since y ≪ 1 and ∆vD is small, then 43∆vDy
5/2 ≪ y2, hence
τ ≈ 4
5/4−q , (A10)
considering that y = v−2r , we finally obtain the approximation for the local maximum of the optical depth at the
mid-line of the ring given by equation (8).
REFERENCES
Bragg, A. E., Greenhill, L. J., Moran, J. M., & Henkel, C. 2000,
ApJ, 535, 73
Cesaroni, R. 1990, A&A, 233, 513
Claussen, M. J., Heiligman, G. M., & Lo, K. Y. 1984, Nature, 310,
Claussen, M. J., & Lo, K.-Y. 1986, ApJ, 308, 592
Elmegreen, B. J., & Morris, M. 1979, ApJ, 229, 593
Frank, J., King, A., & Raine, D. 1992, Accretion Power in
Astrophysics (Cambridge University Press)
Greenhill, L. J., Henkel, C., Becker, R., Wilson, T. L., &
Wouterloot, J. G. A. 1995, A&A, 304, 21
Haschick, A. D., Baan, W. A., & Peng, E. W. 1994, ApJ, 437, L35
Herrnstein, J. R., Greenhill, L. J., & Moran, J. M. 1996, ApJ, 468,
Herrnstein, J. R., Moran, J. M., Greenhill, L. J., & Trotter, A. S.
2005, ApJ, 629, 719
Herrnstein, J. R., et al. 1999, Nature, 400, 539
Kartje, J. F., Königl, A., & Elitzur, M. 1999, ApJ, 513, 180
Miyoshi, M., Moran. J., Herrnstein, J., Greenhill, L., Nakai, N.,
Diamond, P., & Inoue, M. 1995, Nature, 373, 127
Nakai, N., Inoue, M., Miyazawa, K., Miyoshi, M., & Hall, P. 1995,
PASJ, 47, 771
Nakai, N., Inoue, M., & Miyoshi, M. 1993, Nature, 361, 45
Neufeld, D. A. & Maloney, P. R. 1995, ApJ, 447, L17
Ponomarev, V. O., Smith, H. A., & Strelnitski, V. S. 1994, ApJ,
424, 976
Uscanga, L., Cantó, J., Curiel, S., Anglada, G., Torrelles, J. M.,
Patel, N. A., Gómez, J. F., & Raga, A. C. 2005, ApJ, 634, 468
Wallin, B. K., Watson, W. D., & Wyld, H. W. 1998, ApJ, 495, 774
Watson, W. D., & Wallin, B. K. 1994, ApJ, 432, L35
R  /R00
Observer
Fig. 1.— Schematic diagram of a gaseous disk in Keplerian rotation. The masing gas exists between radii R0 and R. At radii smaller
than R0 and greater than R the disk is transparent to the maser radiation. The observer is on the plane of the disk. All the distances are
measured in units of R, the outer radius of the amplifying ring; therefore the variables x, y, and r0 are dimensionless.
8 USCANGA, CANTÓ, & RAGA
v  = y
1/2v  = 1/y
v  = y/r0
Fig. 2.— PV diagram for the maser emission of a gaseous ring with inner radius R0 and outer radius R in Keplerian rotation observed
edge-on. The straight line that connects points A and B has a slope equal to 1, while the straight line that connects points C and D has a
slope equal to 1/r
Fig. 3.— From top to bottom results of Models I, II, III, and IV. The left panels show PV diagrams for the maser emission peak. The
filled squares, circles, and triangles represent the strongest maser emission which is coming from regions a, b, and c, respectively, indicated
in Figure 4. The straight lines or curves represent the velocity dependences of the regions where this emission arises. The central panels
show PV diagrams for the maser emission peak. Because of the point-symmetric shape of these diagrams, only positive velocities are shown.
The radii of the open circles are proportional to the maximum maser intensity at each position and velocity. The right panels show the
logarithm of the ratio between the maximum intensity and the background intensity as a function of the velocity.
10 USCANGA, CANTÓ, & RAGA
:a v   = y
b : v   = 1 / y
x x x
b b b
Observer Observer
Observer
Model  IIIModels  I  and  II Model  IV
:c  v   = y / r0
Fig. 4.— Schematic representation of the results of Models I, II, III, and IV. The filled squares, circles, and triangles represent the most
intense maser emission that is coming from regions a, b, and c, respectively. These regions are very narrow because the correlation distance
is very small; that is, the width ∆vD is much smaller than the line-of-sight velocity gradient. Besides, the exponential amplification of the
intensity emphasizes small changes in the optical depth.
Fig. 5.— Left : PV diagrams for the maser emission in grey scale with intensity contours overlaid. The darker regions show the locus
of the strongest emission in these diagrams, which potentially could be detected, depending on the sensitivity cutoff of the observations.
Right : Close-up to the PV diagrams showing the maser emission at low velocities.
12 USCANGA, CANTÓ, & RAGA
Fig. 6.— vc as function of the inner radius of the ring, r0. Bottom: For vc < 1, it is computed from equation (9a) for some representative
values of q. For q < 1, there is no solution to equation (9a). Top: For vc > 1, it is computed from equation (9b). This is the solution to
equation (9b), using ∆vD = 0.007vout and q = 5.
Fig. 7.— Comparison between the calculated PV diagram for the maser emission peak in Model III (q = 15/8) and the PV diagram
delineated by the water masers observed in NGC 4258 (Miyoshi et al. 1995). The radii of the open circles are proportional to the maser
intensity at each position and velocity. The dots represent the observed maser spots in NGC 4258. We have subtracted the ring systemic
velocity of 476 km s−1 from the observed local standard of rest velocity of the maser spots in order to compare the observed PV diagram
with the modeled one. The positions and velocities are in units of the outer radius of the ring (8 mas) and the rotation velocity at the
outer edge of the ring (770 km s−1), respectively.
ABSTRACT
  We have studied the maser emission from a thin, planar, gaseous ring in
Keplerian rotation around a central mass observed edge-on. The absorption
coefficient within the ring is assumed to follow a power law dependence with
the distance from the central mass as, k=k0r^{-q}. We have calculated
position-velocity diagrams for the most intense maser features, for different
values of the exponent q. We have found that, depending on the value of q,
these diagrams can be qualitatively different. The most intense maser emission
at a given velocity can either come mainly from regions close to the inner or
outer edges of the amplifying ring or from the line perpendicular to the line
of sight and passing through the central mass (as is commonly assumed).
Particularly, when q>1 the position-velocity diagram is qualitatively similar
to the one observed for the water maser emission in the nucleus of the galaxy
NGC 4258. In the context of this simple model, we conclude that in this object
the absorption coefficient depends on the radius of the amplifying ring as a
decreasing function, in order to have significant emission coming from the
inner edge of the ring.

<|endoftext|><|startoftext|>
Microsoft Word - Boriskina_OL2007.DOC
Coupling of whispering-gallery modes in size-mismatched 
microdisk photonic molecules 
Svetlana V. Boriskina 
School of Radiophysics, V. Karazin Kharkov National University, Kharkov 61077, Ukraine 
Mechanisms of whispering-gallery (WG) modes coupling in microdisk photonic molecules (PMs) with slight and 
significant size mismatch are numerically investigated. The results reveal two different scenarios of modes interaction 
depending on the degree of this mismatch and offer new insight into how PM parameters can be tuned to control and 
modify WG-modes wavelengths and Q-factors. From a practical point of view, these findings offer a way to fabricate PM 
microlaser structures that exhibit low thresholds and directional emission, and at the same time are more tolerant to 
fabrication errors than previously explored coupled-cavity structures composed of identical microresonators. © 2007 
Optical Society of America. 
OCIS codes: 130.3120, 140.3410, 140.4780, 140.5960, 230.5750, 260.3160 
During the last decade, photonic molecules1 (clusters of 
electromagnetically-coupled optical microcavities) have gone 
a long way from a useful illustration of parallels between 
behavior of photons and electrons and now hold promise of 
new insights into physics of light-matter interactions and also 
of a variety of practical applications, including microlasers, 
tunable filters and switches, coupled-cavity waveguides, 
sensors, etc2-10. The simplest PM composed of two identical 
optical microcavities has been widely used as a test-bed to 
demonstrate shift and splitting of cavity modes and formation 
of a spectrum of bonding and anti-bonding PM supermodes1-
4. I have recently shown how arranging identical WG-mode 
microdisks into pre-designed high-symmetry configurations 
yields quasi-single-mode PMs with dramatically increased Q-
factors6, enhanced sensitivity to the environmental changes7, 
and/or directional emission patterns8. In all these structures, 
size uniformity of microcavities is an important issue in 
successful realization of PM-based optical components.  
The motivation for studying interactions of optical 
modes in a photonic molecule with size mismatch9,10 stems 
from two sources. First, precise and repeatable fabrication of 
identical microcavities, which in many cases are tiny 
structures having just several microns in diameter, is highly 
challenging. Second, a systematic study of double-cavity 
PMs with various degrees of cavities size mismatch can 
reveal new mechanisms of manipulating their optical 
properties thus paving way to improving or adding new 
functionalities to PM-based photonic devices. Such study has 
never been performed before, and is a focus of this letter. 
Despite its simplicity, the double-cavity structure provides 
useful insight into the general mechanisms of WG-modes 
coupling and offers new design ideas for more complex 
structures. The Muller boundary integral equations 
framework previously developed by the author to model PMs 
composed of identical cavities7 has been modified to study 
size-mismatched PMs. In the following, the term 
“microcavity mode” encompasses a complex value of the 
mode eigenfrequency and the corresponding eigenvector 
(i.e., modal spatial field distribution). 
The PM under study is composed of a pair of side-
coupled microdisks of radii RA, RB and refractive indices nA, 
nB separated by an airgap of width w (Fig. 1a). The 
microdisks of thicknesses much smaller than their 
diameters are considered. Thus, instead of the 3-D 
boundary value problem for the Maxwell equations, we 
solve its 2-D equivalent. In the following analysis, we 
search for the TE (transverse-electric) WG-modes, which 
are dominant in thin disks. At wavelength λ = 1.521 μm, a 
2-D cavity with radius 1.1 μm and effective refractive index 
63.2TE =effn  (2-D equivalent of a 200 nm-thick GaInAsP 
disk)2 supports WG8,1-mode with one radial field variation 
and eight azimuthal field variations (Q = 5243).  This mode 
(like all other WG-modes in circular cavities) is double-
degenerate due to the symmetry of the structure.  
WG-mode degeneracy is removed if two (or more) 
cavities are brought close together1-9, and four non-
degenerate WG-supermodes of different symmetry appear 
in the double-disk PM spectrum instead of every WG-mode 
of an isolated cavity. Fig. 2 (b and c) shows the wavelength 
migration and Q-factors change of these modes with the 
change of the radius of one of the cavities. The modes are 
labeled according to the symmetry of their field patterns 
along the y- and x- axes, respectively (e.g., OE supermode 
has odd symmetry with respect to y-axis and even 
symmetry with respect to x-axis). OE and OO modes are 
termed “anti-bonding” modes, while EO and EE modes are 
termed “bonding” ones. Bonding and anti-bonding 
supermodes group into nearly-degenerate doublets as seen 
in Fig. 1b. The values of real parts of eigenfrequencies of 
two modes in a doublet are so close to each other that they 
cannot be distinguished (Fig. 1b), while their imaginary 
parts differ, resulting in different Q-factors of these 
supermodes (Fig. 1c). Thus, in practice only two peaks are 
observed in a symmetrical double-cavity PM lasing 
spectrum (see Fig. 2 in Ref. 9), where the narrow high-
intensity peak corresponds to the high-Q anti-bonding mode 
doublet, and the wider low-intensity peak corresponds to the 
bonding one. 
0.8 1.0 1.2 1.4
(single disk)
Radius of microdisk B, R
 (μm)
 w (a) RA=1.1 μm RB 
Fig 1. (a) A geometry of a PM composed of two microdisks of different 
radii; (b) wavelengths migration and (c) Q-factors change of PM supermodes 
as a function of the radius of disk B (RA = 1.1 µm, w = 400 nm). The insets 
show the magnetic field distribution of the bonding (EE) and the anti-
bonding (OE) WG8,1 supermodes in the symmetrical (RA = RB = 1.1 µm) PM. 
Here and thereafter, corresponding characteristics of the WG8,1 mode of an 
isolated cavity with radius 1.1 µm are plotted for comparison (dash-dot line). 
1.05 1.10 1.15
1.05 1.10 1.15
 Superm
ode Q
-factor
Radius of microdisk B, R
 (μm)
Fig. 2. Supermodes wavelengths (left) and Q-factors (right) in the vicinity of 
anti-crossing point AC1 (RB = RA). Mode switching (see the modal near-field 
distributions at points A and B shown in the insets) at the anti-crossing point 
and Q-factor enhancement of one of the supermodes can be observed. 
Careful study of Figs. 1 b,c reveals a number of so-called 
exceptional points (corresponding to certain values of the 
varied parameter), where PM supermodes couple following 
either crossing (C) or avoided crossing (AC) scenarios. The 
behavior of wavelengths and Q-factors of the four 
supermodes in the vicinity of these exceptional points is 
shown in more detail in Figs. 2-4 (for the points AC1, AC2, 
and C1, respectively). The phenomenon of coupling of 
complex eigenvalues of matrices dependent on parameters 
under the change of these parameters is of a general nature 
and is observed in many physical systems11. Usually, 
frequency anti-crossing (crossing) is accompanied by 
crossing (anti-crossing) of the corresponding widths of the 
resonance states. Furthermore, at the points of avoided 
frequency crossing (points AC1-AC3 in Fig. 1), eigenmodes 
interchange their identities, i.e., Q-factors and field 
distributions.  
0.89 0.90 0.91 0.92
0.89 0.90 0.91 0.92
 Superm
ode Q
-factor
Radius of microdisk B, R
 (μm)
Fig. 3. Supermodes wavelengths (left) and Q-factors (right) in the vicinity 
of anti-crossing point AC2. Wavelengths repulsion accompanied by the 
linewidths crossing is observed. The insets demonstrate mode switching at 
the anti-crossing point (the modal near-field distributions shown in the 
upper(lower) insets correspond to the complex frequency values at the 
points labeled as A and D (B and C), respectively. 
In the context of coupling of WG-modes in microcavities, 
this interchange offers exciting new prospects for 
manipulating the PM optical characteristics, e.g. for 
realization of optical flip-flops9. For example, the coupling 
of modes with avoided frequency crossing scenario 
observed in Figs. 2 and 3 makes possible switching of field 
intensity between two microdisks. To realize such 
switching in practice, carrier-induced refractive index 
change of one of the disks induced by nonuniform pumping 
can be used. This effect was observed experimentally11 in a 
PM composed of nearly-identical microdisks (similar to the 
case shown in Fig. 2). If the microcavities are severely size-
mismatched, their WG-modes couple with the frequency 
crossing scenario. This situation is demonstrated in Fig. 4, 
and the numerical data indicate that such coupling may 
spoil significantly the Q-factors of the high-Q modes in the 
larger microdisk. However, optical coupling between 
cavities with strongly detuned WG-modes makes possible 
broad spectral transmission effects in coupled resonator 
optical waveguides (CROWs)10, coupled-resonator induced 
transparency12, and significant reducing of CROW bend 
radiation loses13. 
0.72 0.75 0.78
0.72 0.75 0.78
 Superm
ode Q
-factor
Radius of microdisk B, R
 (μm)
Fig. 4. Supermodes wavelengths (left) and Q-factors (right) in the vicinity of 
the crossing point C1. Wavelength crossing accompanied by damping of the 
high-Q supermodes is observed. The insets show supermodes near-field 
portraits at the crossing point. 
0.4 0.6 0.8
1.473
1.474
0.4 0.6 0.8
5x103
6x103
7x103
8x103
9x103
 Superm
ode Q
-factor
Disk-to-disk separation (μm)  
Fig. 5. Resonance wavelength (left) and Q-factor (right) of an anti-bonding 
WG-supermode in a three-disk PM. The central disk of radius 1.065 μm is 
separated from the side disks of radii 1.1 μm by airgaps of 400 nm width. 
Supermode near-field portrait and directional far-field emission pattern at the 
point labeled as A are shown in the inset. 
Finally, enhancement of the Q-factor of a single 
supermode in a double-disk PM in comparison to its single-
cavity value can be observed in Fig. 2 in a relatively wide 
range of cavities radii detuning: 14 nm < ΔR < 53 nm (ΔR = 
|RA - RB|). Note that all the other PM supermodes have 
significantly lower Q-factors in this range of the parameter 
change. This effect offers a way for selective enhancement of 
the Q-factor of a single supermode that (unlike symmetry-
enhanced Q-factor boost in polygonal PMs)6,7 does not rely 
on exact cavity size-matching. A possible realization of a 
PM-based structure designed by making use of this 
mechanism of selective mode enhancement is presented in 
Fig. 5. It consists of three coupled microcavities, with the 
central cavity radius detuned by ΔR = 35 nm from the side 
cavities radii. By adjusting the width of the airgaps between 
microcavities, noticeable Q-factor enhancement of one anti-
bonding supermode is achieved without shifting the 
supermode wavelength (Fig. 5). Furthermore, such PM 
demonstrates directional light emission, which cannot be 
achieved in isolated WG-mode microdisks (see inset to Fig. 
5). Our studies also indicate that this directional emission 
pattern is preserved if the disk-to-disk distance is varied. 
It should also be noted that other system parameters 
can be tuned to manipulate resonance wavelengths and Q-
factors of microcavities through mode coupling at 
exceptional points. Among these are: the refractive index of 
the cavity substrate, and the size and/or position of a hole 
pierced in the cavity, which can be adjusted to enhance a 
WG-mode Q- factor14,15 or to achieve directional emission 
on a high-Q WG-mode16. 
In summary, a comprehensive numerical study was 
performed to elucidate the mechanisms of modes coupling 
in PMs with various degrees of cavities size mismatch. The 
study offers an alternative approach to design novel PM-
based components with improved functionalities. In 
contrast to PM structures composed of identical cavities 
that may require fabrication accuracy beyond the 
capabilities of modern technology, the proposed approach 
does not rely on precise cavities size-matching to achieve 
the desired device performance. This approach paves the 
way for new designs of more complex PM structures and 
arrays, which may eventually lead to new capabilities and 
applications in micro- and nano-photonics. 
I wish to thank Vasily Astratov for discussions and Jan 
Wiersig for bringing his recent paper16 to my attention. This 
work has been partially supported by the NATO 
Collaborative Linkage Grant CBP.NUKR.CLG 982430. 
Svetlana Boriskina’s e-mail address is 
SBoriskina@gmail.com.  
References 
1. M. Bayer, T. Gutbrod, J. P. Reithmaier, A. Forchel, T. L. Reinecke, 
and P. A. Knipp, Phys. Rev. Lett. 81, 2582-2586 (1998). 
2. A. Nakagawa, S. Ishii, and T. Baba, Appl. Phys. Lett. 86, 041112 
(2005). 
3. E. I. Smotrova, A. I. Nosich, T. M. Benson, and P. Sewell, IEEE J. 
Select. Topics Quantum. Electron. 12(1), 78-85 (2006). 
4. Y. P. Rakovich, J. F. Donegan, M. Gerlach, A. L. Bradley, T. M. 
Connolly, J. J. Boland, N. Gaponik, and A. Rogach, Phys. Rev. A 70, 
051801(R) (2004). 
5. B. Moller, U. Woggon, and M. V. Artemyev, J. Opt. A: Pure Appl. 
Opt. 8, S113–S121 (2006). 
6. S. V. Boriskina, Opt. Lett. 31(3), 338-340 (2006). 
7. S. V. Boriskina, J. Opt. Soc. Am. B 23(8), 1565-1573 (2006). 
8. S. V. Boriskina, T. M. Benson, P. Sewell, and A. I. Nosich, IEEE J. 
Select. Topics Quantum Electron., 12(6), (2006). 
9. S. Ishii, A. Nakagawa, and T. Baba, IEEE J. Select. Topics Quantum. 
Electron. 12(1), 71-77 (2006). 
10. A. V. Kanaev, V. N. Astratov, and W. Cai, Appl. Phys. Lett. 88, 
111111 (2006). 
11. W. D. Heiss, Phys. Rev. E 61, 929–932 (2000). 
12. D. D. Smith, H. Chang, K. A. Fuller, A. T. Rosenberger, and R. W. 
Boyd, Phys. Rev. A 69, 063804 (2004). 
13. S. V. Pishko, P. Sewell, T. M. Benson, and S. V. Boriskina, submitted 
to J. Lightwave Technol. (2007). 
14. S. V. Boriskina, T. M. Benson, P. Sewell, and A. I. Nosich, J. 
Lightwave Technol. 20(8) 1563-1572 (2002). 
15. X.-S. Luo, Y.-Z. Huang, and Q. Chen, Opt. Lett. 31(8), 1073-1075 
(2006). 
16. J. Wiersig and M. Hentschel, Phys. Rev. A 73, 031802 (2006).
ABSTRACT
  Mechanisms of whispering-gallery (WG) modes coupling in microdisk photonic
molecules (PMs) with slight and significant size mismatch are numerically
investigated. The results reveal two different scenarios of modes interaction
depending on the degree of this mismatch and offer new insight into how PM
parameters can be tuned to control and modify WG-modes wavelengths and
Q-factors. From a practical point of view, these findings offer a way to
fabricate PM microlaser structures that exhibit low thresholds and directional
emission, and at the same time are more tolerant to fabrication errors than
previously explored coupled-cavity structures composed of identical
microresonators.

<|endoftext|><|startoftext|>
Spin solid phases of spin 1 and spin 3/2 antiferromagnets on a cubic lattice.
Karol Gregor and Olexei I. Motrunich
Department of Physics, California Institute of Technology, Pasadena, CA 91125
(Dated: October 30, 2018)
We study spin S = 1 and S = 3/2 Heisenberg antiferromagnets on a cubic lattice focusing on
spin solid states. Using Schwinger boson formulation for spins, we start in a U(1) spin liquid phase
proximate to Neel phase and explore possible confining paramagnetic phases as we transition away
from the spin liquid by the process of monopole condensation. Electromagnetic duality is used to
rewrite the theory in terms of monopoles. For spin 1 we find several candidate phases of which the
most natural one is a phase with spins organized into parallel Haldane chains. For spin 3/2 we find
that the most natural phase has spins organized into parallel ladders. As a by-product, we also
write a Landau theory of the ordering in two special classical frustrated XY models on the cubic
lattice, one of which is the fully frustrated XY model. In a particular limit our approach maps to
a dimer model with 2S dimers coming out of every site, and we find the same spin solid phases in
this regime as well.
PACS numbers:
I. INTRODUCTION
A simple, nontrivial, and physically common example
of a regular system of quantum objects is a collection of
spins on a lattice. This is easiest to analyze if the in-
teractions do not compete and all prefer the same spin
state; the resulting phases have been known for a long
time and include ferromagnetic and Neel states. A much
richer situation of current interest is when interactions
compete. The frustration together with quantum fluctu-
ations can destroy the magnetic order and produce spin
solid or spin liquid phases. In a spin solid, spins combine
into larger singlet objects such as valence bonds which
form an ordered pattern on a lattice. Such phases have
been found in nature,1,2,3 and also in numerical studies
of model Hamiltonians.4,5,6 A spin liquid, on the other
hand, is a featureless paramagnet, which can be crudely
viewed as a quantum superposition of many valence bond
configurations, thus the name “resonating valence bonds”
(RVB) state. So far there are only few experimental can-
didates, but on the theoretical side the existence of spin
liquids in many varieties and our understanding of them
is well established (see Ref. 7 for a recent collection of
references and also a very recent example of the so-called
Coulomb phase in 3d, which is the spin liquid relevant to
the present work).
In this paper we look for natural spin solid phases of
spin 1 and spin 3/2 on a cubic lattice. A direct study of
spin Hamiltonians that can stabilize such phases is diffi-
cult but can be done in some cases with Quantum Monte
Carlo. Which phases are realized will of course depend on
the specific model: For example, Refs. 4,5 found valence
bond solids in spin 1/2 systems with ring exchanges on
the square and cubic lattices. Refs. 6,8 found spin solid
phases for a spin 1 model with biquadratic interaction
on the anisotropic square lattice, but only magnetically
ordered phases on the isotropic square and cubic lattices.
Here we follow instead a more phenomenological
approach.9,10,11,12 A systematic and commonly used
route to achieve this, and the one we start with, is to
generalize the spins to a representation of higher symme-
try group, here taken to be SU(N).9,10 The problem can
be solved exactly in the N → ∞ limit and one can con-
sider fluctuations around this limit to get long distance
properties of the system. This approach, while difficult to
connect with the actual microscopic SU(2) spin system,
nevertheless gives us some guidance about what phases
to expect and gives us a form of the effective field theory.
Here it results in a gauge theory which naturally exhibits
deconfined (liquid) and confined (solid) phases, and we
expect that if a microscopic spin system has such phases,
they should be described by this theory.
FIG. 1: The most natural spin solid phase for S = 1 on the
cubic lattice. The thick lines denote links with large spin-spin
correlations suggesting that the spins organize into Haldane
chains along one lattice direction.
One of the spin liquid phases expected in 3d is the
so-called Coulomb phase. It is a compact U(1) gauge
theory coupled to matter in the deconfined phase, where
the matter fields (spinons) are gapped, gauge field (emer-
gent photon) is gapless, and monopoles (which arise due
http://arxiv.org/abs/0704.0821v1
FIG. 2: The most natural spin solid phase for S = 3/2 on the
cubic lattice. The drawn bold lines denote links with large
spin-spin correlations suggesting that the spins organize into
ladders.
to compactness) are gapped. In addition, importantly,
there are spin Berry phases that lead to the presence of a
background charge in the gauge theory formulation. This
makes the confined phases nontrivial in that they break
lattice symmetries and therefore correspond to various
spin solids. The transition occurs because the monopoles
condense, and the theory can be equivalently analyzed in
terms of them by employing standard electro-magnetic
duality. The background charge causes monopoles to
acquire a phase when they hop around a plaquette.12
This leads to a nontrivial monopole condensation pat-
tern, which then corresponds to a spin solid phase. In
2d the physics is similar, except that the monopoles are
instantons and they always proliferate, so there is no
Coulomb spin liquid. This approach was first used by
Read and Sachdev10 on the square lattice. The spin solids
for spin 1/2 on the cubic lattice were analyzed in Ref. 12
and near several different Coulomb spin liquids in Ref. 13.
Ref. 14 was led (in a different context) to a gauge theory
with background charges on a diamond lattice which was
attacked using analogous techniques.
For the spins on the cubic lattice, the analysis depends
only on the spin magnitude. Any case can be mapped
onto S = 0, 1/2, and 1 in 2d and S = 0, 1/2, 1, and
3/2 in 3d. Only the spin 1/2 case was considered so
far, but these results cannot be transferred to the other
spins since each requires a separate analysis. This is the
task of the present work. We find that the most nat-
ural phases for spin 1 and 3/2 are the ones shown on
Figures 1 and 2. In the S = 1 case the spins organize
into Haldane chains. This is easiest to understand in the
standard picture where we break spin 1 into two spin
1/2’s and form singlets with spin 1/2’s of spins on either
side. Similarly, in the S = 3/2 case we break spin 3/2
into three spin 1/2’s and form singlets on the bonds of
the ladders. Several approaches that we have taken and
used in different parameter regimes suggest the same spin
solid states, which gives us confidence that these phases
are very natural in the two cases.
II. SCHWINGER BOSONS, DUAL
REFORMULATION, AND A BASIC PHASE
DIAGRAM
A. Schwinger bosons
We begin by briefly reviewing the standard technique
of large N for spins.9,10 This maps (approximately) our
spin system into a theory of spinons coupled to a U(1)
gauge field in the presence of static background charges.
Our main work is the analysis of this theory, while the
purpose of the review here is to establish the connection
with the properties of the original spin system.
The basic steps in the derivation are as follows. We
generalize the SU(2) spin to SU(N) spin and denote it by
Sβα(i). We write the spins in terms of Schwinger bosons:
Sβα(i) = b
α(i)b
β(i) sublattice A ,
Sβα(j) = −b̄β†(j)b̄α(j) sublattice B , (1)
where the b, b̄’s are bosonic operators that transform un-
der the fundamental representation of SU(N) if the index
is on the top and under its conjugate if the index is on
the bottom. To get the Hilbert space of the spins we
need to restrict the boson occupations as
b†α(i)b
α(i) = nc ,
b̄α†(j)b̄α(j) = nc , (2)
where nc corresponds to the spin length. The SU(N)
spin Hamiltonian is
〈i,j〉
Sβα(i)S
β (j) , (3)
which reduces to the SU(2) Heisenberg spin model for
spin S when N = 2 and nc = 2S.
Next we write the system in the path integral pic-
ture, imposing the constraints (2) by Lagrange multi-
pliers. The spin interaction contains quartic terms; to
get action that is quadratic in the boson fields, we use
Hubbard-Stratonovich transformation and obtain
b†α(i)
+ iλ(i)
bα(i)− iλ(i)nc
b̄α†(j)
+ iλ(j)
b̄α(j)− iλ(j)nc
〈i,j〉
|Qij |2 −Q∗ijbα(i)b̄α(j) + h.c. (4)
The path integral goes over b, b̄, Q, λ.
We can now integrate out the b’s. The resulting ex-
pression will have coefficient N in front of it. At large
N it can be approximated by its saddle point value.
Our departing point is such a “mean field” with uni-
form Qr,r+m̂(τ) = Q̄ and λ(r, τ) = λ̄ and assuming
gapped b spectrum; this represents a Coulomb spin liq-
uid, which is a stable phase in three dimensions. The
effective theory is obtained by considering the fluctua-
tions of the fields, Qr,r+m̂(τ) = [Q̄ + qm(r, τ)]e
iαm(r,τ)
and λ(r, τ) = λ̄ + iα0(r, τ). Here r runs over all sites of
the cubic lattice and m̂ = x̂, ŷ, ẑ denotes one of the direc-
tions in 3d. The amplitude fields qm are massive, and so
are the fields αm and α0 near the wavevector (0, 0, 0). On
the other hand, the fields αm and α0 near the wavevector
(π, π, π) are massless and describe the gauge field (pho-
ton) of the Coulomb phase, am ≡ α(π,π,π)m , aτ ≡ α(π,π,π)0 .
For details of the derivation, see the original Ref. 10 (our
notation is slightly different compared to these papers,
which use a two-site unit cell labeling instead).
As emphasized in Refs. 10,11, we also have to con-
sider effect of Berry phases, which is crucial for the un-
derstanding of the spin solid states. A very convenient
encapsulation of the low-energy degrees of freedom and
the Berry phases is provided by the following re-latticized
Euclidean action:15,16
Daiµe
−Sa−SB , (5)
Sa = −β
i,µ<ν
cos(∇µaν −∇νaµ) ,
SB = i
ηiaiτ .
Here we have a compact U(1) gauge field a residing on
the links of a (3+1)d space-time lattice and described by
the action term Sa. The SB term comes from detailed
consideration of the Berry phases, and ηi is 2S on one
sublattice of the spatial lattice and −2S on the other
one. In the Hamiltonian language this has a simple in-
terpretation as a background charge of value 2S on one
sublattice and −2S on the other one:
H = u
E2rm − κ
r,m<n
cos(∇man −∇nam) , (6)
(∇ ·E)r = ηr = ±2S , (7)
where Em are electric fields residing on the links of the
3d cubic lattice and conjugate to am. Thus we obtained a
compact U(1) gauge theory in the presence of background
charge.17,18,19
Throughout, we will assume the spinons are gapped
and are integrated out. Note that even though we start
in the Coulomb phase where the gauge field is decon-
fined, the above action also provides access to confining
paramagnetic phases, and this will be our main focus.
To sum up, we will be describing spin solid phases that
are proximate to the simple Coulomb phase; the latter
with the specified Berry phases encoded in the staggered
background charge is in turn appropriate in the vicinity
of the conventional Neel phase.
Since we will continue with (5) we need a way to con-
nect the variables there to the original spin variables.
This is done as follows. The nearest neighbor spin-spin
correlation 〈Sr ·Sr′〉 is proportional to the bond variable
|Qrr′|2. To get the connection between the fluctuation
of the magnitude of Q and the gauge fields we have to
also keep the massive amplitude fields qm in the above
derivation when integrating out the b’s. One finds that
the (π, π, π) component couples to the gauge fields in
the action as follows: δS = iγ
(π,π,π)
m (∂maτ − ∂τam),
with some coupling parameter γ. On the other hand, in
the derivation of the path integral from the Hamiltonian
formulation of the gauge theory, the electric field is cou-
pled to the gauge field in the same way, i.e., via a term
Em(∂maτ − ∂τam) in the action. Thus the electric
field gives the fluctuation of the staggered nearest neigh-
bor spin-spin correlation function.
B. Electro-magnetic duality
We now proceed to the analysis of the model (5). We
are interested in the confining phases, which will neces-
sarily break lattice symmetries for spins S = 1/2, 1, 3/2
studied here. The confinement occurs due to condensa-
tion of monopoles. Therefore we would like to express
the theory in terms of them. This can be done by the
standard electro-magnetic duality. The duality maps the
theory of a compact U(1) gauge field without charges into
a theory of a noncompact gauge field coupled to charges
– the monopoles of the original theory. The noncompact-
ness comes from the fact that we have dropped the elec-
tric charges in the original theory; had we retained them,
we would have obtained a compact dual gauge field whose
monopoles would correspond to the original charges. The
new variables reside on the lattice dual to the original lat-
tice. The background charge of the original theory gives
rise to a static dual magnetic flux emanating out of the
center of each cube as drawn in Figure 3. This flux al-
ternates in sign from one cube to the next and frustrates
the monopole hopping. Therefore we obtain a theory of
monopoles with frustrated hopping that are coupled to
the dual noncompact gauge field.12 The duality can be
done explicitly with various approximations clearly dis-
played as is written in Appendix A.
Explicitly, the partition function is
DL e−Sdual , (8)
Sdual =
∑ (∂L)2
λ cos(L + L0 −∇θ) , (9)
where L is the dual gauge field, (∂L)µν = ∇µLν −∇νLµ
is the four dimensional curl, L0 is the frustration that re-
sults from the original background charge and ultimately
z even
z odd
/32πS
FIG. 3: a) Original background electric charges 2S and −2S
on the two sublattices give rise to the static dual magnetic
fluxes as seen by the monopoles. b) Gauge choice for L0 that
realizes these fluxes modulo 2π.
from Berry phases, and θ is the monopole field. A con-
venient choice of L0 that produces the appropriate static
fluxes is shown in Figure 3; all subsequent work is done
in this gauge.
The advantage of the dual formulation is that it has no
sign problem and can be in principle studied by Monte
Carlo. A sketch of the phase diagram is on Figure 4. In
the bottom left side of the diagram, the monopoles are
gapped and the system is in the deconfined phase, which
correspond to the Coulomb spin liquid in the spin model.
At large enough λ and 1/β, the monopoles condense.
They can condense in various patterns which translate
to various spin solid phases of the original model. Dual-
ity relates the original field theory (5) to the large λ part
of the dual theory. It is hard to analyze the transition in
the large λ limit. Instead we look at three different places
in this phase diagram. At 1/β = ∞ the system becomes
frustrated XY model. First we analyze the phase tran-
sition looking for ordering of the XY spins as we cross
the phase boundary to the ordered phase. This gives us
the most likely monopole condensation patterns near the
transition. Next we look at the classical ground state
of the XY model in the upper right corner of the phase
diagram as approached from 1/β = ∞. Finally we look
near the same point but in the limit λ ≫ 1/β.
FIG. 4: Sketch of the expected phase diagram for the dual
action Eq. (9).
III. ANALYSIS 1,2: FRUSTRATED XY MODEL
AT 1/β = ∞
A. Outline of the Analysis
In this section we describe in general terms the anal-
ysis in the 1/β = ∞ limit where the dual action Eq. (9)
reduces to a frustrated XY model. We look at the phase
transition and the classical ground state.
In the Analysis 1, we consider the transition in the
spirit of the Landau theory. We identify the relevant low-
energy fields, write the most general quartic potential
consistent with the symmetries, and study it in mean
field. The approach is the same for each spin S, but
the details are unique in each case and are contained
in Subsections III B and III C for spin 1 and spin 3/2
respectively (spin 1/2 was considered using this approach
in Ref. 12).
More explicitly, the mean field derivation is done as
follows. The mean field theory of XY spins is described
by the continuous soft spin action
|∂τΦR|2 −
〈RR′〉
(tRR′Φ
RΦR′ + c.c.) +
V (|φR|2)
with some potential V (|ΦR|2) = r0|ΦR|2 + u0|ΦR|4 + · · ·.
After crossing the transition from the disordered side, the
system enters a phase with non-zero ΦR that minimize
this action. The initial step is to minimize the kinetic
energy. This will turn out to have a three-dimensional
manifold of minima for spin 1 and four-dimensional one
for spin 3/2. One then expands around these minima
and writes all terms to a given order that are allowed by
symmetry. In both cases the degeneracy is lifted at the
fourth order. We will find that for spin 1 there are three
independent quartic terms and we manage to draw a gen-
eral phase diagram of the Landau theory. For spin 3/2,
there are five such terms and the parameter space is too
rich for us to describe the phase diagram completely. In
this case we confine ourselves to examining the potential
obtained from the most natural microscopic fourth order
term and determining its mean field phase.
In the Analysis 2, we find the classical ground state of
the frustrated XY model – the state in the upper right
point of the phase diagram Figure 4 – by a direct mini-
mization of the hard-spin action (9). We use the following
method: For some system size, we start with a random
configuration of spins. We pick a random spin and min-
imize its (local) energy and repeat this process until the
total energy converges. Different starting configurations
will lead to different final energies, because sometimes the
system gets stuck in some local minima. We repeat this
procedure for many starting configurations and also for
different system sizes. We then select the configurations
with the same lowest energy, which gives the absolute
minimum of the potential. The case of spin 3/2, which
corresponds to a fully frustrated XY model, was already
considered some time ago in Ref. 20, and our method
produces results in agreement with that work.
For both spin 1 and spin 3/2, we find that the clas-
sical ground state coincides with the most natural state
identified in the mean field theory near the transition.
This suggests that there is only one XY-ordered phase
along the 1/β = ∞ line in Fig. 4, which could in princi-
ple be tested in Monte Carlo studies of the corresponding
frustrated XY models.
We have described how to find the phases of the dual
action in the 1/β = ∞ limit. However we are interested
in the phases of the original spin model. To make the
connection we calculate the energies and staggered curls
of the monopole currents in the dual model and relate
them to variables in the original spin problem. These
variables are the plaquette energy and the bond expecta-
tion value respectively. This allows us to determine the
spin solid patterns.
The mapping of the first variable, the energy, is sim-
ple. Energy simply maps to energy. In the dual model
we can calculate the energy for each bond, which is
ǫ = 2Re(tRR′Φ
RΦR′). The center of a bond of the dual
lattice coincides with the center of a plaquette of the orig-
inal lattice, and so the calculated energy is the plaquette
energy of the original model.
The connection of the staggered curls of the monopole
current to the original bond variables is established
as follows. The monopole current is given by JM =
2Im(tRR′Φ
RΦR′). In terms of the original gauge the-
ory, Eq. (5), just as the electric current produces mag-
netic field, the magnetic current produces electric field.
The resulting electric field is given by the analog of Biot-
Savart law. However, approximately, if we have a loop of
the magnetic current, the electric field it produces in the
center is proportional to the circulation of the current,
which is what we call the curl of the monopole current.
As we described in the preceding section, the electric field
is proportional to the staggered fluctuation of the near-
est neighbor spin-spin correlation function, therefore the
claimed connection. We will use this extensively in the
detailed treatment of spin 1 and spin 3/2 below.
B. Results: Spin 1
1. Analysis 1: Phase transition of the XY model
Now we turn to finding the phases for spin 1. We
choose the gauge shown on Figure 3. In this case the
hopping amplitudes in (10) are given by
tR,R+x̂ =
(−1)z
3 + i(−1)x+y
, (11)
tR,R+ŷ =
(−1)z
3− i(−1)x+y
, (12)
tR,R+ẑ = 1 . (13)
The band structure has three minima and hence the space
of ground states of the kinetic energy is three dimen-
sional. Convenient choice of the basis is the following:
(−1)x+y+z − (−1)x+y
3 + i[(−1)z +
Ψ2 = i
(−1)x+y+z − (−1)x+y
3− i[(−1)z +
Ψ3 = −
(−1)y + i(−1)x√
A general kinetic energy ground state can be written as
Φ(R) = φ1Ψ1(R) + φ2Ψ2(R) + φ3Ψ3(R) (14)
with complex fields φ1,2,3. This degeneracy will be lifted
by nonlinear terms. To find out how, we would like to
write the Landau theory for the φ’s, including all terms
that are allowed by symmetry. Thus we need to find how
the φ’s transform under the lattice symmetries.
The generators of the symmetries are the translations
by one lattice spacing in the x,y,z directions, 90 degree
rotations around the x,y,z axes (it suffices to consider two
out of three rotations), and mirror reflections. Note that
the fluxes seen by the monopoles (and encoded in the
complex phases of the hopping amplitudes tRR′) change
sign under unit translations. The original spin problem
is translationally invariant, and this is represented in the
dual action (10) as follows. The fluxes remain unchanged
if the t’s are also conjugated after the translation, and
there is a gauge transformation that brings such mod-
ified t’s to the original themselves. The action of the
symmetry on the field Φ is then a combined application
of the translation of the coordinates, conjugation, and
gauge transformation. Similar considerations apply for
the 90 degree rotations performed here about the dual
lattice axes. After carrying through this analysis, the
transformation properties of φ’s are remarkably simple:
Tx Ty Tz Rx Ry Rz
φ1 → – + + + φ∗3 φ∗2
φ2 → + – + φ∗3 + φ∗1
φ3 → + + – φ∗2 φ∗1 +
In this table, “+” or “−” stands for φi → φ∗i or
φi → −φ∗i respectively. We see that φ1 can be loosely
associated with the x direction, φ2 with y, and φ3 with z.
We should also point out that under mirror symmetries
in the dual lattice planes the fields transform simply
φi → φi.
There is only one invariant term at the quadratic level:
V (2) = m(|φ1|2 + |φ2|2 + |φ3|2) , (15)
where m is a constant. There are three independent al-
lowed terms at the quartic level, and the most general
quartic potential can be written in the form
V (4) = u(|φ1|2 + |φ2|2 + |φ3|2)2
+ v(|φ1|4 + |φ2|4 + |φ3|4)
+ w(φ∗21 φ
2 + φ
3 + φ
1 + c.c.) , (16)
where u, v, w are constants.
To find the phases of the Landau theory, we simply
need to minimize this potential. Before we start describ-
ing the phases, however, it is useful to introduce bilinears
of the fields. The reason is that these are gauge indepen-
dent objects whereas the form of Ψ1,2,3 and hence the
transformation properties of φ1,2,3 are gauge dependent.
We consider the following bilinears:
B0 = |φ1|2 + |φ2|2 + |φ3|2 ,
(|φ1|2 + |φ2|2 − 2|φ3|2) ,
F2 = |φ1|2 − |φ2|2 ,
Dx = φ
3φ2 + φ
2φ3 ,
Dy = φ
1φ3 + φ
3φ1 ,
Dz = φ
2φ1 + φ
1φ2 ,
Nx = i(φ
3φ2 − φ∗2φ3) ,
Ny = i(φ
1φ3 − φ∗3φ1) ,
Nz = i(φ
2φ1 − φ∗1φ2) .
The B0 and the groups of F ’s, D’s and N ’s form ir-
reducible representations of dimensions 1, 2, 3, and 3
respectively. The transformation properties of these bi-
linears are displayed in the following table
Tx Ty Tz Rx Ry Rz
B0 + + + + + +
F1 + + + − 12F1 +
F2 − 12F1 −
F2 + + +
Dx + − − + Dz Dy
Dy − + − Dz + Dx
Dz − − + Dy Dx +
Nx − + + + Nz Ny
Ny + − + Nz + Nx
Nz + + − Ny Nx +
We should also add that all bilinears transform
trivially under mirror symmetries in the dual lattice
planes.
We next calculate the energies and the staggered
curls of the monopole currents, which, as described
in Subsec. III A, are related to the plaquette ener-
gies and bond variables of the original spin prob-
lem. To repeat, the energy is given by ǫµ(R) =
2Re(tR,R+µ̂Φ
RΦR+µ̂), and the monopole current is given
by Jµ(R) = 2Im(tR,R+µ̂Φ
RΦR+µ̂). The staggered curl
of the monopole current is what the name suggests, for
example, fz ≡ (−1)x+y+z[Jx(R) + Jy(R + x̂) − Jy(R) −
Jx(R+ ŷ)].
The energies and staggered curls of the monopole cur-
rents are bilinears in φ and thus can be expressed in terms
of the B0, . . . , Nz. They are
− 2(−1)y+zDx
3 [(−1)yNy + (−1)zNz] , (17)
fx = 3
− 4(−1)xNx . (18)
The components in the other directions are obtained from
these by the appropriate rotations using the table, which
for all bilinears except for F ’s gives the same result as the
obvious permutation of indices. More generally, while the
numerical coefficients in these expressions are obtained
from the bare monopole hopping problem, the overall
structure of the contributing terms is dictated by the
symmetries – one only needs to remember that ǫx and fx
are associated with scalars residing on respectively pla-
quettes and bonds of the original spin lattice and also
that the rotations and mirrors quoted here are about the
axes and planes passing through the dual lattice sites.
With the above results in hand, we now turn to analyz-
ing phases of the Landau theory. The phase diagram is
obtained simply by minimizing the potential (15)+(16)
and is shown in Figure 5. The different phases are de-
scribed in the following. In each case the ground state
has finite degeneracy; we display few such states and the
others are obtained from them by obvious permutations;
we display nonzero bilinears, the energies, and the stag-
gered curls of the monopole currents for the first listed
state.
FIG. 5: Phase diagram of the Landau theory for spin 1 ob-
tained by minimizing the potential (15)+(16) for m < 0 and
u > 0 (the latter choice is made for concreteness). In the
”Quartic unstable” region on the left the potential to quartic
order is asymptotically negative and we would have to include
sixth order terms to stabilize it. The cross denotes the pa-
rameter point obtained by simply expanding the microscopic
potential |Φ|4 in terms of the slowly varying fields φ1,2,3.
Phase 1. There are three degenerate states. The
values in one of them are
φ1 = 1, φ2 = φ3 = 0; (19)
B0 = 1; F1 =
, F2 = 1; (20)
ǫx = 2, ǫy = ǫz = 1; (21)
fx = 4
3; fy = fz = −2
3. (22)
The bond variables are drawn on the original spin lattice
in Figure 1; they suggest that the spins are organized
into Haldane chains along the x direction. The values of
plaquette energies are consistent with this: the plaque-
ttes in the xy and xz planes are the same and differ from
the plaquettes in the yz plane, ǫz = ǫy 6= ǫx.
Phase 2. There are six degenerate states. The values
in one of them are
φ1 = 0, φ2 = 1, φ3 = ±i; (23)
B0 = 2; F1 = − 1√3 , F2 = −1; Nx = 2; (24)
ǫx = 2, ǫy = ǫz = 3+ 2
3(−1)x; (25)
fx = −4[
3 + 2(−1)x], fy = fz = 2
3. (26)
The corresponding drawing of the bond variables on the
original spin lattice is in Figure 6, suggesting that in this
phase the spins combine into singlets and form a colum-
nar dimer state along one direction. Permuting the val-
ues of φ1,2,3 gives six degenerate states that correspond
to six possible ways of placing such columnar solid onto
the cubic lattice.
Phase 3. There are eight degenerate states specified
as follows:
φ1 = 1, φ2 = e
iα2 , φ3 = e
iα3 , (27)
{α2, α3} = ±{2π/3,−2π/3}, ± {2π/3, π/3},
± {π/3, 2π/3}, ± {π/3,−π/3}; (28)
FIG. 6: Phase 2 of spin 1. The thick lines denote the positions
where the bond variables are strongest and dashed lines where
they are weakest. This suggests that the spins organize into
singlets (dimers) and form a columnar order.
B0 = 3; Dx = Dy = Dz = −1;
Nx = Ny = Nz =
3 (29)
ǫx = 4 + 2(−1)y+z + 3[(−1)y + (−1)z], etc., (30)
fx = −4
3(−1)x, etc. (31)
The nearest neighbor spin-spin correlation has higher ex-
pectation value on the sides of the cubes shown in Figure
7, which suggests that this phase corresponds to a box
state. There are eight possible ways of placing such box
state onto the cubic lattice.
FIG. 7: Phase 3 of spin 1. The bond variables have higher
expectation values on the cubes shown.
Phase 4. There are four degenerate states:
φ1 = 1, φ2 = e
iα2 , φ3 = e
iα3 ; (32)
α2 = 0, π; α3 = 0, π; (33)
B0 = 3; Dx = Dy = Dz = 2; (34)
ǫx = 4− 4(−1)y+z, ǫy = 4− 4(−1)z+x,
ǫz = 4− 4(−1)x+y; (35)
fx = fy = fz = 0. (36)
This state breaks lattice symmetries as can be seen from
the plaquette energies. However, because the bond vari-
ables fx,y,z are zero, we do not know a simple interpre-
tation of this phase in terms of the original spins; some
finer characterization than what we use here is needed to
establish this state.
This concludes the discussion of the general phase dia-
gram of the Landau theory including quadratic and quar-
tic terms. Higher-order interactions may stabilize some
other phases, but the presented states are the most nat-
ural ones. The actual lowest-energy state depends on the
parameters u, v, w, unknown apriori. If we are to guess
which of the four phases is the most likely candidate in
the specific frustrated XY model, we can consider the
simplest microscopic quartic potential |Φ|4. When ex-
panded in terms of the continuum fields, we find u = 2,
v = −1, w = −1/2; this point is denoted by the cross in
Figure 5 and lies in the Phase 1, i.e., the Haldane chains
phase.
2. Analysis 2: The ground state of the XY model
Minimizing the classical energy of the hard spin XY
model as described in Sec. III A, we find that the ground
state configurations coincide with the condensate wave-
functions of the phase 1 and hence the state is that of the
phase 1. In particular note that each wavefunction Ψ1,2,3
has the same length |Φ| on all sites. The XY angles of
spins in this gauge in the three ground states are
(0,−30,−30, 0, 60,−90,−90, 60) , (37)
(0, 30, 30, 0,−60, 90, 90,−60) , (38)
(0,−90, 90, 180, 0,−90, 90, 180) , (39)
where the convention is that we vary position on the cube
in the x direction first, then in the y direction, and then
in the z.
3. Discussion and extension to anisotropic system
Some remarks are in order. First, it is useful to note
that the doublet F1,2 can be interpreted as an order
parameter of the Haldane chains phase. Indeed, one
can readily see that the transformation properties of F1
and F2 coincide with those of (Qx +Qy − 2Qz)/
3 and
Qx − Qy respectively, where Qm is the bond variable
in the direction m̂. On the other hand, Nx transforms
as (−1)xQx and similarly for Ny and Nz, so ~N can be
viewed as an order parameter of the valence bond solids
such as the columnar Phase 2 or the box Phase 3. In
the columnar phase, it is suggestive to view each strong
bond in Fig. 6 as representing a singlet formed by two
spin-1’s, which can be also drawn as two spin-1/2 valence
bonds connecting the two sites. However, we should be
cautious with such interpretation, since we can only tell
that the deviations of the bond variables from their mean
value will have the displayed pattern. The actual state
needs to be studied by constructing the corresponding
spin wavefunction. For example, the Haldane phase of a
spin 1 chain is stable to weak dimerization and should be
viewed as a solid formed by single-strength bonds along
the chains, so such distinct possibilities should be kept
in mind.
Let us now assume that the system is in the Phase
1. It is also interesting to ask what happens when we
stretch the lattice in one of the axis directions, say the
z-direction. In this case the Rx and Ry rotations are no
longer symmetries but the other transformations are. At
the quadratic level, the translation symmetries already
prohibit all terms except B0 and F ’s. Then from Rz we
see that only F1 is allowed. Thus at the quadratic level
one new term is allowed. In principle we should look at
the new allowed terms at the quartic level, however we
will assume that this quadratic term is leading but small
compared to the terms that were there before we broke
the symmetry.
We find that if the F1 comes with a positive pre-factor,
out of the three ground states it selects the state with
chains running along the z direction whereas if it comes
with a negative pre-factor it selects the two states with
chains running along the x and y directions.
This has a simple interpretation in terms of spins.
If the coupling in the z direction is stronger than in
the other directions the state with maximum number of
bonds in this directions is selected which is the state with
chains running in the z-direction. In the opposite case,
the states with fewest bonds in the z direction are se-
lected which are the states with chains running in the x
or y directions.
C. Results: Spin 3/2
1. Analysis 1: Phase transition of the XY model
We choose the gauge as shown on Figure 3 with S =
3/2. The hopping amplitudes are
(−1)z
1 + i(−1)x+y
, (40)
(−1)z
1− i(−1)x+y
, (41)
tz = 1 . (42)
The band structure has four minima and hence the space
of the ground states of kinetic energy is four-dimensional.
Unlike the spin 1 case where this space was three-
dimensional and simple basis vectors were found corre-
sponding to the three directions of the physical space,
there is no such form in the spin 3/2 case. The four
wavefunctions that give us relatively simple subsequent
analysis are the following
Ψ1 = (−1)x
cosβ − i(−1)x+y+z sinβ
Ψ2 = i(−1)y
cosβ + i(−1)x+y+z sinβ
1 + i(−1)x+y√
cosβ − i(−1)x+y+z sinβ
1− i(−1)x+y√
cosβ + i(−1)x+y+z sinβ
where
cosβ =
3 + 1
, sinβ =
. (43)
We again write Φ(R) =
i=1 φiΨi(R). The transfor-
mation properties of the slow fields φ1,2,3,4 are derived in
the same manner as in the spin 1 case. The symmetries
Tx : ~φ → τ3σ0 ~φ∗ , (44)
Ty : ~φ → τ0σ0 ~φ∗ , (45)
Tz : ~φ → τ1σ0 ~φ∗ , (46)
Ry : ~φ → τ1e−i
~φ∗ , (47)
Rz : ~φ → e−i
σ1 ~φ∗ . (48)
Here ~φ, ~φ∗ are column vectors, and we introduced two
sets of Pauli matrices: τ matrices that act on the blocs
{1, 2} and {3, 4}, and σ matrices that act within each bloc
(τ0 and σ0 are the corresponding identity matrices). At
the quadratic order there is one invariant term
V (2) = m
|φi|2 . (49)
At the quartic order there are five invariant terms. The
expressions in terms of φ are rather complicated and not
very illuminating, particularly since φ’s depend on the
choice of gauge and the basis. Instead, we will use gauge
invariant bilinears of φ to which we now turn.
There are 16 bilinears and they can be conveniently
organized using tensor product of the introduced two
sets of Pauli matrices, namely φ†τµσνφ with µ, ν =
0, 1, 2, 3. These break up into irreducible representa-
tions of the cubic lattice symmetry group. There are two
one-dimensional, one two-dimensional, and four three-
dimensional representations. The convenient combina-
tions that we use are
B0 = φ
†τ0σ0φ ,
C = φ†τ0σ2φ ,
F1 = φ
†τ0σ1φ ,
F2 = φ
†τ0σ3φ ,
~D = (Dx, Dy, Dz) = φ
†~τσ2φ ,
~N = (Nx, Ny, Nz) = φ
†~τσ0φ ,
Mx = φ
†τ1(−1
σ3)φ ,
My = φ
†τ2(−1
σ3)φ ,
Mz = φ
†τ3σ1φ ,
Kx = φ
σ1 − 1
σ3)φ ,
Ky = φ
†τ2(−
σ3)φ ,
Kz = φ
†τ3σ3φ .
The transformation properties of these bilinears are in
the following table
Tx Ty Tz Rx Ry Rz
B0 + + + + + +
C − − − + + +
F1 + + + − 12F1 +
F2 − 12F1 −
F2 + + +
Dx + − − + Dz Dy
Dy − + − Dz + Dx
Dz − − + Dy Dx +
Nx − + + + Nz Ny
Ny + − + Nz + Nx
Nz + + − Ny Nx +
Mx − + + + Mz My
My + − + Mz + Mx
Mz + + − My Mx +
Kx − + + − −Kz −Ky
Ky + − + −Kz − −Kx
Kz + + − −Ky −Kx −
The energies and staggered curls of monopole cur-
rents in term of these bilinears are
B0 − 2(−1)y+zDx
2 [(−1)yMy + (−1)zMz]
[(−1)yKy − (−1)zKz] , (50)
fx = 2
2(F1 +
3F2) +
8(−1)x√
Nx . (51)
The components in the other directions are obtained from
these by simple rotations of the coordinates. Our general
discussion following similar expressions (17) and (18) in
the spin 1 case apply here as well (for ease of compari-
son, we are using similar labels for objects with identical
transformation properties in the two cases). However, a
word of warning is in order here, which will be explained
in Sec. III C 3 below. Observe, for example, that ~N and
~M have identical transformation properties and therefore
should enter similarly in any expression. The absence of
M ’s in the expression for ǫx and the absence of N ’s in
the expression for fx is due to their different eigenval-
ues under an additional artificial symmetry present in
the frustrated XY model, namely a charge conjugation
symmetry defined later, which is also present in our bare
kinetic term and thus in the above expressions. This
symmetry is not physical in the original spin model and
will not be used here; it is therefore important to note
that the degeneracy of the four slow modes obtained from
the bare kinetic term is protected at the quadratic level
by the physical lattice symmetries.
There are five independent fourth order terms in φ al-
lowed by translation and rotation symmetries:
I1 = B
0 , (52)
I2 = C
2 , (53)
I3 = N
z , (54)
I4 = M
z , (55)
I5 = NxMx +NyMy +NzMz . (56)
As we have said earlier, because the number of invariant
terms is large, we will not attempt to draw the phase
diagram of the Landau’s theory. Instead we look at the
natural microscopic term
V (4) = |Φ|4 = 4
I4 , (57)
where the second equality is obtained after some calcu-
lation keeping only non-oscillatory terms.
This potential does not have any continuous symme-
try left other than the global U(1) transformation of all
fields. In fact the dimensions of the subgroups of SU(4)
that keep the terms I1, . . . , I5 invariant are 15, 7, 6, 0, 0
respectively. The potential (57) achieves global mini-
mum at twelve discrete points. As an illustration, we
consider the following four minima that are associated
with the z direction in the sense to become clear below:
(φ1, φ2, φ3, φ4) =
(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1). (58)
The four states can be related to each other by transla-
tions in the z direction and rotations about the z axis.
Besides B0 = 1, the only nonzero bilinears in these states
are (F2, Nz,Kz) = (1, 1, 1), (−1, 1,−1), (1,−1,−1), and
(−1,−1, 1) respectively.
The energies are
∓ (−1)z
, (59)
± (−1)z
, (60)
, (61)
where the upper sign corresponds to the first and fourth
minima and the lower sign to the other two.
The staggered curls of monopole currents are respec-
tively
fx, fy, fz =
8(−1)z√
, (62)
8(−1)z√
, (63)
6,− 8(−1)
, (64)
6,− 8(−1)
. (65)
The staggered curls are interpreted as the strength
(above some mean) of the expectation value of nearest
neighbor spin-spin correlation function. The above val-
ues imply that the spins organize themselves into ladders
as shown in Figure 2, obtained by drawing say the pos-
itive bonds for the first of the above minima. The four
listed states correspond to the four different positions of
ladders with rungs oriented along the z-axis. The other
eight minima are obtained by 90 degree rotations around
the x and y axes and we will not write the specific values
of the variables. The ladder state is natural for S = 3/2
system, in the picture where spin 3/2 breaks up into three
spin 1/2’s and each of them forms a bond with some other
neighboring spin 1/2.
2. Analysis 2: The ground state of the XY model
We can use the same procedure as in the case of spin
1 to find the classical ground state of the appropriate
XY model. In fact, this was already done in Ref. 20
because this problem is the fully frustrated XY model
(FFXY), which is of interest by itself, and we can use
the available results. We find that the ground state con-
figurations coincide with the condensate wavefunctions
obtained above. Thus, in each of the four displayed states
(58), the microscopic boson field Φ is given precisely by
one of the four wavefunctions Ψ1,...,4. One can see that
|Φ| = 1 on all lattice sites, and the complex phases of Φ
can be interpreted as angles of the hard-spin XY model.
For example, for Φ = Ψ1 the angles are
(−β, π + β, β, π − β, β, π − β,−β, π + β) , (66)
listed in the same order as in Eq. (39). All other ground
states can be obtained by appropriate symmetry trans-
formations. The agreement of the two analyses suggests
that there is only one ordered phases in the FFXY model,
which is also supported by the available Monte Carlo
studies.20,21
3. Remark on charge conjugation symmetry in the FFXY
It is worth to point out that the fully frustrated XY
model has an additional charge conjugation symmetry.
Indeed, since π and −π fluxes are indistinguishable, tRR′
and t∗RR′ are related by a gauge transformation, tRR′ =
eiγRt∗RR′e
−iγR′ , and so the action remains invariant under
the following unitary transformation:
C : ΦR → eiγRΦ∗R . (67)
In terms of the continuum fields, this becomes
C : ~φ → τ2σ2 ~φ∗ . (68)
In particular, the bilinears Nx,y,z are odd under C while
Mx,y,z are even, so if this symmetry is included, the I5
quartic term is not allowed (this is why this term did not
appear in Eq. 57 since both the microscopic |Φ|4 and the
bare quadratic terms in Eq. 10 have this additional sym-
metry). Thus, the complete field theory for the FFXY
model is a φ4 theory with four complex fields and inde-
pendent quartic terms I1,...,4.
One consequence of the charge conjugation symmetry
is that, for example, if we draw the Ψ1 state using neg-
ative values of the staggered curls fx,y,z as opposed to
using positive values which was done in Fig. 2, we would
obtain another set of ladders that go perpendicularly to
the ones displayed and are shifted up by one lattice spac-
ing. To put this in other words, the Ψ1 and Ψ4 states that
can be related by a translation in the z direction followed
by a rotation around the z axis are also related by C. In
this sense, each of the states Eq. (58) does not define a
direction in the x-y plane since the correlations in the x
and y directions are related by the charge conjugation
symmetry.
Tracing back to the original gauge theory formulation,
this symmetry is present in the simplest model Eq. (5) for
S = 3/2 that we wrote down and the corresponding sim-
plest “dimer model” Hamiltonian Eq. (6). Specifically,
the transformation E → 1−E on the links oriented from
one sublattice to the other, or equivalently 1 ↔ 0 in the
dimer language, takes the model corresponding to spin S
to the one corresponding to spin 3−S, while the S = 3/2
case maps back onto itself. This symmetry is useful in
the specific models, but there is no corresponding sym-
metry in the microscopic derivation from the spin model,
and therefore it was not used in the preceding analysis.
Let us look what happens to the ground states when we
add small term that breaks the charge conjugation sym-
metry, the I5, to the potential. Using general arguments
it is easy to check that the twelve minima will shift but
not split, and the twelve-fold degeneracy remains since
all are related to each other by lattice symmetries. Fur-
thermore, each ground state stays translationally invari-
ant along the ladders and perpendicular to the plane of
ladders (otherwise, if this were not true, there would be
more than twelve states). In other words, the states still
have the structure of ladders. However since the charge
conjugation is broken, it is no longer true that the nega-
tive bonds are of the same magnitude as the correspond-
ing positive ones. This makes sense when interpreted
in terms of spins. In the picture where spin 3/2 breaks
up into three spin 1/2 and ladders of valence bonds are
formed, the links that belong to these ladders are differ-
ent from the links without bonds (which also form lad-
ders). For example, the system is entangled along the
former but not along the latter. Thus these two should
not be related by any symmetry.
Explicitly, the four states in Eq. (58) become
(1, δ, 0, 0), (δ, 1, 0, 0), (0, 0, 1, δ), (0, 0, δ, 1), (69)
with appropriate δ obtained from minimization. There
is now an additional non-zero bilinear Mz, and also both
F1 and F2 are non-zero. The expressions for the ener-
gies and staggered curls in the x and y directions are no
longer related, and we can then associate a unique x or y
direction with each of the four states. These are ladders
with rungs oriented in the z direction and are related to
each other by the z translations and rotations.
4. Extension to anisotropic system
As in the spin 1 case, we ask what happens when we
stretch the system along one axis, say the z-direction.
Again, the Rx and Ry rotations are no longer symmetries
but the translations and Rz are. At the quadratic level,
the translation symmetries already prohibit all terms ex-
cept B0 and F ’s. Then from Rz we see that only F1
is allowed. Thus at the quadratic level one new term is
allowed.
We find that if the F1 comes with a positive pre-factor,
out of the twelve ground states it selects four with the
ladders that lie entirely in the x-y plane, whereas if it
comes with a negative pre-factor it selects four states
with the ladders running along the z-direction. Note that
this breaking up into groups of four is a consequence of
the remaining symmetries in the system.
These results have a simple physical interpretation for
the spin system. If the coupling in the z direction is
weaker than in the other directions, the states with fewest
bonds in the z direction are selected which are the states
with the ladders lying in the x-y plane. On the other
hand, if the coupling in the z direction is stronger, the
states with the largest number of bonds in the z direction
are selected, which are the states with ladders oriented
in the z direction.
IV. ANALYSIS 3: MAPPING TO DIMERS AT
λ ≫ 1/β ≫ 1
Here we look at the right hand corner of the phase
diagram Fig. 4 in the regime with λ ≫ 1/β ≫ 1, where
as we will see the system can be mapped to dimers.17,18,19
The analysis proceeds as follows. First we gauge away
the ∇θ in Eq. (9) to obtain
Sdual =
∑ (∂L)2
λ cos(L+ L0) , (70)
Because we assume λ ≫ 1/β ≫ 1 the configurations
that contribute significantly to the partition function can
be written in the form L = −L0 + 2πn + δL where n
is an integer and δL is small. Note that the λ term
does not depend on n and the 1/β term has a gauge
invariance n → n+∇m where m’s are integers on sites.
The partition function can be written as a sum over the
gauge equivalent classes. These classes are in one-to-
one correspondence with the fluxes j = ∂n which are
integers on plaquettes, where ∂n is the four dimensional
curl (∂n)µν = ∂µnν − ∂νmµ.
Consider first configurations with δL = 0. Some con-
figurations of j minimize the action and we denote them
by jgs. As we show below, there is an extensive num-
ber of them in all our cases. The configurations with j
that are not jgs are at energy of at least ∼ 1/β higher.
Now turning on δL, if we show that the typical energy of
excitation in δL around a given j is much smaller then
1/β then we can neglect all configurations which are not
around jgs. We will assume that this is true and show
this self-consistently below.
We define Jgs = −∂L0/(2π) + jgs. We expand the
action to the second order and drop the terms that do
not depend on Jgs, δL to obtain
∑ 4πJgs · (∂δL) + (∂δL)2
(δL)2 . (71)
This is just a gaussian integral. There are two quadratic
terms and the first one has 1/β in front and contains two
derivatives while the second has λ in front and contains
no derivatives. Since we are on a lattice the derivatives
are of order one. Since λ ≫ β, the first term can be
neglected. Next we sum by parts and integrate out the
δL. Before we do this however, we notice that the cou-
pling is ferromagnetic in time direction and L0 has zero
time components and its spatial components do not de-
pend on time. This implies that the jgs and Jgs have
zero time components and their spatial components do
not depend on time. Thus we drop time components and
time derivatives from the action and treat the Jgs and
L0 as three-dimensional. Now we integrate out the δL
and obtain
Seff [J
gs] = − 1
8π2β2λ
(∇× Jgs)2 (72)
Thus, to obtain a ground state, we need to maximize the
sum of the squares of curls of Jgs.
Let us check the consistency of our approach. From
(71), δL ∼ ∇× Jgs/(λβ) and so energy∼ 1/(λβ2). This
needs to be much smaller then 1/β which implies λ ≫
1/β which is what we assumed.
FIG. 8: a) L0/(2π) where S = 1/2, 1, 3/2 is the spin. The link
variables switch orientations under elementary translation in
the x or y direction. b) The fluxes (∇ × L0)/(2π). This
figure is similar to Fig. 3 with 2π’s removed to simplify the
discussion of the dimer ground states.
Now let us turn to the specific cases of spins. Since the
spin 1/2 case has not been considered using this approach
before, we will add it here for completeness. The gauge
choice for L0 and the fluxes∇×L0/(2π) through the faces
of the spatial cubes are shown on Figure 8 with S = 1/2.
It is easy to see that the set of ground states consists of
all configurations with precisely one −5/6 and five 1/6
fluxes Jgs coming out of every site of one sublattice of
the original spin lattice (and coming into every site on
the other sublattice). Jgs = −∇ × L0/(2π) is one such
configuration in the spin 1/2 case, but there are many
more. Associating the −5/6 plaquettes with dimers on
the links of the original spin lattice, the set of the ground
states is thus the set of dimer configurations with one
dimer coming out of every site.
Now turn to the case of spin 1. The fluxes (∇ ×
L0)/(2π) are shown on Figure 8 with S = 1. If we try
Jgs = −∇× L0/(2π), each cube contributes 1/β energy
term proportional to 5(1/3)2 + (5/3)2 = 10/3. However
we can do better. Using L = −L0+2πn, if we pick n = 1
on the upper link on the front face and zero elsewhere on
the cube in Fig. 8, we lower the magnitude of the flux
on the upper face, at the expense of increasing the flux
through the front face. The energy of this cube is then
4(1/3)2 + 2(2/3)2 = 4/3, which is lower. It is easy to
show that this is the lowest we can achieve and that the
ground state configurations have two fluxes of value −2/3
and four fluxes of value 1/3 coming out of every site of
one sublattice of the original spin lattice. Associating the
2/3 links with dimers, the set of the ground states is thus
the set of dimer configurations with two dimers coming
out of every spin site.
Finally, in the S = 3/2 case, it is easy to see that
the ground state configurations have precisely three −1/2
and three 1/2 fluxes coming out of every site of one sub-
lattice of the original cubic lattice. Associating the −1/2
links with dimers, the set of the ground states is thus the
set of dimer configurations with three dimers coming out
of every spin site.
Thus, as claimed, in each case there is an extensive
number of Jgs’s. To find the true ground state, we need
to minimize (72) among these dimer configurations. It is
not hard to show that for the spin 1/2 we get columnar
state, for spin 1 the Haldane chains state of Fig. 1 and
for spin 3/2 the ladder state of Fig. 2.
Finally we note that defining Egs = S/3− Jgs, the set
of Egs is the set of electric fields on links, cf. Eq. (6),
with the property that the magnitude of each is either
zero or one (which can be imposed by minimizing the
energy term
E2); the mapping between such electric
fields and dimers above is the standard one on the cubic
lattice17,18,19. The final ground state selection is obtain
by maximizing
(∇× E)2.
V. CONCLUSIONS
In this paper we looked for spin solid phases in the sys-
tem of spin 1 and 3/2 on the cubic lattice. We wrote the
spins in terms of Schwinger bosons, assumed the uniform
Coulomb spin liquid phase and by process of monopole
condensation transitioned into spin solid phases. Using
the duality we rewrote the system in terms of monopoles
coupled to a noncompact U(1) gauge field, Eq. (9), and
analyzed this theory in three different limits shown in
Figure 4.
In the first two limits the theory becomes a frustrated
XY model. For spin 1 the frustrating flux through every
plaquette is 2π/3, while for spin 3/2 it is π. In the first
approach, using symmetries we wrote the Landau’s the-
ory near the ordering transition. It is a φ4 theory with φ
a complex vector with three components for S = 1 and
four components for S = 3/2. At the quadratic level only
the rotationally invariant mass term is allowed. At the
quartic level there are three allowed terms for spin 1 and
five for spin 3/2. For spin 1 we draw a mean field phase
diagram Figure 5. For spin 3/2 we didn’t attempt it due
to a large number of parameters. In both cases we also
considered the most natural microscopic potential and
found that it selects a state with parallel Haldane chains
of Figure 1 for S = 1 and a state with parallel ladders
of Figure 2 for S = 3/2. These are natural states for the
spin systems to be in, in the picture where spin 1 breaks
up into two and spin 3/2 into three spin 1/2’s and each
such spin 1/2 forms a singlet bond with another spin 1/2
of some neighbor.
In the second approach we looked at the classical
ground states of the frustrated XY models and found
that these actually describe the same phases as the most
natural ones identified near the transition.
In the third approach the theory becomes a dimer
model with 2S dimers coming out of every site. Dimer
configurations with parallel lines for spin 1 and parallel
ladders for spin 3/2 are selected, which is the same re-
sult as in the other two limits suggesting that these are
indeed the most natural valence bond solids in the corre-
sponding spin systems. It would be interesting to look for
such spin solid phases in Quantum Monte Carlo studies
of models on the cubic lattice.5,8
It is also worth noting14 that if we consider our quan-
tum 3d systems at a finite temperature, we obtain simply
the corresponding classical 3d dimer models, e.g., with
the classical energy given by the first term in Eq. (6). Our
results then provide appropriate long-wavelength (dual)
description of the dimer ordering patterns transitioning
out of the so-called Coulomb phase of the classical dimer
models,22,23,24 stressing in particular a composite char-
acter of the naive order parameters for the valence bond
solid phases. It would interesting to explore such 3d clas-
sical dimer models and their transitions further.
APPENDIX A: CLASSICAL U(1) DUALITY WITH
BACKGROUND CHARGE
In this section we derive duality for classical com-
pact U(1) gauge theory.12,25 However we will use a gen-
eral notation of antisymmetric tensors, or differential
forms which are fields of antisymmetric tensors. Thus
the derivation will work not only for the gauge theory,
whose objects are one dimensional, but for general n-
dimensional objects. For n = 0 this is the vortex duality
of the XY model and for n = 1 the duality of the gauge
theory. The further advantage of this derivation is that
the formulas are simpler and more transparent.
First we give the basic notations and properties of
antisymmetric tensors. An n-dimensional antisymmet-
ric tensor ω in d dimensions is a collection of numbers
ωµ1,µ2,...,µn , where µv = 1, . . . , d, which is completely an-
tisymmetric. A differential form ω(~r) is a field of these
tensors.
We define two operations. First is the exterior deriva-
tive ∂. The derivative of ω, denoted ∂ω is the (n+1)-form
(∂ω)µ1,µ2,...,µn+1 =
(−1)p∂µp1ωµp2 ,...,µpn (A1)
where the sum is over all permutations of the n+1 indices
and (−1)p is −1 if the permutation is odd and 1 if it
is even. Thus for example for n = 1, a vector field,
(∂ω)12 = ∂1ω2 − ∂2ω1 and hence this is the curl of a
vector field.
The second operation that we define is the star opera-
tor that takes n-form to (d− n)-form
(∗ω)ν1,...,νd−n =
ǫν1,...,νd−n,µd−n+1,...,µdωµd−n+1,...,µd
where ǫ is the fully antisymmetric tensor in d dimensions
and repeated indices are summed over. For example in
three dimensions for n = 2, (∗ω)1 = 12 (ω23 − ω32). Note
that ∗∗ = (−1)n(d−n).
A common operator is divergence which in this nota-
tion is proportional to ∗∂∗. As easily checked,
(∇ · ω)µ1,...,µn−1 ≡ ∂νων,µ1,...,µn−1 (A3)
= (−1)(n−1)(d−n)(∗∂ ∗ ω)µ1,...,µn−1(A4)
For a vector field this is the standard divergence.
We will work on the lattice. The variables are defined
on discrete points. We will define the coordinates of a
given variable to be those of the center of the object the
variable belongs to. For example the x component of a
one form ω in d = 3 lies on a link pointing in x direction
and it is denoted by ωx(x+1/2, y, z). The ∂ now denotes
the difference operator. For example the curl of the ω
is (∂ω)xy(x + 1/2, y + 1/2, z) = ωy(x + 1, y + 1/2, z) −
ωy(x, y+1/2, z)−ωx(x+1/2, y+1, z)+ωx(x+1/2, y, z).
Finally we will write the integration (summation) by
parts
ω · ∂φ = −
(∇ · ω) · φ+ surface term (A5)
where the dot is the sum over the component by com-
ponent product of two forms of the same n. Note that
∗ω1 · ∗ω2 = ω1 · ω2. Because we use periodic boundary
conditions below, the surface term will be zero.
Now we are ready to turn to the duality. Let a be an
n-form in d dimensions where its variables are defined on
the unit circle. The action is
S = −β
cos(∂a)− i
η · a (A6)
In the first term one takes every component at every
point, takes cosine of it and sums. In the second term
the n-form η denotes the background charge. For the
action considered in this paper, the first term is the Sa
and the second term the SB in (5), while the η is the four
dimensional vector with the time component being ±2S
and the other components being zero.
The duality proceeds by the following steps.
cos(∂a)+i
(∂a−2πp)2+i
(∂a−2π∂−1q′)2+i
J·(∂a−2π∂−1q′)+i
J·∂−1q′
×∆(∇ · J − η) (A7)
All numerical factors are dropped throughout, while the
sign “≈” is used when an approximation is being made
that does not change the qualitative aspects.
In the second line we use the Villain form of the cosine.
In the third line we have written the field p = ∂α+∂−1q′
as a curl of α plus a field of a particular monopole current
configuration q′, ∂−1q′. The ∂−1 denotes a particular
configuration of p that gives the monopole currents - that
satisfies q′ = ∂p. Then we shifted a → a − α. The
summation over α extends the integration of a over the
whole real line. The prime on q′ denotes that fact that
we are summing over fields for which ∂q′ = 0.
The third line can be obtained from the fourth one by
completing the square, shifting J and integrating it out.
In the fifth line, the ∆ denotes that the operator inside
of it is zero. This line is obtained from the fourth one by
integrating (summing) by parts and integrating out the
Next, as shown explicitly below, in our case there are
fields J0 and L0 such that
η = ∇ · J0 (A8)
J0 = (∗P∂L0)/2π (A9)
∂J0 = 0 (A10)
The P shifts a real number by a multiple of 2π so that
the result is in the interval (−π, π].
Using (A8) in (A7) we see ∂ ∗ (J − J0) = 0 and hence
we can write
J = J0 + (∗∂L)/2π (A11)
for some field L. To substitute this into (A7) we notice
the following
J2 = J20 + (∗∂L/2π)2 + 2J0 · ∗∂L/2π ≃ J20 + (∗∂L/2π)2.
The ≃ denotes that these expressions are equal under
integration, which follows from Eq. (A10). Also
∂−1q′ · ∗∂L ≃ − ∗ ∂∂−1q′ · L = − ∗ q′ · L ≡ −Q′ · L,
′·∗P∂L0 = ei∂
′·∗∂L0 ≃ e−iQ
′·L0,
where Q′ ≡ ∗q′, and we have dropped inconsequential
± signs; in the last line, the P can be removed because
the resulting expression, which is in the exponent, differs
from the original one by a multiple of 2π.
With this we can proceed to complete the duality
(∂L)2
′·(L+L0)
(∂L)2
Q·(L+L0−∂θ)
(∂L)2
(L+L0−∂θ−2πp)2
(∂L)2
−λ cos(L+L0−∂θ)
(A12)
In the first line the summation over Q′ is over integer
fields Q′ with zero divergence ∇ · Q′ = 0 - currents. In
the second line we introduced θ that imposes this con-
straint as a Lagrange multiplier and summed by parts.
In the third line we added a small term
Q2/2λ and as-
sumed that it is not going to change the basic behavior of
the system. Then we summed out Q, which introduced
integer p because Q is an integer (this is the Poisson sum-
mation formula). The second term is the Villain form of
cosine. In the last line we approximated it by cosine.
To complete it remains to find J0 and L0. The η has
values ητ (x, y, z, τ + 1/2) = (−1)x+y+z2S and zero for
other components. As easily checked
(J0)τx(x+ 1/2, y, z, τ + 1/2) =
(−1)x+y+z (A13)
and similarly for y and z with other components (other
then the ones obtained by permutation of indices) being
zero. This gives the right η and satisfies ∂J0 = 0. The
L0 can be chosen as on the Fig. 3.
In the final expression (A12) the L is 1-form and hence
a gauge field. The θ is 0-form - a number on a circle -
a matter field. Thus we obtained a noncompact U(1)
gauge theory coupled to scalar fields of monopoles with
frustrated hopping.
1 S. Taniguchi et al., J. Phys. Soc. Jpn. 64, 2758 (1995);
2 D. S. Chow, P. Wzietek, D. Fogliatti, B. Alavi, D. J. Tan-
tillo, C. A. Merlic, and S. E. Brown, Phys. Rev. Lett. 81,
3984 (1998).
3 H. Kageyama, K. Yoshimura, R. Stern, N.V. Mushnikov,
K. Onizuka, M. Kato, K. Kosuge, C.P. Slichter, T. Goto,
and Y. Ueda, Phys. Rev. Lett. 82, 3168 (1999); H.
Kageyama, M. Nishi, N. Aso, K. Onizuka, T. Yosihama,
K. Nukui, K. Kodama, K. Kakurai, and Y. Ueda, Phys.
Rev. Lett. 84, 5876 (2000);
4 A. W. Sandvik, S. Daul, R. R. P. Singh, and D. J.
Scalapino, Phys. Rev. Lett. 89, 247201 (2002).
5 K. S. D. Beach and A. W. Sandvik, cond-mat/0612126.
6 K. Harada, N. Kawashima, and M. Troyer, J. Phys. Soc.
Jpn 76, 013703 (2007).
7 A. Banerjee, S. V. Isakov, K. Damle and Y. B. Kim,
cond-mat/0702029.
8 K. Harada and N. Kawashima, Phys. Rev. B 65, 052403
(2002).
9 D. P. Arovas and A. Auerbach, Phys. Rev. Lett. 61, 617
(1988); Phys. Rev. B 38, 316 (1988).
10 N. Read and S. Sachdev, Phys. Rev. Lett. 62, 1694 (1989);
Phys. Rev. B 42, 4568 (1990).
11 F. D. M. Haldane, Phys. Rev. Lett. 61, 1029 (1988).
12 O. I. Motrunich and T. Senthil, Phys. Rev. B 71, 125102
(2005)
13 J.-S. Bernier, Y.-J. Kao, and Y. B. Kim, Phys. Rev. B 71,
184406 (2005).
14 D. L. Bergman, G. A. Fiete and L. Balents, Phys. Rev. B
73, 134402 (2006)
15 S. Sachdev and R. Jalabert, Mod. Phys. Lett. B 4, 1043
(1990).
16 S. Sachdev and K. Park, Ann. Phys. (N.Y.) 298, 58 (2002).
17 W. Zheng and S. Sachdev, Phys. Rev. B 40, 2704 (1989).
18 E. Fradkin and S. Kivelson, Mod. Phys. Lett. B 4, 225
(1990).
19 E. Fradkin, Field Theories of Condensed Matter Systems,
Westview Press, 1991
20 H. T. Diep, A. Ghazali, and P. Lallemand, J. Phys. C 18,
5881 (1985).
21 K. Kim and D. Stroud, Phys. Rev. B 73, 224504 (2006).
22 D. A. Huse, W. Krauth, R. Moessner, and S. L. Sondhi,
Phys. Rev. Lett. 91, 167004 (2003).
23 M. Hermele, M. P. A. Fisher, and L. Balents, Phys. Rev.
B 69, 064404 (2004).
24 F. Alet, G. Misguich, V. Pasquier, R. Moessner, and J. L.
Jacobsen, Phys. Rev. Lett. 97, 030403 (2006).
25 M. Peskin, Ann. Phys. (NY) 113, 122 (1978); R. Savit,
Rev. Mod. Phys. 52, 453 (1980).
http://arxiv.org/abs/cond-mat/0612126
http://arxiv.org/abs/cond-mat/0702029
ABSTRACT
  We study spin S=1 and S=3/2 Heisenberg antiferromagnets on a cubic lattice
focusing on spin solid states. Using Schwinger boson formulation for spins, we
start in a U(1) spin liquid phase proximate to Neel phase and explore possible
confining paramagnetic phases as we transition away from the spin liquid by the
process of monopole condensation. Electromagnetic duality is used to rewrite
the theory in terms of monopoles. For spin 1 we find several candidate phases
of which the most natural one is a phase with spins organized into parallel
Haldane chains. For spin 3/2 we find that the most natural phase has spins
organized into parallel ladders. As a by-product, we also write a Landau theory
of the ordering in two special classical frustrated XY models on the cubic
lattice, one of which is the fully frustrated XY model. In a particular limit
our approach maps to a dimer model with 2S dimers coming out of every site, and
we find the same spin solid phases in this regime as well.

<|endoftext|><|startoftext|>
Introduction
Symmetry is one of the most important notions in quantum field theory. In many examples, it
is useful in investigating properties of quantum field theories non-perturbatively, is a guiding
principle in constructing field theories for various purposes such as grand unification, or
gives powerful methods in finding exact solutions. It also plays important roles in actual
renormalization procedures. Therefore it should be interesting to study symmetries also in
noncommutative field theories [1, 2, 3, 4, 5], which may result from some quantum gravity
effects [6].
A difficulty in the study in this direction is the apparent violation of basic symme-
tries such as Poincaré symmetry in the noncommutativity of spacetime. For example, the
Moyal plane [xµ, xν ] = iθµν is translational invariant, but is not Lorentz or rotational
invariant. Another example is the three-dimensional spacetime with noncommutativity
[xi, xj] = iκǫijkxk (i, j, k = 1, 2, 3) [7, 8, 9, 10] with a noncommutativity parameter κ. This
noncommutative spacetime is Lorentz-invariant, but is not invariant under the translational
transformation xi → xi + ai with c-number ai. In fact, a naive construction of noncom-
mutative quantum field theory on this spacetime leads to rather disastrous violations of
energy-momentum conservation [10]: the violations coming from the non-planar diagrams
do not vanish in the commutative limit κ→ 0 as in the UV/IR mixing phenomena [11].
In recent years, however, there has been interesting conceptual progress in understanding
symmetries in noncommutative field theories: the symmetry transformations in noncommu-
tative spacetime are not the usual Lie-algebraic type, but should be generalized to have
Hopf algebraic structures. The Moyal plane was pointed out to be invariant under the
twisted Poincaré transformation in [12, 13, 14] and under the twisted diffeomorphism in
[15, 16, 17, 18]. There have been various proposals to implement the twisted Poincaré in-
variance in quantum field theories [19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]. As for the
noncommutative spacetime with [xi, xj ] = iκǫijkxk, a noncommutative quantum field theory
was derived as the effective field theory of three-dimensional quantum gravity with matters
[31]. Its essential difference from the naive construction mentioned above is the nontrivial
braiding for each crossing in non-planar Feynman diagrams. With this braiding, there ex-
ists a kind of conserved energy-momentum in the amplitudes, and the energy-momentum
operators have Hopf algebraic structures.
Our aim of this paper is to systematically understand these Hopf algebraic symmetries
and their consequences in noncommutative field theories in the framework of braided quan-
tum field theories proposed by Oeckl [34]. In the usual quantum field theories, symmetries
give non-perturbative relations among correlation functions. We will see that such relations
have natural extensions to the Hopf algebraic symmetries in braided quantum field theories,
and will obtain the four conditions for the relations to hold. These conditions should be
interpreted as the criteria of the symmetries in braided quantum field theories.
This paper is organized as follows. In the following section, we review braided quantum
field theory. This review part follows faithfully the original paper [34], but figures are more
extensively used in the proofs and the explanations to make this paper self-contained and
intuitively understandable. We start with braided category and braided Hopf algebra. Then
correlation functions of braided quantum field theory are represented in terms of them.
Finally braided Feynman rules are given.
In Section 3, we first review the axioms of action1 of an algebra on vector spaces. Then
we consider the relations among correlation functions in braided quantum field theory. We
find that four algebraic conditions are required for the relations to hold. Then, as concrete
examples, we discuss whether the noncommutative field theories mentioned above have the
Poincaré symmetry by checking the four conditions. In the former case, we find that the
twisted Poincaré symmetry is implemented only after the introduction of a non-trivial braid-
ing factor, which agrees with the previous proposal in [21, 35]. In the latter case, we find
that the theory has a kind of translational symmetry, which is different from the usual one
by multi-field contributions. We also give some examples of such relations among correlation
functions and the implications.
The final section is devoted to summary and comments. We comment on quantum
field theory on κ-Minkowski spacetime whose noncommutativity of coordinates is [x0, xj] =
xj (j = 1, 2, 3) [36].
2 Review of braided quantum field theory
2.1 Braided categories and braided Hopf algebras
First of all, we review braided categories and braided Hopf algebras [34, 37]. Braided cate-
gories are composed of an object X , which is a vector space, a dual object X∗, which is a
dual vector space, and morphisms
ev : X∗ ⊗X → k (evaluation), (1)
coev : k → X ⊗X∗ (coevaluation), (2)
where k is a c-number. The composition of the two morphisms in an obvious way makes the
identity. Then the braided categories have also an invertible morphism
ψV,W : V ⊗W → W ⊗ V (braiding), (3)
where V,W are any pair of vector spaces. Generally the inverse of braiding is not equal to
the braiding itself.
The braiding is required to be compatible with the tensor product such that
ψU,V⊗W = (id⊗ ψU,W ) ◦ (ψU,V ⊗ id),
ψU⊗V,W = (ψU,W ⊗ id) ◦ (id⊗ ψV,W ). (4)
1We use the italic symbol to distinguish it from the action S.
Figure 1: The evaluation, coevaluation, braiding and its inverse.
Then the braiding is also required to be intersectional under any morphisms in a Hopf
algebra. For example,
ψZ,W (Q⊗ id) = (id⊗Q)ψV,W for any Q : V → Z,
ψV,Z(id⊗Q) = (Q⊗ id)ψV,W for any Q : W → Z, (5)
where Z is a vector space.
We can represent these axioms in pictorial ways [38]. We write the morphisms, ev, coev,
ψ, downwards as in Figure 1. Thus the axioms (4) are represented as in Figure 2, and the
axioms (5) are represented as in Figure 3.
Next we consider the polynomials of X ,
X̂ :=
Xn, with X0 := 1 and Xn := X ⊗ · · · ⊗X︸ ︷︷ ︸
n times
, (6)
where 1 is the trivial one-dimensional space. X̂ naturally has the structure of a braided
Hopf algebra via
· (product) : X̂⊗̂X̂ → X̂, (7)
η (unit) : k → X̂ ; η(1) = 1, (8)
∆ (coproduct) : X̂ → X̂⊗̂X̂ ; ∆φ = φ⊗̂1+ 1⊗̂φ, and ∆(1) = 1⊗̂1, (9)
ǫ (counit) : X̂ → k ; ǫ(φ) = 0, and ǫ(1) = 1, (10)
S (antipode) : X̂ → X̂ ; Sφ = −φ, and S(1) = 1, (11)
where φ ∈ X . The tensor product ⊗ is the same as the usual product of Xs, while the new
tensor product ⊗̂ is the tensor product of X̂s. The coproduct ∆, counit ǫ, antipode S of the
Figure 2: The axioms of braiding (4).
Figure 3: The axioms of braiding (5).
Figure 4: The axioms of coproduct, counit, antipode for products.
products of Xs are defined inductively by
∆ ◦ · = (·⊗̂·) ◦ (id⊗̂ψ⊗̂id) ◦ (∆⊗̂∆), (12)
ǫ ◦ · = · ◦ (ǫ⊗̂ǫ), (13)
S ◦ · = · ◦ ψ ◦ (S⊗̂S). (14)
These axioms are diagrammatically represented in Figure 4.
2.2 Braided quantum field theory
Next we represent braided quantum field theory [34] in terms of the braided category and
the braided Hopf algebra. We take the vector space X as the space of a field φ(x), where
x denotes a general index for independent modes of the field. Thus X̂ is the space of
polynomials of the fields such as φ(x1)φ(x2) · · ·φ(xn), and 1 correspond to the constant field
of unit. We also take the dual vector space X∗ as the space of differentials δ/δφ(x). We take
the evaluation and the coevaluation as follows,
δφ(x)
⊗ φ(x′) → δ(x− x′), (15)
coev : 1 →
φ(x)⊗ δ
δφ(x)
, (16)
Figure 5: The differentials on X̂.
where the distribution and the integration should symbolically be understood, and their
detailed forms, which may contain non-trivial measures, depend on each case.
The differential on X̂ is defined by
diff := (êv ⊗ id) ◦ (id⊗∆); X∗ ⊗ X̂ → X̂, (17)
where
êv|X∗⊗Xn =
ev for n = 1,
0 for n 6= 1.
Diagrammatically this is given by Figure 5.
To see whether the map diff gives really the differential of products, let us compute the
differential of φ(x)φ(y) as a simple example using the definition (17). This becomes
δφ(x′)
⊗ φ(x)φ(y)
= (êv ⊗ id) ◦ (id⊗∆)
δφ(x′)
⊗ φ(x)φ(y)
= (êv ⊗ id) ◦
δφ(x′)
⊗∆(φ(x)φ(y))
= (êv ⊗ id) ◦
δφ(x′)
⊗ (φ(x)φ(y)⊗̂1
+ φ(x)⊗̂φ(y) + ψ(φ(x)⊗̂φ(y)) + 1⊗̂φ(x)φ(y))
= δ(x′ − x)⊗ φ(y) + (êv ⊗ id) ◦
δφ(x′)
⊗ ψ(φ(x)⊗̂φ(y))
where we have used the axiom (12) in deriving the third line. If the braiding is trivial, we
find that the differential (17) satisfies the usual Leibniz rule.
Figure 6: Diagram of ψn,m.
Generally we find a braided Leibniz rule
∂(αβ) = ∂(α)β + ψ−1(∂ ⊗ α)(β) (20)
∂(α) = (ev ⊗ idn−1)(∂ ⊗ [n]ψα), (21)
where ∂ ∈ X∗, α, β ∈ X̂ , and we have used a simplified notation
∂(α) := diff(∂ ⊗ α). (22)
Here n is the degree of α, and [n]ψ is called a braided integer defined by
[n]ψ := id
n + ψ ⊗ idn−2 + · · ·+ ψn−2,1 ⊗ id + ψn−1,1, (23)
where ψn,m is a braiding morphism given in Figure 6.
The proofs of the formula (20), (21) are in Appendix A.
Now we define a Gaussian integration, which defines the path integral. The definition is
given by ∫
∂(αw) := 0 for ∂ ∈ X∗, α ∈ X̂, (24)
where w ∈ X̂ is a Gaussian weight. In field theory, w is the exponential of the free part of
the action, e−S0 .
In order to obtain a formula for correlation functions, we define a morphism γ : X∗ → X
such that
∂(w) := −γ(∂)w. (25)
This morphism is assumed to be commutative with the braiding as in (5). If w = e−S0 ,
γ(∂) = ∂(S0). In field theory, this is the kinetic part of the action, or the inverse of the
propagator.
Starting from (24), we can represent correlation functions of a free field theory in terms of
the braided category and the braided Hopf algebra. This is the analog of the Wick theorem
in braided quantum field theory. The definition of the free n point correlation function is
given by
Z(0)n (α) :=
, (26)
where the degree of α is n. Algebraically, this is given by
2 = ev ◦ (γ−1 ⊗ id) ◦ ψ, (27)
2n = (Z
n ◦ [2n− 1]′ψ!!, (28)
2n−1 = 0, (29)
where
[2n− 1]′ψ!! := ([1]′ψ ⊗ id2n−1) ◦ ([3]′ψ ⊗ id2n−3) ◦ · · · ◦ ([2n− 1]′ψ ⊗ id), (30)
ψ := id
n + idn−2 ⊗ ψ−1 + · · ·+ ψ−11,n−1
= ψ−11,n−1 ◦ [n]ψ. (31)
The proofs of (27), (28), (29) are in Appendix B.
Next we consider correlation functions with the existence of an interaction. For S =
S0 + λSint, a correlation function is perturbatively given by
Zn(α) =
αe−S∫
α(1− λSint + · · · )e−S0∫
(1− λSint + · · · )e−S0
, (32)
where α ∈ Xn. Introducing a morphism Sint : k → Xk, where k is the degree of Sint, the
correlation function is algebraically given by
n − λZ(0)n+k ◦ (id
n ⊗ Sint) + 12λ
n+2k ◦ (id
n ⊗ Sint ⊗ Sint) + · · ·
1− λZ(0)
◦ Sint + 12λ2Z
2k ◦ (Sint ⊗ Sint) + · · ·
. (33)
Acting Zn on α ∈ Xn, we obtain the correlation function (32). One can obviously extend
Sint to include various interaction terms.
2.3 Braided Feynman rules
From the results in the preceding subsection, a correlation function can be represented by
summation of diagrams obeying the following rules below.
Figure 7: Propagator (left) and vertex (right).
Figure 8: The braiding ψ (left) and its inverse ψ−1 (right).
• An n-point function Zn is a morphism Xn → k. Thus a Feynman diagram starts with
n strands at the top and must be closed at the bottom.
• The propagator Z(0)2 : X ⊗X → k is represented by the left of Figure 7, which is the
abbreviation of Figure 9.
• The interaction vertex Sint : k → Xk is represented by the right of Figure 7. Generally
the order of the strands is noncommutative.
• The two kinds of crossings, which are represented in Figure 8, correspond to the braid-
ing and its inverse.
• Any Feynman diagram is built out of propagators, vertices, and crossings, and is closed
at the bottom.
3 Symmetries in braided quantum field theory
In this section, we discuss symmetries in braided quantum field theory. In order to represent
symmetry transformations on fields, we review general description of an action in Section
Figure 9: The propagator, which is abbreviated in the left figure of Figure 7.
3.1. In Section 3.2, we study relations among correlation functions. We find four conditions
for such relations to follow from a symmetry algebra. In Section 3.3 and 3.4, we treat two
examples of (braided) noncommutative field theories and discuss their Poincaré symmetries.
3.1 General description of an action
We review an action of a general Hopf algebra on vector spaces in a mathematical language
[37, 39].
An action αV is a map αV : A⊗ V → V , where A is an arbitrary Hopf algebra and V is
a vector space (in our case, A is a symmetry algebra, and V = X or X∗). We will denote
the coproduct and the counit of the Hopf algebra2 by ∆′ and ǫ′ to distinguish them from
those of the braided Hopf algebra of fields in Section 2. We do not write all the axioms of
an action, but our important axioms are the following.
• αV satisfies the following condition.
αV ◦ (· ⊗ id) = αV ◦ (id⊗ αV ), (34)
where the equality acts on A ⊗ A ⊗ V . This means that αV ((a · b) ⊗ V ) = αV (a ⊗
(αV (b⊗ V ))), where a, b ∈ A. In short we can write this as
(a · b) ⊲ V = a ⊲ (b ⊲ V ). (35)
• An action on 1, which is in a vector space, is defined by
αV (a⊗ 1) = ǫ′(a)1, (36)
where ǫ′(a) is the counit of an algebra a ∈ A.
• An action on a tensor product of vector spaces V,W is defined by
αV⊗W (a) := ((αV ⊗ αW ) ◦∆′)(a) =
αV (a
(1))⊗ αW (ai(2)), a ∈ A, (37)
where ∆′(a) =
ai(1) ⊗ ai(2) is the coproduct of the Hopf algebra A. In the case of a
usual Lie-algebraic transformation, its coproduct is given by ∆′(a) = a ⊗ 1 + 1 ⊗ a,
where 1 is in A. This gives the usual Leibnitz rule.
• Since a Hopf algebra has the coassociativity that
((∆′ ⊗ id) ◦∆′)(a) = ((id⊗∆′) ◦∆′)(a), (38)
2We omit the antipode.
the action on a tensor product of vector spaces, which is obtained by the multiple
operations of ∆′ on a, is actually unique. An important consequence is that one can
divide the action on a tensor product of vector spaces as
a⊲(V1 ⊗ · · · ⊗ Vk−1 ⊗ Vk ⊗ · · · ⊗ Vn) =∑
ai(1) ⊲ (V1 ⊗ · · · ⊗ Vk−1)⊗ ai(2) ⊲ (Vk ⊗ · · · ⊗ Vn) (39)
for any k.
3.2 Symmetry relations among correlation functions and their al-
gebraic descriptions
The expression of the correlation functions (33) is perturbative in interactions, but is a full
order algebraic description. Therefore we can discuss the symmetry of the theory and the
implied relations among correlation functions by using this expression. We may even expect
that the relations will hold non-perturbatively.
In usual quantum field theory, if a field theory has a certain symmetry, there is a relation
among the correlation functions in the form,
〈φ(x1) · · · δaφ(xi) · · ·φ(xn)〉 = 0, (40)
where δaφ(x) is a variation of a field under a transformation a, on the assumption that the
path integral measure and the action are invariant under the transformation.
If the coproduct of a symmetry algebra is not the usual Lie-algebraic type and thus the
Leibniz rule is deformed, the relation will generally have the form,
c(bi)a 〈φ(x1) · · · δbφ(xi) · · ·φ(xn)〉
+c(bi)(cj)a 〈φ(x1) · · · δbφ(xi) · · · δcφ(xj) · · ·φ(xn)〉
+c(bi)(cj)(dk)a 〈φ(x1) · · · δbφ(xi) · · · δcφ(xj) · · · δdφ(xk) · · ·φ(xn)〉
+ · · · = 0, (41)
where c···a are some coefficients. Its essential difference from (40) is the multi-field contribu-
tions. In our algebraic language, the relation can be written as
Zn(a ⊲ χ) = ǫ
′(a)Zn(χ), for a ∈ A, χ ∈ Xn. (42)
This is equivalent to Figure 10 in our diagrammatic representation. Then we consider what
an algebraic structure is required for (42) to hold for any a and χ, i.e. the theory is invariant
under the Hopf algebra transformation A.
Figure 10: A relation among correlation functions in the diagrammatic representation. n
is the number of external legs, k is the order of the interaction, and p is the order of the
perturbation. n+ kp is even.
Let us write the coproduct of an element a ∈ A as
∆′(a) =
f s ⊗ gs, (43)
where f s, gs ∈ A. Since the coproduct must satisfy the Hopf algebra axiom [37],
(ǫ′ ⊗ id)∆′(a) = (id⊗ ǫ′)∆′(a) = a, (44)
f s, gs must satisfy
ǫ′(f s)⊗ gs =
f s ⊗ ǫ′(gs) = a. (45)
For all the relations among correlation functions to hold, we find the following four
conditions for any action a ∈ A.
• (Condition 1) Sint must satisfy
a ⊲ Sint = ǫ
′(a)Sint. (46)
• (Condition 2) The braiding ψ is an intertwining operator. That is
ψ(a ⊲ (V ⊗W )) = a ⊲ ψ(V ⊗W ). (47)
• (Condition 3) γ−1 and a are commutative,
a ⊲ (γ−1(V )) = γ−1(a ⊲ V ). (48)
• (Condition 4) Under an action a, the evaluation map follows
ev(a ⊲ (X∗ ⊗X)) = ǫ′(a)ev(X∗ ⊗X). (49)
Condition 1 to 4 are diagrammatically represented in Figure 11. It is clear that, when the
algebra A is generated from a finite number of its independent elements, it is enough for
these generators to satisfy these conditions.
Condition 1 is the requirement of the symmetry at the classical level for the interaction.
We can extend this condition to
(a ⊲ Xn)⊗ Spint = a ⊲ (Xn ⊗ S
int). (50)
The proof is the following. From a coproduct (43) and its coassociativity (39), the right
hand side of (50) is equal to
(f s ⊲ (Xn ⊗ Sp−1int ))⊗ gs ⊲ Sint (51)
Since Condition 1 implies
gs ⊲ Sint = ǫ
′(gs)Sint, (52)
(51) becomes
(f s ⊲ (Xn ⊗ Sp−1int ))⊗ ǫ′(gs)Sint
= a ⊲ (Xn ⊗ Sp−1int )⊗ Sint, (53)
where we have used (45). Iterating this procedure, we obtain the left-hand side of (50).
Condition 2,3,4 can also be extended to
[n + kp− 1]ψ!! ◦ (a ⊲ Xn+kp) = a ⊲ [n+ kp− 1]ψ!! Xn+kp, (54)
(γ−1 ⊗ id)
2 ◦ (a ⊲ Xn+kp) = a ⊲ (γ−1 ⊗ id)
2 Xn+kp, (55)
2 (a ⊲ (X∗ ⊗X)
2 ) = ǫ′(a) ev
2 (X∗ ⊗X)
2 . (56)
We can find that these extended conditions (50), (54), (55), (56) can be represented as
in Figure 12. In the diagrammatic language, the relation among correlation functions holds
if an action can pass downwards through a Feynman diagram and satisfies (36).
3.3 Symmetries of the effective noncommutative field theory of
three-dimensional quantum gravity coupled with scalar parti-
In this subsection, we discuss the Poincaré symmetry of the effective noncommutative field
theory of three-dimensional quantum gravity coupled with scalar particles, which was ob-
tained in [31] by studying the Ponzano-Regge model [40] coupled with spinless particles. The
Figure 11: Conditions 1,2,3, and 4.
Figure 12: A relation among correlation functions is satisfied if the four conditions (46),
(47), (48), (49) are satisfied.
symmetries of this theory is also known as DSU(2), which was discussed in [32, 33]. We first
review the field theory [10, 31].
Let φ(x) be a scalar field on a three-dimensional space x = (x1, x2, x3). Its Fourier
transformation is given by
φ(x) =
dgφ̃(g)e
tr(Xg), (57)
where κ is a constant, X = ixiσi, and g = P 0−iκP iσi ∈ SO(3)3 with Pauli matrices σi. Here∫
dg is the Haar measure of SO(3) and P 0 = ±
1− κ2P iPi by definition. In the following
discussions, we will only deal with the Euclidean case, but the Lorentzian case can also be
treated in a similar manner by replacing SO(3) with SL(2, R).
The definition of the star product is given by
tr(Xg1) ⋆ e
tr(Xg2) := e
tr(Xg1g2). (58)
Differentiating both hands sides of (58) with respect to P i1 := P
i(g1) and P
2 := P
j(g2) and
then taking the limit P i1, P
2 → 0, one finds the SO(3) Lie-algebraic space-time noncommu-
tativity [7, 8, 9],
[xi, xj]⋆ = 2iκǫ
ijkxk. (59)
For example, the action4 of a φ3 theory is
(∂iφ ⋆ ∂iφ)(x)−
M2(φ ⋆ φ)(x) +
(φ ⋆ φ ⋆ φ)(x)
, (60)
where M2 = sin
. Its momentum representation is
P 2(g)−M2
φ̃(g)φ̃(g−1)
dg1dg2dg3δ(g1g2g3)φ̃(g1)φ̃(g2)φ̃(g3), (61)
from which it is straightforward to read the Feynman rules.
Some quantum properties of this scalar field theory were analyzed in [10]. As can be
seen from (59), the naive translational symmetry is violated. In fact, the violation is rather
disastrous. There exists a kind of conserved energy-momentum in the amplitudes of the
tree and the planar loop diagrams, but this energy-momentum is not conserved in the non-
planar loop diagrams. Moreover, the violation of the energy-momentum conservation does
not vanish in the commutative limit κ → 0 due to a mechanism similar to the UV/IR
phenomena [11].
3The identification g ∼ −g is implicitly assumed.
4Since in the Ponzano-Regge model the definition of the weight of partition function is eiS despite of
Euclidean theory, the sign of the mass term is not the usual one.
In the effective field theory of quantum gravity coupled with spinless particles, however,
the Feynman rules contain also a non-trivial braiding rule for each crossing, which comes
from a flatness condition in a graph of intersecting particles [31]. This can be incorporated
as a braiding between the scalar fields,
ψ(φ̃(g1)φ̃(g2)) = φ̃(g2)φ̃(g
2 g1g2), (62)
in the braided quantum field theory.
From the direct analysis of the Feynman graphs with this braiding rule, one can easily find
that the energy-momentum mentioned above is conserved also in the non-planar diagrams.
This suggests the existence of a translational symmetry in the quantum field theory. In
the sequel, we will discuss the embedding of this field theory into the framework of braided
quantum field theory, and will check the four conditions for its translational and rotational
symmetries.
We use the momentum representation, and take X as the space of φ̃(g) and X∗ as that
δφ̃(g)
. We take the braided Hopf algebra of the fields as follows,
∆ : φ̃(g) → φ̃(g)⊗̂1 + 1⊗̂φ̃(g), (63)
ǫ : φ̃(g) → 0, (64)
S : φ̃(g) → −φ̃(g), (65)
ψ : φ̃(g1)⊗ φ̃(g2) → φ̃(g2)⊗ φ̃(g−12 g1g2). (66)
The evaluation and coevaluation maps are given by
δφ̃(g)
⊗ φ̃(g′) → δ(g−1g′), (67)
coev : 1 →
dgφ̃(g)⊗ δ
δφ̃(g)
. (68)
From γ(∂) = ∂S0 = (P
2(g)−m2)φ̃(g−1),
γ−1(φ̃(g)) =
P 2(g−1)−m2
δφ̃(g−1)
. (69)
From the algebraic consistencies in Figure 13, the braidings between X and X∗ and the
braiding between X∗s are determined to be
δφ̃(g1)
⊗ φ̃(g2)
= φ̃(g2)⊗
δφ̃(g−12 g1g2)
, (70)
φ̃(g1)⊗
δφ̃(g2)
δφ̃(g2)
⊗ φ̃(g2g1g−12 ), (71)
δφ̃(g1)
δφ̃(g2)
δφ̃(g2)
δφ̃(g2g1g
. (72)
Figure 13: The algebraic consistency conditions of coevaluation map and X , X∗.
In this derivation, we have used the invariance of the Haar measure d(g−1g′g) = dg′.
Now we consider a translational transformation of the field. If we shift xi to xi + ǫi, a
field φ(x) becomes
φ(x) → φ(x+ ǫ)
dgφ̃(g)ei(x+ǫ)
iPi(g)
dg(1 + iǫiPi(g))φ̃(g)e
ixiPi(g). (73)
Thus in the momentum representation, the translational transformation corresponds to an
action
P i ⊲ φ̃(g) = P i(g)φ̃(g), P 0 ⊲ φ̃(g) = P 0(g)φ̃(g). (74)
From the requirement that the star product (58) conserve a kind of momentum, the action
on a product of fields should be
P i ⊲ (φ̃(g1)φ̃(g2)) = P
i(g1g2)φ̃(g1)φ̃(g2)
= (P 01P
2 + P
1 + κǫ
2 )φ̃(g1)φ̃(g2), (75)
P 0 ⊲ (φ̃(g1)φ̃(g2)) = (P
2 − κ2P i1P2i)φ̃(g1)φ̃(g2). (76)
This determines the coproduct of P i, P 0 as
∆′(P i) = P 0 ⊗ P i + P i ⊗ P 0 + κǫijkP j ⊗ P k, (77)
∆′(P 0) = P 0 ⊗ P 0 − κ2P i ⊗ Pi. (78)
This coproduct satisfies the coassociativity, which essentially comes from the associativity
of the group multiplication.
From the axiom (44), the counit of P i, P 0 is given by
ǫ′(P i) = ǫ′(P 0) = 0. (79)
Since the conservation of momentum under the coevaluation map (68) requires that the
action of P i on
dg(φ̃(g)⊗ δ
δφ̃(g)
) vanish from (36), the action of P i on δ
δφ̃(g)
must be
P i ⊲
δφ̃(g)
= P i(g−1)
δφ̃(g)
. (80)
In the following, we see that the momentum algebra satisfies the four conditions (46),
(47), (48), (49).
Condition 1 is satisfied since
P i ⊲ Sint
dg1dg2dg3δ(g1g2g3)P
i ⊲ (φ̃(g1)φ̃(g2)φ̃(g3))
dg1dg2dg3δ(g1g2g3)P
i(g1g2g3)(φ̃(g1)φ̃(g2)φ̃(g3))
= 0. (81)
Condition 2 is satisfied since
ψ(P i ⊲ (φ̃(g1)φ̃(g2))) = P
i(g1g2)(φ̃(g2)φ̃(g
2 g1g2)),
P i ⊲ ψ(φ̃(g1)φ̃(g2)) = P
i(g2g
2 g1g2)(φ̃(g2)φ̃(g
2 g1g2)).
Condition 3 is satisfied since
P i ⊲ γ−1(φ̃(g)) =
P 2(g−1)−m2
P i(g)
δφ̃(g−1)
γ−1(P i ⊲ φ̃(g)) =
P 2(g−1)−m2
P i(g)
δφ̃(g−1)
Condition 4 is satisfied since
P i ⊲
δφ̃(g1)
⊗ φ̃(g2)
= P i(g−11 g2) ev
δφ̃(g1)
⊗ φ̃(g2)
= 0. (82)
Thus we find that the effective braided noncommutative field theory of three-dimensional
quantum gravity coupled with spinless particles has the translational symmetry.
Next we consider a rotational symmetry. The rotational symmetry corresponds to an
action
Λ ⊲ φ̃(g) = φ̃(h−1gh), (83)
which is the usual Lie-group one. The action on the tensor product is
Λ ⊲ (φ̃(g1)⊗ φ̃(g2)) = φ̃(h−1g1h)⊗ φ̃(h−1g2h). (84)
Thus the coproduct of the rotational symmetry is given by
∆′(Λ) = Λ⊗ Λ. (85)
From the axiom (44), the counit of Λ is given by
ǫ′(Λ) = 1. (86)
Condition 1 is satisfied since
Λ ⊲ Sint
dg1dg2dg3δ(g1g2g3)Λ ⊲ (φ̃(g1)φ̃(g2)φ̃(g3))
dg1dg2dg3δ(g1g2g3)(φ̃(h
−1g1h)φ̃(h
−1g2h)φ̃(h
−1g3h))
=ǫ′(Λ)Sint. (87)
Condition 2 is satisfied since
ψ(Λ ⊲ (φ̃(g1)⊗ φ̃(g2))) = φ̃(h−1g2h)⊗ φ̃(h−1g−12 g1g2h)
Λ ⊲ ψ(φ̃(g1)⊗ φ̃(g2)) = φ̃(h−1g2h)⊗ φ̃(h−1g−12 g1g2h). (88)
Condition 3 is satisfied since
Λ ⊲ γ−1(φ̃(g)) =
P 2(g−1)−m2
δφ̃(h−1g−1h)
γ−1(Λ ⊲ φ̃(g)) =
P 2(h−1g−1h)−m2
δφ̃(h−1g−1h)
P 2(g−1)−m2
δφ̃(h−1g−1h)
. (89)
Condition 4 is satisfied since
δφ̃(g1)
⊗ φ̃(g2)
δφ̃(h−1g1h)
⊗ φ̃(h−1g2h)
= δ(g−11 g2)
= ǫ′(Λ)ev
δφ̃(g1)
⊗ φ̃(g2)
. (90)
Thus we find that this braided noncommutative field theory has also the rotational sym-
metry.
3.4 Twisted Poincaré symmetry of noncommutative field theory
on Moyal plane
In this subsection, we discuss the twisted Poincaré symmetry of noncommutative field theory
on Moyal plane [xµ, xν ] = iθµν .
For example, the action of a φ3 theory is given by
(∂µφ ∗ ∂µφ)(x)−
m2(φ ∗ φ)(x) + λ
(φ ∗ φ ∗ φ)(x)
, (91)
where the star product is given by
φ(x) ∗ φ(x) = e
θµν∂xµ∂
νφ(x)φ(y)
. (92)
In the momentum representation, the action is
(p2 −m2)φ̃(p)φ̃(−p)
dDp1d
µνp2νδ(p1 + p2 + p3)φ̃(p1)φ̃(p2)φ̃(p3)
. (93)
We take X as the space of φ̃(p) and X∗ as that of δ
δφ̃(p)
. Then we take the braided Hopf
algebra as follows:
∆ : φ̃(p) → φ̃(p)⊗̂1+ 1⊗̂φ̃(p), (94)
ǫ : φ̃(p) → 0, (95)
S : φ̃(p) → −φ̃(p). (96)
From γ(∂) = ∂S0 = (p
2 −m2)φ̃(−p),
γ−1(φ̃(p)) =
p2 −m2
δφ̃(−p)
. (97)
Let us consider the twisted Poincaré symmetry [12, 13, 14]. The coproduct and the counit
of the twisted Poincaré algebra is given by
∆′(P µ) = P µ ⊗ 1+ 1⊗ P µ,
ǫ′(P µ) = 0,
∆′(Mµν) =Mµν ⊗ 1+ 1⊗Mµν
θαβ [(δµαP
ν − δναP µ)⊗ Pβ + Pα ⊗ (δ
ν − δνβP µ)],
ǫ′(Mµν) = 0. (98)
Thus the action of the twisted Lorentz algebra on the tensor product is
Mµν ⊲ (φ̃(p1)⊗ φ̃(p2)) =Mµν ⊲ φ̃(p1)⊗ φ̃(p2) + φ̃(p1)⊗Mµν ⊲ φ̃(p2)
θαβ [(δµαP
ν − δναP µ) ⊲ φ̃(p1)⊗ Pβ ⊲ φ̃(p2)
+ Pα ⊲ φ̃(p1)⊗ (δµβP
ν − δνβP µ) ⊲ φ̃(p2)], (99)
where Mµν ⊲ φ̃(p) = i(pµ∂/∂pν − pν∂/∂pµ)φ̃(p) and P µ ⊲ φ̃(p) = pµφ̃(p). The actions of Mµν
and P µ on δ
δφ̃(p)
Mµν ⊲
δφ̃(p)
= i(pµ∂/∂pν − pν∂/∂pµ)
δφ̃(p)
, (100)
P µ ⊲
δφ̃(p)
= −pµ δ
δφ̃(p)
. (101)
One easily finds that three conditions (46), (48), (49) are satisfied, but (47) is not if the
braiding is trivial. In order to keep the invariance, the braiding must be taken as
ψ(φ̃(p1)⊗ φ̃(p2)) = eiθ
αβp2α⊗p1β(φ̃(p2)⊗ φ̃(p1)). (102)
This agrees with the previous proposal [21, 35].
We can easily check that the translational symmetry holds since the coproduct ∆′(P µ)
follows the usual Leibniz rule.
3.5 Relations among correlation functions : Examples
Now we have checked, in all orders of perturbation, that the two theories in the preceding
sections have symmetry relations among correlation functions implied by the Hopf algebra
symmetries. In Section 3.3 we gave how the translational generator acts on a product of
fields in (75), (76) in the momentum representation. Since the physical meaning of this Hopf
algebra transformation is not so clear, it would be interesting to see explicitly the symmetry
relations among correlation functions. The same thing is also true in the case of the twisted
Lorentz symmetry in Section 3.4. In this subsection, we work out explicitly some relations
among correlation functions in the two theories.
In the effective quantum field theory of quantum gravity, the action of the translational
generators on a correlation function is given by
〈φ̃(g1) · · · φ̃(gn)〉 → iǫiPi(g1 · · · gn)〈φ̃(g1) · · · φ̃(gn)〉 (103)
in the momentum representation, where ǫi is an infinitesimal parameter. Thus we obtain a
relation,
Pi(g1 · · · gn)〈φ̃(g1) · · · φ̃(gn)〉 = 0. (104)
This is a (modified) momentum conservation law; the correlation function has support only
on the vanishing momentum subspace, Pi(g1 · · · gn) = 0. This all-order relation in the
quantum field theory would be a simple but an important implication of the Hopf algebraic
translational symmetry. This provides a good example of the physical importance of a Hopf
algebraic symmetry: a Hopf algebra symmetry leads to a (modified) conservation law.
It would also be interesting to see the relations in the coordinate representations, where
the fields are defined by φ(x) =
eip·xφ̃(p). As explicitly noted in the preceding subsections,
we stress that the basis of the spaces X of the field variables in the path integrals are
parameterized in terms of momenta, and that φ(x) are defined by some c-number linear
combinations of them. Therefore, an action a ∈ A of a symmetry transformation acts as
a ⊲ φ(x) =
eip·x(a ⊲ φ̃(p)), (105)
and the symmetry relations of correlation functions can be obtained by some inverse Fourier
transformations (with possible non-trivial measures) of those in momentum representations.
For example, in the case of the two point function, after the inverse Fourier transforma-
tion, the relation among correlation functions is given by
〈∂iφ(x1)φ(x2) + φ(x1)∂iφ(x2)〉 = 0, (106)
where we have used the relation (104). Interestingly, this is the usual relation in a transla-
tionally invariant quantum field theory. In the case of the three point function, however, the
relation is given by
〈∂iφ(x1)
1 + κ2∂2φ(x2)
1 + κ2∂2φ(x3) +
1 + κ2∂2φ(x1)∂
iφ(x2)
1 + κ2∂2φ(x3)
1 + κ2∂2φ(x1)
1 + κ2∂2φ(x2)∂
iφ(x3) + iκǫ
1 + κ2∂2φ(x1)∂jφ(x2)∂kφ(x3)
+ iκǫijk∂jφ(x1)
1 + κ2∂2φ(x2)∂kφ(x3)− iκǫijk∂jφ(x1)∂kφ(x2)
1 + κ2∂2φ(x3)
+ κ2∂jφ(x1)∂
jφ(x2)∂
iφ(x3)− κ2∂iφ(x1)∂kφ(x2)∂kφ(x3)
+ κ2∂kφ(x1)∂
iφ(x2)∂
kφ(x3)〉 = 0. (107)
This is quite a non-trivial relation among correlation functions, and would be hard to find,
if the Hopf algebra symmetry in the quantum field theory was not noticed. This would be
another interesting example implying the physical importance of a Hopf algebra symmetry.
In general, the relation has the form,
∂xli − i
κǫijk∂xlj∂xmk +O(κ
2))〈φ(x1) · · ·φ(xn)〉 = 0. (108)
In the κ → 0 limit, the relation approaches the usual relation. Thus the Hopf algebra sym-
metry is a kind of translational symmetry modified by adding κ dependent higher derivative
multi-field contributions.
We can proceed in a similar manner for the twisted Lorentz symmetry. We have a general
form of such a symmetry relation as
Mµν ⊲ 〈φ̃(p1) · · · φ̃(pn)〉 = 0. (109)
In the case of the two point function, the relation is given by
〈(xµ1∂ν − xν1∂µ)φ(x1)φ(x2) + φ(x1)(x
ν − xν2∂µ)φ(x2)〉 = 0, (110)
where we have used the momentum conservation. This is the same relation as that in a
Lorentz invariant quantum field theory. In the case of the three point function, the relation
is given by
〈(xµ1∂ν − xν1∂µ)φ(x1)φ(x2)φ(x3)
+ φ(x1)(x
ν − xν2∂µ)φ(x2)φ(x3) + φ(x1)φ(x2)(x
ν − xν3∂µ)φ(x3)
iθαµ(∂αφ(x1)∂
νφ(x2)φ(x3) + ∂αφ(x1)φ(x2)∂
νφ(x3) + φ(x1)∂αφ(x2)∂
νφ(x3)
− ∂νφ(x1)∂αφ(x2)φ(x3)− ∂νφ(x1)φ(x2)∂αφ(x3)− φ(x1)∂νφ(x2)∂αφ(x3))
iθαν(∂αφ(x1)∂
µφ(x2)φ(x3) + ∂αφ(x1)φ(x2)∂
µφ(x3) + φ(x1)∂αφ(x2)∂
µφ(x3)
− ∂µφ(x1)∂αφ(x2)φ(x3)− ∂µφ(x1)φ(x2)∂αφ(x3)− φ(x1)∂µφ(x2)∂αφ(x3))〉 = 0. (111)
In general, the relation among correlation functions has the from,
((x1µ∂x1ν − x1ν∂x1µ) + · · ·+ (xnµ∂xnν − xnν∂xmν) +O(θ))〈φ(x1) · · ·φ(xn)〉 = 0 (112)
in the coordinate representation. The leading terms corresponds to the usual Lorentz trans-
formation xµ → xµ + ǫµνxν .
The above symmetry relations on Moyal plane can be represented in similar manners
as the usual commutative cases, if we use star products. In the papers [24, 25, 26, 27,
28, 29, 30], they have pointed out that in coordinate representation, correlation functions on
Moyal plane should be defined with star products extended to non-coincident points (see also
[43]) instead of usual products since the usual commutative commutation relation [x
i , x
j ] =
0 (i, j = 1, · · · , n) is not invariant under the twisted Poincaré transformation. Carrying
out Fourier transformation of the symmetry relation (109) in momentum representation to
such a noncommutative coordinate representation, we obtain the symmetry relations in star
tensor products. Namely (110) becomes
〈((xµ1∂ν − xν1∂µ)φ(x1)) ∗ φ(x2) + φ(x1) ∗ ((x
ν − xν2∂µ)φ(x2))〉 = 0, (113)
and (111) becomes
〈((xµ1∂ν − xν1∂µ)φ(x1)) ∗ φ(x2) ∗ φ(x3)
+ φ(x1) ∗ ((xµ2∂ν − xν2∂µ)φ(x2)) ∗ φ(x3) + φ(x1) ∗ φ(x2) ∗ ((x
ν − xν3∂µ)φ(x3))
iθαµ(∂αφ(x1) ∗ ∂νφ(x2) ∗ φ(x3) + ∂αφ(x1) ∗ φ(x2) ∗ ∂νφ(x3) + φ(x1) ∗ ∂αφ(x2) ∗ ∂νφ(x3)
− ∂νφ(x1) ∗ ∂αφ(x2) ∗ φ(x3)− ∂νφ(x1) ∗ φ(x2)∂α ∗ φ(x3)− φ(x1) ∗ ∂νφ(x2) ∗ ∂αφ(x3))
iθαν(∂αφ(x1) ∗ ∂µφ(x2) ∗ φ(x3) + ∂αφ(x1) ∗ φ(x2)∂µ ∗ φ(x3) + φ(x1) ∗ ∂αφ(x2) ∗ ∂µφ(x3)
− ∂µφ(x1) ∗ ∂αφ(x2) ∗ φ(x3)− ∂µφ(x1) ∗ φ(x2) ∗ ∂αφ(x3)− φ(x1) ∗ ∂µφ(x2) ∗ ∂αφ(x3))〉 = 0.
(114)
More generally we can derive the symmetry relations of correlation functions for tensor
fields φα1···αn(x) ≡ ∂α1 · · ·∂αnφ(x). For example in the case of the three point function of the
tensor fields, the symmetry relation becomes
〈((M1µν)α1···αl
δ1···δlφδ1···δl(x1)) ∗ φβ1···βm(x2) ∗ φγ1···γn(x3)
+ φα1···αl(x1) ∗ ((M
2µν)β1···βm
δ1···δmφδ1···δm(x2)) ∗ φγ1···γn(x3)
+ φα1···αl(x1) ∗ φβ1···βm(x2) ∗ ((M
3µν)γ1···γn
δ1···δnφδ1···δn(x3))
θαµ[∂αφα1···αl(x1) ∗ ∂
νφβ1···βm(x2) ∗ φγ1···γn(x3)
+ ∂αφα1···αl(x1) ∗ φβ1···βm(x2) ∗ ∂
νφγ1···γn(x3)
+ φα1···αl(x1) ∗ ∂αφβ1···βm(x2) ∗ ∂
νφγ1···γn(x3)
− ∂νφα1···αl(x1) ∗ ∂αφβ1···βm(x2) ∗ φγ1···γn(x3)
− ∂νφα1···αl(x1) ∗ φβ1···βm(x2)∂α ∗ φγ1···γn(x3)
− φα1···αl(x1) ∗ ∂
νφβ1···βm(x2) ∗ ∂αφγ1···γn(x3)]
θαν [∂αφα1···αl(x1) ∗ ∂
µφβ1···βm(x2) ∗ φγ1···γn(x3)
+ ∂αφα1···αl(x1) ∗ φβ1···βm(x2)∂
µ ∗ φγ1···γn(x3)
+ φα1···αl(x1) ∗ ∂αφβ1···βm(x2) ∗ ∂
µφγ1···γn(x3)
− ∂µφα1···αl(x1) ∗ ∂αφβ1···βm(x2) ∗ φγ1···γn(x3)
− ∂µφα1···αl(x1) ∗ φβ1···βm(x2) ∗ ∂αφγ1···γn(x3)
− φα1···αl(x1) ∗ ∂
µφβ1···βm(x2) ∗ ∂αφγ1···γn(x3)]〉 = 0, (115)
where
(Mµν)α1···αn
β1···βn = (Lµν)α1···αn
β1···βn + (Sµν)α1···αn
β1···βn
(Lµν)α1···αn
β1···βn = i(xµ∂ν − xν∂µ)δα1β1 · · · δαnβn
(Sµν)α1···αn
β1···βn = i(ηνβ1δ{α1
β2 · · · δαn}βn − ηµβ1δ{α1νδα2β2 · · · δαn}βn) (116)
If we bring the operators (M iµν)α1···αn
β1···βn (i = 1, 2, 3) out of the star products, θµν depen-
dent terms are canceled. The final expressions are just the usual Lorentz rotations on the
coordinates and the tensorial indices in the correlation functions. This is fully consistent
with the discussions in [29].
3.6 Origin of Hopf algebra symmetries
To study more the meaning of these additional terms, let us see closer the transformation
properties of the star products. In the latter case, it is known that the θµν dependence of
the twisted Lorentz transformation (99) comes from the Lorentz transformation of θµν itself
[41]. To see this, let us consider an infinitesimal Lorentz transformation, Λµν = δ
ν + ǫ
The transformation of θµν is given by
θµν → θµν + ǫµρθρν + ǫνρθµρ
:= θµν + δθµν . (117)
If one considers not only the transformation of the coordinates, x
′µ = xµ + ǫµνxν , but also
(117), and assumes that φ(x) ∗θ φ(x) and φ′(x′) ∗θ+δθ φ′(x′) be equal, one obtains, after the
Fourier transformation,
φ̃′(p1)⊗ φ̃′(p2)
(ǫµνMµν ⊗ 1+ 1⊗ ǫµνMµν + δθµνPµ ⊗ Pν)
φ̃(p1)⊗ φ̃(p2)
ǫµν∆′Mµν
φ̃(p1)⊗ φ̃(p2), (118)
which agrees with (99). This shows that the additional part of the coproduct of Mµν takes
into account the transformation of the non-dynamical background parameter θµν .
The former case can be discussed in a similar manner. The definition of the star product
is given by
iPi(g1) ⋆x e
ixiPi(g2) = eix
iPi(g1g2), (119)
where we have explicitly indicated the coordinate where the star product is taken. Then we
recognize that ei(x+ǫ)
iPi(g1) ⋆x+ǫ e
i(x+ǫ)iPi(g2) and ei(x+ǫ)
iPi(g1) ⋆x e
i(x+ǫ)iPi(g2) give distinct values.
Namely, if the coordinate of the star product is also shifted,
ei(x+ǫ)
iPi(g1) ⋆x+ǫ e
i(x+ǫ)iPi(g2) = ei(x+ǫ)
iPi(g1g2), (120)
but, if not,
ei(x+ǫ)
iPi(g1) ⋆x e
i(x+ǫ)iPi(g2) = eiǫ
iPi(g1)eiǫ
iPi(g2)eix
iPi(g1g2). (121)
Therefore, if we take the translational transformation as (120), and carry out the same
procedure in deriving (59), we always obtain a translational invariant commutation relation5,
[(x+ ǫ)i, (x+ ǫ)j ]⋆x+ǫ = 2iκǫ
ijk(x+ ǫ)k. (122)
5There is a similar discussion in [42].
Now, assuming that φ(x) ⋆x φ(x) and φ
′(x′) ⋆x′ φ
′(x′) be equal under the translation xi →
′i = xi + ǫi, we obtain, after the Fourier transformation,
φ̃′(g1)φ̃
′(g2) = (1− iǫiPi(g1g2))φ̃(g1)φ̃(g2), (123)
which is the same as (75).
From these two examples, we anticipate that the multi-field contributions in (41) comes
from the transformation properties of the star products.
4 Summary and comments
We have discussed symmetries in noncommutative field theories in the framework of braided
quantum field theory. We have obtained the algebraic conditions for a Hopf algebra to be a
symmetry of a braided quantum field theory, by discussing the conditions for the relations
among correlation functions generated from the transformation algebra to hold. Then we
have applied our discussions to the Poincaré symmetries in the effective noncommutative
field theory of three-dimensional quantum gravity coupled with spinless particles and in
the noncommutative field theory on Moyal plane. In the former case we can understand
the braiding between fields, which was derived from the three-dimensional quantum gravity
computation, from the viewpoint of the translational symmetry of the noncommutative field
theory on a Lie-algebraic noncommutative spacetime. In the latter case we have found that
the twisted Lorentz symmetry on Moyal plane is a symmetry of the quantum field theory only
after the inclusion of the nontrivial braiding factor, which is in agreement with the previous
proposal [28, 35]. Then we have discussed the meaning of the Hopf algebra symmetries from
the viewpoint of coordinate representation.
In the recent research a noncommutative field theory on κ-Minkowski spacetime is dis-
cussed [36]. Since this noncommutativity of the coordinates is given by [x0, xj] = i
xj , this
noncommutative field theory will not have the naive translational symmetry. We may intro-
duce a non-trivial braiding between fields as in the effective field theory discussed in Section
3.3 to keep the momentum conservation. However, while the effective field theory has the
braided category structure because of the invariance of the Haar measure d(g−1g′g) = dg′,
the measure of the momentum space of the field theory on κ-Minkowski spacetime is only
left-invariant [36]. Therefore it is not clear to us whether we can embed this field theory on
κ-Minkowski spacetime into the framework of braided quantum field theory.
Acknowledgments
We would like to thank S. Terashima and S. Sasaki for useful discussions and comments, and
would also like to thank L. Freidel for stimulating discussions and explaining their recent
results during his stay in Yukawa Institute for Theoretical Physics after the 21st Nishinomiya-
Yukawa Memorial Symposium. Y.S. was supported in part by JSPS Research Fellowships
for Young Scientists. N.S. was supported in part by the Grant-in-Aid for Scientific Research
No.13135213, No.16540244 and No.18340061 from the Ministry of Education, Science, Sports
and Culture of Japan.
A The proofs of the formula (20), (21)
We give the proofs of the formula (20), (21) using diagrams. At first we use the formula
êv(∂ ⊗ αβ) = êv(∂ ⊗ α)ǫ(β) + êv(∂ ⊗ β)ǫ(α), (124)
where α, β ∈ X̂. This is clear from the definition of êv.
Figure 14 gives the proof of (20). In the first line, we use the axiom (12), and in the
second line we use the lemma (124). We find the last line from the property of counit.
Next we prove (21). By using the braided Leibniz rule (20) as α ∈ X ⊗ X̂, the left-hand
side of (21) becomes Figure 15. The first term of Figure 15 becomes (ev ⊗ idn−1)(∂ ⊗ idnα)
by using the definition of coproduct (9).
In the second term of Figure 15, we divide X̂ into X ⊗ X̂ and iterate the same as we did
above. For example, if the degree of X̂ is 3, the second term of Figure 15 can be reduced as
in Figure 16. We have used ∆X = X⊗̂1+ 1⊗̂X in the second line of Figure 16. The result
agrees with (21).
In the same way, we can obtain the formula (21) in general.
B The proofs of (27), (28), (29)
From the definition of γ (25), we find that
αaw = −αdiff(γ−1(a)⊗ w), (125)
for a ∈ X and α ∈ X̂ . On the other hand, adding γ−1 and ψ to the braided Leibniz rule
(20) as in Figure 17, we find that
αdiff(γ−1(a)⊗ w) = diff(ψ(α⊗ γ−1(a))w)− (diff ◦ ψ(α⊗ γ−1(a)))w. (126)
Combining (125), (126), we obtain
αaw = −diff(ψ(α⊗ γ−1(a))w) + (diff ◦ ψ(α⊗ γ−1(a)))w. (127)
Integrating the both hand sides of (127) and using (24), we find that
Z(0)(αa) = Z(0)(diff ◦ ψ(α⊗ γ−1(a))). (128)
If α is b ∈ X ,
Z(0)(ba) = Z(0)(diff ◦ ψ(b⊗ γ−1(a)))
= ev ◦ ψ(b⊗ γ−1(a))
= ev ◦ (γ−1 ⊗ id) ◦ ψ(b⊗ a). (129)
Figure 14: The proof of (20).
Figure 15: The left-hand side of (21).
Figure 16: The second term of Figure 15.
Figure 17: The diagram obtained from adding γ−1 and ψ over the braided Leibniz rule.
Thus we obtain (27).
By putting α = 1, it is clear that
1 (a) = 0. (130)
Next we rewrite (128) for α ∈ Xn−1 using the formula (21). Diagrammatically it is
written as in Figure 18. The second equality is due to (21). Thus we obtain that
Z(0)n = (Z
n−2 ⊗ Z
2 ) ◦ ([n− 1]′ψ ⊗ id) (131)
Iterating this, we find (28) for even n and (29) for odd n.
Figure 18: Diagrammatic proof of (131)
References
[1] H. S. Snyder, “Quantized space-time,” Phys. Rev. 71, 38 (1947).
[2] C. N. Yang, “On Quantized Space-Time,” Phys. Rev. 72, 874 (1947).
[3] A. Connes and J. Lott, “Particle Models And Noncommutative Geometry (Expanded
Version),” Nucl. Phys. Proc. Suppl. 18B, 29 (1991).
[4] S. Doplicher, K. Fredenhagen and J. E. Roberts, “The Quantum structure of space-
time at the Planck scale and quantum fields,” Commun. Math. Phys. 172, 187 (1995)
[arXiv:hep-th/0303037].
[5] N. Seiberg and E. Witten, “String theory and noncommutative geometry,” JHEP 9909,
032 (1999) [arXiv:hep-th/9908142].
[6] L. J. Garay, “Quantum gravity and minimum length,” Int. J. Mod. Phys. A 10, 145
(1995) [arXiv:gr-qc/9403008].
[7] N. Sasakura, “Space-time uncertainty relation and Lorentz invariance,” JHEP 0005,
015 (2000) [arXiv:hep-th/0001161].
[8] J. Madore, S. Schraml, P. Schupp and J. Wess, “Gauge theory on noncommutative
spaces,” Eur. Phys. J. C 16, 161 (2000) [arXiv:hep-th/0001203].
[9] L. Freidel and S. Majid, “Noncommutative harmonic analysis, sampling theory and the
Duflo map in 2+1 quantum gravity,” arXiv:hep-th/0601004.
[10] S. Imai and N. Sasakura, “Scalar field theories in a Lorentz-invariant three-dimensional
noncommutative space-time,” JHEP 0009, 032 (2000) [arXiv:hep-th/0005178].
[11] S. Minwalla, M. Van Raamsdonk and N. Seiberg, “Noncommutative perturbative dy-
namics,” JHEP 0002, 020 (2000) [arXiv:hep-th/9912072].
[12] M. Chaichian, P. P. Kulish, K. Nishijima and A. Tureanu, “On a Lorentz-invariant
interpretation of noncommutative space-time and its implications on noncommutative
QFT,” Phys. Lett. B 604, 98 (2004) [arXiv:hep-th/0408069].
[13] J. Wess, “Deformed coordinate spaces: Derivatives,” arXiv:hep-th/0408080.
[14] F. Koch and E. Tsouchnika, “Construction of theta-Poincare algebras and their invari-
ants on M(theta),” Nucl. Phys. B 717, 387 (2005) [arXiv:hep-th/0409012].
[15] P. Aschieri, C. Blohmann, M. Dimitrijevic, F. Meyer, P. Schupp and J. Wess, “A
gravity theory on noncommutative spaces,” Class. Quant. Grav. 22, 3511 (2005)
[arXiv:hep-th/0504183].
http://arxiv.org/abs/hep-th/0303037
http://arxiv.org/abs/hep-th/9908142
http://arxiv.org/abs/gr-qc/9403008
http://arxiv.org/abs/hep-th/0001161
http://arxiv.org/abs/hep-th/0001203
http://arxiv.org/abs/hep-th/0601004
http://arxiv.org/abs/hep-th/0005178
http://arxiv.org/abs/hep-th/9912072
http://arxiv.org/abs/hep-th/0408069
http://arxiv.org/abs/hep-th/0408080
http://arxiv.org/abs/hep-th/0409012
http://arxiv.org/abs/hep-th/0504183
[16] P. Aschieri, M. Dimitrijevic, F. Meyer and J. Wess, “Noncommutative geometry and
gravity,” Class. Quant. Grav. 23, 1883 (2006) [arXiv:hep-th/0510059].
[17] X. Calmet and A. Kobakhidze, “Noncommutative general relativity,” Phys. Rev. D 72,
045010 (2005) [arXiv:hep-th/0506157].
[18] A. Kobakhidze, “Theta-twisted gravity,” arXiv:hep-th/0603132.
[19] M. Chaichian, P. Presnajder and A. Tureanu, “New concept of relativistic invariance
in NC space-time: Twisted Poincare symmetry and its implications,” Phys. Rev. Lett.
94, 151602 (2005) [arXiv:hep-th/0409096].
[20] M. Chaichian, K. Nishijima and A. Tureanu, “An interpretation of noncommuta-
tive field theory in terms of a quantum shift,” Phys. Lett. B 633, 129 (2006)
[arXiv:hep-th/0511094].
[21] A. P. Balachandran, G. Mangano, A. Pinzul and S. Vaidya, “Spin and statistics on the
Groenwald-Moyal plane: Pauli-forbidden levels and transitions,” Int. J. Mod. Phys. A
21, 3111 (2006) [arXiv:hep-th/0508002].
[22] A. P. Balachandran, A. Pinzul and B. A. Qureshi, “UV-IR mixing in non-commutative
plane,” Phys. Lett. B 634, 434 (2006) [arXiv:hep-th/0508151].
[23] F. Lizzi, S. Vaidya and P. Vitale, “Twisted conformal symmetry in noncommu-
tative two-dimensional quantum field theory,” Phys. Rev. D 73, 125020 (2006)
[arXiv:hep-th/0601056].
[24] A. Tureanu, “Twist and spin-statistics relation in noncommutative quantum field the-
ory,” Phys. Lett. B 638, 296 (2006) [arXiv:hep-th/0603219].
[25] J. Zahn, “Remarks on twisted noncommutative quantum field theory,” Phys. Rev. D
73, 105005 (2006) [arXiv:hep-th/0603231].
[26] J. G. Bu, H. C. Kim, Y. Lee, C. H. Vac and J. H. Yee, “Noncommutative field theory
from twisted Fock space,” Phys. Rev. D 73, 125001 (2006) [arXiv:hep-th/0603251].
[27] Y. Abe, “Noncommutative quantization for noncommutative field theory,”
arXiv:hep-th/0606183.
[28] A. P. Balachandran, T. R. Govindarajan, G. Mangano, A. Pinzul, B. A. Qureshi and
S. Vaidya, “Statistics and UV-IR mixing with twisted Poincare invariance,” Phys. Rev.
D 75, 045009 (2007) [arXiv:hep-th/0608179].
[29] G. Fiore and J. Wess, “On ’full’ twisted Poincare symmetry and QFT on Moyal-Weyl
spaces,” arXiv:hep-th/0701078.
http://arxiv.org/abs/hep-th/0510059
http://arxiv.org/abs/hep-th/0506157
http://arxiv.org/abs/hep-th/0603132
http://arxiv.org/abs/hep-th/0409096
http://arxiv.org/abs/hep-th/0511094
http://arxiv.org/abs/hep-th/0508002
http://arxiv.org/abs/hep-th/0508151
http://arxiv.org/abs/hep-th/0601056
http://arxiv.org/abs/hep-th/0603219
http://arxiv.org/abs/hep-th/0603231
http://arxiv.org/abs/hep-th/0603251
http://arxiv.org/abs/hep-th/0606183
http://arxiv.org/abs/hep-th/0608179
http://arxiv.org/abs/hep-th/0701078
[30] E. Joung and J. Mourad, “QFT with twisted Poincare invariance and the Moyal prod-
uct,” arXiv:hep-th/0703245.
[31] L. Freidel and E. R. Livine, “Ponzano-Regge model revisited. III: Feynman diagrams
and effective field theory,” Class. Quant. Grav. 23, 2021 (2006) [arXiv:hep-th/0502106].
[32] K. Noui, “Three dimensional loop quantum gravity: Towards a self-gravitating quantum
field theory,” Class. Quant. Grav. 24, 329 (2007) [arXiv:gr-qc/0612145].
[33] K. Noui, “Three dimensional loop quantum gravity: Particles and the quantum double,”
J. Math. Phys. 47, 102501 (2006) [arXiv:gr-qc/0612144].
[34] R. Oeckl, “Braided quantum field theory,” Commun. Math. Phys. 217, 451 (2001)
[arXiv:hep-th/9906225].
[35] R. Oeckl, “Untwisting noncommutative R**d and the equivalence of quantum field
theories,” Nucl. Phys. B 581, 559 (2000) [arXiv:hep-th/0003018].
[36] L. Freidel, J. Kowalski-Glikman and S. Nowak, “From noncommutative kappa-
Minkowski to Minkowski space-time,” arXiv:hep-th/0612170.
[37] S. Majid, “Foundations of quantum group theory,” Cambridge, UK: Univ. Pr. (1995)
607 p
[38] S. Majid, “Beyond supersymmetry and quantum symmetry: An Introduction to braided
groups and braided matrices,” arXiv:hep-th/9212151.
[39] A. Klimyk and K. Schmudgen, “Quantum groups and their representations,” Berlin,
Germany: Springer (1997) 552 p
[40] G. Ponzano and T. Regge, in “Spectroscopic and Group Theoretical Methods in
Physics” ed. F. Bloch, North-Holland, Amsterdam, (1968).
[41] L. Alvarez-Gaume, F. Meyer and M. A. Vazquez-Mozo, “Comments on noncommutative
gravity,” Nucl. Phys. B 753, 92 (2006) [arXiv:hep-th/0605113].
[42] A. Agostini, G. Amelino-Camelia, M. Arzano, A. Marciano and R. A. Tac-
chi, “Generalizing the Noether theorem for Hopf-algebra spacetime symmetries,”
arXiv:hep-th/0607221.
[43] R. J. Szabo, “Quantum field theory on noncommutative spaces,” Phys. Rept. 378, 207
(2003) [arXiv:hep-th/0109162].
http://arxiv.org/abs/hep-th/0703245
http://arxiv.org/abs/hep-th/0502106
http://arxiv.org/abs/gr-qc/0612145
http://arxiv.org/abs/gr-qc/0612144
http://arxiv.org/abs/hep-th/9906225
http://arxiv.org/abs/hep-th/0003018
http://arxiv.org/abs/hep-th/0612170
http://arxiv.org/abs/hep-th/9212151
http://arxiv.org/abs/hep-th/0605113
http://arxiv.org/abs/hep-th/0607221
http://arxiv.org/abs/hep-th/0109162
	Introduction
	Review of braided quantum field theory
	Braided categories and braided Hopf algebras
	Braided quantum field theory
	Braided Feynman rules
	Symmetries in braided quantum field theory
	General description of an action
	Symmetry relations among correlation functions and their algebraic descriptions
	Symmetries of the effective noncommutative field theory of three-dimensional quantum gravity coupled with scalar particles
	Twisted Poincaré symmetry of noncommutative field theory on Moyal plane
	Relations among correlation functions : Examples
	Origin of Hopf algebra symmetries
	Summary and comments
	The proofs of the formula (??), (??)
	The proofs of (??), (??), (??)
ABSTRACT
  Braided quantum field theories proposed by Oeckl can provide a framework for
defining quantum field theories having Hopf algebra symmetries. In quantum
field theories, symmetries lead to non-perturbative relations among correlation
functions. We discuss Hopf algebra symmetries and such relations in braided
quantum field theories. We give the four algebraic conditions between Hopf
algebra symmetries and braided quantum field theories, which are required for
the relations to hold. As concrete examples, we apply our discussions to the
Poincare symmetries of two examples of noncommutative field theories. One is
the effective quantum field theory of three-dimensional quantum gravity coupled
with spinless particles given by Freidel and Livine, and the other is
noncommutative field theory on Moyal plane. We also comment on quantum field
theory on kappa-Minkowski spacetime.

<|endoftext|><|startoftext|>
Introduction
Solar flares first revealed themselves as visual perturbations of the solar atmo-
sphere (“white light flares”) and hence immediately were construed as a pho-
tospheric process. With the invention of spectroscopic techniques, though, it
became clear that chromospheric emission lines such as Hα revealed flare pres-
ence much more readily. This led to the concept of the “chromospheric flare” and
to a great deal of observational material on Hα flares and eruptions, as reviewed
by Smith & Smith (1963), Zirin (1966), or Švestka (1976), for example. At some
point, prior to the discovery of coronal flare effects, the misinterpretations of the
Hα line profile even led to the incorrect idea that a flare was a sudden cooling of
the solar atmosphere. In any case, a perturbation of the lower solar atmosphere
violent enough to affect the solar luminosity itself (“white light”) implies a large
energy content.
Our view of flares now emphasizes the high temperatures and non-thermal
effects seen in the corona, and we generally believe the chromospheric effects
themselves to be secondary in nature. This may be true, but nonetheless the
modern observations confirm the fact that the lower solar atmosphere dominates
the radiant energy budget of a flare via the UV and white-light continua. Some-
how, therefore, the energy stored in the solar corona rapidly focuses down into
regions visible in chromospheric signatures; this accounts for the high contrast
of flare effects there. Thus the “chromospheric flare” remains essential to our
understanding of the overall processes involved.
The chromosphere nowhere exists as a well-defined layer with a reproducible
height structure. In this paper I use the term interchangeably with “lower
solar atmosphere,” embracing the phenomena of the visible photosphere through
the transition region. During flares the structure of these “layers” and the
physical conditions within them may change drastically. The changes generally
http://arxiv.org/abs/0704.0823v1
2 Hudson
happen so fast and on such small spatial scales that we cannot observe them
comprehensively. Understanding the impulsive phase in the chromosphere may
therefore seem like something of a lost cause from the the point of view of
theory, especially in view of our inability to understand the quiet chromosphere
any better than we do. The data repeatedly reveal that we simply have not
yet resolved the spatial or temporal structures involved in the impulsive phase,
and that without knowing the geometry of the physical structure, we cannot
really comprehend its physics. The TRACE (Handy et al. 1999) and RHESSI
(Lin et al. 2002) observations have provided more than one recent breakthrough,
however, and it may be that we are beginning to understand the gradual phase
of a flare at least.
This review is organized around several topics involving the behavior of
the chromosphere during a flare. These include the process of “chromospheric
evaporation” (Section 4), flare energetics (Section 5), the mechanisms of flare
continuum emission (Section 6), and the inference of flare structure from the
morphology of the chromospheric flare (Section 7 and Section 9). In Section 2
and Section 3 we give an overview of the history of chromospheric flares and
show a cartoon to establish a working model of a solar flare. Sections 8 and 11
discuss large-scale magnetic reconnection and theoretical ideas, and Section 10
presents a γ-ray mystery.
2. Historical Development
Although it was the white-light continuum that initially revealed the existence
of solar flares, the advent of spectroscopy (e.g., Hale 1930) allowed their regular
observation via the Hα line (see Švestka 1966 for a discussion of the historical de-
velopment of these observations). This strong absorption line actually becomes
an emission line during bright flares, and Hα limb observations frequently show
prominences and eruptions. Hα observers came to recognize a particular flare
morphology, the so-called two-ribbon flare. Bruzek (1964) described the pat-
terns followed by these events, which provided strong evidence that the solar
corona had to play a major role in flare development. Figure 1 reproduces one
of Bruzek’s sketches, and then illustrates in a cartoon (due to Anzer & Pneuman
1982) how this morphology led to our standard magnetic-reconnection scenario
that tries to embrace the X-ray observations and the coronal mass ejections
(CMEs) as well as the chromospheric ribbon structures.
In this standard picture a solar flare develops in a complicated manner that
involves restructuring of the coronal magnetic field in such a way as to release
energy. The immediate effects of this energy release are to produce broad-band
“impulsive phase” emissions and to drive chromospheric gas up into coronal
magnetic loops, the process we term “chromospheric evaporation.” A part of the
field magnetic structure may actually erupt and open out into the solar wind, in
the sense that the field lines stretch out past the Alfvé n critical point of the flow.
This opening may consist of rising loops which then take the form of a coronal
mass ejection (CME), or it may involve interactions with previously open field (a
process often termed “interchange reconnection” nowadays; see Heyvaerts et al.
1977). If a CME does accompany the flare, as it almost invariably does for
flares of GOES class X or greater, the energy involved in mass motions may
Chromospheric Flares 3
Figure 1. Left: one of Bruzek’s (1964) sketches, showing a flare with ribbons
on the disk and its equivalent Hα “loop prominence system” over the limb.
This key observational pattern led directly to the formation of our standard
flare model (right), in the form presented by Anzer & Pneuman (1982).
be comparable to the luminous energy (e.g., Emslie et al. 2005). Generally the
observations are limited in resolution, both temporal and spatial, and especially
in spectral coverage. Thus we often resort to a cartoon that serves to identify
how the essential parts of a flare relate to one another.
Soft X-ray observations show hot loops in the gradual phase of a flare.
These result from the material “evaporated” from the chromosphere and have
anomalously high gas pressure (but still low plasma β; however see Gary 2001).
Whereas the pressure at the base of the corona normally is of order 0.1 dyne cm−2,
a bright flare loop can achieve 103 dyne cm−2. This over-dense and over-hot coro-
nal loop gradually cools, and in its final stages the remaining plasma returns
to a more chromospheric state and suddenly becomes visible in Hα (Goldsmith
1971). The loops that have reached this state then form Bruzek’s Hα loop
prominence system (Figure 1).
During the ribbon expansion another important phenomenon occurs: hard
X-ray emission appears at the footpoints of the coronal loops that are in the pro-
cess of being filled by chromospheric evaporation (Hoyng et al. 1981). The hard
X-rays show that a substantial part of flare energy appears in the form of non-
thermal electrons (Kane & Donnelly 1971; Lin & Hudson 1976; Holman et al.
2003). The hard X-ray signature (and hence the energetic dominance of these
electrons) is present whether or not the flare develops the two-ribbon morphol-
ogy or has a CME association.
The hard X-ray emission occurs in the impulsive phase of the flare, contem-
poraneously with the period of chromospheric evaporation that fills the coro-
nal loops and with the acceleration phase of the associated CME (Zhang et al.
2001). In Section 5 we describe this phase of the flare with the thick-target
model (Kane & Donnelly 1971) which Hudson (1972) identified with the energy
source of white-light flare continuum.
3. The Flare Spectrum
A (major) flare can be observed at almost any wavelength in a fast-rise/slow-
decay time profile, with some (e.g., the white-light continuum) having a more
impulsive variability, and others (e.g., the Balmer lines) having a more gradual
pattern (Figure 2, right). We generally describe a flare as consisting of a foot-
4 Hudson
Figure 2. Left: Line widths of the Balmer-series lines, from the classic
paper by Suemoto & Hiei (1959). The inferred densities added to the curves
are logne = 13.5 and 13.3; the inferred filling factor is small, suggesting
either filamentary structure or thin layering. Right: Typical time series of
flare radiations, distinguishing the impulsive phase from the gradual phase
(see Kane & Donnelly 1971).
point and ribbon structures in the lower atmosphere, coronal loops, and various
kinds of ejecta. The impulsive phase is typically associated with the footpoint
structures, and the gradual phase with the flare ribbons. Nowadays imaging
spectroscopy in principle allows us to study these regions independently.
Flare spectroscopy began with the observation of the Balmer series, which
shows broad lines tending towards emission profiles as the flare gets more en-
ergetic. Early observations of the higher members of the sequence allowed the
inference of a relatively high density and of a small filling factor (Suemoto & Hiei
1959; see the left panel of Figure 2). Such observations refer to what we would
now call the gradual phase of the flare (see the right panel of Figure 2 for a
sketch of the temporal development of a flare). In the impulsive phase the con-
tinuum appears in emission, as noted originally by Carrington and by Hodgson
independently. The weak photospheric metallic lines may also go weakly into
emission (or are filled in by continuum) and the recent observations of Xu et al.
(2004) show that flare effects can appear even at the “opacity minimum” region
of the spectrum, where one would expect much higher densities. In fact a single
density could never properly describe such a heterogeneous structure, but each
spectral band provides its own clues. At the time of writing no proper analy-
sis of spectroscopic “response functions” (e.g., Uitenbroek 2005) for any of the
signatures has yet been attempted, so our inference of flare structure from the
spectroscopy alone is weak.
The continuum radiation seen in white light and the UV constitutes the bulk
of flare radiated energy (Kane & Donnelly 1971; Woods et al. 2006). TRACE
imaging of this emission component shows it to consist of unresolved, intensely
Chromospheric Flares 5
bright fine structures (Hudson et al. 2006). The thick-target model invokes fast
electrons (energies above about 10 keV) to transport coronal energy into the
chromosphere. Here collisional losses provide the heating and footpoint emis-
sions that accompany the hard X-ray bremsstrahlung. The thick-target model
does not explain the particle acceleration, nor show how the footpoint sources
can be so intermittent. We return to this question in Section 7.
The spectra emitted at the footpoints of the flaring coronal loops have
contributions over an exceptionally broad wavelength range, as sketched in the
right panel of Figure 2. The prototypical observable is the hard X-ray flux, which
imaging observations show to be concentrated at the footpoints (Hoyng et al.
1981), but impulsive footpoint emissions also occur in many spectral windows
ranging from the microwaves (limited presumably by opacity) to the γ-rays
(limited presumably by detection sensitivity). There is a large body of work
on the Hα line alone, both observation and theory. Berlicki (2007) reviews the
Hα spectroscopic material in detail in these proceedings. A strong absorption
line forms across a wide range of continuum optical depths, and in principle this
single line might provide sufficient information to infer the physical structure of
the flare everywhere. In practice the complexities of the radiative transfer and
of the flare motions, especially in the impulsive phase, make this information
ambiguous (see Berlicki 2007).
4. Chromospheric Evaporation
The motions most directly relevant to the chromosphere are often called “chro-
mospheric evaporation,” even though the direct Doppler signatures of this mo-
tion are normally found in lines formed at higher temperatures (but see Berlicki et al.
2005). That this process occurs (even if it is not “evaporation” strictly speak-
ing) was suggested by the early observations of loop prominence systems (e.g.,
Bruzek 1964) with their “coronal rain,” and Neupert (1968) established its as-
sociation with non-thermal processes such as bursts of microwave synchrotron
radiation. The thermal microwave spectrum (e.g., Hudson & Ohki 1972) made
it particularly clear that the gradual phase of a solar flare involves the temporary
levitation of chromospheric material into the corona, as opposed to the process
that might be imagined from the earlier term “sporadic coronal condensation”
(e.g., Waldmeier 1963). The flows involved in chromospheric evaporation are
along the field direction and serve to create systems of coronal loops with rel-
atively high gas pressure and therefore higher (but still probably low) plasma
beta.
The early observational indications of chromospheric evaporation actually
came from blueshifts in EUV and soft X-ray lines (e.g., Antonucci et al. 1982;
Acton et al. 1982) such as those from FeXXV or CaXIX. Figure 3 shows an
image-resolved view of Doppler shifts in an evaporative flow (Czaykowska et al.
1999). The chromospheric effects are more subtle and in fact the impulsive-phase
evaporation is difficult to disentangle from other effects (Schmieder et al. 1987).
The high-temperature blueshifts correspond to upward velocities of some hun-
dreds km/s and seldom appear in the absence of a stationary emission line; in
other words, hot plasma has already accumulated in coronal loops as the process
continues. Based on theory and simulations (Fisher et al. 1985) one can distin-
6 Hudson
Figure 3. Imaging spectroscopy from SOHO/CDS of EUV emission lines
in the gradual phase of a two-ribbon flare, showing the clear signature of
blueshifted upflows in the expected locations along the flare ribbons. This is
“gentle” evaporation not associated with strong hard X-ray emission (from
Czaykowska et al. 1999). Note that CDS produces images by scanning in one
spatial dimension, so that each image (while monochromatic) is not instanta-
neous.
guish “explosive” and “gentle” evaporation, depending upon the physics of en-
ergy deposition (e.g., Abbett & Hawley 1999). In explosive evaporation, driven
hypothetically by an electron beam, one has the additional complication of a
“chromospheric condensation” that produces a redshift as well. Schmieder et al.
(1987) and Berlicki et al. (2005) survey our overall understanding. It would be
fair to comment that the explosive evaporation stage remains ill-understood,
even though it principle it describes the key physics of sudden mass injections
into flare loops.
Chromospheric Flares 7
-500 0 500 1000 1500 2000 2500 3000
VAL C Height, km
-500 0 500 1000 1500 2000 2500 3000
VAL C Height, km
10000
Figure 4. Characteristic radiative cooling time (upper) as a function of
height in the VAL-C model, crudely estimated as described in the text. The
lower panel shows the temperature in this model.
5. Energetics and Magnetic Field
We can use the standard VAL-C model (Vernazza et al. 1981), as discussed
further in the Appendix, to discuss the energetics. First we establish that the
chromosphere and the rest of the lower solar atmosphere (i.e., that for which
τ5000 < 1) have negligible heat capacity and limited time scales. Figure 4 shows
an estimate of the radiative time scale in the VAL-C model (Vernazza et al.
1981). This shows 3σ(z)kT/L⊙, where σ is the surface density as a function of
height about the τ5000 = 1 layer, and L⊙ the solar luminosity. The time scale
decreases below 1 sec only above z ∼ 515 km, near the temperature minimum
in the VAL-C model. Above this height any energy injected into the system will
tend to radiate rapidly, resulting in a direct energy balance between input and
output energy, rather than a local storage and release. At lower altitudes we
would not expect to see rapid variability.
The model also allows us to ask whether the chromosphere itself can store
energy comparable to that released in a major flare or CME. Table 1 gives some
order-of-magnitude properties for a chromospheric area of 1019 cm2, showing
both possible sources (bold) and sinks (italics) of energy. For the magnetic field
we simply assume 10 or 1000 G as representative cases. Using the total magnetic
energy in this manner is an upper limit, since the actual free energy would
depend on its degree of non-potentiality. We find that magnetic energy storage
limited to the volume of the chromosphere will not suffice, unless unobservably
small-scale fields there somehow dominate. The gravitational potential energy
also will not suffice. Estimates of this sort confirm the idea that the flare energy
must reside in the corona prior to the event.
The estimate of gravitational potential energy is somewhat more ambiguous.
The Table shows the value needed to displace the entire atmosphere by its total
thickness, the equivalent of roughly 3′′ in the VAL-C model. There does not
seem to be any evidence for such a displacement, although I am not aware of
any searches. It is likely that the stresses that store energy in the coronal field
8 Hudson
have their origin deeper in the convection zone, rather than in the atmosphere
(McClymont & Fisher 1989). Actually the observable changes of gravitational
energy are even of the wrong sign, given that we normally observe only outward
motions, (against gravity) during a flare.
Table 1. Properties of a chromospheric volume of area 1019 cm2
Parameter VAL-C VAL-C above Tmin
Mass 4×1019 gram 5×1017 gram
Magnetic energy 1×1028erg 8×1027 erg (10 G)
Magnetic energy 1×1032 erg 8×1031 erg (103 G)
Gravitational energya 3×1032 erg 3×1030 erg
Thermal energy 2×1031 erg 3×1029 erg
Kinetic energy 3×1029 erg 3×1027 erg
Ionization energy 4×1032 erg 6×1030 erg
aPotential energy for a vertical displacement of 2.5 × 108 cm
From Table 1 one concludes that the chromosphere probably does play a
dominant role in the energetics of a solar flare, at least as described by a semi-
empirical model such as VAL-C. This just restates the conventional wisdom,
namely that the flare energy needs to be stored magnetically in the corona,
rather than in the chromosphere where the radiation forms. Note that this
is backwards from the relationship for steady emissions: the requirement for
chromospheric heating is larger than that for coronal heating, so it is possible to
argue that the steady-state corona actually forms as a result of energy leakage
from the process of chromospheric heating (e.g., Scudder 1994).
We can make a similar estimate for energy flowing up from the photosphere.
The Alfvén speed at τ5000 = 1 ranges from 3 to 30 km s
−1, depending on the
field strength, in the VAL-C model (see Appendix). Below the surface of the Sun
vA drops rapidly because of the increase of hydrogen ionization. Thus chromo-
spheric flare energy cannot have been stored just below the photosphere, since it
could not propagate upwards rapidly enough (McClymont & Fisher 1989). This
again supports the conventional wisdom that flare energy resides in the corona
prior to the event.
To drive chromospheric radiations from coronal energy sources requires ef-
ficient energy transport, which is normally thought to be in the form of non-
thermal particles (the “thick target model”, Brown 1971; Hudson 1972) or in
the form of thermal conduction as in the formation of the classical transition
region. Both of these mechanisms provide interesting physical problems, but the
impulsive phase of the flare (where the thick-target model usually is thought to
apply) certainly remains less understood. Section 11 comments on models.
The magnetic field in the chromosphere is decisively important but ill-
understood. The plasma beta is generally low (see Appendix), so just as in
the corona the dynamics depends more on the behavior of the field itself than
to the other forces at work. Generally we believe that the subphotospheric field
exists in fibrils, implying the existence of sheath currents that isolate the flux
tubes from their unmagnetized environment. On the other hand, the dominance
of plasma pressure in the chromosphere as well as the corona implies that the
Chromospheric Flares 9
field must rapidly expand to become space-filling. Longcope & Welsch (2000)
discuss the physics involved in this process as flux emerges from the interior.
The effect of the flux emergence must be to create current systems linking the
sources of magnetic stress below the photosphere, with the non-potential fields
containing the coronal free energy. A full theory of how this works does not
exist, and we must add to the uncertainty the possibilty of unresolved fields
(e.g., Trujillo Bueno et al. 2004). Their suggested factor of 100 in B2 would
clearly affect the estimate of magnetic energy given in Table 1 and perhaps
change everything. We note in this context that the “impulse response” flares
(White et al. 1992) have scales so small that one could argue for an entirely
chromospheric origin.
6. Energetics and the Formation of the Continuum
The formation of the optical/UV emission spectrum of a solar flare has from
the outset presented a special challenge, since (a) it represents so much energy,
and (b) it appears in what should be the stablest layer of the solar atmosphere.
The recent observations of rapid variability and spatial intermittency make this
all the more interesting, and these observations – now from space – also help
to intercompare events; previous catalogs of white-light flares (e.g., Neidig 1989
and references therein) had to be based on spotty observations made with a
wide variety of instruments. Observationally, the continuum appears to have two
classes, with most events (“Type I” spectra) showing evidence for recombination
radiation via the presence of the Balmer edge and sometimes the Paschen edge
as well. A few events (e.g., Machado & Rust 1974) show spectra with weak or
unobservable Balmer jumps, implicating H− continuum as observed in normal
photospheric radiation. The spectra in the latter class (“Type II”) suggests
a relationship to Ellerman bombs (Chen et al. 2001). However, the physics of
Ellerman bombs appears to be quite different from that of solar flares (e.g.,
Pariat et al. 2004), though.
The strong suggestion from correlations is that non-thermal electrons phys-
ically transport flare energy from the corona, where it had been stored in the
current systems of non-potential field structures, into the radiating layers. The
hard X-ray bremsstrahlung results from the collisional energy losses of these
particles, and other signatures (such as the optical/UV continuum) depends on
secondary effects. Proposed mechanisms include direct heating, heating in the
presence of non-thermal ionization, and radiative backwarming. In some manner
these effects (or others not imagined) must provide the emissivity ǫν , to support
the observed spectrum. Note that the emissivity is often expressed in terms
of the source function Sν = ǫν/κν via the opacity κν . In a steady state one
would have energy balance between the input (e.g., electrons) and the contin-
uum. Fletcher et al. (2007) have now shown that this implies energy transport
by low-energy electrons, below 25 keV, as opposed to the 50 keV or higher sug-
gested by some earlier authors. Such low-energy electrons have little penetrating
power and could not directly heat the photosphere itself from a coronal accel-
eration site. Thus either the continuum arises from altered conditions in the
chromosphere, or some mechanism must be devised to link the chromosphere
and photosphere not involving the thick-target electrons.
10 Hudson
“Radiative backwarming” – for example Balmer and Paschen continuum
excited in the chromosphere and then penetrating down to and heating a deeper
layer – could in principle provide a vertical step between energy source and sink.
One problem with this is that the weaker backwarming energy fluxes might
not cause appreciable heating in the denser atmosphere, and thus not be able
to contribute to the observed continuum excess, because of the short radiative
time scale. This idea is a variant of the mechanism of non-thermal ionization
originally proposed by Hudson (1972) in the “specific ionization approximation,”
which involves no radiative-transfer theory and simply assumes ion-electron pairs
to be created locally at a mean energy (∼30 eV per ion pair). Finally, the rapid
variability observed in the continuuum, even at 1.56 µm (Xu et al. 2006) provides
a clear argument that the continuum forms at the temperature minimum or
above (see Section 5, especially Figure 4).
Early proponents of particle heating as an explanation for white-light flares
also considered protons as an energy source (Najita & Orrall 1970; Švestka
1970). This made sense, because protons at energies even below those char-
acteristic of γ-ray emission-line excitation can penetrate more deeply than the
electrons that produce hard X rays. It makes even more sense now that we have
the suggestion that ion acceleration in solar flares may rival electron accelera-
tion energetically (Ramaty et al. 1995). Simnett & Haines (1990) suggest that
particle acceleration in solar flares involves a neutral beam, implying that the
major energy content (and hence the optical/UV continuum) would originate in
the ion component. This idea does not appear to explain the apparent simul-
taneity of the footpoint sources (Sakao et al. 1996), and at present we do not
understand the plasma physics of the particle acceleration and propagation well
enough even to identify the location of the acceleration region.
7. Flare Structures Inferred from Chromospheric Signatures
The continuum kernels may move systematically for perhaps tens of seconds and
generally have short lifetimes. We illustrate this in Figure 5 (from Fletcher et al.
2004). This shows the motions of individual UV bright points within the flare
ribbon structure. Such motions are only apparent motions, as in a deflagration
wave, because they exceed the estimated photospheric Alfvén speed (see Sec-
tion 4 and the Appendix). Figure 6 (from Hudson et al. 2006) makes the same
point for a different flare, using TRACE white-light observations. The basic
picture one gets from such observations is that the white light/UV continuum
of a flare appears in compact structures that are essentially unresolved in space
and in time within the present observational limits. These bright points con-
tain enormous energy and thus must map directly to the energy source. We do
not know if the fragmentation (intermittency) results from this mapping or is
intrinsic to the basic energy-release mechanism.
How do the small chromospheric sources map into the corona, where the
flare energy must reside on a large scale before its release? A strong literature
has grown up regarding this point, interpreting the ribbon motions as measures
of flux transfer in the standard magnetic-reconnection model (Poletto & Kopp
1986; see also literature cited, for example, by Isobe et al. 2005). The flux
transfer in the photosphere is taken to measure the coronal inflow into the re-
Chromospheric Flares 11
Figure 5. Flare footpoint apparent motions deduced from TRACE UV ob-
servations. Each squiggle represents the track of a bright point visible for
several consecutive images at a few-second cadence, with the black dot show-
ing the beginning of each track (Fletcher et al. 2004).
Figure 6. Intermittent structure seen in TRACE white-light images of
an M-class flare on July 24, 2004. The individual frames have dimensions
32′′× 64′′. Note the presence of bright features consistent with the TRACE
angular resolution, and which change from frame to frame over the 30-second
interval. These observations do not appear to resolve the fluctuations either
in space or in time (Hudson et al. 2006).
connecting current sheet, which appears to correlate with the radiated energy as
seen in hard X-rays, UV, or Hα. Figure 1 (right) shows the assumed geometry
linking the chromosphere and corona. The analysis extends to the multiple si-
multaneous UV footpoints apparently moving within the ribbons as they evolve,
as noted in Figure 5 above. The analyses suggest a strong relationship between
energy release and the inferred coronal Alfvén speed.
12 Hudson
Figure 7. Left: UV ribbons (TRACE observations) from a flare of Novem-
ber 23, 2000. The gray scale shows the time sequence of brightening. Right:
Correlation between pixel brightness in Ribbon A and the inferred reconnec-
tion rate (from Saba et al. 2006).
8. Dynamics and Magnetic Reconnection
To release energy from coronal magnetic field in a largely “frozen-field” plasma,
a flare must involve mass motions. We often do observe apparent motions,
both parallel and perpendicular to the field as indicated by the image striations
(“loops”). Most of the observable motions are outward, leading to the idea of a
“magnetic explosion” (e.g., Moore et al. 2001). Motions apparently perpendicu-
lar to the magnetic field may become coronal mass ejections (CMEs) and contain
a great deal of energy (e.g., Emslie et al. 2005). These perpendicular motions
also are involved in flare energy release; for example the large-scale magnetic
reconnection involved in many flare models (Figure 1, right panel) necessarily
involve “shrinkage” (e.g., Švestka et al. 1987; Forbes & Acton 1996). Note that
this process is more of a magnetic implosion than a magnetic explosion (Hudson
2000).
The motion of flare footpoints and ribbons is (we believe) only apparent,
because of the low Alfvén speed vA in the photosphere, where the field is tem-
porarily anchored (“line-tied”). For B =1000 G and n = 1017 cm−3 we find
vA ∼ 6 km s
−1; observations often suggest motions an order of magnitude faster
(e.g., Schrijver et al. 2006). The motions therefore represent a wave-light confla-
gration moving through a relatively fixed magnetic-field structure. It is natural
to imagine that this sequence of field lines links to the coronal energy-release
site, which the standard model identifies with a current sheet that mediates
large-scale magnetic reconnection.
Figure 7 shows one example of the result of an analysis of the apparent
motion of a flare ribbon (Saba et al. 2006). This and other similar analyses reveal
a tendency for the “reconnection rate” to correlate with the pixel brightness. The
reconnection rate is the rate at which flux is swept out in the ribbonmotion, often
expressed as an electric field from E = v×B (the so-called “reconnection electric
field”). In this picture the flare ribbons are identified with “quasi-separatrix
structures” where magnetic reconnection can take place most directly.
Chromospheric Flares 13
9. Surges, Sprays, and Jets
Chromospheric material also appears in the corona in the form of surges and
sprays, which may have a close relationship to the flare process (e.g., Engvold
1980). In addition, of course, we observe filaments and prominences in chromo-
spheric lines, and these also have a flare/CME association, but too tangential
for discussion in this review.
Surges and sprays are Hα ejecta, rising into the corona as a result of chro-
mospheric magnetic activity. The literature traditionally distinguishes them by
apparent velocity, with the faster-moving sprays taken to have stronger flare
associations. Surges often appear to return to the Sun, while sprays acceler-
ate beyond the escape velocity and do not return. Both appear to move along
the magnetic field lines, but unlike the evaporation flow the surges and sprays
incorporate material at chromospheric temperatures.
Modern soft X-ray and EUV data (Yohkoh, SOHO, and TRACE) have had
sufficient time resolution to reveal the phenomenon of X-ray jets (Shibata et al.
1992); see also the UV observations of Dere et al. 1983. These tightly-collimated
structures at X-ray temperatures have a strong correlation with surges and
sprays, and indeed presumably lead to the jet-like CMEs seen at much greater
altitudes (Wang & Sheeley 2002). These events have a strong association with
emerging flux, and indeed the X-ray jets invariably have an association with mi-
croflares and originate in the chromosphere near the microflare loop(s) (Shibata et al.
1992). As Zirin famously remarked, most emerging flux emerges within active
regions, and that is where the jets preferentially occur. The site is frequently
in the leading part of the sunspot group. Figure 8 (Canfield et al. 1996) shows
the sequence of events in an explanation of these phenomena invoking magnetic
reconnection to allow chromospheric material access to open fields. Note that
this scenario imposes two requirements on the chromosphere: there must be
open and closed fields juxtaposed, and a large-scale reconnection process must
be able to proceed under chromospheric conditions. The Canfield et al. (1996)
observations strongly imply that this process requires the presence of vertical
electric currents supporting the observed twisting motions.
The surges, sprays, and jets, not to mention flares and CMEs, underscore
the time dependence and three-dimensionality characterizing what is often char-
acterized as a thin time-independent layer for convenience. The subject of
spicules is outside the scope of this review, but we note that they represent
a form of activity that occurs ubiquitously outside the magnetic active regions.
10. A Chromospheric γ-ray Mystery
The γ-ray observations of solar flares have begun, as did the radio and X-ray
observations before them, to open new windows on flare physics. Share et al.
(2004) have made a discovery that is difficult to understand and which in-
volves chromospheric material. They report observations of the line width of the
0.511 MeV γ-ray emission line formed by positron annihilation (Figure 9). This
emission requires a complicated chain of events: the acceleration of high-energy
ions, their collisional braking and nuclear interactions in the solar atmosphere,
the emission of secondary positrons by the excited nuclei, the collisional braking
14 Hudson
Figure 8. A mechanism for jet/surge formation involving emerging flux
(upper left), with magnetic reconnection against already-open fields (upper
right), which may lead to a high-temperature ejection (the jet) entraining
chromospheric material (the surge). The cartoon at lower right describes the
observations of (Canfield et al. 1996), who find a spinning motion suggesting
that the process must occur in a 3D configuration rather than that of the
cartoons left and above.
of these energetic positrons in turn, and finally their recombination with ambient
electrons to produce the 0.511 MeV γ-rays. Because the γ-ray observations are
so insensitive, this process requires an energetically significant level of particle
acceleration that is possibly distinct from the well-known electron acceleration
in the impulsive phase.
The mystery comes in the line width of the emission line. Surprisingly the
pioneering RHESSI observations of Share et al. (2004) showed it to be broad
enough to resolve. The likeliest source of this line broadening is Doppler mo-
tions in the positron-annihilation region. This requires the existence of a large
column density (of order gram cm−2) at transition-region temperatures; the
transition region under hydrostatic conditions would be many orders of mag-
nitude thinner (see also Figure 11). According to Schrijver et al. (2006), the
excitation of the footpoint regions during the the time of intense particle accel-
eration only continues for some tens of seconds at most. This would represent
the time scale for the apparent motion of a foopoint source across its diameter.
The γ-ray observations, on the other hand, require minutes of integration for a
statistically significant line-profile measurement.
We therefore are confronted with a major problem. What is the structure
of the flaring atmosphere that permits the formation of the broad 0.511 MeV
γ-ray line? Recent spectroscopic observations of the impulsive phase in the UV,
as viewed off the limb (Raymond et al. 2007) make a conventional explanation
difficult.
Chromospheric Flares 15
500 505 510 515 520
Energy (keV)
0.000
0.005
0.010
0.015
Figure 9. RHESSI γ-ray observations of the 0.511 MeV line of positron
annihilation (Share et al. 2004). The two line profiles are from different inte-
grations in the late phase of the X17 flare of 28 October 2003; for the broader
line the authors suggest thermal broadening, which would require a large
column depth of transition-region temperatures during the flare.
11. Theory and Modeling
To understand the chromospheric spectrum of a solar flare we must understand
the formation of the radiation and its transfer in the context of the motions
produced by (or producing) the flare. The representation of the spectrum by a
“semi-empirical model” represents one shortcut; in such an approach (e.g., the
standard VAL model that we use in the Appendix) one attempts to construct
a model atmosphere capable of describing the spectrum even if it may not be
physically self-consistent. Such descriptions may however be sufficient in the
gradual phase of a flare when the flare loops no longer have energy input and
simply evolve by cooling and draining. Even here, however, we do not have a
good understanding of the “moss” regions that form at the footpoints of these
high-pressure loops (but see Berger et al. 1999). So far as I am aware there is no
literature specifically on “spreading moss,” the similar structure that appears in
association with flare ribbons.
A more complete approach to the physics comes from “radiation hydrody-
namics” physical models, most recently those of Allred et al. (2005); see Berlicki
(2007) for a fuller description. Such models solve the equations of hydrodynam-
ics and radiative transfer simultaneously and can thus deal with chromospheric
evaporation and the formation of the high-pressure flare loops. This frame-
work is necessary if we are to be able to understand the flare impulsive phase
(e.g., Heinzel 2003). Even these models do not have sufficient realism, though,
since they work currently in one dimension and thus cannot follow the time
development of the excitation properly; the high-resolution observations of UV
and white light by TRACE clearly show that the energy release has unresolved
scales. Further, as pointed out by Hudson (1972), the ionization of the chromo-
sphere (and hence the formation of the continua) cannot be described by a fluid
16 Hudson
Figure 10. Continuum emission in the near infrared (1.56 µm, the “opacity
minimum” region) during an X10 flare (Xu et al. 2004). Red shows the IR
emissions, contours show the RHESSI 50-100 keV X-ray sources. The IR
contrast relative to the preflare photosphere reached ∼20% in this event.
approximation, or even by non-LTE radiative transfer that assumes a unique
temperature.
At present there has been little effort to create an electrodynamic theory
of chromospheric flare processes, even though non-thermal particles are widely
thought to provide the dominant energy in at least the impulsive phase. In
the gradual phase there is interesting physics associated with heat conduction
because the transition region would have to become so steep that classical con-
ductivity estimates have difficulty (Shoub 1983). A more complete theory would
have to take plasma effects into account and would probably contain elements
of theories of the terrestrial aurora that are now largely missing from the solar
lexicon. This lack of self-consistency in the modeling probably means that we
have major gaps in our understanding of, for example, the evaporation process
as it affects the fractionation of the elements and of the ionization states of the
flare plasma. The Appendix gives estimates of the ranges of some the key plasma
parameters in the chromosphere.
12. Conclusions
This article has reviewed chromospheric flare observations from the point of view
of the newest available information – Yohkoh, SOHO, TRACE and RHESSI, for
example, but not Hinode or STEREO (already launched), nor much less FASR
or ATST (not launched yet at the time of writing). spite of the high quality of
the data prior to these missions, we still find major unsolved problems:
• How does the chromosphere obtain all of the energy that it radiates?
• How can flare effects appear at great depths in the photosphere?
• How is the anomalous 0.511 MeV line width produced?
Chromospheric Flares 17
• What are the elements of an electrodynamic theory of chromospheric flares?
In my view the solution of these problems cannot be found in chromospheric
observations alone, because the physical processes involve much broader regions
of the solar atmosphere. Even providing answers to these specific questions may
not reveal the plasma physics responsible for flare occurrence, which may involve
spatial scales too fine ever to resolve. But we can hope that new observations
from space and from the ground, in wavelengths ranging from the radio to the
γ-rays, will enable us to continue our current rapid progress, and can speculate
that eventually numerical tools will supplement the theory well enough for us to
achieve full comprehension of the important properties of flares. To get to this
point we will need to deal with the chromosphere, as messy as it is.
One important task that is probably within our grasp now is the compu-
tation of response functions for physical models of flares. At present these are
restricted to very limited numerical explorations of the radiative transfer within
the framework of one-dimensional radiation hydrodynamics (e.g., Allred et al.
2005). The energy transport in these models has been restricted to simplistic
representations of particle beams for energy transport, and do not take account
of complicated flare geometries, waves, or various elements of plasma physics.
Future developments of chromospheric flare theory will need to complete the
picture in a more self-consistent manner.
Acknowledgments This work has been supported by NASA under grant NAG5-
12878 and contract NAS5-38099. I thank W. Abbett for a critical reading. I am
also grateful to Rob Rutten for LaTeX instruction, and to Bart de Pontieu for
meticulous keyboard entry.
Appendix: plasma parameters
The lower solar atmosphere marks the transition layer between regions of strik-
ing physical differences, and as one goes further up in height the tools of plasma
physics should become more important. This Appendix evaluates for conve-
nience several basic plasma-physics parameters for the conditions of the staple
VAL-C atmospheric model (Vernazza et al. 1981)1. This is a “semi-empirical
model” in which interprets a set of observations in terms of the theory of ra-
diative transfer, but without any effort to have self-consistent physics. Such a
model can accurately represent the spectrum but may or may not provide a
good starting point for physical analysis. Because the optical depth of a spec-
tral feature is the key parameter determining its structure, one often sees the
model parameters plotted against continuum optical depth τ5000 evaluated at
5000Å. Just for illustration, Figure 11 shows the VAL-C temperature separately
as a function of height, column mass, and optical depth. Note that features
prominent in one display may appear to be negligible in another
The VAL-C model is an “average quiet Sun” model, and like all static 1D
models, it cannot describe the variability of the physical parameters that theory
The VAL-C parameters are available within SolarSoft as the procedure VAL C MODEL.PRO.
18 Hudson
10000 1000 100 10
Height, km
10-10 10-8 10-6 10-4 10-2 100
Continuum opacity
10-8 10-6 10-4 10-2 100 102
Column mass, g/cm3
Figure 11. Illustration of the structure of a semi-empirical model, using
three different independent variables: the VAL-C temperature plotted against
height, optical depth, and column mass.
and observation require (see other papers in these proceedings, e.g., Carlsson’s
review). Thus we should regard the plasma parameters estimated here as order-
of-magnitude estimates only and note especially that the vertical scales, which
depend in the model on the inferred optical-depth scale, may be systematically
displaced.
The VAL-C model explicitly does not represent a chromosphere perturbed
by a flare. Vernazza et al. (1981) and many other authors give more appropriate
models derived by similar techniques for flares as well as other structures. As
the discussion of the γ-ray signatures in Section 10 suggests, though, a pow-
erful flare may be able distort the lower solar atmosphere essentially beyond
all recognition (especially in the impulsive phase). To estimate representative
plasma parameters I have therefore chosen just to start with the basic VAL-C
model, and we simply assume constant values of B at 10 G and 1000 G. The
actual magnetic field may vary through this region (the “canopy”) but the de-
tails are little-understood. The γ-ray literature usually uses a parametrization
of the magnetic field strength B ∝ Pαg (Zweibel & Haber 1983), where Pg is the
gas pressure.
The most complicated behavior of the plasma parameters happens prefer-
entially near the top of the VAL-C model range (for example, Figure 12 shows
that the collision frequency decreases below the plasma and Larmor frequencies)
above the helium ionization level (or even below this level for strong magnetic
fields). Because VAL-C ignores time dependences and 3D structure, and as-
sumes Te = Ti, we can expect that it has diminished fidelity as one approaches
the unstable transition region; thus one should be especially careful not to take
these approximations too literally. The following notes correspond to each panel
of the figure. Most of the plasma-physics formulae used in this Appendix are
from Chen (1984).
Chromospheric Flares 19
Figure 12. Various plasma parameters in the VAL-C model. We have as-
sumed representative B values of 10 G and 1000 G. The different panels show
the following, left to right and top to bottom: (a) Temperature. (b) Den-
sities: solid, total hydrogen density; dotted, electron density; dashed, He I
density; dash-dot, He II density. (c) Plasma beta: solid, for 1000 G; dotted,
for 10 G; light solid, electron density as a fraction of total hydrogen den-
sity; dash-dot, the plasma parameter. (d) Frequencies. Solid, electron and
ion plasma frequencies; dotted, electron gyrofrequencies for 10 and 1000 G;
dashed, electron and ion collision frequencies; dash-dot, electron/neutral col-
lision frequency. (e) Velocities: Solid, electron and ion thermal velocities;
dashed, Alfvén speeds for 10 and 1000 G. (f) Scale lengths: solid, electron
Larmor radii for 10 G and 100 G; dotted, Debye length; dashed, ion and
electron inertial lengths.
Temperature: The VAL-C model, like all of the semi-empirical models, sets
Te = Ti. It therefore cannot support plasma processes dependent upon dif-
ferent ion and electron temperatures, or more complicated particle distribution
functions (e.g. Scudder 1994).
Densities: Total hydrogen density, electron density, and densities of He I and
He II.
Dimensionless parameters: We approximate the plasma beta as
2(nH + 2ne)kT
B2/8π
with nH the hydrogen density, ne the electron density, Figure 12(c) gives the
number of electrons in a Debye sphere as the “plasma parameter” Λ.
20 Hudson
Frequencies: The plasma frequency, the electron and proton Larmor frequen-
cies, and the electron and ion and collision frequencies
νei = 2.4 × 10
−6nelnΛ/T
eV ; νii = 0.05 × 4νei; νeH = (nH/ne)νe
with ne in cm
−3, TeV the temperature in eV, using Z = 1.2 and the Coulomb
logarithm lnΛ = 23 - ln(n0.5e T
eV ) (Chen 1984; De Pontieu et al. 2001).
Note that the collision frequencies are small compared with the plasma and
Larmor frequencies above about 1000 km in this model. This means generally
that plasma processes must have strong effects on the physical parameters of
the atmosphere in this region.
Velocities: Electron and proton thermal velocities; Alfvén speeds vA for 10 and
1000 G.
Scale lengths: Electron Larmor radii for 10 and 1000 G, the ion inertial length
c/ωpi, the electron inertial length c/ωpe, and the Debye length λD. The iner-
tial lengths determines the scale for the particle demagnetization necessary for
magnetic reconnection. For VAL-C parameters the ion inertial length increases
to about 100 m in the transition region.
References
Abbett W. P., Hawley S. L., 1999, ApJ521, 906
Acton L. W., Leibacher J. W., Canfield R. C., Gunkler T. A., Hudson H. S., Kiplinger
A. L., 1982, ApJ263, 409
Allred J. C., Hawley S. L., Abbett W. P., Carlsson M., 2005, ApJ630, 573
Antonucci E., Gabriel A. H., Acton L. W., Leibacher J. W., Culhane J. L., Rapley
C. G., Doyle J. G., Machado M. E., Orwig L. E., 1982, Solar Phys.78, 107
Anzer U., Pneuman G. W., 1982, Solar Phys.79, 129
Berger T. E., de Pontieu B., Fletcher L., Schrijver C. J., Tarbell T. D., Title A. M.,
1999, Solar Phys.190, 409
Berlicki A., Heinzel P., Schmieder B., Mein P., Mein N., 2005, A&A430, 679
Berlicki A., 2007, these proceedings
Brown J. C., 1971, Solar Phys.18, 489
Bruzek A., 1964, ApJ140, 746
Canfield R. C., Reardon K. P., Leka K. D., Shibata K., Yokoyama T., Shimojo M.,
1996, ApJ464, 1016
Chen F. F., 1984, Introduction to plasma physics, 2nd edition, New York: Plenum
Press, 1984
Chen P.-F., Fang C., Ding M.-D., 2001, Chinese Journal of Astronomy and Astrophysics
1, 176
Czaykowska A., de Pontieu B., Alexander D., Rank G., 1999, ApJ521, L75
De Pontieu B., Martens P. C. H., Hudson H. S., 2001, ApJ558, 859
Dere K. P., Bartoe J.-D. F., Brueckner G. E., 1983, ApJ 267, L65
Emslie A. G., Dennis B. R., Holman G. D., Hudson H. S., 2005, Journal of Geophysical
Research (Space Physics) 110, 11103
Engvold O., 1980, in M. Dryer, E. Tandberg-Hanssen (eds.), IAU Symp. 91: Solar and
Interplanetary Dynamics, p. 173
Fisher G. H., Canfield R. C., McClymont A. N., 1985, ApJ289, 434
Fletcher L., Pollock J. A., Potts H. E., 2004, Solar Phys.222, 279
Forbes T. G., Acton L. W., 1996, ApJ459, 330
Chromospheric Flares 21
Gary G. A., 2001, Solar Phys.203, 71
Goldsmith D. W., 1971, Solar Phys.19, 86
Hale G. E., 1930, ApJ71, 73
Handy B. N. et al., 1999, Solar Phys. 187, 229
Heinzel P., 2003, Advances in Space Research 32, 2393
Heyvaerts J., Priest E. R., Rust D. M., 1977, ApJ216, 123
Holman G. D., Sui L., Schwartz R. A., Emslie A. G., 2003, ApJ595, L97
Hoyng P. et al., 1981, ApJ246, L155
Hudson H. S., 1972, Solar Phys.24, 414
Hudson H. S., 2000, ApJ 531, L75
Hudson H. S., Ohki K., 1972, Solar Phys.23, 155
Hudson H. S., Wolfson C. J., Metcalf T. R., 2006, Solar Phys.234, 79
Isobe H., Takasaki H., Shibata K., 2005, ApJ632, 1184
Kane S. R., Donnelly R. F., 1971, ApJ164, 151
Lin R. P., et al., 2002, Solar Phys.210, 3
Lin R. P., Hudson H. S., 1976, Solar Phys.50, 153
Longcope D. W., Welsch B. T., 2000, ApJ545, 1089
Machado M. E., Rust D. M., 1974, Solar Phys.38, 499
McClymont A. N., Fisher G. H., 1989, in J. H. Waite Jr., J. L. Burch, R. L. Moore
(eds.), Solar System Plasma Physics, p. 219
Moore R. L., Sterling A. C., Hudson H. S., Lemen J. R., 2001, ApJ552, 833
Najita K., Orrall F. Q., 1970, Solar Phys.15, 176
Neidig D. F., 1989, Solar Phys.121, 261
Neupert W. M., 1968, ApJ 153, L59
Pariat E., Aulanier G., Schmieder B., Georgoulis M. K., Rust D. M., Bernasconi P. N.,
2004, ApJ614, 1099
Poletto G., Kopp R. A., 1986, in The Lower Atmosphere of Solar Flares, p. 453
Ramaty R., Mandzhavidze N., Kozlovsky B., Murphy R. J., 1995, ApJ455, L193
Raymond J. C., Holman G., Ciaravella A., Panasyuk A., Ko Y. ., Kohl J., 2007, ArXiv
Astrophysics e-prints 1359
Saba J. L. R., Gaeng T., Tarbell T. D., 2006, ApJ641, 1197
Sakao T., Kosugi T., Masuda S., Yaji K., Inda-Koide M., Makishima K., 1996, Advances
in Space Research 17, 67
Schmieder B., Forbes T. G., Malherbe J. M., Machado M. E., 1987, ApJ317, 956
Schrijver C. J., Hudson H. S., Murphy R. J., Share G. H., Tarbell T. D., 2006, ApJ650,
Scudder J. D., 1994, ApJ427, 446
Share G. H., Murphy R. J., Smith D. M., Schwartz R. A., Lin R. P., 2004, ApJ615,
Shibata K., et al., 1992, PASJ44, L173
Shoub E. C., 1983, ApJ266, 339
Simnett G. M., Haines M. G., 1990, Solar Phys.130, 253
Smith H. J., Smith E. V. P., 1963, Solar flares, New York: Macmillan, 1963
Suemoto Z., Hiei E., 1959, PASJ11, 185
Trujillo Bueno J., Shchukina N., Asensio Ramos A., 2004, Nat430, 326
Uitenbroek H., 2005, AGU Spring Meeting Abstracts
Švestka Z., 1966, Space Science Reviews 5, 388
Švestka Z., 1970, Solar Phys.13, 471
Švestka Z., 1976, Solar Flares, Dordrecht: Reidel, 1976
Švestka Z. F., Fontenla J. M., Machado M. E., Martin S. F., Neidig D. F., 1987, So-
lar Phys.108, 237
Vernazza J. E., Avrett E. H., Loeser R., 1981, ApJS45, 635
Waldmeier M., 1963, Zeitschrift fur Astrophysik 56, 291
Wang Y.-M., Sheeley, Jr. N. R., 2002, ApJ575, 542
White S. M., Kundu M. R., Bastian T. S., Gary D. E., Hurford G. J., Kucera T.,
22 Hudson
Bieging J. H., 1992, ApJ 384, 656
Woods T. N., Kopp G., Chamberlin P. C., 2006, Journal of Geophysical Research (Space
Physics) 111, 10
Xu Y., Cao W., Liu C., Yang G., Jing J., Denker C., Emslie A. G., Wang H., 2006,
ApJ641, 1210
Xu Y., Cao W., Liu C., Yang G., Qiu J., Jing J., Denker C., Wang H., 2004, ApJ607,
Zhang J., Dere K. P., Howard R. A., Kundu M. R., White S. M., 2001, ApJ559, 452
Zirin H., 1966, The solar atmosphere, Blaisdell: Waltham, Mass., 1966
Zweibel E. G., Haber D. A., 1983, ApJ264, 648
ABSTRACT
  In this topical review I revisit the "chromospheric flare." This should
currently be an outdated concept, because modern data seem to rule out the
possiblity of a major flare happening independently in the chromosphere alone,
but the chromosphere still plays a major observational role in many ways. It is
the source of the bulk of a flare's radiant energy - in particular the
visible/UV continuum radiation. It also provides tracers that guide us to the
coronal source of the energy, even though we do not yet understand the
propagation of the energy from its storage in the corona to its release in the
chromosphere. The formation of chromospheric radiations during a flare presents
several difficult and interesting physical problems.

<|endoftext|><|startoftext|>
Introduction
In this work we study deformations of the N -differential of a N -differential graded algebra.
According to Kapranov [18] and Mayer [24, 25] a N -complex over a field k is a Z-graded k-
vector space V =
n∈Z Vn together with a degree one linear map d : V −→ V such that
dN = 0. Remarkably, there are at least two generalizations of the notion of differential graded
algebras to the context of N -complexes. A choice, introduced first by Kerner in [20, 21] and
further studied by Dubois-Violette [13, 14] and Kapranov [18], is to fix a primitive N -th root
of unity q and define a q-differential graded algebra A to be a Z-graded associative algebra
together with a linear operator d : A −→ A of degree one such that d(ab) = d(a)b + qāad(b)
and dN = 0. There are several interesting examples and constructions of q-differential graded
algebras [1, 2, 6, 8, 9, 15, 16, 19, 21].
We work within the framework of N -differential graded algebras (N -dga) introduced in [4].
This notion does not depend on the choice of a N -th primitive root of unity, and thus it is better
adapted for differential geometric applications. A N -differential graded algebra A consist of a
Z-graded associative algebra A =
n∈Z An together with a degree one linear map d : A −→ A
such that dN = 0 and d(ab) = d(a)b + (−1)āad(b) for a, b ∈ A. The main question regarding
this definition is whether there are interesting examples of N -differential graded algebras. Much
work still needs to be done, but already a variety of examples has been constructed in [4, 5].
These examples may be classified as follows:
• Deformations of 2-dga into N -dga. This is the simplest and most direct way to construct
N -differential graded algebras. Take a differential graded algebra A with differential d
and consider the deformed derivation d+e where e : A −→ A is a degree one derivation. It
http://arxiv.org/abs/0704.0824v3
is possible to write down explicitly the equations that determine under which conditions
d+ e is a N -differential, and thus turns A into a N -differential graded algebra. In other
words one can explicitly write down the condition (d+ e)N = 0.
• N flat connections. Let E be a vector bundle over a manifold M provided with a flat
connection ∂E . Differential forms on M with values in End(E) form a differential graded
algebra. An End(E)-valued one form T determines a deformation of this algebra into a
N -differential graded algebra with differential of the form ∂E + [T, ] if and only if T is a
N -flat connection, i.e., the curvature of T is N -nilpotent.
• Differential forms of depth N ≥ 3. Attached to each affine manifold M there is a
(dim(M)(N − 1) + 1)-differential graded algebra ΩN (M), called de algebra of differential
forms of depth N on M , constructed as the usual differential forms allowing higher order
differentials, i.e., for affine coordinates xi on M , there are higher order differentials d
for 1 ≤ j ≤ N − 1.
• Deformations of N -differential graded algebras into M -differential graded algebras. If we
are given a N -differential graded algebra A with differential d, one can study under which
condition a deformed derivation d + e, where e is a degree one derivation of A, turns
A into a M -differential graded algebra, i.e., one can determine conditions ensuring that
(d + e)M = 0. In [4] we showed that e must satisfy a system of non-linear equations,
which we called the (N,M) Maurer-Cartan equation.
• Algebras AN∞. This is not so much an example of N -differential graded algebras but rather
a homotopy generalization of such notion. AN∞ algebras are studied in [7].
This paper has three main goals. One is to introduce geometric examples of N - differential
graded algebras. We first review the constructions of N -differential graded algebras outlined
above and then proceed to consider the new examples:
• Differential forms on finitely generated simplicial sets. We construct a contravariant func-
tor ΩN : set
∆op −→ N ildga from the category of simplicial sets generated in finite dimen-
sions to N ildga, the category of nilpotent differential graded algebras, i.e., N -differential
graded algebras for some N ≥ 1. For a simplicial set s we let ΩN (s) be the algebra of
algebraic differential forms of depth N on the algebro-geometric realization of s. For each
integer K we define functor Sing≤K : Top −→ set
∆op , thus we obtain contravariant func-
tors ΩN ◦Sing≤K : Top −→ N
ildga assigning to each topological space X a nil-differential
graded algebra.
• Difference forms on finitely generated simplicial sets. We construct a contravariant functor
DN defined on set
∆op with values in a category whose objects are graded algebras which
are also N -complexes for some N , with the N -differential satisfying a twisted Leibnitz
rule. For a simplicial set s we let DN (s) be the algebra of difference forms of depth N
on the integral lattice in the algebro-geometric realization of s. Again, for each integer
K ≥ 0 we obtain a functor DN ◦ Sing≤K defined on Top assigning to each topological
space X a twisted nil-differential graded algebra.
Our second goal is to study the construction of N -differential graded algebras as defor-
mations of 3-differential graded algebras. Although in [4] a general theory solving this sort
of problem was proposed, our aim here is to provided a solution as explicit as possible. We
consider exact and infinitesimal deformations of 3-differentials in Section 3.
Our final goal in this work is to find applications of N -differential graded algebras to Lie
algebroids. In Section 4 we review the concept of Lie algebroids introduced by Pradines [27],
which generalizes both Lie algebras and tangent bundles of manifolds. A Lie algebroid E may
be defined as a vector bundle together with a degree one differential d on Γ(
E∗). We generalize
this notion to the world of N -complexes, that is we introduce the concept of N Lie algebroids
and construct several examples of such objects.
2 Examples of N-differential graded algebras
In this section we give a brief summary of the known examples of N -dgas and introduce new
examples of N -dgas of geometric nature.
Definition 1. Let N ≥ 1 be an integer. A N -complex is a pair (A, d), where A is a Z-graded
vector space and d : A −→ A is a degree one linear map such that dN = 0.
Clearly a N -complex is also a M -complex for M ≥ N . N -complexes are also referred to
as N -differential graded vector spaces. A N -complex (A, d) such that dN−1 6= 0 is said to be
a proper N -complex. Let (A, d) be a N -complex and (B, d) be a M -complex, a morphism
f : (A, d) −→ (B, d) is a linear map f : A −→ B such that df = fd. One of the most
interesting features of N -complexes is that they carry cohomological information. Let (A, d)
be a N -complex, a ∈ Ai is p-closed if dp(a) = 0, and is p-exact if there exists b ∈ Ai−N+p such
that dN−p(b) = a, for 1 ≤ p < N . The cohomology groups of (A, d) are the spaces
i(A) = Ker{dp : Ai −→ Ai+p}/Im{dN−p : Ai−N+p −→ Ai},
for i ∈ Z and p = 1, 2, ..., N−1.
Definition 2. A N -differential graded algebra (N -dga) over a field k, is a triple (A,m, d) where
m : A⊗A −→ A and d : A −→ A are linear maps such that:
1. dN = 0, i.e., (A, d) is a N -complex.
2. (A,m) is a graded associative algebra.
3. d satisfies the graded Leibnitz rule d(ab) = d(a)b+ (−1)āad(b).
The simplest way to obtain N -differential graded algebras is deforming differential graded
algebras. Let Der(A) be the Lie algebra of derivations on a graded algebra A. Recall that a
degree one derivation d on A, induces a degree one derivation, also denoted by d, on End(A).
Let A be a 2-dga and e ∈ Der(A). It is shown in [4] that e defines a deformation of A into
a N -differential graded algebra if and only if (d + e)N = 0, or equivalently, if and only if the
curvature Fe = d(e) + e
2 of e satisfies (Fe)
2 = 0 if N is even, or (Fe)
2 (d + e) = 0 if N is
odd. For example, consider the trivial bundle M × Rn over M . A connection on M × Rn is a
gl(n)-valued one form a on M , and its curvature is Fa = da +
[a, a]. Let Ω(M,gl(n)) be the
graded algebra of gl(n)-valued forms on M . Thus the pair (Ω(M,gl(n)), d + [a, ]) defines a
N -dga if and only if (Fa)
2 = 0 for N even, or (Fa)
2 (d+ a) = 0 for N odd.
Differential forms of depth N on simplicial sets
Fix an integer N ≥ 3. We are going to construct the (n(N − 1) + 1)-differential graded algebra
ΩN (R
n) of algebraic differential forms of depth N on Rn. Let x1, ..., xn be coordinates on R
and for 0 ≤ i ≤ n and 0 ≤ j < N, let djxi be a variable of degree j. We identify d
0xi with xi.
Definition 3. The (n(N − 1) + 1)-differential graded algebra ΩN (R
n) is given by
• ΩN (R
n) = R[djxi]/
djxid
kxi | j, k ≥ 1
as a graded algebras.
• The (n(N − 1) + 1)-differential d : ΩN (R
n) −→ ΩN (R
n) is given by d(djxi) = d
j+1xi, for
0 ≤ j ≤ N − 2, and d(dN−1xi) = 0.
One can show that d is (n(N − 1) + 1)-differential as follows:
1. It is easy to check that ΩN (R) is a N -dga.
2. If A is a N -dga and B is a P -dga, then A⊗B is a (N + P − 1)-dga.
3. ΩN (R
n) = ΩN (R)
We often write ΩN (x1, ..., xn) instead of ΩN (R
n) to indicate that a choice of affine coordi-
nates (x1, ..., xn) on R
n has been made.
Let ∆ be the category such that its objects are non-negative integers; morphisms in ∆(n,m)
are order preserving maps f : {0, ..., n} −→ {0, ...,m}. The category of simplicial sets Set∆
the category of contravariant functors ∆ −→ Set. Explicitly, a simplicial set s : ∆op −→ Set is
a functorial correspondence assigning:
• A set sn for each integer n ≥ 0. Elements of sn are called simplices of dimension n.
• A map s(f) : sm −→ sn for each f ∈ ∆(n,m).
Let Aff be the category of affine varieties, and let A : ∆ −→ Aff be the functor sending
n ≥ 0, into the affine variety A(x0, ..., xn) = ∆n = {(x0, ..., xn) ∈ R
n | x0 + ... + xn = 1}. A
sends f ∈ ∆(n,m) into A(f) : A(x0, ..., xn) → A(x0, ..., xm) given by A(f)
∗(xj) =
f(i)=j xi,
for 0 ≤ j ≤ m. Forms of depth N on the cosimplicial affine variety A are defined by the functor
ΩN : ∆
op −→ N ildga sending n ≥ 0 into
ΩN (n) = ΩN (x0, ..., xn)/ 〈x0 + ...+ xn − 1, dx0 + ...+ dxn〉 .
A map f ∈ ∆(n,m) induces a morphisms ΩN (f) : ΩN (m) −→ ΩN (n) given for 0 ≤ j ≤ m by
ΩN (f)(xj) =
f(i)=j
xi and ΩN (f)(dxj) =
f(i)=j
Let set∆
be the full subcategory of Set∆
whose objects are simplicial sets generated in
finite dimensions, i.e., simplicial sets s for which there is an integer K such that for each p ∈ si,
i ≥ K, there exists q ∈ sj, j ≤ K, with p = s(f)(q) for some f ∈ ∆(p, q). We are ready to
define the contravariant functor ΩN : set
∆op −→ N ildga announced in the introduction. The
nil-differential graded algebra ΩN (s) =
i=0 Ω
N (s) associated with s is given by
ΩiN (s) = {a ∈
ΩiN (n) | as(f)(p) = ΩN (f)(ap) for p ∈ sm and f ∈ ∆(n,m)}.
A natural transformation l : s −→ t induces a map ΩN(l) : ΩN (t) −→ ΩN (s) given by the rule
[ΩN (l)(a)]p = al(p) for a ∈ ΩN (t) and p ∈ sn.
For each integer K ≥ 0 there is functor ( )≤K : Set
∆op −→ set∆
sending a simplicial set
s, into the simplicial set s≤K generated by simplices in s of dimension lesser or equal to K.
The singular functor Sing : Top −→ Set∆
sends a topological space X into the simplicial set
Sing(X) such that
Singn(X) = {f : ∆n −→ X | f is continous }.
Thus, for each pair of integers N and K we have constructed a functor
ΩN ◦ ( )≤K ◦ Sing : Top −→ N
ildga
sending a topological space X into the nil-differential graded algebra ΩN (Sing≤K(X)).
Difference forms of depth N on simplicial sets
Next we construct difference forms of higher depth on finitely generated simplicial sets. Dif-
ference forms on discrete affine space were introduced by Zeilberger in [28]. We proceed to
construct a discrete analogue of the functors from topological spaces to nil-differential graded
algebras introduced above. First, we construct DN (Z
n) the algebra of difference forms of depth
N on Zn. Let F (Zn,R) be the algebra concentrated in degree zero of R-valued functions on the
lattice Zn. Introduce variables δjmi of degree j for 1 ≤ i ≤ n and 1 ≤ j < N . The graded
algebra of difference forms of depth N on Zn is given by
DN (Z
n) = F (Zn,R)⊗ R[δjmi]/
δjmiδ
kmi | j, k ≥ 1
A form ω ∈ DN (Z
n) can be written as ω =
I ωIdmI where I : {1, .., n} −→ N is any map
and dmI =
i=1 d
I(i)mi. The degree of dmI is |I| =
i=1 I(i). The finite difference ∆i(g) of
g ∈ F (Zn,R) along the i-direction is given by
∆i(g)(m) = g(m+ ei)− g(m),
where the vectors ei are the canonical generators of Z
n and m ∈ Zn. The difference operator δ
is defined for 1 ≤ j ≤ N − 2 by the rules
δ(g) =
∆i(g)δmi, δ(δ
jmi) = δ
j+1mi and δ(δ
N−1mi) = 0.
It is not difficult to check that if ω =
I ωIdmI , then δω =
J(δω)JdmJ where
(δω)J =
J(i)=1
(−1)|J<i|∆iωJ−ei +
J(i)≥2
(−1)|J<i|ωJ−ei .
From the later formula we see that (δω)J is a linear combination of (differences of) functions ωK
with |K| < |J |. This fact implies that δ is nilpotent, indeed, one can check that δn(N−1)+1 = 0.
All together we have proved the following result.
Theorem 4. DN (Z
n) is a graded algebra and the difference operator δ gives DN (Z
n) the
structure of a (n(N − 1) + 1)-complex.
One can check that δ satisfies a twisted Leibnitz rule, so DN (Z
n) is actually pretty close of
being a N -dga. Let Zn,1 ⊆ Zn+1 consists of tuples (m0, ...,mn) such that m0 + ... + mn = 1.
Consider the functor DN defined on ∆
op sending n ≥ 0 into
DN (n) = F (Z
n,1,R)⊗
δjmiδ
kmi, δm0 + ...+ δmn
A map f ∈ ∆(n, k) induces a morphisms DN (f) : DN (k) −→ DN (n) given for g ∈ F (Z
k,1,R)
and 0 ≤ j ≤ k by
DN (f)(g)(m0, ...,mn) = g
f(i)=0
mi, ...,
f(i)=k
 and DN (f)(δmj) =
f(i)=j
We extend DN to the functor defined on set
∆op sending a finitely generated simplicial set s into
DN (s) =
i=0 D
N (s) where
DiN (s) =
DiN (n) | as(f)(p) = DN (f)(ap) for p ∈ sk and f ∈ ∆(n, k)
A natural transformation l : s −→ t induces a map ΩN (l) : ΩN (t) −→ ΩN (s) by the rule
[DN (l)(a)]p = al(p) for a ∈ DN (t) and p ∈ sn. Thus for given integers N and K we have
constructed a functor DN ◦ ( )≤K ◦ Sing on Top sending a topological space X into a sort
of nil-differential graded algebra satisfying a twisted Leibnitz rule DN (Sing≤K(X)) . It would
be interesting to compute the cohomology groups of the algebra of difference forms of higher
depth on known simplicial sets. Even in the case of forms of depth 2 these groups have seldom
been studied.
3 On the (3,N) curvature
Recall that a discrete quantum mechanical system is given by the following data:
1. A directed graph with set of vertices V and set of directed edges E. The Hilbert space of
the system is H = CV .
2. A map ω : E −→ R assigning a weight to each edge.
3. Operators Un : H −→ H for n ∈ N given by (Unf)(y) =
ωn(y, x)f(x) where the
discretized kernel ωn(y, x) is given by
ωn(y, x) =
γ∈Pn(x,y)
ω(e).
Pn(x, y) denotes the set of paths in Γ from x to y of length n, i.e., sequences (e1, · · · , en)
of edges such that s(e1) = x, t(ei) = s(ei+1), for i = 1, ..., n − 1 and t(en) = y.
Let us introduce some notation. For s = (s1, ..., sn) ∈ N
n we set l(s) = n and |s| =
i si.
For 1 < i ≤ n we set s<i = (s1, ..., si−1); also we set s>n = s<1 = ∅. N
(∞) is equal to
where by convention N(0) = {∅}. Let A be a 3-dga and e be a degree one derivation on A. For
s ∈ Nn we let e(s) = e(s1) · · · e(sn), where e(l) = dl(e) if l ≥ 1, e(0) = e and e∅ = 1. For N ∈ N,
we set EN =
s ∈ N(∞) | s 6= ∅ and |s|+ l(s) ≤ N
and for s ∈ EN we let N(s) ∈ Z be given
by N(s) = N − |s| − l(s).
The following data defines a discrete quantum mechanical system:
1. The set of vertices is N(∞).
2. There is a unique directed edge from s to t if and only if t ∈ {(0, s), s, (s + ei)}, where
ei ∈ N
l(s) are the canonical vectors.
3. Edges are weighted according to the table:
source target weight
s (0, s) 1
s s (−1)|s|+l(s)
s (s+ ei) (−1)
|s<i|+i−1
PN (∅, s) consists of paths γ = (e1, ..., eN ), such that s(e1) = ∅, t(eN ) = s and s(el+1) = t(el).
The weight ω(γ) of a path γ ∈ PN (∅, s) is given by ω(γ) =
i=1 ω(ei). The following result,
proved in [4], tell us when d+ e defines a deformation of a 3-dga into a N -dga.
Theorem 5. d+ e defines a deformation of the 3-dga A into a N -dga if and only if the (3, N)
Maurer-Cartan equation holds co + c1d+ c2d
2 = 0, where for 0 ≤ k ≤ 2 we set
N(s)=k
c(s,N)e(s) and c(s,N) =
γ∈PN (∅,s)
ω(γ).
Exact deformations
Let us first consider the deformation of a 3-dga into a 3-dga. According to Theorem 5 the
derivation d+ e defines a 3-dga if and only if co + c1d+ c2d
2 = 0 where
N(s)=k
c(s, 3)e(s) and c(s, 3) =
γ∈P3(∅,s)
ω(γ).
Let us compute the coefficients ck. We have that
E3 = {(0), (1), (2), (0, 0), (1, 0), (0, 1), (0, 0, 0)} .
Let us first compute c0. There are four vectors s in E3 such that N(s) = 0, these are
(2), (1, 0), (0, 1) and (0, 0, 0). The only path from ∅ to (2) of length 3 is
∅ −→ (0) −→ (1) −→ (2)
of weight 1. Since e(2) = d2(e), then we have that c((2), 3) = d2(e). The unique path from ∅ to
(1, 0) of length 3 is
∅ −→ (0) −→ (0, 0) −→ (1, 0)
of weight 1. Since e(1,0) = d(e)e we have that c((1, 0), 3) = d(e)e. There are two paths from ∅
to (0, 1) of length 3, namely
∅ −→ (0) −→ (0, 0) −→ (0, 1)
∅ −→ (0) −→ (1) −→ (0, 1)
of weight −1 and 1, respectively. Thus c((1, 0), 3) = 0 since the sum of the weights vanishes.
The unique path from ∅ to (0, 0, 0) of length 3 is
∅ −→ (0) −→ (0, 0) −→ (0, 0, 0)
of weight 1. Since e(0,0,0) = e3, then c((0, 0, 0), 3) = e3. Thus we have shown that
c0 = d
2(e) + d(e)e + e3.
We proceed to compute c1. The vectors in E3 such that N(s) = 1 are (1) and (0, 0). Paths
from ∅ to (1) of length 3 are
∅ −→ ∅ −→ (0) −→ (1)
∅ −→ (0) −→ (0) −→ (1)
∅ −→ (0) −→ (1) −→ (1)
of weight 1, −1 and 1, respectively. Since e(1) = d(e), then c((1), 3) = d(e). Paths from ∅ to
(0, 0) of length 3 are
∅ −→ (0) −→ (0) −→ (0, 0)
∅ −→ ∅ −→ (0) −→ (0, 0)
∅ −→ (0) −→ (0, 0) −→ (0, 0).
The corresponding weights are, respectively, −1, 1 and 1. We have that e(0,0) = e2, thus
c((0, 0), 3) = e2 and c1 = d(e) + e
Finally we compute c2. (0) is the only vector in E3 such that N(s) = 2. The paths from ∅
to (0) of length 3 are
∅ −→ ∅ −→ ∅ −→ (0)
∅ −→ ∅ −→ (0) −→ (0)
∅ −→ (0) −→ (0) −→ (0).
The corresponding weights are, respectively, 1, −1 and 1. Since e(0) = e then c2 = c((0), 3) =
e. Altogether we have proven the following result.
Theorem 6. d+ e defines a deformation of the 3-dga A into a 3-dga if and only if
(d2(e) + d(e)e + e3) + (d(e) + e2)d+ ed2 = 0.
Consider now deformations of a 3-dga into a 4-dga. Again by Theorem 5 we must have
c0 + c1d+ c2d
2 = 0. We proceed to compute the coefficients ck. We have that
E4 = {(0), (1), (2), (0, 0), (1, 0), (0, 1), (2, 0), (0, 2), (1, 1),
(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0, 0)}.
(0) is the only vector in E4 such that N(s) = 3. Paths of length 4 from ∅ to (0) are of the
form ∅ → · · · → ∅︸ ︷︷ ︸
→ (0) → · · · → (0)︸ ︷︷ ︸
with weight (−1)j , where i + j = 3, thus we have that
c3 = (1− 1 + 1− 1)e = 0.
We compute c2. Vectors in E4 with N(s) = 2 are (0, 0) and (1). Paths from ∅ to (0, 0)
of length 2 are of the form ∅ → · · · → ∅︸ ︷︷ ︸
→ (0) → · · · → (0)︸ ︷︷ ︸
→ (0, 0) → · · · → (0, 0)︸ ︷︷ ︸
of weight
i+j+k=2(−1)
j = 2, thus c((0, 0), 4)e(0,0) = 2e2. Paths from ∅ to (1) of length 2 are of the
form ∅ → · · · → ∅︸ ︷︷ ︸
→ (0) → · · · → (0)︸ ︷︷ ︸
→ (1) → · · · → (1)︸ ︷︷ ︸
with weight
i+j+k=2(−1)
j = 2, thus
c((1), 4)e(1) = 2d(e) and c2 = 2(e
2 + d(e)).
Let us now compute c1. Vectors in E4 with N(s) = 1 are (0, 0, 0), (1, 0), (0, 1) and (2).
Paths from ∅ to (0, 0, 0) are of 5 types. Paths of the form
∅ → · · · → ∅︸ ︷︷ ︸
→ (0) → · · · → (0)︸ ︷︷ ︸
→ (0, 0) → · · · → (0, 0)︸ ︷︷ ︸
→ (0, 0, 0) → · · · → (0, 0, 0)︸ ︷︷ ︸
with weight
i+j+k+l=1(−1)
j(−1)l, so that c((0, 0, 0), 4)e(0,0,0) = (1− 1 + 1− 1)e3 = 0. Paths
of the form
∅ → · · · → ∅︸ ︷︷ ︸
→ (0) → · · · → (0)︸ ︷︷ ︸
→ (1) → · · · → (1)︸ ︷︷ ︸
→ (0, 1) → · · · → (0, 1)︸ ︷︷ ︸
with weight ∑
i+j+k+l=1
(−1)j(−1)l = 1− 1 + 1− 1 = 0 .
Path of the form
∅ → · · · → ∅︸ ︷︷ ︸
→ (0) → · · · → (0)︸ ︷︷ ︸
→ (0, 0) → · · · → (0, 0)︸ ︷︷ ︸
→ (0, 1) → · · · → (0, 1)︸ ︷︷ ︸
of weight
i+j+k+l=1(−1)
j(−1)(−1)l so that
c((0, 1), 4)e(0,1) = ((1 − 1 + 1− 1) + (1− 1 + 1− 1))ed(e) = 0.
Paths of the form
∅ → · · · → ∅︸ ︷︷ ︸
→ (0) → · · · → (0)︸ ︷︷ ︸
→ (0, 0) → · · · → (0, 0)︸ ︷︷ ︸
(1, 0) → · · · → (1, 0)︸ ︷︷ ︸
of weight
i+j+k+l=1(−1)
j(−1)l, thus c((1, 0), 4)e(1,0) = (1 − 1 + 1 − 1)d(e)e = 0. There are
also paths of the form
∅ → · · · → ∅︸ ︷︷ ︸
→ (0) → · · · → (0)︸ ︷︷ ︸
→ (1) → · · · → (1)︸ ︷︷ ︸
→ (2) → · · · → (2)︸ ︷︷ ︸
of weight
i+j+k+l=1(−1)
j(−1)l, so we have c((2), 4)e(2) = (1 − 1 + 1− 1)d2(e) = 0. We have
shown that
c1 = c((0, 0, 0), 4)e
(0,0,0) + c((0, 1), 4)e(0,1) + c((1, 0), 4)e(1,0) + c((2), 4)e(2) = 0.
Let us compute c0. There are several types of paths in this case. Path
∅ −→ (0) −→ (0, 0) −→ (0, 0, 0) −→ (0, 0, 0, 0)
of weight 1, thus cq((0, 0, 0, 0), 4)a
(0,0,0,0) = a4. Paths
∅ −→ (0) −→ (0, 0) −→ (0, 0, 0) −→ (0, 0, 1)
∅ −→ (0) −→ (1) −→ (0, 1) −→ (0, 0, 1)
∅ −→ (0) −→ (0, 0) −→ (0, 1) −→ (0, 0, 1)
of weight 1, thus we have that c((0, 0, 1), 4)e(0,0,1) = e2d(e). Paths
∅ −→ (0) −→ (0, 0) −→ (1, 0) −→ (0, 1, 0)
∅ −→ (0) −→ (0, 0) −→ (0, 0, 0) −→ (0, 1, 0)
of weight 0, thus c((0, 1, 0), 4)e(0,1,0) = 0ad(a)a = 0. Path
∅ −→ (0) −→ (0, 0) −→ (0, 0, 0) −→ (1, 0, 0)
of weight 1, thus c((1, 0, 0), 4)e(1,0,0) = d(e)e2. Path
∅ −→ (0) −→ (0, 0) −→ (1, 0) −→ (2, 0)
of weight 1, so c((2, 0), 4)e(2,0) = d2(e)e. Paths
∅ −→ (0) −→ (0, 0) −→ (0, 1) −→ (0, 2)
∅ −→ (0) −→ (1) −→ (0, 1) −→ (0, 2)
∅ −→ (0) −→ (1) −→ (2) −→ (0, 2)
of weight 1, so that c((0, 2), 4)e(0,2) = ed2(e). There are also paths
∅ −→ (0) −→ (0, 0) −→ (1, 0) −→ (1, 1)
∅ −→ (0) −→ (1) −→ (0, 1) −→ (1, 1)
of weight 2, so that c((1, 1), 4)e(1,1) = (d(e))2. We see that
c0 = e
4 + e2d(e) + d(e)e2 + d2(e)e + ed2(e) + (d(e))2.
All together we have shown the following result.
Theorem 7. d+ e defines a deformation of the 3-dga A into a 4-dga if and only if
(e4 + e2d(e) + d(e)e2 + d2(e)e + ed2(e) + (d(e))2) + 2(e2 + d(e))d2 = 0.
Infinitesimal deformations
Let t be a formal parameter such that t2 = 0.
Theorem 8. Let (A, d) be a N -dga and e a degree one derivation on A, then we have
(d+ te)N = t
p∈Par(k,N−k+1)
(−1)w(p)
 dN−k−1(e)dN−k−1,
where
Par(k,N − k + 1) = {p = (p1, · · · , pN−k+1) |
N−k+1∑
pi = k} and w(p) =
N−k+1∑
(i− 1)pi.
Proof. From Theorem 5 we know that DN =
k=0 ckd
k. Since t2 = 0, then
(te)(s) = (te)(s1) · · · (te)(sl(s)) = tl(s)e(s) = 0
unless l(s) ≤ 1. On the other hand we have that
EN = {(0), (1), · · · , (N − 1)}.
Suppose that l(s) = 1 and N(s) = N − |s| − l(s) = k, thus |s| = N − k − 1. The unique vector
s in EN of length 1 such that |s| = N − k − 1 is s = (N − k − 1). Therefore
N(s)=k
c(s,N)e(s) = c((N − k − 1), N)e(s) = c((N − k − 1), N)dN−k−1(e).
A path from ∅ to (N − k − 1) of length N must be of the form
∅→ · · · →︸ ︷︷ ︸
∅ → (0)→ · · · →︸ ︷︷ ︸
(0) → (1)→ · · · →︸ ︷︷ ︸
(1) → · · · → (N − k − 1)→ · · · →︸ ︷︷ ︸
pN−k+1
(N − k − 1)
with (p1 + 1) + (p1 + 1) + · · ·+ (pN−k + 1) + pN−k+1 = N , i.e.,
∑N−k+1
i=1 pi = k. The weight of
such path is
(−1)0p1(−1)(2−1)p2(−1)(3−1)p2 ...(−1)(N−k)pN−k+1 = (−1)w(p), thus we get that
c((N − k − 1), N) =
γ∈PN (∅,s)
ω(γ) =
p∈Par(k,N−k+1)
(−1)w(p).
Corollary 9. e defines an infinitesimal deformation of theN -dga (A, d) into theN -dga (A, d+e)
if and only if
p∈Par(k,N−k+1)
(−1)w(p)
 dN−k−1(e)dN−k−1 = 0. (1)
4 N Lie algebroids
In this section we introduce the notion of N Lie algebroids and construct examples of such
structures. We first review the notion of Lie algebroids, provide some examples, and write the
definition of Lie algebroids in a convenient way for our purposes.
Lie algebroids
We review basic ideas around the notion of Lie algebroids; the interested reader will find much
more information in [12, 23, 27]. The notion of Lie algebroids has gained much attention in
the last few years because of its interplay with various branches of mathematics and theoretical
physics, see [10, 11, 17]. We center our attention on the basic definitions and constructions of
Lie algebroids and its relation with graded manifolds and differential graded algebras.
Definition 10. A Lie algebroid is a vector bundle π : E −→ M together with:
• A Lie bracket [ , ] on the space Γ(E) of sections of E.
• A vector bundle map ρ : E −→ TM over the identity, called the anchor, such that the
induced map ρ : Γ(E) −→ Γ(TM) is a Lie algebra morphism.
• The identity [v, fw] = f [v,w]+(ρ(v)f)w must hold for sections v,w of E and f a smooth
function on M .
Let (x1, ..., xn) be coordinates on a local chart U ⊂ M , and let {eα | α = 1, . . . , r} be a basis
of local sections of π : E|U −→ U . Local coordinates on E|U are given by (x
i, yα). Locally the
Lie bracket and the anchor are given by [eα, eβ ]E = C
eγ and ρ(eα) = ρ
, respectively.
The smooth functions C
, ρiα are the structural functions of the Lie algebroid. The condition
for ρ to be a Lie algebra homomorphism is written in local coordinates as
= ρiγ C
The other compatibility condition between ρ and [ , ] is given by
cycl(α,β,γ)
+ Cµαν C
where the sum is over indices α, β, γ such that the map 1, 2, 3 −→ α, β, γ is a cyclic permuta-
tion. The simplest examples of Lie algebroids are described below; the reader will find further
examples in the references listed at the beginning of this section.
Example 11. A finite dimensional Lie algebra g may be regarded as a vector bundle over a
single point. Sections are elements of g, the Lie bracket is that of g, and the anchor map is
identically zero. The structural functions C
are the structural constants c
of g and ρiα = 0.
Example 12. The tangent bundle π : TM −→ M with anchor the identity map ITB on TB
and with the usual bracket on vector fields.
Exterior differential algebra of Lie algebroids
Sections Γ(
E) of a Lie algebroid E play the rôle of vector fields on a manifold and are called
E vector fields. Sections of the dual bundle π : E∗ −→ M are called E 1-forms. Similarly
sections Γ(
E∗) of
E∗ are called E forms. The degree of a E form in Γ(
E∗) is k. Let us
state and sketch the proof of a result of fundamental importance for the rest of this work.
Theorem 13. Let E be a vector bundle. E is a Lie algebroid if and only if Γ(
E∗) is a
differential graded algebra. A differential on
E∗ is the same as a degree one vector field v on
E[−1] such that v2 = 0.
Above E[−1] denotes the graded manifold whose underlying space is E with fibers placed
in degree one. If E is a Lie algebroid one defines a differential
d : Γ(∧kE∗) −→ Γ(∧k+1E∗)
as follows:
dθ(v1, . . . , vk+1) =
(−1)i+1ρ(vi)θ(v1, . . . , v̂i, . . . , vk+1)
(−1)i+jθ([vi, vj ], v1, . . . , v̂i, . . . , v̂j , . . . vk+1),
for v1, . . . , vk+1 ∈ Γ(E). The axioms for a Lie algebroid imply that:
1. d2 = 0;
2. If f ∈ C∞(M) and v ∈ Γ(E), then 〈df, v〉 = ρ(v)f ;
3. d is a derivation of degree 1, i.e., d(θ ∧ ζ) = dθ ∧ ζ + (−1)θθ ∧ ζ.
Conversely, assume that d is a degree one derivation on Γ(
E∗) satisfying d2 = 0. Then E
is a Lie algebroid with the structural maps ρ and [ , ] given by
ρ(v)f = df(v) ,
θ([v,w]) = ρ(v)θ(w) − ρ(w)θ(v)− dθ(v,w),
for v,w ∈ Γ(E), f ∈ C∞(M) and θ ∈ Γ(
E). In local coordinates d is determined by
dxi = ρiα e
α and deγ = C
eα ∧ eβ,
where {eα | α = 1, . . . , r} is the dual basis of {eα | α = 1, . . . , r}. It is not hard to see that
the conditions d2xi = 0 and d2eα = 0 are equivalent to the structural equations defining a Lie
algebroid. Let us compute the exterior algebra of a few Lie algebroids.
Example 14. To the trivial Lie algebroid structure on a vector bundle E corresponds to the
exterior algebra
E∗ with vanishing differential.
Example 15. Chevalley-Eilenberg differential on
∗ arises from the Lie algebroid g −→ {•}
of Example 11. The Chevalley-Eilenberg differential d takes the form
dθ(v1, . . . , vk+1) =
(−1)i+jθ([vi, vj ], v1, . . . , v̂i, . . . , v̂j , . . . vk+1),
for vi ∈ g and θ ∈
Example 16. The differential associated with the tangent bundle TM −→ M Lie algebroid is
de Rham differential.
N Lie algebroids
We are ready to introduce the main concept of this section. In the light of Theorem 13 it is
rather natural to define a N Lie algebroid as a vector bundle E together with a degree one
N -nilpotent vector field v on the graded manifold E[−1]. That definition, useful as it might
be, rules out some significant examples that we would not like to exclude, thus, we prefer the
more inclusive definition given below. Though not strictly necessary for our definition of N Lie
algebroids, the study of nilpotent vector fields on graded manifolds is of independent interest,
and we shall say a few words about them. Indeed our next result gives an explicit formula for
the N -th power of a graded vector field.
Let x1, ..., xm be local coordinates on a graded manifold and ∂1, ..., ∂m be the corresponding
vector fields. We recall that if xi is a variable of degree xi, then ∂i is of degree −xi, and dxi
is of degree xi + 1. Let a
1, ..., am be functions of homogeneous degree depending on x1, ..., xm.
For L a linearly ordered set and f : L −→ [m] a map we define
f(i) and ∂f =
∂f(i).
Also we define the sign s(f) by the rule
∂f = s(f)∂
|f−1(1)|
1 ... ∂
|f−1(m)|
Let p : N −→ Z2 be the map such that p(n) is 1 if n is even and −1 otherwise. Using induction
on N one can show that:
Theorem 17.
(ai∂i)
s(f, α)(
α−1(i)
af(i))∂f |
α−1(N+1)⊔N
where the sum runs over f : [N ] −→ [m] and α : [N − 1] −→ [2, N + 1] such that α(i) > i. The
sign s(f, α) is given by
s(f, α) = p(
s<j<α(s)
f(j) + xsf |α−1(j)∩[s+1,N−1] ).
Corollary 18.
(ai∂i)
cI∂I ,
where I : [m] −→ N is such that 1 ≤ |I| := I(1) + ...+ I(N) ≤ N , ∂I =
i=1 ∂
i , and
S(f, α)
(∂f(α−1(i))a
f(i))
where the sum runs over maps α : [N − 1] −→ [2, N + 1] with α(i) > i for i ∈ [N − 1], and
f : [N ] −→ [m] such that |{j ∈ α−1(N + 1) ⊔ {N} | f(j) = i}| = I(i), for i ∈ [m]. The sign
S(f, α) is given by
S(f, α) = s(f, α)s(f |α−1(N+1)⊔{N}) .
Corollary 19. (ai∂i)
N = 0 if and only if cI = 0 for I as above.
For example for N = 2 one gets
(ai∂i)
p(xiaj)aiaj∂i∂j + ai∂i(aj)∂j .
For N = 3 we get that
(ai∂i)
i,j,k
ai∂i(aj)∂j(ak)∂k + p(xiaj)aiaj∂i∂j(ak)∂k
+ p(xiak)aiaj∂j(ak)∂i∂k + p(xjak)ai∂i(aj)ak∂j∂k
+ xiaj)aiaj∂i(ak)∂j∂k + p(xjak + xiaj + xiak)aiajak∂i∂j∂k.
For N = 4 the corresponding expression have 24 terms and we won’t spell it out.
We return to the problem of defining N Lie algebroids. We need some general remarks on
differential operators on associative algebras. Given an associative algebra A we let DO(A)
be the algebra of differential operators on A, i.e., the subalgebra of End(A) generated by
A ⊂ End(A) and Der(A) ⊂ End(A), the space of derivations of A. Thus DO(A) is generated
as a vector space by operators of the form x1 ◦x2 ◦ · · · ◦xn ∈ End(A) where xi is in A⊔Der(A).
Notice that DO(A) admits a natural filtration
∅ = DO≤−1(A) ⊆ DO≤0(A) ⊆ DO≤1(A) ⊆ · · · ⊆ DO≤k(A) ⊆ · · · ⊆ DO(A),
whereDO≤k(A) ⊆ DO(A) is the subspace generated by operators x1◦x2◦· · ·◦xn, where at most
k operators among the xi belong to Der(A). Thus DO(A) admits the following decomposition
as graded vector space
DO(A) =
DOk(A) :=
DO≤k(A)/DO≤k−1(A).
Clearly DO0(A) = A and if A is either commutative or graded commutative, then
DO1(A) = Der(A).
The projection map π1 : DO(A) −→ DO1(A) induces a non-associative product
⋄ : DO1(A)⊗DO1(A) −→ DO1(A)
given by s ⋄ t = π1(s ◦ t) for s, t ∈ DO1(A). In particular if A is commutative or graded
commutative we obtain a non-associative product
⋄ : Der(A)⊗Der(A) −→ Der(A).
To avoid unnecessary use of parenthesis we assume that in the iterated applications of ⋄ we
associate in the minimal form from right to left.
Definition 20. A N Lie algebroid is a vector bundle E together with a degree one derivation
d : Γ(
E∗) −→ Γ(
E∗), such that the result of N ⋄-compositions of d with itself vanishes,
i.e., d ⋄ d ⋄ · · · ⋄ d = 0.
The notions of Lie algebroids and 2 Lie algebroids agree; indeed it is easy to check that
d ◦ d = d ⋄ d for any degree one derivation d : Γ(
E∗) −→ Γ(
E∗). Let us now illustrate with
an example the difference between the condition d◦d◦· · ·◦d = 0 and the much weaker condition
d ⋄ d ⋄ · · · ⋄ d = 0. Let C[x1, ..., xn] be the free graded algebra generated by graded variables xi
for 1 ≤ i ≤ n. A derivation on C[x1, ..., xn] is a vector field ∂ =
ai∂i where ai ∈ C[x1, ..., xn].
The condition ∂N = 0 is rather strong and restrictive, it might be tackled with the methods
provided above. In contrast, the condition ∂ ⋄ ∂ ⋄ · · · ⋄ ∂ = 0 is much simpler and indeed it is
equivalent to the condition ∂N (xi) = 0 for 1 ≤ i ≤ n.
Definition 21. A N Lie algebra is a vector space g together with a degree one derivation d on
∗ such that the N -th ⋄-composition of d with itself vanishes.
Our next result characterizes 3 Lie algebras in more familiar terms. For integers k1, k2, ..., kl
such that k1 + k2 + · · ·+ kl = n, we let Sh(k1, k2, · · · , kl) be the set of permutations
σ : {1, · · · , n} −→ {1, · · · , n}
such that σ is increasing on the intervals [ki + 1, ki+1] for 0 ≤ i ≤ l, k0 = 1 and kl+1 = n.
Assume we are given a map [ , ] :
g −→ g.
Theorem 22. The pair (g, [ , ]) is a 3 Lie algebra if and only if for v1, v2, v3, v4 ∈ g we have
σ∈Sh(2,1,1)
sgn(σ)[[[vσ(1), vσ(2)], vσ(3)]vσ(4)] =
σ∈Sh(2,2)
sgn(σ)[[vσ(1) , vσ(2)], [vσ(3), vσ(4)]],
Proof. One can show that a degree one differential on
∗ is necessarily the Chevalley-Eilenberg
operator
dθ(v1, . . . , vn+1) =
(−1)i+jθ([vi, vj ], v1, . . . , v̂i, . . . , v̂j , . . . vn+1) ,
where [ , ] :
g −→ g is an antisymmetric operator. We remark that we are not assuming,
at this point, that the bracket [ , ] satisfies any further identity. Jacobi identity arises when
the square of d is set to be equal to zero, but we do not do that since we want to investigate
the weaker condition that the third ⋄-power of d be equal to zero. For θ ∈
∗ = g∗ the
Chevalley-Eilenberg operator takes the simple form
dθ(v1, v2) = −θ([v1, v2]).
Moreover a further application of d to dθ yields
d2θ(v1, v2, v3) =
σ∈Sh(2,1)
sgn(σ)θ([[vσ(1), vσ(2)], vσ(3)]).
From the last equation it is evident that Jacobi identity is equivalent to the condition d2 = 0.
We do not assume assume that Jacobi identity holds and proceed to compute the third ⋄-power
of d. We obtain that
d ⋄ d ⋄ dθ(v1, v2, v3, v4) =
σ∈Sh(2,1,1)
sgn(σ)θ([[[vσ(1), vσ(2)], vσ(3)]vσ(4)])
σ∈Sh(2,2)
sgn(σ)θ([[vσ(1), vσ(2)], [vσ(3), vσ(4)]]).
Thus d ⋄ d ⋄ d = 0 if and only if the condition from the statement of the Theorem holds.
Using local coordinates θ1, ..., θm on the graded manifold g[−1], it is not hard to show that
a vector field of degree one on g[−1] can be written as
where the constants C
may be identified with the structural constants of [·, ·]. The square of
the vector field ∂ is given by
Cσδ εθ
Cσγ εθ
αθβθε
Cσδ γθ
αθβθδ
θαθβCσδ εθ
Using the antisymmetry properties of C
and the commutation rules for θα one can write
together the first to terms. We find that
∂ ⋄ ∂ =
Cσγ εθ
αθβθε
The condition ∂ ⋄ ∂ = 0 is equivalent to Jacobi identity. We assume that ∂ ⋄ ∂ 6= 0 and proceed
to compute consider the condition ∂ ⋄ ∂ ⋄ ∂ = 0. We have that
∂ ◦ (∂ ⋄ ∂) =
Cνλµθ
Cσγ εθ
αθβθε
Using carefully the properties of C
and θα we find that
∂ ◦ (∂ ⋄ ∂) =
CνλµC
Cσγ εθ
λθµθβθε
CνλµC
Cσγ νθ
λθµθαθβ
CνλµC
Cσγ εθ
λθµθαθβθε
Therefore we have shown that
∂ ⋄ (∂ ⋄ ∂) =
CνλµC
Cσγ ǫ −
CσγαC
θλθµθβθε
Thus the condition ∂ ⋄ (∂ ⋄ ∂) = 0 is equivalent to the following equations for fixed σ:
λ,µ,β,ε
CνλµC
Cσγ ǫ −
CσγαC
θλθµθβθε = 0 .
Let us now go back to the case of Lie algebroids as opposed to Lie algebras. There is a
natural degree one vector field on the graded manifold T[−1]R
n, namely, de Rham differential.
We now investigate whether it is possible to deform, infinitesimally, de Rham differential into
a 3-differential. In local coordinates (x1, . . . , xn, θ1, . . . , θn) on T[−1]R
n, with xi of degree zero
and θi of degree 1, de Rham operator takes the form
∂ = δiαθ
Let t be a formal infinitesimal parameter such that t2 = 0. We are going to show that any set
of functions aiα of degree zero on T[−1]R
n determine a deformation of de Rham operator into a
3-⋄ nilpotent operator given by
δiα + ta
Theorem 23. ∂a ⋄ ∂a = t
θα θβ
and ∂a ⋄ (∂a ⋄ ∂a) = 0.
Proof.
∂2a =
δiα + ta
θα θβ
θα θβ
θα θβ
Since t2 = 0 the third term on the right hand side of the expression above vanishes. The second
term also vanishes because it is a contraction of even and odd indices. So we get that
∂a ⋄ ∂a = t
θα θβ
The third power of ∂a is given by
∂a ⋄ (∂a ⋄ ∂a) = t
∂xγ∂xα
θγ θα θβ
It also vanishes because it includes a contraction of even and odd indices.
The nilpotency condition for the operator ∂a ⋄∂a is
θα θβ = 0 for j = 1, . . . , n. It is not
hard to find examples of matrices a
such that ∂a ⋄ ∂a = 0, for example

(x4)2
x1 x1
x2 x2 x3 x2
x3 x3 x2 x4
x4 x4 x1 x4 x3

More importantly there are also matrices a
such that ∂a ⋄ ∂a 6= 0, for example

x1 x4 x1 x1 x1
x2 x2 x4 x2 x2
x3 x3 x3 x4 x3
x4 x4 x4 x1 x4

We now consider full deformations as opposed to infinitesimal ones. Let
δiα + a
be a vector field. We think of ∂a as a deformation of de Rham differential with deformation
parameters aiα.
Theorem 24.
∂a ⋄ (∂a ⋄ ∂a) =
δlγ + a
){∂aiα
+ aiα
∂xl∂xi
θγθαθβ
Proof. Since
∂2a =
δiα + a
∂a ⋄ ∂a =
δiα + a
) ∂ajβ
we get
∂a ⋄ (∂a ⋄ ∂a) =
δlγ + a
δiα + a
) ∂ajβ
δlγ + a
){∂aiα
+ aiα
∂xl∂xi
θγθαθβ
Corollary 25. ∂a ⋄ (∂a ⋄ ∂a) = 0 if for fixed indices α, β, λ, j the following identity holds
δlγ + a
){∂aiα
+ aiα
∂xl∂xi
θγθαθβ = 0.
Corollary 26. Each matrix A = (A
) ∈ Mn(R) such that A
2 = 0 determines a 3 Lie algebroid
structure on TRn with differential given by (δiα +A
αxα)dx
Our final result describes explicitly the conditions defining a 3 Lie algebroid. Let E be a
vector bundle over M . A vector field on E[−1] of degree one is given in local coordinates by
∂ = ρiαθ
where ρiα and C
are functions of the bosonic variables only.
Theorem 27. ∂ ⋄ (∂ ⋄ ∂) = 0 if and only if for fixed γ and i the following identity holds:
Cασ µ)
Cλµσ −
CαλµC
CαβµC
θνθσθµθβ = 0 ,
Cǫσν−
Cǫσν + ρ
θσθνθγ = 0 .
Proof. We sketch the rather long proof. For ∂ = ρiαθ
θαθβ ∂
, we have
∂ ⋄ ∂ =
θλθµθβ
As in the previous theorem one finds that the condition ∂ ⋄ (∂ ⋄ ∂) = 0 is equivalent to the
following identities
Cασ µ)
Cλµσ −
Cλνσ −
θνθσθµθβ
= 0 ,
Cǫσν − ρ
θσθνθγ
Needless to say further research is necessary in order to have a better grasp of the meaning
and applications of the notion of N Lie algebroids. We expect that this approach will lead
towards new forms of infinitesimal symmetries, and for that reason alone it should find appli-
cations in various problems in mathematical physics. In our forthcoming work [3] we are going
to discuss some applications of N Lie algebroids in the context of Batalin-Vilkovisky algebras
and the master equation.
Acknowledgment
Thanks to Takashi Kimura, Juan Carlos Moreno and Jim Stasheff.
References
[1] V. Abramov, R. Kerner, Exterior differentials of higher order and ther covariant generalization,
J. Math. Phys. 41 (8) (2000) 5598-5614.
[2] V. Abramov, R. Kerner, On certain realizations of the q-deformed exterior differential calculus,
Rep. Math. Phys. 43 (1999) 179-194.
[3] M. Angel, J. Camacaro, R. Dı́az, Batalin-Vilkovisky algebras andN -complexes, in preparation.
[4] M. Angel, R. Dı́az, N-differential graded algebras, J. Pure App. Alg. 210 (3) (2007) 673-683.
[5] M. Angel, R. Dı́az, N -flat connections, in S. Paycha, B. Uribe (Eds.), Geometric and Topological
Methods for Quantum Field Theory, Contemp. Math. 432, Amer. Math. Soc., Providence, pp. 163-
172, 2007.
[6] M. Angel, R. Dı́az. On the q-analoque of the Maurer-Cartan equation, Adv. Stud. Contemp.
Math. 12 (2) (2006) 315-322.
[7] M. Angel, R. Dı́az, AN
-algebras, preprint, arXiv.math.QA/0612661.
[8] N. Bazunova, Construction of graded differential algebra with ternary differential, in J. Fuchs, J.
Mickelsson, Grigori Rozenblioum and Alexander Stolin (Eds.), Noncommutative geometry and rep-
resentation theory in mathematical physics, Contemp. Math. 391, Amer. Math. Soc., Providence,
pp. 1-9, 2005.
[9] N. Bazunova, Non-coordinate case of graded differential algebra with ternary differential, J.
Nonlinear Math. Phys. 13 (2006) 21-26.
[10] J. R. Camacaro, Lie algebroid exterior algebra in gauge field theories, in Groups, Geometry and
Physics, Monogr. Acad. Ci. Zaragoza 29, Zaragoza, pp. 57-64, 2006.
[11] J.F. Cariñena, Lie groupoids and algebroids in classical and quantum mechanics, in Symmetries
in Quantum Mechanics and Quantum Optics, Universidad de Burgos, Burgos, pp. 67-81, 1999.
[12] A.C. da Silva, A. Weinstein, Lectures on geometrical models for noncommutative algebra,
Berkeley Mathematical Lecture Notes 10, Amer. Math. Soc., Providence, 1999.
[13] M. Dubois-Violette, Generalized differential spaces with dN = 0 and the q-differential calculus,
Czech J. of Phys. 46 (1996) 1227- 1233.
[14] M. Dubois-Violette, Generalized homologies for dN = 0 and graded q-differential algebras, in M.
Henneaux, J. Krasil’shchik, A. Vinogradov (Eds.), Secondary Calculus and Cohomological Physics,
Contemp. Maths. 219, Amer. Math. Soc., Providence, pp. 69-79, 1998.
[15] M. Dubois-Violette, Lectures on differentials, generalized differentials and some examples re-
lated to theoretical physics, in R. Coquereaux, A. Garcia, R. Trinchero (Eds.),Quantum Symmetries
in Theoretical Physics and Mathematics, Contemp. Maths. 294, Amer. Math. Soc., Providence, pp.
59-94, 2002.
[16] M. Dubois-Violette, R. Kerner, Universal q-differential calculus and q-analog of homological
algebra, Acta Math. Univ. Comenianae LXV (2) (1996) 175-188.
[17] N. P. Landsman, Lie groups and Lie algebroids in physics and noncommutative geometry, J.
Geom. Phys. 56 (2006) 24-54.
[18] M.M. Kapranov, On the q-analog of homological algebra, preprint, arXiv.q-alg/9611005.
[19] C. Kassel, M. Wambst, Algèbre homologique des N-complexes et homologie de Hochschild aux
racines de l’unité, Publ. Res. Inst. Math. Sci. Kyoto University 34 (2) (1998) 91-114.
[20] R. Kerner, The cubic chessboard, Class. Quantum Grav. 14 (1997) A203-A225.
[21] R. Kerner, Z3-graded exterior differential calculus and gauge theories of higher order, Lett.
Math. Phys. 36 (1996) 441-454.
[22] R. Kerner, B. Niemeyer, Covariant q-differential calculus and its deformations at qN = 1, Lett.
Math. Phys. 45 (1998) 161-176.
http://arxiv.org/abs/math/0612661
http://arxiv.org/abs/q-alg/9611005
[23] K. C. Mackenzie, General Theory of Lie Groupoids and Lie Algebroids, London Math. Soc.
Lecture Note Series 213, Cambridge Univ. Press, Cambridge, 2005.
[24] W. Mayer, A new homology theory I, Ann. of Math. 43 (1942) 370-380.
[25] W. Mayer, A new homology theory II, Ann. of Math. 43 (1942) 594-605.
[26] A. Sitarz, On the tensor product construction for q-differential algebras, Lett. Math. Phys. 44
(1998).
[27] J. Pradines, Théorie de Lie pour les groupöıdes différentiables. Relations entre propiétés locales
et globales, C. R. Acad. Sci. Paris Sér. I Math. 236 (1966) 907-910.
[28] D. Zeilberger; Closed forms (pun inteded!), in A tribute to Emil Grosswald: number theory and
related analysis, Contemp. Math. 143, Amer. Math. Soc., Providence, pp. 579-607, 1993.
mangel@euler.ciens.ucv.ve, jcama@usb.ve, ragadiaz@gmail.com
	Introduction
	Examples of N-differential graded algebras
	On the (3,N) curvature
	 N Lie algebroids
ABSTRACT
  Deformations of the 3-differential of 3-differential graded algebras are
controlled by the (3,N) Maurer-Cartan equation. We find explicit formulae for
the coefficients appearing in that equation, introduce new geometric examples
of N-differential graded algebras, and use these results to study N Lie
algebroids.

<|endoftext|><|startoftext|>
Electronic structure of kinetic energy driven superconductors in the presence of
bilayer splitting
Yu Lan,1 Jihong Qin,2 and Shiping Feng1
Department of Physics, Beijing Normal University, Beijing 100875, China
Department of Physics, Beijing University of Science and Technology, Beijing 100083, China
(Dated: November 17, 2018)
Within the framework of the kinetic energy driven superconductivity, the electronic structure
of bilayer cuprate superconductors in the superconducting state is studied. It is shown that the
electron spectrum of bilayer cuprate superconductors is split into the bonding and antibonding
components by the bilayer splitting, then the observed peak-dip-hump structure around the [π, 0]
point is mainly caused by this bilayer splitting, with the superconducting peak being related to
the antibonding component, and the hump being formed by the bonding component. The spectral
weight increases with increasing the doping concentration. In analogy to the normal state case, both
electron antibonding peak and bonding hump have the weak dispersions around the [π, 0] point.
PACS numbers: 74.20.Mn, 74.20.-z, 74.25.Jb
I. INTRODUCTION
The parent compounds of cuprate superconductors are
the Mott insulators with an antiferromagnetic (AF) long-
range order (AFLRO), then via the charge carrier doping,
one can drive these materials through a metal-insulating
transition and enter the superconducting (SC) dome1,2,3.
It has become clear in the past twenty years that cuprate
superconductors are among the most complex systems
studied in condensed matter physics1,2,3. The compli-
cations arise mainly from (1) a layered crystal structure
with one or more CuO2 planes per unit cell separated by
insulating layers which leads to a quasi-two-dimensional
electronic structure, and (2) extreme sensitivity of the
physical properties to the compositions (stoichiometry)
which control the carrier density in the CuO2 plane
1,2,3.
As a consequence, both experimental investigation and
theoretical understanding are extremely difficult.
By virtue of systematic studies using the angle-resolved
photoemission spectroscopy (ARPES), the low-energy
electronic structure of cuprate superconductors in the
SC state is well-established by now2,3, where an agree-
ment has emerged that the electronic quasiparticle-like
excitations are well defined, and are the entities par-
ticipating in the SC pairing. In particular, the lowest
energy states are located at the [π, 0] point of the Bril-
louin zone, where the d-wave SC gap function is max-
imal, then the most contributions of the electron spec-
tral function come from the [π, 0] point2,3. Moreover,
some ARPES experimental results unambiguously estab-
lished the Bogoliubov-quasiparticle nature of the sharp
SC quasiparticle peak near the [π, 0] point4,5, then the
SC coherence of the quasiparticle peak is described by
the simple Bardeen-Cooper-Schrieffer (BCS) formalism6.
However, there are numerous anomalies for different fam-
ilies of cuprate superconductors, which complicate the
physical properties of the electronic structure2,3. Among
these anomalies is the dramatic change in the spectral
lineshape around the [π, 0] point first observed on the bi-
layer cuprate superconductor Bi2Sr2CaCu2O8+δ, where
a sharp quasiparticle peak develops at the lowest bind-
ing energy, followed by a dip and a hump, giving rise to
the so-called peak-dip-hump (PDH) structure in the elec-
tron spectrum7,8,9. Later, this PDH structure was also
found in YBa2Cu3O7−δ
10 and in Bi2Sr2Ca2Cu3O10+δ
Furthermore, although the sharp quasiparticle peaks are
identified in the SC state along the entire Fermi surface,
the PDH structure is most strongly developed around the
[π, 0] point2,7,8,9,10,11.
The appearance of the PDH structure in bilayer
cuprate superconductors in the SC state is the mostly
remarkable effect, however, its full understanding is still
a challenging issue. The earlier works2,12 gave the main
impetus for a phenomenological description of the single-
particle excitations in terms of an interaction between
quasiparticles and collective modes, which is of fun-
damental relevance to the nature of superconductivity
and the pairing mechanism in cuprate superconductors.
However, the different interpretive scenario has been
proposed2,13. This followed from the observation of the
bilayer splitting (BS) for both normal and SC states
in a wide doping range14,15,16. This BS of the CuO2
plane derives the electronic structure in the bonding and
antibonding bands due to the present of CuO2 bilayer
blocks in the unit cell of bilayer cuprate superconduc-
tors, then the main features of the PDH structure is
caused by the BS13,14,15,16, with the peak and hump
corresponding to the antibonding and bonding bands,
respectively. Furthermore, some ARPES experimental
data measured above and below the SC transition tem-
perature show that this PDH structure is totally unre-
lated to superconductivity14. The recent ARPES exper-
imental results reported by several groups support this
scenario, and most convincingly suggested that the PDH
structure originates from the BS at any doping levels17.
To the best of our knowledge, this PDH structure in bi-
layer cuprate superconductors has not been treated start-
ing from a microscopic SC theory.
Within the single layer t-t′-J model, the electronic
structure of the single layer cuprate superconductors in
the SC state has been discussed18 based on the frame-
work of the kinetic energy driven superconductivity19,
and the main features of the ARPES experiments on
the single layer cuprate superconductors have been repro-
duced, including the doping and temperature dependence
of the electron spectrum and quasiparticle dispersion. In
http://arxiv.org/abs/0704.0825v2
this paper, we study the electronic structure of bilayer
cuprate superconductors in the SC state along with this
line. Within the kinetic energy driven SC mechanism19,
we employed the t-t′-J model by considering the bilayer
interaction, and then show explicitly that the BS occurs
due to this bilayer interaction. In this case, the elec-
tron spectrum is split into the bonding and antibond-
ing components by this BS, then the SC peak is closely
related to the antibonding component, while the hump
is mainly formed by the bonding component. In other
words, the well pronounced PDH structure in the electron
spectrum of bilayer cuprate superconductors is mainly
caused by the BS. Furthermore, the spectral weight in
the [π, 0] point increases with increasing the doping con-
centration. In analogy to the normal-state case14,20,21,22,
both electron antibonding peak and bonding hump have
the weak dispersions around the [π, 0] point, in qualita-
tive agreement with the experimental observation on bi-
layer cuprate superconductors in the SC state2,7,8,9,10,11.
The paper is organized as follows. The basic formal-
ism is presented in Sec. II, where we generalize the
kinetic energy driven superconductivity from the previ-
ous single layer case18,19 to the bilayer case, and then
evaluate explicitly the longitudinal and transverse com-
ponents of the electron normal and anomalous Green’s
functions (hence the bonding and antibonding electron
spectral functions). Within this theoretical framework,
we discuss the electronic structure of bilayer cuprate su-
perconductors in the SC state in Sec. III. It is shown that
the striking PDH structure in bilayer cuprate supercon-
ductors is closely related to the BS. Finally, we give a
summary and discussions in Sec. IV.
II. FORMALISM
It has been shown from the ARPES experiments2,23
that the two-dimensional t-t′-J model is of particular
relevance to the low energy features of cuprate super-
conductors. For discussions of the physical properties of
bilayer cuprate superconductors, the t-t′-J model can be
expressed by including the bilayer interactions as,
H = −t
iη̂aσ
iaσCi+η̂aσ + t
iτ̂aσ
iaσCi+τ̂aσ
t⊥(i)(C
i1σCi2σ +H.c.) + µ
iaσCiaσ
Sia · Si+η̂a + J⊥
Si1 · Si2, (1)
supplemented by an important on-site local constraint
iaσCiaσ ≤ 1 to avoid the double occupancy, where
η̂ = ±x̂,±ŷ representing the nearest neighbors of a given
site i, τ̂ = ±x̂± ŷ representing the next nearest neighbors
of a given site i, a = 1, 2 is plane index, C
iaσ (Ciaσ)
is the electron creation (annihilation) operator, Sia =
iaσCia/2 is the spin operator with the Pauli matrices
σ = (σx, σy, σz), µ is the chemical potential, and the
interlayer coherent hopping has the form,
t⊥(k) =
(cos kx − cos ky)
2, (2)
which is strongly anisotropic and follows the theoret-
ical predictions24. In particular, this momentum de-
pendent form (2) has been experimentally verified14,15.
For this t-t′-J model (1), it has been argued that cru-
cial requirement is to impose the electron single occu-
pancy local constraint for a proper understanding of the
physical properties of cuprate superconductors. To in-
corporate the electron single occupancy local constraint,
the charge-spin separation (CSS) fermion-spin theory has
been proposed25, where the constrained electron opera-
tors are decoupled as, Cia↑ = h
ia and Cia↓ = h
with the spinful fermion operator hiaσ = e
−iΦiaσhia rep-
resents the charge degree of freedom together with some
effects of the spin configuration rearrangements due to
the presence of the doped hole itself (dressed holon),
while the spin operator Sia represents the spin degree
of freedom, then the bilayer t-t′-J Hamiltonian (1) can
be expressed in this CSS fermion-spin representation as,
H = t
i+η̂a↑hia↑S
i+η̂a + h
i+η̂a↓hia↓S
i+η̂a)
i+τ̂a↑hia↑S
i+τ̂a + h
i+τ̂a↓hia↓S
i+τ̂a)
t⊥(i)(h
i2↑hi1↑S
i2 + h
i1↑hi2↑S
i2↓hi1↓S
i2 + h
i1↓hi2↓S
i1)− µ
iaσhiaσ
+ Jeff
Sia · Si+η̂a + Jeff⊥
Si1 · Si2, (3)
where Jeff = J(1 − δ)
2, Jeff⊥ = J⊥(1 − δ)
2, and δ =
iaσhiaσ〉 = 〈h
iahia〉 is the doping concentration. It has
been shown that the electron single occupancy local con-
straint is satisfied in analytical calculations within this
CSS fermion-spin theory, and the double spinful fermion
occupancy are ruled out automatically25. Although in
common sense hiaσ is not a real spinful fermion, it be-
haves like a spinful fermion25. As in the single layer
case18, the kinetic energy terms in the bilayer t-t′-J
model have been transferred as the dressed holon-spin
interactions, which can induce the dressed holon pair-
ing state (hence the electron Cooper pairing state) by
exchanging spin excitations in the higher power of the
doping concentration. Before calculation of the electron
normal and anomalous Green’s functions of the bilayer
system in the SC state, we firstly introduce the SC order
parameter. As we have mentioned above, there are two
coupled CuO2 planes in the unit cell, and in this case,
the SC order parameters for the electron Cooper pair is
a matrix ∆ = ∆L + σx∆T , with the longitudinal and
transverse SC order parameters in the CSS fermion-spin
theory can be expressed as,
∆L = 〈C
i+η̂a↓ − C
i+η̂a↑〉
= 〈hia↑hi+η̂a↓S
i+η̂a − hia↓hi+η̂a↑S
i+η̂a〉
= −χ1∆hL, (4a)
∆T = 〈C
i2↓ − C
= 〈hi1↑hi2↓S
i2 − hi1↓hi2↑S
= −χ⊥∆hT , (4b)
respectively, where the spin correlation functions χ1 =
〈S+iaS
i+η̂a〉 and χ⊥ = 〈S
i2〉, and the longitudinal
and transverse dressed holon pairing order parameters
∆hL = 〈hi+η̂a↓hia↑ − hi+η̂a↑hia↓〉 and ∆hT = 〈hi2↓hi1↑ −
hi2↑hi1↓〉.
Within the t-J type model, robust indications
of superconductivity with the d-wave symmetry in
doped cuprates have been found by using numerical
techniques26. On the other hand, it has been argued that
the SC transition in doped cuprates is determined by the
need to reduce the frustrated kinetic energy27. Although
it is not necessary for the strong coupling of the electron
quasiparticles and a pairing boson in their arguments27,
a series of the inelastic neutron scattering experimental
results provide a clear link between the electron quasi-
particles and magnetic excitations28,29. In particular, an
impurity-substitution effect on the low energy dynamics
has been studied by virtue of the ARPES measurement30,
this impurity-substitution effect is a magnetic analogue
of the isotope effect used for the conventional super-
conductors. These experimental results30 reveal that
the impurity-induced changes in the electron self-energy
show a good correspondence to those of the magnetic
excitations, indicating the importance of the magnetic
fluctuation to the electron pairing in cuprate supercon-
ductors. Recently, we19 have developed the kinetic en-
ergy driven SC mechanism based on the CSS fermion-
spin theory25, where the dressed holons interact occur-
ring directly through the kinetic energy by exchanging
spin excitations, leading to a net attractive force between
dressed holons, then the electron Cooper pairs originat-
ing from the dressed holon pairing state are due to the
charge-spin recombination, and their condensation re-
veals the SC ground-state. Within this SC mechanism19,
the doping and temperature dependence of the electron
spectral function of the single layer cuprate supercon-
ductors in the SC state has been discussed18. In this
section, our main goal is to generalize these analytical
calculations from the single layer case to the bilayer sys-
tem. As in the case for the SC order parameter, the full
dressed holon normal and anomalous Green’s functions
can also be expressed as g(k, ω) = gL(k, ω) + σxgT (k, ω)
and ℑ†(k, ω) = ℑ
L(k, ω) + σxℑ
L(k, ω), respectively. We
now can follow the previous discussions for the single
layer case18,19, and evaluate explicitly these correspond-
ing longitudinal and transverse parts of the full dressed
holon normal and anomalous Green’s functions as [see
the Appendix],
gL(k, ω) =
ν=1,2
U2hνk
ω − Ehνk
V 2hνk
ω + Ehνk
, (5a)
gT (k, ω) =
ν=1,2
(−1)ν+1Z
U2hνk
ω − Ehνk
V 2hνk
ω + Ehνk
, (5b)
L(k, ω) = −
ν=1,2
2Ehνk
ω − Ehνk
ω + Ehνk
, (5c)
T (k, ω) = −
ν=1,2
(−1)ν+1Z
2Ehνk
ω − Ehνk
ω + Ehνk
, (5d)
where the dressed holon quasiparticle coherence fac-
tors U2hνk = [1 + ξ̄νk/Ehνk]/2 and V
hνk = [1 −
ξ̄νk/Ehνk]/2, the dressed holon quasiparticle disper-
sion Ehνk =
[ξ̄νk]2+ | ∆̄
(k) |2, the renormalized
dressed holon excitation spectrum ξ̄νk = Z
ξνk, with
the mean-field (MF) dressed holon excitation spectrum
ξνk = Ztχ1γk − Zt
k − µ+ (−1)
ν+1χ⊥t⊥(k), where
the spin correlation function χ2 = 〈S
i+τ̂a〉, γk =
(1/Z)
eik·η̂, γ′k = (1/Z)
eik·τ̂ , Z is the num-
ber of the nearest neighbor or next nearest neighbor
sites, the renormalized dressed holon pair gap func-
tion ∆̄
(k) = Z
[∆̄hL(k) + (−1)
ν+1∆̄hT (k)], with
ν = 1 ( ν = 2) for the bonding (antibonding) case,
where ∆̄hL(k) = Σ
2L (k, ω) |ω=0= ∆̄hLγ
, with γ
(coskx − cosky)/2, ∆̄hT (k) = Σ
2T (k, ω) |ω=0= ∆̄hT , the
dressed holon quasiparticle coherent weights Z
(1)−1
hF1 − Z
hF2, Z
(2)−1
= Z−1
hF1 + Z
hF2, with Z
hF1 =
1 − Σ
1L (k0, ω) |ω=0, and Z
hF2 = Σ
1T (k0, ω) |ω=0 ,
where k0 = [π, 0], Σ
1L (k, ω) and Σ
1T (k, ω) are the cor-
responding antisymmetric parts of the longitudinal and
transverse dressed holon self-energy functions Σ
1L (k, ω)
and Σ
1T (k, ω), while the longitudinal and transverse
parts of the dressed holon self-energy functions Σ
1 (k, ω)
and Σ
2 (k, ω) have been evaluated as,
1L (k, iωn) =
p+q+k
gL(p+ k, ipm + iωn)ΠLL(p,q, ipm)
p+q+k
gT (p+ k, ipm + iωn)ΠTL(p,q, ipm)], (6a)
1T (k, iωn) =
p+q+k
gT (p+ k, ipm + iωn)ΠTT (p,q, ipm)
p+q+k
gL(p+ k, ipm + iωn)ΠLT (p,q, ipm)], (6b)
2L (k, iωn) =
p+q+k
L(−p− k,−ipm − iωn)ΠLL(p,q, ipm)
p+q+k
T (−p− k,−ipm − iωn)ΠTL(p,q, ipm)], (6c)
2T (k, iωn) =
p+q+k
(−p− k,−ipm − iωn)ΠTT (p,q, ipm)
p+q+k
L(−p− k,−ipm − iωn)ΠLT (p,q, ipm)], (6d)
where R
= [Z(tγk − t
′γ′k)]
2 + t2⊥(k), R
= 2Z(tγk −
)t⊥(k), and the spin bubbles Πη,η′(p,q, ipm) =
(1/β)
η (q, iqm)D
η′ (q + p, iqm+ ipm), with η =
L, T and η′ = L, T , and the MF spin Green’s function
D(0)(k, ω) = D
(k, ω) + σxD
(k, ω), with the cor-
responding longitudinal and transverse parts have been
given by22,
L (k, ω) =
ν=1,2
ω2 − ω2
, (7a)
(k, ω) =
ν=1,2
(−1)ν+1
ω2 − ω2
, (7b)
where Bνk = λ(A1γk−A2)−λ
′(2χz2γ
k−χ2)−Jeff⊥[χ⊥+
2χz⊥(−1)
ν ][ǫ⊥(k)+(−1)
ν ], A1 = 2ǫ‖χ
1+χ1, A2 = ǫ‖χ1+
2χz1, λ = 2ZJeff , λ
′ = 4Zφ2t
′, ǫ‖ = 1+2tφ1/Jeff , ǫ⊥(k) =
1 + 4φ⊥t⊥(k)/Jeff⊥, the spin correlation functions χ
〈SziaS
i+η̂a〉, χ
2 = 〈S
i+τ̂a〉, χ
⊥ = 〈S
i2〉, the dressed
holon particle-hole order parameters φ1 = 〈h
iaσhi+η̂aσ〉,
φ2 = 〈h
iaσhi+τ̂aσ〉, φ⊥ = 〈h
i1σhi2σ〉, and the MF spin
excitation spectrum,
ω2νk = λ
A4 − αǫ‖χ
1γk −
αǫ‖χ1
(1 − ǫ‖γk) +
αχz1 − αχ1γk
(ǫ‖ − γk)
Z − 1
γ′k +
+ λλ′α
χz1(1 − ǫ‖γk)γ
k − C2)(ǫ‖ − γk)
+ γ′k(C
2 − ǫ‖χ
2γk)−
ǫ‖(C2 − χ2γk)
+ λJeff⊥α
ǫ⊥(k)(ǫ‖ − γk)[C⊥ + χ1(−1)
+ (1− ǫ‖γk)[C
⊥ + χ
1ǫ⊥(k)(−1)
ν ] + [ǫ⊥(k) + (−1)
ǫ‖(C⊥ − χ⊥γk) + (C
⊥ − ǫ‖χ
⊥γk)(−1)
+ λ′Jeff⊥α
γ′k[C
⊥ + χ
2ǫ⊥(k)(−1)
ǫ⊥(k)[C
⊥ + χ2(−1)
k − C
⊥) + χ
k(−1)
[ǫ⊥(k) + (−1)
J2eff⊥[ǫ⊥(k) + (−1)
ν ]2, (8)
where A3 = αC1 + (1− α)/2Z, A4 = αC
1 + (1− α)/4Z,
A5 = αC3 + (1 − α)/2Z, and the spin correla-
tion functions C1 = (1/Z
η̂η̂′
i+η̂aS
i+η̂′a
C2 = (1/Z
i+η̂aS
i+τ̂a〉, C3 =
(1/Z2)
τ̂ τ̂ ′
i+τ̂aS
i+τ̂ ′a
〉, Cz1 =
(1/Z2)
η̂η̂′
〈Szi+η̂aS
i+η̂′a
〉, Cz2 =
(1/Z2)
〈Szi+η̂aS
i+τ̂a〉, C⊥ = (1/Z)
〈S+i1S
i+η̂2〉,
C′⊥ = (1/Z)
〈S+i1S
i+τ̂2〉, C
⊥ = (1/Z)
〈Szi1S
i+η̂2〉,
and C′
⊥ = (1/Z)
〈Szi1S
i+τ̂2〉. In order to satisfy the
sum rule of the spin correlation function 〈S+iaS
ia〉 = 1/2
in the case without AFLRO, the important decoupling
parameter α has been introduced in the above calcu-
lation as in the single layer case18,19,22, which can be
regarded as the vertex correction.
With the help of the longitudinal and transverse parts
of the full dressed holon normal and anomalous Green’s
functions in Eq. (5) and MF spin Green’s function
in Eq. (7), we now can calculate the electron nor-
mal and anomalous Green’s functions G(i − j, t − t′) =
〈〈Ciσ(t);C
′)〉〉 = GL(i− j, t− t
′) + σxGT (i− j, t− t
and Γ†(i−j, t−t′) = 〈〈C
i↑(t);C
′)〉〉 = Γ
L(i−j, t−t
T (i−j, t−t
′), where these longitudinal and transverse
parts are the convolutions of the corresponding longitudi-
nal and transverse parts of the full dressed holon normal
and anomalous Green’s functions and MF spin Green’s
function in the CSS fermion-spin theory, and can be eval-
uated explicitly as,
GL(k, ω) =
L(1)µν (k,p)
U2hµp−k
ω + Ehµp−k − ωνp
V 2hµp−k
ω − Ehµp−k + ωνp
+ L(2)µν (k,p)
U2hµp−k
ω + Ehµp−k + ωνp
V 2hµp−k
ω − Ehµp−k − ωνp
, (9a)
GT (k, ω) =
(−1)µ+νZ
L(1)µν (k,p)
U2hµp−k
ω + Ehµp−k − ωνp
V 2hµp−k
ω − Ehµp−k + ωνp
+ L(2)µν (k,p)
U2hµp−k
ω + Ehµp−k + ωνp
V 2hµp−k
ω − Ehµp−k − ωνp
, (9b)
L(k, ω) =
(p− k)
2Ehµp−k
L(1)µν (k,p)
ω − Ehµp−k + ωνp
ω + Ehµp−k − ωνp
+ L(2)µν (k,p)
ω − Ehµp−k − ωνp
ω + Ehµp−k + ωνp
, (9c)
(k, ω) =
(−1)µ+νZ
(p− k)
2Ehµp−k
L(1)µν (k,p)
ω − Ehµp−k + ωνp
ω + Ehµp−k − ωνp
+ L(2)µν (k,p)
ω − Ehµp−k − ωνp
ω + Ehµp−k + ωνp
, (9d)
where L
µν (k,p) = [coth(βωνp/2) − th(βEhµp−k/2)]/2
and L
µν (k,p) = [coth(βωνp/2) + th(βEhµp−k/2)]/2,
then the longitudinal and transverse parts of the
electron spectral function AL(k, ω) = −2ImGL(k, ω)
and AT (k, ω) = −2ImGT (k, ω) and SC gap func-
tion ∆L(k) = (1/β)
L(k, iωn) and ∆T (k) =
(1/β)
(k, iωn) are obtained as,
AL(k, ω) = π
{L(1)µν (k,p)[U
hµp−kδ(ω + Ehµp−k − ωνp) + V
hµp−kδ(ω − Ehµp−k + ωνp)]
+ L(2)µν (k,p)[U
hµp−kδ(ω + Ehµp−k + ωνp) + V
hµp−kδ(ω − Ehµp−k − ωνp)]}, (10a)
AT (k, ω) = π
(−1)µ+νZ
{L(1)µν (k,p)[U
hµp−kδ(ω + Ehµp−k − ωνp) + V
hµp−kδ(ω − Ehµp−k + ωνp)]
+ L(2)µν (k,p)[U
hµp−kδ(ω + Ehµp−k + ωνp) + V
hµp−kδ(ω − Ehµp−k − ωνp)]}, (10b)
∆L(k) = −
p,µ,ν
(p− k)
Ehµp−k
βEhµp−k]coth[
βωνp], (10c)
∆T (k) = −
p,µ,ν
(−1)µ+νZ
(p− k)
Ehµp−k
βEhµp−k]coth[
βωνp]. (10d)
With the above longitudinal and transverse parts of the
SC gap functions in Eqs. (10c) and (10d), the corre-
sponding longitudinal and transverse SC gap parameters
are obtained as ∆L = −χ1∆hL and ∆T = −χ⊥∆hT ,
respectively. In the bilayer coupling case, the more ap-
propriate classification is in terms of the spectral func-
tion and SC gap function within the basis of the an-
tibonding and bonding components13,14,15,16,17. In this
case, the electron spectral function and SC gap parame-
ter can be transformed from the plane representation to
the antibonding-bonding representation as,
A(a)(k, ω) =
[AL(k, ω)−AT (k, ω)], (11a)
A(b)(k, ω) =
[AL(k, ω) +AT (k, ω)], (11b)
∆(a) = ∆L −∆T , (11c)
∆(b) = ∆L +∆T . (11d)
respectively, then the antibonding and bonding parts
have odd and even symmetries, respectively.
III. ELECTRON STRUCTURE OF BILAYER
CUPRATE SUPERCONDUCTORS
We now begin to discuss the effect of the bilayer in-
teraction on the electronic structure in the SC state. We
first plot, in Fig. 1, the antibonding (solid line) and
bonding (dashed line) electron spectral functions in the
[π, 0] point for parameters t/J = 2.5, t′/t = 0.3, and
t⊥/t = 0.35 with temperature T = 0.002J at the doping
concentration δ = 0.15. In comparison with the single
layer case18, the electron spectrum of the bilayer system
has been split into the bonding and antibonging compo-
nents, with the bonding and antibonding SC quasipar-
ticle peaks in the [π, 0] point are located at the differ-
ent positions. In this sense, the differentiation between
the bonding and antibonding components of the electron
spectral function is essential. The antibonding spectrum
consists of a low energy antibonding peak, corresponding
to the SC peak, and the bonding spectrum has a higher
energy bonding peak, corresponding to the hump, while
the spectral dip is in between them, then the total con-
tributions for the electron spectrum from both antibond-
ing and bonding components give rise to the PDH struc-
ture. Although the simple bilayer t-t′-J model (1) can-
not be regarded as a comprehensive model for a quanti-
tative comparison with bilayer cuprate superconductors,
our present results for the SC state are in qualitative
agreement with the major experimental observations on
bilayer cuprate superconductors2,7,8,9,10,11,16.
We now turn to discuss the doping evolution of the
electron spectrum of bilayer cuprate superconductors in
( )/J
-1.0 -.5 0.0 .5
Bonding
Antibonding
FIG. 1: The antibonding (solid line) and bonding (dashed
line) electron spectral functions in the [π, 0] point for t/J =
2.5, t′/t = 0.3, and t⊥/t = 0.35 with T = 0.002J at δ = 0.15.
( )/J
-1.0 -.5 0.0 .5
FIG. 2: The electron spectral functions at [π, 0] point for
t/J = 2.5, t′/t = 0.3, and t⊥/t = 0.35 with T = 0.002J at
δ = 0.09 (solid line), δ = 0.12 (dashed line), and δ = 0.15
(dotted line).
the SC state. We have calculated the electron spec-
trum at different doping concentrations, and the result
of the electron spectral functions in the [π, 0] point for
t/J = 2.5, t′/t = 0.3, and t⊥/t = 0.35 with T = 0.002J
at δ = 0.09 (solid line), δ = 0.12 (dashed line), and
δ = 0.15 (dotted line) are plotted in Fig. 2. In compari-
son with the corresponding ARPES experimental results
of the bilayer cuprate superconductor Bi2Sr2CaCu2O8+δ
in the SC state in Ref.12, it is obviously that the doping
evolution of the spectral weight of the bilayer supercon-
Bonding Band
Antibonding Band
(-0.2 , ) (0, ) (0.2 , )
FIG. 3: The positions of the antibonding peaks and bonding
humps in the electron spectrum as a function of momentum
along the direction [−0.2π, π] → [0, π] → [0.2π, π] with T =
0.002J at δ = 0.15 for t/J = 2.5, t′/t = 0.3, and t⊥/t = 0.35.
ductor Bi2Sr2CaCu2O8+δ is reproduced. With increas-
ing the doping concentration, both SC peak and hump
become sharper, and then the spectral weights increase
in intensity. Furthermore, we have also calculated the
electron spectrum with different temperatures, and the
results show that the spectral weights of both SC peak
and hump are suppressed with increasing temperatures.
Our these results are also qualitatively consistent with
the ARPES experimental results on bilayer cuprate su-
perconductors in the SC state2,9,12.
To better perceive the anomalous form of the antibond-
ing and bonding electron spectral functions as a function
of energy ω for k in the vicinity of the [π, 0] point, we
have made a series of calculations for the electron spec-
tral function at different momenta, and the results show
that the sharp SC peak from the electron antibonding
spectral function and hump from the bonding spectral
function persist in a very large momentum space region
around the [π, 0] point. To show this point clearly, we
plot the positions of the antibonding peak and bonding
hump in the electron spectrum as a function of momen-
tum along the direction [−0.2π, π] → [0, π] → [0.2π, π]
with T = 0.002J at δ = 0.15 for t/J = 2.5, t′/t = 0.3,
and t⊥/t = 0.35 in Fig. 3. Our result shows that there
are two branches in the quasiparticle dispersion, with
upper branch corresponding to the antibonding quasi-
particle dispersion, and lower branch corresponding to
the bonding quasiparticle dispersion. Furthermore, the
BS reaches its maximum at the [π, 0] point. Our present
result also shows that in analogy to the two flat bands ap-
peared in the normal state22, both electron antibonding
peak and bonding hump have a weak dispersion around
the [π, 0] point, in qualitative agreement with the ARPES
experimental measurements on bilayer cuprate supercon-
ductors in the SC state2,7,8,9,10,11,14.
In the above calculations, we find that although the
antibonding SC peak and bonding hump have different
dispersions, the transverse part of the SC gap param-
eter ∆T ≈ 0. To show this point clearly, we plot the
antibonding and bonding gap parameters in Eqs. (11c)
00.00 0.05 0.10 0.15 0.20 0.25 0.30
00.00
FIG. 4: The antibonding (solid line) and bonding (dashed
line) gap parameters as a function of the doping concentration
with T = 0.002J for t/J = 2.5, t′/t = 0.3, and t⊥/t = 0.35.
and (11d) as a function of the doping concentration with
T = 0.002J for t/J = 2.5, t′/t = 0.3, and t⊥/t = 0.35 in
Fig. 4. As seen from Fig. 4, both antibonding and bond-
ing gap parameters have the same d-wave SC gap mag-
nitude in a given doping concentration, i.e., ∆a ≈ ∆b.
This result shows that although there is a single elec-
tron interlayer coherent hopping (2) in bilayer cuprate
superconductors in the SC state, the electron interlayer
pairing interaction vanishes. This reflects that in the
present kinetic energy driven SC mechanism, the weak
dressed holon-spin interaction due to the interlayer co-
herent hopping (2) from the kinetic energy terms in Eq.
(3) does not induce the dressed holon interlayer pair-
ing state by exchanging spin excitations in the higher
power of the doping concentration. This is different from
the dressed holon-spin interaction due to the intralayer
hopping from the kinetic energy terms in Eq. (3), it
can induce superconductivity by exchanging spin excita-
tions in the higher power of the doping concentration19.
Our this result is also consistent with the ARPES ex-
perimental results of the bilayer cuprate superconductor
Bi(Pb)2Sr2CaCu2O8+δ
14,16, where the SC gap separately
for the bonding and antibonding bands has been mea-
sured, and it is found that both d-wave SC gaps from
the antibonding and bonding components are identical
within the experimental uncertainties.
To our present understanding, two main reasons why
the electronic structure of bilayer cuprate superconduc-
tors in the SC state can be described qualitatively in
the framework of the kinetic energy driven supercon-
ductivity by considering the bilayer interaction are as
follows. Firstly, the bilayer interaction causes the BS,
this leads to that the full electron normal (anomalous)
Green’s function is divided into the longitudinal and
transverse parts, respectively, then the bonding and an-
tibonding electron spectral functions (SC gap functions)
are obtained from these longitudinal and transverse parts
of the electron normal (anomalous) Green’s function,
respectively. Although the transverse part of the SC
gap parameter ∆T ≈ 0, the antibonding peak around
the [π, 0] point is always at lower binding energy than
the bonding peak (hump) due to the BS. In this sense,
the PDH structure in the bilayer cuprate superconduc-
tors in the SC state is mainly caused by the BS. Sec-
ondly, the SC state in the kinetic energy driven SC
mechanism is the conventional BCS like as in the sin-
gle layer case18,19. This can be understood from the
electron normal and anomalous Green’s functions in Eq.
(9). Since the spins center around the [π, π] point in
the MF level18,19,22, then the main contributions for the
spins comes from the [π, π] point. In this case, the
longitudinal and transverse parts of the electron nor-
mal and anomalous Green’s functions in Eq. (9) can
be approximately reduced in terms of ωνp=[π,π] ∼ 0 and
one of the self-consistent equations22 1/2 = 〈S+iaS
ia〉 =
1/(4N)
(Bνk/ωνk)coth[(1/2)βωνk] as,
GL(k, ω) ≈
ν=1,2
ω − Eνk
V 2νk
ω + Eνk
(12a)
GT (k, ω) ≈
ν=1,2
(−1)ν+1Z
ω − Eνk
V 2νk
ω + Eνk
(12b)
(k, ω) =
ν=1,2
z (k)
ω − Eνk
ω + Eνk
(12c)
T (k, ω) =
ν=1,2
(−1)ν+1Z
z (k)
ω − Eνk
ω + Eνk
, (12d)
where the electron coherent weights Z
FA = Z
/2, the
electron quasiparticle coherence factors U2νk ≈ V
hνk−kA
and V 2νk ≈ U
hνk−kA
, the SC gap function ∆̄
z (k) ≈
(k − kA) and the electron quasiparticle spectrum
Eνk ≈ Ehνk−kA , with kA = [π, π]. As in the sin-
gle layer case18,19, this reflects that the hole-like dressed
holon quasiparticle coherence factors Vhνk and Uhνk and
hole-like dressed holon quasiparticle spectrum Ehνk have
been transferred into the electron quasiparticle coher-
ence factors Uνk and Vνk and electron quasiparticle spec-
trum Eνk, respectively, by the convolutions of the corre-
sponding longitudinal and transverse parts of the MF
spin Green’s function and full dressed holon normal
and anomalous Green’s functions due to the charge-spin
recombination27. As a result, these electron normal and
anomalous Green’s functions in Eq. (12) are typical bi-
layer BCS like6. This also reflects that as in the single
layer case18,19, the dressed holon pairs condense with the
d-wave symmetry in a wide range of the doping concen-
tration, then the electron Cooper pairs originating from
the dressed holon pairing state are due to the charge-
spin recombination, and their condensation automati-
cally gives the electron quasiparticle character. These
are why the basic bilayer BCS formalism6 is still valid in
discussions of SC coherence of the quasiparticle peak and
hump, although the pairing mechanism is driven by the
intralayer kinetic energy by exchanging spin excitations,
and other exotic magnetic scattering28,29 is beyond the
BCS formalism.
IV. SUMMARY AND DISCUSSIONS
We have studied the electronic structure of bilayer
cuprate superconductors in the SC state based on the
kinetic energy driven SC mechanism19. Our results show
that the electron spectrum of bilayer cuprate supercon-
ductors is split into the bonding and antibonding com-
ponents by the BS, then the observed PDH structure
around the [π, 0] point is mainly caused by this BS, with
the SC peak being related to the antibonding compo-
nent, and the hump being formed by the bonding com-
ponent. The spectral weight increases with increasing
the doping concentration. In analogy to the two flat
bands appeared in the normal state, the antibonding
and bonding quasiparticles around the [π, 0] point dis-
perse weakly with momentum, in qualitative agreement
with the experimental observation on the bilayer cuprate
superconductors2,7,8,9,10,11. Our these results also show
that the bilayer interaction has significant contributions
to the electronic structure of bilayer cuprate supercon-
ductors in the SC state.
It has been shown from the ARPES experiments2,14
that the BS has been detected in both normal and SC
states, and then the electron spectral functions display
the double-peak structure in the normal state and PDH
structure in the SC state. Recently, we22 have studied the
electron spectrum of bilayer cuprate superconductors in
the normal state, and shown that the double-peak struc-
ture in the electron spectrum in the normal state is dom-
inated by the BS. On the other hand, although the anti-
bonding and bonding SC peaks have different dispersions,
the antibonding and bonding parts have the same d-wave
SC gap amplitude as mentioned above. Incorporating our
previous discussions for the normal state case22 and the
present studies for the SC state case, we therefore find
that the one of the important roles of the interlayer co-
herent hopping (2) is to split the electron spectrum of the
bilayer system into the bonding and antibonding compo-
nents in both normal and SC states. As a consequence,
the well pronounced PDH structure of bilayer cuprate su-
perconductors in the SC state and double-peak structure
in the normal state are mainly caused by the BS.
Acknowledgments
The authors would like to thank Dr. H. Guo and Dr.
L. Cheng for the helpful discussions. This work was sup-
ported by the National Natural Science Foundation of
China under Grant No. 90403005, and the funds from
the Ministry of Science and Technology of China under
Grant Nos. 2006CB601002 and 2006CB921300.
APPENDIX A: DRESSED HOLON BCS TYPE
NORMAL AND ANOMALOUS GREEN’S
FUNCTIONS IN BILAYER CUPRATE
SUPERCONDUCTORS
In the single layer case, it has been shown19 that
the dressed holon-spin interactions from the kinetic en-
ergy terms of the t-t′-J model are quite strong, and
in the case without AFLRO, these interactions can in-
duce the dressed holon pairing state (then the electron
Cooper pairing state) by exchanging spin excitations in
the higher power of the doping concentration. Following
their discussions18,19, we obtain in terms of Eliashberg’s
strong coupling theory31 that the self-consistent equa-
tions that satisfied by the full dressed holon normal and
anomalous Green’s functions in the bilayer system in the
SC state as,
g(k, ω) = g(0)(k, ω) + g(0)(k, ω)
1 (k, ω)g(k, ω)
2 (−k,−ω)ℑ
†(k, ω)
, (A1a)
ℑ†(k, ω) = g(0)(−k,−ω)
1 (−k,−ω)ℑ
†(−k,−ω)
2 (−k,−ω)g(k, ω)
, (A1b)
respectively, where the MF dressed holon normal Green’s
function22 g(0)(k, ω) = g
(k, ω) + σxg
(k, ω), with
the longitudinal and transverse parts are evaluated as
L (k, ω) = (1/2)
ν=1,2(ω − ξνk)
−1 and g
T (k, ω) =
(1/2)
ν=1,2(−1)
ν+1(ω − ξνk)
−1, respectively, while
the dressed holon self-energy functions Σ
1 (k, ω) =
1L (k, ω) + σxΣ
1T (k, ω) and Σ
2 (k, ω) = Σ
2L (k, ω) +
2T (k, ω), with the corresponding longitudinal and
transverse parts have been given in Eq. (6).
In the previous discussions of the electronic struc-
ture for the single layer cuprate superconductors in the
SC state18, it has been shown the self-energy function
2 (k, ω) describes the effective dressed holon pair gap
function, while the self-energy function Σ
1 (k, ω) de-
scribes the quasiparticle coherence. Since Σ
2 (k, ω) is
an even function of ω, while Σ
1 (k, ω) is not, therefore
for the convenience, the self-energy function Σ
1 (k, ω)
can be broken up into its symmetric and antisymmetric
parts as, Σ
1 (k, ω) = Σ
1e (k, ω)+ωΣ
1o (k, ω), then both
1e (k, ω) and Σ
1o (k, ω) are even functions of ω. Now
we can define the dressed holon quasiparticle coherent
weights in the present bilayer system as Z−1
hF1(k, ω) =
1 − Σ
1L (k, ω) and Z
hF2(k, ω) = Σ
1T (k, ω). As in the
single layer case18, we only discuss the low-energy behav-
ior of the electronic structure of bilayer cuprate super-
conductors, which means that the effective dressed holon
pair gap functions and quasiparticle coherent weights
can be discussed in the static limit, i.e., ∆̄h(k) =
2 (k, ω) |ω=0= ∆̄hL(k) + σx∆̄hT (k), Z
hF1(k) = 1 −
1L (k, ω) |ω=0 and Z
hF2(k) = Σ
1T (k, ω) |ω=0. As in
the single layer case18, although ZhF1(k) and ZhF2(k)
still are a function of k, the wave vector dependence
may be unimportant. This followed from the ARPES
experiments2 that in the SC-state of bilayer cuprate su-
perconductors, the lowest energy states are located at
the [π, 0] point, which indicates that the majority con-
tribution for the electron spectrum comes from the [π, 0]
point. In this case, the wave vector k in ZhF1(k) and
ZhF2(k) can be chosen as Z
hF1 = 1−Σ
1L (k) |k=[π,0] and
hF2 = Σ
1T (k) |k=[π,0]. With the help of the above dis-
cussions, the corresponding longitudinal and transverse
parts of the dressed holon normal and anomalous Green’s
functions in Eqs. (A1a) and (A1b) now can be obtained
explicitly as,
gL(k, ω) =
ν=1,2
U2hνk
ω − Ehνk
V 2hνk
ω + Ehνk
, (A2a)
gT (k, ω) =
ν=1,2
(−1)ν+1Z
U2hνk
ω − Ehνk
V 2hνk
ω + Ehνk
, (A2b)
L(k, ω) = −
ν=1,2
2Ehνk
ω − Ehνk
ω + Ehνk
, (A2c)
T (k, ω) = −
ν=1,2
(−1)ν+1Z
2Ehνk
ω − Ehνk
ω + Ehνk
, (A2d)
with the dressed holon effective gap parameters and
quasiparticle coherent weights satisfy the following four
equations,
∆̄hL = −
k,q,p
ν,ν′,ν′′
k−p+qCνν′′ (k+ q)
(ν′′)
Bν′pBνq
ων′pωνq
(ν′′)
νν′ν′′(q,p) + F
νν′ν′′(k,q,p)
[ων′p − ωνq]2 − E
hν′′k
νν′ν′′(q,p) + F
νν′ν′′(k,q,p)
[ων′p + ωνq]2 − E
hν′′k
, (A3a)
∆̄hT = −
k,q,p
ν,ν′,ν′′
(−1)ν+ν
′+ν′′+1Cνν′′(k+ q)
(ν′′)
Bν′pBνq
ων′pωνq
(ν′′)
νν′ν′′(q,p) + F
νν′ν′′(k,q,p)
[ων′p − ωνq]2 − E
hν′′k
νν′ν′′(q,p) + F
νν′ν′′(k,q,p)
[ων′p + ωνq]2 − E
hν′′k
, (A3b)
= 1 +
ν,ν′,ν′′
[1 + (−1)ν+ν
′+ν′′+1]Cνν′′ (p+ k0)
(ν′′)
Bν′pBνq
ων′pωνq
νν′ν′′(q,p)
[ων′p − ωνq + Ehν′′p−q+k0 ]
νν′ν′′(q,p)
[ων′p − ωνq − Ehν′′p−q+k0 ]
νν′ν′′(q,p)
[ων′p + ωνq + Ehν′′p−q+k0]
νν′ν′′ (q,p)
[ων′p + ωνq − Ehν′′p−q+k0 ]
, (A3c)
= 1 +
ν,ν′,ν′′
[1− (−1)ν+ν
′+ν′′+1]Cνν′′ (p+ k0)
(ν′′)
Bν′pBνq
ων′pωνq
νν′ν′′(q,p)
[ων′p − ωνq + Ehν′′p−q+k0 ]
νν′ν′′(q,p)
[ων′p − ωνq − Ehν′′p−q+k0 ]
νν′ν′′(q,p)
[ων′p + ωνq + Ehν′′p−q+k0]
νν′ν′′ (q,p)
[ων′p + ωνq − Ehν′′p−q+k0 ]
, (A3d)
where Cνν′′ (k) = [Z(tγk − t
′γ′k) + (−1)
ν+ν′′t⊥(k)]
νν′ν′′
(q,p) = nB(ωνq)+nB(ων′p)+2nB(ωνq)nB(ων′p),
νν′ν′′(k,q,p) = [2nF (Ehν′′k) − 1][ων′p −
ωνq][nB(ωνq) − nB(ων′p)]/Ehν′′k, F
νν′ν′′(q,p) =
1 + nB(ωνq) + nB(ων′p) + 2nB(ωνq)nB(ων′p),
νν′ν′′
(k,q,p) = [2nF (Ehν′′k) − 1][ων′p + ωνq][1 +
nB(ωνq) + nB(ων′p)]/Ehν′′k, H
νν′ν′′(q,p) =
nF (Ehν′′p−q+k0)[nB(ων′p) − nB(ωνq)] + nB(ωνq)[1 +
nB(ων′p)], H
νν′ν′′(q,p) = nF (Ehν′′p−q+k0)[nB(ωνq) −
nB(ων′p)] + nB(ων′p)[1 + nB(ωνq)], H
νν′ν′′(q,p) =
[1 − nF (Ehν′′p−q+k0)][1 + nB(ωνq) + nB(ων′p)] +
nB(ωνq)nB(ων′p), H
νν′ν′′
(q,p) = nF (Ehν′′p−q+k0)[1 +
nB(ωνq)+nB(ων′p)]+nB(ωνq)nB(ων′p), and k0 = [π, 0].
These four equations must be solved self-consistently
in combination with other equations as in the single
layer case18,19, then all order parameters, decoupling
parameter α, and chemical potential µ are determined
by the self-consistent calculation.
1 See, e.g., M.A.Kastner, R.J. Birgeneau, G. Shirane, and
Y. Endoh, Rev. Mod. Phys. 70, 897 (1998), and references
therein.
2 See, e.g., A. Damascelli, Z. Hussain, and Z.-X. Shen, Rev.
Mod. Phys. 75, 473 (2003), and references therein.
3 See, e.g., J. Campuzano, M. Norman, and M. Randeira,
in Physics of Superconductors, vol. II, edited by K. Benne-
mann and J. Ketterson (Springer, Berlin Heidelberg New
York, 2004), p. 167, and references therein.
4 J. Campuzano, H. Ding, M. R. Norman, M. Randeira,
A. F. Bellman, T. Yokoya, T. Takahashi, H. Katayama-
Yoshida, T. Mochiku, and K. Kadowaki, Phys. Rev. B 53,
R14737 (2003).
5 H. Matsui, T. Sato, T. Takahashi, S.-C. Wang, H.-B. Yang,
H. Ding, T. Fujii, T. Watanabe, and A. Matsuda, Phys.
Rev. Lett. 90, 217002 (2003).
6 J.R. Schrieffer, Theory of Superconductivity, Benjamin,
New York, 1964.
7 D.S. Dessau, B.O. Wells, Z.-X. Shen, W.E. Spicer, A.J.
Arko, R.S. List, D.B. Mitzi, and A. Kapitulnik, Phys. Rev.
Lett. 66, 2160 (1991); Y. Hwu, L. Lozzi, M. Marsi, S. La
Rosa, M. Winokur, P. Davis, M. Onellion, H. Berger, F.
Gozzo, F. Lévy, and G. Margaritondo, Phys. Rev. Lett.
67, 2573 (1991).
8 Mohit Randeria, Hong Ding, J-C. Campuzano, A. Bell-
man, G. Jennings, T. Yokoya, T. Takahashi, H. Katayama-
Yoshida, T. Mochiku, and K. Kadowaki, Phys. Rev. Lett.
74, 4951 (1995); H. Ding, T. Yokoya, J-C. Campuzano, T.
Takahashi, M. Randeria, M. R. Norman, T. Mochiku, K.
Kadowaki, and J. Giapintzakis, Nature 382, 51 (1996).
9 A.V. Fedorov, T. Valla, P.D. Johnson, Q. Li, G.D. Gu, and
N. Koshizuka, Phys. Rev. Lett. 82, 2179 (1999).
10 D.H. Lu, D.L. Feng, N.P. Armitage, K.M. Shen, A. Dam-
ascelli, C. Kim, F. Ronning, Z.-X. Shen, D.A. Bonn, R.
Liang, W.N. Hardy, A.I. Rykov, and S. Tajima, Phys. Rev.
Lett. 86, 4370 (2001).
11 T. Sato, H. Matsui, S. Nishina, T. Takahashi, T. Fujii, T.
Watanabe, and A. Matsuda, Phys. Rev. Lett. 89, 67005
(2002); D.L. Feng, A. Damascelli, K.M. Shen, N. Mo-
toyama, D.H. Lu, H. Eisaki, K. Shimizu, J.-i. Shimoyama,
K. Kishio, N. Kaneko, M. Greven, G.D. Gu, X.J. Zhou,
C. Kim, F. Ronning, N.P. Armitage, and Z.-X Shen, Phys.
Rev. Lett. 88, 107001 (2002).
12 J.C. Campuzano, H. Ding, M.R. Norman, H.M. Fretwell,
M. Randeria, A. Kaminski, J. Mesot, T. Takeuchi, T. Sato,
T. Yokoya, T. Takahashi, T. Mochiku, K. Kadowaki, P.
Guptasarma, D.G. Hinks, Z. Konstantinovic, Z.Z. Li, and
H. Raffy, Phys. Rev. Lett. 83, 3709 (1999); M.R. Nor-
man, H. Ding, J.C. Campuzano, T. Takeuchi, M. Randeria,
T. Yokoya, T. Takahashi, T. Mochiku, and K. Kadowaki,
Phys. Rev. Lett. 79, 3506 (1997).
13 A.A. Kordyuk, S.V. Borisenko, T.K. Kim, K.A. Nenkov,
M. Knupfer, J. Fink, M.S. Golden, H. Berger, and R. Fol-
lath, Phys. Rev. Lett. 89, 077003 (2002); A.D. Gromko,
Y.-D. Chuang, A.V. Fedorov, Y. Aiura, Y. Yamaguchi, K.
Oka, Yoichi Ando, D.S. Dessau, cond-mat/0205385.
14 D.L. Feng, N.P. Armitage, D.H. Lu, A. Damascelli, J.P.
Hu, P. Bogdanov, A. Lanzara, F. Ronning, K.M. Shen,
H. Eisaki, C. Kim, Z.-X. Shen, J.-i. Shimoyama, and K.
Kishio, Phys. Rev. Lett. 86, 5550 (2001).
15 Y.-D. Chuang, A.D. Gromko, A. Fedorov, Y. Aiura, K.
Oka, Yoichi Ando, H. Eisaki, S.I. Uchida, and D.S. Dessau,
Phys. Rev. Lett. 87, 117002 (2001); P.V. Bogdanov, A.
Lanzara, X.J. Zhou, S.A. Kellar, D.L. Feng, E.D. Lu, H.
Eisaki, J.-I. Shimoyama, K. Kishio, Z. Hussain, and Z.X.
Shen, Phys. Rev. B 64, 180505(R) (2001).
16 S.V. Borisenko, A.A. Kordyuk, T.K. Kim, S. Legner, K.A.
Nenkov, M. Knupfer, M.S. Golden, J. Fink, H. Berger, and
R. Follath, Phys. Rev. B 66, 140509(R) (2002).
17 D.L. Feng, C. Kim, H. Eisaki, D.H. Lu, A. Damascelli,
K.M. Shen, F. Ronning, N.P. Armitage, N. Kaneko1, M.
Greven, J.-i. Shimoyama, K. Kishio, R. Yoshizaki, G.D.
Gu, and Z.-X. Shen, Phys. Rev. B 65, 220501(R) (2002);
A.A. Kordyuk, S.V. Borisenko, M.S. Golden, S. Legner,
K.A. Nenkov, M. Knupfer, J. Fink, H. Berger, L. Forró,
and R. Follath, Phys. Rev. B 66, 014502 (2002); Y.-D.
Chuang, A.D. Gromko, A.V. Fedorov, Y. Aiura, K. Oka,
Yoichi Ando, D.S. Dessau, cond-mat/0107002.
18 Huaiming Guo and Shiping Feng, Phys. Lett. A 361, 382
(2007); Shiping Feng and Tianxing Ma, Phys. Lett. A 350,
138 (2006).
19 Shiping Feng, Phys. Rev. B68, 184501 (2003); Shiping
Feng, Tianxing Ma, and Huaiming Guo, Physica C 436,
14 (2006).
20 A.A. Kordyuk, S.V. Borisenko, M. Knupfer, and J. Fink,
Phys. Rev. B 67, 064504 (2003); A.A. Kordyuk and S.V.
Borisenko, Low Temp. Phys. 32, 298 (2006).
21 M. Mori, T. Tohyama, and S. Maekawa, Phys. Rev. B 66,
064502 (2002).
22 Yu Lan, Jihong Qin, and Shiping Feng, Phys. Rev. B 75,
134513 (2007).
23 C. Kim, P.J. White, Z.-X. Shen, T. Tohyama, Y. Shibata,
S. Maekawa, B.O. Wells, Y.J. Kim, R.J. Birgeneau, and
M.A. Kastner, Phys. Rev. Lett. 80, 4245 (1998).
24 O.K. Anderson, A.I. Liechtenstein, O. Jepsen, and F.
Paulsen, J. Phys. Chem. Solids 56, 1573 (1995); A.I.
Liechtenstein, O. Gunnarsson, O.K. Anderson, and R.M.
Martin, Phys. Rev. B 54, 12505 (1996); S. Chakarvarty, A.
Sudbo, P.W. Anderson, and S. Strong, Science 261, 337
(1993).
25 Shiping Feng, Jihong Qin, and Tianxing Ma, J. Phys. Con-
dens. Matter 16, 343 (2004); Shiping Feng, Tianxing Ma,
and Jihong Qin, Mod. Phys. Lett. B 17, 361 (2003).
26 S. Sorella, G.B. Martins, F. Becca, C. Gazza, L. Capriotti,
A. Parola, and E. Dagotto, Phys. Rev. Lett. 88, 117002
(2002).
27 P.W. Anderson, Phys. Rev. Lett. 67, 2092 (1991); Science
288, 480 (2000).
28 P. Dai, H.A. Mook, R.D. Hunt, and F. Dog̃an, Phys. Rev.
B 63, 54525 (2001); Ph. Bourges, B. Keimer, S. Pailhés,
L.P. Regnault, Y. Sidis, and C. Ulrich, Physica C 424, 45
(2005).
29 M. Arai, T. Nishijima, Y. Endoh, T. Egami, S. Tajima,
K. Tomimoto, Y. Shiohara, M. Takahashi, A. Garret, and
S.M. Bennington, Phys. Rev. Lett. 83, 608 (1999); S.M.
Hayden, H.A. Mook, P. Dai, T.G. Perring, and F. Dog̃an,
Nature 429, 531 (2004); C. Stock, W.J. Buyers, R.A. Cow-
ley, P.S. Clegg, R. Coldea, C.D. Frost, R. Liang, D. Peets,
D. Bonn, W.N. Hardy, and R.J. Birgeneau, Phys. Rev. B
71, 24522 (2005).
30 K. Terashima, H. Matsui, D. Hashimoto, T. Sato, T. Taka-
hashi, H. Ding, T. Yamamoto, and K. Kadowaki, Nature
Phys. 2, 27 (2006).
31 G.M. Eliashberg, Sov. Phys. JETP 11, 696 (1960); D.J.
Scalapino, J.R. Schrieffer, and J.W. Wilkins, Phys. Rev.
148, 263 (1966).
http://arxiv.org/abs/cond-mat/0205385
http://arxiv.org/abs/cond-mat/0107002
ABSTRACT
  Within the framework of the kinetic energy driven superconductivity, the
electronic structure of bilayer cuprate superconductors in the superconducting
state is studied. It is shown that the electron spectrum of bilayer cuprate
superconductors is split into the bonding and antibonding components by the
bilayer splitting, then the observed peak-dip-hump structure around the
$[\pi,0]$ point is mainly caused by this bilayer splitting, with the
superconducting peak being related to the antibonding component, and the hump
being formed by the bonding component. The spectral weight increases with
increasing the doping concentration. In analogy to the normal state case, both
electron antibonding peak and bonding hump have the weak dispersions around the
$[\pi,0]$ point.

<|endoftext|><|startoftext|>
Submitted to ApJ Letters (Revised Version including Referee’s Comments)
Preprint typeset using LATEX style emulateapj v. 08/22/09
9.7 µ M SILICATE ABSORPTION IN A DAMPED LYMAN-α ABSORBER AT Z = 0.52
Varsha P. Kulkarni1, Donald G. York2,3, Giovanni Vladilo4, Daniel E. Welty2
Submitted to ApJ Letters (Revised Version including Referee’s Comments)
ABSTRACT
We report a detection of the 9.7 µm silicate absorption feature in a damped Lyman-α (DLA) system
at zabs = 0.524 toward AO0235+164, using the Infrared Spectrograph (IRS) onboard the Spitzer Space
Telescope. The feature shows a broad shallow profile over ≈ 8-12 µm in the absorber rest frame and
appears to be > 15 σ significant in equivalent width. The feature is fit reasonably well by the silicate
absorption profiles for laboratory amorphous olivine or diffuse Galactic interstellar clouds. To our
knowledge, this is the first indication of 9.7 µm silicate absorption in a DLA. We discuss potential
implications of this finding for the nature of the dust in quasar absorbers. Although the feature is
relatively shallow (τ9.7 ≈ 0.08− 0.09), it is ≈ 2 times deeper than expected from extrapolation of the
τ9.7 vs. E(B − V ) relation known for diffuse Galactic interstellar clouds. Further studies of the 9.7
µm silicate feature in quasar absorbers will open a new window on the dust in distant galaxies.
Subject headings: Quasars: absorption lines–ISM:dust
1. INTRODUCTION
Damped Lyman-alpha (DLA) absorption systems in
quasar spectra dominate the neutral gas content in galax-
ies and offer venues for studying the evolution of metals
and dust in galaxies. Recent observations, however, sug-
gest that the majority of DLAs have low metallcities at
all redshifts studied (0 . z . 4), with the mean metal-
licity reaching at most ≈ 10 − 20% solar at the lowest
redshifts (see, e.g., Prochaska et al. 2003; Kulkarni et al.
2005, 2007; Péroux et al. 2006; and references therein).
These results appear to contradict the predictions of a
near-solar global mean interstellar metallicity of galaxies
at z ∼ 0 in most chemical evolution models based on the
cosmic star formation history inferred from galaxy imag-
ing surveys such as the Hubble Deep Field (HDF) (e.g.,
Madau et al. 1996). Furthermore, for a large fraction
of the DLAs, the SFRs inferred from emission-line imag-
ing searches fall far below the global predictions (e.g.,
Kulkarni et al. 2006, and references therein).
A possible explanation of these puzzles is that the cur-
rent DLA samples are biased due to dust selection ef-
fects, i.e. that the more dusty and more metal-rich ab-
sorbers obscure the background quasars more, making
them harder to observe (e.g., Fall & Pei 1993; Boissé et
al. 1998; Vladilo & Péroux 2005). DLAs are known to
have some dust, based on both the (generally mild) deple-
tions of refractory elements and the (typically slight) red-
dening of the background quasars (e.g., Pei et al. 1991;
Pettini et al. 1997; Kulkarni et al. 1997). Combining
∼ 800 quasar spectra from the Sloan Digital Sky Survey
(SDSS), York et al. (2006b) found a small but significant
amount of dust in absorbers at 1 < z < 2, with E(B−V )
of 0.02-0.09 for 9 of their 27 sub-samples (see also Khare
et al. 2007). York et al. (2006b) also showed that the
extinction in the composite spectra is best fitted by a
1 Department of Physics and Astronomy, University of South
Carolina, Columbia, SC 29208; E-mail: kulkarni@sc.edu
2 Department of Astronomy and Astrophysics, University of
Chicago, Chicago, IL 60637
3 Also, Enrico Fermi Institute
4 INAF, Osservatorio Astronomico di Trieste, Trieste, Italy
Small Magellanic Cloud (SMC) curve (with no 2175 Å
bump). Some recent studies suggest that dusty DLAs
could hide as much as 17% of the total metal content at
z ∼ 2, and more at lower z (Bouché et al. 2005). To un-
derstand whether this is the case, and to understand the
role of dust in quasar absorbers in general, it is essential
to directly probe the basic properties of the dust.
Recently, a small number of very dusty quasar ab-
sorbers have been discovered, via various signatures of
the dust in optical and UV observations: substantial red-
dening of the background quasars, large element deple-
tions (e.g., for Cr, Fe), and/or a detectable 2175 Å bump
(e.g., Junkkarinen et al. 2004; Wang et al. 2004). It is
not yet clear, however, whether the dust in these systems
is similar to that in the Milky Way or SMC or LMC.
The 2175 Å bump is generally, though not conclusively,
attributed to carbonaceous grains. The silicate compo-
nent of the dust, believed to comprise ≈ 70% of the core
mass of interstellar dust grains in the Milky Way (see,
e.g., Draine 2003) has not yet been probed in quasar ab-
sorbers. A unique opportunity to study this important
dust component is provided by the Spitzer IRS (Werner
et al. 2004; Houck et al. 2004), which provides the spec-
tral coverage, sensitivity, and resolution needed for the
detection of the strongest of the silicate spectral features
near 9.7 µm. The 9.7 µm feature, thought to arise in Si-O
stretching vibrations, is seen in a wide range of Galac-
tic and extragalactic environments (e.g., Whittet 1987
and references therein; Spoon et al. 2006; Imanishi et al.
2007). We have been carrying out an exploratory study
of the silicate dust in quasar absorbers by searching for
the 9.7 µm absorption feature with the Spitzer IRS. Here
we report on the detection of the 9.7 µm feature in one of
the systems studied, while the remaining three systems
observed recently will be reported in a separate paper
(Kulkarni et al. 2007b, in preparation).
2. OBSERVATIONS AND DATA ANALYSIS
The DLA at zabs = 0.524 (Junkkarinen et al. 2004)
toward the blazar AO 0235+164 (zem = 0.94) offers an
excellent venue for comparing dust in a distant galaxy
http://arxiv.org/abs/0704.0826v2
2 Kulkarni et al.
with that in near-by galaxies. It has one of the largest
H I column densities seen in DLAs (log NHI = 21.70)
and shows 21-cm absorption (Roberts et al. 1976). It
also shows X-ray absorption, consistent with a metallic-
ity of 0.7 solar (Junkkarinen et al. 2004). Candidate
absorber galaxies (much fainter than the blazar) within
a few arcseconds from the blazar sightline have been de-
tected (e.g., Smith et al. 1977; Yanny et al. 1989; Chun
et al. 2006). This absorber is one of a very few DLAs
producing appreciable reddening [E(B−V ) = 0.23 in the
absorber rest frame] and detection of a strong broad 2175
Å extinction bump (Junkkarinen et al. 2004). Finally,
this absorber is the only DLA with detections of several
diffuse interstellar bands (Junkkarinen et al. 2004; York
et al. 2006a). All of these data suggest that this absorber
is very dusty and may contain molecular gas.
The observations were obtained with the Spitzer IRS
on January 30, 2006 (UT) as GO program 20757 (PI
V. P. Kulkarni). IRS modules Short-Low 1 (SL1) and
Long-Low 2 (LL2) were used to cover 7.5-21.4 µm in the
observed frame (4.9-14.1 µm in the DLA rest frame). The
target was acquired with high-accuracy peakup using a
near-by bright star. The IRS standard staring mode was
used, with 2-pixel slit widths of 3.6” for SL1 and 10.5” for
LL2. Integration times were 60 s ×8 cycles for SL1 and
120 s ×11 cycles for LL2. For each cycle, observations
were performed at both nod positions A and B (offset by
1/3 the slit length), so the total integration times were
960 s and 2640 s, respectively, for SL1 and LL2.
The data were processed using the IRS S15.0 calibra-
tion pipeline (the latest version available at present), Im-
age Reduction and Analysis Facility (IRAF5), and In-
teractive Data Language (IDL). As detailed below, the
S15.0 pipeline yielded significant improvements for the
reliable detection and measurement of weak, broad fea-
tures in our spectra. The pipeline performs a number of
standard processing steps to produce the basic calibrated
data (BCD) files (see, e.g., the IRS Data Handbook
at http://ssc.spitzer.caltech.edu/irs/dh). Subtraction of
the sky (mostly zodiacal light) was performed by sub-
tracting the coadded frames at nod position B from those
at nod position A, and vice versa. The 1-dimentional
spectra were extracted from the 2-dimensional images us-
ing the Spitzer IRS Custom Extraction (SPICE) software
using the default extraction windows, and flux calibrated
using the standard S15.0 flux calibration files. The spec-
tra from the two nod positions were averaged together,
and the corresponding flux uncertainties calculated us-
ing both measurement uncertainties and “sampling un-
certainties” between the two nod positions.
The absolute flux levels in the different IRS mod-
ules were scaled to match the continuum levels in the
overlapping regions, using the bonus segment available
in the LL2 images. There was no mismatch between
the SL1 and LL2 flux levels; we used the SL1 data for
λ < 14.23µm and LL2 data for λ > 14.23µm. The data
at λ > 20µm bonus segment level had to be scaled up by
5.5% to match with the LL2 data at λ < 20µm. Fig. 1(a)
shows the final merged spectrum of AO0235+164. The
5 IRAF is distributed by the National Optical Astronomy Ob-
servatories, which are operated by the Association of Universities
for Research in Astronomy, Inc. (AURA), under cooperative agree-
ment with the National Science Foundation
S/N achieved per unbinned pixel in the final spectrum,
determined from rms fluctuations in the continuum re-
gions, is ≈ 100. The error bars denote 1 σ uncertainties.
The dashed line in Fig. 1(a) shows an estimate of the
power-law continuum of the quasar. This line joins the
observed continuum fluxes at 5.6 and 7.1 µm in the ab-
sorber rest frame and is extrapolated to the remaining
wavelength region. These wavelengths are chosen to be
in regions free of any other potential emission or absorp-
tion features (e.g., Imanishi et al. 2007). In principle,
significant 9.7 µm emission at the quasar redshift could
affect continuum determination redward of the suspected
silicate absorption feature from the DLA. However, (a)
our spectrum does not extend that far to the red, (b)
the power law provides a good fit to the continuum in
our data, and (c) the 9.7 µm emission is not particularly
strong in most quasars (e.g., Hao et al. 2007).
3. RESULTS
The spectrum shown in Fig. 1(a) exhibits a broad
absorption feature between about 12.4 and 18.3 µm rel-
ative to the power law continuum. The flux decline from
the continuum begins near the long wavelength end of
SL1 and continues smoothly into the LL2 data. The
broad feature is centered at 15.41 µm (10.11 µm in the
DLA rest frame). The observed frame equivalent width
is 0.31µm, with a 1 σ uncertainty of 0.014-0.020 µm, in-
cluding contributions from photon noise and continuum
fitting uncertainties (Sembach & Savage 1992).
We have performed several checks of our data analy-
sis to see whether the observed broad feature could be
an artifact. Since the possible silicate feature is broad
and shallow, extending from the long wavelength end of
SL1 through most of LL2, flux calibration and contin-
uum fitting are critical issues. In the S14 pipeline ver-
sion of these data, the possible silicate feature was some-
what stronger than in the S15 version. These differences
are due to a low-level non-linearity problem in the S14
pipeline, which produces a 4% tilt in LL2 spectra and a
5% mismatch at the SL1/LL2 boundary. This problem
has been eliminated in the S15 pipeline, and we find no
mismatch at the SL1/LL2 boundary in the S15 data.
The possible silicate feature does not show any visi-
ble signature of the “teardrop” feature known to exist
near 14.1 µm in some SL1 data (see, e.g., the IRS data
handbook). The beginning of decline in flux at the long-
wavelength end of SL1 matches smoothly with the flux at
the short-wavelength end of LL2 (which does not suffer
from the teardrop problem). Our results do not change
much even if the SL1 data are truncated at 14 µm to
avoid the region potentially affected by the teardrop (the
region 14-14.23 µm is a small fraction of the whole fea-
ture stretching out to 18.3 µm in the observed frame).
Inaccuracies in pointing (which can affect SL1 and LL2
fluxes at the ±1% level) also do not appear to be sig-
nificant for our data. Based on an examination of the
spectral images and the pointing difference keywords in
the data file headers, the telescope pointing was accu-
rate to within 0.09-0.11” for LL2 and within 0.22-0.29”
for SL1. Integrating a Gaussian intensity distribution
from a point source with the Spitzer point spread func-
tion over the known slit dimensions (57′′ × 3.6′′ for SL1,
168′′×10.5′′ for LL2), we estimate that the effect of such
an offset would be about 0.26% for SL1 and 0.05% for
http://ssc.spitzer.caltech.edu/irs/dh
Silicate Feature in A DLA 3
LL2, far too small to account for the observed feature.
We also compared our results with IRS spectra from
the literature for quasars without strong absorption sys-
tems (e.g., Sturm et al. 2006; Hao et al. 2007), and did
not find the broad absorption feature from our data in
those other quasars. In fact, quasar spectra in general
show no silicate absorption, but rather (generally rela-
tively weak) silicate emission at the quasar emission red-
shift. We also compared our IRS data for AO0235+164
with those for other targets in our study. The feature
seen in AO0235+164 is not seen at the same observed
wavelength in the other objects, suggesting that it is
not an instrumental artifact. [In fact, in Kulkarni et al.
2007b (in prep.), we will report the possible detection of
redshifted broad 9.7 µm silicate absorption in other parts
of the Spitzer spectral coverage toward other quasars.]
Given the results of the above tests and the fact that
the DLA toward AO0235+164 is already known to be
dusty (from detection of 2175 Å bump and diffuse inter-
stellar bands and reddening of the background quasar),
it seems very likely that the feature detected is the broad
9.7 µm silicate feature arising in the absorber galaxy.
4. DISCUSSION
The suggested silicate feature in the DLA toward
AO0235+164 is relatively shallow/weak compared to the
silicate features typically observed in Galactic interstel-
lar material (ISM) because of the modest reddening and
lower amounts of dust in quasar absorbers than in the
Milky Way. Indeed, the dust-to-gas ratio in the DLA
toward AO0235+164 is estimated to be 0.19 times the
Galactic value (Junkkarinen et al. 2004). On the other
hand, the observed feature is stronger than expected
from E(B−V ) = 0.23±0.01 for this absorber (Junkkari-
nen et al. 2004). In Galactic diffuse interstellar clouds,
the peak optical depth in the 9.7 µm silicate feature (τ9.7)
is observed to correlate with the reddening along the line
of sight, with τ9.7 = AV /18.5 (e.g., Whittet 1987). Ex-
trapolating this relation, and assuming RV = 3.1, one
would expect τ9.7 ≈ 0.039 for the DLA in AO0235+164.
Our observations, however, indicate τ9.7 ≈ 0.08 for this
DLA, ∼ 2 times higher than expected from the relation
for Galactic diffuse ISM. The dust in this absorber may
thus be somewhat richer in silicates than typical Galac-
tic dust. We note, however, that the silicate feature is
also known to be stronger in the Galactic Center region,
perhaps due to fewer carbon stars (and thus less carbona-
ceous dust) there (e.g., Roche & Aitken 1985). If future
observations of other DLAs also reveal material richer in
silicates, it might indicate that those DLAs probe denser
regions near the centers of the respective galaxies.
The Galactic interstellar 9.7 µm feature is generally
broad and relatively featureless, which is taken as an in-
dication that interstellar silicates are largely amorphous.
(Crystalline silicates would produce structure within the
broad feature.) In principle, silicate grains may be com-
posed of a mixture of pyroxene-like [(MgxFe1−x)SiO3]
and olivine-like [(MgxFe1−x)2SiO4] silicates, with the
shape and central wavelength of the 9.7 µm absorp-
tion somewhat dependent on the exact composition (e.g.,
Kemper et al. 2004; Chiar & Tielens 2006). Fig. 1(b)
shows a closer view of the data, normalized by the power
law continuum shown in Fig. 1(a), and binned by a fac-
tor of 3. The dotted and short-dashed curves are fits
based on silicate emissivities derived from observations
of the M supergiant µ Cep and of the Orion Trapezium
region (e.g., Roche & Aitken 1984; Hanner et al. 1995),
which are taken to be representative of diffuse Galactic
ISM and denser molecular material, respectively . The
long-dashed and dot-dashed curves are fits based on the
silicate absorption profile observed toward the Galactic
Center Source GCS3, and on laboratory measurements
for amorphous olivine (Spoon et al. 2006). The shape
of the silicate profile observed toward AO 0235+164 is
most similar to that of laboratory amorphous olivine,
but the µ-Cep and GCS3 templates also yield reasonable
fits. The DLA silicate profile does not exhibit the red-
ward extension seen for the Trapezium profile, suggest-
ing that the DLA dust resembles dust in diffuse Galactic
clouds more than that in molecular clouds. Using χ2
minimization for 8.0-13.3 µm in the DLA rest frame, the
peak optical depth values τ9.7 for the laboratory olivine,
GCS3, µ Cep, and Trapezium templates are 0.081±0.018,
0.088 ± 0.020, 0.083 ± 0.018, and 0.071 ± 0.016, respec-
tively for the binned data (0.081± 0.020, 0.091± 0.023,
0.084± 0.021, and 0.069± 0.017, respectively, for the un-
binned data). The error bars on τ9.7 correspond to op-
tical depths that give reduced χ2 larger by 1.0 than the
minimum values. The respective reduced χ2 values are
1.22, 1.32, 1.51, and 1.92 for the binned data (1.82, 2.08,
2.10, and 2.65 for the unbinned data). It is interesting
to note that the best-fit astronomical template is GCS3,
consistent with the enhanced τ9.7/E(B−V ) ratio seen in
the DLA as toward the Galactic center. While the min-
imum reduced χ2 values are greater than 1.0, they are
similar to those found in other studies of the silicate ab-
sorption toward both Galactic and extragalactic sources
(e.g., Hanner et al. 1995; Bowey et al. 1998; Roche et
al. 2006, 2007). Indeed, we do not expect a perfect fit,
since possible differences in dust grain size and chemical
composition can alter the shape of the silicate feature, in-
cluding the peak wavelength and the FWHM (Bowey et
al. 1998 and references therein). Higher S/N and higher
resolution data would be needed to shed further light on
the specific types of silicates present in DLAs.
With a larger absorber sample, it would be possible to
explore correlations between the strengths of the 9.7 µm
silicate feature and the 2175 Å extinction bump (which
is thought to be produced by a carbonaceous component
of the dust). For example, it would be interesting to
understand whether the relative amounts of silicate and
carbonaceous dust vary with redshift or with the gas-
phase abundances of C or Si. High-S/N observations of
other possible features (e.g., the 18.5 µm silicate feature
or the 3.0 µm H2O ice feature) would provide additional
constraints on dust composition. (While those features
are generally weaker than the 9.7 µm feature in the Milky
Way, the 3.0 µm feature can be stronger than the 9.7 µm
feature in highly reddened molecular sightlines.)
Our exploratory study has demonstrated the potential
of the Spitzer IRS to study dust in quasar absorbers.
It would be very interesting to obtain similar spectra
for other dusty quasar absorbers. The E(B − V ) val-
ues for dusty absorbers such as that reported here (0.23)
are much larger than those for typical Mg II absorbers
[E(B − V ) of 0.002; York et al. 2006b]. These relatively
large reddening values are comparable to some of those
4 Kulkarni et al.
for Ly-break galaxies (LBGs), which show E(B − V ) up
to 0.4 and a median E(B − V ) of ≈ 0.15 at z ∼ 2
and z ∼ 3 (Shapley et al. 2001, 2005; Papovich et al.
2001). Such dusty absorbers appear to be chemically
more evolved (Wild et al. 2006) than typical DLAs, and
may possibly provide a link in terms of SFRs, masses,
metallicities, and dust content between the primarily
metal-poor and dust-poor general DLA population with
low SFRs and the actively star-forming, metal-rich, and
dust-rich LBGs. Further Spitzer IRS observations of
more dusty quasar absorbers thus will help to open a
new window on this interesting class of distant galaxies.
This work is based on observations made with the
Spitzer Space Telescope, which is operated by the Jet
Propulsion Laboratory, California Institute of Technol-
ogy under a contract with NASA. Support for this work
was provided by NASA through an award issued by
JPL/Caltech. VPK acknowledges support from NSF
grant AST-0607739 to University of South Carolina.
DEW acknowledges support from NASA LTSA grant
NAG5-11413 to the University of Chicago. We are grate-
ful to the Spitzer Science Center staff for helpful advice
on data analysis and to an anonymous referee for helpful
comments.
Facilities: SST (IRS).
REFERENCES
Boissé, P., Le Brun, V., Bergeron, J., & Deharveng, J.-M. 1998,
A&A, 333, 841
Bouché, N., Lehnert, M. D., & Péroux, C. 2005, MNRAS, 364,
Bowey, J. E., Adamson, A. J., & Whittet, D. C. B. 1998,
MNRAS, 298, 131
Chiar, J. E., & Tielens, A. G. G. M. 2006, ApJ, 637, 774
Chun, M. R. et al. 2006, AJ, 131, 686
Draine, B. T. 2003, ARAA, 41, 241
Fall, S. M., & Pei, Y. C. 1993, ApJ, 402, 479
Hao, L., Weedman, D. W., Spoon, H. W. W., Marshall, J. A.,
Levenson, N., Elitzur, M., & Houck, J. R. 2007, ApJ, 655, L77
Hanner, M. S., Brooke, T. Y., & Tokunaga, A. T. 1995, ApJ, 438,
Houck, J. R. et al. 2004, ApJS, 154, 18
Imanishi, M., Dudley, C. C., Maiolino, R., Maloney, P. R.,
Nakagawa, T., & Risaliti, G. 2007, ApJ, in press
Junkkarinen, V. T., Cohen, R. D., Beaver, E. A., Burbidge, E.
M., Lyons, R. W., & Madejski, G. 2004, ApJ, 614, 658
Kemper, F., Vriend, W. J., & Tielens, A. G. G. M. 2004, ApJ,
609, 826 (erratum 633, 534 [2005])
Khare, P., Kulkarni, V. P., Péroux, C., York, D. G., Lauroesch, J.
T., & Meiring, J. D. 2007, A&A, 464, 487
Kulkarni, V. P., Fall, S. M. & Truran, J. W. 1997, ApJ, 484, L7
Kulkarni, V. P., Fall, S. M., Lauroesch, J. T., York, D. G., Welty,
D. E., Khare, P., & Truran, J. W. 2005, ApJ, 618, 68
Kulkarni, V. P., Woodgate, B. E., York, D. G., Thatte, D. G.,
Meiring, J., Palunas, P., & Wassell, E. 2006, ApJ, 636, 30
Kulkarni, V. P., Khare, P., Péroux, C., York, D. G., Lauroesch, J.
T., & Meiring, J. D. 2007, ApJ, in press (astro-ph/0608126)
Madau, P., Ferguson, H. C., Dickinson, M. E., Giavalisco, M.,
Steidel, C. C., & Fruchter, A. 1996, MNRAS, 283, 1388
Papovich et al. 2001, ApJ, 559, 620
Pei, Y. C., Fall, S. M., & Bechtold, J. 1991, ApJ, 378, 6
Péroux, C., Kulkarni, V. P., Meiring, J., Ferlet, R., Khare, P.,
Lauroesch, J., Vladilo, G., & York, D. G. 2006, A&A 450, 53
Pettini, M. Smith, L. J., King, D. L., & Hunstead, R. W. 1997,
ApJ, 486, 665
Prochaska, J. X., Gawiser, E., Wolfe, A. M., Cooke, J., & Gelino,
D. 2003, ApJS, 147, 227
Roberts, M. S. et al. 1976, AJ, 81, 293
Roche, P. F., & Aitken, D. K. 1984, MNRAS, 208, 481
Roche, P. F., & Aitken, D. K. 1985, MNRAS, 215, 425
Roche, P. F., Packham, C., Aitken, D. K., & Mason, R. E. 2007,
MNRAS, 375, 99
Roche, P. F., Packham, C., Telesco, C. M., Radomski, J. T.,
Alonso-Herroro, A., Aitken, D. K., Colina, L., & Perlman, E.
2006, MNRAS, 367, 1689
Sembach, K. R., & Savage, B. D. 1992, ApJS, 83, 147
Shapley, A. et al. 2001, ApJ, 562, 95
Shapley, A. E., Steidel, C. C., Erb, D. K., Reddy, N. A.,
Adelberger, K. L., Pettini, M., Barmby, P., & Huang, J. 2005,
ApJ, 626, 698
Smith, H. E. et al. 1977, ApJ, 218, 611
Spoon, H. W. W. et al. 2006, ApJ, 638, 759
Sturm, E., Hasinger, G., Lehmann, I., Mainieri, V., Genzel, R.,
Lehnert, M. D., Lutz, D., & Tacconi, L. J. 2006, ApJ, 642, 81
Vladilo, G., & Péroux, C. 2005, , A&A, 444, 461
Wang, J., Hall, P. B., Ge, J., Li, A., & Schneider, D. P. 2004,
ApJ, 609, 589
Werner, M. W. et al. 2004, ApJS, 154, 1
Whittet, D. C. B. 1987, QJRAS, 28, 303
Whittet, D. C. B., Bode, M. F., Longmore, A. J., Adamson, A. J.,
McFadzean, A. D., Aitken, D. K., & Roche, P. F. 1988, 233, 321
Wild, V., & Hewett, P. C. 2005, MNRAS, 361, L30
Wild, V., Hewett, P. C., & Pettini, M. 2006, MNRAS, 367, 211
Yanny, B., York, D. G., & Gallagher, J. S. 1989, ApJ, 338, 735
York, B. A., Ellison, S. L., Lawton, B., Churchill, C. W., Snow,
T. P., Johnson, R. A., & Ryan, S. G. 2006a, ApJ, 647, L29
York, D. G. et al. 2006b, MNRAS, 367, 945
http://arxiv.org/abs/astro-ph/0608126
Silicate Feature in A DLA 5
0.85 0.95 1.05 1.15 1.25 1.35
−1.65
−1.55
−1.45
−1.35
−1.25
−1.15
−1.05
SL order 1
LL order 2
LL bonus order
log λobserved
5.0 10.06.0 8.0 12.0
Rest Wavelength(
Q0235+164
z abs = 0.524
6.5 7.5 8.5 9.5 10.5 11.5 12.5 13.5
Rest Wavelength
tau_9.7=0.083, mu Cep
tau_9.7=0.071, Trapezium
tau_9.7=0.088, GCS3
tau_9.7=0.081, Lab Olivine
( m)µ
Fig. 1.— (a) Left: Spitzer IRS spectrum of AO0235+164. The lower scale for the abscissa denotes the logarithm of the observed
wavelength in µm; rest frame wavelengths at the absorber redshift are shown at the top. The errorbars denote 1 σ flux uncertainties. The
dashed line shows a power law estimate of the continuum. (b) Right: A closer look at the suggested silicate feature. The abscissa denotes
the rest frame wavelength at the DLA redshift. The data points show the spectrum, normalized by the power law continuum and binned
by a factor of 3. The errorbars denote 1 σ uncertainties. The smooth curves show profiles for four templates of silicate optical depth, based
on observations for three Galactic sightlines and laboratory measurements for amorphous olivine.
ABSTRACT
  We report a detection of the 9.7 micrometer silicate absorption feature in a
damped Lyman-alpha (DLA) system at z_{abs} = 0.524 toward AO0235+164, using the
Infrared Spectrograph (IRS) onboard the Spitzer Space Telescope. The feature
shows a broad shallow profile over about 8-12 micrometers in the absorber rest
frame and appears to be > 15 sigma significant in equivalent width. The feature
is fit reasonably well by the silicate absorption profiles for laboratory
amorphous olivine or diffuse Galactic interstellar clouds. To our knowledge,
this is the first indication of 9.7 micrometer silicate absorption in a DLA. We
discuss potential implications of this finding for the nature of the dust in
quasar absorbers. Although the feature is relatively shallow (tau_{9.7} =
0.08-0.09), it is about 2 times deeper than expected from extrapolation of the
tau_{9.7} vs. E(B-V) relation known for diffuse Galactic interstellar clouds.
Further studies of the 9.7 micrometer silicate feature in quasar absorbers will
open a new window on the dust in distant galaxies.

<|endoftext|><|startoftext|>
Introduction
The luminosity function (LF) is an observational tool used for analyzing the post-
main sequence evolutionary phases of low-mass (≈ 0.5-0.8M⊙) metal-poor stars in Galactic
globular clusters (GGC). Because of their age and richness, GGC typically contain hun-
dreds of stars that have evolved off the main sequence. The numbers of stars in evolved
phases are directly related to the evolutionary timescales and fuel consumed in each phase
(Renzini & Fusi Pecci 1988), so that they present us with an opportunity to test this aspect
of stellar evolution models.
The results of the most stringent tests have been mixed. Repeated studies of the metal-
poor cluster M30 (Bolte 1994; Bergbusch 1996; Guhathakurta et al. 1998; Sandquist et al.
1999) have found an excess number of red giant branch (RGB) stars relative to main sequence
(MS) stars. Stetson (1991) also uncovered an apparent excess of stars in a combined LF of
the metal-poor clusters M68, NGC 6397, and M92. However, the LFs of more metal-rich
clusters show no discrepancy (M5: Sandquist et al. 1996; M3: Rood et al. 1999; M12:
Hargis, Sandquist, & Bolte 2004). In a survey of 18 clusters, Zoccali & Piotto (2000) found
good agreement with model predictions with the possible exception of clusters at the high
metallicity end.
In this paper we present BV I photometry of NGC 5466, a high galactic latitude globular
cluster (l = 42.2◦ and b = 73.6◦), located in the constellation of Boötes (α = 14h05m27.s4,
δ = +28◦32′04′′ at a distance of R=15.9 kpc; Harris 1996). NGC 5466 is a loose cluster
(rc = 1.
′64) with extremely low metallicity ([Fe/H]= −2.22) and subject to little or no
reddening, (E(B − V ) ≃ 0) (Harris 1996).
In §2, we describe the process leading to the calibrated photometry, and compare with
previous studies of the cluster. In §3, we compare the observed color-magnitude diagram
and observed luminosity function with theoretical models, focusing on the relative number
– 3 –
of stars on the lower RGB and around the MS turnoff. Finally, in §4, we present a new
examination of the blue straggler population of NGC 5466.
2. Observations and Data Reduction
The data used in this study were obtained with the Kitt Peak National Observatory
(KPNO) 0.9 m telescope (0.′′68 pix−1) on the nights of UT dates 1995 May 4, May 5, and
May 9. A complete list of the image frames, exposure times, and observing conditions is
given in Table 1.
The images obtained on the three nights were processed using IRAF1 tasks and pack-
ages. The reduction involved subtraction and trimming of the overscan region of all images,
subtraction of a master bias frame from flats and object frames, and flat fielding of the
object frames using images taken at twilight. Profile-fitting photometry was done using the
DAOPHOTII/ALLSTAR programs (Stetson 1987).
We also reduced archival ground-based photometry of the cluster core taken with the
High-Resolution Camera (HRCam) on the 3.6 m Canada-France-Hawaii Telescope (CFHT).
The V and I images were taken 30 and 31 May 1992 (observers J. Heasley and C. Christian),
and have not previously been described in the literature. The CCD images had 1024 × 1024
pixels, 0.′′13 per pixel, and excellent seeing (0.4 - 0.5 arcsec). The images were reduced using
the archived bias and twilight flat frames, and following a procedure similar to that for the
KPNO data. This allowed us to get excellent photometry to 2 magnitudes below the turnoff
in the cluster core. These images were used entirely for blue straggler identification (see §4).
2.1. Calibration against Primary Standard Stars
The conditions at KPNO on 1995 May 9 were photometric, and Landolt standard star
fields were observed at a range of air masses to determine photometric transformation coeffi-
cients. The standard values used for the calibration were chosen from the large compilation
of Stetson (2000), which is set to be on the same photometric scale as the earlier Landolt
(1992) values.
We conducted photometry on the standard stars and isolated cluster stars using multiple
1IRAF(Image Reduction and Analysis Facility) is distributed by the National Optical Astronomy Ob-
servatories, which are operated by the Association of Universities for Research in Astronomy, Inc., under
contract with the National Science Foundation.
– 4 –
synthetic apertures. We then used the DAOGROW (Stetson 1990) program to construct
growth curves to extrapolate measurements to a common aperture size. Using the CCDSTD
program, the standard star transformation equations were found to be:
b = B + ao + (−0.069± 0.005)(B − V ) + (0.255± 0.014)(X − 1.0)
v = V + bo + (0.027± 0.004)(B − V ) + (0.172± 0.010)(X − 1.0)
i = I + co + (−0.013± 0.005)(V − I) + (0.145± 0.018)(X − 1.0)
where X is the airmass, v, b and i are instrumental magnitudes, and V , B, and I are standard
magnitudes. These calibration equations are different than those used for our analysis of
M10 Pollard et al. (2005) and M12 Hargis et al. (2004), which were observed on the same
night, because our I−band exposures did not go as deep as the B and V exposures. As a
result, the (B − V ) color was a better choice for calibrating the V photometry down to our
faintest observed stars. The calibrated measurements for the standard stars are compared
with catalogue values in Fig. 1.
We note that there is slight evidence of trend in the residuals for the I band versus
magnitude, which might indicate nonlinearity. This impression is caused by one observation
of the PG1323-086 field. We did, however, have an additional observation of the same field
on the same night having the same exposure time that does not show the same (small) trend.
Because we do not have any reason to eliminate the frame and because its elimination has a
minimal effect on the transformation coefficients, we have decided to retain the measurements
from the image.
2.2. Calibration against Secondary Standard Stars
Aperture photometry for 165 cluster stars was used to calibrate the point-spread function
(PSF) photometry for the cluster. These secondary standard stars were chosen based on
relatively low measurement errors and location in relatively uncrowded regions of the cluster.
They were chosen from the asymptotic giant branch (AGB), upper RGB, and horizontal
branch (HB) of the cluster in order to cover the entire range of colors covered by cluster
stars.
The PSF-fitting photometry for the three nights of data was combined and averaged
after zero-point differences among the frames had been determined and corrected. The zero-
point corrections to the standard system were determined after fixing the color-dependent
terms at the values measured in the primary standard star calibration. (This was also done
in our studies of M10 and M12.) In Fig. 2, it can be seen that this procedure does not
introduce systematic color- or magnitude-dependent errors.
– 5 –
2.3. Comparison with Previous Studies
We compared our photometric data set to those of Jeon et al. (2004), Rosenberg et al.
(2000), and Stetson (2000). The magnitude and color comparisons (BV I in this study versus
V I in Stetson and Rosenberg et al., and BV in Jeon et al.) as function of magnitude and
color are shown in Figs. 3 – 5. Though our calibrated magnitudes are slightly brighter than
those of Stetson (2000), the differences are small, and there is no color trend. The offsets
compared to the Rosenberg et al. (2000) are larger, but again there are no clear color trends.
The offsets compared to the Jeon et al. (2004) data are also significant, but more notable
are slight trends with color.
2.4. Calculation of the Luminosity Function
Artificial star tests were performed to empirically measure the precision of our photom-
etry and to correct for incompleteness in the detection of stars. We followed the procedure
described by Hargis et al. (2004) for the calculation of incompleteness corrections as function
of position and magnitude.
The inputs used for producing the artificial star tests were the reduced B and V
CCD frames, PSFs for each object frame, fiducial lines, and an estimate for the initial
LF (Sandquist et al. 1996). Artificial stars were randomly placed in cells on a spatial grid
and the entire grid was then shifted randomly from run to run in order to ensure the whole
imaged field was tested (Piotto & Zoccali 1999). Each star was placed in a consistent po-
sition relative to the cluster center on each image. If a detected star was found to coincide
with the input position of an artificial star, it was added to the archive. The new images
were reduced using the same procedure applied to the original data set. In this study, a
total of 100,000 artificial stars from 50 separate runs were added. The number of artificial
stars per trial was chosen so that the effects of crowding on the photometry was qualitatively
unchanged.
The recovered artificial stars were used to calculate 1) median magnitude and color
biases (δV and δB−V , where δ = median[output − input]), 2) median external error esti-
mates (σext(V ) = median[δV −median(δV )]/0.6745 and σext(B − V )), and 3) total recovery
probabilities (F (V ), which is the fraction of the stars that were recovered with any output
magnitude) in bins according to projected radius and magnitude. The values for the above
quantities are plotted in Figs.6 – 8.
Finally, an initial estimate of the “true” LF and the error distribution, magnitude biases
and the total recovery probability (F ) were used to compute the completeness fraction f
– 6 –
(the ratio of the predicted number of stars to the actual number of observed stars). The
completeness fraction results are shown in Figure 9. We then interpolated to compute f
for the radial distance and magnitude of each detected star. For each observed star f−1
was added to the appropriate magnitude bin to determined the observed LF. (Note that the
completeness fraction was set to 1.0 for star brighter than the turnoff.) The observed LF
along with the upper and lower 1 σ error bars on log N are listed in Table 2.
3. Discussion
3.1. Reddening, Metallicity, Distance Modulus, and Age
Because NGC 5466 resides at high galactic latitude, it suffers little if any reddening.
Though Schlegel et al. (1998) found a reddening of E(B − V ) = 0.02 from the maps of dust
IR emission, we adopted E(B − V ) = 0.0 (Rosenberg et al. 2000). For our interests in this
paper, the small difference is of small importance. Most of the comparisons below between
observations and theory are relative, in which reference points (like the turnoff) are used to
determine magnitude and/or color shifts. This has the benefit of minimizing the influence
of uncertainties in reddening and distance modulus (see below).
As for abundances, there is only one high-resolution measurement for a cluster star, and
it is for the anomalous Cepheid V19. McCarthy & Nemec (1997) find [Fe/H]= −1.92±0.05,
while Pritzl et al. (2005) find [Fe/H]= −2.05 using the same data. Typically quoted metal-
licity values include [Fe/H] = −2.17 (Zinn 1980) from photoelectric photometry of integrated
light in selected filter bands, and [Fe/H] = −2.22, which was derived by Zinn & West (1984)
(ultimately from low-resolution spectral scans by Searle & Zinn 1978). When converted to
the widely-used metallicity scale of Carretta & Gratton (1997), this becomes [Fe/H] = −2.14.
More work could certainly be done on the composition of NGC 5466 stars, but the evidence
so far points to an abundance [Fe/H] . −2.0. Though the range in the above quoted metal-
licity values is relatively large for a globular cluster, the exact value is not critical for our
purposes since we will primarily be concerned with relative comparisons.
Our photometry does not extend faint enough to derive a new distance modulus from
subdwarf fitting to the main sequence. Harris (1996) obtained (m − M)V = 16.0 by cali-
brating the observed luminosity level of the horizontal branch with the relation MV (HB) =
0.15[Fe/H]+0.80 and adopting a reddening, E(B−V )=0.0 and a metallicity, [Fe/H]=−2.22.
Ferraro et al. (1999) determined distance moduli (m−M)V = 16.16 from their zero-age HB
estimate, assuming no reddening and metallicity on the Carretta & Gratton (1997) scale.
We will consider distance moduli in this range.
– 7 –
Most previous age estimates of NGC 5466’s age have it older than the recent deter-
mination of the age of the universe (13.7+0.13
−0.17 Gyr) obtained by the Wilkinson Microwave
Anisotropy Probe (WMAP) team (Spergel et al. 2006). Recent homogeneous studies of GGC
indicate that NGC 5466 is coeval with clusters of similar metallicity (Salaris & Weiss 2002;
Rosenberg et al. 2000). As a result, we will primarily consider ages in the range of 12 to 13
3.2. The Color-Magnitude Diagram
The color-magnitude diagrams (CMDs) for NGC 5466 show well-defined RGB, AGB,
and HB sequences (see Fig. 10), and stars extending from the tip of the RGB down to
V ≈ 22.5. Fiducial sequences for the MS and lower RGB were determined from the mode of
the color distribution of stars in magnitude bins. The SGB position was determined using
the magnitude distribution of the stars in color bins. The fiducial line for the rest of the
RGB was obtained from the mean color of stars in magnitude bins. The fiducial points are
listed in Table 3.
A comparison of the fiducial points derived for NGC 5466 with theoretical isochrones for
a range of ages from the Teramo (Cassisi et al. 2004), Victoria-Regina (VandenBerg et al.
2006), and Yonsei-Yale (Demarque et al. 2004) groups is displayed in Figures 11 and 12.
The isochrones have been shifted in color and magnitude (aligning the turnoff colors and
the magnitudes of the main sequence point 0.05 mag redder than the turnoff) according
to the technique of Vandenberg et al. (1990). This has the advantage of removing some of
systematic uncertainties associated with the color-Teff transformations. [In our comparisons,
we found that the Yonsei-Yale models could not match the fiducial line for any reasonable set
of input parameters when the transformations of Lejeune et al. (1998) were used. Therefore,
we only utilize models using the Green, Demarque, & King (1987) transformations. Even
then, we could not find a match with the slope of the upper giant branch for reasonable
metallicities ([Fe/H] . −1.9).] On the whole, the shape of the fiducial matches the models
well on the main sequence, subgiant branch, and lower giant branch. Neither the Teramo
nor the Victoria-Regina models include element diffusion processes, while the Yonsei-Yale
isochrones only include He diffusion. However, differences in Teff -color transformations are
likely to be the cause of some of the differences seen.
– 8 –
3.3. The Luminosity Function
The number of stars at a given luminosity in post-main-sequence phases is directly pro-
portional to the lifetime spent at that luminosity. It is well known that the LF of the RGB
probes the chemical stratification inside a star because the hydrogen abundance being sam-
pled by the thin hydrogen-fusion shell affects the rate of evolution, and hence star counts
on the RGB (Renzini & Fusi Pecci 1988). Setting aside the short pause at the RGB bump,
RGB evolution accelerates in a very regular way that is ultimately related to the structure
of degenerate core. The relationship between core mass and radius forces the fusion shell to
function at strictly controlled density and temperature conditions, which leads to a relation-
ship between core mass and luminosity. This causes the LF to be particularly sensitive to
certain physical details, which we discuss in §3.3.1 below.
Fig. 13 shows the observed luminosity function compared to theoretical LFs for the la-
beled values of metallicity, exponent of the initial mass function, and a range of age estimates
for NGC 5466, assuming our preferred distance modulus (m −M)V = 16.00. The theoret-
ical models were normalized to the observed LF at V ≈ 21.3 (sufficiently faint that stellar
evolution effects are minimized). The models agree well with the observed LF, implying an
age of approximately 12 - 13 Gyr.
3.3.1. Relative RGB and MS Numbers
Gallart et al. (2005) recently discussed theoretical luminosity functions calculated by
different groups. One of the primary differences they noted was in the number of giant stars
relative to main sequence stars. In order to show these differences in a parameter-independent
way, we follow the method of Vandenberg et al. (1990). In Fig. 14, the theoretical LFs were
shifted so that a point on the main sequence 0.05 mag redder than the turnoff color point
matched the corresponding point on the cluster fiducial line. The reason for using the point
(VTO+0.05) rather than the MSTO itself is that the MS has a significant slope and curvature
at this point, making it possible to accurately measure the point in both observational data
and isochrones (Vandenberg et al. 1990). The theoretical models were normalized to the two
bins in the observed LF on either side of the turnoff (V = 19.83 and 20.13). As can be seen,
age-related differences between the theoretical LFs nearly disappear when this procedure
is applied (see also Stetson 1991; Vandenberg et al. 1998). However, when different sets of
models are compared, there are small differences in the number of RGB stars relative to
MS stars, with the Victoria-Regina models predicting the smallest number of giants and the
Yonsei-Yale models predicting the largest.
– 9 –
To quantify the differences, we computed the ratio of the number of stars on the lower
giant branch to the number of stars near the main sequence. Pollard et al. (2005) introduced
this ratio and showed that it is insensitive to age and heavy-element abundance. For the
main sequence population, we used star counts in the two bins on either side of the turnoff
(19.682 < V < 20.282), and for the red giant branch we used the counts in the range
16.982 < V < 18.482. We derived model values from the same magnitude ranges relative to
the VTO+0.05 point on the main sequence. Values are compared in Table 4. The error in the
observed value is dominated by Poisson statistical scatter. The Yonsei-Yale models are in
best agreement with the observations, the Victoria-Regina models are out of agreement by
more than 2 σ, and the Teramo models are in between.
It is worth examining the possible causes of this difference both because it may help
improve the physics inputs for the models and because red giant stars are some of the largest
contributors to the integrated light of old stellar populations. Gallart et al. (2005) tabulated
most of the main physics inputs for the most widely-used model sets2. Earlier studies (Stetson
1991; Vandenberg et al. 1998) have shown that the LF-shifting method used above eliminates
nearly all sensitivity to model input parameters like mass function, age, convective mixing
length, and composition inputs with the exception of helium abundance, which we examine
first.
Older models (e.g. Fig. 9 of Ratcliff 1987, Fig. 7 of Stetson 1991, Fig. 3 of Vandenberg et al.
1998) seem to agree that an increase in initial helium abundance Y in non-diffusive models
by 0.1 results in an increase in the relative number of stars on the RGB (more precisely, a
reduction in the relative number of main sequence stars) by about 0.07-0.08 in logN . The
Teramo models (Y = 0.245) predict about 12% more giant stars relative to main sequence
stars compared to the Victoria-Regina models (Y = 0.235)3. This difference corresponds
to a shift of 0.05 in logN , which is about an order of magnitude too large for the helium
abundance difference.
The Yonsei-Yale models have the lowest assumed helium abundance (Y = 0.23), but
are the only set of the three that include helium diffusion. The inclusion of helium dif-
fusion reduces the age derived from the turnoff of a globular cluster by about 10-15%
2Since the Gallart et al. (2005) review was published, the Teramo group found an error in the evolution
scheme they used on the giant branch, which brings their models into better agreement with other groups.
We use their updated models in the comparisons here
3The Teramo models also assume a larger α-element enhancement ([α/Fe] = 0.4) than the Victoria-Regina
models ([α/Fe] = 0.3), which would tend to reduce the number of giants relative to main sequence stars.
However, because the relative number of RGB and MS stars is not sensitive to small differences in heavy
element abundance, this difference is probably unimportant.
– 10 –
(Proffitt & Vandenberg 1991; Straniero et al. 1997; VandenBerg et al. 2002), thanks to the
inward motion of helium. According to theoretical models (e.g. Fig. 8 of Proffitt & Vandenberg
1991), diffusion has a small effect on the LFs (∼ a few times 10−2 in logN , increasing for
increasing age), but it does increase the number of giants relative to MS stars. He diffusion
reduces the total core hydrogen fuel supply available to an MS star, but in itself this does not
strongly modify the LF, just changes the brightness of the turnoff. This magnitude change
is eliminated in our LF shifting procedure. The chemical composition profile left in the star
after it leaves the main sequence has a greater impact. Diffusion reduces the H abundance in
the fusion regions, thereby decreasing the evolutionary timescale. According to the models
of Proffitt & Michaud (1991), the changes to the He abundance profile are most considerable
immediately below the surface convection zone, and just outside the nuclear fusion regions
(where the composition gradient slows the inward settling of helium). However, over most
of the star, the changes in Y are limited to 0.01 - 0.02. Because the core portion of the
composition profile is consumed on the subgiant branch, the evolution timescales for giant
stars are only affected in a minor way, and the appearance of a deep convection zone on
the lower giant branch wipes out most of the effects of diffusion for the upper giant branch
evolution.
In spite of this, the Yonsei-Yale models have almost 23% more giants than the Victoria-
Regina models, and more than 9% more giants than the Teramo models (relative to MS
stars). The lower helium abundance in the Yonsei-Yale models compared to the other models
should partially counteract what effects helium diffusion might have had on the RGB/MS
ratio as well. Thus, it appears that neither He abundance nor He diffusion can completely
explain the differences between the Yonsei-Yale models and the other groups.
We can ask whether the LFs show similar disagreements at other metallicities. Hargis et al.
(2004) made comparisons between the Victoria-Regina and Yonsei-Yale theoretical LFs and
observational LFs for the clusters M3 (Rood et al. 1999), M5 (Sandquist et al. 1996), M12,
and M30. The overall impression from those comparisons was again that the Yonsei-Yale
models (having He diffusion) predict more giant stars relative to main-sequence stars than
do the Victoria-Regina models. In Fig. 15, we compare the LFs for these clusters with
the Teramo models. The degree of agreement or disagreement can be quantified with num-
ber ratios of lower RGB and MSTO stars, similar to the ones we computed earlier for
comparisons with NGC 5466. Our calculations are shown in Table 4. As can be seen, uncer-
tainties in the metallicity scale have some effect on the comparisons with the observations.
The Carretta & Gratton (1997) scale has higher [Fe/H] values than the Zinn & West (1984)
scale, and thus results in lower number ratios.
Fig. 16 shows the results of comparing the observed ratios with the models for different
– 11 –
[Fe/H] scales. On the Zinn & West 1984 scale (right panels), the observed values seem to
be in agreement to within about 1 − 1.5σ for most of the models, with the exception of
the the lowest metallicity clusters (M30 and NGC 5466) and the lowest helium abundance
(Victoria-Regina) models. On the Carretta & Gratton 1997 scale, the Yonsei-Yale models
have the best overall agreement, although the Teramo models only deviate noticeably for
the lowest metallicity clusters. The Victoria-Regina models predict too few giants for all of
the clusters.
The differences from model to model (as opposed to models versus observation) point
toward deficiencies elsewhere in the physics or computational algorithms used in the stel-
lar evolution codes. 4 The RGB LF is a robust prediction of the models because there is
a strong core mass — luminosity relationship: the conditions in the hydrogen fusion shell
of the giant are strongly dependent on the structure of the degenerate core and are al-
most independent of the details of the mass or structure of the envelope. As a result of
this, we can focus on factors affecting core structure. (As a non-standard physics exam-
ple, Vandenberg et al. 1998 describe the way in which core rotation relieves a giant star of
some of the need to support itself by gas pressure, which reduces the core temperature and
lengthens the evolutionary timescale.) Because model-to-model differences appear even on
the faint end of the giant branch, we can set aside factors that only become important to the
structure of the star near the tip of the giant branch [such as neutrino losses and conductive
opacities; see Bjork & Chaboyer 2006, for example], even though there are significant theo-
retical uncertainties in these quantities. Nuclear reaction rates in the fusion shell can also
be neglected, partly because the uncertainties in the reactions appear to be relatively small
Adelberger et al. (1998), but also because small changes in the reaction rates require only
tiny changes in the shell temperature to get the same energy production. This leads us to
examine the equation of state (EoS) in the core.
Although the behavior of degenerate electrons is thought to be very well understood,
their interactions with nuclei can have a measureable effect on the pressure. Particles of like
charge tend to cluster together, which modifies the free energy of the gas and reduces the gas
pressure for given density and temperature. Harpaz & Kovetz (1988) looked at the effects
of the inclusion of Coulomb interactions on giant stars, and their results are corroborated
by those of Cassisi et al. (2003). They found that for a given core mass the fusion shell
temperature was higher when the Coulomb interactions were included, which leads to faster
processing of hydrogen. Thus this is another example (like core rotation) where modification
of the pressure support of the core affects the evolutionary timescales, which results in
4For a comparison of these physical inputs for the different theoretical groups, see Table 1 of Gallart et al.
(2005).
– 12 –
changes to the luminosity function. The Coulomb corrections to the pressure become more
important with increasing density for the core, but are small compared to the contribution
of the degenerate electrons.
All of the model sets we have considered here incorporate Coulomb interactions in some
form. The Teramo group used the most sophisticated “EOS1” version of the FreeEOS5,
which incorporates Coulomb corrections in a form that matches limits in both the weak
(Debye-Hückel) and strong (once-component plasma) Coulomb interaction limits as well
as (less importantly) electron exchange interactions. The strong interaction limit is most
relevant for giant star cores since the strong interaction parameter
where ζ is the rms nuclear charge and r0 is the average internuclear distance.
The Yonsei-Yale group used the OPAL EoS tables (Rogers et al. 1996), but falls back
on the group’s older EoS [e.g. (Guenther et al. 1992)] for conditions for high densities and
temperatures outside the OPAL tables. While the most recent OPAL tables probably contain
the most complete physical description of the Coulomb effect, the OPAL tables they used
(Y.-C. Kim, private communication) were computed prior to recent improvements to account
for relativistic electrons (Rogers & Nayfonov 2002), and as a result cut off at log ρ > 5.0.
The Yale EoS at higher densities only includes the Coulomb effect in the weak Debye-Hückel
limit, which for the highest densities in the core. The Victoria-Regina models also use
a modified version of the EFF EoS (Eggleton et al. 1973), with a correction for Coulomb
corrections in the weak Debye-Hückel limit (VandenBerg et al. 2000).
The differences in the implementation of the Coulomb effect may explain the fact that
the Teramo models generally predict more giants (relative to the main sequence) than the
Victoria-Regina models do. However, the smaller Coulomb corrections in the Yonsei-Yale
models would tend to result in fewer giants than the Teramo models (although the effects of
helium diffusion work in the opposite direction). So, we are unable to completely reconcile
the differences in the luminosity functions from the three groups.
Obviously more detailed study is needed by all of the modelling groups to identify the
causes of the differences, but such a study is beyond the scope of this paper. Still, we
believe that helium diffusion and strong interaction Coulomb corrections are physical effects
that should be considered first. There is, for example, good evidence from helioseismology
5FreeEOS is available at http://freeeos.sourceforge.net/, and the discussion of the implementation of the
Coulomb effect can be found at http://freeeos.sourceforge.net/coulomb.pdf.
http://freeeos.sourceforge.net/
http://freeeos.sourceforge.net/coulomb.pdf
– 13 –
for helium diffusion in the Sun Bahcall et al. (1995), despite the surface convection and
meridional flow (e.g., Hathaway 1996). It is expected that helium diffusion should also act
in globular cluster stars. A detailed study of the effect of equation of state uncertainties
has yet to be done [see, for example, Bjork & Chaboyer (2006) for a study of uncertainties
in other physical inputs]. Use of FreeEoS would make a study of equation of state effects
most stringent since it appears to be capable of modelling the most sophisticated tabular
EoS (OPAL), while also having the flexibility to allow individual bits of the physics to be
“turned off”.
As a final warning about the observations, we should remember the LF of the clus-
ter M10. Pollard et al. (2005) found that unusual variations in numbers of RGB stars at
different brightness levels in M10 (a virtual twin to M12). In particular there seemed to
a significant excess in the number of stars near the RGB bump in brightness, while the
lower RGB appeared normal (compared to Victoria-Regina and Yonsei-Yale models). A
similar excess may be present in the RGB LF of M13 (Cho et al. 2005). These kinds of
variations cannot be explained by the “global” physics that should apply to all globular
cluster stars. These anomalies point toward fluctuations in the stellar initial mass function
or composition-dependent effects.
3.3.2. The RGB Bump
A second feature of the LF presented here is a noticeable RGB bump. Typically, the
RGB bump appears as a peak in the differential LF and as a change of slope in the cumulative
LF (CLF). The bump provides a measure of the maximum depth reached by the outer
convection zone during first dredge-up since it is the result of a pause in the star’s evolution
when the shell fusion source begins consuming material of constant, lower helium content
(Fusi Pecci et al. 1990). Unfortunately, the number of stars occupying the bump gets smaller
and the luminosity of the bump increases as the metallicity of the cluster decreases, making
the bump harder to detect in metal-poor clusters. A small peak appears in our differential
LF at V ≈ 16.2, and a significant (2.5 − 3.5σ) change in slope occurs at the same position
as the peak in the differential LF, as shown in Fig. 17.
The relative brightness of the bump can be measured by comparing to the V -magnitude
of the HB at the level of the RR Lyrae instability strip ∆V
HB = Vbump−VHB (Ferraro et al.
1999). This indicator is a function of the total metallicity and the age of the cluster: an
increase in metallicity and/or a decrease in age are accompanied by a decrease in luminosity
of the bump (Ferraro & Montegriffo 2000). We find Vbump = 16.20 ± 0.05 mag, and VHB =
16.52 ± 0.11 (from interpolation between the average magnitudes of non-variable HB stars
– 14 –
at the blue and red ends of the RR Lyrae instability strip), giving ∆V
HB = −0.32 ± 0.12.
In the compilation of Ferraro et al. (1999), a zero-age HB reference point was calculated
using the relation VZAHB = VHB +0.106[Fe/H]
+0.236[Fe/H]+ 0.193. Ferraro et al. found
VZAHB = 16.62 ± 0.10, which is consistent with the value obtained here (VZAHB = 16.65 ±
0.11). Our value of ∆V
ZAHB = −0.45 ± 0.12 is considerably lower than tabulated values
for other clusters with similar metallicities (M68: −0.60 ± 0.07; M92: −0.65 ± 0.12; M15:
−0.65 ± 0.09). In a separate compilation, Zoccali et al. (1999) measured a smaller value
ZAHB = −0.45 ± 0.11 for M15, in better agreement with the value for NGC 5466. (The
difference is primarily because Zoccali et al. measured the bump position to be 0.16 mag
fainter than Ferraro et al. .) Clearly there is still some need for more precise comparisons of
bumps in metal-poors clusters with theory.
We believe, however, that the brightness of the bump should ultimately be judged using
hydrogen-fusing stars as references because it avoids any effects of the poorly-understood
physical processes (such as the helium flash and/or mass loss) associated with the creation
of an HB star. Using the cluster LF (as seen in Figure 14), we again find that the observed
RGB bump is fainter than model values by at least 0.3 mag when the models are shifted
to match the cluster’s main sequence. Hargis et al. (2004) did similar comparisons between
theoretical models and luminosity functions for M3, M5, M12, and M30 taken from the
literature. With the exception of M30 (because the bump could not be identified), the
position of the bump relative to the turnoff region agreed well with theory. NGC 5466 is
thus the most metal-poor cluster this comparison has been done for. So at present we are left
with the question of whether this might result from the cluster’s low metallicity, or whether
we have been the unfortunate victims of a fluctuation in the number of giant stars in this
low-mass cluster. We therefore encourage the examination of the luminosity function of more
massive metal-poor clusters to settle the question.
3.3.3. Mass Function Exponent
Two recent papers (Belokurov et al. 2006; Grillmair & Johnson 2006) reported the dis-
covery of tidal streams covering many degrees around NGC 5466 in Sloan Digital Sky Sur-
vey images. Gnedin & Ostriker (1997) examined the Milky Way globular clusters and found
NGC 5466 to be a cluster that has probably been strongly affected by disk shocking in the
recent past. In our examination of the LF, we found a rather low value for the global main se-
quence mass function slope. The mass function for a cluster is typically expressed as a power
law (N(M) ∝ M−(1+x)), where the slope x = 1.35 is the standard Salpeter value. Generally,
the present-day power-law index x varies from cluster to cluster. The mass function slopes
– 15 –
that best fit the upper LF of NGC 5466 around and above the MSTO have −1 . x . 0.
(Note that the best fit slope does depend on the models being used: the Yonsei-Yale models
require a flatter slope than the Victoria-Regina and Teramo models.) Such a shallow mass
function slope is unusual for a metal-poor cluster. For example, the cluster NGC 5053 has
similar metallicity, position relative to the galactic center and plane, and density structure,
but still has a steep x ∼ 2 mass function (Fahlman et al. 1991). Djorgovski et al. (1993)
found that mass function slopes in the range of 0.5 ≤ M
≤ 0.8 are influenced primarily by
the cluster’s position in the galaxy, and to some extent by cluster metallicity. Based on both
of those factors, NGC 5466 should have a larger mass function slope (x ∼ 3 according to the
multivariate formula in Djorgovski et al. 1993). Like other halo clusters, NGC 5466’s orbit
is quite eccentric and will take the cluster more than 30 kpc away from the Galactic center
(Dinescu et al. 1999), but it is currently on its way back into the halo after two relatively
recent passes through the Galactic disk. Recent losses of low-mass stars may explain the
recent identification of strong tidal tails near the cluster by Belokurov et al. (2006).
4. Blue Stragglers
Blue stragglers (BSSs) were first identified by Sandage (1953) in the globular cluster
M3. These stars are more massive than the turnoff mass and occupy the space in the CMD
just bluer and brighter than the MSTO. Blue stragglers are found in clusters, and relatively
more frequently in lower-luminosity clusters (Ferraro et al. 1993; Preston & Sneden 2000;
Piotto et al. 2004; Sandquist 2005). From the various models proposed for the formation
mechanism of BSSs, the “collision” theory (involving strong gravitational interactions be-
tween previously unassociated single or binary stars) and the “mass-transfer” theory (in
which the more massive star in a binary evolves and during its expansion transfers mass to
its companion) are the strongest possibilities. There is a continuing interest in the study of
BSSs because they may provide insight into the recent dynamical history of a cluster.
In order to identify BSSs over the entire observed area of the cluster out to a radius
of 11.′6, we used photometry from three datasets. In the core of the cluster we used the
CFHT data presented here for the first time. Outside of the CFHT field, we used the BV
photometry of Jeon et al. 2004, which covered a field 11.′6 on a side centered on the cluster.
Finally, we used our KPNO data for the least crowded outskirts of the cluster. Even though
NGC 5466 is a very low density cluster, the spatial resolution of the KPNO data was such
that blends of stars would have resulted in the spurious identification of 10 objects as BSS
in the intermediate portion of the field.
48 BSS candidates were identified in NGC 5466 by Nemec & Harris (1987), all located
– 16 –
between 0.′1 and 5.′6 from the cluster center. In spite of the low cluster density, we find new
BSS candidates at all radii and luminosity levels, and find several of their candidates are
spurious. According to the CFHT photometry, the object with ID 45 from Nemec & Harris
is a blend of several fainter stars, none of which is a BSS. In addition, IDs 6 and 24 were
identified as blends of stars using the Jeon et al. (2004) dataset. Our BSS list is presented
in Table 5. The list includes the nine known SX Phoenicis stars (ID 27, 29, 35, 38, 39, and
49, Nemec & Mateo 1990; ID 3 (SX Phe 3), 36 (SX Phe 2), and 50 (SX Phe 1), Jeon et al.
2004) and the three eclipsing binaries (ID 19, 30, and 31; Mateo et al. 1990). New straggler
candidates were given ID numbers that build upon the Nemec & Harris (1987) list. Figure
18 shows the CMDs used to select the 94 identified BSSs in each of the three datasets.
In order to use the BSSs to constrain cluster dynamics, we compared the normalized
cumulative radial distribution of the BSSs to the population of the giant branch, as shown
in Fig. 19. Nemec & Harris (1987) found a 97.8% probability that their BSS sample was
more centrally concentrated than red giants in the same magnitude range. Because their
photometry was taken in conditions of poorer seeing (compared to our CFHT photometry
and that of Jeon et al.), their samples are likely to be somewhat incomplete near the cluster
center.
Our RGB sample contains 350 stars with magnitude V < 18.5. Kolmogorov-Smirnov
(K−S) probability tests were used to test the hypothesis that both populations were drawn
from the same parent population. The K−S probability that the BSSs are drawn from
the same radial distribution as the RGB population is 8.1 × 10−7, and 2.4 × 10−4 for the
comparison with the HB population. By contrast there is a probability of 0.27 that the RGB
and HB samples are drawn from the same population. The concentration of BSSs toward
the cluster center as compared to the RGB and HB samples is consistent with the idea that
they are more massive than individual RGB stars, and as a result have been segregated by
mass deeper within the cluster potential well.
Piotto et al. (2004) recently used samples of stragglers from the cores of 56 globular
clusters to show that there was a strong correlation between FHBBSS = NBSS/NHB and inte-
grated cluster V magnitude, and a weaker anti-correlation with central density. Sandquist
(2005) examined an additional 13 low-luminosity globular clusters using similar selection
criteria. NGC 5466 is an interesting cluster in relation to these samples because it has an
integrated luminosity that puts it at the faint end of the Piotto et al. sample (MVt = −6.96;
Harris 1996), but with a central density that is nearly an order of magnitude lower than
any of their clusters [log(ρ0/(LV,⊙pc
−3)) = 0.88] but comparable to clusters in the Sandquist
sample. To put NGC 5466 in the context of the Piotto et al. and Sandquist samples, we
selected a subset of our BSSs that satisfied the selection criteria in those studies (brighter
– 17 –
than the MSTO, and bluer than the MSTO by 0.05 in B − V color). From mode fitting to
the turnoff region in the Jeon et al. (2004) and CFHT data, we find (B−V )TO = 0.367 and
(V − I)TO = 0.511. A color offset of 0.05 in B − V corresponds to an offset of about 0.075
in V − I (VandenBerg & Clem 2003) for NGC 5466’s metallicity. We find that 75 BSSs are
brighter than the cluster turnoff (VTO = 19.99±0.05) and 0.05 bluer than the cluster turnoff
in B − V . We have identified 97 HB stars in our datasets for NGC 5466, which gives a
specific frequency FHBBSS = 0.77± 0.12 (with the error estimate from Poisson statistics).
When compared to the Piotto et al. values (see Fig. 20), NGC 5466 falls within the
general trend versus MVt despite the cluster’s low central density. On the other hand, NGC
5466 has a lower FHBBSS value than other clusters of similar central density (but lower total
luminosity). As discussed by Sandquist (2005), this provides additional evidence that the
plateau in FHBBSS seen for clusters with log ρ0 . 2.5 is a result of the correlation between cluster
integrated magnitude and central density. The lowest luminosity clusters in the Sandquist
sample (E3 and Palomar 13) have central densities comparable to that of NGC 5466, but
BSS frequencies that are several times higher. Another moderate-luminosity, low-density
cluster (NGC 5053; Hiner et al., in preparation) similar to NGC 5466 shows a comparably
low straggler frequency. BSSs produced via purely collisional means are not likely to show
this kind of behavior. More likely is the scenario proposed by Davies et al. (2004) in which
binary stars that would normally produce BSSs are destroyed earlier in the cluster’s history.
More direct observational support for that hypothesis is needed though — for example, a
detailed study of the variation of the binary star fractions as a function of integrated cluster
magnitude.
Fig. 21 shows that the number and frequency
RBSS =
NBSS/N
Lsample/L
sample
of BSSs relative to the integrated V -band flux (derived from a King model profile) as a func-
tion of radius. Both frequencies increase toward the cluster center, and neither shows signs
of rising toward larger radii. As recent studies of denser clusters show (M3: Ferraro et al.
1993; M55: Zaggia et al. 1997; 47 Tuc: Ferraro et al. 2004; NGC 6752: Sabbi et al. 2004),
the BSS frequency generally decreases at intermediate radii and rises again at larger dis-
tances. However, the cluster Palomar 13 (Clark et al. 2004), which has a central density
similar to NGC 5466, shows no sign of an increase in straggler frequency at large distance.
In more massive clusters, the minimum in the BSS frequency is reached approximately
where the timescale for dynamical friction equals the age of the cluster (Warren et al. 2006).
A similar calculation for the current structure of NGC 5466 indicates that this occurs at about
270′′(about 2.8rc). This appears to be in the outer reaches of the core straggler distribution.
– 18 –
Because it is likely that NGC 5466 has lost a significant fraction of its mass, we expect
that the current density structure of the cluster has not existed throughout its history and
that NGC 5466 might have been able to dynamically relax stragglers to its core from larger
distances earlier in its history. This may be showing that the global BSS population differs
significantly between low-density/low-mass clusters and high-density/high-mass clusters. A
lack of stragglers at large distance may be signature of large-scale tidal stripping of the
cluster, which would remove both stragglers that would normally have formed in primordial
binaries in the outer reaches of the cluster and ones that formed in the core but were given
velocity kicks into orbits that would take them into the outer reaches.
The case against a rise at large radius in Palomar 13 is stronger because the cluster
has been surveyed out to 19 core radii, while in NGC 5466, we have only surveyed out to
about 10 core radii (or 7.5 half-mass radii). Still, NGC 5466 probably should have an even
more concentrated distribution of stragglers if its current density structure has existed for
most of its history. In more massive clusters, the secondary rise in straggler frequency is
observed between 8 and 10 rc. Unfortunately, further study of the stragglers in NGC 5466
will probably be complicated by the strong tidal tails observed in the cluster.
5. Conclusions
Examinations of the luminosity functions of globular clusters continue to produce inter-
esting tests of astrophysics. In this study, we found that NGC 5466 has a luminosity function
that is in better overall agreement with theoretical models than the anomalous cluster M30,
which has a similar low metallicity. In addition, we found that the relative numbers of red
giant and main sequence stars may produce a fairly sensitive test of the physics near the
core of red giants — specifically, helium diffusion and Coulomb interactions. However, we
are not yet able to fully explain the differences between sets of theoretical models.
Recent discoveries of large tidal tails associated with NGC 5466 suggest that this cluster
has been strongly disrupted by interactions with the Galaxy. Our measured flat (−1 . x . 0)
main-sequence luminosity function is unusual for a low-metallicity halo cluster. It is, however,
consistent with the emerging picture of mass-segregation followed by tidal stripping.
We have thoroughly re-examined the blue straggler population in the cluster, and de-
tected a total of 94. The radial distribution of stragglers is clearly more centrally concen-
trated than the RGB and HB populations. The frequency of blue stragglers in the cluster is
relatively low — consistent with the observed anti-correlation between frequency and cluster
luminosity, in spite of the cluster’s very low central density.
– 19 –
We would like to thank the anonymous referee for helpful comments on the manuscript,
Y. Jeon for providing us with an electronic copy of his photometric dataset, Y.-C. Kim for
information on the Yonsei-Yale isochrones, and S. Cassisi for providing us with access to
the Teramo set of models. This work has been funded through grants AST 00-98696 and
05-07785 from the National Science Foundation to E.L.S. and M.B.
REFERENCES
Adelberger, E. G., et al. 1998, Rev. Mod. Phys., 70, 1265
Bahcall, J. N., Pinsonneault, M. H., & Wasserburg, G. J. 1995, Rev. Mod. Phys., 67, 781
Behr, B. B. 2003, ApJS, 149, 67
Belokurov, V., Evans, N. W., Irwin, M. J., Hewett, P. C., & Wilkinson, M. I. 2006, ApJ,
637, L29
Bergbusch, P. A. 1996, AJ, 112, 1061
Bjork, S. R., & Chaboyer, B. 2006, ApJ, 641, 1102
Bolte, M. 1994, ApJ, 431, 223
Bono, G., Cassisi, S., Zoccali, M., & Piotto, G. 2001, ApJ, 546, L109
Burles, S., Nollett, K. M., & Turner, M. S. 2001, ApJ, 552, L1
Carretta, E., & Gratton, R. G. 1997, A&AS, 121, 95
Cassisi, S., Salaris, M., Castelli, F., & Pietrinferni, A. 2004, ApJ, 616, 498
Cassisi, S., Salaris, M., & Irwin, A. W. 2003, ApJ, 588, 862
Cho, D.-H., Lee, S.-G., Jeon, Y.-B., & Sim, K. J. 2005, AJ, 129, 1922
Clark, L. L., Sandquist, E. L., & Bolte, M. 2004, AJ, 128, 3019
D’Antona, F., Bellazzini, M., Caloi, V., Pecci, F. F., Galleti, S., & Rood, R. T. 2005, ApJ,
631, 868
Davies, M. B., Piotto, G., & de Angeli, F. 2004, MNRAS, 349, 129
degl’Innocenti, S., Weiss, A., & Leone, L. 1997, A&A, 319, 487
– 20 –
Demarque, P., Woo, J.-H., Kim, Y.-C., & Yi, S. K. 2004, ApJS, 155, 667
Dinescu, D. I., Girard, T. M., & van Altena, W. F. 1999, AJ, 117, 1792
Djorgovski, S., Piotto, G., & Capaccioli, M. 1993, AJ, 105, 2148
Eggleton, P. P., Faulkner, J., & Flannery, B. P. 1973, A&A, 23, 325
Fahlman, G. G., Richer, H. B., & Nemec, J. 1991, ApJ, 380, 124
Ferraro, F. R., Beccari, G., Rood, R. T., Bellazzini, M., Sills, A., & Sabbi, E. 2004, ApJ,
603, 127
Ferraro, F. R., Messineo, M., Fusi Pecci, F., de Palo, M. A., Straniero, O., Chieffi, A., &
Limongi, M. 1999, AJ, 118, 1738
Ferraro, F. R. & Montegriffo, P. 2000, AJ, 119, 1282
Ferraro, F. R., Paltrinieri, B., & Cacciari, C. 1999, Mem. Soc. Astron. Italiana, 70, 599
Ferraro, F. R., Fusi Pecci, F., Cacciari, C., Corsi, C., Buonanno, R., Fahlman, G. G., &
Richer, H. B. 1993, AJ, 106, 2324
Fusi Pecci, F., Ferraro, F. R., Crocker, D. A., Rood, R. T., & Buonanno, R. 1990, A&A,
238, 95
Gallart, C., Zoccali, M., & Aparicio, A. 2005, ARA&A, 43, 387
Gnedin, O. Y., & Ostriker, J. P. 1997, ApJ, 474, 223
Grillmair, C. J., & Johnson, R. 2006, ApJ, 639, L17
Guenther, D. B., Demarque, P., Kim, Y.-C., & Pinsonneault, M. H. 1992, ApJ, 387, 372
Guhathakurta, P., Webster, Z. T., Yanny, B., Schneider, D. P., & Bahcall, J. N. 1998, AJ,
116, 1757
Hargis, J. R., Sandquist, E. L., & Bolte, M. 2004, ApJ, 608, 243
Harpaz, A., & Kovetz, A. 1988, ApJ, 331, 898
Harris, W. E. 1996, AJ, 112, 1487
Hathaway, D. H. 1996, ApJ, 460, 1027
Jeon, Y., Lee, M. G., Kim, S., & Lee, H. 2004, AJ, 128, 287
– 21 –
Johnson, J. A., & Bolte, M. 1998, AJ, 115, 693
Kim, Y., Demarque, P., Yi, S. K., & Alexander, D. R. 2002, ApJS, 143, 499
Landolt, A. U. 1992, AJ, 104, 340
Lejeune, T., Cuisinier, F., & Buser, R. 1998, A&AS, 130, 65
Mateo, M., Harris, H. C., Nemec, J., & Olszewski, E. W. 1990, AJ, 100, 469
McCarthy, J. K., & Nemec, J. M. 1997, ApJ, 482, 203
Nemec, J. M. & Harris, H. C. 1987, ApJ, 316, 172
Nemec, J., & Mateo, M. 1990, ASP Conf. Ser. 11: Confrontation Between Stellar Pulsation
and Evolution, 11, 64
Norris, J. E. 2004, ApJ, 612, L25
Olive, K. A., & Skillman, E. D. 2004, ApJ, 617, 29
Piotto, G., De Angeli, F., King I. R., Djorgovski, S. G., Bono, G., Cassisi, S. , Meylan, G.,
Recio-Blanco, A. , Rich, R. M. & Davies, M. B. 2004, ApJ, 604, 109
Piotto, G., & Zoccali, M. 1999, A&A, 345, 485
Pollard, D. L., Sandquist, E. L., Hargis, J. R., & Bolte, M. 2005, ApJ, 628, 729
Preston, G. W., & Sneden, C. 2000, AJ, 120, 1014
Pritzl, B. J., Venn, K. A., & Irwin, M. 2005, AJ, 130, 2140
Proffitt, C. R., & Michaud, G. 1991, ApJ, 371, 584
Proffitt, C. R., & Vandenberg, D. A. 1991, ApJS, 77, 473
Pryor C. , McClure, R. D., Fletcher, J. M., & Hesser, J. E., 1991, AJ, 102, 1026
Ratcliff, S. J. 1987, ApJ, 318, 196
Renzini, A. & Fusi Pecci, F. 1988, ARA&A, 26, 199
Rogers, F. J., & Nayfonov, A. 2002, ApJ, 576, 1064
Rogers, F. J., Swenson, F. J., & Iglesias, C. A. 1996, ApJ, 456, 902
Rood, R. T., et al. 1999, ApJ, 523, 752
– 22 –
Rosenberg, A., Aparicio, A., Saviane, I., & Piotto, G. 2000, A&AS, 145, 451
Sabbi, E., Ferraro, F. R., Sills, A., & Rood, R. T. 2004, ApJ, 617, 1296
Salaris, M., Riello, M., Cassisi, S., & Piotto, G. 2004, A&A, 420, 911
Salaris, M., & Weiss, A. 2002, A&A, 388, 492
Sandage, A. R. 1953, AJ, 58, 61
Sandquist, E. L., Bolte, M., Langer, G. E., Hesser, J. E., & Mendes de Oliveira, C. 1999,
ApJ, 158, 262
Sandquist, E. L., Bolte, M., Stetson, P. B., & Hesser, J. E. 1996, ApJ, 470, 910
Sandquist, E. L. 2005, ApJ, 635, 73
Schlegel, D. J., Finkbeiner, D. P., & Davis, M. 1998, ApJ, 500, 525
Searle, L., & Zinn, R. 1978, ApJ, 225, 357
Spergel, D. N., et al. 2006, submitted
Stetson, P. B. 2000, PASP, 112, 925
Stetson, P. B. 1991, ASP Conf. Ser. 13: The Formation and Evolution of Star Clusters, 13,
Stetson, P. B. 1990, PASP, 102, 932
Stetson, P. B. 1987, PASP, 99, 191
Straniero, O., Chieffi, A., & Limongi, M. 1997, ApJ, 490, 425
VandenBerg, D. A., Bergbusch, P. A., & Dowler, P. D. 2006, ApJS, 162, 375
Vandenberg, D. A., Bolte, M., & Stetson, P. B. 1990, AJ, 100, 445
VandenBerg, D. A., & Clem, J. L. 2003, AJ, 126, 778
Vandenberg, D. A., Larson, A. M., & de Propris, R. 1998, PASP, 110, 98
VandenBerg, D. A., Richard, O., Michaud, G., & Richer, J. 2002, ApJ, 571, 487
VandenBerg, D. A., Swenson, F. J., Rogers, F. J., Iglesias, C. A., & Alexander, D. R. 2000,
ApJ, 532, 430
– 23 –
Warren, S. R., Sandquist, E. L., & Bolte, M. 2006, ApJ, 648, 1026
Zaggia, S. R., Piotto, G., & Capaccioli, M. 1997, A&A, 327, 1004
Zoccali, M., Cassisi, S., Piotto, G., Bono, G., & Salaris, M. 1999, ApJ, 518, L49
Zoccali, M., & Piotto, G. 2000, A&A, 358, 943
Zinn, R. & West, M. J. 1984, ApJS, 55, 45
Zinn, R. 1980, ApJ, 241, 602
This preprint was prepared with the AAS LATEX macros v5.2.
– 24 –
Fig. 1.— Photometric residuals (in the sense of this study minus those of Landolt 1992 and
Stetson 2000) of primary standard stars. The median residuals are listed in the panels with
the semi-interquartile range (half the magnitude difference between the 25% and 75% points
in the ordered list of residuals) given in parentheses.
– 25 –
Fig. 2.— Photometric residuals (in the sense of the final PSF photometry minus standard
aperture photometry values) of secondary standard stars.
– 26 –
Fig. 3.— Residuals (in the sense of this study minus Stetson 2000) from the star-by-star
comparison. The median residuals are listed in the panels with the semi-interquartile range
(see Fig.1) given in parentheses.
– 27 –
Fig. 4.— Residuals (in the sense of this study minus Rosenberg et al. 2000) from the star-
by-star comparison. The median residuals and the plots versus color have been restricted
to brighter stars (V < 20 and I < 19) to make the comparisons clearer. The numbers in
parentheses are the semi-interquartile ranges (see Fig.1).
– 28 –
Fig. 5.— Residuals (in the sense of this study minus Jeon et al. 2004) from the star-by-star
comparison. The median residuals and the plots versus color have been restricted to brighter
stars (B < 20 and V < 19.5) to make the comparisons clearer. The numbers in parentheses
are the semi-interquartile ranges (see Fig.1).
– 29 –
Fig. 6.— External V magnitude errors σext(V ) as a function of radius and magnitude
determined from artificial star tests, with exponential fits shown by the solid lines.
– 30 –
Fig. 7.— Magnitude biases δ(V ) determined from artificial star tests as a function of radius
and magnitude.
– 31 –
Fig. 8.— Total recovery probability F (V ) determined from artificial star tests as a function
of radius and magnitude.
– 32 –
Fig. 9.— Completeness fraction f(V ) determined from artificial star tests as a function of
radius and magnitude.
– 33 –
Fig. 10.— Color-magnitude diagrams for all stars measured in the KPNO and CFHT images.
The BV fiducial (Table 3) is also plotted in the left panel.
– 34 –
Fig. 11.— Comparison of the observed fiducial sequence of NGC 5466 with the isochrones
of the Teramo, Victoria-Regina, and Yonsei-Yale groups. The isochrones have been shifted
horizontally so that the turnoff colors align, and shifted vertically to align the main sequence
point 0.05 mag redder than the turnoff. On the giant branch, the ages increase from the
reddest to the bluest isochrone.
– 35 –
Fig. 12.— Same as Fig. 11, except for more metal-rich models.
– 36 –
Fig. 13.— Comparison of the observed V−band LF of NGC 5466 with theoretical models
of the Yonsei-Yale, Teramo, and Victoria-Regina groups assuming (m−M)V = 16.
– 37 –
Fig. 14.— Comparison of the observed V−band LF of NGC 5466 with theoretical models
of the Victoria-Regina, Yonsei-Yale, and Teramo groups using magnitude shifts that bring
the main sequence point 0.05 mag redder than the turnoff into alignment. The models have
been normalized to the two bins on either side of the turnoff (V = 19.94).
– 38 –
Fig. 15.— Comparison of the observed LFs of M3, M5, M12, and M30 with theoretical
models of the Teramo group using magnitude shifts that bring the main sequence point 0.05
mag redder than the turnoff into alignment. The models have been normalized to bins on
either side of the turnoff.
– 39 –
Fig. 16.— Fractional difference between the observed RGB-MS number ratios (for M5, M12,
M3, M30, and NGC 5466, from left to right) and the theoretical predictions from the Yonsei-
Yale, Teramo, and Victoria-Regina models. The left panels use the Carretta & Gratton
(1997) metallicity scale, and the right panels use the Zinn & West (1984) scale. The sense
is (observed − theoretical) / observed.
– 40 –
Fig. 17.— The cumulative luminosity function for bright RGB stars derived from the pho-
tometry of Jeon et al. (2004; 286 stars) and from the KPNO dataset (338 stars) presented
here. Dotted lines show fits to the data for stars above and below the position of the apparent
bump.
– 41 –
Fig. 18.— Blue straggler selection for NGC 5466. The stars plotted in each panel show the
entire sample used for the selection: stars from Jeon et al. (2004) in the middle panel are
only those stars outside the CFHT field, and KPNO stars in the right panel are only those
outside the Jeon et al. field. Open squares show stragglers identified by Nemec & Harris
(1987), and open circles are new candidates.
– 42 –
Fig. 19.— Normalized cumulative radial distributions for RGB stars (dashed line), HB stars
(dotted line), and BSSs (solid line).
– 43 –
Fig. 20.— Relative frequencies of blue stragglers as a function of cluster absolute magnitude
and central density. The solid square is NGC 5466, the open squares are globular clusters
from Sandquist (2005), and all other points are from Piotto et al. (2004). Open circles are
post-core-collapse clusters. In the left panel, symbols represent clusters in different ranges of
central density from the Piotto et al. sample: log ρ0 < 2.8: open triangles; 2.8 < log ρ0 < 3.6:
filled triangles; 3.6 < log ρ0 < 4.4: stars; log ρ0 > 4.4: filled circles.
– 44 –
Fig. 21.— Frequency of BSS relative to the integrated V -band flux of detected cluster stars
(top panel) and specific frequency of blue stragglers relative as a function of radius (bottom
panel).
– 45 –
Table 1. Photometric Observation Log for NGC 5466
UT Date Filters N Exposure Time (s) Airmass
1995 May 4 B,V 1,1 60 1.01,1.12
1995 May 4 B 2 300 1.03, 1.03
1995 May 4 B,V 2,1 600 1.0,1.11,1.0
1995 May 5 B,I 2,2 300 1.03,1.01,1.02,1.01
1995 May 9 B,V ,I 1,1,1 60 1.12,1.13,1.14
– 46 –
Table 2. V -Band Luminosity Function
V logN σhigh σlow
13.532 0.5005 0.1761 0.3010
13.832 −0.1015 0.3010 1.0000
14.132 −0.1015 0.3010 1.0000
14.432 0.5975 0.1605 0.2575
14.732 0.8985 0.1193 0.1651
15.032 0.6767 0.1487 0.2279
15.332 0.6767 0.1487 0.2279
15.632 0.5976 0.1606 0.2575
15.932 0.8986 0.1193 0.1651
16.232 1.2410 0.0839 0.1041
16.532 1.0127 0.1063 0.1411
16.832 1.0451 0.1029 0.1351
17.132 1.3306 0.0764 0.0928
17.432 1.4432 0.0678 0.0804
17.732 1.4777 0.0728 0.0876
18.032 1.6136 0.0630 0.0738
18.332 1.7575 0.0540 0.0617
18.632 1.7577 0.0540 0.0617
18.932 1.9624 0.0433 0.0481
19.232 2.2928 0.0301 0.0324
19.532 2.5143 0.0236 0.0249
19.832 2.6996 0.0192 0.0201
20.132 2.7738 0.0178 0.0185
20.432 2.8830 0.0159 0.0165
20.732 2.9327 0.0151 0.0157
21.032 3.0036 0.0142 0.0146
21.332 3.0496 0.0137 0.0142
21.632 3.1179 0.0132 0.0136
– 47 –
– 48 –
Table 3. Fiducial sequence for NGC 5466
V B − V Na
22.198 0.578 518
21.999 0.564 598
21.798 0.549 724
21.596 0.532 710
21.398 0.498 673
21.200 0.470 670
21.004 0.459 653
20.802 0.450 614
20.600 0.431 568
20.400 0.418 546
20.199 0.404 470
20.001 0.395 453
19.805 0.396 382
19.599 0.403 273
19.400 0.436 235
19.206 0.495 155
19.005 0.543 84
18.823 0.586 70
18.588 0.607 43
18.401 0.616 52
18.204 0.621 40
17.989 0.636 34
17.820 0.647 31
17.586 0.654 23
17.403 0.672 20
17.194 0.677 19
16.988 0.693 11
16.784 0.722 10
16.613 0.725 6
16.400 0.752 12
16.199 0.771 16
– 49 –
Table 3—Continued
V B − V Na
16.066 0.779 5
15.825 0.809 7
15.602 0.827 4
15.407 0.857 6
15.111 0.917 1
15.006 0.930 5
14.786 0.946 2
14.590 0.986 11
14.438 1.044 2
aNumber of stars
used to determine
fiducial point
– 50 –
Table 4. RGB-MSTO Number Ratios
Sourcea TO Sample RGB Sample Y NRGB/NMSTO [Fe/H]
NGC 5466 19.682 < V < 20.282 16.982 < V < 18.482 0.162± 0.013
VR 0.235 0.132 −2.22
VR 0.235 0.130 −2.14
T 0.245 0.148 −2.22
T 0.245 0.146 −2.14
YY 0.230 0.162 −2.22
YY 0.230 0.160 −2.14
M3 18.80 < V < 19.40 16.40 < V < 18.00 0.168± 0.008
VR 0.235 0.170 −1.66
VR 0.235 0.158 −1.34
T 0.246 0.181 −1.66
T 0.248 0.170 −1.34
YY 0.230 0.185 −1.66
YY 0.230 0.162 −1.34
M5 19.13 < B < 19.73 16.33 < B < 17.93 0.110± 0.006
VR 0.235 0.106 −1.40
VR 0.235 0.097 −1.11
T 0.248 0.114 −1.40
T 0.251 0.106 −1.11
YY 0.230 0.119 −1.40
YY 0.230 0.105 −1.11
M12 18.14 < V < 18.74 15.59 < V < 17.24 0.158± 0.011
VR 0.235 0.144 −1.40
VR 0.235 0.118 −1.14
T 0.248 0.155 −1.40
T 0.251 0.144 −1.14
YY 0.230 0.150 −1.40
YY 0.230 0.137 −1.14
M30 18.33 < V < 18.93 15.78 < V < 17.43 0.214± 0.017
VR 0.235 0.174 −2.13
VR 0.235 0.153 −1.91
– 51 –
Table 4—Continued
Sourcea TO Sample RGB Sample Y NRGB/NMSTO [Fe/H]
T 0.245 0.173 −2.13
T 0.246 0.165 −1.91
YY 0.230 0.194 −2.13
YY 0.230 0.180 −1.91
aVR: Victoria-Regina models (no diffusion); YY: Yonsei-Yale models (He
diffusion); T: Teramo models (no diffusion). All models are for an age of
12 Gyr.
– 52 –
Table 5. Selected Star Populations in NGC 5466
ID ∆α(′′) ∆δ(′′) B σB V σV I σI Alternate ID Ref.
a Notes
Blue Stragglers
1 137.74 32.86 19.1477 19.0038 605 J
1 137.74 32.86 19.1446 0.0122 19.0182 0.0280 18.7606 0.0400 809 K
2 −12.18 −5.60 18.7591 0.0116 18.5972 0.0061 10313 C
2 −12.18 −5.60 18.8461 0.0147 18.7209 0.0258 18.6245 0.0504 2176 K
3 −8.68 14.99 19.1710 0.0100 18.9595 0.0059 9339 C SX Phe (3)
3 −8.68 14.99 19.4373 0.0155 19.2441 0.0296 19.1073 0.0563 2129 K
4 −77.31 83.56 19.2057 19.0546 646 J
4 −77.31 83.56 19.2266 0.0123 19.0548 0.0256 18.8465 0.0460 2880 K
5 −81.70 65.06 18.5511 18.2697 389 J
5 −81.70 65.06 18.5632 0.0121 18.2740 0.0243 17.8309 0.0255 2895 K
7 −132.96 −3.18 18.8010 18.7046 504 J
7 −132.96 −3.18 18.8482 0.0113 18.7379 0.0217 18.5538 0.0386 3289 K
8 −144.85 3.07 18.8436 18.6640 498 J
8 −144.85 3.07 18.8726 0.0120 18.6345 0.0234 18.3367 0.0298 3358 K
9 −90.90 83.38 19.1757 19.0256 626 J
9 −90.90 83.38 19.1385 0.0144 18.9711 0.0257 18.7508 0.0387 2973 K
10 −44.94 96.04 19.3627 19.1882 751 J
10 −44.94 96.04 19.4145 0.0139 19.2190 0.0261 18.9077 0.0494 2520 K
aPhotometry Sources: K: KPNO data from this paper, C: CFHT data from this paper, J: Jeon et al. (2004)
Note. — The complete version of this table is in the electronic edition of the Journal. The printed edition contains
only a sample.
	Introduction
	Observations and Data Reduction
	Calibration against Primary Standard Stars
	Calibration against Secondary Standard Stars
	Comparison with Previous Studies
	Calculation of the Luminosity Function
	Discussion
	Reddening, Metallicity, Distance Modulus, and Age
	The Color-Magnitude Diagram
	The Luminosity Function
	Relative RGB and MS Numbers
	The RGB Bump
	Mass Function Exponent
	Blue Stragglers
	Conclusions
ABSTRACT
  We present wide-field BVI photometry for about 11,500 stars in the
low-metallicity cluster NGC 5466. We have detected the red giant branch bump
for the first time, although it is at least 0.2 mag fainter than expected
relative to the turnoff. The number of red giants (relative to main sequence
turnoff stars) is in excellent agreement with stellar models from the
Yonsei-Yale and Teramo groups, and slightly high compared to Victoria-Regina
models. This adds to evidence that an abnormally large ratio of red giant to
main-sequence stars is not correlated with cluster metallicity. We discuss
theoretical predictions from different research groups and find that the
inclusion or exclusion of helium diffusion and strong limit Coulomb
interactions may be partly responsible.
  We also examine indicators of dynamical history: the mass function exponent
and the blue straggler frequency. NGC 5466 has a very shallow mass function,
consistent with large mass loss and recently-discovered tidal tails. The blue
straggler sample is significantly more centrally concentrated than the HB or
RGB stars. We see no evidence of an upturn in the blue straggler frequency at
large distances from the center. Dynamical friction timescales indicate that
the stragglers should be more concentrated if the cluster's present density
structure has existed for most of its history. NGC 5466 also has an unusually
low central density compared to clusters of similar luminosity. In spite of
this, the specific frequency of blue stragglers that puts it right on the
frequency -- cluster M_V relation observed for other clusters.

<|endoftext|><|startoftext|>
Quark-Antiquark and Diquark Condensates in Vacuum in a 3D Two-Flavor
Gross-Neveu Model∗
Zhou Bang-Rong
College of Physical Sciences, Graduate School of the Chinese Academy of Sciences, Beijing 100049, China and
CCAST (World Laboratory), P.O.Box 8730, Beijing 100080, China
(Dated:)
The effective potential analysis indicates that, in a 3D two-flavor Gross-Neveu model in vacuum,
depending on less or bigger than the critical value 2/3 of GS/HP , where GS and HP are respectively
the coupling constants of scalar quark-antiquark channel and pseudoscalar diquark channel, the
system will have the ground state with pure diquark condensates or with pure quark-antiquark
condensates, but no the one with coexistence of the two forms of condensates. The similarities and
differences in the interplay between the quark-antiquark and the diquark condensates in vacuum in
the 2D, 3D and 4D two-flavor four-fermion interaction models are summarized.
PACS numbers: 12.38Aw; 12.38.Lg; 12.10.Dm; 11.15.Pg
Keywords: 3D Gross-Neveu model, quark-antiquark and diquark condensates, effective potential
I. INTRODUCTION
It has been shown by effective potential approach that
in a two-flavor 4D Nambu-Jona-Lasinio (NJL) model [1],
even when temperature T = 0 and quark chemical poten-
tial µ = 0, i.e. in vacuum, there could exist mutual com-
petition between the quark-antiquark condensates and
the diquark condensates [2]. Similar situation has also
emerged from a 2D two-flavor Gross-Neveu (GN) model
[3] except some difference in the details of the results [4].
An interesting question is that if such mutual competi-
tion between the two forms of condensates is a general
characteristic of this kind of two-flavor four-fermion in-
teraction models? For answer to this question, on the
basis of research on the 4D NJL model and the 2D GN
model, we will continue to examine a 3D two-flavor GN
model in similar way. The results will certainly deepen
our understanding of the feature of the four-fermion in-
teraction models.
We will use the effective potential in the mean field
approximation which is equivalent to the leading order
of 1/N expansion. It is indicated that a 3D GN model is
renormalizable in 1/N expansion [5].
II. MODEL AND ITS SYMMETRIES
The Lagrangian of the model will be expressed by
L = q̄iγµ∂µq +GS [(q̄q)2 + (q̄~τq)2]
A=2,5,7
(q̄τ2λAq
C)(q̄Cτ2λAq). (1)
All the denotations used in Eq.(1) are the same as the
ones in the 2D GN model given in Ref.[4], except that
∗The project supported by the National Natural Science Founda-
tion of China under Grant No.10475113.
the dimension of space-time is changed from 2 to 3 and
the coupling constant HS of scalar diquark interaction
channel is replaced by the coupling constant HP of pseu-
doscalar diquark interaction channel. Now the matrices
γµ(µ = 0, 1, 2) and the charge conjugate matrix C are
taken to be 2× 2 ones and have the explicit forms
, γ1 =
, γ2 =
It is emphasized that, in 3D case, no ”γ5” matrix can
be defined, hence the third term in the right-handed side
of Eq.(1) will be the only possible color-anti-triplet di-
quark interaction channel which could lead to Lorentz-
invariant diquark condensates, where we note that the
matrix Cτ2λA is antisymmetric. Without ”γ5”, the La-
grangian (1) will have no chiral symmetry. Except this, it
is not difficult to verify that the symmetries of L include:
1. continuous flavor and color symmetries SUf (2) ⊗
SUc(3)⊗ Uf (1);
2. discrete symmetry R: q → −q;
3. parity P : q(t, ~x) → γ0q(t,−~x) and qC(t, ~x) →
−γ0qC(t,−~x);
4. time reversal T : q(t, ~x) → γ2q(−t, ~x) and
qC(t, ~x) → −γ2qC(−t, ~x);
5. charge conjugate C: q ↔ qC ;
6. special parity P1: q(t, x1, x2) → γ1q(t,−x1, x2) and
qC(t, x1, x2) → −γ1q(t,−x1, x2);
7. special parity P2: q(t, x1, x2) → γ2q(t, x1,−x2) and
qC(t, x1, x2) → −γ2qC(t, x1,−x2).
If the quark-antiquark condensates 〈q̄q〉 could be formed,
then the time reversal T , the special parities P1 and P2
will be spontaneously broken [6]. If the diquark conden-
sates 〈q̄Cτ2λ2q〉 could be formed, then the color symme-
try SUc(3) will be spontaneously broken down to SUc(2)
http://arxiv.org/abs/0704.0829v2
and the flavor number Uf (1) will be spontaneously bro-
ken but a ”rotated” electric charge U
(1) and a ”rotated”
quark number U ′q(1) leave unbroken [7]. In addition, the
parity P will be spontaneously broken, though all the
other discrete symmetries survive. This implies that the
diquark condensates 〈q̄Cτ2λ2q〉 will be a pseudoscalar.
In this paper we will neglect discussions of the Gold-
stone bosons induced by breakdown of the continuous
symmetries and pay our main attention to the problem
of interplay between the above two forms of condensates.
III. EFFECTIVE POTENTIAL IN MEAN FIELD
APPROXIMATION
Define the order parameters in the 3D GN model by
σ = −2GS〈q̄q〉 and ∆ = −2HP 〈q̄Cτ2λ2q〉, (3)
then in the mean field approximation, the Lagrangian (1)
can be rewritten by
L = Ψ̄(x)S−1(x)Ψ(x) −
, (4)
where
Ψ(x) =
qC(x)
and Ψ̄(x) =
q̄(x) q̄C(x)
are the expressions of the quark fields in the Nambu-
Gorkov basis [8]. In the momentum space, the inverse
propagator S−1(x) for the quark fields may be expressed
S−1(p) =
6p− σ −τ2λ2∆
−τ2λ2∆∗ 6p− σ
, 6p = γµpµ. (5)
The effective potential corresponding to L given by
Eq.(4) becomes
V (σ, |∆|) = σ
(2π)3
Tr lnS−1(p)S0(p).
Similar to the case of the 2D NG model [4], the calcu-
lations of Tr for (red, green) and blue color degrees of
freedom can be made separately thus Eq.(6) will be re-
duced to
V (σ, |∆|) = σ
(2π)3
p2 − (σ − |∆|)2 + iε
p2 + iε
p2 − (σ + |∆|)2 + iε
p2 + iε
p2 − σ2 + iε
p2 + iε
After the Wick rotation, we may define and calculate in
3D Euclidean momentum space
I(a2) =
(2π)3
p̄2 + a2
a3 arctan
a2Λ − π
, if Λ ≫ |a|, (8)
where Λ is the 3D Euclidean momentum cut-off. Assume
that Λ ≫ |σ − |∆||, Λ ≫ σ + |∆| and Λ ≫ σ, then by
means of Eq.(8) we will obtain the final expression of the
effective potential in the 3D GN model
V (σ, |∆|) = σ
(3σ2 + 2|∆|2)Λ
6σ2|∆|+ 2|∆|3 + σ3
+2θ(σ − |∆|)(σ − |∆|)3
. (9)
IV. GROUND STATES
Equation (9) provide the possibility to discuss
the ground states of the model analytically. The
extreme value conditions ∂V (σ, |∆|)/∂σ = 0 and
∂V (σ, |∆|)/∂|∆| = 0 will lead to the equations
θ(σ − |∆|)(σ − |∆|)2 = 0, (10)
[σ2 − θ(σ − |∆|)(σ − |∆|)2] = 0. (11)
Define the expressions
= AC −B2,
where A, B and C represent the second order derivatives
of V (σ, |∆|) with the explicit expressions
θ(σ − |∆|)(σ − |∆|),
∂σ∂|∆|
∂|∆|∂σ
[σ − θ(σ − |∆|)(σ − |∆|)],
∂|∆|2
θ(σ − |∆|)(σ − |∆|). (12)
Equations (10) and (11) have the four different solutions
which will be discussed in proper order as follows.
(i) (σ, |∆|)=(0,0). It is a maximum point of V (σ, |∆|),
since in this case we have
< 0 and K = A
assuming Eqs. (10) and (11) have solutions of non-zero
σ and |∆|.
(ii) (σ, |∆|)=(σ1 ,0), where the non-zero σ1 satisfies the
equation
= 0. (13)
When Eq. (13) is used, we obtain
K = = A
> 0, if
Hence (σ1,0) will be a minimum point of V (σ, |∆|) when
GS/HP > 2/3.
(iii) (σ, |∆|)= (0, ∆1), where non-zero ∆1 obeys the
equation
= 0. (14)
By using Eq.(14) we may get
, K = A
Obviously, (0,∆1) will be a minimum point of V (σ, |∆|)
when GS/HP < 2/3.
(iv) (σ, |∆|)=(σ2,∆2). In view of existence of the func-
tion θ(σ− |∆|) in Eqs.(10) and (11), we have to consider
the case of σ2 > ∆2 and σ2 < ∆2 respectively.
(a) σ2 > ∆2. In this case, Eqs.(10) and (11) will be-
+ 2∆2
) = 0,
From them we can get
− 2∆2
> 0, K = −
Thus it is turned out that (σ2, ∆2) will be neither a
maximum nor a minimum point of V (σ, |∆|) if σ2 > ∆2.
(b) σ2 < ∆2. Now Eqs. (10) and (11) are changed into
4∆2 + σ2
= 0, (15)
= 0. (16)
Hence we will have the results that
, K =
− 8σ2∆2),
from which it may be deduced that only if
σ2 < (
17− 4)∆2, (17)
(σ2,∆2) is just a minimum point of V (σ, |∆|). On the
other hand, from Eqs. (15) and (16) obeyed by σ2 and
∆2 we may get
13− 1
∆2 + σ2
13 + 1
∆2 − σ2
. (18)
Equation (18) indicates that for the minimum point
(σ2,∆2) satisfying Eq.(17) one will certainly have
GS/HP > 2/3. Taking this and the result obtained in
case (ii) into account we see that if GS/HP > 2/3 the
effective potential V (σ, |∆|) will have two possible min-
imum points (σ1, 0) and (σ2,∆2). To determine which
one of the two minimum points is the least value point
of V , we must make a comparison between V (σ1, 0) and
V (σ2,∆2) with the constraint given by Eq.(17). In fact,
it is easy to find out that when Eq.(13) is used,
V (σ1, 0) = −
, (19)
and that when Eqs. (15) and (16) are used,
V (σ2,∆2) = −
+ 3σ2
. (20)
By comparing Eq.(13) with Eq.(15) we may obtain the
relation
3σ1 = σ2 + 4∆2. (21)
By means of Eqs.(19)-(21) it is easy to verify that
V (σ1, 0)− V (σ2,∆2) = −
(23∆3
− 4σ3
(8∆2 − 7σ2)
when Eq.(17) is satisfied. This result indicates that
when GS/HP > 2/3, the least value point of V (σ, |∆|)
will be (σ1, 0) but not (σ2,∆2).
In summary, if the necessary conditions GSΛ > π
and HPΛ > π
2/8 for non-zero σ and ∆ are satisfied,
then the least value points of the effective potential
V (σ, |∆|) will be at
(σ, |∆|) =
(0,∆1)
(σ1, 0)
0 ≤ GS/HP < 2/3
GS/HP > 2/3
. (22)
As a result, in the ground state of the 3D two-flavor GN
model, depending on that the ratio GS/HP is either big-
ger or less than 2/3, one will have either pure quark-
antiquark condensates or pure diquark condensates, but
no coexistence of the two forms of condensates could hap-
V. CONCLUDING REMARKS
The result (22) in the 3D GN model can be compared
with the ones in the 4D NJL model and in the 2D GN
model. The minimal points of the effective potential
V (σ, |∆|) for the latter models have been obtained and
are located respectively at
(σ, |∆|) =
(0, ∆1)
(σ2, ∆2)
(σ1, 0)
0 ≤ GS/HS < 2/[3(1 + C)]
2/[3(1 + C)] < GS/HS < 2/3
GS/HS > 2/3
with C = (2HSΛ
/π2 − 1)/3 and Λ4 denoting the 4D
Euclidean momentum cutoff in the 4D two-flavor NJL
model, if the necessary conditions GSΛ
> π2/3 and
> π2/2 for non-zero σ and ∆ are satisfied [2], and
(σ, |∆|) =
(0, ∆1)
(σ2, ∆2)
(σ1, 0)
GS/HS = 0
0 < GS/HS < 2/3
GS/HS > 2/3
in the 2D two-flavor GN model [4]. In Eqs.(23) and (24),
GS and HS always represent the coupling constants in
scalar quark-antiquark channel and scalar diquark chan-
nel separately.
By a comparison among Eqs.(22)-(24) it may be found
that the three models lead to very similar results. In
all the three models, the interplay between the quark-
antiquark and the diquark condensates in vacuum de-
pends on the ratio GS/HD (D = S for the 4D and 2D
model and D = P for the 3D model). In particular, the
diquark condensates could emerge (in separate or coex-
istent pattern) only if GS/HD < 2/3. This is probably
a general characteristic of the considered two-flavor four-
fermion models, since in these models the color number
of the quarks participating in the diquark condensates
and in the quark-antiquark condensates is just 2 and 3
respectively. However, there are also some differences
in the pattern realizing the diquark condensates among
the three models, though the pure quark-antiquark con-
densates arise only if GS/HD > 2/3 in all of them. In
the 2D GN model, the pure diquark condensates emerge
only if GS/HS = 0 and this is different from the 4D NJL
model where the pure diquark condensates may arise if
GS/HS is in a finite region below 2/3. Another difference
is that in the 3D GN model, there is no coexistence of the
quark-antuquark condensates and the diquark conden-
sates but such coexistence is clearly displayed in the 4D
and 2D model. This implies that in the 3D GN model,
GS/HP = 2/3 becomes the critical value which distin-
guishes between the ground states with the pure diquark
condensates and with the pure quark-antiquark conden-
sates.
It is also indicated that if the two-flavor four-fermion
interaction models are assumed to be simulations of QCD
(of course, only the 4D NJL model is just the true
one) and the four-fermion interactions are supposed to
come from the heavy color gluon exchange interactions
−g(q̄γµλaq)2 (a = 1, · · · , 8;µ = 0, · · · , D − 1) via the
Fierz transformation [7], then one will find that in all the
three models, for the case of two flavors and three colors
the ratio GS/HD are always equal to 4/3 which is larger
than the above critical value 2/3. From this we can con-
clude that there will be only the pure quark-antiquark
condensates and no diquark condensates in the ground
states of all these models in vacuum.
[1] Y. Nambu and G. Jona-Lasinio, Phys. Rev. 122 (1961)
345; 124 (1961) 246.
[2] Zhou Bang-Rong, Commun. Theor. Phys. 47 (2007) 95.
[3] D.J. Gross and A. Neveu, Phys. Rev. D 10 (1974) 3235.
[4] Zhou Bang-Rong, Commun. Theor. Phys., 47 (2007) 520.
[5] B. Rosenstein, B. J. Warr, and S. H. Park, Phys. Rep. 205
(1991) 59.
[6] Bang-Rong Zhou, Phys. Lett. B444 (1998) 455.
[7] M. Buballa, Phys. Rep. 407 (2005) 205.
[8] Y. Nambu, Phys. ReV. 117 (1960) 648; L. P. Gorkov,
JETP 7 (1958) 993.
ABSTRACT
  The effective potential analysis indicates that, in a 3D two-flavor
Gross-Neveu model in vacuum, depending on less or bigger than the critical
value 2/3 of $G_S/H_P$, where $G_S$ and $H_P$ are respectively the coupling
constants of scalar quark-antiquark channel and pseudoscalar diquark channel,
the system will have the ground state with pure diquark condensates or with
pure quark-antiquark condensates, but no the one with coexistence of the two
forms of condensates. The similarities and differences in the interplay between
the quark-antiquark and the diquark condensates in vacuum in the 2D, 3D and 4D
two-flavor four-fermion interaction models are summarized.

<|endoftext|><|startoftext|>
Introduction
	Throughput of random linear coding
	Performance without pre-coding
	Performance with pre-coding
	Discussion
	References
ABSTRACT
  We assess the practicality of random network coding by illuminating the issue
of overhead and considering it in conjunction with increasingly long packets
sent over the erasure channel. We show that the transmission of increasingly
long packets, consisting of either of an increasing number of symbols per
packet or an increasing symbol alphabet size, results in a data rate
approaching zero over the erasure channel. This result is due to an erasure
probability that increases with packet length. Numerical results for a
particular modulation scheme demonstrate a data rate of approximately zero for
a large, but finite-length packet. Our results suggest a reduction in the
performance gains offered by random network coding.

<|endoftext|><|startoftext|>
Introduction 
The discovery of extrasolar planets during the past decade has confronted astronomers with 
many new challenges. The diverse and surprising dynamical characteristics of many of these 
objects have made scientists wonder to what extent the current theories of planet formation can 
be applied to other planetary systems. A major challenge of planetary science is now to explain 
how such planets were formed, how they acquired their unfamiliar dynamical state, whether 
there are habitable extrasolar planets, and how to detect such habitable worlds. 
 In this respect, one of the most surprising discoveries is the detection of planets in binary star 
systems. Among the currently known extrasolar planet-hosting stars approximately 25% are 
members of binaries (Table 1). With the exception of the pulsar planetary system PSR B1620-26 
(Sigurdsson et al. 2003; Richer et al. 2003; Beer et al. 2004), and possibly the system of 
HD202206 (Correia et al. 2005), planets in these binary systems revolve only around one of the 
stars. While the majority of these binaries are wide (i.e., with separations between 250 and 6500 
AU, where the perturbative effect of the stellar companion on planet formation around the other 
star is negligible), the detection of Jovian-type planets in the two binaries γ Cephei (separation of 
18.5 AU, see Hatzes et al. 2003) and GJ 86 (separation of 21 AU, see Els et al. 2001) have 
brought to the forefront questions on the formation of giant planets and the possibility of the 
existence of smaller bodies in moderately close binary and multiple star systems. Given that 
more than half of main sequence stars are members of binaries/multiples (Duquennoy & Mayor 
1991; Mathieu et al. 2000), and the frequency of planets in binary/multiple systems is 
comparable to those around single stars (Bonavita & Desidera 2007), such questions have 
realistic grounds. 
 At present, the sensitivity of the detection techniques does not allow routine discovery of 
Earth-sized objects around binary and multi-star systems. However, with the advancement of 
new techniques, and with the recent launch of CoRoT and the launch of Kepler in late 2008, the 
detection of more planets (possibly terrestrial-class objects) in such systems is on the horizon.  
Table 1- Binary and multi-star systems with extrasolar planets (Haghighipour 2006) 
Star Star Star Star 
HD142 (GJ 9002) HD3651 HD9826 (υAnd) HD13445 (GJ 86) 
HD19994 HD22049 (εEri) HD27442 HD40979 
HD41004 HD75732 (55 Cnc) HD80606 HD89744 
HD114762 HD117176 (70 Vir) HD120136 (τBoo) HD121504 
HD137759 HD143761 (ρCrb) HD178911 HD186472 (16 Cyg) 
HD190360 (GJ 777 A) HD192263 HD195019 HD213240 
HD217107 HD219449 HD219542 HD222404 (γCeph) 
HD178911 PSR B1257-20 PSR B1620-26 HD202206 
See http://www.obspm.fr/planets for complete list of extrasolar planets with their corresponding 
references. 
                       
Fig 1. The time of ejection, vs. the initial semimajor axis of an Earth-like planet in a co-planar arrangement in 
the γ Cephei system. The binary consists of a 1.59 solar-masses K1 IV subgiant as its primary (Fuhrmann 
2003) and a probable red M dwarf with a mass of 0.41 solar-masses (Neuhauser et al 2007) as its secondary. 
The semimajor axis and eccentricity of the binary are 18.5 AU and 0.36, respectively (Hatzes et al. 2003). The 
primary star is host to a 1.7 Jupiter-masses object in an orbit with a semimajor axis of 2.13 AU and eccentricity 
of 0.12. The habitable zone of the primary is within 3 AU to 3.7 AU from this star (Haghighipour 2006). As 
shown here, the orbit of an Earth-sized object in the primary’s habitable zone is unstable. However, such an 
object can have a log-term stable orbit in distances closer to the primary star. 
Theoretical studies and numerical modeling of terrestrial and habitable planet formation in such 
dynamically complex environments are, therefore, necessary to gain fundamental insights into 
the prospects for life in such systems and have great strategic impact on NASA science and 
missions.  
 Several lines of investigations are needed to ensure progress in understanding the formation 
of terrestrial and habitable planets in binary and multi-star systems. 
                   
Fig 2. Results of simulations of the formation of Earth-like objects in the habitable zone of the 
primary of a binary star system. The stars of the binary are Sun-like and the primary is host to a 
Jupiter-sized object on a circular orbit at 5 AU. Simulations show the results for different values 
of the eccentricity and semimajor axis of the stellar companion (Haghighipour & Raymond 2007). 
As shown here, the orbital motion of the secondary star disturbs the orbit of the giant planet, 
which in turn affects the final assembly and water contents of the terrestrial objects. This figure 
also shows that binary systems with larger perihelia are more favorable for forming and harboring 
habitable planets. The quantities ab  and eb represent the seimmajor axis and eccentricity of the 
binary. 
1) Computational Modeling 
Extensive numerical studies are necessary to  
i) map the parameter-space of binary and/or multiple star systems to identify  
    regions where giant and terrestrial planets can have long-term stable orbits, 
ii) simulate the  collision and growth of planetesimals to form protoplanetary objects, 
iii) simulate the formation of planetesimals in circumbinary and circumstellar disks, 
iii) develop models of protoplanet disk chemistry that ensure delivery of water to  
      terrestrial-class planets in the habitable zone, 
iv) simulate the interaction of planetary embryos and the late stage of terrestrial  
      planet formation. 
The parameter-space is large and includes the masses and orbital parameters of the stars and 
planets. It is, therefore, necessary to develop a systematic approach, based on the results of  
                     
Fig 3. Histograms of the number of final terrestrial planets formed in binary star systems with periastron 
distances of 5 AU (top), 7.5 AU (middle), and 10 AU (bottom). The color red corresponds to simulations in a 
binary in which the primary and secondary stars are 0.5 solar-masses. The color blue represents a binary with 
1 solar-mass stars, and the color yellow corresponds to a binary with a 0.5 solar-masses primary and a 1 solar-
mass secondary. The black line in the middle panel shows the results of simulations when the primary star is 
1 solar-mass and the secondary is 0.5 solar-masses. As shown here, the typical number of final planets clearly 
increases in systems with larger stellar periastra, and also when the companion star is less massive than the 
primary (Quintana et al. 2007). 
current research, to avoid un-necessary simulations, particularly if the computational resources 
are limited.  
 Current research has indicated that terrestrial-class planets can have long-term stable orbits as 
long as they are closer to their host stars and their orbits lie outside the influence zone of the 
giant planet of the system (figure 1, also see Holman & Wiegert 1999; David et al. 2003; 
Haghighipour 2006). This implies, in order for such systems to be habitable, the habitable zone 
of the planet-hosting star has to be considerably closer to it than orbit of its giant planet(s). Given 
that the location of the habitable zone is a function of the luminosity of a star, the above-
mentioned criterion can be used to constrain stellar properties. Recent numerical simulations 
have also shown that (1) water-delivery is more efficient when the perihelion of the binary is 
large and the orbit of the giant planet is close to a circle (figures 2 and 3, also see Quintana et al. 
2007, Haghighipour & Raymond 2007), and (2) habitable planets can form in the habitable zone 
of a star during the migration of giant planets (figure 4, see Raymond, Mandell & Sigurdsson, 
2006). Since many stars are formed in clusters, their mutual interactions may change their orbital 
configurations and cause their giant planets to revolve around their host starts in un-conventional 
orbits. Theoretical studies are essential to identify systems capable of forming and harboring 
habitable planets. 
                                 
Fig 4. Habitable planet formation at presence of giant planet migration (Raymond, Mandell & Sigurdsson 
2006). The system consists of a Sun-like star and a Jupiter-sized giant planet. The figure shows snapshots in 
time of the evolution of one simulation. Each panel plots the orbital eccentricity versus semimajor axis for 
each surviving body. The size of each body is proportional to its physical size (except for the giant planet, 
shown in black). The vertical "error bars" represent the sine of each body's inclination on the y-axis scale. The 
color of each dot corresponds to its water content (as per the color bar), and the dark inner dot represents the 
relative size of its iron core. For scale, the Earth's water content is roughly 10-3. As shown here, an Earth-like 
object can form in the habitable zone of the star while the giant planet migrates to closer distances.  
2) Theoretical Analysis of Observation Data 
Recent observations of binary star systems, using Spitzer Space Telescope, show evidence of 
debris disks in these environments (Trilling et al. 2007). As shown by these authors, 
approximately 60% of their observed close binary systems (separation smaller than 3 AU) have 
excess in their thermal emissions, implying on-going collisions in their planetesimal regions.  
Future space-, air-, and ground-based telescopes such as ALMA, SOFIA and JWST will be able 
to detect more of such disks and will also be able to resolve their fine structures. Numerical 
simulations, similar to those for debris disks around single stars (Telesco et al. 2005), will be 
necessary in order to understand the dynamics of such planet-forming environments, and also 
identify the source of their disks features (e.g., embedded planets, and/or on-going planetesimals 
collision). Due to the complex nature of these systems, such numerical studies require more 
advanced computational codes, and more powerful computers. Developing theories of disk 
evolution in close binary systems is also essential. 
3) Computational Resources 
Given the extent and complexity of simulations of planet formation in multi-star systems, and the 
high dimensionality of the parameter space of initial conditions, supports for developing 
computational resources with the primary focus of conducting numerical analysis of terrestrial 
planet formation are essential. Reliable simulations of collisional growth of planetesimals and 
planetary embryos require integration of the orbits of several hundred thousands of such objects. 
With the current technology, such simulations may take several months to a year to complete. It 
is therefore necessary to develop (i) faster integration routines, and (ii) major computational 
facilities with the primary focus of simulating terrestrial planet formation.     
Strategic Impact to NASA Missions 
Understanding terrestrial and habitable planet formation in binary and multiple star systems has 
implications for investigating the habitability of extra-solar planets. It ties directly into near 
future NASA missions, in particular Kepler, and JWST as well as complementary ongoing and 
planned NSF and privately funded surveys that include transit, and radial velocity. It is also 
closely coupled with the scientific aspect of the Space Exploration Vision and aligns with the 
2006 NASA Science Program implementation of the Strategic sub-goal 3D: 
“Discover the origin, structure, evolution, and destiny of the universe and search for earth-like 
planets.” 
The strategic relevance to the NASA missions is in the prospects for detection of habitable 
Earth-like planets. Studies such as those presented here underlie hypotheses regarding the 
likelihood of the existence of such planets, the origin of life in the habitable zones of their host 
stars, and theories of evolution and persistence of life after initiation, at the presence of a stellar 
companion. Earth-like objects in and around binary star systems allow testing of theories of 
extrasolar habitability and origin of life. Prospects for testing of extrasolar life are intrinsically 
exciting and valuable to the NASA community and the public, and the systems to be explored, 
once found, provide calibration targets for future NASA missions.  
References  
Beer, M. E., King, A. P., &  Pringle, J. E. 2004, MNRAS, 355, 1244 
Bonavita, M., & Desidera, S. 2007, submitted to A&A (astro-ph/0703754) 
Correia, A. C. M., Udry, S., Mayor, M., Laskar, J., Naef, D., Pepe, F., Queloz, D.,  
& Santos, N. C., 2005, A&A, 440, 751 
David, E., Quintana, E. V., Fatuzzo, M., Adams, F. C., 2003, PASP, 115, 825 
Duquennoy, A., & Mayor, M. 1991, A&A, 248, 485 
Els, S. G., Sterzik, M. F., Marchis, F., Pantin, E., Endl, M., & Kurster, M. 2001,  
A&A, 370, L1 
Fuhrmann, K. 2003, Astron.Nachr. 323, 392 
Haghighipour, N. 2006, ApJ, 644, 543 
Haghighipour, N., & Raymond, S. N., to appear in ApJ (astro-ph/0702706) 
Holman, M. J., & Wiegert, P. A. 1999, AJ, 117, 621 
Hatzes, A. P., Cochran, W. D., Endl, M., McArthur, B., Paulson, D. B., Walker, G. A. H., 
Campbell, B., & Yang, S. 2003, ApJ, 599, 1383 
Mathieu, R. D., Ghez, A. M., Jensen, E. L. N., & Simon, M. 2000, in Protostars and Planets IV, 
ed. V. Mannings, A. P. Boss, & S. S. Russell (Tucson: Univ. Arizona Press), 703 
Neuhauser, R., Mugrauer, M., Fukagawa, M., Torres, G., Schmidt, T., 2007, A&A, 462, 777 
Quintana, E. V., Adams, F. C., Lissauer, J. J., Chambers, J. E. 2007, ApJ, to appear in vol. 660 
Raymond, S. N., Mandell, A. M., & Sigurdsson, S., 2006, Science, 313, 1413 
Richer, H. B., Ibata, R., Fahlman, G., G., & Huber, M. 2003, ApJ, 597, L45 
Sigurdsson, S., Richer, H. B., Hansen, B., M., Stairs, I. H., & Thorsett, S. E. 2003,  
Science, 301, 103 
Telesco, C. M., Fisher, R. S., Wyatt, M. C., Dermott, S. F., Kehoe, T. J. J., Novotny. S., 
Mariñas, N., Radomski, J. T., Packham, C., De Buizer, J., Hayward, T. L., 2005,  
Nature, 433, 133 
Trilling, D. E., Stansberry, J. A., Stapelfeldt, K. R., Rieke, G. H., Su, K. Y. L., Gray, R. O., 
Corbally, C. J., Bryden, G., Chen, C. H., Boden, A., Beichman, C. A. 2007, 658, 1289
ABSTRACT
  One of the most surprising discoveries of extrasolar planets is the detection
of planets in moderately close binary star systems. The Jovian-type planets in
the two binaries of Gamma Cephei and GJ 86 have brought to the forefront
questions on the formation of giant planets and the possibility of the
existence of smaller bodies in such dynamically complex environments. The
diverse dynamical characteristics of these objects have made scientists wonder
to what extent the current theories of planet formation can be applied to
binaries and multiple star systems. At present, the sensitivity of the
detection techniques does not allow routine discovery of Earth-sized bodies in
binary systems. However, with the advancement of new techniques, and with the
recent launch of CoRoT and the launch of Kepler in late 2008, the detection of
more planets (possibly terrestrial-class objects) in such systems is on the
horizon. Theoretical studies and numerical modeling of terrestrial and
habitable planet formation are, therefore, necessary to gain fundamental
insights into the prospects for life in such systems and have great strategic
impact on NASA science and missions.

<|endoftext|><|startoftext|>
Introduction and statement of results
The theory of nonlinear dispersive equations (local and global existence, regularity,
scattering theory) is vast and has been studied extensively by many authors. Almost
exclusively, the techniques developed so far restrict to Cauchy problems with initial
data in a Sobolev space, mainly because of the crucial role played by the Fourier
transform in the analysis of partial differential operators. For a sample of results and
a nice introduction to the field, we refer the reader to Tao’s monograph [12] and the
references therein.
In this note, we focus on the Cauchy problem for the nonlinear Schrödinger equa-
tion (NLS), the nonlinear wave equation (NLW), and the nonlinear Klein-Gordon
equation (NLKG) in the realm of modulation spaces. Generally speaking, a Cauchy
data in a modulation space is rougher than any given one in a fractional Bessel poten-
tial space and this low-regularity is desirable in many situations. Modulation spaces
were introduced by Feichtinger in the 80s [6] and have asserted themselves lately as
the “right” spaces in time-frequency analysis. Furthermore, they provide an excellent
substitute in estimates that are known to fail on Lebesgue spaces. This is not entirely
surprising, if we consider their analogy with Besov spaces, since modulation spaces
arise essentially replacing dilation by modulation.
The equations that we will investigate are:
(1) (NLS) i
+∆xu+ f(u) = 0, u(x, 0) = u0(x),
(2) (NLW )
−∆xu+ f(u) = 0, u(x, 0) = u0(x),
(x, 0) = u1(x),
(3) (NLKG)
+ (I −∆x)u+ f(u) = 0, u(x, 0) = u0(x),
(x, 0) = u1(x),
Date: October 30, 2018.
2000 Mathematics Subject Classification. Primary 35Q55; Secondary 35C15, 42B15, 42B35.
Key words and phrases. Fourier multiplier, weighted modulation space, short-time Fourier trans-
form, nonlinear Schrödinger equation, nonlinear wave equation, nonlinear Klein-Gordon equation,
conservation of energy.
http://arxiv.org/abs/0704.0833v1
2 Á. Bényi and K. A. Okoudjou
where u(x, t) is a complex valued function on Rd×R, f(u) (the nonlinearity) is some
scalar function of u, and u0, u1 are complex valued functions on R
The nonlinearities considered in this paper will be either power-like
(4) pk(u) = λ|u|2ku, k ∈ N, λ ∈ R,
or exponential-like
(5) eρ(u) = λ(e
ρ|u|2 − 1)u, λ, ρ ∈ R.
Both nonlinearities considered have the advantage of being smooth. The correspond-
ing equations having power-like nonlinearities pk are sometimes referred to as alge-
braic nonlinear (Schrödinger, wave, Klein-Gordon) equations. The sign of the coeffi-
cient λ determines the defocusing, absent, or focusing character of the nonlinearity,
but, as we shall see, this character will play no role in our analysis on modulation
spaces.
The classical definition of (weighted) modulation spaces that will be used through-
out this work is based on the notion of short-time Fourier transform (STFT). For
z = (x, ω) ∈ R2d, we let Mω and Tx denote the operators of modulation and transla-
tion, and π(z) = MωTx the general time-frequency shift. Then, the STFT of f with
respect to a window g is
Vgf(z) = 〈f, π(z)g〉.
Modulation spaces provide an effective way to measure the time-frequency concen-
tration of a distribution through size and integrability conditions on its STFT. For
s, t ∈ R and 1 ≤ p, q ≤ ∞, we define the weighted modulation space Mp,qt,s (Rd) to
be the Banach space of all tempered distributions f such that, for a nonzero smooth
rapidly decreasing function g ∈ S(Rd), we have
‖f‖Mp,qt,s =
|Vgf(x, ω)|p < x >tp dx
< ω >qs dω
Here, we use the notation
< x >= (1 + |x|2)1/2.
This definition is independent of the choice of the window, in the sense that different
window functions yield equivalent modulation-space norms. When both s = t = 0, we
will simply write Mp,q = Mp,q0,0. It is well-known that the dual of a modulation space
is also a modulation space, (Mp,qs,t )′ = M
p′,q′
−s,−t, where p
′, q′ denote the dual exponents
of p and q, respectively. The definition above can be appropriately extended to
exponents 0 < p, q ≤ ∞ as in the works of Kobayashi [9], [10]. More specifically, let
β > 0 and χ ∈ S such that suppχ̂ ⊂ {|ξ| ≤ 1} and
k∈Zd χ̂(ξ − βk) = 1, ∀ξ ∈ Rd.
For 0 < p, q ≤ ∞ and s > 0, the modulation space Mp,q0,s is the set of all tempered
distributions f such that
|f ∗ (Mβkχ)(x)|p dx
< βk >sq
NONLINEAR DISPERSIVE EQUATIONS ON MODULATION SPACES 3
When, 1 ≤ p, q ≤ ∞ this is an equivalent norm on Mp,q0,s, but when 0 < p, q < 1
this is just a quasi-norm. We refer to [9] for more details. For another definition of
the modulation spaces for all 0 < p, q ≤ ∞ we refer to [5, 15]. For a discussion of
the cases when p and/or q = 0, see [4]. These extensions of modulation spaces have
recently been rediscovered and many of their known properties reproved via different
methods by Baoxiang et all [1], [2]. There exist several embedding results between
Lebesgue, Sobolev, or Besov spaces and modulation spaces, see for example [11], [13];
also [1], [2]. We note, in particular, that the Sobolev space H2s coincides with M
For further properties and uses of modulation spaces, the interested reader is referred
to Gröchenig’s book [8].
The goal of this note is two fold: to improve some recent results of Baoxiang,
Lifeng and Boling [1] on the local well-posedness of nonlinear equations stated above,
by allowing the Cauchy data to lie in any modulation space Mp,10,s, p > dd+1 , s ≥ 0,
and to simplify the methods of proof by employing well-established tools from time-
frequency analysis. Ideally, one would like to adapt these methods to deal with global
well-posedness as well. We plan to address these issues in a future work.
In what follows, we assume that d ≥ 1, k ∈ N, d
< p ≤ ∞, λ, ρ ∈ R and s ≥ 0
are given. With pk and eρ defined by (4) and (5) respectively, our main results are
the following.
Theorem 1. Assume that u0 ∈ Mp,10,s(Rd) and f ∈ {pk, eρ}. Then, there exists
T ∗ = T ∗(‖u0‖Mp,10,s) such that (1) has a unique solution u ∈ C([0, T
∗],Mp,10,s(Rd)).
Moreover, if T ∗ < ∞, then lim sup
t→T ∗
‖u(·, t)‖Mp,10,s = ∞.
Theorem 2. Assume that u0, u1 ∈ Mp,10,s(Rd) and f ∈ {pk, eρ}. Then, there exists
T ∗ = T ∗(‖u0‖Mp,10,s , ‖u1‖Mp,10,s) such that (2) has a unique solution u ∈ C([0, T
∗],Mp,10,s(Rd)).
Moreover, if T ∗ < ∞, then lim sup
t→T ∗
‖u(·, t)‖Mp,10,s = ∞.
Theorem 3. Assume that u0, u1 ∈ Mp,10,s(Rd) and f ∈ {pk, eρ}. Then, there exists
T ∗ = T ∗(‖u0‖Mp,10,s , ‖u1‖Mp,10,s) such that (3) has a unique solution u ∈ C([0, T
∗],Mp,10,s(Rd)).
Moreover, if T ∗ < ∞, then lim sup
t→T ∗
‖u(·, t)‖Mp,10,s = ∞.
Remark 1. In Theorem 1 we can replace the (NLS) equation with the following
more general (NLS) type equation:
(7) (NLS)α i
+∆α/2x u+ f(u) = 0, u(x, 0) = u0(x),
for any α ∈ [0, 2] and p ≥ 1. The operator ∆α/2x is interpreted as a Fourier multiplier
operator (with t fixed), ∆̂
x u(ξ, t) = |ξ|αû(ξ, t). This strengthening will become
evident from the preliminary Lemma 1 of the next section.
Remark 2. Theorems 1.1 and 1.2 of [1] are particular cases of Theorem 1 with p = 2
and s = 0.
4 Á. Bényi and K. A. Okoudjou
2. Fourier multipliers and multilinear estimates
The generic scheme in the local existence theory is to establish linear and nonlinear
estimates on appropriate spaces that contain the solution u. As indicated by the main
theorems above, the spaces we consider here areMp,10,s, and we present the appropriate
estimates in the lemmas below. In fact, we will need estimates on Fourier multipliers
on modulation spaces. As proved in [3] and [7], a function σ(ξ) is a symbol of a
bounded Fourier multiplier on Mp,q for 1 ≤ p, q ≤ ∞ if σ ∈ W (FL1, ℓ∞) (see the
proofs of the following two lemmas for a definition of this space). As we shall indicate
below, this condition can be naturally extended to give a sufficient criterion for the
boundedness of the Fourier multiplier operator on Mp,q0,s for 0 < p, q ≤ ∞ and s ≥ 0.
The notation A . B stands for A ≤ cB for some positive constant c independent of
A and B.
Lemma 1. Let σ be a function defined on Rd and consider the Fourier multiplier
operator Hσ defined by
Hσf(x) =
σ(ξ) f̂(ξ) e2πξ·x dξ.
Let χ ∈ S such that supp χ̂ ⊂ {|ξ| ≤ 1}. Let d ≥ 1, s ≥ 0, 0 < q ≤ ∞, and 0 < p < 1.
If σ ∈ W (FLp, ℓ∞)(Rd), i.e.,
‖σ‖W (FLp,ℓ∞) = sup
‖σ · Tβnχ‖FLp < ∞
for β > 0, then Hσ extends to a bounded operator on Mp,q0,s(Rd).
Proof. We use the definition of the modulation spaces given by (6) (see also [9]). In
particular, let χ ∈ S such that supp χ̂ ⊂ {|ξ| ≤ 1}, and define g ∈ S by ĝ = χ̂2.
Denote g̃(x) = g(−x). For f ∈ S, β > 0, k ∈ Zd and x ∈ Rd we have:
|Hσf ∗ (Mβkg̃)(x)| = |VgHσf(x, βk)|
= |〈σf̂,M−xTβkĝ〉|
= |〈σf̂,M−xTβkχ̂2〉|
≤ |F−1(σ · Tβkχ̂)| ∗ |F−1(f̂ · Tβkχ̂)|(x)
≤ |F−1(σ · Tβkχ̂)| ∗ |f ∗ (Mβkχ̃)|(x).
Now, observe that supp
σ · Tβkχ̂
⊂ Γk := βk + {|ξ| ≤ 1} and supp
f̂ · Tβkχ̂
⊂ Γk.
Moreover, by assumption we know that σ ∈ W (FLp, ℓ∞) and so F−1(σ · Tβkχ̂) ∈ Lp
and f ∗(Mβkχ̃) ∈ Lp. Consequently, by [9, Lemma 2.6] we have the following estimate
‖Hσf ∗ (Mβkg̃)‖Lp ≤ C ‖F−1(σ · Tβkχ̂)‖Lp‖f ∗ (Mβkχ̃)‖Lp,
where C is a positive constant that depends only on the diameter of Γk and p. Clearly,
the diameter of Γk is independent of k, and this makes C a constant depending only
on the dimension d and the exponent p. Therefore, for 0 < q ≤ ∞ we have
‖Hσf‖Mp,q0,s . sup
‖F−1(σ · Tβkχ̂)‖Lp ‖f‖Mp,q0,s = ‖σ‖W (FLp,ℓ∞) ‖f‖Mp,q0,s .
NONLINEAR DISPERSIVE EQUATIONS ON MODULATION SPACES 5
The result then follows from the density of S in Mp,q0,s for p, q < ∞; see [9, Theorem
3.10]. �
We are now ready to state and prove the boundedness of Fourier multipliers that
will be needed in establishing our main results.
Lemma 2. Let d ≥ 1, s ≥ 0, and 0 < q ≤ ∞ be given. Define mα(ξ) = ei|ξ|
1 ≤ p ≤ ∞ and α ∈ [0, 2], then the Fourier multiplier operator Hmα extends to a
bounded operator on Mp,q0,s(Rd).
Moreover, If α ∈ {1, 2} and d
< p ≤ ∞, then the Fourier multiplier operator
Hmα extends to a bounded operator on M
0,s(R
Proof. First, we prove the result when 1 ≤ p ≤ ∞, and 0 < q ≤ ∞. Let g ∈ S(Rd)
and define χ ∈ S by χ̂ = g2. For f ∈ S, we have
|VχHmαf(x, ξ)|
mα(t) f̂(t) e
2πix·t χ̂(t− ξ) dt
mα(t) Tξg(t) < t >
s f̂(t) <ξ>
<t>s <t−ξ>N < t− ξ >
N g(t− ξ) e2πix·t dt
mα(t) Tξg(t)φN(ξ, t) ̂< D >s f(t) TξgN(t) dt
∣∣∣∣F
mα · Tξg φN(ξ, ·) ̂< D >s f TξgN
∣∣∣∣F(mα · Tξg) ∗ F2(φN(ξ, ·)) ∗ F( ̂< D >s f · TξgN)(−x)
∣∣∣∣,
where N > 0 is an integer to be chosen later, gN(t) =< t >
N g(t), φN(ξ, t) =
<t>s <t−ξ>N , and < D >
s is the Fourier multiplier defined by ̂< D >s f(ξ) =< ξ >s
f̂(ξ). We also denote by Φ2,N (ξ, ·) := F2(φN(ξ, ·)) the Fourier transform in the second
variable of φN(ξ, ·)
6 Á. Bényi and K. A. Okoudjou
We can therefore estimate the weighted modulation norm of Hmαf as follows:
‖Hmαf‖Mp,q0,s
|Vχf(x, ξ)|p dx
< ξ >qs
∣∣∣∣F(mα · Tξg) ∗ Φ2,N (ξ, ·) ∗ F( ̂< D >s f · TξgN)(−x)
‖F−1(mα · Tξg)‖qL1 ‖Φ2,N(ξ, ·)‖
‖F( ̂< D >s f · TξgN)‖qLp dξ
≤ sup
‖F−1(mα · Tξg)‖L1 sup
‖Φ2,N(ξ, ·)‖L1
‖F−1( ̂< D >s f · TξgN)‖qLp dξ
≤ sup
‖F(mα · Tξg)‖L1 sup
‖Φ2,N (ξ, ·)‖L1 ‖f‖Mp,q0,s .
Now, it follows from [3, Lemma 8] that, for α ∈ [0, 2],
‖F−1(mα · Tξg)‖L1 := ‖mα‖W (FL1,ℓ∞) < ∞.
Moreover (see, for example, [13, Lemma 3.1] or [14, Lemma 2.1]), we can select a
sufficiently large N > 0 such that
‖Φ2,N (ξ, ·)‖L1 ≤
|Φ2,N (ξ, x))|dx < ∞.
Hence, using (8), we get
‖Hmαf‖Mp,q0,s ≤ Cα‖f‖Mp,q0,s .
To prove the second part of the result we shall use Lemma 1. In particular, we
need to show that for α ∈ {1, 2} and d
< p < 1, mα ∈ W (FLp, ℓ∞). This, however,
follows by straightforward adaptations of the proofs of [3, Theorems 9 and 11], which
we leave to the interested reader. �
In analogy to the proof of the previous lemma, we can prove the following weighted
version of [3, Theorem 16].
Lemma 3. Let d ≥ 1, s ≥ 0, d
< p ≤ ∞ and 0 < q ≤ ∞ be given, and let
m(1)(ξ) =
sin(|ξ|)
|ξ| and m
(2)(ξ) = cos(|ξ|), for ξ ∈ Rd. Then, the Fourier multiplier
operators Hm(1) , Hm(2) can be extended as bounded operators on M
A “smooth” version of Lemma 3 is obtained by replacing |ξ| with < ξ >.
Lemma 4. Let d ≥ 1, s ≥ 0, d
< p ≤ ∞ and 0 < q ≤ ∞ be given, and let
m(ξ) = ei<ξ>, m(1)(ξ) =
sin(<ξ>)
and m(2)(ξ) = cos(< ξ >), for ξ ∈ Rd. Then, the
Fourier multiplier operators Hm, Hm(1), Hm(2) can be extended as bounded operators
on Mp,q0,s.
NONLINEAR DISPERSIVE EQUATIONS ON MODULATION SPACES 7
Proof. It is clear that m,m(1), m(2) are C∞(Rd) functions and that all their derivatives
are bounded. Therefore, m,m(1), m(2) ∈ Cd+1(Rd) ⊂ M∞,1(Rd) ⊂ W (FL1, ℓ∞)(Rd)
[8, 11]. Thus, for 1 ≤ p ≤ ∞, and 0 < q ≤ ∞ the result follows from [3] and Lemma 2.
For d
< p < 1 and 0 < q ≤ ∞, it can be showed that m,m(1), m(2) ∈ Cd+1(Rd) ⊂
W (FLp, ℓ∞)(Rd). Indeed, this follows from obvious modifications to the proof of
the embedding Cd+1(Rd) ⊂ M∞,1(Rd) ⊂ W (FL1, ℓ∞)(Rd) [8, 11]. Furthermore, if
we modify, for example, the multiplier m to mt(ξ) = e
it<ξ>, t ∈ R, we have for
< p ≤ 1
(9) ‖mt‖W (FLp,ℓ∞) ≤ (1 + |t|)d+1,
and similar estimates hold for modified multipliers m
t and m
t . �
Finally, we state a crucial multilinear estimate that will be used in our proofs.
Although the estimate will be needed only in the particular case of a product of
functions (see Corollary 1), we present it here in its full generality that applies to
multilinear pseudodifferential operators.
An m-linear pseudodifferential operator is defined à priori through its (distribu-
tional) symbol σ to be the mapping Tσ from the m-fold product of Schwartz spaces
S × · · · × S into the space S ′ of tempered distributions given by the formula
Tσ(u1, . . . , um)(x)
σ(x, ξ1, . . . , ξm) û1(ξ1) · · · ûm(ξm) e2πix·(ξ1+···+ξm) dξ1 · · · dξm,(10)
for u1, . . . , um ∈ S. The pointwise product u1 · · ·um corresponds to the case σ = 1.
Lemma 5. If σ ∈ M∞,10,s (R(m+1)d), then the m-linear pseudodifferential operator Tσ
defined by (10) extends to a bounded operator from Mp1,q10,s ×· · ·×M
pm,qm
0,s into M
p0,q0
when 1
+ · · ·+ 1
+ · · ·+ 1
= m − 1 + 1
, and 0 < pi ≤ ∞, 1 ≤ qi ≤ ∞
for 0 ≤ i ≤ m.
This result is a slight modification of [4, Theorem 3.1]. Its proof proceeds along
the same lines, and therefore it is omitted here. Note that if σ ∈ M∞,10,s , and we pick
u1 = · · · = um = u (some of them could be equal to ū since the modulation norm is
preserved), p1 = · · · = pm = mp, 0 < p ≤ ∞, and q1 = · · · = qm = 1 we have
(11) ‖Tσ(u, . . . , u)‖Mp,10,s . ‖u‖
Mmp,10,s
. ‖u‖mMp,10,s ,
where we used the obvious embedding Mp,10,s ⊆ M
0,s . The notation A . B stands
for A ≤ cB for some positive constant c independent of A and B. In particular, if
we select σ = 1 (the constant function 1), then σ ∈ M∞,10,s ⊂ M∞,1, and we obtain
Corollary 1. Let 0 < p ≤ ∞. If u ∈ Mp,10,s, then um ∈ M
0,s. Furthermore,
‖um‖Mp,10,s . ‖u‖
Mp,10,s
8 Á. Bényi and K. A. Okoudjou
This is of course just a particular case of the more general multilinear estimate
Mp0,q00,s
‖ui‖Mpi,qi0,s ,
where the exponents satisfy the same relations as in Lemma 1. When we consider
the power nonlinearity f(u) = pk(u) = λ|u|2ku = λuk+1ūk, Corollary 1 becomes
Corollary 2. Let 0 < p ≤ ∞. If u ∈ Mp,10,s, then pk(u) ∈ M
0,s. Furthermore,
‖pk(u)‖Mp,10,s . ‖u‖
Mp,10,s
For a different proof of the estimate in Corollary 2, see [1, Corollary 4.2]. It is
important to note that the previous estimate allows us to control the exponential
nonlinearity eρ as well. Indeed, since
eρ(u) = λ(e
ρ|u|2 − 1)u =
pk(u),
if we now apply the modulation norm on both sides and use the triangle inequality,
we arrive at
Corollary 3. Let 0 < p ≤ ∞. If u ∈ Mp,10,s, then eρ(u) ∈ M
0,s. Furthermore,
‖eρ(u)‖Mp,10,s . ‖u‖Mp,10,s(e
|ρ|‖u‖2
0,s − 1).
3. Proofs of the main results
We are now ready to proceed with the proofs of our main theorems. We will only
prove our results for the power nonlinearities f = pk, by making use of Corollary 2.
The case of exponential nonlinearity f = eρ is treated similarly, by now employing
Corollary 3. In all that follows we assume that u : [0, T )×Rd → C where 0 < T ≤ ∞
and that f(u) = pk(u) = λ|u|2ku.
3.1. The nonlinear Schrödinger equation: Proof of Theorem 1. We start by
noting that (1) can be written in the equivalent form
(13) u(·, t) = S(t)u0 − iAf(u)
where
(14) S(t) = eit∆, A =
S(t− τ) · dτ.
Consider now the mapping
J u = S(t)u0 − i
S(t− τ)(pk(u))(τ) dτ.
It follows from Lemma 2 (see also [3, Corollary 18]) that
‖S(t)u0‖Mp,10,s ≤ C (t
2 + 4π2)d/4 ‖u0‖Mp,10,s ,
NONLINEAR DISPERSIVE EQUATIONS ON MODULATION SPACES 9
where C is a universal constant depending only on d. Therefore,
(15) ‖S(t)u0‖Mp,10,s ≤ CT ‖u0‖Mp,10,s ,
where CT = sup
t∈[0,T )
C (t2 + 4π2)d/4. Moreover, we have
S(t− τ)(pk(u))(τ) dτ
Mp,10,s
‖S(t− τ)(pk(u))(τ)‖Mp,10,s dτ
≤ T CT sup
t∈[0,T ]
‖pk(u)(t)‖Mp,10,s .(16)
By using now Corollary 2, we can further estimate in (16) to get
S(t− τ)(pk(u))(τ) dτ
Mp,10,s
. CT T ‖u(t)‖2k+1Mp,10,s .
Consequently, using (15) and (17) we have
(18) ‖J u‖
C([0,T ],Mp,10,s)
≤ CT (‖u0‖Mp,10,s + cT ‖u‖
Mp,10,s
for some universal positive constant c. We are now in the position of using a stan-
dard contraction argument to arrive to our result. For completeness, we sketch it
here. Let BM denote the closed ball of radius M centered at the origin in the space
C([0, T ],Mp,10,s). We claim that
J : BM → BM ,
for a carefully chosen M . Indeed, if we let M = 2CT‖u0‖Mp,10,s and u ∈ BM , from (18)
we obtain
‖J u‖
C([0,T ],Mp,10,s)
+ cCTTM
2k+1.
Now let T be such that cCTTM
2k ≤ 1/2, that is, T ≤ T̃ (‖u0‖Mp,10,s ). We obtain
‖J u‖
C([0,T ],Mp,10,s)
that is J u ∈ BM . Furthermore, a similar argument gives
‖J u− J v‖
C([0,T ],Mp,10,s)
‖u− v‖
C([0,T ],Mp,10,s)
This last estimate follows in particular from the following fact:
pk(u)(τ)− pk(v)(τ) = λ(u− v)|u|2k(τ) + λv(|u|2k − |v|2k)(τ).
Therefore, using Banach’s contraction mapping principle, we conclude that J has a
fixed point in BM which is a solution of (13); this solution can be now extended up
to a maximal time T ∗(‖u0‖Mp,10,s ). The proof is complete.
10 Á. Bényi and K. A. Okoudjou
3.2. The nonlinear wave equation: Proof of Theorem 2. Equation (2) can be
written in the equivalent form
(19) u(·, t) = K̃(t)u0 +K(t)u1 − Bf(u)
where
(20) K(t) =
sin(t
−∆ , K̃(t) = cos(t
−∆), B =
K(t− τ) · dτ
Consider the mapping
J u = K̃(t)u0 +K(t)u1 − Bf(u).
Recall that f = pk. If we now use Lemma 3 (see also [3, Corollary 21]) for the first
two inequalities below and Corollary 2 for the last estimate, we can write
‖K̃(t)u0‖Mp,10,s ≤ CT‖u0‖Mp,10,s ,
‖K(t)u1‖Mp,10,s ≤ CT‖u1‖Mp,10,s ,
‖Bf(u)‖Mp,10,s ≤ cT CT‖u‖
Mp,10,s
where c is some universal positive constant. The constants T and CT have the
same meaning as before. The standard contraction mapping argument applied to J
completes the proof.
3.3. The nonlinear Klein-Gordon equation: Proof of Theorem 3. The equiv-
alent form of equation (3) is
(22) u(·, t) = K̃(t)u0 +K(t)u1 + Cf(u)
where now
(23) K(t) =
sin t(I−∆)1/2
(I−∆)1/2 , K̃(t) = cos t(I −∆)
1/2, C =
K(t− τ) · dτ.
Consider the mapping
J u = K̃(t)u0 +K(t)u1 + Cf(u).
Using Lemma 4 and the notations above, we can write
‖K̃(t)u0‖Mp,10,s ≤ CT‖u0‖Mp,10,s ,
‖K(t)u1‖Mp,10,s ≤ CT‖u1‖Mp,10,s ,
‖Cf(u)‖Mp,10,s ≤ cT CT‖u‖
Mp,10,s
The standard contraction mapping argument applied to J completes the proof.
NONLINEAR DISPERSIVE EQUATIONS ON MODULATION SPACES 11
References
[1] W. Baoxiang, Z. Lifeng, and G. Boling, Isometric decomposition operators, function spaces Eλp,q
and applications to nonlinear evolution equations, J. Funct. Anal. 233 (2006), no. 1, 1–39.
[2] W. Baoxiang and H. Hudzik, The global Cauchy problem for the NLS and NLKG with small
rough data, J. Diff. Equations 232 (2007), 36–73.
[3] Á. Bényi, K. Gröchenig, K. A. Okoudjou, and L. G. Rogers, Unimodular Fourier multipliers
for modulation spaces, J. Funct. Anal. (2007), to appear.
[4] Á. Bényi, K. Gröchenig, C. Heil, and K. Okoudjou, Modulation spaces and a class of bounded
multilinear pseudodifferential operators, J. Operator Theory 54 (2005), no. 2, 387–399.
[5] Y. V. Galperin, and S. Samarah, Time-frequency analysis on modulation spaces Mp,qm , 0 <
p, q ≤ ∞, Appl. Comput. Harmon. Anal., 16 (2004), 1–18.
[6] H. G. Feichtinger, Modulation spaces on locally Abelian groups, in: “ Proc. Internat. Conf. on
Wavelets and Applications” (Radha, R.;Krishna, M.;Thangavelu, S. eds.), New Delhi Allied
Publishers (2003), 1–56.
[7] H. G. Feichtinger and G. Narimani, Fourier multipliers of classical modulation spaces, Appl.
Comput. Harmon. Anal. 21 (2006), no. 3, 349–359.
[8] K. Gröchenig, Foundations of Time-Frequency Analysis, Birkhäuser, Boston MA, 2001.
[9] M. Kobayashi, Modulation spaces Mp,q for 0 < p, q ≤ ∞, J. Func. Spaces Appl. 4 (2006), no.
2, 329–341.
[10] M. Kobayashi, Dual of modulation spaces, J. Func. Spaces Appl., to appear.
[11] K. A. Okoudjou, Embeddings of some classical Banach spaces into the modulation spaces, Proc.
Amer. Math. Soc., 132 (2004), no. 6, 1639–1647.
[12] T. Tao, Nonlinear dispersive equations: Local and global analysis, CBMS Regional Conference
Series in Mathematics, no. 106, American Mathematical Society, 2006
[13] J. Toft, Convolutions and embeddings for weighted modulation spaces, Advances in pseudo-
differential operators, 165–186, Oper. Theory Adv. Appl. 155, Birkhauser, Basel, 2004.
[14] J. Toft, Continuity properties for modulation spaces, with applications to pseudo-differential
calculus, II, Ann. Global Anal. Geom. 26 (2004), no. 1, 73–106.
[15] H. Triebel, Modulation spaces on the euclidean n−space, Z. Anal. Anwendungen, 2 (1983), no.
5, 443–457.
Árpád Bényi, Department of Mathematics, 516 High Street, Western Washington
University, Bellingham, WA 98225, USA
E-mail address : arpad.benyi@wwu.edu
Kasso A. Okoudjou, Department of Mathematics, University of Maryland, Col-
lege Park, MD 20742, USA
E-mail address : kasso@math.umd.edu
	1. Introduction and statement of results
	2. Fourier multipliers and multilinear estimates
	3. Proofs of the main results
	3.1. The nonlinear Schrödinger equation: Proof of Theorem ??
	3.2. The nonlinear wave equation: Proof of Theorem ??
	3.3. The nonlinear Klein-Gordon equation: Proof of Theorem ??
	References
ABSTRACT
  By using tools of time-frequency analysis, we obtain some improved local
well-posedness results for the NLS, NLW and NLKG equations with Cauchy data in
modulation spaces $M{p, 1}_{0,s}$.

<|endoftext|><|startoftext|>
Introduction 
Arithmetic coding algorithm in its modern version was published in Communications of  ACM in June 1987 
[Witten], but the authors, Ian Witten, Radford Neal and John Cleary, referred to [Abrahamson] as to “the 
first reference to what was to become the method of arithmetic coding”. So we may say that it is known 
“for more than forty years”. The algorithm now is a common knowledge – it was published in numerous 
textbooks (see for example [Salomon, Sayood]), some reviews were published [Bodden, Said], Dr. 
Dobb’s Journal popularized it [Nelson], wiki [wiki] contains an article about it, a lot of sources could be 
found on web... So why one more paper on this subject and what is this “p-adic arithmetic”? 
Let go back to the original idea of arithmetic coding. In arithmetic coding a message is represented as a 
subinterval [b, e) of union semi interval [0, 1). (We will give all definitions later) When a new symbol s 
comes a new subinterval [b(s), e(s)) of [b, e) is constructed. Common method to calculate a new 
subinterval is to divide a current interval into |A| (A is an alphabet, |A|- number of symbols) subintervals, 
each subinterval represents a symbol from A and has length proportional to probability of this symbol. For 
a new symbol s corresponding subinterval [b(s), e(s)) will be return by encoder. Thus encoding is a 
process of narrowing intervals (we will call them message intervals) starting from the union interval:  
 [0, 1) ≡ [b0, e0), [b1, e1), [b2, e2), … , [bt, et) 
where 
0 = b0, ≤  b1  ≤  b2  ≤  … ≤  bt 
1 = e0, ≥ e 1 ≥ e 2  ≥  … ≥ e t 
All bi and ei are real numbers.  
A last constructed subinterval may be used as a final output, or any point x from last subinterval and 
message length. But usually a special symbol EOM (End Of Message), which does not belong to the 
alphabet, is used as termination symbol of a message. In this case only a point x can be used as coding 
result. 
Decoding is also a process on narrowing intervals. It starts with union interval and a point x inside it. 
Decoder finds a symbol by dividing current intervals into |A| subintervals and finds the one that contains 
point x, say [b1, e1).  Corresponding to this interval symbol s1 is pushed into an output buffer; [b1, e1) is 
used as a new current interval. And so on until EOM symbol is received. 
4/5/2007 
But here is a problem – one have to use infinite precision real numbers to implement this algorithm and 
there is no such a thing like effective infinite precision real arithmetic. This problem was always considered 
as a technical one. Solution is simple - just use integers instead. There is a canonical implementation, first 
written in C [Witten], which was later reproduced in other languages, but no analysis of what happens to 
the algorithm after moving it from the field of real numbers to the ring of integer numbers was published.  
In this paper we introduce p-adic arithmetic coding which is based on mapping a message to a path on a p-
tree (a tree with p outgoing branches at each vertex; we also assume that p is a prime number). This path is 
constructed as a common part of paths to the left and right edges of a subinterval [gl(s), gr(s)), where gl(s) 
and gr(s) are from a special equidistant grid G on [0, 1). This semi interval is constructed according to the 
same rules, as in real number arithmetic coding, but in contrast to it, the edges are not arbitrary real 
numbers, but belong to the grid. 
A path on a p-tree can be naturally presented as a p-adic integer number. p-adic distance proved to be a 
natural measure on paths – the longer a common part of two paths, the smaller p-adic distance between 
them. Function ordP, also known as p-adic logarithm, gives length of a common path. 
A path can be also identified by its final point on a grid. A grid point g can be represented by an integer 
index k from a finite integer ring as g=k*|G|-1(here |G| is number of elements in G). The crucial point of 
this algorithm is how we can calculate a path from an index and vice versa – an index from a path. IP 
(Index-Path) mapping, described in this article, presents an elegant and efficient way for this. 
Now we ready to give a brief sketch of how does the algorithm work. As initial step we have to define an 
alphabet A, a model M, an output buffer (it will contain a p-adic integer number B) and a grid G on [0, 1). 
Start with union coding semi interval represented by two indexes l=0 (left) and r=0 (right), and B=0. When 
a new symbol s comes, the model calculates a new subinterval [l(s), r(s)) (l and r are indexes from a finite 
integer ring, while l^ and r^ – paths presented as p-adic integer numbers).  Using IP transformation (we use 
symbol ^ for this transformation) we can calculate p-adic representation of paths to these edges l(s)^ and 
r(s)^. If p-adic distance between them is equal to 1, we continue encoding using [l(s), r(s)) as a new 
current encoding interval. If the distance is less then 1, then l(s)^ and r(s)^ have a common path of length 
c=ordP(l(s)^, r(s)^). That means that path to any point inside [l(s), r(s)) have the same first (least 
significant) c digits as l(s)^. We can push this common path to an output buffer adding them as new most 
significant part of p-adic number B. We can also drop c least significant digit of l(s)^ and r(s)^.  
Both of these operations are possible, because p-adic numbers are read from left to right, i.e. less 
significant digit (those that are multiplied by less powers of p) are in the left part of buffer. This feature of 
p-adic integer numbers explains why p-adic arithmetic coding and decoding are incremental. 
Now we can continue encoding with truncated l(s)^ and r(s)^.  To do this we must calculate new 
subinterval, corresponding to new paths. This also can be done using IP transformation. Encoding will 
continue using [(l(s)^)^, (r(s)^)^) as new current message interval from some grid. This procedure we will 
call PR rescaling. In the case of p=2 this is procedure is similar to well known E1/E2 rescaling [Bodden]. 
But PR rescaling gives a better insight of this mechanism, connects it with p-adic norm and can be used for 
any prime p. Moreover, PR rescaling is more accurate on boundaries and because of this the algorithm is 
able to reproduce Huffman codes for certain models.  We will also generalize E3 rescaling, which is based 
on usual Archimedean norm (absolute value in this case), for any prime p.   
p-adic arithmetic coding algorithm generalizes not only arithmetic coding. For a special class of models, p-
adic coding algorithm works exactly as Huffman’s algorithm [Huffman]. In this models weights of all 
symbols should be equal to p-n, where n some positive numbers and a sum of all weight is equal to 1. In 
other words, they are leaves of a Huffman code tree.  
For a special model and one symbol alphabet p-adic arithmetic coding reproduce Golomb-Rice codes 
[Golomb, Rice]. 
4/5/2007 
Definitions 
Alphabet 
Alphabet A - A non empty set of symbols ai. |A| - number of symbols in A. 
In most examples below 4 symbols alphabet [a, b, c, d] will be used. Other examples: binary alphabet [0, 
1], 128 characters ASCII, alphabet of 256 different eight-bit characters. The last one is used in all tests. 
Even an alphabet containing only one symbol makes sense – as it will be shown, the algorithm creates 
exactly Golomb-Rice [Golomb, Rice] codes in this case. 
Message 
Message M – a sequence of symbols from alphabet A. 
M = (ao, a1, … , ai, … ,an)   
where ai belong to A. 
Example: (a, b, a, a, b, c, d, a). 
Semi interval 
[l, r) – includes l, but not r.  
Notation [,) means that the left point is included to the interval, while the right one is not.  
Below we always deal with subintervals of [0, 1).  
Grid 
Later we will subdivide [0, 1) into PN (P and N are natural numbers) semi intervals of equal length. Each 
has length P-N and can be identified by its left edge. These left edge points form a grid G(PN). Coordinate 
of a point of the grid with index k is evidently kP-N. We will use notation gk(N) for points from G(P
Picture 1. Grid  P2 (P=2) 
These indexes will play an important role in our discussion.  If fact, all calculations will be done using 
indexes. The range of indexes is 
0 ≤ k < PN 
In other words, indexes are nonnegative numbers modulo PN.  Negative numbers are defined in this ring as  
-k = PN - k 
Weight interval 
Following the main idea of arithmetic coding let map alphabet A to a semi interval [0, 1), which we will 
refer as a weight interval. To do this enumerate symbols from A in any order (the order is not important in a 
sense that compression rate does not depend on it) and divide the interval in |A| semi intervals. Semi 
interval [wi, wi+1) corresponds to symbol ai. To make compression effective lengths of these intervals must 
be equal to probability of symbols in a message: 
| wi - wi+1 | = pi 
where pi – probability of symbol ai.  
4/5/2007 
Picture 2. Weight interval 
To define weight intervals we will use notation:  
{ symbol0:[semininterval1), … , symbol|A|-1:[semininterval|A|-1) } 
For example: 
{a:[0, 0.5), b:[0.5, 0.75), c:[0.75, 0.875), d:[0.875,1)} 
Message interval 
Let fix a natural number N, a prime number P and create a grid G(PN). Messages will be mapped to semi 
intervals [l, r) of this interval. 
0 ≤ l < r < 1 
l, r belongs to G(PN). 
Arithmetic coding is just a process of narrowing a message interval. When a new symbol comes, a current 
message interval is divided in |A| subintervals proportional to weight interval and then a subinterval 
corresponding to a new symbol is selected as a new message interval.  Thus starting with [0, 1) (empty 
message) interval we end up with a subinterval corresponding to the whole message.  
For example let see how a short message {a, b, a} may be coded (here we use weight interval from 
previous section): 
Picture 3. Message interval 
We may also present this in a table (using actual values of gk(N)): 
4/5/2007 
Message Semi interval Numerical value
{} [ g0(0) , g0(0) ) [ 0     , 1    ) 
{a} [ g0(1) , g1(1) ) [ 0     , 1/2 ) 
{a, b} [ g2(3) , g3(3) ) [ 2/4  , 3/8 ) 
{a, b, a} [ g4(4) , g5(4) ) [ 4/8  , 5/8 ) 
An important difference from the original idea of real number arithmetic coding is that here we use only 
points from a grid as subintervals edges.  
Coding tree 
Consider a tower of grids 
 G(P0)  <  G(P1)  < G(P2) < … <  G(Pn) … 
By construction, if a point belongs to G(Pn), then it belongs to G(Pn+1), G(Pn+2),…., G(Pk) (k>n). 
G(P0)  consist only of one point. 
Now let construct a coding tree. Start with a root – which is evidently the only point from G(P0) - g0(0) . 
Then comes a first level – points from G(P1).  Link the root g0(0) with points from  G(P
1): g0(1),  g1(1), 
… , gP-1(1). This gives us first level of the coding tree. Now we can continue. Let us assume that the tree 
build up to a level N. To create a new N+1 level, we have to:  
Construct a new grid G(Pn+1) as a new bottom level to the bottom (us usual, tree grows 
downwards).  
Link points from the last level (i.e. points from G(Pn)) to points from G(P n+1) according to the 
following rule:  
gk(n) link to points gk*P(n+1),  g(k+1)*P(n+1), …, g(k+P-1)*P (n+1) 
0 11/21/4 3/4
g0(1) g1(1)
g0(2) g1(2) g2(2) g3(2)
g0(3) g2(3) g4(3)g3(3) g5(3) g6(3) g7(3)
g0(4)
g1(4)
g2(4)
g3(4)
g4(4)
g5(4)
g6(4)
g7(4)
g8(4)
g9(4)
g10(4)
g11(4)
g12(4)
g13(4)
g14(4)
g15(4)
g1(3)
g0(0)
Picture 4. Coding tree  
Here, as in most other illustrations, we use P=2 to simplify drawing.  
Now we can use the grid and the tree to code a simple message {a, b, a} using weights {a:[0, 1/2), b:[1/2, 
3/4), c:[3/4, 7/8), d:[ 7/8, 1)} 
4/5/2007 
Picture 5. Paths on coding tree  
Message Semi interval Path 
{} [ g0(0) , g0(0) ) [ { g0(0) },  { g0(0) }  ) 
{a} [ g0(1) , g1(1) ) [ { g0(0) , g0(1) } ,  { g0(0) , g1(1) }  ) 
{a, b} [ g2(3) , g3(3) ) [ { g0(0), g0(1) , g1(2) , g2(3) },  { g0(0), g0(1) , g1(2) , g3(3) }  ) 
{a, b, a} [ g4(4) , g5(4) ) [ { g0(0), g0(1) ,g1(2) ,g2(3),g4(4)}, { g0(0),g0(1) ,g1(2) ,g2(3),g5(4) } ) 
We need a more convenient way to refer to grid points and tree paths.  Grid points can be easily represented 
as indexes, i.e. well known positive integer numbers, while for paths we will use p-adic integer numbers.   
Representation of paths as p-adic integer numbers 
From any point gk(n) we have P different links to a next (n+1) level. We can mark our next choice with a 
nonnegative integer number m 
0 ≤ mj < P  
0 ≤  j   ≤ n     
Now we can represent any (final) paths on the coding tree as a vector  
M = {m0, m1, m2, … , mn} 
This vector can be mapped to a nonnegative number x 
x= m0P
 + m1P
1 + m2P
2+   … + mnP
This mapping is evidently one to one. The number x may be considered as a p-adic integer number. These 
numbers are not well known among programmers.  One can find an introduction in p-adic numbers in 
[Baker, Koblitz].  A very helpful way to visualize some unusual properties of p-adic mathematic may be 
found in [Holly]. 
The first coefficient m0 tells us to what top level subinterval of [0, 1) the point belongs. The next one m1 – 
to which subinterval of this interval the point belongs, and so on. 
4/5/2007 
Picture 6. p-adic representation of binary tree. 
Level Paths 
0 {} 
1 Paths {0} {1} 
p-adic number 0 1  
2 Paths {0,0} {0,1} {1,0} {1,1} 
p-adic number 0 2 1 3  
3 Path {0,0,0} {0,0,1} {0,1,0} {0,1,1} {1,0,0} {1,0,1} {1,1,0} {1,1,1} 
p-adic number 0 4 2 6 1 5 3 7  
A tree for P=3 is shown in the next picture 
Picture 6.  p-adic representation of a tree; P=3. 
4/5/2007 
Level Paths 
0 {} 
1 Paths {0} {1} {2} 
p-adic number 0 1 2  
2 Paths {0,0} {0,1} {0,2 } {1,0} {1,1} {1,2} {2,0} {2,1} {2,2} 
p-adic number 0 3 6 1 4 7 2 5 8  
In our algorithm we will use p-adic distance. This distance can be defined with the help of p-adic logarithm 
function, usually called as ordP.  
ordP(x) = max number r such that (x % P
r) = 0; x ≠ 0  
p-adic norm is  
 |x|P = 1 / P**ordP(x)      x ≠ 0 
 |0| = 0. 
and distance 
dP(x,y) = |x – y|P 
It can be shown [Koblitz] these are “real” norm and distance, i.e. all three axioms are valid for them. 
For two paths x and y 
x= m0P
 + m1P
1 + m2P
2+   … + mnP
y= k0P
 + k1P
1 + k2P
2+   … + knP
ordP(x-y) gives a number of common links. dP(x,y) have a very intuitive meaning – the greater number of 
common links have two paths, the closer they are in terms of p-adic distance.  
Index representation 
Now return to the first method of mapping paths – by end points. An end point belongs to a grid G(Pn), so 
it is defined by its index a which is just a plain nonnegative integer number. How this index is connected to 
the paths leading to this point?  
Let x be a path 
x= m0P
 + m1P
1 + m2P
2+   … + mnP
0 ≤ mj < P  
0 ≤ j   ≤ n     
Then the path x ends at a point g at level n 
g = m0P
 + m1P
-2 + m2P
-3 +   … + mnP
-n-1 
We just negate powers and subtract 1. It can be proved by simple induction. 
Let a be an index corresponding to point g on the grid. 
g = aP-n-1 
a= m0P
 + m1P
n-1 + m2P
n-2+   … + mnP
We can rewrite a in usual form  
a= u0P
 + u1P
1 + u2P
2+   … + unP
If we consider x and a as integer numbers then mapping is just reversing vectors of coefficients.  This 
mapping we will call IP (Index-Path) mapping. 
4/5/2007 
We will introduce some useful notations in the next section and use this feature intensively in algorithms: 
we can perform ordinary integer arithmetic operations on indexes calculating new subinterval and 
immediately get paths to them.  
Mapping paths to points was considered in a more general form by S.V. Kozyrev [Kozyrev]. Following his 
notation we will use symbol ρ for it. The ρ mapping has a very important feature 
|ρ(x) - ρ(y)| ≤ |x – y|P  
were |ρ(x) - ρ(y)| is Archimedean (our usual) distance which in our case is absolute value. This means that 
if two paths are close to each other then corresponding end points are also close in our usual Archimedean 
norm. A proof of this can be found in [Kozyrev]. 
ρ mapping is not one to one, IP (Index-Path), mapping which deals with finite sums, is one to one 
mapping. Let, as previously a and g be 
a= m0P
 + m1P
n-1 + m2P
n-2+   … + mnP
g = aP-n-1 
To what subinterval of level 1 of [0, 1) the point belongs to? It depends on the value of m0 only. 
Straightforward calculations   
n-1 + m2P
n-2+   … + mnP
0 ≤ (P-1)( P0 + P
1 +  P2+   … +  Pn-1) = 
(P-1)((Pn -1)/(P-1)) =(Pn -1)  
show that a sum of all less significant terms can’t move a point to another subinterval. With m0 fixed, we 
can conclude that m1 solely defines subinterval inside the second subinterval and so on. 
Let us continue with the example (with P=2) above by adding indexes of grid points to the table: 
Level Paths 
0 {} 
1 Paths {0} {1} 
p-adic number 0 1 
Index 0 1 
Points 0 1/2  
2 Paths {0,0} {0,1} {1,0} {1,1} 
p-adic number 0 2 1 3 
Index 0 1 2 3 
Points 0 1/4 2/4 3/4  
3 Paths {0,0,0} {0,0,1} {0,1,0} {0,1,1} {1,0,0} {1,0,1} {1,1,0} {1,1,1} 
p-adic number 0 4 2 6 1 5 3 7 
Index 0 1 2 3 4 5 6 7 
Points 0 1/8 2/8 3/8 4/8 5/8 6/8 7/8  
As an illustration let us compare  Archimedean and  p-adic distances for the following three points g2(3) 
(or {0, 1, 0}), g3(3) ({0,1,1}) and g4(3) ({1, 0, 0}). Archimedean distance we all used to: 
Points (p-adic) 2 3 4 
2 0 1/8 1/4 
3 1/8 0 1/8 
4 1/4 1/8 0 
p-adic logarithm (ord) has values: 
4/5/2007 
Points (p-adic) 2 6 1 
2  2 0 
6 2  0 
1 0 0  
and p-adic distances are: 
Points (p-adic) 2 6 1 
2 0 1/4 1 
6 1/4 0 1 
1 1 1 0 
From this we may observe that the greater is a common path, the closer are points in p-adic norm.  
Operators ^ and [] 
Let us define operator ^ to transform points from index representation to path representation, or, in other 
words, from nonnegative integers modulo PN to p-adic integers, and back, from path to index 
representation. 
x = a^ 
It is convenient to rewrite a and x in form of scalar product. 
Consider N+1 element vectors MN and PN 
MN = (m0, m1, m2, … , mN)    
where 0 ≤ mj < P,  0 ≤  j  ≤ N 
PN = (P
0, P1, P2,   … ,  PN) 
Then 
x = m0P
 + m1P
1 + m2P
2+   … + mNP
can be represented as scalar product of two vectors 
x = (MN • PN
T – as usual, means operation of matrix transposition (i.e. changing rows to columns).  While 
a= m0P
 + m1P
N-1 + m2P
N-2+   … + mNP
a = (MN
R  • PN
T) = (MN
 • (PN
 R )T) 
Here R means reverting elements of a vector.  It is obvious that operator ^ is idempotent 
x^^ = a^ = x 
The important thing about this trivial operation is that we can perform arithmetic operation on points of a 
grid, and then immediately find a path to it by applying operator ^, and vice versa, for a given path we can 
find a corresponding grid point.  
It is convenient to define operator [] as coefficient in scalar representation: 
Let x be as previously  
x = m0P
 + m1P
1 + m2P
2+   … + mNP
Then  
4/5/2007 
x[i] = mi 
It is easy to see that 
x[i] = x^[N-i] 
Mapping subintervals to paths 
Now any subinterval can be mapped to a pair of paths on a coding tree, provided that edge points of 
subintervals belong to some G(PN).  We will use notation [ , ] for intervals presented as a pair of indexes, 
where l ≤ r and [l^, r^] – as a pair of paths and  [ , ) for pairs paths to subintervals. A simple fact, just to 
note: 
if an interval [l, r] lies inside an interval [l1, r1], then dP(l1^, r1^) ≤ dP(l^, r^) 
In other words, paths to subinterval’s edges are closer then paths to enveloping interval. So, if an interval’s 
edges have a common path, then paths to edges of any subinterval have at least the same or even longer 
common paths. A length of common path can be calculated as ordP(r^ - l^). As an example see Picture 5. 
Let discuss in more details the rightmost semi interval, i.e. a subinterval which ends at point 1. This point 
has index equal to PN. Because we are working in the ring on integers numbers mod PN the index is equal 
to 0 in this ring.  So path to 1 has the form {0, 0, … , 0}, an general form of the rightmost interval is [l, 0]. 
What is a p-adic length of a rightmost interval? By definition 
dP(l,0) = | 0 – l |P = | –l |P 
And common length is ordP(-l). 
What is a “negative” path in our case? Path is always a path to some point on a grid. We use indexes for 
representing them. So a negative path may be defined as a path to a point, represented by negated index. 
-l = (-(l^))^  
By definition negative numbers in ring mod PN are 
l^ + -(l^) = 0   mod PN 
Common paths 
Consider common part of all paths to points of semi interval [l, r) (l and r are indexes here); both of them 
belongs to G(PN).  All these paths end at corresponding points pi 
 l ≤ pi ≤ r-1   
(remember that r does not belong to subinterval). What is a common path to all these point? 
First consider length of a common path.  To find it we may first find maximum p-adic distance among all 
pairs: 
max(|p^ - q^|P) ;        l ≤ p < q ≤ r-1   
We can use ultrametric feature of p-adic norm (see, for example, [Koblitz, Holly]): 
|x-y|P <= max(|x|P, |y| P)   
In our case we can use it as 
(|p^ - q^|P)  = |(p^ - l^) + (l^ - q^)|P ≤  max(|(p^ - l^)|P , |(l^ - q^)|P) 
l ≤ p < q ≤ r-1   
So all we need is to find 
max(|(pi^ - l^)|P) ;       l < pi  ≤ r-1   
which, by construction, is: 
4/5/2007 
|(r -1)^ - l^|P 
Now we can calculate length of a common path. Special case l = r – 1 is important but trivial – the length 
here is simply a length of l. If l ≠ r – 1 then it is equal to ordP((r-1)^ - l^). Because function ordP is not 
defined for zero argument we introduce a function com, defined on p-adic numbers: 
comP,N(l, r) = N if ( l == r ) else ordP(r - l)  
If l, r belongs to G(PN) then length of common path is calculated as comP,N(l^, r^). 
Finally we have: 
Paths to points of semi interval [l, r) have a common path of length comP,N(l^, (r-1)^). 
Common path is a sub path of length comP,N(^, (r-1)^) of l^ starting from root. 
In the following table examples different intervals of level 2 from Picture 6 and their common paths are 
shown. 
l^ r^ l r r-1 l (r-1) - l com2,2( l^, r^ ) Common path 
{0, 0} {0, 1} 0 1 0 0 0 2 {0, 0} 
{0, 0} {1, 0} 0 2 1 0 2 1 {0} 
{0, 0} {1, 1} 0 3 2 0 1 0 {} 
{0, 0} {0, 0} 0 0 3 0 3 0 {} 
{0, 1} {1, 0} 1 2 1 2 0 2 {0, 1} 
{0, 1} {1, 1} 1 3 2 2 3 0 {} 
{0, 1} {0, 0} 1 0 3 2 1 0 {} 
{1, 0 } {1, 1} 2 3 2 1 1 2 {1, 0} 
{1, 0} {0, 0} 2 0 3 1 2 1 {1} 
{1, 1} {0, 0} 3 0 3 3 0 2 {1, 1} 
Rescaling based on P-adic distance (PR)  
Consider two paths x = {0, 0, 1} and y = {0, 1, 1} or, as p-adic numbers: 
x = 0*20 + 0*2
1 + 1*22 = 4 
y = 0*20 + 1*2
1 + 1*22 = 6 
or, as grid points 
x^*2-3 = (1*20 + 0*2
1 + 0*22)/8 = 1/8 
y^*2-3 = (1*20 + 1*2
1 + 0*22)/8 = 3/8 
Because all subintervals in coding process will be inside [1/8, 3/8) all subsequent intervals will be inside it 
(this is how coding works), that means that all paths to these subintervals will have a common part. We can 
calculate common path of x and y according to the procedure described above: 
y^ -1 = 1*20 + 1*2
1 + 0*22 = 6 
(y^ -1)^ = 0*20 + 1*2
1 + 0*22 = 2 
(y^ -1)^ - x = 2 
And finally 
com2,3(x, y) = ord2(2) = 1
We can store this path as a vector of coefficients and proceed with remaining part.  
To make further descriptions shorter we introduce two operators: extracting and rescaling 
Extracting is a trivial operator - it creates a vector of the first j coefficients of x: 
4/5/2007 
x= m0P
 + m1P
1 + m2P
2+   … + mnP
ext(x, j) = {m0, m1, m2, … , mj-1} 
if second argument is omitted, then all coefficients are extracted: 
ext(x) = {m0, m1, m2, … , mn} 
One more operation on vector representation: 
cut(x, n, m)  
removes m bit starting with position n and shrink the vector of coefficients. 
Rescaling is just omitting first j terms in x and removing common factor Pj, 
res(x, j)  = mjP
 + mj+iP
1 + m j+2P
 2+   … + mnP
Why do we call it rescaling? Because we can continue with level n-j as a first level and do not care about 
previous steps. 
Let see what happens with corresponding index 
res(x, j)^ = mjP
 + mj+iP
n-j+1 + m j+2P
n-j+2+   … + mnP
As an integer number it is smaller then the original one. Rescaling keeps numbers from growing and makes 
it possible to use computer’s integer arithmetic (not infinite precision) for calculations, which makes this 
algorithm robust. 
Continuing our example (do not forget remove common factor!)     
res(x,1) = (0*21 + 1*22) / 2 = 0*20 + 1*21 
res(y,1) = (1*21 + 1*22) / 2 = 1*20 + 1*21 
Indexes will be 
res(x,1)^ = 1 
res(y,1)^ = 3 
and grid points 
res(x,1)^*2-2 = 1/4 
res(y,1)^*2-2 = 3/4 
We can show how this works for weight interval 
{c:[0, 0.125), b:[ 0.125, 0.375), d:[0.375, 05), a:[0.5, 1)} 
and message {b}. 
4/5/2007 
Picture 7. Rescaling 
In arithmetic coding analogous procedure (see [Bodden]) is called E1/E2. We will call this rescaling PR 
(p-adic rescaling).  
Trivial, but important case is when an interval occupies a whole subinterval of level K. In this case x^ = y^-
1 and the interval can be rescale on full length to starting interval [0, 1).  This fact will be used later in 
discussion how p-adic coding corresponds to Huffman algorithm.  
Lifting 
In all our previous considerations and examples we use grids of minimal level. We may as well fix a level 
deep enough to perform all calculations. In fact, adding or removing trailing zeros in path representation 
does not change p-adic representation of a point, but, of course, changes its index representation. We will 
call operation of adding or removing zeros lifting. The reason for this name is that on a picture it looks like 
moving points in vertical direction.  
There are several advantages of using fix level in calculations: 
• It may be easy and more efficient coded, especially for P=2. 
• Model may be unable to present results as numbers of current ring G(PN); in this case special 
procedure must be implemented for changing level on a model’s demand. 
Let x to be from G(PN). Lifting is a mapping x to G(PN+f) 
x = m0P
 + m1P
1 + m2P
2+   … + mNP
lift(x, j) => x = m0P
 + m1P
1 + m2P
2+   … + mNP
N + 0•PN+1+   … + 0•PN+j  
where j ≥ 0 
Evidently as an integer number x does not change, but as an index it changes dramatically. Important, but 
trivial feature of lifting is that it does not change common paths. 
Lifting can be defined also for negative argument: 
lift(x, -j) => x = m0P
 + m1P
1 + m2P
2+   … + mN-jP
N-j  
where j ≥ 0 
4/5/2007 
If last j coefficient were zero, negative lifting also does not change x as an integer number. To use negative 
lifting without changing results we need to know an order of the last non zero coefficient: 
lnz(x) = min( j: mk =0 for k>j) 
This function gives us the highest possible for x level. A semi interval [x y) may be positioned at level  
hpl(x,y) = max(lnz(x), lnz(y)) 
Our procedure for calculating common path length of a semi interval was defined for intervals at hpl level. 
To extend it for the case when an interval belongs to a fixed level we need first to lift it back to hpl. 
comP,N( x, y ) = N if ( x == y ) else ordP(x - y)   
comP,N( x, y ) = comP,hpl(x,y) (lift(x, hpl(x,y) - N ), lift(y, hpl(x,y)  -N ) )  
Fortunately we do not have to go in that complication. The reason for this is that lifting does not change 
number of common paths.  
Let explore previous example restricting all calculations to level 4.  
0 11/21/4 3/4
{0} {1}
{0, 0} {0, 1} {1, 0} {1, 1}
{0, 0, 0} {0, 1, 0} {1, 0, 0}{0, 1, 1} {1, 0, 1} {1, 1, 0} {1, 1, 1}
{0, 0, 0, 0}
{0, 0, 0, 1}
{0, 0, 1, 0}
{0, 0, 1, 1}
{0, 1, 0, 0}
{0, 1, 0, 1}
{0, 1, 1, 0}
{0, 1, 1, 1}
{1, 0, 0, 0}
{1, 0, 0, 1}
{1, 0, 1, 0}
{1, 0, 1, 1}
{1, 1, 0, 0}
{1, 1, 0, 1}
{1, 1, 1, 0}
{1, 1, 1, 1}
Before rescaling
After rescaling
{0, 0, 1}
Picture 7a. Rescaling on level 4 
Consider two paths x and y from the previous example, but fixed the level equal to 4. On this level x and y 
can be presented as {0, 0, 1, 0} and {0, 1, 1, 0} or, as p-adic numbers: 
x = 0*20 + 0*2
1 + 1*22 + 0*23 = 4 
y = 0*20 + 1*2
1 + 1*22 + 0*23= 6 
To determine common path length we need to calculate 
y^ -1 = 5  
(y^ -1)^ = 10 
(y^ -1)^ - x = 6 
And finally 
4/5/2007 
com2,4(x, y) = ord2(6) = 1
After rescaling we have new x and y: 
x = 0*20 + 1*21 + 0*22 = 2 
y = 1*20 + 1*21 + 0*22 = 3 
But they belong to level 3. To return x and y back to level 4 lifting is needed:  
lift(x, 1) = 0*20 + 1*21 + 0*22 + 0*23  
lift(y, 1) = 1*20 + 1*21 + 0*22 + 0*23 
Finding the shortest path point 
When coding is over we can choose any paths to any point from a final semi interval as a result. But points 
from the same semi interval may and have different paths after dropping trailing zeros. Let take a simple 
example when a message finally ends with semi interval [g5(4), g10(4)). Because we can drop trailing 
zeros, point g8(4) is the best choice – after dropping trailing zeros it becomes g1(1). 
0 11/21/4 3/4
g0(0)
g0(1) g1(1)
g0(2) g1(2) g2(2) g3(2)
g0(3) g2(3) g4(3)g3(3) g5(3) g6(3) g7(3)
g0(4)
g1(4)
g2(4)
g3(4)
g4(4)
g5(4)
g6(4)
g7(4)
g8(4)
g9(4)
g10(4)
g11(4)
g12(4)
g13(4)
g14(4)
g15(4)
g1(3)
Picture 8. Shortest path point 
A shortest path point in a semi interval [l, r) can be defined as a point with minimum level.  
lv(x^) = max(i: mi ≠ 0)   
g = min(lv(x^): l ≤ x < r )  
But let consider paths as integers. From this point of view a point with minimal path is just a minimal p-
adic integer. So 
g = min(x^: l ≤ x < r) 
We can check this for our example: 
Point g5(4) g6(4) g7(4) g8(4) g9(4) g10(4) 
Path {0, 1, 0, 1} {0, 1, 1, 0} {0, 1, 1, 1} {1, 0, 0, 0} {1, 0, 0, 1} {1, 0, 1, 0} 
p-adic number 10 6 15 1 9 5 
Index 5 6 7 8 9 10 
4/5/2007 
Model 
Model is just an abstraction for a set of functions. One function calculates new subinterval on a base of 
incoming symbol and current interval in a predefined grid.  
M.code(a, l, r) => lnew, rnew  
An other takes as arguments a point and current interval and returns a new subinterval and a symbol  
M.decode(g, l, r ) => lnew, rnew, a 
l, r, g belongs to grid G(PN), a – to alphabet A. 
Model operates with indexes from a ring of nonnegative integers modular PN, so we have three possible 
variants how one subinterval on a ring can be situated inside another: 
0 ≤ l ≤ lnew < rnew  ≤  r <  P
N     
r=0 ; 0 ≤ l ≤ lnew < rnew <  P
N     
r=0 ;  rnew = 0; 0 ≤ l ≤ lnew <  P
N     
And, of course some technical things: initialization and taking care of end of message. 
M.init(A, P, N, x) 
Where A is alphabet, P, N – characteristics of grid G(PN), x – optional parameter, some auxiliary 
information, which may be used by a model for optimization. 
code and decode functions may update model, but they must do it in sync. 
Input and Output 
I (input) and O (Output) are abstracts for pushing and receiving information.  
To make notations short we introduce an ugly term P-bit, which means one of symbols 0, … P-1. For P=2 
it is obviously a normal bit. Now let describe input and output operations. 
I.getC => returns next character from input stream or EOM (End Of Message) 
I.getB(n) => returns  next n P-bit vector from input stream 
O.pushB(U ) – pushes all P-bits from vector U 
O.pushB(p, n) – pushes P-bit p n times 
O.pushC(a) – pushes a symbol to an output stream 
Algorithms 
Now we are in position to describe the p-adic coding algorithm. Main idea of this algorithm is the same as 
in arithmetic coding – a message is mapped to in interval on [0, 1). 
There two parts of the algorithm – encoding and decoding, but whatever we are doing the first step – 
initialize a model: 
M.init(A, N) 
Coding 
Start with an empty message – no symbols. An empty message is coded as [0, 1), empty path U = {} or as 
[0, 0). 
l, r = 0, 0 
When a symbol a comes  
a = I.getC 
model calculates a new interval.  
4/5/2007 
l, r = M.code(a, l, r) 
Now calculate a common path length 
 n = comP,N(l^, (r-1)^) 
If n > 0 we can push common path to an output  
O.pushB(ext (l^, n)) 
and do rescaling. 
l^, r^ = res(l^, n), res(r^, n) 
And we also need to lift rescaled values back to level N and convert to index representation. 
l, r = lift(l^, n)^, lift(r^, n)^ 
Now we can read a next symbol and repeat steps. 
Pseudo code 
M.init(A, P, N) 
l, r = 0, 0 
while ( ( a = I.getC ) != EOM ) {  
     l, r = M.code(a, l, r) 
     n = comP,N(l^, (r-1)^) 
      if ( n > 0 )  { 
             O.pushB(ext(l^, n)) 
             l, r = lift(res(l^, n), n)^, lift(res(r^, n), n)^ 
      } //if 
} //while 
l, r = M.code(EOM, l, r) 
q = selectPoint(l, r) 
O.pushB(ext(q, lnz(q)) 
We do not specify here what selectPoint does. The only requirement is to return a grid point from final 
semi interval [l, r), but of cause, it’s a good idea to return a point with a shortest paths. As it follows from 
previous discussion, all we need is to find a minimal integer in p-adic representation. So, to select a point 
with minimal path we should define  
selectPoint( l, r ) = min(x^: l ≤ x < r) 
lnz used here not to push trailing zeros. 
Decoding 
Start with an empty message – no symbols. An empty message is coded as [0, 1), or as empty path U={} or 
as pair of indexes: 
l, r = 0, 0 
As the first step read first N P-bits from an input stream and construct a number from the vector. We need 
also to transform a path we a getting from a stream, to a number, so we use operator ^. 
g= (I.getB(n) • PTN)^ 
where PTN is a vector  
PN = (P
0, P1, P2,   … ,  PN) 
Model calculates a new interval and a symbol a 
M.decode(g, l, r ) => l, r, a 
Now, a is a new decoded symbol and can be pushed into a stream of decoded symbols  
4/5/2007 
O.pushC(a)  
Next, as in the coding algorithm, calculate common path length 
n = comP,N(l^, (r-1)^) 
If n > 0 we can drop common path and do rescaling. 
l, r, g = lift(res(l^, n), n)^, lift(res(r^, n), n)^, lift(res(g^, n))^  
read additional n P-bits and recalculate g 
g = g + (I.getB(n) • PTn)^ 
Now we can repeat all steps. 
Pseudo code 
M.init(A, P, N) 
l, r = 0, 0 
g = (I.getB(n) • PTN)^ 
while ( true ) {  
     l, r, a = M.decode(g, l, r ) 
     if ( a == EOM ) break 
     O.pushC(a) 
     n = comP,N(l^, (r-1)^) 
     if ( n > 0 ) { 
           l, r, g = lift(res(l^, n), n)^, lift(res(r^, n), n)^, lift(res(g^, n),n)^ 
           g = g + (I.getB(n) • PTn)^ 
     } //if  
 } //while 
One important particular case – Huffman codes 
Now we are prepared to show that p-adic coding algorithm gives exactly the same codes as Huffman’s 
algorithm [Huffman] if a weight interval is prepared in a special way. Let as assume that for a given 
alphabet and symbol probabilities a Huffman code tree was constructed.  For example: 
Symbol (s): a b C d e 
Codeword (h(s)):  000 001 10 01 11 
Grid level (cl(s)): 3 3 2 2 2 
Starting index in grid: 0 1 2 1 3 
We can map the tree to weight interval using the same technique as we used for coding messages  
4/5/2007 
Picture 9. Mapping Huffman code tree to weight interval 
After lifting all intervals to highest grid (N=3): 
Symbol (s): a B c d e 
Codeword (lift(h(s),N-cl(s)): 000 001 100 010 110 
Starting index in grid: 0 1 4 2 6 
Algorithm of constructing weight intervals from a Huffman code tree for alphabet A is simple. 
Let cl(s) be a length of Huffman code of symbol s and N = max(cl(s)) among all s from A, h(s) - 
Huffman code of s, then symbols s occupies a semi interval starting at point with index 
lift(h(s),N-cl(s))^ and ending at starting point of a next symbol or 1. 
Constructed weight interval has an important property – all of subintervals occupy a whole grid interval of 
some level. It was shown above that in this situation left end right ends have an entire path in common; so 
PR rescaling will push all of it into an output and a next symbol will be coded starting with [0,1) interval. 
This proves that for this particular choice of weight interval p-adic coding works identical to Huffman’s 
algorithm. 
Another particular case – Golomb-Rice codes 
Surprisingly enough, but p-adic coding algorithm produces Golomb-Rice [Golomb, Rice] codes when 
supplied with single symbol alphabet and special model; no changes to algorithm itself are needed.  
If an alphabet contains only one symbol, the only information a message may contains is its length. So 
coding of a message is equivalent to coding of a natural number – the length. We will use symbol * to 
identify the only entry. 
The model is trivial:  
M.code(*, l, r) => l, r-1  
M.code(EOM, l, r) => r-1, r  
M.decode(g, l, r ) => if (g == r-1)  then l, r, EOM  else l, r-1, * 
The algorithm will do all the work. Let start with P=2 and consider a grid 2N+1.  
Coding procedure starts with  
 l = r = 0 
4/5/2007 
If a message is empty, we have to encode EOM. To do this we need to calculate r-1= 0 - 1, which is 2N+1-1 
and return a path to 2N+1-1.   This path consists of N+1 ones: {1, 1, … , 1, 1}. This is our new 
representation of zero.  If a symbol comes, the model recalculates r and l: 
l = 0 
r = 2N+1-1 
If it was the only symbol in a message, then the model returns 2N+1-2, 2N+1-1, and a code is a path to a 
point with index 2N+1-2:  {1, 1, … , 1, 0}. This procedure may be continued until a message’s length is less 
than 2N. At this point the model returns  
l = 0 
r = 2N 
because comP,N(0^, (2
N -1)^) = 1 PR rescaling will be used; one 0 will be pushed to output buffer, l and r   
return to their initial values l = r = 0. The coder is in initial state and ready to receive a new symbol.   
Encoder stays almost without changes. We have defined  
selectPoint( l, r ) = l 
And drop lnz call in the last pushB operation to keep trailing zeros 
O.pushB(ext(q)) 
If a messages of length W comes W/2N zeros will be pushed in output buffer; the rest part of the output will 
contain a path to a point which index is 0 – (W%2N). After encoding EOM we have to move the point one 
step to the left. So finally index will be 0 – ((W%2N) + 1). 
For example, for N=3 we have: 
W code W Code 
0 1111 8 01111 
1 1110 9 01110 
2 1101 10 01101 
3 1100 11 01100 
4 1011 12 01011 
5 1010 13 01010 
6 1001 14 01001 
7 1000 15 01000 
The codes look very much like Golomb-Rice codes. Indeed, they may be transformed to each other by 
replacing 1 with 0, and 0 with 1 - binary NOT.  
There is no magic in changing unary representation and delimiter – there is no difference between counting 
a number 0 of before first 1 and counting number of 1 before first 0.  Transformation of the rest part – after 
delimiter, may be not that clear.  
In the ring of integers modular 2N 
0 – (R +1) = (2N  - 1) - R  
here R = (W%2N); R < 2N. In binary representation (2N – 1) is a vector U of N 1. Now   
NOT(U – R) =  R  
This proves that after NOT transformation the rightmost part of codes transforms to W%2N. 
4/5/2007 
Any prime P can be used with this model. But this generalization does not look very promising. In fact, the 
reason why we discuss Huffman and Golomb-Rice codes here is to emphasize that the most popular 
entropy codes have a common base – they all maps messages to p-adic integer numbers. 
Rescaling based on Archimedean distance (AR)  
We were very ingenious when selecting most convenient for us weigh interval:   
{a:[0, 0.5), b:[0.5, 0.75), c:[0.75, 0.875), d:[0.875,1)} 
Yes, compression rate does not depend on an order of subintervals, but calculation and resulted codes do. 
Let shuffle the weigh interval: 
{b:[0, 0.25), a:[0.25, 0.75), c:[0.75, 0.875), d:[0.875,1)} 
Now subinterval a:[0.25, 0.75) covers the center point 1/2. Consider now a message containing only 
symbols a. It can be easily shown that left edge of message interval will be always less than 1/2, while the 
right one – greater. From p-adic point of view this means that ordp(l, r) is always zero and there is no 
common path and, as a sequence, rescaling will never happen. If we continue coding {a, a, … , a} we will 
end in integer overflow error or will be faced to use infinite precision arithmetic.  
To save our integer arithmetic from huge numbers we have to use the fact that Archimedean length in this 
case is less or equal to1/2.  
Picture 10. Coding {a, a, a} 
For P ≠ 2 situation is more complex. A semi interval can include any grid point 0 < n < P.  In the 
following example (P = 3) an interval has Archimedean length 2/9, but p-adic length 1.  
4/5/2007 
Picture 11. Before rescaling 
Now let explore a case when a sub interval lies in the smallest interval of level 2, which includes a point of 
level 1 with index n.  p-adic representation of left l and right r edges of such subinterval is.  
l = {n-1, P-1, …. } 
r = {n, 0, … }   
It’s Archimedean length is less or equal to 2/(P*P). We want to map it to a bigger interval, precisely to 
interval [n-1, n+1) from level 1.  This can be done by a linear transformation: 
Y(X) = XP1 – nP0 +nP-1  
Let’s consider how a semi interval defined in p-adic representation as 
l = m0P
 + m1P
1 + m2P
2+   … + mNP
r = k0P
 + k1P
1 + k2P
2+   … + kNP
transforms under this mapping. The first thing we need to do – to transform paths to points. We can do it by 
using IP transformation:   
a = m0P
 + m1P
-2 + m2P
-3+   … + mNP
-N-1 
b = k0P
 + k1P
-2 + k2P
-3+   … + kNP
-N-1 
Now we can apply linear transformation: 
Y(a) = (m0 - n)P
 + (m1 + n)P
-1 + m2P
-2+   … + mNP
Y(b) = (k0 - n) P
0+ (k1 + n) P
-1 + k2P
-2+   … + kNP
For this subinterval we have: 
m0 = n -1;   m1 = P- 1  
k0 = n;   k1 = 0  
Y(a) = 0P0 + (n - 1)P
-1 + m2P
-2+   … + mNP
Y(b) = 0P0+ nP-1 + k2P
-2+   … + kNP
Rescaling will drop first zero terms. Reverting back from points to paths we can find how this 
transformation works on paths: 
Y(l) = (n - 1) P0 + m2P
1+   … + mnP
Y( r ) = nP0 + k2P
1+   … + knP
Or in vector representation: 
4/5/2007 
Y(l) = {n-1, P-1, …. } => {n -1, … } 
Y(r) = {n, 0, … }  => {n, … } 
we just remove second (counting from the left) elements. 
It is also easy to verify that center point {n, 0, 0, …, 0} of this mapping is a stable point, i.e. Y maps it to 
itself 
Y :  {n, 0, 0, …,  0}  => {n, 0, …,  0} 
New interval [l, r) contains the stable point. 
Coming back to the example (here n=1) we can draw the picture after rescaling: 
Picture 11a. After rescaling 
We will refer this rescaling as AR. Important difference between AR and PR rescaling is that AR does not 
push anything in output buffer.  
It is convenient to invent a special predicate AR? for testing if AR rescaling can be applied for an interval. 
AR?(l, r, P) = (r[0] – l[0] == 1) AND (l[1] == P-1)  AND  (r[1] == 0)  
To continue coding we must remember the applied mapping, it can be done by storing only two parameters: 
n – a stable point and u – a number of times rescaling was applied.   
What may happen if we continue coding?  
1. [l, r) are still contains n 
1.1. value of AR? predicate is false  
1.2. value of AR? predicate is true 
2. [l, r) does not contain n 
2.1. n lays to the right of r;  toRight?( n, r) == true  
2.2. n lays to the left of l;     toLeft?( n, l) == true 
To test condition 2.1 and 2.2 we introduced two predicates toRight? and toLetf?. There predicates are 
suppose to receive a path as second argument, i.e. a number in p-adic integer number; first argument is an 
integer number 
toRight?(n, r) = ( r[0] < n ) OR ( r^ == n ) 
toLeft?(n, l) = l[0] ≥ n 
Now let discuss situations mentioned above: 
1.1. This is the simplest case. We just continue coding. 
1.2. Increase u: u = u +1; do AR rescaling and continue coding.   
4/5/2007 
2.1. This means that the whole interval lays in [{n-1, P-1, … , P-1}, {n, 0, … , 0}); where P-1 is added u 
times. Any subinterval from this interval has common path {n-1, P-1, … , P-1}, so we can now push this 
path into output and rescale l and r one more time removing fist digits.  
2.2. This means that the whole interval lays in [{n, 0, … , 0, 1}, { n, 0, … , 0,1}); where 0 is added u 
times. Any subinterval from this interval has common path {n, 0, … , 0}, so we can now push this path into 
output and rescale l and r one more time removing fist digits.  
AR and PR rescaling procedures together guaranty that current coding interval will never be smaller than 
2/P2-1/PN. This means that maximum value of indexes is 2PN-2-1. 
Algorithms revised 
Coding with AR 
A new feature here, comparing to the first variant of p-adic encoding algorithm, is that we need to track AR 
transformation. To do this we introduce two new variables sp and spn.  
• sp – stable point of AR; it is a point of level 1 and may be represented as a positive integer (not 
path) 0 < sp < P.  
• spn – number of times AR was applied.  
Some additional operations should be done at final step. First of all we need to check, as in the main loop, 
if the final interval is situated to the left or to the right of a stable point and, if this is the case, do necessary 
pushing and then proceed to usual final search for minimal point.  If not and spn is not zero, then we are 
lucky and we already have a point from level 1 and all we need to do is just to push out sp.  
Pseudo code 
M.init(A, N) 
l, r = 0, 0 
sp, spn = 0, 0 
while ( ( a = I.getC ) != EOM ) {  
     l, r = M.code(a, l, r) 
     if ( spn ≠ 0 ) { 
         if ( toLeft?(sp, l^) ) { 
             O.pushB(sp, 1) 
             O.pushB(0,spn) 
             l, r = lift(res(l^, 1), 1)^, lift(res(r^, 1), 1)^ 
             sp, spn = 0, 0 
         } //if  
         if ( toRight?(sp, r^) ) { 
             O.pushB(sp - 1, 1) 
             O.pushB(P - 1, spn) 
             l, r = lift(res(l^, 1), 1)^, lift(res(r^, 1), 1)^ 
             sp, spn = 0, 0 
         } //if   
     } //if 
     // PR rescaling 
     if ( spn == 0 ) { 
          n = comP,N(l^, (r - 1)^) 
          if ( n > 0 )  { 
                  O.pushB(ext(l^, n)) 
                  l, r = lift(res(l^, n), n)^, lift(res(r^, n), n)^ 
4/5/2007 
          } //if 
      } 
     //AR rescaling 
     while ( AR?(l^, r^) )  { 
            sp =  r^[0]  if sp == 0 
            spn = spn + 1 
            l, r = lift(cut(l^,1,1),1)^, lift(cut(r^,1,1),1)^ 
     }  //while 
}  //while 
l, r = M.code(EOM, l, r) 
if ( spn ≠ 0 ) { 
    if ( toLeft?(sp, l^) ) { 
        O.pushB(sp, 1) 
        O.pushB(0, spn) 
        l, r = lift(res(l^, 1), 1)^, lift(res(r^, 1), 1)^ 
        sp, spn = 0, 0 
    } //if  
    if ( toRight?(sp, r^) ) { 
        O.pushB(sp - 1, 1) 
        O.pushB(P - 1, spn) 
        l, r = lift(res(l^, 1), 1)^, lift(res(r^, 1), 1)^ 
        sp, spn = 0, 0 
    } //if   
} //if 
if (spn == 0) { 
     q = selectPoint(l, r) 
     O.pushB(ext(q, lnz(q)) 
} else { 
     O.pushB(sp, 1)  // we already have point of level 1 
} //if 
//the End 
Decoding with AR 
AR rescaling is simpler for decoding process, because we do not care about pushing anything out and a 
final step is most simple – we just finish decoding. The only thing which is new is additional reading from 
an input stream.  
Pseudo code 
M.init(A, N) 
l, r = 0, 0 
spn = sp = 0 
g = (I.getB(N) • PTN)^ 
while ( true ) {  
     l, r, a = M.decode(g, l, r ) 
     if ( a == EOM ) break 
     O.pushC(a) 
     if ( spn ≠ 0 ) { 
         if ( toLeft?(sp, l^) OR toRight?(sp, r^) ) { 
             l, r, g = lift(res(l^, 1), 1)^, lift(res(r^, 1), 1)^ , lift(res(g^, 1), 1)^ 
             g = g + (I.getB(1) • PT1)^ 
4/5/2007 
             sp, spn = 0, 0 
         } //if  
     } //if 
     // PR rescaling 
     n = comP,N(l^, (r-1)^) 
      if ( n > 0 ) { 
            l, r, g = lift(res(l^, n), n)^, lift(res(r^, n), n)^, lift(res(g^, n),n)^ 
            g = g + (I.getB(n) • PTn)^ 
       } //if 
     // AR rescaling 
     while ( AR?(l^, r^) )  { 
            sp =  r^[0]  if sp == 0 
            spn = spn +1 
            l, r, g = lift(cut(l^,1,1),1)^, lift(cut(r^,1,1),1)^, lift(cut(g^,1,1),1)^ 
            g = g + (I.getB(1) • PT1)^ 
     }  //while 
 } //while 
//the End 
Of course, PT1 is just 1 and we can also omit ^ operator. The operation 
g = g + (I.getB(1) • PT1)^ 
can be replaced (in two places) by 
 g = g + I.getB(1)  
Implementation 
We have implemented all algorithms and all tests in Ruby [Ruby] – a new popular interpreted, dynamically 
typed, pure object-oriented, scripting language. And Ruby proved to be very helpful. We would hardly be 
able to try so many variants and run innumerous tests in any other language.  
Now let discuss the practical case of P=2. All previous discussion remains valid – this is just a special case. 
This case has most important advantage – we can use real bits and binary vectors. This is extremely 
convenient.   
All algorithms remain the same. Only some small improvement can be done for AR rescaling. Because the 
only possible value for n is 1, there is no need to store it as spt. In case when toLeft? returns true we have 
to push 1 and a number of 0; if toRight? returns true we have to push 0 and a number of 1. 
Arithmetic coding 
We can see that arithmetic coding is just a special case of p-adic coding for P=2. All conditions expressed 
there as arithmetic operations can be done on bit level. In fact, many practical implementations use shifts 
instead.  
Let us examine E1 condition: 
mHigh < g_Half 
where  
g_Half = 0x40000000 
This condition means that most significant bit in binary representation of mHigh must be 0. This is also 
true for gLow because mLow < mHigh. Reverting to paths we can see that both gLow and gHigh have 
4/5/2007 
most significant bits in p-adic representation are equal to 0, so p-adic distance is less than 1 and PR 
condition is fulfilled.   
However, in p-adic coding algorithm PR rescaling works for mHigh equal to g_Half. It is this small 
difference makes p-adic coding algorithm works exactly as Huffman algorithm for certain models. 
Arithmetic coding in this situation does not provide optimal compression (see discussion in [Bodden]).  
AR rescaling is similar to E3. AR? predicate is equivalent to  
 (g_FisrtQuater <= mlow) AND (mHigh < g_ThirdQuater) 
We have implemented the same model as proposed in [Bodden] and get the same compression for all 
standard tests.  
Results 
Standard tests 
For testing we used Calgary/Canterbury text compression corpus – popular set of tests first discussed in 
[Bell]. It contains files bib, book1, book2, geo, news, obj1, obj2, paper1, paper2, paper3, paper4, paper5, 
paper6, pic, progc, progl, progp and trans. These files may be obtained from [Canterbury corpus]. 
Comparison with Arithmetic coding 
We used program codes published in [Bodden] to get results of arithmetic coding. This program adds 
additional 4 bytes to an output file; these 4 bytes are in most examples the only difference between 
arithmetic and p-adic coding results. 
In our test we use p-adic coding with P=2 and N= 31. 
Conclusion 
Tree is a well known and widely used data structure in computer science. Arithmetic, Huffman and 
Golomb-Rice coding are also well known and widely used for a long time algorithms. p-adic numbers, 
ultrametric spaces are not so popular in computer science; even for pure mathematic they are relatively 
new. Is there any connection between them? We hope that we have shown this connection and that this 
connection is quite natural and fundamental.  
A message, as sequences of symbols, may be considered as path on a tree. There are numerous ways to 
construct this mapping. It is quite fundamental and widely used way for presenting messages and is very 
popular in computer science and applications.  On the other hand, trees are great models of p-adic numbers; 
many strange and unusual features of ultrametric spaces can be understood and visualized on trees [Holly]. 
This works also in the reverse direction – p-adic numbers is convenient tool for indexing paths and p-adic 
norm is a natural measure on trees. 
On the other hand, a message can be mapped to a subinterval of a unit interval – this what real number 
arithmetic algorithm does. While theoretically clear and simple, this method was never used in practice, 
because of its inefficiency due to problems with computer based real arithmetic.  Integer arithmetic coding 
solved this problem by introducing some practical receipts how to use integer numbers, instead of real 
ones. The resulting algorithm proved to be efficient and robust and may be because of this fact no 
theoretical analysis has been done. Integer version looks pretty much like the original algorithm, but in fact, 
difference between them is considerable; while real number algorithm works on a field, its integer number 
variant deals with a finite ring. 
p-adic number coding algorithm explicitly works with numbers from the finite ring of positive integer 
numbers modular PN. These numbers, being mapped to a union interval, create an equidistance grid G(PN). 
The next step is to create a path from a root through grid points of upper levels (G(PK); k=0..N-1) to points 
of this grid.  This construction creates a bridge between ultrametric space of tree paths and Archimedean 
space of grid points. Now we can identify any grid point not only by its index, but by a path, i.e. by some 
4/5/2007 
p-adic integer number; the reverse is also true – any path can be identify by its end point from the grid and 
as so by an index. This dualism is the real base of p-adic arithmetic coding algorithm. We found a simple 
and elegant way to transform paths to indexes and back. We called it IP transformation. As a 
transformation from paths to points IP transformation can be considered as Kozyrev’s transformation for 
finite paths, but IP transformation is reversible.  
p-adic arithmetic coding algorithm works as a bridge between two spaces – ultrametric space of paths and 
Archimedean space of grid points. Model calculates intervals with edge points on the grid and then IP 
transformation maps them to paths; if these paths are closed to each other as p-adic numbers, then common 
path is pushed to output buffer, these p-adic numbers are truncated, and IP transformation maps them back 
on grid. For P=2 PR rescaling works pretty much like E1/E2 rescaling but it has one small improvement.  
It is this improvement that makes it possible to show that for certain models and alphabet p-adic coding 
algorithm works as Huffman and Golomb-Rice algorithm.  For P=2 and general models it works as 
arithmetic coding. 
So we may say that three most popular entropy coding algorithms can be considered as special cases of one 
algorithm - p-adic coding, working with different P, models and alphabets.  This also gives an answer to 
the question in the begging of this paragraph - arithmetic, Huffman and Golomb-Rice coding algorithms 
maps messages to ultrametric space of p-adic numbers. They are “speaking in prose”! 
References 
1. Abrahamson, N., "Information theory and coding”, McGraw-Hill, New York 1963. 
2. Baker A.J., “Introduction to p-adic Numers and p-adic Analysis”, Department of Mathematics, 
University of Glasgow G12 8QW, Scotland 
3. Bell, T.C., Witten, I.H. and Cleary, J.G., "Modeling for text compression", Computing Surveys 
21(4): 557-591; December 1989. 
4. Bodden Eric, Clasen Malte, Kneis Joachim, "Arithmetic Coding Revealed. A guided tour from 
theory to practice”, Translated and updated version, May, 2001. 
5. Canterbury corpus, http://links.uwaterloo.ca/calgary.corpus.html . 
6. Golomb S.W., "Run-length encoding”, IEEE Transactions on information Theory, IT-12:399-
401, July 1966. 
7. Holly J.E.,”Pictures of Ultrametric Spaces, the p-adic Numbers, and Valued fields”, Amer. 
Math. Monthly 108 (2001) 721-728 
8. Huffman D.A., “A method for construction of minimum-redundancy codes ”,  Proc. Inst. Radio 
Eng. 40, 9 (Sept. 1952), 1098-1101 
9. Koblitz Neal, "p-adic numbers, p-adic analysis and zeta-functions”, Springer-Verlag, 1977. 
10. Koc C.K., "A Tutorial on p-adic Arithmetic”, Electrical & Computer Engineering, Oregon State 
University, Corvallis, Oregon 97331, April 2002. 
11. Kozyrev S.V., “Wavelet theory as p-adic spectral analysis”, Izvestiya: Mathematics 66:2 367-376 
12. Moffat A, Neal R. M. and Witten I. H., “Arithmetic coding revised,” ACM Transactions on 
Information Systems, vol. 16, no. 3, pp. 256-294, 1998 
13. Nelson Mark, "Arithmetic Coding + Statistical Modeling = Data Compression”, Dr. Dobb’s 
Journal, February, 1991.   
14. Rice R.F., "Some Practical Universal Noiseless Coding Techniques”, Technical Report JPL 
Publication 79-22, JPL, March 1979. 
15. Ruby, "Ruby”, http://www.ruby-lang.org/en/ 
16. Said Amir, “Introduction to Arithmetic Coding – Theory and Practice”, Imagining Systems 
Laboratory, HP Laboratories, Palo Alto, HPL-2004-76, April 21,2004 
17. Salomon David, "Data compression. The complete reference”, Springer-Verlag, 2004 
18. Sayood Khalid, "Introduction to data compression. The complete reference”, Elsevier, 2006 
19. wiki,  "Arithmetic_coding”, http://en.wikipedia.org/wiki/Arithmetic_coding 
20. Witten, I.H., Neal, R. and Cleary, J.G. (1987) “Arithmetic coding for data compression.” 
Communications of the ACM, 30(6), pp. 520-540, June. Reprinted in C Gazette 2(3) 4-25, 
December 1987
ABSTRACT
  A new incremental algorithm for data compression is presented. For a sequence
of input symbols algorithm incrementally constructs a p-adic integer number as
an output. Decoding process starts with less significant part of a p-adic
integer and incrementally reconstructs a sequence of input symbols. Algorithm
is based on certain features of p-adic numbers and p-adic norm. p-adic coding
algorithm may be considered as of generalization a popular compression
technique - arithmetic coding algorithms. It is shown that for p = 2 the
algorithm works as integer variant of arithmetic coding; for a special class of
models it gives exactly the same codes as Huffman's algorithm, for another
special model and a specific alphabet it gives Golomb-Rice codes.

<|endoftext|><|startoftext|>
Compton X-ray and γ-ray Emission from Extended Radio
Galaxies
C. C. Cheung1
Kavli Institute for Particle Astrophysics and Cosmology, Stanford University, Stanford, CA 94305, USA
Abstract. The extended lobes of radio galaxies are examined as sources of X-ray and γ-ray emission via inverse Compton
scattering of 3K background photons. The Compton spectra of two exemplary examples, Fornax A and Centaurus A, are
estimated using available radio measurements in the ∼10’s MHz – 10’s GHz range. For average lobe magnetic fields of
∼ 0.3–1 µG, the lobe spectra are predicted to extend into the soft γ-rays making them likely detectable with the GLAST
LAT. If detected, their large angular extents (∼1◦ and 8◦) will make it possible to “image” the radio lobes in γ-rays. Similarly,
this process operates in more distant radio galaxies and the possibility that such systems will be detected as unresolved γ-ray
sources with GLAST is briefly considered.
Keywords: gamma-ray sources (astronomical); radiofrequency spectra; imaging; radiogalaxies
PACS: 98.54.Gr,98.58.Fd
INVERSE COMPTON "IMAGES" OF LARGE RADIO GALAXIES
Inverse Compton (IC) scattering of the CMB is a mandatory process in synchrotron emitting sources. This emission
becomes most prominent in regions of weaker B-field like the extended lobes of radio galaxies. Many such IC/CMB
lobe X-ray sources are now known (e.g., Croston et al. 2005; Kataoka & Stawarz 2005) and we explore the possibility
of the IC spectra extending into the γ-ray band. This is independent of possible γ-ray emission from the unresolved
nuclei of radio galaxies, i.e., from the misaligned blazar (Sreekumar et al. 1999; Bai & Lee 2001; Foshini et al. 2005).
The case of the nearby (D=18.6 Mpc) double-lobed radio galaxy, Fornax A was discussed in Cheung (2007). Radio
flux density measurements down to ∼30 MHz (Isobe et al. 2006) were used to estimate the IC/CMB spectra of the
lobes. Normalizing the IC spectra to the X-ray detections of the lobes (which indicate B∼1.5µG on average; Feigelson
et al. 1995, Isobe et al. 2006), the presence of high frequency radio emission observed in the >∼ 10–90 GHz range with
WMAP (with Fν ∝ ν−1.5) imply a detectable soft γ-ray signal. As this emission is not expected to be time variable,
the LAT can simply integrate on this position during its normal scanning mode to test this prediction.
Here, we similarly consider the case of Centaurus A which is only 3.5 Mpc away. It is long known to have structure
extended over ∼8◦ in declination (Cooper et al. 1965, and references therein). We use the extensive compilation by
Alvarez et al. (2000) of the various components of the radio source; Figure 1 shows a low resolution 408 MHz image
from Haslam et al. (1982). The outer (degree-scale) giant lobes (GLs) visible in Figure 1 account for >∼ 2/3 of the
total 408 MHz emission at ∼1000 Jy each; the arcmin-scale inner lobes (ILs) are only 3–4 times fainter than each GL.
The northern GL was searched for such IC emission with ASCA data but the extended X-rays could not be uniquely
attributed to such a process (Isobe et al. 2001).
Repeating the analysis as for Fornax A, it appears that the extended components of Cen A will also emit γ-rays at
a level detectable by GLAST. The various data from ∼10 MHz to 43 GHz are consistent with a single spectral index
α=0.7. Since the luminosities of both the ILs and GLs are similar (within ∼20%), only the SEDs of the southern ones
are plotted in Figure 1. Utilizing these radio measurements, the expected IC/CMB spectra for example B-field strengths
are drawn. The integrated Compton Gamma-Ray Observatory (CGRO) COMPTEL detections of Cen A (Steinle et al.
1998) at ∼1021 Hz already limit B >∼ 1µG for both the northern and southern GLs (since they have similar radio
spectra); a similar extrapolation for the ILs give B >∼ 0.3µG.
Thermal emission will be a complicating factor at energies below ∼10 keV, so hard X-ray and soft γ-ray measure-
ments are better suited for detecting the suspected IC/CMB emission. Additionally, since Fornax A and Cen A are
1 Jansky Postdoctoral Fellow. The National Radio Astronomy Observatory is operated by Associated Universities, Inc. under a cooperative
agreement with the U.S. National Science Foundation.
http://arxiv.org/abs/0704.0835v1
208.000 204.000 200.000 196.000
-38.000
-40.000
-42.000
-44.000
-46.000
-48.000
Right Ascension (J2000)
GL (south)
GL (north)
Centaurus A                   408 MHz
0.85 deg
FIGURE 1. [Left] Radio image of Cen A at 0.85◦ resolution which is comparable to the angular resolution of GLAST/LAT.
[Center] SEDs of the multiple components of the Cen A radio source with lines indicating Fν ∝ ν−0.7 spectra. The data points at
> 1018 Hz are the integrated detections with CGRO with lines indicating the expected IC/CMB spectra of the southern giant lobe
for different average B-fields. [Right] IC/CMB X-ray and γ-ray flux predictions for 17 of the highest-z radio galaxies discussed in
the text. Typical Chandra “snapshots" are 5–10 ksec exposures so these sources are all expected to be easily detectable in the X-rays
for the indicated field strengths. GLAST detections require electrons with γ >∼ 105 in a 1µG or smaller field which are optimistic.
quite extended in the sky (∼1◦ and 8◦), if they are detected with GLAST, the contributions from the two lobes will
be separable with the LAT making IC/CMB γ-ray “images” of these radio galaxies possible. These γ-ray images will
appear most similar to radio maps at frequencies, ν >∼ 10 GHz; such radio maps of Cen A’s extended components have
already been obtained by WMAP (Page et al. 2007, Fig. 2 therein) and are available for this comparison.
THE HIGHEST-REDSHIFT RADIO GALAXIES
Using the above examples as a guide, we can gauge the feasibility of detecting even more distant radio galaxies at
the higher-energies. Utilizing the recent large compilation of bright z>2.5 radio sources by Carson et al. (2007), we
consider the highest-redshift (z > 3.5) radio galaxies for illustration. The observed monochromatic Compton (X-ray,
γ-ray) to synchrotron (radio) flux ratio for IC/CMB emission has a strong redshift dependence: for α=1, it is simply
fc/ fs ≃ ucmb/uB ≃ 10(1+ z)
4δ 2/B2µG ( fν ≡ νFν , and δ is the Doppler factor which is set to 1). We use the NVSS
(Condon et al. 1998) 1.4 GHz fluxes for fs. Most of the considered sources (13/17) are detected at 74 MHz in the
VLSS database (Cohen et al. 2005) giving α74MHz−1.4GHz ∼ 0.9− 1.2, so the approximate relation is applicable.
As in the nearby sources, these distant radio galaxies are expected to be IC/CMB X-ray sources unless B ≫10µG
(Fig. 1). Chandra observations should easily detect this emission to constrain the lobe B-fields, and thus the lobe
energetics. In one of the highest-redshift (z = 3.8) radio galaxies observed so far with Chandra (Scharf et al. 2003),
it was necessary to remove the contribution from a bright nucleus (spatially) and extended IC emission from other
sources of seed photons (by spectral fitting). Such X-ray observations can guide our determination of the expected
level of (soft) γ-ray emission from the IC/CMB process; at the moment, the estimates (Fig. 1) are rather crude.
REFERENCES
1. H. Alvarez, J. Aparici, J. May, & P. Reich, Astron. Astrophys. 355, pp. 863–872 (2000).
2. J. M. Bai & M. G. Lee, Astrophys. Journal Lett. 549, pp. L173–L177 (2001).
3. J. E. Carson, T. M. Arias, & C. C. Cheung, in preparation (2007).
4. C. C. Cheung, in The Central Engine of Active Galactic Nuclei, edited by L. C. Ho & J.-M. Wang, ASP Conf. Series, in press,
arXiv:astro-ph/0612372 (2007).
5. A. S. Cohen, et al., in From Clark Lake to the Long Wavelength Array: Bill Erickson’s Radio Science, edited by N. Kassim et
al., ASP Conf. Series 345, 299–303 (2005).
6. B. F. C. Cooper, R. M. Price, & D. J. Cole, Australian Journal of Physics 18, pp. 589–625 (1965).
http://arxiv.org/abs/astro-ph/0612372
7. J. J. Condon, W. D. Cotton, E. W. Greisen, Q. F. Yin, R. A. Perley, G. B. Taylor, & J. J. Broderick, Astron. Journal 115, pp.
1693–1716 (1998).
8. J. H. Croston, et al., Astrophys. Journal 626, pp. 733–747 (2005).
9. E. D. Feigelson, S. A. Laurent-Muehleisen, R. I. Kollgaard, & E. B. Fomalont, Astrophys. Journal Lett. 449, pp. L149–L152
(1995).
10. L. Foschini et al., Astron. Astrophys. 433, pp. 515–518 (2005).
11. C. G. T. Haslam, C. J. Salter, H. Stoffel, & W. E. Wilson, Astron. Astrophys. Suppl. 47, pp. 1–142 (1982).
12. N. Isobe, K. Makishima, M. Tashiro, & H. Kaneda, in Particles and Fields in Radio Galaxies, edited by R. A. Laing &
K. M. Blundell, ASP Conf. Series 250, pp. 394–399 (2001).
13. N. Isobe, K. Makishima, M. Tashiro, K. Itoh, N. Iyomoto, I. Takahashi, & H. Kaneda, Astrophys. Journal 645, pp. 256–263
(2006).
14. J. Kataoka, & Ł. Stawarz, Astrophys. Journal 622, pp. 797–810 (2005).
15. L. Page, et al., Astrophys. Journal, in press (2007).
16. C. Scharf, et al. Astrophys. Journal 596, pp. 105–113 (2003).
17. P. Sreekumar, D. L. Bertsch, R. C. Hartman, P. L. Nolan, & D. J. Thompson, Astroparticle Physics 11, pp. 221–223 (1999).
18. H. Steinle, et al. Astron. Astrophys. 330, pp. 97–107 (1998).
	Inverse Compton "Images" of Large Radio Galaxies
	The Highest-Redshift Radio Galaxies
ABSTRACT
  The extended lobes of radio galaxies are examined as sources of X-ray and
gamma-ray emission via inverse Compton scattering of 3K background photons. The
Compton spectra of two exemplary examples, Fornax A and Centaurus A, are
estimated using available radio measurements in the ~10's MHz - 10's GHz range.
For average lobe magnetic fields of >~0.3-1 micro-G, the lobe spectra are
predicted to extend into the soft gamma-rays making them likely detectable with
the GLAST LAT. If detected, their large angular extents (~1 deg and 8 deg) will
make it possible to ``image'' the radio lobes in gamma-rays. Similarly, this
process operates in more distant radio galaxies and the possibility that such
systems will be detected as unresolved gamma-ray sources with GLAST is briefly
considered.

<|endoftext|><|startoftext|>
Introduction
In this paper we construct a new Z-basis for the space of quasisymmetric
functions, QSym and study its properties. For instance, we show that it has
nonnegative structure constants, and that it behaves well with respect to the
quasisymmetric functions associated to matroids by the Hopf algebra morphism
Mat → QSym described by Billera, Jia, and Reiner [3]. We also answer in the
affirmative a question regarding rank two matroids posed in [3, Question 7.10],
and give an affirmative answer to [3, Question 7.12] in the case of rank two
matroids.
In [3], Billera, Jia, and Reiner describe an invariant for matroids in the form
of a quasisymmetric function. They show that the mapping F : Mat → QSym
is in fact a morphism of combinatorial Hopf algebras (given a suitable choice of
character on Mat; see [1]), where Mat is the Hopf algebra of matroids introduced
by Schmitt [15], and studied by Crapo and Schmitt [4], [5], [6], [7]. Billera,
Jia, and Reiner show that, while the mapping F is not surjective over integer
coefficients, it is surjective over rational coefficients.
http://arxiv.org/abs/0704.0836v2
Our new basis for QSym is “matroid-friendly” in that it reflects the rank of
loopless matroids as well as the size of the ground sets: for every 1 ≤ r ≤ n,
there is a set Nnr of
basis vectors such that for every loopless matroid
M of rank r on an n-element ground set, F (M) ∈ span Nnr ; moreover, QSym
decomposes as the direct sum of these subspaces. This provides us with a new
product grading of QSym, according to matroid rank r. (The usual grading of
QSym by degree corresponds to the size n of the matroid ground set.) Also,
as with the monomial and fundamental bases of QSym, for every matroid M ,
F (M) has nonnegative coefficients in our basis.
The paper has two main parts. The first part (Sections 2–4) presents the new
basis and relevant background material. In Section 2, we recount background
material from the literature regarding posets and quasisymmetric functions. In
Section 3, we present a definition for our new basis for QSym by means of a
construction, and highlight several of its important features. There we also
prove that it is a Z-basis for QSym. In Section 4, we build necessary machinery
regarding computing the quasisymmetric function associated to a labeled poset,
in the form of alternative decompositions, and apply these tools to prove that
the structure constants of the new basis are nonnegative.
The second part, (Sections 5–7) discusses matroids and their quasisymmetric
functions. In Section 5, we recall some of the concepts, terminology, and results
from the paper [3], and prove our claims regarding the quasisymmetric functions
of matroids vis-a-vis our new basis. In Section 6, we recall the context of [3,
Section 7] regarding the relationship between decompositions of the quasisym-
metric function associated to a matroid and decompositions of its matroid base
polytope, and recall the statement of [3, Question 7.10] regarding the functions
associated to rank two matroids. We develop a formula for the quasisymmetric
function of a loopless rank two matroid in terms of the new basis, and apply
it to show (1) that the morphism F : Mat → QSym distinguishes isomorphism
classes of rank two matroids, (2) that the two types of decompositions mirror
each other, i.e. an affirmative answer to [3, Question 7.12] for the case of rank
two matroids, and (3) to give an affirmative answer to [3, Question 7.10]. In
Section 7, we make additional observations regarding matroid functions and the
new basis. We also compare the new basis with the other QSym bases discussed
in Section 10 of [3], and sketch an alternate proof of the surjectivity of the map
Mat → QSym over rational coefficients.
2 Preliminaries
In this section we quote certain concepts, terminology, and facts from the litera-
ture, as well as establish certain conventions which will be used in the remainder
of the paper.
2.1 Compositions
A composition α is a finite sequence of positive integers, i.e. α ∈ Pm for some
m ∈ N. The number of parts of α, m, is the length of α, and denoted by ℓ(α).
The weight of α = (α1, . . . , αm) is |α| = α1+ · · ·+αm. Included in our definition
is the composition having no parts, which we denote by the (bold font) symbol
0. We have ℓ(0) = |0| = 0, the only composition with these properties.
Note, for small examples where individual parts are less than 10, we will
often write a composition as a sequence of digits, with no separating commas.
For example, we may write (1, 5, 6, 3, 2, 3) as 156323 when the context is clear.
We adopt a similar convention for the one-line notation of permutations in Sn
when n < 10.
There is a natural bijection between compositions of weight |α| = n and
susbets of [n− 1] (where [n] = {1, 2, 3, . . . , n}), given by
(α1, . . . , αm) ↔ {α1, α1 + α2, α1 + α2 + α3, . . . , α1 + · · ·+ αm−1}.
We say that β is a refinement of α, or that β refines α (denoted β 4 α)
if |α| = |β| and A ⊂ B, where A and B are the sets associated to α and β
respectively.
To any permutation π ∈ Sn there is an associated composition of weight n
which we denote C(π) and whose parts give the lengths of successive increasing
runs in the one-line notation of π. For example, for π = 934756218 ∈ S9, we have
C(π) = 13212. In this paper, we mildly generalize the notion of a permutation
to be any sequence of distinct positive integers. Given a set of positive integers
X , we let S(X) denote the set of all permuations of all the elements of X . The
run length operator C(π) extends to these general permutations in the obvious
way. If X and Y are two sets of positive integers of the same cardinality n,
then every bijection f : X → Y induces a mapping f : S(X) → S(Y ) given by
f(x1, . . . , xn) = (f(x1), . . . , f(xn)). If f is an increasing function, then we have
C(f(π)) = C(π) for every π ∈ S(X).
2.2 Well-known QSym bases
The algebra of quasisymmetric functions QSym (or QSym(x) when we want
to emphasize the variable set) forms a subring of the power series ring R[[x]]
where x = (x1, x2, x3, ...) is a linearly ordered set of variables indexed by the
positive integers, and R is a (fixed) commutative ring. In this paper we only
deal with the cases where R is either Z or Q, assuming coefficients in Q unless
otherwise stated. We often suppress the variables in our notation, writing simply
f ∈ QSym rather than f(x) ∈ QSym(x).
There are a number of well-known bases for QSym, all indexed by compo-
sitions. (For the two considered here, see [10].) The best-known is the basis
of monomial quasisymmetric functions, which here we denote {xα}. Given a
composition α with ℓ(α) = k, xα is defined by
xα :=
1≤i1<i2<···<ik
xα1i1 x
· · ·xαkik .
Another frequently used basis is the set {Lα} of fundamental quasisymmetric
functions, defined by
Lα :=
For example, L1 = x
1 is simply the degree one elementary symmetric function.
We note that QSym as an algebra under the usual multiplication is graded by
degree. For each of the bases described above, the set of basis elements indexed
by all the compositions of a fixed weight n forms a basis for the homogeneous
component of degree n, QSymn. Accordingly, dimQSymn = 2
2.3 Posets and P -partitions
One of the early references to quasisymmetric functions is the paper of Ges-
sel [10] (who built on the work of Stanley [17]), where they are related to P -
partitions of labeled posets. Most of this material can also be found in Stanley
[18]. In the following, we let ≤ denote the usual ordering on integers, and ≤P
denote the partial order of a poset P . All posets we consider here are finite.
We adopt a mild generalization of Gessel’s convention. We say that a labeled
poset on n elements is a partial order on a set of n positive integers. These
integers are referred to as the labels of the poset. Usually the set of labels is
[n] = {1, 2, . . . , n}, but sometimes we make use of other labels. We often use the
same symbol to refer to both the poset and its set of labels when the meaning
is clear from context.
Note. This convention differs from that used in [3]. There, a labeled poset
consists of a pair (P, γ), where P is a poset on an arbitrary set of n elements,
and γ is a labeling of P , that is a bijection between the elements of P and the
set [n]. The notion is equivalent to Gessel’s. For our generalization, the labeling
would be an injective function from the set of elements of the poset into the set
P of positive integers. At times we find it convenient to write (P, γ) when we
wish to discuss various labelings on the same underlying unlabeled poset.
The following is not the actual definition used in [10] and [17], but rather is
a formula developed by Stanley in [17]. We take it as our definition here.
Definition 2.1. Let P be a labeled poset. Let L(P ) denote the Jordan-Hölder
set of P, that is, the set of all permutations in S(P ) that are linear extensions
of P . Then the quasisymmetric function of P is
F (P ) :=
π∈L(P )
LC(π). (1)
Remark 2.1. The function F (P ) depends only on the relative partial order of
the labels at each covering relation of the poset and not on the absolute values
of the labels themselves. Given labeled posets P defined on the set of labels
A, and P ′ defined on the set of labels B, and a function f : A → B which is
an isomorphism of their underlying unlabeled posets, then F (P ) = F (P ′) if for
every covering relation (y covers x) in P we have x < y ⇐⇒ f(x) < f(y).
3 The new basis
The main goal of this section is to define our new basis (see Definition 3.1
below) and to prove that it is in fact a Z-basis of QSym, that is to say, every
quasisymmetric function that can be written in terms of either the standard
monomial or fundamental basis using only integer coefficients can also be written
in terms of the new basis using only integer coefficients. In Section 4.3, we prove
the positivity of the structure constants for this new basis and the grading of
QSym by composition rank.
Following the notation of [3], given unlabelled posets P and Q, we denote
by P ⊕Q their ordinal sum. The set of elements of P ⊕Q is the disjoint union
of the elements of P and Q. All of the order relations of P and Q are retained,
and in addition, x <P⊕Q y for all x ∈ P and y ∈ Q.
As with all the well-known bases for QSym, the elements of the new basis
are indexed by compositions. We denote the basis by {Nα}, where α ranges
over all compositions.
Definition 3.1. For a given composition α 6= 0, let Pα = A1 ⊕ · · · ⊕ Am be
the graded poset on |α| elements, where m = ℓ(α) and Ai is an antichain on αi
elements. Make Pα into a labeled poset by numbering the ranks in alternating
fashion: first number the odd-ranked elements A2, A4, . . ., followed by the even-
ranked elements A1, A3, . . .. We define N0 := 1, and for each α 6= 0, we define
Nα := F (Pα) (see Equation (1)).
Example 3.2. Let α = (1, 2, 2). Then
P122 = {3} ⊕ {1, 2} ⊕ {4, 5},
L(P122) = {(31245), (31254), (32145), (32154)}, and
N122 = L14 + L131 + L113 + L1121.
Definition 3.3. Given a composition α = (α1, . . . , αk), the rank of α, denoted
by r(α), is the sum of the odd-indexed parts of the composition. That is,
r(α) :=
odd i
αi = α1 + α3 + α5 + · · · . (2)
We define N 00 := {N0} = {1}, and for 1 ≤ r ≤ n,
Nnr := {Nα : |α| = n and r(α) = r}. (3)
We also define the subspace V nr := span N
r ⊂ QSymn. If we are working over a
field of coefficients for QSym, then V nr may be viewed as a vector space, whereas
if we are working over integer coefficients then we refer to the Z-span of Nnr and
V nr is a Z-module.
Theorem 3.4. The set of quasisymmetric functions {Nα}, as α ranges over
all compositions, forms a Z-basis for QSym.
Proof. We show that {Nα}|α|=n forms a basis for the homogeneous component
QSymn for each nonnegative integer n. This is trivial for n = 0. For the general
case, we prove the existence of a unitriangular transition matrix from {Nα}|α|=n
of QSymn to the fundamental basis {Lα}|α|=n.
Consider the following construction. Given a permutation ω ∈ Sn, let b(ω) ∈
1{0, 1}n−1 be the n-digit binary word where the digits are given by
1 if i = 1 or ω(i− 1) < ω(i),
0 otherwise.
Then define ρ(ω) to be the composition which gives the lengths of succes-
sive runs of 1’s and 0’s in b(ω). For example, if ω = 184356729 ∈ S9 then
b(ω) = 110011101, and ρ(ω) = 22311. Clearly one can determine the run-length
composition C(ω) from ρ(ω) and vice versa.
Given a composition α, let Pα be the labeled poset in Definition 3.1, and
L(Pα) its set of linear extensions. Recall that by definition
Nα :=
π∈L(Pα)
LC(π).
By the nature of the labeling on Pα, ρ(π) 4 α for all π ∈ L(Pα). Furthermore,
there is a unique element π ∈ L(Pα) such that ρ(π) = α, namely the one in
which all the labels of Ai are in ascending order if i is odd, and in descending
order if i is even. Thus if we order the rows of the transition matrix (labeled
by compositions α) and columns (labeled by compositions ρ(π)) in an arbitrary
way that extends the partial refinement order 4, then the resulting matrix is
unitriangular, and hence {Nα} is indeed a Z-basis for QSym.
4 Additional facts regarding F (P )
In this section we develop several additional facts regarding the quasisymmetric
function F (P ) for labeled posets P , including an alternative way to decompose
F (P ) for posets, the main idea being to partition L(P ). These facts, especially
Lemmas 4.1 through 4.4, are key tools for the results in following sections.
4.1 Ordered partitions
Consider a permutation π = (π1, . . . , πn) ∈ S(X), where |X | = n, and a compo-
sition τ = (τ1, . . . , τk) with |τ | = n. We can “chop up”, or segment the one-line
notation of π from left to right into k segments, where each respective segment
si is a subsequence of consecutive elements of the one-line notation of length τi.
We call the sequence of these segments s = (s1, . . . , sk) a segmentation of π of
type τ (or induced by τ). Letting t0 = 0 and tj =
i=1 τi be the j-th partial
sum of the parts of τ , every permutation π ∈ S(X) has a unique segmentation
sτ (π), whose segments, for 1 ≤ j ≤ k, are given by
sj = (πtj−1+1, πtj−1+2, . . . , πtj ).
An ordered partition K = (K1, . . . ,Kk) of a set X ⊂ P is a partitioning of X
into non-empty, pairwise disjoint subsets called blocks, i.e. X = ⊔ki=1Ki, where
the order of the blocks matters. Let τi = |Ki| for all i, and refer to the resulting
composition τ(K) = (τ1, . . . , τk) as the type of K.
Let K(X) denote the set of all ordered partitions of X . Every composition
τ of weight n induces a mapping Kτ : S(X) → K(X) as follows. For every
π ∈ S(X) there is a unique ordered partition Kτ (π), each of whose blocks Kj is
the set of elements in the corresponding segment sj of the segmentation sτ (π).
We abbreviate the inverse image K−1τ (K) as K
−1(K) since, for a given or-
dered partition K, the type τ , the set of elements X , and thus the permutation
group S(X), can all be determined from K. Thus for an ordered partition K,
we have
K−1(K) := {π ∈ S(X) : Kτ(K)(π) = K}. (4)
For example, K−1(({2, 7}, {5}, {1, 8})) = {27518, 27581, 72518, 72581}.
The following lemma is simply an exercise in notation, so we omit its proof.
Lemma 4.1. Let K be an ordered partition with k blocks. Let PK be the labeled
poset PK = K1 ⊕ · · · ⊕Kk, where each Ki is regarded as an antichain. Then
F (PK) =
π∈K−1(K)
LC(π).
We say that an ordered partitionK = (K1, . . . ,Kk) is alternating if for every
1 ≤ i < k and for all x ∈ Ki and y ∈ Ki+1 we have x < y if i is even and x > y
if i is odd.
Lemma 4.2. Let K be an alternating ordered partition of type τ . Then
F (PK) = Nτ .
Proof. Each rank Ki of PK = K1 ⊕ · · · ⊕Kk (where ℓ(τ) = k) is an antichain.
Hence F (PK) depends only on the relative ordering of the elements between
adjacent ranks Ki and Ki+1 (see Remark 2.1). Since K is alternating, we can
relabel its elements in each rank as we do in the construction of Pτ (as in
Definition 3.1) and still maintain the same relative ordering between elements
in adjacent ranks. Thus F (PK) = F (Pτ ) = Nτ .
4.2 Unordered partitions of X ⊂ P
Let T = {T1, . . . , Tm} be an unordered partition of the set X ⊂ P. We say that
an ordered partition K is a refinement of T if K, considered as an unordered
partition, is a refinement of T . For every permutation π ∈ S(X), T induces a
unique segmentation of π where each segment is contained in a block of T and
this segmentation is least (coarsest), with respect to refinement, among all such
segmentations. Corresponding to this segmentation there is a unique ordered
partition KT (π), which clearly is is a refinement of T . We say that T induces
the ordered partition KT (π) on π.
Example 4.3. Let X = [9], T = { {1, 4}, {2, 6, 8, 9}, {3, 5, 7} }, π = 965412378.
Then KT (π) = ({6, 9}, {5}, {1, 4}, {2}, {3, 7}, {8}).
Let P be a labeled poset, and T an unordered partition of P . Define KP,T
to be the set of induced ordered partitions KP,T := {KT (π) | π ∈ L(P )}. We
say that T is antichain-inducing if for every ordered partition K ∈ KP,T , every
block Ki of K is an antichain in P .
Lemma 4.4. Let T be an antichain-inducing unordered partition of a labeled
poset P . Then
F (P ) =
K∈KP,T
F (PK). (5)
We call this the decomposition of F (P ) with respect to T .
Proof. By Lemma 4.1 it suffices to show that
L(P ) =
K∈KP,T
K−1(K).
The “⊂”-direction is trivial. Indeed, T induces some ordered partition on every
permutation, and by definition KP,T includes all such partitions as permutations
range over L(P ). Also, clearly K−1(K) ∩ K−1(J) = ∅ if K 6= J since KT is a
well-defined map on L(P ), and so the union on the right is indeed a disjoint
union.
For the “⊃”-direction, let K ∈ KP,T . By definition of KP,T , there exists
π ∈ L(P ) ∩ K−1(K). Let s = sτ(K)(π). Since T is antichain-inducing, the
unordered set of elements Ki of each segment si is an antichain. It follows that
if we form a new permutation π̂ by permuting the elements of si arbitrarily
within si (and thus within π), we must also have that π̂ ∈ L(P ). Since this
holds true for each segment of s, we have K−1(K) ⊂ L(P ).
Remark. In the extreme case where T consists of all singleton sets, KT (π) is the
list of singleton sets in the order specified by π, and K−1(KT (π)) = {π}. We
can identify KT (π) with π itself, and similarly KP,T with L(P ), and the lemma
is then equivalent to the formula (1).
4.3 Structure constants for the new basis
Following the notation of [3] and [17], given labeled posets P and Q on sets
X and Y respectively, we denote by P + Q any disjoint sum of the posets,
constructed as follows. We first form the poset whose set of elements is the
disjoint union of the sets of elements of P and Q, retaining all partial order
relations of the two posets but adding no new relations. In order to ensure that
all labels are distinct, we then relabel the elements in any fashion subject to
the restriction that the resulting labels are all distinct and preserve the relative
order of labels at all covering relations (see Remark 2.1). While the disjoint sum
of the labeled posets is not uniquely defined, all disjoint sums so constructed
will have the same quasisymmetric function. It is well-known and is easy to
prove (see, for example, [10]) that
F (P +Q) = F (P ) · F (Q). (6)
We are now in a position to prove the nonnegativity of the structure constants
for our new basis.
Theorem 4.5. The quasisymmetric function algebra QSym is graded by the
rank of the compositions indexing the basis {Nα}. Furthermore, the structure
constants for {Nα} are nonnegative. That is, in the expansion
NαNβ =
cνα,βNν ,
all the constants cνα,β are nonnegative integers.
Proof. We first prove the statement regarding structure constants. Since N0 =
1, the claim holds trivially if α = 0 or β = 0. Thus we assume α = (α1, . . . , αs) 6=
0 and β = (β1, . . . , βt) 6= 0. By (6) we have that
NαNβ = F (Pα)F (Pβ) = F (Pα + Pβ). (7)
We write Pα = A1 ⊕ · · · ⊕As and Pβ = B1 ⊕ · · · ⊕Bt, and identify the Ai and
Bj subsets with their canonical inclusions in Pα + Pβ .
We form a new poset Q by relabeling the elements of the Ai and Bj subsets
while maintaining their ordering relations: first label the even-indexed Ai and
Bj in order, with the numbers from [m], where m = |α|+ |β| − r(α)− r(β) and
r(α) is the rank function from Definition 3.3, then label the odd-indexed Ai and
Bj in order, with the numbers from {m + 1, . . . , |α| + |β|}. Since F (Pα + Pβ)
depends only on the relative ordering of elements between adjacent ranks Ai
and Ai+1 for 1 ≤ i < s and between adjacent ranks Bj and Bj+1 for 1 ≤ j < t,
we have
F (Pα + Pβ) = F (Q). (8)
We consider the unordered partition T = {T1, T2} of Q given by
odd i
odd i
, and T2 =
even i
even i
Note that T is antichain-inducing, so we may apply Lemma 4.4:
F (Q) =
K∈KQ,T
F (PK). (9)
On the other hand, the labeling of Q implies that every ordered partition K ∈
KQ,T is alternating, so applying Lemma 4.2, we have
F (Q) =
K∈KQ,T
Nτ(K). (10)
Combining Equations (7) – (10) yields the positivity claim. In particular,
cνα,β = |{K ∈ KQ,T : τ(K) = ν}|.
To prove the statement regarding the grading of QSym by composition rank,
we simply note that for every K ∈ KQ,T , we have
r(τ(K)) = |T1| = r(α) + r(β).
5 Matroids
This section begins the second part of the paper. Here we review some of
the concepts, terminology, and results from [3], and prove our claims regarding
the quasisymmetric functions of matroids vis-a-vis our new basis. For general
background in matroid theory we refer the reader to standard texts such as
Oxley’s [14]. We review several of the terms here.
The direct sum of matroids M1 and M2, denoted M1⊕M2, has as its ground
set the disjoint union E(M1 ⊕M2) = E(M1) ⊔ E(M2), and as its bases
B(M1 ⊕M2) = {B1 ⊔B2 : B1 ∈ B(M1), B2 ∈ B(M2)}.
A circuit is a minimal dependent set. If we declare two elements of a matroid
to be equivalent if and only if they are both contained in some circuit, then
the equivalence classes of elements are the components of the matroid. We say
that the matroid is connected if it has only one component, and disconnected
otherwise. A matroid is the direct sum of its components.
5.1 The quasisymmetric function of a matroid
Billera, Jia, and Reiner [3] describe an invariant for isomorphism classes of
matroids in the form of a quasisymmetric function. Rather than give the defi-
nition from [3], we describe it in terms of a formula which is shown in [3] to be
equivalent to the definition.
Fix a matroid M , one of its bases B ∈ B(M), and let Bc = E(M)−B (the
cobase of B). Define the poset PB on the ground set E(M) where e <PB e
and only if e ∈ B, e′ ∈ Bc, and (B − e)∪ {e′} ∈ B(M). That is, e <PB e
′ if and
only if swapping e′ for e in B yields another base in M . Thus the Hasse diagram
of PB is a bipartite graph in which the elements of B are minimal elements of
the poset and the elements of Bc are maximal elements. Note that if M has no
loops, then in the Hasse diagram of PB , every element in B
c has positive vertex
degree. We say that a labeled poset is strictly labeled if for all x, y ∈ P we have
that x <P y implies x > y. Similarly, a labeled poset is naturally labeled if for
all x, y ∈ P , x <P y implies x < y. We apply a strict labeling to PB (any will
do). The quasisymmetric function F (M) associated with M can be written as
F (M) =
B∈B(M)
F (PB), (11)
where F (PB) is the quasisymmetric function of the strictly labeled poset PB as
defined in Definition 2.1.
It was shown in [3] that the mapping F : Mat → QSym is in fact a morphism
of combinatorial Hopf algebras, with a suitable choice of character on the algebra
Mat. Here Mat is the Hopf algebra of matroids introduced by Schmitt [15] and
studied by Crapo and Schmitt [4], [5], [6], [7]. The matroid algebra Mat has as
its basis elements isomorphism classes of matroids. The product of two basis
elements [M1] and [M2] in the algebra is given by [M1] · [M2] := [M1 ⊕ M2],
where M1 ⊕M2 denotes the direct sum of matroids. Comultiplication in Mat is
given by ∆([M ]) :=
A⊂E(M)[M |A]⊗ [M \A], where M |A is the restriction of
M to A, and M \ A is the contraction of M by A. Under the morphism F we
have that
F (M1 ⊕M2) = F (M1) · F (M2).
Billera, Jia, and Reiner also show that the mapping F , while not surjective over
integer coefficients, is surjective over rational coefficients.
The mapping F : Mat → QSym does not distinguish between loops and
coloops. Indeed, let M ′ extend the matroid M by adding a loop ℓ, i.e. M ′ =
M ⊕ {ℓ}, and let M ′′ extend the matroid M by adding a coloop c, i.e. M ′′ =
M ⊕ {c}. Then
F (M ′) = F (M ′′) = F (M) · L1. (12)
Here L1 is the fundamental basis function indexed by the composition (1), which
is the elementary symmetric function e1(x). Define an equivalence relation ∼
on isomorphism classes of matroids by [M1] ∼ [M2] if and only if one can obtain
M1 from M2 by changing some number of loops to coloops or vice versa. Then
by Equation (12) the mapping Mat → QSym factors through the quotient
Mat → Mat/∼ → QSym.
Accordingly, throughout most of our paper, we assume that, unless otherwise
specified, our matroids have no loops; that is, out of each equivalence class in
Mat/∼ we select the representative that has no loops when considering their
images in QSym.
5.2 Expanding F (M) in the {N
} basis
Recall from Definition 3.3 that Nnr = {Nα : |α| = n, r(α) = r} and V
spanNnr . In this subsection we may take our coefficient ring to be Z if we wish.
Lemma 5.1. Let P be a strictly labeled poset on n elements of rank at most
one and with r minimal elements. Then F (P ) ∈ V nr . Moreover the expan-
sion of F (P ) in terms of the basis elements Nnr has only nonnegative integer
coefficients.
Proof. If P has rank 0, then P is an antichain and so r = n. Thus by labelling
P with the elements of [n] and taking α = (n), we have
F (P ) = F (Pα) = Nα = N(n) ∈ V
Otherwise P has rank 1 and is not an antichain. Let T = {T1, T2} be the
unordered partition of P in which T1 comprises the r minimal elements of P ,
and T2 the remaining elements. Note that some elements may be both minimal
and maximal, and these will be placed in T1. Thus every element in T2 in
the Hasse diagram of P has positive vertex degree. Since we are interested in
computing F (P ), we may assume without loss of generality that P has a strict
labeling which labels the elements of T1 = {n, n− 1, . . . , n− r+ 1} in arbitrary
fashion, and the elements of T2 = {1, 2, . . . , n − r} in arbitrary fashion. Since
T1 and T2 are themselves antichains, T is antichain-inducing, so by Lemma 4.4:
F (P ) =
K∈KP,T
F (PK).
Moreover, by the choice of labeling and the fact that every element in T2 has
positive vertex degree, we have that every K ∈ KP,T is alternating. Lemma 4.2
then implies
F (P ) =
K∈KP,T
Nτ(K).
We also have that |τ(K)| = n and r(τ(K)) =
odd i τi = |T1| = r, thus Nτ(K) ∈
V nr for every K ∈ KP,T . Hence F (P ) ∈ V
r as claimed.
Theorem 5.2. Let M be a loopless matroid of rank r on n elements. Then
F (M) ∈ V nr . Moreover the expansion of F (M) in terms of the basis elements
Nnr has only nonnegative integer coefficients.
Proof. For a loopless matroid M , for every base B ∈ B(M), the base poset PB
has r minimal elements out of a total of n elements, and rank at most one. The
assertion then follows by Lemma 5.1, and the formula in Equation (11).
We make a few observations here about the coefficients in the expansion of
the quasisymmetric function F (M) of a matroid M in terms of our new basis.
Given a quasisymmetric function q, define
supp(q) =
α : mα 6= 0 in the expansion q =
We first note that for a (loopless) matroid M of rank r on n elements, the
coefficient m(r,n−r) of Nα, where α = (r, n− r), is equal to the number of bases
of M .
Example 5.3. Let M = Ur,n be the uniform matroid of rank r on n elements.
By definition, its bases are all the r-subsets of the ground set E(M), i.e. B(M) =(
. Then every PB has a complete bipartite Kr,n−r graph for its Hasse
diagram. Therefore F (Ur,n) =
Nr,n−r, and supp(F (M)) = {(r, n− r)}.
Thus the values of m(r,n−r) (how small) and |supp(F (M))| (how large) are,
to some extent, measures of the degree to which M fails to be uniform.
The coefficient mα where α = (r− 1, 1, 1, n− r− 1) also has a combinatorial
interpretation. There is such an Nα term for every edge “missing” from the
Hasse diagram of a base poset PB as compared to the complete bipartite graph
Kr,n−r. In terms of matroid base polytopes, which are discussed in 6.1, the
polytope Q(Ur,n−r) contains all possible vertices, namely
, while the base
polytope for a different matroid M of same rank and ground set size has only a
subset of them, namely B(M). The coefficient mα is the number of edges in the
1-skeleton of Q(Ur,n−r) between the set of vertices B(M) and its complement(
− B(M).
Lemma 5.4. Let M be a matroid, possibly containing loops. Then the total
number c of loops and coloops is given by
c = max
α∈supp(F (M)),
ℓ(α) odd
α̇, (13)
where α̇ denotes the last part of the composition α.
Proof. Since the morphism F : Mat → QSym factors through loop-coloop equiv-
alence, F (M) = F (M ′) where M ′ is obtained from M by replacing all loops of
M with coloops. Since M and M ′ both have the same total number of loops
and coloops, without loss of generality, we assume that M has no loops.
Consider a typical strictly labeled base poset PB of M, and antichain inducing
partition {B,Bc} as in the proof of Lemma 5.1. We have α ∈ supp(F (M)) if
and only if there is an induced ordered partition K of PB of type α. Now ℓ(α)
is odd if and only if the last block of K is a subset of B, and in this case the
elements in this block must be coloops. Thus c ≥ α̇. Conversely, there always
exists an induced ordered partition of the poset, say of type α, which has all of
the coloops of M in the last block, i.e. c = α̇, and this ordered partition will
have odd length. The result follows.
6 Matroid base polytopes
In this section we recall the context of [3, Section 7] regarding the relationship
between decompositions of the quasisymmetric function associated to a matroid
and decompositions of its matroid base polytope. In Subsection 6.2 we develop a
formula for the quasisymmetric function of a loopless rank two matroid in terms
of the new basis, and apply it to address [3, Question 7.12] and [3, Question
7.10].
6.1 Matroid base polytopes and their decompositions
The motivating context is to study the decompositions of the matroid base
polytope Q(M) of a matroid M . This topic arises in the work of Lafforgue [12],
[13], Kapranov [11, §1.2 – 1.4], and can be found in the work of Speyer [16].
If M is a matroid with |E(M)| = n, we define the matroid base polytope
Q(M) by identifying E(M) with the set of standard basis vectors {ei}
i=1 of R
and declaring
Q(M) := conv
ei : B ∈ B(M)
where B(M) is the set of bases of M . Useful facts about matroid base polytopes
(see [9]), which we quote without proof, are:
1. If M has rank r, then Q(M) lies in the hyperplane {x ∈ Rn :
i xi = r}.
2. There is an edge in Q(M) between vertices (bases) B1 and B2 if and
only if there exist a pair of elements ei ∈ B1 and ej ∈ B2 such that
B2 = (B1 − {ei}) ∪ {ej}.
3. Each face of a matroid base polytope is in turn the base polytope of some
matroid.
4. The dimension of Q(M) is |E(M)| − s(M), where s(M) is the number of
connected components of M .
Billera, Jia, and Reiner define a matroid base polytope decomposition of Q(M)
to be a decomposition
Q(M) =
Q(Mi), (14)
where each Q(Mi) is also a matroid base polytope for some matroid, and for
each i 6= j, the intersection Q(Mi) ∩ Q(Mj) = Q(Mi ∩ Mj) is a face of both
Q(Mi) and Q(Mj). They call such a decomposition a split if t = 2.
Billera, Jia, and Reiner show that the mapping F : Mat → QSym behaves as
a valuation on matroid base polytopes. (See [2] for a discussion of valuations.)
This implies that, given a matroid base polytope decomposition as in Equation
(14), F (M) can be expressed in terms of the set of F (Mj) in an inclusion-
exclusion fashion, where the Mj are the matroids of the faces of the constituent
polytopes in the decomposition. For example, given a split Q(M) = Q(M1) ∪
Q(M2), we have F (M) = F (M1) + F (M2) − F (M1 ∩M2), where Q(M1 ∩M2)
is necessarily a lower-dimensional face. Hence by Fact 4 above, the matroid
M1 ∩M2 is disconnected, and so F (M1 ∩M2) can be expressed as a product.
If we let m :=
d≥1 QSymd be the maximal ideal in the ring QSym, then
F (M1∩M2) ∈ m
2. Therefore in the quotient space QSym/m2, we have F (M) =
F (M1) + F (M2). In general, given a matroid base polytope decomposition as
in Equation (14), there is an algebraic decomposition modulo m2
F (M) =
F (Mi). (15)
One of the open questions raised by Billera, Jia, and Reiner [3] is under what
conditions the converse may hold; given a collection of matroids satisfying (15),
what additional conditions are sufficient to conclude (14)?
Note. So far we have ignored the distinction between the isomorphism class of a
matroid on the one hand and a specific instance of that class on a given ground
set on the other, since the quasisymmetric function of a matroid is invariant
on the elements of the same isomorphism class. When discussing the existence
of matroid base polytope decompositions, it is sometimes necessary to draw a
distinction between the notions, as is done in the statement of Theorem 6.2
below. When this precision is necessary, we use the usual bracket notation [M ]
to denote the isomorphism class of the matroid M .
Given a ground set size n, the converse question is trivial for rank 0 and 1,
and by matroid duality, for rank n and n − 1. One necessary condition they
point out is that a specific set of matroids on a common ground set satisfying
(14) must at least satisfy the condition B(Mi) ⊂ B(M) for all i, in which case
they say that (15) is a weak image decomposition and that F (M) is weak image
decomposable. They specifically ask,
[3, Question 7.12] Does F (M) being weak image decomposable in
QSym/m2 imply that Q(M) is decomposable?
So far, general sufficient conditions are not known beyond the trivial ranks listed
above. We claim that the converse ((15) ⇒ (14)) holds quite generally for rank
two matroids, as shown in Section 6.2. By matroid duality, the converse also
holds for matroids of corank two.
As discussed in [3], the loopless rank two matroids are indexed, up to iso-
morphism, by partitions having two or more parts, where there are as many
parts as there are parellelism classes of elements in the matroid and the parts of
the partition give the respective cardinalities of these classes. For this section
we write Mλ to denote the loopless rank two matroid indexed by the partition
λ. More generally, given a composition α, define Mα = Mλ where λ is the
decreasing rearrangement of the parts of α.
Kapranov [11, §1.3] gives a description of all decompositions of rank two
matroid base polytopes. He shows [11, Lemma 1.3.14] that in rank two, all
matroid base polytope decompositions arise from hyperplane splits. We provide
some description here of the geometric situation, in our own words. Given the
composition λ = (λ1, . . . , λm), with |λ| = n, set t0 = 0 and for 0 ≤ k ≤ m set
i=1 λi be the k-th partial sum. Then the vertices of Q(Mλ) are precisely
those 0/1-lattice points v lying in the hyperplane H = {x ∈ Rn :
i xi = 2}
subject to the restriction that
tk+1∑
i=tk+1
vi ≤ 1
for all 0 ≤ k < m. If ℓ(λ) = 2, then
Mλ = M(λ1,λ2) = U1,λ1 ⊕ U1,λ2 ,
where U1,n is the uniform matroid of rank 1 on n elements. It follows that if
ℓ(λ) = 2, then dimQ(Mλ) = n− 2 and F (Mλ) ∈ m
Supposing that ℓ(λ) = m > 3, choose index j such that 1 < j < m− 1. Let
a = tj and b = n− tj , and define compositions µ = (a, b), α = (a, λj+1, . . . , λm),
and β = (λ1, . . . , λj , b), all of which have weight n. Consider the hyperplane
H ′ = {x ∈ Rn :
i=1 xi = 1}. Then H
′ ∩ Q(Mλ) = Q(Mµ), giving us a
hyperplane split Q(Mλ) = Q(Mα) ∪ Q(Mβ). It follows from the above that
F (Mλ) = F (Mα) + F (Mβ) − F (Mµ), and F (Mλ) = F (Mα) + F (Mβ). We
can summarize this in the following proposition. The relations given in the
proposition remain true even if λ has only two or three parts, but in that case
the resulting relations are trivial.
Proposition 6.1. Let λ = (λ1, . . . , λt) be a composition with at least two parts.
Let 1 ≤ s < t, a =
i=1 λi, and b =
i=s+1 λi. Consider compositions
α = (a, λs+1, . . . , λt), β = (λ1, . . . , λs, b), and µ = (a, b). We then have
F (Mλ) = F (Mα) + F (Mβ)− F (Mµ),
and modulo m2,
F (Mλ) = F (Mα) + F (Mβ).
Moreover there is a split of matroid base polytopes
Q(Mλ) = Q(Mα) ∪Q(Mβ).
The splitting process can be repeated on the constituent matroid base polytopes
until we have decomposed Q(Mλ) into the union of matroid base polytopes of
type Q(Mα) where ℓ(α) = 3. Consequently, modulo m
2, F (Mλ) can be written
as a positive sum
F (Mλ) =
F (Mi),
where each Mi is a loopless rank 2 matroid indexed by a partition of length 3.
In this setting, Billera, Jia, and Reiner , pose the following question:
[3, Question 7.10] Fix n and consider the semigroup generated by
F (M) within QSymn/m
2 as one ranges over all matroids M of rank
2 on n elements. Is the Hilbert basis for this semigroup indexed by
those M for which λ(M) has exactly 3 parts?
By repeated application of Proposition 6.1, the set {F (Mλ) : ℓ(λ) = 3}
generates the semigroup in question, so the point of the question is whether this
generating set is minimal, and whether distinct indices yield distinct functions.
We prove that this is the case as a corollary of Theorem 6.2.
6.2 Results for rank two matroids
In this section, we prove that the morphism F : Mat → QSym distinguishes
isomorphism classes of rank two matroids and that decomposability of F (M)
for a rank two matroid M implies decomposability of Q(M), as stated in the
following theorem.
Theorem 6.2. Let λ ⊢ n with ℓ(λ) ≥ 3, and let J be a multiset of partitions of
n, all of length three or more, such that
F ([Mλ]) =
F ([Mµ]), (16)
where [Mτ ] denotes the isomorphism class of (loopless) rank two matroids on n
elements indexed by the partition τ . Then, taking the set of standard basis vec-
tors of Rn as the common ground set, there exists a collection of representative
matroids on this ground set, Mλ ∈ [Mλ] and Mµ ∈ [Mµ] for all µ ∈ J which
form a decomposition of matroid base polytopes
Q(Mλ) =
Q(Mµ). (17)
Before the main proof of this theorem, we establish some preliminary results.
We begin by developing a formula for F (Mλ) in terms of the new basis {Nα}.
We define the following quasisymmetric functions in
V n2 = span{Nα : |α| = n, r(α) = 2}.
For all 1 ≤ k ≤ n− 1 let
T nk :=
N(2,n−2) +
k − 1
N(1,j,1,n−2−j), (18)
where we understand N(1,j,1,n−2−j) to be N(1,n−2,1) when j = n − 2. We also
define quasisymmetric functions
Unk := k(n− k)T
Note that each of the sets {T nk } and {U
k } forms a basis for the subspace V
where we consider QSym to have rational coefficients.
Lemma 6.3. Let Mλ be the rank two matroid on n elements indexed by the
partition λ = (λ1, . . . , λm). Then
F (Mλ) =
Unλi . (19)
Proof. We write c(λi) to denote the parallelism class of elements in Mλ cor-
responding to the part λi. A typical base B ∈ B(Mλ) is B = {ei, ej}, where
ei ∈ c(λi) and ej ∈ c(λj) are in distinct parallelism classes. The Hasse diagram
of PB has two minimal elements, ei and ej. There are edges from ei to all
elements of the cobase Bc = E(Mλ) − B except for the λj − 1 elements which
are in the same parallelism class c(λj) as ej. Similarly, there are edges from ej
to all elements of the cobase except for the λi − 1 elements which are in the
same parallelism class c(λi) as ei.
We can analyze F (PB) as in the proof of Lemma 5.1 by applying a strict
labeling γ : E(Mλ) → [n] such that γ(ei) = n, γ(ej) = n − 1, and the cobase
elements are arbitrarily labeled with {1, 2, . . . , n − 2}. We take T = {B,Bc}
to be our antichain-inducing partition of (PB , γ). There is one induced ordered
partition (of [n]) of type (2, n − 2), namely K = (B,Bc), classifying one set
of permutations in L(PB , γ), and thus contributing one N(2,n−2) term to the
expansion of F (PB). For each 1 ≤ k < λj , and for each k-set A ⊂ B
c(λj), there is an induced ordered partition K = ({ej}, A, {ei}, B
c −A) of type
(1, k, 1, n − 2 − k) contributing a term N(1,k,1,n−2−k) to the expansion. Thus
there are
such terms N(1,k,1,n−2−k) corresponding to ordered partitions
K of type (1, k, 1, n−2−k) withK1 = {ej}. Likewise there are
such terms
N(1,k,1,n−2−k) corresponding to ordered partitions K of type (1, k, 1, n− 2− k)
with K1 = {ei}. All the Nα ∈ N
2 are of one of these types, and we know that
the terms of F (PB) must lie in V
2 , so these are the only types appearing in the
expansion for F (PB). There can be no other terms than these due to the order
relations in PB. Thus
F (PB) = N(2,n−2) +
λi − 1
λj − 1
N(1,k,1,n−2−k). (20)
Using Equation (18), we can rewrite this as
F (PB) = T
+ T nλj .
Finally, there are λiλj such bases B ∈ c(λi)× c(λj). Summing over all pairs of
parallelism classes of the matroid yields the formula
F (Mλ) =
λi(n− λi)T
Unλi .
Next we develop a similar formula for F (Mλ) in QSymn/m
2. Our starting
point is the following corollary.
Corollary 6.4. Let a and b be positive integers such that a+ b = n. Then
ab ·N(1,a−1) ·N(1,b−1) = U
a + U
Proof. Let λ = (a, b). ThenMλ = U1,a⊕U1,b, where U1,m is the uniform matroid
of rank one onm elements. As discussed in Example 5.3, F (U1,m) = mN(1,m−1).
Therefore by the Hopf algebra morphism, we have
F (Mλ) = F (U1,a) · F (U1,b) = aN(1,a−1) · bN(1,b−1).
On the other hand, by Lemma 6.3 we have F (Mλ) = U
a + U
b . Equating right
hand sides yields the desired formula.
Since QSym with respect to its product structure is graded by composition
rank as well as degree, the vector subspace V n2 ∩m
2 is spanned by the vectors
{N(1,a−1) ·N(1,b−1) : a+ b = n}.
Thus a basis for V n2 ∩ m
2 is {Unk + U
n−k : 1 ≤ k ≤
}. For expressing our
formula for F (Mλ), we find it convenient to define vectors U
k as follows:
Unk =
Unk if k <
0 if k = n
−Unn−k if k >
Thus the set {Unk : 1 ≤ k <
} forms a basis (over rational coefficients) for
V n2 /m
2. We have the immediate corollary of Lemma 6.3:
Corollary 6.5. Let Mλ be the rank two matroid on n elements indexed by the
partition λ = (λ1, . . . , λm). Then
F (Mλ) =
Unλi . (22)
The next proposition provides a necessary step for the main result, but may be
of interest in its own right.
Proposition 6.6. Let M2 be the set of matroid isomorphism classes (including
those with loops) of rank two matroids. Let Matc be the vector subspace of Mat
spanned by the isomorphism classes of connected matroids, and let M2c be the
set of matroid isomorphism classes of connected rank two matroids. Then the
algebra morphism F : Mat → QSym is injective when restricted to M2, and the
induced quotient map of vector spaces F : Matc → QSym/m
2 is injective when
restricted to M2c.
Proof. We show that we can recover the isomorphism class of the matroid from
its respective function. Suppose we are given F (M) for a rank two matroid M .
We know that F (M) is a non-zero homogeneous function of degree n = |E(M)|,
and so we recover the size of the ground set. Clearly, n ≥ 2.
It is possible that M may have loops or coloops. By Lemma 5.4 we can
recover the total number s of loops and coloops of M from F (M) by Equation
(13). If s = n, then M consists of two coloops and n − 2 loops. Otherwise
s ≤ n− 2, and we may factor F (M) as
F (M) = N(s) · F (M
where (s) is the one-part composition of s, and M ′ is the matroid obtained
from M by removing all loops and coloops. If now F (M ′) ∈ V n−s1 , we have
M ′ ∼= U1,n−s and M has one coloop and s− 1 loops. Otherwise M has s loops,
no coloops, F (M ′) ∈ V n−s2 , and M
′ is a loopless rank two matroid on n − s
elements.
So now without loss of generality, we assume that M has no loops or coloops
and thus is isomorphic to Mλ for some λ ⊢ n. We expand F (M) as
F (M) =
k . (23)
This expansion can be determined since the set of {Unk } form a basis of V
2 . Per
Lemma 6.3, for each k, the coefficient tk is the number of parts of λ that are
equal to k, and so we recover λ from F (Mλ).
The argument for recoveringM from F (M) for a connected rank two matroid
M is similar. Since M is connected, it has no loops or coloops, and so again
M is isomorphic to Mλ for some λ ⊢ n with ℓ(λ) ≥ 3, where n is the degree of
F (M). We expand
F (M) =
⌊(n−1)/2⌋∑
k . (24)
This expansion can be determined since the set {Unk : 1 ≤ k <
} forms a basis
for the subspace V n2 /m
2. Note that λ cannot have a pair of parts with values
k and n − k. Using this fact together with Corollary 6.5, we see that if the
coefficient tk is nonnegative, then λ has exactly tk parts with value k. From
this we can determine all the parts of λ which are < n
. Since λ cannot have
more than one part ≥ n
, this allows us to determine the remaining part of λ, if
Proof of Theorem 6.2. We write A⊔B to denote the disjoint union of multisets
A and B. Note that a partition may be considered to be a multiset of integers.
We fix n > 2 and λ ⊢ n with ℓ(λ) ≥ 3, and proceed by induction on |J |. The
base case |J | = 1 follows from Proposition 6.6. So we assume that the statement
holds for |J | < m for some fixed m > 1. Suppose now that
F ([Mλ]) =
F ([Mµ]), (25)
where |J | = m. Say that a pair of elements µ, ν ∈ J are matching if for some
value 1 < k < n − 1 we have k ∈ µ and n − k ∈ ν. If µ, ν are a matching
pair, then we can apply Proposition 6.1 to form a new relation of type (25) by
replacing J with J ′ = (J − {µ, ν}) ⊔ {τ}, where τ = (µ ⊔ ν) − {k, n − k}. At
the same time, Proposition 6.1 tells us that we also have a decomposition of
base polytopes Q(Mτ ) = Q(Mµ) ∪ Q(Mν). Since |J
′| < m, we can apply our
induction hypothesis, and we are done. It remains to show that there exists a
matching pair in J .
For a partition τ ⊢ n, define the multiset g(τ) = {τi : τi > 1, τi 6=
Define multisets L = g(λ) and R =
µ∈J g(µ). Per Corollary 6.5 we expand
F (Mλ) =
ℓ(λ)∑
Unλi ,
and we similarly expand each F (Mµ) on the right hand side of (25). Since the
set {Unk : 1 ≤ k <
} forms a basis for V n2 /m
2, with Unk = −U
n−k, we conclude
that L ⊆ R and that the parts in R − L can be matched into complementary
pairs of the form (k, n− k). Since no partition in J can contain both parts of a
complementary pair, there exists a matching pair in J if R− L 6= ∅.
We are assuming that |J | ≥ 2, and that R and L contain all the parts not
equal to n
or 1 on the respective sides of (25). The parts equal to 1 on both sides
must match since all of the partitions have at least three parts and hence no
part equal to (n− 1). The only way to have R−L = ∅ is if there exist µ, ν ∈ J
each of which contains a part equal to n
, in which case they are matching.
Thus in all cases, there exists a matching pair µ, ν ∈ J , and the result follows
by induction.
Now we can give an affirmative answer to [3, Question 7.10].
Corollary 6.7. For a fixed n, the Hilbert basis for the semigroup in QSym/m2
generated by the set S = {F (Mλ) : λ ⊢ n, ℓ(λ) ≥ 3} is indexed by those Mλ for
which ℓ(λ) = 3.
Proof. Let T = {F (Mλ) : λ ⊢ n, ℓ(λ) = 3}. It follows from Proposition 6.1
that for ℓ(λ) > 3, F (Mλ) is decomposable into a sum
µ F (Mµ), where for all
µ, ℓ(µ) < ℓ(λ). Hence T generates the same semigroup as S. As noted in [3,
Section 7], if ℓ(λ) = 3, then Q(Mλ) is indecomposable. Theorem 6.2 then implies
that F (Mλ) must also be indecomposable, so T is the minimal generating set,
i.e. the Hilbert basis of the semigroup. By Proposition 6.6, distinct indexing
partitions yield distinct images, establishing the claim.
7 Additional observations
In this section we discuss additional aspects of our new basis, especially regard-
ing the expansion of F (M) for a matroid M .
7.1 Matroid duality, loops, and coloops
Although we describe the basis {Nα} as ‘matroid-friendly’, things are slightly
less friendly when considering matroid duality in the presence of coloops. This
is due to the fact, mentioned in Section 5.1, that the mapping F : Mat → QSym
factors through the quotient Mat → Mat/∼ → QSym, where ∼ denotes loop-
coloop equivalence.
For example, a fact proved in [3] is that, for any matroid M , in terms of the
monomial basis for QSym the following relationship holds:
F (M) =
α =⇒ F (M∗) =
α∗ . (26)
where α∗ is the reversal of α, obtained by writing the parts of α in reverse order.
If M be is a matroid of rank r on n elements having no loops or coloops, then
we have the analogous relationship
F (M) =
mαNα =⇒ F (M
mαNα∗ . (27)
However this relationship breaks down if M has loops or coloops.
We showed in Theorem 5.2 that if M is a loopless matroid of rank r on n
elements, then F (M) ∈ V nr . More generally, if M is of rank r on n elements
and has exactly ℓ loops, then F (M) ∈ V nr+ℓ. Thus if M has exactly c coloops,
then we have the duality relationship
F (M) ∈ V nr =⇒ F (M
∗) ∈ V nn−r+c.
7.2 Comultiplication
The matroid Hopf algebra is graded by matroid rank as well as ground set size.
Let Wnr be the subspace of Mat spanned by the classes of matroids of rank r
on n elements. Then Wnr ·W
s ⊂ W
r+s . For any matroid M and A ⊆ E(M),
r(M) = r(M |A) + r(M/A). So comultiplication in Mat also respects these
gradings. (For general background on Hopf algebras, see [8].) That is,
∆Wnr ⊆
a+b=n,
s+t=r
W as ⊗W
. (28)
One might wonder whether the standard comultiplication of the Hopf algebra
QSym respects the grading by the rank function for our new basis, that is,
whether
∆V nr ⊆
a+b=n,
s+t=r
V as ⊗ V
. (29)
This is not the case. For the simplest example, consider n = 2 and r = 1. We
have N 00 = {N0} = {1} and N
1 = {N11} = {x
11}. Note that there is no Nm0
(or rather, Nm0 = ∅) for m > 0. The basis vectors corresponding to the right
hand side of (29) are
N11 ⊗N0 = x
11 ⊗ 1, and N0 ⊗N11 = 1⊗ x
However,
∆N11 = ∆x
11 = x11 ⊗ 1 + x1 ⊗ x1 + 1⊗ x11,
which clearly does not lie in the span of the above vectors.
The failure of the comultiplication to respect the rank grading can be viewed
as another artifact of loop-coloop equivalence under the morphism F , as evi-
denced by the fact that the rank grading is respected by comultiplication in the
quotient space corresponding to matroids with neither loops nor coloops. Let
J ⊂ QSym be the ideal generated by degree one elements, i.e. by {N1} = {x
Similarly, let I ⊂ Mat be the ideal generated by degree one elements, i.e. by
{[U0,1], [U1,1]}. Both I and J are Hopf ideals in their respective Hopf algebras,
hence Mat/I and QSym/J (with their naturally induced comultiplications) are
Hopf algebras. Moreover, I = F−1(J), so F : Mat → QSym induces a sur-
jective Hopf algebra morphism Mat/I → QSym/J . Note that a natural basis
for Mat/I is the set of all matroid isomorphism classes that have neither loops
nor coloops, while a natural basis for QSym/J is {Nα : ℓ(α) is even}. Taking
appropriate images under the quotient map, the relation (29) holds in QSym/J .
The duality formula (27) also holds in QSym/J .
7.3 Comparison with other QSym bases
In the course of their proof in Section 10 of [3], the authors introduce two new
Z-bases for QSym. They also compare their bases to another Z-basis due to
Stanley [19].
Our new basis is different from these three, as evidenced by the report by
those authors that all three of these bases have some negative structure con-
stants, whereas our new basis does not. However, of the three, ours most closely
resembles that of Stanley. Stanley’s basis element indexed by a composition
α = (α1, . . . , αm) is F (P ) where, as with our basis, P = A1 ⊕ · · · ⊕ Am, is
the ordered sum of antichains A1, · · · , Am on α1, . . . , αm elements respectively.
However, Stanley applies a natural labeling to P , whereas we apply an alter-
nating labeling to the ranks in the poset for our basis.
7.4 Surjectivity of the Hopf algebra morphism
Billera, Jia, and Reiner devote [3, Section 10] to showing that the morphism
F : Mat → QSym is surjective over rational coefficients. In this subsection we
sketch one way to shorten their proof somewhat using our new basis. The reader
will need to consult [3] to have the full context.
Define an ordering on compositions as follows. To each composition α we
assign the binary word b(α) that begins with α1 zeros followed by α2 ones, then
α3 zeros, then α4 ones, etc. We then linearly order compositions according to
their binary words: α < β if b(α) <lex b(β).
In their proof, Billera, Jia, and Reiner make use of a novel basis for the
quasisymmetric functions based on a family of posets {Rσ} of maximum rank
one, indexed by binary words σ ∈ 0{0, 1}n−1, where n is the number of elements
of the poset. We may equivalently index them using compositions of weight n,
declaring Rα = Rb(α). We refer the reader to [3, Section 10] for the definition of
this basis. Billera, Jia, and Reiner show, through a series of theorems that the
set of {F (Rα)}, where the posets are strictly labeled, forms a Z-basis for QSym.
Using our basis, one can show this more directly. We know from Lemma 5.1
that all β ∈ supp(F (Rα)) are of rank r(α) and weight |α|. It is not too hard
to show that the largest β ∈ supp(F (Rα)) with respect to the above ordering
is precisely α, and that the coefficient of Nα in the expansion of F (Rα) is one.
Thus an array giving the expansion of all the {F (Rα)} of a fixed set size |α| = n
in terms of {Nα} with rows and columns suitably ordered is unitriangular.
Acknowledgments
I wish to thank the referees for their helpful comments. I extend many thanks
to Isabella Novik and Sara Billey for their encouragement and their many hours
of proofreading and advice through several drafts of this manuscript. Thanks
also go to Vic Reiner and Lou Billera for their clarification of parts of their
paper and their helpful comments on this work. Special thanks go to Isabella
Novik for bringing the paper of Billera, Jia, and Reiner to my attention, and
to Vic Reiner for pointing me specifically to [3, Question 7.10], which was the
starting point for my investigation.
While working on this project the author was partially supported by a gradu-
ate fellowship from VIGRE NSF Grant DMS-0354131. The seeds of this project
were planted at the ICM in Madrid, 2006. The trip there was made possible by
funds from NSF Grant DMS-9983797.
References
[1] M. Aguiar, N. Bergeron, and F. Sottile, Combinatorial Hopf al-
gebras and generalized Dehn-Sommerville relations., Compos. Math., 142
(2006), pp. 1–30.
[2] A. Barvinok, A Course in Convexity., Graduate Studies in Mathematics.
54. Providence, RI: American Mathematical Society., 2002.
[3] L. J. Billera, N. Jia, and V. Reiner, A quasisymmetric function for
matroids, arXiv:math.CO/0606646.
[4] H. Crapo and W. Schmitt, Primitive elements in the matroid-minor
Hopf algebra, arXiv:math.CO/0511033.
[5] H. Crapo and W. Schmitt, A free subalgebra of the algebra of matroids.,
Eur. J. Comb., 26 (2005), pp. 1066–1085.
[6] , A unique factorization theorem for matroids., J. Comb. Theory, Ser.
A, 112 (2005), pp. 222–249.
[7] , The free product of matroids., Eur. J. Comb., 26 (2005), pp. 1060–
1065.
[8] S. Dăscălescu, C. Năstăsescu, and c. Raianu, Hopf Algebras. An
Introduction., Pure and Applied Mathematics, Marcel Dekker. 235. New
York, NY: Marcel Dekker., 2001.
[9] I. Gel’fand and V. Serganova, Combinatorial geometries and torus
strata on homogeneous compact manifolds., Russ. Math. Surv., 42 (1987),
pp. 133–168.
[10] I. M. Gessel, Multipartite P-partitions and inner products of skew Schur
functions. Combinatorics and algebra, Proc. Conf., Boulder/Colo. 1983,
Contemp. Math. 34, 289-301, 1984.
[11] M. Kapranov, Chow quotients of Grassmannians. I. Gelfand, Sergej (ed.)
et al., I. M. Gelfand Seminar. Providence, RI: American Mathematical
Society. Adv. Sov. Math. 16(2), 29-110, 1993.
[12] L. Lafforgue, Pavages des simplexes, schémas de graphes recollés et com-
pactification des PGLn+1r /PGLr., Invent. Math., 136 (1999), pp. 233–271.
[13] , Chirurgie des grassmanniennes., CRM Monograph Series. 19. Prov-
idence, RI: American Mathematical Society., 2003.
[14] J. G. Oxley, Matroid Theory., Oxford Graduate Texts in Mathematics 3.
Oxford: Oxford University Press., 2006.
[15] W. R. Schmitt, Incidence Hopf algebras., J. Pure Appl. Algebra, 96
(1994), pp. 299–330.
[16] D. E. Speyer, A matroid invariant via the K-theory of the Grassmannian,
arXiv:math.AG/0603551.
[17] R. Stanley, Ordered Structures and Partitions, Memoirs Amer. Math.
Soc., (1972). American Mathematical Society, Providence, RI.
[18] , Enumerative Combinatorics, vol. 2, Cambridge University Press,
Cambridge, MA, 1999.
[19] R. P. Stanley, The descent set and connectivity set of a permutation., J.
Integer Sequences, (2005).
	Introduction
	Preliminaries
	Compositions
	Well-known QSym bases
	Posets and P-partitions
	The new basis
	Additional facts regarding F(P)
	Ordered partitions
	Unordered partitions of X P
	Structure constants for the new basis
	Matroids
	The quasisymmetric function of a matroid
	Expanding F(M) in the {N} basis
	Matroid base polytopes
	Matroid base polytopes and their decompositions
	Results for rank two matroids
	Additional observations
	Matroid duality, loops, and coloops
	Comultiplication
	Comparison with other QSym bases
	Surjectivity of the Hopf algebra morphism
ABSTRACT
  A new Z-basis for the space of quasisymmetric functions (QSym, for short) is
presented. It is shown to have nonnegative structure constants, and several
interesting properties relative to the space of quasisymmetric functions
associated to matroids by the Hopf algebra morphism (F) of Billera, Jia, and
Reiner. In particular, for loopless matroids, this basis reflects the grading
by matroid rank, as well as by the size of the ground set. It is shown that the
morphism F is injective on the set of rank two matroids, and that
decomposability of the quasisymmetric function of a rank two matroid mirrors
the decomposability of its base polytope. An affirmative answer is given to the
Hilbert basis question raised by Billera, Jia, and Reiner.

<|endoftext|><|startoftext|>
Bremsstrahlung Radiation At a Vacuum Bubble Wall
Jae-Weon Lee∗
School of Computational Sciences, Korea Institute for Advanced Study,
207-43 Cheongnyangni 2-dong, Dongdaemun-gu, Seoul 130-722, Korea
Kyungsub Kim and Chul H. Lee
Department of Physics, Hanyang University, Seoul 133-791, Korea
Ji-ho Jang
Korea Atomic Energy Research Institute Yuseong, Daejeon 305-353, Korea
When charged particles collide with a vacuum bubble, they can radiate strong electromagnetic
waves due to rapid deceleration. Owing to the energy loss of the particles by this bremsstrahlung
radiation, there is a non-negligible damping pressure acting on the bubble wall even when thermal
equilibrium is maintained. In the non-relativistic region, this pressure is proportional to the velocity
of the wall and could have influenced the bubble dynamics in the early universe.
PACS numbers: 12.15.Ji, 98.80.Cq
There have been many studies on cosmological roles of first-order phase transitions, which
proceed by nucleations and collisions of vacuum bubbles[1]. For example, in electroweak baryo-
genesis models[2] rapid bubble expansion can provide a non-equilibrium environment, which
may result in asymmetry between matter and antimatter. Furthermore, in some inflation-
ary models[3, 4, 5], the speed of expanding vacuum bubbles determines how long the infla-
tion period lasts. To understand the bubble kinematics in a hot plasma, it is important to
study particle scatterings at a moving bubble wall. To calculate the velocity of electro-weak
bubbles[6, 7, 8, 9, 10, 11] and the CP violating charge transport rate by the wall available for
baryogenesis[2], one should know the reaction force acting on the wall due to the scattered par-
ticles, such as quarks and gauge bosons[12, 13]. (For a supersymmetric model see, for example,
Ref. 14)
At the first order cosmological phase transition, the false vacuum decays to the true vacuum,
which has lower energy, by making a vacuum bubble. When it is created, the wall of the bubble
is at rest. As the free energy difference between the inner and the outer parts of the bubble
fuels the wall, the velocity of the wall increases to the light velocity unless there is a damping
force. In the literature, it is generally believed that the non-trivial damping force is caused by
a deviation of the particle population from a thermal equilibrium one. In this paper, we study
the effect of bremsstrahlung radiations emitted by particles on the pressure acting on a bubble
wall (not necessary electroweak bubbles) during cosmological first order phase transitions.
The aim of this work is to show that, contrary to the usual arguments, the radiation damping
could give a non-negligible pressure even when the particles maintain thermal equilibrium.
Bremsstrahlung (braking radiation) is a radiation due to the acceleration or deceleration of a
charged particle[15]. Entering a true vacuum through a bubble wall, particles interact with the
wall and could acquire mass and be decelerated. For example, a fermion field ψ can get mass
through the well-known Yukawa term gψ̄φψ = mψ̄ψ, where φ is a Higgs field. At this time, if
the particle is charged electromagnetically, it can radiate strong electromagnetic waves due to
the deceleration. Let us calculate the pressure from the scattering.
For simplicity, we assume a linear profile for the bubble wall, i.e., gφ(x) ≡ m(x) = m0x/d
when 0 < x < d. (See Fig.1.) and choose the coordinates of the rest frame of the bubble wall.
∗Electronic address: scikid@kias.re.kr
http://arxiv.org/abs/0704.0837v1
mailto:scikid@kias.re.kr
This approximation is good for the usual tanh profile of the wall. The radiation power of an
accelerated particle is given by a relativistic version of the Larmor’s formula[16]:
dErad
, (1)
where A = 2e2/3c3 ≃ 0.0611 in the natural units (~ = c = k = 1) and ~k = (kx, ky, kz)
is the 3-momentum of a particle. We assume a situation where this classical description of
bremsstrahlung is good enough. Also, assuming that the wall is planar and parallel to the y-z
plane, we can treat the bubble as a 1-dimensional one along the x-axis. The energy, momentum,
and mass of the particle satisfy the usual relation
E2 ≡ m2(x) + ~k2(x). (2)
Let us denote the x-component of the momentum (kx) as k from now on. Differentiating the
above equation with time t and using dx/dt ≡ v and k = Ev, we get the force acting on the
wall due to the particles
, (3)
which is the starting point of the pressure calculation[6]. However, if we also consider the energy
carried away by the radiation Erad, then the total energy conserved is Etot ≡ E+Erad and the
force and, hence, the pressure should be changed. From dEtot/dt = 0, we obtain
= 0, (4)
which has a solution for the force
. (5)
Up to O(A), one can expand the square root term and obtain
. (6)
The second term represents the radiation damping. Then, the total pressure due to the collisions
of the particles in the plasma is given by[6]
(2π)3
f(E(k))
, (7)
where f(E) = (exp(βE) ± 1)−1 is a distribution function of fermions and bosons, respectively.
First, let us briefly review the well-known results without radiation damping. When the mean
velocity of the plasma fluid V relative to the wall (or the negative of the bubble wall velocity
relative to the fluid ) is zero, the first term of Eq. (6) contributes
dm2(x)
(2π)3
eβE ± 1
= F (m0, T )− F (0, T ), (8)
where F (φ, T ) is a free energy of φ at a temperature T = β−1. When V 6= 0, the distribution
function is changed to
f [γ(E − V k)] =
eβγ(E−V k) ± 1
. (9)
Here, γ = (1−V 2)−
2 . However, using the fact that the phase factor d3~k/E is a Lorentz invariant
and changing the integration variable to k′ = γ(k − V E) and defining E′ ≡ γ(E − V k), one
can find that the V dependency of P1 disappears [6]. From this, it is generally believed that to
get non-trivial pressure on the wall, one needs to consider a non-equilibrium deviation of f [7].
Our work indicates this is not necessarily true for some phase transitions. To see this, consider
the effect of the radiation (the second term of Eq. (6)). When V = 0, the term contributes to
the pressure
P2 = 2A
dm(x)
(2π)3
Ek(eβE ± 1)
which also vanishes because the second integrand is an odd function of k. However, when
V 6= 0, one can easily check that, due to the 1/k term, the V dependency survives even under
the change of the integration variable. Thus, in this case,
P2 = 2A
dm(x)
(2π)3
Ek(eβγ(E−V k) ± 1)
dm(x)
dx I2(x). (10)
To be more concrete, let us calculate an approximate value of the integration when V ≪ 1
for fermions. In this case, we can expand f [γ(E − V k)] ≃ f(E) − V βkf(E)[f(E) − 1] =
f(E) + V βkf2(E)exp(−βE). The integration of the first term gives zero, and the second term
contributes
I2 = V β
(2π)3
f2(E)exp(−βE) ≃
(ln2)TV
because
(2π)3
f2(E)exp(−βE) ≃
(ln2)T 2
, (12)
to lowest oder in (m/T )2 (See Ref. 7). Therefore, for the wall described in Fig. 1 the pressure
by the radiation is
(ln2)Am20T
V, (13)
which is comparable to the result of numerical integration of Eq. (10) for V ≪ 1, as shown in
Fig. 2. (During the numerical study it is useful to change the measure from dkydkz to 2πEdE.)
This pressure is proportional to the wall velocity up to the moderately relativistic case and exists
even when the system is in a thermal equilibrium. During the electroweak phase transition, a
particle’s electromagnetic charge is not definite, so the A value in Lamor’s formula can not be
a constant. In this paper , however, to perform a rough calculation, we have assumed that A is
a constant during the phase transition. For illustration of high temperature effects on electric
charges, now we consider a Debye screening of electric charge by plasma during the phase
transition, which is given by effective coupling αeff = α/(1 − 2α ln(k/Λ)/3π) ≃ 0.97α, where
we used averaged momentum 〈k〉 ≃ 3T and Λ of order electron mass at the last approximation
(see Eq. (42) of [17]). Thus, we obtain A = 0.0599 which is slightly smaller than the zero
temperature value. We also plot the pressure with this A value.
It is noteworthy that the pressure caused by the radiation damping (Eq. (13)) is of order
O(α), which is bigger than the pressure due to a departure from thermal equilibriums[7, 18, 19]
(O(α2))[18], and hence non-negligible. Here, α is the fine structure constant. Note also that
the power of bremsstrahlung due to bubble walls is much stronger (O(α)) than that of ordinary
bremsstrahlung of electrons colliding with ions in a plasma (O(α3))[20]. Since the electroweak
phase transition is a complicated phenomenon, by no means is our work a full calculation of the
pressure acting on the electroweak bubbles. The purpose of this paper is to present a general
idea that radiation damping (although usually ignored in the many related works for bubble wall
velocity calculations) could give rise to significant frictional forces even in thermal equilibrium
states at some cosmological phase transitions. To include the effects of other particles (e.g.,
gluon and W/Z particles) in our work, we need to modify Larmor’s formula by using some sort
of group factor. Even in this case, it is hardly probable that the pressure from the radiation
damping from different gauge sectors exactly cancel each other. Hence, one can expect that
a O(α) viscosity to survive. Since bubbles are slow initially, they are supposed to be in a
thermal equilibrium state initially. An ordinary calculation shows no friction at this time,
but the radiation damping force exists in this stage, and hence, this pressure can significantly
change the early evolution of the vacuum bubbles, and the nature of electroweak baryogenesis
or inflationary cosmology.
ACKNOWLEDGEMENTS
The authors are thankful to Myongtak Choi for helpful discussions. This work was supported
in part by the Korean Science and Engineering Foundation and Korea Research Foundation
(BSRI-98-2441).
[1] C. H. Lee, J. Korean Phys. Soc. 33, 588 (1998).
[2] M. Trodden, Rev. Mod. Phys. 71, 1463 (1999).
[3] D. La and P. J. Steinhardt, Phys. Rev. Lett. 62, 376 (1989).
[4] D. S. Goldwirth and H. W. Zaglauer, Phys. Rev. Lett. 67, 3639 (1991).
[5] S. Koh, J. Korean Phys. Soc. 49, 787 (2005).
[6] N. Turok, Phys. Rev. Lett. 68, 1803 (1992).
[7] G. Moore and T. Prokopec, Phys. Rev. Lett. 75, 777 (1995).
[8] P. J. Steinhardt, Phys. Rev. D 25, 2074 (1982).
[9] K. Enqvist, J. Ignatius, K. Kajantie, and K. Rummukainen, Phys. Rev. D 45, 3415 (1992).
[10] M. Dine, R. G. Leigh, P. Huet, A. Linde, and D. Linde, Phys. Rev. D 46, 550 (1992).
[11] C. H. Lee, J. Korean Phys. Soc. 32, 861 (1998).
[12] D. B. K. Andrew G. Cohen and A. E. Nelson, Nuc. Phys. B 349, 727 (1991).
[13] G. R. Farrar and M. E. Shaposhnikov, Phys. Rev. D 50, 774 (1994).
[14] P. John and M. G. Schmidt, Nucl. Phys. B 598, 291 (2001).
[15] K. T. Byun, K. Y. Kim, and H. Y. Kwak, J. Korean Phys. Soc. 47, 1010 (2005).
[16] J. Jackson, Classical Electrodynamics, 2nd ed. (Wiley, New York, 1975).
[17] R. A. Schneider, Phys. Rev. D66, 036003 (2002).
[18] G. D. Moore, JHEP 0003, 006 (2000).
[19] G. D. Moore and T. Prokopec, Phys. Rev. D 52, 7182 (1995).
[20] S. Ichimaru, Basic Principles of Plasma Physics (W. A. Benjamin, Reading, MA., 1973).
m
 
(
x
)
FIG. 1: The effective mass of the particle m(x) in the wall rest frame.
0.2 0.4 0.6 0.8 1
0.001
0.002
0.003
0.004
FIG. 2: The pressure by the radiation damping of fermions colliding with the linear bubble wall as a
function of the wall velocity. The thick line shows numerical integration of Eq. (10) and the dotted
line shows the approximate formula in Eq. (13). Here we set 1/d = 1 = m0 = T for simplicity. The
dashed line represents the result with Debye screening of charge.
	ACKNOWLEDGEMENTS
	References
ABSTRACT
  When charged particles collide with a vacuum bubble, they can radiate strong
electromagnetic waves due to rapid deceleration. Owing to the energy loss of
the particles by this bremsstrahlung radiation, there is a non-negligible
damping pressure acting on the bubble wall even when thermal equilibrium is
maintained. In the non-relativistic region, this pressure is proportional to
the velocity of the wall and could have influenced the bubble dynamics in the
early universe.

<|endoftext|><|startoftext|>
Introduction
The classical setting of the universal lossless compression problem [5], [8], [9] assumes that a se-
quence xn of length n that was generated by a source θ is to be compressed without knowledge
of the particular θ that generated xn but with knowledge of the class Λ of all possible sources θ.
The average performance of any given code, that assigns a length function L(·), is judged on the
basis of the redundancy function Rn (L,θ), which is defined as the difference between the expected
code length of L (·) with respect to (w.r.t.) the given source probability mass function Pθ and the
nth-order entropy of Pθ normalized by the length n of the uncoded sequence. A class of sources
is said to be universally compressible in some worst sense if the redundancy function diminishes
for this worst setting. Another approach to universal coding [29] considers the individual sequence
redundancy R̂n (L, x
n), defined as the normalized difference between the code length obtained by
L(·) for xn and the negative logarithm of the maximum likelihood (ML) probability of the sequence
xn, where the ML probability is within the class Λ. We thereafter refer to this negative logarithm as
the ML description length of xn. The individual sequence redundancy is defined for each sequence
that can be generated by a source θ in the given class Λ.
Classical literature on universal compression [5], [8], [9], [23], [29] considered compression of
sequences generated by sources over finite alphabets. In fact, it was shown by Kieffer [15] (see also
[13]) that there are no universal codes (in the sense of diminishing redundancy) for sources over
infinite alphabets. Later work (see, e.g., [21], [25]), however, bounded the achievable redundancies
for identically and independently distributed (i.i.d.) sequences generated by sources over large and
infinite alphabets. Specifically, while it was shown that the redundancy does not decay if the
alphabet size is of the same order of magnitude as the sequence length n or greater, it was also
shown that the redundancy does decay for alphabets of size o(n). 1
While there is no universal code for infinite alphabets, recent work [20] demonstrated that if
one considers the pattern of a sequence instead of the sequence itself, universal codes do exist in
the sense of diminishing redundancy. A pattern of a sequence, first considered, to the best of our
knowledge, in [1], is a sequence of indices, where the index ψi at time i represents the order of first
occurrence of letter xi in the sequence x
n. Further study of universal compression of patterns [20],
[21], [26], [28] provided various lower and upper bounds to various forms of redundancy in universal
1For two functions f(n) and g(n), f(n) = o(g(n)) if ∀c,∃n0, such that, ∀n > n0, f(n) < cg(n); f(n) = O(g(n))
if ∃c, n0, such that, ∀n > n0, 0 ≤ f(n) ≤ cg(n); f(n) = Θ(g(n)) if ∃c1, c2, n0, such that, ∀n > n0, c1g(n) ≤ f(n) ≤
c2g(n).
compression of patterns. Another related study is that of compression of data, where the order of
the occurring data symbols is not important, but their types and empirical counts are [30]-[31].
This paper considers universal compression of data sequences generated by distributions that
are known a-priori to be monotonic. Hence, the order of probabilities of the source symbols is
known in advance to both encoder and decoder and can be utilized as side information to improve
universal compression performance. Monotonic distributions are common for distributions over
the integers, including the geometric distribution and others. Such distributions do occur in image
compression problems (see, e.g., [18], [19]), and in other applications that compress residual signals.
A specific application one can consider for the results in this paper is compression of the list of
last or first names in a given city of a given population. One can usually find some monotonicity
for such a distribution in the given population, which both encoder and decoder may be aware of
a-priori . For example, the last name “Smith” can be expected to be much more common than
the last name “Shannon”. Another example is the compression of a sequence of observations of
different species, where one has prior knowledge which species are more common, and which are
rare. Finally, one can consider compressing data for which side information given to the decoder
through a different channel gives the monotonicity order.
Unlike compression of patterns, Foster, Stine, and Wyner, showed in [10] that there are no
universal block codes in the standard sense for the complete class of monotonic distributions. The
main reason is that there exist such distributions, for which much of the statistical weight lies in
symbols that have very low probability, and most of which will not occur in a given sequence.
Thus, in practice, even though one has the prior knowledge of the monotonicity of the distribution,
this monotonicity is not necessarily retained in an observed sequence. Therefore, actual coding
can be very similar to compressing with infinite alphabets, and the additional prior knowledge
of the monotonicity is not very helpful in reducing redundancy. Despite that, Foster, Stine, and
Wyner demonstrated codes that obtained universal per-symbol redundancy of o(1) as long as the
source entropy is fixed (i.e., neither increasing with n nor infinite). However, instead of considering
redundancy in the standard sense, the study of monotonic distributions resorted to studying relative
redundancy , which bounds the ratio between average assigned code length and the source entropy.
This approach dates back to work by Elias [7], Rissanen [22], and Ryabko [24].
The work in [10] studied coding sequences (or blocks) generated by i.i.d. monotonic distributions,
and designed codes for which the relative block redundancy could be (upper) bounded. Unlike that
work, the focus in [7], [22], and [24] was on designing codes that minimize the redundancy or
relative redundancy for a single symbol generated by a monotonic distribution. Specifically, in
[22], minimax codes, which minimize the relative redundancy for the worst possible monotonic
distribution over a given alphabet size, were derived. In [24], it was shown that redundancy of
O(log log k), where k is the alphabet size, can be obtained with minimax per-symbol codes. Very
recent work [16] considered per-symbol codes that minimize an average redundancy over the class
of monotonic distributions for a given alphabet size. Unlike [10], all these papers study per-symbol
codes. Therefore, the codes designed always pay non-diminishing per-symbol redundancy.
A different line of work on monotonic distributions considered optimizing codes for a known
monotonic distribution but with unknown parameters (see [18], [19] for design of codes for two-sided
geometric distributions). In this line of work, the class of sources is very limited and consists of
only the unknown parameters of a known distribution.
In this paper, we consider a general class of monotonic distributions that is not restricted
to a specific type. We study standard block redundancy for coding sequences generated by i.i.d.
monotonic distributions, i.e., a setting similar to the work in [10]. We do, however, restrict ourselves
to smaller subsets of the complete class of monotonic distributions. First, we consider monotonic
distributions over alphabets of size k, where k is either small w.r.t. n, or of O(n). Then, we extend
the analysis to show that under minimal restrictions of the monotonic distribution class, there exist
universal codes in the standard sense, i.e., with diminishing per-symbol redundancy. In fact, not
only do universal codes exist, but under mild restrictions, they achieve the same redundancy as
obtained for alphabets of size O(n). The restrictions on this subclass imply that some types of fast
decaying monotonic distributions are included in it, and therefore, sequences generated by these
distributions (without prior knowledge of either the distribution or of its parameters) can still be
compressed universally in the class of monotonic distributions.
The main contributions of this paper are the development of codes and derivation of their
upper bounds on the redundancies for coding i.i.d. sequences generated by monotonic distributions.
Specifically, the paper gives complete characterization of the redundancy in coding with monotonic
distributions over “small” alphabets (k = o(n1/3)) and “large” alphabets (k = O(n)). Then, it
shows that these redundancy bounds carry over (in first order) to fast decaying distributions. Next,
a code that achieves good redundancy rates for even slower decaying monotonic distributions is
derived, and is used to study achievable redundancy rates for such distributions. Lower bounds are
also presented to complete the characterization, and are shown to meet the upper bounds in the first
three cases (small alphabets, large alphabets, and fast decaying distributions). The lower bounds
turn out to result from lower bounds obtained for coding patterns. The relationship to patterns is
demonstrated in the proofs of those lower bounds. Finally, individual sequences are considered. It is
shown that under mild conditions, there exist universal codes w.r.t. the monotonic ML description
length for sequences that contain the O(n) more likely symbols, even if their empirical distributions
are not monotonic.
The outline of this paper is as follows. Section 2 describes the notation and basic definitions.
Then, in section 3, lower bounds on the redundancy for monotonic distributions are derived. Next,
in Section 4, we propose codes and upper bound their redundancy for coding monotonic distribu-
tions over small and large alphabets. These bounds are then extended to fast decaying monotonic
distributions in Section 5. Finally, in Section 6, we consider individual sequence redundancy.
2 Notation and Definitions
Let xn
= (x1, x2, . . . , xn) denote a sequence of n symbols over the alphabet Σ of size k, where k
can go to infinity. Without loss of generality, we assume that Σ = {1, 2, . . . , k}, i.e., it is the set of
positive integers from 1 to k. The sequence xn is generated by an i.i.d. distribution of some source,
determined by the parameter vector θ
= (θ1, θ2, . . . , θk), where θi is the probability of X taking
value i. The components of θ are non-negative and sum to 1. The distributions we consider in
this paper are monotonic. Therefore, θ1 ≥ θ2 ≥ . . . ≥ θk. The class of all monotonic distributions
will be denoted by M. The class of monotonic distributions over an alphabet of size k is denoted
by Mk. It is assumed that prior to coding xn both encoder and decoder know that θ ∈ M or
θ ∈ Mk, and also know the order of the probabilities in θ. In the more restrictive setting, k is
known in advance and it is known that θ ∈ Mk. We do not restrict ourselves to this setting. In
general, boldface letters will denote vectors, whose components will be denoted by their indices in
the vector. Capital letters will denote random variables. We will denote an estimator by the hat
sign. In particular, θ̂ will denote the ML estimator of θ which is obtained from xn.
The probability of xn generated by θ is given by Pθ (x
= Pr (xn | Θ = θ). The average
per-symbol2 nth-order redundancy obtained by a code that assigns length function L(·) for θ is
Rn (L,θ)
EθL [X
n]−Hθ [X] , (1)
where Eθ denotes expectation w.r.t. θ, and Hθ [X] is the (per-symbol) entropy (rate) of the source
2In this paper, redundancy is defined per-symbol (normalized by the sequence length n). However, when we refer
to redundancy in overall bits, we address the block redundancy cost for a sequence.
(Hθ [X
n] is the nth-order sequence entropy of θ, and for i.i.d. sources, Hθ [X
n] = nHθ [X]). With
entropy coding techniques, assigning a universal probability Q (xn) is identical to designing a uni-
versal code for coding xn where, up to negligible integer length constraints that will be ignored,
the negative logarithm to the base of 2 of the assigned probability is considered as the code length.
The individual sequence redundancy (see, e.g., [29]) of a code with length function L (·) per
sequence xn is
R̂n (L, x
{L (xn) + log PML (xn)} , (2)
where the logarithm function is taken to the base of 2, here and elsewhere, and PML (x
n) is the
probability of xn given by the ML estimator θ̂Λ ∈ Λ of the governing parameter vector Θ. The
negative logarithm of this probability is, up to integer length constraints, the shortest possible
code length assigned to xn in Λ. It will be referred to as the ML description length of xn in Λ.
In the general case, one considers the i.i.d. ML. However, since we only consider θ ∈ M, i.e.,
restrict the sequence to one governed by a monotonic distribution, we define θ̂M ∈ M as the
monotonic ML estimator. Its associated shortest code length will be referred to as the monotonic
ML description length. The estimator θ̂M may differ from the i.i.d. ML θ̂, in particular, if the
empirical distribution of xn is not monotonic. The individual sequence redundancy in M is thus
defined w.r.t. the monotonic ML description length, which is the negative logarithm of PML (x
xn | Θ = θ̂M ∈ M
The average minimax redundancy of some class Λ is defined as
R+n (Λ)
= min
Rn (L,θ) . (3)
Similarly, the individual minimax redundancy is that of the best code L (·) for the worst sequence
R̂+n (Λ)
= min
{L (xn) + logPθ (xn)} . (4)
The maximin redundancy of Λ is
R−n (Λ)
= sup
w (dθ)Rn (L,θ) , (5)
where w(·) is a prior on Λ. In [5], it was shown that R+n (Λ) ≥ R−n (Λ). Later, however, [6], [11],
[24] the two were shown to be essentially equal.
3 Lower Bounds
Lower bounds on various forms of the redundancy for the class of monotonic distributions can be
obtained with slight modifications of the proofs for the lower bounds on the redundancy of coding
patterns in [14], [20], [21], and [26]. The bounds are presented in the following three theorems. For
the sake of completeness, the main steps of the proofs of the first two theorems are presented in
appendices, and the proof of the third theorem is presented below. The reader is referred to [14],
[20], [21], [25] and [26] for more details.
Theorem 1 Fix an arbitrarily small ε > 0, and let n → ∞. Then, the nth-order average max-
imin and minimax universal coding redundancies for i.i.d. sequences generated by a monotonic
distribution with alphabet size k are lower bounded by
R−n (Mk) ≥
log n
+ k−1
log πe
log k
, for k ≤
πn1−ε
)1/3 · (1.5 log e) · n(1−ε)/3
, for k >
πn1−ε
)1/3 . (6)
Theorem 2 Fix an arbitrarily small ε > 0, and let n→ ∞. Then, the nth-order average universal
coding redundancy for coding i.i.d. sequences generated by monotonic distributions with alphabet
size k is lower bounded by
Rn (L,θ) ≥
log n
− k−1
log 8π
log k
, for k ≤ 1
1.5 log e
2π1/3
· n(1−ε)/3
, for k > 1
)1/3 (7)
for every code L(·) and almost every i.i.d. source θ ∈ Mk, except for a set of sources Aε (n) whose
relative volume in Mk goes to 0 as n→ ∞.
Theorems 1 and 2 give lower bounds on redundancies of coding over monotonic distributions
for the class Mk. However, the bounds are more general, and the second region applies to the
whole class of monotonic distributions M. As in the case of patterns [20], [26], the bounds in
(6)-(7) show that each parameter costs at least 0.5 log(n/k3) bits for small alphabets, and the total
universality cost is at least Θ(n1/3−ε) bits overall for larger alphabets. Unlike the currently known
results on patterns, however, we show in Section 4 that for k = O(n) these bounds are achievable
for monotonic distributions. The proofs of Theorems 1 and 2 are presented in Appendix A and in
Appendix B, respectively.
Theorem 3 Let n → ∞. Then, the nth-order individual minimax redundancy for i.i.d. sequences
with maximal letter k w.r.t. the monotonic ML description length with alphabet size k is lower
bounded by
R̂+n (Mk) ≥
log n
log e
23/12
log k
, for k ≤ e5/18
(2π)1/3
· n1/3
e5/18
(2π)1/3
(log e) · n1/3
, for n > k > e
(2π)1/3
· n1/3
(log e) · n1/3
, for k ≥ n.
Theorem 3 lower bounds the individual minimax redundancy for coding a sequence believed
to have an empirical monotonic distribution. The alphabet size is determined by the maximal
letter that occurs in the sequence, i.e., k = max {x1, x2, . . . , xn}. (If k is unknown, one can use
Elias’ code for the integers [7] using O(log k) bits to describe k. However this is not reflected in
the lower bound.) The ML probability estimate is taken over the class of monotonic distributions,
i.e., the empirical probability (standard ML) estimate θ̂ is not θ̂M in case θ̂ does not satisfy the
monotonicity that defines the class M. While the average case maximin and minimax bounds
of Theorem 1 also apply to R̂+n (Mk), the bounds of Theorem 3 are tighter for the individual
redundancy and are obtained using individual sequence redundancy techniques.
Proof of Theorem 3: Using Shtarkov’s normalized maximum likelihood (NML) approach [29],
one can assign probability
Q (xn)
yn Pθ̂M
maxθ′∈M Pθ′ (x
yn maxθ′∈M Pθ′ (y
to sequence xn. This approach minimizes the individual minimax redundancy, giving individual
redundancy of
R̂n (Q,x
maxθ′∈M Pθ′ (x
Q (xn)
Pθ′ (y
to every xn, specifically achieving the individual minimax redundancy.
It is now left to bound the logarithm of the sum in (10). For the first two regions, we follow
the approach used in Theorem 2 in [21] for bounding the redundancy for standard compression
of i.i.d. sequences over large alphabets, but adjust it to monotonic distributions. Alternatively,
one can derive the same bounds following the approach used for bounding the individual minimax
redundancy of patterns in proving Theorem 12 in [20]. Let nℓx
= (nx(1), nx(2), . . . , nx(ℓ)) denote
the occurrence counts of the first ℓ letters of the alphabet Σ in xn. For ℓ = k,
i=1 nx(i) = n.
Now, following (10),
nR̂+n (Mk)
≥ log
yn:θ̂(yn)∈M
≥ log
ny(1), . . . , ny(ℓ)
ny(i)
)ny(i)
≥ log
ny(1), . . . , ny(k)
ny(i)
)ny(i)
≥ log
ek/12 · (2π)k/2
nx(i)
≥ log
k − 1
ek/12
≥ k − 1
+ k log
e23/12√
−O (log k) (11)
where (a) follows from including only sequences yn that have a monotonic empirical (i.i.d. ML)
distribution in Shtarkov’s sum. Inequality (b) follows from partitioning the sequences yn into types
as done in [21], first by the number of occurring symbols ℓ, and then by the empirical distribution.
Unlike standard i.i.d. distributions though, monotonicity implies that only the first ℓ symbols in Σ
occur, and thus the choice of ℓ out of k in the proof in [21] is replaced by 1. Like in coding patterns,
we also divide by ℓ! because each type with ℓ occurring symbols can be ordered in at most ℓ! ways,
where only some retain the monotonicity. (Note that this step is the reason that step (b) produces
an inequality, because more than one of the orderings may be monotonic if equal occurrence counts
occur.) Except the division by ℓ!, the remaining steps follow those in [21]. Retaining only the term
ℓ = k yields inequality (c). Inequality (d) follows from Stirling’s bound
2πm ·
≤ m! ≤
2πm ·
· exp
. (12)
Then, (e) follows from the relation between arithmetic and geometric means, and from expressing
the number of types as the number of ordered partitions of n into k parts
k − 1
. Finally, (f)
follows from applying (12) again and by lower bounding
k − 1
The first region in (8) results directly from (11). The behavior is similar to patterns as shown
in [1] for this region. As mentioned in [20], to obtain the second region, the bound is maximized
by retaining ℓ̂ =
n1/3e5/18
/(2π)1/3 instead of k in step (c) of (11), for every k ≥ ℓ̂. The bounds
obtained are equal to those obtained for patterns because the first step (a) in (11) discards all
the sequences whose contributions to Shtarkov’s sum are different between patterns and monotonic
distributions. A similar step is effectively done deriving the bounds for patterns. The difference
is that in the case of patterns, components of Shtarkov’s sum are reduced, but all are retained
in the sum, while here, we omit components from the sum, corresponding to sequences with non-
monotonic i.i.d. ML estimates. The analysis in [20] that also attains the second region of the
bound in (8) is still valid here. It differs from the steps taken above by lower bounding a pattern
probability by a larger probability than the ML i.i.d. probability corresponding to the pattern. The
bound used in the derivation of Theorem 12 in [20] adds a multiplicative factor to each pattern
probability which equals the number of sequences with the same pattern and an equal i.i.d. ML
probability. However, this similar effect is included in Shtarkov’s sum for monotonic distributions
since all these sequences do have a corresponding i.i.d. ML estimate which is monotonic, and are
thus not omitted by step (a) of the derivation.
The analysis in [14] yields the third region of the bound in (8), since, for k ≥ n,
R̂+n (Mk) =
Ψ(yn)
1.5n1/3 log e
log n
, (13)
where Ψ(yn) is the pattern of the sequence yn. Inequality (a) holds because each pattern cor-
responds to at least one sequence whose ML probability parameter estimates are ordered, i.e.,
θ̂i ≥ θ̂i+1,∀i, where the most probable index represents i = 1, the second most probable index
i = 2, and so on. Note that the sum element on the right hand side is for a probability of a
sequence, not a pattern, but the sum is over all patterns. The left hand side also includes sequences
for which the probabilities are unordered. Furthermore, exchanging the letters that correspond
to two indices with the same occurrence count will not violate monotonicity. Thus the inequality
follows. Step (b) in (13) is taken from [14], where the sum on the left hand side was shown to
equal the right hand side. This was true when summing over all patterns with up to n indices, thus
requiring k ≥ n. Note that this requirement does not mean that n distinct symbols must occur
in xn, only that the maximal symbol in xn is n or greater. This concludes the proof of Theorem 3. �
4 Upper Bounds for Small and Large Alphabets
In this section, we demonstrate codes that asymptotically achieve the lower bounds for θ ∈ Mk
and k = O(n). We begin with a theorem that shows the achievable redundancies, and devote the
remainder of the section to describing the codes and deriving upper bounds on their redundancies.
The theorem is stated assuming no initial knowledge of k. The proof first considers the setting
where k is known, and then shows how the same bounds are achieved even when k is unknown in
advance, but as long as it satisfies the conditions.
Theorem 4 Fix an arbitrarily small ε > 0, and let n → ∞. Then, there exist a code with length
function L∗ (·) that achieves redundancy
Rn (L
∗,θ) ≤
(1 + ε) k−1
n(logn)2
, for k ≤ n1/3,
(1 + ε) (log n)
log k
n1/3−ε
, for n1/3 < k = o(n),
(1 + ε) 2
(log n)
2 n1/3
, for n1/3 < k = O(n),
for i.i.d. sequences generated by any source θ ∈ Mk.
Slightly tighter bounds are possible in the first and second regions and between them. The
bounds presented, however, are inclusive for each of the regions. Note that the third region con-
tains the second, but if k = o(n), a tighter bound is possible in the second region. The code
designed to code a sequence xn is a two part code [23] that quantizes a distribution that minimizes
the cost, and uses it to code xn. The total redundancy cost consists of the cost of describing the
quantized distribution and the quantization cost. The second is bounded through the quantized
true distribution of the sequence, which cannot result in lower cost than that of the chosen dis-
tribution (which minimizes the cost). In order to achieve the low costs of the lower bound, the
probability parameters are quantized non-uniformly, where the smaller the probability the finer the
quantization. This approach was used in [25] and [26] to obtain upper bounds on the redundancy
for coding over large alphabets and for coding patterns, respectively. The method used in [25] and
[26], however, is insufficient here, because it still results in too many quantization points due to the
polynomial growth in quantization spacing. Here, we use an exponential growth as the parameters
increase. This general idea was used in [28] to improve an upper bound on the redundancy of
coding patterns. Here, however, we improve on the method presented in [28]. Another key step in
the proof here is the fact that since both encoder and decoder know the order of the probabilities
a-priori , this order need not be coded. It is sufficient to encode the quantized probabilities of the
monotonic distribution, and the decoder can identify which probability is associated with which
symbol using the monotonicity of the distribution.
Proof of Theorem 4: We start with k ≤ n1/3 assuming k is known. Let β = 1/(log n) be a
parameter (note, that we can choose other values). Partition the probability space into J1 = ⌈1/β⌉
intervals,
n(j−1)β
, 1 ≤ j ≤ J1. (15)
Note that I1 = [1/n, 2/n), I2 = [2/n, 4/n), . . . , Ij = [2
j−1/n, 2j/n). Let kj = |θi ∈ Ij| denote the
number of probabilities in θ that are in interval Ij. In interval j, take a grid of points with spacing
. (16)
Note that to complete all points in an interval, the spacing between two points at the boundary of an
interval may be smaller. There are ⌈log n⌉ intervals. Ignoring negligible integer length constraints
(here and elsewhere), in each interval, the number of points is bounded by
|Ij | ≤
, ∀j : j = 1, 2, . . . , J1, (17)
where | · | denotes the cardinality of a set. Let the grid
τ = (τ1, τ2, . . .) =
, . . . ,
, . . .
be a vector that takes all the points from all intervals, with cardinality
= |τ | ≤ 1
⌈log n⌉ . (19)
Now, let ϕ = (ϕ1, ϕ2, . . . , ϕk) be a monotonic probability vector, such that
ϕi = 1, ϕ1 ≥
ϕ2 ≥ · · · ≥ ϕk ≥ 0, and also the smaller k−1 components of ϕ are either 0 or from τ , i.e., ϕi ∈ (τ ∪
{0}), i = 2, 3, . . . , k. One can code xn using a two part code, assuming the distribution governing
xn is given by the parameter ϕ. The code length required (up to integer length constraints) is
L (xn|ϕ) = log k + LR(ϕ)− log Pϕ (xn) , (20)
where log k bits are needed to describe how many letter probabilities are greater than 0 in ϕ, and
LR(ϕ) is the number of bits required to describe the quantized points of ϕ.
The vector ϕ can be described by a code as follows. Let k̂ϕ be the number of nonzero letter
probabilities hypothesized by ϕ. Let bi denote the index of ϕi in τ , i.e., ϕi = τbi . Then, we
will use the following differential code. For ϕ
we need at most 1 + log b
+ 2 log(1 + log b
bits to code its index in τ using Elias’ coding for the integers [7]. For ϕi−1, we need at most
1 + log(bi−1 − bi + 1) + 2 log[1 + log(bi−1 − bi + 1)] bits to code the index displacement from the
index of the previous parameter, where an additional 1 is added to the difference in case the two
parameters share the same index. Summing up all components of ϕ, and taking b
k̂ϕ+1
LR(ϕ) ≤ k̂ϕ − 1 +
log (bi − bi+1 + 1) + 2
log [1 + log (bi − bi+1 + 1)]
≤ (k − 1) + (k − 1) log B1 + k − 1
+ 2(k − 1) log log B1 + k − 1
+ o(k)
= (1 + ε)
k − 1
n (log n)
. (21)
Inequality (a) is obtained by applying Jensen’s inequality once on the first sum, twice on the second
sum utilizing the monotonicity of the logarithm function, and by bounding k̂ϕ by k and absorbing
low order terms in the resulting o(k) term. Then, low order terms are absorbed in ε, and (19) is
used to obtain (b).
To code xn, we choose ϕ which minimizes the expression in (20) over all ϕ, i.e.,
L∗ (xn) = min
L (xn|ϕ) △= L (xn|ϕ̂) . (22)
The pointwise redundancy for xn is given by
nRn (L
∗, xn) = L∗ (xn) + log Pθ (x
n) = log k + L∗R (ϕ̂) + log
Pθ (x
Pϕ̂ (x
. (23)
Note that the pointwise redundancy differs from the individual one, since it is defined w.r.t. the
true probability of xn.
To bound the third term of (23), let θ′ be a quantized still monotonic version of θ onto τ , i.e.,
θ′i ∈ (τ ∪ {0}), i = 2, 3, . . . , k, where if θi > 0 ⇔ θ′i > 0 as well. Define the quantization error,
δi = θi − θ′i. (24)
The quantization is performed from the smallest parameter θk to the largest, where monotonicity
is retained, as well as minimal absolute quantization error. This implies that θi will be quantized
to one of the two nearest grid points (one smaller and one greater than it). It also guarantees that
|δ1| ≤ ∆
, where j2 is the index of the interval in which θ2 is contained, i.e., θ2 ∈ Ij2 . Now, since
θ′ is included in the minimization of (22), we have, for every xn,
L∗ (xn) ≤ L
xn|θ′
, (25)
and also
nRn (L
∗, xn) ≤ log k + LR
+ log
Pθ (x
Pθ′ (x
. (26)
Averaging over all possible xn, the average redundancy is bounded by
nRn (L
∗,θ) = log k + EθL
R (ϕ̂) + Eθ log
Pθ (X
Pϕ̂ (X
≤ log k + EθLR
′)+ Eθ log
Pθ (X
Pθ′ (X
. (27)
The second term of (27) is bounded with the bound of (21), and we proceed with the third term.
Eθ log
Pθ (X
Pθ′ (X
θi log
θ′i + δi
≤ n(log e)
θ′i + δi
= n(log e)
≤ k log e+ 2(log e)k
kj · njβ
≤ 5(log e)k. (28)
Equality (a) is since the argument in the logarithm is fixed, thus expectation is performed only on
the number of occurrences of letter i for each letter. Representing θi = θ
i + δi yields equation (b).
We use ln(1+x) ≤ x to obtain (c). Equality (d) is obtained since all the quantization displacements
must sum to 0. The first term of inequality (e) is obtained under a worst case assumption that
θi ≪ 1/n for i ≥ 2. Thus it is quantized to θ′i = 1/n, and the bound |δi| ≤ 1/n is used. The
second term is obtained by separating the terms into their intervals. In interval j, the bounds
θ′i ≥ n(j−1)β/n, and |δi| ≤
knjβ/n1.5 are used, and also nβ = 2. Inequality (f) is obtained since
j ≤ 2n. (29)
Inequality (29) is obtained since k1 ≤ n, k2 ≤ (n− k1)/2, k3 ≤ (n− k1)/4− k2/2, and so on, until
kJ1 ≤
2J1−1
2J1−ℓ
j ≤ 2n. (30)
The reason for these relations are the lower limits of the J1 intervals that restrict the number
of parameters inside the interval. The restriction is done in order of intervals, so that the used
probabilities are subtracted, leading to the series of equations.
Plugging the bounds of (21) and (28) into (27), we obtain,
nRn (L
∗,θ) ≤ log k + (1 + ε) k − 1
n (log n)
+ 5(log e)k
1 + ε′
) k − 1
n (log n)
, (31)
where we absorb low order terms in ε′. Replacing ε′ by ε normalizing the redundancy per symbol
by n, the bound of the first region of (14) is proved.
We now consider the larger values of k, i.e., n1/3 < k = O(n). The idea of the proof is the
same. However, we need to partition the probability space to different intervals, the spacing within
an interval must be optimized, and the parameters’ description cost must be bounded differently,
because now there are more parameters quantized than points in the quantization grid. Define the
jth interval as
n(j−1)β
, 1 ≤ j ≤ J2, (32)
where J2 = ⌈2/β⌉ = ⌈2 log n⌉. Again, let kj = |θi ∈ Ij| denote the number of probabilities in θ that
are in interval Ij. It could be possible to use the intervals as defined in (15), but this would not
guarantee bounded redundancy in the rate we require if there are very small probabilities θi ≪ 1/n.
Therefore, the interval definition in (15) can be used for larger alphabets only if the probabilities
of the symbols are known to be bounded. Define the spacing in interval j as
, (33)
where α is a parameter to be optimized. Similarly to (17), the interval cardinality here is
|Ij| ≤ 0.5 · nα, ∀j : j = 1, 2, . . . , J2, (34)
In a similar manner to the definition of τ in (18), we define
η = (η1, η2, . . .) =
, . . . ,
, . . .
. (35)
The cardinality of η is
= |η| ≤ 0.5 · nα ⌈2 log n⌉ ≤ nα ⌈log n⌉ . (36)
We now perform the encoding similarly to the small k case, where we allow quantization to
nonzero values to the components of ϕ up to i = n2. (This is more than needed but is possible
since η1 = 1/n
2.) Encoding is performed similarly to the small k case. Thus, similarly to (27), we
nRn (L
∗,θ) ≤ 2 log n+ EθLR
′)+ Eθ log
Pθ (X
Pθ′ (X
, (37)
where the first term is due to allowing up to k̂ = n2. Since usually in this region k ≥ B2 (except
the low end), the description of vectors ϕ and θ′ is done by coding the cardinality of |ϕi = ηj | and
|θ′i = ηj |, respectively, i.e., for each grid point the code describes how many letters have probability
quantized to this point. This idea resembles coding profiles of patterns, as done in [20]. However,
unlike the method in [20], here, many probability parameters of symbols with different occurrences
are mapped to the same grid point by quantization. The number of parameters mapped to a grid
point of η is coded using Elias’ representation of the integers. Hence, in a similar manner to (21),
1 + log
|θ′i = ηj |+ 1
+ 2 log
1 + log
|θ′i = ηj |+ 1
≤ B2 +B2 log
k +B2
+ 2B2 log log
k +B2
+ o (B2)
(1 + ε)(log n)
log k
nα, for nα < k = o(n),
(1 + ε)(1 − α) (log n)2 nα, for nα < k = O(n).
The additional 1 term in the logarithm in (a) is for 0 occurrences, (b) is obtained similarly to step
(a) of (21), absorbing all low order terms in the last term. To obtain (c), we first assume, for the
first region, that knε ≫ B2 (an assumption that must be later validated with the choice of α).
Then, low order terms are absorbed in ε. The extra nε factor is unnecessary if k ≫ B2. The second
region is obtained by upper bounding k without this factor. It is possible to separate the first
region into two regions, eliminate this factor in the lower region, and obtain a more complicated,
yet tighter, expression in the upper region, where k ∼ Θ(n1/3).
Now, similarly to (28), we obtain
Eθ log
Pθ (X
Pθ′ (X
≤ n(log e)
≤ O(1) +
2 log e
n1+2α
≤ 4(log e)n1−2α +O(1). (39)
The first term of inequality (a) is obtained under the assumption that k = O(n), θ′i ≥ 1/n2, and
|δi| ≤ 1/n2. For the second term |δi| ≤ njβ/n2+α, and θ′i ≥ n(j−1)β/n2. Inequality (b) is obtained
in a similar manner to inequality (f) of (28), where the sum is shown similarly to be 2n2.
Summing up the contributions of (38) and (39) in (37), it is clear that α = 1/3 minimizes
the total cost (to first order). This choice of α also satisfies the assumption of step (c) in (38).
Using α = 1/3, absorbing all low order terms in ε and normalizing by n, we obtain the remaining
two regions of the bound in (14). It should be noted that the proof here would give a bound of
O(n1/3+ε) up to k = O(n4/3). If the intervals in (15) were used for bounded distributions, the
coefficients of the last two regions will be reduced by a factor of 2. Additional manipulations on
the grid η may reduce the coefficients more (see, e.g., [28]).
The proof up to this point assumes that k is known in advance. This is important for the code
resulting in the bound for the first region because the quantization grid depends on k. Specifically,
if in building the grid, k is underestimated, the description cost of ϕ increases. If k is overestimated,
the quantization cost will increase. Also, if the code of the second region is used for a smaller k, a
larger bound than necessary results. To solve this, the optimization that chooses L∗ (xn) is done
over all possible values of k (greater than or equal to the maximal symbol occurring in xn), i.e.,
every greater k in the first region, and the construction of the code for the other regions. For every
k in the first region, a different construction is done, using the appropriate k to determine the
spacing in each interval. The value of k yielding the shortest code word is then used, and O(log n)
additional bits are used at the prefix of the code to inform the decoder which k is used. The analysis
continues as before. This does not change the redundancy to first order, giving all three regions of
the bound in (14), even if k is unknown in advance. This concludes the proof of Theorem 4. �
5 Upper Bounds for Fast Decaying Distributions
This section shows that with some mild conditions on the source distribution, the same redundancy
upper bounds achieved for finite monotonic distributions can be achieved even if the monotonic
distribution is over an infinite alphabet. The key observation that allows this is that a distribution
that decays fast enough will result in only a small number of occurrences of unlikely letters in a
sequence. These letters may very likely be out of order, but since there are very few of them, they
can be handled without increasing the asymptotic behavior of the coding cost. More precisely, fast
decaying monotonic distributions can be viewed as if they have some effective bounded alphabet
size, where occurrences of symbols outside this limited alphabet are rare. We present two theorems
and a corollary that show how one can upper bound the redundancy obtained when coding with
some unknown distribution. The first theorem provides a slightly stronger bound (with smaller
coefficient) even for k = O(n), where the smaller coefficient is attained by improved bounding,
that more uniformly weights the quantization cost for minimal probabilities. In the weaker version
of the results presented here, if the distribution decays slower and there are more low probability
symbols, the redundancy order does increase due to the penalty of identifying these symbols in
a sequence. However, we show, consistently with the results in [10], that as long as the entropy
of the source is finite, a universal code, in the sense of diminishing redundancy per symbol, does
exist. We begin with stating the two theorems and the corollary, then the proofs are presented.
The section is concluded with three examples of typical monotonic distributions over the integers,
to which the bounds are applied.
5.1 Upper Bounds
We begin with some notation. Fix an arbitrary small ε > 0, and let n→ ∞. Define m △= mρ
as the effective alphabet size, where ρ > ε. (Note that ρ = (logm)/(log n).) Let
Rn(m)
log n
, for m = o
ρ+ ε− 1
(log n)
n1/3, otherwise.
Theorem 5 I. Fix an arbitrarily small ε > 0, and let n → ∞. Let xn be generated by an i.i.d.
monotonic distribution θ ∈ M. If there exists m∗, such that,
nθi log i = o [Rn (m∗)] , (41)
then, there exists a code with length function L∗(·), such that
Rn (L
∗,θ) ≤ (1 + ε)
Rn (m∗) (42)
for the monotonic distribution θ.
II. If there exists m∗ for which ρ∗ = o
n1/3/(log n)
, such that,
θi log i = o(1), (43)
then, there exists a universal code with length function L∗(·), such that
Rn (L
∗,θ) = o(1). (44)
Theorem 5 implies that if a monotonic distribution decays fast enough, its effective alphabet
size does not exceed O(nρ), and, as long as ρ is fixed, bounds of the same order as those obtained for
finite alphabets are achievable. Specifically, very fast decaying distributions, although over infinite
alphabets, may even behave like monotonic distributions with o
symbols. The condition in
(41) merely means that the cost that a code would obtain in order to code very rare symbols, that
are larger than the effective alphabet size, is negligible w.r.t. the total cost obtained from other,
more likely, symbols. Note that for m = n, the bound is tighter than that of the third region
of Theorem 4, and a constant of 5/9 replaces 2/3. The second part of the theorem states that if
the decay is slow, but the cost of coding rare symbols is still diminishing per symbol, a universal
code still exists for such distributions. However, in this case the redundancy will be dominated by
coding the rare (out of order) symbols. This result leads to the following corollary:
Corollary 1 As n → ∞, sequences generated by monotonic distributions with Hθ(X) = O(1) are
universally compressible in the average sense.
Corollary 1 shows that sequences generated by finite entropy monotonic distributions can be com-
pressed in the average with diminishing per symbol redundancy. This result is consistent with the
results shown in [10].
While Theorem 5 bounds the redundancy decay rate with two extremes, a more general theorem
can be used to provide some best redundancy decay rate that a code can be designed to adapt to
for some unknown monotonic distribution that governs the data. As the examples at the end of
this section show, the next theorem is very useful for slower decaying distributions.
Theorem 6 Fix an arbitrarily small ε > 0, and let n → ∞. Let xn be generated by an i.i.d.
monotonic distribution θ ∈ M. Then, there exists a code with length function L∗(·), that achieves
redundancy
nRn (L
∗,θ) ≤ (1 + ε) ·
α,ρ:ρ≥α+ε
· (ρ+ 2α) (ρ− α) (log n)2nα + 5(log e)n1−2α +
θi log i
for coding sequences generated by the source θ.
We continue with proving the two theorems and the corollary.
Proof : The idea of the proof of both theorems is to separate the more likely symbols from the
unlikely ones. First, the code determines the point of separation m = nρ. (Note that ρ can be
greater than 1.) Then, all symbols i ≤ m are considered likely and are quantized in a similar
manner as in the codes for smaller alphabets. Unlike bounded alphabets, though, a more robust
grid is used here to allow larger values of m. Coding of occurrences of these symbols uses the
quantized probabilities. The unlikely symbols are coded hierarchically. They are first merged into
a single symbol, and then are coded within this symbol, where the full cost of conveying to the
decoder which rare symbols occur in the sequence is required. Thus, they are presented giving
their actual value. As long as the decay is fast enough, the average cost of conveying these symbols
becomes negligible w.r.t. the cost of coding the likely symbols. If the decay is slower, but still fast
enough, as the case described in condition (43), the coding cost of the rare symbols dominates the
redundancy, but still diminishing redundancy can be achieved. In order to determine the best value
of m for a given sequence, all values are tried and the one yielding the shortest description is used
for coding the specific sequence xn.
Let m ≥ 2 determine the number of likely symbols in the alphabet. For a given m, define
θi, (46)
as the total probability of the remaining symbols. Given θ, m and Sm, a probability
P (xn|m,Sm,θ)
nx(i)
· Snx(x>m)m ·
nx(i)
nx(x > m)
)nx(i)
, (47)
can be computed for xn, where nx(i) is the occurrence count of symbol i in x
n, and nx(x > m)
is the count of all symbols greater than m in xn. This probability mass function clusters all large
symbols (with small probabilities) greater than m into one symbol. Then, it uses the ML estimate
of each of the large symbols to distinguish among them in the clustered symbol.
For every m, we can define a quantization grid ξm for the first m probability parameters of
θ. The idea is similar to that used for all probability parameters in the proof of Theorem 4. If
m = o(n1/3), we use ξm = τm, where τm is the grid defined in (18) where m replaces k. Otherwise,
we can use the definition of η in (35). However, to obtain tighter bounds for large m, we define a
different grid for the larger values of m following similar steps to those in (32)-(36). First, define
the jth interval as
n(j−1)β
nρ+2α
nρ+2α
, 1 ≤ j ≤ Jρ, (48)
where ρ = (logm)/(log n) as defined above, α is a parameter, and β = 1/(log n) as before. Within
the jth interval, we define the spacing in the grid by
nρ+3α
. (49)
As in (34),
|Ij | ≤ 0.5 · nα, ∀j : j = 1, 2, . . . , Jρ, (50)
and the total number of intervals is
Jρ = ⌈(ρ+ 2α) log n⌉ . (51)
Similarly to (35), ξm is defined as
ξm = (ξ1, ξ2, . . .) =
nρ+2α
nρ+2α
nρ+3α
, . . . ,
nρ+2α
nρ+2α
nρ+3α
, . . .
. (52)
The cardinality of ξm is thus
= |ξm| ≤ 0.5 · nα ⌈(ρ+ 2α) log n⌉ . (53)
An mth order quantized version θ′m of θ is obtained by quantizing θi, i = 2, 3, . . . ,m onto ξm,
such that θ′i ∈ ξm for these values of i. Then, the remaining cluster probability Sm is quantized into
S′m ∈ [1/n, 2/n, . . . , 1]. The parameter θ′1 is constrained by the quantization of the other parameters.
Quantization is performed in a similar manner as before, to minimize the accumulating cost and
retain monotonicity.
Now, for any m ≥ 2, let ϕm be any monotonic probability vector of cardinality m whose last
m− 1 components are quantized into ξm, and let σm ∈ [1/n, 2/n, . . . , 1] be a quantized estimate of
the total probability of the remaining symbols, such that
i=1 ϕi,m+σm = 1, where ϕi,m is the ith
component of ϕm. If m, σm and ϕm are known, a given x
n can be coded using P (xn|m,σm,ϕm)
as defined in (47), where σm replaces Sm, and the m components of ϕm replace the first m com-
ponents of θ. However, in the universal setting, none of these parameter are known in advance.
Furthermore, neither the symbols greater than m nor their conditional ML probabilities are known
in advance. Therefore, the total cost of coding xn using these parameters requires universality costs
for describing them. The cost of universally coding xn assigning probability P (xn|m,σm,ϕm) to it
thus requires the following five components: 1) m should be described using Elias’ representation
with at most 1+ ρ log n+2 log(1+ ρ log n) bits. 2) The value of σm in its quantization grid should
be coded using log n bits. 3) The m components of ϕm require LR (ϕm) (which is bounded below)
bits. 4) The number cx(x > m) of distinct letters in x
n greater than m is coded using log n bits.
5) Each letter i > m in xn is coded. Elias’ coding for the integers using 1 + log i+ 2 log(1 + log i)
bits can be used, but to simplify the derivation we can also use the code, also presented in [7], that
uses no more than 1 + 2 log i bits to describe i. In addition, at most log n bits are required for
describing nx(i) in x
n. For n→ ∞, m≫ 1, and ε > 0 arbitrarily small, this yields a total cost of
L (xn|m,σm,ϕm) ≤ − log P (xn|m,σm,ϕm) + LR (ϕm) + [(1 + ε)ρ+ cx(x > m) + 2] log n
+cx(x > m) + 2
i>m,i∈xn
log i, (54)
where we assume m is large enough to bound the cost of describing m by (1 + ε)ρ log n.
The description cost of ϕm for m = o(n
1/3) is bounded by
LR (ϕm) ≤ (1 + ε)
using (21), where m replaces k. The (log n)2 factor in (21) can be absorbed in ε since we limit m
to o(n1/3), unlike the derivation in (21). For larger values of m, we describe symbol probabilities of
ϕm in the grid ξm in a similar manner to the description of O(n) symbol probabilities in the grid
η. Similarly to (38), we thus have
LR(ϕm) ≤ Bρ +Bρ log
nρ +Bρ
+ 2Bρ log log
nρ +Bρ
+ o (Bρ)
≤ (1 + ε)
(ρ+ 2α) (ρ+ ε− α) (log n)2nα (56)
where to obtain inequality (a), we first multiply nρ by nε in the numerator of the argument of the
logarithm. This is only necessary for ρ→ α to guarantee that nρ+ε ≫ Bρ. Substituting the bound
on Bρ from (53), absorbing low order terms in the leading ε, yields the bound.
A sequence xn can now be coded using the universal parameters that minimize the length of
the sequence description, i.e.,
L∗ (xn)
= min
σm′∈[
,...,1]
ϕm′ :ϕi∈ξm′ ,i≥2
xn|m′, σm′ ,ϕm′
xn|m,S′m,θ′m
, (57)
where θ′m and S
m are the true source parameters quantized as described above, and the inequality
holds for every m. Note that the maximization on m′ should be performed only up to the maximal
symbol the occurs in xn.
Following (54)-(57), up to negligible integer length constraints, the average redundancy using
L∗(·) is bounded, for every m ≥ 2, by
nRn (L
∗,θ) = Eθ [L
∗ (Xn) + log Pθ (X
Xn | m,S′m,θ′m
+ logPθ (X
≤ Eθ log
Pθ (X
Xn | m,S′m,θ′m
) + LR
Pθ (i ∈ Xn) log i
+(1 + ε) [EθCx (X > m) + ρ+ 2] log n (58)
where (a) follows from (57), and (b) follows from averaging on (54) with σm = S
m, and ϕm = θ
where the average on cx(x > m) is absorbed in the leading ε.
Expressing Pθ (x
n) as
Pθ (x
nx(i)
 · Snx(x>m)m ·
)nx(i)
, (59)
and defining δS
= Sm − S′m, the first term of (58) is bounded, for the upper region of m, by
Eθ log
Pθ (X
Xn | m,S′m,θ′m
) ≤ Eθ
Nx(i) log
θ′i,m
+Nx (X > m) log
Nx(i) log
θi/Sm
Nx(i)/Nx(X > m)
≤ n ·
θi log
θ′i,m
+ nSm log
≤ n(log e)
θ′i,m
≤ (log e) ·
n · nρ
nρ+2α
+ 2(log e)n1−ρ−4α ·
jβ + log e
≤ 5(log e)n1−2α + log e, (60)
where (a) is since for the third term, the conditional ML probability used for coding is greater than
the actual conditional probability assigned to all letters greater than m for every xn. Hence, the
third term is bounded by 0. For the other terms expectation is performed. Inequality (b) is obtained
similarly to (28) where quantization includes the first m components of θ and the parameter Sm.
Then, inequality (c) follows the same reasoning as step (a) of (39). The first term bounds the worst
case in which all nρ symbols are quantized to 1/nρ+2α with |δi| ≤ 1/nρ+2α. The second term is
obtained where θ′i,m ≥ n(j−1)β/nρ+2α and |δi| ≤ njβ/nρ+3α for θi ∈ Ij , and kj = |θi ∈ Ij| as before.
The last term is since S′m ≥ 1/n and |δS | ≤ 1/n. Finally, (d) is obtained similarly to step (b) of
(39), where as in (29),
jβ ≤ 2nρ+2α. For m = o(n1/3), the same initial steps up to step (b) in
(60) are applied, and then the remaining steps in (28) are applied to the left sum with m replacing
k, yielding a total quantization cost of 5(log e)m+ log e.
To bound the third and fourth terms of (58), we realize that
Pθ (i ∈ Xn) = 1− (1− θi)n ≤ nθi. (61)
Similarly,
EθCx(X > m) =
Pθ (i ∈ Xn) ≤ nSm. (62)
Combining the dominant terms of the third and fourth terms of (58), we have
Pθ (i ∈ Xn) log i+ (1 + ε)EθCx(X > m) log n
Pθ (i ∈ Xn) [2 log i+ (1 + ε) log n]
1 + ε
Pθ (i ∈ Xn) log i
1 + ε
θi log i (63)
where (a) is because EθCx(X > m) =
i>m Pθ (i ∈ Xn), (b) is because for i > m = nρ, log i >
ρ log n, and (c) follows from (61). Given ρ > ε for an arbitrary fixed ε > 0, the resulting coefficient
above is upper bounded by some constant κ.
Summing up the contributions of the terms of (58) from (28), (55), and (63), absorbing low
order terms in a leading ε′, we obtain that for m = o(n1/3),
nRn (L
∗,θ) ≤
1 + ε′
) m− 1
θi log i. (64)
For the second region, substituting α = 1/3, and summing up the contributions of (60), (56), and
(63) to (58), absorbing low order terms in ε′, we obtain
nRn (L
∗,θ) ≤ (1 + ε′)
ρ+ ε′ −
(log n)
n1/3 + κn
θi log i. (65)
Since (64)-(65) hold for every m > nε, there exists m∗ for which the minimal bound is obtained.
To bound the redundancy, we choose this m∗. Now, if the condition in (41) holds, then the second
term in (64) and (65) is negligible w.r.t. the first term. Absorbing it in a leading ε, normalizing by
n, yields the upper bound of (42), and concludes the proof of the Part I of Theorem 5.
For Part II of Theorem 5, we consider the bound of the second region in (65). If there exists
ρ∗ = o
n1/3/(log n)
for which the condition in (43) holds, then both terms of (65) are of o(n),
yielding a total redundancy per symbol of o(1). The proof of Theorem 5 is concluded. �
To prove Corollary 1, we use Wyner’s inequality [32], which implies that for a finite entropy
monotonic distribution,
θi log i = Eθ [logX] ≤ Hθ [X] . (66)
Since the sum on the left hand side of (66) is finite if Hθ[X] is finite, there must exist some n0 such
θi log i = o(1). Let n > n0, then for m
∗ = n and ρ∗ = 1, condition (43) is satisfied.
Therefore, (44) holds, and the proof of Corollary 1 is concluded. �
We now consider only the upper region in (58) with parameters α and ρ taking any valid value.
(The code leading to the bound of the upper region can be applied even if the actual effective
alphabet size is in the lower region.) We can sum up the contributions of (60), (56), and (63) to
(58), absorbing low order terms in ε. Equation (56) is valid without the middle ε term as long as
ρ ≥ α + ε. Since, in the upper region of m, i ≥ m is large enough, Elias’ code for the integers
can be used costing (1 + ε) log i to code i, with ε > 0 which can be made arbitrarily small. Hence,
the leading coefficient of the bound in (63) can be replaced by (1 + ε)(1 + 1/ρ). This yields the
expression bounding the redundancy in (45). This expression applies to every valid choice of α and
ρ, including the choice that minimizes the expression. Thus the proof of Theorem 6 is concluded. �
5.2 Examples
We demonstrate the use of the bounds of Theorems 5 and 6 with three typical distributions over
the integers. We specifically show that the redundancy rate of O
n1/3+ε
bits overall is achievable
when coding many of the typical monotonic distributions, and, in fact, for many distributions
faster convergence rates are achievable with the codes provided in proving the theorems above.
The assumption that very few unlikely symbols are likely to appear in a sequence generated by a
monotonic distribution, which is reflected in the conditions in (41) and (43), is very realistic even
in practical examples. Specifically, in the phone book example, there may be many rare names, but
only very few of them may occur in a certain city, and the more common names constitute most of
any possible phone book sequence.
5.2.1 Fast Decaying Distributions Over the Integers
Consider the monotonic distributions over the integers of the form,
, i = 1, 2, . . . , (67)
where γ > 0, and a is a normalization coefficient that guarantees that the probabilities over all
integers sum to 1. It is easy to show by approximating summation by integration that for some
m→ ∞,
Sm ≤ (1 + ε)
θi log i ≤ (1 + ε)
a logm
. (69)
For m = nρ and fixed ρ, the sum in (41) is thus O
n1−ργ log n
, which is o
n1/3(log n)2
for every
ρ ≥ 2/(3γ). Specifically, as long as γ ≤ 2 (slow decay), the minimal value of ρ required to guarantee
negligibility of the sum in (41) is greater than 1/3. Using Theorem 5, this implies that for γ ≤ 2,
the second (upper) region of the upper bound in (42) holds with the minimal choice of ρ∗ = 2/(3γ).
Plugging in this value in the second region of (40) (i.e., in (42)) yields the upper bound shown below
for this region. For γ > 2, 2/(3γ) < 1/3. Hence, (41) holds for m∗ = o
. This means that for
the distribution in (67) with γ > 2, the effective alphabet size is o
, and thus the achievable
redundancy is in the first region of the bound of (42). Thus, even though the distribution is over
an infinite alphabet, its compressibility behavior is similar to a distribution over a relatively small
alphabet. To find the exact redundancy rate, we balance between the contributions of (55) and
(63) in (58). As long as 1 − ργ < ρ, condition (41) holds, and the contribution of small letters
in (63) is negligible w.r.t. the other terms of the redundancy. Equality, implying ρ∗ = 1/(1 + γ),
achieves the minimal redundancy rate. Thus, for γ > 2,
nRn (L
≤ (1 + ε)
a(2ρ∗ + 1)
∗γ log n+
(1− 3ρ∗) log n
= (1 + ε)
1+γ log n (70)
where the first term in (a) follows from the bounds in (63) and (69), with m = nρ
, and the second
term from that in (55), and (b) follows from ρ∗ = 1/(1 + γ). Note that for a fixed ρ∗, the factor 3
in the first term can be reduced to 2 with Elias’ coding for the integers. The results described are
summarized in the following corollary:
Corollary 2 Let θ ∈ M be defined in (67). Then, there exists a universal code with length function
L∗(·) that has only prior knowledge that θ ∈ M, that can achieve universal coding redundancy
Rn (L
∗,θ) ≤
(1 + ε) 1
1 + 1
+ ε− 1
n1/3(logn)2
, for γ ≤ 2,
(1 + ε)
1+γ logn
, for γ > 2.
Corollary 2 gives the redundancy rates for all distributions defined in (67). For example, if γ = 1,
the redundancy is O
n1/3(log n)2
bits overall with coefficient 2/9. For γ = 3, O(n1/4 log n) bits
are required. For faster decays (greater γ) even smaller redundancy rates are achievable.
5.2.2 Geometric Distributions
Geometric distributions given by
θi = p (1− p)i−1 ; i = 1, 2, . . . , (72)
where 0 < p < 1, decay even faster than the distribution over the integers in (67). Thus their
effective alphabet sizes are even smaller. This implies that a universal code can have even smaller
redundancy than that presented in Corollary 2 when coding sequences generated by a geometric
distribution (even if this is unknown in advance, and the only prior knowledge is that θ ∈ M).
Choosing m = ℓ · log n, the contribution of low probability symbols in (63) to (58) can be upper
bounded by
θi (log i+ log n)
≤ 2n(1− p)m log n+O (n(1− p)m logm)
= 2n1+ℓ log(1−p)(log n) +O
n1+ℓ log(1−p) log log n
where (a) follows from computing Sm using geometric series, and bounding the second term, and
(b) follows from substituting m = ℓ log n and representing (1 − p)ℓ logn as nℓ log(1−p). As long as
ℓ ≥ 1/(− log(1− p)), the expression in (73) is O(log n), thus negligible w.r.t. the redundancy upper
bound of (42) with m∗ = ℓ∗ log n = (log n)/(− log(1 − p)). Substituting this m∗ in (42), we obtain
the following corollary:
Corollary 3 Let θ ∈ M be a geometric distribution defined in (72). Then, there exists a universal
code with length function L∗(·) that has only prior knowledge that θ ∈ M, that can achieve universal
coding redundancy
Rn (L
∗,θ) ≤ 1 + ε
−2 log(1− p)
· (log n)
. (74)
Corollary 3 shows that if θ parameterizes a geometric distribution, sequences governed by θ can be
coded with average universal coding redundancy of O
(log n)2
bits. Their effective alphabet size
is O(log n), implying that larger symbols are very unlikely to occur. For example, for p = 0.5, the
effective alphabet size is log n, and 0.5(log n)2 bits are required for a universal code. For p = 0.75,
the effective alphabet size is (log n)/2, and (log n)2/4 bits are required by a universal code.
5.2.3 Slow Decaying Distributions Over the Integers
Up to now, we considered fast decaying distributions, which all achieved the O(n1/3+ε/n) redun-
dancy rate. We now consider a slowly decaying monotonic distribution over the integers, given
i (log i)
, i = 2, 3, . . . , (75)
where γ > 0 and a is a normalizing factor (see, e.g., [12], [27]). This distribution has finite
entropy only if γ > 0 (but is a valid infinite entropy distribution for γ > −1). Unlike the previous
distributions, we need to use Theorem 6 to bound the redundancy for coding sequences generated
by this distribution. Approximating the sum with an integral, the order of the third term of (45)
θi log i = O
(logm)γ
. (76)
In order to minimize the redundancy bound of (45), we define ρ = nℓ. For the minimum rate, all
terms of (45) must be balanced. To achieve that, we must have
α+ 2ℓ = 1− 2α = 1− γℓ. (77)
The solution is α = γ/(4 + 3γ), and ℓ = 2/(4 + 3γ). Substituting these values in the expression of
(45), with ρ = nℓ, results in the first term in (45) dominating, and yields the following corollary:
Corollary 4 Let θ ∈ M be defined in (75) with γ > 0. Then, there exists a universal code with
length function L∗(·) that has only prior knowledge that θ ∈ M, that can achieve universal coding
redundancy
Rn (L
∗,θ) ≤ (1 + ε)
3γ+4 (log n)2
. (78)
Due to the slow decay rate of the distribution in (75), the effective alphabet size is much greater
here. For γ = 1, for example, it is nn
. This implies that very large symbols are likely to appear
in xn. As γ increases though, the effective alphabet size decreases, and as γ → ∞, m → n. The
redundancy rate increases due to the slow decay. For γ ≥ 1, it is O
n5/7(log n)2/n
. As γ → ∞,
since the distribution tends to decay faster, the redundancy rate tends to the finite alphabet rate
n1/3(log n)2/n
. However, as the decay rate is slower γ → 0, a non-diminishing redundancy
rate is approached. Note that the proof of Theorem 6 does not limit the distribution to a finite
entropy one. Therefore, the bound of (78) applies, in fact, also to −1 < γ ≤ 0. However, for γ ≤ 0,
the per-symbol redundancy is no long diminishing.
6 Individual Sequences
In this section, we first show that individual sequences whose empirical distributions obey the
monotonicity constraints can be universally compressed as well as the average case. We then
study compression of sequences whose empirical distributions may diverge from monotonic. We
demonstrate that under mild conditions, similar in nature to those of Theorems 5 and 6, redundancy
that diminishes (slower than in the average case) w.r.t. the monotonic ML description length can
be obtained. However, these results are only useful when the monotonic ML description length
diverges only slightly from the (standard) ML description length of a sequence, i.e., the empirical
distribution of a sequence only mildly violates monotonicity. Otherwise, the penalty of using an
incorrect monotone model overwhelms the redundancy gain. We begin with sequences that obey
the monotonicity constraints.
Theorem 7 Fix an arbitrarily small ε > 0, and let n→ ∞. Let xn be a sequence for which θ̂ ∈ M,
i.e., θ̂1 ≥ θ̂2 ≥ . . .. Let k = k̂ be the number of letters occurring in xn. Then, there exists a code
L∗ (·) that achieves individual sequence redundancy w.r.t. θ̂M = θ̂ for xn which is upper bounded
R̂n (L
∗, xn) ≤
(1 + ε) k−1
n(logn)
, for k ≤ n1/3,
(1 + ε) (log n)
log k
n1/3−ε
, for n1/3 < k = o(n),
(1 + ε) 1
(log n)
2 n1/3
, for n1/3 < k = O(n).
Note that by the monotonicity constraint, the number of symbols k̂ occurring in xn also equals to
the maximal symbol in xn. Since, in the individual sequence case, this maximal symbol defines the
class considered and also to be consistent with Theorem 3, we use k to characterize the alphabet
size of a given sequence. (The maximal symbol in the individual sequence case is equivalent to the
alphabet size in the average case.) Finally, since θ̂ is monotonic, θ̂M = θ̂.
Proof of Theorem 7: The result in Theorem 7 follows directly from the proof of Theorem 4.
Both regions of the proof apply here, where instead of quantizing θ to θ′, we quantize θ̂ to θ̂
similar manner, and do not need to average over all sequences. In fact, instead of using any general
ϕ̂ to code xn, we can use θ̂
without any additional optimizations, where log n bits describe k. The
description costs of θ̂
are almost the same as those of θ′. The factor 2 reduction in the last region
is because it is sufficient here to replace n2 by n in the denominators of (32). This is because for
every occurring symbol θ̂′i ≥ 1/n and δi ≤ 1/n, thus the first term of step (a) in (39) holds with
the new grid, and B2 in (36) reduces by a factor of 2. The quantization costs bounded in (28) and
(39) are thus bounded similarly, where θ̂ replaces θ and θ̂
replaces θ′. This results in the bounds
in (79) and concludes the proof of Theorem 7. �
If one a-priori knows that xn is likely to have been generated by a monotonic distribution,
the case considered in Theorem 7 is with high probability the typical one. However, a typical
sequence can also be one for which θ̂ 6∈ M, where θ̂ mildly violates the monotonicity. In the pure
individual sequence setting (where no underlying distribution is assumed but some monotonicity
assumption is reasonable for the empirical distribution of xn), one can still observe sequences that
have empirical distributions that are either monotonic or slightly diverge from monotonic. Coding
for this more general case can apply the methods described in Section 5 to the individual sequence
case. If the divergence from monotonicity is small, one may still achieve bounds of the same order
of those presented in Theorem 7 with additional negligible cost of relaying which symbols are out
of order. The next theorem, however, provides a general upper bound in the form of the bounds
of Theorems 5 and 6 for the individual sequence redundancy w.r.t. the monotonic ML description
length, as defined in (10). We begin, again, with some notation.
Recall the definition of an effective alphabet size m
= nρ (where ρ = (logm)/(log n).)
Now, use this definition for a specific individual sequence xn. Let
R̂n(m)
log n
, m ≤ n1/3,
m log n
, n1/3 < m = o (
minα<ρ
ρ+1+α
(ρ− α) (log n)2 nα + 3(log e)n1−α
, otherwise.
Theorem 8 Fix an arbitrarily small ε > 0, and let n→ ∞. Then, there exists a code with length
function L∗(·), that achieves individual sequence redundancy w.r.t. the monotonic ML description
length of xn (as defined in (10)) bounded by
R̂n (L
∗, xn) ≤ 1 + ε
R̂n (nρ) +
i>nρ,i∈xn
log i
for every xn.
Theorem 8 shows that if one can find a relatively small effective alphabet of the symbols that
occur in xn, and the symbols outside this alphabet are small enough, xn can be described with
diminishing per-symbol redundancy w.r.t. its monotonic ML description length. This implies that
as long as the occurring symbols are not too large, there exist a universal code w.r.t. a monotonic
ML distribution for any such sequence xn. This is unlike standard individual sequence compression
w.r.t. the i.i.d. ML description length. Specifically, if the effective alphabet size is O(n), and
only a small number of symbols which are only polynomial in n occur, the universality cost is
n(log n)2) bits overall, which gives diminishing per-symbol redundancy of O((log n)2/
This redundancy is much better than what can be achieved in standard compression. The penalty,
of course, is when the empirical distribution of an individual sequence diverges significantly away
from a monotonic one. While the monotonic redundancy can be made diminishing under mild
conditions, there is a non-diminishing divergence cost by using the monotonic ML description
length instead of the ML description length in that case. This implies that one should compress
a sequence as generated by a monotonic distribution only if the total description length required
to code xn as such is shorter than the total description length required to code xn with standard
methods. As shown in the proof of Theorem 8, one prefix bit can inform the decoder which type
of description is used.
Theorem 8 shows that as long as the effective alphabet size is polynomial in n, α = 0.5 optimizes
the third region of the upper bound, thus yielding the rate shown above, unless very large symbols
occur in xn. For small effective alphabets (the first region), there is no redundancy gain in using
the monotonic ML description length over the ML description length. The reason, again, is that
the bound is obtained for cases where the actual empirical distribution of a sequence may not be
monotonic. One can still use an i.i.d. ML estimate w.r.t. only the effective alphabet, if the additional
cost of symbols outside this alphabet is negligible, to better code such sequences. Theorem 8 also
shows that if a very large symbol, such as i = an; a > 1, occurs in xn, xn cannot be universally
compressed even w.r.t. its monotonic ML description length. This is because it is impossible to
avoid the cost of (1+ε) log i = (1+ε)n log a bits to describe this symbol to the decoder. The bound
above and its proof below give a very powerful method to individually compress sequences that
have an almost monotonic empirical distribution but may have some limited disorder, for which
the monotonic ML description length diverges only negligibly from the ML description length.
Proof of Theorem 8: The proof follows the same steps as the proof of Theorems 5 and 6. Each
value of m is tested and the best one is chosen, where the same coding costs described in the
mentioned proof are computed for each m. In addition, one can test the cost of coding xn using the
description lengths for both θ̂ and θ̂M. Then, one bit can be used to relay which ML estimator is
used. If θ̂ is used, the codes for coding individual sequences over large alphabets in either [21] or [25]
can be used. In the first region in (81), the bound in [25] is obtained since log P
(xn) ≥ log P
for every xn. This bound yields smaller redundancy for this region than that obtained using θ̂M
if θ̂M 6= θ̂. It implies that for small alphabets, if xn does not have an empirical monotonic
distribution, it is better coded, even in terms of universal coding redundancy, using standard
universal compression methods without taking advantage of a monotonicity assumption.
For the other two regions, we start with a lemma.
Lemma 6.1 Let θ̂M =
θ̂1,M, θ̂2,M, . . . , θ̂k,M
be the monotonic ML estimator of θ from xn, i.e.,
θ̂1,M ≥ θ̂2,M ≥ · · · ≥ θ̂k,M, where k = max {x1, x2, . . . , xn}. Then,
θ̂k,M ≥
. (82)
Lemma 6.1 provides a lower bound on the minimal nonzero probability component of the monotonic
ML estimator. This bound helps in designing the grid of points used to quantize the monotonic
ML distribution of xn, while maintaining bounded quantization costs. The proof of Lemma 6.1 is
in Appendix C.
For m in the second region, we cannot use the grid in (18). The reason is that, here, the
quantization cost is affected by both θ̂ and θ̂M. This is unlike the average case, where the av-
erage respective vectors merge. To limit the quantization cost for very small probabilities, using
Lemma 6.1, the minimal grid point must be 1/n2 or smaller. To make the quantization cost neg-
ligible w.r.t. the cost of describing the quantized ML, the ratio ∆j/ϕi,M between the spacing in
interval j, and a quantized version ϕi,M of θ̂i,M in the jth interval, must be O(m/n). Hence, using
the same methodology of the proof of Theorems 5 and 6, we define the jth interval for an effective
alphabet m = nρ = o (
n) as
Îj =
n(j−1)β
, 1 ≤ j ≤ Ĵρ. (83)
The spacing in the jth interval is
. (84)
This gives a total of
B̂ρ ≤
log n (85)
quantization points. Using the same methodology as in (21), this yields a representation cost of
LR (ϕm) ≤ (1 + ε)m log
where ϕm is the quantized version of θ̂M in which only the firstm components of θ̂M are considered.
Using the quantization with the grid defined in (83)-(86) in a code similar to the one used in the
proof of Theorems 5 and 6, the individual quantization cost is given by
P (xn|m,S′m,ϕm)
θ̂i log
θ̂i,M
+ log e
≤ n(log e)
+ log e
≤ (log e) · n
·mn+ (log e) · n · mn
+ log e
= 3m(log e) + log e. (87)
where (a) follows the same steps as in (60), (b) follows from ln(1+x) ≤ x, and then x ≤ |x|, where
= θ̂i,M − ϕi,m, and (c) follows from Lemma 6.1 and the definition of Îj in (83) (for the worst
case first term, |δi| ≤ 1/n2 and ϕi,m ≥ 1/(mn)), from (84) and (83) (the second term), and since
θ̂i = 1. The only additional non-negligible cost of coding sequences using a code as defined in
the proof of Theorems 5 and 6 for a given m is the cost of coding all symbols i > m that occur in
xn. Using a similar derivation to (54), with Elias’ asymptotic code for the integers, this yields an
additional cost of (1 + ε) (1 + 1/ρ)
i>nρ,i∈xn log i code bits. Combining all costs, absorbing low
order terms in ε, and normalizing by n, yields the second region of the bound in (81). Note that
this bound also applies to the first region, but in that region, a tighter bound is obtained by using a
code that uses the standard i.i.d. ML estimator θ̂. This is because very fine quantization is needed
to offset the cost of mismatch between θ̂ and θ̂M. This quantization requires higher description
costs than the description of a quantized type of a sequence when using standard compression.
(This is not the case when θ̂ obeys the monotonicity, as in Theorem 7. Even if θ̂ does not obey
monotonicity in the upper regions of the bound, this is not the case.)
For the last region of the bound, we follow the same steps above as was done for the upper
region of the bound in Theorem 5 with a parameter α. The intervals are chosen, again, to guarantee
bounded quantization costs. Hence,
Îj =
n(j−1)β
nρ+1+α
nρ+1+α
, 1 ≤ j ≤ Ĵρ. (88)
The spacing in the jth interval is
nρ+1+2α
. (89)
This gives a total of
B̂ρ ≤ 0.5nα ⌈(ρ+ 1 + α) log n⌉ (90)
quantization points. Using the same methodology as in (56), this yields a representation cost of
LR (ϕm) ≤ (1 + ε)
ρ+ 1 + α
(ρ+ ε− α) (log n)2nα. (91)
Similarly to (87),
P (xn|m,S′m,ϕm)
≤ (log e)
nρ+1+α
+ (log e)2n1−α + log e = 3(log e)n1−α + log e (92)
where (a) follows from similar steps to (a)-(c) of (87). Using Lemma 6.1, ϕi,m ≥ 1/nρ+1 and
|δi| ≤ 1/nρ+1+α, leading to the first term. Bounding |δi| ≤ njβ/nρ+1+2α and ϕi,m ≥ n(j−1)β/nρ+1+α
leads to the second term. Note that as before, m is used here in place of k, because using an ef-
fective alphabet m, all greater symbols are packed together as one symbol, and the additional cost
to describe them is reflected in an additional term. Adding this additional term with an identical
expression to that in the lower regions, absorbing low order terms in ε, and normalizing by n,
yields the third region of the bound in (81). Since the bound holds for every α and every ρ > α, it
can be optimized to give the values that attain the minimum, concluding the proof of Theorem 8. �
7 Summary and Conclusions
Universal compression of sequences generated by monotonic distributions was studied. We showed
that for finite alphabets, if one has the prior knowledge of the monotonicity of a distribution,
one can reduce the cost of universality. For alphabets of o(n1/3) letters, this cost reduces from
0.5 log(n/k) bits per each unknown probability parameter to 0.5 log(n/k3) bits per each unknown
probability parameter. Otherwise, for alphabets of O(n) letters, one can compress such sources with
overall redundancy of O(n1/3+ε) bits. This is a significant decrease in redundancy from O(k log n)
or O(n) bits overall that can be achieved if no side information is available about the source
distribution. Redundancy of O(n1/3+ε) bits overall can also be achieved for much larger alphabets
including infinite alphabets for fast decaying monotonic distributions. Sequences generated by
slower decaying distributions can also be compressed with diminishing per-symbol redundancy
costs under some mild conditions and specifically if they have finite entropy rates. Examples for
well-known monotonic distributions demonstrated how the diminishing redundancy decay rates
can be computed by applying the bounds that were derived. Finally, the average case results were
extended to individual sequences. Similar convergence rates were shown for sequences that have
empirical monotonic distributions. Furthermore, universal redundancy bounds w.r.t. the monotonic
ML description length of a sequence were also derived for the more general case. Under some mild
conditions, these bounds still exhibit diminishing per-symbol redundancies.
Appendix A – Proof of Theorem 1
The proof follows the same steps used in [25] and [26] to lower bound the maximin redundancies
for large alphabets and patterns, respectively, using the weak version of the redundancy-capacity
theorem [5]. This version ties between the maximin universal coding redundancy and the capacity
of a channel defined by the conditional probability Pθ (x
n). We define a set ΩMk of points θ ∈ Mk.
Then, show that these points are distinguishable by observing Xn, i.e., the probability that Xn
generated by θ ∈ ΩMk appears to have been generated by another point θ
′ ∈ ΩMk diminishes
with n. Then, using Fano’s inequality [3], the number of such distinguishable points is a lower
bound on R−n (Mk). Since R+n (Mk) ≥ R−n (Mk), it is also a lower bound on the average minimax
redundancy. The two regions in (6) result from a threshold phenomenon, where there exists a value
km of k that maximizes the lower bound, and can be applied to all Mk for k ≥ km.
We begin with defining ΩMk . Let ω be a vector of grid components, such that the last k − 1
components θi, i = 2, . . . , k, of θ ∈ ΩMk must satisfy θi ∈ ω. Let ωb be the bth point in ω, and
define ω0 = 0 and
2(j − 1
, b = 1, 2, . . . . (A.1)
Then, for the bth point in ω,
. (A.2)
To count the number of points in ΩMk , let us first consider the standard i.i.d. case, where there
is no monotonicity requirement, and count the number of points in Ω, which is defined similarly,
but without the monotonicity requirement (i.e., ΩMk ⊆ Ω). Let bi be the index of θi in ω, i.e.,
θi = ωbi . Then, from (A.1)-(A.2) and since the components of θ are probabilities,
ωbi =
θi ≤ 1. (A.3)
It follows that for θ ∈ Ω,
b2i ≤ n1−ε. (A.4)
Hence, since the components bi are nonnegative integers,
= |Ω| ≥
n1−ε⌋
n1−ε−b22
· · ·
n1−ε−
i=2 b
n1−ε−x22
· · ·
n1−ε−
i=2 x
dxk · · · dx3dx2
(A.5)
where Vk−1
is the volume of a k − 1 dimensional sphere with radius
, (a) follows
from monotonic decrease of the function in the integrand for all integration arguments, and (b)
follows since its left hand side computes the volume of the positive quadrant of this sphere. Note
that this is a different proof from that used in [25]-[26] for this step. Applying the monotonicity
constraint, all permutations of θ that are not monotonic must be taken out of the grid. Hence,
= |ΩMk | ≥
k! · 2k−1
, (A.6)
where dividing by k! is a worst case assumption, yielding a lower bound and not an equality. This
leads to a lower bound equal to that obtained for patterns in [26] on the number of points in ΩMk .
Specifically, the bound achieves a maximal value for km =
πn1−ε/2
and then decreases to
eventually become smaller than 1. However, for k > km, one can consider a monotonic distribution
for which all components θi; i > km, of θ are zero, and use the bound for km.
Distinguishability of θ ∈ ΩMk is a direct result of distinguishability of θ ∈ Ω, which is shown
in Lemma 3.1 in [25], i.e., there exits an estimator Θ̂g(X
n) ∈ Ω for which the estimate θ̂g satisfies
limn→∞ Pθ
θ̂g 6= θ
= 0 for all θ ∈ Ω. Since this is true for all points in Ω, it is also true
for all points in ΩMk ⊆ Ω, where now, θ̂g ∈ ΩMk . Assuming all points in ΩMk are equally
probable to generate Xn, we can define an average error probability Pe
Θ̂g(X
n) 6= Θ
θ∈ΩMk
θ̂g 6= θ
/MMk . Using the redundancy-capacity theorem,
nR−n [Mk] ≥ C [Mk → Xn]
≥ I[Θ;Xn] = H [Θ]−H [Θ|Xn]
= logMMk −H [Θ|X
≥ (1− Pe) (logMMk)− 1
≥ (1− o(1)) logMMk , (A.7)
where C [Mk → Xn] denotes the capacity of the respective channel and I[Θ;Xn] is the mutual
information induced by the joint distribution Pr (Θ = θ) · Pθ (Xn). Inequality (a) follows from the
definition of capacity, equality (b) from the uniform distribution of Θ in ΩMk , inequality (c) from
Fano’s inequality, and (d) follows since Pe → 0. Lower bounding the expression in (A.6) for the
two regions (obtaining the same bounds as in [26]), then using (A.7), normalizing by n, and ab-
sorbing low order terms in ε, yields the two regions of the bound in (6). The proof of Theorem 1
is concluded. �
Appendix B – Proof of Theorem 2
To prove Theorem 2, we use the random-coding strong version of the redundancy-capacity theorem
[17]. The idea is similar to the weak version used in Appendix A. We assume that grids ΩMk of
points are uniformly distributed over Mk, and one grid is selected randomly. Then, a point in the
selected grid is randomly selected under a uniform prior to generate Xn. Showing distinguishability
within a selected grid, for every possible random choice of ΩMk , implies that a lower bound on the
cardinality of ΩMk for every possible choice is essentially a lower bound on the overall sequence
redundancy for most sources in Mk.
The construction of ΩMk is identical to that used in [26] to construct a grid of sources that
generate patterns. We pack spheres of radius n−0.5(1−ε) in the parameter space defining Mk. The
set ΩMk consists of the center points of the spheres. To cover the space Mk, we randomly select
a random shift of the whole lattice under a uniform distribution. The cardinality of ΩMk is lower
bounded by the relation between the volume of Mk, which equals (as shown in [26]) 1/[(k− 1)!k!],
and the volume of a single sphere, with factoring also of a packing density (see, e.g., [2]). This
yields eq. (55) in [26],
MMk ≥
(k − 1)! · k! · Vk−1
n−0.5(1−ε)
· 2k−1
, (B.1)
where Vk−1
n−0.5(1−ε)
is the volume of a k−1 dimensional sphere with radius n−0.5(1−ε) (see, e.g.,
[2] for computation of this volume).
For distinguishability, it is sufficient to show that there exists an estimator Θ̂g(X
n) ∈ ΩMk
such that limn→∞ PΘ
Θ̂g(X
n) 6= Θ
= 0 for every choice of ΩMk and for every choice of Θ ∈
ΩMk . This is already shown in Lemma 4.1 in [25] for a larger grid Ω of i.i.d. sources, which is
constructed identically to ΩMk over the complete k−1 dimensional probability simplex. Therefore,
by the monotonicity requirement, for every ΩMk , there exists such Ω, such that ΩMk ⊆ Ω. Since
Lemma 4.1 in [25] holds for Ω, it then must also hold for the smaller grid ΩMk . Note that
distinguishability is easier to prove here than for patterns because Θ̂g(X
n) is obtained directly
form Xn and not from its pattern as in [26]. Now, since all the conditions of the strong random-
coding version of the redundancy-capacity theorem hold, taking the logarithm of bound in (B.1),
absorbing low order terms in ε, and normalizing by n, leads to the first region of the bound in (7).
More detailed steps follow those found in [26].
The second region of the bound is handled in a manner related to the second region of the
bound of Theorem 1. However, here, we cannot simply set the probability of all symbols i > km
to zero, because all possible valid sources must be included in one of the grids ΩMk to generate
a complete covering of Mk. As was done in [26], we include sources with θi > 0 for i > km in
the grids ΩMk , but do not include them in the lower bound on the number of grid points. In-
stead, for k > km, we bound the number of points in a km-dimensional cut of Mk for which the
remaining k− km components of θ are very small (and insignificant). This analysis is valid also for
k > n. Distinguishability for k > km is shown for i.i.d. non-monotonically restricted distributions
in the proof of Lemma 6.1 in [26]. As before, it carries over to monotonic distributions, since as
before, for each ΩMk , there exists an unrestricted corresponding Ω, such that ΩMk ⊆ Ω. The
choice of km = 0.5(n
1−ε/π)1/3 gives the maximal bound w.r.t. k. Since, again, all conditions of the
strong version of the redundancy-capacity theorem are satisfied, the second region of the bound is
obtained. Again, more detailed steps can be found in [26]. This concludes the proof of Theorem 2. �
Appendix C – Proof of Lemma 6.1
For cardinality k, we consider the largest component of θ̂M; θ̂1,M, as the constraint component,
i.e., θ̂1,M = 1−
i=2 θ̂i,M. For any given probability parameter ϕ of cardinality k with ϕ1 > 0, we
Pϕ (x
n) = ϕ
nx(1)
1 (1− ϕ1)
n−nx(1) ·
1− ϕ1
)nx(i) △
nx(1)
1 (1− ϕ1)
n−nx(1)
nx(i)
i (C.1)
where we recall that nx(i) is the occurrence count of i in x
n. Therefore, maximization of Pϕ (x
w.r.t. ϕ1 is independent of the maximization over ϑi; i > 1, and is obtained for ϕ1 = θ̂1 = nx(1)/n.
Since for all i > 1, θ̂1,M ≥ θ̂i,M, θ̂1,M can thus only increase from θ̂1 by the monotonicity constraint.
(Note that the monotonicity constraint implies a water filling [3] optimization to achieve θ̂M.)
Hence, θ̂1,M ≥ nx(1)/n.
Now, using the result above, we show that the derivative of lnPϕM (x
n) w.r.t. ϕk,M is positive
for ϕk,M < 1/(kn) and a monotonic ϕM. A component of a parameter vector ϕM, which is
monotonic, can be expressed as
ϕi,M =
ϕ′ℓ, ϕ
ℓ ≥ 0. (C.2)
Hence,
∂ lnPϕM (x
∂ϕk,M
ϕ1,M=θ̂1,M
∂ lnPϕM (x
ϕ1,M=θ̂1,M
nx(i)
− (k − 1)nx(1)
θ̂1,M
knx(k)
− knx(1)
= 0 (C.3)
where (a) follows from ϕk,M being the smallest nonzero component of ϕM, (b) is since by (C.2),
ϕ′k is included in all terms, and
ϕ1,M = 1−
ϕi,M = 1−
(i− 1)ϕ′i − (k − 1)ϕk,M, (C.4)
where the last equality follows from (C.2), (c) follows by omitting all terms of the sum except i = k,
from the assumption that ϕk,M < 1/(nk) ≤ θ̂k/k, and since θ̂1,M ≥ nx(1)/n = θ̂1, and (d) follows
since its left hand side is 0 for the (i.i.d.) ML parameter values. Hence, PϕM (x
n) must increase,
with ϕ1,M taking its optimal value, for all ϕM for which ϕk,M < 1/(nk), and the maximum is thus
achieved for θ̂k,M ≥ 1/(nk). �
References
[1] J. Åberg, Y. M. Shtarkov, and B. J. M. Smeets, “Multialphabet coding with separate alphabet
description,” in Proceedings of Compression and Complexity of Sequences, pp. 56-65, Jun.
1997.
[2] J. H. Conway, N. J. A. Sloane, Sphere Packings, Lattices and Groups, Springer-Verlag, Third
Edition, 1998.
[3] T. M. Cover and J. A. Thomas, Elements of Information Theory , second edition, John Wiley
& Sons, 2006.
[4] I. Csiszar and J. Korner, Information Theory: Coding Theorems for Discrete Memoryless
Systems., Academic Press, New York, 1981.
[5] L. D. Davisson, “Universal noiseless coding,” IEEE Trans. Inform. Theory , vol. IT-19, no. 6,
pp. 783-795, Nov. 1973.
[6] L. D. Davisson, and A. Leon-Garcia, “A source matching approach to finding minimax codes,”
IEEE Trans. Inform. Theory , vol. IT-26, no. 2, pp. 166-174, Mar. 1980.
[7] P. Elias, “Universal codeword sets and representation of the integers,” IEEE Trans. Inform.
Theory , vol. IT-21, no. 2, pp. 194-203, March 1975.
[8] B. M. Fitingof, “Optimal coding in the case of unknown and changing message statistics,”
Probl. Inform. Transm., vol. 2, no. 2, pp. 1-7, 1966.
[9] B. M. Fitingof, “The compression of discrete information,” Probl. Inform. Transm., vol. 3,
no. 3, pp. 22-29, 1967.
[10] D. P. Foster, R. A. Stine, and A. J. Wyner, “Universal codes for finite sequences of integers
drawn from a monotone distribution,” IEEE Trans. Inform. Theory , vol. 48, no. 6, pp. 1713-
1720, June 2002.
[11] R. G. Gallager, “Source coding with side information and universal coding,” unpublished
manuscript, September 1976.
[12] G. M. Gemelos and T. Weissman, “On the entropy rate of pattern processes,” IEEE Trans.
Inform. Theory , vol. 52, no. 9, pp. 3994-4007, Sept. 2006.
[13] L. Györfi, I. Páli, and E. C. van der Meulen, “There is no universal code for an infinite source
alphabet,” IEEE Trans. Inform. Theory , vol. 40, no. 1, pp. 267-271, Jan. 1994.
[14] N. Jevtić, A. Orlitsky, and N. P. Santhanam, “A lower bound on compression of unknown
alphabets,” Theoret. Comput. Sci., vol. 332, no. 1-3, pp. 293-311, 2005.
[15] J. C. Kieffer, “A unified approach to weak universal source coding,” IEEE Trans. Inform.
Theory , vol. IT-24, no. 6, pp. 674-682, Nov. 1978.
[16] M. Khosravifard, H. Saidi, M. Esmaeili, and T. A. Gulliver, “The minimum average code
for finite memoryless monotone sources,” IEEE Trans. Inform. Theory , vol. 52, no. 3, pp.
955-975, Mar. 2007.
[17] N. Merhav and M. Feder, “A strong version of the redundancy-capacity theorem of universal
coding,” IEEE Trans. Inform. Theory , vol. no. 3, 41, pp. 714-722, May 1995.
[18] N. Merhav, G. Seroussi, and M. J. Weinberger, “Optimal prefix codes for sources with two-
sided geometric distributions,” IEEE Trans. Inform. Theory , vol. 46, no. 1, pp. 121-135, Jan.
2000.
[19] N. Merhav, G. Seroussi, and M. J. Weinberger, “Coding of sources with two-sided geometric
distributions and unknown parameters,” IEEE Trans. Inform. Theory , vol. 46, no. 1, pp.
229-236, Jan. 2000.
[20] A. Orlitsky, N. P. Santhanam, and J. Zhang, “Universal compression of memoryless sources
over unknown alphabets,” IEEE Trans. Inform. Theory , vol. 50, no. 7, pp. 1469-1481, July
2004.
[21] A. Orlitsky, and N. P. Santhanam, “Speaking of infinity,” IEEE Trans. Inform. Theory , vol.
50, no. 10, pp. 2215-2230, Oct. 2004.
[22] J. Rissanen, “Minimax codes for finite alphabets,” IEEE Trans. Inform. Theory , vol. IT-24,
no. 3, pp. 389-392, May 1978.
[23] J. Rissanen, “Universal coding, information, prediction, and estimation,” IEEE Trans. In-
form. Theory , vol. IT-30, no. 4, pp. 629-636, Jul. 1984.
[24] B. Ya. Ryabko, “Coding of a source with unknown but ordered probabilities,” Problems of
Information Transmission, vol. 15, no. 2, pp. 134-138, Oct. 1979.
[25] G. I. Shamir, “On the MDL principle for i.i.d. sources with large alphabets,” IEEE Trans.
Inform. Theory , vol. 52, no. 5, pp. 1939-1955, May 2006.
[26] G. I. Shamir, “Universal lossless compression with unknown alphabets - the average case”,
IEEE Trans. Inform. Theory , vol. 52, no. 11, pp. 4915-4944, Nov. 2006.
[27] G. I. Shamir, “Patterns of sequences and their entropy,” submitted to IEEE Trans. Inform.
Theory . Also in Arxiv:cs.IT/0605046 .
[28] G. I. Shamir, “A new redundancy bound for universal lossless compression of unknown alpha-
bets,” in Proceedings of The 38th Annual Conference on Information Sciences and Systems,
Princeton, New-Jersey, U.S.A., pp. 1175-1179, Mar. 17-19, 2004.
[29] Y. M. Shtarkov, “Universal sequential coding of single messages,” Problems of Information
Transmission, 23(3):3-17, Jul.-Sep. 1987.
[30] L. R. Varshney and V. K. Goyal, “Ordered and disordered source coding,” in Information
Theory & Applications Workshop (ITA), San Diego, California, Feb. 6-10, 2006.
[31] L. R. Varshney and V. K. Goyal, “On universal coding of unordered data,” in Information
Theory & Applications Workshop (ITA), San Diego, California, Jan. 29-Feb. 2, 2007.
[32] A. D. Wyner, “An upper bound on the entropy series,” Inform. Contr., vol. 20, pp. 176-181,
1972.
	Introduction
	Notation and Definitions
	Lower Bounds
	Upper Bounds for Small and Large Alphabets
	Upper Bounds for Fast Decaying Distributions
	Upper Bounds
	Examples
	Fast Decaying Distributions Over the Integers
	Geometric Distributions
	Slow Decaying Distributions Over the Integers
	Individual Sequences
	Summary and Conclusions
	–   Proof of Theorem ??
	–   Proof of Theorem ??
	–   Proof of Lemma ??
ABSTRACT
  We study universal compression of sequences generated by monotonic
distributions. We show that for a monotonic distribution over an alphabet of
size $k$, each probability parameter costs essentially $0.5 \log (n/k^3)$ bits,
where $n$ is the coded sequence length, as long as $k = o(n^{1/3})$. Otherwise,
for $k = O(n)$, the total average sequence redundancy is $O(n^{1/3+\epsilon})$
bits overall. We then show that there exists a sub-class of monotonic
distributions over infinite alphabets for which redundancy of
$O(n^{1/3+\epsilon})$ bits overall is still achievable. This class contains
fast decaying distributions, including many distributions over the integers and
geometric distributions. For some slower decays, including other distributions
over the integers, redundancy of $o(n)$ bits overall is achievable, where a
method to compute specific redundancy rates for such distributions is derived.
The results are specifically true for finite entropy monotonic distributions.
Finally, we study individual sequence redundancy behavior assuming a sequence
is governed by a monotonic distribution. We show that for sequences whose
empirical distributions are monotonic, individual redundancy bounds similar to
those in the average case can be obtained. However, even if the monotonicity in
the empirical distribution is violated, diminishing per symbol individual
sequence redundancies with respect to the monotonic maximum likelihood
description length may still be achievable.

<|endoftext|><|startoftext|>
Introduction: smooth tropical varieties
In this section we follow the definitions of [5] and [4].
The underlying algebra of tropical geometry is given by the semifield
T = R∪{−∞} of tropical numbers. The tropical arithmetic operations
are “a + b” = max{a, b} and “ab” = a + b. The quotation marks are
used to distinguish between the tropical and classical operations. With
respect to addition T is a commutative semigroup with zero “0T” =
−∞. With respect to multiplication T× = T r {0T} ≈ R is an honest
commutative group with the unit “1T” = 0. Furthermore, the addition
and multiplication satisfy to the distribution law “a(b+c)” = “ab+ac”,
a, b, c ∈ T. These operations may be viewed as a result of the so-called
dequantization of the classical arithmetic operations that underlies the
patchworking construction, see [3] and [8].
These tropical operations allow one to define tropical Laurent poly-
nomials. Namely, a tropical Laurent polynomial is a function f : Rn →
f(x) = “
j” = max
(aj + jx),
where jx denotes the scalar product, x ∈ (T×)n ≈ Rn, j ∈ Zn and only
finitely many coefficients aj ∈ T are non-zero (i.e. not −∞).
http://arxiv.org/abs/0704.0839v1
2 GRIGORY MIKHALKIN
Affine-linear functions with integer slopes (for brevity we call them
simply affine functions) form an important subcollection of all Laurent
polynomials. Namely, these are such functions f : Rn → R that both
f and “1T
” = −f are tropical Laurent polynomials.
We equip Tn ≈ [−∞,∞)n with the Euclidean topology. Let U ⊂ Tn
be an open set.
Definition 1.1. A continuous function f : U → T is called regular if
its restriction to U ∩ Rn coincides with a restriction of some tropical
Laurent polynomial to U ∩ Rn.
We denote the sheaf of regular functions on Tn with O (or sometimes
OTn to avoid confusion). Any subset X ⊂ T
n gets an induced regular
sheaf OX by restriction. For our purposes we restrict our attention only
to the case when X is a polyhedral complex, i.e. when X is the closure
of a union of convex polyhedra (possibly unbounded) in Rn such that
the intersection of any number of such polyhedra is their common face.
We say that X is an k-dimensional polyhedral complex if it is obtained
from a union of k-dimensional polyhedra. These polyhedra are called
the facets of X .
Let V ⊂ X be an open set and f ∈ OX(V ) be a regular function
in V . A point x ∈ V is called a “zero point” of f if the restriction of
” = −f toW ⊂ V is not regular for any open neighborhoodW ∋ x.
Note that it may happen that x is a “zero point” for φ : U → T, but
not for φ|X∩U . It is easy to see that if X is a k-dimensional polyhe-
dral complex then the “zero locus” Zf of f is a (k − 1)-dimensional
polyhedral subcomplex.
To each facet of Zf we may associate a natural number, called its
weight (or degree). To do this we choose a “zero point” x inside such
a facet. We say that x is a “simple zero” for f if for any local de-
composition into a sum (i.e. the tropical product) of regular function
f = “gh” = g+h on V near x we have either g or h affine (i.e. without
a “zero”). We say that the weight is l if f can be locally decomposed
into a tropical product of l functions with a simple zero at x.
A regular function f allows us to make the following modification on
its domain V ⊂ X ⊂ Tn. Consider the graph
Γf ⊂ V × T ⊂ T
It is easy to see that the “zero locus”
Γ̄f ⊂ V × T
of the (regular) function “y+ f(x)” (defined on V ×T), where x is the
coordinate on V and y is the coordinate on T, coincides with the union
MODULI SPACES OF RATIONAL TROPICAL CURVES 3
of Γf and the undergraph
UΓf,Z = {(x, y) ∈ V × T | x ∈ Zf , y ≤ f(x)}.
Furthermore, the weight of a facet F ⊂ Γ̄f is 1 if F ∈ Γf (recall that
as V is an unweighted polyhedral complex all the weights of its facets
are equal to one) and is the weight of the corresponding facet of Zf if
F ∈ UΓf,Z . We view Γ̄f as a “tropical closure” of the set-theoretical
graph Γf . Note that we have a map Γ̄f → V . We set Ṽ = Γ̄f to be
the result of the tropical modification µf : Ṽ → V along the regular
function f . The locus Zf is called the center of tropical modification.
The weights of the facets of Ṽ supplies us with some inconvenience
as they should be incorporated in the definition of the regular sheaf OṼ
on Ṽ . Namely, the affine functions defined by OṼ on a facet of weight
w should contain the group of functions that come as restrictions to
this facet of the affine functions on Tn+1 as a subgroup of index w.
Sometimes one can get rid of the weights of Ṽ by a reparameteriza-
tion with the help of a map V̄ → Ṽ that is given by locally linear maps
in the corresponding charts. Indeed, the restriction of µg : V̄ → Ṽ to
a facet is locally given by a linear function between two k-dimensional
affine-linear spaces defined over Z. If its determinant equals to w then
the push-forward of OV̄ supplies an extension of OṼ required by the
weights. Note however that if w > 1 then the converse map is not
defined over Z and thus is not given by elements of OṼ .
Tropical modifications give the basic equivalence relation in Tropical
Geometry. It can be shown that if we start from Tk and do a number of
tropical modifications on it then the result is a k-dimensional polyhe-
dral complex Y ⊂ Tn that satisfies to the following balancing property
(cf. Property 3.3 in [4] where balancing is restated in an equivalent
way).
Property 1.2. Let E ⊂ Y ∩ RN be a (k − 1)-dimensional face and
F1, . . . , Fl be the facets of Y adjacent to F whose weights are w1, . . . , wl.
Let L ⊂ RN be a (N−k)-dimensional affine-linear space with an integer
slope and such that it intersects E. For a generic (real) vector v ∈ RN
the intersection Fj ∩ (L + v) is either empty or a single point. Let
ΛFj ⊂ Z
N be the integer vectors parallel to Fj and ΛL ⊂ Z
N be the
integer vectors parallel to L. Set λj to be the product of wj and the
index of the subgroup ΛFj +ΛL ⊂ Z
N . We say that Y ⊂ Tn is balanced
if for any choice of E, L and a small generic v the sum
j | Fj∩(L+v)6=∅
4 GRIGORY MIKHALKIN
is independent of v. We say that Y is simply balanced if in addition for
every j we can find L and v so that Fj ∩ (L + v) 6= ∅, ιL = 1 and for
every small v there exists an affine hyperplane Hv ⊂ L such that the
intersection Y ∩ (L + v) sits entirely on one side of Hv + v in L + v
while the intersection Y ∩ (Hv + v) is a point.
Definition 1.3 (cf. [5],[4]). A topological space X enhanced with a
sheaf of tropical functions OX is called a (smooth) tropical variety of
dimension k if for every x ∈ X there exist an open set U ∋ x and an
open set V in a simply balanced polyhedral complex Y ⊂ TN such that
the restrictions OX |U and OY |V are isomorphic.
Tropical varieties are considered up to the equivalence generated by
tropical modifications. It can be shown that a smooth tropical variety
of dimension k can be locally obtained from Tk by a sequence of tropical
modifications centered at smooth tropical varieties of dimension (k−1).
This follows from the following proposition.
Proposition 1.4. Any k-dimensional simply balanced polyhedral com-
plex X ⊂ Rn can be obtained from Tk by a sequence of consecutive trop-
ical modifications whose centers are simply balanced (k−1)-dimensional
polyhedral complexes.
Proof. We prove this proposition inductively by n. Without the loss of
genericity we may assume that X is a fan, i.e. each convex polyhedron
of X is a cone centered at the origin.
The base of the induction, when n = k, is trivial. If n > k let us take
a (n− k)-dimensional affine-linear subspace L ⊂ Rn given by Property
1.2. Choose a linear projection
λ : Rn → Rn−1
defined over Z and such that ker(λ) is a line contained in L.
The image λ(X) ⊂ Rn−1 is a k-dimensional polyhedral complex since
L is transversal to some facets of X . We claim that
λ|X : X → λ(X)
is a tropical modification once we identify Rn and Rn−1×R. The center
of this modification is the locus
Zf = {x ∈ R
n−1 | dim(λ−1(x) ∩X) > 0}.
Here we use the dimension in the usual topological sense. Note that the
(k − 1)-dimensional complex Zf ⊂ R
n−1 is simply balanced, existence
of the needed (n− k)-dimensional affine-linear spaces follows from the
fact that X ⊂ Rn is simply balanced.
MODULI SPACES OF RATIONAL TROPICAL CURVES 5
To justify our claim we note that near any point x ∈ Zf the sub-
complex Y ⊂ X obtained as the (Euclidean topology) closure of X r
λ−1(Zf) is a (set-theoretical) graph of a convex function. This, once
again, follows from the fact that X ⊂ Rn is simply balanced, this time
applied to the points in the facets on X r Y . Thus it gives a regular
tropical function f and it remains only to show that the the weight of
any facet of E ⊂ Zf is 1. But this follows, in turn, from the balancing
condition at λ−1(E) ∩ Y . �
2. Tropical curves and their moduli spaces
The definition of tropical variety is especially easy in dimension 1.
Tropical modifications take a graph into a graph (with arbitrary va-
lence of its vertices) and the tropical structure carried by the sheaf OX
amounts to a complete metric on the complement of the set of 1-valent
vertices of the graph X (cf. [5], [6], [1]). Thus, each 1-valent vertex of
a tropical curve X is adjacent to an edge of infinite length.
A tropical modification allows one to contract such an edge or to
attach it at any point of X other than a 1-valent vertex. If we have a
finite collection of marked points on X then by passing to an equivalent
model if needed we may assume that the set of marked points coincides
with the set of 1-valent vertices. (Of course, if X is a tree then we have
to have at least two marked points to make such assumption.)
The genus of a tropical curve X is dimH1(X). Let Mg,n be the
set of all tropical curves X of genus g with n distinct marked points.
Fixing a combinatorial type of a graph Γ with n marked leaves defines
a subset UΓ ⊂ Mg,n consisting of marked tropical curves with this
combinatorics. A length of any non-leaf edge of Γ defines a real-valued
function on UΓ. Such functions are called edge-length functions. To
avoid difficulties caused by self-automorphisms of X from now on we
restrict our attention to the case g = 0.
Definition 2.1. The combinatorial type of a tropical curve X is its
equivalence class up to homeomorphisms respecting the markings.
Combinatorial types partite the set M0,n into disjoint subsets. The
edge-length functions define the structure of the polyhedral cone RM≥0
in each of those subsets (as the lengths have to be positive). The
number M here is the number of the bounded (non-leaf) edges in X .
By the Euler characteristic reasoning it is equal to n − 3 if X is (1-
and) 3-valent, it is smaller if X has vertices of higher valence.
Furthermore, any face of the polyhedral cone RM≥0 coincides with the
cone corresponding to another combinatorial type, the one where we
6 GRIGORY MIKHALKIN
contract some of the edges of X to points. This gives the adjacency
(fan-like) structure on M0,n, so M0,n is a (non-compact) polyhedral
complex. In particular, it is a topological space.
Theorem 1. The set M0,n for n ≥ 3 admits the structure of an (n−3)-
dimensional tropical variety such that the edge-length functions are reg-
ular within each combinatorial type. Furthermore, the space M0,n can
be tropically embedded in RN for some N (i.e. M0,n can be presented
as a simply balanced complex).
Proof. This theorem is trivial for n = 3 as M0,3 is a point. Otherwise,
any two disjoint ordered pairs of marked points can be used to define
a global regular function on M0,n with values in R = T
×. Namely,
each such ordered pair defines the oriented path on the tropical curve
X connecting the corresponding marked points. These paths can be
embedded.
Since the two pairs of marked points are disjoint the intersection of
the two corresponding paths has to have finite length. We take this
length with the positive sign if the orientations agree and with the
negative sign otherwise. This defines a function on M0,n. We call such
functions the double ratio functions.
Take all possible disjoint pairs of marked points and use them as
coordinates for our embedding
ι : M0,n → R
where N is the number of all possible decompositions of n into two
disjoint pairs. The theorem now follows from the following two lemmas.
Note that, strictly speaking, each coordinate in RN depends not
only on the choice of two disjoint pairs of marked points but also on
the order of points in each pair. However, changing the order in one
of the pairs only reverses the sign of the double ratio. Taking an extra
coordinate for such a change of order would be redundant. Indeed, for
any balanced complex Y ⊂ RN and any affine-linear function λ : RN →
R with an integer slope the graph of λ is a balanced complex in RN+1
isomorphic to the initial complex Y .
Lemma 2.2. The map ι is a topological embedding.
Proof. First, let us prove that ι is an embedding. The combinatorial
type of X is determined by the set of the coordinates that do not
vanish on X . Indeed, any non-leaf edge E of the tree X separates the
leaves (i.e. the set of markings) into two classes corresponding to the
components of XrE. Let us take a coordinate in Rn that corresponds
MODULI SPACES OF RATIONAL TROPICAL CURVES 7
to four marking points (union of the two disjoint pairs) such that two of
these points belong to one class and two to the other class. We call such
a coordinate an E-compatible coordinate. Note that an E-compatible
coordinate vanishes on X if and only if the pairs of markings defined
by the coordinate agree with the pairs defined by the classes.
This observation suffices to reconstruct the combinatorial type of X .
Furthermore, the length of E equals to the minimal non-zero abso-
lute value of the E-compatible coordinates. This implies that ι is an
embedding. �
Lemma 2.3. The image ι(M0,n) is a simply balanced complex in R
Proof. This is a condition on codimension 1 faces of M0,n. First we
shall check it for the case n = 4. There are three ways to split the
four marking points into two disjoint pairs. Accordingly, there are
three combinatorial types of 3-valent trees with three marked leaves.
Thus our space M0,4 is homeomorphic to the tripod, or the “interior”
of the letter Y , see Figure 1. Each ray of this tripod correspond to
a combinatorial type of a 3-valent tree with 4 leaves while the vertex
correspond to the 4-valent tree.
Figure 1. The tropical moduli space M0,4 and its
points on the corresponding edges.
8 GRIGORY MIKHALKIN
Up to the sign we have the total of three double ratios for n = 4. Let
us e.g. take those defined by the following ordered pairs: {(12), (34)},
{(13), (24)} and {(14), (23)} Each is vanishing on the corresponding
ray of the tripod. Let us parameterize each ray of the tripod by its
only edge-length t ≥ 0 and compute the corresponding map to R3.
We have the following embeddings on the three rays
t 7→ (0, t, t), t 7→ (t, 0,−t), t 7→ (−t,−t, 0).
The sum of the primitive integer vectors parallel to the resulting direc-
tions is 0 and thus ι(M0,4) is balanced.
In the case n > 4 the codimension 1 faces of M0,n correspond to
the combinatorial types of X with a single 4-valent vertex. Near a
point inside of such face F the space M0,n looks like the product of
M0,4 and R
n−4. The factor Rn−4 comes from the edge-lengths on F
(its combinatorial type has n−4 bounded edges) while the factor M0,4
comes from perturbations of the 4-valent vertex (which result in a new
bounded edge in one of the three possible combinatorial types of the
result).
We have a well-defined map from the union U of the F -adjacent
facets to F by contracting the new edge to a point. Note that the
edge-length functions exhibit F as the positive quadrant in Rn−4. Fur-
thermore, in the combinatorial type of F we may choose 4 leaves such
that contracting all other leaves will take place outside of the 4-valent
vertex (see Figure 2). This contraction defines a map U → M0,4.
Figure 2. One of the possible contractions of a tree
with a 4-valent vertex to the tree corresponding to the
origin O ∈ M0,4.
The lemma now follows from the observation that the resulting de-
composition into M0,4 × R
n−4 agrees with the double ratio functions.
Indeed, note that the complement of the 4-valent vertex for a curve
in the combinatorial type F is composed of four components. If the
double ratio is such that its four markings are in one-to-one correspon-
dence with these components then at U it coincides with sum of the
pull-back of the corresponding double ratio in F with the pull-back of
the corresponding double ratio in M0,4. If one of the four components
MODULI SPACES OF RATIONAL TROPICAL CURVES 9
is lacking a marking from the double ratio ρ then ρ|U coincides with
the corresponding pull-back from F . �
Remark 2.4. The functions Zxi,xj from [4] do not define regular func-
tions on M0,n, contrary to what is written in [4]. These functions were
a result of an erroneous simplification of the double ratio functions.
But these functions cannot be regular as they are always positive and
Proposition 5.12 of [4] is not correct. Even the projectivization of the
embedding is not a balanced complex already for M0,5. One should
use the (non-simplified) double ratios instead.
Clearly, the space M0,n is non compact. However it is easy to com-
pactify it by allowing the lengths of bounded edges to assume infinite
values. Let M0,n be the space of connected trees with n (marked)
leaves such that each edge of this tree is assigned a length 0 < l ≤ +∞
so that each leaf has length necessarily equal to +∞.
Corollary 2.5. The space M0,n is a smooth compact tropical variety.
To verify that M0,n is smooth near a point x at the boundary
∂M0,n = M0,n rM0,n
we need to examine those double ratios that are equal to ±∞ at x.
There we use only those signs that result in −∞ do that the map takes
values in TN .
Remark 2.6. Note that the compactification M0,n ⊃ M0,n corresponds
to the Deligne-Mumford compactification in the complex case as under
the 1-parametric family collapse of a Riemannian surface to a tropical
curve the tropical length of an edge corresponds to the rate of growth
of the complex modulus of the holomorphic annulus collapsing to that
edge.
Furthermore, similarly to the complex story the infinite edges de-
compose a tropical curve into components (where the non-leaf edges
are finite). Any tropical map from an infinite edge which is bounded
would have to be constant and thus the image would have to split as
a union of several tropical curves in the target. Such decompositions
were used by Gathmann and Markwig in their deduction of the tropical
WDVV equation in R2, see [1].
3. Tropical ψ-classes
Note that we do have the forgetting maps
ftj : M0,n+1 → M0,n
10 GRIGORY MIKHALKIN
for j = 1, . . . , n + 1 by contracting the leaf with the j-marking. This
map is sometimes called the universal curve. Each marking k 6= j
defines a section σk of ftj. The conormal bundle to σk defines the ψk-
class in complex geometry (to avoid ambiguity we take j = n+1). This
notion can be adapted to our tropical setup.
Recall that so far our choice of tropical models in their equivalence
class was such that the leaves of the tropical curves were in 1-1 cor-
respondence with the markings. For this choice we have the images
σk(M0,n) contained in the boundary part of M0,n+1. This presenta-
tion is compatible with the point of view when we think about line
bundles in tropical geometry to be given by H1(X,O×). Here X is
the base of the bundle and O× is the sheaf of “non-vanishing” tropi-
cal regular functions. Such functions are given in the charts to RN by
affine-linear functions with integer slopes, see [6]. (Recall that T× = R
is an honest group with respect to tropical multiplication, i.e. the
classical addition.)
However, the following alternative construction allows one to obtain
the ψ-classes more geometrically (as we’ll illustrate in an example in
the next section). This approach is based on contracting the leaves
marked by number k.
The canonical class of a tropical curve is supported at its vertices,
namely we take each vertex with the multiplicity equal to its valence
minus 2, cf. [6]. Furthermore, the cotangent bundle near a 3-valent
vertex point can be viewed as a neighborhood of the origin for the line
given by the tropical polynomial “x + y + 1T” in R
2, so the +1 self-
intersection of the line gives the required multiplicity for the canonical
class at any 3-valent vertex. Thus we can use the intersections with the
corresponding codimension 1 faces inM0,n to define the ψ-classes there.
In other words, tropical ψ-classes will be supported on the (n − 4)-
dimensional faces in M0,n.
Namely, for a ψk-class we have to collect those codimension 1 faces
in M0,n whose only 4-valent vertex is adjacent to the leaf marked by
k. After a contraction of this leaf we get a 3-valent vertex, thus the
multiplicity of every face in a ψ-divisor is 1. We arrive to the following
definition.
Definition 3.1. The tropical ψk-divisor Ψk ⊂ M0,n is the union of
those (n−4)-dimensional faces that correspond to tropical curves with
a 4-valent vertex adjacent to the leaf marked by k, k = 1, . . . , n. Each
such face is taken with the multiplicity 1.
Proposition 3.2. The subcomplex Ψk is a divisor, i.e. satisfies the
balancing condition.
MODULI SPACES OF RATIONAL TROPICAL CURVES 11
Proof. Recall that the balancing condition is a condition at (n − 5)-
dimensional faces. In M0,n there are two types of such faces, one
corresponding to tropical curves with two 4-valent vertices and one
corresponding to a tropical curve with a 5-valent vertex.
Near the faces of the first type the moduli space M0,n is locally a
product of two copies of M0,4 and R
n−5. The Ψ-divisor is a product
of Rn−5, one copy of M0,4 and the central (3-valent) point in the other
copy of M0,4 (this is the point corresponding to the 4-valent vertex
adjacent to the leaf marked by k). Thus the balancing condition holds
trivially in this case.
Near the faces of the second type the moduli space M0,n is locally a
product of M0,5 and R
n−5. As in the proof of Theorem 1 each double
ratio decomposes to the sum of the corresponding double ration inM0,5
(perhaps trivial if two of the markings for the double ratio correspond
to the same edge adjacent to the 5-valent vertex) and an affine-linear
function in Rn−5. Thus it suffices to check only the balancing condition
for the Ψ-divisors in M0,5. This example is considered in details in the
next section. The balancing condition there follows from Proposition
4.1. �
Conjecturally, the tropical Ψ-divisors are limits of some natural rep-
resentatives of the divisors for the complex ψ-classes under the collapse
of the complex moduli space onto the corresponding tropical moduli
space M0,n. Note that our choice for the tropical Ψ-divisor is not con-
tained in the boundary ∂M0,n ⊂ M0,n (cf. the calculus of the complex
boundary classes in [2]), but comes as a closure of a divisor in M0,n.
4. The space M0,5
We have already described the moduli space M0,4 as the tripod of
Figure 1. It has only one 0-dimensional face O ∈ M0,4. This point
(considered as a divisor) coincides with the divisors Ψ1 = Ψ2 = Ψ3 =
Ψ4. The description of M0,5 is somewhat more interesting.
There are 15 combinatorial types of 3-valent trees with 5 marked
leaves. If we forget about the markings there is only one homeomor-
phism class for such a curve (see Figure 3). To get the number of
non-isomorphic markings we take the number all possible reordering of
vertices (equal to 5! = 120) and divide by 23 = 8 as there is an 8-fold
symmetry of reordering. Indeed there is one symmetry interchanging
the left two leaves, one interchanging the right two leaves and the cen-
tral symmetry around the central leave of the 3-valent tree on top of
Figure 3.
12 GRIGORY MIKHALKIN
1 15 5
Figure 3. Adjunction of combinatorial types corre-
sponding to the quadrant connecting the rays (45) and
(12).
(25) (13)
(15) (23)
Figure 4. The link of the origin in M0,5.
MODULI SPACES OF RATIONAL TROPICAL CURVES 13
Thus the space M0,5 is a union of 15 quadrants R
≥0. These quad-
rants are attached along the rays which correspond to the combinato-
rial types of curves with one 4-valent vertex. Such curves also have one
3-valent vertex which is adjacent to two leaves and the only bounded
edge of the curve, see the bottom of Figure 3. Such combinatorial types
are determined by the markings of the two leaves emanating from the
3-valent vertex. Thus we have a total of
= 10 of such rays.
The two boundary edges of the quadrant correspond to contractions
of the bounded edges of the combinatorial type as shown on Figure
3. The global picture of adjacency of quadrants and rays is shown
on Figure 4 where the reader may recognize the well-known Petersen
graph, cf. the related tropical Grassmannian picture in [7]. Vertices of
this graph correspond to the rays of M0,5 while the edges correspond
to the quadrants. Thus the whole picture may be interpreted as the
link of the only vertex O ∈ M0,5 (the point O corresponds to the tree
with a 5-valent vertex adjacent to all the leaves).
To locate the Ψk-divisor we recall that the kth leaf has to be adjacent
to a 4-valent vertex if it appears in Ψk. This means that Ψk consists
of 6 rays that are marked by pairs not containing k.
Proposition 4.1. The subcomplex Ψk ⊂ M0,5 is a divisor.
Proof. Since the whole M0,5 is S5-symmetric it suffices to check the
balancing condition only for Ψ1. The embedding M0,5 ⊂ R
N is given
by the double ratios, so it suffices to check that for each double ratio
function the sum of its gradients on the six rays of Ψ1 vanishes.
If the double ratio is determined by two pairs disjoint from the mark-
ing 1, e.g. by {(23), (45)} then its restriction onto the six rays of Ψ1 is
the same as its restriction to the three rays M0,4 taken twice and thus
balanced. Namely its gradient is 1 on the rays (24) and (35); −1 on
the rays (25) and (34); and 0 on the rays (23) and (45).
If the four markings of the double ratio contain the marking 1 then
thanks to the symmetry we may assume that the double ratio is given
by {(12), (34)}. It vanishes on the rays (34), (35), (45) and (25); it has
gradient +1 on the ray (24) and the gradient −1 on the ray (23). Once
again, the balancing condition holds. �
As our final example of the paper we would like to describe explicitly
the universal curve
ft5 : M0,5 → M0,4.
This is presented on Figure 5. Once again, we interpret the Peterson
graph as the link L of the vertex O ∈ M0,5. Similarly, the link of the
14 GRIGORY MIKHALKIN
Figure 5. The three fibers and four sections of the
universal curve ft5 : M0,5 → M0,4.
origin in M0,4 consists of three points. Thus L is the union of the fibers
of ft5 (away from a neighborhood of infinity) over these three points
and four copies of a neighborhood of the origin in M0,4 corresponding
to the four sections σ1, σ2, σ3 and σ4 of the universal curve. Figure
5 depicts the fibers in L with solid lines and the sections with dashed
lines.
Acknowledgements. I am thankful to Valery Alexeev and Kristin
Shaw for discussions related to geometry of tropical moduli spaces.
My research is supported in part by NSERC.
References
[1] Gathmann, A., Markwig, H., Kontsevich’s formula and the WDVV equations
in tropical geometry, http://arxiv.org/abs/math.AG/0509628.
[2] Keel, S., Intersection theory of moduli space of stable N -pointed curves of
genus zero, Transactions of the AMS 330 (1992), 545–574.
[3] Litvinov, G. L., The Maslov dequantization, idempotent and tropical mathe-
matics: a very brief introduction. In Idempotent mathematics and mathemat-
ical physics, Contemp. Math., 377, Amer. Math. Soc., Providence, RI, 2005,
1–17.
[4] Mikhalkin, G., Tropical Geometry and its application, to appear in the Pro-
ceedings on the ICM-2006, Madrid; http://arxiv.org/abs/math/0601041.
[5] Mikhalkin, G., Tropical Geometry, book in preparation.
http://arxiv.org/abs/math.AG/0509628
http://arxiv.org/abs/math/0601041
MODULI SPACES OF RATIONAL TROPICAL CURVES 15
[6] Mikhalkin, G., Zharkov, I., Tropical curves, their Jacobians and Theta func-
tions, http://arxiv.org/abs/math/0612267.
[7] Speyer, D., Sturmfels, B., The tropical Grassmannian. Adv. Geom. 4 (2004),
no. 3, 389–411.
[8] Viro, O. Ya., Dequantization of real algebraic geometry on logarithmic paper.
In European Congress of Mathematics, Vol. I (Barcelona, 2000), Progr. Math.,
201, Birkhäuser, Basel, 2001, 135–146.
Department of Mathematics, University of Toronto, 40 St George
St, Toronto ON M5S 2E4 Canada
http://arxiv.org/abs/math/0612267
	1. Introduction: smooth tropical varieties
	2. Tropical curves and their moduli spaces
	3. Tropical -classes
	4. The space M0,5
	References
ABSTRACT
  This note is devoted to the definition of moduli spaces of rational tropical
curves with n marked points. We show that this space has a structure of a
smooth tropical variety of dimension n-3. We define the Deligne-Mumford
compactification of this space and tropical $\psi$-class divisors.

<|endoftext|><|startoftext|>
Difermion condensates in vacuum in 2-4D four-fermion interaction models
Bang-Rong Zhou†
College of Physical Sciences, Graduate School of
the Chinese Academy of Sciences, Beijing 100049, China
In any four fermion (denoted by q) interaction models, the couplings of (qq)2-form can
always coexist with the ones of (q̄q)2-form via the Fierz transformations. Hence, even in
vacuum, there could be interplay between the condensates 〈q̄q〉 and 〈qq〉. Theoretical anal-
ysis of this problem is generally made by relativistic effective potentials in the mean field
approximation in 2D, 3D and 4D models with two flavor and Nc color massless fermions.
It is found that in ground states of these models, interplay between the two condensates
mainly depend on the ratio GS/HS for 2D and 4D case or GS/HP for 3D case, where GS ,
HS and HP are respectively the coupling constants in a scalar (q̄q), a scalar (qq) and a
pseudoscalar (qq) channel.
In ground states of all the models, only pure 〈q̄q〉 condensates could exist if GS/HS or
GS/HP is bigger than the critical value 2/Nc, the ratio of the color numbers of the fermions
entering into the condensates 〈qq〉 and 〈q̄q〉. Below it, differences of the models will manifest
themselves.
In the 4D Nambu-Jona-Lasinio (NJL) model, as GS/HS decreases to the region below
2/Nc, one will first have a coexistence phase of the two condensates then a pure 〈qq〉 con-
densate phase. Similar results come from a renormalized effective potential in the 2D Gross-
Neveu model, except that the pure 〈qq〉 condensates could exist only if GS/HS = 0. In a
3D Gross-Neveu model, when GS/HP < 2/Nc, the phase transition similar to the 4D case
can arise only if Nc > 4, and for smaller Nc, only a pure 〈qq〉 condensate phase exists but
no coexistence phase of the two condensates happens. The GS −HS (or GS − HP ) phase
diagrams in these models are given.
The results deepen our understanding of dynamical phase structure of four-fermion inter-
action models in vacuum. In addition, in view of absence of difermion condensates in vacuum
of QCD, they will also imply a real restriction to any given two-flavor QCD-analogous NJL
model, i.e. in the model, the derived smallest ratio GS/HS via the Fierz transformations in
the Hartree approximation must be bigger than 2/3.
The project supported by the National Natural Science Foundation of China under Grant No.10475113.
Electronic mailing address: zhoubr@163bj.com
http://arxiv.org/abs/0704.0841v3
I. MAIN RESULTS
We have researched interplay between the fermion(q)-antifermion (q̄) condensates 〈q̄q〉 and
the difermion condensates 〈qq〉 in vacuum in 2D, 3D and 4D four-fermion interaction models
with two flavor and Nc color massless fermions. It is found that the ground states of the systems
could be in different phases shown in the following GS − HS and GS − HP phase diagrams
[Fig.(a)–Fig.(d)], where
GS — coupling constant of scalar (q̄q)
2 channel
HS — coupling constant of scalar color
Nc(Nc − 1)
−plet (qq)2 channel (4D, 2D case)
HP — coupling constant of pseudoscalar color
Nc(Nc − 1)
−plet (qq)2 channel (3D case)
Λ — Euclidean Momentum cutoff of loop integrals (4D,3D case)
(σ1, 0) — pure 〈q̄q〉 phase
(0,∆1) — pure 〈qq〉 phase
(σ2,∆2) — mixed phase with both 〈q̄q〉 and 〈qq〉
Fig.(a)-Fig.(d) (pages 3-6)
                                                                    3 
4D NJL Model 
y=GSΛ
2/π2 
               
 y =2x/ Nc
           (σ1, 0) 
(σ2,Δ2) 
      
1/Nc                    (0 ,Δ1)                R 
0        1/2 
x =HSΛ
R:  y=x/[1+(Nc-2)x] 
Fig. (a) 
                                                          4           
2D GN Model   
y=GS /π 
y=2x /Nc  
         
(σ1, 0)  
(σ2 ,Δ2) 
x=HS /π  (0,Δ1) 
Fig. (b)      
                                            
3D GN Model,  Nc≤4  
y=GSΛ/π
y=2x/ Nc
           (σ1, 0) 
1/4Nc
                                 (0,Δ1) 
        0         1/8 
x=HPΛ/π
Fig. (c)  
3D GN Model,  Nc≥5  
y=GSΛ/π
          y =2x/ Nc
(σ1, 0) 
1/4Nc                         
            (σ2,Δ2) 
(0,Δ1) 
0 1/8 
x =HPΛ/π
Fig. (d) 
Main Conclusions
1. In all the models, pure 〈q̄q〉 phase happens if
) > 2
(also GS must be large
enough in 3D and 4D model).
2. The phases with condensates 〈qq〉, including pure 〈qq〉 phase and mixed phase with 〈q̄q〉
and 〈qq〉, arise only if
) < 2
3. In 3D Gross-Neveu model, no mixed phase with 〈q̄q〉 and 〈qq〉 exists for Nc ≤ 4.
II. Motive and general approach
• In any four-fermion interaction model [1, 2], the couplings of (qq)2-form and (q̄q)2-form
can always coexist via the Fierz transformations, hence there must be interplay between
the condensates 〈q̄q〉 and 〈qq〉 in ground state of the system.
• In the vacuum, despite of absence of net fermions, based on a relativistic quantum field
theory, it is possible that the condensates 〈qq〉 and 〈q̄q̄〉 are generated simultaneously.
• The mean field approximation has been taken. In this case, we have used the Fierz
transformed four-fermion couplings in the Hartree approximation to avoid double
counting [3].
• In selecting the couplings of (qq)2-form, we always simulate SU(Nc) gauge interaction,
where two fermions are attractive in the antisymmetric
Nc(Nc − 1)
-plet.
• Euclidean momentum cutoffs in 3D and 4D models have been used so as to maintain
Lorentz invariance of effective potentials in the vacuum.
• In massless fermion limit, all the discussions can be made analytically.
• The coupling constants GS and HS (or HP ) are viewed as independent parameters.
III. 4D Nambu-Jona-Lasinio model
With 2 flavors and Nc color massless fermions, the Lagrangian
L = q̄iγµ∂µq +GS [(q̄q)
2 + (q̄iγ5~τq)
2] +HS
(q̄iγ5τ2λAq
C)(q̄C iγ5τ2λAq), (1)
where the fermion fields q are in the doublet of SUf (2) and the Nc-plet of SUc(Nc), i.e.
i = 1, · · · , Nc, (2)
http://arxiv.org/abs/0704.0841v3
qC is the charge conjugate of q and ~τ = (τ1, τ2, τ3) are the Pauli matrices acting in two-flavor
space. The matrices λA run over all the antisymmetric generators of SUc(Nc).
Assume that the four-fermion interactions can lead to the scalar condensates
〈q̄q〉 = φ (3)
with all the Nc color fermion entering them, and the scalar color
Nc(Nc − 1)
2 -plet difermion
and di-antifermion condensates (after a global SUc(Nc) transformation)
〈q̄Ciγ5τ2λ2q〉 = δ, 〈q̄iγ5τ2λ2q
C〉 = δ∗, (4)
with only two color fermions enter them. The corresponding symmetry breaking is that
SUfL(2) ⊗ SUfR(2) → SUf (2), SUc(Nc) → SUc(2), and a ”rotated” electric charge UQ̃(1)
and a ”rotated” quark number U ′q(1) leave unbroken. It should be indicated that in the case
of vacuum, the Goldstone bosons induced by spontaneous breaking of SUc(Nc) could be some
combinations of difermions and di-antifermions.
Define that
σ = −2GSφ, ∆ = −2HSδ, ∆
∗ = −2HSδ
∗. (5)
With standard technique and a 4D Euclidean momentum cutoff Λ [4], we obtain the relativistic
effective potential
V4(σ, |∆|) =
2 + 2|∆|2)Λ2 − (Nc − 2)
−(σ2 + |∆|2)2
σ2 + |∆|2
. (6)
The ground states of the system, i.e. the minimum points of V4(σ, |∆|), will be at
(σ, |∆|) =
(0, ∆1)
(σ2, ∆2)
(σ1, 0)
, 0 ≤
1 + (Nc − 2)
1 + (Nc − 2)
, (7)
Eq.(7) gives the phase diagram Fig.(a) of the 4D NJL model.
IV. 2D Gross-Neveu model
The Lagrangian is expressed by
L = q̄iγµ∂µq +GS [(q̄q)
2 + (q̄iγ5~τq)
2] +HS(q̄iγ5τSλAq
C)(q̄Ciγ5τSλAq), (8)
All the denotations are the same as ones in 4D NJL model, except that in 2D space-time
, γ1 =
= −C, γ5 = γ
and τS = (τ0 ≡ 1, τ1, τ3) are flavor-triplet symmetric matrices. It is indicated that the product
matrix Cγ5τSλA is antisymmetric.
Assume that the four-fermion interactions could lead to the scalar quark-antiquark conden-
sates
〈q̄q〉 = φ, (9)
which will break the discrete symmetries
χD : q(t, x)
→ γ5q(t, x),
P1 : q(t, x)
→ γ1q(t,−x),
and that the coupling with HS can lead to the scalar color
Nc(Nc − 1)
-plet difermion con-
densates and the scalar color anti-
Nc(Nc − 1)
-plet di-antifermion condensates (after a global
transformation in flavor and color space)
〈q̄C iγ51fλ2q〉 = δ, 〈q̄iγ51fλ2q
C〉 = δ∗ (10)
which will break discrete symmetries Zc
(center of SUc(3)) and Z
(center of SUf (2)), besides
χD and P1. Noting that in a 2D model, no breaking of continuous symmetry needs to be
considered on the basis of Mermin-Wagner-Coleman theorem [5].
The model is renormalizable. In the space-time dimension regularization approach, we can
write down the renormalized L in D = 2− 2ε dimension space-time by the replacements
GS → GSM
2−DZG, HS → HSM
2−DZH ,
with the scale parameter M , the renormalization constants ZG and ZH . In addition, the γ
L will become 2D/2 × 2D/2 matrices.
Define the order parameters
σ = −2GSM
2−DZGφ, ∆ = −2HSM
2−DZHδ, (11)
which will be finite if ZG and ZH are selected so as to cancel the UV divergences in φ and δ. In
the minimal substraction scheme,
ZG = 1−
2NcGS
, ZH = 1−
. (12)
By similar derivation to the one made in Ref.[6], the corresponding renormalized effective po-
tential in the mean field approximation up to one-loop order becomes
V2(σ, |∆|) =
σ2 + |∆|2
+ (Nc − 2) ln
σ2 + |∆|2
, M̄2 = 2πe−γM2, (13)
where γ is the Euler constant. The ground states of the system i.e. the minimal points of
V2(σ, |∆|) will be at
(σ, |∆|) =
(0, ∆1)
(σ2, ∆2)
(σ1, 0)
GS/HS = 0
0 < GS/HS < 2/Nc
GS/HS > 2/Nc
Eq.(14) gives the phase diagram Fig.(b) of 2D GN model.
In 2D case, the GS -HS phase structure has the following feature:
1. The pure 〈qq〉 phase (0,∆1) could appear only if GS/HS = 0;
2. Formations of the condensates do not call for that the coupling constant GS and HS have
some lower bounds.
V. 3D Gross-Neveu model
The Lagrangian is expressed by
L = q̄iγµ∂µq +GS [(q̄q)
2 + (q̄~τq)2] +HP
(q̄τ2λAq
C)(q̄Cτ2λAq), (15)
where γµ(µ = 0, 1, 2) are taken to be 2× 2 matrices
, γ1 =
, γ2 =
It is noted that the product matrix Cτ2λA is antisymmetric, and since without the ”γ5” matrix,
the only possible color
Nc(Nc − 1)
-plet difermion interaction channel is pseudoscalar one. The
condensates 〈q̄q〉 will break
time reversal symmetry T : q(t, ~x) → γ2q(−t, ~x),
special parity P1 : q(t, x
1, x2) → γ1q(t,−x1, x2),
special parity P2 : q(t, x
1, x2) → γ2q(t, x1,−x2).
The difermion condensates 〈q̄Cτ2λ2q〉 (after a global rotation in the color space) will break
SUc(Nc) → SUc(2)
and leave a ”rotated” electrical charge U
(1) and a ”rotated” fermion number U ′q(1) unbroken.
It also breaks
parity P : q(t, ~x) → γ0q(t,−~x)
and this shows pseudoscalar feature of the difermion condensates.
Define the order parameters in the 3D GN model
σ = −2GS〈q̄q〉, ∆ = −2HP 〈q̄
Cτ2λ2q〉, (16)
on bases of the same method used in Ref.[7], we find out the effective potential in the mean field
approximation
V3(σ, |∆|) =
2 + 2|∆|2)Λ
6σ2|∆|+ 2|∆|3 + (Nc − 2)σ
3 + 2θ(σ − |∆|)(σ − |∆|)3
, (17)
where Λ is a 3D Euclidean momentum cutoff. The ground states of the system correspond to
the least value points of V3(σ, |∆|) which will respectively be at
(σ, |∆|) =
(0,∆1),
(0,∆1),
(σ2,∆2),
(σ1, 0),
, for Nc ≤ 4
for Nc > 4
, for all Nc
Eq.(18) gives the GS −HP phase diagrams Fig.(c) and Fig.(d) of the 3D GN model.
VI. Summary
• Present research deepens our theoretical understanding of the four-fermion interaction
models:
1. Even in vacuum, it is possible that the difermion condensates are generated as long
as the coupling constants of the difermion channel are strong enough (bigger than
zero or some finite values).
2. Interplay between the condensates 〈q̄q〉 and 〈qq〉 mainly depends on GS/HS (or
GS/HP ), the ratio of the coupling constants of scalar fermion-antifermion channel
and scalar (or pseudoscalar ) difermion channel.
3. In all the discussed 2-flavor models, if GS/HS (GS/HP ) > 2/Nc, the ratio of the
color numbers of the fermions entering into the condensates 〈qq〉 and 〈q̄q〉, (and also
with sufficiently large GS in 4D and 3D model), then only pure 〈q̄q〉 condensates
phase may exist. Below 2/Nc, (and also with sufficiently large HS or HP in 4D or
3D model), one will always first have a mixed phase with condensates 〈q̄q〉 and 〈qq〉,
then a pure 〈qq〉 condensate phase, except that in the 3D GN model, no the mixed
phase appears when Nc ≤ 4.
• In view of absence of 〈qq〉 condensates in vacuum of QCD, the result here also implies
a real restriction to any given two-flavor QCD-analogue NJL model: in such model, the
derived smallest ratio GS/HS via the Fierz transformation in the Hartree approximation
must be bigger than 2/3 [4].
[1] Y. Nambu and G. Jona-Lasinio, Phys. Rev. 122 (1961) 345; 124 (1961) 246.
[2] D.J. Gross and A. Neveu, Phys. Rev. D 10 (1974) 3235.
[3] M. Buballa, Phys. Rep. 407 (2005) 205.
[4] Zhou Bang-Rong, Commun. Theor. Phys. 47 (2007) 95.
[5] N. D. Mermin and H. Wagner, Phys. Rev. Lett. 17 (1966) 1133; S. Coleman, Commun. Math. Phys.
31 (1973) 259.
[6] Zhou Bang-Rong, Commun. Theor. Phys. 47 (2007) 520.
[7] Zhou Bang-Rong, Commun. Theor. Phys. 47 (2007) 695.
	Main results
	References
ABSTRACT
  Theoretical analysis of interplay between the condensates $<\bar{q}q>$ and
$<qq>$ in vacuum is generally made by relativistic effective potentials in the
mean field approximation in 2D, 3D and 4D models with two flavor and $N_c$
color massless fermions. It is found that in ground states of these models,
interplay between the two condensates mainly depend on the ratio $G_S/H_S$ for
2D and 4D case or $G_S/H_P$ for 3D case, where $G_S$, $H_S$ and $H_P$ are
respectively the coupling constants in a scalar $(\bar{q}q)$, a scalar $(qq)$
and a pseudoscalar $(qq)$ channel. In ground states of all the models, only
pure $<\bar{q}q>$ condensates could exist if $G_S/H_S$ or $G_S/H_P$ is bigger
than the critical value $2/N_c$, the ratio of the color numbers of the fermions
entering into the condensates $<qq>$ and $<\bar{q}q>$. As $G_S/H_S$ or
$G_S/H_P$ decreases to the region below $2/N_c$, differences of the models will
manifest themselves. Depending on different models, and also on $N_c$ in 3D
model, one will have or have no the coexistence phase of the two condensates,
besides the pure $<qq>$ condensate phase. The $G_S-H_S$ (or $G_S-H_P$) phase
diagrams in these models are given. The results also implicate a real
constraint on two-flavor QCD-analogous NJL model.

<|endoftext|><|startoftext|>
Oscillation bands of condensates on a ring: Beyond the mean field theory
C. G. Bao
Center of Theoretical Nuclear Physics,
National Laboratory of Heavy Ion Collisions,
Lanzhou 730000, P. R. China
The State Key Laboratory of Optoelectronic Materials and Technologies,
Zhongshan University, Guangzhou, 510275, P.R. China
Abstract: The Hamiltonian of a N-boson system confined on a ring with zero spin and repulsive
interaction is diagonalized. The excitation of a pair of p-wave-particles rotating reversely appears to
be a basic mode. The fluctuation of many of these excited pairs provides a mechanism of oscillation,
the states can be thereby classified into oscillation bands. The particle correlation is studied
intuitively via the two-body densities. Bose-clustering originating from the symmetrization of wave
functions is found, which leads to the appearance of 1-, 2-, and 3-cluster structures. The motion is
divided into being collective and relative, this leads to the establishment of a relation between the
very high vortex states and the low-lying states.
After the experimental realization of the Bose-Einstein
condensation1, various condensates confined under dif-
ferent circumstances have been extensively studied the-
oretically and experimentally. Mostly, the condensates
are considered to be confined in a harmonic trap. Con-
densates trapped by periodic potential have also been
studied due to the appearance of optical lattices.2 It
is believed that the appearance of condensates confined
in particular geometries is possible. Experimentally, the
particle interactions can now be tuned from very weak
to very strong,3−8 it implies that the particle correlation
may become important. Theoretically, to respond, go-
ing beyond the mean field Gross-Pitaevskii (GP) theory
is desirable, and the condensates confined in particular
geometries are also deserved to be considered.
Along this line, in addition to the ground state, the
yrast states have been studied both analytically and
numerically.9−17 The condensation on a ring has also
been studied recently.12 The present paper is also ded-
icated to the N−boson systems confined on a ring with
weak interaction, its scope is broader and covers the
whole low-lying spectra. A similar system has been in-
vestigated analytically by Lieb and Liniger16,17. How-
ever, the emphasis of their papers is different from the
present one, which is placed on analyzing the structures
of the excited states to find out their distinctions and sim-
ilarities, and to find out the modes of excitation. Based
on the analysis, an effort is made to classify the ex-
cited states. Traditionally, the particle correlation and
its effect on the geometry of N−boson systems is a topic
scarcely studied if N is large. In this paper, the corre-
lation is studied intuitively so as the geometric features
inherent in the excited states can be understood. Tradi-
tionally, a separation between the collective and internal
motions is seldom to be considered if N is large. In this
paper such a separation is made and leads to the estab-
lishment of a relation between the vortex states and the
low-lying states.
It is assumed that the N identical bosons confined on
a ring have mass m, spin zero, and square-barrier inter-
action. The ring has a radius R, N is given at 100, 20
and 10000. Let G = ~2/(2mR2) be the unit of energy.
The Hamiltonian then reads
H = −
i<j Vij (1)
where θi is the azimuthal angle of the i-th boson. Vij =
Vo if |θj − θi| ≤ θrange , or = 0 otherwise. Let φk =
eikθ/
2π be a single particle state, −kmax ≤ k ≤ kmax
is assumed. The N−body normalized basis functions
in Fock-representation are |α〉 ≡ |n−kmax , · · · nkmax〉,
where nj is the number of bosons in φj ,
nj = N,
njj = L, the total angular momentum. Then,
H is diagonalized in the space spanned by |α〉, the low-
lying spectrum together with the eigen-wave-functions,
each is a linear combination of |α〉, are thereby obtained.
Let Kα =
2 be the total kinetic energy of an |α〉
state. Evidently, those |α〉 with a large Kα are negligi-
ble for low-lying states. Therefore, one more constraint
Kα ≤ Kmax is further added to control the number of
|α〉. In this procedure, the crucial point is the calcula-
tion of the matrix elements of H . This can be realized
by using the fractional parentage coefficients18 (refer to
eq.(6) below). Numerical results are reported as follows.
This paper concerns only the cases with weak interac-
tion. Firstly, let Vo = 1, θrange = 0.025, and N = 100.
This is corresponding to γ = 0.00157 , where γ is in-
troduced by Lieb and Liniger to measure the strength
of interaction,16,17 this is shown later. When kmax and
Kmax are given at a number of values, the associated
eigen-energiesEj of the first, fifteenth, and sixteenth L =
0 eigen-states are listed in Table I. When (kmax,Kmax)
is changed from (3, 50) to (5,60), the total number of |α〉
is changed from 2167 to 8890. Table I demonstrates that
the great increase of basis functions does not lead to a
remarkable decrease of eigen-energies. Thus the conver-
gency is qualitatively satisfying even for the higher states.
http://arxiv.org/abs/0704.0842v1
TABLE I: Eigen-energies Ej (the unit is G) of the L = 0
states. The first row is (kmax,Kmax), the first column is the
serial number of states j. Vo = 1, θrange = 0.025, and N =
100 are given.
(3,50) (4,50) (4,60) (5,60)
1 39.109 39.090 39.090 39.078
15 53.645 53.616 53.616 53.613
16 54.822 54.800 54.800 54.790
In the following the choice kmax = 4 and Kmax = 50
are adopted, this limitation leads to a 3254-dimensional
space. Thereby the resultant data have at least three
effective figures, this is sufficient for our qualitative pur-
pose.
N=100,  L=0
FIG. 1: The spectrum of L = 0 states, the unit of energy is
G = ~2/(2mR2). N = 100, Vo = 1, and θrange = 0.025 are
assumed, they are the same for Fig.1 to Fig.5. The levels in a
column constitute an oscillation band, the levels in bold line
are doubly degenerate.
The low-lying spectrum is given in Fig.1, where the
lowest fourteen levels are included. Twelve of them can
be ascribed into three bands, in each band the levels
are distributed equidistantly, this is a strong signal of
harmonic-like oscillations. From now on the labels Ψ
and E(L,Z,i) are used to denote the wave function and en-
ergy of the i-th state of the Z-th band (Z=I, II, III,· · ·).
It turns out that the excitation of a pair of particles
both in p-wave but rotating reversely, namely, one par-
ticle in φ1 while the other one in φ−1, is a basic mode,
the pair is called a basic pair in the follows. A number of
such basic pairs might be excited. When 2j particles are
in basic pairs while the remaining N − 2j particles are
in φ0, the associated |α〉 is written as |P (j)〉. For all the
states of the I−band, we found Ψ(0)I,i is mainly a linear
combination of |P (j)〉 together with a small component
denoted by ∆I,i, i.e.,
I,i =
(0,I,i)
(j)〉+∆I,i (2)
where ∆I,i is very small as shown in Table II, while the
coefficients C
(L,Z,i)
j arise from the diagonalization. Thus
the basic structure of the I−band is just a fluctuation of
many of the basic pairs.
TABLE II: The weights of ∆Z,i of the bands with L = 0
i I−band II−band III−band
1 0.009 0.017 0.040
2 0.012 0.030 0.056
3 0.021 0.061 0.088
4 0.035 0.028
5 0.055 0.035
6 0.079 0.106
For lower states, C
(0,Z,i)
j would be very small if j
is larger, e.g., for the ground state, C
(0,I,1)
0 = 0.968
and C
(0,I,1)
j ≥2 ≈ 0, it implies that the excitation of many
pairs is not probable. It also implies that the ground state
wave function obtained via mean-field theory might be a
good approximation. However, for higher states, many
pairs would be excited. E.g., for the third state of the
I−band, C(0,I,3)j = 0.052, 0.406, 0.681,−0.528, and 0.245
when j is from 0 to 4, it implies a stronger fluctuation.
When a |α〉 has not only 2j particles in the basic pairs,
but also m particles in φk, while the remaining particles
in φ0, then it is denoted as |(k)mP (j)〉 (where k = ±1
are allowed) Similarly, we can define |(k1)m1(k2)m2P (j)〉,
and so on. For all the states of the II−band, we found
II,i =
(0,II,i)
[ |(2)1(−1)2P (j)〉
±|(−2)1(1)2P (j)〉] + ∆II,i
where both the + and − signs lead to the same energy,
thus the level is two-fold degenerate. Again, all the ∆II,i
are very small as shown in Table II, thus the fluctua-
tion of basic pairs is again the basic structure. However,
the II−band is characterized by having the additional
3-particle-excitation (one in d-wave and two in p-wave).
For all the states of the third band, we found
III,i
(0,III,i)
j |(2)1(−2)1P (j)〉+∆III,i (4)
Thus, the III−band contains, in addition to the fluc-
tuation of basic pairs, a more energetic pair with each
particle in d-wave. It was found that the spacing
E(0,Z,i+1) − E(0,Z,i) inside all the bands are nearly the
same, they are ∼3.15. This arises because they have the
same mechanism of oscillation, namely, the fluctuation of
basic pairs.
When the energy goes higher, more oscillation bands
can be found. The two extra levels in Fig.1 at the right
are the band-heads of higher bands.
Incidentally, the band-heads of the above three
bands are dominated by |P (0)〉, |(2)1(−1)2P (0)〉 ±
|(−2)1(1)2P (0)〉 and |(2)1(−2)1P (0)〉, respectively, and
their kinetic energies Kα = 0, 6, and 8. Among all the
basis functions with L = 0 and without basic pairs, these
three are the lowest three. This explains why the band-
heads are dominated by them. Once a band-head is fixed,
the corresponding oscillation band would grow up via the
fluctuation of basic pairs.
The particle correlations can be seen intuitively by ob-
serving the two-body densities
ρ2(θ1, θ2) =
dθ3 · · · dθN Ψ(L)∗Z,i Ψ
Similar to the calculation of the matrix elements of
interaction, the above integration can be performed in
coordinate space by extracting the particles 1 and 2
from |α〉 by using the fractional parentage coefficients18,
namely,
|α〉 =
nk(nk − 1)/N(N − 1)φk(1)φk(2)|αk〉
ka,kb
(ka 6=kb)
nkankb/N(N − 1)φka(1)φkb(2)|αkakb〉
where |αk〉 is different from |α〉 by replacing nk with nk−
2, |αkakb〉 is different from |α〉 by replacing nka and nkb
with nka − 1 and nkb − 1, respectively.
0 60 120 180 240 300 360
3 (c)
III-band
3 (b)
II-band
I-band
FIG. 2: ρ2 as functions of θ2 for the I ( a), II ( b), and III (c)
bands of L = 0 states, θ1 = 0 is given. The labels i of the
states Ψ
are marked by the curves.
ρ2 gives the spatial correlation between any pair of
particles as shown in Fig.2. For the ground state Ψ
ρ2 is flat implying that the correlation is weak. However,
it is a little larger when the two particles are opposite to
each other (θ1 = 0 and θ2 = π). It implies the existence
of a weak correlation which is entirely ignored by the
mean field theory. Thus, even the interaction adopted is
weak and even for the ground state, there is still a small
revision to the mean field theory. For higher states of the
I−band, the fluctuation of basic pairs becomes stronger.
Due to the fluctuation, the particles tend to be close to
each other to form a single cluster. This tendency is
clearly shown in Fig.2a.
For the first state of the III − band, Ψ(0)
III,1 has two
peaks in ρ2 implying a 2-cluster structure. It arises from
the two d-wave paticles inherent in the band. The fea-
ture of Ψ
II,1 is lying between Ψ
I,1 and Ψ
III,1. For all
higher states of every band, due to the strong fluctuation
of basic pairs, all the particles tend to be close to each
other as shown in 2b and 2c.
To understand the physics why the particles tend to be
close to each other, let us study the most important basis
state |P (j)〉. By inserting |P (j)〉 into eq.(5) to replace
and by using (6), ρ2 reads
ρ2(θ1, θ2) =
(2π)2N(N−1) [N(N − 1)− j(4N − 6j)
+4j(N − 2j)(1 + cos(θ1 − θ2)) + 4j2 cos2(θ1 − θ2)]
Where there are four terms at the right, the non-
uniformity arises from the third and fourth terms. The
third term causes the particles to be close to each other
to form a single cluster, while the fourth term causes
the two-cluster clustering. When j is small, the fourth
term can be neglected, and the particles tend to form
a single cluster. However, when j ≈ N/2 , the third
term can be neglected, and the particles tend to form
two clusters. It is noted that, if the symmetrization were
dropped, the density contributed by |P (j)〉 would be uni-
form. The appearance of the clustering originates from
the symmetrization of the bosonic wave functions, there-
fore it can be called as bose-clustering.
For L = 1 states, the lowest energy E(1,I,1) is higher
than E(0,I,1) by 1.606, but lower than E(0,I,2). Thus Ψ
is the true first excited state of the system. A number
of oscillation bands exist as well, the wave functions of
the lowest six bands are found as
(1,I,i)
j |(1)1P (j)〉+∆I,i
(1,II,i)
j |(2)1(−1)1P (j)〉+∆II,i
III,i
(1,III,i)
j |(−2)1(1)3P (j)〉+∆III,i
IV,i =
(1,IV,i)
j |(2)1(−2)1(1)1P (j)〉+∆IV,i
V,i =
(1,V,i)
j |(3)1(−1)2P (j)〉+∆V,i
V I,i =
(1,V I,i)
j |(3)1(−2)1P (j)〉+∆V I,i
Where the weights of all the ∆Z,i ≤ 0.1 if i ≤ 4. Thus,
just as the above L = 0 case, all the bands have the
common fluctuation of basic pairs, but each band has
a specific additional few-particle excitation. The ener-
gies of the band-heads from I to V I are 40.70, 45.42,
48.58, 50.14, 52.04, and 53.58 respectively. Further-
more, the spacing ∼3.15 found above is found again for
all these bands due to having the same mechanism of os-
cillation. The I−band is similar to the above I−band
with L = 0 but having an additional single p-wave ex-
citation, the ρ2 of them are one-one similar. Similarly,
the ρ2 of the IV−band is one-one similar to those of
the above III− band with L = 0. The ρ2 of the II
and III−bands are both similar to those of the above
II−band with L = 0. However, the V and V I−bands
are special due to containing the f-wave excitation, the ρ2
of their band-heads exhibit a 3-cluster structure as shown
in Fig.3. When the energy goes even higher, more higher
oscillation bands will appear. For the above six L = 1
bands, their band-heads are dominated by the |α〉 with
Kα = 1, 5, 7, 9, 11, and 13. Obviously, a higher Kα leads
to a higher band.
0 60 120 180 240 300 360
FIG. 3: ρ2 for selected L = 1 states. θ1 = 0, the (Z, i) labels
are marked by the curves.
In general, all the low-lying states can be classified into
oscillation bands. For all the lower bands disregarding
L, it was found that each band-head is dominated by
a basis function containing a specific few-particle exci-
tation but not containing any basic pairs. The energy
order of the bands is determined by the magnitudes of
Kα associated with the dominant basis function |α〉 of
the band-heads. Once a band-head stands, an oscillation
band will grow up from the band-head simply via the fluc-
tuation of basic pairs. For examples, for L = 2 states, the
dominant |α〉 of the band-heads of the four lowest oscil-
lation bands are |(1)2P (0)〉, |(2)1P (0)〉, |(−2)1(1)4P (0)〉,
and |(3)1(−1)1P (0)〉 with Kα = 2, 4, 8, and 10, respec-
tively.
For L = 3 states, the dominant |α〉 of the band-heads of
the three lowest bands are |(1)3P (0)〉, |(2)1(1)1P (0)〉, and
|(3)1P (0)〉, with Kα = 3, 5, and 9, respectively. Since the
p-, d-, and f-wave appear successively, these band-heads
exhibit 1-cluster, 2-cluster, and 3-cluster structures, re-
spectively, as shown in Fig.4.
0 60 120 180 240 300 360
III,1ρ
FIG. 4: ρ2 for the band-heads of L = 3 states, θ1 = 0.
Furthermore, a −L state can be derived from the
corresponding L state simply by changing every k to
−k, i.e., change the components |(k1)m1(k2)m2P (j)〉 to
|(−k1)m1(−k2)m2P (j)〉, and so on. Therefore Ψ(−L)Z,i =
)∗, and E(−L,Z,i) = E(L,Z,i).
0 2 4 6 8
Yrast line
N=100,   Vo=1,  θrange=0.025
FIG. 5: Energies of the yrast states with L = 0 to 10.
Let us study the yrast states Ψ
I,1 , each is the lowest
one for a given L. The energies of them are plotted in
Fig.5, their wave functions are found as
I,1 =
(L,I,1)
j |(1)LP (j)〉+∆LI,1 (9)
where ∆LI,i is very small. When L is small, the fluc-
tuation of basic pairs is small, and the yrast states are
dominated by the j = 0 component |(1)LP (0)〉.When L is
larger, the weight of the |(1)LP (0)〉 component becomes
smaller. E.g., when L = 0, 2, 4, and 10, the weights
of |(1)LP (0)〉 are 0.94, 0.84, 0.75, and 0.54, respectively.
Evidently, the energy going up linearly in the yrast line
in Fig.5 is mainly due to the linear increase of the number
of p-wave particles.
N=20,   L=0
III IV V
FIG. 6: The spectrum of the L = 0 states with N = 20,
Vo = 1, and θrange = 0.025. Refer to Fig.1
When N = 20, all the above qualitative features re-
main unchanged. Examples are given in Fig.6 and 7 to
be compared with Fig.1 and 4. Nonetheless, the decrease
of N implies that the particles have a less chance to meet
each other, thus the particle correlation is expected to be
weaker. Quantitatively, it was found that (i) The spac-
ing of adjacent oscillation levels becomes smaller, it is
now ∼2.2 to replace the previous 3.15 (ii) The fluctu-
ation becomes weaker. E.g., the weights of |P (j)〉 of
the Ψ
I,3 state are 0.01, 0.95, and 0.03 for j = 1, 2, and
3, respectively, while these weights would be 0.16, 0.46,
0 60 120 180 240 300 360
0.20 N=20,  L=3 
III,1
FIG. 7: The same as Fig.4 but with N = 20
and 0.28 if N = 100. (iii) When N becomes small, the
geometric features would become explicit. E.g., for the
3-cluster structure, the difference between the maximum
and minimum of ρ2 is ∼0.007 in Fig.4, but ∼ 0.033 in
Fig.7.
The decrease of Vo or θrange was found to cause an
effect similar to the decrease of N , the spectra would
remain qualitatively unchanged. Quantitatively, when
Vo is changed from 1 to 0.1, the spacing inside a band
is changed from ∼ 3.15 to ∼ 2.14, and the fluctuation
becomes much weaker as expected.
In what follows we study the vortex states. For an
arbitrary Lo ≤ N/2, the spectra of the Ψ(N−Lo)Z,i and
Z,i states are found to be identical
15, except the former
shifts upward as a whole by N − 2Lo, namely,
E(N−Lo,Z,i) = E(Lo,Z,i) +N − 2Lo (10)
Furthermore, their ρ2 are found to be identical.
Let us define an operator
X so that the state
X |α〉 is
related to |α〉 by changing every ki in |α〉 to −ki + 1,
i.e., φki(θ) to φ−ki+1(θ) = e
iθφ−ki(θ). We further found
from the numerical data that
(N−Lo)
Z,i =
holds exactly. In fact,
X causes a reversion of rota-
tion of each particle plus a collective excitation. It does
not cause any change in particle correlation, therefore ρ2
remains exactly unchanged. Thus the L large states,
including the vortex states L = N , can be known from
the L small states.
The underlying physics of this finding is the separabil-
ity of the Hamiltonian (it is emphasized that the sepa-
rability is exact as can be proved by using mathematical
induction). Let θcoll =
θi/N, which describes a col-
lective rotation. Then H = − 1
+Hint, where Hint
describes the relative (internal) motions and does not de-
pend on θcoll. Accordingly, E
(L,Z,i) = L2/N + E
(L,Z,i)
int ,
the former is for collective and the latter is for relative
(internal) motions. The eigen-states can be thereby sep-
arated as Ψ
Z,i =
eiLθcoll ψ
(L,Z,i)
int . The feature of the
internal states ψ
(L,Z,i)
int has been studied in [19]. Where
it was found that, for an arbitrary Lo
(N+Lo,Z,i)
int = ψ
(Lo,Z,i)
With these in mind, eq.(10) and (11) can be derived as
follows.
From the separability
(N−L)
ei(N−L)θcoll ψ
(N−L,Z,i)
int (13)
X acts on a wave function with L, from the defini-
tion of
X, L should be changed to −L and an additional
factor
eiθj = eiN θcoll should be added, thus
Z,i =
ei(N−L)θcoll ψ
(−L,Z,i)
Due to (12), the right hand sides of (13) and (14) are
equal, thereby (11) is proved.
Furthermore, since ψ
(−L,Z,i)
int = (ψ
(L,Z,i)
int )
∗, the internal
energy E
(N−L,Z,i)
int = E
(−L,Z,i)
int = E
(L,Z,i)
int . Therefore,
E(N−L,Z,i) − E(L,Z,i) = (N − L)2/N − L2/N = N − 2L.
This recovers eq.(10), the energy difference arises purely
from the difference in collective rotation.
If the particles are tightly confined on the ring, rapidly
rotating state with a large L = JN − Lo would exist,
where J is an integer. Their spectra would remain the
same but shift upward by J(JN − 2Lo) from the spec-
trum with L = Lo, while Ψ
(JN−Lo)
Z,i =
Z,i , where
XJ changes each φki to φ−ki+J . Thus the rapidly rotat-
ing states have the same internal structure as the corre-
sponding lower states but have a much stronger collective
rotation.
When N increases greatly while Vo or θrange decreases
accordingly, the qualitative behaviors remain unchanged.
E.g., when N = 10000 and Vo = 0.01 (θrange remains
unchanged), the spectrum and the wave functions are
found to be nearly the same as the case N = 100 and
Vo = 1, except that the spectrum has shifted upward
nearly as a whole by 3939. This is again a signal that,
for weak interaction and for the ground states, the mean-
field theory is a good approximation.
It is noted that the confinement by a ring is quite differ-
ent from a 2-dimensional harmonic trap. In the latter,
the energy of a particle in the lowest Landau levels is
proportional to its angular momentum k. However, for
the rings, it is proportional to k2. Consequently, higher
partial waves are seriously suppressed and the p-wave ex-
citation becomes dominant. For a harmonic trap it was
found in [10,11] that d- and f-wave excitations are more
important than the p-wave excitation when L is small.
This situation does not appear in our case.
When the zero-range interaction Vij = gδ(θi − θj) is
adopted, The results are nearly the same with those from
the square-barrier interaction if the parameters are re-
lated as g = 2Voθrange (in this choice both interactions
have the same diagonal matrix elements). For an exam-
ple, a comparison is made in Table III. The high simi-
larity between the two sets of data imply that the above
findings are also valid for zero-range interaction.
TABLE III: Eigen-energies of the four lowest L = 0 states
for a system with N = 100 and with zero-range interaction
Vij = 0.05δ(θi − θj) (the unit of energy is G as before). The
weights of the j = 0 components of these states are also
listed. The corresponding results from square-barrier inter-
action with Vo = 1, and θrange = 0.025 are given in the
parentheses.
(L,Z, i) E(L,Z,i) (C
(L,Z,i)
(0, I, 1) 39.0900 (39.0902) 0.9370 (0.9371)
(0, I, 2) 42.2982 (42.2983) 0.0510 (0.0510)
(0, I, 3) 45.4733 (45.4733) <0.02 (<0.02)
(0, II, 1) 47.0076 (47.0074) 0.8372 (0.8373)
The numerical results from using zero-range interac-
tion can be compared with the exact results from solving
integral equations by Lieb and Liniger [16,17]. The vari-
ables γ and e(γ) introduced in [16] are related to those of
this paper as γ = gπ/N and e(γ) = 4π2E/N3 (the unit
of E is G). However, this paper concerns mainly the case
of weak interaction, say, g ≤ 0.05,or γ ≤ 0.00157 (other-
wise, the procedure of diagonalization would not be valid
due to the cutoff of the space). Nonetheless, even γ is
as large as 0.5 (g = 15.9) the evolution of the ground
state energy with N = 100 against γ obtained via di-
agonalization coincide, in the qualitative sense, with the
exact results quite well . This is shown in Fig.8 to be
compared with Fig.3 of [16], where γ is ranged from 0
to 10. In Fig.8, the constraint γ < e(γ) is recovered.
Furthermore, when γ is small, e(γ) against γ appears as
a straight line.
0.0 0.1 0.2 0.3 0.4 0.5
FIG. 8: e(γ) = 4π2E/N3 against γ = gπ/N . N is given
at 100 and E is the ground state energy calculated from the
diagonalization in the unit G.
In summary, a detailed analysis based on the numerical
data of N−boson systems on a ring with weak interac-
tion has been made. The main result is the discovery of
the basic pairs, which exist extensively in all the excited
states and dominates the low-lying spectra. The fluc-
tuation of basic pairs provides a common mechanism of
oscillation, the low-lying states are thereby classified into
oscillation bands. Each band is characterized by having
its specific additional excitation of a few particles. Since
the mechanism of oscillation is common, the level spac-
ings of different bands are nearly equal in a spectrum.
To divide the motion into being collective and relative
provides a better understanding to the relation between
the higher and lower states. The very high vortex states
with L ≈ N can be understood from the correspond-
ing low-lying states because they have exactly the same
internal states.
The particle correlation has been intuitively studied.
particle densities are found to be in general non-uniform,
bose-clustering originating from the symmetrization of
wave functions is found, which leads to the appearance
of one, two, and three clusters. This phenomenon would
become explicit and might be observed if N is small.
Acknowledgment: The support by NSFC under the
grants 10574163 and 90306016 is appreciated.
REFERENCES
1, D.M. Stamper-Kurn, M.R. Andrews, A.P.
Chikkatur, S. Inouye, H.-J. Miesner, J. Stenger, and W.
Ketterle, Phys. Rev. Lett. 80, 2027 (1998)
2, B.P. Anderson and M.A. Kasevich, Science 281,
1686 (1998)
3, J.L. Roberts, et al, Phys. Rev. Lett. 81, 5159
(1998)
4, J. Stenger, et al, Phys. Rev. Lett. 82, 2422 (1999).
5, S.L. Cornish et al., Phys. Rev. Lett. 85, 1795 (2000)
6, M. Greiner et al., Nature (London) 415, 39 (2002)
7, B. Paredes et al., Nature (London) 429, 277 (2004)
8, G.T. Kinoshita, T. Wenger, and D.S. Weiss, Science
305, 1125 (2004)
9, N.K. Wilkin, J.M.F. Gunn, and R.A. Smith, Phys.
Rev. Lett. 80, 2265 (1998)
10, B. Mottelson, Phys. Rev. Lett. 83, 2695 (1999)
11, G.F. Bertsch and T. Papenbrock, 83, 5412 (1999)
12, K. Sakmann, A.I. Streltsov, O.E. Alon, L.S. Ceder-
baum, Phys. Rev. A 72, 033613 (2005)
13, I. Romanovsky, C. Yannouleas, and U. Landman,
Phys. Rev. Lett. 93, 230405 (2004)
14, I. Romanovsky, C. Yannouleas, L.O. Baksmaty,
and U. Landman, Phys. Rev. Lett. 97, 090401 (2006)
15, Yongle Yu, cond-mat/0609711 v1.
16, E.H. Lieb and W.Liniger, Phys, Rev. 130, 1605
(1963)
17, E.H. Lieb, Phys, Rev. 130, 1616 (1963)
http://arxiv.org/abs/cond-mat/0609711
18, F. Bacher and S. Goudsmit, Phys. Rev., 46, 948
(1934)
19, C.G. Bao, G.M. Huang, and Y.M. Liu, Phys. Rev.
B 72, 195310 (2005)
ABSTRACT
  The Hamiltonian of a N-boson system confined on a ring with zero spin and
repulsive interaction is diagonalized. The excitation of a pair of
p-wave-particles rotating reversely appears to be a basic mode. The fluctuation
of many of these excited pairs provides a mechanism of oscillation, the states
can be thereby classified into oscillation bands. The particle correlation is
studied intuitively via the two-body densities. Bose-clustering originating
from the symmetrization of wave functions is found, which leads to the
appearance of 1-, 2-, and 3-cluster structures. The motion is divided into
being collective and relative, this leads to the establishment of a relation
between the very high vortex states and the low-lying states.

<|endoftext|><|startoftext|>
Kadowaki-Woods Ratio of Strongly Coupled Fermi Liquids
Takuya Okabe
Faculty of Engineering, Shizuoka University, 3-5-1 Johoku, Hamamatsu 432-8561,Japan
(Dated: November 28, 2018)
On the basis of the Fermi liquid theory, the Kadowaki-Woods ratio A/γ2 is evaluated by using a
first principle band calculation for typical itinerant d and f electron systems. It is found as observed
that the ratio for the d electron systems is significantly smaller than the normal f systems, even
without considering their relatively weak correlation. The difference in the ratio value comes from
different characters of the Fermi surfaces. By comparing Pd and USn3 as typical cases, we discuss
the importance of the Fermi surface dependence of the quasiparticle transport relaxation.
PACS numbers: 71.10.Ay, 71.18.+y, 71.20.Be, 71.27.+a, 72.15.-v
It is widely known as a universal feature of heavy
fermion systems that there holds the Kadowaki-Woods
(KW) relation A/γ2 ≃ 1 × 10−5µΩ cm(mol K/mJ)2 be-
tween the electronic specific heat coefficient γ of C = γT
and the coefficient A of the resistivity ρ = AT 2 in the
clean and low temperature limit.[1] According to the
Fermi liquid theory, this is interpreted as an indication
of the fact that A is squarely proportional to quasiparti-
cle mass enhancement due to strong electron correlation.
On the other hand, transition metal systems are reported
since before to obey a similar relation with a more than
an order of magnitude smaller value of A/γ2.[2, 3] In view
of the observation that there seems to exist several types
of systems in this regard, the recent finding by Tsujii et
al.[4] is quite impressive that many Yb-based compounds
show the KW ratio A/γ2 as small as the transition met-
als. Kontani derived the small ratio as a result of the
large orbital degeneracy of the the 4f13 state of trivalent
Yb by applying the dynamical mean field approximation
to a periodic Anderson model of an orbitally degenerate f
electron states coupled with a single conduction band.[5]
To discuss the KW ratio A/γ2 and the many-body
mass enhancement effect, a simple model is usually
adopted at the cost of neglecting material specific individ-
ual factors. In the present work, we are interested in such
an effect as caused by a system-dependent factor, that is,
the Fermi surface dependence of quasiparticle current re-
laxation. The system should have a large enough Fermi
surface relative to the Brillouin zone boundary in order
for the quasiparticle current to dissipate effectively into
an underlying lattice through mutual quasiparticle scat-
terings. In other words, the effectiveness of the trans-
port relaxation may depend on the size and shape of the
Fermi surface. To investigate this point definitely, we
discuss the quasiparticle transport by taking account of
the momentum dependence of quasiparticle scattering on
the basis of realistic band structures. This has been ham-
pered so far by a task required for not so simple Fermi
surfaces of many band systems as could be simply mod-
elled analytically. In terms of fairly realistic energy bands
obtained from a first principle calculation, we evaluate
those quantities which are not affected severely by the
electron correlation effect. The theory in use is essen-
tially within the phenomenological Fermi liquid theory
described by renormalized quantities, and unlike a model
calculation no bare microscopic quantities appear explic-
itly. Schematic results using simple abstract models have
been given before, in which a tight binding square lattice
model and a two-band model are investigated.[6, 7, 8]
For the ratio A/γ2 we make use of the expression,
= 21.3αFa [µΩ (mol K/mJ)
], (1)
which corresponds to Eq. (4.11) in Ref. 7 where we set
a = 4Å for the lattice constant. In what follows we sub-
stitute a calculated value for a. Below we follow how to
derive αF , where α is a coupling constant, and F is a
factor determined by the Fermi surface.
Following a microscopic analysis of the quasiparticle
transport with vertex corrections properly taken into
account,[9] we may derive a phenomenological linearized
Boltzmann equation.[7] Generalizing the theory to take
a many-band effect into account, in the low temperature
T → 0 we end up with the equation
∗Electronic address: ttokabe@ipc.shizuoka.ac.jp
vipµ = (πT )
pp′kρ
p′+k(l
pµ + l
p′µ − l
p′+kµ − l
p−kµ), (2)
where vipµ and ρ
p = δ(µ − ε
p) are the velocity compo- nent and the local density of state of the renormalized
http://arxiv.org/abs/0704.0843v2
mailto:ttokabe@ipc.shizuoka.ac.jp
(mass-enhanced) quasiparticle with the crystal momen-
tum p in the i-th band. The superscripts i and j are the
band indices, while the subscript µ = x, y, z are Carte-
sian coordinates. In the right hand side of Eq. (2), the
2nd to 4th terms in the parenthesis represent vertex cor-
rections in the microscopic formulation. In terms of the
solution lipµ, which physically represents stationary devi-
ation of the Fermi surface in an applied electric field Eµ,
the conductivity is given by
σ ≡ σµ = 2e
pµ, (3)
The above equations (2) and (3) correspond to
Eqs. (3.10) and (3.15) of Ref. 8 respectively. We may
suppress the index µ (= x) in Eq. (3) as we discuss the
cubic systems in what follows.
Instead of solving the simultaneous matrix equations
(2) exactly, we use trial functions for lipµ as commonly ap-
plied in a variational principle formulation of the trans-
port problems.[10] Assuming
lipµ ∝ e
|vipµ|
we obtain
αijci,j
ρ2|vx|
, (4)
where
ci,j =
k1,k2,k3,k4
k1+k2=k3+k4
ρik1ρ
ρik4(e
−eik4)
2/4ρiρj ,
ρ|vx| ≡
ρip|v
px|. (6)
We define coupling constants αij = ρiρj〈W
ij〉/π, where
p, is the density of states of the i-th band at
the Fermi level and 〈W ij〉 denotes the quasiparticle scat-
tering probability W
pp′k averaged over the momenta p, p
and k. As the double sum in (2), dominated by Umk-
lapp processes, covers a complicated shaped phase space
over the Fermi surface, it is generally a good approxi-
mation to take W
pp′k out of the momentum sum as an
averaged quantity. The total density of states ρ =
is substituted for γ = 2π2ρ/3.
In heavy fermion systems, the momentum dependence
pp′k could be generally neglected, for the quasiparti-
cle scattering W
pp′k is primarily caused by strong on-site
Coulomb repulsion U . Then we can make an order of
magnitude estimate of αii in terms of Landau parame-
ters F
and F
. For an anisotropic Fermi liquid, as
in an isotropic case, one can derive that the charge and
spin susceptibilities are given by χic = 2ρi/(1+F
) and
χis = 2ρi/(1 + F
), respectively. Thus, for the systems
in which charge fluctuations are suppressed, χic → 0, we
obtain F
≫ 1. On the other hand, in terms of A
/(1 +F
), one obtains a rough estimate of the cou-
pling αii = 1
)2 + 1
. There-
fore, under the normal condition that the spin enhance-
ment is moderate, (1+F
)−1 ∼ 1, αii should universally
stay around a constant of an order of unity.[7] This corre-
sponds to the condition to make the Wilson ratioRW = 2
in the impurity model.[11, 12] We discuss a normal state
that the system is well away from critical instabilities,
around which A/γ2 will be strongly enhanced at vari-
ance with experimental results under consideration.[13]
We evaluate F numerically for α = αij = 1 to obtain
A/γ2, and investigate the Fermi surface dependence.
It is noted that the factor F is determined by the shape
and extent of the Fermi surfaces relative to the Brillouin
zone boundary. Microscopically, the mass enhancement
due to the many-body effect is represented by the ω-
derivative of the electron self-energy Σ(q, ω), or by the
renormalization factor zip as ρ
p = ρ
0,p/z
p, where ρ
0,p is
a bare density of states. It is easily checked that the
factor z cancels in F when zip is independent of i. Oth-
erwise, in case that a dominant contribution to the re-
sistivity comes from an electron-correlated main band,
then the other bands may be neglected and A/γ2 be-
comes independent of z of the main band. As we see
below numerically, it is found indeed that F is domi-
nated by a few scattering channels within a main band
or two. Hence, we elaborate on a numerical estimate of
F on the basis of a realistic band calculation reproducing
reliable Fermi surfaces of relevant bands, even if it may
not take account of local many-body correlation effects
fully enough for the renormalized quantities like ρi and
vip to be separately compared with experiments. As a
matter of course, we must exclude the extreme case in
which strong correlation modifies electron states around
the Fermi level qualitatively from those of a band calcu-
lation. We apply our theory to those itinerant electron
systems in which correlation strength is not negligible
but not so strong.
To calculate F for some typical cubic d and f itiner-
ant electron systems in the fcc and Cu3Au structures, we
have performed ab initio band calculations within den-
sity functional theory using the plane wave pseudopoten-
tial code VASP with the Perdew-Wang 1991 generalized
gradient approximation to the exchange correlation func-
tional Exc.[14, 15, 16, 17] By minimizing the total energy
we obtain the lattice constant a, which is accurate enough
to be used in Eq. (1).
To evaluate F numerically, we have to broaden the
delta function ρip = δ(µ − ε
p) by ∆ to pick up electron
states around the Fermi level. The width ∆ of the order
of real temperature should be decreased as the number of
the k-points is increased until we confirm to have a con-
vergent result. For the number L of subdivisions along re-
TABLE I: Calculated results.
a (Å) ρ|vx|
a F N A/γ2 b
USn3 4.60 3.1 4.0 3 0.39
UIn3 4.61 4.9 1.6 3 0.16
UGa3 4.24 3.9 2.5 3 0.23
Pd 3.86 7.4 0.23 3 0.019
Pt 3.91 8.4 0.15 4 0.012
aIn unit of a = 1.
bIn unit of [10−5 µΩ cm (mol K/mJ)2].
ciprocal lattice vectors, band calculations are performed
with Lband ∼ 50, from which we obtain the band energies
εik on the finer k-mesh of L ∼ 200 by interpolation. As
the four-fold k-sum in the numerator of Eq. (4), especially
for the most important terms coming from the main d or
f correlated bands, constitutes the most time consuming
part of the calculation, we have to reduce the numerical
task by some symmetry considerations not only on the
cubic symmetry of the quasiparticle states, but on the
relative directions of the four momentum vectors of the
scattering quasiparticle states and the x-direction of the
current flow. The reduction is particularly effective for
the intra-band scatterings i = j.
The calculated results are shown in Table I, where F
and A/γ2 for α = αij = 1 are shown along with the
lattice constant a, the number N of metallic bands con-
tributing to the resistivity, and ρ|vx| defined in Eq. (6).
We find that our results explain well the experimental
tendency of an order of magnitude small values of the
ratio A/γ2 for the transition metal systems. As for the
absolute values of the ratio, our results are a few times
smaller than observed evenly, but the accuracy of this
order should not be taken seriously here. Among other
things, the results indicate that different characters of
the Fermi surfaces play an important role.
To show the relative contribution to the resistivity
from relevant bands, relative magnitudes of ci,j in the nu-
merator of Eq. (4) are shown for Pd and USn3 in Figs. 1
and 2, respectively. For Pd, the contribution to F comes
from the 4th to 6th bands, among which dominant is the
5th hole band of the 3d character. Similarly, the 5th band
contributes majorly not only to ρ, i.e., ρ5 ≃ 5.4ρ4 ≃ 12ρ6,
but to ρ|vx| in Eq. (6). On the other hand, for USn3,
while the 14th heavy electron band plays a central role,
the 12th and 13th hole bands also make non-negligible
contributions through the inter-band scatterings. Hence,
as the first point to note, numerical importance of the
inter-band contributions makes F large in the f electron
system. This is partly because ρi for i = 12, 13, 14 are
comparable with each other, namely, ρ14 ≃ 2ρ13 ≃ 3ρ12.
Moreover, it is remarked that the large and nearly spher-
ical shape of the Fermi surfaces are essential too. As the
second point to note, the importance of the Fermi surface
geometry can be understood within a single band model
by comparing contribution from the main band. We find
that c5,5/ρ
5 = 0.097 for Pd is an order of magnitude
 6  4
FIG. 1: cij (i, j = 4, 5, 6) for Pd. The contribution from the
5th band is dominant for the resistivity.
 14 12
FIG. 2: cij (i, j = 12, 13, 14) for USn3. The interband con-
tribution with the 14th band is important too.
smaller than c14,14/ρ
14 = 0.93 for USn3. The difference
comes from the different characters of the Fermi surfaces.
According to an elementary formula σ = e2ρv2τ =
e2ρvl, the conductivity σ depends on ρv as well as l. In
this context, the mean free path l is not a single particle
property determined by a lifetime of the particle state,
but it is the transport property which characterizes how
efficiently the total electric current decays into a lattice
system, e.g., in our case, through mutual Umklapp scat-
tering processes between the current carriers. In partic-
ular, regardless of interaction, electrons in free space will
not have resistivity.[9] Thus, to evaluate the transport
property l correctly, it is crucial to take account of the
momentum dependence of the scattering states and their
conservation modulo the reciprocal lattice vectors.
Note that ρ|vx| defined in Eq. (6) is related to the sur-
face area S of the Fermi surfaces, as ρdε = Sdk⊥/(2π)
Hence, ρ|vx| too is independent of the mass renormaliza-
tion z as F is, and for free electrons we obtain ρ|vx| ∝
∝ n2/3. One can see a correlation between F and
ρ|vx| in Table I. In fact, Pd and Pt have twice as large
ρ|vx| as the uranium compounds. The difference can-
not be simply explained by the difference in the Fermi
surface volume n. It is caused by the fact that the f -
electron systems have the nearly isotropic Fermi surfaces
while the d-electron systems have complicated ones with
FIG. 3: The intersection of the Fermi surfaces of Pd d-hole
states with the (111̄) plane.
FIG. 4: The intersection of the Fermi surfaces of USn3 with
the (100) plane
relatively large area compared to their total volume, as
indicated in Figs. 3 and 4. The different characters of
the surfaces affect not only the single particle quantity
ρ|vx| but also the transport property of the total cur-
rent relaxation. As the order of magnitude difference in
F is not explained merely by ρ|vx|, we have to have re-
sort to the other factor, that is, the transport property
depending on the Fermi surfaces. It originates from the
detailed k-dependence of the scattering states, as repre-
sented in ci,j , or by the phase space volume available for
all possible scattering channels under strict restrictions
of energy and momentum conservations. Thus our quan-
titative analysis concludes the important effect on the
quasiparticle transport due to the shape and complexity
of the Fermi surfaces.
In summary, we evaluated the Kadowaki-Woods ratio
A/γ2 of some itinerant d and f electron systems numeri-
cally on the basis of the Fermi liquid theory using quasi-
particle Fermi surfaces obtained by band calculations.
In a single framework, we find the d electron systems
have smaller ratio than the f systems, as observed, and
among others we pointed out an important effect to the
transport coefficient A originating from a commonly ne-
glected specific feature depending on the characters of
the Fermi surfaces. The effect is not understood fully
as a single-particle property of interacting systems, but
we stress the importance of the phase space restriction
due to momentum conservation in two-body scattering
processes to dissipate a total electric current. In short,
to realize effective dissipation, the system should have a
large and regular shaped Fermi surface. In future we will
examine that the Fermi-surface dependent efficiency of
mutual quasiparticle scatterings may depend on a type
of transport current to be relaxed.
Acknowledgment
The author is grateful to N. Fujima, S. Kokado and
T. Hoshino for providing assistance in the numerical cal-
culations. He also acknowledges computational resources
offered from YITP computer system in Kyoto University.
[1] K. Kadowaki and S. B. Woods, Solid State Commun. 58,
507 (1986).
[2] M. J. Rice, Phys. Rev. Lett. 20, 1439 (1968).
[3] K. Miyake, T. Matsuura, and C. M. Varma, Solid State
Commun. 71, 1149 (1989).
[4] N. Tsujii, H. Kontani, and K. Yoshimura, Phys. Rev.
Lett. 94, 057201 (2005).
[5] H. Kontani, J. Phys. Soc. Jpn. 73, 515 (2004).
[6] T. Okabe, J. Phys. Soc. Jpn. 67, 2792 (1998).
[7] T. Okabe, J. Phys. Soc. Jpn. 67, 4178 (1998).
[8] T. Okabe, J. Phys. Soc. Jpn. 68, 2721 (1999).
[9] K. Yamada and K. Yosida, Prog. Theor. Phys. 76, 621
(1986).
[10] J. M. Ziman, Electrons and Phonons (Clarendon Press,
Oxford, 1960).
[11] P. Nozières, J. Low. Temp. Phys. 17, 31 (1974).
[12] K. Yosida and K. Yamada, Prog. Theor. Phys. 53, 1286
(1975).
[13] T. Takimoto and T. Moriya, Solid State Commun. 99,
457 (1996).
[14] G. Kresse and J. Furthmüller, Comput. Mater. Sci. 6, 15
(1996).
[15] G. Kresse and J. Furthmüller, Phys. Rev. B 54, 11169
(1996).
[16] G. Kresse and D. Joubert, Phys. Rev. B 59, 1758 (1999).
[17] J. P. Perdew, J. A. Chevary, S. H. Vosko, K. A. Jackson,
M. R. Pederson, D. J. Singh, and C. Fiolhais, Phys. Rev.
B 46, 6671 (1992).
ABSTRACT
  On the basis of the Fermi liquid theory, the Kadowaki-Woods ratio
$A/\gamma^2$ is evaluated by using a first principle band calculation for
typical itinerant $d$ and $f$ electron systems. It is found as observed that
the ratio for the $d$ electron systems is significantly smaller than the normal
$f$ systems, even without considering their relatively weak correlation. The
difference in the ratio value comes from different characters of the Fermi
surfaces. By comparing Pd and USn$_3$ as typical cases, we discuss the
importance of the Fermi surface dependence of the quasiparticle transport
relaxation.

<|endoftext|><|startoftext|>
Structure of Strange Dwarfs with Color Superconducting Core
Masayuki Matsuzaki∗ and Etsuchika Kobayashi
Department of Physics, Fukuoka University of Education,
Munakata, Fukuoka 811-4192, Japan
Abstract
We study effects of two-flavor color superconductivity on the structure of strange dwarfs, which
are stellar objects with similar masses and radii with ordinary white dwarfs but stabilized by the
strange quark matter core. We find that unpaired quark matter is a good approximation to the
core of strange dwarfs.
PACS numbers: 95.30.-k
∗matsuza@fukuoka-edu.ac.jp
http://arxiv.org/abs/0704.0844v1
mailto:matsuza@fukuoka-edu.ac.jp
Witten made a conjecture that the absolute ground state of quantum chromodynamics
(QCD) is not 56Fe but strange quark matter, which is a plasma composed of almost equal
number of deconfined u, d, and s quarks [1]. Although this conjecture has been neither
confirmed nor rejected, if this is true, since deconfinement is expected in high density cores
of compact stars, there could exist stars that contain strange quark matter converted from
two-flavor quark matter via weak interaction. Strange quark stars whose radii are about 10
km, with or without thin nuclear crust, have long been investigated.
Glendenning et al. proposed a new class of compact stars containing strange quark matter
and thick nuclear crust ranging from a few hundred to ten thousand km [2, 3, 4]. They named
them the strange dwarfs because their radii correspond to those of white dwarfs. Alcock et
al. discussed the mechanism that the strange quark core supports the cruct [5]. Since the
mass of s quark is larger than those of u and d, strange quark matter is positively charged.
In order to electrically neutralize the core, electrons are bound to the surface of the core.
They estimated that the thickness of this electric dipole layer is a few hundred fm. Then
this layer can support a nuclear crust. Although Alcock et al. considered only thin crusts,
Glendenning et al. considered thick crusts up to about ten thousand km. Very recently,
Mathews et al. identified eight candidates of strange dwarfs from observed data [6].
A theoretical facet whose importance in nuclear physics was recognized later is color
superconductivity in quark matter. At asymptotically high density, the color-flavor locking
(CFL) is believed to be the ground state [7]. At realistic densities, however, the two-flavor
color superconductivity (2SC) is thought to be realized even when electric neutrality is
imposed if the coupling constant is strong [8]. Thus, in the present paper, we discuss effects
of the 2SC phase in the strange quark matter core on the structure of strange dwarfs.
In order to determine the structure of compact stars, we solve the general relativistic
Tolman-Oppenheimer-Volkoff (TOV) equation,
dp(r)
Gǫ(r)M(r)
4πr3p(r)
M(r)c2
2GM(r)
, (1)
M(r) = 4π
ǫ(r′)
r′2dr′, (2)
for the pressure p(r), the energy density ǫ(r), and the mass enclosed within the radius r,
M(r). Here G is the gravitational constant and c is the speed of light. The equation is closed
when an equation of state (EOS), a relation between p and ǫ, is specified. In the present
case, strange dwarfs are composed of the strange quark matter core and the nuclear crust.
Accordingly two parameters, the pressure at the center and at the core-crust boundary,
must be specified to integrate the TOV equation. The latter must be equal or less than that
corresponds to the nucleon drip density ǫdrip. Otherwise neutrons drip and gravitate to the
core. In the present calculation we take a pcruct calculated from ǫcrust = ǫdrip.
We assume zero temperature throughout this paper. As for the EOS of the quark core,
we adopt the MIT bag model without any QCD corrections (see Ref. [4], for example). For
unpaired free quark matter,
p = −B +
µfkFf
µ2f −
m4f ln
µf + kFf
, (3)
ǫ = B +
µfkFf
µ2f −
m4f ln
µf + kFf
, (4)
where mf , kFf , and µf =
m2f + k
Ff are the mass, the Fermi momentum, and the chemical
potential of quarks of each flavor, respectively, and f runs u, d, and s. Hereafter we put
c = h̄ = 1. The quantity B is the bag constant. The effect of color superconductivity is
incorporated as a chemical potential dependent effective bag constant. In the 2SC case [9],
Beff = B −
∆2(µ)µ2, (5)
where ∆(µ) is the quark pairing gap as a function of a chemical potential µ, whose relation
to µf is specified later.
The pairing gap is obtained as a function of the Fermi momentum by solving the gap
equation [10]
∆(kF) = −
v̄(kF, k)
E ′(k)
k2dk, (6)
E ′(k) =
(Ek − EkF)
2 + 3∆2(k), (7)
with kF = kFu = kFd, Ek =
k2 +m2q , and mq = mu = md. The one gluon exchange pairing
interaction is given by
v̄(p, k) = −
pkEpEk
2EpEk + 2m
q + p
2 + k2 +m2E
(p+ k)2 +m2E
(p− k)2 +m2E
6EpEk − 6m
q − p
2 − k2
m2E =
2, (8)
where p and k are the magnitudes of 3-momenta. The running coupling constant is given
by [11]
q2max+q
q = p− k,
qmax = max{p, k}. (9)
As for the EOS of the crust, we adopt the tabulated one for β-equilibrium nuclear matter
of Baym, Pethick, and Sutherland [12] (BPS) conforming to Refs. [2, 3, 4].
The positively charged strange quark matter in the core is simply approximated by µ =
µu = µd = µs. Quark masses are given by mu = md = 10 MeV, ms = 150 MeV. The bag
constant is chosen to be B1/4 = 160 MeV. Parameters entering into the pairing interaction
are q2c = 1.5Λ
QCD and ΛQCD = 400 MeV. The nucleon drip density is ǫdrip = 4.3×10
11 g/cm3.
26 28 30 32 34 36 38
log� (J/m
free quark
2SC quark
FIG. 1: Equations of state of free and 2SC quark matter and β-equilibrium nuclear matter. The
latter is tabulated in Refs. [12] and [4].
The adopted EOS is displayed in Fig. 1. The logarithm is to base 10 throughout this
paper. The quark matter EOS describes the core and the BPS EOS describes the crust. At
the boundary, the pressure is common whereas the energy density jumps discontinuously.
In order to obtain the EOS for 2SC matter, the pairing gap must be calculated at each
kF beforehand. This is shown in Fig. 2 left. The effective bag constant determined by the
pairing gap is shown in Fig. 2 right. The resulting 2SC EOS is included in Fig. 1.
Figure 3 presents the mass-radius relation obtained by integrating the TOV equation with
a fixed pcruct, determined from ǫcrust = ǫdrip, and various central pressures. This result can
200 300 400 500 600 700 800
µ (MeV)
200 300 400 500 600 700 800
µ (MeV)
FIG. 2: Left: color superconducting pairing gap and right: effective bag constant, as functions of
the quark chemical potential.
0 1 2 3 4 5
logR (km)
FIG. 3: Mass-radius relation of strange dwarfs and white dwarfs.
be classified into three regions. The first region (larger central pressures), almost vertical
curve at around R ∼ 10 km, describes strange stars with thin crusts. In this region, color
superconductivity makes the maximum mass and radius larger because the pairing gap
reduces the bag constant and consequently the energy density decreases and the pressure
increases. This is consistent with another calculation with the CFL phase [13]. The second
region, horizontal at around M/Msun ∼ 10
−2, and the third region, vertical at around R ∼
104 km up to the maximum mass, correspond to strange dwarfs. In the second region, color
superconducting quark cores support slightly larger masses than unpaired free quark cores.
In the third region, effect of color superconductivity is negligible. In Fig. 3, The mass-radius
relation of ordinary white dwarfs without quark matter cores calculated by adopting the
BPS EOS is also shown although it is known that the BPS EOS is not very suitable for
white dwarfs. As the central pressure decreases, the quark matter core shrinks (Fig. 4 left)
and eventually strange dwarfs reduce to ordinary white dwarfs. When their masses are the
same, the former is more compact than the latter (see also Fig. 5 right) because of the
gravity of the core. Mathews et al. paid attention to this difference in the mass-radius
relation and classified the observed data of dwarfs [6]. According to their work, eight of
them are classified into strange dwarfs.
28 29 30 31 32 33 34 35
logp0 (J/m
28 29 30 31 32 33 34 35
logp0 (J/m
FIG. 4: Left: core radius and right: mass of strange dwarfs, as functions of the central pressure.
14 15 16
log�0 (g/cm
-1 0 1 2 3 4
logr (km)
SD(free)
FIG. 5: Left: mass of strange dwarfs as a function of the central energy density. Right: energy
profile of a strange dwarf with M/Msun = 0.465 and that of a white dwarf with M/Msun = 0.466.
Figure 4 right indicates that strange dwarfs, in particular those of 103 km < R < 104 km,
are realized in a very narrow range of the central pressure. This is reflected in the density of
calculated points. During this rapid structure change from the second to the third region,
the core radius almost does not change, see Fig. 4 left. Figure 5 left also graphs M/Msun as
Fig. 4 right but as a function of the central energy density. The difference between these two
figures at the low pressure/energy density side can be understood from the quark matter
EOS in Fig. 1 such that the pressure decreases steeply at the lowest energy density. Figure 5
left indicates that strange dwarfs have central energy densities just below the lowest stable
compact strange stars and several orders of magnitude larger than those of ordinary white
dwarfs. This is clearly demonstrated in Fig. 5 right.
To summarize, we have solved the Tolman-Oppenheimer-Volkoff equation for strange
dwarfs with ǫcrust = ǫdrip and a wide range of the central pressure. We have examined effects
of the two-flavor color superconductivity in the strange quark matter core in a simplified
manner. The obtained results indicate that, aside from a slight increase of the minimum
mass, effect of color superconductivity is negligible in the mass-radius relation. This is
consistent with the conjecture given in Ref. [6]. As a function of the central energy density,
however, strange dwarfs are realized at slightly lower energy densities than the unpaired
free quark case reflecting the effect on the equation of state. Recently Usov discussed that
electric fields are also generated on the surface of the color-flavor locked matter [14]. This
suggests that strange dwarfs with color-flavor locked cores might also be possible although
this is expected only at relatively high densities. Since the pairing gap enters into the
calculation only through the effective bag constant, aside from a possible slight change in
chemical potentials, it can surely be expected that the effect of color-flavor locking does
not differ much from that of the two-flavor color superconductivity. In conclusion, unpaired
quark matter is a good approximation to the core of strange dwarfs. Another aspect that
might be affected by color superconductivity is the cooling [15]. This is beyond the scope of
the present study.
[1] E. Witten, Phys. Rev. D 30 (1984), 272.
[2] N. K. Glendenning, Ch. Ketter and F. Weber, Phys. Rev. Lett. 74 (1995), 3519.
[3] N. K. Glendenning, Ch. Ketter and F. Weber, Astrophys. J. 450 (1995), 253.
[4] N. K. Glendenning, Compact Stars (Springer, New York, 1996).
[5] C. Alcock, E. Farhi and A. Olinto, Astrophys. J. 310 (1986), 261.
[6] G. J. Mathews, I. -S. Suh, B. O’Gorman, N. Q. Lan, W. Zech, K. Otsuki and F. Weber, J.
Phys. G 32 (2006), 747.
[7] K. Rajagopal and F. Wilczek, Phys. Rev. Lett. 86 (2001), 3492.
[8] H. Abuki and T. Kunihiro, Nucl. Phys. A 768 (2006), 118.
[9] M. Alford and K. Rajagopal, J. High Energy Phys. 06 (2002), 031.
[10] M. Matsuzaki, Phys. Rev. D 62 (2000), 017501.
[11] K. Higashijima, Prog. Theor. Phys. Suppl. 104 (1991), 1.
[12] G. Baym, C. Pethick and P. Sutherland, Astrophys. J. 170 (1971), 299.
[13] G. Lugones and J. E. Horvath, Astron. and Astrophys. 403 (2003), 173.
[14] V. V. Usov, Phys. Rev. D 70 (2004), 067301.
[15] O. G. Benvenuto and L. G. Althaus, Astrophys. J. 462 (1996), 364.
	References
ABSTRACT
  We study effects of two-flavor color superconductivity on the structure of
strange dwarfs, which are stellar objects with similar masses and radii with
ordinary white dwarfs but stabilized by the strange quark matter core. We find
that unpaired quark matter is a good approximation to the core of strange
dwarfs.

<|endoftext|><|startoftext|>
Information entropic superconducting microcooler
A. O. Niskanen,1, 2 Y. Nakamura,1, 3, 4 and J. P. Pekola5
1CREST-JST, Kawaguchi, Saitama 332-0012,Japan
2VTT Technical Research Centre of Finland, Sensors, PO BOX 1000, 02044 VTT, Finland
3NEC Fundamental Research Laboratories, Tsukuba, Ibaraki 305-8501, Japan
4The Institute of Physical and Chemical Research (RIKEN), Wako, Saitama 351-0198, Japan
5Low Temperature Laboratory, Helsinki University of Technology, PO BOX 3500, 02015 TKK, Finland
(Dated: October 25, 2018)
We consider a design for a cyclic microrefrigerator using a superconducting flux qubit. Adiabatic
modulation of the flux combined with thermalization can be used to transfer energy from a lower
temperature normal metal thin film resistor to another one at higher temperature. The frequency
selectivity of photonic heat conduction is achieved by including the hot resistor as part of a high
frequency LC resonator and the cold one as part of a low-frequency oscillator while keeping both
circuits in the underdamped regime. We discuss the performance of the device in an experimentally
realistic setting. This device illustrates the complementarity of information and thermodynamic
entropy as the erasure of the quantum bit directly relates to the cooling of the resistor.
PACS numbers: 74.50.+r,85.80.Fi,03.67.-a
For the purpose of quantum computing, the coher-
ence properties of superconducting quantum bits (qubits)
should be optimized by decoupling them from all noise
sources as well as possible. However, many interesting
experiments can be envisioned also when the decoupling
is far from perfect. One such experiment closely related
to coherence optimization is using a qubit as a spectrom-
eter [1, 2, 3] for the environmental noise by monitoring
the effect of the environment on the quantum two-level
system. Here we focus on the opposite phenomenon, i.e.
the effect of a qubit on the environment. Recently a
superconducting flux qubit [4, 5] with a quite small tun-
neling energy from the point of view of quantum com-
puting was cooled using sideband cooling and a third
level [6] from about 400 mK down to 3 mK. Motivated
by this experiment we consider the possibility of using
a single quantum bit as a cyclic refrigerator for environ-
mental degrees of freedom. The utilized heat conduction
mechanism is photonic which was recently studied also
in experiment [7]. Besides the possible practical uses,
the device is interesting physically as it directly illus-
trates the connection between information entropy and
thermodynamical entropy. For related superconducting
high-frequency cooler concepts see eg. Refs. [8, 9].
Here we study a flux qubit coupled inductively to two
different loops shown in Fig. 1a. In loop j (j = 1, 2) we
have a resistor Rj in series with an inductance Lj and
a capacitance Cj . These form two damped harmonic os-
cillators. The resistors are in general at different tem-
peratures T1 and T2. The coupling of the qubit to these
two admittances Y1 and Y2 is assumed to be sufficiently
large to dominate the relaxation of the qubit. This as-
sumption can be easily validated by e.g. increasing the
mutual inductance. The flux qubit is an otherwise su-
perconducting loop except for three or four Josephson
junctions with suitably picked parameters. In particu-
FIG. 1: (color online) Principle of the flux-qubit cooler.
(a) Layout of the circuit. (b) Energy band diagram. (c)
Schematic of the cooling cycle in the qubit temperature-
entropy plane.
lar one of the junctions is made smaller than others to
form a two-level system. When biased close to half of the
flux quantum Φ0 = h/2e, the qubit can be described (in
persistent current basis) by the Hamiltonian
H/~ = −1
(∆σx + εσz) (1)
where σx and σz are Pauli matrices, ~ε = 2Ip(Φ−Φ0/2)
is the flux-tunable energy bias and Φ is the controllable
flux threading the qubit loop. Away from Φ = Φ0/2 the
eigenstates have the persistent currents ±Ip circulating
in the loop. The tunneling energy ~∆ results in an an-
http://arxiv.org/abs/0704.0845v1
ticrossing at Φ = Φ0/2 and there the energy eigenstates
do not carry average current. The resonant angular fre-
quency of the qubit is ω =
ε2 +∆2.
Consider the ideal cycle shown in Fig. 1b-c where the
bias of the flux qubit is swept slowly (slower than ∆/2π)
between two extreme values ε1 and ε2 corresponding to
two different energy level separations ~ω1 and ~ω2. Let
us further assume that ωj ≈ ωLCj and Qj ≫ 1, where
ωLCj = 1/
LjCj and Qj =
Lj/Cj/Rj. This choice
guarantees that the qubit mainly couples to resistor R1
(R2) at bias point 1 (2). The cooling cycle consists of
steps O, P, Q and R. First in step O the qubit has the
angular frequency ω2 and is allowed to thermalize. Be-
cause of the bandwidth limitations imposed by the reac-
tive elements, the qubit tends to thermalize with resistor
R2 to temperature T2. In the next step P the flux bias is
adiabatically changed to point 1 such that the level popu-
lations do not change but the energy eigenstates do. The
sweep is assumed to be however faster than relaxation.
In point 1 the angular frequency is reduced to ω1. Be-
cause the level populations and therefore the Boltzmann
factors do not change the qubit must now be at lower
temperature T̃2 given by T̃2 = T2ω1/ω2 in order to com-
pensate for the change of the qubit splitting. Note that
the quantum mechanical adiabaticity implies also ther-
modynamical adiabaticity: while the energy eigenbasis
changes the level populations and thus also entropy do
not change. In step Q the qubit is allowed to thermalize
to temperature T1 which results in heating of the qubit
and in cooling of resistor 1 if T̃2 < T1. At this point
the ideally pure quantum state of the qubit gets erased
and information stored is lost. The entropy of the qubit
increases, but locally the entropy of resistor 1 decreases
such that one can say that some information is “stored”
in the resistor as it cools but naturally with some loss.
Finally in step R the qubit is adiabatically shifted back
to frequency ω2 which results in heating of the qubit to
the effective temperature T̃1 = T1ω2/ω1 which is assumed
to be higher than T2. The excess energy is dumped to
admittance 2 when the cycle starts again from the be-
ginning. Note that due to the condition T̃2 < T1 resistor
1 can never be cooled below T2ω1/ω2. Since there is no
isothermal stage in the above cycle the present device
is not even in principle a Carnot cooler but rather an
Otto-type device.[10]
The density matrix of the qubit with the resonant an-
gular frequency ω at temperature T (β = (kBT )
−1) is
given by
ρeq(β, ε) =
. (2)
Using this the cooling power and the efficiency of the ideal
cycle in Fig. 1c can be easily calculated. It is given by
the area of the shaded region in the entropy-temperature
plane below points P and Q. In principle one could solve
for the effective temperature of the qubit along the line
between points P and Q as a function of entropy given
by S = −kBTr(ρ ln ρ). Alternatively, we can simply note
that the expectation value of the energy stored in the
qubit in point P is EP = Tr(ρeq(β2ω2/ω1, ε1)H1) while
after relaxation we have EQ = Tr(ρeq(β1, ε1)H1), where
H1 = H(ε1) is the Hamiltonian at point 1. We thus get
for the ideal cooling power
P/f = EQ − EP =
−β1~ω1
e−β1~ω1 + 1
− ~ω1e
−β2~ω2
e−β2~ω2 + 1
≤ ~ω1
where f is the pump frequency. The cooling power
achieves the maximum value of ~ω1f/2 when the ther-
mal population in step O (and P) is small and when
the population in step Q is large, i.e. when β2~ω2 ≫ 1
and β1~ω1 ≪ 1. Naturally a practical device has to be
designed to fulfill the first condition always, in which
case the smallest achievable temperature is on the or-
der of ~ω1/kB below which the cooling power decreases
rapidly. The dynamic range could be made wider by a
tunable ∆ which can be achieved by splitting the small-
est junction into a dc SQUID geometry. Another figure
of merit is the ratio η of the heat removed from resis-
tor 1 divided by the heat added to resistor 2. It can
be obtained as the ratio of the shaded area divided by
the sum of the hatched area and the shaded area, i.e.,
η = (EQ−EP )/(ER−EO) where EO = Tr(ρeq(β2, ε2)H2)
and ER = Tr(ρeq(β1ω1/ω2, ε2)H2). This simplifies neatly
to η = ω1/ω2 < 1 which is in harmony with the second
law of thermodynamics.
For more quantitative analysis we have to consider the
details of the relaxation rates due to the baths. The
Golden Rule transition rates due to resistor j are given
↓,↑ =
|〈0|dH/dΦ|1〉|2M2j S
I (±ωj)
M2j S
I(±ωj) (4)
where the positive sign corresponds to relaxation. The
total thermalization rate is Γ
↑ + Γ
↓. Here the
unsymmetrized noise spectrum is given by
I(ω) =
e−iωt〈δIj(0)δIj(t)〉dt
2~ωReYj(ω)
1− exp(−βj~ω)
. (5)
where ReYj(ω) = R
j /[1+Q
− ωLCj
)2] is the real
part of admittance of circuit j. The total relaxation rate
is thus
2(Ip∆Mj)
2 coth
1 +Q2j
− ωLCj
) . (6)
To model the behavior of the device we utilize the Bloch
master equation (see e.g Ref. [11]) given in our case by
~̇M = − ~B× ~M−Γ1th( ~M‖− ~MT1)−Γ2th( ~M‖− ~MT2)−Γ2 ~M⊥,
where ~M = Tr(~σρ) is the “magnetization” of the qubit,
and ~B = ∆~x + ε~z is the fictitious magnetic field. Note
however that the z-component of ~B and ~M do correspond
to real magnetic field and magnetization, respectively. In
Eq. (7) ~M‖ and ~M⊥ are the components of the magnetiza-
tion parallel and perpendicular to ~B, respectively. These
are explicitly
~M‖ =
(∆Mx + εMz)(∆~x + ε~z) (8)
~M⊥ =
ε2Mx−∆εMz
~x+My~y +
2Mz−∆εMx
~z. (9)
Here ~MT stands for the ε-dependent equilibrium mag-
netization of a qubit at temperature T given explicitly
~MT =
and Γ2 = (Γ
th + Γ
th)/2 + Γϕ is the dephasing rate. The
possibility of pure dephasing at the rate Γϕ has been
included. In the simulation we neglect pure dephasing
due to the intentionally large dominating thermalization
rate. Equation (7) describes relaxation towards instan-
taneous equilibrium with two competing rates due to
two different thermal baths. Equations of this type are
usually used in the stationary case, but for driving fre-
quencies slower than ∆/~ it should be also valid. As
is obvious from Eq. (7), the qubit actually tends to re-
lax towards an effective ε-dependent equilibrium mag-
netization (Γ1th
~MT1 + Γ
~MT2)/(Γ
th + Γ
th) at the rate
To illustrate the practical potential of the device we
show in Fig. 2 the simulated cooling power with si-
nusoidal driving of ǫ(t) compared to the ideal case
along with the actual loop in the entropy temperature
plane. The heat flow Pj from resistor j to the qubit
is simply obtained by integrating the product of the
thermalization rate and the energy deficit, i.e., Pj =
∫ 1/f
[Tr(ρeq(βj , ǫ(t))H)− Tr(ρ(t)H)] . The den-
sity matrix ρ(t) = 1
~M(t) · ~σ is solved numerically using
the Bloch equation (system is followed over a few periods
until it has converged to the limit cycle). We see that the
actual simulated behavior does not significantly deviate
at low f from the ideal behavior and that cooling pow-
ers on the order of fW can be achieved with reasonable
sample parameters. The oscillatory behavior at high f is
interpreted as Landau-Zener interference [12, 13].
However, the cooling power has to be compared with
realistic heat loads to evaluate the utility of the flux qubit
cooler. On one hand, resistor 1 is subject to heat load
0  ln 2
(a) (c)
(b) (d)
0 0.5 1 1.5
f (GHz)
0  ln 2
0 0.5 1 1.5
f (GHz)
FIG. 2: (color online) Example of the simulated cooling power
with ω1/2π = ∆/2π = 5 GHz (ǫ = 0 GHz), ω2/2π = 20.62
GHz (ǫ = 20 GHz), Q1 = Q2 = 10, ωj = ωLCj and
2(Ip∆Mj)
2/(Rj~ωj) = 20 × 10
9s−1. This can be achieved
e.g. with Ip = 200 nA, M1 = 29 pH, M2 = 59 pH and
R1 = R2 = 1 Ω. The driving is sinusoidal. (a) The solid
line illustrates the path in the T − S plane for the ideal cy-
cle described in the text while the dashed (dotted) line is a
result of simulation for f = 0.05 GHz (f = 1 GHz) with
T1 = T2 = 0.3 × ~ω2/kB ≈ 300 mK. (b) Simulated cooling
power vs. f for the same temperatures as in (a) is shown
with the dashed line while the solid line is the ideal result of
Eq. 3 (c-d) Same as (a-b) but with T1 = 0.5× T2 ≈ 150 mK.
The cooling threshold at 0.14 GHz in (d) is caused by finite
Q-factor.
from the phonons of the substrate on which the device
rests. On the other hand, resistor 2 should be coupled
well enough to phonon bath such that the unavoidable
work done on it does not raise T2 excessively. The heat
flow between the electron system of resistor j and the
phonon system is given by Pel−ph = ΣV (T
j −T 5ph) where
Vj is the volume of resistor j and Σ is typically on the
order of 109 Wm−3K−5. Thus resistor 1 needs to have a
sufficiently small volume while resistor 2 should be large
enough physically in order to serve as a heat sink. In ad-
dition the photonic heat conduction between the resistors
due to temperature gradient may in principle contribute
also. Following an analysis similar to Ref. [14], the heat
flow from admittance Y2(ω) to Y1(ω) can be written as
ReY1(ω)ReY2(ω)(n2(ω)− n1(ω))
where nj(ω) = [exp(βj~ω − 1)]−1 are the boson occu-
pation factors and M is the mutual inductance between
the loops. For detuned high-Q resonators the photonic
heat conduction turns out to be quite negligible. For
instance for the values of Fig. 3 with M = 5 pH and
R1 = R2 = 1 Ω we get only Pγ = 2 × 10−18 W even
0 0.5 1 1.5
f (GHz)
FIG. 3: (color online) Equilibrium temperature as a function
of pump frequency for three different phonon bath temper-
atures. The temperature of resistor 1 (volume 10−21m3) is
shown with dashed line while the temperature of resistor 2
(volume 10−18m3) is shown with solid line. The bath tem-
peratures Tph ≈ T2 from top to bottom are 0.3 × ~ω2/kB,
0.2 × ~ω2/kB and 0.1 × ~ω2/kB. Otherwise the parameters
are like in Fig. 2.
if T1 = 0 K and T2 = 300 mK. Figure 2 illustrates the
calculated equilibrium temperature versus operation fre-
quency obtained numerically by finding the balance be-
tween the dominating phononic heat conduction and the
integrated cooling power. We see that almost a factor of
2 reduction of T1 is possible with realistic parameters.
In practice the drop of T1 can be measured e.g. using
an additional SINIS thermometer, in which resistor 1 will
serve as the normal metal N. Its reading is sensitive to
the electronic temperature of N only, and self-heating can
be made very small. The resistors should be made out of
thin film normal metal such as copper or gold with typ-
ically sub 1 Ω square resistance. Volume can be picked
freely. To get the resonant frequencies and quality factor
as above we need L1 = 320 pH, C1 = 3.2 pF, L2 = 80
pH and C2 = 0.8 pF which are also realistic. For the
inductor one may use either Josephson or the kinetic in-
ductance of superconducting wire while the capacitance
values are similar to those in typical flux qubits [2]. To
satisfy the conditions of the above numerical example we
need quite large mutual inductances which however can
be easily achieved using e.g. kinetic inductance [15]. The
strong driving requires also rather large inductance be-
tween the microwave line and the qubit, which should not
result in uncontrolled relaxation. For instance, Mmw=5
pH coupling to the control line is acceptable as it would
result in at most 3 × 107 s−1 relaxation rate assuming a
50 Ω environment at 0.3 K. This choice will not degrade
the performance of the device significantly since driving
is much faster. Yet sufficiently strong driving can be
achieved with a modest 3 µA ac current. Fabrication
process will require most likely three lithography steps.
In conclusion, we have described a method of using a
superconducting flux qubit driven strongly at microwave
frequency to cool an external metal resistor. Here we con-
sidered LC resonators to achieve the required frequency
selectivity but a coplanar wave-guide resonator or a me-
chanical oscillator could be used in principle, too. We
demonstrated by a numerical example that it is possible
to observe the associated temperature decrease experi-
mentally. This effect is directly related to the loss of
information and thus to the increase of entropy of the
quantum bit.
J.P.P thanks NanoSciERA project ”NanoFridge” of
EU for financial support.
[1] O. Astafiev, Yu. A. Pashkin, Y. Nakamura, T. Ya-
mamoto, and J. S. Tsai, Phys. Rev. Lett. 93, 267007
(2004).
[2] F. Yoshihara, K. Harrabi, A. O. Niskanen, Y. Nakamura,
J.S. Tsai, Phys. Rev. Lett. 97, 167001 (2006).
[3] P. Bertet, I. Chiorescu, G. Burkard, K. Semba, C. J. P.
M. Harmans, D.P. DiVincenzo, and J.E. Mooij, Phys.
Rev. Lett. 95, 257002 (2005).
[4] J. E. Mooij, T. P. Orlando, L. Levitov, L. Tian, C. H.
van der Wal, S. Lloyd, Science 285, 1036 (1999).
[5] I. Chiorescu, Y. Nakamura, C. J. P. M. Harmans, and J.
E. Mooij, Science 299, 1869 (2003).
[6] S. O. Valenzuela, W. D. Oliver, D. M. Berns, K. K.
Berggren, L. S. Levitov, and T. P. Orlando, Science 314,
1589 (2006).
[7] M. Meschke, W. Guichard, and J. P. Pekola, Na-
ture(London) 444, 187 (2006).
[8] J. Hauss, A. Fedorov, C. Hutter, A. Shnirman, and G.
Schön, cond-mat/0701041.
[9] J. P. Pekola, F. Giazotto, and O. P. Saira, Phys. Rev.
Lett. 98, 037201 (2007).
[10] H.T. Quan, Y.X. Liu, C. P. Sun, and Franco Nori,
quant-ph/0611275.
[11] Yu. Makhlin, G. Schön, and A. Shnirman, in New Di-
rections in Mesoscopic Physics (Towards Nanoscience),
edited by R. Fazio, V. F. Gantmakher, and Y. Imry
(Kluwer, 2003), p. 197; cond-mat/0309049.
[12] M. Sillanpää, T. Lehtinen, A. Paila, Yu. Makhlin, and P.
Hakonen, Phys. Rev. Lett. 96, 187002 (2006).
[13] W. D. Oliver, Y. Yu, J. C. Lee, K. K. Berggren, L. S.
Levitov, and T. P. Orlando, Science 310, 1653 (2005).
[14] D. R. Schmidt, R. J. Schoelkopf, and A. N. Cleland,
Phys. Rev. Lett. 93, 045901 (2004).
[15] A. O. Niskanen, K. Harrabi, F. Yoshihara, Y. Nakamura,
and J. S. Tsai, Phys. Rev. B 74, 220503(R) (2006).
http://arxiv.org/abs/cond-mat/0701041
http://arxiv.org/abs/quant-ph/0611275
http://arxiv.org/abs/cond-mat/0309049
ABSTRACT
  We consider a design for a cyclic microrefrigerator using a superconducting
flux qubit. Adiabatic modulation of the flux combined with thermalization can
be used to transfer energy from a lower temperature normal metal thin film
resistor to another one at higher temperature. The frequency selectivity of
photonic heat conduction is achieved by including the hot resistor as part of a
high frequency LC resonator and the cold one as part of a low-frequency
oscillator while keeping both circuits in the underdamped regime. We discuss
the performance of the device in an experimentally realistic setting. This
device illustrates the complementarity of information and thermodynamic entropy
as the erasure of the quantum bit directly relates to the cooling of the
resistor.

<|endoftext|><|startoftext|>
Introduction
Presented here is a new technique for analyzing skew polynomial rings satisfying a poly-
nomial identity with an eye toward discovering their PI degrees. It combines and extends
the methods of Jøndrup [21] and Cauchon [5], who introduced techniques of “deleting
derivations” in skew polynomial rings, by means of which they showed that some proper-
ties of certain types of iterated skew polynomial ring A = k[x1][x2; τ2, δ2] · · · [xn; τn, δn]
are determined by the corresponding ring A′ = k[x1][x2; τ2] · · · [xn; τn]. Jøndrup’s re-
sults imply that A and A′ have the same PI degree under certain hypotheses, including
characteristic zero for the base field. Cauchon developed an algorithm that gives an
isomorphism between certain localizations of A and A′, but this requires a qi-skew
condition on each (τi, δi) with qi not a root of unity, which usually precludes A from
satisfying a polynomial identity. We relax the restrictions placed on the base field and
its chosen scalars by Jøndrup and Cauchon, respectively, by introducing the notion of
a higher q-skew τ -derivation.
If we “twist” the multiplication in the (commutative) coordinate ring of affine, symplec-
tic, or Euclidean n-space over a field k, we get a (noncommutative) quantized coordinate
ring which has the structure of an iterated skew polynomial ring with coefficients in k.
This structure is also exhibited in the quantized Weyl algebras and in the quantized
coordinate ring of n×n matrices over k. Letting A represent one of these k-algebras, the
quantum Gel’fand-Kirillov conjecture asserts that FractA is isomorphic to the quotient
division ring of a quantum affine space over a purely transcendental extension of k. For
1991 Mathematics Subject Classification. 16R99; 16S36; 81R50; 16P40.
Key words and phrases. noncommutative rings; skew polynomial rings; quantum algebras.
This research will form a part of the author’s PhD dissertation at the University of California at
Santa Barbara.
http://arxiv.org/abs/0704.0846v1
2 HEIDI HAYNAL
more information on the quantum Gel’fand-Kirillov conjecture and proofs of conditions
under which the result holds, see [1] [7] [23] [28] [32] [33]. We will confirm some of these
cases in a new way.
The first section sets up the conventions under which we work, including definitions
and an established result concerning the PI degree of quantum affine space. We assume
that the reader has some familiarity with the subject, so we do not give an exhaustive
collection of definitions. A comprehensive discussion of any unfamiliar terms can be
found in [16] [4] and [27]. In the second section we define higher τ -derivations and
give necessary and sufficient conditions for their existence. Of particular interest are
higher τ -derivations which satisfy a q-skew relation. In the third section we present
a structure theorem for a localization of q-skew polynomial rings. This extends the
work of Cauchon [5], and the calculations are simplified by the presence of higher q-
skew τ -derivations. In the fourth section we deal with the structure of iterated skew
polynomial rings. Sometimes it is advantageous to rearrange the order in which the
indeterminates appear, so we establish a sufficient condition that allows such reordering.
The main theorem there asserts that if A is an iterated q-skew polynomial ring with
certain higher τ -derivations, then there is a finitely generated Ore set T ⊆ A such that
AT−1 is isomorphic to a localization of a much “nicer” iterated skew polynomial ring. In
the fifth section, we use the tools developed in the previous sections to confirm certain
cases of the quantum Gel’fand-Kirillov conjecture and to find the PI degree of some
quantized coordinate rings and quantized Weyl algebras. In the last section, we follow
up with a structure theorem for completely prime factors of iterated skew polynomial
rings. We also present an open question which, if answered positively, would show that
the quantum Gel’fand-Kirillov conjecture holds for certain of the prime factor algebras
we study.
Throughout, k will denote a field of arbitrary characteristic, q ∈ k a nonzero ele-
ment. The following assumptions apply to all skew polynomial rings that we will con-
sider:
• all coefficient rings are k-algebras
• all automorphisms are k-algebra automorphisms
• all skew derivations are k-linear
• in all skew polynomial rings R[x; τ, δ], τ is an automorphism, not just an endo-
morphism.
To say that R[x; τ, δ] is a q-skew polynomial ring means that the auomorphism and skew
derivation satisfy the relation δτ = qτδ. The reader will note that this is opposite to
Cauchon’s conventions, but it matches the presentation in [10] and others. To say that
δ is locally nilpotent means that for every r ∈ R there is an integer nr ≥ 0 such that
δnr(r) = 0, and δp(r) 6= 0 for p < nr. Such nr is called the δ-nilpotence index of r. The
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 3
symbol N refers to the set of positive integers. For a real number m we use the notation
⌊m⌋ in section five to indicate the integer part of m.
Definition 1.1. We say that two rings R and S exhibit PI degree parity when these
two conditions are satisfied:
(1) R is a PI ring if and only if S is a PI ring,
(2) PIdegR = PIdegS.
For a field k and multiplicatively antisymmetric λ ∈ Mn(k), the corresponding mul-
tiparameter quantum affine space is the k-algebra Oλ(kn) with generators x1, . . . , xn
and relations xixj = λijxjxi for all i, j. The corresponding multiparameter quantum
torus is the k-algebra Oλ((k×)n) given by generators x±11 , . . . , x±1n and the same rela-
tions. The multiplicative set generated by x1, . . . , xn in Oλ(kn) is a denominator set,
and Oλ((k×)n) is a localization of Oλ(kn) with respect to this set.
In this paper we’ll show that iterated skew polynomial algebras covering a large class
of standard examples have PI degree parity with Oλ(kn) for an appropriately chosen
λ. To find out what that PI degree may be, we utilize a result of De Concini and
Procesi. In [8, Proposition 7.1], they establish the following formula for calculating the
PI degree of a quantum affine space Oλ(kn). Their assumption of characteristic zero
from [8, Section 4] is not used in this result.
Theorem 1.2. [De Concini - Procesi] Let λ = (λij) be a multiplicatively antisymmetric
n× n matrix over k.
(1) The quantum affine space Oλ(kn) is a PI ring if and only if all the λij are roots of
unity. In this case, there exist a primitive root of unity q ∈ k× and integers aij such
that λij = q
aij for all i, j.
(2) Suppose λij = q
aij for all i, j, where q ∈ k is a primitive ℓth root of unity and the
aij ∈ Z. Let h be the cardinality of the image of the homomorphism
n (aij)−−−−−−→ Zn π−−−−→ (Z/ℓZ)n
where π denotes the canonical epimorphism. Then PI-deg (Oλ(kn)) =
2. Higher q-Skew τ-Derivations
Before the featured definition, a brief discussion of a tool used to study q-skew polyno-
mial rings is needed. Having the q-skew relation δτ = qτδ in place allows us to group
terms of the same degree when we do skew polynomial arithmetic. The means to do
this are provided by the q-Liebnitz rules.
4 HEIDI HAYNAL
Definition 2.1. For an indeterminate t, and integers n ≥ m ≥ 0, we define the following
polynomial functions:
(m)t = t
m−1 + tm−2 + · · ·+ t + 1 (1)
(m)!t = (m)t(m− 1)t · · · (1)t, and (0)!t = 1 (2)(
(n)!t
(m)!t(n−m)!t
The expressions
are called the t-binomial coefficients, or Gaussian polynomials.
The t-binomial coefficients have properties similar to those of the regular binomial
coefficients. Two that will be useful for this work are:(
= 1 for all n ≥ 0 (4)
+ tn−m
for all 0 < m < n
Proofs for these identities may be found in combinatorics texts such as [39]. When we
evaluate the t-binomial coefficients at t = q, we obtain the q-binomial coefficients that
we need for studying q-skew polynomial rings.
As shown in [10, Section 6], the following q-Liebnitz rules hold for any q-skew polynomial
ring R[x; τ, δ]:
δn(rs) =
τn−iδi(r)δn−i(s) for all r, s ∈ R and n = 0, 1, 2, ...
xnr =
τn−iδi(r)xn−i for all r ∈ R and n = 0, 1, 2, ...
Now, taking a cue from the study of Schmidt differential operator rings, for instance
[25], we define a sequence of k-linear maps that allows us to broaden the class of rings
for which we may derive results like those of Jøndrup and Cauchon.
Definition 2.2. A higher q-skew τ -derivation (h.q-s.τ -d.) on a k-algebra R is a sequence
d0, d1, d2, . . . of k-linear operators on R such that
d0 is the identity
dn(rs) =
τn−idi(r)dn−i(s) for all r, s ∈ R and all n
diτ = q
iτdi for all i.
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 5
If a sequence of k-linear maps satisfies the first two conditions, we refer to it as a higher
τ -derivation. We abbreviate the sequence {di}∞i=0 usually as just {di}. A h.q-s.τ -d is
locally nilpotent if for all r ∈ R, there exists an integer n ≥ 0 such that di(r) = 0 for all
i ≥ n, and dp(r) 6= 0 for p < n. In this case, n is called the d-nilpotence index of r. A
h.q-s.τ -d is iterative if didj =
di+j for all i, j. This implies that the di commute
with each other. A q-skew τ -derivation δ on R extends to a h.q-s.τ -d. if there is a
h.q-s.τ -d {di} on R with d1 = δ.
For example, consider the k-algebra with two generators x and y, and one relation
xy − qyx = 1, where q ∈ k×. We’ll assume that q 6= 1 and recognize this algebra as
a q-skew polynomial ring k[y][x; τ, δ] with τ(y) = qy and δ(y) = 1, commonly known
as a quantized Weyl algebra and denoted A
1(k). If q is not a root of unity, then the
(i)!q
comprise an iterative higher q-skew τ -derivation that extends δ on k[y]. The prop-
erties of a higher q-skew τ -derivation follow directly from the fact that δ is a q-skew
τ -derivation and the first q-Liebnitz rule. This particular h.q-s.τ -d. is also locally
nilpotent because
yn−i when i ≤ n,
0 when i > n.
Proposition 2.3. Let {di} be a sequence of k-linear maps on a k-algebra R with
d0 = idR, and let R[[x; τ
−1]] be the skew power series ring where τ is a k-linear automor-
phism of R, the coefficients are written on the right of the variable x, and rx = xτ(r)
for all r ∈ R.
(a) Then {di} is a higher τ -derivation on R if and only if the map Ψ : R → R[[x; τ−1]]
given by r 7→
i=0 x
idi(r) is a ring homomorphism.
(b) Extend τ to an automorphism of R[[x; τ−1]] such that τ(x) = xq. Assume that
{di} is a higher τ -derivation. Then the sequence {di} is a h.q-s.τ -d. if and only if this
diagram is commutative:
R[[x; τ−1]]
// R[[x; τ−1]]
6 HEIDI HAYNAL
Proof. (a) Suppose {di} is a higher τ -derivation on R. Consider any r, s ∈ R. It is clear
that Ψ is additive and Ψ(1) = 1. Applying the definition 2.2 gives
Ψ(rs) =
xidi(rs) =
τ i−mdm(r)di−m(s)
Power series multiplication, with rx = xτ(r), gives
Ψ(r)Ψ(s) =
xidi(r)
)( ∞∑
xidi(s)
τ i−mdm(r)di−m(s)
So Ψ preserves products. Therefore, Ψ is a ring homomorphism.
To demonstrate the other implication, suppose Ψ is a ring homomorphism. Then
Ψ(r)Ψ(s) = Ψ(rs) implies that dn(rs) =
i=0 τ
n−idi(r)dn−i(s) for all r, s ∈ R. There-
fore, {di} is a higher τ -derivation.
(b) Suppose that {di} is a h.q-s.τ -d. Then the relations diτ = qiτdi imply that
τΨ(r) =
i=0 x
iqiτdi(r) =
i=0 x
idi(τ(r)) = Ψτ(r), for all r ∈ R.
Now if the diagram is commutative, then comparing the coefficients of
τΨ(r) =
i=0 x
iqiτdi(r) and Ψτ(r) =
i=0 x
idi(τ(r)) for all r ∈ R yields that
diτ = q
iτdi. �
Remark 2.4. If {di} is locally nilpotent on R, we observe that claims analogous to the
proposition can be made for the map Ψ : R → R[x; τ−1].
Proposition 2.5. Let {di} be a h.q-s.τ -d. on a k-algebra R, where τ is an automor-
phism, and let S be a right denominator set in R with τ(S) = S. Then {di} can be
uniquely extended to a h.q-s.τ -d. on RS−1.
Proof. It has been established that τ and d1 extend uniquely to RS
−1 by
τ(rs−1) = τ(r)τ(s)−1 and d1(rs
−1) = d1(r)s
−1 − τ(rs−1)d1(s)s−1 in [10, Lemma 1.3].
Suppose that {di} extends to a h.q-s.τ -d. on RS−1. For r ∈ R and s ∈ S, we apply dn
to the equation r1−1 = (rs−1)(s1−1) to get
dn(r)1
−1 = dn
(rs−1)(s1−1)
τn−jdj(rs
−1)dn−j(s1
= τn(rs−1)dn(s)1
−1 + · · ·+ dn(rs−1)s1−1.
This implies that
dn(rs
−1) =
dn(r)−
τn−jdj(rs
−1)dn−j(s)
So we have uniqueness in case of existence.
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 7
To show existence, let Ψ : R → R[[x; τ−1]] be the map defined in Proposition 2.3, and
let φ : R[[x; τ−1]] → RS−1[[x; τ−1]] be the natural map. Consider the composite map
Φ = φΨ : R → RS−1[[x; τ−1]]. For any s ∈ S, the constant term of Φ(s) is a unit.
So we may inductively solve for the coefficients of an inverse for Φ(s) in RS−1[[x; τ−1]].
Details, as in [37, 1.2], are left to the reader. Hence, Φ extends to a ring homomorphism
Φ′ : RS−1 → RS−1[[x; τ−1]] such that Φ′(rs−1) = Φ(r)Φ(s)−1, and we consider the
diagram:
RS−1[[x; τ−1]]
// RS−1[[x; τ−1]]
// RS−1
where τ has been extended to an automorphism of RS−1[[x; τ−1]] as in Proposition 2.3.
Since Φ(r) =
i=0 x
idi(r)1
−1, and {di} is a h.q-s.τ -d. on R, we have
τΦ(r) =
xiqiτdi(r)1
1−1 = Φτ(r)
for all r ∈ R. It follows directly that τΦ′(rs−1) = Φ′τ(rs−1). So, indeed, the diagram is
commutative.
Define a sequence {di} on RS−1 such that di(t) equals the coefficient of xi in Φ′(t) for
all t ∈ RS−1. Then by Proposition 2.3 we conclude that this sequence is a h.q-s.τ -d. on
RS−1 extending {di} on R. �
Lemma 2.6. Let A be a k-algebra, B ⊆ A a k-subalgebra generated by {b1, b2, . . . }, τ
a k-linear automorphism of A, and {di} a higher τ -derivation on A. If di(bj) ∈ B and
τ(bj) ∈ B, for all i, j ∈ N, then di(B) ⊆ B for all i.
Proof. First, observe that τ(bj) ∈ B for all j implies that τ(B) ⊆ B. Since the di
are k-linear maps, it suffices to check monomials in the bj , using induction on their
length. Suppose, inductively, that for integers m ≥ 1 and 1 ≤ ℓ ≤ m − 1, we have
di(bj1 · · · bjℓ) ∈ B for all i and all j1, . . . , jℓ. Then using the product rule for h.q-s.τ -d.
gives
dn(bj1 · · · bjm) =
τn−1di(bj1 · · · bjm−1)dn−i(bjm) ∈ B
for all n and all j1, . . . , jm, by the induction hypothesis. �
8 HEIDI HAYNAL
Lemma 2.7. Let A be a k-algebra with a set {xj} of generators, τ an automorphism of
A, and {di} a h.q-s.τ -d. on A. If {di} is locally nilpotent for all xj, then {di} is locally
nilpotent on A.
Proof. It suffices to check monomials in the xj because the di are k-linear maps. We
proceed by using induction on the length of such monomials. For a given xn, let i(n)
be its nilpotence index, so di(xn) = 0 for all i ≥ i(n).
Suppose inductively that for n ≥ 2, all integers ℓ with 1 ≤ ℓ ≤ n− 1, and all choices
of j1, . . . , jℓ, there exists an integer m such that di(xj1 · · ·xjl) = 0 for all i ≥ m. For
instance, m = i(j1)+ · · ·+ i(jℓ) will suffice, although the d-nilpotence index of xj1 · · ·xjℓ
may be less than this sum. Then, for p ≥ m+ i(jn), we have
dp(xj1 · · ·xjn) =
τ p−idi(xj1 · · ·xjn−1)dp−i(xjn) = 0,
completing the induction. �
Consider again the quantized Weyl algebra A
1(k). In case q is an ℓ-th root of unity,
the dℓ given in (7) would be undefined due to the occurrence of a zero denominator.
However, realizing A
1(k) as a factor of a quantized Weyl algebra over k[t
±1] allows us to
define a h.q-s.τ -d. on A
1(k) nonetheless. The k[t
±1]-algebra At1(k[t
±1]) has generators
x and y and one relation xy− tyx = 1. This is a t-skew polynomial ring k[t±1][y][x; τ̄ , δ̄]
where τ̄ (y) = ty, τ̄(t) = t, δ̄(y) = 1, and δ̄(t) = 0. Note that
δ̄i(yn) =
(n)!t
(n−i)!t
yn−i when i ≤ n
0 when i > n
implying that δ̄i
k[t±1][y]
⊆ (i)!tk[t±1][y]. So the assignment
d̄i =
(i)!t
defines an iterative, locally nilpotent h.t-s.τ̄ -d. {d̄i} on k[t±1][y]. Now, the relation
xy − tyx = 1 is equivalent to the relation xy − qyx = 1 modulo 〈t − q〉. Hence we
k[t±1]
/〈t− q〉 ∼= Aq1(k).
When q is an ℓth root of unity, we have δ̄ℓ
k[t±1][y]
⊆ 〈t−q〉k[t±1][y]. Nonetheless, the
h.t-s.τ̄ -d. {d̄i} on k[t±1][y] induces a h.q-s.τ -d. {di} on k[y], also iterative and locally
nilpotent, with d1 = δ. Note that even though δ
ℓ = 0 in this algebra, we have di(y
i) = 1
for all i.
This phenomenon is not unique to the quantized Weyl algebras. The conditions that
drive it are codified in the following theorem.
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 9
Theorem 2.8. Let R be a k-algebra and R[x; τ, δ] a q-skew polynomial ring where
q ∈ k, q 6= 1. Suppose there exists a torsion-free k[t±1]-algebra R and R[x; τ̄ , δ̄] a t-skew
polynomial ring such that R/〈t− q〉R ∼= R, with τ̄ and δ̄ reducing to τ and δ. Suppose
further that δ̄i(R) ⊆ (i)!tR for all i. Then δ extends to an iterative h.q-s.τ -d. {di} on
R. If δ̄ is locally nilpotent, then so is {di}. If q is not a root of unity, then di = δ
(i)!q
all i. If q is a primitive ℓth root of unity, then di =
(i)!q
for i < ℓ.
Proof. The assumption δ̄i(R) ⊆ (i)!tR for all i implies that the sequence of maps
d̄i =
(i)!t
make up a well-defined iterative h.t-s.τ̄ -d. on R, and also implies that
δ̄ℓ(R) ⊆ 〈t − q〉R because (ℓ)t ≡ (ℓ)q = 0 modulo 〈t − q〉. Since τ̄ and δ̄ reduce to
τ and δ modulo 〈t− q〉, we have an isomorphism R/〈t− q〉[x; τ̄ , δ̄] ∼= R[x; τ, δ] whereby
{d̄i} induces an iterative h.q-s.τ -d. {di} on R. The reduction of the maps from R to R
also implies the remaining results. �
We will find that all of the conditions assumed above are satisfied by the common
quantized coordinate rings and related examples, which will be discussed in a subsequent
section.
3. The τ-Derivation Removing Homomorphism
Following the pattern in [5], let A = R[x; τ, δ], and suppose that δ is locally nilpotent.
Set S = {xn | n ∈ N ∪ {0}} ⊂ A.
Lemma 3.1. The set S is a denominator set in A.
Proof. Clearly, S is a multiplicative set inA. And, since S contains only regular elements
of A, it is left and right reversible. It remains to show that S is an Ore set.
Let a =
i=0 rix
i be an element of A with rn 6= 0. For each ri in the expression of a,
and each mi ≥ 0, we have
xmiri =
τmi−jδj(ri)x
= a′ix+ δ
mi(ri) for some a
i ∈ A.
Since δ is locally nilpotent, we may choose mi to be the δ-nilpotence index of ri to
conclude that xmiri = a
ix for some a
i ∈ A. Set ma = max{mi | 0 ≤ i ≤ n}. Then for
each ri, we have x
mari = ãix, and hence x
maa = ãx for some ã ∈ A.
Now suppose, inductively, that for a given a ∈ A and xp ∈ S we can find elements
xma ∈ S and ā ∈ A such that xmaa = āxp, say ā =
i=0 r̄ix
i. We know that there
10 HEIDI HAYNAL
exists an element xmā such that xmā ā = a′x for some a′ ∈ A. So, xmaa = āxp implies
xmā+maa = a′xp+1, completing the induction.
Hence, for any a ∈ A and s ∈ S, we have Sa∩As 6= ∅. So S is a left Ore set in A. We see
that S is a right Ore set by applying the same argument to Aop = Rop[x; τ−1,−δτ−1]. �
Suppose also that the derivation δ extends to an iterative, locally nilpotent higher q-
skew τ -derivation {di} on R and that q 6= 1. Denote Â = AS−1 = S−1A, the localization
of A with respect to S, and define a map f : R −→ Â by
f(r) =
n(n+1)
2 (q − 1)−ndnτ−n(r)x−n,
noting that {di} is locally nilpotent and that q − 1 is invertible. If q is not a root of
unity and {di} is obtained from a q-skew τ -derivation δ as in (7), the formula for f can
be rewritten as
f(r) =
n(n+1)
(q − 1)−n
(n)!q
δnτ−n(r)x−n.
The rewritten formula matches the one presented in [5, Section 2] when q is replaced by
q−1 to account for the difference between δτ = qτδ (used here) and τδ = qδτ (used in
[5]). We will show that f is a homomorphism and that the the multiplication in imf is
made simpler than that in A by removing the derivation, as seen in the following.
Proposition 3.2. If r ∈ R, then xf(r) = f
x in Â.
Proof. Using the hypothesis that {di} is iterative, we compute that
xf(r) =
n(n+1)
2 (q − 1)−nxdnτ−n(r)x−n
n(n+1)
2 (q − 1)−n
−n(r)x+ d1dnτ
−n(r)
n(n+1)
2 (q − 1)−nq−ndnτ−n+1(r)x−n+1
n(n+1)
2 (q − 1)−n(n+ 1)qdn+1τ−n(r)x−n
n(n+1)
2 (q − 1)−nq−ndnτ−n(τ(r))x−n+1
n(n−1)
2 (q − 1)−n+1(n)qdnτ−n(τ(r))x−n+1
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 11
= τ(r)x
n(n+1)
2 (q − 1)−nq−n + q
n(n−1)
2 (q − 1)−n+1(n)q
−n(τ(r))x−n+1
= τ(r)x
(q − 1)−n
2 + q
2 (qn − 1)
−n(τ(r))x−n+1
= τ(r)x+
(q − 1)−nq
n(n+1)
2 dnτ
−n(τ(r))x−n+1
n(n+1)
2 (q − 1)−ndnτ−n(τ(r))x−n
x = f
which gives the result. �
From Proposition 3.2, it follows by routine induction that
xmf(r) = f
τm(r)
xm ∀m ∈ Z. (8)
This is what we need in order to show that our map is indeed a k-algebra homomor-
phism.
Proposition 3.3. The map f : R −→ Â is a k-algebra homomorphism.
Proof. It is immediate that f is k-linear (τ and {di} are k-linear), and that f(1) = 1.
We’ll show that f is multiplicative. If r, s ∈ R, then using Prop. 3.2,
f(r)f(s) =
i(i+1)
2 (q − 1)−idiτ−i(r)x−if(s)
i(i+1)
2 (q − 1)−idiτ−i(r)f(τ−i(s))x−i
i≥0, j≥0
i(i+1)+j(j+1)
2 (q − 1)−(i+j)diτ−i(r)djτ−(i+j)(s)x−(i+j).
For n ∈ N, the coefficient of x−n in the sum above is
i≥0, j≥0,
i+j=n
i(i+1)+j(j+1)
2 (q − 1)−ndiτ−i(r)djτ−n(s)
(n−p)2+p2+n
2 (q − 1)−ndn−pτ p−n(r)dpτ−n(s)
12 HEIDI HAYNAL
n(n+1)
2 (q − 1)−n
qp(p−n)dn−pτ
pτ−n(r)dpτ
−n(s)
n(n+1)
2 (q − 1)−n
τ pdn−p(τ
−n(r))dp(τ
−n(s))
n(n+1)
2 (q − 1)−ndn(τ−n(r)τ−n(s))
n(n+1)
2 (q − 1)−ndnτ−n(rs),
computed by putting p = j and using the second condition in the Definition 2.2. In
summary, f(r)f(s) =
n=0 q
n(n+1)
2 (q − 1)−ndnτ−n(rs)x−n = f(rs). �
Proposition 3.4. (1) The map f extends uniquely to an algebra homomorphism, also
denoted f , of R[y; τ ] to Â satisfying f(y)=x.
(2) The extended homomorphism is injective.
Proof. (1) This result follows from Proposition 3.2 and the universal property of Ore
extensions.
(2) Let P = pmy
m + · · ·+ p1y + p0 be a nonzero element of R[y; τ ], where each pi ∈ R,
m ≥ 0, pm 6= 0. Then f(P ) = f(pm)xm + · · ·+ f(p1)x+ f(p0). Since
f(pi) =
n(n+1)
2 (q − 1)−ndnτ−n(pi)x−n ∈ AS−1,
we know that there exists an integer l ≥ 0 such that each f(pi)xl is a nonzero element
of A of positive degree l (in x) whenever pi 6= 0. (Because {di} is locally nilpotent,
we may choose an l large enough.) It follows that f(P )xl is a nonzero element of Â of
degree m+ l, hence f(P ) 6= 0. �
Definition 3.5. The algebra homomorphism f : R[y; τ ] −→ Â = AS−1 is called the
derivation removing homomorphism. The image of f , call it A′, is the subalgebra of
Â = AS−1 generated by x and f(R), and is isomorphic (as an algebra) to R[y; τ ] by the
derivation removing homomorphism f .
Observe that A′ contains the multiplicative system S = {xn | n ∈ N ∪ {0}}. Since
equation (8) holds and f(y) = x, the elements of this set are normal in A′. Hence,
S satisfies the (two-sided) Ore condition in A′. The elements of S are regular in A′
because they are regular in Â, and thus:
Proposition 3.6. A′S−1 = AS−1
Proof. We have A′S−1 ⊆ AS−1 because A′ = im(f) ⊆ AS−1. To show the other
inclusion, it suffices to show that R ⊆ A′S−1. (This suffices because A is built up from
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 13
R by x, x2, . . . . So if R ⊆ A′S−1, then AS−1 ⊆ A′S−1.) Consider any r ∈ R and let ℓ
be the d-nilpotence index of r. We show that r ∈ A′S−1 with an induction argument
on ℓ.
If ℓ ≤ 1, then d1(r) = 0, whence f(r) = r ∈ A′ ⊆ AS−1.
If ℓ ≥ 2, we write
f(r) = r +
−n, with rn = q
n(n+1)
2 (q − 1)−ndnτ−n(r) ∈ R.
We’ll show that
n=1 rnx
−n ∈ A′S−1 in order to conclude that r ∈ A′S−1, because
f(r) −
n=1 rnx
−n = r. That is, we need to show that each rn ∈ A′S−1. Suppose,
inductively, that for any element r̃ ∈ R with d-nilpotence index m such that m < ℓ, we
have r̃ ∈ A′S−1.
Note that for n ∈ {1, . . . , ℓ}, we have
dℓ−n(rn) = q
n(n+1)
2 (q − 1)−n
−n(r) = q
n(n+1)
2 (q − 1)−n
q−nℓτ−ndℓ(r) = 0
because dℓ(r) = 0 by hypothesis.
Hence, by the induction hypothesis, each rn ∈ A′S−1 for 1 ≤ n ≤ ℓ− 1. It follows that
r = f(r)−
n=1 rnx
−n also belongs to A′S−1. �
This equality of quotient rings reveals that if A is a PI ring, then
PIdegA = PIdegA′ = PIdegR[y; τ ],
with the second equality arising from the derivation removing homomorphism f . This
recovers the result of Jøndrup [21] without the assumption that k has characteristic
zero. We summarize the results of this section in the following theorem.
Theorem 3.7. Let k be a field, R a k-algebra and A = R[x; τ, δ] a q-skew polynomial
ring in which δ extends to a locally nilpotent, iterative h.q-s.τ -d. {di} on R for some
q ∈ k×, q 6= 1. Let S be the Ore set in A generated by x, and define a map
f : R −→ AS−1 by f(r) =
n=0 q
n(n+1)
2 (q − 1)−ndnτ−n(r)x−n. Then f is a k-algebra
homomorphism, and it extends to an injective homomorphism f : R[y; τ ] −→ AS−1
sending y to x. Furthermore, the extension f : R[y±1; τ ] −→ AS−1 is an isomorphism.
So there is PI degree parity between A and R[y; τ ]. Moreover, if R is a noetherian
domain, then FractA ∼= FractR[y; τ ].
14 HEIDI HAYNAL
4. Main Theorem
In the case where A is an iterated skew polynomial ring, we would like to apply re-
peatedly the method presented above to remove all of the derivations and compare the
resulting Ore localizations. We must first establish some facts about the behavior of
h.q-s.τ -d. when the variables adjoined to the coefficient ring are rearranged, and about
iterated localization. The results of these lemmas will ensure that after the induction
step in the proof of the main theorem we are left with a ring to which the method of
the preceding section applies.
The first parts of the following lemmas hold in a broader class of skew polynomial rings
and also when the q-skew condition is imposed. The final parts assert that h.q-s.τ -d.
are preserved when rearranging of the variables is permissible.
Lemma 4.1. Let S = R[x; τ, δ], A = R[x; τ, δ][y; σ], and Â = R[x; τ, δ][y±1; σ], where
σ(R) = R and σ(x) = λx for some λ ∈ k×.
(1) Then A = R[y; σ′][x; τ ′; δ′], and Â = R[y±1; σ′][x; τ ′; δ′], where σ′ = σ
, τ ′
= τ ,
= δ, τ ′(y) = λ−1y, and δ′(y) = 0.
(2) If (τ, δ) is q-skew, then so is (τ ′, δ′).
(3) Suppose further that δ extends to a h.q-s.τ -d. {di} on R, and that σdi = λidiσ for
all i. Then the τ ′-derivation δ′ extends to a h.q-s.τ ′-d. {d′i} on R[y±1; σ′] such that the
restrictions of the d′i to R coincide with di, and d
i(y) = 0 for all i ≥ 1. Moreover, {d′i}
restricts to a h.q-s.τ ′-d. on R[y; σ′].
(a) If {di} is iterative, then {d′i} is iterative.
(b) If {di} is locally nilpotent, then {d′i} is locally nilpotent.
Proof. (1) Routine details omitted so as not to try the patience of the reader.
(2) Suppose that (τ, δ) is q-skew on R. We’ll check that the two τ ′-derivations τ ′−1δ′τ ′
and qδ′ agree on R[y±1; σ′]. It suffices to check their agreement on a set of generators,
R ∪ {y, y−1}. It is clear that τ ′−1δ′τ ′(r) = qδ′(r) for all r ∈ R. Since δ′(y) = 0, they
agree on {y, y−1} as well. So (τ ′, δ′) is q-skew.
(3) Define a sequence of maps d′i : R[y
±1; σ′] → R[y±1; σ′] by
di(rj)y
Clearly these are k-linear maps, d′i(r) = di(r) for all r ∈ R; also d′i(y) = di(1)y = 0 for
i ≥ 1, and d′0 is the identity on R[y±1; σ′].
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 15
Because δ extends to {di} on R, we get
d1(rj)y
δ(rj)y
j = δ′
for all rj ∈ R. So d′1 = δ′ on R[y±1; σ′].
Now, for integers j,m, n, and elements r, s ∈ R,
(ryj)(sym)
= d′n
rσj(s)yj+m
rσj(s)
τn−idi(r)dn−i(σ
j(s))yj+m
τn−idi(r)y
jσ−jdn−i(σ
j(s))ym
τn−idi(r)y
jλ−j(n−i)dn−i(s)y
(τ ′)n−i
di(r)y
d′n−i(sy
(τ ′)n−id′i(ry
j)d′n−i(sy
So {d′i} satisfies the product rule for a higher τ -derivation on R[y±1; σ′].
Furthermore,
τ ′d′i
= τ ′
di(rj)y
τdi(rj)λ
−jyj,
and d′iτ
= d′i
τ(rj)λ
diτ(rj)λ
τdi(rj)λ
−jyj,
giving the q-skew relation d′iτ
′ = qiτ ′d′i on R[y
±1; σ′].
It follows directly from the definition of the maps {di} that their restrictions to the
k-subalgebra R[y; σ′] also exhibit the properties of definition 2.2.
16 HEIDI HAYNAL
If {di} is iterative on R, then d′ℓd′i(rym) = d′ℓ
di(r)y
= dℓdi(r)y
dℓ+i(r)y
d′ℓ+i(ry
m) for all r ∈ R, m ∈ Z, and non-negative integers ℓ, i. Hence, {d′i} is
iterative on R[y±1; σ′].
Suppose that {di} is locally nilpotent on R. By Lemma 2.7 we need only check that
{d′i} is locally nilpotent on R ∪ {y, y−1}, a set of generators for R[y±1; σ′]. This is clear
because d′i(r) = di(r) for all r ∈ R, and d′i(y) = 0 for all i by construction. �
Lemma 4.2. Let
A = R[x1; τ1, δ1][x2; τ2, δ2] · · · [xn; τn, δn][y; σ],
Â = R[x1; τ1, δ1][x2; τ2, δ2] · · · [xn; τn, δn][y±1; σ],
where σ(R) = R, and for all i ∈ {1, . . . , n}, σ(xi) = λixi for some nonzero λi ∈ k. Let
Aj = R[x1; τ1; δ1][x2; τ2, δ2] · · · [xj ; τj, δj ] for j = 1, 2, . . . , n, and A0 = R.
(1) Then
A = R[y; σ∗][x1; τ
1][x2; τ
2] · · · [xn; τ ′n, δ′n],
Â = R[y±1; σ∗][x1; τ
1][x2; τ
2] · · · [xn; τ ′n, δ′n],
where σ∗ = σ
, τ ′i
= τi, δ
= δi, τ
i(y) = λ
i y, and δ
i(y) = 0 for all 1 ≤ i ≤ n
and j ≤ i− 1.
(2) If (τi, δi) is qi-skew for any 1 ≤ i ≤ n, then (τ ′i , δ′i) is also qi-skew.
(3) Suppose that each δi extends to an h.qi-s.τi-d. {di,p}∞p=0, and that σdi,p = λ
i di,pσ
on Ai−1 for all i and p. Then each δ
i extends to a h.qi-s.τ
i -d. {d′i,p}∞p=0 on the algebra
R〈y, y−1, x1, . . . , xi−1〉, where d′i,p coincides with di,p on Aj, for j < i, and d′i,p(y) = 0
for p ≥ 1. Moreover, {d′i,p} restricts to a h.qi-s.τ ′i -d. on R〈y, x1, . . . , xi−1〉.
(a) If {di,p} is iterative for any 1 ≤ i ≤ n, then {d′i,p} is iterative.
(b) If {di,p} is locally nilpotent for any 1 ≤ i ≤ n, then {d′i,p} is locally nilpotent.
Proof. (1) The condition σ(xi) = λixi for all i implies that σ(Ai) = Ai. We will use
induction on n to prove the result.
Lemma 4.1 proves the case n = 1. Suppose the result holds for all m < n, and consider
A = An−1[xn; τn, δn][y; σ]. Application of Lemma 4.1, and then the induction hypothesis,
gives
A = An−1[xn; τn, δn][y; σ]
= An−1[y; σ
′][xn; τ
= R[x1; τ1, δ1] · · · [xn−1; τn−1, δn−1][y; σ′][xn; τ ′n, δ′n]
= R[y; σ∗][x1; τ
1] · · · [xn; τ ′n, δ′n],
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 17
with the desired conditions met by the automorphisms and derivations, completing the
induction. Similarly, Â = R[y±1; σ∗][x1; τ
1] · · · [xn; τ ′n, δ′n].
(2) Consider the two τ ′i -derivations τ
i and qiδ
i on the ring
R[y±1; σ∗][x1; τ
1] · · · [xi−1; τ ′i−1, δ′i−1]
for 1 ≤ i ≤ n. Since (τi, δi) is q-skew, it is clear that these two τ ′i derivations agree on
Ai−1. And since δ
i(y) = 0 for all i = 1, . . . , n, these two τ
i -derivations agree on a full
set of generators of R[y±1; σ∗][x1; τ
1] · · · [xi−1; τ ′i−1, δ′i−1]. Hence, δ′iτ ′i = qiτ ′iδ′i.
(3) Suppose the result holds for the algebra R[x1; τ1, δ1] · · · [xn−1; τn−1, δn−1][y±1; σ].
Then Lemma 4.1 may be applied, with An−1 providing the coefficients, to get
An−1[xn; τn, δn][y
±1; σ] = An−1[y
±1; σ′][xn; τ
where δ′n extends to a h.qn-s.τ
n-d. {d′n,p} on An−1[y±1]. The induction hypothesis gives
the result. �
Definition 4.3. For a k-algebra A and a, b ∈ A, we say that a and b scalar commute
if there is an element α ∈ k× such that ab = αba. We may also say that a and b
α-commute.
In the following two lemmas, we let D denote the division ring of fractions for the
noetherian domain A. When comparing localizations of A, we identify them as subrings
of D.
Lemma 4.4. Let A be a noetherian domain, S ⊆ A \ {0} an Ore set. Let T be an Ore
set in AS−1 \ {0} with S ⊆ T .
(1) Then there exists an Ore set T̃ ⊆ A\{0} with S ⊆ T̃ such that AT̃−1 = (AS−1)T−1.
(2) Suppose A is a k-algebra and S is generated by s1, . . . , sn satisfying sisj = γijsjsi
for all i, j and some γij ∈ k×. Further suppose that T is generated by S ∪ t for some
t ∈ AS−1 that satisfies sit = λitsi for all i and some λi ∈ k×. Then there exist a cyclic
Ore set T̂ ⊆ A \ {0} and an (n + 1)-generator Ore set Ŝ ⊆ A \ {0} such that S ⊆ Ŝ,
and (AS−1)T−1 = AT̂−1 = AŜ−1.
Proof. (1) Consider T ∩A, the subset in T of elements with a denominator of 1. Clearly,
this is a multiplicative set in A which contains S. Set T̃ = T ∩A. Let a ∈ T̃ and α ∈ A.
Then a ∈ T , and since α ∈ AS−1, there exist b′ ∈ T and β ′ ∈ AS−1 such that aβ ′ = αb′.
By [16, 10.2], there exist y ∈ S, and b, β ∈ A such that β ′ = βy−1 and b′ = by−1; hence,
aβy−1 = αby−1 in AS−1. It follows that aβ = αb in A. So T̃ satisfies the right Ore
condition in A, and the left Ore condition by symmetry. By the universal property,
AT̃−1 ∼= (AS−1)T−1. As subrings of D, we have AT̃−1 = (AS−1)T−1.
18 HEIDI HAYNAL
(2) The generating element t has the form t = ā(sm11 s
2 · · · smnn )−1 for some mi ∈ N,
and ā ∈ A. For any si ∈ S, we have
siā(s
2 · · · smnn )−1 = λiā(sm11 sm22 · · · smnn )−1si = µλiāsi(sm11 sm22 · · · smnn )−1,
where µ is a product of powers of the γij. So ā scalar commutes with the genera-
tors of S via the relations siā = µλiāsi. Let Ŝ be the multiplicative set generated by
ā, s1, . . . , sn in A, and T̂ the multiplicative set generated by ās1s2 · · · sn in A. Recall
that (AS−1)T−1 = AT̃−1, where T̃ = T ∩ A from part (1). From the scalar com-
muting relations it follows that any element at̃−1 ∈ AT̃−1 may be written in the form
b(ās1, · · · sn)−m for some m ∈ N ∪ {0}, b ∈ A, or the form cā−ℓn+1s−ℓ11 · · · s−ℓnn , for
ℓj ∈ N ∪ {0}, c ∈ A. So we conclude that Ŝ and T̂ are Ore sets in A and that
(AS−1)T−1 = AT̂−1 = AŜ−1. �
Lemma 4.5. Let A be a noetherian domain, S1 ⊆ A \ {0} an Ore set, and for integers
j = 2, . . . , n let Sj be an Ore set in ((AS
1 ) · · · )S−1j−1 \ {0} with Sj−1 ⊆ Sj.
(1) Then there exists an Ore set T ⊆ A \ {0} such that AT−1 = (((AS−11 )S−12 ) · · · )S−1n .
(2) Suppose A is a k-algebra, S1 is generated by s1, and for j = 2, . . . , n, Sj is generated
by Sj−1 ∪ {sj}, where sisj = γijsjsi for some multiplicatively antisymmetric matrix
(γij) ∈ Mn(k×). Then there are a cyclic Ore set T̂ ⊆ A and an n-generator Ore set
Ŝ ⊆ A such that S1 ⊆ Ŝ, and ((AS−11 )S−12 ) · · ·S−1n = AT̂−1 = AŜ−1.
Proof. (1) The proof proceeds by induction on n. The case n = 1 is covered in the
lemma above. Suppose that for all j ≤ n− 1 there exists an Ore set Tj ⊆ A \ {0} such
that AT−1j = (((AS
2 ) · · · )S−1j . Then the equality
AT−1n−1 = (((AS
2 ) · · · )S−1n−1
identifies an Ore set Tn ⊆ AT−1n−1 \ {0} such that
(AT−1n−1)T
n = (((AS
2 ) · · ·S−1n−1)S−1n .
Furthermore, Lemma 4.4 implies the existence of an Ore set T ⊆ A \ {0} such that
AT−1 = (AT−1n−1)T
n = (((AS
2 ) · · ·S−1n−1)S−1n .
(2) Suppose, inductively, that there exist
(i) a cyclic Ore set T̂n−1 ⊆ A \ {0} generated by s1ā2 · · · ān−1
(ii) an (n− 1)-generator Ore set Ŝn−1 ⊆ A \ {0} with S1 ⊆ Ŝn−1 and generators
s1, ā2, ā3, . . . , ān−1
(iii) the āi scalar commute with s1 and with each other
(iv) ((AS−11 )S
2 ) · · ·S−1n−1 = AT̂−1n−1 = AŜ−1n−1 as subrings of D.
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 19
Then sn = ān(s1ā2 · · · ān−1)−r for some ān ∈ A and r ∈ N. Using the relations
sisj = γijsjsi, routine calculations show that the āi scalar commute with the sj ,
and also with each other, for all i, j. Let T̂ be the multiplicative set generated by
s1ā2 · · · ān, and let Ŝ be the multiplicative set generated by s1, ā2, ā3, . . . , ān. Then
((AS−11 )S
2 ) · · ·S−1n = (AT̂−1n−1)S−1n = AT−1 from part (1). Using Lemma 4.4, we con-
clude that T̂ and Ŝ are Ore sets in A and that AT−1 = AT̂−1 = AŜ−1. �
In the proof of the main theorem, we will use without mention the facts gathered here.
For greater details on these statements, see [16, 10X, 10Y] and [10, 1.4].
(1) Given a noetherian ring A and a normal element x ∈ A, the multiplicative set
generated by x is an Ore set.
(2) The multiplicative set generated by a nonempty family of right Ore sets is right
(3) Let A = R[x; τ, δ], and S a right denominator set in R such that τ(S) = S.
Then S is a right denominator set in A and the identity map on AS−1 extends
to an isomorphism of AS−1 onto (RS−1)[x; τ, δ] sending x1−1 to x. Note that
if A is a k-algebra, τ , δ are k-linear, and τ(k×S) = k×S, then the result holds
because S is a denominator set if and only if k×S is a denominator set.
Theorem 4.6. Let R be a k-algebra and noetherian domain,
A = R[x1; τ1, δ1] · · · [xn; τn, δn],
where each τi is a k-linear automorphism of R〈xi, . . . , xi−1〉 such that τi(xj) = λijxj
for all i, j with 1 ≤ j < i ≤ n and some λij ∈ k×, and where each δi is a k-linear τi-
derivation. Assume that there exist elements qi ∈ k× with qi 6= 1 such that δiτi = qiτiδi,
and that δi extends to a locally nilpotent, iterative h.qi-s.τi-d. on R〈xi, . . . , xi−1〉 for
i = 1, . . . , n.
(1) Then there exists an Ore set T ⊆ A generated by n elements of A such that
AT−1 ∼= R[y±11 ; τ1][y±12 ; τ ′2] · · · [y±1n ; τ ′n]
where τ ′i |R = τi and τ ′i(yj) = λijyj for all i, j with 1 ≤ j < i ≤ n
(2) There is PI degree parity between A and R[y1; τ1][y2; τ
2] · · · [yn; τ ′n]. Moreover, these
algebras have isomorphic division rings of fractions.
Proof. (a) Suppose, inductively, that we have
R[x1; τ1, δ1][y
2 ; τ2] · · · [y±1n ; τ ′n] ∼= AS−12
where the restriction of τ ′i to R〈x1〉 coincides with τi, τ ′i(ym) = λimym for 2 ≤ i ≤ n and
1 < m < i, and S2 is an Ore set in A generated by n − 1 elements from A. Then by
20 HEIDI HAYNAL
Lemma 4.2
AS−12
∼= R[y±12 ; τ ′′2 ] · · · [y±1n ; τ ′′n ][x1; τ ′1, δ′1] (9)
where the restrictions of τ ′1 and δ
1 to R coincide with τ1 and δ1, τ
1(yj) = λ
j1 yj, δ
1(yj) =
0, and τ ′′i coincides with the restriction of τi to R〈y2, . . . , yi−1〉 for 2 ≤ i ≤ n. Observe
that by Lemmas 4.2 and 2.7 we also have δ′1τ
1 = q1τ
1, and that δ
1 extends to a
locally nilpotent iterative h.q1-s.τ -d. on R〈y±12 , . . . , y±1n 〉. Then applying the derivation
removing homomorphism to the right hand side of (9) gives an isomorphism
(AS−12 )T
∼= R[y±12 ; τ ′2] · · · [y±1n ; τ ′n][y±11 ; τ ′1]
where T1 ⊆ AS−12 is an Ore set generated by one element of AS−12 . Then Lemma 4.5
and a reordering of variables shows the existence of an Ore set T ⊆ A, generated by n
elements of A, such that AT−1 ∼= R[y±11 ; τ1][y±12 ; τ ′2] · · · [y±1n ; τ ′n].
(2) This follows from part (1). �
Corollary 4.7. Let A = k[x1; τ1, δ1] · · · [xn; τn, δn] with the hypotheses as in Theorem
4.6. Set λ = (λij). Then
(1) A and Oλ(kn) have isomorphic division rings of fractions.
(2) A is a PI-algebra if and only if all the λij are roots of unity, in which case A and
Oλ(kn) have the same PI degree.
In general, identification of the generators for the Ore set T in Theorem 4.6 is very
cumbersome. To illustrate the computations on a fairly short iterated skew polynomial
ring, we consider the multiparameter second quantized Weyl algebra A
2 (k). Here,
Q = (q1, q2) ∈ (k×)2, qi 6= 1 for all i, and Γ = (γij) ∈ M2(k×) with γii = 1 and
γ21 = γ
12 . The algebra A
2 (k) may be presented as an iterated skew polynomial ring
of the form k[y1][x1; τ2, δ2][y2; τ3][x2; τ4, δ4], where the τi are k-linear automorphisms and
the δ2i are k-linear τ2i-derivations such that
τ2(y1) = q1y1, δ2(y1) = 1
τ3(y1) = γ
12 y1
τ3(x1) = γ12x1
τ4(y1) = q1γ12y1, δ4(y1) = 0
τ4(x1) = q
1 γ21x1, δ4(x1) = 0
τ4(y2) = q2y2, δ4(y2) = (q1 − 1)y1x1 + 1.
For greater detail about this algebra, the reader is referred to [1], [23], [12], and [15].
Routine computations show that the pair (τ2, δ2) is a q1-skew derivation and that (τ4, δ4)
is a q2-skew derivation. To show that δ2 and δ4 are locally nilpotent, it suffices to check
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 21
for local nilpotence on a set of generators. Given their definitions, this is accomplished
by verifying their action on powers of y1 and y2:
δi2(y
1 ) =
(n)!q1
(n−i)!q1
yn−i1 i ≤ n
0 i > n
δi4(y
2 ) =
(n)!q2
(n−i)!q2
[δ4(y2)]
iyn−i2 i ≤ n
0 i > n
Using Theorem 2.8 we have a h.q1-s.τ2-d. {d2,i} extending δ2, and a h.q2-s.τ4-d. {d4,i}
extending δ4, both of which are iterative and locally nilpotent. Let S2 ⊆ AQ,Γ2 (k) be the
multiplicative set generated by x2. The derivation removing homomorphism induces an
isomorphism
Φ : k[y1][x1; τ2, δ2][y2; τ3][z
2 ; τ4] −→ A
2 (k)S
whose action on generators is given by
y1 7→ y1
x1 7→ x1
z2 7→ x2
y2 7→ y2 + (q2 − 1)−1
(q1 − 1)y1x1 + 1
x−12 .
For simplicity, label the domain of Φ as BZ−1. Let X1 ⊆ BZ−1 be the Ore set generated
by z2 and x1. Applying the derivation removing homomorphism to BZ
−1 induces an
isomorphism
Ψ : k[y1][z
1 ; τ2][y2; τ3][z
2 ; τ
4] −→ (BZ−1)X−11
whose action on generators is given by
z1 7→ z1
z2 7→ z2
y2 7→ y2
y1 7→ y1 + (q1 − 1)−1x−11 .
The derivation removing homomorphism need not be employed again to achieve the
result. Through iterated localization we find that there is an Ore set T ⊆ AQ,Γ2 (k) such
2 (k)T
−1 ∼= k[y±11 ][x±11 ; τ2][y±12 ; τ3][x±12 ; τ4]
and T is generated by the four elements x2, x1, y2x2(q2 − 1) + y1x1(q1 − 1) + 1, and
y1x1(q1 − 1) + 1. Note that we recover the result of [22, Theorem 5].
22 HEIDI HAYNAL
5. Examples
We will demonstrate how each of the following k-algebras satisfies all the conditions
of Theorem 2.8. Then Corollary 4.7 is applied to obtain an isomorphism of quotient
division rings (thereby confirming the quantum Gel’fand-Kirillov conjecture) and PI
degree parity with a multiparameter quantum affine space. When calculating the PI
degree of a quantum affine space, we encounter an antisymmetric, or skew-symmetric,
integral matrix. As proved in [30, Theorem IV.1], such a matrix is congruent to a matrix
in skew normal form.
Theorem 5.1. [Newman] Let A be a skew-symmetric matrix of rank r which belongs
to Mn(R), where the commutative principal ideal domain R is not of characteristic 2.
Then r = 2s and A is congruent to a matrix in block diagonal form

−h1 0 0
−h2 0
. . .
0 −hs 0

where hi | hi+1, 1 ≤ i ≤ s− 1.
The same result, in the language of alternating bilinear forms, can be found in [3, Section
5.1].
The matrix S in Theorem 5.1 is clearly equivalent to the more familiar Smith normal
form, diag(h1, h1, h2, h2, . . . , hs, hs, 0, 0, . . . , 0), where the diagonal entries are the in-
variant factors of the matrix A. In the examples that follow, we outline the operations
necessary to obtain the Smith normal form.
Definition 5.2. Let A = k[x1; τ1, δ1] · · · [xn; τn, δn] and A′ = k[x1; τ1] · · · [xn; τn] be
iterated skew polynomial rings. (1) If there exists Q = (q1, . . . , qn) ∈ (k×)n such that
δiτi = qiτiδi for i = 1, . . . , n, then A is called an iterated Q-skew polynomial ring. (2) If
there exist λji ∈ k× such that τj(xi) = λjixi for all i < j, then set λij = λ−1ji and λii = 1
for all i. We call Λ = (λij) ∈ Mn(k×) the matrix of relations for A′.
Lemma 5.3. Let C be a commutative k-algebra, A a C-algebra, B ⊆ A a C-subalgebra
generated by {b1, b2, . . . }. Let τ be a C-algebra automorphism of A, and δ a u-skew
τ -derivation on A for some unit u ∈ C. If τ(bj) ∈ B and δn(bj) ∈ (n)!uB for all j, n,
then δn(B) ⊆ (n)!uB for all n.
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 23
Proof. Note that τ(bj) ∈ B for all j implies that τ(B) ⊆ B and hence we have
(j)!uB
⊆ (j)!uB for all j. Suppose that for integers m ≥ 1 and 1 ≤ ℓ ≤ m − 1, we
have δi(bj1 · · · bjℓ) ∈ (i)!uB for all i, and all choices of j1, . . . , jℓ. Then
δn(bj1 · · · bjm) =
τn−iδi(bj1 · · · bj(m−1))δ
n−i(bjm)
(i)!u(n− i)!uB ⊆ (n)!uB
for all n and all j1, . . . , jm by induction. �
For a first family of examples, we take odd-dimensional quantum Euclidean spaces. The
even-dimensional ones will be covered in Example 5.4.
5.1. The coordinate ring of odd-dimensional quantum Euclidean space; Oq(ok2n+1).
For q ∈ k×, assuming q has a (fixed) square root q1/2 ∈ k, the k-algebra Oq(ok2n+1)
may be presented as an iterated skew polynomial ring
k[w][y1; σ1][x1; τ1, δ1] · · · [yn; σn][xn; τn, δn]
with automorphisms σi, τi and derivations δi defined by
σi(w) = q
−1w all i
τi(w) = qw all i
σi(yj) = q
−1yj j < i
σi(xj) = q
−1xj j < i
τi(yj) = qyj i 6= j
τi(xj) = qxj j < i
τi(yi) = yi all i
δi(w) = δi(xj) = δi(yj) = 0 j < i
δi(yi) = (q
1/2 − q3/2)w2 + (1− q2)
yℓxℓ all i.
Quantum Euclidean spaces have been studied since 1990 when they were introduced by
Reshetikhin et al. in [36]. The three-dimensional case has applications to the structure
of space-time at small distances. Musson simplified the original set of relations in [29],
and Oh further simplified them, renaming the generators ω, xi, yi in [31]. Here, we have
made a change to Oh’s variables, yi 7→ qiyi, to obtain the relations in our presentation
of Oq(ok2n+1).
Routine computations show that τ−1i δiτi(yi) = q
−2δi(yi) for all i, and so we conclude
that each (τi, δi) is a q
−2-skew derivation. We may present the analogous k[t±1]-algebra
24 HEIDI HAYNAL
Ot(ok[t±1]2n+1) as an iterated skew polynomial ring with coefficient ring k[t±1] and
generators w, yi, xi for i = 1, . . . , n,
k[t±1][w][y1; σ̄1][x1; τ̄1, δ̄1] · · · [yn; σ̄n][xn; τ̄n, δ̄n]
where the automorphisms and derivations are defined analogously to those of the algebra
Oq(ok2n+1) with t ∈ k[t±1] replacing q ∈ k×. So each (τ̄i, δ̄i) is a t−2-skew derivation. It
is immediate that
Ot(ok[t±1]2n+1)/〈t− q〉 ∼= Oq(ok2n+1)
with each τ̄i and δ̄i reducing to τi and δi respectively.
Let Aj denote the k[t
±1]-subalgebra generated by w, ym, xm for m < j, and yj. To show
that δ̄ij(Aj) ⊆ (i)!t−2Aj, we apply Lemma 5.3 noting that δ̄ij(yj) has been given for i = 1
and is zero for i > 1. So, by Theorem 2.8, each δi in our presentation of Oq(ok2n+1)
extends to an iterative, locally nilpotent h.q−2-s.τi-d. on an appropriate subalgebra.
Then Corollary 4.7 gives
FractOq(ok2n+1) ∼= FractOB(k2n+1),
where the matrix of relations is

1 q q−1 q q−1 · · · q q−1
q−1 1 1 q q−1 · · · q q−1
q 1 1 q q−1 · · · q q−1
q−1 q−1 q−1 1 1 · · · q q−1
q q q 1 1 · · · q q−1
. . .
q−1 q−1 q−1 q−1 q−1 · · · 1 1
q q q q q · · · 1 1

If q ∈ k× is a root of unity, we may assume without loss of generality that it is a
primitive rth root of unity. Then the powers of q from the matrix B become the entries
of a (2n+ 1)× (2n+ 1) integer matrix

0 1 −1 1 −1 · · · 1 −1
−1 0 0 1 −1 · · · 1 −1
1 0 0 1 −1 · · · 1 −1
−1 −1 −1 0 0 · · · 1 −1
1 1 1 0 0 · · · 1 −1
. . .
−1 −1 −1 −1 −1 · · · 0 0
1 1 1 1 1 · · · 0 0

Now, PIdegOq(ok2n+1) can be computed from Theorem 1.2(2) using the matrix B′. The
cardinality of the image will not be changed if we first perform some row reductions on
B′. Letting N = 2n+ 1, n > 2, we manipulate the rows as follows.
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 25
• For i = 2, 4, 6, . . . , N − 1, replace row i with row i + row (i+ 1).
• For i = N,N − 2, N − 4, . . . , 5, replace row i with row i − row (i− 2).
• Replace row 5 with row 5 − row 1.
• For i = 2, 4, 6, . . . , N − 5, replace row i with row i − 2row (i+ 5).
• Multiply the even numbered rows, except row 2n− 2, by −1.
The resulting matrix has 2n pivots and one zero row. We put the rows in this order
3, 1, 5, 7, 2, 9, 4, 11, 6, 13, . . . , 2i, 2i+ 7, . . . , N,N − 5, N − 3, N − 1
to place the pivots on the main diagonal and the zero row in the last position. Then we
have a matrix of this form

1 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗
0 1 −1 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗
0 0 2 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗
0 0 0 1 1 ∗ ∗ ∗ ∗ ∗ ∗ ∗
0 0 0 0 4 ∗ ∗ ∗ ∗ ∗ ∗ ∗
0 0 0 0 0 1 1 ∗ ∗ ∗ ∗ ∗
0 0 0 0 0 0 4 ∗ ∗ ∗ ∗ ∗
0 0 0 0 0 0 0
. . . ∗ ∗ ∗ ∗
0 0 0 0 0 0 0 0 1 1 ∗ ∗
0 0 0 0 0 0 0 0 0 4 ∗ ∗
0 0 0 0 0 0 0 0 0 0 2 −2
0 0 0 0 0 0 0 0 0 0 0 0

The diagonal entries of this echelon matrix do not yet reveal the size of its image because
the pivot in row three does not divide all of the (suppressed) entries in its row when
n ≥ 3. So more row reduction is needed.
First replace row 3 with row 3 +
row(4i+ 2).
For n even and j = 5, 7, 9, . . . , 2n− 3, replace row j as follows:
for j = 4p+ 1, p ≥ 1, use row j +
i=p+1
2 · row(4i) + row(2n);
for j = 4p+ 3, p ≥ 1, use row j +
i=p+1
2 · row(4i+ 2).
26 HEIDI HAYNAL
For n odd and j = 5, 7, 9, . . . , 2n− 5, replace row j as follows:
for j = 4p+ 1, p ≥ 1, use row j +
i=p+1
2 · row(4i) + 2 · row(2n);
for j = 4p+ 3, p ≥ 1, use row j +
i=p+1
2 · row(4i+ 2) + row(2n).
Then add row(2n) to row(2n − 3), and add 2·row(2n) to row(2n − 1). For integers
4 ≤ j ≤ 2n − 1, with j 6≡ 2(mod 4), add (−1)jcol 3 to col j. Subtract col(2n + 1)
from col 3; add row 3 to row(2n − 2); and subtract 2·row 3 from row(2n). The result
is an upper echelon matrix in which each pivot divides all the nonzero entries in its
row. So it is trivial to diagonalize by column operations. The Smith normal form for n
odd is diag(1, 1, . . . , 1, 4, 4, . . . , 4, 0) with n+1 ones and n− 1 fours. The Smith normal
form for n even is diag(1, 1, . . . , 1, 2, 2, 4, 4, . . . , 4, 0) with n ones, two twos, and n − 2
fours.
For the cases n = 1, 2, the row-reduced matrices are, respectively,
1 0 0
0 1 −1
0 0 0

1 0 0 1 −1
0 1 −1 1 −1
0 0 2 −2 2
0 0 0 2 −2
0 0 0 0 0

Hence we have, for all n > 0,
PIdegOq(ok2n+1) =
rn, r odd
rn/2⌊
⌋, r even, r /∈ 4Z
rn/2n−1, r ∈ 4Z
5.2. The multiparameter quantized Weyl algebras; AQ,Γn (k). For a fixed n-tuple
Q = (q1, . . . , qn) ∈ (k×)n and Γ = (γij) a multiplicatively antisymmetric n × n matrix
over k, the algebra AQ,Γn (k), studied in [23] and [26], may be presented as an iterated
skew polynomial ring
k[y1][x1; τ1, δ1][y2; σ2][x2; τ2, δ2] · · · [yn; σn][xn; τn, δn]
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 27
where the automorphisms and derivations are defined by
σi(yj) = γjiyj j < i
σi(xj) = γijxj j < i
τi(yj) = qjγjiyj j < i
τi(xj) = q
j γijxj j < i
τi(yi) = qiyi all i
δi(xj) = δi(yj) = 0 j < i
δi(yi) = 1 +
(qℓ − 1)yℓxℓ all i.
Routine computations show that τ−1i δiτi(yi) = qiδi(yi) for all i, and so we conclude
that each (τi, δi) is a qi-skew derivation. We may present the k[t
1 , . . . , t
n ]-algebra
AT,Γn (k[t
1 , . . . , t
n ]) as an iterated skew polynomial ring
k[t±11 , . . . , t
n ][y1][x1; τ̄1, δ̄1][y2; σ̄2][x2; τ̄2, δ̄2] · · · [yn; σ̄n][xn; τ̄n, δ̄n]
where the automorphisms and derivations are defined analogously to those of AQ,Γn (k)
with ti ∈ k[t±11 , . . . , t±1n ] replacing qi ∈ k. So each (τ̄i, δ̄i) is a ti-skew derivation. It is
immediate that
AT,Γn (k[t
1 , . . . , t
n ])/〈t1 − q1, . . . , tn − qn〉 ∼= AQ,Γn (k)
with each τ̄i and δ̄i reducing to τi and δi respectively.
Let Aj denote the k[t
1 , . . . , t
n ]-subalgebra generated by ym, xm for m < j, and yj.
To show that δ̄ij(Aj) ⊆ (i)!tjAj, it suffices to check δ̄ij(yj) by Lemma 5.3. But this is
given by definition for i = 1 and is zero for i > 1. So, by Theorem 2.8, each δi in
our presentation of AQ,Γn (k) extends to an iterative, locally nilpotent h,qi-s.τi-d. on the
appropriate subalgebra. Then Corollary 4.7 gives FractAQ,Γn (k)
∼= FractOΛ(k2n), where
the 2n× 2n matrix of relations Λ is comprised of 2× 2 blocks
Bii =
1 q−1i
, for all i;
Bij =
γji q
i γji
γij qiγij
, for i < j;
Bij =
γji γij
qjγji q
j γij
, for i > j.
If γij and qi are roots of unity for all i, j, then OΛ(k2n) is a PI algebra. Assuming that
γij is an r
ij root of unity and that qi is an r
i root of unity, we let
r = lcm{rij, ri | i, j = 1, . . . , n}.
28 HEIDI HAYNAL
Then there exists a primitive rth root of unity q ∈ k and integers bi, bij such that qi = qbi
and γij = q
bij for i, j = 1, . . . , n. The powers of this q from the matrix Λ give a 2n× 2n
integer matrix Λ′ comprised of 2× 2 blocks
B′ii =
0 −bi
, for all i;
B′ij =
bji bji − bi
bij bij + bi
, for i < j;
B′ij =
bji bij
bj + bji bij − bj
, for i > j.
Then PIdegAQ.Γn (k) can be computed using the matrix Λ
′ in Theorem 1.2 (2).
Consider the single parameter case, denoted Aqn(k), where qi = q for all i, and γij = 1
for i < j, relegating the σi to identity maps. Assuming that q is a primitive r
th root of
unity, then δi(y
i ) = 0 and τi(y
i ) = y
i for all i, implying that y
i is central. The definition
of the τi, along with the q-Liebnitz rule, implies that x
i is central for all i. So the algebra
Aqn(k) is a finitely generated module over the central subring k[y
i , x
1, . . . , y
n]. To
find the PI degree in this case, the integer matrix becomes

0 −1 0 −1 . . . 0 −1
1 0 0 1 0 1
0 0 0 −1 0 −1
1 −1 1 0 0 1
. . .
0 0 0 0 . . . 0 −1
1 −1 1 −1 . . . 1 0

which is seen to have a trivial kernel after these row reductions:
• Replace row 2n with row 2n− row (2n− 2)− row (2n− 3)
• For j = n−1, n−2, . . . , 2, replace row 2j with row 2j−row (2j−2)−row (2j−3)
• Rearrange the rows to order 2, 1, 4, 3, 6, 5 . . . , 2n, 2n− 1.
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 29
The resulting matrix has the form

0 −1 ∗
. . .
0 1 0

thus verifying that PIdegAqn(k) = r
5.3. The multiparameter coordinate ring of quantum n×nmatrices; Oλ,p
Mn(k)
The multiparameter coordinate ring of quantum n×n matrices was introduced by Artin,
Schelter, and Tate in [2]. The k-algebra Oλ,p
Mn(k)
is defined by generators xij for
i, j = 1, . . . , n and relations
xℓmxij =
pℓipjmxijxℓm + (λ− 1)pℓiximxlj (ℓ > i, m > j)
λpℓipjmxijxℓm (ℓ > i, m ≤ j)
pjmxijxℓm (ℓ = i, m > j),
where λ ∈ k× and p = (pij) ∈ Mn2(k×) is multiplicatively antisymmetric. It can also
be presented as an iterated skew polynomial ring
k[x11][x12; τ12] · · · [xij ; τij, δij ] · · · [xnn; τnn, δnn]
where each τℓm and δℓm is k-linear and satisfies
τℓm(xij) =
pℓipjmxij when ℓ > i and m 6= j
λpℓipjmxij when ℓ > i and m = j
pjmxij when ℓ = i and m > j,
δℓm(xij) =
(λ− 1)pℓiximxℓj when ℓ > i and m > j
0 otherwise.
Routine computations show τ−1ℓm δℓmτℓm(xij) = λ
−1δℓm(xij) as in [9, Section 5], and so
we conclude that each (τℓm, δℓm) is a λ
−1-skew derivation. We may present the k[t±1]-
algebra Ot,p
Mn(k[t
as an iterated skew polynomial ring with generators xij for
i, j = 1, . . . , n
k[t±1][x11][x12, τ̄12] · · · [xij ; τ̄ij, δ̄ij ] · · · [xnn; τ̄nn, δ̄nn]
30 HEIDI HAYNAL
where the automorphisms and derivations are defined analogously to those of the algebra
Mn(k)
with t ∈ k[t±1] replacing λ ∈ k. So each (τ̄ℓm, δ̄ℓm) is a t−1-skew derivation.
It is immediate that
Mn(k[t
/〈t− λ〉 ∼= Oλ,p
Mn(k)
with each τ̄ℓm and δ̄ℓm reducing to τℓm and δℓm respectively.
Let A−ℓm denote the k[t
±1]-subalgebra generated by the xij with (i, j) < (ℓ,m) in the
lexicographic order. Lemma 5.3 allows us to to verify that δ̄sℓm(A
ℓm) ⊆ (s)!t−1(A−ℓm) by
checking only that δ̄sℓm(xij) is contained in A
ℓm. This is immediate from the formula for
δ̄ℓm given above. Thus, by Theorem 2.8, each δℓm in our presentation of Oλ,p
Mn(k)
extends to an iterative, locally nilpotent h.λ−1-s.τℓm-d. on the appropriate k-subalgebra.
Then Corollary 4.7 gives
FractOλ,p
Mn(k)
) ∼= FractOΛ(kn
where the matrix of relations Λ = (bij) ∈ Mn2(k) is comprised of n× n blocks
Bii =

1 p21 p31 · · · pn1
p12 1 p32 · · · pn2
p13 p23 1 · · · pn3
. . .
p1n p2n p3n · · · 1

for all i,
Bij =

λ−1pij pijp21 pijp31 · · · pijpn1
λ−1pijp12 λ
−1pij pijp32 · · · pijpn2
λ−1pijp13 λ
−1pijp23 λ
−1pij · · · pijpn3
. . .
λ−1pijp1n λ
−1pijp2n λ
−1pijp3n · · · λ−1pij

, for i < j,
Bij =

λpij λpijp21 λpijp31 · · · λpijpn1
pijp12 λpij λpijp32 · · · λpijpn2
pijp13 pijp23 λpij · · · λpijpn3
. . .
pijp1n pijp2n pijp3n · · · λpij

, for i > j.
If λ and pij are roots of unity for all i, j, then OΛ(kn
) is a PI algebra. In this case we
may assume that λ is an sth root of unity and that pij is an r
ij root of unity, and let
r = lcm{s, rij | i, j = 1, . . . , n}. Then there exists a primitive rth root of unity q ∈ k
and integers b, bij such that λ = q
b and pij = q
bij . The powers of this q from the matrix
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 31
Λ provide entries for an n2 × n2 integer matrix Λ′ made up of n× n blocks
B′ii =

0 b21 b31 · · · bn1
b12 0 b32 · · · bn2
b13 b23 0 · · · bn3
. . .
b1n b2n b3n · · · 0

for all i,
B′ij =

bij − b bij + b21 bij + b31 · · · bij + bn1
bij + b12 − b bij − b bij + b32 · · · bij + bn2
bij + b13 − b bij + b23 − b bij − b · · · bij + bn3
. . .
bij + b1n − b bij + b2n − b bij + b3n − b · · · bij − b

, for i < j,
B′ij =

bij + b bij + b21 + b bij + b31 + b · · · bij + bn1 + b
bij + b12 bij + b bij + b32 + b · · · bij + bn2 + b
bij + b13 bij + b23 bij + b · · · bij + bn3 + b
. . .
bij + b1n bij + b2n bij + b3n · · · bij + b

, for i > j.
Then PIdegOλ,p
Mn(k)
can be calculated using Λ′ in Theorem 1.2 (2).
The single parameter quantized coordinate ring of n × n matrices, Oq(Mn(k)), is de-
fined over k analogously to Oλ,p(Mn(k)), but with relations that are recovered by setting
λ = q−2 and pij = q for all i > j. When k has characteristic zero and q is a primitive m
root of unity for m odd, Jakobsen and Zhang found in [20] that
PIdegOq(Mn(k)) = m
n(n−1)
2 by using De Concini’s and Procesi’s tool given in Theo-
rem 1.2. This result is reproved in [19] using results of De Concini and Procesi and also
Jøndrup’s work from [21]. Now we can recover PIdegOq(Mn(k) without the assumption
that k has characteristic zero.
In the single parameter case of n × n quantum matrices, the matrix that we use to
calculate the PI degree is

An In In In · · · In
−In An In In · · · In
−In −In An In · · · In
−In −In −In −In · · · An

32 HEIDI HAYNAL
where

0 1 1 1 · · · 1
−1 0 1 1 · · · 1
−1 −1 0 1 · · · 1
−1 −1 −1 · · · −1 0

is n× n and In is the n× n identity matrix.
For any n, the characteristic polynomial of An is the sum of the terms of degree ≡ n
(mod 2) in the binomial expansion of (x+1)n, so in fact χn(x) =
(x+1)n+ 1
(x− 1)n.
But there is also a recursion formula for the characteristic polynomial for n ≥ 3 given
χn(x) = χn−1(x)(x+ 1)− (x− 1)n−1,
which will be useful in the linear algebra that follows.
We will perform the following row reductions on the rows of blocks of Λ′. For ease of
notation, we’ll denote the jth row of blocks as BRj , the interchange of BRi and BRj as
BRi ↔ BRj , and the addition of a multiple of BRi to BRj as MBRi + BRj 7→ BRj ,
where M ∈ Mn(Z).
• BR1 ↔ BRn.
• −InBR1 7→ BR1.
• For i = 2, . . . , n− 1, BR1 +BRi 7→ BRi.
• BRn − AnBR1 7→ BRn.
This yields the matrix

In In In In · · · −An
0 An + In 2In 2In · · · In −An
0 0 An + In 2In · · · In −An
. . .
0 0 0 · · · An + In In −An
0 In −An In −An In −An · · · In + A2n

which can be reduced further by n − 2 block row operations, each of which produces
one zero block in the nth row. We list the first three here along with the resulting (n, n)
block.
• (An + In)BRn − (In − An)BR2 7→ BRn : A3n + 3An
• (An + In)BRn + (In −An)2BR3 7→ BRn : A4n + 6A2n + In
• (An + In)BRn − (In − An)3BR4 7→ BRn : A5n + 10A3n + 5An
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 33
In general, the block row operations that we need to perform in order to obtain a block
upper triangular matrix are:
• For i = 2, . . . , n− 1, (An + In)BRn + (−1)i−1(In − An)i−1BRi 7→ BRn.
These row operations are justified when m is odd because An + In is invertible in
Mn(Z/mZ) in that case, as will be shown below. After applying this step to the i
th row,
the (n, n) block is χi+1(An). So the resulting block upper triangular matrix is

In In In In · · · −An
0 An + In 2In 2In · · · In −An
0 0 An + In 2In · · · In −An
. . .
0 0 0 · · · An + In In −An
0 0 0 0 · · · χn(An)

where χn(An) is the n× n zero matrix. Each block on the diagonal is
An + In =

1 1 1 1 · · · 1
−1 1 1 1 · · · 1
−1 −1 1 1 · · · 1
−1 −1 −1 −1 · · · 1

which can be row reduced just by adding row 1 to rows 2 through n to yield the
matrix 

1 1 1 1 · · · 1
0 2 2 2 · · · 2
0 0 2 2 · · · 2
. . .
0 0 0 0 · · · 2

In particular this shows that An + In is invertible in Mn(Z/mZ) for m odd. Hence
Λ′ can be reduced through row operations to an upper triangular n2 × n2 matrix with
2n− 2 ones, (n− 1)(n− 2) twos, and n zeroes on the diagonal. Assuming that q ∈ k is
a primitive mth root of unity, and recalling Theorem 1.2, the cardinality of the image
in (Z/mZ)n
is mn
2−n if m is odd. Thus we conclude that PIdegOqMn(k) = m
n(n−1)
recovering the result of Jakobsen and Zhang [20] in characteristic zero. By similar
methods, one can show that PIdegOqMn(k) = m
n(n−1)
(n−1)(n−2)
2 when m is even. For
details on this result see [20] or [17].
5.4. The algebra K
n,Γ (k), which generalizes the coordinate rings of even-
dimensional quantum Euclidean space and quantum symplectic space. For
P = (p1, . . . , pn) and Q = (q1, . . . , qn) in (k
×)n with pi 6= qi for all i = 1, . . . , n, and
34 HEIDI HAYNAL
Γ = (γij) ∈ Mn(k×) multiplicatively antisymmetric, the k-algebra KP,Qn,Γ (k) introduced
in [18] is defined by generators xi, yi for i = 1, . . . , n and relations
yiyj = γijyjyi all i, j
xixj = qip
j γijxjxi i < j
xiyj = pjγjiyjxi i < j
xiyj = qjγjiyjxi i > j
xiyi = qiyixi +
(qℓ − pℓ)yℓxℓ all i.
This algebra may be presented in the form of an iterated skew polynomial ring
k[y1][x1; τ1][y2; σ2][x2; τ2, δ2] · · · [yn; σn][xn; τn, δn]
where the automorphisms τi, σi and derivations δi are defined by
σi(yj) = γijyj j < i
σi(xj) = p
i γjixj j < i
τi(yj) = qjγjiyj j < i
τi(xj) = q
j piγijxj j < i
τi(yi) = qiyi all i
δi(xj) = δi(yj) = 0 j < i
δi(yi) =
(qℓ − pℓ)yℓxℓ all i.
Routine computations show that τ−1i δiτi(yi) = qip
i δi(yi) for all i, and so we conclude
that each (τi, δi) is a qip
i -skew derivation. For ease of notation we now shall let
k = k[t±11 , . . . , t
n , u
1 , . . . , u
n ] with T = (t1, . . . , tn) ∈ k and U = (u1, . . . , un) ∈ k.
We may present the k-algebra K
n,Γ (k) as an iterated skew polynomial ring
k[y1][x1; τ̄1][y2; σ̄2][x2; τ̄2, δ̄2] · · · [yn; σ̄n][xn; τ̄n, δ̄n]
where the automorphisms and derivations are defined analogously to those of K
n.Γ (k)
with ti replacing pi and ui replacing qi. Let I ⊆ KT,Un,Γ (k) be the ideal generated by the
2n monomials ti − pi, ui − qi for i = 1, . . . , n. It is immediate that
n,Γ (k)/I
∼= KP,Qn,Γ (k),
with each τ̄i, δ̄i, σ̄i reducing to τi, δi, σi respectively.
Let Aj denote the subalgebra of K
n,Γ (k) generated by ym, xm for m < j and yj. To
show that δ̄ij(Aj) ⊆ (i)!ujt−1j Aj , it suffices to check that δ̄
j(yj) is an element of (i)!ujt−1j
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 35
by Lemma 5.3. This is given for i = 1 by the formula for δ̄j and is zero for i > 1. So,
by Theorem 2.8, each δi in our presentation of K
n,Γ (k) extends to an iterative, locally
nilpotent h.qip
i -s.τi-d. on the appropriate subalgebra. Then Corollary 4.7 gives
FractK
n,Γ (k)
∼= FractOΛ(k2n)
where the 2n× 2n matrix of relations Λ = (Bij) is comprised of 2× 2 blocks
Bii =
1 q−1i
, for all i;
Bij =
γij q
i γji
pjγji qip
j γij
, for i < j;
Bij =
γij p
i γij
qjγji q
j piγij
, for i > j.
If the qi, pi and γi are all roots of unity, then OΛ(k2n) is a PI algebra. Suppose qi is an
rthi root of unity, pi is an s
i root of unity, and γij is an r
ij root of unity for all i, j. Let
r = lcm{ri, si, γij | i, j = 1, . . . , n}. Then there extsis a primitive rth root of unity q ∈ k
and integers bi, ci, bij such that qi = q
bi , pi = q
ci, and γij = q
bij for all i, j. The powers
of q from the matrix Λ provide the entries for an integer matrix Λ′ comprised of 2 × 2
blocks
B′ii =
0 −bi
, for all i;
B′ij =
bij bji − bi
bji + cj bi + bij − cj
, for i < j;
B′ij =
bij bij − ci
bji + bj bij + ci − bj
, for i > j.
Then PIdegK
n,Γ (k) can be calculated using Λ
′ in Theorem 1.2 (2).
The coordinate ring of quantum Euclidean 2n-space over k, Oq(ok2n), is formed by
setting qi = 1, pi = q
−2 for all i, and γij = q
−1 for i < j in the parameters Q, P , and Γ
36 HEIDI HAYNAL
(see [18], Example 2.6). Then the integer matrix, Λ′, is

0 0 −1 1 −1 1 . . . −1 1
0 0 −1 1 −1 1 . . . −1 1
1 1 0 0 −1 1 −1 1
−1 −1 0 0 −1 1 ...
1 1 1 1 0 0
−1 −1 −1 −1 0 0
. . .
1 1 1 1 1 . . . 0 0
−1 −1 −1 −1 −1 . . . 0 0

We perform the following row reductions that preserve the size of the image of the
homomorphism Z2n −→ Z2n given by Λ′:
• For j = 2n, 2n− 1, 2n− 2, . . . , 4, replace row j with row j + row (j − 1)
• Replace row 2 with row 2− row 1
• Replace the (new) row 5 with row 5 + row 1
• For j = 4, 6, 8, . . . , 2n− 4, replace row j with row j + 2row (j + 3)
• For n ≥ 4, rearrange the rows to order 3, 1, 5, 7, 4, 9, 6, 11, . . . , 2i, 2i+ 5, . . . ,
2n− 4, 2n− 2, 2, 2n.
The resulting matrix has the form

0 −1 1
0 2 ∗
0 1 1
. . .
0 1 1
0 0 4
0 −2 2

When n is even, the pivot in the third row does not divide all the entries in its row,
so more elementary row and column operations are needed before it becomes clear that
the matrix can be diagonalized. By a method similar to that used in Example 5.1,
suppressed here in the interest of saving space but listed explicitly in [17], we obtain the
Smith normal form diag(1, 1, . . . , 1, 4, 4, . . . , 4, 0, 0) with n ones and n− 2 fours when n
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 37
is even; and diag(1, 1, . . . , 1, 2, 2, 4, 4, . . . , 4, 0, 0), with n− 1 ones and n− 3 fours when
n is odd. Thus we have
PIdegOq(ok2n) =
rn−1, r odd
rn−1/2⌊
⌋, r even /∈ 4Z
rn−1/2n−2, r ∈ 4Z
. (10)
The low-dimensional cases do not fit the same pattern, but the matrices for the cases
n = 2 and n = 3 are readily transformed to
1 1 0 0
0 0 −1 1
0 0 0 0
0 0 0 0
 and

1 1 0 0 −1 1
0 0 −1 1 −1 1
0 0 0 2 0 0
0 0 0 0 −2 2
0 0 0 0 0 0
0 0 0 0 0 0

respectively. Therefore, formula (10) holds for all n ≥ 2.
As a specific case ofK
n,Γ (k), quantum symplectic spaceOq(sp(k2n)) is formed by setting
qi = q
−2 and pi = 1 for all i, and γij = q for i < j (see [18], Example 2.4). With these
parameters, the 2n× 2n integer matrix Λ′ is

0 2 1 1 1 1 . . . 1 1
−2 0 −1 −1 −1 −1 . . . −1 −1
−1 1 0 2 1 1 1 1
−1 1 −2 0 −1 −1 −1 −1
−1 1 −1 1 0 2 ...
−1 1 −1 1 −2 0
. . .
−1 1 −1 1 −1 1 . . . 0 2
−1 1 −1 1 −1 1 . . . −2 0

We perform the following row reductions that preserve the size of the image of the
homomorphism Z2n −→ Z2n given by Λ′:
• For j = 2n, 2n− 1, . . . , 4, replace row j with row j − row (j − 1)
• Replace row 2 with −(row 2− 2row 3 + row 1)
• For j = 4, 6, 8, . . . , 2n− 2, replace row j with row j + 2row (j + 1)
• For n ≥ 3, order the rows 3, 1, 5, 2, 7, 4, 9, . . . , 2j, 2j + 5, . . . , 2n− 4, 2n, 2n− 2.
38 HEIDI HAYNAL
This yields a matrix whose image is more easily measured:

0 1 1 ∗
. . .
0 0 4
−2 −2

But the pivot in row 2 is problematic because it does not always divide the other
entries in its row. With further elementary row and column operations, full details
of which can be found in [17], we can bring this matrix into Smith normal form
diag(1, 1, . . . , 1, 4, 4, . . . , 4) with n ones and n fours when n is even; or the form
diag(1, 1, . . . , 1, 2, 2, 4, 4, . . . , 4) with n − 1 ones, two twos, and n − 1 fours when n
is odd.
For n = 1, 2, the row reduced matrices are, respectively,
−1 1 0 2
0 1 1 1
0 0 −4 −4
0 0 0 −4
 .
Hence we have, for all n,
PIdegOq(sp(k2n)) =
rn, r odd
rn/2⌊
⌋, r even, r /∈ 4Z
rn/2n, r ∈ 4Z
6. Prime Factor Localizations
In this section we present a structure theorem for completely prime factors of iterated
skew polynomial rings analogous to the main theorem of section four. Applying this
result to the algebras studied in section five, we’d like to strengthen it to the form
of the quantum Gel’fand-Kirillov conjecture. Recall that the assumptions about skew
polynomial rings from section one are still in effect.
Theorem 6.1. Let A = R[x; τ, δ], where R is noetherian and δτ = qτδ for some
q ∈ k×. Assume that δ extends to a locally nilpotent, iterative h.q-s.τ -d., {di}, on R.
Let P ∈ specA be completely prime. Then
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 39
(1) there exists a cyclic Ore set S in A/P such that (A/P )S−1 ∼=
R[y; τ ]/Q
Y −1 for
some completely prime Q ∈ specR[y; τ ] and cyclic Ore set Y ,
(2) FractA/P ∼= FractR[y; τ ]/Q.
Proof. The completely prime ideal P naturally satisfies one of two cases: x ∈ P or
x /∈ P . If x ∈ P , then xA ⊆ P and Ax ⊆ P . So the relation xr = τ(r)x + δ(r)
implies that δ(r) ∈ P for all r ∈ R. Hence, there is a completely prime ideal I ∈ R
such that A/P ∼= R/I ∼= R[y; τ ]/(I + 〈y〉). In this case, we can take S = Y = {1} and
localize. If x /∈ P , then xi /∈ P for all i ∈ N ∪ {0} because A/P is a domain. Letting
S = {1, x, x2, . . . }, which is a known denominator set in A, we have P ∩ S = ∅. Since
extension and contraction provide inverse bijections between the sets specAS−1 and
{I ∈ specA | I ∩ S = ∅}, we know that P e ∈ specAS−1. From Theorem 3.7, we
have AS−1 ∼= R[y±1; τ ], a localization of R[y; τ ]. So there is a completely prime ideal
Q̄⊳R[y±1; τ ] such that AS−1/P e ∼= R[y±1; τ ]/Q̄. Setting Y = {1, y, y2, . . . , }, contraction
to R[y; τ ] gives a completely prime ideal Q, where Q∩ Y = ∅, such that R[y±1; τ ]/Q̄ is
isomorphic to (R[y; τ ]/Q)Y −1. The canonical projection π : AS−1 −→ (A/P )S−1 gives
AS−1/P e ∼= (A/P )S−1. Thus (A/P )S−1 ∼=
R[y; τ ]/Q
Y −1. �
Theorem 6.2. Let R be a noetherian k-algebra, and let
A = R[x1, τ1, δ1] · · · [xn; τn, δn]
be an iterated skew polynomial ring where, for j < i and λij ∈ k×, τi(xj) = λijxj, and δi
is a qi-skew τi-derivation, qi 6= 1, which extends to a locally nilpotent, iterative h.qi-s.τi-d.
{di,p}∞p=0 on R[x1; τ1, δ1] · · · [xi−1; τi−1, δi−1] for all i. Let A′ = R[y1; τ ′1][y2; τ ′2] · · · [yn; τ ′n]
where τ ′i(yj) = λijyj for all i with j < i and the same units λij as above. Let P be a
completely prime ideal in A. Then
(1) there exists a finitely generated Ore set Sn in A/P such that (A/P )S
n is isomorphic
Y −1n for some completely prime ideal Q ⊆ A′ and finitely generated Ore set
(2) FractA/P ∼= FractA′/Q.
Proof. The case n = 1 has been established in Theorem 6.1. Suppose the result holds
for the case n− 1, and let An−1 = R[x1, τ1, δ1] · · · [xn−1; τn−1, δn−1] ⊆ A. Then we have
A = An−1[xn; τn, δn]. If xn ∈ P , then as in Theorem 6.1 there is a completely prime
ideal I ⊆ An−1 such that A/P ∼= An−1/I ∼= An−1[yn; τ ′n]/(I + 〈yn〉). The induction
hypothesis and Lemma 4.2 imply that
An−1[yn; τ
n]/(I + 〈yn〉)
S−1 ∼=
Y −1 for
some finitely generated Ore sets S and Y . Hence there is a finitely generated Ore set
Sn in A such that (A/P )S
∼= (A′/Q)Y −1. If xn /∈ P , let Sn = {1, xn, x2n, . . . } ⊆ A
and Yn = {1, yn, y2n, . . . } ⊆ An−1[yn; τn]. Then from the single-variable result, it follows
that (
An−1[yn; τ
n]/Q̄
Y −1n , (11)
40 HEIDI HAYNAL
for a completely prime ideal Q̄ ⊆ An−1[yn; τ ′n]. From Lemma 4.2, we have
An−1[yn; τ
n] = R[yn; τ
n][x1; τ
1] · · · [xn−1; τ ′n−1, δ′n−1],
which is an iterated skew polynomial ring in n − 1 variables over the coefficient ring
R[yn; τ
n] that satisfies the current assumptions. So, we apply the induction hypothesis
and rearrange variables to obtain
An−1[yn; τn]/Q̄
Y −1n
R[yn; τ
n][y1; τ
1] · · · [yn−1; τ ′n−1]/Q
R[y1; τ
1][y2; τ
2] · · · [yn; τ ′n]/Q
for a completely prime ideal Q ⊆ R[y1; τ ′1][y1; τ ′1] · · · [yn; τ ′n] and a denominator set
Z ⊆ R[y1; τ ′1][y1; τ ′1] · · · [yn; τ ′n]/Q. This, along with isomorphism (11) gives the re-
sult. �
When R is replaced by k, we have the following result.
Corollary 6.3. Let A = k[x1, τ1, δ1] · · · [xn; τn, δn], where τi(xj) = λijxj and δiτi = qiτiδi,
qi 6= 1, for λij , qi ∈ k× and all i with j < i. Assume that each δi extends to a locally
nilpotent, iterative h.qi-s.τi-d. {di,m}∞m=0 on the subalgebra k[x1; τ1, δ1] · · · [xi−1; τi−1, δi−1].
Let P be a completely prime ideal in A and set λii = 1 and λji = λ
ij . Then for
λ = (λij) ∈ Mn(k), and an appropriate completely prime ideal Q ⊆ Oλ(kn), we have
FractA/P ∼= FractOλ(kn)/Q.
We summarize how this applies to the k-algebras of quantized coordinate type.
Corollary 6.4. Let A be any of the examples discussed in sections 5.1 - 5.4, and let P be
a completely prime ideal of A. Then there exist a positive integer N , a multiplicatively
antisymmetric N ×N matrix λ over k, and a completely prime ideal Q ∈ Oλ(kN) such
that FractA/P ∼= FractOλ(kN)/Q.
To complete the question posed by the corollary, one might ask how far the quantum
Gel’fand-Kirillov conjecture extends to prime factor algebras. For instance:
Question 6.5. Find conditions under which we can conclude that for any positive in-
teger n, multiplicatively antisymmetric matrix λ ∈ Mn(k×), and completely prime ideal
Q ∈ specOλ(kn), we have
FractOλ(kn)/Q ∼= FractOp(Km)
for some field extension K ⊇ k, integer m ≤ n, and m×m matrix p over K.
The case n = 1 is trivial. When n = 2 and Q contains x1 or x2, then FractOλ(k2)/Q is
isomorphic either to FractOp(k(y)) where p = (1), or to k itself. In fact, for any n, if Q
is generated by a subset S of {x1, . . . , xn}, then the result holds, with p the submatrix
of λ formed by deleting the ith row and column for xi ∈ S, and K = k. When xi /∈ Q
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 41
for all i, answering the question fully will likely require different methods depending on
the presence of roots of unity among the λij. A positive answer in the generic case has
been provided in the proof of [13, Theorem 2.1]:
Theorem 6.6. [Goodearl - Letzter] Let k be a field, λ = (λij) a multiplicatively anti-
symmetric n× n matrix over k×, and Λ the subgroup of k× generated by the λij. If Λ
is torsionfree, then all of the prime ideals Q of Oλ(kn) are completely prime.
In their proof, they showed that FractOλ(kn)/Q ∼= FractOp(Km), and identified K as
the quotient field of a commutative domain embedded in the center of Oλ((k×)n)/Q′,
where Q′ is the prime ideal in Oλ((k×)n) induced by localization.
Quantum affine space is included in a class called quantum solvable algebras by A. N. Panov.
The main theorem of [34, Section 3], states that when the group generated by the λij is
torsionfree, then FractOλ(kn)/Q is isomorphic to the quotient division ring of a quan-
tum torus. The main theorem of [35, Section 3], allows roots of unity and states that
when Q satisfies the extra condition of being stable under a certain set of derivations,
then FractOλ(kn)/Q is isomorphic to the quotient division ring of a quantum torus.
Cauchon’s work may also be specialized to apply to quantum affine space when the
group generated by the λij is torsionfree. The result of [5, Theorem 6.1.1], indicates
that FractOλ(kn)/Q is isomorphic to FractOp(Km) which specializes to this result.
But the division ring of real quaternions provides an example showing that Question
6.5 needs to have some conditions imposed. Note that
H ∼= Oλ(R3)/Q, where λ =
1 −1 −1
−1 1 −1
−1 −1 1
 , andQ = 〈x21 + 1, x22 + 1, x23 + 1〉.
Therefore, we cannot obtain the desired isomorphism of quotient division rings in this
case, illustrating the necessity of an extra condition such as the one imposed by Panov
in [35].
acknowledgments
The author thanks her dissertation advisor, Ken Goodearl, for his direction that was so
freely given in many inspiring discussions.
References
[1] J. Alev and F. Dumas, Sur le corps de fractions de certaines algèbres quantiques, J. Algebra
170 (1994), 229-265
[2] M. Artin, W. Schelter, and J. Tate, Quantum deformations of GLn, Comm. Pure Appl.
Math 44 (1991), 879-895
[3] N. Bourbaki, Éléments de mathématique, Livre II, Algèbre, Chapitre 9, Formes sesquilinéaires
et formes quadratiques, Hermann, Paris, 1959
42 HEIDI HAYNAL
[4] K. A. Brown and K. R. Goodearl, Lectures on Algebraic Quantum Groups, Birkhäuser Verlag,
Basel - Boston, 2002
[5] G. Cauchon, Effacement des dérivations et spectres premiers des algèbres quantiques, J. Algebra
260 (2003), 476-518
[6] G. Cauchon, Spectre premier de Oq(Mn(k)) image canonique et séparation normale, J. Algebra
260 (2003), 519-569
[7] G. Cliff, The division ring of quotients of the coordinate ring of the quantum general linear
group, J. London Math. Soc. (2) 51 (1995), 503-513
[8] C. De Concini and C. Procesi, Quantum Groups, in D-modules Representation Theory, and
Quantum Groups (Venezia, June 1992) (G. Zampieri and A. D’Agnolo, eds.), Lecture Notes in
Math. 1565, Springer-Verlag, Berlin, 1993, 31-140
[9] K. R. Goodearl, Uniform ranks of prime factors of skew polynomial rings, in Ring Theory, Proc.
Biennial Ohio State - Denison Conf., 1992 (S. K. Jain and S. T. Rizvi, eds.), World Scientific,
Singapore, 1993, 182-199
[10] K. R. Goodearl, Prime ideals in Skew polynomial rings and quantized Weyl algebras, Trans.
Amer. Math. Soc. 352 (2000), 1381-1403
[11] K. R. Goodearl, Prime spectra of quantized coordinate rings, in Interactions between Ring
Theory and Representations of Algebras (Murcia 1998) (F. Van Oystaeyen and M. Saoŕın, eds.),
Dekker, New York, 2000, pp. 205-237
[12] K. R. Goodearl and T. H. Lenagan, Catenarity in quantum algebras, J. Pure and Appl.
Algebra 111 (1996), 123-142
[13] K. R. Goodearl and E. S. Letzter, Prime factor algebras of the coordinate ring of quantum
matrices, Proc. Amer. Math. Soc. 121 (1994), 1017-1025
[14] K. R. Goodearl and E. S. Letzter, Prime ideals in skew and q-skew polynomial rings, Mem.
Amer. Math. Soc. 521 (1994)
[15] K. R. Goodearl and E. S. Letzter, The Dixmier-Moeglin equivalence in quantum coordinate
rings and quantized Weyl Algebras, Trans. Amer. Math. Soc. 352 (2000), 1381-1403
[16] K. R. Goodearl and R. B. Warfield, Jr., An Introduction to Noncommutative Noetherian
Rings, 2nd ed., Cambridge Univ. Press, Cambridge, 2004
[17] H. A. Haynal, Pi degree parity in q-skew polynomial rings, Ph.D. Thesis, to appear, (2007)
University of California, Santa Barbara
[18] K. L. Horton, The prime and primitive spectra of multiparameter quantum symplectic and eu-
clidean spaces, Comm. Algebra 31 (10) (2003), 4713-4743
[19] H. P. Jakobsen and S. Jøndrup, Quantized rank r matrices, J. Algebra 246 (2001), 70-96,
arXiv:math.QA/9902133 v3, 23 May 2001
[20] H. P. Jakobsen and H. Zhang, The center of the quantized matrix algebra, J. Albegra 196
(1997), 458-474
[21] S. Jøndrup, Representations of skew polynomial algebras, Proc. Amer. Math Soc. 128 (2000),
1301-1305
[22] S. Jøndrup, Representations of some PI algebras, Comm. Algebra 31 (6) (2003), 2587-2602
[23] D. A. Jordan, A simple localization of the quantized Weyl algebra, J. Algebra 174 (1995), 267-281
[24] T. Y. Lam, Lectures on Modules and Rings, Springer, New York, 1999
[25] D. R. Malm, Simplicity of partial and Schmidt differential operator rings, Pacific J. Math. 132
(1998), no. 1, 85-112
[26] G. Maltsiniotis, Calcul différentiel quantique, Groupe de travail, Université Paris VII (1992)
[27] J. C. McConnell and J. C. Robson, Noncommutative Noetherian Rings, Wiley-Interscience,
Chichester - New York, 1987
http://arxiv.org/abs/math/9902133
PI DEGREE PARITY IN q-SKEW POLYNOMIAL RINGS 43
[28] V. G. Mosin and A. N. Panov, Division rings of quotients and central elements of multiparam-
eter quantizations, Sbornik: Mathematics 187:6 (1996), 835-855
[29] I. M. Musson, Ring theoretic properties of the coordinate rings of quantum symplectic and Eu-
clidean space, in Ring Theory, Proc. Biennial Ohio State-Denison Conf., 1992 (S.K. Jain and S.T.
Rizvi, eds.), World Scientific, Singapore, 1993, 248-258
[30] M. Newman, Integral Matrices, Academic Press, 1972
[31] S. Q. Oh, Catenarity in a class of iterated skew polynomial rings, Comm. Algebra 25 (1) (1997),
37-49
[32] A. N. Panov, Skew fields of twisted rational functions and the skew field of rational functions on
GLq(n,K), St. Petersburg Math J. 7 (1) (1996), 129-143
[33] A. Panov, Fields of fractions of quantum solvable algebras, J. Algebra 236 (2001), 110-121
[34] A. Panov, Stratification of prime spectrum of quantum solvable algebras, Comm. Algebra 29(9)
(2001), 3801-3827
[35] A. Panov, Quantum solvable algebras. Ideals and representations at roots of 1, Transformation
Groups 7, no. 4, (2002) 379-402
[36] N. Yu. Reshetikhin, L. A. Takhtadzhyan, and L. D. Fadeev, Quantization of Lie Groups
and Lie Algebras, Leningrad Math J. 1 (1990), 193-225
[37] L. H. Rowen, Ring Theory, Volumes I and II, Academic Press, Boston, 1988
[38] S. P. Smith, Quantum groups: An introduction and survey for ring theorists, in Noncommutative
Rings (S. Montgomery and L. W. Small, eds.), pp131-178, MSRI Publ. 24, Springer-Verlag, Berlin
(1992)
[39] R. P. Stanley, Enumerative Combinatorics, Vol. I, Wadsworth & Brooks/Cole, Monterey, CA,
Department of Mathematics, University of California, Santa Barbara, California
93106
E-mail address : heidi@softerhardware.com
	1. Introduction
	2. Higher q-Skew -Derivations
	3. The -Derivation Removing Homomorphism
	4. Main Theorem
	5. Examples
	5.1. The coordinate ring of odd-dimensional quantum Euclidean space; Oq (o k2n+1)
	5.2. The multiparameter quantized Weyl algebras; AnQ, (k)
	5.3. The multiparameter coordinate ring of quantum n n matrices; O, bold0mu mumu pppppp(to.Mn(k))to.
	5.4. The algebra Kn, P, Q (k), which generalizes the coordinate rings of even-dimensional quantum Euclidean space and quantum symplectic space
	6. Prime Factor Localizations
	acknowledgments
	References
ABSTRACT
  For k a field of arbitrary characteristic, and R a k-algebra, we show that
the PI degree of an iterated skew polynomial ring
R[x_1;\tau_1,\delta_1]...b[x_n;\tau_n,\delta_n] agrees with the PI degree of
R[x_1;\tau_1]...b[x_n;\tau_n] when each (\tau_i,\delta_i) satisfies a q_i-skew
relation for q_i \in k^{\times} and extends to a higher q_i-skew
\tau_i-derivation. We confirm the quantum Gel'fand-Kirillov conjecture for
various quantized coordinate rings, and calculate their PI degrees. We extend
these results to completely prime factor algebras.

<|endoftext|><|startoftext|>
Semi-spheroidal Quantum Harmonic Oscillator
D. N. Poenaru,1, 2, ∗ R. A. Gherghescu,1, 2 A. V. Solov’yov,1 and W. Greiner1
1Frankfurt Institute for Advanced Studies, J. W. Goethe Universität,
Max-von-Laue-Str. 1, D-60438 Frankfurt am Main, Germany
2 Horia Hulubei National Institute of Physics and Nuclear Engineering (IFIN-HH),
P.O. Box MG-6, RO-077125 Bucharest-Magurele, Romania
(Dated: November 15, 2018)
A new single-particle shell model is derived by solving the Schrödinger equation for a semi-
spheroidal potential well. Only the negative parity states of the Z(z) component of the wave function
are allowed, so that new magic numbers are obtained for oblate semi-spheroids, semi-sphere and
prolate semi-spheroids. The semi-spherical magic numbers are identical with those obtained at the
oblate spheroidal superdeformed shape: 2, 6, 14, 26, 44, 68, 100, 140, ... The superdeformed prolate
magic numbers of the semi-spheroidal shape are identical with those obtained at the spherical shape
of the spheroidal harmonic oscillator: 2, 8, 20, 40, 70, 112, 168 ...
PACS numbers: 03.65.Ge, 21.10.Pc, 31.10.+z,
The spheroidal harmonic oscillator have been used in
various branches of Physics. Of particular interest was
the famous single-particle Nilsson model [1] very success-
ful in Nuclear Physics and its variants [2, 3, 4] for atomic
clusters. Major spherical-shells N = 2, 8, 20, 40, 58, 92
have been found [2] in the mass spectra of sodium clus-
ters of N atoms per cluster, and the Clemenger’s shell
model [3] was able to explain this sequence of spherical
magic numbers.
In the present paper we would like to write explicitly
the analytical relationships for the energy levels of the
spheroidal harmonic oscillator and to derive the corre-
sponding solutions for a semi-spheroidal harmonic oscil-
lator which may be useful to study atomic cluster de-
posited on planar surfaces.
For spheroidal equipotential surfaces, generated by a
potential with cylindrical symmetry the states of the va-
lence electrons were found [3] by using an effective single-
particle Hamiltonian with a potential
Mω20R
2 + δ
2 + δ
In order to get analytical solutions we shall neglect an
additional term proportional to (l2 − 〈l2〉n). We plan to
include in the future such a term which needs a numerical
solution. K. L. Clemenger introduced the deformation δ
by expressing the dimensionless two semiaxes (in units
of the radius of a sphere with the same volume, R0 =
1/3, where rs is the Wigner-Seitz radius, 2.117 Å for
Na [5, 6]) as
2 + δ
; c =
2 + δ
The spheroid surface equation in dimensionless cylindri-
cal coordinates ρ and z is given by
= 1 (3)
where a is the minor (major) semiaxis for prolate (oblate)
spheroid and c is the major (minor) semiaxis for prolate
(oblate) spheroid. Volume conservation leads to a2c = 1.
One can separate the variables in the Schrödinger
equation, HΨ = EΨ, written in cylindrical coordinates.
As a result the wave function [7, 8] may be written as
Ψ(η, ξ, ϕ) = ψmnr (η)Φm(ϕ)Znz (ξ) (4)
where each component of the wave function is ortonor-
malized leading to
Φm(ϕ) = e
2π (5)
ψ(η) = Nmnrη
|m|/2e−η/2L
nr (η)
Nmnr =
α⊥(nr+|m|)!
in which η = R20ρ
2/α2⊥ and the quantum numbers m =
(n⊥−2i) with i = 0, 1, ... up to (n⊥−1)/2 for an odd n⊥
or to (n⊥− 2)/2 for an even n⊥. Lmn (x) is the associated
Laguerre polynomial and the constant α⊥ =
h̄/Mω⊥
has the dimension of a length.
Znz(ξ) = Nnze
−ξ2Hnz (ξ)
Nnz =
π2nznz!)
1/2 (7)
where ξ = R0z/αz, αz =
h̄/Mωz, and the main quan-
tum number n = n⊥ + nz = 0, 1, 2, ....
The eigenvalues are
En = h̄ω⊥(n⊥ + 1) + h̄ωz(nz + 1/2) (8)
The parity of the Hermite polynomials Hnz (ξ) is given
by (−1)nz meaning that the even order Hermite poly-
nomials are even functions H2nz(−ξ) = H2nz (ξ) and
the odd order Hermite polynomials are odd functions
H2nz+1(−ξ) = −H2nz+1(ξ). There is a recurrence rela-
tionship 2zHn = Hn+1+2nHn−1. One hasH0 = 1, H1 =
http://arxiv.org/abs/0704.0847v1
-0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8
(spheroidal deformation)
-0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8
(spheroidal deformation)
FIG. 1: LEFT: Spheroidal harmonic oscillator energy levels in units of h̄ω0 vs. the deformation parameter δ. Only 6 major
shells (N = 0, 1, 2, ..., 5) have been considered. Each level is labeled by n, n⊥ quantum numbers and is (2n⊥+2)-fold degenerate.
The labels are 0, 0; 1, 0, 1, 1; 2, 0, 2, 1, 2, 2; 3, 0, 3, 1, 3, 2, 3, 3; 4, 0, 4, 1, 4, 2, 4, 3, 4, 4, etc. RIGHT: Semi-spheroidal harmonic
oscillator energy levels in units of h̄ω0 vs. the deformation coordinate δ. Only 9 major shells (N = 0, 1, 2, ..., 8) have been
considered. Each level is labeled by n, n⊥ quantum numbers (with nz = n− n⊥ = 1, 3, 5, ... and is (2n⊥ + 2)-fold degenerate.
The labels are 1, 0; 2, 1; 3, 2, 3, 0; 4, 3, 4, 1; 5, 4, 5, 2, 5, 0; 6, 5, 6, 3, 6, 1, etc. The semi-spherical magic numbers are identical
with those obtained at the oblate spheroidal superdeformed shape (δ = −2/3): 2, 6, 14, 26, 44, 68, 100, 140, ...
2z, H2 = 4z
2−2, H3 = 8z3−12z, H4 = 16z4−48z2+12,
H5 = 32z
5 − 160z3 + 120z, etc.
In units of h̄ω0 the eigenvalues, ǫ = E/(h̄ω0), are given
(2− δ)1/3(2 + δ)2/3
For a prolate spheroid, δ > 0, at n⊥ = 0 the energy level
decreases with deformation except for n = 0, but when
n⊥ = n it increases. For a given prolate deformation and
z0=0.5
z1=1.5
z2=2.5
z3=3.5
z4=4.5
z5=5.5
FIG. 2: LEFT: Harmonic oscillator potential V = V (ξ), the
wave functions Znz = Znz (ξ) for nz = 0, 1, 2, 3, 4, 5 and the
corresponding contributions to the total energy levels ǫz nz =
Enz/h̄ωz = (nz + 1/2) for spherical shapes, δ = 0. ξ =
h̄/Mωz. RIGHT: The similar functions for a semi-
spherical harmonic oscillator potential. Only negative parity
states are retained which are vanishing at ξ = 0 where the
potential wall is infinitely high .
a maximum energy ǫm, there are nmin closed shells and
other levels for high-order shells up to nmax:
nmin =
(2− δ)1/3(2 + δ)2/3ǫm −
2 + δ
nmax =
(2− δ)1/3(2 + δ)2/3ǫm −
2− δ (11)
and similar formulae for oblate deformations, δ < 0. The
low lying energy levels for the six shells (main quantum
number n = 0, 1, 2, 3, 4, 5) can be seen in figure 1. Each
level, labelled by n⊥, n, may accomodate 2n⊥ + 2 parti-
cles. One has 2
(n⊥+1) = (n+1)(n+2) nucleons
in a completely filled shell charcterized by n, and the
total number of states of the low-lying n + 1 shells is
n=0(n+1)(n+2) = (n+1)(n+2)(n+3)/3 leading to
the magic numbers 2, 8, 20, 40, 70, 112, 168... for a spheri-
cal shape. Besides the important degeneracy at a spher-
ical shape (δ = 0), one also have degeneracies at some
superdeformed shapes, e.g. for prolate shapes at the ra-
tio c/a = (2 + δ)/(2 − δ) = 2 i.e. δ = 2/3. More details
may be found in the Table I. The first five shells can
reproduce the experimental magic numbers mentioned
above; in order to describe the other shells Clemenger
introduced the term proportional to (l2 − 〈l2〉n).
Let us consider a particular shape (half of an oblate or
prolate spheroid) of a semi-spheroidal cluster deposited
on a surface with the z axis perpendicular on the sur-
face and the ρ axis in the surface plane. Then the semi-
20 40 60 80 100 120 140
20 40 60 80 100 120 140
20 40 60 80 100 120 140 160
20 40 60 80 100 120 140
= - 1
= - 2/3
= - 0.4
= - 0.8/3
= 2/3
= 0.4
= 0.8/3
FIG. 3: Variation of shell corrections with N for Na clusters. TOP: δ = 0. The semi-spherical magic numbers are identical
with those obtained at the oblate spheroidal superdeformed shape: 2, 6, 14, 26, 44, 68, 100, 140, ... For a prolate superdeformed
(δ = 2/3) shape the magic numbers are identical with those obtained at the spherical shape: 2, 8, 20, 40, 70, 112, 168, ...
Other magic numbers are given in Table I. Oblate and prolate shapes are considered on the left-hand side and right-hand side,
respectively.
spheroidal surface equation is given by
(a/c)2(c2 − z2) z ≥ 0
0 z < 0
The radius of the semi-sphere obtained for the defor-
mation δ = 0 is Rs, given by the volume conservation,
(1/2)(4πR3s/3) = 4πR
0/3, leading to Rs = 2
1/3R0. We
shall give ρ, z, a, c in units of Rs instead of R0. Accord-
ing to the volume conservation, a2cR3s/2 = R
0 so that
a2c = 1. Other kind of shapes obtained from a spheroid
by removing less or more than its half (as in the liquid
drop calculations [9]) will be considered in the future; in
this case it is not possible to obtain analytical solutions.
The new potential well we have to consider in order to
solve the quantum mechanical problem is shown in the
right-hand side of the figure 2. The potential along the
symmetry axis, Vz(z), has a wall of an infinitely large
height at z = 0, and concerns only positive values of z
∞ z = 0
MR2sω
2/2 z ≥ 0 (13)
In this case the wave functions should vanish in the
origin, where the potential wall is infinitely high, so
that only negative parity Hermite polynomials (nz odd)
should be taken into consideration.
From the energy levels given in figure 1 we have to
select only those corresponding to this condition. In this
way the former lowest level with n = 0, n⊥ = 0 should
be excluded. From the two leveles with n = 1 we can
retain the level with n⊥ = 0 i.e. nz = 1. This will be
the lowest level for the semi-spherical harmonic oscillator
and will accomodate 2n⊥+2 = 2 atoms. From the three
levels with n = 2 only the one with nz = n⊥ = 1 with
2n⊥ + 2 = 4 degeneracy is retained so that the first two
magic numbers at spherical shape (δ = 0) are now 2
followed by 6, etc. Some deformed magic numbers may
be found in the Table I and as position of minima in
Fig. 3.
Each level, labelled by n⊥, n, may accomodate 2n⊥+2
particles. When n is an odd number, one should only
have even n⊥ in order to select the odd nz = n − n⊥.
The contribution of the shells with odd n to the semi-
spherical magic numbers will be
neven
(2n⊥ + 2) =
(n+ 1)2
leading to the sequence 2, 8, 18 for n = 1, 3, 5. The con-
tribution of the shells with even n to the semi-spherical
TABLE I: TOP: Deformed magic numbers of the spheroidal harmonic oscillator.
BOTTOM: Deformed magic numbers of the semi-spheroidal harmonic oscillator.
OBLATE PROLATE
δ a/c Magic numbers δ a/c Magic numbers
−0.8/3 17/13 2, 8, 18, 20, 34, 38, 58, 64, 92,
100, 136, 148, ...
0.8/3 13/17 2, 8, 20, 22, 42, 46, 76, 82, 124,
134 ...
−0.4 1.5 2, 6, 8, 14, 18, 28, 34, 48, 58, 76,
90, 114, 132, ...
0.4 2/3 2, 8, 10, 22, 26, 46, 54, 66, 84,
96, 114, 138, 156, ...
−2/3 2 2, 6, 14, 26, 44, 68, 100, 140, ... 2/3 0.5 2, 4, 10, 16, 28, 40, 60, 80, 110,
140, ...
−1 3 2, 6, 12, 22, 36, 54, 78, 108, 144,
1 1/3 4, 12, 18, 24, 36, 48, 60, 80, 100,
120, 150, ...
−0.8/3 17/13 2, 6, 12, 22, 26, 36, 42, 56, 64,
82, 92, 114, 126, 154, ...
0.8/3 13/17 2, 6, 8, 14, 18, 28, 34, 48, 58, 76,
90, 114, 132, ...
−0.4 1.5 2, 6, 12, 22, 36, 54, 78, 108, 144,
0.4 2/3 2, 8, 18, 20, 34, 38, 50, 58, 64,
80, 92, 100, ...
−2/3 2 2, 6, 12, 20, 32, 48, 68, 92, 122,
158, ...
2/3 0.5 2, 8, 20, 40, 70, 112, 168, ...
−1 3 2, 6, 20, 30, 42, 58, 78, 102, 130,
1 1/3 2, 8, 10, 14, 22, 26, 46, 54, 66,
84, 96, 114, 138, 156, ...
magic numbers will be
(2n⊥ + 2) =
n(n+ 2)
which gives the sequence 4, 12, 24 for n = 2, 4, 6. This
should be interlaced with the preceding one so that the
magic numbers will be 2, 2+4 = 6, 6+8 = 14, 14+12 =
26, 26+18 = 44, 44+24 = 68, as shown at the right-hand
side of the Fig. 1.
The equation (9) from the harmonic oscillator, in units
of h̄ω0 is still valid, but one should only allow the values
of n and n⊥ for which nz = n−n⊥ ≥ 1 are odd numbers.
The ortonormalization condition of the Znz component
of the wave function became
Zn′z(z)Znz(z)dz = δn′znz (16)
with nz = 1, 3, 5, ..., n for odd n and nz = 1, 3, 5, ..., n− 1
for even n. Consequently the normalization factor is
times the preceding one
Znz(ξ) =
2Nnze
−ξ2Hnz(ξ)
Nnz =
π2nznz!)
1/2 (17)
For a nucleus with mass number A the shell gap is
given by h̄ω00 = 41A
1/3 MeV. For an atomic cluster [10]
the single-particle shell gap is given by
h̄ω0(N) =
13.72 eV Å
rsN1/3
which is 3.0613N−1/3 eV in case of Na clusters. Since we
consider solely monovalent elements, N in this eq. is the
number of atoms and t denotes the electronic spillout for
the neutral cluster according to [10].
The shell correction energy, δU [11], in figure 3 shows
minima at the oblate and prolate magic numbers given
in the lower part of the table I. The striking result is
that the superdeformed prolate magic numbers of the
semi-spheroidal shape are identical with those obtained
at the spherical shape of the spheroidal harmonic oscil-
lator. We expect that this kind of symmetry will not
be present anylonger for the Hamiltonian including the
term proportional to (l2−〈l2〉n) and/or the more complex
equipotential surface we shall study in the future.
∗ poenaru@fias.uni-frankfurt.de
[1] S. G. Nilsson, Det Kongelige Danske Videnskabernes Sel-
skab (Dan. Mat. Fys. Medd.) 29 (1955).
[2] W. D. Knight, K. Clemenger, W. A. de Heer, W. A.
Saunders, M. Y. Chou, and M. L. Cohen, Phys. Rev.
Lett. 52, 2141 (1984).
[3] K. L. Clemenger, Phys. Rev. B 32, 1359 (1985).
[4] S. M. Reimann, M. Brack, and K. Hansen, Z. Phys. D
28, 235 (1993).
[5] M. Brack, Phys. Rev. B 39, 3533 (1989).
[6] C. Yannouleas and U. Landman, Phys. Rev. B 51, 1902
(1995).
[7] A. J. Rassey, Phys. Rev. 109, 949 (1958).
[8] D. Vautherin, Phys. Rev. C 7, 296 (1973).
[9] V. V. Semenikhina, A. G. Lyalin, A. V. Solov’yov, and
W. Greiner, to be published (2007).
[10] K. L. Clemenger, Ph. D. Dissertation (1985), University
of California, Berkeley.
[11] V. M. Strutinsky, Nuclear Physics, A 95, 420 (1967).
mailto:poenaru@fias.uni-frankfurt.de
ABSTRACT
  A new single-particle shell model is derived by solving the Schr\"odinger
equation for a semi-spheroidal potential well. Only the negative parity states
of the $Z(z)$ component of the wave function are allowed, so that new magic
numbers are obtained for oblate semi-spheroids, semi-sphere and prolate
semi-spheroids. The semi-spherical magic numbers are identical with those
obtained at the oblate spheroidal superdeformed shape: 2, 6, 14, 26, 44, 68,
100, 140, ... The superdeformed prolate magic numbers of the semi-spheroidal
shape are identical with those obtained at the spherical shape of the
spheroidal harmonic oscillator: 2, 8, 20, 40, 70, 112, 168 ...

<|endoftext|><|startoftext|>
Introduction to mathematics of quasicrys-
tals, edited by M. V. Jaric (Academic Press, Boston,
1989), pp. 53–80.
[6] S. Dworkin and J.-I. Shieh, Commun. math. Phys. 168,
337 (1995).
[7] R. Penrose, Bull. Inst. Math. and Its Appl. 10, 266
(1974).
[8] M. V. Jaric and M. Ronchetti, Phys. Rev. Lett. 62, 1209
(1989).
[9] G. van Ophuysen, M. Weber, and L. Danzer, J. Phys. A
28, 281 (1995).
[10] J. E. S. Socolar, in Quasicrystals, The State of Art (2nd
Ed.), edited by D. DiVincenzo and P. J. Steinhardt
(World Scientific, Singapore, 1999), pp. 225–250.
[11] M. Gardner, Sci. Am. 236, 110 (1977).
[12] H.-C. Jeong and E. D. Williams, Surface Science Reports
34, 171 (1999).
[13] R. Penrose, Emperor’s New Mind (Oxford University
Press, New York, 2002), p. 640ff.
[14] T. Dotera, H.-C. Jeong, and P. J. Steinhardt, in Methods
of structural analysis of modulated structures and qua-
sicrystals, edited by et. al.. J. M. Perez-Mato (World
Scientific, Singapore, 1991), pp. 660–663.
[15] K. Ingersent, Ph.D. thesis, University of Pennsylvania,
1990.
[16] We introduce the concept of sticky sites as a mathemat-
ical devise to avoid the mistakes of copying a tile in a
flipped worm but it may mimic the real growth process of
quasicrystals. Recently, Fournée et al. observed “traps”
or sticky sites on which adatoms are easily captured for
some quasicrystal surfaces [21].
[17] Covering of all ends of the semi-infinite worms further
ensures that all sticky sites in the decapod tiling stay
outside of the semi-infinite worms.
[18] From the Fig 2(b), one can see that an active decapod
tiling is obtained by flipping a semi-infinite worm in a
cartwheel tiling while flipping a half of an infinite worm
results in an inactive decapod tiling.
[19] The requirement of the underneath tiles is added to en-
sure that the nucleation happens from the third layer.
In real growth, this requirement may not be needed. Nu-
cleation is likely to happen only on the top layer (hence
from the third layer) which can be large enough to wait
the slow necleation process.
[20] The sticky sites (Fig. 2(a)) are formed only on a compact
cluster of tiles. Therefore, the vertical growth by Rule-V,
which produces isolated tiles, cannot be continue more
than one layer height without nucleation process.
[21] V. Fournée et al., Phys. Rev. B 67, 033406 (2003).
http://arxiv.org/abs/cond-mat/9903074
ABSTRACT
  A local growth algorithm for a decagonal quasicrystal is presented. We show
that a perfect Penrose tiling (PPT) layer can be grown on a decapod tiling
layer by a three dimensional (3D) local rule growth. Once a PPT layer begins to
form on the upper layer, successive 2D PPT layers can be added on top resulting
in a perfect decagonal quasicrystalline structure in bulk with a point defect
only on the bottom surface layer. Our growth rule shows that an ideal
quasicrystal structure can be constructed by a local growth algorithm in 3D,
contrary to the necessity of non-local information for a 2D PPT growth.

<|endoftext|><|startoftext|>
Introduction
Cosmological models representing the early stages of the Universe have been
studied by several authors. An LRS (Locally Rotationally Symmetric) Binachi
type-V spatially homogeneous space-time creates more interest due to its richer
structure both physically and geometrically than the standard perfect fluid FRW
models. An LRS Bianchi type-V universe is a simple generalization of the
Robertson-Walker metric with negative curvature. Most cosmological models
assume that the matter in the universe can be described by ’dust’ (a pressure-
less distribution) or at best a perfect fluid. However, bulk viscosity is expected
to play an important role at certain stages of expanding universe [1]−[3]. It has
been shown that bulk viscosity leads to inflationary like solution [4] and acts
like a negative energy field in an expanding universe [5]. Furthermore, there
are several processes which are expected to give rise to viscous effects. These
are the decoupling of neutrinos during the radiation era and the decoupling of
1Corresponding Author
http://arxiv.org/abs/0704.0849v2
radiation and matter during the recombination era. Bulk viscosity is associ-
ated with the Grand Unification Theories (GUT) phase transition and string
creation. Thus, we should consider the presence of a material distribution other
than a perfect fluid to have realistic cosmological models (see Grøn [6] for a
review on cosmological models with bulk viscosity). A number of authors have
discussed cosmological solutions with bulk viscosity in various context [7]−[9].
Models with a relic cosmological constant Λ have received considerable at-
tention recently among researchers for various reasons (see Refs.[10]−[14] and
references therein). Some of the recent discussions on the cosmological constant
“problem” and consequence on cosmology with a time-varying cosmological con-
stant by Ratra and Peebles [15], Dolgov [16]−[18] and Sahni and Starobinsky
[19] have pointed out that in the absence of any interaction with matter or radi-
ation, the cosmological constant remains a “constant”. However, in the presence
of interactions with matter or radiation, a solution of Einstein equations and
the assumed equation of covariant conservation of stress-energy with a time-
varying Λ can be found. For these solutions, conservation of energy requires
decrease in the energy density of the vacuum component to be compensated by
a corresponding increase in the energy density of matter or radiation. Earlier
researchers on this topic, are contained in Zeldovich [20], Weinberg [11] and
Carroll, Press and Turner [21]. Recent observations by Perlmutter et al. [22]
and Riess et al. [23] strongly favour a significant and positive value of Λ. Their
finding arise from the study of more than 50 type Ia supernovae with redshifts
in the range 0.10 ≤ z ≤ 0.83 and these suggest Friedmann models with negative
pressure matter such as a cosmological constant (Λ), domain walls or cosmic
strings (Vilenkin [24], Garnavich et al. [25]) Recently, Carmeli and Kuzmenko
[26] have shown that the cosmological relativistic theory (Behar and Carmeli
[27]) predicts the value for cosmological constant Λ = 1.934 × 10−35s−2. This
value of “Λ” is in excellent agreement with the measurements recently obtained
by the High-Z Supernova Team and Supernova Cosmological Project (Garnavich
et al. [25], Perlmutter et al. [22], Riess et al. [23], Schmidt et al. [28]). The
main conclusion of these observations is that the expansion of the universe is
accelerating.
Several ansätz have been proposed in which the Λ term decays with time
(see Refs. Gasperini [29, 30], Berman [31], Freese et al. [14], Özer and Taha [14],
Peebles and Ratra [32], Chen and Hu [33], Abdussattar and Viswakarma [34],
Gariel and Le Denmat [35], Pradhan et al. [36]). Of the special interest is the
ansätz Λ ∝ S−2 (where S is the scale factor of the Robertson-Walker metric)
by Chen and Wu [33], which has been considered/modified by several authors
( Abdel-Rahaman [37], Carvalho et al. [14], Waga [38], Silveira and Waga [39],
Vishwakarma [40]).
Recently Bali and Yadav [41] obtained an LRS Bianchi type-V viscous fluid
cosmological models in general relativity. Motivated by the situations discussed
above, in this paper, we focus upon the exact solutions of Einstein’s field equa-
tions in presence of a bulk viscous fluid in an expanding universe. We do this
by extending the work of Bali and Yadav [41] by including a time dependent
cosmological term Λ in the field equations. We have also assumed the coefficient
of bulk viscosity to be a power function of mass density. This paper is organized
as follows. The metric and the field equations are presented in section 2. In
section 3 we deal with the solution of the field equations in presence of viscous
fluid. The sections 3.1 and 3.2 contain the two different cases and also con-
tain some physical aspects of these models respectively. Section 4 describe two
models under suitable transformations. Finally in section 5 concluding remarks
have been given.
2 The Metric and Field Euations
We consider LRS Bianchi type-V metric in the form
ds2 = −dt2 +A2dx2 +B2e2x(dy2 + dz2), (1)
where A and B are functions of t alone.
The Einstein’s field equations (in gravitational units c = 1, G = 1) read as
i + Λg
i = −8πT
i , (2)
where R
i is the Ricci tensor; R = g
ijRij is the Ricci scalar; and T
i is the stress
energy-tensor in the presence of bulk stress given by
i = (ρ+ p)viv
j + pg
i − (v
i; + v
;i + v
jvℓvi;ℓ + viv
ξ − 2
vℓ;ℓ(g
i + viv
j). (3)
Here ρ, p, η and ξ are the energy density, isotropic pressure, coefficients of
shear viscosity and bulk viscous coefficient respectively and vi the flow vector
satisfying the relations
ivj = −1. (4)
The semicolon (; ) indicates covariant differentiation. We choose the coordinates
to be comoving, so that vi = δi4.
The Einstein’s field equations (2) for the line element (1) has been set up as
= −8π
p− 2ηA4
ξ − 2
− Λ, (5)
= −8π
p− 2η
− Λ, (6)
2A4B4
= −8πρ− Λ, (7)
= 0. (8)
The suffix 4 after the symbols A, B denotes ordinary differentiation with respect
to t and
θ = vℓ;ℓ
3 Solutions of the Field Eqations
In this section, we have revisited the solutions obtained by Bali and Yadav [41].
Equations (5) - (8) are four independent equations in seven unknowns A, B, p,
ρ, ξ, η and Λ. For complete determinacy of the system, we need three extra
conditions.
Eq. (8), after integration, reduce to
A = Bk, (9)
where k is an integrating constant. Equations (5) and (6) lead to
− A44
− A4B4
= −16πη
. (10)
Using Eq. (9) in (10), we obtain
k + 1
f = −16πη, (11)
where B4 = f(B). Eq. (11) leads to
f = − 16πη
(k + 2)
, (12)
where L is an integrating constant. Eq. (12) again leads to
B = (k + 2)
k1 − k2e−16πηt
k+2 , (13)
where
, (14)
, (15)
N being constant of integration. From Eqs. (9) and (13), we obtain
A = (k + 2)
k1 − k2e−16πηt
k+2 . (16)
Hence the metric (1) reduces to the form
ds2 = −dt2 + (k + 2)
k1 − k2e−16πηt
k+2 dx2
+ e2x(k + 2)
k1 − k2e−16πηt
k+2 (dy2 + dz2). (17)
The pressure and density of the model (17) are obtained as
8πp =
(8π)(16πη)k2e
−16πηt
3(k + 2)2(k1 − k2e−16πηt)2
k1(k + 2)
2(4η + 3ξ)− {k2(4η + 3ξ)
+4k(η+3ξ)+2(5η+6ξ)}k2e−16πηt
[(k + 2)(k1 − k2e−16πηt)]
−Λ, (18)
8πρ = − (2k + 1)
(k + 2)2
(16πη)2k22
e−32πηt
(k1 − k2e−16πηt)2
[(k + 2)(k1 − k2e−16πηt)]
+ Λ. (19)
The expansion θ in the model (17) is obtained as
(16πη)k2e
−16πηt
(k1 − k2e−16πηt)
. (20)
For complete determinacy of the system we have to consider three extra condi-
tions. Firstly we assume that the coefficient of shear viscosity is constant, i.e.,
η = η0 (say). For the specification of Λ(t), we secondly assume that the fluid
obeys an equation of state of the form
p = γρ, (21)
where γ(0 ≤ γ ≤ 1) is a constant.
Thirdly bulk viscosity (ξ) is assumed to be a simple power function of the
energy density [42]−[45].
ξ(t) = ξ0ρ
n, (22)
where ξ0 and n are constants. For small density, n may even be equal to unity
as used in Murphy’s work [46] for simplicity. If n = 1, Eq. (22) may correspond
to a radiative fluid [47]. Near the big bang, 0 ≤ n ≤ 1
is a more appropriate
assumption [48] to obtain realistic models.
For simplicity and realistic models of physical importance, we consider the
following two cases (n = 0, 1):
3.1 Model I: Solution for n = 0
When n = 0, Eq. (22) reduces to ξ = ξ0 = constant. Hence, in this case Eqs.
(18) and (19), with the use of (21), lead to
8π(1 + γ)ρ =
k1(k + 2)
2(4η0 + 3ξ0)− {k2(4η0 + 3ξ0)
+ 4k(η0 + 3ξ0) + 2(5η0 + 6ξ0)}k2e−16πη0t
− (2k + 1)M
. (23)
Eliminating ρ(t) between Eqs. (19) and (23), we obtain
(1 + γ)Λ =
k1(k + 2)
2(4η0 + 3ξ0)− {k2(4η0 + 3ξ0)
+ 4k(η0 + 3ξ0) + 2(5η0 + 6ξ0)}k2e−16πη0t
+ (2k + 1)γ
(1− 3γ)
, (24)
where
M = 16πk2η0e
−16πη0t,
N = (k + 2)(k1 − k2e−16πη0t),
P = 2k2 + 2k + 5,
Q = k2 + 4k + 4. (25)
3.2 Model II: Solution for n = 1
When n = 1, Eq. (22) reduces to ξ = ξ0ρ . Hence, in this case Eqs. (18) and
(19), with the use of (21), leads to
8πρ =
16πM{2k1(k + 2)2η0 − Pk2η0e−16πη0t}
3 [(1 + γ)N2 −M{k1(k + 2)2ξ0 −Qk2ξ0e−16πη0t}]
k+2 − (2k + 1)M2
[(1 + γ)N2 −M{k1(k + 2)2ξ0 −Qk2ξ0e−16πη0t}]
. (26)
Eliminating ρ(t) between Eqs. (19) and (26), we get
Λ = 16πM [2k1(k + 2)
2η0 − Pk2η0e−16πη0t] +
γ(2k + 1)
(1 + γ)
(1− 3γ)
(1 + γ)N
M [k1(k + 2)
2ξ0 −Qk2ξ0e−16πη0t]{4N
k+2 − (2k + 1)M2}
(1 + γ)N2 [(1 + γ)N2 −M{k1(k + 2)2ξ0 −Qk2ξ0e−16πη0t}]
From Eqs. (23) and (26), we note that ρ(t) is a decreasing function of time
and ρ > 0 for all time in both models. The behaviour of the universe in these
models will be determined by the cosmological term Λ; this term has the same
effect as a uniform mass density ρeff = −Λ/4πG, which is constant in space
and time. A positive value of Λ corresponds to a negative effective mass density
(repulsion). Hence, we expect that in the universe with a positive value of Λ,
the expansion will tend to accelerate; whereas in the universe with negative
value of Λ, the expansion will slow down, stop and reverse. From Eqs. (24) and
(27), we observe that the cosmological term Λ in both models is a decreasing
function of time and it approaches a small positive value as time increase more
and more. This is a good agreement with recent observations of supernovae Ia
(Garnavich et al. [25], Perlmutter et al. [22], Riess et al. [23], Schmidt et al.
[28]).
The shear σ in the model (17) is given by
(k − 1)M√
. (28)
The non-vanishing components of conformal curvature tensor are given by
C2323 = −C1414 =
(k − 1)M
[kM − 16πη0k1(k + 2)], (29)
C1313 = −C2424 =
(k − 1)M
[16πη0k1(k + 2)− kM ], (30)
C1212 = −C3434 =
(k − 1)M
[16πη0k1(k + 2)− kM ]. (31)
Equations (20) and (28) lead to
(k − 1)√
3(k + 2)
= constant. (32)
The model (17) is expanding, non-rotating and shearing. Since σ
= conatant,
hence the model does not approach isotropy. The space-time (17) is Petrov type
D in presence of viscosity.
4 Other Models
After using the transformation
k1 − k2e−16πηt = sin (16πητ), k + 2 = 1/16πη, (33)
the metric (17) reduces to
ds2 = −
cos (16πητ)
k1 − sin (16πητ)
dτ2 +
sin (16πητ)
]2(1−32πη)
+ e2x
sin (16πητ)
](32πη)
(dy2 + dz2). (34)
The pressure (p), density (ρ) and the expansion (θ) of the model (34) are ob-
tained as
8πp =
(16πη)2{k1 − sin (16πητ)
3 sin2 (16πητ)
2k1−2(1−48πη+1152π2η2){k1−sin (16πητ)}
(16πη)(8πξ){k1 − sin (16πητ)}
sin (16πητ)
sin (16πητ)
]2(1−32πη)
− Λ, (35)
8πρ =
2(24πη − 1)(16πη)3{k1 − sin (16πητ)}2
sin2 (16πητ)
sin (16πητ)
]2(1−32πη)
(16πη){k1 − sin (16πητ)}
sin (16πητ)
. (37)
4.1 Model I: Solution for n = 0
When n = 0, Eq. (22) reduces to ξ = ξ0 = constant. Hence, in this case Eqs.
(35) and (36), with the use of (21), lead to
8π(1 + γ)ρ =
2(16πη0)
3 sin2(16πη0τ)
[k1 − P1M1] +
(16πη0)(8πξ0)M1
sin(16πη0τ)
+ 4N1 +
2(24πη0 − 1)(16πη0)3M21
sin2(16πη0τ)
. (38)
Eliminating ρ(t) between Eqs. (36) and (38), we obtain
(1 + γ)Λ =
2(16πη0)
3 sin2(16πη0τ)
[k1 − P1M1] +
(16πη0)(8πξ0)M1
sin(16πη0τ)
+ (1− 3γ)N1 +
2γ(24πη0 − 1)(16πη0)3M21
sin2(16πη0τ)
. (39)
4.2 Model II: Solution for n = 1
When n = 1, Eq. (22) reduces to ξ = ξ0ρ . Hence, in this case Eqs. (35) and
(36), with the use of (21), lead to
8πρ =
2(16πη0)
2M1[(k1 − P1M1) + 3(24πη0 − 1)(16πη0)M1]
3 sin(16πη0τ)[(1 + γ) sin(16πη0τ) − 16πη0ξ0M1]
4N1 sin(16πη0τ)
[(1 + γ) sin(16πη0τ) − 16πη0ξ0M1]
. (40)
Eliminating ρ(t) between Eqs. (36) and (40), we obtain
2(16πη0)
2M1(k1 − P1M1)
3 sin(16πη0τ)[(1 + γ) sin(16πη0τ)− 16πη0ξ0M1]
[3(16πη0ξ0)M1 + (1− 3γ) sin(16πη0τ)]
[(1 + γ) sin(16πη0τ)− 16πη0ξ0M1]
2(24πη0 − 1)(16πη0)3M21 [γ(1 + γ) sin(16πη0τ) − (1− γ)(16πη0ξ0)M1]
(1 + γ) sin2(16πη0τ)[(1 + γ) sin(16πη0τ)− (16πη0ξ0)M1]
, (41)
where
M1 = k1 − sin(16πη0τ),
16πη0
sin (16πη0τ)
]2(1−32πη)
P1 = 1− 48πη0 + 1152π2η20 . (42)
The shear (σ) in the model (34) is obtained as
(1− 48πη0)(16πη0)[k1 − sin(16πη0τ)√
3 sin(16πη0τ)
. (43)
The models descibed in cases 4.1 and 4.2 preserve the same properties as in the
cases of 3.1 and 3.2.
5 Conclusions
We have obtained a new class of LRS Bianchi type-V cosmological models of
the universe in presence of a viscous fluid distribution with a time dependent
cosmological term Λ. We have revisited the solutions obtained by Bali and Ya-
dav [41] and obtained new solutions which also generalize their work.
The cosmological constant is a parameter describing the energy density of
the vacuum (empty space), and a potentially important contribution to the dy-
namical history of the universe. The physical interpretation of the cosmological
constant as vacuum energy is supported by the existence of the “zero point”
energy predicted by quantum mechanics. In quantum mechanics, particle and
antiparticle pairs are consistently being created out of the vacuum. Even though
these particles exist for only a short amount of time before annihilating each
other they do give the vacuum a non-zero potential energy. In general relativity,
all forms of energy should gravitate, including the energy of vacuum, hence the
cosmological constant. A negative cosmological constant adds to the attractive
gravity of matter, therefore universes with a negative cosmological constant are
invariably doomed to re-collapse [49]. A positive cosmological constant resists
the attractive gravity of matter due to its negative pressure. For most universes,
the positive cosmological constant eventually dominates over the attraction of
matter and drives the universe to expand exponentially [50].
The cosmological constants in all models given in Sections 3.1 and 3.2 are
decreasing functions of time and they all approach a small and positive value
at late times which are supported by the results from recent type Ia supernova
observations recently obtained by the High-z Supernova Team and Supernova
Cosmological Project (Garnavich et al. [25], Perlmutter et al. [22], Riess et al.
[23], Schmidt et al. [28]). Thus, with our approach, we obtain a physically rele-
vant decay law for the cosmological term unlike other investigators where adhoc
laws were used to arrive at a mathematical expressions for the decaying vacuum
energy. Our derived models provide a good agreement with the observational
results. We have derived value for the cosmological constant Λ and attempted
to formulate a physical interpretation for it.
Acknowledgements
The authors wish to thank the Harish-Chandra Research Institute, Allahabad,
India, for providing facility where part this work was done. We also thank to
Professor Raj Bali for his fruitful suggestions and comments in the first draft of
the paper.
References
[1] C. W. Misner, Astrophys. J. 151, 431 (1968).
[2] G. F. R. Ellis, In General Relativity and Cosmology, Enrico Fermi Course,
R. K. Sachs. ed. (Academic Press, New York, 1979).
[3] B. L. Hu, In Advance in Astrophysics, eds. L. J. Fung and R. Ruffini,
(World Scientific, Singapore, 1983).
[4] T. Padmanabhan and S. M. Chitre, Phys. Lett. A 120, 433 (1987).
[5] V. B. Johri and R. Sudarshan, Proc. Int. Conf. on Mathematical Modelling
in Science and Technology, L. S. Srinath et al., eds (World Scientific,
Singapore, 1989).
[6] Ø. Grøn, Astrophys. Space Sci. 173, 191 (1990).
[7] A. Pradhan, V. K. Yadav and I. Chakrabarty, Int. J. Mod. Phys. D 10,
339 (2001).
I. Chakrabarty, A. Pradhan and N. N. Saste, Int. J. Mod. Phys. D 10,
741 (2001).
A. Pradhan and I. Aotemashi, Int. J. Mod. Phys. D 11, 1419 (2002).
A. Pradhan and H. R. Pandey, Int. J. Mod. Phys. D 12 , 941 (2003).
[8] L. P. Chimento, A. S. Jakubi and D. Pavon, Class. Quant. Grav. 16, 1625
(1999).
[9] G. P. Singh, S. G. Ghosh and A. Beesham, Aust. J. Phys. 50, 903 (1997).
[10] S. Weinberg, Rev. Mod. Phys. 61, 1 (1989).
[11] S. Weinberg, Gravitation and Cosmology, (Wiley, New York, 1972).
[12] J. A. Frieman and I. Waga, Phys. Rev. D 57, 4642 (1998).
[13] R. Carlberg, et al., Astrophys. J. 462, 32 (1996).
[14] M. Özer and M. O. Taha, Nucl. Phys. B 287, 776 (1987).
K. Freese, F. C. Adams, J. A. Frieman and E. Motta, ibid. B 287, 1797
(1987).
J. C. Carvalho, J. A. S. Lima and I. Waga, Phys. Rev.D 46, 2404 (1992).
V. Silviera and I. Waga, ibid. D 50, 4890 (1994).
[15] B. Ratra and P. J. E. Peebles, Phys. Rev. D 37, 3406 (1988).
[16] A. D. Dolgov, in The Very Early Universe, eds. G. W. Gibbons, S. W.
Hawking and S. T. C. Siklos, (Cambridge Univerity Press, 1983).
[17] A. D. Dolgov, M. V. Sazhin and Ya. B. Zeldovich, Basics of Modern
Cosmology, (Editions Frontiers, 1990).
[18] A. D. Dolgov, Phys. Rev. D 55, 5881 (1997).
[19] V. Sahni and A. Starobinsky, Int. J. Mod. Phys. D 9, 373 (2000).
[20] Ya. B. Zeldovich, Sov. Phys.-Uspekhi 11, 381 (1968).
[21] S. M. Carroll, W. H. Press and E. L. Turner, Ann. Rev. Astron. Astrophys.
30, 499 (1992).
[22] S. Perlmutter et al., Astrophys. J. 483, 565 (1997), Supernova Cosmology
Project Collaboration (astro-ph/9608192);
S. Perlmutter et al., Nature 391, 51 (1998), Supernova Cosmology Project
Collaboration (astro-ph/9712212);
S. Perlmutter et al., Astrophys. J. 517, 565 (1999), Project Collaboration
(astro-ph/9608192).
[23] A. G. Riess et al., Astron. J. 116, 1009 (1998); Hi-Z Supernova Team
Collaboration (astro-ph/9805201).
[24] A. Vilenkin, Phys. Rep. 121, 265 (1985).
[25] P. M. Garnavich et al., Astrophys. J. 493, L53 (1998a), Hi-z Supernova
Team Collaboration (astro-ph/9710123);
P. M. Garnavich et al., Astrophys. J. 509, 74 (1998b); Hi-z Supernova
Team Collaboration (astro-ph/9806396).
[26] M. Carmeli and T. Kuzmenko, Int. J. Theor. Phys. 41, 131 (2002).
[27] S. Behar and M. Carmeli, Int. J. Theor. Phys. 39, 1375 (2002) 1375.
http://arxiv.org/abs/astro-ph/9608192
http://arxiv.org/abs/astro-ph/9712212
http://arxiv.org/abs/astro-ph/9608192
http://arxiv.org/abs/astro-ph/9805201
http://arxiv.org/abs/astro-ph/9710123
http://arxiv.org/abs/astro-ph/9806396
[28] B. P. Schmidt et al., Astrophys. J. 507, 46 (1998), Hi-z Supernova Team
Collaboration (astro-ph/9805200).
[29] M. Gasperini, Phys. Lett. B 194, 347 (1987).
[30] M. Gasperini, Class. Quant. Grav. 5, 521 (1988).
[31] M. S. Berman, Int. J. Theor. Phys. 29, 567 (1990) 567;
M. S. Berman, Int. J. Theor. Phys. 29, 1419 (1990);
M. S. Berman, Phys. Rev. D 43, 75 (1991).
M. S. Berman and M. M. Som, Int. J. Theor. Phys. 29, 1411 (1990).
M. S. Berman, M. M. Som and F. M. Gomide, Gen. Rel. Grav. 21, 287
(1989).
M. S. Berman and F. M. Gomide, Gen. Rel. Grav. 22, 625 (1990).
[32] P. J. E. Peebles and B. Ratra, Astrophys. J. 325, L17 (1988).
[33] W. Chen and Y. S. Wu, Phys. Rev. D 41, 695 (1990).
[34] Abdussattar and R. G. Vishwakarma, Pramana J. Phys. 47, 41 (1996).
[35] J. Gariel and G. Le Denmat, Class. Quant. Grav. 16, 149 (1999).
[36] A. Pradhan and A. Kumar, Int. J. Mod. Phys. D 10, 291 (2001).
A. Pradhan and V. K. Yadav, Int J. Mod Phys. D 11, 983 (2002).
A. Pradhan and O. P. Pandey, Int. J. Mod. Phys. D 12, 941 (2003).
A. Pradhan, S. K. Srivastava and K. R. Jotania, Czech. J. Phys. 54, 255
(2004).
A. Pradhan, A. K. Yadav and L. Yadav, Czech. J. Phys. 55, 503 (2005).
A. Pradhan and P. Pandey, Czech. J. Phys. 55, 749 (2005).
A. Pradhan and P. Pandey, Astrophys. Space Sci. 301, 221 (2006).
G. S. Khadekar, A. Pradhan and M. R. Molaei, Int. J. Mod. Phys. D 15,
95 (2006).
A. Pradhan, K. Srivastava and R. P. Singh, Fizika B (Zagreb) 15, 141
(2006).
C. P. Singh, S. Kumar and A. Pradhan, Class. Quantum Grav. 24, 455
(2007).
A. Pradhan, A. K. Singh and S. Otarod, Romanian J. Phys. 52, 415
(2007).
[37] A.-M. M. Abdel-Rahaman, Gen. Rel. Grav. 22, 655 (1990); Phys. Rev. D
45, 3492 (1992).
[38] I. Waga, Astrophys. J. 414, 436 (1993).
[39] V. Silveira and I. Waga, Phys. Rev. D 50, 4890 (1994).
[40] R. G. Vishwakarma, Class. Quant. Grav. 17, 3833 (2000).
[41] R. Bali and M. K. Yadav, J. Raj. Acad. Phys. Sci. 1, 47 (2002).
http://arxiv.org/abs/astro-ph/9805200
[42] D. Pavon, J. Bafaluy and D. Jou, Class Quant. Grav. 8, 357 (1991);
“Proc. Hanno Rund Conf. on Relativity and Thermodynamics”, Ed. S.
D. Maharaj, (University of Natal, Durban, 1996, p. 21).
[43] R. Maartens, Class Quant. Grav. 12, 1455 (1995).
[44] W. Zimdahl, Phys. Rev. D 53, 5483 (1996).
[45] N. O. Santos, R. S. Dias and A. Banerjee, J. Math. Phys. 26, 878 (1985).
[46] G. L. Murphy, Phys. Rev. D 8, 4231 (1973).
[47] S. Weinberg, Astrophys. J. 168, 175 (1971).
[48] U. A. Belinskii and I. M. Khalatnikov, Sov. Phys. JETP 42, 205 (1976).
[49] S. M. Carrol, W. H. Press and E. L. Turner, ARA&A 30, 499 (1992).
[50] C. S. Kochanek, Astrophys. J. 384, 1 (1992).
	Introduction
	The Metric and Field Euations
	Solutions of the Field Eqations
	Model I:       Solution for n = 0
	Model II:       Solution for n = 1
	Other Models
	Model I:       Solution for n = 0
	Model II:       Solution for n = 1
	Conclusions
ABSTRACT
  An LRS Bianchi type-V cosmological models representing a viscous fluid
distribution with a time dependent cosmological term $\Lambda$ is investigated.
To get a determinate solution, the viscosity coefficient of bulk viscous fluid
is assumed to be a power function of mass density. It turns out that the
cosmological term $\Lambda(t)$ is a decreasing function of time, which is
consistent with recent observations of type Ia supernovae. Various physical and
kinematic features of these models have also been explored.

<|endoftext|><|startoftext|>
Introduction
The spin-1/2 antiferromagnetic Heisenberg XXZ chain is one of the most fundamental models
for one-dimensional quantum magnetism, which is given by the Hamiltonian
Sxj S
j+1 + S
j+1 +∆S
, (1.1)
where Sαj = σ
j /2 with σ
j being the Pauli matrices acting on the j-th site and ∆ is the
anisotropy parameter. For ∆ > 1, it is called the massive XXZ model where the system is
gapful. Meanwhile for −1 < ∆ ≤ 1 case, the system is gapless and called the massless XXZ
model. Especially we call it XXX model for the isotropic case ∆ = 1.
The exact eigenvalues and eigenvectors of this model can be obtained by the Bethe Ansatz
method [1, 2]. Many physical quantities in the thermodynamic limit such as specific heat,
magnetic susceptibility, elementary excitations, etc..., can be exactly evaluated even at finite
temperature by the Bethe ansatz method [2].
The exact calculation of the correlation functions, however, is still a difficult problem. The
exceptional case is ∆ = 0, where the system reduces to a lattice free-fermion model by the
Jordan-Wigner transformation. In this case, we can calculate arbitrary correlation functions
by means of Wick’s theorem [3, 4]. Recently, however, there have been rapid developments in
the exact evaluations of correlation functions for ∆ 6= 0 case also, since Kyoto Group (Jimbo,
Miki, Miwa, Nakayashiki) derived a multiple integral representation for arbitrary correlation
functions. Using the representation theory of the quantum affine algebra Uq(ŝl2), they first
derived a multiple integral representation for massive XXZ antiferromagnetic chain in 1992
[5, 6], which is before long extended to the XXX case [7, 8] and the massless XXZ case
[9]. Later the same integral representations were reproduced by Kitanine, Maillet, Terras
[10] in the framework of Quantum Inverse Scattering Method. They have also succeeded in
generalizing the integral representations to the XXZ model with an external magnetic field
[10]. More recently the multiple integral formulas were extended to dynamical correlation
functions as well as finite temperature correlation functions [11, 12, 13, 14]. In this way
it has been established now the correlation functions for XXZ model are represented by
multiple integrals in general. However, these multiple integrals are difficult to evaluate both
numerically and analytically.
For general anisotropy ∆, it has been shown that the multiple inetegrals up to four-
dimension can be reduced to one-dimensional integrals [15, 16, 17, 18, 19, 20, 21]. As a
result all the density matrix elements within four lattice sites have been obtained for general
anisotropy [21]. To reduce the multiple integrals into one-dimension, however, involves hard
calculation, which makes difficult to obtain correlation functions on more than four lattice
sites. On the other hand, at the isotropic point ∆ = 1, an algebraic method based on
qKZ equation has been devised [22] and all the density matrix elements up to six lattice
sites have been obtained [23, 24]. Moreover, as for the spin-spin correlation functions, up to
seventh-neighbour correlation 〈Sz1Sz8〉 for XXX chain have been obtained from the generating
functional approach [25, 26]. It is desirable that this algebraic method will be generalized to
the case with ∆ 6= 1. Actually, Boos, Jimbo, Miwa, Smirnov and Takeyama have derived an
exponential formula for the density matrix elements of XXZ model, which does not contain
multiple integrals [27, 28, 29, 30, 31]. It, however, seems still hard to evaluate the formula
for general density matrix elements.
Among the general ∆ 6= 0, there is a special point ∆ = 1/2, where some intriguing prop-
erties have been observed. Let us define a correlation function called Emptiness Formation
Probability (EFP) [8] which signifies the probability to find a ferromagnetic string of length
P (n) ≡
+ Szj
. (1.2)
The explicit general formula for P (n) at ∆ = 1/2 was conjectured in [33]
P (n) = 2−n
(3k + 1)!
(n + k)!
, (1.3)
which is proportional to the number of alternating sign matrix of size n × n. Later this
conjecture was proved by the explicit evaluation of the multiple integral representing the
EFP [34]. Remarkably, one can also obtain the exact asymptotic behavior as n → ∞
from this formula, which is the unique valuable example except for the free fermion point
∆ = 0. Note also that as for the longitudinal two-point correlation functions at ∆ = 1/2,
up to eighth-neighbour correlation function 〈Sz1Sz9〉 have been obtained in [32] by use of the
multiple integral representation for the generating function. Most outstanding is that all the
results are represented by single rational numbers. These results motivated us to calculate
other correlation functions at ∆ = 1/2. Actually we have obtained all the density matrix
elements up to six lattice sites by the direct evaluation of the multiple integrals. All the
results can be written by single rational numbers as expected. A direct evaluation of the
multiple integrals is possible due to the particularity of the case for ∆ = 1/2 as is explained
below.
2 Analytical evaluation of multiple integral
Here we shall describe how we analytically obtain the density matrix elements at ∆ = 1/2
from the multiple integral formula. Any correlation function can be expressed as a sum of
density matrix elements P
,··· ,ǫ′n
ǫ1,··· ,ǫn , which are defined by the ground state expectation value of
the product of elementary matrices:
,··· ,ǫ′n
ǫ1,··· ,ǫn ≡ 〈E
1 · · ·Eǫ
n 〉, (2.1)
where E
j are 2× 2 elementary matrices acting on the j-th site as
E++j =
+ Szj , E
− Szj ,
E+−j =
= S+j = S
j + iS
j , E
= S−j = S
j − iS
The multiple integral formula of the density matrix element for the massless XXZ chain
reads [9]
,··· ,ǫ′n
ǫ1,··· ,ǫn =(−ν)−n(n−1)/2
· · ·
sinh(xa − xb)
sinh[(xa − xb − ifabπ)ν]
sinhyk−1 [(xk + iπ/2)ν] sinh
n−yk [(xk − iπ/2)ν]
coshn xk
, (2.2)
where the parameter ν is related to the anisotropy as ∆ = cosπν and fab and yk are
determined as
fab = (1 + sign[(s
′ − a + 1/2)(s′ − b+ 1/2)])/2,
y1 > y2 > · · · > ys′, ǫ′yi = +
ys′+1 > · · · > yn, ǫn+1−yi = −. (2.3)
In the case of ∆ = 1/2, namely ν = 1/3, the significant simplification occurs in the multiple
integrals due to the trigonometric identity
sinh(xa−xb) = 4 sinh[(xa−xb)/3] sinh[(xa−xb+iπ)/3] sinh[(xa−xb−iπ)/3]. (2.4)
Actually if we note that the parameter fab takes the value 0 or 1, the first factor in the
multiple integral at ν = 1/3 can be decomposed as
sinh(xa − xb)
sinh[(xa − xb − iπ)/3]
= 4 sinh
xa − xb
xa − xb + iπ
= −1 + ωe
(xa−xb) + ω−1e−
(xa−xb), (2.5)
sinh(xa − xb)
sinh[(xa − xb)/3]
= 4 sinh
xa − xb + iπ
xa − xb − iπ
= 1 + e
(xa−xb) + e−
(xa−xb), (2.6)
where ω = eiπ/3. Expanding the trigonometoric functions in the second factor into exponen-
tials
sinhy−1 [(x+ iπ/2)/3] sinhn−y [(x− iπ/2)/3]
= 21−n
ω1/2ex/3 − ω−1/2e−x/3
)y−1 (
ω−1/2ex/3 − ω1/2e−x/3
= 21−n
(−1)l+m
y − 1
ωy−l+m−(n+1)/2e
(n−2l−2m−1)x, (2.7)
we can explicitly evaluate the multiple integral by use of the formula
eαxdx
coshn x
= 2n−1B
, Re(n± α) > 0, (2.8)
where B(p, q) is the beta function defined by
B(p, q) =
tp−1(1− t)q−1dt, Re(p),Re(q) > 0. (2.9)
Table 1: Comparison with the asymptotic formula of the transverse correlation function
〈Sx1Sx2 〉 〈Sx1Sx3 〉 〈Sx1Sx4 〉 〈Sx1Sx5 〉 〈Sx1Sx6 〉
Exact −0.156250 0.0800781 −0.0671234 0.0521997 −0.0467664
Asymptotics −0.159522 0.0787307 −0.0667821 0.0519121 −0.0466083
In this way we have succeeded in calculating all the density matrix elements up to six lattice
sites. All the results are represented by single rational numbers, which are presented in
Appendix A. As for the spin-spin correlation functions, we have newly obtained the fourth-
and fifth-neighbour transverse two-point correlation function
〈Sx1Sx2 〉 = −
= −0.15625,
〈Sx1Sx3 〉 =
= 0.080078125,
〈Sx1Sx4 〉 = −
65536
= −0.0671234130859375,
〈Sx1Sx5 〉 =
1751531
33554432
= 0.0521996915340423583984375,
〈Sx1Sx6 〉 = −
3213760345
68719476736
= −0.046766368104727007448673248291015625.
The asymptotic formula of the transverse two-point correlation function for the massless
XXZ chain is established in [35, 36]
〈Sx1Sx1+n〉 ∼ Ax(η)
(−1)n
− Ãx(η)
+ · · · , η = 1− ν,
Ax(η) =
8(1− η)2
sinh(ηt)
sinh(t) cosh[(1− η)t]
− ηe−2t
Ãx(η) =
2η(1− η)
cosh(2ηt)e−2t − 1
2 sinh(ηt) sinh(t) cosh[(1− η)t]
sinh(ηt)
η2 + 1
, (2.10)
which produces a good numerical value even for small n as is shown in Table 1. Note that the
longitudinal correlation function was obtained up to eighth-neighbour correlaion 〈Sz1Sz9〉 from
the multiple integral representation for the generating function [32]. Note also that up to
third-neighbour both longitudinal and transverse correlation functions for general anisotropy
∆ were obtained in [21].
3 Reduced density matrix and entanglement entropy
Below let us discuss the reduced density matrix for a sub-chain and the entanglement entropy.
The density matrix for the infinite system at zero temperature has the form
ρT ≡ |GS〉〈GS|, (3.1)
0 10 20 30 40 50 60
0 10 20 30
Figure 1: Eigenvalue-distribution of density matrices
Table 2: Entanglement entropy S(n) of a finite sub-chain of length n
S(1) S(2) S(3) S(4)
1 1.3716407621868583 1.5766810784924767 1.7179079372711414
S(5) S(6)
1.8262818282012363 1.9144714710902746
where |GS〉 denotes the ground state of the total system. We consider a finite sub-chain of
length n, the rest of which is regarded as an environment. We define the reduced density
matrix for this sub-chain by tracing out the environment from the infinite chain
ρn ≡ trEρT =
,··· ,ǫ′n
ǫ1,··· ,ǫn
ǫj ,ǫ
. (3.2)
We have numerically evaluate all the eigenvalues ωα (α = 1, 2, · · · , 2n) of the reduced density
matrix ρn up to n = 6. We show the distribution of the eigenvalues in Figure 1. The
distribution is less degenerate comapared with the isotropic case ∆ = 1 shown in [24]. In the
odd n case, all the eigenvalues are two-fold degenerate due to the spin-reverse symmetry.
Subsequently we exactly evaluate the von Neumann entropy (Entanglement entropy)
defined as
S(n) ≡ −trρn log2 ρn = −
ωα log2 ωα. (3.3)
The exact numerical values of S(n) up to n = 6 are shown in Table 2. By analyzing the
behaviour of the entanglement S(n) for large n, we can see how long quantum correlations
reach [37]. In the massive region ∆ > 1, the entanglement entropy will be saturated as n
grows due to the finite correlation length. This means the ground state is well approximated
by a subsystem of a finite length corresponding to the large eigenvalues of reduced density
matrix. On the other hand, in the massless case −1 < ∆ ≤ 1, the conformal field theory
predict that the entanglement entropy shows a logarithmic divergence [38]
S(n) ∼ 1
log2 n + k∆. (3.4)
1 2 3 4 5 6
Exact
Asymptotics
Figure 2: Entanglement entropy S(n) of a finite sub-chain of length n
Our exact results up to n = 6 agree quite well with the asymptotic formula as shown in Figure
2. We estimate the numerical value of the constant term k∆=1/2 as k∆=1/2 ∼ S(6)− 13 log2 6 =
1.0528. This numerical value is slightly smaller than the isotropic case ∆ = 1, where the
constant k∆=1 is estimated as k∆=1 ∼ 1.0607 from the exact data for S(n) up to n = 6 [24].
At free fermion point ∆ = 0, the exact asymptotic formula has been obtained in [39]
S(n) ∼ 1
log2 n+ k∆=0,
k∆=0 = 1/3−
t sinh2(t/2)
− cosh(t/2)
2 sinh3(t/2)
/ ln 2. (3.5)
In this case the numerical value for the constant term is given by k∆=0 = 1.0474932144 · · · .
4 Summary and discussion
We have succeeded in obtaining all the density matrix elements on six lattice sites for XXZ
chain at ∆ = 1/2. Especially we have newly obtained the fourth- and fifth-neighbour
transverse spin-spin correlation functions. Our exact results for the transverse correlations
show good agreement with the asymptotic formula established in [35, 36]. Subsequently we
have calculated all the eigenvalues of the reduced density matrix ρn up to n = 6. From these
results we have exactly evaluated the entanglement entropy, which shows a good agreement
with the asymptotic formula derived via the conformal field theory. Finally, we remark
that similar procedures to evaluate the multiple integrals are also possible at ν = 1/n for
n = 4, 5, 6, · · · , since there are similar trigonometric identities as (2.4). We will report the
calculation of correlation functions for these cases in subsequent papers.
Acknowledgement
The authors are grateful to K. Sakai for valuable discussions. This work is in part sup-
ported by Grant-in-Aid for the Scientific Research (B) No. 18340112. from the Ministry of
Education, Culture, Sports, Science and Technology, Japan.
Appendix A Density matrix elements up to n = 6
In this appendix we present all the independent density matrix elements defined in eq. (2.1)
up to n = 6. Other elements can be computed from the relations
,··· ,ǫ′n
ǫ1,··· ,ǫn = 0 if
ǫj 6=
ǫ′j , (A.1)
,··· ,ǫ′n
ǫ1,··· ,ǫn = P
ǫ1,··· ,ǫn
,··· ,ǫ′n
,··· ,−ǫ′n
−ǫ1,··· ,−ǫn
ǫ′n,··· ,ǫ
ǫn,··· ,ǫ1 , (A.2)
,··· ,ǫ′n
+,ǫ1,··· ,ǫn
,··· ,ǫ′n
−,ǫ1,··· ,ǫn
,··· ,ǫ′n,+
ǫ1,··· ,ǫn,+
,··· ,ǫ′n,−
ǫ1,··· ,ǫn,−
,··· ,ǫ′n
ǫ1,··· ,ǫn , (A.3)
and the formula for the EFP [33, 34]
P (n) = P
+,··· ,+
+,··· ,+ = 2
(3k + 1)!
(n+ k)!
. (A.4)
Appendix A.1 n ≤ 4
P−++− = −
= −0.3125, P−++++− =
= 0.0800781,
P−++++−++ = −
= −0.0269775, P−+++++−+ =
65536
= 0.0240936,
P−++++++− = −
32768
= −0.00881958, P+−+++−++ =
16384
= 0.0632935,
P+−++++−+ = −
32768
= −0.0611877, P−−+++−+− = −
65536
= −0.0583038,
P−−++++−− =
65536
= 0.0212555, P−+−++−+− =
32768
= 0.149017,
P−++−+−−+ =
32768
= 0.0943298.
Appendix A.2 n = 5
P−+++++−+++ = −
14721
8388608
= −0.00175488, P−++++++−++ =
37335
16777216
= 0.00222534,
P−+++++++−+ = −
48987
33554432
= −0.00145993, P−++++++++− =
13911
33554432
= 0.00041458,
P+−++++−+++ =
179699
33554432
= 0.00535545, P+−+++++−++ = −
120337
16777216
= −0.00717264,
P+−++++++−+ =
165155
33554432
= 0.004922, P++−++++−++ =
168313
16777216
= 0.0100322,
P−−++++−−++ =
31069
2097152
= 0.0148149, P−−++++−+−+ = −
411583
16777216
= −0.0245323,
P−−++++−++− =
196569
16777216
= 0.0117164, P−−+++++−+− = −
281271
33554432
= −0.00838253,
P−−++++++−− =
79673
33554432
= 0.00237444, P−+−+++−−++ = −
1441787
33554432
= −0.0429686,
P−+−+++−++− = −
1261655
33554432
= −0.0376002, P−+−++++−+− =
59459
2097152
= 0.0283523,
P−++−++−++− =
1575515
33554432
= 0.046954, P−+++−+−−++ = −
696151
33554432
= −0.0207469,
P−+++−+−+−+ =
1366619
33554432
= 0.0407284.
Appendix A.3 n = 6
P−++++++−++++ = −
1546981
34359738368
= −0.0000450231, P−+++++++−+++ =
5095899
68719476736
= 0.0000741551,
P−++++++++−++ = −
2366275
34359738368
= −0.0000688677, P−+++++++++−+ =
2455833
68719476736
= 0.0000357371,
P−++++++++++− = −
284577
34359738368
= −8.28228× 10−6, P+−+++++−++++ =
2927709
17179869184
= 0.000170415,
P+−++++++−+++ = −
20086627
68719476736
= −0.000292299, P+−+++++++−++ =
19268565
68719476736
= 0.000280395,
P+−++++++++−+ = −
10295153
68719476736
= −0.000149814, P++−+++++−+++ =
17781349
34359738368
= 0.000517505,
P++−++++++−++ = −
35087523
68719476736
= −0.000510591, P−−+++++−−+++ =
48421023
34359738368
= 0.00140924,
P−−+++++−+−++ = −
214080091
68719476736
= −0.00311528, P−−+++++−++−+ =
88171589
34359738368
= 0.00256613,
P−−+++++−+++− = −
57522267
68719476736
= −0.000837059, P−−++++++−−++ =
56776545
34359738368
= 0.00165241,
P−−++++++−+−+ = −
154538459
68719476736
= −0.00224883, P−−++++++−++− =
60809571
68719476736
= 0.000884896,
P−−+++++++−−+ =
6708473
8589934592
= 0.000780969, P−−+++++++−+− = −
33366621
68719476736
= −0.000485548,
P−−++++++++−− =
3860673
34359738368
= 0.00011236, P−+−++++−−+++ = −
85706851
17179869184
= −0.0049888,
P−+−++++−+−++ =
12211375
1073741824
= 0.0113727, P−+−++++−++−+ = −
332557469
34359738368
= −0.0096787,
P−+−++++−+++− =
56183761
17179869184
= 0.00327033, P−+−+++++−−++ = −
430452959
68719476736
= −0.00626391,
P−+−+++++−+−+ =
606065059
68719476736
= 0.00881941, P−+−+++++−++− = −
123612511
34359738368
= −0.0035976,
P−+−++++++−−+ = −
108202041
34359738368
= −0.00314909, P−+−++++++−+− =
70061315
34359738368
= 0.00203905,
P−++−+++−−+++ =
7860495
1073741824
= 0.00732066, P−++−+++−+−++ = −
591759525
34359738368
= −0.0172225,
P−++−+++−++−+ =
1044016671
68719476736
= 0.0151924, P−++−+++−+++− = −
367905053
68719476736
= −0.00535372,
P−++−++++−−++ =
676957849
68719476736
= 0.00985103, P−++−++++−+−+ = −
988973861
68719476736
= −0.0143915,
P−++−++++−++− =
6581795
1073741824
= 0.00612977, P−++−+++++−−+ =
363618785
68719476736
= 0.00529135,
P−+++−++−−+++ = −
185522333
34359738368
= −0.00539941, P−+++−++−+−++ =
901633567
68719476736
= 0.0131205,
P−+++−++−++−+ = −
103539423
8589934592
= −0.0120536, P−+++−++−+++− =
38524625
8589934592
= 0.00448486,
P−+++−+++−−++ = −
267901987
34359738368
= −0.00779697, P−+++−+++−+−+ =
12750645
1073741824
= 0.011875,
P−+++−++++−−+ = −
309855965
68719476736
= −0.004509, P−++++−+−−+++ =
29410257
17179869184
= 0.0017119,
P−++++−+−+−++ = −
296882461
68719476736
= −0.00432021, P−++++−+−++−+ =
35985105
8589934592
= 0.00418922,
P−++++−++−−++ =
92176287
34359738368
= 0.00268268, P+−−++++−−+++ =
202646807
34359738368
= 0.0058978,
P+−−++++−+−++ = −
972245985
68719476736
= −0.014148, P+−−++++−++−+ =
217687057
17179869184
= 0.0126711,
P+−−+++++−+−+ = −
211696415
17179869184
= −0.0123224, P+−−++++++−−+ =
78922695
17179869184
= 0.00459391,
P+−+−+++−+−++ =
1196499417
34359738368
= 0.0348227, P+−+−+++−++−+ = −
2209522727
68719476736
= −0.0321528,
P+−+−++++−+−+ =
1108384987
34359738368
= 0.0322582, P+−++−++−++−+ =
530683585
17179869184
= 0.0308899,
P+−++−+++−−++ =
347202525
17179869184
= 0.0202098, P−−−++++−−++− = −
268623007
68719476736
= −0.00390898,
P−−−++++−+−+− =
46285135
8589934592
= 0.0053883, P−−−++++−++−− = −
136974885
68719476736
= −0.00199325,
P−−−+++++−+−− =
19939391
17179869184
= 0.00116063, P−−−++++++−−− = −
18442085
68719476736
= −0.000268368,
P−−+−+++−−++− =
1018463205
68719476736
= 0.0148206, P−−+−+++−+−+− = −
1454513249
68719476736
= −0.021166,
P−−+−+++−++−− =
277721503
34359738368
= 0.00808276, P−−+−++++−+−− = −
335265249
68719476736
= −0.00487875,
P−−++−++−−++− = −
369408975
17179869184
= −0.0215024, P−−++−++−+−+− =
1104236607
34359738368
= 0.0321375,
P−−++−++−++−− = −
880560357
68719476736
= −0.0128138, P−−++−+++−−+− = −
876924641
68719476736
= −0.0127609,
P−−+++−+−−−++ =
113631201
17179869184
= 0.00661421, P−−+++−+−−+−+ = −
292857807
17179869184
= −0.0170466,
P−−+++−+−+−−+ =
548645951
34359738368
= 0.0159677, P−−+++−++−−−+ = −
377925345
68719476736
= −0.00549954,
P−+−+−++−−++− =
1719255909
34359738368
= 0.0500369, P−+−+−++−+−+− = −
5350158879
68719476736
= −0.0778551,
P−+−++−+−−+−+ =
1565770597
34359738368
= 0.0455699, P−+−++−+−+−−+ = −
3059753503
68719476736
= −0.0445253,
P−++−−++−−++− = −
2117554719
68719476736
= −0.0308145.
References
[1] H.A. Bethe, Z. Phys. 71 (1931) 205.
[2] M. Takahashi, Thermodynamics of One-Dimensional Solvable Models, Cambridge Uni-
versity Press, Cambridge, 1999.
[3] E. Lieb, T. Schultz, D. Mattis, Ann. Phys. (N.Y.) 16 (1961) 407.
[4] B.M. McCoy, Phys. Rev. 173 (1968) 531.
[5] M. Jimbo, K. Miki, T. Miwa, A. Nakayashiki, Phys. Lett. A 168 (1992) 256.
[6] M. Jimbo, T. Miwa, Algebraic Analysis of Solvable Lattice Models, CBMS Regional Con-
ference Series in Mathematics vol.85, American Mathematical Society, Providence, 1994.
[7] A. Nakayashiki, Int. J. Mod. Phys. A 9 (1994) 5673.
[8] V.E. Korepin, A. Izergin, F.H.L. Essler, D. Uglov, Phys. Lett. A 190 (1994) 182.
[9] M. Jimbo, T. Miwa, J. Phys. A: Math. Gen. 29 (1996) 2923.
[10] N. Kitanine, J.M. Maillet, V. Terras, Nucl. Phys. B 567 (2000), 554.
[11] N. Kitanine, J.M. Maillet, N.A. Slavnov, V. Terras, Nucl. Phys. B 729 (2005) 558.
[12] F.Göhmann, A. Klümper, A. Seel, J. Phys. A: Math. Gen 37 (2004) 7625.
[13] F.Göhmann, A. Klümper, A. Seel, J. Phys. A: Math. Gen 38 (2005) 1833.
[14] K. Sakai, “Dynamical correlation functions of the XXZ model at finite temperature”,
cond-mat/0703319.
[15] H.E. Boos, V.E. Korepin, J. Phys. A: Math. Gen. 34 (2001) 5311.
[16] H.E. Boos, V.E. Korepin, “Evaluation of integrals representing correlators in XXX
Heisenberg spin chain” in. MathPhys Odyssey 2001, Birkhäuser, Basel, (2001) 65.
[17] H.E. Boos, V.E. Korepin, Y. Nishiyama, M. Shiroishi, J. Phys. A: Math. Gen 35 (2002)
4443.
[18] K. Sakai, M. Shiroishi, Y. Nishiyama, M. Takahashi, Phys. Rev. E 67 (2003) 065101.
[19] G. Kato, M. Shiroishi, M. Takahashi, K. Sakai, J. Phys. A: Math. Gen. 36 (2003) L337.
[20] M. Takahashi, G. Kato, M. Shiroishi, J. Phys. Soc. Jpn, 73 (2004) 245.
[21] G. Kato, M. Shiroishi, M. Takahashi, K. Sakai, J. Phys. A: Math. Gen. 37 (2004) 5097.
[22] H.E. Boos, V.E. Korepin, F.A. Smirnov, Nucl. Phys. B 658 (2003) 417.
[23] H.E. Boos, M. Shiroishi, M. Takahashi, Nucl. Phys. B 712 (2005) 573.
[24] J. Sato, M. Shiroishi, M. Takahashi, J. Stat. Mech. 0612 (2006) P017.
http://arxiv.org/abs/cond-mat/0703319
[25] J. Sato, M. Shiroishi, J. Phys. A: Math. Gen. 38 (2005) L405.
[26] J. Sato, M. Shiroishi, M. Takahashi, Nucl. Phys. B 729 (2005) 441, hep-th/0507290.
[27] H.E. Boos, M. Jimbo, T. Miwa, F. Smirnov, Y. Takeyama, Algebra Anal. 17 (2005)
[28] H.E. Boos, M. Jimbo, T. Miwa, F. Smirnov, Y. Takeyama, Commun. Math. Phys. 261
(2006) 245.
[29] H.E. Boos, M. Jimbo, T. Miwa, F. Smirnov, Y. Takeyama, J. Phys. A: Math. Gen. 38
(2005) 7629.
[30] H.E. Boos, M. Jimbo, T. Miwa, F. Smirnov, Y. Takeyama, Lett. Math. Phys. 75 (2006)
[31] H.E. Boos, M. Jimbo, T. Miwa, F. Smirnov, Y. Takeyama, Annales Henri Poincare 7
(2006) 1395.
[32] N. Kitanine, J.M. Maillet, N.A. Slavnov, V. Terras, J. Stat. Mech. 0509 (2005) L002.
[33] A.V. Razumov, Yu.G. Stroganov, J. Phys. A: Math. Gen. 34 (2001) 3185.
[34] N. Kitanine, J.M. Maillet, N.A. Slavnov, V. Terras, J. Phys. A: Math. Gen. 35 (2002)
L385.
[35] S. Lukyanov, A. Zamolodchikov, Nucl. Phys. B 493 (1997) 571.
[36] S. Lukyanov, V. Terras, Nucl. Phys. B 654 (2003) 323.
[37] G. Vidal, J.I. Latorre, E. Rico, A. Kitaev, Phys. Rev. Lett. 90 (2003) 227902.
[38] C. Holzhey, F. Larsen, F. Wilczek, Nucl. Phys. B 424 (1994) 443.
[39] B.-Q. Jin, V.E. Korepin, J. Stat. Phys. 116 (2004) 79.
http://arxiv.org/abs/hep-th/0507290
	Introduction
	Analytical evaluation of multiple integral
	Reduced density matrix and entanglement entropy
	Summary and discussion
	Density matrix elements up to n=6
ABSTRACT
  We have analytically obtained all the density matrix elements up to six
lattice sites for the spin-1/2 Heisenberg XXZ chain at $\Delta=1/2$. We use the
multiple integral formula of the correlation function for the massless XXZ
chain derived by Jimbo and Miwa. As for the spin-spin correlation functions, we
have newly obtained the fourth- and fifth-neighbour transverse correlation
functions. We have calculated all the eigenvalues of the density matrix and
analyze the eigenvalue-distribution. Using these results the exact values of
the entanglement entropy for the reduced density matrix up six lattice sites
have been obtained. We observe that our exact results agree quite well with the
asymptotic formula predicted by the conformal field theory.

<|endoftext|><|startoftext|>
Counting on Rectangular Areas
Milan Janjić,
Faculty of Natural Sciences and mathematics,
Banja Luka, Republic of Srpska, Bosnia and Herzegovina.
Counting on Rectangular Areas
Abstract
In the first section of this paper we prove a theorem for the number
of columns of a rectangular area that are identical to the given one. A
special case, concerning (0, 1)-matrices, is also stated.
In the next section we apply this theorem to derive several combina-
torial identities by counting specified subsets of a finite set. This means
that the obtained identities will involve binomial coefficients only. We
start with a simple equation which is, in fact, an immediate consequence
of Binomial theorem, but it is derived independently of it. The second
result concerns sums of binomial coefficients. In a special case we obtain
one of the best known binomial identity dealing with alternating sums.
Klee’s identity is also obtained as a special case as well as some formu-
lae for partial sums of binomial coefficients, that is, for the numbers of
Bernoulli’s triangle.
1 A counting theorem
The set of natural numbers {1, 2, . . . , n} will be denoted by [n], and by |X | will
be denoted the number of elements of the set X.
For the proof of the main theorem we need the following simple result:
(−1)|I| = 0, (1)
where I run over all subsets of [n] (empty set included). This may be easily
proved by induction or using Binomial theorem. But the proof by induction
makes all further investigations independent even of Binomial theorem.
Let A be an m× n rectangular matrix filled with elements which belong to
a set Ω.
By the i-column of A we shall mean each column of A that is equal to
[c1, c2, . . . , cm]
T , where c1, c2, . . . , cm of Ω are given. We shall denote the number
of i-columns of A by νA(c) or simply by ν(c).
For I = {i1, i2, . . . , ik} ⊂ [m], by A(I) will be denoted the maximal number
of columns j of A such that
aij 6= cj , (i ∈ I).
http://arxiv.org/abs/0704.0851v1
We also define
A(∅) = n.
Theorem 1. The number ν(c) of i-columns of A is equal
ν(c) =
(−1)|I|A(I), (2)
where summation is taken over all subsets I of [m].
Proof. Theorem may be proved by the standard combinatorial method, by
counting the contribution of each column of A in the sum on the right side of
We give here a proof by induction. First, the formula will be proved in the
case ν(c) = 0 and ν(c) = n. In the case ν(c) = n it is obvious that for I 6= ∅ we
have A(I) = 0, which implies
(−1)|I|A(I) = n+
I 6=∅
(−1)|I|A(I) = n.
In the case ν(c) = 0 we use induction on n. If n = 1 then the matrix
A has only one column, which is not equal c. It yields that there exists i0 ∈
{1, 2, . . . ,m} such that ai0,1 6= ci0 . Denote by I0 the set of all such numbers.
Then A(I) = 1 if and only if I ⊂ I0. From this and (1) we obtain
(−1)|I|A(I) =
(−1)|I| = 0.
Suppose now that the formula is true for matrices with n columns and that
A has n+ 1-columns, and νA(c) = 0. Omitting the first column, the matrix B
with n columns remains. If I0 is the same as in the case n = 1, then
(−1)|I|A(I) =
I 6⊂I0
(−1)|I|A(I) +
(−1)|I|A(I) =
I 6⊂I0
(−1)|I|B(I) +
(−1)|I|(B(I) + 1) =
(−1)|I|B(I) +
(−1)|I| = 0,
since the first sum is equal zero by the induction hypothesis, and the second by
For the rest of the proof we use induction on n again. For n = 1 the matrix
A has only one column which is either equal c or not. In both cases theorem is
true, from the preceding.
Suppose that theorem holds for n, and that the matrix A has n+1 columns.
We may suppose that ν(c) ≥ 1. Omitting one of the i-columns we obtain the
matrix B with n columns. By the induction hypothesis theorem is true for B.
On the other hand it is clear that A(I) = B(I) for each nonempty subset I.
Furthermore A has one i-column more then B, which implies
ν(c) = νA(c) = νB(c) + 1 = 1 +
(−1)|I|B(I) =
= 1 + n+
I 6=∅
(−1)|I|B(I) = 1 + n+
I 6=∅
(−1)|I|A(I).
ν(c) =
(−1)|I|A(I),
and theorem is proved.
If the number A(I) does not depend on elements of the set I, but only on
its number |I| then the equation(2) may be written in the form
ν(c) =
(−1)i
A(i), (3)
where |I| = i.
Our object of investigation will be (0, 1) matrices. Let c be the i- column of
a such matrix A. Take I0 ⊆ [m], |I0| = k such that
1 i ∈ I0
0 i 6∈ I0
Then the number A(I) is equal to the number of columns of A having 0’s in
the rows labelled by the set I ∩ I0, and 1’s in the rows labelled by the set I \ I0.
Suppose that the number A(I) depends only on |I ∩ I0|, |I \ I0|. If we denote
|I ∩ I0| = i1, |I \ I0| = i2, A(I) = A(i1, i2), then (2) may be written in the form
ν(c) =
(−1)i1+i2
A(i1, i2). (5)
2 Counting subsets of a finite set
Suppose that a finite set X = {x1, x2, . . . , xn} is given. Label by 1, 2, . . . , 2
n all
subsets of X arbitrary and define an n× 2n matrix A in the following way
aij =
1 if xi lies in the set labelled by j
0 otherwise
. (6)
Take I0 ⊆ [n], |I0| = k, and form the submatrix B of A consisting of those
rows of A which indices belong to I0. Let c be arbitrary i-column of B. Define
= {i ∈ I0 : ci = 1},
I ′′0 = {i ∈ I0 : ci = 0}
. (7)
The number ν(c) is equal to the number of subsets that contain the set
{xi, i ∈ I
0}, and do not intersect the set {xi : i ∈ I
0 }. There are obviously
ν(c) = 2n−k,
such sets.
Furthermore, if I ⊆ I0 then the number B(I) is equal to the number of
subsets that contain the set {xi : i ∈ I ∩ I
}, and do not meet the set {xi : i ∈
I ∩ I ′0}. It is clear that there are
B(I) = 2n−|I|
such subsets, so that the formula (2) may be applied. It follows
2n−k =
(−1)i
2n−i.
Thus we have
Proposition 2.1. For each nonnegative integer k holds
(−1)i
2k−i.
Note 2.1. The preceding equation is a trivial consequence of Binomial theorem.
But here it is obtained independently of this theorem.
The preceding Proposition shows that counting i-columns over all subsets of
X always produce the same result.
We shall now make some restrictions on the number of subsets of X . Take
0 ≤ m1 ≤ m2 ≤ n fixed, and consider the submatrix C of A consisting of rows
whose indices belong to I0, and columns corresponding to those subsets of X
that have m, (m1 ≤ m ≤ m2) elements.
Let c be an i-column of C. Define I ′0 = {i ∈ I0 : ci = 1}, |I
0| = l.
The number ν(c) is equal to the number of sets that contain {xi : i ∈ I
and do not intersect the sets {xi : i ∈ I0 \ I
}. We thus have
m2−|I
i=m1−|I
n− |I0|
On the other hand, for I ⊆ I0 the number C(I) corresponds to the number
of sets that contain {xi : i ∈ I \ I
}, and do not intersect {xi : i ∈ I ∩ I
}. Its
number is equal
m2−|I\I
i3=m1−|I\I
n− |I|
It follows that the formula (5) may be applied. We thus have
Proposition 2.2. For 0 ≤ m1 ≤ m2 ≤ n, and 0 ≤ l ≤ k holds
i=m1−l
m2−i2
i3=m1−i2
(−1)i1+i2
k − l
n− i1 − i2
In the special case when one takes k = l, m1 = m2 = m we obtain
Corollary 2.1. For arbitrary nonnegative integers m,n, k holds
(−1)i
. (9)
Note 2.2. The preceding is one of the best known binomial identities. It appears
in the book [1] in many different forms.
Taking m1 = m2 = m, in (8) one gets
Corollary 2.2. For arbitrary nonnegative integer m,n, k, l, (l ≤ k) holds
(−1)i1+i2
k − l
n− i1 − i2
m− i2
, (10)
For l = 0 we obtain
(−1)i
, (11)
which is only another form of (9).
Taking n = 2k, l = k in (10)we obtain
(−1)i1
2k − i1
Substituting k − i1 by i we obtain
Corollary 2.3. Klee’s identity,([2],p.13)
(−1)k
(−1)i
k + i
From (8) we may obtain different formulae for partial sums of binomial
coefficients, that is, for the numbers of Bernoulli’s triangle. For instance, taking
l = 0, m1 = 0, m2 = m we obtain
Corollary 2.4. For any 0 ≤ m ≤ n and arbitrary nonnegative integer k holds
(−1)i1
n+ k − i1
. (12)
Note 2.3. The number k in the preceding equation may be considered as a free
variable that takes nonnegative integer values. Specially, for k = 1 the equa-
tion represents the standard recursion formula for the numbers of Bernoulli’s
triangle.
Taking k = l = m1, m2 = m one obtains
(−1)i1
n+ k − i1
Note 2.4. The formulae (12) and (13) differs in the range of the index i2.
References
[1] J. Riordan, Combinatorial Identities. New York: Wiley, 1979.
	A counting theorem
	Counting subsets of a finite set
ABSTRACT
  In the first section of this paper we prove a theorem for the number of
columns of a rectangular area that are identical to the given one. In the next
section we apply this theorem to derive several combinatorial identities by
counting specified subsets of a finite set.

<|endoftext|><|startoftext|>
Introduction
Photons have an extremely long mean free path length and escape from the hot
matter without rescattering. By measuring their Bose-Einstein (or Hanbury-Brown
Twiss, HBT) correlations one can extract the space-time dimensions of the hottest
central part of the collision1,2,3,4,5 in contrast to hadron HBT correlations which
measure the size of the system at the moment of its freeze-out. Moreover, photons
emitted at different stages of the collision dominate in different ranges of trans-
verse momentum6, therefore measuring photon correlation radii at various average
transverse momenta (KT ) one can scan the space-time dimensions of the system at
various times and thus trace the evolution of the hot matter.
Photons emitted directly by the hot matter – direct photons – constitute only a
small fraction of the total photon yield while the dominant contribution comes from
decays of the final state hadrons, mainly π0 → 2γ and η → 2γ mesons. Fortunately,
the lifetime of these hadrons is extremely large and the width of the Bose-Einstein
correlations between the decay photons is of the order of a few eV and cannot
obscure the direct photon correlations. This feature can be used to extract the direct
photon yield3: assuming that direct photons are emitted incoherently, the photon
correlation strength parameter can be related to the proportion of direct photons as
λ = 1/2(Ndirγ /N
2. This approach is probably the only way to experimentally
measure direct photon yield at very small pT . Presently, the only experiment to
∗For the full list of the PHENIX collaboration and acknowledgments, see9.
http://arxiv.org/abs/0704.0852v1
November 9, 2018 19:7 WSPC/INSTRUCTION FILE DPeressounko-
ggHBT-T
2 D. Peressounko
have measured direct photon Bose-Einstein correlations in ultrarelativistic heavy
ion collisions is WA987. An invariant correlation radius was extracted and the direct
photon yield was measured in Pb+Pb collisions at
sNN = 17 GeV.
Since the strength of the direct photon Bose-Einstein correlation is typically a
few tenths of a percent, it is important to exclude all background contributions
which could distort the photon correlation function. These contributions can be
classified as following: apparatus effects (close clusters interference – attraction of
close clusters in the calorimeter during reconstruction) and correlations caused by
real particles. The latter in turn can be divided into contribution due to ”splitting”
of particles – processes like antineutron annihilation in the calorimeter and photon
conversion on detector material in front of the calorimeter; contamination by corre-
lated hadrons (e.g. Bose-Einstein-correlated π±); background correlations of decay
photons. In this paper we consider all of these contributions in detail and describe
how to control for them in the PHENIX experiment.
2. Analysis
This analysis is based on the data taken by PHENIX in Run3 (d+Au) and Run4
(Au+Au). The total collected statistics is ≈ 3 billion d+Au events and ≈ 900 M
Au+Au events. Details of the PHENIX configuration in these runs can be found in
references 8 and 9, respectively.
2.1. Apparatus effects
Since correlation functions are rapidly rising functions at small relative momenta
any small distortion of the relative momentum for real pairs, because of errors in
reconstruction of close clusters in the calorimeter (”cluster attraction”) for example,
can lead to the appearance of a fake bump in the correlation function.
To explore the influence of cluster interference in the calorimeter EMCAL, we
construct a set of correlation functions by applying different cuts on the minimal
distance between photon clusters in EMCAL. To quantify the difference between
these correlation functions we fit them with a Gaussian and compare the extracted
correlation parameters. We find that for correlation functions that include clusters
with small relative distances there is strong dependence on minimal distance cut,
but for distance cuts above 24 cm (4-5 modules) the correlation parameters are
independent of the relative distance cut. This implies that with this distance cut
the apparatus effects are sufficiently small.
2.2. Photon conversion, n̄ annihilation, and similar backgrounds
The next class of possible backgrounds are processes in which one real particle
produces several clusters in the calorimeter close to each other. These are processes
like n̄ annihilation in the calorimeter producing several separated clusters, or photon
conversion in front of calorimeter, or residual correlations between photons that
November 9, 2018 19:7 WSPC/INSTRUCTION FILE DPeressounko-
ggHBT-T
Bose-Einstein correlations of direct photons in Au+Au collisions 3
 (GeV)
0 0.05 0.1 0.15 0.2 0.25 0.3
C Min.Bias, Au+Au
Min.Bias, d+Au, scaled
Fig. 1. Two-photon correlation function measured in d+Au collisions at
sNN = 200 GeV scaled
to reproduce the height of the π0 peak in Au+Au collisions compared to the same correlation
function measured in Au+Au collisions at
sNN = 200 GeV. Absolute vertical scale is omitted
in this technical plot.
belong to different π0 in decays like η → 3π0 → 6γ. The common feature of this
type of process is that their strength is proportional to the number of particles per
event and not to the square of the number of particles, as would be the case for
Bose-Einstein correlations.
To estimate the upper limit on these contributions, we compare two-photon
correlation functions, calculated in d+Au and Au+Au collisions. For the moment
we assume, that all correlations at small relative momenta seen in d+Au collisions
are due to the background effects under consideration. Then we scale the correlation
function obtained in d+Au collisions with the number of π0 (that is we reproduce
the height of the π0 peak in Au+Au collisions):
Cscaled2 = 1−
hAu+Auπ
hd+Auπ
(C2 − 1). (1)
The result of this operation is shown in Fig. 1. We find that the scaled d+Au
correlation function lies well below (close to unity) the correlation function calcu-
lated for Au+Au collisions at small relative momenta. From this we conclude that
the contribution from effects with strength proportional to the first power of the
number of particles is negligible in Au+Au collisions.
2.3. Charged and neutral hadron contamination
Another possible source of distortion of the photon correlation function is a contam-
ination by (correlated) hadrons. Although we use rather strict identification criteria
November 9, 2018 19:7 WSPC/INSTRUCTION FILE DPeressounko-
ggHBT-T
4 D. Peressounko
for photons there still may be some admixture of correlated hadrons contributing
to the region of small relative momenta.
 (GeV)
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35
C Converted + EMCAL
EMCAL + EMCAL
Fig. 2. Comparison of two-photon correlation functions measured in Au+Au collisions at
sNN =
200 GeV by two different methods: both photons are registered in the EMCAL (closed) and one
photon is registered in EMCAL while the other is reconstructed through its external conversion
(open). Absolute vertical scale is omitted in this technical plot.
To exclude this possibility, we construct the two-photon correlation function
using one photon registered in the calorimeter EMCAL and reconstructing the sec-
ond photon from its conversion into an e+e− pair on the material of the beam
pipe. The photon sample, constructed using external conversions is completely free
from hadron contamination, so comparison of the standard correlation function
with the pure one allows to estimate the contribution from non-photon contami-
nation. This comparison is shown in Fig. 2. We find that the correlation function
constructed with the more pure photon sample demonstrates a slightly larger cor-
relation strength. This demonstrates that the observed correlation is indeed a pho-
ton correlation, while hadron contamination in the photon sample just increases
combinatorial background and reduces the correlation strength. In addition, this
comparison shows that we have properly excluded the region of cluster interference.
Due to deflection by the magnetic field the electrons of the e+e− conversion pair
hit the calorimeter far from the location of the pair photon used in the correlation
function and thus effects related to the interference of close clusters are absent.
2.4. Photon residual correlations
The last possible source of the distortion of the photon correlation function are
residual correlations between photons. We have already demonstrated that the con-
November 9, 2018 19:7 WSPC/INSTRUCTION FILE DPeressounko-
ggHBT-T
Bose-Einstein correlations of direct photons in Au+Au collisions 5
tributions of residual correlations between photons in decays like η → 3π0 → 6γ,
with strength proportional to Npart and not N
part is negligible in Au+Au collisions.
Below we consider other effects, which may cause photon correlations. These are
collective flow (and jet-like correlations) and correlations between photons, origi-
nated from decays of Bose-Einstein correlated mesons. Collective (elliptic) flow as
well as jet-like correlations are long-range effects, resulting in correlations at relative
angles much larger than under consideration here (for example, the opening angle of
a photon pair with 20 MeV mass and KT = 500 MeV is ∼ 5 degrees). Monte-Carlo
simulations demonstrate that flow and jet-like contribution are indeed negligible.
 (GeV)
0 0.05 0.1 0.15 0.2 0.25 0.3
C Min.Bias Au+Au, Data
 HBT resid.corr., Sim.0π
Fig. 3. Comparison of two-photon correlation functions measured in Au+Au collisions at
sNN =
200 GeV with Monte-Carlo simulations of the contribution of residual correlations due to decays
of Bose-Einstein-correlated neutral pions. Absolute vertical scale is omitted in this technical plot.
Potentially, the most serious distortion of the photon correlation function are
residual correlations between decay photons of HBT-correlated π0s. Monte-Carlo
simulations show that this contribution is not negligible, but has a rather specific
shape (see Fig. 3), so that it does not distort the photon correlation function at
small Qinv. This result can be explained as follows. Let us consider two π
0s with
zero relative momentum. The distribution of decay photons is isotropic in their rest
frame, and the probability to find a collinear photon pair (Qinv = 0) is suppressed
due to phase space reasons. The photon pair mass distribution has a maximum at
2/3mπ, not at zero. After convoluting with the pion correlation function we find
a step-like two-photon correlation function3. On the other hand, if one artificially
chooses photons with momentum along the direction of the parent π0 (e.g. by
looking at photon pairs at very large KT ), then the shape of the decay photon
correlation function will reproduce the shape of the parent π0 correlation. This
November 9, 2018 19:7 WSPC/INSTRUCTION FILE DPeressounko-
ggHBT-T
6 D. Peressounko
probably explains the different shape of the residual correlations due to decays of
HBT-correlated π0 found in10.
3. Conclusions
We have presented the current status of analysis of direct photon Bose-Einstein
correlations in the PHENIX experiment. We are able to measure the two-photon
correlation function with a precision sufficient to extract the direct photon corre-
lations. Correlation measurements in which one of the photon pair has converted
to an e+e− pair have been used to provide an important cross-check. We have
demonstrated that all known backgrounds are under control. The extraction of the
correlation parameters of direct photon pairs is in progress.
References
1. A.N. Makhlin, JETP Lett. 46:55 (1987); A.N. Makhlin, Sov.J.Nucl.Phys.
49:151,(1989).
2. D.K. Srivastava, J. Kapusta, Phys.Rev. C48:1335 (1993); D.K. Srivastava, C. Gale,
Phys.Lett. B319:407 (1993); D.K. Srivastava, Phys.Rev. D49:4523 (1994); D.K. Srivas-
tava, J. Kapusta, Phys.Rev. C50:505 (1994). D.K. Srivastava, Phys.Rev. C71:034905
(2005); S. Bass, B. Muller, D.K. Srivastava, Phys.Rev. Lett. 93:162301 (2004).
3. D. Peressounko, Phys.Rev. C67:014905 (2003).
4. J. Alam et al., Phys.Rev. C67:054902 (2003); J. Alam et al., Phys.Rev. C70:054901
(2004).
5. T. Renk, Phys.Rev. C71:064905 (2005); hep-ph/0408218.
6. D. d’Enterria and D.Peressounko, Eur.Phys.J.C46:451 (2006).
7. M.M. Aggarwal et al., Phys.Rev.Lett. 93:022301 (2004).
8. S.S.Adler et al., (PHENIX collaboration), Phys.Rev.Lett. 98:012002.
9. S.Bathe et al., (PHENIX collaboration), Nucl.Phys. A774:731 (2006).
10. D.Das et al., nucl-ex/0511055.
http://arxiv.org/abs/hep-ph/0408218
http://arxiv.org/abs/nucl-ex/0511055
	Introduction
	Analysis
	Apparatus effects
	Photon conversion,  annihilation, and similar backgrounds
	Charged and neutral hadron contamination
	Photon residual correlations
	Conclusions
ABSTRACT
  The current status of the analysis of direct photon Bose-Einstein
correlations in Au+Au collisions at $\sqrt{s_{NN}}=200$ GeV done by the PHENIX
collaboration is summarized. All possible sources of distortion of the
two-photon correlation function are discussed and methods to control them in
the PHENIX experiment are presented.

<|endoftext|><|startoftext|>
Introduction
Let (M, g) be a Riemannian manifold of dimension 2. The normalized ricci flow
= (r −R)gij ,
where R is the scalar curvature and r is some constant. For compact surface, r is
the average of scalar curvature. In this case, Hamilton [4] and Chow [2] proved the
normalized Ricci flow from any initial metric will exist for all time and converge
to a metric of constant curvature. It’s therefore nature to ask if such result holds
for non-compact surfaces. Recently, a preprint of Ji and Sesum [14] generalized
the above result to complete surfaces with logarithmic ends. Such surfaces have
infinities like hyperbolic cusps. In particular, they have finite volume, therefore are
parabolic, in the sense that there exists no positive Green’s function. One of their
result shows that the normalized Ricci flow from such a metric will exist for all time
and converge to hyperbolic metric. In this paper, we study nonparabolic complete
surfaces, i.e. surfaces admitting positive Green’s function. In contrast to [14],
such surfaces have at least one nonparabolic end and have infinite volume. For a
discussion of parabolic and nonparabolic ends and their geometric characterization,
see Li’s survey paper [6].
Here we choose r = −1 because if the flow converges, the limit metric will be
of constant curvature r. Since we are considering noncompact surfaces, r can’t
be positive. If r = 0, the limit will be flat R2 or its quotient. However, it’s
well known that these flat surfaces are parabolic. On the other hand, whether a
surface is parabolic or nonparabolic is invariant under quasi-isometries. Since if
the normalized Ricci flow converges, then the limit metric will be quasi-isometric
to the initial one, we know r can’t be zero.(For the definition of quasi-isometry, see
also [6].) If r < 0, we can always assume r = −1 by a scaling.
The main result of this paper is
Theorem 1.1. Let (M, g) be a nonparabolic surface with bounded curvature. If the
infinity is close to a hyperbolic metric in the sense that
|R+ 1| dV < +∞.
http://arxiv.org/abs/0704.0853v2
2 HAO YIN
Then, the normalized Ricci flow will converge to a metric of constant scalar curva-
ture −1.
As in [14], we try to apply the above result to prove results along the line
of Uniformization theorem. That amounts to prove the existence of a complete
hyperbolic metric within a given conformal class of a noncompact surface. In [14],
the authors proved that there is a uniformization theorem for Riemann surfaces
obtained from compact Riemann surface by removing finitely many points and
remarked that similar result should be true for Riemann surfaces obtained from
compact ones by removing finitely many disjoint disks and points. Our theorem
can be used to prove the same result in the case there is at least one disk removed.
In fact, we will give a unified proof, which includes and simplifies the proof of [14].
Precisely, we will show
Corollary 1.2. Let M be a Riemann surface obtained from compact Riemann sur-
face by removing finitely many disjoint disks and/or points. If no disk is removed,
then we further assume that the Euler number of M is less than zero. Then there
exists on M a complete hyperbolic metric compatible with the conformal structure.
The proof of Theorem 1.1 is along the same line as [14]. The method was initiated
by Hamilton in [4]. There, Hamilton considered only compact case. for the purpose
of generalizing this method to complete case, we need to overcome some analytic
difficulties. Precisely, one need to solve Poisson equations and obtain estimates for
the solutions, for all t. Those growth estimates for the solution are needed to apply
the maximum principle. As for the maximum principle, there are many versions of
maximum principle on complete manifolds. Since we will be working on complete
manifold with a changing metric, the closest version for our need is in [1]. We still
need a little modification.
Theorem 1.3. Suppose g(t) is a smooth family of complete metrics defined on M ,
0 ≤ t ≤ T with Ricci curvature bounded from below and
∣ ≤ C on M × [0, T ].
Suppose f(x, t) is a smooth function defined on M × [0, T ] such that
△tf −
whenever f(x, t) > 0 and
exp(−ar2t (o, x))f2+(x, t)dVt < ∞
for some a > 0. If f(x, 0) ≤ 0 for all x ∈ M , then f ≤ 0 on M × [0, T ].
Although there is no detail in [1], one can prove it using the method of Ecker
and Huisken in [3] and Ni and Tam in [12].
To solve the Poisson equation△u = R+1 for t = 0. We use a result of Ni[10], See
Theorem 3.1. That’s the reason why we assume
|R+ 1| dV < +∞. Moreover,
we prove a growth estimate of the solution under the further assumption that Ricci
curvature bounded from blow. This result is true for all dimensions. For the growth
estimate, an estimate of Green’s function is proved under the assumption that Ricci
curvature bounded from below. This estimate may be of independent interests, see
the discussion in Section 2.
Instead of solving △tu(x, t) = R(x, t) + 1 for later t. We solve an evolution
equation for u. Thanks to the recent preprint of Chau, Tam and Yu [1], we can
RICCI FLOW ON SURFACES 3
solve this evolution equation with a changing metric. Following a method in [11],
we show that u, |∇u| and △u satisfy the growth estimate like in equation (1). With
these preperation, we proceed to show that u(x, t) is indeed the potential functions
we need. Now the Theorem 1.1 follows from the approach of Hamilton and repeated
use of Theorem 1.3.
The paper is organized as follows: In Section 2, we prove the crucial estimate of
Green’s function needed for the growth estimate. In Section 3, we solve the Poisson
equation and prove the relevant growth estimates. In the last section, we prove
Theorem 1.1 and discuss results related to Uniformization theorem.
2. An estimate of Green’s function
In this section we prove that
Theorem 2.1. Let (M, g) be a complete noncompact manifold with Ricci curvature
bounded from below by −K. Assume that M admits a positive Green’s function
G(x, y). Let x0 be a fixed point in M . Then there exists constant A > 0 and B > 0,
which may depend on M and x0, so that
{G(x,y)>eAr(y,x0)}
G(x, y)dx ≤ BeAr(y,x0),
where r(y, x0) is the distance from y to x0.
Remark 2.2. It’s impossible to get an estimate of this kind with constant depending
only on K. Considering a family of nonparabolic manifolds Mi, which are becoming
less and less ’nonparabolic’, i.e. their infinities are closing up. For any A,B > 0,
there exists Mi and some xi ∈ Mi such that
{Gi(x,xi)>A}
Gi(x, xi)dx > B.
See [8].
Remark 2.3. To the best of the author’s knowledge, known estimates on Green’s
function in terms of volume of balls require Ricci curvature to be non-negative,
See [9]. There could be one estimate of such type for Ricci curvature bounded from
below, in light of [1]. If so, our relative estimate should be a corollary. The following
proof is a direct one.
We begin with a lemma,
Lemma 2.4. There is a constant C depending only on K and the dimension, such
that if Ricci curvature on B(x, 1) is bounded from below by −K and G(x, y) is the
Dirichlet Green’s function on B(x, 1), then
B(x,1)
G(x, y)dy < C.
Proof. Let H(x, y, t) be the Dirichlet heat kernel of B(x, 1). It’s easy to see
B(x,1)
H(x, y, t)dy ≤ 1,
for all t > 0.
4 HAO YIN
Now we prove that H(x, y, 2) is bounded from above. The proof is Moser itera-
tion, which has appeared several times. Here we follow computations in [17]. Since
we have Dirichlet boundary condition, we don’t need cut off function of space.
Let 0 < τ < 2 and 0 < δ ≤ 1/2 be some positive constants, σk = (1− (1/2)kδ)τ
and ηi be smooth function on [0,∞) such that 1) ηi = 0 on [0, σi], 2) ηi = 1 on
[σi+1,∞) and 3) η′i ≤ 2i+3(δτ)−1. Let pi = (1 + 2n )
i. Since H is a solution to the
heat equation, it’s easy to know Hp is a subsolution to the heat equation for p > 1.
−△y)Hp(x, y, t) ≤ 0.
Multiply by η2iH
pi and integrate
B(x,1)
−△y)Hpidydt ≤ 0.
Routine computation gives
B(x,1)
|∇yHpi |2 dydt+
B(x,1)
H2pi(x, y, T )dy ≤ 2i+3(τδ)−1
B(x,1)
H2pidydt.
The sobolev inequality in [13] implies
B(x,1)
(Hpi)
n−2 dy
≤ CV −2/n
B(x,1)
|∇yHpi |2 +H2pidy,
where V is the volume of B(x, 1). By Hölder inequality,
B(x,1)
H2pi+1dy ≤
B(x,1)
(Hpi)
n−2 dy
B(x,1)
H2pidy)2/n
≤ (CV −2/n
B(x,1)
|∇yHpi |2 +H2pidy)(
B(x,1)
H2pidy)2/n.
By (2), integrate over time
B(x,1)
H2pi+1dydt ≤ CV −2/nci+30 (στ)−(1+2/n)(
B(x,1)
H2pidydt)1+
where c0 = 2
1+2/n. A standard Moser iteration gives
t∈[τ,2]
y∈B(x,1)
H2(x, y, t) ≤ CV −1(στ)−
(1−δ)τ
B(x,1)
H2(x, y, t)dydt.
An iteration process as given in [7] implies the L1 mean value inequality. In par-
ticular,
y∈B(x,1)
H(x, y, 2) ≤ CV −1
B(x,1)
H(x, y, t)dydt ≤ CV −1.
Hence,
B(x,1)
H2(x, y, 2)dy ≤ CV −1.
RICCI FLOW ON SURFACES 5
Due to a Poincaré inequality in [7],
B(x,1)
H2(x, y, t)dy =
B(x,1)
2H△yHdy
B(x,1)
|∇yH |2 dy
B(x,1)
H2(x, y, t)dy.
This differential inequality implies
B(x,1)
H2(x, y, t)dy ≤
B(x,1)
H2(x, y, 2)dy × e−C(t−2)
≤ CV −1e−C(t−2).
Hölder inequality shows
B(x,1)
H(x, y, t) ≤ V (B(x, 1))
B(x,1)
H2(x, y, t)dy ≤ Ce−C(t−2),
for t ≥ 2. The lemma follows from
B(x,1)
G(x, y)dy =
B(x,1)
H(x, y, t)dydt.
Now let’s turn to the proof of Theorem 2.1.
Proof. The key tool in the proof is Gradient estimate for harmonic function. Recall
that if u is a positive harmonic function on B(x, 2R), then
B(x,R)
|∇ log u(x)|2 ≤ C1K + C2R−2
This is to say outside B(x, 0.1), the Green function as a function of y decays or
increases at most exponentially with a factor
C1K + 100C2.
(1) Consider G(x0, y), Set
p = max
y∈∂B(x0,1)
G(x0, y).
As pointed out in Li and Tam, in the paper constructing Green function, G(x0, y) ≤
p for y /∈ B(x0, 1). Since the Green function is symmetric, for any point y far out
in the infinity, G(y, x0) ≤ p.
(2) If the theorem is not true, then for any big A and B, there is a point y (far
away) so that
{G(x,y)>eAr(y,x0)}
G(x, y) > BeAr(y,x0).
We will derive a contradiction with (1).
Claim: {x|G(x, y) > eAr} ⊂ B(y, 1) is not true.
If true, then consider the Dirichlet Green function G1(z, y) on B(y, 1). It’s well
known that G(z, y) − G1(z, y) is a harmonic function. Notice that this harmonic
function has boundary value less than eAr. Therefore, its integration on B(y, 1) is
less than eAr × V ol(B(y, 1)). Since we assume Ricci lower bound, V ol(B(y, 1)) is
less than a universal constant depending on K.
6 HAO YIN
Therefore,
{G(x,y)>eAr}
G(x, y)dx ≤
B(y,1)
G(x, y)dx
≤ V ol(B(y, 1))× eAr +
B(y,1)
G1(x, y)dx
≤ C(K,n)× eAr,
where we used Lemma 2.4 for the last inequality. If we choose B to be any number
larger than C(K,n) in the above equation, then the choice of y gives an contradic-
tion and implies that the claim is true.
(3) There is a z ∈ {x|G(x, y) > eAr} so that d(z, y) = 1 because the set
{G(x, y) > eAr} is connected. This follows from the maximum principle and the
construction of Green’s function.
(3.1) If |d(y, x0)− d(z, x0)| < 0.3, then
Let σ be the minimal geodesic connecting z and x0.
Claim: the nearest distance from y to σ is no less than 0.1.
If not, let w be the point in σ such that d(y, w) < 0.1. Since d(y, z) > 1, we
d(w, z) > 0.9
Now, w is on the minimal geodesic from z to x0, so
d(w, x0) ≤ d(z, x0)− 0.9
d(y, x0) < d(w, x0) + d(y, w) < d(z, x0)− 0.8
This is a contradiction , so the claim is true.
We can use the gradient estimate along the segment σ. (Notice that d(z, x0) <
r(x, x0) + 1)
G(y, x0) >
G(y, z)
C1K + 100C2(r + 1))
This is a contradiction if we choose A >>
C1K + 100C2.
(3.2) If d(z, x0) ≤ d(y, x0)− 0.3, then
The distance from y to the minimal geodesic connecting z and x0 will be larger
than 0.1. The above argument gives a contradiction.
(3.3) If d(z, x0) ≥ d(y, x0) + 0.3, then
Since G(z, y) > eAr, we move the center to z, by symmetry of Green function.
G(y, z) > eAr(y,x0)) > eA
′r(z,x0). This is case (3.2). We get a contradiction at z.
This finishes the proof of estimate of Green function. �
3. Poisson equations △u = R+ 1
This section is divided into two parts. The first part solves the Poisson equation
for t = 0. The second part solves for t > 0 before the maximum time using an
indirect way.
First, we use Theorem 2.1 to obtain an growth estimate of the solution of the
Poisson equation △u = R + 1 for t = 0. The existence part without curvature
restriction and boundedness of f of the following theorem is due to Lei Ni in [10].
RICCI FLOW ON SURFACES 7
Theorem 3.1. Let M be a complete nonparabolic manifold with Ricci curvature
bounded from below by −K. For non-negative bounded continuous function f the
Poisson equation
△u = −f
has a non-negative solution u ∈ W 2,nloc (M) ∩ C
loc (M)(0 < α < 1) if f ∈ L1(M).
Moreover, for any fixed x0 ∈ M , there exists A > 0 and C > 0 such that
u(x) ≤ CeAr(x,x0).
Proof. Let G(x, y) be the positive Green’s function.
G(x, y)f(y)dy =
{G(x,y)≤eAr(x,x0)}
G(x, y)f(y)dy
{G(x,y)>eAr(x,x0)}
G(x, y)f(y)dy
≤ CeAr(x,x0).
For the first term, we use the assumption that f is integrable, for the second term,
we use the boundedness of f and the Theorem 2.1. The estimate above shows the
Poisson equation is solvable with the required estimate. �
Corollary 3.2. Let M be a surface satisfying the assumptions in Theorem 1.1.
There exists a solution u0 to the equation △u0 = R(x) + 1 satisfying
exp(−ar2(x, x0))u20(x)dV < ∞
exp(−br2(x, x0)) |∇u0|2 (x)dV < ∞
where a and b are some positive constants.
Proof. Solve the Poisson equation for the positive part and the negative part of
R+ 1 respectively. Then subtract the solutions. The first integral estimate follows
from the pointwise growth estimate and volume comparison.
Let R > 1. Choose a cut-off function ϕ such that
ϕ(x) =
1 x ∈ B(x0, R)
0 x /∈ B(x0, 2R)
|∇ϕ|2 ≤ C1ϕ.
Multiply the equation by ϕu0 and integrate over M ,
ϕu0△u0dV =
(R + 1)ϕu0dV,
which implies
ϕ |∇u0|2 dV +
u0∇ϕ · ∇u0dV = −
(R+ 1)ϕu0dV.
8 HAO YIN
Hence
(ϕ− |∇ϕ|
) |∇u0|2 dV ≤ C
B(x0,2R)
u20dV + C
B(x0,2R)
|u0| dV.
B(x0,2R)
u20dV + CV ol(B(x0, 2R)).
From the integration estimate of u0,
B(x0,2R)
u20dV ≤ Ce4aR
By choice of ϕ,
B(x0,R)
|∇u0|2 dV ≤ CeãR
From here, it’s not difficult to see the estimate we need. �
Now let’s look at the case of t > 0. In fact, it’s not difficult to show the above
method can be used for t > 0. This amounts to show that M is still nonparabolic
for t > 0 and
|R+ 1| dV is still finite. The first claim is trivial and the second
follows from the evolution equation and maximum principle. Assume the solutions
are u(t). We have trouble in deriving the evolution equation for u(t), due to the
possible existence of nontrivial harmonic functions. This explains why we use the
following indirect way.
Lemma 3.3. Assume the normalized Ricci flow exists for t ∈ [0, Tmax). The
following equation has a solution u(x, t) (0 ≤ t < Tmax) with initial value u0,
= △u− u,
where △ is the Laplace operator of metric g(t). Moreover, there exists a > 0
depending on T such that for any T < Tmax
exp(−ar2(x, x0))u2(x, t)dVt < ∞.
Similar estimates hold for |∇u| and △u with different constants.
Remark 3.4. Since g(0) and g(t) are equivalent up to a constant depending on T ,
it doesn’t matter whether we estimate ∇u or ∇tu and whether we use r to stand
for distance at g(0) or g(t) if t ∈ [0, T ].
Proof. In [1], the authors considered a class of evolution equation with changing
metric. ∂u
= △u− u with the underling metric evolving by normalized Ricci flow
is in this class. They proved, among other things, that the fundamental solution
Z(x, t; y, s) has a Gaussian upper bound, i.e
Z(x, t; y, x) ≤ C
t− s)
r2(x,y)
D(t−s) .
These constants depends on the solution of normalized Ricci flow and T . See
Corollary 5.2 in [1]. For simplicity, denote Z(x, t; y, 0) by H(x, y, t), then to solve
the equation, it suffices to show the following integral converges,
u(x, t) =
H(x, y, t)u0(y)dy.
RICCI FLOW ON SURFACES 9
Bt(x,1)
H(x, y, t)u0(y)dy
≤ CeAr(x,x0),
because the integral of H on Bt(x, 1) is less than 1 and u0 grows at most exponen-
tially by Theorem 2.1.
M\Bt(x,1)
H(x, y, t)u0(y)dy
M\Bt(x,1)
r2(x,y)
Dt |u0(y)| dy.
By volume comparison,
Vx(1) ≥ C1e−A1r(x,x0)Vx0(1)
t) ≥ C2e−A1r(x,x0) min(1, tn/2).
Therefore
M\Bt(x,1)
H(x, y, t)u0(y)dy
M\Bt(x,1)
CeA1r(x,x0)e−
r2(x,y)
2DT |u0(y)| dy
M\Bt(x,1)
CeA2r(x,x0)eAr(x,y)e−
r2(x,y)
2DT dy
≤ CeA2r(x,x0).
In summary,
|u(x, t)| ≤ CeAr(x,x0),
where A means a different constant. Volume comparison then implies
exp(−ar2(x, x0))u2(x, t)dVt < ∞.
For estimates on derivatives, note first that etu(x, t) is a solution of heat equation
(with evolving metric)
with initial value u0. Since we allow constants depend on T , it’s equivalent to prove
estimates for etu(x, t). Therefore, from now on, to the end of this proof, we assume
u(x, t) is a solution of heat equation. Then
(2) (△− ∂
)u2 = 2 |∇u| .
Assume that ϕ : R+ → R+ satisfies
1) ϕ(x) = 1 for x ≤ 1;
2) ϕ(x) = 0 for x ≥ 2.
Choose the cut-off function ϕ(
r(x,x0)
)(R > 1). Multiplying this to the equation
(2) and integrate,
r(x, x0)
) |∇u|2 dVtdt ≤
r(x, x0)
)u2dVtdt.
△ϕ( r
) = div(ϕ′(
= ϕ′′(
|∇r|2
+ ϕ′(
10 HAO YIN
By definition of ϕ, we know ϕ( r
) vanishes unless R ≤ r(x, x0) ≤ 2R. Laplacian
comparison implies (curvature is bounded from below −k)
△r ≤ (n− 1)
kcoth(
kr) ≤ C.
Therefore,
ϕ△u2dVt ≤ C
B(x0,2R)
u2dVt.
Let dVt = e
FdV0,
(ϕu2)eF dV0dt ≥
(ϕu2eF )dtdV0 −
Cϕu2dVtdt
ϕu2(x, T )dVT −
ϕu2(x, 0)dV0 − C
ϕu2dVtdt
ϕu20(x)dV0 − C
ϕu2dVtdt.
Here we have used the fact that ∂e
is bounded. Combined with equation (3) and
B(x0,R)
|∇u|2 dVtdt ≤ C
B(x0,2R)
u2dVtdt+
B(x0,2R)
u20(x)dV0.
From here it’s easy to see the type of estimate in Theorem 1.3. For △u, it suffices
to consider
∣. The Bochner formula in this case is (remember we have assumed
that u is a solution of the heat equation),
(△− ∂
) |∇u|2 = 2
2 − |∇u|2 .
The same argument as before works for
Lemma 3.5. For t ∈ [0, Tmax),
△tu(x, t) = R(x, t) + 1.
Proof. We know for t = 0 it’s true. Calculation shows
(△tu−R(t)− 1) = (R+ 1)△tu+△t(△tu− u)−△tR−R(R+ 1)
= △t(△tu−R(t)− 1) +R(△tu−R− 1)
By previous lemma, we have growth estimate for △tu−R(t)−1. If △tu−R−1 ≥ 0,
−△t)(△tu−R(t)− 1) ≤ C(△tu−R− 1).
If △tu−R− 1 ≤ 0, then
−△t)(△tu−R(t)− 1) ≥ C(△tu−R− 1).
Apply maximum principle for △tu − R − 1, which is zero at t = 0. We know it’s
zero forever. �
RICCI FLOW ON SURFACES 11
4. Proof of the main theorem and the corollary
Assume we have a surface satisfying the assumptions of Theorem 1.1. Short
time existence is known, see [15]. The long time existence and convergence follows
exactly by an argument of Hamilton in [4]. For completeness, we outline the steps.
Solve Poisson equations△tu(x, t) = R(x, t)+1 as we did. Consider the evolution
equation for H = R+ 1 + |∇u|2,
H = △H − 2 |M |2 −H,
where M = ∇∇u − 1
△f · g. Since we have growth estimate for H , maximum
principle says
R+ 1 ≤ H ≤ Ce−t.
Therefore, after some time R will be negative everywhere. Applying maximum
principle again to the evolution equation of scalar curvature
R = △R+ R(R+ 1)
will prove Theorem 1.1.
Next, we discuss the application of the above theorem to Uniformization theorem.
Let S be a compact Riemann surface. Let p1, · · · , pk be k different points in S
and D1, · · · , Dl be l domains on S such that all of them are disjoint and Di is
diffeomorphic to disk. Denote S \ ∪iDi \ {p1, · · · , pk} by M . The aim is to show
there exists a complete hyperbolic metric on M compatible with the conformal
structure.
The approach is to construct an initial metric g0 on M compatible with the
conformal structure so that the normalized Ricci flow starting from g0 will converge
to a hyperbolic metric. Assume there is metric h in the given conformal class of S.
Note that h is incomplete as a metric on M .
For pi, there is an isothermal coordinate (x, y) around pi. By a conformal change
of h, one can ask g0 to be
(x2 + y2) log2(x2 + y2)
(dx2 + dy2)
in a small neighborhood Ui of pi.
Remark 4.1. This is called hyperbolic cusp metric in [14] and it has scalar curva-
ture −1.
For Dj, let r be the distence to ∂Dj on M with respect to h. Let Vj be a
neighborhood of ∂Dj in M . Let (r, θ) be the Fermi coordinates for ∂Dj so that
h0 = dr
2 +A(r, θ)dθ2.
We will find ρ = ρ(r, θ) such that
1) ρ = 0 on ∂Dj;
2) dρ 6= 0 on ∂Dj;
12 HAO YIN
is asymptoticly hyperbolic in high order. Let K and K0 be the Gaussian curvature
of h and g0 respectively. We have the formula,
K0 = ρ
2(△h log ρ+K).
In order that K0 = −1,
1− |∇ρ|2 + ρ△ρ+ ρ2K = 0.
In terms of r and θ,
|∇ρ|2 = (∂ρ
)2 +A−1(r, θ)(
△ρ = ∂
Here A, B, C and D are smooth functions of r and θ. The equation now becomes
(5) ρ
+ 1− (
)2 −A−1(
)2 + ρ2K = 0.
If equation (5) is true at r = 0, then
(r, θ) = 1.
Here we used that fact that ρ > 0.
Set η(r, θ) = ρ
. Equation (6) implies η(0, θ) = 1. Equation (5) becomes
+Brη +Br2 ∂η
+ Cr2 ∂η
+Dr2 ∂
+ 1−η
)2 −A−1 r
)2 + ηr2K = 0
For the convinience of formal calculation, this equation is rewritten as
(7) (D2 −D − 2)η + F [r, η] = 0,
where D = r ∂
F [r, η] = Brη +Br2
+ Cr2
(1− η)2
)2 −A−1 r
)2 + ηr2K.
Equation (7) is a very typical form of Fuchsian type PDE. Formal solutions of
this kind of equation has been discussed many times. For example, Kichenassamy
[5] and Yin [16]. We will only outline the main steps here, for details see [5] and
[16].
Consider formal solution with the following expansion,
(8) η(r, θ) = 1 +
aij(θ)r
i(log r)j .
We will call the sum
j=0 aijr
i(log r)j the i-level of the expansion. Note that D
maps i-level to i-level. Details on formal calculation could be find in [5] and [16].
A common feature of all terms in F [r, η], which is crutial in obtaining a formal
solution, is that the k-level of F [r, η] could be calculated with knowledge of only
l-level of η with l < k. For example, consider (1 − η)2/η. It’s the multiplication
of three formal series, two 1 − η and 1/η. In order the k-level of η appears in the
RICCI FLOW ON SURFACES 13
k-level of (1 − η)2/η, the only possibility is that two of the three series contribute
zero level and one k-level. However, the zero level of 1− η vanishes.
The only thing we need is that there exists a formal solution and furthermore
due to Borel’s Lemma as in [16], there is an approximate solution so that
(D2 −D − 2)η + F [r, η] = o(rk)
for any k. In terms of ρ,
(9) K0 + 1 = 1 + ρ
2(△h log ρ+K) = o(ρk)
for any k. This metric g0 near ∂Dj has Gaussian curvature -1 asymptotically. By
a scaling, we assume it has scalar curvature -1 asymptotically.
We construct g0 by doing the above to every point Pi and disk Dj. If there is
at least one disk removed, we know M is nonparabolic.
|R+ 1| dV
is finite because of equation (9). Therefore, Theorem 1.1 proves the Uniformization
in this case.
If there is no disk removed, i.e. M = S \ {p1, · · · , pk} and M has negative
Euler number, then it’s proved in [14] that there exists a hyperbolic metric in the
conformal class. A large part of [14] is devoted to solve
(10) △g0u = Rg0 + 1
with |∇u| < ∞.
Observe that the above equation is equivalent to
(11) △hu =
(Rg0 + 1).
Since every end of (M, g0) is a hyperbolic cusp, Gauss-Bonnet theorem says
Rg0dV0 = 2πχ(M) < 0.
There exists a function f of compact support on M such that the volume of
(M, efg0) is −2πχ(M), because (M, g0) has finite volume. Denote efg0 by g0,
since the infinity is not changed, equation (12) is still true. Now, the volume of
(M, g0) is −2πχ(M). This implies
(Rg0 + 1)dV0 = 0.
Therefore
(Rg0 + 1)dVh = 0.
By construction of g0, we know
(Rg0+1) is zero near Pi. So
(Rg0+1) is a smooth
function on S. Therefore, equation (11) is solvable. Since u is a smooth function
on compact surface S, u has bounded gradient with respect to h. The relation of
h and g0 near Pi is explicit. It’s straight forward to check u has bounded gradient
as a function of (M, g0). This simplifies the proof in [14].
14 HAO YIN
Remark 4.2. In the case that there is at least one disk removed, by construction
of g0, Rg0 +1 vanishes at high order near ∂Dj. Then one can extend the definition
(Rg0 + 1) to S so that
(Rg0 + 1)dVh = 0.
The rest is the same as in the previous case.
This method of solving Poisson equation depends on the conformal structure of
M , therefore Theorem 3.1 and Theorem 1.1 are not coverd by the above discussion.
References
[1] Chau, A., Tam, L.-F. and Yu, C., Pseudolocality for the Ricci flow and applications, preprint,
DG/0701153.
[2] Chow, B., The Ricci flow on the 2-sphere, J. Diff. Geom. 33(1991), 325-334.
[3] Ecker, K. and Huisken, G., Interior estimates for hypersurfaces moving by mean curvature,
Invent. Math., 105(1991), 547-569.
[4] Hamilton, R., The Ricci flow on surfaces. Mathematics and General Relativity, Contemporary
Mathematics, 71(1988), 237-261.
[5] Kichenassamy, S., On a conjecture of Fefferman and Graham, Adv. In Math., 184(2004),
268-288.
[6] Li, P., Curvature and function theorey on Riemannian manifolds, Surveys in Differential Ge-
ometry, Vol VII, International Press(2000), 375-432.
[7] Li, P. and Schoen, R., Lp and mean value properties of subharmonic functions on Riemannian
manifolds, Acta Math. , 153(1984), no.3-4, 279-301.
[8] Li, P. and Tam, L.-F., Symmetric Green’s function on complete manifolds, Amer. J. Math.,
109(1987), 1129-1154.
[9] Li,P. and Yau, S.-T., On the parabolic kernel of the Schrödinger operator, Acta. Math.,
156(1986), 139-168.
[10] Ni, L. Poisson equation and Hermitian-Einstein metrics on holomorphic vector bundles over
complete noncompact Kähler manifolds, Indiana Univ. Math. Jour., 51 (2002), 670-703.
[11] Ni, L. and Tam, L.-F., Plurisubharmonic functions and the structure of complete Kähler
manifolds with nonnegative curvture, J. Diff. Geom., 64(2003), 457-524.
[12] Ni, L. and Tam, L.-F., Kähler Ricci flow and Poincaré Lelong equation, Comm. Anal. Geom.,
12(2004), no 1. 111-141.
[13] Saloff-Coste, L. Uniform elliptic operators on Riemannian manifolds, J. Diff. Geom., 36(1992),
417-450.
[14] Ji, L.-Z. and Sesum, N. Uniformization of conformally finite Riemann surfaces by the Ricci
flow, preprint, DG/0703357.
[15] Shi, W.-X., Deforming the metric on complete Riemannian manifolds, J. Diff. Geom.,
30(1989), 223-301.
[16] Yin, H., Boundary regularity of harmonic maps from Hyperbolic space into nonpositively
curved manifolds, to appear in Pacific. J. Math..
[17] Zhang, Q.-S., Some gradient estimates for the heat equation on domains and for an equation
by Perelman, preprint, DG/0605518.
	1. Introduction
	2. An estimate of Green's function
	3. Poisson equations u=R+1
	4. Proof of the main theorem and the corollary
	References
ABSTRACT
  This paper studies normalized Ricci flow on a nonparabolic surface, whose
scalar curvature is asymptotically -1 in an integral sense. By a method
initiated by R. Hamilton, the flow is shown to converge to a metric of constant
scalar curvature -1. A relative estimate of Green's function is proved as a
tool.

<|endoftext|><|startoftext|>
myjournal manuscript No.
(will be inserted by the editor)
Polarization properties of subwavelength hole arrays consisting of
rectangular holes
Xi-Feng Ren, Pei Zhang, Guo-Ping Guo⋆, Yun-Feng Huang, Zhi-Wei Wang, Guang-Can Guo
Key Laboratory of Quantum Information, University of Science and Technology of China, Hefei 230026, People’s Republic of
China
Received: date / Revised version: date
Abstract Influence of hole shape on extraordinary op-
tical transmission was investigated using hole arrays con-
sisting of rectangular holes with different aspect ratio.
It was found that the transmission could be tuned con-
tinuously by rotating the hole array. Further more, a
phase was generated in this process, and linear polar-
ization states could be changed to elliptical polarization
states. This phase was correlated with the aspect ratio of
the holes. An intuitional model was presented to explain
these results.
PACS numbers:78.66.Bz,73.20.MF, 71.36.+c
1 introduction
In metal films perforated with a periodic array of sub-
wavelength apertures, it has long been observed that
there is an unusually high optical transmission[1]. It is
⋆ E-mail: e-mail: :gpguo@ustc.edu.cn
believed that metal surface plays a crucial role and the
phenomenon is mediated by surface plasmon polaritons
(SPPs) and there is a process of transforming photon
to SPP and back to photon[2,3,4]. This phenomenon
can be used in various applications, for example, sen-
sors, optoelectronic device, etc[5,6,7,8,9,10]. Polariza-
tion properties of nanohole arrays have been studied in
many works[11,12,13]. Recently, orbital angular momen-
tum of photons was explored to investigate the spatial
mode properties of surface plasmon assisted transmis-
sion [14,15]. It is also showed that entanglement of pho-
ton pairs can be preserved when they respectively travel
through a hole array [15,16,17]. Therefore, the macro-
scopic surface plasmon polarizations, a collective excita-
tion wave involving typically 1010 free electrons propa-
gating at the surface of conducting matter, have a true
quantum nature. However, the increasing use of EOT
requires further understanding of the phenomenon.
http://arxiv.org/abs/0704.0854v2
2 Xi-Feng Ren, Pei Zhang, Guo-Ping Guo, Yun-Feng Huang, Zhi-Wei Wang, Guang-Can Guo
The polarization of the incident light determines the
mode of excited SPP which is also related to the periodic
structure. For the manipulation of light at a subwave-
length scale with periodic arrays of holes, two ingredi-
ents exist: shape and periodicity[2,3,4,11,18,19,20]. In-
fluence of unsymmetrical periodicity on EOT was dis-
cussed in [21]. Influence of the hole shape on EOT was
also observed recently[18,20], in which the authors mainly
focused on the transmission spectra. In this work, we
used rectangle hole arrays to investigate the influence of
hole shape on the polarization properties of EOT. It is
found that linear polarization states could be changed
to elliptical polarization states and a phase could be
added between two eigenmode directions. The phase was
changed when the aspect ratio of the rectangle holes was
varied. The hole array was also rotated in the plane per-
pendicular to the illuminate beam. The optical transmis-
sion was changed in this process. It strongly depended
on the rotation angle, in other words, the angle between
polarization of incident light and axis of hole array, as
in the case with unsymmetrical hole array structure[21].
2 experimental results and modeling
2.1 Relation between transmission efficiency and
photon polarization
Fig. 1(a) is a scanning electron microscope picture of
part of our hole arrays. The hole arrays are produced
as follows: after subsequently evaporating a 3-nm tita-
nium bonding layer and a 135-nm gold layer onto a 0.5-
mm-thick silica glass substrate, a focused ion beam etch-
ing system is used to produce rectangle holes (100nm×
100nm, 100nm × 150nm, 100nm × 200nm, 100nm ×
300nm respectively) arranged as a square lattice (520nm
period). The area of the hole array is 10µm× 10µm.
Transmission spectra of the hole arrays were recorded
by a silicon avalanche photodiode single photon counter
couple with a spectrograph through a fiber. White light
from a stabilized tungsten-halogen source passed though
a single mode fiber and a polarizer (only vertical polar-
ized light can pass), then illuminated the sample. The
hole arrays were set between two lenses of 35mm focal
length, so that the light was normally incident on the
hole array with a cross sectional diameter about 10µm
and covered hundreds of holes. The light exiting from
the hole array was launched into the spectrograph. The
hole arrays were rotated anti-clockwise in the plane per-
pendicular to the illuminating light, as shown in Fig.
            (a)                                  (b)  
Fig. 1 (Color online)The rectangle hole arrays. (a) Scanning
electron microscope pictures. (b) Rotation direction. S (L)
is the axis of short (long) edge of rectangle hole; H(V) is
horizontal (vertical) axis.
Polarization properties of subwavelength hole arrays consisting of rectangular holes 3
1(b). Transmission spectra of the hole arrays for rota-
tion angle θ = 0o and 90o were given in Fig. 2. There
were large difference between the two cases, which was
also observed in [18].
Further, the typical hole array(100nm×300nm holes)
was rotated anti-clockwise in the plane perpendicular to
the illuminating light(see Fig.1 (b)). Transmission effi-
ciencies of H and V photons(702nm wavelength) were
measured with rotation angle θ = 0o, 30o, 45o, 60o, and
90o respectively, as shown in Fig. 3. They were varied
with θ. To explain the results, we gave a simple model.
For our sample, photons with 702nm wavelength will
excite the SPP eigenmodes (0,±1) and (±1, 0). Since
the SPPs were excited in the directions of long (L) and
short (S) edges of rectangle holes, we suspected that this
550 600 650 700 750 800 850
550 600 650 700 750 800 850
550 600 650 700 750 800 850
550 600 650 700 750 800 850
Wavelength(nm)
Fig. 2 (Color online)Hole array transmittance as a function
of wavelength for rotation angle θ = 0o(black square dots)
and 90o(red round dots)(holes for a, b, c, and d are 100nm×
100nm, 100nm × 150nm, 100nm × 200nm, and 100nm ×
300nm respectively). The dashed vertical lines indicate the
wavelength of 702nm used in the experiment.
two directions were eigenmode-directions for our sample.
The polarization of illuminating light was projected into
the two eigenmode-directions to excite SPPs. After that,
the two kinds of SPPs transmitted the holes and irritated
light with different transmission efficiencies TL and TS
respectively. For light whose polarization had an angle θ
with the S direction, the transmission efficiency Tθ will
Tθ = TS cos
2(θ) + TL sin
2(θ). (1)
This equation was also given in the works[20,21]. Due
to the unequal values of TL and TS, the whole transmis-
sion efficiency was varied with angle θ. So if we know
the transmission spectra for enginmode-directions (here
L and S), we can calculate out the transmission spectra
(including the heights and locations of peaks) for any
0 15 30 45 60 75 90
10000
20000
30000
40000
50000
60000
70000
Tilt angle  (degree)
Fig. 3 (Color online)Transmittance as a function rotation
angle θ for photons in 702nm wavelength(100nm × 300nm
holes). Red round dots and black square dots are the counts
for V and H photons respectively. The lines come from the-
oretical calculation.
4 Xi-Feng Ren, Pei Zhang, Guo-Ping Guo, Yun-Feng Huang, Zhi-Wei Wang, Guang-Can Guo
θ. The theoretical calculations were also given in Fig.
3, which agreed well with the experimental data. The
similar results were also observed when the hole arrays
(100nm× 150nm and 100nm× 200nm) were used. With
this model, the transmission efficiency can be continu-
ously tuned in a certain range.
2.2 Influence of hole shape on photon polarization
To investigate the polarization property of the hole ar-
ray, we used the method of polarization state tomog-
raphy. Experimental setup was shown in Fig. 4. White
light from a stabilized tungsten-halogen source passed
though single mode fiber and 4nm filter (center wave-
length 702 nm) to generate 702nm wavelength photons.
Polarization of input light was controlled by a polarizer,
a HWP (half wave plate, 702nm) and a QWP (quar-
ter wave plate, 702nm). The hole array was set between
two lenses of 35mm focal length. Symmetrically, a QWP,
a HWP and a polarizer were combined to analyze the
polarization of transmitted photons. For arbitrary in-
put states, the output states were measured in the four
bases: H , V , 1/
2(|H〉 + |V 〉), and 1/
2(|H〉 + i|V 〉).
With these experimental data, we could get the density
matrix of output states, which gave the full polarization
characters of transmitted photons. For example, in the
case of θ = 0o, for input state 1/
2(|H〉 + eI∗0.5π|V 〉),
four counts (8943, 31079, 3623 and 21760) were recorded
when we used the four detection bases. The density ma-
trix was calculated as:
0.223 −0.410− 0.043i
−0.410 + 0.043i 0.777
, (2)
which had a fidelity of 0.997 with the pure state 0.472|H〉+
0.882eI∗0.967π|V 〉. Compared this state with the input
state, we found that not only the ratio of |H〉 and |V 〉
was changed, but also a phase ϕ = 0.467π was added
between them. The similar phenomenon was also ob-
served when the input state was 1/
2(|H〉 + |V 〉) and
in this case ϕ = 0.442π. We also considered the cases
for θ = 30o, 45o, 60o, and 90o. The experimental density
matrices had the fidelities all larger than 0.960 with the
theoretical calculations, where ϕ = (0.462 ± 0.053)π. It
can be seen that the phase ϕ was hardly influenced by
the rotation.
To study the dependence of phase ϕ with the hole
shape, we performed the same measurements on other
hole arrays which were shown in Fig. 1. It was found
that ϕ was changed with the aspect ratio of the rectan-
Filter Detector 
HA SMF Source SMF 
Polarization Controller 
Polarization Analyzer 
Fig. 4 (Color online)Experimental setup to investigate the
polarization property of our rectangle hole array. Polariza-
tion of input light was controlled by a polarizer, a HWP and
a QWP. The hole array was set between two lenses of 35mm
focal length. Symmetrically, a QWP, a HWP and a polar-
izer were combined to analyze the polarization of transmitted
photons.
Polarization properties of subwavelength hole arrays consisting of rectangular holes 5
gle holes. Fig. 5 gave the relation between ϕ and aspect
ratio. The phases are 0, (0.227±0.032)π, (0.357±0.020)π
and (0.462±0.053)π for aspect ratio 1, 1.5, 2.0 and 3.0 re-
spectively. As mentioned above, period is another impor-
tant parameter in the EOT experiments. Since no similar
result was observed for hole arrays with symmetrical pe-
riods, a special quadrate hole array(see Fig. 1 of [21]) was
also investigated to show the influence of the hole period.
We found that even the periods were different in two di-
rections, there was no birefringent phenomenon(ϕ = 0).
This birefringent phenomenon might be explained
with the propagating of SPPs on the metal surface. As
we know, the interaction of the incident light with sur-
face plasmon is made allowed by coupling through the
grating momentum and obeys conservation of momen-
k sp =
k 0 ± i
Gx ± j
Gy, (3)
1.0 1.5 2.0 2.5 3.0
Aspect ratio
Fig. 5 (Color online)Relation between birefringent phase ϕ
and hole shape aspect ratio. ϕ becomes lager when the aspect
ratio increases.
where
k sp is the surface plasmon wave vector,
k 0 is the
component of the incident wave vector that lies in the
plane of the sample,
Gx and
Gy are the reciprocal lattice
vectors, and i, j are integers. Usually, Gx = Gy = 2π/d
for a square lattice, and relation
k sp ∗ d = mπ was sat-
isfied, where m was the band index[22]. While for our
rectangle hole arrays, the length of holes in L direction
was changed form 150nm to 300nm, which was not as
same as it in S direction. Though Gx = Gy = 2π/d for
our rectangle hole array, the time for surface plasmon
polariton propagating in the L direction must be influ-
enced by the aspect ratio of hole shape, which could not
be same as that in the S direction. A phase difference
ϕ was generated between the two directions, leading the
birefringent phenomenon. Due to the absorption or scat-
tering of the SPPs and scattering at the hole edges, it
is hard to give the accurate value of the phase or the
exact relation between the phase and aspect ratio of
holes. Even so, ϕ could be controlled by changing the
hole shape. As a contrast, there was no birefringent phe-
nomenon observed when the quadrate hole array(see Fig.
1 of [21]) was used. The reason was that phase Gx ∗ dx
always equal to Gy ∗ dy, even Gx 6= Gy for the quadrate
hole array.
3 conclusion
In conclusion, rectangle hole array was explored to study
the influence of hole shape on EOT, especially the prop-
erties of photon polarization. Because of the unsymmet-
6 Xi-Feng Ren, Pei Zhang, Guo-Ping Guo, Yun-Feng Huang, Zhi-Wei Wang, Guang-Can Guo
rical of the hole shape, a birefringent phenomenon was
observed. The phase was determined by the hole shape,
which gave us a potential method to control this bire-
fringent process. It was also found that the transmission
efficiency can be tuned continuously by rotating the hole
array. These results might be explained using an intu-
itional model based on surface plasmon eigenmodes.
This work was funded by the National Fundamental
Research Program, National Natural Science Foundation
of China (10604052), Program for New Century Excel-
lent Talents in University, the Innovation Funds from
Chinese Academy of Sciences, the Program of the Educa-
tion Department of Anhui Province (Grant No.2006kj074A).
Xi-Feng Ren also thanks for the China Postdoctoral Sci-
ence Foundation (20060400205) and the K. C. Wong Ed-
ucation Foundation, Hong Kong.
References
1. T.W. Ebbesen, H. J. Lezec, H. F. Ghaemi, T. Thio, and
P. A. Wolff, Nature 391, 667 (1998).
2. H. Raether, Surface Plasmons on Smooth and Rough Sur-
faces and on Gratings, Vol. 111 of Springer Tracts in Mod-
ern Physics, Springer, Berlin, (1988).
3. D. E. Grupp, H. J. Lezec, T. W. Ebbesen, K. M. Pellerin,
and Tineke Thio, Appl. Phys. Lett. 77 1569 (2000).
4. M. Moreno, F. J. Garca-Vidal, H. J. Lezec, K. M. Pellerin,
T. Thio, J. B. Pendry, and T. W. Ebbesen, Phys. Rev.
Lett. 86, 1114 (2001).
5. S. M. Williams, K. R. Rodriguez, S. Teeters-Kennedy, A.
D. Stafford, S. R. Bishop, U. K. Lincoln, and J. V. Coe,
J. Phys. Chem. B. 108, 11833 (2004).
6. A. G. Brolo, R. Gordon, B. Leathem, and K. L. Kavanagh,
Langmuir. 20, 4813 (2004).
7. A. Nahata, R. A. Linke, T. Ishi, and K. Ohashi, Opt. Lett.
28, 423 (2003).
8. X. Luo and T. Ishihara, Appl. Phys. Lett. 84, 4780 (2004).
9. S. Shinada, J. Hasijume and F. Koyama, Appl. Phys. Lett.
83, 836 (2003).
10. C. Genet and T. W. Ebbeson, Nature, 445, 39 (2007).
11. J. Elliott, I. I. Smolyaninov, N. I. Zheludev, and A. V.
Zayats, Opt. Lett. 29, 1414 (2004).
12. R. Gordon, A. G. Brolo, A. McKinnon, A. Rajora, B.
Leathem, and K. L. Kavanagh, Phys. Rev. Lett. 92,
037401 (2004).
13. E. Altewischer, C. Genet, M. P. van Exter, and J. P.
Woerdman, Opt. Lett. 30, 90 (2005).
14. X. F. Ren, G. P. Guo, Y. F. Huang, Z. W. Wang, and
G. C. Guo, Opt. Lett. 31, 2792, (2006).
15. X. F. Ren, G. P. Guo, Y. F. Huang, C. F. Li, and G. C.
Guo, Europhys. Lett. 76, 753 (2006).
16. E. Altewischer, M. P. van Exter and J. P. Woerdman
Nature 418 304 (2002).
17. S. Fasel, F. Robin, E. Moreno, D. Erni, N. Gisin and H.
Zbinden, Phys. Rev. Lett. 94 110501 (2005).
18. K. J. Klein Koerkamp, S. Enoch, F. B. Segerink, N. F.
van Hulst and L. Kuipers, Phys. Rev. Lett. 92 183901
(2004).
19. Zhichao Ruan and Min Qiu, Phys. Rev. Lett. 96 233901
(2006).
Polarization properties of subwavelength hole arrays consisting of rectangular holes 7
20. M. Sarrazin, J. P. Vigneron, Opt. Commun. 240 89
(2004) .
21. X. F. Ren, G. P. Guo, Y. F. Huang, Z. W. Wang, and
G. C. Guo, Appl. Phys. Lett. 90, 161112 (2007).
22. F. L. Tejeira, S. G. Rodrigo, L. M. Moreno, F. J. G. Vi-
dal, E. Devaux, T. W. Ebbesen, J. R. Krenn, I. P. Radko,
S. I.Bozhevolnyi, M. U. Gonzalez, J. C. Weeber, and A.
Dereux, Nature Physics 3, 324 (2007).
	introduction
	experimental results and modeling
	conclusion
ABSTRACT
  Influence of hole shape on extraordinary optical transmission was
investigated using hole arrays consisting of rectangular holes with different
aspect ratio. It was found that the transmission could be tuned continuously by
rotating the hole array. Further more, a phase was generated in this process,
and linear polarization states could be changed to elliptical polarization
states. This phase was correlated with the aspect ratio of the holes. An
intuitional model was presented to explain these results.

<|endoftext|><|startoftext|>
Introduction
Cooling and trapping alkaline-earth atoms offer interest-
ing alternatives to alkaline atoms. Indeed, the singlet-
triplet forbidden lines can be used for optical frequency
measurement and related subjects [1]. Moreover, the spin-
less ground state of the most abundant bosonic isotopes
can lead to simpler or at least different cold collisions prob-
lems than with alkaline atoms [2]. Considering fermionic
isotopes, the long-living and isolated nuclear spin can be
controlled by optical means [3] and has been proposed to
implement quantum logic gates [4]. It has also been shown
that the ultimate performance of Doppler cooling can be
greatly improved by using narrow transitions whose pho-
ton recoil frequency shifts ωr are larger than their natural
widths Γ [5]. This is the case for the 1S0 →3 P1 spin-
forbidden line of Magnesium (ωr ≈ 1100Γ ) or Calcium
(ωr ≈ 36Γ ). Unfortunately, both atomic species can not
be hold in a standard magneto-optical trap (MOT) be-
cause the radiation pressure force is not strong enough
to overcome gravity. This imposes the use of an extra
quenching laser as demonstrated for Ca [6]. For Stron-
tium, the natural width of the intercombination transition
(Γ = 2π×7.5 kHz) is slightly broader than the recoil shift
(ωr = 2π×4.7 kHz). The radiation pressure force is higher
than the gravity but at the same time the final tempera-
ture is still in the µK range [7,8]. In parallel, the narrow
transition partially prevents multiple scattering processes
and the related atomic repulsive force [10]. Hence impor-
tant improvements on the spatial density have been re-
ported [7]. However, despite experimental efforts, such as
adding an extra confining optical potential, pure optical
methods have not allowed yet to reach the quantum de-
generacy regime with Strontium atoms [9].
In this paper, we will discuss some performances, es-
sentially in terms of temperatures, sizes and loading rates,
of a Strontium 88 MOT using the 689 nm 1S0 →3P1 in-
tercombination line.
Initially the atoms are precooled in a MOT on the
spin-allowed 461 nm 1S0 →1P1 transition (natural width
Γ = 2π × 32MHz) as discussed in [11]. Then the atoms
are transferred into the 689 nm intercombination MOT.
To achieve a high loading rate, Katori et al. [7] have used
laser spectrum, broadened by frequency modulation. Thus
the velocity capture range of the 689 nm MOT matches
the typical velocity in the 461 nm MOT. They report a
transfer efficiency of 30%. The same value of transfer effi-
ciency is also reported in reference [8]. In our set-up, 50%
of the atoms initially in the blue MOT are transferred into
the red one. In section 3 we present a systematic study
of the transfer efficiency as function of the parameters of
the frequency modulation. In order to discuss the intrin-
sic limitations of the loading efficiency, we compare our
experimental results to a simple model. In particular, we
demonstrated that our transfer efficiency is limited by the
size of the red MOT beams. We show that it could be op-
timized up to 90% with realistic laser power (25mW per
beams).
The minimum temperature achieved in the broadband
MOT is about 2.5µK. In order to reduce the tempera-
ture down to the photon recoil limit (0.5µK), we apply
a second cooling stage, using a single frequency laser and
observe similar temperatures, detuning and intensity de-
pendencies as reported in the literature (see references [7],
http://arxiv.org/abs/0704.0855v2
2 T. Chanelière, L. He, R. Kaiser and D. Wilkowski: Three dimensional cooling and trapping with a narrow line
[8], [12] and [13]). In those publications, the role of gravity
on the cooling and trapping dynamics along the z vertical
direction has been discussed. In this paper we compare
the steady state behaviour along vertical (z) direction to
that along the horizontal plane (x−y) where gravity plays
indirectly a crucial role (section 4).
Details about the dynamics are given in references
[8],[12]. In particular the authors establish three regimes.
In regime (I) the laser detuning |δ| is larger than the
power-broadened linewidth ΓE = Γ
(1 + s). Regime (II)
on the contrary corresponds to ΓE > |δ|. In both regimes
(I) and (II) ΓE ≫ Γ, ωr and the semiclassical limit is a
good approximation. In regime (III) the saturation pa-
rameter is small and a full quantum treatment is required.
We will focus here on the semiclassical regime (I). In this
regime, we confirm that the temperature along the z di-
rection is independent of the detuning δ. Following Loftus
et al. [12], we have also found (see section 4.1) that this
behavior is due to the balance of the gravitational force
and the radiation pressure force produced by the upward
pointing laser (the gravity defining the downward direc-
tion). The center of mass of the atomic cloud is shifted
downward from the magnetic field quadrupole center. As a
consequence, cooling and trapping in the horizontal plane
occur at a strong bias magnetic field mostly perpendicu-
lar to the cooling plane. This unusual situation is studied
in detail (section 4.2). Despite different friction and dif-
fusion coefficients along the horizontal and the vertical
directions, the horizontal temperature is found to be the
same as the vertical one (see section 4.3). In reference [12],
the trapping potential is predicted to have a box shape
whose walls are given by the laser detuning. This is in-
deed the case without a bias magnetic field along the z
axis. It is actually different for the regime (I) described in
this paper. Here we have found that the trapping poten-
tial remain harmonic. This leads to a cloud width in the
horizontal direction which is proportional to
|δ| (section
4.2).
2 Experimental set-up
Our blue MOT setup (on the broad 1S0 →1 P1 transi-
tion at 461 nm) is described in references [14,15]. Briefly,
it is composed by six independent laser beams typically
10mW/cm2 each. The magnetic field gradient is about
70G/cm. The blue MOT is loaded from an atomic beam
extracted from an oven at 550 ◦C and longitudinally slowed
down by a Zeeman slower. The loading rate of our blue
MOT is of 109 atoms/s and we trap about 2.106 in a
0.6mm rms radius cloud when no repumping lasers are
used [16]. To optimize the transfer into the red MOT, the
temperature of the blue MOT should be as small as possi-
ble. As previously observed [11], this temperature depends
strongly on the optical field intensity. We therefore de-
crease the intensity by a factor 5 (see figure 1) 4ms before
switching off the blue MOT. The rms velocity right before
the transfer stage is thus reduced down to σb = 0.6m/s
whereas the rms size remains unchanged. Similar two stage
cooling in a blue MOT is also reported in reference [13].
The 689 nm laser source is an anti-reflection coated
laser diode in a 10 cm long extended cavity, closed by a
diffraction grating. It is locked to an ULE cavity using the
Pound-Drever-Hall technique [17]. The unity gain of the
servo loop is obtained at a frequency of 1MHz. From the
noise spectrum of the error signal, we derive a frequency
noise power. It shows, in the range of interest, namely
1Hz − 100 kHz, an upper limit of 160 Hz2/Hz which is
low enough for our purpose. The transmitted light from
the ULE cavity is injected into a 20mW slave laser diode.
Then the noise components at frequencies higher than the
ULE cavity cut-off (300 kHz) are filtered. It is important
to note that the lateral bands used for the lock-in are
also removed. Those lateral bands, at 20MHz from the
carrier, are generated modulating directly the current of
the master laser diode. A saturated spectroscopy set-up on
the 1S0 →3P1 intercombination line is used to compensate
the long term drift of 10−50Hz/s mainly due to the daily
temperature change of the ULE cavity.
The slave beam is sent through an acousto-optical mod-
ulator mounted in a double pass configuration. The laser
detuning can then be tuned within the range of a few
hundreds of linewidth around the resonance. This acousto-
optical modulator is also used for frequency modulation
(FM) of the laser, as required during the loading phase
(see section 3).
The red MOT is made of three retroreflected beams
with a waist of 0.7 cm. The maximum intensity per beam
is about 4mW/cm2 (the saturation intensity being Is =
3µW/cm2). The magnetic gradient used for the red MOT
is varied from 1 to 10G/cm.
To probe the cloud (number of atoms and tempera-
ture) we use a resonant 40µs pulse of blue light (see fig
1). The total emitted fluorescence is collected onto an
avalanche detector. From this measurement, we deduce
the number of atoms and then evaluate the transfer rate
into the red MOT. At the same time, an image of the cloud
is taken with an intensified CCD camera. The typical spa-
tial resolution of the camera is 30µm. Varying the dark
period (time-of-flight) between the red MOT phase and
the probe, we get the ballistic expansion of the cloud. We
then derive the velocity rms value and the corresponding
temperature.
3 Broadband loading of the red MOT
The loading efficiency of a MOT depends strongly on the
width of the transition. With a broad transition, the max-
imum radiation pressure force is typically am =
104 × g, where vr is the recoil velocity [18]. Hence, on
l ≈ 1 cm (usual MOT beam waist) an atom with a veloc-
ity vc =
2aml ≈ 30m/s can be slowed down to zero and
then be captured. During the deceleration, the atom re-
mains always close to resonance because the Doppler shift
is comparable to the linewidth. Thus MOTs can be di-
rectly loaded from a thermal vapor or a slow atomic beam
using single frequency lasers. Moreover typical magnetic
field gradients of few tens of G/cm usually do not dras-
T. Chanelière, L. He, R. Kaiser and D. Wilkowski: Three dimensional cooling and trapping with a narrow line 3
tically change the loading because the Zeeman shift over
the trapping region is also comparable to the linewidth.
An efficient loading is more complex to achieved with
a narrow transition. For Strontium, the maximum radia-
tion pressure force of a single laser is only am ≈ 15 × g.
Assuming the force is maximum during all the capture
process, one gets vc =
2aml ≈ 1.7m/s. Hence, precool-
ing in the blue MOT is almost mandatory. In that case the
initial Doppler shift will be vcλ
−1 ≈ 2.5MHz, 300 times
larger than the linewidth. In order to keep the laser on
resonance during the capture phase, the red MOT lasers
must thus be spectrally broadened. Because of the low
value of the saturation intensity, the spectral power den-
sity can easily be kept large enough to maintain a maxi-
mum force with a reasonable total power (few milliwatts).
The magnetic field gradient of the MOT may also affect
the velocity capture range. To illustrate this point, let us
consider an atom initially in the blue MOT at the center
of the trap with a velocity vc = 1.7m/s. During the decel-
eration, the Doppler shift decreases whereas the Zeeman
shift increases. However, the magnetic field gradient does
not affect the capture velocity as far as the total shift
(Doppler+Zeeman) is still decreasing. This condition is
fulfilled if the magnetic field gradient is lower than [19]:
λgeµbvc
≈ 0.6G/cm (1)
where ge = 1.5 is the Landé factor of the
3P1 level and
µb = 1.4MHz/G is the Bohr magneton. In practice we use
a magnetic field gradient which is larger than bc. In that
case, it is necessary to increase the width of the laser spec-
trum so that the optimum transfer rate is not limited by
the Zeeman shift (see section 3.2). An alternative solution
may consist of ramping the magnetic field gradient during
the loading [7].
3.1 Transfer rate: experimental results
In this section we will present the experimental results re-
garding the loading efficiency of the red MOT from the
blue MOT. To optimize the transfer rate, the laser spec-
trum is broadened using frequency modulation (FM). Thus
the instantaneous laser detuning is ∆(t) = δ+∆ν. sin νmt.
∆ν and νm are the frequency deviation and modulation
frequency respectively, δ is the carrier detuning. Here, the
modulation index ∆ν/νm is always larger than 1, thus the
so-called wideband limit is well fulfilled. Hence one can
assume the FM spectrum to be mainly enclosed in the
interval [δ −∆ν; δ +∆ν].
As shown in figure 2, the transfer rate increases with
νm up to 15 kHz where we observe a plateau at 45% trans-
fer efficiency. On the one hand when νm is larger than
the linewidth, the atoms are in the non-adiabatic regime
where they interact with all the Fourier components of the
laser spectrum. Moreover, the typical intensity per Fourier
component remains always higher than the saturation in-
tensity Is = 3µW/cm
2. As a consequence, the radiation
pressure force should be close to its maximum value for
any atomic velocity. On the other hand when νm < Γ/2π,
the atoms interact with a chirped intense laser where the
mean radiation pressure force (over a period 2π/νm) is
clearly smaller than in the case νm > Γ/2π. As a conse-
quence, the transfer rate is reduced when νm decreases.
In figure 3, the transfer rate is measured as a func-
tion of ∆ν. The carrier detuning is δ = −1MHz and the
modulation frequency is kept larger than the linewidth
(νm = 25 kHz). Starting from no deviation (∆ν = 0), we
observe (fig. 3) an increase of the transfer rate with ∆ν (in
the range 0 < ∆ν < 500 kHz). After reaching its maximum
value, the transfer rate does not depend on ∆ν anymore.
Thus the capturing process is not limited by the laser
spectrum anymore. If we further increase the frequency
deviation ∆ν, the transfer becomes less efficient and fi-
nally decreases again down to zero. This reduction occurs
as soon as ∆ν > |δ|, i.e. some components of the spectrum
are blue detuned. This frequency configuration obviously
should affect the MOT steady regime adding extra heat-
ing at zero velocity (see section 3.3). We can see that it
is also affecting the transfer rate. To confirm that point,
figure 4 shows the same experiment but with a larger de-
tuning δ = −1.5MHz and δ = −2MHz for the figures
4a and 4b respectively. Again the transfer rate decreases
as soon as ∆ν > |δ|. The transfer rate is also very small
on the other side for small values of ∆ν. In that case the
entire spectrum of the laser is too far red detuned. The
radiation pressure forces are significant only for velocities
larger than the capture velocity and no steady state is
expected. Keeping now the deviation fixed and varying
the detuning as shown in figure 5, we observe a maximum
transfer rate when the detuning is close to the deviation
frequency ∆ν ≃ |δ|. Closer to resonance (∆ν < |δ|), the
blue detuned components prevent an efficient loading of
the MOT.
The magnetic field gradient plays also a crucial role for
the loading. We indeed observe (fig. 6) that the transfer
rate decreases when the magnetic field gradient increases.
At very low magnetic field (b < 1G/cm) the reduction
of the transfer rate is most likely due to a lack of stabil-
ity within the trapping region. In that case we actually
observe a strong displacement of the center of mass of
the cloud. This is induced by imperfections of the set-up
such as non-balanced laser intensities which are critical
at low magnetic gradient. Hence, the optimum magnetic
field gradient is found to be the smallest one which ensure
the stability of the cloud in the MOT.
3.2 Theoretical model and comparison with the
experiments
To clearly understand the limiting processes of the transfer
rate, we compare the experimental data to a simple 1D
theoretical model based on the following assumptions:
- An atom undergoes a radiation pressure force and thus a
deceleration if the modulus of its velocity is between vmax
and vmin with
vmax = λ(|δ|+∆ν), vmin = max{λ(|δ|−∆ν);λ(−|δ|+∆ν)}
4 T. Chanelière, L. He, R. Kaiser and D. Wilkowski: Three dimensional cooling and trapping with a narrow line
am = 0 elsewhere. We simply write that the Doppler shift
is contained within the FM spectrum. We add the condi-
tion vmin = λ(−|δ|+∆ν) when some components are blue
detuned ∆ν > |δ|. In this case, we consider the simple
ideal situation where the two counter-propagating lasers
are assumed perfectly balanced and then compensate each
other in the spectral overlapping region.
- Even in the semiclassical model, it is difficult to calculate
the acceleration as a function of the velocity for a FM spec-
trum. However for all the data presented here, the satura-
tion parameter is larger than one. Hence the deceleration
is set to a constant value − 1
am when vmin < |v| < vmax.
The prefactor 1/3 takes into account the saturation by the
3 counter-propagating laser beam pairs.
- The magnetic field gradient is included by giving a spa-
tial dependence of the detuning δ in the expression (2).
- An atom will be trapped if its velocity changes of sign
within a distance shorter that the beam waist.
In figures 3-6 the results of the model are compared to
the experimental data. The agreement between the model
and the experimental data is correct except at large fre-
quency deviation (figures 3 and 4) or at low detuning (fig-
ure 5). In those cases the spectrum has some blue detuned
components. As mentionned before, this is a complex sit-
uation where the assumptions of the simple model do not
hold anymore. Fortunately those cases do not have any
practical interest because they do not correspond to the
optimum transfer efficiency.
At the optimum, the model suggests that the transfer
is limited by the beam waist (see caption of figures (3-6)).
Moreover for all the situation explored in figures 3-5, the
magnetic field gradient is strong enough (b = 1G/cm) to
have an impact on the capture process, as suggested by
the inequality (1). However it is not the transfer limiting
factor because the Zeeman shift is easily compensated by
a larger frequency excursion or by a larger detuning.
Increasing the beam waist would definitely improve the
transfer efficiency as showed in figure 7. If the saturation
parameter would remain large for all values of beam waist,
more than 90% of the atoms would be transferred for a
2 cm beam waist. 25mW of power per beam should be
sufficient to achieve this goal. In our experimental set-up,
the power is limited to 3mW per beam. So the satura-
tion parameter is necessarily reduced once the waist is
increased. To take this into account and get a more realis-
tic estimation of the efficiency for larger beams, we replace
the previous acceleration by the expression ams/(1 + 3s),
with s = I/Is the saturation parameter per beam. In this
case, the transfer efficiency becomes maximum at 70% for
a beam waist of 1.5 cm.
3.3 Temperature
Cooling with a broadband FM spectrum on the intercom-
biaison line decreases the temperature by three orders of
magnitude in comparison with the blue MOT: from 3mK
(σb = 0.6m/s) to 2.5µK (see figure 8). For small detuning,
the temperature is strongly increasing when the spectrum
has some blue detuned components (∆ν > |δ|). Indeed the
cooling force and heating rate are strongly modified at the
vicinity of zero detuning. This effect is illustrated in figure
8. On the other side at large detuning (δ < −1.5MHz), the
temperature becomes constant. This regime corresponds
to a detuning independent steady state, as also observed
in single frequency cooling (see ref. [12] and section 4).
4 Single frequency cooling
About half of the atoms initially in the 461 nm MOT are
recaptured in the red one using a broadband laser. The
final temperature is 2.5µK i.e. 5 times larger than the
photon recoil temperature Tr = 460 nK. To further de-
crease the temperature one has to switch to single fre-
quency cooling (for time sequences: see figure 1). As we
will see in this section, the minimum temperature is now
about 600 nK close to the expected 0.8Tr in an 1D mo-
lasses [5]. Moreover, one has to note that, under proper
conditions described in reference [12], the transfer between
the broadband and the single frequency red MOT can be
almost lossless.
In the steady state regime of the single frequency red
MOT, one has kσv ≈ ωr ≈ Γ . Thus, there is no net sepa-
ration of different time scales as in MOTs operated with a
broad transition where ωr << kσv << Γ . However, here
the saturation parameter s always remains high. It cor-
responds to the so-called regimes (I) and (II) presented
in reference [12]. Thus ωr << Γ
1 + s and the semiclas-
sical Doppler theory describes properly the encountered
experimental situations.
To insure an efficient trapping, the parameter’s val-
ues of the single frequency red MOT are different from
a usual broad transition MOT: the magnetic field gradi-
ent is higher, typically 1000Γ/cm. Moreover the gravity
is not negligible anymore by comparison with the typical
radiation pressure. Those features lead to an unusual be-
havior of the red MOT as we will explain in this section.
We will first independently analyze the MOT properties
along the vertical dimension (section 4.1) then in the hor-
izontal plane (section 4.2), to finally compare those two
situations (section 4.3).
4.1 Vertical direction
In the regime (I) i.e. at large negative detuning and high
saturation (see examples on figure 9a) the temperature is
indeed constant. As explained in reference [12], this be-
havior is due to the balance between the gravity and the
radiation pressure force of the upward laser. At large neg-
ative detuning, the downward laser is too far detuned to
give a significant contribution. In the semiclassical regime,
an atom undergoes a net force of
Fz = h̄k
1 + sT + 4(δ − geµBbz − kvz)2/Γ 2
−mg (3)
Considering the velocity dependence of the force, the first
order term is:
T. Chanelière, L. He, R. Kaiser and D. Wilkowski: Three dimensional cooling and trapping with a narrow line 5
Fz ≈ −γzvz (4)
γz = −4
h̄k2δeff
(1 + sT + 4δ
/Γ 2)2
where the effective detuning δeff = δ − geµBb < z > is
define such as
1 + sT + 4δ
= mg (6)
sT is the total saturation parameter including all the beams.
< z > is the mean vertical position of the cold cloud.
Hence δeff is independent of the laser detuning δ and the
vertical temperature at larger detuning depends only on
the intensity as shown in figures 9a and 9b.
The spatial properties of the cloud are also related to
the effective detuning δeff which is independent of δ. The
mean vertical position depends linearly on the detuning,
so that one has :
d < z >
geµBb
The predicted vertical displacement is compared to the
experimental data in figure 10a. The agreement is excel-
lent (the only adjustable parameter is the unknown origin
of the vertical axe). Because the radiation pressure force
for an atom at rest does not depend on the laser detuning
δ, the vertical rms size should be also δ-independent. This
point is also verified experimentally (see figure 10b).
4.2 x− y horizontal plane
Let us now study the behavior of the cold cloud in the x−y
plane at large laser detuning. As explained in section 4.1,
the position of the cloud is vertically shifted downward
with respect to the center of the magnetic field quadrupole
(see figure 11). The dynamic in the x−y plane occurs thus
in the presence of a high bias magnetic field. To derive
the expression of the semiclassical force in this unusual
situation one has first to project the circular polarizations
states of the horizontal lasers on the eigenstates. We define
the quantification axis along the magnetic field, one gets:
e+x =
1 + sinα
cosα√
1− sinα
e−x =
1− sinα
cosα√
1+ sinα
where e−i , πi and e
i represent respectively the left-handed,
linear and right-handed polarisations along the i axis. The
angle α between the vertical axis and the local magnetic
field is shown on figure 11. For large detuning, α is al-
ways small (α ≪ 1 ) and we write α ≈ −x/ < z >
considering only the dynamics along the x dimension. For
simplicity the magnetic field gradient b is considered as
spatially isotropic with b > 0 as sketched on figure 11b.
The expression of the radiation pressure force is then:
Fx = h̄k
×(10)
s(1− sinα)2/4
1 + sT + 4(δ − geµBb < z > (1− tanα)− kvx)2/Γ 2
s(1 + sinα)2/4
1 + sT + 4(δ − geµBb < z > (1− tanα) + kvx)2/Γ 2
Note that this expression is not restricted to the small α
values. We expect six terms in the expression (11): three
terms for each laser corresponding to the three e−
and e+
polarisation eigenstates. However only two terms,
corresponding to the e+
state, are close to resonance and
thus have a dominant contribution. As for the vertical
dimension, the off resonant terms are removed from the
expression (11). One has also to note that the effective
detuning δeff = δ − geµBb < z > is actually the same as
the one along the vertical dimension.
The first order expansion of (11) in α and kvx/Γ gives
the expression of the horizontal radiation pressure force:
Fx ≈ −καα− γxvx = −κxx− γxvx (11)
κα = − < z > κx = h̄k
1 + sT + 4δ
= mg (12)
= −2 h̄k
2δeff
(1 + sT + 4δ
/Γ 2)2
As for the vertical dimension (equation (6)), the force
depends on δeff but at the position of the MOT does not
depend on the laser detuning δ. Hence, at large detuning,
the horizontal temperature depends only on the intensity
as observed in figures 9a and 9b.
To understand the trapping mechanisms in the x − y
plane, we now consider an atom at rest located at a po-
sition x 6= 0 (corresponding to α 6= 0), i.e. not in the
center of the MOT. The transition rate of two counter-
propagating laser beam is not balanced anymore. This is
due to the opposite sign in the α dependency of the prefac-
tor in expression (11). This mechanism leads to a restoring
force in the x−y plane at the origin of the spatial confine-
ment (equation 11). Applying the equipartition theorem
one gets the horizontal rms size of the cloud:
x2rms =
= −< z > kBT
Without any free adjusting parameter, the agreement with
experimental data is very good as shown in figure 10b.
On the other hand there’s no displacement of the center
of mass in the x − y plane whatever is the detuning δ as
long as the equilibrium of the counter-propagating beams
intensities is preserved (figure 10a).
6 T. Chanelière, L. He, R. Kaiser and D. Wilkowski: Three dimensional cooling and trapping with a narrow line
4.3 Comparing the temperatures along horizontal and
vertical axes
As seen in sections 4.1 and 4.2, gravity has a dominant
impact on cooling in a MOT operated on the intercom-
bination line not only along the vertical axe but also in
the horizontal plane. Even so we expect different behav-
iors along this directions essentially because the gravity
renders the trapping potential anisotropic. This is indeed
the case for the spatial distribution (figures 10a and 10b)
whereas the temperatures are surprisingly the same (fig-
ures 9a and 9b). We will now give few simple arguments
to physically explain this last point.
In the semiclassical approximation, the temperature is
defined as the ratio between the friction and the diffusion
term:
kBTi =
Dabsi +D
with i = x, y, z (15)
Dabs and Dspo correspond to the diffusion coefficients in-
duced by absorption and spontaneous emission events re-
spectively. The friction coefficients has been already de-
rived (equation 13):
γz = 2γx,y (16)
Indeed cooling along an axe in the x − y plane results in
the action of two counter-propagating beams four times
less coupled than the single upward laser beam. The same
argument holds for the absorption term of the diffusion
coefficient:
Dabsz = 2D
x,y (17)
The spontaneous emission contribution in the diffusion co-
efficient can be derived from the differential cross-section
dσ/dΩ of the emitting dipole [20]. With a strong biased
magnetic field along the vertical direction, this calcula-
tion is particularly simple as e+z is the only quasi resonant
state. Hence
dσ/dΩ ∝ (1 + cosφ2) (18)
φ is the angle between the vertical axe and the direction of
observation. After a straightforward integration, one finds
a contribution again two times larger along the vertical
Dspoz = 2D
x,y (19)
From those considerations, the temperature is expected
to be isotropic as observed experimentally (see figures 9a
and 9b).
In the so-called regime (I), the minimum temperature
is given by the semiclassical Doppler theory:
T = NR
s (20)
Where NR is a numerical factor which should be close
to two [12]. This solution is represented in figure 9 by
a dashed line nicely matching the experimental data for
s > 8 but with NR = 1.2. Similar results, i.e. with unex-
pected low NR values, have been found in [12]. For s ≤ 8
we observed a plateau in the final temperature slightly
higher than the low saturation theoretical prediction [5].
We cannot explain why the temperature does not decrease
further down as reported in [12]. For quantitative compar-
ison with the theory, more detailed studies in a horizontal
1D molasses are required.
4.4 Conclusions
Cooling of Strontium atoms using the intercombination
line is an efficient technique to reach the recoil temper-
ature in three dimensions by optical methods. Unfortu-
nately loading from a thermal beam cannot be done di-
rectly with a single frequency laser because of the nar-
row velocity capture range. We have shown experimentally
that more than 50% of the atoms initially in a blue MOT
on the dipole-allowed transition are recaptured in the red
MOT using a frequency-broadened spectrum. Using a sim-
ple model, we conclude that the transfer is limited by the
size of the laser beam. If the total power of the beams
at 689 nm was higher, transfer rates up to 90% could be
expected by tripling our laser beam size. The final tem-
perature in the broadband regime is found to be as low
as 2.5µK, i.e. only 5 times larger than the photon recoil
temperature. The gain in temperature by comparison to
the blue MOT (1−10mK) is appreciable. So in absence of
strong requirements on the temperature, broadband cool-
ing is very efficient and reasonably fast (less than 100ms).
The requirements for the frequency noise of the laser are
also much less stringent than for single frequency cooling.
Using a subsequent single frequency cooling stage, it
is possible to reduce the temperature down to 600 nK,
slightly above the photon recoil temperature. Analyzing
the large detuning regime, we particularly focus our stud-
ies on the comparison between vertical and horizontal di-
rections. We show how gravity indirectly influences the
horizontal parameters of the steady state MOT and find
that the trapping potential remains harmonic along all
directions, but with an anisotropy.
Gravity has a major impact on the MOT as it coun-
terbalances the laser pressure of the upward laser (making
the steady state independent of the detuning). We show
that gravity thus affects the final temperature, which re-
mains isotropic, despite different cooling dynamics along
the vertical and horizontal directions.
5 Acknowledgments
The authors wish to thank J.-C. Bernard and J.-C. Bery
for valuable technical assistances. This research is finan-
cially supported by the CNRS (Centre National de la
Recherche Scientifique) and the former BNM (Bureau Na-
tional de Métrologie) actually LNE (Laboratoire national
de métrologie et d’essais) contract N◦ 03 3 005.
T. Chanelière, L. He, R. Kaiser and D. Wilkowski: Three dimensional cooling and trapping with a narrow line 7
References
1. F. Ruschewitz, J. L. Peng, H. Hinderthr, N. Schaffrath, K.
Sengstock, and W. Ertmer, Phys. Rev. Lett. 80, 3173 (1998);
G. Ferrari, P. Cancio, R. Drullinger, G. Giusfredi, N. Poli, M.
Prevedelli, C. Toninelli, and G. M. Tino Phys. Rev. Lett. 91,
243002 (2003); M. Yasuda and H. Katori Phys. Rev. Lett. 92,
153004 (2004); T. Ido, T. H. Loftus, M. M. Boyd, A. D. Lud-
low, K. W. Holman, and J. Ye Phys. Rev. Lett. 94, 153001
(2005); R. Le Targat, X. Baillard, M. Fouch, A. Brusch, O.
Tcherbakoff, G. D. Rovera, and P. Lemonde Phys. Rev. Lett.
97, 130801 (2006).
2. J. Weiner, V. Bagnato, S. Zilio, and P. S. Julienne, Rev.
Mod. Phys. 71, 1 (1999); T. Dinneen, K. R. Vogel, E.
Arimondo, J. L. Hall, and A. Gallagher, Phys. Rev. A
59, 1216 (1999). A.R.L.Caires, G.D.Telles, M.W.Mancini,
L.G.Marcassa, V.S.Bagnato, D.Wilkowski, R. Kaiser, Bra.
J. Phys. 34, 1504 (2004).
3. M. M. Boyd, T. Zelevinsky, A. D. Ludlow, S.M. Forman, T.
Ido, and J. Ye Science 314, 1430 (2006).
4. D. Hayes, P. Julienne, I. Deutsch, Arxiv, quant-ph/0609111.
5. Y. Castin, H. Wallis, and J. Dalibard, J. Opt. Soc. Am. B.
6, 2046 (1989).
6. T. Binnewies, G. Wilpers, U. Sterr, F. Riehle, and J. Helm-
cke, T. E. Mehlstubler, E. M. Rasel, and W. Ertmer, Phys.
Rev. Lett. 87, 123002 (2001).
7. H. Katori, T. Ido, Y. Isoya, and M. Kuwata-Gonokami,
Phys. Rev. Lett. 82, 1116 (1999)
8. T. H. Loftus, T. Ido, A. D. Ludlow, M. M. Boyd, and J. Ye,
Phys. Rev. Lett. 93, 073003 (2004).
9. T. Ido, Y. Isoya, and H. Katori, Phys. Rev. A 61, 061403
(2000).
10. D. W. Sesko, T. G. Walker and C. E. Wieman, J. Opt.
Soc. Am. B 8, 946 (1991).
11. T. Chanelière, J.-L. Meunier, R. Kaiser, C. Miniatura, and
D. Wilkowski. J. Opt. Soc. Am. B, 22, 1819 (2005).
12. T. H. Loftus, T. Ido, M. M. Boyd, A. D. Ludlow, and J.
Ye, Phys. Rev. A 70, 063413 (2004).
13. K. R. Vogel, Ph. D. Thesis, University of Colorado, Boul-
der, CO 80309, (1999).
14. Y. Bidel, B. Klappauf, J.C. Bernard, D. Delande, G.
Labeyrie, C. Miniatura, D. Wilkowski, R. Kaiser, Phys. Rev.
Lett. 88, 203902 (2002).
15. B. Klappauf, Y. Bidel, D. Wilkowski, T. Chanelière, R.
Kaiser, Appl.Opt. 43, 2510 (2004).
16. D. Wilkowski, Y. Bidel, T. Chanelière, R. Kaiser, B. Klap-
pauf, C. Miniatura, SPIE Proceeding 5866, 298 (2005).
17. N. Poli, G. Ferrari, M. Prevedelli, F. Sorrentino, R. E.
Drullinger, and G. M. Tino, Spectro. Acta Part A 63, 981
(2006).
18. H.J. Metcalf, P. van der Straten, Laser cooling and trap-
ping, Springer, (1999).
19. C. Dedman, J. Nes, T. Hanna, R. Dall, K. Baldwin, and
A. Truscott, Rev. Mod. Phys., 75, 5136 (2004).
20. J.D. Jackson, Classical Electrodynamics (J. Wiley and
sons, third edition New York, 1999).
Blue MOT Laser
Red MOT Laser
Red MOT Laser
Magnetic field gradient
70 G/cm
1-10 G/cm
Du~k vbD
70 ms40 ms
80 ms
Fig. 1. Time sequence and cooling stages of Strontium with
the dipole-allowed transition and with the intercombination
line.
0 5 10 15 20 25 30 35 40 45
Modulation frequency (kHz)
Fig. 2. Transfer rate as a function of the modulation frequency.
The other parameters are fixed: P = 3mW, δ = −1000 kHz,
b = 1G/cm and ∆ν = 1000 kHz
http://arxiv.org/abs/quant-ph/0609111
8 T. Chanelière, L. He, R. Kaiser and D. Wilkowski: Three dimensional cooling and trapping with a narrow line
0 500 1000 1500 2000 2500 3000
Frequency deviation (kHz)
Fig. 3. Transfer rate as a function of the frequency deviation
(squares). The other parameters are fixed: P = 3mW, δ =
−1000 kHz, b = 1G/cm and νm = 25 kHz. The dash and solid
line correspond to a simple model prediction (see text). The
transfer rate is limited by the frequency deviation of the broad
laser spectrum for the dash line and by the waist of the MOT
beam for the solid line.
0 500 1000 1500 2000 2500 3000
Frequency deviation (kHz)
Frequency deviation (kHz)
0 500 1000 1500 2000 2500 3000
(a)                                                     (b)
Fig. 4. Transfer rate as a function of the frequency deviation
(squares). δ = −1500 kHz and δ = −2000 kHz for (a) and (b)
respectively, the other parameters and the definitions are the
same than for figure 3.
0 500 1000 1500 2000 2500
Detuning (kHz)
Fig. 5. Transfer rate as a function of the detuning (squares).
The other parameters are fixed: P = 3mW, ∆ν = 1000 kHz,
b = 1G/cm and νm = 25 kHz. The dashed and solid lines have
the same signification than in figure 3.
0 1 2 3 4 5 6 7 8 9 10
b (G/cm)
Fig. 6. Transfer rate as a function of the magnetic gradient
(squares). The other parameters are fixed: P = 3mW, δ =
−1000 kHz, ∆ν = 1000 kHz and νm = 25 kHz. The transfer
rate is limited by the waist of the MOT beam for all values.
The dotted lines represent the case where the magnetic field
gradient do not affect the deceleration.
0 1 2 3 4 5
Beam waist (cm)
Fig. 7. Transfer rate as a function of the beam waist. The solid
lines correspond to a high saturation parameter where as the
dash line correspond to a constant power of P = 3mW. The
other parameters are fixed: δ = −1000 kHz, ∆ν = 1000 kHz
and b = 0.1G/cm.
-2000 -1500 -1000 -500
δ (kHz)
Fig. 8. Measured temperature as a function of the detuning for
a FM spectrum. The other parameters are fixed: P = 3mW,
b = 1G/cm, ∆ν = 1000 kHz and νm = 25 kHz
T. Chanelière, L. He, R. Kaiser and D. Wilkowski: Three dimensional cooling and trapping with a narrow line 9
10 100 1000
-80 -60 -40 -20 0
Detuning (kHz)
Fig. 9. Measured temperature as a function of the detuning
(a) with I = 4Is or I = 15Is and as a function of the intensity
(b) with δ = −100 kHz of single frequency cooling. The circles
(respectively stars) correspond to temperature along one of
the horizontal (respectively vertical) axis. The magnetic field
gradient is b = 2.5G/cm
-700 -600 -500 -400 -300 -200 -100 0
Detuning (kHz)
-700 -600 -500 -400 -300 -200 -100 0
Detuning (kHz)
Fig. 10. Displacement (a) and rms radius (b) of the cold cloud
in single frequency cooling along the z axis (star) and in the
x−y plane (circle). The intensity per beam is I = 20Is and the
magnetic gradient b = 2.5G/cm along the strong axis in the x−
y plane. The linear displacement prediction correspond to the
plain line (graph a). In graph b, the plain curve correspond to
the rms radius prediction based on the equipartition theorem.
10 T. Chanelière, L. He, R. Kaiser and D. Wilkowski: Three dimensional cooling and trapping with a narrow line
d=-1000kHzd=-100kHz
Cloud
Quantization axe
Fig. 11. (a) Images of the cold cloud in the red MOT. The
cloud position for δ = −100 kHz coincides roughly with the
center of the MOT whereas it is shifted downward for δ =
−1000 kHz. The spatial position of the resonance correspond
dot circle. (b) Sketch representing the large detuning case. The
coupling efficiency of the MOT lasers is encoded in the size of
the empty arrow. The laser form below has maximum efficiency
whereas the one pointing downward is absent because is too
detuned. Along a horizontal axe, the lasers are less coupled
because they do not have the correct polarization. The α angle
is the angular position of an atom M with respect to O, the
center of the MOT.
	Introduction
	Experimental set-up
	Broadband loading of the red MOT
	Single frequency cooling
	Acknowledgments
ABSTRACT
  The intercombination line of Strontium at 689nm is successfully used in laser
cooling to reach the photon recoil limit with Doppler cooling in a
magneto-optical traps (MOT). In this paper we present a systematic study of the
loading efficiency of such a MOT. Comparing the experimental results to a
simple model allows us to discuss the actual limitation of our apparatus. We
also study in detail the final MOT regime emphasizing the role of gravity on
the position, size and temperature along the vertical and horizontal
directions. At large laser detuning, one finds an unusual situation where
cooling and trapping occur in the presence of a high bias magnetic field.

<|endoftext|><|startoftext|>
Approximate Selection Rule for Orbital Angular Momentum
in Atomic Radiative Transitions
I.B. Khriplovich and D.V. Matvienko
Budker Institute of Nuclear Physics,
630090 Novosibirsk, Russia,
and Novosibirsk University
Abstract
We demonstrate that radiative transitions with ∆l = −1 are strongly dominating
for all values of n and l, except small region where l ≪ n.
It is well-known that the selection rule for the orbital angular momentum l in electro-
magnetic dipole transitions, dominating in atoms, is ∆l = ±1, i. e. in these transitions
the angular momentum can both increase and decrease by unity. Meanwhile, the classical
radiation of a charge in the Coulomb field is always accompanied by the loss of angular
momentum. Thus, at least in the semiclassical limit, the probability of dipole transitions
with ∆ l = − 1 is higher. Here we discuss the question how strongly and under what
exactly conditions the transitions with ∆l = −1 dominate in atoms. (To simplify the
presentation, we mean always, here and below, the radiation of a photon, i. e. transitions
with ∆n < 0. Obviously, in the case of photon absorption, i. e. for ∆n > 0, the angular
momentum predominantly increases.)
The analysis of numerical values for the transition probabilities in hydrogen presented
in [1] has demonstrated that even for n and l, comparable with unity, i. e. in a nonclassical
situation, radiation with ∆l = −1 can be much more probable than that with ∆l = 1.
Later, the relation between the probabilities of transitions with ∆l = −1 and ∆l = 1
was investigated in [2] by analyzing the corresponding matrix elements in the semiclassical
approximation. The conclusion made therein is also that the transitions with ∆l = −1
dominate, and the dominance is especially strong when l > n2/3.
Here we present a simple solution of the problem using the classical electrodynamics
and, of course, the correspondence principle. Our results describe the situation not only
in the semiclassical situation. Remarkably enough, they agree, at least qualitatively, with
the results of [1], although the latter refer to transitions with |∆n| ∼ n ∼ 1 and l ∼ 1,
which are not classical at all.
We start our analysis with a purely classical problem. Let a particle with a mass
m and charge − e moves in an attractive Coulomb field, created by a charge e, along
an ellipse with large semi-axis a and eccentricity ε. It is known [3] that the radiation
intensity at a given harmonic ν is here
4e2ω4
ξ2ν + η
; (1)
J ′ν(νε), ην =
1− ε2
Jν(νε). (2)
In expressions (2), Jν(νε) is the Bessel function, and J
ν(νε) is its derivative. We use the
Fourier transformation in the following form:
x(t) = a
iνω0t = 2a
ξν cos νω0t,
http://arxiv.org/abs/0704.0856v1
y(t) = a
iνω0t = 2a
ην sin νω0t,
where all dimensionless Fourier components ξν and ην are real, and ξ−ν = ξν , η−ν = − ην .
We note that the Cartesian coordinates x and y are related here to the polar coordinates r
and φ as follows: x = r cos φ, y = r sin φ, where φ increases with time. Thus, the angular
momentum is directed along the z axis (but not in the opposite direction).
We note also that, since 0 ≤ ε ≤ 1, both Jν(νε) and J ′ν(νε) are reasonably well
approximated by the first term of their series expansion in the argument. Therefore, all
the Fourier components ξν and ην are positive.
In the quantum problem (where ν = |∆n|), the probability of transition in the unit
time is
h̄ω0ν
4e2ω3
3c3h̄
ξ2ν + η
, ω0 =
h̄2n3
. (3)
Now, the loss of angular momentum with radiation is [3]
r× r... .
Going over here to the Fourier components, we obtain
Ṁν = −
4e2ω2
rν × ṙν ,
or (with our choice of the direction of coordinate axes, and with the angular momentum
measured in the units of h̄)
Ṁν = −
4e2ω3
3c3h̄
2ξνην . (4)
Obviously, the last expression is nothing but the difference between the probabilities
of transitions with ∆l = 1 and ∆l = −1 in the unit time:
Ṁν = W
ν −W−ν . (5)
Of course, the total probability (3) can be written as
Wν = W
ν . (6)
From explicit expressions (3) and (4) it is clear that inequality W+ν ≪ W−ν holds if
2ξνην ≈ ξ2ν + η2ν , or ην ≈ ξν . The last relation is valid for ε ≪ 1, i. e. for orbits close
to circular ones. (The simplest way to check it, is to use in formulae (2) the explicit
expression for the Bessel function at small argument: Jν(νε) = (νε)
ν/(2νν !).)
This conclusion looks quite natural from the quantum point of view. Indeed, it is
the state with the orbital quantum number l equal to n − 1 (i. e. with the maximum
possible value for given n) which corresponds to the circular orbit. In result of radiation
n decreases, and therefore l should decrease as well.
The surprising fact is, however, that in fact the probabilities W−ν of transitions with
∆l = −1 dominate numerically everywhere, except small vicinity of the maximum possible
eccentricity ε = 1. For instance, if ε ≃ 0.9 (which is much more close to 1 than to 0 !),
then at ν = 1 the discussed probability ratio is very large, it constitutes
≃ 12 .
The change with ε of the ratio of W+ν to W
ν for two values of ν is illustrated in Fig. 1.
The curves therein demonstrate in particular that with the increase of ν, the region
0.2 0.4 0.6 0.8 1.0 Ε
�������� ����
Fig. 1
where W−ν and W
ν are comparable, gets more and more narrow, i. e. when ν grows, the
corresponding curves tend more and more to a right angle.
Let us go over now to the quantum problem. In the semiclassical limit, the classical
expression for the eccentricity
is rewritten with usual relations E = −me4/(2h̄2n2) and M = h̄l as
. (8)
In fact, the exact expression for ε, valid for arbitrary l and n, is [3]:
l(l + 1) + 1
. (9)
Clearly, in the semiclassical approximation the eccentricity is close to unity only under
condition l ≪ n. If this condition does not hold, one may expect that in the semiclas-
sical limit the transitions with ∆l = −1 dominate. In other words, as long as l ≪ n,
the probabilities of transitions with decrease and increase of the angular momentum are
comparable. But if the angular momentum is not small, it is being lost predominantly in
radiation. This situation looks quite natural.
The next point is that with the increase of |∆n| = ν, the region where W−ν and W+ν
are comparable, gets more and more narrow in agreement with the observation made in
However, we do not see any hint at some special role (advocated in [2]) of the condition
l > n2/3 for the dominance of transitions with ∆l = −1.
As mentioned already, the analysis of the numerical values of transition probabilities [1]
demonstrates that even for n and l comparable with unity and |∆n| ≃ n, i. e. in the
absolutely nonclassical regime, the transitions with ∆l = −1 are still much more probable
than those with ∆l = 1. The results of this analysis for the ratio W−/W+ in some
transitions are presented in Table 3.1 (first line). Then we indicate in Table 3.1 (last line)
W4p→3s
W4p→3d
W5p→4s
W5p→4d
W5d→4p
W5d→4f
W6f→5d
W6f→5g
W5p→3s
W5p→3d
W6p→3s
W6p→3d
exact
value 10 3.75 28 72 10.67 13.7
ε̄ 0.87 0.92 0.81 0.75 0.90 0.92
ν = |∆n| 1 1 1 1 2 3
semiclassical
value 17.6 8.7 34 58 17.2 15.7
Table 3.1
the values of these ratios obtained in the näıve (semi)classical approximation. Here for
the eccentricity ε̄ we use the value of expression (9), calculated with l corresponding to
the initial state; as to n, we take its value average for the initial and final states.
The table starts with the smallest possible quantum numbers where the transitions,
which differ by the sign of ∆l, occur, i. e. with the ratio W4p→3s/W4p→3d. This table
demonstrates that the ratio of the classical results to the exact quantum-mechanical ones
remains everywhere within a factor of about two. In fact, if one uses as ε̄ expression (8),
calculated in the analogous way, the numbers in the last line change considerably. It is
clear, however, that the classical approximation describes here, at least qualitatively, the
real situation.
References
[1] H.A. Bethe and E.E. Salpeter, Quantum Mechanics of One- and Two-Electron Atoms,
Springer, 1957; §63.
[2] N.B. Delone and V.P. Krainov, FIAN Preprint No. 18, 1979; J. Phys. B 27, (1994)
4403.
[3] L.D. Landau and E.M. Lifshitz, The Classical Theory of Fields, Nauka, 1973; §70,
problem 2 to §72.
[4] L.D. Landau and E.M. Lifshitz, Quantum Mechanics, Nauka, 1974; §36.
ABSTRACT
  We demonstrate that radiative transitions with \Delta l = - 1 are strongly
dominating for all values of n and l, except small region where l << n.

<|endoftext|><|startoftext|>
Introduction
Being a ‘close relative’ of General Relativity (GR), Absolute Parallelism (AP) has many interesting
features: larger symmetry group of equations; field irreducibility with respect to this group; vast
list of compatible second order equations (discovered by Einstein and Mayer [1]) not restricted to
Lagrangian ones.
There is the only variant of Absolute Parallelism which solutions are free of arising singularities,
if D=5 (there is no room for changes; this variant of AP does not have a Lagrangian, nor match
GR); in this case AP has topological features of nonlinear sigma-model.
In order to give clear presentation and full picture of the theory’ scope, many items should be
sketched: instability of trivial solution and expanding O4-symmetrical ones; tensor Tµν (positive
energy, but only three polarizations of 15 carry (and angular) momentum; how to quantize such a
stuff ?) and PN-effects; topological classification of symmetric 5D field configurations (alighting on
evident parallels with Standard Model’ particle combinatorics) and ‘quantum phenomenology on
expanding classical background’ (coexistence); ‘plain’ R2-gravity on very thick brane and change
in the Newton’s Law: 1
goes to 1
with distance (not with acceleration – as it is in MOND [2]).
At last, an experiment with single photon interference is discussed as the other way to observe
very-very long (and very undeveloped) the extra dimension.
2 Unique 5D equation of AP (free of singularities in solutions)
There is one unique variant of AP (non-Lagrangian, with the unique D; D=5) which solutions of
general position seem to be free of arising singularities. The formal integrability test [3] can be
∗E-mail: zhogin at inp.nsk.su; http://zhogin.narod.ru
http://arxiv.org/abs/0704.0857v1
http://zhogin.narod.ru
extended to the cases of degeneration of either co-frame matrix, haµ, (co-singularities) or contra-
variant frame (or contra-frame density of some weight), serving as the local and covariant (no
coordinate choice) test for singularities of solutions. In AP this test singles out the next equation
(and D=5, see [4]; ηab = diag(−1, 1, . . . , 1), then h = det haµ =
Eaµ = Laµν;ν − 13(faµ + LaµνΦν) = 0 , (1)
where (see [4] for more detailed introduction to AP and explanation of notations used)
Laµν = La[µν] = Λaµν − Saµν − 23ha[µΦν],
Λaµν = 2ha[µ,ν], Sµνλ = 3Λ[µνλ], Φµ = Λaaµ, fµν = 2Φ[µ,ν] = 2Φ[µ;ν]. (2)
Coma ”,” and semicolon ”;” denote partial derivative and usual covariant differentiation with
symmetric Levi-Civita connection, respectively.
One should retain the identities [which follow from the definitions (2)]:
Λa[µν;λ] ≡ 0 , haλΛabc;λ ≡ fcb (= fµνhµchνb ), f[µν;λ] ≡ 0. (3)
The equation Eaµ;µ = 0 gives ‘Maxwell-like equation’ (we prefer to omit g
µν (ηab) in contrac-
tions that not to keep redundant information – when covariant differentiation is in use only):
(faµ + LaµνΦν);µ = 0, or fµν;ν = (SµνλΦλ);ν (= −12Sµνλfνλ, see below) . (4)
Actually the Eq. (4) follows from the symmetric part of equation, E(ab), because skewsymmetric
one gives just the identity:
2E[νµ] = Sµνλ;λ = 0, E[µν];ν ≡ 0;
note also that the trace part becomes irregular (the principal derivatives vanish) if D = 4 (this
number of dimension is forbidden, and the next number, D = 5, is the most preferable):
Eµµ = Eaµh
ab = 4−D
Φµ;µ + (Λ
2) = 0.
The system (1) remains compatible under adding fµν = 0, see (4); this is not the case for
another covariant, S,Φ, or (some irreducible part of the) Riemannian curvature, which relates to
Λ as usually:
Raµνλ = 2haµ;[ν;λ]; haµhaν;λ =
Sµνλ − Λλµν .
3 Tensor Tµν (despite Lagrangian absence) and PN-effects
One might rearrange E(µν)=0 that to pick out (into LHS) the Einstein tensor, Gµν =Rµν− 12gµνR,
but the rest terms are not proper energy-momentum tensor: they contain linear terms Φ(µ;ν)
(no positive energy ( !); another presentation of ‘Maxwell equation’ (4) is possible instead – as
divergence of symmetrical tensor).
However, the prolonged equation E(µν);λ;λ = 0 can be written as ‘plain’ (no R-term) R
2-gravity:
(−h−1 δ(hRµνGµν)/δgµν=) Gµν;λ;λ +Gǫτ(2Rǫµτν − 12gµνRǫτ ) = Tµν(Λ
′2, . . .), Tµν;ν = 0; (5)
up to quadratic terms,
Tµν =
2 − fµλfνλ) + Aµǫντ (Λ2);(ǫ;τ) + (Λ2Λ′,Λ4);
tensor A has symmetries of Riemann tensor, so the term A′′ adds nothing to momentum and
angular momentum.
It is worth noting that:
(a) the theory does not match GR, but shows ‘plain’ R2-gravity (sure, (5) does not contain all
the theory);
(b) only f -component (three transverse polarizations in D=5) carries D-momentum and an-
gular momentum (‘powerful’ waves); other 12 polarizations are ‘powerless’, or ‘weightless’ (this is
a very unusual feature – impossible in the Lagrangian tradition; how to quantize ? let us not to
try this, leaving the theory ‘as is’);
(c) f -component feels only metric and S-field (‘contorsion’, not ‘torsion’ Λ – to label somehow),
see (4), but S has effect only on polarization of f : S[µνλ] does not enter eikonal equation, and f
moves along usual Riemannian geodesic (if background has f=0); one may think that all ‘quantum
fields’ (phenomenological quantized fields accounting for topological (quasi)charges and carrying
some ‘power’; see further) inherit this property;
(d) the trace Tµµ =
fµνfµν can be non-zero if f
2 6= 0 and this seemingly depends on S-
component [which enters the current in (4)]; in other words, ‘mass distribution’ is to depend on
distribution of f - and S-component;
(e) it should be stressed and underlined that the f -component is not usual (quantum) EM-
field – just important covariant responsible for energy-momentum (suffice it to say that there is
no gradient invariance for f).
4 Linear domain: instability of trivial solution (with powerless waves)
Another strange feature is the instability of trivial solution: some ‘powerless’ polarizations grow
linearly with time in presence of ‘powerful’ f -polarizations. Really, from the linearized Eq. (1)
and the identity (3) one can write (the following equations should be understood as linearized):
Φa,a = 0 (D 6= 4), 3Λabd,d = Φa,b − 2Φb,a, Λa[bc,d],d ≡ 0 ⇒ 3Λabc,dd = −2fbc,a .
The last ‘D‘Alembert equation’ has the ‘source’ in its right hand side. Some components of Λ
(most symmetrical irreducible parts) do not grow (as well as curvature), because (again, linearized
equations are implied below)
Sabc,dd = 0, Φa,dd = 0, fab,dd = 0, Rabcd,ee = 0,
but the least symmetrical components of the tensor Λ do grow up with time (due to terms ∼ t e−iωt;
three growing polarizations which are ‘imponderable’, or powerless) if the ‘ponderable’ waves (three
f -polarizations) do not vanish (and this should be the case for solutions of ‘general position’).
5 Expanding O4-symmetrical (single wave) solutions and cosmology
The unique symmetry of AP equations gives scope for symmetrical solutions. In contrast to GR,
this variant of AP has non-stationary spherically (O4-) symmetric solutions. The O4-symmetric
frame field can be generally written as follows [4]:
haµ(t, x
a bni
cni eninj + d∆ij
; i, j = (1, 2, 3, 4), ni =
. (6)
Here a, . . . , e are functions of time, t = x0, and radius r, ∆ij = δij −ninj, r2 = xixi. As functions
of radius, b, c are odd, while the others are even; other boundary conditions: e = d at r = 0, and
haµ → δ aµ as r → ∞. Placing in (6) b = 0, e = d (the other interesting choice is b=c=0) and
making integrations one can arrive to the next system (resembling dynamics of Chaplygin gas;
dot and prime denote derivation on time and radius, resp.; A = a/e = e1/2, B = −c/e):
A· = AB′ −BA′ + 3AB/r , B · = AA′ − BB′ − 2B2/r . (7)
This system (does not suffer of gradient catastrophe and) has non-stationary solutions; a single-
wave solution of proper ‘amplitude’ might serve as a suitable cosmological (expanding) background.
The condition fµν=0 is a must for solutions with such a high symmetry (as well as Sµνλ=0); so,
these O4-solutions carry no energy, that is, weight nothing (some lack of gravity ! in this theory
the universe expansion seemingly has little common with gravity, GR and its dark energy [5]).
More realistic cosmological model might look like a single O4-wave (or a sequence of such
waves) moving along the radius and being filled with chaos, or stochastic waves, both powerful
(weak, ∆h≪ 1) and powerless (∆h < 1, but intense enough that to lead to non-linear fluctuations
with ∆h ∼ 1), which form statistical ensemble(s) having a few characteristic parameters (after
‘thermalization’). The development and examination of stability of such a model is an interesting
problem. The metric variation in cosmological O4-wave can serve as a time-dependent ‘shallow
dielectric guide’ for that weak noise waves. The ponderable waves (which slightly ‘decelerate’
the O4-wave) should have wave-vectors almost tangent to the S
3-sphere of wave-front that to be
trapped inside this (‘shallow’) wave-guide; the imponderable waves can grow up, and partly escape
from the wave-guide, and their wave-vectors can be less tangent to the S3-sphere.
The waveguide thickness can be small for an observer in the center of O4-symmetry, but in co-
moving coordinates it can be very large (due to relativistic effect), however still small with respect
to the radius of sphere, L≪ R. It seems that the radial dimension has to be very ‘undeveloped’;
that is, there are no other characteristic scales, smaller than L, along this extra-dimension.
6 Non-linear domain: topological charges and quasi-charges
Let AP-space is of trivial topology: no worm-holes, no compactified space dimensions, no singu-
larities. One can continuously deform frame field h(x) to a field of rotation matrices (metric can
be diagonalized and ‘square-rooted’) haµ(x) → saµ(x) ∈ SO(1, d); m=D−1. Further deformation
can remove boosts too, and so, for any space-like (Cauchy) surface, this gives a (pointed) map,
s : Rm ∪∞ = Sm → SOm; ∞ 7→ 1m ∈ SOm.
The set of such maps consists of homotopy classes forming the group of topological charge, Π(m):
Π(m) = πm(SOm); Π(3) = Z, Π(4) = Z2 + Z2. (8)
Here Z is the infinite cyclic group, and Z2 is the cyclic group of order two.
It is important that deformation to s-field can keep symmetry of field configuration. Definition:
localized field (pointed map) s(x) : Rm → SO(m), s(∞) = 1m, is G-symmetric if, in some
coordinates,
s(σx) = σs(x)σ−1 ∀ σ ∈ G ⊂ O(m) . (9)
The set of such fields C(m)G generally consists of separate, disconnected components – homotopy
classes forming the ‘topological quasi-charge group’ denoted here as Π(G;m) ≡ π0(C(m)G ). These
QC-groups classify symmetrical localized configurations of frame field. Since field equation does
not break symmetry, quasi-charge conserves; if symmetry is not exact (because of distant regions),
quasi-charge is not exactly conserving value, and quasi-particle (of zero topological charge) can
annihilate (or be created) during colliding with another quasi-particle.
The other problem. Let G1 ⊃ G2, such that there is a mapping (embedding) i : C(m)G1 → C
which induces the homomorphism of QC-groups: i∗ : Π(G1;m) → Π(G2;m), so one has to
describe this morphism.
Let us consider the simple (discreet) symmetry group P1 with a plane of reflection symmetry:
P1 = {1, p(1)}, where p(1) = diag(−1, 1, . . . , 1) = p−1(1).
It is necessary to set field s(x) on the half-space 1
Rm = {x1 ≥ 0}, with additional condition
imposed on the surface Rm−1 = {x1 = 0} (stationary points of P1 group) where s has to commute
with the symmetry [see (9)]:
p(1)x = x ⇒ s(x) = p(1)sp(1) ⇒ s ∈ 1× SOm−1.
Hence, accounting for the localization requirement, we have a diad map (relative spheroid; here
Dm is anm-ball and Sm−1 its surface) (Dm;Sm−1) → (SOm;SOm−1), and topological classification
of such maps leads to the relative (or diad) homotopy group ([6]; the last equality below follows
due to fibration SOm/SOm−1 = S
m−1):
Π(P1;m) = πm(SOm;SOm−1) = πm(S
m−1).
Similar considerations (of group orbits and stationary points) lead to the following result:
Π(Ol;m) = πm−l+1(SOm−l+1;SOm−l) = πm−l+1(S
m−l).
If l > 3, there is the equality: Π(SOl;m) = Π(Ol;m), while for l = 2, 3 one can find:
Π(SO3;m) = πm−2(SO2 × SOm−2;SOm−3) = πm−2(S1 × Sm−3),
Π(SO2;m) = πm−1(SOm;SOm−2 × SO2) = πm−1(RG+(m, 2)).
The set of quaternions with absolute value one, H1 = {f, |f| = 1}, forms a group under
quaternion multiplication, H1 ∼= SU2 = S3, and any s ∈ SO4 can be represented as a pair of such
quaternions [6], (f , g) ∈ S3(l) × S3(r), |f | = |g| = 1:
x∗ = sx ⇔ x∗ = f x g−1 = f x ḡ ; |x| = |x∗|.
The pairs (f,g) and (–f, –g) correspond to the same rotation s, that is, SO4 = S
(l) × S3(r)/±.
Note that the symmetry condition (9) also splits into two parts:
f(axb−1) = af(x)a−1, g(axb−1) = bg(x)b−1 ∀(a,b) ∈ G ⊂ SO4. (10)
7 Example of SO2-symmetric quaternion field
Let’s consider an example of SO2{2, 3}−symmetric f–field configuration (g=1), which carries both
charge and SO2-quasi-charge (left, of course), f(x): H = R
4 → H1; f(∞) = 1. The symmetry
condition (10) reads
f(eiφ/2xe−iφ/2) = eiφ/2f(x)e−iφ/2. (11)
We’ll switch to ‘double-axial’ coordinates: x = aeiϕ + beiψj. Let us use imaginary quaternions q
as stereogrphic coordinates on H1, and take symmetrical field q(x) consistent with Eq. (11):
q(x) = x i x̄+ i = −q̄, f(x) = −
1 + q
. (12)
It is easy to find the ‘center of quasi-soliton’ (1-submanifold, S1)
S1 = f−1(−1) = q−1(0) = {a = 0, b = 1} = {x0(ψ) = eiψj}
and the ‘vector equipment’ on this circle:
dx|x0 = da eiϕ + (db+ i dψ)eiψj, 14df
= idb− k ei (ϕ+ψ)da ;
i-vector all time looks along the radius b (parallel translation along the circle S1; this is a ‘trivial‘,
or ‘flavor’-vector). Two others (’phase’-vectors) make 2π−rotation along the circle.
In fact, the field (12) has also symmetry SO2{1, 4}, and this feature restricts possible directions
of ‘flavor’-vector (two ‘flavors’ are possible, ±; the P2{1, 4}−symmetry (this is the π-rotation of
x1, x4) gives the same effect). The other interesting observation is that the equipped circle can be
located also at the stationary points of SO2−symmetry (this increases the number of ‘flavors’).
8 Quasi-charges and their morphisms (in 5D, ie m = 4)
If G ⊂ SO4, the QC-group has two isomorphous parts, left and right: Π(G) = Π(l)(G) + Π(r)(G).
The Table below describes quasi-charge groups for G ⊂ G0 = (O3 × P4) ∩ SO4 (P4 is spatial
inversion, the 4-th coordinate is the extra dimension of G0-symmetric expanding cosmological
background).
Table. QC-groups Π(l)(G) and their morphisms to the preceding group; G ⊂ G0.
G Πl(G) → Πl(G∗) ‘label’
SO{1, 2} Z(e)
e→ Z2 e
SO{1, 2} × P{3, 4} Z(ν) + Z(H)
i,m2→ Z(e) ν0; H0 → e + e
SO{1, 2} × P{2, 3} Z(W )
0→ Z(e) W → e + ν0
SO{1, 2} × P{2, 4} Z(Z)
0→ Z(e) Z0 → e+ e
SO{1, 2} × P{3, 4} × Z(γ)
0→ Z(H) γ0 → H0 +H0
×P{2, 3} 0→ Z(W ) →W +W
‘Quasi-particles’, which symmetry includes P4, seem to be true neutral (neutrinos, Higgs particles,
photon).
One can assume further that an hadron bag is a specific place where G0−symmetry does
not work, and the bag’s symmetry is isomorphous to O4. This assumption can lead to another
classification of quasi-solitons (some doubling the above scheme), where self-dual and anti-self-
dual one-parameter groups take place of SO2−group. The total set of quasi-particle parameters
(parameters of equipped 1-manifold (loop) plus parameters of group) for (anti)self-dual groups,
G(4, 2)×RP 2, is larger than the analogous set for groups SO2 ⊂ G0, which is just O3×G(3, 1) =
RP 2 . If the number of ‘flavor’-parameters (which are not degenerate and have some preferable
particular values; this should be sensitive to discreet part of G – at least photons have the same
flavor) is the same as in the case of ‘white’ quasi-particles, the remaining parameters (degenerate,
or ‘phase’) can give room for ‘color’ (in addition to spin). So, perhaps one might think about
‘color neutrinos’ (in the context of pomeron, and baryon spin puzzle), ‘color W, Z, and Higgs’
(another context – B-mesons), and so on.
Note that in this picture the very notion of quasi-particle depends on the background symmetry
(also to note: there are no ’quanta of torsion’ per se). On the other hand, large clusters of
quasi-particles (matter) can disturb the background, and waves of such small disturbances (with
wavelength larger than the thickness L, perhaps) can be generated as well (but these waves do
not carry (quasi)charges, that is, are not quantized).
9 Coexistence: phenomenological ‘quantum fields’ on classical back-
ground
The non-linear, particle-like field configurations with quasi-charges (quasi-particles) should be very
elongated along the extra-dimension (all of the same size L), while being small sized along usual
dimensions, λ≪ L. The motion of such a spaghetti-like quasi-particle should be very complicated
and stochastic due to ‘strong’ imponderable noise, such that different parts of spaghetti are coming
their own paths. At the same time, quasi-particle can acquire ‘its own’ energy–momentum – due to
scattering of ponderable waves (which wave-vectors are almost tangent to usual 3D (sub)space);
so, it seems that scattering amplitudes1 of those spaghetti’s parts which have the same 3D–
coordinates can be summarized providing an auxiliary, secondary field.
So, the imponderable waves provides stochasticity (of motion of spaghetti’s parts), while the
ponderable waves ensure superposition (with secondary fields). Phenomenology of secondary fields
could be of Lagrangian type, with positive energy acquired by quasi-particles, – that to ensure the
stability (of all the waveguide with its infill – with respect to quasi-particle production; the least
action principle has deep concerns with Lyapunov stability and is deducible, in principle, from the
path integral approach).
10 ‘Plain’ R2 gravity on very thick brane
and change in the Newton’s Law of Gravitation
Let us start with 4d (from 5D) bi-Laplace equation with a δ-source [as weak field, non-relativistic
(stationary) approximation (it is assumed that ‘mass is possible’) for R2-gravity (5)] and its
solution (R is 4d distance, radius):
∆2ϕ = − a
δ(R); ϕ(R2) =
lnR2 − b
(+ c , but c does not matter); (13)
the attracting force between two point masses is Fpoint =
, a, b should be proportional to
both masses.
Now let us suppose that all masses are distributed along the extra dimension with a ‘universal
function’, µ(p),
µ(p) dp = 1. Then the attracting (gravitation) force takes the next form [see
1 These amplitudes can depend on additional vector-parameters (‘equipment vectors’) relating to differential of
field mapping at a ‘quasi-particle center’ – where quasi-charge density is largest (if it has covariant sense).
0 1 2 3 4 5 6
Fig. 1. Deviation δF = F − 1/r2 for different µ(p), see Eq. (14) and text below.
(13); r is usual 3d distance]:
F (r) =
ϕ(r2 + (p− q)2)µ(p)µ(q) dp dq =
V − b V ′, V (r) =
µ(p)µ(q) dp dq
r2 + (p− q)2
. (14)
(Note that V (r) can be restored if F (r) is measured.)
Taking µ1(p) = π
−1/(1 + p2) (typical scale along the extra dimension is taken as unit, L = 1;
it seems that L should be greater than ten AU), one can find rV1(r) = 1/(2 + r) and
F (r) =
8 + 4r
2b(1 + r)
r2(2 + r)2
; or (now L 6= 1) F (r) =
2L(2L+ r)2
, where a = b = 2/L2.
Fig. 1, curve (a) shows δF = F − 1/r2 (deviation from the Newton’s Law; a/b is chosen that
δF (0)=0); two other curves, (b) & (c), correspond to µ2 = 2π
−1/(1 + p2)2, µ3 = 2π
−1p2/(1 + p2)2
(also δF (0)=0; residues help to find rV2 = (10 + 6r + r
2)/(2 + r)3, rV3 = (2 + 2r + r
2)/(2 + r)3).
We see that in principle this theory can explain galaxy rotation curves, v2(r)∝ rF r→∞−→ const,
without need for Dark Matter (or MOND [2]; about rotation curves and DM see [7]; they are
looking for DM in Solar system too, [8]).
Q: Can the ‘coherence of mass’ along the extra dimension be disturbed ? (the flyby anomaly,
the Pioneer anomaly [9]); can µ(p) be negative in some domains of p ?
11 How to register ‘powerless’ waves
This section is added perhaps for some funny recreation (or still not ? who knows). We have
learnt that S-waves do not carry momentum and angular momentum, so they can not perform
any work or spin flip.
But let us conceive that these waves can effect a flip-flop of two neighbor spins. So, a ‘detector’
could be a media with two sorts of spins, A and B. Let sA = sB = 1/2 but gA 6= gB, and let
the initial state is prepared as follows: {<sAz >,<sBz >}(0) = {1/2,−1/2}. Then the process of
spin relaxation starts; turning on appropriate magnetic field Hz (and alternating fields of proper
frequencies) one can measure the detector’s state and find the time of spin relaxation.
The next step. Skilled experimenters try to generate S-waves and to register an effect of these
waves on spin relaxation. The generation of intense ‘coherent’ S-waves could be proceeded perhaps
with a similar spin system subjected to alternating polarization.
12 Single photon experiment (that to feel huge extra dimension),
and Conclusion
Today, many laboratories have sources of single (heralded) photons, or entangled bi-photons (say,
for Bell-type experiments [10]); some students can perform laboratory works with single photons,
having convinced on their own experience that light is quantized (the Grangier experiment)[11].
It is being suggested a minor modification of the single (polarized) photon interference exper-
iment, say, in a Mach-Zehnder fiber interferometer with ‘long’ (the fibers may be rolled) enough
arms. The only new element is a fast-acting shutter placed at the beginning of (one of) the inter-
ferometer’s arms (the closing-opening time of the shutter should be smaller than the flight time
in the arms). For example, a fast electro-optical modulator in combination with polarizer (or a
number of such combinations) can be used with polarized photons.
Both Quantum mechanics (no particle’s ontology) and Bohmian mechanics (wave-particle dou-
ble ontology)[12] exclude any change in the interference figure as a result of separating activity
of such a fast shutter (while the photon’s ‘halves’ are making their ways to the place of a meet-
ing). However, if a photon has non-local spaghetti-like ontology (along the extra dimension) and
fragments of this spaghetti are moving along both arms at once, then the shutter should tear up
this spaghetti (mainly without photon absorption), tear out its fragments (which will dissolve in
‘zero-point oscillations’). Hence, if the absorption factor of the shutter (the extinction ratio of
polarizer) is large enough, the 50/50-proportion (between the photon’s amplitudes in the arms)
will be changed and a significant decrease of the interference visibility should be observed.
QM is everywhere (where we can see, of course), and, so, non-linear 5D-field fluctuations,
looking like spaghetti-anti-spaghetti loops, should exist everywhere. (This omnipresence can be
related to the universality of ‘low-level heat death’, restricted by the presence of topological quasi-
solitons – some as the 2D computer experiment by Fermi, Pasta, and Ulam, where the process of
thermalization was restricted by the existence of solitons. See also the sections 5–8 (and [4]) for
arguments in favor of phenomenological (quantized) ‘secondary fields’ accounting for topological
(quasi)charges and obeying superposition, path integral and so on.)
AP, at least at the level of its symmetry, seems to be able to cure the gap between the
two branches of physics – General Relativity (with coordinate diffeomorphisms) and Quantum
Mechanics (with Lorentz invariance).2 Most people give all the rights of fundamentality to quanta,
and so, they are trying to quantize gravity, and the very space-time (probing loops, and strings,
and branes; see also the warning polemic by Schroer [14]). The other possibility is that quanta
have the specific phenomenological origin relating to topological (quasi)charges.
2Rovelli writes[13]: In spite of their empirical success, GR and QM offer a schizophrenic and confused under-
standing of the physical world.
References
[1] A. Einstein and W. Mayer, Sitzungsber. preuss. Akad. Wiss. Kl 257–265 (1931).
[2] M. Milgrom, The modified dynamics – a status review, arXiv: astro-ph/9810302.
[3] J. F. Pommaret, Systems of Partial Differentiation Equations and Lie Pseudogroups (Math.
and its Applications, Vol. 14, New York 1978).
[4] I. L. Zhogin, Topological charges and quasi-charges in AP, arXiv: gr-qc/0610076; spherical
symmetry: gr-qc/0412130; 3-linear equations (contra-singularities): gr-qc/0203008.
[5] S.M. Carroll, Why is the Universe Accelerating ? arXiv: astro-ph/0310342
[6] B.A. Dubrovin, A.T. Fomenko and S.P. Novikov, Modern Geometry – Methods and Applica-
tions, Springer-Verlag, 1984.
[7] M.E. Peskin, Dark Matter: What is it ? Where is it ? Can we make it in the lab ?
http://www.slac.stanford.edu/grp/th/mpeskin/Yale1.pdf; M. Battaglia, M.E. Peskin,
The Role of the ILC in the Study of Cosmic Dark Matter, hep-ph/0509135
[8] L. Iorio, Solar System planetary orbital motions and dark matter, arXiv: gr-qc/0602095;
I.B. Khriplovich, Density of dark matter in Solar system and perihelion precession of planets,
astro-ph/0702260.
[9] C. Lämmerzahl, O. Preuss, and H. Dittus, Is the physics within the Solar system really
understood ? arXiv: gr-qc/0604052; A. Unzicker, Why do we Still Believe in Newton’s Law ?
Facts, Myths and Methods in Gravitational Physics, gr-qc/0702009.
[10] G. Weihs, T. Jennewein, C. Simon, H. Weinfurter, and A. Zeilinger, Phys. Rev. Lett. 81, 5039
(1998); quant-ph/9810080; W. Tittel, G. Weihs, Photonic Entanglement for Fundamental
Tests and Quantum Communication, quant-ph/0107156.
[11] See the next links: departments.colgate.edu/physics/research/Photon/root/ ,
marcus.whitman.edu/ beckmk/QM .
[12] H. Nikolić, Quantum mechanics: Myths and facts, arXiv: quant-ph/0609163 .
[13] C. Rovelli, Unfinished revolution, gr-qc/0604045 .
[14] B. Schroer, String theory and the crisis in particle physics (a Samisdat on particle physics),
arXiv: physics/0603112;
the other sources of contra-string polemic are seemingly the books: P. Woit, Not even wrong;
L. Smolin, The Trouble with Physics (and the blog math.columbia.edu/∼woit/wordpress).
http://arxiv.org/abs/astro-ph/9810302
http://arxiv.org/astro-ph/9810302
http://arxiv.org/abs/gr-qc/0610076
http://arXiv.org/gr-qc/0610076
http://arXiv.org/gr-qc/0412130
http://arXiv.org/gr-qc/0203008
http://arxiv.org/abs/astro-ph/0310342
http://arxiv.org/astro-ph/0310342
http://www.slac.stanford.edu/grp/th/mpeskin/Yale1.pdf
http://arxiv.org/hep-ph/0509135
http://arxiv.org/abs/gr-qc/0602095
http://arxiv.org/gr-qc/0602095
http://arxiv.org/astro-ph/0702260
http://arxiv.org/abs/gr-qc/0604052
http://arxiv.org/gr-qc/0604052
http://arxiv.org/gr-qc/0702009
http://arXiv.org/quant-ph/9810080
http://arXiv.org/quant-ph/0107156
http://departments.colgate.edu/%physics/research/Photon/root/photon_quantum_mechanics.htm
http://marcus.whitman.edu/~beckmk/QM/
http://arxiv.org/abs/quant-ph/0609163
http://arXiv.org/gr-qc/0604045
http://arxiv.org/abs/physics/0603112
http://arXiv.org/physics/0603112
http://www.math.columbia.edu/~woit/wordpress/
	Introduction
	Unique 5D equation of AP (free of singularities in solutions)
	Tensor T (despite Lagrangian absence) and PN-effects
	Linear domain: instability of trivial solution (with powerless waves) 
	Expanding O4-symmetrical (single wave) solutions and cosmology 
	Non-linear domain: topological charges and quasi-charges 
	Example of SO2-symmetric quaternion field
	Quasi-charges and their morphisms (in 5D, ie m=4) 
	Coexistence: phenomenological `quantum fields' on classical background 
	`Plain' R2 gravity on very thick brane  and change in the Newton's Law of Gravitation
	How to register `powerless' waves 
	Single photon experiment (that to feel huge extra dimension), and Conclusion
ABSTRACT
  Galactic rotation curves and lack of direct observations of Dark Matter may
indicate that General Relativity is not valid (on galactic scale) and should be
replaced with another theory. There is the only variant of Absolute Parallelism
which solutions are free of arising singularities, if D=5 (there is no room for
changes). This variant does not have a Lagrangian, nor match GR: an equation of
`plain' R^2-gravity (ie without R-term) is in sight instead. Arranging an
expanding O_4-symmetrical solution as the basis of 5D cosmological model, and
probing a universal_function of mass distribution (along very-very long the
extra dimension) to place into bi-Laplace equation (R^2 gravity), one can
derive the Law of Gravitation: 1/r^2 transforms to 1/r with distance (not with
acceleration).

<|endoftext|><|startoftext|>
Introduction 
During the last decade, several initiatives have been 
developed to monitor and collect real world data about 
malicious activities on the Internet, e.g., the Internet 
Motion Sensor project [1], CAIDA [2] and Dshield [3]. The 
CADHo project [4] in which we are involved is 
complementary to these initiatives and is aimed at: 
• deploying a distributed platform of honeypots [5] that 
gathers data suitable to analyze the attack processes 
targeting a large number of machines on the Internet; 
• validating the usefulness of this platform by carrying out 
various analyses, based on the collected data, to 
characterize the observed attacks and model their 
impact on security. 
A honeypot is a machine connected to a network but 
that no one is supposed to use. If a connection occurs, it 
must be, at best an accidental error or, more likely, an 
attempt to attack the machine.  
The first stage of the project focused on the 
deployment of a data collection environment (called 
Leurré.com [6]) based on low-interaction honeypots. As 
of today, around 40 honeypot platforms have been 
deployed at various sites from academia and industry in 
almost 30 different countries over the five continents. 
Several analyses and interesting conclusions have been 
derived based on the collected data as detailed e.g., in 
[4,5,7-9]. Nevertheless, with such honeypots, hackers can 
only scan ports and send requests to fake servers without 
ever succeeding in taking control over them. The second 
stage of our project is aimed at setting up and deploying 
high-interaction honeypots to allow us to analyze and 
model the behavior of malicious attackers once they have 
managed to compromise and get access to a new host, 
under strict control and monitoring. We are mainly 
interested in observing the progress of real attack 
processes and the activities carried out by the attackers in 
a controlled environment. 
In this paper, we describe the lessons learned from the 
development and deployment of such a honeypot. The 
main contributions are threefold. First, we do confirm the 
findings discussed in [9] showing that different sets of 
compromised machines are used to carry out the various 
stages of planned attacks. Second, we do outline the fact 
that, despite this apparent sophistication, the actors 
behind those actions do not seem to be extremely skillful, 
to say the least. Last, the geographical location of the 
machines involved in the last step of the attacks and the 
link with some phishing activities shed a geopolitical and 
socio-economical light on the results of our analysis. 
The paper is organized as follows. Section 2 presents 
the architecture of our high-interaction honeypot and the 
design rationales for our solution. The lessons learned 
from the attacks observed over a period of almost 4.5 
months are discussed in Section 3. Finally, Section 4 
concludes and discusses future work. An extended version 
of this paper detailing the context of this work and the 
related state-of-the art is available in [10]. 
2. Architecture of our honeypot 
In our implementation, we decided to use VMware [11] 
and to install virtual operating system upon it. Compared 
to solutions based on physical machines, virtual 
honeypots provide a cost effective and flexible solution 
that is well suited for running experiments to observe 
attacks.  
The objective of our experiment is to analyze the 
behavior of the attackers who succeed in breaking into a 
machine. The vulnerability that they exploit is not as 
crucial as the activity they carry out once they have broken 
into the host. That's why we chose to use a simple 
vulnerability: weak passwords for ssh user accounts. Our 
honeypot is not particularly hardened for two reasons. 
First, we are interested in analyzing the behavior of the 
attackers even when they exploit a buffer overflow and 
become root. So, if we use some kernel patch such as Pax 
[12], our system will be more secure but it will be 
impossible to observe some behavior. Secondly, if the 
system is too hardened, the intruders may suspect 
something abnormal and then give up. 
In our setup, only ssh connections to the virtual host 
are authorized so that the attacker can exploit this 
vulnerability. A firewall blocks all connection attempts 
from the Internet, but those to port 22 (ssh). Also, any 
connection from the virtual host to the outside is blocked 
Proceedings of the Sixth European Dependable Computing Conference (EDCC'06)
0-7695-2648-9/06 $20.00  © 2006
to avoid that intruders attack remote machines from the 
honeypot. This does not prevent the intruder from 
downloading code, using the ssh connection1. 
Our honeypot is a standard Gnu/Linux installation, 
with kernel 2.6, with the usual binary tools. No additional 
software was installed except the http apache server. 
This kernel was modified as explained in the next 
subsection. The real host executing VMware uses the 
same Gnu/Linux distribution and is isolated from outside. 
In order to log what the intruders do on the honeypot, 
we modified some drivers functions (tty_read and 
tty_write), as well as the exec system call in the Linux 
kernel. The modifications of tty_read and tty_write 
enable us to intercept the activity on all the terminals of 
the system. The modification of the exec system call 
enables us to record the system calls used by the intruder.  
These functions are modified in such a way that the 
captured information is logged directly into a buffer of the 
kernel memory of the honeypot itself. 
Moreover, in order to record all the logins and 
passwords tried by the attackers to break into the 
honeypot we added a new system call into the kernel of 
the virtual operating system and we modified the source 
code of the ssh server so that it uses this new system call. 
The logins and passwords are logged in the kernel 
memory, in the same buffer as the information related to 
the commands used by the attackers. 
The activities of the intruder logged by the honeypot 
are preprocessed and then stored into an SQL database. 
The raw data are automatically processed to extract 
relevant information for further analyses, mainly: i) the IP 
address of the attacking machine, ii) the login and the 
password tested, iii) the date of the connection, iv) the 
terminal associated (tty) to each connection, and v) each 
command used by the attacker. 
3. Experimental results 
This section presents the results of our experiments. 
First, we give global statistics in order to give an overview 
of the activities observed on the honeypot, then we 
characterize the various intrusion processes. Finally, we 
analyze in detail the behavior of the attackers once they 
break into the honeypot. In this paper, an intrusion 
corresponds to the activities carried out by an intruder 
who has succeeded to break into the system. 
3.1. Global statistics 
The high-interaction honeypot has been deployed on 
the Internet and has been running for 131 days during 
which 480 IP addresses have tried to contact its ssh port.  
It is worth comparing this value to the amount of hits 
observed against port 22, considering all the other low-
interaction honeypot platforms we have deployed in the 
rest of the world (40 platforms). In the average, each 
platform has received hits on port 22 from around 
approximately 100 different IPs during the same period of 
time. Only four platforms have been contacted by more 
                                                             
1 We have sometimes authorized http connections for a short time, by 
checking that the attackers were not trying to attack other remote hosts. 
than 300 different IP addresses on that port and only one 
was hit by more visitors than our high interaction 
honeypot. Even better, the low-interaction platform 
maintained in the same subnet as the high-interaction 
honeypot experimented only 298 visits, i.e. less than two 
thirds of what the high-interaction did see. This very 
simple and first observation confirms the fact already 
described in [9] that some attacks are driven by the fact 
that attackers know in advance, thanks to scans done by 
other machines, where potentially vulnerable services are 
running. The existence of such a service on a machine will 
trigger more attacks against it. This is what we observe 
here: the low interaction machines do not have the ssh 
service open, as opposed to the high interaction one, and, 
therefore get less attacked than the one where some target 
has been identified. 
The number of ssh connection attempts to the 
honeypot we have recorded is 248717 (we do not consider 
here the scans on the ssh port). This represents about 
1900 connection attempts a day. Among these 248717 
connection attempts, only 344 were successful. Table 1 
represents the user accounts that were mostly tried (the 
top ten) as well as the number of different passwords that 
have been tested by the attackers. It is noteworthy that 
many user accounts corresponding to usual first names 
have also regularly been tested on our honeypot. The total 
number of accounts tested is 41530. 
Account Number of 
connection 
attempts 
Percentage of 
connection 
attempts 
Number of 
passwords 
tested  
root 34251 13.77% 12027 
admin 4007 1.61% 1425 
test 3109 1.25% 561 
user 1247 0.50% 267 
guest 1128 0.45% 201 
info 886 0.36% 203 
mysql 870 0.35% 211 
oracle 857 0.34% 226 
postgres 834 0.33% 194 
webmaster 728 0.29% 170 
Table 1: ssh connection attempts and number of 
passwords tested 
Before the real beginning of the experiment 
(approximately one and a half month), we had deployed a 
machine with a ssh server correctly configured, offering 
no weak account and password. We have taken advantage 
of this observation period to determine which accounts 
were mostly tried by automated scripts. Using this 
acquired knowledge, we have created 17 user accounts and 
we have started looking for successful intrusions. Some of 
the created accounts were among the most attacked ones 
and others not. As we already explained in the paper, we 
have deliberately created user accounts with weak 
passwords (except for the root account). Then, we have 
measured the time between the creation of the account 
and the first successful connection to this account, then 
the duration between the first successful connection and 
the first real intrusion (as explained in section 3.2, the 
first successful connection is very seldom a real intrusion 
but rather an automatic script which tests passwords). 
Proceedings of the Sixth European Dependable Computing Conference (EDCC'06)
0-7695-2648-9/06 $20.00  © 2006
Table 2 summarizes these durations (UAi means User 
Account i). 
User 
Account 
Duration between 
creation and first 
successful connection 
Duration between first 
successful connection 
and first intrusion 
UA1 1 day 4 days 
UA2 Half a day 4 minutes 
UA3 15 days 1 day 
UA4 5 days 10 days 
UA5 5 days null 
UA6 1 day 4 days 
UA7 5 days 8 days 
UA8 1 day 9 days 
UA9 1 day 12 days 
UA10 3 days 2 minutes 
UA11 7 days 4 days 
UA12 1 day 8 days 
UA13 5 days 17 days 
UA14 5 days 13 days 
UA15 9 days 7 days 
UA16 1 day 14 days 
UA17 1 day 12 days 
Table 2: History of breaking accounts 
The second column indicates that there is usually a gap 
of several days between the time when a weak password is 
found and the time when someone logs into the system 
with this password to issue some commands on the now 
compromised host. This is a somehow a surprising fact 
and is described with some more details here below. The 
particular case of the UA5 account is explained as follows: 
an intruder succeeded in breaking the UA4 account. This 
intruder looked at the contents of the /etc/passwd file in 
order to see the list of user accounts for this machine. He 
immediately decided to try to break the UA5 account and 
he was successful. Thus, for this account, the first 
successful connection is also the first intrusion. 
3.2. Intrusion process 
In the section, we present the conclusions of our 
analyses regarding the process to exploit the weak 
password vulnerability of our honeypot. The observed 
attack activities can be grouped into three main 
categories: 1) dictionary attacks, 2) interactive intrusions, 
3) other activities such as scanning, etc. 
Figure 3: Classification of observed IP addresses 
As illustrated in figure 3, among the 480 IP addresses 
that were seen on the honeypot, 197 performed dictionary 
attacks and 35 performed real intrusions on the honeypot 
(see below for details). The 248 IP addresses left were 
used for scanning activity or activity that we did not 
clearly identified. Among the 197 IP addresses that made 
dictionary attacks, 18 succeeded in finding passwords. 
The others (179) did not find the passwords either because 
their dictionary did not include the accounts we created or 
because the corresponding weak password had already 
been changed by a previous intruder. We have also 
represented in Figure 3 the corresponding number of IP 
addresses that were also seen on the low-interaction 
honeypot deployed in the context of the project in the 
same network (between brackets). Whereas most of the IP 
addresses seen on the high interaction honeypot are also 
observed on the low interaction honeypot, none of the 35 
IPs used to really log into our machine to launch 
commands have ever been observed on any of the low 
interaction honeypots that we do control in the whole 
world! This striking result is discussed hereafter. 
3.2.1. Dictionary attack. The preliminary step of 
the intrusion consists in dictionary attacks2. In general, it 
takes only a couple of days for newly created accounts to 
be compromised. As shown in Figure 3, these attacks have 
been launched from 197 IP addresses. By analysing more 
precisely the duration between the different ssh 
connection attempts from the same attacking machine, we 
can say that these dictionary attacks are executed by 
automatic scripts. As a matter of fact, we have noted that 
these attacking machines try several hundreds, even 
several thousands of accounts in a very short time. 
We have made then further analyses regarding the 
machines that succeed in finding passwords, i.e., the 18 IP 
addresses. By searching the leurré.com database 
containing information about the activities of these 
addresses against the other low interaction honeypots we 
found four important elements of information. First, we 
note that none of our low interaction honeypot has an ssh 
server running, none of them replies to requests sent to 
port 22. These machines are thus scanning machines 
without any prior knowledge on their open ports. Second, 
we found evidences that these IPs were scanning in a 
simple sequential way all addresses to be found in a block 
of addresses. Moreover, the comparison of the 
fingerprints left on our low interaction honeypots 
highlights the fact that these machines are running tools 
behaving the same way, not to say the same tool. Third, 
these machines are only interested in port 22, they have 
never been seen connecting to other ports. Fourth, there is 
no apparent correlation as far as their geographical 
location is concerned: they are located all over the world. 
In other words, it comes from this analysis that these 
IPs are used to run a well known program. The detailed 
analysis of this specific tool is outside the scope of the 
paper but, nevertheless, it is worth mentioning that the 
activities linked to that tool, as observed in our 
Leurré.com database, indicate that it is unlikely to be a 
worm but rather an easy to use and widely spread tool. 
3.2.2. Interactive attack: intrusion. The second 
step of the attack consists in the real intrusion. We have 
noted that, several days after the guessing of a weak 
                                                             
2 We consider as “dictionary attack” any attack that tries more than 10 
different accounts and passwords. 
Proceedings of the Sixth European Dependable Computing Conference (EDCC'06)
0-7695-2648-9/06 $20.00  © 2006
password, an interactive ssh connection is executed on 
our honeypot to issue several commands. We believe that, 
in those situations, a real human being, as opposed to an 
automated script, is connected to our machine. This is 
explained and justified in Section 4.3. As shown in Figure 
3, these intrusions come from 35 IP addresses never 
observed on any of the low-interaction honeypots. 
Whereas the geographic localisation of the machines 
performing dictionary attacks is very blur, the machines 
that are used by a human being for the interactive ssh 
connection are, most of the time, clearly identified. We 
have a precise idea of their country, geographic address, 
the responsible of the corresponding domain. 
Surprisingly, these machines, for half of them, come from 
the same country, an European country not usually seen 
as one of the most attacking ones as reported, for 
instance, by the www.leurrecom.org web site. 
We then made analyses in order to see if these IP 
addresses had tried to connect to other ports of our 
honeypot except for these interactive connections; and the 
answer is no. Furthermore, the machines that make 
interactive ssh connections on our honeypot do not make 
any other kind of connections on this honeypot, i.e, no 
scan or dictionary attack. Further analyses, using the data 
collected from the low-interaction honeypots deployed in 
the CADHo project, revealed that none of the 35 IP 
addresses have ever been observed on any of our 
platforms deployed in the word. This is interesting 
because it shows that these machines are totally dedicated 
to this kind of attack (they only targeted our high-
interaction honeypot and only when they knew at least 
one login and password on this machine). 
We can conclude for these analyses that we face two 
groups of attacking machines. The first group is composed 
of machines that are specifically in charge of making 
dictionary attacks. Then the results of these dictionary 
attacks are published somewhere. Then, another group of 
machines, which has no intersection with the first group, 
comes to exploit the weak passwords discovered by the 
first group. This second group of machines is, as far as we 
can see, clearly geographically identified and commands 
are executed by a human being. A similar two steps 
process was already observed in the CADHo project when 
analyzing the data collected from the low-interaction 
honeypots (see [9] for more details). 
3.3. Behavior of attackers 
This section is dedicated to the analysis of the behavior 
of the intruders. We first characterize the intruders, i.e. 
we try to know if they are humans or programs. Then, we 
present in more details the various actions they have 
carried out on the honeypot. Finally, we try to figure out 
what their skill level seems to be. 
We concentrate the analyses on the last three months 
of our experiment. During this period, some intruders 
have visited our honeypot only once, others have visited it 
several times, for a total of 38 ssh intrusions. These 
intrusions were initiated from 16 IP addresses and 7 
accounts were used. Table 3 presents the number of 
intrusions per account, IP addresses and passwords used 
for these intrusions. It is of course difficult to be sure that 
all the intrusions for a same account are initiated by the 
same person. Nevertheless, in our case, we noted that: 
• most of the time, after his first login, the attacker 
changes the weak password into a strong which, from 
there on, remains unchanged. 
• when two different IP addresses access the same 
account (with the same password), they are very close 
and belong to the same country or company.  
These two remarks lead us to believe that there is in 
general only one person associated to the intrusions for a 
particular account. 
Account Number of 
intrusions 
Number of 
passwords 
Number of IP 
addresses 
UA2 1 1 1 
UA4 13 2 2 
UA5 1 1 1 
UA8 1 1 1 
UA10 9 2 2 
UA13 6 1 5 
UA16 5 1 3 
UA17 2 1 1 
Table 3: Number of intrusions per account 
3.3.1. Type of the attackers: humans or 
programs. Before analyzing what intruders do when 
connected, we can try to identify who they are. They can 
be of two different natures. Either they are humans, or 
they are programs which reproduce simple behaviors. For 
all intrusions but 12, intruders have made mistakes when 
typing commands. Mistakes are identified when the 
intruder uses the backspace to erase a previously entered 
character. So, it is very likely that such activities are 
carried out by a human, rather than programs. 
When an intruder did not make any mistake, we 
analyzed how the data were transmitted from the attacker 
machine to the honeypot. We can note that, for ssh 
communications, data transmission between the client 
and the server is asynchronous. Most of the time, the ssh 
client implementation uses the function select() to get 
user input. So, when the user presses a key, this function 
ends and the program sends the corresponding value to 
the server. In the case of a copy and a paste into the 
terminal running the client, the select() function also 
ends, but the program sends all the values contained in 
the buffer used for the paste into the server. We can 
assume that, when tty_read() returns more than one 
character, these values have been sent after a copy and a 
paste. If all the activities during a connection are due to a 
copy and a paste, we can strongly assume that it is due to 
an automatic script. Otherwise, this is quite likely a 
human being who uses shortcuts from time to time (such 
as CTRL-V to paste commands into its ssh session). For 7 
out of the last 12 activities without mistakes, intruders 
have entered several commands on a character-by-
character basis. This, once again, seems to indicate that a 
human being is entering the commands. For the 5 others, 
their activities are not significant enough to conclude: 
they have only launched a single command, like w, which 
is not long enough to highlight a copy and a paste. 
Proceedings of the Sixth European Dependable Computing Conference (EDCC'06)
0-7695-2648-9/06 $20.00  © 2006
3.3.2. Attacker activities. The first significant 
remark is that all of the intruders change the password of 
the hacked account. The second remark is that most of 
them start by downloading some files. In all cases, but 
one, the attackers tried to download some malware to the 
compromised machines. In a single case, the attacker has 
first tried to download an innocuous, yet large, file to the 
machine (the binary for a driver coming from a known 
web site). This is probably a simple way to assess the 
connectivity quality of the compromised host. 
The command used by the intruders to download the 
software is wget. To be more precise, 21 intrusions upon 
38 include the wget command. These 21 intrusions 
concern all the hacked accounts.  As mentioned in 
section 2, outgoing http connections are forbidden by the 
firewall. Nevertheless, the intruders still have the 
possibility to download files through the ssh connection 
using sftp command (instead of wget). Surprisingly, we 
noted that only 30% of the intruders did use this ssh 
connection. 70% of the attackers were unable to download 
their malware due to the absence of http connectivity! 
Three explanations can be envisaged at this stage. First, 
they follow some simplistic cookbook and do not even 
known the other methods at their disposal to upload a file. 
Second, the machines where the malware resides do not 
support sftp. Third, the lack of http connectivity made 
the attacker suspicious and he decided to leave our 
system. Surprisingly, the first explanation seems to be the 
right one in our case as we noticed that the attackers leave 
after an unsuccessful wget and come back a few hours or 
days later, trying the same command again as if they were 
hoping it to work at that time. Some of them have been 
seen trying this several times. It can be concluded that:  
i) they are apparently unable to understand why the 
command fails, ii) they are not afraid to come back to the 
machine despite the lack of http connectivity,  
iii) applying such brute force attack reveals that they are 
not aware of any other method to upload the file. 
Once the attackers manage to download their malware 
using sftp, they try to install it (by decompressing or 
extracting files for example). 75% of the intrusions that 
installed software did not install it on the hacked account 
but rather on standard directories such as /tmp, /var/tmp 
or /dev/shm (which are directories with write access for 
everybody). This makes the hacker activity more difficult 
to identify because these directories are regularly used by 
the operating system itself and shared by all the users. 
Additionally, we have identified four main activities of 
the intruders. The first one is launching ssh scans on 
other networks but these scans have never tested local 
machines. Their idea is to use the targeted machine to 
scan other networks, so that it is more difficult for the 
administrator of the targeted network to localize them. 
The program used by most intruders, which is easy to find 
on the Internet, is pscan.c. 
The second type of activity consists in launching irc 
clients, e.g., emech [13] and psyBNC. Names of binary files 
have regularly been changed by intruders, probably in 
order to hide them. For example, the binary files of emech 
have been changed to crond or inetd, which are well 
known Unix binary file names and processes. 
The third type of activity is trying to become root. 
Surprisingly, such attempts have been observed for 3 
intrusions only. Two rootkits were used. The first one 
exploits two vulnerabilities: a vulnerability which 
concerns the Linux kernel memory management code of 
the mremap system call [14] and a vulnerability which 
concerns the internal kernel function used to manage 
process's memory heap [15]. This exploit could not 
succeed because the kernel version of our honeypot does 
not correspond to the version of the exploit. The intruder 
should have realized this because he checked the version 
of the kernel of the honeypot (uname -a). However, he 
launched this rootkit anyway and failed. The other rootkit 
used by intruders exploits a vulnerability in the program 
ld. Thanks to this exploit, three intruders became root 
but the buffer overflow succeeded only partially. Even if 
they apparently became root, they could not launch all 
desired programs (removing files for example caused 
access control errors). 
The last activity observed in the honeypot is related to 
phishing activities. It is difficult to make precise 
conclusions because only one intruder has attempted to 
launch such an attack. He downloaded a forged email and 
tried to send it through the local smtp agent. But, as far as 
we could understand, it looked like a preliminary step of 
the attack because the list of recipient emails was very 
short. It seems that is was just a preliminary test before 
the real deployment of the attack. 
3.3.3. Attackers skill. Intruders can roughly 
speaking be classified into two main categories. The most 
important one is relative to script kiddies. They are 
inexperienced hackers who use programs found on the 
Internet without really understanding how they work. The 
next category represents intruders who are more 
dangerous. They are named “black hat”. They can make 
serious damage on systems because they are expert in 
security and they know how to exploit vulnerabilities on 
various systems. 
As already presented in §3.3.2. (use of wget and sftp), 
we have observed that intruders are not as clever as 
expected. For example, for two hacked accounts, the 
intruders don't seem to really understand the Unix file 
access rights (it's very obvious for example when they try 
to erase some files whereas they don't have the required 
privileges). For these two same accounts, the intruders 
also try to kill the processes of other users. Many 
intruders do not try to delete the file containing the 
history of their commands or do not try to deactivate this 
history function (this file depends on the login shell used, 
it is .bash_history for example for the bash). Among the 
38 intrusions, only 14 were cleaned by the intruders (11 
have deactivated the history function and 3 have deleted 
the.bash_history file). This means that 24 intrusions left 
behind them a perfectly readable summary of their 
activity within the honeypot. 
The IP address of the honeypot is private and we have 
started another honeypot on this network. This second 
honeypot is not directly accessible from the outside, it is 
only accessible from the first honeypot. We have modified 
the /etc/motd file of the first honeypot (which is 
automatically printed on the screen during the login 
Proceedings of the Sixth European Dependable Computing Conference (EDCC'06)
0-7695-2648-9/06 $20.00  © 2006
process) and added the following message: “In order to 
use the software XXX, please connect to A.B.C.D”. 
In spite of this message, only one intruder has tried to 
connect to the second honeypot. We could expect that an 
experienced hacker will try to use this information. In a 
more general way, we have very seldom seen an intruder 
looking for other active machines on the same network. 
One important thing to note is relative to 
fingerprinting activity. No intruder has tried to check the 
presence of VMware software. For three hacked accounts, 
the intruders have read the contents of the file 
/proc/cpuinfo but that's all. None of the methods 
discussed on Internet was tested to identify the presence 
of VMware software [16,17]. This probably means that the 
intruders are not experienced hackers. 
4. Conclusion 
In this paper, we have presented the results of an 
experiment carried out over a period of 6 months during 
which we have observed the various steps that lead an 
attacker to successfully break into a vulnerable machine 
and his behavior once he has managed to take control 
over the machine. 
The findings are somehow consistent with the informal 
know how shared by security experts. The contributions of 
the paper reside in performing an experiment and 
rigorous analyses that confirm some of these informal 
assumptions. Also, the precise analysis of the observed 
attacks reveals several interesting facts. First of all, the 
complementarity between high and low interaction 
honeypots is highlighted as some explanations can be 
found by combining information coming from both set 
ups. Second, it appears that most of the observed attacks 
against port 22 were only partially automated and carried 
out by script kiddies. This is very different from what can 
be observed against other ports, such as 445, 139 and 
others, where worms have been designed to completely 
carry out the tasks required for the infection and 
propagation. Last, honeypot fingerprinting does not seem 
to be a high priority for attackers as none of them has 
tried the known techniques to check if they were under 
observation. It is also worth mentioning a couple of 
important missing observations. First, we did not observe 
scanners detecting the presence of the open ssh port and 
providing this information to other machines in charge of 
running the dictionary attack. This is different from 
previous observations reported in [9]. Second, as most of 
the attacks follow very simple and repetitive patterns, we 
did not observe anything that could be used to derive 
sophisticated scenarios of attacks that could be analyzed 
by intrusion detection correlation engine. Of course, at 
this stage it is too early to derive definite conclusions from 
this observation. 
Therefore, it would be interesting to keep doing this 
experiment over a longer period of time to see if things do 
change, for instance if a more efficient automation takes 
place. We would have to solve the problem of weak 
passwords being replaced by strong ones though, in order 
to see more people succeeding in breaking into the 
system. Also, it would be worth running the same 
experiment by opening another vulnerability into the 
system and verifying if the identified steps remain the 
same, if the types of attackers are similar. Could it be, at 
the contrary, that some ports are preferably chosen by 
script kiddies while others are reserved to some more elite 
attackers? This is something that we are in the process of 
assessing. 
Acknowledgement. This work has been partially 
supported by: 1) CADHo, a research action funded by the French 
ACI “Securité & Informatique” (www.cadho.org), 2) the 
CRUTIAL IST-027513 project (crutial.cesiricerca.it), and 3) the 
ReSIST IST- 026764  project (www.resist-noe.org). 
5. References 
[1] M. Bailey, E. Cooke, F. Jahanian, J. Nazario, The Internet 
motion sensor - a distributed blackhole monitoring 
system. Network and Distributed Systems Security Symp. 
(NDSS 2005), San Diego, USA, 2005. 
[2] CAIDA Project. Home Page of the CAIDA Project, 
http://www.caida.org. 
[3] http://www.dshield.org. Home page of the DShield.org 
Distributed Intrusion Detection System.  
[4] E. Alata, M. Dacier, Y. Deswarte, M. Kaaniche, K. 
Kortchinsky, V. Nicomette, V. Hau Pham, and F. Pouget, 
Collection and analysis of attack data based on honeypots 
deployed on the Internet. QOP 2005, 1st Workshop on 
Quality of Protection (co-located with ESORICS and 
METRICS), Sept. 15, Milan, Italy, 2005. 
[5] F. Pouget, M. Dacier, V. Hau Pham. Leurre.com: on the 
advantages of deploying a large scale distributed 
honeypot platform. In Proc. of ECCE'05, E-Crime and 
Computer Conference, Monaco, 2005. 
[6] Home page of Leurré.com: http://www.leurre.org. 
[7] Project Leurré.com. Publications web page: 
http://www.leurrecom.org/paper.htm. 
[8] M. Dacier, F. Pouget, H. Debar. Honeypots: practical 
means to validate malicious fault assumptions. 10th IEEE 
Pacific Rim Int. Symp., pp. 383--388, Tahiti, 2004. 
[9] F. Pouget, M. Dacier, V. Hau Pham, “Understanding 
threats: a prerequisite to enhance survivability of 
computing systems”, Int. Infrastructure Survivability 
Workshop IISW'04,  (25th IEEE Int. Real-Time Systems 
Symp. (RTSS 04)), Lisboa, Portugal, 2004. 
[10] E. Alata, V. Nicomette, M. Kaaniche, M. Dacier, M. Herrb, 
Lessons learned from the deployment of a high-
interaction honeypot: Extended version. LAAS Report, 
July 2006.  
[11] Inc. VMware. Available on: http://www.vmware.com 
[12] The PaX Team. Available on: http://pax.grsecurity.net.  
[13] EnergyMech team. Energymech. Available on: 
http://www.energymech.net. 
[14] US-CERT. Linux kernel mremap(2) system call does not 
properly check return value from do_munmap() function. 
Available on: http://www.kb.cert.org/vuls/id/981222.  
[15] US-CERT. Linux kernel do_brk() function contains 
integer overflow. http://www.kb.cert.org/vuls/id/981222. 
[16] J. Corey, Advanced honeypot identification and 
exploitation. Phrack, N 63, Available on: 
http://www.phrack.org/fakes/p63/p63-0x09.txt.  
[17] T. Holz and F. Raynal, Detecting honeypots and other 
suspicious environments. In Systems, Man and 
Cybernetics (SMC) Information Assurance Workshop. 
Proc. from the Sixth Annual IEEE, pages 29--36, 2005.
Proceedings of the Sixth European Dependable Computing Conference (EDCC'06)
0-7695-2648-9/06 $20.00  © 2006
ABSTRACT
  This paper presents an experimental study and the lessons learned from the
observation of the attackers when logged on a compromised machine. The results
are based on a six months period during which a controlled experiment has been
run with a high interaction honeypot. We correlate our findings with those
obtained with a worldwide distributed system of lowinteraction honeypots.

<|endoftext|><|startoftext|>
Introduction
The idea behind abstract (linear) potential theory, as developed by
Choquet [4], Fuglede [9] and Ohtsuka [15], is to replace the Euclidian
space Rd by some locally compact space X and the well-known Newto-
nian kernel by some other kernel function k : X×X → R∪{+∞}, and
∗ This work was started during the 3rd Summerschool on Potential Theory, 2004,
hosted by the College of Kecskemét, Faculty of Mechanical Engineering and Automa-
tion (GAMF). Both authors would like to express their gratitude for the hospitality
and the support received during their stay in Kecskemét.
† The second named author was supported by the Hungarian Scientific Research
Fund; OTKA 49448
http://arxiv.org/abs/0704.0859v1
to look at which “potential theoretic” assertions remain true in this gen-
erality (see the monograph of Landkof [12]). This approach facilitates
general understanding of certain potential theoretic phenomena and
allows also the exploration of fundamental principles like Frostman’s
maximum principle.
Although there is a vast work done considering energy integrals and
different notions of energies, the familiar notions of transfinite diame-
ter and Chebyshev constants in this abstract setting are sporadically
found, sometimes indeed inaccessible, in the literature, see Choquet [4]
or Ohtsuka [17]. In [4] Choquet defines transfinite diameter and proves
its equality with the Wiener energy in a rather general situation, which
of course covers the classical case of the logarithmic kernel on C. We
give a slightly different definition for the transfinite diameter that, for
infinite sets, turns out to be equivalent with the one of Choquet. The
primary aim of this note is to revisit the above mentioned notions and
related results and also to partly complement the theory.
We already remark here that Zaharjuta’s generalisation of transfi-
nite diameter and Chebyshev constant to Cn is completely different in
nature, see [24], whereas some elementary parts of weighted potential
theory (see, e.g., Mhaskar, Saff [13] and Saff, Totik [20]) could fit in
this framework.
The power of the abstract potential analytic tools is well illustrated
by the notion of the average distance number from metric analysis, see
Gross [11], Stadje [21]. The surprising phenomenon noticed by Gross is
the following: If (X, d) is a compact connected metric space, there al-
ways exists a unique number r(X) (called the average distance number
or the rendezvous number of X), with the property that for any finite
point system x1, . . . , xn ∈ X there is another point x ∈ X with average
distance
d(xj , x) = r(X).
Stadje generalised this to arbitrary continuous, symmetric functions
replacing d. Actually, it turned out, see the series of papers [6, 5, 7] and
the references therein, that many of the known results concerning av-
erage distance numbers (existence, uniqueness, various generalisations,
calculation techniques etc.), can be proved in a unified way using the
works of Fuglede and Ohtsuka. We mention for example that Frost-
man’s Equilibrium Theorem is to be accounted for the existence for
certain invariant measures (see Section 5 below). In these investigations
the two variable versions of Chebyshev constants and energies and even
their minimax duals had been needed, and were also partly available
due to the works of Fuglede [10] and Ohtsuka [16, 17], see also [6].
Another occurrence of abstract Chebyshev constants is in the study
of polarisation constants of normed spaces, see Anagnostopoulos, Ré-
vész [1] and Révész, Sarantopoulos [19].
Let us settle now our general framework. A kernel in the sense of
Fuglede is a lower semicontinuous function k : X × X → R ∪ {+∞}
[9, p. 149]. In this paper we will sometimes need that the kernel is
symmetric, i.e., k(x, y) = k(y, x). This is for example essential when
defining potential and Chebyshev constant, otherwise there would be
a left- and right-potential and the like.
Another assumption, however a bit of technical flavour, is the pos-
itivity of the kernel. This we need, because we would like to avoid
technicalities when integrating not necessarily positive functions. This
assumption is nevertheless not very restrictive. Since we usually con-
sider compact sets of X ×X, where by lower semicontinuity k is nec-
essarily bounded from below, we can assume that k ≥ 0. Indeed, as we
will see, energy, nth diameter and nth Chebyshev constant are linear in
constants added to k.
Denote the set of compactly supported Radon measures on X by
M(X), that is
M(X) := {µ : µ is a regular Borel measure on X,
µ has compact support, ‖µ‖ < +∞}.
Further, let M1(X) be the set of positive unit measures from M(X),
M1(X) := {µ ∈ M(X) : µ ≥ 0, µ(X) = 1}.
We say that µ ∈ M1(X) is supported on H if supp µ, which is
a compact subset of X, is in H. The set of (probability) measures
supported on H are denoted by M(H) (M1(H)).
Before recalling the relevant potential theoretic notions from [9] (see
also [15]), let us spend a few words on integrals (see [2, Ch. III-IV.]). Let
µ be a positive Radon measure on X. Then the integral of a compactly
supported continuous function with respect to µ is the usual integral.
The upper integral of a positive l.s.c. function f is defined as
f dµ := sup
0 ≤ h ≤ f
h ∈ Cc(X)
h dµ.
This definition works well, because by standard arguments (see, e.g.,
[2, Ch. IV., Lemma 1]) one has
k(x, y) = sup
0 ≤ h ≤ k
h ∈ Cc(X ×X)
h(x, y),
where, because of the symmetry assumption, it suffices to take only
symmetric functions h in the supremum.
What should be here noted, is that this notion of integral has all
useful properties that we are used to in case of Lebesgue integrals (note
also the necessity of the positivity assumptions).
The usual topology onM is the so-called vague topology which is a lo-
cally convex topology defined by the family {µ 7→
X f dµ : f ∈ Cc(X)}
of seminorms. We will only encounter this topology in connection with
families M of measures supported on subsets of the same compact set
K ⊂ X. In this case, the weak∗-topology (determined by C(K)) and
the vague topology coincide on M, Fuglede [9].
For a potential theoretic kernel k : X ×X → R+ ∪ {0} Fuglede [9]
and Ohtsuka [15] define the potential and the energy of a measure µ
Uµ(x) :=
k(x, y) dµ(y) , W (µ) :=
k(x, y) dµ(y) dµ(x).
The integrals exist in the above sense, although may attain +∞ as
well.
For a given set H ⊂ X its Wiener energy is
w(H) := inf
µ∈M1(H)
W (µ), (1)
see [9, (2) on p. 153].
One also encounters the quantities (see [9, p. 153])
U(µ) := sup
Uµ(x), V (µ) := sup
x∈ supp µ
Uµ(x).
Accordingly one defines the following energy functions
u(H) := inf
µ∈M1(H)
U(µ), v(H) := inf
µ∈M1(H)
V (µ).
In general, one has the relation
w ≤ v ≤ u ≤ +∞,
where in all places strict inequality may occur. Nevertheless, under our
assumptions we have the equality of the energies v and w, being gen-
erally different, see [9, p. 159]. More importantly, our set of conditions
suffices to have a general version of Frostman’s equilibrium theorem,
see Theorem 9.
In fact, at a certain point (in §4), we will also assume Frostman’s
maximum principle, which will trivially guarantee even u = v, that is,
the equivalence of all three energies treated by Fuglede.
Definition. The kernel k satisfies the maximum principle, if for every
measure µ ∈ M1
U(µ) = V (µ).
As our examples show in §5, this is essential also for the equivalence
of the Chebyshev constant and the transfinite diameter. Carleson [3,
Ch. III.] gives a class of examples satisfying the maximum principle:
Let Φ(r), r = |x|, x ∈ Rd be the fundamental solution of the Laplace
equation, i.e., Φ(|x−y|) the Newtonian potential on Rd. For a positive,
continuous, increasing, convex function H assume also that
H(Φ(r))rd−2 dr < +∞.
Then H ◦Φ satisfies the maximum principle; see [3, Ch. III.] and also
Fuglede [9] for further examples.
Let us now turn to the systematic treatment of the Chebyshev
constant and the transfinite diameter. We call a function g : X →
R log-polynomial, if there exist w1, . . . , wn ∈ X such that g(x) =
j=1 k(x,wj) for all x ∈ X. Accordingly, we will call the wjs and
n the zeros and the degree of g(x), respectively. Obviously the sum of
two log-polynomials is a log-polynomial again. The terminology here is
motivated by the case of the logarithmic kernel
k(x, y) = − log |x− y|,
where the log-polynomials correspond to negative logarithms of alge-
braic polynomials.
Log-polynomials give access to the definition of transfinite diameter
and the Chebyshev constant, see Carleson [3], Choquet [4], Fekete [8],
Ohtsuka [17] and Pólya, Szegő [18]. First we start with the “degree n”
versions, whose convergence will be proved later.
Definition. Let H ⊂ X be fixed. We define the nth diameter of H as
Dn(H) := inf
w1,...,wn∈H
(n− 1)n
1≤j 6=l≤n
k(wj , wl)
; (2)
or, if the kernel is symmetric
Dn(H) = inf
w1,...,wn∈H
(n− 1)n
1≤i<j≤n
k(wi, wj)
If H is compact, then due to the fact that k is l.s.c., Dn(H) is
attained for some points w1, . . . , wn ∈ H, which are then called n-Fekete
points. We will also use the term approximate n-Fekete points with the
obvious meaning. Note also that for a finite set H, #H = m and
n > m, there is always a point from the diagonal ∆ = {(x, x) : x ∈ H}
in the definition of Dn(H). This possibility is completely excluded by
Choquet in [4], thus allowing only infinite sets.
Definition. For an arbitrary H ⊂ X the nth Chebyshev constant of
H is defined as
Mn(H) := sup
w1,...,wn∈H
k(x,wk)
We are going to show that both nth diameters and nth Chebyshev
constants converge from below to some number (or +∞), which are
respectively called the transfinite diameter D(H) and the Chebyshev
constant M(H). The aim of this paper is to relate these quantities as
well as the Wiener energy of a set.
2. Chebyshev constant and transfinite diameter
We define the Chebyshev constant and the transfinite diameter of a
set H ⊂ X and proceed analogously to the classical case. It turns out,
though not very surprisingly, that in general the equality of these two
quantities does not hold.
First, we prove the convergence of nth diameters and nth Chebyshev
constants. This is for both cases classical, we give the proof only for
the sake of completeness, see, e.g., Carleson [3], Choquet [4], Fekete [8],
Ohtsuka [17] and Pólya, Szegő [18].
PROPOSITION 1. The sequence of nth diameters is monotonically
increasing.
Proof. Choose x1, . . . , xn ∈ H arbitrarily. If we leave out any index
i = 1, 2, . . . , n, then for the remaining n − 1 points we obtain by the
definition of Dn−1(H) that
(n− 1)(n − 2)
1 ≤ j 6= l ≤ n
j 6= i, l 6= i
k(xj , xl) ≥ Dn−1(H).
After summing up for i = 1, 2, . . . , n this yields
1≤j 6=l≤n
k(xj , xl) ≥ n ·Dn−1(H),
for each term k(xj , xl) occurs exactly n − 2 times. Now taking the
infimum for all possible x1, . . . , xn ∈ H, we obtain n · Dn(H) ≥ n ·
Dn−1(H), hence the assertion.
The limit D(H) := limn→∞Dn(H) is the transfinite diameter of H.
Similarly, the nth Chebyshev constants converge, too.
PROPOSITION 2. For any H ⊂ X, the Chebyshev constants Mn(H)
converge in the extended sense.
Proof. The sum of two log-polynomials, p(z) =
i=1 k(z, xi) with de-
gree n and q(z) =
j=1 k(z, yj) with degree m, is also a log-polynomial
with degree n+m. Therefore
(n+m)Mn+m ≥ nMn +mMm (3)
for all n,m follows at once. Should Mn(H) be infinity for some n,
then all succeeding terms Mn′(H), n
′ ≥ n are infinity as well, hence
the convergence is obvious. We assume now that Mn(H) is a finite
sequence. At this point, for the sake of completeness, we can repeat the
classical argument of Fekete [8].
Namely, let m,n be fixed integers. Then there exist l = l(n,m) and
r = r(n,m), 0 ≤ r < m nonnegative integers such that n = l ·m + r.
Iterating the previous inequality (3) we get
n ·Mn ≥ l
+ rMr = nMm + r(Mr −Mm).
Fixing now the value of m, the possible values of r remain bounded
by m, and the finitely many values of Mr −Mm’s are finite, too. Hence
dividing both sides by n, and taking lim infn→∞, we are led to
lim inf
Mn ≥ lim inf
Mr −Mm
= Mm .
This holds for any fixed m ∈ N, so taking lim supm→∞ on the right
hand side we obtain
lim inf
Mn ≥ lim sup
that is, the limit exists.
M(H) := limn→∞Mn(H) is called the Chebyshev constant of H.
In the following, we investigate the connection between the Chebyshev
constant M(H) and the transfinite diameter D(H).
THEOREM 3. Let k be a positive, symmetric kernel. For any n ∈ N
and H ⊂ X we have Dn(H) ≤ Mn(H), thus also D(H) ≤ M(H).
Proof. If Mn(H) = +∞, then the assertion is trivial. So assume
Mn(H) < +∞. By the quasi-monotonicity (see (3)) we have that for
all m ≤ n also Mm(H) is finite. We use this fact to recursively find
w1, . . . wn ∈ H such that k(wi, wj) < +∞ for all i < j ≤ n. At the
end we arrive at
1≤i<j≤n k(wi, wj) < +∞, hence Dn(H) < +∞. This
was our first aim to show, in the following this choice of the points
w1, . . . , wn will not play any role. Instead, for an arbitrarily fixed ε > 0,
we take, as we may, an “approximate n-Fekete point system” w1, . . . , wn
(n− 1)n
1≤i 6=j≤n
k(wi, wj) < Dn + ε. (4)
For any x ∈ H the points x,w1, . . . , wn form a point system of n + 1
points, so by the definition of Dn+1 we have
k(x,wi) +
1≤i 6=j≤n
k(wi, wj) ≥ n(n+ 1)Dn+1 ≥ n(n+ 1)Dn,
using also the monotonicity of the sequence Dn. This together with
(4) lead to
pn(x) :=
k(x,wi) ≥
n(n+ 1)
n(n− 1)
Dn + ε
Taking infimum of the left hand side for x ∈ H we obtain
pn(x) ≥ nDn −
n(n− 1)ε
By the very definition of the nth Chebyshev constant, n · Mn ≥
infx∈H pn(x) holds, hence Mn ≥ Dn − (n− 1)ε/2 follows. As this holds
for all ε > 0, we conclude Mn ≥ Dn.
Later we will show that, unlike the classical case of C, the strict
inequality D < M is well possible.
3. Transfinite diameter and energy
We study the connection between the energy w and the transfinite
diameter D. Without assuming the maximum principle we can prove
the equivalence of these two quantities for compact sets. This result
can actually be found in a note of Choquet [4]. There is however a
slight difference to the definitions of Choquet in [4]. There the diagonal
was completely excluded from the definition of D, that is the infimum
in (2) is taken over wi 6= wj, i 6= j and not for systems of arbitrary
wj’s . This means, among others, that in [4] the transfinite diameter is
only defined for infinite sets. The other assumption of Choquet is that
the kernel is infinite on the diagonal. This is completely the contrary
to what we assume in Theorem 8. Indeed, with our definitions of the
transfinite diameter one can even prove equality for arbitrary sets if
the kernel is finite-valued.
THEOREM 4. Let k be an arbitrary kernel and H ⊂ X be any set.
Then D(H) ≤ w(H).
Proof. Let µ ∈ M1(H) be arbitrary, and define ν :=
j=1 µ the
product measure on the product space Xn. We can assume that the
kernel is positive because supp µ, and hence supp ν, is compact so we
can add a constant to k such that it will be positive on these supports.
Consider the following lower semicontinuous functions g and h on Xn
g : (x1, . . . , xn) 7→ Dn(H)
:= inf
(w1,...,wn)∈Xn
n(n−1)
1≤i 6=j≤n
k(wi, wj)
h : (x1, . . . , xn) 7→
n(n−1)
1≤i 6=j≤n
k(xi, xj).
Since 0 ≤ g ≤ h, by the definition of the upper integral the following
holds true
Dn(H) ≤
n(n− 1)
1≤i 6=j≤n
k(xi, xj) dν(x1, . . . , xn)
n(n− 1)
1≤i 6=j≤n
k(xi, xj) dµ(xi) dµ(xj) = W (µ).
Taking infimum in µ yields Dn(H) ≤ w(H), hence also D(H) ≤ w(H).
To establish the converse inequality we need a compactness as-
sumption. With the slightly different terminology, Choquet proves the
following for kernels being +∞ on the diagonal ∆. The arguments there
are very similar, except that the diagonal doesn’t have to be taken care
of in [4]. We give a detailed proof.
PROPOSITION 5 (Choquet [4]). For an arbitrary kernel function k
the inequality D(K) ≥ w(K) holds for all K ⊆ X compact sets.
Proof. First of all the l.s.c. function k attains its infimum on the
compact set K × K. So by shifting k up we can assume that it is
positive, and the validity of the desired inequality is not influenced by
this.
If D(K) = +∞, then by Theorem 4 we have w(K) = +∞, thus
the assertion follows. Assume therefore D(K) < +∞, and let n ∈ N,
ε > 0 be fixed. Let us choose a Fekete point system w1, . . . , wn from
K. Put µ := µn := 1/n
i=1 δwi where δwi are the Dirac measures at
the points wi, i = 1, . . . , n. For a continuous function 0 ≤ h ≤ k with
compact support, we have
h dµ dµ =
i,j=1
h(wi, wj)
h(wi, wi) +
i,j=1
h(wi, wj)
h(wi, wi) +
i,j=1
k(wi, wj)
i,j=1
k(wi, wj)
Dn(K) ≤
+D(K)
using, in the last step, also the monotonicity of the sequenceDn (Propo-
sition 1). In fact, we obtain for n ≥ N = N(‖h‖, ε) the inequality
h dµ dµ ≤ D + ε. (5)
It is known, essentially by the Banach-Alaoglu Theorem, that for a
compact set K the measures of M1(K) form a weak
∗-compact subset
of M, hence there is a cluster point ν ∈ M1(K) of the set MN :=
{µn : n ≥ N} ⊂ M1(K). Let {να}α∈I ⊆ MN be a net converging to
ν. Recall that να⊗να weak
∗-converges to ν⊗ν. We give the proof. For
a function g ∈ C(K ×K), g(x, y) = g1(x) · g2(y) it is obvious that
g dνα dνα →
g dν dν. (6)
The set A of such product-decomposable functions g(x, y) = g1(x)g2(y)
is a subalgebra of C(K ×K), which also separates X ×X, since it is
already coordinatewise separating. By the Stone–Weierstraß theorem
A is dense in C(K ×K). From this, using also that the family MN of
measures is norm-bounded, we immediately get the weak∗-convergence
(6). All these imply
h dν dν ≤ D(K) + ε,
w(K) ≤ W (ν) :=
kdνdν = sup
0 ≤ h ≤ k
h ∈ Cc(X ×X)
hdνdν ≤ D(K)+ε,
for all ε > 0. This shows w(K) ≤ D(K).
COROLLARY 6 (Choquet [4]). For arbitrary kernel k and compact set
K ⊂ X, the equality D(K) = w(K) holds.
Proof. By compactness we can shift k up and therefore assume it is
positive. Then we apply Theorem 4 and Proposition 5.
The assumptions of Choquet [4] are the compactness of the set plus the
property that the kernel is +∞ on the diagonal (besides it is continuous
in the extended sense). This ensures, loosely speaking, that for a set K
of finite energy an energy minimising measure µ (i.e., for whichW (µ) =
w(K)) is necessarily non-atomic, moreover µ ⊗ µ is not concentrated
on the diagonal. Therefore to show equality of w with D, one has to
exclude the diagonal completely from the definition of the transfinite
diameter.
We however allow a larger set of choices for the point system in the
definition of D. Indeed, we allow Fekete points to coincide, and this also
makes it possible to define the transfinite diameter of finite sets. With
this setup the inequality D ≤ w is only simpler than in the case handled
by Choquet. Whereas, however surprisingly, the equality D(K) = w(K)
is still true for compact sets K but without the assumption on the
diagonal values of the kernel.
We will see in §5 Example 13 that even assuming the maximum prin-
ciple but lacking the compactness allows the strict inequality D < w.
This phenomena however may exist only in case of unbounded kernels,
as we will see below. In fact, we show that if the kernel is finite on the
diagonal, thenD = w holds for arbitrary sets. For this purpose, we need
the following technical lemma, which shows certain inner regularity
properties of D and is also interesting in itself.
LEMMA 7. Assume that the kernel k is positive and finite on the
diagonal, i.e., k(x, x) < +∞ for all x ∈ X. Then for an arbitrary
H ⊂ X we have
D(H) = inf
K ⊂ H
K compact
D(K) = inf
W ⊂ H
#W < ∞
D(W ). (7)
Proof. The inequality infD(K) ≤ infD(W ) is clear. For H ⊇ K the
inequality D(H) ≤ D(K) is obvious, so we can assume D(H) < +∞.
For ε > 0 let W = {w1, . . . , wn} be an approximate n-Fekete point set
of H satisfying (4). Then
D(W ) = lim
Dmn(W ) ≤ lim
mn(mn− 1)
1≤i′ 6=j′≤mn
k(wi′ , wj′),
where
wi′ :=
. . .
′ = i+ rn, r = 0, . . . ,m− 1
. . .
Set C := max{k(x, x) : x ∈ W}. So we find
D(W ) ≤ lim
mn(mn−1)
1≤i 6=j≤n
k(wi, wj) +
mn(mn−1)
1≤i≤n
k(wi, wi)
1≤i 6=j≤n
k(wi, wj) lim
mn(mn−1)
+ Cn lim
mn(mn−1)
1≤i 6=j≤n
k(wi, wj) ≤
(Dn(H) + ε) ≤ D(H) + ε.
This being true for all ε > 0, taking infimum we finally obtain
W ⊂ H
#W < ∞
D(W ) ≤ D(H).
Clearly, if k(x, x) = +∞ for all x ∈ W with a finite set #W = n,
then for all m > n we have Dm(W ) = +∞. Thus in particular for
kernels with k : ∆ → {+∞}, the above can not hold in general, at least
as regards the last part with finite subsets.
Now, completely contrary to Choquet [4] we assume that the kernel
is finite on the diagonal and prove D = w for any set. Hence an example
of D < w (see §5 Example 13) must assume k(x, x) = +∞ at least for
some point x.
THEOREM 8. Assume that the kernel k is positive and is finite on
the diagonal, that is k(x, x) < +∞ for all x ∈ X. Then for arbitrary
sets H ⊂ X, the equality D(H) = w(H) holds.
Proof. By Theorem 4 we have D(H) ≤ w(H). Hence there is nothing
to prove, if D(H) = +∞. Assume D(H) < +∞, and let ε > 0 be
arbitrary. By Lemma 7 we have for some n ∈ N a finite set W =
{w1, w2 . . . , wn} with D(H) + ε ≥ D(W ). In view of Proposition 5
we have D(W ) ≥ w(W ), and by monotonicity also w(W ) ≥ w(H). It
follows that D(H) + ε ≥ w(H) for all ε > 0, hence also the “≥” part
of the assertion follows.
4. Energy and Chebyshev constant
To investigate the relationship between the energy and the Cheby-
shev constant the following general version of Frostman’s Equilibrium
Theorem [9, Theorem 2.4] is fundamental for us.
THEOREM 9 (Fuglede). Let k be a positive, symmetric kernel and
K ⊂ X be a compact set such that w(K) < +∞. Every µ which
has minimal energy (µ ∈ M1(K),W (µ) = w(K)) satisfy the following
properties
Uµ(x) ≥ w(K) for nearly every1 x ∈ K,
Uµ(x) ≤ w(K) for every x ∈ supp µ,
Uµ(x) = w(K) for µ-almost every x ∈ X.
Moreover, if the kernel is continuous, then
Uµ(x) ≥ w(K) for every x ∈ K.
THEOREM 10. Let H ⊂ X be arbitrary. Assume that the kernel k is
positive, symmetric and satisfies the maximum principle. Then we have
Mn(H) ≤ w(H) for all n ∈ N, whence also M(H) ≤ w(H) holds true.
Proof. Let n ∈ N be arbitrary. First let K be any compact set.
We can assume w(K) < +∞, since otherwise the inequality holds
irrespective of the value of Mn(K). Consider now an energy-minimising
measure νK of K, whose existence is assured by the lower semicontinu-
ity of µ 7→
k dµ dµ and the compactness of M1(K), see [9, Theorem
2.3].
By the Frostman-Fuglede theorem (Theorem 9) we have UνK (x) ≤
w(K) for all x ∈ supp νK , so V (νK) ≤ w(K), and by the maximum
principle even
UνK (x) ≤ w(K) for all x ∈ X.
1 The set A of exceptional points is small in the sense w(A) = +∞.
Then for all w1, . . . , wn ∈ K
k(x,wj) ≤
k(x,wj) dνK(x) ≤ w(K) .
Taking supremum for w1, . . . , wn ∈ K, we obtain
w1,...,wn∈K
k(x,wj) ≤ w(K).
So Mn(K) ≤ w(K) for all n ∈ N.
Next let H ⊂ X be arbitrary. In view of the last form of (1), for all
ε > 0 there exists a measure µ ∈ M1(H), compactly supported in H,
with w(µ) ≤ w(H) + ε. Let W = {w1, . . . , wn} ⊂ H be arbitrary and
define pW (x) :=
i k(x,wi).
Consider the compact set K := W ∪ supp µ ⊂ H. By definition of
the energy, supp µ ⊂ K implies w(K) ≤ w(µ), hence w(K) ≤ w(H) +
ε. Combining this with the above, we come to Mn(K) ≤ w(H) + ε.
Since W ⊂ K, by definition of Mn(K) we also have
pW (x) ≤ Mn(K). (8)
The left hand side does not increase, if we extend the inf over the
whole of H, and the right hand side is already estimated from above
by w(H) + ε. Thus (8) leads to
pW (x) ≤ w(H) + ε.
This holds for all possible choices of W = {w1, . . . , wn} ⊂ H, hence is
true also for the sup of the left hand side. By definition of Mn(H) this
gives exactly Mn(H) ≤ w(H) + ε, which shows even Mn(H) ≤ w(H).
Remark. In [6] it is proved that M(H) = q(H), where
q(H) = inf
µ∈M1(H)
Uµ(x).
The idea behind is a minimax theorem, see also [16, 17]. Trivially
w(H) ≤ q(H) ≤ u(H). So the maximum principle implies M(H) =
w(H) = q(H) = u(H).
5. Summary of the Results. Examples
In this section, we put together the previous results, thus proving the
equality of the three quantities being studied, under the assumption
of the maximum principle for the kernel. Further, via several instruc-
tive examples we investigate the necessity of our assumptions and the
sharpness of the results.
THEOREM 11. Assume that the kernel k is positive, symmetric and
satisfies the maximum principle. Let K ⊂ X be any compact set. Then
the transfinite diameter, the Chebyshev constant and the energy of K
coincide:
D(K) = M(K) = w(K).
Proof. We presented a cyclic proof above, consisting of M ≥ D
(Theorem 3), D ≥ w (Proposition 5) and finally w ≥ M (Theorem 10).
THEOREM 12. Assume that the kernel k is positive, finite and sat-
isfies the maximum principle. For an arbitrary subset H ⊂ X the
transfinite diameter, the Chebyshev constant and the energy of H co-
incide:
D(H) = M(H) = w(H).
Proof. By finiteness D = w, due to Theorem 8. This with D ≤ M
and M ≤ w (Theorems 3 and 10) proves the assertion.
Remark. In the above theorem, logically it would suffice to assume
that the kernel be finite only on the diagonal. But if this was the case,
the maximum principle would then immediately imply the finiteness of
the kernel everywhere.
Let us now discuss how sharp the results of the preceding sections
are. In the first example we show that, if we drop the assumption of
compactness the assertions of Theorem 3, Theorem 4 and Theorem 10
are in general the strongest possible.
Example 13. Let X = N ∪ {0} endowed with discrete topology and
the kernel
k(n,m) :=
+∞ if n = m,
0 if 0 6= n 6= m 6= 0,
1 otherwise.
The kernel is symmetric, l.s.c. and has the maximum principle. This
latter can be seen by noticing that for a probability measure µ ∈ M1(X)
the potential is +∞ on the support of µ. Indeed, since X is countable,
all measures µ ∈ M1(X) are necessarily atomic, and if for some point
ℓ ∈ X we have µ({ℓ}) > 0, then by definition
X k(x, y) dµ(y) = +∞.
We calculate the studied quantities of the set H = X (also as in all
the examples below). Since the kernel is positive, Dn ≥ 0. On the other
hand, choosing w1 := 1, . . . , wn := n, all the values k(wi, wj) will be
exactly 0, so it follows that Dn = 0, n = 1, 2, . . ., and hence D = 0.
The Chebyshev constant can be estimated from below, if we compute
the infimum of a suitably chosen log-polynomial. Consider the log-
polynomial p(x) with all zeros placed at 0, that is with w1 = . . . =
wn = 0. Then the log-polynomial p(x) is
j k(x,wj) = n · k(x, 0). If
x 6= 0, we have p(x) = n, which gives M ≥ 1. The upper estimate of
M is also easy: suppose that in the system w1, . . . wn there are exactly
m points being equal to 0 (say the first m). Then
p(x) =
+∞ x = w1, . . . , wn,
n x = 0, x 6= w1, . . . , wn (if m = 0)
m x 6= 0, x 6= w1, . . . , wn
This shows for the corresponding log-polynomial inf p(x) = m, so
Mn ≤ 1, whence M = 1.
The energy is computed easily. Using the above reasoning on the
maximum principle, we see W (µ) = +∞ for any µ ∈ M1(X), hence
w(X) = +∞.
Thus we have an example of
+∞ = w > M > D = 0.
The above example completes the case of the kernel with maximum
principle. Let us now drop this assumption and look at what can
happen.
Example 14. Let X := {−1, 0, 1} be endowed with the discrete topol-
ogy. We define the kernel by
k(x, y) :=
2 if 0 ≤ |x− y| < 2,
0 if 2 = |x− y|.
Then k is continuous and bounded on X×X. This, in any case, implies
D = w by Theorem 8. Note that k does not satisfy the maximum
principle. To see this, consider, e.g., the measure µ = 1
δ1. Then
for the potential Uµ one has Uµ(1) = Uµ(−1) = 1 and Uµ(0) = 2,
which shows the failure of the maximum principle.
To estimate the nth diameter from above, let us consider the point
system {wi} of n = 2m points with m points falling at −1 and m points
falling at 1, while no points being placed at 0. Then by definition of
Dn := Dn(X) one can write
n(n− 1)
Dn ≤ 2
· 2 +m2 · 0 =
Applying this estimate for all even n = 2m as n → ∞, it follows that
D = lim
Dn ≤ 1. (9)
Next we estimate the Chebyshev constants from below by computing
the infimum of some special log-polynomials. For pn(x) = k(x, 0) one
has pn(x) ≡ 2 = inf pn. We thus find Mn ≥ 2 and M ≥ 2, showing
M > D, as desired.
Example 15. Let X := N with the discrete topology. Then X is
a locally compact Hausdorff space, and all functions are continuous,
hence l.s.c. on X. Let k : X ×X → [0,+∞] be defined as
k(n,m) :=
+∞ if n = m,
2−n−m if n 6= m.
Clearly k is an admissible kernel function. For the energy we have
again w(X) = +∞, see Example 13.
On the other hand let n ∈ N be any fixed number, and compute the
nth diameter Dn(X). Clearly if we choose wj := m+ j, with m a given
(large) number to be chosen, then we get
Dn(H) ≤
(n− 1)n
1≤i 6=j≤n
2−i−j−2m ≤
(n− 1)n
≤ 2−2m ,
hence we find that the nth diameter is Dn(X) = 0, so D(X) = 0,
too. For any log-polynomial p(x) we have inf p(x) = limx→∞ p(x) = 0,
hence M(X) = 0. That is we have D(X) = M(X) = 0 < w(X) = +∞.
The example shows how important the diagonal, excluded in the
definition of D but taken into account in w, may become for particular
cases. We can even modify the above example to get finite energy.
Example 16. Let X := (0, 1], equipped with the usual topology, and
let xn = 1/n. We take now
k(x, y) :=
+∞ if x = y,
2−n−m if x = xn and y = xm (xn 6= xm),
− log |x− y| otherwise
Compared to the l.s.c. logarithmic kernel, this k assumes different,
smaller values at the relatively closed set of points {(xn, xm) : n 6=
m} ⊂ X ×X only, hence it is also l.s.c. and thus admissible as kernel.
If a measure µ ∈ M1(X) has any atom, say if for some point z ∈ X
we have µ({z}) > 0, then by definition
X k(x, y) dµ(y) = +∞, hence
also w(µ) = +∞. Since for all µ ∈ M1(X) with any atomic component
w(µ) = +∞, we find that for the set H := X we have
w(H) := inf
µ∈M1(H)
w(µ) = inf
µ∈M1(H)
µ not atomic
w(µ).
But for measures without atoms, the countable set of the points xn are
just of measure zero, hence the energy equals to the energy with respect
to the logarithmic kernel. Thus we conclude w(H) = e−cap(H) = e−1/4,
as cap((0, 1]) = 1/4 is well-known.
On the other hand if n ∈ N is any fixed number, we can compute
the nth diameter Dn(H) exactly as above in Example 15. Hence it is
easy to see that Dn(H) = 0, whence also D(H) = 0. Similarly, we find
M(H) = 0, too.
This example shows that even in case w(H) < +∞ we can have
w(H) > D(H) = M(H).
6. Average distance number and the maximum principle
In the previous section, we showed the equality of the Chebyshev con-
stant M and the transfinite diameter D, using essentially elementary
inequalities and the only theoretically deeper ingredient, the assump-
tion of the maximum principle. We have also seen examples showing
that the lack of the maximum principle for the kernel allows strict
inequality between M and D. These observations certify to the rel-
evance of this principle in our investigations. Indeed, in this section
we show the necessity of the maximum principle in case of continuous
kernels for having M(K) = D(K) for all compact sets K. We need
some preparation first.
Recall from the introduction the notion of the average distance (or
rendezvous) number. Actuyally, a more general assertion than there can
be stated, see Stadje [21] or [6]. For a compact connected set K and a
continuous, symmetric kernel k, the average distance number r(K) is
the uniquely existing number with the property that for all probability
measures supported in K there is a point x ∈ K with
Uµ(x) =
k(x, y) dµ(y) = r(K).
This can be even further generalised by dropping the connectedness,
see Thomassen [22] and [6]. Even for not necessarily connected but
compact spacesK with symmetric, continuous kernel k there is a unique
number r(K) with the property that whenever a probability measure
on K and a positive ε are given, there are points x1, x2 ∈ K such that
Uµ(x1)− ε ≤ r(K) ≤ U
µ(x2) + ε.
This number is called the (weak) average distance number, and is par-
ticularly easy to calculate, when a probability measure with constant
potential is available. Such a measure µ is called then an invariant
measure. In this case the average distance number r(K) is trivially just
the constant value of the potential Uµ, see Morris, Nicholas [14] or [7].
It was proved in [7] that one always has M(K) = r(K), so once we
have an invariant measure, then the Chebyshev constant is again easy
to determine.
Also the Wiener energy w(K) has connection to invariant measures,
as shown by the following result, which is a simplified version of a more
general statement from [7], see also Wolf [23].
THEOREM 17. Let ∅ 6= K ⊂ X be a compact set and k be a continu-
ous, symmetric kernel. Then we have
r(K) ≥ w(K).
Furthermore, if r(K) = w(K), then there exists an invariant measure
in M1(K).
As mentioned above, we have r(K) = M(K), so the inequality r(K) ≥
w(K) in the first assertion of the above theorem is also the conse-
quence of Theorems 3 and 8. For the proof of the second assertion
one can use the Frostman-Fuglede Equilibrium Theorem 9 with the
obvious observation that “nearly every” in this context means indeed
“every”. Actually any probability measure µ ∈ M1(K) which minimises
ν 7→ supK U
ν is an invariant measure and its potential is constant
M(K), see [7, Thm. 5.2] (such measures undoubtedly exist because of
compactness of M1(K)). Henceforth we will indifferently use the terms
energy minimising or invariant for expressing this property of measures.
THEOREM 18. Suppose that the kernel k is symmetric and continu-
ous. If M(K) = D(K) for all compact sets K ⊆ X, then the kernel has
the maximum principle.
Proof. Recall from Corollary 6 that D(K) = w(K) for all K ⊆ X
compact. So we can use Theorem 17 all over in the following arguments.
We first prove the assertion in the case when X is a finite set. The
proof is by induction on n = #X. For n = 1 the assertion is trivial.
Let now #X = 2, X = {a, b}. Assume without loss of generality that
k(a, a) ≤ k(b, b). Then we only have to prove that for µ = δa the
maximum principle, i.e., the inequality k(a, b) ≤ k(a, a) holds. To see
this we calculate M(X) and D(X). We certainly have D(X) ≤ k(a, a).
On the other hand for an energy minimising probability measure νp :=
pδa + (1 − p)δb on X we know that its potential is constant over X,
hence
pk(a, a) + (1− p)k(b, a) = pk(a, b) + (1− p)k(b, b)
= M(X) = D(X) ≤ k(a, a).
Here if p = 1, then k(a, a) = k(a, b). If p < 1, then we can write
(1− p)k(b, a) ≤ (1− p)k(a, a), hence k(b, a) ≤ k(a, a),
so the maximum principle holds.
Assume now that the assertion is true for all sets with at most n
elements and for all kernels, and let #X = n + 1. For a probability
measure µ on X we have to prove supx∈X U
µ(x) = supx∈ supp µ U
µ(x).
If supp µ = X, then there is nothing to prove. Similarly, if there are
two distinct points x1 6= x2, x1, x2 ∈ X \ supp µ, then by the induction
hypothesis we have
x∈X\{x1}
Uµ(x) = sup
x∈ supp µ
Uµ(x) = sup
x∈X\{x2}
Uµ(x).
So for a probability measure µ defying the maximum principle we
must have # supp µ = n, say supp µ = X \ {xn+1}; let µ be such a
measure. Set K = supp µ and let µ′ be an invariant measure on K.
We claim that all such measures µ′ are also violating the maximum
principle. If µ = µ′, we are done. Assume µ 6= µ′ and consider the
linear combinations µt := tµ+(1− t)µ
′. There is a τ > 1, for which µτ
is still a probability measure and supp µτ ( supp µ. By the inductive
hypothesis (as # supp µτ < n) we have U
µτ (xn+1) ≤ U
µτ (a) for some
a ∈ supp µτ . We also know that U
µ(xn+1) = U
µ1(xn+1) > U
µ1(a).
Hence for the linear function Φ(t) := Uµt(xn+1) − U
µt(a) we have
Φ(1) > 0 and also Φ(τ) ≤ 0 (τ > 1). This yields Φ(0) > 0, i.e.,
(xn+1) = U
µ0(xn+1) > U
µ0(a) = Uµ
(y) for all y ∈ K. We have
therefore shown that all energy minimising (invariant) measures on K
must defy the maximum principle.
Let now ν be an invariant measure on X. We have
M(X) = Uν(y) = sup
Uν(x) = D(X)
≤ D(K) = sup
(x) = Uµ
(z) < Uµ
(xn+1)
for all y ∈ X, z ∈ K. Thus we can conclude Uν(y) ≤ Uµ
(y) for all
y ∈ X and even “<” for y = xn+1. Integrating with respect to ν would
yield
k dν dν = M(X) <
k dµ′ dν =
k dν dµ′ = M(X),
hence a contradiction, unless ν({xn+1}) = 0. If ν({xn+1}) = 0 held,
then ν would be an energy minimising measure on K. This is because
obviously supp ν ⊆ K holds, and the potential of ν is constant M(X)
over K, so
M(X) =
k dν dµ′ =
k dµ′ dν = M(K) holds.
As we saw above, then ν would not satisfy the maximum principle,
a contradiction again, since the potential of ν is constant on X. The
proof of the case of finite X is complete.
We turn now to the general case of X being a locally compact space
with continuous kernel. Let µ be a compactly supported probability
measure on X and y 6∈ supp µ. Set K = supp µ and note that both
M1(K) ∋ ν 7→ supK U
ν and ν 7→ Uν(y) are continuous mappings with
respect to the weak∗-topology on M1(K). If supK U
µ < Uµ(y) were
true, we could therefore find, by a standard approximation argument,
see for example [6, Lemma 3.8], a finitely supported probability measure
µ′ on K for which
x∈ supp µ′
(x) ≤ sup
(x) < Uµ
This is nevertheless impossible by the first part of the proof, thus the
assertion of the theorem follows.
Acknowledgement
The authors are deeply indebted to Szilárd Révész for his insightful
suggestions and for the motivating discussions.
References
1. Anagnostopoulos, V. and Sz. Gy. Révész: 2006, ‘Polarization constants for
products of linear functionals over R2 and C2 and Chebyshev constants of
the unit sphere’. Publ. Math. Debrecen 68(1–2), 75–83.
2. Bourbaki, N.: 1965, Intégration, Éléments de Mathématique XIII., Vol. 1175 of
Actualités Sci. Ind. Paris: Hermann, 2nd edition.
3. Carleson, L.: 1967, Selected Problems on Exceptional Sets, Vol. 13 of Van
Nostrand Mathematical Studies. D. Van Nostrand Co., Inc.
4. Choquet, G.: 1958/59, ‘Diamètre transfini et comparaison de diverses ca-
pacités’. Technical report, Faculté des Sciences de Paris.
5. Farkas, B. and Sz. Gy. Révész: 2005, ‘Rendezvous numbers in normed spaces’.
Bull. Austr. Math. Soc. 72, 423–440.
6. Farkas, B. and Sz. Gy. Révész: 2006a, ‘Potential theoretic approach to
rendezvous numbers’. Monatshefte Math 148, 309–331.
7. Farkas, B. and Sz. Gy. Révész: 2006b, ‘Rendezvous numbers of metric spaces
– a potential theoretic approach’. Arch. Math. (Basel) 86, 268–281.
8. Fekete, M.: 1923, ‘Über die Verteilung der Wurzeln bei gewissen algebraischen
Gleichungen mit ganzahligen Koeffizienten’. Math. Z. 17, 228–249.
9. Fuglede, B.: 1960, ‘On the theory of potentials in locally compact spaces’. Acta
Math. 103, 139–215.
10. Fuglede, B.: 1965, ‘Le théorème du minimax et la théorie fine du potentiel’.
Ann Inst. Fourier 15, 65–87.
11. Gross, O.: 1964, ‘The rendezvous value of a metric space’. Ann. of Math. Stud.
52, 49–53.
12. Landkof, N. S.: 1972, Foundations of modern potential theory, Vol. 180 of
Die Grundlehren der mathematischen Wissenschaften. New York, Heidelberg:
Springer.
13. Mhaskar, H. N. and E. B. Saff: 1992, ‘Weighted analogues of capacity,
transfinite diameter and Chebyshev constants’. Constr. Approx. 8(1), 105–124.
14. Morris, S. A. and P. Nickolas: 1983, ‘On the average distance property of
compact connected metric spaces’. Arch. Math. 40, 459–463.
15. Ohtsuka, M.: 1961, ‘On potentials in locally compact spaces’. J. Sci. Hiroshima
Univ. ser A 1, 135–352.
16. Ohtsuka, M.: 1965, ‘An application of the minimax theorem to the theory of
capacity’. J. Sci. Hiroshima Univ. ser A 29, 217–221.
17. Ohtsuka, M.: 1967, ‘On various definitions of capacity and related notions’.
Nagoya Math. J. 30, 121–127.
18. Pólya, Gy. and G. Szegő: 1931, ‘Über den transfiniten Durchmesser (Ka-
pazitätskonstante) von ebenen und räumlichen Punktmengen’. J. Reine Angew.
Math. 165, 4–49.
19. Révész, Sz. Gy. and Y. Sarantopoulos: 2004, ‘Plank problems, polarization,
and Chebyshev constants’. J. Korean Math. Soc. 41(1), 157–174.
20. Saff, E. B. and V. Totik: 1997, Logarithmic potentials with external fields, Vol.
316 of Grundlehren der Mathematischen Wissenschaften. Springer, Berlin.
21. Stadje, W.: 1981, ‘A property of compact, connected spaces’. Arch. Math. 36,
275–280.
22. Thomassen, C.: 2000, ‘The rendezvous number of a symmetric matrix and a
compact connected metric space’. Amer. Math. Monthly 107(2), 163–166.
23. Wolf, R.: 1997, ‘On the average distance property and certain energy integrals’.
Ark. Mat. 35, 387–400.
24. Zaharjuta, V. P.: 1975, ‘Transfinite diameter, Chebishev constants, and
capacity for compacta in Cn’. Math. USSR Sbornik 25(3), 350–364.
ABSTRACT
  We study the relationship between transfinite diameter, Chebyshev constant
and Wiener energy in the abstract linear potential analytic setting pioneered
by Choquet, Fuglede and Ohtsuka. It turns out that, whenever the potential
theoretic kernel has the maximum principle, then all these quantities are equal
for all compact sets. For continuous kernels even the converse statement is
true: if the Chebyshev constant of any compact set coincides with its
transfinite diameter, the kernel must satisfy the maximum principle. An
abundance of examples is provided to show the sharpness of the results.

<|endoftext|><|startoftext|>
Introduction
Event logs have been widely used to analyze the
error/failure behavior of computer-based systems and
to estimate their dependability. Event logs include a
large amount of information about the occurrence of
various types of events that are collected concurrently
with normal system operation, and as such reflect
actual workload and usage. Some of the events are
informational and are issued from the normal activity
of the target systems, whereas others are recorded when
errors and failures affect local or distributed resources,
or are related to system shutdown and start-up. The
latter events are particularly useful for dependability
analysis.
Computer system dependability analysis based on
event logs has been the focus of several published
papers [1, 2, 4, 5, 7, 8, 9]. Various types of systems
have been studied (Tandem, VAX/VMS, Unix,
Windows NT, Windows 2000, etc.) including
mainframes and largely deployed commercial systems.
The issues addressed in these studies cover a large
spectrum, including the development of techniques and
methodologies for the extraction of relevant
information from the event logs, the identification of
error patterns, their causes and their effects, and the
statistical assessment of dependability measures such
as failure and recovery rates, reliability and availability.
It is widely recognized that such event log based
dependability analyses provide useful feedback to
software and system designers. Nevertheless, it is
important to note that the results obtained are
intimately related to the quality and the accuracy of the
data recorded in the logs. The study reported in [1]
points out various problems that might affect the data
included in the event logs and make incorrect
conclusions likely, considering as an example the
VAX/VMS system. Thus, extreme care is needed to
identify deficiencies in the data and to avoid that they
lead to incorrect conclusions.
In this paper, we show that similar problems can be
observed in the event logs maintained by the
SunOS/Solaris Unix operating system, and we present
a novel approach that is aimed to address such
problems and to improve the dependability estimates
based on such event logs. These results are illustrated
using field data collected during a 4-year period from
373 SunOS/Solaris Unix workstations and servers
interconnected through a LAN. The data corresponds to
event logs recorded via the syslog daemon. In
particular, we use var/adm/messages log files. We
focus on the evaluation of machine uptimes, downtimes
and availability based on the identification of failures
that caused a total service interruption of the machine.
In this study, we show that the consideration of the
information recorded in the var/adm/messages log
files only may lead to dependability estimations that do
not faithfully reflect reality due to incomplete or
imperfect data recorded in the corresponding logs. For
the estimation of these measures, we start with the
assumption that machine failures can be identified by
the last events recorded in the event log before the
machine goes down and then is rebooted. This
assumption was considered in the study reported in [3].
However, the validity of this assumption is
questionable in the following situations: 1) the machine
has a real activity between the last event logged and the
reboot without generating events in the logs, 2) the
time when the failure occurs is earlier than the
timestamp of the last event logged on the machine. To
address these problems and to obtain more realistic
estimations, we propose a solution based on utilization
of additional information obtained from wtmpx Unix
files, as well as data characterizing the state of the
machines included in the data collection that are
recorded at a regular basis during the data collection
procedure. The results clearly show that the combined
use of this additional information and syslogd log
files have a significant impact on the estimations.
To our knowledge, the approach discussed in this
paper and the corresponding results have not been
addressed in the previous studies published on the
exploitation of syslogd log files for the dependability
analysis of Unix based systems, including our paper
The rest of the paper is structured into 5 sections.
Section 2 describes the event logging mechanism in
Unix and the data collection procedure that we have
used in our study. Section 3 presents the dependability
measures that we have considered and discusses
different approaches and assumptions to estimate them
from the collected data. Section 4 presents some results
illustrating the benefits of the proposed approach, as
well as various statistics characterizing the
dependability of the Unix systems considered in our
study.
2. Event logging and data collection
For the Unix operating system, the event logging
mechanism is implemented by the syslog daemon
(denoted as syslogd). Running as a background
process, this daemon listens for the events generated by
different sources: kernel, system components (disk,
memory, network interfaces), daemons and
applications that are configured to communicate with
syslogd. These events inform about the normal
activity of the system as well as its behavior under the
occurrence of errors and failures including reboot and
shutdown events. The configuration file
/etc/syslog.conf specifies the destination of each
event received by syslogd, depending on its severity
level and its origin. The destination could be one or
several log files, the administration console or the
operator.
The events that are relevant to our study are
generally stored in the /var/adm/messages log file.
Each message stored in a log file refers to an event that
occurred on the system due to the local activity or its
interaction with other systems on the network. It
contains the following information: the date and time
of the event, the machine name on which the event is
logged and a description of the message. An example
of an event recorded in the log file is given below:
Mar 2 10:45:12 elgar automountd[124]:
server mahler not responding
The SunOS/Solaris Unix operating system limits the
size of the log files. Generally, only the log files
corresponding to the last 5 weeks of activity are kept. It
is necessary to set up a data collection strategy in order
to archive a large amount of data. This is essential to
obtain representative results for the dependability
measures characterizing the monitored systems.
In our study, we have included all the
SunOS/Solaris machines connected through the LAAS
local area network, excluding those used for
experimental testbeds or maintenance activities. We
have developed a data collection strategy to
automatically collect the /var/adm/messages log
files stored on these machines. This strategy takes into
account the frequent evolution of the network
configuration during the observation period in terms of
variation of the number of connected systems, updates
or changes of the operating system versions,
modification of software configurations, etc. A shell
script executed each week via the cron mechanism
implements the strategy and remotely copies the log
files from each system included in the study and
archives them on a dedicated machine. After each data
collection campaign, a text file (named DCSummary)
containing a summary of the data collection campaign
is created. This summary indicates the status of each
machine included in the campaign and how the
collection of the corresponding log file has been done.
For each machine, the status information reported in
the summary is one of the following:
• alive_OK: the machine is alive and the copy of its log
file succeeded;
• alive_KO: the machine is alive but the copy of its log
file failed. For this case, a description of the failure
symptom and cause is also included: shell problem,
connection ended by tiers, etc.
• no_answer: the machine did not answer to a ping
request before expiration of the default timeout
period.
The information included in the DCSummary file is
used to verify each data collection campaign and solve
the problems that may appear during the collection. It
is also useful to improve the accuracy of dependability
measures estimation (see Section 3.2). More detailed
information about the syslogd mechanism and the
data collection strategy are reported in [6].
3. Dependability measures estimation and
assumptions
Various types of dependability analyses can be
carried out based on the information contained in the
log files and several quantitative measures can be
considered to characterize the dependability of the
target machines: machine uptimes and downtimes,
reliability, availability, failure and recovery rates, etc.
In order to evaluate these measures, it is necessary to
identify from the log files the failure occurrences and
the corresponding service degradation durations. Such
task is tedious and requires the development of
heuristics and predefined failure criteria. An example
of such analysis is reported in [7].
In our study, we have focused on the availability
analysis of the individual machines included in the data
collection. In this context, we have considered machine
failures leading to a total interruption of the service
delivered to the users, followed by a reboot. The time
between the failure occurrence and the end of the
reboot corresponds to the total service interruption
period of the system. Apart from these periods, the
system is considered to be in the normal functioning
state where it delivers an appropriate service to the
users.
In order to evaluate the availability of the machines
included in the study, we need to estimate for each
machine the corresponding uptimes (denoted as UTi)
and downtimes (DTi), based on the information
recorded in the event logs. Each downtime value DTi
corresponds to the total service interruption period
associated to the i
failure. It is composed by the
service degradation period due to the failure occurrence
and the reboot period. Each uptime value corresponds
to the period between two successive downtimes.
Using the uptime and downtime estimates for each
machine j, we can evaluate the corresponding
availability (noted Aj) and the unavailability (noted
UAj). These measures are computed with the following
formulas:
UAj =� UTi ⁄ �(UTi +DTi) and UAj = 1 - UAj (1)
3.1. Machine uptimes and downtimes
estimation
The estimation of machine uptimes and downtimes
is carried out in two steps:
1) Identification of machine reboots and their
duration.
2) Identification of failures associated to each reboot
and of the corresponding service interruption
period.
To identify the occurrence of machine reboots and
their duration, we have developed an algorithm based
on the sequential parsing and matching of each event
recorded in the system log files to specific patterns or
sequences of patterns characterizing the occurrence of
reboots. Indeed, whereas some reboots can be explicitly
identified by a “reboot” or a “shutdown” event, many
others can be detected only by identifying the sequence
of the initialization events that are generated by the
system when it is restarted. The algorithm is described
in [4, 6]. It gives, for each reboot i identified in the
event logs and for each machine, the timestamp of the
reboot start (dateSBi), the timestamp of the reboot end
(dateEBi) and the associated service interruption
duration.
The identification of the timestamp of the failure
associated to each reboot and the corresponding service
interruption period is more problematic. In the study
reported in [3], it was assumed that the timestamp of
the last event recorded before the reboot (denoted as
dateEBRi) identifies the failure occurrence time. With
this assumption, each uptime UTi and downtime DTi
can be evaluated as follows:
UTi = dateEBRi – dateEBi-1 and
DTi = dateEBi - dateEBRi (2)
where i is the index of the current reboot, i-1 the index
of the previous reboot.
The consideration of EBR for the estimation of UTi
and DTi parameters may not be realistic in the
following situations (denoted as S1 and S2):
S1) The system could be in a normal functioning state
during a period of time between EBR and the
following reboot although it does not generate any
event into the log files during that period.
S2) The beginning of the service interruption period
for the users could be prior to the timestamp of
the EBR event. This happens for instance when a
critical failure affects the machine in such a way
that it becomes completely unusable to the users,
without preventing the event logging mechanisms
from recording some messages into the log files.
A careful analysis of the data collected during our
study revealed that the above situations are common.
To address this problem and to improve downtime and
uptime estimation accuracy, it is necessary to use
auxiliary data that provides complementary information
on the activity of the target machines.
In this paper, we present a solution based on the
correlation of data collected from the
/var/adm/messages log files, with data issued from
wtmpx files also maintained by the SunOS/Solaris
operating system. We also use the information recorded
in the DCSummary file (see Section 2). The following
section presents the method developed to extract the
data from the wtmpx file and how we used this data to
adjust the estimation of machine uptimes and
downtimes.
3.2. Uptime and downtime estimations
refinement
3.2.1. wtmpx files. The SunOS/Solaris Unix operating
system records into the /var/adm/wtmpx binary file
information identifying the users login/logout. Through
the pseudo-user reboot it also records information on
the system reboots. The wtmpx file is organized into
records (named also entries) with a fixed size. Each
record has the format of a data structure with the
following fields:
• the user login name: “user”;
• the id associated to the current record in the
/etc/inittab file: “init_id”;
• the device name (console, lnxx): “device”;
• the process id: “pid”;
• the record type: “proc_type”;
• the exit status for a process marked as
DEAD_PROCESS: “exit_status” and “term_status”;
• the timestamp of the record: “date”;
• the session id: “session_id”;
• the length of the machine’s name: “length”;
• the machine’s name used by the user to connect, if it
is a remote one: “host”.
We developed a specific algorithm that collects the
wtmpx file of each machine included in the study on a
regular basis and processes the binary file to extract the
information that is relevant to our study. The results of
the algorithm are kept in a separate file for each
machine. Figure 1 presents examples of records
obtained for a machine of our network.
The first two records show that the root user
connected to the local system from the system named
cubitus on November 6, 2001 at 16h 37mn 41s, using
the rlogin command. The next records inform about the
occurrence of a reboot event about 3 minutes later. The
third record shows that this reboot was done via a
shutdown command executed probably by the root
user. The sequence of records corresponding to a
reboot event is much longer than this example. The
whole sequence is not presented in Figure 1, the aim of
the illustration is to show some examples of records as
extracted from wtmpx files by our algorithm.
In the following, we outline the approach that we
developed to use the information extracted from the
wtmpx files together with the information from the
DCSummary files in order to refine the uptime and
downtime estimations, considering situations S1 and
S2 discussed in Section 3.1.
2001 Nov 6 16:37:41 user=.rlogin host=
length=0 init_id=r100
device=/dev/pts/1 pid=25220
proc_type=6 term_status=0
2001 Nov 6 16:37:41 user=root host=cubitus
length=8 init_id=r100
device=/dev/pts/1 pid=25220
proc_type=7 term_status=0
2001 Nov 6 16:40:35 user=shutdown host=
length=0 init_id=
device=~
pid=0 proc_type=0
term_status=0 exit_status=0
2001 Nov 6 16:41:39 user= host=
length=0 init_id=
device=system boot pid=0
proc_type=2 term_status=0
2001 Nov 6 16:42:09 user= host=
length=0 init_id=
device=run–level 3 pid=0
proc_type=1 term_status=0
Figure 1. Examples of records from /var/adm/wtmpx
obtained with our algorithm
3.2.2. Situation S1: an operational activity exists
between EBR and SB events. The detailed analysis of
the collected data from the log files and comparison
with the information extracted from wtmpx files
showed that the situation where a real activity exists
between the last event recorded before a reboot (EBR)
and the event identifying the start of the following
reboot (SB event) recorded in the
/var/adm/messages log files appears quite often.
This situation occurs when the machine functions
normally but its activity doesn’t produce any message
into the log file maintained by the syslogd daemon. The
cause could be that the applications or services run by
the users aren’t configured to communicate with the
syslogd daemon.
To better understand this case, Figure 2 gives an
example of a sequence of events characterizing the
state of the corresponding system, taking into account
the information extracted from the
/var/adm/messages, wtmpx and DCSummary files.
For each event, we indicate the timestamp when it is
logged, a short description and the source file from
which the event is extracted. For wtmpx events, we
present only the fields which are useful to identify the
system activity, the other fields are not significant for
this analysis.
For this example, the events recorded in the
/var/adm/messages log file let us believe that the
system had no activity between December 8 at 18:06
(EBR event) and December 9 at 15:30, the timestamp
of the reboot start. However, the analysis of the
DCSummary and wtmpx files shows that the system
had a real activity between EBR and SB events. In fact,
we see that the data collection campaign was
successfully carried out on December 9 at 6:43.
Event # Event date Event description File where the event is logged
..................
2002 Dec 8 18:06:08
2002 Dec 9 06:43:34
2002 Dec 9 13:18:45
2002 Dec 9 13:35:21
2002 Dec 9 13:47:57
2002 Dec 9 13:48:48
2002 Dec 9 15:18:46
2002 Dec 9 15:29:20
2002 Dec 9 15:29:25
2002 Dec 9 15:29:25
2002 Dec 9 15:29:27
..................
2002 Dec 9 15:29:52
2002 Dec 9 15:30:52
2002 Dec 9 15:30:52
2002 Dec 9 15:30:52
2002 Dec 9 15:30:52
2002 Dec 9 15:30:53
last event before reboot <EBR>
alive_ok
user=UserC; device=pts/0; pid=2362; proc_type=7
user=UserB; device=pts/1; pid=2379; proc_type=7
user=UserB; device=pts/1; pid=2379; proc_type=8
user=UserA; device=pts/1; pid=2434; proc_type=7
user=UserA; device=pts/1; pid=2434; proc_type=8
user=UserB; device=console; pid=2644; proc_type=7
user=UserB; device=console; pid=338; proc_type=8
user=UserB; device=console; pid=2644; proc_type=8
user=LOGIN; device=console; pid=2742; proc_type=6
..................
user=troot; device=console; pid=334; proc_type=7
user=sac; device=; pid=333; proc_type=8
user=troot; device=console; pid=334; proc_type=8
user=; device=run-level 6; pid=0; proc_type=1
user=rc6; device=; pid=2899; proc_type=5
reboot start <SB>
var/adm/messages log file
DCSummary
wtmpx
wtmpx
wtmpx
wtmpx
wtmpx
wtmpx
wtmpx
wtmpx
wtmpx
..................
wtmpx
wtmpx
wtmpx
wtmpx
wtmpx
var/adm/messages log file
Figure 2. Example illustrating situation S1
Moreover, the records from wtmpx file show, for
example, that UserA used the system on December 9
between 13:48 (information given by the proc_type
field value equal to 7, that is the process with pid=2434
started at the time of this record) and 15:18
(proc_type=8, the same process ended at the time of
this record), corresponding to an utilization period of
the system of nearly one hour and a half.
In this situation, the EBR event as defined earlier
doesn’t correspond to the beginning of the total service
interruption period. Thus, the estimated value of the
downtime parameter using the assumption discussed in
Section 3.1, does not faithfully reflect the real value of
the service interruption period. Based on the correlation
of the information provided by the three data source
files, a refined and more accurate estimation of
machine downtimes and uptimes could be obtained.
The refinement consists in associating the failure
occurrence time to the timestamps of the last event
recorded before the reboot based on the information
contained in /var/adm/messages, wtmpx and
DCSummary files.
3.2.3. Situation S2: the service interruption period
starts before the EBR event. This situation occurs
when critical failures affect the system in such a way
that it becomes completely unusable, without
preventing the event logging mechanisms from
recording some messages into the log files. During the
recovery phase, the actions performed by the system
administrators may include several unsuccessful reboot
attempts that are not recorded in the
/var/adm/messages log file, but some events
referring to them are written in the wtmpx file. Using
this information, just like in the previous case, we can
refine the downtime and uptime estimations by
associating the failure occurrence time to the
timestamps of the events recorded in the wtmpx file
that better reflects the start of the service interruption.
An example of a sequence of events illustrating this
case is given in Figure 3.
Event # Event date Event description File where the event is logged
2003 Jan 9 10:18:59
2003 Jan 9 10:21:39
2003 Jan 9 10:21:39
2003 Jan 9 10:21:39
2003 Jan 9 10:21:39
2003 Jan 9 10:21:48
2003 Jan 9 10:21:48
2003 Jan 9 10:22:05
2003 Jan 9 10:22:13
2003 Jan 9 10:22:13
2003 Jan 9 10:22:16
user=root; device=console; pid=2370; proc_type=7
user=sac; device=; pid=425; proc_type=8
user=root; device=console; pid=2370; proc_type=8
user=; device=run-level 5; pid=0; proc_type=1
user=rc5; device=; pid=25952; proc_type=5
user=UserC; device=pts/3; pid=11584; proc_type=8
user=UserC; device=pts/1; pid=11359; proc_type=8
last event before reboot <EBR>
user=rc5; device=; pid=25953; proc_type=8
user=uadmin;device=; pid=26121; proc_type=5
reboot start <SB>
wtmpx
wtmpx
wtmpx
wtmpx
wtmpx
wtmpx
wtmpx
var/adm/messages log file
wtmpx
wtmpx
var/adm/messages log file
Figure 3. Example illustrating situation S2
We can identify the events extracted from the
wtmpx file informing upon the stop of the system:
event # 2 with user field “sac” and proc_type “ 8”
(dead process) followed by events #3, #4, and #5
notifying the system run-level change to run-level 5
(this one is used to properly stop the system).
This example shows that the start of the service
interruption period is prior to the EBR event recorded
in the /var/adm/messages log file. The refinement
of the uptime and downtime estimations
corresponding to such situations consists in
associating the failure occurrence time to the
timestamps of the last event recorded in the wtmpx
file before the start of the reboot sequence.
4. Experimental results
The analyses presented in this Section are based on
/var/adm/messages log file data collected during
45 months (October 1999 – July 2003) from 418
SunOS/Solaris Unix workstations and servers
interconnected through the LAAS local area
computing network. As shown in Figure 4, the data
collection period differed significantly from one
machine to another due to the dynamic evolution of
the network. For more than 70 % of the machines, the
data collection period was longer than 21 months. On
the other hand, it can be noticed that some machines
have a quite short data collection period. In order to
have significant statistical analysis results, we
excluded from the analysis the machines for which
the data collection period was shorter than 2000 hours
(about 3 months). Consequently, the results presented
in the following concern 373 Unix machines. Among
these machines, 17 correspond to major servers for
the entire network or a sub-set of users: WWW,
NIS+, NFS, FTP, SMTP, file servers, printing
servers, etc.
Figure 4. Examples of records from
/var/adm/wtmpx obtained with our algorithm
The application of the reboot identification
algorithm on the collected data allowed us to identify
12805 reboots for the 373 machines, only 476 reboots
concern the 17 servers. Based on the information
provided by the reboot identification algorithm, we
evaluated for each machine the associated uptimes
UTi and downtimes DTi, and the availability measure.
The collection of wtmpx files started later than the
/var/adm/messages log files. For this reason, we
were able to analyze the impact of uptimes and
downtimes estimation refinement algorithms only on
a subset of UTi and DTi values associated with the
reboots identified from the log files. Among the
12805 reboots, this analysis concerned 6163 reboots
(48.13%). For the remaining 6642 reboots, the
corresponding data from the wtmpx files was not
available.
In the following, we first present in Section 4.1 the
results of machine uptimes and downtimes estimation
based on the processing of the set of 6163 reboots
focusing on the impact of the estimation refinement
algorithms. Then, global results taking into account
the whole data collected during our study are
presented in Section 4.2 in order to give an overall
picture on the availability and the rate of occurrence
of reboots characterizing the Unix machines included
in our study.
4.1. Machine uptimes and downtimes
estimation and refinement
The correlation of the information contained in the
/var/adm/messages log files, the wtmpx files, and
the DCSummary files, revealed that both situations S1
and S2 discussed in Section 3.2 are common:
• Situation S1 was observed for 79.35% of the
analyzed reboots;
• Situation S2 was observed for 10.77% of the
analyzed reboots;
For the 9.88% remaining reboots, the assumption
that the EBR recorded in /var/adm/messages file
identifies the last event recorded on the machine
before the reboot was consistent with the information
available in the wtmpx and the DCSummary files.
In order to analyze the impact of the estimation
refinement algorithms on the results, Table 1 gives
the Mean, Median and Standard Deviation of uptime
and downtime values, before and after the application
of our estimation refinement algorithms discussed in
Section 3. Considering the median of the downtime
values, it can be seen that the refinement algorithms
have a significant impact on the results. The median
estimated after the refinement is 66 times lower than
the value obtained without the refinement. The
refinement algorithms have also an impact on the
uptimes estimation, but as expected the improvement
factor is lower than the one observed for the
downtime values (1.8 compared to 66).
Table 1. Machine uptimes and downtimes
estimates before and after refinement
Uptimes UTi Downtimes DTi
before
refinement
after
refinement
before
refinement
after
refinement
Mean 28.3days 1.1 month 5.9 days 1.9 days
Median 6.1 days 10.8 days 8.9 hours 8.1 min
1.7 months 1.8 months 24.1 days 21.1 days
The impact of the estimation refinement
algorithms on availability is summarized in Table 2.
It can be seen the estimated average unavailability
after the refinement is three times lower than the
value estimated based only on the information in the
/var/adm/messages log files. Clearly, the
difference is significant and cannot be ignored.
Table 2. Impact of the estimation refinement
algorithms on Availability and Unavailability
before refinement after refinement
A 89.3% 96.3 %
UA 39.0 days/year 13.7 days/year
4.2. Availability and reboot rates estimated
from the whole data set
This section presents some results concerning the
reboot rates and the availability of the 373
SunOS/Solaris Unix machines included in our study
taking into account the whole set of 12805 reboots
identified from the /var/adm/messages files.
When the wtmpx files were not available (this
concerned 6642 reboots), the estimation of the UTi,
DTi, availability and reboot rates was based only on
the information in the /var/adm/messages files,
using the assumption discussed in Section 3.1. In the
other case (i.e., for the 6163 reboots), we applied the
estimation refinement algorithms presented in
Section 3.2.
Figure 5 plots the reboot rates estimated for each
machine as a function of the data collection period.
The estimated reboot rate for each machine
corresponds to the average number of reboots
recorded during the corresponding observation. It can
be seen that the reboot rates are uniformly distributed
between 10
/hour and 10
/hour.
Figure 5. Unix machines reboot rates as a function
of the data collection period
As indicated in Table 3, the mean value of
machine reboot rates is 1.3 10
/hour, when
considering all Unix machines including workstations
and servers. If we take into account only the servers,
the mean reboot rate is 1.5 times lower (7.7 10
/hour)
corresponding to one reboot every two months.
Table 3. Reboot rate statistics
Mean Median Std. Dev.
SunOS/Solaris
machines
(Workstations
+ Servers)
1.3 10
/h 1.0 10
/h 1.3 10
Servers only 7.7 10
/h 6.4 10
/h 5.6 10
The results illustrating the availability and
unavailability of the Unix machines including
workstations and servers are given in Figure 6 and
Table 4. The mean availability is 97.81 %
corresponding to an average unavailability of 8 days
per year. Detailed analysis shows that only 15 among
the 373 Unix machines included in the study have an
availability lower than 90%.
When considering only the servers, the estimated
availability varies between 99.36% and 99.1% with
an average unavailability of 12 hours per year.
Figure 6. SunOS/Solaris Unix machines
availability distribution
Table 4. Availability and Unavailability statistics
Mean Median Std. Dev.
A 97.81 % 98.79 % 3.07 %
UA 7.99 day/year 4.41 day/year 11.20 day/year
6. Conclusion
Dependability analyses based on event logs
provide useful feedback to software and system
designers. Nevertheless, the results obtained are
intimately related to the quality and the completeness
of the information recorded in the logs. As the
information contained in such event logs could be
incomplete or imperfect, it is important to use
additional sources of information to ensure that the
conclusions derived from such analyses faithfully
reflect reality. The approach investigated in this paper
is aimed to fulfill this objective considering
SunOS/Solaris Unix systems as an example.
In particular, we have shown that the combined us
of the data contained in the syslogd files and the
information recorded in the wtmpx files or through
the monitoring of systems state during the data
collection campaigns provides uptime and downtime
estimations that are closer to reality than the
estimations obtained based on syslogd files only.
This result is illustrated based on a large set of field
data collected from 373 machines during a 45 month
observation period.
In our future work, we will investigate the
applicability of the approach proposed in this paper to
other operating systems such as Linux, Windows 2K
and Mac OS X.
References
[1] M. F. Buckley, D. P. Siewiorek, “VAX/VMS Event
Monitoring and Analysis”, 25th IEEE Int. Symp. on
Fault-Tolerant Computing (FTCS-25), (Pasadena, CA,
USA), pp. 414-423, IEEE Computer Society, 1995.
[2] R. K. Iyer, D. Tang, “Experimental Analysis of
Computer System Dependability”, in Fault-Tolerant
Computer System Design, D. K. Pradhan, Ed., Prentice
Hall PTR, 1996, pp. 282-392.
[3] M. Kalyanakrishnam, Z. Kalbarczyk, R. K. Iyer,
“Failure Data Analysis of a LAN of Windows NT
Based Computers”, 18th IEEE Symp. on Reliable
Distributed Systems (SRDS-18), (Lausanne,
Switzerland), pp. 178-187, 1999.
[4] C. Simache, M. Kaâniche, “Measurement-based
Availability Analysis of Unix Systems in a Distributed
Environment”, The 12th Int. Symp. on Software
Reliability Engineering (ISSRE-2001), (Hong Kong,
China), pp. 346-355, IEEE Computer Society, 2001.
[5] C. Simache, M. Kaâniche, “Event Log based
Dependability Analysis of Windows NT and 2K
Systems”, 2002 Pacific Rim Int. Symposium on
Dependable Computing (PRDC-2002), (Tsukuba,
Japan), pp. 311-315, IEEE Computer Society, 2002.
[6] C. Simache, “Dependability evaluation of Unix and
Windows Systems based on operational data: A
Method and Application”, PhD Thesis, LAAS Report
N°04333, 2004 (in French).
[7] A. Thakur, R. K. Iyer, “Analyze-NOW — An
Environment for Collection & Analysis of Failures in a
Network of Workstations”, IEEE Transactions on
Reliability, vol. 45, pp. 561-570, 1996.
[8] M. Tsao, D. P. Siewiorek, “Trend Analysis on System
Error Files”, 13th IEEE Int. Symp. on Fault-Tolerant
Computing (FTCS-13), (Milano, Italy), pp. 116-119,
IEEE Computer Society, 1983.
[9] J. Xu, Z. Kalbarczyk, R. K. Iyer, “Networked
Windows NT System Field Failure Data Analysis”,
Proc. 1999 IEEE Pacific Rim Int. Symp. on
Dependable Computing (PRDC-1999), (Los Alamitos,
CA), pp. 178-185, 1999
  /ASCII85EncodePages false
  /AllowTransparency false
  /AutoPositionEPSFiles false
  /AutoRotatePages /None
  /Binding /Left
  /CalGrayProfile (None)
  /CalRGBProfile (None)
  /CalCMYKProfile (None)
  /sRGBProfile (sRGB IEC61966-2.1)
  /CannotEmbedFontPolicy /Error
  /CompatibilityLevel 1.3
  /CompressObjects /Off
  /CompressPages true
  /ConvertImagesToIndexed true
  /PassThroughJPEGImages true
  /CreateJDFFile false
  /CreateJobTicket false
  /DefaultRenderingIntent /Default
  /DetectBlends true
  /DetectCurves 0.1000
  /ColorConversionStrategy /LeaveColorUnchanged
  /DoThumbnails true
  /EmbedAllFonts true
  /EmbedOpenType false
  /ParseICCProfilesInComments true
  /EmbedJobOptions true
  /DSCReportingLevel 0
  /EmitDSCWarnings false
  /EndPage -1
  /ImageMemory 1048576
  /LockDistillerParams true
  /MaxSubsetPct 100
  /Optimize true
  /OPM 0
  /ParseDSCComments false
  /ParseDSCCommentsForDocInfo false
  /PreserveCopyPage true
  /PreserveDICMYKValues true
  /PreserveEPSInfo false
  /PreserveFlatness true
  /PreserveHalftoneInfo true
  /PreserveOPIComments false
  /PreserveOverprintSettings true
  /StartPage 1
  /SubsetFonts true
  /TransferFunctionInfo /Remove
  /UCRandBGInfo /Preserve
  /UsePrologue false
  /ColorSettingsFile ()
  /AlwaysEmbed [ true
  /NeverEmbed [ true
  /AntiAliasColorImages false
  /CropColorImages true
  /ColorImageMinResolution 150
  /ColorImageMinResolutionPolicy /OK
  /DownsampleColorImages true
  /ColorImageDownsampleType /Bicubic
  /ColorImageResolution 300
  /ColorImageDepth -1
  /ColorImageMinDownsampleDepth 1
  /ColorImageDownsampleThreshold 2.00333
  /EncodeColorImages true
  /ColorImageFilter /DCTEncode
  /AutoFilterColorImages false
  /ColorImageAutoFilterStrategy /JPEG
  /ColorACSImageDict <<
    /QFactor 0.76
    /HSamples [2 1 1 2] /VSamples [2 1 1 2]
  /ColorImageDict <<
    /QFactor 0.76
    /HSamples [2 1 1 2] /VSamples [2 1 1 2]
  /JPEG2000ColorACSImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 15
  /JPEG2000ColorImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 15
  /AntiAliasGrayImages false
  /CropGrayImages true
  /GrayImageMinResolution 150
  /GrayImageMinResolutionPolicy /OK
  /DownsampleGrayImages true
  /GrayImageDownsampleType /Bicubic
  /GrayImageResolution 300
  /GrayImageDepth -1
  /GrayImageMinDownsampleDepth 2
  /GrayImageDownsampleThreshold 2.00333
  /EncodeGrayImages true
  /GrayImageFilter /DCTEncode
  /AutoFilterGrayImages false
  /GrayImageAutoFilterStrategy /JPEG
  /GrayACSImageDict <<
    /QFactor 0.76
    /HSamples [2 1 1 2] /VSamples [2 1 1 2]
  /GrayImageDict <<
    /QFactor 0.76
    /HSamples [2 1 1 2] /VSamples [2 1 1 2]
  /JPEG2000GrayACSImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 15
  /JPEG2000GrayImageDict <<
    /TileWidth 256
    /TileHeight 256
    /Quality 15
  /AntiAliasMonoImages false
  /CropMonoImages true
  /MonoImageMinResolution 1200
  /MonoImageMinResolutionPolicy /OK
  /DownsampleMonoImages true
  /MonoImageDownsampleType /Bicubic
  /MonoImageResolution 600
  /MonoImageDepth -1
  /MonoImageDownsampleThreshold 1.00167
  /EncodeMonoImages true
  /MonoImageFilter /CCITTFaxEncode
  /MonoImageDict <<
    /K -1
  /AllowPSXObjects false
  /CheckCompliance [
    /None
  /PDFX1aCheck false
  /PDFX3Check false
  /PDFXCompliantPDFOnly false
  /PDFXNoTrimBoxError true
  /PDFXTrimBoxToMediaBoxOffset [
    0.00000
    0.00000
    0.00000
    0.00000
  /PDFXSetBleedBoxToMediaBox true
  /PDFXBleedBoxToTrimBoxOffset [
    0.00000
    0.00000
    0.00000
    0.00000
  /PDFXOutputIntentProfile (None)
  /PDFXOutputConditionIdentifier ()
  /PDFXOutputCondition ()
  /PDFXRegistryName (http://www.color.org)
  /PDFXTrapped /False
  /Description <<
    /JPN <FEFF3053306e8a2d5b9a306f300130d330b830cd30b9658766f8306e8868793a304a3088307353705237306b90693057305f00200050004400460020658766f830924f5c62103059308b3068304d306b4f7f75283057307e305930023053306e8a2d5b9a30674f5c62103057305f00200050004400460020658766f8306f0020004100630072006f0062006100740020304a30883073002000520065006100640065007200200035002e003000204ee5964d30678868793a3067304d307e30593002>
    /DEU <FEFF00560065007200770065006e00640065006e0020005300690065002000640069006500730065002000450069006e007300740065006c006c0075006e00670065006e0020007a0075006d002000450072007300740065006c006c0065006e00200076006f006e0020005000440046002d0044006f006b0075006d0065006e00740065006e002c00200075006d002000650069006e00650020007a0075007600650072006c00e40073007300690067006500200041006e007a006500690067006500200075006e00640020004100750073006700610062006500200076006f006e00200047006500730063006800e40066007400730064006f006b0075006d0065006e00740065006e0020007a0075002000650072007a00690065006c0065006e002e00200044006900650020005000440046002d0044006f006b0075006d0065006e007400650020006b00f6006e006e0065006e0020006d006900740020004100630072006f0062006100740020006f0064006500720020006d00690074002000640065006d002000520065006100640065007200200035002e003000200075006e00640020006800f600680065007200200067006500f600660066006e00650074002000770065007200640065006e002e>
    /FRA <FEFF004f007000740069006f006e00730020007000650072006d0065007400740061006e007400200064006500200063007200e900650072002000640065007300200064006f00630075006d0065006e007400730020005000440046002000700072006f00660065007300730069006f006e006e0065006c007300200066006900610062006c0065007300200070006f007500720020006c0061002000760069007300750061006c00690073006100740069006f006e0020006500740020006c00270069006d007000720065007300730069006f006e002e00200049006c002000650073007400200070006f0073007300690062006c0065002000640027006f00750076007200690072002000630065007300200064006f00630075006d0065006e007400730020005000440046002000640061006e00730020004100630072006f0062006100740020006500740020005200650061006400650072002c002000760065007200730069006f006e002000200035002e00300020006f007500200075006c007400e9007200690065007500720065002e>
    /PTB <FEFF005500740069006c0069007a006500200065007300740061007300200063006f006e00660069006700750072006100e700f5006500730020007000610072006100200063007200690061007200200064006f00630075006d0065006e0074006f0073002000500044004600200063006f006d00200075006d0061002000760069007300750061006c0069007a006100e700e3006f0020006500200069006d0070007200650073007300e3006f00200061006400650071007500610064006100730020007000610072006100200064006f00630075006d0065006e0074006f007300200063006f006d0065007200630069006100690073002e0020004f007300200064006f00630075006d0065006e0074006f0073002000500044004600200070006f00640065006d0020007300650072002000610062006500720074006f007300200063006f006d0020006f0020004100630072006f006200610074002c002000520065006100640065007200200035002e00300020006500200070006f00730074006500720069006f0072002e>
    /DAN <FEFF004200720075006700200064006900730073006500200069006e0064007300740069006c006c0069006e006700650072002000740069006c0020006100740020006f0070007200650074007400650020005000440046002d0064006f006b0075006d0065006e007400650072002c0020006400650072002000650072002000650067006e006500640065002000740069006c0020007000e5006c006900640065006c006900670020007600690073006e0069006e00670020006f00670020007500640073006b007200690076006e0069006e006700200061006600200066006f0072007200650074006e0069006e006700730064006f006b0075006d0065006e007400650072002e0020005000440046002d0064006f006b0075006d0065006e007400650072006e00650020006b0061006e002000e50062006e006500730020006d006500640020004100630072006f0062006100740020006f0067002000520065006100640065007200200035002e00300020006f00670020006e0079006500720065002e>
    /NLD <FEFF004700650062007200750069006b002000640065007a006500200069006e007300740065006c006c0069006e00670065006e0020006f006d0020005000440046002d0064006f00630075006d0065006e00740065006e0020007400650020006d0061006b0065006e00200064006900650020006700650073006300680069006b00740020007a0069006a006e0020006f006d0020007a0061006b0065006c0069006a006b006500200064006f00630075006d0065006e00740065006e00200062006500740072006f0075007700620061006100720020007700650065007200200074006500200067006500760065006e00200065006e0020006100660020007400650020006400720075006b006b0065006e002e0020004400650020005000440046002d0064006f00630075006d0065006e00740065006e0020006b0075006e006e0065006e00200077006f007200640065006e002000670065006f00700065006e00640020006d006500740020004100630072006f00620061007400200065006e002000520065006100640065007200200035002e003000200065006e00200068006f006700650072002e>
    /ESP <FEFF0055007300650020006500730074006100730020006f007000630069006f006e006500730020007000610072006100200063007200650061007200200064006f00630075006d0065006e0074006f0073002000500044004600200071007500650020007000650072006d006900740061006e002000760069007300750061006c0069007a006100720020006500200069006d007000720069006d0069007200200063006f007200720065006300740061006d0065006e0074006500200064006f00630075006d0065006e0074006f007300200065006d00700072006500730061007200690061006c00650073002e0020004c006f007300200064006f00630075006d0065006e0074006f00730020005000440046002000730065002000700075006500640065006e00200061006200720069007200200063006f006e0020004100630072006f00620061007400200079002000520065006100640065007200200035002e003000200079002000760065007200730069006f006e0065007300200070006f00730074006500720069006f007200650073002e>
    /SUO <FEFF004e00e4006900640065006e002000610073006500740075007300740065006e0020006100760075006c006c006100200076006f006900740020006c0075006f006400610020006a0061002000740075006c006f00730074006100610020005000440046002d0061007300690061006b00690072006a006f006a0061002c0020006a006f006900640065006e0020006500730069006b0061007400730065006c00750020006e00e400790074007400e400e40020006c0075006f00740065007400740061007600610073007400690020006c006f00700070007500740075006c006f006b00730065006e002e0020005000440046002d0061007300690061006b00690072006a0061007400200076006f0069006400610061006e0020006100760061007400610020004100630072006f006200610074002d0020006a0061002000520065006100640065007200200035002e00300020002d006f0068006a0065006c006d0061006c006c0061002000740061006900200075007500640065006d006d0061006c006c0061002000760065007200730069006f006c006c0061002e>
    /ITA <FEFF00550073006100720065002000710075006500730074006500200069006d0070006f007300740061007a0069006f006e00690020007000650072002000630072006500610072006500200064006f00630075006d0065006e007400690020005000440046002000610064006100740074006900200070006500720020006c00610020007300740061006d00700061002000650020006c0061002000760069007300750061006c0069007a007a0061007a0069006f006e006500200064006900200064006f00630075006d0065006e0074006900200061007a00690065006e00640061006c0069002e0020004900200064006f00630075006d0065006e00740069002000500044004600200070006f00730073006f006e006f0020006500730073006500720065002000610070006500720074006900200063006f006e0020004100630072006f00620061007400200065002000520065006100640065007200200035002e003000200065002000760065007200730069006f006e006900200073007500630063006500730073006900760065002e>
    /NOR <FEFF004200720075006b00200064006900730073006500200069006e006e007300740069006c006c0069006e00670065006e0065002000740069006c002000e50020006f00700070007200650074007400650020005000440046002d0064006f006b0075006d0065006e00740065007200200073006f006d002000700061007300730065007200200066006f00720020007000e5006c006900740065006c006900670020007600690073006e0069006e00670020006f00670020007500740073006b007200690066007400200061007600200066006f0072007200650074006e0069006e006700730064006f006b0075006d0065006e007400650072002e0020005000440046002d0064006f006b0075006d0065006e00740065006e00650020006b0061006e002000e50070006e006500730020006d006500640020004100630072006f0062006100740020006f0067002000520065006100640065007200200035002e00300020006f0067002000730065006e006500720065002e>
    /SVE <FEFF0041006e007600e4006e00640020006400650020006800e4007200200069006e0073007400e4006c006c006e0069006e006700610072006e00610020006e00e40072002000640075002000760069006c006c00200073006b0061007000610020005000440046002d0064006f006b0075006d0065006e007400200073006f006d00200070006100730073006100720020006600f600720020007000e5006c00690074006c006900670020007600690073006e0069006e00670020006f006300680020007500740073006b0072006900660074002000610076002000610066006600e4007200730064006f006b0075006d0065006e0074002e0020005000440046002d0064006f006b0075006d0065006e00740065006e0020006b0061006e002000f600700070006e006100730020006d006500640020004100630072006f0062006100740020006f00630068002000520065006100640065007200200035002e003000200065006c006c00650072002000730065006e006100720065002e>
    /ENU <FEFF005500730065002000740068006500730065002000730065007400740069006e0067007300200074006f0020006300720065006100740065002000500044004600200064006f00630075006d0065006e007400730020007300750069007400610062006c006500200066006f007200200049004500450045002000580070006c006f00720065002e0020004300720065006100740065006400200031003500200044006500630065006d00620065007200200032003000300033002e>
>> setdistillerparams
  /HWResolution [600 600]
  /PageSize [612.000 792.000]
>> setpagedevice
ABSTRACT
  This paper presents a measurement-based availability assessment study using
field data collected during a 4-year period from 373 SunOS/Solaris Unix
workstations and servers interconnected through a local area network. We focus
on the estimation of machine uptimes, downtimes and availability based on the
identification of failures that caused total service loss. Data corresponds to
syslogd event logs that contain a large amount of information about the normal
activity of the studied systems as well as their behavior in the presence of
failures. It is widely recognized that the information contained in such event
logs might be incomplete or imperfect. The solution investigated in this paper
to address this problem is based on the use of auxiliary sources of data
obtained from wtmpx files maintained by the SunOS/Solaris Unix operating
system. The results obtained suggest that the combined use of wtmpx and syslogd
log files provides more complete information on the state of the target systems
that is useful to provide availability estimations that better reflect reality.

<|endoftext|><|startoftext|>
Introduction 
Several initiatives have been developed during the 
last decade to monitor malicious threats and activities on 
the Internet, including viruses, worms, denial of service 
attacks, etc. Among them, we can mention the Internet 
Motion Sensor project [1], CAIDA [2], DShield [3], and 
CADHo [4]. These projects provide valuable 
information on security threats and the potential damage 
that they might cause to Internet users. Analysis and 
modeling methodologies are necessary to extract the 
most relevant information from the large set of data 
collected from such monitoring activities that can be 
useful for system security administrators and designers 
to support decision making. The designers are mainly 
interested in having representative and realistic 
assumptions about the kind of threats and vulnerabilities 
that their system will have to cope with once it is used in 
operation. Knowing who are the enemies and how they 
proceed to defeat the security of target systems is an 
important step to be able to build systems that can be 
resilient with respect to the corresponding threats. From 
the system security administrators’ perspective, the 
collected data should be used to support the 
development of efficient early warning and intrusion 
detection systems that will enable them to better react to 
the attacks targeting their systems. 
As of today, there is still a lack of methodologies and 
significant results to fulfill the objectives described 
above, although some progress has been achieved 
recently in this field. The CADHo project “Collection 
and Analysis  of Data from Honeypots” [4], an ongoing 
research action started in September 2004, is aimed at 
contributing to filling such a gap by carrying out the 
following activities:   
1)  deploying a distributed platform of honeypots [5] 
that gathers data suitable to analyze the attack 
processes targeting a large number of machines 
connected to the Internet;  
2)  developing analysis methodologies and modeling 
approaches to validate the usefulness of this 
platform by carrying out various analyses, based on 
the collected data, to characterize the observed 
attacks and model their impact on security. 
A honeypot is a machine connected to a network but 
that no one is supposed to use. In theory, no connection 
to or from that machine should be observed. If a 
connection occurs, it must be, at best an accidental error 
or, more likely, an attempt to attack the machine.  
The Leurré.com data collection environment  [5], set 
up in the context of the CADHo project, has deployed, 
as of to date, thirty five honeypot platforms at various 
locations from academia and industry, in twenty five 
countries over the five continents. Several analyses 
carried out based on the data collected so far from these 
honeypots have revealed that very interesting 
observations and conclusions can be derived with 
respect to the attack activities observed on the Internet 
[4, 6-9]. In addition, several automatic data analyses and 
clustering techniques have been developed to facilitate 
the extraction of relevant information from the collected 
data. A list of papers detailing the methodologies used 
and the results of these analyses is available in [6]. 
This paper focuses on modeling-related activities 
based on the data collected from the honeypots. We first 
discuss the objectives of such activities and the 
challenges that need to be addressed. Then we present 
some examples of models obtained from the data. 
The paper is organized as follows. Section 2 presents 
the data collection environment. Section 3 focuses on 
the modeling of attacks based on the data collected from 
the honeypots deployed. Modeling examples are 
presented in Section 4. Finally, Section 5 discusses 
future work. 
2.  The data collection environment  
The data collection environment (called Leurré.com 
[5]) deployed in the context of the CADHo project is 
based on low-interaction honeypots using the freely 
available software called honeyd [10]. Since September 
2004, 35 honeypot platforms have been progressively 
deployed on the Internet at various geographical 
locations. Each platform emulates three computers 
running Linux RedHat, Windows 98 and Windows NT, 
respectively, and various services such as ftp, web, etc. 
A firewall ensures that connections cannot be initiated 
from the computers, only replies to external solicitations 
are allowed. All the honeypot platforms are centrally 
managed to ensure that they have exactly the same 
configuration. The data gathered by each platform are 
securely uploaded to a centralized database with the 
complete content, including payload of all packets sent 
to or from these honeypots, and additional information 
to facilitate its analysis, such as the IP geographical 
localization of packets’ source addresses, the OS of the 
attacking machine, the local time of the source, etc. 
3. Modeling objectives 
Modeling involves three main steps: 
1) The definition of the objectives of the modeling 
activities and the quantitative measures to be 
evaluated. 
2) The development of one (or several) models that are 
suitable to achieve the specified objectives. 
3) The processing of the models and the analysis of the 
results to support system design or operation 
activities.  
The data collected from the honeypots can be 
processed in various ways to characterize the attack 
processes and perform predictive analyses. In particular, 
modeling activities can be used to: 
• Identify the probability distributions that best 
characterize the occurrence of attacks and their 
propagation through the Internet.  
• Analyze whether the data collected from different 
platforms exhibit similar or different malicious 
attack activities. 
• Model the time relationships that may exist between 
attacks coming from different sources (or to different 
destinations). 
• Predict the occurrence of new waves of attacks on a 
given platform based on the history of attacks 
observed on this platform as well as on the other 
platforms. 
For the sake of illustration, we present in the 
following sections simple preliminary models based on 
the data collected from our honeypots that are aimed at 
fulfilling such objectives. 
4. Examples 
The examples presented in the following address:  
1) The analysis of the time evolution of the number of 
attacks taking into account the geographic location 
of the attacking machine. 
2) The characterization and statistical modeling of the 
times between attacks. 
3) The analysis of the propagation of attacks throughout 
the honeypot platforms. 
The data considered for the examples has been 
collected from January 1st, 2004 to April 17, 2005, 
corresponding to a data collection period of 320 days.  
We take into account the attacks observed on 14 
honeypot platforms among those deployed so far. The 
selected honeypots correspond to those that have been 
active for almost the whole considered period. The total 
number of attacks observed on these honeypots is 
816476. These attacks are not uniformly distributed 
among the platforms. In particular, the data collected 
from three platforms represent more than fifty percent of 
the total attack activity. 
4.1 Attack occurrence and geographic distribution 
The preliminary models presented in this sub-section 
address: i) the time-evolution modeling of the number of 
attacks observed on different honeypot platforms, and ii) 
the analysis of potential correlations for the attack 
processes observed on the different platforms taking into 
account the geographic location of the attacking 
machines and the proportion of attacks observed on each 
platform, wrt. to the global attack activity. 
Let us denote by: 
− Y(t) the function describing the evolution of the 
number of attacks per unit of time observed on all 
the honeypots during the observation period,  
− Xj(t) the function describing the evolution of the 
number of attacks per unit of time observed on all 
the honeypots during the observation period for 
which the IP address of the attacking machine is 
located in country j . 
In a first stage, we have plotted, for various time 
periods, Y(t) and the curves Xj(t) corresponding to 
different countries j. Visual inspection showed 
surprising similarities between Y(t) and some Xj(t). To 
confirm such empirical observations, we have then 
decided to rigorously analyze this phenomenon using 
mathematical linear regression models.  
Considering a linear regression model, we have 
investigated if Y(t) can be estimated from the 
combination of the attacks described by Xj(t), taking into 
account a  limited number of countries j. Let us denote 
by Y*(t) the estimated model. 
Formally, Y*(t) is defined as follows: 
Y*(t) = Σαj Xj(t) + β         j= 1, 2, .. k (1) 
Constants αj and β  correspond to the parameters of 
the linear model that provide the best fit with the 
observed data, and k is the number of countries 
considered in the regression.  
The quality of fit of the model is measured by the 
statistics R2 defined by: 
R2 = Σ(Y*(i) – Yav)
 2/ Σ(Y (i) – Yav) 2 (2) 
Y (i) and Y*(i) correspond to the observed and estimated 
number of attacks for unit of time i, respectively. Yav is 
the average number of attacks per unit of time, taking 
into account the whole observation period.  
Indeed, R is the correlation factor between the 
estimated model and the observed values. The closer the 
R2 value is to 1, the better the estimated model fits the 
collected data.  
We have applied this model considering linear 
regressions involving one, two or more countries. 
Surprisingly, the results reveal that a good fit can be 
obtained by considering the attacks from one country 
only. For example, the models providing the best fit 
taking into account the total number of attacks from all 
the platforms are obtained by considering the attacks 
issued from either UK, USA, Russia or Germany only. 
The corresponding R2 values are of the same order of 
magnitude (0.944 for UK, 0.939 for USA, 0.930 for 
Russia and 0.920 for Germany), denoting a very good fit 
of the estimated models to the collected data. For 
example, the estimated model obtained when 
considering the attacks from Russia only is defined by 
equation (3): 
Y*(t) = 44.568 X1(t) + 1555.67 (3) 
X1(t) represents the evolution of the number of attacks 
from Russia. Figure 1 plots the evolution of the 
observed and estimated number of attacks per unit  of 
time during the data collection period considered in this 
example. The unit of time corresponds to 4 days. It is 
noteworthy that, similar conclusions are obtained if we 
consider another granularity for the unit of time, for 
example one day, or one week. 
These results are even more surprising that the 
attacks from Russia and UK represent only a small 
proportion of the total number of attacks (1.9% and 
3.7% respectively). Concerning the USA, although the 
proportion is higher (about 18%), it is not sufficient to 
explain the linear model. 
Figure 1- Evolution of the number of attacks per unit of time 
observed on all the platforms and estimated model considering 
attacks from Russia only 
We have applied similar analyses by respectively 
considering each honeypot platform in order to 
investigate if similar conclusions can be derived by 
comparing their attack activities per source country to 
their global attack activities. The results are summarized 
in Table 1. The second column identifies the source 
country that provides the best fit. The corresponding R2 
value is given in the third column. Finally, the last three 
columns give the R2 values obtained when considering 
UK, USA, or Russia in the regression model.  
It can be noticed that the quality of the regressions 
measured when considering attacks from Russia only is 
generally low for all platforms (R2 less than 0.80). This 
indicates that the property observed at the global level is 
not visible when looking at the local activities observed 
on each platform. However, for the majority of the 
platforms, the best regression models often involve one 
of the three following countries: USA, Germany or UK, 
which also provide the best regressions when analyzing 
the global attack activity considering all the platforms 
together. Two exceptions are found with P6 and P8 for 
which the observed attack activities exhibit different 
characteristics with respect to the origin of the attacks 
(Taiwan, China), compared to the other platforms.  
The trends discussed above have been also observed 
when considering a different granularity for the unit of 
time (e.g., 1 day or 1 week) as well as different data 
observation periods.  
Platform Country 
providing 
the best 
model 
Best 
model 
Russia 
P1 Germany 0.895 0.873 0.858 0.687 
P2 USA 0.733 0.464 0.733 0.260 
P4 Germany 0.722 0.197 0.373 0.161 
P5 Germany 0.874 0.869 0.872 0.608 
P6 UK 0.861 0.861 0.699 0.656 
P8 Taiwan 0.796 0.249 0.425 0.212 
P9 Germany 0.754 0.630 0.624 0.631 
P11 China 0.746 0.303 0.664 0.097 
P13 Germany 0.738 0.574 0.412 0.389 
P14 Germany 0.708 0.510 0.546 0.087 
P20 USA 0.912 0.787 0.912 0.774 
P21 SPAIN 0.791 0.620 0.727 0.720 
P22 USA 0.870 0.176 0.870 0.111 
P23 USA 0.874 0.659 0.874 0.517 
Global UK 0.944 0.944 0.939 0.930 
Table 1 – Estimated models for each platform: correlation 
factors for the countries providing the best fit and for UK, USA 
and Russia 
To summarize, two main findings can be derived 
from the results presented above:  
1) Some trends exhibited at the global level considering 
the attack processes on all the platforms together are 
not observed when analyzing each platform 
individually (this is the case, for example, of attacks 
from Russia). On the other hand, we have observed 
the other situation where the trends observed 
globally are also visible locally on the majority of 
the platforms (this is the case, for example, of attacks 
from USA, UK and Germany). 
2) The attack processes observed on each platform are 
very often highly correlated with the attack processes 
originating from a particular country. The country 
providing the best regressions locally, does not 
necessary exhibit high correlations when considering 
other platforms or at the global level. These trends 
seem to result from specific factors that govern the 
attack processes observed on each platform. 
4.2 Distribution of times between attacks 
In this example, we focus on the analysis and the 
modeling of the times between attacks observed on 
different honeypot platforms.  
Let us denote by ti, the time separating the 
occurrence of attack i and attack (i-1). Each attack is 
associated to an IP address, and its occurrence time is 
defined by the time when the first packet is received 
from the corresponding address at one of the three 
virtual machines of the honeypot platform. All the 
packets received from the same IP address within 24 
hours are supposed to belong to the same attack session.  
We have analyzed the distribution of the times 
between attacks observed on each honeypot platform. 
Our objective was to find analytical models that 
faithfully reflect the empirical data collected from each 
platform. In the following, we summarize the results 
obtained considering 5 platforms for which we have 
observed the highest attack activity.  
4 .2.1 Empirical analyses 
Table 2 gives the number of intervals of times 
between attacks observed at each platform considered in 
the analysis as well as the corresponding number of IP 
addresses. As illustrated by Figure 2, most of these 
addresses have been observed only once at a given 
platform. Nevertheless, some IP addresses have been 
observed several times, the maximum number of visits 
per IP address for the five platforms was 57, 96, 148, 
183, and 83 (respectively). Indeed, the curves plotting 
the number of IP addresses as a function of the number 
of attacks for each address follow a heavy-tailed power 
law distribution. It is noteworthy that such distributions 
have been observed in many performance and 
dependability related studies in the context of the 
Internet, e.g., transfer and interarrival times, burst sizes, 
sizes of files transferred over the web, error rates in web 
servers, etc. 
 P5 P6 P9 P20  P23 
Number of ti 85890 148942 46268 224917 51580 
Number of IP 
addresses 
79549 90620 42230 162156 47859 
Table 2 - Numbers of intervals of times between attacks  (ti) and 
of different IP addresses observed at each platform  
Figure 2- Number of IP addresses versus the number of attacks 
per IP address observed at each platform  (log-log scale) 
4 .2.2 Modeling 
Finding tractable analytical models that faithfully 
reflect the observed times between attacks is useful to 
characterize the observed attack processes and to find 
appropriate indicators that can be used for prediction 
purposes. We have investigated several candidate 
distributions, including Weibull, Lognormal, Pareto, and 
the Exponential distribution, which are traditionally 
used in reliability related studies. The best fit for each 
platform has been obtained using a mixture model 
combining a Pareto and an exponential distribution.  
Let us denote by T the random variable 
corresponding to the time between the occurrence of two 
consecutive attacks at a given platform, and t a 
realization of T. Assuming that the probability density 
function pdf(t) associated to T is characterized by a 
mixture distribution combining a Pareto distribution and 
an exponential distribution, then f(t) is defined as 
follows. 
pdf (t) = P
(t +1)
+ (1" P
k is the index parameter of the Pareto distribution, λ is 
the rate associated to the exponential distribution and Pa 
is a probability. 
We have used the R statistical package [11] to estimate 
the parameters that provide the best fit to the collected 
data. The quality of fit is assessed by applying the 
Kolmogorov-Smirnov statistical test. The results are 
presented in Figure 3. It can be noticed that for all the 
platforms, the mixed distribution provides a good fit to 
the observed data whereas the exponential distribution is 
not suitable to describe the observed attack processes. 
Thus, the traditional assumption considered in hardware 
reliability evaluation studies assuming that failures 
occur according to a Poisson process does not seem to 
be satisfactory when considering the data observed form 
our honeypots. These results have been also confirmed 
when considering the data collected during other 
observation periods.  
1 31 61 91 121 151 181 211 241 271
Time between attacks (seconds)
Pa = 0.0051
k = 0.173
! = 0.121/sec.
p-value = 0.90
Data Mixture (Pareto, Exp.)
Exponential
1 31 61 91 121 151 181 211 241 271
Time between attacks (seconds)
Mixture (Pareto, Exp.)
Exponential
Pa = 0.0115
k = 0.1183
! = 0.1364/sec.
p-value = 0.999
 a) P5 b) P6 
1 31 61 91 121 151 181 211 241 271
Time between attacks (seconds)
Mixture (Pareto, Exp.)
Exponential
Pa = 0.0019
k = 0.1668
! = 0.276/sec.
p-value = 0.99
1 31 61 91 121 151 181 211 241 271
Time between attacks (seconds)
Mixture (Pareto, Exp.)
Exponential
Pa = 0.0144
k = 0.0183
! = 0.0136/sec.
p-value = 0.90
 c) P9 d) P20 
1 31 61 91 121 151 181 211 241 271
Time between attacks (seconds)
Mixture (Pareto, Exp.)
Exponential
Pa = 0.0031
k = 0.1240
! = 0.275/sec.
p-value = 0.985
e) P23 
Figure 3- Observed and estimated times between attacks probability density functions. 
4.3 Propagation of attacks 
Besides analyzing the attack activities observed at 
each platform in isolation, it is useful to identify 
phenomena that reflect propagation of attacks through 
different platforms. In this section, we analyze simple 
scenarios where a propagation between two platforms is 
assumed to occur when the IP address of an attacking 
machine observed at a given platform is also observed at 
another platform. Such a situation might occur for 
example as a result of a scanning activity or might be 
resulting from the propagation of worms.  
For the sake of illustration, we restrict the analysis to 
the five platforms considered in the previous example. 
For each attacking IP address in the data collected from 
the five platforms during the period of the study, we 
identified: 1) all the occurrences with the same source 
address, 2) the times of each occurrence and 3) the 
platform on which each occurrence has been reported. A 
propagation is said to occur for this IP address from 
platform Pi to platform Pj when the next occurrence of 
this address is observed on Pj after visiting Pi.  
Based on this information we build a propagation 
graph where each node identifies a platform and a 
transition between two nodes identifies a propagation 
between the nodes. A probability is associated to each 
transition to characterize its likelihood of occurrence.  
Figure 4 presents the propagation graph obtained for 
the five platforms included in the analysis. Considering 
platforms P6 and P20, it can be seen that only a few IP 
addresses that attacked these platforms have been 
observed on the other platforms. The situation is 
different when considering platforms P5, P9, and P23.  
In particular, it can be noticed that propagation between 
P5 and P9 is highly probable. This is related in 
particular to the fact that the addresses of the 
corresponding platforms belong to the same /8 network 
domain. More thorough and detailed analyses are 
currently carried out based on the propagation graph in 
order to take into account timing information for the 
corresponding transitions and also the types of attacks 
observed, in order to better explain the propagation 
phenomena illustrated by the graph. 
Figure 4- Propagation graph 
5. Conclusion 
This paper presented simple examples and 
preliminary models illustrating various types of 
empirical analysis and modeling activities that can be 
carried out based on the data collected from honeypots 
in order to characterize attack processes. The honeypot 
platforms deployed so far in our project belong to the 
family of so-called “low interaction honeypots”. Thus, 
hackers can only scan ports and send requests to fake 
servers without ever succeeding in taking control over 
them. In our project, we are also interested in running 
experiments with “high interaction” honeypots where 
attackers can really compromise the targets. Such 
honeypots are suitable to collect data that would enable 
us to study the behaviors of attackers once they have 
managed to get access to a target and try to progress in 
the intrusion process to get additional privileges. Future 
work will be focused on the deployment of such 
honeypots and the exploitation of the collected data to 
better characterize attack scenarios and analyze their 
impact on the security of the target systems. The 
ultimate objective would be to build representative 
stochastic models that will enable us to evaluate the 
ability of computing systems to resist to attacks and to 
validate them based on real attack data. 
Acknowledgement. This work has been carried out in the 
context of the CADHo project, an ongoing research action 
funded by the French ACI “Securité & Informatique” 
(www.cadho.org). It is partially supported by the ReSIST 
European Network of Excellence (www .resist-noe.org). 
References 
[1] M. Bailey, E. Cooke, F. Jahanian, J. Nazario, and D. 
Watson, "The Internet Motion Sensor: A Distributed 
Blackhole Monitoring System," Proc. 12th Annual 
Network and Distributed System Security Symposium 
(NDSS), San Diego, CA, Feb. 2005. 
[2] Home Page of the CAIDA Project, http://www.caida.org/ 
[3] DShield Distributed Detection System homepage, 
http://www.honeynet.org/ 
[4] E. Alata, M. Dacier, Y. Deswarte, M. Kaâniche, K. 
Kortchinsky, V. Nicomette, V.H. Pham, F. Pouget,  
Collection and Analysis of Attack data based on 
honeypots deployed on the Internet”, 1st Workshop on 
Quality of Protection, Milano, Italy, September 2005.  
[5]  F. Pouget, M. Dacier, V. H. Pham, “Leurré.com: On the 
Advantages of Deploying a Large Scale Distributed 
Honeypot Platform”, Proc. E-Crime and Computer 
Evidence Conference (ECCE 2005), Monaco, Mars 2005. 
[6] L. Spitzner, Honeypots: Tracking Hackers, Addison-
Wesley, ISBN from-321-10895-7, 2002 
[7] Project Leurré.com. Publications web page, 
http://www.leurrecom.org/paper.htm  
[8] M. Dacier, F. Pouget, H. Debar, “Honeypots: Practical 
Means to Validate Malicious Fault Assumptions on the 
Internet”, Proc. 10th IEEE International Symposium 
Pacific Rim  Dependable Computing (PRDC10), Tahiti, 
March 2004, pages 383-388. 
[9] M. Dacier, F. Pouget, H. Debar, “Attack Processes found 
on the Internet”, Proc. OTAN Symp. on Adaptive Defense 
in Unclassified Networks, Toulouse, France, April 2004. 
[10] Honeyd Home page, 
http://www.citi.umich.edu/u/provos/honeyd/ 
[11] R statistical package Home page, http://www.r-project.org
ABSTRACT
  Honeypots are more and more used to collect data on malicious activities on
the Internet and to better understand the strategies and techniques used by
attackers to compromise target systems. Analysis and modeling methodologies are
needed to support the characterization of attack processes based on the data
collected from the honeypots. This paper presents some empirical analyses based
on the data collected from the Leurr{\'e}.com honeypot platforms deployed on
the Internet and presents some preliminary modeling studies aimed at fulfilling
such objectives.

<|endoftext|><|startoftext|>
Draft version October 24, 2018
Preprint typeset using LATEX style emulateapj v. 08/22/09
THE LOW CO CONTENT OF THE EXTREMELY METAL POOR GALAXY I ZW 18
Adam Leroy
, John Cannon
, Fabian Walter
, Alberto Bolatto
, Axel Weiss
Draft version October 24, 2018
ABSTRACT
We present sensitive molecular line observations of the metal-poor blue compact dwarf I Zw 18
obtained with the IRAM Plateau de Bure interferometer. These data constrain the CO J = 1 → 0
luminosity within our 300 pc (FWHM) beam to be LCO < 1×10
5 K km s−1 pc2 (ICO < 1 K km s
−1), an
order of magnitude lower than previous limits. Although I Zw 18 is starbursting, it has a CO luminosity
similar to or less than nearby low-mass irregulars (e.g. NGC 1569, the SMC, and NGC 6822). There is
less CO in I Zw 18 relative to its B-band luminosity, H I mass, or star formation rate than in spiral or
dwarf starburst galaxies (including the nearby dwarf starburst IC 10). Comparing the star formation
rate to our CO upper limit reveals that unless molecular gas forms stars much more efficiently in
I Zw 18 than in our own galaxy, it must have a very low CO-to-H2 ratio, ∼ 10
−2 times the Galactic
value. We detect 3mm continuum emission, presumably due to thermal dust and free-free emission,
towards the radio peak.
Subject headings: galaxies: individual (I Zw 18); galaxies: ISM; galaxies: dwarf, radio lines: ISM
1. INTRODUCTION
With the lowest nebular metallicity in the nearby uni-
verse (12+ logO/H ≈ 7.2, Skillman & Kennicutt 1993),
the blue compact dwarf I Zw 18 plays an important role
in our understanding of galaxy evolution. Vigorous ongo-
ing star formation implies the presence of molecular gas,
but direct evidence has been elusive. Vidal-Madjar et al.
(2000) showed that there is not significant diffuse H2, but
Cannon et al. (2002) found ∼ 103 M⊙ of dust organized
in clumps with sizes 50 – 100 pc. Vidal-Madjar et al.
(2000) did not rule out compact, dense molecular clouds,
and Cannon et al. (2002) argued that this dust may in-
dicate the presence of molecular gas.
Observations by Arnault et al. (1988) and
Gondhalekar et al. (1998) failed to detect CO J = 1 → 0
emission, the most commonly used tracer of H2. This
is not surprising. The low dust abundance and intense
radiation fields found in I Zw 18 may have a dramatic
impact on the formation of H2 and structure of molecular
clouds. A large fraction of the H2 may exist in extended
envelopes surrounding relatively compact cold cores. In
these envelopes, H2 self-shields while CO is dissociated
(Maloney & Black 1988). The result may be that in
such galaxies [CII] or FIR emission trace H2 better than
CO (Madden et al. 1997; Israel 1997a; Pak et al. 1998).
Further, H2 may simply be underabundant, as there is a
lack of grains on which to form while photodissociation
is enhanced by an intense UV field. Indeed, Bell et al.
(2006) found that at Z = Z⊙/100, a molecular cloud
may take as long as a Gyr to reach chemical equilibrium.
A low CO content in I Zw 18 is then expected, and a
stringent upper limit would lend observational support to
predictions for molecular cloud structure at low metallic-
1 Max-Planck-Institut für Astronomie, Königstuhl 17, D-69117,
Heidelberg, Germany; email: leroy@mpia-hd.mpg.de
2 Astronomy Department, Wesleyan University, Middletown, CT
06459, cannon@astro.wesleyan.edu
3 Radio Astronomy Lab, UC Berkeley, 601 Campbell Hall,
Berkeley, CA, 94720
4 MPIfR, Auf dem Hügel 69, 53121, Bonn, Germany
ity. However, while the existing upper limits are sensitive
in an absolute sense, they do not even show I Zw 18 to
have a lower normalized CO content than a spiral galaxy
(e.g. less CO per B-band luminosity). The low luminos-
ity (MB ≈ −14.7, Gil de Paz et al. 2003) and large dis-
tance (d=14 Mpc, Izotov & Thuan 2004) of this system
require very sensitive observations to set a meaningful
upper limit.
In this letter we present observations, obtained with
the IRAM Plateau de Bure Interferometer (PdBI)5, that
constrain the CO luminosity, LCO, to be equal to or less
than that of nearby CO-poor (non-starbursting) dwarf
irregulars.
2. OBSERVATIONS
I Zw 18 was observed with the IRAM Plateau de
Bure Interferometer on 17, 21, and 27 April and 13
May 2004 for a total of 11 hours. The phase calibrators
were 0836+710 (Fν(115GHz) ≈ 1.1 Jy), and 0954+556
(Fν(115GHz) ≈ 0.35 Jy). One or more calibrators with
known fluxes were also observed during each track. The
data were reduced at the IRAM facility in Grenoble us-
ing the GILDAS software package; maps were prepared
using AIPS. The final CO J = 1 → 0 data cube has beam
size 5.59′′ × 3.42′′, and a velocity (frequency) resolution
of 6.5 km s−1 (2.5 MHz). The velocity coverage stretches
from vLSR ≈ 50 to 1450 km s
−1. The data have an RMS
noise of 3.77 mJy beam−1 (18 mK; 1 Jy beam−1 = 4.8
K). The 44′′ (FWHM) primary beam completely covers
the galaxy. Based on variation of the relative fluxes of
the calibrators, we estimate the gain uncertainty to be
< 15%.
3. RESULTS
3.1. Upper Limit on CO Emission
To search for significant CO emission, we smooth the
cube to 20 km s−1 velocity resolution, a typical line width
5 Based on observations carried out with the IRAM Plateau
de Bure Interferometer. IRAM is supported by INSU/CNRS
(France), MPG (Germany) and IGN (Spain).”
http://arxiv.org/abs/0704.0862v1
Fig. 1.— CO 1 → 0 spectra of I Zw 18 towards the radio
continuum/Hα peak (left) and the highest significance spectra
(right), which is still too faint to classify as more than marginal.
The locations of both spectra are shown in Figure 2. Dashed hor-
izontal lines show the magnitude of the RMS noise.
for CO at our spatial resolution (e.g., Helfer et al. 2003).
The noise per channel map in this smoothed cube is
σ20 ≈ 0.25 K km s
−1. Over the H I velocity range (710
– 810 km s−1, van Zee et al. 1998), there are no regions
with ICO,20 > 1 K km s
−1 (4σ) within the primary beam.
We pick a slightly conservative upper limit for two rea-
sons. First, if there were CO emission with this intensity
we would be certain of detecting it. Second, the noise in
the cube is slightly non-Gaussian, so that the false posi-
tive rate for ICO,20 > 1 K km s
−1 — estimated from the
negatives and the channel maps outside the H I velocity
range — is ∼ 0.2%, very close to that of a 3σ deviate.
For d = 14 Mpc, the synthesized beam has a FWHM
of 300 pc and an area of 1.0 × 105 pc2. Our intensity
limit, ICO < 1 K km s
−1, therefore translates to a CO
luminosity limit of LCO < 1× 10
5 K km s−1 pc2.
There is a marginal signal toward the southern knot
of Hα emission (9h34m02s.4, 55◦14′23′′.0). This emission
has the largest |ICO,20| found over the H I velocity range,
corresponding to LCO ∼ 8×10
4 K km s−1 pc2, just below
our limit. This same line of sight also shows |ICO| > 2σ
over three consecutive channels, a feature seen along only
one other line of sight (in negative) over the H I velocity
range. The marginal signal is suggestively located in the
southeast of I Zw 18, where Cannon et al. (2002) identi-
fied several potential sites of molecular gas from regions
of relatively high extinction. While tantalizing, the sig-
nal is not strong enough to be categorized as a detection.
Figure 1 shows CO spectra towards the Hα/radio contin-
uum peak (Cannon et al. 2002, 2005; Hunt et al. 2005a,
see Figure 2) and this marginal signal.
3.2. Continuum Emission
We average the data over all channels and produce a
continuum map with noise σ115GHz = 0.35 mJy beam
The highest value in the map is I115GHz = 1.06±0.35mJy
beam−1 at α2000 = 9
h34m02s.1, δ2000 = +55
◦ 14′ 27′′.0.
This is within a fraction of a beam of the 1.4 GHz peak
identified by Cannon et al. (2005, α2000 = 9
h34m02s.1,
δ2000 = +55
◦ 14′ 28′′.06) and Hunt et al. (2005a, α2000 =
9h34m02s, δ2000 = +55
◦ 14′ 29′′.06). Figure 2 shows the
radio continuum peak and 115 GHz continuum contours
plotted over Hα emission from I Zw 18 (Cannon et al.
2002). There is only one other region with |I115GHz| >
3σ115GHz within the primary beam and the star-forming
extent of I Zw 18 occupies ≈ 10 % of the primary beam.
Therefore, we estimate the chance of a false positive co-
incident with the galaxy to be only ∼ 10%.
4. DISCUSSION
Here we discuss the implications of our CO upper limit
and continuum detection. We adopt the following prop-
erties for I Zw 18, all scaled to d = 14 Mpc: MB =
−14.7 (Gil de Paz et al. 2003), MHI = 1.4 × 10
(van Zee et al. 1998), Hα luminosity log10 Hα = 39.9 erg
s−1 (Cannon et al. 2002; Gil de Paz et al. 2003), 1.4 GHz
flux F1.4 = 1.79 mJy (Cannon et al. 2005).
4.1. Point Source Luminosity
Our upper limit along each line of sight, LCO <
1 × 105 K km s−1 pc2, matches the luminosity of
a fairly massive Galactic giant molecular cloud (Blitz
1993). For a Galactic CO-to-H2 conversion factor, 2 ×
1020 cm−2 (K km s−1)−1, the corresponding molecular
gas mass is MMol ≈ 4.4× 10
5 M⊙, similar to the mass of
the Orion-Monoceros complex (e.g. Wilson et al. 2005).
4.2. Comparison With More Luminous Galaxies
In galaxies detected by CO surveys, the CO content
per unit B-band luminosity is fairly constant. Figure 3
shows the CO luminosity normalized by B-band lumi-
nosity, LCO/LB, as a function of absolute B-band mag-
nitude (LB is extinction corrected). LCO/LB is nearly
constant over two orders of magnitude in LB, though
with substantial scatter (much of it due to the extrapo-
lation from a single pointing to LCO).
Based on these data and assuming that LCO is not
a function of the metallicity of the galaxy, we may ex-
trapolate to an expected CO luminosity for I Zw 18.
For MB,IZw18 ≈ −14.7 the CO luminosity correspond-
ing to the median value of LCO/LB (dashed line) in
Figure 3 is LCO,IZw18 ≈ 1.7 × 10
6 K km s−1 pc2.
The Hα, 1.4 GHz, and H I luminosities lead to simi-
lar predictions. Young et al. (1996) found MH2/LHα ≈
10L⊙/M⊙ for Sd–Irr galaxies, which implies LCO,IZw18 ∼
4 × 106 K km s−1 pc2. Murgia et al. (2005) measured
FCO/F1.4 ≈ 10 Jy km s
−1 (mJy)−1 for spirals, that would
imply LCO,IZw18 ∼ 10
7 K km s−1. For Sd/Sm galax-
ies, MH2/MHI ≈ 0.2 (Young & Scoville 1991), leading
to LCO,IZw18 ∼ 5 × 10
6 K km s−1 pc2. Both MH2/LHα
and MH2/MHI tend to be even higher in earlier-type
spirals.
Therefore, surveys would predict LCO,IZw18 & 2 ×
106 K km s−1 pc2, very close to the previously established
upper limits of 2− 3 × 106 K km s−1pc2 (Arnault et al.
1988; Gondhalekar et al. 1998). With the present obser-
vations, we constrain LCO < 1 × 10
5 K km s−1pc2 and
thus clearly rule out LCO ∼ 10
6 K km s−1 pc2. This
may be seen in Figure 3; even if I Zw 18 has the highest
possible CO content, it will still have a lower LCO/LB
than 97% of the survey galaxies.
4.3. Comparison With Nearby Metal-Poor Dwarfs
The subset of irregular galaxies detected by CO surveys
tend to be CO-rich and actively star-forming, resembling
scaled-down versions of spiral galaxies (Young et al.
1995, 1996; Leroy et al. 2005). Such galaxies may not
be representative of all dwarfs. Because they are nearby,
several of the closest dwarf irregulars have been detected
Fig. 2.— V -band and Hα (right, Cannon et al. 2002) images of I Zw 18. Overlays on the left image show the size of the synthesized
beam and the locations of the spectra shown in Figure 1. Contours on the right image show continuum emission in increments of 0.5σ
significance and the location of the radio continuum peak. The primary beam is larger than the area shown. Both optical maps are on
linear stretches. V -band data obtained from the MAST Archive, originally observed for GO program 9400, PI: T. Thuan).
despite very small LCO. With their low masses and
metallicities, they may represent good points of compar-
ison for I Zw 18. Table 1 and Figure 3 show CO lumi-
nosities and LCO/LB for four nearby dwarfs: NGC 1569,
the Small Magellanic Cloud (SMC), NGC 6822, and
IC 10. The SMC, NGC 1569, and NGC 6822 have
LCO ∼ 10
5 K km s−1 pc2, close to our upper limit, and
occupy a region of LCO/LB-LB parameter space similar
to I Zw 18. All four of these galaxies have active star for-
mation but very low CO content relative to their other
properties.
We test whether our observations would have detected
CO in NGC 1569, the SMC, and IC 10 at the plausible
lower limit of 10 Mpc (from H0 = 72 km s
−1) or our
adopted distance of 14 Mpc. We convolve the integrated
intensity maps to resolutions of 210 and 300 pc and mea-
sure the peak integrated intensity. The results appear in
columns 4 and 5 of Table 1. The PdBI observations of
NGC 1569 resolve out most of the flux, so we also apply
this test to a distribution with the size and luminosity
derived by Greve et al. (1996) from single dish observa-
tions. Our observations would detect an analog to IC 10
but not the SMC, with NGC 1569 an intermediate case.
With a factor of ∼ 3 better sensitivity (requiring ∼ 10
times more observing time) we would expect to detect
all three nearby galaxies. However, achieving such sen-
sitivity with present instrumentation will be quite chal-
lenging. ALMA will likely be necessary to place stronger
constraints on CO in galaxies like I Zw 18.
IC 10 may be the nearest blue compact dwarf
(Richer et al. 2001), so it may be telling that we would
detect it at the distance of I Zw 18. The blue compact
galaxies that have been detected in CO have LCO/LB
similar to IC 10 (Gondhalekar et al. 1998, the diamonds
in Figure 3). Most searches for CO towards BCDs have
yielded nondetections, so those detected may not be rep-
resentative, but I Zw 18 is clearly not among the “CO-
rich” portion of the BCD population.
4.4. Interpretation of the Continuum
We measure continuum intensity of F115GHz = 1.06±
0.35 mJy towards the radio continuum peak. The
continuum is detected along only one line of sight,
so we refer to it here as a point source and com-
pare it to integrated values for I Zw 18. F115GHz is
expected to be the product of mainly two types of
emission: thermal free-free emission and thermal dust
emission. At long wavelengths, the integrated ther-
mal free-free emission is F1.4GHz(free− free) ≈ 0.52 –
0.75 mJy (Cannon et al. 2005; Hunt et al. 2005a), imply-
ing F115GHz(free− free) = 0.36 – 0.51 mJy at 115 GHz
(Fν ∝ ν
−0.1). The Hα flux predicts a similar value,
F115GHz(free− free) = 0.34 mJy (Cannon et al. 2005,
Equation 1). Hunt et al. (2005b) placed an upper limit
of Fν(850) < 2.5 mJy on dust continuum emission at
850µm; this is consistent with the ∼ 5 × 103 M⊙ esti-
mated by Cannon et al. (2002) given almost any reason-
able dust properties. Extrapolating this to 2.6 mm as-
suming a pure blackbody spectrum, the shallowest plau-
sible SED, constrains thermal emission from dust to be
< 0.25 mJy at 115 GHz. Based on these data, we would
predict F115GHz . 0.75 mJy. Thus our measured F115GHz
is consistent with, but somewhat higher than, the ther-
mal free-free plus dust emission expected based on opti-
cal, centimeter, and submillimeter data.
4.5. Relation to Star Formation
I Zw 18 has a star formation rate ∼ 0.06 – 0.1 M⊙
yr−1, based on Hα and cm radio continuum measure-
ments (Cannon et al. 2002; Kennicutt 1998a; Hunt et al.
2005a). Our continuum flux suggests a slightly higher
value ≈ 0.15 – 0.2 M⊙ yr
−1 (following Hunt et al. 2005a;
Condon 1992), with the exact value depending on the
contribution from thermal dust emission. For any value
in this range, the star formation rate per CO luminosity,
SFR/LCO is much higher in I Zw 18 than in spirals. For
Fig. 3.— CO luminosity normalized by absolute blue mag-
nitude for galaxies with Hubble Type Sb or later (black cir-
cles, Young et al. 1995; Elfhag et al. 1996; Böker et al. 2003;
Leroy et al. 2005). We also plot nearby dwarfs from Ta-
ble 1 (crosses) and blue compact galaxies compiled by
Gondhalekar et al. (1998, , diamonds). The shaded regions shows
our upper limit for I Zw 18, with the range inMB for distances from
10 to 20 Mpc. The dashed line and light shaded region show the
median value and 1σ scatter in LCO/LB for spirals and dwarf star-
bursts. Methodology: We extrapolate from ICO in central pointings
to LCO assuming the CO to have an exponential profile with scale
length 0.1 d25 (Young et al. 1995), including only galaxies where
the central pointing measures > 20% of LCO. We adopt B mag-
nitudes (corrected for internal and Galactic extinction), distances
(Tully-Fisher when available, otherwise Virgocentric-flow corrected
Hubble flow), and radii from LEDA (Paturel et al. 2003).
comparison, our upper limit and the molecular “Schmidt
Law” derived by Murgia et al. (2002) predicts a star for-
mation rate . 2 × 10−4 M⊙ yr
−1. Fits by Young et al.
(1996) and Kennicutt (1998b, applied to just the molec-
ular limit) yield similar values. Again, I Zw 18 is similar
to the SMC and NGC 6822, which have star formation
rates of 0.05 M⊙ yr
−1 and 0.04 M⊙ yr
−1 (Wilke et al.
2004; Israel 1997b) and LCO ∼ 10
5 K km s−1 pc2.
4.6. Variations in XCO
Several calibrations of the CO-to-H2 conversion factor,
XCO as a function of metallicity exist in the literature.
The topic has been controversial and these calibrations
range from little or no dependence (e.g. Walter 2003;
Rosolowsky et al. 2003) to very steep dependence (e.g.,
XCO ∝ Z
−2.7 Israel 1997a). Comparing the star for-
mation rate to our CO upper limit, we may rule out
that I Zw 18 has a Galactic XCO unless molecular gas
in I Zw 18 forms stars much more efficiently than in the
Galaxy. Either the ratio of CO-to-H2 is low in I Zw 18
or molecular gas in this galaxy forms stars with an effi-
ciency two orders of magnitude higher than that in spiral
galaxies.
5. CONCLUSIONS
We present new, sensitive observations of the metal-
poor dwarf galaxy I Zw 18 at 3 mm using the Plateau de
Bure Interferometer. These data constrain the integrated
CO J = 1 → 0 intensity to be ICO < 1 K km s
−1 over our
300 pc (FWHM) beam and the luminosity to be LCO <
1× 105 K km s−1 pc2.
I Zw 18 has less CO relative to its B-band luminosity,
H Imass, or SFR than spiral galaxies or dwarf starbursts,
including more metal-rich blue compact galaxies such as
IC 10 (ZIC 10 ∼ Z⊙/4, Lee et al. 2003). Because of its
small size and large distance, these are the first observa-
tions to impose this constraint.
We show that I Zw 18 should be grouped with several
local analogs — NGC 1569, the SMC, NGC 6822 — as
a galaxy with active star formation but a very low CO
content relative to its other properties. In these galax-
ies, observations suggest that the environment affects the
molecular gas and these data suggest that the same is
true in I Zw 18. A simple comparison of star formation
rate to CO content shows that this must be true at a
basic level: either the ratio of CO to H2 is dramatically
low in I Zw 18 or molecular gas in this galaxy forms stars
with an efficiency two orders of magnitude higher than
that in spiral galaxies.
We detect 3mm continuum with F115 GHz = 1.06 ±
0.35 mJy coincident with the radio peak identified by
Cannon et al. (2005) and Hunt et al. (2005a). This flux
is consistent with but somewhat higher than the thermal
free-free plus dust emission one would predict based on
centimeter, submillimeter, and optical measurements.
Finally, we note that improving on this limit with cur-
rent instrumentation will be quite challenging. The order
of magnitude increase in sensitivity from ALMA will be
needed to place stronger constraints on CO in galaxies
like I Zw 18.
We thank Roberto Neri for his help reducing the data.
We acknowledge the usage of the HyperLeda database
(http://leda.univ-lyon1.fr).
REFERENCES
Arnault, P., Kunth, D., Casoli, F., & Combes, F. 1988, A&A,
205, 41
Bell, T. A., Roueff, E., Viti, S., & Williams, D. A. 2006, MNRAS,
371, 1865
Blitz, L. 1993, Protostars and Planets III, 125
Böker, T., Lisenfeld, U., & Schinnerer, E. 2003, A&A, 406, 87
Cannon, J. M., Skillman, E. D., Garnett, D. R., & Dufour, R. J.
2002, ApJ, 565, 931
Cannon, J. M., Walter, F., Skillman, E. D., & van Zee, L. 2005,
ApJ, 621, L21
Condon, J. J. 1992, ARA&A, 30, 575
Gil de Paz, A., Madore, B. F., & Pevunova, O. 2003, ApJS, 147,
Elfhag, T., Booth, R. S., Hoeglund, B., Johansson, L. E. B., &
Sandqvist, A. 1996, A&AS, 115, 439
Gondhalekar, P. M., Johansson, L. E. B., Brosch, N., Glass, I. S.,
& Brinks, E. 1998, A&A, 335, 152
Greve, A., Becker, R., Johansson, L. E. B., & McKeith, C. D.
1996, A&A, 312, 391
Helfer, T. T., Thornley, M. D., Regan, M. W., Wong, T., Sheth,
K., Vogel, S. N., Blitz, L., & Bock, D. C.-J. 2003, ApJS, 145,
Hunt, L. K., Dyer, K. K., & Thuan, T. X. 2005a, A&A, 436, 837
Hunt, L., Bianchi, S., & Maiolino, R. 2005b, A&A, 434, 849
Israel, F. P. 1997, A&A, 328, 471
Israel, F. P. 1997, A&A, 317, 65
Izotov, Y. I., & Thuan, T. X. 2004, ApJ, 616, 768
http://leda.univ-lyon1.fr
TABLE 1
CO in Nearby Low Mass Galaxies
Galaxy MB LCO ICO,210
a ICO,300
a Reference
(mag) (K km s−1 pc2) (K km s−1) (K km s−1)
NGC 1569 −16.5 1.2× 105 1.1 0.8 Greve et al. (1996)
−16.5 0.2× 105 0.8 0.5 Taylor et al. (1999)
SMC −16 1.5× 105 0.5 0.4 Mizuno et al. (2001, 2006)
NGC 6822 −16 1.2× 105 · · · · · · Israel (1997b)
IC 10 −16.5 2.2× 106 3.8 2.2 Leroy et al. (2006)
I Zw 18 −14.7 < 2× 106 · · · · · · Arnault et al. (1988); Gondhalekar et al. (1998)
I Zw 18 −14.7 . 1× 105 < 1 < 1 this paper
a Peak integrated intensity at 210 and 300 pc, corresponding to our beam size at 10 and 14 Mpc, respectively.
Kennicutt, R. C., Jr. 1998a, ARA&A, 36, 189
Kennicutt, R. C., Jr. 1998b, ApJ, 498, 541
Lee, H., McCall, M. L., & Richer, M. G. 2003, AJ, 125, 2975
Leroy, A., Bolatto, A. D., Simon, J. D., & Blitz, L. 2005, ApJ,
625, 763
Leroy, A., Bolatto, A., Walter, F., & Blitz, L. 2006, ApJ, 643, 825
Madden, S. C., Poglitsch, A., Geis, N., Stacey, G. J., & Townes,
C. H. 1997, ApJ, 483, 200
Maloney, P., & Black, J. H. 1988, ApJ, 325, 389
Mizuno, N., Rubio, M., Mizuno, A., Yamaguchi, R., Onishi, T., &
Fukui, Y. 2001, PASJ, 53, L45
Mizuno, N., et al. 2006, in prep.
Murgia, M., Crapsi, A., Moscadelli, L., & Gregorini, L. 2002,
A&A, 385, 412
Murgia, M., Helfer, T. T., Ekers, R., Blitz, L., Moscadelli, L.,
Wong, T., & Paladino, R. 2005, A&A, 437, 389
Pak, S., Jaffe, D. T., van Dishoeck, E. F., Johansson, L. E. B., &
Booth, R. S. 1998, ApJ, 498, 735
Paturel, G., Petit, C., Prugniel, P., Theureau, G., Rousseau, J.,
Brouty, M., Dubois, P., & Cambrésy, L. 2003, A&A, 412, 45
Richer, M. G., et al. 2001, A&A, 370, 34
Rosolowsky, E., Engargiola, G., Plambeck, R., & Blitz, L. 2003,
ApJ, 599, 258
Skillman, E. D., & Kennicutt, R. C., Jr. 1993, ApJ, 411, 655
Taylor, C. L., Kobulnicky, H. A., & Skillman, E. D. 1998, AJ,
116, 2746
Taylor, C. L., Hüttemeister, S., Klein, U., & Greve, A. 1999,
A&A, 349, 424
van Zee, L., Westpfahl, D., Haynes, M. P., & Salzer, J. J. 1998,
AJ, 115, 1000
Vidal-Madjar, A., et al. 2000, ApJ, 538, L77
Walter, F. 2003, IAU Symposium, 221, 176P
Wilke, K., Klaas, U., Lemke, D., Mattila, K., Stickel, M., & Haas,
M. 2004, A&A, 414, 69
Wilson, B. A., Dame, T. M., Masheder, M. R. W., & Thaddeus,
P. 2005, A&A, 430, 523
Young, J. S., et al. 1995, ApJS, 98, 219
Young, J. S., & Scoville, N. Z. 1991, ARA&A, 29, 581
Young, J. S., Allen, L., Kenney, J. D. P., Lesser, A., & Rownd, B.
1996, AJ, 112, 1903
ABSTRACT
  We present sensitive molecular line observations of the metal-poor blue
compact dwarf I Zw 18 obtained with the IRAM Plateau de Bure interferometer.
These data constrain the CO J=1-0 luminosity within our 300 pc (FWHM) beam to
be L_CO < 1 \times 10^5 K km s^-1 pc^2 (I_CO < 1 K km s^-1), an order of
magnitude lower than previous limits. Although I Zw 18 is starbursting, it has
a CO luminosity similar to or less than nearby low-mass irregulars (e.g. NGC
1569, the SMC, and NGC 6822). There is less CO in I Zw 18 relative to its
B-band luminosity, HI mass, or star formation rate than in spiral or dwarf
starburst galaxies (including the nearby dwarf starburst IC 10). Comparing the
star formation rate to our CO upper limit reveals that unless molecular gas
forms stars much more efficiently in I Zw 18 than in our own galaxy, it must
have a very low CO-to-H_2 ratio, \sim 10^-2 times the Galactic value. We detect
3mm continuum emission, presumably due to thermal dust and free-free emission,
towards the radio peak.

<|endoftext|><|startoftext|>
Introduction
	The Model
	The BPS code and the formation of hot subdwarfs
	Monte-Carlo simulation parameters
	Spectral library
	Observables from the model
	Simulations
	Standard simulation set
	Simulation sets with varying model parameters
	Simulation sets for composite stellar populations
	Results and Discussion
	Simple stellar populations
	The model for composite stellar populations
	Theory versus observations
	The UV-upturn and metallicity
	Comparison with previous models
	Summary and Conclusion
	REFERENCES
ABSTRACT
  The discovery of a flux excess in the far-ultraviolet (UV) spectrum of
elliptical galaxies was a major surprise in 1969. While it is now clear that
this UV excess is caused by an old population of hot helium-burning stars
without large hydrogen-rich envelopes, rather than young stars, their origin
has remained a mystery. Here we show that these stars most likely lost their
envelopes because of binary interactions, similar to the hot subdwarf
population in our own Galaxy. We have developed an evolutionary population
synthesis model for the far-UV excess of elliptical galaxies based on the
binary model developed by Han et al (2002, 2003) for the formation of hot
subdwarfs in our Galaxy. Despite its simplicity, it successfully reproduces
most of the properties of elliptical galaxies with a UV excess: the range of
observed UV excesses, both in $(1550-V)$ and $(2000-V)$, and their evolution
with redshift. We also present colour-colour diagrams for use as diagnostic
tools in the study of elliptical galaxies. The model has major implications for
understanding the evolution of the UV excess and of elliptical galaxies in
general. In particular, it implies that the UV excess is not a sign of age, as
had been postulated previously, and predicts that it should not be strongly
dependent on the metallicity of the population, but exists universally from
dwarf ellipticals to giant ellipticals.

<|endoftext|><|startoftext|>
Baltic Astronomy, vol.12, XXX–XXX, 2003.
THE REDSHIFT OF LONG GRBS’
Z. Bagoly1 and I. Csabai2 and A. Mészáros3 and P. Mészáros4
and I. Horváth5 and L. G. Balázs6 and R. Vavrek7
1 Lab. for Information Technology, Eötvös University, H-1117 Budapest,
Pázmány P. s. 1./A, Hungary
2 Dept. of Physics for Complex Systems, Eötvös University, H-1117 Bu-
dapest, Pázmány P. s. 1./A, Hungary
3 Astronomical Institute of the Charles University, V Holešovičkách 2,
CZ-180 00 Prague 8, Czech Republic
4 Dept. of Astronomy & Astrophysics, Pennsylvania State University,
525 Davey Lab., University Park, PA 16802, USA
5 Dept. of Physics, Bolyai Military University, H-1456 Budapest, POB
12, Hungary
6 Konkoly Observatory, H-1505 Budapest, POB 67, Hungary
7 Max-Planck-Institut für Astronomie, D-69117 Heidelberg, 17 Königstuhl,
Germany
Received October 20, 2003
Abstract. The low energy spectra of some gamma-ray bursts’ show ex-
cess components beside the power-law dependence. The consequences of
such a feature allows to estimate the gamma photometric redshift of the
long gamma-ray bursts in the BATSE Catalog. There is good correla-
tion between the measured optical and the estimated gamma photometric
redshifts. The estimated redshift values for the long bright gamma-ray
bursts are up to z = 4, while for the the faint long bursts - which should
be up to z = 20 - the redshifts cannot be determined unambiguously with
this method. The redshift distribution of all the gamma-ray bursts with
known optical redshift agrees quite well with the BATSE based gamma
photometric redshift distribution.
Key words: Cosmology - Gamma-ray burst
1. INTRODUCTION
In this article we present a new method called gamma photometric
redshift (GPZ) estimation of the estimation of the redshifts for the
http://arxiv.org/abs/0704.0864v1
2 Z.Bagoly et. al
long GRBs. We utilize the fact that broadband fluxes change sys-
tematically, as characteristic spectral features redshift into, or out of
the observational bands. The situation is in some sense similar to
the optical observations of galaxies, where for galaxies and quasars
the photometric redshift estimation (Csabai et. al (2000), Budavári
et. al (2001)) achieved a great success in estimating redshifts from
photometry only.
We construct our template spectrum that will be used in the GPZ
process in the following manner: let the spectrum be a sum of the
Band’s function and of a low energy soft excess power-law function,
observed in several cases (Preece et. al (2000)). The low energy
cross-over is at Ecr = 90 keV, Eo = 500 keV, and the spectral indices
are α = 3.2, β = 0.5 and γ = 3.0.
Let us introduce the peak flux ratio (PFR hereafter) in the fol-
lowing way:
PFR =
l34 − l12
l34 + l12
where lij is the BATSE DISCSC flux in energy channel Ei < E < Ej ,
here E1 = 25 keV, E2 = E3 = 55 keV, E4 = 100 keV.
0 2 4 6 8 10 12 14
α=3.2    β=0.5    Ecr=90 keV    
Fig. 1. The theoretical PFR curves
calculated from the template spec-
trum using the average detector re-
sponse matrix.
The spectra are changing
quite rapidly with time; the typ-
ical timescale for the time vari-
ation is ≃ (0.5 − 2.5) s (Ryde &
Svensson (1999, 2000)). There-
fore, we will consider the spectra
in the 320ms time interval cen-
tered around the peak-flux. If
we redshift the template spec-
trum and use the detector re-
sponse matrix of the given burst,
we can get for any redshift the
observed flux and the PFR value.
On Fig. 1. we plot the the-
oretical PFR curves calculated
from the above defined template
spectrum using the average detector response matrices for the 8 bursts
that have both BATSE data and measured redshifts (Klose (2000))
In the used range of z (i.e. for z<
4) the relation between z and PFR
is invertible, hence we can use it to estimate the gamma photometric
The Redshift of Long GRBs’ 3
redshift (GPZ) from a measured PFR. For the 7 considered GRBs
(leaving out GRB associated with the supernova and GRB having
upper redshift limit only) the estimation error between the real z
and the GPZ is ∆z =≈ 0.33.
2. ESTIMATION OF THE REDSHIFTS
Here restrict ourselves to long and not very faint GRBs with
T90 > 10 s and F256 < 0.65 photon/(cm
2s) to avoid the problems with
the instrumental threshold (Pendleton et. al (1997), Hakkila et. al,
(2000)). Introducing an another cut at F256 > 2.00 photon/(cm
2s) we
can investigate roughly the brighter half of this sample.
As the soft-excess range redshifts out from the BATSE DISCSC
energy channels around z ≈ 4, the theoretical curves converge to a
constant value. For higher z it starts to decrease. This means that
the method is ambiguous: for the given value of PFR one may have
two redshifts - below and above z ≈ 4. Because for the bright GRBs
the values above z ≈ 4 are practically excluded, for them the method
is usable. Using only the 25 − 55 keV and 55 − 100 keV BATSE
energy channels, this method can be used to estimate GPZ only in
the redshift range z <
0  1  2  3  4  5
Gamma Photometric Redshift
F256>0.65 ph/cm
F256>2.0 ph/cm
Fig. 1. The distribution of the GPR
estimators of the long GRBs having
DISCSC data.
Let us assume for a moment
that all observed long bursts, we
have selected above, have z <
4. Then we can simply calculate
the zGPZ redshift for any GRB,
which has PFR from the DISCSC
data. Fig. 2. shows the distribu-
tion of the estimated derived red-
shifts under the assumption that all
GRBs are below z ≈ 4. The dis-
tribution has a clear peak value
around PFR ≈ 0.2, which corre-
sponds to z ≈ (1.5− 2.0).
Although there is a problem
with the degeneracy (e.g. two possible redshift values) we think that
the great majority of values of z obtained for the bright half are
correct. This opinion may be supported by the following arguments:
the obtained distribution of GRBs in z for the bright half is very
similar to the obtained distribution of Schmidt (2001) and Schaefer
4 Z.Bagoly et. al
et. al (2001). An another problem for z as it moves into z>
4 regime for
the bright GRB is the extremely high GRB luminosities, ≃ 1053ergs/s
(Mészáros & Mészáros, 1996).
0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5
17 bursts with known redshift
GPZ{ F256>0.65 ph/cm2/s
Fig. 3. The redshift distribution of
the 17 GRBs’ with known z and the
distributions from the GPZ estima-
tors.
As an additional statistical
test we compared the redshift
distribution of the 17 GRB with
observed redshift with our re-
constructed GRB z distributions
(limited to the z < 4 range). For
the F256 > 0.65 photon/(cm
group the KS test suggests a 38%
probability, i.e. the observed
N(< z) probability distribution
agrees quite well with the GPZ
reconstructed function.
ACKNOWLEDGMENTS
The useful remarks with Drs. T.
Budavári, S. Klose, D. Reichart,
A.S. Szalay are kindly acknowl-
edged. This research was supported in part through OTKA grants
T024027 (L.G.B.), F029461 (I.H.) and T034549, Czech Research
Grant J13/98: 113200004 (A.M.), NASA grant NAG5-9192 (P.M.).
REFERENCES
??udavári, T., Csabai, I., Szalay, A.S. et. al, 2001, AJ, 122, 1163
??sabai, I., Connolly, A.J., Szalay, A.S. et. al, 2000, AJ, 119, 69
??akkila, J., Haglin, D. J., Pendleton, G. N. et. al, 2000, ApJ, 538,
??lose, S. 2000, Reviews in Modern Astronomy 13, Astronomische
Gesellschaft, Hamburg, p.129
??észáros, A., & Mészáros, P. 1996, ApJ, 466, 29
??reece, R.D., Briggs, M.S., Pendleton, G.N., et. al 1996, ApJ, 473,
??reece, R.D., Briggs, M.S., Mallozzi, et. al, 2000, ApJS, 126, 19
??yde, F., & Svensson, R. 1999, ApJ, 512, 693
??yde, F., & Svensson, R. 2000, ApJ, 529, L13
??chaefer, B. E., Deng, M. & Band, D. L., 2001, ApJ, 563, L123
??chmidt, M. 2001, ApJ, 552, 36
ABSTRACT
  The low energy spectra of some gamma-ray bursts' show excess components
beside the power-law dependence. The consequences of such a feature allows to
estimate the gamma photometric redshift of the long gamma-ray bursts in the
BATSE Catalog. There is good correlation between the measured optical and the
estimated gamma photometric redshifts. The estimated redshift values for the
long bright gamma-ray bursts are up to z=4, while for the the faint long bursts
- which should be up to z=20 - the redshifts cannot be determined unambiguously
with this method. The redshift distribution of all the gamma-ray bursts with
known optical redshift agrees quite well with the BATSE based gamma photometric
redshift distribution.

<|endoftext|><|startoftext|>
Introduction 
The increasing complexity of software systems raises 
major concerns in various critical application domains, in 
particular with respect to the validation and analysis of 
performance, timing and dependability requirements. 
Model-driven engineering approaches based on 
architecture description languages (ADLs) aim at 
mastering this complexity at the design level. Over the 
last decade, considerable research has been devoted to 
ADLs leading to a large number of proposals [1]. In 
particular, AADL (Architecture Analysis and Design 
Language) [2] has received an increasing interest from the 
safety-critical industry (i.e., Honeywell, Rockwell Collins, 
Lockheed Martin, the European Space Agency, Airbus) 
during the last years. It has been standardized under the 
auspices of the International Society of Automotive 
Engineers (SAE), to support the design and analysis of 
complex real-time safety-critical applications. AADL 
provides a standardized textual and graphical notation, for 
describing architectures with functional interfaces, and for 
performing various analyses to determine the behavior 
and performance of the system being modeled. AADL has 
been designed to be extensible to accommodate analyses 
that the core language does not support, such as 
dependability and performance.  
In critical application domains, one of the challenges 
faced by the software engineers concerns: 1) the 
description of the software architecture and its dynamic 
behavior taking into account the impact of errors and 
failures, and 2) the evaluation of quantitative measures of 
relevant dependability properties such as reliability, 
availability and safety, allowing them to assess the impact 
of errors and failures on the service. For pragmatic 
reasons, the designers using an AADL-based engineering 
approach are interested in using an integrated set of 
methods and tools to describe specifications and designs, 
and to perform dependability evaluations. The AADL 
Error Model Annex [3] has been defined to complement 
the description capabilities of the AADL core language 
standard by providing features with precise semantics to 
be used for describing dependability-related 
characteristics in AADL models (faults, failure modes and 
repair assumptions, error propagations, etc.). AADL and 
the AADL Error Model Annex are supported by the Open 
Source AADL Tool Environment (OSATE)1.  
At the current stage, there is a lack of methodologies and 
guidelines to help the developers, using an AADL based 
engineering approach, to use the notations defined in the 
standard for describing complex dependability models 
reflecting real-life systems with multiple dependencies 
between components. The objective of this paper is to 
propose a structured method for AADL dependability 
model construction. The AADL model is built and 
validated iteratively, taking into account progressively the 
dependencies between the components. 
The approach proposed in this paper is complementary to 
other research studies focused on the extension of the 
AADL language capabilities to support formal 
verifications and analyses (see e.g. [4]). Also, it is 
intended to be complementary to other studies focused on 
the integration of formal verification, dependability and 
performance related activities in the general context of 
                                                             
1 http://lwww.aadl.info/OpenSourceAADLToolEnvironment.html 
model driven engineering approaches based on ADLs and 
on UML (see e.g., [5-9]). 
The remainder of the paper is organized as follows. 
Section 2 presents the AADL concepts that are necessary 
for understanding our modeling approach. Section 3 gives 
an overview of our framework for system dependability 
modeling and evaluation using AADL. Section 4 presents 
the iterative approach for building the AADL 
dependability model. Section 5 illustrates some of the 
concepts of our approach on a small example and  
section 6 concludes the paper. 
2. AADL concepts 
The AADL core language allows analyzing the impact of 
different architecture choices (such as scheduling policy 
or redundancy scheme) on a system’s properties [10]. An 
architecture specification in AADL is an hierarchical 
collection of interacting components (software and 
compute platform) combined in subsystems. Each AADL 
component is modeled at two levels: in the component 
type and in one or more component implementations 
corresponding to different implementation structures of 
the component in terms of subcomponents and 
connections. The AADL core language is designed to 
describe static architectures with operational modes for 
their components. However, it can be extended to 
associate additional information to the architecture. 
AADL error models are an extension intended to support 
(qualitative and quantitative) analyses of dependability 
attributes. The AADL Error Model Annex defines a sub-
language to declare reusable error models within an error 
model annex library. The AADL architecture model 
serves as a skeleton for error model instances. Error 
model instances can be associated with components of the 
system and with the system itself. 
The component error models describe the behavior of 
the components with which they are associated, in the 
presence of internal faults and recovery events, as well as 
in the presence of external propagations from the 
component’s environment. Error models have two levels 
of description: the error model type and the error model 
implementation. The error model type declares a set of 
error states, error events (internal to the 
component) and error propagations2 (events that 
propagate, from one component to other components, 
through the connections and bindings between 
components of the architecture model). Propagations have 
associated directions (in or out or in out). Error 
model implementations declare transitions between 
states, triggered by events and propagations declared in 
the error model type. Both the type and the 
implementation can declare Occurrence properties that 
                                                             
2 Error states can also model error free states, error events can also 
model repair events and error propagations can model all kinds of 
notifications. 
specify the arrival rate or the occurrence probability of 
events and propagations. An out propagation occurs 
according to a specified Occurrence property when it 
is named in a transition and the current state is the origin 
of the transition. If the source state and the destination 
state of a transition triggered by an out propagation are 
the same, the propagation is sent out of the component but 
does not influence the state of the sender component. An 
in propagation occurs as a consequence of an out 
propagation from another component. Figure 1 shows an 
error model example.  
Error Model Type [simple] 
error model simple 
features 
Error_Free: initial error state; 
Failed: error state; 
Fail: error event  
 {Occurrence => Poisson λ}; 
Recover: error event 
 {Occurrence => Poisson µ}; 
KO: in out error propagation  
 {Occurrence => fixed p}; 
end simple; 
Error Model Implementation [simple.general] 
error model implementation   
 simple.general 
transitions 
Error_Free-[Fail] -> Failed; 
Error_Free-[in KO] -> Failed; 
Failed-[Recover] -> Error_Free; 
Failed-[out KO] -> Failed; 
end simple.general; 
Figure 1. Simple error model 
Error model instances can be customized to fit a particular 
component through the definition of Guard properties 
that control and filter propagations by means of Boolean 
expressions. 
The system error model is defined as a composition of a 
set of concurrent finite stochastic automata corresponding 
to components. In the same way as the entire architecture, 
the system error model is described hierarchically. The 
state of a system that contains subcomponents can be 
specified as a function of its subcomponents’ states (i.e., 
the system has a derived error model). 
3. Overview of the modeling framework 
For complex systems, the main difficulty for building a 
dependability model arises from dependencies between 
the system components. Dependencies can be of several 
types, identified in [11]: functional, structural or related to 
the recovery and maintenance strategies. Exchange of data 
or transfer of intermediate results from one component to 
another is an example of functional dependency. The fact 
that a thread runs on a processor induces a structural 
dependency between the thread and the processor. Sharing 
a recovery or maintenance facility between several 
components leads to a recovery or maintenance 
dependency. Functional and structural dependencies can 
be grouped into an architecture-based dependency class, 
as they are triggered by physical or logical connections 
between the dependent components at architectural level. 
Instead, recovery and maintenance dependencies are not 
always visible at architectural level. 
A structured approach is necessary to model dependencies 
in a systematic way, to promote model reusability, to 
avoid errors in the resulting model of the system and to 
facilitate its validation. In our approach, the AADL 
dependability-oriented model is built in a progressive and 
iterative way. More concretely, in a first iteration, we 
propose to build the model of the system’s components, 
representing their behavior in the presence of their own 
faults and recovery events only. The components are thus 
modeled as if they were isolated from their environment. 
In the following iterations, dependencies between basic 
error models are introduced progressively.  
This approach is part of a complete framework that allows 
the generation of dependability analysis and evaluation 
models from AADL models. An overview of this 
framework is presented in Figure 2.  
Figure 2. Modeling framework 
The first step is devoted to the modeling of the application 
architecture in AADL (in terms of components and 
operational modes of these components). The AADL 
architecture model may be available if it has been already 
built for other purposes. 
The second step concerns the specification of the 
application behavior in the presence of faults through 
AADL error models associated with components of the 
architecture model. The error model of the application is a 
composition of the set of component error models.  
The architecture model and the error model of the 
application form the dependability-oriented AADL model, 
referred to as the AADL dependability model.  
The third step aims at building an analytical dependability 
evaluation model, from the AADL dependability model, 
based on model transformation rules.  
The fourth step is devoted to the dependability evaluation 
model processing that aims at evaluating quantitative 
measures characterizing dependability attributes. This step 
is entirely based on existing processing tools.  
The iterative approach can be applied to the second step 
of the modeling framework only or to the second and third 
steps together. In the latter case, semantic validation based 
on the analytical model, after each iteration, is helpful to 
identify specification errors in the AADL dependability 
model.  
Due to space limitations, we focus only on the first and 
second steps in this paper. A transformation from AADL 
to generalized stochastic Petri nets (GSPN) for 
dependability evaluation purposes is presented in [12].  
4. AADL dependability model construction 
To illustrate the proposed approach, the rest of this section 
presents successively guidelines for modeling an 
architecture-based dependency (structural or functional) 
and a recovery and maintenance dependency. More 
general practical aspects for building the AADL 
dependability model are given at the end of this section. 
Note that we illustrate the principles using the graphical 
notation for AADL composite components (system 
components). However, they apply to all types of 
components and connections. 
4.1. Architecture-based dependency 
The dependency is modeled in the error models associated 
with the dependent components, by specifying 
respectively outgoing and incoming propagations and 
their impact on the corresponding error model. An 
example is shown in Figure 3: Component 1 sends data to 
Component 2, thus we assume that, at the error model 
level, the behavior of Component 2 depends on that of 
Component 1.  
Figure 3. Architecture-based dependency  
Instances of the same error model, shown in Figure 1, are 
associated both with Component 1 and with Component 2. 
However, the AADL dependability model is asymmetric 
because of the unidirectional connection between 
Component 1 and Component 2. Thus, the out 
propagation KO declared in the error model instance 
associated with Component 2 is inactive (i.e., even if it 
occurs, it cannot propagate to Component 1). 
The out propagation KO from the error model instance 
of Component 1, together with its Occurrence property 
and the AADL transition triggered by it form the “sender” 
part of the dependency. It means that when Component 1 
fails, it sends a propagation through the unidirectional 
connection. The in propagation KO from the error model 
instance of Component2 together with the AADL 
transition triggered by it form the “receiver” part of the 
dependency. Thus, an incoming propagation KO causes 
the failure of the receiving component.  
In real applications, architecture-based dependencies 
usually require using more advanced propagation 
controlling and filtering through Guard properties. In 
particular, Boolean expressions can be defined to specify 
the consequences of a set of propagations occurring in a 
set of sender components on a receiver component. 
4.2. Recovery and maintenance dependency 
Recovery and maintenance dependencies need to be 
described when recovery and maintenance facilities are 
shared between components or when the maintenance 
activity of some components has to be carried out 
according to a given order or a specified strategy (i.e., a 
thread can be restarted only if another thread is running). 
Components that are not dependent at architectural level 
may become dependent due to the recovery and 
maintenance strategy. Thus, the AADL dependability 
model might need some adjustments to support the 
description of dependencies related to the maintenance 
strategy. As error models interact only via propagations 
through architectural features (i.e., connections, bindings), 
the recovery and maintenance dependency between 
components’ error models must be supported by the 
architecture model. Thus, besides the architecture 
components, we may need to model (at architectural 
level) a component allowing to describe the recovery and 
maintenance strategy. Figure 4-a shows an example of 
AADL dependability model. In this architecture, 
Component 3 and Component 4 do not interact at the 
architecture level. However, if we assume that they share 
a recovery and maintenance facility, the recovery and 
maintenance strategy has to be taken into account in the 
error model of the application. Thus, it is necessary to 
represent the recovery and maintenance facility at the 
architectural level, as shown in Figure 4-b in order to 
model explicitly the dependency between Components 3 
and Component 4. 
Also, the error models of dependent components with 
regards to the recovery and maintenance strategy might 
need some adjustments. For example, to represent the fact 
that Component 3 can only restart if Component 4 is 
running, one needs to distinguish between a failed state of 
Component 3 and a failed state where Component 3 is 
allowed to restart. 
- a -  - b -  
Figure 4. Maintenance dependency 
4.3. Practical aspects 
The order for modeling dependencies does not impact the 
final AADL dependability model. However, it may 
impact the reusability of parts of the model. Thus, the 
order may be chosen according to the context of the 
targeted analysis. For example, if the analysis is meant to 
help the user to choose the best-adapted structure for a 
system whose functions are completely defined, it may be 
convenient to introduce first functional dependencies 
between components and then structural dependencies, as 
the model corresponding to functional dependencies is to 
be reused. Generally, recovery and maintenance 
dependencies are modeled at the end, as one important 
aim of the dependability evaluation is to find the best-
suited recovery and maintenance strategies for an 
application. Recovery and maintenance dependencies may 
have an impact on the system’s structure.  
Not all the details of the architecture model are necessary 
for the AADL dependability model. Only components that 
have associated error models and all connections and 
bindings between them are necessary. This allows a 
designer to evaluate dependability measures at different 
stages in the development cycle by moving from a lower 
fidelity AADL dependability model to a detailed one. In 
some cases, not all components having associated error 
models are part of the AADL dependability model. The 
AADL Error Model Annex offers two useful abstraction 
options for error models of components composed of 
subcomponents: 
− The first option is to declare an abstract error model 
for a system component. In this case, the 
corresponding component is seen as a black box (i.e., 
the detailed subcomponents’ error models are not part 
of the AADL dependability model). This option is 
useful to abstract away modeling details in case an 
architecture model with too detailed error models 
associated with components does exist for other 
purposes. Issues linked to the relationship between 
abstract and concrete error models have been 
mentioned in [13].  
− The second option is to define the state of a system 
component as a function of its subcomponents’ states. 
This option can be used to specify state classes for 
the overall application. These classes are useful in the 
evaluation of measures. If the user wishes to evaluate 
reliability or availability, it is necessary to specify the 
system states that are to be considered as failed states. 
If in addition, the user wishes to evaluate safety, it is 
necessary to specify the system states that are 
considered as catastrophic. 
5. Example 
In this section we illustrate our modeling approach on a 
small software architecture representing a process whose 
functional role is to compute a result. The computation is 
divided in three sub computations, each of them being 
performed by a thread. The thread Compute2 uses the 
result obtained by the thread Compute1 and the thread 
Compute3 uses the result obtained by the thread 
Compute2 to compute the result expected from the 
process. The three threads are connected through data 
connections according to the pipe and filter architectural 
style [14]. Due to space limitations, we only take into 
account two dependencies: 
− An architecture-based dependency between the 
computing threads: a failure in one of the computing 
threads may cause the failure of the following thread 
(with a probability p). In some cases, cascading 
failures can occur. 
−  A recovery dependency: Compute3 can only recover 
if Compute1 and Compute2 are error free. We assume 
that Compute2 can recover if Compute1 is not error 
free. 
The AADL dependability model of this application is 
shown in Figure 5 using the AADL graphical notation. 
Figure 5. AADL dependability model 
The AADL dependability model of this application is 
built in three iterations. The computing threads’ behavior 
in the presence of their own fault and recovery events is 
represented in the first iteration. The propagation KO 
together with corresponding transitions are added in a 
second iteration to represent the architecture-based 
dependency. The thread Compute1 can have an impact on 
Compute2 and Compute2 can have an impact on 
Compute3. We remind that the opposite is not possible, as 
the connections between threads are unidirectional. The 
recovery dependency is modeled in the third iteration. It 
requires the existence of a Recovery thread in the 
architecture model (see light grey part of Figure 5). Its 
role is to send (through the out port to3) a 
RecoverAuthorize propagation to Compute3 if Compute1 
and Compute2 are error free. 
Figure 6-a shows the error model Comp.general 
associated with threads Compute1 and Compute2. Figure 
6-b shows the error model Comp3.general associated with 
the threads Compute3. The three iterations are 
highlighted. Each line tagged with a (+) sign is added to 
the error model corresponding to the previous iteration 
while each line tagged with a (-) sign is removed from it 
during the current iteration. The first and second iterations 
are the same for all three computing threads. In the third 
iteration, it is necessary to distinguish between a failed 
state and a failed state from which Compute3 is 
authorized to restart. This leads to removing a transition 
declared in the first iteration, and adding a state 
(CanRecover) and two transitions linking it to the state 
machine. 
Figure 7 shows the Guard_Out property applied to port 
to3 of the Recovery thread in the third iteration. This 
property specifies that a RecoverAuthorize propagation is 
sent to Compute3 through port to3 when OK propagations 
are received through ports in1 and in2 (meaning that 
Compute1 and Compute2 are error free). The Recovery 
thread has an associated error model that is not shown 
here. It declares in and out propagations used in the 
Guard_Out property. 
The main idea of this method is to verify and validate the 
model at each iteration. If a problem arises during 
iteration i, only the part of the current AADL 
dependability model corresponding to iteration i is 
questioned. Thus, the validation process is facilitated 
especially in the context of complex systems. 
6. Conclusion 
This paper presented an iterative approach for system 
dependability modeling using AADL. This approach is 
meant to ease the task of analyzing dependability 
characteristics and evaluating dependability measures for 
the AADL users community. Our approach assists the 
user in the structured construction of the AADL 
dependability model (i.e., architecture model and 
dependability-related information). To support and trace 
model evolution, this approach proposes that the user 
builds the model iteratively. Components’ behaviors in 
the presence of faults are modeled in the first iteration as 
if they were isolated. Then, each iteration introduces a 
new dependency between system components. Error 
models representing the behavior of several types of 
system components and several types of dependencies 
may be placed in a library and then instantiated to 
minimize the modeling effort and maximize the 
reusability of models. 
The OSATE toolset is able to support our modeling 
approach. It also allows choosing component models and 
error models from libraries. For the sake of illustration, 
we used simple examples in this paper. We have already 
applied the iterative modeling approach to a system with 
multiple dependencies in [12] and we plan to validate it 
against other complex case studies. 
Error Model Type [Comp] 
error model Comp 
features 
-- iteration 1 
(+) Error_Free: initial error state; 
(+) Failed: error state; 
(+) Fail: error event  
(+) {Occurrence => Poisson λ}; 
(+) Recover: error event 
(+) {Occurrence => Poisson µ}; 
-- iteration 2 
(+) KO: in out error propagation  
(+) {Occurrence => fixed p}; 
-- iteration 3 
(+) OK: out error propagation   
(+) {Occurrence => fixed 1}; 
end Comp; 
 Error Model Type [Comp3] 
error model Comp3 
features 
-- iteration 1 
(+) Error_Free: initial error state; 
(+) Failed: error state; 
(+) Fail: error event  
(+) {Occurrence => Poisson λ}; 
(+) Recover: error event 
(+) {Occurrence => Poisson µ}; 
-- iteration 2 
(+) KO: in out error propagation  
(+) {Occurrence => fixed p}; 
-- iteration 3 
(+) CanRecover: error state; 
(+) OK: in error propagation; 
end Comp3; 
Error Model Implementation [Comp.general] 
error model implementation Comp.general 
transitions 
-- iteration 1 
(+) Error_Free-[Fail]->Failed; 
(+) Failed-[Recover]->Error_Free; 
-- iteration 2 
(+) Error_Free-[in KO]->Failed; 
(+) Failed-[out KO]->Failed; 
-- iteration 3 
(+) Error_Free-[out OK]->Error_Free; 
end Comp.general; 
 Error Model Implementation [Comp3.general] 
error model implementation Comp3.general 
transitions 
-- iteration 1 
(+) Error_Free-[Fail]->Failed; 
(+) Failed-[Recover]->Error_Free; 
-- iteration 2 
(+) Error_Free-[in KO]->Failed; 
(+) Failed-[out KO]->Failed; 
-- iteration 3 
(-) Failed-[Recover]->Error_Free; 
(+) Failed-[RecoverAuthorize]->CanRecover; 
(+) CanRecover-[Recover]->Error_Free; 
end Comp3.general; 
a: Error Model for Compute1 and Compute2   b: Error Model for Compute3 
Figure 6. Error model for Compute1 / Compute2 
Guard_Out [port Recovery.to3] 
-- iteration 3 
(+) Guard_Out => 
(+) RecoverAuthorize when    
(+) (from1[OK]and from2[OK]) 
(+) mask when others 
(+) applies to to3; 
Figure 7. Guard_Out property (port Recovery.to3) 
Acknowledgements 
This work is partially supported by 1) the European Commission 
(European integrated project ASSERT No. IST 004033 and 
network of excellence ReSIST No. IST 026764). and 2) the 
European Social Fund. 
References 
[1] N. Medvidovic and R. N. Taylor, A classification and 
comparison framework for Software Architecture 
Description Languages, IEEE Transactions on Software 
Engineering, 26, 2000, 70-93. 
[2] SAE-AS5506, Architecture Analysis and Design Language, 
Society of Automotive Engineers, 2004. 
[3] SAE-AS5506/1, Architecture Analysis and Design 
Language (AADL) Annex Volume 1, Annex E: Error 
Model Annex, Society of Automotive Engineers, 2006. 
[4] J.-M. Farines, et al., The Cotre project: rigorous software 
development for real time systems in avionics, 27th 
IFAC/IFIP/IEEE Workshop on Real Time Programming, 
Zielona Gora, Poland, 2003. 
[5] R. Allen and D. Garlan, A Formal Basis for Architectural 
Connection, ACM Transactions on Software Engineering 
and Methodology, 6, 1997, 213-249. 
[6] M. Bernardo, P. Ciancarini, and L. Donatiello, Architecting 
Families of Software Systems with Process Algebras, ACM 
Transactions on Software Engineering and Methodology, 
11, 2002, 386-426. 
[7] A. Bondavalli, et al., Dependability Analysis in the Early 
Phases of UML Based System Design, Int. Journal of 
Computer Systems - Science & Engineering, 16, 2001, 265-
275. 
 [8] S. Bernardi, S. Donatelli, and J. Merseguer, From UML 
Sequence Diagrams and Statecharts to analysable Petri Net 
models, 3rd Int. Workshop on Software and Performance 
(WOSP 2002), Rome, Italy, 2002, ,35-45. 
[9] P. King and R. Pooley, Using UML to Derive Stochastic 
Petri Net Models, 15th annual UK Performance 
Engineering Workshop, 1999, 45-56. 
[10] P. H. Feiler, et al., Pattern-Based Analysis of an Embedded 
Real-time System Architecture, 18th IFIP World Computer 
Congress, ADL Workshop, Toulouse, France, 2004, 83-91. 
[11] K. Kanoun and M. Borrel, Fault-tolerant systems 
dependability. Explicit modeling of hardware and software 
component-interactions, IEEE Transactions on Reliability, 
49, 2000, 363-376. 
[12] A. E. Rugina, K. Kanoun, and M. Kaâniche, AADL-based 
Dependability Modelling, LAAS-CNRS Research Report 
n°06209, April 2006, 85p. 
[13] P. Binns and S. Vestal, Hierarchical composition and 
abstraction in architecture models, 18th IFIP World 
Computer Congress, ADL Workshop, Toulouse, France, 
2004, 43-52. 
[14] M. Shaw and D. Garlan, Software Architecture: 
Perspectives on an Emerging Discipline (Prentice-Hall, 
1996).
ABSTRACT
  For efficiency reasons, the software system designers' will is to use an
integrated set of methods and tools to describe specifications and designs, and
also to perform analyses such as dependability, schedulability and performance.
AADL (Architecture Analysis and Design Language) has proved to be efficient for
software architecture modeling. In addition, AADL was designed to accommodate
several types of analyses. This paper presents an iterative dependency-driven
approach for dependability modeling using AADL. It is illustrated on a small
example. This approach is part of a complete framework that allows the
generation of dependability analysis and evaluation models from AADL models to
support the analysis of software and system architectures, in critical
application domains.

<|endoftext|><|startoftext|>
Introduction
Let X be a compact connected Kähler manifold of dimension n ∈ N∗.
Throughout the article ω denotes a smooth closed form of bidegree (1, 1)
which is nonnegative and big, i.e. such that
ωn > 0. We continue the
study started in [GZ 2], [EGZ] of the complex Monge-Ampère equation
(MA)µ (ω + dd
cϕ)n = µ,
where ϕ, the unknown function, is ω-plurisubharmonic: this means that
ϕ ∈ L1(X) is upper semi-continuous and ω+ ddcϕ ≥ 0 is a positive current.
We let PSH(X,ω) denote the set of all such functions (see [GZ 1] for their
basic properties). Here µ is a fixed positive Radon measure of total mass
µ(X) =
ωn, and d = ∂ + ∂, dc = 1
(∂ − ∂).
Following [GZ 2] we say that a ω-plurisubharmonic function ϕ has fi-
nite weighted Monge-Ampère energy, ϕ ∈ E(X,ω), when its Monge-Ampère
measure (ω+ ddcϕ)n is well defined, and there exists an increasing function
χ : R− → R− such that χ(−∞) = −∞ and χ ◦ ϕ ∈ L1((ω + ddcϕ)n).
In general χ has very slow growth at infinity, so that ϕ is far from being
bounded.
The purpose of this article is twofold. First we extend one of the main
results of [GZ 2] by showing
THEOREM A. There exists ϕ ∈ E(X,ω) such that µ = (ω+ddcϕ)n if and
only if µ does not charge pluripolar sets.
This results has been established in [GZ 2] when ω is a Kähler form. It
is important for applications to complex dynamics and Kähler geometry to
consider as well forms ω that are less positive (see [EGZ]).
http://arxiv.org/abs/0704.0866v2
2 S.BENELKOURCHI & V.GUEDJ & A.ZERIAHI
We then look for conditions on the measure µ which insure that the
solution ϕ is almost bounded. Following the seminal work of S. Kolodziej
[K 2,3], we say that µ is dominated by the Monge-Ampère Capacity Capω
if there exists a function F : R+ → R+ such that limt→0+ F (t) = 0 and
(†) µ(K) ≤ F (Capω(K)), for all Borel subsets K ⊂ X.
Here Capω denotes the global version of the Monge-Ampère capacity intro-
duced by E.Bedford and A.Taylor [BT] (see section 2).
Observe that µ does not charge pluripolar sets since F (0) = 0. When
F (x) . xα vanishes at order α > 1 and ω is Kähler, S. Kolodziej has
proved [K 2] that the solution ϕ ∈ PSH(X,ω) of (MA)µ is continuous. The
boundedness part of this result was extended in [EGZ] to the case when
ω is merely big and nonnegative. If F (x) . xα with 0 < α < 1, two of
us have proved in [GZ 2] that the solution ϕ has finite χ−energy, where
χ(t) = −(−t)p, p = p(α) > 0. This result was first established by U. Cegrell
in a local context [Ce].
Another objective of this article is to fill in the gap inbetween Cegrell’s
and Kolodziej’s results, by considering all intermediate dominating functions
F. Write Fε(x) = x[ε(− ln(x)/n)]
n where ε : R → [0,∞[ is nonincreasing.
Our second main result is:
THEOREM B. If µ(K) ≤ Fε(Capω(K)) for all Borel subsets K ⊂ X,
then µ = (ω + ddcϕ)n where ϕ ∈ PSH(X,ω) satisfies supX ϕ = 0 and
Capω(ϕ < −s) ≤ exp(−nH
−1(s)).
Here H−1 is the reciprocal function of H(x) = e
ε(t)dt + s0, where s0 =
s0(ε, ω) ≥ 0 only depends on ε and ω.
This general statement has several useful consequences:
ε(t)dt < +∞, thenH−1(s) = +∞ for s ≥ s∞ := e
ε(t)dt+
s0, hence Capω(ϕ < −s) = 0. This means that ϕ is bounded from
below by −s∞. This result is due to S. Kolodziej [K 2,3] when ω is
Kähler, and [EGZ] when ω ≥ 0 is merely big;
• the condition (†) is easy to check for measures with density in Lp, p >
1. Our result thus gives a simple proof (Corollary 3.2), following the
seminal approach of S. Kolodziej ([K2]), of the C0-a priori estimate
of S.T. Yau [Y], which is crucial for proving the Calabi conjecture
(see [T] for an overview);
• when
ε(t)dt = +∞, the solution ϕ is generally unbounded. The
faster ε(t) decreases towards zero, the faster the growth of H−1 at
infinity, hence the closer is ϕ from being bounded;
• the special case ε ≡ 1 is of particular interest. Here µ(·) ≤ Capω(·),
and our result shows that Capω(ϕ < −s) decreases exponentially
fast, hence ϕ has “ loglog-singularities”. These are the type of sin-
gularities of the metrics used in Arakelov geometry in relation with
measures µ = fdV whose density has Poincaré-type singularities
(see [Ku], [BKK]).
We prove Theorem B in section 3, after establishing Theorem A in section
2.1 and recalling some useful facts from [GZ 2], [EGZ] in section 2.2. We
A PRIORI ESTIMATES FOR SOLUTIONS OF MONGE-AMPÈRE EQUATIONS 3
then test the sharpness of our estimates in section 4, where we give examples
of measures fulfilling our assumptions: these are absolutely continuous with
respect to ωn, and their density do not belong to Lp, for any p > 1.
2. Weakly singular quasiplurisubharmonic functions
The class E(X,ω) of ω-psh functions with finite weighted Monge-Ampère
energy has been introduced and studied in [GZ 2]. It is the largest subclass
of PSH(X,ω) on which the complex Monge-Ampère operator (ω+ddc·)n is
well-defined and the comparison principle is valid. Recall that ϕ ∈ E(X,ω)
if and only if (ω + ddcϕj)
n(ϕ ≤ −j) → 0, where ϕj := max(ϕ,−j).
2.1. The range of the Monge-Ampère operator. The range of the
operator (ω + ddc·)n acting on E(X,ω) has been characterized in [GZ 2]
when ω is a Kähler form. We extend here this result to the case when ω is
merely nonnegative and big.
Theorem 2.1. Assume ω is a smooth closed nonnegative (1,1) form on X,
and µ is a positive Radon measure such that µ(X) =
ωn > 0.
Then there exists ϕ ∈ E(X,ω) such that µ = (ω + ddcϕ)n if and only if µ
does not charge pluripolar sets.
Proof. We can assume without loss of generality that µ and ω are normalized
so that µ(X) =
ωn = 1. Consider, for A > 0,
CA(ω) := {ν probability measure / ν(K) ≤ A · Capω(K), for all K ⊂ X},
where Capω denotes the Monge-Ampère capacity introduced by E.Bedford
and A.Taylor in [BT] (see [GZ 1] for this compact setting). Recall that
Capω(K) := sup
(ω + ddcu)n / u ∈ PSH(X,ω), 0 ≤ u ≤ 1
We first show that a measure ν ∈ CA(ω) is the Monge-Ampère of a func-
tion ψ ∈ Ep(X,ω), for any 0 < p < 1, where
Ep(X,ω) := {ψ ∈ E(X,ω) / ψ ∈ Lp
(ω + ddcψ)n
Indeed, fix ν ∈ CA(ω), 0 < p < 1, and ωj := ω + εjΩ, where Ω is
a kähler form on X, and εj > 0 decreases towards zero. Observe that
PSH(X,ω) ⊂ PSH(X,ωj), hence Capω(.) ≤ Capωj(.), so that ν ∈ CA(ωj).
It follows from Proposition 3.6 and 2.7 in [GZ 1] that there exists C0 > 0
such that for any v ∈ PSH(X,ωj) normalized by supX v = −1, we have
Capωj(v < −t) ≤
, for all t ≥ 1.
This yields Ep(X,ωj) ⊂ L
p(ν): if v ∈ Ep(X,ωj) with supX v = −1, then
(−v)pdν = p ·
tp−1ν(v < −t)dt
≤ pA ·
tp−1Capω(v < −t)dt+ Cp
+ Cp < +∞.
4 S.BENELKOURCHI & V.GUEDJ & A.ZERIAHI
It follows therefore from Theorem 4.2 in [GZ 2] that there exists ϕj ∈
Ep(X,ωj) with supX ϕj = −1 and (ωj+dd
n = cj ·ν, where cj =
ωnj ≥
1 decreases towards 1 as εj decreases towards zero. We can assume without
loss of generality that 1 ≤ cj ≤ 2. Observe that the ϕj ’s have uniformly
bounded energies, namely
(−ϕj)
p(ωj + dd
n ≤ 2
(−ϕj)
pdν ≤ 2
Since supX ϕj = −1, we can assume (after extracting a convergent subse-
quence) that ϕj → ϕ in L
1(X), where ϕ ∈ PSH(X,ω), supX ϕ = −1.
Set φj := (supl≥j ϕl)
∗. Thus φj ∈ PSH(X,ωj), and φj decreases towards
ϕ. Since φj ≥ ϕj , it follows from the “fundamental inequality” (Lemma 2.3
in [GZ 2]) that
(−φj)
p(ωj + dd
n ≤ 2n
(−ϕj)
p(ωj + dd
n ≤ C ′ < +∞.
Hence it follows from stability properties of the class Ep(X,ω) that ϕ ∈
Ep(X,ω) (see Proposition 5.6 in [GZ 2]). Moreover
(ωj + dd
n ≥ inf
(ωl + dd
n ≥ ν,
hence (ω + ddcϕ)n = lim(ωj + dd
n ≥ ν. Since
ωn = ν(X) = 1, this
yields ν = (ω + ddcϕ)n as claimed above.
We can now prove the statement of the theorem. One implication is
obvious: if µ = (ω+ddcϕ)n, ϕ ∈ E(X,ω), then µ does not charge pluripolar
sets, as follows from Theorem 1.3 in [GZ 2].
So we assume now µ that does not charge pluripolar sets. Since C1(ω) is
a compact convex set of probability measures which contains all measures
(ω + ddcu)n, u ∈ PSH(X,ω), 0 ≤ u ≤ 1, we can project µ onto C1(ω) and
get, by a generalization of Radon-Nikodym theorem (see [R], [Ce]),
µ = f · ν, ν ∈ C1(ω), 0 ≤ f ∈ L
1(ν).
Now ν = (ω + ddcψ)n for some ψ ∈ E1/2(X,ω), ψ ≤ 0, as follows from the
discussion above. Replacing ψ by eψ shows that we can actually assume ψ
to be bounded (see Lemma 4.5 in [GZ 2]). We can now apply line by line the
same proof as that of Theorem 4.6 in [GZ 2] to conclude that µ = (ω+ddcϕ)n
for some ϕ ∈ E(X,ω). �
2.2. High energy and capacity estimates. Given χ : R− → R− an
increasing function, we consider, following [GZ 2],
Eχ(X,ω) :=
ϕ ∈ E(X,ω) /
(−χ)(−|ϕ|) (ω + ddcϕ)n < +∞
Alternatively a function ϕ ≤ 0 belongs to Eχ(X,ω) if and only if
(−χ) ◦ ϕj (ω + dd
n < +∞, where ϕj := max(ϕ,−j)
is the canonical approximation of ϕ by bounded ω-psh functions. When
χ(t) = −(−t)p, Eχ(X,ω) is the class E
p(X,ω) used in previous section.
The properties of classes Eχ(X,ω) are quite different whether the weight
χ is convex (slow growth at infinity) or concave. In previous works [GZ 2],
A PRIORI ESTIMATES FOR SOLUTIONS OF MONGE-AMPÈRE EQUATIONS 5
two of us were mainly interested in weights χ of moderate growth at infinity
(at most polynomial). Our main objective in the sequel is to construct
solutions ϕ of (MA)µ which are “almost bounded”, i.e. in classes Eχ(X,ω)
for concave weights χ of arbitrarily high growth.
For this purpose it is useful to relate the property ϕ ∈ Eχ(X,ω) to the
speed of decreasing of Capω(ϕ < −t), as t → +∞. We set
Êχ(X,ω) :=
ϕ ∈ PSH(X,ω) /
tnχ′(−t)Capω(ϕ < −t)dt < +∞
An important tool in the study of classes Eχ(X,ω) are the “fundamental
inequalities” (Lemmas 2.3 and 3.5 in [GZ 2]), which allow to compare the
weighted energy of two ω-psh functions ϕ ≤ ψ. These inequalities are only
valid for weights of slow growth (at most polynomial), while they become
immediate for classes Êχ(X,ω). So are the convexity properties of Êχ(X,ω).
We summarize this and compare these classes in the following:
Proposition 2.2. The classes Êχ(X,ω) are convex and stable under maxi-
mum: if Êχ(X,ω) ∋ ϕ ≤ ψ ∈ PSH(X,ω), then ψ ∈ Êχ(X,ω).
One always has Êχ(X,ω) ⊂ Eχ(X,ω), while
Eχ̂(X,ω) ⊂ Êχ(X,ω), where χ
′(t− 1) = tnχ̂′(t).
Since we are mainly interested in the sequel in weights with (super)
fast growth at infinity, the previous proposition shows that Êχ(X,ω) and
Eχ(X,ω) are roughly the same: a function ϕ ∈ PSH(X,ω) belongs to one of
these classes if and only if Capω(ϕ < −t) decreases fast enough, as t→ +∞.
Proof. The convexity of Êχ(X,ω) follows from the following simple observa-
tion: if ϕ,ψ ∈ Êχ(X,ω) and 0 ≤ a ≤ 1, then
{aϕ+ (1− a)ψ < −t} ⊂ {ϕ < −t} ∪ {ψ < −t} .
The stability under maximum is obvious.
Assume ϕ ∈ Êχ(X,ω). We can assume without loss of generality ϕ ≤ 0
and χ(0) = 0. Set ϕj := max(ϕ,−j). It follows from Lemma 2.3 below that
(−χ) ◦ ϕj (ω + dd
χ′(−t)(ω + ddcϕj)
n(ϕj < −t)dt
χ′(−t)tnCapω(ϕ < −t)dt < +∞,
This shows that ϕ ∈ Eχ(X,ω). The other inclusion goes similarly, using
the second inequality in Lemma 2.3 below. �
If ϕ ∈ Eχ(X,ω) (or Êχ(X,ω)), then the bigger the growth of χ at −∞,
the smaller Capω(ϕ < −t) when t → +∞, hence the closer ϕ is from being
bounded. Indeed ϕ ∈ PSH(X,ω) is bounded iff it belongs to Eχ(X,ω) for
all weights χ, as was observed in [GZ 2], Proposition 3.1. Similarly
PSH(X,ω) ∩ L∞(X) =
Êχ(X,ω),
where the intersection runs over all concave increasing functions χ.
We will make constant use of the following result:
6 S.BENELKOURCHI & V.GUEDJ & A.ZERIAHI
Lemma 2.3. Fix ϕ ∈ E(X,ω). Then for all s > 0 and 0 ≤ t ≤ 1,
tnCapω(ϕ < −s− t) ≤
(ϕ<−s)
(ω + ddcϕ)n ≤ snCapω(ϕ < −s),
where the second inequality is true only for s ≥ 1.
The proof is a direct consequence of the comparison principle (see Lemma
2.2 in [EGZ] and [GZ 2]).
3. Measures dominated by capacity
From now on µ denotes a positive Radon measure on X whose total mass
is V olω(X): this is an obvious necessary condition in order to solve (MA)µ.
To simplify numerical computations, we assume in the sequel that µ and ω
have been normalized so that
µ(X) = V olω(X) =
ωn = 1.
When µ = ehωn is a smooth volume form and ω is a Kähler form, S.T.Yau
has proved [Y] that (MA)µ admits a unique smooth solution ϕ ∈ PSH(X,ω)
with supX ϕ = 0. Smooth measures are easily seen to be nicely dominated
by the Monge-Ampère capacity (see the proof of Corollary 3.2 below).
Measures dominated by the Monge-Ampère capacity have been exten-
sively studied by S.Kolodziej in [K 2,3,4]. Following S. Kolodziej ([K3], [K4])
with slightly different notations, fix ε : R → [0,∞[ a continuous decreasing
function and set
Fε(x) := x[ε(− lnx/n)]
n, x > 0.
We will consider probability measures µ satisfying the following condition :
for all Borel subsets K ⊂ X,
µ(K) ≤ Fε(Capω(K)).
The main result achieved in [K 2], can be formulated as follows: If ω is a
Kähler form and
ε(t)dt < +∞ then µ = (ω + ddcϕ)n for some contin-
uous function ϕ ∈ PSH(X,ω).
The condition
ε(t)dt < +∞ means that ε decreases fast enough
towards zero at infinity. This gives a quantitative estimate on how fast
ε(− lnCapω(K)/n), hence µ(K), decreases towards zero as Capω(K) → 0.
ε(t)dt = +∞, it follows from Theorem 2.1 that µ = (ω +
ddcϕ)n for some function ϕ ∈ E(X,ω), but ϕ will generally be unbounded.
Our second main result measures how far ϕ is from being bounded:
Theorem 3.1. Assume for all compact subsets K ⊂ X,
(3.1) µ(K) ≤ Fε(Capω(K)).
Then µ = (ω + ddcϕ)n where ϕ ∈ E(X,ω) is such that supX ϕ = 0 and
Capω(ϕ < −s) ≤ exp(−nH
−1(s)), for all s > 0.
Here H−1 is the reciprocal function of H(x) = e
ε(t)dt + s0, where s0 =
s0(ε, ω) ≥ 0 is a constant which only depends on ε and ω.
In particular ϕ ∈ Eχ(X,ω) where −χ(−t) = exp(nH
−1(t)/2).
A PRIORI ESTIMATES FOR SOLUTIONS OF MONGE-AMPÈRE EQUATIONS 7
Recall that here, and troughout the article, ω ≥ 0 is merely big.
Before proving this result we make a few observations.
• It is interesting to consider as well the case when ε(t) increases to-
wards +∞. One can then obtain solutions ϕ such that Capω(ϕ <
−t) decreases at a polynomial rate. When e.g. ω is Kähler and
µ(K) ≤ Capω(K)
α, 0 < α < 1, it follows from Proposition 5.3
in [GZ 2] that µ = (ω + ddcϕ)n where ϕ ∈ Ep(X,ω) for some
p = pα > 0. Here E
p(X,ω) denotes the Cegrell type class Eχ(X,ω),
with χ(t) = −(−t)p.
• When ε(t) ≡ 1, Fε(x) = x and H(x) ≍ e.x. Thus Theorem 3.1 reads
µ ≤ Capω ⇒ µ = (ω + dd
cϕ)n, where
Capω(ϕ < −s) . exp (−ns/e) .
This is precisely the rate of decreasing corresponding to functions
which look locally like − log(− log ||z||), in some local chart z ∈
U ⊂ Cn. This class of ω-psh functions with “loglog-singularities” is
important for applications (see [Ku], [BKK]).
• If ε(t) decreases towards zero, then Capω(ϕ < −t) decreases at a
superexponential rate. The faster ε(t) decreases towards zero, the
slower the growth of H, hence the faster the growth of H−1 at infin-
ity. When
ε(t)dt < +∞, the function ε decreases so fast that
Capω(ϕ < −t) = 0 for t >> 1, thus ϕ is bounded. This is the case
when µ(K) ≤ Capω(K)
α for some α > 1 [K 2], [EGZ].
• When
ε(t)dt = +∞, the solution ϕmay well be unbounded (see
Examples in section 4). At the critical case where µ ≤ Fε(Capω) for
all functions ε such that
ε(t)dt = +∞, we obtain
µ = (ω + ddcϕ)n with ϕ ∈ PSH(X,ω) ∩ L∞(X),
as follows from Proposition 3.1 in [GZ 2]. This partially explains the
difficulty in describing the range of Monge-Ampère operators on the
set of bounded (quasi-)psh functions.
Proof. The assumption on µ implies in particular that it vanishes on pluripo-
lar sets. It follows from Theorem 2.1 that there exists a function ϕ ∈ E(X,ω)
such that µ = (ω + ddcϕ)n and supX ϕ = 0. Set
g(s) := −
logCapω(ϕ < −s), ∀s > 0.
The function g is increasing on [0,+∞] and g(+∞) = +∞, since Capω
vanishes on pluripolar sets. Observe also that g(s) ≥ 0 for all s ≥ 0, since
g(0) = −
logCapω(X) = −
log V olω(X) = 0.
It follows from Lemma 2.3 and (3.1) that for all s > 0 and 0 ≤ t ≤ 1,
tnCapω(ϕ < −s− t) ≤ µ(ϕ < −s) ≤ Fε (Capω(ϕ < −s)) .
Therefore for all s > 0 and 0 ≤ t ≤ 1,
(3.2) log t− log ε ◦ g(s) + g(s) ≤ g(s + t).
We define an increasing sequence (sj)j∈N by induction setting
sj+1 = sj + eε ◦ g(sj), for all j ∈ N.
8 S.BENELKOURCHI & V.GUEDJ & A.ZERIAHI
The choice of s0. Recall that (3.2) is only valid for 0 ≤ t ≤ 1. We choose
s0 ≥ 0 large enough so that
(3.3) e.ε ◦ g(s0) ≤ 1.
This will allow us to use (3.2) with t = tj = sj+1 − sj ∈ [0, 1], since ε ◦ g is
decreasing, while sj ≥ s0 is increasing, hence
0 ≤ tj = eε ◦ g(sj) ≤ eε ◦ g(s0) ≤ 1.
We must insure that s0 = s0(ε, ω) can chosen to be independent of ϕ. This
is a consequence of Proposition 2.7 in [GZ 1]: since supX ϕ = 0, there exists
c1(ω) > 0 so that 0 ≤
(−ϕ)ωn ≤ c1(ω), hence
g(s) := −
logCapω(ϕ < −s) ≥
log s−
log(n+ c1(ω)).
Therefore g(s0) ≥ ε
−1(1/e) for s0 = s0(ε, ω) := (n+ c1(ω)) exp(nε
−1(1/e)),
which is independent of ϕ. This yields e.ε ◦ g(s0) ≤ 1, as desired.
The growth of sj. We can now apply (3.2) and get g(sj) ≥ j + g(s0) ≥ j.
Thus lim g(sj) = +∞. There are two cases to be considered.
If s∞ = lim sj ∈ R
+, then g(s) ≡ +∞ for s > s∞, i.e. Capω(ϕ < −s) =
0, ∀s > s∞. Therefore ϕ is bounded from below by −s∞, in particular
ϕ ∈ Eχ(X,ω) for all χ.
Assume now (second case) that sj → +∞. For each s > 0, there exists
N = Ns ∈ N such that sN ≤ s < sN+1. We can estimate s 7→ Ns:
s ≤ sN+1 =
(sj+1 − sj) + s0 =
e ε ◦ g(sj) + s0
ε(j) + s0 ≤ e.ε(0) + e
ε(t)dt+ s0 =: H(N),
Therefore H−1(s) ≤ N ≤ g(sN ) ≤ g(s), hence
Capω(ϕ < −s) ≤ exp(−nH
−1(s)).
Set now −χ(−t) = exp(nH−1(t)/2). Then
tnχ′(−t)Capω(ϕ < −t)dt
ε(H−1(t)) + s̃0
exp(−nH−1(t)/2)dt
tn exp(−nt/2)dt < +∞.
This shows that ϕ ∈ Eχ(X,ω) where χ(t) = − exp(nH
−1(−t)/2).
It follows from the proof above that when
ε(t)dt < +∞, the solution
ϕ is bounded since in this case we have
s∞ := lim
sj ≤ s0(ε, ω) + e ε(0) + e
ε(t)dt < +∞
where s0(ε, ω) is an absolute constant satisfying (3.3) (see above). �
A PRIORI ESTIMATES FOR SOLUTIONS OF MONGE-AMPÈRE EQUATIONS 9
Let us emphasize that Theorem 3.1 also yields a slightly simplified proof of
the following result [K 2], [EGZ]: if µ(K) ≤ Fε(Capω(K)) for some decreas-
ing function ε : R → R+ such that
ε(t)dt < +∞, then the sequence
(sj) above is convergent, hence µ = (ω + dd
cϕ)n, where ϕ ∈ PSH(X,ω) is
bounded. For the reader’s convenience we indicate a proof of the following
important particular case:
Corollary 3.2. Let µ = fωn be a measure with density 0 ≤ f ∈ Lp(ωn),
where p > 1 and
fωn =
ωn. Then there exists a unique bounded
function ϕ ∈ PSH(X,ω) such that (ω + ddcϕ)n = µ, supX ϕ = 0 and
0 ≤ ||ϕ||L∞(X) ≤ C(p, ω).||f ||
Lp(ωn)
where C(p, ω) > 0 only depends on p and ω.
This a priori bound is a crucial step in the proof by S.T.Yau of the Calabi
conjecture (see [Ca], [Y], [A], [T], [Bl]). The proof presented here follows
Kolodziej’s new and decisive pluripotential approach (see [K2]). Let us stress
that the dependence ω 7−→ C(p, ω) is quite explicit, as we shall see in the
proof. This is important when considering degenerate situations [EGZ].
Proof. We claim that there exists C1(ω) such that
(3.4) µ(K) ≤
C1(ω)||f ||
Lp(ωn)
[Capω(K)]
, for all Borel sets K ⊂ X.
Assuming this for the moment, we can apply Theorem 3.1 with ε(x) =
C1(ω)||f ||
Lp(ωn)
exp(−x), which yields, as observed at the end of the proof
of Theorem 3.1
||ϕ||L∞(X) ≤M(f, ω),
whereM(f, ω) := s0(ε, ω)+e ε(0)+e
ε(t)dt = s0(ε, ω)+2eC1(ω)||f ||
Lp(ωn)
and s0 = s0(ε, ω) is a large number s0 > 1 satisfying the inequality (3.3).
In order to give the precise dependence of the uniform bound M(f, ω) on
the Lp−norm of the density f , we need to choose s0 more carefully. Observe
that condition (3.3) can be written
Capω({ϕ ≤ −s0}) ≤ exp(−nε
−1(1/e).
Since nε−1(1/e) = log
enC1(ω)
n‖f‖Lp(ωn)
, we must choose s0 > 0 so that
(3.5) Capω({ϕ ≤ −s0}) ≤
enC1(ω)n‖f‖Lp(ωn)
We claim that for any N ≥ 1 there exists a uniform constant C2(N, p, ω) >
0 such that for any s > 0,
(3.6) Capω({ϕ ≤ −s}) ≤ C2(N, p) s
−N ‖f‖Lp(ωn).
Indeed observe first that by Hölder inequality,
(−ϕ)Nωnϕ =
(−ϕ)Nfωn ≤ ‖f‖Lp(ωn)‖ϕ‖
LNq (ωn)
10 S.BENELKOURCHI & V.GUEDJ & A.ZERIAHI
Since ϕ belongs to the compact family {ψ ∈ PSH(X,ω); supX ψ = 0}
([GZ2]), there exists a uniform constant C ′2(N, p, ω) > 0 such that ‖ϕ‖
LNq(ωn)
C ′2(N, p, ω), hence
(−ϕ)Nωnϕ ≤ C
2(N, p, ω)‖f‖Lp(ωn).
Fix u ∈ PSH(X,ω) with −1 ≤ u ≤ 0 and N ≥ 1 to be specified later. If
follows from Tchebysheff and energy inequalities ([GZ2]) that
{ϕ≤−s}
(ω + ddcu)n ≤ s−N
(−ϕ)N (ω + ddcu)n
≤ cN s
−N max
(−ϕ)Nωnϕ,
(−u)Nωnu
≤ cN s
−N max
C ′2(N, p, ω), 1
‖f‖Lp(ωn).
We have used here the fact that ‖f‖Lp(ωn) ≥ 1, which follows from the
normalization : 1 =
fωn ≤ ‖f‖Lp(ωn). This proves the claim.
SetN = 2n, it follows from (3.6) that s0 := C1(ω)
nenC2(2n, p, ω)‖f‖
Lp(ωn)
satisfies the required condition (3.5), which implies the estimate of the the-
orem.
We now establish the estimate (3.4). Observe first that Hölder’s inequality
yields
(3.7) µ(K) ≤ ||f ||Lp(ωn) [V olω(K)]
, where 1/p + 1/q = 1.
Thus it suffices to estimate the volume V olω(K). Recall the definition of
the Alexander-Taylor capacity, Tω(K) := exp(− supX VK,ω), where
VK,ω(x) := sup{ψ(x) /ψ ∈ PSH(X,ω), ψ ≤ 0 on K}.
This capacity is comparable to the Monge-Ampère capacity, as was observed
by H.Alexander and A.Taylor [AT] (see Proposition 7.1 in [GZ 1] for this
compact setting):
(3.8) Tω(K) ≤ e exp
Capω(K)
It thus remains to show that V olω(K) is suitably bounded from above by
Tω(K). This follows from Skoda’s uniform integrability result: set
ν(ω) := sup {ν(ψ, x) /ψ ∈ PSH(X,ω), x ∈ X} ,
where ν(ψ, x) denotes the Lelong number of ψ at point x. This actually
only depends on the cohomology class {ω} ∈ H1,1(X,R). It is a standard
fact that goes back to H.Skoda (see [Z]) that there exists C2(ω) > 0 so that
ωn ≤ C2(ω),
for all functions ψ ∈ PSH(X,ω) normalized by supX ψ = 0. We infer
(3.9) V olω(K) ≤
V ∗K,ω
ωn ≤ C2(ω)[Tω(K)]
1/ν(ω).
A PRIORI ESTIMATES FOR SOLUTIONS OF MONGE-AMPÈRE EQUATIONS 11
It now follows from (3.7), (3.8), (3.9), that
µ(K) ≤ ||f ||Lp [C2(ω)]
1/qe1/qν(ω) exp
qν(ω)Capω(K)
The conclusion follows by observing that exp(−1/x1/n) ≤ Cnx
2 for some
explicit constant Cn > 0. �
4. Examples
4.1. Measures invariant by rotations. In this section we produce exam-
ples of radially invariant functions/measures which show that our previous
results are essentially sharp. The first example is due to S.Kolodziej [K 1].
Example 4.1. We work here on the Riemann sphere X = P1(C), with
ω = ωFS, the Fubini-Study volume form. Consider µ = fω a measure with
density f which is smooth and positive on X \ {p}, and such that
f(z) ≃
|z|2(log |z|)2
, c > 0,
in a local chart near p = 0. A simple computation yields µ = ω + ddcϕ,
where ϕ ∈ PSH(P1, ω) is smooth in P1 \ {p} and ϕ(z) ≃ −c′ log(− log |z|)
near p = 0, c′ > 0, hence
logCapω(ϕ < −t) ≃ −t,
Here a ≃ b means that a/b is bounded away from zero and infinity.
This is to be compared to our estimate logCapω(ϕ < −t) . −t/e (Theo-
rem 3.1 ) which can be applied, as it was shown by S.Kolodziej in [K 1] that
µ . Capω. Thus Theorem 3.1 is essentially sharp when ε ≡ 1.
We now generalize this example and show that the estimate provided by
Theorem 3.1 is essentially sharp in all cases.
Example 4.2. Fix ε as in Theorem 3.1. Consider µ = fω on X = P1(C),
where ω = ωFS is the Fubini-Study volume form, f ≥ 0 is continuous on
1 \ {p}, and
f(z) ≃
ε(log(− log |z|))
|z|2(log |z|)2
in local coordinates near p = 0. Here ε : R → R+ decreases towards 0 at
+∞. We claim that there exists A > 0 such that
(4.1) µ(K) ≤ ACapω(K)ε(− logCapω(K)), for all K ⊂ X.
This is clear outside a small neighborhood of p = 0 since the measure µ
is there dominated by a smooth volume form. So it suffices to establish this
estimate when K is included in a local chart near p = 0. Consider
K̃ := {r ∈ [0, R] ; K ∩ {|z| = r} 6= ∅}.
It is a classical fact (see e.g. [Ra]) that the logarithmic capacity c(K) of
K can be estimated from below by the length of K̃, namely
l(K̃)
≤ c(K̃) ≤ c(K).
12 S.BENELKOURCHI & V.GUEDJ & A.ZERIAHI
Using that ε is decreasing, hence 0 ≤ −ε′, we infer
µ(K) ≤ 2π
∫ l(K̃)
f(r)rdr
∫ l(K̃)
ε(log(− log r))− ε′(log− log r)
r(log r)2
ε(log(− log l(K̃)))
− log l(K̃)
ε(log(− log 4c(K)))
− log 4c(K)
Recall now that the logarithmic capacity c(K) is equivalent to Alexander-
Taylor’s capacity T∆(K), which in turn is equivalent to the global Alexander-
Taylor capacity Tω(K) (see [GZ 1]): c(K) ≃ T∆(K) ≃ Tω(K). The Alexander-
Taylor’s comparison theorem [AT] reads
− log 4c(K) ≃ − log Tω(K) ≃ 1/Capω(K),
thus µ(K) ≤ ACapω(K)ε(− logCapω(K)).
We can therefore apply Theorem 3.1. It guarantees that µ = (ω + ddcϕ),
where ϕ ∈ PSH(P1, ω) satisfies logCapω(ϕ < −s) ≃ −nH
−1(s), with
H(s) = eA
ε(t)dt + s0. On the other hand a simple computation shows
that ϕ is continuous in P1 \ {p} and
ϕ ≃ −H(log(− log |z|)) , near p = 0.
The sublevel set (ϕ < −t) therefore coincides with the ball of radius
exp(− exp(H−1(t))), hence logCapω(ϕ < −s) ≃ −H
−1(s).
4.2. Measures with density. Here we consider the case when µ = fdV
is absolutely continuous with respect to a volume form.
Proposition 4.3. Assume µ = fωn is a probability measure whose density
satisfies f [log(1 + f)]n ∈ L1(ωn). Then µ . Capω.
More generally if f [log(1 + f)/ε(log(1 + | log f |))]n ∈ L1(ωn) for some
continuous decreasing function ε : R → R+∗ , then for all K ⊂ X,
µ(K) ≤ Fε(Capω(K)), where Fε(x) = Ax
, A > 0.
Proof. With slightly different notations, the proof is identical to that of
Lemma 4.2 in [K 4] to which we refer the reader. �
We now give examples showing that Proposition 4.3 is almost optimal.
Example 4.4. For simplicity we give local examples. The computations to
follow can also be performed in a global compact setting.
Consider ϕ(z) = − log(− log ||z||), where ||z|| =
|z1|2 + . . .+ |zn|2 de-
notes the Euclidean norm in Cn. One can check that ϕ is plurisubharmonic
in a neighborhood of the origin in Cn, and that there exists cn > 0 so that
µ := (ddcϕ)n = f dVeucl, where f(z) =
||z||2n(− log ||z||)n+1
Observe that f [log(1 + f)]n−α ∈ L1, ∀α > 0 but f [log(1 + f)]n 6∈ L1.
A PRIORI ESTIMATES FOR SOLUTIONS OF MONGE-AMPÈRE EQUATIONS 13
When n = 1 it was observed by S. Kolodziej [K 1] that µ(K) . Capω(K).
Proposition 4.3 yields here
µ(K) . Capω(K)(| logCapω(K)|+ 1).
For n ≥ 1, it follows from Proposition 4.3 and Theorem 3.1 that
logCapω(ϕ < −s) . −nH
−1(s).
On the other hand, one can directly check that logCapω(ϕ < −s) ≃ −nH
−1(s).
One can get further examples by considering ϕ(z) = χ ◦ log ||z||, so that
(ddcϕ)n =
′ ◦ log ||z||)n−1χ′′(log ||z||)
||z||2n
dVeucl.
References
[AT] H.ALEXANDER & B.A.TAYLOR: Comparison of two capacities in Cn.
Math. Zeit, 186 (1984),407-417.
[A] T.AUBIN: Équations du type Monge-Ampère sur les variétés kählériennes
compactes. Bull. Sci. Math. (2) 102 (1978), no. 1, 63–95.
[BT] E.BEDFORD & B.A.TAYLOR: A new capacity for plurisubharmonic func-
tions. Acta Math. 149 (1982), no. 1-2, 1–40.
[Bl] Z.BLOCKI: On uniform estimate in Calabi-Yau theorem. Sci. China Ser. A
48 (2005), suppl., 244–247.
[BKK] G.BURGOS & J.KRAMER & U.KUHN: Arithmetic characteristic classes
of automorphic vector bundles. Doc. Math. 10 (2005), 619–716.
[Ca] E.CALABI: On Kähler manifolds with vanishing canonical class. Algebraic
geometry and topology. A symposium in honor of S. Lefschetz, pp. 78–89.
Princeton Univ. Press, Princeton, N. J. (1957).
[Ce] U.CEGRELL: Pluricomplex energy. Acta Math. 180 (1998), no. 2, 187–217.
[EGZ] P.EYSSIDIEUX & V.GUEDJ & A.ZERIAHI: Singular Kähler-Einstein met-
rics. Preprint arxiv math.AG/0603431.
[GZ 1] V.GUEDJ & A.ZERIAHI: Intrinsic capacities on compact Kähler manifolds.
J. Geom. Anal. 15 (2005), no. 4, 607-639.
[GZ 2] V.GUEDJ & A.ZERIAHI: The weighted Monge-Ampère energy of quasi-
plurisubharmonic functions. J. Funct. Anal. 250 (2007), 442-482.
[K 1] S.KOLODZIEJ: The range of the complex Monge-Ampère operator. Indiana
Univ. Math. J. 43 (1994), no. 4, 1321–1338.
[K 2] S.KOLODZIEJ: The complex Monge-Ampère equation. Acta Math. 180
(1998), no. 1, 69–117.
[K 3] S.KOLODZIEJ: The Monge-Ampère equation on compact Kähler manifolds.
Indiana Univ. Math. J. 52 (2003), no. 3, 667–686
[K 4] S.KOLODZIEJ: The complex Monge-Ampère equation and pluripotential
theory. Mem. Amer. Math. Soc. 178 (2005), no. 840, x+64 pp.
[Ku] U.KUHN: Generalized arithmetic intersection numbers. J. Reine Angew.
Math. 534 (2001), 209–236.
[R] J.RAINWATER: A note on the preceding paper. Duke Math. J. 36 (1969)
799–800.
[Ra] T.RANSFORD: Potential theory in the complex plane. London Mathemati-
cal Society Student Texts, 28. Cambridge University Press, Cambridge, 1995.
x+232 pp.
[T] G.TIAN: Canonical metrics in Kähler geometry. Lectures in Mathematics
ETH Zürich. Birkhäuser Verlag, Basel (2000).
[Y] S.T.YAU: On the Ricci curvature of a compact Kähler manifold and the
complex Monge-Ampère equation. I. Comm. Pure Appl. Math. 31 (1978),
no. 3, 339–411.
[Z] A.ZERIAHI: Volume and capacity of sublevel sets of a Lelong class of psh
functions. Indiana Univ. Math. J. 50 (2001), no. 1, 671–703.
http://arxiv.org/abs/math/0603431
14 S.BENELKOURCHI & V.GUEDJ & A.ZERIAHI
Slimane BENELKOURCHI & Vincent GUEDJ & Ahmed ZERIAHI
Laboratoire Emile Picard
UMR 5580, Université Paul Sabatier
118 route de Narbonne
31062 TOULOUSE Cedex 09 (FRANCE)
benel@math.ups-tlse.fr
guedj@math.ups-tlse.fr
zeriahi@math.ups-tlse.fr
	1. Introduction
	2. Weakly singular quasiplurisubharmonic functions
	2.1. The range of the Monge-Ampère operator
	2.2. High energy and capacity estimates
	3. Measures dominated by capacity
	4. Examples
	4.1. Measures invariant by rotations
	4.2. Measures with density
	References
	Bibliography
ABSTRACT
  Let $X$ be a compact K\"ahler manifold and $\om$ a smooth closed form of
bidegree $(1,1)$ which is nonnegative and big. We study the classes ${\mathcal
E}_{\chi}(X,\om)$ of $\om$-plurisubharmonic functions of finite weighted
Monge-Amp\`ere energy. When the weight $\chi$ has fast growth at infinity, the
corresponding functions are close to be bounded.
  We show that if a positive Radon measure is suitably dominated by the
Monge-Amp\`ere capacity, then it belongs to the range of the Monge-Amp\`ere
operator on some class ${\mathcal E}_{\chi}(X,\om)$. This is done by
establishing a priori estimates on the capacity of sublevel sets of the
solutions.
  Our result extends U.Cegrell's and S.Kolodziej's results and puts them into a
unifying frame. It also gives a simple proof of S.T.Yau's celebrated a priori
${\mathcal C}^0$-estimate.

<|endoftext|><|startoftext|>
Density oscillation in highly flattened quantum elliptic rings and tunable strong
dipole radiation
S.P. Situ, Y.Z. He, and C.G. Bao∗
The State Key Laboratory of Optoelectronic Materials and Technologies,
Zhongshan University, Guangzhou, 510275, P.R. China
A narrow elliptic ring containing an electron threaded by a magnetic field B is studied. When the
ring is highly flattened, the increase of B would lead to a big energy gap between the ground and
excited states, and therefore lead to a strong emission of dipole photons. The photon frequency
can be tuned in a wide range by changing B and/or the shape of the ellipse. The particle density is
found to oscillate from a pattern of distribution to another pattern back and forth against B. This
is a new kind of Aharonov-Bohm oscillation originating from symmetry breaking and is different
from the usual oscillation of persistent current.
∗Corresponding author
It is recognized that micro-devices are important to
micro-techniques. Various kinds of micro-devices, includ-
ing the quantum rings,1 have been extensively studied
theoretically and experimentally in recent years. Quan-
tum rings are different from other devices due to their
special geometry. A distinguished phenomenon of the
ring is the Aharonov-Bohm (A-B) oscillation of the
ground state energy and persistent current2−5. It is be-
lieved that geometry would affect the properties of small
systems. Therefore, in addition to circular rings, ellip-
tic rings or other rings subjected to specific topological
transformations deserve to be studied, because new and
special properties might be found. There have been a
number of literatures devoted to elliptic quantum dots6−9
and rings10−12. It was found that the elliptic rings have
two distinguished features. (i) The avoided crossing of
the levels and the suppression of the A-B oscillation. (ii)
The appearance of localized states which are related to
bound states in infinite wires with bends.13 These feature
would become more explicit if the eccentricity is larger
and the ring is narrower.
On the other hand, as a micro-device, the optical prop-
erty is obviously essential to its application. It is guessed
that very narrow rings with a high eccentricity might
have special optical property, this is a point to be clari-
fied. This paper is dedicated to this topic. It turns out
that the optical properties of a highly flattened narrow
ring is greatly different from a circular ring due to having
a tunable energy gap, which would lead to strong dipole
transitions with wave length tunable in a very broad
range (say, from 0.1 to 0.001cm). Besides, a kind of A-B
density-oscillation originating from symmetry breaking
was found as reported as follows.
We consider an electron with an effective mass m∗ con-
fined on a one-dimensional elliptic ring with a half major
axis rax and an eccentricity ε. Let us introduce an argu-
ment θ so that a point (x, y) at the ring is related to θ as
x = rax cos θ and y = ray sin θ, where ray = rax
1− ε2 is
the half minor axis. A uniform magnetic field B confined
inside a cylinder with radius rin vertical to the plane of
the ring is applied. The associated vector potential reads
A = Br2int/2r, where t is a unit vector normal to the
position vector r. Then, the Hamiltonian reads
H = G/(1− ε2 cos2 θ)[− d
− i2α
1− ε2
(1− ε2 sin2 θ)
1− ε2 cos2 θ
1− ε2 sin2 θ
] (1)
where G = ~2/(2m∗r2ax), α = φ/φo, φ = πr
inB is the
flux, φo = hc/e is the flux quantum.
The eigen-states are expanded as Ψj =
∑kmax
k=kmin
eikθ, where k is an integer ranging
from kmin to kmax, and j = 1, 2, · · · denotes the ground
state, the second state, and so on. The coefficients C
are obtained via the diagonalization of H . In practice, B
takes positive values, kmin = −100 and kmax = 10. This
range of k assures the numerical results having at least
four effective figures. The energy of the j − th state is
Ej = 〈H〉j ≡
dθ(1 − ε2 cos2 θ)Ψ∗jHΨj (2)
where the eigen-state is normalized as
dθ(1− ε2 cos2 θ)Ψ∗jΨj (3)
In the follows the units meV, nm, and Tesla are used,
m∗ = 0.063me (for InGaAs rings), and rin is fixed at
25. When rax = 50, ε = 0 and 0.4, the evolution of
the low-lying spectra with B are given in Fig.1. When
ε = 0.4, the effect of eccentricity is still small, the spec-
trum is changed only slightly from the case ε = 0, but
the avoided crossing of levels can be seen.10,11 In par-
ticular, the A-B oscillation exists and the period of φ
remains to be φo. However, when ε becomes large, three
remarkable changes emerge as shown in Fig.2. (i) The
A-B oscillation of the ground state vanishes gradually.
(ii) The energy of the second state becomes closer and
closer to the ground state. (iii) There is an energy gap
lying between the ground state and the third state, the
http://arxiv.org/abs/0704.0867v1
0 2 4 6 8 10
E(meV)
B(Tesla)
ε =0.4rax=50(b)
ε =0rax=50(a)
FIG. 1: Low-lying spectrum (in meV) of an one-electron sys-
tem on an elliptic ring against B. rax = 50nm and ε = 0 (a)
and 0.4 (b). The period of the flux φo = hc/e is associated
with B = 2.106 Tesla.
0 4 8 12 16 20
B (Tesla)
E(meV)
rax= 50,   ε = 0.8
FIG. 2: Similar to Fig.1 but ε = 0.8. The lowest eight
levels are included, where a great energy gap lies between the
ground and the third states.
gap width increases nearly linearly with B. The exis-
tence of the gap is a remarkable feature which has not
yet been found before from the rings with a finite width.
This feature is crucial to the optical properties as shown
later. Fig.3 demonstrates further how the gap varies
with ε, rax, and B , where B is from 0 to 30 (or φ from 0
to 14.24φo). One can see that, when ε is large and rax is
small, the increase of B would lead to a very large gap.
0 5 10 15 20 25 30
0 5 10 15 20 25 30
(b)  ε = 0.8
ε =0.8
B (Tesla)
E3-E1 
(meV)
(a)  r
ε =0.6
ε=0.4
FIG. 3: Evolution of the energy gap E3 − E1 when rax and
ε are given.
The A-B oscillation of the ground state energy is given
in Fig.4. The change of ε does not affect the period
(2.106 Tesla). However, when ε is large, the amplitude of
the oscillation would be rapidly suppressed. Thus, for a
highly flattened elliptic ring, the A-B oscillation appears
only when B is small.
0 2 4 6 8 10
B (Tesla)
ε =0,   0.4,   0.8rax=50
FIG. 4: The A-B oscillation of the ground state energy. The
solid, dash-dot-dot, and dot lines are for ε = 0, 0.4, and 0.8,
respectively.
The persistent current of the j − th state reads14
Jj = G/~[Ψ
1− ε2
(1− ε2 sin2 θ)
)Ψj + c.c.] (4)
The A-B oscillation of Jj is plotted in Fig.5. When ε is
small (≤ 0.4), just as in Fig.4, the effect of ε is small as
shown in 5a. When ε is large there are three noticeable
points: (i) The oscillation of the ground state current
would become weaker and weaker when B increases. (ii)
The current of the second state has a similar amplitude
as the ground state, but in opposite phase. (iii) The
third (and higher) state has a much stronger oscillation
of current.
0 2 4 6 8 10
B (Tesla)
(b)   rax=50,      ε =0.8
ε=0,  0.4,  0.8(a)  r
FIG. 5: The A-B oscillation of the persistent current J . (a)
is for the ground state with ε = 0 (solid line), 0.4 (dash-dot-
dot), and 0.8 (dot). (b) is for the first (ground), second and
third states (marked by 1,2, and 3 by the curves) with ε fixed
at 0.8. The ordinate is 106 times J/c in nm−1.
For elliptic rings, the angular momentum L is not con-
served. However, it is useful to define (L)j = 〈−i ∂∂θ 〉j
(refer to eq.(2)). This quantity would tend to an integer
if ε → 0. It was found that (i) When ε is small (≤ 0.4),
(L)1 of the ground state decreases step by step with B,
each step by one, just as the case of circular rings. How-
ever, when ε is large, (L)1 decreases continuously and
nearly linearly. (ii) When ε is small, |(L)i− (L)1| is close
(not close) to 1 if 2 ≤ i ≤ 3 (otherwise). Since L would
be changed by ±1 under a dipole transition, the ground
state would therefore essentially jump to the second and
third states. Accordingly, the dipole photon has essen-
tially two energies, namely, E2 −E1 and E3 −E1 . How-
ever, this is not exactly true when ε is large.
There is a relation between the dipole photon energies
and the persistent current.15 For ε = 0, the ground state
with L = k1 would have the current J1 = G(k1 + α)/π~,
while the ground state energy E(k1) = G(k1 + α)
2. Ac-
cordingly the second and third states would have L =
k1 ± 1, therefore we have
|E3 − E2| = |E(k1 + 1)− E(k1 − 1)| = 2hJ1 (5)
This relation implies that the current can be accurately
measured simply by measuring the energy difference of
the photons emitted in dipole transitions. For elliptic
rings, this relation holds approximately when ε is small
(≤ 0.4), as shown in Fig.6a. However, the deviation is
quite large when ε is large as shown in 6c.
0 2 4 6 8 10
(c)   ε = 0.7
B (Tesla)
(b)   ε = 0.5
(a)     ε = 0.3
FIG. 6: E3 − E2 and the persistent current of the ground
state. The solid line denotes (E3 − E2)/(2hc)10
6, the dash-
dot-dot line denotes |J |/c·106 . They overlap nearly if ε < 0.3.
The probability of dipole transition from Ψj to Ψj′
reads
(ω/c)3|〈x∓ iy〉j′,j |2 (6)
where ~ω = Ej′ − Ej is the photon energy,
〈x∓ iy〉j′,j = rax
dθ(1 − ε2 cos2 θ)Ψ∗j′
[cos θ ∓ i
1− ε2 sin θ]Ψj (7)
The probability of the transition of the ground state to
the j′ − th state is shown in Fig.7. When ε is small
(≤ 0.4) and B is not very large (≤ 10), the allowed final
states essentially Ψ2 and Ψ3, and the oscillation of the
probability is similar to the case of circular rings with the
same period as shown in 7a and 7b. In particular, P±3,1 is
considerably larger than P±2,1 due to having a larger pho-
ton energy, thus the third state is particularly important
to the optical properties. When ε is large (Fig.7c), the
oscillation disappears gradually with B,while the prob-
ability increases very rapidly due to the factor (ω/c)3.
Since E3 − E1 is nearly proportional to B as shown in
Fig.3, the probability is nearly proportional to B3. This
leads to a very strong emission (absorption). Further-
more, in Fig.7c the black solid curve is much higher than
the dash-dot-dot curve, it implies that the final states
can be higher than Ψ3, this leads to an even larger prob-
ability.
0 2 4 6 8 10
B(Tesla)
c) ε =0.8
b)  ε =0.4
a)   ε =0
FIG. 7: Evolution of the probability of dipole transition of
the ground state. The green line is for Ψ1 to Ψ2 transition,
red line for Ψ1 to Ψ3, dash-dot-dot line is for the sum of the
above two, solid line in black is for the total probability.
For circular rings, the particle densities ρ of all the
eigen-states are uniform under arbitraryB. However, for
elliptic rings, ρ is no more uniform as shown in Fig.8. For
the ground state (8a), when φ =0, the non-uniformity is
slight and ρ is a little smaller at the two ends of the major
axis (θ = 0, π). When φ increases, the density at the
two ends of the minor axis (θ = π/2, 3π/2) increases as
well. When φ = 4φo the non-uniformity is very strong as
shown by the curve 9, where ρ ≈ 0 when θ ≈ 0 or π. The
second state has a parity opposite to the ground state,
0 20 40 60 80 100 120
0 20 40 60 80 100 120
0 20 40 60 80 100 120
Arc length (nm)
                      fourth state 
    third state
8ε =0.8,   r
ground state
FIG. 8: Particle densities ρ as functions of the arc length
(the according change of θ is 0 to π). The fluxes are given as
φ = (i− 1)φo/2, where i is an integer from 1 to 9 marked by
the curves. The first group of curves (in violet) have φ/φo =
integer, the second group (in green) have φ/φo = half-integer.
When φ increases, the curve of ρ jumps from the first group
to the second group and jumps back, and repeatedly.
but their densities are similar. For the third state (8b), ρ
is peaked not at the ends of the major and minor axes but
in between. In particular, when B increases, ρ oscillates
from one pattern (say, in violet line) to another pattern
(in green line), and repeatedly. The density oscillation
would become stronger in higher states (8c). The period
of oscillation remains to be φo, thus it is a new type of
A-B oscillation without analogue in circular rings (where
ρ remains uniform). Incidentally, the density oscillation
does not need to be driven by a strong field, instead, a
small change of φ from 0 to φo is sufficient.
Let us evaluate Ej roughly by using (L)j to replace
the operator −i ∂
in eq.(2) Then,
Ej ≈ G
dθ{[(L)j + α
1− ε2
1− ε2 sin2 θ
αε2 sin 2θ
2(1− ε2 sin2 θ)
]2}Ψ∗jΨj (8)
There are two terms at the right each is a square of a
pair of brackets (for circular rings the second term does
not exist). It is reminded that, while α = φ/φo is given
positive, (L)j is negative. Thus there is a cancellation
inside the first term. Therefore, when ε and α are large,
the second term would be more important. It is recalled
that both Ψ1 and Ψ2 are mainly distributed around θ =
π/2 and 3π/2 (refer to Fig.8a), where the second term
is zero due to the factor sin 2θ. Accordingly the energies
of Ψ1 and Ψ2 are lower. On the contrary, both Ψ3 and
Ψ4 are distributed close to the peaks of the second term
(refer to Fig.8b and 8c), this leads to a higher energy.
This effect would be greatly amplified by αε2 , this leads
to the large energy gap shown in Fig.3.
In summary, the optical property of highly flattened
elliptic narrow rings was found to be greatly different
from circular rings. For the latter, both the energy of
the dipole photon and the probability of transition are
low, and they are oscillating in small domains. On the
contrary, for the former, both the energy and the prob-
ability are not limited, the energy (probability) is nearly
proportional to B (B3), they are tunable by changing
ε, rax and/or B. It implies that a strong source of light
with frequency adjustable in a wide domain can be de-
signed by using highly flattened, narrow, and small rings.
Furthermore, a new type of A-B oscillation, namely, the
density oscillation, originating from symmetry breaking,
was found. This is a noticeable point because the density
oscillation might be popular for the systems with broken
symmetry (e.g., with C3 symmetry).
Acknowledgment: The support under the grants
10574163 and 90306016 by NSFC is appreciated.
References
1, S.Viefers, P. Koskinen, P. Singha Deo, M. Manninen,
Physica E 21 , 1 (2004)
2, U.F. Keyser, C. Fühner, S. Borck, R.J. Haug, M.
Bichler, G. Abstreiter, and W. Wegscheider, Phys. Rev.
Lett. 90, 196601 (2003)
3, D. Mailly, C. Chapelier, and A. Benoit, Phys. Rev.
Lett. 70, 2020 (1993)
4, A. Fuhrer, S. Lüscher, T. Ihn, T. Heinzel, K. Ensslin,
W. Wegscheider, and M. Bichler, Nature (London) 413,
822 (2001)
5, A.E. Hansen, A. Kristensen, S. Pedersen, C.B.
Sorensen, and P.E. Lindelof, Physica E (Amsterdam) 12,
770 (2002)
6, M. van den Broek, F.M. Peeters, Physica E,11, 345
(2001)
7, E. Lipparini, L. Serra, A. Puente, European Phys.
J. B 27, 409 (2002)
8, J. Even, S. Loualiche, P. Miska, J. of Phys.: Cond.
Matt., 15, 8737 (2003)
9, C. Yannouleas, U. Landman, Physica Status Solidi
A 203, 1160 (2006)
10, D. Berman, O Entin-Wohlman, and M. Ya. Azbel,
Phys. Rev. B 42, 9299 (1990)
11, D. Gridin, A.T.I. Adamou, and R.V. Craster, Phys.
Rev. B 69, 155317 (2004)
12, A. Bruno-Alfonso, and A. Latgé, Phys. Rev. B 71,
125312 (2005)
13, J. Goldstone and R.L. Jaffe, Phys. Rev. B 45,
14100 (1992)
14, Eq.(4) originates from a 2-dimensional system via
the following steps. (i) the components of the current
along X- and Y-axis are firstly obtained from the conser-
vation of mass as well known. (ii) Then, the component
along the tangent of ellipse jθ can be obtained. (iii) jθ
is integrated along the normal of the ellipse under the
assumption that the wave function is restricted in a very
narrow region along the normal, then it leads to eq.(4).
15, Y.Z. He, C.G. Bao (submitted to PRB)
ABSTRACT
  A narrow elliptic ring containing an electron threaded by a magnetic field B
is studied. When the ring is highly flattened, the increase of B would lead to
a big energy gap between the ground and excited states, and therefore lead to a
strong emission of dipole photons. The photon frequency can be tuned in a wide
range by changing B and/or the shape of the ellipse. The particle density is
found to oscillate from a pattern of distribution to another pattern back and
forth against $B$. This is a new kind of Aharonov-Bohm oscillation originating
from symmetry breaking and is different from the usual oscillation of
persistent current.

<|endoftext|><|startoftext|>
Effect of electron-electron interaction on the phonon-mediated spin relaxation in
quantum dots
Juan I. Climente,1, ∗ Andrea Bertoni,1 Guido Goldoni,1, 2 Massimo Rontani,1 and Elisa Molinari1, 2
1CNR-INFM National Center on nanoStructures and bioSystems at Surfaces (S3), Via Campi 213/A, 41100 Modena, Italy
2Dipartimento di Fisica, Università degli Studi di Modena e Reggio Emilia, Via Campi 213/A, 41100 Modena, Italy
(Dated: October 21, 2018)
We estimate the spin relaxation rate due to spin-orbit coupling and acoustic phonon scattering
in weakly-confined quantum dots with up to five interacting electrons. The Full Configuration
Interaction approach is used to account for the inter-electron repulsion, and Rashba and Dresselhaus
spin-orbit couplings are exactly diagonalized. We show that electron-electron interaction strongly
affects spin-orbit admixture in the sample. Consequently, relaxation rates strongly depend on the
number of carriers confined in the dot. We identify the mechanisms which may lead to improved
spin stability in few electron (> 2) quantum dots as compared to the usual one and two electron
devices. Finally, we discuss recent experiments on triplet-singlet transitions in GaAs dots subject
to external magnetic fields. Our simulations are in good agreement with the experimental findings,
and support the interpretation of the observed spin relaxation as being due to spin-orbit coupling
assisted by acoustic phonon emission.
PACS numbers: 73.21.La,71.70.Ej,72.10.Di,73.22.Lp
I. INTRODUCTION
There is currently interest in manipulating electron
spins in quantum dots (QDs) for quantum information
and quantum computing purposes.1,2,3 A major goal in
this research line is to optimize the spin relaxation time
(T1), which sets the upper limit of the spin coherence
time (T2): T2 ≤ 2T1.4 Therefore, designing two-level
spin systems with long spin relaxation times is an im-
portant step towards the realization of coherent quantum
operations and read-out measuraments. Up to date, spin
relaxation has been investigated almost exclusively in
single-electron4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19 and two-
electron20,21,22,23,24,25,26,27,28,29,30 QDs. Spin relaxation
in QDs with a larger number of electrons has seldom been
considered28,31, even though Coulomb blockade makes
it possible to control the exact number of carriers con-
fined in a QD.32 Yet, recent theoretical works suggest
that Coulomb interaction renders few-electron charge de-
grees of freedom more stable than single-electron ones33,
which leads to the question of whether similar findings
hold for spin degrees of freedom. Moreover, in weakly-
confined QDs, acoustic phonon emission assisted by spin-
orbit (SO) interaction has been identified as the domi-
nant spin relaxation mechanism when cotunneling and
nuclei-mediated relaxation are reduced.6,8,31 The com-
bined effect of Coulomb interaction and SO coupling has
been shown to influence the energy spectrum of few-
electron QDs profoundly,34,35,36 but the consequences on
the spin relaxation remain largely unexplored.37
In Ref. 28 we investigated the effect of a magnetic
field on the triplet-singlet (TS) spin relaxation in two
and four-electron QDs with SO coupling, so as to under-
stand related experimental works. Motivated by the very
different response observed for different number of con-
fined particles, in this work we shall focus on the role of
electron-electron interaction in spin relaxation processes,
extending our analysis to different number of carriers,
highlighting, in particular, the different physics involved
in even and odd number of confined electrons. Further-
more, we will explicitly compare the predictions of our
theoretical model with very recent experiments on spin
relaxation in two-electron GaAs QDs.29
We study theoretically the energy structure and spin
relaxation of N interacting electrons (N = 1 − 5) in
parabolic GaAs QDs with SO coupling, subject to axial
magnetic fields. Both Rashba38 and Dresselhaus39 SO
terms are considered, and the electron-electron repulsion
is accounted for via the Full Configuration Interaction
method.40,41 By focusing on the two lowest spin states,
two different classes of systems are distinguished. For N
odd (1,3,5) and weak magnetic fields, the ground state is
a doublet and then the two-level system is defined by the
Zeeman-split sublevels of the lowest orbital. For N even
(2,4), the two-level system is defined by a singlet and a
triplet. We analyze these two classes of systems sepa-
rately because, as we shall comment below, the physics
involved in the spin transition differs. Thus, we compare
the phonon-induced spin relaxation of N = 1, 3, 5 elec-
trons and that of N = 2, 4 separately. As a general rule,
the larger the number of confined carriers, the stronger
the SO mixing, owing to the increasing density of elec-
tronic states. This would normally yield faster relax-
ation rates. However, we note that this is not necessarily
the case, and few-electron states may display compara-
ble or even slower relaxation than their single-electron
and two-electron counterparts. This is due to charac-
teristic features of the few-particle energy spectra which
tend to weaken the admixture between the initial and
final spin states. In N -odd systems, it is the presence
of low-energy quadruplets for N > 1 that reduces the
admixture between the Zeeman sublevels of the (dou-
blet) ground state, hence inhibiting the spin flipping. In
N -even systems, electronic correlations partially quench
http://arxiv.org/abs/0704.0868v2
phonon emission33, and the relaxation can be further sup-
pressed forN > 2 if one selects initial and final spin states
differing in more than one quantum of angular momen-
tum, which inhibits direct triplet-singlet SO mixing via
linear Rashba and Dresselhaus SO terms.28 Noteworthy,
all these effects are connected with Coulomb interaction
between confined carriers.
The paper is organized as follows. In Section II we give
details about the theoretical model we use. In Section
III we study the energy structure and spin relaxation of
a QD with an odd number of electrons (N = 1, 3, 5). In
Section IV we do the same for QDs with an even number
of electrons (N = 2, 4). In Section V we compare our
numerical simulations with experimental data recently
reported for N = 2 GaAs QDs. Finally, in Section VI we
present the conclusions of this work.
II. THEORY
We consider weakly-confined GaAs/AlGaAs QDs,
which are the kind of samples usually fabricated
by different groups to investigate spin relaxation
processes.7,8,20,22 In these structures, the dot and the sur-
rounding barrier have similar elastic properties, and the
lateral confinement (which we approximate as circular)
is much weaker than the vertical one. A number of use-
ful approximations can be made for such QDs. First,
since the weak lateral confinement gives inter-level spac-
ings within the range of few meV, only acoustic phonons
have significant interaction with bound carriers, while op-
tical phonons can be safely neglected. Second, the elasti-
cally homogeneous materials are not expected to induce
phonon confinement, which allows us to consider three-
dimensional bulk phonons. Finally, the different energy
scales of vertical and lateral electronic confinement allow
us to decouple vertical and lateral motion in the building
of single-electron spin-orbitals. Thus, we take a parabolic
confinement profile in the in-plane (x, y) direction, with
single-particle energy gaps h̄ω0, which yields the Fock-
Darwin states.42 In the vertical direction (z) the confine-
ment is provided by a rectangular quantum well of width
Lz and height determined by the band-offset between
the QD and barrier materials (the zero of energy is then
the bottom of the conduction band). The quantum well
eigenstates are derived numerically. In cylindrical coor-
dinates, the single-electron spin-orbitals can be written
ψµ(ρ, θ, z; sz) =
eimθ Rn,m(ρ) ξ0(z)χsz , (1)
where ξ0 is the lowest eigenstate of the quantum well,
χsz is the spinor eigenvector of the spin z-component
with eigenvalue sz, and Rn,m is the n−th Fock-Darwin
orbital with azimuthal angular momentum m,
Rn,m(ρ) =
(n+ |m|)!
0 L|m|n
In the above expression L|m|n denotes a generalized La-
guerre polynomial and l0 =
h̄/m∗ω0 is the effective
length scale, with m∗ standing for the electron effec-
tive mass. The energy of the single-particle Fock-Darwin
states is given by En,m = (2n + 1 + |m|)h̄Ωc + m2 h̄ωc,
where ωc =
is the cyclotron frequency and Ωc =
+ (ωc/2)2 is the total (spatial plus magnetic) con-
finement frequency.
With regard to Coulomb interaction, we need to go
beyond mean field approximations in order to properly
include electronic correlations, which play an important
role in determining the phonon-induced electron scatter-
ing rate.43 Moreover, since we are interested in the re-
laxation time of excited states, we need to know both
ground and excited states with comparable accuracy.
Our method of choice is the Full Configuration Interac-
tion approach: the few-electron wave functions are writ-
ten as linear combinations |Ψa〉 =
cai|Φi〉, where the
Slater determinants |Φi〉 = Πµic†µi |0〉 are obtained by
filling in the single-electron spin-orbitals µ with the N
electrons in all possible ways consistent with symmetry
requirements; here c†µ creates an electron in the level µ.
The fully interacting Hamiltonian is numerically diago-
nalized, exploiting orbital and spin symmetries.40,41 The
few-electron states can then be labeled by the total az-
imuthal angular momentumM = 0,±1,±2 . . ., total spin
S and its z-projection Sz.
The inclusion of SO terms is done following a similar
scheme to that of Ref. 44, although here we consider
not only Rashba but also linear Dresselhaus terms. For
a quantum well grown along the [001] direction, these
terms read:38,39
HR = α
(kysx − kxsy), (3)
HD = γc 〈k2z〉(kysy − kxsx), (4)
where α and γc are coupling constants, while sj and kj
are the j-th Cartesian projections of the electron spin and
canonical momentum, respectively, along the main crys-
talographic axes (〈k2z〉 = (π/Lz)2 for the lowest eigen-
state of the quantum well). The momentum operator
includes a magnetic field B applied along the vertical di-
rection z. Other SO terms may also be present in the
conduction band of a QD, such as the contribution aris-
ing from the system inversion asymmetry in the lateral
dimension or the cubic Dresselhaus term. However, in
GaAs QDs with strong vertical confinement, HR and HD
account for most of the SO interaction.36
We rewrite Eqs.(3,4) in terms of ladder operators as:
HR = α
(k+s− − k−s+), (5)
HD = β
(k+s+ + k−s−), (6)
where k± and s± change m and sz by one quantum,
respectively, and β = γc (π/Lz)
2 is the Dresselhaus in-
plane coupling constant. It is worth mentioning that
when only Rashba (Dresselhaus) coupling is present, the
total angular momentum j = m + sz (j = m − sz)
is conserved. However, in the general case, when both
coupling terms are present and α 6= β, all symmetries
are broken. Still, SO interaction in a large-gap semi-
conductor such as GaAs is rather weak, and the low-
lying states can be safely labelled by their approximate
quantum numbers (M,S, Sz) except in the vicinity of the
level anticrossings.11,26,45 Since the few-electron M and
Sz quantum numbers are given by the algebraic sum of
the single-particle states m and sz quantum numbers, it
is clear from Eqs. (5,6) that Rashba interaction mixes
(M,Sz) states with (M ± 1, Sz ∓ 1) ones, while Dressel-
haus interaction mixes (M,Sz) with (M ± 1, Sz ± 1).
The SO terms of Eqs. (5,6) can be spanned on a basis of
correlated few-electron states.46 The SO matrix elements
are then given by sums of single-particle contributions of
the form:
〈n′m′ s′z| HR +HD |nmsz〉 =
C∗R O+n′m′ nm δm′ m+1 δs′z sz−1+CR O
n′m′ nm δm′ m−1 δs′z sz+1+
C∗D O+n′m′ nm δm′ m+1 δs′z sz+1+CD O
n′m′ nm
δm′ m−1 δs′
sz−1.
Here CR = α and CD = −iβ are constans for the Rashba
and Dresselhaus interactions respectively, andO± are the
form factors:
Rnm(t),
Rnm(t),
with t = ρ2/l20. The above forms factors have analytical
expressions which depend on the set of quantum num-
bers {n′m′, nm}. The resulting SO-coupled eigenvec-
tors are then linear combinations of the correlated states,
|ΨSOA 〉 =
cAa|Ψa〉.
We assume zero temperature, which suffices to capture
the main features of one-phonon processes.9,16 Indeed, it
is one-phonon processes that account for most of the low-
temperature experimental observations in the SO cou-
pling regime.2,6,8,28,29,31 We evaluate the relaxation rate
between the initial (occupied) and final (empty) states
of the SO-coupled few-electron state, B and A, using the
Fermi Golden Rule:
τ−1B→A =
c∗BbcAa
c∗bicaj〈Φi|Vνq|Φj〉
δ(EB−EA−h̄ωq),
where the electron states |ΨSOK 〉 (K = A,B) have been
written explicitly as linear combinations of Slater deter-
minants, EK stands for the K electron state energy and
h̄ωq represents the phonon energy. Vνq is the interac-
tion operator of an electron with an acoustic phonon of
momentum q via the mechanism ν, which can be either
deformation potential or piezoelectric field interaction.
Details about the electron-phonon interaction matrix el-
ements can be found elsewhere.33
In this work we study a GaAs/Al0.3Ga0.7As QDs, us-
ing the following material parameters:47 electron effective
massm∗ = 0.067, band-offset Vc = 243 meV, crystal den-
sity d = 5310 kg/m3, acoustic deformation potential con-
stant D = 8.6 eV, effective dielectric constant ǫ = 12.9,
and piezoelectric constant h14 = 1.41 · 109 V/m. The
Landé factor is g = −0.44.5 As for GaAs sound speed, we
take cLA = 4.72 · 103 m/s for longitudinal phonon modes
and cTA = 3.34 · 103 m/s for transversal modes.48 Unless
otherwise stated, a lateral confinement of h̄ω0 = 4 meV
and a quantum well width of Lz = 10 nm are assumed
for the QD under study, and a Dressehlaus coupling pa-
rameter γc = 25.5 eV·Å3 is taken49, so that β ≈ 25
meV·Å. The value of the Rashba coupling constant can
be modulated externally e.g. with external electric fields.
Here we will investigate systems both with and without
Rashba interaction. When present, we shall mostly con-
sider α = 50 meV·Å, to represent the case where Rashba
effects prevail over Dresselhaus ones.
Few-body correlated states (M,S, Sz) are obtained us-
ing a basis set composed by the Slater determinants
(SDs) which result from all possible combinations of 42
single-electron spin-orbitals (i.e., from the six lowest en-
ergy shells of the Fock-Darwin spectrum at B = 0) filled
with N electrons. For N = 5, this means that the basis
rank may reach ∼ 2 · 105. The SO Hamiltonian is then
diagonalized in a basis of up to 56 few-electron states,
which grants a spin relaxation convergence error below
2%. Since SO terms break the spin and angular mo-
mentum symmetries, the SO-coupled states |ΨSOK 〉 are
described by a linear combination of SDs coming from
different (M,S, Sz) subspaces. Thus, for N = 5, the
states are described by up to ∼ 8.5 · 105 SDs. To evalu-
ate the electron-phonon interaction matrix elements, we
note that only a small percentage of the huge number
of possible pairs of SDs (∼ 7 · 1011 for N = 5) may
give non-zero matrix elements, owing to spin-orbital or-
thogonalities. We scan all pairs of SDs and filter those
which may give non-zero matrix elements writing the de-
terminants in binary representation and using efficient
bit-per-bit algorithms.40,41 The matrix elements of the
remaining pairs (∼ 2 ·106 for N = 5) are evaluated using
massive parallel computation.
 0 1 3
B (T)
FIG. 1: Low-lying energy levels in a QD with N = 1, 3, 5
interacting electrons, as a function of an axial magnetic field.
The SO interaction coefficients are α = 50 meV· Å and β =
25 meV· Å. The dot has h̄ω0 = 4 meV and Lz = 10 nm. Note
the increasing size of the SO-induced anticrossing gaps and
zero-field splittings with increasing N .
III. SPIN RELAXATION IN A QD WITH N ODD
A. Energy structure
When the number of electrons confined in the QD is
odd and the magnetic field is weak enough, the ground
and first excited states are usually the Zeeman sz = 1/2
and sz = −1/2 sublevels of a doublet [Fig. 1]. Since the
initial and final spin states belong to the same orbital,
∆M = 0 and SO mixing (which requires ∆M = ±1)
is only possible with higher-lying states. In addition,
the phonon energy (corresponding to the electron tran-
sition energy) is typically small (in the µeV scale). In
this case, the relaxation rate is determined essentially by
the phonon density, the strength and nature of the SO
interaction, and the proximity of higher-lying states.9,11
In order to gain some insight on the influence of these
factors, in Fig. 1 we compare the energy structure of a
QD with N = 1, 3, 5 vs. an axial magnetic field, in the
presence of Rashba and Dresselhaus interactions.55 One
can see that the increasing number of particles changes
the energy magneto-spectrum drastically. This is be-
cause the quantum numbers of the low-lying energy levels
change, resulting in a different field dependence, and be-
cause Coulomb interaction leads to an increased density
of electron states, as well as to a more complicated spec-
trum.
At first sight, the energy spectra of Fig. 1 closely resem-
ble those in the absence of SO effects. For instance, the
N = 1 spectrum is very similar to the pure Fock-Darwin
spectrum.42 Rashba and Dresselhaus interactions were
expected to split the degenerate |m| > 0 shells at B = 0,
shift the positions of the level crossings and turn them
into anticrossings36,52,53,54, but here such signatures are
hardly visible because SO interaction is weak in GaAs.
In fact, the magnitude of the SO-induced zero-field en-
ergy splittings and that of the anticrossing gaps is of very
few µeV, and SO effects simply add fine features to the
N = 1 spectrum.52
A significantly different picture arises in the N = 3
and N = 5 cases. Here, the increased density of elec-
tronic states enhances SO mixing as compared to the
single-electron case.56 As a result, the anticrossing gaps
can be as large as 30 µeV (N = 3) and 60 µeV (N = 5).
Moreover, unlike in the N = 1 case, where the ground
state orbital has m = 0, here it has |M | = 1. Therefore,
the Zeeman sublevels involved in the fundamental spin
transition are subject to SO-induced zero-field splittings.
To illustrate this point, in Fig. 2 we zoom in on the energy
spectrum of the four lowest states of N = 3 and N = 5
under weak magnetic fields, without (left panels) and
with (right panels) Rashba interaction. Clearly, the four-
fold degeneracy of |M | = 1 spin-orbitals at B = 0 has
been lifted by SO interaction.36 One can also see that the
order of the two lowest sublevels at B ∼ 0 changes when
Rashba interaction is switched on. Thus, for N = 3 and
α = 0, the two lowest sublevels are (M = −1, Sz = 1/2)
and (M = −1, Sz = −1/2), but this order is reversed
when α = 50 meV·Å. The opposite level order as a func-
tion of α is found for N = 5. This behavior constitutes a
qualitative difference with respect to the N = 1 case in
two aspects. First, the phonon energy (i.e., the energy
of the fundamental spin transition) is no longer given by
the bare Zeeman splitting. Instead, it has a more compli-
cated dependence on the magnetic field, and it is greatly
influenced by the particular values of α and β. This is
apparent in the N = 5 panels, where the energy splitting
between the two lowest states strongly differs depending
on the relative value of α and β. Second, it is possible
to find situations where the ground state at B ∼ 0 has
Sz = −1/2 and the first excited state has Sz = 1/2 (e.g.
N = 3 when α > β or N = 5 when α < β). In these
cases, the Zeeman splitting leads to a weak anticrossing
of the two sublevels (highlighted with dashed circles in
Fig. 2) which has no counterpart in single-electron sys-
tems. This kind of B-induced (i.e., not phonon-induced)
ground state spin mixing, also referred to as “intrinsic
spin mixing”, has been previously reported for singlet-
triplet transitions in N = 2 QDs.58 Here we show that
they may also exist in few-electron QDs with N odd.
(−2,1/2)(1,1/2)
(−1,1/2)
(1,1/2)
(1,1/2)
(−4,1/2)
(−2,1/2)
(−4,1/2)
(−1,1/2)
(−1,1/2)
(−1,1/2)
(1,1/2)
α = 50, β = 25α = 0, β = 25
N = 3
N = 5
 127.0
 127.5
 128.0
 128.5
0.5 1.0 0.0
 236.0
 236.5
 237.0
B (T)
 0.0 0.5 1.0
B (T)
FIG. 2: The four lowest energy levels in a QD with N = 3, 5
interacting electrons, as a function of an axial magnetic field,
without (left column) and with (right column) Rashba SO
interaction. The approximate quantum numbers (M,S) of
the levels are shown, with arrows denoting the spin projection
Sz = 1/2 (↑) and Sz = −1/2 (↓). The dashed circles highlight
the region of intrinsic spin mixing of the ground state.
Figure 1 puts forward yet another qualitative differ-
ence between SO coupling in single- and few-electron
QDs: while in the former low-energy anticrossings are
due to Rashba interaction11,36,52, in few-electron QDs,
when S = 3/2 states come into play, both Rashba and
Dresselhaus terms may induce anticrossings. For exam-
ple, the (M = −1, Sz = 1/2) sublevel couples directly to
both (M = −2, Sz = −1/2) and (M = −2, Sz = 3/2)
sublevels, via the Dresselhaus and Rashba interaction,
respectively. Coupling to S = 3/2 states is a characteris-
tic feature of N > 1 systems, which has important effects
on the spin relaxation rate, as we will discuss below.
B. Spin relaxation between Zeeman sublevels
In Fig. 3 we compare the magnetic field dependence of
the spin relaxation rate between the two lowest Zeeman
sublevels of N = 1, 3, 5. Dashed lines (solid lines) are
used for systems without (with) Rashba interaction.59
While for N = 1 the well-known exponential dependence
with B is found2,6,9, and the main effect of Rashba cou-
pling is to shift the curve upwards (i.e., to accelerate the
relaxation), for N = 3 and N = 5 the relaxation rate ex-
hibits complicated trends which strongly depend on the
values of the SO coupling parameters.
α = 50, β = 25
 α = 0, β = 25
 10  
 10  
 10  
 10  
 10  
 10  
 10  
 10  
 10  
 10  
 10  
 10  
 10  
 10  
 10  
B (T)
 0  1  2  3
FIG. 3: Spin relaxation rate in a QD with N = 1, 3, 5 inter-
acting electrons as a function of an axial magnetic field. Solid
(dashed) lines stand for the system with (without) Rashba
interaction. Note the strong influence of the SO interaction
in the shape of the relaxation curve for N > 1.
To understand this result, one has to bear in mind that
in spin relaxation processes two well-distinguished and
complementary ingredients are involved, namely SO in-
teraction and phonon emission. Phonon emission grants
the conservation of energy in the electron relaxation,
but phonons have zero spin and therefore cannot cou-
ple states with different spin. It is the SO interaction
that turns pure spin states into mixed ones, thus enabling
the phonon-induced transition. The overall efficiency of
the scattering event is then given by the combination
of the two phenomena: the phonon emission efficiency
modulated by the extent of the SO mixing. The shape
of spin relaxation curves shown in Fig. 3 can be directly
related to the energy dispersion of the phonon, which cor-
responds to the splitting between the two lowest levels of
the electron spectrum. Thus, for N = 1, the phonon
energy is simply proportional to B through the Zeeman
splitting, but for N = 3 and N = 5 it has a non-trivial
dependence on B, as shown in Fig. 2. Actually, the relax-
ation minima in Fig. 3 are connected with the magnetic
field values where the two lowest levels anticross in Fig. 2.
In these magnetic field windows, in spite of the fact that
SO coupling is strong, the phonon density is so small that
the relaxation rate is greatly suppressed.28 Similarly, the
relaxation rate fluctuations of N = 3 at B ∼ 3 T are
signatures of the anticrossings with high-angular momen-
tum states. For larger fields (B > 3 T), the ground state
approaches the maximum density droplet configuration
and high-spin states are possible.44 In this work, how-
ever, we restrict ourselves to the magnetic field regime
where the ground state is a doublet.
eV∆    (µ     )
 10  
 10  
 10  
 10  
 10  
 10  
 10  
 10  
 10  
 0 20 40 60 80
 10  
FIG. 4: (Color online). Spin relaxation rate in a QD with
N = 1, 3, 5 interacting electrons as a function of the energy
splitting between the two lowest spin states. Top panel: α =
0, β = 25 meV·Å. Bottom panel: α = 50 meV·Å, β = 25
meV·Å. The relaxation of N = 3 is slower than that of N = 1
for a wide range of ∆12. The irregular data distribution is
due to the irregular relaxation rates vs. magnetic field. For
example, the strongly deviated points of N = 3 come from
the peaks at B ∼ 3 in Fig. 3.
For a more direct comparison between the relaxation
rates of N = 1, 3, 5, in Fig. 4 we replot the data of Fig. 3
as a function of the energy splitting between the two
lowest states, ∆12, without (top panel) and with (bot-
tom panel) Rashba interaction. Since the phonon energy
is identical for all points with the same ∆12, differences
in the relaxation rate arise exclusively from the different
strength of SO interaction. ∆12 is also a relevant pa-
rameter from the experimental point of view, since it is
usually required that it be large enough for the states to
be resolvable. In this sense, it is worth noting that, even
if the inter-level splittings shown in Fig. 4 are fairly small,
a number of experiments have successfully addressed this
regime.5,8,21
A most striking feature observed in the figure is that,
for most values of ∆12, the N = 3 relaxation rate is
clearly slower than the N = 1 one. Likewise, N = 5
shows a similar (or slightly faster) relaxation rate than
N = 1. These are interesting results, for they suggest
that improved spin stability may be achieved using few-
electron QDs instead of the single-electron ones typically
employed up to date.8 At first sight the results are sur-
prising, because the higher density of states in the few-
electron systems implies smaller inter-level spacings, and
hence stronger SO mixing, which should translate into
enhanced relaxation. It then follows that another physi-
cal mechanism must be acting upon the few-electron sys-
tems, which reduces the transition probability between
the initial and final spin states, and may even make it
smaller than for N = 1. Here we propose that such
mechanism is the SO admixture with low-lying quadru-
plet (S = 3/2) states, which become available for N > 1.
By coupling to S = 3/2 levels, the projection of the dou-
blet Sz = 1/2 levels onto Sz = −1/2 ones is reduced,
and this partly inhibitis the transition between the low-
est doublet sublevels.
Let us explain this by comparing the spin transition for
N = 1 and N = 3. For N = 1, the spin configuration of
the initial and final states, in the absence of SO coupling,
is |Sz = −1/2〉 and |Sz = +1/2〉, respectively. The tran-
sition between these states is spin-forbidden. However,
when SO coupling is switched on, the two states become
admixed with higher-lying S = 1/2 states fulfilling the
∆Sz = ±1 condition. The transition between the initial
and final states can then be represented schematically as:
ca |Sz = −1/2〉+cb|Sz = +1/2〉 ⇒ cr |Sz = +1/2〉+cs|Sz = −1/2〉,
where ci are the admixture coefficients (in general ca ≫
cb and cr ≫ cs). Clearly now both spin configurations of
the initial state have a finite overlap with the final state,
and so the transition is possible. Let us next consider
the N = 3 case. In the absence of SO coupling, the
initial and final states are again the Sz = −1/2 and Sz =
+1/2 doublets, respectively, and the transition is spin-
forbidden. When we switch on SO coupling, we note that
the ∆Sz = ±1 condition allows for mixing not only with
Sz = ±1/2 states (either doublets or quadruplets) but
also with Sz = ±3/2 quadruplets, so that the transition
can be represented as:
ca|Sz = −1/2〉+ cb|Sz = +1/2〉+ cc|Sz = −3/2〉 ⇒
cr|Sz = +1/2〉+ cs|Sz = −1/2〉+ ct|Sz = +3/2〉,
where, in general, ca ≫ cb, cc, and cr ≫ cs, ct. In this
case, |Sz = −3/2〉 has no overlap with the final state con-
figurations. Likewise, |Sz = +3/2〉 has no overlap with
the initial state configurations. Therefore, these quadru-
plet configurations are inactive from the point of view
of the transition, and the more important they are (i.e.,
the stronger the SO coupling with quadruplet states), the
less likely the transition is.
To prove this argument quantitatively, in Fig. 5 we il-
lustrate the spin relaxation of N = 3 calculated by diag-
onalization of the SO Hamiltonian including and exclud-
ing the low-lying S = 3/2 states from the basis set. As
expected, when the quadruplets are not considered, the
transition is visibly faster. For N = 5, low-lying S = 3/2
levels are also available, but in this case they barely com-
pensate for the large density of electron states, so that
the overall scattering rate turns out to be comparable to
that of N = 1.
eV∆    (µ     )
without S=3/2
with S=3/2
 10  
 10  
 10  
 10  
 10  
 0 20 40 60 80
FIG. 5: (Color online). Spin relaxation rate in a QD with N =
3 interacting electrons as a function of the energy splitting
between the two lowest spin states. α = 0 and β = 25 meV·Å.
Symbol + (×) stands for SO Hamiltonian diagonalized in a
basis which includes (excludes) S = 3/2 states. Clearly, the
inclusion of S = 3/2 states slows down the relaxation.
To test the robustness of the few-electron spin states
stability predicted above, we also compare the relax-
ation rate of N = 1 and N = 3 in a QD with dif-
ferent confinement, namely h̄ω0 = 6 meV, in Fig. 6.
Since the lateral confinement of the dot is now stronger,
(M = −1, S = 1/2) is the N = 3 ground state up to large
values of the magnetic field (B ∼ 5 T). This allows us
to investigate larger Zeeman splittings (i.e., larger ∆12),
which may be easier to resolve experimentally. As seen
in the figure, the relaxation rate of N = 3 is again slower
than that of N = 1 for a wide range of ∆12, the behavior
being very similar to that of Fig. 4, albeit extended to-
wards larger inter-level spacings. The crossing between
N = 3 and N = 1 relaxation rates at large ∆12 val-
ues, both in Fig. 4 and Fig. 6, is due to the proximity
of high-angular momentum levels coming down in energy
for N = 3 when the magnetic field (and hence the Zee-
man splitting) is large. Such levels bring about strong
SO admixture and thus fast relaxation (see middle panel
of Fig. 3 at B ∼ 3 T).
IV. SPIN RELAXATION IN A QD WITH N
A. Energy structure
When the number of electrons confined in the QD is
even and the magnetic field is not very strong, the ground
and first excited states are usually a singlet (S = 0)
and a triplet (S = 1) with three Zeeman sublevels
eV∆    (µ     )
 10  
 10  
 10  
 10  
 10  
 10  
 10  
 10  
 10  
 10  
 0 40 80 120
FIG. 6: (Color online). Spin relaxation rate in a QD with
N = 1, 3 interacting electrons as a function of the energy
splitting between the two lowest spin states. The QD has
h̄ω0 = 6 meV. Top panel: α = 0, β = 25 meV·Å. Bottom
panel: α = 50 meV·Å, β = 25 meV·Å. As for the weaker-
confined dot of Fig. 4, the relaxation of N = 3 is slower than
that of N = 1 for a wide range of ∆12.
(Sz = +1, 0,−1). Unlike in the previous section, here the
initial and final states of the spin transition may have dif-
ferent orbital quantum numbers, and the inter-level split-
ting ∆12 may be significantly larger (in the meV scale).
Under these conditions, the phonon emission efficiency no
longer exhibits a simple proportionality with the phonon
density, but it further depends on the ratio between the
phonon wavelength and the QD dimensions.50,51 More-
over, SO interaction is sensitive to the quantum numbers
of the initial and final electron states.26,28 Therefore, in
this class of spin transitions the details of the energy
structure are also relevant to determine the relaxation
rate.
In Fig. 7 we plot the energy levels vs. magnetic field for
a QD with N = 2, 4 in the presence of Rashba and Dres-
selhaus interactions. The approximate quantum num-
bers (M,S) of the lowest-lying states are written between
parenthesis. For N = 2 and weak fields, the ground state
is the (M = 0, S = 0) singlet, and the first excited state
is the (M = −1, S = 1) triplet. As in the previous sec-
tion, SO interaction introduces small zero-field splittings
and anticrossings in the energy levels with |M | > 0.36
As a consequence, when α > β, the zero-field ordering of
the (M = −1, S = 1) Zeeman sublevels is such that they
anticross in the presence of an external magnetic field.
This anticrossing is highlighted in the figure by a dashed
circle. On the other hand, as B increases the singlet-
triplet energy spacing is gradually reduced, and then the
singlet experiences a series of weak anticrossings with all
(−1,1)
(−2,0)
(−3,1)
(0,1)
(0,0)
 0  1  2  3
B (T)
FIG. 7: Low-lying energy levels in a QD with N = 2, 4 in-
teracting electrons as a function of an axial magnetic field.
α = 50 meV· Å and β = 25 meV· Å. The approximate
quantum numbers (M,S) of the lowest states are shown. The
dashed circle in N = 2 highlights the anticrossing between
M = −1 Zeeman sublevels.
three Zeeman sublevels of the triplet. These anticross-
ings are due to the fact that (M = 0, S = 0, Sz = 0)
couples to the (M = −1, S = 1, Sz = −1) sublevel via
Dresselhaus interaction, to the (M = −1, S = 1, Sz =
+1) sublevel via Rashba interaction, and finally to the
(M = −1, S = 1, Sz = 0) sublevel indirectly through
higher-lying states.26,28
For N = 4, the density of electronic states is larger
than for N = 2, which again reflects in a larger magni-
tude of the anticrossings gaps due to the enhanced SO
interaction. The ground state at B = 0 is a triplet,
(M = 0, S = 1), but soon after it anticrosses with a
singlet, (M = −2, S = 0). After this, and before the
formation of Landau levels, two different branches of the
first excited state can be distinguished: when B < 1 T,
the first excited state is (M = 0, S = 1), and when B > 1
T it is (M = −3, S = 1). It is worth pointing out that
the complexity of the N = 4 spectrum, as compared to
the simple N = 2 one, implies a greater flexibility to
select initial and final spin states by means of external
fields. As we shall discuss below, this degree of freedom
has important consequences on the relaxation rate.
B. Triplet-singlet spin relaxation
In a recent work, we have investigated the magnetic
field dependence of the TS relaxation due to SO cou-
pling and phonon emission in N = 2 and N = 4 QDs.28.
Here we study this kind of transition from a different
perspective, namely we compare the spin relaxation of
two- and four-electron systems in order to highlight the
changes introduced by inter-electron repulsion. Increas-
ing the number of electrons confined in the QD has three
important consequences on the TS transition. First, it
increases the density of electronic states (and then the
SO mixing), leading to faster relaxation. Second, as
mentioned in the previous section, it introduces a wider
choice of orbital quantum numbers for the singlet and
triplet states. Third, it increases the strength of elec-
tronic correlations. Since now the initial and final spin
states have different orbital wave functions, the latter
factor effectively reduces phonon scattering, in a similar
fashion to charge relaxation processes33 (this effect has
been recently pointed out in Ref. 30 as well). To find out
the overall combined effect of these three factors, in this
section we analyze quantitative simulations of correlated
We focus on the magnetic field regions where the
ground state is a singlet and the excited state is a triplet.
A complete description of the TS transition should then
include spin relaxation between the Zeeman-split sub-
levels of the triplet. However, for the weak fields we con-
sider this relaxation is orders of magnitude slower than
the TS one (compare Figs. 3 and 8),60 the reason for
this being the small Zeeman energy and the fact that the
Zeeman sublevels are not directly coupled by Rashba and
Dresselhaus terms, as mentioned in Section III. There-
fore, it is a good approximation to assume that all three
triplet Zeeman sublevels are equally populated and they
relax directly to the singlet.26
α = 50, β = 25
 α = 0, β = 25
 10  
 10  
 10  
 10  
 10  
 10  
 0.5  1  1.5  2  2.5  3
B (T)
FIG. 8: Spin relaxation rate in a QD with N = 2, 4 interact-
ing electrons as a function of an axial magnetic field. Solid
(dashed) lines stand for the system with (without) Rashba
interaction. The relaxation of N = 4 when B < 1 T is slower
than that of N = 2.
Figure 8 represents the TS relaxation rate in a QD with
N = 2, 4, after averaging the relaxation from the three
triplet sublevels. Solid (dashed) lines stand for the case
with (without) Rashba interaction.59 The main effect of
Rashba and Dresselhaus interactions is to accelerate the
spin transition by shifting the relaxation curve upwards.
This is in contrast to the N -odd case, where these terms
may induce drastic changes in the shape of the relaxation
rate curve (see Fig. 3). Figure 8 also reveals a different
behavior of the N = 2 and N = 4 TS relaxation rates.
The former increases gradually with B and then drops
in the vicinity of the TS anticrossing, due to the small
phonon energies.28,29,30 Conversely, for N = 4 an addi-
tional feature is found, namely an abrupt step at B ∼ 1.
This is due to the change of angular momentum of the
excited triplet. For B < 1 T the triplet has M = 0, and
for B > 1 T it has M = −3. Since the ground state
is a singlet with M = −2, the M = 0 triplet does not
fulfill the ∆M = ±1 condition for linear SO coupling.
This inhibits direct spin mixing between initial and final
states and reduces the relaxation rate by about one order
of magnitude.28
meV∆    (         )
N=4, M=0
N=4, M=−3
  10  
 10  
 10  
 10  
 10  
  10  
  10  
 0  0.4  0.8  1.2
FIG. 9: (Color online). Spin relaxation rate in a QD with
N = 2, 4 interacting electrons as a function of the energy
spacing between the singlet and the triplet. Here M stands
for the angular momentum of the triplet. Top panel: α = 0,
β = 25 meV·Å. Bottom panel: α = 50 meV·Å, β = 25 meV·Å.
The relaxation of N = 4 is comparable to that of N = 2 when
the triplet has M = −3, and it is much smaller when M = 0.
Noteworthy, the choice of states differing in more than
one quantum of angular momentum is only possible for
N > 2 QDs. One may then wonder if it is more conve-
nient to use these systems instead of the N = 2 ones dom-
inating the experimental literature up to date20,21,29, i.e.
if it compensates for the increased density of electronic
states. Interestingly, Fig. 8 predicts slower relaxation for
the N = 4 QD with M = 0 triplet than for N = 2. To
verify that this arises from weakend SO coupling rather
than from different phonon energy values, in Fig. 9 we
replot the spin relaxation rate of N = 2, 4 as a function
of the TS energy splitting. In the figure, the upper and
bottom panels represent the situations without and with
Rashba interaction, respectively. While N = 4 shows
similar relaxation rate to N = 2 when the triplet has
M = −3, the relaxation is slower by about one order
of magnitude when the triplet has M = 0. This result
indicates that the weakening of SO mixing due to the
violation of the ∆M = ±1 condition clearly exceeds the
strengthening due to the higher density of states, con-
firming that N = 4 systems are more attractive than
N = 2 ones to obtain long triplet lifetimes. We also
point out that, in spite of the different density of states,
the relaxation rate of N = 2 and N = 4, M = −3 triplets
is quite similar. This can be ascribed to the phonon scat-
tering reduction by electronic correlations,33 which may
also explain the fact that experimentally resolved TS re-
laxation rates of N = 8 QDs and N = 2 QDs be quite
similar.20,31
V. COMPARISON WITH N = 2 EXPERIMENTS
Whereas, to our knowledge, no experiments have mea-
sured transitions between Zeeman-split sublevels in N >
1 systems yet, a number of works have dealt with TS re-
laxation in QDs with few interacting electrons. In Ref. 28
we showed that our model correctly predicts the trends
observed in experiments with N = 2 and N = 8 QDs
subject to axial magnetic fields.20,21,31 In this section, we
extend the comparison to new experiments available for
N = 2 TS relaxation in QDs,29 which for the first time
provide continuous measurements of the average triplet
lifetime against axial magnetic fields, from B = 0 to the
vicinity of the TS anticrossing. By using a simple model,
the authors of the experimental work showed that the
measuraments are in clear agreement with the behavior
expected from SO coupling plus acoustic phonon scatter-
ing. However, in such model: (i) the TS energy splitting
was a taken directly from the experimental data, (ii) the
SO coupling effect was accounted for by parametrizing
the admixture of the lowest singlet and triplet states only,
and (iii) the B-dependence of the SO-induced admix-
ture was neglected. Approximation (ii) may overlook the
correlation-induced reduction of phonon scattering,30,33
that we have shown above to be significant, and which
may have an important contribution from higher excited
states in weakly-confined QDs. In turn, approximation
(iii) may overlook the important influence of SO coupling
in the B-dependence of the triplet lifetime, as we had
anticipated in Ref. 28. Here we compare with the exper-
imental findings using our model, which includes these
effects properly. We assume a QD with an effective well
width Lz = 30 nm, as expected by Ref. 29 authors, and
a lateral confinement parabola of h̄ω0 = 2 meV which,
as we shall see next, fits well the position of the TS an-
ticrossing. Yet, the comparison is limited by the lack of
detailed information about the Rashba and Dresselhaus
interaction constants, and because we deal with circular
QDs instead of elliptical ones (the latter effect introduces
simple deviations from the circular case26). In addition,
in the experiment a tilted magnetic field of magnitude
B∗, forming an angle of 68◦ with the vertical direction
was used. Here we consider the vertical component of
the field (B = 0.37B∗), which is the main responsible
for the changes in the energy structure, and the effect of
the in-plane component enters via the Zeeman splitting
only.
Figure 10 illustrates the average triplet lifetime for
N = 2. The bottom axis shows the vertical magnetic
field B value, while the top axis shows the value to be
compared with the experiment B∗.59 As can be seen, the
triplet lifetime first decreases with the field and then it
abruptly increases in the vicinity of the TS anticross-
ing, due to the small phonon density.28 This behavior
is in clear agreement with the experiment (cf. Fig. 3 of
Ref. 29). The position of the anticrossing (B∗ ∼ 2.9 T) is
also close to the experimental value (B∗ ∼ 2.8 T), which
confirms that that h̄ω0 = 2 meV is similar to the mean
confinement frequency of the experimental sample. A
departure from the experimental trend appears at weak
fields (B < 0.5 T), where we observe a continuous in-
crease of T1 with decreasing B, while the experiment re-
ports a plateau. This is most likely due to the ellipticity
of the experimental sample, which renders the electron
states (and consequently the relaxation rate) insensitive
to the field in the B∗ = 0 − 0.5 T region (see Fig. 1a
in 29). In any case, Fig. 10 clearly confirms the role of
phonon-induced relaxation in the experiments, using a
realistic model for the description of correlated electron
states, SO admixture and phonon scattering.
A comment is worth here on the magnitude of the SO
coupling terms. In Fig. 10, we obtain good agreement
with the experimental relaxation times by using small
values of the SO coupling parameters. In particular, a
close fit is obtained using β = 1, α = 0.5 meV·Å, which
yields a spin-orbit length λSO = 48 µm. This value,
which coincides with the experimental guess (λSO ≈ 50
µm), indicates that SO coupling is several times weaker
than that reported for other GaAs QDs.8 Typical GaAs
parameters are often larger. For instance, measuraments
of the Rashba and Dresselhaus constants by analysis of
the weak antilocalization in clean GaAs/AlGaAs two-
dimensional gases revealed α = 4−5 meV·Å, and γc = 28
eV·Å3 (i.e, β = 3 meV·Å for our quantum well of Lz = 30
nm).61 To be sure, the small SO coupling parameters in
the experiment have a major influence on the lifetime
scale. Compare e.g. the β = 1 and β = 5 meV·Å curves
in Fig. 10. Actually, we note that accurate comparison
with the timescale reported for other GaAs samples31 is
also possible within our model, but assuming stronger
SO coupling constants.28 In Ref. 29, it was suspected
that the weak SO coupling inferred from the experimen-
tal data could be the result of the exclusion of higher
orbitals and the magnetic field dependence of SO ad-
α = 0.5 β = 1
β = 5
β = 2
β = 1 α = 0
B (T)
 0.2  0.4  0.6  0.8  1 0
 0.53 1.58 2.11 2.631.05
B (T)
FIG. 10: Average triplet lifetime in a QD with N = 2 elec-
trons as a function of an axial magnetic field. Only the field
region before the TS anticrossing is shown. α and β are in
meV·Å units. B is the applied axial magnetic field, and B∗
is the equivalent tilted magnetic field, for comparison with
Ref. 29 experiment.
mixture in their model (higher states reduce the effective
SO coupling constants by decreasing the phonon-induced
scattering30,33). Here we have considered both these ef-
fects and still small SO coupling constants are needed to
reproduce the experiment. Therefore, understanding the
origin of their small value remains as an open question.
One possibility could be that the particular direction of
the tilted magnetic field used in the experiment corre-
sponded to a reduced degree of SO admixture.30
VI. CONCLUSIONS
We have investigated theoretically the energy structure
and spin relaxation rate of weakly-confined QDs with
N = 1 − 5 interacting electrons, subject to axial mag-
netic fields, in the presence of linear Rashba and Dressel-
haus SO interactions. It has been shown that the num-
ber of electrons confined in the dot introduces changes
in the energy spectrum which significantly influence the
intensity of the SO admixture, and hence the spin re-
laxation. In general, the larger the number of confined
carriers, the higher the density of electronic states. This
decreases the energy splitting between consecutive lev-
els and then enhances SO admixture, which should lead
to faster spin relaxation. However, we find that this is
not necessarily the case, and slower relaxation rate may
be found for few-electron QDs as compared to the usual
single and two-electron QDs used up to date. The physi-
cal mechanisms responsible for this have been identified.
For N -odd systems, when the spin transition takes place
between Zeeman-split sublevels, it is the presence of low-
energy S = 3/2 states for N > 1 that reduces the pro-
jection of the doublet Sz = 1/2 sublevels into Sz = −1/2
ones, thus partly inhibiting the spin transition. For N -
even systems, when the spin transition takes place be-
tween triplet and singlet levels, there are two underlying
mechanisms. On the one hand, electronic correlations
tend to reduce phonon emission efficiency. On the other
hand, for N > 2 a magnetic field can be used to se-
lect a pair of singlet-triplet states which do not fulfill
the ∆M = ±1 condition of direct SO admixture, which
significantly weakens the SO mixing.
Last, we have compared our estimates with recent
experimental data for TS relaxation in N = 2 QDs.29
Our results support the interpretation of the experi-
ment in terms of SO admixture plus acoustic phonon
scattering, even though quantitative agreement with the
experiment requires assuming much weaker SO coupling
than that reported for similar GaAs structures.
Acknowledgments
We acknowledge support from the Italian Ministry
for University and Scientific Research under FIRB
RBIN04EY74, Cineca Calcolo parallelo 2006, and
Marie Curie IEF project NANO-CORR MEIF-CT-2006-
023797.
∗ Electronic address: climente@unimore.it;
URL: www.nanoscience.unimore.it
1 I. Zutic, J. Fabian, and S. Das Sarma, Rev. Mod. Phys.
76, 323 (2004).
2 D. Heiss, M. Kroutvar, J.J. Finley, and G. Abstreiter, Solid
State Comm. 135, 519 (2005).
3 D. Loss, and D.P. DiVincenzo, Phys. Rev. A 57, 120
(1998).
4 V.N. Golovach, A. Khaetskii, and D. Loss, Phys. Rev. Lett.
93, 016601 (2004).
5 R. Hanson, B. Witkamp, L.M.K. Vandersypen, L.H.
Willems van Beveren, J.M. Elzerman, and L.P. Kouwen-
hoven, Phys. Rev. Lett. 91, 196802 (2003).
6 M. Kroutvar, Y. Ducommun, D. Heiss, M. Bichler, D.
Schuh, G. Abstreiter, and J.J. Finley, Nature (London)
432, 81 (2004).
7 J.M. Elzerman, R. Hanson, L.H. Willems van Beveren, B.
Witkamp, L.M.K. Vandersypen, and L.P. Kouwenhoven,
Nature (London) 430, 431 (2004).
8 S. Amasha, K. MacLean, I. Radu, D.M. Zumbühl,
M.A. Kastner, M.P. Hanson, and A.C. Gossard,
cond-mat/0607110.
9 A.V. Khaetskii, and Y.V. Nazarov, Phys. Rev. B 64,
125316 (2001).
10 J.L. Cheng, M.W. Wu, and C. Lü, Phys. Rev. B 69, 115318
(2004).
11 D.V. Bulaev, and D. Loss, Phys. Rev. B 71, 205324 (2005).
12 C.F. Destefani, and S.E. Ulloa, Phys. Rev. B 72, 115326
(2005).
13 P. Stano, and J. Fabian, Phys. Rev. B 74, 045320 (2006).
14 Y.Y. Wang, and M.W. Wu, Phys. Rev. B 74, 165312
(2006).
15 E. Ya. Sherman, and D.J. Lockwood, Phys. Rev. B 72,
125340 (2005).
16 L.M. Woods, T.L Reinecke, and Y. Lyanda-Geller, Phys.
Rev. B 66, 161318(R) (2002).
17 I.A. Merkulov, Al. L. Efros, and M. Rosen, Phys. Rev. B
65, 205309 (2002).
18 S.I. Erlingsson, and Y.V. Nazarov, Phys. Rev. B 66,
155327 (2002).
19 P. San-Jose, G. Zarand, A. Shnirman, and G. Schön, Phys.
Rev. Lett. 97, 076803 (2006).
20 T. Fujisawa, D.G. Austing, Y. Tokura, Y. Hirayama, and
S. Tarucha, Nature (London) 419, 278 (2002); T. Fujisawa,
D.G. Austing, Y. Tokura, Y. Hirayama, and S. Tarucha,
J. Phys.: Cond. Matter 15, R1395 (2003).
21 R. Hanson, L.H. Willems van Beveren, I.T. Vink, J.M.
Elzerman, W.J.M. Naber, F.H.L. Koppens, L.P. Kouwen-
hoven, and L.M.K. Vandersypen, Phys. Rev. Lett. 94,
196802 (2005).
22 J.R. Petta, A.C. Johnson, J.M. Taylor, E.A. Laird, A. Ya-
coby, M.D. Lukin, C.M. Marcus, M.P. Hanson, and A.C.
Gossard, Science 309, 2180 (2005).
23 A.C. Johnson, J.R. Petta, J.M. Taylor, A. Yacoby, M.D.
Lukin, C.M. Marcus, M.P. Hanson, and A.C. Gossard, Na-
ture (London) 435, 925 (2005).
24 J.R. Petta, A.C. Johnson, A. Yacoby, C.M. Marcus, M.P.
Hanson, and A.C. Gossard, Phys. Rev. B 72, 161301(R)
(2005).
25 W.A. Coish, and D. Loss, Phys. Rev. B 72, 125337 (2005).
26 M. Florescu, S. Dickman, M. Ciorga, A. Sachrajda, and P.
Hawrylak, Physica E (Amsterdam) 22, 414 (2004); M. Flo-
rescu, and P. Hawrylak, Phys. Rev. B 73, 045304 (2006).
27 D. Chaney and P.A. Maksym, Phys. Rev. B 75, 035323
(2007).
28 J.I. Climente, A. Bertoni, G. Goldoni, M. Rontani, and E.
Molinari, Phys. Rev. B 75, 081303(R) (2007).
29 T. Meunier, I.T. Vink, L.H. Willems van Beveren, K.J.
Tielrooij, R. Hanson, F.H.L. Koppens, H.P. Tranitz, W.
Wegscheider, L.P. Kouwenhoven, and L.M.K. Vander-
sypen, Phys. Rev. Lett. 98, 126601 (2007).
30 V.N. Golovach, A. Khaetskii, and D. Loss,
cond-mat/0703427 (unpublished).
31 S. Sasaki, T. Fujisawa, T. Hayashi, and Y. Hirayama, Phys.
Rev. Lett. 95, 056803 (2005).
32 M. Ciorga, A.S. Sachrajda, P. Hawrylak, C. Gould, P. Za-
wadzki, S. Jullian, Y. Feng, and Z. Wasilewski, Phys. Rev.
B 61, R16315 (2000); H. Drexler, D. Leonard, W. Hansen,
J.P. Kotthaus, P.M. Petroff, Phys. Rev. Lett. 73, 2252
(1994).
33 A. Bertoni, M. Rontani, G. Goldoni, and E. Molinari,
Phys. Rev. Lett. 95, 066806 (2005); J.I. Climente, A.
Bertoni, M. Rontani, G. Goldoni, and E. Molinari, Phys.
Rev. B 74, 125303 (2006).
mailto:climente@unimore.it
www.nanoscience.unimore.it
http://arxiv.org/abs/cond-mat/0607110
http://arxiv.org/abs/cond-mat/0703427
34 T. Chakraborty, and P. Pietiläinen, Phys. Rev. B 71,
113305 (2005).
35 P. Pietiläinen, and T. Chakraborty, Phys. Rev. B 73,
155315 (2006).
36 C.F. Destefani, S.E. Ulloa, and G.E. Marques, Phys. Rev.
B 70, 205315 (2004).
37 During the finalization of this paper we have learned about
a parallel work investigating the influence of Coulomb in-
teraction in two-electron TS relaxation.30 Many of the find-
ings in such paper are in agreement with our numerical
results.
38 Y.A. Bychkov, and E.I. Rashba, J. Phys. C 17, 6039
(1984).
39 G. Dresselhaus, Phys. Rev. 100, 580 (1955).
40 M. Rontani, C. Cavazzoni, D. Bellucci, and G. Goldoni, J.
Chem. Phys. 124, 124102 (2006).
41 http://www.s3.infm.it/donrodrigo
42 L. Jacak, P. Hawrylak, and A. Wojs, Quantum Dots,
(Springer Verlag, Berlin, 1998).
43 M. Brasken, S. Corni, M. Lindberg, J. Olsen, and D. Sund-
holm, Mol. Phys. 100, 911 (2002).
44 P. Lucignano, B. Jouault, and A. Tagliacozzo, Phys. Rev.
B 69, 045314 (2004).
45 The (M,S, Sz) quantum numbers of few-electron states are
a good approximation for the lowest-lying states only. For
higher-lying states, the energy spectrum becomes denser
and the SO interaction becomes very strong even for GaAs,
which leads to important departures from the SO-free pic-
ture. This does not occur in single-electron parabolic QDs
because the energy levels are equally spaced.
46 The convenience of using exact diagonalization procedures,
instead of perturbational approaches, to account for the SO
coupling in GaAs QDs has been claimed in Ref. 10.
47 C.S. Ting (ed.), Physics of Hot Electron Transport in Semi-
conductors, (World Scientific, 1992).
48 Landolt-Börnstein: Numerical Data and Functional Rela-
tionships in Science and Technology, Vol. 17. Semiconduc-
tors, Group IV Elements and III-V Compounds, edited by
O. Madelung, (Springer-Verlag, 1982).
49 M. Cardona, N.E. Christensen, and G. Fasol, Phys. Rev.
B 38, 1806 (1988).
50 U. Bockelmann, Phys. Rev. B 50, 17271 (1994).
51 J.I. Climente, A. Bertoni, G. Goldoni, and E. Molinari,
Phys. Rev. B 74, 035313 (2006).
52 P. Stano, and J. Fabian, Phys. Rev. B 72, 155410 (2005).
53 O. Voskoboynikov, C.P. Lee, and O. Tretyak, Phys. Rev.
B 63, 165306 (2001).
54 W.H. Kuan, and C.S. Tang, J. Appl. Phys. 95, 6368 (2004).
55 The energy magneto-spectrum of GaAs parabolic QDs
with SO interaction and up to four interacting electrons
was also investigated in Ref. 35, but considering Rashba
interaction only.
56 Coulomb-enhanced SO interaction was previously pre-
dicted for higher-dimensional structures.57 Here we report
it for QDs.
57 G.H. Chen, and M.E. Raikh, Phys. Rev. B 60, 4826 (1999).
58 C.F. Destefani, S.E. Ulloa, and G.E. Marques, Phys. Rev.
B 69, 125302 (2004).
59 For simplicity of the discussion, in Figs. 3, 8 and 10, the
near vicinity of B = 0 T is not shown. In that range one
finds damped phonon-induced relaxation rates due to de-
generacies arising from the time-reversal symmetry and the
circular symmetry of the confinement we have assumed.
We do not expect these features to be observable in exper-
iments, because QDs are not perfectly circular and because
hyperfine interaction is expected to be the dominant spin
relaxation mechanism for very weak fields (see Refs. 18,23).
60 Greatly suppressed TS spin relaxation, comparable to that
of inter-Zeeman sublevels at very weak B, may be achieved
by means of geometrically or field-induced acoustic phonon
emission minima.27,28
61 J.B. Miller, D.M. Zumbühl, C.M. Marcus, Y.B. Lyanda-
Geller, D. Goldhaber-Gordon, K. Campman, and A.C.
Gossard, Phys. Rev. Lett. 90, 076807 (2003).
http://www.s3.infm.it/donrodrigo
ABSTRACT
  We estimate the spin relaxation rate due to spin-orbit coupling and acoustic
phonon scattering in weakly-confined quantum dots with up to five interacting
electrons. The Full Configuration Interaction approach is used to account for
the inter-electron repulsion, and Rashba and Dresselhaus spin-orbit couplings
are exactly diagonalized. We show that electron-electron interaction strongly
affects spin-orbit admixture in the sample. Consequently, relaxation rates
strongly depend on the number of carriers confined in the dot. We identify the
mechanisms which may lead to improved spin stability in few electron (>2)
quantum dots as compared to the usual one and two electron devices. Finally, we
discuss recent experiments on triplet-singlet transitions in GaAs dots subject
to external magnetic fields. Our simulations are in good agreement with the
experimental findings, and support the interpretation of the observed spin
relaxation as being due to spin-orbit coupling assisted by acoustic phonon
emission.

<|endoftext|><|startoftext|>
Introduction
The Asymmetric Simple Exclusion Process (ASEP) is a lattice model of parti-
cles with hard core interactions. Due to its simplicity, the ASEP appears as a
minimal model in many different contexts such as one-dimensional transport
phenomena, molecular motors and traffic models. From a theoretical point
of view, this model has become a paradigm in the field of non-equilibrium
statistical mechanics; many exact results have been derived using various
methods, such as continuous limits, Bethe Ansatz and matrix Ansatz (for re-
views, see e.g., Spohn 1991, Derrida 1998, Schütz 2001, Golinelli and Mallick
2006).
In a recent work (Golinelli and Mallick 2007), we applied the algebraic
Bethe Ansatz technique to the Totally Asymmetric Exclusion Process (TASEP).
http://arxiv.org/abs/0704.0869v1
Golinelli, Mallick — Connected Operators for TASEP 2
This method allowed us to construct a hierarchy of ‘generalized Hamiltonians’
that contain the Markov matrix and commute with each other. Using the
algebraic relations satisfied by the local jump operators, we derived explicit
formulae for the transfer matrix and the generalized Hamiltonians, generated
from the transfer matrix. We showed that the transfer matrix can be inter-
preted as the generator of a discrete time Markov process and we described
the actions of the generalized Hamiltonians. These actions are non-local be-
cause they involve non-connected bonds of the lattice. However, connected
operators are generated by taking the logarithm of the transfer matrix. We
conjectured for the connected operators a combinatorial formula that was
verified for the first ten connected operators by using a symbolic calculation
program.
The aim of the present work is to present an analytical calculation of the
connected operators and to prove the formula that was proposed in (Golinelli
and Mallick 2007). This paper is a sequel of our previous work, however, in
section 2, we briefly review the main definitions and results already obtained
so that this work can be read in a fairly self-contained manner. In section 3,
we derive the general expression of the connected operators.
2 Review of known results
We first recall the dynamical rules that define the TASEP with n particles
on a periodic 1-d ring with L sites labelled i = 1, . . . , L. The particles move
according to the following dynamics: during the time interval [t, t + dt], a
particle on a site i jumps with probability dt to the neighboring site i+ 1, if
this site is empty. This exclusion rule which forbids to have more than one
particle per site, mimics a hard-core interaction between particles. Because
the particles can jump only in one direction this process is called totally
asymmetric. The total number n of particles is conserved. The TASEP
being a continuous-time Markov process, its dynamics is entirely encoded in
a 2L × 2L Markov matrix M , that describes the evolution of the probability
distribution of the system at time t. The Markov matrix can be written as
Mi , (1)
where the local jump operator Mi affects only the sites i and i + 1 and
represents the contribution to the dynamics of jumps from the site i to i+1.
Golinelli, Mallick — Connected Operators for TASEP 3
2.1 The TASEP algebra
The local jump operators satisfy a set of algebraic equations :
M2i = −Mi, (2)
Mi Mi+1 Mi = Mi+1 Mi Mi+1 = 0, (3)
[Mi,Mj ] = 0 if |i− j| > 1. (4)
These relations can be obtained as a limiting form of the Temperley-Lieb
algebra. On the ring we have periodic boundary conditions : Mi+L = Mi.
The local jumps matrices define an algebra. Any product of the Mi’s will be
called a word. The length of a given word is the minimal number of operators
Mi required to write it. A word, that can not be simplified further by using
the algebraic rules above, will be called a reduced word.
Consider any word W and call I(W ) the set of indices i of the operators
Mi that compose it (indices are enumerated without repetitions). We remark
that, if W is not annihilated by application of rule (3), the simplification
rules (2, 4) do not alter the set I(W ), i.e., these rules do not introduce any
new index or suppress any existing index in I(W ). This crucial property is
not valid for the algebra associated with the partially asymmetric exclusion
process (see Golinelli and Mallick 2006).
Using the relation (2) we observe that for any i and any real number
λ 6= 1 we have
(1 + λMi)
−1 = (1 + αMi) with α =
. (5)
2.2 Simple words
A simple word of length k is defined as a word Mσ(1)Mσ(2) . . .Mσ(k), where σ
is a permutation on the set {1, 2, . . . , k}. The commutation rule (4) implies
that only the relative position of Mi with respect to Mi±1 matters. A simple
word of length k can therefore be written as Wk(s2, s3, . . . , sk) where the
boolean variable sj for 2 ≤ j ≤ k is defined as follows : sj = 0 if Mj is
on the left of Mj−1 and sj = 1 if Mj is on the right of Mj−1. Equivalently,
Wk(s2, s3, . . . , sk) is uniquely defined by the recursion relation
Wk(s2, s3, . . . , sk−1, 1) = Wk−1(s2, s3, . . . , sk−1) Mk , (6)
Wk(s2, s3, . . . , sk−1, 0) = Mk Wk−1(s2, s3, . . . , sk−1) . (7)
The set of the 2k−1simple words of length k will be called Wk. For a simple
word Wk, we define u(Wk) to be the number of inversions in Wk, i.e., the
Golinelli, Mallick — Connected Operators for TASEP 4
number of times that Mj is on the left of Mj−1 :
u(Wk(s2, s3, . . . , sk)) =
(1− sj) . (8)
We remark that simple words are connected, they cannot be factorized
in two (or more) commuting words.
2.3 Ring-ordered product
Because of the periodic boundary conditions, products of local jump opera-
tors must be ordered adequately. In the following we shall need to use a ring
ordered product O () which acts on words of the type
W = Mi1Mi2 . . .Mik with 1 ≤ i1 < i2 < . . . < ik ≤ L , (9)
by changing the positions of matrices that appear in W according to the
following rules :
(i) If i1 > 1 or ik < L, we define O (W ) = W . The word W is well-
ordered.
(ii) If i1 = 1 and ik = L, we first write W as a product of two blocks,
W = AB, such that B = MbMb+1 . . .ML is the maximal block of matrices
with consecutive indices that contains ML, and A = M1Mi2 . . .Mia , with
ia < b− 1, contains the remaining terms. We then define
O (W ) = O (AB) = BA = MbMb+1 . . .MLM1Mi2 . . .Mia . (10)
(iii) The previous definition makes sense only for k < L. Indeed, when
k = L, we have W = M1M2 . . .ML and it is not possible to split W in two
different blocks A and B. For this special case, we define
O (M1M2 . . .ML) = |1, 1, . . . , 1〉〈1, 1, . . . , 1| , (11)
which is the projector on the ‘full’ configuration with all sites occupied.
The ring-orderingO () is extended by linearity to the vector space spanned
by words of the type described above.
2.4 Transfer matrix and generalized Hamiltonians Hk
The algebraic Bethe Ansatz allows to construct a one parameter commuting
family of transfer matrices, t(λ), that contains the translation operator T =
t(1) and the Markov matrix M = t′(0). For 0 ≤ λ ≤ 1, the operator
Golinelli, Mallick — Connected Operators for TASEP 5
t(λ) can be interpreted as a discrete time process with non-local jumps :
a hole located on the right of a cluster of p particles can jump a distance
k in the backward direction, with probability λk(1 − λ) for 1 ≤ k < p,
and with probability λp for k = p. The probability that this hole does
not jump at all is 1 − λ. This model is equivalent to the 3-D anisotropic
percolation model of Rajesh and Dhar (1998) and to a 2-D five-vertex model.
It is also an adaptation on a periodic lattice of the ASEP with a backward-
ordered sequential update (Rajewsky et al. 1996, Brankov et al. 2004),
and equivalently of an asymmetric fragmentation process (Rákos and Schütz
2005).
The operator t(λ) is a polynomial in λ of degree L given by
t(λ) = 1 +
λkHk , (12)
where the generalized HamiltoniansHk are non-local operators that act on the
configuration space. [We emphasize that the notation used here is different
from that of our previous work : t(λ) was denoted by tg(λ) in (Golinelli and
Mallick 2007).]
We have H1 = M and more generally, as shown in (Golinelli and Mallick
2007), Hk is a homogeneous sum of words of length k
1≤i1<i2<...<ik≤L
O (Mi1Mi2 . . .Mik) , (13)
where O () represents the ring ordered product that embodies the periodicity
and the translation-invariance constraints.
For a system of size L with N particles only H1, H2, . . . , HN have a non-
trivial action. Because we are interested only in the case N ≤ L− 1 (the full
system as no dynamics) there are at most L − 1 operators Hk that have a
non-trivial action.
3 The connected operators Fk
3.1 Definition
The generalized Hamiltonians Hk and the transfer matrix t(λ) have non-local
actions and couple particles with arbitrary distances between them. Besides
Hk is a highly non-extensive quantity as it involves generically a number of
terms of order Lk. As usual, the local connected and extensive operators
are obtained by taking the logarithm of the transfer matrix. For k ≥ 1, the
Golinelli, Mallick — Connected Operators for TASEP 6
connected Hamiltonians Fk are defined as
ln t(λ) =
Fk . (14)
Taking the derivative of this equation with respect to λ and recalling that
t(λ) commutes with t′(λ), we obtain
λkFk = λ t(λ)
−1 t′(λ) . (15)
Expanding t(λ)−1 with respect to λ, this formula allows to calculate Fk as a
polynomial function of H1, . . . , Hk. For example F1 = H1, F2 = 2H2 − H
etc... (see Golinelli and Mallick 2007). By using (13), we observe that Fk is
a priori a linear combination of products of k local operators Mi. However
this expression can be simplified by using the algebraic rules (2, 3, 4) and in
fine, Fk will be a linear combination of reduced words of length j ≤ k.
Because of the ring-ordered product that appears in the expression (13)
of the Hk’s, it is difficult to derive an expression of Fk in terms of the local
jump operators. An exact formula for the Fk with k ≤ 10 was obtained
in (Golinelli and Mallick 2007) by using a computer program and a general
expression was conjectured for all k. In the following, the conjectured formula
is derived and proved rigorously.
3.2 Elimination of the ring-ordered product
The expression
λkFk can be written as a linear combination of reduced
words W . We know from formula (13) that at most L− 1 operators Hk are
independent in a system of size L, we shall therefore calculate Fk only for
k ≤ L− 1. Thus, we need to consider reduced words of length j ≤ L − 1.
Let W be such a word, and I(W ) be the set of indices of the operators Mi
that compose W ; our aim is to find the expression of W and to calculate
its prefactor from equation (15). Because the rules (2, 4) do not suppress or
add any new index, the following property is true : if a word W ′ appearing
in λ t(λ)−1 t′(λ) is such that I(W ′) 6= I(W ) then even after simplification,
W ′ will remain different from W . Therefore, the prefactor of W in
is the same as the prefactor of W in
λ tI(λ)
−1 t′I(λ) where tI(λ) = O
(1 + λMi)
with I(W ) ⊂ I . (16)
Because Fk commutes with the translation operator T , then for any r =
1, . . . , L−1, the prefactor of W = Mi1Mi2 . . .Mij is the same as the prefactor
Golinelli, Mallick — Connected Operators for TASEP 7
of T rMT−r = Mr+i1Mr+i2 . . .Mr+ij . Furthermore, any word W of size k ≤
L − 1 is equivalent, by a translation, to a word that contains M1 and not
ML : indeed, there exists at least one index i0 such that i0 /∈ I(W ) and
(i0 + 1) ∈ I(W ) and it is thus sufficient to translate W by r = L− i0.
In conclusion, it is enough to study in expression (15), the reduced words
W with set of indices included in
I∗ = {1, 2, . . . , L− 1} . (17)
Because the index L does not appear in I∗, the ring-ordered product has a
trivial action in equation (16) and we have
tI∗(λ) = (1 + λM1)(1 + λM2) . . . (1 + λML−1) . (18)
We have thus been able to eliminate the ring-ordered product.
3.3 Explicit formula for the connected operators
In equation (18), differentiating tI∗(λ) with respect to λ, we have
t′I∗(λ) =
(1 + λM1) . . . (1 + λMi−1)Mi(1 + λMi+1) . . . (1 + λML−1) . (19)
Using equation (5) we obtain
tI∗(λ)
−1 = (1+αML−1)(1 +αML−2) . . . (1 +αM1) , with α =
. (20)
Noticing that λ(1 + αMi)Mi = −αMi, we deduce
λ tI∗(λ)
−1 t′I∗(λ) = (21)
(1 + αML−1) . . . (1 + αMi+1)Mi(1 + λMi+1) . . . (1 + λML−1) .
The ith term in this sum contains words with indices between i and L − 1.
Because we are looking for the words that contain the operator M1, we must
consider only the first term in this sum, which we note by Q
Q = −α(1 + αML−1) . . . (1 + αM2)M1(1 + λM2) . . . (1 + λML−1) . (22)
In the appendix, we show that
Q = R1 +R2 + . . .+RL−1 , (23)
Golinelli, Mallick — Connected Operators for TASEP 8
where Ri is defined by the recursion :
R1 = −αM1 , (24)
Ri = λRi−1Mi + αMiRi−1 for i ≥ 2 . (25)
To summarize, all the words in
k=1 λ
kFk that contain M1 and not ML are
given by Q = R1+R2+. . .+RL−1. From the recursion relation (25) we deduce
that Ri is a linear combination of the 2
i−1 simple words Wi(s2, s3, . . . , si)
defined in section 2.1. Furthermore, we observe from (25) that a factor λ
appears if si = 1 and a factor α = λ/(λ − 1) appears if si = 0. Therefore,
the coefficient f(W ) of W = Wi(s2, s3, . . . , si) in Q is given by
f(W ) = (−1)u
(1− λ)u+1
= (−1)u
λi+j (26)
where i is the length of W and u = u(W ) is its inversion number, defined in
equation (8). We have thus shown that
f(W ) W =
(−1)u(W )
u(W )+j
λi+j , (27)
where Wi is the set of simple words of length i.
Finally, we recall that the coefficient in
k=1 λ
kFk of a reduced word W
that contains M1 and not ML is the same as its coefficient in Q. Extracting
the term of order λk in equation (27) we deduce that any word W in Fk that
contains M1 and not ML is a simple word of length i ≤ k and its prefactor
is given by (−1)u(W )
u(W )+k−i
The full expression of Fk is obtained by applying the translation operator
to the expression (27); indeed any word in Fk can be uniquely obtained by
translating a simple word in Fk that contains M1 and not ML. We conclude
that for k < L,
Fk = T
(−1)u(W )
k−i+u(W )
W , (28)
where T is the translation-symmetrizator that acts on any operator A as
follows : T A =
i=0 T
i A T−i . The presence of T in equation (28) insures
that Fk is invariant by translation on the periodic system of size L. All simple
words being connected, we finally remark that formula (28) implies that Fk
is a connected operator.
Golinelli, Mallick — Connected Operators for TASEP 9
4 Conclusion
By using the algebraic properties of the TASEP algebra (2-4), we have derived
an exact combinatorial expression for the family of connected operators that
commute with the Markov matrix. This calculation allows to fully elucidate
the hierarchical structure obtained from the Algebraic Bethe Ansatz. It
would be of a great interest to extend our result to the partially asymmetric
exclusion process (PASEP), in which a particle can make forward jumps
with probability p and backward jumps with probability q. In particular, we
recall that the symmetric exclusion process is equivalent to the Heisenberg
spin chain : in this case the connected operators have been calculated only
for the lowest orders (Fabricius et al., 1990). This is a challenging and
difficult problem. In our derivation we used a fundamental property of the
TASEP algebra : the rules (2-4) when applied to a word W either cancel
W or conserve the set of indices I(W ). The algebra associated with PASEP
violates this crucial property because there we have Mi Mi+1 Mi = pq Mi.
Therefore the method followed here does not have a straightforward extension
to the PASEP case.
Appendix: Proof of equation (23)
Let us define the following series
Q1 = −αM1 , (29)
Qi = (1 + αMi)Qi−1(1 + λMi) for i ≥ 2 . (30)
We remark that Q defined in equation (22) is given by Q = QL−1. Let us
consider Ri defined by the recursion (25). The indices that appear in the
words of Qi and Ri belong to {1, 2, . . . , i}. Therefore, we have
[Rj ,Mi] = 0 for j ≤ i− 2 , (31)
because the operators M1,M2, . . . ,Mj that compose Rj commute with Mi.
From equations (31) and (5), we obtain
(1 + αMi)Rj(1 + λMi) = Rj for j ≤ i− 2 . (32)
Furthermore, from (25), we obtain
MiRi−1Mi = λMiRi−2Mi−1Mi + αMiMi−1Ri−2Mi . (33)
Because Mi commutes with Ri−2, we can use the relation MiMi−1Mi = 0 to
deduce that
MiRi−1Mi = 0 . (34)
Golinelli, Mallick — Connected Operators for TASEP 10
Using equation (34), we find
(1 + αMi)Ri−1(1 + λMi) = Ri−1 + λRi−1Mi + αMiRi−1 = Ri−1 +Ri . (35)
From equations (32) and (35), we prove that the (unique) solution of the
recursion relation (30) is given by equation (23), Qi = R1 +R2 + . . .+Ri.
References
• Brankov J. G., Priezzhev V. B. and Shelest R. V., 2004, Generalized
determinant solution of the discrete-time totally asymmetric exclusion
process and zero-range process, Phys. Rev. E 69 066136.
• Derrida B., 1998, An exactly soluble non-equilibrium system: the asym-
metric simple exclusion process, Phys. Rep. 301 65.
• Fabricius K., Mütter K.-H. and Grosse H., 1990, Hidden symmetries in
the one-dimensional antiferromagnetic Heisenberg model, Phys. Rev.
B 42 4656.
• Golinelli O. and Mallick K., 2006, The asymmetric simple exclusion
process: an integrable model for non-equilibrium statistical mechanics,
J. Phys. A: Math. Gen. 39 12679.
• Golinelli O. and Mallick K., 2007, Family of Commuting Operators for
the Totally Asymmetric Exclusion Process, Submitted to J. Phys. A:
Math. Theor., cond-mat/0612351.
• Rajesh R. and Dhar D., 1998, An exactly solvable anisotropic directed
percolation model in three dimensions, Phys. Rev. Lett. 81 1646.
• Rajewsky N., Schadschneider A. and Schreckenberg M., 1996, The
asymmetric exclusion model with sequential update, J. Phys. A: Math.
Gen. 29 L305.
• Rákos A. and Schütz G. M., 2005, Current distribution and random
matrix ensembles for an integrable asymmetric fragmentation process,
J. Stat. Phys. 118 511.
• Schütz G. M., 2001, Exactly solvable models for many-body systems far
from equilibrium in Phase Transitions and Critical Phenomena, vol.
19, C. Domb and J. L. Lebowitz Ed., Academic Press, San Diego.
• Spohn H., 1991, Large scale dynamics of interacting particles, Springer,
New-York.
http://arxiv.org/abs/cond-mat/0612351
	Introduction
	Review of known results
	The TASEP algebra
	Simple words
	Ring-ordered product
	Transfer matrix and generalized Hamiltonians Hk
	The connected operators Fk
	Definition
	Elimination of the ring-ordered product
	Explicit formula for the connected operators
	Conclusion
ABSTRACT
  We fully elucidate the structure of the hierarchy of the connected operators
that commute with the Markov matrix of the Totally Asymmetric Exclusion Process
(TASEP). We prove for the connected operators a combinatorial formula that was
conjectured in a previous work. Our derivation is purely algebraic and relies
on the algebra generated by the local jump operators involved in the TASEP.
  Keywords: Non-Equilibrium Statistical Mechanics, ASEP, Exact Results,
Algebraic Bethe Ansatz.

<|endoftext|><|startoftext|>
PROPOSAL FOR AN ENHANCED OPTICAL COOLING SYSTEM 
TEST IN AN ELECTRON STORAGE RING 
                     E.G.Bessonov, M.V.Gorbunkov, Lebedev Phys. Inst. RAS, Moscow, Russia,     
                              A.A.Mikhailichenko, Cornell University, Ithaca, NY, U.S.A. 
                                                                        Abstract  
    We are proposing to test experimentally the new idea of Enhanced Optical Cooling (EOC) 
in an electron storage ring. This experiment will confirm new fundamental processes in beam 
physics and will demonstrate new unique possibilities with this cooling technique. It will 
open important applications of EOC in nuclear physics, elementary particle physics and in 
Light Sources (LS) based on high brightness electron and ion beams. 
1. INTRODUCTION 
    Emittance and the number of stored particles –N in the beam determine the principal 
parameter of the beam, its Brightness what can be defined as / x z sB N γε γε γε= , where each , ,x z sγε  
stands for invariant emittance associated with corresponding coordinate. Beam cooling 
reduces the beam emittance (its size and the energy spread) in a storage ring and therefore 
improves its quality for experiments. All high-energy colliders and high-brilliance LS’s 
require intense cooling to reach extreme parameters. Several methods for the particle beam 
cooling are in hand now: (i) radiation cooling, (ii) electron cooling, (iii) stochastic cooling, 
(iv) optical stochastic cooling, (v) laser cooling, (vi) ionization cooling, and (vii) radiative 
(stimulated radiation) cooling [1-3]. Recently a new method of EOC was suggested [4-7] and 
in this proposal we discuss an experiment which might test this method in an existing 
electron storage ring having maximal energy ~ 2.5 GeV, and which can also function down to 
energies of ~100-200 MeV.  
Figure1: The scheme of the EOC of a particle beam (a) and unwrapped optical scheme (b) 
     EOC [4] appeared as the symbiosis of enhanced emittance exchange and Optical 
Stochastic Cooling (OSC) [8-10]. These ideas have not yet been demonstrated. At the same 
time the ordinary Stochastic Cooling (SC) is widely in use in proton and ion colliders. OSC 
and EOC extend the potential for fast cooling due to bandwidth. EOC can be successfully 
used in Large Hadron Collider (LHC) as well as in a planned muon collider.  
    The EOC in the simpiest case of two dimensional cooling in the longitudinal and 
transverse x-planes is based on one pickup and one or more kicker undulators located at a 
distance determined by the betatron phase advance betxψ = ,2 ( 1/ 2)p kkπ +  for first kicker 
undulator and betxψ = ,2 k kkπ  for the next ones, where kij  = 0, 1, 2, 3,… is the whole numbers. 
Other elements of the cooling system are the optical amplifier (typically Optical Parametric 
Amplifier i.e. OPA), optical filters, optical lenses, movable screen(s) and optical line with 
variable time delay (see Fig.1). An optical delay line can be used together with (or in some 
cases without) isochronous pass-way between undulators to keep the phases of particles such 
that the kicker undulator decelerates the particles during the process of cooling [6], [7].  
2. TO THE FOUNDATIONS OF ENHANCED OPTICAL COOLING  
    The total amount of energy carried out by undulator radiation (UR) emitted by electrons 
traversing an undulator, according to classical electrodynamics, is given by 
      
2 2 2 22
to t e uE r B Lβ γ= ,                                                 (1) 
where  is the classical electron radius;  e,  are the electron charge and mass 
respectively; 
2 /er e m c=
2B  is an averaged square of magnetic field along the undulator period 
uλ ; 
/v cβ = is the relative velocity of the electron;  is the relativistic factor; 2/ eE m cγ = u uL Mλ=  is 
the length of the undulator; and M is the number of undulator periods.  For a planar harmonic 
undulator 2 20 / 2B B= , where B0 is the peak of the undulator field. For a helical undulator 
0B B=
.  The spectral distribution of the first harmonic of UR for  M>>1 is given by [11]  
              1 1/ ( ) (0
cl cldE d E fξ ξ ξ= ≤ ≤ ,                                           (2) 
where , , 2 21 /(1 )cl cltotE E K= + )221(3)( 2ξξξξ +−=f ξ = 1,min 1/λ λ , 1min 1 0|θλ λ == , , ( ) 1f dξ ξ =∫ K = 2 /ue B λ  
 is the deflection parameter, 22 em cπ
1 (1 ) / 2u K
2λ λ ϑ= + + γ  is the wavelength of the first harmonic 
of the UR, ϑ γθ= ; θ  is the azimuthtal angle between the vector of electron average velocity 
in the undulator and the undulator axis.   
    Electrons have effective resonant interaction in the field of the kicker undulator only with 
that part of their undulator radiation wavelets (URW) emitted in the pickup undulator if the 
frequency bands and the angles of the electron average velocities are selected in the ranges   
       ( )
∆⎛ ⎞ = ∆ =⎜ ⎟
⎝ ⎠ M
+                                              (3) 
nearby maximal frequency and to the axes of both pickup and kicker undulators. Optical 
filters which are tuned up to the maximal frequency of the first harmonic of the UR can be 
used for this selection. In this case screens must select the URWs emitted at angles 
( )URW Cϑ ϑ< ∆  to the pickup undulator axis both in horizontal and vertical directions before 
they enter optical amplifier (to do away with the unwanted part of URWs loading OPA). In 
this case the angle between the average electron velocity vector in the undulator and the 
undulator axis will be small:  
         ( ) ( )e Cϑ ϑ∆ < ∆ .                                                        (4) 
    Below we suggest that the optical system of EOC selects a portion of URWs, emitted in 
this range of angles and frequencies, by filters, diaphragms and/or screens. This condition 
limits the precision  of the phase advance  determined by the equation  ,
x zδψ ,
                         ,                                                        (5) ,( ) ( )
x z Cδθ θ< ∆
where  is the change of the angle between the electron 
average velocity and the axis of the kicker undulator owing to an error in the arrangement of 
, , , , ,( ) (2 / )sin( )
bet bet
x z x z bet x z bet x zAδθ π λ δψ= ,
undulators,  is the amplitude of the betatron oscillations of the electron in the storage 
ring, in the smooth approximation δψ ,  is the displacement of the kicker 
undulator from optimal position,  is the length of the period of betatron oscillations.  
, ,x z betA
, , ,2 /
x z x z betsπ λ= ∆ s∆
, ,x z betλ
   The number of the photons in the URW emitted by electrons in suitable cooling frequency 
and angular ranges (3) is defined by the following formula (see Appendix 1) 
              
1max 1
,                                                     (6) 
where , 2 21 1( / ) 3 / 2 (1 )
cl cl
totE dE d E M Kω ω∆ = ∆ = + 1maxω = 1min2 /cπ λ ,   [11]. 
Filtered URWs must be amplified and directed along the axis of the kicker undulator.  
2 1 137e cα = / ≅ /
   If the density of energy in the URWs has a Gaussian distribution with a waist size ,
w x zσ σ> , 
, the R.M.S. electric field strength / 2R uZ L> clwE  of the wavelet of the length 1min2Mλ  in the 
kicker undulator defined by the expression    
                            
2 2 3/ 2
M K M
σ λ σ
.                                          (7)  
where ,
x zσ  are the electron beam dimensions, 
, 1mi4 /R w cZ nπσ λ=  is the Rayleigh length. If 
x z w cσ σ< ,W w c, , the R.M.S. electric field strength clwE  of the wavelet becomes  σ σ=
                              
                                                           (8) 
where ,w cσ = 1min /8uL λ π  is the  waist size corresponding to the Rayleigh length .  /2R uZ L=
     Note that electric field values (7), (8) do not take into account quantum nature of emission 
of URWs in a pickup undulator. They are valid for . Such case can be realized only 
for heavy ions with atomic number  and for deflection parameter . If, according 
to classical electrodynamics, , then it means that in the reality, according to quantum 
theory, one photon is emitted with the energy 
1phN >>
10Z > 1K >
1<phN
max,1ω  and with the probability . 
In this case the electric field strength is determined by the replacement of the energy 
1em php N= <
clE∆  on 
 in (7) and the frequency of the emission of photons  
, where 
1 1 1,ph
q clE E N ω−∆ = ∆ ⋅ = max ph emf f p= ⋅ =
phf N f⋅ < f  is the revolution frequency of the electron in the storage ring.  
     If the number of electrons in the URW sample is , then URW emitted by an electron i  
in pickup undulator and amplified in OPA decrease the amplitudes of betatron and 
synchrotron oscillations of this electron in the kicker undulator. Other  electrons emit 
URWs including  non-synchronous (for the electron i ) photons, which are 
amplified by OPA and together with noice photons of the OPA increase the amplitudes of 
oscillations of the electron i . If the number of non-synchronous photons in the sample 
, where - is the number of noise photons in the URW 
sample at the amplifier front end [6], [7], then the jumps of the closed orbit and the electric 
field strengths are determined by the replacement of the energy 
,e sN
, 1e sN −
,( 1)e s phN N−
, ,( 1)ph e s ph nN N N 1Σ = − + <N nN
clE∆  on 1
qE∆  in (7), (8) and 
the frequency of the emission of photons is ,em phf f p f NΣ fΣ= ⋅ = ⋅ < . In the opposite case, 
, the electric field strengths are determined by the replacement of the energy , 1phN Σ > 1
clE∆  
on  in (7) and the frequency of the emission of photons is .  1 , 1
phE N ω
Σ∆ = ⋅ ,max
2 /e s e sN M N
     In our case , 1,min , ,0 , λ σΣ= eN Σ  stands for the number of electrons in the 
bunch, ,0sσ  is the initial length of the electron bunch.  
    The maximum rate of energy losses for the electron in the fields of the kicker undulators 
and amplified URW is  
max ( ) |
w w c
loss w u m ph kick amplP eE L f N N σ σβ α⊥ == − Φ =
1,min
8 ( )
ph kick ample f N N K
π π α
,                 (9) 
where γβ /K=⊥ ;  is the number of kicker undulators (it is supposed that electrons are 
decelerated in these undulators); and 
kickN
a m p lα  is the gain in the optical amplifier. The function 
1( ) | phph N em phN p NΦ = = , 1( ) 1phph NN >>Φ =  takes into account the quantum nature of 
the emission (the frequency of emission of photons ~ phf N⋅  and the electric field strength 
~1/ phN ). It follows, that quantum nature of the photon emission in undulators leads to the 
decrease of the maximum average rate of energy losses for electrons in the fields of the 
kicker undulator and amplified URW by the factor 1( ) | 9.phph N phN NΦ = 3 .  
     The damping times for the longitudinal and transverse degrees of freedom are  
         ,0, max
s EOC
lossP
,           
,0 ,0
, , max
x EOC s EOC
x loss x kick
σ β σ
= = ,                                     (10) 
where ,0Eσ  is the initial energy spread of the electron beam, Ploss  stands for the power losses 
(9), ,0xσ  is the initial radial beam dimension determined by betatron oscillations, , 0x kickη ≠  is 
the dispersion function in the kicker undulator, . Note that the damping 
time for the longitudinal direction does not depend on 
,0 , ,0( /x x kick E Eησ η β σ
,x kickη  and one for the transverse 
direction is inverse to ,x kickη . Factor 6 in (10) takes into account that the energy spread for 
cooling is ,02 Eσ , electrons does not interact with their URWs every turn (screening effect) and 
that the jumps of the electron energy and closed orbit in general case lead to lesser jumps of 
the amplitude of synchrotron and betatron oscillations [6].  
     The equilibrium spread in the positions of the closed orbits 2
, the spread of betatron 
amplitudes  2
A⎛ ⎞⎜ ⎟
 and corresponding beam dimensions ,EOC EOCx xησ σ  determined by EOC 
        
2 2 1
, , 1 ,
/ 2 / 2 | ( 1 / ) | |
2 2ph
EOC EOC
EOC EOC
x eq x eq x N e s n ph
eq eq
A x N N
N xη ησ σ δΣ >
⎛ ⎞ ⎛ ⎞= = = = − +⎜ ⎟ ⎜ ⎟
⎝ ⎠ ⎝ ⎠
,              (11) 
where 1 2 max( /x loss )x E Eηδ η β
−= ∆  is the jump of the electron closed orbit determined by the energy 
jump  of the electron in the fields of the kicker undulator and its amplified 
URW (corresponds to one-photon/mode or one-photon/sample at the amplifier front end).  
max max /loss loss phE P f N∆ =
     The equilibrium relative energy spread of the electron beam 
                                                                                                (12) 
EOC EOC
E eq x kick x kickE ησ β σ η= ,/
     Note that jumps of closed orbits 1xηδ ~1/ .  That is why the electron bunch dimensions 
(11) at the same number of particles in the sample and relativistic factor are much higher ions 
one. As a sequence of small electron charge the number of photons in the URWs 
, 87% of UWRs are empty of synchronous photons and every URW has 
 non-synchronous photons. That is why the contribution of noise photons for 
electrons is greater ( ) then for heavy ions.  
21.15 10 1phN
, 1phN Σ >
/ 87n phN N Nn
    The power transferred from the optical amplifier to electron beam is  
                                                   clampl sample e nP f Nε Σ= ⋅ ⋅ + ,                                                (13) 
where 1,max
sample ph amplNε ω α=  is the average energy in a sample,  is the noise power. This is the 
maximal limit for the power corresponding to the case if all electrons are involved in the 
cooling process simultaneously (screening is absent and the amplification time interval of the 
amplifier  is higher then the time duration of the electron bunch 
amplt∆ bt∆ ).  
   The initial phases inϕ  of electrons in their URWs radiated in the pickup undulator and 
transferred to the entrance of the kicker undulator(s) depend on their energies and amplitudes 
of betatron oscillations. If we assume that synchronous electron enter the kicker undulator 
together with their URW at the decelerating phase corresponding to the maximum 
decelerating effect, then the initial phases for other electrons in their URWs will correspond 
to deceleration as well, if the difference of their closed orbit lengths between undulators 
remains 1,min / 2s λ∆ < . In this case the amplitudes of betatron oscillations, the transverse 
horizontal emittance of the beam in the smooth approximation and the energy spread of the 
beam at zero amplitude of betatron oscillations of electrons must not exceed the values  
,lim 1min , /x x x betA A λ λ π<< = ,     1min2xε λ< ,       
p k c lE E L
σ σ λ β
< = ,           (14) 
where ,p kL  is the distance between pickup and kicker undulators along the synchronous orbit, 
, ,ln / lnc l p kd T d pη = −  is the local slippage factor between the undulators,  is the momentum 
of an electron,  is the pass by time between pickup and kicker undulators. In 
accordance with the betatron phase advance 
βcLT kpkp /,, =
,2 ( 1/ 2)
x p kkψ π= + ; the value ,p kL =  
, ,( 1/x bet p kk 2)λ + , where , /x bet xC vλ =  is the wavelength of betatron oscillations, C is the 
circumference of the ring, xv  is the betatron tune.  
     The third equation in (14) can be overcome if the isochronous bend or bypass between 
undulators will be used. In some cases controllable variable in time optical delay-line can be 
used to change in situ the length of the light pass-way between the undulators during the 
cooling cycle to keep the decelerating phases of electrons in the kicker undulator in the 
process of cooling [6], [7].  
   Below we investigate this case in more details. The difference  in the propagation time 
of the URW and the traveling time  of the electron between pickup and kicker undulators 
depends on initial conditions of electron’s trajectory which can be expressed as   
,p kT
1 0 0 2
0 0 0 0
s s s s
s s s s
x C S E
c t c d c x d x d d
τ τ τ
ρ ρ ρ β
′∆ = − = − − −∫ ∫ ∫ ∫ τρ , 
where , ηβ xxx += ηβ 000 xxx += ,  is the deviation of the electron from its closed orbit, 
 is the deviation of the closed orbit itself from synchronous one,  and  stand for 
appropriate deviations at location 
ηx β0x η0x
0s s= . Two eigenvectors called sine-like S(z) and cosine-
like C(z) trajectories and ρ  stands for the local bending radius,  is a constant which is 
determined by the optical delay line. Basically vectors S(z), C(z) describe the trajectories with 
initial conditions like 
0 0 0 0 0( ) 0; ( , ) ( , )x s x s s x C s′ = = ⋅ s  and 0 0 0 0 0( ) 0; ( , ) ( , )x s x s s x S s′ s= = ⋅ , where s0 
corresponds to the longitudinal position of the pick up. So the transverse position of the 
particle has the form [12] 
)/(),(),(),()( 200000 EEssDssSxssCxsx β∆⋅+⋅′+⋅=  
where   is the deviation of the electron energy from the dedicated energy  and 
dispersion D defined as  
dE E E∆ = − dE
ssSssD
∫∫ ⋅−⋅−=
00 )(
),(),( ,  
Dispersion D(s,s0) describes transverse position of the test particle having relative momentum 
deviation from equilibrium as big as /p p∆ , while its initial values of transverse coordinates 
at s=s0 are zero. So full expression for transverse position of particle comes to form   
0 0 0 0 ,0 0 ,0 0 0 2( ) ( , ) ( , ) [ ( , ) ( , ) ( , ) ]x x
x s x C s s x S s s C s s S s s D s s
′ ′= ⋅ + ⋅ + ⋅ + ⋅ + , 
where xη describes periodic solution for dispersion in damping ring (slippage factor) and 
 marks pure betatron part in transverse coordinate; ββ 00 , xx ′ 0, 0,,x xη η′  stand for its value at 
location of pickup kicker. So the time difference becomes  
51 0 0 52 0 0 56 0 , ,2( , ) ( , ) ( , )t t p k c l
c t c R s s x R s s x R s s c cT
′∆ = − ⋅ − ⋅ − ≅ + ,                 (15) 
where we neglected terms responsible for the betatron oscillations (i.e. R51=0, R52=0).  
     In general case , 0c lη ≠  the initial phase of an electron in the field of amplified URW 
propagating through kicker undulator, according to (15), 1,max 0in tϕ ω= ∆ ≠  and the rate of the 
energy loss   
max| | sin( ) (loss loss in )P P fϕ= − ⋅ ∆E ,                                          (16) 
where ( ) 1 | ( ) | / 2inf E E Mϕ π∆ = − ∆ , if  | | 2in Mϕ π≤  and ( )f E 0∆ =  if | |inϕ > 2 Mπ . The 
function  takes into account that electron with some energy  and its URW enter 
kicker undulator simultaneously at the phase 
(f E∆ ) dE
| | 0inϕ =  and passing together all undulator 
length zero rate of the energy loss if . Electrons having the energies  so they 
and their URWs enter the kicker undulator non-simultaneously with different phases, travel 
together shorter distance in the undulator under smaller rate of the energy change.  
0tc = dE E≠
     According to (16), electrons with different initial phases are accelerated or decelerated and 
gathered at phases 2min mϕ π π= +  ( M m M− ≤ ≤ , 0, 1,..m M= ± ± ) and at energies  
                           
1,min
1,max , , , ,
(2 1)(2 1)
2m d d dp k c l p k c l
E E E E
λ βπβ
ω η η
= + = + ,                       (17) 
if RF accelerating system is switched off (see Fig.2).  
Figure 2: In the  EOC scheme electrons are grouping near the phases 2in mϕ π π= +  (energies mE ) 
     The energy gaps between equilibrium energy positions have magnitudes given by  
                                           1,min 2
gap m m d
p k c l
E E E E
= − = β .                                           (18) 
     Note that the energy gap (18) is 2 times higher the limiting energy spread of the beam at 
zero amplitude of betatron oscillations of electrons (14).  
     The power loss  is the oscillatory function of energy |lossP |dE E− with the amplitude 
linearly decreasing from the maximum value  at the energy max| lossP | dE E=  to a zero one at the 
energy | | . If the RF accelerating system is switched off, the electron energy 
falls into the energy range | |
d gapE E M Eδ− ≥ ⋅
dE E− < gapM Eδ⋅ , excitation of synchrotron oscillations by 
non-synchronous photons can be neglected then the electron energy is drifting to the nearest 
energy value mE . The variation of the particle’s energy looks like it produces aperiodic 
motion in one of 2M  potential wells located one by one. The depth of the wells is decreased 
with their number | . If the delay time in the optical line is changed, the energies |m mE  and 
the energies of particles in the well are changed as well. In this case particles stay in their 
wells if their maximal power loss satisfies the condition    
                                             ,                                                   (19) max| | | / extloss m lossP dE dt P> + |
where | |  stands for the external power losses determined by synchrotron radiation.  extlossP
3. VARIANTS OF OPTICAL COOLING 
Depending on the local slippage factor and coefficients R51, R52 and R56 in (14), different 
variants of optical cooling can be suggested.  
     1. The local slippage factor 0, =lcη , betatron oscillations are absent, dispersion function 
in the pickup undulator , 0x pickupη ≠ . In this case  and the initial phase for all electrons 
can be installed 
t constδ =
inϕ = / 2π . It corresponds to electrons arriving kicker undulator in 
decelerating phases of theirs URWs under maximum rate of energy loss. In this case 
electrons will be gathered near to the synchronous electron if a moving screen opens the way 
only to URWs emitted by electrons with the energy higher than synchronous one. This is the 
case of an EOC in the longitudinal plane based on isochronous bend and screening technique.  
     If electrons develop small betatron oscillations (betatron oscillations introduce phase shift 
less than π/2, (see (14)), then the electron beam will be cooled in transverse and longitudinal 
directions simultaneously. If the dispersion function value in the pickup undulator 0xη =  or if 
the synchrotron oscillations of electrons are small (no selection in longitudinal plane) then the 
cooling in the transverse direction only takes place ( , 0x kickη ≠ ).  
     2. The scheme of OSC can be used at 0, =lcη  [8]. In this scheme the pickup undulator is a 
quadrupole one and kicker undulator is ordinary one. They have the same period. The 
magnetic field in the quadrupole undulator is increased with the radial coordinate   by the 
( )B x G≅ ⋅ x  and changes the sign at 0x = , where G stands for the gradient. The phase of 
the emitted URWs changes its value on π  at 0x =  as well. That is why electrons are 
grouped around synchronous orbit in the ring where they do not emit URWs. The deflection 
parameter in the quadrupole undulator increased with the radial coordinate  and 
so the emitted wavelength also 
( | ( ) |)K B x∝
1min (1 ( )) / 2u K x
2λ λ≅ ⋅ + γ . As the resonance interaction of 
URW and the electron emitted the URW is possible in the kicker undulator only if deflection 
parameters of undulators are near the same, this opens a possibility for initial selection of 
amplitudes in the pickup undulator. So the cooling can be arranged for some specific 
amplitude of synchrotron oscillations only. The continuous resonance interaction and cooling 
is possible if the magnetic field of kicker undulator is decreased in time for cooling process. 
Electrons having other than resonance synchrotron amplitude do not interact with cooling 
system. Betatron oscillations in this scheme must introduce the phase shift less then 2/π  as 
well. This can be arranged by proper zeroing cos and sin-like trajectory integrals [13].  
  The scheme with two quadrupole undulators (the pickup one and the kicker one) described 
in [13]. In this case the second quadrupole undulator decreases the amplitudes of synchrotron 
oscillations for positive deviations  (we choose the conditions when electrons are 
decelerated in their URWs if  and betatron amplitudes are neglected 
( )) and increases them for negative ones 
, 0pickupxη >
, 0pickupxη >
, , 0pickup kickx xβ β= = , 0pickupxη <  as it experience 
deceleration again (the phases of URWs change their value on π  at 0x = and simultaneously 
the electron will pass the kicker undulator at opposite magnetic field). In this case to cool 
electron beam additional selection of URWs can be used by the screen (cut off URWs 
emitted by electrons at negative deviations ).  Another scheme which can be used, based on 
truncated undulator with the magnetic field of the form 
0( ) |xB x G> ⋅ x  and . 
Such undulator can be linearly polarized one with upper or down array of magnetic poles. It 
was used in the undulator radiation experiments in circular accelerators [14].  
0( ) | 0xB x <
     3. If  , 0c lη ≠ , betatron oscillations of electrons introduce phase shift / 2π<<  and the 
energy gaps have the magnitude gapEδ = ,0(3 5) Eσ÷ ⋅ , then transit-time method of OSC based 
on two identical undulators can be used [9]. In this case the main part of electrons including 
tail electrons will be gathered at the energy sE , if the energy 0 |m mE E =0=  was chosen equal 
to sE . Decreasing of the beam dimensions leads to decrease of the rate of cooling. In this 
case a time depended local slippage factor  can be used to decrease the energy gap for 
cooling process and to increase the rate of cooling.  
, ( )c l tη
     4. If  , 0c lη ≠ , the energy gaps between equilibrium energy positions have the magnitudes  
, RF accelerating system is switched off  then electrons are gathered 
at phases 
.0(3 5) /gap EEδ σ≅ ÷ ⋅ M
inϕ  and energies  independently on amplitudes of betatron oscillations. If the 
screen overlap URWs emitted by electrons at negative deviation from one having minimum 
energy  and optical system change the delay time of the rest URWs to move the energies 
min ,0(3 5) EE E σ> + ÷ ⋅  to the energy  then electrons loose their energy and amplitudes of 
betatron oscillations until their energy takes minimum one. Cooling takes place according to 
the scheme of the EOC.  
     5. For this variant  , 0c lη ≠ , the energy gaps between equilibrium energy positions have the 
magnitudes  , the RF accelerating system of the storage ring is switched 
off, the screen absorbs the URWs emitted by electrons at a negative deviation of theirs 
position from the synchronous one in the radial direction, energy layers are located at positive 
deviations from synchronous one outside the energy spread of the beam and optical system 
change the delay time of the URWs to move the energy layers to the synchronous energy. 
Then the energy layers capture small part of electrons of the beam first and electrons with 
smaller energy are captured increasingly and loose their energy and betatron amplitudes until 
reaching the minimum energy  allowed in the beam. So the cooling process takes place. 
This process can be repeated. In this case the energy jump  of the electron 
in the kicker undulator must be less than the energy gap 
,0 /gap EEδ σ<< M
max max| | /loss loss phE P f Nδ =
gapEδ  determined by synchronous 
photons (18) and non-synchronous photons in the URWs having higher energy jumps 
 maxnon synEδ − = max ,loss phE Nδ Σ  at . That is why the next condition must be fulfilled  , 1phN Σ >
                                  
1,minmax 2
phloss ph N gap d
p k c l
E N E E
ηΣΣ >
< = β
.                                            (20)  
Otherwise electrons can jump over the energy layers and cooling will not be effective.  
     If the RF accelerating system of the storage ring is switched on, the average energy loss 
per turn  is higher than the energy loss max| | /turnloss lossE P∆ = 0| (sin sin )seV φ φ−  of the electron in 
the RF accelerating system of the storage ring or 
                                     ,                                           (21) 0| sin sin | /
s lossE eVφ φ δ− <
then the energy of electrons is drifting to the nearest energy value  and EOC takes place. 
Here  is the amplitude of the RF accelerating voltage,  is the synchronous phase 
determined by the equation ,  is the energy 
loss per turn, 
0V sφ
, 0 0sin / /s SR s sE eV V Vφ = ∆ = , , /SR s SR s sE P f eV∆ = =
2 2 22
, 3SR s eP cr B γ=  
3 42.77 10 / s sR Rγ⋅  eV/sec is the average power of the synchrotron 
radiation emitted by the electron in the ring. The value  eV/turn.  7 4, / 5.8 10 /SR s sP f Rγ
     To keep the condition (21) satisfied, the range of RF phases of particles  2 |  
interacting with their URWs must be limited by the value  determined by equality 
in (21). This can be done by using OPA with short amplification time interval 
|sφ φ−
,12 | |c sφ φ−
,1amplt∆  
corresponding to the range of phases   and by overlapping the center 
of this time interval with synchronous particle, where 
RF ampltω ⋅ ∆ < ,12 | |c sφ φ−
2RF RFfω π= , RFf  is the frequency of 
the RF accelerating system of the ring. The last condition is equivalent to  
                            ,                 (22) ,1 ,12 | | /
laser
ampl ampl c s RFl l c φ φ ω< = −
where  is the length of the amplified laser bunch. In this case 2  electron ellipses 
appear around synchronous phase in the longitudinal plane. The amplitudes of synchrotron 
oscillations are determined by the energies  which are moved to synchronous energy if the 
optical system changes the delay time of the URWs. The condition (20) must be fulfilled if 
the RF accelerating system of the storage ring is switched on as well.  
,1ampll 1M +
     The electrons will be gathered effectively on elliptic trajectories having maximum 
energies  (17) if conditions (20), (21) are fulfilled, and the deviation  of 
the electron energies from  in the process of interaction are small. The last condition can 
be overcame by limiting the amplification time by interval 
mE mE E Eδ = −
,2am plt∆  and the corresponding 
length  to  ,2 ,2ampl ampll c t= ∆
                                                ,0,2
,02 2
s glaser laser
ampl ampl
< = ap ,                                                (23) 
where  is the initial length of the electron bunch. Above we suggested that electrons are 
moving along elliptical trajectories 
2 21 /EE sσ∆ = ± − sσ  and interact with URWs at the top 
energies  in the region of the energy deviations .  mE∼ / 8m gapE E E Eδ δ= − =
     Multiple processes of excitation of synchrotron oscillations by non-synchronous and noise 
photons will increase the widths of the electron ellipses and to transfer electrons from one 
ellipse to another. They can be neglected if the equilibrium energy spread (12) of the beam 
is less then the energy gap (18)or  
                                                            ,
E O C
E e q g a pEσ δ<                                                (24) 
     Damping time (10) in variant 5 will be increased  times:   ,02 /s amplσ l
                           ,0 ,0max
E sEOC
loss amplP l
τ =                    
,0 ,0
s x sEOC
loss x kick ampl
β σ σ
= .                                    (25) 
     The variant 5 permits to avoid any changes in the existing lattice of the ring (isochronous 
bend, bypass). It works easier for existing ion storage rings (see Appendix 2).  
     The screen permits us to select in pickup undulator electrons with positive deviations of 
both betatron and synchrotron oscillations, and such a way to produce effective cooling both 
in the transverse and longitudinal direction (we suggested 0xη ≠  in pickup and kicker 
undulators in this case). Using the number of kicker undulators  permits to cool the 
beam either in two directions or in the transverse or in the longitudinal directions only by 
selecting corresponding distances between kicker undulators [4], [6].  
1kickN >
4. OPTICAL SYSTEM FOR THE EOC SCHEME  
UR of an electron gets its well known properties only after the electron passed the undulator 
and UR is considered in far zone. The lens located near the pickup undulator can strongly 
influence to the UR properties if its focus is inside the undulator [15]. 
    For effective cooling of an electron beam in a storage ring, parameters of the beam under 
cooling and the optics of EOC system must fulfill certain requirements.  
   1. The URW, emitted in a pickup undulator must be filtered and passed through the laser 
amplifier.  
   2. In variant 1 each electron in a beam should enter kicker undulator simultaneously with its 
amplified URW emitted in a pickup undulator and to move in decelerating phase of this 
URW. For the test electron of a beam (for example, for the synchronous electron with zero 
amplitude of betatron oscillations) this requirement is satisfied by equating the propagation 
time of the URW with the traveling time of the electron between undulators. Conditions (14) 
are necessary for other electrons of the beam to get decelerating phases of theirs URWs in 
this case.  
    3. Each electron in the beam should enter the kicker undulator with its URW emitted in the 
pickup undulator near the center of this wavelet in transverse direction. This requirement will 
be satisfied if the transverse sizes of all URWs in the kicker undulator are overlapped.     
   The rms transverse size of one URW at the distance l  from the pickup undulator is equal to 
 (assuming that radiation is emitted from the center of the undulator). 
At the distance l  from the undulator the R.M.S. transverse size of the beam of emitted URWs 
is equal , where d  is the transverse size of the electron beam. 
If the optical screen opens the way to radiation from the part of the electron beam only, so the 
only small angles to the undulator axis   passed through, then at the distance from 
the end of the undulator , where   
( ) ( /2)w c ul Mσ θ λ∆ +
, ( ) /2 ( )w b c u cd Mσ θ λ lθ∆ + ∆ ⋅+
( )Cϑ ϑ∆ < ∆
cl l=
                         
( ) 2
                                                       (26) 
URWs will be overlapped, and the transverse size of the beam of the selected wave packets 
will be equal 2 (  (see. Fig. 3). If the beam of URWs passed optical lenses, 
movable screen, the optical amplifier, optical delay and injected into the kicker undulator 
then electrons under cooling will hit theirs URWs in the transverse plane if the transverce 
dimensions of the electron beam in the undulator are less then .  
)cd Mθ λ+ ∆ u
,2 w bσ
  4. Generally the angular resolution of an electron bunch by an optical system is   
                              1min1.22res D
δ ϕ ≅ ,                                                          (27) 
where  is diameter of the first lens. This formula is valid, if elements of the source emit 
radiation which is distributed uniformly in a large solid angle. In our case only a fraction of 
the lens affected by radiation, as . That is why the effective diameter  must 
be used in (27). At the distances   the size ,  so the space resolution of the 
optical system is 
,w bD σ> ,eff w bD σ=
cll > , ( )w b c lσ ∆θ
res resx lδ δ ϕ≅  or  
1min 1min1.22 /( ) 0.86res c ux Lδ λ θ λ≅ ∆ = ,                                     (28) 
where 2( ) ( ) / (1/ ) (1 ) /c c K Mθ ϑ γ γ∆ = ∆ = +  is the observation angle.  
     Note that at closer distances , the spatial resolution is better. More complicated 
optical system can be used for increase the spatial resolution in this case.  
cl l<
                                                       Figure 3: Scheme of URW’s propagation.  
     5. URWs must be focused on the crystal of OPA. For the URW beam, the dedicated 
optical system with focusing lenses can be used to make the Rayleigh length equal to the 
length of the crystal (typically ~1 cm) for small diameters of the focused URW beam in the 
crystal (typically ~0.1 mm).  
     6. The electron bunch spacing in storage rings is much bigger than the bunch length. The 
same time structure of the OPA must take advantage on this circumstance.   
5. USEFUL EXPRESSIONS FROM THE THEORY OF CICLIC ACCELERATORS 
The equilibrium value of relative energy spread of the electron beam in the isomagnetic 
lattice of the storage ring determined by synchrotron radiation (SR) described by expression  
655 6.2 10
eq SR
s s sE R R
γ γ−Λ= ⋅
,                                       (29) 
where  cm, 11/ 3.86 10c mc
−Λ = ⋅
sR  is the equilibrium bending radius of the synchronous 
electron, in the smooth approximation 4 /s c s sR Rαℑ = −  is a coefficient determined by the 
magnetic lattice ( ),  is the momentum compaction factor, 0 3s< ℑ < cα sR  is the equilibrium 
averaged radius of the storage ring [16].  
     For small synchrotron oscillations the equilibrium length of the electron bunch is 
eq SR
eq SR c E
,                                                           (30)  
where 0 cos /2rev c s sh eV Eω α φ πΩ=  is the synchrotron frequency, 2rev fω π= , h  is an 
accelerating RF voltage harmonic order.  
     The equilibrium radial synchrotron and betatron beam dimensions are  
eq SR
eq SR E
x x Eη
σ η= ,           , 55 3 6.2 10
eq SR s
σ γ γ−
= = ⋅
,                       (31) 
where in the smooth approximation /s c s sR Rα 1ℑ = −  is a coefficient determined by the 
magnetic lattice ( ),  is a coefficient [16],  [17].  3x sℑ +ℑ = 1F ∼
      The damping time in the storage ring determined be synchrotron radiation in the bending 
magnets and comes to the following values ( 1zℑ = )  
                            , , 3 3
, , , , , , ,
355SR s sx z s
SR s x z s e x z s x z s
s sR R RE
= = =
ℑ ℑ ℑ
[sec].                                          (32) 
     The maximum deviation of the energy from its synchronous one is  
                            0
[( 2 )sin 2 cos ]sep s s
E h E
β π φ φ
= ± − − sφ ,                                      (33) 
where  is the storage ring slippage factor of the ring.  2/1 γαη −= cc
     For the ordinary storage ring lattice (without local isochronous bend or bypass between 
undulators) the natural local slippage factor 
                                                                  
p kc l c
L Cη η .                                                      (34) 
6. EXAMPLE 
Below we consider an example of one dimensional EOC of an electron beam in the 
transverse x-plane in strong focusing storage ring like Siberia-2 (Kurchatov Institute Atomic 
Energy, Moscow) having maximal energy 2.5 GeV [18]. The magnetic system of the ring is 
designed with so-called separate functions. The lattice consists of six mirror-symmetrical 
superperiods, each containing an achromatic bend and two 3 m long straight sections. For the 
functionality, the half of the superperiod arranged with two sections. The first one, 
comprising the quadrupoles F,D,F and two bending magnets is responsible for the achromatic 
bend and high xβ , yβ  functions in the undulator straight section. The second part, 
comprising quadrupoles D,F,D and dispersion free straight section, allows to change the 
betatron tunes, without disturbing the achromatic bend. Main parameters of the ring, the 
electron beam in the ring, pickup and kicker undulators and Optical Parametric Amplifier are 
presented in Tables 1 - 4.  
      After single bunch injection in the storage ring the energy 100 MeV is establishd for the 
experiment and the beam is cooled by synchrotron radiation damping (see section 5). In this 
case the energy spread and the beam size acquire equilibrium values in ~40 seconds (see 
Table1). The equilibrium energy spread is equal to , / 3.94 10eq SRE Eσ 5−= ⋅ , the length of the bunch 
 cm at the amplitude of the accelerating voltage  =  73 V, the synchronous 
voltage  V, the radial emittance  cm
, 2.32eq SRsσ = 0V
1.89sV =
, 6 21.25 10 [ ]eq SRx E GeV
−∈ = ⋅ 81.25 10−= ⋅  rad, the radial 
betatron beam dimension at pickup undulator , 24.61 10eq SRxσ −= ⋅  mm.  
     Following synchrotron radiation damping the amplitudes of radial betatron oscillations 
,0xσ  are artificially excited to be suitable for resolution of the electron beam in the experiment 
with EOC (see Table 2). The amplitudes of  synchrotron oscillations must stay damped to 
work with short electron bunches and short duration of amplification OPAs.  
     In the variants of the example considered below the optical system resolution of electron 
beam, according to (28), is 1.9resxδ =  mm at 
1,min 2 10λ
−= ⋅  cm, 240uMλ =  cm. It yields that 
effective EOC in this case is possible if the beam under cooling has total size in the pickup 
undulator , 2.0x totσ >  mm. We accepted the initial energy spread , the 
dispersion beam size  mm, the length of the electron bunch   
cm, its transverse size at pickup undulator 
,0 3.94 10
eq SR
E E Eσ σ
−= = ⋅
,0 3.15 10xησ
−= ⋅ ,0sσ =
,2 4eq SRsσ = .64
,0 4xσ =  mm,  the laser amplification length 
 mm (duration 0.5 ps), the radial betatron beam size in kicker undulator 1.5laserampll = ,0 1xσ = mm, 
the URW beam size  mm. We took the number of electrons at the orbit . In 
this case the number of electrons in the URW sample is 
, 2w bσ =
, 5 10eN Σ = ⋅
, 129.5e sN = , the number of non-
synchronous photons in the sample is , 2.5phN Σ =  for the case of one noice photon at the OPA 
front end. In this storage ring the natural local slippage factor (34) is , , /c l c p kL Cη η=  
, /c p kL Cα =
34.45 10−⋅ , the energy gap (18) is 0.62gapEδ =  keV. 
     We consider EOC in the transverse plane. In this case the dispersion beam size 
,0x resxησ δ<<  and that is why there is no selection of electrons in the longitudinal plane. That is 
why in order to prevent heating in the longitudinal plane by energy jumps determined by both 
synchronous and non-synchronous photons in the URWs, two kicker undulators are used 
which produce zero total energy jump [4], [6]. Note that the purpose of this experiment is to 
check physics of optical cooling. At the same time cooling in the transverce plane is 
important for heavy ions in RHIC, LHC.  
     We accept the distance between pickup and first kicker undulator along the synchronous 
orbit  m (, 72.27p kL = ,9 , 4
x p kkψ π= = ). It corresponds to the installation of undulators in 
the first and seventh straight sections which are located at a distance 72.38 m (we count off 
pickup undulator). Second kicker undulator is located on the same distance from the first one. 
Optical line is tuned such a way that electrons are decelerated in the first kicker undulator and 
accelerated in the second one.  
     The URWs have the number of the photons emitted in the pickup undulator (see Table 3) 
 per electron in the frequency and angular ranges (3) suitable for cooling. The 
limiting amplitude of betatron oscillations (14) is 
21.15 10phN
,lim 3.2xA =  mm. The energy spread of the 
beam limited by the separatrix is 4/ 3.3 10sepE E
−∆ = ⋅ . The electric field strength at the first 
harmonic of the amplified URW in the kicker undulator is clwE ≅ 32.06 10−⋅  V/cm. The power 
loss for the electron passing through one kicker undulator together with its amplified URW 
comes to  eV/sec if the amplification gain of OPA is (see Tables 2, 
4). This power loss corresponds to the maximal energy jumps  eV and the average 
energy loss per turn  eV/turn. The jump of the closed orbits is 
max 62.03 10lossP = ⋅
710amplα =
max 73lossE∆ =
0.84turnlossE∆ =
1 55.8 10xηδ
−= ⋅  cm. 
Below we will consider two variants of EOC. 
     1. The variant 1 (section 3). For the parameters presented above the cooling time for the 
transverse coordinate, according to (10), comes to , 18.5x EOCτ =  msec. SR damping time ~ 40 
sec is much bigger (see Table 1). The average power transferred from the optical amplifier to 
electron beam (13) is 0.061amplP =  mW. It is determined by the power of the URWs (0.036 
mW) and noise average power (41) equal to 0.025 mW. We adopted one-photon/mode (one-
photon/sample) at the amplifier front end corresponding to pulse noise power at the amplifier 
front end W,  W at the gain , used  eV. 
We took into account that the amplification time interval of the amplifier  is less than 
the revolution period by a factor of 
,0 /nP c Mω λ= = 74 10−⋅
2nP = 70 10G = 0.5ω
amplt∆
,0/ / 2 8.27b sC c t C σ∆ = = ⋅  times.  
     Necessary conditions for selection of electrons must be created: high beta function in the 
pickup undulator (to increase the transverce dimensions of the bunch for selection of 
electrons with positive deviations from closed orbit), the isochronous bend between 
undulators. We believe that the lattice of the ring is flexible enough to be changed in 
nesessary limits by analogy with those presented in [19].  
    The number of  electrons in the bunch is enough to detect them in the experiment and to 
neglect intrabeam scattering.  
     Note that if one kicker undulator is used in the scheme of two-dimensional EOC and the 
beam resolution is high  mm, the equilibrium relative energy spread, the spread of 
closed orbits, the longitudinal, dispersion and radial betatron beam dimensions determined by 
EOC, according to (11), are equal to 
210resxδ
, 5/ 5.56 10eq EOCE sEσ
−= ⋅ ,,0 2.63
eq EOC
sσ =,  cm,  , ,eq EOC eq EOCx xησ σ= =
24.46 10−⋅  mm. It follows that if the number of electrons in the bunch  then their 
influence on the equilibrium dimensions of the bunch can be neglected, longitusinal 
dimensions and the energy spread stay small and the radial betatron beam dimensions 
determined by EOC are high degree decreased. In reality the equilibrium synchrotron and 
betatron bunch dimensions will be much higher. This is the consequence of the finite beam 
resolution in pickup undulator. That is why we use two kicker undulators to keep longitudinal 
bunch dimensions small to exclude the excitation of longitudinal oscillations by multiple 
energy jumps. The situation could be better if we had effective OPA at the wavelength 
 cm, i.e. about one order less. Shorter undulator can be used as well.  
, 5 10eN Σ < ⋅
1,min 3 10λ
      2. The variant 5 (section 3).  The variant 5 requires easier tuning of the lattice for the 
arrangement of the local small slippage factor between undulators. In the case of one-
dimensional EOC, using two kicker undulators, the multiple processes of excitation are not 
essential because of the excitation of the synchrotron oscillations in this case is absent or 
unessential and that is why there is no need in the small local slippage factor. In this case the 
initial phase ( , )in xE Aϕ  of the electron in the field of amplified URW propagating through the 
kicker undulator, according to (15) is the function of both the energy (which is a constant in 
this variant of EOC) and the amplitude of betatron oscillations. The amplitudes of betatron 
oscillations will increase or decrease depending on their initial phases until they reach the 
equilibrium amplitudes determined, in the smooth approximation, by the expression 
, 1min ,2x m x betA m /λ λ= ⋅ π  (generelised expression (14)) corresponding to the phases 2m mϕ π= . 
Variable in time optical delay-line can be used to change in situ the length of the light pass-
way between the undulators during the cooling cycle to move the initial phases to 
2in m / 2ϕ π π+  for production of the optimal rate of decrease the amplitudes of betatron 
oscillations of electrons in the fields of amplified URW and the kicker undulator. The 
damping time for radial betatron oscillations, according to (25), is , 0.57x EOCτ =  sec.  
     Note that in the case of two-dimensional EOC using one kicker undulator, according to 
(18), the energy gaps between equilibrium energy positions have the magnitudes 
0.62gapEδ =  keV. They are higher than the energy jumps of electrons in kicker undulator 
 eV and the energy jumps max 73lossE∆ = max , 115loss phE N Σ∆ =  eV determined by the non-
synchronous photons in the URWs (see condition (20)). The conditions (22), (23) or 
 limit the length of the laser URWs by the values   cm,  cm. 
The accepted laser amplification length  mm is enough to satisfy these conditions. 
The damping time for radial betatron oscillations, according to (25), is 
laser
ampl ampll l< ,1 1.69ampll = ,2 0.32ampll =
1.5laserampll =
, 1.14x EOCτ =  sec. This 
damping time is less then one for synchrotron radiation damping (see Table1). The 
equilibrium energy spread determined by EOC is about 6-10 times higher then the energy 
gap. It follows that the local slippage factor between undulators, according to (24), must be 
decreased by a factor higher then 10. Unfortunately the resolution of the electron beam will 
not permit to reach the equilibrium energy spread and cooling in the transverse plane in this 
case will be less then heating in the longitudinal one.  
7. CONCLUSIONS 
  We have shown in this paper, that test of EOC is possible in the 2.5 GeV electron strong 
focusing storage ring tuned down to the energy ~ 100 MeV. Electron beam can be cooled in 
transverse direction. The damping time is much less than one determined by synchrotron 
radiation. So the EOC can be identified by the change of the damping rate of the electron 
beam. Variant of cooling is found, which permits to avoid any changes in the existing lattice 
of the ring (for production of isochronous bend, bypass). It can work for existing ion storage 
rings as well (see Appendix 2). Three short undulators in this variant installed in the storage 
ring have rather long periods and weak fields. They can be manufactured at low cost.  
   The cooling of a relatively small number of electrons (one bunch, ) is considered 
in this proposal in attempt to avoid strong influence of the non-synchronous photons on the 
equilibrium energy spread of the beam. The intrabeam scattering effects could be overcame 
as well. Optical amplifier suitable for the EOC - so called Optical Parametric Amplifier - 
suggested as a baseline of experiment, must have moderate gain and power. We have chosen 
the wavelength of the OPA equal 2 mkm as the OPA technique is more developed for these 
wavelengths. At the present times the OPAs having amplification gain ~10
, 5 10eN Σ = ⋅
8 and the power >1 
W are fully satisfy requirements for this experiment (See Appendix 3). Usage of OPAs with 
shorter wavelength will permit to increase the spatial resolution by the optical system and the 
degree of cooling of the beam.   
     We have predicted that the maximum rate of energy loss for electrons in the fields of the 
kicker undulator and amplified URW calculated in the framework of classical 
electrodynamics is 9.3phN  times lesser then one taking into account quantum nature of 
the photon emission in undulators. Quantum aspects of the beam physics will be checked in 
the proposed test experiment. It is suggested that the scheme based on optical line with 
variable delay time will be tested as well.  
     Authors thank A.V.Vinogradov and Yu.Ja.Maslova for useful discussions.  
     Supported by RFBR under grant No 05-02-17162a, 05-02-17448a, by the Program  
     of Fundamental Research of RAS, subprogram “Laser systems” and by NSF.   
Table 1.  Parameters of the ring:  
The maximal energy of the storage ring                                      Emax= 2.5 GeV 
The energy for the experiment                                                     Eexp=100 MeV 
Relativistic factor for the experiment                                          200γ ≅  
Circumference                                                                              C=124.13 m 
Bending radius                                                                              cm  490.54R =
Average radius                                                                             1976R=  cm 
Frequency of revolution                                                              62.42 10f = ⋅  Hz 
Harmonic number                                                                        75h =  
RF frequency                                                                               181.14RFf =  MHz 
Energy loss determined by SR                                                    , / 1.8SRP fγ 9=  eV/turn 
The amplitude of the accelerating voltage at                         =  73 V  expE 0V
The synchronous phase                                                                 0.026sϕ
Radial tune                                                                                   7.731xv =  
Vertical tune                                                                                 7.745zv =  
Momentum compaction factor                                                      37.6 10cα −= ⋅
Dispersion function at the pickup and the kicker undulator        80xη =  cm,  
Radial beta function in pickup undulator                                     17xβ =  m  
Radial beta function in kicker undulator                                     1.7xβ =  m 
Vertical beta function                                                                   6zβ =  m  
Patrician coefficients                 , ,z x sℑ ℑ ℑ                                   1,  0.97, 2.03 
Damping times by SR at                                        43.1; 44.4; 21.23 sec expE , ,z xτ τ τs
The length of the period of betatron oscillations                          , 16.06x betλ = m 
Slippage factor of the ring                                                            c cη α=  
Local slippage factor of the ring                                                   , 0.58c l cη α= ⋅  
Frequency of synchrotron oscillations at 100 MeV           expE = 31.6 10 f−Ω = ⋅ .  
Table 2.    Initial parameters of the electron beam in the ring:  
Number of electrons at the orbit                                           4, 5 10eN Σ = ⋅ ,  
Number of electron bunches being cooled                             1 
Relative energy spread                                                          ,  5,0 / 3.94 10E Eσ
Betatron beam size at pickup undulator ( 32xβ =  m)             ,0 4xσ =  mm,  
Betatron beam size at kicker undulator ( 2xβ =  m)               ,0 1xσ =  mm,  
Dispersion beam size                                                             2,0 3.15 10xησ
−= ⋅  mm,  
Total beam size at pickup undulator                                     2 2, ,0 ,0( ) ( )x tot x xησ σ σ 4= + =  mm,  
The length of the electron bunch                                            cm,  ,0 2.32sσ =
Table 3.  Parameters of pickup and kicker undulators:  
Magnetic field strength                                                              2 1 3 3 8B = Gs,  
Undulator period                                                                        8uλ =  cm,  
Number of periods                                                                      M = 30,   
Deflection parameter                                                                  K=1.  
Table 4.   Optical Parametric Amplifier  
Number of Optical Parametric Amplifiers                                             2 
Total gain                                                                                                 710amplα =
The wavelength of amplified URWs                                                     cm 41,min 2 10λ
The characreristic URW waist size                                                         mm   . 0.77w cσ =
The URW beam waist size                                                                      mm   2wσ =
The duration of the amplification time of the OPA                                5 psec (  mm) 1.5ampll =
The frequency of the amplified cycles                                                     Hz 62.42 10amplf f= = ⋅
Appendix 1  
Spectral-angular distribution of the UR energy emitted by the relativistic electron in the 
pickup undulator on the harmonic  is  n
                                         
sin ( , )
E M E
c σ ω ϑ
∂ ∂ ∂
,                                           (35) 
where  is the angular distribution of the energy of the UR emitted in the unit solid 
angle  at the angle 
/nE∂ ∂o
do θ  to it’s axis, , , sin sin /n n nσc σ σ= ( )/n n nnMσ π ω ω ω= −
1n nω ω=  is the frequency of the -th harmonic of  UR. [20] - [22]. For the helical undulator  n
                                   
2 2 3 2 2 2
( ) ( ) 6 ( )
n n totE e Mn F K E n F K
c K 2 31 )
nω ϑ β ϑ γ ϑ
⊥∂ ,= =
∂ Ω +
n nF K J n
2( ) ( )ϑ χ
2 2 2 21
nK J n
ϑ( ) )χ
+ − ,+  2 2 24 3totE e M K cπ γ= Ω / , 2 ucπ λΩ = / ,  is the 
Bessel function and it’s derivative, 
n nJ J
2 22 (1 ) 1K Kχ ϑ ϑ= / + + < /Kβ γ⊥ =, . The number of 
photons emitted in the undulator on the harmonic  in the solid angle  and 
frequency band dω is determined by the relation  
n 2 /do dπ ϑ γ= 2
                            
2 2 2
, 2 2 2
sin ( , )
ph n n
n M K F K
α λ ϑ
d doσ ω ϑ ω
+ +∫ ∫ .                  (36) 
     If the considered frequency band  then we can neglect the angular 
dependence of the first multiplier and the frequency dependence of the value in (36). 
In this case the value  determine the range of angles of the UR (3). The increasing of 
the frequency band will lead to the increase of the angular range. Taking the frequency band 
 out of the integral and taking into account that 
/ 1/Mω ω∆ 1
sin nc σ
sin nc σ
ω∆ 2 2 1(2 / ) nd Mϑ γ π ω σ= Ω ,  
we can transform (36) to (6) for  sin ( )nc σ πδ σ= n / 1/2Mω ω∆ = 0ϑ =  and .  1n =
Appendix 2  
Below we investigate the possibility of lead ions cooling (Z=82) in the storage ring LHC 
based on using the version 5 (section 3) of EOC. We take the example 2 considered in [7]. In 
this case the energy  eV, (141.85 10E = ⋅ 953γ = ) the slippage factor of the ring 43.23 10c cη α −= ⋅ , 
the amplitude of the RF accelerating voltage  MV, the RMS bunch length 8 cm, RF 
frequency  MHz, synchrotron frequency  Hz, circumference C=2665888.3 
cm, harmonic order h=35640, RMS relative energy spread ,  eV, 
0 16V =
400RFf = 23Ω=
,0 / 1.1 10E Eσ
−= ⋅ 102.04 10inEσ = ⋅
, 415x betλ =  m, RF bucket half-height 
4/ 4.43 10sepE E
−∆ = ⋅ ,  eV. We take the 
distance between pickup and kicker  undulators 
108.19 10sepE∆ = ⋅
, 1453p kL =  m (k=3), synchrotron radiation 
energy loss per ion per turn  eV.  4, / 1.2 10SR sP f = ⋅
     For the parameters of the undulators [7] the energy loss per turn is  eV, 
 eV (M=12), the gap between equilibrium energy positions is 
 eV,  cm. It follows that , that is the condition  (20)  
is satisfied. In the case of ions the equation (21)  must include  instead of . In this 
example . It follows the laser amplification length  mm. By 
this choice the condition (21) is satisfied as well. To cool the ion beam in the transverse plane 
and to keep the magnetic lattice unchanged one pickup and two kicker undulators must be 
used.  
max 53 10lossE∆ = ⋅
,0 / 1.7 1E Mσ = ⋅ 0
81.97 10gapEδ = ⋅
1,min 5 10λ
−= ⋅ maxloss gapE Eδ∆
0eZ V⋅ 0eV
max 4
0/ 2.2 1lossE eZVδ
−= ⋅ ,2 2.78ampll =
Appendix 3  
The principle of OPG is quite simple: in a suitable nonlinear crystal, a high frequency and 
high intensity beam (the pump beam, at frequency pω ) amplifies a lower frequency, lower 
intensity beam (the signal beam, at frequency sω ); in addition a third beam (the idler beam, 
at frequency iω , with i s pω ω ω< < ) is generated (In the OPG process, signal and idler beams 
play an interchangeable role, we assume that the signal is at higher frequency, i.e., s iω ω> ).. 
In the interaction, energy conservation  
                                                                  p s iω ω ω= +                                    
is satisfied; for the interaction to be efficient, also the momentum conservation (or phase 
matching) condition 
                                                                 p s i= +k k k                                      
where , pk sk , and  are the wave vectors of pump, signal, and idler, respectively, must be 
fulfilled. The signal frequency to be amplified can vary in principle from 
2pω  (the 
so-called degeneracy condition) to pω , and correspondingly the idler varies from 2pω  to 0; 
at degeneracy, signal and idler have the same frequency. In summary, the OPG process 
transfers energy from a high-power, fixed frequency pump beam to a low-power, variable 
frequency signal beam, thereby generating also a third idler beam. 
The signal and idler group velocities sv  and  (GVM – group velocity mismatch) 
determine the phase matching bandwidth for the parametric amplification process. Let us 
assume that perfect phase matching is achieved for a given signal frequency 
sω  (and for the 
corresponding idler frequency i p sω ω ω= − . If the signal frequency increases to sω ω+∆ , by 
energy conservation the idler frequency decreases to iω ω−∆ . The wave vector mismatch 
can then be approximated to the first order as 
                                              
s i gi gs
k ω ω ω
ω ω ν ν
⎛ ⎞∂ ∂
∆ ≅ − ∆ + ∆ = − ∆⎜ ⎟⎜ ⎟∂ ∂ ⎝ ⎠
    The gain bandwidth of an OPA can be estimated using the analytical solution of the 
coupled wave equations in the slowly varying envelope approximation and assuming flat top 
spatial and temporal profiles and no pump depletion. The intensity gain (G ) and phase (ϕ ) 
of the amplified signal beam are given in [23] by  
   ( )
2 sinh
γ ⎛ ⎞= + ⎜ ⎟
sin cosh cos sinh
tan ,
cos cosh sin sinh
B A B A A B
B A B A A B
            (37) 
where 2,A kL= ∆  ( ) ( )2 22 ,B L kLγ= − ∆ and 0gain coefficient 4 2 ,eff p p s i s id I n n n cγ π ε= = λ λ
(phase mismatch ,p s ikL L∆ = = − −k k k  where L is the length of amplifier,  is the 
effective nonlinear coefficient, 
pI  is the pumping intensity. 
       The full width at half maximum (FWHM) phase matching bandwidth can then be 
calculated within the large-gain approximation as 
                                            
( )1 2 1 22 ln 2 1
gs gi
⎛ ⎞∆ ≅ ⎜ ⎟
                                              
(38) 
Large GVM between signal and idler waves dramatically decreases the phase matching 
bandwidth; large gain bandwidth can be expected when the OPA approaches degeneracy 
( s iω ω→ ) in type I phase matching or in the case of group velocity matching between signal 
and idler ( gs giν ν= ). Obviously, in this case Eq. (5) loses validity and the phase mismatch k∆  
must be expanded to the second order, giving  
                                       
( )1 4 1 4
ln 2 1
L k k
∆ ≅ ⎜ ⎟
∂ ∂⎝ ⎠
                                                        
For the case of perfect phase matching ( 0k∆ = , B Lγ= ) and in the large gain approximation 
( 1Lγ ), Eq. (37)(4) simplify to 
                       ( ) (1 04 exp 2 ,s s )I L I Lγ≅           ( ) (0 exp 2 .4
)I L I Lω γ
≅                             (39) 
Note that the ratio of signal and idler intensities is such that an equal number of signal and 
idler photons are generated. Equations (39) allow defining a parametric gain as 
                                                      ( )1 exp 2
G Lγ≅     
growing exponentially with the crystal length L  and with the nonlinear coefficient γ .  
            ( ) ( )( )0 01 1exp 8 2 exp 8 24 4eff p p s i s i eff s s p pG L d I n n n c L d n Iπ ε λ λ π λ ε≅ ≅ n c
      (40) 
for  and in n≈ i sλ λ≈ , 
377 Ohm
= .  
     The noise (amplified self emission) power of the optical amplifier is determined by  the 
expression  
                                                            ,                                                   (41) ,0 0n nP P G=
where  is the noise power at the amplifier front end,  is the gain of the amplifier.  ,0nP 0G
If  the noise power corresponds to one-photon/mode at the amplifier front end then 
 [27], [28], where in our case  is the coherence length, 
,0 1max /nP ω τ= coh 2coh MTτ =
1min /T cλ=
Example: MgO Periodically Poled Lithium Niobate 
     In recent years the development of periodically poled nonlinear materials has enhanced 
the flexibility and performance of OPAs. In the case of well-studied Periodically Poled 
Lithium Niobat (PPLN), one can get access to the material’s highest effective nonlinearity as 
well as retain generous flexibility in phase-matching parameters and nonlinear interaction 
lengths.  
     Operation near the degeneracy wavelength of 2.128 µm reduces thermal-lens effect 
because the signal and the idler wavelengths fall within the highest transparency range of 
Lithium Niobat. For wideband optical parametric amplification, we will use MgO: PPLN 
with a poling period of 31.1 (31.2) µm, which has high damage threshold and high nonlinear 
coefficient 16pm/V [25] (17pm/V [26]). To avoid photorefractive damage, thick (~1-2 mm) 
PPLN crystal was suggested to be kept at elevated temperatures 1500C [24]. 
For the signal wavelength sλ  = 2 µm, 2.1s pn n≈ = , and the gain, according to formula (40) 
comes to  
exp 3
pI lG
GW cm mm
≅ ⎜ ⎟
For G=107,  Ip=1GW/cm2,  l = 5.8 mm. We propose two-stage crystal amplifier (l1 = l2 = 3.5 
mm). OPA amplifies linearly polarized radiation. That is why the circular polarization of the 
URWs (if helical undulator is used) in our case must be transformed into linear polarized one 
before it will be injected in the kicker undulator. Usual quarter wave plate can be used for this 
purpose in simplest case. Planar undulators can be used as well. Reflective optics can be used 
for dispersion-free undulator light propagation.  
REFERENCES  
[1] A.M.Sessler, “Methods of beam cooling”, LBL-38278 UC-427, February 1996; 31st 
Workshop:   “Crystalline Beams and Related Issues”, November 11-21, 1995.  
[2] D.Mohl, A.M.Sessler, “Beam cooling: Principles and Achievements”, NIMA, 532(2004), 
p.1-10.  
[3] D. Mohl, “Stochastic cooling”, CERN Accelerator School Report, CERN No 87-03, 1987, 
pp.453- 533.  
[4]. E.G.Bessonov, “On Violation of the Robinson’s Damping Criterion and Enhanced 
Cooling of Ion, Electron and Muon Beams in Storage Rings”, 
http://arxive.org/abs/physics/0404142 .   
[5] E.G.Bessonov, A.A.Mikhailichenko, “Enhanced Optical Cooling of Particle Beams in 
Storage  Rings”, Published in Proc. 2005 Particle Accelerator Conference, May 16-20, 
2005, Knoxville,  Tennessee, USA, (http://www.sns.gov/PAC05),    
    http://accelconf.web.cern.ch/accelconf/p05/PAPERS/TPAT086.PDF.   
[6] E.G.Bessonov, A.A.Mikhailichenko, A.V.Poseryaev, “Physics of the Enhanced optical 
cooling of  particle beams in storage rings”, http://arxiv.org/abs/physics/0509196.    
[7] E.G.Bessonov, M.V. Gorbunkov, A.A.Mikhailichenko, “Enhanced optical cooling of ion 
beams for LHC”, Proc. 2006 European Particle accelerator Conference, June 26-30 2006.  
Edinburgh, Scotland, 
       http://accelconf.web.cern.ch/accelconf/e06/PAPERS/TUPLS001.PDF;  
      Electronic Journal of Instrumentation – 2006 JINST 1 P08005, http://jinst.sissa.it/,        
      http://ej.iop.org/links/r5pyfrsWE/sl44atI92xGGY2iAav5vpA/jinst6_08_p08005.pdf.  
[8] A.A.Mikhailichenko, M.S.Zolotorev,” Optical Stochastic Cooling”, SLAC-PUB-6272, 
Jul1993, 11pp. Published in Phys.Rev.Lett.71:4146-4149,1993. 
[9] M.S.Zolotorev, A.A.Zholents, “Transit-Time Method of Optical Stochastic Cooling”, 
Phys. Rev. E   v.50, No 4, 1994, p. 3087.  
[10] M.Babzich, I.Ben-Zvi, I.Pavlishin at all., “Optical Stochastic Cooling for RHIC Using 
Optical  Parametric Amplification”, PRSTAB, v.7, 012801 (2004).  
[11] E.G.Bessonov, “Some Aspects of the Theory and Technology of the Conversion 
Systems of Linear Colliders”, Proc. 15th Int. Accelerator Conf. on High Energy 
Accelerators, (Int. J. Mod. Phys Proc. Suppl.2A), V.1, pp. 138-141, 1992, Hamburg, 
Germany.  
[12] K. Steffen, “High Energy Beam optics”, Interscience Pub., 1964. 
[13] A.A. Mikhailichenko, “Optical Stochastic Cooling and Requirements for a Laser 
Amplifier”, CLNS-98-1539, Dec 1997, Cornell U., 14pp.; Int. Conf. on Lasers '97, New 
Orleans, LA, 15-19 Dec. 1997, Proceedings, ed. J.J.Carrol, T.A.Goldman, pp. 890-897.     
[14] D.F.Alferov, Yu.A.Bashmakov, K.A.Belovontsev, E.G.Bessonov, P.A.Cherenkov,    
       “Observation of undulating radiation with the "Pakhra" synchrotron”, Phys. - JETP 
Lett.,  1977, v.26, N7, p.385.  
[15] V.I.Alexeev, E.G.Bessonov, “Experiments on Generation of Long Wavelength Edge 
Radiation  along the Directions nearly Coincident with the Aaxis of a Straight Section of 
the  “Pakhra”synchrotron”, Nucl.  Instr. Meth. B 173 (2001), p. 51-60.  
[16] H.Bruck, “Circular Particle Accelerators”, Press Universities of France, 1966.  
[17] A.A.Kolomensky and  A.N.Lebedev, “Theory of Cyclic Accelerators”, North Holland 
Publ., 1966.  
[18] V.V.Anashin, A.G.Valentinov, V.G. Veshcherevich et al., The dedicated synchrotron 
radiation source SIBERIA-2, NIM A282 (1989), p.369-374.  
[19] H.Hama, S.Takano, G.Isoyama, Control of  bunch length on the UVSOR Storage ring, 
Workshop on Foth Generation Light Sources, Chairmen M. Cornacchia and H. Winick,  
1992, SSRL 92/02, USA, p.208-216.  
[20] D.F.Alferov, Yu.A.Bashmakov, E.G.Bessonov. “To the theory of the undulator 
radiation”. Sov. Phys.-Tech. Phys., 18, (1974),1336.  
http://arxive.org/abs/physics/0404142
http://www.sns.gov/PAC05
http://accelconf.web.cern.ch/accelconf/p05/PAPERS/TPAT086.PDF
http://arxiv.org/abs/physics/0509196
http://accelconf.web.cern.ch/accelconf/p05/PAPERS/TUPLS001.PDF
http://jinst.sissa.it/
http://ej.iop.org/links/r5pyfrsWE/sl44atI92xGGY2iAav5vpA/jinst6_08_p08005.pdf
[21]   D.F.Alferov, Yu.A.Bashmakov, K.A.Belovintsev, E.G.Bessonov, P.A.Cherenkov. "The 
Undulator as a Source of Electromagnetic Radiation”, Particle accelerators, (1979), v.9, 
No 4, p. 223-235.  
[22] E.G.Bessonov, Undulators, Undulator Radiation, Free-Electron Lasers, Proc. Lebedev   
       Phys. Inst., Ser.214, 1993, p.3-119 (Undulator Radiation and Free-Electron Lasers),  
       Chief ed. N.G.Basov, Editor-in-chief P.A.Cherenkov (in Russian).   
[23] J.A. Armstrong, N. Bloembergen, J. Ducuing, P.S. Pershan, Phys. Rev. 127 (1962) 1918. 
[24] C.W. Hoyt, M. Sheik-Bahae, M. Ebrahimzadeh, High-power picosecond optical 
parametric oscillator based on periodically poled lithium niobate, Opt. Lett. Vol. 27, No. 
17, 2002. 
[25] K.W. Chang, A.C. Chiang, T.C. Lin, B.C. Wong, Y.H. Chen, Y.C. Huang Simultaneous 
wavelength conversion and amplitude modulation in a monolithic periodically-poled 
lithium niobate. Opt. Comms, 203 (2002) 163-168. 
[26] L. E. Myers, G. D. Miller, E. C. Eckardt, M. M. Fejer, and R. L. Byer, Opt. Let. 20, 52  
      (1995).  
[27] A.Maitland and M.H.Dunn, Laser physics, Noth-Holland publishing Company, 
Amsterdam – London, 1969.  
[28] I.N.Ross, P.Matousek, M.Towrie, et al., Optics communications 144 (1997), p. 125- 133.  
This paper is published in: http://arxive.org/abs/0704.0870   
http://arxive.org/abs/0704.0870
ABSTRACT
  We are proposing to test experimentally the new idea of Enhanced Optical
Cooling (EOC) in an electron storage ring. This experiment will confirm new
fundamental processes in beam physics and will demonstrate new unique
possibilities with this cooling technique. It will open important applications
of EOC in nuclear physics, elementary particle physics and in Light Sources
(LS) based on high brightness electron and ion beams.

<|endoftext|><|startoftext|>
Introduction
	Hypothesis
	Model
	Input data
	Observed frequencies
	Results
	Alternative models
	Distributions within outcomes
	Discussion
	Conclusions
ABSTRACT
  Extrasolar planetary systems range from hot Jupiters out to icy comet belts
more distant than Pluto. We explain this diversity in a model where the mass of
solids in the primordial circumstellar disk dictates the outcome. The star
retains measures of the initial heavy-element (metal) abundance that can be
used to map solid masses onto outcomes, and the frequencies of all classes are
correctly predicted. The differing dependences on metallicity for forming
massive planets and low-mass cometary bodies are also explained. By
extrapolation, around two-thirds of stars have enough solids to form Earth-like
planets, and a high rate is supported by the first detections of low-mass
exo-planets.

<|endoftext|><|startoftext|>
Introduction
Let x ≥ 0, n ≥ 0 be integers. An integer N , which record consists from n
records of number x, we shall designate by
N = {x}n = x . . . x, n > 0. (1)
For n = 0 it is received {x}0 = ∅ an empty record. For example, {10}31 =
1010101, {10}01 = 1, etc.
Palindromic numbers of a kind
En,k = {1{0}k}n1, (2)
where n ≥ 0, k ≥ 0 we will name initial numbers. We will notice thatE0,k = 1
at any k ≥ 0.
Numbers repunit(see[2, 3, 4]) are natural numbers, which records consist of
units only, i.e. by definition
Rn = En−1,0, (3)
where n ≥ 1.
In decimal notation the general formula for numbers repunit is
Rn = (10
n − 1)/9, (4)
∗Tarasov, B.V. The concrete theory of numbers: initial numbers and wonderful properties
of numbers repunit. MSC 11A67+11B99. c©2007 Tarasov, B.V., independent researcher.
http://arxiv.org/abs/0704.0875v2
Tarasov, B.V. ”Initial and repunit numbers” 2
where n = 1, 2, 3, . . . .
There are known only five prime repunit for n =2,19, 23, 317, 1031.
Known problem ((Prime repunit numbers[3])). Whether exists infinite num-
ber of prime numbers repunit ?
Will we use designations further :
(a, b) = gcd(a, b) the greatest common divider of integers a > 0, b > 0.
p, q odd prime numbers.
If it is not stipulated specially, the integer positive numbers are considered.
2 Initial numbers
Let’s consider the trivial properties of initial numbers.
Theorem 1. Following trivial statements are fair :
(1) General formula of initial numbers is
En,k =
R(k+1)(n+1)
10(k+1)(n+1) − 1
10k+1 − 1
. (5)
(2) For k ≥ 0, n ≥ m ≥ 1 if n + 1 ≡ 0(mod (m + 1)),
then (En,k, Em,k) = Em,k.
(3) For k ≥ 0, n > m ≥ 1 if integer s ≥ 1, exists such
that n + 1 ≡ 0(mod (s + 1)), m + 1 ≡ 0(mod (s + 1)), then
(En,k, Em,k) ≥ Es,k > 1.
(4) For k ≥ 0, n > m ≥ 1 (En,k, Em,k) = 1 when and only then,
(n + 1,m + 1) = 1.
Proof. 1) Properties (1)—(3) are obvious.
2) The Proof of property (4). Necessity. Let
(En,k, Em,k) = 1 and (n + 1,m + 1) = s > 1, s − 1 ≥ 1. From property
(3) of the theorem follows that (En,k, Em,k) ≥ Es−1,k = {1{0}k}s−11 > 1.
Appears the contradiction .
Sufficiency of property (4). Let (n+1,m+1) = 1, then will be integers
a > 0, b > 0, such that either a(n + 1) = b(m + 1) + 1
or b(m + 1) = a(n + 1) + 1. Let’s assume, that (En,k, Em,k) = d > 1.
a) Let a(n + 1) = b(m + 1) + 1, then Eb(m+1),k = Ea(n+1)−1,k =
(10a(n+1)(k+1) − 1)/(10k+1 − 1) ≡ 0(modEn,k) ≡ 0(modd).
On the other hand Eb(m+1),k = (10
(k+1){b(m+1)+1}−1)/(10k+1−1) =
((10b(m+1)(k+1) − 1)/(10k+1 − 1)) · 10k+1 + 1 ≡
≡ 1(modEm,k) ≡ 1(modd). Appears the contradiction.
b) Let b(m + 1) = a(n + 1) + 1, then Ea(n+1),k = Eb(m+1)−1,k =
(10b(m+1)(k+1) − 1)/(10k+1 − 1) ≡ 0(modEm,k) ≡ 0(modd).
On the other hand Ea(n+1),k = (10
(k+1){a(n+1)+1} −1)/(10k+1−1) =
((10a(n+1)(k+1) − 1)/(10k+1 − 1)) · 10k+1 + 1 ≡
≡ 1(modEn,k) ≡ 1(modd). Have received the contradiction.
3 Numbers repunit
Let’s consider trivial properties of numbers repunit.
Tarasov, B.V. ”Initial and repunit numbers” 3
Theorem 2. Following trivial statements are fair :
(1) The number Rn is prime only if n number is prime.
(2) If p > 3 all prime dividers of number Rp look like 1+2px where x ≥ 1
is integer.
(3) (Ra, Rb) = 1 if and only if (a, b) = 1.
Proof. Property (1) of theorem is proved in ([2, 3]), property (2) is proved in
([1]), as exercise. Property (3) is the corollary of the theorem 1.
Theorem 3. (Ra, Rb) = R(a,b), where a ≥ 1, b ≥ 1 are integers.
Proof. Validity of the theorem for (a, b) = 1 follows from property (3) of
theorem2. Let (a, b) = d > 1, where a = a1d, b = b1d, (a1, b1) = 1. Let’s
consider equations
Ra = Rd · {10
d(a1−1) + . . . + 10d + 1},
Rb = Rd · {10
d(b1−1) + . . . + 10d + 1}.
A = 10d(a1−1) + . . . + 10d + 1,
B = 10d(b1−1) + . . . + 10d + 1.
Let’s assume, that (A,B) > 1, and q is a prime odd number such that
A ≡ 0(modq), B ≡ 0(modq). (6)
If q = 3, then 10t ≡ 1(modq) for any integer t ≥ 1. Then from (6) it
follows that a1 ≡ b1 ≡ 0(modq). Have received the contradiction.
Thus, q > 3. Then there exists an index dmin, to which the number 10
belongs on the module q.
(10d)dmin ≡ 1(modq),
where dmin ≥ 1.
If dmin = 1, then it follows from (6) that a1 ≡ b1 ≡ 0(modq). Have
received the contradiction. Hence dmin > 1. As Ra ≡ Rb ≡ 0(modq),
then (10d)a1 ≡ 1(modq) and (10d)b1 ≡ 1(modq).
Then a1 ≡ b1 ≡ 0(moddmin). Have received the contradiction.
Theorem 4. Let p > 3 be a prime number, k ≥ t ≥ 1, t ≥ s ≥ 1 integer
numbers. Then
gcd(Rpk/Rpt , Rps) = 1. (7)
Proof. Let’s consider expression
A = Rpk/Rpt = (10
k−t−1 + (10p
k−t−2 + . . . + 10p
If (A,Rps) > 1, then the prime number q exists such that
A ≡ 0(modq) Rps ≡ 0(modq). Hence 10
≡ 1(modq), then
A ≡ pk−t ≡ 0(modq), p = q = 3. Have received the contradiction,
because p > 3.
Tarasov, B.V. ”Initial and repunit numbers” 4
Theorem 5. Let a ≥ 1, b ≥ 1 are integers, then the following statements are
true :
(1) If (a, b) = 1, then
gcd(Rab, RaRb) = RaRb. (8)
(2) If (a, b) > 1, then
RaRb/R(a,b) ≤ gcd(Rab, RaRb) < RaRb. (9)
Proof. 1) Let (a, b) = 1, then (Ra, Rb) = R(a,b) = 1,
Rab = RaX = RbY , X = cRb, where c ≥ 1 is integer. Rab = cRaRb.
2) Let (a, b) = d > 1, a = a1d, b = b1d, (a1, b1) = 1, a1 ≥ 1, b1 ≥ 1.
As gcd(Ra, Rb) = R(a,b), we receive equality
Ra = R(a,b)X,Rb = R(a,b)Y, (10)
where (X,Y ) = 1.
Further, Rab = RaA = RbB = XAR(a,b) = Y BR(a,b), XA = Y B,
A = Y z, B = Xz, z ≥ 1 is integer. Then Rab = XY R(a,b)z,
Rab = zRaRb/R(a,b). We have proved, that
RaRb/R(a,b) ≤ gcd(Rab, RaRb).
Let’s assume, that gcd(Rab, RaRb) = RaRb, then Rab = zRaRb, where
z ≥ 1 is integer. Let’s consider equalities
Rab = RaA = RbB,
where
A = 10a(b−1) + 10a(b−2) + . . . + 10a + 1,
B = 10b(a−1) + 10b(a−1) + . . . + 10b + 1.
Since A = Rbz, B = Raz, 10
a ≡ 1(modR(a,b)),
10b ≡ 1(modR(a,b)), then A ≡ B ≡ 0(modR(a,b)), hence
a ≡ b ≡ 0(modR(a,b)).
Thus, comparison (a, b) ≡ 0(modR(a,b)) or
d ≡ 0(modRd) is fair, that contradicts an obvious inequality
(10x − 1)/9 > x, (11)
where x > 1 is real.
({⋆} The Important corollary of the theorem 5).
Number Rab/(RaRb) is integer when and only when (a, b) = 1, where a ≥ 1,
b ≥ 1 are integers.
Let’s quote some trivial statements for numbers repunit.
Lemma 1. If a = 3nb, (b, 3) = 1, then
Ra ≡ 0(mod 3
n), butRa 6≡ 0(mod 3
(n+1)). (12)
Tarasov, B.V. ”Initial and repunit numbers” 5
Proof. If n = 1, then Ra = R3B, where B = 10
3(b−1) + . . . + 103 + 1,
R3 ≡ 0(mod 3), B ≡ b 6≡ 0(mod 3). Thus,
Ra ≡ 0(mod 3), but Ra 6≡ 0(mod 3
Let comparisons (12) be proved for n ≤ k− 1. We shall consider a = 3kb,
(b, 3) = 1. Then Ra = R3k−1bA, where A = 10
3k−1b2 + 103
b + 1.
R3k−1b ≡ 0(mod 3
k−1), but R3k−1b 6≡ 0(mod 3
k), A ≡ 0(mod 3), but
A 6≡ 0(mod 32).
Lemma 2. If n ≥ 0 is integer, then
rn = 10
11n + 1 ≡ 0(mod 11n+1), but rn 6≡ 0(mod 11
n+2). (13)
Proof. r0 = 11 ≡ 0(mod 11), but r0 = 11 6≡ 0(mod 11
r1 = 10
11 + 1 ≡ 0(mod 112), but r1 6≡ 0(mod 11
Let’s make the inductive assumption, that formulas (13) are proved for
n ≤ k − 1, where k − 1 ≥ 1, k ≥ 2. Let n = k, then
rk = 10
11k + 1 = (1011
)11 + 1 = rk−1A, where
A = 1011
k−110 − 1011
k−19 + 1011
k−18 − 1011
k−17 + 1011
k−16−
− 1011
k−15 + 1011
k−14 − 1011
k−13 + 1011
k−12 − 1011
+ 1. (14)
Since, due to the inductive assumption 1011
≡ −1(mod 11k), where
k ≥ 2, then A ≡ 11(mod 11k). Then A ≡ 0(mod 11), but
A 6≡ 0(mod 112). Thus, we receive, that rk ≡ 0(mod 11
k+1), but
rk 6≡ 0(mod 11
k+2).
Lemma 3. For an integer a ≥ 1, the following statements are true :
(1) If a is odd, then Ra 6≡ 0(mod 11).
(2) If a = 2(11n)b, (b, 11) = 1, then
Ra ≡ 0(mod 11
n+1), butRa 6≡ 0(mod 11
n+2). (15)
Proof. If a is odd, then Ra ≡ 1(mod 11). If a = 2(11
n)b, (b, 11) = 1,
then Ra = ((10
2(11)n)b − 1)/9 = R11n · rn · A, where rn = 10
11n + 1,
A = 102(11
n)(b−1) + . . . + 102(11
n) + 1. R11n 6≡ 0(mod 11),
A ≡ b 6≡ 0(mod 11). Then validity of the statement (2) of lemma 3 follows
from lemma 2.
({⋆} The assumption: the general formula for gcd(Rab, RaRb)).
If a ≥ 1, b ≥ 1 are integers, d = (a, b), where d = 3L · 11S · c, (c, 3) = 1,
(c, 11) = 1, L ≥ 0, S ≥ 0, then equalities are true :
— if c is an odd number, then
gcd(Rab, RaRb) = ((RaRb)/R(a,b)) · 3
L, (16)
— if c is an even number, then
gcd(Rab, RaRb) = ((RaRb)/R(a,b)) · 3
L · 11S. (17)
Let’s give another two obvious statements in which divisors of numbers
repunit are studied, as degrees of prime number.
Tarasov, B.V. ”Initial and repunit numbers” 6
Lemma 4. If p, q are prime numbers and Rp ≡ 0(modq), but
Rp 6≡ 0(modq
2), then statements are true :
(1) For any integer r, 0 < r < q, Rpr 6≡ 0(modq
(2) For any integer n, n ≥ 1, Rpn 6≡ 0(modq
Proof. 1) Rpr = Rp · R̂pr, where R̂pr = 10
p(r−1) + 10p(r−2) +
+ . . . + 10p + 1. If Rpr ≡ 0(modq
2), then R̂pr ≡ 0(modq),
r ≡ 0(modq). Have received the contradiction.
2) If n > 1 found such that Rpn ≡ 0(modq
2), then from (7) follows
(Rpn/Rp, Rp) = 1. Have received the contradiction.
Lemma 5. If p, q are prime numbers and Rp ≡ 0(modq), then
Rpqn ≡ 0(modq
n+1).
Proof. Since Rpq = Rp · R̂pq, where R̂pq = 10
p(q−1) +
+ 10p(q−2) + . . . + 10p + 1, then R̂pq ≡ 0(modq), Rpq ≡ 0(modq
Let’s assume that Rpqn−1 ≡ 0(modq
n). Then
Rpqn = Rpqn−1·q = Rpqn−1 · R̂pqn−1·q, where
R̂pqn−1·q = 10
n−1·(q−1)+10pq
n−1·(q−2)+ . . .+10pq
+1 ≡ 0(modq),
Rpqn ≡ 0(modq
n+1).
4 Problem of simplicity of initial numbers
Let’s consider the problem of simplicity of initial numbers En,k, where
k ≥ 0, n ≥ 0.
If k = 0, then En,0 = Rn+1. Thus, simplicity of numbers En,0 – is known
problem of prime numbers repunit Rp, where p is prime number.
If n = 1, then E1,k = 1{0}k1 = 10
k+1 + 1. As number E1,k can be
prime only when k + 1 = 2m, m ≥ 0 is integer, then we come to the known
problem of simplicity of the generalized Fermat numbers fm(a) = a
2m + 1
for a = 10. Generalized Fermat numbers nave been define by Ribenboim [5] in
1996, as numbers of the form fn(a) = a
2n + 1, where a > 2 is even.
The generalized Fermat numbers fn(10) = 10
2n + 1 for n ≤ 14 are prime
only if n = 0, 1. f0(10) = 11, f1(10) = 101.
Theorem 6. Let n > 1, k > 0. If any of conditions
(1) n number is odd,
(2) k number is odd,
(3) n + 1 ≡ 0(mod 3),
(4) (n + 1, k + 1) = 1,
is true, then number En,k is compound.
Proof. 1) n + 1 = 2t, t > 1. Then En,k = Et−1,k · (10
t(k+1) + 1), where
t > 1, t − 1 ≥ 1. As Et−1,k > 1, then En,k is compound number.
2) Let k be an odd number. Due to the proved condition (1) we count that
number (n + 1) is odd. k + 1 = 2t ≥ 2, t ≥ 1. Further,
En,k = En,t−1 · ((10
(n+1)t + 1)/(10t + 1)),
where n > 1, t− 1 ≥ 0, En,t−1 > 1, number (10
(n+1)t +1)/(10t +1) > 1
is integer.
Tarasov, B.V. ”Initial and repunit numbers” 7
3) If n + 1 ≡ 0(mod 3), then En,k ≡ 0(mod 3), En,k > 11.
4) Let n > 1, k ≥ 1, (n + 1, k + 1) = 1, then
En,k = R(n+1)(k+1)/R(k+1) = R(n+1) · (R(n+1)(k+1)/(Rk+1 · Rn+1)).
Due to the theorem 5 number z = R(n+1)(k+1)/(Rk+1 · Rn+1) is integer.
Further,
z > (10(n+1)(k+1)−1)/(10n+k+2) = 10nk−1−1/(10n+k+2), nk−1 ≥ 1,
thus, z > 1.
Question of simplicity of initial numbers under conditions, when
(n + 1, k + 1) > 1, (n + 1) number is odd, (k + 1) number is odd,
n + 1 6≡ 0(mod 3), remains open.
In particular, it is interesting to considerate numbers Ep−1,p−1 = Rp2/Rp,
where p is prime number. For p < 100 numbers Ep−1,p−1 are compound.
5 The open problems of numbers repunit
The known problem of numbers repunit remains open.
Problem 1 ((Prime repunit numbers[3])). Whether there exists infinite number
of prime numbers Rp, p–prime number ?
Problem 2. Whether all numbers Rp, p–prime number, are numbers free from
squares ?
The author has checked up for p < 97, that numbers Rp are free from
squares. Another following open questions are interesting :
Problem 3. If number Rp is free from squares, where p > 3 is prime number,
whether will number n, be found such what number Rpn contains a square ?
Problem 4. p is prime number, whether there are simple numbers of a kind
Ep−1,p−1 = Rp2/Rp ?
The author has checked up to p ≤ 127, that numbers Ep−1,p−1 is com-
pound. It is known, that Rp divide by number (2p + 1) for prime numbers
p = 41, 53, Rp divide by number (4p+1) for prime numbers p = 13, 43, 79.
There appears a question :
Problem 5. Whether there is infinite number of prime numbers p, such that
Rp divide by number (2p + 1) or is number (4p + 1) ?
(The remark). If the number p > 5 Sophie Germain prime (i.e. number
2p+1 is prime too), then either Rp or R
= (10p +1)/11 divide by number
(2p + 1).
6 The conclusion
Leonhard Euler, professor of the Russian Academy of sciences since 1731,
has paid mathematics forever ! Euler’s invisible hand directs the develop-
ment of concrete mathematics for more than 200 years.
Euler’s titanic work which has opened a way to freedom to mathematical
community, admires. The pleasure caused by Euler’s works warms hearts.
Tarasov, B.V. ”Initial and repunit numbers” 8
References
[1] Vinogradov I.M. Osnovy teorii chisel. -M. :Nauka, 1981.
[2] Ronald L.Graham,Donald E.Knuth,Oren Patashnik,
Concrete Mathematics : A Foundation for Computer Science, 2nd edition
(Reading,Massachusetts: Addison-Wesley), 1994.
[3] Weisstein, Eric W. ”Repunit.” From MathWorld–A Wolfram Web Resource.
—http://mathworld.wolfram.com/Repunit.html/.
c©1999—2007 Wolfram Research, Inc.
[4] The Prime Clossary repunit.
—http://primes.utm.edu/glossary/page.php?sort=Repunit/.
[5] Ribenboim, P. ”Fermat Numbers” and ”Numbers k × 2n ± 1.” 2.6 and 5.7
in The New Book of Prime Number Records. New York: Springer-Verlag,
pp. 83-90 and 355-360, 1996.
———————————————————————
Institute of Thermophysics, Siberian Branch of RAS
Lavrentyev Ave., 1, Novosibirsk, 630090, Russia
E-mail: tarasov@itp.nsc.ru
———————————————————————
Independent researcher,
E-mail: tarasov-b@mail.ru
———————————————————————
ABSTRACT
  In this work initial numbers and repunit numbers have been studied. All
numbers have been considered in a decimal notation. The problem of simplicity
of initial numbers has been studied. Interesting properties of numbers repunit
are proved:
  $gcd(R_a, R_b) = R_{gcd(a,b)}$;
  $R_{ab}/(R_aR_b)$ is an integer only if $gcd(a,b) = 1$, where $a\geq1$,
$b\geq1$ are integers. Dividers of numbers repunit, are researched by a degree
of prime number.

<|endoftext|><|startoftext|>
Introduction to the Mathematics of Financial Markets, LNM 1816 - Lec-
tures on Probability Theory and Statistics, Saint-Flour summer school 2000 (Pierre Bernard,
editor), Springer Verlag, Heidelberg (2003), pp. 111–177.
[4] H. Tanaka, An inequality for a functional of probability distributions and its applications to
Kac’s one-dimensional model of a Maxwell gas, Zeitschrift fr Wahrscheinlichkeitstheorie und
verwandte Gebiete 27, 47–52, 1973
[5] C. Villani, Topics in Optimal Transportation, Graduate Studies in Mathematics 58, American
Mathematical Society, Providence Rhode Island, 2003.
Financial and Actuarial Mathematics, Technical University Vienna, Wiedner Haupt-
strasse 8–10, A-1040 Vienna, Austria.
http://www.numdam.org/en/
http://www.fam.tuwien.ac.at/
	References
ABSTRACT
  We give an easy counter-example to Problem 7.20 from C. Villani's book on
mass transport: in general, the quadratic Wasserstein distance between $n$-fold
normalized convolutions of two given measures fails to decrease monotonically.

<|endoftext|><|startoftext|>
Introduction
	The weak bimodality in type Ia SNe
	The strong bimodality in type Ia SNe
	Evolution of the SN rate with redshift
ABSTRACT
  We comment on the presence of a bimodality in the distribution of delay time
between the formation of the progenitors and their explosion as type Ia SNe.
Two "flavors" of such bimodality are present in the literature: a "weak"
bimodality, in which type Ia SNe must explode from both young and old
progenitors, and a "strong" bimodality, in which about half of the systems
explode within 10^8 years from formation. The "weak" bimodality is
observationally based on the dependence of the rates with the host galaxy SFR,
while the "strong" one on the different rates in radio-loud and radio-quiet
early-type galaxies. We review the evidence for these bimodalities. Finally, we
estimate the fraction of SNe which are missed by optical and near-IR searches
because of dust extinction in massive starbursts.

<|endoftext|><|startoftext|>
Structural relaxation around substitutional Cr3+ in MgAl2O4
Amélie Juhin,∗ Georges Calas, Delphine Cabaret, and Laurence Galoisy
Institut de Minéralogie et de Physique des Milieux Condensés,
UMR CNRS 7590
Université Pierre et Marie Curie, Paris 6
140 rue de Lourmel, F-75015 Paris, France
Jean-Louis Hazemann†
Laboratoire de Cristallographie, CNRS, 25 avenue des Martyrs, BP 166, 38042 Grenoble cedex 9, France
(Dated: October 26, 2018)
The structural environment of substitutional Cr3+ ion in MgAl2O4 spinel has been investigated
by Cr K-edge Extended X-ray Absorption Fine Structure (EXAFS) and X-ray Absorption Near
Edge Structure (XANES) spectroscopies. First-principles computations of the structural relaxation
and of the XANES spectrum have been performed, with a good agreement to the experiment. The
Cr-O distance is close to that in MgCr2O4, indicating a full relaxation of the first neighbors, and the
second shell of Al atoms relaxes partially. These observations demonstrate that Vegard’s law is not
obeyed in the MgAl2O4-MgCr2O4 solid solution. Despite some angular site distortion, the local D3d
symmetry of the B-site of the spinel structure is retained during the substitution of Cr for Al. Here,
we show that the relaxation is accomodated by strain-induced bond buckling, with angular tilts of
the Mg-centred tetrahedra around the Cr-centred octahedron. By contrast, there is no significant
alteration of the angles between the edge-sharing octahedra, which build chains aligned along the
three four-fold axes of the cubic structure.
PACS numbers: 61.72.Bb, 82.33.Pt, 78.70.Dm, 71.15.Mb
I. INTRODUCTION
Most multicomponent materials belong to complete or
partial solid solutions. The presence of chemical sub-
stitutions gives rise to important modifications of the
physical and chemical properties of the pure phases. For
instance, the addition of a minor component can im-
prove significantly the electric, magnetic or mechanical
behaviour of a material.1,2,3 Another evidence for the
presence of impurities in crystals comes from the modifi-
cation of optical properties such as coloration. Transition
metal ions like Cr3+ cause the coloration of wide band
gap solids, because of the splitting of 3d-levels under the
action of crystal field.4 Despite the ubiquitous presence
of substitutional elements in solids, their accommoda-
tion processes and their structural environment are still
discussed,5 since they have important implications. For
example, the interpretation of the color differences be-
tween Cr-containing minerals (e.g. ruby, emerald, red
spinel) requires to know the structural environment of
the coloring impurity.4,6,7,8 The ionic radius of a sub-
stitutional impurity being usually different from that of
the substituted ion, the accommodation of the mismatch
imposes a structural relaxation of the crystal structure.
Vegard’s law states that there is a linear relationship
between the concentration of a substitutional impurity
and the lattice parameters, provided that the substi-
tuted cation and impurity have similar bonding proper-
ties. Chemically selective spectroscopies, like Extended
X-ray Absorption Fine Structure (EXAFS), have pro-
vided evidence that diffraction studies of solid solutions
give only an average vision of the microscopic states and
that Vegard’s law is limited.9,10,11 Indeed, a major result
concerns the existence of a structural relaxation of the
host lattice around the substitutional cation. This im-
plies the absence of modification of the site occupied by
a doping cation, when decreasing its amount in a solid
solution. This important result has been observed in var-
ious materials, including III-V semi-conductors or mixed
salts:12,13 e. g., in mixed alkali halides, some important
angular buckling deviations have been observed.13 Re-
cently, the use of computational tools, as a complement
of EXAFS experiments, has been revealed successful for
the study of oxide/metal epilayers.14 In oxides contain-
ing dilute impurities, this combined approach is manda-
tory. It has been recently applied to the investigation of
the relaxation process around Cr dopant in corundum:
in the α-Al2O3 - α-Cr2O3 system, the radial relaxation
was found to be limited to the first neighbors around Cr,
while the angular relaxation is weak.8,15
In this work, we investigate the relaxation caused by
the substitution of Al3+ by Cr3+ in spinel MgAl2O4,
which gives rise to a solid solution, as observed for corun-
dum α-Al2O3. The spinel MgAl2O4 belongs to an impor-
tant range of ceramic compounds, which has attracted
considerable interest among researchers for a variety of
applications, great electrical, mechanical, magnetic and
optical properties.16 The spinel structure is based on a
cfc close-packing, with a Fd3̄m space group symmetry.
Its chemical composition is expressed as AB2X4, where
A and B are tetrahedral and octahedral cations, respec-
tively, and X is an anion. These two types of cations
define two different cationic sublattices, which may in-
duce a very different relaxation process than in corun-
dum. In the normal spinel structure, the octahedra host
http://arxiv.org/abs/0704.0878v1
trivalent cations and exhibit D3d site symmetry. This
corresponds to a small distortion along the [111] direc-
tion, arising from a departure of the position of oxy-
gen ligands from a cubic arrangement. Small amounts
of chromium oxide improve the thermal and mechanical
properties of spinel.1 A color change from red to green
is also observed with increasing Cr-content. In this arti-
cle, we report new results on the local geometry around
Cr3+ in spinel MgAl2O4, using a combination of EXAFS
and X-ray Absorption Near Edge Structure (XANES).
The experimental data are compared to those obtained
by theoretical calculations, based on the Density Func-
tional Theory in the Local Spin Density Approximation
(DFT-LSDA): this has enabled us to confirm the local
structure around substitutional Cr3+ and investigate in
detail the radial and angular aspects of the relaxation.
The paper is organized as follows. Section II is dedi-
cated to the methods, including the sample description
(Sec. II A), the X-ray absorption measurements and
analysis (Sec. II B), and the computational details (Sec.
II C). Section III is devoted to the results and discussion.
Conclusions are given in Sec. IV.
II. MATERIALS AND METHODS
A. Sample description
Two natural gem-quality red spinel single crystals from
Mogok, Burma (Cr-1, Cr-2) were investigated. They
contain respectively 70.0, 71.4 wt %-Al2O3, 0.70, 1.03
wt%-Cr2O3 and 26.4, 25.3 wt%-MgO. These composi-
tions were analyzed using the Cameca SX50 electron mi-
croprobe at the CAMPARIS analytical facility of the Uni-
versities of Paris 6/7, France. A 15 kV voltage with a 40
nA beam current was used. X-ray intensities were cor-
rected for dead-time, background, and matrix effects us-
ing the Cameca ZAF routine. The standards used were
α-Al2O3, α-Cr2O3 and MgO.
B. X-ray Absorption Spectroscopy measurements
and analysis
Cr K-edge (5989 eV) X-ray Absorption Spectroscopy
(XAS) spectra were collected at room temperature at
beamline BM30b (FAME), at the European Synchrotron
Radiation Facility (Grenoble, France) operated at 6 GeV.
The data were recorded using the fluorescence mode with
a Si (111) double crystal and a Canberra 30-element Ge
detector.17 We used a spacing of 0.1 eV and of 0.05 Å−1,
respectively in the XANES and EXAFS regions. Data
treatment was performed using ATHENA following the
usual procedure and the EXAFS data were analyzed us-
ing IFEFFIT, with the support of ARTEMIS.18 The de-
tails of the fitting procedure can be found elsewhere.19
An uvarovite garnet, Ca3Cr2Si3O12, was used as model
compound to derive the value of the amplitude reduction
factor S20 (0.81) needed for fitting. For each sample, a
multiple-shell fit was performed in the q-space, including
the first four single scattering paths: the photoelectron is
backscattered either by the first (O), second (Al or Cr),
third (O) or fourth (Mg) neighbors. Treating identically
the third and fourth paths, we used a unique energy shift
∆e0 for all paths, three different path lengths R and three
independent values of the Debye-Waller factor σ2. In a
first step, the number of neighbors N was fixed to the
path degeneracy. In a second time, a single amplitude
parameter was fitted for the last three shells, assuming
a proportional variation of the number of atoms on each
shell.
C. Computations
1. Structural relaxation
In order to complement the structural information
from EXAFS, a simulation of the structural relaxation
was performed to quantify the geometric surrounding
around an isolated Cr3+. The calculations were done in a
neutral supercell of MgAl2O4, using a first-principles to-
tal energy code based on DFT-LSDA.20 We used Plane
Wave basis set and norm conserving pseudopotentials21
in the Kleiman Bylander form.22 For Mg, we considered
3s, 3p, 3d as valence states (core radii of 1.05 a.u, ℓ=2
taken as local part) and those of Ref.15 for Al, Cr, O.
We first determined the structure of bulk MgAl2O4. We
used a unit cell, which was relaxed with 2×2×2 k-point
grid for electronic integration in the Brillouin Zone and
cut-off energy of 90 Ry. We obtained a lattice constant of
7.953 Å and an internal parameter of 0.263 (respectively
-1.6 % and +0.3 % relative to experiment),23 which
are consistent with previous calculations.16. In order to
simulate the Cr defect, we used a 2×2×2 supercell, built
using the relaxed positions of the pure phase. It con-
tains 1 neutral Cr, 31 Al, 16 Mg and 64 O atoms. It was
chosen large enough to minimize the interaction between
two paramagnetic ions, with a minimal Cr-Cr distance
of 11.43 Å. While the size of the supercell is kept fixed,
all atomic positions are relaxed in order to investigate
long-range relaxation. We used the same cut-off energy
and a single k-point sampling. The convergence of the
calculation was verified by comparing it to a computa-
tion with a 2×2×2 k-point grid, and discrepancies in the
atomic forces are lower than 0.3 mRy/a.u. In order to
compare directly the theoretical bond distances to those
obtained by EXAFS spectroscopy, the inital slight un-
derestimation of the lattice constant (systematic within
the LDA)24 was removed by rescaling the lattice param-
eter by -1.6 %. This rescaling is homothetic and does not
affect the relative atomic positions.
2. XANES simulations
As the analysis of the experimental XANES data is
not straightforward, ab initio XANES simulations are re-
quired to relate the experimental spectral features to the
local structure around the absorbing atom. The method
used for XANES calculations are described in Ref. 25,26.
The all-electron wave-functions are reconstructed within
the projector augmented wave framework.27 In order to
allow the treatment of large systems, the scheme uses a
recursion method to construct a Lanzcos basis and then
compute the cross section as a continued fraction.28,29
The XANES spectrum is calculated in the electric dipole
approximation, using the same first-principles total en-
ergy code as the one used for the structural relaxation. It
was carried out in the relaxed 2×2×2 supercell (i.e 112
atoms), which contains one Cr atom and results from
ab initio energy minimization mentioned in the previ-
ous subsection. The pseudopotentials used are the same
as those used for structural relaxation, except for Cr.
Indeed, in order to take into account the core-hole ef-
fects, the Cr pseudopotential is generated with only one
1s electron. Convergence of the XANES calculation is
reached for the following parameters: a 70 Ry energy
cut-off for the plane-wave expansion, one k-point for the
self-consistent spin-polarized charge density calculation,
and a Monkhorst-Pack grid of 3×3×3 k-points in the
Brillouin Zone for the absorption cross-section calcula-
tion. The continued fraction is computed with a con-
stant broadening γ=1.1 eV, which takes into account the
core-hole lifetime.30
III. RESULTS AND DISCUSSION
Figure 1 shows the k3-weighted experimental EXAFS
signals for Cr-1 and Cr-2 samples and the Fourier Trans-
forms (FT) for the k-range 3.7-11.9 Å−1. The similari-
ties observed suggest a close environment for Cr in the
two samples (0.70 and 1.03 wt%-Cr2O3), which is con-
firmed by fitting the FT in the R-range 1.0-3.1 Å (see
Table I). The averaged Cr-O distance derived from EX-
TABLE I: Structural parameters obtained from the EXAFS
analysis in the R range [1.0-3.1 Å] for Cr-1 and Cr-2 samples.
The energy shifts ∆e0 were found equal to 1.3 ± 1.5 eV. The
obtained RF factors were 0.0049 and 0.0045.
R(Å) N σ2 (Å2)
Cr-O 1.98 6.0 0.0031 Cr-1
1.98 6.0 0.0026 Cr-2
Cr-Al 2.91 5.3 0.0032 Cr-1
2.91 5.4 0.0033 Cr-2
Cr-O 3.39 1.8 0.0079 Cr-1
3.37 1.8 0.0077 Cr-2
Cr-Mg 3.39 5.3 0.0079 Cr-1
3.39 5.4 0.0077 Cr-2
FIG. 1: Fourier-transform of k3-weighted EXAFS function for
Cr-1 and Cr-2 samples (dashed and solid lines respectively).
Inset: background-subtracted data
FIG. 2: (a) Inverse-FT of EXAFS data (dots) and fitted signal
(solide line) for R=1.0-3.1 Å. (b) Inverse-FT of EXAFS data
(dots) for R=2.0-3.1 Å, multi-shell fit with Cr-Al pairs (solid
line) and theoretical function with Cr-Cr pairs (dashed line)
in the same structural model.
TABLE II: First, second and third neighbor mean distances (in Å) from central M3+ in the different structures considered in
this work.
MgAl2O4: Cr
3+ exp MgAl2O4: Cr
3+ calc MgAl2O4 exp
a MgCr2O4 exp
Cr-O 1.98 1.99 — 1.99
Al-O — — 1.93 —
Cr-Al 2.91 2.88 — —
Cr-Cr — — — 2.95
Al-Al — — 2.86 —
Cr-O 3.37 3.34 — 3.45
Al-O — — 3.34 —
Cr-Mg 3.39 3.36 — 3.45
Al-Mg — — 3.35 —
afrom Ref.23
bfrom Ref.34
AFS data is equal to 1.98 Å (± 0.01 Å), with six oxygen
first neighbors. The second shell is composed of six Al
atoms, located at 2.91 Å (± 0.01 Å). Two oxygen and
six magnesium atoms compose the further shells, at dis-
tances of 3.38 Å and 3.39 Å (± 0.03 Å). We investigated
in detail the chemical nature of these second neighbors,
by fitting the second peak on the FT (2.0-3.1 Å) with ei-
ther a Cr or an Al contribution, this latter corresponding
to a statistical Cr-distribution (Cr/Al ∼ 0.01). The only
satisfactory fits were obtained in the latter case (Fig. 2).
Calculated and experimental interatomic distances are
in good agreement (Table II), a confirmation of the
EXAFS-derived radial relaxation around Cr3+ after sub-
stitution. The symmetry of the relaxed Cr-site is re-
tained from the Al-site in MgAl2O4 and is similar to
the Cr-site in MgCr2O4. It belongs to the D3d point
group, with an inversion center, three binary axes and
a C3 axis (Fig. 3a). This result is consistent with opti-
cal absorption31 and Electron-Nuclear Double Resonance
experiments32 performed on MgAl2O4: Cr
3+. Our first-
principles calculations also agree with a previous inves-
tigation of the first shell relaxation, using Hartree-Fock
formalism on an isolated cluster.33 As it has been men-
tioned previously, the simulation can provide comple-
mentary distances (Fig. 3b): the Al1-O distances, equal
to 1.91 Å, are slightly smaller than Al-O distances in
MgAl2O4. The Al1-Al2 distances are equal to 2.85 Å,
which is close to the Al-Al distances in MgAl2O4.
Apart from the radial structural modifications around
Cr, significant angular deviations are observed in the
doped structure. Indeed, the Cr-centred octahedron is
slightly more distorted in MgAl2O4: Cr
3+, with six O-
Cr-O angles of 82.1◦ (and six supplementary angles of
97.9◦): O-Cr-O is more acute than O-Cr-O in MgCr2O4
(84.5◦, derived from refined structure)34 and than O-Al-
O in MgAl2O4 (either calculated in the present work,
83.5◦, or derived from refined structure, 83.9◦) (Fig. 3a).
At a local scale around the dopant, the sequence of edge-
sharing octahedra is hardly modified by the substitution
(Fig. 3b): the Cr-O-Al1 angles (95.1◦) are similar to
Cr-O-Cr in MgCr2O4 (95.2
◦) and Al-O-Al in MgAl2O4
FIG. 3: (color online) (a) Cr-centred octahedron before re-
laxation (green) and after (red). (b) Model of structural dis-
tortions around Cr (red) in MgAl2O4: Cr
3+. The O first
neighbors (black) and the Al1 (green) second neighbors are
displaced outward the Cr dopant in the direction of arrows.
FIG. 4: Cr K-edge XANES spectra in MgAl2O4: Cr
3+. The
experimental signal (thick line) is compared with the theo-
retical spectra calculated in the relaxed structure (solid line)
and in the non-relaxed structure (dotted line)
(95.8◦). However, the six Al-centred octahedra connected
to the Cr-octahedron are slightly distorted (with six O-
Al1-O angles of 86.7◦), compared to O-Cr-O angles in
MgCr2O4 (84.5
◦) and O-Al-O angles in MgAl2O4 (83.9
This modification affects in a similar way the three types
of chains composed of edge-sharing octahedra, in agree-
ment with the conservation of the C3 axis. On the
contrary, the relative tilt angle between the Mg-centred
tetrahedra and the Cr-centred octahedron is very differ-
ent in MgAl2O4: Cr
3+ (with Cr-O-Mg angle of 117.4◦)
than in MgCr2O4 and MgAl2O4 (with respectively, Cr-
O-Mg and Al-O-Mg angles of 124.5◦ and 121.0◦)
The experimental XANES spectrum of natural
MgAl2O4: Cr
3+ is shown in Fig. 4. It is similar to that of
a synthetic Cr-bearing spinel.35 A good agreement with
the one calculated from the ab initio relaxed structure is
obtained, particularly in the edge region : the position,
intensity and shape of the strong absorption peak (peak
c) is well reproduced by the calculation. The small fea-
tures (peaks a and b) exhibited at lower energy are also
in good agreement with the experimental ones. In our
calculation, the pre-edge features (visible at 5985 eV on
the experimental data) cannot be reproduced, since we
only considered the electric dipole contribution to the X-
ray absorption cross-section: indeed, as it has been said
previously, the Cr-site is centrosymmetric in the relaxed
structure, which implies that the pre-edge features are
due to pure electric quadrupole transitions. The sen-
sitivity of the XANES calculation to the relaxation is
evaluated by computing the XANES spectrum for the
non-relaxed supercell, in which one Cr atom substitutes
an Al atom in its exact position. The result is plotted in
Fig. 4: the edge region (peaks a, b and c) is clearly not
as well reproduced as in the relaxed model, and peak e
is not visible at all. Therefore, we can conclude that the
structural model obtained from our ab initio relaxation
is reliable.
The Cr-O distance is larger than the Al-O distance
in MgAl2O4, but is similar to the Cr-O distance in
MgCr2O4 (Table II). This demonstrates the existence of
an important structural relaxation around the substitu-
tional Cr3+ ion, which is expected since Cr3+ has a larger
ionic radius than Al3+ (0.615 Å vs 0.535 Å).36 The size
mismatch generates indeed a local strain, which locally
expands the host structure. As a result, the O atoms
relax outward the Cr defect. This radial relaxation is ac-
companied with a slight angular deviation of the O first
neighbors, as compared to the host structure. The mag-
nitude of the radial relaxation may be quantified by a
relaxation parameter ζ, defined by the relation:10
RCr−O(MgAl2O4 : Cr
3+)− RAl−O(MgAl2O4)
RCr−O(MgCr2O4)− RAl−O(MgAl2O4)
We find ζ = 0.83 (taking the Cr-O experimental dis-
tance), close to the full relaxation limit (ζ = 1), which
is more than in ruby α-Al2O3: Cr
3+ (ζ = 0.76).8 Veg-
ard’s law, which corresponds to ζ = 0, is thus not obeyed
at the atomic scale. The Cr-Al distance is intermedi-
ate between the Al-Al and Cr-Cr distances in MgAl2O4
and MgCr2O4, which accounts for a partial relaxation
of the second neighbors, but the third and fourth shells
(O, Mg) do not relax, within the experimental and com-
putational uncertainties. The chains of Al-centred octa-
hedra are radially affected only at a local scale around
Cr: the Al second neighbors relax partially outward Cr,
with a Al1-O bond slightly shortened. The angular devi-
ations are also moderate (below 1◦), since the sequence
of octahedra is not modified, but these Al-centred oc-
tahedra are slighlty distorted. Indeed, these octahedra
being edge-shared, the number of degrees of freedom is
reduced, and the polyhedra can either distort or tilt a
little, one around another. It is interesting to point out
that the three chains of octahedra are orientated along
the three four-fold axes of the cubic structure, which are
highly symmetric directions. On the contrary, an angular
relaxation (3.5◦) is observed for the Mg atoms, but with
the absence of radial modifications. This must be con-
nected to the fact that the tetrahedra share a vertex with
the Cr-centred octahedron, a configuration which allows
more flexibility for relative rotation of the polyhedra.
The extension of the relaxation process up to the sec-
ond shell is not observed in the corundum solid solution,
in which it is limited to the first coordination shell.15
Such a difference between these two solid solutions can
be related to the lattice rigidity: the bulk modulus B
is smaller in MgAl2O4 than in α-Al2O3, 200 GPa and
251 GPa, respectively.37 This difference directly arises
from the peculiarity of the structure of these two crystals:
in the spinel structure, one octahedron is edge-shared
to 6 Al octahedra and corner-shared to 6 Mg-centred
tetrahedra (Fig. 3b). In corundum, each octahedron is
face-shared with another, in addition to corner and edge-
sharing bonds: this is at the origin of the rigidity of the
corundum structure, which is less able to relax around a
substitutional impurity such as Cr3+, and relaxation is
thus limited to the first neighbors.
IV. CONCLUSIONS
This study provides a direct evidence of the struc-
tural relaxation during the substitution of Cr for Al in
MgAl2O4 spinel. The local structure determined by X-
ray Absorption Spectroscopy and first-principles calcu-
lations show similar Cr-O distances and local symmetry
in dilute and concentrated spinels. This demonstrates
that, at the atomic scale, Vegard’s law is not obeyed in
the MgAl2O4-MgCr2O4 solid solution. Though this re-
sult has been obtained in other types of materials (semi-
conductors, mixed salts), it is particularly relevant for
oxides like spinel and corundum: indeed, the application
of Vegard’s law has long been a structural tool to in-
terpret, within the so-called ”point charge model”,4 the
color of minerals containing transition metal ions. In
spinel, the full relaxation of the first shell is partially ac-
comodated by strain-induced bond buckling, which was
found to be weak in corundum: important angular tilts
of the Mg-centred tetrahedra around the Cr-centred oc-
tahedron have been calculated, while the angles between
Cr- and Al-bearing edge-sharing octahedra are hardly af-
fected. The improved thermal and mechanical properties
of Cr-doped spinel may be explained by remanent local
strain fields induced by the full relaxation of the structure
around chromium, as it has been observed in other solid
solutions.2 Another important consequence of relaxation
concerns the origin of the partition of elements between
minerals and liquids in geochemical systems.5 Finally, the
data obtained in this study will provide a structural basis
for discussing the origin of color in red spinel and its vari-
ation at high Cr-contents. Indeed, the origin of the color
differences between Cr-containing minerals (ruby, emer-
ald, red spinel, alexandrite) is still actively debated.6,8,38
Acknowledgments
The authors are very grateful to O. Proux (FAME
beamline) for help during experiment. The theoret-
ical part of this work was supported by the French
CNRS computational Institut of Orsay (Institut du
Développement et de Recherche en Informatique Scien-
tifique) under project 62015. This work has been greatly
improved through fruitful discussions with E. Balan, F.
Mauri, M. Lazzeri and Ph. Sainctavit. This is IPGP
Contribution n◦XXXX.
∗ Electronic address: amelie.juhin@impmc.jussieu.fr
† Electronic address: hazemann@grenoble.cnrs.fr
1 D. Levy, G. Artioli, A. Gualtieri, S. Quartieri S, and M.
Valle, Mater. Res. Bull. 34, 711 (1999)
2 C. Laulhé, F. Hippert, J. Kreisel, M. Maglione, A. Simon,
J. L. Hazemann, and V. Nassif, Phys. Rev. B 74, 014106
(2006)
3 A.I. Frenkel, D. M. Pease, J. I. Budnick, P. Metcalf, E. A.
Stern, P. Shanthakumar, and T. Huang, Phys. Rev. Lett.
97, 195502 (2006)
4 R. G. Burns, Mineralogical Applications of Crystal Field
Theory (Cambridge University Press, Cambridge, 1993)
5 J. Blundy, and B. Wood, Nature 372, 452 (1994)
6 J. M. Garcia-Lastra, M. T. Barriuso, J. A. Aramburu, and
M. Moreno, Phys. Rev. B 72, 113104 (2005)
7 M. Moreno, M. T. Barriuso, J. M. Garcia-Lastra, J. A.
Aramburu, P. Garcia-Frenandez, and M. Moreno, J. Phys.:
Condens. Matter 18, R315 (2006)
8 E. Gaudry, Ph. Sainctavit, F. Juillot, F. Bondioli, Ph.
Ohresser, and I. Letard, Phys. Chem. Minerals 32, 710
(2006)
9 K. Langer, Z. Kristallogr. 216, 87 (2001)
10 J. L. Martins, and A. Zunger, Phys. Rev. B 30, 6217 (1984)
11 L. Galoisy, Phys. Chem. Minerals 23, 217 (1996)
12 J. C. Mikkelsen, Jr., and J. B. Boyce, Phys. Rev. B 28,
7130 (1983)
13 A. I. Frenkel, E. A. Stern, A. Voronel, M. Qian, and M.
Newville, Phys. Rev. B 49, 11662 (1993)
14 C. Lamberti, E. Groppo, C. Prestipino, S. Casassa, A. M.
Ferrari, C. Pisani, C. Giovanardi, P. Luches, S. Valeri, and
F. Boscherini, Phys. Rev. Lett. 91, 046101 (2003)
15 E. Gaudry, A. Kiratisin, P. Sainctavit, C. Brouder, F.
Mauri, A. Ramos, A. Rogalev, and J. Goulon, Phys. Rev.
B 67, 094108 (2003)
16 P. Thibaudeau, and F. Gervais, J. Phys.: Condens. Matter
14, 3543 (2002)
17 O. Proux, X. Biquard, E. Lahera, J-J Menthonnex, A.
Prat, O. Ulrich, Y. Soldo, P. Trévisson, G. Kapoujyan, G.
Perroux, P. Taunier, D. Grand, P. Jeantet, M. Deleglise, J-
P. Roux, and J-L. Hazemann, Phys. Scr. T115, 970 (2005)
18 N. Newville, J. Synch. Radiation 8, 322 (2001)
19 Ravel, and M. Newville, J. Synch. Radiation 12, 537 (2005)
20 Calculations were performed with PARATEC (PARAllel
Total Energy Code) by B. Pfrommer, D. Raczkowski, A.
Canning, S. G. Louie, Lawrence Berkeley National Lab-
oratory (with contributions from F. Mauri, M. Cote, Y.
Yoon, Ch. Pickard and P. Haynes. For more infomration
see www.nersc.gov/projects/paratec
21 N. Troullier, and J. L. Martins, Phys. Rev. B 43, 1993
(1991)
22 L. Kleinman, and D. M. Bylander, Phys. Rev. Lett. 48,
1425 (1982)
23 T.Yamanaka, and Y. Takeuchi, Z. Kristallogr. 165, 65
(1983)
24 S. G. Louie, S. Froyen, and M. L. Cohen , Phys. Rev. B
26, 1738 (1982)
25 M. Taillefumier, D. Cabaret, A.-M. Flank, and F. Mauri,
Phys. Rev. B 66, 195107 (2002)
26 D. Cabaret, E. Gaudry, M. Taillefumier, P. Sainctavit,
and F. Mauri, Physica Scripta, Proc. XAFS-12 conference
T115, 131 (2005)
27 P. E. Blöchl, Phys. Rev. B 50, 17953 (1994)
mailto:amelie.juhin@impmc.jussieu.fr
mailto:hazemann@grenoble.cnrs.fr
28 R. Haydock, V. Heine, and M. J. Kelly, J. Phys. C: Solid
State Phys. 5, 2845 (1972)
29 R. Haydock, V. Heine, and M. J. Kelly, J. Phys. C: Solid
State Phys. 8, 2591 (1975)
30 M. O. Krause, and J. H. Oliver, J. Phys. Chem. Ref. Data
8, 329 (1979)
31 D. L. Wood, and G.F. Imbush, J. Chem. Phys. 48, 5255
(1968)
32 D. Bravo, and R. Böttcher, J. Phys.: Condens. Matter 4,
7295 (1992)
33 S. L. Votyakov, A. V. Porotnikov, Y. V. Shchapova ,E. I.
Yuryeava, and A. L. Ivanovskii, Int. J. Quant. Chem. 100,
567 (2004)
34 R. J. Hill, J. R. Craig, and G. V. Gibbs, Phys. Chem.
Minerals 4, 317 (1979)
35 D. Levy, G. Artioli, A. Gualtieri, S. Quartieri, and M.
Valle, Mat. Res. Bull. 34, 711 (1999)
36 R. D. Shannon, Acta Crystallogr, Sect. A 32, 751 (1976)
37 O. L. Anderson, and J. E. Nafe, J. Geophys. Res. 70, 3951
(1965)
38 J. M. Garcia-Lastra, J. A. Aramburu, M. T. Barriuso, and
M. Moreno, Phys. Rev. B 74, 115118 (2006)
ABSTRACT
  The structural environment of substitutional Cr3+ ion in MgAl2O4 spinel has
been investigated by Cr K-edge Extended X-ray Absorption Fine Structure (EXAFS)
and X-ray Absorption Near Edge Structure (XANES) spectroscopies.
First-principles computations of the structural relaxation and of the XANES
spectrum have been performed, with a good agreement to the experiment. The Cr-O
distance is close to that in MgCr2O4, indicating a full relaxation of the first
neighbors, and the second shell of Al atoms relaxes partially. These
observations demonstrate that Vegard's law is not obeyed in the MgAl2O4-MgCr2O4
solid solution. Despite some angular site distortion, the local D3d symmetry of
the B-site of the spinel structure is retained during the substitution of Cr
for Al. Here, we show that the relaxation is accomodated by strain-induced bond
buckling, with angular tilts of the Mg-centred tetrahedra around the Cr-centred
octahedron. By contrast, there is no significant alteration of the angles
between the edge-sharing octahedra, which build chains aligned along the three
four-fold axes of the cubic structure.

<|endoftext|><|startoftext|>
Introduction 
A RAID (Redundant Array of Inexpensive Disks) is a 
set of disks (and associated controller) that can automati-
cally recover data when one or more disks fail [4, 13]. 
Storage architectures using a large cache and RAID disks 
                                                                            
* Was a Visiting Research Assistant Professor at CRHC, on leave from LAAS-
CNRS, when this work was performed. 
† Was a Visiting Research Scholar at CRHC, on leave from Dipartimento di 
Informatica e Sistemistica,  University of Naples, Italy 
are becoming a popular solution for providing high per-
formance at low cost without compromising much data re-
liability [5, 10]. The analysis of these systems is focused 
on performance (see e.g., [9, 11]). The cache is assumed to 
be error free, and only the impact of errors in the disks is 
investigated. The impact of errors in the cache is addressed 
(to a limited extent) from a design point of view in [12], 
where the architecture of a fault-tolerant, cache-based 
RAID controller is presented. Papers studying the impact 
of errors in caches can be found in other applications not 
related to RAID systems (e.g., [3]). 
In this paper, unlike previous work, which mainly ex-
plored the impact of caching on the performance of disk 
arrays, we focus on dependability analysis of a cache-
based RAID controller. Errors in the cache might have a 
significant impact on the performance and dependability of 
the overall system. Therefore, in addition to the fault toler-
ance capabilities provided by the disk array, it is necessary 
to implement error detection and recovery mechanisms in 
the cache.  This prevents error propagation from the cache 
to the disks and users, and it reduces error latency (i.e., 
time between the occurrence of an error and its detection 
or removal). The analysis of the error detection coverage 
of these mechanisms, and of error latency distributions, 
early in the design process provides valuable information. 
System manufacturers can understand, early on, the fault 
tolerance capabilities of the overall design and the impact 
of errors on performance and dependability.  
In our case study, we employ hierarchical simulation, 
[6], to model and evaluate the dependability of a commer-
cial cache-based RAID architecture. The system is decom-
posed into several abstraction levels, and the impact of 
faults occurring in the cache and the disk array is evaluated 
at each level of the hierarchy. To analyze the system under 
realistic operational conditions, we use real input traces to 
drive the simulation. The system model is based on the 
specification of the RAID architecture, i.e., we do not 
evaluate a prototype system. Simulation experiments are 
conducted using the DEPEND environment [7].  
The cache architecture is complex and consists of sev-
eral layers of overlapping error detection and recovery 
mechanisms. Our three main objectives are 1) to analyze 
how the system responds to various fault and error scenar-
ios, 2) to analyze error latency distributions taking into ac-
count the origin of errors, and 3) to evaluate the coverage 
of error detection mechanisms. These analyses require a 
detailed evaluation of the system’s behavior in the pres-
ence of faults. In general, two complementary approaches 
can be used to make these determinations: analytical mod-
eling and simulation. Analytical modeling is not appropri-
ate here, due to the complexity of the RAID architecture. 
Hierarchical simulation offers an efficient method to con-
duct a detailed analysis and evaluation of error latency and 
error detection coverage using real workloads and realistic 
fault scenarios. Moreover, the analysis can be completed 
within a reasonable simulation time. 
To best reproduce the characteristics of the input load, a 
real trace file, collected in the field, is used to drive the 
simulation. The input trace exhibits the well-known track 
skew phenomenon, i.e., a few tracks among the address-
able tracks account for most of the I/O requests. Since 
highly reliable commercial systems commonly tolerate iso-
lated errors, our study focuses on the impact of multiple 
near-coincident errors occurring during a short period of 
time (error bursts), a phenomenon which has seldom been 
explored. We show that due to the high frequency of sys-
tem operation, a transient fault in a single system compo-
nent can result in a burst of errors that propagate to other 
components. In other words, what is seen at a given ab-
straction level as a single error becomes a burst of errors at 
a higher level of abstraction. Also, we analyze how bursts 
of errors affect the coverage of error detection mechanisms 
implemented in the cache and how they affect the error la-
tency distributions, (taking into account where and when 
the errors are generated). In particular, we demonstrate that 
the overlapping of error detection and recovery mecha-
nisms provides high error detection coverage for the over-
all system, despite the occurrence of long error bursts. Fi-
nally, analysis of the evolution of the number of faulty 
tracks in the cache memory and in the disks shows an in-
creasing trend for the disks but an almost constant number 
for cache memory. 
This paper contains five sections. Section 2 describes 
the system architecture and cache operations, focusing on 
error detection and recovery mechanisms. Section 3 out-
lines the hierarchical modeling approach and describes the 
hierarchical model developed for the system analyzed in 
this paper. Section 4 presents the results of the simulation 
experiments. Section 5 summarizes the main results of the 
study and concludes the paper. 
2 System presentation  
The storage architecture analyzed in this paper (Figure 
1) is designed to support a large amount of disk storage 
and to provide high performance and high availability. The 
storage system supports a RAID architecture composed of 
a set of disk drives storing data, parity, and Reed-Solomon 
coding information, which are striped across the disks [4]. 
This architecture tolerates the failure of up to two disks. If 
a disk fails, the data from the failed disk is reconstructed 
on-the-fly using the valid disks; the reconstructed data is 
stored on a hot spare disk without interrupting the service. 
Data transfer between the hosts and the disks is supervised 
by the array controller. The array controller is composed of 
a set of control units. The control units process user re-
quests received from the channels and direct these requests 
to the cache subsystem. Data received from the hosts is as-
sembled into tracks in the cache. The number of tracks cor-
responding to a single request is application dependent. 
Data transfers between the channels and the disks are per-
formed by the cache subsystem via reliable and high-speed 
control and data busses. The cache subsystem consists of 
1) a cache controller organized into cache controller inter-
faces to the channels and the disks and cache controller in-
terfaces to the cache memory (these interfaces are made of 
redundant components to ensure a high level of availabil-
ity) and 2) cache volatile and nonvolatile memory. Com-
munication between the cache controller interfaces and the 
cache memory is provided by redundant and multidirec-
tional busses (denoted as Bus 1 and Bus 2 in Figure 1). 
The cache volatile memory is used as a data staging area 
for read and write operations. The battery-backed nonvola-
tile memory is used to protect critical data against failures 
(e.g., data modified in the cache and not yet modified in 
the disks, information on the file system that is necessary 
to map the data processed by the array controller to physi-
cal locations on the disks). 
2.1  Cache subsystem operations  
The cache subsystem is for caching read and write re-
quests. A track is always staged in the cache memory as a 
whole, even in the event of a write request involving only a 
few blocks of the track. In the following, we describe the 
main cache operations assuming that the unit of data trans-
fer is an entire track. 
Read operation. First, the cache controller checks for 
Figure 1: Array controller architecture, interfaces  
and data flow 
Cache subsystem 
cache  
controller 
(CC) 
cache  
memory 
Bus 1 
Bus 2 
Control & Data 
 Busses 
Data transfer be-
tween cache and 
channels 
Requests  
for data transfer 
to/from hosts 
Data transfer 
to/from disks 
Channel 
Interfaces 
RAID 
Disk 
Disk 
Interfaces 
Hosts 
Control  
Units 
CC interfaces to 
channels/disks 
CC interfaces to 
cache memory 
nonvolatile 
memory 
volatile 
memory 
Array  
Controller 
the requested track in the cache memory. If the track is al-
ready there («cache hit»), it is read from the cache and the 
data is sent back to the channels. If not («cache miss»), a 
request is issued to read the track from the disks and swap 
it to the cache memory. Then, the track is read from the 
cache. 
Write operation. In the case of a cache hit, the track is 
modified in the cache and flagged as «dirty.» In the case of 
a cache miss, a memory is allocated to the track and the 
track is written into that memory location. Two write 
strategies can be distinguished: 1) write-through and 2) fast 
write. In the write-through strategy, the track is first writ-
ten to the volatile memory. The write operation completion 
is signaled to the channels after the track is written to the 
disks. In the fast-write strategy, the track is written to the 
volatile memory and to nonvolatile memory. The write op-
eration completion is signaled immediately. The modified 
track is later written to the disks according to a write-back 
strategy, which consists of transferring the dirty tracks to 
the disks, either periodically or when the amount of dirty 
tracks in the cache exceeds a predefined threshold. Finally, 
when space for a new track is needed in the cache, the 
track-replacement algorithm based on the Least-Recently-
Used (LRU) strategy is applied to swap out a track from 
the cache memory. 
Track transfer inside the cache. The transfer of a track 
between the cache memory, the cache controller, and the 
channel interfaces is composed of several elementary data 
transfers. The track is broken down into several data 
blocks to accommodate the parallelism of the different de-
vices involved in the transfer. This also makes it possible 
to overlap several track transfer operations over the data 
busses inside the cache subsystem. Arbitration algorithms 
are implemented to synchronize these transfers and avoid 
bus hogging by a single transfer.  
2.2  Error detection mechanisms 
The cache is designed to detect errors in the data, ad-
dress, and control paths by using, among other techniques, 
parity, error detection and correction codes (EDAC), and 
cyclic redundancy checking (CRC). These mechanisms are 
applied to detect errors in the data path in the following 
ways: 
Parity. Data transfers, over Bus 1 (see Figure 1) are 
covered by parity. For each data symbol (i.e., data word) 
transferred on the bus, parity bits are appended and passed 
over separate wires. Parity is generated and checked in 
both directions. It is not stored in the cache memory but is 
stripped after being checked. 
EDAC. Data transfers over Bus 2 and the data stored in 
the cache memory are protected by an error detection and 
correction code. This code is capable of correcting on-the-
fly all single and double bit errors per data symbol and de-
tecting all triple bit data errors. 
CRC. Several kinds of CRC are implemented in the ar-
ray controller. Only two of these are checked or generated 
within the cache subsystem: the frontend CRC (FE-CRC) 
and the physical sector CRC (PS-CRC). FE-CRC is ap-
pended, by the channel interfaces, to the data sent to the 
cache during a write request. It is checked by the cache 
controller. If FE-CRC is valid, it is stored with the data in 
the cache memory. Otherwise, the operation is interrupted 
and a CRC error is recorded. FE-CRC is checked again 
when a read request is received from the channels. There-
fore, extra-detection is provided to recover from errors that 
may have occurred while the data was in the cache or in 
the disks, errors that escaped the error detection mecha-
nisms implemented in the cache subsystem and the disk ar-
ray. PS-CRC is appended by the cache controller to each 
data block to be stored in a disk sector. The PS-CRC is 
stored with the data until a read from disk operation oc-
curs. At this time, it is checked and stripped before the data 
is stored in the cache. The same algorithm is implemented 
to compute FE-CRC and PS-CRC. This algorithm guaran-
tees detection of three or fewer data symbols in error in a 
data record. 
Table 1 summarizes the error detection conditions for 
each mechanism presented above, taking into account the 
component in which the errors occur and the number of 
noncorrected errors occurring between the computation of 
the code and its being checked. The (x) symbol means that 
errors affecting the corresponding component can be de-
tected by the mechanism indicated in the column. It is 
noteworthy that the number of check bits and the size of 
the data symbol (ds) mentioned in the error detection con-
dition are different for parity, EDAC, and CRC. 
2.3 Error recovery and track reconstruction 
Besides EDAC, which is able to automatically correct 
some errors by hardware, software recovery procedures are 
invoked when errors are detected by the cache subsystem. 
Recovery actions mainly consist of retries, memory fenc-
ing, and track-reconstruction operations. When errors are 
detected during a read operation from the cache volatile 
memory and the error persists after retries, an attempt is 
made to read the data from nonvolatile memory. If this op-
 Error detection mechanism 
Error Location FE-CRC Parity EDAC PS-CRC 
Transfer: channel to cache x    
CCI to channels/disks x    
Bus 1 x x   
CCI to cache memory x    
Bus 2 x  x  
Cache memory x  x  
Transfer: cache to disk x   x 
Disks x   x 
      Error detection condition < 4 ds 
with  
errors 
odd # of 
errors 
per ds 
< 4 bit-
errors 
per ds 
< 4 ds 
with er-
rors  
ds= data symbol, CCI = Cache Controller Interface 
Table 1. Error detection efficiency with respect to the loca-
tion and the number of errors  
eration fails, the data is read from the disk array. This op-
eration succeeds if the data on the disks is still valid or it 
can be reconstructed (otherwise it fails). Figure 2 describes 
a simplified disk array composed of n data disks (D1 to 
Dn) and two redundancy disks (P and Q). Each row of the 
redundancy disks is computed based on the corresponding 
data tracks. For example, the first rows in disks P (P[1;n]) 
and Q (Q[1;n]) are obtained based on the data tracks T1 to 
Tn stored in the disks D1 to Dn. This architecture tolerates 
the loss of two tracks in each row; this condition will be re-
ferred to as the track reconstruction condition. The tracks 
that are lost due to disk failures or corrupted due to bit-
errors can be reconstructed using the valid tracks in the 
row, provided that the track reconstruction condition is sat-
isfied; otherwise data is lost. More information about disk 
reconstruction strategies can be found in [8]. 
3 Hierarchical modeling methodology 
We propose a hierarchical simulation approach to en-
able an efficient, detailed dependability analysis of the 
RAID storage system described in the previous section.  
Establishing the proper number of hierarchical levels 
and their boundaries is not trivial. Several factors must be 
considered in determining an optimal hierarchical decom-
position that provides a significant simulation speed-up 
with one minimal loss of accuracy: 1) system complexity, 
2) the level of detail of the analysis and the dependability 
measures to be evaluated, and 3) the strength of system 
component interactions (weak interactions favor hierarchi-
cal decomposition). 
In our study, we define three hierarchical levels (sum-
marized in Figure 3) to model the cache-based storage sys-
tem. At each level, the behavior of the shaded components 
is detailed in the lower-level model. Each model is built in 
a modular fashion and is characterized by: 
• the components to be modeled and their behavior, 
•  a workload generator specifying the input distribution,  
•  a fault dictionary specifying the set of faults to be in-
jected in the model, the distribution characterizing the 
occurrence of faults, and the consequences of the fault 
with the corresponding probability of occurrence, and 
•  the outputs derived from the submodel simulation. 
For each level, the workload can be a real I/O access 
trace or generated from a synthetic distribution (in this 
study we use a real trace of user I/O requests). The effects 
of faults injected at a given level are characterized by sta-
tistical distributions (e.g., probability and number of errors 
occurring during data transfer inside the cache). Such dis-
tributions are used as inputs for fault injection at the next 
higher level. This mechanism allows the propagation of 
fault effects from lower-level models to higher-level mod-
els.  
In the model described in Figure 3, the system behavior, 
the granularity of the data transfer unit, and the quantita-
tive measures evaluated are refined from one level to an-
other. In the Level 1 model, the unit of data transfer is a set 
of tracks to be read or written from a user file. In Level 2, 
it is a single track. In Level 3, the track is decomposed into 
a set of data blocks, each of which is composed of a set of 
data symbols. In the following subsections, we describe the 
three levels. In this study, we address Level 2 and Level 3 
models, which describe the internal behavior of the cache 
and RAID subsystems in the presence of faults. Level 1 is 
included to illustrate the flexibility of our approach. Using 
the hierarchical methodology, additional models can be 
built on top of Level 2 and Level 3 models to study the be-
havior of other systems relying on the cache and RAID 
subsystems.  
3.1 Level 1 model  
Level 1 model translates user requests to read/write a 
specified file into requests to the storage system to 
read/write the corresponding set of tracks. It then propa-
gates the replies from the storage system back to the users, 
taking into account the presence of faults in the cache and 
RAID subsystems. A file request (read, write) results in a 
sequence of track requests (read, fast-write, write-through). 
Concurrent requests involving the same file may arrive 
from different users. Consequently, a failure in a track op-
eration can affect multiple file requests. In the Level 1 
model, the cache subsystem and the disk array are modeled 
as a single entity—a black box. A fault dictionary specify-
ing the results of track operations is defined to characterize 
the external behavior of the black box in the presence of 
faults. There are four possible results for a track operation 
(from the perspective of occurrence, detection, and correc-
tion of errors): 1) successful read/write track operation 
(i.e., absence of errors, or errors detected and corrected), 2) 
errors detected but not corrected, 3) errors not detected, 
and 4) service unavailable. Parameter values representing 
the probability of the occurrence of these events are pro-
vided by the simulation of the Level 2 model. Two types of 
outputs are derived from the simulation of the Level 1 
model: 1) quantitative measures characterizing the prob-
ability of user requests failure and 2) the workload distri-
bution of read or write track requests received by the cache 
subsystem. This workload is used to feed the Level 2 
model. 
3.2 Level 2 model 
The Level 2 model describes the behavior in the pres-
ence of faults of the cache subsystem and the disk array. 
Cache operations and the data flow between the cache con-
troller, the cache memory, and the disk array are described 
Figure 2: A simplified RAID 
Tn+1 
row2 
row1 
Q[n+1 ; 2n] 
Q[1 ; n] 
Redundancy Disks: P, Q Data Disks: D1, … Dn 
P[n+1 ; 2n] 
P[1 ; n] 
to identify scenarios leading to the outputs described in the 
Level 1 model and to evaluate their probability of occur-
rence. At Level 2, the data stored in the cache memory and 
the disks is explicitly modeled and structured into a set of 
tracks. Volatile memory and nonvolatile memory are mod-
eled as separate entities. A track transfer operation is seen 
at a high level of abstraction. A track is seen as an atomic 
piece of data, traveling between different subparts of the 
system (from user to cache, from cache to user, from disk 
to cache, from cache to disk), while errors are injected to 
the track and to the different components of the system. 
Accordingly, when a track is to be transferred between two 
communication partners, for example, from the disk to the 
cache memory, none of the two needs to be aware of the 
disassembling, buffering, and reassembling procedures that 
occur during the transfer. This results in a significant simu-
lation speedup, since the number of events needing to be 
processed is reduced dramatically. 
3.2.1 Workload distribution. Level 2 model inputs 
correspond to requests to read or write tracks from the 
cache. Each request specifies the type of the access (read, 
write-through, fast-write) and the track to be accessed. The 
distribution specifying these requests and their interarrival 
times can be derived from the simulation of the Level 1 
model, from real measurements (i.e., real trace), or by gen-
erating distributions characterizing various types of work-
loads. 
3.2.2  Fault models. Specification of adequate fault 
models is essential to recreate realistic failure scenarios. 
To this end we distinguished three primary fault models, 
used to exercise and analyze error detection and recovery 
mechanisms of the target system. These fault models in-
clude 1) permanent faults leading to cache controller com-
ponent failures, cache memory component failures, or disk 
failures, 2) transient faults leading to track errors affecting 
single or multiple bits of the tracks while they are stored in 
the cache memory or in the disks, and 3) transient faults 
leading to track errors affecting single or multiple bits dur-
ing the transfer of the tracks by the cache controller to the 
cache memory or to the disks. 
Component failures. When a permanent fault is in-
jected into a cache controller component, the requests 
processed by this component are allocated to the other 
Figure 3: Hierarchical modeling of the cache-based storage system 
Workload 
Generator 
Workload 
Generator 
Fault  
Injector Cache & 
 RAID disk array 
Channel interfaces 
 and control units 
Level 3 outputs 
Prob. and # of errors during transfer 
• in CCI to channels/disks 
• over Bus 1 
• in CCI to cache memory 
• over Bus 2 
Level 2 outputs 
1) Prob. of successful track operation 
2) Prob. of failed track operation 
 • Errors detected & not corrected 
 • Errors not detected 
 • Cache or RAID unavailable 
2) Coverage (CRC, parity, EDAC) 
3) Error latency distribution 
4) Frequency of track reconstructions 
Level 1 outputs 
1) Probability of failure of user requests 
2) Distribution of track read/write requests 
sent to {cache+RAID} 
User requests to read or write Si 
tracks from/to a file Fi 
Track read/ write requests 
 to {cache + RAID} 
{read,T1}, …, {write,Tn} 
Un {read,Fi,Si}, …,U1{write,F1,S1} 
result of each 
track transfer 
Read/write 
from/to memory 
Track Transfer 
Read/write 
to/from disks 
Requests to read 
or write tracks 
Level 1: {Cache +RAID} modeled as a black box 
Cache Memory RAID Disk 
Level 3: Details the transfer of a track inside the cache controller;  
each track decomposed into data blocks; data block = set of data symbols 
Level 2: interactions between cache controller, RAID and cache memory; 
tracks in cache memory and disks explicitly modeled; data unit = track 
CCI to  
 cache memory 
CCI to  chan-
nels/disks 
Cache controller 
accesses 
 to the disks 
accesses to 
 cache memory 
CCI  
cache memory 
CCI chan-
nels/disks 
Buffers 
Workload 
Generator 
Bus2 
Bus1 Buffers 
Fault  
Injectors 
Fault  
Injectors 
Request to transfer a Track 
composed of n data blocks, 
Bi = {di1,…dik} data symbols 
user specified 
 parameters 
T1{B1, B2,…, Bn } 
fault injection in {cache+disk} based on 
probabilities evaluated from level 2 
• track errors in the buffers 
• bus transmission errors 
•cache component/disk failure 
•memory/disk track errors 
•track errors during transfer 
user specified  
parameters inputs from level 3 
inputs from level 2 
components of the cache controller that are still available. 
The reintegration of the failed component after repair does 
not interrupt the cache operations in progress. Permanent 
faults injected into a cache memory card or a single disk 
lead to the loss of all tracks stored in these components. 
When a read request involving tracks stored on a faulty 
component is received by the cache, an attempt is made to 
read these tracks from the nonvolatile memory or from the 
disks. If the tracks are still valid in the nonvolatile memory 
or in the disks, or if they can be reconstructed from the 
valid disks, then the read operation is successful, otherwise 
the data is lost. Note that when a disk fails, a hot spare is 
used to reconstruct the data and the failed disk is sent for 
repair. 
Track errors in the cache memory and the disks. These 
correspond to the occurrence of single or multiple bit-
errors in a track due to transient faults. Two fault injection 
strategies are distinguished: time dependent and load de-
pendent. Time dependent strategy simulates faults occur-
ring randomly. The time of injection is sampled from a 
predefined distribution, and the injected track, in the mem-
ory or in the disks, is chosen uniformly from the set of ad-
dressable tracks. Load dependent strategy aims at simulat-
ing the occurrence of faults due to stress. The fault injec-
tion rate depends on the number of accesses to the memory 
or to the disks (instead of the time), and errors are injected 
in the activated tracks. Using this strategy, frequently ac-
cessed tracks are injected more frequently than other 
tracks. For both strategies, errors are injected randomly 
into one or more bytes of a track. The fault injection rate is 
tuned to allow a single fault injection or multiple near-
coincident fault injections (i.e., the fault rate is increased 
during a short period of time). This enables us to analyze 
the impact of isolated and bursty fault patterns. 
Track errors during transfer inside the cache. Track er-
rors can occur: 
• in the cache controller interfaces with channels/disks be-
fore transmission over Bus 1 (see Figure 1), i.e., before 
parity or CRC computation or checking, 
• during transfer over Bus 1, i.e., after parity computation, 
• in the cache controller interfaces to cache memory before 
transmission over Bus 2, i.e., before EDAC computation, 
• during transfer over Bus 2, i.e., after EDAC computation. 
To be able to evaluate the probability of occurrence and 
the number of errors affecting the track during the transfer, 
a detailed simulation of cache operations during this trans-
fer is required. Including this detailed behavior in the 
Level 2 model would be far too costly in terms of compu-
tation time and memory occupation. For that reason, this 
simulation is performed in the Level 3 model. In the Level 
2 model, a distribution is associated with each event de-
scribed above, specifying the probability and the number 
of errors occurring during the track transfer. The track er-
ror probabilities are evaluated at Level 3. 
3.2.3 Modeling of error detection mechanisms. Per-
fect coverage is assumed for cache components and disk 
failures due to permanent faults. The detection of track er-
rors occuring when the data is in the cache memory or in 
the disks, or during the data transfer depends on (1) the 
number of errors affecting each data symbol to which the 
error detection code is appended and (2) when and where 
these errors occurred (see Table 1). The error detection 
modeling is done using a behavioral approach. The number 
of errors in each track is recorded and updated during the 
simulation. Each time a new error is injected into the track, 
the number of errors is incremented. When a request is 
sent to the cache controller to read a track, the number of 
errors affecting the track is checked and compared with the 
error detection conditions summarized in Table 1. During a 
write operation, the track errors that have been accumu-
lated during the previous operations are overwritten, and 
the number of errors associated to the track is reset to zero. 
3.2.4 Quantitative measures. Level 2 simulation en-
ables us to reproduce several error scenarios and analyze 
the likelihood that errors will remain undetected by the 
cache or will cross the boundaries of several error detec-
tion and recovery mechanisms before being detected. 
Moreover, using the fault injection functions implemented 
in the model, we analyze (a) how the system responds to 
different error rates (especially burst errors) and input dis-
tributions and (b) how the accumulation of errors in the 
cache or in the disks and the error latency affect overall 
system behavior. Statistics are recorded to evaluate the fol-
lowing: coverage factors for each error detection mecha-
nism, error latency distributions, and the frequency of track 
reconstruction operations. Other quantitative measures, 
such as the availability of the system and the mean time to 
data loss, can also be recorded. 
3.3 Level 3 model 
The Level 3 model details cache operations during the 
transfer of tracks from user to cache, from cache to user, 
from disk to cache, and from cache to disk. This allows us 
to evaluate the probabilities and number of errors occur-
ring during data transfers (these probabilities are used to 
feed the Level 2 model, as discussed in Section 3.2). Un-
like Level 2, which models a track transfer at a high level 
of abstraction as an atomic operation, in Level 3, each 
track is decomposed into a set of data blocks, which are in 
turn broken down into data symbols (each one correspond-
ing to a predefined number of bytes). The transfer of a 
track is performed in several steps and spans several cy-
cles. CRC, parity or EDAC bits are appended to the data 
transferred inside the cache or over the busses (Bus 1 and 
Bus 2). Errors during the transfer may affect the data bits 
as well as the check bits. At this level, we assume that the 
data stored in the cache memory and in the disk array is er-
ror free, as the impacts of these errors are considered in the 
Level 2 model. Therefore, we need to model only the 
cache controller interfaces to the channels/disks and to the 
cache memory and the data transfer busses. The Level 3 
model input distribution defines the tracks to be accessed 
and the interarrival times between track requests. This dis-
tribution is derived from the Level 2 model. 
Cache controller interfaces include a set of buffers in 
which the data to be transmitted to or received from the 
busses is temporarily stored (data is decomposed or as-
sembled into data symbols and redundancy bits are ap-
pended or checked). In the Level 3 model, only transient 
faults are injected to the cache components (buffers and 
busses). During each operation, it is assumed that a healthy 
component will perform its task correctly, i.e., it will exe-
cute the operation without increasing the number of errors 
in the data it is currently handling. For example, the cache 
controller interfaces will successfully load their own buff-
ers, unless they are affected by errors while performing the 
load operation. Similarly, Bus 1 and Bus 2 will transfer a 
data symbol and the associated information without errors, 
unless they are faulty while doing so. On the other hand, 
when a transient fault occurs, single or multiple bit-flips 
are continuously injected (during the transient) into the 
data symbols being processed. Since a single track transfer 
is a sequence of operations spanning several cycles, single 
errors due to transients in the cache components may lead 
to a burst of errors in the track currently being transferred. 
Due to the high operational speed of the components, even 
a short transient (a few microseconds) may result in an er-
ror burst, which affects a large number of bits. 
4 Simulation experiments and results 
In this section, we present the simulation results ob-
tained from Level 2 and Level 3 to highlight the advan-
tages of using a hierarchical approach for system depend-
ability analysis. We focus on the behavior of the cache and 
the disks when the system is stressed with error bursts. Er-
ror bursts might occur during data transmission over bus-
ses, in the memory and the disks as observed, e.g., in [2]. It 
is well known that the CRC and EDAC error detection 
mechanisms provide high error detection coverage of sin-
gle bit errors. Previously the impact of error bursts has not 
been extensively explored. In this section, we analyze the 
coverage of the error detection mechanisms, the distribu-
tion of error detection latency and error accumulation in 
the cache memory and the disks, and finally the evolution 
of the frequency of track reconstruction in the disks.  
4.1  Experiment set-up 
Input distribution. Real traces of user I/O requests were 
used to derive inputs for the simulation. Information pro-
vided by the traces included tracks processed by the cache 
subsystem, the type of the request (read, fast-write, write-
through), and the interarrival times between the requests. 
Using a real trace gave us the opportunity to analyze the 
system under a real workload. The input trace described 
accesses to more than 127,000 tracks, out of 480,000 ad-
dressable tracks. As illustrated by Figure 4, the distribution 
of the number of accesses per track is not uniform. Rather 
a few tracks are generally more frequently accessed than 
the rest—the well-known track skew phenomenon. For in-
stance, the first 100 most frequently accessed tracks ac-
count for 80% of the accesses in the trace; the leading 
track of the input trace is accessed 26,224 times, whereas 
only 200 accesses are counted for rank-100 track. The in-
terarrival time between track accesses is about a few milli-
seconds, leading to high activity in the cache subsystem. 
Figure 5 plots the probability density function of the inter-
arrival times between track requests. Regarding the type of 
the requests, the distribution is: 86% reads, 11.4% fast-
writes and 2.6% write-through operations. 
Simulation parameters. We simulated a large disk ar-
ray composed of 13 data disks and 2 redundancy disks. 
The RAID data capacity is 480,000 data tracks. The capac-
ity of the simulated cache memory is 5% the capacity of 
the RAID. The rate of occurrence of permanent faults is 
10-4 per hour for cache components (as is generally ob-
served for hardware components) and 10-6 per hour for the 
disks [4]. The mean time for the repair of cache subsystem 
components is 72 hours (a value provided by the system 
manufacturer). Note that when a disk fails, a hot spare is 
used for the online reconstruction of the failed disk.  
Transient faults leading to track errors occur more fre-
quently than permanent faults. Our objective is to analyze 
how the system responds to high fault rates and bursts of 
errors. Consequently, high transient fault rates are assumed 
in the simulation experiment: 100 transients per hour over 
the busses, and 1 transient per hour in the cache controller 
interfaces, the cache memory and the disks. Errors occur 
more frequently over the busses than in the other compo-
nents. Regarding the load-dependent fault injection strat-
egy, the injection rate in the disk corresponds to one error 
each 1014 bits accessed, as observed in [4]. The same injec-
tion rate is assumed for the cache memory. Finally, the 
length of the error burst in the cache memory and in the 
disks is sampled from a normal distribution with a mean of 
100 and a standard deviation of 10, whereas the length of 
the error burst during the track transfer inside the cache is 
evaluated from the Level 3 model as discussed in Section 
3.3. The results presented in the following subsections cor-
respond to the simulation of 24 hours of system operation. 
#accesses per track
1E+0 1E+2 1E+4
rank ordered tracks
interarrival time (ms) 
1 2 3 4 5 6 7 8 9 10
Figure 4: Track skew (Log-Log) Figure 5: Interarrival time 
4.2 Level 3 model simulation results 
As discussed in Section 3.3, the Level 3 model aims at 
evaluating the number of errors occurring during the trans-
fer of the tracks inside the cache due to transient faults 
over the busses and in the cache controller interfaces. We 
assumed that the duration of a transient fault is 5 micro-
seconds. During the duration of the transient, single or 
multiple bit flips are continuously injected in the track data 
symbols processed during that time. The cache operational 
cycle for the transfer of a single data symbol is of the order 
of magnitude of a few nanoseconds. Therefore the occur-
rence of a transient fault might affect a large number of 
bits in a track. This is illustrated by Figure 6, which plots 
the conditional probability density function of the number 
of errors (i.e., number of bit-flips) occurring during the 
transfer over Bus 1 and Bus 2 (Figure 6-a) and inside the 
cache controller interfaces (Figure 6-b), given that a tran-
sient fault occurred. The distribution is the same for Bus 1 
and Bus 2 due to the fact these busses have the same 
speed. The mean length of the error burst measured from 
the simulation is around 100 bits during transfer over the 
busses, 800 bits when the track is temporarily stored in the 
cache controller interfaces to cache memory, and 1000 bits 
when the track is temporarily stored in the cache controller 
interfaces to channels/disks. The difference between the 
results is related to the difference between the track trans-
fer time over the busses and the track loading time inside 
the cache controller interfaces. 
4.3 Level 2 model simulation results 
We used the burst error distributions obtained from the 
Level 3 model simulation to feed Level 2 model as ex-
plained in Section 3.2. In following subsections we present 
and discuss the results obtained from the simulation of 
Level 2, specifically: 1) the coverage of the cache error de-
tection mechanisms, 2) the error latency distribution, and 
3) the error accumulation in the cache memory and disks 
and the evolution of the frequency of track reconstruction. 
4.3.1  Error detection coverage. For all simulation 
experiments that we performed, the coverage factor meas-
ured for the frontend CRC and the physical sector CRC 
was 100%. This is due to the very high probability of de-
tecting error patterns by the CRC algorithm implemented 
in the system (see Section 2.2). Regarding EDAC and par-
ity, the coverage factors tend to stabilize as the simulation 
time increases (see Figures 7-a and 7-b, respectively). Each 
unit of time in Figures 7-a and 7-b corresponds to 15 min-
utes of system operation. Note that EDAC coverage re-
mains high even though the system is stressed with long 
bursts occurring at a high rate, and more than 98% of the 
errors detected by EDAC are automatically corrected on-
the-fly. This is due to the fact that errors are injected ran-
domly in the track and the probability of having more than 
three errors in a single data symbol is low. (The size of a 
data symbol is around 10-3 the size of the track.) All the er-
rors that escaped EDAC or parity have been detected by 
the frontend CRC upon a read request from the hosts. This 
result illustrates the advantages of storing the CRC with 
the data in the cache memory to provide extra detection of 
errors escaping EDAC and parity and to compensate for 
the relatively low parity coverage. 
0 40 80 120 160 200
# errors during transfer  
0 200 400 600 800 1000
# errors during transfer
CCI channels/disks
CCI cache memory
           a) Bus 1, Bus 2                  b) cache controller interfaces 
Figure 6: Pdf of number of errors during track transfer  
given that a transient fault is injected 
0.970
0.975
0.980
0.985
0.990
0.995
0 15 30 45 60 75
EDAC detection 
coverage
EDAC correction 
coverage
0.9530
0.9535
0.9540
0.9545
0.9550
0.9555
0 15 30 45 60 75
Parity coverage
 a) EDAC b) Parity 
Figure 7: EDAC and parity coverage during simulation time 
4.3.2  Error latency and error propagation. When 
an error is injected in a track, the time of occurrence of the 
error and a code identifying which component caused the 
error are recorded. This allows us to monitor the error 
propagation in the system. Six error codes are defined: CCI 
(error occurred when data is stored in the cache controller 
interfaces to the channels/disks and to the cache memory), 
CM (error in the cache memory), D (error in the disk), B1 
(error during transmission over Bus 1), and B2 (error dur-
ing transmission over Bus 2). The time for an error to be 
overwritten (during a write operation) or detected (upon a 
read operation) is called error latency.  Since a track is 
considered faulty as soon as an error is injected, we record 
the latency associated with the first error injected in the 
track. This means that the error latency that we measure 
corresponds to the time between when the track becomes 
faulty and when the errors are overwritten or detected. 
Therefore, the error latency measured for each track is the 
maximum latency for errors present in the track. Figure 8 
plots the error latency probability density function for er-
rors, as categorized above, and error latency for all sam-
ples without taking into account the origin of errors (the 
unit of time is 0.1 ms). The latter distribution is bimodal. 
The first mode corresponds to a very short latency that re-
sults mainly from errors occurring over Bus 1 and detected 
by parity. The second mode corresponds to longer laten-
cies due to errors occurring in the cache memory or the 
disks, or to the propagation of errors occurring during data 
transfer inside the cache. Note that most of the errors es-
caping parity (error code B1) remain latent for a longer pe-
riod of time (as discussed in Section 3.2.3). 
The value of the latency depends on the input distribu-
tion. If the track is not frequently accessed, then errors pre-
sent in the track might remain latent for a long period of 
time. Figure 8-b shows that the latency of errors injected in 
the cache memory is slightly lower than the latency of er-
rors injected in the disk. This is because the disks are less 
frequently accessed than the cache memory. Finally, it is 
important to notice that the difference between the error la-
tency distribution for error codes B1 and B2 (Figure 8-a) is 
due to the fact that data transfers over Bus1 (during read 
and write operations) are covered by parity, whereas errors 
occurring during write operations over Bus 2 are detected 
later by EDAC or by FE-CRC when the data is read from 
the cache. Consequently, it would be useful to check 
EDAC before data is written to the cache memory in order 
to reduce the latency of errors due to Bus 2. 
4.3.3 Error distribution in the cache memory and 
in the disks. Analysis of error accumulation in the cache 
memory and disks provides valuable feedback, especially 
for scrubbing policy. Figure 9 plots the evolution in time 
of the percentage of faulty tracks in the cache memory and 
in disks (the unit of time is 15 minutes). An increasing 
trend is observed for the disks, whereas in the cache mem-
ory we observe a periodic behavior. In the latter case, the 
percentage of faulty tracks first increases and then de-
creases when either errors are detected upon read opera-
tions or are overwritten when tracks become dirty. Since 
the cache memory is accessed very frequently (every 5 
milliseconds in average) and the cache hit rate is high 
(more than 60%), errors are more frequently detected and 
overwritten in the cache memory than in the disks. The in-
crease of the number of faulty tracks in the cache affects 
the track reconstruction rate (number of reconstructions 
per unit of time), as illustrated in Figure 10. The average 
track reconstruction rate is approximately 8.7 10-5 per mil-
lisecond. It is noteworthy that the detection of errors in the 
cache memory does not necessarily lead to the reconstruc-
tion of a track (the track might still be valid in the disks). 
Nevertheless, the detection of errors in the cache has an 
impact on performance due to the increase in the number 
of accesses to the disk. Figure 9 indicates that different 
strategies should be considered for disk and cache memory 
scrubbing. The disk should be scrubbed more frequently 
than the cache memory; this prevents error accumulation, 
which can lead to inability to reconstruct a faulty track. 
5 Summary, discussion, and conclusions 
The dependability of a complex and sophisticated 
cache-based storage architecture is modeled and simulated. 
To ensure reliable operation and to prevent data loss, the 
system employs a number of error detection mechanisms 
and recovery strategies, including parity, EDAC, CRC 
checking, and support of redundant disks for data recon-
struction. Due to the complex interactions among these 
mechanisms, it is not a trivial task to accurately capture the 
behavior of the overall system in the presence of faults. To 
enable an efficient and detailed dependability analysis, we 
proposed a hierarchical, behavioral simulation-based ap-
proach in which the system is decomposed into several ab-
straction levels and a corresponding simulation model is 
associated with each level. In this approach, the impact of 
low-level faults is used in a higher level analysis. Using an 
appropriate hierarchical system decomposition, the com-
plexity of individual models can be significantly reduced 
while preserving the model’s ability to capture detailed 
system behavior. Moreover, additional details can be in-
corporated by introducing new abstraction levels and asso-
ciated simulation models.   
2 0.0
Latency
0E+01E+21E+41E+61E+8
latency
  a) all error codes   b) Bus 1, cache memory and disks 
Figure 8: Error latency distribution 
To demonstrate the capabilities of the methodology, we 
have conducted an extensive analysis of the design of a 
real, commercial cache RAID storage system. To our 
knowledge, this kind of analysis of a cache-based RAID 
system has not been accomplished either in academia or in 
the industry. The dependability measures used to charac-
terize the system include coverage of the different error de-
tection mechanisms employed in the system, error latency 
distribution classified according to the origin of an error, 
error accumulation in the cache memory and disks, and 
frequency of data reconstruction in the cache memory. To 
analyze the system under realistic operational conditions, 
we used real input traces to drive the simulations. It is im-
portant to emphasize that an analytical modeling of the 
system is not appropriate in this context due to the com-
plexity of the architecture, the overlapping of error detec-
tion and recovery mechanisms, and the necessity of captur-
ing the latent errors in the cache and the disks. Hierarchical 
simulation offers an efficient method to accomplish the 
above task and allows detailed analysis of the system to be 
performed using real input traces. 
The specific results of the study are presented in the 
previous sections. It is, however, important to summarize 
the key points that demonstrate the usefulness of the pro-
posed methodology. First, we focused on the analysis of 
the system behavior when it is stressed with high fault 
rates. In particular, we demonstrated that transient faults 
during a few microseconds—during data transfer over the 
busses or while the data is in the cache controller inter-
faces—may lead to bursts of errors affecting a large num-
ber of bits of the track. Moreover, despite the high fault in-
jection rate, high EDAC and CRC error detection coverage 
was observed, and the relatively low parity coverage was 
compensated for by the extra detection provided by CRC, 
which is stored with the data in the cache memory. 
The hierarchical simulation approach allowed us to per-
form a detailed analysis of error latency with respect to the 
origin of an error. The error latency distribution measured 
from the simulation, regardless the origin of the errors, is 
bimodal†: short latencies are mainly related to errors oc-
curring and detected during data transfer over the bus pro-
tected by parity, and the highest error latency was observed 
for errors injected into the disks. The analysis of the evolu-
tion during the simulation of the percentage of faulty 
tracks in the cache memory and the disks showed that, in 
                                                                            
†  Similar behavior was observed in other studies, e.g.,[1, 14]. 
spite of a high rate of injected faults, there is no error ac-
cumulation in the cache memory, i.e., the percentage of 
faulty tracks in the cache varies within a small range (0.5% 
to 2.5%, see Section 4.3.3), whereas an increasing trend 
was observed for the disks (see Figure 9). This is related to 
the fact that the cache memory is accessed very frequently, 
and errors are more frequently detected and overwritten in 
the cache memory than in the disks. The primary implica-
tion of this result, together with the results of the error la-
tency analysis, is the need for a carefully designed scrub-
bing policy capable of reducing the error latency with ac-
ceptable performance overhead. Simulation results suggest 
that the disks should be scrubbed more frequently than the 
cache memory in order to prevent error accumulation, 
which may lead to an inability to reconstruct faulty tracks.  
We should emphasize that the results presented in this 
paper are derived from the simulation of the system using a 
single, real trace to generate the input patterns for the 
simulation. Additional experiments with different input 
traces and longer simulation times should be performed to 
confirm these results. Moreover, the results presented in 
this paper are preliminary, as we addressed only the impact 
of errors affecting the data. Continuation of this work will 
include modeling of errors affecting the control flow in 
cache operations. The proposed approach is flexible 
enough to incorporate these aspects of system behavior. 
Including control flow will obviously increase the com-
plexity of the model, as more details about system behav-
ior must be described in order to simulate realistic error 
scenarios and provide useful feed back for the designers. It 
is clear that this kind of detailed analysis cannot be done 
without the support of a hierarchical modeling approach. 
Acknowledgments 
The authors are grateful to the anonymous reviewers 
whose comments helped improve the presentation of the 
paper and to Fran Baker for her insightful editing if our 
manuscript. This work was supported by the National 
Aeronautics and Space Administration (NASA) under 
grant NAG-1-613, in cooperation with the Illinois Com-
puter Laboratory for Aerospace Systems and Software 
(ICLASS), and by the Advanced Research Projects 
Agency under grant DABT63-94-C-0045. The findings, 
opinions, and recommendations expressed herein are those 
of the authors and do not necessarily reflect the position or 
policy of the United States Government or the University 
of Illinois, and no official endorsement should be inferred.  
References 
[1] J. Arlat, M. Aguera, Y. Crouzet, et al., «Experimental 
Evaluation of the Fault Tolerance of an Atomic Multicast 
System,» IEEE Transactions on Reliability, vol. 39, pp. 455-
467, 1990. 
[2] A. Campbell, P. McDonald, and K. Ray, «Single Event Upset 
Rates in Space,» IEEE Transactions on Nuclear Science, vol. 
39, pp. 1828-1835, 1992. 
1 11 21 31 41 51 61 71 81
cache memory
disks
%faulty tracks
frequency 
1 11 21 31 41 51 61 71 81
Figure 9: Percentage faulty  Figure 10: Frequency of  
   tracks in cache and disks  track reconstruction 
[3] C.-H. Chen and A. K. Somani, «A Cache Protocol for Error 
Detection and Recovery in Fault-Tolerant Computing Sys-
tems,» 24th IEEE International Symposium on Fault-
Tolerant Computing (FTCS-24), Austin, Texas, USA, 1994, 
pp. 278-287. 
[4] P. M. Chen, E. K. Lee, G. A. Gibson, et al., «RAID: High-
Performance, Reliable Secondary Storage,» ACM Computing 
Surveys, vol. 26, pp. 145-185, 1994. 
[5] M. B. Friedman, «The Performance and Tuning of a Stora-
geTek RAID 6 Disk Subsystem,» CMG Transactions, vol. 
87, pp. 77-88, 1995. 
[6] K. K. Goswami, «Design for Dependability: A simulation-
Based Approach,» PhD., University of Illinois at Urbana-
Champaign,  UILU-ENG-94-2204, CRHC-94-03, February 
1994. 
[7] K. K. Goswami, R. K. Iyer, and L. Young, «DEPEND: A 
simulation Based Environment for System level Dependabil-
ity Analysis,» IEEE Transactions on Computers, vol. 46, pp. 
60-74, 1997. 
[8] M. Holland, G. Gibson, A., and D. P. Siewiorek, «Fast, On-
Line Failure Recovery in Redundant Disk Arrays,» 23rd In-
ternational Symposium on Fault-Tolerant Computing (FTCS-
23), Toulouse, France, 1993, pp. 422-431. 
[9] R. Y. Hou and Y. N. Patt, «Using Non-Volatile Storage to 
improve the Reliability of RAID5 Disk Arrays,» 27th Int. 
Symposium on Fault-Tolerant Computing (FTCS-27), WA, 
Seattle, 1997, pp. 206-215. 
[10] G. E. Houtekamer, «RAID System: The Berkeley and 
MVS Perspectives,» 21st Int. Conf. for the Resource Man-
agement & Performance Evaluation of Enterprise Computing 
Systems (CMG'95), Nashville, Tennesse, USA, 1995, pp. 46-
[11] J. Menon, «Performance of RAID5 Disk Arrays with 
Read and Write Caching,» International Journal on Distrib-
uted and Parallel Databases, vol. 2, pp. 261-293, 1994. 
[12] J. Menon and J. Cortney, «The Architecture of a Fault-
Tolerant Cached RAID Controller,» 20th Annual Interna-
tional Symposium on Computer Architecture, San Diego, CA, 
USA, 1993, pp. 76-86. 
[13] D. A. Patterson, G. A. Gibson, and R. H. Katz, «A Case 
for Redundant Arrays of Inexpensive Disks (RAID),» ACM 
International Conference on Management of Data 
(SIGMOD), New York, 1988, pp. 109-116. 
[14] J. G. Silva, J. Carreira, H. Madeira, et al., «Experimental 
Assessment of Parallel Systems,» 26th International Sympo-
sium on Fault-Tolerant Computing (FTCS-26), Sendai, Ja-
pan, 1996, pp. 415-424.
ABSTRACT
  We present a hierarchical simulation approach for the dependability analysis
and evaluation of a highly available commercial cache-based RAID storage
system. The archi-tecture is complex and includes several layers of
overlap-ping error detection and recovery mechanisms. Three ab-straction levels
have been developed to model the cache architecture, cache operations, and
error detection and recovery mechanism. The impact of faults and errors
oc-curring in the cache and in the disks is analyzed at each level of the
hierarchy. A simulation submodel is associated with each abstraction level. The
models have been devel-oped using DEPEND, a simulation-based environment for
system-level dependability analysis, which provides facili-ties to inject
faults into a functional behavior model, to simulate error detection and
recovery mechanisms, and to evaluate quantitative measures. Several fault
models are defined for each submodel to simulate cache component failures, disk
failures, transmission errors, and data errors in the cache memory and in the
disks. Some of the parame-ters characterizing fault injection in a given
submodel cor-respond to probabilities evaluated from the simulation of the
lower-level submodel. Based on the proposed method-ology, we evaluate and
analyze 1) the system behavior un-der a real workload and high error rate
(focusing on error bursts), 2) the coverage of the error detection mechanisms
implemented in the system and the error latency distribu-tions, and 3) the
accumulation of errors in the cache and in the disks.

<|endoftext|><|startoftext|>
Introduction 
It is a long time conviction of scientists that the all systems in nature optimize certain 
mathematical measures in their motion. The search for such quantities has always been a 
major objective in the efforts to understand the laws of nature. One of these measures is the 
Lagrangian action considered as a most fundamental quantity in physics. The least action 
principle1 [1] has been used to derive almost all the physical laws for regular dynamics 
(classical mechanics, optics, electricity, relativity, electromagnetism, wave motion, etc.[2]). 
This achievement explain the efforts to extend the principle to irregular dynamics such as 
equilibrium thermodynamics[3], irreversible process [4], random dynamics[5][6], stochastic 
mechanics[7][8], quantum theory[9] and quantum gravity theory[10]. We notice that in most 
of these approaches, the randomness or the uncertainty (often measured by information or 
entropy) of the irregular dynamics is not considered in the optimization methods. For 
example, we often see expression such as RR δδ =  concerning the variation of a random 
variable R with an expectation R . This is incorrect because the variation of uncertainty 
aroused by the variation of the R may play important role in the dynamics.  
Another most fundamental measure, called entropy, is frequently used in variational 
methods of thermodynamics and statistics. The word "entropy" has a well known definition 
given by Clausius in the equilibrium thermodynamics. But it is also used as a measure of 
uncertainty in stochastic dynamics. In this sense, it is also referred to as "information" or 
"informational entropy". In contrast to the action principle, entropy and its optimization have 
always been a source of controversies. It has been used in different even opposite variational 
methods based on different physical understanding of the optimization. For instance, there is 
the principle of maximum thermodynamic entropy in statistical thermodynamics[11][12], the 
maximum information-entropy[13][14] in information theory, the principle of minimum 
entropy production [15] for certain nonequilibrium dynamics, and the principle of maximum 
entropy production for others[16][17]. Certain interpretation of entropy and of its evolution 
was even thought to be in conflict with the mechanical laws[18]. Notice that these laws can be 
derived from least action principle. In fact, the definition of entropy is itself a great matter of 
investigation for both equilibrium and nonequilibrium systems since the proposition of 
Boltzmann and Gibbs entropy. Concerning the maximum entropy calculus, few people still 
                                                 
1 We continue to use this term "least action principle" here considering its popularity in the scientific community, 
although we know nowadays that the term "optimal action" is more suitable because the action of a mechanical 
system can have a maximum, or a minimum, or a stationary for real paths[19]. 
contest the fact that the maximization of Shannon entropy yields the correct exponential 
distribution. But curously enough, few people are completely satisfied by the arguments of 
Jaynes and others[12][13][14] supporting the maximum entropy principle by considering 
entropy as anthropomorphic quantity and the principle as only an inference method. This 
question will be revisited to the end of the present paper.  
In view of the fundamental character of entropy in stochastic dynamics, it seems that the 
associated variation approaches must be considered as first principles and cannot be derived 
from other ones (such as least action principle) for regular dynamics where uncertainty does 
not exist at all. However, a question we asked is whether we can formulate a more general 
variation principle covering both the optimization of action for regular dynamics and the 
optimization of information-entropy for stochastic dynamics. We can imagine a mechanical 
system originally obeying least action principle and then subject to a random perturbation 
which makes the movement stochastic. For this kind of systems, we have proposed a 
stochastic action principle [20][21][22] which was originally a combination of maximum 
entropy principle (MEP) and least action principle on the basis of the following assumptions : 
1) A random Hamiltonian system can have different paths between two points in both 
configuration space and phase space. 
2) The paths are characterized uniquely by their action. 
3) The path information is measured by Shannon entropy. 
4) The path information is maximum for real process. 
This is in fact maximization of path entropy under the constraint associated with average 
action over paths (we assume the existence of this average measure). As expected, this 
variational principle leads to a path probability depending exponentially on the Lagrangian 
action of the paths and satisfying the Fokker-Planck equation of normal diffusion[21]. Some 
diffusion laws such as Fick's laws, Ohm's law, and Fourier's law can be derived from this 
probability distribution. We noticed that the above combination of two variation principles 
could be written in a concise form 0=Aδ [22], i.e., the variation of action averaged over all 
possible paths must vanish.  
However, many disadvantages exist in the above formalism. The first one is that not all 
the above physical assumptions are obvious and convincing. For example, concerning the 
path probability, another point of view[23] says that the probability should depend on the 
average energy on the paths instead of their action. The second disadvantage of that 
formalism is we used the Shannon entropy as a starting hypothesis, which limits the validity 
of the formalism. One may think that the principle is probably no more valid if the path 
uncertainty cannot be measured by the Shannon formula. The third disadvantage is that MEP 
is already a starting hypothesis, while it was expected that the work might help to understand 
why entropy goes to maximum.  
In this work, the reasoning is totally different even opposite. The only physical 
assumption we make is a stochastic action principle (SAP), i.e., 0=Aδ . The first and second 
assumptions mentioned above are not necessary because these properties will be extracted 
from experimental results. The third and fourth assumptions become purely the consequences 
of SAP. This work is limited to the classical mechanics of Hamiltonian systems for which the 
least action principle is well formulated. Neither relativistic nor quantum effects is 
considered.  
2) Stochastic dynamics of particle diffusion 
We consider a classical Hamiltonian systems moving, maybe randomly, in the 
configuration space between two points a and b. Its Hamiltonian is given by H=T+V and its 
Lagrangian by VTL −=  where T is the kinetic energy and V the potential one. The 
Lagrangian action on a given path is ∫=
LdtA  as defined in the Lagrangian mechanics. These 
definitions need sufficiently smooth dynamics at smallest time scales of observation. In 
addition, if there are random noises perturbing the motion, the energy variation due to the 
external perturbation or internal fluctuation is negligible at a time scale τ which is 
nevertheless small with respect to the observation period. Hence VTL −=  and VTH +=  
can exist, where T  and V  are kinetic and potential energies averaged over τ  such as 
TdtT . 
It is known that if there is no random forces and if the duration of motion tab= tb -ta from a 
to b is given, there is only one possible path between a and b. However, this uniqueness of 
transport path disappears if the motion is perturbed by random forces. An example is the case 
of particle diffusion in random media, where many paths between two given points are 
possible. This effect of noise can be easily demonstrated by a thought experiment in Figure 1. 
See the caption for detailed description. In this experiment, it is expected that more a path is 
different from the least action path (straight line in the figure) between a and b, less there are 
particles traveling on that path, i.e., smaller is the probability that the path is taken by the 
particles.  
Dust particles 
 h1 h2
Figure 1 
A thought experiment for the random diffusion of the dust particles falling in the 
air. At time ta, the particles fall out of the hole at point a. At time tb, certain 
particles arrive at point b. The existence of more than one path of the particles 
from a to b can be proved by the following operations. Let us open only one hole 
h1 on a wall between a and b, we will observe dust particles at point b at time tb. 
Then close the hole h1 and open another hole h2, we can still observe particles at 
point b at time tb, as illustrated by the two curves passing respectively through h1 
and h2. Another observation of this experiment is that more a path is different 
from the vertical straight line between a and b, less there are particles traveling on 
that path, i.e., smaller is the probability that the path is taken by the particles. This 
observation can be easily verified by the numerical experiment in the following 
section. 
Now let us suppose W discrete paths from a to b. Among a very large N particles leaving 
the point a, we observe Nk ones arriving at point b by the path k. Then the probability for the 
particles to take the path k is defined by 
Nkp kab =)( . The normalization is given by 
1)( =∑ kp
ab  or, in the case of continuous space, by the path integral 1)( =∫ prD ab , where r 
denotes the continuous coordinates of the paths.  
3) A numerical experiment of particle diffusion and path probability 
Does the probability 
Nkp kab =)(  really exist for each path? If it exists, how does it 
change from path to path? What are the quantities associated with the paths which determines 
the change in path probability? To answer these questions, we have carried out numerical 
experiments (Figure 2) showing the dust particles fall from a small hole a on the top of a two 
dimensional experimental box to the bottom of the box. A noise is introduced to perturb 
symmetrically in the direction of x the falling particles. We have used three kind of noises: 
Gaussian noise, uniform noise (with amplitudes uniformly distributed between -1 and 1) and 
truncated uniform noises (uniform noise with a cutoff of magnitude between -z and z where 
z<1, i.e., the probability is zero for the magnitude between –z and z).  
Figure 2 
2a: Model of the numerical experiment showing the dust particles fall from 
a small hole a onto the bottom of the experimental box. The distribution of 
particles on the bottom (represented by the vertical bars) is caused by the 
random noise (air for example) in the direction of x. 2b: An example of 
experimental results in which the falling particles are perturbed by a noise 
whose magnitude is uniformly distributed between -1 and 1 in x. The 
vertical bars are experimental result and the curve is a Gaussian distribution 
( ) dxxxN
xdNxdp )
1)()( 2
−== , where dN(x) is the particle number in 
the interval x—x+dx, N is the total number of falling particles and σ is the 
standard deviation (sd). The experiments show that the dp(x) is always 
Gaussian whatever the noise (uniform, Gaussian or other truncated uniform 
noises).  
Dust particles 
x0 x 
The observed distributions of particles are Gaussian for the three noises. The standard 
deviation of the distributions is uniquely determined by the nature of the noise (type, maximal 
magnitude, frequency etc.). This result was expected because of the finite variances of the 
used noises and of the central limit theorem saying that the attractor distribution is a normal 
(Gaussian) one if the noises (random variables) have finite variance.  
What can we conclude from this experiment of falling particles which seems to be trivial? 
First, let us suppose that the falling distance h is small so that the path y between a and 
any position x on the bottom can be considered as a straight line and the average velocity on y 
can be given by 
y  where τ  is the motion time from a to x (see Figure 2a). In this case, it is 
easy to show that the action Ax from a to x is proportional to (x-x0)2, i.e.,  
τττττ
2)(222222
2222 hmxxmh
mymmghymmghymVTAx −−=−=−=−=−=   
where 
mT =  and 
V =  are the average kinetic and potential energy, respectively. 
This analysis applies to any smooth motion provided h is small. Considering the observed 
Gaussian distribution of the falling particles in figure 2, we can write for small h 
)exp()( AxdN xη−∝  where η is a constant. The probability that a particle takes the small 
straight path from a to x is proportional to the exponential of action Ax.  
Now let us consider large h. In this case the paths may not be straight lines. But a curved 
path from a to x can be cut into small intervals at x1, x2, .... The above analysis is still valid for 
each small segment. The probability that a particle takes the path to x is then equal to the 
product of the probabilities on every segment of that path from a to x and should be 
proportional to the exponential of the total action from a to x, i.e.,  
( ) ( ) ( )AAAp axi i
iax ηηη −=∑−=∏ −∝ expexpexp  (1) 
where Ai is the action on the segment xi and Aax is the total action on a given path from a to x. 
The constant η is a characteristic of the noise and should be the same for every segment. The 
conclusion of this section is the path probability depends exponentially on action as long as 
the particle distribution on the bottom of the box is Gaussian.  
Concerning the exponential form of path probability, there is another proposal [23] 
( )kab Hkp γ−∝ exp)( , i.e., the path probability depends exponentially on the negative 
average energy. According to this probability, the most probable path has minimum average 
energy, so that for vanishing noise (regular dynamics), this minimum energy path would be 
the unique one which must also follow the least action principle. Here we have a paradox 
because the real path given by least action principle is in general not the path of minimum 
average energy.  
4) An action principle for stochastic dynamics 
Recently, the following stochastic action principle (SAP) was postulated[20][22]  : 
0=Aδ  (2) 
where AprDA abδδ ∫= )(  is the average of the variation Aδ  over all the paths. It can be 
written as follows  
pArDAprD
AprDA
where ∫= AprDA ab)(  is the ensemble average of action A, and abSδ  is defined by 
( ) pArDAAS abab δηδδηδ ∫=−= )( . (4) 
Eq.(4) makes it possible to derive Sab directly from probability distribution if the latter is 
known. Let us consider the dynamics in the section 3 that has the exponential path probability  
η−= exp1  (5) 
where ( )∫ −= ArDZ ηexp)(  is the partition function of the distribution. A trivial calculation 
tell us that abSδ is a variation of the path entropy Sab given by Shannon formula 
∫−= pprDS ababab ln)( . (6) 
Eq.(4) is a definition of entropy or information as a measure of uncertainty of random 
variable (action in the present case)[26]. It mimics the first law of thermodynamics 
dWdUdQ +=  where EpEU i
i∑==  is the average energy,  Ei is the energy of the state i with 
probability pi, dW is the work of the forces )(
∑−=  and qj is some extensive 
variables such as volume, surface, magnetic moment etc. The work can be written as 
∑ −=−=∑ ⎟
i dEdEpdqq
EpdW . So the first law becomes dEEddQ −= . We see that by 
Eq.(4) a “heat” Q is defined as the measure the randomness of action (or of any other random 
variables in general[26]). In Eq.(6), this heat” is related to the Shannon entropy since the 
probability is exponential. If the probability is not exponential, the functional of the entropy is 
probably different from the Shannon one, as discussed in [26]. 
With the help of Eqs.(2) and (5), it is easy to verify that  
App abab δηδ −=  (7) 
and  
App abab δηδ
22 −= . (8) 
From Eqs.(7) and (8), the maximum condition of pab , i.e., 0=pabδ  and 0
2 <pabδ , is 
transformed into 0=Aδ  and 02 >Aδ  if the constant η is positive, that is the least action path 
is the most probable path. On the contrary, if η is negative, we get 0=Aδ  and 02 <Aδ , the 
most probable path is a maximum action one.  
In our previous work, we have proved that the probability distribution of Eq.(5) satisfied 
the Fokker-Planck equation in configuration space. It is easy to see that[20], in the case of 
free particle, Eq.(5) gives us the transition probability of Brownian motion with 0
where m is the mass and D the diffusion constant of the Brownian particle[25].  
5) Return to the regular least action principle 
The stochastic action principle Eq.(2) should recover the usual least action principle 0=Aδ  
when the stochastic dynamics tends to regular dynamics with vanishing noise. To show this, 
let us put the probability Eq.(5) into Eq.(6), a straightforward calculation leads to 
AZSab η+=ln . (9) 
In regular dynamics, pab=1 for the path of optimal (maximal or minimal or stationary) 
action A0 and pab=0 for other paths having different actions, so that 0=abS  from Eq.(6). We 
have only one path, the integral in the partition function gives ( ) ( )0expexp)( AAqDZ ηη −=∫ −= . 
Eq.(9) yields 0AA= . On the other hand, we have ( ) 0=−= AASab δδηδ . Thus our principle 0=Aδ  
implies 00== AA δδ  or, more generally, 0=Aδ . This is the usual action principle. 
6) Stochastic action principle and maximum entropy  
Eq.(3) tells us that the SAP given by Eq.(2) implies 
0)( =− ASab ηδ . (10) 
meaning that the quantity ( )ASab γ−  should be optimized. If we add the normalization 
condition, the SAP becomes: 
0)]1)(([ =∫ −+− pqDAS abab αηδ  (11) 
which is just the usual Jaynes principle of maximum entropy. Hence Eq.(2) is equivalent to 
the Jaynes principle applied t path entropy.  
Is Eq.(2) simply a concise mathematical form of Jaynes principle associated to average 
action? Or is there something of fundamental which may help us to understand why entropy 
gets to maximum for stable or stationary distribution?    
From section 4, we understand that, in the case of equilibrium system, the variation dEi  is 
a work dW. However, in the case of regular mechanics, dW=0 is the condition of equilibrium 
meaning that the sum of all the forces acting on the system should be zero and the net torque 
taken with respect to any axis must also vanish. So it seems reasonable to take 0=dEi  as an 
equilibrium condition for stochastic equilibrium. In other words, when a random motion is in 
(global) equilibrium, the total work ∑ ⎟
i dqq
EpdW  by all the random forces 
on all the virtual increments dqj of a state variable (e.g., volume) must vanish. As a 
consequence of  the first law, 0=dEi  naturally leads to 0][ =− US ηδ , i.e., Jaynes maximum 
entropy principle associated with the average energy 0]1[ =+− αηδ US  where S is the 
thermodynamic entropy. This analysis seems to say that the maximum entropy (maximum 
randomness) is required by the mechanical equilibrium condition in stochastic situation.  
Remember that dEi  can also be written as a variation of free energy TSUF −= , i.e., 
dFdEi = . The stochastic equilibrium condition can be put into 0=dF . 
Coming back to our SAP in Eq.(2), the system is in nonequilibrium motion. If there is no 
noise, the true path satisfies 0=Aδ  and 0=
. When there is noise perturbation, 
we have[22]  
0)( =∑ ⎥
∂∫= ∫
j ab drdtr
prDdA  
(12) 
where 0≠=
t jjj
 is the random force on drj. Let ∫=
j dtft
f 1  be the time average of 
the random force fj, we obtain 
[ ] 0)( ==∑= ∫ dWtdrfprDtdA ab
jjj abab  
(13) 
where [ ] ∑=∑= ∫∫
jj abj
jjj ab dWprDdrfprDdW )()(  is the ensemble average (over all paths) of the 
time average ∫=∫=
j dtWdt
dtrdft
dW 11   and rdfdW jjj=  is the work of random force 
over the variation (deformation) rd j  of a given path. Eq.(13) means 
0=dW  (14) 
since tab is arbitrary. Eq.(14) implies that the average work of the random forces at any 
moment over any time interval and over arbitrary path deformation must vanish. This 
condition can be satisfied only when the motion is totally random, a state at which the system 
does not have privileged degrees of freedom without constraints. Indeed, it is easy to show 
that the maximum entropy with only the normalization as constraint yields totally 
equiprobable paths. This argument also holds for equilibrium systems. The vanishing work 
0==dWdEi  needs that, if there is no other constraint than the normalization, no degree of 
freedom is privileged, i.e., all microstates of the equilibrium state should be equiprobable. 
This is the state which has the maximum randomness and uncertainty.  
To summarize this section, the optimization of both equilibrium entropy and 
nonequilibrium path entropy is simply the requirement of the mechanical equilibrium 
conditions in the case of stochastic motion. There is no mystery in that. Entropy or dynamical 
randomness (uncertainty) must take the largest value for the system to reach a state where the 
total virtual work of the random forces should vanish. Entropy is not necessarily 
anthropomorphic quantity as claimed by Jaynes[14] to be able to take maximum for correct 
inference. Entropy is nothing but a measure of physical uncertainty of stochastic situation.  
Hence maximum entropy is not merely an inference principle. It is a law of physics. This is a 
major result of the present work. 
7) Concluding remarks 
We have presented numerical experiments showing the path probability distribution of 
some stochastic dynamics depends exponentially on Lagrangian action. On this basis, a 
stochastic action principle (SAP) formulated for Hamiltonian system perturbed by random 
forces is revisited. By using a new definition of statistical uncertainty measure which mimics 
the heat in the first law of equilibrium thermodynamics, it is shown that, if the path 
probability is exponential of action, the measure of path uncertainty we defined is just 
Shannon information entropy. It is also shown that the SAP yields both the Jaynes principle of 
maximum entropy and the conventional least action principle for vanishing noise. It is argued 
that the maximum entropy is the requirement of the conventional mechanical equilibrium 
condition for the motion of random systems to be stabilized, which means the total virtual 
work of random forces should vanish at any moment within any arbitrary time interval. This 
implies, in equilibrium case, 0=dEi , and in nonequilibrium case, 0== dWdA . In both cases, 
the randomness of the motion must be at maximum in order that all degrees of freedom are 
equally probable if there is no constraint. By these arguments, we try to give the maximum 
entropy principle, considered by many as only an inference principle, the status of a 
fundamental physical law. 
References 
[1] P.L.M. de Maupertuis, Essai de cosmologie (Amsterdam, 1750) 
[2] S. Bangoup, F. Dzangue, A. Jeatsa, Etude du principe de Maupertuis dans tous ses 
états, Research Communication of ISMANS, June 2006 
[3] L. De Broglie, La thermodynamique de la particule isolée, Gauthier-Villars éditeur, 
Paris, 1964 
[4] L. Onsager and S. Machlup, Fluctuations and irreversible processes, Phys. Rev., 
91,1505(1953); L. Onsager, Reciprocal relations in irreversible processes I., Phys. 
Rev. 37, 405(1931) 
[5] M.I. Freidlin and A.D. Wentzell, Random perturbation of dynamical systems, 
Springer-Verlag, New York, 1984 
[6] G.L. Eyink, Action principle in nonequilibrium statistical dynamics, Phys. Rev. E, 
54,3419(1996) 
[7] F. Guerra and L. M. Morato, Quantization of dynamical systems and stochastic 
control theory, Phys. Rev. D, 27, 1774(1983) 
[8] F. M. Pavon, Hamilton's principle in stochastic mechanics, J. Math. Phys., 36, 
6774(1995);  
[9] R.P. Feynman and A.R. Hibbs, Quantum mechanics and path integrals, 
McGraw-Hill Publishing Company, New York, 1965 
[10] S.W. Hawking, T. Hertog, Phys. Rev. D, 66(2002)123509; 
S.W. Hawking, Gary.T. Horowitz, Class.Quant.Grav., 13(1996)1487; 
S. Weinberg, Quantum field theory, vol.II, Cambridge University Press, 
Cambridge, 1996 (chapter 23: extended field configurations in particle 
physics and treatments of instantons) 
[11] J. Willard Gibbs, Principes élémentaires de mécanique statistique (Paris, Hermann, 
1998) 
[12] E.T. Jaynes, The evolution of Carnot's principle, The opening talk at the EMBO 
Workshop on Maximum Entropy Methods in x-ray crystallographic and biological 
macromolecule structure determination, Orsay, France, April 24-28, 1984;  
[13] M. Tribus, Décisions Rationelles dans l'incertain (Paris, Masson et Cie, 1972) P14-
26; or Rational, descriptions, decisions and designs (Pergamon Press Inc., 1969)  
[14] E.T. Jaynes, Gibbs vs Boltzmann entropies, American Journal of Physics, 
33,391(1965) ; Where do we go from here? in Maximum entropy and Bayesian 
methods in inverse problems, pp.21-58, eddited by C. Ray Smith and W.T. Grandy 
Jr., D. Reidel, Publishing Company (1985) 
[15] I. Prigogine, Bull. Roy. Belg. Cl. Sci., 31,600(1945) 
[16] L.M. Martyushev and V.D. Seleznev, Maximum entropy production principle in 
physics, chemistry and biology, Physics Reports, 426, 1-45 (2006) 
[17] G. Paltridge, Quart. J. Roy. Meteor. Soc., 101,475(1975) 
[18] J.R. Dorfmann, An introduction to Chaos in nonequilibrium statistical mechanics, 
Cambridge University Press, 1999 
[19] C.G.Gray, G.Karl, V.A.Novikov, Progress in Classical and Quantum Variational 
Principles, Reports on Progress in Physics (2004), arXiv: physics/0312071 
[20] Q.A. Wang, Maximum path information and the principle of least action for 
chaotic system, Chaos, Solitons & Fractals, (2004), in press; cond-mat/0405373 
and ccsd-00001549 
[21] Q.A. Wang, Maximum entropy change and least action principle for 
nonequilibrium systems, Astrophysics and Space Sciences, 305 (2006)273 
[22] Q.A. Wang, Non quantum uncertainty relations of stochastic dynamics, Chaos, 
Solitons & Fractals, 26,1045(2005), cond-mat/0412360 
[23] R. M. L. Evans, Detailed balance has a counterpart in non-equilibrium steady 
states, J. Phys. A: Math. Gen. 38 293-313(2005)  
[24] V.I. Arnold, Mathematical methods of classical mechanics, second edition, 
Springer-Verlag, New York, 1989, p243 
[25] R. Kubo, M. Toda, N. Hashitsume, Statistical physics II, Nonequilibrium 
statistical mechanics, Springer, Berlin, 1995 
[26] Q.A. Wang, Some invariant probability and entropy as a measure of uncertainty, 
cond-mat/0612076 
ABSTRACT
  A stochastic action principle for stochastic dynamics is revisited. We
present first numerical diffusion experiments showing that the diffusion path
probability depend exponentially on average Lagrangian action. This result is
then used to derive an uncertainty measure defined in a way mimicking the heat
or entropy in the first law of thermodynamics. It is shown that the path
uncertainty (or path entropy) can be measured by the Shannon information and
that the maximum entropy principle and the least action principle of classical
mechanics can be unified into a concise form. It is argued that this action
principle, hence the maximum entropy principle, is simply a consequence of the
mechanical equilibrium condition extended to the case of stochastic dynamics.

<|endoftext|><|startoftext|>
CONSTRAINING THE DARK ENERGY EQUATION OF
STATE WITH COSMIC VOIDS
Jounghun Lee and Daeseong Park
Department of Physics and Astronomy, FPRD, Seoul National University, Seoul 151-747,
Korea
jounghun@snu.ac.kr
ABSTRACT
Our universe is observed to be accelerating due to the dominant dark en-
ergy with negative pressure. The dark energy equation of state (w) holds a key
to understanding the ultimate fate of the universe. The cosmic voids behave
like bubbles in the universe so that their shapes must be quite sensitive to the
background cosmology. Assuming a flat universe and using the priors on the
matter density parameter (Ωm) and the dimensionless Hubble parameter (h),
we demonstrate analytically that the ellipticity evolution of cosmic voids may
be a sensitive probe of the dark energy equation of state. We also discuss the
parameter degeneracy between w and Ωm.
Subject headings: cosmology:theory — large-scale structure of universe
Recent observations have revealed that our universe is flat and in a phase of acceler-
ation (Riess et al. 1998; Perlmutter et al. 1999; Spergel et al. 2003). It implies that some
mysterious dark energy fills dominantly the universe at present epoch, exerting anti-gravity.
The nature of this mysterious dark energy which holds a key to understanding the ultimate
fate of the universe is often specified by its equation of state, i.e., the ratio of its pressure to
density: w ≡ Pde/ρde. The anti-gravity of the dark energy corresponds to the negative value
of w. The simplest candidate for the dark energy is the vacuum energy (Λ) with w = −1
that is constant at all times (Einstein 1917). Although all current data are consistent with
the vacuum energy model (e.g., Wang & Tegmark 2004; Jassal et al. 2004; Percival 2005;
Guzzo et al. 2008), the notorious failure of the theoretical estimate of the vacuum energy
density (see Caroll et al. 1992, for a review) has led a dynamic dark energy model to emerge
as an alternative. In this dynamic dark energy models which is often collectively called
quintessence, the dark energy is described as a slowly rolling scalar field with time-varying
equation of state in the range of −1 < w < 0 (Caldwell et al. 1998).
http://arxiv.org/abs/0704.0881v2
– 2 –
The following observables have so far been suggested to discriminate the dark en-
ergy models: the luminosity-distance measure of type Ia supernova (Riess et al. 2004, 2007;
Davis et al. 2007; Kowalski et al. 2008); the abundance of galaxy clusters as a function of
mass (Wang & Steinhardt 1998; Haiman et al. 2001; Weller et al. 2002), the baryonic acous-
tic oscillations in the galaxy power spectrum (Blake & Glazebrook 2003; Hu & Haiman 2003;
Cooray 2004; Seo & Eisentein 2005), and the weak gravitational lensing effect (Hu 1999;
Huterer 2001; Takada & Jain 2004; Song & Knox 2004). True as it is that these observables
can constrain powerfully the value of w, it is still quite necessary and important to find out
as many different observables as possible for consistency tests.
Another possible observable as a dark energy constraint may be the shapes of the cos-
mic voids. As the voids behave like bubbles due to their extremely low densities, their
shapes determined by the spatial distribution of the void galaxies tend to change sensitively
according to the competition between the tidal distortion and the gravitational rarefaction
effect. Therefore, the shape evolution of the voids must depend sensitively on the background
cosmology. In this Letter we study the ellipticity evolution of cosmic voids in the QCDM
(quintessence + cold dark matter) model with the help of the analytic formalism developed
by Park & Lee (2007) and explore the possibility of using it as a complimentary probe of
the dark energy equation of state.
According to Park & Lee (2007), the shape of a void region is related to the eigenvalues
of the local tidal shear tensor as
λ1(µ, ν) =
1 + (δv − 2)ν2 + µ2
(µ2 + ν2 + 1)
, (1)
λ2(µ, ν) =
1 + (δv − 2)µ2 + ν2
(µ2 + ν2 + 1)
, (2)
where {λi}3i=1 (with λ1 > λ2 > λ3) are the three eigenvalues of the local tidal field smoothed
on void scale, δv is the density contrast threshold for the formation of a void: δv =
i=1 λi,
and {µ, ν} (with ν < µ) represents a set of the two parameters that quantify the anisotropic
distribution of the void galaxies. They defined the void ellipticity as ε ≡ 1−ν and evaluated
its probability density distribution as
p(1− ε; z) = p(ν; z, RL) =
p[µ, ν|δ = δv; σ(z, RL)]dµ
10πσ5(z, RL)
2σ2(z, RL)
15δv(λ1 + λ2)
2σ2(z, RL)
× exp
15(λ21 + λ1λ2 + λ
2σ2(z, RL)
(2λ1 + λ2 − δv)
×(λ1 − λ2)(λ1 + 2λ2 − δv)
4(δv − 3)2µν
(µ2 + ν2 + 1)3
. (3)
– 3 –
Here, σ(z, RL)) ≡ D2(z)
∆2(k)W 2(kRL)d ln k is the linear rms fluctuation of the matter
density field smoothed on a Lagrangian void scale of RL at redshift z where D(z) is the
linear growth factor, W (kRL) is a top-hat window function, and ∆
2(k) is the dimensionless
linear power spectrum. Throughout this study, we adopt the linear power spectrum of the
cold dark matter cosmology (CDM) that does not depend explicitly on w (Bardeen et al.
1986).
Equation (3) was originally derived under the assumption of a ΛCDM model (w = −1).
We propose here that it also holds good for the case of a QCDM (quintessence+CDM) model
where the dark energy equation of state changes with time as w(z) = w0 + waz/(1 + z)
(Chevallier & Polarski 2001; Linder 2003) where w0 is the value of w at present epoch and
wa quantifies how the dark energy equation of state changes with time. Then, we employ
the following approximation formula for the linear growth factor, D(z), for a QCDM model
(Basilakos 2003; Percival 2005):
D(z) =
2(z + 1)
Ωαm − ΩQ +
(1 +AΩQ)
. (4)
where
E2(z) = Ωm(1 + z)
3 + ΩQ(1 + z)
−f(z), (5)
f(z) = −3(1 + w0)−
2 ln(1 + z)
, (6)
5− 2/(1− w)
(1− w)(1− 3w/2)
(1− 6w/5)3
[1− Ωm], (7)
A = − 0.28
w + 0.08
− 0.3. (8)
The CDM density parameter Ωm and the dark energy density parameter ΩQ evolve with z
respectively as
Ωm(z) =
Ωm0(1 + z)
E2(z)
, ΩQ(z) =
E2(z)(1 + z)f(z)
, (9)
where Ωm0 and ΩQ0 represent the present values. Equation (3) implies that the mean elliptic-
ity of voids decreases with z. A key question is how the rate of the decrease changes with the
dark energy equation of state. Since most of the recent observations indicate that the dark
energy equation of state at present epoch is consistent with w = −1 (e.g., see Guzzo et al.
2008, and references therein) we focus on how the mean void ellipticity depends on the value
of wa. Even in case that w0 = −1, if wa is found to deviate from zero, it would imply the
dynamic dark energy, disproving the simple ΛCDM model.
To explore how the void ellipticity evolution depends on wa, we evaluate the mean
ellipticity of voids as ε̄(z) =
ε p(ε;RL, z)dǫ for different values of wa through equations
– 4 –
(3)- (9). The other key cosmological parameters are set at Ωm = 0.75,ΩQ = 0.75, h = 0.73,
σ8 = 0.9 and w0 = −1. When the abundance of evolution of galaxy clusters is used to
constrain the dark energy equation of state, the cluster mass is usually set at a certain
threshold, MR, defined as the mass within a certain comoving radius (Wang & Steinhardt
1998). Likewise, we set the Lagrangian scale of a void, RL at 4h
−1Mpc, which is related
to the mean effective radius of a void as R̄E = (1 + δv)
−1/3R̄L/(1 + z). The Lagrangian
scale RL = 4h
−1Mpc corresponds to the mean effective size of a void at present epoch,
RE ∼ 8.5h−1Mpc.
Figure 1 plots ε̄(z) for the four different cases: wa = −1/3, 0, 1/3 and 2/3 (long-dashed,
solid, dashed, and dotted line, respectively). As can be seen, the higher the value of wa is,
the more rapidly ε̄(z) decreases. It also suggests that ε̄(z) is well approximated as a linear
function of z in recent epochs (0 < z < 0.2). Therefore, we fit ε(z) to a straight line as
ε̄(z) ≈ Avz + Bv. Varying the value of wa in the range of [0, 2/3], we compute the best-fit
slope Av. The range, 0 ≤ wa ≤ 2/3, corresponds to the dark energy equation of state range,
−1 ≤ w ≤ −0.9. The result is plotted in Fig. 2. As can be seen, the void ellipticity evolves
more rapidly as the value of wa increases. That is, the void ellipticity undergoes a stronger
evolution when the anti-gravitational effect is less strong in recent epochs. Note that Av
shows a noticeable 30% difference as the dark energy equation of state changes w from −1
to −0.9.
We have so far neglected the parameter degeneracy between w and the other key pa-
rameters. However, as the dependence of the void ellipticity distribution on the dark energy
equation of state comes from its dependence on ∆2(k; Ωm0, σ8, h, w), it is naturally expected
that there should be a strong parameter degeneracy. Here, we focus on the degeneracy be-
tween Ωm0 and w. First, we recompute Av, varying the values of Ωm0 and w0 with setting
wa = 1/3. The left panel of Fig. 3 plots a family of the degeneracy curves in the Ωm0-w0
plane for the three different values of Av. As can be seen, there is a strong degeneracy
between the two parameters. For a given value of Av, the value of w0 increases as the value
of Ωm0 decreases. A similar trend is also found in the Ωm0-wa degeneracy curves that are
plotted in the right panel of Fig. 3 for which the value of w0 is set at −1. It is worth noting
that this degeneracy trend is orthogonal to that found from the cluster abundance evolution
(see Fig.3 in Wang & Steinhardt 1998). Thus, when combined with the cluster analysis, the
void ellipticity analysis may be useful to break the degeneracy between Ωm0 and w.
We have shown that the void ellipticity evolution is in principle a useful constraint of the
dark energy equation of state. We have also shown that it provides a new degeneracy curve
for Ωm0 and w. When combined with the cluster abundance analysis, it should be useful
to break the degeneracy. Furthermore, unlike the mass measurement of high-z clusters
– 5 –
which suffers from considerable scatters, the void ellipticities are readily measured from the
positions of the void galaxies without requiring any additional information.
To use our analytic tool in practice to constrain the dark energy equation of state,
however, it will require to account for the redshift distortion effect since the positions of the
void galaxies are measured in redshift space. In our companion paper (Park & Lee 2009 in
preparation), we have analyzed the Millennium Run Redshift-Space catalog (Springel et al.
2005) and determined the ellipticity distribution of the galaxy voids. From this analysis, it
is somewhat unexpectedly found that the void ellipticity distribution measured in redshift
space is hardly changed from the one in real space. In fact, this result is consistent with
the recent claims of Hoyle & Vogeley (2007) and that of van de Weygaert (2008, private
communication) who have already pointed out that the redshift distortion effect has only
negligible, if any, effect on the shapes of voids. We hope to constrain the dark energy
equation of state by applying our theoretical tool to real observational data and report the
result elsewhere in the near future.
We thank an anonymous referee for a constructive report. J.L. am very grateful to
S.Basilakos for very helpful discussion and comments. This work is financially supported
by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korean
Government (MOST, NO. R01-2007-000-10246-0).
– 6 –
REFERENCES
Bardeen, J. M., Bond, J. R., Kaiser, N., & Szalay, A. S. 1986, ApJ, 304, 15
Basilakos, S. 2003, ApJ, 590, 636
Blake, C., & Glazebrook, K. 2003, ApJ, 594, 665
Caldwell, R. R., Dave, R. & Steinhardt, P.J. 1998, Phys. Rev. Lett., 80, 1582
Carroll, S., Press, W.H. & Turner, E.C. 1992, Ann. Rev. , 30, 499
Chevallier, M. & Polarski, D. 2001, Int. J. Mod. Phys. D, 10, 213
Cooray, A. 2004, MNRAS, 348, 250
Davis, T. M. 2007, ApJ, 666, 716
Einstein, A. 1917, Sitz. Preuss. Akad. Wiss., 142
Guzzo, L., et al. 2008, Nature, 451, 31
Haiman, Z., Mohr, J.J. & Holder, G.P. 2001, ApJ, 553, 545
Hoyle, F., & Vogeley, M. S. 2002, ApJ, 566, 641
Hu, W. 1999, ApJ, 522, L21
Hu, W. & Haiman, Z. 2003, Phys. Rev. D, 68, 063004
Huterer, D. 2001, Phys. Rev. D, 65, 63001
Jassal, H.K., Bagla, J.S., & Padmanabhan, T. 2004, MNRAS, 356, L11
Kowalski, M. et al. 2008, ApJ, 686, 749
Linder, E. 2003, Phys. Rev. Lett., 90, 091301
Park, D., & Lee, J. 2007, Phys. Rev. Lett., 98, 081301
Perlmutter, S. et al. 1999, ApJ, 517, 565
Percival, W. J. 2005, A&A, 819, 830
Riess, A. G. et al. 1998, ApJ, 116, 1009
Riess, A. G. et al. 2004, ApJ, 607, 665
– 7 –
Riess, A. G. et al. 2007, ApJ, 659, 98
Seo, H. & Eisenstein, D. J. 2005, ApJ, 633, 575
Song, Y. S. & Knox, L. 20034 Phys. Rev. D, 70, 063510
Spergel, D. N. et al. 2003, ApJS, 148, 175
Springel, V. et al. 2005, Nature, 435, 629
Strauss, M. et al. 2002, ApJ, 124, 1810
Takada, M. & Jain, B. 2004, MNRAS, 348, 897
Wang, L. & Steinhardt, P. J. 1998, ApJ, 508, 483
Wang, Y. & Tegmark, M. 2004, Phys. Rev. Lett., 92, 241302
Weller, J., Battye, R. A., & Kneissl, R. 2002, Phys. Rev. Lett., 88, 231301
This preprint was prepared with the AAS LATEX macros v5.2.
– 8 –
Fig. 1.— Mean ellipticity of the voids with RL = 4h
−1Mpc as a function of z.
– 9 –
Fig. 2.— Slope of the void ellipticity as a function of wa.
– 10 –
Fig. 3.— Contours of Av in the Ωm0-w0 (left) and in the Ωm-wa (right) plane.
ABSTRACT
  Our universe is observed to be accelerating due to the dominant dark energy
with negative pressure. The dark energy equation of state (w) holds a key to
understanding the ultimate fate of the universe. The cosmic voids behave like
bubbles in the universe so that their shapes must be quite sensitive to the
background cosmology. Assuming a flat universe and using the priors on the
matter density parameter (Omega_m) and the dimensionless Hubble parameter (h),
we demonstrate analytically that the ellipticity evolution of cosmic voids may
be a sensitive probe of the dark energy equation of state. We also discuss the
parameter degeneracy between w and Omega_m.

<|endoftext|><|startoftext|>
Introduction
The functionals that we now call uniform measures were originally studied by Berezanskǐı [1],
Csiszár [2], Fedorova [3] and LeCam [17]. The theory was later developed in several directions
by a number of other authors; see the references in [19] and [21].
Uniform measures need not be countably additive, but they have a number of properties that
have traditionally been formulated and proved for countably additive measures, or countably
additive functionals on function spaces. The main result in this paper, in section 3, is that
countably additive measures are uniform measures on a large class of uniform spaces (on all
uniform spaces, if every cardinal has measure zero).
Section 4 deals with the functionals that behave like uniform measures on sequences of func-
tions; or, equivalently, like countably additive measures on bounded uniformly equicontinuous
sets. In the case of a topological group with its right uniformity, these functionals were defined
by Ferri and Neufang [6] and used in their study of topological centres in convolution algebras.
2 Notation
In the whole paper, linear spaces are assumed to be over the field R of reals. Uniform spaces are
assumed to be Hausdorff. Uniform spaces are described by uniformly continuous pseudometrics
([11], Chap. 15), abbreviated u.c.p.
When d is a pseudometric on a set X , define
Lip(d) = {f : X → R | |f(x)| ≤ 1 and |f(x)− f(x′)| ≤ d(x, x′) for all x, x′ ∈ X} .
http://arxiv.org/abs/0704.0885v1
Then Lip(d) is compact in the topology of pointwise convergence onX , as a topological subspace
of the product space RX .
When X is a uniform space, denote by Ub(X) the space of bounded uniformly continuous
functions f : X → R with the norm ‖f‖ = sup{ |f(x)| | x ∈ X}. Let Coz(X) be the set
of all cozero sets in X ; that is, sets of the form {x ∈ X | f(x) 6= 0} where f ∈ Ub(X). Let
σ(Coz(X)) be the sigma-algebra of subsets of X generated by Coz(X).
When d is a pseudometric on a set X , denote by O(d) the collection of open sets in the (not
necessarily Hausdorff) topology defined by d. Note that if d is a u.c.p. on a uniform space X
then O(d) ⊆ Coz(X).
Denote by M(X) the norm dual of Ub(X), and consider three subspaces of M(X):
1. Mu(X) is the space of those µ ∈ M(X) that are continuous on Lip(d) for every u.c.p. d
on X , where Lip(d) is considered with the topology of pointwise convergence on X . The
elements of Mu(X) are called uniform measures on X .
2. Mσ(X) is the space of µ ∈ M(X) for which there is a bounded (signed) countably additive
measure m on the sigma-algebra σ(Coz(X)) such that
µ(f) =
fdm for f ∈ Ub(X) .
3. Muσ(X) is the space of those µ ∈ M(X) that are sequentially continuous on Lip(d) for
each u.c.p. d. That is, limn µ(fn) = 0 whenever d is a u.c.p. on X , fn ∈ Lip(d) for
n = 1, 2, . . ., and limn fn(x) = 0 for each x ∈ X .
When X is a topological group G with its right uniformity, Muσ(X) is the space Leb
in the notation of [6].
Clearly Mu(X) ⊆ Muσ(X) for every uniform space X . By Lebesgue’s dominated conver-
gence theorem ([7], 123C), Mσ(X) ⊆ Muσ(X) for every X .
For any uniform space X , let cX be the set X with the weak uniformity induced by all
uniformly continuous functions fromX to R ([13], p. 129). Let eX be the cardinal reflection Xℵ1
([13], p. 52 and 129), also known as the separable modification of X . Thus eX is a uniform space
on the same set as X , and a pseudometric on X is a u.c.p. on eX if and only if it is a separable
u.c.p. on X . Note that Ub(X) = Ub(eX) = Ub(cX) and M(X) = M(eX) = M(cX).
Let ℵ be a cardinal number, and let A be a set of cardinality ℵ. As in [12], say that ℵ
has measure zero if m(A) = 0 for every non-negative countably additive measure m defined
on the sigma-algebra of all subsets of A and such that m({a}) = 0 for all a ∈ A. A related
notion, not used in this paper, is that of a nonmeasurable cardinal as defined by Isbell [13],
using two-valued measures m in the preceding definition.
It is not known whether every cardinal has measure zero. The statement that every cardinal
has measure zero is consistent with the usual axioms of set theory. A detailed discussion of this
and related properties of cardinal numbers can be found in [9] and [14].
Let d be a pseudometric on a set X . A collection W of nonempty subsets of X is uniformly
d-discrete if there exists ε > 0 such that d(x, x′) ≥ ε whenever x∈V, x′∈V ′, V, V ′∈W , V 6= V ′.
A set Y ⊆ X is uniformly d-discrete if the collection of singletons {{y} | y ∈ Y } is uniformly
d-discrete.
Let X be a uniform space. A set Y ⊆ X is uniformly discrete if there exists a u.c.p. d on X
such that Y is uniformly d-discrete. Say that X is a (uniform) D-space [18] if the cardinality
of every uniformly discrete subset of X has measure zero.
This generalizes the notion of a topological D-space as defined by Granirer [12] and further
discussed by Kirk [16] in the context of topological measure theory. A topological space T is a
D-space in the sense of [12] if and only if T with its fine uniformity ([13], I.20) is a uniform D-
space. If X is a uniform space and Y ⊆ X is uniformly discrete in X then Y is also uniformly
discrete in X with its fine uniformity. Therefore, if X is a topological D-space in the sense
of [12] then it is also a uniform D-space.
Since the countable infinite cardinal ℵ0 has measure zero, every uniform space X such that
X = eX is a D-space. Thus every uniform subspace of a product of separable metric spaces is
a D-space. Moreover, the statement that every uniform space is a D-space is consistent with
the usual axioms of set theory.
3 Measures on uniform D-spaces
The uniform spaces X for which Mu(X) ⊆ Mσ(X) were investigated by several authors [1] [3]
[4] [10] [17]. The opposite inclusion Mσ(X) ⊆ Mu(X) has not attracted as much attention.
Theorem 2 in this section characterizes the uniform spaces X for which Mσ(X) ⊆ Mu(X).
Lemma 1 Let d be a pseudometric on a set X, and ε > 0. Then there exist sets Wn of
nonempty subsets of X, n = 1, 2, . . ., such that
n=1 Wn is a cover of X;
2. for each n, Wn ⊆ O(d);
3. for each n, the d-diameter of each V ∈ Wn is at most ε;
4. each Wn is uniformly d-discrete.
The lemma is essentially the theorem of A.H. Stone about σ-discrete covers in metric spaces.
For the proof, see the proof of 4.21 in [15].
The next theorem is the main result of this paper. It generalizes a known result about
separable measures on completely regular topological spaces — Proposition 3.4 in [16].
Theorem 2 For any uniform space X, the following statements are equivalent:
(i) X is a uniform D-space.
(ii) Mσ(X) ⊆ Mu(X).
In view of Theorem 2 and the remarks in section 2, the statement that Mσ(X) ⊆ Mu(X)
for every uniform space X is consistent with the usual axioms of set theory.
Proof. This proof is adapted from the author’s unpublished manuscript [18].
To prove that (i) implies (ii), let X be a D-space. To show that Mσ(X) ⊆ Mu(X), it is
enough to show that µ ∈ Mu(X) for every non-negative µ ∈ Mσ(X), in view of the Jordan
decomposition of countably additive measures ([8], 231F). Take any µ ∈ Mσ(X), µ ≥ 0 and
any ε > 0. Let m be the non-negative countably additive measure on σ(Coz(X)) such that
µ(f) =
fdm for f ∈ Ub(X).
Let d be a u.c.p. on X , and {fα}α a net of functions fα ∈ Lip(d) such that limα fα(x) = 0
for every x ∈ X . Our goal is to prove that limα µ(fα) = 0.
For the given X , d and ε, let Wn be as in Lemma 1. If V ∈ Wn for some n then choose a
point xV ∈ V . Let Tn = {xV | V ∈ Wn} for n = 1, 2, . . ..
Fix n for a moment. For each subset W ′ ⊆ Wn we have
W ′ ∈ O(d) ⊆ Coz(X). Thus for
each S ⊆ Tn we may define m̃(S) = m(
{V ∈ Wn | xV ∈ S} ), and m̃ is a countably additive
measure defined on all subsets of Tn. Since the set Tn is uniformly discrete and X is a D-space,
it follows that the cardinality of Tn is of measure zero, and there exists a countable set Sn ⊆ Tn
such that
{V ∈ Wn | xV ∈ Tn \ Sn} ) = m̃(Tn \ Sn) = 0.
Denote P =
n=1 Sn and Y = {x ∈ X | d(x, P ) ≤ ε}. If V ∈ Wn for some n and xV ∈ P
then V ⊆ Y , by property 3 in Lemma 1. Therefore
X \ Y ⊆
{V ∈ Wn | xV ∈ Tn \ Sn}
and m(X \ Y ) = 0.
Define gα(x) = supβ≥α |fβ(x)| for x ∈ X . Then gα ∈ Lip(d), gα ≥ gβ for α ≤ β, and
limα gα(x) = 0 for every x ∈ X .
Since the set P is countable, there is an increasing sequence of indices α(n), n = 1, 2, . . .,
such that limn gα(n)(x) = 0 for every x ∈ P , hence limn gα(n)(x) ≤ ε for every x ∈ Y . Thus
|µ(fα)| ≤ lim
µ(gα) ≤ lim
µ(gα(n)) = lim
gα(n)dm+
gα(n)dm
≤ εm(X)
which proves that limα µ(fα) = 0.
To prove that (ii) implies (i), assume that X is not a D-space. Thus there is a u.c.p. d on
X , a subset P ⊆ X and a non-negative countably additive measure m defined on all subsets of
P such that
• d(x, y) ≥ 1 for x, y ∈ P , x 6= y;
• m(x) = 0 for each x ∈ P ;
• m(P ) = 1.
Define µ(f) =
fdm for f ∈ Ub(X). Clearly µ ∈ Mσ(X).
For any set S ⊆ P , define the function fS ∈ Lip(d) by fS(x) = min(1, d(x, S)) for x ∈ X .
Then fS(x) = 0 for x∈S and fS(x) = 1 for x∈P \ S. Let F be the directed set of all finite
subsets of P ordered by inclusion. We have limS∈F fS(x) = fP (x) for each x ∈ X , µ(fS) = 1
for every S ∈ F , and µ(fP ) = 0. Thus µ 6∈ Mu(X). �
The inclusion Mσ(X) ⊆ Mu(cX) in the following corollary is Theorem 2.1 in [5].
Corollary 3 If X is any uniform space then Mσ(X) ⊆ Mu(eX) ⊆ Mu(cX).
Proof. As is noted above, eX is a D-space for any X . Thus Mσ(eX) ⊆ Mu(eX) by
Theorem 2. From the definitions of Mσ(X), eX and cX we get Mσ(X) = Mσ(eX) and
Mu(eX) ⊆ Mu(cX). �
Corollary 3 follows also from Theorem 4 in the next section: Mσ(X) ⊆ Muσ(X) = Mu(eX).
4 Countably uniform measures
In this section we compare the spaces Muσ(X) and Mu(X).
Theorem 4 If X is any uniform space then Muσ(X) = Mu(eX).
Proof. To prove that Muσ(X) ⊆ Mu(eX), note that if a pseudometric d is separable then
Lip(d) with the topology of pointwise convergence is metrizable, and therefore sequential con-
tinuity on Lip(d) implies continuity.
To prove that Mu(eX) ⊆ Muσ(X), take any µ ∈ Mu(eX). Let d be a u.c.p. on X ,
fn ∈ Lip(d) for n = 1, 2, . . ., and limn fn(x) = 0 for each x ∈ X . Define a pseudometric d̃ on X
d̃(x, y) = sup
|fn(x) − fn(y)| for x, y ∈ X .
Then d̃ is a separable u.c.p. on X , hence a u.c.p. on eX , and fn ∈ Lip(d̃ ) for n = 1, 2, . . ..
Therefore limn µ(fn) = 0. �
In view of Theorem 4, spaces Muσ(X) have all the properties of general Mu(X) spaces.
For example, every Mu(X) is weak∗ sequentially complete [19], and the positive part µ
every µ ∈ Mu(X) is in Mu(X) [1] [3] [17]. Therefore the same is true for Muσ(X).
By Theorem 4, if X = eX then Mu(X) = Muσ(X) (cf. [6], 2.5(iii)). To see that the
equality Mu(X) = Muσ(X) does not hold in general, first consider a uniform space X that is
not a uniform D-space. Since Mσ(X) ⊆ Muσ(X), from Theorem 2 we get Mu(X) 6= Muσ(X).
However, that furnishes an actual counterexample only if there exists a cardinal that is not of
measure zero. Next we shall see that, even without assuming the existence of such a cardinal,
there is a space X such that Mu(X) 6= Muσ(X).
Let X̂ denote the completion of a uniform space X . Pelant [20] constructed a complete
uniform space X for which eX is not complete. For such X , there exists an element x ∈ êX \X .
Every f ∈Ub(X) = Ub(eX) uniquely extends to f̂ ∈Ub(êX). Let δx ∈M(X) be the Dirac
measure at x; that is, δx(f) = f̂(x) for f ∈Ub(X). Then δx∈Mu(eX), therefore δx∈Muσ(X)
by Theorem 4. On the other hand, δx 6∈ Mu(X), since δx is a multiplicative functional on
Ub(X) and x 6∈ X̂ ([19], section 6). Thus Mu(X) 6= Muσ(X).
References
[1] I.A. Berezanskǐı. Measures on uniform spaces and molecular measures. (In Russian)
Trudy Moskov. Mat. Obšč. 19 (1968) 3-40.
English translation: Trans. Moscow Math. Soc. 19 (1968) 1-40.
[2] I. Csiszár. On the weak∗ continuity of convolution in a convolution algebra over an arbi-
trary topological group. Studia Sci. Math. Hungarica 6 (1971) 27-40.
[3] V.P. Fedorova. Linear functionals and the Daniell integral on spaces of uniformly con-
tinuous functions. (In Russian) Mat. Sb. 74 (116) (1967) 191-201.
English translation: Math. USSR – Sbornik 3 (1967) 177-185.
[4] V.P. Fedorova. Concerning Daniell integrals on an ultracomplete uniform space. (In Rus-
sian) Mat. Zametki 16, 4 (1974) 601-610.
English translation: Math. Notes AN USSR 16 (1974) 950-955.
[5] V.P. Fedorova. Integral representation of functionals on spaces of uniformly continuous
functions. (In Russian) Sibirsk. Mat. Ž. 23, 5 (1982) 205-218.
English translation: Siber. Math. J. 23 (1982) 753-762.
[6] S. Ferri and M. Neufang. On the topological centre of the algebra LUC(G)∗ for general
topological groups. J. Funct. Anal. 244 (2007) 154-171.
[7] D.H. Fremlin. Measure Theory. Volume 1 (Third Printing). Torres Fremlin (2004).
http://www.essex.ac.uk/maths/staff/fremlin/mt.htm
[8] D.H. Fremlin. Measure Theory. Volume 2 (Second Printing). Torres Fremlin (2003).
http://www.essex.ac.uk/maths/staff/fremlin/mt.htm
[9] D.H. Fremlin. Measure Theory. Volume 5. Torres Fremlin (to appear in 2008).
http://www.essex.ac.uk/maths/staff/fremlin/mt.htm
[10] Z. Froĺık. Measure-fine uniform spaces I. Springer-Verlag Lecture Notes in Mathematics
541 (1975) 403-413.
[11] L. Gillman and M. Jerison. Rings of Continuous Functions. Van Nostrand (1960).
[12] E.E. Granirer. On Baire measures on D-topological spaces. Fund. Math. 60 (1967) 1-22.
http://matwbn.icm.edu.pl/ksiazki/fm/fm60/fm6001.pdf
http://www.essex.ac.uk/maths/staff/fremlin/mt.htm
http://www.essex.ac.uk/maths/staff/fremlin/mt.htm
http://www.essex.ac.uk/maths/staff/fremlin/mt.htm
http://matwbn.icm.edu.pl/ksiazki/fm/fm60/fm6001.pdf
[13] J.R. Isbell. Uniform Spaces. American Mathematical Society (1960).
[14] T. Jech. Set Theory (Second Edition). Springer-Verlag (1997).
[15] J.L. Kelley. General Topology. Van Nostrand (1967).
[16] R.B. Kirk. Complete topologies on spaces of Baire measure. Trans. Amer. Math. Soc.
184 (1973) 1-29.
[17] L. LeCam. Note on a certain class of measures. Unpublished manuscript (1970).
http://www.stat.berkeley.edu/users/rice/LeCam/papers/classmeasures.pdf
[18] J. Pachl. Katětov-Shirota theorem in uniform spaces. Unpublished manuscript (1976).
[19] J. Pachl. Uniform measures and convolution on topological groups.
arXiv:math.FA/0608139v2 (2006)
http://arxiv.org/abs/math.FA/0608139v2
[20] J. Pelant. Reflections not preserving completeness. Seminar Uniform Spaces 1973-74
(Directed by Z. Froĺık) 235-240 (presented in April 1975) MÚ ČSAV (Prague).
[21] M.D. Rice. Uniform ideas in analysis. Real Analysis Exchange 6 (1981) 139-185.
http://www.stat.berkeley.edu/users/rice/LeCam/papers/classmeasures.pdf
http://arxiv.org/abs/math/0608139
http://arxiv.org/abs/math.FA/0608139v2
	Introduction
	Notation
	Measures on uniform D-spaces
	Countably uniform measures
ABSTRACT
  Uniform measures are defined as the functionals on the space of bounded
uniformly continuous functions that are continuous on bounded uniformly
equicontinuous sets. If every cardinal has measure zero then every countably
additive measure is a uniform measure. The functionals sequentially continuous
on bounded uniformly equicontinuous sets are exactly uniform measures on the
separable modification of the underlying uniform space.

<|endoftext|><|startoftext|>
Lower bounds for the conductivities of correlated quantum systems
Peter Jung and Achim Rosch
Institute for Theoretical Physics, University of Cologne, 50937 Cologne, Germany.
(Dated: October 30, 2018)
We show how one can obtain a lower bound for the electrical, spin or heat conductivity of cor-
related quantum systems described by Hamiltonians of the form H = H0 + gH1. Here H0 is an
interacting Hamiltonian characterized by conservation laws which lead to an infinite conductivity
for g = 0. The small perturbation gH1, however, renders the conductivity finite at finite temper-
atures. For example, H0 could be a continuum field theory, where momentum is conserved, or an
integrable one-dimensional model while H1 might describe the effects of weak disorder. In the limit
g → 0, we derive lower bounds for the relevant conductivities and show how they can be improved
systematically using the memory matrix formalism. Furthermore, we discuss various applications
and investigate under what conditions our lower bound may become exact.
PACS numbers: 72.10.Bg, 05.60.Gg, 75.40.Gb, 71.10.Pm
I. INTRODUCTION
Transport properties of complex materials are not only
important for many applications but are also of funda-
mental interest as their study can give insight into the
nature of the relevant quasi particles and their interac-
tions.
Compared to thermodynamic quantities, the transport
properties of interacting quantum systems are notori-
ously difficult to calculate even in situations where in-
teractions are weak. The reason is that conductivities
of non-interacting systems are usually infinite even at fi-
nite temperature, implying that even to lowest order in
perturbation theory an infinite resummation of a per-
turbative series is mandatory. To lowest order this im-
plies that one usually has to solve an integral equation,
often written in terms of (quantum-) Boltzmann equa-
tions or – within the Kubo formalism – in terms of vertex
equations. The situation becomes even more difficult if
the interactions are so strong that an expansion around
a non-interacting system is not possible. Also numeri-
cally, the calculation of zero-frequency conductivities of
strongly interacting clean systems is a serious challenge
and even for one-dimensional systems reliable calcula-
tions are available for high temperatures only1,2,3,4,5,6.
Variational estimates, e.g. for the ground state energy,
are powerful theoretical techniques to obtain rigorous
bounds on physical quantities. They can be used to
guide approximation schemes to obtain simple analytic
estimates and are sometimes the basis of sophisticated
numerical methods like the density matrix renormaliza-
tion group7.
Taking into account both the importance of transport
quantities and the difficulties involved in their calculation
it would be very useful to have general variational bounds
for transport coefficients.
A well known example where a bound for transport
quantities has been derived is the variational solution
of the Boltzmann equation, discussed extensively by
Ziman8. The linearized Boltzmann equation in the pres-
ence of a static electric field can be written in the form
Wk,k′Φk′ (1)
where Wk,k′ is the integral kernel describing the scatter-
ing of quasiparticles and we have linearized the Boltz-
mann equation around the Fermi (or Bose) distribution
= f0(ǫk) using fk = f
Φk. Therefore, the
current is given by I = −e
Φk and the dc con-
ductivity is determined from the inverse of the scattering
matrix W using σ = −e2
k,k′v
. It is
easy to see that this result can be obtained by maximiz-
ing a functional8,9,10,11 F [Φ] with
σ = e2max
F [Φ] ≥ e2 max
F [Φ] =
k,k′(Φk − Φk′)2Wk,k′
where we used that
Wk,k′ = 0 reflecting the conser-
vation of probability. The variational formula (2) is ac-
tually closely related8 to the famous H-theorem of Boltz-
mann which states that entropy always increases upon
scattering.
A lower bound for the conductivity can be obtained
by varying Φ only in a subspace of all possible func-
tions. This allows for example to obtain analytically
good estimates for conductivities without inverting an
infinite dimensional matrix or, euqivalently, solving an
integral equation, see Ziman’s book for a large number
of examples8.
The applicability of Eq. (2) is restricted to situations
where the Boltzmann equation is valid and bounds for
the conductivity in more general setups are not known.
However, for ballistic systems with infinite conductiv-
ity it is possible to get a lower bound for the so-called
Drude weight. Mazur12 and later Suzuki13 considered
situations where the presence of conservation laws pro-
hibits the decay of certain correlation functions in the
http://arxiv.org/abs/0704.0886v2
Re σ(ω) Re σ(ω)
σreg(ω)
πDδ(ω)
g != 0g = 0
FIG. 1: For g = 0 a Drude peak shows up in the conductivity,
resulting from exact conservation laws. For g 6= 0 the Drude
peak broadens and the dc conductivity becomes finite.
long time limit. In the context of transport theory their
result can be applied to systems (see Appendix A for de-
tails) where the finite-temperature conductivity σ(ω, T )
is infinite for ω = 0 and characterized by a finite Drude
weight D(T ) > 0 with
Re σ(ω, T ) = πD(T )δ(ω) + σreg(ω, T ). (3)
Such a Drude weight can arise only in the presence of
exact conservation laws Cj with [H,Cj ] = 0. Suzuki
showed that the Drude weight can be expressed as a sum
over all Cj
〈CjJ〉2
〈C2j 〉
〈CjJ〉2
〈C2j 〉
. (4)
where J is the current associated with σ. For conve-
nience a basis in the space of Ci has been chosen such
that 〈CiCj〉 = 0 for i 6= j. More useful than the equal-
ity in Eq. (4) is often the inequality12 which is obtained
when the sum is restricted to a finite subset of conser-
vation laws. Such a finite sum over simple expectation
values can often be calculated rather easily using either
analytical or numerical methods. The Mazur inequality
has recently been used heavily4,14,15,16,17 to discuss the
transport properties of one-dimensional systems.
Model systems, due to their simplicity, often exhibit
symmetries not shared by real materials. For exam-
ple, the heat conductivity of idealized one-dimensional
Heisenberg chains is infinite at arbitrary temperature as
the heat current is conserved. However, any additional
coupling (next-nearest neighbor, inter-chain, disorder,
phonon,...) renders the conductivity finite1,4,5,6,18,19,20.
If these perturbations are weak, the heat conductivity
is, however, large as observed experimentally21,22. For a
more general example, consider an arbitrary translation-
ally invariant continuum field theory. Here momentum
is conserved which usually implies that the conductivity
is infinite for this model. In real materials momentum
decays by Umklapp scattering or disorder rendering the
conductivity finite. It is obviously desirable to have a
reliable method to calculate transport in such situations.
In this work we consider systems with the Hamiltonian
H = H0 + gH1, (5)
where for g = 0 the relevant heat-, charge- or spin- con-
ductivity is infinite and characterized by a finite Drude
weight given by Eq. (4). As discussed above,H0 might be
an integrable one-dimensional model, a continuum field
theory, or just a non-interacting system. The term gH1
describes a (weak) perturbation which renders the con-
ductivity finite, e.g. due to Umklapp scattering or dis-
order, see Fig. 1. Our goal is to find a variational lower
bound for conductivities in the spirit of Eq. (2) for this
very general situation, without any requirement on the
existence of quasi particles. For technical reasons (see
below) we restrict our analysis to situations where H is
time reversal invariant.
In the following, we first describe the general setup and
introduce the memory matrix formalism, which allows
us to formulate an inequality for transport coefficients
for weakly perturbed systems. We will argue that the
inequality is valid under the conditions which we specify.
Finally, we investigate under which conditions the lower
bounds become exact and briefly discuss applications of
our formula.
II. SETUP
Consider the local density ρ(x) of an arbitrary phys-
ical quantity which is locally conserved, thus obeying a
continuity equation
∂tρ+∇j = 0.
Transport of that quantity is described by the dc con-
ductivity σ which is the response of the current to some
external field E coupling to the current,
〈J〉 = σE,
where J =
j(x) is the total current and 〈J〉 its expec-
tation value. Note that J can be an electrical-, spin-, or
heat current and E the corresponding conjugate field de-
pending on the context. The dynamic conductivity σ(z)
is given by Kubo’s formula, see Eq. (A1). We are inter-
ested in the dc conductivity σ = limω→0 σ(z = ω + i0).
Starting from the Hamiltonian (5) we consider a sys-
tem where H0 posesses a set of exact conservation laws
{Ci} of which at least one correlates with the current,
〈JC1〉 6= 0. Without loss of generality we assume
〈CiCj〉 = 0 for i 6= j. For g = 0 the Drude weight D
defined by Eq. (3) is given by Eq. (4). We can split up
the current under consideration into a part which is par-
allel to the Ci and one that is orthogonal,
J = J‖ + J⊥,
with J‖ =
〈CiJ〉
Ci, which results in a separation of
the conductivity,
σ(z) = σ‖(z) + σ⊥(z). (6)
Since the conductivity σ(z) is given by a current-current
correlation function and the current J‖ (J⊥) is diago-
nal (off-diagonal) in energy, cross-correlation functions
〈〈J‖; J⊥〉〉 vanish in Eq. (6).
According to Eq. (4), the Drude peak of the unper-
turbed system, g = 0, arises solely from J‖:
Re σ‖(ω) = πDδ(ω), (7)
while σ⊥(z) appears in Eq. (3) as the regular part,
Re σ⊥(ω) = σreg(ω).
In this work we will focus on σ‖(ω), since the small
perturbation is not going to affect σ⊥(ω) much (which is
assumed to be free of singularities here, see section IV)
while σ‖(ω = 0) diverges for g → 0, see Fig. 1. As we
are interested in the small g asymptotics only, we may
neglect the contribution σ⊥(0) to the dc conductivity.
Hence we set J = J‖ and σ(ω) = σ‖(ω) in the following.
III. MEMORY MATRIX FORMALISM
We have seen that certain conservation laws ofH0 play
a crucial role in determining the conductivity of both the
unperturbed and the perturbed system. In the presence
of a small perturbation gH1, these modes are not con-
served anymore but at least some of them decay slowly.
Typically, the conductivity of the perturbed system will
be determined by the dynamics of these slow modes. To
separate the dynamics of the slow modes from the rest, it
is convenient to use a hydrodynamic approach based on
the projection of the dynamics onto these slow modes. In
this section we will therefore review the so called memory
matrix formalism23, introduced by Mori and Zwanzig24,25
for this purpose. In the next section we will show that
this approach can be used to obtain a lower bound for
the dc conductivity for small g.
We start by defining a scalar product in the space of
quantum mechanical operators,
(A|B) =
dλ〈A†B(iλ)〉 − β〈A†〉〈B〉 (8)
As the next step we choose a – for the moment – arbi-
trary set of operators {Ci}. In most applications, the Ci
are the relevant slow modes of the system. For notational
convenience, we assume that the {Ci} are orthonormal-
ized,
(Ci|Cj) = δij . (9)
In terms of these we may define the projector P onto (and
Q away from) the space spanned by these ‘slow’ modes
|Ci)(Ci| = 1−Q.
We assume that C1 is the current we are interested in,
|J) ≡ |C1).
The time evolution is given by the Liouville-
(super)operator
L = [H, .] = L0 + gL1
with (LA|B) = (A|LB) = (A|L|B), and the time evo-
lution of an operator may be expressed as |A(t)) =
|eiHtAe−iHt) = eiLt|A). With these notions, one obtains
the following simple, yet formal expression for the con-
ductivity:
σ(ω) =
ω − L
ω − L
Using a number of simple manipulations, one can
show23,24,25 that the conductivity can be expressed as
the (1, 1)-component of the inverse of a matrix
σ(ω) = (M(ω) + iK − iω)−1
, (10)
where
Mij(ω) =
ω − LQ
is the so-called memory matrix and
Kij =
Ċi|Cj
a frequency independent matrix. The formal expression
(10) for the conductivity is exact, and completely gen-
eral, i.e. valid for an arbitrary choice of the modes Ci
(they do not even have to be ‘slow’). Only C1 = J is re-
quired. However, due to the projection operators Q, the
memory matrix (11) is in general difficult to evaluate. It
is when one uses approximations to M that the choice of
the projectors becomes crucial (see below).
Obviously, the dc conductivity is given by the (1, 1)-
component of
(M(0) +K)−1. (13)
More generally, the (m,n)-component of Eq. (13) de-
scribes the response of the ‘current’ Cm to an external
field coupling solely to Cn. We note, that since a matrix
of transport coefficients has to be positive (semi)definite,
this also holds for the matrix M(0) +K.
To avoid technical complications associated with the
presence of K we restrict our analysis in the following to
time reversal invariant systems and choose the Ci such,
that they have either signature +1 or −1 under time
reversal34 Θ. In the dc limit, ω = 0, components of
Eq.(13) connecting modes of different signatures vanish.
Thus, M(0) + K is block-diagonal with respect to the
time reversal signature, and consequently we can restrict
our analysis to the subspace of slow modes with the same
signature as C1. However, if Cm and Cn have the same
signature, then (Cm|Ċn) = 0, and thus K vanishes on
this restricted space. The dc conductivity therefore takes
the form
σ = (M(0)−1)11. (14)
IV. CENTRAL CONJECTURE
To obtain a controlled approximation to the memory
matrix in the limit of small g, it is important to identify
the relevant slow modes of the system. For the Ci we
choose quantities which are conserved by H0, [H0, Ci] =
0, such that Ċi = ig[H1, Ci] is linear in the small coupling
g. As argued below, we require that the singularities
of correlation functions of the unperturbed system are
exclusively due to exact conservation laws Ci, i.e. that
the Drude peak appearing in Eq. (3) is the only singular
contribution. Furthermore, we choose J = J‖ = C1 and
consider only Ci with the same time reversal signature
as J , as discussed in the previous section.
To formulate our central conjecture we introduce the
following notions. We define Mn(ω) as the (exact) n× n
memory matrix obtained by setting up the memory ma-
trix formalism for the first n slow modes {Ci, i = 1, .., n}.
Note that the definitions of the relevant projectors P and
Q also depend on this choice, and that for any choice of
n one gets σ = (M−1n )11. We now introduce the approxi-
mate memory matrix M̃n motivated by the following ar-
guments: Ċi is already linear in g, therefore in Eq. (11)
we approximate L by L0 and replace (.|.) by (.|.)0 as
we evaluate the scalar product with respect to H0. As
L0|Ci) = 0 and (Cj |Ċi) = 0 due to time reversal symme-
try, one has L0Q = 1 and Q|Ċi) = |Ċi) and therefore the
projector Q does not contribute within this approxima-
tion. We thus define the n× n matrix M̃n by
M̃n,ij = lim
ω − L0
Note that M̃n is a submatrix of M̃m for m > n and
therefore the approximate expression for the conductiv-
ity σ ≈ (M̃−1n )11 does depend on n while (M−1n )11 is
independent of n. A much simpler, alternative deriva-
tion for M̃1 is given in Appendix B, where the validity of
this formula is also discussed.
The central conjecture of our paper is, that for small g
(M̃−1n )11 gives a lower bound to the dc conductivity, or,
more precisely,
1/g2 = (M̃
∞ )11 ≥ · · · ≥ (M̃−1n )11 ≥ · · · ≥ M̃−11 . (16)
Here σ|
1/g2 = (1/g
2) limg→0 g
2σ denotes the leading
term ∝ 1/g2 in the small-g expansion of σ. Note that
M̃n ∝ g2 by construction. M̃∞ is the approximate mem-
ory matrix where all35 conservation laws have been in-
cluded. In some special situations, discussed in Ref. 6,
one has σ ∼ 1/g4 and therefore σ|
1/g2 = ∞.
A special case of the inequality above is Eq. (B4) in
appedix B, as the scattering rate Γ̃/χ may be expressed
as Γ̃/χ2 = M̃1.
Two steps are necessary to prove Eq. (16). The simple
part is actually the inequalities in Eq. (16). They are
a consequence of the fact that the matrices M̃n are all
positive definite and that M̃n is a submatrix of M̃m for
m ≥ n. More difficult to prove is that the first equality
in (16) holds. To show this we will need an additional
assumption, namely, that the regular part of all correla-
tion functions (to be defined below) remains finite in the
limit g → 0, ω → 0. In this case, the perturbative ex-
pansion around M̃∞ in powers of g is free of singularities
at finite temperature (which is not the case for M̃n<∞).
This in turn implies that limg→0 M∞/g
2 = M̃∞/g
2 and
therefore σ|
1/g2 = (M̃
∞ )11.
Next, we present the two parts of the proof.
A. Inequalities
We start by investigating the (1,1)-component of the
inverse of the positive definite symmetric matrix M̃∞. It
is convenient to write the inverse as
(M̃−1∞ )11 = max
(ϕTe1)
ϕT M̃∞ϕ
where e1 is the first unit vector. The same method is
used to derive Eq. (2) in the context of the Boltzmann
equation. The maximum is obtained for ϕ = M̃−1∞ e1.
By restricting the variational space in (17) to the first n
components of ϕ we reproduce the submatrix M̃n of M̃∞
and obtain
(M̃−1∞ )11 ≥ max
(ϕTe1)
ϕT M̃∞ϕ
= (M̃−1m )11
≥ max
(ϕT e1)
ϕT M̃∞ϕ
= (M̃−1n<m)11
By choosing different values form and n < m, this proves
all inequalities appearing in (16).
B. Expansion of the memory matrix
We proceed by expanding the exact memory matrix
Mn, where Pn = 1 − Qn is a projector on the first n
conservation laws, in powers of g. Using that LQn =
L0 + gL1Qn, we obtain the geometric series
Mn,ij(ω) =
ω − L0
ω − L0
Note that this is not a full expansion in g, as the scalar
product (8) is defined with respect to the full Hamilto-
nian H = H0 + gH1. We will turn to the discussion of
the remaining g-dependence later.
In general, one can expand
λmn|Am)(An|
in terms of some basis Am in the space of operators.
Therefore Eq. (18) can be written as a sum over products
of terms with the general structure
ω − L0
. (19)
In the following we would like to argue that such an ex-
pansion is regular for n = ∞ if all conservation laws
have been included in the definition of Q. As argued in
Appendix B, we have to investigate whether the series co-
efficients in Eq. (18) diverge for ω → 0. The basis of our
argument is the following: as Q∞ projects the dynam-
ics to the space perpendicular to all of the conservation
laws, the associated singularities are absent in Eq. (19)
and therefore the expansion of M∞ is regular.
To show this more formally, we split up B = B‖ +B⊥
in (19) into a component parallel and one perpendicular
to the space of all conserved quantities, |B‖) = P∞|B).
With this notation, the action of L0 becomes more trans-
parent:
ω − L0
|B) = 1
|B‖) +
ω − L0
|B⊥). (20)
As we assume that all divergencies can be traced back
to the conservation laws, we take the second term to be
regular. It is only the first term which leads in Eq. (19)
to a divergence for ω → 0, provided that (A|Qn|B‖) is fi-
nite. If we consider the perturbative expansion ofMn<∞,
where Pn = 1 − Qn projects only to a subset of con-
served quantities, then finite contributions of the form
(A|Qn|B‖) exist and the perturbative series in g will be
singular (see also Appendix B). Considering M∞, how-
ever, Q∞ projects out all conservation laws and therefore
by construction Q∞|B‖) = Q∞P∞|B) = 0. Thus the
first term in (20) does not contribute in (19) for n = ∞
and the expansion (18) of M∞ is therefore regular.
The only remaining part of our argument is to show
that in the limit g → 0 one can safely replace (.|.) by
(.|.)0. Here it is useful to realize that (A|B) can be in-
terpreted as a (generalized) static susceptibility. In the
absence of a phase transition and at finite temperatures,
susceptibilities are smooth, non-singular functions of the
coupling constants and therefore we do not expect any
further singularities from this step. If we define a phase
transition by a singularity in some generalized suscepti-
bility, then the statement that susceptibilities are regular
in the absence of phase transitions even becomes a mere
tautology.
Combining all arguments, the expansion (18) of
M∞(ω → 0) is regular, and using (Ċi|Q∞ = (Ċi| [see
discussion before Eq. (15)] its leading term, k = 0 is
given by M̃∞. We therefore have shown the missing first
equality of our central conjecture (16).
V. DISCUSSION
In this paper we have established that in the limit of
small perturbations, H = H0 + gH1, lower bounds to dc
conductivities may be calculated for situations where the
conductivity is infinite for g = 0. In the opposite case,
when the conductivity is finite for g = 0, one can use
naive perturbation theory to calculate small corrections
to σ without further complications.
The relevant lower bounds are directly obtained from
the memory matrix formalism. Typically26,27,28 one has
to evaluate a small number of correlation functions and
to invert small matrices. The quality of the lower bounds
depends decisively on whether one has been able to iden-
tify the ‘slowest’ modes in the system.
There are many possible applications for the results
presented in this paper. The mostly considered situ-
ation is the case where H0 describes a non-interacting
system26. For situations where the Boltzmann equation
can be applied, it has been pointed out a long time ago by
Belitz29 that there is a one-to-one relation of the memory
matrix calculation to a certain variational Ansatz to the
Boltzmann equation, see Eq. (2). In this paper we were
able to generalize this result to cases where a Boltzmann
description is not possible. For example, if H0 is the
Hamiltonian of a Luttinger liquid, i.e. a non-interacting
bosonic system, then typical perturbations are of the
form cosφ for which a simple transport theory in the
spirit of a Boltzmann or vertex equation does not exist
to our knowledge.
Another class of applications are systems where H0
describes an interacting system, e.g. an integrable one-
dimensional model6 or some non-trivial quantum-field
theory30. In these cases it can become difficult to cal-
culate the memory matrix and one has to resort to use
either numerical6 or field-theoretical methods30 to obtain
the relevant correlation functions.
An important special case are situations where H0
is characterized by a single conserved current with the
proper symmetries, i.e. with overlap to the (heat-, spin-
or charge-) current J . For example, in a non-trivial con-
tinuum field theory H0, interactions lead to the decay of
all modes with exception of the momentum P . In this
case the momentum relaxation and therefore the con-
ductivity at finite T is determined by small perturba-
tions gH1 like disorder or Umklapp scattering which are
present in almost any realistic system. As M̃∞ = M̃1 in
this case, our results suggest that for small g the conduc-
tivity is exactly determined by the momentum relaxation
rate M̃PP = limω→0 i(Ṗ |(ω − L0)−1|Ṗ ),
for g → 0. (21)
Here we used that J‖ = P (P |J)/(P |P ) with χPJ = (P |J)
and we have restored all factors which arise if the nor-
malization condition (9) is not used. In Appendix C, we
check numerically that this statement is valid for a real-
istic example within the Boltzmann equation approach.
A number of assumptions entered our arguments. The
strongest one is the restriction that all relevant singu-
larities arise from exact conservation laws of H0. We
assumed that the regular parts of correlation functions
are finite for ω = 0. There are two distinct scenarios
in which this assumption does not hold. First, in the
limit T → 0, often scattering rates vanish which can lead
to diverencies of the nominally regular parts of correla-
tion functions. Furthermore, at T = 0 even infinitesi-
mally small perturbations can induce phase transitions
– again a situation where our arguments fail. Therefore
our results are not applicable at T = 0. Second, finite
temperature transport may be plagued by additional di-
vergencies for ω → 0 not captured by the Drude weight.
In some special models, for instance, transport is singu-
lar even in the absence of exactly conserved quantities
(e.g. non-interacting phonons in a disordered crystal8).
In all cases known to us, these divergencies can be traced
back to the presence of some slow modes in the system
(e.g. phonons with very low momentum). While we have
not kept track of such divergencies in our arguments, we
nevertheless believe that they do not invalidate our main
inequality (16) as further slow modes not captured by ex-
act conservation laws will only increase the conductivity.
It is, however, likely that the equality (21) is not valid
for such situations. In Appendix C we analyze in some
detail within the Boltzmann equation formalism under
which conditions (21) holds. As an aside, we note that
the singular heat transport of non-interacting disordered
phonons, mentioned above, is well described within our
formalism if we model the clean system by H0 and the
disorder by H1, see the extensive discussion by Ziman
within the variational approach which can be directly
translated to the memory matrix language, see Ref. [29].
It would be interesting to generalize our results to cases
where time reversal symmetry is broken, e.g. by an exter-
nal magnetic field. As time reversal invariance entered
nontrivially in our arguments, this seems not to be sim-
ple. We nevertheless do not see any physical reason why
the inequality should not be valid in this case, too. One
example where no problems arise are spin chains in a
uniform magnetic field31 where one can map the field to
a chemical potential using a Jordan-Wigner transforma-
tion. Then one can directly apply our results to the time
reversal invariant system of Jordan-Wigner fermions.
Acknowledgments
We thank N. Andrei, E. Shimshoni, P. Wölfle and
X. Zotos for useful discussions. This work was partly sup-
ported by the Deutsche Forschungsgemeinschaft through
SFB 608 and the German Israeli Foundation.
APPENDIX A: DRUDE WEIGHT AND MAZUR
INEQUALITY
In this appendix we clarify the connection between the
Drude weight and the Mazur inequality, mentioned in the
introduction.
The Drude weight D is the singular part of the con-
ductivity at zero frequency, Re σ(ω) = πDδ(ω)+σreg(ω).
It can be calculated from the relation
D = lim
ω Im σ(ω).
It has been introduced by Kohn32 as a measure of ballistic
transport, indicated by D > 0.
Using Kubo formulas, conductivities can be expressed
in terms of the dynamic current susceptibilities33 Π(z)
using
σ(z) = − 1
ΠT −Π(z)
, (A1)
where Π(z) is the current response function
Π(z) =
dteizt〈[J(t), J(0)]〉 (A2)
Π′′(ω)
. (A3)
and ΠT is a current susceptibility. The conductivity may
be calculated by setting σ(ω) = σ(z = ω + i0). Relation
(A3) is a well known sum rule and for all regular corre-
lation functions one has ΠT = Π(0). In the presence of
a singular contribution to σ(ω), one easily identifies the
Drude weight with the expression ΠT−Π(0). For this dif-
ference Mazur12,13 derived a lower bound. Furthermore,
Suzuki13 has shown, that ΠT − Π(0) may be expressed
as a sum over all constants of the motion Ci present in
the system36,
D = ΠT −Π(0) =
〈CjJ〉2
〈C2j 〉
. (A4)
Thus, the Drude weight is intimately connected to the
presence of conservation laws: only components of the
current perpendicular to all conservation laws decay and
any conservation law with a component parallel to the
current (i.e. with a finite cross-correlation 〈CjJ〉) leads
to a finite Drude weight and thus ballistic transport. The
relation between the Drude weight and Mazur’s inequal-
ity has been first pointed out by Zotos14.
APPENDIX B: PERTURBATION THEORY FOR
Let us give an example of a naive perturbative deriva-
tion (see also Ref. [6]) to gain some insight about what
problems can turn up in a perturbative derivation as the
one presented in this work. According to our assump-
tions, the conductivity is diverging for g → 0 and there-
fore it is useful to consider the scattering rate Γ(ω)/χ
(with the current susceptibility χ) defined by
σ(ω) =
Γ(ω)/χ− iω
. (B1)
If J is conserved for g = 0 (i.e. for J = J‖, see above),
the scattering rate vanishes, Γ(ω) = 0, for g = 0, which
results in a finite Drude weight. A perturbation around
this singular point results in a finite Γ(ω). In the limit
g → 0 we can expand (B1) for any finite frequency ω in
Γ to obtain
ω2Re σ(ω) = Re Γ(ω) +O(Γ2/ω). (B2)
We can read this as an equation for the leading order
contribution to Γ(ω), which now is expressed through
the Kubo formula for the conductivity. By partially in-
tegrating twice in time we can write Γ(ω) = Γ̃(ω)+O(g3)
Re Γ̃(ω) = Re
dteizt〈[J̇(t), J̇(0)]〉0
z=ω+i0
where J̇ = i[H, J ] = ig[H1, J ] is linear in g and therefore
the expectation value 〈...〉0 can be evaluated with respect
to H0 (which may describe an interacting system). Thus
we have expressed the scattering rate via a simple corre-
lation function of the time derivative of the current.
To determine the dc conductivity one is interested in
the limit ω → 0 and it is tempting to set ω = 0 in
Eq. (B3). We have, however, derived Eq. (B3) in the
limit g → 0 at finite ω and not in the limit ω → 0 at
finite g. The series Eq. (B2) is well defined for finite
ω 6= 0 only and in the limit ω → 0 the series shows
singularities to arbitrarily high orders in 1/ω.
At first sight this makes Eq. (B3) useless for calculating
the dc conductivity. One of the main results of this paper
is that, nevertheless, Γ̃(ω = 0) can be used to obtain a
lower bound to the dc conductivity
σ(ω = 0) ≥ χ
Γ̃(0)
for g → 0. (B4)
APPENDIX C: SINGLE SLOW MODE
In this appendix we check whether in the presence of
a single conservation law with finite cross correlations
with the current the inequality (16) can be replaced by
the equality (21). This requires us to compare the true
conductivity, which in general is hard to determine, to
the result given by M̃1. Thus we restrict ourselves to
the discussion of models for which a Boltzmann equation
can be formulated and the expression for the conductivity
can be calculated at least numerically. In the following
we first show numerically that the equality (21) holds for
a realistic model. In a second step we discuss the precise
regularity requirement of the scattering matrix such that
Eq. (21) holds.
To simplify numerics, we consider a simple one-
dimensional Boltzmann equation of interacting and
weakly disordered Fermions. Clearly, the Boltzmann ap-
proach breaks down close to the Fermi surface due to
singularities associated with the formation of a Luttinger
liquid, but in the present context we are not interested
in this physics as we only want to investigate properties
of the Boltzmann equation. To avoid the restrictions as-
sociated with momentum and energy conservation in one
dimension we consider a dispersion with two minima and
four Fermi points,
ǫk = −
. (C1)
The Boltzmann equation reads
k′qq′
fkfk′(1− fq)(1 − fq′)
− fqfq′(1 − fk)(1 − fk′)
δ(ǫk − ǫk′)
fk(1 − fk′)− fk′(1 − fk)
Wkk′Φk′ (C2)
where the inelastic scattering term S
kk′ = δ(ǫk + ǫk′ −
ǫq − ǫq′)δ(k+ k′ − q− q′) conserves both energy and mo-
mentum. In the last line we have linearized the right
hand side using the definitions of the introductory chap-
ter. The velocity vk is given by vk =
ǫk. The scat-
tering matrix splits up into an interaction component
and a disorder component, Wkk′ = W
kk′ + g
2W 1kk′ . As
we do not consider Umklapp scattering, W 0kk′ conserves
momentum,
′ = 0, and one expects that mo-
mentum relaxation will determine the conductivity for
small g.
For the numerical calculation we discretize momentum
in the interval [−π/2, π/2], kn = nδk = nπ/N with inte-
ger n. (At the boundaries the energy is already too high
to play any role in transport.) The delta function aris-
ing from energy conservation is replaced by a gaussian of
width δ. The proper thermodynamic limit can for exam-
ple be obtained by choosing δ = 0.3/
N . The numerics
shows small finite size effects.
In Fig. 2 we compare the numerical solution of the
Boltzmann equation to the single mode memory matrix
calculation or, equivalently29, to the variational bound
obtained by setting Φk = k in Eq. (2)
k,k′ kWkk′k
k,k′ kW
. (C3)
As can be seen from the inset, in the limit of small g
one obtains the exact value for the conductivity, which is
what we intended to demonstrate.
0 0.05 0.1 0.15 0.2 0.25 0.3
0 0.1 0.2 0.3
FIG. 2: Comparison of the result of a single mode memory
matrix calculation (solid line), Eq. (C3), to the full numerical
solution of the Boltzmann equation (dotted line) for T = 0.05
and N = 500. The memory matrix is always a lower bound to
the Boltzmann result and converges towards it as the disorder
strength g is reduced, as shown in the inset (ratio of the single
mode approximation to the Boltzmann result).
Next we turn to an analysis of regularity conditions
which have to be met in general by the scattering matrix
Wkk′ such that convergence is guaranteed in the limit
g → 0. According to the assumptions of this appendix,
for g = 0 the variational form of the Boltzmann equa-
tion (2) has a unique solution Φ̄k (up to a multiplica-
tive constant), with F (Φ̄k) = ∞,
kk′ Φ̄k′ = 0 and
k vkΦ̄kdf
0/dǫk > 0.
In the presence of a finite, but small g we write the solu-
tion of the Boltzmann equation as Φ = Φ̄+Φ⊥, where Φ⊥
has no component parallel to Φ̄ (i.e.
k Φ̄kΦ
0/dǫk =
0 ). On the basis of the two inequalities
F [Φ̄] ≤ F [Φ] (C4)
ΦWΦ = Φ̄g2W 1Φ̄ + Φ⊥WΦ⊥ ≥ Φ̄g2W 1Φ̄ (C5)
one concludes that Eq. (21) is valid, i.e. that
F [Φ̄]
F [Φ]
under the condition that
vkΦ̄k
or, equivalently,
vkΦ⊥,k
= 0. (C6)
We therefore have to check whether Φ⊥ becomes small
in the limit of small g.
Expanding the saddlepoint equation for (2) we obtain
W 0kk′Φ
k′ = vk
k′k′′ Φ̄k′g
2W 1k′k′′ Φ̄k′′
k′ vk′
g2W 1kk′ Φ̄k′ +O(g2W1Φ⊥,Φ⊥W0Φ⊥)
As by definition Φ⊥ has no component parallel to Φ̄, we
can insert the projector Q which projects out the con-
servation law in front of Φ⊥k on the left hand side. We
therefore conclude that if the inverse of W 0Q exists, then
Φ⊥ is of order g
2, Eq. (C6) is valid and therefore also
Eq. (21). In our numerical examples these conditions are
all met.
Under what conditions can one expect that Eq. (C6) is
not valid? Within the assumptions of this appendix we
have excluded the presence of other zero modes of W 0
(i.e. conservation laws) with finite overlap with the cur-
rent. But it may happen that W 0 has many eigenvalues
which are arbitrarily small such that the sum in Eq. (C6)
diverges. In such a situation the presence of slow modes
which cannot be identified with conservation laws of the
unperturbed system invalidates Eq. (21).
1 X. Zotos and P. Prelovsek, Phys. Rev. B 53, 983 (1996).
2 K. Fabricius and B. M. McCoy, Phys. Rev. B 57, 8340
(1998).
3 B. N. Narozhny, A. J. Millis, and N. Andrei, Phys. Rev. B
58, R2921 (1998).
4 X. Zotos and P. Prelovsek, e-print
arXiv:cond-mat/0304630 (2003).
5 F. Heidrich-Meisner, A. Honecker, D. C. Cabra, and
W. Brenig, Physica B 359, 1394 (2005).
6 P. Jung, R. W. Helmes, and A. Rosch, Phys. Rev. Lett.
96, 067202 (2006).
7 U. Schollwöck, Rev. Mod. Phys. 77, 259 (2005).
8 J. Ziman, Electrons and Phonons: The theory of transport
phenomena in solids (Oxford University Press, 1960).
9 M. Kohler, Z. Phys. 124, 772 (1948).
10 M. Kohler, Z. Phys. 125, 679 (1949).
11 E. H. Sondheimer, Proc. R. Soc. London, Ser. A 203, 75
(1950).
12 P. Mazur, Physica (Amsterdam) 43, 533 (1969).
13 M. Suzuki, Physica (Amsterdam) 51, 277 (1971).
14 X. Zotos, F. Naef, and P. Prelovsek, Phys. Rev. B 55,
11029 (1997).
http://arxiv.org/abs/cond-mat/0304630
15 S. Fujimoto and N. Kawakami, Phys. Rev. Lett. 90,
197202 (2003).
16 F. Heidrich-Meisner, A. Honecker, and W. Brenig, Phys.
Rev. B 71, 184415 (2005).
17 K. Sakai, Physica E (Amsterdam) 29, 664 (2005).
18 E. Shimshoni, N. Andrei, and A. Rosch, Phys. Rev. B 68,
104401 (2003).
19 F. Heidrich-Meisner, A. Honecker, D. C. Cabra, and
W. Brenig, Phys. Rev. B 66, 140406(R) (2002).
20 A. V. Rozhkov and A. L. Chernyshev, Phys. Rev. Lett.
94, 087201 (2005).
21 A. V. Sologubenko, E. Felder, K. Giannò, H. R. Ott, A. Vi-
etkine, and A. Revcolevschi, Phys. Rev. B 62, R6108
(2000).
22 C. Hess, H. El Haes, A. Waske, B. Buchner, C. Sekar,
G. Krabbes, F. Heidrich-Meisner, and W. Brenig, Phys.
Rev. Lett. 98, 027201 (2007).
Hydrodynamic Fluctuations, Broken Symmetry, and Cor-
relation Functions, edited by D. Forster (Perseus, New
York, 1975).
24 H. Mori, Prog. Theor. Phys. 33, 423 (1965).
25 R. Zwanzig, in Lectures in Theoretical Physics, edited by
W. E. Brittin, B. W. Downs and J. Downs (Interscience,
New York, 1961), Vol. III; J. Chem. Phys. 33, 1338 (1960).
26 W. Götze and P. Wölfle, Phys. Rev. B 6, 1226 (1972).
27 T. Giamarchi, Phys. Rev. B 44, 2905 (1991).
28 A. Rosch and N. Andrei, Phys. Rev. Lett. 85, 1092 (2000).
29 D. Belitz, J. Phys. C 17, 2735 (1984).
30 E. Boulat, P. Mehta, N. Andrei, E. Shimshoni, and
A. Rosch, e-print arXiv:cond-mat/0607837 (2006).
31 A. V. Sologubenko, K. Berggold, T. Lorenz, A. Rosch,
E. Shimshoni, M. D. Phillips, and M. M. Turnbull, Phys.
Rev. Lett. 98, 107201 (2007).
32 W. Kohn, Physical Review 133, 171 (1964).
33 L. P. Kadanoff and P. C. Martin, Ann. Phys. (N.Y.) 24,
419 (1963).
34 As Θ2 = ±1 for states with integer or half-integer spin,
the combinations A±ΘAΘ−1 have signatures ±1 provided
the operator A does not change the total spin by half an
integer, which is the case for all operators with finite cross-
correlation functions with the physical currents.
35 The Ci span the space of all conservation laws, including
those which do not commute with each other.
36 More precisely, {Ci} is taken to be a basis of the space of
operators with energy-diagonal entries only, chosen to be
orthogonal in the sense that 〈CiCj〉 ∝ δij .
http://arxiv.org/abs/cond-mat/0607837
ABSTRACT
  We show how one can obtain a lower bound for the electrical, spin or heat
conductivity of correlated quantum systems described by Hamiltonians of the
form H = H0 + g H1. Here H0 is an interacting Hamiltonian characterized by
conservation laws which lead to an infinite conductivity for g=0. The small
perturbation g H1, however, renders the conductivity finite at finite
temperatures. For example, H0 could be a continuum field theory, where momentum
is conserved, or an integrable one-dimensional model while H1 might describe
the effects of weak disorder. In the limit g to 0, we derive lower bounds for
the relevant conductivities and show how they can be improved systematically
using the memory matrix formalism. Furthermore, we discuss various applications
and investigate under what conditions our lower bound may become exact.

<|endoftext|><|startoftext|>
Introduction to Plasma Theory (John
Wiley, New York, 1983).
[4] J. Barre, T. Dauxois, G. De Ninno, D. Fanelli, and
S. Ruffo, Phys. Rev. E 69, 045501(R) (2004).
[5] G. K. Zipf, Human Behavior and the Principle of Least
Effort (Addison-Wesley, New York, 1949).
[6] I. Kanter and D. A. Kessler, Phys. Rev. Lett. 74,
4559 (1995); K. E. Kechedzhy, O. V. Usatenko, and
V. A. Yampol’skii, Phys. Rev. E. 72, 046138 (2005).
[7] R. N. Mantegna, S. V. Buldyrev, A. L. Goldberger,
S. Havlin, C.-K. Peng, M. Simons, and H. E. Stanley,
Phys. Rev. E 52, 2939 (1995).
[8] R. F. Voss, Phys. Rev. Lett. 68, 3805 (1992).
[9] Z. Ouyang, C. Wang, and Z.-S. She, Phys. Rev. Lett.,
93, 078103 (2004).
[10] H. E. Stanley et. al., Physica A 224,302 (1996).
[11] V. Pareto, Le Cour d’Economie Politique (Macmillan,
London, 1896).
[12] R. N. Mantegna, H. E. Stanley, Nature (London) 376,
46 (1995).
[13] D. Sornette, L. Knopoff, Y. Y. Kagan, and C. Vanneste,
J. Geophys. Res. 101, 13883 (1996).
[14] http://myhome.hanafos.com/philoint/phd-data/Zipf’s-Law-2.htm.
[15] L. Casetti, M. Kastner, Phys. Rev. Lett. 97, 100602
(2006).
[16] F. Baldovin and E. Orlandini, Phys. Rev. Lett. 97,
100601 (2006).
[17] N. Theodorakopoulos, Physica D, 216, 185 (2006).
[18] L. P. Kadanoff et al., Rev. Mod. Phys., 39, 395 (1967).
[19] P. C. Hohenberg, B. I. Halperin, Rev. Mod. Phys., 49,
435 (1977).
[20] C. Tsalis, J. Stat. Phys. 52, 479 (1988).
[21] Nonextensive Statistical Mechanics and Its Applications,
eds. S. Abe and Yu. Okamoto (Springer, Berlin, 2001).
[22] O. V. Usatenko and V. A. Yampol’skii, Phys. Rev. Lett.
90, 110601 (2003); O. V. Usatenko, V. A. Yampol’skii,
K. E. Kechedzhy and S. S. Mel’nyk, Phys. Rev. E, 68,
061107 (2003).
[23] S. S. Melnyk, O. V. Usatenko, V. A. Yampolskii, and
V. A. Golick, Phys. Rev. E 72, 026140 (2005).
[24] S. S. Melnyk, O. V. Usatenko, V. A. Yampol’skii, Phys-
ica A 361, 405 (2005); F. M. Izrailev, A. A. Krokhin,
N. M. Makarov, S. S. Melnyk, O. V. Usatenko,
V. A. Yampol’skii, Physica A, 372, 279 (2006).
[25] S. S. Apostolov, Z. A. Mayzelis, O. V. Usatenko, and
V. A. Yampol’skii, Europhys. Lett. 76 (6), 1015 (2006).
[26] R. J. Glauber, J. Math. Phys. 4, 294 (1963).
http://myhome.hanafos.com/philoint/phd-data/Zipf's-Law-2.htm
ABSTRACT
  A new approach to non-extensive thermodynamical systems with non-additive
energy and entropy is proposed. The main idea of the paper is based on the
statistical matching of the thermodynamical systems with the additive
multi-step Markov chains. This general approach is applied to the Ising spin
chain with long-range interaction between its elements. The asymptotical
expressions for the energy and entropy of the system are derived for the
limiting case of weak interaction. These thermodynamical quantities are found
to be non-proportional to the length of the system (number of its particle).

<|endoftext|><|startoftext|>
NMR evidence for a strong modulation of the Bose-Einstein Condensate in BaCuSi2O6
S. Krämer,1 R. Stern,2 M. Horvatić,1 C. Berthier,1 T. Kimura,3 and I. R. Fisher4
1Grenoble High Magnetic Field Laboratory (GHMFL) - CNRS, BP 166, 38042 Grenoble Cedex 09, France
2National Institute of Chemical Physics and Biophysics, 12618,Tallinn, Estonia
3Los Alamos National Laboratory, Los Alamos NM 87545, USA
4Geballe Laboratory for Advanced Materials and Department of Applied Physics, Stanford University, Stanford CA 94305, USA
(Dated: October 30, 2018)
We present a 63,65Cu and 29Si NMR study of the quasi-2D coupled spin 1/2 dimer compound
BaCuSi2O6 in the magnetic field range 13-26 T and at temperatures as low as 50 mK. NMR data in
the gapped phase reveal that below 90 K different intra-dimer exchange couplings and different gaps
(∆B/∆A = 1.16) exist in every second plane along the c-axis, in addition to a planar incommensurate
(IC) modulation. 29Si spectra in the field induced magnetic ordered phase reveal that close to the
quantum critical point at Hc1 = 23.35 T the average boson density n of the Bose-Einstein condensate
is strongly modulated along the c-axis with a density ratio for every second plane nA/nB ≃ 5. An
IC modulation of the local density is also present in each plane. This adds new constraints for the
understanding of the 2D value φ = 1 of the critical exponent describing the phase boundary.
PACS numbers: 75.10.Jm,75.40.Cx,75.30.Gw
The interest in Bose-Einstein condensation (BEC) has
been considerably renewed since it was shown to occur
in cold atomic gases [1]. In condensed matter, a formal
analog of the BEC can also be obtained in antiferromag-
netic (AF) quantum spin systems [2, 3, 4, 5] under an
applied magnetic field. Many of these systems have a
collective singlet ground state, separated by an energy
gap ∆ from a band of triplet excitations. Applying a
magnetic field (H) lowers the energy of the Mz = −1
sub-band and leads to a quantum phase transition be-
tween a gapped non magnetic phase and a field induced
magnetic ordered (FIMO) phase at the critical field Hc1
corresponding to ∆min-gµBHc1 = 0, where ∆min is the
minimum gap value corresponding to some q vector qmin
[2, 3, 4, 5]. This phase transition can be described as a
BEC of hard core bosons for which the field plays the role
of the chemical potential, provided the U(1) symmetry is
conserved. Quite often, however, anisotropic interactions
can change the universality class of the transition and
open a gap [6, 7, 8]. From that point of view, BaCuSi2O6
[9] seems at the moment the most promising candidate
for the observation of a true BEC quantum critical point
(QCP) [10]. In addition, this system exhibits an un-
usual dimensionality reduction at the QCP, which was
attributed to frustration between adjacent planes in the
nominally body-centered tetragonal structure [11]. The
material also exhibits a weak orthorhombic distortion at
≃90 K which is accompanied by an in-plane IC lattice
modulation [12]. This structural phase transition affects
the triplon dispersion, and the possibility of a modula-
tion of the amplitude of the BEC along the c-axis has
been speculated based on low field inelastic neutron data
[13].
In order to get a microscopic insight of this system, we
performed 29Si and 63,65Cu NMR in BaCuSi2O6 single
crystals. Our data in the gapped phase reveal that the
structural phase transition which occurs around 90 K not
only introduces an IC distortion within the planes, but
also leads to the existence of two types of planes alter-
nating along the c-axis. From one plane to the other, the
intra-dimer exchange coupling and the energy gap for the
triplet states differs by 16 %. Exploring the vicinity of
the QCP in the temperature (T ) range 50-720 mK, we
confirm the linear dependence of TBEC with H −Hc1 as
expected for a 2D BEC. Our main finding is that the av-
erage boson density n in the BEC is strongly modulated
along the c-axis in a ratio of the order of 1:5 for every sec-
ond plane, whereas its local value n(R) is IC modulated
within each plane.
NMR measurements have been obtained on ∼10 mg
single crystals of BaCuSi2O6 using a home-made spec-
trometer and applying an external magnetic fieldH along
the c axis. The gapped phase was studied using a su-
perconducting magnet in the field range 13-15 T and
the temperature range 3-100 K. The investigation of the
FIMO phase was conducted in a 20 MW resistive magnet
at the GHMFL in the field range 22-25 T and the tem-
perature range 50-720 mK. Except for a few field sweeps
in the gapped phase, the spectra were obtained at fixed
fields by sweeping the frequency in regular steps and sum-
ming the Fourier transforms of the recorded echoes.
Before discussing the microscopic nature of the QCP,
let us first consider the NMR data in the gapped phase.
The system consists of S = 1/2 Cu spin dimers parallel
to the c axis and arranged (at room temperature) on a
square lattice in the ab plane. Each Cu dimer is sur-
rounded by four Si atoms, lying approximately in the
equatorial plane. For Cu nuclei, the interaction with the
electronic spins is dominated by the on-site hyperfine in-
teraction. For 29Si nuclei both the transferred hyperfine
interaction through oxygen atoms with a single dimer and
the direct dipolar interaction are important. According
http://arxiv.org/abs/0704.0888v1
125.0 125.2 125.4 125.6 125.8
0 20 40 60 80 100
 [ MHz]
T [ K ]
29Si NMR
H = 14.79 T
H || c-axis
I x 0.5
I x 0.25
T [ K ]
FIG. 1: (Color online) Evolution of the normalized 29Si NMR
spectra as a function of T in the gapped phase. Below 90 K
the line splits into two components, each of them correspond-
ing to an IC pattern. Inset: T dependence of the 1st moment
(i.e., the average position) for i) the total spectra (squares)
and ii) the individual components before they overlap (up
and down triangles). The solid and dashed lines are fits for
non-interacting dimers.
to the room temperature structure I41/acd [14], there
should be only one single Cu and two nearly equivalent
Si sites for NMR when H‖c. As far as 29Si is concerned,
one actually observes a single line above 90 K, as can be
seen in Fig. 1. However, below 90 K, the line splits into
two components, each of them corresponding to an IC
pattern, that is an infinite number of inequivalent sites.
This corresponds to the IC structural phase transition
discovered by X-ray measurements [12]. At 3 K, when
T is much smaller than the gap, the spin polarization is
zero and one observes again a single unshifted line, at the
frequency ν = ν0 =
29γH defined by the Si gyromagnetic
ratio 29γ.
On the 63,65Cu NMR spectra recorded at 3 K and
13.2 T (Fig. 2), however, one can distinguish two dif-
ferent Cu sites, denoted A and B. That is, each of the
6 lines of Cu spectrum (for 2 copper isotopes × 3 tran-
sitions of a spin 3/2 nucleus) is split into two, which
is particularly obvious on the lowest frequency “satel-
lite” 63Cu line. The whole spectra can be nicely fitted
with the following parameters: 63ν
Q = 14.85 (14.14)
MHz, η = 0, and K
zz = 1.80 (1.93) %, where νQ is the
quadrupolar frequency and η the asymmetry parameter.
The Kzz is the hyperfine shift, expected to be purely
orbital since the susceptibility has fully vanished. On in-
creasing T the highest frequency 65Cu “satellite” lines of
sites A and B become well separated and both exhibit
a line shape typical of an IC modulation of the nuclear
spin-Hamiltonian. Although the apparent intensities of
lines A and B look different, they correspond to the same
number of nuclei after corrections due to different spin-
spin relaxation rate 1/T2. Since the satellite NMR lines
135 140 145 150 155 160 165 170 175
 = 3.7 meV
 = 4.3 meV
63,65Cu NMR
 [ MHz ]
H = 13.205 T
15.0 14.5 14.0 13.5 13.0
upper sat.
T = 8.9 K
167 MHz
H [ T ]
FIG. 2: (Color online) 63,65Cu NMR spectra of BaCuSi2O6
in the gapped phase, well below the critical field. The T de-
pendence of the high-frequency “satellite” line clearly reveals
two different copper sites. From their shifts, the two corre-
sponding gap values have been determined. Inset: field sweep
spectrum that reveals the IC nature of the line shape for each
of the two sites. Shading separates the contribution of the
65Cu high-frequency satellite from the rest of the spectrum.
The analysis of the latter part confirms that the observed line
shape has a pure magnetic origin.
at 3 K (the lowest temperature) are narrow, the modu-
lation of νQ is negligible, meaning that the IC lineshapes
visible at higher temperature are purely magnetic. This
is confirmed by the analysis of the spectrum shown in the
inset of Fig. 2, which shows that at 8.9 K the broadening
of the “central” line is the same as that observed on the
“satellites”. Such a broadening results from a distribu-
tion of local hyperfine fields: δhz(R) = Azz(R)mz(R) in
which A(R) is the hyperfine coupling tensor and mz(R)
the longitudinal magnetization at site R. Since νQ(R) is
not modulated by the distortion, one expects that the
modulation of A(R) is negligible too, A(R) =A. This
means that the NMR lineshape directly reflects the IC
modulation of mz in the plane.
Keeping constant the νQ parameters obtained at 3 K,
one can analyze the T dependence of the shift Kαzz(T ) of
each component α = A or B according to the formula
Kαzz(T )−Kαzz(0) = Aαzzmdz(∆α, H, T )/H, (1)
where mdz is the magnetization of a non-interacting
dimer, mdz = gcµB/(e
(∆α−gcµBH)/kBT + 1) in the given
T range, gc = 2.3 [15], and K
zz is determined from the
average line position, i.e., the first moment. The best fit
was obtained for ∆A(B) = 3.7 (4.3) meV and A
-16.4 T/µB. We assumed that A
cc = A
cc, but the values
of ∆ depend only weakly with this quantity. The values
are slightly higher than those determined by neutron in-
elastic scattering for Qmin = [π, π] [13], which is normal
considering our approximate description. However, the
ratio ∆B/∆A = 1.16 is in excellent agreement with the
neutron result 1.15. Considering the fact that there is
FIG. 3: (Color online) Evolution of the normalized 29Si spec-
trum as a function of H at fixed T . The colored spectra
correspond to the BEC. a) T = 50 mK: Instead of a simple
splitting of the line as expected for a standard BEC, a com-
plex pattern appears, typical of an IC distribution of the local
hyperfine field. Inset: H dependence of the 1st (M1, squares)
and 2nd (M2, circles) moment of the spectra. M1 is propor-
tional to mz and M2 to the square of the order parameter. b)
T = 720 mK: The non zero magnetization outside the BEC
leads to an IC pattern for fields H ≤ Hc1(T ), where Hc1(T )
is determined from the H dependence of M2, as shown in the
inset. Lower inset: Tc is linear in H −Hc1, as expected for a
2D BEC QCP.
no disorder in the system (as Cu lines at low T are nar-
row), and that X-rays did not detect any commensurate
peak corresponding to a doubling of the unit cell in the
ab plane, our NMR data can only be explained if there
are two types of planes with different gap values. Look-
ing back at the 29Si spectra in Fig. 1, one also observes
just below 90 K two well separated components, both of
them exhibiting an IC pattern. They indeed correspond
to the two types of planes, as the T dependence of their
positions can be well fit using values close to ∆A and
∆B determined from Cu NMR (inset to Fig. 1). This
means that the 90 K structural phase transition not only
corresponds to the onset of an IC distortion in the ab
plane, but also leads simultaneously to an alternation of
different planes along the c-axis [16], with intra-dimer
exchange in the ratio JB/JA ∼= ∆B/∆A = 1.16
Let us now recall what is expected from a microscopic
point of view in the vicinity of the QCP corresponding to
the onset of a homogeneous BEC for coupled dimer sys-
tems. As soon as a finite density of bosons n is present
(H > Hc1 = ∆min/gµB), a transverse staggered magne-
tization m⊥ (⊥ to H) appears. Its amplitude and direc-
tion correspond respectively to the amplitude and phase
of the order parameter. At the same time, the longitu-
dinal magnetization mz is proportional to the number of
bosons at a given temperature and field, this latter play-
ing the role of the chemical potential. Due to the appear-
ance of a static m⊥, the degeneracy between sites which
were equivalent outside the condensate will be lifted and
their corresponding NMR lines will be split into two. To
be more specific, we consider a pair of Si sites situated
in the ab plane on opposite sides of a Cu dimer. Outside
the condensate, and in the absence of the IC modula-
tion, they should give a single line for H ‖ c. Inside
the condensate the NMR lines of this pair of Si sites will
split by ±29γ|Az⊥|m⊥ because their Az⊥ couplings are
of opposite sign. Obviously, observing a splitting of lines
requires the existence of off-diagonal terms in the hyper-
fine tensor. Such terms are always present due to the
direct dipole interaction between an electronic and the
nuclear spin, which can be easily calculated.
Instead of this expected simple line splitting, the spec-
tra of Fig. 3a reveal a quite complex modification of the
line-shape when entering the condensate. The narrow
single line, observed at 23.41 T at the frequency ν0,
which corresponds to a negligible boson density, sud-
denly changes into a composite line-shape including a
narrow and a broad component. The spread-out of
the broad component increases very quickly with the
field. The width of the narrow component also in-
creases, but at a much lower rate. Both peculiar broad-
enings are related to the IC modulation of the boson
density n(R) due to the structural modulation. To be
more precise, a copper dimer at position R has in to-
tal 4 Si atoms (denoted by k = 0,1,2,3) situated around
in a nearly symmetrical square coordination. The ab-
solute values of the corresponding hyperfine couplings
will thus be nearly identical, and we will also neglect
their dependence on R. These 4 Si sites will give rise
to four NMR lines at the frequencies 29νk(R) = ν0 +
ν1(R) + ν2,k(R), where ν1(R) =
29γAzzgµBn(R) and
ν2,k(R) =
29γAz⊥m⊥(R) cos(φ − kπ/2). Note that ν2,k
only exists when the bosons are condensed, that is when
there is a transverse magnetizationm⊥ pointing in the di-
rection φ. In a uniform condensate m⊥ is proportional to√
n near the QCP, since the mean field behavior is valid
in both, 2D and 3D. We assume that only the amplitude
of the order parameter is spatially modulated, and that
m⊥(R) ∝
n(R). The line shape is the histogram of the
distribution of 29νk(R), convoluted by some broadening
due to nuclei – nuclei interaction.
Three quantities can be derived from the analysis of
NMR lines at fixed T values and variable H : the average
boson density n(H,T ), the field Hc1(T ) corresponding to
to the BEC phase boundary, and the field dependence of
the BEC order parameter (for T close to zero). The aver-
age number of bosons n per dimer is directly proportional
to the first moment M1 (i.e., the average position) of the
line: M1 =
(ν − ν0)f(ν)dν = 29γAzzgµBn(H,T ),
where the line shape f(ν) is supposed to be normalized.
The second moment (i.e., the square of the width) of the
line M2 =
(ν − ν0 − M1)2f(ν)dν has two origins:
the broadening due to the IC distribution of (n(R)-n),
and that due to the onset of m⊥ ∝
n(R) in the con-
densate. When increasing H at T ≃ 0, the condensation
occurs as soon as bosons populate the dimer plane. This
is observed in the inset of Fig. 3a at T = 50 mK. Both
M1 (n) and M2 (m⊥) vary linearly with the field and the
extrapolation of M2 to zero allows the determination of
Hc1 at 50 mK. For higher temperatures a thermal pop-
ulation of bosons n exists and increases with H before
entering the BEC phase. As a result both M1 and M2
increase non-linearly with H , as shown in the upper in-
set of Fig. 3b. However, the increase of M2(H) shows
two clearly separated regimes and allows the determina-
tion of Hc1(T ) as the point where the rate of change of
M2(H) strongly increases due to the appearance of m⊥.
Applying this criterion to all temperatures, we were able
to determine the field dependence of TBEC (lower inset of
Fig. 3b) and define precisely the QCP at Hc1 = 23.35 T.
In agreement with the torque measurements [11], we find
a linear field dependence. This is the signature of a 2D
BEC QCP, where Tc ∝ (H − Hc1)φ with φ = 2/d and
d = 2 [17].
This analysis, however, does not take into account the
specificity of the line shapes, which are related to the ex-
istence of two types of planes with different energy gaps.
A careful examination of the spectra clearly reveals that
they correspond to the superposition of two lines exhibit-
ing different field dependence at fixed T value. For sake
of simplicity, we have made a decomposition only for the
spectra at 50 mK, as shown in the inset to Fig. 4. Clearly,
one of the components remains relatively narrow with-
out any splitting, whereas the other immediately heav-
ily broadens in some sort of triangular line shape. The
field dependence of M1 of the two components, shown in
Fig. 4, reveals that they differ by a factor of 5. This is
attributed to the difference by a factor of 5 in the cor-
responding average populations of bosons. If there were
no hopping of bosons between A and B planes, the B
planes should be empty for the range of field such that
∆A < gµBH < ∆B. Although the observed density of
boson is finite in the B planes, it is strongly reduced,
giving rise to a strong commensurate modulation of n
along the c-axis. According to [11], the hopping along
the c-axis of bosons in the condensate is forbidden by
the frustration, and can only occur as a correlated jump
of a pair. However, this argument does not take into
23.5 24.0 24.5 25.0 25.5
T = 50 mK
total line  
H [ T ]
0.0 0.1 0.2
20H = 23.59 T
(  - 29  H ) [ MHz ]
FIG. 4: (Color online) Using a simple decomposition of the
spectra into two components as shown in the inset, we deter-
mined the 1st moments of the 29Si lines corresponding to the
different types of planes A and B. From the slopes of their
field dependence, the ratio of the average boson density is
found equal to nA/nB ≃ 5.
account the IC modulation of the boson density.
In conclusion, this NMR study of the 2D weakly cou-
pled dimers BaCuSi2O6 reveals that the microscopic na-
ture of the BEC in this system is much more complicated
than first expected. Two types of planes are clearly evi-
denced, with different intra-dimer J couplings and a gap
ratio of 1.16. Close to the QCP we observed that the
density of bosons, which is IC modulated within each
plane, is reduced in every second plane along the c-axis
by a factor of ≃ 5. This provides new constraints for
the understanding of the quasi-2D character of the BEC
close to the QCP.
We thank S.E. Sebastian, C.D. Batista and T. Gia-
marchi for discussions. Part of this work has been sup-
ported by the European Commission through the Euro-
MagNET network (contract RII3-CT-2004-506239), the
Transnational Access - Specific Support Action (contract
RITA-CT-2003-505474), the Estonian Science Founda-
tion (grant 6852) and the NSF (grant DMR-0134613).
[1] M.H. Anderson et al., Science 269, 198 (1995).
[2] I. Afflek, Phys. Rev. B 43, 3215 (1991).
[3] T. Giamarchi and A.M. Tsvelik, Phys. Rev. B 59, 11398
(1999).
[4] T. Nikuni et al., Phys. Rev. Lett. 84, 5868 (2000).
[5] H. Tanaka et al., J. Phys. Soc. Jpn. 70, 939 (2001).
[6] J. Sirker, A. Weiße, O.P. Sushkov, Europhys. Lett. 68,
275 (2004).
[7] M. Clémancey et al., Phys. Rev. Lett. 97, 167204 (2006).
[8] S. Miyahara et al., cond-mat/0610861.
[9] M. Jaime et al., Phys. Rev. Lett. 93, 087203 (2004).
[10] S.E. Sebastian et al., Phys. Rev. B 74, 180401(R) (2006).
[11] S.E. Sebastian et al., Nature. 441, 617 (2006).
[12] E. Samulon et al., Phys. Rev. B 73, 100407(R) (2006).
[13] Ch. Rüegg et al., Phys. Rev. Lett. 98, 017202 (2007).
[14] K.M. Sparta and G. Roth, Act. Cryst. B. 60, 491 (2004).
[15] S.A. Zvyagin et al., Phys. Rev. B 73, 094446 (2006).
[16] This does not introduce any superstructure peak along
(0,0,l) in X-ray experiments, since the unit cell already
http://arxiv.org/abs/cond-mat/0610861
contains four planes along the c-axis. Only the form fac-
tor, which has not been studied in details below 90 K,
should be slightly affected.
[17] C.D. Batista et al., cond-mat/0608703.
http://arxiv.org/abs/cond-mat/0608703
ABSTRACT
  We present a $^{63,65}$Cu and $^{29}$Si NMR study of the quasi-2D coupled
spin 1/2 dimer compound BaCuSi$_2$O$_6$ in the magnetic field range 13-26 T and
at temperatures as low as 50 mK. NMR data in the gapped phase reveal that below
90 K different intra-dimer exchange couplings and different gaps
($\Delta_{\rm{B}}/\Delta_{\rm{A}}$ = 1.16) exist in every second plane along
the c-axis, in addition to a planar incommensurate (IC) modulation. $^{29}$Si
spectra in the field induced magnetic ordered phase reveal that close to the
quantum critical point at $H_{\rm{c1}}$ = 23.35 T the average boson density
$\bar{n}$ of the Bose-Einstein condensate is strongly modulated along the
c-axis with a density ratio for every second plane
$\bar{n}_{\rm{A}}/\bar{n}_{\rm{B}} \simeq 5$. An IC modulation of the local
density is also present in each plane. This adds new constraints for the
understanding of the 2D value $\phi$ = 1 of the critical exponent describing
the phase boundary.

<|endoftext|><|startoftext|>
Introduction  
In previous articles (van Raan 2006a, 2006b, 2007) we presented an empirical approach 
to the study of the statistical properties of bibliometric indicators of research groups. Now 
we focus on a two orders of magnitude larger aggregation level within the science 
system: the university. Our target group consists of the 100 largest European 
universities. We will distinguish between different ‘dimensions’: top- and lower-
performance universities, higher and lower field citation densities, and higher and lower 
journal impact. In particular, we are interested in the phenomenon of size-dependent 
(size of a university in terms of number of publications) cumulative advantage1 of impact 
                                                 
1 By ‘cumulative advantage’ we mean that the dependent variable (for instance, number of citations of a 
university, C) increases in a disproportional, nonlinear (in this case: power law) way as a function of the 
independent variable (for instance, in the present study the size of a research university, in terms of number of 
publications, P). Thus, larger universities (in terms of P) do not just receive more citations (as can be 
expected), but they do so increasingly more advantageously: universities that are twice as large as other 
universities receive, on average, about 2.5 more citations.  
(in terms of numbers of citations), for different levels of research performance, field 
citation density and journal impact.   
Katz (1999) discussed scaling relationships between number of citations and number of 
publications across research fields and countries. He concluded that the science system is 
characterized by cumulative advantage, more particularly a size-dependent ‘Matthew 
effect’ (Merton 1968, 1988). As explained in footnote 1, this implies a nonlinear increase 
of impact with increasing size, demonstrated by the finding that the number of citations 
as a function of number of publications (in Katz’ study for 152 fields of science) exhibits a 
power law dependence with an exponent larger than 1. In our previous articles (van 
Raan 2006a, 2006b, 2007) we demonstrated a size-dependent cumulative advantage of 
the correlation between number of citations and number of publications also at the level 
of research groups. In this study we extent our observations to the level of entire 
universities.  
We focus on performance-related differences of bibliometric properties of universities. 
Particularly important are the citation characteristics of the research fields in which a 
university is active (the field citation densities) and the impact level of the journals used 
by a university. Seglen (1992, 1994) found a poor correlation between the impact of 
publications and journal impact at the level of individual publications. However, grouping 
publications in classes of journal impact yielded a high correlation between publication 
and journal impact. This grouping is determined by journal impact classes, and not by a 
‘natural’ grouping such as research groups and universities. In our previous study we 
showed a significant correlation between the average number of citations per publication 
of research groups, and the average journal impact of these groups. In this study we 
investigate whether this finding also holds at the level of entire universities.   
The structure of this study is as follows. Within a set of the 100 largest universities in 
Europe we distinguish in our analysis between performance, field citation densities and 
journal impact. In Section 2 we discuss the data material of the universities, the 
application of the method, and the calculation of the indicators. In Section 3 we analyse 
the data of the 100 largest European universities in the framework of size-dependent 
cumulative advantage and classify the results of the analysis in main observations. Our 
analysis of performance- and field density-related differences of bibliometric properties of 
universities reveals further interesting results, particularly on the role of journal impact. 
These observations are discussed in the last part of Section 3. Finally, in Section 4 we 
summarize the main outcomes of this study.  
2. Basic data and indicators derived from these data 
We studied the statistics of bibliometric indicators on the basis of all publications (as far 
as published in journals covered by the Citation Index, ‘CI publications’2) of the 100 
largest European universities for the period 1997-20043. This material is quite unique. To 
our knowledge no such compilations of very accurately collected publication sets on a 
large scale are used for statistical analysis of the characteristics of indicators at the 
university level. Obtaining data at the university level is not a trivial matter. The 
delineation of universities through externally available data such as the address 
information in the CI database is very problematic. For a thorough discussion of this 
problem, see Van Raan (2005a). The (CI-) publications were collected as part of a large 
                                                 
2 Thomson Scientific, the former Institute for Scientific Information (ISI) in Philadelphia, is the producer and 
publisher of the Citation Index system covered by the Web of Science. Throughout this article we use the 
acronym CI (Citation Index) to refer to this data system.  
3 We included Israel. We have left out Lomonosov University of Moscow. As far as number of publications 
concerns, this university is one of the largest in Europe (about 24,000 publications in the covered 8-year 
period) but the impact is so low (CPP/FCSm about 0.3) that it would have a very outlying position in the 
ranking. 
EC study on the scientific strengths of the European Union and its member states4. For a 
detailed discussion of methodological and technical issues we refer to Moed (2006). From 
a listing of more than 250 European universities we selected the 100 largest. The period 
covered is 1997-2004 for both publications and citations received by these publications. 
In total, the analysis involves the work of many thousands of senior researchers in 100 
large universities and covers around 1,5 million publications and 11 million citations 
(excluding self-citations), about 15% of the worldwide scientific output and impact. 
The indicators are calculated on the basis of a total time-period analysis. This means that 
publications are counted for the entire period (1997-2004) and citations are counted up 
to and including 2004 (e.g., for publications from 1997, citations are counted in the 
period 1997-2004, and for publications from 2004, citations are counted only in 2004). 
We are currently updating our data system with the 2005 and 2006 publication and 
citation data. 
We apply the CWTS5 standard bibliometric indicators. Only ‘external’ citations, i.e., 
citations corrected for self-citations, are taken into account. An overview of these 
indicators is given in the text box here below. For a detailed discussion we refer to Van 
Raan (1996, 2004, 2005b).  
Standard Bibliometric Indicators: 
• Number of publications P in CI-covered journals of a university in the specified period; 
• Number of citations C received by P during the specified period, without self-citations; including self-
citations: Ci, i.e., number of self-citations Sc = Ci – C, relative amount of self-citations Sc/Ci;  
• Average number of citations per publication, without self-citations (CPP); 
• Percentage of publications not cited (in the specified period) Pnc; 
• Journal-based worldwide average impact as an international reference level for a university (JCS, journal 
citation score, which is our journal impact indicator), without self-citations (on this world-wide scale!); in 
the case of more than one journal we use the average JCSm; for the calculation of JCSm the same 
publication and citation counting procedure, time windows, and article types are used as in the case of 
CPP; 
• Field-based6 worldwide average impact as an international reference level for a university (FCS, field 
citation score), without self-citations (on this world-wide scale!); in the case of more than one field (as 
almost always) we use the average FCSm; for the calculation of FCSm the same publication and citation 
counting procedure, time windows, and article types are used as in the case of CPP; we refer in this article 
to the FCSm indicator as the ‘field citation density’; 
• Comparison of the CPP of a university with the world-wide average based on JCSm as a standard, without 
self-citations, indicator CPP/JCSm; 
• Comparison of the CPP of a university with the world-wide average based on FCSm as a standard, without 
self-citations, indicator CPP/FCSm; 
• Ratio JCSm/FCSm is the relative, field-normalized journal impact indicator.  
In Table 1 we show as an example the results of our bibliometric analysis for the first 30 
universities within the European 100 largest. This table makes clear that our indicator 
calculations allow an extensive statistical analysis of these indicators for our set of 
universities. Of the above indicators, we regard the internationally standardized (field-
normalized) impact indicator CPP/FCSm as our ‘crown’ indicator. This indicator enables 
us to observe immediately whether the performance of a university is significantly far 
below (indicator value < 0.5), below (0.5 - 0.8), around (0.8 - 1.2), above (1.2 – 1.5), or 
far above (>1.5) the international (Western world dominated) impact standard averaged 
over all fields (van Raan 2004). 
                                                 
4 The ASSIST (Analysis and Studies of Statistics and Indicators on Science and Technology) project. 
5 Centre for Science and Technology Studies, Leiden University. 
6 We here use the definition of fields based on a classification of scientific journals into categories developed by 
Thomson Scientific/ISI. Although this classification is not perfect, it provides a clear and ‘fixed’ consistent field 
definition suitable for automated procedures within our data-system. 
Table 1: Largest 30 European universities 
 University P  C  CPP  Pnc  CPP/
1  UNIV CAMBRIDGE UK 36.349 361.681 9,95 29,1 1,63
2  UNIV COLL LONDON UK 34.407 346.028 10,06 26,9 1,46
3  UNIV OXFORD UK 33.780 355.856 10,53 29,5 1,67
4  IMPERIAL COLL LONDON UK 27.017 222.713 8,24 30,7 1,45
5  LUDWIG MAXIMILIANS UNIV MUNCHEN DE 23.519 177.317 7,54 30,8 1,14
6  UNIV PARIS VI PIERRE & MARIE CURIE FR 23.468 146.483 6,24 32,8 1,09
7  UNIV MILANO IT 23.006 175.181 7,61 30,0 1,11
8  UNIV UTRECHT NL 22.668 189.671 8,37 28,3 1,37
9  KATHOLIEKE UNIV LEUVEN BE 22.521 153.851 6,83 34,9 1,22
10  UNIV MANCHESTER  UK 22.470 137.812 6,13 34,4 1,16
11  UNIV WIEN AT 21.940 137.251 6,26 32,9 1,01
12  UNIV ROMA SAPIENZA IT 21.778 119.076 5,47 37,7 0,95
13  TEL AVIV UNIV IL 21.447 112.337 5,24 35,9 0,94
14  UNIV HELSINKI FI 21.034 179.662 8,54 28,5 1,38
15  LUNDS UNIV SE 20.631 157.944 7,66 27,9 1,21
16  KAROLINSKA INST STOCKHOLM SE 20.525 213.629 10,41 23,2 1,30
17  KOBENHAVNS UNIV DK 19.555 153.583 7,85 27,4 1,18
18  UNIV AMSTERDAM NL 19.333 163.417 8,45 28,9 1,35
19  UPPSALA UNIV SE 18.998 140.518 7,40 28,6 1,17
20  RUPRECHT KARLS UNIV HEIDELBERG DE 18.735 155.451 8,30 30,1 1,22
21  ETH ZURICH CH 18.611 148.078 7,96 29,8 1,52
22  KINGS COLL UNIV LONDON UK 18.601 161.460 8,68 28,7 1,32
23  HEBREW UNIV JERUSALEM IL 18.389 127.263 6,92 33,2 1,16
24  UNIV PARIS XI SUD FR 18.183 115.157 6,33 32,8 1,13
25  UNIV EDINBURGH UK 17.786 164.380 9,24 29,7 1,48
26  HUMBOLDT UNIV BERLIN DE 17.780 127.381 7,16 31,6 1,13
27  LEIDEN UNIV NL 16.832 147.821 8,78 26,9 1,26
28  UNIV ZURICH CH 16.783 154.154 9,19 29,2 1,33
29  UNIV BARCELONA ES 16.783 103.628 6,17 32,4 1,03
30  UNIV BRISTOL UK 16.387 119.960 7,32 29,7 1,31   
3. Results and Discussion 
3.1 Impact scaling and research performance  
In our previous study (van Raan 2006a, 2006b, 2007) we showed how a set of research 
groups is characterized in terms of the correlation between size (the total number of 
publications P of a specific research group7) and the total number of citations C received 
by a group. Now we calculated the same correlation for all 100 largest European 
universities. Fig. 3.1.1 shows that this correlation is described with a strong significance 
(coefficient of determination of the fitted regression is R2 = 0.79) by a power law: 
C(P) = 0.36 P 1.31 .  
At the lower side of P (and C) we observe a few ‘outliers’. These are universities with a 
considerably lower number of citations as compared to the other larger universities 
(among them Charles University of Prague and the University of Athens). We observe 
that the size of universities leads to a cumulative advantage (with exponent α=+1.31) 
for the number of citations received by these universities. Thus, the Matthew effect also 
works in at the aggregation level of entire universities. The intriguing question is how the 
                                                 
7 The number of publications is a measure of size in the statistical context described in this article. It is, 
however, a proxy for the real size of a research group or a university, for instance in terms number of staff full 
time equivalents (fte) available for research.  
research performance of the universities (measured by the indicator CPP/FCSm) relates 
to size-dependency. Gradual differentiation between top- and lower-performance 
(top/bottom 10%, 25%, and 50% of the CPP/FCSm distribution) enables us to study 
the correlation of C with P and possible scale effects (size-dependent cumulative 
advantage) in more detail. The results are presented in Figs. 3.1.2 - 3.1.4 and a 
summary of the findings in Table 3. 
 Correlation of C (per university) with P (per university) 
for the 100 largest European universities
y = 0.3566x1.308
R2 = 0.7929
10,000
100,000
1,000,000
1,000 10,000 100,000
Fig. 3.1.1: Correlation of the number of citations (C) received per university with the 
number of publications (P) of these universities for all 100 largest European universities. 
The group of highest performance universities (top-10%) does not have a cumulative 
advantage (i.e., exponent significantly8 > 1). The bottom-10% exponent is heavily 
determined by the outliers. The broader top-25% shows a slight (α=+1.16) and the 
bottom-25% a stronger cumulative advantage (α=+1.33). If we divide the entire set of 
universities in a top- and bottom-50% we see that both subsets have more or less equal 
exponents. Thus, the most intriguing finding is that the lowest performance universities 
have a larger size-dependent cumulative advantage than top-performance universities. 
This phenomenon was already observed at the level of research groups (van Raan 
2006a, 2006b, 2007). It is fascinating that within the science system this scaling rule 
covers at least two orders of magnitude in size of entities. Furthermore, the top-
performance universities are generally the larger ones, i.e., in the right hand side of the 
correlation function. 
                                                 
8 To estimate the influence of these noisy data, we randomly removed five universities. We found that the error 
in the exponent α is about ± 0.05. Thus, the noisiness of data remains within acceptable limits and does not 
substantially affect our findings.  
top-10% and bottom-10% of CPP/FCSm
y = 18.455x0.9355
R2 = 0.9556
y = 0.0833x1.4287
R2 = 0.6539
10,000
100,000
1,000,000
1,000 10,000 100,000
Fig. 3.1.2: Correlation of the number of citations (C) received per university with the 
number of publications (P) for the top-10% (of CPP/FCSm) universities (diamonds) and 
the bottom-10% universities (squares) within the 100 largest European universities. 
top-25% and bottom-25% of CPP/FCSm
y = 1.7776x1.1608
R2 = 0.8436
y = 0.2328x1.3293
R2 = 0.8291
10,000
100,000
1,000,000
1,000 10,000 100,000P
Fig. 3.1.3: Correlation of the number of citations (C) received per university with the 
number of publications (P) for the top-25% (of CPP/FCSm) universities (diamonds) and 
the bottom-25% universities (squares) within the 100 largest European universities. 
top-50% and bottom-50% of CPP/FCSm
y = 1.4839x1.171
R2 = 0.8197
y = 1.2745x1.1626
R2 = 0.7202
10,000
100,000
1,000,000
1,000 10,000 100,000
Fig. 3.1.4: Correlation of the number of citations (C) received per university with the 
number of publications (P) for the top-50% (of CPP/FCSm) universities (diamonds) and 
the bottom-50% universities (squares) with the 100 largest European universities. 
Table 3.1: Power law exponent α of the correlation of C with P for the 100 largest 
European universities in the indicated modalities. The differences in α between top and 
bottom modalities are indicated by ∆α(b,t). 
All 100 1.31 
top 10% 0.94 
bottom 10% 1.43 
∆α(b,t) 0.49 
top 25% 1.16 
bottom 25% 1.33 
∆α(b,t) 0.17 
top 50% 1.17 
bottom 50% 1.16 
∆α(b,t) -0.01 
An important feature of research impact is the number of not-cited publications. We 
analysed the correlation of the fraction (percentage) of not-cited-publications Pnc of the 
100 largest European universities with size (P) of a university. The results are shown in 
Fig. 3.1.5. We observe that the fraction of not-cited publications decreases with low 
significance as a function of size. The significance of the correlation is too low for clear 
results. Thus, as a further step we investigate this correlation with a distinction between 
top- and lower-performance universities. Fig. 3.1.6 shows the results for the top- and 
bottom-25%, and Fig. 3.1.7 for the top-50% and bottom-50% of the CPP/FCSm 
distribution of the 100 largest universities.   
 Correlation of Pnc (per university) with P (total per university) 
y = 126.14x-0.1425
R2 = 0.1239
100.0
1,000 10,000 100,000P
Fig. 3.1.5: Correlation of the percentage of not cited publications (Pnc) with the number 
of publications (P) for the entire set of the 100 largest European universities.  
top-25% and bottom-25% of CPP/FCSm
y = 58.445x-0.0705
R2 = 0.0535
y = 196.33x-0.1773
R2 = 0.2131
100.0
1,000 10,000 100,000
Fig. 3.1.6: Correlation of the relative number of not cited publications (Pnc) with the 
number of publications (P) for the top-25% (of CPP/FCSm) universities (diamonds), 
and the bottom-25% universities (squares).  
top-50% and bottom-50% of CPP/FCSm
y = 56.73x-0.0649
R2 = 0.0342
y = 80.722x-0.0902
R2 = 0.0466
100.0
1,000 10,000 100,000P
Fig. 3.1.7: Correlation of the relative number of not cited publications (Pnc) with the 
number of publications (P) for the top-50% (of CPP/FCSm) universities (diamonds), 
and for the bottom-50% universities (squares).  
The observations suggest that the fraction of non-cited publications decreases with size, 
particularly for the lower performance universities. This phenomenon was also found at 
the level of research groups (van Raan 2006a, 2006b, 2007) which means that we 
discovered another scaling rule in the science system covering at least two orders of 
magnitude. We notice, however, that this scaling rule for non-cited publications is less 
strong at the level of entire universities as compared to groups. Advantage by size works 
by a mechanism in which the number of not-cited publications is diminished. This 
mechanism works at the level of research groups as follows. The larger the number of 
publications in a group, the more those publications are ‘promoted’ which otherwise 
would have remained uncited. Thus, size reinforces an internal promotion mechanism, 
namely initial citation of these ‘stay behind’ publications in other more cited publications 
of the group. Then authors in other groups are stimulated to take notice of these stay 
behind publications and eventually decide to cite them. Consequently, the mechanism 
starts with within-group citation (which is not necessarily the same as self-citation), and 
subsequently spreads. It is obvious that particularly the lower performance groups will 
benefit from this mechanism. Top-performance groups do not ‘need’ the internal 
promotion mechanism to the same extent as low performance groups. This explains, at 
least in a qualitative sense, why top-performance groups show less, or even no 
cumulative advantage by size. Since an entire university is the sum of a large number of 
research groups, the above mechanism will also be visible at the university level.    
We also investigated the relation between research performance as measured by 
indicator CPP/FCSm with size in terms of P.  We find a very slight positive correlation as 
shown in Fig. 3.1.8 for all 100 universities and in Fig. 3.1.9 for the top- and bottom-25% 
of the CPP/FCSm distribution. This, however, this is certainly not a cumulative 
advantage; the exponent of the correlation is very small, around 0.2. Probably the most 
interesting aspect of this measurement is that performance does not decrease, not 
‘dilute’ with increasing size.   
 Correlation of CPP/FCSm (university) with P (total per university) 
y = 0.1117x0.2427
R2 = 0.2164
10.00
1,000 10,000 100,000
CPP/FCSm
Fig. 3.1.8: Correlation of CPP/FCSm with the number of publications (P) for the entire 
set of all 100 largest European universities. 
top-25% and bottom-25% of CPP/FCSm
y = 0.1285x0.209
R2 = 0.2101
y = 0.6862x0.0727
R2 = 0.1138
10.00
1,000 10,000 100,000
CPP/FCSm
Fig. 3.1.9: Correlation of CPP/FCSm with the number of publications (P) for the top-
25% (diamonds) and the bottom-25% (squares) of CPP/FCSm distribution of the 100 
largest European universities.  
3.2  Impact scaling, field citation density and journal impact 
In Fig. 3.2.1 we present the correlation of the number of citations with size for those 
universities among the 100 largest European universities that have high and low field 
citation densities, i.e., top-25% and bottom-25%, respectively, of the FCSm distribution. 
We observe that the high field density universities hardly have a cumulative advantage 
(exponent α = 1.09). The low field citation density universities have a considerably size-
dependent cumulative advantage (exponent α = 1.50).  
Correlation of C (total per university) with P (total per university) 
top-25% and bottom-25% of FCSm
y = 3.6829x1.0853
R2 = 0.8523
y = 0.0458x1.503
R2 = 0.8399
10,000
100,000
1,000,000
1,000 10,000 100,000
Figure 3.2.1: Correlation of the number of citations (C) with the number of publications 
(P) for the universities within the top- (diamonds) and the bottom-25% (squares) of the 
field citation density (FCSm) distribution. 
Correlation of C (total per university) with P (total per university) 
top-25% and bottom-25% of JCSm
y = 4.6851x1.0657
R2 = 0.9112
y = 0.0709x1.458
R2 = 0.7474
10,000
100,000
1,000,000
1,000 10,000 100,000
Figure 3.2.2: Correlation of the number of citations (C) with the number of publications 
(P) for the universities within the top- (diamonds) and the bottom-25% (squares) of the 
field citation density (JCSm) distribution. 
In Fig. 3.2.2 we present a similar correlation for the top- and bottom-25% of the JCSm, 
the average journal impact of a university. We see that these results are practically the 
same as in Fig. 3.2.1. Given the strong correlation of JCSm and FCSm at the level of 
universities, as illustrated in Fig. 3.2.3, this similarity can be expected. We remark, 
however, that the correlation of JCSm and FCSm has a power exponent 1.22 which 
means that the JCSm values increase in a nonlinear way (‘cumulatively’) with FCSm.  
 Correlation of JCSm (per university) with FCSm (per university) 
y = 0,7327x1,2191
R2 = 0,7033
Figure 3.2.3: Correlation of the average journal impact (JCSm) with the average field 
citation density (FCSm) for all 100 largest European universities.  
We now investigate the relation between citation impact of a university in terms of 
average number of citations per publication (CPP) on the one hand, and field citation 
density (FCSm) and journal impact (JCSm) on the other. Seglen (1994) showed that the 
citedness of individual publications CPP is not significantly affected by journal impact9. 
However, grouping publications in classes of journal impact yielded a high correlation 
between publication citedness and journal impact. We found that also a ‘natural’ grouping 
of publications, such as the work of a research group, leads to a high correlation of CPP 
and JCSm (van Raan 2006b, 2007).  
In this study we find that this is also the case at the aggregation level of entire 
universities. We find a significant correlation between the average number of citations 
per publication for the 100 largest European universities (CPP), and both the field 
citation density (FCSm) as well as the average journal impact of these universities 
(JCSm). We applied again the distinction between top- and lower-performance 
universities in order to find performance-related aspects in the above relation. The 
results are shown for the correlation of CPP with FCSm for the entire set of all 100 
largest European universities in Fig. 3.2.4, and for the top-performance (top-25% of 
CPP/FCSm) and lower performance (bottom-25% of CPP/FCSm) universities in Fig. 
3.2.5. The correlation of CPP with JCSm for the entire set of all 100 largest European 
universities is presented in Fig. 3.2.6 and for the top-performance and lower 
performance universities in Fig. 3.2.7. We see hat these correlations are very significant. 
                                                 
9 In Seglen’s work journal impact was defined with the ISI (Web of Science) journal impact factor; he did not 
consider the more sophisticated journal impact indicators such as the JCSm used in this study. 
 Correlation of CPP (per university) with FCSm (per university)
y = 0.5928x1.3654
R2 = 0.5357
10.00
100.00
Fig. 3.2.4: Correlation of CPP with FCSm for all 100 largest European universities. 
top-25% and bottom-25% of CPP/FCSm
y = 0.1712x1.9746
R2 = 0.7401
y = 1.3407x1.0219
R2 = 0.8316
10.00
100.00
Fig. 3.2.5: Correlation of CPP with FCSm for the top-25% (diamonds) and the bottom-
25% (squares) of CPP/FCSm distribution of the 100 largest European universities.  
 Correlation of CPP (per university) with JCSm (per university) 
y = 0.6942x1.2222
R2 = 0.907
10.00
100.00
Fig. 3.2.6: Correlation of CPP with JCSm for all 100 largest European universities. 
top-25% and bottom-25% of CPP/FCSm
y = 0.6384x1.238
R2 = 0.9224
y = 1.228x0.9641
R2 = 0.9497
10.00
100.00
Fig. 3.2.7: Correlation of CPP with JCSm for the top-25% (diamonds) and the bottom-
25% (squares) of CPP/FCSm distribution of the 100 largest European universities.  
Both the top- and lower-performance universities have more citations per publication 
(CPP) as a function of field citation density (FCSm, Fig.3.2.5) as well as of average 
journal impact (JCSm, Fig. 3.2.7). Clearly, the top universities generally have higher 
CPP values. We find that particularly for the lower-performance universities the field 
citation density (FCSm) provides a strong cumulative advantage in citations per 
publication (CPP) with exponent α = 1.97. The correlation of CPP with the average 
journal impact (JCSm) shows a less strong cumulative advantage for the lower-
performance universities, α = 1.24. We also observe clearly (Fig. 3.2.7) that most top-
performance universities publish in journals with significantly higher journal impact as 
compared to the lower performance universities.  Moreover, the top-25% universities 
perform in terms of citations per publications (CPP) with a factor of about 1.3 better than 
the bottom-25% universities in journals with the same average impact. An overview of 
the exponents of the correlation functions is given in Table 3.2. 
Table 3.2: Power law exponent α of the correlation of CPP with FCSm and with JCSm 
for the 100 largest European universities. The differences in α between top- and bottom-
modalities are given by ∆α(b,t).  
FCSm 
JCSm 
all 1.37 1.22 
top 25% 1.02 0.96 
bottom 25% 1.97 1.24 
∆α(b,t) 0.95 0.28 
Next to the impact measure CPP we also investigated the correlation of the field-
normalized research performance indicator (CPP/FCSm) of the 100 largest European 
universities with field citation density and with journal impact. The results are shown for 
the correlation of CPP/FCSm with FCSm for the entire set of all 100 largest European 
universities in Fig. 3.2.8, and for the top-performance (top-25% of CPP/FCSm) and 
lower performance universities in Fig. 3.2.9. The correlation of CPP/FCSm with JCSm 
for the entire set of all 100 largest European universities is presented in Fig. 3.2.10 and 
for the top-performance and lower performance universities in Fig. 3.2.11.  
 Correlation of CPP/FCSm with FCSm 
y = 0.5928x0.3654
R2 = 0.0763
10.00
1 10FCSm
CPP/FCSm
Fig. 3.2.8: Correlation of CPP/FCSm with FCSm for the entire set of the 100 largest 
European universities.  
top-25% and bottom-25% of CPP/FCSm
y = 0.1712x0.9746
R2 = 0.4095
y = 1.3407x0.0219
R2 = 0.0023
10.00
CPP/FCSm
Fig. 3.2.9: Correlation of CPP/FCSm with FCSm for the top-25% (diamonds) and the 
bottom-25% (squares) of CPP/FCSm distribution of the 100 largest European 
universities.  
 Correlation of CPP/FCSm with JCSm 
y = 0.3417x0.6452
R2 = 0.503
10.00
CPP/FCSm
Fig. 3.2.10: Correlation of CPP/FCSm with JCSm for the entire set of the 100 largest 
European universities.  
 top-25% and bottom-25% of CPP/FCSm
y = 0.2535x0.7627
R2 = 0.7952
y = 1.1046x0.116
R2 = 0.0815
10.00
CPP/FCSm
Fig. 3.2.11: Correlation of CPP/FCSm with JCSm for the top-25% (diamonds) and the 
bottom-25% (squares) of CPP/FCSm distribution of the 100 largest European 
universities.  
We observe that the research performance of the top universities is independent of field 
citation density (FCSm). For the lower-performance universities there is a slight increase 
of performance as a function of FCSm. The results for the average journal impact 
(JCSm) are similar but more outspoken. Again we notice that top-performance 
universities have a strong preference for the higher-impact journals.  
Finally, we analysed the correlation between the number of not-cited publications (Pnc) 
of a university and its average journal impact level (JCSm). The results are shown in Fig. 
3.2.12 for the entire set of 100 universities and in Fig. 3.2.13 for the top- en lower-
performance universities. We see a quite significant correlation between these two 
variables. Very clearly the top universities have the lowest Pnc. Given the strong 
correlation between CPP and JCSm (see Fig. 3.2.6) we can also expect a significant 
correlation between Pnc and CPP, as confirmed nicely by Fig. 3.2.14 for the entire set of 
100 universities and in Fig. 3.2.15 for the top- en lower-performance universities. Thus, 
we find that the higher the average journal impact of the publications of a university, the 
lower the number of not-cited publications. Also, the higher the average number of 
citation per publication in a university, the lower the number of not-cited publications. In 
other words, universities that are cited more per paper also have more cited papers. 
These findings underline the generally good correlation at the university level between 
the average number of citations per publication in a university, and its average journal 
impact.  
We also find that the relation between the relative number of not-cited publications 
(Pnc) and the mean number of citations per publication (CPP) can be written in good 
approximation as 
Pnc = 1/√(CPP). 
This expression reflects the characteristics of the citation-distribution function as it is the 
relation between the number of publications with zero citations and the average number 
of citations per publications. 
 Correlation of Pnc (per university) with JCSm per university) 
y = 105.22x-0.6333
R2 = 0.8054
100.0
Fig. 3.2.12: Correlation of the relative number of not cited publications (Pnc) with the 
mean journal impact (JCSm) of the 100 largest European universities. 
top-25% and bottom-25% of CPP/FCSm
y = 84.574x-0.5261
R2 = 0.8363
y = 111.24x-0.6516
R2 = 0.8182
100.0
Fig. 3.2.13: Correlation of the relative number of not cited publications (Pnc) with the 
mean journal impact (JCSm) for the top-25% (of CPP/FCSm) universities (diamonds), 
and the bottom-25% universities (squares).  
 Correlation of Pnc (per university) with CPP (per university) 
y = 85.039x-0.5058
R2 = 0.8459
100.0
1.00 10.00 100.00CPP
Fig. 3.2.14: Correlation of the relative number of not cited publications (Pnc) with the 
mean number of citations per publication (CPP) of the 100 largest European universities. 
top-25% and bottom-25% of CPP/FCSm
y = 86.762x-0.5053
R2 = 0.7551
y = 88.425x-0.5303
R2 = 0.9008
100.0
1.00 10.00 100.00
Fig. 3.2.15: Correlation of the relative number of not cited publications (Pnc) with the 
mean number of citations per publication (CPP) for the top-25% (of CPP/FCSm) 
universities (diamonds), and the bottom-25% universities (squares).  
3.3 Characteristics of self-citations 
In this section we present a first analysis of a specific feature of the science system, the 
statistical properties of self-citations. We calculated the correlation between size (the 
total number of publications P) and the total number of citations C for all 100 largest 
European universities. Fig. 3.3.1 shows that this correlation is described with high 
significance by a power law: 
Sc(P) = 0.53 P 1.15 .  
 Correlation of Sc (per university) with P (per university) 
y = 0.5289x1.148
R2 = 0.882
10000
100000
1,000 10,000 100,000P
Fig. 3.3.1: Correlation of the number of self-citations (Sc) received per university with 
the number of publications (P) of these universities, for all 100 largest European 
universities.  
At the lower side of P (and Sc) we again observe the ‘outliers’ as in the case of the 
(external) citations (Fig. 3.1.1). We find that the size of universities leads to a cumulative 
advantage (with exponent α=+1.15) for the number of self-citations given by these 
universities. Gradual differentiation between top- and lower-performance (top/bottom 
10%, 25%, and 50%) enables us to study the correlation of Sc with P in more detail as 
presented in Figs. 3.3.2 - 3.3.4. We see that the group of highest performance 
universities (top-10%) does not have a cumulative advantage (exponent around 1), 
whereas the bottom-10% exponent is heavily determined by the outliers. The broader 
top-25% and the bottom-25% show a slight cumulative advantage (α= 1.11 and 1.15, 
respectively). If we divide the entire set of universities in a top- and bottom-50% we see 
that both subsets have more or less equal exponents (around 1.11). 
In Fig. 3.3.5 we show that the fraction (percentage) of self-citations (%Sc) decreases 
slightly with size (P), but this correlation is not very significant. More significant is the 
decrease of the fraction of self-citations as a function of research performance 
CPP/FCSm, as shown in Fig. 3.3.6. We also observe a clear decrease of self-citations for 
the 100 largest universities in Europe as a function of average field citation density 
FCSm, Fig. 3.3.7, average journal impact JCSm, Fig. 3.3.8, and of field-normalized 
journal impact JCSm/FCSm, see Fig. 3.3.9. 
top-10% and bottom-10% of CPP/FCSm
y = 3.7993x0.9599
R2 = 0.9755
y = 0.0659x1.3587
R2 = 0.6437
10000
100000
1,000 10,000 100,000P
Fig. 3.3.2: Correlation of the number of self-citations (Sc) received per university with 
the number of publications (P), for the top-10% (of CPP/FCSm) universities 
(diamonds), and the bottom-10% universities (squares) with the 100 largest European 
universities. 
top-25% and bottom-25% of CPP/FCSm
y = 0.8272x1.1067
R2 = 0.8943
y = 0.4661x1.1531
R2 = 0.8385
10000
100000
1,000 10,000 100,000P
Fig. 3.3.3: Correlation of the number of self-citations (Sc) received per university with 
the number of publications (P), for the top-25% (of CPP/FCSm) universities 
(diamonds), and the bottom-25% universities (squares) with the 100 largest European 
universities. 
top-50% and bottom-50% of CPP/FCSm
y = 0.7546x1.1137
R2 = 0.8759
y = 0.7x1.1158
R2 = 0.8415
10000
100000
1,000 10,000 100,000P
Fig. 3.3.4: Correlation of the number of self-citations (Sc) received per university with 
the number of publications (P), for the top-50% (of CPP/FCSm) universities 
(diamonds), and the bottom-50% universities (squares) with the 100 largest European 
universities. 
 Correlation of %Sc (per university) with P (per university) 
y = 76.132x-0.1197
R2 = 0.1076
100.0
1,000 10,000 100,000P
Fig. 3.3.5: Correlation of the relative number of self-citations (%Sc) per university with 
the number of publications (P) of these universities, for all 100 largest European 
universities.  
 Correlation of %Sc with CPP/FCSm 
y = 25.98x-0.5349
R2 = 0.5853
100.0
0.10 1.00 10.00
CPP/FCSm
Fig. 3.3.6: Correlation of the relative number of self-citations (%Sc) per university with 
the performance (CPP/FCSm) of these universities, for all 100 largest European 
universities.  
Correlation of %Sc (per university) with FCSm 
y = 55.092x-0.4599
R2 = 0.2474
100.0
1 10FCSm
Fig. 3.3.7: Correlation of the relative number of self-citations (%Sc) per university with 
the field citation density (FCSm) of these universities, for all 100 largest European 
universities.  
Correlation of %Sc (per university) with JCSm 
y = 59.794x-0.4841
R2 = 0.5793
100.0
1 10JCSm
Fig. 3.3.8: Correlation of the relative number of self-citations (%Sc) per university with 
the average journal impact (JCSm) of these universities, for all 100 largest European 
universities.  
 Correlation of %Sc (per university) with JCSm/FCSm 
y = 25.94x-0.8393
R2 = 0.5499
100.0
0.10 1.00 10.00
JCSm/FCSm
Fig. 3.3.9: Correlation of the relative number of self-citations (%Sc) per university with 
the field-normalized journal impact (JCSm/FCSm) of these universities, for all 100 
largest European universities. 
4. Summary of the main findings and concluding remarks 
For the 100 largest European universities we studied statistical properties of bibliometric 
characteristics related to research performance, field citation density and journal impact. 
Our five main observations are as follows.  
First, we find a size-dependent cumulative advantage for the impact of universities in 
terms of total number of citations. Quite remarkably, lower performance universities 
have a larger size-dependent cumulative advantage for receiving citations than top-
performance universities. We found in previous work a similar scaling rule at the level of 
research groups and therefore we conjecture that this scaling rule is a prevalent property 
of the science system. We also observe that the top universities are about twice as 
efficient in receiving citations (C) as compared to the bottom-performance universities. 
Our criterion of top- or low performance is based on the field-normalized indicator 
CPP/FCSm. We hypothesize that in network terms this indicator represents the ‘fitness’ 
of a university as a node in the science system. It brings a university in a better position 
to acquire additional links (in terms of citations) on the basis of quality (high 
performance).  
Second, we find that for the lower-performance universities the fraction of not-cited 
publications decreases with size. We explain this phenomenon with a model in which size 
is advantageous in an ‘internal promotion mechanism’ to get more publications cited. 
Thus, in this model size is a distinctive parameter which acts as a bridge between the 
macro-picture (characteristics of the entire set of universities) and the micro-picture 
(characteristics within a university). We find that the higher the average journal impact 
of a university, the lower the number of not-cited publications. Also, the higher the 
average number of citations per publication in a university, the lower the number of not-
cited publications. In other words, universities that are cited more per paper also have 
more cited papers.  
Third, we find that the average research performance of university measured by our 
crown indicator CPP/FCSm does not ‘dilute’ with increasing size. Apparently large 
universities, particularly the top-performance universities are characterized by ‘big and 
beautiful’. In other words, they succeed in keeping a high performance over a broad 
range of activities. This most probably is an indication of their overall scientific and 
intellectual attractive power.  
Fourth, we observe that particularly the low field citation density and the low journal 
impact universities have a considerably size-dependent cumulative advantage for the 
total number of citations. We find that particularly for the lower-performance universities 
the field citation density (FCSm) provides a strong cumulative advantage in citations per 
publication (CPP). We also observe clearly that most top-performance universities 
publish in journals with significantly higher journal impact as compared to the lower 
performance universities.  Moreover, the top universities perform in terms of citations per 
publications (CPP) with a factor of about 1.3 better than the bottom universities in 
journals with the same average impact. The relation between number of citations and 
field citation density found in this study can be considered as a second basic scaling rule 
of the science system.  
Fifth, we find a significant decrease of the fraction of self-citations as a function of 
research performance CPP/FCSm, of the average field citation density FCSm, of the 
average journal impact JCSm, and of the field-normalized journal impact JCSm/FCSm.  
Acknowledgements 
The author would like to thank his CWTS colleagues Henk Moed and Clara Calero for the 
work to define and to delineate the universities, and for the data collection, data analysis 
and calculation of the bibliometric indicators.  
References 
Katz, J.S. (1999). The Self-Similar Science System. Research Policy 28, 501-517 
Merton, R.K. (1968). The Matthew effect in science. Science 159, 56-63.  
Merton, R.K. (1988). The Matthew Effect in Science, II: Cumulative advantage and the 
symbolism of intellectual property. Isis 79, 606-623. 
Moed, H.F. (2006) Bibliometric Rankings of World Universities, 
http://www.cwts.nl/hm/bibl_rnk_wrld_univ_full.pdf  
van Raan, A.F.J. (1996). Advanced Bibliometric Methods as Quantitative Core of Peer 
Review Based Evaluation and Foresight Exercises. Scientometrics 36, 397-420. 
van Raan, A.F.J. (2004). Measuring Science. Capita Selecta of Current Main Issues. In: 
H.F. Moed, W. Glänzel, and U. Schmoch (eds.). Handbook of Quantitative Science and 
Technology Research. Dordrecht: Kluwer Academic Publishers, p. 19-50. 
van Raan, A.F.J. (2005a). Fatal Attraction: Conceptual and methodological problems in 
the ranking of universities by bibliometric methods. Scientometrics 62(1), 133-143.  
van Raan, A.F.J. (2005b). Measurement of central aspects of scientific research:  
performance, interdisciplinarity, structure. Measurement 3(1), 1-19. 
van Raan, A.F.J. (2006a). Statistical Properties of Bibliometric Indicators: Research 
Group Indicator Distributions and Correlations. Journal of the American Society for 
Information Science and Technology (JASIST) 57(3), 408-430. 
van Raan, A.F.J. (2006b). Performance-related differences of bibliometric statistical 
properties of research groups: cumulative advantages and hierarchically layered 
networks. Journal of the American Society for Information Science and Technology 
(JASIST) 57(14), 1919-1935. 
Van Raan, A.F.J. (2007). Influence of field and journal citation characteristics in size 
dependent cumulative advantage of research group impact. To be published. 
Seglen, P.O. (1992). The skewness of science. Journal of the American Society for 
Information Science, 43, 628-638 
Seglen, P.O. (1994). Causal relationship between article citedness and journal impact. 
Journal of the American Society for Information Science, 45, 1-11
ABSTRACT
  For the 100 largest European universities we studied the statistical
properties of bibliometric indicators related to research performance, field
citation density and journal impact. We find a size-dependent cumulative
advantage for the impact of universities in terms of total number of citations.
In previous work a similar scaling rule was found at the level of research
groups. Therefore we conjecture that this scaling rule is a prevalent property
of the science system. We observe that lower performance universities have a
larger size-dependent cumulative advantage for receiving citations than
top-performance universities. We also find that for the lower-performance
universities the fraction of not-cited publications decreases considerably with
size. Generally, the higher the average journal impact of the publications of a
university, the lower the number of not-cited publications. We find that the
average research performance does not dilute with size. Large top-performance
universities succeed in keeping a high performance over a broad range of
activities. This most probably is an indication of their scientific attractive
power. Next we find that particularly for the lower-performance universities
the field citation density provides a strong cumulative advantage in citations
per publication. The relation between number of citations and field citation
density found in this study can be considered as a second basic scaling rule of
the science system. Top-performance universities publish in journals with
significantly higher journal impact as compared to the lower performance
universities. We find a significant decrease of the fraction of self-citations
with increasing research performance, average field citation density, and
average journal impact.

<|endoftext|><|startoftext|>
Introduction
It is widely accepted that the structure and the chemical abun-
dances of the interstellar medium (ISM) are strongly influ-
enced by supernova (SN) explosions and by their remnants
(SNRs). However, the details of the interaction between SNR
shock fronts and ISM depend, in principle, on many factors,
among which the multiple-phase structure of the medium, its
density and temperature, the intensity and direction of the am-
bient magnetic fields. These factors are not easily determined
and this somewhat hampers our detailed understanding of the
complex ISM.
The bilateral supernova remnants (BSNRs, Gaensler 1998;
also called ”barrel-shaped,” Kesteven & Caswell 1987, or
”bipolar”, Fulbright & Reynolds 1990) are considered a bench-
mark for the study of large scale SNR-ISM interactions, since
no small scale effect like encounters with ISM clouds seems
to be relevant. The BSNRs are characterized by two opposed
radio-bright limbs separated by a region of low surface bright-
ness. In general, the remnants appear asymmetric, distorted and
Send offprint requests to: S. Orlando,
e-mail: orlando@astropa.inaf.it
elongated with respect to the shape and surface brightness of
the two opposed limbs. In most (but not all) of the BSNRs
the symmetry axis is parallel to the galactic plane, and this has
been interpreted as a difficulty for “intrinsic” models, e.g. mod-
els based on SN jets, rather than for “extrinsic” models, e.g.
models based on properties of the surrounding galactic medium
(Gaensler 1998).
In spite of the interest around BSNRs, a satisfactory and
complete model which explains the observed morphology and
the origin of the asymmetries does not exist. The galactic
medium is supposed to be stratified along the lines of con-
stant galactic latitude, and characterized by a large-scale am-
bient magnetic field with field lines probably mostly aligned
with the galactic plane. The magnetic field plays a three-fold
role: first, a magnetic tension and a gradient of the magnetic
field strength is present where the field is perpendicular to the
shock velocity leading to a compression of the plasma; sec-
ond, cosmic ray acceleration is most rapid where the field lines
are perpendicular to the shock speed (Jokipii 1987, Ostrowski
1988); third, the electron injection could be favored where
the magnetic field is parallel to the shock speed (Ellison et al.
1995). Gaensler (1998) notes that magnetic models (i.e. those
http://arxiv.org/abs/0704.0890v1
2 S. Orlando et al.: On the origin of asymmetries in bilateral supernova remnants
considering uniform ISM and ordered magnetic field) cannot
explain the asymmetric morphology of most BSNRs, and in-
vokes a dynamical model based on pre-existing ISM inhomo-
geneities, e.g. large-scale density gradients, tunnels, cavities.
Unfortunately, the predictions of these ad-hoc models have
consisted so far of a qualitative estimate of the BSNRs mor-
phology, with no real estimates of the ISM density interacting
with the shock. Moreover, most likely also non-uniform ambi-
ent magnetic fields may cause asymmetries in BSNRs, without
the need to assume ad-hoc density ISM structures. Two main
aspects of the nature of BSNRs, therefore, remain unexplored:
how and under which physical conditions do the asymmetries
originate in BSNRs? What is more effective in determining the
morphology and the asymmetries of this class of SNRs, the
ambient magnetic field or the non-uniform ISM?
Answering such questions at an adequate level requires
detailed physical modeling, high-level numerical implementa-
tions and extensive simulations. Our purpose here is to inves-
tigate whether the morphology of BSNR observed in the ra-
dio band could be mainly determined by the propagation of
the shock through a non-uniform ISM or, rather, across a non-
uniform ambient magnetic field. To this end, we model the
propagation of a shock generated by an SN explosion in the
magnetized non-uniform ISM with detailed numerical MHD
simulations, considering two complementary cases of shock
propagation: 1) through a gradient of ambient density with a
uniform ambient magnetic field; 2) through a homogeneous
isothermal medium with a gradient of ambient magnetic field
strength.
In Sect. 2 we describe the MHD model, the numerical
setup, and the synthesis of synchrotron emission; in Sect. 3 we
analyze the effects the environment has on the radio emission
of the remnant; finally in Sect. 4 and 5 we discuss the results
and draw our conclusions.
2. Model
2.1. Magnetohydrodynamic modeling
We model the propagation of an SN shock front through a
magnetized ambient medium. The model includes no radia-
tive cooling, no thermal conduction, no eventual magnetic field
amplification and no effects on shock dynamics due to back-
reaction of accelerated cosmic rays. The shock propagation
is modeled by solving numerically the time-dependent ideal
MHD equations of mass, momentum, and energy conservation
in a 3-D cartesian coordinate system (x, y, z):
+ ∇ · (ρu) = 0 , (1)
+ ∇ · (ρuu − BB) + ∇P∗ = 0 , (2)
+ ∇ · [u(ρE + P∗) − B(u · B)] = 0 , (3)
+ ∇ · (uB − Bu) = 0 , (4)
where
P∗ = P +
, E = ǫ +
|u|2 +
are the total pressure (thermal pressure, P, and magnetic pres-
sure) and the total gas energy (internal energy, ǫ, kinetic energy,
and magnetic energy) respectively, t is the time, ρ = µmHnH is
the mass density, µ = 1.3 is the mean atomic mass (assuming
cosmic abundances), mH is the mass of the hydrogen atom, nH
is the hydrogen number density, u is the gas velocity, T is the
temperature, and B is the magnetic field. We use the ideal gas
law, P = (γ − 1)ρǫ, where γ = 5/3 is the adiabatic index. The
simulations are performed using the flash code (Fryxell et al.
2000), an adaptive mesh refinement multiphysics code for as-
trophysical plasmas.
As initial conditions, we adopted the model profiles of
Truelove & McKee (1999), assuming a spherical remnant with
radius r0snr = 4 pc and with total energy E0 = 1.5 × 10
51 erg,
originating from a progenitor star with mass of 15 Msun,
and propagating through an unperturbed magnetohydrostatic
medium. The initial total energy is partitioned so that 1/4 of
the SN energy is contained in thermal energy, and the other
3/4 in kinetic energy. The explosion is at the center (x, y, z) =
(0, 0, 0) of the computational domain which extends between
−30 and 30 pc in all directions. At the coarsest resolution,
the adaptive mesh algorithm used in the flash code (paramesh;
MacNeice et al. 2000) uniformly covers the 3-D computational
domain with a mesh of 83 blocks, each with 83 cells. We al-
low for 3 levels of refinement, with resolution increasing twice
at each refinement level. The refinement criterion adopted
(Löhner 1987) follows the changes in density and temperature.
This grid configuration yields an effective resolution of ≈ 0.1
pc at the finest level, corresponding to an equivalent uniform
mesh of 5123 grid points. We assume zero-gradient conditions
at all boundaries.
We follow the expansion of the remnant for 22 kyrs, consid-
ering two sets of simulations: 1) through a gradient of ambient
density with a uniform ambient magnetic field; or 2) through
a homogeneous isothermal medium with a gradient of ambi-
ent magnetic field strength. Table 1 summarizes the physical
parameters characterizing the simulations.
In the first set of simulations, the ambient magnetic field
is assumed uniform with strength B = 1 µG and oriented
parallel to the x axis. The ambient medium is modeled with
an exponential density stratification along the x or the z direc-
tion (i.e. parallel or perpendicular to the B field) of the form:
n(ξ) = n0 + ni exp(−ξ/h) (where ξ is, respectively, x or z) with
n0 = 0.05 cm
−3 and ni = 0.2 cm
−3, and where h (set either to
25 pc or to 10 pc) is the density scale length. This configuration
has been used by e.g. Hnatyk & Petruk (1999) to describe the
SNR expansion in an environment with a molecular cloud. Our
choice leads to a density variation of a factor ∼ 6 or ∼ 60, re-
spectively, along the x or the z direction over the spatial domain
considered (60 pc in total). The temperature of the unperturbed
ISM is T = 104 K at ξ = 0 and is determined by pressure bal-
ance elsewhere. The adopted values of T = 104 K, n = 0.25
cm−3 and B = 1 µG at ξ = 0, outside the remnant, lead to
S. Orlando et al.: On the origin of asymmetries in bilateral supernova remnants 3
Table 1. Relevant initial parameters of the simulations: n0 and
ni are particle number densities of the stratified unperturbed
ISM (see text), h is the density scale length, and (x, y, z) are
the coordinates of the magnetic dipole moment. The ambient
medium is either uniform or with an exponential density strat-
ification along the x or the z direction (x−strat. and z−strat.,
respectively); the ambient magnetic field is uniform or dipolar
with the dipole oriented along the x axis and located at (x, y, z).
ISM n0 ni h B (x, y, z)
cm−3 cm−3 pc pc
GZ1 z−strat. 0.05 0.2 25 uniform -
GZ2 z−strat. 0.05 0.2 10 uniform -
GX1 x−strat. 0.05 0.2 25 uniform -
GX2 x−strat. 0.05 0.2 10 uniform -
DZ1 uniform 0.25 - - z−strat. (0, 0,−100)
DZ2 uniform 0.25 - - z−strat. (0, 0,−50)
DX1 uniform 0.25 - - x−strat. (−100, 0, 0)
DX2 uniform 0.25 - - x−strat. (−50, 0, 0)
β ∼ 17 (where β = P/(B2/8π) is the ratio of thermal to mag-
netic pressure) a typical order of magnitude of β in the diffuse
regions of the ISM (Mac Low & Klessen 2004).
In the second set of simulations, the unperturbed ambient
medium is uniform with temperature T = 104 K and parti-
cle number density n = 0.25 cm−3. The ambient magnetic
field, B , is assumed to be dipolar. This idealized situation is
adopted here mainly to ensure magnetostaticity of the non-
uniform field. The dipole is oriented parallel to the x axis and
located on the z axis (x = y = 0) either at z = −100 pc or at
z = −50 pc; alternatively the dipole is located on the x axis
(y = z = 0) either at x = −100 pc or at x = −50 pc (as shown
in Fig. 1). In both configurations, the field strength varies by a
factor ∼ 6 (z or x = −100 pc) or ∼ 60 (z or x = −50 pc) over
60 pc: in the first case in the direction perpendicular to the av-
erage ambient field 〈B〉, whereas in the second case parallel to
〈B〉. In all the cases, the initial magnetic field strength is set to
B = 1 µG at the center of the SN explosion (x = y = z = 0).
Note that the transition time from adiabatic to radiative
phase for a SNR is (e.g. Blondin et al. 1998; Petruk 2005)
ttr = 2.84 × 10
4 E4/1751 n
−9/17
ism yr , (5)
where E51 = E0/(10
51 erg) and nism is the particle number den-
sity of the ISM. In our set of simulations, runs GZ2 and GX2
present the lowest values of the transition time, namely ttr ≈ 25
kyr. Since we follow the expansion of the remnant for 22 kyrs,
our modeled SNRs are in the adiabatic phase.
2.2. Nonthermal electrons and synchrotron emission
We synthesize the radio emission from the remnant, assum-
ing that it is only due to synchrotron radiation from relativistic
electrons distributed with a power law spectrum N(E) = KE−ζ ,
where E is the electron energy, N(E) is the number of elec-
trons per unit volume with arbitrary directions of motion and
with energies in the interval [E, E + dE], K is the normaliza-
tion of the electron distribution, and ζ is the power law index.
Figure 1. 2-D sections in the (x, z) plane of the initial mass
density distribution and initial configuration of the unperturbed
dipolar ambient magnetic field in two cases: dipole moment lo-
cated on the z axis (DZ1, left panel), or on the x axis (DX1,
right panel). The initial remnant is at the center of the domain.
Black lines are magnetic field lines.
Following Ginzburg & Syrovatskii (1965), the radio emissivity
can be expressed as:
i(ν) = C1KB
, (6)
where C1 is a constant, B⊥ is the component of the magnetic
field perpendicular to the line-of-sight (LoS), ν is the frequency
of the radiation, α = (ζ − 1)/2 is the synchrotron spectral index
(assumed to be uniform everywhere and taken as 0.5 as ob-
served in many BSNRs). To compute the total radio intensity
(Stokes parameter I) at a given frequency ν0, we integrate the
emissivity i(ν0) along the LoS:
I(ν0) =
i(ν0) dl , (7)
where dl is the increment along the LoS.
The normalization of the electron distribution Ks in Eq.
6 (index “s” refers to the immediately post-shock values) de-
pends on the injection efficiency (the fraction of electrons
that move into the cosmic-ray pool). Unfortunately, it is un-
known how the injection efficiency evolves in time. On theo-
retical grounds, Ks is expected to vary with the shock velocity
Vsh(t) and, in case of inhomogeneous ISM, with the immedi-
ately post-shock value of mass density, ρs; let us assume that
approximately Ks ∝ ρsVsh(t)
−b. Reynolds (1998) considered
three empirical alternatives for b as a free parameter, namely,
b = 0,−1,−2. Petruk & Bandiera (2006) showed that one can
expect b > 0 and its value could be b ≈ 4. Stronger shocks
are more successful in accelerating particles. To be accelerated
effectively, a particle should obtain in each Fermi cycle larger
increase in momentum, which is proportional to the shock ve-
locity. Negative b reflects an expectation that injection effi-
ciency may behave in a way similar to acceleration efficiency:
stronger shocks might inject particles more effectively. In con-
trast, positive b represents a different point of view: efficien-
cies of injection and acceleration may have opposite depen-
dencies on the shock velocity. Stronger shock produces higher
turbulence which is expected to prevent more thermal particles
to recross the shock from downstream to upstream and to be,
4 S. Orlando et al.: On the origin of asymmetries in bilateral supernova remnants
therefore, injected. Since the picture of injection is quite un-
clear from both theoretical and observational points of view,
we do not pay attention to the physical motivations of the value
of b. Instead, our goal is to see how different trends in evolu-
tion of injection efficiency may affect the visible morphology
of SNRs. Such understanding could be useful for future obser-
vational tests on the value of b.
We found, in agreement with Reynolds (1998), that the
value of b does not affect the main features of the sur-
face brightness distribution if SNR evolves in uniform ISM.
Therefore we use the value b = 0 to produce the SNR images in
models with uniform ISM (models DZ1, DZ2, DX1, and DX2).
In cases where non-uniformity of ISM causes variation of the
shock velocity in SNR (models GZ1, GZ2, GX1, and GX2),
we calculate images for b = −2, 0, 2. We follow the model of
Reynolds (1998) in description of the post-shock evolution of
relativistic electrons. Adopting this approach and considering
that ζ = 2 (being α = 0.5, see above), one obtains that (see
Appendix A)
K(a, t)
Ks(R, t)
P(a, t)
P(R, t)
)−b/2 (
ρo(a)
ρo(R)
)−(b+1)/3 (
ρ(a, t)
ρ(R, t)
)5b/6+4/3
where a is the lagrangian coordinate, R is the shock radius, ρ is
the gas density, P is the gas pressure, and the index “o” refers
to the pre-shock values. It is important to note that this formula
accounts for variation of injection efficiency caused by the non-
uniformity of ISM.
The electron injection efficiency may also vary with the
obliquity angle between the external magnetic field and the
shock normal, φBn. The numerical simulations suggest that in-
jection efficiency is larger for parallel shocks, i.e. where the
magnetic field is parallel to the shock speed (obliquity an-
gle close to zero; Ellison et al. 1995). However, it has been
shown (Fulbright & Reynolds 1990) that models with injec-
tion strongly favoring parallel shocks produce SNR maps that
do not resemble any known objects (it is also claimed that
injection is more efficient where the magnetic field is per-
pendicular to the shock speed; Jokipii 1987). On the other
hand, comparison of known SNRs morphologies with model
SNR images calculated for different strengths of the injec-
tion efficiency dependence on obliquity suggests that the in-
jection efficiency in real SNRs could not depend on obliquity
(Petruk, in preparation). In such an unclear situation, we con-
sider the three cases: quasi-parallel, quasi-perpendicular, and
isotropic injection models. Following Fulbright & Reynolds
(1990), we model quasi-parallel injection by multiplying the
normalization of the electron distribution K by cos2 φBn2 (see
also Leckband et al. 1989), where φBn2 is the angle between the
shock normal and the post-shock magnetic field1. By analogy
with the quasi-parallel case, we model quasi-perpendicular in-
jection by multiplying K by sin2 φBn2.
1 For a shock compression ratio of 4 (the shock Mach number is≫
10 in all directions during the whole evolution in each of our simula-
tions), the obliquity angle between the external magnetic field and the
shock normal, φBn, is related to φBn2 by sin
φBn2 = (cot
2 φBn/16+1)
(e.g. Fulbright & Reynolds 1990).
An important point is the degree of ordering of magnetic
field downstream of the shock. Radio polarization observation
of a number of SNRs (e.g. Tycho Dickel et al. 1991, SN1006
Reynolds & Gilmore 1993) show the low degree of polariza-
tion, 10-15% (in case of ordered magnetic field the value ex-
pected is about 70%; Fulbright & Reynolds 1990), indicating
highly disordered magnetic field. Thus we calculate the syn-
chrotron images of SNR for two opposite cases. First, since our
MHD code gives us the three components of magnetic field,
we are able to calculate images with ordered magnetic field.
Second, we introduce the procedure of the magnetic field dis-
ordering (with randomly oriented magnetic field vector with
the same magnitude in each point) and then synthesize the ra-
dio maps. In models which have a disordered magnetic field,
we use the post-shock magnetic field before disordering to cal-
culate the angle φBn2; as discussed by Fulbright & Reynolds
(1990), this corresponds to assume that the disordering process
takes place over a longer time-scale than the electron injection,
occurring in the close proximity of the shock. Since we found
that the asymmetries induced by gradients either of ambient
plasma density or of ambient magnetic field strength are not
significantly affected by the degree of ordering of the magnetic
field downstream of the shock, in the following we will focus
on the models with disordered magnetic field.
The goal of this paper is to look whether non-uniform ISM
or non-uniform magnetic field can produce asymmetries on
BSNRs morphology. In order to clearly see the role of these
two factors in determining the morphology of BSNRs, we use
some simplifying assumptions about electron kinetic and be-
havior of magnetic field in vicinity of the shock front. Our
calculations are performed in the test-particle limit, i.e. they
ignores the energy in cosmic rays. In particular, we do not con-
sider possible amplification of magnetic field by the cosmic-ray
streaming instability (Lucek & Bell 2000, Bell & Lucek 2001).
We expect that the main features of the modeled SNR morphol-
ogy will not change if this process is independent of obliquity
angle. If future investigations show undoubtedly that magnetic
field amplification varies strongly with obliquity, the role of
this effect in producing BSNRs have to be additionally studied.
3. Results
In all the models examined, we found the typical evolution
of adiabatic SNRs expanding through an organized ambient
magnetic field (see Balsara et al. 2001 and references therein):
the fast expansion of the shock front with temperatures of few
millions degrees, and the development of Richtmyer-Meshkov
(R-M) instability, as the forward and reverse shocks progress
through the ISM and ejecta, respectively (see Kane et al. 1999).
As examples, Fig. 2 shows 2-D sections in the (x, z) plane of
the distributions of mass density and of magnetic field strength
for the models GZ2, DZ2, and DX2 at t = 18 kyrs. The in-
ner shell is dominated by the R-M instability that causes the
plasma mixing and the magnetic field amplification. In the in-
ner shell, the magnetic field shows a turbulent structure with
preferentially radial components around the R-M fingers (see
Fig. 3). Note that, some authors have invoked the R-M insta-
bilities to explain the dominant radial magnetic field observed
S. Orlando et al.: On the origin of asymmetries in bilateral supernova remnants 5
Figure 2. 2-D sections in the (x, z) plane of the mass density
distribution (left panels), in log scale, and of the distribution
of the magnetic-field strength (right panels), in log scale, in
the simulations GZ2 (upper panels), DZ2 (middle panels), and
DX2 (lower panels) at t = 18 kyrs. The box in the upper left
panel marks the region shown in Fig. 3.
in the inner shell of SNRs (e.g. Jun & Norman 1996); however,
in our simulations, the radial tendency is observed well inside
the remnant and not immediately behind the shock as inferred
from observations.
We found that, throughout the expansion, the shape of the
remnant is not appreciably distorted by the ambient magnetic
field because, for the values of explosion energy and ambi-
ent field strength (typical of SNRs) used in our simulations,
the kinetic energy of the shock is many orders of magnitude
larger than the energy density in the ambient B field (see also
Mineshige & Shibata 1990). The shape of the remnant does
not differ visually from a sphere also in the cases with den-
sity stratification of the ambient medium2 as it is shown by
Hnatyk & Petruk (1999).
The radio emission of the evolved remnants is character-
ized by an incomplete shell morphology when the viewing an-
gle is not aligned with the direction of the average ambient
2 In these cases, the remnant appears shifted toward the low den-
sity region; see upper panels in Fig. 2 (see also Dohm-Palmer & Jones
1996).
Figure 3. Close-up view of the region marked with a box in
Fig. 2. The dark fingers mark the R-M instability. The mag-
netic field is described by the superimposed arrows the length
of which is proportional to the magnitude of the field vector.
magnetic field (cf. Fulbright & Reynolds 1990); in general, the
radio emission shows an axis of symmetry with low levels of
emission along it, and two bright limbs (arcs) on either side
(see also Gaensler 1998). This morphology is very similar to
that observed in BSNRs.
3.1. Obliquity angle dependence
For each of the models listed in Table 1, we synthesized the
synchrotron radio emission, considering each of the three cases
of variation of electron injection efficiency with shock obliq-
uity: quasi-parallel, quasi-perpendicular, and isotropic particle
injection. As an example, Fig. 4 shows the synchrotron radio
emission synthesized from the uniform ISM model DZ1 with
randomized internal magnetic field at t = 18 kyrs in each of
the three cases. We recall that for these uniform density cases,
we have adopted an injection efficiency independent from the
shock speed (b = 0, Sect. 2.2). All images are maps of total in-
tensity normalized to the maximum intensity of each map and
have a resolution of 400 beams per remnant diameter (DSNR).
The images are derived when the LoS is parallel to the average
direction of the unperturbed ambient magnetic field 〈B〉 (LoS
aligned with the x axis), or perpendicular both to 〈B〉 and to
the gradient of field strength (LoS along y), or parallel to the
gradient of field strength (LoS along z).
The different particle injection models produce images that
can differ considerably in appearance. In particular, the quasi-
parallel case leads to morphologies of the remnant not repro-
duced by the other two cases: a center-brightened SNR when
the LoS is aligned with x (top left panel in Fig. 4), a BSNR
with two bright arcs slanted and converging on the side where
B field strength is higher when the LoS is along y (top center
6 S. Orlando et al.: On the origin of asymmetries in bilateral supernova remnants
Figure 4. Synchrotron radio emission (normalized to the maximum of each panel), at t = 18 kyrs, synthesized from model DZ1
assuming b = 0 (see text) and randomized internal magnetic field, when the LoS is aligned with the x (left), y (center), or z (right)
axis. The figure shows the quasi-parallel (top), isotropic (middle), and quasi-perpendicular (bottom) particle injection cases. The
color scale is linear and is given by the bar on the right. The directions of the average unperturbed ambient magnetic field, 〈B〉,
and of the magnetic field strength gradient, ∇|B |, are shown in the upper left and lower right corners of each panel, respectively.
panel), and a remnant with two symmetric bright spots located
between the center and the border of the remnant when the LoS
is along z (top right panel). Neither the center-brightened rem-
nant nor the double peak structure, showing no structure de-
scribable as a shell, seems to be observed in SNRs3. We found
analogous morphologies in all the models listed in Table 1, con-
sidering the quasi-parallel case. As extensively discussed by
Fulbright & Reynolds (1990) for models with uniform ambient
magnetic field and b = −2, we also conclude that the quasi-
parallel case leads to radio images unlike any observed SNR
(see also Kesteven & Caswell 1987).
3 Excluding filled center and composite SNRs, but these are due to
energy input from a central pulsar.
The isotropic case leads to remnant’s morphologies simi-
lar to those produced in the quasi-perpendicular case although
the latter case shows deeper minima in the radio emission than
the first one. When the LoS is aligned with x (middle left and
bottom left panels in Fig. 4) or with y (middle center and bot-
tom center panels), the remnants have one bright arc on the side
where the B strength is higher. When the LoS is aligned with z
(middle right and bottom right panels), the remnants have two
opposed arcs that appear perfectly symmetric. We found that
the isotropic and quasi-perpendicular cases lead to morpholo-
gies of the remnants similar to those observed.
S. Orlando et al.: On the origin of asymmetries in bilateral supernova remnants 7
Figure 5. Presentation as in Fig. 4 for model GZ1 with randomized internal magnetic field, assuming quasi-perpendicular particle
injection and b = −2 (top panels), b = 0 (middle) and b = 2 (bottom). The directions of the average unperturbed ambient magnetic
field, 〈B〉, and of the ambient plasma density gradient, ∇ρ, are shown in the upper left and lower right corners of each panel,
respectively.
3.2. Non-uniform ISM: dependence from parameter b
For models describing the SNR expansion through a non-
uniform ISM (models GZ1, GZ2, GX1, GX2), we derived the
synthetic radio maps considering three alternatives for the de-
pendence of the injection efficiency on the shock speed, namely
b = −2, 0, 2 (see Sect. 2.2). As an example, Fig. 5 shows the
synthetic maps derived from model GZ1 with randomized in-
ternal magnetic field, assuming quasi-perpendicular particle in-
jection, and considering b = −2 (top panels), b = 0 (middle)
and b = 2 (bottom).
When the LoS is not aligned with the density gradient, the
radio images show asymmetric morphologies of the remnants.
In this case, the main effect of varying b is to change the de-
gree of asymmetry observed in the radio maps. In the example
shown in Fig. 5, the density gradient is aligned with the z axis
and asymmetric morphologies are produced when the LoS is
aligned with x (left panels) or with y (center panels). In all the
cases, the remnant is brighter where the mass density is higher.
On the other hand, the degree of asymmetry increases with in-
creasing value of b.
The reason of such behavior consists in the balance be-
tween the roles of the shock velocity and of density in chang-
ing the injection efficiency. Consider, as an example, the top
left panel in Fig. 5: the increase of the shock velocity on the
north (due to fall of the ambient density) leads to an increase of
the brightness there (due to rise of the injection efficiency) that
partially balances the increase of the brightness on the south
due to higher density of ISM. On the other hand, for the model
shown in the bottom left panel in Fig. 5, the fraction of accel-
8 S. Orlando et al.: On the origin of asymmetries in bilateral supernova remnants
Figure 6. Synchrotron radio emission (normalized to the maximum of each panel), at t = 18 kyrs, synthesized from models
assuming a gradient of ambient plasma density (panels A and D; with b = 2) or of ambient magnetic field strength (panels B
and E; with b = 0) when the LoS is aligned with the y axis. All the models assume quasi-perpendicular particle injection. The
directions of the average unperturbed ambient magnetic field, 〈B〉, and of the plasma density or magnetic field strength gradient,
are shown in the upper left and lower right corners of each panel, respectively. The right panels show two examples of radio maps
(data adapted from Whiteoak & Green 1996 and Gaensler 1998; the arrows point in the north direction) collected for the SNRs
G338.1+0.4 (panel C) and G296.5+10.0 (panel F). The color scale is linear and is given by the bar on the right.
erated electrons increases on the south due to both the rise of
density and the decrease of the shock velocity.
When the LoS is aligned with the density gradient, the radio
images are symmetric. In the example shown in Fig. 5, this
corresponds to the maps derived when the LoS is along z (right
panels); the remnants are characterized by two opposed arcs
with identical surface brightness.
3.3. Morphology
Fig. 6 shows the radio emission maps, at a time of 18 kyrs, syn-
thesized from models with a gradient of ambient plasma den-
sity (panels A and D; assuming b = 2) and of ambient B field
strength (panels B and E; assuming b = 0). All the models as-
sume quasi-perpendicular particle injection (the isotropic case
produces radio maps with similar morphologies and the quasi-
parallel case is discussed later) and randomized internal mag-
netic field. The viewing angle is perpendicular both to the av-
erage direction of the unperturbed ambient magnetic field, 〈B〉,
(direct along the x axis) and to the gradients of density or field
strength (direct either along z, panels A and B, or x, panels D
and E). The right panels show, as examples, the radio maps of
the SNRs G338.1+04 (panel C, data from Whiteoak & Green
1996) and G296.5+10.0 (panel F, from Gaensler 1998).
In the quasi-perpendicular case discussed here, the max-
imum synchrotron emissivity is reached where the magnetic
field is strongly compressed. This configuration has been re-
ferred as “equatorial belt” (e.g. Rothenflug et al. 2004); 〈B〉
runs between the two opposed arcs (along the x axis). We found
that, when the density or the magnetic field strength gradient is
perpendicular to the field itself, the morphology of the radio
map strongly depends on the viewing angle. In these cases, the
two opposed arcs appear perfectly symmetric when the LoS
is aligned with the gradient (see, for instance, the right pan-
els in Fig. 5), otherwise the two arcs can have very different
radio brightness, leading to strongly asymmetric BSNRs (see
panels A and B in Fig. 6). In the former case (LoS aligned
with the gradient), the remnant is characterized by two axes of
symmetry: one between the two symmetric arcs and the other
perpendicular to the two. In models with strong magnetic field
S. Orlando et al.: On the origin of asymmetries in bilateral supernova remnants 9
strength gradients (DZ2; B varies by a factor ∼ 60 over 60 pc),
we found that the radio images are center-brightened when the
LoS is aligned with the gradient (figure not reported). The fact
that center-brightened remnants are not observed suggests that
the external B varies moderately in the neighborhood of the
remnants.
In case of asymmetry, the gradient is always perpendicular
to the arcs, and the brightest arc is located where either mag-
netic field strength or plasma density is higher (see panels A
and B in Fig. 6), since the synchrotron emission depends on
the plasma density, on the pressure, and on the field strength
(see Eqs. 6 and 8); in this case, there is only one axis of sym-
metry oriented along the density or B gradient. When the LoS
is parallel to 〈B〉 (along x in our models), the radio maps show
a shell structure with a maximum intensity located where mag-
netic field strength or plasma density is higher (see left pan-
els in Fig. 4 for isotropic and quasi-perpendicular cases and
left panels in Fig. 5). Our simulations show that, when the
density or the magnetic field strength gradient is perpendicu-
lar to the field itself, remnants with a monopolar morphology
can be observed at LoS not aligned with the gradient (see also
Reynolds & Fulbright 1990). Examples of observed monopolar
remnants are G338.1+0.4 (see panel C in Fig. 6) or G327.4+1.0
or G341.9-0.3.
When the density or B field strength gradient is parallel to
〈B〉 (panels D and E in Fig. 6) and the LoS lies in the plane per-
pendicular to 〈B〉, the morphology of the radio map does not
depend on the viewing angle and the two opposed arcs have
the same radio brightness. In these cases, however, there is only
one axis of symmetry and the two arcs appear slanted and con-
verging on the side where field strength or plasma density is
higher; again, the symmetry axis is aligned with the density
or B strength gradient. Examples of this kind of objects are
G296.5+10.0 (see panel F in Fig. 6) or G332.4-004 or SN1006
(which is, however, much younger than the simulated SNRs).
When the external magnetic field is parallel to the LoS, because
the system is symmetric about the magnetic field, the remnant
is axially symmetric and the radio maps show a complete radio
shell at constant intensity.
In the quasi-parallel case, 〈B〉 runs across the arcs. This
configuration has been referred as “polar caps” and it has been
invoked for the SN1006 remnant (Rothenflug et al. 2004). The
quasi-parallel case, apart from the center-brightened morphol-
ogy discussed in Sect. 3.1, can also produce remnant morpholo-
gies similar to those shown in Fig. 6. As examples, Fig. 7
shows the radio emission maps obtained in the cases dis-
cussed in Fig. 6 but assuming quasi-parallel instead of quasi-
perpendicular particle injection. Again, the viewing angle is
perpendicular both to 〈B〉 (direct along the x axis) and to the
gradients of density or field strength (direct either along z, pan-
els A and B, or x, panels C and D). In the quasi-parallel case,
remnants with a bright radio limb are produced if the gradi-
ent of ambient density or of ambient B field strength is par-
allel to 〈B〉 (instead of perpendicular to 〈B〉 as in the quasi-
perpendicular case), whereas slanting similar radio arcs are ob-
tained if the gradient is perpendicular to 〈B〉 (instead of parallel
as in the quasi-perpendicular case).
4. Discussion
Our simulations show that asymmetric BSNRs are explained
if the ambient medium is characterized by gradients either of
density or of ambient magnetic field strength: the two opposed
arcs have different surface brightness if the gradient runs across
the arcs (see panels A and B in Fig. 6, and panels C and D in
Fig. 7), whereas the two arcs appear slanted and converging
on one side if the gradient runs between them (see panels D
and E in Fig. 6 and panels A and B in Fig. 7). In all the cases
(including the three alternatives for the particle injection), the
symmetry axis of the remnant is always aligned with the gradi-
From the radio maps, we derived the azimuthal intensity
profiles: we first find the point on the map where the intensity
is maximum; then the contour of points at the same distance
from the center of the remnant as the point of maximum in-
tensity defines the azimuthal radio intensity profile. Following
Fulbright & Reynolds (1990), we quantify the degree of “bipo-
larity” of the remnants by using the so-called azimuthal inten-
sity ratio A, i.e. the ratio of maximum to minimum intensity
derived from the azimuthal intensity profiles. In addition, we
quantify the degree of asymmetry of the BSNRs by using a
measure we call the azimuthal intensity ratio Rmax ≥ 1, i.e.
the ratio of the maxima of intensity of the two limbs as de-
rived from the azimuthal intensity profiles, and the azimuthal
distance θD, i.e. the distance in deg of the two maxima. In the
case of symmetric BSNRs, Rmax = 1 and θD = 180
o. As al-
ready noted by Fulbright & Reynolds (1990), the parameter A
depends on the spatial resolution of the radio maps and on the
aspect angle (i.e. the angle between the LoS and the unper-
turbed magnetic field); moreover we note that, in real observa-
tions, the measure of A gives a lower limit to its real value if the
background is not accurately taken into account. On the other
hand, the parameters Rmax and θD have a much less critical de-
pendency on these factors and, therefore, they may provide a
more robust diagnostic in the comparison between models and
observations.
Fig. 8 shows the values of A, Rmax, and θD derived for all
the cases examined in this paper, considering the LoS aligned
with the y axis, and radio maps with a resolution of 25 beams
per remnant diameter4 (DSNR). Note that, our choice of the
LoS aligned with y (aspect angle φ = 90o) implies that the
values of A in Fig. 8 are upper limits, being A maximum at
φ = 90o and minimum at φ = 0o (see Fulbright & Reynolds
1990). The three models of particle injection (isotropic, quasi-
perpendicular and quasi-parallel) lead to different values of A.
In the isotropic and quasi-perpendicular cases, most of the val-
ues of A range between 5 and 20 (for model DX2, A is even
larger than 100); in the quasi-parallel case, the values of A are
larger than 500.
We found that, in general, a gradient of the ambient mag-
netic field strength leads to remnant morphologies similar to
those induced by a gradient of plasma density (compare, for
instance, panel A with B and panel D with E in Fig. 6). On the
other hand, if b < 0 in GX and GZ models, ambient B field
4 After the radio maps are calculated, they are convolved with a
gaussian function with σ corresponding to the required resolution.
10 S. Orlando et al.: On the origin of asymmetries in bilateral supernova remnants
Figure 7. Presentation as in Fig. 6, as-
suming quasi-parallel instead of quasi-
perpendicular particle injection.
gradients are more effective in determining the morphology of
asymmetric BSNRs. This is seen in a more quantitative form
in Fig. 8. DX and DZ models give Rmax values higher and θD
values lower than GX and GZ models with b < 0: a modest
gradient of the magnetic field (models DX1 and DZ1) gives
a value of Rmax higher or θD lower than the two models with
strong density gradients (models GX2 and GZ2) and b < 0.
Fig. 8 also shows that, in models with a density gradient,
the degree of asymmetry of the remnant increases with increas-
ing value of b; the GX and GZ models with b > 0 give values
of Rmax and θD comparable with (or, in the case of Rmax, even
larger of) those derived from DX and DZ models. In the case of
quasi-parallel particle injection for remnants with converging
similar arcs, it is necessary a strong gradient of density perpen-
dicular to B and b ≥ 0 (compare models GZ1 and GZ2 in the
lower panel in Fig. 8) to give values of θD comparable to those
obtained with a moderate gradient of ambient B field strength
perpendicular to B (see model DZ1 in Fig. 8).
In order to compare our model predictions with observa-
tions of real BSNRs, we have selected 11 SNR shells which
show one or two clear lobes of emission in archive total in-
tensity radio images, separated by a region of minima. We
have discarded all those cases in which several point-like or
extended sources appear superimposed to the bright limbs, or
other cases in which the location of maximum or minimum
emission around the shell is difficult to derive. Unlike other lists
of BSNRs published in the literature (e.g. Kesteven & Caswell
1987; Fulbright & Reynolds 1990; Gaensler 1998), here we fo-
cus on a reliable measure of the parameters A, Rmax and θD; we
avoid, therefore, patchy and irregular limbs, as in the case of
G320.4-01.2 of Gaensler (1998). Moreover, we are obviously
not discarding remnants which have constraints on A, Rmax or
θD (e.g. Fulbright & Reynolds 1990 considered only cases with
Rmax < 2), and we are considering remnants observed with a
resolution greater than 10 beams per remnant diameter. Since
in our models we follow the remnant evolution during the adi-
abatic phase, we also need to discard objects that are clearly
in the radiative phase. Unfortunately, for most of the objects
selected, there is no indication of their evolutionary stage in lit-
erature. Assuming that the remnant expands in a medium with
particle number density nism <∼ 0.3 cm
−3, the shock radius de-
rived from the Sedov solution at time ttr (i.e. at the transition
time from the adiabatic to the radiative phase; see Eq. 5) is
rtr = 19 E
−7/17
35 pc , (9)
where we have assumed that E51 = 1.5. Therefore, we only
considered remnants with radius rsnr < 35 pc (i.e. with size <
70 pc) that are, most likely, in the adiabatic phase. Our list does
not pretend to be complete or representative of the class, and
it is compiled to derive the observed values of the parameters
A, Rmax and θD with the lowest uncertainties. For this reason,
we have considered remnants for which a total intensity radio
image in digital format is available. Actually, in most of the
cases, we have used the 843 MHz data of the MOST supernova
remnant catalogue (Whiteoak & Green 1996).
S. Orlando et al.: On the origin of asymmetries in bilateral supernova remnants 11
Figure 8. Azimuthal intensity ratio A (i.e.
the ratio of maximum to minimum inten-
sity around the shell of emission - see text;
upper panel), azimuthal intensity ratio Rmax
(i.e. the ratio of the maxima of intensity
of the two limbs around the shell; middle
panel), and azimuthal distance θD (i.e. the
distance in deg of the two maxima of in-
tensity around the shell; lower panel) for
all the cases examined, considering the LoS
aligned with the y axis and a spatial res-
olution of 25 beams per remnant diame-
ter, DSNR. Black crosses: isotropic; red tri-
angles: quasi-perpendicular; blue diamonds:
quasi-parallel.
Our list is reported in Table 2. We have separated evolved
and young SNRs. While the young SNRs listed in Table 2 have
very reliable measurement of A, Rmax and θD and a good record
of literature, making them very good candidate to test the diag-
nostic power of our model, we stress that the models we are
considering in this paper are focused on evolved SNRs; we
leave the discussion about young SNRs to a separate work. For
each object in Table 2, we show the apparent size, the distance
(from dedicated studies where possible, otherwise from the re-
vised Σ − D relation of Case & Bhattacharya 1998; see their
paper for caveats on usage of the Σ − D relation to derive SNR
distance), the real size, the resolution of the observation, and
the parameters A, Rmax, and θD we have introduced here.
Table 2 shows that most of the 11 remnants have A ≤ 10,
i.e. values consistent with those derived in Fig. 8 for the three
alternatives for the particle injection (recall that the values
shown in the figure have to be considered as upper limits).
Four remnants show high values of A (10 < A < 100) that
are difficult to explain in terms of the isotropic or the quasi-
perpendicular injection models with b < 0 unless the remnant
expands through a non-uniform ambient magnetic field (see
models DX2, and DZ2 in Fig. 8). In the light of these consid-
erations, we cannot exclude a priori any of the three alternative
models for the particle injection.
Four of the 11 objects in Table 2 show values of Rmax ≥ 2,
pointing out that, in these objects, the bipolar morphology is
asymmetric with the two radio limbs differing significantly in
intensity. An example of this kind of remnants is G338.1+0.4
(see panel C in Fig. 6). In the light of our results, its morphol-
ogy can be explained if a gradient of ambient density or of
ambient magnetic field strength is either perpendicular to the
average ambient magnetic field, 〈B〉, in the isotropic and quasi-
perpendicular cases or parallel to 〈B〉 in the quasi-parallel case.
It is worth noting that reveling such a gradient from the obser-
vations may be a powerful diagnostic to discriminate among
the alternative particle injection models, producing real ad-
vances in the understanding of the nonthermal physics of strong
shock waves.
An extreme example of a monopolar remnant with a bright
radio limb is G327.4+1.0 whose value of Rmax is larger than
10. Fig. 8 shows that high values of Rmax can be easily ex-
plained as due to non-uniform ambient magnetic field strength
or to non-uniform ambient density if b > 0. We suggest that the
morphology of G327.4+1.0 may give some hints on the value
of b (and, therefore, on the dependence of the injection effi-
ciency on the shock velocity) if the observations show that the
asymmetry is due to a non-uniform ambient medium through
which the remnant expands.
12 S. Orlando et al.: On the origin of asymmetries in bilateral supernova remnants
Table 2. List of barrel-shaped SNR shells for which a measurement of A, Rmax, and θD is presented for comparison with our
models.
Remnanta Flux size d size Res.b A Rmax θD Ref./notes
Jy arcmin kpc pc beams/DSNR deg
Evolved Remnants
G296.5+10.0 48 90 × 65 2.1 55 × 40 108 > 11 1.2 85 1
G299.6-0.5 1.1 13 × 13 18.1 68 18 6 2 160 2
G304.6+0.1 18 8 × 8 7.9 18 11 20 1.5 120 3
G327.4+1.0 2.1 14 × 13 13.9 56 19 > 10 > 10 ND 2,4
G332.0+0.2 8.9 12 × 12 < 20 < 70 17 5 1 145 2,7
G338.1+0.4 3.8 16 × 14 9.9 46 × 40 21 3 2 > 120 2
G341.9-0.3 2.7 7 × 7 14.0 28 10 8 3 170 2
G346.6-0.2 8.7 11 × 10 8.2 26 × 23 15 2 1.1 110 2,7
G351.7+0.8 11 18 × 14 6.7 35 × 27 22 2 1.6 130 2
Young Remnants
G327.6+14.6 19 30 × 30 2.2 19 × 19 42 22 1 127 5
G332.4-0.4 34 11 × 10 3.1 10 × 9 15 7 1.6 98 6
References and notes. - (1) A.k.a. PKS 1209-51/52. Age: 3–20 kyrs, Roger et al. (1988). Distance from Giacani et al. (2000). (2) Distance
derived by Case & Bhattacharya (1998) using a revised Σ−D relation. (3) Distance from Caswell et al. (1975). (4) This shell has only one limb
(“monopolar” according to the definition of Fulbright & Reynolds 1990). A and Rmax are lower limits and no θD is derived. (5) A.k.a. SN1006.
Distance from Winkler et al. (2003). (6) A.k.a. RCW103. Distance from Reynoso et al. (2004). (7) Two maxima have been found in one lobe.
θD is the average of the two.
a All the data are from the MOST supernova remnant catalogue (Whiteoak & Green 1996), except where noted.
b Spatial resolution of the observation in beams per remnant diameter.
In Table 2, six of the 11 remnants (including the two young
remnants SN1006 and RCW103) have values of θD < 140
pointing out that, in these objects, the two bright radio limbs
appear slanted and converging on one side. An example of this
class of objects is G296.5+10.0 (a.k.a PKS 1209-51/52) shown
in panel F in Fig. 6. In this case, the value of θD ∼ 85
o de-
rived from the observations may be easily explained as due
to a gradient of magnetic field strength either parallel to 〈B〉
in the isotropic and quasi-perpendicular cases or perpendicu-
lar to 〈B〉 in the quasi-parallel case. Models with a gradient
of ambient density cannot explain the low values of θD found
for G296.5+10.0 unless the gradients are strong (the density
should change by a factor 60 over 60 pc) and the dependence
of the injection efficiency on the shock velocity gives5 b ≥ 2.
5. Conclusions
Our findings have significant implications on the diagnostics
and lead to several useful conclusions:
1. The three different particle injection models (namely,
quasi-parallel, quasi-perpendicular and isotropic dependence
of injection efficiency from shock obliquity) can produce con-
siderably different images (see Fig. 4). The isotropic and quasi-
perpendicular cases lead to radio images similar to those ob-
5 Large positive values of b do not necessarily mean an increas-
ing fraction of shock energy going into relativistic particles as the
shock slows down because decelerating shock accelerates particles to
smaller Emax , namely the maximum energy at which the electrons are
accelerated.
served. The parallel-case may produce radio images unlike any
observed SNR (center-brightened or with a double-peak struc-
ture not describable as a shell). This is in agreement with the
findings of Fulbright & Reynolds (1990).
2. In models with gradients of the ambient density, the
dependence of the injection efficiency on the shock velocity
(through the parameter b defined in Sect. 2.2) affects the degree
of asymmetry of the radio images: the asymmetry increases
with increasing value of b.
3. Small variations of the ambient magnetic field lead to
significant asymmetries in the morphology of BSNRs (see
Figs. 6 and 7). Therefore, we conclude that the close similar-
ity of the radio brightness of the opposed limbs of a BSNR
(i.e. Rmax ≈ 1 and θD ≈ 180
o) is evidence of uniform ambient
B field where the remnant expands.
4. Variations of the ambient density lead to asymmetries
of the remnant with extent comparable to that caused by non-
uniform ambient magnetic field if b = 2.
5. Strongly asymmetric BSNRs (i.e. Rmax ≫ 1 or θD ≪
180o) imply either moderate variations of B or strong (moder-
ate) variations of the ISM density if b < 2 (b ≥ 2) as in the
case, for instance, of interaction with a giant molecular cloud.
6. BSNRs with different intensities of the emission of the
radio arcs (i.e. Rmax > 1) can be produced by models with a
gradient of density or of magnetic field strength perpendicular
to the arc (upper panels in Fig. 6 and lower panels in Fig. 7),
and the brightest arc is in the region of higher plasma density
or higher magnetic field strength.
S. Orlando et al.: On the origin of asymmetries in bilateral supernova remnants 13
7. Remnants with two slanting similar arcs (i.e. θD < 180
can be produced by models with a gradient of density or of
magnetic field strength running centered between the two arcs
(lower panels in Fig. 6 and upper panels in Fig. 7), and the
region of convergence is where either the plasma density or the
magnetic field strength is higher.
8. In all the cases examined, the symmetry axis of the rem-
nant is always aligned with the gradient of density or of mag-
netic field.
We found that the degree of ordering of the magnetic field
downstream of the shock does not affect significantly the asym-
metries induced by gradients either of ambient plasma density
or of ambient magnetic field strength; thus our conclusions, de-
rived in the case of disordered magnetic field, do not change in
the case of ordered magnetic field.
We defined useful model parameters to quantify the degree
of asymmetry of the remnants. These parameters may provide a
powerful diagnostic in the comparison between models and ob-
servations, as we have shown in a few cases drawn from a ran-
domly selected sample of BSNRs presented in Table 2. For in-
stance, if the density of the external medium is known by other
means (e.g. thermal X-rays, HI and Co maps, etc.), BSNRs can
be very useful to investigate the variation of the efficiency of
electron injection with the angle between the shock normal and
the ambient magnetic field or to investigate the dependence of
the injection efficiency from the shock velocity. Alternatively,
BSNRs can be used as probes to trace the local configuration
of the galactic magnetic field if the dependence of the injection
efficiency from the obliquity is known.
It is worth emphasizing that our model follows the evolu-
tion of the remnant during the adiabatic phase and, therefore,
their applicability is limited to this evolutionary stage. In the
radiative phase, the high degree of compression suggested by
radiative shocks leads to increase of the radio brightness due
to compression of ambient magnetic field and electrons. Since
our model neglects the radiative cooling it is limited to rela-
tively small compression ratios and, therefore, it is not able to
simulate this mechanism of limb brightening.
It will be interesting to expand the present study, consider-
ing the detailed comparison of model results with observations.
This may lead to a major advance in the study of interactions
between the magnetized ISM and the whole galactic SNR pop-
ulation (not only BSNRs), since the mechanisms at work in the
BSNRs are also valid for SNRs of more complex morphology.
Acknowledgements. We thank the referee for constructive and help-
ful criticism. The software used in this work was in part devel-
oped by the DOE-supported ASC/Alliance Center for Astrophysical
Thermonuclear Flashes at the University of Chicago. The simula-
tions have been executed at CINECA (Bologna, Italy) in the frame-
work of the INAF-CINECA agreement. This work was supported by
Ministero dell’Istruzione, dell’Università e della Ricerca, by Istituto
Nazionale di Astrofisica, and by Agenzia Spaziale Italiana (ASI/INAF
I/023/05/0).
Appendix A: Derivation of Eq. (8)
We follow Reynolds (1998) in the description of the evolution of elec-
tron distribution. His approach is extended here to the possibility to
deal with non-uniform ISM (cf. Petruk 2006). Fluid element a ≡ R(ti)
was shocked at time ti, where R is the radius of the shock, and a is the
Lagrangian coordinate. At that time the electron distribution on the
shock was
N(Ei , a, ti) = Ks(a, ti)E
i , (A.1)
where Ei is the electron energy at time ti, Ks is the normalization of
the electron distribution immediately after the shock (in the following,
index “s” refers to the immediately post-shock values), and ζ is the
power law index. Since we are interested in radio emission, we have
to account for only energy losses of electrons due to the adiabatic
expansion (Reynolds 1998):
, (A.2)
where ρ is the mass density, so the energy varies as
E = Ei
ρ(a, t)
ρs(a, ti)
. (A.3)
The conservation law for the number of particles per unit volume per
unit energy interval
N(E, a, t) = N(Ei, a, ti)
a2 da dEi
σr2 dr dE
, (A.4)
where σ is the shock compression ratio and r is the Eulerian coor-
dinate, together with the continuity equation ρo(a)a
2da = ρ(a, t)r2dr
(index “o” refers to the pre-shock values) and the derivative
ρ(a, t)
ρs(a, ti)
)−1/3
, (A.5)
implies that downstream
N(E, a, t) = Ks(a, ti)E
ρ(a, t)
ρs(a, ti)
)(ζ+2)/3
. (A.6)
If Ks ∝ ρsVsh(t)
−b, where Vsh(t) is the shock velocity and ρs is the
immediately post-shock value of density, then
Ks(a, ti) = Ks(R, t)
ρo(a)
ρo(R)
Vsh(t)
Vsh(ti)
. (A.7)
Therefore, the distribution of relativistic electrons follows
K(a, t)
Ks(R, t)
N(E, a, t)
N(E,R, t)
ρo(a)
ρo(R)
Vsh(t)
Vsh(ti)
ρ(a, t)
ρs(a, ti)
)(ζ+2)/3
. (A.8)
Now we can substitute Eq. A.8 with the ratio of the shock velocities
which comes from the expression (Hnatyk & Petruk 1999)
P(a, t)
Ps(R, t)
ρo(a)
ρo(R)
)−2/3 (
Vsh(ti)
Vsh(t)
ρ(a, t)
ρs(R, t)
. (A.9)
Thus, finally
K(a, t)
Ks(R, t)
P(a, t)
Ps(R, t)
)−b/2 (
ρo(a)
ρo(R)
)−(b+ζ−1)/3 (
ρ(a, t)
ρs(R, t)
)5b/6+(ζ+2)/3
. (A.10)
This formula may easily be used to calculate the profile of K(a) for
known P(a) and ρ(a) in the case of the radial flow of fluid. In the case
when mixing is allowed, the position R should correspond to the same
part of the shock which was at a at time ti.
14 S. Orlando et al.: On the origin of asymmetries in bilateral supernova remnants
References
Balsara, D., Benjamin, R. A., & Cox, D. P. 2001, ApJ, 563, 800
Bell, A. R. & Lucek, S. G. 2001, MNRAS, 321, 433
Blondin, J. M., Wright, E. B., Borkowski, K. J., & Reynolds, S. P.
1998, ApJ, 500, 342
Case, G. L. & Bhattacharya, D. 1998, ApJ, 504, 761
Caswell, J. L., Murray, J. D., Roger, R. S., Cole, D. J., & Cooke, D. J.
1975, A&A, 45, 239
Dickel, J. R., van Breugel, W. J. M., & Strom, R. G. 1991, AJ, 101,
Dohm-Palmer, R. C. & Jones, T. W. 1996, ApJ, 471, 279
Ellison, D. C., Baring, M. G., & Jones, F. C. 1995, ApJ, 453, 873
Fryxell, B., Olson, K., Ricker, P., et al. 2000, ApJS, 131, 273
Fulbright, M. S. & Reynolds, S. P. 1990, ApJ, 357, 591
Gaensler, B. M. 1998, ApJ, 493, 781
Giacani, E. B., Dubner, G. M., Green, A. J., Goss, W. M., & Gaensler,
B. M. 2000, AJ, 119, 281
Ginzburg, V. L. & Syrovatskii, S. I. 1965, ARA&A, 3, 297
Hnatyk, B. & Petruk, O. 1999, A&A, 344, 295
Jokipii, J. R. 1987, ApJ, 313, 842
Jun, B.-I. & Norman, M. L. 1996, ApJ, 472, 245
Kane, J., Drake, R. P., & Remington, B. A. 1999, ApJ, 511, 335
Kesteven, M. J. & Caswell, J. L. 1987, A&A, 183, 118
Leckband, J. A., Spangler, S. R., & Cairns, I. H. 1989, ApJ, 338, 963
Löhner, R. 1987, Comp. Meth. Appl. Mech. Eng., 61, 323
Lucek, S. G. & Bell, A. R. 2000, MNRAS, 314, 65
Mac Low, M.-M. & Klessen, R. S. 2004, Reviews of Modern Physics,
76, 125
MacNeice, P., Olson, K. M., Mobarry, C., de Fainchtein, R., & Packer,
C. 2000, Comp. Phys. Comm., 126, 330
Mineshige, S. & Shibata, K. 1990, ApJ, 355, L47
Ostrowski, M. 1988, MNRAS, 233, 257
Petruk, O. 2005, J. Phys. Studies, 9, 364
Petruk, O. 2006, A&A, 460, 375
Petruk, O. & Bandiera, R. 2006, J. Phys. Studies, 10, 66
Reynolds, P. S. & Fulbright, S. M. 1990, in International Cosmic Ray
Conference, 72
Reynolds, S. P. 1998, ApJ, 493, 375
Reynolds, S. P. & Gilmore, D. M. 1993, AJ, 106, 272
Reynoso, E. M., Green, A. J., Johnston, S., et al. 2004, Publications of
the Astronomical Society of Australia, 21, 82
Roger, R. S., Milne, D. K., Kesteven, M. J., Wellington, K. J., &
Haynes, R. F. 1988, ApJ, 332, 940
Rothenflug, R., Ballet, J., Dubner, G., et al. 2004, A&A, 425, 121
Truelove, J. K. & McKee, C. F. 1999, ApJS, 120, 299
Whiteoak, J. B. Z. & Green, A. J. 1996, A&AS, 118, 329
Winkler, P. F., Gupta, G., & Long, K. S. 2003, ApJ, 585, 324
	Introduction
	Model
	Magnetohydrodynamic modeling
	Nonthermal electrons and synchrotron emission
	Results
	Obliquity angle dependence
	Non-uniform ISM: dependence from parameter b
	Morphology
	Discussion
	Conclusions
	Derivation of Eq. (8)
ABSTRACT
  AIMS: We investigate whether the morphology of bilateral supernova remnants
(BSNRs) observed in the radio band is determined mainly either by a non-uniform
interstellar medium (ISM) or by a non-uniform ambient magnetic field.
  METHODS: We perform 3-D MHD simulations of a spherical SNR shock propagating
through a magnetized ISM. Two cases of shock propagation are considered: 1)
through a gradient of ambient density with a uniform ambient magnetic field; 2)
through a homogeneous medium with a gradient of ambient magnetic field
strength. From the simulations, we synthesize the synchrotron radio emission,
making different assumptions about the details of acceleration and injection of
relativistic electrons.
  RESULTS: We find that asymmetric BSNRs are produced if the line-of-sight is
not aligned with the gradient of ambient plasma density or with the gradient of
ambient magnetic field strength. We derive useful parameters to quantify the
degree of asymmetry of the remnants that may provide a powerful diagnostic of
the microphysics of strong shock waves through the comparison between models
and observations.
  CONCLUSIONS: BSNRs with two radio limbs of different brightness can be
explained if a gradient of ambient density or, most likely, of ambient magnetic
field strength is perpendicular to the radio limbs. BSNRs with converging
similar radio arcs can be explained if the gradient runs between the two arcs.

<|endoftext|><|startoftext|>
Nonstationary pattern in unsynchronizable complex networks
Xingang Wang,1, 2 Meng Zhan,3 Shuguang Guan,1, 2 and Choy Heng Lai4, 2
1Temasek Laboratories, National University of Singapore, 117508 Singapore
2Beijing-Hong Kong-Singapore Joint Centre for Nonlinear & Complex Systems (Singapore),
National University of Singapore, Kent Ridge, 119260 Singapore
3Wuhan Institute of Physics and Mathematics, Chinese Academy of Sciences, Wuhan 430071, China
4Department of Physics, National University of Singapore, 117542 Singapore
(Dated: November 12, 2018)
Pattern formation and evolution in unsynchronizable complex networks are investigated. Due to the asym-
metric topology, the synchronous patterns formed in complex networks are irregular and nonstationary. For
coupling strength immediately out of the synchronizable region, the typical phenomenon is the on-off inter-
mittency of the system dynamics. The patterns appeared in this process are signatured by the coexistence of a
giant cluster, which comprises most of the nodes, and a few number of small clusters. The pattern evolution
is characterized by the giant cluster irregularly absorbs or emits the small clusters. As the coupling strength
leaves away from the synchronization bifurcation point, the giant cluster is gradually dissolved into a number of
small clusters, and the system dynamics is characterized by the integration and separation of the small clusters.
Dynamical mechanisms and statistical properties of the nonstationary pattern evolution are analyzed and con-
ducted, and some scalings are newly revealed. Remarkably, it is found that the few active nodes, which escape
from the giant cluster with a high frequency, are independent of the coupling strength while are sensitive to the
bifurcation types. We hope our findings about nonstationary pattern could give additional understandings to
the dynamics of complex systems and have implications to some real problems where systems maintain their
normal functions only in the unsynchronizable state.
PACS numbers: 89.75.-k, 05.45.Xt
I. INTRODUCTION
Synchronization of complex networks has aroused many in-
terest in nonlinear science since the discoveries of the small-
world and scale-free properties in many real and man-made
systems [1, 2]. In this study, one important issue is to ex-
plore the inter-dependent relationship between the collective
behaviors of the complex systems and their underlying topolo-
gies. In particular, many efforts have been paid to the con-
struction of optimal networks, and a number of factors which
have important affections to the synchronizability of complex
networks have been gradually disclosed. Now it is known that
random networks, due to their small average distances, are
generally more synchronizable than regular networks [3, 4];
and scale-free networks, with weighted and asymmetric cou-
plings, can be more synchronizable than homogeneous net-
works [5, 6]. In these studies, the standard method employed
for synchronization analysis is the master stability function
(MSF), where the network synchronizability is estimated by
an eigenratio calculated from the coupling matrix, and system
which has a smaller eigenratio is believed to be more synchro-
nizable than that of larger eigenratio [7]. Inspired by this, to
improve the network synchronizability, the only task seems to
be upgrading the coupling matrix so as to decrease the eigen-
ratio, either by changing the network topology [4] or by ad-
justing the coupling scheme [5, 6].
The MSF method, while bringing great convenience to the
analysis, overlooks the temporal, local property of the sys-
tem and reflects only partial information about the system
dynamics. Specifically, from MSF we only know ultimately
whether the network is globally synchronizable or unsynchro-
nizable, but do not know how the global synchronization is
reached if the network is synchronizable, or what’s the pat-
tern and how it evolves if the network is unsynchronizable.
These evolution details, or the transient behavior in system
development, contain rich information about the system dy-
namics and may give additional insights to the organization
of complex systems. For instance, the recent studies about
synchronization transition have shown that, in the unsynchro-
nizable states, heterogeneous networks are more synchroniz-
able (have a higher degree of coherence) than homogeneous
networks at small couplings, while at larger couplings the op-
posite happens [8]. This crossover phenomenon of network
synchronizability are difficult to understood if we only look
at the final state of the system, but are straightforward if we
look at the transient behaviors of their evolutions [8]. Besides
revealing the synchronization mechanisms, the transient be-
havior of network synchronization can also be used to detect
the topological scales and hierarchical structures in the real
systems, e.g., the detection of cluster structures in social and
biological networks [9, 10]. However, despite of its theoreti-
cal and practical significance, the study of transient dynamics
of complex networks is still at its infancy and many questions
remain open, say for example, the pattern evolution of unsyn-
chronizable complex networks.
Pattern formation in unsynchronizable but near-
synchronization networks has been an important issue
in studying the collective behavior of regular networks
[11, 12]. By setting the coupling strength nearby the
synchronization bifurcation point, the system state shares
both the dynamical properties of the synchronizable and
unsynchronizable states: a state of high coherence but is not
synchronized. The bifacial dynamical property makes this
state a natural choice in investigating the transition process of
networks synchronization. Previously studies about regular
http://arxiv.org/abs/0704.0892v1
networks, say for example the lattices [11], have shown that,
when the coupling strength is slightly out of the synchroniz-
able region, although global synchronization is unreachable,
nodes are still synchronized in a partial sense. That is, nodes
are self-organized into a number of synchronous clusters.
The distribution of these clusters, also called the synchronous
pattern, is determined by a set of factors such as the coupling
strength, the system size and the coupling scheme. As the
coupling strength leaves away from the bifurcation point, the
pattern structure becomes more and more complicated and the
system coherence will be decreased, and finally reaches the
turbulence state. It is worthy of note that the patterns arisen
in regular networks have two common properties: spatially
symmetric and temporally stationary. More specifically, the
contents of each cluster are fixed and the clusters are of
translation symmetry in space. For this reason, we say that
the patterns formed in regular networks are symmetric and
stationary. These two properties, as have been discussed in
the previous studies [11, 12], are rooted in the symmetric
topology of the regular networks. This makes it interesting
to ask the following question: how about the patterns in
unsynchronizable complex networks?
Different to the regular networks, in complex networks we
are not able to find any symmetry from their topologies. The
asymmetric topology, according to the pattern analysis devel-
oped in studying regular networks [11], will induce two sig-
nificant changes to the patterns: 1) the synchronous clusters, if
they exist, will be asymmetric; and 2) all the possible patterns,
including the one of global synchronization, are linearly un-
stable under small perturbations. In other words, the patterns
formed in complex networks are expected to be asymmetric
and nonstationary. Our mission of this paper is just to under-
stand and characterize the nonstationary patterns arisen in the
development of complex networks. Specifically, we are trying
to investigate the following questions: 1) is there any pattern
arises during the system evolution? 2) the pattern is stationary
or nonstationary? if nonstationary, how is it evolving and how
is it reflected from the system dynamics? 3) What happens
to the pattern properties during the transition of network syn-
chronization? and 4) How the coupling strength and bifurca-
tion type affect the pattern properties? By investigating these
dynamical and statistical properties, we wish to have a global
understanding to the dynamics of unsynchronizable complex
networks.
Our main findings are: 1) for coupling strength immedi-
ately outside of the synchronizable region, the system dynam-
ics undergoes the process of on-off intermittency. That is,
most of the time the system stays on the global synchroniza-
tion state (the ”off” state) but, once in a while, it develops
into a breaking state (the ”on” state) which is composed by
a giant cluster and a few number of small clusters (hereafter
we call it the giant-cluster state). As the system develops,
the giant cluster changes its shape by absorbing or emitting
the small clusters, leading to the ”off” or ”on” states, respec-
tively; 2) the few active nodes which escape from the giant
cluster with the high frequencies are coupling-strength inde-
pendent but are bifurcation-type dependent. That is, in the
neighboring region of a fixed bifurcation point, the locations
of these active nodes do not change with the coupling strength;
if we change the coupling strength from nearby another bi-
furcation point (the two bifurcation points will be explained
later), their locations will be totally changed; 3) as coupling
strength leaves away from the bifurcation point, the giant clus-
ter is gradually dissolved and more small clusters are gener-
ated from it. Eventually, the giant cluster disappears and the
pattern is composed by only the small clusters (hereafter we
call it the scattering–cluster state). During the course of sys-
tem evolution, each small cluster may either increase its size
by integrating with other small clusters or decrease its size
by breaking to even small clusters, but it can never reach to
the global synchronization state; 4) besides the giant cluster,
the giant- and scattering-cluster states are also distinct in their
small clusters. For giant-cluster state the size of the small
clusters follows a power-law distribution, while for scattering-
cluster state it follows a Gaussian distribution.
The rest of the paper is going to be arranged as follows. In
Sec. II we will give our model of coupled map network and,
based on the method of MSF, point out the two bifurcation
points and the transition areas that we are going to study with.
In Sec. III we will employ the method of finite-time Lyapunov
exponent to predict and describe the intermittent system dy-
namics in the bifurcation regions. Direct simulations about
on-off intermittency will be presented in Sec. IV. By intro-
ducing the method of temporal phase synchronization, in Sec.
V we will investigate in detail the dynamical and statistical
properties of the nonstationary pattern. Meanwhile, proper-
ties of the giant- and scattering states will be compared and
the transition between the two states will be conducted. In
Sec. VI we will discuss the phenomenon of active nodes and
investigate their dependence to the network properties. Dis-
cussions and conclusions about pattern evolution in complex
networks will be presented in Sec. VII.
II. COUPLED MAP NETWORKS AND THE TWO
BIFURCATION POINTS
Our model of coupled map network is of the following form
xi(t+ 1) = F(xi(t))− ε
Gi,jH [f(xj(t))] . (1)
where xi(t + 1) = F(xi(t)) is a d-dimensional map repre-
senting the local dynamics on node i, ε is a global coupling
parameter, G is Laplacian matrix representing the couplings,
and H is a coupling function. To facilitate our analysis, we
adopt the following coupling scheme [13]:
Gi,j = −
Ai,jk
j=1 Ai,jk
, (2)
for j 6= i and Gi,i = 1, with ki the degree of node i and
A the adjacent matrix of the network: Ai,j = 1 if node i and
j are connected and Ai,j = 0 otherwise. In comparison with
the traditional coupling schemes, one important advantage we
benefit from this coupling scheme is that the synchronizabil-
ity of the network, i.e. the eigenratio of the coupling matrix
described in Eq. [2], can be easily adjusted by the parameter
β, while the network topology is kept unchanged. This advan-
tage brings many convenience in network selection since for
a given network topology, even though it is unsynchronizable
under the traditional schemes, can now be synchronizable by
adjusting β in Eq. [2]. This convenience is of particular im-
portance when our studies of network dynamics are focused
on the bifurcation regions, where the network synchronizabil-
ity should be deliberately arranged in order to demonstrate
both the two types of bifurcations. We note that the adop-
tion of Eq. [2] is only for the purpose of convenient analy-
sis, the findings we are going to report are general and can
also be observed by other coupling schemes given the net-
work is properly prepared. In practice, we use logistic map
F(x) = 4x(1−x) as the local dynamics and adopt H(x) = x
as the coupling function.
We first locate the two bifurcation points of global syn-
chronization. The linear stability of the global synchroniza-
tion state {xi(t) = s(t), ∀i} is determined by the correspond-
ing variational equations, which can be diagonalized into N
blocks of form
y(t+ 1) = [DF(s) + σDH(s)] y(t), (3)
with DF(s) and DH(s) the Jacobian matrices of the corre-
sponding vector functions evaluated at s(t), and y represents
the different modes that are transverse to the synchronous
manifold s(t). We have σ(i) = ελi for the ith block, i =
1, 2, ..., N , and λ1 = 0 ≤ λ2 ≤ ... ≤ λN are the eigenvalues
of matrix G. The largest Lyapunov exponent Λ(σ) of Eq. [3],
known as the master stability function (MSF) [7], determines
the linear stability of the synchronous manifold s(t). In par-
ticular, the synchronous manifold is stable if only Λ(ελi) < 0
for each i = 2, ..., N . The set of Lyapunov exponents Λ(ελi)
govern the stability of the synchronous manifold in the trans-
verse spaces, and a positive value of Λ(ελi) represents the
loss of the stability in the transverse space of mode i. It was
found that for a large class of chaotic systems, Λ(σ) < 0
is only fulfilled within a limit range in the parameter space
σ ∈ (σ1, σ2). This indicates that, to make the global synchro-
nization state linearly stable, all the eigenvalues λi should be
contained within range (σ1, σ2), i.e., λN/λ2 < σ2/σ1. For
the logistic map employed here, it is not difficult to prove that
σ1 = 0.5 and σ2 = 1.5. Therefore, to achieve global syn-
chronization, the coupling matrix G should be designed with
eigenratio R ≡ λN/λ2 < σ2/σ1 = 3 = Rc.
Besides the condition of R < Rc, to guarantee the synchro-
nization, we also need to set the coupling strength in a proper
way: either small or large couplings may deteriorate the syn-
chronization. If ε < ε1 = σ1/λ2, the couplings are too weak
to restrict the node trajectories to the synchronous manifold;
while if ε > ε2 = σ2/λN , the couplings will be too strong
and actually act as large perturbations to the synchronization
manifold. Therefore, to achieve the global synchronization,
we also require ε1 < ε < ε2. The two critical couplings
ε1 and ε2, which are named as the long-wave (LW) [14] and
short-wave (SW) bifurcations [15] respectively in the studies
of regular networks, thus stand as the boundaries of the syn-
chronizable region. Our studies about network synchroniza-
0.80 0.85 0.90 0.95 1.00
 = 0.95127
 = 0.83488
FIG. 1: For scale-free network of N = 1000 nodes and of average
degree < k >= 8, a schematic plot on the generation of the two
bifurcation points as a function of the coupling strength. The long-
wave bifurcation occurs at about ε1 ≈ 0.83488 which is determined
by the condition ελ2 = σ1 = 0.5 (the lower line). The short-wave
bifurcation occurs at about ε2 ≈ 0.95127 which is determined by
the condition ελN = σ2 = 1.5 (the upper line).
tion will be focused on the neighboring regions of the two
bifurcation points, i.e., the region of ε . ε1 or ε & ε2.
By the standard BA growth model [1], we construct a scale-
free network of 103 nodes and of average degree 〈k〉 = 8. By
setting β = 2.5 in Eq. [2], we have λ2 ≈ 0.6 and λN ≈ 1.58.
Because of R = λN/λ2 ≈ 2.6 < Rc, the network is glob-
ally synchronizable. Also, because of λ2 > σ1 and λN > σ2,
both the two bifurcations can be realized by adjusting the cou-
pling strength within range ε ∈ (0, 1). In specific, when
ε < ε1 ≈ 0.835, we have ελ2 < σ1 and ελN < σ2, the
synchronous manifold loses its stability at the lower boundary
of the synchronizable region and LW bifurcation occurs; and
when ε > ε2 ≈ 0.95, we have ελ2 > σ1 and ελN > σ2, the
synchronous manifold loses its stability at the upper bound-
ary of the synchronizable region and SW bifurcation occurs
[Fig. 1]. In the following we will fix the network topology
and the parameter β, while generating the various patterns by
changing the coupling strength ε nearby the two bifurcation
points.
III. FINITE-TIME LYAPUNOV EXPONENT
Before direct simulations, we first give a qualitative de-
scription (prediction) on the possible system dynamics in bi-
furcation regions. To concrete our analysis, in the following
we will only discuss the situation of SW bifurcation (ε . ε1),
while noting that the same phenomena can be found at the
LW bifurcation as well (ε & ε2) . In preparing the unsynchro-
nizable states, we only let Λ(λ2) be slightly puncturing into
the unstable region, while keeping all the other exponents still
staying in the stable region, i.e., Λ(λ2) & 0 and Λ(λi) < 0
for i = 3, ...N . With this setting, the synchronous manifold
is only desynchronized in the transverse space of mode 2. As
such, the system possesses only two positive Lyapunov expo-
nents, one is Λ(λ0) which is associated to the synchronous
manifold itself and another one is Λ(λ2). Noticing that Λ(λ)
are asymptotic averages, and, as so, they account only for the
global stability properties, but do not warrant the possible co-
herent sets arising in the system evolutions. These coherent
sets, for regular networks, refer to the stationary, symmetric
patterns to which the system finally develops. While for com-
plex networks, these sets can be the temporal, irregular clus-
ters emerged in the process of system evolution.
In the region of ε . ε1, although global synchronization is
unreachable, the system may still keep with the high coher-
ence due to the existence of the synchronous clusters. Espe-
cially, there could be some moments at which all the trajecto-
ries are restrained to a small region in the phase space, very
close to the situation of global synchronization. This vary-
ing system coherence, however, can not be reflected from the
asymptotic value Λ(λ). To characterize the variation, we need
to employ some new quantities which are able to capture the
temporal behavior of system. One of such quantities is the
finite-time Lyapunov exponent (FLE), a technique developed
in studying chaos transition in nonlinear science [16]. In stead
of asymptotic average, FLE measures the diverging rate of
nearby trajectories only in a finite time interval T .
t=(i−1)T
lnDH(s(t)). (4)
As our studies are focused on the situation of one-mode
desynchronization, the stability of the synchronous manifold
and the temporal behavior that it displays are therefore ex-
pected to be more reflected from the variation of Λ2,i, the
FLE that associates with mode 2. With ε = 0.83 and
T = 100, we plot in Fig. 2 the time evolution of Λ2,i. It
is found that, although with a positive asymptotic value about
〈Λ2,i〉 ≈ 6 × 10
−3, the instant value of Λ2,i penetrates to the
negative region at a high frequency. According to the different
signs of Λ2,i, the system evolution is divided into two types of
intervals: the divergent interval and the contractive interval.
In the divergent intervals we have Λ2,i > 0 and the system
dynamics is temporarily dominated by the divergence of the
node trajectories from the synchronous manifold; while in the
contractive intervals we have Λ2,i < 0 and the system dynam-
ics is temporarily dominated by the convergence of the node
trajectories to the synchronous manifold.
The variation of Λ2,i, reflected on the process of pattern
evolution, characterizes the travelling property of the sys-
tem dynamics among the neighboring regions of two different
kinds of states: the desynchronization state and the synchro-
nization state. In Fig. 2, the minimum value of Λ2,i is about
−0.07, during this contractive interval the node trajectories
will converge to the synchronous manifold by an amount of
eminΛ2,iT ≈ e−7 ≈ 10−3 on average. Assuming that before
entering this interval the average distance between the node
trajectories is ∆ (for logistic map we always have ∆ < 1),
then at the end of this interval the average distance is de-
creased to ∆ × 10−3, a small value which is usually over-
shadowed by noise in practice. Due to this small distance,
0 400 800 1200 1600 2000
-0.08
-0.04
FIG. 2: For ε = 0.83 in Fig. 1, the time evolution of the finite-
time Lyapunov exponent Λ2,i calculated on intervals of length T =
100. It is observed that, while having the positive asymptotic value
〈Λ2,i〉 > 0, the temporal value of Λ2,i is penetrating into the negative
region frequently.
the system can be practically regarded as already reached the
synchronization state. On the other hand, if the system enters
a divergent interval, the node trajectories will diverge from
each other and, at the end of this interval, their average dis-
tance will be increased by an order of 103. This large distance
will deteriorate the ordered trajectories (or the high coherence
of the system dynamics) that achieved during the contractive
intervals, and leading to the incoherent, breaking state. The
pattern of the breaking state, however, is not unique. Depend-
ing on the initial conditions and the divergence intervals, the
pattern may assume the different configurations. Therefore,
based on the observations of Λ2,i [Fig. 2] the dynamics of
unsynchronizable networks can be intuitively understood as
an intermittent travelling among the synchronization state and
the different kinds of desynchronization states.
IV. ON-OFF INTERMITTENCY DESCRIBED BY
COMPLETE SYNCHRONIZATION
We now investigate the system dynamics by direct simula-
tions. To implement, we first prepare the system to be staying
on the synchronization state. This can be achieved by adopt-
ing a large coupling strength from the synchronizable region,
i.e. ε1 < ε < ε2. After synchronization is achieved, we then
decrease ε to a value slightly below the bifurcation point ε1
and, in the meantime, an instant small perturbation is added
on each node. In practice, we take i.i.d (independent iden-
tically distributed) noise of strength 1 × 10−5 as the pertur-
bations. After this, we release the system and let it develop
according to Eq. (1). Since ε < ε1, the synchronization state
is unstable and, triggered by the noises, the node trajectories
begin to diverge from each other. The divergent trajectories,
however, will frequently visit the neighborhood of the syn-
chronous manifold, especially during those contractive inter-
vals of small Λ2,i [Fig. 2]. The intermittent system dynamics
is plotted in Fig. 3(a), where the average trajectory distance
∆X = 1
i=1 xi − ~x is plotted as a function of time. As
we have predicted from LLE, the system dynamics indeed un-
dergoes an intermittent process. To characterize the intermit-
tency, we plot in Fig. 3(b) the laminar-phase distribution of
the ∆X sequence plotted in Fig. 3(a). It is found that the lam-
inar length τ (the time interval between two adjacent bursts
of amplitude ∆X(t) > 10−3) and the probability p(τ) for it
to appear follow a power-law scaling p(τ) ∼ τ−γ . The fitted
exponent is about γ ≈ −1.5 ± 0.05, with a fat tail at large τ
due to the finite simulating time.
In chaos theory, intermittent process of laminar-phase expo-
nent −3/2 is classified as the ”on-off” intermittency, a typical
phenomenon observed in dynamical systems with a symmet-
ric invariant set [17]. On-off intermittency is also reported
in chaos synchronization of regular networks, where the in-
variant set refers to the synchronous manifold, and the ”off”
state refers to the long stretches that the system dynamics is
staying nearby the synchronous manifold and the ”on” state
refers to the short bursts that the system dynamics is staying
away from the synchronous manifold. Therefore, in terms of
laminar-phase distribution, the intermittency we have found
in complex networks [Fig. 3] has no difference to the that of
the regular networks, despite of the drastic difference between
their topologies. We have also investigated the transition be-
havior of the averaged distance 〈∆X(t)〉 nearby the bifurca-
tion points. As shown in Fig. 3(c), a linear relation between
〈∆X(t)〉 and ε is found in the region of ε . ε1. This linear
transition of the system performance, again, is consistent with
the transition of regular networks [18]. Therefore, in terms of
complete synchronization, the on-off intermittency we have
found in complex networks has no difference to that of the
regular networks.
V. PATTERN EVOLUTION IN COMPLEX NETWORKS
To reveal the unique properties of the system dynamics that
induced by the complex topology, we go on to investigate
the pattern formation of unsynchronizable networks by the
method of temporal phase synchronization (TPS).
A. Temporal phase synchronization
TPS is defined as follows. Let xi(t) be the time sequence
recorded on node i, we first transform it into a symbolic se-
quence θi(t) according to the following equations
θi(t) =
0, if xi(t) < 0.5,
1, if xi(t) ≥ 0.5.
Then we divide θi(t) into short segments of the equal length
n. Regarding each segment as an new element, we there-
fore have transformed the long, variable sequence xi(t) into
a short, symbolic sequence Θi(t
′). If at moment t′ we have
′) = Θj(t
′), then we say that TPS is achieved between
0.820 0.825 0.830 0.835
0 2000 4000 6000 8000 10000
10 100 1000
FIG. 3: (Color online) The on-off intermittency of the system dy-
namics nearby the LW bifurcation at ε = 0.83. (a) The time evolu-
tion of the average trajectory distance ∆X . (b) The laminar-phase
distribution of ∆X , which follows a power-law scaling with the fit-
ted exponent around 3/2. (c) The transition behavior of the average
distance 〈∆X〉 nearby the LW bifurcation point ε1, where a linear
relation is found between the two quantities.
the nodes i and j. The collection of nodes which have the
same value of Θ at moment t′ are defined as a temporarily
synchronous cluster, and all the synchronous clusters consti-
tute the temporarily pattern of the system. During the course
of system evolution, the clusters will change their shapes and
contents and the pattern will change its configuration.
In comparison with the method of complete synchroniza-
tion, the advantage we benefit from TPS is obvious: it makes
the synchronous pattern detectable. With complete synchro-
nization, it is almost impossible for two nodes to have ex-
actly the same variable at the same time. Despite the fact that
at some moments the system has already reached the high-
coherence states (formed during those contractive intervals in
Fig. 3(a)), with complete synchronization we are not able to
distinguish these states from those low-coherence ones quanti-
tatively (formed during those divergent intervals in Fig. 3(a)).
(A remedy to this difficulty seems to define the clusters by
the method of threshold truncation, i.e., nodes are regarded
as synchronized if the distance between their trajectories is
smaller than some small value. However, this definition of
synchronization will induce the problem of cluster idenfica-
tion, as the same state may generate different patterns if we
choose the different reference nodes.) On the contrary, TPS
focuses on the loose match (phase synchronization) between
the node variables over a period of time. By requiring an ex-
act match of the discrete variable Θ, the synchronous pattern
is uniquely defined; while by requiring the match of the long
sequences of θ, the ”synchronous” nodes are guaranteed with
a strong coherence.
B. Pattern evolution of the giant-cluster state
With the same set of parameters as in Fig. 3(a), by the
method of TPS we plot in Fig. 4 the time evolutions of two ba-
sic quantities of pattern evolution: the number of synchronous
clusters nc and the size of the largest cluster Lmax. It is
found that, similar to the phenomenon in complete synchro-
nization [Fig. 3(a)], on-off intermittency is also found in the
TPS quantities nc and Lmax. In Fig. 4(a) it is shown that most
of the time the system is broken into only a few number of
clusters, nc = 2 or 3, while occasionally it is broken into a
quite large number of clusters, 10 < nc < 50, or united to the
synchronization state, nc = 1. The intermittent pattern evo-
lution is also reflected on the sequence of Lmax [Fig. 4(b)],
where most of the time the size of the giant cluster is about
Lmax ≈ N , while occasionally it decreases to some small
values of Lmax < N/2. As we have discussed previously,
the main advantage we benefit from TPS is in identifying the
clusters. This advantage is clearly shown in Figs. 4(a) and
(b), where for any time instant the two quantities nc and Li
are uniquely defined. Besides cluster identification, we also
benefit from TPS in quantifying the synchronization degrees.
In specific, the different coherence states shown in Fig. 3(a)
now can be clearly quantified: high coherence states are those
of smaller nc or larger Lmax. Specially, the synchronization
state now is unambiguously defined as the moments of nc = 1
in Fig. 4(a) or, equally, the moments of Lmax = N in Fig.
4(b).
We go on to investigate the pattern evolution by statistical
analysis. The first statistic we are interested is the laminar-
phase distribution of the synchronization state, i.e. the time
intervals that nc = 1 in Fig. 4(a) or Lmax = N in Fig. 4(b).
In its original definition, laminar phase refers to the time inter-
vals τ that all node trajectories stays within a small distance
from the synchronous manifold, therefore the actual value of
τ is varying with the predefined threshold distance. This un-
certainty is overcome in TPS. As shown in Fig. 4(a), in TPS
0 1x105 2x105 3x105
0 1x105 2x105 3x105
FIG. 4: For the same set of parameters as in Fig. 3(a). The time
evolutions of the TPS quantities. (a) The number of the synchronous
clusters nc and (b) the size of the giant cluster Lmax. The synchro-
nization state is defined as the moments nc = 1 in (a) or Lmax = N
in (b).
the ”off” state refers to the moments of nc = 1 specifically.
The laminar-phase distribution of nc is plotted in Fig. 5(a). In
consistency with the distribution of complete synchronization
[Fig. 3], the laminar-phase distribution of nc also follows a
power-law scaling and has the same exponent γ ≈ −1.5±0.1.
Therefore the use of TPS, while bringing convenience to the
pattern analysis, still capture the basic properties of the on-off
intermittency. The second statistic we are interested is the size
distribution of the largest cluster, an important indicator for
the coherence degree of the system. For the Lmax sequence
plotted in Fig. 4(b), in Fig. 5(b) we plot the size distribution
of Lmax. It is seen that the probability of finding large clus-
ter Lmax ≈ N is much higher than that of small cluster of
Lmax < 500. In particular, the probability for finding clusters
of Lmax > 990 is about 20 percent and for Lmax > 990 it
is about 70 percent. Therefore, in the region of ε . ε1, the
distinct feature of the system patterns is the existence of a gi-
ant cluster. Due to this special feature, we call these states the
giant-cluster state.
Besides the giant cluster, we are also interested in the prop-
erties of the small clusters. We plot in Fig. 5(c) the distribu-
tion of nc and in Fig. 5(d) the size distribution of the small
clusters Li that surround the giant cluster in the pattern. As
shown in Fig. 5(c), the distribution of nc follows a power-law
100 101 102
0 500 1000
1 10 100
1 10 100 1000
FIG. 5: (Color online) Statistical properties of the on-off intermit-
tency plotted in Fig. 4. (a) The power-law scaling of the laminar-
phase distribution of nc. The fitted slope is about −2.3 ± 0.05. (b)
The size distribution of the size of the giant cluster. (c) The power-
law distribution of the number of small clusters nc. The fitted slope
is about −3±0.1. (d) The power-law scaling on the size distribution
of the small clusters. The fitted slope is about −1.2± 0.01.
scaling with the fixed exponent is about γ ≈ −3± 0.05. The
heterogeneous distribution of nc indicates that, in the giant-
cluster state, the system is usually broken into only a few num-
ber of clusters. An interesting finding exists in the size distri-
bution of the small clusters. As shown in Fig. 5(d), in range
Li ∈ [1, N/2] a power-law scaling is found between PLi and
Li, with the fitted exponent is about γ ≈ −1.1 ± 0.05. The
distribution of Li confirms the finding of Fig. 5(c) that the
small clusters which join or separate from the giant cluster are
usually of small size.
Combining the findings of Fig. 4 and Fig. 5, the picture
of pattern evolution in the bifurcation region ε . ε1 now be-
comes clear. Generally speaking, the evolution can be divided
into two opposite dynamical processes happening around the
giant cluster: the separation and integration of the small clus-
ters. During the separation process, the small clusters are es-
caped from the giant cluster, which weakens the dominant role
of the giant cluster and makes the pattern complicated. How-
ever, the separated clusters occupy only a small proportion of
the nodes [Fig. 5(c)], the majority nodes are still attached to
the giant cluster, which sustains the synchronization skeleton
and keeps the system on the high coherence states. At some
rare moments the giant cluster may disappears, and the pat-
tern is composed by only small clusters of Li < N/2. At
these moments, the synchronization skeleton is broken, the
pattern becomes even complicated and the system coherence
reaches its minimum. In contrast, during the process of cluster
integration, the giant cluster will increase it size by attracting
the small clusters , and gradually towards the state of global
synchronization. It should be noticed that the separation and
integration processes are uneven and are typically occurring
at the same time. For instance, during the separation process,
while the system evolution is dominated by the separation of
new small clusters from the giant cluster, there could be some
small clusters rejoin to the giant cluster.
C. Pattern evolution of the scattering-cluster state
As we further decrease the coupling strength from ε1, the
picture of pattern evolution will be totally changed. With
ε = 0.79, we plot in Fig. 6 the same statistics as in Fig. 5.
The first observation is the loss of the global synchronization
state, as can be found from the time variation of nc plotted in
Fig. 6(a). The loss of global synchronization becomes even
clear if we compare Fig. 6(a) with Fig. 4(a): in Fig. 6(a),
except the moment at t = 0, the system can never reach the
synchronization state at nc = 1 and very often it is broken into
a large number of small clusters at about nc ∼ 10
2. The fact
that the pattern is decomposed into a large number of small
clusters is also manifested by the distribution of nc, as plotted
in Fig. 6(b). Instead of the power-law distribution found in the
giant-cluster state, in the scattering-cluster state nc follows a
Gaussian distribution [Fig. 6(b)]. As ε further decreases from
ε1, the mean value of nc will shift to the larger values, as
indicated by the ε = 0.78 curve plotted in Fig. 6(b). The sec-
ond observation is the disappearance of the giant cluster. As
shown in Fig. 6(c), the size distribution of the largest cluster
also follows a Gaussian distribution, with its mean value lo-
cates at 〈Lmax〉 < N/2. The distribution of Fig. 6(c) is very
different to that of Fig. 5(b), where in Fig. 5(b) the largest (gi-
ant) cluster has size Lmax ≈ N in most of the time. As ε de-
creases, the mean value of the largest cluster 〈Lmax〉 will shift
to small values and the variance of Lmax will be decreased, as
indicated by the ε = 0.78 curve plotted in Fig. 6(c). Similar to
plot of Fig. 5(d), we have also investigated the distribution of
Li, the sizes for all the small clusters appeared in the system
evolution [Fig. 6(d)]. It is found that the distribution of Li fol-
lows a power-law distribution for Li < N/2, while having an
exponential tail for Li > N/2. Numerically we find that the
exponent of the power-law section, i.e. in rangeLi ∈ [1, 200),
is about −2± 0.05, while the fitted exponent for the exponen-
tial section is about −4.5 × 10−3 ± 2 × 10−5. These two
exponents, however, are changing with ε. As ε decreases, the
two exponents will shift to some small values.
Combining Fig. 5 and Fig. 6, we are able to outline the
transition process of network synchronization nearby the bi-
furcation points, i.e., the transition from the giant-cluster state
to the scattering-cluster state as ε leaves away from ε1. In
the region of ε . ε1, the pattern is composed by a giant
cluster and a few number of small clusters, i.e. the giant-
cluster state. As ε decreases from ε1 gradually, more and
more small clusters will be emitted out from the giant clus-
ter and, as a consequence, both the size of the giant cluster
and the fraction of synchronization time will be decreased.
Then, at about εc ≈ 0.832, the giant cluster disappears and
the pattern of the system is composed by several larger clus-
ters, of size Lmax . N/2, together with many small clusters
of heterogenous size distribution, i.e. the scattering-cluster
state. After that, as ε decreases from εc, the clusters shrink
their size by breaking into even small clusters, and the pat-
0 200 400 600 800 1000
0 100 200 300 400 500
1 10 100 1000
0.0 5.0x104 1.0x105 1.5x105 2.0x105
 g=0.78
 g=0.79
 g=0.78
 g=0.79
(d)  g=0.78
 g=0.79
FIG. 6: (Color online) The dynamical and statistical properties of
pattern evolution for ε = 0.79. (a) The time evolution of nc. (b) The
Gaussian distribution of the number of the small clusters nc. (b) The
Gaussian distribution of the size of the largest cluster Lmax. (d) The
two-segment distribution on the size of the small clusters Li. In the
region of Li < 200, Li follows a power-law distribution with fitted
exponent is about −2 ± 0.05; while for Li > 200, the distribution
is exponential with the fitted exponent is about −4.5 × 10−3 ± 2×
−5. As ε further decreases from ε1, the largest cluster becomes
even smaller and more small clusters are emitted out from it. As
illustrated by the ε = 0.78 curves plotted in (b), (c) and (d).
tern becomes even complicated. The detail transition from the
giant-cluster state to the scattering-cluster state is presented
in Fig. 7, where the average number of clusters that the sys-
tem is broken into 〈nc〉, Fig. 7(a), and the average size of
the largest cluster 〈Lmax〉, Fig. 7(b), are plotted as a func-
tion of the coupling strength in the LW bifurcation region.
The transition is found to be smooth and steady, just as we
have expected. Besides the giant cluster, another difference
between the giant-cluster and scattering-cluster states exists
in their pattern evolutions. In the giant-cluster state, while
the configuration of the giant cluster is continuously updated
by emitting or absorbing the small clusters, its main contents
are stable and do not change with time. In contrast, in the
scattering-cluster state the small clusters integrate with or sep-
arate from each other in a random fashion. Although occa-
sionally there could be some large clusters show up in the pat-
tern of the scattering-cluster state [Fig. 6(d)], these ”large”
clusters, however, are very fragile and will break into small
clusters again in a short time. This quick-dissolving prop-
erty stops the scattering-cluster state from having a high co-
herence.
VI. CHARACTERIZING THE ACTIVE NODES
In the giant-cluster state, most of the nodes are organized
into the giant cluster while few nodes, either in forms of small
group or isolated node, are separating from or joining to the
giant cluster with a high frequency. These active nodes, al-
0.79 0.80 0.81 0.82 0.83 0.84 0.85
0.79 0.80 0.81 0.82 0.83 0.84 0.85
 > (0.832,500)
FIG. 7: The transition process of the network synchronization nearby
the LW bifurcation point ε1. (a) The average number of clusters that
the system is broken into as a function of coupling strength. (b) The
average size of the largest cluster as a function of coupling strength.
Each date is an averaged result over 108 time steps.
though are few in amount, play an important role in network
synchronization. Clearly, a proper characterization of these
nodes will deepen our understandings on the system dynam-
ics and give indications to the improvement of network perfor-
mance. For instance, to improve the synchronizability of the
system, we may either remove the few most active nodes from
the network, or update their coupling strengths specifically.
In characterizing the active nodes, the following properties
are of general interest: 1) what’s the dependence of the node
activity on the network topology? can we characterize these
nodes by the known network properties such as node degree
or betweenness? 2) are their locations sensitive to the coupling
strength? and 3) what’s the effect of bifurcation type on their
locations? In the following we will explore these questions by
numerical simulations.
We first try to characterize the active nodes by their topo-
logical properties. For the giant-cluster state described in Fig.
4, we plot in Fig. 8(a) the probability pu1 that each node stays
in the giant cluster. While the majority nodes stay in the gi-
ant cluster with a high probability pu1 ≈ 1, few nodes are
of unusually small probabilities: 1 percent of the nodes have
pu1 < 0.8. One important observation of Fig. 8(a) is that
the locations of the active nodes are entangled with those of
310 320 330 340 350
0 200 400 600 800 1000
160 170 180 190 200
0 200 400 600 800 1000
 0.9514
 0.9515
 0.952(c)
 0.8342
 0.834
 0.8335
 0.83(a)
FIG. 8: (Color online) The properties of the active nodes. (a) For the
giant-cluster state shown in Fig. 4, the probability that node stays in
the giant cluster versus the node index. (b) A segment of (a) but with
different coupling strengths nearby the LW bifurcation point ε1. (c)
For the giant-cluster state (ε = 0.952) nearby the SW bifurcation
point, the probability that node stays in the giant cluster versus the
node index. (d) A segment of (c) under different coupling strengths
nearby the SW bifurcation point ε2.
the stable nodes. Noticing that in the BA growth model node
of higher index in general assume the smaller degree, the ob-
servation of Fig. 8(a) therefore indicates the independence of
the node degree to the node stability, or the inaccuracy of us-
ing degree to characterize the node activity. Specifically, in
Fig. 8(a) the 5 most unstable nodes, by a descending order of
pu1, are those of degrees k = 47, 36, 26, 10, 4, respectively.
Except the one of k = 4, all the other nodes have higher de-
grees. Another well-known topological property of complex
network is the node betweenness, which counts the number
of shortest pathes that pass through each node and actually
evaluates the node importance from the global-network point
of view. This global-network property, however, is also inca-
pable to characterize the active nodes. In Tab. 1 we list the
detail information about the 5 most active nodes in Fig. 8(a),
where the inaccuracy of node degree or node betweenness in
characterizing the active nodes are summarized.
TABLE I: For the attaching probability pi plotted in Fig. 8(a), we
list the 5 most unstable nodes and try to characterize them by a set of
topological quantities including the node index i, the attaching prob-
ability pi, the stability rank pi rank, the node degree ki, the degree
rank ki rank, the node betweenness Bi, and the betweenness rank
Bi rank.
Node index i pi pi rank ki ki rank Bi Bi rank
615 0.72797 1 5 39→537 1301 280
762 0.74424 2 5 39→537 1375 356
680 0.75416 3 4 1→338 1254 680
372 0.7591 4 6 538→645 1440 406
938 0.75972 5 4 1→338 1215 159
We go on to investigate the affection of the coupling
strength on the locations of the active nodes. In Fig. 8(b) we
fix the network topology and compare the node activities un-
der different coupling strengths nearby the bifurcation point
ε1. It is found that, despite of the changes in pu1, the lo-
cations of the active nodes are kept unchanged. That is, the
active nodes are always the first ones to escape from the gi-
ant cluster whenever the network is unsynchronizable. We
have also investigated the affection of the bifurcation type on
the locations of the active nodes. By choosing the coupling
strength nearby the SW bifurcation ε = 0.952 & ε2, we plot
in Fig. 8(c) the node attaching probability pu2 as a function
of the node index i. An interesting finding is that, comparing
to the situation of LW bifurcation [Fig. 8(a)], the locations
of the active nodes have been totally changed in Fig. (c). In
Tab. 2 we list the detail information about the 5 most active
nodes in Fig. 8(c), again their locations can not be predicted
by the node degree or betweenness. Similar to the LW bifur-
cation, the locations of the active nodes are also independent
to the coupling strength at the SW bifurcation, as shown in
Fig. 8(d).
TABLE II: Similar to Tab. I but for the attaching probability pi plot-
ted in Fig. 8(c). Comparing to Tab. I, one important observation is
the changed locations of the active nodes due to the changed bifurca-
tion type.
Node index i pi pi rank ki ki rank Bi Bi rank
43 0.78196 1 9 779→813 2847 813
35 0.78969 2 18 936→940 8513 953
714 0.795 3 4 1→338 1215 158
130 0.79652 4 13 886→901 4200 886
154 0.19944 5 10 814→846 2898 815
Previous studies about network synchronization have
shown that, while individually it is difficult to predict the dy-
namical behavior of each node, the average performance of
an ensemble of nodes of the same network properties do have
some reliable characters. For instance, it has been shown that
in complex networks the high-degree nodes are on average
more synchronizable than the low-degree ones [19]. Regard-
ing to the problem of node activities, it is natural to ask the
similar question: are the high-degree nodes more synchro-
nized than the low-degree nodes? In Fig. 9 we plot the av-
erage attaching probability 〈pu1〉k as a function of degree k.
Still, we can not find a clear dependence of 〈pu1〉k on k.
VII. DISCUSSIONS AND CONCLUSION
It is worthy of note that our studies of active nodes are only
focused on the giant-cluster state, and the purpose is to under-
stand their dynamics and reveal their properties. By ensemble
0 30 60 90 120
FIG. 9: (Color online) The average attaching probability 〈pu1〉k as
a function of node degree k. On average, the 5 most unstable nodes
are those of degrees k = 47, 36, 26, 10, 4. Still, we can not find a
clear dependence between 〈pu1〉k and k.
average, we may able to improve our prediction of the active
nodes, say for example the dependence of 〈pu1〉k on k in Fig.
9 may be smoothed if we average the results over a large num-
ber of network realizations. Such an improvement, however,
comes at the cost of the decreased prediction accuracy due to
the increased candidates. Taking Fig. 9 as an example, al-
though it is noticed that nodes of k = 4 in general are more
active than those of other degrees, only one of them is listed
as the 5 most unstable nodes [Tab. 1]. In specific, among the
total number of 338 nodes which have degree k = 4, most
of them are tightly attracted to the giant cluster (90 percent of
them have attaching probabilities pu1 > 0.95). Therefore, in
terms of precise predication, the average method is infeasible
in practice.
Beside node degree and betweenness, we have also checked
the dependence of the property of node activity to some other
well-known network properties such as the clustering coeffi-
cient, the modularity, and the assortativity. However, none of
them is suitable to characterize the active nodes, their perfor-
mance is very similar to that of the node degree described in
Tab. 1 and Tab. 2. Our study thus suggests that, to give a pre-
cise prediction to the active nodes, we may need to develop
some new quantities.
Despite of the amount of studies carried on network syn-
chronization, to the best of our knowledge, we are the first to
study the nonstationary pattern in unsynchronizable complex
networks. In Ref. [9] the authors have discussed the transient
process of global synchronization in complex networks, but
their study are concentrating on the synchronizable state in
which, during the course of system evolution, small clusters
integrate into larger clusters monotonically and finally reach
the synchronization state. After that, the system will always
stay on the synchronization state. Our works are also different
to the studies of Refs. [10, 20]. Similar to our works, in these
studies the authors also consider the problem of pattern for-
mation in unsynchronizable networks, but their interests are
focused on the stationary pattern of the system. That is, the
size and contents of the clusters do not change with time. In
contrast, in our studies both the size and contents of the clus-
ters are variable.
In summary, we have reported and investigated a kind of
new phenomena in network synchronization: the nonstation-
ary pattern. That is, the final state of the network settles nei-
ther to the synchronization state nor to any stationary state
of fixed pattern, the system is travelling among all the possi-
ble patterns in an intermittent fashion (the pattern can be of
any configuration, but its probability of showing up is pattern-
dependent). We attribute this nonstationarity to the asymmet-
ric topology of the complex networks, and its dynamical ori-
gin can be understood from the property of the finite-time
Lyapunov exponent associated to the desynchronized mode.
Two types of synchronization formats, the complete synchro-
nization and the temporal phase synchronization, have been
employed to detect the nonstationary dynamics. For coupling
strength immediately out of the stable region, the pattern evo-
lution is characterized by the process of on-off intermittency
and the existence of the giant-cluster; while if the coupling
strength is far away from the bifurcation points, the pattern
evolution is signatured by the random interactivities among
the number of small clusters. A remarkable finding is that,
in the giant-cluster state the locations of the active nodes are
independent of the coupling strength but are sensitive to the
bifurcation types. The active nodes, however, can not be char-
acterized by the currently known network properties, further
investigations about their identification are necessary. While
we are hoping our studies about nonstationary pattern could
give some new understandings to the dynamics of coupled
complex systems, we also hope that our findings about unsyn-
chronizable networks could be used to some practical prob-
lems where system maintains their normal functions only un-
der the unsynchronizable states, for example the problem of
epileptic seizers [21].
[1] D.J. Watts and S.H. Strogatz, Nature 393, 440 (1998); A.-L.
Barabási and R. Albert, Science 286, 509 (1999); R. Albert and
A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
[2] S. Boccaletti and L.M. Pecora, Chaos 16, 015101 (2006); A.
E. Motter, M. A. Matı́as, J. Kurths, E. Ott, Physica D 224, 7
(2006); S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and
D.-U. Hwang, Phys. Rep. 424, 175 (2006).
[3] X.F. Wang and G. Chen, Int. J. Bifurcation Chaos Appl. Sci.
Eng. 12, 187 (2002).
[4] T. Nishikawa, A. E. Motter, Y.-C. Lai, and F. C. Hoppensteadt,
Phys. Rev. Lett. 91, 014101 (2003).
[5] A.E. Motter, C. Zhou, and J. Kurths, Europohys. Lett. 69, 334
(2005); Phys. Rev. E 71, 016116 (2005); AIP Conf. Proc. 776,
201 (2005); C. Zhou, A.E. Motter, and J. Kurths, Phys. Rev.
Lett. 96, 034101 (2006).
[6] M. Chavez, D.-U. Hwang, A. Amann, H.G.E. Hentschel, and S.
Boccaletti, Phys. Rev. Lett. 94, 218701 (2005); D.-U. Hwang,
M. Chavez, A.Amann, and S. Boccaletti, Phys. Rev. Lett. 94,
138701 (2005).
[7] L.M. Pecora and T.L. Carroll, Phys. Rev. Lett. 80, 2109 (1998);
M. Barahona and L. M. Pecora, ibid, 89, 054101 (2002).
[8] D.-S. Lee, Phys. Rev. E 72, 026208 (2005); J. Gómez-
Gardeñes, Y. Moreno, and A. Arenas, Phys. Rev. Lett. 98,
034101 (2007).
[9] A. Arenas, A. Dı́az-Guilera, and C. J. Pérez-Vicente, Phys. Rev.
Lett. 96, 114102 (2006); C. Zhou, L. Zemanová, G. Zamora, C.
C. Hilgetag, and J. Kurths, ibid, 97, 238103 (2006)
[10] S. Boccaletti, M. Ivancheko, V. Latora, A. Pluchino, and A.
Rapisarda, Preprint physics/0607179 (2006).
[11] M. Zhan, Z.G. Zheng, G. Hu, and X.H. Peng, Phys. Rev. E
62, 3552 (2000); Y. Zhang, G. Hu, H.A. Cerdeira, S. Chen, T.
Braun, and Y. Yao, Phys. Rev. E 63, 026211 (2001); B. Ao and
Z. Zheng, Europhys. Lett. 74, 229 (2006); X. Zhang, M. Fu, J.
Xiao, and G. Hu, Phys. Rev. E 74, 015202 (2006).
[12] E. Ott and J.C. Sommerer, Phys. Lett. A 188, 39 (1994); M.
Ding and W. Yang, Phys. Rev. E 56, 4009 (1997); S.H. Wang,
J. Xiao, X.G. Wang, B. Hu, and G. Hu, Eur. Phys. J. B 30, 571
(2002); J.G. Restrepo, E. Ott, and B. Hunt, Phys. Rev. Lett. 93,
114101 (2004).
[13] X.G. Wang, Y.-C. Lai, and C.-H. Lai, Preprint nlin.CD/0608035
(2006).
[14] L. A. Bunimovich, A. Lambert, and R. Lima, J. Stat. Phys. 65,
253 (1990); J.F. Heagy, T.L. Carroll, and L.M. Pecora, Phys.
Rev. Lett. 73, 3528 (1994); ibid, 74, 4185 (1995).
[15] M.A. Matias, V.P. Munuzuri, M.N. Lorenzo, I.P. Marino, and
V.P. Villar, Phys. Rev. Lett. 78, 219 (1997); G. Hu, J. Yang, and
W. Liu, Phys. Rev. E 58, 4440 (1998).
[16] A. Pikovsky and U. Feudel, Chaos 5, 253 (1995); X. Wang, M.
Zhan, C. H. Lai, and Y.-C. Lai, Phys. Rev. Lett. 92, 074102
(2004).
[17] N. Platt, E.A. Spiegel, and C. Tresser, Phys. Rev. Lett. 70,
279 (1993); E. Ott and J. C. Sommerer, Phys. Lett. A 188, 39
(1994); Y. Nagai and Y.-C. Lai, Phys. Rev. E 55, 1251 (1997).
[18] A.S. Pikovsky, M.G. Rosenblum, and J. Kurths, Synchroniza-
tion: A Universal Concept in Nonlinear Science (Cambridge
University Press, Cambridge, UK, 2001); S. Strogatz, Sync:
The Emerging Science of Spontaneous Order (Hyperion, New
York, 2003).
[19] C. Zhou and J. Kurths, Chaos 16, 015104 (2006).
[20] S. Jalan, R.E. Amritkar, and C.-K. Hu, Phys. Rev. E 72, 016211
(2005); ibid, 72, 016212 (2005).
[21] Y.-C. Lai, M. G. Frei, I. Osorio, and L. Huang, Phys. Rev. Letts.
92, 108102 (2007).
http://arxiv.org/abs/physics/0607179
http://arxiv.org/abs/nlin/0608035
ABSTRACT
  Pattern formation and evolution in unsynchronizable complex networks are
investigated. Due to the asymmetric topology, the synchronous patterns formed
in complex networks are irregular and nonstationary. For coupling strength
immediately out of the synchronizable region, the typical phenomenon is the
on-off intermittency of the system dynamics. The patterns appeared in this
process are signatured by the coexistence of a giant cluster, which comprises
most of the nodes, and a few number of small clusters. The pattern evolution is
characterized by the giant cluster irregularly absorbs or emits the small
clusters. As the coupling strength leaves away from the synchronization
bifurcation point, the giant cluster is gradually dissolved into a number of
small clusters, and the system dynamics is characterized by the integration and
separation of the small clusters. Dynamical mechanisms and statistical
properties of the nonstationary pattern evolution are analyzed and conducted,
and some scalings are newly revealed. Remarkably, it is found that the few
active nodes, which escape from the giant cluster with a high frequency, are
independent of the coupling strength while are sensitive to the bifurcation
types. We hope our findings about nonstationary pattern could give additional
understandings to the dynamics of complex systems and have implications to some
real problems where systems maintain their normal functions only in the
unsynchronizable state.

<|endoftext|><|startoftext|>
Topological phase for spin-orbit transformations on a laser beam
C. E. R. Souza†, J. A. O. Huguenin†, P. Milman††, and A.Z. Khoury†
Instituto de F́ısica, Universidade Federal Fluminense, 24210-340 Niterói - RJ, Brasil. and
Laboratoire Matériaux et Phenomènes quantiques CNRS UMR 7162,
Université Denis Diderot, 2 Place Jussieu 75005 Paris cedex.
We investigate the topological phase associated with the double connectedness of the SO(3)
representation in terms of maximally entangled states. An experimental demonstration is provided
in the context of polarization and spatial mode transformations of a laser beam carrying orbital
angular momentum. The topological phase is evidenced through interferometric measurements and
a quantitative relationship between the concurrence and the fringes visibility is derived. Both the
quantum and the classical regimes were investigated.
PACS numbers: PACS: 03.65.Vf, 03.67.Mn, 07.60.Ly, 42.50.Dv
The seminal work by S. Pancharatnam [1] introduced
for the first time the notion of a geometric phase acquired
by an optical beam passing through a cyclic sequence
of polarization transformations. A quantum mechanical
parallel for this phase was later provided by M. Berry
[2]. Recently, the interest for geometric phases was re-
newed by their potential applications to quantum compu-
tation. The experimental demonstration of a conditional
phase gate was recently provided both in nuclear mag-
netic ressonance [3] and trapped ions [4]. Another optical
manifestation of geometric phase is the one acquired by
cyclic spatial mode conversions of optical vortices. This
kind of geometric phase was first proposed by van Enk
[5] and recently found a beautiful demonstration by E. J.
Galvez et al [6].
The Hilbert space of a single qubit admits an useful ge-
ometric representation of pure states on the surface of a
sphere. This is the Bloch sphere for spin 1/2 particles or
the Poincaré sphere for polarization states of an optical
beam. A Poincaré sphere representation can also be con-
structed for the first order subspace of the spatial mode
structure of an optical beam [7]. Therefore, in the quan-
tum domain, we can attribute two qubits to a single pho-
ton, one related to its polarization state and another one
to its spatial structure. Geometrical phases of a cyclic
evolution of the mentioned states can be beautifully in-
terpreted in such representations as being related to the
solid angle of a closed trajectory. However, in order to
compute the total phase gained in a cyclic evolution, one
should also consider the dynamical phase. When added
to the geometrical phase, it leads to a total phase gain
of π after a cyclic trajectory. This phase has been put
into evidence for the first time using neutron interfer-
ence [8]. The appearence of this π phase is due to the
double connectedness of the three dimensional rotation
group SO(3). However, in the neutron experience, only
two dimensional rotations were used, and this topologi-
cal property of SO(3) was not unambiguously put into
evidence, as explained in details in [9, 10].
As discussed by P. Milman and R. Mosseri [9, 11], when
the quantum state of two qubits is considered, the mathe-
matical structure of the Hilbert space becomes richer and
the phase acquired through cyclic evolutions demands a
more careful inspection. The naive sum of independent
phases, one for each qubit, is applicable only for prod-
uct states. In this case, the two qubits are geometrically
represented by two independent Bloch spheres. When
a more general partially entangled pure state is consid-
ered, the phase acquired through a cyclic evolution has
a more complex structure and can be separated in three
contributions: dynamical, geometrical and topological.
Maximally entangled states are solely represented on the
volume of the SO(3) sphere which has radius π and its di-
ametrically opposite points identified. This construction
reveals two kinds of cyclic evolutions, each one mapped
to a different homotopy class of closed trajectories in the
SO(3) sphere. One kind is mapped to closed trajecto-
ries that do not cross the surface of the sphere (0−type)
and the other one is mapped to trajectories that cross
the surface (π−type). The phase acquired by a maxi-
mally entangled state is 0 for the first kind and π for the
second one.
In the present work we demonstrate the topological
phase associated to polarization and spatial mode trans-
formations of an optical vortice. This phase appears first
in the classical description of a paraxial beam with ar-
bitrary polarization state and has its quantum mechan-
ical counterpart in the spin-orbit entanglement of a sin-
gle photon, which constitutes one possible realization of
a two-qubit system and the topological phase discussed
in Ref.[9]. However, it is interesting to observe that,
like the Pancharatnam phase, the two-qubit topological
phase also admits a classical manifestation, since it can
be implemented on the classical amplitude of the opti-
cal field. This is also the first experiment unambiguously
showing the double connectedness of the rotation group
SO(3). The optical modes used in our experiment have
a mathematical structure analog to the one of entangled
states, so that the geometrical representation developped
in [10] also applies and the results of Ref.[9, 11] can be
experimentally demonstrated. When excited with single
photons, these modes give rise to single particle entangled
http://arxiv.org/abs/0704.0893v1
states and provide a more direct relationship with the
ideas put forward in Refs.[9, 10, 11]. This regime is also
investigated in the present work. There are a number of
quantum computing protocols that can be implemented
with single particle entanglement and will certainly ben-
efit from our results.
Let us now combine the spin and orbital degrees of
freedom in the framework of the classical theory in order
to build the same geometric representation applicable to
a two-qubit quantum state. Consider a general first order
spatial mode with arbitrary polarization state:
E(r) = αψ+(r)êH + βψ+(r)êV + γψ−(r)êH
+ δψ−(r)êV , (1)
where êH(V ) are two linear polarization unit vectors
along two orthogonal directions H and V , and ψ±(r)
are the normalized first order Laguerre-Gaussian pro-
files which are orthogonal solutions of the paraxial wave
equation [12]. We may now define two classes of spatial-
polarization modes: the separable (S) and the nonsepa-
rable (NS) ones. The S modes are of the form
E(r) = (α+ψ+(r) + α−ψ+(r)) (βH êH + βV êV ) . (2)
For these modes, a single polarization state can be
atributted to the whole wavefront of the paraxial beam.
They play the role of separable two-qubit quantum
states.
For nonseparable (NS) paraxial modes, the polariza-
tion state varies across the wavefront. As for entangle-
ment in two-qubit quantum states, the separability of a
paraxial mode can be quantified by the analogous defi-
nition of concurrence. For the spin-orbit mode described
by Eq.(1), it is given by:
C = 2 | αδ − βγ | . (3)
Let us first consider the maximally nonseparable
modes (MNS) of the form
E(r) = αψ+(r)êH + βψ+(r)êV − β∗ψ−(r)êH
+ α∗ψ−(r)êV . (4)
For these modes C = 1. It is important to mention that
the concept of entanglement does not applies to the MNS
mode, since the object described by Eq.(4) is not a quan-
tum state, but a classical amplitude. However, we can
build an SO(3) representation of the MNS modes as it
was done in Refs.[11, 13]. Let us define the following
normalized MNS modes:
E1(r) =
[ψ+(r)êH + ψ−(r)êV ] ,
E2(r) =
[ψ+(r)êH − ψ−(r)êV ] , (5)
E3(r) =
[ψ+(r)êV + ψ−(r)êH ] ,
E4(r) =
[ψ+(r)êV − ψ−(r)êH ] .
Laser
HWP-A HWP-B
HWP-1
HWP-2
HWP-3
Pol-V Pol-H
QWP-2
QWP-1
FIG. 1: Experimental setup.
The SO(3) sphere is then constructed in the following
way: mode E1 is represented by the center of the sphere,
while modes E2, E3, and E4 are represented by three
points on the surface, connected to the center by three
mutually orthogonal segments. Each point of the SO(3)
sphere corresponds to a MNS mode. Following the recipe
given in Ref.[13], the coefficients α and β of Eq.(4) are
parametrized to:
α = cos
− i kz sin
β = −(ky + i kx) sin
, (6)
where (kx, ky , kz) = k is a unit vector, and a is an angle
between 0 and π. With this parametrization, each MNS
mode is represented by the vector ak in the sphere.
In order to evidence the topological phase for cyclic
transformations, we must follow two different closed
paths, each one belonging to a different homotopy class,
and compare their phases. The experimental setup is
sketched in Fig.(1). First, a linearly polarized TEM00
laser mode is diffracted on a forked grating used to gen-
erate Laguerre-Gaussian beams [14]. The two side orders
carrying the ψ+(r) and ψ−(r) spatial modes are trans-
mitted through half waveplates HWP-A and HWP-B, fol-
lowed by two orthogonal polarizers Pol-V and Pol-H, and
finally recombined at a beam splitter (BS-1). Half wave-
plates HWP-A and HWP-B are oriented so that their
fast axis are paralell. This allows us to adjust the mode
separability at the output of BS-1 without changing the
corresponding output power, what prevents normaliza-
tion issues.
Experimentally, an MNS mode is produced when both
HWP-A and HWP-B are oriented at 22.5o , so that the
FIG. 2: Interference patterns for a-) a maximally nonsepara-
ble, and b-) a separable mode. From left to right the images
were obtained with QWP-2 oriented at −45o, 0o, and 45o,
respectively.
setup prepares mode E1 located at the centre of the
sphere. Other MNS modes can then be obtained by uni-
tary transformations in only one degree of freedom. Since
polarization is far easier to operate than spatial modes
we choose to implement the cyclic transformations in the
SO(3) sphere using waveplates. The MNS mode E1 is
first transmitted through three waveplates. The first one
(HWP-1) is oriented at 0o and makes the transformation
E1 → E2, the second one (HWP-2) is oriented at −45o
and makes the transformation E2 → E3, and the third
one (HWP-3) is oriented at 90o and makes the transfor-
mation E3 → E4. Finally, two alternative closures of
the path are performed in a Michelson interferometer.
In one arm a π−type closure is implemented by dou-
ble pass through a quarter-waveplate (QWP-1) fixed at
−45o. In the other arm, either a 0−type or a π−type
closure is performed by a double pass through another
quarter-waveplate (QWP-2) oriented at a variable angle
between −45o (π−type) and 45o (0−type). These tra-
jectories are analogous to spin rotations around different
directions of space [13]. They evidence the topological
properties of the three dimensional rotation group.
In order to provide spatial interference fringes, the in-
terferometer was slightly misaligned. The interference
patterns were registered with either a charge coupled
device (CCD) camera or a photocounter (PC), depend-
ing on the working power. First, we registered the
interference patterns obtained when an intense beam
is sent through the apparatus. The images shown in
Fig.(2a) demonstrate clearly the π topological phase
shift. The phase singularity characteristic of Laguerre-
Gaussian beams can be easily identified in the images
and is very useful to evidence the phase shift. When
both arms perform the same kind of trajectory in the
SO(3) sphere (QWP-1 and QWP-2 oriented at −45o), a
bright fringe falls on the phase singularity. When QWP-
2 is oriented at 45o, the trajectory performed in each arm
belongs to a different homotopy class and a dark fringe
falls on the singularity, what clearly demonstrates the π
topological phase shift.
In order to discuss the role played by mode separa-
bility, it is interesting to observe the pattern obtained
when QWP-2 is oriented at intermediate angles, which
correpond to open trajectories in the SO(3) sphere. We
observed that during the phase shift transition, the in-
terference fringes are deformed and finally return to its
initial topology with the π phase shift. This is clearly il-
lustrated by the intermediate image displayed in Fig.(2a),
which corresponds to QWP-2 oriented at 0o . Notice that,
despite the deformation, the interference fringes display
high visibility.
As we mentioned above, the mode preparation settings
can be adjusted in order to provide a separable mode. For
example, when we set HWP-A and B both at 45o , the
output of BS-1 is the separable mode ψ+(r)êH , which
can be represented in the Poincaré spheres for spatial
and polarization modes. The same π phase shift can
be observed when QWP-2 is rotated, but the transition
is essentially different. The intereference pattern is not
topologically deformed, but its visibility decreases until
it completelly vanishes at 0o , and then reappears with
the π phase shift. This transition is clearly illustrated
by the three patterns displayed in Fig.(2b). In this case,
the π phase shift is of purely geometric nature, since the
spatial mode is kept fixed while the polarization mode is
turned around the equator of the corresponding Poincaré
sphere.
The relationship between mode separability and
fringes visibility can be clarified by a straightforward cal-
culation of the interference pattern. Therefore, let us
consider that HWP-A and B are oriented so that the
output of BS-1 is described by
Eǫ(r) =
ǫ ψ+(r)êH +
1− ǫ ψ−(r)êV , (7)
where ǫ is the fraction of the ψ+(r)êH mode in the out-
put power. Now, let us consider that QWP-2 is oriented
at 0o and suppose that the two arms of the Michelson
interferometer are slightly misaligned so that the wave
vectors difference between the two outputs is δk = δk x̂ ,
orthogonal to the propagation axis. Taking into account
the passage through the three half waveplates, and the
transformation performed in each arm of the Michelson
interferometer, we arrive at the following expression for
the interference pattern:
I(r) = 2 |ψ(r)|2
1 + 2
ǫ(1− ǫ) sin 2φ sin (δk x)
, (8)
where φ = arg(x + iy) is the angular coordinate in
the transverse plane of the laser beam, and |ψ(r)|2 is
the doughnut profile of the intensity distribution of a
Laguerre-Gaussian beam. It is clear from Eq.(8) that the
visibility of the interference pattern is 2
ǫ(1− ǫ), which
is precisely the concurrence of Eǫ(r) as given by Eq.(3).
Therefore, the fringes visibility is quantitatively related
to the separability of the mode sent through the setup.
However, the numerical coincidence with the concurrence
0 2 4 6 8 10
Displacement (mm)
FIG. 3: Interference patterns measured in the photocount-
ing regime for ǫ = 1/2 . Empty and full circles correspond
to QWP-2 oriented at −45o and 45o, respectively. Solid and
dashed lines are theoretical fits with sinusoidal functions mod-
ulated by a Laguerre-Gaussian envelope. The phase shift
given by the fits is 3.14 rad .
is restricted to modes of the form given by Eq.(7). In fact,
it is important to stress that the fringes visibility can-
not be regarded as a measure of the concurrence for any
nonseparable mode, but for our purposes it evidences the
topological nature of the phase shift implemented by the
experimental setup. A detailed discussion on the mea-
surement of the concurrence is available in Ref.[15].
Next, we briefly discuss the quantum domain. When
a partially nonseparable mode like Eǫ(r) is occupied by
a single photon, this leads to partially entangled single
particle quantum states of the kind
|ϕǫ〉 =
ǫ |+H〉+
1− ǫ | − V 〉 . (9)
Experimentally, we attenuated the laser beam down to
the single photon regime, and scanned a photocounting
module across the interference pattern. First, HWP-A
and B were set at 22.5o (ǫ = 1/2) in order to evidence
the topological phase in this regime. Fig.(3) displays the
interference patterns obtained with QWP-2 oriented at
−45o and 45o. The π phase shift is again clear.
The relationship between the fringes visibility and the
state separability was evidenced by fixing QWP-2 at
0o and rotating HWP-A and B by an angle θ so that
ǫ = cos2 2θ . Fig.(4) shows the experimental results for
the fringes visibility for several values of ǫ . The solid
line corresponds to the analytical expression of the con-
currence, showing a very good agreement with the exper-
imental values.
As a conclusion, we demonstrated the double con-
nected nature of the SO(3) rotation group and the topo-
logical phase acquired by a laser beam passing through
a cycle of spin-orbit transformations. We investigated
both the classical and the quantum regimes and com-
0,0 0,2 0,4 0,6 0,8 1,0
cos2(2
FIG. 4: Fringes visibility as a function of ǫ. The solid line is
a theoretical fit with C = 2
ǫ(1− ǫ) .
pared the separability of the mode travelling through the
apparatus with the visibility of the interference fringes.
Our results may constitute an useful tool for quantum
computing and quantum information protocols.
The authors are deeply grateful to S.P. Walborn and
P.H. Souto Ribeiro for their precious help with the photo-
counting system and for fruitful discussions. Funding was
provided by Coordenação de Aperfeiçoamento de Pes-
soal de Nı́vel Superior (CAPES), Fundação de Amparo
à Pesquisa do Estado do Rio de Janeiro (FAPERJ-BR),
and Conselho Nacional de Desenvolvimento Cient́ıfico e
Tecnológico (CNPq).
[1] S. Pancharatnam, Proc. Ind. Acad. Sci. 44, 247 (1956).
[2] M. V. Berry, Proc. R. Soc. London A 392, 45 (1984).
[3] J. A. Jones, V. Vedral, A. Ekert, and G. Castagnoli, Na-
ture (London) 403, 869 (2000).
[4] L.-M. Duan, J. I. Cirac, and P. Zoller, Science 292, 1695
(2001).
[5] S.J. van Enk, Opt. Comm. 102, 59 (1993).
[6] E. J. Galvez et al, Phys. Rev. Lett. 90, 203901 (2003).
[7] M. J. Padgett and J. Courtial, Opt. Lett. 24, 430 (1999).
[8] S. A. Werner et al, Phys. Rev. Lett. 35, 1053 (1975).
[9] P. Milman, and R. Mosseri, Phys. Rev. Lett. 90, 230403
(2003).
[10] R. Mosseri, and R. Dandoloff, J. Phys. A 34, 10243
(2003).
[11] P. Milman, Phys. Rev. A 73, 062118 (2006).
[12] A. Yariv, ”Quantum Electronics”, John Wiley & Sons,
third ed. (1988).
[13] W. LiMing, Z. L. Tang, and C. J. Liao, Phys. Rev. A 69,
064301 (2004).
[14] N. R. Heckenberg, R. McDuff, C. P. Smith, and A. G.
White, Opt. Lett. 17, 221 (1992); G.F. Brand, Am. J. of
Phys. 67, 55 (1999).
[15] S. Walborn, P. H. Souto Ribeiro, L. Davidovich, F.
Mintert, and A. Buchleitner, Nature 440, 1022 (2006).
ABSTRACT
  We investigate the topological phase associated with the double connectedness
of the SO(3) representation in terms of maximally entangled states. An
experimental demonstration is provided in the context of polarization and
spatial mode transformations of a laser beam carrying orbital angular momentum.
The topological phase is evidenced through interferometric measurements and a
quantitative relationship between the concurrence and the fringes visibility is
derived. Both the quantum and the classical regimes were investigated.

<|endoftext|><|startoftext|>
Introduction
The study of white dwarfs (WDs) provides insight to understanding WD formation
rates, evolution, and space density. Cool WDs, in particular, provide limits on the age of the
Galactic disk and could represent some unknown fraction of the Galactic halo dark matter.
Individually, nearby WDs are excellent candidates for astrometric planetary searches because
the astrometric signature is greater than for an identical WD system more distant. As a
population, a complete volume limited sample is necessary to provide unbiased statistics;
however, their intrinsic faintness has allowed some to escape detection.
Of the 18 WDs with trigonometric parallaxes placing them within 10 pc of the Sun (the
RECONS sample), all but one have proper motions greater than 1.′′0 yr−1 (94%). By com-
parison, of the 230 main sequence systems (as of 01 January 2007) in the RECONS sample,
50% have proper motions greater than 1.′′0 yr−1. We have begun an effort to reduce this
apparent selection bias against slower-moving WDs to complete the census of nearby WDs.
This effort includes spectroscopic, photometric, and astrometric initiatives to characterize
newly discovered as well as known WDs without trigonometric parallaxes. Utilizing the Su-
perCOSMOS Sky Survey (SSS) for plate magnitude and proper motion information coupled
with data from other recently published proper motion surveys (primarily in the southern
hemisphere), we have identified relatively bright WD candidates via reduced proper motion
diagrams.
– 3 –
In this paper, we present spectra for 33 newly discovered WD systems brighter than
V = 17.0. Once an object is spectroscopically confirmed to be a WD (in this paper for
the first time or elsewhere in the literature), we obtain CCD photometry to derive Teff and
estimate its distance using a spectral energy distribution (SED) fit and a model atmosphere
analysis. If an object’s distance estimate is within the NStars (Henry et al. 2003) and CNS
(Gliese & Jahreiß 1991) horizons of 25 pc, it is then added to CTIOPI (Cerro Tololo Inter-
American Observatory Parallax Investigation) to determine its true distance (e.g. Jao et al.
2005, Henry et al. 2006).
2. Candidate Selection
We used recent high proper motion (HPM) surveys (Pokorny et al. 2004; Subasavage et al.
2005a,b; Finch et al. 2007) in the southern hemisphere for this work because our long-term
astrometric observing program CTIOPI, is based in Chile. To select good WD candidates
for spectroscopic observations, plate magnitudes via SSS and 2MASS JHKS are extracted
for HPM objects. Each object’s (R59F − J) color and reduced proper motion (RPM) are
then plotted. RPM correlates proper motion with proximity, which is certainly not always
true; however, it is effective at separating WDs from subdwarfs and main sequence stars.
Figure 1 displays an RPM diagram for the 33 new WDs presented here. To serve as examples
for the locations of subdwarfs and main sequence stars, recent HPM discoveries from the
SuperCOSMOS-RECONS (SCR) proper motion survey are also plotted (Subasavage et al.
2005a,b). The solid line represents a somewhat arbitrary cutoff separating subdwarfs and
WDs. Targets are selected from the region below the solid line. Note there are four stars
below this line that are not represented with asterisks. Three have recently been spectro-
scopically confirmed as WDs (Subasavage et al., in preparation) and one as a subdwarf (SCR
1227−4541, denoted by “sd”) that fell just below the line at (R59F − J) = 1.4 and HR59F =
19.8 (Subasavage et al. 2005b).
Completeness limits (S/N > 10) for 2MASS are J = 15.8, H = 15.1, and KS = 14.3
for uncontaminated point sources (Skrutskie et al. 2006). The use of J provides a more
reliable RPM diagram color for objects more than a magnitude fainter than the KS limit,
which is particularly important for the WDs (with (J − KS) < 0.4) discussed here. Only
objects bright enough to have 2MASS magnitudes are included in Figure 1. Consequently,
all WD candidates are brighter than V ∼ 17, and are therefore likely to be nearby. Objects
that fall in the WD region of the RPM diagram were cross-referenced with SIMBAD and
– 4 –
McCook & Sion (1999)1 to determine those that were previously classified as WDs. The
remainder were targeted for spectroscopic confirmation.
The remaining 33 candidates comprise the “new sample” whose spectra are presented
in this work, while the “known sample” constitutes the 23 previously identified WD systems
without trigonometric parallaxes for which we have complete V RIJHKS data.
3. Data and Observations
3.1. Astrometry and Nomenclature
The traditional naming convention for WDs uses the object’s epoch 1950 equinox 1950
coordinates. Coordinates for the new sample were extracted from 2MASS along with the
Julian date of observation. These coordinates were adjusted to account for proper motion
from the epoch of 2MASS observation to epoch 2000 (hence epoch 2000 equinox 2000).
The coordinates were then transformed to equinox 1950 coordinates using the IRAF proce-
dure precess. Finally, the coordinates were again adjusted (opposite the direction of proper
motion) to obtain epoch 1950 equinox 1950 coordinates.
Proper motions were taken from various proper motion surveys in addition to unpub-
lished values obtained via the SCR proper motion survey while recovering previously known
HPM objects. Appendix A contains the proper motions used for coordinate sliding as well
as J2000 coordinates and alternate names.
3.2. Spectroscopy
Spectroscopic observations were taken on five separate observing runs in 2003 Octo-
ber and December, 2004 March and September, and 2006 May at the Cerro Tololo Inter-
American Observatory (CTIO) 1.5m telescope as part of the SMARTS Consortium. The
Ritchey-Chrétien Spectrograph and Loral 1200×800 CCD detector were used with grating
09, providing 8.6 Å resolution and wavelength coverage from 3500 to 6900 Å. Observations
consisted of two exposures (typically 20 - 30 minutes each) to permit cosmic ray rejection,
followed by a comparison HeAr lamp exposure to calibrate wavelength for each object. Bias
subtraction, dome/sky flat-fielding, and extraction of spectra were performed using standard
IRAF packages.
1The current web based catalog can be found at http://heasarc.nasa.gov/W3Browse/all/mcksion.html
http://heasarc.nasa.gov/W3Browse/all/mcksion.html
– 5 –
A slit width of 2′′ was used for the 2003 and 2004 observing runs. Some of these data have
flux calibration problems because the slit was not rotated to be aligned along the direction
of atmospheric refraction. In conjunction with telescope “jitter”, light was sometimes lost
preferentially at the red end or the blue end for these data.
A slit width of 6′′, used for the 2006 May run, eliminated most of the flux calibration
problems even though the slit was not rotated. All observations were taken at an airmass
of less than 2.0. Within our wavelength window, the maximum atmospheric differential
refraction is less than 3′′ (Filippenko 1982). A test was performed to verify that no resolution
was lost by taking spectra of a F dwarf with sharp absorption lines from slit widths of 2′′ to
10′′ in 2′′ increments. Indeed, no resolution was lost.
Spectra for the new DA WDs with Teff ≥ 10000 K are plotted in Figure 2 while spectra
for the new DA WDs with Teff < 10000 K are plotted in Figure 3. Featureless DC spectra
are plotted in Figure 4. Spectral plots as well as model fits for unusual objects are described
in § 4.2.
3.3. Photometry
Optical V RI (Johnson V , Kron-Cousins RI) for the new and known samples was ob-
tained using the CTIO 0.9 m telescope during several observing runs from 2003 through
2006 as part of the Small and Moderate Aperture Research Telescope System (SMARTS)
Consortium. The 2048×2046 Tektronix CCD camera was used with the Tek 2 V RI filter
set2. Standard stars from Graham (1982), Bessel (1990), and Landolt (1992) were observed
each night through a range of airmasses to calibrate fluxes to the Johnson-Kron-Cousins
system and to calculate extinction corrections.
Bias subtraction and dome flat-fielding (using calibration frames taken at the beginning
of each night) were performed using standard IRAF packages. When possible, an aperture
14′′ in diameter was used to determine the stellar flux, which is consistent with the aperture
used by Landolt (1992) for the standard stars. If cosmic rays fell within this aperture, they
were removed before flux extraction. In cases of crowded fields, aperture corrections were
applied and ranged from 4′′ to 12′′ in diameter using the largest aperture possible without
including contamination from neighboring sources. Uncertainties in the optical photometry
were derived by estimating the internal night-to-night variations as well as the external errors
(i.e. fits to the standard stars). A complete discussion of the error analysis can be found in
2The central wavelengths for V , R, and I are 5475, 6425, and 8075Å respectively.
– 6 –
Henry et al. (2004). We adopt a total error of ±0.03 mag in each band. The final optical
magnitudes are listed in Table 1 as well as the number of nights each object was observed.
Infrared JHKS magnitudes and errors were extracted via Aladin from 2MASS and are
also listed in Table 1. JHKS magnitude errors are, in most cases, significantly larger than for
V RI, and the errors listed give a measure of the total photometric uncertainty (i.e. include
both global and systematic components). In cases when the magnitude error is null, the star
is near the magnitude limit of 2MASS and the photometry is not reliable.
4. Analysis
4.1. Modeling of Physical Parameters
The pure hydrogen, pure helium, and mixed hydrogen and helium model atmospheres
used to model the WDs are described at length in Bergeron et al. (2001) and references
therein, while the helium-rich models appropriate for DQ and DZ stars are described in
Dufour et al. (2005, 2007), respectively. The atmospheric parameters for each star are ob-
tained by converting the optical V RI and infrared JHKS magnitudes into observed fluxes,
and by comparing the resulting SEDs with those predicted from our model atmosphere cal-
culations. The first step is accomplished by transforming the magnitudes into average stellar
fluxes fm
received at Earth using the calibration of Holberg et al. (2006) for photon count-
ing devices. The observed and model fluxes, which depend on Teff , log g, and atmospheric
composition, are related by the equation
= 4π (R/D)2 Hm
, (1)
where R/D is the ratio of the radius of the star to its distance from Earth, and Hm
the Eddington flux, properly averaged over the corresponding filter bandpass. Our fitting
technique relies on the nonlinear least-squares method of Levenberg-Marquardt (Press et al.
1992), which is based on a steepest descent method. The value of χ2 is taken as the sum over
all bandpasses of the difference between both sides of eq. (1), weighted by the corresponding
photometric uncertainties. We consider only Teff and the solid angle to be free parameters,
and the uncertainties of both parameters are obtained directly from the covariance matrix
of the fit. In this study, we simply assume a value of log g = 8.0 for each star.
As discussed in Bergeron et al. (1997, 2001), the main atmospheric constituent — hy-
drogen or helium — is determined by comparing the fits obtained with both compositions,
or by the presence of Hα in the optical spectra. For DQ and DZ stars, we rely on the
– 7 –
procedure outlined in Dufour et al. (2005, 2007), respectively: we obtain a first estimate
of the atmospheric parameters by fitting the energy distribution with an assumed value of
the metal abundances. We then fit the optical spectrum to measure the metal abundances,
and use these values to improve our atmospheric parameters from the energy distribution.
This procedure is iterated until a self-consistent photometric and spectroscopic solution is
achieved.
The derived values for Teff for each object are listed in Table 1. Also listed are the
spectral types for each object determined based on their spectral features. The DAs have
been assigned a half-integer temperature index as defined by McCook & Sion (1999), where
the temperature index equals 50,400/Teff. As an external check, we compare in Figure 5
the photometric effective temperatures for the DA stars in Table 1 with those obtained by
fitting the observed Balmer line profiles (Figs. 2 and 3) using the spectroscopic technique
developed by Bergeron et al. (1992b), and recently improved by Liebert et al. (2003). Our
grid of pure hydrogen, NLTE, and convective model atmospheres is also described in Liebert
et al. The uncertainties of the spectroscopic technique are typically of 0.038 dex in log g
and 1.2% in Teff according to that study. We adopt a slightly larger uncertainty of 1.5%
in Teff (Spec) because of the problematic flux calibrations of the pre−2006 data (see § 3.2).
The agreement shown in Figure 5 is excellent, except perhaps at high temperatures where
the photometric determinations become more uncertain. It is possible that the significantly
elevated point in Figure 5, WD 0310−624 (labeled), is an unresolved double degenerate (see
§ 4.2). We refrain here from using the log g determinations in our analysis because these
are available only for the DA stars in our sample, and also because the spectra are not flux
calibrated accurately enough for that purpose.
Once the effective temperature and the atmospheric composition are determined, we
calculate the absolute visual magnitude of each star by combining the new calibration of
Holberg et al. (2006) with evolutionary models similar to those described in Fontaine et al.
(2001) but with C/O cores, q(He) ≡ logMHe/M⋆ = 10
−2 and q(H) = 10−4 (representative
of hydrogen-atmosphere WDs), and q(He) = 10−2 and q(H) = 10−10 (representative of
helium-atmosphere WDs)3. By combining the absolute visual magnitude with the Johnson
V magnitude, we derive a first estimate of the distance of each star (reported in Table 1).
Errors on the distance estimates incorporate the errors of the photometry values as well as
an error of 0.25 dex in log g, which is the measured dispersion of the observed distribution
using spectroscopic determinations (see Figure 9 of Bergeron et al. 1992b).
Of the 33 new systems presented here, 5 have distance estimates within 25 pc. Four
3see http://www.astro.umontreal.ca/˜bergeron/CoolingModels/
http://www.astro.umontreal.ca/~bergeron/CoolingModels/
– 8 –
more systems require additional attention because distance estimates are derived via other
means. Three of these are likely within 25 pc. All four are further discussed in the next
section. In total, 20 WD systems (8 new and 12 known) are estimated (or determined) to be
within 25 pc and one additional common proper motion binary system possibly lies within
25 pc.
4.2. Comments on Individual Systems
Here we address unusual and interesting objects.
WD 0121−429 is a DA WD that exhibits Zeeman splitting of Hα and Hβ, thereby
making its formal classification DAH. The SED fit to the photometry is superb, yielding
a Teff of 6,369 ± 137 K. When we compare the strength of the absorption line trio with
that predicted using the Teff from the SED fit, the depth of the absorption appears too
shallow. Using the magnetic line fitting procedure outlined in Bergeron et al. (1992a), we
must include a 50% dilution factor to match the observed central line of Hα. In light of
this, we utilized the trigonometric parallax distance determined via CTIOPI of 17.7 ± 0.7
pc (Subasavage et al., in preparation) to further constrain this system. The resulting SED
fit, with distance (hence luminosity) as a constraint rather than a variable, implies a mass of
0.43 ± 0.03 M⊙. Given the age of our Galaxy, the lowest mass WD that could have formed
is ∼0.47 M⊙ (Iben & Renzini 1984). It is extremely unlikely that this WD formed through
single star evolution. The most likely scenario is that this is a double degenerate binary
with a magnetic DA component and a featureless DC component (necessary to dilute the
absorption at Hα), similar to G62-46 (Bergeron et al. 1993) and LHS 2273 (see Figure 33
of Bergeron et al. 1997). If this interpretation is correct, any number of component masses
and luminosities can reproduce the SED fit.
The spectrum and corresponding magnetic fit to the Hα lines (including the dilution)
is shown in Figure 6. The viewing angle, i = 65◦, is defined as the angle between the dipole
axis and the line of sight (i = 0 corresponds to a pole-on view). The best fit produces a
dipole field strength, Bd = 9.5 MG, and a dipole offset, az = 0.06 (in units of stellar radius).
The positive value of az implies that the offset is toward the observer. Only Bd is moderately
constrained, both i and az can vary significantly yet still produce a reasonable fit to the data
(Bergeron et al. 1992a).
WD 0310−624 is a DAWD that is one of the hottest in the new sample. Because of it’s
elevation significantly above the equal temperature line (solid) in Figure 5, it is possible that
it is an unresolved double degenerate with very different component effective temperatures.
– 9 –
In fact, this method has been used to identify unresolved double degenerate candidates (i.e.
Bergeron et al. 2001).
WD 0511−415 is a DA WD (spectrum is plotted in Figure 2) whose spectral fit
produces a Teff = 10,813 ± 219 K and a log g = 8.21 ± 0.10 using the spectral fitting
procedure of Liebert et al. (2003). This object lies near the red edge of the ZZ Ceti instability
strip as defined by Gianninas et al. (2006). If variable, this object would help to constrain
the cool edge of the instability strip in Teff , log g parameter space. Follow-up high speed
photometry is necessary to confirm variability.
WD 0622−329 is a DAB WD displaying the Balmer lines as well as weaker He I at
4472 and 5876 Å. The spectrum, shown in Figure 7, is reproduced best with a model having
Teff ∼43,700 K. However, the predicted He II absorption line at 4686 Å for a WD of this
Teff is not present in the spectrum. In contrast, the SED fit to the photometry implies a
Teff of ∼10,500 K (using either pure H or pure He models). Because the Teff values are
vastly discrepant, we explored the possibility that this spectrum is not characterized by a
single temperature. We modeled the spectrum assuming the object was an unresolved double
degenerate. The best fit implies one component is a DB with Teff = 14,170 ± 1,228 K and
the other component is a DA with Teff = 9,640 ± 303 K, similar to the unresolved DA +
DB degenerate binary PG 1115+166 analyzed by Bergeron & Liebert (2002). One can see
from Figure 7 that the spectrum is well modeled under this assumption. We conclude this
object is likely a distant (well beyond 25 pc) unresolved double degenerate.
WD 0840−136 is a DZ WD whose spectrum shows both Ca II (H & K) and Ca I (4226
Å) lines as shown in Figure 8. Fits to the photometric data for different atmospheric com-
positions indicate temperatures of about 4800-5000 K. However, fits to the optical spectrum
using the models of Dufour et al. (2007) cannot reproduce simultaneously all three calium
lines. This problem is similar to that encountered by Dufour et al. (2007) where the atmo-
spheric parameters for the coolest DZ WDs were considered uncertain because of possible
high atmospheric pressure effects. We utilize a photometric relation relevant for WDs of any
atmospheric composition, which links MV to (V −I) (Salim et al. 2004) to obtain a distance
estimate of 19.3 ± 3.9 pc.
WD 1054−226 was observed spectroscopically as part of the Edinburgh-Cape (EC)
blue object survey and assigned a spectral type of sdB+ (Kilkenny et al. 1997). As is evident
in Figure 3, the spectrum of this object is the noisiest of all the spectra presented here and
perhaps a bit ambiguous. As an additional check, this object was recently observed using
the ESO 3.6 m telescope and has been confirmed to be a cool DA WD (Bergeron, private
communication).
– 10 –
WD 1105−340 is a DA WD (spectrum is plotted in Figure 2) with a common proper
motion companion with separation of 30.′′6 at position angle 107.1◦. The companion’s spectral
type is M4Ve with VJ = 15.04, RKC = 13.68, IKC = 11.96, J = 10.26, H = 9.70, and KS
= 9.41. In addition to the SED derived distance estimate for the WD, we utilize the main
sequence distance relations of Henry et al. (2004) to estimate a distance to the red dwarf
companion. We obtain a distance estimate of 19.1 ± 3.0 pc for the companion leaving
open the possibility that this system may lie just within 25 pc. A trigonometric parallax
determination is currently underway for confirmation.
WD 1149−272 is the only DQ WD discovered in the new sample. This object was
observed spectroscopically as part of the Edinburgh-Cape (EC) blue object survey for which
no features deeper than 5% were detected and was labeled a possible DC (Kilkenny et al.
1997). It is identified as having weak C2 swan band absorption at 4737 and 5165 Å and is
otherwise featureless. The DQ model reproduces the spectrum reliably and is overplotted in
Figure 9. This object is characterized as having Teff = 6188 ± 194 K and a log (C/He) =
−7.20 ± 0.16.
WD 2008−600 is a DC WD (spectrum is plotted in Figure 4) that is flux deficient in
the near infrared, as indicated by the 2MASS magnitudes. The SED fit to the photometry
is a poor match to either the pure hydrogen or the pure helium models. A pure hydrogen
model provides a slightly better match than a pure helium model, and yields a Teff of ∼3100
K, thereby placing it in the relatively small sample of ultracool WDs. In order to discern the
true nature of this object, we have constrained the model using the distance obtained from
the CTIOPI trigonometric parallax of 17.1 ± 0.4 pc (Subasavage et al., in preparation). This
object is then best modeled as having mostly helium with trace amounts of hydrogen (log
(He/H) = 2.61) in its atmosphere and has a Teff = 5078 ± 221 K (see Figure 10). A mixed
hydrogen and helium composition is required to produce sufficient absorption in the infrared
as a result of the collision-induced absorption by molecular hydrogen due to collisions with
helium. Such mixed atmospheric compositions have also been invoked to explain the infrared
flux deficiency in LHS 1126 (Bergeron et al. 1994) as well as SDSS 1337+00 and LHS 3250
(Bergeron & Leggett 2002). While WD 2008−600 is likely not an ultracool WD, it is one of
the brightest and nearest cool WDs known. Because the 2MASS magnitudes are not very
reliable, we intend to obtain additional near-infrared photometry to better constrain the fit.
WD 2138−332 is a DZ WD for which a calcium rich model reproduces the spectrum
reliably. The spectrum and the overplotted fit are shown in the bottom panel of Figure
8. Clearly evident in the spectrum are the strong Ca II absorption at 3933 and 3968 Å. A
weaker Ca I line is seen at 4226Å. Also seen are Mg I absorption lines at 3829, 3832, and
3838 Å (blended) as well as Mg I at 5167, 5173, and 5184 Å (also blended). Several weak Fe I
– 11 –
lines from 4000Å to 4500Å and again from 5200Å to 5500Å are also present. The divergence
of the spectrum from the fit toward the red end is likely due to an imperfect flux calibration
of the spectrum. This object is characterized as having Teff = 7188 ± 291 K and a log
(Ca/He) = −8.64 ± 0.16. The metallicity ratios are, at first, assumed to be solar (as defined
by Grevesse & Sauval 1998) and, in this case, the quality of the fit was sufficient without
deviation. The corresponding log (Mg/He) = −7.42 ± 0.16 and log (Fe/He) = −7.50 ± 0.16
for this object.
WD 2157−574 is a DAWD (spectrum is plotted in Figure 3) unique to the new sample
in that it displays weak Ca II absorption at 3933 and 3968 Å (H and K) thereby making its
formal classification a DAZ. Possible scenarios that enrich the atmospheres of DAZs include
accretion via (1) debris disks, (2) ISM, and (3) cometary impacts (see Kilic et al. 2006 and
references therein). The 2MASS KS magnitude is near the faint limit and is unreliable, but
even considering the J and H magnitudes, there appears to be no appreciable near-infrared
excess. While this may tentatively rule out the possibility of a debris disk, this object would
be an excellent candidate for far-infrared spaced-based studies to ascertain the origin of the
enrichment.
5. Discussion
WDs represent the end state for stars less massive than ∼8 M⊙ and are therefore rel-
atively numerous. Because of their intrinsic faintness, only the nearby WD population can
be easily characterized and provides the benchmark upon which WD stellar astrophysics is
based. It is clear from this work and others (e.g. Holberg et al. 2002; Kawka & Vennes 2006)
that the WD sample is complete, at best, to only 13 pc. Spectroscopic confirmation of new
WDs as well as trigonometric parallax determinations for both new and known WDs will
lead to a more complete sample and will push the boundary of completeness outward. We
estimate that 8 new WDs and an additional 12 known WDs without trigonometric parallaxes
are nearer than 25 pc, including one within 10 pc (WD 0141−675). Parallax measurements
via CTIOPI are underway for these 20 objects to confirm proximity. This total of 20 WDs
within 25 pc constitutes an 18% increase to the 109 WDs with trigonometric parallaxes ≥
40 mas.
Evaluating the proper motions of the new and known samples within 25 pc indicates
that almost double the number of systems have been found with µ < 1.′′0 yr−1 than with
µ ≥ 1.′′0 yr−1 (13 vs 7, see Table 2). The only WD estimated to be within 10 pc has µ >
1.′′0 yr−1, although WD 1202−232 is estimated to be 10.2 ± 1.7 pc and it’s proper motion is
small (µ = 0.′′227 yr−1).
– 12 –
Because this effort focuses mainly on the southern hemisphere, it is likely that there
is a significant fraction of nearby WDs in the northern hemisphere that have also gone
undetected. With the recent release of the LSPM-North Catalog (Lépine & Shara 2005),
these objects are identifiable by employing the same techniques used in this work. The
challenge is the need for a large scale parallax survey focusing on WDs to confirm proximity.
Since the HIPPARCOS mission, only six WD trigonometric parallaxes have been published
(Hambly et al. 1999; Smart et al. 2003), and of those, only two are within 25 pc. The USNO
parallax program is in the process of publishing trigonometric parallaxes for ∼130 WDs,
mostly in the northern hemisphere, although proximity was not a primary motivation for
target selection (Dahn, private communication).
In addition to further completing the nearby WD census, the wealth of observational
data available from this effort provides reliable constraints on their physical parameters
(i.e. Teff , log g, mass, and radius). Unusual objects are then revealed, such as those dis-
cussed in § 4.2. In particular, trigonometric parallaxes help identify WDs that are overlu-
minous, as is the case for WD 0121−429. This object, and others similar to it, are excellent
candidates to provide insight into binary evolution. If they can be resolved using high res-
olution astrometric techniques (i.e. speckle, adaptive optics, or interferometry via Hubble
Space Telescope’s Fine Guidance Sensors), they may provide astrometric masses, which are
fundamental calibrators for stellar structure theory and for the reliability of the theoretical
WD mass-radius and initial-to-final-mass relationships. To date, only four WD astrometric
masses are known to better than ∼ 5% (Provencal et al. 1998).
One avenue that is completely unexplored to date is a careful high resolution search
for planets around WDs. Theory dictates that the Sun will become a WD, and when it
does, the outer planets will remain in orbit (not without transformations of their own, of
course). In this scenario, the Sun will have lost more than half of its mass, thereby amplifying
the signature induced by the planets. Presumably, this has already occurred in the Milky
Way and systems such as these merely await detection. Because of the faintness and spectral
signatures of WDs (i.e. few, if any, broad absorption lines), current radial velocity techniques
are inadequate for planet detection, leaving astrometric techniques as the only viable option.
For a given system, the astrometric signature is inversely related to distance (i.e. the nearer
the system, the larger the astrometric signature). This effort aims to provide a complete
census of nearby WDs that can be probed for these astrometric signatures using future
astrometric efforts.
– 13 –
6. Acknowledgments
The RECONS team at Georgia State University wishes to thank the NSF (grant AST
05-07711), NASA’s Space Interferometry Mission, and GSU for their continued support of
our study of nearby stars. We also thank the continuing support of the members of the
SMARTS consortium, who enable the operations of the small telescopes at CTIO where all
of the data in this work were collected. J. P. S. is indebted to Wei-Chun Jao for the use of his
photometry reduction pipeline. P. B. is a Cottrell Scholar of Research Corporation and would
like to thank the NSERC Canada for its support. N. C. H. would like to thank colleagues in
the Wide Field Astronomy Unit at Edinburgh for their efforts contributing to the existence
of the SSS; particular thanks go to Mike Read, Sue Tritton, and Harvey MacGillivray. This
work has made use of the SIMBAD, VizieR, and Aladin databases, operated at the CDS in
Strasbourg, France. We have also used data products from the Two Micron All Sky Survey,
which is a joint project of the University of Massachusetts and the Infrared Processing and
Analysis Center, funded by NASA and NSF.
A. Appendix
In order to ensure correct cross-referencing of names for the new and known WD systems
presented here, Table 3 lists additional names found in the literature. Objects for which there
is an NLTT designation will also have the corresponding L or LP designations found in the
NLTT catalog. This is necessary because the NLTT designations were not published in the
original catalog, but rather are the record numbers in the electronic version of the catalog
and have been adopted out of necessity.
REFERENCES
Bergeron, P., Ruiz, M.-T., & Leggett, S. K. 1992, ApJ, 400, 315
Bergeron, P., Saffer, R. A., & Liebert, J. 1992, ApJ, 394, 228
Bergeron, P., Ruiz, M.-T., & Leggett, S. K. 1993, ApJ, 407, 733
Bergeron, P., Ruiz, M.-T., Leggett, S. K., Saumon, D., & Wesemael, F. 1994, ApJ, 423, 456
Bergeron, P., Ruiz, M. T., & Leggett, S. K. 1997, ApJS, 108, 339
Bergeron, P., Leggett, S. K., & Ruiz, M. T. 2001, ApJS, 133, 413
– 14 –
Bergeron, P., & Leggett, S. K. 2002, ApJ, 580, 1070
Bergeron, P., & Liebert, J. 2002, ApJ, 566, 1091
Bessel, M. S. 1990, A&AS, 83, 357
Carpenter, J. M. 2001, AJ, 121, 2851
Dufour, P., Bergeron, P., & Fontaine, G. 2005, ApJ, 627, 404
Dufour, P., Bergeron, P., Liebert, J., Harris, H. C., Knapp, G. R., Anderson, S. F., Hall,
P. B., Strauss, M. A., Collinge, M. J., & Edwards, M. C. 2007, submitted to ApJ
Filippenko, A. V. 1982, PASP, 94, 715
Finch, C. T., Henry, T. J., Subasavage, J. P., Jao, W.-C., Hambly, N. C. 2007, AJ, submitted
Fontaine, G., Brassard, P., & Bergeron, P. 2001, PASP, 113, 409
Gianninas, A., Bergeron, P., & Fontaine, G. 2006, AJ, 132, 831
Gliese, W., & Jahreiß, H. 1991, On: The Astronomical Data Center CD-ROM: Selected As-
tronomical Catalogs, Vol. I; L.E. Brotzmann, S.E. Gesser (eds.), NASA/Astronomical
Data Center, Goddard Space Flight Center, Greenbelt, MD
Graham, J. A. 1982, PASP, 94, 244
Grevesse, N., & Sauval, A. J. 1998, Space Science Reviews, 85, 161
Hambly, N. C., Smartt, S. J., Hodgkin, S. T., Jameson, R. F., Kemp, S. N., Rolleston,
W. R. J., & Steele, I. A. 1999, MNRAS, 309, L33
Henry, T. J., Walkowicz, L. M., Barto, T. C., & Golimowski, D. A. 2002, AJ, 123, 2002
Henry, T. J., Backman, D. E., Blackwell, J., Okimura, T., & Jue, S. 2003, The Future of
Small Telescopes In The New Millennium. Volume III - Science in the Shadow of
Giants, 111
Henry, T. J., Subasavage, J. P., Brown, M. A., Beaulieu, T. D., Jao, W., & Hambly, N. C.
2004, AJ, 128, 2460
Henry, T. J., Jao, W.-C., Subasavage, J. P., Beaulieu, T. D., Ianna, P. A., Costa, E., &
Méndez, R. A. 2006, AJ, 132, 2360
Holberg, J. B., Oswalt, T. D., & Sion, E. M. 2002, ApJ, 571, 512
– 15 –
Holberg, J. B., & Bergeron, P. 2006, AJ, 132, 1223
Iben, I., & Renzini, A. 1984, Phys. Rep., 105, 329
Jao, W.-C., Henry, T. J., Subasavage, J. P., Brown, M. A., Ianna, P. A., Bartlett, J. L.,
Costa, E., & Méndez, R. A. 2005, AJ, 129, 1954
Kawka, A., & Vennes, S. 2006, ApJ, 643, 402
Kilic, M., von Hippel, T., Leggett, S. K., & Winget, D. E. 2006, ApJ, 646, 474
Kilkenny, D., O’Donoghue, D., Koen, C., Stobie, R. S., & Chen, A. 1997, MNRAS, 287, 867
Kleinman, S. J., et al. 2004, ApJ, 607, 426
Landolt, A. U. 1992, AJ, 104, 340
Lépine, S., Shara, M. M., & Rich, R. M. 2003, AJ, 126, 921
Lépine, S., & Shara, M. M. 2005, AJ, 129, 1483
Lépine, S., Rich, R. M., & Shara, M. M. 2005, ApJ, 633, L121
Liebert, J., Bergeron, P., & Holberg, J. B. 2003, AJ, 125, 348
Liebert, J., Bergeron, P., & Holberg, J. B. 2005, ApJS, 156, 47
Luyten, W. J. 1949, ApJ, 109, 528
Luyten, W. J. 1979, LHS Catalogue (2nd ed.; Minneapolis: Univ. of Minnesota Press)
Luyten, W. J. 1979, New Luyten Catalogue of Stars with Proper Motions Larger than Two
Tenths of an Arcsecond (Minneapolis: Univ. of Minnesota Press)
McCook, G. P., & Sion, E. M. 1999, ApJS, 121, 1
Oppenheimer, B. R., Hambly, N. C., Digby, A. P., Hodgkin, S. T., & Saumon, D. 2001,
Science, 292, 698
Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 1992, Numerical Recipes
in FORTRAN, 2nd edition (Cambridge: Cambridge University Press), 644
Provencal, J. L., Shipman, H. L., Hog, E., & Thejll, P. 1998, ApJ, 494, 759
Pokorny, R. S., Jones, H. R. A., Hambly, N. C., & Pinfield, D. J. 2004, A&A, 421, 763
– 16 –
Salim, S., Rich, R. M., Hansen, B. M., Koopmans, L. V. E., Oppenheimer, B. R., & Bland-
ford, R. D. 2004, ApJ, 601, 1075
Scholz, R.-D., Szokoly, G. P., Andersen, M., Ibata, R., & Irwin, M. J. 2002, ApJ, 565, 539
Skrutskie, M. F., et al. 2006, AJ, 131, 1163
Smart, R. L., et al. 2003, A&A, 404, 317
Subasavage, J. P., Henry, T. J., Hambly, N. C., Brown, M. A., & Jao, W. 2005, AJ, 129, 413
Subasavage, J. P., Henry, T. J., Hambly, N. C., Brown, M. A., Jao, W.-C., & Finch, C. T.
2005, AJ, 130, 1658
This preprint was prepared with the AAS LATEX macros v5.2.
Table 1. Optical and Infrared Photometry, and Derived Parameters for New and Known White Dwarfs.
WD VJ RC IC # J σJ H σH KS σK
Teff Comp Dist SpT Notes
Name Obs (K) (pc)
New Spectroscopically Confirmed White Dwarfs
0034−602............. 14.08 14.19 14.20 3 14.37 0.04 14.55 0.06 14.52 0.09 14655±1413 H 35.8±5.7 DA3.5
0121−429............. 14.83 14.52 14.19 4 13.85 0.02 13.63 0.04 13.53 0.04 6369± 137 H · · · ± · · · DAH a
0216−398............. 15.75 15.55 15.29 3 15.09 0.04 14.83 0.06 14.89 0.14 7364± 241 H 29.9±4.7 DA7.0
0253−755............. 16.70 16.39 16.08 2 15.77 0.07 15.76 0.15 15.34 null 6235± 253 He 34.7±5.5 DC
0310−624............. 15.92 15.99 16.03 2 16.13 0.10 16.31 0.27 16.50 null 13906±1876 H · · · ± · · · DA3.5 b
0344+014............. 16.52 16.00 15.54 2 15.00 0.04 14.87 0.09 14.70 0.12 5084± 91 He 19.9±3.1 DC
0404−510............. 15.81 15.76 15.70 2 15.74 0.06 15.55 0.13 15.59 null 10052± 461 H 53.5±8.5 DA5.0
0501−555............. 16.35 16.17 15.98 2 15.91 0.08 15.72 0.15 15.82 0.26 7851± 452 He 44.8±6.9 DC
0511−415............. 16.00 15.99 15.93 2 15.96 0.08 15.97 0.15 15.20 null 10393± 560 H 61.8±10.8 DA5.0
0525−311............. 15.94 16.03 16.03 2 16.20 0.12 16.21 0.25 14.98 null 12941±1505 H 76.3±13.6 DA4.0
0607−530............. 15.99 15.92 15.78 3 15.82 0.07 15.66 0.14 15.56 0.21 9395± 426 H 51.7±9.0 DA5.5
0622−329............. 15.47 15.41 15.36 2 15.44 0.06 15.35 0.11 15.53 0.25 · · · ± · · · · · · · · · ± · · · DAB c
0821−669............. 15.34 14.82 14.32 3 13.79 0.03 13.57 0.03 13.34 0.04 5160± 95 H 11.5±1.9 DA10.0
0840−136............. 15.72 15.36 15.02 3 14.62 0.03 14.42 0.05 14.54 0.09 · · · ± · · · · · · · · · ± · · · DZ d
1016−308............. 14.67 14.75 14.81 2 15.05 0.04 15.12 0.08 15.41 0.21 16167±1598 H 50.6±9.2 DA3.0
1054−226............. 16.02 15.82 15.62 2 15.52 0.05 15.40 0.11 15.94 0.26 8266± 324 H 41.0±7.0 DA6.0 e
1105−340............. 13.66 13.72 13.79 2 13.95 0.03 13.98 0.04 14.05 0.07 13926± 988 H 28.2±4.8 DA3.5 f
1149−272............. 15.87 15.59 15.37 4 15.17 0.05 14.92 0.06 14.77 0.11 6188± 194 He (+C) 24.0±3.8 DQ
1243−123............. 15.57 15.61 15.64 2 15.74 0.07 15.73 0.11 16.13 null 12608±1267 H 62.6±10.7 DA4.0
1316−215............. 16.67 16.33 15.99 2 15.56 0.05 15.33 0.08 15.09 0.14 6083± 201 H 31.6±5.3 DA8.5
1436−781............. 16.11 15.82 15.49 2 15.04 0.04 14.88 0.08 14.76 0.14 6246± 200 H 26.0±4.3 DA8.0
1452−310............. 15.85 15.77 15.63 2 15.58 0.06 15.54 0.09 15.50 0.22 9206± 375 H 46.8±8.1 DA5.5
1647−327............. 16.21 15.85 15.49 3 15.15 0.05 14.82 0.08 14.76 0.11 6092± 193 H 25.5±4.2 DA8.5
1742−722............. 15.53 15.62 15.70 2 15.85 0.08 15.99 0.18 15.65 null 15102±2451 H 71.7±12.9 DA3.5
1946−273............. 14.19 14.31 14.47 2 14.72 0.04 14.77 0.09 14.90 0.13 21788±3304 H 52.0±9.9 DA2.5
2008−600............. 15.84 15.40 14.99 4 14.93 0.05 15.23 0.11 15.41 null 5078± 221 He · · · ± · · · DC g
2008−799............. 16.35 15.96 15.57 3 15.11 0.04 15.03 0.08 14.64 0.09 5807± 161 H 24.5±4.1 DA8.5
2035−369............. 14.94 14.85 14.72 2 14.75 0.04 14.72 0.06 14.84 0.09 9640± 298 H 33.1±5.7 DA5.0
2103−397............. 15.31 15.15 14.91 2 14.79 0.03 14.63 0.04 14.64 0.08 7986± 210 H 28.2±4.8 DA6.5
2138−332............. 14.47 14.30 14.16 3 14.17 0.03 14.08 0.04 13.95 0.06 7188± 291 He (+Ca) 17.3±2.7 DZ
2157−574............. 15.96 15.73 15.49 3 15.18 0.04 15.05 0.07 15.28 0.17 7220± 246 H 32.0±5.4 DAZ
2218−416............. 15.36 15.35 15.24 2 15.38 0.04 15.14 0.09 15.39 0.15 10357± 414 H 45.6±8.0 DA5.0
2231−387............. 16.02 15.88 15.62 2 15.57 0.06 15.51 0.11 15.11 0.15 8155± 336 H 40.6±6.9 DA6.0
Known White Dwarfs without a Trigonometric Parallax Estimated to be Within 25 pc
0141−675 ............ 13.82 13.52 13.23 3 12.87 0.02 12.66 0.03 12.58 0.03 6484± 128 H 9.7±1.6 DA8.0
0806−661 ............ 13.73 13.66 13.61 3 13.70 0.02 13.74 0.03 13.78 0.04 10753± 406 He 21.1±3.5 DQ
1009−184 ............ 15.44 15.18 14.91 3 14.68 0.04 14.52 0.05 14.31 0.07 6449± 194 He 20.9±3.2 DZ h
1036−204 ............ 16.24 15.54 15.34 3 14.63 0.03 14.35 0.04 14.03 0.07 4948± 70 He 16.2±2.5 DQ i
1202−232 ............ 12.80 12.66 12.52 3 12.40 0.02 12.30 0.03 12.34 0.03 8623± 168 H 10.2±1.7 DA6.0
1315−781 ............ 16.16 15.73 15.35 2 14.89 0.04 14.67 0.08 14.58 0.12 5720± 162 H 21.6±3.6 DC j
1339−340 ............ 16.43 16.00 15.56 2 15.00 0.04 14.75 0.06 14.65 0.10 5361± 138 H 21.2±3.5 DA9.5
1756+143 ............ 16.30 16.12 15.69 1 14.93 0.04 14.66 0.06 14.66 0.08 5466± 151 H 22.4±3.4 DA9.0 k
Table 1—Continued
WD VJ RC IC # J σJ H σH KS σK
Teff Comp Dist SpT Notes
Name Obs (K) (pc)
1814+134 ............ 15.85 15.34 14.86 2 14.38 0.04 14.10 0.06 14.07 0.06 5313± 115 H 15.6±2.5 DA9.5
2040−392 ............ 13.74 13.77 13.68 2 13.77 0.02 13.82 0.03 13.81 0.05 10811± 325 H 23.1±4.0 DA4.5
2211−392 ............ 15.91 15.61 15.24 2 14.89 0.03 14.64 0.05 14.56 0.08 6243± 167 H 23.5±4.0 DA8.0
2226−754A........... 16.57 15.93 15.33 2 14.66 0.04 14.66 0.06 14.44 0.08 4230± 104 H 12.8±2.0 DC l
2226−754B........... 16.88 16.17 15.51 2 14.86 0.04 14.82 0.06 14.72 0.12 4177± 112 H 14.0±2.2 DC l
Known White Dwarfs without a Trigonometric Parallax Estimated to be Beyond 25 pc
0024−556............. 15.17 15.15 15.07 2 15.01 0.04 15.23 0.10 15.09 0.14 10007± 378 H 39.8±6.8 DA5.0
0150+256............. 15.70 15.52 15.33 2 15.07 0.04 15.07 0.09 15.15 0.14 7880± 280 H 33.0±5.6 DA6.5
0255−705 ............ 14.08 14.03 14.00 2 14.04 0.03 14.12 0.04 13.99 0.06 10541± 326 H 25.8±4.5 DA5.0
0442−304............. 16.03 15.93 15.86 2 15.94 0.09 15.81 null 15.21 null 9949± 782 He 55.1±9.1 DQ
0928−713 ............ 15.11 14.97 14.83 3 14.77 0.03 14.69 0.06 14.68 0.09 8836± 255 H 30.7±5.3 DA5.5
1143−013............. 16.39 16.08 15.79 1 15.54 0.06 15.38 0.08 15.18 0.16 6824± 250 H 34.4±5.8 DA7.5
1237−230 ............ 16.53 16.13 15.74 2 15.35 0.05 15.08 0.08 14.94 0.11 5841± 173 H 26.9±4.5 DA8.5
1314−153............. 14.82 14.89 14.97 2 15.17 0.05 15.26 0.09 15.32 0.21 15604±2225 H 52.7±9.5 DA3.0
1418−088 ............ 15.39 15.21 15.01 2 14.76 0.04 14.73 0.06 14.76 0.10 7872± 243 H 28.5±4.8 DA6.5
1447−190............. 15.80 15.59 15.32 2 15.06 0.04 14.87 0.07 14.78 0.11 7153± 235 H 29.1±4.9 DA7.0
1607−250............. 15.19 15.12 15.09 2 15.08 0.08 15.08 0.08 15.22 0.15 10241± 457 H 41.2±7.2 DA5.0
aDistance via SED fit (not listed) is underestimated because object is likely an unresolved double degenerate with one magnetic component (see § 4.2). Instead, we
adopt the trigonometric parallax distance of 17.7 ± 0.7 pc derived via CTIOPI.
bDistance via SED fit (not listed) is underestimated because object is likely a distant (well beyond 25 pc) unresolved double degenerate (see § 4.2).
cDistance via SED fit (not listed) is underestimated because object is likely a distant (well beyond 25 pc) unresolved double degenerate with components of type DA
and DB (see § 4.2). Temperatures derived from the spectroscopic fit yield 9,640 ± 303 K and 14,170 ± 1,228 K for the DA and DB respectively.
dObject is likely cooler than Teff ∼5000 K and the theoretical models do not provide an accurate treatment at these temperatures (see § 4.2). Instead, we use the
linear photometric distance relation of Salim et al. (2004) and obtain a distance estimate of 19.3 ± 3.9 pc.
eThis object was observed as part of the Edinburgh-Cape survey and was classified as a sdB+ (Kilkenny et al. 1997).
fDistance of 19.1 ± 3.0 pc is estimated using V RIJHKS for the common proper motion companion M dwarf and the relations of Henry et al. (2004). System is
possibly within 25 pc. (see § 4.2).
gDistance estimate is undetermined. Instead, we adopt the distance measured via trogonometric parallax of 17.1 ± 0.4 pc (see § 4.2).
hNot listed in McCook & Sion (1999) but identified as a DC/DQ WD by Henry et al. (2002). We obtained blue spectra that show Ca II H & K absorption and classify
this object as a DZ.
iThe SED fit to the photometry is marginal. This object displays deep swan band absorption that significantly affects its measured magnitudes.
jNot listed in McCook & Sion (1999) but identified as a WD by Luyten (1949). Spectral type is derived from our spectra.
kAs of mid-2004, object has moved onto a background source. Photometry is probably contaminated, which is consistent with the poor SED fit for this object.
lSpectral type was determined using spectra published by Scholz et al. (2002).
– 19 –
Table 2. Distance Estimate Statistics for New and Known White Dwarfs.
Proper motion d ≤ 10 pc 10 pc < d ≤ 25 pc d > 25 pc
µ ≥ 1.′′0 yr−1......................... 1 6 1
1.′′0 yr−1 > µ ≥ 0.′′8 yr−1...... 0 0 0
0.′′8 yr−1 > µ ≥ 0.′′6 yr−1...... 0 2 2
0.′′6 yr−1 > µ ≥ 0.′′4 yr−1...... 0 6 11
0.′′4 yr−1 > µ ≥ 0.′′18 yr−1.... 0 5 22
Total.................................... 1 19 36
– 20 –
Table 3. Astrometry and Alternate Designations for New and Known White Dwarfs.
WD Name RA Dec PM PA Ref Alternate Names
(J2000.0) (J2000.0) (arcsec yr−1) (deg)
New Spectroscopically Confirmed White Dwarfs
0034−602......... 00 36 22.31 −59 55 27.5 0.280 069.0 L NLTT 1993 = LP 122-4 = · · ·
0121−429......... 01 24 03.98 −42 40 38.5 0.538 155.2 L LHS 1243 = NLTT 4684 = LP 991-16
0216−398......... 02 18 31.51 −39 36 33.2 0.500 078.6 L LHS 1385 = NLTT 7640 = LP 992-99
0253−755......... 02 52 45.64 −75 22 44.5 0.496 063.5 S SCR 0252-7522 = · · · = · · ·
0310−624......... 03 11 21.34 −62 15 15.7 0.416 083.3 S SCR 0311-6215 = · · · = · · ·
0344+014......... 03 47 06.82 +01 38 47.5 0.473 150.4 S LHS 5084 = NLTT 11839 = LP 593-56
0404−510......... 04 05 32.86 −50 55 57.8 0.320 090.7 P LEHPM 1-3634 = · · · = · · ·
0501−555......... 05 02 43.43 −55 26 35.2 0.280 191.9 P LEHPM 1-3865 = · · · = · · ·
0511−415......... 05 13 27.80 −41 27 51.7 0.292 004.4 P LEHPM 2-1180 = · · · = · · ·
0525−311......... 05 27 24.33 −31 06 55.7 0.379 200.7 P NLTT 15117 = LP 892-45 = LEHPM 2-521
0607−530......... 06 08 43.81 −53 01 34.1 0.246 327.6 P LEHPM 2-2008 = · · · = · · ·
0622−329......... 06 24 25.78 −32 57 27.4 0.187 177.7 P LEHPM 2-5035 = · · · = · · ·
0821−669......... 08 21 26.70 −67 03 20.1 0.758 327.6 S SCR 0821-6703 = · · · = · · ·
0840−136......... 08 42 48.45 −13 47 13.1 0.272 263.0 S NLTT 20107 = LP 726-1 = · · ·
1016−308......... 10 18 39.84 −31 08 02.0 0.212 304.0 L NLTT 23992 = LP 904-3 = LEHPM 2-5779
1054−226......... 10 56 38.64 −22 52 55.9 0.277 349.7 P NLTT 25792 = LP 849-31 = LEHPM 2-1372
1105−340......... 11 07 47.89 −34 20 51.4 0.287 168.0 S SCR 1107-3420A = · · · = · · ·
1149−272......... 11 51 36.10 −27 32 21.0 0.199 278.3 P LEHPM 2-4051 = · · · = · · ·
1243−123......... 12 46 00.69 −12 36 19.9 0.406 305.4 S SCR 1246-1236 = · · · = · · ·
1316−215......... 13 19 24.72 −21 47 55.0 0.467 179.2 S NLTT 33669 = LP 854-50 = WT 2034
1436−781......... 14 42 51.54 −78 23 53.6 0.409 272.0 S NLTT 38003 = LP 40-109 = LTT 5814
1452−310......... 14 55 23.47 −31 17 06.4 0.199 174.2 P LEHPM 2-4029 = · · · = · · ·
1647−327......... 16 50 44.32 −32 49 23.2 0.526 193.8 L LHS 3245 = NLTT 43628 = LP 919-1
1742−722......... 17 48 31.21 −72 17 18.5 0.294 228.2 P LEHPM 2-1166 = · · · = · · ·
1946−273......... 19 49 19.78 −27 12 25.7 0.213 162.0 L NLTT 48270 = LP 925-53 = · · ·
2008−600......... 20 12 31.75 −59 56 51.5 1.440 165.6 S SCR 2012-5956 = · · · = · · ·
2008−799......... 20 16 49.66 −79 45 53.0 0.434 128.4 S SCR 2016-7945 = · · · = · · ·
2035−369......... 20 38 41.42 −36 49 13.5 0.230 104.0 L NLTT 49589 = L 495-42 = LEHPM 2-3290
2103−397......... 21 06 32.01 −39 35 56.7 0.266 151.7 P LEHPM 2-1571 = · · · = · · ·
2138−332......... 21 41 57.56 −33 00 29.8 0.210 228.5 P NLTT 51844 = L 570-26 = LEHPM 2-3327
2157−574......... 22 00 45.37 −57 11 23.4 0.233 252.0 P LEHPM 1-4327 = · · · = · · ·
2218−416......... 22 21 25.37 −41 25 27.0 0.210 143.4 P LEHPM 1-4598 = · · · = · · ·
2231−387......... 22 33 54.47 −38 32 36.9 0.370 220.5 P NLTT 54169 = LP 1033-28 = LEHPM 1-4859
Known White Dwarfs without a Trigonometric Parallax Estimated to be Within 25 pc
0141−675 ........ 01 43 00.98 −67 18 30.3 1.048 197.8 L LHS 145 = NLTT 5777 = L 88-59
0806−661 ........ 08 06 53.76 −66 18 16.6 0.454 131.4 S NLTT 19008 = L 97-3 = · · ·
1009−184 ........ 10 12 01.88 −18 43 33.2 0.519 268.2 S WT 1759 = LEHPM 2-220 = · · ·
1036−204 ........ 10 38 55.57 −20 40 56.7 0.628 330.3 L LHS 2293 = NLTT 24944 = LP 790-29
1202−232 ........ 12 05 26.66 −23 33 12.1 0.227 002.0 L NLTT 29555 = LP 852-7 = LEHPM 2-1894
1315−781 ........ 13 19 25.63 −78 23 28.3 0.477 139.2 S NLTT 33551 = L 40-116 = · · ·
1339−340 ........ 13 42 02.88 −34 15 19.4 2.547 296.7 Le PM J13420-3415 = · · · = · · ·
1756+143 ........ 17 58 22.90 +14 17 37.8 1.014 235.4 Le LSR 1758+1417 = · · · = · · ·
1814+134 ........ 18 17 06.48 +13 28 25.0 1.207 201.5 Le LSR 1817+1328 = · · · = · · ·
2040−392 ........ 20 43 49.21 −39 03 18.0 0.306 179.0 L NLTT 49752 = L 495-82 = · · ·
2211−392 ........ 22 14 34.75 −38 59 07.3 1.056 110.1 O WD J2214-390 = LEHPM 1-4466 = · · ·
2226−754A........ 22 30 40.00 −75 13 55.3 1.868 167.5 S SSSPM J2231-7514 = · · · = · · ·
2226−754B........ 22 30 33.55 −75 15 24.2 1.868 167.5 S SSSPM J2231-7515 = · · · = · · ·
Known White Dwarfs without a Trigonometric Parallax Estimated to be Beyond 25 pc
0024−556......... 00 26 40.69 −55 24 44.1 0.580 211.8 L LHS 1076 = NLTT 1415 = L 170-27
0150+256......... 01 52 51.93 +25 53 40.7 0.220 076.0 L NLTT 6275 = G 94-21 = · · ·
0255−705......... 02 56 17.22 −70 22 10.8 0.682 097.9 L LHS 1474 = NLTT 9485 = L 54-5
0442−304......... 04 44 29.38 −30 21 14.2 0.196 199.5 P NLTT 13882 = LP 891-65 = HE 0442-3027
0928−713......... 09 29 07.97 −71 33 58.8 0.439 320.2 S NLTT 21957 = L 64-40 = · · ·
1143−013......... 11 46 25.77 −01 36 36.8 0.563 140.2 S LHS 2455 = NLTT 28493 = · · ·
– 21 –
Table 3—Continued
WD Name RA Dec PM PA Ref Alternate Names
(J2000.0) (J2000.0) (arcsec yr−1) (deg)
1237−230......... 12 40 24.18 −23 17 43.8 1.102 219.9 L LHS 339 = NLTT 31473 = LP 853-15
1314−153......... 13 16 43.59 −15 35 58.3 0.708 196.7 L LHS 2712 = NLTT 33503 = LP 737-47
1418−088......... 14 20 54.93 −09 05 08.7 0.480 266.8 S LHS 5270 = NLTT 37026 = · · ·
1447−190......... 14 50 11.93 −19 14 08.7 0.253 285.4 P NLTT 38499 = LP 801-14 = LEHPM 2-1835
1607−250......... 16 10 50.21 −25 13 16.0 0.209 314.0 L NLTT 42153 = LP 861-31 = · · ·
References. — (L) Luyten 1979a,b, (Le) Lépine et al. 2003, Lépine et al. 2005, (O) Oppenheimer et al. 2001, (P) Pokorny et al.
2004, (S) Subasavage et al. 2005a,b, this work
– 22 –
Fig. 1.— Reduced proper motion diagram used to select WD candidates for spectroscopic
follow-up. Plotted are the new high proper motion objects from Subasavage et al. (2005a,b).
The line is a somewhat arbitrary boundary between the WDs (below) and the subdwarfs
(just above). Main sequence dwarfs fall above and to the right of the subdwarfs, although
there is significant overlap. Asterisks indicate the 33 new WDs reported here. Three dots
in the WD region are deferred to a future paper. The point labeled “sd” is a confirmed
subdwarf contaminant of the WD sample.
Fig. 2.— Spectral plots of the hot (Teff ≥ 10000 K) DA WDs from the new sample, plotted
in descending Teff as derived from the SED fits to the photometry. Note that some of the
flux calibrations are not perfect, in particular, at the blue end.
Fig. 3.— Spectral plots of cool (Teff < 10000 K) DA WDs from the new sample, plotted in
descending Teff as derived from the SED fits to the photometry. Note that some of the flux
calibrations are not perfect, in particular, at the blue end.
Fig. 4.— Spectral plots of the four featureless DC white dwarfs from the new sample, plotted
in descending Teff as derived from the SED fits to the photometry. Note that some of the
flux calibrations are not perfect, in particular, at the blue end.
Fig. 5.— Comparison plot of the values of Teff derived from photometric SED fitting vs
those derived from spectral fitting for 25 of the DA WDs in the new sample. The solid line
represents equal temperatures. The elevated point, 0310−624, is discussed in § 4.2.
Fig. 6.— Spectral plot of WD 0121−429. The inset plot displays the spectrum (light line)
in the Hα region to which a magnetic fit (heavy line), as outlined in Bergeron et al. (1992a),
was performed using the Teff obtained from the SED fit to the photometry. The resulting
magnetic parameters are listed below the fit.
Fig. 7.— Spectral plot of WD 0622−329. The inset plot displays the spectrum (light line)
in the region to which the model (heavy line) was fit assuming the spectrum is a convolution
of a DB component and a slightly cooler DA component. Best fit physical parameters are
listed below the fit for each component.
Fig. 8.— (top panel) Spectral plot of WD 0840−136. The DZ model failed to reproduce
the spectrum presumably because this object is cooler than Teff ∼ 5000 K where additional
pressure effects, not included in the model, become important. (bottom panel) Spectral plot
of WD 2138−332. The inset plot displays the spectrum (light line) in the region to which
the model (heavy line) was fit.
– 23 –
Fig. 9.— Spectral plot of WD 1149−272. The inset plot displays the spectrum (light line)
in the region to which the model (heavy line) was fit.
Fig. 10.— Spectral energy distribution plot of WD 2008−600 with the distance constrained
by the trigonometric distance of 17.1 ± 0.4 pc. Best fit physical parameters are listed below
the fit. Points are fit values; error bars are derived from the uncertainties in the magnitudes
and the parallax.
– 24 –
– 25 –
– 26 –
– 27 –
– 28 –
– 29 –
– 30 –
– 31 –
– 32 –
– 33 –
	Introduction
	Candidate Selection
	Data and Observations
	Astrometry and Nomenclature
	Spectroscopy
	Photometry
	Analysis
	Modeling of Physical Parameters
	Comments on Individual Systems
	Discussion
	Acknowledgments
	Appendix
ABSTRACT
  We present spectra for 33 previously unclassified white dwarf systems
brighter than V = 17 primarily in the southern hemisphere. Of these new
systems, 26 are DA, 4 are DC, 2 are DZ, and 1 is DQ. We suspect three of these
systems are unresolved double degenerates. We obtained VRI photometry for these
33 objects as well as for 23 known white dwarf systems without trigonometric
parallaxes, also primarily in the southern hemisphere. For the 56 objects, we
converted the photometry values to fluxes and fit them to a spectral energy
distribution using the spectroscopy to determine which model to use (i.e. pure
hydrogen, pure helium, or metal-rich helium), resulting in estimates of
effective temperature and distance. Eight of the new and 12 known systems are
estimated to be within the NStars and Catalogue of Nearby Stars (CNS) horizons
of 25 pc, constituting a potential 18% increase in the nearby white dwarf
sample. Trigonometric parallax determinations are underway via CTIOPI for these
20 systems.
  One of the DCs is cool so that it displays absorption in the near infrared.
Using the distance determined via trigonometric parallax, we are able to
constrain the model-dependent physical parameters and find that this object is
most likely a mixed H/He atmosphere white dwarf similar to other cool white
dwarfs identified in recent years with significant absorption in the infrared
due to collision-induced absorptions by molecular hydrogen.

<|endoftext|><|startoftext|>
Introduction
The description of the singular locus and of the types of singularities appearing in Schubert
varieties is a hard problem. A first step in this direction was the proof by V. Lakshmibai and B.
Sandhya [LS90] of a pattern avoidance criterion for a Schubert variety in type A to be smooth. There
exist some other results in this direction, for a detailed account see [BL00]. Another important
result was a complete combinatorial description, still in type A, of the irreducible components of
the singular locus of a Schubert variety (this has been realised, almost in the same time, by L.
Manivel [Ma01a] and [Ma01b], S. Billey and G. Warrington [BW03], C. Kassel, A. Lascoux and C.
Reutenauer [KLR03] and A. Cortez [Co03]). The singularity at a generic point of such a component
is also given in [Ma01b] and [Co03]. However, as far as I know, this problem is still open for other
types. Another partial result in this direction is the description of the irreducible components of
the singular locus and of the generic singularity of minuscule and cominuscule Schubert varieties
(see Definition 1.2) by M. Brion and P. Polo [BP99].
In the same vein as [LS90], A. Woo and A. Yong gave in [WY06a] and [WY06b] a generalised
pattern avoidance criterion, in type A, to decide if a Schubert variety is Gorenstein. They do not
describe the irreducible components of the Gorenstein locus but give the following conjecture (see
Conjecture 6.7 in [WY06b]):
CONJECTURE 0.1. — Let X be a Schubert variety, a point x in X is in the Gorenstein locus of
X if and only if the generic point of any irreducible component of the singular locus of X containing
x is is the Gorenstein locus of X.
The interest of this conjecture relies on the fact that, at least in type A, the irreducible compo-
nents of the singular locus and the singularity of a generic point of that component are well known.
The conjecture would imply that one only needs to know the information on the irreducible com-
ponents of the singular locus to get all the information on the Gorenstein locus.
In this paper we prove this conjecture for all minuscule Schubert varieties thanks to a combi-
natorial description of the Gorenstein locus of minuscule Schubert varieties. To do this we use the
http://arxiv.org/abs/0704.0895v1
combinatorial tool introduced in [Pe07] associating to any minuscule Schubert variety a reduced
quiver generalising Young diagrams. First, we translate the results of M. Brion and P. Polo [BP99]
in terms of the quiver. We define the holes, the virtual holes and the essential holes in the quiver
(see Definitions 2.3 and 3.1) and prove the following:
THEOREM 0.2. — (ı) A minuscule schubert variety is smooth if and only if its associated quiver
has no nonvirtual hole.
(ıı) The irreducible components of the singular locus of a minuscule Schubert variety are indexed
by essential holes.
Furthermore we explicitely describe in terms of the quiver and the essential holes these irre-
ducible components and the singularity at a generic point of a component (for more details see
Theorem 3.2). In particular, with this description it is easy to say if the singularity at a generic
point of an irreducible component of the singular locus is Gorenstein or not. The essential holes
corresponding to irreducible components having a Gorenstein generic point are called Gorenstein
holes (see also Definition 3.8). We give the following complete description of the Gorenstein locus:
THEOREM 0.3. — The generic point of a Schubert subvariety X(w′) of a minuscule Schubert
variety X(w) is in the Gorenstein locus if and only if the quiver of X(w′) contains all the non
Gorenstein holes of the quiver of X(w).
COROLLARY 0.4. — Conjecture 0.1 is true for all minuscule Schubert varieties.
Example 0.5. — Let G(4, 7) be the Grassmannian variety of 4-dimensional subspaces in a 7-
dimensional vector space. Consider the Schubert variety
X(w) = {V4 ∈ G(4, 7) dim(V4 ∩W3) ≥ 2 and dim(V4 ∩W5) ≥ 3}
where W3 and W5 are fixed subspaces of dimension 3 and 5 respectively. The minimal length
representative w is the permutation (2357146). Its quiver is the following one (all the arrows are
going down):
We have circled the two holes on this quiver. The left hole is not a Gorenstein hole (this can be
easily seen because the two peaks above this hole do not have the same height, see Definition 2.3)
but the right one is Gorenstein (the two peaks have the same height). Let X(w′) be an irreducible
component of the singular locus of X(w). The possible quivers of such a variety X(w′) are the
following (for each hole we remove all the vertices above that hole):
These Schubert varieties correspond to the permutations: (1237456) and (2341567). Let X(w′) be
a Schubert subvariety in X(w) whose generic point is not in the Gorenstein locus. Then X(w′) has
to be contained in X(1237456).
Acknowledgements: I thank Frank Sottile and Jim Carrel for their invitation to the BIRS
workshop Comtemporary Schubert calculus during which the major part of this work has been
done.
1 Minuscule Schubert varieties
Let us fix some notations and recall the definitions of minuscule homogeneous spaces and minuscule
Schubert varieties. A basic reference is [LMS79].
In this paper G will be a semi-simple algebraic group, we fix B a Borel subgroup and T a
maximal torus in B. We denote by R the set of roots, by R+ and R− the set of positive and
negative roots. We denote by S the set of simple roots. We will denote by W the Weyl group of G.
We also fix P a parabolic subgroup containing B. We denote by WP the Weyl group of P
and by WP the set of minimal length representatives in W of the coset W/WP . Recall that the
Schubert varieties in G/P (that is to say the B-orbit closures in G/P ) are parametrised by WP .
DEFINITION 1.1. — A fundamental weight ̟ is said to be minuscule if, for all positive roots
α ∈ R+, we have 〈α∨,̟〉 ≤ 1.
With the notation of N. Bourbaki [Bo68], the minuscule weights are:
Type minuscule
An ̟1 · · ·̟n
Bn ̟n
Cn ̟1
Dn ̟1, ̟n−1 and ̟n
E6 ̟1 and ̟6
E7 ̟7
E8 none
F4 none
G2 none
DEFINITION 1.2. — Let ̟ be a minuscule weight and let P̟ be the associated parabolic subgroup.
The homogeneous space G/P̟ is then said to be minuscule. The Schubert varieties of a minuscule
homogeneous space are called minuscule Schubert varieties.
Remark 1.3. — It is a classical fact that to study minuscule homogeneous spaces and their
Schubert varieties, it is sufficient to restrict ourselves to simply-laced groups.
In the rest of the paper, the group G will be simply-laced, the subgroup P will be a maximal
parabolic subgroup associated to a minuscule fundamental weight ̟. The minuscule homogeneous
space G/P will be denoted by X and the Schubert variety associated to w ∈ WP will be denoted
by X(w) with the convention that the dimension of X(w) is the length of w.
2 Miniscule quivers
In [Pe07], we associated to any minuscule Schubert variety X(w) a unique quiver Qw. The definition
a priori depends on the choice of a reduced expression but does not depend on the commuting
relations. In the minuscule setting this implies that the following definitons do not depend on the
choosen reduced expression. Fix a reduced expression w = sβ1 · · · sβr of w (recall that w is in W
the set of minimal length representatives of W/WP ) where for all i ∈ [1, r], we have βi ∈ S.
DEFINITION 2.1. — (ı) The successor s(i) and the predecessor p(i) of an element i ∈ [1, r] are the
elements s(i) = min{j ∈ [1, r] / j > i and βj = βi} and p(i) = max{j ∈ [1, r] / j < i and βj = βi}.
(ıı) Denote by Qw the quiver whose set of vertices is the set [1, r] and whose arrows are given
in the following way: there is an arrow from i to j if and only if 〈β∨j , βi〉 6= 0 and i < j < s(i) (or
only i < j if s(i) does not exist).
Remark 2.2. — (ı) This quiver comes with a coloration of its vertices by simple roots via the
map β : [1, r] → S such that β(i) = βi.
(ıı) There is a natural order on the quiver Qw given by i4j if there is an oriented path from j
to i. Caution that this order is the reversed order of the one defined in [Pe07].
(ııı) Note that if we denote by Q̟ the quiver obtained from the longuest element in W
P , then
the quiver Qw is a subquiver of Q̟. The quivers of Schubert subvarieties are exactely the order
ideals in the quiver Q̟. We will call such a quiver reduced (meaning that it corresponds to a reduced
expression of an element in WP , see [Pe07] for more details on the shape of reduced quivers).
Recall also that we defined in [Pe07] some combinatorial objects associated to the quiver Qw.
DEFINITION 2.3. — (ı) We call peak any vertex of Qw maximal for the partial order 4. We
denote by Peaks(Qw) the set of peaks of Qw.
(ıı) We call hole of the quiver Qw any vertex i of Q̟ satisfying one of the following properties
• the vertex i is in Qw but p(i) 6∈ Qw and there are exactly two vertices j1<i and j2<i in Qw
with 〈β∨i , βjk〉 6= 0 for k = 1, 2.
• the vertex i is not in Qw, s(i) does not exist in Q̟ and there exist j ∈ Qw with 〈β
i , βj〉 6= 0.
Because the vertex of the second type of holes is not a vertex in Qw we call such a hole a virtual
hole of Qw. We denote by Holes(Qw) the set of holes of Qw.
(ııı) The height h(i) of a vertex i is the largest positive integer n such that there exists a sequence
(ik)k∈[1,n] of vertices with i1 = 1, in = r and such that there is an arrow from ik to ik+1 for all
k ∈ [1, n − 1].
Many geometric properties of the Schubert variety X(w) can be read on its quiver. In particular
we proved in [Pe07, Corollary 4.12]:
PROPOSITION 2.4. — A Schubert subvariety X(w′) in X(w) is stable under Stab(X(w)) if and
only if β(Holes(Qw′)) ⊂ β(Holes(Qw)).
An easy consequence of this fact and the result by M. Brion and P. Polo that the smooth locus
of X(w) is the dense Stab(X(w))-orbit is the following:
PROPOSITION 2.5. — A Schubert variety X(w) is smooth if and only if all the holes of its
quiver Qw are virtual.
We will be more precise in Theorem 3.2 and we will describe the irreducible components of
the singular locus and the generic singularity of this component in terms of the quiver. The
Gorensteiness of the variety is also easy to detect on the quiver as we proved in [Pe07, Corollary
4.19]:
PROPOSITION 2.6. — A Schubert variety X(w) is Gorenstein if and only if all the peaks of its
quiver Qw have the same height.
3 Generic singularities of minuscule Schubert varieties
In this section, we go one step further in the direction of reading on the quiver Qw the geometric
properties of X(w). We will translate the results of M. Brion and P. Polo [BP99] on the irreducible
components of the singular locus of X(w) and the singularity at a generic point of such a component
in terms of the quiver Qw. We will need the following notations:
DEFINITION 3.1. — (ı) Let i be a vertex of Qw, we define the subquiver Q
w of Qw as the full
subquiver containing the following set of vertices {j ∈ Qw / j < i}. We denote by Qw,i the full
subquiver of Qw containing the vertices of Qw \ Q
w. We denote by w
i (resp. wi) the elements in
WP associated to the quivers Qiw (resp. Qw,i).
(ıı) A hole i of the quiver Qw is said to be essential if it is not virtual and if there is no hole in
the subquiver Qiw.
(ııı) Following M. Brion and P. Polo, denote by J the set β(Holes(Qw))
We then prove the following:
THEOREM 3.2. — (ı) The set of irreducible components of the singular locus of X(w) is in one
to one correspondence with the set of essential holes of the quiver Qw. In particular, if i is an
essential hole of Qw, the corresponding irreducible component is the Schubert subvariety X(wi) of
X(w) whose quiver is Qw,i.
(ııı) Furthermore, the singularity of X(w) at a generic point of X(wi) is the same singularity
as the one of the B-fixed point in the Schubert variety X(wi) whose quiver is Qiw.
Remark 3.3. — The singularity of the B-fixed point in X(wi) is described in [BP99].
Proof — This result is a reformulation of the main results of M. Brion and P. Polo [BP99].
Proposition 2.4 shows that the essential holes are in one to one correspondence with maximal
Schubert subvarieties in X(w) stable under Stab(X(w)) and that if i is an essential hole, then the
corresponding Schubert subvariety X(wi) is associated to the quiver Qw,i. According to [BP99],
these are the irreducible components of the singular locus.
To describe the singularity of X(wi), M. Brion and P. Polo define two subsets I and I
′ of the
set of simple roots as follows:
• the set I is the union of the connected components of J ∩ wi(RP ) adjacent to β(i)
• the set I ′ is the union I ∪ {β(i)}.
We describe these sets thanks to the quiver.
PROPOSITION 3.4. — The set I ′ is β(Qiw).
Proof — The elements in J ∩ wi(RP ) are the simple roots γ ∈ J such that wi
−1(γ) ∈ RP .
Thanks to Lemma 3.5, these elements are the simple roots in J neither in β(Holes(Qw,i)) nor in
β(Peaks(Qw,i)).
An easy (but fastidious for types E6 and E7) look on the quivers shows that I
′ = β(Qiw). A
uniform proof of this statement is possible but needs an involved case analysis on the quivers. �
LEMMA 3.5. — Let β be a simple root, then we have
1. w−1(β) ∈ R− \R−
if β ∈ β(Peaks(Qw)),
2. w−1(β) ∈ R+ \R+
if β ∈ β(Holes(Qw)) = J
3. w−1(β) ∈ R+
otherwise.
Proof — Let w = sβ1 · sβr be a reduced expression for w, we want to compute w
−1(β) =
sβr · · · sβ1(β). We proceed by induction and deal with the three cases at the same time.
1. Take first β ∈ β(Peaks(Qw)), we may assume that β1 = β and w
−1(β) = sβr · · · sβ2(−β). Let
i ∈ Peaks(Qw) such that β(i) = β, the quiver obtained by removing i has s(i) for hole (possibly
virtual). We may apply induction and the result in case 2.
2.a. Let β ∈ Jc. Assume first that there is no k ∈ Qw with β(k) = β. Then there exist an
i ∈ Qw such that 〈β
∨, βi〉 6= 0. Let us prove that such a vertex i is unique. Indeed, the support
of w is contained in a subdiagram D of the Dynkin diagram not containing β. The diagram D
contains the simple root α corresponding to P (except if X(w) is a point in which case w = Id
and the lemma is easy). The quiver Qw is in particular contained in the quiver of the minuscule
homogeneous variety associated to α ∈ D. It is easy to check on these quivers (see in [Pe07] for
the shape of these quivers) that there is a unique such vertex i.
Now consider the quivers Qiw and Qw,i. Recall that we denote by w
i and wi the associated
elements in W . We have w = wiwi. We compute w
i−1(β) and because all simple roots β(x)
for x ∈ Qiw with x 6= i are orthogonal to β we have w
i−1(β) = sβi(β) = β + βi. We then
have w−1(β) = w−1i (β + βi). Because i was the only vertex such that 〈β
∨, βi〉 6= 0, we have
w−1i (β) = β ∈ R
and by induction (note that i is now a hole of Qw,i) we have w
i (βi) ∈ R
+ \R+
and we have the result.
2.b. Now assume that there exist k ∈ Holes(Qw) with β(k) = β and let i a vertex maximal for
the property 〈β∨, βi〉 6= 0. Remark that we have k < i. Consider one more time the quivers Q
and Qw,i and the elements w
i and wi. We have w
−1(β) = w−1i (βi + β). But as before we have by
induction w−1
(βi) ∈ R
+ \ R+
so that we can conclude by induction as soon as k is not a peak of
Qw,i. But because k is an hole, there exist a vertex j ∈ Qw with j 6= i and such that there is an
arrow j → k in Qw. Because i was taken maximal j is a vertex of Qw,i and k is not a peak of this
quiver.
3. If β is not in the support of w but is not in β(Holes) then w−1(β) = β ∈ R+
Let β in β(Qw) but not in β(Holess(Qw)) or β(Peaks(Qw)) and let k the highest vertex such
that β(k) = β. There exists a unique vertex i ∈ Qw such that i ≻ k and 〈β
∨, β(i)〉 6= 0. We
have w−1(β) = w−1i (βi + β) and the vertex k is a peak of Qw,i so that wi = sβ(k)wk = sβwk and
w−1(β) = w−1
(βi). Now it is easy to see that either s(i) does not exists and in this case it is not a
virtual hole or it exists but is neither a peak nor a hole of Qw,k. We conclude by induction on the
third case. �
The Theorem is now a corollary of the description of the singularities thanks to I and I ′ done
by M. Brion and P. Polo. �
Remark 3.6. — In their article M. Brion and P. Polo also deal with the cominucule Schubert
varieties. We believe that, in that case, Theorem 0.3 should hold true as well as Corollary 0.4.
It is now easy to decide which generic singularity is Gorenstein:
COROLLARY 3.7. — Let i be an essential hole of the quiver Qw. The generic point of the
irreducible component X(wi) of the singular locus is Gorenstein if and only if all the peaks of Q
are of the same height.
We describe the Schubert subvarieties X(w′) in X(w) that are expected to be Gorenstein at
their generic point by the conjecture of A. Woo and A. Yong. Let us give the following
DEFINITION 3.8. — (ı) An essential hole is said to be Gorenstein if the generic point of the
associated irreducible component of the singular locus is in the Gorenstein locus.
(ıı) A Schubert subvariety X(w′) in X(w) is said to have the property (WY) if the generic point
of any irreductible component of the singular locus of X(w) containing X(w′) is in the Gorenstein
locus of X(w).
We have the following:
PROPOSITION 3.9. — Let X(w′) be a Schubert subvariety of the Schubert variety X(w). If the
generic point of X(w′) is Gorentein in X(w), then X(w′) has the property (WY).
Proof — Let X(v) be an irreducible component of the singular locus of X(w) containing X(w′).
Because the property of beeing non Gorenstein is stable under closure, this implies that the generic
point of X(v) is Gorenstein in X(w). �
Remark that, because all the irreducible components of the singular locus of X(w) are stable
under Stab(X(w)), the property (WY) need only to be checked on Stab(X(w))-stable Schubert
subvarieties.
PROPOSITION 3.10. — (ı) The Schubert subvarieties X(w′) in X(w) stable under Stab(X(w))
are exactely those such that the associated quiver Qw′ satisfies
Qw′ =
i∈Holes(Qw)
Qw,ski(i)
where the (ki)i∈Holes(Qw) are integers greater or equal to −1 (if ki = −1, the quiver Qw,ski(i) is Qw
by definition).
(ıı) A Stab(X(w))-stable Schubert subvariety X(w′) of X(w) has the property (WY) if and only
if the only essential holes in the difference Qw \Qw′ are Gorenstein. Equivalentely, writing
Qw′ =
i∈Holes(Qw)
Qw,ski(i),
if and only if the only holes in of the quivers (Q
ski (i)
w )i∈Holes(Qw) are Gorenstein holes. Another
equivalent formulation is that Qw′ contains all the non Gorenstein essential holes of Qw.
Proof — (ı) Consider the subquiver Qw′ in Qw and for each hole i of Qw define the integer
ki = min{k ≥ 0 / s
k(i) ∈ Qw′} − 1. Because of the fact (see for example [LMS79]) that the
strong and weak Bruhat orders coincide for minuscule Schubert varieties, the quiver Qw′ has to be
contained in the intersection
i∈Holes(Qw)
Qw,ski(i).
We therefore need to remove some vertices to Q′ to get Qw′. But removing a vertex j of the quiver
Q′ (it has to be a peak of Q′) creates a hole in s(j) (or a virtual hole in j if s(j) does not exist).
Because X(w′) is Stab(X(w))-stable, the last removed vertex j is such that β(j) ∈ β(Holes(Qw)).
This implies that no more vertex can be removed from Q′ to get Qw′ and in particular Qw′ = Q
(ıı) The Schubert subvariety has the property (WY) if and only if all the irreducible components
X(wi) of the singular locus of X(w) containing X(w
′) are such that i is a Gorenstein hole. But
X(w′) is contained in X(wi) if and only if Qw′ is contained in Qw,i. This is equivalent to the fact
that Qiw is contained in Qw \Qw′ and the proof follows. �
4 Relative canonical model and Gorenstein locus
In this section, we recall the explicit construction given in [Pe07] of the relative canonical model
of X(w). Recall that we described in [Pe07] the Bott-Samelson resolution π : X̃(w) → X(w) as a
configuration variety à la Magyar [Ma98]:
X̃(w) ⊂
G/Pβi
where Pβi is the maximal parabolic associated to the simple root βi. The map π : X̃(w) → X(w)
is given by the projection
G/Pβi → G/Pβm(w) where m(w) is the smallest element in Qw.
We define a partition on the peaks of the quiver Qw and a partition of the quiver itself:
DEFINITION 4.1. — (ı) Define a partition (Ai)i∈[1,n] of Peaks(Qw) by induction: A1 is the set of
peaks with minimal height and Ai+1 is the set of peaks in Peaks(Qw)\
k=1Ak with minimal height
(the integer n is the number of different values the height function takes on the set Peaks(Qw)).
(ıı) Define a partition (Qw(i))i∈[1,n] of Qw by induction:
Qw(i) = {x ∈ Qw / ∃j ∈ Ai : x 4 j and x 64 k ∀k ∈ ∪j>iAj}.
We proved in [Pe07] that these quivers Qw(i) are quivers of minuscule Schubert varieties and
in particular have a minimal element mw(i). We defined the variety X̂(w) as the image of the
Bott-Samelson resolution X̃(w) (seen as a configuration variety) in the product
i=1 G/Pβmw(i) .
Because mw(n) = m(w) we have a map π̂ : X̂(w) → X(w) and a factorisation
X̃(w)
// X̂(w)
X(w).
We proved the following result in [Pe07]:
THEOREM 4.2. — (ı) The variety X̂(w) together with the map π̂ realise X̂(w) as the relative
canonical model of X(w).
(ıı) The variety X̂(w) is a tower of locally trivial fibrations with fibers the Schubert varieties
associated to the quivers Qw(i). In particular X̂(w) is Gorenstein.
We will use this resolution to prove our main result. Indeed, we will prove that the generic fibre
of the map π̂ : X̂(w) → X(w) above a (WY) Schubert subvariety X(w′) is a point. In other words,
the map π̂ is an isomorphism on an open subset of X(w′). As a consequence, the generic point of
X(w′) will be in the Gorenstein locus.
Let us recall some facts on X̃(w) and X̂(w) (see [Pe07]):
FACT 4.3. — (ı) To each vertex i of Qw one can associated a divisor Di on X̃(w) and all these
divisors intersect transversally.
(ıı) For K a subset of the vertices of Qw, we denote by ZK the transverse intersection of the
Di for i ∈ K.
(ııı) The image of the closed subset ZK by the map π is the Schubert variety X(wK) whose
quiver QwK is the biggest reduced subquiver of Qw not containing the vertices in K.
The quiver Qw(i) defines a element w(i) in W and the fact that these quivers realise a partition
of Qw implies that we have an expression w = w(1) · · ·w(n) with l(w) =
l(w(i)). We prove the
following generalisation of this fact:
PROPOSITION 4.4. — Let K be a subset of the vertices of Qw. The image of the closed subset
ZK by the map π̃ is a tower of locally trivial fibrations with fibers the Schubert varieties X(wK(i))
whose quiver QwK(i) is the biggest reduced subquiver of Qw(i) not containing the vertices of K∩Qw(i).
This variety is the image by π̃ of Z∪n
QK(i).
Proof — As we explained in [Pe07, Proposition 5.9], the Bott-Samelson resolution is the quotient
of the product
Ri where theRi are certain minimal parabolic subgroups by a product of Borel
subgroups
i=1 Bi. The variety X̂(w) is the quotient of a product
i=1 Ni of parabolic subgroups
such that the multiplication in G maps
k∈Qw(i)
Rk to Ni by a product
i=1Mi of parabolic
subgroups. The map π̃ is induced by the product from
Ri to
i=1Ni. In particular, this
means that for i ∈ [1, n] fixed, the map
k∈Qw(i)
→ Ni induces the map from the Bott-Samelson
resolution X̃(w(i)) to X(w(i)). We may now apply part (ııı) of the preceding fact because the
quiver Qw(i) is minuscule. �
We now remark that the quivers Qw′ associated to Schubert subvarieties X(w
′) in the Schu-
bert variety X(w) having the property (WY) have a nice behaviour with repect to the partition
(Qw(i))i∈[1,n] of Qw.
PROPOSITION 4.5. — Let X(w′) be a Stab(X(w))-stable Schubert subvariety of X(w) having
the property (WY). Let us denote by (Cj)j∈[1,k] the connected components of the subquiver Qw \Qw′
of Qw. Then for each j, there exist an unique ij ∈ [1, n] such that Cj ⊂ Qw(ij).
Proof — Recall from Proposition 3.10 that, denoting by GorHol(Qw) the set of Gorenstein holes
in Qw, we may write
Qw \Qw′ =
i∈GorHol(Qw)
with ki an integer greater or equal to −1 and with the additional condition that Q
ski(i)
w contains
only Gorenstein holes. Because the quivers Q
ski(i)
w are connected, any connected component of
Qw \Qw′ is an union of such quivers. But we have the following:
LEMMA 4.6. — Let i ∈ Holes(Qw) and assume that Q
sk(i)
w meets at least two subquivers of the
partition (Qw(i))i∈[1,n], then Q
sk(i)
w contains a non Gorenstein hole.
Proof — The quiver Q
sk(i)
w meets two subquivers of the partition (Qw(i))i∈[1,n], in particular it
contains two peaks of Qw of different heights. By connexity of Q
sk(i)
w , we may assume that these
two peaks are adjacent. In particular there is a hole between these two peaks and this hole is not
Gorenstein and is contained in Q
sk(i)
w . �
The proposition follows. �
We describe the inverse image by π̂ of a Stab(X(w))-stable Schubert subvariety of X(w) having
the property (WY). To do this, first remark that the map π is B-equivariant and that the inverse
image π−1(X(w′)) has to be a union of closed subsets ZK for some subsets K of Qw. Let ZK ⊂
π−1(X(w′)) be such that π : ZK → X(w
′) is dominant. We will denote by Qw
w (i) the intersection
Qw′ ∩Qw(i) and by w
′(i) the associated element in W .
PROPOSITION 4.7. — The image of ZK in X̂(w) by π̃ is the same as the image of ZQw\Qw′ .
Proof — Thanks to Proposition 4.4 we only need to compute the quivers QwK(i). Consider the
decomposition into connected components Qw \Qw′ = ∪
j=1Cj . We may decompose K accordingly
as K = ∪kj=1Kj where Kj = K ∩ Cj. But because each connected component of Qw \ Qw′ is
contained in one of the quivers (Qw(i))i∈[1,n] this implies that QwK(i) is exactely QwK ∩ Qw(i)
where QwK is the biggest reduced quiver in Qw Qw not containing the vertices in K (see Fact 4.3).
We get QwK = Qw′ (because ZK is sent onto X(w
′)) and the result follows. �
THEOREM 4.8. — Let X(w′) be a Schubert subvariety in X(w). Then X(w′) has the property
(WY) if and only if its generic point is in the Gorenstein locus of X(w).
Proof — We have already seen in Proposition 3.9 that if the generic point of X(w′) is in the
Gorenstein locus of X(w) then X(w′) has the property (WY).
Conversely let X(w′) be a Schubert subvariety having the property (WY). The previous propo-
sition implies that its inverse image π̂−1(X(w′)) is the variety π̃(ZQw\Qw′ ). But this implies that
the map π̂ : π̃(ZQw\Qw′ ) = π̂
−1(X(w′)) → X(w′) is birational (because the varieties have the
same dimension given by the number of vertices in the quiver). In particular, the map π̂ is an
isomorphism on an open subset of X(w) meeting X(w′) non trivially. Therefore, because X̂(w) is
Gorenstein, it is the case of the generic point in X(w′) as a point in X(w). �
References
[BL00] Sara Billey and Venkatramani Lakshmibai : Singular loci of Schubert varieties. Progress in Math-
ematics, 182. Birkhäuser Boston, Inc., Boston, MA, 2000.
[BW03] Sara Billey and Gregory Warrington: Maximal singular loci of Schubert varieties in SL(n)/B.
Trans. Amer. Math. Soc. 355 (2003), no. 10, 3915–3945.
[Bo68] Nicolas Bourbaki : Éléments de mathématique. Fasc. XXXIV. Groupes et algèbres de Lie.
Chapitre IV : Groupes de Coxeter et systèmes de Tits. Chapitre V: Groupes engendrés par des
réflexions. Chapitre VI: systèmes de racines. Actualités Scientifiques et Industrielles, No. 1337
Hermann, Paris 1968.
[BP99] Michel Brion and Patrick Polo: Generic singularities of certain Schubert varieties. Math. Z. 231
(1999), no. 2, 301–324.
[Co03] Aurélie Cortez : Singularités génériques et quasi-résolutions des variétés de Schubert pour le
groupe linéaire. Adv. Math. 178 (2003), no. 2, 396–445.
[KLR03] Christian Kassel, Alain Lascoux and Chritophe Reutenauer : The singular locus of a Schubert
variety. J. Algebra 269 (2003), no. 1, 74–108.
[LMS79] Venkatramani Lakshmibai, Chitikila Musili and Conjeerveram S. Seshadri : Geometry of G/P .
III. Standard monomial theory for a quasi-minuscule P . Proc. Indian Acad. Sci. Sect. A Math.
Sci. 88 (1979), no. 3, 93–177.
[LS90] Venkatramani Lakshmibai and B. Sandhya: Criterion for smoothness of Schubert varieties in
Sl(n)/B. Proc. Indian Acad. Sci. Math. Sci. 100 (1990), no. 1, 45–52.
[Ma98] Peter Magyar : Borel-Weil theorem for configuration varieties and Schur modules. Adv. Math.
134 (1998), no. 2, 328–366.
[Ma01a] Laurent Manivel : Le lieu singulier des variétés de Schubert. Internat. Math. Res. Notices 2001,
no. 16, 849–871.
[Ma01b] Laurent Manivel : Generic singularities of Schubert Varieties. math.AG/0105239 (2001).
[Pe07] Nicolas Perrin: Small-resolutions of minuscule Schubert varieties. Preprint math.AG/0601117 to
appear in Compositio Mathematica (2007).
[WY06a] Alexander Woo and Alexander Yong: When is a Schubert variety Gorenstein? Adv. Math. 207
(2006), no. 1, 205–220.
[WY06b] Alexander Woo and Alexander Yong: Governing singularities of Schubert varieties. Preprint
math.AG/0603273 (2006).
Université Pierre et Marie Curie - Paris 6
UMR 7586 — Institut de Mathématiques de Jussieu
175 rue du Chevaleret
75013 Paris, France.
email : nperrin@math.jussieu.fr
ABSTRACT
  In this article, we describe explicitely the Gorenstein locus of all
minuscule Schubert varieties. This proves a special case of a conjecture of A.
Woo and A. Yong (see math.AG/0603273) on the Gorenstein locus of Schubert
varieties.

<|endoftext|><|startoftext|>
Introduction
In this paper, we address the peculiarities of criticality under an influence of the random
anisotropy of structure. To be more specific, given a reference system is a 3d magnet with
n-component order parameter which below the second order phase transition point Tc
characterizes a ferromagnetic state, what will be the impact of random anisotropy [1–3]
on the critical dynamics [4,5] of this transition? It appears, that contrary to the general
believe that even weak random anisotropy destroys ferromagnetic long-range order at
d = 3, this is true only for the isotropic random axis distribution [6]. Therefore, we will
study a particular case, when the second order phase transition survives and, moreover,
it remains in the random-Ising universality class [7, 8] for any n. A particular feature
of 3d systems which belong to the random-Ising universality class is that their heat
capacity does not diverge at Tc (it is the isothermal magnetic susceptibility which
manifests singularity) [9]. Again, general arguments state [10,11] that for such systems
the relaxational critical dynamics of the non-conserved order parameter coupled to a
conserved density, model C dynamics, degenerates to purely relaxation model without
any couplings to conserved densities (model A). Nevertheless, this statement is true only
in the asymptotics [12,13] (i.e. at Tc, which in fact is never reached in experiments or in
simulations). As we will show in the paper, common influence of two different factors:
randomness of structure and coupling of dynamical modes leads to a rich effective critical
behavior which possesses many new unexpected features.
Dynamical properties of a system near the critical point are determined by the
behavior of its slow densities. In addition to the order parameter density ϕ these are
the conserved densities. Here, we consider the case of one conserved density m. For the
description of critical dynamics the characteristic time scales for the order parameter,
tϕ, and for the conserved density, tm, are introduced. Approaching the critical point,
where the correlation length ξ is infinite, they are growing accordingly to the scaling
tϕ ∼ ξz, (1)
tm ∼ ξzm . (2)
These power laws define the dynamical critical exponents of the order parameter, z,
and of the conserved densities, zm. The conserved density dynamical exponents may be
different from that of the order parameter.
The simplest dynamical model taking into account conserved densities is model
C, [4, 14] which contains a static coupling between non-conserved n-dimensional order
parameter ϕ and scalar conserved density m. Being quite simple, the model can be
applied to the description of different physical systems. In particular, in a lattice
model of intermetallic alloys [15] the non-conserved order parameter corresponds to
differences in the concentration of atoms of certain kind between the odd and even
sublattices. It is coupled to a conserved quantity – the concentration of atoms of
this kind in the full system. In the supercooled liquids the fraction of locally favored
Model C critical dynamics of random anisotropy magnets 3
structures is non-conserved “bond order parameter”, coupled to the conserved density
of a liquid [17]. Systems containing annealed impurities with long relaxational times [18]
manifest certain similarity with the model C as well.
Dynamical properties of a model with coupling to a conserved density were less
studied numerically than those for model without any coupling to secondary densities.
It may be the consequence of the complexity of the numerical algorithms, which turn
out to be much slower than for the simpler model. Simulations were performed for an
Ising antiferromagnet with conserved full magnetization and non-conserved staggered
magnetization (i.e. the order parameter) [19] and also for an Ising magnet with conserved
energy [20].
Theoretical analysis of model C critical dynamics were performed by means of
the field-theoretical renormalization group. Critical dynamical behavior of model C in
different regions of d − n plane was analyzed by ε = 4 − d expansion in first order in
ε [14]. The results lead to speculations about the existence of an anomalous region
for 2 < n < 4, where the order parameter is much faster than the conserved density
and dynamic scaling is questionable. Recent two-loop calculation [21, 22] corrected the
results of Ref. [23] and showed an absence of the anomalous region 2 < n < 4.
For the 3d model C with order parameter dimension n = 1, the conserved density
lead to the ”strong” scaling: [21,22] the dynamical exponents z and zm coincide and are
equal 2 + α/ν, where α and ν are the specific heat and the correlation length critical
exponents, correspondingly. For the Ising system (n = 1) the specific heat diverges
and α > 0. While for a system with α < 0, that is for the physically interesting
cases n = 2, 3, the scalar density decouples from the order parameter density in the
asymptotic region. It means that for such values of n the order parameter scales with
the same dynamical critical exponent z as in the model A and the dynamical exponent
of the scalar density is equal to zm = 2. The importance of the sign of α was already
mentioned in Ref. [14].
A rich critical dynamical behavior has already been observed in system with
structural disorder [18, 24–27]. Interest in this case is increased by the fact that real
materials are always characterized by some imperfection of their structure. Obviously,
that models describing their properties should contain terms connected with structural
disorder of certain type. For the static behavior of a system with quenched energy
coupled disorder (e.g. dilution), the Harris criterion [28] states that disorder does not
lead to a new static universality class if the heat capacity of the pure system does not
diverge, that is α < 0. In appears that in diluted systems α < 0 is always the case
(see Ref. [9]). The conclusion about influence of coupling between order parameter and
secondary density works also in this case. The presence of a secondary density does
not affect the dynamical critical properties in the asymptotics [10]: order parameter
dynamics is the same as in an appropriate model A, and zm = 2. Nevertheless, as we
noted at the beginning, the coupling between the order parameter and the secondary
density considerably influences the non-asymptotic critical behavior [12, 13].
We are interested in the critical dynamics of a systems with structural disorder of
Model C critical dynamics of random anisotropy magnets 4
another type, namely, random anisotropy magnets. Their properties are described by
the random anisotropy model (RAM) introduced in Ref. [1]. In this spin lattice model
each spin is subjected to a local anisotropy of random orientation, which essentially is
described by a vector and therefore is defined only for n > 1. The Hamiltonian reads: [1]
H = −
JR,R′ ~SR~SR′ − D̄
(x̂R~SR)
2, (3)
where, ~S = (S1, ..., Sn), are n-component vectors located on the sites R of a d-
dimensional cubic lattice, D̄ > 0 is an anisotropy constant, x̂ is a random unit vector
pointing in direction of the local anisotropy axis. The short-range interaction JR,R′ is
assumed to be ferromagnetic.
The static critical behavior of RAM was analyzed by many theoretical and
numerical investigations which could be compared with the critical properties of random
anisotropy magnets found in experiments (for recent review see Ref. [3]). The results
of this analysis bring about that random anisotropy magnets do not show a second
order phase transition for an isotropic random axis distribution. However they possibly
undergo a second-order phase transition for an anisotropic distribution (for references
see reviews Refs. [2,3]). Renormalization group studies of the asymptotic [7,8,29–31] and
non-asymptotic properties [3] of RAM corroborated such a conclusion. For example, the
RAM with random axes distributed due to the so-called cubic distribution was shown
within two-loop approximation to undergo a second order phase transition governed by
the random Ising critical exponents, [3, 8] as first suggested in Ref. [7]. Recently this
result found its confirmation in a five-loop RG study [31]. The cubic distribution allows
x̂ to point only along one of the 2n directions of the axes k̂i of a (hyper)cubic lattice: [29]
p(x̂) =
δ(n)(x̂− k̂i) + δ(n)(x̂+ k̂i)
, (4)
where δ(y) are Kronecker’s deltas.
Contrary to the static critical behavior of random anisotropy magnets their
dynamics was less investigated. Only dynamical models for systems with isotropic
distribution were briefly discussed in Refs. [32,33]. The critical dynamics was discussed
within model A, Ref. [34], and the dynamical exponents were calculated. However, it
does not give a comprehensive quantitative description since it is (i) restricted to the
isotropic distribution of the random axis and (ii) it is performed only within the first
non-trivial order of ε = 4− d expansion.
The model A critical dynamics of RAM with cubic random axis distribution was
analyzed within two-loop approximation in Ref. [35] Although the asymptotic dynamical
properties found coincide with those of the random-site Ising model, the non-asymptotic
behavior is strongly influenced by the presence of random anisotropy [35].
Beside the slow order parameter an additional slow conserved densities might be
present, for instance the energy density. Therefore considering the non-asymptotic
dynamical behavior of the RAM an extension to model C is of interest. Indeed, there
Model C critical dynamics of random anisotropy magnets 5
exist magnets where the distribution of the local random axes is anisotropic (e.g. the
rare earth compounds, see Ref. [3]).
The structure of the paper is as follows: Section 2 presents the equations defining
the dynamical model and its Lagrangian, the renormalization is performed is Section 3,
there the asymptotic and effective dynamical critical exponents are defined. In Section
4 we give the expressions for the field-theoretic functions in two-loop order and the
resulting non-asymptotic behavior is discussed. Section 5 summarizes our study. Details
of the perturbation expansion are presented in the appendix.
2. Model equations
Here we consider the dynamical model for random anisotropy systems described by
(3) with random axis distribution (4). The structure of the equations of motion for
n-component order parameter ~ϕ0 and secondary density [4, 14] m0 is not changed by
presence of random anisotropy
∂ϕi,0
= − Γ̊ ∂H
∂ϕi,0
+ θϕi , i = 1 . . . n, (5)
= λ̊∇2 ∂H
+ θm . (6)
The order parameter relaxes and conserved density diffuses with the kinetic coefficients
Γ̊, λ̊ correspondingly. The stochastic forces θϕi, θm obey the Einstein relations:
<θϕi(x, t)θϕj (x
′, t′)>= 2Γ̊δ(x− x′)δ(t− t′)δij , (7)
<θm(x, t)θm(x
′, t′)> =−2̊λ∇2δ(x−x′)δ(t−t′)δij . (8)
The disorder-dependent equilibrium effective static functional H describing behavior of
system in the equilibrium reads:
|∇~ϕ0|2+̊r̃|~ϕ0|2
|~ϕ0|4−D0(x̂~ϕ0)2 +
m20 +
γ̊m0|~ϕ0|2 − h̊m0
, (9)
where D0 is an anisotropy constant proportional to D̄ of Eq. (3), ˚̃r and ˚̃v depend on D̄
and the coupling of the usual φ4 model.
Integrating out the secondary density one reduces (9) to usual Ginzburg-Landau-
Wilson model with random anisotropy term and new parameter v̊ and r̊ connected to
the model parameters ˚̃r,˚̃v, γ̊ and h via relations:
r̊ =˚̃r + γ̊h̊, v̊ =˚̃v − 3̊γ2 (10)
We study the critical dynamics by applying the Bausch-Janssen-Wagner approach
[36] of dynamical field-theoretical renormalization group (RG). In this approach, the
critical behavior is studied on the basis of long-distance and long-time properties of
the Lagrangian incorporating features of dynamical equations of the model. The model
defined by expressions (5)-(9) within Bausch-Janssen-Wagner formulation [36] turns out
Model C critical dynamics of random anisotropy magnets 6
to be described by an unrenormalized Lagrangian:
ϕ̃0,iϕ̃0,i+
ϕ̃0,i
+Γ̊(̊µ̃−∇2)
ϕ0,i +
λ̊m̃0∇2m̃0 + m̃0
− λ̊∇2
Γ̊̊ṽϕ̃0,iϕ0,i
ϕ0,jϕ0,j +
2Γ̊D0(x̂ϕ̃0,i(t))(x̂ϕ0,i(t)) + Γ̊γ̊m0ϕ̃0,iϕ0,i−
λ̊̊γm̃0∇2ϕ0,iϕ0,i
,(11)
with auxiliary response fields ϕ̃i(t). There are two ways to average over the disorder
configurations for dynamics. The first way originates from statics and consists in using
the replica trick, [37] where N replicas of the system are introduced in order to facilitate
configurational averaging of the corresponding generating functional. Finally the limit
N → 0 has to be taken.
However we follow the second way proposed in Ref. [33]. There it was shown that
the replica trick is not necessary if one takes just the average of the Lagrangian with
respect to the distribution of random variables. The Lagrangian obtained in this way is
described by the following expression:
ϕ̃i,0
+Γ̊(̊µ̃−∇2)
ϕi,0−Γ̊ϕ̃i+
Γ̊̊ṽ
ϕj,0ϕj,0+
ϕ3i,0
+λ̊m̃0∇2m̃0 + m̃0
− λ̊∇2
m0 + Γ̊γ̊m0ϕ̃i,0ϕi,0−
λ̊γ̊m̃0∇2ϕi,0ϕi,0+
ϕ̃i,0(t)ϕi,0(t)
Γ̊2ů
ϕ̃j,0(t
′)ϕj,0(t
Γ̊2ẘ
ϕ̃i,0(t
′)ϕi,0(t
. (12)
In Eq. (12), the bare mass is ˚̃µ = ˚̃r − D/n, and bare couplings are ů > 0, ˚̃v > 0,
ẘ < 0. Terms with couplings ů and ẘ are generated by averaging over configurations
and the values of ů and ẘ are connected to the moments of distribution (4). Therefore
the ratio of the two couplings has to be ẘ/ů = −n. The ẙ-term in (12) does not
result from the averaging procedure but has to be included since it is generated in the
perturbational treatment. It can be of either sign.
3. RG functions
We perform renormalization within minimal subtraction scheme introducing renormal-
ization factors Zai , ai = {{α}, {δ}}, leading to the renormalized parameters {α} =
{u, v, w, y, γ,Γ, λ} and renormalized densities {δ} = {ϕ, ϕ̃,m, m̃}. For the specific heat
we need also an additive renormalization Aϕ2 which leads to the function
Bϕ2(u,∆) = µ
εZ2ϕ2µ
µ−εAϕ2
, (13)
Model C critical dynamics of random anisotropy magnets 7
with the scale parameter µ and factor Zϕ2 that renormalizes the vertex with ϕ
2 insertion.
From the Z-factors one obtains the ζ-functions describing the critical properties
ζa({α}) = −
d lnZa
d lnµ
, (14)
Relations between the renormalization factors lead to corresponding relations between
the ζ-functions. In consequence for the description of the critical dynamics one needs
only ζ-functions of the couplings, ζui (ui = {u, v, w, y} for i = 1, 2, 3, 4), the order
parameter ζϕ, the auxiliary field ζϕ̃, ϕ
2-insertion ζϕ2 and also function Bϕ2 . In particular,
the ζ-function of the time scale ratio
introduced for the description of dynamic properties is related to the above ζ-functions:
ζϕ̃ − γ2Bϕ2 . (16)
The behavior of the model parameters under renormalization is described by the
flow equations
= β{α} . (17)
The β-functions for the static model parameters have the following explicit form:
βui = ui(ε+ ζϕ + ζui), (18)
βγ = γ(
+ ζϕ2 +
Bϕ2). (19)
The dynamic β-function for the time scale ratio W reads
βW = WζW = W (
ζϕ̃ − γ2Bϕ2). (20)
The asymptotic critical behavior of the system is obtained from the knowledge of
the fixed points (FPs) of the flow equations (17). A FP {α∗} = {u∗, v∗, w∗, y∗, γ∗,W ∗}
is defined as simultaneous zero of the β-functions. The set of equations for the static
fourth order couplings decouple from the other β-functions. Thus for each of the FPs
of the static forth order couplings {u∗i } one obtains two FP values of the static coupling
between the order parameter and the conserved density γ:
=0 and γ∗
ε− 2ζϕ2({u⋆i })
Bϕ2({u⋆i })
νBϕ2({u⋆i })
, (21)
where α and ν are the heat capacity and correlation length critical exponent calculated
at the corresponding FP {u∗}. Inserting the obtained values for the static FPs into the
β-function (20) one finds the corresponding FP values of the time scale ratio W .
The stable FP accessible from the initial conditions corresponds to the critical
point of system. A FP is stable if all eigenvalues ωi of the stability matrix ∂βαi/∂αj
calculated at this FP have positive real parts. The values of ωi indicate also how fast
the renormalized model parameters reach their fixed point values.
Model C critical dynamics of random anisotropy magnets 8
From the structure of β-functions we conclude, that the stability of any FP with
respect to the parameters γ and W is determined solely by the derivatives of the
corresponding β-functions:
, ωW =
. (22)
Moreover using (19) we can write:
ωγ = −
ε− 2ζϕ2({ui})
γ2Bϕ2(u,∆) , (23)
which at the FP {α∗} leads to:
ωγ|{α}={α∗} = −
for γ∗
= 0 , (24)
ωγ|{α}={α∗} =
for γ∗
2 6= 0 . (25)
Therefore, a stability with respect to parameter γ is determined by the sign of the
specific heat exponent α. For a system with non-diverging heat capacity (α < 0) at the
critical point, γ∗ = 0 is the stable FP. Static results report that the stable and accessible
FP is of a random site Ising type. In this case α < 0. This leads to the conclusions that
in the asymptotic region the secondary density decouples from the order parameter.
The critical exponents are defined by the FP values of the ζ-functions. For instance,
the asymptotic dynamical critical exponent z is expressed at the stable FP by:
z = 2 + ζΓ({α∗}), (26)
ζΓ({α}) =
ζϕ({ui})−
ζϕ̃({α}). (27)
In similar way the dynamical critical exponent zm for the secondary density is defined
zm = 2 + ζm({u∗i }, γ∗), (28)
where
ζm({ui}, γ) =
γ2Bϕ2({ui}). (29)
While their effective counterparts in the non asymptotic region are defined by the
solution of flow equations (17) as
zeff = 2 + ζΓ({ui(ℓ)}, γ(ℓ),W (ℓ)), (30)
zeffm = 2 + ζm({ui(ℓ)}, γ2(ℓ)). (31)
In the limit ℓ → 0 the effective exponents reach their asymptotic values. In the next
section we analyze the possible scenarios of effective dynamical behavior as well as check
the approach to the asymptotical regime.
Model C critical dynamics of random anisotropy magnets 9
4. Results
4.1. Asymptotic properties
The static two-loop RG functions of RAM with cubic random axis distribution in the
minimal substraction scheme agree with the results obtained in Ref. [3] using the replica
trick and read:
βu= − εu+
n + 2
uw+yu+
11 (n+ 2)
−5 (n + 2)
v2u−11
u2w− 5
uw2−11
u2y − 5
vuw −
vuy −
v2w −
w2v, (32)
βv= − εv+
v2+2vu+
wv+yv−
3n+ 14
11n+58
wvu, (33)
βw= − εw+
w2+2wu+
wv+yw−7
w3−29
w2u−41
wu2−31
v2w−11
w2y− 5
y2w−17
wuy−5n+ 34
wvu−5
vwy, (34)
βy = − εy +
y2 + 2yu+ 2yv + 2wy +
wv−17
y3 − 41
y2u−23
y2v−23
y2w−5n+ 82
v2y−41
w2y−2w2v−
n + 18
5n+ 82
vuw, (35)
, (36)
u2− 5
n + 2
12v2−5
wu−5yu
. (37)
Here, u, v, w, y stand for the renormalized couplings.
Given the expression for the function ζϕ2 , Eq. (37), the function βγ can be
constructed via Eq. (19) and the two-loop expression Bϕ2 = n/2.
In order to discuss the dynamical FPs it turns out to be useful to introduce the
parameter ρ = W/(1 + W ) which maps W and its FPs into a finite region of the
Model C critical dynamics of random anisotropy magnets 10
parameter space ρ. Then instead of the flow equation for W the flow equation for ρ
arises in (17):
= βρ({ui}, γ, ρ), (38)
where according to (20)
βρ({ui}, γ, ρ) = ρ(ρ− 1)(ζΓ({ui}, γ, ρ)− γ2Bϕ2({ui})). (39)
The function ζΓ in the above expression is obtained from Eq (27) using the static
function ζϕ (36) and the two loop result for the dynamic function ζϕ̃ (calculated from
Eq (A.8)). We get the following two-loop expression for ζΓ:
ζΓ = −
+ γ2ρ+
(6 ln (4/3)− 1)
(n + 2)
vy + y2
5u2 + (n + 2)uv + 10uw + 3uy + 3vw + 5w2 + 3wy
(n+ 2)
v + y
(1− 3 ln (4/3)) +
3 (n+ 2)
ln (4/3) + (1 + ρ) ln
1− ρ2
+ u+ w
ρ2 ln (ρ)
+ (3 + ρ) ln (1− ρ)
. (40)
The two-loop result [22] for the pure model C is recovered by setting in (40) the couplings
u, w, y equal to zero. While setting γ = 0 in (40) the result for model A with random
anisotropy [35] is recovered. The γ2u, γ2w, γ2y-terms represent the intrinsic contribution
of model C for random anisotropy magnets.
There are two different ways to proceed with the numerical analysis of the
perturbative expansions for the RG functions (32) - (35), (40). The first one is an
ε-expansion [38] whereas the second one is the so-called fixed-dimension approach [39].
In the frames of the latter approach, one fixed ε and solves the non-linear FP equations
directly at the space dimension of interest (i.e. at ε = 1 in our d = 3 case). Whilst
in many problems these two ways serve as complementing ones, it appears that for
certain cases only one of them, namely the fixed-d approach leads to the quantitative
description. Indeed, as it is well known by now, the ε-expansion turns into the
expansion for the random-site Ising model and no reliable numerical estimates can be
obtained on its basis (see [9] and references therein). As one will see below, the random-
site Ising model behavior emerges in our problem as well, therefore we proceed within
the fixed-d approach.
The series for RG functions are known to diverge. Therefore to obtain reliable
results on their basis we apply the Padé-Borel resummation procedure [40] to the static
functions. It is performed in following way: we construct the Borel image of the resolvent
series [41] of the initial RG function f :
0≤i+j+l+k≤2
ai,j,k,l(ut)
i(vt)j(wt)k(yt)l →
Model C critical dynamics of random anisotropy magnets 11
0≤i+j+l+k≤2
ai,j,k,lu
ivjwkylti+j+k+l
Γ(i+ j + 1)
where f stands for one of the static RG functions βui , βγ/γ − γ2n/4, ai,j,k,l are the
corresponding expansion coefficients given by Eqs. (32)–(37), and Γ(i+ j+1) is Euler’s
gamma function. Then, the Borel image is extrapolated by a rational Padé approximant
[42] [K/L](t). Within two-loop approximation we use the diagonal approximant with
linear denominator [1/1]. As it is known in the Padé analysis, the diagonal approximants
ensure the best convergence of the results [42]. The resummed function is then calculated
by an inverse Borel transform of this approximant:
f res =
dt exp(−t)[1/1](t). (41)
As far as the above procedure enables one to restore correct static RG flow (as sketched
below) we do not further resum the dynamic RG function βW .
The analysis of the static functions β{ui} at fixed dimension d = 3 brings about an
existence of 16 FPs [3, 31]. Only ten of these FPs are situated in the region of physical
interest u > 0, v > 0, w < 0. Corresponding values of FP coordinates can be found in
Ref. [3].
For each static FP {u∗i} we obtain a set of dynamical FPs with different γ∗ and
ρ∗. The FPs obtained for n = 2, 3 are listed in Table 1 and Table 2 correspondingly.
Stability exponents ωγ and ωρ are given in tables as well. Here we keep the numbering
of FPs already used in Refs. [3, 8, 29, 31, 35]. It is known from the statics that FP
XV governs the critical behavior of RAM with cubic distribution. This FP is of the
same origin as the FP of random-site Ising model therefore all static critical exponents
coincide with those of the random-site Ising model. Since the specific heat exponent in
this case is negative, the asymptotic critical dynamics is described by model A. However
the non-asymptotic critical properties of random anisotropy magnets are different from
the random-site Ising magnets in statics [3] as well as in dynamics [35]. Moreover, for
the model C considered here, the non-asymptotic critical behavior differs considerably
from that of the corresponding model A as we will se below.
4.2. Non-asymptotic properties
The existence of such a large number of dynamical FPs makes non-asymptotic critical
behavior more complex as in model A. We present here results for n = 3. For n = 2
the behavior is qualitatively similar. Solving the flow equations for different initial
conditions we obtain different flows in the space of model parameters. The projection
of most characteristic flows into the subspace w−y−ρ is presented in Fig. 1. The open
circles indicate genuine model C unstable FPs whereas filled circles represent model A
unstable FPs. The filled square denotes the stable FP.
The initial conditions for the couplings u(0), v(0), w(0), y(0) for the flows shown
are the same as those in Refs. [3, 35]. We choose γ(0) = 0.1 and ρ(0) = 0.6. Many
flows are affected by the two Ising FPs VC and XC . Inserting the solutions of the flow
Model C critical dynamics of random anisotropy magnets 12
-1,8-1,6
Figure 1. Projections of flows for n = 3 in the subspace of couplings w− y− ρ. Open
circles represent projections of unstable FPs with non-zero γ∗. Filled circles denote
unstable FPs with γ∗ = 0. The filled square shows the stable FP. See Section 4.2 for
a more detailed description.
-100 -80 -60 -40 -20 0
Figure 2. Dependence of the order parameter effective dynamical critical exponent in
the model C dynamics on the logarithm of flow parameter. See text for full description.
-100 -80 -60 -40 -20 0
Figure 3. Dependence of the conserved density effective dynamical critical exponent in
the model C dynamics on the logarithm of flow parameter. See text for full description.
Model C critical dynamics of random anisotropy magnets 13
Table 1. Two-loop values for the dynamical FPs of random anisotropy magnets with
n = 2 (model C).
FP γ∗ ρ∗ ωγ ωρ z
I 0 0 ≤ ρ∗ ≤ 1 0 0 2
I′ 1 0 1 -1 2
IC 1 0.6106 1 0.745 3
I′1 1 1 1 -∞ ∞
II 0 0 0.0387 0.0526 2.0526
II1 0 1 0.0387 -0.0526 2.0526
III 0 0 -0.1686 -0.1850 1.8150
III1 0 1 -0.1686 0.1850 1.8150
IIIC .5806 0 0.3371 -0.5222 1.8150
III′1 .5806 1 0.3371 ∞ −∞
V 0 0 -0.0525 0.0523 2.0523
V1 0 1 -0.0525 -0.0523 2.0523
V′ .3240 0 0.1050 -0.0527 2.0523
VC .3240 0.5241 0.1050 0.0277 2.1050
V′1 .3240 1 0.1050 -∞ ∞
VI 0 0 -0.0049 -0.0417 2.0107
VI1 0 1 -0.0049 0.0417 2.0107
VI′ 0.0986 0 0.0097 0.00095 2.0107
VI′1 0.0986 1 0.0097 ∞ -∞
VIII 0 0 -0.0525 0.1569 2.1569
VIII1 0 1 -0.0525 -0.1569 2.1569
VIII′ 0.3240 0 0.1050 0.0519 2.1569
VIII′1 0.3240 1 0.1050 -∞ ∞
X 0 0 -0.0525 0.0523 2.0523
X1 0 1 -0.0525 -0.0523 2.0523
X′ .3240 0 0.1050 -0.0527 2.0523
XC .3240 0.5241 0.1050 0.0277 2.1050
X′1 .3240 1 0.1050 -∞ ∞
XV 0 0 0.0018 0.1388 2.1388
XV1 0 1 0.0018 -0.1388 2.1388
equations into the expressions for dynamical exponents we obtain the effective exponents
zeff and zeffm . The dependence of z
eff on the flow parameter ℓ corresponding to flows 1-7
is shown in Fig. 2. Similarly Fig 3 shows this dependence for the effective exponent of
the conserved density zeffm . Flow 3 is affected by both FPs VC and XC . Therefore the
effective exponents demonstrate a region with values which are close to those for model
C in the case of the Ising magnet (see curves 3 in Figs. 2 and 3). The asymptotic values
corresponding to the FPs VC and XC are indicated by the dashed line. They correspond
Model C critical dynamics of random anisotropy magnets 14
Table 2. Two-loop values for the dynamical FPs of random anisotropy magnets with
n = 3 (model C).
FP γ∗ ρ∗ ωγ ωρ z
I 0 0 ≤ ρ∗ ≤ 1 0 0 2
I′ 0.8165 0 1 -1 2
IC 0.8165 0.7993 1 0.5218 3
I′1 0.8165 1 1 -∞ ∞
II 0 0 0.1109 0.0506 2.0506
II1 0 1 0.1109 -0.0506 2.0506
III 0 0 -0.1686 -0.1850 1.8150
III1 0 1 -0.1686 0.1850 1.8150
III′ 0.4741 0 .3371 -0.5222 1.8150
III′1 0.4741 1 0.3371 ∞ −∞
V 0 0 -0.0525 0.0523 2.0523
VI1 0 1 -0.0525 -0.0523 2.0523
VI′ 0.2646 0 0.1050 -0.0527 2.0523
VIC 0.2646 0.7617 0.1050 0.0157 2.1050
VI′1 0.2646 1 0.1050 -∞ ∞
VI 0 0 -0.0162 -0.0401 1.9599
VI1 0 1 -0.0162 0.0401 1.9599
VI′ 0.1467 0 0.0323 -0.0724 1.9599
VI′1 0.1467 1 0.0323 ∞ -∞
VIII 0 0 0.1051 0.0425 2.0425
VIII1 0 1 0.1051 -0.0425 2.0425
IX 0 0 -0.0161 -0.0384 1.9616
IX1 0 1 -0.0161 0.0384 1.9616
IX′ 0.1466 0 0.0322 -0.0707 1.9616
IX′1 0.1466 1 0.0322 ∞ -∞
X 0 0 -0.0525 0.0523 2.0523
X1 0 1 -0.0525 -0.0523 2.0523
X′ 0.2646 0 0.1050 -0.0527 2.0523
XC 0.2646 0.7617 0.1050 0.0157 2.1050
X′1 0.2646 1 0.1050 -∞ ∞
XV 0 0 0.0018 0.1388 2.1388
XV1 0 1 0.0018 -0.1388 2.1388
Model C critical dynamics of random anisotropy magnets 15
to the values asymptotically obtained in the pure model C with n = 1, since the FPs
VC and XC are of the same origin, that FP of pure model C. Curves 6 correspond to
flows near the pure FP II. Whereas curve 7 corresponds to the flow near the cubic FP
VIII.
The main difference of the behavior of the effective dynamical exponent zeff in
model C from that in model A is the appearance of curves with several peaks. The
value of the peak appearing on the right-hand side depends on the initial condition γ(0)
and ρ(0). This is demonstrated in Fig.4.
-100 -80 -60 -40 -20 0
 γ=0.01, ρ=0.1
 γ=0.01, ρ=0.6
 γ=0.1, ρ=0.1
 γ=0.1, ρ=0.6
 γ=0.6, ρ=0.1
 γ=0.6, ρ=0.6
Figure 4. Dependence of zeff on the logarithm of flow parameter for different initial
values γ and ρ
-100 -80 -60 -40 -20 0
Figure 5. Normalized effective dynamical critical exponents of order parameter and
conserved density zeff (solid line), zeffm (dashed line) correspondent to flow 2.
The effective behavior of the two dynamical critical exponents for the order
parameter and the conserved density might be quite different as one sees comparing
Figs. 2 and 3. However, one may ask if both exponents reach the asymptotic values
in the same way. For this purpose we introduce a normalization of the values of the
effective exponents by their values in the asymptotics. In particular, we introduce
notations zeff = zeff/z, zeffm = z
m /zm for order parameter exponent and conserved
density exponent correspondingly. Figs. 5, 6 show behavior of normalized exponents for
order parameter and conserved density for flows 2 and 4 correspondingly. It illustrates
that approach to the asymptotics for order parameter exponents and conserved density
Model C critical dynamics of random anisotropy magnets 16
-100 -80 -60 -40 -20 0
Figure 6. Dependencies of normalized effective dynamical critical exponents of the
order parameter and conserved density correspondent to flow 4. Notations as if Fig.
exponents occurs in different way for different flows, that means for different initial
conditions. For system with small degree of disorder (small u(0) and w(0), flow 4) the
approach of order parameter dynamical exponent to asymptotic regime is faster than
for the conserved density one, while for system with larger amount of disorder (flow 2)
approach of both quantities is almost simultaneous.
5. Conclusion
In this paper, we have studied model C dynamics of the random anisotropy magnets
with cubic distribution of local anisotropy axis. For this purpose two-loop dynamical
RG function ζΓ has been obtained. On the base of static results [3] the dependencies of
effective critical exponents of order parameter, zeff , and conserved density, zeffm , on the
flow parameter were calculated.
The two-loop approximation adopted in our paper may be considered as certain
compromise between what is feasible in static calculations from one side, and in dynamic
ones form the other side. As a matter of fact, the state-of-the-art expansions of the static
RG functions in the minimal subtraction scheme are currently available for many models
within the five-loop accuracy [9, 44] but it is not the case for the dynamic functions.
Complexity of dynamical calculations is reflected in the current situation, when the
results beyond two loops have been obtained for model A only. The model C even with
no structural disorder seems to be outside present manageable problems (see the recent
review [5]). However, there are examples which demonstrate the even in two loops
highly accurate results for dynamical characteristics can be obtained. One of them is
given by the critical dynamics of 4He at the superfluid phase transition [45]. Besides,
analysis of the two-loop static RG functions refined by resummation also brings about
sufficiently accurate quantitative characteristics of a static critical behavior in disordered
systems [9, 46].
In the asymptotics the conserved density is decoupled from the order parameter
and the dynamical critical behavior of random anisotropy model with cubic random axis
distribution is the same as that of the random-site Ising model. Crossover occurring
Model C critical dynamics of random anisotropy magnets 17
between different FPs present in the random anisotropy model considerably influences
the non-asymptotic critical properties. Different scenarios of dynamical critical behavior
are observed depending of the initial values of the model parameters. The main feature
is the presence of additional peaks on the curves for the effective dynamical critical
exponents in comparison with the effective model A critical dynamics.
As far as the approach to the asymptotics is very slow, the effective exponents may
be observed in experiments and in numerical simulations. The effective exponent for the
order parameter may take a value far away from the asymptotic one (the asymptotic
value in in our two loop calculation is z = 2.139). The same holds for the conserved
density effective critical exponent which may be far of its van Hove asymptotic value
zm = 2. For example one can observe values of z
eff and zeffm close to those for pure Ising
model with model C dynamics.
This work was supported by Fonds zur Förderung der wissenschaftlichen Forschung
under Project No. P16574
Appendix A. Perturbation expansion
We perform our calculations on the basis of the Lagrangian defined by (12) using
the Feynman graph technique. The propagators for this Lagrangian are shown in the
Fig. A1.
k,w k ’ ’,w
G(k, ω)δ(k+k′)δ(ω+ω′)δi,j
k,w k ’ ’,w
C(k, ω)δ(k+k′)δ(ω+ω′)δi,j
k,w k ’ ’,w
i j H(k, ω)δ(k+k′)δ(ω+ω′)δi,j
k,w k ’ ’,w
i j D(k, ω)δ(k+k′)δ(ω+ω′)δi,j
Figure A1. Propagators for constructing Feynman graphs. G(k, ω) and H(k, ω) are
response propagators while C(k, ω) and D(k, ω) are correlation propagators.
Response propagators G(k, ω) and H(k, ω) are equal to
G(k, ω) = 1/(−iω + Γ̊(̊µ̃+ k2)) and H(k, ω) = 1/(−iω + λ̊k2) , (A.1)
while the correlation propagators C(k, ω) and D(k, ω) are equal to
C(k, ω) = 2Γ̊/|−iω+Γ̊(̊µ̃+k2)|2 and D(k, ω) = 2̊λk2/|−iω+λ̊k2|2 .(A.2)
The vertices defined by Lagrangian are shown in Fig. A2.
We obtain an expression for the two-point vertex function Γ̊
ϕ̃ϕ by keeping the
diagrams up to two-loop order. The result of calculations can be expressed in form:
Γ̊ϕ̃ϕ(ξ, k, ω) = −iωΩ̊ϕ̃ϕ(ξ, k, ω) + Γ̊stϕϕ(ξ, k)̊Γ . (A.3)
Here we introduce the correlation length ξ(µ̊ = ˚̃µ+ γ̊h̊, ů, v̊, ẘ, ẙ), which is defined by
∂ ln Γ̊stϕϕ
. (A.4)
Model C critical dynamics of random anisotropy magnets 18
k ’ ’,w
k ’’ ’’,w
k ’’’ ’’’,w
ΓAδ(k + k′ + k′′ + k′′′)δ(ω + ω′ + ω′′ + ω′′′)
k ’ ’,w
k ’’ ’’,w
k ’’’ ’’’,w
Γ2Bδ(k + k′)δ(k′′ + k′′′)δ(ω + ω′)δ(ω′′ + ω′′′)
c k,w
k ’ ’,w
k ’’ ’’,w
Γ̊γ̊δ(k + k′ + k′′)δ(ω+ω′+ω′′)δi,j
k ’ ’,w
k ’’ ’’,w
λ̊̊γk2δ(k + k′ + k′′)δ(ω+ω′+ω′′)δi,j
Figure A2. Vertices for our model. In vertex a, A stands for
v0/3! (δi,jδl,m + δi,lδj,m + δi,mδj,l)/3 or y0/3! δi,jδj,lδl,m. In vertex b, B stands for
u0/3! δi,jδl,m or w0/3! δi,jδj,lδl,m. Vertices c and d originate from the coupling to the
conserved density.
The function Γ̊ϕϕ is the static two-loop vertex function of the disordered magnet. The
structure (A.3) of the dynamic vertex function of pure model C was obtained in Ref. [43]
up to two-loop order.
We can express two-loop dynamical function Ω̊ϕ̃ϕ in the following form:
Ω̊ϕ̃ϕ(ξ, k, ω) = 1 + Ω̊
ϕ̃ϕ(ξ, k, ω) + Ω̊
ϕ̃ϕ(ξ, k, ω), (A.5)
where the one loop contribution reads:
Ω̊1ϕ̃ϕ(ξ, k, ω) = −
ů+ ẘ
(−iω + Γ̊(ξ−2 + k′2))(ξ−2+k′2)
+ γΓ̊IC(ξ, k, ω),(A.6)
while the two-loop contribution is of the form:
Ω̊2ϕ̃ϕ(ξ, k, ω) = Γ̊(
v̊2 +
ϕ̃ϕ (ξ, k, ω)−
v̊ + ẙ)̊γ2C̊
ϕ̃ϕ (ξ, k, ω) + Γ̊γ̊
4S̊ϕ̃ϕ(ξ, k, ω) +
v̊ů+
(CD2)
ϕ̃ϕ (ξ, k, ω) + Γ̊(
(CD3)
ϕ̃ϕ (ξ, k, ω) + W̊
(CD4)
ϕ̃ϕ (ξ, k, ω)
ů+ ẘ
(CD5)
ϕ̃ϕ (ξ, k, ω) + W̊
(CD6)
ϕ̃ϕ (ξ, k, ω) + 2W̊
(CD7)
ϕ̃ϕ (ξ, k, ω)
.(A.7)
In (A.6) and (A.7) the expressions for the integrals IC , W̊
(A), C̊(T3) and S̊ of the
pure model C are given in the Appendix A.1 in Ref. [22], while the contributions for
W̊ (CDi) are presented in the Appendix of Ref. [13].
Following the renormalization procedure for Γ̊ϕ̃ϕ we obtain the two-loop
renormalizating factor Zϕ̃:
Zϕ̃ = 1+
Model C critical dynamics of random anisotropy magnets 19
v + y
n + 2
v + y
1−3 ln
3(n+2)
(1+W )2
3(n+ 2)
vw−(u+ w)γ2 W
ln(1 +W )− 1
. (A.8)
[1] R. Harris, M. Plischke, and M. J. Zuckermann, Phys. Rev. Lett. 31, 160 (1973).
[2] R. W. Cochrane, R. Harris, and M. J. Zuckermann, Phys. Reports, 48, 1 (1978).
[3] M. Dudka, R. Folk, and Yu. Holovatch, J. Magn. Magn. Mater., 294, 305 (2005).
[4] B. I. Halperin and P. C. Hohenberg, Rev. Mod. Phys. 49, 436 (1977).
[5] R. Folk and G. Moser, J. Phys. A: Math. Gen. 39, R207 (2006).
[6] As shown in: R.A. Pelcovits, E. Pytte, and J. Rudnick, Phys. Rev. Lett., 40, 476 (1978); S.-k.
Ma and J. Rudnick, Phys. Rev. Lett., 40, 589 (1978), an absence of the ferromagnetic ordering
for an isotropic random axis distribution at d ≤ 4 follows from the Imry-Ma arguments first
formulated in the context of the random field Ising model in: Y. Imry and S.-k. Ma, Phys. Rev.
Lett. 35, 1399 (1975). Later this fact has been proven by several other methods (see Ref. [3] for
a more detailed account).
[7] D. Mukamel and G. Grinstein, Phys. Rev. B 25, 381 (1982).
[8] M. Dudka, R, Folk, and Yu. Holovatch, in: W. Janke, A. Pelster, H.-J. Schmidt and M. Bachmann
(Eds)., Fluctuating Paths and Fields, Singapore, World Scientific, 2001, p. 457; M. Dudka, R.
Folk, and Yu. Holovatch, Condens. Matter Phys. 4, 459 (2001).
[9] For the recent reviews see e.g.: R. Folk, Yu. Holovatch, and T. Yavors’kii, Phys. Rev. B 61, 15114
(2000); R. Folk, Yu. Holovatch, and T. Yavors’kii, Physics - Uspekhi 46 169 (2003) [Uspekhi
Fizicheskikh Nauk 173, 175 (2003)]; A. Pelissetto and E. Vicari, Phys. Rep. 368, 549 (2002).
[10] U. Krey, Phys. Lett. A 57, 215 (1976).
[11] I. D. Lawrie and V. V. Prudnikov, J. Phys. C 17, 1655 (1984).
[12] M. Dudka, R. Folk, Yu. Holovatch, and G. Moser, Phys. Rev. E 72 036107 (2005).
[13] M. Dudka, R. Folk, Yu. Holovatch, and G. Moser, J. Phys. A 39 7943 (2006).
[14] B. I. Halperin, P. C. Hohenberg, and S.-k. Ma, Phys. Rev. B 10, 139 (1974).
[15] V. I. Corentsveig, P. Fraztl, and J. L. Lebowitz, Phys. Rev. B 55 2912 (1997).
[16] K. Binder, W. Kinzel, and D. P. Landau, Surf. Sci. 117 232 (1982).
[17] H. Tanaka, J. Phys.:Condens. Matter 11 L159 (1999).
[18] G. Grinstein, S.-k. Ma, and G. F. Mazenko, Phys. Rev. B 15 258 (1977).
[19] P. Sen, S. Dasgupta, and D. Stauffer, Eur. Phys. J. B 1 107 (1998).
[20] D. Stauffer, Int. J. Mod. Phys. C 8 1263 (1997)
[21] R. Folk and G. Moser, Phys. Rev. Lett. 91, 030601 (2003).
[22] R. Folk and G. Moser, Phys. Rev. E 69 036101 (2004).
[23] E. Brezin and C. De Dominicis, Phys. Rev. B 12 4954 (1975).
[24] V.V. Prudnikov and A. N. Vakilov, Sov. Phys. JETP 74, 900 (1992).
[25] K. Oerding and H. K. Janssen, J. Phys. A: Math. Gen. 28, 4271 (1995).
[26] H. K. Janssen, K. Oerding, and E. Sengenspeick, J. Phys. A: Math. Gen. 28, 6073 (1995).
[27] V. Blavats’ka, M. Dudka, R. Folk, and Yu. Holovatch, Phys. Rev. B, 72 064417 (2005)
[28] A. B. Harris, J. Phys. C: Solid State Phys. 7, 1671 (1974).
Model C critical dynamics of random anisotropy magnets 20
[29] A. Aharony, Phys. Rev. B 12 1038 (1975).
[30] M. Dudka, R. Folk, and Yu. Holovatch, Condens. Matter Phys., 4, 77 (2001); Yu. Holovatch, V.
Blavats’ka, M. Dudka, C. von Ferber, R. Folk, and T. Yavors’kii, Int. J. Mod. Phys. B 16, 4027
(2002).
[31] P. Calabrese, A. Pelissetto, and E. Vicari, Phys. Rev. E 70, 036104 (2004).
[32] S.-k. Ma and J. Rudnick, Phys. Rev. Lett. 40, 587 (1978).
[33] C. De Dominicis, Phys. Rev. B 18, 4913 (1978).
[34] U. Krey, Z. Phys. B 26, 355 (1977).
[35] M. Dudka, R. Folk, Yu. Holovatch, and G. Moser, Condens. Matter Phys. 8, 737 (2005)
[36] R. Bausch, H. K. Janssen, and H. Wagner, Z. Phys. B 24, 113 (1976).
[37] V. J. Emery, Phys. Rev. B 11, 239 (1975).
[38] K. G. Wilson and M. E. Fisher, Phys. Rev. Lett. 28, 240 (1972).
[39] R. Schloms and V. Dohm, Europhys. Lett. 3, 413 (1987); R. Schloms and V. Dohm, Nucl. Phys.
B 328, 639 (1989).
[40] G. A. Baker, Jr., B. G. Nickel, and D. I. Meiron, Phys. Rev. B 17, 1365 (1978).
[41] P. J. S. Watson, J. Phys. A 7, L167 (1974).
[42] G. A. Baker, Jr. and P. Graves-Morris, Padé Approximants (Addison-Wesley: Reading, MA, 1981).
[43] R. Folk and G. Moser, Acta Physica Slovaca 52, 285 (2002).
[44] H. Kleinert, J. Neu, V. Schulte-Frohlinde, K. G. Chetyrkin, and S. A. Larin, Phys. Lett. B 272,
39 (1991); Erratum: Phys. Lett. B 319, 545 (1993); H. Kleinert and V. Schulte-Frohlinde, Phys.
Lett. B 342, 284 (1995).
[45] V. Dohm, Phys. Rev. B 44, 2697; (1991), Phys. Rev. B 73, 09990(E) (2006)
[46] J. Jug, Phys. Rev. B 27, 609 (1983).
	Introduction
	Model equations 
	RG functions 
	Results 
	Asymptotic properties
	Non-asymptotic properties
	Conclusion
	 Perturbation expansion
ABSTRACT
  We study the relaxational critical dynamics of the three-dimensional random
anisotropy magnets with the non-conserved n-component order parameter coupled
to a conserved scalar density. In the random anisotropy magnets the structural
disorder is present in a form of local quenched anisotropy axes of random
orientation. When the anisotropy axes are randomly distributed along the edges
of the n-dimensional hypercube, asymptotical dynamical critical properties
coincide with those of the random-site Ising model. However structural disorder
gives rise to considerable effects for non-asymptotic critical dynamics. We
investigate this phenomenon by a field-theoretical renormalization group
analysis in the two-loop order. We study critical slowing down and obtain
quantitative estimates for the effective and asymptotic critical exponents of
the order parameter and scalar density. The results predict complex scenarios
for the effective critical exponent approaching an asymptotic regime.

<|endoftext|><|startoftext|>
Introduction
In this article all complex manifolds are supposed to be of finite dimension and
countable at infinity, and all complex analytic spaces are supposed to be reduced,
irreducible, of finite dimension and countable at infinity. For a subset S of a topo-
logical space M, S denotes the closure of S in M, and the set ∂S := S ∩ M \ S
denotes, as usual, the boundary of S in M.
The main purpose of this work is to investigate the following
PROBLEM. Let X, Y be two complex manifolds, let D (resp. G) be an open subset
of X (resp. Y ), let A (resp. B) be a subset of D (resp. G) and let Z be a complex
analytic space. Define the cross
A× (G ∪B)
(D ∪ A)× B
We want to determine the “envelope of holomorphy” of the cross W, that is, an
“optimal” open subset of X×Y, denoted by
W, which is characterized by the following
properties:
Let f : W −→ Z be a mapping that satisfies, in essence, the following two
conditions:
• f(a, ·) is holomorphic on G for all a ∈ A, f(·, b) is holomorphic on D for all
b ∈ B;
• f(a, ·) is continuous on G ∪ B for all a ∈ A, f(·, b) is continuous on D ∪ A
for all b ∈ B.
2000 Mathematics Subject Classification. Primary 32D15, 32D10.
Key words and phrases. Hartogs’ theorem, holomorphic extension, Poletsky theory of discs,
Rosay Theorem on holomorphic discs.
http://arxiv.org/abs/0704.0897v2
2 VIÊT-ANH NGUYÊN
Then there is a holomorphic mapping f̂ defined on
W such that for every (ζ, η) ∈ W,
f̂(z, w) tends to f(ζ, η) as (z, w) ∈
W tends, in some sense, to (ζ, η).
Now we recall briefly the main developments around this problem. All the results
obtained so far may be divided into two directions. The first direction investigates
the results in the “interior” context: A ⊂ D and B ⊂ G, while the second one
explores the “boundary” context: A ⊂ ∂D and B ⊂ ∂G.
The first fundamental result in the field of separate holomorphy is the well-known
Hartogs extension theorem for separately holomorphic functions (see [14]). In the
language of the PROBLEM the following case X = Cn, Y = Cm, A = D, B =
G, Z = C has been solved and the result is
W = D×G. In particular, this theorem
may be considered as the first main result in the first direction. In his famous
article [8] Bernstein obtained some positive results for the PROBLEM in certain
cases where A ⊂ D, B ⊂ G, X = Y = C and Z = C.
More than 60 years later, a next important impetus was made by Siciak (see
[43, 44]) in 1969–1970, where he established some significant generalizations of the
Hartogs extension theorem. In fact, Siciak’s formulation of these generalizations
gives rise to the above PROBLEM: to determine the envelope of holomorphy for sep-
arately holomorphic functions defined on some cross sets W. The theorems obtained
under this formulation are often called cross theorems. Using the so-called relative
extremal function, Siciak completed the PROBLEM for the case where A ⊂ D,
B ⊂ G, X = Y = C and Z = C.
The next deep steps were initiated by Zahariuta in 1976 (see [45]) when he started
to use the method of common bases of Hilbert spaces. This original approach per-
mitted him to obtain new cross theorems for some cases where A ⊂ D, B ⊂ G and
D = X, G = Y are Stein manifolds. As a consequence, he was able to generalize
the result of Siciak in higher dimensions.
Later, Nguyên Thanh Vân and Zeriahi (see [25, 26, 27]) developed the method
of doubly orthogonal bases of Bergman type in order to generalize the result of Za-
hariuta. This is a significantly simpler and more constructive version of Zahariuta’s
original method. Nguyên Thanh Vân and Zeriahi have recently achieved an elegant
improvement of their method (see [24], [47]).
Using Siciak’s method, Shiffman (see [41]) was the first to generalize some Siciak’s
results to separately holomorphic mappings with values in a complex analytic space
Z. Shiffman’s result (see [42]) shows that the natural “target spaces” for obtaining
satisfactory generalizations of cross theorems are the ones which possess the Hartogs
extension property (see Subsection 2.4 below for more explanations).
In 2001 Alehyane and Zeriahi solved the PROBLEM for the case where A ⊂ D,
B ⊂ G and X, Y are Stein manifolds, and Z is a complex analytic space which
possesses the Hartogs extension property (see Theorem 2.2.4 in [5]).
In a recent work (see [28]) we complete, in some sense, the PROBLEM for the
case where A ⊂ D, B ⊂ G and X, Y are arbitrary complex manifolds. The main
ingredients in our approach are Poletsky theory of discs developed in [37, 38], Rosay’s
A UNIFIED APPROACH 3
Theorem on holomorphic discs (see [40]), the above mentioned result of Alehyane–
Zeriahi and the technique of level sets of the plurisubharmonic measure which was
previously introduced in our joint-work with Pflug (see [33]).
To conclude the first direction of research we mention the survey articles by
Nguyên Thanh Vân [23] and Peter Pflug [32] which give nice accounts on this sub-
ject.
The first result in the second direction (i.e. “boundary context”) was established
in the work of Malgrange–Zerner [46] in the 1960s. Further results in this direction
were obtained by Komatsu [21] and Drużkowski [9], but only for some special cases.
Recently, Gonchar [12, 13] has proved a more general result where the following
case has been solved: X = Y = C, D and G are Jordan domains, A (resp. B) is
an open boundary subset of ∂D (resp. ∂G), and Z = C. It should be noted that
Airapetyan and Henkin published a general version of the edge-of-the-wedge theorem
for CR manifolds (see [1] for a brief version and [2] for a complete proof). Gonchar’s
result could be deduced from the latter works. In our joint-articles with Pflug (see
[33, 34, 35]), Gonchar’s result has been generalized considerably. More precisely, the
work in [35] treats the case where the “source spaces” X, Y are arbitrary complex
manifolds, A (resp. B) is an open boundary subset of ∂D (resp. ∂G), and Z =
C. The work in [34] solves the case where the “source spaces” X, Y are Riemann
surfaces, A (resp. B) is a measurable (boundary) subset of ∂D (resp. ∂G), and
Z = C.
The main purpose of this article is to give a new version of the Hartogs extension
theorem which unifies all results up to now. Namely, we are able to give a reason-
able solution to the PROBLEM when the “target space” Z possesses the Hartogs
extension property. Our method is based on a systematic application of Poletsky
theory of discs, Rosay Theorem on holomorphic discs and our joint-work with Pflug
on boundary cross theorems in dimension 1 (see [34]). It also relies on our new
technique of conformal mappings and a generalization of Siciak’s relative extremal
function. The approach illustrates the unified character in the theory of extension
of holomorphic mappings:
One can deduce the global extension from local informations.
Moreover, the novelty of this new approach is that one does not use the classical
method of doubly orthogonal bases of Bergman type.
We close the introduction with a brief outline of the paper to follow.
In Section 2 we formulate the main results.
The tools which are needed for the proof of the main results are developed in
Section 3, 4, 5 and 7.
The proof of the main results is divided into three parts, which correspond to
Section 6, 8 and 9. Section 10 concludes the article with various applications of our
results.
Acknowledgment. The paper was written while the author was visiting the
Abdus Salam International Centre for Theoretical Physics in Trieste. He wishes to
express his gratitude to this organization.
4 VIÊT-ANH NGUYÊN
2. Preliminaries and statement of the main result
First we develop some new notions such as system of approach regions for an open
set in a complex manifold, and the corresponding plurisubharmonic measure. These
will provide the framework for an exact formulation of the PROBLEM and for our
solution.
2.1. Approach regions, local pluripolarity and plurisubharmonic measure.
Definition 2.1. Let X be a complex manifold and let D ⊂ X be an open subset.
A system of approach regions for D is a collection A =
Aα(ζ)
ζ∈D, α∈Iζ
of open
subsets of D with the following properties:
(i) For all ζ ∈ D, the system
Aα(ζ)
forms a basis of open neighborhoods
of ζ (i.e., for any open neighborhood U of a point ζ ∈ D, there is α ∈ Iζ
such that ζ ∈ Aα(ζ) ⊂ U).
(ii) For all ζ ∈ ∂D and α ∈ Iζ , ζ ∈ Aα(ζ).
Aα(ζ) is often called an approach region at ζ.
A is said to be canonical if it satisfies (i) and the following property (which is
stronger than (ii)):
(ii’) For every point ζ ∈ ∂D, there is a basis of open neighborhoods (Uα)α∈Iζ of ζ
in X such that Aα(ζ) = Uα ∩D, α ∈ Iζ.
It is possible that Iζ = ∅ for some ζ ∈ ∂D.
Various systems of approach regions which one often encounters in Complex Anal-
ysis will be described in the next subsection. Systems of approach regions for
D are used to deal with the limit at points in D of mappings defined on some
open subsets of D. Consequently, we deduce from Definition 2.1 that the subfamily(
Aα(ζ)
ζ∈D, α∈Iζ
is, in a certain sense, independent of the choice of a system of ap-
proach regions A. In addition, any two canonical systems of approach regions are,
in some sense, equivalent. These observations lead us to use, throughout the paper,
the following convention:
We fix, for every open set D ⊂ X, a canonical system of approach regions.
When we want to define a system of approach regions A for an open set D ⊂ X, we
only need to specify the subfamily
Aα(ζ)
ζ∈∂D, α∈Iζ
In what follows we fix an open subset D ⊂ X and a system of approach regions
Aα(ζ)
ζ∈D, α∈Iζ
for D.
For every function u : D −→ [−∞,∞), let
(A− lim sup u)(z) :=
lim sup
w∈Aα(z), w→z
u(w), z ∈ D, Iz 6= ∅,
lim sup
w∈D, w→z
u(w), z ∈ ∂D, Iz = ∅.
By Definition 2.1 (i), (A−lim sup u)|D coincides with the usual upper semicontinuous
regularization of u.
For a set A ⊂ D put
hA,D := sup {u : u ∈ PSH(D), u ≤ 1 on D, A− lim sup u ≤ 0 on A} ,
A UNIFIED APPROACH 5
where PSH(D) denotes the cone of all functions plurisubharmonic on D.
A is said to be pluripolar inD if there is u ∈ PSH(D) such that u is not identically
−∞ on every connected component of D and A ⊂ {z ∈ D : u(z) = −∞} . A is said
to be locally pluripolar in D if for any z ∈ A, there is an open neighborhood V ⊂ D
of z such that A ∩ V is pluripolar in V. A is said to be nonpluripolar (resp. non
locally pluripolar) if it is not pluripolar (resp. not locally pluripolar). According to
a classical result of Josefson and Bedford (see [16], [6]), if D is a Riemann domain
over a Stein manifold, then A ⊂ D is locally pluripolar if and only if it is pluripolar.
Definition 2.2. The relative extremal function of A relative to D is the function
ω(·, A,D) defined by
ω(z, A,D) = ωA(z, A,D) := (A− lim sup hA,D)(z), z ∈ D.
Note that when A ⊂ D, Definition 2.2 coincides with the classical definition of
Siciak’s relative extremal function.
Next, we say that a set A ⊂ D is locally pluriregular at a point a ∈ A if ω(a, A ∩
U,D ∩ U) = 0 for all open neighborhoods U of a. Moreover, A is said to be locally
pluriregular if it is locally pluriregular at all points a ∈ A. It should be noted from
Definition 2.1 that if a ∈ A ∩D then the property of local pluriregularity of A at a
does not depend on any particular choices of a system of approach regions A, while
the situation is different when a ∈ A ∩ ∂D : the property does depend on A.
We denote by A∗ the following set
(A ∩ ∂D)
a ∈ A ∩D : A is locally pluriregular at a
If A ⊂ D is non locally pluripolar, then a classical result of Bedford and Taylor (see
[6, 7]) says that A∗ is locally pluriregular and A\A∗ is locally pluripolar. Moreover,
A∗ is locally of type Gδ, that is, for every a ∈ A
∗ there is an open neighborhood
U ⊂ D of a such that A∗ ∩ U is a countable intersection of open sets.
Now we are in the position to formulate the following version of the plurisubhar-
monic measure.
Definition 2.3. For a set A ⊂ D, let Ã = Ã(A) :=
P∈E(A)
P, where
E(A) = E(A,A) :=
P ⊂ D : P is locally pluriregular, P ⊂ A∗
The plurisubharmonic measure of A relative to D is the function ω̃(·, A,D) defined
ω̃(z, A,D) := ω(z, Ã, D), z ∈ D.
It is worthy to remark that ω̃(·, A,D) ∈ PSH(D) and 0 ≤ ω̃(z, A,D) ≤ 1, z ∈ D.
Moreover,
(2.1)
A− lim sup ω̃(·, A,D)
(z) = 0, z ∈ Ã.
An example in [3] shows that in general, ω(·, A,D) 6= ω̃(·, A,D) on D. Section 10
below is devoted to the study of ω̃(·, A,D) in some important cases.
1Observe that this function depends on the system of approach regions.
6 VIÊT-ANH NGUYÊN
Now we compare the plurisubharmonic measure ω̃(·, A,D) with Siciak’s relative
extremal function ω(·, A,D). We only consider two important special cases: A ⊂ D
and A ⊂ ∂D. For the moment, we only focus on the case where A ⊂ D. The latter
one will be discussed in Section 10 below.
If A is an open subset of an arbitrary complex manifold D, then it is easy to see
ω̃(z, A,D) = ω(z, A,D), z ∈ D.
If A is a (not necessarily open) subset of an arbitrary complex manifold D, then we
will prove in Proposition 7.1 below that
ω̃(z, A,D) = ω(z, A∗, D), z ∈ D.
On the other hand, if, morever, D is a bounded open subset of Cn then we have (see,
for example, Lemma 3.5.3 in [18]) ω(z, A,D) = ω(z, A∗, D), z ∈ D. Consequently,
under the last assumption,
ω̃(z, A,D) = ω(z, A,D), z ∈ D.
Our discussion shows that at least in the case where A ⊂ D, the notion of the
plurisubharmonic measure is a good candidate for generalizing Siciak’s relative ex-
tremal function to the manifold context in the theory of separate holomorphy.
For a good background of the pluripotential theory, see the books [18] or [20].
2.2. Examples of systems of approach regions. There are many systems of
approach regions which are very useful in Complex Analysis. In this subsection we
present some of them.
1. Canonical system of approach regions. It has been given by Definition 2.1
(i)–(ii’).
2. System of angular (or Stolz) approach regions for the open unit disc.
Let E be the open unit disc of C. Put
Aα(ζ) :=
t ∈ E :
∣∣∣∣arg
ζ − t
)∣∣∣∣ < α
, ζ ∈ ∂E, 0 < α <
where arg : C −→ (−π, π] is as usual the argument function. A =
(Aα(ζ))ζ∈∂E, 0<α<π
is referred to as the system of angular (or Stolz) approach regions
for E. In this context A− lim is also called angular limit.
3. System of angular approach regions for certain “good” open subsets
of Riemann surfaces. Now we generalize the previous construction (for the open
unit disc) to a global situation. More precisely, we will use as the local model the
system of angular approach regions for E. Let X be a complex manifold of dimension
1, in other words, X is a Riemann surface, and D ⊂ X an open set. Then D is said
to be good at a point ζ ∈ ∂D2 if there is a Jordan domain U ⊂ X such that ζ ∈ U
and U ∩ ∂D is the interior of a Jordan curve.
Suppose that D is good at ζ. This point is said to be of type 1 if there is a
neighborhood V of ζ such that V0 = V ∩D is a Jordan domain. Otherwise, ζ is said to
be of type 2. We see easily that if ζ is of type 2, then there are an open neighborhood
2 In the work [34] we use the more appealing word Jordan-curve-like for this notion.
A UNIFIED APPROACH 7
V of ζ and two disjoint Jordan domains V1, V2 such that V ∩D = V1∪V2. Moreover,
D is said to be good on a subset A of ∂D if D is good at all points of A.
Here is a simple example which may clarify the above definitions. Let G be the
open square in C with vertices 1 + i, −1 + i, −1− i, and 1− i. Define the domain
D := G \
Then D is good on ∂G ∪
. All points of ∂G are of type 1 and all points of(
are of type 2.
Suppose now that D is good on a nonempty subset A of ∂D.We define the system
of angular approach regions supported on A: A =
Aα(ζ)
ζ∈D, α∈Iζ
as follows:
• If ζ ∈ D \A, then
Aα(ζ)
coincide with the canonical approach regions.
• If ζ ∈ A, then by using a conformal mapping Φ from V0 (resp. V1 and V2)
onto E when ζ is of type 1 (resp. 2), we can “transfer” the angular approach
regions at the point Φ(ζ) ∈ ∂E : (Aα(Φ(ζ)))0<α<π
to those at the point
ζ ∈ ∂D (see [34] for more detailed explanations).
Making use of conformal mappings in a local way, we can transfer, in the same way,
many notions which exist on E (resp. ∂E) to those on D (resp. ∂D).
4. System of conical approach regions.
Let D ⊂ Cn be a domain and A ⊂ ∂D. Suppose in addition that for every point
ζ ∈ A there exists the (real) tangent space Tζ to ∂D at ζ. We define the system of
conical approach regions supported on A: A =
Aα(ζ)
ζ∈D, α∈Iζ
as follows:
• If ζ ∈ D \A, then
Aα(ζ)
coincide with the canonical approach regions.
• If ζ ∈ A, then
Aα(ζ) := {z ∈ D : |z − ζ | < α · dist(z, Tζ)} ,
where Iζ := (1,∞) and dist(z, Tζ) denotes the Euclidean distance from the
point z to Tζ .
We can also generalize the previous construction to a global situation:
X is an arbitrary complex manifold, D ⊂ X is an open set and A ⊂ ∂D is a
subset with the property that at every point ζ ∈ A there exists the (real) tangent
space Tζ to ∂D.
We can also formulate the notion of points of type 1 or 2 in this general context
in the same way as we have already done in Paragraph 3 above.
2.3. Cross and separate holomorphicity and A-limit. Let X, Y be two com-
plex manifolds, let D ⊂ X, G ⊂ Y be two nonempty open sets, let A ⊂ D and
B ⊂ G. Moreover, D (resp. G) is equipped with a system of approach regions
A(D) =
Aα(ζ)
ζ∈D, α∈Iζ
(resp. A(G) =
Aα(η)
η∈G, α∈Iη
). We define a 2-fold
cross W, its interior W o and its regular part W̃ (with respect to A(D) and A(G))
8 VIÊT-ANH NGUYÊN
W = X(A,B;D,G) :=
(D ∪A)×B
A× (B ∪G)
W o = Xo(A,B;D,G) := (A×G) ∪ (D × B),
W̃ = X̃(A,B;D,G) :=
(D ∪ Ã)× B̃
Ã× (G ∪ B̃)
Moreover, put
ω(z, w) := ω(z, A,D) + ω(w,B,G), (z, w) ∈ D ×G,
ω̃(z, w) := ω̃(z, A,D) + ω̃(w,B,G), (z, w) ∈ D ×G.
For a 2-fold cross W := X(A,B;D,G) let
Ŵ := X̂(A,B;D,G) = {(z, w) ∈ D ×G : ω(z, w) < 1} ,
W := X̂(Ã, B̃;D,G) = {(z, w) ∈ D ×G : ω̃(z, w) < 1} .
Let Z be a complex analytic space. We say that a mapping f : W o −→ Z is
separately holomorphic and write f ∈ Os(W
o, Z), if, for any a ∈ A (resp. b ∈ B)
the restricted mapping f(a, ·) (resp. f(·, b)) is holomorphic on G (resp. on D).
We say that a mapping f : W −→ Z is separately continuous and write f ∈
if, for any a ∈ A (resp. b ∈ B) the restricted mapping f(a, ·) (resp.
f(·, b)) is continuous on G ∪ B (resp. on D ∪A).
In virtue of (2.1), for every (ζ, η) ∈ W̃ and every α ∈ Iζ , β ∈ Iη, there are open
neighborhoods U of ζ and V of η such that
U ∩ Aα(ζ)
V ∩Aβ(η)
Then a mapping f :
W −→ Z is said to admit A-limit λ at (ζ, η) ∈ W̃ , and one
writes
(A− lim f)(ζ, η) = λ, 3
if, for all α ∈ Iζ, β ∈ Iη,
cfW∋(z,w)→(ζ,η), z∈Aα(ζ), w∈Aβ(η)
f(z, w) = λ.
Throughout the paper, for a topological space M, C(M, Z) denotes the set of all
continuous mappings f : M −→ Z. If, moreover, Z = C, then C(M,C) is equipped
with the “sup-norm” |f |M := supM |f | ∈ [0,∞]. A mapping f : M −→ Z is said to
be bounded if there exist an open neighborhood U of f(M) in Z and a holomorphic
embedding φ of U into a polydisc of Ck such that φ(U) is an analytic set in this
polydisc. f is said to be locally bounded along N ⊂ M if for every point z ∈ N ,
there is an open neighborhood U of z (in M) such that f |U : U −→ Z is bounded.
f is said to be locally bounded if it is so for N = M. It is clear that if Z = C then
the above notions of boundedness coincide with the usual ones.
3Note that here A = A(D)×A(G).
A UNIFIED APPROACH 9
2.4. Hartogs extension property. The following example (see Shiffman [42])
shows that an additional hypothesis on the “target space” Z is necessary in or-
der that the PROBLEM makes sense. Consider the mapping f : C2 −→ P1 given
f(z, w) :=
[(z + w)2 : (z − w)2], (z, w) 6= (0, 0),
[1 : 1], (z, w) = (0, 0).
Then f ∈ Os
o(C,C;C,C),P1
, but f is not continuous at (0, 0).
We recall here the following notion (see, for example, Shiffman [41]). Let p ≥ 2
be an integer. For 0 < r < 1, the Hartogs figure in dimension p, denoted by Hp(r),
is given by
Hp(r) :=
, zp) ∈ E
p : ‖z
‖ < r or |zp| > 1− r
where E is the open unit disc of C and z
= (z1, . . . , zp−1), ‖z
‖ := max
1≤j≤p−1
|zj|.
Definition 2.4. A complex analytic space Z is said to possess the Hartogs extension
property in dimension p if every mapping f ∈ O(Hp(r), Z) extends to a mapping
f̂ ∈ O(Ep, Z). Moreover, Z is said to possess the Hartogs extension property if it
does in any dimension p ≥ 2.
It is a classical result of Ivashkovich (see [17]) that if Z possesses the Hartogs
extension property in dimension 2, then it does in all dimensions p ≥ 2. Some typical
examples of complex analytic spaces possessing the Hartogs extension property are
the complex Lie groups (see [4]), the taut spaces (see [48]), the Hermitian manifold
with negative holomorphic sectional curvature (see [41]), the holomorphically convex
Kähler manifold without rational curves (see [17]).
Here we mention an important characterization due to Shiffman (see [41]).
Theorem 2.5. A complex analytic space Z possesses the Hartogs extension property
if and only if for every domain D of any Stein manifold M, every mapping f ∈
O(D,Z) extends to a mapping f̂ ∈ O(D̂, Z), where D̂ is the envelope of holomorphy
of D.
In the light of Definition 2.4 and Shiffman’s Theorem, the natural “target spaces”
Z for obtaining satisfactory answers to the PROBLEM are the complex analytic
spaces which possess the Hartogs extension property.
2.5. Statement of the main results. We are now ready to state the main results.
Theorem A. Let X, Y be two complex manifolds, let D ⊂ X, G ⊂ Y be two open
sets, let A (resp. B) be a subset of D (resp. G). D (resp. G) is equipped with a
system of approach regions
Aα(ζ)
ζ∈D, α∈Iζ
(resp.
Aβ(η)
η∈G, β∈Iη
). Let Z be a
complex analytic space possessing the Hartogs extension property. Then, for every
mapping f : W −→ Z which satisfies the following conditions:
• f ∈ Cs(W,Z) ∩ Os(W
o, Z);
10 VIÊT-ANH NGUYÊN
• f is locally bounded along X
A ∩ ∂D,B ∩ ∂G;D,G
• f |A×B is continuous at all points of (A ∩ ∂D)× (B ∩ ∂G),
there exists a unique mapping f̂ ∈ O(
W,Z) which admits A-limit f(ζ, η) at every
point (ζ, η) ∈ W ∩ W̃ .
If, moreover, Z = C and |f |W <∞, then
|f̂(z, w)| ≤ |f |
1−eω(z,w)
A×B |f |
eω(z,w)
W , (z, w) ∈
Theorem A has an important corollary. Before stating this, we need to introduce
a terminology. A complex manifoldM is said to be a Liouville manifold if PSH(M)
does not contain any non-constant bounded above functions. We see clearly that
the class of Liouville manifolds contains the class of connected compact manifolds.
Corollary B. We keep the hypothesis and the notation in Theorem A. Suppose in
addition that G is a Liouville manifold and that Ã, B̃ 6= ∅. Then, for every mapping
f : W −→ Z which satisfies the following conditions:
• f ∈ Cs(W,Z) ∩ Os(W
o, Z);
• f is locally bounded along X
A ∩ ∂D,B ∩ ∂G;D,G
• f |A×B is continuous at all points of (A ∩ ∂D)× (B ∩ ∂G),
there is a unique mapping f̂ ∈ O(D × G,Z) which admits A-limit f(ζ, η) at every
point (ζ, η) ∈ W ∩ W̃ .
Corollary B follows immediately from Theorem A since ω̃(·, B,G) ≡ 0.
We will see in Section 10 below that Theorem A and Corollary B generalizes all
the results discussed in Section 1 above. Moreover, they also give many new results.
Although our main results have been stated only for the case of a 2-fold cross, they
can be formulated for the general case of an N -fold cross with N ≥ 2 (see also
[28, 33]).
3. Holomorphic discs and a Two-Constant Theorem
We recall here some elements of Poletsky theory of discs, some background of the
pluripotential theory and auxiliary results needed for the proof of Theorem A.
3.1. Poletsky theory of discs and Rosay Theorem on holomorphic discs.
Let E denote as usual the open unit disc in C. For a complex manifold M, let
O(E,M) denote the set of all holomorphic mappings φ : E −→ M which extend
holomorphically to a neighborhood of E. Such a mapping φ is called a holomorphic
disc on M. Moreover, for a subset A of M, let
1A,M(z) :=
1, z ∈ A,
0, z ∈ M \ A.
4 It follows from Subsection 2.3 that
A ∩ ∂D,B ∩ ∂G;D,G
(A ∩ ∂D)× (G ∪B)
(D ∪ A)× (B ∩ ∂G)
A UNIFIED APPROACH 11
In the work [40] Rosay proved the following remarkable result.
Theorem 3.1. Let u be an upper semicontinuous function on a complex manifold
M. Then the Poisson functional of u defined by
P[u](z) := inf
u(φ(eiθ))dθ : φ ∈ O(E,M), φ(0) = z
is plurisubharmonic on M.
Rosay Theorem may be viewed as an important development in Poletsky theory
of discs. Observe that special cases of Theorem 3.1 have been considered by Poletsky
(see [37, 38]), Lárusson–Sigurdsson (see [22]) and Edigarian (see [10]).
The following Rosay type result gives the connections between the Poisson func-
tional of the characteristic function 1M\A,M and holomorphic discs.
Lemma 3.2. Let M be a complex manifold and let A be a nonempty open subset of
M. Then for any ǫ > 0 and any z0 ∈ M, there are an open neighborhood U of z0,
an open subset T of C, and a family of holomorphic discs (φz)z∈U ⊂ O(E,M) with
the following properties:
(i) Φ ∈ O(U ×E,M), where Φ(z, t) := φz(t), (z, t) ∈ U × E;
(ii) φz(0) = z, z ∈ U ;
(iii) φz(t) ∈ A, t ∈ T ∩ E, z ∈ U ;
(iv) 1
1∂E\T,∂E(e
iθ)dθ < P[1M\A,M](z0) + ǫ.
Proof. See Lemma 3.2 in [28]. �
The next result describes the situation in dimension 1. It will be very useful later
Lemma 3.3. Let T be an open subset of E. Then
ω(0, T ∩ E,E) ≤
1∂E\T,T (e
iθ)dθ.
Proof. See, for example, Lemma 3.3 in [28]. �
The last result, which is an important consequence of Rosay’s Theorem, gives the
connection between the Poisson functional and the plurisubharmonic measure.
Proposition 3.4. Let M be a complex manifold and A a nonempty open subset of
M. Then ω(z, A,M) = P[1M\A,M](z), z ∈ M.
Proof. See, for example, the proof of Proposition 3.4 in [28]. �
12 VIÊT-ANH NGUYÊN
3.2. Level sets of the relative extremal functions and a Two-Constant
Theorem. Let X be a complex manifold and D ⊂ X an open set. Suppose that
D is equipped with a system of approach regions A =
Aα(ζ)
ζ∈D, α∈Iζ
. For every
open subset G of D, there is a natural system of approach regions for G which is
called the induced system of approach regions A
ζ∈G, α∈I
of A onto G.
It is given by
α(ζ) := Aα(ζ) ∩G, ζ ∈ G, α ∈ I
where I
α ∈ Iζ : ζ ∈ Aα(ζ) ∩G
Proposition 3.5. Under the above hypothesis and notation, let A ⊂ D be a locally
pluriregular set (relative to A). For 0 < δ < 1, define the δ-level set of D relative
to A as follows
Dδ,A := {z ∈ D : ω(z, A,D) < 1− δ} .
We equip Dδ,A with the induced system of approach regions A
of A onto Dδ,A (see
Subsection 2.1 above). Then A ⊂ Dδ,A and
(3.1) ω(z, A,Dδ,A) =
ω(z, A,D)
, z ∈ Dδ,A.
Moreover, A is locally pluriregular relative to A
Proof. Since A is locally pluriregular, we see that
(3.2)
A− lim supω(·, A,D)
(z) = 0, z ∈ A.
Therefore, for every z ∈ A and α ∈ Iz, there is an open neighborhood U of z such
that ∅ 6= Aα(z) ∩ U ⊂ Dδ,A. Hence, A ⊂ Dδ,A.
Next, we turn to the proof of identity (3.1). Observe that 0 ≤
ω(·,A,D)
≤ 1 on
Dδ,A by definition. This, combined with (3.2), implies that
(3.3)
ω(z, A,D)
≤ ω(z, A,Dδ,A), z ∈ Dδ,A.
To prove the converse inequality of (3.3), let u ∈ PSH(Dδ,A) be such that u ≤ 1 on
Dδ,A and A
− lim sup u ≤ 0 on A. Consider the following function
(3.4) û(z) :=
max {(1− δ)u(z), ω(z, A,D)} , z ∈ Dδ,A,
ω(z, A,D), z ∈ D \Dδ,A.
It can be checked that û ∈ PSH(D) and 0 ≤ û ≤ 1. Moreover, in virtue of the
assumption on u and (3.2) and (3.4), we have that
(A−lim sup û)(a) ≤ max
(1− δ)(A
− lim sup u)(a),
A− lim supω(·, A,D)
for all a ∈ A. Consequently, û ≤ ω(·, A,D). In particular, one gets from (3.4) that
u(z) ≤
ω(z, A,D)
, z ∈ Dδ,A.
Since u is arbitrary, we deduce from the latter estimate that the converse inequality
of (3.3) also holds. This, combined with (3.3), completes the proof of (3.1).
A UNIFIED APPROACH 13
To prove the last conclusion of the proposition, fix a point a ∈ A and an open
neighborhood U of a. Then we have
A− lim supω(·, A∩U,Dδ,A∩U)
(a) ≤
A− lim supω(·, A∩U, (D∩U)δ,A∩U )
A− lim supω(·, A ∩ U,D ∩ U)
(a) = 0,
where the first equality follows from identity (3.1) and the second one from the
hypothesis that A is locally pluriregular. �
The following Two-Constant Theorem for plurisubharmonic functions will play
an important role in the proof of the estimate in Theorem A.
Theorem 3.6. Let X be a complex manifold and D ⊂ X an open subset. Suppose
that D is equipped with a system of approach regions
Aα(ζ)
ζ∈D, α∈Iζ
. Let A ⊂ D
be a locally pluriregular set. Let m,M ∈ R and u ∈ PSH(D) such that u(z) ≤ M
for z ∈ D, and (A− lim sup u)(z) ≤ m for z ∈ A. Then
u(z) ≤ m(1− ω(z, A,D)) +M · ω(z, A,D), z ∈ D.
Proof. It follows immediately from Definition 2.2. �
Theorem 3.7. We keep the hypotheses and notation of Theorem 3.6. Let f be a
bounded function in O(D,C) such that (A − lim f)(ζ) = 0, ζ ∈ A. Then f(z) = 0
for all z ∈ D such that ω(z, A,D) 6= 1.
Proof. Fix a finite positive constant M such that |f |D < M. Consequently, the
desired conclusion follows from applying Theorem 3.6 to the function u := log |f |.
3.3. Construction of discs. In this subsection we present the construction of discs
à la Poletsky (see [38]). This is one of the main ingredients in the proof of Theorem
Let mes denote the Lebesgue measure on the unit circle ∂E. For a bounded
mapping φ ∈ O(E,Cn) and ζ ∈ ∂E, f(ζ) denotes the angular limit value of f at
ζ if it exists. A classical theorem of Fatou says that mes ({ζ ∈ ∂E : ∃f(ζ)}) = 2π.
For z ∈ Cn and r > 0, let B(z, r) denote the open ball centered at z with radius r.
Theorem 3.8. Let D be a bounded open set in Cn, A ⊂ D, z0 ∈ D and ǫ > 0.
Let A be a system of approach regions for D. Suppose in addition that A is locally
pluriregular (relative to A). Then there exist a bounded mapping φ ∈ O(E,Cn) and
a measurable subset Γ0 ⊂ ∂E with the following properties:
1) Γ0 is pluriregular (with respect to the system of angular approach regions),
φ(0) = z0, φ(E) ⊂ D, Γ0 ⊂
ζ ∈ ∂E : φ(ζ) ∈ A
, and
·mes(Γ0) < ω(z0, A,D) + ǫ.
14 VIÊT-ANH NGUYÊN
2) Let f ∈ C(D ∪ A,C) ∩ O(D,C) be such that f(D) is bounded. Then there
exist a bounded function g ∈ O(E,C) such that g = f ◦ φ in a neighborhood
of 0 ∈ E and5 g(ζ) = (f ◦ φ)(ζ) for all ζ ∈ Γ0. Moreover, g|Γ0 ∈ C(Γ0,C).
This theorem motivates the following
Definition 3.9. We keep the hypothesis and notation of Theorem 3.8. Then ev-
ery pair (φ,Γ0) satisfying the conclusions 1)–2) of this theorem is said to be an
ǫ-candidate for the triplet (z0, A,D).
Theorem 3.8 says that there always exist ǫ-candidates for all triplets (z, A,D).
Proof. First we will construct φ. To do this we will construct by induction a sequence
k=1 ⊂ O(E,D) which approximates φ as k ր ∞. This will allow to define the
desired mapping as φ := lim
φk. The construction of such a sequence is divided into
three steps.
For 0 < δ, r < 1 let
Da,r := D ∩ B(a, r), a ∈ A.
Aa,r,δ := {z ∈ Da,r : ω(z, A ∩ B(a, r), Da,r) < δ} , a ∈ A,
Ar,δ :=
Aa,r,δ,
(3.5)
where in the second “:=” Da,r is equipped with the induced system of approach
regions of A onto Da,r (see Subsection 3.2 above).
Suppose without loss of generality that D ⊂ B(0, 1).
Step 1: Construction of φ1.
Let δ0 :=
and r0 := 1. Fix 0 < δ1 <
and 0 < r1 <
. Applying Proposition
3.4, we obtain φ1 ∈ O(E,D) such that φ1(0) = z0 and
∂E ∩ φ−11 (Ar1,δ1)
≤ ω(z0, Ar1,δ1, D) + δ0.
On the other hand, using (3.5) and Definition 2.2 and the hypothesis that A is
locally pluriregular, we obtain
ω(z0, Ar1,δ1 , D) ≤ ω(z0, A,D).
Consequently, we may choose a subset Γ1 of Γ0 := ∂E ∩ φ
1 (Ar1,δ1) which consists
of finite disjoint closed arcs (Γ1j)j∈J1 so that
(3.6) 1−
·mes(Γ1) < ω(z0, Ar1,δ1, D) + 2δ0 ≤ ω(z0, A,D) + 2δ0,
t,τ∈Γ1j
|t− τ | < 2δ1, sup
t,τ∈Γ1j
|φ1(t)− φ1(τ)| < 2r1, j ∈ J1.
Step 2: Construction of φk+1 from φk for all k ≥ 1.
By the inductive construction we have 0 < δk <
and 0 < rk <
φk ∈ O(E,D) such that φk(0) = z0 and there exists a closed subset Γk of ∂E ∩
5 Note here that by Part 1), (f ◦ φ)(ζ) exists for all ζ ∈ Γ0.
A UNIFIED APPROACH 15
(Ark,δk) ∩ Γk−1 which consists of finite closed arcs (Γk,j)j∈Jk such that Γk is
relatively compact in the interior of Γk−1, and
(3.7) 1−
·mes(Γk) < 1−
·mes(Γk−1) + 2δk−1,
t,τ∈Γk,j
|t− τ | < 2δk, sup
t,τ∈Γk,j
|φk(t)− φk(τ)| < 2rk, j ∈ Jk,
|φk − φk−1|Γk < 2rk−1.
Here we make the convention that the last inequality is empty when k = 1.
In particular, we have that φk(Γk) ⊂ Ark,δk . Therefore, by (3.5), for every ζ ∈
φk(Γk) there is a ∈ A such that ζ ∈ Aa,rk,δk , that is,
ω(ζ, A∩ B(a, rk), Da,rk) < δk.
Using the hypothesis that A is locally pluriregular and (3.5) we see that
ω(z, Ar,δ ∩Da,rk , Da,rk) ≤ ω(z, A ∩ B(a, rk), Da,rk), 0 < δ, r < 1.
Consequently, for every ζ ∈ φk(Γk) there is a ∈ A such that
ω(ζ, Ar,δ ∩Da,rk , Da,rk) < δk, 0 < δ, r < 1.
Using the last estimate and arguing as in [38, p. 120–121] (see also the proof of
Theorem 1.10.7 in [19] for a nice presentation), we can choose 0 < δk+1 <
0 < rk+1 <
and φk+1 ∈ O(E,D) such that φk+1(0) = z0, and there exists a
closed subset Γk+1 of ∂E ∩ φ
k+1(Ark+1,δk+1) ∩ Γk which consists of finite closed arcs
(Γk+1,j)j∈Jk+1 such that Γk+1 is relatively compact in the interior of Γk, and
(3.8) 1−
·mes(Γk+1) < 1−
·mes(Γk) + 2δk,
t,τ∈Γk+1,j
|t− τ | < 2δk+1, sup
t,τ∈Γk+1,j
|φk+1(t)− φk+1(τ)| < 2rk+1, j ∈ Jk+1,
|φk+1 − φk|Γk+1 < 2rk.
Step 3: Construction of φ from the sequence (φk)
In summary, we have constructed a decreasing sequence (Γk)
k=1 of closed subsets
of ∂E. Consider the new closed set
By (3.7)–(3.8),
·mes(Γ) =
mes(Γ1)− 2
mes(Γ1)− 3δ1.
16 VIÊT-ANH NGUYÊN
This, combined with (3.6), implies the following property
·mes(Γ) < 1−
·mes(Γ1)+3δ1 ≤ ω(z0, A,D)+2δ0+3δ1 < ω(z0, A,D)+ ǫ.
On the other hand, we recall from the above construction the following properties:
(ii) φk(Γ) ⊂ φk(Γk) ⊂ Ark,δk .
(iii) δ0 =
, r0 = 1, 0 < δk+1 <
, 0 < rk+1 <
and |φk+1−φk|Γ ≤ |φk+1−φk|Γk+1 <
(iv) sup
t,τ∈Γkj
|t− τ | < 2δk and sup
t,τ∈Γk,j
|φk(t)− φk(τ)| < 2rk, j ∈ Jk.
(v) For every ζ ∈ Γ there exists a sequence (jk)k≥1 such that jk ∈ Jk, and ζ is an
interior point of Γk,jk , and Γk+1,jk+1 ⋐ Γk,jk , and ζ =
Γk,jk .
Therefore, we are able to apply the Khinchin–Ostrowski Theorem (see [11, The-
orem 4, p. 397]) to the sequence (φk)
k=1. Consequently, this sequence converges
uniformly on compact subsets of E to a mapping φ ∈ O(E,D). Moreover, φ admits
(angular) boundary values at all points of Γ and φ(Γ) ⊂
Ark,δk ⊂ A.
Observe that since φk(0) = φ(0) = z0 ∈ D and f ∈ C(D ∪ A,C) ∩ O(D,C), the
sequence (f ◦ φk)
k=1 converges to f ◦ φ uniformly on a neighborhood of 0 ∈ E. On
the other hand, f(D) is bounded by the hypothesis. Thus by Montel Theorem, the
family (f ◦ φk)
k=1 ⊂ O(E,C) is normal. Consequently, the sequence (f ◦ φk)
converges uniformly on compact subsets of E. Let g be the limit mapping. Then
g ∈ O(E,C) and g = f ◦ φ in a neighborhood of 0 ∈ E. Moreover, it follows
from (i)–(iii) above and the hypothesis f ∈ C(D ∪ A,C) that g(ζ) = (f ◦ φ)(ζ)
for all ζ ∈ Γ. We deduce from (iii)–(v) above that g|Γ ∈ C(Γ,C) Finally, applying
Lemma 4.1 below we may choose a locally pluriregular subset Γ0 ⊂ Γ (relative to
the system of angular approach regions) such that mes(Γ0) = mes(Γ). Hence, the
proof is finished. �
It is worthy to remark that φ(E) ⊂ D; but in general, φ(E) 6⊂ D !
The last result of this section sharpens Theorem 3.8.
Theorem 3.10. Let D be a bounded open set in Cn, A ⊂ D, and ǫ > 0. Let A be a
system of approach regions for D. Suppose in addition that A is locally pluriregular
(relative to A). Then there exists a Borel mapping Φ : D × E −→ Cn with the
following property: for every z ∈ D, there is a measurable subset Γz of ∂E such that
(Φ(z, ·),Γz) is an ǫ-candidate for the triplet (z, A,D).
Roughly speaking, this result says that one can construct ǫ-candidates for (z, A,D)
so that they depend in a Borel-measurable way on z ∈ D.
Proof. Observe that in Proposition 3.4 we can construct ǫ-candidates for (z, A,M)
so that they depend in a Borel-measurable way on z ∈ M. Here an ǫ-candidate
for (z, A,M) is a holomorphic disc φ ∈ O(E,M) such that φ(0) = z and
1∂E\φ−1(A),∂E(e
iθ)dθ < P[1M\A,M](z) + ǫ.
A UNIFIED APPROACH 17
Using this we can adapt the proof of Theorem 3.8 in order to obtain the desired
result. �
4. A mixed cross theorem
Let E be as usual the open unit disc in C. Let B be a measurable subset of ∂E
and ω(·, B, E) the relative extremal function of B relative to E (with respect to the
canonical system of approach regions). Then it is well-known (see [39]) that
(4.1) ω(z, B, E) =
1− |z|2
|eiθ − z|2
· 1∂E\B,∂E(e
iθ)dθ.
The following elementary lemma will be very useful.
Lemma 4.1. We keep the above hypotheses and notation.
1) Let u be a subharmonic function defined on E with u ≤ 1 and let α ∈ (0, π
be such that
lim sup
z→ζ, z∈Aα(ζ)
u(z) ≤ 0 for a.e. ζ ∈ B,
where A = (Aα(ζ)) is the system of angular approach regions defined in
Subsection 2.2. Then u ≤ ω(·, B, E) on E.
2) ω(·, B, E) is also the relative extremal function of B relative to E (with
respect to the system of angular approach regions).
3) For all subsets N ⊂ ∂E with mes(N ) = 0, ω(·, B, E) = ω(·, B ∪N , E).
4) Let B
be the set of all density points of B. Then
z→ζ, z∈Aα(ζ)
ω(z, B, E) = 0, ζ ∈ B
, 0 < α <
In particular, B
is locally pluriregular (with respect to the system of angular
approach regions).
5) ω(·, B, E) = ω̃c(·, B, E) = ω̃a(·, B, E) on E, where ω̃c(·, B, E) (resp.
ω̃a(·, B, E)) is given by Definition 2.3 relative to the system of canonical
approach regions (resp. angular approach regions).
Proof. It follows immediately from the explicit formula (4.1). �
The main ingredient in the proof of Theorem A is the following mixed cross
theorem.
Theorem 4.2. Let D be a complex manifold and E as usual the open unit disc in
C. D (resp. E) is equipped with the canonical system of approach regions (resp.
the system of angular approach regions). Let A be an open subset of D and B a
measurable subset of ∂E such that B is locally pluriregular (relative to the system of
angular approach regions). For 0 ≤ δ < 1 put G := {w ∈ E : ω(w,B,E) < 1− δ} .
18 VIÊT-ANH NGUYÊN
Let W := X(A,B;D,G), W o := Xo(A,B;D,G), and6
Ŵ = X̂(A,B;D,G) :=
(z, w) ∈ D ×G : ω(z, A,D) +
ω(w,B,E)
Let f : W −→ C be such that
(i) f ∈ Os(W
o,C);
(ii) f is locally bounded on W, f |A×B is a Borel function;
(iii) for all z ∈ A,
w→η, w∈Aα(η)
f(z, w) = f(z, η), η ∈ B, 0 < α <
Then there is a unique function f̂ ∈ O(Ŵ ,C) such that f̂ = f on A×G. Moreover,
|f |W = |f̂ |cW .
The proof of this theorem will occupy the present and the next sections. Our
approach here avoid completely the classical method of doubly orthogonal bases
of Bergman type. For the proof we need the following “measurable” version of
Gonchar’s Theorem.
Theorem 4.3. Let D = G := E be equipped with the system of angular approach
regions. Let A (resp. B) be a Borel measurable subset of ∂D (resp. ∂G) such
that A and B are locally pluriregular and that mes(A), mes(B) > 0. Put W :=
X(A,B;D,G) and define W o, Ŵ , ω(z, w) as in Subsection 2.3. Let f : W −→ C
be such that:
(i) f is locally bounded on W and f ∈ Os(W
o,C);
(ii) f |A×B is a Borel function;
(iii) for all a ∈ A (resp. b ∈ B), f(a, ·)|G (resp. f(·, b)|D) admits A-limit
7 f(a, b)
at all b ∈ B (resp. at all a ∈ A).
Then there exists a unique function f̂ ∈ O(Ŵ ,C) which admits A-limit f(ζ, η) at
all points (ζ, η) ∈ W o. If, moreover, |f |W <∞, then
|f̂(z, w)| ≤ |f |
1−ω(z,w)
A×B |f |
ω(z,w)
W , (z, w) ∈ Ŵ .
Proof. It follows from Steps 1–3 of Section 6 in [34]. �
The above theorem is also true in the context of an N -fold cross W (N ≥ 2). We
give here a version of a special 3-fold cross which is needed for the proof of Theorem
Theorem 4.4. Let D = G := E be equipped with the system of angular approach
regions. Let A (resp. B) be a Borel measurable subset of ∂D (resp. ∂G) such that
6 In fact, Theorem 4.10 in [34] says that ω(·, B,G) =
ω(·,B,E)
on G, where ω(·, B,G) is the
relative extremal function with respect to the system of angular approach regions induced onto G.
7 that is, the angular limit
A UNIFIED APPROACH 19
A and B are locally pluriregular and that mes(A), mes(B) > 0. Define W, W o, Ŵ
as follows:
W = X(A, ∂E,B;D,E,G) := A× ∂E × (G ∪B)
A× E × B
(D ∪ A)× ∂E × B,
W o = Xo(A, ∂E,B;D,E,G) := A× ∂E ×G
A× E × B
D × ∂E × B,
Ŵ = X̂(A, ∂E,B;D,E,G) := {(z, t, w) ∈ D × E ×G : ω(z, A,D) + ω(w,B,G) < 1} .
Let f : W −→ C be such that:
(i) f is locally bounded on W and f ∈ Os(W
o,C)8;
(ii) f |A×∂E×B is a Borel function;
(iii) for all (a, λ) ∈ A × ∂E (resp. (a, b) ∈ A × B) (resp. (λ, b) ∈ ∂E × B),
f(a, λ, ·)|G (resp. f(a, ·, b)|E) (resp. f(·, λ, b)|D) admits the angular limit
f(a, λ, b) at all b ∈ B (resp. at all λ ∈ ∂E) (resp. at all a ∈ A).
Then there exists a unique function f̂ ∈ O(Ŵ ,C) such that
cW∋(z,t,w)→(ζ,τ,η),w∈Aα(η)
f̂(z, t, w) =
f(ζ, λ, η)
(ζ, τ, η) ∈ D × E × B, 0 < α <
If, moreover, |f |W <∞, then
|f̂(z, t, w)| ≤ |f |
1−ω(z,A,D)−ω(w,B,G)
A×∂E×B |f |
ω(z,A,D)+ω(w,B,G)
W , (z, t, w) ∈ Ŵ .
Proof. We refer the reader to Subsections 5.2 and 5.3 in [34].
Let ω̂(·, A,D) (resp. ω̂(·, B,G)) be the conjugate harmonic function of ω(·, A,D)
(resp. ω(·, B,G) ) such that ω̂(z0, A,D) = 0 (resp. ω̂(w0, B,G) = 0) for a certain
fixed point z0 ∈ D (resp. w0 ∈ G). Thus we define the holomorphic functions
g1(z) := ω(z, A,D) + iω̂(z, A,D), g2(w) := ω(w,B,G) + iω̂(w,B,G), and
g(z, w) := g1(z) + g2(w), (z, w) ∈ D ×G.
Each function e−g1 (resp. e−g2) is bounded on D (resp. on G). Therefore, in
virtue of [11, p. 439], we may define e−g1(a) (resp. e−g2(b)) for a.e. a ∈ A (resp. for
a.e. b ∈ B) to be the angular limit of e−g1 at a (resp. e−g2 at b).
In virtue of (i), for each positive integer N, we define, as in [12, 13] (see also [34]),
the Gonchar–Carleman operator as follows
(4.2) KN(z, t, w) = KN [f ](z, t, w) :=
(2πi)2
−N(g(a,b)−g(z,w)) f(a, t, b)dadb
(a− z)(b− w)
for (z, t, w) ∈ D × ∂E × G. Reasoning as in [13] and using (i)–(iii) above, we see
that the following limit
(4.3) K(z, t, w) = K[f ](z, t, w) := lim
KN(z, t, w)
8 This notation means that for all (a, λ) ∈ A×∂E (resp. (a, b) ∈ A×B) (resp. (λ, b) ∈ ∂E×B),
the function f(a, λ, ·)|G (resp. f(a, ·, b)|E) (resp. f(·, λ, b)|D) is holomorphic.
20 VIÊT-ANH NGUYÊN
exists for all points in the set
(z, t, w) : t ∈ ∂E, (z, w) ∈ X̂(A,B;D,G)
, and its
limit is uniform on compact subsets of the latter set.
Observe that for n = 0, 1, 2, . . . , and N = 1, 2, . . . ,
KN(z, t, w)dt =
(2πi)2
f(a, t, b)dt
e−N(g(a,b)−g(z,w))dadb
(a− z)(b− w)
where the first equality follows from (4.2), the second one from the equality∫
tnf(a, t, b)dt = 0 which itself is an immediate consequence of (i). Therefore,
we deduce from (4.3) that
tnK(z, t, w)dt = 0, (z, w) ∈ X̂(A,B;D,G), n = 0, 1, 2, . . . .
On the other hand,
(z, t, w) : t ∈ E, (z, w) ∈ X̂(A,B;D,G)
Hence, we are able to define the desired extension function
f̂(z, t, w) :=
K(z, λ, w)
dλ, (z, t, w) ∈ Ŵ .
Recall from Steps 1–3 of Section 6 in [34] that
cW∋(z,w)→(ζ,η),w∈Aα(η)
K(z, t, w) = f(ζ, t, η), (ζ, t, η) ∈ D × ∂E × B, 0 < α <
Inserting this into the above formula of f̂ , the desired conclusion of the theorem
follows. �
We break the proof of Theorem 4.2 into two cases.
CASE 1: δ = 0 (that is G = E).
We follow essentially the arguments presented in Section 4 of [28]. For the sake
of clarity and completeness we give here the most basic arguments.
We begin the proof with the following lemma.
Lemma 4.5. We keep the hypothesis of Theorem 4.2. For j ∈ {1, 2}, let φj ∈
O(E,D) be a holomorphic disc, and let tj ∈ E such that φ1(t1) = φ2(t2) and
1D\A,D(φj(e
iθ))dθ < 1. Then:
1) For j ∈ {1, 2}, the function (t, w) 7→ f(φ(t), w) defined on X(φ−1j (A) ∩
∂E,B;E,G) satisfies the hypothesis of Theorem 4.3, where φ−1j (A) := {t ∈
E : φj(t) ∈ A}.
A UNIFIED APPROACH 21
2) For j ∈ {1, 2}, in virtue of Part 1), let f̂j be the unique function in
X̂(φ−1j (A) ∩ ∂E,B;E,G),C
given by Theorem 4.3. Then
f̂1(t1, w) = f̂2(t2, w),
for all w ∈ G such that (tj, w) ∈ X̂
φ−1j (A) ∩ ∂E,B;E,G
, j ∈ {1, 2}.
Proof of Lemma 4.5. Part 1) follows immediately from the hypothesis. There-
fore, it remains to prove Part 2). To do this fix w0 ∈ G such that (tj, w0) ∈
φ−1j (A) ∩ E,B;E,G
for j ∈ {1, 2}.We need to show that f̂1(t1, w0) = f̂2(t2, w0).
Observe that both functions w ∈ G 7→ f̂1(t1, w) and w ∈ G 7→ f̂2(t2, w) belong to
O(G,C), where G is the connected component which contains w0 of the following
open set
w ∈ G : ω(w,B,G) < 1− max
j∈{1,2}
ω(tj, φ
j (A) ∩ ∂E,E)
Since φ1(t1) = φ2(t2), it follows from Theorem 4.3 and the hypothesis of Part 2)
(A− lim f̂1)(t1, η) = f(φ1(t1), η) = f(φ2(t2), η) = (A− lim f̂2)(t2, η), η ∈ B.
Therefore, by Theorem 3.7, f̂1(t1, w) = f̂2(t2, w), w ∈ G. Hence, f̂1(t1, w0) =
f̂2(t2, w0), which completes the proof of the lemma. �
Now we return to the proof of the theorem in CASE 1 which is divided into two
steps.
Step 1: Construction of the extension function f̂ on Ŵ and its uniqueness.
Proof of Step 1. We define f̂ as follows: Let W be the set of all pairs (z, w) ∈ D×G
with the property that there are a holomorphic disc φ ∈ O(E,D) and t ∈ E such
that φ(t) = z and (t, w) ∈ X̂ (φ−1(A) ∩ ∂E,B;E,G) . By Part 1) of Lemma 4.5 and
Theorem 4.3, let f̂φ be the unique function in O
X̂(φ−1(A) ∩ ∂E,B;E,G),C
(4.4) (A− lim f̂φ)(t, w) = f(φ(t), w), (t, w) ∈ X
−1(A) ∩ ∂E,B;E,G
Then the desired extension function f̂ is given by
(4.5) f̂(z, w) := f̂φ(t, w).
In virtue of Part 2) of Lemma 4.5, f̂ is well-defined on W. We next prove that
(4.6) W = Ŵ .
Taking (4.6) for granted, then f̂ is well-defined on Ŵ .
Now we return to (4.6). To prove the inclusion W ⊂ Ŵ , let (z, w) ∈ W. By the
above definition of W, one may find a holomorphic disc φ ∈ O(E,D), a point t ∈ E
22 VIÊT-ANH NGUYÊN
such that φ(t) = z and (t, w) ∈ X̂ (φ−1(A) ∩ ∂E,B;E,G) . Since ω(φ(t), A,D) ≤
ω(t, φ−1(A) ∩ ∂E,E), it follows that
ω(z, A,D) + ω(w,B,G) ≤ ω(t, φ−1(A) ∩ ∂E,E) + ω(w,B,G) < 1,
Hence (z, w) ∈ Ŵ . This proves the above mentioned inclusion.
To finish the proof of (4.6), it suffices to show that Ŵ ⊂ W. To do this, let
(z, w) ∈ Ŵ and fix any ǫ > 0 such that
(4.7) ǫ < 1− ω(z, A,D)− ω(w,B,G).
Applying Theorem 3.1 and Proposition 3.4, there is a holomorphic disc φ ∈ O(E,D)
such that φ(0) = z and
(4.8)
1D\A,D(φ(e
iθ))dθ < ω(z, A,D) + ǫ.
Observe that
ω(0, φ−1(A) ∩ ∂E,E) + ω(w,B,G) =
1D\A,D(φ(e
iθ))dθ + ω(w,B,G)
< ω(z, A,D) + ω(w,B,G) + ǫ < 1,
where the equality follows from (4.1), the first inequality holds by (4.8), and the
last one by (4.7). Hence, (0, w) ∈ X̂ (φ−1(A) ∩ ∂E,B;E,G) , which implies that
(z, w) ∈ W. This completes the proof of (4.6). Hence, the construction of f̂ on Ŵ
has been completed.
Next we show that f̂ = f on A×G. To this end let (z0, w0) be an arbitrary point
of A × G. Choose the holomorphic disc φ ∈ O(E,D) given by φ(t) := z0, t ∈ E.
Then by formula (4.5),
f̂(z0, w0) = f̂φ(0, w0) = f(φ(0), w0) = f(z0, w0).
If g ∈ O(Ŵ ,C) satisfies g = f on A × G, then we deduce from (4.4)–(4.5) that
g = f̂ . This proves the uniqueness of f̂ . �
Finally, we conclude the proof of CASE 1 by the following
Step 2: Proof of the fact that f̂ ∈ O(Ŵ ,C).
Proof of Step 2. Fix an arbitrary point (z0, w0) ∈ Ŵ and let ǫ > 0 be so small such
(4.9) 2ǫ < 1− ω(z0, A,D)− ω(w0, B,G).
Since ω(·, B,G) ∈ PSH(G), one may find an open neighborhood V of w0 such that
(4.10) ω(w,B,D) < ω(w0, B,G) + ǫ, w ∈ V.
A UNIFIED APPROACH 23
Let n be the dimension of D at the point z0. Applying Lemma 3.2 and Proposition
3.4, we obtain an open set T in C, an open neighborhood U of z0, and a family of
holomorphic discs (φz)z∈U ⊂ O(E,D) with the following properties:
the mapping (z, t) ∈ U × E 7→ φz(t) is holomorphic;(4.11)
φz(0) = z, z ∈ U ;(4.12)
φz(t) ∈ A, t ∈ T ∩ E, z ∈ U ;(4.13)
1∂E\T,∂E(e
iθ)dθ < ω(z0, A,D) + ǫ.(4.14)
By shrinking U (if necessary), we may assume without loss of generality that in a
chart, z0 = 0 ∈ C
n and
(4.15) U =
z = (z1, . . . , zn) = (z
, zn) ∈ Cn : z
∈ S, |zn| < 2
where S ⊂ Cn−1 is an open set.
Consider the 3-fold cross (compared with the notation in Theorem 4.4)
X (T ∩ ∂E, U,B;E,U,G) := (T ∩ ∂E)× U × (G ∪ B)
(T ∩ ∂E)× U × B
(E ∪ (T ∩ ∂E))× U ×B,
and the function g : X (T ∩ ∂E, U,B;E,U,G) −→ C given by
(4.16) g(t, z, w) := f(φz(t), w), (t, z, w) ∈ X (T ∩ ∂E, U,B;E,U,G) .
We make the following observations:
Let t ∈ T ∩ ∂E. Then, in virtue of (4.13) we have φz(t) ∈ A for z ∈ U. Con-
sequently, in virtue of (4.11), (4.16) and the hypothesis f ∈ Os(W
o,C), we con-
clude that g(t, z, ·)|G ∈ O(G,C)
resp. g(t, ·, w)|U ∈ O(U,C)
for any z ∈ U (resp.
w ∈ B). Analogously, for any z ∈ U, w ∈ B, we can show that g(·, z, w)|E ∈ O(E,C).
In summary, we have shown that g is separately holomorphic. In addition,
it follows from hypothesis (ii) and (4.11)–(4.13) that g is locally bounded and
g|(T∩∂E)×U×B is a Borel function.
For z
∈ S write Ez′ :=
z = (z
, zn) ∈ C
n : |zn| < 1
. Then by (4.15),⋃
Ez′ ⊂ U. Consequently, for all z
∈ S, using hypothesis (iii) we are
able to apply Theorem 4.4 to g in order to obtain a unique function ĝ ∈
X̂ (T ∩ ∂E, ∂Ez′ , B;E,Ez′ , G) ,C
9 such that
(t,z,w)→(τ,ζ,η), w∈Aα(η)
ĝ(t, z, w) =
g(τ, ζ
, λ, η)
λ− ζn
(τ, ζ, η) ∈ E ×Ez′ × B, z
∈ S, 0 < α <
9 In fact, we identify Ez′ with E in an obvious way.
24 VIÊT-ANH NGUYÊN
Using (4.11) and (4.15)–(4.16) and the Cauchy’s formula, we see that the right hand
side is equal to g(τ, ζ, η). Hence, we have shown that
(4.17)
(t,z,w)→(τ,ζ,η), w∈Aα(η)
ĝ(t, z, w) = g(τ, ζ, η), (τ, ζ, η) ∈ E×Ez′×B, z
∈ S, 0 < α <
Observe that
X̂ (T ∩ ∂E, ∂Ez′ , B;E,Ez′ , G) = {(t, z, w) ∈ E × Ez′ ×G : ω(t, T ∩ ∂E,E) + ω(w,B,G) < 1} .
On the other hand, for any w ∈ V,
ω(0, T ∩ ∂E,E) + ω(w,B,G) ≤
1∂E\T,∂E(e
iθ)dθ + ω(w0, B,G) + ǫ
< ω(z0, A,D) + ω(w0, B,G) + 2ǫ < 1,
(4.18)
where the first inequality follows from (4.1) and (4.10), the second one from (4.14),
and the last one from (4.9). Consequently,
(4.19) (0, z, w) ∈ X̂ (T ∩ ∂E, ∂Ez′ , B;E,Ez′ , G) , (z, w) ∈ Ez′ × V, z
It follows from (4.5), (4.12), (4.13) and (4.18) that, for z
∈ S and z ∈ Ez′ , f̂φz is
well-defined and holomorphic on X̂(T ∩ ∂E,B;E,G), and
(4.20) f̂(z, w) = f̂φz(0, w), w ∈ V.
On the other hand, it follows from (4.4), (4.16) and (4.17) that
(t,w)→(τ,η), w∈Aα(η)
f̂φz(t, w) = lim
(t,w)→(τ,η), w∈Aα(η)
ĝ(t, z, w),
(τ, η) ∈ E ×B, z ∈ Ez′ , z
∈ S, 0 < α <
Since, for fixed z ∈ Ez′ , the restricted functions (t, w) 7→ ĝ(t, z, w) and f̂φz are
holomorphic on X̂(T ∩ ∂E,B;E,G), we deduce from the latter equality and the
uniqueness of Theorem 4.3 that
ĝ(t, z, w) = f̂φz(t, w), (t, w) ∈ X̂ (T ∩ ∂E,B;E,G) , z ∈ Ez′ , z
In particular, using (4.5), (4.19) and (4.20),
ĝ(0, z, w) = f̂φz(0, w) = f̂(z, w), (z, w) ∈ Ez′ × V, z
Since we know from (4.19) that ĝ is holomorphic in the variables zn and w on a
neighborhood of (0, z0, w0), it follows that f̂ is holomorphic in the variables z
n and
w on a neighborhood of (z0, w0). Exchanging the role of z
n and any other variable
zj , j = 1, . . . , n − 1, we see that f̂ is separetely holomorphic on a neighborhood
of (z0, w0). In addition, f̂ is locally bounded. Consequently, we conclude, by the
classical Hartogs extension Theorem, that f̂ is holomorphic on a neighborhood of
(z0, w0). Since (z0, w0) ∈ Ŵ is arbitrary, it follows that f̂ ∈ O(Ŵ ,C). �
Combining Steps 1–2, CASE 1 follows. �
A UNIFIED APPROACH 25
5. Completion of the proof of Theorem 4.2
In this section we introduce the new technique of conformal mappings. This
technique will allow us to pass from CASE 1 to the general case. We recall a notion
from Definition 4.8 in [34] which will be relevant for our further study.
Definition 5.1. Let A be the system of angular approach regions for E, let Ω be an
open subset of the unit disc E and ζ a point in ∂E. Then the point ζ is said to be
an end-point of Ω if, for every 0 < α < π
, there is an open neighborhood U = Uα of
ζ such that U ∩Aα(ζ) ⊂ Ω. The set of all end-points of Ω is denoted by End(Ω).
The main idea of the technique of conformal mappings is described below.
Proposition 5.2. Let B be a measurable subset of ∂E with mes(B) > 0. For 0 ≤
δ < 1 put G := {w ∈ E : ω(w,B,E) < 1− δ} . Let Ω be an arbitrary connected
component of G. Then
1) End(Ω) is a measurable subset of ∂E and mes(End(Ω)) > 0. Moreover, Ω is
a simply connected domain.
In virtue of Part 1) and the Riemann mapping theorem, let Φ be a confor-
mal mapping of Ω onto E.
2) For every ζ ∈ End(Ω), there is η ∈ ∂E such that
z→ζ, z∈Ω∩Aα(ζ)
Φ(z) = η, 0 < α <
η is called the limit of Φ at the end-point ζ and it is denoted by Φ(ζ).
Moreover, Φ|End(Ω) is one-to-one.
3) Let f be a bounded holomorphic function on Ω, ζ ∈ End(Ω) and λ ∈ C such
that lim
z→ζ, z∈Ω∩Aα(ζ)
f(z) = λ for some 0 < α < π
. Then f ◦ Φ−1 ∈ O(E,C)
admits the angular limit λ at Φ(ζ).
4) Let ∆ be a subset of End(Ω) such that mes(∆) = mes(End(Ω)). Put Φ(∆) :=
{Φ(ζ), ζ ∈ ∆}, where Φ(ζ) is given by Part 2). Then Φ(∆) is a measurable
subset of of ∂E with mes
> 0. and
ω(Φ(z),Φ(∆), E) =
ω(z, B, E)
, z ∈ Ω.
Proof. The first assertions of Part 1) follows from Theorem 4.9 in [34]. To show
that Ω is simply connected, take an arbitrary Jordan domain D such that ∂D ⊂ Ω.
We need to prove that D ⊂ Ω. Observe that D ⊂ E and ω(z, B, E) < 1 − δ for all
z ∈ ∂D ⊂ Ω ⊂ G. By the Maximum Principle, we deduce that ω(z, B, E) < 1 − δ
for all z ∈ D. Hence, D ⊂ G, which, in turn, implies that D ⊂ Ω. This completes
Part 1).
Part 2) follows from the “end-point” version of Theorem 4.4.13 in [39] (that is,
we replace the hypothesis “accessible point” therein by end-point).
Applying the classical Lindelöf’s Theorem to f ◦ Φ−1 ∈ O(E,C), Part 3) follows.
26 VIÊT-ANH NGUYÊN
It remains to prove Part 4). A straightforward argument shows that Φ(∆) is a
measurable subset of ∂E. Next, we show that
(5.1) ω(Φ(z),Φ(∆), E) ≤
ω(z, B, E)
, z ∈ Ω.
To do this pick any u ∈ PSH(E) such that u ≤ 1 and
lim sup
u(w) ≤ 0, η ∈ Φ(∆).
Consequently, Part 2) gives that
(5.2) lim sup
z→ζ, z∈Ω∩Aα(ζ)
u ◦ Φ(z) = 0, ζ ∈ ∆, 0 < α <
Next, consider the following function
(5.3) ũ(z) :=
max{(1− δ) · (u ◦ Φ)(z), ω(z, B, E)}, z ∈ Ω,
ω(z, B, E), z ∈ E \ Ω.
Then it can be checked that ũ is subharmonic and ũ ≤ 1 in E. In addition, for
every density point ζ of B such that ζ 6∈ End(Ω), we know from Theorem 4.9 in [34]
that there is a connected component Ωζ of G other than Ω such that ζ ∈ End(Ωζ).
Consequently, Part 4) of Lemma 4.1 gives, for such a point ζ, that
lim sup
z→ζ, z∈Aα(ζ)
ũ(z) = lim sup
z→ζ, z∈Aα(ζ)
ω(z, B, E) = 0, 0 < α <
This, combined with (5.2), implies that
lim sup
z→ζ, z∈Aα(ζ)
ũ(z) = 0, 0 < α <
, for a.e. ζ ∈ B.
Consequently, applying Part 1) of Lemma 4.1 yields that ũ ≤ ω(·, B, E) on E.
Hence, by (5.3), (u ◦Φ)(z) ≤
ω(z,B,E)
, z ∈ Ω, which completes the proof of (5.1). In
particular, we obtain that mes (Φ(∆)) > 0.
To prove the opposite inequality of (5.1), let u be an arbitrary function in PSH(E)
such that u ≤ 1 and
lim sup
u(z) ≤ 0, ζ ∈ B.
Applying Part 3) to the function f(z) := z, we obtain that
lim sup
w→η, w∈Aα(η)
(u ◦ Φ−1) (w)
≤ 0, η ∈ Φ(∆), 0 < α <
On the other hand, since u ≤ ω(·, B, E) on E, one gets that
(u◦Φ−1)(w)
≤ 1, w ∈ E.
Therefore, applying Part 1) of Lemma 4.1 yields that
(u ◦ Φ−1) (w)
≤ ω(w,Φ(∆), E), w ∈ E,
which, in turn, implies the converse inequality of (5.1). Hence, the proof of Part 4)
is complete. �
A UNIFIED APPROACH 27
Now we are in the position to complete the proof of Theorem 4.2:
CASE 2: 0 < δ < 1.
Let (Gk)k∈K be the family of all connected components of G, where K is an (at
most) countable index set. By Proposition 5.2, we may fix a conformal mapping Φk
from Gk onto E for every k ∈ K. Put
Bk :=
End(Gk) ∩ B
, Wk := X(A,B
k;D,E),
W ok := X
o(A,B
k;D,E), Ŵk := X̂(A,B
k;D,E), k ∈ K.
(5.4)
where [T ]
(or simply T
) for T ⊂ ∂E is, following the notation of Lemma 4.1, the
set of all density points of T.
Recall from the hypotheses of Theorem 4.2 that for every fixed z ∈ A, the holo-
morphic function f(z, ·)|G is bounded and that for every η ∈ B,
w→η, w∈Ω∩Aα(η)
f(z, w) = f(z, η), 0 < α <
Consequently, Part 3) of Proposition 5.2, applied to f(z, ·)|Gk with k ∈ K, implies
that for every fixed z ∈ D, f(z,Φ−1k (·)) ∈ O(E,C) admits the angular limit f(z, η)
at Φk(η) for all η ∈ B ∩ End(Gk). By Part 1) of that proposition, we know that
B ∩ End(Gk)
> 0. This discussion and the hypothesis allow us to apply the
result of CASE 1 to the function gk : Wk −→ C defined by
(5.5) gk(z, w) :=
f(z,Φ−1
(w)), (z, w) ∈ D ×Gk,
f(z,Φ−1k (w)) (z, w) ∈ D × B
where in the second line we have used the definition of Φk|End(Gk) and its one-to-one
property proved by Part 2) of Proposition 5.2.
Consequently, we obtain an extension function ĝk ∈ O(Ŵk,C) such that
(5.6) ĝk(z, w) = gk(z, w), (z, w) ∈ A× E.
Ŵk :=
(z,Φ−1
(w)), (z, w) ∈ Ŵk
, k ∈ K.
Observe that the open sets (Ŵk)k∈K are pairwise disjoint. Moreover, by (5.4),
Ŵk = {(z, w) ∈ D ×E : w ∈ Gk and
ω(z, A,D) + ω
Φk(w),Φk(End(Gk)), E
< 1 for some k ∈ K
(z, w) ∈ D × E : w ∈ Gk and ω(z, A,D) +
ω(w,B,E)
< 1 for some k ∈ K
= Ŵ ,
28 VIÊT-ANH NGUYÊN
where the second equality follows from Part 4) of Proposition 5.2. Therefore, we
can define the desired extension function f̂ ∈ O(Ŵ ,C) by the formula
f̂(z, w) := ĝk(z,Φk(w)), (z, w) ∈ Ŵk, k ∈ K.
This, combined with (5.4)–(5.6), implies that f̂ = f on A×G. The uniqueness of f̂
follows from that of ĝk, k ∈ K. Hence, the proof of the theorem is complete. �
6. A local version of Theorem A
The main purpose of the section is to prove the following result.
Theorem 6.1. Let D ⊂ Cn, G ⊂ Cm be bounded open sets. D (resp. G) is equipped
with a system of approach regions
Aα(ζ)
ζ∈D, α∈Iζ
(resp.
Aα(η)
η∈G, α∈Iη
). Let
A (resp. B) be a subset of D (resp. G) such that A and B are locally pluriregular.
W := X(A,B;D,G), W := X(A,B;D,G),
:= Xo(A,B;D,G), Ŵ := X̂(A,B;D,G).
Then, for every bounded function f : W −→ C such that f ∈ Cs(W,C)∩Os(W
and that f |A×B is continuous at all points of (A ∩ ∂D) × (B ∩ ∂G), there exists a
unique bounded function f̂ ∈ O(Ŵ ,C) which admits A-limit f(ζ, η) at all points
(ζ, η) ∈ W. Moreover,
(6.1) |f̂(z, w)| ≤ |f |
1−ω(z,w)
A×B |f |
ω(z,w)
W , (z, w) ∈ Ŵ .
The core of our unified approach will be presented in the proof below. Our idea is
to use Theorem 3.8 in order to reduce Theorem 6.1 to the case of bidisk, that is, the
case of Theorem 4.3. This reduction is based on Theorem 4.2 and on the technique
of level sets.
Proof. It is divided into four steps.
Step 1: Construction of the desired function f̂ ∈ O(Ŵ ,C) and proof of the estimate
|f̂ |cW ≤ |f |W .
Proof of Step 1. We define f̂ at an arbitrary point (z, w) ∈ Ŵ as follows: Let ǫ > 0
be such that
(6.2) ω(z, A,D) + ω(w,B,G) + 2ǫ < 1.
By Theorem 3.8 and Definition 3.9, there is an ǫ-candidate (φ,Γ) (resp. (ψ,∆))
for (z, A,D) (resp. (w,B,G)). Moreover, using the hypotheses, we see that the
function fφ,ψ, defined by
(6.3) fφ,ψ(t, τ) := f(φ(t), ψ(τ)), (t, τ) ∈ X (Γ,∆;E,E) ,
satisfies the hypotheses of Theorem 4.3. By this theorem, let f̂φ,ψ be the unique
function in X̂ (Γ,∆;E,E) such that
(6.4) (A− lim f̂φ,ψ)(t, τ) = fφ,ψ(t, τ), (t, τ) ∈ X
o (Γ,∆;E,E) .
A UNIFIED APPROACH 29
In virtue of (6.2) and Theorem 3.8 and Lemma 3.3, (0, 0) ∈ X̂ (Γ,∆;E,E) . Then
we can define the value of the desired extension function f̂ at (z, w) as follows
(6.5) f̂(z, w) := f̂φ,ψ(0, 0).
The remaining part of this step is devoted to showing that f̂ is well-defined and
holomorphic on Ŵ .
To this end we fix an arbitrary point w0 ∈ G, a number ǫ0 : 0 < ǫ0 < 1 −
ω(w0, B,G), and an arbitrary ǫ0-candidate (ψ0,∆0) for (w0, B,G).
(6.6) Ŵ0 := {(z, τ) ∈ D × E : ω(z, A,D) + ω(τ,∆0, E) < 1} .
Inspired by formula (6.5) we define a function f̂0 : Ŵ0 −→ C as follows
(6.7) f̂0(z, τ) := f̂φ,ψ0(0, τ).
Here we have used an ǫ-candidate (φ,Γ) for (z, A,D), where ǫ is arbitrarily chosen
so that 0 < ǫ < 1− ω(z, A,D)− ω(τ,∆0, E).
Using (6.3)–(6.4) and (6.7) and arguing as in Part 2) of Lemma 4.5, one can show
that f̂0 is well-defined on Ŵ0.
For all 0 < δ < 1 let
(6.8) Aδ := {z ∈ D : ω(z, A,D) < δ} and Eδ := {w ∈ E : ω(w,∆0, E) < 1− δ} .
Then by the construction in (6.7), we remark that f̂0(z, ·) is holomorphic on Eδ for
every fixed z ∈ Aδ. We are able to define a new function f̃δ on X (Aδ, B;D,Eδ) as
follows
(6.9) f̃δ(z, τ) :=
f̂0(z, τ) (z, τ) ∈ Aδ × Eδ,
f(z, ψ0(τ)) (z, τ) ∈ D ×∆0.
Using the hypotheses on f and the previous remark, we see that f̃δ ∈
o (Aδ, B;D,Eδ) ,C
Observe that Aδ is an open set in D. Consequently, f̃δ satisfies the hypothe-
ses of Theorem 4.2. Applying this theorem yields a unique function f̂δ ∈
X̂ (Aδ, B;D,Eδ) ,C
such that
f̂δ(z, w) = f̃δ(z, w), (z, w) ∈ Aδ × Eδ.
This, combined with (6.9), implies that f̂0 is holomorphic on Aδ ×Gδ. On the other
hand, it follows from (6.6) and (6.8) that
Ŵ0 = X̂ (A,∆0;D,E) =
0<δ<1
Aδ ×Gδ.
Hence, f̂0 ∈ O(Ŵ0,C).
In summary, we have shown that f̂0, given by (6.7), is well-defined and holomor-
phic on Ŵ0.
30 VIÊT-ANH NGUYÊN
Now we are able to prove that f̂ , given by (6.5), is well-defined. To this end we fix
an arbitrary point (z0, w0) ∈ Ŵ , an ǫ0 : 0 < ǫ0 < 1− ω(z0, D,G), and two arbitrary
ǫ0-candidates (ψ1,∆1) and (ψ2,∆2) for (w0, B,G). Let
Ŵj := {(z, τ) ∈ D ×E : ω(z, A,D) + ω(τ,∆j, E) < 1} , j ∈ {1, 2}.
Using formula (6.7) define, for j ∈ {1, 2}, a function f̂j : Ŵj −→ C as follows
(6.10) f̂j(z, τ) := f̂φ,ψj(0, τ).
Here we have used any ǫ-candidate (φ,Γ) for (z, A,D) with a suitable ǫ > 0. Let
τj ∈ E be such that ψj(τj) = w0, j ∈ {1, 2}. Then, in virtue of (6.5) and (6.10) and
the result of the previous paragraph on the well-definedness of f̂0, the well-defined
property of f̂ is reduced to showing that
(6.11) f̂1(φ(t), τ1) = f̂2(φ(t), τ2)
for all t ∈ E and all ǫ-candidates (φ,Γ) for (φ(t), A,D), such that
ω(t,Γ, A) < ǫ := 1− max
j∈{1,2}
{ω(τ1,∆1, E), ω(τ2,∆2, E)} .
Observe that (6.11) follows from an argument based on Part 2) of Lemma 4.5. Hence,
f̂ is well-defined on Ŵ .
As in (6.8), for all 0 < δ < 1 let
Aδ := {z ∈ D : ω(z, A,D) < δ} , Bδ := {w ∈ G : ω(w,B,G) < δ} ,
Dδ := {z ∈ D : ω(z, A,D) < 1− δ} , Gδ := {w ∈ G : ω(w,B,G) < 1− δ} .
(6.12)
Now we combine (6.8) and (6.12) and the result that f̂0, given by (6.7), is well-defined
and holomorphic on Ŵ0, and the result that f̂ is well-defined on Ŵ . Consequently,
we obtain that
f̂(·, w) ∈ O(Dδ,C), w ∈ Bδ, 0 < δ < 1.
Since the formula (6.5) for f̂ is symmetric in two variables (z, w), one also gets that
f̂(z, ·) ∈ O(Gδ,C), z ∈ Aδ, 0 < δ < 1.
Since by (6.12),
0<δ<1
Aδ ×Gδ =
0<δ<1
Dδ × Bδ,
it follows from the previous conclusions that, for all points (z, w) ∈ Ŵ , there is an
open neighborhood U of z (resp. V of w) such that f ∈ Os(X
o(U, V ;U, V ),C). By
the classical Hartogs extension theorem, f ∈ O(U × V,C). Hence, f̂ ∈ O(Ŵ ,C).
On the other hand, it follows from (6.5) and the estimate in Theorem 4.3 that
(6.13) |f̂ |cW ≤ |f |W .
This completes Step 1. �
Step 2: f |A×B ∈ C(A× B,C).
A UNIFIED APPROACH 31
Proof of Step 2. Using the hypotheses we only need to check the continuity of f |A×B
at every point (a0, w0) ∈ A × (G ∩ B) and at every point (z0, b0) ∈ (D ∩ A) × B.
We will verify the first assertion. To do this let (ak)
k=1 ⊂ A and (wk)
k=1 ⊂ (G∩B)
such that lim
ak = a0 and lim
wk = w0. We need to show that
(6.14) lim
f(ak, wk) = f(a0, w0).
Since f |W is locally bounded, we may choose an open connected neighborhood V
of w0 such that sup
|f(ak, ·)|V <∞. Consequently, by Montel’s Theorem, there is a
sequence (kp)
p=1 such that (f(akp, ·)) converges uniformly on compact subsets of V
to a function g ∈ O(V ). Equality (6.14) is reduced to showing that g = f(a0, ·) on
V. Since f ∈ Cs(W,C), we deduce that f(a0, ·) = g on B ∩ V. On the other hand,
B ∩V is non locally pluripolar because B is locally pluriregular and w0 ∈ B. Hence,
we conclude by the uniqueness principle that g = f(a0, ·) on V. �
Step 3: f̂ admits A-limit f(ζ, η) at all points (ζ, η) ∈ W.
Proof of Step 3. To this end we only need to prove that
(6.15)
A− lim sup |f̂ − f(ζ0, η0)|
(ζ0, η0) < ǫ0
for an arbitrary fixed point (ζ0, η0) ∈ W and an arbitrary fixed 0 < ǫ0 < 1. Suppose
without loss of generality that
(6.16) |f |W ≤
First consider (ζ0, η0) ∈ A × B. Since f ∈ C(A × B,C), one may find an open
neighborhood U of ζ0 in C
n (resp. V of η0 in C
m) so that
(6.17) |f − f(ζ0, η0)|(A∩U)×(B∩V ) <
Consider the open sets
(6.18)
z ∈ D : ω(z, A ∩ U,D) <
and G
w ∈ G : ω(w,B ∩ V,G) <
In virtue of (6.16)–(6.18), an application of Theorem 3.6 gives that
|f(ζ, w)− f(ζ, η0)| ≤ (
)1−ω(w,B∩V,G) ≤
, ζ ∈ A ∩ U, w ∈ G
Hence,
(6.19) |f − f(ζ0, η0)|X(A∩U,B∩V ;D′ ,G′ ) ≤
Consider the function g : X(A ∩ U,B ∩ V ;D
) −→ C, given by
(6.20) g(z, w) := f(z, w)− f(ζ0, η0).
32 VIÊT-ANH NGUYÊN
Applying the result of Step 1, we can construct a function ĝ ∈ O(X̂(A ∩ U,B ∩
),C) from g in exactly the same way as we obtain f̂ ∈ O(Ŵ ,C) from f.
Moreover, combining (6.5) and (6.20), we see that
(6.21) ĝ = f̂ − f(ζ0, η0) on X̂(A ∩ U,B ∩ V ;D
On the other hand, it follows from formula (6.20), estimate (6.19), and estimate
(6.13) that
|ĝ|bX(A∩U,B∩V ;D′ ,G′) ≤
This, combined with (6.21) and (6.18), implies that
A− lim sup |f̂(z, w)− f(ζ0, η0)|
(ζ0, η0) ≤
Hence, (6.15) follows. In summary, we have shown that A− lim f̂ = f on A× B.
Now it remains to consider (ζ0, η0) ∈ A ×G. Using the last limit and arguing as
in Step 2, one can show that A− lim f̂(ζ0, η0) = f(ζ0, η0). �
Step 4: Proof of the uniqueness of f̂ and (6.1).
Proof of Step 4. To prove the uniqueness of f̂ suppose that ĝ ∈ O(Ŵ ,C) is a
bounded function which admits A-limit f(ζ, η) at all points (ζ, η) ∈ W. Fix an
arbitrary point (z0, w0) ∈ Ŵ , it suffices to show that f̂(z0, w0) = ĝ(z0, w0). Observe
that both functions f̂(z0, ·) and ĝ(z0, ·) are bounded and holomorphic on the δ-level
set of G relative to B :
Gδ,B := {w ∈ G : ω(w,B,G) < 1− ω(z0, A,D)} ,
where δ := ω(z0, A,D). On the other hand, they admit A-limit f(z0, η) at all
points η ∈ B. Consequently, applying Proposition 3.5 and Theorem 3.7 yields that
f̂(z0, ·) = ĝ(z0, ·) on Gδ,B. Hence, f̂(z0, w0) = ĝ(z0, w0).
To prove (6.1) fix an arbitrary point (z0, w0) ∈ Ŵ . For every η ∈ B, applying
Theorem 3.6 to log |f(·, η)| defined on D, we obtain that
(6.22) |f(z0, η)| ≤ |f |
1−ω(z0,A,D)
A×B |f |
ω(z0,A,D)
Applying Theorem 3.6 again to log |f̂(z0, ·)| defined on Gδ,B of the preceeding para-
graph, one gets that
|f̂(z0, w0)| ≤ |f(z0, ·)|
1−ω(w0,B,G)
B |f̂ |
ω(w0,B,G)
Inserting (6.13) and (6.22) into the right hand side of the latter estimate, (6.1)
follows. Hence Step 4 is finished. �
This completes the proof. �
In the sequel we will need the following refined version of Theorem 6.1.
Theorem 6.2. Let D ⊂ Cn, G ⊂ Cm be bounded open sets. D (resp. G) is equipped
with a system of approach regions
Aα(ζ)
ζ∈D, α∈Iζ
(resp.
Aα(η)
η∈G, α∈Iη
). Let
A UNIFIED APPROACH 33
A, A0 (resp. B, B0) be subsets of D (resp. G) such that A0 and B0 are locally
pluriregular and that A0 ⊂ A
∗ and B0 ⊂ B
∗. Put
W := X(A,B;D,G) and W0 := X(A0, B0;D,G).
Then, for every bounded function f : W −→ C which satisfies the following condi-
tions:
• f ∈ Cs(W,C) ∩ Os(W
o,C);
• f |A×B is continuous at all points of (A ∩ ∂D)× (B ∩ ∂G),
there exists a unique bounded function f̂ ∈ O(Ŵ0,C) which admits A-limit f(ζ, η)
at all points (ζ, η) ∈ W0. Moreover,
(6.23) |f̂(z, w)| ≤ |f |
1−ω(z,A0,D)−ω(w,B0,G)
A0×B0
ω(z,A0,D)+ω(w,B0,G)
W , (z, w) ∈ Ŵ0.
Proof. Using the hypotheses and applying Part 1) of Theorem 7.2 below we can ex-
tend f to a locally bounded function (still denoted by) f defined on X(A∗, B∗, D,G)
such that f ∈ Os
o(A∗, B∗, D,G),C
and that f |X(A∗∩D,B∗∩G;D,G) is continuous.
Therefore, the newly defined function f satisfies
(6.24) f(a, b) := lim
f(ak, b),
where (a, b) is an arbitrary point of A∗ × (G∪B∗) and (ak)
k=1 ⊂ A
∗ is an arbitrary
sequence with lim
ak = a. Since f |W is bounded, it follows that the newly defined
function f is also bounded. In virtue of the definition of A∗ and B∗ we have
(6.25) ∂D ∩ A = ∂D ∩A∗ and ∂G ∩ B = ∂G ∩ B∗.
Using the second • in the hypotheses and formula (6.24) we see that f |A∗×B∗ is
continuous at all points all (∂D ∩ A) × (∂G ∩ B). Consequently, arguing as in the
proof of Step 2 of Theorem 6.1 and using (6.25), we can show that f ∈ C
. In summary, the newly defined function f which is defined and bounded on
X(A∗, B∗, D,G) satisfies
(6.26) f ∈ Os
o(A∗, B∗, D,G),C
and f ∈ C
A∗ × B∗,C
Observe that f is only separately continuous on X(A,B;D,G), but it is not nec-
essarily so on the cross X
A∗, B∗, D,G
. However, we will show that one can adapt
the argument of Theorem 6.1 in order to prove Theorem 6.2.
We define f̂ at an arbitrary point (z0, w0) ∈ Ŵ0 as follows: Let ǫ > 0 be such that
ω(z0, A0, D) + ω(w0, B0, G) + 2ǫ < 1.
By Theorem 3.8 and Definition 3.9, there is an ǫ-candidate (φ,Γ) (resp. (ψ,∆)) for
(z0, A0, D) (resp. (w0, B,G)). To conclude the proof we only need to prove that the
function fφ,ψ, defined by
fφ,ψ(t, τ) := f(φ(t), ψ(τ)), (t, τ) ∈ X (Γ,∆;E,E) ,
satisfies the hypotheses of Theorem 4.3. Indeed, having proved this assertion, the
proof will follow along the same lines as those given in Theorem 6.1. This assertion
is again reduced to showing that for each fixed t ∈ Γ, the function fφ,ψ(t, ·) admits
the angular limit f(φ(t), ψ(τ)) for every point τ ∈ ∆. We will prove the last claim.
34 VIÊT-ANH NGUYÊN
Using the first • and Theorem 3.8, we see that for every a ∈ A, the function
f(a, ψ(·)) ∈ O(E,C) admits the angular limit f(a, ψ(τ)) for every point τ ∈ ∆.
Next, using the hypothesis A0 ⊂ A
∗ we may choose a sequence (ak)
k=1 ⊂ A ∩ A
such that lim
ak = φ(t) ∈ A0. Observe from (6.26) that for every k the uniformly
bounded function f(ak, ψ(·)) ∈ O(E,C) admits the angular limit f(ak, ψ(τ)) and
that lim
f(ak, ψ(τ)) = f(φ(t), ψ(τ)) for every point τ ∈ ∆. Consequently, by the
Khinchin–Ostrowski Theorem (see [11, Theorem 4, p. 397]), the above claim follows.
7. Preparatory results
The first result of this section shows that the two definitions of plurisubharmonic
measure ω̃(·, A,D), given respectively in Definition 2.3 and in Subsection 2.1 of [28],
coincide in the case when A ⊂ D.
Proposition 7.1. Let X be a complex manifold and D ⊂ X an open set. D is
equipped with the canonical system A of approach regions. Let A be a subset of D.
Then ω̃(z, A,D) = ω(z, A∗, D).
Proof. Let P ∈ E(A). Then by Definition 2.3, P ⊂ A∗ and P is locally pluriregular.
Hence, P ⊂ (A∗)∗ = A∗. Since P ∈ E(A) is arbitrary, it follows from Definition 2.3
that Ã is locally pluriregular and Ã ⊂ A∗. In particular, (Ã)∗ ⊂ A∗ and
(7.1) ω̃(z, A,D) = ω(z, Ã, D) ≥ ω(z, A∗, D).
In the sequel we will show that
(7.2) A∗ ⊂ (Ã)∗.
Taking (7.2) for granted, we have that A∗ = (Ã)∗. Consequently,
ω̃(z, A,D) = ω(z, Ã, D) ≤ ω(z, A∗, D).
This, coupled with (7.1), completes the proof.
To prove (7.2) fix an arbitrary point a ∈ A∗ and an arbitrary but sufficiently small
neighborhood U ⊂ X of a such that U is biholomorphic to a bounded open set in
n, where n is the dimension of X at a. Since A∗ is a Borel subset of D, Theorem
8.5 in [7] provides a subset P ⊂ A∗ ∩ U of type Fσ
10 such that
(7.3) ω(z, P, U) = ω(z, A∗ ∩ U, U), z ∈ U.
Write P =
Pn, where Pn is closed. Observe that Pn ∩ P
n is locally pluriregular,
Pn \ (Pn ∩ P
n) is locally pluripolar and Pn ∩ P
n ⊂ Pn ⊂ A
∗ ∩ P. Consequently,⋃
(Pn ∩ P
n) ⊂ Ã ∩ P and P \
(Pn ∩ P
n) is locally pluripolar. This implies that
ω(z, Ã ∩ U, U) ≤ ω
(Pn ∩ P
n), U
= ω(z, P, U),
10 This means that P is a countable (or finite) union of relatively closed subsets of U.
A UNIFIED APPROACH 35
where the equality holds by applying Lemma 3.5.3 in [18] and by using the fact that
U is biholomorphic to a bounded open set in Cn. This, combined with (7.3) and the
assumption a ∈ A∗, implies that ω(a, Ã ∩ U, U) = 0. Thus (7.2) follows. �
The main purpose of this and the next sections is to generalize Theorem 6.1 to the
case where the “target space” Z is an arbitrary complex analytic space possessing
the Hartogs extension property.
Theorem 7.2. Let D ⊂ Cn, G ⊂ Cm be two bounded open sets. D (resp. G) is
equipped with the canonical system of approach regions. Let Z be a complex analytic
space possessing the Hartogs extension property. Let A (resp. B) be a subset of D
(resp. G). Put W := X(A,B;D,G) and Ŵ := X̂(A,B;D,G). Let f ∈ Os(W
o, Z).
1) Then f extends to a mapping (still denoted by) f defined on Xo(A∪A∗, B ∪
B∗;D,G) such that f is separately holomorphic on Xo(A∪A∗, B∪B∗;D,G)
and that f |Xo(A∗,B∗;D,G) is continuous.
2) Suppose in addition that A and B are locally pluriregular. Then f extends
to a unique mapping f̂ ∈ O(Ŵ , Z) such that f̂ = f on W.
Proof. This result has already been proved in Théorème 2.2.4 in [5] starting from
Proposition 3.2.1 therein. In the latter proposition Alehyane and Zeriahi make use
of the method of doubly orthogonal bases of Bergman type. We can avoid this
method by simply replacing every application of this proposition by Theorem 6.1.
Keeping this change in mind and using Proposition 7.1, the remaining part of the
proof follows along the same lines as that of Théorème 2.2.4 in [5]. �
Theorem 7.3. Let D, G be complex manifolds, and let A ⊂ D, B ⊂ G be open
subsets. Let Z be a complex analytic space possessing the Hartogs extension property.
Put W := X(A,B;D,G) and Ŵ := X̂(A,B;D,G). Then for any mapping f ∈
Os(W,Z), there is a unique mapping f̂ ∈ O(Ŵ , Z) such that f̂ = f on W.
Proof. It has already been proved in Theorem 5.1 of [28]. The only places where the
method of doubly orthogonal bases of Bergman type is involved is the applications
of Théorème 2.2.4 in [5]. As we already pointed out in Theorem 7.2, one can avoid
this method by using Theorem 6.1 instead. �
We are ready to formulate a slight generalization of Theorems 6.2 and 7.2.
Theorem 7.4. Let D ⊂ Cn, G ⊂ Cm be bounded open sets. D (resp. G) is equipped
with a system of approach regions
Aα(ζ)
ζ∈D, α∈Iζ
(resp.
Aβ(η)
η∈G, β∈Iη
). Let A
and A0 (resp. B and B0) be two subsets of D (resp. G) such that A0 and B0 are
locally pluriregular and that A0 ⊂ A
∗ and B0 ⊂ B
∗. Let Z be a complex analytic
space possessing the Hartogs extension property. Put
W := X(A,B;D,G) and W0 := X(A0, B0;D,G).
Then, for every bounded mapping f : W −→ Z which satisfies the following condi-
tions:
• f ∈ Cs(W,Z) ∩ Os(W
o, Z);
36 VIÊT-ANH NGUYÊN
• f |A×B is continuous at all points of (A ∩ ∂D)× (B ∩ ∂G),
there exists a unique bounded mapping f̂ ∈ O(Ŵ0,C) which admits A-limit f(ζ, η)
at all points (ζ, η) ∈ W0.
Proof. Since f is bounded, one may find an open neighborhood U of f(W ) in Z and
a holomorphic embedding φ of U into the polydisc Ek of Ck such that φ(U) is an
analytic set in Ek. Now we are able to apply Theorem 6.2 to the mapping φ ◦ f :
W −→ Ck. Consequently, one obtains a unique bounded mapping F ∈ O(Ŵ ,Ck)
which admits A-limit (φ ◦ f)(ζ, η) at all points (ζ, η) ∈ W. Using estimate (6.23)
one can show that F ∈ O(Ŵ , Ek). Now using Theorem 3.7 it is not difficult to see
that F (Ŵ ) ⊂ φ(U). Consequently, one can define the desired extension mapping f̂
as follows:
f̂(z, w) := (φ−1 ◦ F )(z, w), (z, w) ∈ Ŵ .
The following Uniqueness Theorem for holomorphic mappings generalizes Theo-
rem 3.7.
Theorem 7.5. Let X be a complex manifold, D ⊂ X an open subset and Z a
complex analytic space. Suppose that D is equipped with a system of approach regions(
Aα(ζ)
ζ∈D, α∈Iζ
. Let A ⊂ D be a locally pluriregular set. Let f1, f2 : D∪A −→ Z
be locally bounded mappings such that f1|D, f2|D ∈ O(D,Z) and A− lim f1 = A −
lim f2 on A. Then f1(z) = f2(z) for all z ∈ D such that ω(z, A,D) 6= 1.
We leave the proof to the interested reader. Finally, we conclude this section with
the following Gluing Lemma.
Lemma 7.6. Let D and G be open subsets of some complex manifolds and Z a com-
plex analytic space. Suppose that D (resp. G) is equipped with a system of approach
regions
Aα(ζ)
ζ∈D, α∈Iζ
(resp.
Aβ(η)
η∈G, β∈Iη
). Let (Dk)
(resp. (Gk)
a family of open subsets of D (resp. G) equipped with the induced system of approach
regions. Let (Pk)
(resp. (Qk)
) be a family of locally pluriregular subsets of
D (resp. G). Suppose, in addition, that
(i) Pk ⊂ Pk0, Dk0 ⊂ Dk, and Pk is locally pluriregular relative to Dk0 . Similarly,
Qk ⊂ Qk0, Gk0 ⊂ Gk, and Qk is locally pluriregular relative to Gk0 .
(ii) There are a family of locally bounded mappings (fk)
such that fk :
o (Pk,Qk;Dk,Gk) −→ Z verifies fk = fk0 on X
o (Pk,Qk;Dk0,Gk0) ,
and a family of holomorphic mappings (f̂k)
such that f̂k ∈
X̂ (Pk,Qk;Dk,Gk) , Z
, and
(A− lim f̂k)(z, w) = fk(z, w), (z, w) ∈ X
o (Pk,Qk;Dk0,Gk0) .
(iii) There are open subsets U of D and V of G such that ω̃(z,Pk,Dk0) +
ω̃(w,Qk,Gk0) < 1 for all (z, w) ∈ U × V and k ≥ k0.
Then f̂k(z, w) = f̂k0(z, w) for all (z, w) ∈ U × V and k ≥ k0.
A UNIFIED APPROACH 37
Proof. By (iii), we have that
(7.4) U × V ⊂ H := X̂ (Pk,Qk;Dk0,Gk0) .
On the other hand, using (i) we see that
(7.5) H ⊂ X̂ (Pk,Qk;Dk,Gk) ∩ X̂ (Pk0 ,Qk0;Dk0,Gk0) .
Fix arbitrary (z0, w0) ∈ H and k ≥ k0. Observe that both mappings f̂k(·, w0) and
f̂k0(·, w0) are defined on {z ∈ Dk0 : ω(z,Pk,Dk0) < 1− ω(w0,Qk,Gk0)} . Using (ii)
and Proposition 3.5, we may apply Theorem 7.5 to these mappings and conclude
that f̂k(z0, w0) = f̂k0(z0, w0). �
8. Local and semi-local versions of Theorem A
The aim of this section is to generalize Theorem 6.2 to some cases where the
“target space” Z is a complex analytic space possessing the Hartogs extension prop-
erty. Our philosophy is the following: we first apply Theorem 6.2 locally in order to
obtain various local extension mappings, then we glue them together. The gluing
process needs the following
Definition 8.1. Let M be a complex manifold and Z a complex space. Let (Uj)j∈J
be a family of open subsets of M, and (fj)j∈J a family of mappings such that fj ∈
O(Uj , Z). We say that the family (fj)j∈J is collective if, for any j, k ∈ J, fj = fk
on Uj ∩ Uk. The unique holomorphic mapping f :
Uj −→ Z, defined by f := fj
on Uj, j ∈ J, is called the collected mapping of (fj)j∈J .
We arrive at the following local version of Theorem A.
Theorem 8.2. Let D ⊂ Cp, G ⊂ Cq be bounded open sets and Z a complex analytic
space possessing the Hartogs extension property. D (resp. G) is equipped with a
system of approach regions
Aα(ζ)
ζ∈D, α∈Iζ
(resp.
Aβ(η)
η∈G, β∈Iη
). Let A, A0
(resp. B, B0) be subsets of D (resp. G) such that A0 and B0 are locally pluriregular
and that A0 ⊂ A
∗ and B0 ⊂ B
∗. Put
W := X(A,B;D,G) and W0 := X(A0, B0;D,G).
Then, for every mapping f : W −→ Z which satisfies the following conditions:
• f ∈ Cs(W,Z) ∩ Os(W
o, Z);
• f is locally bounded along X
A ∩ ∂D,B ∩ ∂G;D,G
• f |A×B is continuous at all points of (A ∩ ∂D)× (B ∩ ∂G),
there exists a unique mapping f̂ ∈ O(Ŵ0, Z) which admits A-limit f(ζ, η) at all
points (ζ, η) ∈ W0.
Theorem 8.2 generalizes Theorem 6.2 to the case where the “target space” Z is an
arbitrary complex analytic space possessing the Hartogs extension property. Since
the proof is somewhat technical, the reader may skip it at the first reading.
38 VIÊT-ANH NGUYÊN
Proof. Recall that for a ∈ Ck and r > 0, B(a, r) denotes the open ball centered at a
with radius r. For 0 < δ < 1 and 0 < r put
Da,δ,r := {z ∈ D ∩ B(a, r) : ω(A0 ∩ B(a, r), D ∩ B(a, r)) < δ} , a ∈ A0,
Gb,δ,r := {w ∈ G ∩ B(b, r) : ω(B0 ∩ B(b, r), G ∩ B(b, r)) < δ} , b ∈ B0.
(8.1)
Applying Part 1) of Theorem 7.2 and using the hypotheses on f, we see that f
extends to a mapping defined on X(A ∪A∗, B ∪B∗;D,G) such that f is separately
holomorphic on Xo(A∪A∗, B ∪B∗;D,G) and that f |X(A∗,B∗;D,G) is locally bounded.
Therefore, using the compactness of A0 and B0, one may find a real number r0 > 0
such that
(8.2) fa,b := f |X(A0∩B(a,r),B0∩B(b,r);D∩B(a,r),G∩B(b,r))
is bounded for all 0 < r ≤ r0 and a ∈ A0, b ∈ B0. Applying Theorem 7.4 to fa,b ,
one obtains a mapping
(8.3) f̂a,b ∈ O
A0 ∩ B(a, r), B0 ∩ B(b, r);D ∩ B(a, r), G ∩ B(b, r)
which admits A-limit f on X
A0 ∩ B(a, r), B0 ∩ B(b, r);D ∩ B(a, r), G ∩ B(b, r)
Fix 0 < δ0 <
. Then it follows from (8.1) that for 0 < r ≤ r0, a ∈ A0, b ∈ B0.
Da,δ0,r ×Gb,δ0,r ⊂ X̂
A0 ∩ B(a, r), B0 ∩ B(b, r);D ∩ B(a, r), G ∩ B(b, r)
This, combined with (8.3), implies that
(8.4) f̂a,b ∈ O (Da,δ0,r ×Gb,δ0,r, Z) , 0 < r ≤ r0, a ∈ A0, b ∈ B0.
Next we fix a finite covering (A0 ∩ B(am, r))
m=1 of A0 and (B0 ∩ B(bn, r))
n=1 of B0,
where (am)
m=1 ⊂ A0 and (bn)
n=1 ⊂ B0.
We divide the proof into two steps.
Step 1: Fix an open set G
⋐ G. Then there exists r1: 0 < r1 < r0 with the following
property: for every a ∈ A0 there exist an open subset Aa of D and a mapping
f̂ = f̂a ∈ O
Gbn,δ0,r0
such that
f̂(z, w) = f̂a,bn(z, w), (z, w) ∈ (Aa ∩Da,δ0,r0)×Gbn,δ0,r0, n = 1, . . . , N ;
and that Aa is of the form {z ∈ D∩B(a, r1) : ω(z, A0∩B(a, r1), D∩B(a, r1)) < δa}
for some 0 < δa < δ0.
Proof of Step 1. Fix an arbitrary point a0 ∈ A0. First we claim that there are a
sufficiently small number r1 : 0 < r1 < r0 and a finite number of open subsets
n=1 of G with the following properties:
(a) V1 = Gb1,δ0,r0 and (Gbn,δ0,r0)
⊂ (Vn)
n=1 (see the notation in (8.1));
(b) f |(A0∩B(a,r1))×Vn is bounded, n = 1, . . . , N0;
(c) G
A UNIFIED APPROACH 39
(d) Vn ∩ Vn+1 6= ∅, n = 1, . . . , N0 − 1.
Indeed, we first start with the test r1 := r0 and N0 := N and (Vn)
n=1 :=
(Gbn,δ0)
. In virtue of (8.2) we see that our choice satisfies (a)–(b). If (c)–(d)
are satisfied then we are done. Otherwise, we will make the following procedure.
Fix a point w0 ∈ G
. For n = 1, . . . , N, let γn : [0, 1] → G be a continuous
one-to-one map such that
γn(0) = w0 and γn(1) ∈ Gbn,δ0,r0.
Since f is locally bounded, there exist sufficiently small numbers r1, s : 0 < r1 ≤
r0 and 0 < s such that f |(A0∩B(a,r1))×B(w,s) is bounded for all a ∈ A0 and w ∈
γn([0, 1]). Therefore, we may add to the starting collection (Vn)
n=1 some balls
of the form B(w, s), where w ∈ G
γn([0, 1]), and the new collection (Vn)
still satisfies (a)–(b). Now it remains to show that by adding a finite number of
suitable balls B(w, s), (c)–(d) are also satisfied. But this assertion follows from an
almost obvious geometric argument. In fact, we may renumber the collection (Vn)
if necessary. Hence, the above claim has been shown.
Using (c)–(d) above we may fix open sets Un ⋐ Vn for n = 1, . . . , N0, such that
(8.5) G
Un and Un ∩ Un−1 6= ∅, 1 < n ≤ N0.
In what follows we will find the desired set Aa0 and the desired holomorphic mapping
f̂ after N0 steps. Namely, after the n-th step (1 ≤ n ≤ N0), we construct an
open subset An of D in the form Da0,δn,r1 for a suitable δn > 0, and a mapping
f̂n ∈ O
. Finally, we obtain Aa0 := AN0 and f̂ := f̂N0. Now we
carry out this construction.
In the first step, using (8.1), (8.3), (8.4) and (a), we define
δ1 := δ0, A1 := Da0,δ1,r1 and f̂1(z, w) := f̂a0,b1(z, w), (z, w) ∈ A1 × U1.
Suppose that we have constructed an open subset An−1 of D and a mapping f̂n−1 ∈
An−1 ×
( n−1⋃
for some n : 2 ≤ n ≤ N0. We wish to construct an open
subset An of D and a mapping f̂n ∈ O
. There are two cases to
consider.
Case Vn = Gbm,δ0 for some 1 ≤ m ≤ N.
In this case let δn := δn−1 and An := An−1 = Da0,δn−1,r1, and
f̂n :=
f̂n−1, on An ×
( n−1⋃
f̂a0,bm, on An × Un.
40 VIÊT-ANH NGUYÊN
Case Vn 6∈
Gbm,δ0
By (8.5) fix a nonempty open set K ⋐ Un ∩ Un−1. Then by the induction, f̂n−1 ∈
O (An−1 ×K,Z) . Recall from (b) that f : (A0 ∩ B(a0, r1))× Vn −→ Z is bounded.
Since f is locally bounded, by decreasing r1 > 0 (if necessary) we may assume that
g := f |
X(A0∩B(a0,r1),K;D∩B(a0,r1),Vn)
is bounded. Applying Theorem 7.4 to g, we obtain
ĝ ∈ O
X̂(A0 ∩ B(a0, r1), K;D ∩ B(a0, r1), Vn), Z
which extends g. Since Un ⋐ Vn, we may choose δn such that 0 < δn < 1 −
ω(w,K, Vn). Using this and (8.1), it follows that
Da0,δn,r1 × Un ⊂ X̂(A0 ∩ B(a0, r1), K;D ∩ B(a0, r1), Vn).
Therefore, let An := Da0,δn,r1 and define
f̂n :=
f̂n−1, on An ×
( n−1⋃
ĝ, on An × Un.
This completes our construction in the n-step. Finally, we put Aa0 := AN0 and
f̂a0 := f̂N0 . Using this and (8.3) and (8.5) and (a), the desired conclusion of Step 1
follows. �
Step 2: Completion of the proof.
Proof of Step 2. Fix a sequence of relatively compact open subsets (D
k=1 of D
(resp. (G
k=1 of G) such that D
k ր D and G
k ր G as k ր ∞. Put
(8.6) Dk := D
Dam,δ0,r0, Gk := G
Gbn,δ0,r0, k ≥ 1.
Using the result of Step 1, we may find, for every k, a number 0 < rk < r0 with the
following properties:
• for every a ∈ A0, there is 0 < δa,k < δ0 such that by considering the open set
Aa,k := {z ∈ D ∩ B(a, rk) : ω (z, A0 ∩ B(a, rk), D ∩ B(a, rk)) < δa,k}
one can find a mapping f̂a,k ∈ O (Aa,k ×Gk, Z) satisfying
(8.7) f̂a,k = f̂a,bn on (Aa,k ∩Da,δ0,rk)×Gbn,δ0,rk , n = 1, . . . , N ;
• for every b ∈ B, there is 0 < δb,k < δ0 such that by considering the open set
Bb,k := {w ∈ G ∩ B(b, rk) : ω (z, B0 ∩ B(b, rk), G ∩ B(b, rk)) < δb,k}
one can find a mapping f̂b,k ∈ O (Dk ×Bb,k, Z) satisfying
(8.8) f̂b,k = f̂am,b on Dam,δ0,rk × (Bb,k ∩Gb,δ0,rk), m = 1, . . . ,M.
A UNIFIED APPROACH 41
Next using the compactness of A0 and B0, one may find, for every k, two fi-
nite coverings (A0 ∩ B(a
m, rk))
of A0 and (B0 ∩ B(bn′ , rk))
of B0, where
(am′ )
⊂ A0 and (bn′ )
⊂ B0. Put
(8.9) Ak :=
′ ,k and Bk :=
′ ,k, k ≥ 1.
In virtue of (8.6)–(8.9) and (8.2)–(8.4), the family (f̂a
′ ,k)
is col-
lective for every k ≥ 1. Let
(8.10) f̂k ∈ O
X(Ak, Bk;Dk, Gk), Z
denote the collected mapping of this family.
Next, we show that
(8.11)
ω(z, A0, Dk) = ω(z, A0, D) and lim
ω(w,B0, Gk) = ω(z, B0, G), z ∈ D, w ∈ G.
It is sufficient to prove the first identity in (8.11) since the proof of the second one
is similar. Observe that there is u ∈ PSH(D) such that ω(·, A0, Dk) ց u as k ր ∞
and u ≥ ω(·, A0, D) on D. Therefore, the proof of (8.11) will be complete if one can
show that u ≤ ω(·, A0, D) on D.
To this end observe that for every a ∈ A0 there is 1 ≤ m ≤ M such that
a ∈ B(am, r0). Consequently, using (8.6),
(A− lim sup u)(a) ≤
A− lim supω(·, A0 ∩ B(am, r0), Dam,δ0,r0)
(a) = 0,
where the equality follows from an application of Proposition 3.5. This, combined
with the obvious inequality u ≤ 1, implies that u ≤ ω(·, A0, D). Hence, (8.11)
follows.
We are now in the position to define the desired extension mapping f̂ . Indeed,
one glues
given in (8.10) together to obtain f̂ in the following way
f̂ := lim
f̂k on Ŵ0.
One needs to check that the last limit exists and possesses all the required properties.
In virtue of (8.7)–(8.11), and the Gluing Lemma 7.6, the proof will be complete if
we can show the following
Claim. For every (z0, w0) ∈ Ŵ0, there are an open neighborhood U × V of (z0, w0)
and δ0 > 0 such that the hypotheses of Lemma 7.6 is fulfilled with
D := D, G := G, Pk := Ak, Qk := Bk, Dk := Dk, Gk := Gk, k ≥ 1.
To this end let
δ0 :=
1− ω(z0, A0, D)− ω(w0, B0, G)
and let U × V be an open neighborhood of (z0, w0) such that
ω(z, A0, D) + ω(w,B0, G) < ω(z0, A0, D) + ω(w0, B0, G) + δ0.
42 VIÊT-ANH NGUYÊN
Then using these inequalities and (8.11), we see that there is a sufficiently big q0 ∈ N
such that for q0 ≤ q ≤ p and (z, w) ∈ U × V,
ω(z, Ap, Dq) + ω(w,Bp, Dq) ≤ ω(z, A0, Dq) + ω(w,B0, Gq)
≤ ω(z, A0, D) + ω(w,B0, G) + δ0 < 1.
This proves the above claim. Hence, the proof of the theorem is finished. �
Now we are able to formulate the following semi-local result.
Theorem 8.3. Let D be an open subset of a complex manifold and G ⊂ Cm a
bounded open set and Z a complex analytic space possessing the Hartogs extension
property. D (resp. G) is equipped with the canonical system of approach regions
(resp. the system of approach regions
Aβ(η)
η∈G, α∈Iη
). Let A be an open subset of
D and let B, B0 be subsets of G such that B0 is locally pluriregular and B0 ⊂ B
W := X(A,B;D,G) and W0 := X(A,B0;D,G).
Then, for every mapping f : W −→ Z which satisfies the following conditions:
• f ∈ Cs(W,Z) ∩ Os(W
o, Z);
• f is locally bounded along D × (B ∩ ∂G),
there exists a unique mapping f̂ ∈ O(Ŵ0, Z) which admits A-limit f(ζ, η) at all
points (ζ, η) ∈ W0.
Proof. First, applying Part 1) of Theorem 7.2 and using the hypotheses on f, we
see that f extends to a mapping (still denoted by) f defined on X(A,B ∪B∗;D,G)
such that f is separately holomorphic on Xo(A,B ∪B∗;D,G) and that f |X(A,B∗;D,G)
is locally bounded.
We define f̂ at a point (z0, w0) ∈ Ŵ0 as follows: Let ǫ > 0 be such that
(8.12) ω(z0, A,D) + ω(w0, B0, G) + ǫ < 1.
By Theorem 3.1 and Proposition 3.4, there is a holomorphic disc φ ∈ O(E,D) such
that φ(0) = z0 and
(8.13) 1−
·mes(φ−1(A) ∩ ∂E) < ω(z0, A,D) + ǫ.
Moreover, using the hypotheses, we see that the mapping fφ, defined by
(8.14) fφ(t, w) := f(φ(t), w), (t, w) ∈ X
φ−1(A) ∩ ∂E,B;E,G
satisfies the hypotheses of Theorem 8.2. By this theorem, let f̂φ be the unique
mapping in X̂ (φ−1(A) ∩ ∂E,B0;E,G) such that
(8.15) (A− lim f̂φ)(t, w) = fφ(t, w), (t, w) ∈ X
φ−1(A) ∩ ∂E,B0;E,G
In virtue of (8.12)–(8.13), (0, w0) ∈ X̂ (φ
−1(A) ∩ ∂E,B0;E,G) . Then the value at
(z0, w0) of the desired extension mapping f̂ is given by
f̂(z0, w0) := f̂φ(0, w0).
A UNIFIED APPROACH 43
Using this and (8.14)–(8.15), and arguing as in Part 2) of Lemma 4.5, one can show
that f̂ is well-defined on Ŵ0.
To show that f̂ is holomorphic, one argues as in Step 1 of the proof of Theorem
6.1. To show that f̂ admits A-limit f(ζ, η) at all points (ζ, η) ∈ W0 and that it is
uniquely defined, one proceeds as in Step 2–4 of the proof of Theorem 6.1 making
the obviously necessary changes and adaptations. Hence, the proof is finished. �
9. The proof of Theorem A
First we need a variant of Definition 2.3. For a set A ⊂ D, Let Ẽ(A) be the set of
all elements P ∈ E(A) with the property that there is an open neighborhood U ⊂ X
of P such that U is biholomorphic to a domain in some Cn. Then it can be checked
(9.1) Ã :=
P∈eE(A)
This identity will allow us to pass from “local informations” to “global extensions”.
For the proof we need to develop some preparatory results.
In virtue of (9.1), for any P ∈ Ẽ(A) (resp. Q ∈ Ẽ(B)) fix an open neighborhood
UP of P (resp. VQ of Q) such that UP (resp. VQ) is biholomorphic to a domain in
dP (resp. in CdQ), where dP (resp. dQ) is the dimension of D (resp. G) at points
of P (resp. Q). For any 0 < δ ≤ 1
define
UP,δ := {z ∈ UP : ω(z, P, UP ) < δ} , P ∈ Ẽ(A),
VQ,δ := {w ∈ VQ : ω(w,Q, VQ) < δ} , Q ∈ Ẽ(B),
Aδ :=
P∈eE(A)
UP,δ, Bδ :=
Q∈eE(B)
VQ,δ,
Dδ := {z ∈ D : ω̃(z, A,D) < 1− δ} , Gδ := {w ∈ G : ω̃(w,B,G) < 1− δ} .
(9.2)
Lemma 9.1. We keep the above notation. Then:
(1) For every ζ ∈ Ã and α ∈ Iζ, there is an open neighborhood U of ζ such that
U ∩ Aα(ζ) ⊂ Aδ.
(2) Aδ is an open subset of D and Aδ ⊂ D1−δ ⊂ Dδ.
(3) ω̃(z, A,D)− δ ≤ ω(z, Aδ, D) ≤ ω̃(z, A,D), z ∈ D.
Proof of Lemma 9.1. To prove Part (1) fix, in view of (9.1)–(9.2), P ∈ Ẽ(A),
ζ ∈ P and α ∈ Iζ . Using the definition of local pluriregularity, we see that
lim sup
z→ζ, z∈Aα(ζ)
ω(z, P, UP ) = 0. Hence, Part (1) follows.
The assertion that Aδ is open follows immediately from (9.2). Since 0 < δ ≤
the second inclusion in Part (2) is clear. To prove the first inclusion let z be an
arbitrary point of Aδ. Then there is P ∈ Ẽ(A) such that z ∈ UP,δ. Using (9.2) and
Definition 2.3 we obtain
(9.3) ω̃(z, A,D) = ω(z, Ã, D) ≤ ω(z, P, UP ) < δ.
44 VIÊT-ANH NGUYÊN
Hence, z ∈ D1−δ, which in turn implies that Aδ ⊂ D1−δ.
It follows from Part (1) that
ω(z, Aδ, D) ≤ ω(z, Ã, D) = ω̃(z, A,D), z ∈ D,
which proves the second estimate in Part (3). To complete the proof let P ∈ Ẽ(A)
and 0 < δ ≤ 1
. We deduce from (9.3) that ω̃(z, A,D) − δ ≤ 0 for z ∈ UP,δ. Hence,
by (9.2),
ω̃(z, A,D)− δ ≤ 0, z ∈ Aδ.
On the other hand, ω̃(z, A,D) − δ < 1, z ∈ D. Recall from Part (2) that Aδ is an
open subset of Dδ. Consequently, the first estimate of Part (3) follows. �
Now we are able to to prove Theorem A in the following special case.
Proposition 9.2. Let D be an open subset of a complex manifold and G a bounded
open subset of Cm and Z a complex analytic space possessing the Hartogs extension
property. D (resp. G) is equipped with a system of approach regions
Aα(ζ)
ζ∈D, α∈Iζ
(resp.
Aβ(η)
η∈G, β∈Iη
). Let A be a subset of D, let B, B0 be subsets of G such
that B0 is locally pluriregular and B0 ⊂ B
∗. Put
W := X(A,B;D,G), W0 := X(A,B0;D,G), W̃
(D ∪ Ã)× B0
Ã× (G ∪ B0)
Ŵ o := {(z, w) ∈ D ×G : ω̃(z, A,D) + ω(w,B0, G) < 1} .
Then, for every mapping f : W −→ Z which satisfies the following conditions:
• f ∈ Cs(W,Z) ∩ Os(W
o, Z);
• f is locally bounded along X
A ∩ ∂D,B ∩ ∂G;D,G
• f |A×B is continuous at all points of (A ∩ ∂D)× (B ∩ ∂G),
there exists a unique mapping f̂ ∈ O(Ŵ o, Z) which admits A-limit f(ζ, η) at all
points (ζ, η) ∈ W̃ o.
Proof of Proposition 9.2. First, applying Part 1) of Theorem 7.2 and using the
hypotheses on f, we see that f extends to a mapping (still denoted by f) defined on
X(A ∪ A∗, B ∪ B∗;D,G) such that f is separately holomorphic on Xo(A ∪ A∗, B ∪
B∗;D,G) and that f |X(A∗,B∗;D,G) is locally bounded.
For each P ∈ Ẽ(A), UP (resp. G) is biholomorphic to an open set in C
dP (resp.
in Cm). Consequently, the mapping fP := f |X(P ,B;UP ,G) satisfies the hypotheses of
Theorem 8.2. Hence, we obtain a unique mapping f̂P ∈ O
X̂ (P,B0;UP , G) , Z
(9.4) (A− lim f̂P )(z, w) = fP (z, w) = f(z, w), (z, w) ∈ X (P,B0;UP , G) .
Let 0 < δ ≤ 1
and G
δ := {w ∈ G : ω(w,B0, G) < 1 − δ}. We will show that the
family
f̂P |UP,δ×G
P∈eE(A)
is collective in the sense of Definition 8.1, where UP,δ is
given in (9.2).
A UNIFIED APPROACH 45
To prove this assertion let P1, P2 be arbitrary elements of Ẽ(A). By (9.4), we have
(9.5)
(A− lim f̂P1)(z, w) = f(z, w) = (A− lim f̂P2)(z, w), (z, w) ∈ (UP1 ∩ UP2)× B0.
The assertion is reduced to showing that
(9.6) f̂P1(z, w) = f̂P2(z, w), (z, w) ∈ X̂ (P1, B0;UP1, G) ∩ X̂ (P2, B0;UP2, G) .
To this end fix (z0, w0) ∈ X̂ (P1, B0;UP1, G) ∩ X̂ (P2, B0;UP2 , G) . Observe that both
mappings w 7→ f̂P1(z0, w) and w 7→ f̂P2(z0, w) belong to O(G, Z), where G is the
connected component which contains w0 of the following open set{
w ∈ G : ω(w,B0, G) < 1− max
j∈{1,2}
ω(z0, Pj, Uj)
Applying Theorem 7.5 to these mappings using (9.5), Proposition 3.5 and (9.6), the
above assertion follows.
In virtue of (9.2) let
(9.7)
fδ ∈ O(Aδ ×G
δ, Z)
denote the collected mapping of the family
f̂P |UP,δ×G
P∈eE(A)
. In virtue of (9.4)
and (9.7), we are able to define a new mapping f̃δ on X
Aδ, B;D,G
as follows
f̃δ :=
fδ, on Aδ ×G
f, on D × B.
Using this and (9.4)–(9.7), we see that
(9.8) A− lim f̃δ = f on X(A ∩ Ã, B0;D,G
Since Aδ is an open subset of X and G
δ is a bounded open set in C
m, we are able to
apply Theorem 8.3 to f̃δ in order to obtain a mapping f̂δ ∈ O
Aδ, B0;D,G
such that
(9.9) A− lim f̂δ = f̃δ on X(Aδ, B0;D,G
We are now in a position to define the desired extension mapping f̂ . Indeed, one
glues
0<δ≤ 1
together to obtain f̂ in the following way
f̂ := lim
on Ŵ o.
One needs to check that the last limit exists and possesses all the required properties.
In virtue of (9.8)–(9.9) and Lemma 7.6, the proof will be complete if one can show
that for every (z0, w0) ∈ Ŵ
o, there are an open neighborhood U × V of (z0, w0) and
δ0 > 0 such that hypothesis (iii) of Lemma 7.6 is fulfilled with
D := D, G := G, Pk := A 1
, Qk := B0, Dk := D, Gk := G
, k > 2.
To this end let
δ0 :=
1− ω̃(z0, A,D)− ω(w0, B0, G)
46 VIÊT-ANH NGUYÊN
and let U × V be an open neighborhood of (z0, w0) such that
ω̃(z, A,D) + ω(w,B0, G) < ω̃(z0, A,D) + ω(w0, B0, G) + δ0.
Then for k > 1
and for (z, w) ∈ U ×V, using the last inequality, and applying Part
(3) of Lemma 9.1 and Proposition 3.5, we see that
ω̃(z, A 1
, D) + ω(w,B0, G
) ≤ ω̃(z, A,D) +
ω(w,B0, G)
1− δ0
ω̃(z, A,D) + ω(w,B0, G)
1− δ0
This proves the above assertion. Hence, the proof of the proposition is finished. �
We now arrive at
Proof of Theorem A. First, applying Part 1) of Theorem 7.2 and using the
hypotheses on f, we see that f extends to a mapping (still denoted by) f defined on
X(A ∪ A∗, B ∪ B∗;D,G) such that f is separately holomorphic on Xo(A ∪ A∗, B ∪
B∗;D,G) and that f |X(A∗,B∗;D,G) is locally bounded.
For each P ∈ Ẽ(A), UP is biholomorphic to an open set in C
dP . Consequently, the
mapping fP := f |X(P,B;UP ,G) satisfies the hypotheses of Proposition 9.2. Hence, we
obtain a unique mapping f̂P ∈ O
o (P,B;UP , G) , Z
11 such that
(9.10) (A− lim f̂P )(z, w) = f(z, w), (z, w) ∈ X
P, B̃ ∩B;UP , G
Let 0 < δ ≤ 1
. Using (9.10) and arguing as in the proof of Proposition 9.2, we
may collect the family
f̂P |UP,δ×Gδ
P∈eE(A)
in order to obtain the collected mapping
f̃Aδ ∈ O(Aδ ×Gδ, Z).
Similarly, for each Q ∈ Ẽ(B), one obtains a unique mapping f̂Q ∈
o (A,Q;D, VQ) , Z
12 such that
(9.11) (A− lim f̂Q)(z, w) = f(z, w), (z, w) ∈ X
A ∩ Ã, Q;D, VQ
Moreover, one can collect the family
f̂Q|Dδ×VQ,δ
Q∈eE(B)
in order to obtain the col-
lected mapping f̃Bδ ∈ O(Dδ × Bδ, Z).
Next, we prove that
(9.12) f̃Aδ = f̃
δ on Aδ × Bδ.
Indeed, in virtue of (9.10)–(9.11) it suffices to show that for any P ∈ Ẽ(A) and
Q ∈ Ẽ(B) and any 0 < δ ≤ 1
(9.13) f̂P (z, w) = f̂Q(z, w), (z, w) ∈ UP,δ × VQ,δ.
Observe that in virtue of (9.10)–(9.11) one has that
(A− lim f̂P )(z, w) = (A− lim f̂Q)(z, w) = f(z, w), (z, w) ∈ X (P,Q;UP , VQ) .
11 Here X̂o (P,B;UP , G) := {(z, w) ∈ UP ×G : ω(z, P, UP ) + ω̃(w,B,G) < 1} .
12 Here X̂o (A,Q;D,VQ) := {(z, w) ∈ D × VQ : ω̃(z, A,D) + ω(w,Q, VQ) < 1} .
A UNIFIED APPROACH 47
Recall that UP (resp. VQ) is biholomorphic to a domain in C
dP (resp. CdQ). Con-
sequently, applying the uniqueness of Theorem 8.2 yields that
f̂P (z, w) = f̂Q(z, w), (z, w) ∈ X̂ (P,Q;UP , VQ) .
Hence, the proof of (9.13) and then the proof of (9.12) are finished.
In virtue of (9.12), we are able to define a new mapping f̃δ :
o (Aδ, Bδ;Dδ, Gδ) −→ Z as follows
(9.14) f̃δ :=
f̃Aδ , on Aδ ×Gδ,
f̃Bδ , on Dδ ×Bδ.
Using formula (9.14) it can be readily checked that f̃δ ∈ Os
o (Aδ, Bδ;Dδ, Gδ) , Z
Since we know from Part (2) of Lemma 9.1 that Aδ (resp. Bδ) is an open subset
of Dδ (resp. Gδ), we are able to apply Theorem 7.3 to f̃δ for every 0 < δ ≤
Consequently, one obtains a unique mapping f̂δ ∈ O
X̂ (Aδ, Bδ;Dδ, Gδ) , Z
(9.15) f̂δ = f̃δ on X
o (Aδ, Bδ;Dδ, Gδ) .
It follows from (9.10)–(9.11) and (9.14)–(9.15) that
(9.16) A− lim f̂δ = f on X
A ∩ Ã, B ∩ B̃;Dδ, Gδ
In addition, for any 0 < δ ≤ δ0 ≤
, and any (z, w) ∈ Aδ × Bδ, there is P ∈ Ẽ(A)
such that z ∈ UP,δ0. Therefore, it follows from the construction of f̃
δ , (9.14) and
(9.15) that
f̂δ(z, w) = f̂P (z, w) = f̂δ0(z, w).
This proves that f̂δ = f̂δ0 on Aδ × Bδ for 0 < δ ≤ δ0 ≤
. Hence,
(9.17) f̂δ = f̂δ0 on X(Aδ, Bδ;Dδ0, Gδ0), 0 < δ ≤ δ0 ≤
We are now in a position to define the desired extension mapping f̂ .
f̂ := lim
To prove that f̂ satisfies the desired conclusion of the theorem one proceeds as in
the end of the proof of Proposition 9.2. In virtue of (9.16)–(9.17) and Lemma 7.6,
the proof will be complete if we can verify that for every (z0, w0) ∈ Ŵ , there are an
open neighborhood U×V of (z0, w0) and δ0 > 0 such that hypothesis (iii) of Lemma
7.6 is fulfilled with
D := D, G := G, Pk := A 1
, Qk := B 1
, Dk := D 1
, Gk := G 1
, k > 2.
Since the verification follows along almost the same lines as that of Proposition 9.2,
it is, therefore, left to the interested reader.
Hence, the proof of Theorem A is finished. �
48 VIÊT-ANH NGUYÊN
10. Applications
In this section we give various applications of Theorem A using different systems
of approach regions defined in Subsection 2.2.
10.1. Canonical system of approach regions. For every open subset U ⊂ R2n−1
and every continuous function h : U −→ R, the graph
z = (z
, zn) = (z
, xn + iyn) ∈ C
n : (z
, xn) ∈ U and yn = h(z
, xn)
is called a topological hypersurface in Cn.
Let X be a complex manifold of dimension n. A subset A ⊂ X is said to be a
topological hypersurface if, for every point a ∈ A, there is a local chart (U, φ : U →
n) around a such that φ(A ∩ U) is a topological hypersurface in Cn
Now let D ⊂ X be an open subset and let A ⊂ ∂D be an open subset (with
respect to the topology induced on ∂D). Suppose in addition that A is a topological
hypersurface. A point a ∈ A is said to be of type 1 (with respect to D) if, for every
neighborhood U of a there is an open neighborhood V of a such that V ⊂ U and
V ∩D is a domain. Otherwise, a is said to be of type 2. We see easily that if a is of
type 2, then for every neighborhood U of a, there are an open neighborhood V of a
and two domains V1, V2 such that V ⊂ U, V ∩D = V1 ∪ V2 and all points in A ∩ V
are of type 1 with respect to V1 and V2.
In virtue of Proposition 3.7 in [35] we have the following
Proposition 10.1. Let X be a complex manifold and D an open subset of X. D
is equipped with the canonical system of approach regions. Suppose that A ⊂ ∂D is
an open boundary subset which is also a topological hypersurface. Then A is locally
pluriregular and A ⊂ Ã.
This, combined with Theorem A, implies the following result.
Theorem 10.2. Let X, Y be two complex manifolds, and D ⊂ X, G ⊂ Y two
nonempty open sets. D (resp. G) is equipped with the canonical system of approach
regions. Let A (resp. B) be a nonempty open subset of ∂D (resp. ∂G) which is also
a topological hypersurface. Let Z be a complex analytic space possessing the Hartogs
extension property. Define
W := X(A,B;D,G),
Ŵ := {(z, w) ∈ D ×G : ω(z, A,D) + ω(w,B,G) < 1} .
Let f : W −→ Z be such that:
(i) f ∈ Cs(W,Z) ∩ Os(W
o, Z);
(ii) f is locally bounded on W ;
(iii) f |A×B is continuous.
Then there exists a unique mapping f̂ ∈ O(Ŵ ) such that
cW∋(z,w)→(ζ,η)
f̂(z, w) = f(ζ, η), (ζ, η) ∈ W.
A UNIFIED APPROACH 49
If, moreover, Z = C and |f |W <∞, then
|f̂(z, w)| ≤ |f |
1−ω(z,w)
A×B |f |
ω(z,w)
W , (z, w) ∈ Ŵ .
The special case where Z = C has been proved in [35].
10.2. System of angular approach regions. We will use the terminology and
the notation in Paragraph 3 of Subsection 2.2. More precisely, if D is an open set
of a Riemann surface such that D is good on a nonempty part of ∂D, we equip D
with the system of angular approach regions supported on this part. Moreover, the
notions such as set of positive length, set of zero length, locally pluriregular point
which exist on ∂E can be transferred to ∂D using conformal mappings in a local
way (see [34] for more details).
Theorem 10.3. Let X, Y be Riemann surfaces and D ⊂ X, G ⊂ Y open subsets
and A (resp. B) a subset of ∂D (resp. ∂G) such that D (resp. G) is good on A
(resp. B) and that both A and B are of positive length. Let Z be a complex analytic
space possessing the Hartogs extension property. Define
W := X(A,B;D,G), W
:= X(A
;D,G),
Ŵ := {(z, w) ∈ D ×G : ω(z, A,D) + ω(w,B,G) < 1} ,
(z, w) ∈ D ×G : ω(z, A
, D) + ω(w,B
, G) < 1
where A
(resp. B
) is the set of points at which A (resp. B) is locally pluriregular
with respect to the system of angular approach regions supported on A (resp. B),
and ω(·, A,D), ω(·, A
, D) (resp. ω(·, B,G), ω(·, B
, G)) are calculated using the
canonical system of approach regions.
Then for every mapping f : W −→ Z which satisfies the following conditions:
(i) f ∈ Cs(W,Z) ∩ Os(W
o, Z);
(ii) f is locally bounded;
(iii) f |A×B is continuous,
there exists a unique mapping f̂ ∈ O(Ŵ
, Z) which admits the angular limit f at all
points of W ∩W
If A and B are Borel sets or if X = Y = C then Ŵ = Ŵ
If Z = C and |f |W <∞, then
|f̂(z, w)| ≤ |f |
1−ω(z,A
,D)−ω(w,B
A×B |f |
ω(z,A
,D)+ω(w,B
W , (z, w) ∈ Ŵ
Theorem 10.3 generalizes, in some sense, the result of [34].
In the above theorem we have used the equality
W = Ŵ
when either A and B
are Borel sets or X = Y = C. This follows from the identity ω(·, A,D) = ω̃(·, A,D)
when either A is a Borel set or D ⊂ C (see Theorem 4.6 in [34]). On the other hand,
we can sharpen Theorem 10.3 further, namely, hypothesis (i) can be replaced by a
weaker hypothesis (i’) as follows:
50 VIÊT-ANH NGUYÊN
(i’) for any a ∈ A the mapping f(a, ·)|G is holomorphic and has angular limit
f(a, b) at all points b ∈ B, and for any b ∈ B the mapping f(·, b)|D is
holomorphic and has angular limit f(a, b) at all points a ∈ A.
To see this it suffices to observe that the hypotheses of Theorem 3.8 and Theorem
6.1 can be weakened considerably when the bounded open set D therein is just
one-dimensional.
10.3. System of conical approach regions. The remaining part of this section
is devoted to two important applications of Theorem A: a boundary cross theorem
and a mixed cross theorem. In order to formulate them, we need to introduce some
terminology and notation.
Let X be an arbitrary complex manifold and D ⊂ X an open subset. We say
that a set A ⊂ ∂D is locally contained in a generating manifold if there exist an
(at most countable) index set J 6= ∅, a family of open subsets (Uj)j∈J of X and
a family of generating manifolds13 (Mj)j∈J such that A ∩ Uj ⊂ Mj, j ∈ J, and
that A ⊂
j∈J Uj . The dimensions of Mj may vary according to j ∈ J. Given a
set A ⊂ ∂D which is locally contained in a generating manifold, we say that A is
of positive size if under the above notation
j∈J mesMj (A∩ Uj) > 0, where mesMj
denotes the Lebesgue measure on Mj. A point a ∈ A is said to be a density point
of A if it is a density point of A ∩ Uj on Mj for some j ∈ J. Denote by A
the set
of density points of A.
Suppose now that A ⊂ ∂D is of positive size. We equip D with the system
of conical approach regions supported on A. Using the work of B. Jöricke (see,
for example, Theorem 3, pages 44–45 in [15]), one can show that14 A is locally
pluriregular at all density points of A. Observe that mesMj
(A \ A
) ∩ Uj
= 0 for
j ∈ J. Therefore, it is not difficult to show that A
is locally pluriregular. Choose
an increasing sequence (An)
n=1 of subsets of A such that An ∩ Uj is closed and
mesMj
An) ∩ Uj
= 0 for j ∈ J. Observe that A
n is locally pluriregular,
n ∩ Uj ⊂ A for j ∈ J and that Â :=
n is locally pluriregular and that Â is
locally pluriregular at all points of A
. Consequently, it follows from Definition 2.3
ω̃(z, A,D) ≤ ω(z, A
, D), z ∈ D.
This estimate, combined with Theorem A, implies the following result which is a
generalization in higher dimensions of Theorem 10.3.
Theorem 10.4. Let X, Y be two complex manifolds, let D ⊂ X, G ⊂ Y be two
open sets, and let A (resp. B) be a subset of ∂D (resp. ∂G). D (resp. G) is equipped
with a system of conical approach regions
Aα(ζ)
ζ∈D, α∈Iζ
(resp.
Aβ(η)
η∈G, β∈Iη
13 A differentiable submanifold M of a complex manifold X is said to be a generating manifold
if for all ζ ∈ M, every complex vector subspace of TζX containing TζM coincides with TζX.
14 A complete proof will be available in [29].
A UNIFIED APPROACH 51
supported on A (resp. on B). Suppose in addition that A and B are of positive size.
Let Z be a complex analytic space possessing the Hartogs extension property. Define
:= X(A
;D,G),
(z, w) ∈ D ×G : ω(z, A
, D) + ω(w,B
, G) < 1
where A
(resp. B
) is the set of density points of A (resp. B).
Then, for every mapping f : W −→ Z which satisfies the following conditions:
• f ∈ Cs(W,Z) ∩ Os(W
o, Z);
• f is locally bounded;
• f |A×Bis continuous,
there exists a unique mapping f̂ ∈ O(Ŵ
, Z) which admits A-limit f(ζ, η) at every
point (ζ, η) ∈ W ∩W
If, moreover, Z = C and |f |W <∞, then
|f̂(z, w)| ≤ |f |
1−ω(z,A
,D)−ω(w,B
A×B |f |
ω(z,A
,D)+ω(w,B
W , (z, w) ∈ Ŵ
The second application is a very general mixed cross theorem.
Theorem 10.5. Let X, Y be two complex manifolds, let D ⊂ X, G ⊂ Y be open
sets, let A be a subset of ∂D, and let B be a subset of G. D is equipped with
the system of conical approach regions
Aα(ζ)
ζ∈D, α∈Iζ
supported on A and G is
equipped with the canonical system of approach regions
Aβ(η)
η∈G, β∈Iη
. Suppose
in addition that A is of positive size. Let Z be a complex analytic space possessing
the Hartogs extension property. Define
:= X(A
, B∗;D,G),
(z, w) ∈ D ×G : ω(z, A
, D) + ω(w,B∗, G) < 1
where A
is the set of density points of A and B∗ denotes, as usual (see Subsection
2.1 above), the set of points in B ∩G at which B is locally pluriregular.
Then, for every mapping f : W −→ Z which satisfies the following conditions:
• f ∈ Cs(W,Z) ∩ Os(W
o, Z);
• f is locally bounded along A×G,
there exists a unique mapping f̂ ∈ O(Ŵ
, Z) which admits A-limit f(ζ, η) at every
point (ζ, η) ∈ W ∩W
If, moreover, Z = C and |f |W <∞, then
|f̂(z, w)| ≤ |f |
1−ω(z,A
,D)−ω(w,B∗,G)
A×B |f |
ω(z,A
,D)+ω(w,B∗,G)
W , (z, w) ∈ Ŵ
Concluding remarks. In ongoing joint-works with Pflug [30, 31] we develop new
cross theorems with singularities. On the other hand, in [36] the problem of opti-
mality of the envelope of holomorphy
W in Theorem A has been investigated.
52 VIÊT-ANH NGUYÊN
References
[1] R. A. Airapetyan, G. M. Henkin, Analytic continuation of CR-functions across the “edge of
the wedge”, Dokl. Akad. Nauk SSSR, 259 (1981), 777-781 (Russian). English transl.: Soviet
Math. Dokl., 24 (1981), 128–132.
[2] R. A. Airapetyan, G. M. Henkin, Integral representations of differential forms on Cauchy-
Riemann manifolds and the theory of CR-functions. II, Mat. Sb., 127(169), (1985), 92–112,
(Russian). English transl.: Math. USSR-Sb. 55 (1986), 99–111.
[3] O. Alehyane et J. M. Hecart, Propriété de stabilité de la fonction extrémale relative, Potential
Anal., 21, (2004), no. 4, 363–373.
[4] K. Adachi, M. Suzuki and M. Yoshida, Continuation of holomorphic mappings with values
in a complex Lie group, Pacific J. Math., 47, (1973), 1–4.
[5] O. Alehyane et A. Zeriahi, Une nouvelle version du théorème d’extension de Hartogs pour
les applications séparément holomorphes entre espaces analytiques, Ann. Polon. Math., 76,
(2001), 245–278.
[6] E. Bedford, The operator (ddc)n on complex spaces, Semin. P. Lelong - H. Skoda, Analyse,
Années 1980/81, Lect. Notes Math., 919, (1982), 294–323.
[7] E. Bedford, B. A. Taylor, A new capacity for plurisubharmonic functions, Acta Math., 149,
(1982), 1–40.
[8] S. Bernstein, Sur l’ordre de la meilleure approximation des fonctions continues par des
polynômes de degré donné, Bruxelles 1912.
[9] L. M. Drużkowski, A generalization of the Malgrange–Zerner theorem, Ann. Polon. Math.,
38, (1980), 181–186.
[10] A. Edigarian, Analytic discs method in complex analysis, Diss. Math. 402, (2002), 56 pages.
[11] G. M. Goluzin, Geometric theory of functions of a complex variable, (English), Providence,
R. I.:American Mathematical Society (AMS). VI, (1969), 676 pages.
[12] A. A. Gonchar, On analytic continuation from the “edge of the wedge” theorem, Ann. Acad.
Sci. Fenn. Ser. A.I: Mathematica, 10, (1985), 221–225.
[13] A. A. Gonchar, On Bogolyubov’s “edge-of-the-wedge” theorem, Proc. Steklov Inst. Math.,
228, (2000), 18–24.
[14] F. Hartogs, Zur Theorie der analytischen Funktionen mehrer unabhängiger Veränderlichen,
insbesondere über die Darstellung derselben durch Reihen, welche nach Potenzen einer
Veränderlichen fortschreiten, Math. Ann., 62, (1906), 1–88.
[15] B. Jöricke, The two-constants theorem for functions of several complex variables, (Russian),
Math. Nachr. 107 (1982), 17–52.
[16] B. Josefson, On the equivalence between polar and globally polar sets for plurisubharmonic
functions on Cn, Ark. Mat., 16, (1978), 109–115.
[17] S. M. Ivashkovich, The Hartogs phenomenon for holomorphically convex Kähler manifolds,
Math. USSR-Izv., 29, (1997), 225–232.
[18] M. Jarnicki, P. Pflug, Extension of Holomorphic Functions, de Gruyter Expositions in Math-
ematics 34, Walter de Gruyter, 2000.
[19] M. Jarnicki, P. Pflug, Invariant distances and metrics in complex analysis—revisited, Disser-
tationes Math. (Rozprawy Mat.), 430, (2005), 192 pages.
[20] M. Klimek, Pluripotential theory, London Mathematical society monographs, Oxford Univ.
Press., 6, (1991).
[21] H. Komatsu, A local version of Bochner’s tube theorem, J. Fac. Sci., Univ. Tokyo, Sect. I A
19, (1972), 201–214.
[22] F. Lárusson, R. Sigurdsson, Plurisubharmonic functions and analytic discs on manifolds, J.
Reine Angew. Math., 501, (1998), 1–39.
[23] Nguyên Thanh Vân, Separate analyticity and related subjects, Vietnam J. Math., 25, (1997),
81–90.
A UNIFIED APPROACH 53
[24] Nguyên Thanh Vân, Note on doubly orthogonal system of Bergman, Linear Topological
Spaces and Complex Analysis, 3, (1997), 157–159.
[25] Nguyên Thanh Vân et A. Zeriahi, Familles de polynômes presque partout bornées, Bull. Sci.
Math., 107, (1983), 81–89.
[26] Nguyên Thanh Vân et A. Zeriahi, Une extension du théorème de Hartogs sur les fonctions
séparément analytiques, Analyse Complexe Multivariable, Récents Développements, A. Meril
(ed.), EditEl, Rende, (1991), 183–194.
[27] Nguyên Thanh Vân et A. Zeriahi, Systèmes doublement orthogonaux de fonctions holomor-
phes et applications, Banach Center Publ. 31, Inst. Math., Polish Acad. Sci., (1995), 281–297.
[28] V.-A. Nguyên, A general version of the Hartogs extension theorem for separately holomorphic
mappings between complex analytic spaces, Ann. Scuola Norm. Sup. Pisa Cl. Sci., (2005), serie
V, Vol. IV(2), 219–254.
[29] V.-A. Nguyên, Conical plurisubharmonic measure and new cross theorems, in preparation.
[30] V.-A. Nguyên and P. Pflug, Boundary cross theorem in dimension 1 with singularities,
arXiv:0705.4649v1, preprint of the ICTP, Trieste-Italy, (2007), 19 pages.
[31] V.-A. Nguyên and P. Pflug, Cross theorems with singularities, in preparation, 22 pages.
[32] P. Pflug, Extension of separately holomorphic functions–a survey 1899–2001, Ann. Polon.
Math., 80, (2003), 21–36.
[33] P. Pflug and V.-A. Nguyên, A boundary cross theorem for separately holomorphic functions,
Ann. Polon. Math., 84, (2004), 237–271.
[34] P. Pflug and V.-A. Nguyên, Boundary cross theorem in dimension 1, Ann. Polon. Math.,
90(2), (2007), 149-192.
[35] P. Pflug and V.-A. Nguyên, Generalization of a theorem of Gonchar, Ark. Mat., 45, (2007),
105–122.
[36] P. Pflug and V.-A. Nguyên, Envelope of holomorphy for boundary cross sets, Arch. Math.
(Basel), 89, (2007), 326–338.
[37] E. A. Poletsky, Plurisubharmonic functions as solutions of variational problems, Several
complex variables and complex geometry, Proc. Summer Res. Inst., Santa Cruz/CA (USA)
1989, Proc. Symp. Pure Math. 52, Part 1, (1991), 163–171.
[38] E. A. Poletsky, Holomorphic currents, Indiana Univ. Math. J., 42, No.1, (1993), 85–144.
[39] T. Ransford, Potential theory in the complex plane, London Mathematical Society Student
Texts, 28, Cambridge: Univ. Press., (1995).
[40] J. P. Rosay, Poletsky theory of disks on holomorphic manifolds, Indiana Univ. Math. J., 52,
No.1, (2003), 157–169.
[41] B. Shiffman, Extension of holomorphic maps into Hermitian manifolds, Math. Ann., 194,
(1971), 249–258.
[42] B. Shiffman, Hartogs theorems for separately holomorphic mappings into complex spaces, C.
R. Acad. Sci. Paris Sér. I Math., 310 (3), (1990), 89–94.
[43] J. Siciak, Analyticity and separate analyticity of functions defined on lower dimensional sub-
sets of Cn, Zeszyty Nauk. Univ. Jagiellon. Prace Mat., 13, (1969), 53–70.
[44] J. Siciak, Separately analytic functions and envelopes of holomorphy of some lower dimen-
sional subsets of Cn, Ann. Polon. Math., 22, (1970), 145–171.
[45] V. P. Zahariuta, Separately analytic functions, generalizations of the Hartogs theorem and
envelopes of holomorphy, Math. USSR-Sb., 30, (1976), 51–67.
[46] M. Zerner, Quelques résultats sur le prolongement analytique des fonctions de variables com-
plexes, Séminaire de Physique Mathématique.
[47] A. Zeriahi, Comportement asymptotique des systèmes doublement orthogonaux de Bergman:
Une approche élémentaire, Vietnam J. Math., 30, No.2, (2002), 177–188.
[48] H. Wu, Normal families of holomorphic mappings, Acta Math., 119, (1967), 193–233.
http://arxiv.org/abs/0705.4649
54 VIÊT-ANH NGUYÊN
Viêt-Anh Nguyên, Mathematics Section, The Abdus Salam international centre
for theoretical physics, Strada costiera, 11, 34014 Trieste, Italy
E-mail address : vnguyen0@ictp.trieste.it
	1. Introduction
	2. Preliminaries and statement of the main result
	2.1. Approach regions, local pluripolarity and plurisubharmonic measure
	2.2. Examples of systems of approach regions
	2.3. Cross and separate holomorphicity and A-limit.
	2.4. Hartogs extension property.
	2.5. Statement of the main results
	3. Holomorphic discs and a Two-Constant Theorem
	3.1. Poletsky theory of discs and Rosay Theorem on holomorphic discs
	3.2. Level sets of the relative extremal functions and a Two-Constant Theorem
	3.3. Construction of discs
	4. A mixed cross theorem
	5. Completion of the proof of Theorem 4.2
	6. A local version of Theorem A
	7. Preparatory results
	8. Local and semi-local versions of Theorem A
	9. The proof of Theorem A
	10. Applications
	10.1. Canonical system of approach regions
	10.2. System of angular approach regions
	10.3. System of conical approach regions
	References
ABSTRACT
  We extend the theory of separately holomorphic mappings between complex
analytic spaces. Our method is based on Poletsky theory of discs, Rosay Theorem
on holomorphic discs and our recent joint-work with Pflug on cross theorems in
dimension 1. It also relies on our new technique of conformal mappings and a
generalization of Siciak's relative extremal function.
  Our approach illustrates the unified character: ``From local informations to
global extensions". Moreover, it avoids systematically the use of the classical
method of doubly orthogonal bases of Bergman type.

<|endoftext|><|startoftext|>
Higher spin algebras as higher symmetries
Xavier Bekaert
Laboratoire de Mathématiques et Physique Théorique
Unité Mixte de Recherche 6083 du CNRS, Fédération Denis Poisson
Université François Rabelais, Parc de Grandmount
37200 Tours, France
Abstract
The exhaustive study of the rigid symmetries of arbitrary free field
theories is motivated, along several lines, as a preliminary step in the
completion of the higher-spin interaction problem in full generality. Some
results for the simplest example (a scalar field) are reviewed and com-
mented along these lines.
Expanded version of the lectures presented at the “5th international school
and workshop on QFT & Hamiltonian systems” (Calimanesti, May 2006).
1 Higher-spin interaction problem
Whereas covariant gauge theories describing arbitrary free massless fields
on constant-curvature spacetimes of dimension n are firmly established by
means of the unitary representation theory of their isometry groups, it is
still open to question whether non-trivial consistent self-couplings and/or
cross-couplings among those fields may exist for n > 2 , such that the
deformed gauge algebra is non-Abelian. The goal of the present paper is
to advocate that a lot of information on the interactions can be extracted
from the symmetries of the free field theory.
The conventional local free field theories corresponding to unitary irre-
ducible representations of the helicity group SO(n− 2) , that are spanned
by completely symmetric tensors, have been constructed a while ago (for
some introductory reviews, see [1]). In order to have Lorentz invariance
manifest and second order local field equations with minimal field content,
the theory is expressed in terms of completely symmetric double-traceless
tensor gauge fields hµ1... µs of rank s > 0, the gauge transformation of
which reads
δξ hµ1µ2... µs =
∇ µ1 ξµ2...µs + cyclic , (1)
E-mail address: Xavier.Bekaert@lmpt.univ-tours.fr
http://arxiv.org/abs/0704.0898v2
where
∇ is the covariant derivative with respect to the background Levi–
Civita connection and “cyclic” stands for the sum of terms necessary to
have symmetry of the right-hand-side under permutations of the indices.
The gauge parameter ξ is a completely symmetric traceless tensor field
of rank s − 1. In this relativistic field theory, the “spin” is equal to the
rank s. For spin s = 1 the gauge field hµ represents the photon with U(1)
gauge symmetry while for spin s = 2 the gauge field hµν represents the
graviton with linearized diffeomorphism invariance. The gauge algebra of
field independent gauge transformations such as (1) is of course Abelian.
Non-Abelian gauge theories for “lower spin” s 6 2 are well known and
essentially correspond to Yang-Mills (s = 1) and Einstein (s = 2) theories
for which the underlying geometries (principal bundles and Riemannian
manifolds) were familiar to mathematicians before the construction of the
physical theory. In contrast, the situation is rather different for “higher
spin” s > 2 for which the underlying geometry (if any!) remains obscure.
Due to this lack of information, it is natural to look for inspiration in
the perturbative “reconstruction” of Einstein gravity as the non-Abelian
gauge theory of a spin-two particle propagating on a constant-curvature
spacetime (see e.g. [2] for a comprehensive review).
Let us denote by
S [hµ1... µs ] the Poincaré-invariant, local, second-
order, quadratic, ghost-free, gauge-invariant action of a spin-s symmetric
tensor gauge field. In order to perform a perturbative analysis via the
Noether method [3], the non-Abelian interaction problem for a collection
of higher (and possibly lower) spin gauge fields is formulated as a defor-
mation problem.
Higher-spin interaction problem: List all Poincaré-invariant local
deformations
S[h] =
S [h] + ε
S [h] + O(ε
of a positive sum, with at least one s > 2,
S [h] =
S [hµ1... µs ]
of quadratic actions such that the deformed local gauge symmetries
δξh =
δξ h + ε
δξ h + O(ε
are already non-Abelian at first order, in the deformation parameters ε
and do not arise from local redefinitions
h → h + ε φ(h) + O(ε
) , ξ → ξ + ε ζ(h, ξ) + O(ε
of the gauge fields and parameters.
This well-posed mathematical problem is expected to possess non-
trivial solutions including higher-spin fields, as strongly indicated by Vasiliev’s
works (for some reviews, see [4] and references therein) and deserves to
be investigated further along systematic lines.
2 The Noether method
The assumption that the deformations are formal power series in some
deformation parameters ε enables to investigate the problem order by
order. The crucial observation of any perturbation theory is that the first
order deformations are constrained by the symmetries of the undeformed
system. In the present case, the Noether method scrutinizes the gauge
symmetry of the action, δξS = 0 . At zeroth order, the latter equation is
satisfied by hypothesis. At first order, it reads
S = 0 . (2)
This equation may be used to constrain the possible deformations by
reinterpreting them as familiar objects of the undeformed gauge theory.
By definition, an observable of a gauge theory is a functional which is
gauge-invariant on-shell, while a reducibility parameter of a gauge theory
is a gauge parameter such that the corresponding gauge variation vanishes
off-shell.
First-order deformations in terms of the undeformed theory:
• First-order deformations of the action are observables of the undeformed
theory.
• First-order deformations of the gauge symmetries evaluated at reducibil-
ity parameters of the undeformed gauge theory define symmetries of the
undeformed theory.
Proof: In (2) the infinitesimal variation
S of the undeformed action
is proportional to the undeformed Euler–Lagrange equations. This proves
the fist part of the theorem. Reducibility parameters ξ of the undeformed
gauge theory verify
h = 0 by definition. Inserting this fact into (2)
with ξ = ξ gives
S = 0 , which is precisely the translation of the second
part of the theorem.
In the mathematical litterature, a (conformal) Killing tensor of a
pseudo-Riemannian manifold is a symmetric tensor field ξ such that its
symmetrized covariant derivative with respect to the Levi–Civita connec-
tion, ∇µ1 ξµ2...µs + cyclic, vanishes (modulo a term proportional to the
metric for conformal Killing tensors). Therefore, any reducibility parame-
ter ξ of the spin-s symmetric gauge field theory on the constant-curvature
spacetime M is identified with a Killing tensor of rank s− 1 of the mani-
fold M. The space of Killing tensors on any constant-curvature spacetime
is known to be finite-dimensional [5], thus the linear gauge symmetries (1)
are irreducible.
These results suggest two strategies for addressing the higher-spin in-
teraction problem. The most ambitious one is the computation of all lo-
cal observables of the free gauge theory associated to deformations of the
gauge algebra. This result would provide the exhaustive list of algebra-
deforming first order vertices, but this computation is technically demand-
ing and seems out of reach in the completely general case. Nevertheless,
the BRST reformulation of the problem [6] allowed the complete classi-
fication of non-Abelian deformations in various particular cases (see e.g.
the review [7] and references therein). Actually, a more humble strategy
is the computation of all rigid symmetries of the free irreducible gauge
theory. It is of interest because the knowledge of these rigid symmetries
would strongly constrain the candidates for gauge symmetry deforma-
tions. Indeed, the constant tensors appearing in the rigid symmetries
could be compared with the complete list [5] of constant-curvature space-
time Killing tensors.
3 Free theory symmetries
Bosonic fields are usually described in terms of their components living in
some subspace V of the space ⊗(Rn) of tensors on Rn (e.g. V = ⊙(Rn)
for symmetric tensor fields). The background metric of the constant-
curvature spacetime induces some non-degenerate bilinear form on V .
This defines a non-degenerate sesquilinear form 〈 | 〉 on the space L2(Rn)⊗
V of square-integrable fields taking values in the countable space V (the
components). Let † stands for the adjoint with respect to the sesquilinear
form 〈 | 〉 .
Any quadratic action for bosonic fields ψ can be expressed as a quadratic
S [ψ] =
〈ψ | K | ψ 〉 , (3)
where the kinetic operatorK is self-adjoint, K† = K. Because the sesquilin-
ear form 〈 | 〉 is non-degenerate, the Euler-Lagrange equation extremizing
the quadratic action is the linear equation
δ〈ψ |
= K|ψ 〉 = 0 . (4)
Moreover, the quadratic form 〈ψ | K | ψ 〉 is degenerate if and only if the
kinetic operator K is degenerate. This happens if and only if there exists
a linear operator P (on L2(Rn) ⊗ V ) such that KP = 0. Infinitesimal
gauge symmetries then read
δχ | ψ 〉 = P | χ 〉 ,
with gauge parameters χ . The Noether identity is P†K = (KP)† = 0 .
A symmetry of the quadratic action (3) is an invertible linear pseudo-
differential operator U preserving the quadratic form 〈 | K | 〉. In other
words,
KU = K .
The group of off-shell symmetries is the group of symmetries of the quadratic
action endowed with the composition ◦ as product. A symmetry genera-
tor of the quadratic action (3) is a linear differential operator T which is
self-adjoint with respect to the quadratic form 〈 | K | 〉. More concretely,
KT = T
Any symmetry generator T defines a symmetry U = eiT of the quadratic
action (3). If T = T† then the linear operator T is a symmetry generator
of the quadratic action if and only it commutes with K. The real Lie
algebra of off-shell symmetries is the algebra of symmetry generators of
the quadratic action endowed with i times the commutator as Lie bracket,
{ , } := i [ , ].
A symmetry of the linear equation (4) is a linear differential operator
T obeying
KT = SK , (5)
for some linear operator S. Such a symmetry T preserves the space KerK
of solutions to the equations of motion. Any symmetry generator T of
the action (3) is always a symmetry of the equation of motion (4) with
S = T† in (5). A symmetry T is trivial on-shell if T = RK for some
linear operator R. Such an on-shell-trivial symmetry is always a sym-
metry of the field equation (4), since it obeys (5) with S = KR. The
algebra of on-shell-trivial symmetries obviously forms a left ideal in the
algebra of linear differential operators endowed with the composition ◦
as multiplication. Furthermore, it is also a right ideal in the algebra of
symmetries of the linear equation (4). The complex associative algebra of
on-shell symmetries is the associative algebra of symmetries of the linear
equation quotiented by the two-sided ideal of on-shell-trivial symmetries.
The complex Lie algebra of on-shell symmetries is the algebra of on-shell
symmetries endowed with the commutator as Lie bracket.
Notice that when K is non-degenerate, a linear operator T = RK is
a symmetry generator of the quadratic action (3) if and only if R is self-
adjoint. Moreover, the Lie subalgebra of such on-shell-trivial symmetry
generators is an ideal in the Lie algebra of off-shell symmetries.
4 Higher-spin algebras
Let g be the Lie algebra corresponding to the finite-dimensional (confor-
mal) isometry group G of the constant-curvature spacetime of dimension
n > 2. For n = 2 , the spacetime may be arbitrary and the conformal
algebra is of course infinite-dimensional. If the free field theory is rela-
tivistic, then g is linearly realized on the space L2(Rn)⊗ V (respectively,
KerK) of off-shell (resp. on-shell) fields. This induces a linear realiza-
tion of the universal enveloping algebra U(g) over C. The real form of
this realization corresponding to the self-adjoint operators, endowed with
i times the commutator as Lie bracket, is nowadays referred to as (confor-
mal) on/off-shell higher-spin algebra of the constant-curvature spacetime
(see e.g. [8] for an elementary introduction to such algebraic structures).
The name comes from the fact that its generators are in “higher-spin”
representations of the Lorentz group, and the algebra is said to be “on”
or “off” shell whether the algebra is realized on the space of solutions of
the Euler-Lagrange equations or not.
The isometry algebra g of a constant-curvature spacetime is a module
of the Lorentz subalgebra o(n − 1, 1) ⊂ g for the adjoint representation.
This module decomposes as the sum of two irreducible o(n−1, 1)-modules:
the “translations” are in the vector module ∼= R
n while the boosts and ro-
tations are in the antisymmetric module ∼= ∧
2(Rn). These representations
are labelled by one-column Young diagrams of, respectively, one and two
cells. The number of columns is associated with the spin. The fact that
the generators of U(g) are in higher-spin representations is summarized in
the following result.
Universal enveloping algebra of isometries: The universal envelop-
ing algebra U(g) of the isometry algebra g of an n-dimensional constant-
curvature spacetime is an infinite-dimensional module of the general linear
Lie algebra gl(n), decomposing as an infinite sum of finite-dimensional ir-
reducible gl(n)-modules labelled by the set of all Young diagrams, with
multiplicity one, the first column of which has length 6 n.
Proof: The Poincaré-Birkhoff-Witt theorem states that the universal en-
veloping algebra U(g) is isomorphic to the symmetric algebra ⊙(g) as a
vector space. As a gl(n)-module, the vector space g is isomorphic to the
sum Rn ⊕ ∧2(Rn) of irreducible modules. This leads to the following
isomorphism of modules:
⊙ (g) ∼=
. (6)
The idea is to evaluate the right-hand-side of (6) using the available tech-
nology on Kronecker products of irreducible representations [9]. The mod-
ule ⊙(Rn) decomposes as the infinite sum of irreducible modules labelled
by all one-row Young diagrams with multiplicity one. A formula of Lit-
tlewood for symmetric plethsyms implies that the module ⊙
∧2 (Rn)
decomposes as the infinite sum of irreducible modules, with multiplic-
ity one, labelled by all Young diagrams with columns of even lengths.
The Kronecker product in (6) decomposes as the infinite sum of all the
Kronecker products between a one-row Young diagram and a Young dia-
gram with columns of even lengths, each with multiplicity one. Using the
Littlewood–Richardson rule, one may show that the result of this compu-
tation is the infinite sum of irreducible modules labelled with all possible
Young diagrams, each with multiplicity one. The Young diagrams whose
first column has length greater than n lead to vanishing modules, hence
they do not appear in the series.
The higher-spin algebras are important in relativistic field theories
because they always appear as spacetime symmetry algebras in the free
limit.
Spacetime symmetries of relativistic free field theories: If the
Lie algebra of off/on-shell symmetries contains the (conformal) isometry
algebra g of some constant-curvature spacetime M, then it also contains
the (conformal) off/on-shell higher-spin algebra of M.
Proof: The Poincaré-Birkhoff-Witt theorem states that one can realize
the universal enveloping U(g) as Weyl-ordered polynomials in the elements
of the Lie algebra g. The above theorem is proved by observing that
any Weyl-ordered polynomial in on-shell symmetries is itself an on-shell
symmetry. As observed in [10], the same is true for symmetry generators.
As an important corollary, the theorem implies that any relativistic
free field theory has an infinite number of rigid symmetries, and therefore
it possesses an infinite number of conserved currents via the Noether the-
orem, as it is well known. Notice that relativistic integrable models are
precisely such that they possess an infinite set of commuting rigid symme-
tries corresponding to an infinite set of conserved charges in involution.
The infinite-dimensional subalgebra of symmetries of the free field theory
generated by the translations only is, of course, Abelian. Actually, the fac-
torization property is deeply related to the preservation of this subalgebra
of symmetries at the interacting level [11]. Thus the relationship between
higher-spin algebras and integrable models appears to be very intimate
(see also [12] and references therein). The strong form of the Maldacena
conjecture (in the large N limit) and the integrability properties recently
enlightened in this context are further indications of such a relationship.
Symmetries may be characterized by their action on the spacetime co-
ordinates. A smooth change of coordinates is generated by a first-order
linear differential operator. Therefore, a higher-order linear differential
operator does not generate coordinate transformations. For instance, an
isometry generator is a first-order linear differential operator correspond-
ing to a Killing vector field, but the spacetime higher-symmetries are
powers of such isometry generators, hence they are higher-order linear
differential operators. They do not generate coordinate transformations
and this explains why spacetime higher-symmetries are usually not con-
sidered in textbooks.
Let us focus on the first non-trivial example of free field theory: the
quadratic action of a complex scalar field on an n-dimensional spacetime
M. In such case, the space V = C and the kinetic operator K can be
taken to be a constant mass term plus the Laplacian on M,
A scalar field is said to be conformal if its kinetic operator is the conformal
Laplacian
4 (n− 1)
R , (7)
where R denotes the scalar curvature. The quadratic action and the linear
equation are symmetric under the full conformal algebra o(n, 2) if and only
if the scalar field is conformal and has conformal weight 1− n/2.
Higher symmetries of the conformal scalar field: For the quadratic
action of a complex conformal scalar field on a constant-curvature space-
time M of dimension n > 2, the following spaces over R are isomorphic:
• The Lie algebra of off-shell symmetries quotiented by the ideal of on-
shell-trivial symmetry generators,
• A real form of the associative algebra of on-shell symmetries.
• The conformal on-shell higher-spin algebra,
• The real algebra of Weyl-ordered polynomials in the conformal Killing
vector fields quotiented by the ideal generated by the conformal Laplacian,
endowed with i times the commutator as Lie bracket. The symbols of these
differential operators,
T = (−i)
µ1...µr
∇µ1 . . .
∇µr + lower + on-shell-trivial ,
may be represented by real traceless symmetric tensor fields ξ which are
conformal Killing tensors.
Moreover, in n = 2 dimensions the theorem is valid for an arbitrary space-
time manifold.
Proof: The theorem can be extracted from the results of [13] on flat
spacetime of dimension n > 2 by taking into account that any constant-
curvature spacetime M can be seen as a conic in the projective null cone
of the ambient space Rn,2 . The two-dimensional case is addressed by
using the left/right-moving coordinates.
Notice that the on-shell higher-spin algebra of a non-conformal scalar
field on a constant-curvature spacetime is a proper subalgebra of the uni-
versal enveloping algebra of the isometry algebra g: it decomposes as
the infinite sum of irreducible o(n− 1, 1)-modules labelled by all two-row
Young diagrams with multiplicity one, as reviewed in [4, 7]. This algebra
is in one-to-one correspondence with the space of reducibility parameters
of the infinite tower of symmetric tensor gauge fields where each field ap-
pears once and only once for each given spin s > 0. Moreover, notice that
the AdSn+1/CFTn correspondence for n > 2 in the weak tension/coupling
limit also makes use of the isomorphism between the on-shell higher-spin
algebra of a non-conformal scalar field on AdSn+1 and the on-shell sym-
metry algebra of a conformal scalar field on Rn−1,1 (see [14] for the cor-
respondence at the level of conserved currents). Remark also that the
conformal on-shell higher-spin algebra of a two-dimensional spacetime for
a massless scalar field is isomorphic to the direct sum of u(1) and the two
Lie algebras of differential operators for the left and right moving sectors
respectively. Each of such algebras of differential operators is isomorphic
to the algebra W∞ with zero central charge [15].
The deep connection between higher-spin algebras and integrable mod-
els is exhibited by the following example in n = 2 dimensions.
Higher symmetries of the interacting scalar field: A non-linear
action of a real scalar field on the two-dimensional Minkowski spacetime,
without derivative interaction term, of the form
S[φ] =
〈φ | � | φ 〉+
xV (φ) , V (φ) = O(φ
is invariant under an infinite number of local infinitesimal rigid symmetry
transformations, independent of the coordinate xµ, if and only if
V (φ) = ±
cos (αφ)− 1
, m ∈ R ,
the parameter α is either purely real or imaginary. In such case, the field
φ either corresponds to a free massless scalar field (m = 0), a free massive
scalar field (m 6= 0 , α = 0) or sine-Gordon theory (m 6= 0 , α 6= 0).
Moreover, via linearisation, there is a one-to-one correspondence be-
tween:
• The set of on-shell non-trivial, polynomial in the field derivatives, coordinate-
independent, symmetry transformations of the sine-Gordon Lagrangian
• The Lie algebra of coordinate-independent off-shell symmetries of a free
real scalar field quotiented by the ideal of on-shell-trivial symmetry gener-
ators,
• A proper Abelian Lie subalgebra of the on-shell higher-spin algebra of
the Minkowski plane,
• The space of harmonic odd polynomials in the momenta Pµ = −i∂µ .
These differential operators T may be represented by real traceless sym-
metric constant tensors λ:
T = i λ
µ1...µ2q+1∂µ1 . . . ∂µ2q+1 + on-shell-trivial .
Proof: The first part of the theorem is a straightforward consequence
of the results of [16] in the case when V (φ) is at least quadratic in φ
(by hypothesis). The second part is proven by selecting all coordinate-
independent symmetries of a free real scalar field and comparing them
with the conserved currents of [16]. In both cases, the Noether correspon-
dence between non-trivial conserved currents and non-trivial symmetries
(see e.g. [17] for a precise statement of this isomorphism) is performed
via the Hamiltonian formulation of a two-dimensional scalar field where
one of the light-cone coordinate plays the role of “time.”
5 A gauge principle for higher-spins ?
The analogy with lower-spins suggests to guess the full non-Abelian gauge
theory by making use of the “gauge principle.” Moreover, this point of
view actually provides a concrete motivation for using the higher-spin
algebras in the interaction problem.
The idea is to consider some “matter” system described by a quadratic
action (3) with some algebra of rigid symmetries. The rigid symmetries U
of this system are by definition in the “fundamental” representation of the
algebra of off-shell symmetries of the action (3). Connections are usually
introduced in order to “gauge” these rigid symmetries by allowing U to be
a smooth function on Rn taking values in the group of off-shell symmetries
of the action (3). In order to construct a covariant derivative D = ∂ + Γ,
one introduces a connection defined as a covariant vector field Γµ taking
values in the Lie algebra of off-shell symmetries and transforming as
| ψ 〉 −→ U | ψ 〉 , Γ −→ UDU
, (8)
in such a way that
D | ψ 〉 −→ UD | ψ 〉 .
The minimal coupling is the replacement of all partial derivatives ∂ in the
kinetic operator K(∂) by covariant derivatives D which should ensure that
the quadratic action 〈ψ | K(D) | ψ 〉 is preserved by gauge symmetries
(8). The connection transforms in the “adjoint” representation of the
rigid symmetries while the matter field transforms in the “fundamental.”
(More precisely, the covariant derivative transforms in the adjoint while
the matter field belongs to a module of the gauge algebra.)
The introduction of a connection requires the introduction of some new
dynamical fields: the “gauge” sector. In Yang-Mills gauge theories, the
rigid symmetry is internal and the connection is itself made of spin-1 gauge
fields. For spacetime symmetries, the relation between the connection
and the gauge field is more complicated. For instance, in Einstein gravity
the Levi-Civita connection is expressed in terms of the first derivative of
the metric via the torsionlessness and metricity constraints. In general,
the spin-s tensor field propagating on a constant-curvature spacetime is
expected to be the perturbation of some background field
gµ1...µs =
g µ1...µs + ε hµ1...µs ,
so that the deformed gauge symmetries would be of the form
δξgµ1µ2... µs = ε (Dξ)µ1µ2... µs , (9)
where the covariant derivative D = ∇ + O(ε) starts as the covariant
derivative with respect to the Levi–Civita connection for the spacetime
metric plus non-minimal corrections. Thus the background connection
is identified with the Levi-Civita connection for the background metric,
and the linearization of (9) reproduces (1). Furthermore, the reducibility
parameters of (1) exactly correspond to the gauge symmetries (9) leaving
the background geometry invariant. In the present case, this group of
rigid symmetries contains the isometry group g of the constant-curvature
spacetime. The classical theory of (in)homogeneous pseudo-orthogonal
groups tells us that completely symmetric tensor fields which are invariant
under g are constructed from products of the background metric:
g (µ1µ2 . . .
µ2m−1µ2m)
Thus, along these lines, only even-spin symmetric tensor fields can be
perturbations of a non-vanishing higher-spin background in a constant-
curvature spacetime. The first-order deformation of the gauge symmetries
(1) following from (9) would be of the schematic form
δξ hµ1µ2... µs = (
Γ · ξ )µ1µ2... µs , (10)
where
Γ stands for the linearized connection (including the linearized
Levi-Civita connection) and the dot stands for the action on the gauge
parameter ξ. The transformations (10) evaluated on Killing tensors ξ
of the background spacetime would be rigid symmetry transformations
of the free gauge theory. This property highly constrains the possible
expressions for the linearized connection.
Let us now consider the expansion of the minimally coupled action for
the “matter” sector in power series of ε :
〈ψ | K(D) | ψ 〉 =
〈ψ | K(∂) | ψ 〉 + ε 〈h | J 〉 + O(ε
where J denotes a set of symmetric tensors which are bilinear in ψ and
their derivatives. Assuming that the “matter” sector is strictly distinct
from the “gauge” sector, the gauge invariance of the complete action at
first order in ε requires the symmetric tensors Jµ1µ2... µs to be conserved
up to terms proportional to the “matter” free field equations (and deriva-
tives thereof) corresponding to first-order deformations
δξ | ψ 〉 =
U | ψ 〉 (11)
of the gauge transformations of the “matter” sector, where
U is a lin-
ear differential operator depending linearly on ξ and its derivatives. At
zeroth order in ǫ , the “gauge” group does not act on the matter. There-
fore, at leading order, the transformation law (8) reads as (11). Via the
Noether correspondence, the space of all rigid symmetries of the “matter”
quadratic action determines the space of all on-shell-conserved currents
bilinear in the “matter” fields. The latter ones determine, at first order,
the “fundamental” representation of the “gauge” group. The transforma-
tions (11) evaluated on Killing tensors ξ must define off-shell symmetries
of the “matter” quadratic action. Their algebra algebra is non-Abelian,
hence the “gauge” algebra is already non-Abelian at first order.
As a suggestive example, one may consider a “matter” sector contain-
ing only a single scalar field.
Noether cubic couplings of a scalar field: The minimally coupled
action of a complex scalar field on flat spacetime, given by
S[φ] =
〈φ | � −m
| φ 〉 − ε
x hµ1µ2... µsJ
µ1µ2... µs + O(ε
is invariant at first order in ε , for any symmetric tensor field ξµ1µ2... µs−1 ,
under infinitesimal symmetry transformations
δξhµ1µ2... µs =
δξ hµ1µ2... µs + O(ε) ,
δξ | φ 〉 = εT | φ 〉+ O(ε
) , (12)
where the symbol of the differential operator T is represented by ξ and the
lower order terms depend on derivatives of ξ,
T = (−i)
µ1...µs−1∂µ1 . . . ∂µs−1 + lower+ on-shell-trivial ,
if and only if the on-shell-conserved current J is equivalent to a Noether
current associated to the coordinate-independent off-shell symmetries of
the free scalar field. This defines a one-to-one correspondence between
equivalence classes of such symmetric Noether currents J , bilinear in φ
and its derivatives, and equivalence classes of such deformations δξφ at
first order.
Proof: The explicit equation expressing the gauge invariance of the min-
imaly coupled action for any symmetric tensor field ξ(x) of rank s − 1
precisely states that the symmetric tensor J of rank s is conserved mod-
ulo terms proportional to field equation of the scalar field φ. The one-
to-one correspondence, precisely explained in [17], between equivalence
classes of on-shell conserved currents and equivalence classes of off-shell
symmetry transformations shows explicitly that J is necessarily related to
a coordinate-independent transformation of the form (12). In turn, these
transformations are obtained by evaluating the transformation (12), at
lowest order in ε and on gauge parameters ξ equal to constant Killing
tensors. The sufficiency is proven by making use of the symmetric con-
served currents of [18]. The second part of the theorem follows from the
fact that trivial currents define trivial deformations and conversely, as it
can be seen explicitly.
In the lower-spin case, one recovers the standard minimal coupling
procedure. For s = 1 , the minimal coupling stops at second order in ε
since Jµ is the U(1) current and hµ is the Abelian vector gauge field. For
s = 2 , the minimal coupling at first order is the usual coupling between
a spin-two gauge field and the energy-momentum tensor Jµν leading to
the coordinate transformations of the scalar field, generated by the vector
fields T = −i ξµ(x) ∂µ . The commutators of such infinitesimal transfor-
mations close and define the Lie bracket of vector fields, so the underlying
gauge symmetry algebra may already be guessed at first order for gravity:
it is the Lie agebra of smooth vector fields, i.e. the Lie algebra for the
group of diffeomorphisms. The minimally coupled action is obtained to
all orders by introducing the Levi-Civita connection.
In the higher-spin case, it should be stressed that the trace condi-
tions on the gauge field and parameter have not been stated in the former
proposition because they may indeed be relaxed in order to simplify its
formulation. (Nevertheless, these constraints may be included by consis-
tently imposing weaker conservation laws on double-traceless currents.)
Moreover, it is convenient to remove trace constraints for searching a
Non-Abelian higher-spin gauge symmetry algebra. Actually, the trace
constraints may be removed for free field theories in several ways (see [19]
for some reviews, and [20] for the latest developments). The Lie algebra of
gauge transformations (12) for the infinite tower of all gauge parameters
(1 6 s < ∞) is a real form of the algebra of linear differential opera-
tors on Rn endowed with i times the commutator as Lie bracket. Notice
also that the unital associative algebra of linear differential operators on
n is isomorphic to the universal enveloping algebra of vector fields on
n . (Strictly speaking, this is true only for polynomial vector fields and
differential operators, more sophisticated mathematical statements may
be required for smooth functions, but this point is only technical.) More
concretely, the symbol of a differential operator of order r is represented
by a symmetric tensor field of rank r. In the light of these remarks, it is
tempting to conjecture that, for higher-spin gauge theories, the algebra of
Hermitian differential operators,
µ1...µr (x) ∂µ1 . . . ∂µr + Hermitian conjugate
generalizes the algebra of infinitesimal diffeomorphisms for gravity. An-
other argument in favour of this conjecture may be presented in the
“gauge” sector by looking at the metric-like formulation of higher-spins
arising from the frame-like formulation of Vasiliev, at first order in the
coupling constant [21].
6 Conclusion
The conclusion is that there are two complementary but distinct ways
of using rigid symmetries of the free theory in order to guess the proper
gauge symmetry principle of higher-spin gauge theories.
On the one hand, the infinite set of rigid symmetries of the free (or,
maybe, even integrable) “matter” sector, might be gauged by the intro-
duction of a connection via a minimal coupling prescription. The idea of
using a massive scalar field as free matter sector and an infinite tower of
massless symmetric tensor fields as interacting gauge sector is in agree-
ment with the isomorphism between the off-shell higher-spin algebra and
the space of reducibility parameters. (If tensor fields are used as free
“matter” sector, then the symmetry algebra could be larger. Following
the lines of the Vasiliev construction in such case, the structure of the uni-
versal enveloping algebra points towards a larger infinite tower of gauge
fields including mixed-symmetry tensors.)
On the other hand, in the free “gauge” sector, rigid symmetries linked
to reducibility parameters may arise from the linearization of the gauge
symmetries of some non-linear action. Thus the complete knowledge of the
rigid symmetries of free higher-spin gauge theories would indicate what
can be the linearized connection.
Acknowledgments
I. Bakas, G. Barnich, N. Boulanger, T. Damour and J. Remmel are
thanked for very useful exchanges. The author is grateful to the orga-
nizers for their invitation to this enjoyable meeting and the opportunity
to present his lecture. The Institut des Hautes Études Scientifiques de
Bures-sur-Yvette is acknowledged for its hospitality.
References
[1] D. Sorokin, AIP Conf. Proc. 767 (2005) 172 [hep-th/0405069];
N. Bouatta, G. Compere and A. Sagnotti, in the proceedings of the
“First Solvay Workshop on Higher-Spin Gauge Theories” (Brussels,
Belgium; May 2004) [hep-th/0409068].
[2] T. Ortin, Gravity and strings (Cambridge, 2004).
[3] F. A. Berends, G. J. H. Burgers and H. van Dam, Nucl. Phys. B 260
(1985) 295.
[4] M. A. Vasiliev, Comptes Rendus Physique 5 (2004) 1101
[hep-th/0409260];
X. Bekaert, S. Cnockaert, C. Iazeolla and M. A. Vasiliev, in the
proceedings of the “First Solvay Workshop on Higher-Spin Gauge
Theories” (Brussels, Belgium; May 2004) [hep-th/0503128].
[5] G. Thompson, J. Math. Phys. 27 (1986) 2693;
R. G. McLenaghan, R. Milson and R. G. Smirnov, C. R. Acad. Sci.
Paris, Ser. I 339 (2004) 621.
http://arxiv.org/abs/hep-th/0405069
http://arxiv.org/abs/hep-th/0409068
http://arxiv.org/abs/hep-th/0409260
http://arxiv.org/abs/hep-th/0503128
[6] G. Barnich and M. Henneaux, Phys. Lett. B 311 (1993) 123
[hep-th/9304057];
M. Henneaux, Contemp. Math. 219 (1998) 93 [hep-th/9712226].
[7] X. Bekaert, N. Boulanger, S. Cnockaert and S. Leclercq, Fortsch.
Phys. 54 (2006) 282 [hep-th/0602092].
[8] X. Bekaert, in the proceedings of the “First Modave Summer School
in Mathematical Physics” (Modave, Belgium; June 2005).
[9] D.E. Littlewood, The theory of group characters and matrix repre-
sentations of groups (Clarendon, 1958);
G. R. E. Black, R. C. King and B. G. Wybourne, J. Phys. A: Math.
Gen. 16 (1983) 1555.
[10] A. Mikhailov, “Notes on higher spin symmetries,” hep-th/0201019.
[11] A. B. Zamolodchikov and A. B. Zamolodchikov, Annals Phys. 120
(1979) 253.
[12] M. A. Vasiliev, Int. J. Mod. Phys. D 5 (1996) 763 [hep-th/9611024].
[13] R. Geroch, J. Math. Phys. 11 (1970) 1955;
M. G. Eastwood, “Higher symmetries of the Laplacian,”
hep-th/0206233.
[14] S. E. Konstein, M. A. Vasiliev and V. N. Zaikin, JHEP 0012 (2000)
018 [hep-th/0010239].
[15] I. Bakas, B. Khesin and E. Kiritsis, Commun. Math. Phys. 151
(1993) 233.
[16] R. K. Dodd and R. K. Bullough, Proc. Roy. Soc. Lond. A 352 (1977)
[17] G. Barnich and F. Brandt, Nucl. Phys. B 633 (2002) 3
[hep-th/0111246].
[18] D. Anselmi, Class. Quant. Grav. 17 (2000) 1383 [hep-th/9906167];
M. A. Vasiliev, in M. Shifman ed., The many faces of the superworld
(World Scientific, 2000) [hep-th/9910096].
[19] D. Francia and A. Sagnotti, Class. Quant. Grav. 20 (2003)
S473 [hep-th/0212185]; J. Phys. Conf. Ser. 33 (2006) 57
[hep-th/0601199].
[20] D. Francia, J. Mourad and A. Sagnotti, Nucl. Phys. B 773 (2007)
203 [hep-th/0701163];
I. L. Buchbinder, A. V. Galajinsky and V. A. Krykhtin, Nucl. Phys.
B 779 (2007) 155 [hep-th/0702161].
[21] X. Bekaert, work in progress.
http://arxiv.org/abs/hep-th/9304057
http://arxiv.org/abs/hep-th/9712226
http://arxiv.org/abs/hep-th/0602092
http://arxiv.org/abs/hep-th/0201019
http://arxiv.org/abs/hep-th/9611024
http://arxiv.org/abs/hep-th/0206233
http://arxiv.org/abs/hep-th/0010239
http://arxiv.org/abs/hep-th/0111246
http://arxiv.org/abs/hep-th/9906167
http://arxiv.org/abs/hep-th/9910096
http://arxiv.org/abs/hep-th/0212185
http://arxiv.org/abs/hep-th/0601199
http://arxiv.org/abs/hep-th/0701163
http://arxiv.org/abs/hep-th/0702161
	Higher-spin interaction problem
	The Noether method
	Free theory symmetries
	Higher-spin algebras
	A gauge principle for higher-spins ?
	Conclusion
ABSTRACT
  The exhaustive study of the rigid symmetries of arbitrary free field theories
is motivated, along several lines, as a preliminary step in the completion of
the higher-spin interaction problem in full generality. Some results for the
simplest example (a scalar field) are reviewed and commented along these lines.

<|endoftext|><|startoftext|>
To Appear in Publications of the Astronomical Society of the Pacific
Preprint typeset using LATEX style emulateapj v. 08/22/09
CALFUSE v3: A DATA-REDUCTION PIPELINE FOR THE FAR ULTRAVIOLET SPECTROSCOPIC
EXPLORER1
W. V. Dixon
, D. J. Sahnow
, P. E. Barrett
, T. Civeit
, J. Dupuis
, A. W. Fullerton
, B. Godard
J.-C. Hsu
, M. E. Kaiser
, J. W. Kruk
, S. Lacour
, D. J. Lindler
, D. Massa
, R. D. Robinson
M. L. Romelfanger
, and P. Sonnentrucker
To Appear in Publications of the Astronomical Society of the Pacific
ABSTRACT
Since its launch in 1999, the Far Ultraviolet Spectroscopic Explorer (FUSE) has made over 4600
observations of some 2500 individual targets. The data are reduced by the Principal Investigator
team at the Johns Hopkins University and archived at the Multimission Archive at Space Telescope
(MAST). The data-reduction software package, called CalFUSE, has evolved considerably over the
lifetime of the mission. The entire FUSE data set has recently been reprocessed with CalFUSE v3.2,
the latest version of this software. This paper describes CalFUSE v3.2, the instrument calibrations
upon which it is based, and the format of the resulting calibrated data files.
Subject headings: instrumentation: spectrographs — methods: data analysis — space vehicles: in-
struments — ultraviolet: general — white dwarfs
1. INTRODUCTION
The Far Ultraviolet Spectroscopic Explorer (FUSE) is a
high-resolution, far-ultraviolet spectrometer operating in
the 905–1187 Å wavelength range. FUSE was launched
in 1999 on a Delta II rocket into a nearly circular, low-
earth orbit with an inclination of 25◦ to the equator and
an approximately 100-minute orbital period. Data ob-
tained with the instrument are reduced by the principal
investigator team at the Johns Hopkins University using
a suite of computer programs called CalFUSE. Both raw
and processed data files are deposited in the Multimis-
sion Archive at Space Telescope (MAST).
CalFUSE evolved considerably in the years following
launch as our increasing knowledge of the spectrograph’s
performance allowed us to correct the data for more and
more instrumental effects. The program eventually be-
came unwieldy, and in 2002 we began a project to re-
write the code, incorporating our new understanding of
the instrument and best practices for data reduction.
1 Based on observations made with the NASA-CNES-CSA Far
Ultraviolet Spectroscopic Explorer. FUSE is operated for NASA by
the Johns Hopkins University under NASA contract NAS5-32985.
2 Department of Physics and Astronomy, Johns Hopkins Univer-
sity, 3400 N. Charles Street, Baltimore, MD 21218
3 Space Telescope Science Institute, ESS/SSG, 3700 San Martin
Drive, Baltimore, MD 21218
4 Current address: Earth Orientation Department, U.S. Naval
Observatory, 3450 Massachusetts Avenue NW, Washington, DC
20392
5 Primary affiliation: Centre National d’Études Spatiales, 2 place
Maurice Quentin, 75039 Paris Cedex 1, France
6 Current address: Canadian Space Agency, 6767 route de
l’Aéroport, Longueuil, QC, Canada, J3Y 8Y9
7 Primary affiliation: Department of Physics and Astronomy,
University of Victoria, P. O. Box 3055, Victoria, BC V8W 3P6,
Canada
8 Current address: Institut d’Astrophysique de Paris, 98 bis,
boulevard Arago, 75014 Paris, France
9 Retired
10 Current address: Sydney University, NSW 2006, Australia
11 Sigma Space Corporation, 4801 Forbes Boulevard, Lanham,
MD 20706
12 SGT, Inc., NASA Goddard Space Flight Center, Code 665.0,
Greenbelt, MD 20771
The result is CalFUSE v3, which produces a higher qual-
ity of calibrated data while running ten times faster than
previous versions. The entire FUSE archive has recently
been reprocessed with CalFUSE v3.2; we expect this to
be the final calibration of these data.
In this paper, we describe CalFUSE v3.2.0 and its cal-
ibrated data products. Because this document is meant
to serve as a resource for researchers analyzing archival
FUSE spectra, we emphasize the interpretation of pro-
cessed data files obtained fromMAST rather than the de-
tails of designing or running the pipeline. An overview of
the FUSE instrument is provided in § 2, and an overview
of the pipeline in § 3. Section 4 presents a detailed de-
scription of the pipeline modules and their subroutines.
The FUSE wavelength and flux calibration are discussed
in § 5, and a few additional topics are considered in § 6. A
detailed description of the various file formats employed
by CalFUSE is presented in the Appendix.
Additional documentation available from MAST in-
cludes the CalFUSE Homepage,13 The CalFUSE
Pipeline Reference Guide,14 The FUSE Instrument and
Data Handbook,15 and The FUSE Data Analysis Cook-
book.16
2. THE FUSE INSTRUMENT
FUSE consists of four co-aligned prime-focus tele-
scopes, each with its own Rowland spectrograph (Fig. 1).
Two of the four channels employ Al+LiF optical coatings
and record spectra over the wavelength range∼ 990–1187
Å, while the other two use SiC coatings, which provide
reflectivity to wavelengths below the Lyman limit. The
four channels overlap between 990 and 1070 Å. Spectral
resolution is roughly 20,000 (λ/∆λ) for point sources.
For a complete description of FUSE, see Moos et al.
(2000) and Sahnow et al. (2000a).
At the prime focus of each mirror lies a focal-plane as-
13 http://archive.stsci.edu/fuse/calfuse.html
14 http://archive.stsci.edu/fuse/pipeline.html
15 http://archive.stsci.edu/fuse/dhbook.html
16 http://archive.stsci.edu/fuse/cookbook.html
http://arxiv.org/abs/0704.0899v1
http://archive.stsci.edu/fuse/calfuse.html
http://archive.stsci.edu/fuse/pipeline.html
http://archive.stsci.edu/fuse/dhbook.html
http://archive.stsci.edu/fuse/cookbook.html
2 Dixon et al.
sembly (or FPA, shown in Fig. 2) containing three spec-
trograph entrance apertures: the low-resolution aperture
(LWRS; 30′′ × 30′′), used for most observations, the
medium-resolution aperture (MDRS; 4′′ × 20′′), and the
high-resolution aperture (HIRS; 1.25′′ × 20′′). The ref-
erence point (RFPT) is not an aperture; when a target
is placed at this location, the three apertures sample the
background sky. For a particular exposure, the FITS
file header keywords RA TARG and DEC TARG con-
tain the J2000 coordinates of the aperture (or RFPT)
listed in the APERTURE keyword, while the keyword
APER PA contains the position angle of the −Y axis (in
the FPA coordinate system; see Fig. 2), corresponding to
a counter-clockwise rotation of the spacecraft about the
target (and thus about the center of the target aperture).
The spectra from the four instrument channels are
imaged onto two photon-counting microchannel-plate
(MCP) detectors, labeled 1 and 2, with a LiF spectrum
and a SiC spectrum on each (Fig. 1). Each detector is
comprised of two MCP segments, labeled A and B. Raw
science data from each detector segment are stored in
a separate data file; an exposure thus yields four raw
data files, labeled 1A, 1B, 2A, and 2B. Because the three
apertures are open to the sky at all times, the LiF and
SiC channels each generate three spectra, one from each
aperture. In most cases, the non-target apertures are
empty and sample the background sky. Figure 3 presents
a fully-corrected image of detector 1A obtained during
a bright-earth observation. The emission features in all
three apertures are geocoronal. Note that the LiF1 wave-
length scale increases to the right, while the SiC1 scale
increases to the left. The Lyman β λ1026 airglow feature
is prominent in each aperture.
Two observing modes are available: In photon-address
mode, also known as time-tag or TTAG mode, the X and
Y coordinates and pulse height (§ 4.3.7) of each detected
photon are stored in a photon-event list. A time stamp
is inserted into the data stream, typically once per sec-
ond. Data from the entire active area of the detector are
recorded. Observing bright targets in time-tag mode can
rapidly fill the spacecraft recorder. Consequently, when
a target is expected to generate more than ∼ 2500 counts
s−1 across all four detector segments, the data are stored
in spectral-image mode, also called histogram or HIST
mode. To conserve memory, histogram data are (usu-
ally) binned by eight pixels in Y (the spatial dimension),
but unbinned in X (the dispersion dimension). Only data
obtained through the target aperture are recorded. Indi-
vidual photon arrival time and pulse height information
is lost. The orbital velocity of the FUSE spacecraft is 7.5
km s−1. Since Doppler compensation is not performed
by the detector electronics, histogram exposures must be
kept short to preserve spectral resolution; a typical his-
togram exposure is about 500 s in length.
The front surfaces of the FPAs are reflective in visible
light. On the two LiF channels, light not passing through
the apertures is reflected into a visible-light CCD camera.
Images of stars in the field of view around the apertures
are used for acquisition and guiding by this camera sys-
tem, called the Fine Error Sensor (FES). FUSE carries
two redundant FES cameras, which were provided by
the Canadian Space Agency. FES A views the FPA on
the LiF1 channel, and FES B views the LiF2 FPA. Dur-
ing initial checkout, FES A was designated the default
camera and was used for all science observations until it
began to malfunction in 2005. In July of that year, FES
B was made the default guide camera. Implications of
the switch from FES A to FES B are discussed in § 6.1.
3. OVERVIEW OF CALFUSE
The new CalFUSE pipeline was designed with three
principles mind: the first was that, to the extent possi-
ble, we follow the path of a photon backwards through
the instrument, correcting for the instrumental effects in-
troduced in each step. The principal steps in this path,
together with the effects imparted by each, are listed be-
low. Most of the optical and electronic components in
this list are labeled in Fig. 1.
1. Satellite motion imparts a Doppler shift.
2. Satellite pointing instabilities shift the target image
within (or out of) the aperture.
3. Thermally-induced mirror motions shift the target im-
age within (or out of) the aperture.
4. FPA offsets shift the spectrum on the detector.
5. Thermally-induced motions of the spectrograph grat-
ings shift the target image within (or out of) the aper-
ture.
6. Ion-repelling wire grids can cast shadows called
“worms.”
7. Detector effects include quantum efficiency, flat field,
dead spots, and background.
8. The spectra are distorted by temperature-, count-rate,
time-, and pulse-height-dependent errors in the photons’
measured X and Y coordinates, as well as smaller-scale
geometric distortions in the detector image.
9. Count-rate limitations in the detector electronics and
the IDS data bus are sources of dead time.
To correct for these effects, we begin at the bottom
of the list and (to the extent possible) work backwards.
First, we adjust the photon weights to account for data
lost to dead time (9) and correct the photons’ X and
Y coordinates for a variety of detector distortions (8).
Second, we identify periods of unreliable, contaminated,
or missing data. Third, we correct the photons’ X and
Y coordinates for grating (5), FPA (4), mirror (3), and
spacecraft (2) motions. Fourth, we assign a wavelength
to each photon based on its corrected X and Y coor-
dinates (5), then convert to a heliocentric wavelength
scale (1). Finally, we correct for detector dead spots
(7); model and subtract the detector and scattered-light
backgrounds (7); and extract (using optimal extraction,
if possible), flux calibrate (7) and write to separate FITS
files the target’s LiF and SiC spectra. Note that we can-
not correct for the effects of worms (6) or the detector
flat field (7).
Our second principal was to make the pipeline as mod-
ular as possible. CalFUSE is written in the C program-
ming language and runs on the Solaris, Linux, and Mac
OS X (versions 10.2 and higher) operating systems. The
pipeline consists of a series of modules called by a shell
script. Individual modules may be executed from the
command line. Each performs a set of related correc-
tions (screen data, remove motions, etc.) by calling a
series of subroutines.
Our third principal was to maintain the data as a pho-
ton list (called an intermediate data file, or IDF) un-
til the final module of the pipeline. Input arrays are
read from the IDF at the beginning of each module, and
CalFUSE: The FUSE Calibration Pipeline 3
output arrays are written at the end. Bad photons are
flagged but not discarded, so the user can examine, fil-
ter, and combine processed data files without re-running
the pipeline. Like all FUSE data, IDFs are stored as
FITS files (Hanisch et al. 2001); the various file formats
employed by CalFUSE are described in the Appendix.
A FUSE observation consists of a set of exposures ob-
tained with a particular target in a particular aperture
on a particular date. Each exposure generates four raw
data files, one per detector segment, and each raw data
file yields a pair of calibrated spectra (LiF and SiC), for
a total of 8 calibrated spectral files per exposure. Each
raw data file is processed individually by the pipeline.
Error and status messages are written to a trailer file
(described in § 4.10). Spectra are extracted only for
the target aperture and are binned in wavelength. Bin-
ning can be set by the user, but the default is 0.013 Å,
which corresponds to about two detector pixels or one
fourth of a point-source resolution element. After pro-
cessing, additional software is used to generate a set of
observation-level spectral files, the ALL, ANO, and NVO
files described in § 4.11. A complete list of FUSE data
files and file-naming conventions may be found in The
FUSE Instrument and Data Handbook. All of the expo-
sures that constitute an observation are processed and
archived together.
Investigators who wish to re-process their data may
retrieve the CalFUSE source code and all associated cal-
ibration files from the CalFUSE Homepage. Instructions
for running the pipeline and detailed descriptions of the
calibration files are provided in The CalFUSE Pipeline
Reference Guide. Note that, within the CalFUSE soft-
ware distribution, all of the calibration files, including
the FUSE.TLE file (§ 4.2), are stored in the directory
v3.2/calfiles, while all of the parameter files, including
master calib file.dat and the screening and parameter
files (SCRN CAL and PARM CAL; § 4.2), are stored
in the directory v3.2/parmfiles.
4. STEP BY STEP
In this section, we discuss the pipeline subroutine
by subroutine. Our goal is to describe the algorithms
employed by each subroutine and any shortcomings or
caveats of which the user should be aware.
4.1. OPUS
The Operations Pipeline Unified System (OPUS) is
the data-processing system used by the Space Telescope
Science Institute to reduce science data from the Hub-
ble Space Telescope (HST). We use a FUSE-specific ver-
sion of OPUS to manage our data processing (Rose et al.
1998). OPUS ingests the data downlinked by the space-
craft and produces the data files that serve as input to
the CalFUSE pipeline. OPUS then manages the execu-
tion of the pipeline and the files produced by CalFUSE
and calls the additional routines that combine spectra
from each channel and exposure into a set of observation-
level spectral files. OPUS reads the FUSE Mission Plan-
ning Database (which contains target information from
the individual observing proposals and instrument con-
figuration and scheduling information from the mission
timeline) to populate raw file header keywords and to
verify that all of the data expected from an observation
were obtained.
OPUS generates six data files for each exposure. Four
are raw data files (identified by the suffix “fraw.fit”), one
for each detector segment. One is a housekeeping file
(“hskpf.fit”) containing time-dependent spacecraft engi-
neering data. Included in this file are detector volt-
ages, count rates, and spacecraft-pointing information.
The housekeeping file is used to generate a jitter file
(“jitrf.fit”), which contains information needed to cor-
rect the data for spacecraft motion during an exposure.
Detailed information on the format and contents of each
file is provided in the Appendix.
4.2. Generate the Intermediate Data File
The first task of the pipeline is to convert the raw data
file into an intermediate data file (IDF), which maintains
the data in the form of a photon-event list. (The format
and contents of the IDF are described in § A-3.) For data
obtained in time-tag mode, the module cf ttag init
merely copies the arrival time, X and Y detector coor-
dinates, and pulse-height of each photon event from the
raw file to the TIME, XRAW, YRAW, and PHA arrays of
the IDF. A fifth array, the photon weight, is initially set
to unity. Photons whose X and Y coordinates place them
outside of the active region of the detector are flagged as
described in § 4.3.8. Raw histogram data are stored by
OPUS as an image; the module cf hist init converts
each non-zero pixel of that image into a single entry in
the IDF, with X and Y equal to the pixel coordinates
(mapped to their location on the unbinned detector),
arrival time set to the mid-point of the exposure, and
pulse height set to 20 (possible values range from 0 to
31). The arrival time and pulse height are modified later
in the pipeline. The photon weight is set to the number
of accumulated counts on the pixel, i.e., the number of
photons detected on that region of the detector.
The IDF has two additional extensions. The first con-
tains the good-time intervals (GTIs), a series of start
and stop times (in seconds from the exposure start time
recorded in the file header) computed by OPUS, when
the data are thought to be valid. For time-tag data, this
extension is copied directly from the raw data file. For
histogram data, a single GTI is generated with START =
0 and STOP = EXPTIME (the exposure time computed
by OPUS). The final extension is called the timeline ta-
ble and consists of 16 arrays containing status flags and
spacecraft-position, detector high-voltage, and count-
rate parameters tabulated once per second throughout
the exposure. Only the day/night and OPUS bits of the
time-dependent status flags are populated (§ A-3); the
others are initialized to zero. The spacecraft-position pa-
rameters are computed as described below. The detector
voltages and the values of various counters are read from
the housekeeping data file.
A critical step in the initialization of the IDF is pop-
ulating the file-header keywords that describe the space-
craft’s orbit and control the subsequent actions of the
pipeline. The names of all calibration files to be used by
the pipeline are read from the file master calib file.dat
and written to file-header keywords. (Keywords for each
calibration file are included in the discussion that fol-
lows.) Three sets of calibration files are time-dependent:
the effective area is interpolated from the two files
with effective dates immediately preceding and follow-
ing the exposure start time (these file names are stored
4 Dixon et al.
in the header keywords AEFF1CAL and AEFF2CAL);
the scattered-light model is taken from the file with an
effective date immediately preceding the exposure start
time (keyword BKGD CAL); and the orbital elements
are read from the FUSE.TLE file, an ASCII file contain-
ing NORAD two-line elements for each day of the mis-
sion. These two-line elements are used to populate both
the orbital ephemeris keywords in the IDF file header and
the various spacecraft-position arrays in the timeline ta-
ble. Finally, a series of data-processing keywords is set
to either PERFORM or OMIT the subsequent steps of
the pipeline. Once a step is performed, the correspond-
ing keyword is set to COMPLETE. Some user control of
the pipeline is provided by the screening and parameter
files (SCRN CAL and BKGD CAL), which allow one,
for example, to select only night-time data or to turn
off background subtraction. An annotated list of file-
header keywords, including the calibration files used by
the pipeline, is provided in the FUSE Instrument and
Data Handbook.
Caveats: Occasionally, photon arrival times in raw
time-tag data files are corrupted. When this happens,
some fraction of the photon events have identical, enor-
mous TIME values, and the good-time intervals contain
an entry with START and STOP set to the same large
value. The longest valid exposure spans 55 ks (though
most are ∼ 2 ks long). If an entry in the GTI table ex-
ceeds this value, the corresponding entry in the timeline
table is flagged as bad (using the “photon arrival time
unknown” flag; § A-3). Bad TIME values less than 55 ks
will not be detected by the pipeline.
Raw histogram files may also be corrupted. OPUS fills
missing pixels in a histogram image with the value 21865.
The pipeline sets the WEIGHT of such pixels to zero and
flags them as bad (by setting the photon’s “fill-data bit”;
§ A-3). Occasionally, a single bit in a histogram image
pixel is flipped, producing (for high-order bits) a “hot
pixel” in the image. The pipeline searches for pixels with
values greater than 8 times the average of their neighbors,
identifies the flipped bit, and resets it.
One or more image extensions may be missing from a
raw histogram file (§A-2). If no extensions are present,
the keyword EXP STAT in the IDF header is set to −1.
Exposures with non-zero values of EXP STAT are pro-
cessed normally by the pipeline, but are not included in
the observation-level spectral files ultimately delivered to
MAST (§ 4.11). Though the file contains no data, the
header keyword EXPTIME is not set to zero.
Early versions of the CalFUSE pipeline did not make
use of the housekeeping files, but instead employed engi-
neering information downloaded every five minutes in a
special “engineering snapshot” file. That information is
used by OPUS to populate a variety of header keywords
in the raw data file. If a housekeeping file is not avail-
able, CalFUSE v3 uses these keywords to generate the
detector high-voltage and count-rate arrays in the time-
line table. Should these header keywords be corrupted,
the pipeline issues a warning and attempts to estimate
the corrupted values. In such cases, it is wise to compare
the resulting dead-time corrections (§ 4.3.2) with those
of other, uncorrupted exposures of the same target.
4.3. Convert to FARF
The pipeline module cf convert to farf is designed
to remove detector artifacts. Our goal is to construct
the data set that would be obtained with an ideal detec-
tor. The corrections can be grouped into two categories:
dead-time effects, which are system limitations that re-
sult in the loss of photon events recorded by the detec-
tor, and positional inaccuracies, i.e., errors in the raw
X and Y pixel coordinates of individual photon events.
The coordinate system defined by these corrections is
called the flight alignment reference frame, or FARF.
Corrected coordinates for each photon event are written
to the XFARF and YFARF arrays of the IDF.
4.3.1. Digitizer Keywords
The first subroutine of this module,
cf check digitizer, merely compares a set of 16
IDF file header keywords, which record various detector
settings, with reference values stored in the calibration
file DIGI CAL. Significant differences result in warning
messages being written to both the file header and the
exposure trailer file. Such warning messages should be
taken seriously, as data obtained when the detectors are
not properly configured are likely to be unusable. Be-
sides issuing a warning, the program sets the EXP STAT
keyword in the IDF header to −2.
4.3.2. Detector Dead Time
The term “dead time” refers specifically to the finite
time interval required by the detector electronics to pro-
cess a photon event. During this interval, the detector is
“dead” to incoming photons. The term is more generally
applied to any loss of data that is count-rate dependent.
There are three major contributions to the effective de-
tector dead time on FUSE. The first is due to limitations
in the detector electronics, which at high count rates may
not be able to process photon events as fast as they ar-
rive. The correction for this effect is computed separately
for each segment from the count rate measured at the
detector anode by the Fast Event Counter (FEC) and
recorded to the engineering data stream, typically once
every 16 seconds. The functional form of the correction
was provided by the detector development group at the
University of California, Berkeley, and its numerical con-
stants were determined from in-flight calibration data. It
is applied by the subroutine cf electronics dead time.
A second contribution to the dead time comes from
the way that the Instrument Data System (IDS) pro-
cesses counts coming from the detector. The IDS can
accept at most 8000 counts per second in time-tag mode
and 32000 counts per second in histogram mode from
the four detector segments (combined). At higher count
rates, photon events are lost. To correct for such losses,
the subroutine cf ids dead time compares the Active
Image Counter (AIC) count rate, measured at the back
end of the detector electronics, with the maximum al-
lowed rate. The IDS dead-time correction is the ratio of
these two numbers (or unity, whichever is greater).
A third contribution occurs when time-tag data are
bundled into 64 kB data blocks in the IDS bulk memory.
This memory is organized as a software FIFO (first-in,
first-out) memory buffer, and the maximum data transfer
rate from it to the spacecraft recorder (the FIFO drain
rate) is approximately 3500 events per second. At higher
count rates, the FIFO will eventually fill, resulting in the
CalFUSE: The FUSE Calibration Pipeline 5
loss of one or more data blocks. The effect appears as
a series of data drop-outs, each a few seconds in length,
in the raw data files. The correction, computed by the
subroutine cf fifo dead time, is simply the ratio of the
AIC count rate to the FIFO drain rate. When triggered,
this correction incorporates (and replaces) the IDS cor-
rection discussed above.
The total dead-time correction (always ≥ 1.0) is
simply the product of the detector electronics and
IDS corrections. It is computed (by the subroutine
cf apply dead time) once each second and applied to
the data by scaling the WEIGHT associated with each
photon event. The mean value of the detector electron-
ics, IDS, and total dead-time corrections are stored in
the DET DEAD, IDS DEAD, and TOT DEAD header
keywords, respectively. Other possible sources of dead
time, such as losses due to the finite response time of the
MCPs, have a much smaller effect and are ignored.
Caveats: Our dead-time correction algorithms are in-
appropriate for very bright targets. If the header key-
word TOT DEAD > 1.5, then the exposure should not
be considered photometric. If the housekeeping file for
a particular exposure is missing, the file header key-
words from which the count rates are calculated appear
to be corrupted, and either DET DEAD or IDS DEAD
is > 1.5, then the dead-time correction is assumed to be
meaningless and is set to unity. In both of these cases,
warning messages are written to the file header and the
trailer file.
4.3.3. Temperature-Dependent Changes in Detector
Coordinates
The X and Y coordinates of a photon event do not
correspond to a physical pixel on the detector, but
are calculated from timing and voltage measurements
of the incoming charge cloud (Siegmund et al. 1997;
Sahnow et al. 2000b). As a result, the detector coor-
dinate system is subject to drifts in the detector elec-
tronics caused by temperature changes and other effects.
To track these drifts, two signals are periodically injected
into the detector electronics. These “stim pulses” appear
near the upper left and upper right corners of each de-
tector, outside of the active spectral region. The stim
pulses are well placed for tracking changes in the scale
and offset of the X coordinate, but they are not well
enough separated in Y to track scale changes along that
axis. The subroutine cf thermal distort determines
the X and Y centroids of the stim pulses, computes the
linear transformation necessary to move them to their
reference positions, and applies that transformation to
the X and Y coordinates of each photon event in the re-
gions of the stim pulses and in the active region of the
detector. Events falling within 64 pixels (in X and Y)
of the expected stim-pulse positions are flagged by set-
ting the stim-pulse bit in the LOC FLGS array (§ A-3).
In raw histogram files, the stim pulses are stored in a
pair of image extensions. If either of these extensions is
missing, the pipeline reads the expected positions of the
stim pulses from the calibration file STIM CAL and ap-
plies the corresponding correction. This works (to first
order) because the stim pulses drift slowly with time,
though short-timescale variations cannot be corrected if
the stim pulses are absent.
4.3.4. Count-Rate Dependent Changes in Detector Y Scale
For reasons not yet understood, the detector Y scale
varies with the count rate, in the sense that the detector
image for a high count-rate exposure is expanded in Y. To
measure this effect, we tabulated the positions of individ-
ual detector features (particularly bad-pixel regions) as a
function of the FEC count rate (§ 4.3.2) and determined
the Y corrections necessary to shift each detector feature
to its observed position in a low count-rate exposure.
From this information, we derived the calibration file
RATE CAL for each detector segment. The correction
is stored as a two-dimensional image: the first dimension
represents the count rate and the second is the observed
Y pixel value. The value of each image pixel is the Y shift
(in pixels) necessary to move a photon to its corrected
position. The subroutine cf count rate y distort ap-
plies this correction to each photon event in the active
region of the detector. For time-tag data, the FEC count
rate is used to compute a time- and Y-dependent correc-
tion; for histogram data, the weighted mean of the FEC
count rate is used to derive a set of shifts that depends
only on Y.
4.3.5. Time-Dependent Changes in Detector Coordinates
As the detector and its electronics age, their proper-
ties change, resulting in small drifts in the computed
coordinates of photon events. These changes are most
apparent in the Lyman β airglow features observed in
each of the three apertures of the LiF and SiC channels
(Fig. 3), which drift slowly apart in Y as the mission pro-
gresses, indicating a time-dependent stretch in the detec-
tor Y scale. To correct for this stretch, the subroutine
cf time xy distort applies a time-dependent correction
(stored in the calibration file TMXY CAL) to the Y co-
ordinate of each photon event in the active region of the
detector.
Caveats: Although there is likely to be a similar change
to the X coordinate, no measurement of time-dependent
drifts in that dimension is available, so no correction is
applied.
4.3.6. Geometric Distortion
In an image of the detector generated from raw X and
Y coordinates, the spectrum is not straight, but wig-
gles in the Y dimension (Fig. 4). To map these geo-
metric distortions, we made use of two wire grids (the
so-called “quantum efficiency” and “plasma” grids) that
lie in front of each detector segment. Both grids are
regularly spaced and cover the entire active area of the
detectors. Although designed to be invisible in the spec-
tra, they cast sharp shadows on the detector when il-
luminated directly by on-board stimulation (or “stim”)
lamps. We determined the shifts necessary to straighten
these shadows. Their spacing is approximately 1 mm, too
great to measure fine-scale structure in the X dimension,
but sufficient for the Y distortion. Geometric distortions
in the X dimension have the effect of compressing and
expanding the spectrum in the dispersion direction, so
the X distortion correction is derived in parallel with the
wavelength calibration as described in § 5.1. The geomet-
ric distortion corrections are stored in a set of calibration
files (GEOM CAL) as pairs of 16384× 1024 images, one
each for the X and Y corrections. The value of each im-
6 Dixon et al.
age pixel is the shift necessary to move a photon to its
corrected position. This shift is applied by the subrou-
tine cf geometric distort.
Caveats: Though designed to be invisible, the wire
grids can cast shadows that are visible in the spectra of
astrophysical targets. These shadows are the “worms”
discussed in § 6.3.
4.3.7. Pulse-Height Variations in Detector X Scale
The FUSE detectors convert each ultraviolet photon
into a shower of electrons, for which the detector elec-
tronics calculate the X and Y coordinates and the to-
tal charge, or pulse height. Prolonged exposure to pho-
tons causes the detectors to become less efficient at
this photon-to-electron conversion (a phenomenon called
“gain sag”), and the mean pulse height slowly decreases.
Unfortunately, the X coordinate of low-pulse-height pho-
ton events is systematically miscalculated by the detec-
tor electronics. As the pulse height decreases with time,
spectral features appear to “walk” across the detector.
The strength of the effect depends on the cumulative
photon exposure experienced by each pixel and therefore
varies with location on the detector.
To measure the error in X as a function of pulse height,
we used data from long stim lamp exposures to construct
a series of 32 detector images, each containing events
with a single pulse height (allowed values range from
0 to 31). We stepped through each image in X, com-
puting the shift (∆X) necessary to align the shadow of
each grid wire with the corresponding shadow in a stan-
dard image constructed from photon events with pulse
heights between 16 and 20. The shifts were smoothed
to eliminate discontinuities and stored in calibration files
(PHAX CAL) as a two-dimensional image: the first di-
mension represents the observed X coordinate, and the
second is the pulse height. The value of each image
pixel is the walk correction (∆X) to be added to the
observed value of X. This correction, assumed to be in-
dependent of detector Y position, is applied by the sub-
routine cf pha x distort.
Caveats: For time-tag data, the walk correction is
straightforward and reasonably accurate. For histogram
data, pulse-height information is unavailable, so the
subroutine cf modify hist pha assigns to each photon
event the mean pulse height for that aperture, derived
from contemporaneous time-tag observations and stored
in the calibration file PHAH CAL. While this trick places
time-tag and histogram data on the same overall wave-
length scale, small-scale coordinate errors due to local-
ized regions of gain sag (e.g., around bright airglow lines,
particularly Lyman β) remain uncorrected in histogram
spectra.
4.3.8. Detector Active Region
When the IDF is first created, photon events with co-
ordinates outside the active region of the detector are
flagged as bad (§ 4.2). Once their coordinates are con-
verted to the FARF, the subroutine cf active region
flags as bad any photons that have been repositioned
beyond the active region of the detector. These limits
are read from the electronics calibration file (stored un-
der the header keyword ELEC CAL). Allowed values are
800 ≤ XFARF ≤ 15583, 0 ≤ YFARF ≤ 1023. The
active-area bit is written to the LOC FLGS array.
4.3.9. Uncorrected Detector Effects
CalFUSE does not perform any sort of flat-field correc-
tion. Pre-launch flat-field observations differ sufficiently
from in-flight data to make them unsuitable for this pur-
pose, and in-flight flat-field data are unavailable. (Even
if such data were available, any flat-field correction would
be only approximate, because MCPs do not have physical
pixels for which pixel-to-pixel variations can be clearly
delineated; § 4.3.3). As a result, detector fixed-pattern
noise limits the signal-to-noise ratio achievable in obser-
vations of bright targets. To the extent that grating,
mirror, and spacecraft motions shift the spectrum on the
detector during an exposure, these fixed-pattern features
may be averaged out. For some targets, we deliberately
move the FPAs between exposures to place the spectrum
on different regions of the detector. Combining the ex-
tracted spectra from these exposures can significantly im-
prove the resulting signal-to-noise ratio (§ 4.5.5). Other
detector effects (including the moiré pattern discussed in
§ 6.4) are described in the FUSE Instrument and Data
Handbook.
4.4. Screen Photons
The module cf screen photons calls subroutines de-
signed to identify periods of potentially bad data, such
as Earth limb-angle violations, SAA passages, and de-
tector bursts. A distinct advantage of CalFUSE v3 over
earlier versions of the pipeline is that bad data are not
discarded, but merely flagged, allowing users to mod-
ify their selection criteria without having to re-process
the data. To speed up processing, the pipeline calcu-
lates the various screening parameters once per second
throughout the exposure, sets the corresponding flags in
the STATUS FLAGS array of the timeline table, then
copies the flags from the appropriate entry of the time-
line table into the TIMEFLGS array for each photon
event (§ A-3). Many of the screening parameters applied
by the pipeline are set in the screening parameter file
(SCRN CAL). Other parameters are stored in various
calibration files as described below.
4.4.1. Airglow Events
Numerous geocoronal emission features lie within the
FUSE waveband (Feldman et al. 2001). While the
pipeline processes airglow photons in the same manner
as all other photon events in the target aperture, it is
occasionally helpful to exclude from consideration re-
gions of the detector likely to be contaminated by geo-
coronal or scattered solar emission. These regions are
listed in the calibration file AIRG CAL; the subroutine
cf screen airglow flags as airglow (by setting the air-
glow bit of the LOC FLGS array in the photon-event list)
all photon events falling within the tabulated regions –
even for data obtained during orbital night, when many
airglow features are absent.
4.4.2. Limb Angle
Spectra obtained when a target lies near the earth’s
limb are contaminated by scattered light from strong
geocoronal Lyman α and O I emission. To minimize this
effect, the subroutine cf screen limb angle reads the
LIMB ANGLE array of the timeline table, identifies pe-
riods when the target violates the limb-angle constraint,
CalFUSE: The FUSE Calibration Pipeline 7
and sets the corresponding flag in the STATUS FLAGS
array of the timeline table. Minimum limb angles for day
and night observations are read from the BRITLIMB
and DARKLIMB keywords of the screening parameter
file and copied to the IDF file header. The default limits
are 15◦ during orbital day and 10◦ during orbital night.
4.4.3. SAA Passages
The South Atlantic Anomaly (SAA) marks a depres-
sion in the earth’s magnetic field that allows particles
trapped in the Van Allen belts to reach low altitudes.
The high particle flux in this region raises the background
count rate of the FUSE detectors to unacceptable levels.
The subroutine cf screen saa compares the spacecraft’s
ground track, recorded in the LONGITUDE and LAT-
ITUDE arrays of the timeline table, with the limits of
the SAA (stored in the calibration file SAAC CAL as
a binary table of latitude-longitude pairs) and flags as
bad periods when data were taken within the SAA. Our
SAA model was derived from orbital information and
on-board counter data from the first three years of the
FUSE mission.
Caveats: Because the SAA particle flux is great enough
to damage the FUSE detectors, we end most exposures
before entering the SAA and lower the detector high volt-
age during each SAA pass. As a result, very little data
is actually flagged by this screening step.
4.4.4. Detector High Voltage
The detector high voltage is set independently for each
detector segment (1A, 1B, 2A, 2B). During normal op-
erations, the voltage on each segment alternates between
its nominal full-voltage and a reduced SAA level. The
SAA level is low enough that the detectors are not dam-
aged by the high count rates that result from SAA passes,
and it is often used between science exposures to mini-
mize detector exposure to bright airglow emission. The
full-voltage level is the normal operating voltage used
during science exposures. It is raised regularly to com-
pensate for the effects of detector gain sag. Without
this compensation, the mean pulse height of real photon
events would gradually fall below our detection thresh-
old. Unfortunately, there is a limit above which the
full-voltage level cannot be raised. Detector segment
2A reached this limit in 2003 and has not been raised
since; it will gradually become less sensitive as the frac-
tion of low-pulse-height events increases. The subroutine
cf screen high voltage reads the instantaneous value
of the detector high voltage from the HIGH VOLTAGE
array of the timeline table, compares it with the nominal
full-voltage level (stored as a function of time in the cali-
bration file VOLT CAL), and flags periods of low voltage
as bad.
For any number of reasons, an exposure may be ob-
tained with the detector high voltage at less than the
full-voltage level. To preserve as much of this data as pos-
sible, we examined all of the low-voltage exposures taken
during the first four years of the mission and found that,
for detector segments 1A, 1B, and 2B, the data quality is
good whenever the detector high voltage is greater than
85% of the nominal (time-dependent) full-voltage level.
For segment 2A, data obtained with the high voltage
greater than 90% of full are good, lower than 80% are
bad, and between 80 and 90% are of variable quality. In
this regime, the pipeline flags the affected data as good,
but writes warning messages to both the IDF header and
the trailer file. When this warning is present in time-tag
data, the user should examine the distribution of pulse
heights in the target aperture to ensure that the photon
events are well separated from the background (§ 4.4.12).
For histogram data, the spectral photometry and wave-
length scale are most likely to be affected.
Caveats: If the header keywords indicate that the de-
tector voltage was high, low, or changed during an ex-
posure, the IDF initialization routines (§ 4.2) write a
warning message to the trailer file. If a valid housekeep-
ing file is available for the exposure, this warning may
be safely ignored, because the pipeline uses housekeep-
ing information to populate the HIGH VOLTAGE array
in the timeline table and properly excludes time inter-
vals when the voltage was low. If the housekeeping file
is not present, each entry of the HIGH VOLTAGE array
is set to the “HV bias maximum setting” reported in the
IDF header. In this case, the pipeline has no information
about time-dependent changes in the detector high volt-
age, and warnings about voltage-level changes should be
investigated by the user.
4.4.5. Event Bursts
Occasionally, the FUSE detectors register large count
rates for short periods of time. These event bursts can
occur on one or more detectors and often have a complex
distribution across the detector, including scalloping and
sharp edges (Fig. 5). CalFUSE includes a module that
screens the data to identify and exclude bursts. The sub-
routine cf screen burst computes the time-dependent
count rate using data from background regions of the de-
tector (excluding airglow features) and applies a median
filter to reject time intervals whose count rates differ by
more than 5 standard deviations (the value may be set
by the user) from the mean. The algorithm rejects any
time interval in which the background rate rises rapidly,
as when an exposure extends into an SAA or the tar-
get nears the earth limb. The background rate com-
puted by the burst-rejection algorithm is stored in the
BKGD CNT RATE array of the timeline table and in-
cluded on the count-rate plots generated for each expo-
sure (§ 4.10). Burst rejection is possible only for data
obtained in time-tag mode.
Caveats: Careful examination of long background ob-
servations reveals that many are contaminated by emis-
sion from bursts too faint to trigger the burst-detection
algorithm. Observers studying, for example, diffuse
emission from the interstellar medium should be alert
to the possibility of such contamination.
4.4.6. Spacecraft Drift
Pointing of the FUSE spacecraft was originally con-
trolled with four reaction wheels, which typically main-
tained a pointing accuracy of 0.2–0.3 arc seconds. In late
2001, two of the reaction wheels failed, and it became
necessary to control the spacecraft orientation along one
axis with magnetic torquer bars. The torquer bars can
exert only about 10% of the force produced by the re-
action wheels, and the available force depends on the
strength and relative orientation of the earth’s magnetic
field. Thus, spacecraft drift increased dramatically along
this axis, termed the antisymmetric or A axis. Drifts
8 Dixon et al.
about the A axis shift the spectra of point-source tar-
gets in a direction 45◦ from the dispersion direction (i.e.,
∆X = ∆Y ). These motions can substantially degrade
the resolution of the spectra, so procedures have been
implemented to correct the data for spacecraft motion
during an exposure. For time-tag observations of point
sources, we reposition individual photon events. For his-
togram observations, we correct only for the exposure
time lost to large excursions of the spacecraft. The abil-
ity to correct for spacecraft drift became even more im-
portant when a third reaction wheel failed in 2004 De-
cember.
The correction of photon-event coordinates for space-
craft motion is discussed in § 4.5.7. During screening, the
subroutine cf screen jitter merely flags times when
the target is out of the aperture, defined as those for
which either ∆X or ∆Y , the pointing error in the disper-
sion or cross-dispersion direction, respectively, is greater
than 30′′, the width of the LWRS aperture. These lim-
its, set by the keywords DX MAX and DY MAX in the
CalFUSE parameter file (PARM CAL), underestimate
the time lost to pointing excursions, but smaller limits
can lead to the rejection of good data for some chan-
nels. Also flagged as bad are times when the jitter track-
ing flag TRKFLG = −1, indicating that the spacecraft
is not tracking properly. If TRKFLG = 0, no track-
ing information is available and no times are flagged as
bad. Pointing information is read from the jitter file
(JITR CAL; § A-2). If the jitter file is not present or the
header keyword JIT STAT = 1 (indicating that the jit-
ter file is corrupted), cf screen jitter issues a warning
and exits; again, no times are flagged as bad.
4.4.7. User-Defined Good-Time Intervals
One bit of the status array is reserved for user-defined
GTIs. For example, to extract data corresponding to a
particular phase of a binary star orbit, one would flag
data from all other phases as bad. A number of tools
exist to set this flag, including cf edit (available from
MAST). CalFUSE users may specify good-time inter-
vals by setting the appropriate keywords (NUSERGTI,
GTIBEG01, GTIEND01, etc.) in the screening pa-
rameter file. (Times are in seconds from the exposure
start time.) If these keywords are set, the subroutine
cf set user gtis flags times outside of these good-time
intervals as bad.
4.4.8. Time-Dependent Status Flags
Once the status flags in the timeline table are popu-
lated, the subroutine cf set photon flags copies them
to the corresponding entries in the photon event list. For
time-tag data, this process is straightforward: match the
times and copy the flags. Header keywords in the IDF
record the number of photon events falling in bad time in-
tervals or outside of the detector active area; the number
of seconds lost to bursts, SAAs, etc.; and the remaining
night exposure time. If more than 90% of the exposure
is lost to a single cause, an explanatory note is written
to the trailer file.
The task is more difficult for histogram data, for which
photon-arrival information is unavailable. We distin-
guish between time flags that represent periods of lost
exposure time (low detector voltage or target out of aper-
ture) and those that represent periods of data contami-
nation (limb angle violations or SAAs). For the former,
we need only modify the exposure time; for the latter,
we must flag the exposure as being contaminated. Our
goal is to set the individual photon flags and header key-
words so that the pipeline behaves in the following way:
When processing a single exposure, it treats all photon
events as good. When combining data from multiple
exposures, it excludes contaminated exposures (defined
below). To this end, we generate an 8-bit status word
containing only day/night information: if the exposure
is more than 10% day, the day bit is set. This status
word is copied onto the time-dependent status flag of
each photon event. We generate a second 8-bit status
word containing information about limb-angle violations
and SAAs: if a single second is lost to one of these events,
the corresponding flag is set and a message is written to
the trailer file. (To avoid rejecting an exposure that, for
example, abuts an SAA, we ignore its initial and final 20
seconds in this analysis.) The status word is stored in
the file header keyword EXP STAT (unless that keyword
has already been set; see § 4.2 and § 4.3.1). The routines
used by the pipeline to combine data from multiple ex-
posures into a single spectrum (§ 4.11) reject data files
in which this keyword is non-zero. The number of bad
events, the exposure time lost to periods of low detector
voltage or spacecraft jitter, and the exposure time dur-
ing orbital night are written to the file header, just as for
time-tag data.
Only in this subroutine is the DAYNIGHT keyword
read from the screening parameter file and written to the
IDF file header. Allowed values are DAY, NIGHT, and
BOTH. The default is BOTH. For most flags, if the bit
is set to 1, the photon event is rejected. The day/night
flag is different: it is always 1 for day and 0 for night.
The pipeline must read and interpret the DAYNIGHT
keyword before accepting or rejecting an event based on
the value of its day/night flag.
4.4.9. Good-Time Intervals
Once the time-dependent screening is complete, the
subroutine cf set good time intervals calculates a
new set of good-time intervals from information in the
timeline table and writes them to the second extension
of the IDF (§ 4.2). For time-tag data, all of the TIME-
FLGS bits are used and the DAYNIGHT filter is applied.
For histogram data, the bits corresponding to limb-angle
violations and SAAs are ignored, since data arriving dur-
ing these events cannot be excluded. The DAYNIGHT
filter is applied (assuming that all are day photons if the
exposure is more than 10% day). The exposure time,
EXPTIME = Σ (STOP−START), summed over all en-
tries in the GTI table, is then written to the IDF file
header.
4.4.10. Histogram Arrival Times
For histogram data, all of the photon events in an
IDF are initially assigned an arrival time equal to the
midpoint of the exposure. Should this instant fall in
a bad-time interval, the data may be rejected by a
subsequent step of the pipeline or one of our post-
processing tools. To avoid this possibility, the subroutine
cf modify hist times resets all photon-arrival times to
the midpoint of the exposure’s longest good-time inter-
val. This subroutine is not called for time-tag data.
CalFUSE: The FUSE Calibration Pipeline 9
4.4.11. Bad-Pixel Regions
Images of the FUSE detectors reveal a number of dead
spots that may be surrounded by a bright ring (see the
FUSE Instrument and Data Handbook for examples).
The subroutine cf screen bad pixels reads a list of
bad-pixel regions from a calibration file (QUAL CAL)
and flags as bad all photon events whose XFARF and
YFARF coordinates fall within the tabulated limits. A
bad-pixel map, constructed later in the pipeline (§ 4.8),
is used by the optimal-extraction algorithm to correct for
flux lost to dead spots.
4.4.12. Pulse Height Limits
For time-tag data, the pulse height of each photon
event is recorded in the IDF. Values range from 0 to
31 in arbitrary units. A typical pulse-height distribution
has a peak at low values due to the intrinsic detector
background, a Gaussian-like peak near the middle of the
range due to “real” photons, and a tail of high pulse-
height events, which likely represent the superposition of
two photons and therefore are not reliable. In addition,
the detector electronics selectively discard high pulse-
height events near the top and bottom of the detectors
(i.e., with large or small values of Y). We can thus im-
prove the signal-to-noise ratio of faint targets by rejecting
photon events with extreme pulse-height values. Pulse-
height limits (roughly 2–24) are defined for each detector
segment in the screening parameter file (SCRN CAL).
The subroutine cf screen pulse height flags photon
events with pulse heights outside of this range (by set-
ting the appropriate bit in the LOC FLGS array; § A-3)
and writes the pulse-height limits used and the number
of photon events rejected to the IDF file header. This
procedure is not performed on histogram data.
Caveats: We do not recommend the use of narrow
pulse-height ranges to reduce the detector background
in FUSE data. Careful analysis has shown that limits
more stringent than the default values can result in sig-
nificant flux losses across small regions of the detector,
particularly in the LiF1B channel, resulting in apparent
absorption features that are not real.
4.5. Remove Motions
Having corrected the data for various detector ef-
fects and identified periods of bad data, we continue
to work backwards through the instrument, correcting
for spectral motions on the detector due to the move-
ments of various optical components – and even of the
spacecraft itself. This task is performed by the module
cf remove motions. It begins by reading the XFARF
and YFARF coordinates of each photon event from the
IDF. It concludes by writing the motion-corrected coor-
dinates to the X and Y arrays of the same file.
4.5.1. Locate Spectra on the Detector
The LiF and SiC channels each produce three spec-
tra, one from each aperture, for a total of six spectra
per detector segment (Fig. 3). Because motions of the
optical components can shift these spectra on the detec-
tor, the first step is to determine the Y centroid of each.
To do this, we use the following algorithm: First, we
project the airglow photons onto the Y axis (summing
all values of X for each value of Y) and search the result-
ing histogram for peaks within 70 pixels of the expected
Y position of the LWRS spectrum. If the airglow fea-
ture is sufficiently bright (33 counts in 141 Y pixels), we
adopt its centroid as the airglow centroid for the LWRS
aperture and compute its offset from the expected value
stored in the CHID CAL calibration file. If the airglow
feature is too faint, we adopt the expected centroid and
assume an offset of zero. We apply the offset to the ex-
pected centroids of the MDRS and HIRS apertures to
obtain their airglow centroids. Second, we project the
non-airglow photons onto the Y axis and subtract a uni-
form background. Airglow features fill the aperture, but
point-source spectra are considerably narrower in Y and
need not be centered in the aperture. For each aperture,
we search for a 5σ peak within 40 pixels of the airglow
centroid. If we find it, we use its centroid; otherwise,
we use the airglow centroid. This scheme, implemented
in the subroutine cf find spectra, allows for the pos-
sibility that an astrophysical spectrum may appear in a
non-target aperture.
For each of the six spectra, two keywords are written
to the IDF file header: YCENT contains the computed
Y centroid, and YQUAL contains a quality flag. The
flag is HIGH if the centroid was computed from a point-
source spectrum, MEDIUM if computed from an airglow
spectrum, and LOW if the tabulated centroid was used.
It is possible for the user to specify the target centroid
by setting the SPEX SIC and SPEC LIF keywords in
the CalFUSE parameter file (PARM CAL). Two other
keywords, EMAX SIC and EMAX LIF, limit the offset
between the expected and calculated centroids: if the cal-
culated centroid differs from the predicted value by more
than this limit, the pipeline uses the default centroid.
Caveats: On detector 1, the SiC LWRS spectrum falls
near the bottom edge of the detector (Fig. 3). Because
the background level rises steeply near this edge, the cal-
culated centroid can be pulled (incorrectly) to lower val-
ues of Y, especially for faint targets.
4.5.2. Assign Photons to Channels
The subroutine cf identify channel assigns each
photon to a channel, where “channel” now refers to one
of the six spectra on each detector (Fig. 3). For each
channel, extraction windows for both point-source and
extended targets are tabulated in the calibration file
CHID CAL along with their corresponding spectral Y
centroids. These extraction limits encompass at least
99.5% of the target flux. For the target channels, iden-
tified in the APERTURE header keyword, we use either
the point-source or extended extraction windows, as indi-
cated by the SRC TYPE keyword; for the other (presum-
ably airglow) channels, we use the extended extraction
windows. The offset between the calculated and tabu-
lated spectral Y centroids (§ 4.5.1) is used to shift each
extraction window to match the data.
To insure that, should two extraction windows overlap,
photon events falling in the overlap region are assigned
to the more likely channel, photon coordinates (XFARF
and YFARF) are compared with the extraction limits
of the six spectral channels in the following order: first
the target channels (LiF and SiC); then the airglow chan-
nels (LiF and SiC) corresponding to the larger non-target
aperture; and finally the airglow channels (LiF and SiC)
10 Dixon et al.
corresponding to the smaller non-target aperture. For
example, if the target were in the MDRS aperture, the
search order would be MDRS LiF, MDRS SiC, LWRS
LiF, LWRS SiC, HIRS LiF, and HIRS SiC. The process
stops when a match is made. The channel assignment
of each photon event is stored in the CHANNEL array
(§ A-3); photon events that do not fall in an extraction
window are assigned a CHANNEL value of 0.
Channel assignment is performed twice, once before
the motion corrections and once after. The first time, all
extraction windows are padded by ±10 Y pixels to ac-
commodate errors in the channel centroids. The second
time, no padding is applied to time-tag data. Histogram
data, which are generally binned by 8 pixels in Y, present
a special challenge: The geometric correction (§ 4.3.6)
can move a row of data out of the extraction window,
producing a significant loss of flux. To prevent this, his-
togram extraction windows are padded by an additional
±8 Y pixels (or an amount equal to the binning factor
in Y, if other than 8).
4.5.3. Track Y Centroids with Time
For the LiF and SiC target apertures, the subroutine
cf calculate ycent motion computes the spectral Y
centroid as a function of time throughout the exposure.
The algorithm requires 500 photon events to compute
an average, so the centroid is updated more often for
bright targets than for faint ones. Photon events flagged
as airglow are ignored. The results are stored in the
YCENT LIF and YCENT SIC arrays of the timeline ta-
ble, but are not currently used by the pipeline. This cal-
culation is not performed for data obtained in histogram
mode.
4.5.4. Correct for Grating Motion
The FUSE spectrograph gratings are subject to small,
thermally-induced motions on orbital, diurnal, and pre-
cessional (60-day) timescales; an additional long-term,
non-periodic drift is also apparent. These motions can
shift the target and airglow spectra by as much as 15
pixels (peak to peak) in both the X and Y dimensions.
Measurements of the Lyman β airglow line in thousands
of exposures obtained throughout the mission reveal that
the gratings’ orbital motion depends on three parame-
ters: beta angle (the angle between the target and the
anti-sun vector), pole angle (the angle between the tar-
get and the orbit pole), and spacecraft roll angle (east of
north, stored in the file-header keyword APER PA). The
subroutine cf grating motion compares the beta, pole,
and roll angles of the spacecraft with a grid of values
in the calibration file GRAT CAL, reads the appropri-
ate correction, and computes the X and Y photon shifts.
The grating-motion correction is applied to all photon
events with CHANNEL > 0; photon events not assigned
to a channel are not moved.
Caveats: Some combinations of beta and pole angle
are too poorly sampled for us to derive a grating-motion
correction; for these regions, no correction is applied. At
present, only corrections for the orbital and long-term
grating motions are available. Because all photon events
in histogram data are assigned the same arrival time (the
midpoint of the longest good-time interval), they receive
the same grating-motion correction.
4.5.5. Correct for FPA Motion
The four focal-plane assemblies (shown in Fig. 1) can
be moved independently in either the X or Z direction.
FPA motions in the X direction are used to correct
for mirror misalignments and to perform FP splits (de-
scribed below). FPA motions in the Z direction are used
to place the apertures in the focal plane of the spectro-
graph. (Strictly speaking, an FPA moves along the tan-
gent to or the radius of the spectrograph Rowland circle,
not the X and Z axes shown in Fig. 1.) Both motions
change the spectrograph entrance angle, shifting the tar-
get spectrum on the detector. The FUSE wavelength
calibration is derived from a single stellar observation
obtained at a particular FPA position. The subroutine
cf fpa position computes the shift in pixels (∆X) nec-
essary to move each spectrum from its observed X posi-
tion on the detector to that of the wavelength-calibration
target. The X and Z positions of the LiF and SiC FPAs
are stored in file header keywords, various spectrograph
parameters are stored in a calibration file (SPEC CAL),
and the wavelength calibration and the FPA position
of the wavelength-calibration target are stored in the
WAVE CAL file. Shifts are computed for both the LiF
and SiC channels; the appropriate shift is applied to all
photon events with CHANNEL > 0; photon events not
assigned to a channel are ignored.
The FUSE detectors suffer from fixed-pattern noise.
Astigmatism in the instrument spreads a typical resolu-
tion element over several hundred detector pixels (pre-
dominantly in the cross-dispersion dimension), mitigat-
ing this effect, but to achieve a signal-to-noise ratio
greater than ∼ 30, one must remove the remaining fixed-
pattern noise. A useful technique is the focal-plane split.
FP splits are performed by obtaining a series of MDRS
or HIRS exposures at several FPA X positions. Moving
the FPA in the X dimension (and moving the satellite
to center the target in the aperture) between exposures
shifts both target and airglow spectra in the dispersion
direction on the detector. CalFUSE shifts each spectrum
to the standard X position expected by our wavelength
calibration routines. If the signal-to-noise ratio in the
spectra obtained at each FPA position is high enough, it
is possible for the user to disentangle the source spectrum
from the detector fixed-pattern noise; however, simply
combining extracted spectra obtained at different FPA
positions will average out most of the small-scale detec-
tor features.
4.5.6. Correct for Mirror Motion
The spectrograph mirrors are subject to thermal mo-
tions that shift the target’s image within the spectro-
graph aperture and thus its spectrum in both X and Y
on the detector. A source in either of the SiC channels
may move as much as 6′′ in a period of 2 ks. This motion
has two effects on the data: first, flux will be lost if the
source drifts (partially or completely) out of the aperture;
second, spectral resolution will be degraded (for LWRS
observations) as the spectrum shifts on the detector. Dif-
fuse sources, such as airglow emission, fill the aperture,
so their spectra are unaffected by mirror motion.
When the LiF1 channel is used for guiding, motions
of the LiF1 mirror are corrected by the spacecraft it-
self. Only the LiF2 and SiC spectra must be corrected
CalFUSE: The FUSE Calibration Pipeline 11
by CalFUSE. In theory, the switch from LiF1 to LiF2
as the primary channel for guiding the spacecraft (§ 2)
should require another set of calibration files. In practice,
the LiF2 mirror motion in the dispersion direction tracks
that of the LiF1 mirror. The mirror-motion correction is
stored as a function of time since orbital sunset (via the
TIME SUNSET array in the timeline table) in the cali-
bration file MIRR CAL. The correction (∆X) is applied
by the subroutine cf mirror motion to all photon events
within the target aperture; photon events in other aper-
tures and those not assigned to a channel are ignored.
This correction is not applied to extended sources. Be-
cause all photon events in histogram data are assigned
the same arrival time (generally the midpoint of the ex-
posure), they receive the same mirror-motion correction.
Caveats: We correct only the relative mirror motion
during an orbit, not the absolute mirror offset based on
longer-term trends. We do not correct for mirror motions
in the Y dimension. Finally, because the shifts for the
SiC1 and SiC2 mirrors are similar, we adopt a single
correction for both channels.
4.5.7. Correct for Spacecraft Motion
Spacecraft motions during an exposure shift the tar-
get spectrum on the detector and thus degrade spectral
resolution. The subroutine cf satellite jitter uses
pointing information stored in the jitter file (JITR CAL;
§ A-2) to correct the observed coordinates of photon
events for these motions. Pointing errors in arc sec-
onds are converted to X and Y pixel shifts and applied
to all photon events within the target aperture; events
in other apertures and those not assigned to a channel
are ignored. The correction is applied only if the jitter
tracking flag TRKFLG > 0, indicating that valid track-
ing information is available. TRKFLG values rise from
1 to 5 as the reliability of the pointing information in-
creases. The minimum acceptable value of the TRKFLG
may be adjusted by modifying the TRKFLG keyword in
the CalFUSE parameter file (PARM CAL).
4.5.8. Recompute Spectral Centroids
Once all spectral motions are removed from the data,
the subroutine cf calculate y centroid recomputes
the spectral Y centroids. Separate source and airglow
centroids are determined for each aperture in turn, from
largest to smallest. (The former is meaningless if the
aperture does not contain a source.) The offset be-
tween the measured airglow centroid in the LWRS aper-
ture and the tabulated centroid (from the calibration file
CHID CAL) is used to compute the airglow centroids for
the MDRS and HIRS apertures; the computed MDRS
and HIRS airglow centroids are ignored. The YCENT
value written to the IDF file header is determined by
the quality flag previously set by cf find spectra (§
4.5.1): if YQUAL = HIGH, the source centroid is used;
if YQUAL = MEDIUM, the airglow centroid is used; and
if YQUAL = LOW, the tabulated centroid is used. The
SPEX SIC, SPEC LIF, EMAX SIC, and EMAX LIF
keywords in the CalFUSE parameter file (PARM CAL)
have the effects discussed in § 4.5.1.
4.5.9. Final Assignment of Photons to Channels
The final assignment of each photon event to a chan-
nel is performed by cf identify channel, just as in §
4.5.2, but with two modifications: First, we consider
only photon events with CHANNEL > 0; unassigned
events (which are not motion corrected) remain unas-
signed. Second, we do not pad the extraction windows
by an additional ±10 pixels in Y, though the extraction
windows for histogram data are padded by ±8 Y pixels
(or an amount equal to the binning factor in Y, if other
than 8), as before.
4.5.10. Compute Count Rates
For time-tag data, cf target count rate computes
the count rate in the target aperture for the LiF and
SiC channels. To account for dead-time effects, the con-
tents of the WEIGHT array are used. Events in airglow
regions are excluded, but no other filters are applied to
the data. Results are written to the LIF CNT RATE
and SIC CNT RATE arrays of the timeline table. For
histogram data, the initial values of these arrays, taken
from the housekeeping file (§ A-3), are scaled by the value
of the header keyword DET DEAD.
4.6. Wavelength Calibration
Once converted to the FARF and corrected for optical
and spacecraft drifts, the data can be wavelength cal-
ibrated. The module cf assign wavelength performs
three tasks: first, it corrects for astigmatism in the spec-
trograph optics; second, it applies a wavelength calibra-
tion to each photon event; and third, it shifts the wave-
lengths to a heliocentric reference frame. The derivation
of the FUSE wavelength scale is discussed in § 5.1.
4.6.1. Astigmatism Correction
The astigmatic height of FUSE spectra perpendicular
to the dispersion axis is significant and varies as a func-
tion of wavelength (Fig. 3). Moreover, spectral features
show considerable curvature, especially near the ends of
the detectors where the astigmatism is greatest. The
subroutine cf astigmatism shifts each photon event in
X to correct for the spectral-line curvature introduced by
the FUSE optics, providing a noticeable improvement in
spectral resolution for point sources (Fig. 6).
The astigmatism correction is derived from observa-
tions of GCRV 12336, the central star of the Dumbbell
Nebula (M 27), which exhibits H2 absorption features
across the FUSE waveband. We cross-correlate and com-
bine the absorption features from a small range in X, fit
a parabola to each set of combined features, compute the
shift required to straighten each parabola, and interpo-
late the shifts across the waveband. Because an astigma-
tism correction has been derived only for point sources,
no correction is performed on the spectra of extended
sources, airglow spectra, or observations with APER-
TURE = RFPT.
The correction is stored in the calibration file
ASTG CAL as a two-dimensional image representing the
region of the detector containing the target spectrum. A
separate image is provided for each aperture. The value
of each image pixel is the astigmatism correction (∆X
in pixels) to be applied to that pixel. The entire image
is shifted in Y to match the centroid of the target spec-
trum, and the appropriate correction is applied to each
photon event in the target aperture. The corrected X co-
ordinates are not written to the IDF, but are passed im-
mediately to the wavelength-assignment subroutine. In
12 Dixon et al.
effect, we apply a two-dimensional wavelength calibra-
tion, which depends upon both the X and Y coordinates
of each photon event.
4.6.2. Assign Wavelengths
The wavelength calibration is stored as a binary table
extension in the calibration file WAVE CAL (§ 5.1); a
separate extension is provided for each aperture. Wave-
lengths are tabulated for integral values of X, assumed
to be in motion-corrected FARF coordinates. Given
the astigmatism-corrected X and CHANNEL arrays, the
subroutine cf dispersion considers each aperture in
turn and reads the corresponding calibration table. It
interpolates between tabulated values of X to derive the
wavelength of each photon event. Photon events not as-
signed to an aperture (CHANNEL = 0) are not wave-
length calibrated.
4.6.3. Doppler Correction
The component of the spacecraft’s orbital velocity
in the direction of the target is stored in the OR-
BITAL VEL array of the timeline table. The component
of the earth’s orbital velocity in the direction of the tar-
get is stored in the IDF file header keyword V HELIO.
Their sum is used to compute a time-dependent Doppler
correction, which is applied to each photon event by the
subroutine cf doppler and heliocentric. The result-
ing wavelength scale is heliocentric. Because histogram
data are assigned identical arrival times, their Doppler
correction is not time dependent. Histogram exposures
are kept short (approximately 500 seconds) to minimize
the resulting loss of spectral resolution. The final wave-
length assigned to each photon event is stored in the
LAMBDA array of the IDF.
4.7. Flux Calibration
Because the instrument sensitivity varies through the
mission, we employ a set of time-dependent effective-
area files (AEFF CAL). We interpolate between the two
files whose dates bracket the exposure start time but
do not extrapolate beyond the most recent effective-area
file. Within each calibration file, the instrumental ef-
fective area is stored as a binary table extension, with
a separate extension provided for each aperture. The
module cf flux calibrate invokes a single subroutine,
cf convert to ergs. Considering each aperture in turn,
the program reads the appropriate calibration files, inter-
polates between them if appropriate, and computes the
“flux density” of each photon (in units of erg cm−2) ac-
cording to the formula
ERGCM2 = WEIGHT× hc / LAMBDA / Aeff(λ), (1)
where ERGCM2, WEIGHT, and LAMBDA are read
from the photon-event list (§ A-3), h is Planck’s con-
stant, c the speed of light, and Aeff(λ) the effective area
at the wavelength of interest. Only photon events as-
signed to an aperture are flux calibrated; events with
CHANNEL = 0 are ignored. The flux density computed
for each photon event is stored in the ERGCM2 array of
the IDF. This array is not used by the pipeline, but is
employed by some of our interactive IDF manipulation
tools.
4.8. Create Bad-Pixel Map
When possible, spectra are extracted using an optimal-
extraction algorithm (§ 4.9.4) that employs a bad-pixel
map (BPM) to correct for flux lost to dead spots and
other detector blemishes. Because motions of the space-
craft and its optical components cause FUSE spectra
to wander on the detector, a particular spectral feature
may be affected by a dead spot for only a fraction of an
exposure. We thus generate a bad-pixel map for each
exposure. The module cf bad pixels reads a list of
dead spots from a calibration file (QUAL CAL), deter-
mines which of them overlap the target aperture, tracks
the motion corrections applied to each, and converts the
motion-corrected coordinates to wavelengths. The re-
sulting bad-pixel map (identified by the suffix “bpm.fit”)
has a format similar to that of an IDF (§ A-4), but the
WEIGHT column, whose values range from 0 to 1, rep-
resents the fraction of the exposure that each pixel was
affected by a dead spot. No BPM file is created if the
(screened) exposure time is less than 1 second. BPM files
are not archived, but can be generated from an IDF and
its associated jitter file using software distributed with
CalFUSE (available from MAST).
4.9. Extract Spectra
Through all previous steps of the pipeline, we resist
the temptation to convert the photon-event list into an
image. In the module cf extract spectra, we relent.
Indeed, we generate four sets of images: a background
model, a bad-pixel mask, a two-dimensional probabil-
ity distribution of the target flux, and a spectral image
for each extracted spectrum (LiF and SiC). Only pho-
ton events that pass all of the requested screening steps
(§ 4.4) are considered. If the (screened) exposure time
is less than 1 second or no photon events survive the
screening routines, then the program generates a null-
valued spectral file.
4.9.1. Background Model
Microchannel plates contribute to the detector back-
ground via beta decay of 40K in the MCP glass. On
orbit, cosmic rays add to this intrinsic background to
yield a total rate of ∼ 0.5 counts cm−2 s−1. Scattered
light, primarily geocoronal Lyman α, contributes a well-
defined illumination pattern (Fig. 7) that varies in inten-
sity during the orbit, with detector-averaged count rates
as small as 20% of the intrinsic background during the
night and 1-3 times the intrinsic rate during the day. We
assume that the observed background consists of three
independent components, a spatially uniform dark count
and spatially-varying day- and nighttime scattered-light
components. Properly scaling them to the data is thus a
problem with three unknown parameters. We attempt to
fit as many of these parameters as possible directly from
the data. When such a fit is not possible, we estimate
one or more components and fit the remainder.
Background events due to the detector generally have
pulse-height values lower than those of real photons (§
4.4.12). The observed dark count thus depends on the
pulse-height limits imposed on the data. An initial esti-
mate of the dark count, as a function of the lower pulse-
height threshold, is read from the background characteri-
zation file (BCHR CAL). The day and night components
CalFUSE: The FUSE Calibration Pipeline 13
of the scattered-light model are read from separate exten-
sions of the appropriate time-dependent background cal-
ibration file (BKGD CAL). The background models are
scaled to match the counts observed on unilluminated
regions of the detector. The Y limits of these regions
(selected according to the target aperture) are read from
header keywords in the IDF. Airglow photons in these
regions are excluded from the analysis. The day and
night counts in the background regions of the detector
are summed and recorded.
In its default mode, the subroutine cf scale bkgd es-
timates the uniform background as follows: The back-
ground regions of the day and night scattered-light mod-
els are scaled by their relative exposure times, summed,
and projected onto the X axis (to produce a histogram
in X). A similar histogram (called the “empirical back-
ground spectrum”) is constructed from the data. An it-
erative process is used to determine the uniform compo-
nent and scattered-light scale factor that best reproduce
the observed X distribution of background counts. The
uniform component is then subtracted from the day and
night totals computed above, and the day and night com-
ponents of the scattered-light model are scaled to match
the remaining observed counts.
If the empirical background spectrum is too faint, we
do not attempt to fit it, but assume that the uniform
component of the background is equal to the tabulated
dark-count rate scaled by the exposure time. The day
and night components of the scattered-light model are
calculated as above. Users who require a more accurate
background model may wish to combine data from mul-
tiple exposures before extracting a spectrum (see § 6.6).
If the empirical background spectrum is very bright –
as, for example, when nebular emission or a background
star contaminates one of the other apertures (and thus
the background-sample region) – no fit is performed. In-
stead, the day and night components of the scattered-
light model are scaled by the day and night exposure
times, the tabulated dark-count rate is scaled by the to-
tal exposure time, and the three components are summed
to produce a background image. This scheme is also used
for histogram data. Because histogram files contain data
only from the region about the desired extraction win-
dow, the background cannot be estimated from other
regions of the detector. Fortunately, histogram observa-
tions typically consist of short exposures of bright tar-
gets, so the background is comparatively faint.
Our day and night scattered-light models were de-
rived from the sum of many deep background observa-
tions spanning hundreds of kiloseconds. Individual ex-
posures that differ markedly from the mean were ex-
cluded from the sum. The data were processed only
through the FARF-conversion and data-screening steps
of the pipeline. Airglow features were replaced with a
mean background interpolated along the dispersion (X)
axis of the detector. An estimate of the uniform back-
ground component was subtracted from the final image,
and the data were binned by 16 detector pixels in X. This
process was performed on both day- and night-only data
sets. We produce a new set of background images every 6
to 12 months, as the effects of gain sag and adjustments
of the detector high voltage slowly alter the relative sen-
sitivity of the illuminated and background regions of the
detectors.
Caveats: While early versions of the pipeline (through
v2.4) assume a 10% uncertainty in the background flux,
propagating it through to the final extracted spectrum,
CalFUSE v3 treats uncertainties in the background as
systematic errors and does not include them in the
(purely statistical) error bars of the extracted spectra.
The algorithm assumes that the intensity of the uni-
form background is constant throughout an exposure.
This would be the case if it were due only to the detec-
tor dark count, but in fact the uniform background in-
cludes a substantial contribution from the scattered light
and is thus brighter during day-time portions of an expo-
sure. The assumption of a constant uniform background
can lead the algorithm to over-estimate the scale factors
for both the uniform and spatially-varying components
of the background model. A better scheme would be to
fit the day and night components of the uniform back-
ground separately. Similarly, when the empirical back-
ground spectrum is very faint, adopting the tabulated
value of the uniform background is not the best solu-
tion. It is likely that the scattered-light component of
the uniform background would be better estimated from
the observed day and night backgrounds. The difference
between the tabulated and observed levels of the uniform
background will become greater as the mission extends
into solar minimum and the intensities of both individ-
ual airglow features and the scattered-light component
of the uniform background continue to weaken.
Grating scattering of point-source photons along the
dispersion direction is potentially significant and is not
corrected by the CalFUSE pipeline. Typical values are
1–1.5% of the continuum flux in the SiC channels and 10
times less in the LiF channels.
4.9.2. Probability Array
The optimal-extraction algorithm (§ 4.9.4) requires as
input a two-dimensional probability array representing
the distribution of flux on the detector. Separate prob-
ability arrays, derived from high signal-to-noise stellar
observations, have been computed for each channel and
stored as image extensions in the weights calibration file
(WGTS CAL).
By construction, the Y dimension of the probability
array represents the maximum extent of the extraction
window for a particular aperture. For simplicity, all ar-
rays employed by the optimal-extraction algorithm are
trimmed to match the probability array in Y. The cen-
troids of the probability distribution and the target spec-
trum (recorded in the corresponding file headers) are
used to determine the offset between detector and prob-
ability coordinates. In the X dimension, all arrays are
binned to the output wavelength scale requested by the
user. Default wavelength parameters for each aperture
are specified in the header of the wavelength calibration
file (WAVE CAL); the default binning for all channels is
0.013 Å per output spectral bin, which corresponds to
approximately two detector pixels. The background ar-
ray, originally binned by 16 pixels in X, is rescaled to the
width of each output wavelength bin by the subroutine
cf rebin background. The probability array is rescaled
by the subroutine cf rebin probability array to have
a sum of unity in the Y dimension for each wavelength
14 Dixon et al.
4.9.3. Bad-Pixel Mask
A bad-pixel mask with the same wavelength scale and
Y dimensions as the probability array is constructed from
the BPM file (§ 4.8) by the subroutine cf make mask.
The array is initialized to zero. For each entry in the
BPM file, the value of the WEIGHT array is added to
the corresponding pixel of the bad-pixel mask. The mask
is then normalized and inverted, so that the center of the
deepest dead spot has a value of 0 and the regions outside
are set to 1. The conversion from pixel to wavelength
coordinates can open gaps in the mask, which appear as
values of unity surrounded by pixels with lower values.
We search for array elements that are larger than their
neighbors and replace them with the mean value of the
adjoining pixels. If the BPM file is absent or a particular
aperture is free of bad pixels, all elements of the bad-pixel
mask are set to 1.
4.9.4. Optimal (Weighted) Spectral Extraction
The extraction subroutine, cf optimal extraction,
is called separately for each of the two target spectra
(LiF and SiC). Inputs include the photon-event list and
the indices of events that pass through the various screen-
ings, as well as the 2-D background, probability, and bad-
pixel arrays described above. For numerical simplicity,
extraction is performed using the WEIGHT of each pho-
ton event, rather than its ERGCM2 value. A pair of 2-D
data and variance arrays with the same dimensions as the
probability array are constructed from the good photons
whose CHANNEL values correspond to the target aper-
ture. For time-tag data, this process is straightforward:
the LAMBDA and Y values of each photon event corre-
spond to a particular cell in the data and variance arrays.
That cell in the data array is incremented by the photon
weight, while the corresponding cell in the variance array
is incremented by the square of the weight. A 1-D raw-
counts spectrum (useful for the statistical analysis of low
count-rate data) is constructed simultaneously: for each
photon event added to the data array, the appropriate
bin of the counts spectrum is incremented by one.
For histogram data, the process is more complex, be-
cause the original detector image is generally binned by
8 pixels in Y and because each entry in the photon-event
list represents the sum of many individual photons. In
the Y dimension, an event’s WEIGHT is divided among
8 pixels (or the actual Y binning for that exposure, if
different) according to the distribution predicted by the
probability array. In the X dimension, each event is as-
sumed to have a width in wavelength space equal to the
mean dispersion per pixel for the channel (read from the
DISPAPIX keyword of the WAVE CAL file), and the
WEIGHT of an event that spans the boundary between
two output wavelength bins is divided between them.
This smoothing in X helps to mitigate the “beating” that
would otherwise occur between detector pixels and out-
put wavelength bins.
One-dimensional background, weights, and variance
spectra are then extracted from the two-dimensional
background, data, and variance arrays. To insure that
the three spectra sample the same region of the detector,
only cells in the 2-D arrays for which the correspond-
ing cell in the probability array has a value greater than
10−4 are included in the sum. These limits differ slightly
from those defined in the aperture (CHID CAL) calibra-
tion files. As a result, the ratio of the final weights and
counts spectra may not be a constant. (Ideally, their ra-
tio would equal the mean dead-time correction for the
exposure.) An initial flux spectrum, equal to the differ-
ence of the weights and background spectra, is used as
input to the optimal-extraction algorithm.
CalFUSE employs the optimal-extraction algorithm
described by Horne (1986), which requires as input the
2-D data, background, probability, and bad-pixel arrays
and the 1-D initial flux spectrum. Originally designed for
CCD spectroscopy, the algorithm has been modified for
the FUSE detectors. Specifically, instead of constructing
a 2-D spatial profile from each data set, we use a tabu-
lated probability array; the 2-D cosmic-ray mask, an in-
teger array in the original algorithm, is replaced with the
bad-pixel mask, which is a floating-point array; and the
2-D variance estimate is scaled by the bad-pixel mask.
Extraction is iterative: in the original version, iteration
is performed until the cosmic-ray mask stops changing.
In our version, iteration continues until the output flux
spectrum changes by less than 0.01 counts in all pixels. If
the loop repeats 50 times, the algorithm fails. The num-
ber of iterations performed is written to the OPT EXTR
keyword of the output file header.
If optimal extraction is successful, the variance of
the optimal spectrum is computed using the recipe of
Horne (1986). We have adapted this recipe to pro-
duce weights and background spectra such that FLUX
= WEIGHTS − BKGD. The resulting background spec-
trum is not smooth. Optimal extraction is not per-
formed on the spectra of extended sources or on those
for which the quality of the computed spectral centroid
is not HIGH. (Both the centroid and its quality flag are
stored in file header keywords.) In these cases, or if the
optimal-extraction algorithm fails, the initial flux, vari-
ance, weights, and background spectra are adopted.
However they are constructed, the final FLUX, ER-
ROR (equal to the square root of the variance),
WEIGHTS, and BKGD arrays (all in units of counts)
are returned to the calling routine. For time-tag data,
the COUNTS array as described above is returned. For
histogram data, the COUNTS array is computed by di-
viding the final WEIGHTS array by the mean dead-
time correction, which is stored in a file-header keyword
TOT DEAD. Also returned is the QUALITY array. It
is the product of the probability array and the bad-pixel
map, projected onto the wavelength axis and expressed
as an integer between 0 and 100. Its value is 0 if all the
flux in a wavelength bin is lost to a detector dead spot,
100 if no flux is lost.
Caveats: The optimal-extraction algorithm is designed
to improve the signal-to-noise ratio of the spectra of faint
point sources. Unfortunately, it is in precisely these cases
that the spectral centroid is most likely to be uncertain.
Because proper positioning of the probability array is es-
sential to the weighting scheme, observers of faint targets
may wish to combine the IDFs from multiple exposures
and re-compute their spectral centroids before attempt-
ing optimal extraction.
4.9.5. Extracted Spectral Files
Because the optimal-extraction routine returns the fi-
nal FLUX and ERROR arrays in units of counts, the
CalFUSE: The FUSE Calibration Pipeline 15
spectral-extraction module applies a flux calibration to
both arrays using the subroutine cf convert to ergs
described in § 4.7. Dividing each array element by (EX-
PTIME × WPC), where EXPTIME is the length in sec-
onds of the (screened) exposure and WPC the width in
Ångstroms of each output wavelength bin, completes the
conversion to units of erg cm−2 s−1 Å−1. The format of
the extracted spectral files is described in § A-5.
4.10. Trailer and Image Files
A number of supplementary files are generated by Cal-
FUSE and archived with the data. For each exposure
and detector segment, the pipeline generates a trailer file
and a pair of image files in Graphics Interchange Format
(GIF). The trailer file (suffix “.trl”) contains timing in-
formation for all pipeline modules and any warning or
error messages that they may have generated. The first
image file contains an image of the detector overlaid by a
wavelength scale and extraction windows for each aper-
ture (suffix “ext.gif”). Only photon events flagged as
good are included in the plot, unless there are none, in
which case all events are plotted. The second image file
presents count-rate plots for both the LiF and SiC target
apertures (suffix “rat.gif”). These arrays come from the
timeline table in the IDF and exclude photons flagged as
airglow. These image files are powerful tools for diagnos-
ing problems in the data, revealing, for example, when
high background levels cause the SiC1 LWRS extraction
window to be misplaced.
4.11. Observation-Level Files
For each exposure, CalFUSE produces LiF and SiC
spectra from each of four detector segments, for a total
of eight extracted spectral files. OPUS combines them
into a set of three observation-level files for submission
to MAST. Observation-level files are distinguished from
exposure-level files by having an exposure number of 000.
Depending on the target and the scientific questions at
hand, these files may be of sufficient fidelity for scientific
investigation. Here is a brief description of their contents:
ALL: For each combination of detector segment and
channel (LiF1A, SiC1A, etc.), we combine data from all
exposures in the observation into a single spectrum. If
the individual spectra are bright enough, we cross corre-
late and shift them before combining. (For each channel,
the shift calculated for the detector segment spanning
1000–1100 Å is applied to the other segment as well.) If
the spectra are too faint for cross correlation, we com-
bine the individual IDFs and extract a single spectrum
to optimize the background model. Combined spectra
(WAVE, FLUX, and ERROR arrays) for each of the eight
channels are stored in separate binary table extensions
in the following order: 1ALIF, 1BLIF, 2BLIF, 2ALIF,
1ASIC, 1BSIC, 2BSIC, and 2ASIC.
ANO (all, night-only): With the same format as the
ALL files, these spectra are constructed using only data
obtained during the night-time portion of each exposure.
They are generated only for time-tag data, and only if
EXPNIGHT > 0. The shifts calculated for the ALL files
are applied to the night-only data; they are not recom-
puted.
NVO (National Virtual Observatory): These files con-
tain a single spectrum spanning the entire FUSE wave-
length range. The spectrum is assembled by cutting and
pasting segments from the most sensitive channel at each
wavelength. Segments are shifted to match the guide
channel (either LiF1 or LiF2) between 1045 and 1070 Å.
Columns are WAVE, FLUX, and ERROR and are stored
in a single binary table extension.
The ALL file is used to generate a “quick-look” spec-
tral plot for each observation. When available, combined
spectra from channels spanning the FUSE waveband are
plotted in a single GIF image file (suffix “specttagf.gif”
or “spechistf.gif”). This plot appears on the MAST pre-
view page of each observation. Four additional GIF files
contain the combined LiF1, LiF2, SiC1, and SiC2 spectra
for each observation.
Caveats: Cross-correlation may fail, even for the spec-
tra of bright stars, if they lack strong spectral features.
Examples are nearby white dwarfs with weak interstellar
absorption lines. If cross-correlation fails for a given ex-
posure, that exposure is excluded from the sum. Thus,
the exposure time for a particular segment in an ALL
file may be less than the total exposure time for that
observation.
The cataloging software used by MAST requires the
presence of an ALL file for each exposure, not just for
the entire observation. We generate exposure-level ALL
files, but they contain no data, only a FITS file header.
The observation-level ALL files discussed above can be
distinguished by the string “00000all” in their names.
4.12. Quality Control and Archiving
Before the reduced data are archived, they undergo
a two-step quality-control process: First, a set of auto-
mated checks is performed on each exposure. The soft-
ware compares the flux observed in the guide channel
(LiF1 or LiF2) with that expected for the target and
with that observed in the other three channels. If an
anomaly is detected, a flag is set requesting manual in-
vestigation. The software works well for bright contin-
uum sources, but often flags faint or emission-line targets
as unsuccessful observations. Second, a member of the
FUSE operations team investigates any warnings gener-
ated by the software. If it is determined that less than
50% of the requested data were obtained, the target is
re-observed.
The philosophy of the FUSE project is to archive data
whenever possible, even if it does not satisfy the require-
ments of the original investigator. As a result, the MAST
archive contains a number of FUSE data sets that are in
some way flawed (e.g., misaligned channels, partial loss
of guiding, or no good observing time). Users should be
aware of this possibility.
If the pipeline detects an error in the data or its as-
sociated housekeeping or jitter files, it writes a warning
to both the trailer file and the headers of the IDF and
extracted spectral files. Users are advised to scan trailer
files for the “WARNING” string and spectral files for
“COMMENT” records. Occasionally, the FUSE opera-
tions team inserts comments directly into the headers of
raw data files. Such comments may warn of an unusual
instrument configuration, errors in the reported target
coordinates, or data obtained during slews.
Observation names beginning with the letter “S” are
science-verification observations and may have been ob-
tained with an unusual instrument configuration. For
16 Dixon et al.
example, the program S523 was designed to test pro-
cedures for observing a bright object by defocusing the
SiC mirrors. The LiF mirrors were moved for some S523
exposures. As a result, data from this program should
be used with caution. Abstracts for all FUSE observing
programs are available from MAST.
5. CALIBRATION FILES
5.1. Wavelength Calibration
5.1.1. Derivation of the FUSE Wavelength Calibration
Our principal wavelength-calibration target is
GCRV 12336, central star of the planetary nebula M 27,
whose spectrum exhibits a myriad of molecular-hydrogen
absorption features (McCandliss et al. 2007). For the
SiC1B channel, these data are supplemented at the
shortest wavelengths by spectra of the hot white dwarf
G 191-B2B (Lemoine et al. 2002). Spectra obtained
through each of the three FUSE apertures were fully
reduced, corrected for astigmatism, and used to derive
an empirical mapping of pixel to wavelength. For each
channel, standard optical expressions were used to
derive a theoretical dispersion solution, which was fit
to the empirical data with only its constant term (the
zero-point of the wavelength scale) as a free parameter.
The shifted theoretical dispersion solution was used to
generate the wavelength-calibration file.
Early versions of the pipeline relied on the wavelength
calibration to correct for non-linearities in the detector X
scale (the geometric distortion discussed in § 4.3.6). Cor-
recting for this effect separately has greatly improved the
accuracy of the FUSE wavelength scale. To determine
the geometric distortion in the X dimension, a spline
was fit to the residuals from each aperture (expected mi-
nus observed X coordinate of each absorption feature;
Fig. 8). In practice, residuals from all six apertures (both
LiF and SiC channels) were included in the fit, but data
from the other five apertures were weighted 100 times
less than those for the aperture being fitted. The addi-
tional data points help to constrain the fit in wavelength
regions where the data are sparse or missing. The spline
fits from all six apertures were then used to construct the
two-dimensional map of detector distortions in the X di-
mension that is used by the geometric-distortion routine
described in § 4.3.6. The process is iterative, with the
residuals (ideally) becoming smaller with each iteration.
The scatter of individual measurements about the
spline fit is caused in some cases by blended absorption
lines and in others by localized distortions induced by
the fiber-bundle structure of the MCPs. This scatter is
thus a fair estimate of the inaccuracies that the user may
expect in the relative measurement of the wavelength of
any given feature. The wavelength inaccuracies caused
by localized distortions are 3–4 detector pixels (0.025 Å
or 7 km s−1) at most wavelengths, but may be as large
as 6–8 pixels. They occur in tiny windows about 1 to 3 Å
wide, depending on the channel and segment. Some data
sets show larger residuals. These distortions are inherent
in the FUSE data set and represent the ultimate limit
to the accuracy of the FUSE wavelength calibration.
5.1.2. Zero-Point Uncertainties
The FUSE wavelength calibration assumes that the
motion-corrected spectrum falls at a precise location on
the detector. If it is shifted in X, then the wavelength
scale of the extracted spectrum will suffer a zero-point
offset. For the guide channel (either LiF1 or LiF2),
the dominant source of wavelength errors is thermally-
induced rotations of the spectrograph gratings, which de-
pend on the satellite attitude. For the other channels,
additional wavelength errors come from mirror misalign-
ments that shift the target away from the center of the
aperture. Such misalignments may produce zero-point
offsets of up to ±0.15 Å for point sources in the LWRS
aperture. Offsets are less than ±0.02 Å for the MDRS
aperture and are negligible for the HIRS aperture.
We define the zero point of our wavelength scale by
requiring that the Lyman β airglow feature, observed
through the HIRS aperture and processed as if it were a
point source, be at rest in spacecraft coordinates when all
Doppler corrections are turned off. The use of an airglow
feature eliminates errors due to mirror motions in the
non-guide channels, but not errors in the grating-motion
correction, so we measure the Lyman β line in some 200
background exposures and shift their mean velocity to
0 km s−1. For each channel, all three apertures (HIRS,
MDRS, and LWRS) use this HIRS-derived wavelength
scale (WAVE CAL version 022 and greater).
The grating-motion correction (§ 4.5.4) is designed to
place the centroid of each Lyman β airglow feature at
a fixed location in FARF coordinates. On average, it
achieves that goal: for our sample of 200 background ex-
posures, the measured velocity of the Lyman β line has a
standard deviation of between 2 and 3 km s−1, depend-
ing on the channel. Unfortunately, some combinations
of pole and beta angle are not well corrected, leading
to velocity offsets of 10 km s−1 or more, and additional
motions – of either the gratings or some other optical
component – can shift the extracted spectra by several
km s−1 from one exposure to the next.
Figure 9 presents the measured wavelength of the in-
terstellar O I λ1039 absorption feature in 47 exposures of
the hot white dwarf KPD 0005+5106 obtained through
the HIRS aperture. The 2001 and 2002 data show little
scatter and yield a mean velocity of −10.7± 1.9 km s−1.
(Holberg, Barstow, and Sion 1998 report a heliocentric
velocity of −7.50 ± 0.76 km s−1 for the interstellar fea-
tures along this line of sight.) The 2003 data (all from
a single observation) span nearly 10 km s−1. The 2004
data (again from a single observation) are tightly corre-
lated but offset by∼ 7 km s−1. These data were obtained
at a spacecraft orientation (beta and pole angles) that is
generally well corrected by our grating-motion algorithm;
apparently, some other effect is at work. The 2006 data
come from the LiF2 channel, which became the default
guide channel in 2005 (§ 6.1).
We do not recommend the general use of airglow lines
to fix the absolute wavelength scale of point-source spec-
tra for several reasons: First, airglow emission fills the
aperture, so the resulting airglow lines provide no in-
formation about the position of the target relative to
the aperture center. Second, the jitter correction (for all
channels) and the mirror-motion correction (for the SiC
channels) are inappropriate for airglow emission. Third,
the Doppler correction for the spacecraft’s orbital motion
can degrade their resolution.
CalFUSE: The FUSE Calibration Pipeline 17
TABLE 1
Stellar Parameters Adopted for FUSE
Flux Standards
Teff log g Vrad
Name (K) (cm s−2) (km s−1)
GD 71 32,843 7.783 80.0
GD 659 35,326 7.923 33.0
GD 153 39,158 7.770 50.0
HZ 43 50,515 7.964 20.6
GD 246 53,000 7.865 −13.2
G 191-B2B 61,200 7.5 · · ·
Note. — For G 191-B2B, we use the model
employed by Kruk et al. 1999 for the final
Astro-2 calibration of HUT.
5.1.3. Diffuse Emission
The FUSE wavelength scale is derived from
astigmatism-corrected, point-source spectra. Extended-
source (diffuse) spectra are not corrected for astig-
matism. If point-source data are processed with the
astigmatism correction turned off, the resulting wave-
length errors are less than about 4 detector pixels,
consistent with the uncertainties in the wavelength
scale. Therefore, the present FUSE wavelength calibra-
tion should be adequate for extended-source spectra.
Airglow lines are useful for determining the zero-point
for extended sources that fill the aperture.
5.2. Flux Calibration
5.2.1. Derivation of the Effective-Area Curve
The FUSE flux calibration is based on in-flight ob-
servations of the well-studied DA (pure-hydrogen) white
dwarfs listed in Table 1, which have been observed at reg-
ular intervals throughout the mission. For each channel,
data from multiple stars are combined to track changes
in the instrument sensitivity using a technique similar
to that developed by Massa and Fitzpatrick (2000) for
the International Ultraviolet Explorer (IUE)/ satellite.
The algorithm yields a series of time- and wavelength-
dependent sensitivity curves as well as the spectrum of
each star, in units of raw counts, as it would have ap-
peared on a date early in the mission, which we choose
to be T0 = 1999 December 31. (We refer to the latter as
“T0 spectra.”)
For each star, we generated a synthetic spectrum using
the programs TLUSTY (version 200) and SYNSPSEC
(version 48) of Hubeny and Lanz (1995). The non-LTE
pure-hydrogen model atmospheres were computed ac-
cording to a prescription by Hubeny (private commu-
nication) using 200 atmospheric layers to ensure an opti-
mal absolute flux accuracy. The atmospheric parameters
listed in Table 1, consistent with HST, IUE, and opti-
cal observations, were used to compute the models (Hol-
berg, private communication). For G 191-B2B, we used
the model employed by Kruk et al. (1999) for the final
Astro-2 calibration of HUT. Observations of these stars
with the Faint Object Spectrograph aboard HST have
shown that the models, including parameter uncertain-
ties, are consistent to within 2% at wavelengths longer
than Lyman α (Bohlin, Colina, and Finley 1995; Bohlin
1996). Uncertainties in the far-ultraviolet waveband are
slightly higher, as discussed by Kruk et al. (1999).
For each channel, the effective area in units of cm2 is
computed by dividing one or more T0 spectra in units of
counts s−1 Å−1 by a synthetic white-dwarf spectrum in
units of photons cm−2 s−1 Å−1. We find excellent agree-
ment between the effective areas derived from the differ-
ent standard stars. Sensitivity curves for the LiF1A and
SiC1A channels are presented in Fig. 10. (Effective-area
curves for all FUSE channels are available from MAST.)
The sensitivity of the LiF1A channel decreased by ∼ 15%
over the first three years of the mission, but appears to
have stabilized; that of the SiC1 channel has declined by
∼ 45% since launch and is falling still (though slowly).
Effective-area curves (AEFF CAL) for each channel and
detector segment were generated at three-month inter-
vals until the loss of the third reaction wheel in 2004
December; we plan to generate them at six-month inter-
vals for the duration of the mission.
Caveats: We do not attempt to correct spectra ob-
tained through the MDRS and HIRS apertures for
changes in instrument sensitivity, but employ a single
effective-area curve for each. The low throughput of
these apertures, combined with the likelihood that their
spectra are non-photometric, makes tracking changes in
their sensitivity both more difficult and less useful than
for the LWRS aperture.
5.2.2. Systematic Uncertainties
The greatest uncertainties in a line or continuum flux
derived from a FUSE spectrum are due to systematic
effects. An estimate of the uncertainty in our flux cali-
bration can be obtained by comparing the effective-area
curves derived from different white-dwarf stars. Differ-
ences among the curves reflect errors in both the model
atmospheres and the stellar parameters upon which they
are based. In most channels, the scatter in the derived
effective areas is between 2 and 4%.
The photometric accuracy of FUSE spectra is subject
to numerous effects that cannot be fully corrected by the
CalFUSE pipeline. A target centered in an aperture of
the guide channel (LiF1 or LiF2) may not be centered
in the corresponding apertures of the other three chan-
nels. Since the loss of the first two reaction wheels in
2001, spacecraft drifts may move the target out of even
the guide-channel aperture. While the pipeline does at-
tempt to flag times when the target is out of the aper-
ture, the algorithm used is conservative in that it un-
derestimates the time lost to pointing errors (§ 4.4.6).
The user is advised to consult the count-rate plots gen-
erated by the pipeline (suffix “rat.gif”; § 4.10) and the
LIF CNT RATE and SIC CNT RATE arrays of the IDF
timeline table to determine the photometric quality of an
exposure. Using tools available from MAST or the user-
defined good-time intervals discussed in § 4.4.7, users can
reject time periods when the count rate is low or re-scale
the flux of low-count-rate exposures.
When a point-source target falls near the top or bottom
edge of an aperture, vignetting in the spectrograph may
attenuate the target flux in a wavelength-dependent way.
Astigmatism gives FUSE spectra the shape of a bow tie
(Fig. 3). If vignetting is important, then the spectrum
will lie below the center of the aperture on one side of
the bow tie and above it on the other. Significant flux
loss is possible in wavelength regions far from the center
of the bow tie.
18 Dixon et al.
Other systematic uncertainties are imposed by various
detector flat-field effects; their relative importance de-
pends upon one’s scientific goals. For narrow emission
lines, flux uncertainties are dominated by the moiré pat-
tern (high-frequency ripples due to beating among the
arrays of microchannel pores in the MCP stack; § 6.4),
unless the observation was obtained using an FP split or
the equivalent was achieved via grating and mirror mo-
tions. For broad features, the moiré is not important,
but larger-scale flat-field features are. These effects are
discussed in The FUSE Instrument and Data Handbook.
Finally, when fitting a spectral energy distribution, the
greatest uncertainty is caused by worms (§ 6.3), which
may depress the observed flux over tens of Ångstroms by
50% or more.
5.2.3. Extended Sources
The FUSE flux calibration is derived from point-
source targets. Because the distribution of flux in the
cross-dispersion direction differs for point and extended
sources, it is possible that the instrumental sensitivity
may also differ; this question has not been explored in
detail. Extended spectra are less affected by worms (§
6.3) than are point-source spectra. Moreover, because
the spectrum of a diffuse emitter is spread over a larger
region of the detector, it will suffer less from local flat-
field effects.
6. DISCUSSION
6.1. Spacecraft Guiding on the LiF2 Channel
The switch from FES A to FES B as the default guide
camera in 2005 July has two principal effects on the qual-
ity of FUSE data. First, tracking with FES A ensured
that targets remained in the center of the LiF1 aperture,
which is the most sensitive channel in the astrophysically-
important 1000–1100 Å waveband. Tracking with FES
B will keep targets centered in the LiF2 aperture, in-
creasing the likelihood of data loss in the LiF1 channel.
Second, in order to optimize the optical focus of FES B,
the LiF2 FPA was moved out of the focal plane of the
LiF2 primary mirror. Observations of point sources with
the LWRS aperture are unaffected, and the point-source
spectral resolution of this channel is unchanged, but the
throughput of the narrow LiF2 apertures is reduced. The
effective transmission of the apertures has not been char-
acterized in detail, but is approximately 70% for LIF2
MDRS and 15% for LiF2 HIRS, versus 98% and 60% for
their LiF1 counterparts. The spectral resolution for dif-
fuse sources is expected to be slightly lower in LiF2 than
in LiF1.
6.2. Scattered Solar Emission
In addition to airglow lines, scattered solar emission
features are present in the SiC channels when observ-
ing at high beta angles during the sunlit portion of the
orbit. Emission from C III λ977.0, Lyman β λ1025.7,
and O VI λλ1031.9, 1037.6 has been positively identified.
Emission from N III λ991.6 and N II λ1085.7 may also
be present. It is believed that sunlight is scattered by
reflective, silver-coated Teflon blankets lying above the
SiC baffles. At low beta angles, scattered solar emission
is less apparent, because the blankets are shaded by the
SiC baffles and the open baffle doors and because the ra-
diation strikes the blankets at a high angle of incidence.
It is unknown at which beta angle, if any, the solar emis-
sion completely disappears. Because the LiF channels
lie on the shadowed side of the spacecraft, solar emission
lines are not seen in LiF spectra. C III and O VI emission
observed in the SiC channels during orbital day should
always be compared with the emission observed either
with the LiF channel or during the nighttime portion of
an orbit.
Since the failure of the third reaction wheel in 2004
December, FUSE mission controllers have experimented
with the use of non-standard roll angles to improve space-
craft stability. These roll angles can place the spacecraft
in a configuration that greatly increases the sunlight scat-
tered into one of the SiC channels. The scattered light,
mostly Lyman continuum emission, appears as an in-
crease in the background at wavelengths shorter than
about 920 Å; strong, resolved Lyman lines are present at
longer wavelengths. When present, it is generally seen
in only one of the two SiC channels. We have no way to
model or subtract this emission.
6.3. The Worm
The spectra of point-source targets occasionally exhibit
a depression in flux that may span as much as 50 Å
(Fig. 11). These depressions appear in detector images
as narrow stripes roughly parallel to the dispersion axis
(Fig. 12). The stripes, known as worms, can attenuate
as much as 50% of the incident light in affected portions
of the spectrum. Worms shift in the dispersion direction
when the target moves in the aperture. They are due
to an unfortunate interaction between the horizontal fo-
cus of the spectrograph and the innermost wire grid (the
quantum-efficiency grid; § 4.3.6). Since the location of
this focus point is a function of wavelength, the strength
of a worm is exquisitely sensitive to the exact position of
the spectrum on the detector. We cannot determine this
position with sufficient precision to correct reliably for
flux lost to worms. Though most prominent in LiF1B
LWRS spectra, worms can appear in all channels and
apertures. Observers who require absolute spectropho-
tometry should carefully examine FUSE spectral image
files for the presence of worms. The redundant wave-
length coverage of the various FUSE channels can be
used to mitigate their effects.
6.4. The Moiré Pattern in Histogram Data
Since the release of CalFUSE v3.0, users have reported
strong, non-Gaussian noise in the spectra of some bright
stars observed in histogram mode. An example is shown
in Fig. 13. The high-frequency ripples have a period of
approximately 9 detector pixels, or about 0.06 Å. These
ripples are a moiré pattern due to beating among the
arrays of microchannel pores in the three layers of the
MCP stack (Tremsin et al. 1999). The moiré fringes are
strongest on segment 2B, but are also visible on segments
1A and 2B. The motion corrections applied to time-tag
data tend to smooth out this effect, but it can be quite
strong in histogram data. Where it is present, users are
advised to smooth or bin their spectra by at least one
resolution element to reduce its effects. This and other
detector artifacts are described in the FUSE Instrument
CalFUSE: The FUSE Calibration Pipeline 19
and Data Handbook.
6.5. A Note about Time
The FUSE spacecraft uses Coordinated Universal
Time (UTC). The spacecraft clock is updated periodi-
cally from the ground using a procedure that corrects
for the signal transit time from the ground station to the
spacecraft. The ground station time comes from GPS
satellites. The Instrument Data System receives a 1 Hz
signal from the spacecraft that is used to align the IDS
clock with the spacecraft clock to an accuracy of ±5 ms.
In time-tag mode, the IDS typically inserts a time stamp
into the data stream once per second, but can insert time
stamps as frequently as 125 times per second. Unfortu-
nately, the binary format of the time stamp rounds the
time value to the nearest 1/128 of a second. The two pe-
riods beat against one another, causing the loss of three
time stamps each second. Additional timing uncertain-
ties due to delays in the detector electronics have not
been measured, but are assumed to be on the order of
a few milliseconds. For most time-tag observations, for
which time stamps are recorded only once per second,
these effects can safely be ignored.
Raw time-tag files are constructed by assigning the
value of the most recent time stamp, in units of seconds
from the exposure start time, to each subsequent photon
event. The frequency of these time markers determines
the temporal resolution of the data. Photon-arrival times
are not modified by the pipeline: values are UTC as as-
signed by the IDS. In particular, photon-arrival times are
not converted to a heliocentric scale.
6.6. Combining Data from Multiple Exposures
For each FUSE observation, OPUS combines data
from individual exposures into a set of observation-level
spectra, as described in § 4.11. While these files are suffi-
cient for many projects, other projects may benefit from
specialized data processing. Here are some points to keep
in mind when combining FUSE data from multiple expo-
sures: For bright targets, the goal is to maximize spectral
resolution, so it is important to align precisely the spectra
from individual exposures before combining them. The
wavelength zero points of segments A and B are consis-
tent across each of the FUSE detectors (§ 5.1), so shifts
measured for one detector segment can safely be applied
to the other. For observations made before 2005 July, the
LiF1 spectrum is likely to have the most accurate wave-
length scale, so it serves as the standard for the other
three channels. For later observations, the LiF2 spectra
are likely to be the most accurate. A procedure to cross-
correlate and shift spectra by hand is described in The
FUSE Data Analysis Cookbook. When cross-correlating
the spectra of point-source targets, it is important to ex-
clude regions contaminated by airglow features, as their
motions are unlikely to track those of the target. For
faint targets, the goal is to optimize the fidelity of the
background model by maximizing the signal-to-noise ra-
tio on background regions of the detector, a goal achieved
by combining the IDFs from multiple exposures before
extracting the spectra. A variety of C- and IDL17-based
tools to perform these and other data-analysis tasks has
17 IDL is a registered trademark of ITT Corporation for their
Interactive Data Language software.
TABLE 2
Format of Raw Time-Tag Files
Array Name Format Description
Primary Header-Data Unit (HDU 1)
Header only. Keywords contain exposure-specific information.
HDU 2: Photon Event List
TIME FLOAT Photon arrival time (seconds)
X SHORT Raw X position (0–16383)
Y SHORT Raw Y position (0–1023)
PHA BYTE Pulse height (0–31)
HDU 3: Good-Time Intervals
START DOUBLE GTI start time (seconds)
STOP DOUBLE GTI stop time (seconds)
Note. — Times are relative to the exposure start time, stored
in the header keyword EXPSTART.
been generated by the FUSE project. Software and doc-
umentation are available from MAST.
We acknowledge with gratitude the efforts of those who
contributed to the design and implementation of initial
versions of the CalFUSE pipeline and its associated cal-
ibration files: G. A. Kriss, E. M. Murphy, J. Murthy,
W. R. Oegerle, and K. C. Roth. This research has made
use of the Multimission Archive at the Space Telescope
Science Institute (MAST). STScI is operated by the As-
sociation of Universities for Research in Astronomy, Inc.,
under NASA contract NAS5-26555. Support for MAST
for non-HST data is provided by the NASA Office of
Space Science via grant NAG5-7584 and by other grants
and contracts. This work is supported by NASA contract
NAS5-32985.
Facility: FUSE
APPENDIX
A. FILE FORMATS
All FUSE data are stored as FITS files (Hanisch et al.
2001) containing one or more Header + Data Units
(HDUs). The first is called the primary HDU (or
HDU 1); it consists of a header and an optional N-
dimensional image array. The primary HDU may be fol-
lowed by any number of additional HDUs, called “exten-
sions.” Each extension has its own header and data unit.
FUSE employs two types of extensions, image extensions
(a 2-dimensional array of pixels) and binary table exten-
sions (rows and columns of data in binary representa-
tion). CalFUSE uses the CFITSIO subroutine library
(Pence 1999) to read and write FITS files.
A-1. Raw Time-Tag and Histogram Files
FUSE raw data files are generated by OPUS using
both data downlinked by the telescope and information
from the FUSE Mission Planning Database (§ 4.1). In-
formation regarding the target, exposure times, instru-
ment configuration, and engineering parameters is stored
in a series of header keywords in the primary HDU. All
header keywords are described in the FUSE Instrument
and Data Handbook. In raw time-tag files (Table 2), the
20 Dixon et al.
primary HDU consists of a header only, with no asso-
ciated image array. HDU 2 contains the photon-event
list, with arrival time (in seconds from the exposure start
time), raw detector coordinates, and pulse height for each
event in turn. HDU 3 lists good-time intervals (GTIs)
calculated by OPUS. Raw time-tag file names end with
the suffix “ttagfraw.fit.” They can be as large as 10–20
MB for the brightest targets.
The data in raw histogram files (suffix “histfraw.fit”)
are stored as a series of image extensions (Table 3).
The primary HDU contains the same header keywords
as time-tag files, along with a small (8 × 64 pixel) im-
age called the Spectral Image Allocation (SIA) table.
The SIA table is used to map regions of the detector
to on-board memory. Each element in the SIA table
corresponds to a 2048 × 16 pixel region on a detector
segment. If the element is set to 1, the photons from
the corresponding region are saved; if 0, they are dis-
carded. Additional image extensions follow, each con-
taining the binned image of some region of the detector;
these regions may overlap. In general, science data are
binned by 8 pixels in Y and unbinned in X; binning fac-
tors for each exposure are stored in the header keywords
SPECBINX and SPECBINY. While the format given in
Table 3 is standard, any number of image extensions may
be present in a histogram file. Raw histogram data files
are 1–1.5 MB in size.
A-2. Housekeeping and Jitter Files
For each exposure, a single housekeeping file is gen-
erated by OPUS from engineering data supplied by the
spacecraft (§ 4.1). Housekeeping files (suffix “hskpf.fit”)
contain 62 arrays, including spacecraft pointing informa-
tion, detector voltage levels, and various counter values,
in a single binary table extension. Arrays are tabulated
once per second, though most parameters are updated
only once every 16 seconds. Only a few of the house-
keeping arrays are employed by the pipeline. The detec-
tor high voltage and LiF, SiC, FEC, and AIC counter
arrays are used to populate the corresponding arrays in
the IDF timeline table (§ A-3).
From pointing information in the housekeeping file,
OPUS derives a jitter file (suffix “jitrf.fit”) consisting of
a single binary table extension with 4 columns: TIME,
DX, DY and TRKFLG. The time refers to the elapsed
time (in seconds) from the start of the exposure. Since
the engineering data commonly begin up to a minute be-
fore the exposure, the first few entries of this array are
negative. DX and DY are the offsets along the X (disper-
sion) and Y (cross-dispersion) directions in arc seconds.
These offsets are defined relative to the commanded posi-
tion of the telescope (presumably the target coordinates).
Finally, TRKFLG is the tracking quality flag. Its value
is −1 if the spacecraft is not tracking properly and 0 if
tracking information is unavailable. Values between 1
and 5 represent increasing levels of fidelity for DX and
Additional details regarding the contents and format
of the housekeeping and jitter files are provided in The
FUSE Instrument and Data Handbook.
A-3. Intermediate Data File (IDF)
The IDF (suffix “idf.fit”) contains three FITS binary
table extensions; their contents are listed in Table 4. The
file’s primary header-data unit (HDU 1) is copied directly
from the raw data file. (For histogram data, the SIA ta-
ble is discarded.) Various keywords are populated by the
initialization routine (§ 4.2) and by subsequent pipeline
modules. The first binary-table extension (HDU 2) con-
tains the photon events themselves. For time-tag data,
the TIME, XRAW, YRAW, and PHA arrays are copied
from the raw data file, and the WEIGHT array is ini-
tialized to 1.0. For histogram data, each image pixel
is mapped back to its coordinates on the full detector,
which are recorded in the XRAW and YRAW arrays.
The WEIGHT array is initialized to the number of pho-
ton events in the pixel. Zero-valued pixels are ignored.
Histogram data are not “unbinned.” Each entry of the
TIME array is set to the midpoint of the exposure and
each entry of the PHA array to 20. (Both arrays are
subsequently modified.)
The first pipeline module (§ 4.3) corrects for various
detector effects; it scales the WEIGHT array to correct
for detector dead time and populates the XFARF and
YFARF arrays. (The flight alignment reference frame
represents the output of an ideal detector.) Each pho-
ton event is assigned to one of six aperture-channel com-
binations or to the background (Table 5) and a corre-
sponding code is written to the CHANNEL array (§ 4.5).
After corrections for mirror, grating, and spacecraft mo-
tions, the photon’s final coordinates are recorded in the
X and Y arrays. Though floating-point arrays, XFARF,
YFARF, X, and Y are written to the IDF as arrays of
8-bit integers using the FITS TZERO and TSCALE key-
words. This process effectively rounds each element of
XFARF and X to the nearest 0.25 of a detector pixel
and each element of YFARF and Y to the nearest 0.1 of
a detector pixel.
The screening routines (§ 4.4) use information from the
timeline table (described below) to identify photons that
violate pulse-height limits, limb-angle constraints, etc.
“Bad” photons are not deleted from the IDF, but merely
flagged. Flags are stored as single bits in an 8-bit byte.
We use two sets of flags, TIMEFLGS for time-dependent
and LOC FLGS for location-dependent effects (Table 6).
For each bit, a value of 0 indicates that the photon is
“good,” except for the day/night flag, for which 0 = night
and 1 = day. It is possible to modify these flags without
re-running the pipeline. For example, one could exclude
day-time photons or include data taken close to the earth
limb.
The LAMBDA array contains the heliocentric wave-
length assigned to each photon (§ 4.6), and the ERGCM2
array records its “energy density” in units of erg cm−2 (§
4.7). To convert an extracted spectrum to units of flux,
one must divide by the exposure time and the width of
an output spectral bin.
The second extension (HDU 3) is a list of good-time
intervals (GTIs). The initial values are copied from the
raw data file, but they are modified by the pipeline once
the various screening routines have been run. By con-
vention, the START value of each GTI corresponds to
the arrival time of the first photon in that interval. The
STOP value is one second later than the arrival time of
the last photon in that interval. The length of the GTI
is thus STOP−START.
The third extension (HDU 4) is called the timeline ta-
CalFUSE: The FUSE Calibration Pipeline 21
ble. It contains status flags and spacecraft and detector
parameters used by the pipeline. An entry in the time-
line table is created for each second of the exposure. For
time-tag data, the first entry corresponds to the time of
the first photon event, and the final entry to the time of
the final photon event plus one second. (Should an expo-
sure’s photon-arrival times purport to exceed 55 ks, we
create timeline entries only for each second in the good-
time intervals.) For histogram data, the first element of
the TIME array is set to zero and the final element to
EXPTIME+1 (where EXPTIME is the exposure dura-
tion computed by OPUS). Because we require that EXP-
TIME equal both Σ (STOP−START), summed over all
entries in the GTI table, and the number of good times
in the timeline table, we must flag the final second of
each GTI as bad. No photons are associated with the
STOP time of a GTI.
Only the day/night and OPUS flags of the STA-
TUS FLAGS array are populated when the IDF is cre-
ated; the other flags are set by the various screening
routines (§ 4.4). The elements of the TIME SUNSET,
TIME SUNRISE, LIMB ANGLE, LONGITUDE, LAT-
ITUDE, and ORBITAL VEL arrays are computed from
the orbital elements in the FUSE.TLE file. The
HIGH VOLTAGE array is populated with values from
the housekeeping file. The LIF CNT RATE and
SIC CNT RATE arrays are initially populated with val-
ues derived from the LiF and SiC counter arrays in the
housekeeping file. For time-tag data, these arrays are
eventually updated with the actual count rates within
the target aperture, excluding regions contaminated by
airglow. The FEC CNT RATE and AIC CNT RATE,
described in § 4.3.2, are also derived from counter ar-
rays in the housekeeping file. For time-tag data, the
BKGD CNT RATE array is populated by the burst-
rejection routine (§ 4.4.5) and represents the count rate
in pre-defined background regions of the detector, ex-
cluding airglow features. The array is not populated for
histogram data. The YCENT LIF and YCENT SIC ar-
rays trace the centroid of the target spectra with time
before motion corrections are applied. These two arrays
are not used by the pipeline.
Raw time-tag files (§ A-1) employ the standard FITS
binary table format, listing TIME, X, Y, PHA for each
photon event in turn. The intermediate data files have a
slightly different format, listing all of the photon arrival
times, then the X coordinates, then the Y coordinates.
Formally, the table has only one row, and each element of
the table is an array. (To use the STSDAS terminology,
IDFs are written as 3-D tables.) The MDRFITS func-
tion from the IDL Astronomy User’s Library (Landsman
1993) can read both file formats; some older FITS read-
ers cannot. Note that, because HDUs 2 and 4 of the
IDFs contain floating-point arrays stored as shorts (using
the TZERO and TSCALE keywords), calls to MRDFITS
must include the keyword parameter FSCALE.
A-4. Bad-Pixel Maps (BPM Files)
The BPM files (suffix “bpm.fit”; § 4.8) consist of a
single binary table extension. Its format is similar to
that of the IDF, but it contains only five columns: X, Y,
CHANNEL, WEIGHT, and LAMBDA. The WEIGHT
column, whose values range from 0 to 1, represents the
fraction of the exposure that each pixel was affected by
a dead spot. The BPM files are not archived, but can
be generated from the IDF and jitter file using pipeline
software available from MAST.
A-5. Extracted Spectral Files
Extracted spectra (suffix “fcal.fit”; § 4.9) are stored
in a single binary table extension. Its contents are pre-
sented in Table 7. Note that the spectra are binned in
wavelength. The bin size can be set by the user, but the
default is 0.013 Å, which corresponds to about 2 detec-
tor pixels or about one-fourth of a spectral resolution el-
ement. The WAVE array records the central wavelength
of each spectral bin. For time-tag data, the COUNTS
array represents the total of all (raw) photon events as-
signed to the target aperture. For histogram data, the
COUNTS array is simply the WEIGHTS array divided
by the mean dead-time correction for the exposure. If op-
timal extraction is performed, the values of the FLUX,
ERROR, WEIGHTS, and BKGD arrays are determined
by that algorithm. As a result, the ratio of WEIGHTS
to COUNTS is constant only for histogram data. The
QUALITY array records the percentage of the extrac-
tion window containing valid data. It is 100 if no bad
pixels fell within the wavelength bin, 0 if the entire bin
was lost to bad pixels.
REFERENCES
Bohlin, R. C. 1996, AJ, 111, 1743
Bohlin, R. C., Colina, L., and Finley, D. S. 1995, AJ, 110, 1316
Feldman, P. D., Sahnow, D. J., Kruk, J. W., Murphy, E. M., and
Moos, H. W. 2001, J. Geophys. Res., 106, 8119
Hanisch, R. J., Farris, A., Greisen, E. W., Pence, W. D.,
Schlesinger, B. M., Teuben, P. J., Thompson, R. W., and
Warnock, A. 2001, A&A, 376, 359
Holberg, J. B., Barstow, M. A., and Sion, E. M. 1998, ApJS, 119,
Horne, K. 1986, PASP, 98, 609
Hubeny, I. and Lanz, T. 1995, ApJ, 439, 875
Kruk, J. W., Brown, T. M., Davidsen, A. F., Espey, B. R.,
Finley, D. S., and Kriss, G. A. 1999, ApJS, 122, 299
Landsman, W. B. 1993, in ASP Conf. Ser. 52: Astronomical Data
Analysis Software and Systems II, ed. R. J. Hanisch, R. J. V.
Brissenden, and J. Barnes (San Francisco: ASP), p. 246.
Lemoine, M. et al. 2002, ApJS, 140, 67
Massa, D. and Fitzpatrick, E. L. 2000, ApJS, 126, 517
McCandliss, S. R., France, K., Lupu, R. E., Burgh, E. B.,
Sembach, K., Kruk, J., Andersson, B-G, and Feldman, P. D.
2007, ApJ, in press (astro-ph/0701439)
Moos, H. W. et al. 2000, ApJ, 538, L1
Pence, W. 1999, in ASP Conf. Ser. 172: Astronomical Data
Analysis Software and Systems VIII, ed. D. M. Mehringer,
R. L. Plante, and D. A. Roberts (San Francisco: ASP), p. 487
Rose, J. F., Heller-Boyer, C., Rose, M. A., Swam, M., Miller, W.,
Kriss, G. A., and Oegerle, W. R. 1998, in Proc. SPIE 3349,
Observatory Operations to Optimize Scientific Return, p. 410
Sahnow, D. J. et al. 2000a, ApJ, 538, L7
Sahnow, D. J., Gummin, M. A., Gaines, G. A., Fullerton, A. W.,
Kaiser, M. E., and Siegmund, O. H. 2000b, in Proc. SPIE 4139,
Instrumentation for UV/EUV Astronomy and Solar Missions,
ed. S. Fineschi, C. M. Korendyke, O. H. Siegmund, and B. E.
Woodgate, p. 149
Siegmund, O. H. et al. 1997, in Proc. SPIE 3114, EUV, X-Ray,
and Gamma-Ray Instrumentation for Astronomy VIII, ed.
O. H. Siegmund and M. A. Gummin, p. 283
Tremsin, A. S., Siegmund, O. H., Gummin, M. A., Jelinsky,
P. N., and Stock, J. M. 1999, Appl. Opt., 38, 2240
http://arxiv.org/abs/astro-ph/0701439
22 Dixon et al.
Al+LiF Coated 
Mirror #2
Focal Plane 
Assemblies (4)
Detectors (2) 
Al+LiF Coated 
Mirror #1
SiC Coated 
Mirror #2
SiC Coated 
Mirror #1
Rowland Circles
Al+LiF Coated
Grating #2
Al+LiF Coated
Grating #1
SiC Coated
Grating #2
Fig. 1.— Schematic of the FUSE instrument optical system. The telescope focal lengths are 2245 mm, and the Rowland circle diameters
are 1652 mm. (Figure from Moos et al. 2000.)
CalFUSE: The FUSE Calibration Pipeline 23
-150 -100 -50 0 50 100 150
X Coordinate (arcsec)
Fig. 2.— The FUSE apertures projected onto the sky. In the FPA coordinate system, the LWRS, HIRS, and MDRS apertures are centered
at Y = −118.′′07, −10.′′27, and +90.′′18, respectively. The reference point (RFPT) at X = +55.′′18 is not an aperture; when a target is
placed at this location, the three apertures sample the background sky. With north on top and east on the left, this diagram corresponds
to an aperture position angle of 0◦. Positive aperture position angles correspond to a counter-clockwise rotation of the spacecraft about
the target aperture. This diagram represents only a portion of the FPA; its active area is 19′×19′.
24 Dixon et al.
0 5.0•103 1.0•104 1.5•104
2.0•103 4.0•103 6.0•103 8.0•103 1.0•104 1.2•104 1.4•1040
1000 1020 1040 1060 1080
1000 1020 1040 1060 10801080 1060 1040 1020
1080 1060 1040 1020
Wavelength (Å)
Fig. 3.— Image of detector segment 1A during a bright-earth observation. All lines are geocoronal. Note the strong Lyman β (1026 Å)
feature in each spectrum. The data have been fully corrected for detector and other distortions. Extended-source extraction windows for all
three apertures in both the LiF and SiC channels are marked; point-source extraction windows are somewhat narrower in Y. Instrumental
astigmatism is responsible for the bow-tie shape of each spectrum. The region shown corresponds to detector pixels 900 to 15,300 in X and
0 to 915 in Y.
Fig. 4.— Image of detector 1A in raw X and Y coordinates showing geometric distortion. The image shows only a portion of the detector.
It was constructed from 3 separate exposures with stars in the HIRS, MDRS, and LWRS apertures of the LiF1A detector.
CalFUSE: The FUSE Calibration Pipeline 25
Fig. 5.— Segments of detector 1A images showing filamentary (top) and checkerboard (bottom) bursts. Checkerboard bursts typically
fill the detector, save for the region around the LiF Lyman β lines on detector 1A.
Fig. 6.— Segment of LiF1A spectrum before (bottom) and after (top) astigmatism correction. Note the reduction of curvature in the
absorption features. A detector dead spot is present on the left side of the figure.
Fig. 7.— Night-time scattered-light image for detector 1A. Note the vertical scattered-light stripe to the right of the image center.
26 Dixon et al.
0 5000 10000 15000
X Coordinate (Pixels)
Fig. 8.— Geometric distortion in the X coordinate of the LiF1A LWRS channel. Data represent the difference between the measured
locations of H2 lines in the spectrum of GCRV 12336 and those predicted by a theoretical dispersion relation. The solid line is a spline fit
to the residuals.
0 10 20 30 40 50
Observation Date
2001/09
2002/08
2002/10
2003/12
2004/07
2006/12
Fig. 9.— Measured heliocentric velocity of the interstellar O I λ1039.23 absorption feature in each of 47 exposures of the white dwarf
KPD 0005+5106 through the high-resolution (HIRS) aperture. Holberg et al. 1998 report a heliocentric velocity of −7.50 ± 0.76 km s−1
for the interstellar features along this line of sight.
CalFUSE: The FUSE Calibration Pipeline 27
2000 2002 2004 2006
Date (Year)
LiF1A
SiC1A
980 1000 1020 1040 1060 1080 1100
Wavelength (Å)
LiF1A
SiC1A
Fig. 10.— FUSE sensitivity as a function of time. Upper panel: Effective area of the LiF1A and SiC1A channels, averaged over the
wavelength region 1030–1040 Å. The gap between 2004 October and 2006 May represents the period after the loss of the third reaction
wheel, when few calibration targets were observed. Lower panel: Effective-area curves for the LiF1A and SiC1A channels, dated 1999 and
2006. (For both channels, the 1999 curve has the higher effective area.)
28 Dixon et al.
1100 1120 1140 1160 1180
Wavelength (Å)
Fig. 11.— Point-source spectra showing the effects of the worm. Spectra A and B, obtained with the LiF1B channel, show deep depressions
near 1145 and 1160 Å, respectively. The wavelength of maximum attenuation varies with the Y position of the target within the aperture.
Spectrum C, obtained with the LiF2A channel, is unattenuated.
0 2000 4000 6000
Fig. 12.— Detector images showing the effects of the worm. In these negative images, worms appear as bright stripes parallel to the
dispersion axis. The data shown correspond to spectra A and B in Fig. 11 and span wavelengths between 1134 and 1187 Å.
1044.0 1044.5 1045.0 1045.5 1046.0
Wavelength (Å)
Fig. 13.— Moiré pattern in the LiF2B spectrum of the star HD 209339, obtained in histogram mode. The associated error array is
overplotted. The moiré ripples are strongest on this detector segment, but are also seen on segments 1A and 1B.
CalFUSE: The FUSE Calibration Pipeline 29
TABLE 3
Format of Raw Histogram Files
Image Sizea
HDU Contents (binned pixels)
1b SIA Tablec 8× 64
2 SiC Spectral Image (12–20) × 16384
3 LiF Spectral Image (12–20) × 16384
4 Left Stim Pulse 2 × 2048
5 Right Stim Pulse 2 × 2048
Note. — While this table describes the format of a typical
raw histogram file, any number of HDUs are allowed.
a Quoted image sizes assume the standard histogram binning: by
8 pixels in Y, unbinned in X. Actual binning factors are given in
the primary file header.
b Header keywords of HDU 1 contain exposure-specific informa-
tion.
c The SIA table describes which regions of the detector are in-
cluded in the file.
TABLE 4
Format of Intermediate Data Files
Array Name Format Description
Primary Header-Data Unit (HDU 1)
Header only. Keywords contain exposure-specific information.
HDU 2: Photon Event List
TIME FLOAT Photon arrival time (seconds)
XRAW SHORT Raw X coordinate (0–16383)
YRAW SHORT Raw Y coordinate (0–1023)
PHA BYTE Pulse height (0–31)
WEIGHT FLOAT Photons per binned pixel for HIST data,
initially 1.0 for TTAG data
XFARF FLOAT X coordinate in geometrically-corrected frame
YFARF FLOAT Y coordinate in geometrically-corrected frame
X FLOAT X coordinate after motion corrections
Y FLOAT Y coordinate after motion corrections
CHANNEL BYTE Aperture+channel ID for the photon (Table 5)
TIMEFLGS BYTE Time flags (Table 6)
LOC FLGS BYTE Location flags (Table 6)
LAMBDA FLOAT Wavelength of photon (Å)
ERGCM2 FLOAT Energy density of photon (erg cm−2)
HDU 3: Good-Time Intervals
START DOUBLE GTI start time (seconds)
STOP DOUBLE GTI stop time (seconds)
HDU 4: Timeline Table
TIME FLOAT Seconds from exposure start time
STATUS FLAGS BYTE Status flags
TIME SUNRISE SHORT Seconds since sunrise
TIME SUNSET SHORT Seconds since sunset
LIMB ANGLE FLOAT Limb angle (degrees)
LONGITUDE FLOAT Spacecraft longitude (degrees)
LATITUDE FLOAT Spacecraft latitude (degrees)
ORBITAL VEL FLOAT Component of spacecraft velocity
in direction of target (km/s)
HIGH VOLTAGE SHORT Detector high voltage (unitless)
LIF CNT RATE SHORT LiF count rate (counts/s)
SIC CNT RATE SHORT SiC count rate (counts/s)
FEC CNT RATE FLOAT FEC count rate (counts/s)
AIC CNT RATE FLOAT AIC count rate (counts/s)
BKGD CNT RATE SHORT Background count rate (counts/s)
YCENT LIF FLOAT Y centroid of LiF target spectrum (pixels)
YCENT SIC FLOAT Y centroid of SiC target spectrum (pixels)
Note. — Times are relative to the exposure start time, stored in the header
keyword EXPSTART. To conserve memory, floating-point values are stored as shorts
(using the FITS TZERO and TSCALE keywords) except for TIME, WEIGHT,
LAMBDA and ERGCM2, which remain floats.
30 Dixon et al.
TABLE 5
Aperture Codes for IDF
CHANNEL Array
Aperture LiF SiC
HIRS 1 5
MDRS 2 6
LWRS 3 7
Not in an aperture 0
TABLE 6
Bit Codes for IDF Time and
Location Flags
Bit Value
Time Flags
8 User-defined bad-time interval
7 Jitter (target out of aperture)
6 Not in an OPUS-defined GTI or
Photon arrival time unknown
5 Burst
4 High voltage reduced
3 SAA
2 Limb angle
1 Day/Night flag (N = 0, D = 1)
Location Flags
8 Not used
7 Fill data (histogram mode only)
6 Photon in bad-pixel region
5 Photon pulse height out of range
4 Right stim pulse
3 Left stim pulse
2 Airglow feature
1 Not in detector active area
Note. — Flags are listed in order from
most- to least-significant bit.
TABLE 7
Format of Extracted Spectral Files
Array Name Format Description
Primary Header-Data Unit (HDU 1)
Header only. Keywords contain exposure-specific information.
HDU 2: Extracted Spectrum
WAVE FLOAT Wavelength (Å)
FLUX FLOAT Flux (erg cm−2 s−1 Å−1)
ERROR FLOAT Gaussian error (erg cm−2 s−1 Å−1)
COUNTS INT Raw counts in extraction window
WEIGHTS FLOAT Raw counts corrected for dead time
BKGD FLOAT Estimated background in extraction window (counts)
QUALITY SHORT Percentage of window used for extraction (0–100)
ABSTRACT
  Since its launch in 1999, the Far Ultraviolet Spectroscopic Explorer (FUSE)
has made over 4600 observations of some 2500 individual targets. The data are
reduced by the Principal Investigator team at the Johns Hopkins University and
archived at the Multimission Archive at Space Telescope (MAST). The
data-reduction software package, called CalFUSE, has evolved considerably over
the lifetime of the mission. The entire FUSE data set has recently been
reprocessed with CalFUSE v3.2, the latest version of this software. This paper
describes CalFUSE v3.2, the instrument calibrations upon which it is based, and
the format of the resulting calibrated data files.

<|endoftext|><|startoftext|>
Voltage-Current curves for small Josephson junction arrays
B. Douçot1 and L.B. Ioffe2
Laboratoire de Physique Théorique et Hautes Énergies, CNRS UMR 7589,
Universités Paris 6 et 7, 4, place Jussieu, 75252 Paris Cedex 05 France
Center for Materials Theory, Department of Physics and Astronomy,
Rutgers University 136 Frelinghuysen Rd, Piscataway NJ 08854 USA
We compute the current voltage characteristic of a chain of identical Josephson circuits charac-
terized by a large ratio of Josephson to charging energy that are envisioned as the implementation
of topologically protected qubits. We show that in the limit of small coupling to the environment
it exhibits a non-monotonous behavior with a maximum voltage followed by a parametrically large
region where V ∝ 1/I . We argue that its experimental measurement provides a direct probe of the
amplitude of the quantum transitions in constituting Josephson circuits and thus allows their full
characterization.
I. INTRODUCTION
In the past years, the dramatic experimental progress
in the design and fabrication of quantum two level sys-
tems in various superconducting circuits1 has raised a
hope that such solid state devices could eventually serve
as basic logical units in a quantum computer (qubits).
However, a very serious obstacle on this path is the ubiq-
uitous decoherence, which in practice limits the typical
life-time of quantum superpositions of two distinct log-
ical states of a qubit to microseconds. This is far from
being sufficient to satisfy the requirements for implement-
ing quantum algorithms and providing systematic error
correction.2
This has motivated us to propose some alternative
ways to design Solid-State qubits, that would be much
less sensitive to decoherence than those presently avail-
able. These protected qubits are finite size Josephson
junction arrays in which interactions induce a degener-
ate ground-state space characterized by the remarkable
property that all the local operators induced by couplings
to the environment act in the same way as the identity
operator. These models fall in two classes. The first
class is directly inspired by Kitaev’s program of topo-
logical quantum computation,3 and amounts to simulat-
ing lattice gauge theories with small finite gauge groups
by a large Josephson junction lattice.4,5,6 The second
class is composed of smaller arrays with sufficiently large
and non-Abelian symmetry groups allowing for a persis-
tent ground-state degeneracy even in the presence of a
noisy environment.7,8 All these systems share the prop-
erty that in the classical limit for the local superconduct-
ing phase variables (i.e. when the Josephson coupling is
much larger than the charging energy), the ground-state
is highly degenerate. The residual quantum processes
within this low energy subspace lift the classical degen-
eracy in favor of macroscopic coherent superpositions of
classical ground-states. The simplest example of such
system is based on chains of rhombi (Fig. 1) frustrated
by magnetic field flux Φ = Φ0/2 that ensures that in the
classical limit each rhombus has two degenerate states.8
Practically, it is important to be able to test these ar-
rays and optimize their parameters in relatively simple
experiments. In particular one needs the means to verify
the degeneracy of the classical ground states, the pres-
ence of the quantum coherent processes between them
and measure their amplitude. Another important pa-
rameter is the effective superconducting stiffness of the
fluctuating rhombi chain. The classical degeneracy and
chain stiffness can be probed by the experiments dis-
cussed in9; they are currently being performed10. The
idea is that a chain of rhombi threaded individually by
half a superconducting flux quantum, the non-dissipative
current is carried by charge 4e objects,11,12 so that the
basic flux quantum for a large closed chain of rhombi
becomes h/(4e) instead of h/(2e) which can be directly
observed by measuring the critical current of the loop
made from such chain and a large Josephson junction.
The main goal of the present paper is to discuss a prac-
tical way to probe directly the quantum coherence associ-
ated with these tunneling processes between macroscop-
ically distinct classical ground-states. In principle, it is
relatively simple to implement, since it amounts to mea-
suring the average dc voltage generated across a finite
Josephson junction array in the presence of a small cur-
rent bias (i. e. this bias current has to be smaller than
the critical current of the global system). The physi-
cal mechanism leading to this small dissipation is very
interesting by itself; it was orinally discussed in a sem-
inal paper by Likharev and Zorin16 in the context of a
single Josephson junction. Consider one element (single
junction or a rhombus) of the chain, and denote by φ
the phase difference across this element. When it is dis-
connected from the outside world, its wave-function Ψ
is 2πζ-periodic in φ where ζ = 1 for a single junction
and ζ = 1/2 for a rhombus. This reflects the quanti-
zation of the charge on the island between the elements
which can change by integer multiples of 2e/ζ. If φ is
totally classical, the element’s energy is not sensitive to
the choice of a quasi-periodic boundary condition of the
form Ψ(φ+ 2πζ) = exp(i2πζq)Ψ(φ), where q represents
the charge difference induced across the rhombus. In the
presence of coherent quantum tunneling processes for φ,
the energy of the element ǫ(q) will acquire q-dependence,
with a bandwidth directly related to the basic tunnel-
http://arxiv.org/abs/0704.0900v1
ing amplitude. Whereas q is constrained to be integer
for an isolated system, it is promoted to a genuine con-
tinuous degree of freedom when the array is coupled to
leads and therefore to a macroscopic dissipative environ-
ment. So, as emphasized by Likharev and Zorin16, the
situation becomes perfectly analogous to the Bloch the-
ory of a quantum particle in a one-dimensional periodic
potential, where the phase φ plays the role of the posi-
tion, and q of the Bloch momentum. A finite bias cur-
rent tilts the periodic potential for the phase variable, so
that in the absence of dissipation, the dynamics of the
phase exhibits Bloch oscillations, very similar to those
which have been predicted17 and observed18,19 for elec-
trons in semi-conductor super-lattices. If the driving cur-
rent is not too large, it is legitimate to neglect inter-band
transitions induced by the driving field, and one obtains
the usual spectrum of equally spaced localized levels of-
ten called a Wannier-Stark ladder. In the presence of
dissipation, these Wannier-Stark levels acquire a finite
life-time, and therefore the time-evolution of the phase
variable is characterized by a slow and uniform drift su-
perimposed on the faster Bloch oscillations. This drift is
translated into a finite dc voltage by the Josephson re-
lation 2eV = ~(dφ/dt). This voltage decreases with cur-
rent until one reaches the current bias high enough to
induce the interband transition. At this point the phase
starts to slide down fast and the junction switches into
a normal state. In the context of Josephson junctions
these effects were first observed in the experiments on
Josephson contacts with large charging energy20,21,22,23
and more recently24,25 in the semiclassical (phase) regime
of interest to us here. Bloch oscillations in the quantron-
ium circuit driven by a time-dependent gate voltage have
also been recently observed.26
This picture holds as long as the dissipation affecting
the phase dynamics is not too strong, so that the radia-
tive width of the Wannier-Stark levels is smaller than the
nearest-level spacing (corresponding to phase translation
by 2πζ) that is proportional to the bias current. This
provides a lower bound for the bias current which has
to be compatible with the upper bound coming from the
condition of no inter-band transitions. As we shall see,
this requires a large real part of the external impedance
Zω ≫ RQ as seen by the element at the frequency of the
Bloch oscillation, where the quantum resistance scale is
RQ = h/(4e
2). This condition is the most stringent in or-
der to access experimentally the phenomenon described
here. Note that this physical requirement is not lim-
ited to this particular experimental situation, because
any circuit exploiting the quantum coherence of phase
variables, for instance for quantum information process-
ing, has to be imbedded in an environment with a very
large impedance in order to limit the additional quantum
fluctuations of the phase induced by the bath. The intrin-
sic dissipation of Josephson elements will of course add to
the dissipation produced by external circuitry, but we ex-
pect that in the quantum regime (i.e. with sizable phase
fluctuations) considered here, this additional impedance
will be of order of RQ at the superconducting transition
temperature, and will grow exponentially below. Thus,
the success of the proposed measurements is also a test of
the quality of the environment for the circuits intended
to serve as protected qubits.
In many physical realizations Zω has a significant fre-
quency dependence and the condition Zω ≫ RQ is sat-
isfied only in a finite frequency range ωmax > ω > ωmin.
This situation is realized, for example, when the Joseph-
son element is decoupled from the environment by a
long chain of larger Josephson junctions (Section V).
In this case the superconducting phase fluctuations are
suppressed at low frequencies implying that a phase co-
herence and thus Josephson current reappears at these
scales. The magnitude of the critical current is however
strongly suppressed by the fluctuations at high frequen-
cies. This behavior is reminiscent of the reappearance
of the Josephson coupling induced by the dissipative en-
vironment observed in27. At higher energy scales fluc-
tuations become relevant, the phase exhibits Bloch os-
cillations resulting in the insulating behavior described
above. Thus, in this setup one expects a large hierarchy
of scales: at very low currents one observes a very small
Josephson current, at larger currents an almost insulat-
ing behavior and finally a switching into the normal state
at largest currents.
In the case of a chain of identical elements, the total
dc voltage is additive, but Bloch oscillation of different
elements might happen either in phase or in antiphase.
In the former case the ac voltages add increasing the
dissipation in the external circuitry; while in the latter
case the dissipation is low and the individual elements
get more decoupled from the environment. As we show in
Section III a small intrinsic dissipation of the individual
elements is crucial to ensure the antiphase scenario.
This paper is organized as follows. In section II, we
present a semi-classical treatment of the voltage versus
bias current curves for a single Josephson element. We
show that this gives an accurate way to measure the effec-
tive dispersion relation ǫ(q) of this element, which fully
characterizes its quantum transition amplitude. Further,
we show that application of the ac voltage provides a
direct probe of the periodicity (2π versus π) of each ele-
ment. In Section III we consider the chain of these ele-
ments and show that under realistic assumptions about
the dynamics of individual elements, it provides much
more efficient decoupling from the environment. Sec-
tion IV focusses on the dispersion relation expected in
a practically important case of a fully frustrated rhom-
bus which is the building block for the protected arrays
considered before.5,8 In this case, the band structure has
been determined by numerical diagonalizations of the
quantum Hamiltonian. An important result of this anal-
ysis is that even in the presence of relatively large quan-
tum fluctuations, the effective band structure is always
well approximated by a simple cosine expression. Finally,
in section V we discuss the conditions for the experimen-
tal implementation of this measurement procedure and
the full V (I) characteristics expected in realistic setup.
After a Conclusion section, an Appendix presents a full
quantum mechanical derivation of the dc voltage when
the bias current is small enough so that inter-band tran-
sitions can be neglected, and large enough so that the
level decay rate can simply be estimated from Fermi’s
golden rule.
II. SEMI-CLASSICAL EQUATIONS FOR A
SINGLE JOSEPHSON ELEMENT
Let us consider the system depicted on Fig. (1). In
the absence of the current source, the energy of the one
dimensional chain of N Josephson elements is a 2πζ pe-
riodic function of the phase difference φ across the chain.
The current source is destroying this periodicity by in-
troducing the additional term −~(I/2e)φ in the system’s
Hamiltonian. Because φ is equal to the sum of phase
differences across all the individual elements, it seems
that the voltage generated by the chain is N times the
voltage of a chain reduced to a single element. This is,
however, not the case: the individual elements are cou-
pled by the common load, and furthermore, as we show
in the next section, their collective behavior is sensitive
to the details of the single element dynamics. In this sec-
tion, we consider the case of a single Josephson element
(N = 1), rederive the results of Likharev and Zorin16 for
single Josephson contact and generalize them for more
complicated structures such as rhombus and give ana-
lytic equations convenient for data comparison.
The dynamics of a single Josephson contact is analo-
gous to the motion of a quantum particle (with a charge
e) in a one-dimensional periodic potential (with period
a) in the presence of a static and uniform force F , the
phase-difference φ playing the role of the spatial coordi-
nate x of the particle.16 In the limit of a weak external
force, it is natural to start by computing the band struc-
ture ǫn(k) for k in the first Brillouin zone [−π/a, π/a],
n being the band label. A first natural approximation is
to neglect interband transitions induced by the driving
field. This is possible provided the Wannier-Stark energy
gap ∆B = Fa is smaller than the typical band gap ∆ in
zero external field. As long as ∆B is also smaller than
the typical bandwidth W , the stationary states of the
Schrödinger equation spread over many (roughlyW/∆B)
periods, so we may ignore the discretization (i.e. one
quantum state per energy band per spacial period) im-
posed by the projection onto a given band. We may
therefore construct wave-packets whose spacial extension
∆x satisfies a≪ ∆x≪ aW/∆B, and the center of such a
wave-packet evolves according to the semi-classical equa-
tions:
dǫn(k)
F (2)
In the presence of dissipation, the second equation is
modified according to:
F − m
where m∗ is the effective mass of the particle in the n-th
band and τ is the momentum relaxation time introduced
by the dissipation.
FIG. 1: The experimental setup discussed in this paper: a
chain of identical building blocks represented by shaded rect-
angle that are biased by the external current source charac-
terized by the impedance Z(ω). The internal structure of the
block that is considered in more detail in the following sec-
tions is either a rhombus (4 junction loop) frustrated by half
flux quantum, or a single Josephson junction but the the re-
sults of the section II can be applied to any circuit of this
form provided that the junctions in the elementary building
blocks are in the phase regime, i.e. EJ ≫ EC .
In the context of a Josephson circuit, we have to diago-
nalize the Hamiltonian describing the array as a function
of the pseudo-charge q associated with the 2πζ periodic
phase variable φ. The quantity q controls the periodic
boundary condition imposed on φ, namely the system’s
wave-function is multiplied by exp(i2πq) when φ is in-
creased by 2πζ. From this phase-factor, we see that
the corresponding Brillouin zone for q is the interval
[−1/2, 1/2]. For a simple Josephson contact (ζ = 1), the
fixed value of q means that the total number of Cooper
pairs on the site carrying the phase φ is equal to q plus an
arbitrary integer. For a doubly periodic element, such as
rhombus (ζ = 1/2), charge is counted in the units of 4e.
To simplify the notations we assume usual 2π periodicity
(ζ = 1) in this and the following Sections and restore
the ζ-factors in Sections IV, V. From the band structure
ǫn(q), we may write the semi-classical equations of mo-
tion in the presence of the bias current I and the outer
impedance Z as:
dǫn(q)
where we used the Josephson relation for the voltage drop
V across the Josephson element as V = (~/2e)(dφ/dt)
and defined ZQ = ~/(4e
This semi-classical model exhibits two different
regimes. Let us denote by ωmax the maximum value of
the “group velocity” |dǫn(q)/(~dq)|. If the driving cur-
rent is small (I < Ic = 2eωmaxZQ/Z), it is easy to see
that after a short transient, the system reaches a station-
ary state where q is constant and:
that is: V = ZI. Thus, at I < Ic the current flows en-
tirely through the external impedance, i.e. the Joseph-
son elements become effectively insulating due to quan-
tum phase fluctuations. Indeed, a Bloch state writ-
ten in the phase reprentation corresponds to a fixed
value of the pseudo-charge q and non-zero dc voltage
(1/2e)(dǫn/dq). Note that the measurement of the maxi-
mal value Vc of the voltage on this linear branch directly
probes the spectrum of an individual Josephson block,
because Vc = ~ωmax/2e
At stronger driving (I > Ic), it is no longer possible
to find a stationary solution for q. The system enters
therefore a regime of Bloch oscillations. In the absence of
dissipation (Z/ZQ → ∞), the motion is periodic in time
for both φ and q. A small but finite dissipation preserves
the periodicity in q, but induces an average drift in φ or
equivalently a finite dc voltage. To see this, we first note
that the above equations of motion imply:
Since the right-hand side is a periodic function of q with
period 1, q(t) is periodic with the period T (I) given by:
T (I) =
∫ 1/2
f(q)dq (8)
f(q) =
On the other hand, the instantaneous dissipated power
reads:
ǫn(q)−
= −~ZQ
)2 (10)
Because q(t) is periodic, averaging this expression over
one period gives:
〉 = 2e
)2〉 (11)
or, equivalently:
〈V 〉 = ~
)2〉 (12)
Using the equations of motion, we get more explicitely:
〈V 〉 = 1
) ∫ 1/2
−1/2(
)2f(q)dq
∫ 1/2
−1/2 f(q)dq
Here we emphasized by the subscript that Zω might have
some frequency dependence. As we show in Appendix,
the dissipation actually occurs at the frequency of Bloch
oscillations that becomes ωB = 2πI/2e in the limit of
large currents. In the limit of large currents, I ≫ Ic,
(that can be achieved for large impedances) we may ap-
proximate f(q) by a constant, so the voltage is given by
the simpler expression:
〈V (I ≫ Ic)〉 =
4e2ZωI
∫ 1/2
)2dq (14)
On the other hand, when I approaches Ic from above,
Bloch oscillations become very slow and f(q) is strongly
peaked in the vicinity of the maximum of the group ve-
locity. Since this velocity is in general a smooth function
of q, we get in this limit for the maximal dc voltage:
2ω2max
4e2Z0Ic
= Z0Ic (15)
0 1 2 3 4 5
0 1 2 3 4 5
FIG. 2: Typical I − V curve of a single Josephson element
measured by a circuits shown in Fig. 1
In the simplest case of a purely harmonic dispersion,
ǫ(q) = 2w cos 2πq, the maximal voltage Vc = 4πw/(2e).
If one can further neglect the frequency dependency of
Z, the V (I) can be computed analytically:
〈V 〉 = ZI I < Ic (16)
〈V 〉 = ZIc
I2 − I2c
I > Ic (17)
We show this dependence in Fig. 2. This expression (16),
(17) is related to the known result for Z ≪ ZQ13,14 by
the duality15 transformation:
V → I,
I → V,
Z → 1
The semi-classical approximation is valid when the os-
cillation amplitude of the superconducting phase is much
larger than 2π, which allows the formation of the semi-
classical wave-packets. When I is much larger than Ic,
this oscillation amplitude is equal to 2eW/~I, whereW is
the total band-width of ǫn(q). This condition also ensures
that the work done by the current source when the phase
increases by 2π is much smaller than the band-width. In
order to observe the region of negative differential resis-
tance, corresponding to the regime of Bloch oscillations,
we require therefore that:
2π~Ic
≪W ≃ 2eVc
, (18)
where the last equality becomes exact in the case of a
purely harmonic dispersion. This translates into:
Z ≫ RQ. (19)
For large currents one can compute dc voltage di-
rectly by using the golden rule (without semiclassics);
we present the results in Appendix A. The result is con-
sistent with the large I limit of Eq. (17), 〈V 〉 = V 2c /2ZI.
Deep in the classical regime (EJ ≫ EC), the bandwidth
and the generated voltage become exponentially small.
In this regime the bandwidth is much smaller than the
energy gaps, so these formulas are applicable (asuming
(19) is satisfied) until the splitting between Wannier-
Stark levels becomes equal to the first energy gap given
by the Josephson plasma frequency, i.e. for I < eωJ/π.
Upon a further increase of the driving current in this
regime the generating voltage experiences resonant in-
crease for each splitting that is equal to the energy gap:
Ik = e(Ek−E0)/π. Physically, at these currents the phase
slips are rare events that lead to the excitation of the
higher levels at a new phase value that are followed by
their fast relaxation. At very large energies, the band-
width of these levels becomes larger than their decay rate
due to relaxation, (RQ/Z)EC . At these driving currents,
the system starts to generate large voltage and switches
to a normal state. At a very large EJ this happens at
the driving currents very close to the Josephson criti-
cal current 2eEJ , but in a numerically wide regime of
100 & EJ/EC & 10 the generated voltage at low curents
is exponentially small but switching to the normal state
occurs at significantly smaller currents than 2eEJ .
In the intermediate regime where EJ and EC are com-
parable, we expect a band-width comparable to energy
gaps so that the range of application of the quantum
derivation is not much larger than the one for the semi-
classical approach.
Negative differential resistance associated to Bloch os-
cillations has been predicted long ago,28 and observed
experimentally29 in the context of semi-conductor su-
perlattices. For Josephson junctions in the cross-over
regime (EJ/EC ≃ 1), a negative differential resistance
has been observed in a very high impedance environ-
ment,24 in good agreement with earlier theoretical pre-
dictions.30 More recently, the I − V curve of the type
shown on Fig 2 have been reported on a junction with
a ratio EJ/EC = 4.5
25. These experiments show good
agreement with a calculation which takes into account
the noise due to residual thermal fluctuations in the
resistor.31
Although the above results allows the extraction of the
band structure of an individual Josephson block from the
measurement of dc I−V curves, the interpretation of ac-
tual data may be complicated by frequency dependence
of the external impedance Zω. Additional information
independent on Zω can be obtained from measuring the
dc V (I) characteristics in the circuit driven by an ad-
ditional ac current. In this situation, the semi-classical
equations of motion become:
dǫn(q)
I + I ′ cos(ωt)
A small ac driving amplitude I ′ strongly affects the V (I)
curve only in the vicinity of resonances where nωB(IR) =
mω, with m and n integers. The largest deviation occurs
for m = n = 1. Furthermore, for I ′ ≪ I the terms
with m > 1 are parametrically small in I ′/I while for
I ≫ Ic the terms with n > 1 are parametrically small
in Ic/I. Experimental determination of the resonance
current, IR, would allow a direct measurement of the
Bloch oscillation frequency and thus the periodicity of the
phase potential (see next Section). Observation of these
mode locking properties have in fact provided the first
experimental evidence of Bloch oscillations in a single
Josephson junction.20,21
We now calculate the shape of V (I) curve in the vicin-
ity ofm = n = 1 point when both I ′ ≪ I and I ≫ Ic. We
denote by φ0(t) and q0(t) the time-dependent solutions
of the equations at I = IR in the absence of ac driving
current. We shall look for solutions which remain close
to φ0(t) and q0(t) at all times and expand them in small
deviations φ1 = φ − φ0, q1 = q − q0. We can always
assume that q1 has no Fourier component at zero fre-
quency because such component can be eliminated by a
time translation applied to q0. The equations for φ1, q1
become
ǫ′′n(q0)q1 (22)
I − IR + I ′ cos(ωt)
Because the main component of
d2ǫn(q0)
oscillates with
frequency ω and q1 has no dc component, the average
value of the voltage
is due to the part of q1 that os-
cillates with the same frequency, q1ω = I
′/(2eω) sin(ωt).
Because q0 = ω(t−t0)+χ(ω(t−t0)) where χ(t) is a small
periodic function, the first equation implies that
> = 〈
ǫ′′n(ω(t− t0)) sin(ωt)〉
ǫ′′n(q) cos(2πq)dq
The deviation q1 remains small only if the constant parts
cancel each other in the right hand side of the equation
(23). This implies
I − IR
ǫn(q) cos(2πq)dq (24)
We conclude that in the near vicinity of the reso-
nances the increase of the current does not lead to addi-
tional current through the Josephson circuit, so the re-
lation between current and voltage becomes linear again
δV = ZδI. In other words, the Josephson circuit be-
comes insulating with respect to current increments. The
width of this region (in voltage) is directly related to the
first moment of the energy spectrum of the Josephson
block providing one with the direct experimental probe
of this quantity. In particular, a Josephson element such
as rhombus in a magnetic flux somewhat different from
Φ0/2 displays a phase periodicity 2π but a very strong
deviations from a simple cos 2πq spectrum that will man-
ifest themselves in first moment of the spectrum. Note
finally, that the discussion above assumes that the ex-
ternal impedance Zω has no resonances in the important
frequency range. The presence of such resonances will
modify significantly the observed V (I) curves because it
would provide an efficient mechanism for the dissipation
of Bloch (or Josephson) oscillations at this frequency.
III. CHAIN OF JOSEPHSON ELEMENTS
We shall first consider the simplest example of a two-
element chain, because it captures the essential physics.
This chain is characterized by two phase differences (φ1
and φ2) and two pseudo-charges (q1 and q2). The equa-
tions of motion for the pseudo-charges (5) implies that
the charge difference q1− q2 is constant, because the cur-
rents flowing through these elements are equal, and thus
the right-hand sides of the evolution equations (5) are
identical. Because of this conservation law, even the long-
term physical properties depend on the initial conditions.
Similar problems have already been discussed in the con-
text of a chain of Josephson junctions driven by a current
larger than the critical current.32,33,34,35 This unphysical
behavior disappears if we take into account the dissipa-
tion associated with individual elements. Physically, it
might be due to stray charges, two-level systems, quasi-
particles, phonon emission, etc.36,37 A convenient model
for this dissipation is to consider an additional resistor
in parallel with each junction. For the sake of simplicity,
we assume that each element has a low energy band with
a simple cosine form. This physics is summarized by the
equations:
φ̇j = 4πw sin 2πqj (25)
q̇j =
I − 1
φ̇i −
Eliminating the phases gives:
(q̇1 +Ω1 sin 2πq1) = ν −
(sin 2πq1 + sin 2πq2)
(q̇2 +Ω2 sin 2πq2) = ν −
(sin 2πq1 + sin 2πq2)
where
(2e)2Ri
(2e)2Z
Here we allowed for different effective resistances asso-
ciated with each element because this has an important
effect on their dynamics. Indeed the difference between
the currents flowing through the resistors changes the
charge accumulated at the middle island and therefore
violates the conservation law mentioned before. Using
the notations δΩ = (Ω2 − Ω1)/2 and q± = (q2 ± q1)/2,
we have:
˙q− +Ωsin 2πq− cos 2πq++
+δΩcos 2πq− sin 2πq+ = 0 (27)
˙q+ + (ν0 +Ω) sin 2πq+ cos 2πq−−
−δΩ sin 2πq− cos 2πq+ = ν (28)
Significant quantum fluctuations imply that internal re-
sistance of the element R ∼ ZQ for individual elements
at T . TC ; at lower temperature it grows exponen-
tially. Thus, in a realistic case R ≫ Z which implies that
Ωi ≪ ν. In the insulating regime the equations (27-28)
have stable stationary solution (ν0 +Ω) sin 2πq+ = ν,
q− = 0. This solution exists for (ν0 +Ω) < ν , i.e. if
the voltage drop across both junctions does not exceed
Vc = 8πw/(2e). The conducting regime occurs when
ν > (ν0 +Ω); to simplify the analytic calculations we
assume that ν ≫ ν0. This allows to solve the equations
(27-28) by iterations in all non-linear terms. In the ab-
sence of non-linearity q+ = νt , q− = const; the first
iteration gives periodic corrections ∝ cos 2πνt. Averag-
ing the result of the second order iteration over the period
we get
˙〈q−〉 = −
ν0 cos
2 2πq− + 2Ω
The second term in the right hand side of this equation
is much smaller than the first if Ω ≪ ν0. In its absence
the dynamics of q− has fixed points at cos 2πq− = 0. At
these fixed points the periodic potentials generated by in-
dividual elements cancel each other and the dissipation in
external circuitry (which is proportional to cos2(2πq−))
is strictly zero. In a general case the equation (29) has
solution
cos2(2πq−) =
1 + ν0+2Ω
2Ω(ν0 + 2Ω)t
that corresponds to the short bursts of dissipation in
external circuitry that occur with low frequency νb =
2Ω(ν0 + 2Ω). The average value of cos
2(2πq−)
< cos2(2πq−) >=
ν0+2Ω
ν0 + 2Ω
is small implying that the effective dissipation introduced
by the external circuitry is strongly suppressed because
the pseudocharge oscillations on different elements al-
most cancel each other. The effective impedance of the
load seen by individual junction is strongly increased:
Zeff =
ν0 + 2Ω
Z (30)
Similar to a single element case discussed in the previous
Section, an additional dissipation in the external circuit
implies dc current across the Josephson chain
V = Vc
I ≫ Ic = Vc/Zeff
We conclude that a chain of Josephson elements has a
current-voltage characteristics similar to the one of the
single element with one important difference: the effec-
tive impedance of the external circuitry is strongly en-
hanced by the antiphase locking of the individual Joseph-
son elements. In particular, it means that the condition
Z ≫ RQ is much easier to satisfy for the chain of the
elements than for a single element. The analytical equa-
tions derived here describe the chain of two elements but
it seems likely that similar suppression of the dissipation
should occur in longer chains.
To substantiate this claim, lets us generalize the av-
eraging method which led to Eq. (29) for N = 2. The
coupled equations of motion read:
q̇j +Ωj sin 2πqj = ν −
sin 2πqk (31)
To second order in Ωj and ν0, the averaged equations of
motion are:
〈q̇j〉 = −
cos(2π(qk − ql))
− ν0Ωj
cos(2π(qj − qk)) (32)
This set of coupled equations is similar to the Kuramoto
model for coupled rotors38 defined as:
q̇j = ωj −
sin(2π(qj − qk) + α) (33)
The equation of motion (33) exhibits synchronisation of
a finite fraction of the rotors only for K > Kc(α).
39,40
The last term in Eq. (32) is equivalent to the interac-
tion term of Kuramoto model with α = π/2. The ad-
ditional (third) term in the model (32) is the same for
all oscillators, it is thus not correlated with individual
qj and thus can not directly lead to their synchroni-
sation. Remarkably, it turns out that for model (33)
Kc(α = π/2) = 0
39,40, suggesting that in our case, syn-
chronization never occurs on a macroscopic scale. Note
that the coupling K arising from Eq. (32) is not only j-
dependent, but it is also proportional to N . This could
present a problem in the infinite N limit, but should not
present a problem in a finite system. It is striking to see
that α = π/2 is the value for which synchronization is
the most difficult.
IV. ENERGY BANDS FOR A FULLY
FRUSTRATED JOSEPHSON RHOMBUS
In order to apply general results of the previous sec-
tion to the physical chains made of Josephson junctions
or more complicated Josephson circuits we need to com-
pute the spectrum of these systems as a function of the
pseudocharge q conjugated to the phase across these ele-
ments. In all cases the superconducting phase in Joseph-
son devices fluctuates weakly near some classical value
φ0 where the Josephson energy has a minimum in the
limit EJ/EC ≫ 1. In the vicinity of the minimum, the
phase Hamiltonian is H = −4EC d
E′′(φ0)(φ−φ0)2,
so a higher energy state of the individual element (at a
fixed q) can be approximated by one of the oscillator
En = (n +
)ωJ where the Josephson plasma frequency
8E′′(φ0)EC ≈
8EJEC . The Josephson en-
ergy is periodic in the phase with the period 2π but the
amplitude of the transitions between these minima is ex-
ponentially small:
w = a~ωJ(EJ/EC)
1/4 exp(−c
EJ/EC)
where a, c ∼ 1. In this limit one can neglect the contribu-
tion of the excited states (separated by a large gap ωJ )
to the lower band, so the low energy spectrum acquires
a simple form ǫ(q) = 2w cos 2πq. The numerical coeffi-
cients c, a in the formulae for the transition amplitude
depend on the element construction. For a single junc-
tion as = 8 2
π , cs =
8 while for the rhombus
ar ≈ 4.0 , cr ≈ 1.61. In case of the rhombus in mag-
netic field with flux Φ0/2 the Hamiltonian is periodic in
phase with period π provided that the rhombus is sym-
metric along its horizontal axis: indeed in this case the
combination of the time reversal symmetry and reflec-
tion ensures that the Josephson energy has a minimum
for φ0 = ±π/2. Thus, in this case the period in q dou-
bles and the low energy band becomes ǫ(q) = 2w cosπq.
The maximal voltage generated by the chain of N such
elements at I = Ic = (8πζew/~)(ZQ/Z) is
Vc = N
The voltage generated at larger currents depend on the
collective behaviour of the elements in the chain. For a
single element it is simply
〈V (I)〉 = (2πζw)
I2 − I2c
For more than one element the total volage is sufficiently
reduced due to the antiphase correlations. Generally, one
expects that
〈V (I)〉 = N (2πζw)
I2 − I2c
, (35)
where Zeffω is the effective impedance of the environment
affecting each Josephson element which is generally much
larger than its ’bare’ impedance Zω. For two elements the
exact solution (see previous Section) gives Zeffω ≈
that shows the increase of the effective impedance by a
large factor
R/Z. We expect that a similar enhance-
ment factor appears for all N & 2. Finallly, For I < Ic,
the system is ohmic with:
〈V (I)〉 = Z0I (36)
As discussed in Section II, application of a small addi-
tional ac voltage produces features on the current-voltage
characteristics for the currents that produce Bloch oscil-
lation with frequencies commensurate with the frequency
of the applied ac field ωB = 2πζI/2e = (m/n)ω. At these
currents the system becomes insulating with respect to
current increments, the largest such feature appears at
m = n = 1 that allows a direct measurement of the
Josephson element periodicity.
For smaller EJ/EC ∼ 1 the quasiclassical formulas
for the transition amplitudes do not work and one has
to perform the numerical diagonalization of the quan-
tum system in order to find its actual spectrum. As
EJ/EC → 1 the higher energy band approaches the
low energy band and the dispersion of the latter de-
viates from the simple cosine form shown in Figure 3.
These deviations, lead to higher harmonics in the dis-
persion: ǫ(q) = 2w cos 2πζq + 2w′ cos 4πζq and change
the equations (34,35). Our numerical diagonalization of
a single rhombus shows, however, that even at relatively
small EJ/EC ∼ 1 the second harmonics w′ does not ex-
ceed 0.15w, so its additional contribution to the voltage
current characteristic (∝ w′2) can always be neglected.
Thus, in the whole range of EJ/EC > 1 the voltage cur-
rent characteristic is given by Eqs. (34,35) where the ef-
fective value of transition amplitude t can be found from
the band width W = E1 − E0 = 4w plotted in Fig. 3.
For comparison we show the variation of the lower band
width for a single junction in Fig. 4
0 2.5 5 7.5 10 12.5 15
EJ/EC
(E1-E0)/EC
(E2-E0)/EC
0 0.1 0.2 0.3 0.4
E2/EC
E1/EC
E0/EC
EJ/EC=4
FIG. 3: Spectrum of a single rhombus biased by magnetic flux
Φ = Φ0/2. The upper pane shows the bands of the rhombus
characterized by Josephson eneergy EJ/EC = 4 as a function
of bias charge, q. The two lower levels are fitted by the first
two harmonics (dashed line), the coefficient w′ of the second
harmonics is w′ = 0.1w. One observes period doubling of the
first two states that reflects the symmetries of the rhombus
frustrated by a half flux quantum. The second excited level is
doubly degenerate that makes its period doubling difficult to
observe. Physically, these states correspond to an excitation
localized on the upper or lower arms of the rhombus. The
lower pane shows the dependence of the gaps for q = 0 as a
function of EJ/EC . Because higher order harmonics are very
small for all EJ/EC > 1, the gap E1 − E0 coincides with 4w
where w is the tunneling amplitude between the two classical
ground states.
V. PHYSICAL IMPLEMENTATIONS
Generally, the effects described in the previous sections
can be observed if the environment does not affect much
the quantum fluctuations of individual elements and the
resulting quasiclassical equations of motion. These physi-
cal requirements translate into different conditions on the
impedance of the environment at different frequencies.
We begin with the quantum dynamics of the individual
elements. The effect of the leads impedance on it can
be taken into account by adding the appropriate current
term to the phase equation of motion before projecting
on a low energy band and requiring that their effect on
the phase dynamics is small at the relevant frequencies.
For instance, for a single junction
= E′J (φ) +
0 1 2 3 4
EJ/EC
FIG. 4: Band width W = 4w of a single Josephson junction
The characteristic frequency of the quantum fluctuations
responsibe for the tunneling of a single element is Joseph-
son plasma frequency, ωJ =
8EJEc, so the first condi-
tion implies that
|Z(ωJ)| ≫
Ec/EJZQ (37)
For a typical ωJ/2π ∼ 10GHz, the impedance of a sim-
ple superconducting lead of the length ∼ 1cm is smaller
than ZQ and the condition (37) is not satisfied. The
situation is changed if the Josephson elements are decou-
pled from the leads by a large resistance or by a chain
of M ≫ 1 large junctions with
ẼJ/Ẽc ≫ 1 that has
no quantum tunneling transitions of their own (the am-
plitude of such transitions is ∝ exp(−
8ẼJ/Ẽc ). As-
suming that elements of this chain have no direct ca-
pacitive coupling to the ground (M2C0 ≪ C), the chain
has an impedance Z =
8Ẽc/ẼJMZQ at the relevant
frequencies, so a realistic chain with M ∼ 50 junctions
8ẼJ/Ẽc ∼ 10 provides the contribution to the
impedance needed to satisfy (37).
Similar decoupling from the leads of the individual el-
ements can be achieved by a sufficiently long chain of
similar Josephson elements, e.g. rhombi. Consider a
long (N ≫ 1) chain of similar elements connected to
the leads characterized by a large but finite capacitance
Cg ≫ C. For a short chain the tunneling of a single
element changes the phase on the leads resulting in a
huge action of the tunneling process. However, in a long
chain of junctions, a tunneling of individual rhombus may
be compensated by a simultaneous change of the phases
δφ/N of the remaining rhombi, and subsequent relax-
ation of δφ from its initial value π towards the equilib-
rium value which is zero. For N ≫ 1, this later process
can be treated within the Gaussian approximation, with
the Lagrangian (in imaginary time):
where Eg = e
2/(2Cg). So the total action involved in the
relaxation is: S = π
If this action S is less
than unity this relaxation has strictly no effect on the
tunneling amplitude of the individual rhombus.
We now turn to the constraints imposed by the qua-
siclassical equations of motion. The solution of these
equations shows oscillation at the Bloch frequency that
is ωB = 2πζI/(2e) for large currents and approaches
zero near the Ic. Thus, for a single Josephson ele-
ment the quasiclassical equations of motion are valid if
Re(RQ/Z(ωB)) ≪ 1 . A realistic energy band for a
Josephson element, W ∼ 0.3K and Z/ZQ ∼ 100 cor-
respond to Bloch frequency ωB/2π ∼ 0.1GHz . In this
frequency range a typical lead gives a capacitive contri-
bution to the dynamics. The condition that it does not
affect significantly the equations of motion implies that
the lead capacitance C . 10fF . As discussed in Section
(III) the individual elements in a short chain oscillate in
antiphase decreasing the effective coupling to the leads
by a factor
R/Z where R is the intrinsic resistance
of the contact. This factor can easily reach 102 at suf-
ficiently low temperatures making much less restrictive
the condition on the lead capacitance.
Large but finite impedance of the environment
Re(RQ/Z(ωB)) . 1 modifes the observed current-voltage
characteristics qualitatively, specially in the limit of very
small driving current. When I vanishes, and with infi-
nite external impedance, the wave function of the phase
variable is completely extended, with the form of a Bloch
state, and the pseudo-charge q is a good quantum num-
ber. As discussed at the end of Sec. II, the system be-
haves as a capacitor. But when the external impedance
is finite, charge fluctuations appear, which in the dual
description means that quantum phase fluctuations are
no longer unbounded. To be specific, consider a realistic
example of N rhombi chain (or two ordinary junctions)
attached to the leads with Z(ω) = Z0 in a broad but fi-
nite frequency interval ωmin < ω < ωmax and decreases as
Z(ω) = Z0(ωmax/ω) for ω > ωmax, Z(ω) = Z0(ω/ωmin)
for ω < ωmin. Such Z(ω) is realized in a long chain
of M Josephson junctions between islands with a finite
capactive coupling to the ground C0: ωmax = ωJ and
ωmin = (
C/C0/M)ωJ . The effective action describ-
ing the phase dynamics across the chain has contribu-
tions from the tunneling of individual rhombi and from
impedance of the chain
Ltot =
8π2ζ2Nw
Here the first term describes the effect of the tunneling of
the Josephson element between its quasiclassical minima
which we approximate by a single tunneling amplitude
w resulting in a spectrum ǫ(q) = −2w cos 2πζq that in a
Gaussian approximation becomes ǫ(q) = 4π2ζ2wq2. This
approximation is justified by the fact that, as we show
below, the main effect of the phase fluctuations comes
from the broad frequency range where the action is dom-
inated by the second term while the first serves only as
a cutoff of the logarithmical divergence. Its precise form
is therefore largely irrelevant.
This action leads to a large but finite phase fluctuations
8π2ζ2Nw
min(ωmax, ω
where ω′max = 8π
2ζ2Nw(ZQ/Z0). These fluctuations
are only logarithmically large, so they result in a finite
renormalization of the Josephson energy of the rhombi
chain and the corresponding critical current. In the ab-
sence of such renormalization the Josephson energy of a
finite chain of elements can be approximated by the lead-
ing harmonics E(φ) = −E0 cos(φ/ζ) with E0 ∼ EJ for
N ∼ 1 and EJ & Ec. Renormalization by fluctuations
replaces E0 by
ER = exp(−
)E0 =
min(ωmax, ω
]− Z0
In the limit of ωmin → 0 or Z0 → ∞ the phase fluc-
tuations renormalize Josephson energy to zero. But for
realistic parameters this suppression of Josephson energy
is finite which thus results in a small but non-zero value of
the critical current. In this situation the current-voltage
characterictics sketched in Fig. 1 is modified for very
small values of currents and voltages: instead of insulat-
ing regime at very low currents and voltages one should
observe a very small supercurrent (ER/2e) followed by a
small voltage step as shown in Fig. 2 by a dashed line.
As is clear from the above discussion the value of the
resulting critical current is controlled by the phase fluc-
tuations at low ω ≪ ωmax; these frequencies are much
smaller than the typical internal frequencies of a chain of
Josephson elements which can be thus lumped together
into an effective object characterized by the bare Joseph-
son energy E(φ) and transition amplitude between its
minima w. We thus expect the same qualitative behav-
ior for a small chain of Josephson elements as for a single
element at low currents.
VI. CONCLUSION
The main results of the present work are the expres-
sions (34), (35) for the I-V curves of a chain of N identi-
cal basic Josephson circuits. They are derived within the
assumption that the Josephson coupling is much larger
than the charging energy, but in fact, the numerical cal-
culations show that they remain very accurate even if
EJ ≈ EC . These equations predict a maximum dc volt-
age when I = Ic and V (I) ∝ 1/I for I ≫ Ic. The anoma-
lous V versus I dependence exhibited by these equations
is a signature of underdamped quantum phase dynamics.
It occurs only if the impedance of the external circuitry
is sufficiently large both at the frequency of Bloch oscil-
lations and at the Josephson frequency of the individual
elements. The precise conditions are given in Section V.
Observation of this dependence and the measurement of
the maximal voltage would provide the proof of the quan-
tum dynamics and the measurements of the tunneling
amplitude which is the most important characteristics of
these systems. It would also provide a crucial test of the
quality of decoupling to the environment.
As a deeply quantum mechanical system, the chain
of Josephson devices is very sensitive to an additional
ac driving. It exhibits resonances when the driving
frequency is commensurate with the frequency ωB =
2πζI/2e of the Bloch oscillations. This would provide
additional ways to characterize the quantum dynamics
of these circuits and confirm the period doubling of the
rhombi frustrated by exactly half flux quantum.
Acknowledgments
LI is thankful to LPTMS Orsay, and LPTHE Jussieu
for their hospitality through a financial support from
CNRS while BD has enjoyed the hospitality of the
Physics Department at Rutgers University. This work
was made possible by support from NSF DMR-0210575,
ECS-0608842 and ARO W911NF-06-1-0208.
APPENDIX A: QUANTUM-MECHANICAL
CALCULATION OF THE DC VOLTAGE
In the large current regime where I ≥ Imax, the energy
drop ∆B = hI/2e induced by the driving current when
φ increases by 2π becomes comparable to or larger than
the bandwidth W . In this regime, the semi-classical ap-
proach is no longer reliable. But as long as ∆B remains
small compared to the typical gap ∆ between nearby
bands, we may still construct the system wave functions
in the presence of the driving field from Wannier orbitals
belonging to a single band. In such quantum-mechanical
approach, dissipation is described as the result of cou-
pling the single degree of freedom (φ, q) to a continuum
of oscillator modes (qα, pα). The corresponding Hamil-
tonian has the form:
Hn = ǫn
(q2α+p
α) (A1)
where we have chosen the following commutation rela-
tions:
[φ, q] = i, [qα, pβ ] = iδαβ (A2)
and all other commutators between these operators van-
ish. The form of (A1) is plausible on the physical ground
because when the superconducting islands are coupled
to macroscopic leads, the charge q undergoes quantum
fluctuations, so that it has to be replaced by a “dressed
charge” q −
α gαqα in the effective Hamiltonian. A
more explicit justification is that the corresponding semi-
classical equations of motion have the same form as
Eqs. (4), (5) which simply mean that the effective cur-
rent going through the superconducting circuit is the bias
current minus the current going through the external im-
pedence. The semi-classical equations deduced from (A1)
read:
gαqα)
= ωαpα
= −ωαqα +
gαqα)
It is then natural to introduce q′ = q −
α gαqα, so that:
(q′) (A3)
To show that (A4) has the same form as (5), we notice
that the driving term for the bath oscillators is directly
proportional to dφ/dt. Specifically:
+ ω2αqα = ωαgα
Going to Fourier space, we see that after averaging over
initial conditions for the bath oscillators, Eq. (A4) takes
the form:
− iωq̃′(ω) = I
2πδ(ω)− i
ω2 − ω2α
−iωφ̃(ω)
This is exactly the frequency space version of Eq. (5),
where as usual, the dissipation is related to the spectral
density of the bath by:
ω2 − ω2α
Here we emphasize that Z is typically frequency depen-
dent, in which case the term (ZQ/Z)(dφ/dt) in Eq. (5)
becomes a convolution with ZQ/Z replaced by a non-local
kernel in time.
Now we turn to the solution of the quantum prob-
lem (A1) in the large driving current regime. Let us first
consider the Josephson array without dissipation. It is
straightforward to express its eigenstates in the q repre-
sentation41 because the Schrödinger equation reads then:
Eψ(q) = ǫn(q)ψ(q)− i
(q) (A8)
so that:
ψ(q) = ψ(0) exp
−i 2e
(ǫn(q
′)− E)dq′
The energy spectrum is determined via the boundary
condition ψ(q + 1) = ψ(q) so that:
∫ 1/2
′)dq′ +∆WSν, ν integer (A10)
This yields a Wannier-Stark ladder of spacially localized
states, with a constant level spacing equal to ∆B. Note
that increasing ν by one unit multiplies the wave-function
ψ(q) by exp(i2πq). In the phase representation, this is
equivalent to a translation by −2π. Of course, in the
absence of dissipation, these levels have infinite life-time,
and therefore, no dc voltage is generated.
Let us now consider the limit of a weak coupling to
the dissipative bath. This means that the decay rate Γ
of the Wannier-Stark levels is much smaller than the level
spacing ∆B. Assuming that transitions take place mostly
between two adjacent levels, we get an average voltage:
〈V 〉 = ~
〉 = hΓ
(A11)
The rate Γ is estimated via Fermi’s golden rule which we
prefer to use in the correlation function formulation :
Γν→ν′ =
|ν′〉|2C̃AA((ν − ν′)ωB) (A12)
where ωB = ∆B/~ = 2πI/(2e). In this expression,
C̃AA is the Fourier transform of the correlation function,
CAA(t − t′) = 〈A(t)A(t′)〉 of the Heisenberg operators
A(t) =
α gαqα(t), taken in the equilibrium state of the
dissipative bath.
We evaluate now the matrix element of the velocity
operator dǫn/dq between Wannier-Stark states:
〈ν|dǫn
|ν′〉 =
∫ 1/2
(q) exp(i2π(ν′ − ν)q) dq (A13)
As we have seen, in most physically interesting situations,
we can approximate the periodic function ǫ(q) by a single
harmonic 2w cos(2πq). In this case:
|ν′〉 = 2πwδ|ν′−ν|,1 (A14)
In the zero temperature limit, the bath correlation func-
tion is:
C̃AA(ω) =
πg2αδ(ω−ωα) =
θ(ω) (A15)
where θ(ω) is the Heaviside step function. Putting all
these elements together gives:
〈V (I ≫ Ic)〉 =
(2πw)2
2e2ZωI
(A16)
where Zω denotes the external impedance taken for
ω = ωB, and this result is in perfect agreement with the
large current limit of the semi-classical treatment, shown
as Eq. (35).
When the band structure is replaced by 2w cos(2πζq)
as in the case of a rhombus (for which ζ = 1/2), we have
〈V 〉 = ζ hΓ
(A17)
The frequency of Bloch oscillations becomes ωB(ζ) =
ζωB(ζ = 1), and the matrix element is multiplied by
ζ, so that the voltage is multiplied by ζ2. Again, this is
compatible with the semi-classical result (35).
1 For a review on superconducting qubits, see for in-
stance: M. H. Devoret, A. Wallraff, and J. M. Martinis,
arXiv:cond-mat/0411174
2 E. Knill, Nature, 434, 39, (2005)
3 A. Y. Kitaev, Ann. Phys. 303, 2, (2003)
4 L. B. Ioffe, andM. V. Feigel’man, Phys. Rev.B 66, 224503,
(2002)
5 B. Douçot, M. V. Feigel’man, and L. B. Ioffe, Phys. Rev.
Lett. 90, 107003, (2003)
6 B. Douçot, L. B. Ioffe and J. Vidal, Phys. Rev. B 69,
214501, (2003)
7 M. V. Feigel’man, L. B. Ioffe, V. B. Geshkenbein, P. Dayal,
and G. Blatter Phys. Rev. B 70, 224524 (2004)
8 B. Douçot, M. V. Feigel’man, L. B. Ioffe, and A. S. Iosele-
vich Phys. Rev. B 71, 024505, (2005)
9 I. V. Protopopov, and M. V. Feigel’man, Phys. Rev. B 70,
184519 (2004)
10 B. Pannetier, private communication.
11 B. Douçot and J. Vidal, Phys. Rev. Lett. 88, 227005,
(2002)
12 M. Rizzi, V. Cataudella, and R. Fazio, Phys. Rev. B 73,
100502(R), (2006)
13 Yu. M. Ivanchenko and L. A. Zil’berman, Zh. Eksp. Teor.
Fiz. 55, 2395 (1968); [Sov. Phys. JETP 28, 1272 (1969)].
14 G.-L. Ingold, and H. Grabert, Phys. Rev. Lett.83, 3721,
(1999)
15 A. Schmid, Phys. Rev. Lett. 51, 1506 (1983)
16 K. K. Likharev and A. B. Zorin, J. Low Temp. Phys. 59,
347, (1985)
17 G. H. Wannier, Rev. Mod. Phys. 34, 645, (1962)
18 E. E. Mendez, F. Agullo-Rueda, and J. M. Hong, Phys.
Rev. Lett. 60, 2426, (1988)
19 P. Voisin, J. Bleuse, C. Bouche, S. Gaillard, C. Alibert,
and A. Regreny, Phys. Rev. Lett. 61, 1639, (1988)
20 L. S. Kuzmin and D. B. Haviland, Phys. Rev. Lett. 67,
2890 (1991); Physica Scripta T42, 171 (1992).
21 L. S. Kuzmin, Yu. A. Pashkin and T. Claeson, Supercond.
Science and Technology, 7, 324 (1994).
22 L. S. Kuzmin, Yu. A. Pashkin, A. Zorin and T. Claeson,
Physica B 203, 376, (1994)
23 L. S. Kuzmin, Yu. A. Pashkin, D. S. Golubev, and A. D.
Zaikin, Phys. Rev. B 54, 10074, (1996)
24 M. Watanabe, and D. B. Haviland, Phys. Rev. B 67,
094505 (2003)
25 S. Corlevi, W. Guichard, F. W. J. Hekking, and D. B.
Haviland, Phys. Rev. Lett. 97 096802, (2006)
26 N. Boulant, G. Ithier, F. Nguyen, P. Bertet, H. Pothier, D.
Vion, C. Urbina, and D. Esteve, arXiv:cond-mat/0605061
27 A. Steinbach, P. Joyez, A. Cottet, D. Esteve, M. H. De-
voret, M. E. Huber, and John M. Martinis, Phys. Rev. Lett
87, 137003 (2001).
28 L. Esaki and R. Tsu, IBM J. Res. Develop. 14, 61 (1970)
29 A. Sibille, J. F. Palmier, H. Wang, and F. Mollot, Phys.
Rev. Lett. 64, 52 (1990)
30 U. Geigenmüller and G. Schön, Physica B 152, 186 (1988)
31 I. S. Beloborodov, F. W. J. Hekking, and F. Pis-
tolesi, in New Directions in Mesoscopic Physics (Towards
Nanoscience), edited by R. Fazio, V. F. Gantmakher, and
Y. Imry (Kluwer Academic Publisher, Dordrecht, 2002),
p. 339.
32 K. Wiesenfeld, and P. Hadley, Phys. Rev. Lett. 62, 1335,
(1989)
33 K. Y. Tsang, S. H. Strogatz, and K. Wiesenfeld, Phys.
Rev. Lett. 66, 1094, (1991)
34 S. Nichols, and K. Wiesenfeld, Phys. Rev. A 45, 8430,
(1992)
35 S. H. Strogatz, and R. E. Mirollo, Phys. Rev. E 47, 220,
(1993)
36 L. Faoro, and L. B. Ioffe, Phys. Rev. Lett. 96, 047001,
(2006)
37 L. B. Ioffe, V. B. Geshkenbein, Ch. Helm, and G. Blatter,
Phys. Rev. Lett. 93, 057001, (2004)
38 Y. Kuramoto, Progr. Theoret. Phys. Suppl. 79, 223, (1984)
39 H. Sakaguchi and Y. Kuramoto, Progr. Theoret. Phys. 76,
576, (1986)
40 H. Daido, Progr. Theoret. Phys. 88, 1213, (1992)
41 See for instance P. W. Anderson, Concepts in Solids, W. A.
Benjamin (1963), reedited by Addison-Wesley Publishing
Co. (1992), chapter 2, section C.
http://arxiv.org/abs/cond-mat/0411174
http://arxiv.org/abs/cond-mat/0605061
ABSTRACT
  We compute the current voltage characteristic of a chain of identical
Josephson circuits characterized by a large ratio of Josephson to charging
energy that are envisioned as the implementation of topologically protected
qubits. We show that in the limit of small coupling to the environment it
exhibits a non-monotonous behavior with a maximum voltage followed by a
parametrically large region where $V\propto 1/I$. We argue that its
experimental measurement provides a direct probe of the amplitude of the
quantum transitions in constituting Josephson circuits and thus allows their
full characterization.

<|endoftext|><|startoftext|>
Introduction
	Theory for the infinite strip
	Simulations for the infinite strip
	Percolation on a square and semi-infinite strip
	Further comments and conclusions
	Acknowledgments
	References
ABSTRACT
  We consider the density of two-dimensional critical percolation clusters,
constrained to touch one or both boundaries, in infinite strips, half-infinite
strips, and squares, as well as several related quantities for the infinite
strip. Our theoretical results follow from conformal field theory, and are
compared with high-precision numerical simulation. For example, we show that
the density of clusters touching both boundaries of an infinite strip of unit
width (i.e. crossing clusters) is proportional to $(\sin \pi
y)^{-5/48}\{[\cos(\pi y/2)]^{1/3} +[\sin (\pi y/2)]^{1/3}-1\}$.
  We also determine numerically contours for the density of clusters crossing
squares and long rectangles with open boundaries on the sides, and compare with
theory for the density along an edge.

<|endoftext|><|startoftext|>
s_Im_U3.eps
Effective band-structure in the insulating phase versus
strong dynamical correlations in metallic VO2
Jan M. Tomczak,1 Ferdi Aryasetiawan,2 and Silke Biermann1
Centre de Physique Théorique, Ecole Polytechnique, CNRS, 91128 Palaiseau Cedex, France
Research Institute for Computational Sciences, AIST,
Umezono 1-1-1, Tsukuba Central 2, Tsukuba Ibaraki 305-8568, Japan
Using a general analytical continuation scheme for cluster dynamical mean field calculations,
we analyze real-frequency self-energies, momentum-resolved spectral functions, and one-particle
excitations of the metallic and insulating phases of VO2. While for the former dynamical correlations
and lifetime effects prevent a description in terms of quasi-particles, the excitations of the latter
allow for an effective band-structure. We construct an orbital-dependent, but static one-particle
potential that reproduces the full many-body spectrum. Yet, the ground state is well beyond a
static one-particle description. The emerging picture gives a non-trivial answer to the decade-old
question of the nature of the insulator, which we characterize as a “many-body Peierls” state.
PACS numbers: 71.27.+a, 71.30.+h, 71.15.Ap
Describing electronic correlations is a challenge for
modern condensed matter physics. While weak corre-
lations slightly modify quasi-particle states, by broad-
ening them with lifetime effects and shifting their ener-
gies, strong enough correlations can entirely invalidate
the band picture by inducing a Mott insulating state.
In a half-filled one-band model, an insulator is re-
alized above a critical ratio of interaction to band-
width. Though more complex scenarios exist in realis-
tic multi-band cases, a common feature of compounds
that undergo a metal-insulator transition (MIT) upon
the change of an external parameter, such as temper-
ature or pressure, is that the respective insulator feels
stronger correlations than the metal, since it is precisely
their enhancement that drives the system insulating.
In this paper we discuss a material where this rule of
thumb is inverted : We argue that in VO2 it is the insu-
lator that is less correlated, in the sense that band-like
excitations are better defined and have longer lifetimes
than in the metal. Albeit, neither phase is well described
by standard band-structure techniques. Using an an-
alytical continuation scheme for quantum Monte Carlo
solutions to Dynamical Mean Field Theory (DMFT) [1],
we discuss quasi-particle lifetimes, k-resolved spectra (for
comparison with future angle resolved photoemission ex-
periments) and effective band-structures. While dynam-
ical effects are crucial in the metal, the excitations of
the insulator are well described within a static picture :
For the insulator we devise an effective one-particle po-
tential that captures the interacting excitation spectrum.
Still, the corresponding ground state is far from a Slater
determinant, leading us to introduce the concept of a
“many-body Peierls” insulator.
The MIT of VO2 has intrigued solid state physicists
for decades [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]. A
high temperature metallic rutile (R) phase transforms at
Tc=340 K into an insulating monoclinic structure (M1),
in which vanadium atoms pair up to form tilted dimers
along the c-axis. The resistivity jumps up by two orders
of magnitude, yet no local moments form. Despite exten-
sive efforts, the mechanism of the transition is still under
debate [6, 7, 8, 9, 10, 11, 12]. Two scenarios compete : In
the Peierls picture the structural aspect (unit-cell dou-
bling) causes the MIT, while in the Mott picture local
correlations predominate.
VO2 has a d
1 configuration and the crystal field splits
the 3d-manifold into ⁀2g and empty eσg components. The
former further split into eπg and a1g orbitals, which
overlap in R-VO2, accounting for the metallic charac-
ter. Still, the quasi-particle peak seen in photoemission
(PES) [9, 10, 11] is much narrower than the Kohn-Sham
spectrum of density functional theory (DFT) in the local
density approximation (LDA) [7], and eminent satellite
features evidenced in PES are absent. In M1-VO2, the
a1g form bonding/antibonding orbitals, due to the dimer-
ization. As discussed by Goodenough [3], this also pushes
up the eπg relative to the a1g. Yet, the LDA [7] yields a
metal. Non-local correlations beyond LDA were shown
to be essential [15, 16, 17]. Indeed, recent Cluster DMFT
(CDMFT) calculations [15], in which a two-site vanadium
dimer constituted the DMFT impurity, opened a gap,
agreeing well with PES and x-ray experiments [11, 12].
Starting from these LDA + CDMFT results [15] for
the Matsubara ⁀2g Green’s function G(ıωn) we deduce the
real frequency Green’s function G(ω) by the maximum
entropy method [18] and a Kramers-Kronig transform.
The self-energy matrix Σ(ω) we obtain by numerical in-
version of G(ω) =
[ω + µ−Hk − Σ(ω)]−1 [1], with
the LDA Hamiltonian H , and the chemical potential µ.
Fig. 1 shows (a) the diagonal elements of the R-
VO2 self-energy, and (b) the resulting k-resolved spec-
trum. Notwithstanding minor details, the a1g and e
self-energies exhibit a similar dynamical behavior. The
real-parts at zero energy, ℜΣ(0), entailing relative shifts
of quasi-particle bands, are almost equal, congruent with
the low changes in their occupations vis-à-vis LDA [15],
http://arxiv.org/abs/0704.0902v2
-2 -1  0  1  2  3
ω [eV]
  a1g 
-2 -1  0  1  2  3
ω [eV]
  a1g 
-2 -1  0  1  2  3
ω [eV]
  a1g 
-2 -1  0  1  2  3
ω [eV]
  a1g 
-2 -1  0  1  2  3
ω [eV]
-2 -1  0  1  2  3
ω [eV]
M1 a1g
M1 a1g-a1g 
ΓZCYΓ
FIG. 1: (color online) Rutile VO2 : (a) self-energy (Σ − µ).
Real (imaginary) parts are solid (dashed). As comparison M1
ℑΣa1g , ℑΣa1g−a1g are shown. (b) spectral function A(k,ω)
and solutions of the QPE (blue). The LHB is the (yellow)
region at -1.7 eV, the broad UHB appears (yellow) at ∼2.5 eV.
and with the isotropy evidenced in experiment [19].
Neglecting lifetime effects (i.e. ℑΣ≈0), one-particle ex-
citations are given by the poles of G(ω) : det[ωk + µ −
Hk − ℜΣ(ωk)] = 0. We shall refer to this as the quasi-
particle equation (QPE) [23]. For static or absentℜΣ this
reduces to a simple eigenvalue problem. In regions of low
ℑΣ, the QPE solutions will give an accurate description
of the position of spectral weight and constitute an effec-
tive band-structure of the interacting system. Yet, due
to the frequency dependence, the number of solutions is
no longer bounded to the number of orbitals.
Below (above) -0.5 (0.2) eV, the imaginary parts of
the self-energy – the inverse lifetime – of R-VO2 is con-
siderable. Due to our limited precision for ℑΣ(0), we
have not attempted a temperature dependent study to
assess the experimental bad metal behavior, but the re-
sistivity exceeding the Ioffe-Regel-Mott limit [20] indi-
cates that even close to the Fermi level, coherence is not
fully reached. At low energy, the QPE solutions (dots
in Fig. 1b) closely follow the spectral weight. Above
0.2 eV, regions of high intensity appear, howbeit, the
larger ℑΣ broadens the excitations, and no coherent fea-
tures emerge, though the positions of some eπg derived
excitations are discernible. At high energies, positive and
negative, distinctive features appear in ℑΣ(ω) that are
responsible for lower (upper) Hubbard bands (L/UHB),
seen in the spectrum at around -1.7 (2.5) eV. The UHB
exhibits a pole-structure that reminds of the low-energy
quasi-particle band-structure. Hence, an effective band
picture is limited to the close vicinity of the Fermi level,
and R-VO2 has to be considered as a strongly correlated
metal (the weight of the quasi-particle peak is of the or-
der of 0.6). This is experimentally corroborated by the
fact that an increase in the lattice spacing by Nb-doping
results in a Mott insulator of rutile structure [4].
The imaginary parts of the M1 a1g on-site, and a1g–
a1g intra-dimer self-energies, Fig. 1a, are larger than
in R-VO2, usually a hallmark of increased correla-
tions. However, we shall argue that correlations are in
fact weaker than in the metal. Indeed, the dimeriza-
tion in M1 leads to strong inter-site fluctuations, evi-
denced by the significant intra-dimer a1g–a1g self-energy.
Fig. 2 displays the M1-VO2 self-energy in the a1g bond-
ing/antibonding (bab) basis, Σb/ab = Σa1g ± Σa1g−a1g .
The a1g (anti)bonding imaginary part is low and varies
-3 -2 -1  0  1  2  3  4
ω [eV]
a1g b
a1g ab
-3 -2 -1  0  1  2  3  4
ω [eV]
FIG. 2: (color online) Self-energy (Σ−µ) of M1-VO2 in the a1g
bab–basis : (a) real parts. The black stripes delimit the a1g
LDA bandwidths, dashed horizontal lines indicate the values
of the static potential ∆. (b) imaginary parts. Self-energy
elements are dotted in regions irrelevant for the spectrum.
little with frequency in the (un)occupied part of the
spectrum, thus allowing for coherent weight. In the
opposite regions, the imaginary parts reach huge val-
ues. The eπg elements are flat, and their imaginary parts
tiny. This is a direct consequence of the drastically re-
duced eπg occupancy which drops to merely 0.14. These
almost empty orbitals feel only weak correlations, and
sharp bands are expected at all energies. A first idea
for the a1g excitations is obtained from the intersections
ω+µ−ǫb/ab(k)=ℜΣb/ab(ω) as depicted in Fig. 2a, where
the black stripes delimit the LDA a1g bandwidths. The
(anti)bonding band appears as the crossing of the (blue)
red solid line with the stripe at (positive) negative en-
ergy. Hence, the (anti)bonding band emerges at (2.5)
−0.75 eV. Still, the antibonding band is much broadened
since ℑΣab reaches -1 eV. To confirm this, we solved the
QPE and calculated the k-resolved spectrum (Fig. 3a).
As expected, reasonably coherent weight appears over
nearly the entire spectrum from -1 to +2 eV, whose po-
sition coincides with the QPE poles : The filled bands
correspond to the a1g bonding orbitals, while above the
gap, the eπg bands give rise to sharp features. The anti-
bonding a1g is not clearly distinguished since e
g weight
prevails in this range. The L/UHB have faded : a mere
shoulder at -1.5 eV reminds of the LHB. Finally, con-
trary to R-VO2, the number of poles equals the orbital
dimension. Since, moreover, the real-parts of the M1-
VO2 self-energy are almost constant for relevant ener-
gies [24], we construct a static potential, ∆, by evaluat-
ing the dynamical self-energy at the LDA band centers
(pole energies) for the eπg (a1g), see Fig. 2a [25]. Fig. 3b
shows the band-structure ofHk+∆ : The agreement with
the DMFT poles is excellent. Our one-particle potential,
albeit static, depends on the orbital, and is thus non-
local. We emphasize the conceptual difference to the
Kohn-Sham (KS) potential of DFT : The latter gener-
ates an effective one-particle problem with the ground
state density of the true system. The KS energies and
states are auxiliary quantities. Our one-particle poten-
tial, ∆, on the contrary, was designed to reproduce the
interacting excitations. The eigenvalues of Hk+∆ are
thus not artificial. Still, like in DFT, the eigenstates are
SDs by construction, although the true states are not (see
below). The crucial point for M1-VO2 is that spectral
properties are capturable with this effective one-particle
description. It is in this sense that M1-VO2 exhibits only
weak correlation effects. The weight of the bonding ex-
citation is Z=(1 − ∂ωℜΣb(ω))−1ω=−0.7eV≈0.75, and thus
larger than the rutile quasi-particle weight (see above).
What is at the origin of this overall surprising coher-
ence? For the eπg orbitals, this simply owes to their deple-
tion. For the nearly half-filled a1g orbitals the situation is
more intricate. It is a joint effect of charge transfer into
the a1g bands, and the bonding/antibonding–splitting.
Indeed, the filled bonding band experiences only weak
fluctuations, due to its separation of several eV from the
antibonding one. To substantiate these qualitative argu-
ments, we resort to the following model, which treats the
solid as a collection of Hubbard dimers :
H = −t
l1σcl2σ+h.c.
i=1,2
〈l,l′〉
liσcl′iσ+U
nli↑nli↓
Here, c
liσ (cliσ) creates (destroys) an electron with
spin σ on site i of the lth dimer. t is the intra-dimer,
t⊥ the inter-dimer hopping, U the on-site Coulomb
repulsion, and we assume half-filling. First, we discuss
Γ Y C Z Γ
Γ Y C Z Γ
scissors
FIG. 3: (color online) M1-VO2 : (a) spectral function A(k,ω).
(blue) dots ((a) & (b)) are solutions of the QPE. (b) The (red)
dots are the eigenvalues of Hk+∆. See text for discussion.
the t⊥→ 0 limit, which is an isolated dimer : the
Hubbard molecule. We choose t=0.7 eV, the LDA
intra-dimer a1g–a1g hopping, and U=4.0 eV [15] for
all evaluations. The bonding/antibonding–splitting,
∆bab=−2t +
16t2 + U2=3.48 eV, gets enhanced with
respect to the U=0 case. In M1-VO2, the embedding
into the solid, and the hybridization with the eπg reduce
the splitting to ∼3 eV, as can be inferred from the one-
particle poles (Fig. 3), consistent with experiment [11].
The ground state of the dimer is given by |ψ0〉 =
{4t/ (c− U) (| ↓ ↑〉 − | ↑ ↓〉) + (| ↑↓ 0〉+ |0 ↑↓〉)} /a [26]
which is intermediate to the Slater determinant (SD)
(the four states having equal weight), and the Heitler-
London (HL) limit (double occupancies projected out).
With the VO2 parameters, the model dimer is close
to the HL limit [5]. The inset of Fig. 4b shows the
projections of the ground state onto the SD and the
HL state. The former, |〈SD|ψ0〉|2, equals the weight
of the band-derived features in the spectrum (for U>0
satellites appear), while the other measures the double
occupancy
i〈ni↑ni↓〉 = 1 − |〈HL|ψ0〉|
. For U=4.0 eV
the latter is largely suppressed, as a consequence of the
interaction : The N-particle state is clearly not a SD.
Still, the overlap with the SD, and thus the coherent
weight, remains significant, i.e. one-particle excitations
survive and lifetimes are large. To do justice to the
seemingly opposing tendencies of correlation driven
non-SD-behavior, coexisting with a band-like spectrum,
we introduce the notion of a “many-body Peierls” state.
The charge transfer from the eπg into the then almost
half-filled a1g orbitals, finds its origin in the effective re-
duction of the local interaction in the bab–configuration :
While for U=4 eV, 〈SD|H |SD〉 = 2.0 eV in the SD limit,
it reduces to merely 〈ψ0|H |ψ0〉 = 0.91 eV in the ground
state. In fact, inter-site fluctuations are an efficient way
to avoid the on-site Coulomb repulsion. In M1-VO2, this
effect manifests itself in a close cancellation of the local
and inter-site self-energies in the (un-)occupied parts of
the spectrum for the (anti)bonding a1g orbitals.
The gap-opening in VO2 thus owes to two effects : The
self-energy enhancement of the a1g bab–splitting, and
a charge transfer from the eπg orbitals. The difference
in ℜΣ corresponds to this depopulation, seen in exper-
iments [19] and theoretical studies [8, 15], and leads to
the separation of the a1g and e
g at the Fermi level. The
local interactions thus amplify Goodenough’s scenario.
To show that the embedding of the dimer into the
solid does not qualitatively alter our picture of the M1
phase, we solve the model, Eq. (1), using CDMFT. This
moreover allows to study the essentials of the rutile to
M1 MIT by scanning through the degree of dimeriza-
tion t at constant interaction strength U and embedding,
or inter-dimer hopping, t⊥. For the latter we assume
a semi-circular density of states D⊥(ω) of bandwidth
W=4t⊥. In M1-VO2, the t⊥ for direct a1g-a1g hopping
is rather small, yet eπg -hybridizations lead to an effective
D⊥-bandwidth of about 1 eV. We choose U=4t⊥, and an
inverse temperature β=10/t⊥. Fig. 4a displays the or-
bital traced local spectral function A(ω)=Ab(ω)+Aab(ω)
(b,ab denoting again the bonding/antibonding combi-
nations) and the bonding self-energy Σb(ω) for differ-
ent intra-dimer hoppings t : In the absence of t, the
result equals by construction the single site DMFT so-
lution (Σb=Σab), which, for our parameters, is a corre-
lated metal, analog to R-VO2. The spectral weight at the
Fermi level is given by Ab/ab(0) = D⊥(±t − ℜΣb/ab(0)),
with ℜΣb/ab(0)=∓ℜΣab(0). Thus a MIT occurs at
t + ℜΣab(0) = 2t⊥, when all spectral weight has been
shifted out of the bandwidth : Above t/t⊥=0.5 we find a
many-body Peierls phase corresponding to M1-VO2. In
Fig. 4a we have indicated again the graphical QPE ap-
proach : The system evolves from three solutions per or-
bital (Kondo resonance, L/UHB) at t=0 to a single one at
t/t⊥=0.6. Hence the peaks in the insulator are not Hub-
bard satellites, but just shifted bands. The embedding,
t⊥, broadens the excitations and washes out the satellites
of the isolated dimer, like for M1-VO2. Still, as a function
of t, the coherence of the spectrum increases, since the
imaginary part of the (anti-)bonding self-energy subsides
at the renormalized (anti-)bonding excitation energies.
Our model thus captures the essence of the rutile to M1
transition, reproducing both, the dimerization induced
increase in coherence, and the shifting of excitations.
-4 -2  0  2  4
ω / t⊥
t / t⊥
-4 -2  0  2  4
ω / t⊥
t / t⊥
-4 -2  0  2  4
ω / t⊥
t / t⊥
-4 -2  0  2  4
ω / t⊥
t / t⊥
     
     
     
     
     
     
     
-4 -2  0  2  4
     
-4 -2  0  2  4
     
-4 -2  0  2  4
     
-4 -2  0  2  4
 0  1  2  3  4  5
iω / t⊥
 0  1  2  3  4  5
iω / t⊥
 0  1  2  3  4  5
iω / t⊥
 0  1  2  3  4  5
iω / t⊥
 0  1  2  3  4  5
iω / t⊥
 0  1  2  3  4  5
iω / t⊥
 0  1  2  3  4  5
iω / t⊥
t / t⊥=0.0
t / t⊥=0.2
t / t⊥=0.4
t / t⊥=0.6
t / t⊥=0.8
 0  2  4
U [eV]
t=0.7eV
 0  2  4
|<SD|Ψ0>|
|<HL|Ψ0>|
U [eV]
t=0.7eV
FIG. 4: (color online) (a) spectral function (top), real
(middle), imaginary (bottom) bonding self-energy Σb of the
CDMFT solution to Eq. (1) for U=4.0t⊥, β=10/t⊥, and
varying intra-dimer hopping t/t⊥. ℜΣb(ω)=−ℜΣab(−ω),
ℑΣb(ω)=ℑΣab(−ω) by symmetry. (b) Imaginary Matsuba-
ra self-energy, ℑΣb(ıω)=ℑΣab(ıω), for U=6t⊥, β=10/t⊥ and
varying t. Inset: Projection of the SD and HL limit on the
Hubbard molecule ground state (t=0.7 eV, t⊥=0) versus U.
Under uni-axial pressure or Cr-doping, VO2 develops
the insulating M2 phase [4] in which every second vana-
dium chain along the c-axis consists of untilted dimers,
whereas in the others only the tilting occurs. We may
now speculate that the dimerized pairs in M2 form a1g
Peierls singlets as in M1, while the tilted pairs are in a
Mott state. Hence, we interpret the seminal work of [4]
as the observation of a Mott to many-body Peierls tran-
sition taking place on the tilted chains when going from
M2 to M1. To illustrate this, we solve again Eq. (1) for
appropriate parameters. The tilted M2 chains are akin
to the rutile phase, yet with a reduced a1g bandwidth [7].
Thus we now choose U=6t⊥, β=10/t⊥, and vary t. All
solutions shown in Fig. 4b are insulating, however, the
diverging self-energy at vanishing intra-dimer coupling
(t=0, tilted “M2” chains) becomes regularized with the
bond enhancement (t>0, “M1”). The imaginary part of
the self-energy gets flatter and the system thus more co-
herent. The above is consistent with the finding of (S=0)
S=1/2 for the (dimerized) tilted pairs in M2-VO2 [4].
While our results do not exclude surprises in the direct
vicinity of Tc [22], the nature of insulating VO2 is shown
to be rather “band-like” in the above sense. Our analyti-
cal continuation scheme allowed us to explicitly calculate
this band-structure. The latter can also be derived from
a static one-particle potential. Yet, this does not im-
ply a one-particle picture for quantities other than the
spectrum. Above all, the ground state is not a Slater de-
terminant. Hence, we qualify M1-VO2 as a “many-body
Peierls” phase. We argue that the weakness of lifetime
effects results from strong inter-site fluctuations that cir-
cumvent local interactions in an otherwise strongly cor-
related solid. This is in striking contrast to the strong
dynamical correlations in the metal, which is dominated
by important lifetime effects and incoherent features.
We thank H. T. Kim, J. P. Pouget, M. M. Qazil-
bash, and A. Tanaka for valuable discussions and A. I.
Poteryaev, A. Georges and A. I. Lichtenstein for discus-
sions and the collaboration [15] that was our starting
point. We thank AIST, Tsukuba, for hospitality. JMT
was supported by a JSPS fellowship. Computer time was
provided by IDRIS, Orsay (project No. 071393).
[1] J. M. Tomczak and S. Biermann, J. Phys.: Condens.
Matter (2007), in press.
[2] A. Zylbersztejn, N. F. Mott, Phys. Rev. B 11, 4383
(1975).
[3] J. B. Goodenough, J. Solid State Chem. 3, 490 (1971).
[4] J. P. Pouget, H. Launois, J.Phys. France 37, C4 (1976).
[5] C. Sommers, S. Doniach, Solid State Commun. 28, 133
(1978).
[6] R. M. Wentzcovitch et al., Phys. Rev. Lett. 72, 3389
(1994).
[7] V. Eyert, Ann. Phys. (Leipzig) 11, 650 (2002).
[8] A. Tanaka, J. Phys. Soc. Jpn. 72, 2433 (2003).
[9] R. Eguchi et al., cond-mat/0607712 (2006).
[10] S. Shin et al., Phys. Rev. B 41, 4993 (1990).
[11] T. C. Koethe et al., Phys. Rev. Lett. 97, 116402 (2006).
[12] G. A. Sawatzky, D. Post, Phys. Rev. B 20, 1546 (1979).
[13] A. Continenza et al., Phys. Rev. B 60, 15699 (1999).
[14] M. A. Korotin et al., Phys. Met. Metallogr. 94, 17 (2002).
[15] S. Biermann et al., Phys. Rev. Lett. 94, 026404 (2005).
[16] A. Liebsch et al., Phys. Rev. B 71, 085109 (2005).
[17] M. S. Laad et al., Phys. Rev. B 73, 195120 (2006).
[18] M. Jarrell, J. E. Gubernatis, Phys. Rep. 269, 133 (1996).
[19] M. Haverkort et al., Phys. Rev. Lett. 95, 196404 (2005).
[20] M. M. Qazilbash et al., Phys. Rev. B 74, 205118 (2006).
[21] A. Georges et al., Rev. Mod. Phys. 68, 13 (1996).
[22] H.-T. Kim et al., Phys. Rev. Lett. 97, 266401 (2006).
[23] We solve the equation numerically by iterating until self-
consistency within an accuracy of 0.05 eV.
[24] Explaining why LDA+U opens a gap [14, 16], yet while
missing the correct bonding/antibonding splitting.
[25] ∆eπ
=0.48eV, ∆eπ
=0.54eV, ∆b=−0.32eV, ∆ab=1.2eV
[26] a =
2 (16t2/(c− U)2 + 1), c =
16t2 + U2
ABSTRACT
  Using a general analytical continuation scheme for cluster dynamical mean
field calculations, we analyze real-frequency self-energies, momentum-resolved
spectral functions, and one-particle excitations of the metallic and insulating
phases of VO2. While for the former dynamical correlations and lifetime effects
prevent a description in terms of quasi-particles, the excitations of the
latter allow for an effective band-structure. We construct an
orbital-dependent, but static one-particle potential that reproduces the full
many-body spectrum. Yet, the ground state is well beyond a static one-particle
description. The emerging picture gives a non-trivial answer to the decade-old
question of the nature of the insulator, which we characterize as a ``many-body
Peierls'' state.

<|endoftext|><|startoftext|>
Introduction 
     It is well known that the mass represents  one of the most basic properties of an atomic nucleus. 
It is also a complex and non trivial quantity whose basic properties still must be investigated deeply and 
properly understood.  
The celebrated Einstein’s mass law is  known 
m =                (1) 
On this basis some different contributions of energy  are stored inside a nucleus, and contribute to its 
mass.  
During nucleus formation in its ground state, a certain amount of energy B will be released in the process 
so that 
BcmMc
j −= ∑
22 .     (2) 
There are different sources of such energy B. It contributes  the strong attractive interaction of nucleons . 
However,  despite the immense amount of data about nuclear properties, the basic understanding of the 
nuclear strong interaction, as example, still lacks.  We have a basic model of meson exchange that of 
course works at a qualitative level but it does not provide a satisfactory approach to the description of 
such basic interaction. Still it contributes Coulomb repulsion between protons, and in addition we have 
also surface effects and still many other contributions that in a phenomenological picture are tentatively 
taken into account invoking some models as example  the liquid drop elaboration as von Weizsacker [1].  
 It is known some other  nuclear mass  models may be  considered and, despite the numerous parameters 
that are contained in these different models and the intrinsic conceptual differences adopted in their 
formulation, some common features arise from these calculations. All such models, [2], give similar 
results for the known masses, their calculations yield a typical accuracy that results about 4105 −×  for a 
medium-heavy nucleus having binding energy of the order of MeV1000 , but the predictions of such 
different mass models strongly give a net divergence when applied to unknown regions. 
One consequence of such two indications  seems rather evident. According to [2], there is the possibility 
that a basic underlying mechanism oversees the process of mass formation of atomic nuclei, and it is not 
presently incorporated and considered in the present nuclear models of the traditional nuclear physics. 
In fact, some astonishing results are not lacking  as far as this problem is concerned. 
Owing to the presence of Pauli’s exclusion principle, when nucleons are put together to form a bound 
state , there are not at rest and thus their kinetic energy also contributes to B  given in (2) and thus to the 
mass of the nucleus. Still according to [2], a part of this energy, that is to say, that one that varies 
smoothly with the number of the nucleons, is taken into account in the liquid drop model but the 
remaining part of this energy fluctuates with the number of nucleons. 
The proper nature of such fluctuations should be more investigated. 
P. Leboeuf [2] has extensively analyzed this problem and his conclusion is that  the motion of the 
nucleons inside the nucleus has  a  regular plus a  chaotic component. We will not enter into details here 
[2] but we only remember here that traditionally in nuclear physics dynamical effects in the structure of 
nuclei have been referred to as shell effects with the pioneer studies of A. Bohr and B.R. Mottelson [3]  
and V.M. Strutinsky [4]. The experience here derives from atomic physics where the symmetries of the 
Hamiltonian generates strong degeneracy of the electronic levels  and such degeneracy produce 
oscillations in the electronic binding energy. Shell effects  should be due to deviations of the single 
particle levels with respect to their average properties.  
According to the different approaches that have been introduced to reproduce the systematics of the 
observed nuclear masses that in part are inspired to liquid drop models or Thomas Fermi approximations, 
the total energy may be expressed as the sum of two contributions: 
)x,Z,N(Û)x,Z,N(U)x,Z,N(U +=          (3) 
with x  a parameter set defining the shape of the nucleus. U  is  describing the bulk or macroscopic 
properties of the nucleus and Û  instead describes shell effects. This term could be splitted in two 
components [2], the first representing the regular component and the second representing instead the 
chaotic contribution. The same thing we should have for the mass  
BcmMc
j −= ∑
with 
)x,E(B̂)x,E(B)x,E(B −=        (4) 
There is now another important  but independent contribution that deserves to be mentioned here. 
Rather recently V. Paar et al [5] introduced a power law for description of the line of stability of atomic 
nuclei, and in particular for the description of atomic weights. They compared the found power law with 
the semi-empirical formula of the liquid drop model, and showed that the power law corresponds to a 
reduction of neutron excess in superheavy nuclei with respect to the semi-empirical formula. Some fractal 
features of the nuclear valley of stability were analyzed and the value of fractal dimension was 
determined. 
It is well known that a power law may be often connected with an underlying fractal geometry of the 
considered system. If confirmed for atomic nuclei, according to [5], it could be proposed a new approach 
to the problem of stability of atomic nuclei. In this case the aim should be to identify the basic features in 
underlying dynamics giving rise to the structure of the atomic nuclei. Of course, it was pointed out the 
role of fractal geometry in quantum physics and quark dynamics [6] and in particular it was analyzed the 
self-similarity of paths of the Feynman path integral. 
Finally, M. Pitkanen repeatedly outlined that his  TGD model predicts  that universe is 4-D spin glass and 
this kind of fractal energy landscape might be present in some geometric degrees of freedom such as 
shape of nuclear  outer surface or, if nuclear string picture is accepted, in the folding dynamics of the 
nuclear string [7]. 
Still examining the problem under a different point of view, we must outline here the results that recently 
were obtained in [8]. 
These authors found non linear dynamical properties of giant monopole resonances in atomic nuclei 
investigated within the time-dependent relativistic mean field model. 
Finally, in ref [9], the statistics of the radioactive decay of heavy nuclei was the subject of experimental 
interest. It was considered that, owing to the intrinsic fluctuations of the decay rate, the counting statistics 
could depart from the simple Poissonian behaviour. Several experiments carried out with alfa and beta 
sources have found that the counting variance for long counting periods, is higher than the Poissonian 
value by more than one order of magnitude. This anomalous large variance has been taken as an 
experimental indication that the power spectrum of the decay rate fluctuations has a contribution that 
grows as the inverse of the frequency f  at low frequencies. 
In conclusion, also considered the problem from several and different view points, there are  some 
different arising evidences that it deserves to be analyzed by the methods of non linear dynamics in order 
to obtain some detailed result. 
This was precisely the aim of the present paper. We analyzed the Atomic Weights, )Z(Wa  and the Mass 
Number A(Z) as function of the atomic number Z for stable atomic nuclei and we applied to such data our 
non linear test methods, Fractal and Recurrence Quantification Analysis. The results are reported and 
discussed in detail in the following section. 
2.  Preparation of the Experimental Data. 
It is known that the trends of nuclear stability may be represented in a well known Z,N  chart of nuclides 
where each nucleus with Z protons and N  neutrons has mass Number NZA += . A line of stability may 
be realized by taking for each atomic number Z , the stable nucleus of the isotope having the largest 
relative abundance. 
The atomic weights of a naturally occurring element are  given by averaging the corresponding isotope 
weights, weighted so  to take into account the relative isotopic abundances. In this paper the data for 
stable nuclei with Z  values until 83=Z  were considered. The data were taken using the IUPAC 1997, 
standard atomic weights, at www.webelements.com. )Z(Wa    and  )Z(A   are given in Fig.1. 
Fig.1 
.             Atomic Weights: )Z(Wa       Mass Number )Z(A    
3.Tests by using Mutual Information. 
Autocorrelation Function. The autocorrelation )(τρ  is given by the correlation of a time series with itself 
using )t(x  and )t(x τ+  two time series in the correlation formula. For time series it measures as well 
correlated values of the given time series result under different delays. A  choice for the delay time to use  
when embedding time series data should be the first time the autocorrelation crosses zero. It represents 
the lowest value of delay for which the values are uncorrelated. The important thing here is that the 
autocorrelation function is a linear measure and thus it should not provide accurate results in all the 
situations in which important non linear contributions are expected. 
In the present  case we examine two series that are )Z(Wa  and )Z(A . Here we have not a time variable 
respect to which  the delay must be characterized  but instead  it is the atomic number Z that takes the 
place of time t. Therefore  we will speak here of shiftZ −  instead of time –lags in our embedding 
procedure. 
In Fig.2 we report the results of our calculations for autocorrelation function (ACF)  in the case of Atomic 
Weights and Mass Number respectively. Both the ACF for )(ZWa  and )(ZA  were calculated for 
shiftZ − ranging from 1 to 80. The first value of Z the ACF crosses the zero was obtained for )(ZWa  and 
)(ZA , and it resulted 30=− shiftZ . A typical behaviour was obtained for ACF, in the cases of   )(ZWa  
and )(ZA  respectively,  resulting in progressively, positive but decreasing values of ACF until the value 
30=− shiftZ , and a subsequent negative half-wave for shiftZ −  values greater than 30. This seems an 
interesting result that deserves in some manner a careful interpretation. 
Fig.2 
       ACF for shiftZ −  values ranging from               ACF for shiftZ −  values ranging from 1 to 80 in  
       1 to 80 in the case of  )(ZWa .                             the case of ).(ZA  
The Mutual Information.  It is usually used to determine a useful time delay for attractor reconstruction 
of a given time series. Generally speaking, we may observe only a variable from a system, )t(x , and we 
wish to reconstruct a higher dimensional attractor. We have to consider 
[ ])nt(x),.......,t(x),t(x),t(x τττ +++ 2  to produce a )n( 1+  dimensional representation. Consequently, 
the problem is to choose a proper value for the delay τ . If the delay is chosen too short, then )t(x  is very 
similar to )t(x τ+ . Of course,  for a too large delay, then the corresponding coordinates result essentially 
independent and no information can be gained. The method of Mutual Information [10] involves the idea 
that a good choice for τ   is one  that, given )(tx   provides new information with measurement  )t(x τ+ . 
In other terms, given a measurement of )(tx , how many bits on the average can be predicted about 
)( τ+tx ? In the general case, as τ  is increased, Mutual Information decreases and then usually rises again. 
The first minimum in Mutual Information is used to select a proper τ . The important thing is here that the 
Mutual Information function takes non linear correlations into account. 
Before to consider the results that we have obtained, it is important to take into account that they change 
in some manner our traditional manner to approach the discussion on atomic weights and mass numbers 
of atomic nuclei. In fact, we do not consider here values obtained for a single atomic weight or for a 
single mass number. Instead, using M.I., we evaluate M.I. values computed for pairs of Atomic Weights, 
i.e.  )(ZWa   and )( shiftZZWa −+ , for any possible Z  and for each considered .shiftZ −  The same thing 
happens for pairs of atomic nuclei with Mass Numbers )(ZA   and )( shiftZZA −+ . 
In Fig.3 we give our results for analysis of )(ZWa . The calculated shiftZ − resulted 3=Z .In Fig.4 we give 
instead the results for )(ZA . In this case the calculated shiftZ − resulted to be 2=Z . To complete our 
results, in Fig.5 we give also the results of M.I computed for )(ZN , being this time N the number of 
neutrons considered  as ).(ZN  Finally, Fig. 6 compares Mutual Information of )(ZWa , )(ZA , )(ZN . 
    Fig. 3: Mutual Information  
Z-shift M. I. 
0 2.75572 
1 2.29477 
2 2.12415 
3 2.11128 
4 2.29507 
5 2.30601 
6 2.32549 
7 2.15915 
8 2.11995 
9 2.19359 
10 2.34003 
11 2.21110 
12 2.05087 
13 2.09827 
14 2.16302 
15 2.19951 
16 2.09898 
17 2.03026 
18 2.07486 
19 2.15473 
20 2.07256 
Fig. 4: Mutual Information 
Z-shift M. I. 
0 2.75025 
1 2.24680 
2 2.12359 
3 2.12379 
4 2.24418 
5 2.25447 
6 2.21247 
7 2.15885 
8 2.11560 
9 2.19567 
10 2.25399 
11 2.09334 
12 2.02933 
13 2.08003 
14 2.13427 
15 2.07279 
16 2.09985 
17 1.98171 
18 2.08486 
19 2.11095 
20 2.09009 
Mutual Information Atomic Weights
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Z(lags)
Mutual Information Mass Number
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20Z(lags)
Fig. 5: Mutual Information 
Z-shift M.I. 
0 2.746829 
1 2.247466 
2 2.117802 
3 2.108233 
4 2.248289 
5 2.251926 
6 2.288034 
7 2.144352 
8 2.131585 
9 2.133403 
10 2.219288 
11 2.086516 
12 2.082935 
13 2.129288 
14 2.226088 
15 2.092718 
16 2.034579 
17 1.983242 
18 2.053253 
19 2.038391 
20 2.018196 
Fig.6 : Mutual Information. 
Mutual Information 
Atomic Weights-Mass Number-Neutron Number
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Z-shift
atomic w eights
mass number
neutron number
We are now in the condition to reassume some results. 
Using autocorrelation function, ACF (Linear Analysis), a shiftZ −  value of 30=Z  is obtained for both 
)(ZW a  and )(ZA . 
Using Mutual Information (Non Linear Analysis) it is obtained instead 3=− shiftZ  for )(ZWa   and 
2=− shiftZ  for )(ZA . Also )(ZN  gave 3=− shiftZ . 
We have a preliminary indication that the mechanism of increasing mass in atomic nuclei is a non linear 
mechanism. Of course, this could be an important indication in understanding of the basic features of  
nuclear matter .Therefore  it becomes of relevant importance to attempt to confirm such conclusion on the 
Mutual Information Neutron Number
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Z-shift
basis of a more deepened control. The test that in such cases one uses in analysis of non linear dynamics 
of time series data is that one of  surrogate data. Here we  used shuffled data. The results are given in 
Fig.7 for )(ZWa  and in Fig.8 for )(ZA . 
Fig.7 : Surrogate Data Analysis 
Z-lags M.I.-Surrogate Data 
0 2.76156 
1 1.48621 
2 1.44105 
3 1.44755 
4 1.38917 
5 1.36923 
6 1.54505 
7 1.38976 
8 1.38138 
9 1.48849 
10 1.36547 
11 1.34689 
12 1.37347 
13 1.42643 
14 1.34143 
15 1.29609 
16 1.25763 
17 1.40470 
18 1.43727 
19 1.41390 
20 1.36288 
Fig. 8 : Surrogate Data Analysis 
Z-lags M.I.-Surrogate Data
0 2.73606 
1 1.61453 
2 1.39896 
3 1.37041 
4 1.33673 
5 1.30566 
6 1.41145 
7 1.34616 
8 1.35618 
9 1.34365 
10 1.37382 
11 1.25994 
12 1.30270 
13 1.41078 
14 1.47545 
15 1.21417 
16 1.31047 
17 1.27582 
18 1.38925 
19 1.35851 
20 1.41464 
The results obtained by using shuffled data clearly confirm that we are in presence of a on linear 
mechanism in the  process of increasing mass of atomic nuclei. 
We also tested statistically  the obtained differences between M.I. for  original and surrogate data for the 
case of Atomic Weights as well as for the case of Mass Numbers. In the case of M.I for  Atomic Weights 
vs M.I. Atomic Weights – Surrogate Data , using Unpaired t test we obtained a  P value P<0.0001 and the 
same value was found in the case of M.I. Mass Number vs M.I. Mass Number – Surrogate Data. 
Mutual Information 
Atomic Weights vs Surrogate Data
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Z(lags)
atomic w eigths
surrogate data
Mutual Information 
Mass Number vs Surrogate Data
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Z(lags)
mass number
surrogate data
In conclusion, by accepting the presence of non linearity, we have reached  the first relevant conclusion of 
the present paper. 
 Looking now to Figures 3, 4, 5, 6 one may identify now new properties for atomic nuclei. Remember that 
we are considering each time, pairs of Atomic Weights or pairs of Mass Numbers or still pairs of Neutron 
Numbers for atomic nuclei with shiftedZ −  values ranging from 1 to 20.  What one should expect in this 
case is to find a minimum of  Mutual Information followed soon after by a rather constant behaviour for 
M.I. Examination of the results reveals instead that we have  some definite maxima  and some definite 
minima at given values of shiftZ −  that are quite different in the two and three cases that we have 
examined. 
In detail the maxima for Atomic Weights are given at Z-shift values = 6,10,15,19,….   . Minima instead 
are given at Z-shift values =3,8,12,17,…… . 
The maxima for Mass Numbers are given at Z-shift values=5,10,14,16(19)  while the minima are given at 
Z-shift values =2,8,12,15,17. For Neutrons we have Z-shift values = 3,9,12,17 for the minima and 
6,10,14,18 for the maxima. In conclusion: Still repeating here  that each time we are exploring the M.I 
value for pairs of atomic weights, or of mass number or of neutrons, shifted in the valueZ −  by some 
given values ranging between 1 and 20, we find that some pairs of nuclei show maxima MI values while 
other pairs of nuclei show minima MI value. Therefore  we have new and  interesting properties identified 
in atomic nuclei when analyzed by pairs as in the present methodology.  We may call such new identified 
regularities for atomic nuclei as pseudo periodicities in pairs of atomic nuclei.  
For Mass Numbers we may write as example that 
shiftZNA −+∆=∆  
Fixed  a value of Z , we have consequently 11 NZA += . For an atomic nucleus with muss number 2A  and 
Z-shifted, we will have 22 NshiftZZA +−+= . 
Consequently, shiftZNAAA −+∆=−=∆ 12 with 12 NNN −=∆ . For  Z-shift values=5,10,14,16(19), the 
considered pairs of atomic nuclei will show maxima of M.I.. Instead for  Z-shift values =2,8,12,15,17, 
such M.I. values will reach a minimum value. 
Let us go in more detail in the analysis. 
First of all we have also to note that  the values of MI, calculated for each shiftZ − ranging from 1 to 20,  
result to be quite different in )(ZWa  respect to )(ZA , and N(Z). In addition, as previously said, Mutual 
Information measures how much, given two random variables, and knowing one of these two variables, is 
reduced our uncertainty about the other. Mutual Information must thus be intended essentially as 
estimation of mutual dependence of two variables. In our case we find that the pairs of atomic weights, or 
of mass number or of neutrons in  atomic nuclei that are shifted by some definite values of the atomic 
number Z , show strong dependence (maxima values of M.I.) or, respectively, they show  very low 
dependence (minima values of M.I.). We have some new pseudoperiodicities that in some manner recall  
a new kind of pseudo isotopies. All that seems to be realized in a full regime of non linearity. 
4. Phase Space Reconstruction of )(ZWa  and )(ZA . 
We may now attempt to obtain  for the first time a phase space reconstruction of Atomic Weights and 
Mass Number of atomic nuclei. 
To reach this objective one must estimate Embedding Dimension using the False Nearest Neighbors 
Criterion (FFN). We applied it using a shiftZ −  = 3 for )(ZWa  and a 2=− shiftZ  for )(ZA  as previously 
found. A false criterion distance was considered to be 4.42 for both the cases of the analysis. The results 
are reported in Fig.9 for atomic weights, )(ZWa , and in Fig.10 for Mass Number, )(ZA . 
Fig. 9 :False Nearest Neighbors for Atomic  Fig.10 : False Nearest Neighbors for Mass  
 Weights        Number 
The evaluation of the results given in Figures 9, 10 enables us to conclude that the phase space 
reconstruction for atomic weights and mass number requires an estimated embedding dimension ,that 
results to be included between 1 and 2. We may assume to consider 2=D . Atomic weights and mass 
number of atomic nuclei may be approximately represented in a bi dimensional phase space. 
Consequently, according to the general framework of the  theory on non linear dynamics of  systems, we 
may conclude that a very few number of independent variables is required in order to describe the 
mechanism of increasing mass of atomic nuclei. We may accept to consider that  they are two variables 
that, with greatest prudence, we may accept to identify as being  the proton and the neutron numbers, 
respectively. The phase space description of atomic weights )(ZWa and of Mass Number )(ZA  requires 
with approximation, the use of such two variables. 
 Since this result has been obtained in a closed form, we may now attempt to analyze if the two given 
)(ZWa  and )(ZA  exhibit or not properties of divergence. 
To this purpose we may calculate Lyapunov spectrum in the embedded phase space. The results that we 
obtained, are reported in Fig. 11 and in Fig.12 for )(ZWa  and )(ZA , respectively. 
Fig.11 : Lyapunov spectrum of atomic weights 
iteration, exponents  
1 -0.139132 -1.767928 
2 -0.054642 -1.852418 
3 -0.032002 -1.875058 
4 -0.021497 -1.885562 
iteration, exponents  
5 -0.015293 -1.891767 
6 -0.011169 -1.895891 
7 -0.008224 -1.898835 
8 -0.006016 -1.901043 
iteration, exponents  
9 -0.004299 -1.902761 
10 -0.002925 -1.904134 
11 -0.001801 -1.905258 
12 -0.000864 -1.906195 
13 -0.000946 -1.854474 
14 -0.000219 -1.810264 
15 0.000026 -1.779827 
16 0.000273 -1.753227 
17 0.000425 -1.705629 
18 0.000914 -1.667297 
19 0.001349 -1.632997 
20 0.001738 -1.602124 
21 0.002186 -1.585933 
22 0.002567 -1.570617 
23 0.002785 -1.552573 
24 0.003007 -1.536054 
25 0.003233 -1.508474 
26 0.003433 -1.483006 
27 0.003304 -1.450959 
28 0.003100 -1.421118 
29 0.003155 -1.394447 
30 0.002734 -1.365179 
31 0.002481 -1.336786 
32 0.002367 -1.309490 
33 0.002297 -1.283733 
34 0.002104 -1.261030 
35 0.002071 -1.235585 
36 0.002229 -1.208038 
37 0.002491 -1.182091 
38 0.002558 -1.161598 
39 0.002648 -1.143582 
40 0.002781 -1.125369 
41 0.002912 -1.108048 
42 0.003353 -1.094097 
43 0.003464 -1.078832 
44 0.003659 -1.064133 
iteration, exponents  
45 0.003833 -1.043353 
46 0.003724 -1.022886 
47 0.003713 -1.008188 
48 0.003649 -0.995043 
49 0.003576 -0.982424 
50 0.003419 -0.970124 
51 0.003116 -0.958012 
52 0.002781 -0.948094 
53 0.002434 -0.938526 
54 0.002281 -0.929092 
55 0.002133 -0.920000 
56 0.001982 -0.911222 
57 0.001851 -0.902767 
58 0.001885 -0.897098 
59 0.001945 -0.891648 
60 0.002004 -0.886751 
61 0.002097 -0.882050 
62 0.002113 -0.877724 
63 0.002311 -0.872891 
64 0.002398 -0.865438 
65 0.002469 -0.858201 
66 0.002539 -0.854135 
67 0.002586 -0.850170 
68 0.002372 -0.850370 
69 0.002156 -0.850555 
70 0.001942 -0.850731 
71 0.001733 -0.850900 
72 0.001529 -0.851065 
73 0.001330 -0.851224 
74 0.001136 -0.851379 
75 0.000947 -0.851530 
76 0.000763 -0.851676 
77 0.000585 -0.851819 
78 0.000410 -0.851958 
Fig.12 : Lyapunov spectrum of mass number 
iteration, exponents                iteration,          exponents 
1 -0.063750 -1.815136 
2 -0.013655 -1.865231 
3 -0.004496 -1.874390 
4 -0.000940 -1.877946 
5 0.001067 -1.879953 
6 0.002390 -1.881276 
iteration, exponents 
7 0.003333 -1.882218 
8 0.004039 -1.882925 
9 0.004589 -1.883475 
10 0.005029 -1.883914 
11 0.005907 -2.030549 
12 0.005902 -2.152007 
13 0.007184 -2.344940 
14 0.007900 -2.508600 
15 0.008538 -2.585360 
16 0.008446 -2.593648 
17 0.008353 -2.600949 
18 0.007751 -2.585976 
19 0.008750 -2.554168 
20 0.008150 -2.531907 
21 0.008186 -2.482481 
22 0.007894 -2.450329 
23 0.008806 -2.420931 
24 0.009633 -2.393973 
25 0.009816 -2.350399 
26 0.009954 -2.310147 
27 0.009880 -2.258690 
28 0.009662 -2.210759 
29 0.009018 -2.163469 
30 0.008250 -2.119600 
31 0.007832 -2.082773 
32 0.007498 -2.052309 
33 0.007109 -2.023616 
34 0.006523 -2.013469 
35 0.005691 -2.003623 
36 0.005143 -1.995782 
37 0.004718 -1.988460 
38 0.003966 -1.963859 
39 0.004116 -1.946858 
40 0.004339 -1.930786 
41 0.004649 -1.922343 
42 0.005367 -1.917894 
43 0.007274 -1.903393 
iteration, exponents 
44 0.008790 -1.889480 
45 0.010447 -1.876392 
46 0.010388 -1.871786 
47 0.009970 -1.869670 
48 0.009374 -1.855670 
49 0.008541 -1.841986 
50 0.007498 -1.829730 
51 0.006726 -1.821934 
52 0.005940 -1.808215 
53 0.005215 -1.795046 
54 0.004497 -1.783516 
55 0.004030 -1.771905 
56 0.003596 -1.760725 
57 0.002953 -1.750916 
58 0.002916 -1.743006 
59 0.002961 -1.735446 
60 0.003004 -1.728808 
61 0.003032 -1.722373 
62 0.003108 -1.713120 
63 0.003229 -1.704206 
64 0.003291 -1.693598 
65 0.003299 -1.689223 
66 0.003127 -1.691107 
67 0.002824 -1.702828 
68 0.002620 -1.700615 
69 0.002292 -1.708755 
70 0.002307 -1.709571 
71 0.002341 -1.710383 
72 0.002445 -1.714966 
73 0.002536 -1.719414 
74 0.002212 -1.719409 
75 0.001877 -1.719385 
76 0.001547 -1.719358 
77 0.001225 -1.719332 
78 0.000912 -1.719306 
79 0.000606 -1.719280 
80 0.000308 -1.719255 
To calculate the Lyapunov spectrum in the case of the Atomic Weights, )(ZWa , we used  a number of 22 
fitted points in the embedded phase space. 
These are the following results for the calculated Lyapunov exponents:  
λ1 = 0.000410 and λ2 = -0.851958. It is seen that we have 01 >λ   and 02 <λ  with 021 <λ+λ  as required. 
In conclusion we are in presence of a divergent system and such divergence may be indicative a pure 
chaotic regime for Atomic Weights. 
In the case of the Mass Number, )(ZA , we utilized a number of 17 fitted points in the embedded phase 
space. These are the results we obtained for the calculated Lyapunov exponents: 000308.01 =λ , 
719255.12 −=λ   with 01 >λ  , 02 <λ  and 021 <λ+λ . Also in the case of Mass Number, )(ZA ,  we are in 
presence of a divergent system and it could be indicative of a pure chaotic regime. 
In brief, we have reached the following conclusions: 
1) In the process of increasing mass of atomic nuclei we are in presence of a non linear mechanism. 
Remember that the presence of   non linear contributions in the dynamic of a system gives  often 
origin to  chaotic regimes. 
2) The mechanism of increasing mass in Atomic Weights and  in  Mass Number for pairs of atomic 
nuclei also exhibits some pseudo periodicities at some definite shiftsZ − of atomic nuclei. 
Therefore, we could be in presence of  an ordered  regime of increasing mass but  in the complex 
of a whole structure that is divergent and possibly chaotic. 
3) A phase space reconstruction has been realized for Atomic Weights and Mass Number of atomic 
nuclei, respectively. In our opinion this is a relevant result that is obtained here. In fact, from the 
analysis performed by using F.F.N, it does not result in a so clear manner that the reconstructed  
phase space has dimension D=2. We have F.F.N. values that suspend embedding dimension 
between 1 and 2. In this case in the analysis it is adopted the greatest value. In conclusion we may 
accept an embedding dimension D=2, and only in this condition we have that only few, two 
variables, are required in order to describe the mechanism of increasing mass of atomic nuclei in 
phase space with respect to atomic weights and mass number . The first variable should be the 
atomic number, Z , and the other variable should be the Neutron Number , .ZAN −=  
4) The analysis of )(ZWa and )(ZA  reveals a new important features when we analyze such two 
systems by calculation of Lyapunov spectrum. It results that we are in presence of divergent 
systems in both case of stable nuclei analyzed by )(ZWa  and )(ZA . Such divergent property, 
linked to the previously found on non linearity, could be indicative that we are in presence of a 
chaotic regime in the  mechanism of the increasing mass of atomic  nuclei  when  seen as function 
of Z . 
We may go on by a further step calculating the Correlation Dimension in the reconstructed phase space 
for both )(ZWa  and )(ZA . In Fig.13 we report the results for )(ZWa . In Fig. 14 we give instead the results 
for )(ZA . Finally in Figures 15 and 16 we have the results using surrogate (shuffled) data. 
Fig.13 : Atomic Weights. 
Fig.14 : Mass Number. 
For the atomic weights it is obtained  D2 = 1.955 ± 0.296 as value for Correlation Dimension. For Mass 
Number it results instead  D2 = 2.120 ± 0.084. 
It is important to observe that in both case we obtain non integer values of such topological dimension in 
phase space reconstruction. 
We may now consider the results for surrogate data. 
Fig.15 : Results on Correlation Dimension, Atomic Weights (surrogate data). 
Fig.16 : Results on Correlation Dimension, Mass Number (surrogate data).  
In the case of )(ZWa  we obtain D2 = 5.130 ± 0.624  while instead in the case of )(ZA  we have D2 = 5.193 
± 0.810 
As seen through the results, a net difference is obtained in comparison of original with surrogate data. 
They may be quantified in the following manner: 
)2(793.3
),4(088.5),2(678.3
=−=−=
lagZNumberMass
lagZWeightsAtomiclagZWeightsAtomic
surrogate
surrogateoriginal
The null hypothesis may be rejected. In conclusion we have a non integer topological dimension for both 
)(ZWa  and ).(ZA  
Therefore it has a sense to attempt to ascertain if we are in presence of a fractal behaviour for both the 
system of data that we have in examination. 
5. On a Possible Existing Power Law to Represent Increasing Mass in Atomic Weights and Mass 
Number of Atomic Nuclei. 
As we indicated in the introduction in the present paper, rather recently V. Paar et al [6] introduced a 
power law for description of the line of stability of atomic nuclei, and in particular for the description of 
atomic weights. They compared the found power law with the semi-empirical formula of the liquid drop 
model, and showed that the power law corresponds to a reduction of neutron excess in superheavy nuclei 
with respect to the semi-empirical formula. Some fractal features of the nuclear valley of stability were 
analyzed and the value of fractal dimension was determined. 
It is well known that a power law may be often connected with an underlying fractal geometry of the 
considered system. If confirmed for atomic nuclei, according to [5], it could be proposed a new approach 
to the problem of stability of atomic nuclei. In this case the aim should be to identify the basic features in 
underlying dynamics giving rise to the structure of the atomic nuclei. 
The aim is to perform here such kind of analysis for )(ZWa  and )(ZA . 
For atomic weights let us introduce the following Power Law : 
β= aZZWa )(                            (5) 
while instead for the Mass Number let us introduce the following power law 
γ= cZZA )(                                    (6) 
The problem is now to estimate ( β,a )  and  ( γ,c ) by a fitting procedure. 
We give here the obtained results for the Atomic Weights. 
Curve Fit Report 
Y Variable: C2.  X Variable: C1. Model Fit:   A*X^B  
Parameter Estimates for All Groups 
Groups CountIter's            R2                               A                             B  
All 83 24 0.99954 1.47335 1.12133  
Combined Plot Section 
Fig. 17 
125.0
187.5
250.0
0.0 25.0 50.0 75.0 100.0
Y = A*X^B
                     
Model Estimation Section 
Parameter Parameter Asymptotic Lower Upper 
Name Estimate Standard Error 95% C.L. 95% C.L. 
A 1.47335 0.02448 1.42464 1.52206 
B 1.12133 0.00399 1.11338 1.12927 
Iterations 24 Rows Read 83 
R-Squared 0.999538 Rows Used 83 
Random Seed 9839 Total Count 83 
Estimated Model 
Curve Fit Report 
Y Variable: C2.  X Variable: C1.  
Plot Section 
           
Fig.18 
0.0 25.0 50.0 75.0 100.0
Residual vs C1
In conclusion, for the (5) we obtained  a = 1.47335  and β = 1.12133. 
V. Paar et al. [5] obtained  instead that a =1.44±0.02  and β=1.120±0.004 and β=1.19±0.01 by using  the 
Box Counting method. There is an excellent agreement. 
As it may be seen the obtained values significantly differ from the line. In addition, the obtained values 
strongly give evidence for a  possible fractal regime. 
Let us see now the results that we obtained for the (6) concerning Mass Number. 
Curve Fit Report 
Y Variable: C2.  X Variable: C1.  
Model Fit:   A*X^B  
Parameter Estimates for All Groups 
Groups CountIter's             R2                              A                               B  
All 83 23 0.99929 1.46185 1.12389  
Combined Plot Section 
Fig.19 
125.0
187.5
250.0
0.0 25.0 50.0 75.0 100.0
Y = A*X^B
                      
Model Estimation Section 
Parameter Parameter Asymptotic Lower Upper 
Name Estimate Standard Error 95% C.L. 95% C.L. 
A 1.46185 0.03010 1.40195 1.52174 
B 1.12389 0.00495 1.11404 1.13373 
Iterations 23 Rows Read 83 
R-Squared 0.999294 Rows Used 83 
Random Seed 10007 Total Count 83 
Estimated Model 
 Curve Fit Report 
Y Variable: C2.  X Variable: C1.  
Plot Section 
Fig.20 
0.0 25.0 50.0 75.0 100.0
Residual vs C1
            
In conclusion,  for the (6)  we obtained: c=1.46185, γ=1.12389.  
V. Paar et al. obtained [6] c=1.47±0.02 and γ=1.123±0.005 in excellent agreement. 
Again we may conclude for values that significantly differ from line. In addition, the obtained values 
strongly give evidence for a  possible fractal regime. 
The possible existence of a fractal regime in the mechanism of increasing mass in atomic weights and 
mass umber of atomic nuclei changes radically our traditional manner to conceive nuclear matter. 
Consequently, it becomes of relevant importance to attempt to deepen such result so  to reach the highest 
possible level of certainty on it. A way to deepen such kind of analysis is to follow the way of variogram 
method. Variograms usually give powerful indications on the variability of the examined data, on their 
self-similarity and self-affine behaviour. In particular, they enable us to calculate the Generalized Fractal 
Dimension [ for details see ref.11]. 
The semivariogram is given in the following manner 
))()((
2hxRxRE
=γ        (7) 
For Atomic Weights it is: 
))()((
2hZWZWE
=γ       (8) 
and for Mass Number it is 
))()((
2hZMZME
=γ   .  shiftZ − is indicated here by  ,.....2,1=h   .  (9) 
Still, in the general case we may write 
ii xRhxRhN
2))()((
)(  (10) 
For a self-affine series the semivariogram scales according to   
DhCh =γ )(      (11) 
being D the Generalized Fractal Dimension. It is linked to aH  by aHD 2= being aH the Hausdorff 
dimension. 
We may also estimate the corresponding Probability Density Function that is given in the following 
manner 
1)( −= a
hP       (12) 
being  1−= aD   and k is a scale parameter. 
Let us introduce now the results that we obtained for Atomic Weights. 
RESULTS 
Fig. 21: Variogram of atomic weights 
     
Z-shift     value      Z-shift     value       Z-shift       value     
1 4.1038233 28 2583.4415 55 9936.2738
2 13.890723 29 2774.3962 56 10263.722
3 30.73476 30 2974.6827 57 10614.884
4 53.281165 31 3177.0212 58 10974.746
5 82.706811 32 3391.3496 59 11360.41 
6 118.39056 33 3607.1587 60 11745.812
7 161.00905 34 3826.8629 61 12160.26 
8 209.83439 35 4053.0824 62 12559.606
9 265.1562 36 4283.8682 63 12969.082
10 326.91086 37 4517.609 64 13342.27 
11 395.38036 38 4760.8117 65 13738.976
12 470.35989 39 5010.0734 66 14163.507
13 551.85195 40 5268.4938 67 14573.838
14 640.44287 41 5533.4834 68 14979.658
15 735.22246 42 5805.6797 69 15415.604
16 837.44107 43 6083.8624 70 15830.968
17 946.67776 44 6370.1678 71 16277.052
18 1060.8571 45 6665.5819 72 16709.305
19 1183.535 46 6969.6876 73 17165.872
20 1314.6237 47 7285.3773 74 17613.299
21 1448.8492 48 7605.9824 75 18096.431
22 1590.1856 49 7928.8907 76 18533.473
23 1734.6391 50 8261.1897 77 18994.677
24 1888.6115 51 8592.0056 78 19457.937
25 2047.654 52 8917.3226 79 20009.547
26 2217.1471 53 9259.8625 80 20578.44 
27 2394.576 54 9591.9184
Using the (11) in a Ln-Ln representation, we may now estimate the Generalized Fractal Dimension. We 
obtained the following results. 
Curve Fit Report 
Y Variable: C2.  X Variable: C1.  
Model Fit: C2=A+B*(C1)  Simple Linear  
Parameter Estimates for All Groups 
Groups            Count     Iter's R2 A B  
All 81 6           0.99986                    1.25542                 1.98086  
Combined Plot Section: Ln-Ln variogram fitting 
Fig.22 
0.0 1.3 2.5 3.8 5.0
Y = Simple Linear
                      
Model Estimation Section 
Parameter Parameter Asymptotic Lower Upper 
Name Estimate Standard Error 95% C.L. 95% C.L. 
A 1.25542 0.00939 1.23674 1.27410 
B 1.98086 0.00264 1.97560 1.98612 
Iterations 6 Rows Read 81 
R-Squared 0.999859 Rows Used 81 
Random Seed 7364 Total Count 81 
Estimated Model 
(1.25542232430747)+(1.98086225009967)*(C1) 
Curve Fit Report 
Y Variable: C2.  X Variable: C1.  
Model Fit: C2=A+B*(C1)  Simple Linear  
Plot Section 
Fig.23 
0.0 1.3 2.5 3.8 5.0
Residual vs C1
             
In conclusion, we obtain for Atomic Weights the following results: 
Generalized Fractal Dimension  D = 1.98086 
Hausdorff dimension     Ha = 0.99043. 
By using the (12) we may now calculate the Probability Density Function. For atomic weights it results 
that  
P(Z) = 98086.1610673.5 Z−×            (13) 
that is given in Fig.24. 
Fig. 24 
In order to deepen our analysis we may  also employ  a modified version of standard variogram analysis, 
using this time  a light modification of its usual form in the following way  
iiN hxRxRN
2 ))()((2
)(     (14) 
where we calculate  now by )2/1( N instead of )(2/1 hN −  being )1(2 −N the number of degrees of freedom 
for the whole system taken in consideration. 
We have  the results in the case of the variogram N2γ  for atomic weights in Figures 25, 26, 27. 
Fig. 25 
Fig.26 
Fig.27    
As see, passing from variogram in Fig.25 to variogram in Fig.26 and, finally, in variogram in Fig.27 we 
have used each time a different factor of scale and, in spite of such different factors of scale, the behaviour 
of the correspondent variograms, remain unchanged. This result may be taken as further indication that we 
are in presence of a fractal regime. 
In addition, by  the N2γγγγ variogram, we may now re-calculate the Generalized Fractal Dimension using a 
Ln-Ln scale.  
The results are given in the following scheme. 
Curve Fit Report 
Y Variable: C2.  X Variable: C1.  
Model Fit: C2=A+B*(C1)  Simple Linear  
Parameter Estimates for All Groups 
Groups Count Iter's R2 A B  
All 52 6          0.99288                   1.65610                  1.70683  
Combined Plot Section 
Fig.28 
0.0 1.0 2.0 3.0 4.0
Y = Simple Linear
                      
Model Estimation Section 
Parameter Parameter Asymptotic Lower Upper 
Name Estimate Standard Error 95% C.L. 95% C.L. 
A 1.65610 0.06408 1.52739 1.78481 
B 1.70683 0.02045 1.66576 1.74790 
Iterations 6 Rows Read 52 
R-Squared 0.992875 Rows Used 52 
Random Seed 10882 Total Count 52 
Estimated Model 
(1.65609637745411)+(1.70683194557246)*(C1) 
Curve Fit Report 
Y Variable: C2.  X Variable: C1.  
Model Fit: C2=A+B*(C1)  Simple Linear  
Plot Section 
Fig.29 
0.0 1.0 2.0 3.0 4.0
Residual vs C1
             
In conclusion, also in this case a non integer  value of the Generalized Fractal Dimension is obtained. It 
results D=1.70683 with Hausdorff dimension 853415.0=aH . Such values result in satisfactory accord 
with those previously had in the case of the standard variogram. 
We may now consider the results that we obtained in the corresponding analysis for Mass Number. 
Fig. 30:Variogram of Mass Number 
Variogram values: 
 Z-lags      Variogram- value              Z-lags      Variogram- value
1 5.243902 
2 14.7284 
3 32.275 
4 54.56329 
5 84.44231 
6 119.9221 
7 163.0592 
8 212.2067 
9 268.1824 
10 329.637 
11 399.0625 
12 474.9789 
13 557.2214 
14 646.1667 
15 741.9779 
16 843.4851 
17 953.9318 
18 1068.915 
19 1192.695 
20 1324.532 
21 1460.766 
22 1604.451 
23 1749.508 
24 1903.856 
25 2063.957 
26 2234.096 
27 2412.473 
28 2603.482 
29 2796.472 
30 3000.292 
31 3205.394 
32 3423.01 
33 3639.03 
34 3860.969 
35 4091.052 
36 4326.67 
37 4561.033 
38 4804.122 
39 5055.284 
40 5320.965 
41 5591.036 
42 5867.11 
43 6145.663 
44 6428.603 
45 6726.934 
46 7036.689 
47 7356.458 
48 7675.757 
49 7997.926 
50 8333.5 
51 8671.016 
52 8996.565 
53 9339.117 
54 9664.879 
55 10009.8 
56 10334.06 
57 10688.52 
58 11053.98 
59 11444.02 
60 11836.33 
61 12256.52 
62 12651.02 
63 13058.83 
64 13431.24 
65 13832.53 
66 14249.85 
67 14660.66 
68 15086.9 
69 15531.39 
70 15943.19 
71 16399.63 
72 16814.68 
73 17282.35 
74 17737.39 
75 18219.06 
76 18626.5 
77 19078.67 
78 19562.9 
79 20150.38 
80 20672.67 
81 21218.5 
We may now give the estimation of the Generalized Fractal Dimension.      
Curve Fit Report (Ln-Ln plot) 
Y Variable: C2.  X Variable: C1.  
Model Fit: C2=A+B*(C1)  Simple Linear  
Parameter Estimates for All Groups 
Groups Count Iter's R2 A B  
All 81 4          0.99943                   1.32691                   1.96370  
Combined Plot Section 
Fig.31 
0.0 1.3 2.5 3.8 5.0
Y = Simple Linear
Model Estimation Section 
Parameter Parameter Asymptotic Lower Upper 
Name Estimate Standard Error 95% C.L. 95% C.L. 
A 1.32691 0.01877 1.28955 1.36427 
B 1.96370 0.00528 1.95318 1.97422 
Iterations 4 Rows Read 81 
R-Squared 0.999428 Rows Used 81 
Random Seed 2960 Total Count 81 
Estimated Model 
(1.32690928509202)+(1.96369892532774)*(C1) 
Curve Fit Report 
Y Variable: C2.  X Variable: C1.  
Model Fit: C2=A+B*(C1)  Simple Linear  
Plot Section 
Fig.32      
0.0 1.3 2.5 3.8 5.0
Residual vs C1
The analysis enables us to give the following results: 
Generalized Fractal Dimension  D = 1.96370 
Hausdorff dimension     Ha = 0.98185 
We may now calculate the Probability Density Function. It assumes the following form 
P(Z) = 9637.1610786.6 Z−−−−××××  
Fig. 33 
Let us proceed estimating N2γ  at different  scale factors. 
Fig. 34 
  Fig. 35a 
Fig. 35b 
           
By the N2γ variogram  we may now re-calculate the Generalized Fractal Dimension using a Ln-Ln scale. The 
results are given in the following scheme. 
 Curve Fit Report 
Y Variable: C2.  X Variable: C1.  
Model Fit: C2=A+B*(C1)  Simple Linear  
Parameter Estimates for All Groups 
Groups Count Iter's R2 A B  
All 49 4           0.99538                   6.81261                  1.70270  
Combined Plot Section 
Fig.36 
                  
0.0 1.0 2.0 3.0 4.0
Y = Simple Linear
                          
Model Estimation Section 
Parameter Parameter Asymptotic Lower Upper 
Name Estimate Standard Error 95% C.L. 95% C.L. 
A 6.81261 0.05209 6.70783 6.91739 
B 1.70270 0.01692 1.66867 1.73674 
Iterations 4 Rows Read 49 
R-Squared 0.995381 Rows Used 49 
Random Seed 11153 Total Count 49 
Estimated Model 
(6.81260680697216)+(1.70270493930648)*(C1) 
Curve Fit Report 
Y Variable: C2.  X Variable: C1.  
Model Fit: C2=A+B*(C1)  Simple Linear  
Plot Section 
Fig.37 
0.0 1.0 2.0 3.0 4.0
Residual vs C1
          
Conclusion: also in this case a non integer  value of the Generalized Fractal Dimension is obtained. It 
results D=1.70270 with Hausdorff dimension 85135.0=aH . Such values result in satisfactory accord with 
those previously obtained in the case of the standard variogram. 
In conclusion, until here we have used  the standard methodologies that  generally one utilizes with the 
aim to ascertain the presence of non linear contributions in the investigated dynamics as well as to 
reconstruct phase space dynamics and to evaluate the possible presence of divergent features in the 
system, possibly of chaotic nature, and still the probable presence of a fractal regime in such dynamics. 
On the basis of the results that we have obtained, it seems very difficult to escape the conclusion that the 
process of increasing mass, regarding Atomic Weighs and Mass Number in atomic nuclei, concerns all the 
basic features of non linearity, divergence, possible chaoticity and fractality that we have only just 
indicated for systems with non linear dynamics. This is a conclusion that in some manner overthrows our 
traditional manner to approach nuclear matter. For this reason it requires still more detailed deepening. In 
the following sections we will support our conclusion by other detailed results. 
6. Calculation of Hurst Exponent and Possible Presence of Fractional Brownian Behaviour In 
Atomic Weights and Mass Number of Atomic Nuclei 
It is known that time series arise often from a random walk  usually  called Brownian motion. The Hurst 
exponent [12] in such cases is calculated to be 0.5. 
This concept may be generalized introducing the Fractional Brownian Motion (fBM) which arises from 
integrating correlated –coloured noise. 
The value of Hurst exponent helps us to identify the nature of the regime we have under examination. In 
detail, if the H exponent results greater than 0.5, we are in presence of persistence, that is to say, past 
trends also persist into the future. On the other hand, in presence of H exponent values less than 0.5 we 
conclude for anti persistence, indicating it in this case  that past trends tend to reverse in the future. 
In the present case the analysis is not performed having a time series but instead  we consider the atomic 
number Z in )(ZWa , the atomic weights, and in )(ZA , the Mass Number of atomic nuclei. 
Our analysis gave the following results. 
For atomic weights, )(ZWa , we obtained the subsequent value: 
Hurst exponent H = 0.9485604 ; SDH = 0.00625887 ; r2 = 0.999645 
Instead for Mass Number , )(ZA , we had the next value: 
Hurst exponent H = 0.8953571 ; SDH = 0.0057648 ; r2 = 0.999753. 
Both the results obtained respectively for Atomic Weights and for Mass Number, enable us to conclude 
that: 
1) we are in presence of a Fractional Brownian Regime in both the cases; 
2) in both )(ZWa  and )(ZA  the tendency is for the persistence that results more marked in )(ZWa  respect 
to )(ZA ; 
3) in the case of the Atomic Weights, )(ZWa , the value of Fractal Dimension results to be 
      0514396.12 =−= HD   
      while in the case of  Mass Number, )(ZA , the value of  Fractal Dimension is  
      1046429.12 =−= HD  . 
7. Recurrence Quantification Analysis – RQA 
Further important information on the nature of the processes presiding over the mechanism of increasing 
mass in Atomic Weights and Mass Number of atomic nuclei may be obtained by using RQA, the 
Recurrence Quantification Analysis. 
This is a kind of analysis that, as it is well known, was introduced by J.P Zbilut and C.L. Webber [13]. 
Such investigation offers a new opportunity to us. By it we may give a look to the process of increasing 
mass of atomic nuclei analyzing in detail the kind of dynamics that governs such mechanism. Therefore, 
the results of such investigation must be considered with particular attention owing to their relevance. 
The features that we may investigate in detail are the following: first of all we may evaluate the level of 
recurrence, that is to say of “periodicity”, that such process exhibits. This is obtained by estimating the % 
Rec in RQA. Soon after we may also calculate the Determinism that is involved in such process. This is to 
say that we evaluate the level of predictability that it has. We estimate such features by %Det. in RQA. As 
third RQA variable we may also estimate the entropy and than the Max Line that is a measure linked 
proportionally to the inverse of the Lyapunov exponent. In brief, such measure enables us to evaluate still 
again the possible divergence involved in such mechanism. 
Usually, when using RQA, one starts with an embedding procedure of the given time series and thus 
providing with a given reconstruction of the given time series in phase space. In our case such 
reconstruction was previously performed in previous sections and we obtained  that we should use an 
embedding dimension 2=D  with a 3=− shiftZ  in the case of Atomic Weights and a 2=− shiftZ  in the 
case of Mass Numbers. However, in the present analysis our purpose is slightly different in the sense that 
we aim to preserve the embedded dimension 2=D  but we yearn for analyzing the behaviour of the basic 
RQA variables as %Rec., %Det., ENT., and Max Line shifting step by step the value of the atomic 
number Z  so to explore the mechanism as well as Z increases step by step. In order to perform such kind 
of analysis  a  value of the distance R  should be correctly selected. Usually, the distance R  in RQA may 
be fixed rather empirically selecting a proper value so that %Rec. remain about 1%.  However Zbilut and 
Webber [13] in their RQA software package introduced RQS that estimates recurrences at various 
distances and the cut off that one has at a particular distance respect to a flat behaviour. In this manner one 
selects the best optimized distance R to use in the analysis. We applied RQS software to select the proper 
distance and it was obtained that such value should be taken 4=R  
We also ascertained that such selected value remained rather constant when increasing Z  step by step.  In 
Fig. 38 we give some of the results that were obtained.  
Fig. 38 
Estimation of  Recurrences at various Distances
(Ln-Ln Plot)
0 1 2 3 4 5
Distance R
Optimized distance R = 4 
Atomic Weights
Estimation of  Recurrences at various  Distances
(Ln-Ln Plot)
0 0.5 1 1.5 2 2.5 3 3.5
Distance R
Optimized distance R = 4 
Mass Number
In conclusion we selected R=4 for the distance to use in RQA. The embedding dimension was chosen to 
be D=2 as it resulted by using FNN criterion and verifying this choice also for different Z  values. 
Finally, we decided to use the value  L=3 for the Line Length. 
We have obtained the following results. 
Recurrence Quantification Analysis applied to Atomic Weights, )(ZWa , for increasing values of shiftZ − . 
The results obtained for %Rec., %Det., ENT., and MaxLine are reported in the following Table 1 
Table 1 
Z-shift %Rec. %Det. Entropy Max-Line
1 1.36 73.33 2.00 14 
2 1.39 44.44 1.50 7 
3 1.33 52.38 1.59 13 
4 1.36 28.57 1.00 7 
5 1.30 38.46 1.00 11 
6 1.37 27.50 0.92 5 
7 1.16 24.24 0.00 8 
8 1.33 13.51 0.00 5 
9 1.04 32.14 1.00 5 
10 1.33 17.14 0.00 3 
11 0.94 25.00 0.00 6 
12 1.33 12.12 0.00 4 
13 1.04 24.00 0.00 6 
14 1.28 20.00 0.00 3 
15 0.92 14.29 0.00 3 
16 1.31 20.69 0.00 3 
17 0.89 15.79 0.00 3 
18 1.06 13.64 0.00 3 
19 0.84 17.65 0.00 3 
20 1.13 13.69 0.00 3 
21 0.95 0.00 - - 
There are some results that deserve to be outlined . 
%Rec. remains rather constant in correspondence of the different shiftZ −  values with some fluctuations 
taking minima values mainly at shiftZ − = 11, 15, 19,21. 
A graph is given in Fig.39. 
Fig. 39 
Atomic Weight (%Recurrences)
0 2 4 6 8 10 12 14 16 18 20 22 Z-shift
%Det. assumes rather  low values also with a length Line L=3. It oscillates among maxima and minima 
for increasing values of shiftZ − as it is pictured in Fig.40 (a, b, c). Significantly, %Det. goes definitively 
to zero starting with shiftZ − =21. 
Fig. 40a 
Atomic Weight (%Determinism)
0 2 4 6 8 10 12 14 16 18 20 22 Z-shift
Rather interesting appear also the value we obtain for Entropy and Max Line as reported in the following 
figures. 
Fig. 40b 
Atomic Weight (Entropy)
0 2 4 6 8 10 12 14 16 18 20 22 Z-shift
Fig. 40c 
Atomic Weight (Max-Line)
0 2 4 6 8 10 12 14 16 18 20 22 Z-shift
We may now pass to consider  Recurrence Quantification Analysis in the case of Mass Numbers, )(ZA . 
The results for %Rec., %Det., ENT., and Max Line are given in Table2. 
Table 2 
Z-shift %Rec. %Det. Entropy Max-Line 
1 1.20 77.50 1.37 14 
2 1.33 30.23 0.92 7 
3 1.23 41.03 1.00 13 
4 1.43 36.36 0.81 7 
5 1.23 37.84 1.00 11 
6 1.30 21.05 1.00 5 
7 1.09 38.71 1.00 8 
8 1.44 20.00 1.00 5 
9 1.19 31.25 1.00 6 
10 1.41 8.11 0.00 3 
11 1.25 18.75 0.00 6 
12 1.33 21.21 1.00 4 
13 1.28 41.94 1.59 6 
14 1.58 27.03 0.92 4 
15 1.14 46.15 1.59 6 
16 1.45 9.38 0.00 3 
17 1.31 21.43 0.00 3 
18 1.59 21.21 1.00 4 
19 1.19 25.00 0.00 3 
20 0.92 16.67 0.00 3 
21 0.79 0.00 - - 
%Rec. remains rather constant in correspondence of the different shiftZ −  values with some fluctuations  
taking minima values mainly at shiftZ − = 7,9,11,13,15,..,19. 
A graph is given in Fig.41 (a, b, c, d) 
Fig.41a 
Mass Number (%Recurrences)
0 2 4 6 8 10 12 14 16 18 20 22 Z-shift
Fig. 41b 
Mass Number (%Determinism)
0 2 4 6 8 10 12 14 16 18 20 22 Z-shift
Fig. 41c 
Mass Number (Entropy)
0 2 4 6 8 10 12 14 16 18 20 22 Z-shift
Fig. 41d 
Mass Number (Max-Line)
0 2 4 6 8 10 12 14 16 18 20 22 Z-shift
%Det. assumes rather  low values also with a length Line L=3. It oscillates among maxima and minima 
for increasing values of shiftZ − as it is pictured in Fig.41b. Significantly, %Det. goes definitively to zero 
starting with shiftZ − =21. 
In Fig.42 we have the comparison of %Det of Atomic Weights respect to %Det of Mass Numbers. 
     
Fig.42    
Atomic Weights - Mass Number (% Determinism)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Z, atomic number
atomic weigths
mass number
 Looking at the results given in Tables 1 and 2 and linked figures, we deduce that for different values of 
shiftZ − , the corresponding values of %Rec tend to show fluctuations. As it is well known, %Rec 
indicates in some manner presence of pseudoperiodicities. Therefore the rather small fluctuations of 
%Rec  indicate that we are in presence of a mechanism of increasing mass that tends to preserve some 
kind of periodicity and self-resemblance with rather modest fluctuations The more interesting datum is 
given by %Det. In this case we have more marked oscillations showing that in the process of increasing 
mass of stable atomic nuclei we have phase of increasing stability as opposed to phases of decrease 
stability. Here the law is the mechanism of addition of nucleons that is realized at each step between the 
given nucleus and its subsequent as considered in our phase space representation. %Det oscillations 
indicate that the process of progressively addition of nucleons in nuclei happens on the basis of a complex 
non linear mechanism in which the determinism and thus the same predictability of subsequent Mass 
Number and /or Atomic Weights is very complex and so distant from a simple and linear regime of 
addition of matter that we have expected to hold for a very long time. Looking at the values of Entropy, 
expressed in bits, one finds that also in this case oscillations are dominant for increasing values of Z-shift. 
The same happens for MaxLine whose inverse gives estimation of the divergence of the system in 
consideration giving direct indication of a possible chaotic regime. 
In conclusion, by using RQA we conclude that the mechanism of increasing mass in atomic nuclei is 
rather periodical and self-resemblance. We have obtained marked oscillations for the values of RQA 
variables. The important thing to remember here is that we are operating in a reconstructed phase space 
that takes into account o more an isolated nucleus as in the classical nuclear physics discussions, but each 
time pairs of nuclei in the embedded space with dimension D=2.The deriving behaviour of the 
mechanism of increasing mass of atomic nuclei evidences in this case all its complexity. We have now set 
of nuclei that evidence their oscillatory behaviour for % Rec, %Det, Entropy and Max Line.Such 
oscillatory behaviours of classes of nuclei result obviously connected to “periodicities” and mainly to 
classes of similarities that also stable nuclei seem to exhibit. The marked variations in the values of 
determinism indicate that the whole process results rather complex  and it is regulated from phases of 
more stability and subsequent phases of increased instability. 
In order to conclude such kind of research, and to confirm the new results that we have here indicated, we 
have  performed the last kind analysis. In this last case we have in some manner overturned the scheme of 
the previous analysis in the sense that we have selected an embedding dimension D=1. The reader will 
remember that results by FFN gave same uncertainty in selecting the values D=1 or D=2. Our previous 
RQA was performed by using D=2 . In this final exploration we use D=1. In this condition of analysis a 
given value of delay and thus , in our case of shiftZ −  , has no more sense . Each point in phase space is 
given by a value of )(ZWa or of )(ZA . To use RQA we have to select a distance , that is to say a Radius R. 
Using Euclidean distance , R will result to be the difference A∆  between two values of Mass Numbers in 
the case one utilizes )(ZA  for the analysis. In conclusion we have 
NZA += , 222 NZA += . 
The distance , R, to use in RQA will result to be given 
NZA ∆+∆=∆  
We decided to use RQA considering L=3 as Line Length and R increasing step by step from 1 to209. In 
this manner we calculated %Rec, %Det, Entropy and Max Line, for increasing values of ,.....3,2,1=Z  . 
Note that in such kind of analysis we used also shuffled data in order to ascertain the validity of the 
obtained results. In addition , on the obtained data, we used also a Wald-Wolfowitz run test that we 
executed on %Det and on %Det / %Rec, and  the probability that the results were obtained by chance, 
was found to be <0.001. 
The results are now given in Tables 3, 4 and in Figures 43, 44, 45, 46, 47. 
Table 3: Results of Recurrences quantification analysis of Mass Number with embedding D=1 
  and distance R ranging from 1 to 209 
distance 
Rec. 
Det. %Det./%Rec. 
distance 
Rec. 
Det. %Det./%Rec.
distance
Rec. 
Det. %Det./%Rec. 
1 0.029 0.000 0.000 71 42.051 98.742 2.348 141 71.907 99.55 1.384 
2 1.029 8.571 8.329 72 42.051 98.742 2.348 142 72.436 99.513 1.374 
3 1.381 23.404 16.947 73 42.786 98.283 2.297 143 72.436 99.513 1.374 
4 1.381 23.404 16.947 74 43.432 98.241 2.262 144 72.877 99.395 1.364 
5 2.204 66.667 30.248 75 43.432 98.241 2.262 145 73.288 99.399 1.356 
6 3.115 75.472 24.229 76 44.167 99.069 2.243 146 73.288 99.399 1.356 
7 4.29 82.192 19.159 77 44.902 98.822 2.201 147 73.817 99.602 1.349 
8 4.29 82.192 19.159 78 45.519 98.773 2.170 148 74.376 99.526 1.338 
9 5.172 86.364 16.698 79 45.519 98.773 2.170 149 74.904 99.451 1.328 
10 6.024 89.268 14.819 80 46.195 98.885 2.141 150 74.904 99.451 1.328 
11 6.024 89.269 14.819 81 46.812 98.87 2.112 151 75.316 99.532 1.322 
12 7.082 90.45 12.772 82 46.812 98.87 2.112 152 75.727 99.728 1.317 
13 7.875 88.806 11.277 83 47.458 98.885 2.084 153 75.725 99.728 1.317 
14 8.639 91.156 10.552 84 48.193 98.963 2.053 154 76.197 99.691 1.308 
15 8.639 91.156 10.552 85 48.78 98.675 2.023 155 76.609 99.501 1.299 
16 9.58 94.785 9.894 86 48.78 98.675 2.023 156 77.05 99.657 1.293 
17 10.638 94.475 8.881 87 49.398 98.989 2.004 157 77.05 99.657 1.293 
18 10.638 94.475 8.881 88 50.132 99.062 1.976 158 77.667 99.659 1.283 
19 11.666 93.703 8.032 89 50.132 99.062 1.976 159 78.078 99.511 1.275 
20 12.401 95.261 7.682 90 50.779 99.016 1.950 160 78.078 99.511 1.275 
21 13.282 95.575 7.196 91 51.455 99.029 1.925 161 78.343 99.4 1.269 
22 13.282 95.575 7.196 92 52.16 99.155 1.901 162 78.754 99.664 1.266 
23 14.252 94.433 6.626 93 52.16 99.155 1.901 163 79.224 99.703 1.258 
24 15.046 95.508 6.348 94 52.806 98.998 1.875 164 79.224 99.703 1.258 
25 15.046 95.508 6.348 95 53.425 99.065 1.854 165 79.783 99.595 1.248 
26 16.927 97.232 5.744 96 53.425 99.065 1.854 166 80.194 99.45 1.240 
27 18.513 96.508 5.213 97 54.041 99.402 1.839 167 80.635 99.745 1.237 
28 17.602 95.993 5.454 98 54.687 99.087 1.812 168 80.635 99.745 1.237 
29 17.602 95.993 5.454 99 55.245 98.83 1.789 169 81.105 99.565 1.228 
30 18.513 96.508 5.213 100 55.245 98.83 1.789 170 81.399 99.458 1.222 
31 19.16 96.626 5.043 101 55.804 99.052 1.775 171 81.399 99.458 1.222 
32 19.16 96.625 5.043 102 56.303 98.904 1.757 172 81.781 99.748 1.220 
33 19.982 96.176 4.813 103 56.92 99.019 1.740 173 82.222 99.607 1.211 
34 21.04 97.067 4.613 104 56.92 99.019 1.740 174 82.662 99.523 1.204 
35 21.863 97.312 4.451 105 57.743 99.237 1.719 175 82.662 99.573 1.205 
36 21.863 97.312 4.451 106 58.419 98.994 1.695 176 83.133 99.611 1.198 
37 22.656 97.017 4.282 107 58.419 98.994 1.695 177 83.368 99.612 1.195 
38 23.45 97.243 4.147 108 58.919 99.202 1.684 178 83.368 99.612 1.195 
39 24.42 96.51 3.952 109 59.536 99.112 1.665 179 83.75 99.684 1.190 
40 24.42 96.51 3.952 110 60.182 99.121 1.647 180 84.337 99.617 1.181 
41 25.272 96.86 3.833 111 60.182 99.121 1.647 181 84.69 99.722 1.177 
42 25.918 96.485 3.723 112 60.711 99.177 1.634 182 84.69 99.722 1.177 
43 25.918 96.485 3.723 113 61.387 99.234 1.617 183 84.984 99.654 1.173 
44 26.8 96.82 3.613 114 61.387 99.234 1.617 184 85.307 99.724 1.169 
45 27.593 97.551 3.535 115 62.033 99.337 1.601 185 85.307 99.724 1.169 
46 28.299 98.027 3.464 116 62.504 99.436 1.591 186 85.688 99.863 1.165 
47 28.299 98.027 3.464 117 63.033 99.441 1.578 187 86.13 99.693 1.157 
48 29.151 97.984 3.361 118 63.033 99.441 1.578 188 86.483 99.728 1.153 
49 29.944 97.544 3.258 119 63.562 99.353 1.563 189 86.483 99.728 1.153 
50 29.944 97.547 3.258 120 64.091 99.358 1.550 190 86.835 99.763 1.149 
51 30.767 98.376 3.197 121 64.091 99.358 1.550 191 87.129 99.865 1.146 
52 31.472 98.039 3.115 122 64.678 99.5 1.538 192 87.129 99.865 1.146 
53 32.119 98.079 3.054 123 65.207 99.459 1.525 193 87.423 99.832 1.142 
54 32.119 98.079 3.054 124 65.795 99.285 1.509 194 87.775 99.766 1.137 
55 30.059 98.489 3.277 125 65.795 99.285 1.509 195 88.099 99.666 1.131 
56 33.911 98.44 2.903 126 66.441 99.513 1.498 196 88.099 99.666 1.131 
57 33.911 98.44 2.903 127 66.941 99.517 1.487 197 88.481 99.801 1.128 
58 34.558 98.469 2.849 128 66.941 99.517 1.487 198 88.804 99.735 1.123 
59 35.292 98.751 2.798 129 67.382 99.477 1.476 199 89.039 99.736 1.120 
60 36.086 98.616 2.733 130 67.852 99.524 1.467 200 89.039 99.736 1.120 
61 36.086 98.616 2.733 131 68.44 99.614 1.455 201 89.362 99.704 1.116 
62 36.938 98.329 2.662 132 68.44 99.614 1.455 202 89.686 99.803 1.113 
63 37.702 98.051 2.601 133 69.486 99.49 1.432 203 89.686 99.803 1.113 
64 37.702 98.051 2.601 134 69.556 99.493 1.430 204 90.009 99.739 1.108 
65 38.29 98.388 2.570 135 69.909 99.496 1.423 205 90.332 99.707 1.104 
66 39.024 98.494 2.524 136 69.906 99.496 1.423 206 90.538 99.838 1.103 
67 39.788 98.523 2.476 137 70.291 99.373 1.414 207 90.538 99.838 1.103 
68 39.788 98.523 2.476 138 70.79 99.377 1.404 208 90.831 99.871 1.100 
69 40.406 98.545 2.439 139 70.79 99.377 1.404 209 91.155 99.742 1.094 
70 41.17 98.787 2.399 140 71.378 99.506 1.394         
Fig. 43 
Fig. 44 
Fig. 45 
Fig. 46 
Fig. 47 
The obtained results may be considered of valuable interest since they indicate possible new properties 
for Mass Number of atomic nuclei. 
At increasing values of Radius R, % Rec and % Det increase, as it is trivially expected in some general 
case, but the interesting new thing is that, after some regular increasing values of %Rec and %Det, 
occurring every two or three step, soon after the values of RQA variables reach values of stability that so 
remain for two steps in the increasing values of R. In other terms,  in presence of increasing R, we have  
corresponding increasing values of % Rec, %Det, Entropy, followed by a phase in which, still for 
increasing R, the values of RQA variables remain instead constant.  
This is certainly a new mechanism of increasing mass of atomic nuclei that deserves to be carefully 
explained. 
Table 4 
d=2 Z N Z N ∆∆∆∆ Z    ∆∆∆∆ N      d=3 Z N Z N ∆∆∆∆ Z    ∆∆∆∆ N      d=4 Z N Z N ∆∆∆∆ Z    ∆∆∆∆ N    
  3 4 4 5 1 1     1 0 2 2 1 2     3 4 5 6 2 2 
  4 5 5 6 1 1     2 2 3 4 1 2     6 6 8 8 2 2 
  6 6 7 7 1 1     4 5 6 6 2 1     8 8 10 10 2 2 
  7 7 8 8 1 1     5 6 7 7 2 1     9 10 11 12 2 2 
  26 30 28 30 2 0     8 8 9 10 1 2     10 10 12 12 2 2 
  38 50 40 50 2 0     10 10 11 12 1 2     11 12 13 14 2 2 
  52 78 54 78 2 0     12 12 13 14 1 2     12 12 14 14 2 2 
  56 82 58 82 2 0     14 14 15 16 1 2     13 14 15 16 2 2 
  57 82 59 82 2 0     16 16 17 18 1 2     14 14 16 16 2 2 
  66 98 68 98 2 0     21 24 22 26 1 2     15 16 17 18 2 2 
  77 116 78 117 1 1     22 26 23 28 1 2     17 18 19 20 2 2 
  78 117 79 118 1 1     24 28 25 30 1 2     22 26 24 28 2 2 
                  25 30 28 30 3 0     23 28 25 30 2 2 
                  26 30 27 32 1 2     24 28 26 30 2 2 
                  37 48 38 50 1 2     25 30 27 32 2 2 
                  40 50 41 52 1 2     27 32 29 34 2 2 
                  43 56 44 58 1 2     33 42 35 44 2 2 
                  45 58 46 60 1 2     34 46 36 48 2 2 
                  52 78 55 78 3 0     36 48 38 50 2 2 
                  56 82 59 82 3 0     37 48 39 50 2 2 
                  59 82 60 84 1 2     39 50 41 52 2 2 
                  68 98 69 100 1 2     42 56 44 58 2 2 
                  73 108 74 110 1 2     43 56 45 58 2 2 
                  74 110 75 112 1 2     44 58 46 60 2 2 
                  76 116 78 117 2 1     45 58 47 60 2 2 
                  80 122 81 124 1 2     58 82 60 84 2 2 
                  81 124 82 126 1 2     59 82 61 84 2 2 
                                  67 98 69 100 2 2 
                                  72 108 74 110 2 2 
                                  77 116 79 118 2 2 
                                  81 124 83 126 2 2 
                                              
d=6 Z N Z N ∆∆∆∆ Z    ∆∆∆∆ N      d=7 Z N Z N ∆∆∆∆ Z    ∆∆∆∆ N      d=8 Z N Z N ∆∆∆∆ Z    ∆∆∆∆ N    
  1 0 3 4 2 4     2 2 5 6 3 4     1 0 4 5 3 5 
  7 7 10 10 3 3     3 4 7 7 4 3     2 2 6 6 4 4 
  19 20 21 24 2 4     4 5 8 8 4 3     5 6 9 10 4 4 
  21 24 23 28 2 4     6 6 9 10 3 4     6 6 10 10 4 4 
  24 28 28 30 4 2     8 8 11 12 3 4     8 8 12 12 4 4 
  28 30 30 34 2 4     10 10 13 14 3 4     9 10 13 14 4 4 
  29 34 31 38 2 4     12 12 15 16 3 4     10 10 14 14 4 4 
  31 38 33 42 2 4     14 14 17 18 3 4     11 12 15 16 4 4 
  32 42 34 46 2 4     16 16 19 20 3 4     12 12 16 16 4 4 
  35 44 37 48 2 4     21 24 24 28 3 4     13 14 17 18 4 4 
  36 48 40 50 4 2     22 26 25 30 3 4     15 16 19 20 4 4 
  41 52 43 56 2 4     23 28 28 30 5 2     16 16 18 22 2 6 
  48 66 50 70 2 4     24 28 27 32 3 4     16 16 20 20 4 4 
  49 66 51 70 2 4     26 30 29 34 3 4     18 22 22 26 4 4 
  51 70 53 74 2 4     43 56 46 60 3 4     20 20 22 26 2 6 
  53 74 55 78 2 4     47 60 48 66 1 6     22 26 26 30 4 4 
  54 78 56 82 2 4     48 66 51 70 3 4     23 28 27 32 4 4 
  55 78 57 82 2 4     50 70 53 74 3 4     25 30 29 34 4 4 
  56 82 60 84 4 2     54 78 57 82 3 4     26 30 30 34 4 4 
  57 82 61 84 4 2     55 78 58 82 3 4     34 46 38 50 4 4 
  62 90 64 94 2 4     56 82 61 84 5 2     37 48 41 52 4 4 
  63 90 65 94 2 4     61 84 62 90 1 6     40 50 42 56 2 6 
  64 94 66 98 2 4     62 90 65 94 3 4     42 56 46 60 4 4 
  65 94 67 98 2 4     64 94 67 98 3 4     43 56 47 60 4 4 
  69 100 71 104 2 4     65 94 68 98 3 4     46 60 48 66 2 6 
  70 104 72 108 2 4     70 104 73 108 3 4     47 60 49 66 2 6 
  71 104 73 108 2 4     72 108 75 112 3 4     52 78 56 82 4 4 
  73 108 75 112 2 4     78 117 80 122 2 5     54 78 58 82 4 4 
  75 112 77 116 2 4     80 122 83 126 3 4     55 78 59 82 4 4 
  80 122 82 126 2 4                     60 84 62 90 2 6 
                                  61 84 63 90 2 6 
                                  64 94 68 98 4 4 
                                  68 98 70 104 2 6 
                                  74 110 76 116 2 6 
                                  75 112 78 117 3 5 
                                  79 118 81 124 2 6 
                                              
d=9 Z N Z N ∆∆∆∆ Z    ∆∆∆∆ N      d=10 Z N Z N ∆∆∆∆ Z    ∆∆∆∆ N      d=11 Z N Z N ∆∆∆∆ Z    ∆∆∆∆ N    
  3 4 8 8 5 4     1 0 5 6 4 6     1 0 6 6 5 6 
  5 6 10 10 5 4     2 2 7 7 5 5     4 5 10 10 6 5 
  7 7 11 12 4 5     4 5 9 10 5 5     6 6 11 12 5 6 
  9 10 14 14 5 4     7 7 12 12 5 5     8 8 13 14 5 6 
  11 12 16 16 5 4     17 18 21 24 4 6     10 10 15 16 5 6 
  15 16 18 22 3 6     21 24 25 30 4 6     12 12 17 18 5 6 
  15 16 20 20 5 4     22 26 28 30 6 4     14 14 19 20 5 6 
  19 20 22 26 3 6     27 32 31 38 4 6     18 22 23 28 5 6 
  25 30 30 34 5 4     30 34 32 42 2 8     20 20 23 28 3 8 
  33 42 36 48 3 6     31 38 35 44 4 6     21 24 26 30 5 6 
  34 46 39 50 5 4     32 42 36 48 4 6     22 26 27 32 5 6 
  35 44 38 50 3 6     33 42 37 48 4 6     24 28 29 34 5 6 
  36 48 41 52 5 4     34 46 40 50 6 4     28 30 31 38 3 8 
  39 50 42 56 3 6     35 44 39 50 4 6     29 34 32 42 3 8 
  40 50 43 56 3 6     38 50 42 56 4 6     30 34 33 42 3 8 
  41 52 44 58 3 6     39 50 43 56 4 6     31 38 34 46 3 8 
  42 56 47 60 5 4     41 52 45 58 4 6     32 42 37 48 5 6 
  46 60 49 66 3 6     50 70 52 78 2 8     35 44 40 50 5 6 
  51 70 52 78 1 8     52 78 58 82 6 4     38 50 43 56 5 6 
  52 78 57 82 5 4     65 94 69 100 4 6     45 58 48 66 3 8 
  54 78 59 82 5 4     66 98 70 104 4 6     51 70 54 78 3 8 
  60 84 63 90 3 6     67 98 71 104 4 6     52 78 59 82 7 4 
  67 98 70 104 3 6     70 104 74 110 4 6     53 74 56 82 3 8 
  68 98 71 104 3 6     75 112 79 118 4 6     55 78 60 84 5 6 
  71 104 74 110 3 6     76 116 80 122 4 6     59 82 62 90 3 8 
  74 110 77 116 3 6     78 117 81 124 3 7     63 90 66 98 3 8 
  77 116 80 122 3 6                     64 94 69 100 5 6 
                                  66 98 71 104 5 6 
                                  69 100 72 108 3 8 
                                  73 108 76 116 3 8 
                                  74 110 78 117 4 7 
                                  79 118 82 126 3 8 
                                              
d=13 Z N Z N ∆∆∆∆ Z    ∆∆∆∆ N      d=14 Z N Z N ∆∆∆∆ Z    ∆∆∆∆ N      d=15 Z N Z N ∆∆∆∆ Z    ∆∆∆∆ N    
  1 0 7 7 6 7     4 5 11 12 7 7     1 0 8 8 7 8 
  3 4 10 10 7 6     7 7 14 14 7 7     2 2 9 10 7 8 
  5 6 12 12 7 6     15 16 21 24 6 8     4 5 12 12 8 7 
  7 7 13 14 6 7     21 24 27 32 6 8     6 6 13 14 7 8 
  9 10 16 16 7 6     25 30 31 38 6 8     8 8 15 16 7 8 
  13 14 18 22 5 8     32 42 38 50 6 8     10 10 17 18 7 8 
  13 14 20 20 7 6     33 42 39 50 6 8     12 12 19 20 7 8 
  16 16 21 24 5 8     35 44 41 52 6 8     18 22 25 30 7 8 
  17 18 22 26 5 8     36 48 42 56 6 8     20 20 25 30 5 10 
  19 20 24 28 5 8     37 48 43 56 6 8     22 26 29 34 7 8 
  21 24 28 30 7 6     38 50 44 58 6 8     27 32 32 42 5 10 
  23 28 30 34 7 6     39 50 45 58 6 8     30 34 35 44 5 10 
  26 30 31 38 5 8     41 52 47 60 6 8     31 38 36 48 5 10 
  33 42 38 50 5 8     46 60 50 70 4 10     32 42 39 50 7 8 
  34 46 41 52 7 6     47 60 51 70 4 10     33 42 40 50 7 8 
  37 48 42 56 5 8     52 78 60 84 8 6     36 48 43 56 7 8 
  39 50 44 58 5 8     53 74 59 82 6 8     38 50 45 58 7 8 
  40 50 45 58 5 8     56 82 62 90 6 8     43 56 48 66 5 10 
  41 52 46 60 5 8     57 82 63 90 6 8     46 60 51 70 5 10 
  44 58 49 66 5 8     60 84 64 94 4 10     49 66 52 78 3 12 
  47 60 50 70 3 10     61 84 65 94 4 10     52 78 61 84 9 6 
  48 66 53 74 5 8     62 90 68 98 6 8     56 82 63 90 7 8 
  50 70 55 78 5 8     68 98 72 108 4 10     60 84 65 94 5 10 
  53 74 58 82 5 8     73 108 78 117 5 9     65 94 70 104 5 10 
  54 78 61 84 7 6     78 117 83 126 5 9     67 98 72 108 5 10 
  57 82 62 90 5 8                     68 98 73 108 5 10 
  58 82 63 90 5 8                     69 100 74 110 5 10 
  61 84 64 94 3 10                     72 108 78 117 6 9 
  62 90 67 98 5 8                     75 112 80 122 5 10 
  63 90 68 98 5 8                     77 116 82 126 5 10 
  70 104 75 112 5 8                                 
  72 108 77 116 5 8                                 
  74 110 79 118 5 8                                 
  76 116 81 124 5 8                                 
  78 117 82 126 4 9                                 
In Table 4 we give the scheme of increasing R corresponding to A∆  and the corresponding variations in 
the number of nucleons as they are induced step by step. Obviously this table 4 cannot be complete. 
However,  the exposition of the process, also limited to few cases of interest, will contribute to elucidate 
the mechanism under consideration. In brief for 2=∆A  we have oscillation in the values of RQA 
variables but they soon after return to be stable for 3=∆A and 4=∆A . After we pass to 6=∆A where again 
RQA variables are unstable but they return to be stable for 7=∆A and 8=∆A . The next step is 9=∆A with 
instability, followed from stable values for 10=∆A and 11=∆A . We may continue with 13=∆A  that is 
unstable but followed from stable 14=∆A and 15=∆A .The same thing happens for 
...........180.............119........80........38343027232016 ororororororororororororA =∆ . To each given 
unstable A∆  value , will correspond two subsequent stable values that respectively will be given at 
17=∆A and 18=∆A ; at 21=∆A and 22=∆A , ……….. at 120=∆A and 121=∆A , ….. at 181=∆A and 
182=∆A . 
Instabilities are present every three or four increasing values of A∆ . Systematically, each of them is 
followed by stable values at the two subsequent increments of A∆ . 
In conclusion the law seems as it follows: for each pair of nuclei , fixed the value of A∆  with unstable 
value of the RQA variables, the addition of one nucleon by two subsequent steps   stabilizes the values of 
such variables. Obviously, for each selected value of A∆  we have a class of pair of nuclei as indicated as 
example in Table 4. 
In conclusion, the use of RQA variables has cleared that we are in presence of new features for atomic 
nuclei that deserve to be properly explained. We intend to say that the next step of the present research 
should be now to link the different results that have been obtained with  concrete evidences expressible in 
terms of basic concepts of nuclear physics. If on one hand some of such new findings are just evident by 
itself  on the other hand we cannot ignore that in this paper we have moved more on the line of the 
notions as they are contained in the methods that we have used. More concretely : referring as example to 
the basic results that we have obtained by using RQA, and, in particular, to the last results as given by 
using embedding dimension D=1 and reported in Tables 3 and 4 and in Figures 43-47, we cannot ignore 
that we have to consider now pairs of nuclei with given A∆  and thus to identify pairs of subsequent stable 
nuclei and, following this way, to find some new regularities in Z, N and to give new classifications of 
nuclei to different groups using such regularities. In short, the results that we have obtained should reveal 
new regularities about ground states of nuclei not found so easily by other methods. Consequently, this 
new approach might be very useful and important. The aim is to pursue such research work in our future 
investigations. 
8. Conclusions 
In the present paper we have introduced a preliminary but complete analysis of Atomic Weights and Mass 
Number using the methods of non linear analysis. 
We have obtained some  results that appear to be of some interest in understanding the basic foundations 
of nuclear matter. As methodology, we have applied the tests of autocorrelation function and of Mutual 
Information. We have also provided to a reconstruction of the experimental data in phase space giving 
results on Lyapunov spectrum and Correlation Dimension. We have performed an analysis to establish 
the presence of a power law in data on Atomic Weights and Mass Number and such kind of analysis has 
been completed by using the technique of the variogram. The results seem to confirm the presence of a 
fractal regime in the process of increasing mass of atomic nuclei. The estimation of Husrt exponent has 
enabled us to indicate that we would be in presence of a fractional Brownian regime with long range 
correlations.  
To summarize: Some preliminary results have been obtained. The mechanism of increasing mass in 
atomic nuclei reveals itself to be  a nonlinear mechanism marked by a non integer value of Correlation 
dimension in phase space reconstruction. The presence of positive Lyapunov exponents indicate that the 
system of mass increasing is divergent and thus possibly chaotic. By using an identified Power Law and 
the variogram technique we may conclude that we are in presence of a fractal regime, a fractional 
Brownian regime. 
The most relevant results have been  obtained by using RQA. The process under our investigation results 
to be not fully deterministic when considering an embedding dimension D=2. We are in presence of self-
resemblance and pseudo periodicities that show small fluctuations at increasing value of shiftZ −  while 
instead Determinism shows consistent variations at increasing values of such parameter. Also Entropy 
and Max Line reveal the same tendency. Therefore, in the same framework of stable nuclei we have phase 
of increasing stability or increasing instability, depending on the mechanism of composition of the 
considered atomic nuclei and on the differences that they exhibit in the values of their Atomic Weights 
and of Mass Number. A final important result is obtained by using RQA in phase space reconstruction 
using embedding dimension D=1 and increasing Radius R corresponding to net differences in Mass 
number of the considered atomic nuclei. In this case, in phase space reconstruction, RQA involves pairs 
of nuclei in our analysis. New properties are identified at the increasing values of A∆ . In particular, 
determinism oscillates but at some regular distances it also shows definite constant values as well as the 
other RQA variables . This confirms that we are in presence of a mechanism of increasing mass of atomic 
nuclei in which phases of stability result subsequent to phases of instability possibly marked from 
conditions of order-disorder like transitions. We have to consider pairs of nuclei with fixed A∆  and to 
identify pairs of subsequent stable nuclei that indicate new regularities in Z, N that we need to indicate in 
detail . We have to classify nuclei pertaining to different groups using these new regularities. This 
approach might be of valuable interest and it will constitute the object of our future work.  
In this framework, the next step of the present investigation will be also  to analyze data corresponding to 
values of binding energies for atomic nuclei. Possibly the complex of such results will give the possibility 
to indicate new perspectives in the elaboration of more accurate nuclear models of nuclear matter. 
Acknowledgement 
Many thanks are due to M. Pitkanen for his continuous and stimulating interest, suggestions and 
encouragement through this work. 
Software NDT by J. Reiss and VRA by E. Kononov were also used for general non linear analysis. 
REFERENCES 
[1] C.F. von Weizsacher, Z. Phys., 96, 431-458, 1935 
[2] P. Leboeuf, Regularity and Chaos in the nuclear masses, arXiv:nucl-th/0406064; see also H.Olofsson, 
S. Alberg, O. Bohigas, P. Leboeuf, Correlations in Nuclear Masses, arXiv:nucl-th (0602041 v1 13 
Feb.2006 and O. Bohigas, P. Leboeuf, Nuclear Masses: evidence of order-chaos coexistence, 
arXiv:nucl-th/0110025v2 28 Nov. 2001 and references therein. 
[3] A. Bohr and B.R. Mottelson , Nuclear Structure vol. I, Benjamin Reading ,1969. 
[4] V.M. Strutinsky, Nucl. Phys. A95, 420-442,1967 ). 
[5]V. Paar, N. Parvin, A. Rubcic, J. Rubcic, Chaos, Solitons and Fractals, 14, 901-916, 2002 
[6]H. Kroger, Fractal geometry in quantum mechanics, Phys. Rep.323, 81-181, 2000. 
[7] M. Pitkanen, TGD and Nuclear Physics in book p-adic length scale hypothesis and dark matter 
hierarchy,www.helsinki.fi/∼matpitka/paddark/paddark.html≠padnucl. 
[8]G.A. Lalazissis, D. Vrtenar, N. Paar, P. Ring, Chaos, Solitons and Fractals,17, 585-590, 2003. 
[9]M.A.Azar . K. Gopala, Phys.Rev.A39, 5311-5318, 1989 and Phys. Rev. A37, 2173-2180, 1988. 
[10] A.M. Fraser, H.L. Swinney, Independent Coordinates for strange attractors from mutual Information, 
Phys. Rev. A33,1134-1137, 1986 
[11] For details see as example: BB. Mandelbrot et al. SIAM Review,10, 422-437,10,1968; SHEN Wei, 
Zhao Pengda, Multidimensional self-affine distribution with application in geochemistry, Math. 
Geology, 34, 2,109-123, 2002, E. Conte, J.P Zbilut  et al., Chaos, Solitons and Fractals, 29, 701-730, 
2006; 
[12] J. Feder, Fractals, Plenum, New York,1988. 
[13] C.L Webber. Jr, J.P. Zbilut, Dynamical Assessment of physiological systems and states using 
recurrence plot strategies, J. Appl. Physiol. 76, 965-973, 1994. The package of RQA software may be 
free downloaded at http://homepages.luc.edu/∼CWebber/.
ABSTRACT
  For the first time we apply the methodologies of nonlinear analysis to
investigate atomic matter. We use these methods in the analysis of Atomic
Weights and of Mass Number of atomic nuclei. Using the AutoCorrelation Function
and Mutual Information we establish the presence of nonlinear effects in the
mechanism of increasing mass of atomic nuclei considered as a function of the
atomic number. We find that increasing mass is divergent, possibly chaotic. We
also investigate the possible existence of a Power Law for atomic nuclei and,
using also the technique of the variogram, we conclude that a fractal regime
could superintend to the mechanism of increasing mass for nuclei. Finally,
using the Hurst exponent, evidence is obtained that the mechanism of increasing
mass in atomic nuclei is in the fractional Brownian regime. The most
interesting results are obtained by using Recurrence Quantification Analysis
(RQA). New recurrences, psudoperiodicities, self-resemblance and class of
self-similarities are identified with values of determinism showing oscillating
values indicating the presence of more or less stability during the process of
increasing mass of atomic nuclei. In brief, new regimes of regularities are
identified for atomic nuclei that deserve to be studied by future researches.
In particular an accurate analysis of binding energy values by nonlinear
methods is further required.

<|endoftext|><|startoftext|>
Introduction
Recent Chandra and XMM Newton X-ray observations of active galactic nuclei (AGNs)
have detected a new absorption feature in the 15-17 Å wavelength range. This has been iden-
tified as an unresolved transition array (UTA) due mainly to 2p− 3d inner shell absorption
in iron ions with an open M-shell (Fe I - Fe XVI). UTAs have been observed in IRAS
13349+2438 (Sako et al. 2001), Mrk-509 (Pounds et al. 2001), NGC 3783 (Blustin et al.
2002; Kaspi et al. 2002; Behar et al. 2003), NGC 5548 (Steenbrugge et al. 2003), MR 2251-
178 (Kaspi et al. 2004), I Zw 1 (Gallo et al. 2004), NGC 4051 (Pounds et al. 2004), and
NGC 985 (Krongold et al. 2005).
Based on atomic structure calculations and photoabsorbtion modeling, Behar et al.
(2001) have shown that the shape, central wavelength, and equivalent width of the UTA
can be used to diagnose the properties of AGN warm absorbers. However, models which fit
well absorption features from second and third row elements cannot reproduce correctly the
observed UTAs due to the fourth row element iron. The models appear to predict too high
an ionization level for iron. Netzer et al. (2003) attributed this discrepancy to an underesti-
mate of the low temperature dielectronic recombination (DR) rate coefficients for Fe M-shell
ions. To investigate this possibility Netzer (2004) and Kraemer et al. (2004) arbitrarily in-
creased the low temperature Fe M-shell DR rate coefficients. Their model results obtained
with the modified DR rate coefficients support the hypothesis of Netzer et al. (2003). New
calculations by Badnell (2006a) using a state-of-the-art theoretical method disscused in § 5
further support the hypotesis of Netzer et al. (2003).
Astrophysical models currently use the DR data for Fe M-shell ions recommended
by Arnaud & Raymond (1992). These data are based on theoretical DR calculations by
Jacobs et al. (1977) and Hahn (1989). The emphasis of this early theoretical work was
on producing data for modeling collisional ionization equilibrium (sometimes also called
coronal equilibrium). Under these conditions an ion forms at a temperature about an
order of magnitude higher than the temperature where it forms in photoionized plasmas
(Kallman & Bautista 2001). The use of the Arnaud & Raymond (1992) recommended DR
data for modeling photoionized plasmas is thus questionable. Benchmarking by experiment
is highly desirable.
Reliable experimentally-derived low temperature DR rate coefficients of M-shell iron
ions are just now becoming available. Until recently, the only published Fe M-shell DR mea-
surements were for Na-like Fe XVI (Linkemann et al. 1995; Müller 1999; here and throughout
we use the convention of identifying the recombination process by the initial charge state
of the ion). The Na-like measurements were followed up with modern theoretical calcula-
tions (Gorczyca & Badnell 1996; Gu 2004; Altun et al. 2007). Additional M-shell experi-
mental work also exists for Na-like Ni XVIII (Fogle et al. 2003) and Ar-like Sc IV and Ti V
(Schippers et al. 1998, 2002). We have undertaken to measure low temperature DR for other
Fe M-shell ions. Our results for Al-like Fe XIV are presented in Schmidt et al. (2006) and
Badnell (2006b). The present paper is a continuation of this research.
DR is a two-step recombination process that begins when a free electron approaches
an ion, collisionally excites a bound electron of the ion and is simultaneously captured into
a Rydberg level n. The electron excitation can be labeled Nlj → N
′l′j′ where N is the
principal quantum number of the core electron, l its orbital angular momentum, and j its
total angular momentum. The intermediate state, formed by simultaneous excitation and
capture, may autoionize. The DR process is complete when the intermediate state emits a
photon which reduces the total energy of the recombined ion to below its ionization limit.
In this paper we present experimental and theoretical results for ∆N=N ′ −N = 0 DR
of Mg-like Fe XV forming Al-like Fe XIV. In specific we have studied 3 → 3 DR via the
resonances:
Fe14+(3s2[1S0]) + e
Fe13+(3s3p[3P o
0,1,2;
1P1]nl)
Fe13+(3s3d[3D1,2,3;
1D2]nl)
Fe13+(3p2[3P0,1,2;
1S0]nl)
Fe13+(3p3d[3Do
1,2,3;
2,3,4;
0,1,2;
; 1Do
; 1F o
Fe13+(3d2[3P0,1,2;
3F2,3,4;
1S0]nl)
Possible contributions due to 3s3p 3P metastable parent ions will be discussed below. Table 1
lists the excitation energies for the relevant Fe XV levels, relative to the ground state, that
have been considered in our theoretical calculations. In our studies we have carried out
measurements for electron-ion center-of-mass collision energies Ecm between 0 and 45 eV.
Our work is motivated by the “formation zone” of Fe M-shell ions in photoionized gas.
This zone may be defined as the temperature range where the fractional abundance of a given
ion is greater than 10% of its peak value (Schippers et al. 2004). We adopt this definition
for this paper. Savin et al. (1997, 1999, 2002a,b, 2006) defined this zone as the temperature
range where the fractional abundance is greater than 10% of the total elemental abundance.
This is narrower than the Schippers et al. (2004) definition. For Fe XV the wider definition
corresponds to a kBTe ≈ 2.5-15 eV (Kallman & Bautista 2001). It should be kept in mind
that this temperature range depends on the accuracy of the underlying atomic data used to
calculate the ionization balance.
The paper is organized as follows: The experimental arrangement for our measure-
ments is described in § 2. Possible contamination of our parent ion beam by metastable
ions is discussed in § 3. Our laboratory results are presented in § 4. In this section the
experimentally-derived DR rate coefficient for a Maxwellian plasma is provided as well.
Theoretical calculations which have been carried out for comparison with our experimental
results are discussed in § 5. Comparison between the experimental and theoretical results is
presented in § 6. A summary of our results is given in § 7.
2. Experimental Technique
DR measurements were carried out at the heavy-ion test storage ring (TSR) of the
Max-Planck Institute for Nuclear Physics (MPI-K) in Heidelberg, Germany. A merged
beams technique was used. A beam of 56Fe14+ with an energy of 156 MeV was provided
by the MPI-K accelerator facility. Ions were injected into the ring and their energy spread
reduced using electron cooling (Kilgus et al. 1990). Typical waiting times after injection
and before measurement were ≈ 1 s. Mean stored ion currents were ≈ 10 µA. Details of
the experimental setup have been given elsewhere (Kilgus et al. 1992; Lampert et al. 1996;
Schippers et al. 1998, 2000, 2001).
Recently a second electron beam has been installed at the TSR (Sprenger et al. 2004;
Kreckel et al. 2005). This allows one to use the first electron beam for continuous cooling
of the stored ions and to use the second electron beam as a target for the stored ions. In
this way a low velocity and spatial spread of the ions can be maintained throughout the
course of a DR measurement. The combination of an electron cooler and an electron target
can be used to scan energy-dependent electron-ion collision cross sections with exceptional
energy resolution. In comparison to the electron cooler, the electron source and the electron
beam are considerably smaller and additional procedures, such as the stabilization of the
beam positions during energy scans and electron beam profile measurements, are required
to control the absolute luminosity product between the ion and electron beam on the same
precise level as reached at the cooler. The target electron beam current was ≈ 3 mA. The
beam was adiabatically expanded from a diameter of 1.6 mm at the cathode to 7.5 mm
in the interaction region using an expansion factor of 22. This was achieved by lowering
the guiding magnetic field from 1.28 T at the cathode to 0.058 T in the interaction region
thus reducing the transverse temperature to approximately 6 meV. The relative electron-
ion collision energy can be precisely controlled and the recombination signal measured as a
function of this energy. We estimate that the uncertainty of our scale for Ecm is . 0.5%.
The electrons are merged and demerged with the ion beam using toroidal magnets.
After demerging, the primary and recombined ion beams pass through two correction dipole
magnets and continue into a bending dipole magnet. Recombined ions are bent less strongly
than the primary ion beam and they are directed onto a particle detector used in single
particle counting mode. Some of the recombined ions can be field-ionized by motional
electric fields between the electron target and the detector and thus are not detected. Here
we assumed a sharp field ionization cutoff and estimated for Fe XV that only electrons
captured into nmax . 80 are detected by our experimental arrangement.
The experimental energy distribution can be described as a flattened Maxwellian dis-
tribution. It is characterized by the transversal and longitudinal temperatures T⊥ and T‖,
respectively. The experimental energy spread depends on the electron-ion collision energy
and can be approximated according to the formula ∆E = ([ln(2)kBT⊥]
2+16 ln(2)EcmkBT‖)
(Pastuszka et al. 1996). For the comparison of our theoretical calculations with our experi-
mental data we convolute the theoretical results described in § 5 with the velocity distribution
function given by Dittner et al. (1986) to simulate the experimental energy spread.
With the new combination of an electron target and an electron cooler we obtain in the
present experiment electron temperatures of kBT⊥ ≈ 6 meV and kBT‖ ≈ 0.05 meV. In order
to verify the absolute calibration of the absolute rate coefficient scale we also performed
a measurement with the electron cooler using the previous standard method (Kilgus et al.
1992, Lampert et al. 1996). We find consistent rate coefficients and spectral shapes, while
the electron temperatures were larger by a factor of about 2 with the electron cooler alone.
Moreover, because of the large density of resonances found in certain regions of the Fe XV
DR spectrum the determination of the background level for the DR signal was considerably
more reliable in the higher resolution electron target data than in the lower resolution cooler
data. Hence, we performed the detailed analysis presented below on the electron target data
only.
Details of the experimental and data reduction procedure are given in Schippers et al.
(2001, 2004) and Savin et al. (2003) and reference therein. The baseline experimental un-
certainty (systematic and statistical) of the DR measurements is estimated to be ±25% at
a 90% confidence level (Lampert et al. 1996). The major sources of uncertainties include
the electron beam density determination, ion current measurements, and corrections for the
merging and demerging of the two beams. Additional uncertainties discussed below result in
a higher total experimental uncertainty as is explained in §§ 3 and 4. Unless stated otherwise
all uncertainties in this paper are cited at an estimated 90% confidence level.
3. Metastable Ions
For Mg-like ions with zero nuclear spin (such as 56Fe), the 1s22s22p63s3p 3P0 level is
forbidden to decay to the ground state via a one-photon transition and the multiphoton
transition rate is negligible. Hence this level can be considered as having a nearly infinite
lifetime (Marques, Parente, & Indelicato 1993; Brage et al. 1998). It is possible that these
metastables are present in the ion beam used for the present measurements.
We estimate that the largest possible metastable 3P0 fraction in our stored beam is 11%.
This assumes that 100% of the initial Fe14+ ions are in 3PJ levels and that the levels are
statistically populated. We expect that the J = 1 and 2 levels will radiatively decay to the
ground state during the ∼ 1 s between injection and measurement. The lifetimes of the 3P1
and 3P2 levels are ∼ 1.4 × 10
−10 s (Marques et al. 1993) and ∼ 0.3 s (Brage et al. 1998),
respectively. These decays leave 1/9th or 11% of the stored ions in the 3P0 level.
Our estimate is only slightly higher than the inferred metastable fraction for the ion
beam used for DR measurements of the analogous Be-like Fe22+ (Savin et al. 2006). The
Be-like system has a metastable 1s22s2p 3P0 state and following the above logic the stored
Be-like ion beam had an estimated maximum 11% 3P0 fraction. Fortunately, for the Be-like
measurements we were able to identify DR resonances due to the 3P0 parent ion and use the
ratio of the experimental to theoretical resonance strengths to infer the 3P0 fraction. There
we determined a metastable fraction of 7% ± 2%. A similar fraction was inferred for DR
measurements with Be-like Ti18+ ions (Schippers et al. 2007).
Using theory as a guide, we have searched our Mg-like data fruitlessly for clearly iden-
tifiable DR resonances due to metastable 3P0 parent ions. First, following our work in the
analogous Be-like Fe22+ with its 2s2p 3P0 → 2p
2 core excitation channel (Savin et al. 2006),
we searched for Fe14+ resonances associated with the relevant 3s3p 3P0 → 3p
2 core exci-
tations. However, most of these yield only very small DR cross sections as they strongly
autoionize into the 3s3p 3PJ=1,2 continuum channels. These are energetically open at Ecm
greater than 0.713 eV and 2.468 eV, respectively (Table 1). Hence, above Ecm ≈ 0.713 eV
there are no predicted significant DR resonances for metastable Fe14+ via 3s3p 3P0 → 3p
core excitations. Below this energy the agreement between theory and experiment is ex-
tremely poor (as can be seen in Fig. 1) and we are unable to assign unambiguously any DR
resonance to either the ground state or metastable parent ion. Second, we searched for reso-
nances associated with 3s3p 3P0 → 3s3p
1P1, 3s3p
3P0 → 3s3p
3P1, and 3s3p
3P0 → 3s3p
core excitation which are energetically possible for capture into the n ≥ 14, 62, and 33 levels,
respectively, and which may contribute to the observed resonance structures. The analogous
2s2p 3P0 → 2s2p
1P1 and 2s2p
3P0 → 2s2p
3P2 core excitations were seen for Be-like Ti
(Schippers et al. 2007). However, again the complexity of the Fe XV DR resonance spec-
trum (cf., Fig. 1) prevented unambiguous identification for DR via any of these three core
excitations. Hence despite these two approaches, we have been unable to directly determine
the metastable fraction of our Fe14+ beam.
Clearly our assumption that the 3PJ levels are statistically populated is questionable.
Ion beam generation using beam foil techniques are known to produce excited levels. The
subsequent cascade relaxation could potentially populated the J levels non-statistically
(Martinson & Gaupp 1974; Quinet et al. 1999). Additionally the magnetic sublevels mJ
can be populated non-statistically (Martinson & Gaupp 1974) which may affect the J lev-
els. However, our argument in the above paragraphs that the 3PJ levels are statistically
populated yields 3P0 fractions of the analgous Be-like Ti
18+ and Fe22+ of 11% while our
measurements found metastable fractions of ∼ 7% for those two beams. From this we con-
clude either (a) that if 100% of the initial ions are in the 3PJ levels, then the J = 0 level is
statistically under-populated or (b) that the fraction of initial ions in the 3PJ levels is less
than 100% by a quantity large enough that any non-statistical populating of the various J
levels still yields only a 7% 3P0 metastable fraction of the ion beam. Thus we believe that
our assumption provides a reasonable upper limit to the metastable fraction of the Fe14+
beam.
Based on our estimates above and the Be-like results we have assumed that 6%±6% of
the Fe14+ ions are in the 3s3p 3P0 metastable state and the remaining fraction in the 3s
2 1S0
ground state. Here, we treat this possible 6% systematic error as a stochastic uncertainty
and add it in quadrature with the 25% uncertainty discussed above.
4. Experimental Results
Our measured 3 → 3 DR resonance spectrum for Fe XV is shown in Figs. 1 - 8. The
data 〈σv〉 represent the summed DR and radiative recombination (RR) cross sections times
the relative velocity convolved with the energy spread of the experiment, i.e., a merged beam
recombination rate coefficient (MBRRC).
The strongest DR resonance series corresponds to 3s2 1S0 → 3s3p
1P1 core excitations.
Other observed features in the DR resonance spectrum are possibly due to double core
excitations discussed in § 1. Trielectronic recombination (TR), as this has been named, has
been observed in Be-like ions (Schnell et al. 2003a,b; Fogle et al. 2005). These ions are the
second row analog to third row Mg-like ions. However in our data unambiguous assignment
of possible candidates for the TR resonances could not be made.
Extracted resonance energies Ei and resonance strengths Si for Ecm ≤ 0.95 eV are listed
in Table 2 along with their fitting errors. These data were derived following the method
outlined in Kilgus et al. (1992). Most of these resonances were not seen in any of the
theoretical calculations for either ground state or metastable Fe14+. Hence their parentage
is uncertain. The implications of this are discussed below.
Difficulties in determining the non-resonant background level of the data contributed
an uncertainty to the extracted DR resonance strengths. For the strongest peaks this was
on the order of ≈ 10% for Ecm . 5 eV and ≈ 3% for Ecm & 5 eV. Taking into account the
25% and 6% uncertainties discussed in §§ 2 and 3, respectively, this results in an estimated
total experimental uncertainty for extracted DR resonance strengths of ±28% below ≈ 5 eV
and ±26% above.
Due to the energy spread of the electron beam, resonances below Ecm . kBT⊥ cannot be
resolved from the near 0 eV RR signal. Here this limit corresponds to ≈ 6 meV. But we can
infer the absence of resonances lying below the lowest resolved resonance at 6.74 meV. For
Ecm . kBT‖, a factor of up to ∼ 2− 3 enhanced MBRRC is observed in merged electron-ion
beam experiments (see e.g., Gwinner et al. 2000; Heerlein et al. 2002). Here this temperature
limit corresponds to Ecm . 0.05 meV. As shown in Fig. 9, at an energy 0.005 meV our
MBRRC is a factor of 2.5 times larger than the fit to our data using the RR cross section
from semi-classical RR theory with quantum mechanical corrections (Schippers et al. 2001)
and the extracted DR resonance strengths and energies. This enhancement is comparable
to that found for systems with no unresolved DR resonances near 0 eV (e.g., Savin et
al. 2003 and Schippers at al. 2004). Hence, we infer that there are no additional significant
unresolved DR resonances below 6.74 meV. Recent possible explanations for the cause of the
enhancement near 0 eV have been given by Hörndl et al. (2005, 2006) and reference therein.
We have generated an experimentally-derived rate coefficient for 3 → 3 DR of Fe XV
forming Fe XIV in a plasma with a Maxwellian electron energy distribution (Fig. 10). For
Ecm ≤ 0.95 eV we have used our extracted resonance strengths listed in Table 2. For
energies Ecm ≥ 0.95 eV we have numerically integrated our MBRRC data after subtracting
out the non-resonant background. The rate coefficient was calculated using the methodology
outlined in Savin (1999) for resonance strengths and in Schippers et al. (2001) for numerical
integration.
In the present experiment only DR involving capture into Rydberg levels with quantum
numbers nmax . 80 contribute to the measured MBRRC. In order to generate a total ∆N=0
plasma rate coefficient we have used AUTOSTRUCTURE calculations (see § 5) to account
for DR into higher n levels. As is discussed in more detail in § 6, between 25-42 eV we find
good agreement between the experimental and AUTOSTRUCTURE resonance energies.
However, the theoretical results lie a factor of 1.31 above the measurement. To account
for DR into n ≥ nmax = 80, above 42 eV we replaced the experimental data with the
AUTOSTRUCTURE results (nmax =1000) reduced by a factor of 1.31. Our resulting rate
coefficient is shown in Fig. 10.
Including the DR contribution due to capture into n > 80 increases our experimentally-
derived DR plasma rate coefficient by < 1% for kBTe < 7 eV, by < 2.5% at 10 eV and by
< 7% at 15 eV. This contribution increases to 20% at 40 eV, rises to 27% at 100 eV and
saturates at ≈ 35% at 1000 eV. Thus we see that accounting for DR into n > nmax = 80
levels has only a small effect at temperatures of kBTe ≈ 2.5-15 eV where Fe XV is predicted
to form in photoionized gas (Kallman & Bautista 2001). Also, any uncertainties in this
theoretical addition, even if relatively large, would still have a rather small effect at these
temperatures on our derived DR total rate coefficient. Hence, we have not included this in
our determination below of the total experimental uncertainty for the experimentally-derived
plasma rate coefficient at kBTe ≥ 1 eV.
The two lowest-energy resonances in the experimental spectrum occur at energies of
6.74 meV and 9.80 meV with resonance strengths of 1.89 × 10−16 cm2 eV and 1.01 ×
10−17 cm2 eV, respectively (see Table 1 and Fig. 9). As already mentioned, the parent-
age for the two lowest energy resonances is uncertain. These resonances dominate the DR
rate coefficient for kBTe < 0.24 eV. The contribution is 50% at 0.24 eV, 16% at 0.5 eV,
6.5% at 1 eV, 2.4% at 2.5 eV, and < 0.31% above 15 eV. At temperatures where Fe XV is
predicted to form in photoionized plasmas, contributions due to these two resonances are
insignificant. Because of this, we do not include the effects of these two resonances when
calculating below the total experimental uncertainty for the experimentally-derived plasma
rate coefficient at kBTe ≥ 1 eV.
An additional source of uncertainty in our results is due to possible contamination of
the Fe XV beam by metastable 3P0 ions. Because we cannot unambiguously identify DR
resonances due to metastable parent ions, we cannot directly subtract out any contributions
they may make to our experimentally-derived rate coefficient. Instead we have used our
AUTOSTRUCTURE calculations for the metastable parent ion as a guide, multiplied them
by 0.06 on the basis of the estimated (6 ± 6)% metastable content. We then integrated
them to produce a Maxwellian rate coefficient and compared the results to our experimental
results, leaving out the two lowest measured resonances at 6.74 and 9.80 meV. As discussed
in the paragraph above, these two resonaces were left out because of the uncertainty in their
parentage and their small to insignificant effects above 1 eV. The metastable theoretical
results are 9.5% of this experimentally-derived rate coefficient at kBTe = 1 eV, 4.9% at
2.5 eV, 2.2% at 5 eV, 1% at 10 eV and < 0.77% above 15 eV.
In reality these are probably lower limits for the unsubtracted metastable contributions
to our experimentally-derived rate coefficient. However, these limits appear to be reasonable
estimates even taking into account the uncertainty in the exact value of the contributions
due to metastable ions. For example, if we assume that we have the estimated maximum
metastable fraction of 11%, then our experimentally-derived rate coefficients would have to
reduced by only 9.0% at 2.5 eV, 4.0% at 5 eV, 1.8% at 10 eV, and less than 1.4% above
15 eV. Alternatively, it is likely that theory underestimates the resonance strength for the
metastable parent ions similar to the case for ground state parent ions (cf., Fig. 1). However,
if the metastable fraction is 6% and the resonance contributions are a factor of 2 higher, then
our experimentally-derived rate coefficients would have to reduced by only 9.8% at 2.5 eV,
4.4% at 5 eV, 2.0% at 10 eV, and less than 1.5% above 15 eV. These are small and not very
significant corrections. We consider it extremely unlikely that we have underestimated by a
factor of nearly 2 both the metastable fraction and the metastable resonance contribution.
Thus we expect contamination due to metastable 3P0 ions to have a small to insignificant
effect on our derived rate coefficient at temperatures where Fe XV is predicted to form in
photoinoized gas.
Taking into account the baseline experimental uncertainty of 25%, the metastable frac-
tion uncertainty of 6%, and the nonresonant background uncertainty of 10%/3%, all dis-
cussed above, as well as the uncertainty due to the possible unsubtracted metastable res-
onances, the estimated uncertainty in the absolute magnitude of our total experimentally-
derived Maxwellian rate coefficient ranges between 26% and 29% for kBTe ≥ 1 eV. Here
we conservatively take the total experimental uncertainty to be ±29%. This uncertainty
increases rapidly below 1 eV due to the ambiguity of the parentage for the two lowest energy
resonances and possible resonance contributions from metastable Fe XV which we have not
been able to subtract out.
We have fitted our experimentally-derived rate coefficient plus the theoretical estimate
for capture into n > 80 using the simple fitting formula
αDR(Te) = T
−Ei/kBTe (2)
where ci is the resonance strength for the ith fitting component and Ei the corresponding
energy parameter. Table 3 lists the best-fit values for the fit parameters. All fits to the total
experimentally-derived Maxwellian-averaged DR rate coefficient show deviations of less than
1.5% for the temperature range 0.001 ≤ kBTe ≤ 10000 eV.
In Table 3, the Experiment (I) column gives a detailed set of fitting parameters where
the first 30 values of ci and their corresponding Ei values are for all the resolved resonances
for Ecm ≤ 0.95 eV given in Table 2. The parentage for these resonances are uncertain, though
the majority are most likely due to ground state and not metastable Fe14+. It is our hope that
future theoretical advances will allow one to determine which resonances are due to ground
state ions and which are due to metastables. Listing the resonances as we have will allow
future researchers to readily exclude those resonances which have been determined to be due
to the metastable parent. The remaining 6 fitting parameters yield the rate coefficient due
to all resonances for Ecm between 0.95 and the 3s3p(
1P1)nl series limit at 43.63 eV. In the
Experiment (II) column of Table 3, the first six sets of ci and Ei give the fitting parameters
for the first six resonances. The remaining sets of fit parameters are due to all resonances
between 0.1 eV and the series limit.
5. Theory
The only published theoretical DR rate coefficient for Fe XV which we are aware of is
the work of Jacobs et al. (1977). Using the work of Hahn (1989), Arnaud and Raymond
(1992) modified the results of Jacobs et al. (1977) to take into account contributions from
2p−3d inner-shell transitions. The resulting rate coefficient of Arnaud and Raymond (1992)
is widely used throughout the astrophysics community.
We have carried out new calculations using a state-of-the-art multiconfiguration Breit-
Pauli (MCBP) theoretical method. Details of the MCBP calculations have been reported in
Badnell et al. (2003). Briefly, the AUTOSTRUCTURE code was used to calculate energy
levels as well as radiative and autoionization rates in the intermediate-coupling approxi-
mation. These must be post-processed to obtain the final state level-resolved and total
dielectronic recombination data. The resonances are calculated in the independent process
and isolated resonance approximation (Seaton & Storey 1976).
The ionic thresholds were shifted to known spectroscopic values for the 3 → 3 transi-
tions. Radiative transitions between autoionizing states were accounted for in the calculation.
The DR cross section was approximated by the sum of Lorentzian profiles for all included
resonances. The AUTOSTRUCTURE calculations were performed with explicit n values up
to 80 in order to compare closely with experiment. The resulting MBRRC is presented for
3 → 3 core excitations in Figs. 1-8.
The theoretical 3 → 3 DR plasma rate coefficient was obtained by convolving calculated
DR cross section times the relative electron-ion velocity with a Maxwellian electron energy
distribution. Cross section calculations were carried out up to nmax = 1000. The resulting
Maxwellian plasma rate coefficient is given in Fig. 10.
We have fit our theoretical 3 → 3 MCBP Maxwellian DR rate coefficients using Eq. 2.
The resulting fit parameters are presented in Table 3. The accuracy of the MCBP fit is
better than 0.5% for the temperature range 0.1 ≤ kBTe ≤ 10000 eV. This lower limit
represents the range over which rate coefficient data were calculated. Data are not presented
below (101z2)/11605 eV, which is estimated to be the lower limit of the reliability for the
calculations (Badnell 2007). Here z = 14 and this limit is 0.17 eV.
6. Discussion
6.1. Resonance Structure
As we have already noted, we find poor agreement between our experimental and the-
oretical resonance energies and strengths for electron-ion collision energies below 25 eV.
Theory does not correctly predict the strength of many DR resonances which are seen in
the measurement. A similar extensive degree of disparity between the theoretical and the
measured resonances was also seen in our recent Fe13+ results (Schmidt et al. 2006; Badnell
2006b).
Some of the weaker peaks in our data below 1 eV may be due to the possible presence
of metastable Fe14+ in our beam. But the estimated small metastable contamination seems
unlikely to be able to account in this range for many of the strong resonances which are not
seen in the present theory. Above ≈ 1 eV, we expect no significant DR resonances due to
metastable Fe14+ (as is discussed in § 3).
In the energy range from 1− 25 eV, the differences between experiment and theory are
extensive. The reader can readily see from Figs. 1-8 that theory does not correctly predict
the strength of many resonances which are observed in the experiment. This conclusion
takes into account the by-eye shifting of the theoretical resonances energies to try to match
up theory with the measured resonances.
Between 25 − 42 eV we find good agreement between the experiment and theory for
resonance energies. The AUTOSTRUCTURE code reproduces well the more regular res-
onance energy structure of high-n Rydberg resonances approaching the 3s3p(1P1)nl series
limit. However the AUTOSTRUCTURE cross section lies ≈ 31% above the measurements.
This discrepancy is larger than the estimated ±26% total experimental uncertainty in this
energy range. A similar discrepancy with theory was found for Fe13+ (Badnell 2006b).
Theory and experiment diverge above 42 eV and approaching the 3s3p(1P1)nl se-
ries limit. We attribute the difference in the shape between the calculated and measured
3s3p(1P1)nl series limit partly to the nl dependence of the field-ionization process in the
experiment. Here we assumed a sharp n cutoff. Schippers et al. (2001) discuss the effects
of a more correct treatment of the field-ionization process in TSR. Their formalism uses the
hydrogenic approximation to take into account the radiative lifetime of the Rydberg level n
into which the initially free electron is captured.
Our theoretical calculations indicate there are no DR resonances due to 2 → 3 or 3 → 4
core excitations below 44 eV, significant or insignificant. The two weak peaks above the
3s3p(1P1)nl series limit at 43.63 eV are attributed to ∆N=1 resonances.
6.2. Rate Coefficients
The recommended rate coefficient of Arnaud & Raymond (1992) is in mixed agreement
with our experimental results (Fig. 10). For temperatures below 90 eV, their rate coefficient
is in poor agreement. At temperatures where Fe XV is predicted to form in photoionzed gas,
their data are a factor of 3 to orders of magnitude smaller than our experimental results. At
temperatures above 90 eV, the Arnaud & Raymond (1992) data are in good agreement with
our combined experimental and theoretical rate coefficient.
As already implied by the work of Netzer et al. (2003) and Kraemer et al. (2004), the
present result shows that the previously available theoretical DR rate coefficients for Fe XV
are much too low at temperatures relevant for photoionized plasmas. Other storage ring
measurements show similar difference with published recommended low temperature DR
rate coefficients for Fe M-shell ions (Müller 1999; Schmidt et al. 2006). The reason for this
discrepancy is primarily because the earlier theoretical calculations were for high temperature
plasmas and did not include the DR channels important for low temperatures plasmas.
At temperatures relevant for the formation of Fe XV in photoionized gas, we find that
the modified Fe XV rate coefficient of Netzer (2004) is up to an order of magnitude smaller
than our experimental results. The modified rate coefficient of Kraemer et al. (2004) is a
factor of over 3 times smaller. These rate coefficients were guesses meant to investigate the
possibility that larger low temperature DR rate coefficients could explain the discrepancy
between AGN observations and models. The initial results were suggestive that this is the
case. Our work confirms that the previously recommended DR data are indeed too low but
additionally shows that the estimates of Netzer et al. (2003) and Kraemer et al. (2004) are
also still too low. A similar conclusion was reached by Schmidt et al. (2006) based on their
measurement for Fe13+. Clearly new AGN modeling studies need to be carried out using our
more accurate DR data (Badnell 2006a).
Our state-of-the-art MCBP calculations are 37% lower than our experimental results at
a temperature of 1 eV. This difference decreases roughly linearly with increasing temperature
to ≈ 25% at 2.5 eV. It is basically constant at ≈ 23% up to 7 eV and then again nearly
monotonically decreases to 19% at 15 eV. As discussed in § 4, a small part of these difference
may be attributed to unsubtracted metastable 3P0 contributions. But these contributions are
< 10% at 2.5 eV, < 5% at 5 eV, < 2.0% at 10 eV, and < 1.4% above 15 eV (hence basically
insignificant). Above 15 eV the difference decreases and at 23 eV and up the agreement is
within . 10% with theory initially smaller than experiment but later greater. Part of the
good agreement at these higher temperatures is due to our use of theory for the unmeasured
DR contribution due to states with n > 80.
7. Summary
We have measured resonance strengths and energies for ∆N=0 DR of Mg-like Fe XV
forming Al-like Fe XIV for center-of-mass collision energies Ecm from 0 to 45 eV and compared
our results with new MCBP calculations. We have generated an experimentally-derived
plasma rate coefficient by convolving the measured MBRRC with a Maxwell-Boltzmann
electron energy distribution. We have supplemented our measured MBRRC with MCBP cal-
culations to account for unmeasured DR into states which are field-ionized before detection.
The resulting plasma recombination rate coefficient has been compared to the recommended
rate coefficient of Arnaud & Raymond (1992) and new calculations using a state-of-the-art
MCBP theoretical method. We have considered the issues of metastable ions in our stored
ion beam, enhanced recombination for collision energies near 0 eV, and field-ionization of
high Rydberg states in the storage ring bending magnets.
As suggested by Netzer et al. (2003) and Kraemer et al. (2004), the present result shows
that the previously available theoretical DR rate coefficients for Fe XV are much too low.
Other storage ring measurements show similar differences with published recommended low
temperature DR rate coefficients for M-shell iron ions (Müller 1999; Schmidt et al. 2006).
We are now in the process of carrying out DR measurements for additional Fe M-shell ions.
As these data become available we recommend that these experimentally-derived DR rate
coefficients be incorporated into AGN spectral models in order to produce more reliable
results.
We gratefully acknowledge the excellent support by the MPI-K accelerator and TSR
crews. CB, DVL, MS, and DWS were supported in part by the NASA Space Astrophysics
Research Analysis program, the NASA Astronomy and Astrophysics Research and Analy-
sis program, and the NASA Solar and Heliosperic Physics program. This work was also
supported in part by the German research-funding agency DFG under contract no. Schi
378/5.
REFERENCES
Altun, Z., Yumak, A., Yavuz, I., Badnell, N. R., Loch, S. D., & Pindzola, M. S. 2007, in
preparation
Arnaud, M., & Raymond, J. 1992, ApJ, 398, 394
Badnell, N. R., et al. 2003 A&A, 406, 1151
Badnell, N. R. 2006a, ApJ, 651, L73
Badnell, N. R. 2006b, J. Phys. B, 39, 4285
Badnell, N. R. 2007, http://amdpp.phys.strath.ac.uk/tamoc/DATA/DR/
Behar, E., Sako, M., & Kahn S. M. 2001, ApJ, 563, 497
Behar, E., et al. 2003, ApJ, 598, 232
http://amdpp.phys.strath.ac.uk/tamoc/DATA/DR/
Blustin, A. J., et al. 2002, A&A, 442, 757
Brage, T., Judge, P. G., Aboussaied, A., Godefroid, M. R., Joensson, P., Ynnerman, A.,
Fischer, C. F., & Leckrone, D. S. 1998, ApJ, 500, 507
Churilov, S. S., Levashov, V. E., & Wyart, J. F. 1989, Phys. Scr., 40, 625
Dittner, P. F., Datz, S., Miller, P. D., Pepmiller, P. L., & Fou, C. M. 1986, Phys. Rev. A,
33, 124
Fogle, M., Badnell, N. R., Eklöw, N., Mohamed, T., & Schuch, R. 2003, A&A, 409, 781
Fogle, M., Badnell, N. R., Glans, P., Loch, S. D., Madzunkov, S., Abdel-Naby, Sh. A.,
Pindzola, M. S., & Schuch, R. 2005, A&A, 442, 757
Gallo, L. C., Boller, T., Brandt, W. N., Fabian, A. C., & Vaughan, S. 2004, A&A, 417, 29
Gorczyca T. W., & Badnell, N. R. 1996, Phys. Rev. A, 54, 4113
Gu, M. F. 2004, ApJ, 589, 389
Gwinner, G., et al. 2000, Phys. Rev. Lett., 84, 4822
Hahn, Y. 1989, J. Quant. Spectrosc. Radiat. Transfer, 41, 315
Heerlein, C., Zwicknagel, G., & Toepffer, C. 2002, Phys. Rev. Lett., 89, 083202
Hörndl, M., Yoshida, S., Wolf, A., Gwinner, G., & Burgdörfer J. 2005, Phys. Rev. Lett., 95,
243201
Hörndl, M., Yoshida, S., Wolf, A., Gwinner, G., Seliger, M., & Burgdörfer J. , Phys. Rev. A
74, 052712
Jacobs, V. L., Davis, J., Kepple, P. C., & Blaha, M. 1977, ApJ, 211, 605
Kallman, T. R., & Bautista M. 2001, ApJS, 133, 221
Kaspi, S., et al. 2002, ApJ, 574, 643
Kaspi, S., Netzer, H., Chelouche, D., George, I. M., Nandra, K., & Turner, T. J. 2004, ApJ,
611, 68
Kilgus, G., et al. 1990, Phys. Rev. Lett., 64, 737
Kilgus, G., Habs, D., Schwalm, D., Wolf, A., Badnell, N. R., & Müller, A. 1992, Phys. Rev. A,
46, 5730
Kraemer, S. B., Ferland, G. J., & Gabel, J. R. 2004, ApJ, 604, 561
Kreckel, H.. et al. 2005, Phys. Rev. Lett., 95, 263201
Krongold, Y., Nicastro, F. M., Brickhouse, N. S., Mathura, S., & Zezas, A. 2005, ApJ, 620,
Lampert, A., Wolf, A., Habs, D., Kilgus, G., Schwalm, D., Pindzola, M. S., & Badnell, N.
R. 1996, Phys. Rev. A, 53, 1413
Linkemann, J., et al. 1995, Nucl. Instrum. Methods Phys. Res. B, 98, 154
Marques, J. P., Parent, F., & Indelicato, P. 1993, At. Data. Nuc. Data Tab., 55, 157
Martinson, I., & Gaupp, A. 1974, Phys. Rep. 15, 113
Müller, A. 1999, Int. J. Mass Spectrom., 192, 9
Netzer, H., et al. 2003, ApJ, 599, 933
Netzer, H. 2004, ApJ, 604, 551
Nikolić, D., et al. 2004, Phys. Rev. A, 70, 062723
Pastuszka, S., et al. 1996, Nucl. Instrum. Methods Phys. Res. A, 369, 11
Pounds, K. A., Reeves, J. N., O’Brien, P. T., Page, K. A., Turner, M. J. L., & Nayakshin S.
2001, ApJ, 559, 181
Pounds, K. A., Reeves, J., King, A. R, & Page, K. L. 2004, MNRAS, 350, 10
Quinet, P., Palmeri, P., Bimont, E., McCurdy. M. M., Rieger, G., Pinnington, E. H., Wick-
liffe, M. E., & Lawler, J. E. 1999, Mon. Not. R. Astron. Soc. 307, 934
Ralchenko, Yu., et al. 2006, NIST Atomic Spectra Database (version 3.1.0), [Online]. Avail-
able: http://physics.nist.gov/asd3. National Institute of Standards and Technology,
Gaithersburg, MD.
Sako, M., et al. 2001, A&A, 365, L168
Savin, D. W. 1999, ApJ, 523, 855
Savin, D. W. 2000, ApJ, 533, 106
Savin, D. W., et al. 1997, ApJ, 489, L115
http://physics.nist.gov/asd3
Savin, D. W., et al. 1999, ApJS, 123, 687
Savin, D. W., et al. 2002a, ApJS, 138, 337
Savin, D. W., et al. 2002b, ApJ, 576, 1098
Savin, D. W., et al. 2003, ApJS, 147, 421
Savin, D. W., et al. 2006, ApJ, 642, 1275
Schippers, S., Bartsch, T., Brandau, C., Gwinner, G., Linkemann, J., Müller, A., Saghiri,
A. A., & Wolf, A. 1998, J. Phys. B, 31, 4873
Schippers, S., et al. 2000, Phys. Rev. A, 62, 022708
Schippers, S., Müller, A., Gwinner, G., Linkemann, J., Saghiri, A. A., & Wolf, A. 2001, ApJ,
555, 1027
Schippers, S., et al. 2002, Phys. Rev. A, 65, 042723
Schippers, S., Schnell, M., Brandau, C., Kieslich, S., Müller, A., & Wolf, A. 2004, A&A,
421, 1185
Schippers, S., et al. 2007, Phys. Rev. Lett., 98, 033001
Schmidt, E. W., et al. 2006, ApJ, 641, L157
Schnell, M., et al. 2003, Phys. Rev. Lett. 91, 043001
Schnell M., et al. 2003, Nucl. Instrum. Methods Phys. Res. B, 205, 367
Seaton, M. J., & Storey, P. J. 1976, in Atomic Processes and Applications, ed. P. G. Burke
& B. L. Moisewitch (North-Holland, Amsterdam), 133
Sprenger, F., Lestinsky, M., Orlov, D. A., Schwalm, D., & Wolf, A. 2004, Nucl. Instrum.
Methods Phys. Res. A, 532, 298
Steenbrugge, K. C., Kaastra, J. S., de Vries, C. P., & Edelson, R. 2003 A&A, 402, 477
This preprint was prepared with the AAS LATEX macros v5.2.
Table 1. Energy levels for the n = 3 shell of Fe XV relative to the ground state.
Level Energy (eV)a
3s3p(3P o
) 28.9927
3s3p(3P o
) 29.7141
3s3p(3P o
) 31.4697
3s3p(1P o
) 43.6314
3p2(3P0) 68.7522
3p2(1D2) 69.3816
3p2(3P1) 70.0017
3p2(3P2) 72.1344
3p2(1S0) 81.7833
3s3d(3D1) 84.1570
3s3d(3D2) 84.2826
3s3d(3D3) 84.4848
3s3d(1D2) 94.4875
3p3d(3F o
) 115.087
3p3d(3F o
) 116.313
3p3d(3F o
) 117.743
3p3d(1Do
) 117.601
3p3d(3Do
) 121.860
3p3d(3Do
) 123.346
3p3d(3Do
) 123.565
3p3d(3P o
) 121.940
3p3d(3P o
) 123.474
3p3d(3P o
) 123.518
3p3d(1F o
) 131.7351
3p3d(1P o
) 133.2690
3d2(3F2) 169.8994
3d2(3F3) 170.1106
3d2(3F4) 170.3612
3d2(1D2) 173.8992
3d2(1G4) 174.4529
3d2(3P0) 174.2613
Table 1—Continued
Level Energy (eV)a
3d2(3P1) 174.3433
3d2(3P2) 174.5416
3d2(1S0) 184.3712
aRalchenko et al. (2006)
unless otherwise noted.
bChurilov et al. (1989)
Table 2. Measured resonance energies Ei and strengths Si for Fe XV forming Fe XIV via
N = 3 → N ′ = 3 DR for Ecm ≤ 0.95. Fitting errors are presented at a 90% confidence level.
Peak Number Ei (eV) Si (10
−21 cm2 eV)
1 (6.74 ± 0.05)E-3 189430.0 ± 20635.3
2 0.0098 ± 0.0008 10078.0 ± 483.1
3 0.0196 ± 0.0008 613.1 ± 56.8
4 0.0254 ± 0.0003 743.9 ± 51.8
5 0.0444 ± 0.0002 686.3 ± 37.9
6 0.0610 ± 0.0002 2949.3 ± 39.0
7 0.1098 ± 0.0002 805.5 ± 699.5
8 0.1674 ± 0.0014 2424.3 ± 954.1
9 0.1943 ± 0.0018 4408.5 ± 1213.1
10 0.2143 ± 0.0022 4735.5 ± 750.9
11 0.2436 ± 0.0003 4257.6 ± 132.6
12 0.2660 ± 0.0006 4169.1 ± 339.0
13 0.2895 ± 0.0122 213.9 ± 218.4
14 0.3102 ± 0.0074 292.5 ± 188.6
15 0.3346 ± 0.0008 1158.1 ± 118.6
16 0.3596 ± 0.0010 943.5 ± 100.3
17 0.4154 ± 0.0149 193.3 ± 230.2
18 0.4536 ± 0.0005 8013.6 ± 328.0
19 0.4781 ± 0.0072 706.9 ± 310.2
20 0.4988 ± 0.0072 781.3 ± 303.5
21 0.5199 ± 0.0266 216.7 ± 285.6
22 0.5433 ± 0.0290 121.8 ± 270.4
23 0.6164 ± 0.0078 136.2 ± 106.9
24 0.6599 ± 0.0006 1269.1 ± 97.8
25 0.6992 ± 0.0010 3090.3 ± 99.5
26 0.7385 ± 0.0010 2068.5 ± 113.4
27 0.7943 ± 0.0006 1594.4 ± 83.7
28 0.8406 ± 0.0006 1740.6 ± 83.6
29 0.8830 ± 0.0006 2164.2 ± 89.9
30 0.9232 ± 0.0013 1420.7 ± 86.9
Table 3. Fit parameters for the total experimentally-derived DR rate coefficient for Fe XV
forming Fe XIV via N = 3 → N ′ = 3 core excitation channels and including the theoretical
estimate for capture into n > 80 (nmax = 1000). See § 4 for an explanation of the columns labeled
“Experiment (I)” and “Experiment (II)”. Also given are the fit parameters for our calculated
MCBP results (nmax = 1000). The units below are cm
3 s−1 K1.5 for ci and eV for Ei.
Parameter Experiment (I) Experiment (II) MCBP
c1 1.07E-4 1.07E-4 7.07E-4
c2 8.26E-6 8.26E-6 7.18E-3
c3 1.00E-6 1.00E-6 2.67E-2
c4 1.46E-6 1.46E-5 3.15E-2
c5 2.77E-6 2.77E-6 1.62E-1
c6 1.51E-5 1.51E-6 5.37E-4
c7 2.90E-6 3.29E-6 -
c8 2.66E-5 1.63E-4 -
c9 5.62E-5 4.14E-4 -
c10 6.66E-5 2.17E-3 -
c11 6.81E-5 6.40E-3 -
c12 7.28E-5 4.93E-2 -
c13 4.07E-6 1.51E-1 -
c14 5.96E-6 - -
c15 2.54E-5 - -
c16 2.23E-5 - -
c17 5.27E-6 - -
c18 2.40E-4 - -
c19 2.22E-5 - -
c20 2.56E-5 - -
c21 7.40E-6 - -
c23 4.35E-6 - -
c23 5.51E-6 - -
c24 5.50E-5 - -
c25 1.42E-4 - -
c26 1.00E-4 - -
c27 8.32E-5 - -
c28 9.61E-5 - -
c29 1.25E-4 - -
c30 8.61E-5 - -
c31 1.02E-4 - -
Table 3—Continued
Parameter Experiment (I) Experiment (II) MCBP
c32 5.46E-1 - -
c33 2.91E-3 - -
c34 4.83E-3 - -
c35 4.86E-2 - -
c36 1.51E-1 - -
E1 6.74E-3 6.74E-3 4.12E-1
E2 9.80E-3 9.80E-3 2.06E+0
E3 1.97E-2 1.97E-2 1.03E+1
E4 2.54E-2 2.54E-2 2.20E+1
E5 4.45E-2 4.45E-2 4.22E+1
E6 6.10E-2 6.10E-2 3.41E+3
E7 1.10E-1 1.10E-1 -
E8 1.67E-1 1.91E-1 -
E9 1.94E-1 3.33E-1 -
E10 2.14E-1 9.63E-1 -
E11 2.44E-1 2.47E+0 -
E12 2.66E-1 1.08E+1 -
E13 2.90E-1 3.83E+1 -
E14 3.10E-1 - -
E15 3.35E-1 - -
E16 3.60E-1 - -
E17 4.15E-1 - -
E18 4.54E-1 - -
E19 4.78E-1 - -
E20 4.99E-1 - -
E21 5.20E-1 - -
E22 5.43E-1 - -
E23 6.16E-1 - -
E24 6.60E-1 - -
E25 6.99E-1 - -
Table 3—Continued
Parameter Experiment (I) Experiment (II) MCBP
E26 7.39E-1 - -
E27 7.94E-1 - -
E28 8.41E-1 - -
E29 8.83E-1 - -
E30 9.23E-1 - -
E31 1.00E+0 - -
E32 1.16E+0 - -
E33 1.62E+0 - -
E34 3.14E+0 - -
E35 1.08E+1 - -
E36 3.82E+1 - -
Fig. 1.— Fe XV to Fe XIV 3 → 3 DR resonance structure versus center-of-mass energy Ecm
from 0 to 1 eV. The solid curve represents the measured rate coefficient 〈σv〉 which is the summed
DR plus radiative recombination (RR) cross sections times the relative velocity convolved with the
experimental energy spread, i.e., a merged beam recombination rate coefficient (MBRRC). The
dotted curve shows our calculated multiconfiguration Breit-Pauli (MCBP) results (nmax = 80) for
ground state Fe XV (top plot) and 3P0 metastable state Fe XV multiplied by a factor of 0.06
to account for the estimated 6% population in our ion beam (bottom plot). To these results we
have added the convolved, non-resonant RR contribution obtained from semi-classical calculations
(Schippers et al. 2001). The inset shows our results for Ecm from 5× 10
−6 to 1× 10−1 eV.
0.8 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6
Center of Mass Energy (eV)
 Experiment
 MCBP Theory
2.5 3.0 3.5 4.0 4.5
Center of Mass Energy (eV)
 Experiment
 MCBP Theory
4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0
Center of Mass Energy (eV)
 Experiment
 MCBP Theory
15 16 17 18 19 20 21 22 23 24
Center of Mass Energy (eV)
 Experiment
 MCBP Theory
15 16 17 18 19 20 21 22 23 24
Center of Mass Energy (eV)
 Experiment
 MCBP Theory
Fig. 7.— Same as Fig. 2 but for Ecm from 23 to 36 eV. The dotted curve shows our calculated
MCBP results and the thin solid curve shows our calculated MCBP results reduced by a
factor of 1.31.
Fig. 8.— Same as Fig. 7 but for Ecm from 35 to 45 eV. The weak resonances above 44 eV
are attributed to ∆N=1 DR. These are not included in either our experimentally-derived or
theoretical Maxwellian rate coefficients.
Fig. 9.— Measured and fitted Fe XV to Fe XIV 3 → 3 resonance structure below 0.07 eV.
The experimental MBRRC results are shown by the filled circles. The vertical error bars
show the statistical uncertainty of the data points. The solid curve is the fit to the data
using our calculated RR rate coefficient (dashed curve) and taking into account all resolved
DR resonances. The dotted curves show the fitted DR resonances. At Ecm = 0.005 meV the
difference between the model spectrum α0 and the data is 1 + (∆α/α0) = 2.5.
10-1 100 101 102 103
10-11
10-10
Electron Temperature (eV)
Photoionized
       Zone
Fig. 10.— Maxwellian-averaged 3 → 3 DR rate coefficients for Fe XV forming Fe XIV. The
solid curve represent our experimentally-derived rate coefficient plus the theoretical estimate
for unmeasured contributions due to capture into states with n > 80. The error bars show
our estimated total experimental uncertainty of ±29% (at a 90% confidence level). No error
bars are shown below 1 eV for reasons discussed in § 4. The thin solid curve represents our
experimentally-derived rate coefficient without the two lowest energy resonances included.
The dash-dotted curve represents our experimentally-derived rate coefficient alone (nmax =
80). Also shown is the recommended DR rate coefficient of Arnaud & Raymond (1992; thick
dash-dot-dotted curve) and its modification by Netzer (2004; thin dash-dot-dotted curve).
The filled pentagon at 5.2 eV represents the estimated rate coefficient from Kraemer et al.
(2004). The dashed curve shows our MCBP calculations for nmax = 1000. As a reference
we show the recommended RR rate coefficient of Arnaud & Raymond (1992; dotted curve).
Neither the experimental nor theoretical DR rate coefficients include RR. The horizontal
line shows the temperature range over which Fe XV is predicted to form in photoionized gas
(Kallman & Bautista 2001).
	Introduction
	Experimental Technique
	Metastable Ions
	Experimental Results
	Theory
	Discussion
	Resonance Structure
	Rate Coefficients
	Summary
ABSTRACT
  We have measured resonance strengths and energies for dielectronic
recombination (DR) of Mg-like Fe XV forming Al-like Fe XIV via N=3 -> N' = 3
core excitations in the electron-ion collision energy range 0-45 eV. All
measurements were carried out using the heavy-ion Test Storage Ring at the Max
Planck Institute for Nuclear Physics in Heidelberg, Germany. We have also
carried out new multiconfiguration Breit-Pauli (MCBP) calculations using the
AUTOSTRUCTURE code. For electron-ion collision energies < 25 eV we find poor
agreement between our experimental and theoretical resonance energies and
strengths. From 25 to 42 eV we find good agreement between the two for
resonance energies. But in this energy range the theoretical resonance
strengths are ~ 31% larger than the experimental results. This is larger than
our estimated total experimental uncertainty in this energy range of +/- 26%
(at a 90% confidence level). Above 42 eV the difference in the shape between
the calculated and measured 3s3p(^1P_1)nl DR series limit we attribute partly
to the nl dependence of the detection probabilities of high Rydberg states in
the experiment. We have used our measurements, supplemented by our
AUTOSTRUCTURE calculations, to produce a Maxwellian-averaged 3 -> 3 DR rate
coefficient for Fe XV forming Fe XIV. The resulting rate coefficient is
estimated to be accurate to better than +/- 29% (at a 90% confidence level) for
k_BT_e > 1 eV. At temperatures of k_BT_e ~ 2.5-15 eV, where Fe XV is predicted
to form in photoionized plasmas, significant discrepancies are found between
our experimentally-derived rate coefficient and previously published
theoretical results. Our new MCBP plasma rate coefficient is 19-28% smaller
than our experimental results over this temperature range.

<|endoftext|><|startoftext|>
Introduction.
The Metropolis algorithm, introduced in [29] and later generalized in [18], is
currently (together with other Monte Carlo Markov Chain methods) one of the
most used simulation techniques both in statistics and in physics. See, among
others, [33, 32, 39, 17, 35, 34, 25, 6].
In a finite setting the Metropolis algorithm can be described as follows. Suppose
that, given a probability π(x) on a finite set X , want to approximate
(1.1) µ =
f(x)π(x),
for f : X → R. As a first step, take a reversible Markov chain K(x, y) (the proposal
chain) on X and change its output in order to have a new chain with stationary
distribution π. This can be achieved by constructing a new (π–reversible) chain
(1.2) M(x, y) =
K(x, y)A(x, y) x 6= y
K(x, x) +
z 6=xK(x, z)(1−A(x, z)) x = y
where A(x, y) := min(
π(y)K(y,x)
π(x)K(x,y)
, 1). Then, the metropolis estimate of µ is given by
(1.3) µ̂n =
f(Yi),
where Y0 is generated from some initial distribution π0 and Y1, . . . , Yn fromM(x, y).
It is clear that, from a computational point of view, the speed of convergence to
the stationary distribution and the (asymptotic) variance of the estimate are two
very important features of the Markov chain M .
It is well-known that in some situation a Markov chain can converge very slowly
to its stationary distribution and, moreover, that the asymptotic variance of the es-
timate (1.3) can be much bigger than the variance of f , i.e. V arπ(f) :=
x(f(x)−
Key words and phrases. asymptotic variance, Chain decomposition theorem, fast/slowly mix-
ing chain, mean-field Ising model, Metropolis, spectral gap analysis.
http://arxiv.org/abs/0704.0906v2
2 FEDERICO BASSETTI AND FABRIZIO LEISEN
µ)2π(x), which is equal to the asymptotic variance of the crude Montecarlo estima-
tor. In these cases (1.3) turns out to be a very inefficient estimate of µ.
For the Metropolis chain a classical situation in which the convergence is slow
(and the variance big) is when the target distribution π has many peaks and K is
somehow too “local”.
This is well known in statistical physics, where, typically, a distribution of a
system with energy function h and in thermal equilibrium at temperature T is
described by the Gibbs distribution
πh,T (x) = exp{−h(x)/T }Z−1T
with ZT =
x exp{−h(x)/T }. In point of fact, the Metropolis algorithm has been
proposed in [29] to compute average with respect to such distributions. Indeed,
if h is nice, the Metropolis algorithm is very efficient, but it can perform very
poorly if the energy has many local minima separated by high barriers that cannot
be crossed by the proposal moves K. This problem can be bypassed, for specific
energy, designing appropriate moves that have higher chance to cut across the
energy barrier (see, e.g, [4, 5]), or constructing clever alternative approaches to the
problem, for instance using a reparametrization of the problem (see, e.g., [12, 13])
or using auxiliary variables (see, e.g., [40, 9, 1, 30]). A different kind of solution has
been proposed in [14] and in [28] by introducing the so called simulated tempering,
which essentially means that T is changed (stochastically or not) to flatten h. A
remarkable variant of these methods is the parallel tempering, see, for instance, [19].
More recently new algorithms based on the so called equi–energy levels sampling
have been proposed (see [26] and [22]). In particular, the algorithm proposed in [22]
relies on the so–called equi-energy jump, which enables the chain to reach regions
of the sample space with energy close to the one of the starting state, but that may
be separated by steep energy barriers. In point of fact, even if, according to some
simulations, the method seems to be efficient nothing has been formally proved.
Finally, let us mention a recent algorithm, called small world Markov chains (see
[15, 16]), that combine a local chain with long jumps. In these papers, it has
been shown that a simple modification of the proposal mechanism results in faster
convergence of the chain. That mechanism, which is based on an idea from the
field of small-world networks, amounts to adding occasional wild proposals to any
local proposal scheme.
In the present paper we study two simple examples: the so called mean field Ising
model and the mean field Blume–Emery–Griffiths model. As for the former, it is
well-known that the usual choice of K gives rise, for low temperature, to a slowly
mixing Metropolis chain (see, e.g., [26]). Here we show that a slight variant in the
proposal chain can completely solve this problem, keeping the mean computational
cost similar to the cost of the usual Metropolis. The idea again rests on allowing
appropriate jumps in the same energy level of the starting state. As for the Blume–
Emery–Griffyths mean–field model, we first show that there is a critical region of
the parameters space for which the naive Metropolis chain is slowly mixing. Then
we show how one can modify the proposal chain in order to obtain a better mixing
for the Metropolis chain. The present paper should be intended as a further step in
the direction of a better mathematical understanding of both small world Markov
chains and equi-energy sampling.
The rest of the paper is organized as follows. In Section 2 some general consid-
erations are given. In Section 3 some basic tools concerning Markov chain, which
will be used in the paper, are reviewed. Section 4 contains a warming up example.
In Section 5 the mean field Ising model is treated, while Section 6 deals with the
more complex case of the mean field Blume-Emery-Griffiths model. All the proofs
are deferred to the Appendix.
METROPOLIS ALGORITHM AND EQUIENERGY SAMPLING 3
2. A general strategy
In an abstract setting, what we shall do in the next examples can be summarized
as follows. Let G be a group acting on X for which
(2.1) π(x) = π(g(x)) ∀ x ∈ X , ∀ g ∈ G.
For every x in X let Ox := {y = g(x) : g ∈ G} be the orbit of x (of course if y
belongs to Ox then Ox = Oy).
Assume now that we have a reversible Markov chain KE(x, y) (the proposal) on
X and suppose that the Metropolis chain ME with proposal KE is slowly mixing
(see next section for more details). To speed up the mixing one can try to exploit
(2.1) by taking a proposal of the following form:
(2.2) Kǫ(x, y) = ǫKE(x, y) + (1 − ǫ)KG(x, y)
where
KG(x, y) =
qx(z)Iz(y),
0 < qx(z) < 1 and
qx(z) = 1.
In point of fact, usually KE is “local”; for instance frequently
KE(x, y) = 0
whenever y 6= x belongs to Ox, hence with KG we are adding “long” jumps to the
chain. Moreover, note that if KE is such that KE(x, g(x)) = KE(g(x), x), for every
x in X and g in G, then the Metropolis always accepts the move x→ g(x) and
M(x, g(x)) = ǫKE(x, g(x)) + (1− ǫ)qx(g(x)).
In particular this holds when KE is symmetric.
The heuristics under (2.2) is to combining small world Markov chains and equi-
energy sampling.
Before presenting some examples in which one can actually improve the perfor-
mances of the Metropolis chain using this idea, we collect in the next section some
useful facts concerning Markov chains.
3. Preliminaries
Let P (x, y) be a reversible and ergodic Markov chain on the finite set X with
(unique) stationary distribution p(x). Thus, p(x)P (x, y) = p(y)P (y, x). Let L2(p) =
{f : X → R} with < f, g >p= Ep(fg) =
x f(x)g(x)p(x). Reversibility is equiva-
lent to P : L2 → L2 being self–adjoint. Here Pf(x) =
y f(y)P (x, y). The spec-
tral theorem implies that P has real eigenvalues 1 = λ0(P ) > λ1(P ) ≥ λ2(P ) ≥
· · · ≥ λ|X |−1(P ) > −1 with orthonormal basis of eigen–functions ψi : X → R
(Pψi(x) = λiψi(x), < ψi, ψj >p= δij).
3.1. Spectral gap, variance and speed of convergence. A very important
quantity related to the eigenvalues is the spectral gap, defined by
Gap(P ) = 1−max{λ1, |λ|X |−1|}.
It turns out that the spectral gap is a good index to measure the mixing of a
chain. To better understand this point, assume that f belongs to L2(p) and write
f(x) =
i≥0 aiψi(x) (with ai =< f, ψi >p). Now let Y0 be chosen form some
distribution p0 and Y1, . . . , Yn be a realization of the P (x, y) chain, then
µ̂n =
f(Yi)
4 FEDERICO BASSETTI AND FABRIZIO LEISEN
has asymptotic variance given by
AV ar(f, p, P ) := lim
n · V ar(µ̂n) =
|ak|2
1 + λk
1− λk
See, for instance, Theorem 6.5 in Chapter 6 of [3]. From the last expression, the
classical inequality
(3.1) AV ar(f, p, P ) ≤ 2
1− λ1
V arp(f),
follows easily. The last inequality is the usual way of relating spectral gap to
asymptotic variance and, hence, to the efficency of a chain.
The spectral gap is very important also to give bounds on the speed of con-
vergence to the stationary distribution. For example, if ‖ · ‖TV denotes the total
variation norm, one has
‖δxP k − p‖2TV =
|P k(x,A) − p(x)|
≤ 1− p(x)
4p(x)
(max{λ1, |λ|X |−1|})2k
See, e.g., Proposition 3 in [7]. Another classical bound is
‖p0P k/p− 1‖2,p ≤ Gap(P k)‖p0/p− 1‖2,p
valid for every probability p0. See, for instance, [39].
Roughly speaking one can say that a sequence of Markov chains defined on a
sequence of state space XN is slowly mixing (in the dimension of the problem N)
if the spectral gap decreases exponentially fast in N .
3.2. Cheeger’s inequality. As already recalled, problems of slowly mixing typ-
ically occur when π has two or more peaks and the chain K can only move in
a neighborhood of the starting peak. Usually this phenomenon is called bottle-
neck. A powerful tool to detect the presence of a bottleneck is the conductance and
the related Cheeger’s inequality. Recall that the conductance of a chain P with
stationary distribution p is defined by
h = h(p, P ) := inf
A :p(A)≤ 1
x∈A,y∈Ac
p(x)P (x, y),
and the well-known Cheeger’s inequality is
(3.2) 1− 2h ≤ λ1(P ) ≤ 1−
See, for instance, [3, 37, 7]. Note that, since P is reversible,
(3.3) h ≤ 1
p(x)P (x, y) =
p(y)P (y, x)
for every A such that p(A) ≤ 1/2.
3.3. Chain decomposition theorem. In this subsection we briefly describe a
useful technique to obtain bounds on the spectral gap: the so called chain decom-
position technique. Following [16] assume that A1, . . . , Am is a partition of X .
Moreover, for each i = 1, . . . ,m, define a new Markov chain on Ai by setting
PAi(x, y) := P (x, y) + Ix(y)
P (x, z)
 (x, y ∈ Ai).
PAi is a reversible chain on the state space Ai with respect to the probability
measure
pi(x) := p(x)/p(Ai).
METROPOLIS ALGORITHM AND EQUIENERGY SAMPLING 5
The movement of the original chain among the “pieces” A1, . . . , Am can be de-
scribed by a Markov chain with state space {1, . . . ,m} and transition probabilities
PH(i, j) :=
2p(Ai)
x∈Ai,y∈Aj
P (x, y)p(x)
for i 6= j and
PH(i, i) := 1−
j 6=i
PH(i, j),
which is reversible with stationary distribution
p̄(i) := p(Ai).
A variant of a result of Caracciolo, Pelisetto and Sokal (published in [27]), states
(3.4) Gap(P ) ≥ 1
Gap(PH)
i=1,...,m
Gap(PAi)
holds true, see Theorem 2.2 in [16]. Other results about chain decompositions can
be found, for instance, in [20].
In the next very simple example we shall show how this technique can be used,
starting from a slowly mixing chain, to suggest how to modify the proposal chain
in order to obtain a fast mixing chain.
4. Warming up example
Set X = {−N,−N + 1, . . . , 0, 1, . . . , N} and define a probability measure on X
π(x) =
(θ − 1)θ|x|
2θN+1 + 1− θ
θ being a given parameter bigger than 1. Here we can consider G = {+1,−1} (with
group operation given by the usual product) acting on X by g(x) = gx, hence
Ox = {x,−x}.
Now let KE be a chain defined by
KE(x, x + 1) = 1/2 x 6= N
KE(x, x − 1) = 1/2 x 6= −N
KE(N,N) = KE(−N,−N) = 1/2
KE(x, y) = 0 otherwise
and denote by ME the Metropolis chain with stationary distribution π derived
by KE. It is clear that in this case KE(x, y) = 0 whenever y belongs to Ox.
In this example it is very easy to bound the conductance on ME , indeed, taking
A = {−N, . . . ,−1}, by (3.3), it follows that
h(π,ME) ≤
1− π(0)
Hence,
h(π,ME) ≤ Cθ−N ,
and then (3.2) yields
1− λ1 ≤ 2Cθ−N .
This means that, if f is such that a1 6= 0 and θ > 1, then the asymptotic variance
of f blows up exponentially fast, indeed
AV ar(f, π,ME) ≥ 2Celog(θ)N .
6 FEDERICO BASSETTI AND FABRIZIO LEISEN
Now, instead of KE consider
Kǫ(x, y) = (1− ǫ)KE(x, y) + ǫI{−x}(y)
and let M (ǫ) be the Metropolis chain derived by Kǫ. Decompose X as follows
X = A1 ∪A2 · · · ∪ AN
with A1 = {−1, 0, 1} and Ai = {x ∈ X : |x| = i}, for i > 1. Moreover let
π̄(i) = π(Ai) =
(2θ + 1)/Z for i = 1
2θi/Z for i > 1
where
2θN+1 + 1− θ
(θ − 1)
and set
H (i, j) =
2π(Ai)
l∈Ai,m∈Aj
M (ǫ)(l,m)π(l), M
H (i, i) = 1−
j 6=i
H (i, j).
For i 6= 1, N , one has
H (i, i+ 1) =
2π(Ai)
[M (ǫ)(i, i+ 1)π(i) +M (ǫ)(−i,−i− 1)π(−i)]
and, since π(i) = π(−i) and π(i + 1) ≥ π(i)
H (i, i+ 1) =
In the same way it is easy to see that
H (i, i− 1) =
, i 6= 1, N
H (i, i) = 1−
(1 + θ−1) i 6= 1, N
H (N,N − 1) =
H (N,N) = 1−
H (1, 2) =
4(1 + 1/(2θ))
H (1, 1) = 1−
4(1 + 1/(2θ))
Moreover, for every i 6= 1, M (ǫ)Ai in matrix form is given by
1− ǫ ǫ
ǫ 1− ǫ
and hence
Gap(M
) = 1− |1− 2ǫ|.
While M
is given by
(2θ − 1)(1− ǫ)/(2θ) (1− ǫ)/(2θ) ǫ
(1− ǫ)/2 ǫ (1− ǫ)/2
ǫ (1− ǫ)/(2θ) (2θ − 1)(1− ǫ)/(2θ)
and hence
Gap(M
) = k(θ, ǫ) > −1.
Moreover, since
i6=1,N
H (i, i± 1)),M
H (1, 2),M
H (N,N − 1)
≥ min
(1− ǫ)/(4θ), 1− ǫ
4(1 + 1/(2θ))
=: m(ǫ, θ) > 0
METROPOLIS ALGORITHM AND EQUIENERGY SAMPLING 7
and π̄(i) ≤ 3π̄(j) for every i < j, Lemma A.1 in the appendix yields that
1− λ1(M (ǫ)H ) ≥
m(ǫ, θ)
In the same way, since M
H (i, i+1)+M
H (i, i− 1) ≤ (1− ǫ)M(θ)/4, with M(θ) =
max(1 + θ−1, 2θ/(2θ + 1)) ≤ 2, inequality (A.1) in the Appendix yields that
λN−1(M
H ) ≥ 1−
≥ 1 + ǫ
Hence
Gap(M
H ) ≥
m(ǫ, θ)
and (3.4) yield
Gap(M (ǫ)) ≥
h(θ, ǫ)
for a suitable h. This shows that M (ǫ) is fast mixing for every ǫ > 0 and for every
θ > 1 while ME is slowly mixing for every θ > 1.
5. The mean field Ising model
Let X = {−1, 1}N , N being an even integer. For every β > 0 let π = πβ,N be a
probability on X defined by
π(x) = πβ,N (x) := exp
S2N (x)
Z−1N (β) (x ∈ X )
where
ZN (β) = ZN :=
S2N (x)
is the normalization constant (“partition function”) and
SN (x) :=
xi x = (x1, . . . , xN ).
This is the so called mean field Ising model, or Curie-Weiss model, in which every
particle i, with spin xi, interacts equally with every other particle. It is probably
the most simple but also the most studied example of spin system on a complete
graph. The usual Metropolis algorithm uses as proposal chain
KE(x, y) =
I{x(j)}(y)
where x(j) denotes the vector (x1, . . . ,−xj , . . . , xN ). It has been proved in [26] that,
whenever β > 1,
1− λ1 ≤ Ce−D
where λ1 is the first eigenvalues smaller than 1 of the Metropolis chain ME derived
KE . This yields that the variance of an estimator obtained from this Metropolis
algorithm can blow up exponentially fast in N .
The aim of this section is to show how one can construct a different Metropolis
chain avoiding this problem. In the notation of Section 2, we consider
G = SN × {+1,−1}
(SN being the symmetric group of order N) and we define the action of G on
X = {−1, 1}N by
g(x) = (e · xσ(1), . . . , e · xσ(N)) g = (σ, e).
8 FEDERICO BASSETTI AND FABRIZIO LEISEN
In order to introduce a new proposal, it is useful to write X as the union of its
“energy sets”, that is
X = X0 ∪ X2 ∪ X4 ∪ · · · ∪ XN
where
Xi := {x ∈ X : |SN (x)| = i} (i = 0, 2, . . . , N).
Note that energy takes only even values and that Ox = X|SN (x)|. Moreover, for
i 6= 0, set
X+i := {x ∈ X : SN (x) = i} and X
i := {x ∈ X : SN(x) = −i}.
The new proposal chain will be
K(x, y) = p1KE(x, y) + (1 − p1)K0(x, y) if x ∈ X0
K(x, y) = p1KE(x, y) + p2I{−x}(y) + (1− p1 − p2)Ki(x, y)
if x ∈ Xi, i 6= 0
(5.1)
where p1, p2 belong to (0, 1), p1 + p2 < 1, and
Ki(x, y) = IX+
{x}K+i (x, y) + IX−
{x}K−i (x, y) (i 6= 0).
We shall assume that K±i (K0, respectively) are irreducible, symmetric and aperi-
odic chains on X±i ( X0, respectively).
As a leading example we shall take
K0(x, y) =
) y ∈ X0
K±i (x, y) =
(N−i)/2
) y ∈ X±i ,
(5.2)
that is: a realization of a chain K±i (K0, respectively) is simply a sequence of
independent uniform random sampling from X±i (X0, respectively).
Remark 1. Note that (5.2) is the (n, k)-Bose-Einstein distribution with n = (N +
i)/2 and k = (N − i)/2 + 1 and recall that there is a very easy way to directly
generate Bose-Einstein configurations. One may place n balls sequentially into k
boxes, each time choosing a box with probability proportional to its current content
plus one. Starting from the empty configuration this results in a Bose-Einstein
distribution for every stage.
Now let M be the Metropolis chain defined by the transition kernel (1.2) with
K as in (5.1), i.e. for every x in X±i (i 6= 0)
M(x, y) =
if y = x(j), j = 1...N
p2 if y = −x
(1− p1 − p2)K±i (x, y) if y ∈ X
i , y 6= x
z 6=xM(x, z) if y = x
while for x in X0
M(x, y) =
if y = x(j), j = 1...N
(1− p1)K0(x, y) if y ∈ X0, y 6= x
z 6=xM(x, z) if y = x.
METROPOLIS ALGORITHM AND EQUIENERGY SAMPLING 9
By construction M is an aperiodic, irreducible and reversible chain with stationary
distribution π. Then, when (5.2) holds true,
M(x, y) =
if y = x(j), j = 1...N
p2 if y = −x
(1− p1 − p2) 1( N(N−i)/2)
if y ∈ X±i , y 6= x
z 6=xM(x, z) if y = x
for x in X±i (i 6= 0), while if x belongs to X0
M(x, y) =
if y = x(j), j = 1...N
(1− p1) 1( NN/2)
if y ∈ X0, y 6= x
z 6=xM(x, z) if y = x.
In order to bound the spectral gap ofM we shall use the decomposition theorem
described in Subsection 3.3. To this end, for every i = 0, 2, . . . , N and every j 6= i
P̄ (i, j) :=
2π(Xi)
M(x, y)π(x)
P̄ (i, i) := 1−
j 6=i
P̄ (i, j).
As already noted, P̄ is a reversible chain on {0, 2, . . . , N} with stationary distribu-
π̄(i) := π(Xi).
Moreover define for every i = 0, 2, . . . , N a chain on Xi setting
PXi(x, y) :=M(x, y) + Ix(y)
z∈X c
M(x, z)
where both x and y belong to Xi. In the same way, define chains on X+i and X
for i = 2, . . . , N setting
(x, y) := PXi(x, y) (y 6= x, x, y ∈ X±i )
(x, x) := 1−
y 6=x
PXi(x, y).
These chains are reversible on Xi (X±i , respectively) and have as stationary distri-
butions
πXi(x) :=
π(Xi)
and πX±
(x) :=
πXi(x)
πXi(X±i )
|X±i |
10 FEDERICO BASSETTI AND FABRIZIO LEISEN
respectively. Finally, for every i = 2, 4, . . . , N , define a chain on {+,−} setting
Pi(+,−) :=
2πXi(X+i )
PXi(x, y)πXi(x)
Pi(−,+) :=
2πXi(X−i )
PXi(x, y)πXi(x).
Now the lower bound (3.4), applied two times yields
Gap(M) ≥ 1
Gap(P̄ ) min
i=0,2,...,N
{Gap(PXi)}
Gap(P̄ )min
Gap(PX0),
i=2,...,N
Gap(Pi)min{Gap(PX+
), Gap(PX−
(5.3)
Hence, to get a lower bound on Gap(M) it is enough to obtain bounds on the gaps
of the chains P̄ , PX0 , Pi, PX±
The most important of these bounds is given by the following
Proposition 5.1. P̄ is a birth and death chain on {0, 2, . . . , N}, more precisely
(5.4)
P̄ (0, 2) = p1
P̄ (i, i+ 2) = p1
i 6= N, 0
P̄ (i, i− 2) = p1
exp{2β(1− i)/N} i 6= 0.
Moreover
λ1(P̄ ) ≤ 1−
(N/2 + 1)3
λN/2(P̄ ) ≥ 1− p1.
The proof of the previous proposition is based on a bound for a birth and death
chain, given in the Appendix, which can be of its own interest.
As for the others chains, we have the following
Lemma 5.2. For every i = 2, 4, . . . , N
Gap(PX±
) ≥ (1− p1 − p2)Gap(K±i )
Gap(Pi) = p2,
moreover
Gap(PX0) ≥ (1− p1)Gap(K0).
In this way, using (5.3), we can prove the main result of this section.
Proposition 5.3. Let M be the Metropolis chain derived by the chain K defined
as in (5.1) then
Gap(M) ≥ p1p2
(N/2 + 1)3
[ (1− p1)
Gap(K0),
(1− p1 − p2)
min{Gap(K+i ), Gap(K
i, )}
If K±i and K0 are defined as in (5.2) then
Gap(M) ≥
(N/2 + 1)3
[ (1 − p1 − p2)
(1− p1)
for every β > 0 and N ≥ N0.
METROPOLIS ALGORITHM AND EQUIENERGY SAMPLING 11
Proposition 5.3 shows that the gap is polynomial in 1/N independently of β.
Hence, even when β > 1, the variance of the metropolis estimate obtained with this
proposal can not grow up faster than a polynomial in N .
Note that if in Proposition 5.3 we choose
(5.5) p1 = 1− a/(2N), p2 = a/N
we get
Gap(M) ≥ C
Hence, even with this choice, the Metropolis algorithm is still fast mixing for every
β. It is worth noticing that the mean computational cost of this Metropolis does
not change with respect to the Metropolis which uses the proposal KE. Indeed,
in the case of the usual Metropolis, the computational cost needed to go from Xn
to Xn+1 is O(N), since it is essentially due to a sample of one number among N
numbers (we need to decide which coordinate to flip). In the case of the ”modified”
proposal, things are slight more complex. In this case, at the beginning, we have
an extra “toss”. If with this fist toss we decide to flip at random a coordinate the
cost is still O(N) but if we need to sample from K±i the cost is O(N
2) (in this last
case we need to pick a sample from a Bose-Einstein distribution). Hence, although
our algorithm is ”sometime” more expensive, if we take p1 and p2 as in (5.5), we
get that the mean cost of our algorithm is still O(N).
6. The mean–field Blume-Emery-Griffiths model
The Blume-Emery-Griffiths (BEG) model (see [2]) is an important lattice–spin
model in statistical mechanics, it has been studied extensively as a model of many
diverse systems, including He3 − He4 mixtures as well solid–liquid–gas systems,
microemulsions, semiconductor alloys and electronic conduction models. See, for
instance, [2, 38, 23, 24, 31, 36, 21]. We will focus our attention on a simplified
mean–field version of the BEG model. For a mathematical treatment of this mean–
field model see [10]. In what follows let X := {−1, 0, 1}N , N being an even integer,
and for every β > 0 and K > 0 let πβ,K,N be the probability defined by
π(x) = πβ,K,N(x) = exp{−βRN(x) +
S2N(x)}Z−1N (β,K) (x ∈ X )
where
ZN(β,K) = ZN :=
−βRN (x) +
S2N (x)
is the normalization constant,
SN (x) :=
xi and RN (x) :=
x2i x = (x1, x2, ..., xN ).
A natural Metropolis algorithm can be derived by using the proposal chain
(6.1) KE(x, y) =
[I{x(+j)}(y) + I{x(−j)}(y)]
where x(±j) denotes the vector (x1, . . . , xj ± 1, . . . , xN ), with the convention that
2 = −1 and −2 = 1.
The next proposition shows that there exists a critical region of the parameters
space in which the Metropolis chain is slowly mixing. More precisely, using some
results of [10] it is quite straightforward to proove the following
12 FEDERICO BASSETTI AND FABRIZIO LEISEN
Proposition 6.1. Ler ME be the Metropolis chain (with stationary distribution π)
with proposal chain KE defined in (6.1).Then, there exists a non decreasing function
Γ : (0,+∞) → (0,+∞) with limx→0 Γ(x) = +∞ and limx→∞ Γ(x) = γc ≃ 1.082
such that for every couple of positive parametrs (β,K) with K > Γ(β)
Gap(ME) ≤ Ce−∆N
for suitable constants C = C(γ,K) > 0 and ∆ = ∆(γ,K) > 0.
As in the case of the mean–field Ising model, we intend to by pass the slowly
mixing problem of this Metropolis chain by choosing a different proposal. To un-
derstand which kind of proposal is reasonable, here we choose
G = SN × {+1,−1}
with G acting on X = {−1, 0, 1}N by
g(x) = (e · xσ(1), . . . , e · xσ(N)) g = (σ, e).
At this stage, decompose X as the union of its ”energy sets”, that is
X = X0,0 ∪ X1,1 ∪ X0,2 ∪ X1,3 ∪X3,3 ∪ ... ∪ X0,N ∪ X2,N ∪ ...XN,N
where
Xs,r := {x ∈ X : |SN | = s and RN (x) = r}
r = 0, 1, 2, ..., N and s = 1, 3, ..., r if r is odd and s = 0, 2, ..., N if r is even.
Moreover, for s = 1, 2, ..., N , set
X+s,r := {x ∈ X : SN = s and RN (x) = r}
X−s,r := {x ∈ X : SN = −s and RN (x) = r}.
Note again that Ox = Xs,r with s = SN (x) and r = RN (x). The new proposal
chain will be
K(x, y) = p1KE(x, y) + (1 − p1)K0,r(x, y) if x ∈ X0,r, r = 0, 2, ..., N
K(x, y) = p1KE(x, y) + p2I{−x}(y) + (1− p1 − p2)Ks,r(x, y)
if x ∈ Xs,r, s 6= 0
(6.2)
where p1, p2 belong to (0, 1), p1 + p2 < 1, and
Ks,r(x, y) = IX+s,r{x}K
s,r(x, y) + IX−s,r{x}K
s,r(x, y) (s 6= 0)
K0,r(x, y) =
) y ∈ X0,r
K±s,r(x, y) =
(r−s)/2
) y ∈ X±s,r.
(6.3)
Now let M be the Metropolis chain defined by the transition kernel (1.2) with K
as in (6.2), i.e. for every x in X±s,r (s 6= 0)
M(x, y) =
if y = x(±j), j = 1...N
p2 if y = −x
(1 − p1 − p2) 1(Nr )( r(r−s)/2)
if y ∈ X±s,r, y 6= x
z 6=xM(x, z) if y = x,
METROPOLIS ALGORITHM AND EQUIENERGY SAMPLING 13
while if x belongs to X0,r
M(x, y) =
if y = x(±j), j = 1...N
(1− p1) 1(Nr )( rr/2)
if y ∈ X0,r, y 6= x
z 6=xM(x, z) if y = x.
By construction M is an aperiodic, irreducible and reversible chain with stationary
distribution π.
Also in this case, to bound the spectral gap of M , we shall use the chain decom-
position tools. Let
DN = {(0, 0), (1, 1), (0, 2), (2, 2), (1, 3), (3, 3), (0, 4), (2, 4), (4, 4), ..., (0, N), (2, N), ..., (N,N)}
and, for every couple (s, r), (s̃, r̃) in DN , with (s, r) 6= (s̃, r̃), let
P̄ ((s, r), (s̃, r̃)) :=
2π(Xs,r)
x∈Xs,r
y∈Xs̃,r̃
M(x, y)π(x)
P̄ ((s, r), (s, r)) := 1−
(s̃,r̃) 6=(s,r)
P̄ ((s, r), (s̃, r̃)).
Once again, note that P̄ is a reversible chain on DN with stationary distribution
π̄(s, r) := π(Xs,r).
Moreover, for every (s, r) in DN , define a chain on Xs,r setting
PXs,r (x, y) :=M(x, y) + Ix(y)
z∈X cs,r
M(x, z)
where both x and y belong to Xs,r. In the same way, define chains on X+s,r and X−s,r
for (s, r) in DN , s 6= 0, setting
PX±s,r (x, y) := PXs,r(x, y) (y 6= x, x, y ∈ X
PX±s,r (x, x) := 1−
s,ry 6=x
PXs,r (x, y).
These chains are reversible on Xs,r (X±s,r , respectively) and have as stationary dis-
tributions
πXs,r(x) :=
π(Xs,r)
|Xs,r|
and πX±s,r (x) :=
πXs,r(x)
πXs,r (X±s,r)
|X±s,r|
respectively. Finally, for every (s, r) in DN , s 6= 0, define a chain on {+,−} setting
Ps,r(+,−) :=
2πXs,r(X+s,r)
PXs,r (x, y)πXs,r(x)
Ps,r(−,+) :=
2πXs,r(X−s,r)
PXs,r (x, y)πXs,r(x).
14 FEDERICO BASSETTI AND FABRIZIO LEISEN
At this stage, the lower bound (3.4), applied two times, yields
Gap(M) ≥ 1
Gap(P̄ ) min
(s,r)∈DN
Gap(PXs,r)
Gap(P̄ )min
r=0,2,...,N
Gap(PX0,r )
(s,r)∈DN ,s6=0
Gap(Ps,r)min{Gap(PX+s,r), Gap(PX−s,r)}
(6.4)
To derive from the last bound a more explicit bound we need some preliminary
work. The first result we need is exactly the analogous of Lemma 5.2.
Lemma 6.2. Fore every r = 1, . . . , N
Gap(PX0,r ) ≥ (1− p1)Gap(K0,r) = (1− p1),
moreover, for every (s, r) in DN with s 6= 0,
Gap(P±Xs,r) ≥ (1− p1 − p2)Gap(K
s,r) = (1− p1 − p2).
Finally, for every (s, r) in DN ,
Gap(Ps,r) = p2.
Hence, (6.4) can be rewritten as
(6.5) Gap(M) ≥ Gap(P̄ )p2
min{(1− p1)/2, (1− p1 − p2)/2}.
It remains to bound Gap(P̄ ). Unfortunately the the analogous of Proposition 5.1
is not so simple, hence we shall require an additional hypothesis. In what follows
q|[N ]|(r) :=
(r−2i)2
 if r is even
q|[N ]|(r) : =
(r−2i)2
 if r is odd
r = 0, 1, . . . , N and set
A = {β > 0,K > 0 : ∃N0 such that ∀N ≥ N0, q|[N ]| is unimodal}.
Lemma 6.3. For every (β,K) in A
Gap(P̄ ) ≥ Cp
for a suitable constant C = C(β,K).
Under the same assumptions of the previous Lemma we can state the main
results of this section.
Proposition 6.4. For every (β,K) in A
Gap(M) ≥ C̃p
for a suitable constant C̃ = C̃(β,K).
METROPOLIS ALGORITHM AND EQUIENERGY SAMPLING 15
0 5 10 15
β=2.3, K=0.5,0.6,0.7,0.8
0 5 10 15
β=2.3,2.4,2.5,2.6, K=0.5
0 5 10 15
β=2.3, K=1.2,1.3,1.4,1.5
0 5 10 15
β=2.3,2.4,2.5,2.6, K=1.2
Figure 1. The function q|[N ]| for N = 15 and few values of β and K.
We conjecture that Gap(P̄ ) is polynomial in N for every (β,K) such that β 6=
Γ(K) (where Γ is the function of Proposition 6.1), but we are not able to prove this
conjecture. In point of fact we conjecture that R+×R+ \{(β,K) : Γ(K) = β} ⊂ A.
We plotted q|[N ]| for different N , β andK, and these plotts seem, at least, to confirm
that R+ × R+ \ {(β,K) : |Γ(K)− β| ≤ ǫ} ⊂ A for a suitable small ǫ. In Figure 1
we show the graph of q|[N ]| for few different N , β and K.
Appendix A. The Spectral Gap of a Birth and Death Chain
We derive here some bounds on the eigenvalues of a birth and death chain that we
shall use later. These bounds are obtained using the so called geometric techniques,
see [7]. Let Pn be a birth and death chain on Ωn = {1, . . . , n}. Assume that Pn
is reversible with respect to a probability pn, that is pn(i)Pn(i, j) = pn(j)Pn(j, i).
Moreover let
1 > λ1 ≥ λ2 ≥ . . . λn−1 ≥ −1
the eigenvalues of Pn.
We can now prove the following variant of Proposition 6.3 in [6].
Lemma A.1. If there exist positive constants A, q, B and an integer k such that
Pn(i, i± 1) ≥ An−q (i 6= 1, n)
Pn(1, 2) ≥ An−q
Pn(n, n− 1) ≥ An−q
pn(i) ≤ Bpn(j) i ≤ j ≤ k
pn(j) ≤ Bpn(i) k ≤ i ≤ j
λ1 ≤ 1−
16 FEDERICO BASSETTI AND FABRIZIO LEISEN
Proof. We use the notation and the techniques of [7], see also [3] and [6]. Choose
the set of paths
Γ = {γij = (i, i+ 1, ..., j); i ≤ j; i, j ∈ Ωn}
and for e = (i, i+ 1) (i < n) let
ψ(e) =
pn(i, i+ 1)
γl,m∈Γ
γl,m∋e
|γl,m|
pn(l)pn(m)
pn(i)
where |γ| is the length of the path γ. Setting K := supe ψ(e) one has
λ1 ≤ 1−
(see Proposition 1’ in [7], or Exercise 6.4 page 248 in [3]). So, for our purposes, it
suffices to give an upper bound on K. Assume first that e = (i, i+ 1) with
i < k ≤ n, since |γl,m| ≤ n, it follows that
ψ(e) ≤ n
s≥i+1
pn(r)pn(s)
pn(i)
pn(r)
pn(i)
s≥i+1
pn(s)
pn(s)
≤ nq+2
All the other cases can be treated in the same way. Hence,
ψ(e) ≤ B
and then
λ1 ≤ 1−
As for the smaller eigenvalues, Gershgorin theorem yields that
λn−1 ≥ −1 + 2min
P (i, i).
See, for instance, Corollary 2.1 in the Appendix of [3]. Hence, if there exists a
positive constant D such that
Pn(i, i+ 1) + Pn(i, i− 1) ≤ D/2
for every i, then
(A.1) λn−1 ≥ 1−D.
METROPOLIS ALGORITHM AND EQUIENERGY SAMPLING 17
Appendix B. Proofs
To prove Proposition 5.1 we need first to show that π̄ is essentially unimodal.
Lemma B.1. Let
qN (i) =
i = 0, 2, 4, . . . , N.
For every β < 1 there exists an integer N0 such that for every N ≥ N0
qN (i) ≤ qN (j)
whenever j ≤ i. For every β ≥ 1 there exists an integer N0 such that for every
N ≥ N0
qN (i) ≤ qN (j)
whenever i ≤ j ≤ kN and
qN (i) ≥ qN (j)
whenever kN ≤ i ≤ j, kN being a suitable integer.
Proof. Let ∆N (i) be the ratio
∆N (i) =
qN (i+ 2)
qN (i)
i = 0, 2, 4, ..., N − 2,
so that
∆N (i) =
) exp
(1 + i)
N − i
N + 2 + i
(1 + i)
Setting ∆N (x) =
N+2+x
(1 + x)
, x in [0, N−2], it is enough to prove that
x 7→ ∆N (x) takes the value 1 at most once in [0, N − 2], for sufficiently large N .
To prove this last claim first note that
∆N (0) =
N + 2
1 + 2
= 1− 2
(1 − β) +
− β + 2
Hence, there exists N0 in N such that for N ≥ N0:
β ≥ 1 ⇒ ∆N (0) > 1
β < 1 ⇒ ∆N (0) < 1.
As for the first derivative note that
∆′N (x) =
−2(N + 1) + 2β(N + 2)− 2β
(x2 + 2x)
(N + x+ 2)2
(1 + x)
hence ∆′N (x) = 0 if and only if
−2(N + 1) + 2β(N + 2)− 2β
(x2 + 2x) = 0.
18 FEDERICO BASSETTI AND FABRIZIO LEISEN
Rearranging the last equation as
x2 − 4β
+ 2[(β − 1)N + 2β − 1] = 0
one sees that the roots are
x1,2 = 1±
2β − 1
β − 1
Hence, after setting
r := 1 +
2β − 1
β − 1
N2 and r := 1 +
one has
β < 1 ⇒ ∆′N (x) < 0 ∀x ∈ [0, N − 2]
β > 1 ⇒ ∆′N (x) > 0 for x ∈ [0, r)
∆′N (x) < 0 for x ∈ (r,N − 2]
β = 1 ⇒ ∆′N (x) < 0 for x ∈ [0, r)
∆′N (x) < 0 for x ∈ (r,N − 2]
and this concludes the proof. �
Proof of Proposition 5.1. By direct computations it is easy to prove (5.4). Hence
P (i, i± 2) ≥
4(N + 2)
P (i, i+ 2) + P (i, i− 2) ≤ p1
Now observe that
π̄(0) =
ZN (β)
qN (0)
π̄(i) =
ZN (β)
qN (i) i 6= 0.
Hence, by Lemma B.1, if β < 1
π̄(i) ≤ 2π̄(j)
whenever j ≤ i and N is large enough. While for β > 1
π̄(i) ≤ π̄(j)
whenever i ≤ j ≤ kN and
π̄(i) ≥ π̄(j)
whenever kN ≤ i ≤ j. The thesis follows now by Lemma A.1 and by (A.1). �
In order to prove Lemma 5.2 we recall that by Rayleigh’s theorem
(B.1) 1− λ1(P ) = inf
Ep(f, f)
V arp(f)
: f nonconstant
where
Ep(f, f) :=< (I − P )f, f >p=
(f(x) − f(y))2P (x, y)p(x),
METROPOLIS ALGORITHM AND EQUIENERGY SAMPLING 19
P being a reversible chain w.r.t. p, moreover
(B.2) 1− |λN−1| = inf
x,y(f(x) + f(y))
2P (x, y)p(x)
V arp(f)
: f nonconstant
(see, for instance, Theorem 2.3 in Chapter 6 of [3] and Section 2.1 of [8]). At this
stage set
Pǫ(x, y) := (1 − ǫ)P (x, y) + ǫIx(y).
Hence, (B.1) yields
1− λ1(Pǫ) = inf
f∈L2pf 6=const
x,y(f(x) − f(y))2Pǫ(x, y)p(x)
V arp(f)
= inf
f∈L2pf 6=const
(1− ǫ)
x 6=y(f(x)− f(y))2P (x, y)p(x)
V arp(f)
= (1− ǫ)(1− λ1(P )).
Arguing in the same way and using (B.2) we get
1− |λ|X |−1(Pǫ)| ≥ (1 − ǫ)(1− |λ|X |−1(P )|).
Hence,
(B.3) Gap(Pǫ) ≥ (1− ǫ)Gap(P ).
Proof of Lemma 5.2. Note that
(x, y) = (1 − p1 − p2)K±i (x, y) + (p1 + p2)Ix(y)
and, analogously,
PX0(x, y) = (1 − p1)K0(x, y) + p1Ix(y).
Hence, by (B.3),
Gap(PX±
) ≥ (1− p1 − p2)Gap(K±i )
as well
Gap(PX0) ≥ (1− p1)Gap(K0).
Finally note that Pi is given by
1− p2
1− p2
for every i, hence Gap(Pi) = p2. �
Proof of Proposition 5.3. To prove the first part of the proposition it is enough to
combine Lemma 5.2, Proposition 5.1 and (5.3). To complete the proof observe that
Gap(K±i ) = Gap(K0) = 1, when K
i and K0 are given by (5.2). �
In order to prove Proposition 6.1 we need some results obtained in [10].
Theorem B.2 (Ellis-Otto-Touchette). Let ρN be the distribution of SN (x)/N un-
der πβ,K,N , then ρN satisfies a large deviation principle on [−1, 1] with rate function
Ĩβ,K(z) = Jβ(z)− βKz2 − inf
{Jβ(t)− βKt2}
Jβ(z) = sup
tz − log
1 + e−β(et + e−t
1 + 2e−β
Moreover, if Ẽβ,K := argminĨβ,K, then there exists a non decreasing function Γ :
(0,+∞) → (0,+∞) with limx→0 Γ(x) = +∞ and limx→∞ Γ(x) = γc ≃ 1.082 such
that for every (β,K) with K > Γ(β) then
Ẽβ,K = {±z(β,K) 6= 0}.
20 FEDERICO BASSETTI AND FABRIZIO LEISEN
In particular, for such (β,K) and for every 0 < ǫ < |z(β,K)| there exists a constant
C1 = C1(ǫ, β,K) such that
(B.4) ρ([0, ǫ]) ≤ C1 exp{−
γǫ,β,K}
(B.5) γǫ,β,K = inf
z∈[0,ǫ]
Ĩβ,K(z) > 0.
Proof. For the first part see Theorems 3.3, 3.6 and 3.8 in [10]. As for (B.4)-(B.5),
they are standard consequences of the theory of the large deviations and of the first
part of the proposition, see, e.g., Proposition 6.4 of [11]. �
Proof of Proposition 6.1. We intend to use the Chegeer’s inequality. To do this, let
A := {x : SN (x) < 0}, B := {x : SN (x) > 0}, C := {SN (x) = 0}. First of all note
that, by symmetry, π(A) = π(B) = (1−π(C))/2 ≤ 1/2. The main task is to bound
φ(A) =
π(x)ME(x, y) =
π(y)ME(y, x).
Now, observe that if SN (y) > 1 then ME(y, x) = 0 for every x in A, hence
φ(A) =
y:SN (y)=0
ME(y, x) +
y:SN(y)=1
ME(y, x)
≤ π {y : SN (y) ∈ {0, 1}} .
This yields a bound on the conductance
h = h(π,ME) ≤ φ(A)/π(A) ≤
2π {y : SN (y) ∈ {0, 1}}
1− π{y : SN (y) = 0}
Now by Proposition B.2 we get
h(π,ME) ≤ C2e−∆N
for suitable constants C2 and ∆ > 0. The thesis follows by Cheeger inequality
(3.2). �
Proof of Lemma 6.2. The proof is exactly the same as the proof of Lemma 5.2. �
In order to prove Lemma 6.3 it is convenient to fix some simple properties of the
chain P̄ .
Lemma B.3. P̄ is a random walk on DN . If P̄ ((s, r), (s̃, r̃)) 6= 0,
P̄ ((s, r), (s̃, r̃)) ≥ p1C3
for a suitable constant C3 = C3(β,K), moreover
P̄ ((s, r), (s̃, r̃)) ≤ p1
for every (s, r), (s̃, r̃)) 6= ((0, 0), (1, 1)).
Proof of Lemma B.3. Easy but tedious computations show that
P̄ ((0, 0), (1, 1)) =
1, exp{Kβ
P̄ ((0, N), (1, N − 1)) = p1
P̄ ((0, N), (2, N)) =
METROPOLIS ALGORITHM AND EQUIENERGY SAMPLING 21
P̄ ((0, r), (2, r)) =
r = 0, 2, 4, ..., N − 2
P̄ ((0, r), (1, r − 1)) = p1
r = 0, 2, 4, ..., N − 2
P̄ ((0, r), (1, r + 1)) =
1, exp{Kβ
r = 0, 2, 4, ..., N − 2
P̄ ((s, r), (s + 2, r)) =
(r − s)
(s, r) ∈ DN , 0 < s ≤ N − 2, r ≤ N
P̄ ((s, r), (s − 2, r)) = p1
(r + s) exp{4Kβ
(1 − s)}
(s, r) ∈ DN , 0 < s ≤ N, r ≤ N
P̄ ((s, r), (s + 1, r + 1)) =
(N − r)min
1, exp{Kβ
(2s+ 1)− β}
(s, r) ∈ DN , 0 < s, r ≤ N − 1,
P̄ ((s, r), (s − 1, r + 1)) = p1
(N − r) exp{Kβ
(−2s+ 1)− β}
(s, r) ∈ DN , 0 < s, r ≤ N − 1,
P̄ ((s, r), (s + 1, r − 1)) = p1
(r − s)
(s, r) ∈ DN , 0 < r ≤ N, 0 < s ≤ N − 2
P̄ ((s, r), (s − 1, r − 1)) = p1
(r + s)min
1, exp{Kβ
(2s+ 1)− β}
(s, r) ∈ DN , 0 < r ≤ N, 0 < s ≤ r.
At this stage the statement follows easily. �
Proof of Lemma 6.3. In order to obtain a bound on the gap of P̄ we shall apply
another time the decomposition technique. Write
DN = X̄1 ∪ X̄2 ∪ X̄3 ∪ ... ∪ X̄N ,
where
X̄1 = {(0, 0), (1, 1)} X̄r = {(u1, u2) ∈ Dn : u2 = r}.
On |[N ]| := {1, ..., N} define a chain P|[N ]| setting
P|[N ]|(i, j) :=
2π̄(X̄i)
a∈X̄i
b∈X̄j
P̄ (a, b)π̄(a)
P|[N ]|(i, i) := 1−
j 6=i
P|[N ]|(i, j).
Again P|[N ]| is a reversible chain on |[N ]| with stationary distribution
π̄|[N ]|(i) := π̄(X̄i).
Finally for every r = 1, 2, . . . , N we define a chain on X̄r by setting
PX̄r(a, b) := P̄ (a, b) + Ia(b)
z∈X̄ cr
P̄ (a, z)
where both a and b belong to X̄r. Now note that for every r = 2, 3, . . . , N PX̄r is
a birth and death chain on the state space {(1, r), (3, r), . . . , (r, r)} for r odd and
22 FEDERICO BASSETTI AND FABRIZIO LEISEN
{(0, r), (2, r), . . . , (r, r)} for r even. Let
qr(s) :=
(r − s)/2
and, for r even,
qr(0) := 2
Now observe that PX̄r has stationary distribution
πr(s) ∝ qr(s)
with s = 0, 2, . . . , r if r is even and s = 1, 3, . . . , r if r is odd. First of all let r 6= 1, by
Lemma B.3 and Lemma B.1, it is easy to check that (PX̄r , πr) meets the condition
of Lemma A.1 with
B = 2, n = [(r + 2)/2], A = C3p1[(r + 2)/2]N
([x] being the integer part of x) and then
1− λ1(PX̄r ) ≥
C3p1[(r + 2)/2]
2N [(r + 2)/2]3
Finally, Lemma B.3 with (A.1) yields
λ|X̄r|−1(PX̄r ) ≥ 1− p1.
Hence, for every r 6= 1, we have proved that
(B.6) Gap(PX̄r) ≥ C3/2p1N
For r = 1
PX̄1 =
1− α1/2 α1/2
α2/2 1− α2/2
where
α1 :=
1, exp{
α2 := p1 min
1, exp{Kβ
Gap(PX̄1) ≥ 1− |
2− α1 − α2
| = α1 + α2
where the last equality follows from the fact that α1
and α2
. Hence, for
sufficiently large N , it’s easy to see that
(B.7) Gap(PX̄1) ≥ C4p1N
with C4 = C4(β,K). At this stage (B.6) with (B.7) gives
(B.8) Gap(PX̄r) ≥ C5p1N
for all r ∈ |[N ]|. As for the gap of P|[N ]|, first of all note that P|[N ]| is a birth and
death chain on |[N ]|. From Lemma B.3
P|[N ]|(i, i+1) :=
2π̄(X̄i)
a∈X̄i
b∈X̄i+1
P̄ (a, b)π̄(a) ≥ p1C3
2π̄(X̄i)
a∈X̄i
b∈X̄i+1
π̄(a) ≥ p1C3
and analogously,
P|[N ]|(i, i− 1) ≥
Now, for r 6= 1
π̄|[N ]|(r) = q|[N ]|(r)/(
q|[N ]|(i))
METROPOLIS ALGORITHM AND EQUIENERGY SAMPLING 23
while
π̄|[N ]|(1) = (q|[N ]|(1) + q|[N ]|(0))/(
q|[N ]|(i)).
So, using the unimodality of q|[N ]|, we can apply Lemma B.3 with
which gives
λ1(P|[N ]|) ≤ 1−
≤ 1− p1C3
Using another time Lemma B.3, by (A.1), we get
λN (P|[N ]|) ≥ 1− p1.
Combining this two bounds we have
(B.9) Gap(P|[N ]|) ≥
and so from (3.4)
Gap(P̄ ) ≥
C being a suitable constant that depends by β,K,C3, C4, C5. �
Proof of Proposition 6.4. Combine Lemma 6.3 with 6.5. �
Acknowledgments
We should like to thank Persi Diaconis for useful discussions and for having
encouraged us during this work, Antonietta Mira for suggesting some interesting
references and Claudio Giberti for helping to improve an earlier version of the paper.
References
[1] J. Besag and P. J. Green. Spatial statistics and Bayesian computation. J. Roy. Statist. Soc.
Ser. B, 55(1):25–37, 1993.
[2] M. Blume, V. J. Emery, and R. B. Griffiths. Ising model for the λ transition and phase
separation in he3-he4 mixtures. Phys. Rev. A, 4:1071–1077, 1971.
[3] P. Bremaud. Markov Chains. Springer-Verlag, New York, 1998.
[4] S. Caracciolo, A. Pelissetto, and A. D. Sokal. Nonlocal Monte Carlo algorithm for self-avoiding
walks with fixed endpoints. J. Statist. Phys., 60(1-2):1–53, 1990.
[5] S. Caracciolo, A. Pelissetto, and A. D. Sokal. Dynamic critical exponent of the BFACF
algorithm for self-avoiding walks. J. Statist. Phys., 63(5-6):857–865, 1991.
[6] P. Diaconis and L. Saloff-Coste. What do we know about the Metropolis algorithm? J.
Comput. System Sci., 57(1):20–36, 1998. 27th Annual ACM Symposium on the Theory of
Computing (STOC’95) (Las Vegas, NV).
[7] P. Diaconis and D. Stroock. Geometric bounds for eigenvalues of Markov chains. Ann. Appl.
Probab., 1(1):36–61, 1991.
[8] M. Dyer, L. A. Goldberg, M. Jerrum, and R. Martin. Markov chain comparison. Probab.
Surv., 3:89–111 (electronic), 2006.
[9] R. G. Edwards and A. D. Sokal. Generalization of the Fortuin-Kasteleyn-Swendsen-Wang
representation and Monte Carlo algorithm. Phys. Rev. D (3), 38(6):2009–2012, 1988.
[10] R. S. Ellis, P. T. Otto, and H. Touchette. Analysis of phase transitions in the mean-field
Blume-Emery-Griffiths model. Ann. Appl. Probab., 15(3):2203–2254, 2005.
[11] R.S. Ellis. The Theory of Large Deviation and Applications to Statistical Mechanics.
2006. Lectures for the international seminar on Extreme Events in Complex Dynamics.
http://www.math.umass.edu/ ˜ rsellis/pdf-files/Dresden-lectures.pdf Max-Planck-Institut.
[12] A. E. Gelfand, S. K. Sahu, and B. P. Carlin. Efficient parameterisations for normal linear
mixed models. Biometrika, 82(3):479–488, 1995.
[13] A. Gelman, G. O. Roberts, and W. R. Gilks. Efficient Metropolis jumping rules. In Bayesian
statistics, 5 (Alicante, 1994), Oxford Sci. Publ., pages 599–607. Oxford Univ. Press, New
York, 1996.
http://www.math.umass.edu/
24 FEDERICO BASSETTI AND FABRIZIO LEISEN
[14] C. J. Gey er and E. A. Thompson. Annealing markov chain monte carlo with applications to
ancestral inference. J. Amer. Statist. Assoc., 90:909–920, 1995.
[15] Y. Guan, R. Fleißner, P. Joyce, and S. M. Krone. Markov chain Monte Carlo in small worlds.
Stat. Comput., 16(2):193–202, 2006.
[16] Y. Guan and S. M. Krone. Small-world mcmc and convergence to multi-modal distributions:
From slow mixing to fast mixing. Ann. Appl. Probab., 17(1):284–304, 2007.
[17] J. M. Hammersley and D. C. Handscomb. Monte Carlo methods. Methuen & Co. Ltd., Lon-
don, 1965.
[18] W.K. Hastings. Monte carlo sampling methods using markov chains and their application.
Biometrika, 57:97–109, 1970.
[19] K. Hukushima and K. Nemoto. Exchange monte carlo method and application to spin glass
simulations. J.Phys.Soc.Jpn., 65:1604–1608, 1996.
[20] M. Jerrum, J. Son, P. Tetali, and E. Vigoda. Elementary bounds on Poincaré and log-Sobolev
constants for decomposable Markov chains. Ann. Appl. Probab., 14(4):1741–1765, 2004.
[21] S. A. Kivelson, V. J. Emery, and H. Q. Lin. Doped antiferromagnets in the weak-hopping
limit. Phys. Rev. B, 42:6523–6530, 1990.
[22] S. C. Kou, Qing Zhou, and Wing Hung Wong. Equi-energy sampler with applications in
statistical inference and statistical mechanics. Ann. Statist., 34(4):1581–1619, 2006.
[23] J. Lajzerowicz and J. Sivardiére. Spin 1 lattice gas model. ii. condensation and phase sepa-
ration in a binary fluid. Phys. Rev. A, 11:2090–2100, 1975.
[24] J. Lajzerowicz and J. Sivardiére. Spin 1 lattice gas model. iii. tricritical points in binary and
ternary fluids. Phys. Rev. A, 11:2101–2110, 1975.
[25] J. S. Liu. Monte Carlo strategies in scientific computing. Springer Series in Statistics.
Springer-Verlag, New York, 2001.
[26] N. Madras and M. Piccioni. Importance sampling for families of distributions. Ann. Appl.
Probab., 9(4):1202–1225, 1999.
[27] N. Madras and D. Randall. Markov chain decomposition for convergence rate analysis. Ann.
Appl. Probab., 12(2):581–606, 2002.
[28] E. Marinari and G. Parisi. Simulated tempering: a new monte carlo scheme. Europhy s. Lett,
19:451–458, 1992.
[29] N. Metrpolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller. Equation of state
caluclations by fast computing machines. J. Chem. Phys., 21:1087–1092, 1953.
[30] A. Mira and L. Tierney. Efficiency and convergence properties of slice samplers. Scand. J.
Statist., 29(1):1–12, 2002.
[31] K.E. Newman and J.D. Dow. Zinc blende diamond order disorder transition in metastable
crystalline (gaas)1-xge2x alloys. Phys. Rev. B, 27:7495–7508, 1983.
[32] P. H. Peskun. Optimum Monte-Carlo sampling using Markov chains. Biometrika, 60:607–612,
1973.
[33] P. H. Peskun. Guidelines for choosing the transition matrix in Monte Carlo methods using
Markov chains. J. Comput. Phys., 40(2):327–344, 1981.
[34] C. P. Robert and G. Casella. Monte Carlo statistical methods. Springer Texts in Statistics.
Springer-Verlag, New York, second edition, 2004.
[35] R. Y. Rubinstein. Simulation and the Monte Carlo method. John Wiley & Sons Inc., New
York, 1981. Wiley Series in Probability and Mathematical Statistics.
[36] M. Schick and Wei-Heng Shih. Spin 1 model of a microemulsion. Phys. Rev. B, 34:1797–1801,
1986.
[37] A. Sinclair. Algorithms for random generation and counting. Progress in Theoretical Com-
puter Science. Birkhäuser Boston Inc., Boston, MA, 1993. A Markov chain approach.
[38] J. Sivardiére and J. Lajzerowicz. Spin 1 lattice gas model. i. condensation and solidification
of a simple fluid. Phys. Rev. A, 11:2090–2100, 1975.
[39] A. Sokal. Monte Carlo methods in statistical mechanics: foundations and new algorithms. In
Functional integration (Cargèse, 1996), volume 361 of NATO Adv. Sci. Inst. Ser. B Phys.,
pages 131–192. Plenum, New York, 1997.
[40] R.H. Swendsen and J. S. Wang. Non–universal critical dynamics in monte carlo simulations.
Phy. Rev. Lett., pages 86–88, 1987.
Università degli Studi di Pavia, Dipartimento di Matematica, via Ferrata 1, 27100
Pavia, Italy
Università dell’Insubria, Dipartimento di economia, via monte generoso 71, 21100
Varese , Italy
E-mail address: federico.bassetti@unipv.it
E-mail address: leisen.fabrizio@unimore.it
	1. Introduction.
	2. A general strategy
	3. Preliminaries
	3.1. Spectral gap, variance and speed of convergence
	3.2. Cheeger's inequality
	3.3. Chain decomposition theorem
	4. Warming up example
	5. The mean field Ising model
	6. The mean–field Blume-Emery-Griffiths model
	Appendix A. The Spectral Gap of a Birth and Death Chain
	Appendix B. Proofs
	Acknowledgments
	References
ABSTRACT
  In this paper we study the Metropolis algorithm in connection with two
mean--field spin systems, the so called mean--field Ising model and the
Blume--Emery--Griffiths model. In both this examples the naive choice of
proposal chain gives rise, for some parameters, to a slowly mixing Metropolis
chain, that is a chain whose spectral gap decreases exponentially fast (in the
dimension $N$ of the problem). Here we show how a slight variant in the
proposal chain can avoid this problem, keeping the mean computational cost
similar to the cost of the usual Metropolis. More precisely we prove that, with
a suitable variant in the proposal, the Metropolis chain has a spectral gap
which decreases polynomially in 1/N. Using some symmetry structure of the
energy, the method rests on allowing appropriate jumps within the energy level
of the starting state.

<|endoftext|><|startoftext|>
Experimental test of the high frequency quantum shot noise theory in a Quantum
Point Contact
E. Zakka-Bajjani, J. Ségala, F. Portier,∗ P. Roche, and D. C. Glattli†
Nanoelectronic group, Service de Physique de l’Etat Condensé,
CEA Saclay, F-91191 Gif-Sur-Yvette, France
A. Cavanna and Y. Jin
CNRS, Laboratoire de Photonique et Nanostructures ,
Route de Nozay, F-91460 Marcoussis, France
(Dated: November 9, 2018)
We report on direct measurements of the electronic shot noise of a Quantum Point Contact
(QPC) at frequencies ν in the range 4-8 GHz. The very small energy scale used ensures energy
independent transmissions of the few transmitted electronic modes and their accurate knowledge.
Both the thermal energy and the QPC drain-source voltage Vds are comparable to the photon
energy hν leading to observation of the shot noise suppression when Vds < hν/e. Our measurements
provide the first complete test of the finite frequency shot noise scattering theory without adjustable
parameters.
PACS numbers: 73.23.-b,73.50.Td,42.50.-p,42.50.Ar
Pauli’s exclusion principle has striking consequences on
the properties of quantum electrical conductors. In an
ideal quantum wire, it is responsible for the quantization
of the conductance by requiring that at most one electron
(or two for spin degeneracy) occupies the regularly time-
spaced wave-packets emitted by the contacts and propa-
gating in the wire [1]. Concurrently, at zero temperature,
the electron flow is noiseless [2, 3] as can be observed in
ballistic conductors [4, 5, 6]. In more general quantum
conductors, static impurities diffract the noiseless elec-
trons emitted by the contacts. This results in a partition
of the electrons between transmitted or reflected states,
generating quantum shot noise [1, 2, 3, 7, 8]. However,
Pauli’s principle possesses more twists to silence elec-
trons. At finite frequency ν, detection of current fluctu-
ations in an external circuit at zero temperature requires
emission of photons corresponding to a finite energy cost
hν [9]. For drain-source contacts biased at voltage Vds,
a sharp suppression is expected to occur when the pho-
ton energy hν is larger than eVds as an electron emit-
ted by the source can not find an empty state in the
drain to emit such a photon [9, 10, 11]. Another striking
consequence of Pauli’s principle is the prediction of non-
classical photon emission for a conductor transmitting
only one or few electronic modes. It has been shown that
in the frequency range eVds/2h < ν < eVds/h, the popu-
lation of a photon mode obeys a sub-Poissonian statistics
inherited from the electrons [12]. Investigating quantum
shot noise in this high frequency regime using a Quan-
tum Point Contact (QPC) to transmit few modes is thus
highly desirable.
The first step is to check the validity of the above pre-
diction based on a non-interacting picture of electrons.
For 3D or 2D wide conductors with many quantum chan-
nels which are good Fermi liquids, one expects this non-
interacting picture to work well. Indeed, the eVds/h sin-
gularity has been observed in a 3D diffusive wire in the
shot noise derivative with respect to bias voltage [13].
However, for low dimensional systems like 1D wires or
conductors transmitting one or few channels, electron in-
teractions give non-trivial effects. Long 1D wires defined
in 2D electron gas or Single Wall Carbon Nanotubes be-
come Luttinger liquids. Long QPCs exhibit a 0.7 con-
ductance anomaly [14], and a low frequency shot noise
[15] compatible with Kondo physics [16]. Consequently,
new characteristic frequencies may appear in shot noise
reflecting electron correlations. Another possible failure
of the non-interacting finite frequency shot noise model
could be the back-action of the external circuit. For high
impedance circuits, current fluctuations implies potential
fluctuations at the contacts [17]. Also, the finite time re-
quired to eliminate the sudden drain-source charge build-
up after an electron have passed through the conductor
leads to a dynamical Coulomb blockade for the next elec-
tron to tunnel. A peak in the shot noise spectrum at the
electron correlation frequency I/e is predicted for a tun-
nel junction connected to a capacitive circuit [18]. Other
timescales may also be expected which affect both con-
ductance [19] and noise [20] due to long range Coulomb
interaction or electron transit time. This effects have
been recently observed for the conductance [21] .
The present work aims at giving a clear-cut test of
the non-interacting scattering theory of finite frequency
shot noise using a Quantum Point Contact transmitting
only one or two modes in a weak interaction regime. It
provides the missing reference mark to which further ex-
periments in strong interaction regime can be compared
in the future. We find the expected shot noise suppres-
sion for voltages ≤ hν/e in the whole 4-8 GHz frequency
range. The data taken for various transmissions perfectly
http://arxiv.org/abs/0704.0907v4
agree with the finite temperature, non-interacting model
with no adjustable parameter. In addition to provide
a stringent test of the theory, the technique developed
is the first step toward the generation of non-classical
photons with QPCs in the microwave range [12]. The
detection technique uses cryogenic linear amplification
followed by room temperature detection. The electron
temperature much lower than hν/kB, the small energy
scale used (eVds ≪ 0.02EF ) ensuring energy independent
transmissions, the high detection sensitivity, and the ab-
solute calibration allow for direct comparison with the-
ory without adjustable parameters. Our technique differs
from the recent QPC high frequency shot noise measure-
ments using on-chip Quantum Dot detection in the 10-
150 GHz frequency range [22]. Although most QPC shot
noise features were qualitatively observed validating this
promising method, the lack of independent determina-
tion of the QPC-Quantum Dot coupling, and the large
voltage used from 0.05 to 0.5EF making QPC transmis-
sions energy dependent, prevent quantitative comparison
with shot noise predictions. However, Quantum Dot de-
tectors can probe the vacuum fluctuations via the stim-
ulated noise while the excess noise detected here only
probes the emission noise [9, 10].
The experimental set-up is represented in fig. 1. A
two-terminal conductor made of a QPC realized in a
2DEG in GaAs/GaAlAs heterojunction is cooled at 65
mK by a dilution refrigerator and inserted between two
transmission lines. The sample characteristics are a 35
nm deep 2DEG with 36.7 m2V−1s−1 mobility and 4.4
1015 m−2 electron density. Interaction effects have been
minimized by using a very short QPC showing no sign
of 0.7 conductance anomaly. In order to increase the
sensitivity, we use the microwave analog of an optical re-
flective coating. The contacts are separately connected
to 50 Ω coaxial transmission lines via two quarter wave
length impedance adapters, raising the effective input
impedance of the detection lines to 200 Ω over a one
octave bandwidth centered on 6 GHz. The 200 Ω elec-
tromagnetic impedance is low enough to prevent dynam-
ical Coulomb blockade but large enough for good cur-
rent noise sensitivity. The transmitted signals are then
amplified by two cryogenic Low Noise Amplifiers (LNA)
with Tnoise ≃ 5K. Two rf-circulators, thermalized at
mixing chamber temperature protect the sample from
the current noise of the LNA and ensure a circuit en-
vironment at base temperature. After further amplifi-
cation and eventually narrow bandpass filtering at room
temperature, current fluctuations are detected using two
calibrated quadratic detectors whose output voltage is
proportional to noise power. Up to a calculable gain
factor, the detected noise power contains the weak sam-
ple noise on top of a large additional noise generated by
the cryogenic amplifiers. In order to remove this back-
ground, we measure the excess noise ∆SI(ν, T, Vds) =
SI(ν, T, Vds) − SI(ν, T, 0). Practically, this is done by
4.2 K 
adjustable 
filters
VDrain-Source
50/200 Ω
λ/4 adapter
65 mK
circulator
FIG. 1: Schematic diagram of the measurement set-up. See
text for details.
applying a 93 Hz 0-Vds square-wave bias voltage on the
sample through the DC input of a bias-T, and detecting
the first harmonic of the square-wave noise response of
the detectors using lock-in techniques. In terms of noise
temperature referred to the 50 Ω input impedance, an
excess noise ∆SI(ν, T, Vds) gives rise to an excess noise
temperature
∆T 50Ωn (ν, T, Vds) =
ZeffZ
sample∆SI(ν, T, Vds)
(2Zeff + Zsample)2
. (1)
Eq. 1 demonstrates the advantage of impedance match-
ing : in the high source impedance limit Zsample ≫ Zeff ,
the increase in noise temperature due to shot noise is
proportional to Zeff . Our set up (Zeff = 200Ω) is thus
four times more efficient than a direct connection of the
sample to standard 50 Ω transmission lines. Finally, the
QPC differential conductance G is simultaneously mea-
sured through the DC input of the bias-Tee using low
frequency lock-in technique.
The very first step in the experiment is to characterize
the QPC. The inset of fig. 4 shows the differential con-
ductance versus gate voltage when the first two modes
are transmitted. As the experiment is performed at zero
magnetic field, the conductance exhibits plateaus quan-
tized in units of G0 = 2e
2/h. The short QPC length
(80 nm) leads to a conductance very linear with the
low bias voltage used (δG/G ≤ 6% for Vds ≤ 80µV for
G ≃ 0.5G0). It is also responsible for a slight smooth-
ing of the plateaus. Each mode transmission is extracted
from the measured conductance (open circles) by fitting
with the saddle point model (solid line) [23].
We then set the gate voltage to obtain a single mode
at half transmission corresponding to maximum electron
partition (G ≃ 0.5G0). Fig. 2 shows typical excess noise
measured at frequencies 4.22 GHz and 7.63 GHz and
bandwidth 90 MHz and 180 MHz. We note a striking
suppression of shot noise variation at low bias voltage,
and that the onset of noise increases with the measure-
ment frequency. This is in agreement with the photon
suppression of shot noise in a non-interacting system.
-80 -60 -40 -20 0 20 40 60 80
2 X V
(4.22 GHz)
2 X V
(7.63 GHz)
Drain Source Voltage V
 (µV)
 4.22 GHz
 7.63 GHz
FIG. 2: Color Online. Excess noise temperature as a func-
tion of bias voltage, measured at 4.22 GHz (open circles) and
7.63 GHz (open triangles). The dashed lines represent the
linear fits to the data, from which the threshold V0 is de-
duced. The solid lines represent the expected excess noise
SI(ν, Te(Vds), Vds) − SI(ν, Te(0), 0), using Te(Vds) obtained
from eq. 5. The frequency dependent coupling is the only
fitting parameter.
The expected excess noise reads
∆SI(ν, T, Vds) = 2G0
Di(1−Di)
hν − eVds
e(hν−eVds)/kBT − 1
hν + eVds
e(hν+eV )/kBT − 1
ehν/kBT − 1
. (2)
It shows a zero temperature singularity at eVds = hν :
∆SI(ν, T, Vds) = 2G0
i Di(1−Di)(eVds−hν) if eVds >
hν and 0 otherwise. At finite temperature, the singular-
ity is thermally rounded. At high bias (eVds ≫ hν, kBT ),
equation 2 gives an excess noise
∆SI(ν, T, Vds) = 2G0
iDi(1−Di) (eVds − eV0) (3)
with eV0 = hν coth (hν/2kBT ) . (4)
In the low frequency limit, the threshold V0 charac-
terizes the transition between thermal noise and shot
noise (eV0 = 2kBT ), whereas in the low temperature
limit, it marks the onset of photon suppressed shot noise
(eV0 = hν). As shown on fig. 2, V0 is determined by the
intersection of the high bias linear regression of the mea-
sured excess noise and the zero excess noise axis. Fig. 3
shows V0 for eight frequencies spanning in the 4-8 GHz
range for G ≃ 0.5G0 . Eq. 4 gives a very good fit to
the experimental data. The only fitting parameter is the
electronic temperature Te = 72 mK, very close to the
fridge temperature Tfridge = 65 mK. We will show that
electron heating can account for this small discrepancy.
To get a full comparison with theory, we now inves-
tigate the influence of the transmissions of the first two
electronic modes of the QPC. To do so, we repeat the
same experiment at fixed frequency (here we used a 5.4-
5.9 GHz filter) for different sample conductances. The
0 5 10 15 20 25 30 35
: Experiment
 Fit to theory
          yields T
 = 72mK
          (fridge temp = 65 mK)
hν/e (µV)
0 2 4 6 8
 Observation Frequency ν (GHz)
2 /k T e
asymptote
V h eν=
FIG. 3: Onset V0 as a function of the observation frequency.
The experimental uncertainty corresponds to the size of the
symbols. The dashed lines correspond to the low (eV0 =
2kBT ) and high (eV0 = hν) frequency limits, and the solid
line is a fit to theory, with the electronic temperature as only
fitting parameter.
0,0 0,5 1,0 1,5 2,0
-0,5 -0,4 -0,3
Gate Voltage(V)
FIG. 4: Open circles: d∆SI/d(eVds) deduced from ∆T
Full line : theoretical prediction. The only fitting parameter
is the microwave attenuation. The experimental uncertainty
corresponds to the size of the symbols. Inset : Open circles :
conductance of the QPC as a function of gate voltage. Solid
Line : fit with the saddle point model [23].
noise suppression at Vds ≤ hν/e is the only singularity
we observe, independently of the QPC conductance G.
Fig. 4 shows the derivative with respect to eVds of the
excess noise d∆SI/d(eVds) deduced from the excess noise
temperature measured between 50 µV and 80 µV. This
energy range is chosen so that eVds is greater than hν
by at least 5kBTfridge over the entire frequency range.
The data agree qualitatively with the expected D(1−D)
dependence of pure shot noise, showing maxima at con-
ductances G = 0.5G0, and G = 1.5G0, and minima at
conductances G = G0 and G = 2G0. The short QPC
is responsible for the non zero minima as, when the sec-
ond mode starts to transmit electrons, the first one has
not reached unit transmission (inset of fig. 4). How-
ever, eq. 2 is not compatible with a second maximum
higher than the first one, which is due to electron heating.
The dimensions of the 2-DEG being much larger than
the electron-electron energy relaxation length, but much
smaller than electron-phonon energy relaxation length,
there is a gradient of electronic temperature from the
QPC to the ohmic metallic contacts assumed at the fridge
temperature. Combining the dissipated power IVds with
the Wiedemann-Franz law, one gets [5, 24]
T 2e = T
fridge +
where Gm stands for the total conductance of the 2D
leads, estimated from measurements to be 12 mS ±20%.
The increased noise temperature is then due to both
shot noise and to the increased thermal noise. For a
fridge temperature of 65 mK and G = G0/2, the elec-
tronic temperature will increase from 69 mK to 77 mK
as Vds increases from 50 µV to 80 µV. This accounts
for the small discrepancy between the fridge temper-
ature and the electron temperature deduced from the
variation of V0 with frequency. As G increases, the ef-
fect is more important, as can be seen both in fig. 4
and eq. 5. The solid line in figure 4 gives the av-
erage derivative with respect to eVds of the total ex-
pected excess noise SI(ν, Te(Vds), Vds) − SI(ν, Te(0), 0),
using the attenuation of the signal as a free parame-
ter. The agreement is quite satisfactory, given the ac-
curacy of the saddle point model description of the QPC
transmission. We find a 4.7 dB attenuation, which is
in good agreement with the expected 4 ±1 dB deduced
from calibration of the various elements of the detection
chain. Moreover, the voltage dependent electron temper-
ature obtained from eq. 5 can also be used to evaluate
SI(ν, Te(Vds), Vds) − SI(ν, Te(0), 0) as a function of Vds
at fixed sample conductance G = 0.5G0. The result, as
shown by the solid lines of fig. 2, is in excellent agreement
with experimental observations.
In conclusion, we performed the first direct measure-
ment of the finite frequency shot noise of the simplest
mesoscopic system, a QPC. Accurate comparison of the
data with non-interacting shot noise predictions have
been done showing perfect quantitative agreement. Even
when a single mode is transmitted, no sign of devia-
tion related to interaction was found, as expected for
the experimental parameters chosen for this work. We
have also shown that accurate and reliable high frequency
shot noise measurements are now possible for conductors
with impedance comparable to the conductance quan-
tum. This opens the way to high frequency shot noise
characterization of Carbon Nanotubes, Quantum Dots
or Quantum Hall samples in a regime where microscopic
frequencies are important and will encourage further the-
oretical work in this direction. Our set-up will also allow
to probe the statistics of photons emitted by a phase co-
herent single mode conductor.
It is a pleasure to thank D. Darson, C. Ulysse, P.
Jacques and C. Chaleil for valuable help in the construc-
tion of the experiments, P. Roulleau for technical help,
and X. Waintal for useful discussions.
∗ Electronic address: fabien.portier@cea.fr
† Also at LPA, Ecole Normale Supérieure, Paris.
[1] T. Martin and R. Landauer, Phys. Rev. B 45, 1742
(1992)
[2] V. A. Khlus, Zh. Eksp. Teor. Fiz. 93 (1987) 2179 [Sov.
Phys. JETP 66 (1987) 1243].
[3] G. B. Lesovik, Pis’ma Zh. Eksp. Teor. Fiz. 49 (1989) 513
[JETP Lett. 49 (1989) 592].
[4] M. Reznikov, et al., Phys. Rev. Lett. 75, 3340 (1995);
[5] A. Kumar et al., Phys. Rev. Lett. 76, 2778 (1996).
[6] L. Hermann et al., arXiv:cond-mat/0703123v1.
[7] M. Büttiker, Phys. Rev. Lett. 65, 2901 (1990)
[8] Y. M. Blanter and M. Büttiker, Phys. Rep. 336, 1 (2000).
[9] G.B. Lesovik, R. Loosen, JETP Lett. 65, 295 (1997).
Here is made the distinction between emission noise
SI(ν) =
〈I(0)I(τ )〉ei2πντdτ and stimulated noise
SI(−ν). While observation of the later requires excitation
of the sample by external sources, for a zero temperature
external circuit, only SI(ν) should be observed. For an
earlier high frequency shot noise derivation not making
the distinction between SI(ν) and SI(−ν), see Ref.[2, 3].
[10] R. Aguado and L. P. Kouwenhoven, Phys. Rev. Lett. 84,
1986 (2000);
[11] U. Gavish, Y. Levinson, Y. Imry, Phys. Rev. B 62,
R10637 (2000); M. Creux, A. Crepieux, Th. Martin,
Phys. Rev. B 74 115323 (2006).
[12] C. W. J Beenakker and H. Schomerus, Phys. Rev. Lett.
86, 700 (2001); J. Gabelli, et al., Phys. Rev. Lett. 93,
056801 (2004); C. W. J. Beenakker and H. Schomerus
Phys. Rev. Lett. 93, 096801 (2004).
[13] R. J. Schoelkopf et al., Phys. Rev. Lett. 78, 3370 (1997).
[14] K. J. Thomas et al., Phys. Rev. Lett. 77, 135 (1996); K.
J. Thomas et al., Phys. Rev. B 58, 4846 (1998).
[15] P. Roche et al., Phys. Rev. Lett. 93, 116602 (2004); L.
DiCarlo et al., Phys. Rev. Lett. 97, 036810 (2006).
[16] A. Golub, T. Aono, and Y. Meir Phys. Rev. Lett. 97,
186801 (2006)
[17] B. Reulet, J. Senzier, and D. E. Prober, Phys. Rev. Lett.
91, 196601 (2003); M. Kindermann, Yu. V. Nazarov, and
C. W. J. Beenakker Phys. Rev. B 69, 035336 (2004).
[18] D.V. Averin and K.K. Likharev, J. Low Temp.Phys. 62
345 (1986).
[19] M. Büttiker, H. Thomas, and A. Prêtre, Phys. Lett.
A180, 364 (1993); M. Büttiker, A. Prêtre, H. Thomas,
Phys. Rev. Lett. 70, 4114 (1993)
[20] M. H. Pedersen, S. A. van Langen, and M. Buttiker,
Phys. Rev. B 57 (1998) 1838.
[21] J. Gabelli et al., Science 313, 499 (2006). J. Gabelli et
al., Phys. Rev. Lett. 98, 166806 (2007)
[22] E. Onac et al. Phys. Rev. Lett. 96, 176601 (2006).
The experimental onset in Vds for the emission of high
frequency shot noise was larger than expected (Vds ≃
mailto:fabien.portier@cea.fr
http://arxiv.org/abs/cond-mat/0703123
5 × hν/e). After submission of this work, Gustavson et
al. reported on a double quantum dot on-chip detector,
yielding to a more quantitative agreement with theory
(arXiv:0705.3166v1).
[23] M. Büttiker Phys. Rev. B 41, 7906-7909 (1990).
[24] A. H. Steinbach, J. M. Martinis, and M. H. Devoret Phys.
Rev. Lett. 76, 3806 (1996)
http://arxiv.org/abs/0705.3166
ABSTRACT
  We report on direct measurements of the electronic shot noise of a quantum
point contact at frequencies nu in the range 4-8 GHz. The very small energy
scale used ensures energy independent transmissions of the few transmitted
electronic modes and their accurate knowledge. Both the thermal energy and the
quantum point contact drain-source voltage Vds are comparable to the photon
energy hnu leading to observation of the shot noise suppression when
$V_{ds}<h\nu/e$. Our measurements provide the first complete test of the finite
frequency shot noise scattering theory without adjustable parameters.

<|endoftext|><|startoftext|>
Introduction
The detection of an extreme “cold spot” (Vielva et al. 2004) in the foreground-corrected
WMAP images was an exciting but unexpected finding. At 4◦ resolution, Cruz et al. (2005)
determine an amplitude of -73 µK, which reduces to -20 µK at ∼10◦ scales (Cruz et al.
1larry@astro.umn.edu
2brown@astro.umn.edu
3llrw@astro.umn.edu
http://arxiv.org/abs/0704.0908v2
– 2 –
2007). The non-gaussianity of this extreme region has been scrutinized, (Cayon, Jun &
Treaster 2005; Cruz et al. 2005, 2006, 2007) concluding that it cannot be explained by either
foreground correction problems or the normal Gaussian fluctuations of the CMB. Thus, the
cold spot seems to require a distinct origin – either primordial or local. Across the whole sky,
local mass tracers such as the optical Sloan Digital Sky Survey (SDSS, York et al. 2000) and
the radio NRAO VLA Sky Survey (NVSS, Condon et al. 1998) are seen to correlate with
the WMAP images of the CMB (Pietrobon, Balbi & Marinucci 2006; Cabre et al. 2006),
probably through the late integrated Sachs-Wolfe effect (ISW, Crittenden & Turok 1996).
McEwen et al. (2007) extended the study of radio source/CMB correlations by per-
forming a steerable wavelet analysis of NVSS source counts and WMAP images. They
isolated 18 regions that, as a group, contributed a significant fraction of the total NVSS-
ISW signal. Three of those 18 regions were additionally robust to the choice of wavelet form.
The centroid of one of those three robust correlated regions, (#16), is inside the 10◦ cold
spot derived from WMAP data alone (Cruz et al. 2007), although McEwen et al. (2007)
did not point out this association.
The investigations reported here were conducted independently and originally without
knowledge of the McEwen et al. (2007) analysis. However, our work is a posteriori in
nature, because we were specifically looking for the properties in the direction of the cold
spot. These results thus support and quantify the NVSS properties in the specific direction
of the cold spot, but should be considered along with the McEwen et al. (2007) analysis for
the purposes of an unbiased proof of a WMAP association.
2. Analysis and Characterization of the NVSS “dip”
We examined both the number counts of NVSS sources in the direction of the WMAP
cold spot and their smoothed brightness distribution. The NVSS 21 cm survey covers the
sky above a declination of -40◦ at a resolution of 45”. It has an rms noise of 0.45 mJy/beam
and is accompanied by a catalog of sources stronger than 2.5 mJy/beam. Because of the
short interferometric observations that went into its construction, the survey is insensitive to
diffuse sources greater than ≈ 15’ in extent. Convolution of the NVSS images to larger beam
sizes, as done here, shows the integrated surface brightness of small extragalactic sources;
this is very different than what would be observed by a single dish of equivalent resolution. In
the latter case, the diffuse structure of the Milky Way Galaxy dominates (e.g, Haslam et al.
(1981)), although it is largely invisible to the interferometer.
To explore the extragalactic radio source population in the direction of the WMAP
– 3 –
cold spot, we first show in Figure 1 the 50◦×50◦ region around the cold spot convolved to a
resolution of 3.4◦. Here, the region of the cold spot is seen to be the faintest region on the
image (minimum at lII ,bII = 207.8
◦, -56.3◦). At minimum, its brightness is 14 mK below
the mean, with an extent of ≈5◦. The WMAP cold spot thus picks out a special region in
the NVSS – at least within this 2500� region.
We examined the smoothed brightness distribution across the whole NVSS survey using
another averaging technique that reduces the confusion from the brightest sources. We first
pre-convolved the images to 800”, which fills all the gaps between neighboring sources, and
then calculated the median brightness in sliding boxes 3.4◦ on a side. The resulting image
is shown in Figure 2 which is in galactic coordinates centered at lII=180
◦. The dark regions
near the galactic plane are regions of the NVSS survey that were perturbed by the presence
of very strong sources. Note that the galactic plane itself, which dominates single dish maps,
is only detectable here between -20◦ < lII < 90
◦, where there is a local increase in the number
of small sources detectable by the interferometer.
To evaluate the NVSS brightness properties of the cold spot, we compared it with the
distribution of median brightnesses in two strips from this all sky map. The first strip was
in the north, taking everything above a nominal galactic latitude of 30◦ (More precisely,
we used the horizontal line in the Aitoff projection tangent to the 30◦ line at lII=180
The second strip was in the south, taking everything below a nominal galactic latitude of
-30◦, but only from 10◦ < lII <180
◦, to avoid regions near the survey limit of δ=-40◦. The
minimum brightness in the cold spot region (≈ 20 mK) is equal to the lowest values seen in
the 16,800 square degree area of the two strips, and is ≈ 30% below the mean (Figure 3).
Formally, the probability of finding this weakest NVSS spot within the ≈10◦ (diameter)
region of the WMAP cold spot is 0.6%. This a posteriori analysis thus is in agreement with
the statistical conclusions of McEwen et al. (2007) that the NVSS properties in this region
are linked to those of WMAP. We also note that the magnitude of the NVSS dip is at the
extreme, but not an outlier of the overall brightness distribution. We thus expect that less
extreme NVSS dips would also individually correlate with WMAP cold regions, although it
may be more difficult to separate those from the primordial fluctuations.
The NVSS brightness dip can be seen at a number of resolutions, and there is probably
more than one scale size present. At resolutions of 1◦, 3.4◦, and 10◦, we find that the dip is
≈ 60%, 30% and 10% of the respective mean brightness. At 10◦ resolution, the NVSS deficit
overlaps with another faint region about 10◦ to the west, while the average dip in brightness
then decreases from -14 mK (at 3.4◦) to -4 mK.
The dip in NVSS brightness in the WMAP cold spot region is not due to some peculiarity
– 4 –
of the NVSS survey itself. In Figure 4, we compare the 1◦ convolved NVSS image with the
similar resolution, single dish 408 MHz image of Haslam et al. (1981). This 408 MHz all
sky map is dominated in most places by galactic emission, and was usedby Bennett et al.
(2003) as a template for estimating the synchrotron contribution in CMB observations.
On scales of 1◦, the fluctuations in brightness are a combination of galactic (diffuse) and
extragalactic (smeared small source) contributions. In the region of the cold spot, we can
see the extragalactic contribution at 408 MHz by comparison with the smoothed NVSS
1.4 GHz image. Note that although there is flux everywhere in the NVSS image, this is
the “confusion” from the smoothed contribution of multiple small extragalactic sources in
each beam, whereas the 408 MHz map has strong diffuse galactic emission as well. Strong
brightness dips are seen in both images in the region of the WMAP cold spot - with the
brightness dropping by as much as 62% in the smoothed NVSS; in the 408 MHz map, this
is diluted by galactic emission.
To look more quantitatively at the source density in the cold spot region, we measured
the density of NVSS sources (independent of their fluxes) as a function of distance from
the cold spot. Figure 5 shows the counts in equal area annuli around the WMAP cold spot
down to two different flux limits. With a limit of 5 mJy, there is a 45±8% decrease in counts
in the 1◦ radius circle around the WMAP cold spot centroid. At the survey flux limit of
2.5 mJy, the decrease is 23±3%. This reduction in number counts is what is measured by
McEwen et al. (2007) in their statistical all-sky analysis.
3. Foreground Corrections
Several studies have claimed that the properties of the cold spot are most likely an effect
of incorrect foreground subtraction (Chiang & Naselsky 2004; Coles 2005; Liu & Zhang
2005; Tojeiro et al. 2006). This possibility has been investigated in detail for both the
first year (Vielva et al. 2004; Cruz et al. 2005, 2006) and third (Cruz et al. 2007) year
WMAP data. The arguments against foreground subtraction errors can be summarized in
three main points – 1) The region of the spot shows no spectral dependence in the WMAP
data. This is consistent with the CMB and inconsistent with the known spectral behavior
of galactic emission (as well as the SZ effect). The flat (CMB-like) spectrum is found both
in temperature and kurtosis, as well as in real and wavelet space. 2) Foreground emission
is found to be low in the region of the spot, making it unlikely that an over-subtraction
could produce an apparent non-Gaussianity. 3) Similar results are found when using totally
independent methods to model and subtract out the foreground emission (Cruz et al. 2006),
namely the combined and foreground cleaned Q-V-W map (Bennett et al. 2003) and the
– 5 –
weighted internal linear combination analysis (Tegmark et al. 2003).
Now that we know that there is a reduction in the extragalactic radio source contribution
in the direction of the cold spot, we can re-examine this issue. We ask whether a 20-30%
decrement in the local brightness of the extragalactic synchrotron emission could translate
into a foreground subtraction problem that could generate the WMAP cold spot. We are
not re-examining the foreground question ab initio, simply examining the plausibility that
the deficit of NVSS sources could complicate the foreground calculations in this location.
The characteristic brightness in the 3.4◦ convolved NVSS image around the cold spot is ∼
51 mK at 1.4 GHz; the brightness of the cold spot is ∼ 37 mK. This difference of 14 mK
in brightness (4 mK at 10◦ resolution) represents the extragalactic population contribution
only, as the NVSS is not sensitive to the large scale galactic synchrotron emission. By
contrast, the single dish 1.4 GHz brightness within a few degrees of the cold spot is ∼ 3.4 K,
as measured using the Bonn Stockert 25 m telescope (Reich and Reich 1986). Therefore the
total synchrotron contribution at 1.4 GHz is ∼ 0.7 K above the CMB, 50 times larger than
the localized extragalactic deficit.
One way to create the cold spot would be if the universal spectral index used for the
normal galactic (plus small extragalactic) subtraction was incorrect for the extra brightness
temperature contribution of the NVSS dip, δT (-14 mK at L band, 1.4 GHz). We make an
order of magnitude estimate of this potential error. Following the first year data analysis
(Bennett et al. 2003), and the similar exercise performed by Cruz et al. (2006), we consider
fitting a synchrotron template map at some reference frequency νref , and extrapolating the
model, (F(ν)model), with a spectral index of β = −3 to the Q, V, and W bands. This
spectral index is consistent with those of Cruz et al. (2006) and the average spectral index
observed in the WMAP images (Bennett et al. 2003; Hinshaw et al. 2006). Under the null
hypothesis that the spectral index of δT is the same as that of the mean brightness T0, we
then calculate in the region of the deficit,
F (ν)model = [T0(νref) + δ T (νref)] (ν/νref)
. (1)
However, if the actual spectrum of δT is -α (from L band through W band) instead of
-3, then the true foreground subtraction, F(ν)true, should have been
F (ν)true = T0 (νref) (ν/νref)
+ δT (νref) (ν/νref)
. (2)
The foreground subtraction would then be in error as a function of frequency as follows,
expressed in terms of the L band temperatures :
– 6 –
δF (ν) ≡ F (ν)true − F (ν)model = δT (νL)(νL/ν)
1− (νref/ν)
. (3)
Three different reference frequencies have been used for synchrotron extrapolation – the
Haslam et al. (1981) 408 MHz map (Bennett et al. 2003), the Jonas, Baart & Nicolson
(1998) 2326 MHz Rhodes/HartRAO survey (Cruz et al. 2006), and the internal K and Ka
band WMAP images (Hinshaw et al. 2006). Since the foreground subtraction errors would
be worst extrapolating from the lowest frequency template, we start at 408 MHz and look
at the problems caused by a spectral index for δT that is different than the assumed -3.
We obtain a rough measure of the spectral index of the dip by comparing the 1◦
resolution maps (Figure 4) at 408 MHz and 1400 MHz. At lII ,bII = 207
◦,-55◦ , we find
δT = 2.6±0.75 K (30±12 mK) at 408 (1400) MHz, yielding a spectral index of -3.6±0.5.
Using Equation (3), this would actually lead to a WMAP “hot spot” if a spectral index of
-3.0 had been assumed for the extrapolation. Within the errors, the worst foreground ex-
trapolation mistakes would then be hot spots that range from +0.5 to +4.6 µK at Q band,
1◦ resolution, (≈ +0.25 to +2.3 µK at 4◦ resolution). Our derived spectral index for the dip
is steeper than expected for extragalactic sources, so we also do the calculation assuming
the flattest reasonable extragalactic spectrum of -2.5 . This would lead to a spurious cold
spot of -2.9 µK at 4◦ resolution . In either case, this is far below the -73 µK observed at this
resolution in WMAP, and we thus conclude that the deficit of NVSS sources does not lead
to a significant foreground subtraction error of either sign.
4. Discussion
The WMAP cold spot could have three origins: a) at the last scattering surface (z ∼
1000), b) cosmologically local (z < 1), or c) galactic. Because the spot corresponds to a
significant deficit of flux (and source number counts) in the NVSS, we have argued here that
the spot is cosmologically local and hence, a localized manifestation of the late ISW effect.
Cruz et al. (2005, 2007) derived a temperature deviation for the cold spot of ∼ −20µK
and a diameter of 10◦ using the WMAP 3 year data; on scales of 4◦ the average temperature is
lower, ∼ −73 µK. Using these two data points we derive an approximate relation between the
temperature deviation and the corresponding size of the cold spot: θ(∆T/T ) ≈ 4.5× 10−5,
where θ is the radius of the cold spot in degrees. We now perform an order of magnitude
calculation to see if the late ISW can produce such a spot, assuming that the entire effect
comes from the ISW.
The contribution of the late ISW along a given line of sight is given by (∆ T/T )|ISW =
– 7 –
Φ̇ dη, where the dot represents differentiation with respect to the conformal time η,
dη = dt/a(t), and a is the scale factor. The integrand will be non-zero only at late times
(z<1) when the cosmological constant becomes dynamically dominant.
We start with the Newtonian potential given by
Φ = GM/r ≈
r2 ρb δ. (4)
The proper size r and the background density ρb scale as a and a
−3, respectively. The growth
of the fractional density excess, δ(a) in the linear regime is given by D(a) = δ(a)/δ(a0), and
D(a) is the linear growth factor. For redshifts below ∼ 1 in ΛCDM, this factor can be
approximated as δ(a) ≈ aδ(a0)(3−a)/2. Assuming that the region is spherical, its comoving
radius is rc = 0.5∆z(c/H), and ∆z is the line of sight diameter of the region. The change
in the potential over dη can be approximated by
r2c ρc δ (∆z), (5)
where subscript c refers to the average comoving size of the void and the comoving back-
ground density. In ΛCDM the Hubble parameter is roughly given by H2 = H2
(1 + 2z),
for redshifts below ∼ 1. Incorporating these approximations we get the following relation
between the size of the region, its redshift and the temperature deviation from the late ISW:
∆Φ ≈ −
(1 + 2z)1/2(1 + z)−2 δ ≈
We now ask under what conditions this expression is consistent with the observed rela-
tion between the size and temperature of the cold spot derived earlier, θ(∆T/T ) ≈ 4.5×10−5.
For Ωm ∼ 0.3 and δ = −1 (i.e. a completely empty region) this leads to the simplified re-
lation, θ(1 + z) ≈ 6, where θ is in degrees, as before. Since the spot’s association with the
NVSS places it at z ∼ 0.5− 1, this leads to a self-consistent value of the radius of ∼ 3− 4◦
for the observed spot. For c/H0 = 4000 Mpc, the comoving radius of the void region is
120-160 Mpc.
How likely is such a large underdense region in a concordance cosmology? Suppose there
is only one such large underdense region in the whole volume up to z=1. The correspond-
ing void frequency is then the ratio of the comoving volume of the void to the comoving
volume of the Universe to z=1, which is roughly 3 × 10−5. Is this consistent with ΛCDM?
Void statistics have been done for a number of optical galaxy surveys, as well as numerical
structure formation simulations. Taking the most optimistic void statistics (filled dots in
Fig. 9 of Hoyle & Vogeley, 2004) which can be approximated by logP = −(r/Mpc)/15, a
– 8 –
140 Mpc void would occur with a probability of 5× 10−10, considerably more rare than our
estimate for our Universe (3× 10−5) based on the existence of the cold spot. One must keep
in mind, however, that observational and numerical void probability studies are limited to
rc ∼ 30 Mpc; it is not yet clear how these should be extrapolated to rc > 100 Mpc.
We note that Inoue & Silk (2006a,b) had already suggested that anomalous tempera-
ture anisotropies in the CMB, such as the cold spot, may be explained by the ISW effect. In
contrast to our calculation described above, their analysis considers the linear ISW plus the
second order effects due to an expanding compensated void, partially filled with pressureless
dust, embedded in a standard CDM (Inoue and Silk 2006a) or ΛCDM (Inoue and Silk 2006b)
background. It is reassuring that the size of the void indicated by their analysis—about 200
Mpc if located at z ∼ 1—is roughly the same as what we get here using linear ISW.
The need for an extraordinarily large void to explain the cold spot would add to the
list of anomalies associated with the CMB. (See Holdman, Mersini-Houghton & Takahashi
(2006a,b) for a theory that predicts such large voids based on a particular landscape model.)
These include the systematically higher strength of the late ISW correlation measured for a
variety of mass tracers, compared to theWMAP predictions (see Fig. 11 of Giannantonio et al.
2006), and the alignment and planarity of the quadrupole and the octopole (de Oliveira-Costa et al.
2004; Land & Magueijo 2005). We can, however, conclude that models linking the cold spot
with the larger scale anomalies, such as the anisotropic Bianchi Type VIIh model of Jaffe et
al. (2005), are no longer necessary. While we suggest that the cold spot is a local effect, low
order global anisotropic models (e.g., Gumrukcuoglu, Contaldi & Peloso 2006) may still be
needed for the low−ℓ anomalies.
5. Concluding Remarks
We have detected a significant dip in the average surface brightness and number counts of
radio sources from the NVSS survey at 1.4 GHz in the direction of the WMAP cold spot. The
deficit of extragalactic sources is also seen in a single dish image at 408 MHz. Together with
previous work, we rule out instrumental artifacts in WMAP due to foreground subtraction.
A fuller examination of the statistical uncertainties associated with our combination of the
McEwen et al. (2007) wavelet results and our own a posteriori analysis should be performed.
With this caveat, we conclude that the cold spot arises from effects along the line of sight,
and not at the last scattering surface itself. Any non-gaussianity of the WMAP cold spot
therefore would then have a local origin.
A 140 Mpc radius, completely empty void at z≤1 is sufficient to create the magnitude
– 9 –
and angular size of the cold spot through the late integrated Sachs-Wolfe effect. Voids this
large currently seem improbable in the concordance cosmology, adding to the anomalies
associated with the CMB.
We suggest that a closer investigation of all mass tracers would be useful to search
for significant contributions from isolated regions. Also, if our interpretation of the cold
spot is correct, it might be possible to detect it indirectly using Planck, through the lack of
lensing-induced polarization B modes (Zaldarriaga & Seljak 1997).
ACKNOWLEDGMENTS We thank Eric Greisen, NRAO, for improvements in the
AIPS FLATN routine, which allows us to easily stitch together many fields in a flexible
coordinate system. The 408 MHz maps were obtained through SkyView, operated under
the auspices of NASA’s Goddard Space Flight Center. We appreciate discussions with M.
Peloso, T. J. Jones and E. Greisen regarding this work, and useful criticisms from the
anonymous referee. LR acknowledges the inspiration from his thesis adviser, the late David
T. Wilkinson, who would have appreciated the notion of deriving information from a hole.
At the University of Minnesota, this work is supported in part, through National Science
Foundation grants AST 03-07604 and AST 06-07674 and STScI grant AR-10985.
REFERENCES
Bennett, C.L., et al. 2003, ApJS 148, 1
Cabre, A., Gaztanaga, E., Manera, M., Fosalba, P. Castander, F. 2006, MNRAS 372, 23
Cayon, L., Jin J., Treaster, A. 2005, MNRAS 362, 826
Chiang, L.-Y., Naselsky, P.D. 2006, IJMPD 15,1283C
Coles, P. 2005, Nature 433, 248
Condon, J. J., Cotton, W. D., Greisen, E. W., Yin, Q. F., Perley, R. A., Taylor, G. B.,
Broderick, J. J. 1998, AJ 115, 1693
Crittenden, R. G., Turok, N. 1996, PRL 76, 575
Cruz, M., Martinez-Gonzalez, E., Vielva, P., Cayon, L. 2005, MNRAS 356, 29
Cruz, M., Tucci, M., Martinez-Gonzalez, E., Vielva, P. 2006, MNRAS 369, 57
Cruz, M., Cayon, L., Martinez-Gonzalez, E., Vielva, P., Jin, J. 2007, ApJ 655, 11
– 10 –
de Oliveira-Costa A., et al. 2004, PhRevD 69, 3516
Giannantonio, T. et al. 2006, PhRvD 74, 352
Gumrukcuoglu, A. E., Contaldi, C. R. & Peloso, M. 2006, astro-ph/0608405
Haslam, C. G. T., Klein, U., Salter, C. J., Stoffel, H., Wilson, W.E., Cleary, M.N., Cooke,
D.J., & Thomasson, P. 1981, A&A, 100,209
Hinshaw, G., et al. 2006, ApJ, submitted (astro-ph/0603451)
Holman, R.; Mersini-Houghton, L.; Takahashi, Tomo, 2006, hep-th/0611223
Holman, R.; Mersini-Houghton, L.; Takahashi, Tomo, 2006, hep-th/0612142
Hoyle, F., Vogeley, M. S. 2004, ApJ 607, 751
Inoue, K. T., Silk, J. 2006, ApJ, 648, 23
Inoue, K. T., Silk, J. 2006, astro-ph/0612347
Jaffe, T. R., Banday A. J., Eriksen, H. K., Forski, K. M, Hansen, F. K. 2005, ApJ 629, L1
Jonas, J., Baart, E. E., Necolson, G. D. 1998, MNRAS, 297, 997
Land, K., Magueijo, J.,2005, PRL 95,071301
Liu, X., & Zhang, S. N. 2005, ApJ, 633,542
McEwen, J. D., Vielva, P., Hobson, M. P., Martinez-Gonzalez, E., & Lasenby, A. N. 2007,
MNRAS, in press
Pietrobon, D., Balbi, A., Marinucci, D. 2006, PhysRevD 74, 352
Reich, W. and Reich, P. 1986, A&A Suppl. 63, 205
Tegmark, M., de Oliveira-Costa, A., Hamilton A. 2003, Phys.Rev. D68, 123523.
Tojeiro, R., Castro, P.G., Heavens, A.G., Gupta, S. 2006, MNRAS,365,265
Vielva, P., Martinez-Gonzalez, E., Varreiro, R. B., Sanz, J. L., Cayon, L. 2004, ApJ 609, 22
York, D. G. et al. 2000, AJ 120, 1579
Zaldarriaga, M. & Seljak, U. 1997, PhRvD 55, 1830
This preprint was prepared with the AAS LATEX macros v5.2.
http://arxiv.org/abs/astro-ph/0608405
http://arxiv.org/abs/astro-ph/0603451
http://arxiv.org/abs/hep-th/0611223
http://arxiv.org/abs/hep-th/0612142
http://arxiv.org/abs/astro-ph/0612347
– 11 –
Fig. 1.— 50◦ field from smoothed NVSS survey at 3.4◦ resolution, centered at lII , bII
= 209◦, -57◦. Values range from black: 9.3 mJy/beam to white: 21.5 mJy/beam. A 10◦
diameter circle indicates the position and size of the WMAP cold spot.
– 12 –
Fig. 2.— Aitoff projection of NVSS survey, centered at lII , bII = 180
◦, 0◦, showing the
median brightness in sliding boxes of 3.4◦. The WMAP cold spot is indicated by the black
box. Closer to the plane, large dark patches arise from sidelobes around strong NVSS sources.
Fig. 3.— The cumulative distribution, normalized to 1000, of median brightness levels (mK)
in 3.4◦ sliding boxes of the NVSS images in two strips above |bII | > 30
◦ (see text). The
minimum brightness (which is from the cold spot region) is indicated by a vertical line.
– 13 –
Fig. 4.— 18◦ fields, with 1◦ resolution, centered at lII , bII = 209
◦, -57◦. Left: 408 MHz
(Haslam et al. 1981). Right: 1.4 GHz (Condon et al. 1998). A 10◦ diameter circle indicates
the position and size of the WMAP cold spot.
– 14 –
Fig. 5.— Number of NVSS sources in 3.14 square degree annuli as a function of distance
from the cold spot. The counts axis refers to the results for counts of sources with S>5mJy;
the grey line refers to counts for S>2.5mJy with those counts multiplied by 0.56. Each bin
is independent.
	Introduction
	Analysis and Characterization of the NVSS ``dip''
	Foreground Corrections
	Discussion
	Concluding Remarks
ABSTRACT
  We detect a dip of 20-45% in the surface brightness and number counts of NVSS
sources smoothed to a few degrees at the location of the WMAP cold spot. The
dip has structure on scales of approximately 1-10 degrees. Together with
independent all-sky wavelet analyses, our results suggest that the dip in
extragalactic brightness and number counts and the WMAP cold spot are
physically related, i.e., that the coincidence is neither a statistical anomaly
nor a WMAP foreground correction problem. If the cold spot does originate from
structures at modest redshifts, as we suggest, then there is no remaining need
for non-Gaussian processes at the last scattering surface of the CMB to explain
the cold spot. The late integrated Sachs-Wolfe effect, already seen
statistically for NVSS source counts, can now be seen to operate on a single
region. To create the magnitude and angular size of the WMAP cold spot requires
a ~140 Mpc radius completely empty void at z<=1 along this line of sight. This
is far outside the current expectations of the concordance cosmology, and adds
to the anomalies seen in the CMB.

<|endoftext|><|startoftext|>
Introduction
Secondary invariants of Dirac operators are a distinctive issue of the heat equation
approach to index theory. The eta invariant of a Dirac operator first appeared as the
boundary term in the Atiyah–Patodi–Singer index theorem [2]: this spectral invariant,
highly nonlocal and therefore unstable, became a major object of investigation, because
of its subtle relation to geometry. With the introduction of superconnnections in index
theory by Quillen and Bismut, it became possible to employ heat equation techniques
in higher geometric situations, where the primary invariant, the index, is no longer a
number, but a class in a K-theory group [44, 10, 35]. This led to so called local index
theorems, which are refinements of the cohomological index theorems at the level of
differential forms, and gave as new fundamental byproduct the eta forms, coming from
the transgression of the index class [11, 12, 13], which are the higher analogue of eta
invariants [41, 38, 34].
Rho invariants are differences (or, more generally, delocalized parts) of eta invariants,
so they naturally possess stability properties when computed for geometrically relevant
operators, mainly the spin Dirac operator and the signature operator [3, 31, 42]. Fur-
thermore, they can be employed to detect geometric structures: the Cheeger–Gromov
L2-rho invariant, for example, has major applications in distinguishing positive scalar
curvature metrics on spin manifolds [15, 43], and can show the existence of infinitely
many manifolds that are homotopy equivalent but not diffeomorphic to a fixed one [16].
As secondary invariants always accompany primary ones, it is very natural to ask what
are the L2-eta and L2-rho forms in the case of a families, and what are their properties.
We consider the easiest L2-setting one could think of, namely a normal covering of a fibre
bundle. This interesting model contains yet all the features and problems offered by the
presence of continuos spectrum. Since the fibres of the covering family are noncompact,
the large time asymptotic of the superconnection Chern character is in general not
Date: September 4, 2010.
http://arxiv.org/abs/0704.0909v2
2 SARA AZZALI
converging to a differential form representative of the index class, and the same problem
is reflected when trying to integrate on [1,∞) the transgression term involved in the
definition of the L2-eta form.
The major result in this sense is by Heitsch and Lazarov, who gave the first families index
theorem for foliations with Hausdorff graph [30]. They computed the large time limit
of the superconnection Chern character as Haefliger form, assuming smooth spectral
projections and Novikov–Shubin invariants bigger than 3 times the codimension of the
foliation. Their result implies an index theorem in Haefliger cohomology (not a local
one, because they do not deal with the transgression term), which in particular applies
to the easier L2-setting under consideration.
We use the techniques of Heitsch–Lazarov to investigate the integrability on [1,∞) of
the transgression term, in order to define the L2-eta form for families D of generalised
Dirac operators on normal coverings of fibre bundles. Our main result, Theorem 3.4,
implies that the L2-eta form η̂(2)(D) is well defined as a continuos differential form on
the base B if the spectral projections of the family D are smooth, and the families
Novikov–Shubin invariants {αK}K⊂B are greater than 3(dimB + 1).
We define then naturally the L2-rho form ρ̂(2)(D) as the difference between the L2-eta
form for the covering family and the eta form of the family of compact manifolds. When
the fibre is odd dimensional, the zero degree term of ρ̂(2)(D) is the Cheeger–Gromov
L2-rho invariant of the induced covering of the fibre. We prove that the L2-form is
(weakly) closed when the fibres are odd dimensional (Prop. 4.3).
The strong assumptions of Theorem 3.4 are required because we want to define η̂(2) for
a family of generalised Dirac operators. In the particular case of de Rham and signature
operators one can put weaker assumptions: this is showed by Gong–Rothenberg’s result
for the L2-Bismut–Lott index theorem (proved under positivity of the Novikov–Shubin
invariants) [24], and from results in [4], where we develop a new approach to large time
estimate exclusive to the families of de Rham and signature operators. On the contrary,
a family of signature operators twisted by a fibrewise flat bundle has to be treated as a
general Dirac operator [7].
Next we investigate the L2-rho form in relation to the space R+(M/B) of positive scalar
curvature vertical metrics for a fibre bundle of spin manifolds. For this purpose, the
Dirac families D/ involved are uniformly invertible by Lichnerowicz formula, so that the
definition of the L2-rho form does not require Theorem 3.4, but follows from classical
estimates. Here the L2-rho form is always closed, and we prove the first step in order
to use this invariant for the study of R+(M/B), namely that the class [ρ̂(2)(D/)] is the
same for metrics in the same concordance classes of R+(M/B) (Prop.5.1). The action
of a fibrewise diffeomorphism is also taken into account.
Along the lines of [42] we can expect that if Γ is torsion-free and satisfies the Baum–
Connes conjecture, then the L2-rho class of a family of odd signature operators is an
oriented Γ- fibrewise homotopy invariant, and that [ρ̂(2)(D̃/ĝ)] vanishes correspondingly
to a vertical metric ĝ of positive scalar curvature.
L2-RHO FORM FOR NORMAL COVERINGS OF FIBRE BUNDLES 3
Acknowledgements This work was part of my researches for the doctoral thesis. I
would like to thank Paolo Piazza for having suggested the subject, for many interesting
discussions and for the help and encouragement. I wish to express my gratitude to
Moulay-Tahar Benameur for many interesting discussions.
2. Geometric families in the L2-setting
We recall local index theory’s machine, here adapted to the following L2-setting for
families.
Definition 2.1. Let π̃ : M̃ → B be a smooth fibre bundle, with typical fibre Z̃ con-
nected, and let Γ be a discrete group acting fibrewise freely and properly discontinuosly
on M , such that the quotient M = M̃/Γ is a fibration π : M → B with compact fibre Z.
Let p : M̃ → M̃/Γ = M denote the covering map. This setting will be called a normal
covering of the fibre bundle π and will be denoted with the pair (p : M̃ →M,π : M → B).
Let π : M → B be endowed with the structure of a geometric family (π : M →
B, gM/B ,V, E), meaning by definition:
• gM/B is a given metric on the vertical tangent bundle T (M/B)
• V the choice of a smooth projection V : TM → T (M/B) (equivalently, the choice
of a horizontal complement THM = KerV)
• E → M is a Dirac bundle, i.e. an Hermitian vector bundle of vertical Clifford
modules, with unitary action c : Cl(T ∗(M/B), gM/B) → End(E), and Clifford
connection ∇E.
To a gemetric family it is associated a family D = (Db)b∈B of Dirac operators along
the fibres of π, Db = cb ◦ ∇Eb : C∞(Mb, Eb) → C∞(Mb, Eb), where Mb = π−1(b), and
Eb := E|Mb .
If we have a normal Γ-covering p : M̃ → M of the fibre bundle π, the pull back of the
geometric family via p gives a Γ-invariant geometric family which we denote (π̃ : M̃ →
B, p∗gM/B , Ṽ , Ẽ).
2.0.1. The Bismut superconnection. The structure of a geometric family gives a distin-
guished metric connection ∇M/B on T (M/B), defined as follows: fix any metric gB on
the base and endow TM with the metric g = π∗gB ⊕ gM/B ; let ∇g the Levi-Civita con-
nection on M with respect to g; the connection ∇M/B := V∇gV on the vertical tangent
does not depend on gB ([9, Prop. 10.2]).
When X ∈ C∞(B,TB), let XH denote the unique section of THM s.t. π∗XH = X. For
any ξ1, ξ2 ∈ C∞(B,TB) let T (ξ1, ξ2) := [ξH1 , ξH2 ]− [ξ1, ξ2]H and let δ ∈ C∞(M, (THM)∗)
measuring the change of the volume of the fibres LξH vol =: δ(ξH ) vol. Following the
notation of [9], in formulas in local expression we denote as e1, . . . , en a local orthonormal
base of the vertical tangent bundle; f1, . . . fm will be a base of TyB and dy
1, . . . , dym will
denote the dual base. The indices i, j, k.. will be used for vertical vectors, while α, β, . . .
will be for the horizontal ones. The 2-form c(T ) =
α<β(T (fα, fβ), ei)eidy
αdyβ has
4 SARA AZZALI
values vertical vectors. Using the vertical metric, c(T )(fα, fβ) can be seen as a cotangent
vertical vector, hence it acts on E via Clifford multiplication.
Let H → B be the infinite dimensional bundle with fibres Hb = C∞(Mb, Eb). Its space of
sections is given by C∞(B,H) = C∞(M,E). We denote Ω(B,H) := C∞(M,π∗(ΛT ∗B)⊗
E). Let ∇H be the connection on H → B defined by ∇HU ξ = ∇EUHξ +
δ(ξH) where ξ is
on the right hand side is regarded as a section of E. ∇H is compatible with the inner
product < s, s′ >b:=
hE(s, s′) vol b , with s, s
′ ∈ C∞(B,H), and hE the fixed metric
on E.
Even dimensional fibre. When dimF = 2l the bundle E is naturally Z2-graded by chi-
raliry, E = E+ ⊕ E−, and D is odd. Correspondingly, the infinite dimensional bundle
is also Z2-graded: H = H+ ⊕ H−. The Bismut superconnection adapted to D is the
superconnection B = ∇H +D − c(T )
on H.
The corresponding bundle for the covering family π̃ is denoted H̃ → B where the same
construction for the family M̃ → B gives the Bismut superconnection B̃ = ∇H̃ + D̃ −
c(T̃ )
, adapted to D̃. It is Γ-invariant by construction, being the pull-back via p of B.
Odd dimensional fibre. When dimZ = 2l − 1, the appropriate notion is the one of
Cl(1)-superconnection, as introduced by Quillen in [44, sec. 5]. Let Cl(1) the Clifford
algebra Cl(1) = C⊕Cσ, where σ2 = 1, and consider EndE⊗Cl(1), adding therefore the
extra Clifford variable σ. On End(Eb) ⊗ Cl(1) = Endσ(Eb ⊕ Eb) define the supertrace
tr σ(A + Bσ) := trB, extended then to tr σ : C∞(M,π∗Λ∗B ⊗ EndE) → Ω(B) as usual
by tr σ(ω ⊗ (a+ bσ)) = ω tr b, for ω ∈ C∞(B,ΛT ∗B), ∀a, b ∈ C∞(B,EndE).
The family D, as well as c(T ) are even degree elements of the algebra C∞(B,EndH ⊗
Cl(1)⊗̂ΛT ∗B). On the other hand, ∇H is odd. By definition, the Bismut Cl(1)-
superconnection adapted to the family D is the operator of odd total degree Bσ :=
Dσ + ∇̃u − c(T )
Notation. In the odd case we will distinguish between the Cl(1)-superconnection de-
fined above Bσ acting on Ω(B,H) ⊗̂Cl(1), and the differential operator B : Ω(B,H) →
Ω(B,H) given by B := D +∇H − c(T )
, which is not a superconnection but is needed in
the computations.
2.1. The heat operator for the covering family. In this section we briefly discuss
the construction of the heat operator e−B̃
, which can be easily performed combining
the usual construction for compact fibres families in [9, Appendix of Chapter 9], with
Donnelly’s construction for the case of a covering of a compact manifolds [20]. We
integrate notations of [9, Ch. 9-10] with the ones of our appendix A. We refer to the
latter for the definitions of the spaces of operators used the rest of this section.
Let C∞(B,DiffΓ(Ẽ)) the algebra of smooth maps D : B → DiffΓ(Ẽ) satisfying that
∀z ∈ B, Dz is a Γ-invariant differential operator on M̃z, with coefficients depending
L2-RHO FORM FOR NORMAL COVERINGS OF FIBRE BUNDLES 5
smoothly on the variables of B. In the same way, let N = C∞(B,ΛT ∗B⊗Op−∞Γ (Ẽ)) =
Ω(B,Op−∞Γ (Ẽ)) the space of smooth maps A : B → ΛT ∗B ⊗ Op
Γ (Ẽ). N contains
families of Γ-invariant operators of order −∞ with coefficients differential forms, hence
N is filtered by Ni = C∞(B,
j≥i Λ
jT ∗B ⊗Op−∞Γ (Ẽ)). The curvature of B̃ is a family
2 ∈ Ω(B,Diff2Γ(Ẽ)) and can be written as B̃2 = D̃2 − C̃, with C̃ ∈ Ω≥1(B,Diff1Γ(Ẽ)).
2.1.1. Definition and construction. For each point z ∈ B the operator e−tB̃2z is by defi-
nition an the one whose Schwartz kernel p̃zt (x, y) ∈ Ẽx⊗ Ẽ∗y ⊗ΛT ∗zB is the fundamental
solution of the heat equation, i.e.
• p̃zt (x, y) is C1 in t, C2 in x, y;
p̃zt (x, y) + B̃
z,II p̃
t (x, y) = 0 where B̃z,II means it acts on the second variable;
• lim
p̃zt (x, y) = δ(x, y)
• ∀T > 0 ∀t ≤ T ∃ c(T ) :
∥∂it∂
ypt(x, y)
∥ ≤ ct−
−i−j−ke−
2(x,y)
2 , 0 ≤ i, j, k ≤ 1.
Its construction is as follows: pose
e−tB̃
z := e−tD̃
tk e−σ0tD̃
z C̃e−σ1tD̃
z . . . C̃e−σktD̃
︸ ︷︷ ︸
dσ1 . . . dσk (2.1)
Since ∀σ = (σ0, . . . , σk) there exists σi > 1k+1 , then each term Ik ∈ ΛT
zB ⊗Op−∞(Ẽz)
and so does e−tB̃
z . Let p̃zt (x, y) = [e
−B̃2t,z ](x, y) be the Schwartz kernel of the operator
(2.1). Using arguments of [9, theorems 9.50 and 9.51], one proves that p̃zt (x, y) is smooth
in z ∈ B so that one can conclude e−B̃ ∈ Ω(B,Op−∞Γ ).
The next property, proved in [20] and [21], is needed in the t→ 0 asymptotic. For t < T0
−tB̃2 ](x̃, ỹ)
∣ ≤ c1t−
2 e−c2
2(x̃,ỹ)
t (2.2)
2.2. Transgression formulæ, eta integrands. For t > 0 let δt : Ω(B,H) → Ω(B,H)
the operator which on Ωi(B,H) is multiplication by t− i2 . Then consider the rescaled
superconnection Bt = t
2 δtBδ
t = ∇H +
tD − c(T ) 1
2.2.1. Even dimensional fibre. From (A.1) we have
Str Γe
−B̃2t = −dStr Γ
which on a finite interval (t, T ) gives the transgression formula
Str Γ
− Str Γ
Str Γ
ds (2.3)
6 SARA AZZALI
2.2.2. Odd dimensional fibre. Here it is convenient to use that tr σΓe
−(B̃σt )2 = tr oddΓ e
−B̃2t ,
(from [44] and (A.1)), where trodd means we take the odd degree part of the resulting
form. Then taking the odd part of the formula
tr Γe
−B2t = −d tr Γ
Tr oddΓ
− Tr oddΓ
Tr evenΓ
ds (2.4)
Remarks and notation 2.2. Since we wish now to look at the limits as t → 0 and
t → ∞ in (2.3) and 2.4, let us make precise what the convergences on the spaces of
forms are, and for families of operators. On Ω(B) we consider the topology of conver-
gence on compact sets. We say a family of forms ωt
C0→ ωt0 as t → t0 if ∀K
supz∈K ‖ωt(z)− ωt0(z)‖ΛT ∗z B → 0. We say ωt
C1→ ωt0 if the convergence also hold for
first derivatives of ωt with respect to the base variables. We say ωt = O(tδ) as t→ ∞ if
∃ a constant C = C(K) : supz∈K ‖ωt(z) − ωt0(z)‖ΛT ∗z B ≤ Ct
δ. We say ωt
= O(tδ) if
also the first derivatives with respect to base directions are O(tδ).
For a family Tt ∈ UC∞(B,Op−∞(Ẽ)) we say Tt
Ck→ Tt0 as t → t0 if ∀K
⊆ B, ∀r, s ∈ Z
supz∈K ‖Tt(z)− Tt0(z)‖r,s → 0 together with derivatives up to order k with respect to
the base variables.
On the space of kernels UC∞(M̃ ×B M̃, Ẽ4Ẽ∗ ⊗ π∗ΛT ∗B), we say kt → kt0 if ∀ϕ ∈
C∞c (B) ‖(π∗ϕ(x))(kt(x, y) − kt0(x, y))‖k → 0.
We stress that from (A.3) the map Ω(B,Op−∞Γ (Ẽ)) → UC∞(M̃ ×B M̃ , Ẽ4Ẽ∗ ⊗
π∗ΛT ∗B), T 7→ [T ] is continuos.
2.3. The t→ 0 asymptotic.
Proposition 2.3.
Str Γ
Â(M/B) chE/S if dim Z̃ = even
tr oddΓ
Â(M/B) chE/S if dim Z̃ = odd
The result is proved exactly as in the classic case of compact fibres, together with the
following argument of [33, Lemma 4, pag. 4]:
Lemma 2.4. [33] ∃A > 0, c > 0 s.t.
−B2t ](π(x̃), π(x̃))− [e−B̃2t ](x̃, x̃)
∣ = O(t−ce−
For the proof of the lemma see [32], or also [5], [24]. With the same technique we deduce
Proposition 2.5. The differential forms StrΓ
and tr σΓ
dB̃σt
e−(B̃
integrable on [0, 1], uniformly on compact subsets.
L2-RHO FORM FOR NORMAL COVERINGS OF FIBRE BUNDLES 7
Proof. The proof is as in [9, Ch.10, pag. 340]. We reason for example in the even case.
Consider the rescaled superconnection B̃s as a one-parameter family of superconnections,
s ∈ R+, and construct the new family M̆ = M̃ ×R+ → B ×R+ =: B̆. On Ĕ = Ẽ ×R+
there is a naturally induced family of Dirac operators whose Bismut superconnection
is B̆ = B̃s + dR+ − n4sds, and its rescaling is B̆t = B̃st + dR+ −
ds. Its curvature is
t = B̃
st + t
∧ ds, so that
t = e−B̃
e−uB̃
e−(1−u)B
st ∧ ds = e−F̃st − ∂B̃st
e−B̃st ∧ ds.
Str Γ
= Str Γ(e
−B̃2st)− Str Γ
∂B̃st
e−B̃st
ds (2.5)
At t = 0 we have the asymptotic expansion Str Γ(e
−B̆t) ∼
j=0 t
2 (Φ j
− α j
without singular terms. Computing (2.5) in s = 1, since
∂B̃st
, one has
Str Γ
e−F̃t
, and therefore Str Γ
e−F̃t
j=0 t
−1α j
. Let’s
compute α0. From the local formula
Φ0 − α0ds = lim
Str Γ
e−F̆t
M̆/B̆
Â(M̆/B̆) (2.6)
since M̆(z,s) = M̃z×{s} and the differential forms are pulled back from those on M̃ → B,
then the right hand side of (2.6) does not contain ds so that α0 = 0. This implies that
Str Γ(
t ) ∼
−1α j
3. The L2-eta form
We prove in Theorem 3.4 the well definiteness of the L2-eta form η̂(2)(D̃) under opportune
regularity assumptions. We make use of the techniques of [30].
3.1. The family Novikov–Shubin invariants. The t → ∞ asymptotic of the heat
kernel is controlled by the behaviour of the spectrum near zero. Let P̃ = (P̃ z)z∈B
the family of projections onto ker D̃ and let P̃ǫ = χ(0,ǫ)(D̃) be the family of spectral
projections relative to the interval (0, ǫ); denote Q̃ǫ = 1− P̃ǫ − P̃ .
For any z ∈ B the operator D̃z is a Γ-invariant unbounded operator: let D̃2z =
λdEz(λ)
be the spectral decomposition of D̃2z , andN
z(λ) = trΓE
z(λ) its spectral density function
[27]. Denote bz = trΓ P̃
z. Then N z(ǫ) = bz + trΓ P̃
ǫ and from [22] the behaviour of
θz(t) = trΓ(exp(−tD̃z)) at ∞ is governed by
αz = sup{a : θz(t) = bz +O(t−a)} = sup{a : N z(ǫ) = bz +O(ǫa)} (3.1)
8 SARA AZZALI
where αz is called the Novikov–Shubin invariant of D̃z.
We shall later impose conditions on αz uniformly on compact subset of B, so we intro-
duce the following definition from [24]: let K ⊂ B be a compact, define αK := infz∈K αz.
We call {αK}K⊂B the family Novikov–Shubin invariants of the fibre bundle M̃ → B.
By results of Gromov and Shubin [27], when D̃2z is the Laplacian, αz is a Γ-homotopy
invariant of M̃z [27], in particular it does not depend on z. In that case αz is locally
constant on B. For a general Dirac type operator this is not true and we need to use
the αK ’s.
Definition 3.1. [30] We say the family D̃ has regular spectral projections if P̃ and P̃ǫ are
smooth with respect to z ∈ B, for ǫ small, and ∇H̃P̃ ,∇H̃P̃ǫ are in N and are bounded
independently of ǫ. We say that the family D̃ has regularity A, if ∀K
⊆ B it holds
αK ≥ A.
Remark 3.2. To have regular projections is a strong condition, difficult to be verified
in general. The family of signature operators verifies the smoothness of P̃ [24, Theorem
2.2] but the smoothness of P̃ǫ is not clear even in that case.
The large time limit of the superconnection-Chern character StrΓ e
−B̃2t is computed in
[30, Theorem 5]. Specializing to our L2-setting it says the following.
Theorem 3.3. [30] Let ∇̃0 = P̃∇H̃P̃ . If D̃ has regular projections and regularity
> 3 dimB,
Str Γ(e
−B̃2t ) = Str Γe
3.2. The L2-eta form. We now use the same techniques of [30] to analyse the trans-
gression term in (2.3) and define the secondary invariant L2 eta form. We prove
Theorem 3.4. If D̃ has regular spectral projections and regularity > 3(dimB+1), then
= O(t−δ−1), for δ > 0. The same holds for trevenΓ
We start with some remarks and lemmas. In particular we shall repeatedly use the
following.
Remark 3.5. Let T ∈ N . From lemma A.6, ∀z ∈ B its Schwartz kernel [Tz] satis-
fies that for sufficiently large l, ∃ czl such that ∀x, y ∈ M̃z | [Tz](x, y) | ≤ czl ‖Tz‖−l,l
Therefore an estimate of ‖Tz‖−l,l produces directly via an estimate of TrΓ Tz.
Notation. Since in this section we are dealing only with the family of operators on the
covering, to simplify the notations let’s call D̃ = D, removing all tildes. Pose
Bǫ := (P +Qǫ)B(P +Qǫ) + PǫBPǫ
Aǫ = B− Bǫ
L2-RHO FORM FOR NORMAL COVERINGS OF FIBRE BUNDLES 9
and write the rescaled operators as
Bǫ,t = (P +Qǫ)(Bt −
tD)(P +Qǫ) +
tD + Pǫ(Bt −
tD)Pǫ (3.2)
Aǫ,t = (P +Qǫ)(Bt −
tD)Pǫ + Pǫ(Bt −
tD)(P +Qǫ)
Denote also Tǫ = QǫBQǫ and Tǫ,t = QǫBtQǫ as in [30].
We will need the following two lemmas from [30]. The first is the “diagonalization” of
ǫ with respect to the spectral splitting of H.
Lemma 3.6. [30, Prop.6] Let M be the space of all maps f : B → ΛTB⊗End H̃. There
exists a measurable section gǫ ∈ M, with gǫ ∈ 1 +N1 such that
∇20 0 0
0 T 2ǫ 0
0 0 (PǫBPǫ)
N3 0 0
0 N2 0
0 0 0
The diagonalization procedure acts on (P ⊕Qǫ)H, in fact gǫ has the form gǫ = ĝǫ ⊕ 1,
with ĝǫ acting on (P ⊕Qǫ)H. From this lemma we get B2ǫ,t = tδtB2ǫδ−1t =
= tδtg
∇20 0 0
0 T 2ǫ 0
0 0 (PǫBtPǫ)
N3 0 0
0 N2 0
0 0 0
 gǫδt =
= δtg
tδt(∇20 +N3)δ−1t 0 0
0 tδt(T
ǫ +N2)δ−1t 0
0 0 PǫBtPǫ
δtgǫδ
The next lemma gives an estimate of the terms which are modded out.
Lemma 3.7. [30, lemma 9] If A ∈ Nk is a residual term in the diagonalization lemma
or is a term in gǫ − 1 or g−1ǫ − 1, then, posing ǫ = t−
a , At := δtAδ
t verifies: ∀r, s
‖At‖r,s = O(t
a ) as t→ ∞.
The lemma implies that at place (1,1) in the diagonalized matrix above we get ∇20 +
O(t− 32+ 3a+1) = O(t− 12+ 3a ). To have −1
< 0 we take a > 6. The term at place (2,2)
gives T 2ǫ,t +O(t
a ). Then
ǫ,t = δtg
∇20 +O(t−γ) 0 0
0 T 2ǫ +O(t
a ) 0
0 0 (PǫBPǫ)
δtgǫδ
t , with γ > 0
Now since gǫ = ĝǫ ⊕ 1
ǫ,t =
∇20 +O(t−γ) 0
0 T 2ǫ,t +O(t
δtĝǫδ
0 PǫBPǫ
10 SARA AZZALI
Observe that since gǫ − 1, g−1ǫ − 1 ∈ N1, we have δtĝ−1ǫ δ−1t = Id+
O(t− 12+ 1a ).
Denote w := O(t− 12+ 1a ). Then
∇20 +O(t−γ) 0
0 T 2ǫ,t +O(t
δtĝǫδ
1 + w w
w 1 + w
∇20 +O(t−γ) 0
0 T 2ǫ,t +O(t
1 + w w
w 1 + w
Since e−∇
0+O(t−γ ) = e−∇
0 +O(t−γ), then leaving (P +Qǫ) out of the notation
ǫ,t =
1 + w w
w 1 + w
0 +O(t−γ) 0
0 e−T
1 + θ w
w 1 + w
+ e−(PǫBPǫ)
= e−(PǫBPǫ)
where
(1 + w)2e−∇
0 w(1 + w)e−∇
w(1 + w)e−∇
0 w2e−∇
O(t−1+ 2a ) O(t− 12+ 1a )
O(t− 12+ 1a ) O(t−1+ 2a )
(1 + w)2O(t−γ) w(1 + w)[O(t−γ) + e−T ]
w(1 + w)[O(t−γ) + e−T ] w2O(t−γ) + (1 + w)2e−T
Proof of theorem 3.4. To fix notation, say Z is even dimensional. In the odd case use
trevenΓ instead of Str Γ.
Let K ⊆ B be a compact, and denote as β = αK the Novikov–Shubin invariant on it.
Write Bt = Bǫ,t + Aǫ,t as in (3.2), and define Bt(z) = Bt,ǫ + zAt,ǫ, z ∈ [0, 1], so that by
Duhamel’s principle (for example [30, eq. (3.10)])
t − e−B2t,ǫ =
e−Bt(z)
dz = −
e−(s−1)B
t (z)
dB2t (z)
t (z)dsdz =: Fǫ,t
Write then
Str Γ(
t ) = Str Γ(
dBt,ǫ
︸ ︷︷ ︸
+Str Γ(
Fǫ,t)
︸ ︷︷ ︸
(3.3)
For the family
we shall use that
D + c(T )
D+O(t−
2 ), as in
Remark 2.2.
L2-RHO FORM FOR NORMAL COVERINGS OF FIBRE BUNDLES 11
3.2.1. The term I.
t,ǫ =
0 0 0
2QǫDQǫ 0
0 0 t−
2PǫDPǫ
+O(t−
e−(PǫBPǫ)
0 0 0
2QǫDQǫ 0
0 0 t−
2PǫDPǫ
0 0 0
0 0 0
0 0 0
O(t−1+ 2a ) O(t− 12+ 1a ) 0
O(t− 12+ 1a ) O(t−1+ 2a ) 0
0 0 0
0 0 0
2QǫDQǫ 0
0 0 t−
2PǫDPǫ
(1 + w)2O(t−γ) w(1 +w)2(O(t−γ) + e−T ) 0
w(1 + w)2(O(t−γ) + e−T ) w2O(t−γ) + (1 + w)2e−T 0
0 0 0
0 0 0
2QǫDQǫ 0
0 0 t−
2PǫDPǫ
e−(PǫBPǫ)
2PǫDPǫe
−(PǫBPǫ)2 +
0 0 0
2QǫDQǫO(t−
a ) QǫDQǫO(t−
a ) 0
0 0 0
0 0 0
2QǫDQǫw(1 + w)(O(t−γ) + e−T ) t−
2QǫDQǫ(w
2O(t−γ) + (1 + w)2e−T ) 0
0 0 0
The choice of a > 6 implies 2
. Moreover only diagonal blocks contribute1 to the
StrΓ, therefore we only have to guarantee the integrability of StrΓ(t
2PǫDPǫe
−PǫB2tPǫ),
because from [30, Prop.11] Str Γ e
−T = O(t−δ), ∀δ > 0.
We reason as follows: Str Γ(t
2PǫDPǫe
−PǫB2tPǫ) = t−
2 tr Γ(UPǫ), where U =
τPǫDPǫe
−PǫB2tPǫ , and τ is the chirality grading.
Next we evaluate trΓ(UPǫ) = trΓ(UP
ǫ ) = trΓ(PǫUPǫ). To do this, since our trace has
values differential forms, let ω1, . . . , ωJ a base of ΛT
zB, for z fixed on K. U is a family
of operators and Uz acts on C∞(M̃z, Ẽz)⊗ ΛT ∗zB. Write Uz =
j Uj ⊗ ωj.
tr Γ(PǫUPǫ) =
tr Γ(PǫUjPǫ)⊗ ωj =
tr(χFPǫUjPǫχF )⊗ ωj.
Now tr(χFPǫUjPǫχF ) =
i < χFPǫUjPǫχFδvi , δvi >=
i < UjPǫχFδvi , PǫχFδvi >,
where {δvi} is a base of L2(M̃z |F , Ẽz |F). Therefore
| < UjPǫχFδvi , PǫχFδvi > | ≤ ‖UjPǫχFδvi‖ · ‖PǫχFδvi‖ ≤
≤ ‖Uj‖ ‖PǫχFδvi‖
2 ≤ ‖Uz‖ ‖PǫχFδvi‖
1In fact if Pi are orthogonal projections s.t.
Pi = 1, then for a fibrewise operator A we have
StrA = tr ηA = tr(
PiηAPi) + tr(
PiηAPj) = tr(
PiηAPi).
12 SARA AZZALI
i ‖PǫχFδvi‖ =
i < PǫχFδvi , PǫχFδvi >=
i < χFPǫχFδvi , δvi >= tr Γ(Pǫ) =
O(ǫβ) where β = αK . Hence
tr Γ(PǫUPǫ) ≤ ‖U‖O(ǫβ) = ‖U‖O(t−
a ) , with ǫ = t−
Claim ([30, Lemma 13]):
∥ is bounded independently of t, for t large. This follows
because (PǫBPǫ)
2 = PǫD
2Pǫ − C̄t, with C̄t is a fibrewise differential operator of order
at most one with uniformly bounded coefficients. Therefore
2 C̄t
l,l−1
is bounded
independently of t, for t large. Now writing the Volterra series for e−t(PǫD
2+C̄t ,
we have U = τPǫ
e−tσ0PǫD
2PǫC̄te
−tσ1PǫD2Pǫ . . . C̄te
−tσkPǫD2Pǫdσ, then estimating
each addend as
−tσ0PǫD2PǫC̄te
−tσ1PǫD2Pǫ
∥τPǫDe
−tσ0PǫD2Pǫ
l,l+1
l+1,l
−tσ1PǫD2Pǫ
l,l+1
·· · ··
l+1,l
−tσkPǫD2Pǫ
l,l+1
we get the Claim.
Thus t−
2 trΓ(UPǫ) ≤ c ‖U‖ t−
2 , and Str Γ(
t,ǫ) ≤ ct
2 . We require then
< −1 to have integrability hence we need finally a < 2β
. Because a was also
required to be a > 6 (see lines after Lemma 3.7), the hypothesis
β > 3(q + 1) (3.4)
is a sufficient condition to have the first term in (3.3) equal O(t−1−δ), with δ > 0.
3.2.2. The term II. Now let’s consider the second term in (3.3). As in [30, pag.197-
198], write Bt =
tD + B1 +
B2, and locally B1 = d + Φ. We have
dB2t (z)
Bt(z)Aǫ,t + Aǫ,tBt(z) =
tDA1 + A2
tD + A3, where Ai = Ci,1PǫCi,2, and Ci,j ∈ M1
are sums of words in Φ, d(Φ), t−
2B[2], t
2 d(B[2]). This implies that Ci,j are differential
operators with coefficients uniformly bounded in t.
Str Γ
= tr Γτ(t
2D − t−
2B[2])
e−(s−1)B
t (z)(
tDC1,1PǫC1,2+
+ C2,1PǫC2,2
tD + C3,1PǫC3,2)e
−sB2t (z)dsdz =
= tr Γ
C1,2e
−sB2t (z)τ
e−(s−1)B
t (z)
tDC1,1Pǫ +
+ C2,2
tDe−sB
t (z)τ
e−(s−1)B
t (z)C2,1Pǫ+
+C3,2e
−sB2t (z)τ
e−(s−1)B
t (z)C3,1Pǫ
dsdz = tr Γ(PǫWPǫ)
with W the term in square brackets.
L2-RHO FORM FOR NORMAL COVERINGS OF FIBRE BUNDLES 13
With a similar argument as in the Claim above and as in [30, p. 199], we have that
2 e−sB
t (z)τe−(s−1)B
t (z)
∥ is bounded independently of t as t→ ∞ so that the condition
(3.4) on the Novikov–Shubin exponent guaranties that the term II. is O(t−1−δ) as t → ∞
as well. �
Theorem 3.4 and Proposition 2.5 taken together imply
Corollary 3.8. If D̃ has regular spectral projections and regularity > 3(dimB + 1)
η̂(2)(D̃) =
dt if dim Z̃ = even
trevenΓ
dt if dim Z̃ = odd
is well defined as a continuos differential form on B.
Remark 3.9. Theorem 3.4 gives η̂(2) as a continuos form on B. Therefore η̂(2) fits
into a weak L2-local index theorem (see [24, 4]). To get a strong local index theorem
one should prove estimates for Str Γ(
t ) in C1-norm, assuming more regularity
on αK .
Remark 3.10. If Z odd dimensional, ρ̂(2) is an even degree differential form, whose
zero degree term is a continuos function on B with values the Cheeger–Gromov L2-eta
invariant of the fibre, η̂
(b) = η(2)(Db, M̃b →Mb).
3.3. Case of uniform invertibility. Suppose the two families D and D̃ are both
uniformly invertible, i.e.
∃µ > 0 such that ∀b ∈ B
spec(Db) ∩ (−µ, µ) = ∅
spec(D̃b) ∩ (−µ, µ) = ∅
(3.5)
In this case the t → ∞ asymptotic is easy and in particular StrΓ(
t ) = O(t−δ),
∀δ > 0 [5]. With the same estimates (see [30, p. 194]) one can look at ∂
StrΓ(
and obtain that StrΓ(
= O(t−δ), ∀δ > 0.
4. The L2 rho form
Definition 4.1. Let (π : M → B, gM/B ,V, E) be a geometric family, p : M̃ → M a
normal covering of it. Assume that kerD forms a vector bundle, and that the family D̃
has regular projections with family Novikov–Shubin invariants αK > 3(dimB + 1). We
define the L2-rho form to be the difference
ρ̂(2)(M,M̃,D) := η̂(2)(D̃)− η̂(D) ∈ C0(B,ΛT ∗B).
14 SARA AZZALI
Remark 4.2. When the fibres are odd dimensional, ρ̂(2) is an even degree differential
form, whose zero degree term is a continuos function on B with values the Cheeger–
Gromov L2-rho invariant of the fibre, ρ̂
(b) = ρ(2)(Db, M̃b →Mb).
We say a continuos k-form ϕ on B has weak exterior derivative ψ (a (k+1)-form) if, for
each smooth chain c : ∆k+1 → B, it holds
ϕ, and we write dϕ = ψ.
Proposition 4.3. If π : M → B has odd dimensional fibres, ρ̂(2)(D) is weakly closed.
Proof. From (2.4),
odde−B̃
odde−B̃
dt. Tak-
ing the limits t → 0, T → ∞ we get
Â(M/B) ch(E/S) =
η̂(2)(D̃)
because limT→∞ tr
odde−B
T = tr(e−∇
0)odd = 0 because tr(e−∇
0) is a form of even degree.
The same happens for the family D̃ where
Â(M/B) ch(E/S) = dη̂(D̃) (strongly).
ρ̂(2)(D) = 0, which gives the result. �
Corollary 4.4. Under uniform invertibility hypothesis (3.5) the form ρ̂(2)(D) is always
(strongly) closed.
Proof. The argument is standard: from transgression formulæ (2.3) (2.4), asymptotic
behaviour, and Remark 3.9, we have dη̂(D) =
Â(M/B) ch(E/S) = dη̂(2)(D̃). �
5. ρ̂(2) and positive scalar curvature for spin vertical bundle
Let π : M → B be a smooth fibre bundle with compact base B. If ĝ denotes a metric
on the vertical tangent bundle T (M/B), and b ∈ B, denote with ĝb the metric induced
on the fibre Mb, and write ĝ = (ĝb)b∈B . Define
R+(M/B) := {ĝ metric on T (M/B) | scal ĝb > 0 ∀b ∈ B}
to be the space of positive scalar curvature vertical metrics (= PSC).
Assume that T (M/B) is spin and let ĝ ∈ R+(M/B) 6= ∅. By Lichnerowicz formula
the family of Dirac operators D/ĝ is uniformly invertible. Let p : M̃ → M be a normal
Γ-covering of π, with M̃ → B having connected fibres, and denote with r : M → BΓ the
map classifying it. The same holds for D̃/ĝ, so that we are in the situation of (3.3).
On the space R+(M/B) we can define natural relations, following [43]. We say ĝ0,
ĝ1 ∈ R+(M/B) are path-connected if there exists a continuos path ĝt ∈ R+(M/B)
between them.
L2-RHO FORM FOR NORMAL COVERINGS OF FIBRE BUNDLES 15
We say ĝ0 and ĝ1 are concordant if on the bundle of the cylinders Π: M × I → B,
Π(m, t) = π(m), there exists a vertical metric Ĝ such that: ∀b ∈ B Ĝb is of product-
type near the boundary, scal(Ĝb) > 0, and onM×{i} → B it coincides with ĝi, i = 0, 1.
Proposition 5.1. Let π : M → B be a smooth fibre bundle with T (M/B) spin and
B compact. Let p : M̃ → M be a normal Γ-covering of the fibre bundle, such that
M̃ → B has connected fibres. Then the rho class [ρ̂(2)(D/)] ∈ H∗dR(B) is constant on the
concordance classes of R+(M/B).
Proof. Let ĝ0 and ĝ1 be concordant, and Ĝ the PSC vertical metric on the family of
cylinders. The family of Dirac operators D/
M×I/B,Ĝ has as boundary the two families
D/0 = (Dz , ĝ0,z)z∈B and D/1 = (Dz , ĝ1,z)z∈B , both invertible. Then the Bismut–Cheeger
theorem in [11] can be applied
M×I/B
Â(M × I/B)− 1
η̂(D/ĝ0) +
η̂(D/ĝ1) in H∗dR(B)
where Ch(IndDM×I,h) = 0 ∈ H∗dR(B).
On the family of coverings we reason as before and apply the index theorem in [36,
Theorem 4] to get
M×I/B
Â(M × I/B)− 1
η̂(2)(D̃/ĝ0) +
η̂(2)(D̃/ĝ1) in H
dR(B)
Subtracting we get [ρ̂(2)(D/g0)] = [ρ̂(2)(D/g1)] ∈ H∗dR(B). �
5.1. ρ̂(2) and the action of a fibre bundle diffeomorphism on R+(M/B). Let
(p, π) be as in Definition 2.1 and assume further that p is the universal covering of M .
If one wants to use [ρ̂(2)(D/)] for the study of R+(M/B) it is important to check how this
invariant changes when ĝ ∈ R+(M/B) is acted on by a fibre bundle diffeomorphism f
preserving the spin structure.
Proposition 5.2. Let f : M →M be a fibre bundle diffeomorphism preserving the spin
structure. Then [ρ̂(2)(D/ĝ)] = [ρ̂(2)(D/f∗ ĝ)]
Proof. We follow the proof [43, Prop. 2.10] for the Cheeger–Gromov rho invariant. Let
ĝ be a vertical metric and denote S = PSpin(M/B) a fixed spin structure, i.e. a 2-fold
covering2 of PSOĝ(T (M/B)) →M .
The eta form downstairs of D/ depends in fact on ĝ, on the spin structure, and on the
horizontal connection THM , so we write here explicitly η̂(D/ĝ) = η̂(D/ĝ,S , THM).
First of all η̂(D/ĝ,S , THM) = η̂(D/f∗ ĝ,f∗S , f∗THM), because f induces a unitary equiva-
lence between the superconnections constructed with the two geometric structures.
2or, equivalently, a 2-fold covering of PGL+(T (M/B)) which is not trivial along the fibres of
PGL+(T (M/B)) → M , [43, p. 8].
16 SARA AZZALI
Because f spin structure preserving, it induces an isomorphism βGL+ between the orig-
inal spin structure S and the pulled back one df∗S. Then βGL+ gives a unitary equiv-
alence between the operator obtained via the pulled back structures, and the Dirac
operator for f∗ĝ and the chosen fixed spin structure, so that η̂(D/f∗ĝ,f∗S , f∗THM) =
η̂(D/f∗ĝ,S , f∗THM). Taken together
η̂(D/ĝ,S , THM) = η̂(D/f∗ĝ,S , f∗THM)
Let p : M̃ →M be the universal covering. Now we look at η̂(2)(D̃/) = η̂(2)(D̃/ĝ,S , THM,p),
where on M̃ the metric, spin structure and connection are the lift via p as by defini-
tion. Again, if we construct the L2 eta form for the entirely pulled back structure, we
get η̂(2)(D̃/ĝ,S , THM,p) = η̂(2)(D̃/f∗ĝ,f∗S , f∗THM,f∗p). Proceeding as above on the spin
structure, η̂(2)(D̃/f∗ĝ,f∗S , f∗THM,f∗p) = η̂(2)(D̃/f∗ĝ,S , f∗THM,f∗p). Since M̃ is the uni-
versal covering we have a covering isomorphism between f∗M̃ and M̃ , which becomes
an isometry when M̃ is endowed of the lift of the pulled back metric f∗ĝ, therefore
η̂(2)(D̃/f∗ ĝ,S , f
∗THM,f∗p) = η̂(2)(D̃/f∗ ĝ,S , f
∗THM,p)
It remains to observe how η̂ and η̂(2) depends on the connection T
HM . We remove for
the moment the hat ˆ to simplify the notation. Let TH0 M,T
1 M two connections, say
given by ω0, ω1 ∈ Ω1(M,T (M/B)) and pose ωt = (1− t)ω0 + tω1. Construct the family
M̆ =M × [0, 1] π̆→ B× [0, 1] =: B̆ as in the proof of Prop. 2.5. On this fibre bundle put
the connection one form ω̆ + dt. Since d̆η̆ = dη̆(·, t)− ∂
η(t)dt we have
η0 − η1 =
d̆η̆ −
M̆/B̆
Â(M × I/B × I)− d
which is the sum of a local contribution plus an exact form. Writing the same for η(2)
we get that for the L2-rho form ρ̂(2)(D/, TH0 M) = ρ̂(2)(D/, TH1 M) ∈ Ω(B)/dΩ(B) and
therefore we get the result. �
5.2. Conjectures. Along the lines of [31, 42] we can state the following conjectures.
Conjecture 5.1. If Γ is torsion-free and satisfies the Baum-Connes conjecture for the
maximal C∗-algebra, then [ρ̂(2)(D/ĝ)] vanishes if ĝ ∈ R+(M/B).
Definition 5.3. Let π : M → B and θ : N → B be two smooth fibre bundles of compact
manifolds over the same base B. A continuos map h : N → M is called a fibrewise
homotopy equivalence if π ◦ h = θ, and there exists g : N →M such that θ ◦ g = π and
such that h ◦ g, g ◦ h are homotopic to the identity by homotopies that take each fibre
into itself.
We work in the following with smooth fibrewise homotopy equivalences.
Definition 5.4. Let Γ be a discrete group and (π : M → M,p : M̃ → M), (θ : N →
B, q : Ñ → N) be two normal Γ-coverings of the fibre bundles π and θ. Denote as
r : M → BΓ, s : N → BΓ the two classifying maps. We say (π, p) and (θ, q) are Γ-
fibrewise homotopy equivalent if there exists a fibrewise homotopy equivalence h : N →
M such that s ◦ h is homotopic to r.
L2-RHO FORM FOR NORMAL COVERINGS OF FIBRE BUNDLES 17
Let Dsign denote the family of signature operators.
Conjecture 5.2. Assume Γ is a torsion-free group that satisfies the Baum-Connes
conjecture for the maximal C∗-algebra. Let h be a orientation preserving Γ-fibrewise
homotopy equivalence between (π, p) and (θ, q) and suppose D̃sign
and D̃sign
have smooth
spectral projections and Novikov–Shubin invariants > 3(dimB + 1).
Then [ρ̂(2)(D̃signM/B)] = [ρ̂(2)(D̃
)] ∈ H∗dR(B).
Appendix A. Analysis on normal coverings
We summarize the analytic tools we use to investigate L2 spectral invariants, namely
NΓ-Hilbert spaces and Sobolev spaces on manifolds of bounded geometry, following the
nice exposition in [46].
A.1. NΓ-Hilbert spaces and von Neumann dimension. Let Γ be a discrete count-
able group and l2(Γ) the Hilbert space of complex valued, square integrable functions
on Γ. Denote with δγ ∈ CΓ the function with value 1 on γ, and zero elsewhere. The
convolution law on CΓ is δγ ∗ δβ = δγβ. Let L be the action of Γ on l2(Γ) by left con-
volution L : Γ → U(l2(Γ)), Lγ(f) = (δγ ∗ f)(x) = f(γ−1x). Right convolution action is
denoted by R.
Definition A.1. The group von Neumann algebra NΓ is defined to be the weak closure
NΓ := L(CΓ)weak in B(l2(Γ)). By the double commutant theorem NΓ = R(CΓ)′, so
that NΓ is the algebra of operators commuting with the right action of Γ. An important
feature of the group von Neumann algebra is its standard trace trΓ : NΓ −→ C defined
as trΓA =< Aδe, δe >l2(Γ). In particular for A =
aγLγ ∈ NΓ, then trΓ(A) = ae.
Definition A.2. A free NΓ-Hilbert space is a Hilbert space of the form W ⊗ l2(Γ),
where W is a Hilbert space and Γ acts on l2(Γ) on the right.
A NΓ-Hilbert space H is a Hilbert space with a unitary right-action of Γ such
that there exists a Γ-equivariant immersion H → V ⊗ l2(Γ) in some free NΓ-
Hilbert space. For H1,H2 NΓ-Hilbert spaces, define BΓ(H1H2) : = {T : H1 →
H2 bounded and Γ-equivariant}.
Let H = V ⊗ l2(Γ) be a free NΓ-Hilbert space. Then BΓ(V ⊗ l2(Γ)) ≃ B(V) ⊗ NΓ.
There exist a trace on the positive elements of this von Neumann algebra, with values
in [0,∞]: let (ψj)j∈N is a orthonormal base of V; if f ∈ B(H)+, its trace is given by
trΓ(f) =
j∈N < f(ψj ⊗ δe), ψj ⊗ δe >. A Γ-trace can be defined also on any NΓ-
Hilbert-space H using the immersion j : H →֒ V ⊗ l2Γ and proveing that the trace does
not depend on the choice of j (see [17] or [39, pag. 17]).
Definition A.3. Let H be a NΓ-Hilbert space. Its von Neumann dimension is defined
as dim Γ(H) = trΓ(id : H → H) ∈ [0,+∞).
Definition A.4. Let H1 and H2 be NΓ-Hilbert spaces. Define
18 SARA AZZALI
• BfΓ(H1,H2) := {A ∈ BΓ(H1,H2)′|dim Γ
<∞} are the Γ-finite rank oper-
ators
• B∞Γ (H1,H2) := B
Γ(H1,H2)
, are the Γ-compact operators
• B2Γ(H) := {A ∈ BΓ(H)s.t. trΓ (AA∗) <∞}, are the Γ-Hilbert-Schmidt operators
• B1Γ(H) := B2Γ(H)B2Γ(H)∗ the Γ-trace class operators.
Their main properties are:
1) Bf (H),B∞(H),B2(H),B1(H) are ideals and Bf ⊂ B1 ⊂ B2 ⊂ B∞;
2) A ∈ Bi(H) if and only if |A| ∈ Bi(H) for i = 1, 2, f,∞.
A.2. Covering spaces, bounded geometry techniques. Let p : Z̃ → Z a normal
Γ-covering of a compact Riemannian manifold Z. Let I ⊂ Z̃ be a fundamental domain
for the (right) action of Γ on Z̃ (I is an open subset s.t. I · γ ∩ I and Z̃ \
I · γ have
zero measure ∀γ 6= e).
Let E → Z a Hermitian vector bundle, and Ẽ = p∗E the pull-back. The sections
C∞c (Z̃, Ẽ) form a CΓ-right module for the action (ξ · f)(m̃) =
(R∗gξ)(m̃)f(g
−1) where
(R∗gξ)(m̃) := ξ(m̃g). Its Hilbert space completion L
2(Z̃, Ẽ) is a Γ-free Hilbert space
in the sense of definition A.2, in fact the map ψ : L2(Z̃, Ẽ) −→ L2(I, Ẽ|I) ⊗ l2(Γ),
|I ⊗ δγ is an isomorphism.
The Γ-trace class operators are characterized as follows: let A ∈ BΓ(L2(Z̃, Ẽ))
A ∈ B1Γ(L2(Z̃, Ẽ)) if and only if χI |A|χI ∈ B1(L2(I, E|I))
If A ∈ B1Γ(L2(Z̃, Ẽ)) then trΓ(A) = tr(χIAχI). If A ∈ B1Γ(L2(Z̃, Ẽ)) has Schwartz
kernel [A] continuos, then
trA =
([A](x, x)) dx =
π∗ tr Ẽx ([A](x, x)) dx . (A.1)
The covering of a compact manifold and the pulled back bundle Ẽ above are the most
simple examples of manifolds of bounded geometry3.
The analysis on manifolds of bounded geometry was developped in [45]. We specialize
here to the case of a normal covering Z̃.
The Sobolev spaces of sections are defined, for k ≥ 0, as the completion Hk(Z̃, Ẽ) :=
C∞c (Z̃, Ẽ)
where ‖f‖k :=
L2(Z̃,Ẽ⊗jT ∗Z̃); for k < 0 H
k(Z̃, Ẽ) is defined
as the dual of H−k(Z̃, Ẽ).
3Let (N, g) be a Riemannian manifold. N is of bounded geometry if
(1) it has positive injectivity radius i(N, g);
(2) the curvature RN and all its covariant derivatives are bounded.
A hermitian vector bundle E → N is of bounded geometry if the curvature RE and all its covariant
derivatives are bounded. This can be characterized in normal coordinates with conditions on g, coordi-
nate transformations and ∇ (see for example in [45] and [46]).
L2-RHO FORM FOR NORMAL COVERINGS OF FIBRE BUNDLES 19
The spaces of uniform Ck sections are defined as follows: UCk(M̃) = {f : M̃ → C | f ∈
Ck and ‖f‖k ≤ c(k) ∀k}, where ‖f‖k = supm̃∈M̃,Xi{|∇X1 . . .∇Xkf(m̃)|}, and analo-
gously for sections UCk(M̃, Ẽ). UC∞(M̃, Ẽ) is the Fréchet space :=
k UCk(M̃ , Ẽ).
The following Sobolev embedding property holds [45]: if dim M̃ = n, then for j > n
there is a continuos inclusion Hj(M̃ , Ẽ) →֒ UCk(M̃ , Ẽ).
The algebra UDiff(M̃, Ẽ) of uniform differential operators is the algebra generated by
operators in UC∞(M̃ ,End Ẽ) and derivatives {∇ẼX}X∈UC∞(M̃ ,TM̃) with respect to uni-
form vector fields. P ∈ UDiff(M̃ , Ẽ) extends to a continuos operator Hj(M̃, Ẽ) →
Hj−k(M̃ , Ẽ) ∀j ∈ Z.
P ∈ UDiff(M̃ , Ẽ) is called uniformly elliptic if its principal symbol σpr ∈
UC∞(T ∗M̃, π∗ End Ẽ) is invertible out of an ǫ-neighborhood of 0 ∈ T ∗M̃ , with inverse
section which can be uniformly estimated.
For a uniformly elliptic operator T the G̊arding inequality holds:
Hs+k(M̃,Ẽ)
≤ c(s, k) (‖ϕ‖Hs + ‖Tϕ‖Hs) ∀s ∈ R (A.2)
If T is a continuos operator T : C∞c (N,E) → (C∞c (M̃ , Ẽ))′ we will denote its Schwartz
kernel with [T ] ∈ C∞(M̃ × M̃, Ẽ4Ẽ∗).
Definition A.5. We say that T : C∞c (N,E) → (C∞c (M̃ , Ẽ))′ has order k ∈ Z if ∀s ∈
Z it admits a bounded extension Hs(M̃, Ẽ) → Hs−k(M̃, Ẽ). Hence it is closable as
unbounded operator on L2(M̃ , Ẽ).
The space of order k operators is denoted Opk(M̃, Ẽ), and comes with the seminorms
on B(Hs(M̃ , Ẽ),Hs−k(M̃ , Ẽ)). The space Op−∞(M̃, Ẽ) =
k(M̃, Ẽ) is a Fréchet
space.
Finally, an operator T ∈ Opk(M̃ , Ẽ) is called elliptic if it satisfies G̊arding inequality.
We will denote as OpkΓ(M̃, Ẽ) the subspace of Γ-invariant operators in Op
k(M̃, Ẽ).
Consider the Fréchet space of continuos rapidly decreasing functions
RB(R) = {f : R → C : f continuos, and
∣(1 + x
2 f(x)
∣ <∞ ∀k}
Let T ∈ Opk(M̃ , Ẽ) , k ≥ 1 an elliptic, formally self-adjoint operator. Denote again by T
its closure, with domain Dom T = Hk(M̃ , Ẽ). From G̊arding inequality (A.2) the map
RC(R) −→ B(Hj(M̃ , Ẽ),H l(M̃ , Ẽ)), f 7→ f(T ) is continuos ∀j, l ∈ Z, so that
RC(R) −→ Op−∞(Z̃, Ẽ) , f 7→ f(T )
is continuos. One can prove that the Schwartz kernel of such operator is smooth: by
(A.2) and Sobolev embedding, for L = [n
+ 1] the map
Op−2L−l(Z̃, Ẽ) −→ UC l(Z̃ × M̃,E4E∗) , T 7→ [T ] (A.3)
is continuos ∀l ∈ N; then in particular for f ∈ RC(R), the kernel [f(T )] ∈ UC∞(Z̃ ×
Z̃, Ẽ4Ẽ∗) and the map RB(R) −→ UC∞(Z̃ × Z̃, Ẽ4Ẽ∗) , f 7→ [f(T )] is continuos.
20 SARA AZZALI
Lemma A.6. Since elements in Op−∞Γ (M̃, Ẽ) are Γ-trace class, then for T ∈
Op kΓ(M̃, Ẽ) elliptic and selfadjoint, the map RC(R) → B1Γ(M̃ , Ẽ) , f 7→ f(T ) is contin-
uos. As a consequence ∀m ∃l such that
| trΓ f(T )|m ≤ C ‖f(T )‖−l,l . (A.4)
References
[1] M. F. Atiyah, Elliptic operators, discrete groups and von Neumann algebras, Asterisque 32/33
(1976), 43–72.
[2] M. F. Atiyah, V. K. Patodi, I. M. Singer. Spectral asymmetry and Riemannian geometry I, Math.
Proc. Cambridge Philos. Soc. 77, 43–49, 1975.
[3] M.F. Atiyah, V.K. Patodi, I. M. Singer, Spectral asymmetry and Riemannian geometry II, Math.
Proc. Cambridge Philos. Soc. 78 (3), 405–432, 1975.
[4] S. Azzali, S. Goette, T. Schick, in preparation.
[5] S. Azzali, Two spectral invariants of type rho Ph.D. thesis, Università La Sapienza, Roma 2007.
[6] M-T. Benameur, J. Heitsch, Index theory and non-commutative geometry. I and II. K-Theory
36 (2005), and J. K-Theory 1 (2008).
[7] M-T. Benameur, J. Heitsch, The Twisted Higher Harmonic Signature for Foliations, preprint
arXiv:0711.0352v2 [math.KT]
[8] M-T. Benameur, P. Piazza, Index, eta and rho invariants on foliated bundles, Asterisque 327,
2009, 199–284.
[9] N. Berline, E. Getzler, M. Vergne, Heat kernels and Dirac operators, Springer-Verlag, New York
1992.
[10] J-M. Bismut, The Atiyah–Singer index theorem for families of Dirac operators: two heat equation
proofs, Invent. Math. 83 (1986), 91–151.
[11] J-M. Bismut, J. Cheeger, Families index for manifolds with boundary, superconnections, and
cones I and II, J. Funct. Anal. 89 (1990), no. 2, 313–363.
[12] J-M. Bismut, D. Freed, The analysis of elliptic families II, Comm. Math. Phys. 107 (1986),
103–163.
[13] J-M. Bismut, J. Cheeger, Eta invariants and their adiabatic limits, J. Amer. Math. Soc. 2
(1989), 33–70.
[14] J-M. Bismut, J. Lott, Flat vector bundles, direct images and higher real analytic torsion, J.
Amer. Math. Soc. 8 (1995).
[15] B. Botvinnik, P. B. Gilkey, The eta invariant and metrics of positive scalar curvature, Math.
Ann., 302 (3), 507-517, 1995.
[16] S. Chang, S. Weinberger, On Invariants of Hirzebruch and Cheeger-Gromov, Geom. Topol., 7,
pp. 311–319, (2003).
[17] J. Cheeger, M. Gromov, On the characteristic numbers of complete manifolds of bounded cur-
vature and finite volume, in Differential geometry and complex analysis, pp. 115–154, Springer,
Berlin, 1985.
[18] J. Cheeger, M. Gromov, Bounds on the von Neumann dimension of L2-cohomology and the
Gauss-Bonnet theorem for open manifolds, J. Differential Geom. 21 (1985), no. 1, pp.1–34
[19] X. Dai, Adiabatic limits, nonmultiplicativity of signature, and Leray spectral sequence, J. Amer.
Math. Soc. 4 (1991), 265–321.
[20] H. Donnelly, Asymptotic expansion for the compact quotients of properly discontinuos group
actions, Illinois Journal of Mathematics 23 (3) , 485-496, 1979.
[21] H. Donnelly, Local index theorem for families, Michigan Math J. 35 (1988), 11-20.
[22] D. V. Efremov, M. A. Shubin, Spectrum distribution function and variational principle for au-
tomorphic operators on hyperbolic space, Séminaire sur les Équations aux Dérivées Partielles,
1988–1989, Exp. No. VIII, École Polytech., Palaiseau, 1989.
[23] D. Freed, Notes on index theory, http://www.ma.utexas.edu/users/dafr/Index/index.html.
L2-RHO FORM FOR NORMAL COVERINGS OF FIBRE BUNDLES 21
[24] D. Gong and M. Rothenberg, Analytic torsion forms for noncompact fiber bundles, MPIM
preprint 1997. Available at www.mpim-bonn.mpg.de/preprints.
[25] A. Gorokhovsky, J. Lott, Local index theory over étale groupoids J. Reine Angew. Math. 560
(2003).
[26] A. Gorokhovsky, J. Lott, Local index theory over foliation groupoids, Adv. Math 204 (2006),
413–447.
[27] M. Gromov, M. A. Shubin, Von Neumann spectra near zero, Geom. and Funct. Analysis vol. 1,
No. 4 (1991) 375–404
[28] A. Hassell, R. Mazzeo, R. Melrose, A signature formula for manifolds with corners of codimension
two, Topology 36 (5), 1055–1075, 1997.
[29] J. Heitsch, Bismut superconnections and the Chern character for Dirac operators on foliated
manifolds, K-Theory 9 (1995), no. 6, 507–528.
[30] J. Heitsch, C. Lazarov, A general families index theorem, K-theory 18, 181–202, 1999.
[31] N. Keswani, Von Neumann eta-invariants and C∗-algebra K-theory, J. London Math. Soc. (2)
62 (2000), 771–783.
[32] J. Lott, Higher eta invariants K-theory 6, 191-233, (1992).
[33] J. Lott, Heat kernels on covering spaces and topological invariants, J. Diff. Geom. 35 (1992),
471–510.
[34] J. Lott, Eta and Torsion, Symtries quantiques (Les Houches, 1995), 947–955, North-Holland,
Amsterdam, 1998.
[35] Superconnections and higher index theory, Geom. Funct. Anal. 2 (1992), 421–454.
[36] E. Leichtnam, P. Piazza, Étale groupoids, eta invariants and index theory. J. Reine Angew. Math.
587 (2005), 169–233.
[37] E. Leichtnam, P. Piazza, On higher eta-invariants and metrics of positive scalar curvature, K-
Theory 24 (2001), 341–359.
[38] E. Leichtnam, P. Piazza, Spectral sections and higher Atiyah-Patodi-Singer index theory on
Galois coverings, Geom. Funct. Anal. 8 (1998), 17–58.
[39] W. Lück, L2-invariants: theory and applications to geometry and K-theory, Springer-Verlag,
Berlin, 2002.
[40] V. Mathai, The Novikov conjecture for low degree cohomology classes, Geometriae Dedicata 99,
(2003) 1–15.
[41] R. B. Melrose, P. Piazza, Families of Dirac operators, boundaries and the b-calculus. J. Differ-
ential Geom. 46 (1997), no. 1, 99–180.
[42] P. Piazza, T. Schick, Bordism, rho-invariants and the Baum-Connes conjecture. J. of Noncom-
mutative Geometry 1 (2007), 27–111.
[43] P. Piazza, T. Schick, Groups with torsion, bordism and rho invariants. Pacific J. Math. 232
(2007), no. 2, 355–378.
[44] D. Quillen, Superconnections and the Chern character. Topology 24 (1985), 89–95.
[45] J. Roe, An index theorem on open manifolds I. J. Diff. Geom. 27 (1988), 87–113.
[46] B. Vaillant, Indextheorie für Überlagerungen, Diploma Thesis, 1996, available at www.math.uni-
bonn.de/people/strohmai/globan/boris/
Mathematisches Institut, Georg-August Universität Göttingen
E-mail address: azzali@uni-math.gwdg.de
	1. Introduction
	2. Geometric families in the L2-setting
	2.1. The heat operator for the covering family
	2.2. Transgression formulæ, eta integrands
	2.3. The t0 asymptotic
	3. The L2-eta form
	3.1. The family Novikov–Shubin invariants
	3.2. The L2-eta form
	3.3. Case of uniform invertibility
	4. The L2 rho form
	5. (2) and positive scalar curvature for spin vertical bundle
	5.1. (2) and the action of a fibre bundle diffeomorphism on R+(M/B)
	5.2. Conjectures
	Appendix A. Analysis on normal coverings
	A.1. N-Hilbert spaces and von Neumann dimension
	A.2. Covering spaces, bounded geometry techniques
	References
ABSTRACT
  We define the secondary invariants L^2- eta and -rho forms for families of
generalized Dirac operators on normal coverings of fibre bundles. On the
covering family we assume transversally smooth spectral projections, and
Novikov--Shubin invariants bigger than 3(dim B+1) to treat the large time
asymptotic for general operators. In the particular case of a bundle of spin
manifolds, we study the L^2- rho class in relation to the space of positive
scalar curvature vertical metrics.

<|endoftext|><|startoftext|>
Introduction
Let A and B be algebras and n ≥ 2 an integer. A linear map φ : A → B is an
n-homomorphism if for all a1, a2, . . . , an ∈ A,
φ(a1a2 · · · an) = φ(a1)φ(a2) · · ·φ(an).
A 2-homomorphism is then just a homomorphism, in the usual sense, between
algebras. Furthermore, every homomorphism is clearly also an n-homomorphism
for all n ≥ 2, but the converse is false, in general. The concept of n-homomorphism
was studied for complex algebras by Hejazian, Mirzavaziri, and Moslehian [7]. This
concept also makes sense for rings and (semi)groups. For example, an AEn-ring is a
ring R such that every additive endomorphism φ : R → R is an n-homomorphism;
Feigelstock [4, 5] classified all unital AEn-rings.
In [7], Hejazian et al. ask: Is every ∗-preserving n-homomorphism between C⋆-
algebras continuous? We answer in the affirmative by proving that every involutive
n-homomorphism φ : A → B between C⋆-algebras is in fact norm contractive:
‖φ‖ ≤ 1. Surprisingly, the arguments for the even and odd n cases are disjoint
and, thus, are discussed in different sections. When n = 3, automatic continuity is
reported by Bračič and Moslehian [2], but note that the proof of their Theorem 2.1
does not extend to the nonunital case since the unitization of a 3-homomorphism
is not a 3-homomorphism, in general.
Using these automatic continuity results, we prove the following stronger results:
If n > 2 is even, every ∗-linear n-homomorphism φ : A → B between C⋆-algbras
is in fact a ∗-homomorphism. If n ≥ 3 is odd, every ∗-linear n-homomorphism
φ : A→ B is a difference φ(a) = ψ1(a)−ψ2(a) of two orthogonal ∗-homomorphisms
ψ1 ⊥ ψ2. Regardless, for all integers n ≥ 3, every positive linear n-homomorphism
MSC 2000 Classification: Primary 46L05; Secondary 47B99, 47L30.
c©1997 American Mathematical Society
http://arxiv.org/abs/0704.0910v3
2 EFTON PARK AND JODY TROUT
is a ∗-homomorphism. Note that if ψ is a ∗-homomorphism, then −ψ = 0− ψ is a
norm contractive ∗-preserving 3-homomorphism that is not positive linear.
There is also a dichotomy between the unital and nonunital cases. When the
domain algebra A is unital, there is a simple representation of an n-homomorphism
as a certain n-potent multiple of a homomorphism (discussed in the Appendix.)
The nonunital case is more subtle. For example, if A and B are nonunital (Banach)
algebras such that An = Bn = {0}, then every linear map L : A→ B (bounded or
unbounded) is, trivially, an n-homomorphism (see Examples 2.5 and 4.3 of [7]).
The outline of the paper is as follows: In Section 2, we prove automatic continuity
for the even case and in Section 3 for the odd case. In Section 4, we prove our
nonexistence results. A key fact in many of our proofs is the Cohen Factorization
Theorem [3] of C⋆-algebras. (See Proposition 2.33 [8] for an elementary proof of this
important result.) Finally, in Appendix A, we collect some facts about n-potents
that we need.
The authors would like to thank Dana Williams and Tom Shemanske for their
helpful comments and suggestions.
2. Automatic Continuity: The Even Case
In this section, we prove that when n > 2 is even, every involutive (i.e., ∗-linear)
n-homomorphism between C⋆-algebras is completely positive and norm contractive,
which generalizes the well-known result for ∗-homomorphisms (n = 2). Recall that
a linear map θ : A→ B between C⋆-algebras is positive if a ≥ 0 implies θ(a) ≥ 0 or,
equivalently, for every a ∈ A there is a b ∈ B such that θ(a∗a) = b∗b. We say that
θ is completely positive if, for all k ≥ 1, the induced map θk : Mk(A) → Mk(B),
θk((aij)) = (θ(aij)), on k × k matrices is positive.
Theorem 2.1. Let H be a Hilbert space. If n ≥ 2 is even, then every involutive
n-homomorphism from a C*-algebra A into B(H) is completely positive.
Proof. Let φ : A → B(H) be an involutive n-homomorphism. We may assume
n = 2k > 2. Let 〈·, ·〉 denote the inner product on H. By Stinespring’s Theorem
[9] (see Prop. II.6.6 [1]), φ is completely positive if and only for any m > 1 and
elements a1, . . . , am ∈ A and vectors v1, . . . , vm ∈ H we have
i,j=1
〈φ(a∗i aj)vj , vi〉 ≥ 0.
We proceed as follows: for each 1 ≤ i ≤ m use the Cohen Factorization Theorem
[3] to factor ai = ai1 · · · aik into a product of k elements. Thus, their adjoints factor
as a∗i = a
ik · · · a
i1. Since n = 2k, we compute
i,j=1
〈φ(a∗i aj)vj , vi〉 =
i,j=1
〈φ(a∗ik · · ·a
i1aj1 · · · ajk)vj , vi〉
i,j=1
〈φ(aik)
∗ · · ·φ(ai1)
∗φ(aj1) · · ·φ(ajk)vj , vi〉
φ(aj1) · · ·φ(ajk)vj ,
φ(ai1) · · ·φ(aik)vj〉
= 〈x, x〉 ≥ 0,
NONEXISTENCE OF NONTRIVIAL n-HOMOMORPHISMS 3
where x =
i=1 φ(ai1) · · ·φ(aik)vi ∈ H. The result now follows. �
Even though the previous result is a corollary of the more general theorem below,
we have included it because the proof technique is different.
Lemma 2.2. Let φ : A → B be an n-homomorphism. Then, for all k ≥ 1, the
induced maps φk : Mk(A) → Mk(B) on k × k matrices are n-homomorphisms.
Moreover, if φ is involutive (φ(a∗) = φ(a)∗), then each φk is also involutive.
Proof. Given n matrices a1 = (a1ij), . . . , a
n = (anij) in Mk(A), we can express
their product a1a2 · · · an = (aij), where the (i, j)-th entry aij is given by the formula
aij =
m1,··· ,mn−1=1
a1im1a
· · · anmn−1j .
Since φk(a
1a2 · · · an) = (φ(aij)) by definition and
φ(aij) =
m1,··· ,mn−1=1
φ(a1im1a
· · ·anmn−1j)
m1,··· ,mn−1=1
φ(a1im1)φ(a
) · · ·φ(anmn−1j)
= [φk(a
1)φk(a
2) · · ·φk(a
n)]ij ,
it follows that φk : Mk(A) → Mk(B) is an n-homomorphism. Now suppose that φ
is involutive. We compute for all a = (aij) ∈Mk(A):
∗) = φk((a
ji)) = (φ(a
ji)) = (φ(aji)
∗) = φk(a)
and hence each φk :Mk(A) →Mk(B) is involutive. �
Theorem 2.3. Let φ : A → B be an involutive n-homomorphism between C*-
algebras. If n ≥ 2 is even, then φ is completely positive. Thus, φ is bounded.
Proof. We may assume n = 2k > 2. Since φ is linear, we want to show that for
every a ∈ A we have φ(a∗a) ≥ 0. By the Cohen Factorization Theorem, for any
a ∈ A we can find a1, ..., ak ∈ A such that the factorization a = a1 · · · ak holds.
Thus, the adjoint factors as a∗ = a∗k · · ·a
1. Since n = 2k and φ is n-multiplicative
and ∗-preserving,
φ(a∗a) = φ(a∗k · · · a
1a1 · · · ak)
= φ(ak)
∗ · · ·φ(a1)
∗φ(a1) · · ·φ(ak)
= (φ(a1) · · ·φ(ak))
∗(φ(a1) · · ·φ(ak))
= b∗b ≥ 0,
where b = φ(a1) · · ·φ(ak) ∈ B. Thus, φ is a positive linear map. By the previous
lemma, all of the induced maps φk : Mk(A) → Mk(B) on k × k matrices are
involutive n-homomorphisms and are positive. Hence, φ is completely positive and
therefore bounded [1]. �
We now wish to show that if n ≥ 2 is even, then an involutive n-homomorphism
is actually norm-contractive. First, we will need generalizations of the familiar
C⋆-identity appropriate for n-homomorphisms.
4 EFTON PARK AND JODY TROUT
Lemma 2.4. Let A be a C⋆-algebra. For all k ≥ 1, we have that
‖x‖2k = ‖(x∗x)k‖
‖x‖2k+1 = ‖x(x∗x)k‖
for all x ∈ A.
Proof. In the even case, we have easily that
‖x‖2k = (‖x‖2)k = ‖x∗x‖k = ‖(x∗x)k‖
by the functional calculus since x∗x ≥ 0. In the odd case, we compute again using
the C⋆-identity and functional calculus:
‖x(x∗x)k‖2 = ‖(x(x∗x)k)∗(x(x∗x)k)‖
= ‖(x∗x)kx∗x(x∗x)k‖
= ‖(x∗x)2k+1‖ = ‖(x∗x)‖2k+1
= (‖x‖2)2k+1 = (‖x‖2k+1)2;
the result follows by taking square roots. �
Theorem 2.5. Let φ : A → B be an involutive n-homomorphism of C⋆-algebras.
If φ is bounded, then φ is norm contractive (‖φ‖ ≤ 1).
Proof. Suppose n = 2k is even. Then for all x ∈ A we have
(x∗x)k
= φ(x∗x · · ·x∗x) =
φ(x∗)φ(x)
φ(x)∗φ(x)
Thus by the previous lemma,
‖φ(x)‖n = ‖φ(x)‖2k
= ‖(φ(x)∗φ(x))k‖ = ‖φ((x∗x)k)‖
≤ ‖φ‖‖(x∗x)k‖ = ‖φ‖‖x‖2k = ‖φ‖‖x‖n,
which implies that ‖φ‖ ≤ 1 by taking n-th roots.
The proof for the odd case n = 2k + 1 is similar. �
3. Automatic Continuity: The Odd Case
The positivity methods above do not work when n is odd, since the negation
of a ∗-homomorphism defines an involutive 3-homomorphism that is (completely)
bounded, but not positive. We need the following slight generalization of Lemma
3.5 of Harris [6].
Lemma 3.1. Let A be a C⋆-algebra and let λ 6= 0 and k ≥ 1. If a ∈ A then
λ ∈ σ((a∗a)k) if and only if there does not exist an element c ∈ A with
(1) c (λ− (a∗a)k) = a.
Proof. If λ 6∈ σ((a∗a)k), then c = a(λ− (a∗a)k)−1 ∈ A satisfies
c (λ− (a∗a)k) = a(λ− (a∗a)k)−1(λ− (a∗a)k) = a.
and so (1) holds.
NONEXISTENCE OF NONTRIVIAL n-HOMOMORPHISMS 5
On the other hand, if λ ∈ σ((a∗a)k) then, by the commutative functional
calculus, there is a sequence {bm}
1 in the unitization A
+ with bm 6→ 0 but
dm =def (λ − (a
∗a)k)bm → 0. Since λ 6= 0 we must have
a∗(aa∗)k−1(abm) = (a
∗a)kbm = λbm − dm 6→ 0,
which implies abm 6→ 0. Hence, there does not exist an element c ∈ A that can
satisfy equation (1), since this would imply that
abm = c (λ− (a
∗a)k)bm → 0,
which is a contradiction. This proves the lemma. �
We now prove automatic continuity for involutive n-homomorphisms of C⋆-
algebras for all odd values of n. Note that we do not assume that A is unital,
nor do we appeal to the unitization φ+ : A+ → B+ of φ, which is not an n-
homomorphism, in general.
Theorem 3.2. Let φ : A → B be an involutive n-homomorphism between C⋆-
algebras. If n ≥ 3 is odd, then ‖φ‖ ≤ 1, i.e., φ is norm contractive.
Proof. Let n = 2k + 1 where k ≥ 1. Given any a ∈ A and λ > 0 such that that
λ 6∈ σ((a∗a)k), there is, by the previous lemma, an element c ∈ A such that
a = c (λ− (a∗a)k) = (λc− c(a∗a)k).
Noting that c(a∗a)k is a product of 2k + 1 = n elements in A, and φ is a ∗-linear
n-homomorphism, we compute:
φ(a) = φ(λc − c(a∗a)k) = λφ(c) − φ(c(a∗a)k)
= λφ(c) − φ(c)(φ(a)∗φ(a))k = φ(c)(λ − (φ(a)∗φ(a))k)
which yields that there is an element φ(c) ∈ B with:
φ(c)(λ − (φ(a)∗φ(a))k) = φ(a).
By the previous lemma, we conclude that λ 6∈ σ((φ(a)∗φ(a))k). Thus, we have
shown the following inclusion of spectra:
σ((φ(a)∗φ(a))k) ⊆ σ((a∗a)k) ∪ {0}.
Therefore, by the spectral radius formula [1, II.1.6.3] and the generalization of the
C⋆-identity in Lemma 2.4, we must deduce that:
‖φ(a)‖2k = ‖(φ(a)∗φ(a))k‖
= r((φ(a)∗φ(a))k) ≤ r((a∗a)k)
= ‖(a∗a)k‖ = ‖a‖2k,
which implies that ‖φ(a)‖ ≤ ‖a‖ for all a ∈ A, as desired. �
Note that the argument in the previous proof does not work for n = 2k even,
since we would need to employ (a∗a)k−1a which is a product of 2k − 1 = n − 1
elements as needed, but not self-adjoint, in general. Thus, we could not appeal
to the spectral radius formula for self-adjoint elements and Lemma 3.1 would not
apply. Hence, the even and odd n arguments are essentially disjoint.
6 EFTON PARK AND JODY TROUT
4. Nonexistence of Nontrival Involutive n-homomorphisms of
C⋆-algebras
Our first main result is the nonexistence of nontrivial n-homomorphisms on
unital C⋆-algebras for all n ≥ 3. We do the unital case first since it is much simpler
to prove and helps to frame the argument for the nonunital case.
Theorem 4.1. Let φ : A→ B be an involutive n-homomorphism between the C⋆-
algebras A and B, where A is unital. If n ≥ 2 is even, then φ is a ∗-homomorphism.
If n ≥ 3 is odd, then φ is the difference φ(a) = ψ1(a) − ψ2(a) of two orthogonal
∗-homomorphisms ψ1 ⊥ ψ2 : A→ B.
Proof. In either case, by Proposition A.1, the element e = φ(1) ∈ B is an
n-potent (en = e) and is self-adjoint, because
e = φ(1) = φ(1∗) = φ(1)∗ = e∗.
Also, there is an associated algebra homomorphism ψ : A→ B defined for all a ∈ A
by the formula
ψ(a) = en−2φ(a) = φ(a)en−2
such that φ(a) = eψ(a) = ψ(a)e. In either case, ψ is ∗-linear since φ is ∗-linear and
e is self-adjoint and commutes with the range of φ:
ψ(a∗) = en−2φ(a∗) = en−2φ(a)∗ =
en−2φ(a)
= ψ(a)∗.
Now, if n = 2k is even, e = en = (ek)∗ek ≥ 0 and so e = p is a projection. Thus,
φ(a) = pψ(a) = ψ(a)p = pψ(a)p is a ∗-homomorphism. If n ≥ 3 is odd, then by
Lemma A.8, e is the difference of two orthogonal projections e = p1−p2 which must
commute with both ψ and φ by the functional calculus. Define ψ1, ψ2 : A → B
by ψi(a) = piψ(a)pi for all a ∈ A and i = 1, 2. Then ψ2 ⊥ ψ2 are orthogonal
∗-homomorphisms, and
ψ1(a)− ψ2(a) = p1ψ(a)− p2ψ(a) = eψ(a) = φ(a)
for all a ∈ A, from which the desired result follows. �
Corollary 4.2. Let φ : A→ B be a linear map between C⋆-algebras. If A is unital,
the following are equivalent for all integers n ≥ 2:
a.) φ is a ∗-homomorphism.
b.) φ is a positive n-homomorphism.
c.) φ is an involutive n-homomorphism and φ(1) ≥ 0.
Proof. Clearly (a) =⇒ (b) =⇒ (c). If n ≥ 2 is even, then (c) =⇒ (a) by the
previous result. If n ≥ 3 is odd, then by the previous result, we only need to show
that φ is positive. Let n = 2k + 1. Given any a ∈ A, by the Cohen Factorization
Theorem, we can write a = a1 · · · ak. Since φ(1) ≥ 0, by hypothesis, and n = 2k+1,
we compute:
φ(a∗a) = φ(a∗1a) = φ(a∗k · · · a
11a1 · · ·ak)
= φ(ak)
∗ · · ·φ(a1)
∗φ(1)φ(a1) · · ·φ(ak)
φ(a1) · · ·φ(ak)
φ(a1) · · ·φ(ak)
= b∗φ(1)b ≥ 0,
NONEXISTENCE OF NONTRIVIAL n-HOMOMORPHISMS 7
where b = φ(a1) · · ·φ(ak) ∈ B. Thus, φ is positive linear and therefore a ∗-
homomorphism. �
Next, we extend our nonexistence results to the nonunital case, by appealing to
approximate unit arguments (which require continuity!) and the following impor-
tant factorization property of ∗-preserving n-homomorphisms.
Lemma 4.3 (Coherent Factorization Lemma). Let φ : A → B be an involutive
n-homomorphism of C⋆-algebras. For any 1 ≤ k ≤ n and any a ∈ A, if a =
a1 · · ·ak = b1 · · · bk in A, then
φ(a1) · · ·φ(ak) = φ(b1) · · ·φ(bk) ∈ B.
Note that, in general, φ(a) 6= φ(a1) · · ·φ(ak) when 1 < k < n.
Proof. Clearly, we may assume 1 < k < n. Since φ is ∗-linear, the range
φ(A) ⊂ B is a self-adjoint linear subspace of B (but not necessarily a subalgebra,
in general). Given any d = φ(c) ∈ φ(A), using the Cohen Factorization Theorem,
write d = d1 · · · dn = φ(c1) · · ·φ(cn) where di = φ(ci) for 1 ≤ i ≤ n. Consider the
following computation:
φ(a1) · · ·φ(ak)d = φ(a1) · · ·φ(ak)φ(c1) · · ·φ(cn)
= φ(a1 · · · akc1 · · · cn−k)φ(cn−k+1) · · ·φ(cn)
= φ(b1 · · · bkc1 · · · cn−k)φ(cn−k+1) · · ·φ(cn)
= φ(b1) · · ·φ(bk)φ(c1) · · ·φ(cn)
= φ(b1) · · ·φ(bk)d.
Let f = φ(a1) · · ·φ(ak) − φ(b1) · · ·φ(bk). Then fd = 0 for all d ∈ φ(A) ⊂ B, and
thus fd = 0 for all d in the ∗-subalgebra Aφ of B generated by φ(A). In particular,
for the element
da = φ(a
k) · · ·φ(a
1)− φ(b
k) · · ·φ(b
1) = f
∗ ∈ Aφ.
Hence, ff∗ = fda = 0 and so ‖f‖
2 = ‖ff∗‖ = 0 by the C⋆-identity. Therefore,
φ(a1) · · ·φ(ak)− φ(b1) · · ·φ(bk) = f = 0,
and the result is proven. �
Definition 4.4. An approximate unit for a (nonunital) C⋆-algebra A is a net
{eλ}λ∈Λof elements in A indexed by a directed set Λ such that
a.) 0 ≤ eλ and ‖eλ‖ ≤ 1 for all λ ∈ Λ;
b.) eλ ≤ eµ if λ ≤ µ in Λ;
c.) For all a ∈ A,
‖aeλ − a‖ = lim
‖eλa− a‖ = 0.
Every C⋆-algebra has an approximate unit, which is countable (Λ = N) if A is
separable (see Section II.4 of Blackadar [1].)
Theorem 4.5. Suppose φ : A → B is an involutive n-homomorphism of C⋆-
algebras, where A is nonunital. Then, for all a ∈ A, the limit
ψ(a) = lim
φ(eλ)
n−2φ(a) = lim
φ(a)φ(eλ)
8 EFTON PARK AND JODY TROUT
exists, independently of the choice of the approximate unit {eλ} of A, and defines
a ∗-homomorphism ψ : A→ B such that
φ(a) = lim
φ(eλ)ψ(a)
for all a ∈ A.
Proof. We may assume n ≥ 3. Given a ∈ A, use the Cohen Factorization
Theorem to factor a = a1a2 · · · an. Define a map ψ : A→ B by
ψ(a) = φ(a1a2)φ(a3) · · ·φ(an) = φ(a1) · · ·φ(an−2)φ(an−1an),
which is well-defined by the Coherent Factorization Lemma. The continuity of φ
implies that
φ(eλ)
n−2φ(a) = lim
φ(eλ)
n−2φ(a1) · · ·φ(an)
= lim
φ(en−2λ a1a2)φ(a3) · · ·φ(an)
= φ(a1a2)φ(a3) · · ·φ(an) = ψ(a) ∈ B.
It follows that we can write:
ψ(a) = lim
φ(eλ)
n−2φ(a) = lim
φ(a)φ(eλ)
and so ψ : A→ B is linear since φ is linear. Moreover, since φ is ∗-linear, it follows
that ψ is also ∗-linear:
ψ(a)∗ =
φ(a1a2)φ(a3) · · ·φ(an)
= φ(an)
∗ · · ·φ(a3)
φ(a1a2)
= φ(a∗n) · · ·φ(a
3)φ(a
= φ(a∗n1a
n2)φ(a
n−1) · · ·φ(a
= ψ((a∗n1a
n2)(a
n−1) · · · (a
= ψ(a∗n · · ·a
1) = ψ(a
In the computation above, we factored an = an2an1 and set a12 = a1a2 to obtain
the factorization a∗ = a∗n · · ·a
1 = (a
n−1 · · · a
12 into n elements. Given
a, b ∈ A with factorizations a = a1 · · · an and b = b1 · · · bn, the fact that φ is an
n-homomorphism implies:
ψ(a)ψ(b) =
φ(a1a2)φ(a3) · · ·φ(an)
φ(b1b2)φ(b3) · · ·φ(bn)
= φ((a1a2)a3 · · · an(b1b2))φ(b3) · · ·φ(bn)
= φ((ab1)b2)φ(b3) · · ·φ(bn)
= ψ(ab);
NONEXISTENCE OF NONTRIVIAL n-HOMOMORPHISMS 9
note that ab = (ab1)b2b3 · · · bn is a factorization of ab into n elements. A second
proof of multiplicativity goes as follows:
ψ(ab) = lim
φ(eλ)
φ(ab) = lim
φ(eλ)
φ( lim
= lim
φ(eλ)
n−2 lim
φ(aen−2µ b)
= lim
φ(eλ)
n−2 lim
φ(a)φ(eµ)
n−2φ(b)
= lim
φ(eλ)
n−2φ(a) lim
φ(eµ)
n−2φ(b)
= ψ(a)ψ(b).
Thus, ψ is a well-defined ∗-homomorphism. Finally, we compute:
φ(eλ)ψ(a) = lim
φ(eλ)φ(a1a2)φ(a3) · · ·φ(an)
= lim
φ(eλ(a1a2)a3 · · · an) = lim
φ(eλa)
= φ(a).
Using similar factorizations, the fact that {enλ} is also an approximate unit for
A, and the fact that the strict completion of the C⋆-algebra C⋆(φ(A)) generated
by the range φ(A) is the multiplier algebra M(C⋆(ψ(A))), we obtain the nonunital
version of Proposition A.1.
Corollary 4.6. Suppose that A and B are C⋆-algebras with A nonunital, and let
φ : A → B be an involutive n-homomorphism with associated ∗-homomorphism
ψ : A→ B. Then there is a self-adjoint n-potent e = e∗ = en ∈M(C∗(φ(A))) such
that φ(eλ) → e strictly for any approximate unit {eλ} of A, and with the property
φ(a) = eψ(a) = ψ(a)e
ψ(a) = en−2φ(a)
for all a ∈ A.
Proof. By the previous proof, we can define e ∈ M(C⋆(φ(A))) on generators
φ(a) by
eφ(a) = lim
φ(eλ)φ(a) = φ(a1a2 · · · an−1)φ(an) ∈ C
⋆(φ(A))
for any a = a1 · · · an ∈ A. It follows that:
enφ(a) = lim
φ(eλ)
nφ(a)
= lim
φ(enλ)φ(a1)φ(a2) · · ·φ(an)
= lim
φ((enλ)a1a2 · · · an−1)φ(an)
= φ(a1 · · · an−1)φ(an) = eφ(a),
which implies e ∈ M(C⋆(φ(A))) is n-potent. The fact that e = e∗ follows from
φ(eλ)
∗ = φ(e∗λ) = φ(eλ). The other statements follow from the previous proof. �
The dichotomy between the unital and nonunital cases is now clear. If A is unital,
then C⋆(φ(A)) ⊂ B is a unital C⋆-subalgebra of B with unit ψ(1) = φ(1)n−1 ∈ B
(which is a projection!) and so
M(C⋆(ψ(A))) = C⋆(φ(A)) ⊂ B.
10 EFTON PARK AND JODY TROUT
However, for A nonunital, we cannot identify the multiplier algebra M(C⋆(φ(A)))
as a subalgebra of B, or evenM(B), unless φ is surjective. In general, we only have
inclusions ψ(A) ⊂ C⋆(φ(A)) ⊂ B.
Now that we know, as in the unital case, every involutive n-homomorphism is an
n-potent multiple of a ∗-homomorphism, we can prove the following general version
of Theorem 4.1 and its corollary in a similar manner using Lemma A.8.
Theorem 4.7. Let φ : A→ B be an involutive n-homomorphism of C⋆-algebras. If
n ≥ 2 is even, then φ is a ∗-homomorphism. If n ≥ 3 is odd, then φ is the difference
φ(a) = ψ1(a)− ψ2(a) of two orthogonal ∗-homomorphisms ψ1 ⊥ ψ2 : A→ B.
Corollary 4.8. For all n ≥ 2 and C⋆-algebras A and B, φ : A → B is a positive
n-homomorphism if and only if φ is a ∗-homomorphism.
Appendix A. On n-homomorphisms and n-potents
An element x ∈ A is called an n-potent if xn = x. Note that if φ : A→ B is an n-
homomorphism, then φ(x) = φ(xn) = φ(x)n ∈ B is also an n-potent. The following
important result is Proposition 2.2 [7], whose proof is included for completeness.
Proposition A.1. If A is a unital algebra (or ring) and φ : A → B is an n-
homomorphism, then there is a homomorphism ψ : A → B and an n-potent e =
en ∈ B such that φ(a) = eψ(a) = ψ(a)e for all a ∈ A. Also, e commutes with the
range1 of φ, i.e., eφ(a) = φ(a)e for all a ∈ A.
Proof. Note that e = φ(1) = φ(1n) = φ(1)n = en ∈ B is an n-potent. Define a
linear map ψ : A→ B by ψ(a) = en−1φ(a) for all a ∈ A. For all a, b ∈ A,
ψ(ab) = en−2φ(ab) = en−2φ(a1n−2b)
en−2φ(a)
φ(1)n−2φ(b)
= ψ(a)ψ(b),
and so ψ is an algebra homomorphism. Furthermore,
eψ(a) = φ(1)(φ(1)n−2φ(a)) = φ(1)n−1φ(a) = φ(1n−1a) = φ(a).
Similarly, ψ(a)e = φ(a) for all a ∈ A. The final statement is a consequence of the
fact that for all a ∈ A,
eφ(a) = φ(1)φ(a1n−1) =
φ(1)φ(a)φ(1)n−2
φ(1) = φ(1a1n−2)e = φ(a)e.
The following computation will be more significant when we consider the nonuni-
tal case (see the proof of Theorem 4.5.)
Corollary A.2. Let φ and ψ be as in Proposition A.1 and n ≥ 3. Then for all
a ∈ A, if a = a1a2 · · ·an with a1, . . . , an ∈ A,
ψ(a) = φ(a1a2)φ(a3) · · ·φ(an).
1Note that the range φ(A) is not a subalgebra of B in general.
NONEXISTENCE OF NONTRIVIAL n-HOMOMORPHISMS 11
Proof. We compute as follows:
ψ(a) =def e
n−2φ(a) = φ(1)n−2φ(a1 · · · an)
= φ(1)n−2φ(a1) · · ·φ(an)
φ(1)n−2φ(a1)φ(a2)
φ(a3) · · ·φ(an)
= φ(1n−2a1a2)φ(a3) · · ·φ(an)
= φ(a1a2)φ(a3) · · ·φ(an). �
Definition A.3. Let A be a unital algebra. An n-partition of unity is an ordered
n-tuple (e0, e1, . . . , en−1) of idempotents (e
k = ek) that sum to the identity e0 +
e1 + · · · + en−1 = 1 and are pairwise mutually orthogonal, i.e., ejek = δjk1 for all
0 ≤ j, k ≤ n− 1, where δjk is the Kronecker delta.
Note that e0 = 1− (e1+ · · ·+ en−1) is completely determined by e1, e2, . . . , en−1
and is thus redundant in the notation for an n-partition of unity.
Definition A.4. Let ω0 = 0 and ωk = e
2πi(k−1)/(n−1) for 1 ≤ k ≤ n − 1.
Note that ω1 = 1 and ω1, . . . , ωn−1 are the (n − 1)-th roots of unity and Σn =
{ω0, ω1, . . . , ωn−1} are the n roots of the polynomial equation x
n−x = x(xn−1−1) =
If A is a complex algebra, we let Ã denote A, if A is unital, or the unitization
A+ = A⊕ C, if A is nonunital.
Theorem A.5. Let A be a complex algebra. If e ∈ A is an n-potent, there is a
unique n-partition of unity (e0, e1, . . . , en−1) in Ã such that
ωkek.
If A is nonunital, then e1, . . . , en−1 ∈ A.
Proof. Define the n polynomials p0, p1, . . . , pn−1 by
pk(x) =
j 6=k(x− ωj)
j 6=k(ωk − ωj)
In particular, p0(x) = 1− x
n−1. Each polynomial pk has degree n− 1 and satisfies
pk(ωk) = 1 and pk(ωj) = 0 for all j 6= k. It follows that pj(x)pk(x) = 0 for all
x ∈ Σn. We also claim that for all x ∈ C that
pk(x) = p0(x) + · · ·+ pn−1(x) = 1
(3) x =
ωkpk(x).
Indeed, these identities follow from the fact that these polynomial equations have
degree n− 1 but are satisfied by the n distinct points in Σn.
Now, given any xn = x in C it follows that pk(x)
2 = pk(x). Hence, for any n-
potent e ∈ A, if we define ek = pk(e) then (e0, e1, . . . , en−1) consists of idempotents
12 EFTON PARK AND JODY TROUT
e2k = pk(e)
2 = pk(e) = ek and satisfy, by (2),
pk(e) = 1Ã.
They are pairwise orthogonal, because ejek = pj(e)pk(e) = 0 for j 6= k. Moreover,
ωkpk(e) =
by Equation (3). For 1 ≤ k ≤ n− 1, note that pk(x) = xqk(x) for some polynomial
qk(x). Hence, if A is nonunital and 1 ≤ k ≤ n−1, we have ek = pk(e) = eqk(e) ∈ A,
since A is an ideal in Ã. �
The following result is the n-homomorphism version of the previous n-potent
result. Recall say that two linear maps ψi, ψj : A→ B are orthogonal (ψi ⊥ ψj) if
ψi(a)ψj(b) = ψj(b)ψi(a) = 0
for all a, b ∈ A.2
Proposition A.6. Let A and B be complex algebras. If A is unital then a linear
map φ : A → B is an n-homomorphism if and only if there exist n − 1 mutually
orthogonal homomorphisms ψ1, . . . , ψn−1 : A→ B such that for all a ∈ A,
φ(a) =
ωkψk(a).
Proof. (⇒) Let φ : A → B be an n-homomorphism. By Proposition A.1, there
is an n-potent e ∈ B and a homomorphism ψ : A → B such that φ(a) = eψ(a) =
ψ(a)e. Using the previous result, write e =
k=1 ωkek, where (e0, e1, . . . , en−1)
is the associated n-partition of unity in Ã defined by the polynomials pk. Since
ek = pk(e), we have that ekψ(a) = ψ(a)ek for 1 ≤ k ≤ n− 1. Define ψk : A → B
ψk(a) =def ekψ(a) = e
kψ(a) = ekψ(a)ek.
Then ψ1, . . . , ψn−1 are orthogonal homomorphisms and, for all a ∈ A,
φ(a) = eψ(a) =
ωkekψ(a) =
ωkψk(a).
(⇐) Follows from the fact that ωnk = ωk for all k = 1, . . . , n− 1. �
Remark A.7. If A is nonunital, the above result does not hold. One reason is
that the unitization φ+ : A+ → B+ of an n-homomorphism is not, in general, an
n-homomorphism. Also, if An = Bn = {0}, then every linear map L : A → B is
an n-homomorphism (See Examples 2.5 and 4.3 of Hejazian et al [7]).
Let Σn be the n roots of the polynomial equation x = x
n from Definition A.4.
If A is a C⋆-algebra, it follows that a normal n-potent e = en must have spectrum
σ(e) ⊆ Σn. Recall that a projection is an element p = p
∗ = p2 ∈ A. Two projections
p1 and p2 are orthogonal if p1p2 = 0. A tripotent is a 3-potent element e
3 = e ∈ A.
The following characterization of self-adjoint n-potents in C⋆-algebras is impor-
tant for our nonexistence results on n-homomorphisms.
2Note that the zero homomorphism is orthogonal to every homomorphism.
NONEXISTENCE OF NONTRIVIAL n-HOMOMORPHISMS 13
Lemma A.8. Let A be a C⋆-algebra.
a.) If n ≥ 2 is an even integer, the following are equivalent:
i.) e is a projection.
ii.) e is a positive n-potent.
iii.) e is a self-adjoint n-potent.
b.) If n ≥ 3 is an odd integer, the following are equivalent:
i.) e is a self-adjoint tripotent.
ii.) e = p1 − p2 is a difference of two orthogonal projections.
iii.) e is a self-adjoint n-potent.
Proof. In both the even and odd cases, (i) =⇒ (ii) =⇒ (iii) (See Theorem
A.5). Suppose (iii) holds. If n = 2k is even,
e = e∗ = en = e2k = (ek)∗(ek) ≥ 0,
and so the spectrum of e satisfies σ(e) ⊂ Σn∩[0,∞] = {0, 1}. Thus, e is a projection.
If n ≥ 3 is odd, then since e = e∗ we must have σ(e) ⊂ Σn ∩R = {−1, 0, 1}. Thus,
λ = λ3 for all λ ∈ σ(e), which implies e = e3 is tripotent. �
References
[1] B. Blackadar, Theory of C∗-algebras and von Neumann algebras, Encyclopaedia of Math-
ematical Sciences, 122. Operator Algebras and Non-commutative Geometry, III. Springer-
Verlag, Berlin, 2006.
[2] J. Bračič and S. Moslehian, On Automatic Continuity of 3-Homomorphisms on Banach
Algebras, to appear in Bull. Malays. Math. Sci. Soc. arXiv: math.FA/0611287.
[3] P. Cohen, Factorization in group algebras, Duke Math. J. 26 (1959) 199–205.
[4] S. Feigelstock, Rings whose additive endomorphisms are N-multiplicative, Bull. Austral.
Math. Soc. 39 (1989), no. 1, 11–14.
[5] S. Feigelstock, Rings whose additive endomorphisms are n-multiplicative. II, Period. Math.
Hungar. 25 (1992), no. 1, 21–26.
[6] L. Harris, A Generalization of C⋆-algebras, Proc. London Math. Soc. 42 (1981) no. 3, 331–
[7] M. Hejazian, M. Mirzavaziri, and M.S. Moslehian, n-homomorphisms, Bull. Iranian Math.
Soc. 31 (2005), no. 1, 13-23.
[8] I. Raeburn and D. P. Williams, Morita Equivalence and Continuous-Trace C⋆-Algebras,
Mathematical Surveys and Monographs, vol. 60, American Mathematical Society, 1998.
[9] W. Stinespring, Positive functions on C⋆-algebras, Proc. Amer. Math. Soc. 6 (1955), 211–216.
Box 298900, Texas Christian University, Fort Worth, TX 76129
E-mail address: e.park@tcu.edu
6188 Kemeny Hall, Dartmouth College, Hanover, NH 03755
E-mail address: jody.trout@dartmouth.edu
http://arxiv.org/abs/math/0611287
	1. Introduction
	2. Automatic Continuity: The Even Case
	3. Automatic Continuity: The Odd Case
	4. Nonexistence of Nontrival Involutive n-homomorphisms of C-algebras
	Appendix A. On n-homomorphisms and n-potents
	References
ABSTRACT
  An n-homomorphism between algebras is a linear map $\phi : A \to B$ such that
$\phi(a_1 ... a_n) = \phi(a_1)... \phi(a_n)$ for all elements $a_1, >..., a_n
\in A.$ Every homomorphism is an n-homomorphism, for all n >= 2, but the
converse is false, in general. Hejazian et al. [7] ask: Is every *-preserving
n-homomorphism between C*-algebras continuous? We answer their question in the
affirmative, but the even and odd n arguments are surprisingly disjoint. We
then use these results to prove stronger ones: If n >2 is even, then $\phi$ is
just an ordinary *-homomorphism. If n >= 3 is odd, then $\phi$ is a difference
of two orthogonal *-homomorphisms. Thus, there are no nontrivial *-linear
n-homomorphisms between C*-algebras.

<|endoftext|><|startoftext|>
Introduction
Galaxy interactions and mergers are observed at all redshifts and play a key role in
galaxy evolution. Two percent of local galaxies are interacting or merging (Athanassoula &
Bosma 1985; Patton et al. 1997), and this fraction is larger at high redshift (e.g., Abraham
et al. 1996b; Neuschaefer et al. 1997 ; Conselice et al. 2003; Lavery et al. 2004; Straughn
et al. 2006; Lotz et al. 2006, and others). Conselice (2006a) estimates that massive galaxies
have undergone about 4 major mergers by redshift 1. Toomre (1977) described a sequence
of merger activity ranging from separated galaxies with tails and a bridge between them,
to double nuclei in a common envelope with tails, to merged nuclei with tails. Ground-
based (Hibbard & van Gorkom 1996) and space-based (Laine et al. 2003; Smith et al. 2007)
observations of this sequence show optical, infrared, and radio activity in the tails and nuclei.
High resolution images and numerical simulations of nearby interactions demonstrate
how star formation and morphology are affected. General reviews of interaction simulations
are given by Barnes & Hernquist (1992) and Struck (1999). The initial galaxy properties,
such as mass, rotational velocity, gas content and dark matter content, and their initial sep-
arations and velocity vectors, all play a role in generating structure. The viewing angle also
affects the morphology. Early-type galaxies with little gas are expected to display smooth
plumes and shells, while spiral interactions and mergers should exhibit clumpy star forma-
tion along tidal tails, and condensations of material at the tail ends. Equal mass companions
may show bridges between them. A prominent example of a tidal interaction is the Antennae
(NGC4038/9), a merging pair of disk galaxies with rampant star formation in the central
regions, including young globular clusters (Whitmore et al. 2005). Its interaction was first
modeled by Toomre & Toomre (1972). The Cartwheel galaxy is a collisional ring system
rimmed with star formation from a head-on collision (Struck et al. 1996). Sometimes polar-
ring or spindle galaxies are the result of perpendicular collisions (Struck 1999). The Mice
(NGC 4676) has a long narrow straight tail and a curved tidal arm (Vorontsov-Velyaminov
1957; Burbidge & Burbidge 1959); numerical simulations reproduce both features well in a
model with a halo:(disk+bulge) mass ratio of 5 (Barnes 2004). The Superantennae (IRAS
19254-7245) is a pair of infrared-luminous merging giant galaxies having Seyfert and star-
– 3 –
burst nuclei and ∼ 200 kpc tails with a tidal tail dwarf (Mirabel, Lutz, & Maza, 1991). The
Leo Triplet includes NGC 3628 with an 80 kpc stellar tail containing star-forming complexes
with masses up to 106 M⊙ (Chromey et al. 1998). The Tadpole galaxy UGC10214 (Tran et
al. 2003; de Grijs et al. 2003; Jarrett et al. 2006), the IC2163/NGC2207 pair (Elmegreen
et al. 2001, 2006), and Arp 107 (Smith et al. 2005) are all interacting systems observed
with HST and SST and modeled in simulations. Many local mergers have intense nuclear
activity, such as the Seyfert galaxy NGC 5548, which also has an 80 kpc long, low surface
brightness (V=27-28 mag arcsec−2) tidal tail and a 1-arm diffuse spiral (Tyson et al. 1998).
The GEMS (Galaxy Evolution from Morphology and SEDs; Rix et al. 2004), GOODS
(Great Observatories Origins Deep Survey; Giavalisco et al. 2004), and UDF (Ultra Deep
Field; Beckwith et al. 2006) surveys done with the HST ACS (Hubble Space Telescope
Advanced Camera for Surveys) have enabled high resolution studies of the morphology of
intermediate and high redshift galaxies. Light distribution parameters such as the Gini co-
efficient (Lotz et al. 2006) and concentration index, asymmetry, and clumpiness (CAS;
Conselice 2006) have been applied to galaxies in these fields to study possible merger
systems. For GEMS and GOODS, John Caldwell of the GEMS team has posted images
(archive.stsci.edu/prepds/gems/datalist.html) of several galaxies from each field, including
peculiar and interacting systems with tails and bridges. Here we examine the entire GEMS
and GOODS fields systematically for such galaxies and study their tails, bridges, and star-
forming regions. Their properties are useful for understanding interactions and interaction-
triggered star formation, and for probing the relative dark matter content (e.g., Dubinski,
Mihos, & Hernquist 1999).
2. The Sample of Interactions and Mergers
The GOODS and GEMS images from the public archive were used for this study. They
include exposures in 4 filters for GOODS: F435W (B435), F606W (V606), F775W (i775), and
F850LP (z850); and 2 filters (V606 and z850) for GEMS. The public images were drizzled to
produce final archival images with a scale of 0.03 arcsec per px. GEMS, which incorporates
the southern GOODS survey (Chandra Deep Field South, CDF-S) in the central quarter of
its field, covers 28 arcmin x 28 arcmin; there are 63 GEMS and 18 GOODS images that
make up the whole field. The GOODS images have a limiting AB mag of V606= 27.5 for
an extended object, or about two mags fainter than the GEMS images. There are over
25,000 galaxies catalogued in the COMBO-17 survey (Classifying Objects by Medium-Band
Observations, a spectrophotometric 17-filter survey; Wolf et al. 2003), and 8565 that are
cross-correlated with the GEMS survey (Caldwell et al. 2005).
– 4 –
Interacting galaxies with tails, bridges, diffuse plumes and other features were identified
by eye on the online Skywalker images and examined on high resolution V606 fits images.
The lower limit to the length of detectable tails is about 20 pixels. Snapshots of several
different morphologies for interacting galaxies are shown in Figures 1-6. Out of an initial
list of about 300 galaxies, a total of 100 best cases are included in our sample: 14 diffuse
types, 18 antennae types, 22 M51 types, 19 shrimp types, 15 equal mass interactions, and
12 assemblies, as we describe below.
GEMS and GOODS galaxy redshifts were obtained from the COMBO-17 list (Wolf et
al. 2003). Our sample ranges from redshift z = 0.1 to 1.4 in an area of 2.8×106 square arcsec.
The linear diameters of the central objects were determined from their angular diameters
and redshifts using the appropriate conversion for a ΛCDM cosmology (Carroll et al., 1992;
Spergel et al., 2003). The range is ∼ 3 to 33 kpc. Projected tail lengths were measured in
a straight line from the galaxy center to the 2σ noise limit (25.0 mag arcsec−2) in the outer
tail.
Photometry was done on the whole galaxies, on each prominent star-forming clump, and
on the tails using the IRAF task imexam. A box of variable size was defined around each
feature; the outer limits of the boxes were chosen to be where the clump brightness is about
3 times the surrounding region. Sky subtraction was not done because the background
is negligible. The photometric errors are ∼0.1-0.2 mag for individual clumps. The V606
surface brightnesses of the tidal tails were determined using imagej (Rasband 1997) to trace
freehand contours around the tails, so that they could be better defined than with rectangular
or circular apertures.
Figure 1 shows galaxies with diffuse plumes and either no blue star formation patches
or only a few tiny patches (e.g., galaxies number 5 and 6); we refer to these interactions as
diffuse types. The colors of the plumes match the colors of the outer parts of the central
galaxies, indicating the plumes are tidally shorn stars with little gas. There is structure in
most of the plumes consisting of arcs or sharp edges. This is presumably tidal debris from
early type galaxies with little or no gas (e.g. Larson & Tinsley 1978; Malin & Carter 1980;
Schombert et al. 1990). This type of interaction is relatively rare in the GEMS and GOODS
images, perhaps because the tidal debris is faint. The best cases are shown here and they
all have relatively small redshifts compared to the other interaction types (the average z is
0.23 and the maximum z is 0.69).
The image in the top left panel of Figure 1 (galaxy 1) has a giant diffuse clump in the
upper right corner. This could be a condensation in the tidal arm, or it could be another
galaxy. In either case, it has the same color as the rest of the tidal arm nearby. That is,
V606 − z850 = 0.90 ± 0.5 for the clump and also in six places along the tail; the color is
– 5 –
essentially the same, 0.94 ± 0.05, in the core of the galaxy. The absolute magnitude of the
clump is MV = −18.41 for redshift z = 0.15. The mass is ∼ 5 × 10
9 M⊙ (Sect. 3.2). If
this clump is a condensation in the tail, then it could be a rare case where a pure stellar arc
has collapsed gravitationally into a gas-free tidal dwarf. The final result could be a dwarf
elliptical. Usually tidal dwarfs form by gaseous condensations in tidal arms (Wetzstein,
Naab, & Burkert 2007).
Figure 2 shows interactions that resemble the local Antennae pair, so we refer to them as
antennae types. These types have long tidal tails and double nuclei or highly distorted centers
that appear to be mergers of disk galaxies. Note that antennae are not the same as “tadpole”
galaxies (Elmegreen et al. 2005a; de Mello et al. 2006; Straughn et al. 2006), which have one
main clump and a sometimes wiggly tail that may contain smaller clumps. Some antennae
have giant clumps near the ends of the tails which could have formed there (galaxies 16 and
17) and are analogous to the clump at the end of the Superantennae (Mirabel et al. 1991).
Galaxy 18 is in a crowded field with at least two long tidal arms; here we consider only the
tail system in the north, which is in the upper part of the figure. These long-tail systems are
relatively rare and all the best cases are shown in the figure; their average redshift, 0.70, is
typical for GEMS and GOODS fields. Galaxy 24 is somewhat like a tadpole galaxy, but its
very narrow tail and protrusion on the anti-tail side of the main clump are unlike structures
seen in tadpoles of the Ultra Deep Field.
For the antennae galaxies in Figure 2, the tails have an average (V-z) color that is
negligibly bluer, 0.10±0.25 mag, than the central disks. In a study of tidal features in local
Arp atlas galaxies, Schombert et al. (1990) also found that the tail colors are uniform and
similar to those of the outer disks. They noted that the most sharply-defined tails are with
spiral systems and the diffuses plumes are with ellipticals. This correlation may be true
here also, but it is difficult to tell from Figure 1 whether the smooth distorted systems are
intrinsically disk-like.
Galaxy 20 in Figure 2 is an interesting case. It has an elliptical clump at the end of its
tail that could be one of the collision partners. There are two central galaxy cores, however,
and their interaction may have formed the tidal arms without this companion. Furthermore,
the clump at the tip is aligned perpendicular to the tail, which is unusual for a tidal dwarf.
Thus it is possible that the clump was a pre-existing galaxy lying in the orbital plane of one
of the larger galaxies now at the center. Presumably this former host is the galaxy currently
connected to the dwarf by the tidal arm. The interaction could have swung it around to
its current position at the tip. A similar case occurs for the local IC 2163/NGC 2207 pair,
which has a spheroidal dwarf galaxy at the tip of its tidal arm (Elmegreen et al. 2001). Such
swing-around dwarfs should have the same dynamical origin as the large pools of gas and
– 6 –
star formation that are at the tips of superantenna-type galaxies; i.e. the whole outer disk
moves to this position during the interaction (Elmegreen et al. 1993; Duc et al. 1997).
Figure 3 shows examples of interactions that we refer to as M51-type galaxies, where
the tidal arms can be bridges that connect the main disk galaxy to the companion (galaxy
33), or tails on the opposite side of the companion (e.g., galaxies 34 and 35), or both (galaxy
36). In galaxy 44, the tidal arm looks like the debris path of a pre-existing galaxy that lies at
the right; the orbit path apparently curves around on the left. The M51-types usually have
strong spirals in the main disk. In the top row, the tails and bridges are thin and diffuse.
The galaxy on the left in the lower row (galaxy 42) has a thick, fan-shaped tail opposite
the companion. Some bridges have star formation clumps (galaxy 40) and others appear
smooth (galaxy 33). Interactions like this, especially those with small companions, are more
common than the previous two types and only a few best cases are shown in Figure 3 and
discussed in the rest of this paper.
Figure 4 shows examples of galaxies dominated by one highly curved, dominant arm and
large, regularly-spaced clumps of star formation. We call these “shrimp” galaxies because
of their resemblance to the tail of a shrimp. Although their star formation indicates they
contain gas and therefore are disk systems, there are no well-defined spirals (except for the
prominent arm), merging cores, or obvious central nuclei. The clumps resemble the beads-
on-a-string star formation in spiral density waves and probably have the same origin, a
gravitational instability (Elmegreen & Elmegreen 1983; Kim & Ostriker 2006; Bournaud,
Duc, & Masset 2003). The J-shaped morphology is reminiscent of the 90 kpc gas tail of M51
(Rots et al. 1990) and the 48 kpc gas tail observed in NGC 2535 (Kaufman et al. 1997).
Rots el al. point out that the M51 gas tail is much broader (10 kpc) than the narrow tails
seen in merging systems like the Antennae. The broad tail in galaxy 42 (Fig. 3) is similar
to the M51 tail. Sometimes there is a bright tail with no obvious companion (galaxies 56,
57, and 60); one of these, galaxy 56, was in our ring galaxy study (Elmegreen & Elmegreen
2006). Asymmetric, strong arm galaxies like this are not common in GEMS and GOODS;
this figure shows the best cases.
Figure 5 has a selection of irregular galaxies that appear to be interactions. Most of
them suggest an assembly of small pieces, so we refer to them as assembly types. If they were
slightly more round in overall shape, with more obvious interclump emission, then we would
classify them as clump-clusters, as we did in the UDF (Elmegreen et al. 2005a). The galaxy
in the lower left (galaxy 83) is like this. The resemblance of these types to clump-clusters
suggests that some of the clumps are accreted from outside the disk and others form from
gravitational instabilities in a pre-existing gas disk, as suggested previously (Elmegreen &
Elmegreen 2005). The system in the lower right (galaxy 85) could be interacting spirals, or a
– 7 –
triple system, or a bent chain (as studied in Elmegreen & Elmegreen 2006). There are many
examples of highly irregular galaxies like these in the GEMS and GOODS fields; indeed most
galaxies at z > 1.5 are peculiar in this sense (Conselice 2005). In what follows, we discuss
only these 12 galaxies.
Figure 6 has samples of grazing or close interactions, with spirals at the top of the
page (numbers 86-93), ellipticals lower down (numbers 95-97) and two polar-ring galaxies
(numbers 99 and 100) in the lowest row at the middle and right bottom. We refer to these
paired systems as “equals” because their distinguishing feature is that the two galaxies have
comparable size. The pair number 89 has a bright oval in the smaller galaxy, which is char-
acteristic of recent tidal forces for an in-plane, prograde encounter such as IC2163/NGC2207
(Sundin 1993; Elmegreen et al. 1995 ). There is a spiral-elliptical pair on the right in the
middle row (galaxy 94). Double ellipticals in the UDF were studied previously (Elmegreen
et al. 2005a, Coe et al. 2006). Near neighbors like this have been studied previously in the
GEMS field; 6 double systems out of 379 red sequence galaxies were identified as being dry
merger candidates, as reproduced in simulations (Bell et al. 2006). The models of mergers
of early-type systems by Naab et al. (2006) apparently account for kinematic and isophotal
properties of ellipticals better than the formation of ellipticals through late-type mergers
alone. For the pairs in our figure, both components have the same COMBO17 redshift.
There are many other examples of close galaxy groups and near interactions in the GEMS
and GOODS surveys. In what follows we discuss only the properties of those shown in Figure
The interacting types shown in the figures are meant to be as distinct as possible. These
and other good cases are listed in Table 1 by running number, along with their COMBO-
17 catalog number, redshift, and R magnitude. There is occasionally some ambiguity and
overlap in the interaction types, particularly between M51-types and shrimps when the M51-
types have small or uncertain companions at the ends of their prominent tails. Projection
effects can lead to uncertainties in the classifications as well, particularly for antennae whose
tails may be foreshortened. Nevertheless, these divisions serve as a useful attempt to sort out
the most prominent features among interacting galaxies. There are numerous other galaxies
in GEMS and GOODS that are apparently interacting, but most of them are too highly
distorted to indicate the particular physical properties of interest here, namely, disk-to-halo
mass ratio and star formation scale.
– 8 –
3. Photometric Results
3.1. Global galaxy properties
The integrated Johnson restframe (U-B) and (B-V) colors from COMBO-17 for the
observed galaxies with measured redshifts are shown in a color-color diagram in Figure 7.
The crosses in the diagram are Johnson colors for standard Hubble types (Fukugita et al.
1995). Our sample of galaxies spans the range of colors from early to late Hubble types,
although the bluest are bluer than standard irregular galaxies (a typical Im has U-B= −0.35,
B-V= 0.27). The reddest galaxies tend to be the diffuse types, thought to originate with
ellipticals involved in interactions. The two reddest galaxies in our sample are the diffuse
types number 1 and 2 in Figure 1. The bluest tend to be the assemblies, consistent with
their having formed recently.
Figure 8 shows a restframe color-magnitude diagram. Early and late type galaxies
usually separate into a “red sequence” and a “blue cloud” on such a diagram (Baldry et
al. 2004; Faber et al. 2005). The solid line indicates the boundary between these two
regions from a study of 22,000 nearby galaxies (Conselice 2006b). The short-dashed lines
are the limits of the Conselice (2006) survey; local galaxies are brighter than the vertical
short-dashed line and their colors lie between the horizontal short-dashed lines. The long-
dashed lines approximately outline the bright limit for the local blue cloud galaxies. Our
galaxies fall in both the red sequence and the blue cloud. The restframe colors in Figure 8
are consistent with their morphological appearances. The red sequence galaxies in the figure
usually appear smooth (the diffuse types) or lack obvious huge star formation clumps (the
equal mass mergers), while the blue cloud galaxies usually have patches of star formation
(the M51-types, shrimps, assemblies, and many antennae). We see now why the redshifts of
the diffuse galaxies (z < 0.3) are much lower than the others: this is a selection effect for the
ACS camera. These tails comprise old stellar populations without star-forming clumps, and
their intrinsic redness makes them difficult to see at high redshifts. Also, they tend to have
intrinsically low surface brightnesses because of a lack of star formation, and cosmological
dimming makes them too faint to see at high redshift. Hibbard & Vacca (1997) note that it
is difficult to detect tidal arms beyond z ∼ 1.5.
3.2. Clump properties
Prominent star-forming clumps are apparent in many of the interacting galaxies. Their
sizes and magnitudes were measured using rectangular apertures. The observed magnitudes
were converted to restframe B magnitudes whenever possible, using linear interpolations
– 9 –
between the ACS bands. For example, GEMS observations are at two filters, V606 and z850.
GEMS galaxies with redshifts z between 0.39 (= 606/435 − 1) and 0.95 (= 850/435 − 1)
were assumed to have restframe blue luminosities given by LB,rest = LV,obs(0.95− z)/(0.95−
0.39) + Lz,obs(z − 0.39)/(0.95 − 0.39). The restframe B magnitude is then −2.5 logLB,rest.
For GOODS galaxies, the conversions were divided into 3 redshift bins to make use of
the 4 available filters, and a linear interpolation was again applied to get restframe clump
magnitudes. For the GOODS galaxies, the restframe magnitudes determined by interpolation
between the nearest 2 filters among the 4 filters are within ±0.2 mag of the restframe
magnitudes determined from only the V and z filters. Thus, the GEMS interpolations are
accurate to this level. (We do not include corrections for intergalactic absorption in these
colors, because we are comparing them directly with their parent galaxy properties. Below,
when we convert the colors and magnitudes to masses and ages, absorption corrections are
taken into account.)
The apparent restframe B magnitudes of the clumps were converted to absolute rest-
frame B magnitudes using photometric redshifts and the distance modulus for a ΛCDM
cosmology. These absolute clump magnitudes are shown as a function of absolute galaxy
magnitude in Figure 9. The clump absolute B magnitudes scale linearly with the galaxy
magnitudes. The clumps are typically a kpc in size (∼ 3 to 8 pixels across), comparable
to star-forming complexes in local galaxies (Efremov 1995), which also scale with galaxy
magnitude (Elmegreen et al. 1996; Elmegreen & Salzer 1999).
Clump ages and masses were estimated by comparing observed clump colors, magni-
tudes, and redshifts with evolutionary models that account for bandshifting and intergalac-
tic absorption and that assume an exponential star formation rate decay (see Elmegreen &
Elmegreen 2005). Internal dust extinction as a function of redshift is taken from Rowan-
Robinson (2003). The GEMS galaxy clumps only have (V606-z850) colors, so the ages are not
well constrained. For the GOODS galaxies, the additional B and I filters help place better
limits on the ages, although there is still a wide range of possible fits.
Figure 10 shows sample model results for redshift z = 1. The different lines in each
panel correspond to different decay times for the star formation rate, in years: 107, 3× 107,
108, 3×108, and 109, and the sixth line represents a constant rate. Generally the shorter the
decay time, the redder the color and higher the mass for a given duration of star formation.
This correspondence between color and mass gives a degeneracy to plots of mass versus color
at a fixed apparent magnitude (top left) and apparent magnitude versus color at a fixed mass
(top right). Thus the masses of clumps can be derived approximately from their colors and
magnitudes, without needing to know their ages or star formation histories.
Figure 11 shows observations and models in the color-magnitude plane for 6 redshift in-
– 10 –
tervals spanning our galaxies. Each curve represents a wide range of star formation durations
that vary along the curve as in the top right panel of Fig. 10; each curve in a set of curves is
a different decay time. The different sets of curves, shifted vertically in the plots, correspond
to different clump masses, as indicated by the adjacent numbers, which are in M⊙. Each
different point is a different clump; many galaxies have several points. Only clumps with
both V606 and z850 magnitudes above the 2σ noise limit are plotted in Figure 11. The clump
(V606 − z850) colors range from 0 to 1.5. The magnitudes tend to be about constant for each
redshift because of a selection effect (brighter magnitudes are rare and fainter magnitudes
are not observed).
Figure 11 indicates that the masses of the observable clumps are between 106 and 109
M⊙ for all redshifts, with higher masses selected for the higher redshifts. The masses for all
of the clumps are plotted in Figure 12 versus the galaxy type (types 1 through 6 are in order
of Figs. 1 through 6 above). The masses are obtained from the observed values of V606 and
V606 − z850 using the method indicated in Figure 11. The different mass evaluations for the
six decay times are averaged together in the log to give the log of the mass plotted as a dot
in Figure 12. The rms values of log-mass among these six evaluations are shown in Figure
12 as plus-symbols, using the right-hand axes. These rms deviations are less than 0.2, so
the uncertainties in star formation decay times and clump ages do not lead to significant
uncertainties in the clump mass. (Systematic uncertainties involving extinctions, stellar
evolution models, photometric redshifts, and so on, would be larger.)
The clump ages cannot be determined independently from the star formation decay
times with only the few passbands available at high angular resolution. Figure 13 shows
model results that help estimate the clump ages. As in the other figures, each line is a
different exponential decay time for the star formation rate. If we consider the two extreme
decay times in this figure (continuous star formation for the bottom lines in each panel and
107 years for the top lines), then we can estimate the age range for each decay time from
the observed color range. For V606 − z850 colors in the range from 0 to 0.5 at low z (cf. Fig
11), the clump ages range from 107 to 1010 yr with continuous star formation and from 107
to 3 × 108 yr with a decay time of 107 yrs. For colors in the range from 0 to 1.5 at higher
redshifts, the age ranges are about the same in each case. For intermediate decay times, the
typical clump ages are between ∼ 107 years for the bluest clumps and ∼ 109 years for the
reddest clumps. These are reasonable ages for star formation regions, and consistent with
model tail lifetimes.
The star-forming complexes in the GEMS and GOODS interacting galaxies are 10 to
1000 times more massive than the local analogs seen in non-interacting late-type galaxies
(Elmegreen & Salzer 1999), but the low mass end in the present sample is similar to the
– 11 –
high mass end of the complexes measured in local interacting galaxies. For example, the
Tadpole galaxy, UGC 10214, contains 106 M⊙ complexes along the tidal arm (Tran et al.
2003; Jarrett et al. 2006). The interacting galaxy NGC 6872 has tidal tails with 109 M⊙ HI
condensations (Horellou & Koribalski 2007), but the star clusters have masses only up to
106 M⊙ (Bastian et al. 2005). The most massive complexes in the tidal tail of NGC 3628
in the Leo Triplet are also ∼ 106 M⊙ (Chromey et al. 1998). The NGC 6872 clusters differ
qualitatively from those in our sample in being spread out along a narrow arm; ours are big
round clumps spaced somewhat evenly along the arm. Small star clusters are also scattered
along the tidal arms the Tadpole and Mice systems; they typically contain less than 106 M⊙
(de Grijs et al. 2003). The NGC 3628 clusters are also faint with surface brightnesses less
than 27 mag arcsec−2; they would not stand out at high redshift.
It is reasonable to consider whether the observed increase of complex mass with increas-
ing redshift is a selection effect. Our clumps are several pixels in size, corresponding to a
scale of ∼ 1 kpc. Individual clusters are not resolved and we only sample the most massive
conglomerates. These kpc sizes are comparable to the complex sizes in local galaxies, but the
high redshift complexes are much brighter and more massive. They would be observed easily
in local galaxies. The massive complexes in our sample are more similar to those measured
generally in UDF galaxies (Elmegreen & Elmegreen 2005).
Clump separations were measured for clumps along the long arms in the shrimp galaxies
of Figure 4. They average 2.20±0.94 kpc for 49 separations. This is about the same separa-
tion as that for the largest complexes in the spiral arms of local spiral galaxies (Elmegreen
& Elmegreen 1983, 1987), and comparable to the spacing between groups of dust-feathers
studied by La Vigne et al. (2006). Yet the clumps in shrimp galaxies and others studied here
are much more massive than the complexes in local spiral arms, which are typically < 106
M⊙ in stars and ∼ 10
7 M⊙ in gas. This elevated mass can be explained by a heightened
turbulent speed for the gas, combined with an elevated gas density. Considering that the
separation is about equal to the two-dimensional Jeans length, λ ∼ 2a2/ (GΣ) for velocity
dispersion a and mass column density Σ, and that the mass is the Jeans mass, λ2Σ, the
mass scales with the square of the velocity dispersion, M = M0 (a/a0)
for fixed length
λ0 = 2a
0/ (GΣ0) and M0 = λ
0Σ0. The mass column density also scales with the square of
the dispersion, Σ = Σ0 (a/a0)
to keep λ constant. Thus the interacting tidal arm clumps
are massive because the velocity dispersions and column densities are high. Another way
to derive this is to note that for regular spiral arm instabilities, 2Gµ/a2 is about unity at
the instability threshold, where µ is the mass/length along the arm (Elmegreen 1994). Thus
cloud mass scales with a2 for constant cloud separation. High velocity dispersions for neutral
hydrogen, ∼ 50 km s−1, are also observed in local interacting galaxies (Elmegreen et al. 1993;
Irwin 1994; Elmegreen et al. 1995; Kaufman et al. 1997; Kaufman et al. 1999; Kaufman
– 12 –
et al. 2002). Presumably the interaction agitates the interstellar medium to make the large
velocity dispersions. The orbital motions are forced to be non-circular and then the gaseous
orbits cross, converting orbital energy into turbulent energy and shocks. Similar evidence
for high velocity dispersions was found in the masses and spacings of star forming complexes
in clump cluster galaxies (Elmegreen & Elmegreen 2005) and in spectral line widths (Genzel
et al. 2006; Weiner et al. 2006).
3.3. Tail Properties
Figure 14 shows the average tail surface brightness as a function of (1+ z)4 for galaxies
in Figures 1-4. Some systems have more than one tail. Cosmological dimming causes a fixed
surface brightness to get fainter as (1+z)−4, so there should be an inverse correlation in this
diagram. Clearly, the tails are brighter for the more nearby galaxies, and they decrease out
to z ∼ 1, where they are fairly constant. This constant limit is at the 2σ detection limit of
25 mag arcsec−2. Antennae galaxies with average tail surface brightnesses fainter than this
limit have patchy tails with no apparent emission between the patches. Only the brightest
high redshift tails can be observed in this survey.
Simulations by Mihos (1995) suggested that tidal tails are observable for a brief time in
the early stages of a merger, corresponding to ∼ 150 Myr at a redshift z = 1 and 350 Myr
at z = 0.4. The difference is the result of surface brightness dimming as tails disperse. A
nearby galaxy merger, Arp 299, has a 180 kpc long tail encompassing 2 to 4% of the total
galaxy luminosity, with an interaction age of 750 Myr, but its low surface brightness of 28.5
mag arcsec−2 (Hibbard & Yun 1999) would be below the GOODS/GEMS detection limit.
The ratio of the luminosity of the combined tails and bridges to the luminosity of the
disk (the luminosity fraction) is shown in Figure 15. The luminosity fraction in the tidal
debris ranges from 10% to 80%, averaging about 30% regardless of redshift. This range is
consistent with that of local galaxies in the Arp atlas and Toomre sequence (e.g., Schombert,
Wallin, & Struck-Marcell 1990; Hibbard & van Gorkom 1996).
Interaction models with curled tails, as in our shrimp galaxies, were made by Bournaud
et al. (2003). Their models had dark matter halos with masses ∼ 10 times the disk mass
and extents less than 12 disk scale lengths. Some of our shrimp galaxies have one prominent
curved arm that is pulled out from the main disk but not very far, resulting in a lopsided
galaxy. Simulations indicate that such lopsidedness may be the result of a recent minor
merger (Bournaud et al. 2005). In some of our cases, a nearby companion is obvious.
The linear sizes of the tidal tails in our sample are shown in Figure 16. They range from
– 13 –
2 to 60 kpc, and are typically a few times the disk diameter, as shown in Figure 17, which
plots this ratio versus redshift. The average tail to diameter ratio is 2.9±1.7 for diffuse tails,
2.5± 1.3 for antennae, 2.5± 1.1 for M51-types and 1.5± 1.4 for shrimps, so the shrimps are
about 60% as extended as the antennae types. There is no apparent dependence of these
ratios on redshift in Figure 17. Projection effects make these apparent ratios smaller than
the intrinsic ratios.
For comparison, the ratio of tail length to disk diameter versus the tail length for local
galaxies is shown in Figure 18 based on measurements of antennae-type systems in the Arp
atlas (1966) and the Vorontsov-Velyaminov atlas (1959). Our galaxies are also shown. The
average tail length for the local galaxies in this figure is 72 ± 48 kpc, while the average tail
length for the GEMS and GOODS antennae is 37% as much, 27±16 kpc. The diameters for
these two groups are 20±12 kpc and 11±5 kpc, and the ratios of tail length to diameter are
4.5± 3.7 and 2.5± 1.3, respectively. Thus the local antennae mergers are larger in diameter
by a factor of 2 than the GEMS and GOODS antennae, and the tails for the locals are larger
by a factor of 2.7. These results for the diameters are consistent with other indicators that
galaxies are smaller at higher redshift, although usually this change does not show up until
z > 1 (see observations and literature review in Elmegreen et al. 2007).
3.4. Tidal dwarf galaxy candidates
Three antennae galaxies at the top of Figure 2, numbers 15, 16, and 18, have long
straight tidal arms with large star-forming regions at the ends. These clumps are possibly
tidal dwarf galaxies. The clump diameters and restframe B magnitudes are listed in Table
2, along with the clump in diffuse galaxy number 1 discussed in Sect. 2. Listed are their
V606 and V606 − z850 magnitudes and associated masses, calculated as in Sect. 3.2. The
masses range from 0.2× 108 to 4.6× 108 M⊙ for the star-forming dwarfs, but for the stellar
condensation in the diffuse-tail galaxy 1 (Fig. 1), the mass is 50×108 M⊙. The star-forming
dwarf masses are similar to or larger than those found for the tidal object at the end of the
Superantennae (Mirabel et al. 1991) as well as the tidal object at the end of the tidal arm in
the IC 2163/NGC 2207 interaction (Elmegreen et al. 2001) and at the end of the Antennae
tail (Mirabel et al. 1992). The HI dynamical masses for these local tidal dwarfs are ∼ 109
Simulations of interacting galaxies that form tidal dwarf galaxies require long tails and
a dark matter halo that extends a factor of 10 beyond the optical disk (Bournaud et al.
2003). If one or both galaxies contain an extended gas disk before the interaction, then more
massive, 109 M⊙ stellar objects can form at the tips of the tidal arms from the accumulated
– 14 –
pool of outer disk material (Elmegreen et al. 1993; Bournaud et al. 2003). Observations of
nearby interactions show clumpy regions of tidal condensations with masses of ∼ 108 − 109
M⊙ (Bournaud et al. 2004; Weilbacher et al. 2002, 2003; Knierman et al. 2003; Iglesias-
Paramo & Vilchez 2001), like what is observed in our high redshift tidal dwarfs.
No well-resolved models have yet formed tidal dwarfs from stellar debris. Wetzstein,
Naab, & Burkert (2007) considered this possibility and found collapsing gas more likely. Yet
the condensed object in the tail of galaxy 1 could have formed there and it is interesting
to consider whether the Jeans mass in such an environment is comparable to the observed
mass. If, for example, the tidal arm surface density corresponds to a value typical for the
outer parts of disks, ∼ 10 M⊙ pc
−2, and the stellar velocity dispersion is comparable to that
required in Sect. 3.2 for the gas to give the giant star forming regions, ∼ 40 km s−1, then the
Jeans mass is M ∼ a4/ (G2Σ) ∼ 1010 M⊙. This is not far from the value we observe, 5× 10
M⊙, so the diffuse clump could have formed by self-gravitational collapse of tidal tail stars.
The timescale for the collapse would be a/ (πGΣ) ∼ 300 Myr, which is not unreasonable
considering that the orbit time at this galactocentric radius is at least this large.
4. Dark Matter Halo Constraints
Models of interacting galaxies have been used to place constraints on dark halo poten-
tials. Springel & White (1999) and Dubinski, Mihos, & Hernquist (1999) found that tidal
tail lengths can be long compared to the disk if the ratio of escape speed to rotation speed
at 2 disk scale lengths is small, ve/Vr < 2.5, and the rotation curve is falling in the outer
disk. In a series of models, Dubinski et al. showed that this condition may result from either
disk-dominated rotation curves where the halo is extended and has a low concentration, or
halo-dominated rotation curves where the halo is compact and low mass. Dubinski et al.
point out that the latter possibility is inconsistent with observed flat or rising disk rotation
curves, but the first is compatible if the disk is massive and dominant in the inner regions.
The first case also gives prominent bridges. In addition, Springel & White (1999) found that
CDM halo models with embedded disks allow long tidal tails, but Dubinski et al. noted that
most of those which do are essentially low surface brightness disks in massive halos, and not
normal bright galaxies. Galaxies without dark matter halos are not capable of generating
long tidal tails (Barnes 1988). In all cases, longer tails develop in prograde interactions.
The smooth diffuse types and antenna types in Figures 1 and 2 have relatively long tails,
so the progenitors were presumably disks of early and late types, respectively, with falling
rotation curves in their outer parts. These long-tail cases are relatively rare, comprising
only about 8% and 9%, respectively, of our original (300 galaxy) interacting sample from
– 15 –
GEMS and GOODS. The more compact M51 types and shrimps represent 9% and 12% of
the sample. Short tail interactions could be younger, less favorably projected, or have a
more steeply rising rotation curve than long tail interactions. The M51 types have clear
companions, so the prominent features are bridges. According to Dubinski et al. (1999),
bridging requires a prograde interaction with a maximum-disk galaxy, that is, one with a
low-mass, extended halo.
5. Conclusions
Mergers and interactions out to redshift z = 1.4 have tails, bridges, and plumes that are
analogous to features in local interacting galaxies. Some interactions have only smooth and
red features, indicative of gas-free progenitors, while others have giant blue star-formation
clumps. The tail luminosity fraction has a wide range, comparable to that found locally.
A striking difference arises regarding the tail lengths, however. The tails in our antenna
sample, at an average redshift of 0.7, are only one-third as long as the tails in local antenna
mergers, and the disk diameters are about half the local merger diameters. This difference is
consistent with the observations that high redshift galaxies are smaller than local galaxies,
although such a drop in size has not yet been seen for galaxies at redshifts this low. The
implication is that dark matter halos have not built up to their full sizes for typical galaxies
in GEMS and GOODS.
Star formation is strongly triggered by the interactions observed here, as it is locally.
The star-forming clumps tend to be much more massive than their local analogs, however,
with masses between ∼ 106 M⊙ and a few ×10
8 M⊙, increasing with redshift. This is not
merely a selection effect, since the massive clumps seen at high redshift would show up at
lower redshift, although of course smaller clumps would not be resolved at high redshift. The
clump spacings were measured along the tidal arms of the most prominent one-arm type of
interaction, the shrimp-type, and found to be 2.20 ± 0.94, which is typical for the spacing
between beads on a string of star formation in local spiral arms. If both types of arms form
clumps by gravitational instabilities, then the turbulent speed of the interstellar medium in
the GEMS and GOODS sample has to be larger than it is locally by a factor of ∼ 5 or more;
the gas mass column density has to be larger by this factor squared.
Some interactions have tidal dwarf galaxies at the ends of their tidal arms, similar to
those found in the Superantennae galaxy and other local mergers. One diffuse interaction
with red stellar tidal debris has a large stellar clump that may have formed by gravitational
collapse in a stellar tidal arm; the clump mass is 5 × 109 M⊙. Long-arm interactions are
relatively rare, comprising only ∼ 17% of our total sample of ∼ 300 interacting systems (only
– 16 –
a fraction of which were discussed here). For those with long arms, numerical models suggest
the dark matter halos must be extended, so that the rotation curves are falling in the outer
disks. Most interactions are not like this, however, so the rotation curves are probably still
rising in their outer disks, like most galaxies locally.
We gratefully acknowledge summer student support for B.M. and T.F. through an REU
grant for the Keck Northeast Astronomy Consortium from the National Science Foundation
(AST-0353997) and from the Vassar URSI (Undergraduate Research Summer Institute) pro-
gram. D.M.E. thanks Vassar for publication support through a Research Grant. We thank
the referee for useful comments. This research has made use of the NASA/IPAC Extragalac-
tic Database (NED) which is operated by the Jet Propulsion Laboratory, California Institute
of Technology, under contract with the National Aeronautics and Space Administration.
REFERENCES
Abraham, R., Tanvir, N., Santiago, B., Ellis, R., Glazebrook, K., & van den Bergh, S. 1996,
MNRAS, 279, L47
Arp, H.J. 1966, Atlas of Peculiar Galaxies (Pasadena: CalTech)
Athanassoula, E., & Bosma, A. 1985, ARAA, 23, 147
Baldry, I. K. et al. 2004, ApJ, 600, 681
Barnes, J.E. 1988, ApJ, 331, 699
Barnes, J.E. 2004, MNRAS, 350, 798
Barnes, J., & Hernquist, L. 1992, ARAA, 30, 705
Bastian, N., Hempel, M., Kissler-Patig, M., Homeier, N., & Trancho, G. 2005, A&A, 435,
Beckwith, S.V.W., et al. 2006, AJ, 132, 1729
Bell, E.F., et al. 2006, ApJ, 640, 241
Bournaud, F., Combes, F., Jog, C., & Puerari, I. 2005, A&A, 438, 507
Bournard, F., Duc, P.-A., Amram, P., Combes, F., & Gach, J.-L. 2004, A&A, 425, 813
Bournaud, F., Duc., P.A.-, & Masset, F. 2003, A&A, 411, L469
– 17 –
Burbidge, E.M., & Burbidge, G.R. 1959, ApJ, 130, 23
Caldwell, J. et al., 2005, astro-ph/0510782
Carroll, S. M., Press, W. H., & Turner, E. L. 1992, ARAA, 30, 499
Chromey, F. R., Elmegreen, D. M., Mandell, A., & McDermott, J. 1998, AJ, 115, 2331
Conselice, C.J. 2005, in Multiwavelength mapping of galaxy formation and evolution, ESO
Workshop, ed. A. Renzini & R. Bender (Berlin: Springer), 163
Conselice, C.J. 2006b, MNRAS, 373, 1389
Conselice, C.J. 2006a, ApJ, 638, 686
Conselice, C., Bershady, M., Dickinson, M., & Papovich, C. 2003, AJ, 126, 1183
Coe, D., Benitez, N., Sanchez, S., Jee, M., Bouwens, R., & Ford, H. 2006, AJ, 132, 926
de Grijs, R., Lee, J., Mora Herrera, M., Fritze-v. Alvensleben, U., & Anders, P. 2003, NewA,
8, 155
de Mello, D., Wadadekar, Y., Dahlen, T., Casertano, S., & Gardner, J.P. 2006, AJ, 131, 216
Dubinski, J., Mihos, J., & Hernquist, L. 1999, ApJ, 526, 607
Duc, P.-A., et al., 1997, A&A, 326, 537
Efremov, Y.N. 1995, AJ, 110, 2757
Elmegreen, B.G. 1994, ApJ, 433, 39
Elmegreen, B.G., & Elmegreen, D.M. 1983, MNRAS, 203, 31
Elmegreen, B.G., & Elmegreen, D.M. 1987, ApJ, 320, 182
Elmegreen, B.G., & Elmegreen, D.M. 2005, ApJ, 627, 632
Elmegreen, B., Elmegreen, D., Salzer, J., & Mann, H. 1996, ApJ, 467, 579
Elmegreen, B., Kaufman, M., & Thomasson, M. 1993, ApJ, 412, 90
Elmegreen, D.M., & Elmegreen, B.G. 2006, ApJ, 651, 676
Elmegreen, D.M., Kaufman, M., Elmegreen, B.G., Brinks, E., Struck, C., Klaric, M., &
Thomasson, M. 2001, AJ, 121, 182
http://arxiv.org/abs/astro-ph/0510782
– 18 –
Elmegreen, D.M., Elmegreen, B.G., Ravindranath, S., & Coe, D. 2007, astroph/0701121
Elmegreen, D.M., Elmegreen, B.G., Rubin, D.S., & Schaffer, M.A. 2005, ApJ, 631, 85
Elmegreen, D., Kaufman, M., Brinks, E., Elmegreen, B., & Sundin, M. 1995, ApJ, 453, 100
Elmegreen, D., & Salzer, J. 1999, AJ, 117, 764
Faber, S. et al. 2005, astro-ph/0506044
Fukugita, M., Shimasaku, K., & Ichikawa, T. 1995, PASP, 107, 945
Genzel, R. et al. 2006, Nature, 442, 786
Giavalisco, M., et al. 2004, ApJ, 600, L103
Hibbard, J.E., & Vacca, W. D. 1997, AJ, 114, 1741
Hibbard, J., & van Gorkom, J. 1996, AJ, 111, 655
Hibbard, J.E., & Yun, M.S. 1999, AJ, 118, 162
Horellou, C., & Koribalski, B. 2007, A&A, 464, 155
Iglesias-Paramo, J., & Vilchez, J.M. 2001, ApJ, 550, 204
Irwin, J.A. 1994, ApJ, 429, 618
Jarrett, T.H., et al. 2006, AJ, 131, 261
Kaufman, M., Brinks, E., Elmegreen, D., Thomasson, M., Elmegreen, B., Struck, C., &
Klaric, M. 1997, AJ, 114, 2323
Kaufman, M., Brinks, E., Elmegreen, B.G., Elmegreen, D.M., Klaric, M., Struck, C.,
Thomasson, M., & Vogel, S., 1999, AJ, 118, 1577
Kaufman, M., Sheth, K., Struck, C., Elmegreen, B G., Thomasson, M., Elmegreen, D.M.,
Brinks, E. 2002, AJ, 123, 702
Kim, W.-T. & Ostriker, E.C. 2006, ApJ, 646, 213
Knierman, K., et al. 2003, AJ, 126, 1227
Laine, S., van der Marel, R., Rossa, J., Hibbard, J., Mihos, J., Boker, T., & Zabludoff, A.
2003, AJ, 126, 2717
http://arxiv.org/abs/astro-ph/0506044
– 19 –
Larson, R. B., & Tinsley, B. M. 1978, ApJ, 219, 46
Lavery, R., Remijan, A., Charmandaris, V., Hayes, R., & Ring, A. 2004, ApJ, 612, 679
La Vigne, M.A., Vogel, S.N., & Ostriker, E.C. 2006, ApJ, 650, 818
Lotz, J. M., Madau, P., Giavalisco, M., Primack, J., & Ferguson, H. 2006, ApJ, 636, 592
Malin, D.F., & Carter, D. 1980, Nature, 285, 643
Mihos, C. 1995, ApJ, 438, L75
Mirabel, I., Dottori, H., & Lutz, D. 1992, A&A, 256, L19
Mirabel, I., Lutz, D., & Maza, J. 1991, A&A, 243, 367
Naab, T., Khochfar, S., & Burkert, A. 2006, ApJ, 636, L81
Neuschaefer, L., Im, M., Ratnatunga, U., Griffiths, R., & Casertano, S. 1997, ApJ, 480, 59
Patton, D., Pritchet, C., Yee, H., Ellingson, E., & Carlberg, R. 1997, ApJ, 475, 29
Rasband, W.S., 1997, ImageJ, U.S. National Institutes of Health, Bethesda, MD,
http://rsb.info.nih.gov/ij/
Rix, H.W., et al. 2004, ApJS, 152, 163
Rots, A., Bosma, A., van der Hulst, J., Athanassoula, E., & Crane, P. 1990, AJ, 100, 387
Rowan-Robinson, M. 2003, MNRAS, 345, 819
Schombert, J., Wallin, J., & Struck-Marcell, C. 1990, AJ, 99, 497
Smith, B.J., Struck, C., Appleton, P.N., Charmandaris, V., Reach, W., & Eitter, J.J. 2005,
AJ, 130, 2117
Smith, B.J., Struck, C., Hancock, M., Appleton, P.N., Charmandaris, V., & Reach, W.T.
2007, AJ, 133, 791
Spergel, D. N., et al. 2003, ApJS, 148, 175
Springel, V. & White, S. 1999, MNRAS, 307, 162
Sundin, M. 1993, Ph.D. thesis, Chalmers Univ. of Technology
http://rsb.info.nih.gov/ij/
– 20 –
Straughn, A. N., Cohen, S. H., Ryan, R. E., Hathi, N. P., Windhorst, R. A., & Jansen, R.
A. 2006, ApJ, 639, 724
Struck, C. 1999, Physics Reports, 321, 1
Struck, C., Appleton, P., Borne, K., & Lucas, R. 1996, AJ, 112, 1868
Toomre, A. 1977, in “The Evolution of Galaxies and Stellar Populations,” eds. W. Becker
and G. Contopoulos (Reidel: Dordrecht), 401
Toomre, A., & Toomre, J. 1972, ApJ, 431, L9
Tran, H.D., et al. 2003, ApJ, 585, 750
Trujillo, I., et al. 2006, MNRAS, 373, L36
Tyson, J. A., Fischer, P., Guhathakurta, P., McIlroy, P., Wenk, R., Huchra, J., Macri, L.,
Neuschaefer, L., Sarajedini, V., Glazebrook, K., Ratnatunga, K., & Griffiths, R. 1998,
AJ, 116, 102
Vorontsov-Velyaminov, B.A. 1957, Astr. Circ., USSR, 178, 19
Vorontsov-Velyaminov, B.A. 1959, Atlas and Catalog of Interacting Galaxies, (Moscow:
Sternberg Institute)
Weilbacher, P., Duc, P.-A., & Fritze-v. Alvensleben, U. 2003, A&A, 397, 545
Weilbacher, P., Fritze-v. Alvensleben, U., Duc, P.-A., & Fricke, K.J. 2002, ApJ, 579, L79
Weiner, B.J, Willmer, C. N. A., Faber, S. M., Melbourne, J., Kassin, S.A., Phillips, A.C.,
Harker, J., Metevier, A. J., Vogt, N. P., & Koo, D. C. 2006, ApJ, 653, 1027
Wetzstein, M., Naab, T., & Burkert, A. 2007, MNRAS, 375, 805
Whitmore, B., et al. 2005, AJ, 130, 2104
Wolf, C., Meisenheimer, K., Rix, H.-W., Borch, A., Dye, S., & Kleinheinrich, M. 2003, A&A,
401, 73
This preprint was prepared with the AAS LATEX macros v5.2.
– 21 –
Table 1. Interacting Galaxies in GEMS and GOODS
Type, Figure Number COMBO 17 z R mag.
Diffuse (Fig. 1) 1 6423 0.15 16.572
2 12639 0.154 16.678
3 11538 0.134 17.713
4 53129 0.171 16.968
5 57881 0.118 17.552
6 28509 0.093 18.79
7 17207 0.69 19.742
8 30824 0.341 19.755
9 25874 0.262 19.757
Diffuse (other) 10 22588 0.684 21.263
11 21990 0.429 21.243
12 46898 0.617 20.794
13 49709 0.302 20.23
14 15233 0.304 18.857
Antennae (Fig. 2) 15 61546 0.552 20.41
16 45115 0.579 21.275
17 20280 0.555 21.653
18 41907 0.702 22.66
19 35611 1.256 22.655
20 10548 0.698 22.43
21 33650 0.169 18.86
22 42890 0.421 20.68
23 49860 1.169 23.632
24 34926 0.779 -19.69
Antennae (other) 25 14829 0.219 21.429
26 18588 0.814 22.748
27 46738 1.204 20.65
28 7551 1.162 25.926
29 20034 1.326 21.932
30 33267 0.067 23.112
31 38651 0.988 23.89
32 55495 1.00 24.261
M51-type (Fig. 3) 33 5640 0.204 19.477
34 9415 0.523 21.16
35 40901 0.193 19.751
36 17522 0.82 23.103
37 6209 1.187 22.723
38 23667 1.151 23.514
39 37293 0.274 20.533
40 39805 0.557 20.089
41 53243 0.698 21.683
42 15599 0.56 21.381
43 25783 0.663 20.732
44 39228 0.117 18.031
M51-type (other) 45 1984 0.762 22.855
– 22 –
Table 1—Continued
Type, Figure Number COMBO 17 z R mag.
46 2760 1.281 23.202
47 15040 0.667 22.392
48 18502 0.228 21.942
49 14959 0.306 19.581
50 16023 0.668 21.887
51 30226 0.509 22.689
52 40744 0.292 21.119
53 45102 0.857 22.514
54 60582 0.946 22.54
Shrimp (Fig. 4) 55 40198 0.201 20.55
56 14373 0.795 23.183
57 12222 1.004 22.417
58 28344 0.257 19.509
59 56284 0.657 21.667
60 2385 0.283 21.334
61 54335 0.892 22.824
62 28841 0.673 20.971
63 6955 0.983 22.24
Shrimp (other) 64 34244 0.999 22.504
65 48298 0.429 21.663
66 37809 0.357 20.667
67 25316 0.985 23.717
68 49595 0.663 21.939
69 59467 0.487 21.568
70 9062 0.854 23.82
71 30076 0.832 22.672
72 2760 1.281 23.202
73 54335 0.892 22.824
Assembly (Fig. 5) 74 28751 0.093 23.506
75 4728 0.702 22.799
76 23187 1.183 23.565
77 45309 1.061 22.916
78 41835 0.098 19.134
79 61945 1.309 21.813
80 62605 1.011 23.143
81 44956 0.506 22.494
82 4546 0.809 22.163
83 23000 0.132 22.951
84 63112 0.499 22.273
85 43975 1.059 22.878
Equal (Fig. 6) 86 40813 0.182 19.983
87 8496 0.354 22.415
88 13836 0.661 21.054
89 11164 0.464 19.351
90 39877 0.493 22.142
– 23 –
Table 1—Continued
Type, Figure Number COMBO 17 z R mag.
91 40598 0.263 20.128
92 51021 0.743 20.96
93 35317 0.671 20.755
94 56256 0.502 20.309
95 47568 0.649 20.206
96 40766 0.46 19.997
97 24927 0.524 19.647
98 15233 0.304 18.857
99 18663 1.048 24.011
100 43242 0.657 21.177
– 24 –
Table 2: Tidal Dwarf Galaxy Candidates
Galaxy z Galaxy Diam. Dwarf V606 V606 − z850 Clump Mass
(COMBO17 #) MB,rest (mag) (kpc) MB,rest (mag) mag mag x10
1 (6423) 0.15 -20.64 13.9 -17.55 20.83 0.90 50
15 (61546) 0.552 -20.77 5.5 -16.67 26.12 0.78 1.2
16 (45115) 0.579 -20.17 4.7 -17.72 25.42 1.1 4.6
18 (41907) 0.702 -19.21 1.9 -16.03 27.45 0.55 0.24
17 (20280) 0.555 -19.56 6.2 -17.06 25.77 0.73 1.4
– 25 –
Fig. 1.— Color images of galaxies in the GEMS and GOODS fields with smooth diffuse tidal
debris. The galaxy at the top right, number 3 in Table 1, is only partially covered by the
GEMS field; the right-hand portion of the image is from ground-based observations. The
smooth debris is presumably from old stars that were spread out during the interaction. A
few small star-formation patches are evident in some cases. The clump in the upper right
corner of the galaxy 1 image could be a rare example of a gravitationally driven condensation
in a pure-stellar arm. The smooth arcs and spirals in this and other images are probably
a combination of orbital debris and flung-out tidal tails. The galaxy numbers, as listed in
Table 1, are 1 through 9, as plotted from left to right and top to bottom. (Image quality
degraded for astroph.)
– 26 –
Fig. 2.— Color images of interacting antennae galaxies with long and structured tidal arms.
Galaxy numbers, in order, are 15 through 24. Several have dwarf galaxy-like condensations
at the arm tips or broad condensations midway out in the arms. The dwarf elliptical at the
tip of the tidal arm in galaxy 20 might have existed before the interaction and been placed
there by tidal forces; the main body of this system has a double nucleus from the main
interaction. (Image quality degraded for astroph.)
– 27 –
Fig. 3.— M51-type galaxies are shown as logarithmic grayscale V-band images. In order,
the galaxy numbers are 33 through 44. The linear streak in galaxy 44 could be orbital debris
from the small companion on the right. (Image quality degraded for astroph.)
– 28 –
Fig. 4.— Shrimp galaxies, named because of their curved tails, are shown as logarithmic
V-band images. In order, they are numbers 55 through 63. (Image quality degraded for
astroph.)
– 29 –
Fig. 5.— Assembly galaxies look like they are being assembled through mergers. In order:
galaxy 74 through 85.
– 30 –
Fig. 6.— Galaxies with approximately equal-mass grazing companions, in order, are 86
through 100.
– 31 –
0 0.2 0.4 0.6 0.8 1
Restframe Johnson B–V
Equal
Assembly
Shrimp
M51 type
Antenna
Diffuse
Fig. 7.— Restframe (U-B) and (B-V) integrated colors for interacting galaxies in the GEMS
and GOODS fields, from COMBO-17. The reddest tend to be the diffuse types, which
are presumably dry mergers, and the bluest are the assembly types, which could be young
proto-galaxies. Crosses indicate standard Hubble types, measured by Fukugita et al. (1995).
– 32 –
–24 –22 –20 –18 –16 –14 –12
MB (mag)
Diffuse
Antenna
M51 type
Shrimp
Assembly
Equal
Fig. 8.— Restframe Johnson U-B integrated color versus absolute restframe MB, from
COMBO-17. The solid line separates the red sequence and blue cloud (Conselice 2006b).
Color limits for local galaxies are indicated by the horizontal short-dashed lines; local galaxies
are brighter than the vertical line. The local blue cloud galaxies are approximately delimited
on the left side of the diagram by the long-dashed lines. Thus, most of our observed galaxies
fall near the local galaxy colors and magnitudes.
– 33 –
–17 –18 –19 –20 –21 –22
Galaxy MB
Diffuse
Antenna
M51 type
Shrimp
Assembly
Tidal Dwarf
Fig. 9.— Restframe B absolute magnitudes of star-forming clumps versus integrated galaxy
restframe magnitudes. The correlation is also found for local galaxies.
– 34 –
0.01 0.1 1 10
Duration of SF (Gyr)
0.01 0.1 1 10
Duration of SF (Gyr)
–1 0 1 2 3
V606–z850
–1 0 1 2 3
V606–z850
Fig. 10.— Models at z = 1 for clump color (bottom left) and clump mass at an apparent
V606 magnitude of 27 (lower right) are shown in the bottom panels versus the duration of star
formation in 6 models with exponentially decaying star formation. Five lines are for decay
times of 107, 3×107, 108, 3×108, and 109 years, and the sixth line represents a constant rate.
Shorter decay times correspond to redder color (upper lines) and higher masses (upper lines).
In the top panels, the clump mass at V606 = 27 (top left) and the clump apparent magnitude
at 108 M⊙ masses (top right) are shown versus the clump color. The correspondence between
color and mass gives a degeneracy to plots of mass versus color at a fixed apparent magnitude
(top left) and apparent magnitude versus color at a fixed mass (top right). Thus the masses
of clumps can be derived approximately from their V606 − z850 colors and V606 magnitudes
for each redshift.
– 35 –
–1 0 1 2 3
V606–z850
0–0.125
Diffuse
Antenna
M51 type
Shrimp
Assembly
Equal
Tidal Dw.
0 1 2 3
V606–z850
0.125–0.375
0.375–0.625
0.625–0.875
z=0.875–1.125
z=1.125–1.375
Fig. 11.— The masses of the clumps can be estimated from this figure. Each curve in a
cluster of curves is a different model for color-magnitude evolution of a star-forming region,
with the age of the region changing along the curve and the exponential decay rate of the
star formation changing from curve to curve. The different clusters of curves correspond to
different total masses for the star-forming regions (mass in M⊙ is indicated to the right of each
curve). The symbols represent observations of apparent magnitude and color. Bandshifting
and absorption are considered by plotting the observations and models in redshift bins.
The mass scales shift slightly with redshift. The mass of each star-forming region can be
determined by interpolation between the curves. Typical masses are 106 M⊙ for low z and
108 M⊙ for high z. The circle near the 10
10 M⊙ curves in the z = 0.125 − 0.375 interval
corresponds to the diffuse clump in the tidal debris of galaxy 1 in Fig. 1.
– 36 –
D A M S AS E T
0–0.125
D A M S AS E T
0.125–0.375
log M
= 9.7
0.375–0.625 0.625–0.875
z=0.875–1.125 z=1.125–1.375
Fig. 12.— Clump masses (left axis) are plotted versus galaxy type in order of Figs. 1-6:
Diffuse, Antenna, M51-type, Shrimp, Assembly, and Equal, with T representing the tidal
dwarfs. The method of Fig. 11 is used. The rms deviations among the six star formation
decay times are shown as plus-symbols using the right-hand axes.
– 37 –
0.01 0.1 1 10
Duration of SF (Gy)
50 z=0.25
0.01 0.1 1 10
Duration of SF (Gy)
z=0.5
z=0.75 z=1
z=1.25 z=1.5
Fig. 13.— The apparent color of a star forming region is shown versus the duration of star
formation for an exponentially decaying star formation law. The decay times are as in Fig.
10, with short decay times the upper lines and continuous star formation the lower lines.
Using the observed clump colors, the durations of star formation are found to range between
107 and 3× 108 yrs for short decay times.
– 38 –
0 0.5 1 1.5
log(1+z)4
Diffuse
Antenna
M51 type
Shrimp
Fig. 14.— V-band surface brightness of tidal tails for galaxies in Figures 1-4 plotted as a
function of (1 + z)4 for redshift z. Some systems have more than one tail. Cosmological
dimming causes a decrease with redshift equal to 2.5 magnitudes for each factor of 10 in
(1 + z)
; this decrease is consistent with the dimming seen here. The observable 2σ limit
for these fields is ∼ 25 mag arcsec−2. Some antenna galaxies have patchy tails with fainter
average surface brightnesses.
– 39 –
0 0.5 1
Redshift, z
Diffuse
Antenna
M51 type
Shrimp
Fig. 15.— Fraction of V-band luminosity in antennae tidal tails relative to their integrated
galaxy luminosity, as a function of redshift.
– 40 –
0 5 10 15 20 25 30 35
Disk Diameter (kpc)
Diffuse
Antenna
M51 type
Shrimp
Fig. 16.— Tail length versus disk diameter from Figs. 1-4, based on the V-band images.
Conversions to linear size assumed a standard ΛCDM cosmology applied to the photometric
redshifts.
– 41 –
0 0.5 1 1.5
Redshift, z
Diffuse
Antenna
M51 type
Shrimp
Fig. 17.— Tail length/disk diameter as a function of redshift for shrimps and antennae,
measured from the V-band images. There is no obvious trend.
– 42 –
0 50 100 150 200
Tail Length (kpc)
GEMS GOODS Antennae
Local Antennae
Superantennae
Arp 299
NGC 3628Arp 241
VV109
NGC 3256
Antennae
Arp 243
Arp 242
Arp 226
Arp 157 Arp 75
Arp 35 NGC 5548
Arp 33
Arp 102
Fig. 18.— Tail length/disk diameter versus the tail length for antenna galaxies in our sample
as well as for local antennae, whose names are indicated. The GEMS and GOODS systems
are significantly smaller than the local antenna galaxies, even if the two extreme local cases,
the Superantennae and Arp 299, are excluded.
	Introduction
	The Sample of Interactions and Mergers
	Photometric Results
	Global galaxy properties
	Clump properties
	Tail Properties
	Tidal dwarf galaxy candidates
	Dark Matter Halo Constraints
	Conclusions
ABSTRACT
  GEMS and GOODS fields were examined to z~1.4 for galaxy interactions and
mergers. The basic morphologies are familiar: antennae with long tidal tails,
tidal dwarfs, and merged cores; M51-type galaxies with disk spirals and tidal
arm companions; early-type galaxies with diffuse plumes; equal-mass
grazing-collisions; and thick J-shaped tails beaded with star formation and
double cores. One type is not common locally and is apparently a loose
assemblage of smaller galaxies. Photometric measurements were made of the tails
and clumps, and physical sizes were determined assuming photometric redshifts.
Antennae tails are a factor of ~3 smaller in GEMS and GOODS systems compared to
local antennae; their disks are a factor of ~2 smaller than locally. Collisions
among early type galaxies generally show no fine structure in their tails,
indicating that stellar debris is usually not unstable. One exception has a
5x10**9 Msun smooth red clump that could be a pure stellar condensation. Most
tidal dwarfs are blue and probably form by gravitational instabilities in the
gas. One tidal dwarf looks like it existed previously and was incorporated into
the arm tip by tidal forces. The star-forming regions in tidal arms are 10 to
1000 times more massive than star complexes in local galaxies, although their
separations are about the same. If they all form by gravitational
instabilities, then the gaseous velocity dispersions in interacting galaxies
have to be larger than in local galaxies by a factor of ~5 or more; the gas
column densities have to be larger by the square of this factor.

<|endoftext|><|startoftext|>
Nuclear Spin Effects in Optical Lattice Clocks
Martin M. Boyd, Tanya Zelevinsky, Andrew D. Ludlow, Sebastian
Blatt, Thomas Zanon-Willette, Seth M. Foreman, and Jun Ye
JILA, National Institute of Standards and Technology and University of Colorado,
Department of Physics, University of Colorado, Boulder, CO 80309-0440
(Dated: August 28, 2018)
We present a detailed experimental and theoretical study of the effect of nuclear spin on the
performance of optical lattice clocks. With a state-mixing theory including spin-orbit and hyperfine
interactions, we describe the origin of the 1S0-
3P0 clock transition and the differential g-factor be-
tween the two clock states for alkaline-earth(-like) atoms, using 87Sr as an example. Clock frequency
shifts due to magnetic and optical fields are discussed with an emphasis on those relating to nuclear
structure. An experimental determination of the differential g-factor in 87Sr is performed and is
in good agreement with theory. The magnitude of the tensor light shift on the clock states is also
explored experimentally. State specific measurements with controlled nuclear spin polarization are
discussed as a method to reduce the nuclear spin-related systematic effects to below 10−17 in lattice
clocks.
Optical clocks [1] based on alkaline-earth atoms con-
fined in an optical lattice [2] are being intensively ex-
plored as a route to improve state of the art clock accu-
racy and precision. Pursuit of such clocks is motivated
mainly by the benefits of Lamb-Dicke confinement which
allows high spectral resolution [3, 4], and high accuracy
[5, 6, 7, 8] with the suppression of motional effects, while
the impact of the lattice potential can be eliminated using
the Stark cancelation technique [9, 10, 11, 12]. Lattice
clocks have the potential to reach the impressive accu-
racy level of trapped ion systems, such as the Hg+ opti-
cal clock [13], while having an improved stability due to
the large number of atoms involved in the measurement.
Most of the work performed thus far for lattice clocks has
been focused on the nuclear-spin induced 1S0-
3P0 tran-
sition in 87Sr. Recent experimental results are promis-
ing for development of lattice clocks as high performance
optical frequency standards. These include the confir-
mation that hyperpolarizability effects will not limit the
clock accuracy at the 10−17 level [12], observation of tran-
sition resonances as narrow as 1.8 Hz [3], and the excel-
lent agreement between high accuracy frequency mea-
surements performed by three independent laboratories
[5, 6, 7, 8] with clock systematics associated with the lat-
tice technique now controlled below 10−15 [6]. A main
effort of the recent accuracy evaluations has been to min-
imize the effect that nuclear spin (I = 9/2 for 87Sr) has
on the performance of the clock. Specifically, a linear
Zeeman shift is present due to the same hyperfine inter-
action which provides the clock transition, and magnetic
sublevel-dependent light shifts exist, which can compli-
cate the stark cancelation techniques. To reach accuracy
levels below 10−17, these effects need to be characterized
and controlled.
The long coherence time of the clock states in alkaline
earth atoms also makes the lattice clock an intriguing
system for quantum information processing. The closed
electronic shell should allow independent control of elec-
tronic and nuclear angular momenta, as well as protec-
tion of the nuclear spin from environmental perturbation,
providing a robust system for coherent manipulation[14].
Recently, protocols have been presented for entangling
nuclear spins in these systems using cold collisions [15]
and performing coherent nuclear spin operations while
cooling the system via the electronic transition [16].
Precise characterization of the effects of electronic and
nuclear angular-momentum-interactions and the resul-
tant state mixing is essential to lattice clocks and po-
tential quantum information experiments, and therefore
is the central focus of this work. The organization of
this paper is as follows. First, state mixing is discussed
in terms of the origin of the clock transition as well as
a basis for evaluating external field sensitivities on the
clock transition. In the next two sections, nuclear-spin
related shifts of the clock states due to both magnetic
fields and the lattice trapping potential are discussed.
The theoretical development is presented for a general
alkaline-earth type structure, using 87Sr only as an ex-
ample (Fig. 1), so that the results can be applied to other
species with similar level structure, such as Mg, Ca, Yb,
Hg, Zn, Cd, Al+, and In+. Following the theoretical dis-
cussion is a detailed experimental investigation of these
nuclear spin related effects in 87Sr, and a comparison to
the theory sections. Finally, the results are discussed in
the context of the performance of optical lattice clocks,
including a comparison with recent proposals to induce
the clock transition using external fields in order to elim-
inate nuclear spin effects [17, 18, 19, 20, 21, 22]. The
appendix contains additional details on the state mixing
and magnetic sensitivity calculations.
I. STATE MIXING IN THE nsnp
CONFIGURATION
To describe the two-electron system in intermediate
coupling, we follow the method of Breit and Wills [23]
and Lurio [24] and write the four real states of the ns np
configuration as expansions of pure spin-orbit (LS) cou-
http://arxiv.org/abs/0704.0912v2
)5( Ss
)55( Pps
)55( Pps
)55( Pps
)55( Pps
State A(MHz) Q(MHz)
-260              -35
-212                67
-3.4                39
State Mixing
FIG. 1: (color online) Simplified 87Sr energy level diagram
(not to scale). Relevant optical transitions discussed in the
text are shown as solid arrows, with corresponding wave-
lengths given in nanometers. Hyperfine structure sublevels
are labeled by total angular momentum F , and the magnetic
dipole (A) and electric quadrupole (Q, equivalent to the hy-
perfine B coefficient) coupling constants are listed in the inset.
State mixing of the 1P1 and
3P1 states due to the spin-orbit
interaction is shown as a dashed arrow. Dotted arrows repre-
sent the hyperfine induced state mixing of the 3P0 state with
the other F = 9/2 states in the 5s5p manifold.
pling states,
P0〉 = |
P1〉 = α|
1 〉 + β|
P2〉 = |
P1〉 = −β|
1 〉 + α|
Here the intermediate coupling coefficients α and β
(0.9996 and -0.0286 respectively for Sr) represent the
strength of the spin-orbit induced state mixing between
singlet and triplet levels, and can be determined from
experimentally measured lifetimes of 1P1 and
3P1 (see
Eq. 15 in the appendix). This mixing process results in
a weakly allowed 1S0-
3P1 transition (which would other-
wise be spin-forbidden), and has been used for a variety
of experiments spanning different fields of atomic physics.
In recent years, these intercombination transitions have
provided a unique testing ground for studies of narrow-
line cooling in Sr [25, 26, 27, 28, 29] and Ca [30, 31], as
well as the previously unexplored regime of photoassocia-
tion using long lived states [32, 33, 34]. These transitions
have also received considerable attention as potential op-
tical frequency standards [35, 36, 37], owing mainly to
the high line quality factors and insensitivity to external
fields. Fundamental symmetry measurements, relevant
to searches of physics beyond the standard model, have
also made use of this transition in Hg [38]. Furthermore,
the lack of hyperfine structure in the bosonic isotopes
(I = 0) can simplify comparison between experiment and
theory.
The hyperfine interaction (HFI) in fermionic isotopes
provides an additional state mixing mechanism between
states having the same total spin F , mixing the pure 3P0
state with the 3P1,
3P2 and
1P1 states.
|3P0〉 = |
〉+ α0|
3P1〉+ β0|
1P1〉+ γ0|
〉. (2)
The HFI mixing coefficients α0, β0, and γ0 (2×10
−4, −4×
10−6, and 4 × 10−6 respectively for 87Sr) are defined in
Eq. 16 of the appendix and can be related to the hyperfine
splitting in the P states, the fine structure splitting in the
3P states, and the coupling coefficients α and β [23, 24].
The 3P0 state can also be written as a combination of
pure states using Eq. 1,
P0〉 =|
0 〉 + (α0α− β0β)|
+ (α0β + β0α)|
1 〉 + γ0|
The HFI mixing enables a non-zero electric-dipole tran-
sition via the pure 1P 01 state, with a lifetime which can
be calculated given the spin-orbit and HFI mixing coef-
ficients, the 3P1 lifetime, and the wavelengths (λ) of the
3P0 and
3P1 transitions from the ground state [39].
3P0 =
3P1−1S0
(α0β + β0α)2
3P1 . (4)
In the case of Sr, the result is a natural lifetime on the
order of 100 seconds [9, 40, 41], compared to that of a
bosonic isotope where the lifetime approaches 1000 years
[41]. Although the 100 second coherence time of the
excited state exceeds other practical limitations in cur-
rent experiments, such as laser stability or lattice life-
time, coherence times approaching one second have been
achieved [3]. The high spectral resolution has allowed a
study of nuclear-spin related effects in the lattice clock
system discussed below.
The level structure and state mixing discussed here
are summarized in a simplified energy diagram, shown
in Fig. 1, which gives the relevant atomic structure and
optical transitions for the 5s5p configuration in 87Sr.
II. THE EFFECT OF EXTERNAL MAGNETIC
FIELDS
With the obvious advantages in spectroscopic precision
of the 1S0-
3P0 transition in an optical lattice, the sensi-
tivity of the clock transition to external field shifts is a
central issue in developing the lattice clock as an atomic
frequency standard. To evaluate the magnetic sensitivity
of the clock states, we follow the treatment of Ref. [24] for
the intermediate coupling regime described by Eqns. 1-3
in the presence of a weak magnetic field. A more general
treatment for the case of intermediate fields is provided
in the appendix. The Hamiltonian for the Zeeman inter-
action in the presence of a weak magnetic field B along
the z-axis is given as
HZ = (gsSz + glLz − gIIz)µ0B. (5)
Here gs ≃ 2 and gl = 1 are the spin and orbital an-
gular momentum g-factors, and Sz, Lz, and Iz are the
z-components of the electron spin, orbital, and nuclear
spin angular momentum respectively. The nuclear g-
factor, gI , is given by gI=
µI(1−σd)
µ0|I|
, where µI is the nuclear
magnetic moment, σd is the diamagnetic correction and
. Here, µB is the Bohr magneton, and h is Planck’s
constant. For 87Sr, the nuclear magnetic momement and
diamagnetic correction are µI = −1.0924(7)µN [42] and
σd = 0.00345 [43] respectively, where µN is the nuclear
magneton. In the absence of state mixing, the 3P0 g-
factor would be identical to the 1S0 g-factor (assuming
the diamagnetic effect differs by a negligible amount for
different electronic states), equal to gI . However since
the HFI modifies the 3P0 wavefunction, a differential g-
factor, δg, exists between the two states. This can be
interpreted as a paramagnetic shift arising due to the
distortion of the electronic orbitals in the triplet state,
and hence the magnetic moment [44]. δg is given by
δg = −
〈3P0|HZ |
3P0〉 − 〈
3P 00 |HZ |
3P 00 〉
mFµ0B
= − 2 (α0α− β0β)
〈3P 00 ,mF |HZ |
3P 01 , F = I,mF 〉
mFµ0B
+ O(α
0 , γ
0 , . . .).
Using the matrix element given in the appendix for
87Sr (I = 9/2), we find 〈3P 00 ,mF |HZ |
3P 01 , F =
,mF 〉=
mFµ0B, corresponding to a modification of the
g-factor by ∼60%. Note that the sign in Eq. 6 differs
from that reported in [39, 44] due to our choice of sign
for the nuclear term in the Zeeman Hamiltonian (oppo-
site of that found in Ref. [24]). The resulting linear
Zeeman shift ∆
B = −δgmFµ0B of the
3P0 transition
is on the order of ∼110×mF Hz/G (1 G = 10
−4 Tesla).
This is an important effect for the development of lattice
clocks, as stray magnetic fields can broaden the clock
transition (deteriorate the stability) if multiple sublevels
are used. Furthermore, imbalanced population among
the sublevels or mixed probe polarizations can cause fre-
quency errors due to line shape asymmetries or shifts.
It has been demonstrated that if a narrow resonance is
achieved (10 Hz in the case of Ref. [6]), these systematics
can be controlled at 5×10−16 for stray fields of less than
5 mG. To reduce this effect, one could employ narrower
resonances or magnetic shielding.
An alternative measurement scheme is to measure
the average transition frequency between mF and −mF
states of to cancel the frequency shifts. This requires
application of a bias field to resolve the sublevels, and
therefore the second order Zeeman shift ∆
B must be
considered. The two clock states are both J = 0 so the
shift ∆
B arises from levels separated in energy by the
fine-structure splitting, as opposed to the more tradi-
tional case of alkali(-like) atoms where the second order
shift arises from nearby hyperfine levels. The shift of
the clock transition is dominated by the interaction of
0 500 1000 1500 2000 2500 3000
0 1 2 3
=-9/2C
Magnetic Field (G)
=+9/2
Magnetic Field (G)
FIG. 2: (color online) A Breit-Rabi diagram for the 1S0-
clock transition using Eq. 22 with δgµ0 = −109 Hz/G. Inset
shows the linear nature of the clock shifts at the fields relevant
for the measurement described in the text.
the 3P0 and
3P1 states since the ground state is sepa-
rated from all other energy levels by optical frequencies.
Therefore, the total Zeeman shift of the clock transition
∆B is given by
∆B = ∆
B + ∆
|〈3P0, F,mF |HZ |
3P1, F
′,mF 〉|
ν3P1,F ′ − ν3P0
The frequency difference in the denominator is mainly
due to the fine-structure splitting and is nearly indepen-
dent of F ′, and can therefore be pulled out of the sum-
mation. In terms of the pure states, and ignoring terms
of order α0, β0, β
2, and smaller, we have
B ≃− α
|〈3P 00 , F,mF |HZ |
3P 01 , F
′,mF 〉|
ν3P1 − ν3P0
2α2(gl − gs)
3(ν3P1 − ν3P0)
where we have used the matrix elements given in the
appendix for the case F = 9/2. From Eq. 8 the sec-
ond order Zeeman shift (given in Hz for a magnetic field
given in Gauss) for 87Sr is ∆
B =−0.233B
2. This is con-
sistent with the results obtained in Ref. [20] and [45] for
the bosonic isotope. Inclusion of the hyperfine splitting
into the frequency difference in the denominator of Eq. 7
yields an additional term in the second order shift pro-
portional to m2F which is more that 10
−6 times smaller
than the main effect, and therefore negligible. Notably,
the fractional frequency shift due to the second order
Zeeman effect of 5×10−16 G−2 is nearly 108 times smaller
than that of the Cs [46, 47] clock transition, and more
than an order of magnitude smaller than that present in
Hg+ [13], Sr+ [48, 49],and Yb+ [50, 51] ion optical clocks.
A Breit-Rabi like diagram is shown in Fig. 2, giving
the shift of the 1S0-
3P0 transition frequency for different
mF sublevels (assuming ∆m = 0 for π transitions), as a
function of magnetic field. The calculation is performed
using an analytical Breit-Rabi formula (Eq. 22) provided
in the appendix. The result is indistinguishable from the
perturbative derivation in this section, even for fields as
large as 104 G.
III. THE EFFECT OF THE OPTICAL LATTICE
POTENTIAL
In this section we consider the effect of the confining
potential on the energy shifts of the nuclear sublevels. In
the presence of a lattice potential of depth UT , formed by
a laser linearly polarized along the axis of quantization
defined by an external magnetic field B, the level shift of
a clock state (h∆g/e) from its bare energy is given by
∆e = −mF (gI + δg)µ0B − κ
e ξmF
F − F (F + 1)
∆g = −mF gIµ0B − κ
g ξmF
F − F (F + 1)
Here, κS , κV , and κT are shift coefficients proportional
to the scalar, vector (or axial), and tensor polarizabil-
ities, and subscripts e and g refer to the excited (3P0)
and ground (1S0) states respectively. ER is the energy
of a lattice photon recoil and UT /ER characterizes the
lattice intensity. The vector (∝ mF ) and tensor (∝ m
light shift terms arise solely from the nuclear structure
and depend on the orientation of the light polarization
and the bias magnetic field. The tensor shift coefficient
includes a geometric scaling factor which varies with the
relative angle φ of the laser polarization axis and the axis
of quantization, as 3cos2 φ − 1. The vector shift, which
can be described as an pseudo-magnetic field along the
propagation axis of the trapping laser, depends on the
trapping geometry in two ways. First, the size of the
effect is scaled by the degree of elliptical polarization ξ,
where ξ = 0 (ξ = ±1) represents perfect linear (circular)
polarization. Second, for the situation described here,
the effect of the vector light shift is expected to be orders
of magnitude smaller than the Zeeman effect, justifying
the use of the bias magnetic field direction as the quan-
tization axis for all of the mF terms in Eq. 9. Hence
the shift coefficient depends on the relative angle be-
tween the pseudo-magnetic and the bias magnetic fields,
vanishing in the case of orthogonal orientation [52]. A
more general description of the tensor and vector effects
in alkaline-earth systems for the case of arbitrary ellipti-
cal polarization can be found in Ref. [10]. Calculations of
the scalar, vector, and tensor shift coefficients have been
performed elsewhere for Sr, Yb, and Hg [9, 10, 11, 52]
and will not be discussed here. Hyperpolarizability ef-
fects (∝ U2T ) [9, 10, 11, 12] are ignored in Eq. 9 as they
are negligible in 87Sr at the level of 10−17 for the range of
lattice intensities used in current experiments [12]. The
second order Zeeman term has been omitted but is also
present.
Using Eq. 9 we can write the frequency of a π-
transition (∆mF = 0) from a ground state mF as
νπmF = νc −
F (F + 1)
mF ξ + ∆κ
− δgmFµ0B,
where the shift coefficients due to the differential polar-
izabilities are represented as ∆κ, and νc is the bare clock
frequency. The basic principle of the lattice clock tech-
nique is to tune the lattice wavelength (and hence the
polarizabilities) such that the intensity-dependent fre-
quency shift terms are reduced to zero. Due to the mF -
dependence of the third term of Eq. 10, the Stark shifts
cannot be completely compensated for all of the sublevels
simultaneously. Or equivalently, the magic wavelength
will be different depending on the sublevel used. The
significance of this effect depends on the magnitude of
the tensor and vector terms. Fortunately, in the case of
the 1S0-
3P0 transition the clock states are nearly scalar,
and hence these effects are expected to be quite small.
While theoretical estimates for the polarizabilities have
been made, experimental measurements are unavailable
for the vector and tensor terms. The frequencies of σ±
(∆mF = ±1) transitions from a ground mF state are
similar to the π-transitions, given by
= νc −
F (F + 1)
e (mF ± 1) − κ
g mF )ξ
e 3(mF ± 1)
− (±gI + δg(mF ± 1))µ0B.
IV. EXPERIMENTAL DETERMINATION OF
FIELD SENSITIVITIES
To explore the magnitude of the variousmF -dependent
shifts in Eq. 10, a differential measurement scheme can
be used to eliminate the large shifts common to all levels.
Using resolved sublevels one can extract mF sensitivities
by measuring the splitting of neighboring states. This is
the approach taken here. A diagram of our spectroscopic
setup is shown in Fig. 3(a). 87Sr atoms are captured
from a thermal beam into a magneto-optical trap (MOT),
based on the 1S0-
1P1 cycling transition. The atoms are
then transferred to a second stage MOT for narrow line
cooling using a dual frequency technique [26]. Full de-
tails of the cooling and trapping system used in this
work are discussed elsewhere [5, 28]. During the cooling
process, a vertical one-dimensional lattice is overlapped
-30 -20 -10 0 10 20 30
Laser Detuning (Hz)
FIG. 3: (color online) (a) Schematic of the experimental ap-
paratus used here. Atoms are confined in a nearly vertical
optical lattice formed by a retro-reflected 813 nm laser. A
698 nm probe laser is co-aligned with the lattice. The probe
polarization EP can be varied by an angle θ relative to that of
the linear lattice polarization EL. A pair of Helmholtz coils
(blue) is used to apply a magnetic field along the lattice po-
larization axis. (b) Nuclear structure of the 1S0 and
3P0 clock
states. The large nuclear spin (I = 9/2) results in 28 total
transitions, and the labels π, σ+, and σ− represent transi-
tions where mF changes by 0, +1, and −1 respectively. (c)
Observation of the clock transition without a bias magnetic
field. The 3P0 population (in arbitrary units) is plotted (blue
dots) versus the probe laser frequency for θ = 0, and a fit to
a sinc-squared lineshape yields a Fourier-limited linewidth of
10.7(3) Hz. Linewidths as narrow as 5 Hz have been observed
under similar conditions and when the probe time is extended
to 500 ms.
with the atom cloud. We typically load ∼104 atoms into
the lattice at a temperature of ∼1.5µK. The lattice is
operated at the Stark cancelation wavelength [6, 12] of
813.4280(5) nm with a trap depth of U0 = 35ER. A
Helmholtz coil pair provides a field along the lattice po-
larization axis for resolved sub-level spectroscopy. Two
other coil pairs are used along the other axes to zero the
orthogonal fields. The spectroscopy sequence for the 1S0-
3P0 clock transition begins with an 80 ms Rabi pulse from
a highly stabilized diode laser [53] that is co-propagated
with the lattice laser. The polarization of the probe laser
is linear at an angle θ relative to that of the lattice. A
shelved detection scheme is used, where the ground state
population is measured using the 1S0-
1P1 transition. The
3P0 population is then measured by pumping the atoms
through intermediate states using 3P0-
3S1, and
the natural decay of 3P1 , before applying a second
-300 -200 -100 0 100 200 300
+1/2-1/2
Laser Detuning (Hz)
FIG. 4: (color online) Observation of the 1S0-
3P0 π-
transitions (θ = 0) in the presence of a 0.58 G magnetic field.
Data is shown in grey and a fit to the eight observable line-
shapes is shown as a blue curve. The peaks are labeled by
the ground state mF -sublevel of the transition. The relative
transition amplitudes for the different sublevels are strongly
influenced by the Clebsch-Gordan coefficients. Here, transi-
tion linewidths of 10 Hz are used. Spectra as narrow as 1.8
Hz can be achieved under similar conditions if the probe time
is extended to 500 ms.
1P1 pulse. The 461 nm pulse is destructive, so for each
frequency step of the probe laser the ∼800 ms loading
and cooling cycle is repeated.
When π polarization is used for spectroscopy (θ = 0),
the large nuclear spin provides ten possible transitions,
as shown schematically in Fig. 3(b). Figure 3(c) shows
a spectroscopic measurement of these states in the ab-
sence of a bias magnetic field. The suppression of mo-
tional effects provided by the lattice confinement allows
observation of extremely narrow lines [3, 4, 19], in this
case having Fourier-limited full width at half maximum
(FWHM) of ∼10 Hz (quality factor of 4 × 1013). In our
current apparatus the linewidth limitation is 5 Hz with
degenerate sublevels and 1.8 Hz when the degeneracy is
removed [3]. The high spectral resolution allows for the
study of nuclear spin effects at small bias fields, as the
ten sublevels can easily be resolved with a few hundred
mG. An example of this is shown in Fig. 4, where the ten
transitions are observed in the presence of a 0.58 G bias
field. This is important for achieving a high accuracy
measurement of δg as the contribution from magnetic-
field-induced state mixing is negligible. To extract the
desired shift coefficients we note that for the π transi-
tions we have a frequency gap between neighboring lines
fπ,mF = νπmF − νπmF −1
= −δgµ0B − ∆κ
3(2mF − 1)
From Eq. 12, we see that by measuring the differences in
-500 -250 0 250 500
-9/2 ( +)
-7/2 (
Laser Detuning (Hz)
+7/2 ( +)
+9/2 ( )
FIG. 5: (color online) Observation of the 18 σ transitions
when the probe laser polarization is orthogonal to that of the
lattice (θ = π
). Here, a field of 0.69 G is used. The spectro-
scopic data is shown in grey and a fit to the data is shown
as a blue curve. Peak labels give the ground state sublevel of
the transition, as well as the excitation polarization.
frequency of two spectroscopic features, the three terms
of interest (δg, ∆κV , and ∆κT ) can be determined inde-
pendently. The differential g factor can be determined
by varying the magnetic field. The contribution of the
last two terms can be extracted by varying the inten-
sity of the standing wave trap, and can be independently
determined since only the tensor shift depends on mF .
While the π transitions allow a simple determination
of δg, the measurement requires a careful calibration of
the magnetic field and a precise control of the probe
laser frequency over the ∼500 seconds required to pro-
duce a scan such as in Fig. 4. Any linear laser drift
will appear in the form of a smaller or larger δg, de-
pending on the laser scan direction. Furthermore, the
measurement can not be used to determine the sign of
δg as an opposite sign would yield an identical spectral
pattern. In an alternative measurement scheme, we in-
stead polarize the probe laser perpendicular to the lattice
polarization (θ = π
) to excite both σ+ and σ− tran-
sitions. In this configuration, 18 spectral features are
observed and easily identified (Fig. 5). Ignoring small
shifts due to the lattice potential, δg is given by extract-
ing the frequency splitting between adjacent transitions
of a given polarization (all σ+ or all σ− transitions) as
fσ±,mF =νσ±mF
mF −1
=−δgµ0B . If we also measure the
frequency difference between σ+ and σ− transitions from
the same sublevel, fd,mF =νσ+mF
=−2(gI + δg)µ0B,
we find that the differential g-factor can be determined
from the ratio of these frequencies as
fd,mF
σ±,mF
. (13)
-600 -300 0 300 600
Laser Detuning (Hz)
FIG. 6: (color online) Calculation of the 18 σ transition fre-
quencies in the presence of a 1 G bias field, including the influ-
ence of Clebsch-Gordan coefficients. The green (red) curves
show the σ+ (σ−) transitions. (a) Spectral pattern for g-
factors gIµ0 = −185 Hz/G and δgµ0 = −109 Hz/G. (b) Same
pattern as in (a) but with δgµ0 = +109 Hz/G. The qualita-
tive difference in the relative positions of the transitions allows
determination of the sign of δg compared to that of gI .
In this case, prior knowledge of the magnetic field is not
required for the evaluation, nor is a series of measure-
ment at different fields, as δg is instead directly deter-
mined from the line splitting and the known 1S0 g factor
gI . The field calibration and the δg measurement are in
fact done simultaneously, making the method immune to
some systematics which could mimic a false field, such as
linear laser drift during a spectroscopic scan or slow mag-
netic field variations. Using the σ transitions also elim-
inates the sign ambiguity which persists when using the
π transitions for measuring δg. While we can not extract
the absolute sign, the recovered spectrum is sensitive to
the relative sign between gI and δg. This is shown explic-
itly in Fig. 6 where the positions of the transitions have
been calculated in the presence of a ∼1 G magnetic field.
Figure 6(a) shows the spectrum when the signs of gI and
δg are the same while in Fig. 6(b) the signs are oppo-
site. The two plots show a qualitative difference between
the two possible cases. Comparing Fig. 5 and Fig. 6 it is
obvious that the hyperfine interaction increases the mag-
nitude of the 3P0 g-factor (δg has the same sign as gI).
We state this point explicitly because of recent inconsis-
tencies in theoretical estimates of the relative sign of δg
and gI in the
87Sr literature [7, 8].
To extract the magnitude of δg, data such as in Fig. 5
are fit with eighteen Lorentzian lines, and the relevant
splitting frequencies fd,mF and fσ± are extracted. Due
to the large number of spectral features, each experimen-
tal spectrum yields 16 measurements of δg. A total of
31 full spectra was taken, resulting in an average value
of δgµ0 = −108.4(4) Hz/G where the uncertainty is the
0.0 0.5 1.0 1.5 2.0
Lattice Depth (U/U
FIG. 7: (color online) Summary of δg-measurements for dif-
ferent lattice intensities. Each data point (and uncertainty)
represents the δg value extracted from a full σ± spectrum
such as in Fig. 5. Linear extrapolation (red line) to zero lat-
tice intensity yields a value −108.4(1) Hz/G.
standard deviation of the measured value. To check for
sources of systematic error, the magnetic field was varied
to confirm the field independence of the measurement.
We also varied the clock laser intensity by an order of
magnitude to check for Stark and line pulling effects. It
is also necessary to consider potential measurement er-
rors due to the optical lattice since in general the splitting
frequencies fd,mF and fσ± will depend on the vector and
tensor light shifts. For fixed fields, the vector shift is in-
distinguishable from the linear Zeeman shift (see Eqs. 10-
12) and can lead to errors in calibrating the field for a δg
measurement. In this work, a high quality linear polar-
izer (10−4) is used which would in principle eliminate the
vector shift. The nearly orthogonal orientation should
further reduce the shift. However, any birefringence of
the vacuum windows or misalignment between the lattice
polarization axis and the magnetic field axis can lead to
a non-zero value of the vector shift. To measure this ef-
fect in our system, we varied the trapping depth over a
range of ∼ (0.6 − 1.7)U0 and extrapolated δg to zero in-
tensity, as shown in Fig. 7. Note that this measurement
also checks for possible errors due to scalar and tensor
polarizabilites as their effects also scale linearly with the
trap intensity. We found that the δg-measurement was
affected by the lattice potential by less then 0.1%, well
below the uncertainty quoted above.
Unlike the vector shift, the tensor contribution to the
sublevel splitting is distinguishable from the magnetic
contribution even for fixed fields. Adjacent σ transitions
can be used to measure ∆κT and κTe due to the m
F de-
pendence of the tensor shift. An appropriate choice of
transition comparisons results in a measurement of the
tensor shift without any contributions from magnetic or
vector terms. To enhance the sensitivity of our measure-
0 5 10 15 20 25
Measurement
UT=1.7U0 UT=0.85U0 UT=1.3U0
FIG. 8: (color online) Measurement of the tensor shift coef-
ficients ∆κT (blue triangles), and κTe (green circles), using σ
spectra and Eq. 14. The measured coefficients show no sta-
tistically significant trap depth dependence while varying the
depth from 0.85–1.7 U0.
ment we focus mainly on the transitions originating from
states with large mF ; for example, we find that
fσ+,mF=7/2 − fσ+,mF =−7/2
e = −
fd,mF =7/2 − fd,mF =−7/2
while similar combinations can be used to isolate the dif-
ferential tensor shift from the σ− data as well as the
tensor shift coefficient of the 1S0 state. From the σ split-
ting data we find ∆κT = 0.03(8) Hz/U0 and |κ
e |=0.02(4)
Hz/U0. The data for these measurements is shown in
Fig. 8. Similarly, we extracted the tensor shift coeffi-
cient from π spectra, exploiting the mF -dependent term
in Eq. 12, yielding ∆κT = 0.02(7) Hz/U0. The measure-
ments here are consistent with zero and were not found
to depend on the trapping depth used for a range of 0.85–
1.7 U0, and hence are interpreted as conservative upper
limits to the shift coefficients. The error bars represent
the standard deviation of many measurements, with the
scatter in the data due mainly to laser frequency noise
and slight under sampling of the peaks. It is worth noting
that the tensor shift of the clock transition is expected to
be dominated by the 3P0 shift, and therefore, the limit
on κTe can be used as an additional estimate for the up-
per limit on ∆κT . Improvements on these limits can be
made by going to larger trap intensities to enhance sen-
sitivity, as well as by directly stabilizing the clock laser
to components of interest for improved averaging.
Table I summarizes the measured sensitivities to mag-
netic fields and the lattice potential. The Stark shift
coefficients for linear polarization at 813.4280(5) nm are
given in units of Hz/(UT /ER). For completeness, a recent
measurement of the second order Zeeman shift using 88Sr
has been included [45], as well as the measured shift coef-
ficient ∆γ for the hyperpolarizability [12] and the upper
TABLE I: Measured Field Sensitivities for 87Sr
Sensitivity Value Units Ref.
B /mFB -108.4(4) Hz/G This work
2 -0.233(5) Hz/G2 [45]a
∆κT 6(20) ×10−4 Hz/(UT /ER) This work
∆κT 9(23)×10−4 Hz/(UT /ER) This work
κTe 5(10)×10
−4 Hz/(UT /ER) This work
κ -3(7)×10−3 Hz/(UT /ER) [6]
∆γ 7(6)×10−6 Hz/(UT /ER)
2 [12]d
a Measured for 88Sr
b Measured with π spectra
c Measured with σ± spectra
d Measured with degenerate sublevels
limit for the overall linear lattice shift coefficient κ from
our recent clock measurement [6]. While we were able
to confirm that the vector shift effect is small and con-
sistent with zero in our system, we do not report a limit
for the vector shift coefficient ∆κV due to uncertainty
in the lattice polarization purity and orientation relative
to the quantization axis. In future measurements, use of
circular trap polarization can enhance the measurement
precision of ∆κV by at least two orders of magnitude.
Although only upper limits are reported here, the re-
sult can be used to estimate accuracy and linewidth lim-
itations for lattice clocks. For example, in the absence
of magnetic fields, the tensor shift can cause line broad-
ening of the transition for unpolarized samples. Given
the transition amplitudes in Fig. 4, the upper limit for
line broadening, derived from the tensor shift coefficients
discussed above, is 5 Hz at U0. The tensor shift also
results in a different magic wavelength for different mF
sublevels, which is constrained here to the few picometer
level.
V. COMPARISON OF THE δg MEASUREMENT
WITH THEORY AND
3P0 LIFETIME ESTIMATE
The precise measurement of δg provides an opportu-
nity to compare various atomic hyperfine interaction the-
ories to the experiment. To calculate the mixing param-
eters α0 and β0 (defined in Eq. 16 of the Appendix),
we first try the simplest approach using the standard
Breit-Wills (BW) theory [23, 24] to relate the mixing
parameters to the measured triplet hyperfine splitting
(hfs). The parameters α (0.9996) and β (−0.0286(3))
are calculated from recent determinations of the 3P1 [32]
and 1P1 [54] lifetimes. The relevant singlet and triplet
single-electron hyperfine coefficients are taken from Ref.
[55]. From this calculation we find α0 = 2.37(1) × 10
β0 = −4.12(1) × 10
−6, and γ0 = 4.72(1) × 10
−6, resulting
in δgµ0 = −109.1(1) Hz/G . Using the mixing values in
conjunction with Eq. 4 we find that the 3P0 lifetime is
152(2) s. The agreement with the measured g-factor is
excellent, however the BW-theory is known to have prob-
lems predicting the 1P1 characteristics based on those of
the triplet states. In this case, the BW-theory frame-
work predicts a magnetic dipole A coefficient for the 1P1
state of -32.7(2) MHz, whereas the experimental value is
-3.4(4) MHz [55]. Since δg is determined mainly by the
properties of the 3P1 state, it is not surprising that the
theoretical and experimental values are in good agree-
ment. Conversely, the lifetime of the 3P0 state depends
nearly equally on the 1P1 and
3P1 characteristics, so the
lifetime prediction deserves further investigation.
A modified BW (MBW) theory [44, 55, 56] was at-
tempted to incorporate the singlet data and eliminate
such discrepancies. In this case 1P1,
3P1, and
3P2 hfs are
all used in the calculation, and two scaling factors are
introduced to account for differences between singlet and
triplet radial wavefunctions when determining the HFI
mixing coefficients (note that γ0 is not affected by this
modification). This method has been shown to be suc-
cessful in the case of heavier systems such as neutral Hg
[44]. We find α0 = 2.56(1)× 10
−4 and β0 = −5.5(1)× 10
resulting in δgµ0 = −117.9(5) Hz/G and τ
3P0 = 110(1) s.
Here, the agreement with experiment is fair, but the un-
certainties in experimental parameters used for the the-
ory are too small to explain the discrepancy.
Alternatively, we note that in Eq. 6, δg depends
strongly on α0α and only weakly (< 1%) on β0β, such
that our measurement can be used to tightly constrain
α0 = 2.35(1)×10
−4, and then use only the triplet hfs data
to calculate β0 in the MBW theory framework. In this
way we find β0 = −3.2(1) × 10
−6, yielding τ
3P0 = 182(5)s.
The resulting 1P1 hfs A coefficient is −15.9(5) MHz,
which is an improvement compared to the standard BW
calculation. The inability of the BW and MBW theory to
simultaneously predict the singlet and triplet properties
seems to suggest that the theory is inadequate for 87Sr.
A second possibility is a measurement error of some of
the hfs coefficients, or the ground state g-factor. The
triplet hfs is well resolved and has been confirmed with
high accuracy in a number of measurements. An error in
the ground state g-factor measurement at the 10% level
is unlikely, but it can be tested in future measurements
TABLE II: Theoretical estimates of δg and τ
3P0 for 87Sr
Values used in Calculation
α = 0.9996 β = −0.0286(3)
Calc. α0 β0 τ
3P0 δgµ0 A
×104 ×106 (s) mF (Hz/G) (MHz)
BW 2.37(1) -4.12(1) 152(2) -109.1(1) -32.7(2)
MBW I 2.56(1) -5.5(1) 110(1) -117.9(5) -3.4(4)a
MBW II 2.35(1) -3.2(1) 182(5) -108.4(4)b -15.9(5)
Ref [40] — — 132 — —
Ref [41, 59] 2.9(3) -4.7(7) 110(30) -130(15) c —
Ref [8, 9] — — 159 106d —
a Experimental value [55]
b Experimental value from this work
c Calculated using Eq. 6
d Sign inferred from Figure 1 in Ref. [8]
by calibrating the field in an independent way so that
both gI and δg can be measured. On the other hand,
the 1P1 hfs measurement has only been performed once
using level crossing techniques, and is complicated by the
fact that the structure is not resolved, and that the 88Sr
transition dominates the spectrum for naturally abun-
dant samples. Present 87Sr cooling experiments could be
used to provide an improved measurement of the 1P1 data
to check whether this is the origin of the discrepancy.
Although one can presumably predict the lifetime with
a few percent accuracy (based on uncertainties in the
experimental data), the large model-dependent spread
in values introduces significant additional uncertainty.
Based on the calculations above (and many other similar
ones) and our experimental data, the predicted lifetime is
145(40) s. A direct measurement of the natural lifetime
would be ideal, as has been done in similar studies with
trapped ion systems such as In+ [39] and Al+ [57] or neu-
tral atoms where the lifetime is shorter, but for Sr this
type of experiment is difficult due to trap lifetime limi-
tations, and the measurement accuracy would be limited
by blackbody quenching of the 3P0 state [58].
Table II summarizes the calculations of δg and τ
discussed here including the HFI mixing parameters α0
and β0. Other recent calculations based on the BW the-
ory [8, 9], ab initio relativistic many body calculations
[40], and an effective core calculation [41] are given for
comparison, with error bars shown when available.
VI. IMPLICATIONS FOR THE
SR LATTICE
CLOCK
In the previous sections, the magnitude of relevant
magnetic and Stark shifts has been discussed. Briefly, we
will discuss straightforward methods to reduce or elim-
inate the effects of the field sensitivities. To eliminate
linear Zeeman and vector light shifts the obvious path is
to use resolved sublevels and average out the effects by al-
ternating between measurements of levels with the same
|mF |. Figure 9 shows an example of a spin-polarized mea-
surement using the mF = ±9/2 states for cancelation of
the Zeeman and vector shifts. To polarize the sample,
we optically pump the atoms using a weak beam reso-
nant with the 1S0-
3P1 (F = 7/2) transition. The beam
is co-aligned with the lattice and clock laser and linearly
polarized along the lattice polarization axis (θ = 0), re-
sulting in optical pumping to the stretched (mF = 9/2)
states. Spectroscopy with (blue) and without (red) the
polarizing step shows the efficiency of the optical pump-
ing as the population in the stretched states is dramati-
cally increased while excitations from other sublevels are
not visible. Alternate schemes have been demonstrated
elsewhere [8, 26] where the population is pumped into a
single mF = ±9/2 state using the
3P1 (F = 9/2)
transition. In our system, we have found the method
shown here to be more efficient in terms of atom number
in the final state and state purity. The highly efficient
-150 -100 -50 0 50 100 150
Laser Detuning (Hz)
FIG. 9: (color online) The effect of optical pumping via the
3P1 (F = 7/2) state is shown via direct spectroscopy with θ =
0. The red data shows the spectrum without the polarizing
light for a field of 0.27 G. With the polarizing step added
to the spectroscopy sequence the blue spectum is observed.
Even with the loss of ∼ 15% of the total atom number due to
the polarizing laser, the signal size of the mF = ±9/2 states
is increased by more than a factor of 4.
optical pumping and high spectral resolution should al-
low clock operation with a bias field of less than 300 mG
for a 10 Hz feature while keeping line pulling effects due
to the presence of the other sublevels below 10−17. The
corresponding second order Zeeman shift for such a field
is only ∼21 mHz, and hence knowledge of the magnetic
field at the 10% level is sufficient to control the effect
below 10−17. With the high accuracy δg-measurement
reported here, real time magnetic field calibration at the
level of a few percent is trivial. For spin-polarized sam-
ples, a new magic wavelength can be determined for the
mF -pair, and the effect of the tensor shift will only be to
modify the cancelation wavelength by at most a few pi-
cometers if a different set of sublevels are employed. With
spin-polarized samples, the sensitivity to both magnetic
and optical fields (including hyperpolarizability effects)
should not prevent the clock accuracy from reaching be-
low 10−17.
Initial concerns that nuclear spin effects would limit
the obtainable accuracy of a lattice clock have prompted
a number of recent proposals to use bosonic isotopes
in combination with external field induced state mixing
[17, 18, 20, 21, 22] to replace the mixing provided natu-
rally by the nuclear spin. In these schemes, however, the
simplicity of a hyperfine-free system comes at the cost
of additional accuracy concerns as the mixing fields also
shift the clock states. The magnitudes of the shifts de-
pend on the species, mixing mechanism, and achievable
spectral resolution in a given system. As an example,
we discuss the magnetic field induced mixing scheme [20]
which was the first to be experimentally demonstrated
for Yb [19] and Sr [45]. For a 10 Hz 88Sr resonance (i.e.
the linewidth used in this work), the required magnetic
and optical fields (set to minimize the total frequency
shift) result in a second order Zeeman shift of −30 Hz
and an ac Stark shift from the probe laser of −36 Hz.
For the same transition width, using spin-polarized 87Sr,
the second order Zeeman shift is less than −20 mHz for
the situation in Fig. 9, and the ac Stark shift is less than
1 mHz. Although the nuclear-spin-induced case requires
a short spin-polarizing stage and averaging between two
sublevels, this is preferable to the bosonic isotope, where
the mixing fields must be calibrated and monitored at the
10−5 level to reach below 10−17. Other practical concerns
may make the external mixing schemes favorable, if for
example isotopes with nuclear spin are not readily avail-
able for the species of interest. In a lattice clock with
atom-shot noise limited performance, the stability could
be improved, at the cost of accuracy, by switching to a
bosonic isotope with larger natural abundance.
In conclusion we have presented a detailed experimen-
tal and theoretical study of the nuclear spin effects in op-
tical lattice clocks. A perturbative approach for describ-
ing the state mixing and magnetic sensitivity of the clock
states was given for a general alkaline-earth(-like) system,
with 87Sr used as an example. Relevant Stark shifts from
the optical lattice were also discussed. We described in
detail our sign-sensitive measurement of the differential
g-factor of the 1S0-
3P0 clock transition in
87Sr, yield-
ing µ0δg = −108.4(4)mF Hz/G, as well as upper limit for
the differential and exited state tensor shift coefficients
∆κT = 0.02 Hz/(UT /ER) and κ
e = 0.01 Hz/(UT /ER).
We have demonstrated a polarizing scheme which should
allow control of the nuclear spin related effects in the 87Sr
lattice clock to well below 10−17.
We thank T. Ido for help during the early stages of
the g-factor measurement, and G. K. Campbell and A.
Pe’er for careful reading of the manuscript. This work
was supported by ONR, NIST, and NSF. Andrew Lud-
low acknowledges support from NSF-IGERT through the
OSEP program at the University of Colorado.
[1] S. A. Diddams et al., Science 306, 1318 (2004).
[2] M. Takamoto et al., Nature 435, 321 (2005).
[3] M. M. Boyd et al., Science 314, 1430 (2006).
[4] C. W. Hoyt et al., in Proceedings of the 20th European
Frequency and Time Forum, Braunschweig, Germany,
March 27-30, p. 324-328 (2006).
[5] A. D. Ludlow et al., Phys. Rev. Lett. 96, 033003 (2006).
[6] M. M. Boyd et al., Phys. Rev. Lett. 98, 083002 (2007).
[7] R. Le Targat et al., Phys. Rev. Lett. 97, 130801 (2006).
[8] M. Takamoto et al., J. Phys. Soc. Japan 75, 10 (2006).
[9] H. Katori et al., Phys. Rev. Lett. 91, 173005 (2003).
[10] V. Ovsiannikov et al., Quantum Electron. 36, 3 (2006).
[11] S. G. Porsev et al., Phys. Rev. A 69, 021403(R) (2004).
[12] A. Brusch et al., Phys. Rev. Lett. 96, 103003 (2006).
[13] W. H. Oskay et al., Phys. Rev. Lett. 97, 020801 (2006).
[14] L. Childress et al., Science 314, 281 (2006).
[15] D. Hayes, P. S. Julienne, and I. H. Deutsch, Phys. Rev.
Lett. 98, 070501 (2007)
[16] I. Reichenbach and I. Deutsch, quant-ph/0702120.
[17] R. Santra et al., Phys. Rev. Lett. 94, 173002 (2005).
[18] T. Hong et al., Phys. Rev. Lett. 94, 050801 (2005).
[19] Z. W. Barber et al., Phys. Rev. Lett. 96, 083002 (2006).
[20] A. V. Taichenachev et al., Phys. Rev. Lett. 96, 083001
(2006).
[21] T. Zanon-Willette et al., Phys. Rev. Lett. 97, 233001
(2006).
[22] V. D. Ovsiannikov et al., Phys. Rev. A 75, 020501 (2007).
[23] G. Breit and L. A. Wills, Phys. Rev. 44, 470 (1933).
[24] A. Lurio, M. Mandel, and R. Novick, Phys. Rev. 126,
1758 (1962).
[25] K. R. Vogel et al., IEEE Trans. on Inst. and Meas. 48,
618 (1999).
[26] T. Mukaiyama et al., Phys. Rev. Lett. 90, 113002 (2003).
[27] T. H. Loftus et al., Phys. Rev. Lett. 93, 073003 (2004).
[28] T. H. Loftus et al., Phys. Rev. A 70, 063413 (2004).
[29] N. Poli et al., Phys. Rev. A 71, 061403 (2005).
[30] E. A. Curtis, C. W. Oates, and L. Hollberg, Phys. Rev.
A 64, 031403 (2001).
[31] T. Binnewies et al., Phys. Rev. Lett. 87, 123002 (2001)
[32] T. Zelevinsky et al., Phys. Rev. Lett. 96, 203201 (2006).
[33] S. Tojo et al., Phys. Rev. Lett. 96, 153201 (2006).
[34] R. Ciury lo et al., Phys. Rev. A 70, 062710 (2004).
[35] C. Degenhardt et al., Phys. Rev. A 72, 062111 (2005).
[36] G. Wilpers et al., Appl. Phys. B 85, 31 (2006).
[37] T. Ido et al., Phys. Rev. Lett. 94, 153001 (2005).
[38] M. V. Romalis et al., Phys. Rev. Lett. 86, 2505 (2001).
[39] T. Becker et al., Phys. Rev. A 63, 051802 (2001).
[40] S. G. Porsev and A. Derevianko, Phys. Rev. A 69, 042506
(2004).
[41] R. Santra et al., Phys. Rev. A 69, 042510 (2004).
[42] L. Olschewski, Z. Phys. 249, 205 (1972).
[43] H. Kopfermann, Nuclear Moments, New York, (1963).
[44] B. Lahaye and J. Margerie, J. Physique 36, 943 (1975).
[45] X. Baillard et al., physics/0703148.
[46] S. Bize et al., J. Phys. B 38, S449 (2005).
[47] T. P. Heavner et al., Metrologia 42, 411 (2005).
[48] H. S. Margolis et al., Science 306, 1355 (2004).
[49] P. Dubé et al., Phys. Rev. Lett. 95, 033001 (2005).
[50] T. Schneider et al., Phys. Rev. Lett. 94, 230801 (2005).
[51] P. J. Blythe et al., Phys. Rev. A 67, 020501 (2003).
[52] M. V. Romalis and E. N. Fortson, Phys. Rev. A 59, 4547
(1999).
[53] A. D. Ludlow et al., Opt. Lett. 32, 641 (2007).
[54] P. Mickelson et al., Phys. Rev. Lett. 95, 223002 (2005).
[55] H. J. Kluge and H. Sauter, Z. Physik 270, 295 (1974).
[56] A. Lurio, Phys. Rev. 142, 46 (1966).
[57] T. Rosenband et al., Phys. Rev. Lett. 98, 220801 (2007).
[58] X. Xu et al., J. Opt. Soc. Am. B 20, 5 (2003).
[59] Unpublished HFI coefficients extracted from Ref. [41], R.
Santra private communication.
[60] S. G. Porsev and A. Derevianko, Phys. Rev. A 74, 020502
(2006).
[61] G. Breit and I. I. Rabi, Phys. Rev. 38, 2082 (1932).
[62] S. M. Heider and G. O. Brink, Phys. Rev. A. 16, 1371
(1977).
[63] G. zu Putlitz, Z. Phys. 175, 543 (1963).
http://arxiv.org/abs/quant-ph/0702120
http://arxiv.org/abs/physics/0703148
VII. APPENDIX
The appendix is organized as follows, in the first sec-
tion we briefly describe calculation of the mixing coeffi-
cients needed to estimate the effects discussed in the main
text. We also include relevant Zeeman matrix elements.
In the second section we describe a perturbative treat-
ment of the magnetic field on the hyperfine-mixed 3P0
state, resulting in a Breit-Rabi like formula for the clock
transition. In the final section we solve the more general
case and treat the magnetic field and hyperfine interac-
tion simultaneously, which is necessary to calculate the
sensitivity of the 1P1,
3P1 and
3P2 states.
A. State mixing coefficients and Zeeman elements
The intermediate coupling coefficients α and β are typ-
ically calculated from measured lifetimes and transition
frequencies of the 1P1 and
3P1 states and a normalization
constraint, resulting in
= 1. (15)
The HFI mixing coefficients α0, β0, and γ0 are due to
the interaction between the pure 3P0 state and the spin-
orbit mixed states in Eq. 1 having the same total angular
momentum F . They are defined as
〈3P1, F = I |HA|
3P 00 , F = I〉
ν3P0 − ν3P1
〈1P1, F = I |HA|
3P 00 , F = I〉
ν3P0 − ν1P1
〈3P2, F = I |HQ|
3P 00 , F = I〉
ν3P0 − ν3P2
Where HA and HQ are the magnetic dipole and electric
quadrupole contributions of the hyperfine Hamiltonian.
A standard technique for calculating the matrix elements
is to relate unknown radial contributions of the wavefunc-
tions to the measured hyperfine magnetic dipole (A) and
electric quadrupole (Q) coefficients. Calculation of the
matrix elements using BW theory [23, 24, 39, 44, 55] can
be performed using the measured hyperfine splitting of
the triplet state along with matrix elements provided in
[24]. Inclusion of the 1P1 data (and an accurate predic-
tion of β0) requires a modified BW theory [44, 55, 56]
where the relation between the measured hyperfine split-
ting and the radial components is more complex but man-
ageable if the splitting data for all of the states in the
nsnp manifold are available. A thorough discussion of
the two theories is provided in Refs. [44, 55].
Zeeman matrix elements for singlet and triplet states in
the nsnp configuration have been calculated in Ref. [24].
Table III summarizes those elements relevant to the work
here, where the results have been simplified by using the
electronic quantum numbers for the alkaline-earth case,
but leaving the nuclear spin quantum number general
for simple application to different species. Note that the
results include the application of our sign convention in
Eq. 5 which differs from that in Ref. [24].
B. Magnetic field as a perturbation
To determine the magnetic sensitivity of the 3P0 state
due to the hyperfine interaction with the 3P1 and
states, we first use a perturbative approach to add the
Zeeman interaction as a correction to the |3P0〉 state in
Eq. 3. The resulting matrix elements depend on spin-
orbit and hyperfine mixing coefficients α, β, α0, β0, and
γ0. For the
3P0 state, diagonal elements to first order in
α0 and β0 are relevant, while for
1P1 and
3P1, the contri-
bution of the hyperfine mixing to the diagonal elements
can be ignored. All off-diagonal terms of order β2, α0α,
α0β, α
, and smaller can be neglected. Due to the selec-
tion rules for pure (LS) states, the only contributions of
the 3P2 hyperfine mixing are of order α0γ0, γ
, and β0γ0.
Thus the state can be ignored and the Zeeman interac-
tion matrixMz between atomic P states can be described
in the
|1P1, F,mF 〉, |
3P0, F,mF 〉, |
3P1, F,mF 〉
basis as
ν1P1 M
ν3P0 M
, (17)
where we define diagonal elements as
ν3P0 = ν
0 |HZ |
+ 2(αα0 − ββ0)〈
1 , F = I |HZ |
ν3P1 = ν
1 , F
|HZ |
1 , F
1 , F
|HZ |
1 , F
ν1P1 = ν
1 , F
|HZ |
1 , F
1 , F
|HZ |
1 , F
Off diagonal elements are given by
|〈3P 01 , F
′|HZ |3P 00 , F 〉|
|〈3P 00 , F |HZ |
3P 01 , F
′〉|2.
TABLE III: Zeeman Matrix Elements for Pure (2S+1L0J ) States
Relevant Elements for the
3P0 State:
〈3P 00 , F = I |HZ|
3P 00 , F = I〉= −gImFµ0B
〈3P 00 , F = I |HZ|
3P 01 , F
′ = I〉 =(gs − gl)mFµ0B
3I(I+1)
〈3P 00 , F = I |HZ|
3P 01 , F
′ = I + 1〉 =(gs − gl)µ0B
((I+1)2−m2
)(4I+6)
3(I+1)(4(I2+1)−1)
〈3P 00 , F = I |HZ|
3P 01 , F
′ = I − 1〉 =(gs − gl)µ0B
(I2−m2
)(4I−2)
3I(4I2−1)
Relevant Diagonal Elements within
3P1 Manifold:
〈3P 01 , F = I |HZ|
3P 01 , F = I〉=
gl+gs−gI (2I(I+1)−2)
2I(I+1)
mFµ0B
〈3P 01 , F = I + 1|HZ |
3P 01 , F = I + 1〉=
gl+gs−2gII
2(I+1)
mFµ0B
〈3P 01 , F = I − 1|HZ |
3P 01 , F = I − 1〉=
gl+gs+2gI (I+1)
mFµ0B
Relevant Diagonal Elements within
1P1 Manifold:
〈1P 01 , F = I |HZ|
1P 01 , F = I〉=
gl−gI (I(I+1)−1)
I(I+1)
mFµ0B
〈1P 01 , F = I + 1|HZ |
1P 01 , F = I + 1〉=
gl−gII
(I+1)
mFµ0B
〈1P 01 , F = I − 1|HZ |
1P 01 , F = I − 1〉=
gl+gI(I+1)
mFµ0B
The eigenvalues of Eq. 17 can be written analytically as
three distinct cubic roots
ν20 + 3ν
arccos
2ν30 + 9ν0ν
1 + 27ν
2(ν20 + 3ν
νmF ≡ν3P0,mF =
ν20 + 3ν
arccos
2ν30 + 9ν0ν
1 + 27ν
2(ν20 + 3ν
where we have
ν0 =ν3P0 + ν3P1 + ν1P1
−ν3P0ν3P1 − ν3P1ν1P1 − ν3P0ν1P1 + (M
ν3P0ν3P1ν1P1 − ν3P1(M
− ν1P1(M
Since the main goal is a description of the 3P0 state sen-
sitivity, the solution can be simplified when one considers
the relative energy spacing of the three states, and that
elements having terms β, αβ, and smaller are negligible
compared to those proportional to only α. Therefore we
can ignore M
terms and find simplified eigenvalues
arising only from the interaction between 3P1 and
that can be expressed as a Breit-Rabi like expression for
the 3P0 state given by
ν3P0,mF =
ν3P0 + ν3P1
ν3P0 − ν3P1
1 + 4
α2|〈3P 00 , F |HZ|
3P 01 , F
(ν3P0 − ν3P1)
For magnetic fields where the Zeeman effect is small com-
pared to the fine-structure splitting, the result is identi-
cal to that from Eq. 8 of the main text. The magnetic
0 10 20 30 40 50 60 70 80
F=9/2
F=11/2
F=7/2
Magnetic Field(G)
FIG. 10: (color online) Magnetic sensitivity of the 1P1 state
calculated with the expression in Eq. 24 using A = −3.4 MHz
and Q = 39 MHz [55]. Note the inverted level structure.
sensitivity of the clock transition (plotted in Fig. 2) is de-
termined by simply subtracting the 〈3P 00 |HZ |
3P 00 〉 term
which is common to both states.
C. Full treatment of the HFI and magnetic field
For a more complete treatment of the Zeeman effect
we can relax the constraint of small fields and treat the
hyperfine and Zeeman interactions simultaneously using
the spin-orbit mixed states in Eq. 1 as a basis. The total
Hamiltonian is written Htotal = HZ+HA+HQ including
hyperfine HA and quadrupole HQ effects in addition to
the Zeeman interaction HZ defined in Eq. 5 of the main
0 500 1000 1500 2000 2500 3000
 Magnetic Field (G)
F=11/2
F=9/2
F=7/2
FIG. 11: (color online) Magnetic sensitivity of the 3P1 state
calculated with the expression in Eq. 24 using A = −260 MHz
and Q = −35 MHz [63].
text. The Hamiltonian Htotal can be written as
Htotal =HZ + A~I · ~J
~I · ~J(2~I · ~J + 1) − IJ(I + 1)(J + 1)
2IJ(2I − 1)(2J − 1)
Diagonalization of the full space using Eq. 23 does not
change the 3P0 result discussed above, even for fields
as large as 104 G. This is not surprising since the 3P0
state has only one F level, and is therefore only af-
fected by the hyperfine interaction through state mix-
ing which was already accounted for in the previous cal-
culation. Alternatively, for an accurate description of
the 1P1,
3P1 and
3P2 states, Eq. 23 must be used. For
an alkaline-earth 2S+1L1 state in the |I, J, F,mF 〉 basis
we find an analytical expression for the field dependence
of the F = I, I ± 1 states and sublevels. The solution
is identical to Eq. 20 except we replace the frequencies
in Eq. 21 with those in Eq. 24. We define the relative
strengths of magnetic, hyperfine, and quadrupole inter-
actions with respect to an effective hyperfine-quadrupole
coupling constant WAQ = A +
4I(1−2I)
as XBR =
, and XQ =
I(1−2I)WAQ
, respectively. The so-
lution is a generalization of the Breit-Rabi formula [61]
for the 2S+1L1 state in the two electron system with nu-
clear spin I. The frequencies are expanded in powers of
XBR as
ν0 = −2WAQ
mFXBR
ν1 = WAQ
2(geff − gI)XA + 3geffXQ
mFXBR +
(geff + gI)
3m2F g
(geff+gI)
ν2 = WAQ
I(I + 1)X
I(I+1)
3(1−2I)(3+2I)
I(I+1)
−XAXQ
geff(2 −
2I(I+1)
) + gI
mFXBR
2gIgeff
I(I+1)
(geff+gI )
3m2F g
I(I+1)(geff+gI)
gI((geff+gI )
2−(gImF )
I(I+1)
with abbreviations
=I(I + 1)
− I(I + 1)XQ(XA − 1)
=Xeff
XQXeff +
X2A −X
3(3 + 2I)(1 − 2I)
Xeff =XA +XQ
(3 + 2I)(1 − 2I)
geff =
− gs)
(L(L+ 1)− S(S + 1)) .
The resulting Zeeman splitting of the 5s5p1P1 and
5s5p3P1 hyperfine states in
87Sr is shown in Fig. 10 and
Fig. 11. For the more complex structure of 3P2, we
have solved Eq. 23 numerically, with the results shown in
Fig. 12. The solution for the 1P1 state depends strongly
on the quadrupole (Q) term in the Hamiltonian, while
for the 3P1 and
3P2 states the magnetic dipole (A) term
is dominant.
0 500 1000 1500 2000 2500 3000
10 F=5/2
F=7/2
F=11/2
F=13/2
F=9/2
Magnetic Field  (G)
FIG. 12: (color online) Magnetic sensitivity of the 3P1 state
calculated numerically with Eq. 23 using A=-212 MHz and
Q=67 MHz [62].
ABSTRACT
  We present a detailed experimental and theoretical study of the effect of
nuclear spin on the performance of optical lattice clocks. With a state-mixing
theory including spin-orbit and hyperfine interactions, we describe the origin
of the $^1S_0$-$^3P_0$ clock transition and the differential g-factor between
the two clock states for alkaline-earth(-like) atoms, using $^{87}$Sr as an
example. Clock frequency shifts due to magnetic and optical fields are
discussed with an emphasis on those relating to nuclear structure. An
experimental determination of the differential g-factor in $^{87}$Sr is
performed and is in good agreement with theory. The magnitude of the tensor
light shift on the clock states is also explored experimentally. State specific
measurements with controlled nuclear spin polarization are discussed as a
method to reduce the nuclear spin-related systematic effects to below
10$^{-17}$ in lattice clocks.

<|endoftext|><|startoftext|>
Introduction
There has recently been considerable interest, both theoretical [16, 18, 19, 3,
13] and experimental [20], in invisibility (or “cloaking”) from observation by
electromagnetic (EM) waves. (See also [17] for a treatment of cloaking in the
context of elasticity.) Theoretically, cloaking devices are given by specifying
the conductivity σ(x) (in the case of electrostatics), the index of refraction
n(x) (for optics in the absence of polarization, where one uses the Helmholtz
equation), or the electric permittivity ǫ(x) and magnetic permeability µ(x)
(for the full system of Maxwell’s equations.) In the constructions to date,
the EM parameter fields ( σ;n; ǫ and µ ) have been piecewise smooth and
anisotropic. (See, however, [5, Sec.4] for an example that can be interpreted
as cloaking with respect to Helmholtz by an isotropic negative index of re-
fraction material.) Furthermore, the EM parameters have singularities, with
one or more eigenvalues of the tensors going to zero or infinity as one ap-
proaches from on or both sides the cloaking surface Σ, which encloses the
region within which objects may be hidden from external observation. Such
constructions might have remained theoretical curiosities, but the advent of
metamaterials[1] allows one, within the constraints of current technology, to
construct media with fairly arbitrary ǫ(x) and µ(x).
It thus becomes an interesting mathematical problem with practical signif-
icance to understand what other new phenomena of wave propagation can
be produced by prescribing other arrangements of ǫ and µ. Geometrically,
cloaking can be viewed as arising from a singular transformation of R3. In-
tuitively, for a spherical cloak [6, 7, 18], it is as if an infinitesimally small
hole in space has been stretched to a ball D; an object can be inserted in-
side the hole so created and is then invisible to external observations. On
the level of the EM parameters, homogeneous, isotropic parameters ǫ, µ are
pushed forward to become inhomogeneous, anisotropic and singular as one
approaches Σ = ∂D from the exterior. There are then two ways, referred to
as the single and double coating in [3], of continuing ǫ, µ to within D so as to
rigorously obtain invisibility with respect to locally finite energy waves. We
refer to either process as blowing up a point. As observed in [3], one can use
the double coating to produce a manifold with a different topology, but with
the change in topology invisible to external measurements.
To define the solutions of Maxwell’s equations rigorously in the single coating
case, one has to add boundary conditions on Σ. Physically, this corresponds
to the lining of the interior of the single coating material, e.g., in the case
of blowing up a point, with a perfectly conducting layer, see [3]. We point
out here that in the recent preprint [21], the single coating construction is
supplemented with selfadjoint extensions of Maxwell operators in the interior
of the cloaked regions; these implicitly impose interior boundary conditions
on the boundary of the cloaked region, similar to the PEC boundary condi-
tion suggested in [3]. For the case of an infinite cylinder the Soft-and-Hard
(SH) interior boundary condition is used in [3] to guarantee cloaking of active
objects, and is needed even for passive ones.
In this paper, we show how more elaborate geometric constructions, cor-
responding to blowing up a curve, enable the description of tunnels which
allow the passage of waves between distant points, while only the ends of
the tunnels are visible to external observation. These devices function as
electromagnetic wormholes, essentially changing the topology of space with
respect to solutions of Maxwell’s equations.
We form the wormhole device around an obstacle K ⊂ R3 as follows. First,
one surrounds K with metamaterials, corresponding to a specification of EM
parameters ε̃ and µ̃. Secondly, one lines the surface of K with material im-
plementing the Soft-and-Hard (SH) boundary condition from antenna theory
[8, 10, 11]; this condition arose previously [3] in the context of cloaking an
infinite cylinder. The EM parameters, which become singular as one ap-
proaches K, are given as the pushforwards of nonsingular parameters ε and
µ on an abstract three-manifold M , described in Sec. 2. For a curve γ ⊂ M ,
we construct the diffeomorphism F fromM \γ to the wormhole device in Sec.
3. For the resulting EM parameters ε̃ and µ̃, we have singular coefficients of
Maxwell’s equations at K, and so it is necessary to formulate an appropriate
notion of locally finite energy solutions (see Def. 4.1). In Theorem 4.2, we
then show that there is a perfect correspondence between the external mea-
surements of EM waves propagating through the wormhole device and those
propagating on the wormhole manifold.
It was shown in [3] that the cloaking constructions are mathematically valid
at all frequencies k. However, both cloaking and the wormhole effect stud-
ied here should be considered as essentially monochromatic, or at least very
narrow-band, using current technology, since, from a practical point of view
the metamaterials needed to implement the constructions have to be fabri-
cated and assembled with a particular wavelength in mind, and theoretically
are subject to significant dispersion [18]. Thus, as for cloaking in [16, 18, 3],
here we describe the wormhole construction relative to electromagnetic waves
at a fixed positive frequency k. We point out that the metamaterials used in
the experimental verification of cloaking [20] should be readily adaptable to
yield a physical implementation, at microwave frequencies, of the wormhole
device described here. See Remark 1 in Sec. 4.2 for further discussion.
The results proved here were announced in [4].
2 The wormhole manifold M
First we explain, somewhat informally, what we mean by a wormhole. The
concept of a wormhole is familiar from general relativity [9, 22], but here
we define a wormhole as an object obtained by gluing together pieces of
Euclidian space equipped with certain anisotropic EM parameter fields. We
start by describing this process heuristically; later, we explain more precisely
how this can be effectively realized vis-a-vis EM wave propagation using
metamaterials.
We first describe the wormhole as an abstract manifold M , see Fig. 1; in
the next section we will show how to realize this concretely in R3, as a
wormhole device N . Start by making two holes in the Euclidian space R3 =
{(x, y, z)|x, y, z ∈ R}, say by removing the open ball B1 = B(O , 1) with
center at the origin O and of radius 1, and also the open ball B2 = B(P, 1),
where P = (0, 0, L) is a point on the z-axis having the distance L > 3 to
the origin. We denote by M1 the region so obtained, M1 = R
3 \ (B1 ∪ B2),
which is the first component we need to construct a wormhole. Note that
M1 is a 3-dimensional manifold with boundary, the boundary of M1 being
∂M1 = ∂B1∪∂B2, the union of two 2-spheres. Thus, ∂M1 can be considered
as a disjoint union S2 ∪ S2, where we will use S2 to denote various copies of
the two-dimensional unit sphere.
The second component needed is a 3−dimensional cylinder, M2 = S2× [0, 1].
This cylinder can be constructed by taking the closed unit cube [0, 1]3 in R3
and, for each value of 0 < s < 1, gluing together, i.e., identifying, all of the
points on the boundary of the cube with z = s. Note that we do not identify
points at the top of the boundary, at z = 1, or at the bottom, at z = 0. We
then glue together the boundary ∂B(O , 1) ∼ S2 of the ball B(O , 1) with the
lower end (boundary component) S2×{0} of M2, and the boundary ∂B(P, 1)
with the upper end, S2 × {1}. In doing so we identify the point (0, 0, 1) ∈
∂B(O , 1) with the point NP × {0} and the point (0, 0, L − 1) ∈ ∂B(P, 1)
with the point NP × {1}, where NP is the north pole on S2.
The resulting manifold M no longer lies in R3, but rather is the connected
sum of the components M1 and M2 and has the topology of R
3 with a
3−dimensional handle attached. Note that adding this handle makes it pos-
sible to travel from one point in M1 to another point in M1, not only along
curves lying in M1 but also those in M2.
To consider Maxwell’s equations on M , let us start with Maxwell’s equations
on R3 at frequency k ∈ R, given by
∇× E = ikB, ∇×H = −ikD, D(x) = ε(x)E(x), B(x) = µ(x)H(x).
Here E and H are the electric and magnetic fields, D and B are the electric
displacement field and the magnetic flux density, ε and µ are matrices corre-
sponding to permittivity and permeability. As the wormhole is topologically
different from the Euclidian space R3, we use a formulation of Maxwell’s
equations on a manifold, and as in [3], do this in the setting of a general Rie-
mannian manifold, (M, g). For our purposes, as in [14, 3] it suffices to use
ε, µ which are conformal, i.e., proportional by scalar fields, to the metric g.
In this case, Maxwell’s equations can be written, in the coordinate invariant
form, as
dE = ikB, dH = −ikD, D = ǫE, B = µH in M,
where E,H are 1-forms, D,B are 2-forms, d is the exterior derivative, and ǫ
and µ are scalar functions times the Hodge operator of (M, g), which maps
1-forms to the corresponding 2-forms [2]. In local coordinates these equations
are written in the same form as Maxwell’s equations in Euclidian space with
matrix valued ε and µ. Although not necessary, for simplicity one can choose
a metric on the wormhole manifold M which is Euclidian on M1, and on M2
is the product of a given metric g0 on S
2 and the standard metric of [0, 1].
More generally, can also choose the metric on M2 to be a warped product.
Even the simple choice of the product of the standard metric of S2 and the
metric δ2ds2, where δ is the “length” of the wormhole, gives rise to interesting
ray-tracing effects for rays passing through the wormhole tunnel. For δ << 1,
the image through one end of the wormhole (of the region beyond the other
end) would resemble the image in a a fisheye lens; for δ & 1, multiple images
and greater distortion occur. (See [4, Fig.2].)
The proof of the wormhole effect that we actually give is for yet another
variation, where the balls that form the ends have their boundary spheres
flattened; this may be useful for applications, since it allows for there to be a
vacuum (or air) in a neighborhood of the axis of the wormhole, so that, e.g.,
instruments may be passed through the wormhole. We next show how to
construct, using metamaterials, a device N in R3 that effectively realizes the
geometry and topology of M , relative to solutions of Maxwell’s equations at
frequency k, and hence functions as an electromagnetic wormhole.
3 The wormhole device N in R3
We now explain how to construct a “device” N in R3, i.e., a specification
of permittivity ε and permeability µ, which affects the propagation of elec-
tromagnetic waves in the same way as the presence of the handle M2 in the
wormhole manifold M . What this means is that we prescribe a configuration
of metamaterials which make the waves behave as if there were an invisible
tube attached to R3, analogous to the handle M2 in the wormhole manifold
M . In the other words, as far as external EM observations of the wormhole
device are concerned, it appears as if the topology of space has been changed.
We use cylindrical coordinates (θ, r, z) corresponding to a point (r cos θ, r sin θ, z)
in R3. The wormhole device is built around an obstacle K ⊂ R3. To de-
fine K, let S be the two-dimensional finite cylinder {θ ∈ [0, 2π], r = 2, 0 ≤
z ≤ L} ⊂ R3. The open region K consists of all points in R3 that have
distance less than one to S and has the shape of a long, thick-walled tube
with smoothed corners.
Let us first introduce a deformation map F from M to N = R3 \K or, more
precisely, from M \γ to N \Σ, where γ is a closed curve in M to be described
shortly and Σ = ∂K. We will define F separately on M1 and M2 denoting
the corresponding parts by F1 and F2.
To describe F1, let γ1 be the line segment on the z−axis connecting ∂B(O , 1)
and ∂B(P, 1) in M1, namely, γ1 = {r = 0, z ∈ [1, L − 1]}. Let F1(r, z) =
(θ, R(r, z), Z(r, z)) be such that (R(r, z), Z(r, z)), shown in Fig. 2,
PSfrag replacements
Figure 1: Schematic figure: a wormhole manifold is glued from two com-
ponents, the “handle” and space with two holes. Note that in the actual
construction, the components are three dimensional.
PSfrag replacements
A B C D A
Figure 2: The map (R(r, z), Z(r, z)) in cylindrical coordinates (z, r).
transforms in the (r, z) coordinates the semicircles AB and CD in the left
picture to the vertical line segments A′B′ = {r ∈ [0, 1], z = 0} and C ′D′ =
{r ∈ [0, 1], z = L} in the right picture and the cut γ1 on the left picture to the
curve B′C ′ on the right picture. This gives us a map F1 : M1 \ γ1 → N1 \Σ,
where the closed region N1 in R
3 is obtained by rotation of the region exterior
to the curve A′B′C ′D′ around the z−axis. We can choose F1 so that it is
the identity map in the domain U = R3 \ {−2 ≤ z ≤ L+ 2, 0 ≤ r ≤ 4}.
To describe F2, consider the line segment, γ2 = {NP} × [0, 1] on M2 . The
sphere without the north pole can be ”flattened” and stretched to an open
disc with radius one which, together with stretching [0, 1] to [0, L], gives us a
map F2 from M2 \γ2 to N2\Σ. The region N2 is the 3−dimensional cylinder,
N2 = {θ ∈ [0, 2π], r ∈ [0, 1], z ∈ [0, L]}. When flattening S2 \ NP , we do
it in such a way that F1 on ∂B(O , 1) and ∂B(P, 1) coincides with F2 on
(S2 \NP )× {0} and (S2 \NP )× {1}, respectively.
Thus, F maps M \ γ, where γ = γ1 ∪ γ2 is a closed curve in M , onto N \ Σ;
in addition, F is the identity on the region U .
Now we are ready to define the electromagnetic material parameter tensors
on N . We define the permittivity to be
ε̃ = F∗ε(y) =
(DF )(x)· ε(x)· (DF (x))t
det(DF )
x=F−1(y)
where DF is the derivative matrix of F , and similarly the permeability to
be µ̃ = F∗µ. These deformation rules are based on the fact that permittivity
and permeability are conductivity type tensors, see [14].
Maxwell’s equations are invariant under smooth changes of coordinates. This
means that, by the chain rule, any solution to Maxwell’s equations in M \ γ,
endowed with material parameters ε, µ becomes, after transformation by F ,
a solution to Maxwell’s equations in N \Σ with material parameters ε̃ and µ̃,
and vice versa. However, when considering the fields on the entire spaces M
and N , these observations are not enough, due to the singularities of ε̃ and µ̃
near Σ; the significance of this for cloaking was observed and analyzed in [3].
In the following, we will show that the physically relevant class of solutions to
Maxwell’s equations, namely the (locally) finite energy solutions, remains the
same, with respect to the transformation F , in (M ; ε, µ) and (N ; ε̃, µ̃). One
can analyze the rays in M and N endowed with the electromagnetic wave
propagation metrics g =
εµ and g̃ =
ε̃µ̃, respectively. Then the rays on
M are transformed by F into the rays in N . As almost all the rays on M do
not intersect with γ, therefore, almost all the rays on N do not approach Σ.
This was the basis for [16, 18] and was analyzed further in [19]; see also [17]
for a similar analysis in the context of elasticity. Thus, heuristically one is
led to conclude that the electromagnetic waves on (M ; ε, µ) do not feel the
presence of γ, while those on (N ; ε̃, µ̃) do not feel the presence of K, and
these waves can be transformed into each other by the map F .
Although the above considerations are mathematically rigorous, on the level
both of the chain rule and of high frequency limits, i.e., ray tracing, in the
exteriors M \ γ and N \Σ, they do not suffice to fully describe the behavior
of physically meaningful solution fields on M and N . However, by carefully
examining the class of the finite-energy waves in M and N and analyzing
their behavior near γ and Σ, respectively, we can give a complete analysis,
justifying the conclusions above. Let us briefly explain the main steps of the
analysis using methods developed for theory of invisibility (or cloaking) at
frequency k > 0 [3] and at frequency k = 0 in [6, 7]. The details will follow.
First, to guarantee that the fields in N are finite energy solutions and do not
blow up near Σ, we have to impose at Σ the appropriate boundary condition,
namely, the Soft-and-Hard (SH) condition, see [8, 11],
eθ ·E|Σ = 0, eθ ·H|Σ = 0,
where eθ is the angular direction. Secondly, the map F can be considered
as a smooth coordinate transformation on M \ γ; thus, the finite energy
solutions on M \ γ transform under F into the finite energy solutions on
N \ Σ, and vice versa. Thirdly, the curve γ in M has Hausdorff dimension
equal to one. This implies that the possible singularities of the finite energy
electromagnetic fields near γ are removable [12], that is, the finite energy
fields in M \ γ are exactly the restriction to M \ γ of the fields defined on all
of M .
Combining these steps we can see that measurements of the electromagnetic
fields on (M ; ε, µ) and on (R3 \K; ε̃, µ̃) coincide in U . In the other words, if
we apply any current on U and measure the radiating electromagnetic fields
it generates, then the fields on U in the wormhole manifold (M ; ε, µ) coincide
with the fields on U in (R3 \K; ε̃, µ̃), 3-dimensional space equipped with the
wormhole device construction.
Summarizing our construction, the wormhole device consists of the metama-
terial coating of the obstacle K. This coating should have the permittivity
ε̃ and permeability µ̃. In addition, we need to impose the SH boundary
condition on Σ, which may be realized by fabricating the obstacle K from a
perfectly conducting material with parallel corrugations on its surface [8, 11].
In the next section, the permittivity ε̃ and and permeability µ̃ are described
in a rather simple form. (As mentioned earlier, in order to allow for a tube
around the axis of the wormhole to be a vacuum or air, we deal with a
slightly different construction than was described above, starting with flat-
tened spheres). It should be possible to physically implement an approxima-
tion to this mathematical idealization of the material parameters needed for
the wormhole device, using concentric rings of split ring resonators as in the
experimental verification of cloaking obtained in [20].
4 Rigorous construction of the wormhole
Here we present a rigorous model of a typical wormhole device and justify
the claims above concerning the behavior of the electromagnetic fields in the
wormhole device in R3 in terms of as the fields on the wormhole manifold
(M, g).
4.1 The wormhole manifold (M, g) and the wormhole
device N
Here we prove the wormhole effect for a variant of the wormhole device
described in the previous sections. Instead of using a round sphere S2 as
before, we present a construction that uses a deformed sphere S2flat that is
flat the near the south and north poles, SP and NP . This makes it possible
to have constant isotropic material parameters near the z-axis located inside
the wormhole. For possible applications, see [4].
We use following notations. Let (θ, r, z) ∈ [0, 2π]×R+×R be the cylindrical
coordinates of R3, that is the map
X : (θ, r, z) → (r cos θ, r sin θ, z)
that maps X : [0, 2π] × R+ × R → R3. In the following, we identify [0, 2π]
and the unit circle S1.
Let us start by removing from R3 two “deformed” balls which have flat
portions near the south and north poles. More precisely, let M1 = R
3 \ (P1∪
P2), where in the cylindrical coordinates
P1 = {X(θ, r, z) : −1 ≤ z ≤ 1, 0 ≤ r ≤ 1}
∪{X(θ, r, z) : (r − 1)2 + z2 ≤ 1},
P2 = {X(θ, r, z) : 1 ≤ z − L ≤ 1, 0 ≤ r ≤ 1}
∪{X(θ, r, z) : (r − 1)2 + (z − L)2 ≤ 1}.
We say that the boundary ∂P1 of P1 is a deformed sphere with flat portions,
and denote it by S2flat. We say that the intersection points of S
flat with the
z-axis are the north pole, NP , and the south pole, SP .
Let g1 be the metric on M1 inherited from R
3, and let γ1 be the path
γ1 = {X(0, 0, z) : 1 < z < L− 1} ⊂ M1.
A1 = M1 \ V1/4,
Vt = {X(θ, r, z) : 0 ≤ r ≤ t, 1 < z < L− 1}, 0 < t < 1,
and consider a map G0 : M1 \γ1 → A1; see Fig. 3. G0 defined as the identity
map on M1 \ V1/2 and, in cylindrical coordinates, as
G0(X(θ, r, z)) = X(θ,
, z), (θ, r, z) ∈ V1/2.
Clearly, G0 is C
0,1−smooth.
Let U(x) ∈ R3×3, x = X(θ, r, z), be the orthogonal matrix that maps the
standard unit vectors e1, e2, e3 of R
3 to the Euclidian unit vectors correspond-
ing to the θ, r, and z directions, that is,
U(x)e1 = (− sin θ, cos θ, 0), U(x)e2 = (cos θ, sin θ, 0), U(x)e3 = (0, 0, 1).
Then the differential of G0 in the Euclidian coordinates at the point x ∈ V1/2
is the matrix
DG0(x)U(y)
) 0 0
0 0 1
U(x)−1, x = X(θ, r, z), y = G0(x). (1)
Later we impose on part of the boundary, Σ0 = ∂A1 ∩ {1 < z < L− 1}, the
soft-and-hard boundary condition (marked red in the figures).
Next, let (θ, z, τ) = (θ(x), z(x), τ(x)) be the Euclidian boundary normal
coordinates associated to Σ0, that is, τ(x) = distR3(x,Σ0) and (θ(x), z(x))
are the θ and z-coordinates of the closest point of Σ0 to x.
Denote by (G0)∗g1 the push forward of the metric g1 in G0, that is, the
metric obtained from g1 using the change of coordinates G0, see [2]. The
metric (G0)∗g1 coincides with g1 in A1 \ V1/2, and in the Euclidian boundary
normal coordinates of Σ0, on A1 ∩ V1/2, the metric (G0)∗g1, has the length
element
ds2 = 4τ 2 dθ2 + dz2 + 4dτ 2.
PSfrag replacements
Figure 3: A schematic figure on the map G0, considered in the (r, z) coor-
dinates. Later, we impose the SH boundary condition on the portion of the
boundary coloured red.
Next, let
q3 = conv
{(r, z) : (r − 2)2 + (z − (−2))2 ≤ 1}
∪{(r, z) : (r − 2)2 + (z − (L+ 2))2 ≤ 1}
q4 = {(r, z) : 0 ≤ r ≤ 1, −1 ≤ z ≤ L+ 1},
where conv(q) denotes the convex hull of the set q.
N1 = R
3 \ (P3 ∪ P4),
P3 = {X(θ, r, z) : (r, z) ∈ q3},
P4 = {X(θ, r, z) : (r, z) ∈ q4},
Σ1 = ∂N1 \ ∂P4.
We can find a Lipschitz smooth map G1 : A1 → N1, see Fig. 4, of the form
G1(X(θ, r, z)) = X(θ, R(r, z), Z(r, z))
such that it maps Σ0 to Σ1, and in A1 near Σ0 it is given by
G1(x+ tν0) = G1(x) + tν1. (2)
Here, x ∈ Σ0, ν0 is the Euclidian unit normal vector of Σ0, ν1 is the Euclidian
unit normal vector of Σ1, and 0 < t <
. Moreover, we can find a G1 so that
it is the identity map near the z-axis, that is,
G1(x) = x, x ∈ A1 ∩ {0 ≤ r <
} (3)
and such that G1 is also the identity map in the set of points with the
Euclidian distance 4 or more from P1 ∪ P2. Note that we can find such a
G1 such that both G1 and its inverse G
1 are Lipschitz smooth up to the
boundary. Thus the differentialDG1 ofG1 at x ∈ A1 in Euclidian coordinates
DG1(x) = U(y)
a11(r, z) 0
0 A(r, z)
U(x)−1, x = X(θ, r, z), y = G1(x),
where c0 ≤ a11(r, z) ≤ c1 and A(r, z) is a symmetric (2×2)-matrix satisfying
c0I ≤ A(r, z) ≤ c1I
with some c0, c1 > 0.
The map F1(x) = G1(G0(x)) then maps F1 : M1\γ1 → N1. Let g̃1 = (F1)∗g1
be metric on N1. From the above considerations, we see that the differential
DF1 of F1 at x ∈ M1 \ γ1 near Σ0, in Euclidian coordinates, is given by
DF1(x) = U(y)
b11(θ, r, z) 0
0 B(r, z)
U(x)−1, (4)
b11(θ, r, z) =
c11(r, z)
distR3(X(θ, r, z),Σ0)
x = X(θ, r, z), y = F1(x)
where c0 ≤ c11(r, z) ≤ c1, and B(r, z) is a symmetric (2×2)-matrix satisfying
c0I ≤ B(r, z) ≤ c1I,
for some c0, c1 > 0.
Note that ∂P4∩{r < 1} consists of two two-dimensional discs, B2(0, 1)×{−1}
and B2(0, 1)× {L+ 1}. Below, we will use the map
f2 = F1|∂P1\NP : ∂P1 \NP → B2(0, 1)× {−1} ⊂ ∂N1.
The map f2 can be considered as the deformation that “flattens” S
flat \NP
to a two dimensional unit disc.
PSfrag replacements
Figure 4: Map G1 in (r, z)-coordinates.
To describe f2, consider S
flat as a surface in Euclidian space and define on it
the θ coordinate corresponding to the θ coordinate of R3 \ {z = 0}. Let then
s(y) be the intrinsic distance of y ∈ S2flat to the south pole SP . Then (θ, s)
define coordinates in S2flat\{SP,NP}. We denote by y(θ, s) ∈ S2flat\{SP,NP}
the point corresponding to the coordinates (θ, s).
By the above construction, the map f2 has the form, with respect to the
coordinates used above,
f2(y(θ, s)) = X(θ, R(s),−1) ∈ B2(0, 1)× {−1}, where (5)
R(s) = s, for 0 < s <
R(s) = 1− 1
[(π + 4)− s], for (π + 4)− 1
< s < (π + 4),
cf. formulae (2) and (3). In the following we identify B2(0, 1) × {−1} with
the disc B2(0, 1).
Let h1 be the metric on ∂P1 \NP inherited from (M1, g1). Let h2 = (f2)∗h1
be the metric on B2(0, 1). We observe that the metric h2 makes the disc
B2(0, 1) isometric to S
flat \NP , endowed with the metric inherited from R3.
Thus, let
M2 = S
flat × [−1, L+ 1].
OnM2, let the metric g2 be the product of the metric of S
flat inherited from R
and the metric α2(z)dz
2, α2 > 0 on [−1, L+1]. Let γ2 = {NP}× [−1, L+1]
be a path on M2.
Define N2 = P4 = {X(θ, r, z) : 0 ≤ r < 1,−1 ≤ z ≤ L + 1} ⊂ R3,
Σ2 = ∂N2 ∩ {r = 1}, and let F2 : M2 \ γ2 → N2 be the map of the form
F2(y, z) = (f2(y), z) ∈ R3, (y, z) ∈ (S2flat \NP )× [−1, L+ 1]. (6)
Let g̃2 = (F2)∗g2 be the resulting metric on N2.
Figure 5: The set N2 in the (r, z) coordinates. Later, we impose the SH
boundary condition on the portion of the boundary colored red.
Denote byM 1 = M1∪∂M1 the closure ofM1 and let (M, g) = (M 1, g1)#(M2, g2)
be the connected sum of M 1 and M2, that is, we glue the boundaries ∂M1
and ∂M2. The set N = N1 ∪ N2 ⊂ R3 is open, and its boundary ∂N is
Σ = Σ1 ∪ Σ2.
Let F be the map F : M \ γ → N defined by the maps F1 : M1 \ γ1 → N1
and F2 : M2 \ γ2 → N2, and finally, let γ = γ1 ∪ γ2 and g̃ = F∗g.
Figure 6: The set N = N1 ∪ N2 ⊂ R3 having the complement K, presented
in the (r, z) coordinates. Later, the SH boundary condition is imposed on
Let K = R3 \N . On the surface Σ = ∂K we can use local coordinates (t̃, θ̃),
where θ̃ is the θ-coordinate of the ambient space R3 and t̃ is either the r or
z -coordinate of the ambient space R3 restricted to Σ. Denote also
τ̃ = τ̃ (x) = distR3(x, ∂K).
Then by formula (2) we see that in N1, in the Euclidian boundary normal
coordinates (θ̃, t̃, τ̃) associated to the surface Σ1, the metric g̃ has the length
element
ds2 = 4dτ̃ 2 + α1(t̃) dt̃
2 + 4τ̃ 2 dθ̃2, 0 < τ̃ <
, c−10 ≤ α1(t̃) ≤ c0, c0 ≥ 1.
The construction of F2 yields that in N2 , in the Euclidian boundary normal
coordinates (θ̃, t̃, τ̃) with t̃ = z, associated to the surface Σ2 = ∂K ∩ ∂N2,
the metric g̃ has the length element, near Σ2,
ds2 = 4dτ̃ 2 + α2(t̃)dt̃
2 + 4τ̃ 2 dθ̃2, 0 < τ̃ <
Here, near ∂N1 ∩ ∂N2, we use t̃ = z on Σ1. Choosing the map G1 in the
construction of the map F1 appropriately, we have α2(−1) = α1(−1), α2(L+
1) = α1(L+ 1), and the resulting map is Lipschitz.
On M1, N1, and N2 that are subsets of R
3 we have the well defined cylin-
drical coordinates (θ, r, z). Similarly, M2 = S
flat × [−1, L + 1] we define
the coordinates (θ, s, z), where (θ, s) are the above defined coordinates on
flat \ {SP,NP}.
We can also consider on N ⊂ R3 also the Euclidian metric, denoted by
ge. In Euclidean coordinates, (ge)ij = δjk. Consider next the above defined
Euclidian boundary normal coordinates (θ̃, t̃, τ̃) associated to ∂K. They are
well defined in a neighborhood of ∂K. We define the vector fields
ξ̃ = ∂eτ , η̃ = ∂eθ, ζ̃ = ∂et
on N near ∂K. These vector fields are orthogonal with respect to the metric
g̃ and to the metric ge.
On M near γ, we use coordinates (θ, t, τ). On M1, near γ1 they in the terms
of the cylindrical coordinates are (θ, t, τ) = (θ, z, r). On M2, they are the
coordinates (θ, t, τ) = (θ, z, s), where s is the intrinsic distance to the north
pole NP . We define also the vector fields
ξ = ∂τ , η = ∂θ, ζ = ∂t
on M \ γ near γ. These vector fields are orthogonal with respect to the
metric g.
In the sequel, we consider the differential of F as the linear map
DF : (TxM, g) → (TyN, ge), y = F (x), x ∈ M \ γ.
Using formula (4) in M1 and formulas (5), (6) in M2, we see that DF
−1(x)
at x ∈ N near ∂N is a bounded linear map that satisfies
|(η,DF−1(x)η̃)g| ≤ C τ̃(x), (ζ,DF−1(x)η̃)g = 0, (ξ,DF−1(x)η̃)g = 0,
(η,DF−1(x)ζ̃)g = 0, |(ζ,DF−1(x)ζ̃)g| ≤ C, |(ξ,DF−1(x)ζ̃)g| ≤ C,
(η,DF−1(x)ξ̃)g = 0, |(ζ,DF−1(x)ξ̃)g| ≤ C, |(ξ,DF−1(x)ξ̃)g| ≤ C,
where C > 0 and (· , · )g is the inner product defined by the metric g. More-
over, we obtain similar estimates for DF in terms of the Euclidian metric
|(η̃, DF (y)η)ge| ≤ C τ(y)−1, (ζ̃ , DF (y)η)ge = 0, (ξ̃, DF (y)η)ge = 0,
(η̃, DF (y)ζ)ge = 0, |(ζ̃ , DF (y)ζ)ge| ≤ C, |(ξ̃, DF (y)ζ)ge| ≤ C,
(η̃, DF (y)ξ)ge = 0, |(ζ̃ , DF (y)ξ)ge| ≤ C, |(ξ̃, DF (y)ξ)ge| ≤ C
for y ∈ M \ γ near γ with C > 0.
Next, consider DF (y) at y ∈ M \γ. Recall that the singular values sj(y), j =
1, 2, 3 of DF (y) are the square roots of the eigenvalues of (DF (y))tDF (y),
where (DF )t is the transpose of DF . By (7), the singular values sj = sj(y),
j = 1, 2, 3, of DF (y), numbered in increasing order, satisfy
c1 ≤ s1(y) ≤ c2,
c1 ≤ s2(y) ≤ c2,
≤ s3(y) ≤
where c1, c2 > 0.
The determinant of the matrix DF (y) can be computed in terms of its sin-
gular values by det(DF ) = s1s2s3. Later, we need the norm of the matrix
det(DF (y))−1DF (y). It satisfies by formula (8)
‖det(DF (y))−1DF (y)‖ = ‖(
s−1k )diag (s1, s2, s3)‖ = max
1≤j≤3
k 6=j
s−1k ≤ c−21 . (9)
4.2 Maxwell’s equations on the wormhole with SH coat-
Let dV0(x) denote the Euclidian volume element on N ⊂ R3. Recall that
N ⊂ R3 is open set with boundary ∂N = Σ. Let dVg be the Riemannian
volume on (M, g). We consider below the map F : M\γ → N as a coordinate
deformation. The map F induces for any differential form Ẽ on N a form
E = F ∗Ẽ in M \ γ called the pull back of Ẽ in F , see [2].
Next, we consider Maxwell equations with degenerate material parameters ε̃
and µ̃ on N with SH boundary conditions on Σ. On M and N we define the
permittivity and permeability by setting
εjk = µjk = det(g)1/2gjk, on M, (10)
ε̃jk = µ̃jk = det(g̃)1/2g̃jk, on N.
Here, and below, the matrix [gjk(x)] is the representation of the metric g in
local coordinates, [gjk(x)] is the inverse of the matrix [gjk(x)], and det(g) is
the determinant of [gjk(x)]. We note that the metric g̃ is degenerate near
Σ, and thus ε̃ and µ̃, represented as matrices in the Euclidian coordinates,
have elements that tend to infinity at Σ, that is, the matrices ε̃ and µ̃ have
a singularity near Σ.
Remark 1. Modifying the above construction by replacing M2 with M2 =
flat× [l1, l2] for appropriate l1, l2 ∈ R and choosing F1 in an appropriate way,
we can use local coordinates (θ̃, t̃) on Σ such that the Euclidian distance
along Σ of points (θ̃, t̃1) and (θ̃, t̃2) is proportional to |t̃1− t̃2|, and the metric
g̃ in the Euclidian boundary normal coordinates (θ̃, t̃, τ̃) associated to ∂K
has the form
ds2 = 4dτ̃ 2 + dt̃2 + 4τ̃ 2 dθ̃2, 0 < τ̃ <
The metric corresponding to the metamaterials used in the physical exper-
iment in [20] has the same form in Euclidian boundary normal coordinates
associated to an infinitely long cylinder B2(0, 1)×R. Thus it seems likely that
metamaterials similar to those used in the experimental verification of cloak-
ing could be used to create physical wormhole devices working at microwave
frequencies.
4.3 Finite energy solutions of Maxwell’s equations
and the equivalence theorem
In the following, we consider 1-forms Ẽ =
j Ẽjdx̃
j and H̃ =
j H̃jdx̃
the Euclidian coordinates (x̃1, x̃2, x̃3) of N ⊂ R3. In the sequel, we use Ein-
stein’s summation convention and omit the sum signs. We use the Euclidian
coordinates as we want to consider N with the differential structure inherited
from the Euclidian space. We say that Ẽj and H̃j are the (Euclidian) coeffi-
cients of the forms Ẽ and H̃, correspondingly. We say that these coefficients
are in L
loc(N, dV0), 1 ≤ p < ∞, if∫
|Ej(x)|p dV0(x) < ∞, for all bounded measurable sets W ⊂ N.
Definition 4.1 We say that the 1-forms Ẽ and H̃ are finite energy solutions
of Maxwell’s equations in N with the soft-and-hard (SH) boundary conditions
on Σ and the frequency k 6= 0,
∇× Ẽ = ikµ̃(x)H̃, ∇× H̃ = −ikε̃(x)Ẽ + J̃ on N,
η̃ · Ẽ|Σ = 0, η̃ · H̃|Σ = 0,
if 1-forms Ẽ and H̃ and 2-forms D̃ = ε̃Ẽ and B̃ = µ̃H̃ in N have coefficients
in L1loc(N, dV0) and satisfy
‖Ẽ‖2
L2(W,|eg|1/2dV0)
ε̃jk Ẽj Ẽk dV0(x) < ∞,
‖H̃‖2
L2(W,|eg|1/2dV0))
µ̃jk H̃j H̃k dV0(x) < ∞
for all bounded measurable sets W ⊂ N , and finally,
((∇× h̃) · Ẽ − ikh̃ · µ̃(x)H̃) dV0(x) = 0,
((∇× ẽ) · H̃ + ẽ · (ikε̃(x)Ẽ − J̃)) dV0(x) = 0,
for all 1-forms ẽ and h̃ with coefficients in C∞0 (N) that satisfy
η̃ · ẽ|Σ = 0, η̃ · h̃|Σ = 0, (11)
where η̃ = ∂θ is the angular vector field that is tangential to Σ.
Below, we use for 1-forms E = Ejdx
j and H = Hjdx
j , given in local coordi-
nates (x1, x2, x3) on M , the notations
∇× E = dH, ∇· (εE) = d ∗ E, ∇· (µH) = d ∗H,
where d is the exterior derivative and ∗ is the Hodge operator on (M, g), cf.
formula (10).
We have the following “equivalent behavior of electromagnetic fields on N
and M” result, analogous to the results of [3] for cloaking.
Theorem 4.2 Let E and H be 1-forms on M \ γ and Ẽ and H̃ be 1-forms
with coefficients in L1loc(N, dV0) such that E = F
∗Ẽ, H = F ∗H̃. Let J̃
and J = F ∗J̃ be 2-forms with smooth coefficients in N and M \ γ that are
supported away from Σ and γ.
Then the following are equivalent:
1. On N , the 1-forms Ẽ and H̃ satisfy Maxwell’s equations with SH bound-
ary conditions in the sense of Definition 4.1.
2. On M , the forms E and H can be extended on M so that they are
classical solutions E and H of Maxwell’s equations,
∇× E = ikµH, in M,
∇×H = −ikεE + J, in M.
Proof. Assume first that E and H satisfy Maxwell’s equations on M with
source J supported away from γ. Then E and H are C∞ smooth near γ.
Using F−1 : N → M \ γ we define the 1-forms Ẽ, H̃ and 2-form J̃ on N
by Ẽ = (F−1)∗E, H̃ = (F−1)∗H , and J̃ = (F−1)∗J. These fields satisfy
Maxwell’s equations in N ,
∇× Ẽ = ikµ̃(x)H̃, ∇× H̃ = −ikε̃(x)Ẽ + J̃ in N. (12)
Now, writing E = Ej(x)dx
j on M near γ, we see using the transformation
rule for differential 1-forms that the form Ẽ = (F−1)∗E is in local coordinates
Ẽ = Ẽj(x̃)dx̃
j , Ẽj(x̃) = (DF
−1)kj (x̃)Ek(F
−1(x̃)), x̃ ∈ N. (13)
Using the smoothness of E and H near γ on M and formulae (7), we see
that Ẽ, H̃ are forms on N with L1loc(N, dV0) coefficients. Moreover,
ε̃(x)Ẽ(x) = det(DF (y))−1DF (y)ε(y)DF (y)t(DF (y)t)−1E(y)
= det(DF (y))−1DF (y)ε(y)E(y)
where x ∈ N , y = F−1(x) ∈ M \ γ. Formula (9) shows that D̃ = ε̃Ẽ, and
B̃ = µ̃H̃ are 2-forms on N with L1loc(N, dV0) coefficients.
Let Σ(t) ⊂ N be the t-neighbourhood of Σ in the g̃-metric. Note that for
small t > 0 the set Σ(t) is the Euclidian (t/2)-neighborhood of ∂K. Denote
by ν be the unit exterior Euclidian normal vector of ∂Σ(t) and the Euclidian
inner product by (η̃, Ẽ)ge = η̃ · Ẽ.
Formulas (7) and (13) imply that the angular components satisfy
|η̃ · Ẽ| ≤ Ct, x ∈ ∂Σ(t),
|ζ̃ · Ẽ| ≤ C, x ∈ ∂Σ(t)
with some C > 0. Thus denoting by dS the Euclidian surface area on ∂Σ(t),
Stokes’ formula, formula (12), and the identity ν × ξ̃ = ±η̃ yield
((∇× h̃) · Ẽ − ikh̃ · µ̃H̃) dV0(x)
= lim
N\Σ(t)
((∇× h̃) · Ẽ − ikh̃ · µ̃H̃) dV0(x)
= − lim
∂Σ(t)
(ν × Ẽ) · h̃ dS(x)
= − lim
∂Σ(t)
ν × ((η̃ · Ẽ)η̃ + (ζ̃ · Ẽ)ζ̃) · h̃ dS(x)
for a test function h̃ satisfying formula (11).
Similar analysis for H̃ shows that 1-forms Ẽ and H̃ satisfy Maxwell’s equa-
tions with SH boundary conditions in the sense of Definition 4.1.
Next, assume that Ẽ and H̃ form a finite energy solution of Maxwell’s equa-
tions on (N, g) with a source J̃ supported away from Σ, implying in particular
ε̃jkẼjẼk ∈ L1(W, dV0), µ̃jkH̃jH̃k ∈ L1(W, dV0)
where W = F (U \ γ) ⊂ N and U ⊂ M is a relatively compact open neigh-
bourhood of γ, supp (J̃)∩W = ∅. Define E = F ∗Ẽ, H = F ∗H̃ , and J = F ∗J̃
on M \ γ. Therefore we conclude that
∇×E = ikµ(x)H, ∇×H = −ikε(x)E + J, in M \ γ
εjkEjEk ∈ L1(U \ γ, dVg), µjkHjHk ∈ L1(U \ γ, dVg).
As representations of ε and µ, in local coordinates of M , are matrices that
are bounded from above and below, these imply that
∇× E ∈ L2(U \ γ, dVg), ∇×H ∈ L2(U \ γ, dVg),
∇· (εE) = 0, ∇· (µH) = 0, in U \ γ.
Let Ee, He ∈ L2(U, dVg) be measurable extensions of E and H to γ. Then
∇× Ee − ikµ(x)He = 0, in U \ γ,
∇× Ee − ikµ(x)He ∈ H−1(U, dVg),
∇×He + ikε(x)Ee = 0, in U \ γ,
∇×He + ikε(x)Ee ∈ H−1(U, dVg),
where H−1(U, dVg) is the Sobolev space with smoothness (−1) on (U, g).
Since γ is a subset with (Hausdorff) dimension 1 of the 3-dimensional domain
U , it has zero capacitance. Thus, the Lipschitz functions on U that vanish on
γ are dense in H1(U), see [12]. Therefore, there are no non-zero distributions
in H−1(U) supported on γ. Hence we see that
∇× Ee − ikµ(x)He = 0, ∇×He + ikε(x)Ee = 0 in U.
This also implies that
∇· (εEe) = 0, ∇· (µHe) = 0 in U,
which, by elliptic regularity, imply that Ee and He are C∞ smooth in U .
In summary, E and H have unique continuous extensions to γ, and the
extensions are classical solutions to Maxwell’s equations. ✷
References
[1] G. Eleftheriades and K. Balmain, Negative-Refraction Metamaterials,
IEEE Press (Wiley-Interscience), 2005.
[2] T. Frankel, The geometry of physics, Cambridge University Press, Cam-
bridge, 1997.
[3] A. Greenleaf, Y. Kurylev, M. Lassas and G. Uhlmann, Full-wave invisi-
bility of active devices at all frequencies, ArXiv.org:math.AP/0611185),
2006; Comm. Math. Phys., to appear.
[4] A. Greenleaf, Y. Kurylev, M. Lassas and G. Uhlmann,
Electromagnetic wormholes and virtual magnetic monopoles,
ArXiv.org:math-ph/0703059, submitted, 2007.
[5] A. Greenleaf, M. Lassas, and G. Uhlmann, The Calderón problem for
conormal potentials, I: Global uniqueness and reconstruction, Comm.
Pure Appl. Math 56 (2003), no. 3, 328–352
[6] A. Greenleaf, M. Lassas, and G. Uhlmann, Anisotropic conductivities
that cannot detected in EIT, Physiological Measurement (special issue
on Impedance Tomography), 24 (2003), pp. 413-420.
[7] A. Greenleaf, M. Lassas, and G. Uhlmann, On nonuniqueness for
Calderón’s inverse problem, Math. Res. Let. 10 (2003), no. 5-6, 685-693.
[8] I. Hänninen, I. Lindell, and A. Sihvola, Realization of generalized Soft-
and-Hard Boundary, Progr. In Electromag. Res., PIER 64, 317-333, 2006.
[9] S. Hawking and G. Ellis, The Large Scale Structure of Space-Time, Cam-
bridge Univ. Press, 1973.
http://arxiv.org/abs/math/0611185
http://arxiv.org/abs/math-ph/0703059
[10] P.-S. Kildal, Definition of artificially soft and hard surfaces for electro-
magnetic waves, Electron. Lett. 24 (1988), 168–170.
[11] P.-S. Kildal, Artificially soft-and-hard surfaces in electromagnetics,
IEEE Trans. Ant. and Propag., 10 (1990), 1537-1544.
[12] T. Kilpeläinen, J. Kinnunen, and O. Martio, Sobolev spaces with zero
boundary values on metric spaces. Potential Anal. 12 (2000), no. 3, 233–
[13] R. Kohn, H. Shen, M. Vogelius, and M. Weinstein, in preparation.
[14] Y. Kurylev, M. Lassas, and E. Somersalo, Maxwell’s equations with a
polarization independent wave velocity: Direct and inverse problems, J.
Math. Pures Appl., 86 (2006), 237-270.
[15] M. Lassas, M. Taylor, G. Uhlmann, On determining a non-compact
Riemannian manifold from the boundary values of harmonic functions,
Comm. Geom. Anal. 11 (2003), 207-222.
[16] U. Leonhardt, Optical Conformal Mapping, Science 312 (23 June,
2006), 1777-1780.
[17] G. Milton, M. Briane, J. Willis, On cloaking for elasticity and physical
equations with a transformation invariant form, New J. Phys. 8 (2006),
[18] J.B. Pendry, D. Schurig, D.R. Smith, Controlling Electromagnetic
Fields, Science 312 (23 June, 2006), 1780-1782.
[19] J.B. Pendry, D. Schurig, D.R. Smith, Optics Express 14, 9794 (2006).
[20] D. Schurig, J. Mock, B. Justice, S. Cummer, J. Pendry, A. Starr, and
D. Smith, Metamaterial electromagnetic cloak at microwave frequencies,
Science 314 (10 Nov. 2006), 977-980.
[21] R. Weder, A rigorous time-domain analysis of full–wave electromagnetic
cloaking (Invisibility), preprint, ArXiv.org:07040248v1 (2007).
[22] M. Visser, Lorentzian Wormholes, AIP Press, 1997.
Department of Mathematics
University of Rochester
Rochester, NY 14627, USA, allan@math.rochester.edu
Department of Mathematical Sciences
University of Loughborough
Loughborough, LE11 3TU, UK, Y.V.Kurylev@lboro.ac.uk
Institute of Mathematics
Helsinki University of Technology
Espoo, FIN-02015, Finland, Matti.Lassas@tkk.fi
Department of Mathematics
University of Washington
Seattle, WA 98195, USA, gunther@math.washington.edu
	Introduction
	The wormhole manifold M
	The wormhole device N in R3
	Rigorous construction of the wormhole
	The wormhole manifold (M,g) and the wormhole device N
	Maxwell's equations on the wormhole with SH coating 
	Finite energy solutions of Maxwell's equations  and the equivalence theorem
ABSTRACT
  Cloaking devices are prescriptions of electrostatic, optical or
electromagnetic parameter fields (conductivity $\sigma(x)$, index of refraction
$n(x)$, or electric permittivity $\epsilon(x)$ and magnetic permeability
$\mu(x)$) which are piecewise smooth on $\mathbb R^3$ and singular on a
hypersurface $\Sigma$, and such that objects in the region enclosed by $\Sigma$
are not detectable to external observation by waves. Here, we give related
constructions of invisible tunnels, which allow electromagnetic waves to pass
between possibly distant points, but with only the ends of the tunnels visible
to electromagnetic imaging. Effectively, these change the topology of space
with respect to solutions of Maxwell's equations, corresponding to attaching a
handlebody to $\mathbb R^3$. The resulting devices thus function as
electromagnetic wormholes.

<|endoftext|><|startoftext|>
Microsoft Word - manuscript_supergrowth070322.doc
Millimeter-Thick Single-Walled Carbon Nanotube Forests: 
Hidden Role of Catalyst Support 
Suguru Noda1*, Kei Hasegawa1, Hisashi Sugime1, Kazunori Kakehi1,
Zhengyi Zhang2, Shigeo Maruyama2 and Yukio Yamaguchi1
1 Department of Chemical System Engineering, School of Engineering, The University 
of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan 
2 Department of Mechanical Engineering, School of Engineering, The University of 
Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan 
A parametric study of so-called "super growth" of single-walled carbon nanotubes 
(SWNTs) was done by using combinatorial libraries of iron/aluminum oxide catalysts. 
Millimeter-thick forests of nanotubes grew within 10 min, and those grown by using 
catalysts with a thin Fe layer (about 0.5 nm) were SWNTs. Although nanotube forests 
grew under a wide range of reaction conditions such as gas composition and 
temperature, the window for SWNT was narrow. Fe catalysts rapidly grew nanotubes 
only when supported on aluminum oxide. Aluminum oxide, which is a well-known 
catalyst in hydrocarbon reforming, plays an essential role in enhancing the nanotube 
growth rates.  
KEYWORDS: single-walled carbon nanotubes, vertically aligned nanotubes, 
combinatorial method, growth mechanism 
* Corresponding author. E-mail address: noda@chemsys.t.u-tokyo.ac.jp
 Soon after the realizations of the vertically-aligned single-walled carbon 
nanotube (VA-SWNT) forests1) by alcohol chemical vapor deposition (ACCVD),2)
many groups achieved this morphology of nanotubes by several tricks in CVD 
conditions.3-6) Among these methods, the water-assisted method, the so-called "super 
growth" method,3) realized an outstanding growth rate of a few micrometers per second, 
thus yielding millimeter-thick VA-SWNT forests. Despite its significant impact on the 
nanotube community, no other research groups have been successful in reproducing 
"super growth". Later, the control of the nominal thickness of Fe in the Fe/ Al2O3
catalyst was shown crucial for controlling the number of walls and diameters of the 
nanotubes.7) In this work, we carried out a parametric study of this growth method by 
using a combinatorial method that we previously developed for catalyst optimization.8,9)
 Si wafers that had a 50-nm-thick thermal oxide layer and quartz glass 
substrates were used as substrates, and Fe/ SiO2, Fe/Al2Ox, and Fe/Al2O3 catalysts were 
prepared by sputter deposition on them. An Al2Ox layer was formed by depositing 
15-nm-thick Al on the substrates, and then exposing the layer to air. A 20-nm-thick 
Al2O3 layer was formed by sputtering an Al2O3 target. Then, Fe was deposited on SiO2,
on Al2Ox, and on Al2O3. In some experiments, gradient-thickness profiles were formed 
for Fe by using the combinatorial method previously described.9) The catalysts were set 
in a tubular, hot-wall CVD reactor (22-mm inner diameter and 300-mm length), heated 
to a target temperature (typically 1093 K), and kept at that temperature for 10 min while 
being exposed to 27 kPa H2/ 75 kPa Ar at a flow rate of 500 sccm, to which H2O vapor 
was added at the same partial pressure as for the CVD condition (i.e., 0 to 0.03 kPa). 
During this heat treatment, Fe formed into a nanoparticle structure with a diameter and 
areal density that depended on the initial Fe thickness.8) After the heat treatment, CVD 
was carried out by switching the H2/ H2O /Ar gas to C2H4/ H2/ H2O/ Ar. The standard 
condition was 8.0 kPa C2H4/ 27 kPa H2/ 0.010 kPa H2O/ 67 kPa Ar and 1093 K. The 
samples were analyzed by using transmission electron microscopy (TEM) (JEOL 
JEM-2000EX) and micro-Raman scattering spectroscopy (Seki Technotron, STR-250) 
with an excitation wavelength at 488 nm.  
 Figure 1a shows a photograph of the nanotubes grown for 30 min under the 
standard condition. Nanotubes formed forests that were about 2.5 mm thick. The taller 
nanotubes at the edge compared with those at the center of the substrates indicate that 
the nanotube growth rate was limited by the diffusion of the growth species through the 
millimeter-thick forests of nanotubes. Figure 1b shows a TEM image of the as-grown 
sample shown in the center of Fig. 1a. The nanotubes were mostly SWNTs. These 
figures show that "super growth" was achieved. Although catalysts with thicker Fe layer 
(≥ 1 nm) yielded rapid growth for a wide range of CVD conditions, mainly multi-walled 
nanotubes (MWNTs) formed instead of SWNTs. Rapid growth of SWNTs requires 
complicated optimization of the CVD conditions, i.e., C2H4/ H2/ H2O pressures and the 
growth temperature, because the thinner layer of Fe catalysts (around 0.5 nm) yielded 
rapid SWNT growth under a narrow window near the standard condition.  
 The effect of the catalyst supports on the nanotube growth was also studied 
here. Figure 2a shows normal photographs of nanotubes grown by using three types of 
combinatorial catalyst libraries; i.e., Fe/ SiO2, Fe/ Al2Ox, and Fe/ Al2O3. For the Fe/ 
SiO2 catalyst, the surface was slightly darker at regions with 0.4- to 0.5-nm-thick Fe. 
For the Fe/ Al2Ox, and Fe/ Al2O3 catalyst, the result was completely different; nanotube 
forests even thicker than the substrates were formed within 10 min. Differences also 
were evident between the catalysts with Al2Ox and Al2O3 supports. When Fe was 
relatively thick (> 0.6 nm), nanotube forests grew thick by using either of these two 
catalysts. When Fe was thinner (≤ 0.6 nm), however, nanotube forests grew thick only 
by using the Fe/ Al2Ox catalyst (Fig. 2b). Figure 2c shows Raman spectra taken at 
several locations for each catalyst library. For Fe/ SiO2, a Raman signal of nanotubes 
was obtained only when the Fe layer was thin (i.e. ≤ 0.8 nm). The sharp and branched 
G-band with small D-band and the peaks of radial breathing mode (RBM) indicate the 
existence of SWNTs. The G/D peak area ratios exceeding 10 indicate that the SWNTs 
were of relatively good quality. For Fe/ Al2Ox, the Raman signal of nanotubes was 
observed also for a thick Fe region (i.e. ≥ 1.0 nm) with G/D ratios somewhat smaller 
than the G/D ratios for Fe/ SiO2. The G/D ratio of 10 for the nanotubes by 0.5 nm Fe/ 
Al2Ox shows that the SWNTs still were of relatively good quality compared with the 
original "super growth".3) As the Fe thickness was increased, G/D ratios became smaller 
because MWNTs became the main product at the thicker Fe regions. For Fe/ Al2O3, the 
results were similar to those for Fe/ Al2Ox except when the Fe layer was thin (around 0.5 
nm) where nanotube forests did not grow. Similar phenomenon was observed also for 
Co and Ni catalysts; they yielded nanotube forests when supported on an aluminum 
oxide layer. These results show that an aluminum oxide layer is essential for "super 
growth", that the growth rate enhancement by Al2Oxmight accompany some decrease in 
the G/D ratio, and that the catalyst Fe layer needs to be thin (< 1 nm for the CVD 
condition studied here) to grow SWNTs. An Al2Ox catalyst support was more suitable 
than Al2O3 to grow SWNTs, and the underlying growth mechanism is now under 
investigation.
 The effect of the H2O vapor on the nanotube growth was studied next. Figure 
3a shows the thickness profiles of nanotube forests grown on the Fe/ Al2Ox catalyst 
library. In the absence of H2O vapor, nanotubes grew at the thin Fe region (0.3- to 1-nm 
thick). Addition of 0.010 kPa H2O, which corresponds to 100 ppmv in the reactant gases, 
enhanced the nanotube growth, especially at the thicker Fe region (> 0.7 nm). Further 
addition of H2O (0.030 kPa), however, inhibited the nanotube growth at the thinner Fe 
region (0.3- 0.6 nm) where SWNTs grew at lower H2O partial pressures. Figure 3b 
shows Raman spectra of these samples. Slight addition of H2O (0.01 kPa) did not affect 
the G/D ratio at the thin Fe region (0.5 nm) but decreased the G/D ratio at the thicker 
region (0.8 and 1.0 nm). Further addition of H2O (0.03 kPa) significantly decreased the 
G/D ratio at the whole region of Fe thickness. These results show that the H2O addition 
up to a certain level can enhance the nanotube growth rate, but too much addition 
degrades the nanotube quality.  
 Considering that alumina and its related materials catalyze hydrocarbon 
reforming,10) a possible mechanism for "super growth" is proposed as follows: C2H4 or 
its derivatives adsorb onto aluminum oxide surfaces, diffuse on the surface to be 
incorporated into Fe nanoparticles, and segregate as nanotubes from Fe nanoparticles. 
H2O vapor keeps aluminum oxide surface reactive by removing the carbon byproducts, 
while simultaneously, H2O reacts with the nanotubes and degrades the quality of the 
nanotubes. The C2H4/H2O pressure ratio needs to be kept large (790 for the standard 
condition in this work) as previously reported in ref. 11. The complicated optimization 
among C2H4, H2, and H2O to achieve "super growth" of SWNTs indicates that balancing 
the carbon fluxes of adsorption onto aluminum oxides, the surface diffusion from 
aluminum oxides to Fe nanoparticles, and the segregation as nanotubes from Fe 
nanoparticles is essential to sustain the rapid nanotube growth at a few micrometers per 
second. During nanotube growth, because the surface of catalyst nanoparticles is mostly 
covered by nanotubes, nanotube growth can be enhanced by introducing a carbon 
source not only through the limited open sites on catalyst nanoparticles but also through 
the catalyst supports whose surface remains uncovered with growing nanotubes. This 
concept might provide a new route for further development of supported catalysts for 
nanotube growth. 
Acknowledgements:  
This work is financially supported in part by the Grant-in-Aid for Young Scientists (A), 
18686062, 2006, from the Ministry of Education, Culture, Sports, Science and 
Technology (MEXT), Japan.  
References:  
1) Y. Murakami, S. Chiashi, Y. Miyauchi, M. Hu, M. Ogura, T. Okubo and S. 
Maruyama: Chem. Phys. Lett. 385 (2004) 298. 
2) S. Maruyama, R. Kojima, Y. Miyauchi, S. Chiashi and M. Kohno: Chem. Phys. Lett. 
360 (2002) 229. 
3) K. Hata, D.N. Futaba, K. Mizuno, T. Nanami, M. Yumura and S. Iijima: Science 306
(2004) 1362. 
4) G. Zhong, T. Iwasaki, K. Honda, Y. Furukawa, I. Ohdomari and H. Kawarada: Jpn. J. 
Appl. Phys. 44 (2004) 1558. 
5) L. Zhang, Y. Tan and D.E. Resasco: Chem. Phys. Lett. 422 (2006) 198. 
6) G. Zhang, D. Mann, L. Zhang, A. Javey, Y. Li, E. Yenilmez, Q. Wang, J. P. McVittie, 
Y. Nishi, J. Gibbons and H Dai, Proc. Nat. Acad. Sci. 102 (2005) 16141. 
7) T. Yamada, T. Nanami, K. Hata, D.N. Futaba, K. Mizuno, J. Fan, M. Yudasaka, M. 
Yumura and S. Iijima: Nat. Nanotechnol. 1 (2006) 131. 
8) S. Noda, Y. Tsuji, Y. Murakami and S. Maruyama: Appl. Phys. Lett. 86 (2005) 
173106.
9) S. Noda, H. Sugime, T. Osawa, Y. Tsuji, S. Chiashi, Y. Murakami and S. Maruyama: 
Carbon 44, (2006) 1414. 
10) S.E. Tung and E, Mcininch: J. Catal. 4 (1965) 586.  
11) D.N. Futaba, K. Hata, T. Yamada, K. Mizuno, M. Yumura and S. Iijima: Phys. Rev. 
Lett. 95 (2005) 056104.  
Figure Captions: 
Fig. 1. Typical nanotubes grown in this work. (a) Normal photographs of nanotube 
forests grown on Fe/ Al2Ox for 30 min under the standard condition (8.0 kPa C2H4/ 27 
kPa H2/ 0.010 kPa H2O/ 67 kPa Ar and 1093 K). Fe catalyst thickness was uniform at 
0.45 nm (left sample), 0.50 nm (middle), and 0.55 nm (right). (b) TEM image of 
nanotubes in Fig. 1a grown using 0.50-nm-thick Fe catalysts. Insets show the enlarged 
images (2.5x) of nanotubes.  
Fig. 2. Effect of support materials for Fe catalyst on nanotube growth. Nanotubes were 
grown for 10 min under the standard condition. (a) Photographs of nanotubes grown by 
using combinatorial catalyst libraries, which had a nominal Fe thickness profile ranging 
from 0.2 nm (at left on each sample) to 3 nm (right) formed on either SiO2, Al2Ox, or 
Al2O3. (b) Relationship between the thickness of nanotube forest (shown in Fig. 2a) and 
the nominal Fe thickness of the catalyst. (c) Raman spectra of the same samples. 
Intensity at the low wavenumber region (< 300 cm-1) is shown magnified by a factor of 
5x in this figure. Declined background signals in some of the RBM spectra (e.g., 0.5, 
0.8-nm-Fe/ SiO2 and 0.5-nm-Fe/ Al2O3) were due to the signal from SiO2 substrates 
passing through the thin nanotube layer.  
Fig. 3. Effect of H2O vapor on the nanotube growth. Nanotubes were grown using Fe/ 
Al2Ox combinatorial catalyst libraries for 10 min under the standard condition except for 
H2O partial pressures. (a) Relationship between the thickness of nanotube forest and the 
nominal Fe thickness of the catalyst at different H2O partial pressures. (b) Raman 
spectra of the same samples. Intensity at the low wavenumber region (< 300 cm-1) is 
shown magnified by a factor of 5x in this figure.  
Fig. 1 S. Noda, et al., submitted to Jpn. J. Appl. Phys. 
Fig. 2 S. Noda, et al., submitted to Jpn. J. Appl. Phys. 
Nominal Fe thickness [nm]
Al2Ox
0.3 0.5 1 3
Al2O3
1300 1400 1500 1600
Raman shift [cm-1]
  5.6
  5.2
  4.1
  6.8
  5.0
100 200 300
0.5 nm
0.8 nm
Fe thickness
0.5 nm
0.8 nm
1.0 nm
0.5 nm
0.8 nm
1.0 nm
on Al2Ox
on Al2O3
on SiO2
Fig. 3 S. Noda, et al., submitted to Jpn. J. Appl. Phys. 
Nominal Fe thickness [nm]
 0 kPa
 0.010 kPa
 0.030 kPa
0.3 0.5 1 3
1300 1400 1500 1600
Raman shift [cm-1]
  5.6
  5.2
  3.1
  2.4
  2.5
100 200 300
0.5 nm
0.8 nm
1.0 nm
Fe thickness
0.5 nm
0.8 nm
1.0 nm
0.5 nm
0.8 nm
1.0 nm
0.010 kPa
0.030 kPa
0 kPa   9.7
  7.6
  7.9
ABSTRACT
  A parametric study of so-called "super growth" of single-walled carbon
nanotubes(SWNTs) was done by using combinatorial libraries of iron/aluminum
oxide catalysts. Millimeter-thick forests of nanotubes grew within 10 min, and
those grown by using catalysts with a thin Fe layer (about 0.5 nm) were SWNTs.
Although nanotube forests grew under a wide range of reaction conditions such
as gas composition and temperature, the window for SWNT was narrow. Fe
catalysts rapidly grew nanotubes only when supported on aluminum oxide.
Aluminum oxide, which is a well-known catalyst in hydrocarbon reforming, plays
an essential role in enhancing the nanotube growth rates.

<|endoftext|><|startoftext|>
Test of nuclear level density inputs for Hauser-Feshbach model calculations
1A.V. Voinov∗, 1S.M. Grimes, 1C.R. Brune, 1M.J. Hornish, 1T.N. Massey, 1,2A. Salas
Department of Physics and Astronomy, Ohio University, Athens, OH 45701, USA and
Los-Alamos National Laboratory, P-25 MS H846, Los Alamos, New Mexico 87545, USA
The energy spectra of neutrons, protons, and α-particles have been measured from the d+59Co
and 3He+58Fe reactions leading to the same compound nucleus, 61Ni. The experimental cross
sections have been compared to Hauser-Feshbach model calculations using different input level
density models. None of them have been found to agree with experiment. It manifests the serious
problem with available level density parameterizations especially those based on neutron resonance
spacings and density of discrete levels. New level densities and corresponding Fermi-gas parameters
have been obtained for reaction product nuclei such as 60Ni,60Co, and 57Fe.
I. INTRODUCTION
The nuclear level density (NLD) is an important in-
put for the calculation of reaction cross sections in the
framework of Hauser-Feshbach (HF) theory of compound
nuclear reactions. Compound reaction cross sections are
needed in many applications including astrophysics and
nuclear data for science and technology. In astrophysics
the knowledge of reaction rates is crucial for understand-
ing nucleosynthesis and energy generation in stars and
stellar explosions. In many astrophysical scenarios, e.g.
the r-process, the cross sections required to compute the
reaction rates are in the regime where the statistical ap-
proach is appropriate [1]. In these cases HF calculations
are an essential tool for determining reaction rates, par-
ticularly for reactions involving radioactive nuclei which
are presently inaccessible to experiment. HF calculations
are likewise very important for other applications, e.g.,
the advanced reactor fuel cycle program [2].
The statistical approach utilized in HF theory [3] re-
quires knowledge of the two quantities for participating
species (see details below). These are the transmission
coefficients of incoming and outgoing particles and level
densities of residual nuclei. Transmission coefficients can
be obtained from optical model potentials established on
the basis of experimental data of elastic and total cross
sections. Because of experimental constraints, the dif-
ference between various sources of transmission coeffi-
cients usually does not exceed 10 − 15%. Level densi-
ties are more uncertain. The reason is that it is diffi-
cult to obtain them experimentally above the region of
well-resolved discrete low-lying levels known from nuclear
spectroscopy. At present, the level density for practi-
cal applications is calculated mainly on the basis of the
Fermi-gas [4] and Gilbert-Cameron [5] formulas with ad-
justable parameters which are found from experimental
data on neutron resonance spacing and the density of low-
lying discrete levels. Parameters recommended for use in
HF calculations are tabulated in Ref. [6]. The global pa-
rameter systematics for both the Fermi-gas and Gilbert-
Electronic address: voinov@ohio.edu
Cameron formulas have been developed in Ref. [7]. How-
ever, it is still unclear how well these parameters repro-
duce compound reaction cross sections. No systematic
investigations have been performed yet. Experimental
data on level density above discrete levels are scarce.
Some information is available from particle evaporation
spectra (i.e. from compound nuclear reactions). The lat-
est data obtained from (p,n) reaction on Sn isotopes [8]
claims that the available level density parameters do not
reproduce neutron cross sections thereby indicating the
problem with level density parameterizations. It becomes
obvious that more experimental data on level density are
needed in the energy region above discrete levels.
In this work, we study compound nuclear reactions
to obtain information about level densities of the resid-
ual nuclei from particle evaporation spectra. Two dif-
ferent reactions, d+59Co and 3He+58Fe, which produce
the same 61Ni compound nucleus, have been investigated.
This approach helps to eliminate uncertainties connected
to a specific reaction mechanism. As opposed to most of
the similar experiments where only one type of outgo-
ing particles has been measured, we have measured cross
sections of all main outgoing particles including neu-
trons, protons, and α-particles, populating 60Ni, 60Co,
and 57Fe, respectively.
We will begin with a discussion of the present status of
level density estimates used as inputs for HF calculations.
II. METHODS OF LEVEL DENSITY
ESTIMATES FOR HF CODES
The simple level counting method to determine the
level density of a nucleus works only up to a certain exci-
tation energy below which levels are well separated and
can be determined from nuclear spectroscopy. This re-
gion is typically up to 2 MeV for heavy nuclei and up
to 6-9 MeV for light ones. Above these energies, more
sophisticated methods have to be applied.
http://arxiv.org/abs/0704.0916v2
A. Level density based on neutron resonance
spacings
In the region of neutron resonances which are located
just above the neutron binding energy (Bn), the level
density can again be determined by counting. In this case
neutron resonances are counted; one must also take into
account the assumed spin cut-off factor σ. Traditionally,
because of the the absence of reliable data below Bn, the
level density is determined by an interpolation procedure
between densities of low-lying discrete levels and den-
sity obtained from neutron resonance spacing.The Bethe
Fermi-gas model [4] with adjustable parameters a and δ
is often used as an interpolation formula:
ρ(E) =
exp[2
a(E − δ)]
2σa1/4(E − δ)5/4
, (1)
where the σ is the spin cut-off factor determining the level
spin distribution. There are a few drawbacks to this ap-
proach. One shortcoming is that it uses an assumption
that the selected model is valid in the entire excitation
energy region including low-lying discrete states and neu-
tron resonances. Undoubtedly this is correct for some of
the nuclei. A nice example is the level density of 26Al
which exhibits Fermi-gas behavior up to 8 MeV of exci-
tation energy [7]. On the other hand the level densities
of 56,57Fe measured with Oslo method [9], for example,
show complicated behavior which cannot be described
by simple Fermi-gas formula. The reason for this might
be the influence of pairing correlations leading to step
structures in vicinity of proton and neutron paring en-
ergies and above. In such cases the model function fit
to discrete levels may undergo considerable deviations in
the higher excitation energy region leading to incorrect
determination of level density parameters.
Another consideration is associated with the spin cut-
off parameter which is important in determination of the
total level density from density of neutron resonances at
Bn. In Fermi-gas model the spin cut-off parameter is
determined according to:
σ2 = m2gt =
t, (2)
wherem2 is the average of the square of the single particle
spin projections, t =
(E − δ)/a is the temperature,
g = 6a/π2 is the single particle level density, I is the rigid
body moment of inertia expressed as I = (2/5)µAR2,
where µ is the nucleon mass, A is the mass number and
R = 1.25A1/3 is the nuclear radius. The spin cut off
parameter in rigid body model is :
σ21 = 0.0146A
5/3t = 0.0146A5/3
((E − δ)/a)). (3)
On the other hand the Gilbert and Cameron [5] used
m2 = 0.146A2/3. The corresponding formula for σ is:
σ22 = 0.089A
((E − δ)/a). (4)
Eqs. (3) and (4) have the same energy and A dependence
(σ2 ∼ A7/6(E − δ)1/2) but differ by a factor of ≈ 2. It
should be mentioned also that the recent model calcula-
tions [10] show the suppression of the moment of inertia
at low temperatures compared to its rigid body value.
Thus uncertainties in spin cut off parameter transform
to corresponding uncertainties of total level densities de-
rived from neutron resonance spacings.
Experimentally, the spin cutoff parameter can be ob-
tained only from spin distribution of low-lying discrete
levels. However, because of the small number of known
spins, the uncertainty of such procedure is large. It turns
out that reported systematics based on such investiga-
tion σ = (0.98 ± 0.23)A(0.29±0.06) [7] is different from
above expressions for which σ ∼
A7/6 = A0.58. At
higher excitation energies determining the cutoff param-
eter becomes problematic due to the high level density
and the absence of the reliable observables sensitive to
this parameter. One can mention Ref. [11] where the
spin cutoff parameter has been determined from the an-
gular distribution of evaporation neutrons with α and
proton projectiles. The deviation from the expected A
dependence has also been reported. The absolute values
of the parameter agree with Eq. (3).
The parity dependence of level densities is also not
established experimentally beyond the discrete level re-
gion. At the neutron binding energy the assumption is
usually made about the equality of negative and positive
parity states. This is supported by some experimental
results [12]. However, recent calculations, performed for
Fe, Ni, and Zn isotopes, show that for some of them the
assumption of equally distributed states is not fulfilled
even far beyond the neutron binding energy, up to exci-
tation energies 15-20 MeV [13].
As is seen from the above considerations, the cal-
culation of the total level density from neutron res-
onance spacing might contain uncertainties associated
with many factors such as the possible deviation from
Fermi-gas dependence in interpolation region, uncertain-
ties in spin cutoff parameter and inequality of states with
different parity. Thus the question of how large these
uncertainties are or to what extent the level density ex-
tracted in such a way can be applicable to calculations
of reaction cross sections still remains important and not
completely resolved.
B. Level density from evaporation particles
The cross section of evaporated particles from the first
stage of a compound-nuclear reaction (i.e. when the out-
going particle is the first particle resulting from com-
pound nucleus decay ) can be calculated in the framework
of the Hauser-Feshbach theory:
(εa, εb) = (5)
σCN(εa)
Iπ Γb(U, J, π, E, I, π)ρb(E, I, π)
Γ(U, J, π)
Γ(U, J, π) =
Γb′(U, J, π, Ek, Ik, πk)+ (6)
∫ U−B
dE′ Γb′(U, J, π, E
′, I ′, π′) ρb′(E
′, I ′, π′)
Here σCN (εa) is the fusion cross section, εa and εb
are energies of relative motion for incoming and outgo-
ing channels (εb = U − Ek − Bb, where Bb is the sepa-
ration energy of particle b from the compound nucleus),
the Γb are the transmission coefficients of the outgoing
particle, and the quantities (U, J, π) and (E, I, π) are the
energy, angular momentum, and parity of the compound
and residual nuclei, respectively. The energy Ec is the
continuum edge, above which levels are modeled using a
level density parameterization. For energies below Ec the
known excitation energies, spins, and parities of discrete
levels are used. In practice Ec is determined by the avail-
able spectroscopic data in the literature. It follows from
Eq. (6) that the cross section is determined by both trans-
mission coefficients of outgoing particles and the NLD of
the residual nucleus ρb(E, I, π). It is believed that trans-
mission coefficients are known with sufficient accuracy
near the line of stability because they can be obtained
from optical model potentials usually based on experi-
mental data for elastic scattering and total cross sections
in the corresponding outgoing channel. Transmission co-
efficients obtained from different systematics of optical
model parameters do not differ by more that 15-20 %
from each other in our region of interest (1− 15 MeV of
outgoing particles). The uncertainties in level densities
are much larger. Therefore the Hauser Feshbach model
can be used to improve level densities by comparing ex-
perimental and calculated particle evaporation spectra.
Details and assumptions of this procedure are described
in Refs [14, 15].
The advantage of this method is that because of the
wide range of spin population in both the compound and
final nuclei, evaporation spectra are determined by the
total level density (integrated over all level spins) as op-
posed to the neutron resonance technique where reso-
nances are known for one or two spins and one parity.
The drawback stems from possible direct or multistep
compound reaction contributions distorting the evapora-
tion spectra, especially in the region of low-lying discrete
levels needed for the absolute normalization of obtained
level densities.
According to Hodgson [16], the interaction process can
usefully be considered to take place in a series of stages
corresponding to the successive nucleon-nucleon interac-
tion until complete equilibrium is reached. At each stage
it is possible for particles to be emitted from the nucleus.
The direct reactions refer to the fast, first stages of this
process giving forward peaked angular distribution. The
term multistep direct reaction implies that that such pro-
cess may take place in a number of states. Compound
nuclear reactions refers to all processes giving angular
distributions symmetric about 900; they are subdivided
into multistep compound reactions that take place before
the compound system has attained final statistical equi-
librium and statistical compound reactions that corre-
spond to the evaporation of particle from an equilibrium
system.
The use of evaporation spectra to infer level densities
requires that the reaction goes through to complete equi-
librium. Significant contributions from either multistep
direct or multistep compound reactions could cause in-
correct level density parameters to be deduced. Multi-
step direct reactions would usually be forward peaked
and also concentrated in peaks. If the reaction has a lim-
ited number of stages, the two-body force cannot cause
transitions to states which involve a large number of rear-
rangements from the original state. Multistep compound
reactions would be expected to lead to angular distribu-
tions which are symmetric about 900. They would, if
complete equilibration has not occurred, also preferen-
tially reach states which are similar to the target plus pro-
jectile. The shape of spectra from a multistep compound
reaction would be different for a deuteron-induced as op-
posed to a 3He-induced reaction. Hodgson has reviewed
[16] the evidence for multistep compound reactions. He
finds the most convincing evidence for such contributions
comes from fluctuation measurements for the 27Al(3He,p)
reaction. In this case, certain low-lying states show level
widths in the compound system which are larger than
expected. These states are low-lying and are the ones
which would be most likely to show such effects. It ap-
pears that measurements of continuum spectra do not
show evidence of such contributions. The uncertainties
connected to contributions of pre-equilibrium reactions
are generally difficult to estimate experimentally. The
measurement of angular distribution does not solve the
problem in the case of multistep compound mechanism.
We believe that the use of different reactions to form the
same compound nucleus is the most reliable way to esti-
mate and eliminate such contributions.
In this work we investigate reactions with deuteron and
3He projectiles on 59Co and 58Fe, respectively. These two
reactions form the same compound nucleus, 61Ni. The
purpose was to investigate if the cross section of outgo-
ing particles from both reactions can be described in the
framework of Hauser-Feshbach model with same set of
level density parameters. This is possible only when pro-
duction cross section is due to compound reaction mecha-
nism in both reactions. Neutron, protons, and α-particles
have been measured. These outgoing particles exhaust
the majority of the fusion cross section. The ratio be-
tween cross sections of different particles is determined
by the ratio of level densities of corresponding residual
nuclei. It puts constraints on relative level density val-
ues obtained from an experiment. In our experiment, the
level densities of 60Ni, 60Co, and 57Fe residual nuclei have
been determined from the region of the energy spectra of
neutron, proton, and α-particles where only first state
emission is possible.
III. EXPERIMENT AND METHOD
The tandem accelerator at Ohio University’s Edwards
Accelerator Laboratory provided 3He and deuteron
beams with energies of 10 and 7.5 MeV, respectively.
Self-supporting foils of 0.625-mg/cm2 58Fe (82% en-
riched) and 0.89-mg/cm2 59Co (100% natural abun-
dance) have been used as targets. The outgoing charged
particles were registered by charged-particle spectrome-
ters as shown in Fig. 2. The setup has ten 2-m time-
of-flight legs ending with Si detectors (see Fig. 1). Legs
are set up at different angles ranging from 22.5◦ up to
157.5◦. The mass of the charged particles is determined
by measuring both the energy deposited in Si detectors
and the time of flight. Additionally, a neutron detector
was placed at the distance of 140 cm from the target to
measure the neutron energy spectrum. The mass resolu-
tion was sufficient to resolve protons, deuterons, 3H/3He,
and α-particles.
He-3 beamTarget
2m flight path
FIG. 1: Charge particle spectrometer utilized for the mea-
surements.
Additionally, the neutron spectra from both the
58Fe(3He, Xn) and the 59Co(d,Xn) reactions have been
measured by the time-of-flight method with the Swinger
facility of Edwards Laboratory [17]. Here a flight path
of 7 m has been used to obtain better energy resolution
for outgoing neutrons, allowing us to measure the shape
of neutron evaporation spectrum more accurately. The
energy of the outgoing neutrons is determined by time-of-
flight method. The 3-ns pulse width provided an energy
resolution of about 100 keV and 800 keV at 1 and 14 MeV
of neutrons, respectively. The neutron detector efficiency
was measured with neutrons from the 27Al(d, n) reaction
on a stopping Al target at Ed = 7.44 MeV [18]. This
measurement allowed us to determine the detector ef-
ficiency from 0.2 to 14.5 MeV neutron energy with an
accuracy of ∼ 6%. The neutron spectra have been mea-
sured at backward angles from 110◦ to 150◦. Additional
measurements with a blank target have been performed
at each angle to determine background contribution. The
absolute cross section has been calculated by taking into
account the target thickness, the accumulated charge of
incoming deuteron or 3He beam, and the neutron detec-
tor efficiency. The overall systematic error for the abso-
lute cross sections is estimated to be 15%. The errors in
ratios of proton and α cross sections are only a few per-
cents because they are determined by counting statistics
alone.
IV. EXPERIMENTAL PARTICLE SPECTRA
AND LEVEL DENSITY OF PRODUCT NUCLEI
Energy spectra of neutron, protons, and α-particles
have been measured at backward angles (from 112◦ to
157◦) to eliminate contributions from direct reaction
mechanisms. Fig. 2 show energy spectra of outgoing
particles for both the 3He+58Fe and d+59Co reactions.
The calculations of particle energy spectra have been
performed with Hauser-Feshbach (HF) program devel-
oped at Edwards Accelerator Lab of Ohio University
[19]. Particle transmission coefficients have been calcu-
lated with optical model potentials taken from the RIPL
data base [6]. Different potentials have been tested and
found to be the same within 15%. Alpha-particle po-
tentials are more uncertain. Differences between corre-
sponding α-transmission coefficients depends on the α-
energy and varies from ∼ 40% for lower α-energies to
< 1% for higher α-energies in our region of interest (8-
18 MeV). In order to reduce these uncertainties the RIPL
α-potentials have been tested against the experimental
data on low energy α−elastic scattering on 58Ni [20]. The
data have been reproduced best by the potential from
Ref.[21] which has been adopted for our HF calculations.
Four level density models have been chosen for testing:
• The M1 model uses the Bethe formula (1) with pa-
rameters adjusted to fit both discrete level density
and neutrons s-wave resonance spacings.
• The M2 model uses the Gilbert-Cameron [5] for-
mula with parameters adjusted to fit both discrete
level density and neutrons s-wave resonance spac-
ings.
• The M3 model uses Bethe formula but δ parame-
ters are obtained from pairing energies according
to Ref. [1]. The a parameter has been adjusted
to match s-wave neutron resonance spacing. This
model does not fit discrete levels.
• The M4 model is based on microscopic Hartree-
Fock-BCS calculations [22] which are available from
RIPL data base [6]. According to Ref. [22], this
model has also been renormalized to fit discrete lev-
els and neutron resonance spacings.
0 2 4 6 8 10 12 14
4 6 8 10 12 14 16 18 6 8 10 12 14 16 18 20
0 2 4 6 8 10 12 14
 Neutron energy  (MeV)
2 4 6 8 10 12
 Proton energy (MeV)
6 8 10 12 14 16
 Alpha energy (MeV)
FIG. 2: Particle energy spectra for the 3He+58Fe (upper panel)and d+59Co (lower panel) reactions. The experimental data
are shown by points. Solid lines are HF calculations with level density parameters extracted from the experiment. Calculations
have been multiplied by reduction factor K = 0.52 due to direct reaction contributions. Arrows show energies above which
spectra contain only contributions from the first stage of the reaction.
The value of total level density derived from neutron res-
onance spacings depends on spin cutoff parameter used.
Therefore two prescriptions (3) and (4) for this parame-
ter have been tested for M1-M3 models. The M4 model
uses its one spin distribution which it is close to the pre-
scription (3).
The measured particle energy spectra include particles
from all possible stages of the reaction. However, by lim-
iting our consideration to particles with energies above
a particular threshold, we can ensure that only particles
from the first stage of the reaction contribute. These
thresholds depend on the particular reaction and are in-
dicated by the arrows in Fig. 2. In this energy interval
cross sections are determined exclusively by the level den-
sity of those residual nuclei. Another aspect which should
be taken into consideration when comparing calculations
and experiment is the contribution of direct processes.
Direct processes take away the incoming flux resulting in
reduction of compound reaction contribution. Assuming
that the total reaction cross section (σR) can be decom-
posed into the sum of direct (σdr ) and compound reaction
mechanisms (σcR), we have σR = σ
R + σ
R. In this case,
the HF calculations should be multiplied by the constant
factorK = σ
c exp
R /σR to correct for the absorbed incident
flux which does not lead to compound nucleus formation.
In our experiment the K has been estimated from the ra-
tioKexp = (σexpn +σ
α )/(σ
α ) ≈ K
where the experimental cross sections have been mea-
sured at backward angles. If level densities used in calcu-
lations are correct, Kexp = K. However, the calculations
show that this parameter is not very sensitive to input
level densities and can be estimated with ∼20% accuracy
with any reasonable level density models.
Table I shows the ratio of theoretical and experimen-
tal cross sections for different level density models used
in calculations. Calculations have been multiplied by re-
duction factor K which for both reactions varied within
0.48-0.54 for different level density models. Results show
that all of the models reproduce neutron cross sections
within ∼20%. However, they overestimate α-particle
cross sections by ∼30% in average and underestimate
protons by 5-80%. None of the models reproduce the
ratio of p/α cross section; for example all models sys-
tematically overestimate this ratio by a factor of ∼2 for
the d+59Co reaction. Assuming that particle transmis-
sion coefficients are known with sufficient accuracy, we
conclude that the level density of residual nuclei is re-
sponsible for such disagreement. In particular, the level
density ratio ρ[57Fe]/ρ[60Co] is overestimated by model
calculations.
In order to obtain correct level densities, the following
procedure has been used as described in Ref. [23]. The
NLD model is chosen to calculate the differential cross
section of Eq. (6). The parameters of the model were
adjusted to reproduce the experimental spectra as closely
as possible. The input NLD was improved by binwise
renormalization according to the expression:
ρb(E, I, π) = ρb(E, I, π)input
(dσ/dεb)meas
(dσ/dεb)calc
. (7)
The absolute normalization of the improved level den-
sities (later referred to as experimental level densities)
has been obtained by using discrete level densities of
60Ni populated by neutrons from the 59Co(d,n) reac-
tion. Protons and α-particles populating discrete lev-
els behave differently for different reactions. The Fig. 2
shows that the ratio between experiment and calculations
in discrete energy region is greater for 59Co(d,p) com-
pared to 58Fe(3He,p) and for 58Fe(3He,α) compared to
59Co(d,α). These enhancements are apparently reaction
specific and connected to contribution of direct or/and
multistep compound reaction mechanism. We are not
able to make the same comparison for neutron spectra
because the counting statistics in the region of discrete
levels for 58Fe(3He,n) reaction are rather poor. However,
our recent result from 55Mn(d,n) [23] indicates that the
neutron spectrum measured at backward angles is purely
evaporated even for high energy neutrons populating dis-
crete levels. Therefore we used the neutron spectrum
from the 59Co(d,n) reaction to determine the absolute
normalization of the level density for the residual nu-
cleus 60Ni. The absolute level densities of both 60Co and
57Fe nuclei have been adjusted in such a way as to repro-
duce ratios of both neutron/proton and neutron/alpha
cross sections. Uncertainties of obtained level densities
have been estimated to be about 20% which include un-
certainties of absolute cross section measurements and
uncertainties of particle transmission coefficients.
Both experimental and calculated level densities are
displayed in Fig. 3. The level density for 60Ni has been
extracted from (d,n) spectra because of better counting
statistics but (3He,p) and (3He,α) reactions have been
used to obtain level density for 60Co and 57Fe, respec-
tively, because of larger Q value. This approach allows
one to obtain level densities in a larger excitation energy
interval. Calculations have been performed with models
M1-M4 with spin cutoff parameters σ1 and σ2 for M1-M3
models. The M4 model uses its own spin distribution
which is close to σ1 for these nuclei. The χ
2 values for
calculated and experimental level densities are shown in
the Table III. Results show that the M1 model with σ1
gives worse agreement with experimental data. The use
of σ2 improve the agreement for all of the models. The
M2 and M3 models using σ2 give best agreement with
experiment on average, however level density for 60Co
agrees better when using σ1 and the best agreement is
reached with M4 model. It appears that the spin cut-
off parameter is very important when deriving the total
level density from neutron resonance spacings. However,
none of the models give a perfect description of the ex-
perimental data.
In order to improve level density parameters, the
experimental level densities have been fitted with the
Fermi-gas function (1) for two different spin cutoff fac-
tors σ1 and σ2. Best fit parameters are presented in the
Table II. They allow one to reproduce both shapes of
particle spectra (fig.2) as well as ratios of neutron, pro-
ton and α cross sections for both 3He+58Fe and d+59Co
reactions (Table I). Level density parameters have been
adjusted independently for both spin cutoff parameters
resulting in the approximately same final ratio of exper-
imental/calculated cross sections and χ2 values. There-
fore the only one entry M1exp is presented in tables. The
fact that a single set of level density parameters allows
one to reproduce all particle cross sections from both re-
actions supports our conclusion that the compound nu-
clear mechanism is dominant in these reactions. Finally
we note that the HF calculations do not perfectly repro-
duce the low-energy regions of the proton spectra where
the second stage of outgoing protons dominate. Here
the calculations also depends on additional level densi-
ties of corresponding residual nuclei as well as on the
γ-strength functions. We leave this problem for further
investigations.
The level density of 57Fe below the particle separation
threshold has also been obtained [9] by Oslo technique
using particle-γ coincidences from 57(3He,3He′)57Fe re-
action. We performed a similar comparison for the 56Fe
nucleus where we confirmed consistency of both the Oslo
technique and the technique based on particle evapo-
ration spectra. Figure 4 shows the comparison for the
57Fe nucleus. Here we also see good agreement between
level densities obtained from two different experiments.
It supports the obtained level densities. The Fermi-
gas parameters for 60Ni have been obtained in Ref. [24]
from 63Cu(p,α)60Ni reaction at Ep=12 MeV . The val-
ues a=6.4 and δ=1.3 are in a good agreement with our
parameters presented in the Table II.
V. SPIN CUTOFF PARAMETER
As it has been mentioned in the previous section the
spin cutoff parameter σ2 obtained according to Eq. (4)
gives slightly better agreement with the experiment com-
pared to σ1 obtained from Eq. (3). On the other hand,
the spin cut off parameters at the neutron binding energy
can be directly obtained from the experimental total level
density and the density of levels for one or several spin
states which are known from the analysis of neutron res-
TABLE I: Ratio of experimental and calculated cross sections obtained with four prior level density models M1-M4 and one
posterior M1exp which uses parameters fit to experimental level densities (see Table II). The spin cutoff parameters σ1 and σ2
are defined according to Eqs. (3) and (4).
M1 M2 M3 M4 M1exp Kexp
σ1 σ2 σ1 σ2 σ1 σ2
58Fe(3He,n) 0.79(12) 1.04(16) 1.03(16) 1.22(19) 1.03(16) 1.05(16) 0.90(14) 1.03(16) 0.52
58Fe(3He,p) 1.23(20) 1.01(15) 1.05(16) 0.93(14) 1.03(15) 1.01(15) 1.11(17) 0.98(15)
58Fe(3He,α) 0.66(10) 0.81(12) 0.66(10) 0.86(13) 0.73(11) 0.81(12) 0.72(11) 1.01(15)
59Co(d,n) 0.81(12) 0.90(14) 0.84(13) 0.91(14) 0.92(14) 0.93(14) 0.89(13) 0.97(15) 0.53
59Co(d,p) 1.82(27) 1.42(21) 1.70(26) 1.40(21) 1.32(20) 1.24(19) 1.41(21) 1.07(16)
59Co(d,α) 0.69(11) 0.59(10) 0.64(10) 0.57(10) 0.59(9) 0.70(11) 0.64(10) 0.97(15)
0 2 4 6 8 10 12
0 2 4 6 8 10 0 2 4 6 8 10 12
 Excitation energy (MeV)
57Fe60Co
FIG. 3: Our experimental level density are shown as points. Curves indicate level densities from the four model prescriptions
M1-M4. The upper and lower curves for M1-M3 relate to two spin cutoff parameters σ1 and σ2 used to determine total level
densities from neutron resonance spacings. The histogram is the density of discrete levels.
onances [6]. We used the spin distribution formula from
Ref. [5]:
G(J) =
(2J + 1)
[−(J + 0.5)2
with normalization condition:
G(J) = 1 (9)
TABLE II: Fermi-gas parameters obtained from experimental
level densities
Nucleus 60Ni 60Co 57Fe
a, δ for Eq.(3) 6.16;1.43 6.91;-1.89 5.92;-0.13
a, δ for Eq.(4) 6.39;0.80 7.17;-2.6 6.14;-0.78
TABLE III: χ2 of experimental and calculated total level den-
sities for different level density models and spin cutoff factors
Nucleus M1 M2 M3 M4 M1exp
σ1 σ2 σ1 σ2 σ1 σ2
60Ni 15.3 3.8 1.5 0.2 0.9 1.4 2.5 0.6
60Co 1.3 1.9 2.9 3.1 1.8 2.0 0.8 0.6
57Fe 20.2 2.5 18.9 2.0 10.8 1.8 5.8 0.6
All nuclei 11.5 2.7 7.5 1.8 4.3 1.8 3.1 0.6
0 2 4 6 8 10 12
 Excitation energy (MeV)
FIG. 4: The experimental level densities of 57Fe nucleus.
Filled points are present experimental values. Open points
are data from Oslo experiment [9]. Histogram is density of
discrete levels.
The total level density ρ(U) can be connected to the neu-
tron resonance spacing by using the expression:
= ρ(Bn + 0.5∆E)
|I0+0.5+L|
J=|I0−0.5−L|
G(J), (10)
where DL is the neutron resonance spacing for neutrons
with orbital momentum L, ∆E is the energy interval con-
taining neutron resonances. The assumption of equality
of level numbers with positive and negative parity is used.
Because the total level density ρ(Bn+0.5∆E) around the
neutron separation energy is known from our experiment,
the parameter σ can be obtained from Eqs.(8)-(10).
The data on neutron resonance spacings for nuclei un-
der study are taken from Ref. [25]. The estimated spin
cut off parameters from both s-wave (L = 0) and p-
wave (L = 1) resonance spacings are presented in the
Table IV. The uncertainties include a 20% normaliza-
tion uncertainty in total level densities and uncertainties
in the resonance spacings. For 57Fe, we have obtained
good agreement between two values of σ derived from s-
and p-wave neutron resonances. It indicates the parity
equilibrium of neutron resonances. For 60Co, the uncer-
tainties are too large to draw a definite conclusion. For
60Ni, the onlyD0 is known and one value of σ is obtained.
It agrees better with σ2 but σ1 cannot be excluded.
The calculations of spin cutoff parameter have been
performed with Eqs. (3) and (4) with Fermi-gas param-
eters from Table II. The experiment shows better agree-
ment with σ2 for
57Fe. Spin cut off parameters for 60Ni
and 60Co agree better with σ2 and σ1 respectively, how-
ever because of the large uncertainties, it is impossible to
draw an unambiguous conclusion.
TABLE IV: Spin cut off parameter obtained from s-wave σexps
and p-wave σexpp resonances with using the total level density
from the experiment. σcal1 and σ
2 have been calculated ac-
cording to Eqs. (3) and (4), respectively, with parameters
from Table II.
Nucleus 60Ni 60Co 57Fe
s 3.3(8) 3.6(15) 2.80(30)
p 5.2(12) 2.88(35)
1 4.13 3.95 3.76
2 3.22 3.26 3.0
VI. DISCUSSION
The consistency between results from two different re-
actions supports our conclusion that these reactions are
dominated by the compound-nuclear reaction mechanism
at backward angles. Our results show that level densi-
ties estimated on the basis of interpolation procedure be-
tween neutron resonance and discrete energy regions do
not reproduce experimental cross sections of all outgo-
ing particles simultaneously. The reason is that for some
of the nuclei the level density between discrete and con-
tinuum regions has a complicated behavior which can-
not be described by simple formulas based on Fermi-gas
or Gilbert-Cameron models. It is seen for 57Fe (Fig. 4)
where the level density exhibits some step structure at en-
ergy around 3.7 MeV. Nevertheless, the Fermi-gas model
can still be used to describe the level density at higher ex-
citation energies where density fluctuations vanish. The
M3 model, which does not use discrete levels, gives best
agreement. However, a problem apparently connected to
the spin cutoff parameters is still present. These results
indicate that it is necessary to use level density systemat-
ics obtained from compound-nuclear particle evaporation
spectra. Obviously, the region of discrete levels should be
excluded from such an analysis.
Spin cut off parameters obtained from this experiment
are in general agreement with model prediction of Eqs.(3)
and (4). However it is difficult to reduce uncertainties
to make more specific conclusions about the origin of
this parameter. Most probably, this parameter fluctuates
from nucleus to nucleus and is determined by the internal
properties of nuclei such as the specific population of shell
orbits.
As it has been discussed in the introduction, level den-
sities affect reaction rates which are important in astro-
physics and other applications. The magnitude of this
affect depends mainly on level densities and contribution
of the channel of interest to the total reaction cross sec-
tion. According to the Table I, the neutron outgoing
channel is less sensitive to variations of level densities
while changes in proton and α cross sections can reach a
factor of 2 from corresponding changes in level densities.
Changes in predicted cross sections will also occur at this
level.
VII. CONCLUSION
The neutron, proton, and α-particle cross sections have
been measured at backward angles from 3He+58Fe and
d+59Co reactions. The calculations using HF model have
been performed with three level density models adjusted
to match discrete levels and neutron resonance spacings
and one model adjusted to match neutron resonances
only. None of the model reproduces cross sections of all
outgoing particles simultaneously from both reactions.
However, the model M3 suggested in Ref.[1] gives the
best agreement with experiment.
Level densities of residual nuclei 60Ni, 60Co, and 57Fe
have been obtained from particle evaporation spectra.
Experimental level densities have been fit by Fermi-gas
function and new level density parameters have been ob-
tained. The new level densities allow us to reproduce
all particle energy spectra from both reactions that in-
dicate the dominance of compound nuclear mechanism
in particle spectra measured at backward angles. The
contribution of compound mechanism to the total cross
section is estimated about 50% for both reactions.
The total level density obtained from particle spectra
and neutron resonance spacings have been used to extract
the spin cut off parameter at the neutron separation ener-
gies. The extracted parameters agree with predictions of
Eq. (4) for 57Fe but no definite conclusions can be made
for 60Ni and 60Co. A better understanding of parity ra-
tio systematics would help to make this technique more
reliable.
VIII. ACKNOWLEDGMENTS
We are grateful to J.E. O’Donnell and D. Carter for
computer and electronic support during the experiment,
A. Adekola, C. Matei, B. Oginni and Z. Heinen for
taking shifts, D.C. Ingram for target thickness calcula-
tions done for us. We also acknowledge financial sup-
port from Department of Energy, grant No. DE-FG52-
06NA26187/A000.
[1] T. Rauscher, F.K. Thielemann, K.L. Kratz, Phys. Rev.
C 56, 1613 (1997).
[2] Report of the Nuclear Physics and Related Com-
putational Science R&D for Advanced Fuel Cycle
Workshop, Bethesda Maryland, August 10-12,2006,
http://www-fp.mcs.anl.gov/nprcsafc/Report FINAL.pdf
[3] W. Hauser and H. Feshbach, Phys. Rev. 87, 366 (1952).
[4] H.A. Bethe, Phys.Rev. 50, 332(1936).
[5] A. Gilbert and A.G.W. Cameron, Can.J.Phys. 43, 269
(1965).
[6] T. Belgya, O. Bersillon, R. Capote, T. Fukahori, G. Zhi-
gang, S. Goriely, M. Herman, A.V. Ignatyuk, S. Kailas,
A. Koning, P. Oblozhinsky, V. Plujko, and P. Young,
Handbook for calculations of nuclear reaction data: Ref-
erence Input Parameter Library. Available online at
http://www-nds.iaea.org/RIPL-2/, IAEA, Vienna, 2005.
[7] T. von Egidy and D. Bucurescu, Phys. Rev. C 72, 044311
(2005); 73, 049901(E) (2006).
[8] B.V. Zhuravlev, A.A.Lychagin, and N.N.Titarenko,
Physics of Atomic Nuclei, 69, 363(2006).
[9] A. Schiller et al., Phys. Rev. C 68, 054326 (2003).
[10] Y. Alhassid, G.F. Bertsch, L. Fang, and S. Liu, Phys.
Rev. C 72, 064326 (2005).
[11] S.M. Grimes, J.D. Anderson, J.W. McClure, B.A. Pohl,
and C. Wong, Phys. Rev. C 10, p.2373 (1974).
[12] S.J.Lokitz, G.E.Mitchell, J.F.Shriner, Jr, Phys. Rev. C
71, 064315(2005).
[13] D Mocelj, T.Rauscher, F-K Thielemann, G Mart́ınez
Pinedo, K.Langanke, L.Pacearescu and A.Fäßler,
J.Phys.G:Nucl.Part. Phys. 31, 1927(2005)
[14] H. Vonach, Proceedings of the IAEA Advisory Group
Meeting on
[15] A. Wallner, B. Strohmaier, and H. Vonach, Phys. Rev.
C 51, 614 (1994).
[16] P.E. Hodgson, Rep. Prog. Phys. 50 1171(1987). Basic
and Applied Problems of Nuclear Level Densities, Upton,
NY, 1983, BNL Report No. BNL-NCS-51694, 1983, p.
[17] A. Salas-Bacci, S.M. Grimes, T.N. Massey, Y. Parpottas,
R.T. Wheeler, J.E. Oldendick, Phys. Rev. C 70, 024311
(2004).
[18] T.N. Massey, S. Al-Quraishi, C.E. Brient, J.F.
Guillemette, S.M. Grimes, D. Jacobs, J.E. O’Donnell,
J. Oldendick and R. Wheeler, Nuclear Science and Engi-
neering 129, 175 (1998).
[19] S. M. Grimes, Ohio University Report INPP-04-03, 2004
(unpublished).
[20] L.R.Gasques, L.C.Chamon, D.Pereira, V.Guimarães,
A.Lépine-Szily, M.A.G.Alvarez, E.S.Rossi, Jr., C.P.Silva,
B.V.Carlson, J.J.Kolata, L.Lamm, D.Peterson, P.Santi,
S.Vincent, P.A.De Young, G.Peasley, Phys. Rev. C 67,
024602 (2003).
[21] R.C.Harper and W.L.Alford, J.Phys.G:Nucl.Phys. 8,
153(1982)
http://www-fp.mcs.anl.gov/nprcsafc/Report_FINAL.pdf
http://www-nds.iaea.org/RIPL-2/
[22] P. Demetriou and S. Goriely, Nucl. Phys. A695, 95
(2001).
[23] A.V.Voinov, S.M.Grimes, U.Agvaanluvsan, E.Algin,
T.Belgya, C.R.Brune, M.Guttormsen, M.J.Hornish,
T.Massey, G.E.Mitchell, J.Rekstad, A.Schiller, S.Siem,
Phys. Rev. C 74, 014314 (2006).
[24] Louis C. Vaz, C.C.Lu, and J.R.Huizenga, Phys. Rev. C
5, p.463 (1972).
[25] S.F. Mughabhab, Atlas of Neutron Resonances, Elsevier
5-th ed. 2006.
ABSTRACT
  The energy spectra of neutrons, protons, and alpha-particles have been
measured from the d+59Co and 3He+58Fe reactions leading to the same compound
nucleus, 61$Ni. The experimental cross sections have been compared to
Hauser-Feshbach model calculations using different input level density models.
None of them have been found to agree with experiment. It manifests the serious
problem with available level density parameterizations especially those based
on neutron resonance spacings and density of discrete levels. New level
densities and corresponding Fermi-gas parameters have been obtained for
reaction product nuclei such as 60Ni,60Co, and 57Fe.

<|endoftext|><|startoftext|>
Introduction
The High Accuracy Radial-velocity Planet Searcher (HARPS,
Pepe et al. 2002, 2004; Mayor et al. 2003) was put in opera-
tion during the second half of 2003. HARPS is a high-resolution,
fiber-fed echelle spectrograph mounted on the 3.6–m telescope
at ESO–La Silla Observatory (Chile). It is placed in a vacuum
vessel and is accurately thermally-controlled (temperature varia-
tions are less than 1 mK over one night, less than 2 mK over one
month). Its most striking characteristic is its unequaled stability
and radial-velocity (RV) accuracy: 1 m s−1 in routine operations.
A sub-m s−1 accuracy can even be achieved for inactive, slowly
rotating stars when an optimized observing strategy aimed at
averaging out the stellar oscillations signal is applied (Santos
et al. 2004a; Lovis et al. 2006).
Send offprint requests to: D. Naef e-mail: dnaef@eso.org
? Based on observations made with the HARPS instrument on the
ESO 3.6-m telescope at the La Silla Observatory (Chile) under the GTO
programme ID 072.C-0488.
The HARPS Consortium that manufactured the instrument
for ESO has received Guaranteed Time Observations (GTO). The
core programme of the HARPS-GTO is a very high RV-precision
search for low-mass planets around non-active and slowly ro-
tating Solar-type stars. Another programme carried out by the
HARPS-GTO is a lower RV precision planet search. It is a survey
of about 850 Solar-type stars at a precision better than 3 m s−1.
The sample is a volume-limited complement (up to 57.5 pc) of
the CORALIE sample (Udry et al. 2000). The goal of this sub-
programme is to obtain improved Jupiter-sized planets orbital
elements distributions by substantially increasing the size of the
exoplanets sample. Statistically robust orbital elements distribu-
tions put strong constraints on the various planet formation sce-
narios. The total number of extra-solar planets known so far is
over 200. Nevertheless, some sub-categories of planets with spe-
cial characteristics (e.g. hot-Jupiters or very long-period planets)
are still weakly populated. The need for additional detections
thus remains high.
With typical measurement precisions of 2-3 m s−1, the
HARPS volume-limited programme does not necessarily aim at
2 D. Naef et al.: The HARPS search for southern extra-solar planets IX
Table 1. Observed and inferred stellar characteristics of
HD 100777, HD 190647, and HD 221287 (see text for details).
HD 100777 HD 190647 HD 221287
HIP 56572 99115 116084
Type K0 G5 F7V
mV 8.42 7.78 7.82
B − V 0.760 0.743 0.513
π [mas] 18.84± 1.14 18.44± 1.10 18.91± 0.82
d [pc] 52.8+3.4
−3.0 54.2
−3.1 52.9
MV 4.807 4.109 4.203
B.C. −0.119 −0.109 −0.010
L [L�] 1.05 1.98 1.66
Teff [K] 5582± 24 5628± 20 6304± 45
log g [cgs] 4.39± 0.07 4.18± 0.05 4.43± 0.16
[Fe/H] 0.27± 0.03 0.24± 0.03 0.03± 0.05
Vt [km s−1] 0.98± 0.03 1.06± 0.02 1.27± 0.12
M∗ [M�] 1.0± 0.1 1.1± 0.1 1.25± 0.10
log R
HK −5.03 −5.09 −4.59
Prot [d] 39± 2 39± 2 5.0± 2
age [Gyr] >2 >2 1.3
v sin i [km s−1] 1.8± 1.0 2.4± 1.0 4.1± 1.0
detecting very low-mass planetary companions, and it is mostly
sensitive to planets that are more massive than Saturn. To date,
it has already allowed the detection of several short-period pla-
nets: HD 330075 b (Pepe et al. 2004), HD 2638 b, HD 27894 b,
HD 63454 b (Moutou et al. 2005), and HD 212301 b (Lo Curto
et al. 2006).
In this paper, we report the detection of 3 longer-period pla-
netary companions orbiting stars in the volume-limited sample:
HD 100777 b, HD 190647 b, and HD 221287 b. In Sect. 2, we de-
scribe the characteristics of the 3 host stars. In Sect. 3, we present
our HARPS radial-velocity data and the orbital solutions for the
3 targets. In Sect. 3.4, we discuss the origin of the high residuals
to the orbital solution for HD 221287. Finally, we summarize our
findings in Sect. 4.
2. Stellar characteristics of HD 100777, HD 190647,
and HD 221287
The main characteristics of HD 100777, HD 190647, and
HD 221287 are listed in Table 1. Spectral types, apparent magni-
tudes mV, colour indexes B − V , astrometric parallaxes π, and
distances d are from the HIPPARCOS Catalogue (ESA 1997).
From the same source, we have also retrieved information on
the scatter in the photometric measurements and on the good-
ness of the astrometric fits for the 3 targets. The photometric
scatters are low in all cases. HD 190647 is flagged as a constant
star. The goodness-of-fit parameters are close to 0 for the 3 stars,
indicating that their astrometric data are explained by a single-
star model well. Finally, no close-in faint visual companions are
reported around these objects in the HIPPARCOS Catalogue.
We performed LTE spectroscopic analyses of high signal-
to-noise ratio (SNR) HARPS spectra for the 3 targets follow-
ing the method described in Santos et al. (2004b). Effective
temperatures (Teff), gravities (log g), iron abundances ([Fe/H]),
and microturbulence velocities (Vt) indicated in Table 1 re-
sult from these analyses. Like most of the planet-hosting stars
Table 2. HARPS radial-velocity data obtained for HD 100777.
Julian date RV Uncertainty
BJD− 2 400 000 [d] [km s−1]
53 063.7383 1.2019 0.0016
53 377.8740 1.2380 0.0015
53 404.7626 1.2205 0.0014
53 407.7461 1.2164 0.0014
53 408.7419 1.2176 0.0016
53 409.7891 1.2161 0.0015
53 468.6026 1.2109 0.0016
53 470.6603 1.2122 0.0021
53 484.6703 1.2273 0.0013
53 489.5948 1.2319 0.0022
53 512.5671 1.2494 0.0017
53 516.5839 1.2539 0.0014
53 518.5962 1.2554 0.0021
53 520.5778 1.2558 0.0022
53 543.5598 1.2649 0.0013
53 550.5289 1.2641 0.0016
53 573.4689 1.2682 0.0014
53 579.4705 1.2641 0.0022
53 724.8617 1.2520 0.0012
53 762.8169 1.2355 0.0013
53 765.7645 1.2311 0.0016
53 781.8894 1.2242 0.0016
53 789.7730 1.2209 0.0018
53 862.6200 1.2217 0.0015
53 866.6037 1.2239 0.0014
53 871.6270 1.2357 0.0015
53 883.5589 1.2397 0.0014
53 918.4979 1.2589 0.0017
53 920.5110 1.2613 0.0023
(Santos et al. 2004b), HD 100777 and HD 190647 have very high
iron abundances, more than 1.5 times the solar value, whereas
HD 221287 has a nearly solar metal content. Using the spec-
troscopic effective temperatures and the calibration in Flower
(1996), we computed bolometric corrections. Luminosities were
obtained from the bolometric corrections and the absolute ma-
gnitudes. The low gravity and the high luminosity of HD 190647
indicate that this star is slightly evolved. The other two stars
are still on the main sequence. Stellar masses M∗ were derived
from L, Teff , and [Fe/H] using Geneva and Padova evolutiona-
ry models (Schaller et al. 1992; Schaerer et al. 1993; Girardi
et al. 2000). Values of the projected rotational velocities, v sin i,
were derived from the widths of the HARPS cross-correlation
functions using a calibration obtained following the method de-
scribed in Santos et al. (2002, see their Appendix A) 1
Stellar activity indexes log R
HK (see the index definition
in Noyes et al. 1984) were derived from Ca II K line core re-
emission measurements on high-SNR HARPS spectra following
the method described in Santos et al. (2000). Using these va-
lues and the calibration in Noyes et al. (1984), we derived es-
timates of the rotational periods and stellar ages. The chromo-
spheric ages obtained with this calibration for HD 100777 and
HD 190647 are 6.2 and 7.6 Gyr. Pace & Pasquini (2004) have
shown that chromospheric ages derived for very low-activity,
Solar-type stars were not reliable. This is due to the fact that
the chromospheric emission drops rapidly after 1 Gyr and be-
comes virtually constant after 2 Gyr. For this reason, we have
chosen to indicate ages greater than 2 Gyr instead of the cali-
1 Using cross-correlation function widths for deriving projected rota-
tional velocities is a method that was first proposed by Benz & Mayor
(1981).
D. Naef et al.: The HARPS search for southern extra-solar planets IX 3
Fig. 1. Top: HARPS radial-velocity data (dots) for HD 100777
and fitted orbital solution (solid curve). The radial-velocity sig-
nal is induced by the presence of a 1.16 MJup planetary compan-
ion on a 384-day orbit. Bottom: Residuals to the fitted orbit. The
scatter of these residuals is compatible with the velocity uncer-
tainties.
brated values for these two stars. HD 221287 is much more active
and thus younger making its chromospheric age estimate more
reliable: 1.3 Gyr. We have also measured the lithium abundance
for this target following the method described in Israelian et al.
(2004) and again using a high-SNR HARPS spectrum. The mea-
sured equivalent width for the Li I λ6707.70Å line (deblended
from the Fe I λ6707.44Å line) is 66.6 mÅ leading to the fol-
lowing lithium abundance: log n(Li) = 2.98. Unfortunately, the
lithium abundance cannot provide any reliable age constraint in
our case. Studies of the lithium abundances of open cluster main
sequence stars have shown that log n(Li) remains constant and
equals ' 3 for Teff = 6300 K stars older than a few million years
(see for example Sestito & Randich 2005).
From the activity level reported for HD 221287
(log R
HK =−4.59), we can estimate the expected level of
activity-induced radial-velocity scatter (i.e. the jitter). Using
the results obtained by Santos et al. (2000) for stars with
similar activity levels and spectral types, the range of expected
jitter is between 8 and 20 m s−1. A similar study made on a
different stellar sample by Wright (2005) gives similar results:
an expected jitter of the order of 20 m s−1 (from the fit this
author presents in his Fig. 4). It has to be noted that both
studies contain very few F stars and even fewer high-activity
F stars. This results from the stellar sample they have used:
planet search samples selected against rapidly-rotating young
stars. Their predictions for the expected jitter level are therefore
not well-constrained for active F stars. Both HD 100777 and
HD 190647 are inactive and slowly rotating. Following the same
studies, low jitter values are expected in both cases: ≤ 8 m s−1.
Table 3. HARPS radial-velocity data obtained for HD 190647.
Julian date RV Uncertainty
BJD− 2 400 000 [d] [km s−1]
52 852.6233 −40.2874 0.0013
52 854.6727 −40.2868 0.0013
53 273.5925 −40.2435 0.0022
53 274.6041 −40.2354 0.0024
53 468.8874 −40.2688 0.0015
53 470.8634 −40.2689 0.0014
53 493.9234 −40.2700 0.0029
53 511.9022 −40.2723 0.0015
53 543.8020 −40.2785 0.0013
53 550.7721 −40.2804 0.0014
53 572.8074 −40.2818 0.0015
53 575.7266 −40.2844 0.0017
53 694.5136 −40.3056 0.0014
53 836.9226 −40.2982 0.0015
53 861.8630 −40.2950 0.0017
53 883.8224 −40.2907 0.0013
53 886.8881 −40.2883 0.0013
53 917.8390 −40.2785 0.0021
53 921.8472 −40.2785 0.0018
53 976.6039 −40.2623 0.0014
53 979.6654 −40.2610 0.0017
3. HARPS radial-velocity data
Stars belonging to the volume-limited HARPS-GTO sub-
programme are observed most of the time without the simulta-
neous Thorium-Argon reference (Baranne et al. 1996). The ob-
tained radial velocities are thus uncorrected for possible instru-
mental drifts. This only has a very low impact on our results
as the HARPS radial-velocity drift is less than 1 m s−1 over one
night. For this large volume-limited sample, we aim at a radial-
velocity precision of the order of 3 m s−1 (or better). This corres-
ponds roughly to an SNR of 40-50 at 5500Å. For bright targets
like the three stars of this paper, the exposure times required for
reaching this signal level can be very short as an SNR of 100
(at 5500Å) is obtained with HARPS in a 1-minute exposure on
a 6.5 mag G dwarf under normal weather and seeing conditions.
In order to limit the impact of observing overheads (telescope
preset, target acquisition, detector read-out), we normally do not
use exposure times less than 60 seconds. As a consequence, the
SNR obtained for bright stars is significantly higher than the tar-
geted one, and the output measurement errors are frequently be-
low 2 m s−1.
For our radial-velocity measurements, we consider two main
error terms. The first one is obtained through the HARPS Data
Reduction Software (DRS). It includes all the known calibration
errors (' 20 cm s−1), the stellar photon-noise, and the error on
the instrument drift. For observations taken with the simultane-
ous reference, the drift error is derived from the photon noise
of the Thorium-Argon exposure. For observations taken without
the lamp, a drift error term of 50 cm s−1 is quadratically added.
The second main error term is called the non-photonic error. It
includes guiding errors and a lower limit for the stellar pulsa-
tion signals. For the volume-limited programme, we use an ad-
hoc value for this term: 1.0 m s−1. Stellar noise (activity jitter,
pulsation signal) can of course be greater in some cases (as for
example for HD 221287, cf. Sect. 3.3). The non-photonic term
nearly vanishes for targets belonging to the very high RV pre-
cision sample (non-active stars, pulsation modes averaged out
by the specific observing strategy). In this latter case, this term
thus only contains the guiding errors: ' 30 cm s−1. The error bars
4 D. Naef et al.: The HARPS search for southern extra-solar planets IX
Fig. 2. Top: HARPS radial-velocity data (dots) for HD 190647
and fitted orbital solution (solid curve). The radial-velocity sig-
nal is induced by the presence of a 1.9 MJup planetary companion
on a 1038-day orbit. Bottom: Residuals to the fitted orbit. The
scatter of these residuals is compatible with the velocity uncer-
tainties.
listed in this paper correspond to the quadratic sum of the DRS
and the non-photonic errors.
In the following sections, we present the HARPS radial-
velocity data obtained for HD 100777, HD 190647, and
HD 221287 in more detail, as well as the orbital solutions fitted
to the data.
3.1. A 1.16 MJup planet around HD 100777
We have gathered 29 HARPS radial-velocity measurements of
HD 100777. These data span 858 days between February 27th
2004 (BJD = 2 453 063) and July 4th 2006 (BJD = 2 453 921).
Their mean radial-velocity uncertainty is 1.6 m s−1 (mean DRS
error: 1.3 m s−1). We list these measurements in Table 2 (elec-
tronic version only).
A nearly yearly signal is present in these data. We fit-
ted a Keplerian orbit. The resulting parameters are listed in
Table 5. The fitted orbit is displayed in Fig. 1, together with our
radial-velocity measurements. The radial-velocity data is best
explained by the presence of a 1.16 MJup planet on a 384 d fairly
eccentric orbit (e = 0.36). The inferred separation between the
host star and its planet is a = 1.03 AU. Both m2 sin i and a were
computed using a primary mass of 1 M�.
We performed Monte-Carlo simulations using the ORBIT
software (see Sect. 3.1. in Forveille et al. 1999) in order to
double-check the parameter uncertainties. The errorbars ob-
tained from these simulations are quasi-symmetric and some-
what larger ('18%) than the ones obtained from the covariance
matrix of the Keplerian fit. The errors we have finally quoted in
Table 5 are the Monte-Carlo ones. The residuals to the fitted orbit
(see bottom panel of Fig. 1) are flat and have a dispersion com-
patible with the measurement noise. The low reduced χ2 value
(1.45) and the χ2 probability (0.074) further demonstrate the
Table 4. HARPS radial-velocity data obtained for HD 221287.
Julian date RV Uncertainty
BJD− 2 400 000 [d] [km s−1]
52 851.8534 −21.9101 0.0012
52 853.8544 −21.9201 0.0021
52 858.7810 −21.9252 0.0019
53 264.7097 −21.8617 0.0021
53 266.6805 −21.8852 0.0020
53 268.7030 −21.8595 0.0034
53 273.6951 −21.8888 0.0031
53 274.7129 −21.8718 0.0021
53 292.6284 −21.9049 0.0020
53 294.6239 −21.8998 0.0019
53 295.6781 −21.9017 0.0024
53 296.6388 −21.9186 0.0026
53 339.6009 −21.9289 0.0015
53 340.5974 −21.9225 0.0013
53 342.5955 −21.9172 0.0013
53 344.5961 −21.9428 0.0014
53 345.5923 −21.9291 0.0013
53 346.5566 −21.9284 0.0013
53 546.9385 −21.8022 0.0039
53 550.9121 −21.8294 0.0022
53 551.9479 −21.7956 0.0018
53 723.5707 −21.8679 0.0033
53 727.5302 −21.8815 0.0021
53 862.9297 −21.9205 0.0023
53 974.7327 −21.8240 0.0021
53 980.7273 −21.8192 0.0022
good fit quality. The presence of another massive short-period
companion around HD 100777 is thus unlikely.
3.2. A 1.9 MJup planet orbiting HD 190647
Between August 1, 2003 (BJD = 2 452 852) and September 2,
2006 (BJD = 2 453 980), we obtained 21 HARPS radial-velocity
measurements for HD 190647. These data have a mean radial-
velocity uncertainty of 1.7 m s−1 (mean DRS error: 1.3 m s−1).
We list these measurements in Table 3 (electronic version only).
A long-period signal is clearly present in the RV data. We
performed a Keplerian fit. The resulting fitted parameters are
listed in Table 5. The fitted orbital period (1038 d) is slightly
shorter than the observing time span (1128 d) and the orbital
eccentricity is low (0.18). Monte-Carlo simulations were car-
ried out. The uncertainties on the orbital parameters obtained in
this case are nearly symmetric and a bit larger ('18%) than the
ones resulting from the Keplerian fit. In Table 5, we have listed
these more conservative Monte-Carlo uncertainties. The small
discrepancy between the two sets of errorbars is most probably
due to the rather short time span of the observations (only 1.09
orbital cycle covered) and to our still not optimal coverage of
both the minimum and the maximum of the radial-velocity orbit.
From the fitted parameters and with a primary mass of 1.1 M�,
we compute a minimum mass of 1.90 MJup for this planetary
companion. The computed separation between the two bodies
is 2.07 AU. Figure 2 shows our data and the fitted orbit.
The weighted rms of the residuals (1.6 m s−1) is slightly
smaller than the mean RV uncertainty. The low dispersion of the
residuals, the low reduced χ2 value, and its associated proba-
bility (0.11) allow us to exclude the presence of an additional
massive short-period companion.
D. Naef et al.: The HARPS search for southern extra-solar planets IX 5
Fig. 3. Top: HARPS radial-velocity data (dots) for HD 221287
and fitted orbital solution (solid curve). The detected Keplerian
signal is induced by a 3.1 MJup planet on a 456–day orbit.
Because of a non-optimal coverage of the radial-velocity maxi-
mum and the presence of stellar activity-induced jitter, the ex-
act shape of the orbit is not very well-constrained. Bottom:
Residuals to the fitted orbit. The scatter of these residuals is
much larger than the velocity uncertainties. This large dispersion
is probably due to the fairly high activity level of this star.
3.3. A 3.1 MJup planetary companion to HD 221287
A total of 26 HARPS radial-velocity data were obtained
for HD 221287. These data are spread over 1130 days: be-
tween July 31, 2003 (BJD = 2 452 851) and September 3, 2006
(BJD = 2 453 981). Unlike the two other targets presented in
this paper, a substantial fraction ('65%) of these velocities
were taken using the simultaneous Thorium-Argon reference.
They were thus corrected for the measured instrumental velocity
drifts. The mean radial-velocity uncertainty computed for this
data set is 2.1 m s−1 (mean DRS error: 1.8 m s−1). We list these
measurements in Table 4 (electronic version only).
A 456 d radial-velocity variation is clearly visible in our data
(see Fig. 3). This period is two orders of magnitude longer than
the rotation period obtained from the Noyes et al. (1984) cali-
bration for HD 221287: 5± 2 d. This large discrepancy between
PRV and Prot is probably sufficient for safely excluding stellar
spots as the origin of the detected RV signal, but we nevertheless
checked if this variability could be due to line-profile variations.
The cross-correlation function (CCF) bisector span versus radial
velocity plot is shown in the top panel of Fig. 4. The average
CCF bisector value is computed in two selected regions: near the
top of the CCF (i.e. near the continuum) and near its bottom (i.e
near the RV minimum). The span is the difference between these
two average values (top−bottom) and thus represents the overall
slope of the CCF bisector (for details, see Queloz et al. 2001).
As for the case of HD 166435 presented in Queloz et al.
(2001), an anti-correlation between spans and velocities is
expected in the case of star-spot induced line-profile varia-
tions. The bisector span data are quite noisy (weighted rms of
10.4 m s−1), but they are not correlated with the RV data. The
Fig. 4. a: Bisector span versus radial-velocity plot for
HD 221287. The dispersion of the span data is quite
large (10.4 m s−1) revealing potential line-profile variations.
Nevertheless, the main radial-velocity signal is not correlated
to these profile variations and is thus certainly of Keplerian
origin. b: Bisector span versus radial-velocity residuals to the
Keplerian orbit (see Table 5) displayed in the same velocity
scale. A marginal anti-correlation between the two quantities is
observed.
main signal can therefore not be due to line-profile variations
and certainly has a Keplerian origin.
Table 5 contains the results of a Keplerian fit that we per-
formed. Our data and the fitted orbit are displayed in Fig. 3. The
RV maximum remains poorly covered by our observations. As
for the other two targets, we made Monte-Carlo simulations for
checking our orbital parameter uncertainties. As expected, the
uncertainties obtained in this case largely differ from the ones
obtained via the Keplerian fit. For most of the parameters, the
errorbars resulting from the simulations are not symmetric and
much larger ('5 times larger). In order to be more conserva-
tive, we have chosen to quote these errors in Table 5. The shape
of the orbit is not very well-constrained, but there is no doubt
about the planetary nature of HD 221287 b. The fitted eccen-
tricity is low (0.08), but circular or moderately eccentric orbits
(up to 0.25) cannot be excluded yet. Using a primary mass of
1.25 M�, we compute the companion minimum mass and sepa-
ration: m2 sin i = 3.09 MJup and a = 1.25 AU.
6 D. Naef et al.: The HARPS search for southern extra-solar planets IX
Table 5. HARPS orbital solutions for HD 100777, HD 190647, and HD 221287.
HD 100777 HD 190647 HD 221287
P [d] 383.7± 1.2 1038.1± 4.9 456.1± +7.7
T [JD†] 456.2± 2.3 868± 24 263± +99
e 0.36± 0.02 0.18± 0.02 0.08± +0.17
−0.05
γ [km s−1] 1.246± 0.001 −40.267± 0.001 −21.858± +0.008
−0.005
ω [◦] 202.7± 3.1 232.5± 9.4 98± +92
K1 [m s−1] 34.9± 0.8 36.4± 1.2 71± +18−8
f (m) [10−9M�] 1.37± 0.10 4.94± 0.50 16.4± 12.6
a1 sin i [10−3AU] 1.15± 0.03 3.42± 0.12 2.95± 0.75
m2 sin i [MJup] 1.16± 0.03 1.90± 0.06 3.09± 0.79
a [AU] 1.03± 0.03 2.07± 0.06 1.25± 0.04
N 29 21 26
\ [m s−1] 1.7 1.6 8.5
χ2red
? 1.45 1.46 26.7
p(χ2, ν)‡ 0.074 0.11 0
† JD = BJD− 2 453 000
\ σO−C is the weighted rms of the residuals (weighted by 1/�2, where � is the O−C uncertainty)
? χ2red = χ
2/ν where ν is the number of degrees of freedom (here ν= N − 6).
‡ Post-fit χ2 probability computed with ν= N − 6.
3.4. Residuals to the HD 221287 orbital fit
The residuals to the orbital solution for HD 221287 presented in
Sect. 3.3 are clearly abnormal. Their weighted rms, 8.5 m s−1,
is much larger than the mean radial-velocity uncertainty ob-
tained for this target: 〈�RV〉= 2.1 m s−1. The abnormal scatter ob-
tained by quadratically correcting the residual rms for 〈�RV〉 is
8.2 m s−1. This matches the lowest value expected for this star
from the Santos et al. (2000) and Wright (2005) studies. Again,
we stress that these two studies clearly lack active F stars, and
their activity versus jitter relations are thus weakly constrained
for this kind of target. Our measured jitter value certainly does
not strongly disagree with their results.
We have searched for periodic signals in the radial-velocity
residuals by computing their Fourier transform, but no signifi-
cant peak in the power spectrum could be found. The absence of
significant periodicity is not surprising since the phase of star-
spot induced signals is not always conserved over more than a
few rotational cycles.
Cross-correlation function bisector spans are plotted against
the observed radial-velocity residuals in the bottom panel of
Fig. 4. A marginal anti-correlation (Spearman’s rank correlation
coefficient: ρ=−0.1) between the two quantities is visible. A
weighted linear regression (i.e the simplest possible model) was
computed. The obtained slope is only weakly significant (1σ).
We are thus unable, at this stage, to clearly establish the link be-
tween the line-profile variations and our residuals. As indicated
in Sect. 3.3, our orbital solution is not very well-constrained.
This probably affects the residuals and possibly explains the
absence of a clear anti-correlation. Additional radial-velocity
measurements are necessary for establishing this relation, but
activity-related processes so far remain the best explanation for
the observed abnormal residuals to the fitted orbit.
HD 221287 has a planet with an orbital period of 456 d but
with an additional radial-velocity signal, probably induced by
the presence of cool spots whose visibility is modulated by stel-
lar rotation.
4. Conclusion
We have presented our HARPS radial-velocity data for 3 Solar-
type stars: HD 100777, HD 190647, and HD 221287. The radial-
velocity variations detected for these stars are explained by the
presence of planetary companions. HD 100777 b has a minimum
mass of 1.16 MJup. Its orbit is eccentric (0.36) and has a period of
384 days. The 1038–day orbit of the 1.9 MJup planet around the
slightly evolved star HD 190647 is moderately eccentric (0.18).
The planetary companion inducing the detected velocity signal
for HD 221287 has a minimum mass of 3.1 MJup. Its orbit has
a period of 456 days. The orbital eccentricity for this planet is
not well-constrained. The fitted value is 0.08 but orbits with
0.0≤ e≤ 0.25 cannot be excluded yet. This rather weak con-
straint on the orbital shape is explained by two reasons. First,
our data cover the radial-velocity maximum poorly. Second, the
residuals to this orbit are abnormally large. We have tried to es-
tablish the relation between these high residuals and line-profile
variations through a study of the CCF bisectors. As expected, a
marginal anti-correlation of the two quantities is observed, but
it is only weakly significant, thereby preventing us from clearly
establishing the link between them.
Acknowledgements. The authors would like to thank the ESO–La Silla
Observatory Science Operations team for its efficient support during the observa-
tions and to all the ESO staff involved in the HARPS maintenance and techni-
cal support. Support from the Fundação para Ciência e a Tecnologia (Portugal)
to N.C.S. in the form of a scholarship (reference SFRH/BPD/8116/2002) and
a grant (reference POCI/CTEAST/56453/2004) is gratefully acknowledged.
Continuous support from the Swiss National Science Foundation is apprecia-
tively acknowledged. This research has made use of the Simbad database oper-
ated at the CDS, Strasbourg, France.
References
Baranne, A., Queloz, D., Mayor, M., et al. 1996, A&AS, 119, 373
Benz, W. & Mayor, M. 1981, A&A, 93, 235
ESA. 1997, The HIPPARCOS and TYCHO catalogue, ESA-SP 1200
Flower, P. J. 1996, ApJ, 469, 355
Forveille, T., Beuzit, J., Delfosse, X., et al. 1999, A&A, 351, 619
Girardi, L., Bressan, A., Bertelli, G., & Chiosi, C. 2000, A&AS, 141, 371
D. Naef et al.: The HARPS search for southern extra-solar planets IX 7
Israelian, G., Santos, N. C., Mayor, M., & Rebolo, R. 2004, A&A, 414, 601
Lo Curto, G., Mayor, M., Clausen, J. V., et al. 2006, A&A, 451, 345
Lovis, C., Mayor, M., Pepe, F., et al. 2006, Nature, 441, 305
Mayor, M., Pepe, F., Queloz, D., et al. 2003, The Messenger, 114, 20
Moutou, C., Mayor, M., Bouchy, F., et al. 2005, A&A, 439, 367
Noyes, R. W., Hartmann, L. W., Baliunas, S. L., Duncan, D. K., & Vaughan,
A. H. 1984, ApJ, 279, 763
Pace, G. & Pasquini, L. 2004, A&A, 426, 1021
Pepe, F., Mayor, M., Queloz, D., et al. 2004, A&A, 423, 385
Pepe, F., Mayor, M., Rupprecht, G., et al. 2002, The Messenger, 110, 9
Queloz, D., Henry, G. W., Sivan, J. P., et al. 2001, A&A, 379, 279
Santos, N. C., Bouchy, F., Mayor, M., et al. 2004a, A&A, 426, L19
Santos, N. C., Israelian, G., & Mayor, M. 2004b, A&A, 415, 1153
Santos, N. C., Mayor, M., Naef, D., et al. 2000, A&A, 361, 265
Santos, N. C., Mayor, M., Naef, D., et al. 2002, A&A, 392, 215
Schaerer, D., Meynet, G., Maeder, A., & Schaller, G. 1993, A&AS, 98, 523
Schaller, G., Schaerer, D., Meynet, G., & Maeder, A. 1992, A&AS, 96, 269
Sestito, P. & Randich, S. 2005, A&A, 442, 615
Udry, S., Mayor, M., Naef, D., et al. 2000, A&A, 356, 590
Wright, J. T. 2005, PASP, 117, 657
	Introduction
	Stellar characteristics of HD100777, HD190647, and HD221287
	HARPS radial-velocity data
	A 1.16MJup planet around HD100777
	A 1.9MJup planet orbiting HD190647
	A 3.1MJup planetary companion to HD221287
	Residuals to the HD221287 orbital fit
	Conclusion
ABSTRACT
  The HARPS high-resolution high-accuracy spectrograph is offered to the
astronomical community since the second half of 2003. Since then, we have been
using this instrument for monitoring radial velocities of a large sample of
Solar-type stars (~1400 stars) in order to search for their possible low-mass
companions. Amongst the goals of our survey, one is to significantly increase
the number of detected extra-solar planets in a volume-limited sample to
improve our knowledge of their orbital elements distributions and thus obtain
better constraints for planet-formation models.
  In this paper, we present the HARPS radial-velocity data and orbital
solutions for 3 Solar-type stars: HD 100777, HD 190647, and HD 221287. The
radial-velocity data of HD 100777 is best explained by the presence of a 1.1
M_Jup planetary companion on a 384--day eccentric orbit (e=0.36). The orbital
fit obtained for the slightly evolved star HD 190647 reveals the presence of a
long-period (P=1038 d) 1.9 M_Jup planetary companion on a moderately eccentric
orbit (e=0.18). HD 221287 is hosting a 3.1 M_Jup planet on a 456--day orbit.
The shape of this orbit is not very well constrained because of our non-optimal
temporal coverage and because of the presence of abnormally large residuals. We
find clues for these large residuals to result from spectral line profile
variations probably induced by stellar activity related processes.

<|endoftext|><|startoftext|>
Introduction
A Bayesian network or directed graphical model is a statistical model that uses a directed
acyclic graph (DAG) to represent the conditional independence structures between collections
of random variables. The word Bayesian is used to describe these models because the nodes
in the graph can be used to represent random variables that correspond to parameters or hy-
perparameters, though the basic models themselves are not a priori Bayesian. These models
are used throughout computational statistics to model complex interactions between collections
of random variables. For instance, tree models are used in computational biology for sequence
alignment [4] and in phylogenetics [5, 15]. Special cases of Bayesian networks include familiar
models from statistics like factor analysis [3] and the hidden Markov model [4].
The DAG that specifies the Bayesian network specifies the model in two ways. The first is
through a recursive factorization of the parametrization, via restricted conditional distributions.
The second method is via the conditional independence statements implied by the graph. The
recursive factorization theorem [13, Thm 3.27] says that these two methods for specifying a
Bayesian network yield the same family of probability density functions.
When the underlying random variables are Gaussian or discrete, conditional independence
statements can be interpreted as algebraic constraints on the parameter space of the global
model. In the Gaussian case, this means that conditional independence corresponds to algebraic
constraints on the cone of positive definite matrices. One of our main goals in this paper is to
explore the recursive factorization theorem using algebraic techniques in the case of Gaussian
random variables, with a view towards the case of hidden random variables. In this sense, the
current paper is a generalization of the work began in [3] which concerned the special case of
factor analysis. Some past work has been done on the algebraic geometry of Bayesian networks
in the discrete case in [6, 7], but there are many open questions that remain in both the Gaussian
and the discrete case.
2 SETH SULLIVANT
In the next section, we describe a combinatorial parametrization of a Bayesian network in
the Gaussian case. In statistics, this parametrization in known as the trek rule [17]. We also
describe the algebraic interpretation of conditional independence in the Gaussian case which
leads us to our main problem: comparing the vanishing ideal of the model IG to the conditional
independence ideal CG. Section 3 describes the results of computations regarding the ideals
of Bayesian networks, and some algebraic conjectures that these computations suggest. In
particular, we conjecture that the coordinate ring of a Bayesian network is always normal and
Cohen-Macaulay.
As a first application of our algebraic perspective on Gaussian Bayesian networks, we provide a
new and greatly simplified proof of the tetrad representation theorem [17, Thm 6.10] in Section
4. Then in Section 5 we provide an extensive study of trees in the fully observed case. In
particular, we prove that for any tree T , the ideal IT is a toric ideal generated by linear forms
and quadrics that correspond to conditional independence statements implied by T . Techniques
from polyhedral geometry are used to show that C[Σ]/IT is always normal and Cohen-Macaulay.
Sections 6 and 7 are concerned with the study of hidden variable models. In Section 6 we
prove the Upstream Variables Theorem (Theorem 6.4) which shows that IG is homogeneous
with respect to a two dimensional multigrading induced by upstream random variables. As
a corollary, we deduce that hidden tree models are generated by tetrad constraints. Finally
in Section 7 we show that models with hidden variables include, as special cases, a number
of classical constructions from algebraic geometry. These include toric degenerations of the
Grassmannian, matrix Schubert varieties, and secant varieties.
Acknowledgments. I would like to thank Mathias Drton, Thomas Richardson, Mike Stillman,
and Bernd Sturmfels for helpful comments and discussions about the results in this paper. The
IMA provided funding and computer equipment while I worked on parts of this project.
2. Parametrization and Conditional Independence
Let G be a directed acyclic graph (DAG) with vertex set V (G) and edge set E(G). Often,
we will assume that V (G) = [n] := {1, 2, . . . , n}. To guarantee the acyclic assumption, we
assume that the vertices are numerically ordered ; that is, i → j ∈ E(G) only if i < j. The
Bayesian network associated to this graph can be specified by either a recursive factorization
formula or by conditional independence statements. We focus first on the recursive factorization
representation, and use it to derive an algebraic description of the parametrization. Then we
introduce the conditional independence constraints that vanish on the model and the ideal that
these constraints generate.
Let X = (X1, . . . , Xn) be a random vector, and let f(x) denote the probability density
function of this random vector. Bayes’ theorem says that this joint density can be factorized as
a product
f(x) =
fi(xi|x1, . . . , xi−1),
where fi(xi|x1, . . . , xi−1) denotes the conditional density of Xi given X1 = x1, . . . , Xi−1 = xi−1.
The recursive factorization property of the graphical model is that each of the conditional
densities fi(xi|x1, . . . , xi−1) only depends on the parents pa(i) = {j ∈ [n] | j → i ∈ E(G)}. We
ALGEBRAIC GEOMETRY OF GAUSSIAN BAYESIAN NETWORKS 3
can rewrite this representation as
fi(xi|x1, . . . , xi−1) = fi(xi|xpa(i)).
Thus, a density function f belongs to the Bayesian network if it factorizes as
f(x) =
fi(xi|xpa(i)).
To explore the consequences of this parametrization in the Gaussian case, we first need to
recall some basic facts about Gaussian random variables. Each n-dimensional Gaussian random
variable X is completely specified by its mean vector µ and its positive definite covariance matrix
Σ. Given these data, the joint density function of X is given by
f(x) =
(2π)n/2|Σ|1/2
exp(−
(x− µ)T Σ−1(x− µ)),
where |Σ| is the determinant of Σ. Rather than writing out the density every time, the short-
hand X ∼ N (µ,Σ) is used to indicate that X is a Gaussian random variable with mean µ
and covariance matrix Σ. The multivariate Gaussian generalizes the familiar “bell curve” of
a univariate Gaussian and is an important distribution in probability theory and multivariate
statistics because of the central limit theorem [1].
Given an n-dimensional random variable X and A ⊆ [n], let XA = (Xa)a∈A. Similarly, if
x is a vector, then xA is the subvector indexed by A. For a matrix Σ, ΣA,B is the submatrix
of Σ with row index set A and column index set B. Among the nice properties of Gaussian
random variables are the fact that marginalization and conditioning both preserve the Gaussian
property; see [1].
Lemma 2.1. Suppose that X ∼ N (µ,Σ) and let A,B ⊆ [n] be disjoint. Then
(1) XA ∼ N (µA,ΣA,A) and
(2) XA|XB = xB ∼ N (µA + ΣA,BΣ−1B,B(xB − µB),ΣA,A − ΣA,BΣ
B,BΣB,A).
To build the Gaussian Bayesian network associated to the DAG G, we allow any Gaussian con-
ditional distribution for the distribution f(xi|xpa(i)). This conditional distribution is recovered
by saying that
i∈pa(j)
λijXi +Wj
where Wj ∼ N (νj , ψ2j ) and is independent of the Xi with i < j, and the λij are the regression
parameters. Linear transformations of Gaussian random variables are Gaussian, and thus X is
also a Gaussian random variable. Since X is completely specified by its mean µ and covariance
matrix Σ, we must calculate these from the conditional distribution. The recursive expression
for the distribution of Xj given the variables preceding it yields a straightforward and recursive
expression for the mean and covariance. Namely
µj = E(Xj) = E(
i∈pa(j)
λijXi +Wj) =
i∈pa(j)
λijµi + νj
4 SETH SULLIVANT
and if k < j the covariance is:
σkj = E ((Xk − µk)(Xj − µj))
(Xk − µk)
i∈pa(j)
λij(Xi − µi) +Wj − νj
i∈pa(j)
λijE ((Xk − µk)(Xi − µi)) + E ((Xk − µk)(Wj − νj))
i∈pa(j)
λijσik
and the variance satisfies:
σjj = E
(Xj − µj)2
i∈pa(j)
λij(Xi − µi) +Wj − νj
i∈pa(j)
k∈pa(j)
λijλkjσik + ψ
If there are no constraints on the vector ν, there will be no constraints on µ either. Thus, we
will focus attention on the constraints on the covariance matrix Σ. If we further assume that the
ψ2j are completely unconstrained, this will imply that we can replace the messy expression for
the covariance σjj by a simple new parameter aj . This leads us to the algebraic representation
of our model, called the trek rule [17].
For each edge i → j ∈ E(G) let λij be an indeterminate and for each vertex i ∈ V (G) let ai
be an indeterminate. Assume that the vertices are numerically ordered, that is i → j ∈ E(G)
only if i < j. A collider is a pair of edges i → k, j → k with the same head. For each pair of
vertices i, j, let T (i, j) be the collection of simple paths P in G from i to j such that there is no
collider in P . Such a colliderless path is called a trek. The name trek come from the fact that
every colliderless path from i to j consists of a path from i up to some topmost element top(P )
and then from top(P ) back down to j. We think of each trek as a sequence of edges k → l. If
i = j, T (i, i) consists of a single empty trek from i to itself.
Let φG be the ring homomorphism
φG : C[σij | 1 ≤ i ≤ j ≤ n]→ C[ai, λij | i, j ∈ [n]i→ j ∈ E(G)]
σij 7→
P∈T (i,j)
atop(P ) ·
k→l∈P
When i = j, we get σii = ai. If there is no trek in T (i, j), then φG(σij) = 0. Let IG = kerφG.
Since IG is the kernel of a ring homomorphism, it is a prime ideal.
ALGEBRAIC GEOMETRY OF GAUSSIAN BAYESIAN NETWORKS 5
Example 2.2. Let G be the directed graph on four vertices with edges 1 → 2, 1 → 3, 2 → 4,
and 3→ 4. The ring homomorphism φG is given by
σ11 7→ a1 σ12 7→ a1λ12 σ13 7→ a1λ13 σ14 7→ a1λ12λ24 + a1λ13λ34
σ22 7→ a2 σ23 7→ a1λ12λ13 σ24 7→ a2λ24 + a1λ12λ13λ34
σ33 7→ a3 σ34 7→ a3λ34 + a1λ13λ12λ24
σ44 7→ a4
The ideal IG is the complete intersection of a quadric and a cubic:
σ11σ23 − σ13σ21, σ12σ23σ34 + σ13σ24σ23 + σ14σ22σ33 − σ13σ24σ33 − σ13σ22σ34 − σ14σ223
Dual to the ring homomorphism is the rational parametrization
φ∗G : R
E(G)+V (G) → R(
φ∗G(a, λ) = (
P∈T (i,j)
atop(P ) ·
k→l∈P
λkl)i,j .
We will often write σij(a, λ) to denote the coordinate polynomial that represents this function.
Let Ω ⊂ RE(G)+V (G) be the subset of parameter space satisfying the constraints:
j∈pa(i)
k∈pa(i)
λjiλkiσjk(a, λ)
for all i, where in the case that pa(i) = ∅ the sum is zero.
Proposition 2.3. [Trek Rule] The set of covariance matrices in the Gaussian Bayesian network
associated to G is the image φ∗G(Ω). In particular, IG is the vanishing ideal of the model.
The proof of the trek rule parametrization can also be found in [17].
Proof. The proof goes by induction. First, we make the substitution
i∈pa(j)
k∈pa(j)
λijλkjσik + ψ
which is valid because, given the λij ’s, ψ2j can be recovered from aj and vice versa. Clearly
σ11 = a1. By induction, suppose that the desired formula holds for all σij with i, j < n. We
want to show that σin has the same formula. Now from above, we have
σin =
k∈pa(n)
λknσik
k∈pa(n)
P∈T (i,k)
atop(P ) ·
r→s∈P
This last expression is a factorization of φ(σkn) since every trek from i to n is the union of a
trek P ∈ T (i, k) and an edge k → n where k is some parent of n. �
The parameters used in the trek rule parametrization are a little unusual because they involve
a mix of the natural parameters (regression coefficients λij) and coordinates on the image space
(variance parameters ai). While this mix might seem unusual from a statistical standpoint,
we find that this parametrization is rather useful for exploring the algebraic structure of the
covariance matrices that come from the model. For instance:
6 SETH SULLIVANT
Corollary 2.4. If T is a tree, then IT is a toric ideal.
Proof. For any pair of vertices i, j in T , there is at most one trek between i and j. Thus φ(σij)
is a monomial and IT is a toric ideal. �
In fact, as we will show in Section 5, when T is a tree, IT is generated by linear forms and
quadratic binomials that correspond to conditional independence statements implied by the
graph. Before getting to properties of conditional independence, we first note that these models
are identifiable. That is, it is possible to recover the λij and ai parameters directly from Σ. This
also allows us to determine the most basic invariant of IG, namely its dimension.
Proposition 2.5. The parametrization φ∗G is birational. In other words, the model parameters
λij and ai are identifiable and dim IG = #V (G) + #E(G).
Proof. It suffices to prove that the parameters are identifiable via rational functions of the entries
of Σ, as all the other statements follow from this. We have ai = σii so the ai parameters are
identifiable. We also know that for i < j
σij =
k∈pa(j)
σikλkj .
Thus, we have the matrix equation
Σpa(j),j = Σpa(j),pa(j)λpa(j),j
where λpa(j),j is the vector (λij)
i∈pa(j). Since Σpa(j),pa(j) is invertible in the positive definite
cone, we have the rational formula
λpa(j),j = Σ
pa(j),pa(j)
Σpa(j),j
and the λij parameters are identifiable. �
One of the problems we want to explore is the connection between the prime ideal defining the
graphical model (and thus the image of the parametrization) and the relationship to the ideal
determined by the independence statements induced by the model. To explain this connection,
we need to recall some information about the algebraic nature of conditional independence.
Recall the definition of conditional independence.
Definition 2.6. Let A, B, and C be disjoint subsets of [n], indexing subsets of the random
vector X. The conditional independence statement A⊥⊥B|C (“A is independent of B given C)
holds if and only if
f(xA, xB|xC) = f(xA|xC)f(xB|xC)
for all xC such that f(xC) 6= 0.
We refer to [13] for a more extensive introduction to conditional independence. In the Gauss-
ian case, a conditional independence statement is equivalent to an algebraic restriction on the
covariance matrix.
Proposition 2.7. Let A,B,C be disjoint subsets of [n]. Then X ∼ N (µ,Σ) satisfies the con-
ditional independence constraint A⊥⊥B|C if and only if the submatrix ΣA∪C,B∪C has rank less
than or equal to #C.
ALGEBRAIC GEOMETRY OF GAUSSIAN BAYESIAN NETWORKS 7
Proof. If X ∼ N (µ, σ), then
XA∪B|XC = xC ∼ N
µA∪B + ΣA∪B,CΣ
C,C(xC − µC),ΣA∪B,A∪B − ΣA∪B,CΣ
C,CΣC,A∪B
The CI statement A⊥⊥B|C holds if and only if (ΣA∪B,A∪B − ΣA∪B,CΣ−1C,CΣC,A∪B)A,B = 0. The
A,B submatrix of ΣA∪B,A∪B − ΣA∪B,CΣ−1C,CΣC,A∪B is easily seen to be ΣA,B − ΣA,CΣ
C,CΣC,B
which is the Schur complement of the matrix
ΣA∪C,B∪C =
ΣA,B ΣA,C
ΣC,B ΣC,C
Since ΣC,C is always invertible (it is positive definite), the Schur complement is zero if and only
if the matrix ΣA∪C,B∪C has rank less than or equal to #C. �
Given a DAG G, a collection of conditional independence statements are forced on the joint
distribution by the nature of the graph. These independence statements are usually described
via the notion of d-separation (the d stands for “directed”).
Definition 2.8. Let A, B, and C be disjoint subsets of [n]. The set C d-separates A and B if
every path in G connecting a vertex i ∈ A and B ∈ j contains a vertex k that is either
(1) a non-collider that belongs to C or
(2) a collider that does not belong to C and has no descendants that belong to C.
Note that C might be empty in the definition of d-separation.
Proposition 2.9 ([13]). The conditional independence statement A⊥⊥B|C holds for the Bayesian
network associated to G if and only if C d-separates A from B in G.
A joint probability distribution that satisfies all the conditional independence statements
implied by the graph G is said to satisfy the global Markov property of G. The following theorem
is a staple of the literature of graphical models, that holds with respect to any σ-algebra.
Theorem 2.10 (Recursive Factorization Theorem). [13, Thm 3.27] A probability density has
the recursive factorization property with respect to G if and only if it satisfies the global Markov
property.
Definition 2.11. Let CG ⊆ C[Σ] be the ideal generated by the minors of Σ corresponding to
the conditional independence statements implied by G; that is,
CG = 〈(#C + 1) minors of ΣA∪C,B∪C | C d-separates A from B in G〉 .
The ideal CG is called the conditional independence ideal of G.
A direct geometric consequence of the recursive factorization theorem is the following
Corollary 2.12. For any DAG G,
V (IG) ∩ PDn = V (CG) ∩ PDn.
In the corollary PDn ⊂ R(
2 ) is the cone of n × n positive definite symmetric matrices. It
seems natural to ask whether or not IG = CG for all DAGs G. For instance, this was true for
the DAG in Example 2.2. The Verma graph provides a natural counterexample.
8 SETH SULLIVANT
2 3 4 5
Example 2.13. Let G be the DAG on five vertices with edges 1 → 3, 1 → 5, 2 → 3, 2 → 4,
3→ 4, and 4→ 5. This graph is often called the Verma graph.
The conditional independence statements implied by the model are all implied by the three
statements 1⊥⊥2, 1⊥⊥4|{2, 3}, and {2, 3}⊥⊥5|{1, 4}. Thus, the conditional independence ideal
CG is generated by one linear form and five determinantal cubics. In this case, we find that
IG = CG + 〈f〉 where f is the degree four polynomial:
f = σ23σ24σ25σ34 − σ22σ25σ234 − σ23σ
24σ35 + σ22σ24σ34σ35
−σ223σ25σ44 + σ22σ25σ33σ44 + σ
23σ24σ45 − σ22σ24σ33σ45.
We found that the primary decomposition of CG is
CG = IG ∩ 〈σ11, σ12, σ13, σ14〉
so that f is not even in the radical of CG. Thus, the zero set of CG inside the positive semidefinite
cone contains singular covariance matrices that are not limits of distributions that belong to the
model. Note that since none of the indices of the σij appearing in f contain 1, f vanishes on
the marginal distribution for the random vector (X2, X3, X4, X5). This is the Gaussian version
of what is often called the Verma constraint. Note that this computation shows that the Verma
constraint is still needed as a generator of the unmarginalized Verma model. �
The rest of this paper is concerned with studying the ideals IG and investigating the circum-
stances that guarantee that CG = IG. We report on results of a computational study in the
next section. Towards the end of the paper, we study the ideals IG,O that arise when some of
the random variables are hidden.
3. Computational Study
Whenever approaching a new family of ideals, our first instinct is to compute as many exam-
ples as possible to gain some intuition about the structure of the ideals. This section summarizes
the results of our computational explorations.
We used Macaulay2 [9] to compute the generating sets of all ideals IG for all DAGs G on n ≤ 6
vertices. Our computational results concerning the problem of when CG = IG are summarized
in the following proposition.
Proposition 3.1. All DAGs on n ≤ 4 vertices satisfy CG = IG. Of the 302 DAGs on n = 5
vertices, exactly 293 satisfy CG = IG. Of the 5984 DAGs on n = 6 vertices exactly 4993 satisfy
CG = IG.
On n = 5 vertices, there were precisely nine graphs that fail to satisfy CG = IG. These
nine exceptional graphs are listed below. The numberings of the DAGs come from the Atlas of
ALGEBRAIC GEOMETRY OF GAUSSIAN BAYESIAN NETWORKS 9
Graphs [14]. Note that the Verma graph from Example 2.13 appears as A218 after relabeling
vertices.
(1) A139: 1→ 4, 1→ 5, 2→ 4, 3→ 4, 4→ 5.
(2) A146: 1→ 3, 2→ 3, 2→ 5, 3→ 4, 4→ 5.
(3) A197: 1→ 2, 1→ 3, 1→ 5, 2→ 4, 3→ 4, 4→ 5.
(4) A216: 1→ 2, 1→ 4, 2→ 3, 2→ 5, 3→ 4, 4→ 5.
(5) A217: 1→ 3, 1→ 4, 2→ 4, 2→ 5, 3→ 4, 4→ 5.
(6) A218: 1→ 3, 1→ 4, 2→ 3, 2→ 5, 3→ 4, 4→ 5.
(7) A275: 1→ 2, 1→ 4, 1→ 5, 2→ 3, 2→ 5, 3→ 4, 4→ 5.
(8) A277: 1→ 2, 1→ 3, 1→ 5, 2→ 4, 3→ 4, 3→ 5, 4→ 5.
(9) A292: 1→ 2, 1→ 4, 2→ 3, 2→ 5, 3→ 4, 3→ 5, 4→ 5.
The table below displays the numbers of minimal generators of different degrees for each of the
ideals IG where G is one of the nine graphs on five vertices such that CG 6= IG. The coincidences
among rows in this table arise because sometimes two different graphs yield the same family of
probability distributions. This phenomenon is known as Markov equivalence [13, 17].
Network 1 2 3 4 5
A139 3 1 2 0 0
A146 1 3 7 0 0
A197 0 1 5 0 1
A216 0 1 5 0 1
A217 2 1 2 0 0
A218 1 0 5 1 0
A275 0 1 1 1 3
A277 0 1 1 1 3
A292 0 1 1 1 3
It is worth noting the methods that we used to perform our computations, in particular,
how we computed generators for the ideals IG. Rather than using the trek rule directly, and
computing the vanishing ideal of the parametrization, we exploited the recursive nature of the
parametrization to determine IG. This is summarized by the following proposition.
Proposition 3.2. Let G be a DAG and G \ n the DAG with vertex n removed. Then
IG\n +
σin −
j∈pa(n)
λjnσij | i ∈ [n− 1]
〉⋂C[σij | i, j ∈ [n]]
where the ideal IG\n is considered as a graph on n− 1 vertices.
Proof. This is a direct consequence of the trek rule: every trek that goes to n passes through a
parent of n and cannot go below n. �
Based on our (limited) computations up to n = 6 we propose some optimistic conjectures
about the structures of the ideals IG.
10 SETH SULLIVANT
Conjecture 3.3.
IG = CG :
A⊂[n]
(|ΣA,A|)∞
Conjecture 3.3 says that all the uninteresting components of CG (that is, the components that
do not correspond to probability density functions) lie on the boundary of the positive definite
cone. Conjecture 3.3 was verified for all DAGs on n ≤ 5 vertices. Our computational evidence
also suggests that all the ideals IG are Cohen-Macaulay and normal, even for graphs with loops
and other complicated graphical structures.
Conjecture 3.4. The quotient ring C[Σ]/IG is normal and Cohen-Macaulay for all G.
Conjecture 3.4 was verified computationally for all graphs on n ≤ 5 vertices and graphs with
n = 6 vertices and less than 8 edges. We prove Conjecture 3.4 when the underlying graph is a
tree in Section 5. A more negative conjecture concerns the graphs such that IG = CG.
Conjecture 3.5. The proportion of DAGs on n vertices such that IG = CG tends to zero as
To close the section, we provide a few useful propositions for reducing the computation of the
generating set of the ideal IG to the ideals for smaller graphs.
Proposition 3.6. Suppose that G is a disjoint union of two subgraph G = G1 ∪G2. Then
IG = IG1 + IG2 + 〈σij | i ∈ V (G1), j ∈ V (G2)〉 .
Proof. In the parametrization φG, we have φG(σij) = 0 if i ∈ V (G1) and j ∈ V (G2), because
there is no trek from i to j. Furthermore, φG(σij) = φG1(σij) if i, j ∈ V (G1) and φG(σkl) =
φG2(σkl) if k, l ∈ V (G2) and these polynomials are in disjoint sets of variables. Thus, there can
be no nontrivial relations involving both σij and σkl. �
Proposition 3.7. Let G be a DAG with a vertex m with no children and a decomposition into
two induced subgraphs G = G1 ∪G2 such that V (G1) ∩ V (G2) = {m}. Then
IG = IG1 + IG2 + 〈σij | i ∈ V (G1) \ {m}, j ∈ V (G2) \ {m}〉 .
Proof. In the paremtrization φG, we have φG(σij) = 0 if i ∈ V (G1) \ {m} and j ∈ V (G2) \ {m},
because there is no trek from i to j. Furthermore φG(σij) = φG1(σij) if i, j ∈ V (G1) and
φG(σkl) = φG2(σkl) if k, l ∈ V (G2) and these polynomials are in disjoint sets of variables unless
i = j = k = l = m. However, in this final case, φG(σmm) = am and this is the only occurrence
of am in any of the expressions φG(σij). This is a consequence of the fact that vertex m has no
children. Thus, we have a partition of the σij into three sets in which φG(σij) appear in disjoint
sets of variables and there can be no nontrivial relations involving two or more of these sets of
variables. �
Proposition 3.8. Suppose that for all i ∈ [n − 1], the edge i → n ∈ E(G). Let G \ n be the
DAG obtained from G by removing the vertex n. Then
IG = IG\n · C[σij : i, j ∈ [n]].
ALGEBRAIC GEOMETRY OF GAUSSIAN BAYESIAN NETWORKS 11
Proof. Every vertex in G \ n is connected to n and is a parent of n. This implies that n cannot
appear in any conditional independence statement implied by G. Furthermore, if C d-separates
A from B in G\n, it will d-separate A from B in G, because n is below every vertex in G\n. This
implies that the CI statements that hold for G are precisely the same independence statements
that hold for G \ n. Thus
V (CG) ∩ PDn = V (CG\n · C[σij | i, j ∈ [n]]) ∩ PDn.
Since IG = I(V (CG) ∩ PDn), this implies the desired equality. �
4. Tetrad Representation Theorem
An important step towards understanding the ideals IG is to derive interpretations of the
polynomials in IG. We have an interpretation for a large part of IG, namely, the subideal
CG ⊆ IG. Conversely, we can ask when polynomials of a given form belong to the ideals IG.
Clearly, any linear polynomial in IG is a linear combination of polynomials of the form σij with
i 6= j, all of which must also belong to IG. Each linear polynomial σij corresponds to the
independence statement Xi⊥⊥Xj . Combinatorially, the linear from σij is in IG if and only if
there is no trek from i to j in G.
A stronger result of this form is the tetrad representation theorem, first proven in [17], which
gives a combinatorial characterization of when a tetrad difference
σijσkl − σilσjk
belongs to the ideal IG. The constraints do not necessarily correspond to conditional indepen-
dence statements, and need not belong to the ideal CG. This will be illustrated in Example
The original proof of the tetrad representation theorem in [17] is quite long and technical. Our
goal in this section is to show how our algebraic perspective can be used to greatly simplify the
proof. We also include this result here because we will need the tetrad representation theorem
in Section 5.
Definition 4.1. A vertex c ∈ V (G) is a choke point between sets I and J if every trek from a
point in I to a point in J contains c and either
(1) c is on the I-side of every trek from I to J , or
(2) c is on the J-side of every trek from I to J .
The set of all choke points in G between I and J is denoted C(I, J).
Example 4.2. In the graph c is a choke point between {1, 4} and {2, 3}, but is not a choke
point between {1, 2} and {3, 4}.
1 2 3 4
12 SETH SULLIVANT
Theorem 4.3 (Tetrad Representation Theorem [17]). The tetrad constraint σijσkl−σilσjk = 0
holds for all covariance matrices in the Bayesian network associated to G if and only if there is
a choke point in G between {i, k} and {j, l}.
Our proof of the tetrad representation theorem will follow after a few lemmas that lead to
the irreducible factorization of the polynomials σij(a, λ).
Lemma 4.4. In a fixed DAG G, every trek from I to J is incident to every choke point in
C(I, J) and they must be reached always in the same order.
Proof. If two choke points are on, say, the I side of every trek from I to J and there are two
treks which reach these choke points in different orders, there will be a directed cycle in G. If
the choke points c1 and c2 were on the I side and J side, respectively, and there were two treks
from I to J that reached them in a different order, this would contradict the property of being
a choke point. �
Lemma 4.5. Let i = c0, c1, . . . , ck = j be the ordered choke points in C({i}, {j}). Then the
irreducible factorization of σij(a, λ) is
σij(a, λ) =
f tij(a, λ)
where f tij(a, λ) only depends on λpq such that p and q are between choke points ct−1 and ct.
Proof. First of all, we will show that σij(a, λ) has a factorization as indicated. Then we will
show that the factors are irreducible. Define
f tij(a, λ) =
P∈T (i,j;ct−1,ct)
atop(P )
k→l∈P
where T (i, j; ct−1, ct) consists of all paths from ct−1 to ct that are partial treks from i to j (that
is, that can be completed to a trek from i to j) and atop(P ) = 1 if the top of the partial trek
P is not the top. When deciding whether or not the top is included in the partial trek, note
that almost all choke points are associated with either the {i} side or the {j} side. So there is
a natural way to decide if atop(P ) is included or not. In the exceptional case that c is a choke
point on both the {i} and the {j} side, we repeat this choke point in the list. This is because c
must be the top of every trek from i to j, and we will get a factor f tij(a, λ) = ac.
Since each ct is a choke point between i and j, the product of the monomials, one from each
f tij , is the monomial corresponding to a trek from i to j. Conversely, every monomial arises as
such a product in a unique way. This proves that the desired factorization holds.
Now we will show that each of the f tij(a, λ) cannot factorize further. Note that every monomial
in f tij(a, λ) is squarefree in all the a and λ indeterminates. This means that every monomial
appearing in f tij(a, λ) is a vertex of the Newton polytope of f
ij(a, λ). This, in turn, implies
that in any factorization f tij(a, λ) = fg there is no cancellation since in any factorization of any
polynomial, the vertices of the Newton polytope is the product of two vertices of the constituent
Newton polytopes. This means that in any factorization f tij(a, λ) = fg, f and g can be chosen
to be the sums of squarefree monomials all with coefficient 1.
ALGEBRAIC GEOMETRY OF GAUSSIAN BAYESIAN NETWORKS 13
Now let f tij(a, λ) = fg be any factorization and let m be a monomial appearing in f
ij(a, λ).
If the factorization is nontrivial m = mfmg where mf and mg are monomials in f and g
respectively. Since the factorization is nontrivial and m corresponds to a partial trek P in
T (i, j; ct−1, ct), there must exist a c on P such that, without loss of generality such that λpc
appears in mf and λcq appears in mg. Since every monomial in the expansion of fg corresponds
to a partial trek from ct−1 to ct it must be the case that every monomial in f contains an
indeterminate λsc from some s and similarly, every monomial appearing in g contains a λcs for
some s. But this implies that every partial trek from ct−1 to ct passes through c, with the same
directionality, that is, it is a choke point between i and j. However, this contradicts the fact the
C({i}, {j}) = {c0, . . . , ct}. �
Proof of Thm 4.3. Suppose that the vanishing tetrad condition holds, that is,
σijσkl = σilσkj
for all covariance matrices in the model. This factorization must thus also hold when we sub-
stitute the polynomial expressions in the parametrization:
σij(a, λ)σkl(a, λ) = σil(a, λ)σkj(a, λ).
Assuming that none of these polynomials are zero (in which case the choke condition is satisfied
for trivial reasons), this means that each factor f tij(a, λ) must appear on both the left and the
right-hand sides of this expression. This is a consequence of the fact that polynomial rings
over fields are unique factorization domains. The first factor f1ij(a, λ) could only be a factor of
σil(a, λ). There exists a unique t ≥ 1 such that f1ij · · · f
ij divides σil but f
ij · · · f
ij does not
divide σil. This implies that f
ij divides σkj . However, this implies that ct is a choke point
between i and j, between i and l, between k and j. Furthermore, this will imply that ct is a
choke point between k and l as well, which implies that ct is a choke point between {i, k} and
{j, l}.
Conversely, suppose that there is a choke point c between {i, k} and {j, l}. Our unique
factorization of the σij implies that we can write
σij = f1g1, σkl = f2g2, σil = f1g2, σkj = f2g1
where f1 and f2 corresponds to partial treks from i to c and k to c, respectively, and g1 and g2
correspond to partial treks from c to j and l, respectively. Then we have
σijσkl = f1g1f2g2 = σilσkj ,
so that Σ satisfies the tetrad constraint. �
At first glance, it is tempting to suggest that the tetrad representation theorem says that
a tetrad vanishes for every covariance matrix in the model if and only if an associated condi-
tional independence statement holds. Unfortunately, this is not true, as the following example
illustrates.
Example 4.6. Let A139 be the graph with edges 1→ 4, 1→ 5, 2→ 4, 3→ 4 and 4→ 5. Then
4 is a choke point between {2, 3} and {4, 5} and the tetrad σ24σ35 − σ25σ34 belongs to IA139 .
However, it is not implied by the conditional independence statements implied by the graph
(that is, σ24σ35 − σ25σ34 /∈ CA139). It is precisely this extra tetrad constraint that forces A139
onto the list of graphs that satisfy CG 6= IG from Section 3.
14 SETH SULLIVANT
In particular, a choke point between two sets need not be a d-separator of those sets. In the
case that G is a tree, it is true that tetrad constraints are conditional independence constraints.
Proposition 4.7. Let T be a tree and suppose that c is a choke point between I and J in T .
Then either c d-separates I \ {c} and J \ {c} or ∅ d-separates I \ {c} and J \ {c}.
Proof. Since T is a tree, there is a unique path from an element in I \ c to an element in J \ c.
If this path is not a trek, we have ∅ d-separates I \ {c} from J \ {c}. On the other hand, if this
path is always a trek we see that {c} d-separates I \ {c} from J \ {c}. �
The tetrad representation theorem gives a simple combinatorial rule for determining when a
2 × 2 minor of Σ is in IG. More generally, we believe that there should exist a graph theoretic
rule that determines when a general determinant |ΣA,B| ∈ IG in terms of structural features of
the DAG G. The technique we have used above, which relies on giving a factorization of the
polynomials σ(a, λ), does not seem like it will extend to higher order minors. One approach
at a generalization of the tetrad representation theorem would be to find a cancellation free
expression for the determinant |ΣA,B| in terms of the parameters ai and λij , along the lines of
the Gessel-Viennot theorem [8]. From such a result, one could deduce a combinatorial rule for
when |ΣA,B| is zero. This suggests the following problem.
Problem 4.8. Develop a Gessel-Viennot theorem for treks; that is, determine a combinatorial
formula for the expansion of |ΣA,B| in terms of the treks in G.
5. Fully Observed Trees
In this section we study the Bayesian networks of trees in the situation where all random
variables are observed. We show that the toric ideal IT is generated by linear forms σij and
quadratic tetrad constraints. The Tetrad Representation Theorem and Proposition 4.7 then
imply that IT = CT . We also investigate further algebraic properties of the ideals IT using the
fact that IT is a toric ideal and some techniques from polyhedral geometry.
For the rest of this section, we assume that T is a tree, where by a tree we mean a DAG
whose underlying undirected graph is a tree. These graphs are sometimes called polytrees in
the graphical models literature. A directed tree is a tree all of whose edges are directed away
from a given source vertex.
Since IT is a toric ideal, it can be analyzed using techniques from polyhedral geometry. In
particular, for each i, j such that T (i, j) is nonempty, let aij denote the exponent vector of the
monomial σij = atop(P )
k→l∈P λkl. Let AT denote the set of all these exponent vectors. The
geometry of the toric variety V (IT ) is determined by the discrete geometry of the polytope
PT = conv(AT ).
The polytope PT is naturally embedded in R2n−1, where n of the coordinates on R2n−1
correspond to the vertices of T and n − 1 of the coordinates correspond to the edges of T .
Denote the first set of coordinates by xi and the second by yij where i→ j is an edge in T . Our
first results is a description of the facet structure of the polytope PT .
Theorem 5.1. The polytope PT is the solution to the following set of equations and inequalities:
ALGEBRAIC GEOMETRY OF GAUSSIAN BAYESIAN NETWORKS 15
xi ≥ 0 for all i ∈ V (T )
yij ≥ 0 for all i→ j ∈ E(T )∑
i∈V (T ) xi = 1
i: i→j∈E(T ) yij − yjk ≥ 0 for all j → k ∈ E(T )
2xj +
i: i→j∈E(T ) yij −
k: j→k∈E(T ) yjk ≥ 0 for all j ∈ V (T ).
Proof. Let QT denote the polyhedron defined as the solution space to the given constraints.
First of all, QT is bounded. To see this, first note that because of the positive constraints and
the equation
i∈V (T ) xi = 1, we have that xi ≤ 1 is implied by the given constraints. Then,
starting from the sources of the tree and working our way down the edges repeatedly using the
inequalities xj +
i: i→j∈E(T ) yij − yjk ≥ 0, we see that the yij are also bounded.
Now, we have PT ⊆ QT , since every trek will satisfy any of the indicated constraints. Thus,
we must show that QT ⊆ PT . To do this, it suffices to show that for any vector (x0, y0) ∈ QT ,
there exists λ > 0, (x1, y1) and (x2, y2) such that
(x0, y0) = λ(x1, y1) + (1− λ)(x2, y2)
where (x1, y1) is one of the 0/1 vectors aij and (x2, y2) ∈ QT . Because QT is bounded, this
will imply that the extreme points of QT are a subset of the extreme points of PT , and hence
QT ⊆ PT . Without loss of generality we may suppose that all of the coordinates y0ij are positive,
otherwise the problem reduces to a smaller tree or forest because the resulting inequalities that
arise when yij = 0 are precisely those that are necessary for the smaller tree. Note that for
a forest F , the polytope PF is the direct join of polytopes PT as T ranges over the connected
components of F , by Proposition 3.6.
For any fixed j, there cannot exist distinct values k1, k2, and k3 such that all of
x0j +
i: i→j∈E(T )
y0ij − y
x0j +
i: i→j∈E(T )
y0ij − y
x0j +
i: i→j∈E(T )
y0ij − y
hold. If there were, we could add these three equations together to deduce that
3x0j + 3
i: i→j∈E(T )
y0ij − y
− y0jk2 − y
This in turn implies that
2x0j +
i: i→j∈E(T )
y0ij − y
− y0jk2 − y
with equality if and only if pa(j) = ∅ and x0j = 0. This in turn implies that, for instance,
y0jk1 = 0 contradicting our assumption that y
ij > 0 for all i and j. By a similar argument, if
exactly two of these facet defining inequalities hold sharply, we see that
2x0j + 2
i: i→j∈E(T )
y0ij − y
− y0jk2 = 0
16 SETH SULLIVANT
which implies that j has exactly two descendants and no parents.
Now mark each edge j → k in the tree T such that
x0j +
i: i→j∈E(T )
y0ij − y
jk = 0.
By the preceding paragraph, we can find a trek P from a sink in the tree to a source in the tree
and (possibly) back to a different sink that has the property that for no i in the trek there exists
k not in the path such that i → k is a marked edge. That is, the preceding paragraph shows
that there can be at most 2 marked edges incident to any given vertex.
Given P , let (x1, y1) denote the corresponding 0/1 vector. We claim that there is a λ > 0
such that
(1) (x0, y0) = λ(x1, y1) + (1− λ)(x2, y2)
holds with (x2, y2) ∈ QT . Take λ > 0 to be any very small number and define (x2, y2) by the
given equation. Note that by construction the inequalities x2i ≥ 0 and y
ij ≥ 0 will be satisfied
since for all the nonzero entries in (x1, y1), the corresponding inequality for (x0, y0) must have
been nonstrict and λ is small. Furthermore, the constraint
x2i = 1 is also automatically
satisfied. It is also easy to see that the last set of inequalities will also be satisfied since through
each vertex the path will either have no edges, an incoming edge and an outgoing edge, or two
outgoing edges and the top vertex, all of which do not change the value of the linear functional.
Finally to see that the inequalities of the form
i: i→j∈E(T )
yij − yjk ≥ 0
are still satisfied by (x2, y2), note that marked edges of T are either contained in the path
P or not incident to the path P . Thus, the strict inequalities remain strict (since they will
involve modifying by an incoming edge and an outgoing edge or an outgoing edge and the top
vertex), and the nonstrict inequalities remain nonstrict since λ is small. Thus, we conclude that
QT ⊆ PT , which completes the proof. �
Corollary 5.2. Let ≺ be any reverse lexicographic term order such that σii � σjk for all i and
j 6= k. Then in≺(IT ) is squarefree. In other words, the associated pulling triangulation of PT is
unimodular.
Proof. The proof is purely polyhedral, and relies on the geometric connections between trian-
gulations and initial ideals of toric ideals. See Chapter 8 in [19] for background on this material
including pulling triangulations. Let aij denote the vertex of PT corresponding to the monomial
φG(σij). For i 6= j, each of the vertices aij has lattice distance at most one from any of the
facets described by Theorem 5.1. This is seen by evaluating each of the linear functionals at the
0/1 vector corresponding to the trek between i and j.
If we pull from one of these vertices we get a unimodular triangulation provided that the
induced pulling triangulation on each of the facets of PT not containing aij is unimodular. This
is because the normalized volume of a simplex is the volume of the base times the lattice distance
from the base to the vertex not on the base.
The facet defining inequalities of any face of PT are obtained by taking an appropriate subset
of the facet defining inequalities of PT . Thus, as we continue the pulling triangulation, if the
ALGEBRAIC GEOMETRY OF GAUSSIAN BAYESIAN NETWORKS 17
current face contains a vertex aij with i 6= j, we will pull from this vertex first and get a
unimodular pulling triangulation provided the induced pulling triangulation of every face is
unimodular. Thus, by induction, it suffices to show that the faces of PT that are the convex
hull of vertices aii have unimodular pulling triangulations. However, these faces are always
unimodular simplices. �
Corollary 5.3. The ring C[Σ]/IT is normal and Cohen-Macaulay when T is a tree.
Proof. Since PT has a unimodular triangulation, it is a normal polytope and hence the semigroup
ring C[Σ]/IT is normal. Hochster’s theorem [10] then implies that C[Σ]/IT is Cohen-Macaulay.
While we know that C[Σ]/IT is always Cohen-Macaulay, it remains to determine how the
Cohen-Macaulay type of IT depends on the underlying tree T . Here is a concrete conjecture
concerning the special case of Gorenstein trees.
Conjecture 5.4. Suppose that T is a directed tree. Then C[Σ]/IT is Gorenstein if and only if
the degree of every vertex in T is less than or equal to three.
A downward directed tree is a tree all of whose edges point to the unique sink in the tree. A
leaf of such a downward directed tree is then a source of the tree. With a little more refined
information about which inequalities defining PT are facet defining, we can deduce results about
the degrees of the ideals IT in some cases.
Corollary 5.5. Let T be a downward directed tree and let i be any leaf of T , s the sink of T ,
and P the unique trek in T (i, s). Then
deg IT =
k→l∈P
deg IT\k→l
where T \ k → l denotes the forest obtained from T by removing the edge k → l.
Proof. First of all, note that in the case of a downward directed tree the inequalities of the form
2xj +
i: i→j∈E(T )
yij −
k: j→k∈E(T )
yjk ≥ 0
are redundant: since each vertex has at most one descendant, it is implied by the the other
constraints. Also, for any source t, the inequality xt ≥ 0 is redundant, because it is implied by
the inequalities xt − ytj ≥ 0 and ytj ≥ 0 where j is the unique child of t.
Now we will compute the normalized volume of the polytope PT (which is equal to the degree
of the toric ideal IT ) by computing the pulling triangulation from Corollary and relating the
volumes of the pieces to the associated subforests.
Since the pulling triangulation of PT with ais pulled first is unimodular, the volume of PT is
the sum of the volumes of the facets of PT that do not contain ais. Note that ais lies on all the
facets of the form
i: i→j∈E(T )
yij − yjk ≥ 0
since through every vertex besides the source and sink, the trek has either zero or two edges
incident to it. Thus, the only facets that ais does not lie on are of the form ykl ≥ 0 such that
18 SETH SULLIVANT
k → l is an edge in the trek P . However, the facet of PT obtained by setting ykl = 0 is precisely
the polytope PT\k→l, which follows from Theorem 5.1. �
Note that upon removing an edge in a tree we obtain a forest. Proposition 3.6 implies that
the degree of such a forest is the product of the degrees of the associated trees. Since the degree
of the tree consisting of a single point is one, the formula from Corollary 5.5 yields a recursive
expression for the degree of a downward directed forest.
Corollary 5.6. Let Tn be the directed chain with n vertices. Then deg ITn =
, the n−1st
Catalan number.
Proof. In Corollary 5.5 we take the unique path from 1 to n. The resulting forests obtained
by removing an edge are the disjoint unions of two paths. By the product formula implied by
Proposition 3.6 we deduce that the degree of ITn satisfies the recurrence:
deg ITn =
deg ITi · deg ITn−i
with initial condition deg IT1 = 1. This is precisely the recurrence and initial conditions for the
Catalan numbers [18]. �
Now we want to prove the main result of this section, that the determinantal conditional
independence statements actually generate the ideal IT when T is a tree. To do this, we will
exploit the underlying toric structure, introduce a tableau notation for working with monomials,
and introduce an appropriate ordering of the variables.
Each variable σij that is not zero can be identified with the unique trek in T from i to j.
We associate to σij the tableau which records the elements of T in this unique trek, which is
represented like this:
σij = [aBi|aCj]
where B and C are (possibly empty) strings. If, say, i were at the top of the path, we would
write the tableau as
σij = [i|iCj].
The tableau is in its standard form if aBi is lexicographically earlier than aCj. We introduce a
lexicographic total order on standard form tableau variables by declaring [aA|aB] ≺ [cC|cD] if
aA is lexicographically smaller that cC, or if aA = cC and aB is lexicographically smaller than
cD. Given a monomial, its tableau representation is the row-wise concatenation of the tableau
forms of each of the variables appearing in the monomial.
Example 5.7. Let T be the tree with edges 1 → 3, 1 → 4, 2 → 4, 3 → 5, 3 → 6, 4 → 7,
and 4→ 8. Then the monomial σ14σ18σ24σ234σ38σ57σ78 has the standard form lexicographically
ALGEBRAIC GEOMETRY OF GAUSSIAN BAYESIAN NETWORKS 19
ordered tableau: 

1 148
13 14
13 14
13 148
135 147
47 48

Note that if a variable appears to the d-th power in a monomial, the representation for this
variable is repeated as d rows in the tableau. �
When we write out general tableau, lower-case letters will always correspond to single char-
acters (possibly empty) and upper case letters will always correspond to strings of characters
(also, possibly empty).
Theorem 5.8. For any tree T , the conditional independence statements implied by T generate
IT . In particular, IT is generated by linear polynomials σij and quadratic tetrad constraints.
Proof. First of all, we can ignore the linear polynomials as they always correspond to indepen-
dence constraints and work modulo these linear constraints when working with the toric ideal
IT . In addition, every quadratic binomial of the form σijσkl − σilσkj that belongs to IT is im-
plied by a conditional independence statement. This follows from Proposition 4.7. Note that
this holds even if the set {i, j, k, l} does not have four elements. Thus, it suffices to show that
IT modulo the linear constraints is generated by quadratic binomials.
To show that IT is generated by quadratic binomials, it suffices to show that any binomial in
IT can be written as a polynomial linear combination of the quadratic binomials in IT . This, in
turn, will be achieved by showing that we can “move” from the tableau representation of one
of the monomials to the other by making local changes that correspond to quadratic binomials.
To show this last part, we will define a sort of distance between two monomials and show that
it is always possible to decrease this distance using these quadratic binomials/ moves. This is a
typical trick for dealing with toric ideals, illustrated, for instance, in [19].
To this end let f be a binomial in IT . Without loss of generality, we may suppose the terms
of f have no common factors, because if σa · f ∈ IT then f ∈ IT as well. We will write f as the
difference of two tableaux, which are in standard form with their rows lexicographically ordered.
The first row in the two tableaux are different and they have a left-most place where they
disagree. We will show that we can always move this position further to the right. Eventually
the top rows of the tableaux will agree and we can delete this row (corresponding to the same
variable) and arrive at a polynomial of smaller degree.
Since f ∈ IT , the treks associated to the top rows of the two tableaux must have the same
top. There are two cases to consider. Either the first disagreement is immediately after the top
or not. In the first case, this means that the binomial f must have the form:[
abB acC
abB adD
20 SETH SULLIVANT
Without loss of generality we may suppose that c < d. Since f ∈ IT the string ac must appear
somewhere on the right-hand monomial. Thus, f must have the form:
 abB acC
 abB adDaeE acC ′
If d 6= e, we can apply the quadratic binomial[
abB adD
aeE acC ′
abB acC ′
aeE adD
to the second monomial to arrive at a monomial which has fewer disagreements with the left-
hand tableau in the first row. On the other hand, if d = e, we cannot apply this move (its
application results in “variables” that do not belong to C[Σ]). Keeping track of all the ad
patterns that appear on the right-hand side, and the consequent ad patterns that appear on the
left-hand side, we see that our binomial f has the form
abB acC
ad∗ ∗
ad∗ ∗
−

abB adD
adD′ acC ′
ad∗ ∗
ad∗ ∗
 .
Since there are the same number of ad’s on both sides we see that there is at least one more a
on the right-hand side which has no d’s attached to it. Thus, omitting the excess ad’s on both
sides, our binomial f contains:
 abB acC
 abB adDadD′ acC ′
aeE agG
with d 6= e or g. We can also assume that c 6= e, g otherwise, we could apply a quadratic move
as above. Thus we apply the quadratic binomials[
adD′ acC ′
aeE agG
adD′ agG
aeE acC ′
and [
abB adD
aeE acC ′
abB acC ′
aeE adD
to reduce the number of disagreements in the first row. This concludes the proof of the first
case. Now suppose that the first disagreement does not occur immediately after the a. Thus we
may suppose that f has the form:[
aAxbB aC
aAxdD aE
Note that it does not matter whether or not this disagreement appears on the left-hand or
right-hand side of the tableaux. Since the string xd appears on right-hand monomial it must
also appear somewhere on the left-hand monomial as well. If x is not the top in this occurrence,
ALGEBRAIC GEOMETRY OF GAUSSIAN BAYESIAN NETWORKS 21
we can immediately apply a quadratic binomial to reduce the discrepancies in the first row. So
we may assume the f has the form:
 aAxbB aCxdD′ xgG
 aAxdD aE
If b 6= g we can apply the quadratic binomial[
aAxbB aC
xdD′ xgG
aAxdD′ aC
xbB xgG
to the left-hand monomial to reduce the discrepancies in the first row. So suppose that g = b.
Enumerating the xb pairs that can arise on the left and right hand monomials, we deduce, akin
to our argument in the first case above, that f has the form:
aAxbB aC
xdD′ xbG
xhH xkK
aAxdD aE
where h and k are not equal to b or d. Then we can apply the two quadratic binomials:[
xdD′ xbG
xhH xkK
xhH xbG
xdD′ xkK
and [
aAxbB aC
xdD′ xkK
aAxdD′ aC
xbB xkK
to the left-hand monomial to produce a monomial with fewer discrepancies in the first row. We
have shown that no matter what type of discrepancy that can occur in the first row, we can
always apply quadratic moves to produce fewer discrepancies. This implies that IT is generated
by quadrics. �
Among the results in this section were our proofs that IT has a squarefree initial ideal (and
hence C[Σ]/IT is normal and Cohen-Macaulay) and that IT is generated by linear forms and
quadrics. It seems natural to wonder if there is a term order that realizes these two features
simultaneously.
Conjecture 5.9. There exists a term order ≺ such that in≺(IT ) is generated by squarefree
monomials of degree one and two.
6. Hidden Trees
This section and the next concern Bayesian networks with hidden variables. A hidden or
latent random variable is one which we do not have direct access to. These hidden variables
might represent theoretical quantities that are directly unmeasurable (e.g. a random variable
representing intelligence), variables we cannot have access to (e.g. information about extinct
species), or variables that have been censored (e.g. for sensitive random variables in census
data). If we are given a model over all the observed and hidden random variables, the partially
22 SETH SULLIVANT
observed model is the one obtained by marginalizing over the hidden random variables. A
number of interesting varieties arise in this hidden variable setting.
For Gaussian random variables, the marginalization is again Gaussian, and the mean and
covariance matrix are obtained by extracting the subvector and submatrix of the mean and
covariance matrix corresponding to the observed random variables. This immediately yields the
following proposition.
Proposition 6.1. Let I ⊆ C[µ,Σ] be the vanishing ideal for a Gaussian model. Let H ∪O = [n]
be a partition of the random variables into hidden and observed variables H and O. Then
IO := I ∩ C[µi, σij | i, j ∈ O]
is the vanishing ideal for the partially observed model.
Proof. Marginalization in the Gaussian case corresponds to projection onto the subspace of pairs
(µO,ΣO,O) ⊆ R|O| × R(
|O|+1
2 ). Coordinate projection is equivalent to elimination [2]. �
In the case of a Gaussian Bayesian network, Proposition 6.1 has a number of useful corollaries,
of both a computational and theoretical nature. First of all, it allows for the computation of the
ideals defining a hidden variable model as an easy elimination step. Secondly, it can be used to
explain the phenomenon we observed in Example 2.13, that the constraints defining a hidden
variable model appeared as generators of the ideal of the fully observed model.
Definition 6.2. Let H ∪ O be a partition of the nodes of the DAG G. The hidden nodes H
are said to be upstream from the observed nodes O in G if there are no edges o→ h in G with
o ∈ O and h ∈ H.
If H∪O is an upstream partition of the nodes of G, we introduce a grading on the ring C[a, λ]
which will, in turn, induce a grading on C[Σ]. Let deg ah = (1, 0) for all h ∈ H, deg ao = (1, 2)
for all o ∈ O, deg λho = (0, 1) if h ∈ H and o ∈ O, and deg λij = (0, 0) otherwise.
Lemma 6.3. Suppose that H ∪O = [n] is an upstream partition of the vertices of G. Then each
of the polynomials φG(σij) is homogeneous with respect to the upstream grading and
deg(σij) =
(1, 0) if i ∈ H, j ∈ H
(1, 1) if i ∈ H, j ∈ O or i ∈ O, j ∈ H
(1, 2) if i ∈ O, j ∈ O.
Thus, IG is homogeneous with respect to the induced grading on C[Σ].
Proof. There are three cases to consider. If both i, j ∈ H, then every trek in T (i, j) has a top
element in H and no edges of the form h→ o. In this case, the degree of each path is the vector
(1, 0). If i ∈ H and j ∈ O, every trek from i to j has a top in H and exactly one edge of the
form h→ o. Thus, the degree of every monomial in φ(σij) is (1, 1). If both i, j ∈ O, then either
each trek P from i to j has a top in O, or has a top in H. In the first case there can be no
edges in P of the form h → o, and in the second case there must be exactly two edges in P of
the form h→ o. In either case, the degree of the monomial corresponding to P is (1, 2). �
Note that the two dimensional grading we have described can be extended to an n dimensional
grading on the ring C[Σ] by considering all collections of upstream variables in G simultaneously.
ALGEBRAIC GEOMETRY OF GAUSSIAN BAYESIAN NETWORKS 23
Theorem 6.4 (Upstream Variables Theorem). Let H∪O be an upstream partition of the vertices
of G. Then every minimal generating set of IG that is homogeneous with respect to the upstream
grading contains a minimal generating set of IG,O.
Proof. The set of indeterminates σij corresponding to the observed variables are precisely the
variables whose degrees lie on the facet of the degree semigroup generated by the vector (1, 2).
This implies that the subring generated by these indeterminates is a facial subring. �
The upstream variables theorem is significant because any natural generating set of an ideal
I is homogeneous with respect to its largest homogeneous grading group. For instance, every
reduced Gröbner basis if IG will be homogeneous with respect to the upstream grading. For
trees, the upstream variables theorem immediately implies:
Corollary 6.5. Let T be a rooted directed tree and O consist of the leaves of T . Then IT,O is
generated by the quadratic tetrad constraints
σikσjl − σilσkj
such that i, j, k, l ∈ O, and there is a choke point c between {i, j} and {k, l}.
Corollary 6.5 says that the ideal of a hidden tree model is generated by the tetrad constraints
induced by the choke points in the tree. Sprites et al [17] use these tetrad constraints as a tool
for inferring DAG models with hidden variables. Given a sample covariance matrix, they test
whether a collection of tetrad constraints is equal to zero. From the given tetrad constraints
that are satisfied, together with the tetrad representation theorem, they construct a DAG that
is consistent with these vanishing tetrads. However, it is not clear from that work whether or
not it is enough to consider only these tetrad constraints. Indeed, as shown in [17], there are
pairs of graphs with hidden nodes that have precisely the same set of tetrad constraints that do
not yield the same family of covariance matrices. Theorem 6.5 can be seen as a mathematical
justification of the tetrad procedure of Spirtes, et al, in the case of hidden tree models, because
it shows that the tetrad constraints are enough to distinguish between the covariance matrices
coming from different trees.
7. Connections to Algebraic Geometry
In this section, we give families of examples to show how classical varieties from algebraic
geometry arise in the study of Gaussian Bayesian networks. In particular, we show how toric
degenerations of the Grassmannian, matrix Schubert varieties, and secant varieties all arise as
special cases of Gaussian Bayesian networks with hidden variables.
7.1. Toric Initial Ideals of the Grassmannian. Let Gr2,n be the Grassmannian of 2-planes
in Cn. The Grassmannian has the natural structure of an algebraic variety under the Plücker
embedding. The ideal of the Grassmannian is generated by the quadratic Plücker relations:
I2,n := I(Gr2,n) = 〈σijσkl − σikσjl + σilσjk | 1 ≤ i < j < k < l ≤ n〉 ⊂ C[Σ].
The binomial initial ideals of I2,n are in bijection with the unrooted trivalent trees with n
leaves. These binomial initial ideals are, in fact, toric ideals, and we will show that:
24 SETH SULLIVANT
Theorem 7.1. Let T be a rooted directed binary tree with [n] leaves and let O be the set of
leaves of T . Then there is a weight vector ω ∈ R(
2 ) and a sign vector τ ∈ {±1}(
2 ) such that
IT,O = τ · inω(I2,n).
The sign vector τ acts by multiplying coordinate σij by τij .
Proof. The proof idea is to show that the toric ideals IT,O have the same generators as the toric
initial ideals of the Grassmannian that have already been characterized in [16]. Without loss
of generality, we may suppose that the leaves of T are labeled by [n], that the tree is drawn
without edge crossings, and the leaves are labeled in increasing order from left to right. These
assumptions will allow us to ignore the sign vector τ in the proof. The sign vector results from
straightening the tree and permuting the columns in the Steifel coordinates. This results in sign
changes in the Plücker coordinates.
In Corollary 6.5, we saw that IT,O was generated by the quadratic relations
σikσjl − σilσkj
such that there is a choke point in T between {i, j} and {k, l}. This is the same as saying
that the induced subtree of T on {i, j, k, l} has the split {i, j}|{k, l}. These are precisely the
generators of the toric initial ideals of the Grassmannian G2,n identified in [16]. �
In the preceding Theorem, any weight vector ω that belongs to the relative interior of the cone
of the tropical Grassmannian corresponding to the tree T will serve as the desired partial term
order. We refer to [16] for background on the tropical Grassmannian and toric degenerations of
the Grassmannian. Since and ideal and its initial ideals have the same Hilbert function, we see
Catalan numbers emerging as degrees of Bayesian networks yet again.
Corollary 7.2. Let T be a rooted, directed, binary tree and O consist of the leaves of T . Then
deg IT,O = 1n−1
, the (n− 2)-nd Catalan number.
The fact that binary hidden tree models are toric degenerations of the Grassmannian has
potential use in phylogenetics. Namely, it suggests a family of new models, of the same di-
mension as the binary tree models, that could be used to interpolate between the various tree
models. That is, rather than choosing a weight vector in a full dimensional cone of the tropical
Grassmannian, we could choose a weight vector ω that sits inside of lower dimensional cone.
The varieties of the initial ideals V (inω(I2,n)) then correspond to models that sit somewhere
“between” models corresponding of the full dimensional trees of the maximal dimensional cones
containing ω. Phylogenetic recovery algorithms could reference these in-between models to indi-
cate some uncertainty about the relationships between a given collection of species or on a given
branch of the tree. These new models have the advantage that they have the same dimension
as the tree models and so there is no need for dimension penalization in model selection.
7.2. Matrix Schubert Varieties. In this section, we will describe how certain varieties called
matrix Schubert varieties arise as special cases of the varieties of hidden variable models for
Gaussian Bayesian networks. More precisely, the variety for the Gaussian Bayesian network will
be the cone over one of these matrix Schubert varieties. To do this, we first need to recall some
equivalent definitions of matrix Schubert varieties.
ALGEBRAIC GEOMETRY OF GAUSSIAN BAYESIAN NETWORKS 25
Let w be a partial permutation matrix, which is an n × n 0/1 matrix with at most one 1 in
each row and column. The matrix w is in the affine space Cn×n. The Borel group B of upper
triangular matrices acts on Cn×n on the right by multiplication and on the left by multiplication
by the transpose.
Definition 7.3. The matrix Schubert variety Xw is the orbit closure of w by the action of B
on the right and left:
Xw = BTwB.
Let Iw be the vanishing ideal of Xw.
The matrix Schubert variety Xw ⊆ Cn×n, so we can identify its coordinate ring with a quotient
of C[σij | i ∈ [n], j ∈ [n′]]. Throughout this section [n′] = {1′, 2′, . . . , n′}, is a set of n symbols
that we use to distinguish from [n] = {1, 2, . . . , n}.
An equivalent definition of a matrix Schubert variety comes as follows. Let S(w) = {(i, j) | wij =
1} be the index set of the ones in w. For each (i, j) let Mij be the variety of rank one matrices:
Mij =
x ∈ Cn×n | rankx ≤ 1, xkl = 0 if k < i or l < j
(i,j)∈S(w)
where the sum denotes the pointwise Minkowski sum of the varieties. Since Mij are cones over
projective varieties, this is the same as taking the join, defined in the next section.
Example 7.4. Let w be the partial permutation matrix
1 0 00 1 0
0 0 0
Then Xw consists of all 3× 3 matrices of rank ≤ 2 and Iw =
|Σ[3],[3′]|
. More generally, if w is
a partial permutation matrix of the form
where Ed is a d×d identity matrix, then Iw is the ideal of (d+1) minors of a generic matrix. �
The particular Bayesian networks which yield the desired varieties come from taking certain
partitions of the variables. In particular, we assume that the observed variables come in two
types labeled by [n] = {1, 2, . . . , n} and [n′] = {1′, 2′, . . . , n′}. The hidden variables will be
labeled by the set S(w).
Define the graph G(w) with vertex set V = [n] ∪ [n′] ∪ S(w) and edge set consisting of edges
k → l for all k < l ∈ [n], k′ → l′ for all k′ < l′ ∈ [n′], (i, j) → k for all (i, j) ∈ S(w) and k ≥ i
and (i, j)→ k′ for all (i, j) ∈ S(w) and k′ ≥ j.
Theorem 7.5. The generators of the ideal Iw defining the matrix Schubert variety Xw are the
same as the generators of the ideal IG(w),[n]∪[n′] of the hidden variable Bayesian network for the
DAG G(w) with observed variables [n] ∪ [n′]. That is,
Iw · C[σij | i, j ∈ [n] ∪ [n′]] = IG(w),[n]∪[n′].
26 SETH SULLIVANT
Proof. The proof proceeds in a few steps. First, we give a parametrization of a cone over
the matrix Schubert variety, whose ideal is naturally seen to be Iw · C[σij | i, j ∈ [n] ∪ [n′]].
Then we describe a rational transformation φ on C[σij | i, j ∈ [n] ∪ [n′]] such that φ(Iw) =
IG(w),[n]∪[n′]. We then exploit that fact that this transformation is invertible and the elimination
ideal IG(w),[n]∪[n′] ∩ C[σij | i ∈ [n], j ∈ [n′]] is fixed to deduce the desired equality.
First of all, we give our parametrization of the ideal Iw. To do this, we need to carefully
identify all parameters involved in the representation. First of all, we split the indeterminates
in the ring C[σij | i, i ∈ [n]∪ [n′]] into three classes of indeterminates: those with i, j ∈ [n], those
with i, j ∈ [n′], and those with i ∈ [n] and j ∈ [n′]. Then we define a parametrization φw which
is determined as follows:
φw : C[τ, γ, a, λ]→ C[σij | i, j ∈ [n] ∪ [n′]
φw(σij) =
τij if i, j ∈ [n]
γij if i, j ∈ [n′]∑
(k,l)∈S(w):k≤i,l≤j a(k,l)λ(k,l),iλ(k,l),j if i ∈ [n], j ∈ [n
Let Jw = kerφw. Since the τ , γ, λ, and a parameters are all algebraically independent, we
deduce that in Jw, there will be no generators that involve combinations of the three types
of indeterminates in C[σij | i, j ∈ [n] ∪ [n′]]. Furthermore, restricting to the first two types of
indeterminates, there will not be any nontrivial relations involving these types of indeterminates.
Thus, to determine Jw, it suffices to restrict to the ideal among the indeterminates of the form
σij such that i ∈ [n] and j ∈ [n′]. However, considering the parametrization in this case, we see
that this is precisely the parametrization of the ideal Iw, given as the Minkowski sum of rank
one matrices. Thus, Jw = Iw.
Now we will define a map from φ : C[σij ] → C[σij ] which sends Jw to another ideal, closely
related to IG(w),[n]∪[n′]. To define this map, first, we use the fact that from the submatrix
Σ[n],[n] we can recover the λij and ai parameters associated to [n], when only considering the
complete subgraph associated to graph G(w)[n] (and ignoring the treks that involve the vertices
(k, l) ∈ S(w)). This follows because these parameters are identifiable by Proposition 2.5. A
similar fact holds when restricting to the subgraph G(w)[n′]. The ideal Jw we have defined thus
far can be considered as the vanishing ideal of a parametrization which gives the complete graph
parametrization for G(w)[n] and G(w)[n′] and a parameterization of the matrix Schubert variety
Xw on the σij with i ∈ [n] and j ∈ [n′]. So we can rationally recover the λ and a parameters
associated to the subgraphs G(w)[n] and G(w)[n′].
For each j < k pair in [n] or in [n′], define the partial trek polynomial
sjk(λ) =
j=l0<l1<...<lm=k
λli−1li .
We fit these into two upper triangular matrices S and S′ where Sjk = sjk if j < k with j, k ∈ [n],
Sjj = 1 and Sjk = 0 otherwise, with a similar definition for S′ with [n] replaced by [n′]. Now
we are ready to define our map. Let φ be the rational map φ : C[Σ] → C[Σ] which leaves σij
fixed if i, j ∈ [n] or i, j ∈ [n′], and maps σij with i ∈ [n] and j ∈ [n′] by sending
Σ[n],[n′] 7→ SΣ[n],[n′](S
′)T .
ALGEBRAIC GEOMETRY OF GAUSSIAN BAYESIAN NETWORKS 27
This is actually a rational map, because the λij that appear in the formula for sjk are expressed
as rational functions in terms of the σij by the rational parameter recovery formula of Proposition
2.5. Since this map transforms Σ[n],[n′] by multiplying on the left and right but lower and upper
triangular matrices, this leaves the ideal Jw ∩ C[σij | i ∈ [n], j ∈ [n′]] fixed. Thus Jw ⊆ φ(Jw).
On the other hand φ is invertible on Jw so Jw = φ(Jw).
If we think about the formulas for the image φ◦φw, we see that the formulas for σij with i ∈ [n]
and j ∈ [n′] in terms of parameters are the correct formulas which we would see coming from
the parametrization φG(w). On the other hand, the formulas for σij with i, j ∈ [n] or i, j ∈ [n′]
are the formulas for the restricted graph G[n] and G[n′], respectively. Since every trek contained
in G[n] or G[n′] is a trek in G(w), we see that the current parametrization of Jw is only “almost
correct”, in that it is only missing terms corresponding to treks that go outside of G(w)[n] or
G(w)[n′]. Denote this map by ψw, and let φG(w) be the actual parametrizing map of the model.
Thus, we have, for each σij with i, j ∈ [n] or i, j ∈ [n′], φG(w)(σij) = ψw(σij) + rw(σij), where
rw(σij) is a polynomial remainder term that does not contain any ai with i ∈ [n] ∪ [n′], when
i, j ∈ [n] or i, j ∈ [n′], and rw(σij) = 0 otherwise. On the other hand, every term of ψw(σij) will
involve exactly one ai with i ∈ [n] ∪ [n′], when i, j ∈ [n] or i, j ∈ [n′].
Now we define a weight ordering ≺ on the ring C[a, λ] that gives deg ai = 1 if i ∈ [n]∪ [n′] and
deg ai = 0 otherwise and deg λij = 0 for all i, j. Then, the largest degree term of φG(w)(σij) with
respect to this weight ordering is ψw(σ). Since highest weight terms must all cancel with each
other, we see that f ∈ IG(w),[n]∪[n′], implies that f ∈ Jw. Thus, we deduce that IG(w),[n]∪[n′] ⊆ Jw.
On the other hand,
IG(w),[n]∪[n]′ ∩ C[σij | i ∈ [n], j ∈ [n
′]] = Jw ∩ C[σij | i ∈ [n], j ∈ [n′]]
and since the generators of Jw ∩ C[σij | i ∈ [n], j ∈ [n′]] generate Jw, we deduce that Jw ⊆
IG(w),[n]∪[n′] which completes the proof. �
The significance of Theorem 7.5 comes from the work of Knutson and Miller [11]. They gave
a complete description of antidiagonal Gröbner bases for the ideals Iw. Indeed, these ideals
are generated by certain subdeterminants of the matrix Σ[n],[n′]. These determinants can be
interpretted combinatorially in terms of the graph G(w).
Theorem 7.6. [11] The ideal Iw defining the matrix Schubert variety is generated by the con-
ditional independence statements implied by the DAG G(w). In particular,
#C + 1 minors ofΣA,B | A ⊂ [n], B ⊂ [n′], C ⊂ S(w), and C d-separates A from B
7.3. Joins and Secant Varieties. In this section, we will show how joins and secant varieties
arise as special cases of Gaussian Bayesian networks in the hidden variable case. This, in turn,
implies that techniques that have been developed for studying defining equations of joins and
secant varieties (e.g. [12, 20]) might be useful for studying the equations defining these hidden
variable models.
Given two ideals I and J in a polynomial ring K[x] = K[x1, . . . , xm], their join is the new
ideal
I ∗ J := (I(y) + J(z) + 〈xi − yi − zi | i ∈ [m]〉)
28 SETH SULLIVANT
where I(y) is the ideal obtained from I by plugging in the variables y1, . . . , ym for x1, . . . , xm.
The secant ideal is the iterated join:
I{r} = I ∗ I ∗ · · · ∗ I
with r copies of I. If I and J are homogeneous radical ideals over an algebraically closed field,
the join ideal I ∗ J is the vanishing ideal of the join variety which is defined geometrically by
the rule
V (I ∗ J) = V (I) ∗ V (J) =
a∈V (I)
b∈V (J)
< a, b >
where < a, b > denotes the line spanned by a and b and the bar denotes the Zariski closure.
Suppose further that I and J are the vanishing ideals of parametrizations; that is there are
φ and ψ such that
φ : C[x]→ C[θ] and ψ : C[x]→ C[η]
and I = kerφ and J = kerψ. Then I ∗ J is the kernel of the map
φ+ ψ : C[x]→ C[θ, η]
xi 7→ φ(xi) + ψ(xi).
Given a DAG G and a subset K ⊂ V (G), GK denotes the induced subgraph on K.
Proposition 7.7. Let G be a DAG and suppose that the vertices of G are partitioned into
V (G) = O ∪H1 ∪H2 where both H1 and H2 are hidden sets of variables. Suppose further that
there are no edges of the form o1 → o2 such that o1, o2 ∈ O or edges of the form h1 → h2 or
h2 → h1 with h1 ∈ H1 and h2 ∈ H2. Then
IG,O = IGO∪H1 ,O ∗ IGO∪H2 ,O.
The proposition says that if the hidden variables are partitioned with no edges between the
two sets and there are no edges between the observed variables the ideal is a join.
Proof. The parametrization of the hidden variable model only involves the σij such that i, j ∈ O.
First, we restrict to the case where i 6= j. Since there are no edges between observed variables
and no edges between H1 and H2, every trek from i to j involves only edges in GO∪H1 or only
edges in GO∪H2 . This means that
φG(σij) = φGO∪H1 (σij) + φGO∪H2 (σij)
and these summands are in non-overlapping sets of indeterminates. Thus, by the comments
preceding the proposition, the ideal only in the σij with i 6= j ∈ O is clearly a join. However,
the structure of this hidden variable model implies that there are no nontrivial relations that
involve the diagonal elements σii with i ∈ O. This implies that IG,O is a join. �
Example 7.8. Let Kp,m be the directed complete bipartite graph with bipartition H = [p′]
and O = [m] such that i′ → j ∈ E(Kp,m) for all i′ ∈ [p′] and j ∈ [m]. Then Kp,m satisfies the
conditions of the theorem recursively up to p copies, and we see that:
IKp,m,O = I
K1,m,O
This particular hidden variable Gaussian Bayesian network is known as the factor analysis model.
This realization of the factor analysis model as a secant variety was studied extensively in [3].
ALGEBRAIC GEOMETRY OF GAUSSIAN BAYESIAN NETWORKS 29
Example 7.9. Consider the two “doubled trees” pictured in the figure.
1 2 3 4 5 6
1 2 3 4 5 6
Since in each case, the two subgraphs GO∪H1 and GO∪H2 are isomorphic, the ideals are secant
ideals of the hidden tree models IT,O for the appropriate underlying trees. In both cases, the
ideal I{2}T,O = IG,O is a principal ideal, generated by a single cubic. In the first case, the ideal
is the determinantal ideal J{2}T = 〈|Σ123,456|〉. In the second case, the ideal is generated by an
eight term cubic
IG,O = 〈σ13σ25σ46 − σ13σ26σ45 − σ14σ25σ36 + σ14σ26σ35
+σ15σ23σ46 − σ15σ24σ36 − σ16σ23σ45 + σ16σ24σ35〉 .
In both of the cubic cases in the previous example, the ideals under questions were secant
ideals of toric ideals that were initial ideals of the Grassmann-Plücker ideal, as we saw in Theorem
7.1. Note also that the secant ideals I{2}T,O are, in fact, the initial terms of the 6× 6 Pfaffian with
respect to appropriate weight vectors. We conjecture that this pattern holds in general.
Conjecture 7.10. Let T be a binary tree with n leaves and O the set of leaves of T . Let
I2,n be the Grassmann-Pluücker ideal, let ω be a weight vector and τ a sign vector so that
IT,O = τ · inω(I2,n) as in Theorem 7.1. Then for each r
T,O = τ · inω(I
2,n ).
References
[1] P. Bickel and K. Doksum. Mathematical Statistics. Vol 1. Prentice-Hall, London, 2001.
[2] D. Cox, J. Little, and D. O’Shea. Ideals, Varieties, and Algorithms. Undergraduate Texts in Mathematics.
Springer, New York, 1997.
[3] M. Drton, B. Sturmfels, and S. Sullivant. Algebraic factor analysis: tetrads, pentads, and beyond. To appear
in Probability Theory and Related Fields, 2006.
[4] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge University Press,
Cambridge, 1999.
[5] J. Felsenstein. Inferring Phylogenies. Sinauer Associates, Inc. Sunderland, MA, 2004.
[6] L. Garcia. Polynomial constraints of Bayesian networks with hidden variables. Preprint, 2006.
[7] L. Garcia, M. Stillman, and B. Sturmfels. Algebraic geometry of Bayesian networks. Journal of Symbolic
Computation 39 (2005) 331-355.
[8] I. Gessel and G. Viennot. Binomial determinants, paths, and hook length formulae. Adv. in Math. 58 (1985),
no. 3, 300–321.
30 SETH SULLIVANT
[9] D. Grayson and M. Stillman. Macaulay 2, a software system for research in algebraic geometry. Available at
http://www.math.uiuc.edu/Macaulay2/
[10] M. Hochster. Rings of invariants of tori, Cohen-Macaulay rings generated by monomials, and polytopes,
Annals of Mathematics 96 (1972) 318–337.
[11] A. Knutson and E. Miller. Gröbner geometry of Schubert polynomials. Ann. of Math. (2) 161 (2005), no. 3,
1245–1318.
[12] J. M. Landsberg, L. Manivel. On the ideals of secant varieties of Segre varieties. Found. Comput. Math. 4
(2004), no. 4, 397–422.
[13] S. Lauritzen. Graphical Models. Oxford Statistical Science Series 17 Clarendon Press, Oxford, 1996.
[14] R. Read and R. Wilson. An Atlas of Graphs. Oxford Scientific Publications. (1998)
[15] C. Semple and M. Steel. Phylogenetics. Oxford Lecture Series in Mathematics and Its Applications 24 Oxford
University Press, Oxford, 2003.
[16] D. Speyer and B. Sturmfels. The tropical Grassmannian. Advances in Geometry 4 (2004) 389-411.
[17] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. The MIT Press, Cambridge,
MA, 2000.
[18] R. Stanley. Enumerative Combinatorics Vol. 2 Cambridge Studies in Advanced Mathematics 62, Cambridge
University Press, 1999.
[19] B. Sturmfels. Gröbner Bases and Convex Polytopes. University Lecture Series 8, American Mathematical
Society, Providence, RI, 1996.
[20] B. Sturmfels and S. Sullivant. Combinatorial secant varieties. Quarterly Journal of Pure and Applied Math-
ematics 2 (2006) 285-309.
Department of Mathematics and Society of Fellows, Harvard University, Cambridge, MA 02138
http://www.math.uiuc.edu/Macaulay2/
	1. Introduction
	Acknowledgments
	2. Parametrization and Conditional Independence
	3. Computational Study
	4. Tetrad Representation Theorem
	5. Fully Observed Trees
	6. Hidden Trees
	7. Connections to Algebraic Geometry
	7.1. Toric Initial Ideals of the Grassmannian
	7.2. Matrix Schubert Varieties
	7.3. Joins and Secant Varieties
	References
ABSTRACT
  Conditional independence models in the Gaussian case are algebraic varieties
in the cone of positive definite covariance matrices. We study these varieties
in the case of Bayesian networks, with a view towards generalizing the
recursive factorization theorem to situations with hidden variables. In the
case when the underlying graph is a tree, we show that the vanishing ideal of
the model is generated by the conditional independence statements implied by
graph. We also show that the ideal of any Bayesian network is homogeneous with
respect to a multigrading induced by a collection of upstream random variables.
This has a number of important consequences for hidden variable models.
Finally, we relate the ideals of Bayesian networks to a number of classical
constructions in algebraic geometry including toric degenerations of the
Grassmannian, matrix Schubert varieties, and secant varieties.

<|endoftext|><|startoftext|>
Introduction (Cambridge University Press, 1999).
ABSTRACT
  We study a system of two quantum dots connected by a hopping bridge. Both the
dots and connecting region are assumed to be in universal crossover regimes
between Gaussian Orthogonal and Unitary ensembles. Using a diagrammatic
approach appropriate for energy separations much larger than the level spacing
we obtain the ensemble-averaged one- and two-particle Green's functions. It
turns out that the diffuson and cooperon parts of the two-particle Green's
function can be described by separate scaling functions. We then use this
information to investigate a model interacting system in which one dot has an
attractive s-wave reduced Bardeen-Cooper-Schrieffer interaction, while the
other is noninteracting but subject to an orbital magnetic field. We find that
the critical temperature is {\it nonmonotonic} in the flux through the second
dot in a certain regime of interdot coupling. Likewise, the fluctuation
magnetization above the critical temperature is also nonmonotonic in this
regime, can be either diamagnetic or paramagnetic, and can be deduced from the
cooperon scaling function.

<|endoftext|><|startoftext|>
Two-proton radioactivity and three-body decay. III. Integral formulae for decay
widths in a simplified semianalytical approach.
L. V. Grigorenko1,2, 3 and M. V. Zhukov4
Flerov Laboratory of Nuclear Reactions, JINR, RU-141980 Dubna, Russia
Gesellschaft für Schwerionenforschung mbH, Planckstrasse 1, D-64291, Darmstadt, Germany
RRC “The Kurchatov Institute”, Kurchatov sq. 1, 123182 Moscow, Russia
Fundamental Physics, Chalmers University of Technology, S-41296 Göteborg, Sweden
Three-body decays of resonant states are studied using integral formulae for decay widths. Theo-
retical approach with a simplified Hamiltonian allows semianalytical treatment of the problem. The
model is applied to decays of the first excited 3/2− state of 17Ne and the 3/2− ground state of 45Fe.
The convergence of three-body hyperspherical model calculations to the exact result for widths and
energy distributions are studied. The theoretical results for 17Ne and 45Fe decays are updated and
uncertainties of the derived values are discussed in detail. Correlations for the decay of 17Ne 3/2−
state are also studied.
PACS numbers: 21.60.Gx – Cluster models, 21.45.+v – Few-body systems, 23.50.+z – Decay by proton
emission, 21.10.Tg – Lifetimes
I. INTRODUCTION
The idea of the “true” two-proton radioactivity was
proposed about 50 years ago in a classical paper of
Goldansky [1]. The word “true” denotes here that we
are dealing not with a relatively simple emission of two
protons, which becomes possible in every nucleus above
two-proton decay threshold, but with a specific situa-
tion where one-proton emission is energetically (due to
the proton separation energy in the daughter system) or
dynamically (due to various reasons) prohibited. Only
simultaneous emission of two protons is possible in that
case (see Fig. 1, more details on the modes of the three-
body decays can be found in Ref. [2]). The dynamics of
such decays can not be reduced to a sequence of two-body
decays and from theoretical point of view we have to deal
with a three-body Coulomb problem in the continuum,
which is known to be very complicated.
Progress in this field was quite slow. Only recently a
consistent quantum mechanical theory of the process was
developed [2, 3, 4], which allows to study the two-proton
(three-body) decay phenomenon in a three-body cluster
model. It has been applied to a range of a light nuclear
systems (12O, 16Ne [5], 6Be, 8Li∗, 9Be∗ [6], 17Ne∗, 19Mg
[7]). Systematic exploratory studies of heavier prospec-
tive 2p emitters 30Ar, 34Ca, 45Fe, 48Ni, 54Zn, 58Ge, 62Se,
and 66Kr [4, 8]) have been performed providing predic-
tions of lifetime ranges and possible correlations among
fragments.
Experimental studies of the two-proton radioactivity is
presently an actively developing field. Since the first ex-
perimental identification of 2p radioactivity in 45Fe [9, 10]
it was also found in 54Zn [11]. Some fingerprints of the
48Ni 2p decay were observed and the 45Fe lifetime and de-
cay energy were measured with improved accuracy [12].
There was an intriguing discovery of the extreme en-
hancement of the 2p decay mode for the high-spin 21+
isomer of 94Ag, interpreted so far only in terms of the
hyperdeformation of this state [13]. New experiments,
(A-2) + 2N
(A-1) + N
E2rE3r
(A-2) + 2N
(A-1) + N
(a) (b)
Three-body 
    decay 
  window
thresholds
FIG. 1: Energy conditions for different modes of the two-
nucleon emission (three-body decay): true three-body decay
(a), sequential decay (b).
aimed at more detailed 2p decay studies (e.g. observa-
tion of correlations), are under way at GSI (19Mg), MSU
(45Fe), GANIL (45Fe), and Jyväskylä (94Ag).
Several other theoretical approaches were applied to
the problem in the recent years. We should mention
the “diproton” model [14, 15], “R-matrix” approach
[16, 17, 18, 19], continuum shell model [20], and adiabatic
hyperspherical approach of [21]. Some issues of a com-
patibility between different approaches will be addressed
in this work.
Another, possibly very important, field of applica-
tion of the two-proton decay studies was shown in Refs.
[22, 23]. It was demonstrated in [22] that the importance
of direct resonant two-proton radiative capture processes
was underestimated in earlier treatment of the rp-process
waiting points [24]. The scale of modification of the astro-
physical 2p capture rates can be as large as several orders
of magnitude in certain temperature ranges. In paper [23]
it has been found that nonresonant E1 contributions to
three-body (two-proton) capture rates can also be much
larger than was expected before. The updated 2p as-
trophysical capture rate for the 15O(2p,γ)17Ne reaction
appears to be competing with the standard 15O(α,γ)19Ne
breakout reaction for the hot CNO cycle. The improve-
ments of the 2p capture rates obtained in [22, 23] are
connected to consistent quantum mechanical treatment
http://arxiv.org/abs/0704.0920v1
of the three-body Coulomb continuum in contrast to the
essentially quasiclassical approach typically used in as-
trophysical calculations of three-body capture reactions
(e.g. [24, 25]).
The growing quality of the experimental studies of the
2p decays and the high precision required for certain as-
trophysical calculations inspired us to revisit the issues
connected with different uncertainties and technical diffi-
culties of our studies. In this work we make the following.
(i) Extend the two-body formalism of the integral for-
mulae for width to the three-body case. We perform the
relevant derivations for the two-body case to make the
relevant approximations and assumptions explicit. (ii)
Formulate a simplified three-body model which has many
dynamical features similar to the realistic case, but allows
the exact semianalytical treatment and thus makes pos-
sible a precise calibration of three-body calculations. It
is also possible to study in great detail several impor-
tant dependencies of three-body widths in the frame of
this model. (iii) Perform practical studies of some sys-
tems of interest and demonstrate a connection between
the simplified semianalytical formalism and the realistic
three-body calculations.
The unit system h̄ = c = 1 is used in the article.
II. INTEGRAL FORMULA FOR WIDTH
Integral formalisms of width calculations for narrow
two-body states are known for a long time, e.g. [26, 27].
The prime objective of those studies was α-decay widths.
An interesting overview of this field can be found in the
book [28]. This approach, to our opinion, did not pro-
duce novel results as the inherent uncertainties of the
method are essentially the same as those of the R-matrix
phenomenology, which is technically much simpler (see
e.g. a discussion in [29]). An important nontrivial appli-
cation of the integral formalism was calculation of widths
for proton emission off deformed states [30, 31]. There
were attempts to extend the integral formalism to the
three-body decays, using a formal generalization for the
hyperspherical space [2, 32]. These were shown to be
difficult with respect to technical realisation and to be
inferior to other methods developed in [2, 3].
Here we develop an integral formalism for the three-
body (two-proton) decay width in a different way. How-
ever, first we review the standard formalism to define
(clearer) the approximations used.
A. Width definition, complex energy WF
For decay studies we consider the wave function (WF)
with complex pole energy
Ẽr = k̃
r/(2M) = Er − iΓ/2 , k̃r ≈ kr − iΓ/(2vr) ,
where v =
2E/M . The pole solution for Hamiltonian
(H − Ẽr)Ψ
lm (r) = (T + V − Ẽr)Ψ
lm (r) = 0
provides the WF with outgoing asymptotic
lm (r) = r
l (kr)Ylm(r̂) . (1)
For single channel two-body problem the pole solution is
formed only for one selected value of angular momentum
l. In the asymptotic region
l (k̃rr)
l (k̃rr) = Gl(k̃rr) + iFl(k̃rr) . (2)
The above asymptotic is growing exponentially
l (k̃rr)
∼ exp[+ik̃rr] ≈ exp[+ikrr] exp[+Γr/(2vr)]
as a function of the radius at pole energy. This unphys-
ical growth is connected to the use of time-independent
formalism and could be reliably neglected for typical ra-
dioactivity time scale as it has a noticeable effect at very
large distances.
Applying Green’s procedure to complex energy WF
Ψ(+)†
(H − Ẽr)Ψ
(H − Ẽr)Ψ
Ψ(+) = 0
we get for the partial components at pole energy Ẽr
After radial integration from 0 to R (here and below R
denotes the radius sufficiently large that the nuclear in-
teraction disappears) we obtain
which corresponds to a definition of the width as a decay
probability (reciprocal of the lifetime):
N = N0 exp[−t/τ ] = N0 exp[−Γt] .
The width Γ is then equal to the outgoing flux jl through
the sphere of sufficiently large radius R, divided by num-
ber of particles Nl inside the sphere.
Using Eq. (2) the flux in the asymptotic region could
be rewritten for k̃r → kr in terms of a Wronskian
= (kr/M)W (Fl(krR), Gl(krR)) = vr , (4)
where the Wronskian for real energy functions Fl, Gl is
W (Fl, Gl) = GlF
lFl ≡ 1 .
The effect of the complex energy is easy to estimate (ac-
tually without loss of a generality) in a small energy ap-
proximation
Fl(kr)
∼ Cl(kr)
l+1, Gl(kr)
(kr)−l
(2l+ 1)Cl
, (5)
where Cl is a Coulomb coefficient (defined e.g. in Ref.
[33]). The flux is then
l (k̃
l (k̃rr) − k̃
l (k̃
l (k̃rr)
2l(l+ 1)
+ l × o[Γ3]
So, the equality (4) is always valid for l = 0 and for l 6= 0
we get
l(l + 1)
B. Two-body case, real energy WF
Now we need a WF as real energy E = k2/2M solution
of Schrödinger equation
(H − E)Ψk(r) = (T + V
nuc + V coul − E)Ψk(r) = 0 ,
Ψk(r) = 4π
il(kr)−1ψl(kr)
Y ∗lm(k̂)Ylm(r̂) ,
in S-matrix representation, which means that for r > R
ψl(kr) =
[(Gl(kr) − iFl(kr)) − Sl(Gl(kr) + iFl(kr))] .
At resonance energy Er
Sl(Er) = e
2iδl(Er) = e2iπ/2 = −1
and in asymptotic region, defined by the maximal size of
nuclear interaction R,
ψl(krr)
= i Gl(krr) .
At resonance energy we can define a “quasibound” WF
ψ̃l as matching the irregular solution Gl and normalized
to unity for the integration in the internal region limited
by radius R:
ψ̃l(krr) =
(−i)ψl(krr)
|ψl(krx)|
ψl(krr)
. (6)
Now we introduce an auxiliary Hamiltonian H̄ with
different short range nuclear interaction V̄ nuc,
(H̄ − E)Φk(r) = (T + V̄
nuc + V coul − E)Φk(r) = 0 ,
and also construct other WF in S-matrix representation
Φk(r) = 4π
il(kr)−1ϕl(kr)
Y ∗lm(k̂)Ylm(r̂) ,
ϕl(kr) =
(Gl(kr)− iFl(kr)) − S̄l(Gl(kr) + iFl(kr))
for r > R. Or in equivalent form:
ϕl(kr) = exp(iδ̄l)
Fl(kr) cos(δ̄l) +Gl(kr) sin(δ̄l)
. (7)
The Hamiltonian H̄ should provide the WF Φk(r) which
at energy Er is sufficiently far from being a resonance
WF and for this WF δ̄l(Er) ∼ 0.
For real energy WFs Ψk(r) and Φk(r) we can write:
Φk(r)
† [(H − E)Ψk(r)] −
(H̄ − E)Φk(r)
Ψk(r) = 0 ,
ϕ∗l (V − V̄ )ψl =
For WFs taken at resonance energy Er this expression
provides
ϕ∗l (V − V̄ )ψldr = 2MiNl
ϕ∗l (V − V̄ )ψ̃ldr
= exp(−iδ̄l) cos(δ̄l) kr W (Fl(krR), Gl(krR)) ,(9)
1/2 =
−i exp(−iδ̄l) cos(δ̄l) kr
ϕ∗l (V − V̄ )ψ̃ldr
From Eqs. (3), (4), (6) and the approximation ψ
l ≈ ψl
it follows that
vr cos2(δ̄l)
ϕ∗l (V − V̄ )ψ̃ldr
. (10)
So, the idea of the integral method is to define the in-
ternal normalizations for the WF with resonant boundary
conditions (this is equivalent to determination of the out-
going flux for normalized “quasibound” WF) by the help
of the eigenfunction of the auxiliary Hamiltonian, which
has the same long-range behaviour and differs only in the
compact region.
III. ALTERNATIVE DERIVATION
Let us reformulate the derivation of Eq. (10) in a more
general way, so that the detailed knowledge of the WF
structure for ψl and ψ
l is not required. It would
allow a straightforward extension of the formalism to
the three-body case. We start from Schrödinger equa-
tion in continuum with solution Ψ(+) at the pole energy
Ẽr = Er + iΓ/2:
H − Ẽr
Ψ(+) =
T + V − Ẽr
Ψ(+) = 0 . (11)
Then we rewrite it identically via the auxiliary Hamilto-
nian H̄ = T + V̄
H + V̄ − V − Ẽr
Ψ(+) =
V̄ − V
H̄ − Er
Ψ(+) =
V̄ − V + iΓ/2
Ψ(+). (12)
Thus we can use the real-energy Green’s function ḠEr of
auxiliary Hamiltonian H̄ to “regenerate” the WF with
outgoing asymptotic
Ψ̄(+) = Ḡ
V̄ − V + iΓ/2
Ψ(+) . (13)
At this point in Eq. (13) Ψ̄(+) ≡ Ψ(+) and the bar in the
notation for “corrected” WF Ψ̄(+) is introduced for later
use to distinguish it from the “initial” WF Ψ(+) [the one
before application of Eq. (13)]. Further assumptions we
should consider separately in two-body and three-body
cases.
A. Two-body case
To define the width Γ by Eq. (3) we need to know
the complex-energy solution Ψ(+) at pole energy. For
narrow states Γ ≪ Er this solution can be obtained in a
simplified way using the following approximations.
(i) For narrow states we can always choose the auxil-
iary Hamiltonian in such a way that Γ ≪ V̄ −V , and we
can assume Γ → 0 in the Eq. (13).
(ii) Instead of complex-energy solution Ψ(+) in the
right-hand side of (13) we can use the normalized real-
energy quasibound solution Ψ̃ defined for one real reso-
nant value of energy Er = k
dr r2
Ψ̃lm(r)
≡ 1 .
So, the Eq. (13) is used in the form
lm = Ḡ
V̄ − V
Ψ̃lm . (14)
The solution Ψ̄(+) is matched to function
l (kr) = Gl(kr) + iFl(kr) , (15)
while the solution Ψ̃ is matched to function Gl. For deep
subbarrier energies it is reasonable to expect that in the
internal region r ≤ R
Gl ≫ Fl →
Re[Ψ̃(+)]
Im[Ψ̃(+)]
In the single channel case it can be shown by direct cal-
culation that an approximate equality
Re[Ψ̃(+)]
Im[Ψ̃(+)]
holds in the internal region and thus for narrow states
Γ ≪ Er the approximation (13) → (14) should be very
reliable.
(c) N2N1
X , lx
Y , ly
CoreCore
Y , ly
X , lx
(b)N1
r1 , l1
r2 , l2
FIG. 2: Single particle coordinate systems: (a) “V” system
typical for a shell model. In the Jacobi “T” system (b),
“diproton” and core are explicitly in configurations with def-
inite angular momenta lx and ly . For a heavy core the Jacobi
“Y” system (c) is close to the single particle system (a).
To derive Eq. (10) the WF with outgoing asymptotic
is generated using the Green’s function of the auxiliary
Hamiltonian H̄ and the “transition potential” (V − V̄ ).
The standard two-body Green’s function is
k2/(2m)
(r, r′) =
ϕl(kr)h
l (kr
′), r ≤ r′
l (kr)ϕl(kr
′), r > r′
Ylm(r̂)Y
lm(r̂
′) , (16)
where the radial WFs h
l and ϕl of the auxiliary Hamil-
tonian are defined in (15) and (7).
lm (r) =
dr′ Ḡ
k2/(2m)
(r, r′)
V̄ − V
Ψ̃l′m′(r
For the asymptotic region r > R
lm (r) =
l (krr)Ylm(r̂)
dr′ ϕl(krr
V̄ − V
ψ̃l(kr, r
The outgoing flux is then calculated [see Eq. (4)]
2l+ 1
lm (r)∇Ψ̄
As far as function Ψ̃ is normalized by construction then
Γ ≡ jl =
dr ϕl(krr)
V̄ − V
ψ̃l(kr, r)
. (17)
Note, that this equation differs from Eq. (10) only by a
factor 1/(cos2[δ̄l]) which should be very close to unity for
sufficiently high barriers.
B. Simplified model for three-body case
In papers [2, 3] the widths for three-body decays were
defined by the following procedure. We solve numerically
the problem
(H − E3r) Ψ̃ = 0
with some box boundary conditions (e.g. zero or qua-
sibound in diagonal channels at large distances) getting
the WF Ψ̃ normalized in the finite domain and the value
of the real resonant energy E3r. Thereupon we search for
the outgoing solution Ψ(+) of the equation
(H − E3r) Ψ
(+) = −iΓ/2 Ψ̃
with approximate boundary conditions of three-body
Coulomb problem (see Ref. [2] for details) and arbitrary
Γ. The width is then defined as the flux through the
hypersphere of the large radius divided by normalization
within this radius:
dΩ5 Ψ
(+)∗ρ5/2 d
ρ5/2Ψ(+)
ρ=ρmax
∫ ρmax
∣Ψ(+)
The 3-body WF with outgoing asymptotic is
JM (ρ,Ω5) = ρ
Kγ (ρ)J
Kγ (Ω5) , (19)
where the definitions of the hyperspherical variables ρ,
Ω5 and hyperspherical harmonics J
Kγ can be found in
Ref. [4].
Here we formulate the simplified three-body model in
the way which, on one hand, keeps the important dy-
namical features of the three-body decays (typical sizes
of the nuclear potentials, typical energies in the subsys-
tems, correct ratios of masses, etc.), and, on the other
hand, allows a semianalytical treatment of the problem.
Two types of approximations are made here.
The three-body Coulomb interaction is
V coul =
Z1Z2α
Z1Z3α
Y + A2X
A1+A2
Z2Z3α
Y − A1X
A1+A2
, (20)
where α is the fine structure constant. By convention, see
e.g. Fig. 2, in the “T” Jacobi system the core is particle
number 3 and in “Y” system it is particle number 2. We
assume that the above potential can be approximated by
Coulomb terms which depend on Jacobi variables X and
Y only:
V coulx (X) =
, V couly (Y ) =
(in reality for the small X and Y values the Coulomb
formfactors of the homogeneously charged sphere with
radius rsph are always used). The effective charges Zx
and Zy could be considered in two ways.
1. We can neglect one of the Coulomb interactions.
This approximation is consistent with physical situ-
ation of heavy core and treatment of two final state
interactions. Such a situation presumes that Jacobi
“Y” system is preferable and there is a symmetry
in the treatment of the X and Y coordinates, which
are close to shell-model single particle coordinates.
Zx = Z1Zcore , Zy = Z2Zcore . (21)
Further we refer this approximation as “no p-
p Coulomb” case, as typically the proton-proton
Coulomb interaction is neglected compared to
Coulomb interaction of a proton with heavy core.
2. We can also consider two particles on the X coor-
dinate as one single particle. The Coulomb inter-
action in p-p channel is thus somehow taken into
account effectively via a modification of the Zy
charge:
Zx = Z1Zcore , Zy = Z2(Zcore + Z1) . (22)
Below we call this situation as “effective p-p
Coulomb” case.
For nuclear interactions we can assume that
1. There is only one nuclear pairwise interaction and
H = T + V3(ρ) + V
x (X)
+V nucx (X) + V
y (Y ) ,
∆V (X,Y ) = V nucy (Y )− V3(ρ) . (23)
This approximation is good for methodological pur-
poses as it allows to focus on one degree of freedom
and isolate it from the others. From physical point
of view it could be reasonable if only one FSI is
strong [42], or we have reasons to think that de-
cay mechanism associated with this particular FSI
is dominating. Potential V nucy (Y ) in the auxiliary
Hamiltonian (27) is “unphysical” in that case and
can be put zero [43]. We further refer this model
as “one final state interaction” (OFSI).
2. We can consider two final state interactions (TFSI).
Simple form of the Green’s function in that case can
be preserved only if the core mass is considered as
infinite (the X and Y coordinates in the Jacobi
“Y” system coincide with single-particle core-p co-
ordinates). In that case both pairwise interactions
V nucx (X) and V
y (Y ) are treated as “physical”,
that means that they are both present in the ini-
tial and in the auxiliary Hamiltonians. Thus only
three-body potential “survive” the V̄ − V subtrac-
tion:
H = T + V3(ρ) + V
x (X) + V
x (X)
+V couly (Y ) + V
y (Y ) ,
∆V (X,Y ) = −V3(ρ) . (24)
The three-body potential is used in this work in
Woods-Saxon form
V3(ρ) = V
3 (1 + exp [(ρ− ρ0)/aρ])
, (25)
with ρ0 = 5 fm for
17Ne, ρ0 = 6 fm for
45Fe [44], and
a small value of diffuseness parameter aρ = 0.4 fm. Use
of such three-body potential is an important difference
from our previous calculations, where it was utilized in
the form
V3(ρ) = V
1 + (ρ/ρ0)
, (26)
which provides the long-range behaviour ∼ ρ−3. Such
an asymptotic in ρ variable is produced by short-range
pairwise nuclear interactions and thus the interpretation
of three-body potential (26) is phenomenological taking
into account those components of pairwise interactions
which were omitted for some reasons in calculations. In
this work the aim of the potential V3 is different. On
one hand we would like to keep the three-body energy
fixed while the properties (and number) of pairwise in-
teractions are varied. On the other hand we do not want
to change the properties of the Coulomb barriers beyond
the typical nuclear distance (this is achieved by the small
diffuseness of the potential). Thus this potential is phe-
nomenological taking into account interactions that act
only when both valence nucleons are close to the core
(both move in the mean field of the nucleus).
The auxiliary Hamiltonian is taken in the form that
allows a separate treatment of X and Y variables
H̄ = T+V coulx (X)+V
x (X)+V
y (Y )+V
y (Y ) (27)
In this formulation of the model the Coulomb potentials
are fixed as shown above. The nuclear potential V nucx (X)
[V nucy (Y ) if present] defines the position of the state in the
X [Y ] subsystem. The three-body potential V3(ρ) defines
the position of the three-body state, which is found using
the three-body HH approach of [2, 4]. After that a new
WF with outgoing asymptotic is generated by means of
the three-body Green’s function which can be written for
(27) in a factorized form (without paying attention to the
angular coupling)
(XY,X′Y′) =
(X,X′)G
(Y,Y′),
where E3r = Ex + Ey (Ex, Ex are energies of subsys-
tems). The two-body Green’s functions in the expres-
sions above are defined as in (16) via eigenfunctions of
the subhamiltonians
H̄x − Ex = Tx + V
x (X) + V
x (X)− Ex
H̄y − Ey = Ty + V
y (Y ) + V
y (Y )− Ey
In the OFSI case the nuclear potential in the “Y” sub-
system should be put V nucy (Y ) ≡ 0. The “corrected”
continuum WF Ψ̄(+) is
Ψ̄(+)(X,Y) =
dX′dY′
(X,X′)
(Y,Y′) ∆V (X ′, Y ′) Ψ(+)(X′Y′)
The “initial” solution Ψ(+) of Eq. (19) rewritten in the
coordinates X and Y is
JM (X,Y) =
ϕLlxlyS(X,Y )
[ly ⊗ lx]L ⊗ S
0 20 40 60 80 100
17Ne, "Y"-system,  core-p interaction in s-wave
  HH + FR  Kmax  →  12
  HH + FR  Kmax  →  24
  CorrectedΓ
FIG. 3: Convergence of the 17Ne width in a simplified model
in the “Y” Jacobi system. One final state interaction model
with experimental position E2r = 0.535 KeV of the s-wave
two-body resonance. Diamonds show the results of dynamic
HH calculations. Solid curves correspond to calculations with
effective FR potentials.
The asymptotic form of the ”corrected” continuum
WF Ψ̄
JM is
JM (X,Y) =
vx(ε)vy(ε)
×eikx(ε)X+iky(ε)Y
[ly ⊗ lx]L ⊗ S
Ex = εE3r ; Ey = (1− ε)E3r ; vi(ε) =
2Ei/Mi
A(ε) =
dY ′ ϕlx(kx(ε)X
′) ϕly (ky(ε)Y
×∆V (X ′, Y ′) ϕLlxlyS(X
′, Y ′) .(29)
The “corrected” outgoing flux jc can be calculated on the
sphere of the large radius for any of two Jacobi variables.
E.g. for X coordinate we have [45]
jc(E3r) = Im
Ψ̄(+)∗
Ψ̄(+)
= E23r
A∗(ε)
kx(ε)
A(ε′)
2π δ(ky(ε
′)− ky(ε)) . (30)
Values v′i above denote vi(ε
′). The flux is obtained as
jc(E3r) =
vx(ε)vy(ε)
|A(ε)|
. (31)
In principle as we have seen above that the widths ob-
tained with both fluxes Eqs. (18) and (31) should be
equal
≡ Γc =
. (32)
10 20 30 40 50 60 70 80
17Ne, "Y"-system,  core-p interaction in s-wave
HH + FR
   no nucl. potential
   E2R = 1.0 MeV
   E2R = 0.535 MeV
   E2R = 0.36 MeV
FIG. 4: Convergence of widths in OFSI model for different
positions E2r of the two-body resonance in the core-p channel
(Jacobi “Y” system). For Kmax > 24 the value of Kmax de-
note the size of the basis for Feshbach reduction toKmax = 24.
This is the idea of calibration procedure for the simplified
three-body model. The convergence of the HH method
(for WF Ψ
JM ) is expected to be fast in the internal re-
gion and much slower in the distant subbarrier region.
This should be true for the width Γ calculated in the HH
method. However, the procedure for calculation of the
“corrected” width Γc is exact under the barrier and it is
sensitive only to HH convergence in the internal region,
which is achieved easily. Below we demonstrate this in
particular calculations.
IV. DECAYS OF THE
NE 3/2− AND 45FE 3/2−
STATES IN A SIMPLIFIED MODEL
In this Section when we refer widths of 17Ne and 45Fe
we always mean the 17Ne 3/2− state (E3r = 0.344 MeV)
and the 45Fe 3/2− ground state (E3r = 1.154 MeV) cal-
culated in a very simple models. We expect that impor-
tant regularities found for these models should be true
also in realistic calculations. However, particular values
obtained in realistic models may differ significantly, and
this issue is considered specially in the Section V.
To keep only the most significant features of the sys-
tems we assume pure sd structure (lx = 0, ly = 2) for
17Ne and pure p2 structure (lx = 1, ly = 1) for
45Fe in
”Y” Jacobi system (see Fig. 2). Spin dependencies of
the interactions are neglected. The Gaussian formfactor
V nuci (r) = Vi0 exp[−(r/r0)
where i = {x, y}, is taken for 17Ne (see Table I), and a
standard Woods-Saxon formfactor is used for 45Fe (see
Table II),
V nuci (r) = Vi0 [1 + exp[(r − r0)/a]]
. (33)
The simplistic structure models can be expected to
overestimate the widths. There should be a considerable
−14 −12 −10 −8 −6 −4 −2 0
10-16
10-15
10-14
10-13
10-12
0.2 0.3 0.4 0.5 0.6 0.70.80.9 1 2 3
10-16
10-15
10-14
10-13
10-12 17
Three-body
    regime
E2r = E3r
E2r (exp.)
E2r  (MeV)
   Corrected
   Kmax = 24
   Kmax = 24 + FR
  Transition region 
E2r = (0.7−0.85) E3r
 Vx  (MeV)
FIG. 5: Width of the 17Ne 3/2− state as a function of two-
body resonance position E2r. Dashed, dotted and solid lines
show cases of pure HH calculations with Kmax = 24, the
same but with Feshbach reduction from Kmax = 100, and the
corrected width Γc. Inset shows the same, but as a function
of the potential depth parameter Vx0. Gray area shows the
transition region from three-body to two-body decay regime.
The gray curve shows simple analytical dependence of Eq.
(34).
weight of d2 component (lx = 2, ly = 2) in
17Ne and f2
component (lx = 3, ly = 3) in
45Fe. Also the spin-angular
coupling should lead to splitting of the single-particle
strength and corresponding reduction of the width es-
timates (e.g. we assume one s-wave state at 0.535 keV in
the “X” subsystem of 17Ne while in reality there are two
s-wave states in 16F: 0− at 0.535 MeV and 1− at 0.728
MeV). Thus the results of the simplified model should
most likely be regarded as upper limits for widths.
A. One final state interaction — core-p channel
First we take into account only the 0.535 MeV s-wave
two-body resonance in the 16F subsystem (this is the
experimental energy of the first state in 16F). Conver-
gence of the 17Ne width in a simplified model for Jacobi
“Y” system is shown in Fig. 3. The convergence of the
corrected width Γc as a function of Kmax is very fast:
Kmax > 8 for the width is stable within ∼ 1%. For maxi-
mal achieved in the fully dynamic calculation Kmax = 24
the three-body width Γ is calculated within 30% preci-
sion. Further increase of the effective basis size is possi-
ble within the adiabatic procedure based on the so called
Feschbach reduction (FR).
Feschbach reduction is a procedure, which eliminates
from the total WF Ψ = Ψp +Ψq an arbitrary subspace q
using the Green’s function of this subspace:
Hp = Tp + Vp + VpqGqVpq
In a certain adiabatic approximation we can assume that
the radial part of kinetic energy is small and constant un-
0.0 0.2 0.4 0.6 0.8 1.0
"Y"-system
 (p-core)-p
  Kmax = 6
  Kmax = 10
  Kmax = 14
  Kmax = 24
  Kmax = 100
  corrected
ε =Ex / E3r
FIG. 6: Convergence of energy distribution for 17Ne in the
“Y” Jacobi system.
der the centrifugal barrier in the channels with so high
centrifugal barrier that it is much higher than any other
interaction. In this approximation the reduction proce-
dure becomes trivial as it is reduced to construction of
effective three-body interactions V effKγ,K′γ′ by matrix op-
TABLE I: Parameters for 17Ne calculations. Potential pa-
rameters for 15O+p channel in s-wave (Vx0 in MeV, r0 = 3.53
fm) and 16F+p channel in d-wave (Vy0 in MeV). Radius of the
charged sphere is rsph = 3.904 fm. Widths Γi of the state in
the subsystem and experimental width values Γexp for really
existing at these energies states are given in keV. The cor-
rected three-body width Γc is given in the units 10
−14 MeV.
TFSI calculations with d-wave state at 1.2 MeV are made
with s-wave state at 0.728 MeV.
E2r lx (ly) Vx0 (Vy0) Γx (Γy) Γexp Γc
0.258 0 −14.4 0.221 144
0.275 0 −14.35 0.355 16.6
0.292 0 −14.3 0.544 7.75
0.360 0 −14.1 2.09 2.34
0.535 0 −13.55 17.9 25(5) [34] 0.545
0.728 0 −12.89 72.0 70(5) [34] 0.211
1.0 0 −12.0 252 0.093
2.0 0 −9.0 ∼ 1500 0.021
0.96 2 −87.06 3.5 6(3) [34] 4.73a
1.256 2 −85.98 12.2 < 15 [35] 2.0a
0.96 2 −66.46 3.6 6(3) [34] 1.37b
1.256 2 −65.4 13.7 < 15 [35] 0.584b
aThis is TFSI calculation with “no p-p” Coulomb, r0 = 2.75 fm.
bThis is TFSI calculation with “effective” Coulomb, r0 = 3.2 fm.
0.0 0.2 0.4 0.6 0.8 1.0
94.4%
68.0%
   no nucl. potential
   E2r = 1.0 MeV
   E2r = 0.535 MeV
   E2r = 0.360 MeV
   E2r = 0.284 MeV
   E2r = 0.266 MeV
   E2r = 0.249 MeV
ε =Ex / E3r
FIG. 7: Energy distributions for 17Ne in the “Y” Jacobi
system for different two-body resonance positions E2r. The
three-body decay energy is E3r = 0.344 MeV. The distri-
butions are normalized to have unity value on maximum of
three-body components. The values near the peaks show the
fraction of the total intensity concentrated within the peak.
Note the change of the scale at vertical axis.
erations
G−1Kγ,K′γ′ = (H − E)Kγ,K′γ′ = VKγ,K′γ′
Ef − E +
(K + 3/2)(K + 5/2)
δKγ,K′γ′ ,
V effKγ,K′γ′ = VKγ,K′γ′ +
VKγ,K̄γ̄GK̄γ̄,K̄′γ̄′VK̄′γ̄′,K′γ′ .
Summation over indexes with bar is made for eliminated
channels. No strong sensitivity to the exact value of the
“Feshbach energy” Ef is found and we take it as Ef ≡ E
in our calculations. More detailed account of the pro-
cedure applied within HH method can be found in Ref.
[36].
It can be seen in Fig. 3 (solid line) that Feschbach re-
duction procedure drastically improves the convergence.
However, the calculation converges to a width value,
which is somewhat smaller than the corrected width value
(that should be exact). The reason for this effect can
be understood if we make a reduction to a smaller “dy-
namic” basis size (Kmax = 12, gray line). The calculation
in this case also converges, but even to a smaller width
value. We can conclude that FR procedure allows any-
how to approach the real width value, but provides a
good result only for sufficiently large size of the dynamic
sector of the basis.
The next issue to be discussed is a convergence of the
width in calculations with different positions E2r of two-
body resonance in the core+p subsystem. It is demon-
strated for several energies E2r in Fig. 4. When the
resonance in the subsystem is absent (or located rela-
tively high) the convergence of the width value to the
exact result is very fast both in the pure three-body
and in the “corrected” calculation (in that case, how-
ever, much faster). Here even FR is not required as the
0 20 40 60 80 100 120 140
HH + FRHH
Gaussian potential in s-wave
   OFSI           TFSI   
Potential with repulsive core in s-wave
   OFSI           TFSI
17Ne, "Y"-system 
FIG. 8: Convergence of the 17Ne width in a simplified model.
Jacobi “Y” system. OFSI model with s-wave two-body reso-
nance at E2r = 0.535 MeV; Gaussian potential and potential
with repulsive core. TFSI model with d-wave two-body reso-
nance at E2r = 0.96 MeV.
convergent result is achieved in the HH calculations by
Kmax = 10 − 24. The closer two-body resonance ap-
proaches the decay window, the worse is convergence of
HH calculations. At energy E2r = 360 keV (which is al-
ready close to three-body decay window E3r = 344 keV)
even FR procedure provides a convergence to the width
value which is only about 65% of the exact value.
In Fig. 5 the calculations with different E2r values are
summarized. The width grows rapidly as the two-body
resonance moves closer to the decay window. The pen-
etrability enhancement provided by the two-body reso-
nance even before it moves into the three-body decay
window is very important. Difference of widths with no
core-p FSI and FSI providing the s-wave resonance to
be at it experimental position E2r = 0.535 MeV is more
than two orders of the magnitude. The convergence of
HH calculations also deteriorates as E2r moves closer to
the decay window. However, the disagreement between
the HH width and the exact value is within the order
of the magnitude, until the resonance achieves the range
E2r ∼ (0.7 − 0.85)E3r. Within this range a transition
from three-body to two-body regime happens (see also
discussion in [8]), which can be seen as a drastic change
of the width dependence on E2r. This means that a se-
quential decay via two-body resonanceE2r becomes more
efficient than the three-body decay. In that case the hy-
perspherical expansion can not treat the dynamics effi-
ciently any more and the disagreement with exact result
becomes as large as orders of the magnitude. The de-
cay dynamics in the transition region is also discussed in
details below.
It can be seen in Fig. 5 that in three-body regime the
dependence of the three-body width follows well the an-
alytical expression
Γ ∼ (E3r/2− E2r)
−2 (34)
The reasons of such a behaviour will be clarified in
0 20 40 60 80 100 120 140
  HH + FR  Kmax  →  24
  Corrected
17Ne, "T"-system,  p-p interaction only
FIG. 9: Convergence of the 17Ne width in a simplified model.
Jacobi “T” system. Final state interaction describes s-wave
p-p scattering.
the forthcoming paper [37]. The deviations from this de-
pendence can be found in the decay window (close to
“transition regime”) and at higher energies. This depen-
dence is quite universal; e.g. for 45Fe it is demonstrated
in Fig. 14, where it follows the calculation results even
with higher precision.
Another important issue is a convergence of energy dis-
tributions in the HH calculations, demonstrated in Fig.
6 for calculations with E2r = 535 keV. The distribution
is calculated in “Y” Jacobi subsystem, thus Ex is the en-
ergy between the core and one proton. The energy dis-
tribution convergence is fast: the distribution is stable at
Kmax = 10− 14 and does not change visibly with further
increase of the basis. There remain a visible disagree-
ment with exact (”corrected”) results, which give more
narrow energy distribution. We think that this effect was
understood in our work [4]. The three-body calculations
are typically done for ρmax ∼ 500−2000 fm (ρmax ∼ 1000
fm everywhere in this work). It was demonstrated in Ref.
[4] by construction of classical trajectories that we should
expect a complete stabilization of the energy distribution
in core+p subsystem at ρmax ∼ 30000−50000 fm and the
effect on the width of the energy distribution should be
comparable to one observed in Fig. 6.
The evolution of the energy distribution in core+p sub-
system with variation of E2r is shown in Fig. 7. When
we decrease the energy E2r the distribution is very stable
until the two-body resonance enters the three-body de-
cay energy window. After that the peak at about ε ∼ 0.5
first drifts to higher energy and then for E2r ∼ 0.85E3r
the noticeable second narrow peak for sequential decay
is formed. At E2r ∼ 0.7E3r the sequential peak becomes
so high that the three-body component of the spectrum
is practically disappeared in the background.
The result concerning the transition region obtained
in this model is consistent with conclusion of the paper
[8] (where much simpler model was used for estimates).
The three-body decay is a dominating decay mode, not
0.0 0.2 0.4 0.6 0.8 1.0
   Kmax = 10
   Kmax = 100
   corrected  (p-p FSI)
   corrected  (no p-p FSI)
ε =Ex / E3r
"T"-system   (p-p)-core
FIG. 10: Energy distributions for 17Ne in “T” Jacobi system
(between two protons).
only when the sequential decay is energy prohibited as
E2r > E3r. Also the three-body approach is valid when
the sequential decay is formally allowed (because E2r <
E3r) but is not taking place in reality due to Coulomb
suppression at E2r >∼ 0.8E3r.
Geometric characters of potentials can play an impor-
tant role in the width convergence. To test this aspect
of the convergence we have also made the calculations
for potential with repulsive core. This class of potentials
was employed in studies of 17Ne and 19Mg in Ref. [7]. A
comparison of the convergence of HH calculations with s-
wave 15O+p potential from [7] and Gaussian potential is
given in Fig. 8. The width convergence in the case of the
“complicated” potential with a repulsive core is drasti-
cally worse than in the “easy” case of Gaussian potential.
For typical dynamic calculations withKmax = 20−24 the
HH calculations provide only 20 − 25% of the width for
potential with a repulsive core. On the other hand the
calculations with both potentials provide practically the
same widths Γc [46] and FR provides practically the same
and very well converged result in both cases.
B. One final state interaction — p-p channel
As far as two-proton decay is often interpreted as
“diproton” decay we should also consider this case and
study how important this channel could be. For this cal-
culation we use a simple s-wave Gaussian p-p potential,
providing a good low-energy p-p phase shifts,
V (r) = −31 exp[−(r/1.8)2] . (35)
Calculations with this potential are shown in Fig. 9 (see
also Table V). First of all the penetrability enhancement
provided by p-p FSI is much less than the enhancement
provided by core-p FSI (the widths differs more than two
orders of the magnitude, see Fig. 3). This is the feature,
which has been already outlined in our works. The p-p in-
teraction may boost the penetrability strongly, but only
0 4 8 12 16 20
ρ0 /√2   (fm)
ρ0 = 6 fm
0 2 4 6 8
Ypeak  (fm)
FIG. 11: Comparison of the OFSI calculations (solid lines) for
45Fe in the “T” system with diproton model Eq. (36) (dashed
lines). Effective equivalent channel radius rch(dp) for “dipro-
ton emission” (a) as a function of radius ρ0 of the three-body
potential (25), the value ρ0/
2 should be comparable with
typical nuclear sizes. (b) as a function of the position of the
peak Ypeak in the three-body WF Ψ
(+) in Y coordinate. The
dashe lines are given to guide the eye.
in the situation, when protons occupy predominantly or-
bitals with high orbital momenta. In such a situation the
p-p interaction allows transitions to configurations with
smaller orbital momenta in the subbarrier region, which
provide a large increase of the penetrability. In our sim-
ple model for 17Ne 3/2− state, we have already assumed
the population of orbitals with minimal possible angular
momenta and thus no strong effect of the p-p interaction
is expected.
Also a very slow convergence of the decay width should
be noted in this case. For core-p interaction the Kmax ∼
10 − 40 were sufficient to obtain a reasonable result. In
the case of the p-p interaction theKmax ∼ 100 is required.
Energy distributions between two protons obtained in
this model are shown in Fig. 10. Important feature of
these distributions is a strong focusing of protons at small
p-p energies. This feature is connected, however, not
with attractive p-p FSI, but with dominating Coulomb
repulsion in the core-p channel. This is demonstrated by
the calculation with nuclear FSI turned off, which pro-
vides practically the same energy distributions. Similarly
to the case of the core-p FSI, very small Kmax > 10 is
sufficient to provide the converged energy distribution.
The converged HH distribution is very close to the exact
(”corrected”) one but it is, again, somewhat broader.
So far the diproton model has been treated by us as a
reliable upper limit for three-body width [8]. With some
technical improvements this model was used for the two-
proton widths calculations in Refs. [16, 17, 18, 19]. It
is important therefore to try to understand qualitatively
the reason of the small width values obtained in this form
of OFSI model, which evidently represents appropriately
formulated diproton model [47]. In Fig. 11 we compared
the results of the OFSI calculations for 45Fe in the “T”
0 20 40 60 80 100
17Ne, "Y"-system, core-p interactions 
                    in s- and d-waves
  HH + FR  Kmax  →  24
  Corrected
FIG. 12: Convergence of the 17Ne width for experimental
positions E2r = 0.535 MeV of the 0
+ two-body resonance in
the “X” subsystem and E2r = 0.96 MeV of the 2
+ two-body
resonance in the “Y” subsystem (TFSI model).
system with diproton width estimated by expression
Γdp =
Mredr
ch(dp)
Pl=0(0.95E3r, rch(dp), 2Zcore) ,
where Mred is the reduced mass for
43Cr-pp motion and
rch(dp) is channel radius for diproton emission. The en-
ergy for the relative 43Cr-ppmotion is taken 0.95E3r bas-
ing on the energy distribution in the p-p channel (see
Fig. 10 for example). In Fig. 11a we show the effective
equivalent channel radii for diproton emission obtained
by fulfilling condition Γdp ≡ Γc for OFSI model calcula-
tions with different radii ρ0 of the three-body potential
Eq. (25). It is easy to see that for realistic values of these
radii (ρ0 ∼ 6 fm for
45Fe) the equivalent diproton model
radii should be very small (∼ 1.5 fm). This happens pre-
sumably because the “diproton” is too large to be con-
sidered as emitted from nuclear surface of such small ρ0
radius. Technically it can be seen as the nonlinearity of
the rch(dp)-ρ0 dependence, with linear region achieved at
ρ0 ∼ 15−20 fm. Only at such unrealistically large ρ0 val-
ues the typical nuclear radius (when it becomes compa-
rable with the “size” of the diproton) can be reasonably
interpreted as the surface, off which the “diproton” is
emitted. It is interesting to note that in the nonlinearity
region for Fig. 11a there exists practically exact corre-
spondence between the Y coordinate of the WF peak in
the internal region and the channel radius for diproton
emission (Fig. 11b). This fact is reasonable to interpret
in such a way that the diproton is actually emitted not
from nuclear surface (as it is presumed by the existing
systematics of diproton calculations) but from the inte-
rior region, where the WF is mostly concentrated.
0 20 40 60 80
45Fe, "Y"-system, core-p interactions 
                           in p-waves
  HH + FR  Kmax  →  24
  Corrected
FIG. 13: Convergence of the 45Fe width for position of the 1−
two-body resonances in “X” and “Y” subsystems E2r = 1.48
C. Two final state interactions
As we have already mentioned the situation of one final
state interaction is comfortable for studies, but rarely re-
alized in practice. An exception is the case of the E1 tran-
sitions to continuum in the three-body systems, consid-
ered in our previous work [23]. For narrow states in typ-
ical nuclear system of the interest there are at least two
comparable final state interactions (in the core-p chan-
nel). For systems with heavy core this situation can be
treated reasonably well as the Y coordinate (in “Y” Ja-
cobi system) for such systems practically coincides with
the core-p coordinate. Below we treat in this way 17Ne
(for which this approximation could be not very consis-
tent) and 45Fe (for which this approximation should be
good). In the case of 17Ne we are thus interested in the
scale of the effect, rather in the precise width value.
For calculations with two FSI for 17Ne we used Gaus-
sian d-wave potential (see Table I), in addition to the
s-wave potential used in Section IVA. This potential
provides a d-wave state at 0.96 MeV (Γ = 13.5 keV),
which corresponds to the experimental position of the
first d-wave state in 16F. The convergence of the 17Ne
decay width is shown in Fig. 12. Comparing with Fig.
3 one can see that the absolute value of the width has
changed significantly (2−3 times) but not extremely and
the convergence is practically the same. Interesting new
feature is a kind of the convergence curve “staggering”
for odd and even values of K/2. Also the convergence
of the corrected calculations requires now a considerable
Kmax ∼ 12− 14.
The improved experimental data for 2p decay of 45Fe
is published recently in Ref. [12]: E3r = 1.154(16) MeV,
Γ2p = 2.85
+0.65
−0.68× 10
−19 MeV [T1/2(2p) = 1.6
−0.3 ms] for
two-proton branching ratio Br(2p) = 0.57. Below we use
the resonance energy from this work.
The convergence of the 45Fe width is shown in Fig. 13.
The character of this convergence is very similar to that
1 2 3
10-19
10-18
E2r   from Refs. [4,8]
E2r = E3r
Three-body
    regime
n    Corrected
   Kmax = 24
   Kmax = 24 + FR
E2r  (MeV)
FIG. 14: The 45Fe g.s. width as a function of the two-body
resonance position E2r. Dashed, dotted and solid lines show
cases of a pure HH calculation with Kmax = 24, the same but
with Feshbach reduction from Kmax = 100, and the corrected
width Γc. Gray area shows the transition region from three-
body to two-body decay regime. The gray curve shows simple
analytical dependence of Eq. (34).
in the 17Ne case, except the “staggering” feature is more
expressed.
The dependence of the 45Fe width on the two-body
resonance energy E2r is shown in Fig. 14. Potential
parameters for these 45Fe calculations are given in Ta-
ble II. The result calculated for E3r = 1.154 MeV and
E2r = 1.48 MeV in paper [4] for pure [p
2] configuration
is Γ = 2.85× 10−19 MeV. The value Kmax = 20 was used
in these calculations. If we take the HH width value from
Fig. 13 at Kmax = 20 it provides Γ = 2.62× 10
−19 MeV,
which is in a good agreement with a full HH three-body
model of Ref. [4]. However, from Fig. 13 we can conclude
that in the calculations of [4] the width was about 35%
underestimated. Thus the value of about Γ = 6.3×10−19
MeV should be expected in these calculations. On the
other hand much larger uncertainty could be inferred
from Fig. 14 due to uncertain energy of the 44Mn ground
state. If we assume a variation E2r = 1.1− 1.6 MeV the
TABLE II: Parameters for 45Fe calculations. Potential pa-
rameters for p-wave interactions (33) in 43Cr+p channel (Vx0
in MeV, r0 = 4.236 fm, rsph = 5.486 fm) and
44Mn+p (Vy0 in
MeV, r0 = 4.268 fm, rsph = 5.527 fm), a = 0.65 fm. Calcula-
tions are made with “effective Coulomb” of Eq. (22). Widths
Γx, Γy of the states in the subsystems are given in keV. Cor-
rected three-body widths are given in the units 10−19 MeV.
E2r Vx0 Γx Vy0 Γy Γc
1.0 −24.350 4.3× 10−3 −24.54 2.1 × 10−3 26.5
1.2 −24.03 0.032 −24.224 0.018 11.8
1.48 −23.58 0.26 −23.78 0.15 5.6
2.0 −22.7 3.6 −22.93 2.3 2.3
3.0 −20.93 58 −21.19 44 0.84
10 100
HH + FR
   TFSI
   Three-body realistic
17Ne 
FIG. 15: Interpolation of 17Ne decay width obtained in full
three-body calculations by means of TFSI convergence curves
(see Fig. 8). Upper curves correspond to TFSI case with
Gaussian potential in s-wave and compatible S1 case for full
three-body model. Lower curves correspond to TFSI case
with repulsive core potential in s-wave and compatible GMZ
case for full three-body model.
inferred from Fig. 14 uncertainty of the width would be
Γ = (4 − 16)× 10−19 MeV. On top of that we expect a
strong p2/f2 configuration mixing which could easily re-
duce the width within an order of the magnitude. Thus
we can conclude that a better knowledge about spectrum
of 44Mn and a reliable structure information about 45Fe
are still required to make sufficiently precise calculations
of the 45Fe width. More detailed account of these issues
is provided below.
V. THREE-BODY CALCULATIONS
Having in mind the experience of the convergence stud-
ies we have performed large-basis calculations for 45Fe
and 17Ne. They are made with dynamicalKmax = 16−18
(including Fechbach reduction from Kmax = 30− 40) for
17Ne and Kmax = 22 (FR from Kmax = 40) for
45Fe.
The calculated width values are extrapolated using the
convergence curves obtained in TFSI model (Figs. 15) for
17Ne and 13for 45Fe). We have no proof that the width
convergence in the realistic three-body case is asolutely
the same as in the TFSI case. However, the TFSI model
takes into account main dynamic features of the system
causing a slow convergence, and we are expecting that
the convergence should be nearly the same in both cases.
A. Widths and correlations in
The potentials used in the realistic calculations are the
same as used for 17Ne studies in Refs. [7, 38]. The GPT
potential [39] is used in the p-p channel. The core-p
potentials are referred in [38] as “GMZ” (potential in-
FIG. 16: Correlations for 17Ne decay in “T” and “Y” Jacobi systems. Three-body calculations with realistic (GMZ) potential.
troduced in [7]) and “high s” (with centroid of d-wave
states is shifted upward which is providing a higher con-
tent of s2 components in the 17Ne g.s. WF). Both poten-
tials provide correct low-lying spectrum of 16F and differ‘
only for d-wave continuum above 3 MeV (see Table III).
The core-p nuclear potentials, including central, ss and
ls terms, are taken as
V (r) =
V lc + (s1 · s2)V
1 + exp[(r − rl0)/a]
− (l · s)
2.0153V lls
× exp[(r − rl0)/a]
1 + exp[(r − rl0)/a]
,(37)
with parameters: a = 0.65 fm, r00 = 3.014 fm, r
2.94 fm, V 0c = −26.381 MeV, V
c = −9 MeV, V
−57.6 (−51.48) MeV, V 3c = −9 MeV, V
ss = 0.885 MeV,
V 2ss = 4.5 (12.66) MeV, Vls = 4.4 (13.5) MeV (the values
in brackets are for “high s” case). There are also repulsive
cores for s- and p-waves described by a = 0.4 fm, r00 =
0.89 fm, Vcore = 200 MeV. These potentials are used
together with Coulomb potential obtained for Gaussian
charge distribution reproducing the charge radius of 15O.
To have extra confidence in the results, the width of
the 17Ne 3/2− state is calculated in several models of
growing complexity (Tables IV-VI). One can see from
those Tables that improvements introduced on each step
provide quite smooth transition from the very simple to
the most sophisticated model.
In Table IV we demonstrate how the calculations in the
simplified model of Section IV are compared with calcu-
lations of the full three-body model with appropriately
truncated Hamiltonian. We can switch off correspond-
ing interactions in the full model to make it consistent
with approximations of the simplified model. To remind,
the differences of the full model and simplified model are
the following: (i) antisymmetrization between protons
is missing in the simplified model and (ii) Y coordinate
is only approximately equal to the coordinate between
core and second proton. Despite these approximations
the models demonstrate very close results: the worst dis-
agreement is not more than 30%.
In Table V we compare approximations of a different
kind: those connected with choice of the Jacobi coordi-
nate system in the simplified model. First we compare
the “pure Coulomb” case: all pairwise nuclear interac-
tions are off and the existence of the resonance is provided
solely by the three-body potential (25). This model pro-
vides some hint what should be the width of the system
without nuclear pairwise interactions. Then the models
are compared with the nuclear FSIs added. The addition
of nuclear FSI drastically increase width in all cases. It
is the most “efficient” (in the sense of width increase) in
the case of TFSI model in the “Y” system. Choice of
this model provides the largest widths and can be used
for the upper limit estimates.
In Table VI full three-body models are compared. The
simplistic S1 and S2 interactions correspond to calcula-
tions with simplified spectra of the 16F subsystem. For S1
case it includes one s-wave state at 0.535 MeV (Γ = 18.8
keV) and one d-wave state at 0.96 MeV (Γ = 3.5 keV).
These are two lower s- and d-wave states known experi-
mentally. In the S2 case we use instead the experimental
positions of the higher component of the s- and d-wave
doublets: s-wave at 0.72 MeV (Γ = 73.4 keV) and d-
TABLE III: Low-lying states of 16F obtained in the “GMZ”
and “high s” core-p potentials. The potential is diagonal in
the representation with definite total spin of core and proton
S, which is given in the third column.
Case GMZ high s Exp.
Jπ l S E2r (MeV) Γ (keV) E2r (MeV) Γ (keV) Γ (keV)
0− 0 0 0.535 18.8 0.535 18.8 25(5) [34]
1− 0 1 0.728 73.4 0.728 73.4 70(5) [34]
2− 2 0 0.96 3.5 0.96 3.5 6(3) [34]
3− 2 1 1.2 9.9 1.2 10.5 < 15 [35]
2− 2 1 3.2 430 7.6 ∼ 3000
1− 2 1 4.6 1350 ∼ 15 ∼ 6000
FIG. 17: Correlations for 17Ne decay in “T” and “Y” Jacobi systems. Three-body calculations with Coulomb FSIs only (all
nuclear pairwise potentials are turned off).
wave at 1.2 MeV (Γ = 10 keV). Parameters of the core-p
potentials can be found in Table I. Simple Gaussian p-
p potential (35) is used. The variation of the results
between these models is moderate (∼ 30%). The calcu-
lations with GMZ potential provide the width for 17Ne
3/2− state which comfortably rests in between the re-
sults obtained in the simplified S1 and S2 models. The
structure of the WF is also obtained quite close to these
calculations. The structure in the “high s” case is ob-
tained with a strong domination of the sd component.
The width in the “high s” case is obtained somewhat
larger (∼ 11%) than in GMZ case, but this increase is
consistent with the increase of the sd WF component,
(∼ 15%) which is expected to be more preferable for de-
cay than d2 component.
It is important for us that the results obtained in the
three-body models with considerably varying spectra of
the two-body subsystems and different convergence sys-
tematics appear to be quite close: Γ ∼ (5 − 8) × 10−15
MeV. Thus we have not found a factor which could lead
to a considerable variation of the three-body width, given
the ingredients of the model are reasonably realistic.
The decomposition of the 17Ne WF obtained with
GMZ potential is provided in Table VII in terms of par-
tial internal normalizations and partial widths. The cor-
respondence between the components with large weights
and large partial widths is typically good. However, there
are several components giving large contribution to the
width in spite of negligible presence in the interior.
Complete correlation information for three-body de-
cay of a resonant state can be described by two variables
(with omission of spin degrees of freedom). We use the
energy distribution parameter ε = Ex/E3r and the angle
cos(θk) = (kxky)/(kxky) between the Jacobi momenta.
The complete correlation information is provided in Fig.
16 for realistic 17Ne 3/2− decay calculations. We can see
that the profile of the energy distribution is characterized
by formation of the double-hump structure, expected so
far for p2 configurations (see, e.g. [4]). This structure
can be seen both in “T” system (in energy distribution)
and in “Y” system (in angular distribution). In the cal-
culations of ground states of the s-d shell nuclei we were
getting such distributions to be quite smooth. It can be
found that the profile of this distribution is defined by
the sd/d2 components ratio. For example in the calcula-
tions with “high s” potential the total domination of the
sd configuration leads to washing out of the double-hump
profile.
The correlations in the 17Ne (shown in Fig. 16) are
strongly influenced by the nuclear FSIs. Calculations for
only Coulomb pairwise FSIs left in the Hamiltonan are
TABLE IV: Comparison of widths for 17Ne (in 10−14 MeV
units) obtained in simplified model in “Y” Jacobi system
and in full three-body model with correspondingly truncated
Hamiltonian. Structure information is provided for the three-
body model. In the simplified model the weight of the [sd]
configuration is 100% by construction. “No p-p” column
shows the case where Coulomb interaction in p-p channel is
switched off (see, (21)). “Eff.” column corresponds to the ef-
fective treatment (see, (22)) of Coulomb interaction in the p-p
channel in the simplified model, but to the exact treatment
in full three-body model.
pure Coulomb OFSI TFSI
“no p-p” Eff. “no p-p” Eff. “no p-p” Eff.
Simpl. 0.017 0.0032 3.02 0.545 4.70 1.37
3-body 0.024a 0.0041a 3.22 0.555 3.91 0.445
[sd] 99.8 99.3 99.6 99.5 92.0 72.6
[p2] 0.2 0.6 0.3 0.4 0.1 0.2
[d2] 0 0 0 0 7.8 27.1
aSmall repulsion (∼ 0.5 MeV) was added in that case in the p-
wave core-p channel to split the states with sd and p2 structure
which appear practically degenerated and strongly mixed in this
model.
FIG. 18: Correlations for 17Ne decay calculated in simplified OFSI model in “T” (only p-p FSI) and in “Y” Jacobi systems
(only s-wave core-p FSI).
shown in Fig. 17. The strong peak at small p-p energy
is largely dissolved and the most prominent feature of
the correlation density in that case is a rise of the distri-
bution for cos(θk) → 1 in the “Y” Jacobi system. This
kinematical region corresponds to motion of protons in
the opposite directions from the core and is qualitatively
understandable feature of the three-body Coulomb in-
teraction (the p-p Coulomb interaction is minimal along
such a trajectory).
The distributions calculated in the simplified (OFSI)
model are shown in Fig. 18 on the same {ε, cos(θk)} plane
as in Figs. 16 and 17. It should be noted that here the
calculations in “T” and “Y” Jacobi systems represent
different calculations (with p-p FSI only and with core-p
FSI only). In Figs. 16 and 17 two panels show differ-
ent representations of the same result. Providing rea-
sonable (within factor 2 − 4) approximation to the full
three-body model in the sense of the decay width, the
simplified model is very deficient in the sense of correla-
TABLE V: Comparison of widths calculated for 17Ne (10−14
MeV units) and 45Fe (10−19 MeV units) with pure Coulomb
FSIs and for nuclear plus Coulomb FSIs. Simplified OFSI
model in “T”, TFSI in “Y” Jacobi systems (“effective”
Coulomb is used in both cases) and full three-body calcu-
lations.
pure Coulomb Nuclear+Coulomb
“T” “Y” 3-body “T” “Y” 3-body
17Ne 0.0011 0.0032 0.0041 0.0077 1.37 0.76a
[sd] 100 100 99.3 100 100 73.1
[p2] 0 0 0.6 0 0 1.8
[d2] 0 0 0 0 0 24.2
45Fe 0.0053 0.0167 0.26 0.034 4.94 6.3b
aThis is a calculation with S1 Hamiltonian.
bThis is a calculation providing pure p2 structure.
tions. The only feature of the realistic correlations which
is even qualitatively correctly described in the simplified
model is the energy distribution in the “Y” system. The
“diproton” model (OFSI model with p-p interaction) fails
especially strongly, which is certainly relevant to the very
small width provided by this calculation.
B. Width of
The calculation strategy is the same as in [4]. We start
with interactions in the core-p channel which give a res-
onance in p-wave at fixed energy E2r. Such a calculation
provides 45Fe with practically pure p2 structure. Then
we gradually increase the interaction in the f -wave, until
it replaces the p-wave resonance at fixed E2r and then
we gradually move the p-wave resonance to high energy.
Thus we generate a set of WFs with different p2/f2 mix-
ing ratios.
The results of the improved calculations with the same
settings as in [4] (the 44Mn g.s. is fixed to have E2r = 1.48
MeV) are shown in Fig. 19 (see also Table V) together
with updated experimental data [12]. The basis size used
in [4] was sufficient to provide stable correlation pictures
(as we have found in this work) and they are not updated.
TABLE VI: Width (in 10−14 MeV units) and structure of
17Ne 3/2− state calculated in a full three-body model with
different three-body Hamiltonians.
S1 S2 GMZ high s
Kmax = 18 0.35 0.27 0.14 0.16
Extrapolated 0.76 0.56 0.69 0.76
[sd] 73.1 71.7 80.2 95.1
[p2] 1.8 1.8 2.0 1.3
[d2] 24.2 25.7 16.8 3.1
The sensitivity of the obtained results to the experi-
mentally unknown energy of 44Mn can be easily studied
by means of Eq. (34). The results are shown in Fig. 20 in
terms of the regions consistent with experimental data
on the {E2r,W (p
2)} plane [W (p2) is the weight of p2
configuration in 45Fe WF]. It is evident from this plot
that our current experimental knowledge is not sufficient
to draw definite conclusions. However, it is also clear
that with increased precision of the lifetime and energy
measurements for 45Fe and the appearance of more de-
tailed information on 44Mn subsystem the restrictions on
the theoretical models should become strong enough to
provide the important structure information.
VI. DISCUSSION
General trends of the model calculations can be well
understood from Tables IV-VI. For the pure Coulomb
case the simplified model calculations (in the “Y” and
“T” systems) and three-body calculations provide rea-
sonably consistent results. The simplified calculations in
the “Y” system always give larger widths than those in
the “T” system. From decay dynamics point of view this
leads to understanding of the contradictory fact that the
sequential decay path is preferable even if no even virtual
sequential decay is possible (as the nuclear interactions
are totally absent in this case).
The calculations with attractive nuclear FSIs rather
TABLE VII: Partial widths ΓKγ of different components of
17Ne 3/2− WF calculated in “T” Jacobi systems. Partial
weights are given in “T” (valueN
) and in “Y” (valueN
Jacobi systems. Sx is the total spin of two protons.
K L lx ly Sx N
2 2 0 2 0 23.88 33.87 44.93
2 2 2 0 0 24.97 16.52 13.29
2 2 1 1 1 0.28 7.39 3.59
2 2 1 1 0 1.54
2 2 0 2 1 3.68
2 2 2 0 1 3.68
4 2 0 2 0 8.97 20.04 3.19
4 2 2 0 0 8.68 13.57 5.57
4 2 2 2 0 15.49 0.32 18.80
4 2 1 3 1 0.03 2.18 0.95
4 2 3 1 1 0 1.89 0.63
4 1 2 2 1 1.02
4 2 0 2 1 1.99
4 2 2 0 1 2.07
6 2 2 4 0 0.14 0.77 3.57
6 2 4 2 0 0.14 0.77 0.78
6 2 0 2 0 0.50 0.09 0.69
8 2 4 4 0 0.02 0.003 1.58
1.0 1.1 1.2 1.3
10−20
10−19
10−18
 f  2  
26Fe 
E3r  (MeV)
4.56 10-21
   MeV
E2r = 1.48 MeV
FIG. 19: The lifetime of 45Fe as a function of the 2p decay
energy E3r. The plot is analogue of Fig. 6a from [4] with up-
dated experimental data [12] and improved theoretical results.
Solid curves shows the cases of practically pure p2 and f2
configurations, dashed curves stand for different mixed p2/f2
cases. The numerical labels on the curves show the weights
of the s2 and p2 configurations in percents.
expectedly provide larger widths than the corresponding
calculations with Coulomb interaction only. The core-
proton FSI is much more efficient for width enhancement
than p-p FSI. This fact is correlated with the observation
of the previous point and is a very simple and strong indi-
cation that the wide-spread perception of the two-proton
decay as “diproton” decay is to some extent misleading.
As it has already been mentioned the p-p FSI influences
the penetration strongly in the very special case when the
decay occurs from high-l orbitals (e.g. f2 in the case of
45Fe). Thus we should consider as not fully consistent the
attempts to explain two-proton decay results only by the
FSI in the p-p channel (e.g. Ref. [19]) as much stronger
decay mechanism is neglected in these studies.
From techical point of view the states considered in
this work belong to the most complicated cases. The
complication is due to the ratio between the decay energy
and the strength of the Coulomb interaction (it defines
the subbarrier penetration range to be considered dy-
namically). Thus the convergence effects demonstrated
in this work for 17Ne have the strongest character among
the systems studied in our previous works [4, 6, 7, 8].
Because of the relatively small Kmax = 12 used in the
previous works we have found an order of the magnitude
underestimation of the 17Ne(3/2−) width. For systems
like 48Ni — 66Kr the underestimation of widths in our
previous calculations is expected to be about factor of 2.
A much smaller effect is expected for lighter systems.
It was demonstrated in [22, 23] that the capture rate
for the 15O(2p,γ)17Ne reaction depends strongly on the
two-proton width of the first excited 3/2− state in 17Ne.
This width was calculated in Ref. [7] as 4.1× 10−16 MeV
(some confusion can be connected with misprint in Table
1.0 1.2 1.4 1.6 1.8 2.0
E2r  (MeV)
FIG. 20: Compatibility of the measured width of the 45Fe
with different assumptions about position E2r of the ground
state in the 44Mn subsystem and structure of 45Fe [weights
of the p2 configuration W (p2) are shown on the vertical
axis]. Central gray area corresponds to experimental width
uncertainty Γ = 2.85+0.65
−0.68 × 10
−19 MeV [12]. The light
gray area also takes into account the energy uncertainty
E3r = 1.154(16) MeV from [12]. The vertical dashed line
corresponds to E2r used in Fig. 19.
III of Ref. [7], see erratum). However, in the subsequent
work [21], providing very similar to [7] properties of the
17Ne WFs for the ground and the lowest excited states,
the width of the 3/2− state was found to be 3.6× 10−12
MeV. It was supposed in [21] that such a strong disagree-
ment is connected with poor subbarrier convergence of
the HH method in [7] compared to Adiabatic Faddeev
HH method of [21]. This point was further reiterated in
Ref. [41]. We can see now that this statement has a cer-
tain ground. However, the convergence problems of the
HH method are far insufficient to explain the huge dis-
agreement: the width increase found in this work is only
one order of magnitude. The most conservative upper
limit Γ ∼ 5× 10−14 MeV (see Table IV) was obtained in
a TFSI calculation neglecting p-p Coulomb interaction.
The other models systematically produce smaller values,
with realistic calculations confined to the narrow range
Γ ∼ (5 − 8) × 10−15 MeV (Table VI). Thus the value
Γ ∼ 4× 10−12 MeV obtained in paper [21] is very likely
to be erroneous. That result is possibly connected with a
simplistic quasiclassical procedure for width calculations
employed in this work.
VII. CONCLUSION.
In this work we derive the integral formula for the
widths of the resonances decaying into the three-body
channel for simplified Hamiltonians and discuss various
aspects of its practical application. The basic idea of the
derivation is not new, but for our specific purpose (pre-
cision solution of the multichannel problem) several im-
portant features of the scheme have not been discussed.
We can draw the following conclusions from our stud-
(i) We presume that HH convergence in realistic calcu-
lations should be largely the same as in the simplified
calculations as they imitate the most important dynamic
aspects of the realistic situation. The width values were
somewhat underestimated in our previous calculations.
The typical underestimation ranges from few percent to
tens of percent for “simple” potential and from tens of
percent to an order of magnitude in “complicated” cases
(potentials with repulsive core).
(ii) Convergence of the width calculations in the three-
body HH model can be drastically improved by a simple
adiabatic version of the Feshbach reduction procedure.
For a sufficiently large dynamic sector of the basis the
calculation with effective FR potential converges from
below and practically up to the exact value of the width.
For a small dynamic basis the FR calculation converges
towards a width value smaller than the exact value, but
still improves considerably the result.
(iii) The energy distributions obtained in the HH calcu-
lations are quite close to the exact ones. Convergence
with respect to basis size is achieved at relatively small
Kmax values. The disagreement with exact distributions
is not very significant and is likely to be connected not
with basis size convergence but, with radial extent of the
calculations [4].
(iv) Contributions of different decay mechanisms were
evaluated in the simplified models. We have found that
the “diproton” decay path is much less efficient than the
“sequential” decay path. This is true even in the model
calculations without nuclear FSIs (no specific dynamics),
which means that the “sequential” decay path is some-
how kinematically preferable.
(v) The value of the width for 17Ne 3/2− state was un-
derestimated in our previous works by around an order of
magnitude. A very conservative upper limit is obtained
in this work as Γ ∼ 5 × 10−14 MeV, while typical values
for realistic calculations are within the (5 − 8) × 10−15
MeV range. Thus the value Γ ∼ 4×10−12 MeV obtained
in papers [21, 41] is likely to be erroneous.
From this paper it is clear that the convergence issue
is sufficiently serious, and in some cases were underesti-
mated in our previous works. However, from practical
point of view, the convergence issue is not a principle
problem. For example the uncertain structure issues and
subsystem properties impose typically much larger uncer-
tainties for width values. For heavy two-proton emitters
(e.g. 45Fe) the positions of resonances in the subsystems
are experimentally quite uncertain. For a moment this is
the issue most limiting the precision of theoretical predic-
tions. We have demonstrated that with increased preci-
sion the experimental data impose strong restrictions on
theoretical calculations allowing to extract an important
structure information.
VIII. ACKNOWLEDGEMENTS
The authors are grateful to Prof. K. Langanke and
Prof. M. Ploszajczak for interesting discussions. The au-
thors acknowledge the financial support from the Royal
Swedish Academy of Science. LVG is supported INTAS
Grants 03-51-4496 and 05-1000008-8272, Russian RFBR
Grants Nos. 05-02-16404 and 05-02-17535 and Russian
Ministry of Industry and Science grant NS-8756.2006.2.
[1] V. I. Goldansky, Nucl. Phys. 19, 482 (1960).
[2] L. V. Grigorenko, R. C. Johnson, I. G. Mukha, I. J.
Thompson, and M. V. Zhukov, Phys. Rev. C 64 054002
(2001).
[3] L. V. Grigorenko, R. C. Johnson, I. G. Mukha, I. J.
Thompson, and M. V. Zhukov, Phys. Rev. Lett. 85, 22
(2000).
[4] L. V. Grigorenko, and M. V. Zhukov, Phys. Rev. 68 C,
054005 (2003).
[5] L. V. Grigorenko, I. G. Mukha, I. J. Thompson, and M.
V. Zhukov, Phys. Rev. Lett. 88, 042502 (2002).
[6] L. V. Grigorenko, R. C. Johnson, I. G. Mukha, I. J.
Thompson, and M. V. Zhukov, Eur. Phys. J. A 15 125
(2002).
[7] L. V. Grigorenko, I. G. Mukha, and M. V. Zhukov, Nucl.
Phys. A713, 372 (2003); erratum A740, 401 (2004).
[8] L. V. Grigorenko, I. G. Mukha, and M. V. Zhukov, Nucl.
Phys. A714, 425 (2003).
[9] M. Pfutzner, E. Badura, C. Bingham, B. Blank, M.
Chartier, H. Geissel, J. Giovinazzo, L. V. Grigorenko,
R. Grzywacz, M. Hellstrom, Z. Janas, J. Kurcewicz, A.
S. Lalleman, C. Mazzocchi, I. Mukha, G. Munzenberg,
C. Plettner, E. Roeckl, K. P. Rykaczewski, K. Schmidt,
R. S. Simon, M. Stanoiu, J.-C. Thomas, Eur. Phys. J. A
14, 279 (2002).
[10] J. Giovinazzo, B. Blank, M. Chartier, S. Czajkowski,
A. Fleury, M. J. Lopez Jimenez, M. S. Pravikoff, J.-
C. Thomas, F. de Oliveira Santos, M. Lewitowicz, V.
Maslov, M. Stanoiu, R. Grzywacz, M. Pfutzner, C.
Borcea, B. A. Brown, Phys. Rev. Lett. 89, 102501 (2002).
[11] B. Blank, A. Bey, G. Canchel, C. Dossat, A. Fleury, J.
Giovinazzo, I. Matea, N. Adimi, F. De Oliveira, I. Ste-
fan, G. Georgiev, S. Grevy, J. C. Thomas, C. Borcea,
D. Cortina, M. Caamano, M. Stanoiu, F. Aksouh, B. A.
Brown, F. C. Barker, and W. A. Richter, Phys. Rev. Lett.
94, 232501 (2005).
[12] C. Dossat, A. Bey, B. Blank, G. Canchel, A. Fleury, J.
Giovinazzo, I. Matea, F. de Oliveira Santos, G. Georgiev,
S. Grèvy, I. Stefan, J. C. Thomas, N. Adimi, C. Borcea,
D. Cortina Gil, M. Caamano, M. Stanoiu, F. Aksouh,
B. A. Brown, and L. V. Grigorenko, Phys. Rev. C 72,
054315 (2005).
[13] Ivan Mukha, Ernst Roeckl, Leonid Batist, Andrey
Blazhev, Joachim Döring, Hubert Grawe, Leonid Grig-
orenko, Mark Huyse, Zenon Janas, Reinhard Kirchner,
Marco La Commara, Chiara Mazzocchi, Sam L. Tabor,
Piet Van Duppen, Nature 439, 298 (2006).
[14] B. A. Brown, Phys. Rev. C 43, R1513 (1991); 44, 924(E)
(1991).
[15] W. Nazarewicz, J. Dobaczewski, T. R. Werner, J. A.
Maruhn, P.-G. Reinhard, K. Rutz, C. R. Chinn, A. S.
Umar, and M. R. Strayer, Phys. Rev. C 53, 740 (1996).
[16] F. C. Barker, Phys. Rev. C 63, 047303 (2001).
[17] F. C. Barker, Phys. Rev. C 66, 047603 (2002).
[18] F. C. Barker, Phys. Rev. C 68, 054602 (2003).
[19] B. A. Brown and F. C. Barker, Phys. Rev. C 67,
041304(R) (2003).
[20] J. Rotureau, J. Okolowicz, and M. Ploszajczak, Nucl.
Phys. A767, 13 (2006).
[21] E. Garrido, D. V. Fedorov, and A. S. Jensen, Nucl. Phys.
A733, 85 (2004).
[22] L. V. Grigorenko and M. V. Zhukov, Phys. Rev. C 72
015803 (2005).
[23] L. V. Grigorenko, K. Langanke, N. B. Shul’gina, and M.
V. Zhukov, Phys. Lett. B641, 254 (2006).
[24] J. Görres, M. Wiescher, and F.-K. Thielemann, Phys.
Rev. C 51, 392 (1995).
[25] K. Nomoto, F. Thielemann, and S. Miyaji, Astron. As-
trophys. 149, 239 (1985).
[26] K. Harada and E. A. Rauscher, Phys. Rev. 169, 818
(1968).
[27] S. G. Kadmensky and V. E. Kalechits, Yad. Fiz. 12, 70
(1970) [Sov. J. Nucl. Phys. 12, 37 (1971)].
[28] S. G. Kadmensky and V. I. Furman, Alpha-decay and
relevant reactions, Moscow, Energoatomizdat, 1985 (in
Russian).
[29] S. G. Kadmensky, Z. Phys. A312, 113 (1983).
[30] V. P. Bugrov and S. G. Kadmenskii, Sov. J. Nucl. Phys.
49, 967 (1989).
[31] C. N. Davids, P. J. Woods, D. Seweryniak, A. A. Son-
zogni, J. C. Batchelder, C. R. Bingham, T. Davinson, D.
J. Henderson, R. J. Irvine, G. L. Poli, J. Uusitalo, W. B.
Walters, Phys. Rev. Lett. 80, 1849 (1998).
[32] B. V. Danilin and M. V. Zhukov, Yad. Fiz. 56, 67 (1993)
[Phys. At. Nucl. 56, 460 (1993)].
[33] M. Abramowitz and I. Stegun, Handbook of Mathematical
Functions, p. 538.
[34] I. Stefan, F. de Oliveira Santos, M. G. Pellegriti, G. Du-
mitru, J. C. Angelique, M. Angelique, E. Berthoumieux,
A. Buta, R. Borcea, A. Coc, J. M. Daugas, T. Davin-
son, M. Fadil, S. Grevy, J. Kiener, A. Lefebvre-Schuhl,
M. Lenhardt, M. Lewitowicz, F. Negoita, D. Pantelica,
L. Perrot, O. Roig, M. G. Saint Laurent, I. Ray, O. Sor-
lin, M. Stanoiu, C. Stodel, V. Tatischeff, J. C. Thomas,
nucl-ex/0603020 v3.
[35] Ajzenberg-Selove, Nucl. Phys. A460, 1 (1986).
[36] B. V. Danilin et al., to be submitted.
[37] L. V. Grigorenko and M. V. Zhukov, to be submitted.
[38] L. V. Grigorenko, Yu. L. Parfenova, and M. V. Zhukov,
Phys. Rev. C 71, 051604(R) (2005).
[39] D. Gogny, P. Pires, and R. de Tourreil, Phys. Lett. B 32,
591 (1970).
[40] J. Görres, H. Herndl, I. J. Thompson, and M. Wiescher,
Phys. Rev. C 52, 2231 (1995).
[41] E. Garrido, D. V. Fedorov, A. S. Jensen, H.O.U. Fynbo,
Nucl. Phys. A748, 39 (2005).
[42] A realistic example of this situation is the case of “E1”
(coupled to the ground state by the E1 operator) con-
http://arxiv.org/abs/nucl-ex/0603020
tinuum considered in Ref. [23]. This case is relevant to
the low energy radiative capture reactions, important for
astrophysics, but deal with nonresonant continuum only.
[43] Interesting numerical stability test is a variation of
the “unphysical” (for OFSI approximation) potential
V nucy (Y ) in the auxiliary Hamiltonian (27). It can be used
for numerical tests of the procedure as it should not in-
fluence the width. Really, for variation of this potential
from weak attraction (we should not allow an unphysical
resonance into decay window) to strong repulsion (scale
of the variation is tens of MeV for potential with some
typical radius) the width is varied only within couple of
percents. This shows high numerical stability of the pro-
cedure.
[44] These values can be evaluated as typical nuclear ra-
dius for the system multiplied by
2: 3.53
2 ≈ 5 and
2 ≈ 6.
[45] The derivation of the flux here is given in a schematic
form. The complete proof is quite bulky to be provided
in the limited space. We would mention only that it is
easy to check directly that the derived expression for flux
preserves the continuum normalization.
[46] We demonstrate in paper [37] that a three-body width
should depend linearly on two-body widths of the sub-
systems and only very weakly on various geometrical fac-
tors. This is confirmed very well by direct calculations.
[47] The assumed nuclear structure is very simple, but the
diproton penetration process is treated exactly — with-
out assumptions about the emission of diproton from
some nuclear surface, which should be made in “R-
matrix” approach.
ABSTRACT
  Three-body decays of resonant states are studied using integral formulae for
decay widths. Theoretical approach with a simplified Hamiltonian allows
semianalytical treatment of the problem. The model is applied to decays of the
first excited $3/2^{-}$ state of $^{17}$Ne and the $3/2^{-}$ ground state of
$^{45}$Fe. The convergence of three-body hyperspherical model calculations to
the exact result for widths and energy distributions are studied. The
theoretical results for $^{17}$Ne and $^{45}$Fe decays are updated and
uncertainties of the derived values are discussed in detail. Correlations for
the decay of $^{17}$Ne $3/2^-$ state are also studied.

<|endoftext|><|startoftext|>
WHEN THE CRAMÉR-RAO INEQUALITY PROVIDES NO INFORMATION
STEVEN J. MILLER
Abstract. We investigate a one-parameter family of probability densities (related to the
Pareto distribution, which describes many natural phenomena) where the Cramér-Rao inequal-
ity provides no information.
1. Cramér-Rao Inequality
One of the most important problems in statistics is estimating a population parameter from a
finite sample. As there are often many different estimators, it is desirable to be able to compare
them and say in what sense one estimator is better than another. One common approach is to
take the unbiased estimator with smaller variance. For example, if X1, . . . , Xn are independent
random variables uniformly distributed on [0, θ], Yn = maxi Xi and X = (X1 + · · ·+ Xn)/n, then
Yn and 2X are both unbiased estimators of θ but the former has smaller variance than the
latter and therefore provides a tighter estimate.
Two natural questions are (1) which estimator has the minimum variance, and (2) what bounds
are available on the variance of an unbiased estimator? The first question is very hard to solve
in general. Progress towards its solution is given by the Cramér-Rao inequality, which provides
a lower bound for the variance of an unbiased estimator (and thus if we find an estimator that
achieves this, we can conclude that we have a minimum variance unbiased estimator).
Date: February 5, 2008.
2000 Mathematics Subject Classification. 62B10 (primary), 62F12, 60E05 (secondary).
Key words and phrases. Cramér-Rao Inequality, Pareto distribution, power law.
The author would like to thank Alan Landman for many enlightening conversations and the referees for helpful
comments. The author was partly supported by NSF grant DMS0600848.
http://arXiv.org/abs/0704.0923v1
2 STEVEN J. MILLER
Cramér-Rao Inequality: Let f(x; θ) be a probability density function with continuous parameter
θ. Let X1, . . . , Xn be independent random variables with density f(x; θ), and let Θ̂(X1, . . . , Xn)
be an unbiased estimator of θ. Assume that f(x; θ) satisfies two conditions:
(1) we have
· · ·
Θ̂(x1, . . . , xn)
f(xi; θ)dxi
· · ·
Θ̂(x1, . . . , xn)
i=1 f(xi; θ)
dx1 · · ·dxn;
(1.1)
(2) for each θ, the variance of Θ̂(X1, . . . , Xn) is finite.
var(Θ̂) ≥
∂ log f(x;θ)
)2], (1.2)
where E denotes the expected value with respect to the probability density function f(x; θ).
For a proof, see for example [CaBe]. The expected value in (1.2) is called the information
number or the Fisher information of the sample.
As variances are non-negative, the Cramér-Rao inequality (equation (1.2)) provides no useful
bounds on the variance of an unbiased estimator if the information is infinite, as in this case we
obtain the trivial bound that the variance is greater than or equal to zero. We find a simple
one-parameter family of probability density functions (related to the Pareto distribution) that
satisfy the conditions of the Cramér-Rao inequality, but the expectation (i.e., the information) is
infinite. Explicitly, our main result is
Theorem: Let
f(x; θ) =
−θ log−3 x if x ≥ e
0 otherwise,
(1.3)
WHEN THE CRAMÉR-RAO INEQUALITY PROVIDES NO INFORMATION 3
where aθ is chosen so that f(x; θ) is a probability density function. The information is infinite
when θ = 1. Equivalently, the Cramér-Rao inequality yields the trivial (and useless) bound that
Var(Θ̂) ≥ 0 for any unbiased estimator Θ̂ of θ when θ = 1.
In §2 we analyze the density in our theorem in great detail, deriving needed results about aθ
and its derivatives as well as discussing how f(x; θ) is related to important distributions used to
model many natural phenomena. We show the information is infinite when θ = 1 in §3, which
proves our theorem. We also discuss there properties of estimators for θ. While it is not clear
whether or not this distribution has an unbiased estimator, there is (at least for θ close to 1) an
asymptotically unbiased estimator rapidly converging to θ as the sample size tends to infinity. By
examining the proof of the Cramér-Rao inequality we see that we may weaken the assumption of
an unbiased estimator. While typically there is a cost in such a generalization, as our information
is infinite there is no cost in our case. We may therefore conclude that arguments such as those
used to prove the Cramér-Rao inequality cannot provide any information for estimators of θ from
this distribution.
2. An Almost Pareto Density
Consider
f(x; θ) =
aθ/(x
θ log3 x) if x ≥ e
0 otherwise,
(2.1)
where aθ is chosen so that f(x; θ) is a probability density function. Thus
xθ log3 x
= 1. (2.2)
We chose to have log3 x in the denominator to ensure that the above integral converges, as does
log x times the integrand; however, the expected value (in the expectation in (1.2)) will not
converge.
4 STEVEN J. MILLER
For example, 1/x logx diverges (its integral looks like log log x) but 1/x log2 x converges (its
integral looks like 1/ logx); see pages 62–63 of [Rud] for more on close sequences where one
converges but the other does not. This distribution is close to the Pareto distribution (or a power
law). Pareto distributions are very useful in describing many natural phenomena; see for example
[DM, Ne, NM]. The inclusion of the factor of log−3 x allows us to have the exponent of x in the
density function equal 1 and have the density function defined for arbitrarily large x; it is also
needed in order to apply the Dominated Convergence Theorem to justify some of the arguments
below. If we remove the logarithmic factors then we obtain a probability distribution only if the
density vanishes for large x. As log
x is a very slowly varying function, our distribution f(x; θ)
may be of use in modeling data from an unbounded distribution where one wants to allow a
power law with exponent 1, but cannot as the resulting probability integral would diverge. Such
a situation occurs frequently in the Benford Law literature; see [Hi, Rai] for more details.
We study the variance bounds for unbiased estimators Θ̂ of θ, and in particular we show that
when θ = 1 then the Cramér-Rao inequality yields a useless bound.
Note that it is not uncommon for the variance of an unbiased estimator to depend on the
value of the parameter being estimated. For example, consider again the uniform distribution on
[0, θ]. Let X denote the sample mean of n independent observations, and Yn = max1≤i≤n Xi be
the largest observation. The expected value of 2X and n+1
Yn are both θ (implying each is an
unbiased estimator for θ); however, Var(2X) = θ2/3n and Var(n+1
Yn) = θ
2/n(n+1) both depend
on θ, the parameter being estimated (see, for example, page 324 of [MM] for these calculations).
Lemma 2.1. As a function of θ ∈ [1,∞), aθ is a strictly increasing function and a1 = 2. It has
a one-sided derivative at θ = 1, and daθ
∈ (0,∞).
Proof. We have
xθ log3 x
= 1. (2.3)
WHEN THE CRAMÉR-RAO INEQUALITY PROVIDES NO INFORMATION 5
When θ = 1 we have
x log3 x
, (2.4)
which is clearly positive and finite. In fact, a1 = 2 because the integral is
x log3 x
log−3 x
d log x
2 log2 x
∣∣∣∣∣
; (2.5)
though all we need below is that a1 is finite and non-zero, we have chosen to start integrating at
e to make a1 easy to compute.
It is clear that aθ is strictly increasing with θ, as the integral in (2.4) is strictly decreasing with
increasing θ (because the integrand is decreasing with increasing θ).
We are left with determining the one-sided derivative of aθ at θ = 1, as the derivative at any
other point is handled similarly (but with easier convergence arguments). It is technically easier
to study the derivative of 1/aθ, as
(2.6)
xθ log
. (2.7)
The reason we consider the derivative of 1/aθ is that this avoids having to take the derivative of
the reciprocals of integrals. As a1 is finite and non-zero, it is easy to pass to
|θ=1. Thus we
= lim
x1+h log3 x
x log3 x
= lim
1 − xh
x log3 x
. (2.8)
We want to interchange the integration with respect to x and the limit with respect to h above.
This interchange is permissible by the Dominated Convergence Theorem (see Appendix A for
details of the justification). Note
1 − xh
= − log x; (2.9)
6 STEVEN J. MILLER
one way to see this is to use the limit of a product is the product of the limits, and then use
L’Hospital’s rule, writing xh as eh log x. Therefore
x log2 x
; (2.10)
as this is finite and non-zero, this completes the proof and shows daθ
|θ=1 ∈ (0,∞). �
Remark 2.2. We see now why we chose f(x; θ) = aθ/x
θ log3 x instead of f(x; θ) = aθ/x
θ log2 x.
If we only had two factors of log x in the denominator, then the one-sided derivative of aθ at θ = 1
would be infinite.
Remark 2.3. Though the actual value of daθ
|θ=1 does not matter, we can compute it quite
easily. By (2.10) we have
x log
d log x
log x
= −1. (2.11)
Thus by (2.6), and the fact that a1 = 2 (Lemma 2.1), we have
= −a21 ·
= 4. (2.12)
3. Computing the Information
We now compute the expected value, E
∂ log f(x;θ)
; showing it is infinite when θ = 1
completes the proof of our main result. Note
log f(x; θ) = log aθ − θ log x + log log
∂ log f(x; θ)
− log x. (3.1)
WHEN THE CRAMÉR-RAO INEQUALITY PROVIDES NO INFORMATION 7
By Lemma 2.1 we know that daθ
is finite for each θ ≥ 1. Thus
∂ log f(x; θ)
− log x
− log x
xθ log3 x
. (3.2)
If θ > 1 then the expectation is finite and non-zero. We are left with the interesting case when
θ = 1. As daθ
|θ=1 is finite and non-zero, for x sufficiently large (say x ≥ x1 for some x1, though
by Remark 2.3 we see that we may take any x1 ≥ e
4) we have
∣∣∣∣ ≤
log x
. (3.3)
As a1 = 2, we have
∂ log f(x; θ)
)2] ∣∣∣∣∣
log x
x log
2x logx
log−1 x
d log x
log log x
= ∞. (3.4)
Thus the expectation is infinite. Let Θ̂ be any unbiased estimator of θ. If θ = 1 then the
Cramér-Rao inequality gives
var(Θ̂) ≥ 0, (3.5)
which provides no information as variances are always non-negative. This completes the proof of
our theorem. �
We now discuss estimators for θ for our distribution f(x; θ). If X1, . . . , Xn are n independent
random variables with common distribution f(x; θ), then as n → ∞ the sample median converges
to the population median µ̃θ (if n = 2m + 1 then the sample median converges to being normally
distributed with median µ̃θ and variance 1/8mf(µ̃θ; θ)
2; see for example Theorem 8.17 of [MM]).
8 STEVEN J. MILLER
1.1 1.2 1.3 1.4
Figure 1. Plot of the median µ̃θ of f(x; θ) as a function of θ (µ̃1 = e
For θ close to 1 we see in Figure 1 that the median µ̃θ of f(x; θ) is strictly decreasing with
increasing θ, which implies that there is an inverse function g such that g(µ̃θ) = θ. We obtain an
estimator to θ by applying g to the sample median. This estimator is a consistent estimator (as
the sample size tends to infinity it will tend to θ) and should be asymptotically unbiased.
The proof of the Cramér-Rao inequality starts with
0 = E
· · ·
Θ̂(x1, . . . , xn) − θ
h(x1; θ) · · ·h(xn; θ)dx1 · · ·dxn
, (3.6)
where Θ̂(x1, . . . , xn) is an unbiased estimator of θ depending only on the sample values x1, . . . , xn.
In our case (when each h(x; θ) = f(x; θ)) we may not have an unbiased estimator. If we denote
this expectation by F(θ), for our investigations all that we require is that dF(θ)/dθ is finite (which
is easy to show). Going through the proof of the Cramér-Rao inequality shows that the effect
of this is to replace the factor of 1 in (1.2) with (1 + dF(θ)/dθ)2; thus the generalization of the
Cramér-Rao inequality for our estimator is
var(Θ̂) ≥
dF(θ)
∂ log f(x; θ)
. (3.7)
As our variance is infinite for θ = 1 we see that, no matter what ‘nice’ estimator we use, we will
not obtain any useful information from such arguments.
WHEN THE CRAMÉR-RAO INEQUALITY PROVIDES NO INFORMATION 9
Appendix A. Applying the Dominated Convergence Theorem
We justify applying the Dominated Convergence Theorem in the proof of Lemma 2.1. See, for
example, [SS] for the conditions and a proof of the Dominated Convergence Theorem.
Lemma A.1. For each fixed h > 0 and any x ≥ e, we have
1 − xh
∣∣∣∣ ≤ e log x, (A.1)
and e log x
x log3 x
is positive and integrable, and dominates each 1−x
x log3 x
Proof. We first prove (A.1). As x ≥ e and h > 0, note xh ≥ 1. Consider the case of 1/h ≤ log x.
Since |1 − xh| < 1 + xh ≤ 2xh, we have
|1 − xh|
≤ 2 log x. (A.2)
We are left with the case of 1/h > log x, or h logx < 1. We have
|1 − xh| = |1 − eh log x|
∣∣∣∣∣1 −
(h log x)n
∣∣∣∣∣
= h log x
(h log x)n−1
< h log x
(h log x)n−1
(n − 1)!
= h logx · eh log x. (A.3)
This, combined with h log x < 1 and xh ≥ 1 yields
|1 − xh|
eh log x
= e log x. (A.4)
It is clear that log x
x log3 x
is positive and integrable, and by L’Hospital’s rule (see (2.9)) we have that
1 − xh
x log3 x
x log2 x
. (A.5)
Thus the Dominated Convergence Theorem implies that
1 − xh
x log3 x
x log2 x
= −1 (A.6)
10 STEVEN J. MILLER
(the last equality is derived in Remark 2.3). �
References
[CaBe] G. Casella and R. Berger, Statistical Inference, 2nd edition, Duxbury Advanced Series, Pacific Grove,
CA, 2002.
[DM] D. Devoto and S. Martinez, Truncated Pareto Law and oresize distribution of ground rocks, Mathematical
Geology 30 (1998), no. 6, 661–673.
[Hi] T. Hill, A statistical derivation of the significant-digit law, Statistical Science 10 (1996), 354–363.
[MM] I. Miller and M. Miller, John E. Freund’s Mathematical Statistics with Applications, seventh edition,
Prentice Hall, 2004.
[Ne] M. E. J. Newman, Power laws, Pareto distributions and Zipfs law, Contemporary Physics 46 (2005),
no. 5, 323-351.
[NM] M. Nigrini and S. J. Miller, Benford’s Law applied to hydrology data – results and relevance to other
geophysical data, preprint.
[Rai] R. A. Raimi, The first digit problem, Amer. Math. Monthly 83 (1976), no. 7, 521–538.
[Rud] W. Rudin, Principles of Mathematical Analysis, third edition, International Series in Pure and Applied
Mathematics, McGraw-Hill Inc., New York, 1976.
[SS] E. Stein and R. Shakarchi, Real Analysis: Measure Theory, Integration, and Hilbert Spaces, Princeton
University Press, Princeton, NJ, 2005.
Department of Mathematics, Brown University, 151 Thayer Street, Providence, RI 02912
E-mail address: sjmiller@math.brown.edu
	1. Cramér-Rao Inequality
	2. An Almost Pareto Density
	3. Computing the Information
	Appendix A. Applying the Dominated Convergence Theorem
	References
ABSTRACT
  We investigate a one-parameter family of probability densities (related to
the Pareto distribution, which describes many natural phenomena) where the
Cramer-Rao inequality provides no information.

<|endoftext|><|startoftext|>
Introduction to Number Theory, Chelsea Publishing Company, New York, 1981.
[Od1] A. Odlyzko, On the distribution of spacings between zeros of the zeta function, Math. Comp. 48 (1987), no.
177, 273–308.
[Od2] A. Odlyzko, The 1022-nd zero of the Riemann zeta function, Proc. Conference on Dynamical, Spectral and
Arithmetic Zeta-Functions, M. van Frankenhuysen and M. L. Lapidus, eds., Amer. Math. Soc., Contempo-
rary Math. series, 2001, http://www.research.att.com/∼amo/doc/zeta.html.
[OS] A. E. Özlük and C. Snyder, On the distribution of the nontrivial zeros of quadratic L-functions close to the
real axis, Acta Arith. 91 (1999), no. 3, 209–228.
[RR1] G. Ricotta and E. Royer, Statistics for low-lying zeros of symmetric power L-functions in the level aspect,
preprint. http://arxiv.org/abs/math/0703760
[RR2] G. Ricotta and E. Royer, Lower order terms for the one-level density of symmetric power L-functions in the
level aspect, preprint. http://arxiv.org/pdf/0806.2908
[RoSi] M. Rosen and J. Silverman, On the rank of an elliptic surface, Invent. Math. 133 (1998), 43–67.
[RoSc] J. B. Rosser and L. Schoenfeld, Approximate formulas for some functions of prime numbers, Illinois J.
Math. 6 (1962) 64–94.
[Ro] E. Royer, Petits zéros de fonctions L de formes modulaires, Acta Arith. 99 (2001), 47–172.
[Rub] M. Rubinstein, Low-lying zeros of L-functions and random matrix theory, Duke Math. J. 109 (2001), no. 1,
147–181.
[RS] Z. Rudnick and P. Sarnak, Zeros of principal L-functions and random matrix theory, Duke Math. J. 81
(1996), 269–322.
[Si1] J. Silverman, The Arithmetic of Elliptic Curves, Graduate Texts in Mathematics 106, Springer-Verlag,
Berlin - New York, 1986.
[Si2] J. Silverman, Advanced Topics in the Arithmetic of Elliptic Curves, Graduate Texts in Mathematics 151,
Springer-Verlag, Berlin - New York, 1994.
[Sl] N. Sloane, On-Line Encyclopedia of Integer Sequences,
http://www.research.att.com/∼njas/sequences/Seis.html.
[Tay] R. Taylor, Automorphy for some ℓ-adic lifts of automorphic mod l representations. II, preprint.
http://www.math.harvard.edu/∼rtaylor/twugk6.ps
[Yo1] M. Young, Lower-order terms of the 1-level density of families of elliptic curves, Internat. Math. Res.
Notices 2005, no. 10, 587–633.
[Yo2] M. Young, Low-lying zeros of families of elliptic curves, J. Amer. Math. Soc. 19 (2006), no. 1, 205–250.
E-mail address: Steven.J.Miller@williams.edu
DEPARTMENT OF MATHEMATICS AND STATISTICS, WILLIAMS COLLEGE, WILLIAMSTOWN, MA 01267
	1. Introduction
	2. Explicit Formulas
	2.1. Standard Explicit Formula
	2.2. The Alternate Explicit Formula
	2.3. Formulas for the r 3 Terms
	3. Families of cuspidal newforms
	3.1. Weights
	3.2. Results
	4. Preliminaries for Families of Elliptic Curves
	4.1. Notation
	4.2. Sieving
	4.3. Moments of the Fourier Coefficients and the Explicit Formula
	5. Examples: One-parameter families of elliptic curves over Q(T)
	5.1. CM Example: The family y2 = x3 + B (6T+1) over Q(T)
	5.2. CM Example: The family y2 = x3 -B(36T+6)(36T+5)x over Q(T)
	5.3. Non-CM Example: The family y2 = x3 -3x + 12T over Q(T)
	Appendix A. Evaluation of SA(F) for the family of cuspidal newforms
	Appendix B. Evaluation of Ar,F for families of elliptic curves
	B.1. The family y2 = x3 + B (6T+1) over Q(T)
	B.2. The family y2 = x3 -(36T+6)(36T+5)x over Q(T)
	B.3. The family y2 = x3 -3x+12T over Q(T)
	References
ABSTRACT
  The Katz-Sarnak density conjecture states that, in the limit as the
conductors tend to infinity, the behavior of normalized zeros near the central
point of families of L-functions agree with the N -> oo scaling limits of
eigenvalues near 1 of subgroups of U(N). Evidence for this has been found for
many families by studying the n-level densities; for suitably restricted test
functions the main terms agree with random matrix theory. In particular, all
one-parameter families of elliptic curves with rank r over Q(T) and the same
distribution of signs of functional equations have the same limiting behavior.
We break this universality and find family dependent lower order correction
terms in many cases; these lower order terms have applications ranging from
excess rank to modeling the behavior of zeros near the central point, and
depend on the arithmetic of the family. We derive an alternate form of the
explicit formula for GL(2) L-functions which simplifies comparisons, replacing
sums over powers of Satake parameters by sums of the moments of the Fourier
coefficients lambda_f(p). Our formula highlights the differences that we expect
to exist from families whose Fourier coefficients obey different laws (for
example, we expect Sato-Tate to hold only for non-CM families of elliptic
curves). Further, by the work of Rosen and Silverman we expect lower order
biases to the Fourier coefficients in families of elliptic curves with rank
over Q(T); these biases can be seen in our expansions. We analyze several
families of elliptic curves and see different lower order corrections,
depending on whether or not the family has complex multiplication, a forced
torsion point, or non-zero rank over Q(T).

<|endoftext|><|startoftext|>
Spinor Dynamics in an Antiferromagnetic Spin-1 Condensate
A. T. Black, E. Gomez, L. D. Turner, S. Jung, and P. D. Lett
Joint Quantum Institute, University of Maryland and
National Institute of Standards and Technology, Gaithersburg, Maryland 20899
(Dated: October 30, 2018)
We observe coherent spin oscillations in an antiferromagnetic spin-1 Bose-Einstein condensate
of sodium. The variation of the spin oscillations with magnetic field shows a clear signature of
nonlinearity, in agreement with theory, which also predicts anharmonic oscillations near a critical
magnetic field. Measurements of the magnetic phase diagram agree with predictions made in the
approximation of a single spatial mode. The oscillation period yields the best measurement to date
of the sodium spin-dependent interaction coefficient, determining that the difference between the
sodium spin-dependent s-wave scattering lengths af=2−af=0 is 2.47 ± 0.27 Bohr radii.
PACS numbers: 03.75.Mn, 32.80.Cy, 32.80.Pj
Atomic collisions are essential to the formation of Bose-
Einstein condensates (BEC), redistributing energy dur-
ing evaporative cooling. Collisions can be coherent and
reversible, leading to diverse phenomena such as super-
fluidity [1] and reversible formation of molecules [2] in
BECs with a single internal state. When internal de-
grees of freedom are included (as in spinor condensates),
coherent collisions lead to rich dynamics [3, 4] in which
the population oscillates between different Zeeman sub-
levels. We present the first observation of coherent spin
oscillations in a spin-1 condensate with antiferromagnetic
interactions (in which the interaction energy of colliding
spin-aligned atoms is higher than that of spin-antialigned
atoms.)
Spinor condensates have been a fertile area for the-
oretical studies of dynamics [5, 6, 7, 8], ground state
structures [9, 10], and domain formation [11]. Extensive
experiments on the ferromagnetic F=1 hyperfine ground
state of 87Rb have demonstrated spin oscillations and co-
herent control of spinor dynamics [3, 12]. Observation of
domain formation in 23Na demonstrated the antiferro-
magnetic nature of the F=1 ground state [13] and de-
tected tunneling across spin domains [14]; no spin oscilla-
tions have been reported in sodium BEC until now. The
F=2 state of 87Rb is thought to be antiferromagnetic, but
a cyclic phase is possible [15, 16]. Experiments on this
state have demonstrated that the amplitude and period
of spin oscillations can be controlled magnetically [4].
At low magnetic fields, spin interactions dominate the
dynamics. The different sign of the spin dependent in-
teraction causes the antiferromagnetic F=1 case to differ
from the ferromagnetic one both in the structure of the
ground-state magnetic phase diagram and in the spinor
dynamics. Both cases can exhibit a regime of slow, an-
harmonic spin oscillations; however, this behavior is pre-
dicted over a wide range of initial conditions only in the
antiferromagnetic case [8]. The spin interaction energies
in sodium are more than an order of magnitude larger
than in 87Rb F =1 for a given condensate density [3],
facilitating studies of spinor dynamics.
The dynamics of the spin-1 system are much simpler
than the spin-2 case [4, 15, 16], having a well-developed
analytic solution [8]. This solution predicts a divergence
in the oscillation period (not to be confused with the
amplitude peak observed in 87Rb F=2 [4] oscillations).
This Letter reports the first measurement of the
ground state magnetic phase diagram of a spinor con-
densate, and the first experimental study of coherent
spinor dynamics in an antiferromagnetic spin-1 conden-
sate. Both show good agreement with the single-spatial-
mode theory [10]. To study the dynamics, we displace
the spinor from its ground state, observing the resulting
oscillations of the Zeeman populations as a function of
applied magnetic field B. At low field the oscillation pe-
riod is constant, at high field it decreases rapidly, and at a
critical field it displays a resonance-like feature, all as pre-
dicted by theory [8]. These measurements have allowed
us to improve by a factor of three the determination of
the sodium F = 1 spin-dependent interaction strength,
which is proportional to the difference af=2 − af=0 in
the spin-dependent scattering lengths.
The state of the condensate in the single-mode ap-
proximation (SMA) is written as the product φ(r)ζ of a
spin-independent spatial wavefunction φ(r) and a spinor
ζ = (
eiθ− ,
iθ0 ,
iθ+). We use ρ
, ρ0, and
ρ+ (θ−, θ0, and θ+) to denote fractional populations
(phases) of the Zeeman sublevels mF = −1, 0, and 1,
so that
ρi=1. The spinor’s ground state and its non-
linear dynamics may be derived from the spin-dependent
part of the Hamiltonian in the single-mode and mean-
field approximations, subject to the constraints that to-
tal atom number N and magnetization m≡ ρ+−ρ− are
conserved [8]. The “classical” spinor Hamiltonian E is a
function of only two canonical variables: the fractional
population ρ0 and the relative phase θ ≡ θ+ + θ− − 2θ0.
It is given by
E = δ(1−ρ0) + cρ0
(1−ρ0) +
(1−ρ0)2−m2 cos θ
where δ = h × (2.77× 1010Hz/T2)B2 is the quadratic
http://arxiv.org/abs/0704.0925v2
Zeeman shift [8] with h the Planck constant. (The linear
Zeeman shift has no effect on the dynamics.) The spin-
dependent interaction energy is c= c2 〈n〉, where 〈n〉 is
the mean particle density of the condensate and
(af=2 − af=0) (2)
is the spin-dependent interaction coefficient [8, 17]. Here
M is the atomic mass. af=2 and af=0 are the s-wave
scattering lengths for a colliding pair of atoms of total
spin f = 2 and f = 0, respectively; Bose symmetry en-
sures there are no s-wave collisions with total spin of
1. If c2 is positive (negative), the system is antiferro-
magnetic (ferromagnetic). The spinor ground state and
spinor dynamics are determined by Eq. (1).
The apparatus is similar to that described previ-
ously [18]. We produce a BEC of 105 23Na atoms in
the F=1 state, with an unobservably small thermal frac-
tion, in a crossed-beam 1070nm optical dipole trap. The
trap beams lie in the horizontal xy plane, so that the trap
curvature is nearly twice as large along the vertical z axis
as in the xy plane. By applying a small magnetic field
gradient with the MOT coils (less than 10mT/m) during
the 9 s of forced evaporation, we fully polarize the BEC:
all atoms are in mF =+1. Conservation of spin angular
momentum ensures that the magnetization remains con-
stant once evaporation has ceased; a state with ρ+ = 1
persists for the lifetime of the condensate, about 14 s.
We then turn off the gradient field and adiabatically
apply a bias field B of 4 to 51µT along x̂, leaving the
BEC in the ρ+ = 1 state. To prepare an initial state,
we apply an rf field resonant with the linear Zeeman
splitting; typically the frequency is tens to hundreds of
kilohertz. Rabi flopping in the three-level system is ob-
served [19], and controlling the amplitude and duration
of the pulse can produce any desired magnetization m,
which also determines the population ρ0. The flopping
time is less than 50µs, much shorter than the character-
istic times for spin evolution governed by Eq. (1). Using
this Zeeman transition avoids populating the F=2 state,
thus avoiding inelastic losses, which are much greater for
23Na than for 87Rb.
We measure the populations ρi of atoms in the three
Zeeman sublevels by Stern-Gerlach separation and ab-
sorption imaging [20]. The Stern-Gerlach gradient is par-
allel to the bias field ~B, while the imaging beam propa-
gates in the ẑ direction. The phase θ is not measured.
To measure the ground state population distribution
as a function of magnetization and magnetic field, we
first set the magnetization using the rf pulse. We then
ramp the field to a desired final value over 1 s, wait 3 s
for equilibration, and measure the populations as above.
Figure 1(b) displays the measured ground-state mag-
netic phase diagram. The theoretical prediction in
Fig. 1(a) is the population ρ0 that minimizes the energy,
B (µT)
B (µT)
FIG. 1: a) Theoretical prediction of the ground-state frac-
tional population ρ0 as a function of magnetization m and
applied magnetic field B, assuming a spin-dependent interac-
tion energy c=h×20.5 Hz. The thick line lying in the ρ0 = 0
plane indicates the boundary between the ρ0 = 0 and the
ρ0 > 0 regions. b) Experimental measurement. The surface
plot is produced by interpolation of data points.
Eq.(1). Such minima always occur at θ = π for antifer-
romagnetic interactions. The measurements agree well
with the prediction, which is made for spin interaction
energy c= h×20.5Hz (determined by spin dynamics as
described below).
The first term of Eq. (1) depends on the external mag-
netic field and tends to maximize the equilibrium ρ0
population. The second, spin dependent, term has the
same sign as c2 and in the antiferromagnetic case tends
to minimize the equilibrium ρ0 population. The phase
transition indicated by the thick line in Fig. 1a arises
at the point where these opposing tendencies cancel for
ρ0 = 0. Along the transition contour, ρ0 rapidly falls
to zero. By contrast, the ferromagnetic phase diagram
has ρ0 = 0 only at m = 1. In the region B < 15µT
and m > 0.6, there should be virtually no population in
mF = 0 for antiferromagnetic interactions, and popula-
tions up to ρ0 = 0.34 for ferromagnetic interactions (as-
suming the same magnitude of c). For our equilibrium
data, the reduced χ2 with respect to the antiferromag-
netic (ferromagnetic) prediction in this region is 2 (20).
This demonstrates that sodium F =1 spin interactions
are antiferromagnetic, as previously shown by the misci-
bility of spin domains formed in a quasi-one-dimensional
trap [13].
Across most of the phase diagram, the scatter in the
population is consistent with measured shot-to-shot vari-
ation in atom number. This variation is 20%, implying
an 8% variation in the mean condensate density accord-
ing to Thomas-Fermi theory. The variance of results is
not due to the magnetic field (calibrated to a precision of
0.2µT), nor to residual field variations across the BEC
(less than 250pT). Uncertainties in setting the magne-
tization are obviated, as the magnetization is measured
for each point as the difference in fractional populations
m= ρ+−ρ−. Discrepancies between theory and exper-
iment at low magnetic fields may be attributed to the
field dependence of the equilibration time. We observe
equilibration times (see below) ranging from 200ms at
high fields to several seconds at low fields, by which time
atom loss is substantial.
If the spinor is driven away from equilibrium, the full
coherent dynamics of the spinor system Eq.(1) are re-
vealed. We initiate the spinor dynamics with the rf tran-
sition described above, but now look at the evolution over
millisecond timescales.
The spinor dynamics are described by the Hamilton
equations for Eq. (1) [8]:
ρ̇0 = −
and θ̇ =
The system is closely related to the double-well “bosonic
Josephson junction” (BJJ) [21, 22] and exhibits a regime
of small, harmonic oscillations and, near a critical field
Bc, is predicted to display large, anharmonic oscilla-
tions. At Bc the period diverges (where δ(Bc) = c[(1 −
ρ0) +
(1 − ρ0)2 −m2 cos θ], with ρ0 and θ taken at
t = 0) [8]. The critical value corresponds to a transi-
tion from periodic-phase solutions of Eq. (3) to running-
phase solutions. At the critical value it is predicted that
the population is trapped in a spin state with ρ0=0. This
phenomenon is related to the macroscopic quantum self-
trapping that has been observed in the BJJ [22]. How-
ever, very small fluctuations in field or density will drive
ρ0 away from 0. Observing a ten-fold increase in the pe-
riod above its zero-field value would require a technically
challenging magnetic field stability of better than 100 fT.
Figure 2 plots the period and amplitude of oscillation
as a function of magnetic field. An example of the os-
cillating populations is shown in the inset. The spinor
condensate is prepared with initial ρ0=0.50± 0.01 1 and
m = 0.00 ± 0.02, and a plot of ρ0 versus time is taken
at each field value. Qualitatively, the period is nearly
independent of magnetic field at low fields, with a small
peak at a critical value Bc = 28µT, followed by a steep
decline in period. The amplitude likewise shows a max-
imum at Bc. Oscillations are visible over durations of
40ms to 300ms. Beyond these times, the amplitude of
the shot-to-shot fluctuations in ρ0 is roughly equal to
the harmonic amplitude. This indicates dephasing due
to shot-to-shot variation in oscillation frequency, proba-
bly associated with the variations in magnetic field and
condensate density, rather than any fundamental damp-
ing process. At even longer times, we observe damping
and equilibration to a new constant ρ0; the damping time
varies with magnetic field from 200ms to 5 s.
For the theoretical prediction in Fig. 2, the initial value
of ρ0 and m are obtained experimentally. We treat only
c and θ(t = 0) as free parameters; c is also predicted
by prior determinations of c2 and our knowledge of the
condensate density. The initial relative phase is not the
equilibrium value θ=π, due to our rf preparation. For a
three-level system driven in resonance with both transi-
tions, the relative phase is θ=0 at all times during the rf
transition, as we derive from Ref. [19]. Small deviations
from initial θ=0 could be caused by an unequal splitting
between the levels, from e.g., the quadratic Zeeman shift.
The best fit to the data in Fig. 2a and b is obtained
by using c=h×(21± 2)Hz and θ(t=0)=0.5± 0.3 (with
no other free parameters). Away from the critical field
Bc, agreement with theory is good. The fitted value of
c implies that Bc is 27µT, in reasonable agreement with
the apparent peak observed at 28µT. Our ability to ob-
serve strong variations in period near Bc is limited by
density fluctuations (8%) and magnetic field fluctuations
(0.2µT). Near Bc, typically only one cycle is visible be-
fore dephasing is complete. Such rapid dephasing can,
itself, be taken as evidence of a strongly B-dependent
period, as expected near the critical field.
To include the known fluctuations in density and mag-
netic field in our model, we perform a Monte Carlo sim-
ulation of the expected signal, based on measured, nor-
mally distributed shot-to-shot variations in values of c,
δ, m and ρ0(t = 0). At each value of B in Fig. 2, we
generate 80 simulated time traces, with each point in the
time trace determined from Eq. 3. We fit the simulated
traces using sine waves and record the mean and stan-
dard deviation of the amplitude and period of the fits.
The results (shaded regions in Fig. 2) show a less sharp
peak in the period. The smoothing of the peak at Bc is
consistent with our data.
1 All uncertainties in this paper are one standard deviation com-
bined statistical and systematic uncertainties
Magnetic Field (µT)
0 10 20 30 40 50
Time (ms)
0 40 80
FIG. 2: Period (a) and amplitude (b) of spin oscillations as a
function of applied magnetic field, following a sudden change
in spin state. The solid lines are theoretical predictions from
solving Eq. (3). The theoretical prediction of the period goes
to infinity at about 27µT. The shaded regions are ±1 stan-
dard deviation about the mean values predicted by the Monte
Carlo simulation. Inset: Fractional Zeeman population (solid
dots) and magnetization (open circles) as a function of time
after the spinor condensate is driven to ρ0 = 0.5, m = 0.
B=6.1µT. The solid line is a sinusoidal fit.
It is clear in Fig. 2 that the oscillation period is insensi-
tive to the magnetic field at low values of the field. In this
regime, the period is sensitive only to the spin interaction
c2 and the density of the condensate 〈n〉. Measuring this
period allows us to determine the difference in scattering
lengths af=2 − af=0. The trace inset in Fig. 2 was taken
in this regime, at a magnetic field of B = 6.1µT, and
shows harmonic oscillations with period 24.6 ± 0.3ms.
Here the predicted period dependence on magnetic field,
14µs/µT, is indeed weak and the oscillations dephase
only slightly over the duration shown. Using this mea-
surement of the period (in which much more data was
taken than for each point making up Fig. 2 (a) and (b)),
and including uncertainties in initial θ, ρ0, and m, we
obtain the spin interaction energy c=h× (20.5± 1.3)Hz.
Finding af=2 − af=0 requires a careful measurement
of the condensate density. We take absorption images
with various expansion times to find the mean field en-
ergy. The images yield the column density in the xy
plane, and the distribution in the z direction can be
inferred from our trap beam geometry. We find that
the mean density of the condensate under the conditions
of the inset to Fig. 2 is 〈n〉 = 8.6 ± 0.9× 1013 cm−3.
From this we calculate af=2 − af=0 = (2.47 ± 0.27)a0,
where a0 = 52.9pm is the Bohr radius. This is consis-
tent with a previous measurement, from spin domain
structure, of af=2 − af=0 = (3.5 ± 1.5)a0 [13] and is
smaller than the difference between scattering lengths
determined from molecular levels, af=2 = (55.1 ± 1.6)a0
and af=0=(50.0± 1.6)a0 [23]. A multichannel quantum
defect theory calculation gives af=2 − af=0 = 5.7a0 [24].
Finally, we consider the validity of the spatial single-
mode approximation. The SMA was clearly violated in
previous work on 23Na [13] and 87Rb [3] F =1 spinor
condensates where spatial domains formed. Spatial de-
grees of freedom decouple from spinor dynamics when the
spin healing length ξs=2π~/
2m|c2|n is larger than the
condensate. From our density measurements we find typ-
ical Thomas-Fermi radii of (9.4, 6.7, 5.7)µm. The spin
healing length, based on our measurements of c, is typi-
cally ξs =17µm. We therefore operate within the range
of validity of the SMA. Furthermore, Stern-Gerlach ab-
sorption images show three components with identical
spatial distributions after ballistic expansion, indicating
that domain formation does not occur.
In conclusion, we have studied both the ground state
and the spinor dynamics of a sodium F=1 spinor conden-
sate. Both agree well with theoretical predictions in the
SMA. By measuring the spin oscillation frequency at low
magnetic field, we have determined the difference in spin-
dependent scattering lengths. The observed peak in oscil-
lation period as a function of magnetic field demonstrates
that the spinor dynamics are fundamentally nonlinear.
It also suggests the existence of the predicted regime of
highly anharmonic spin oscillations at the center of this
peak, which should be experimentally accessible with suf-
ficient control of condensate density and magnetic field.
Observation of anharmonic oscillations, as well as popu-
lation trapping and spin squeezing effects, could be aided
by a minimally destructive measurement of Zeeman pop-
ulations [25] to reduce the effects of magnetic field drifts
and shot-to-shot density variations.
We thank W. Phillips for helpful discussions, and ONR
and NASA for support. ATB acknowledges an NRC
Fellowship. LDT acknowledges an Australian-American
Fulbright Fellowship.
[1] M. R. Matthews, B. P. Anderson, P. C. Haljan, D. S.
Hall, C. E. Wieman, and E. A. Cornell, Phys. Rev. Lett.
83, 2498 (1999).
[2] E. A. Donley, N. R. Claussen, S. T. Thompson, and C. E.
Wieman, Nature (London) 417, 529 (2002).
[3] M.-S. Chang, Q. Qin, W. Zhang, L. You, and M. S. Chap-
man, Nature Physics 1, 111 (2005).
[4] J. Kronjäger, C. Becker, P. Navez, K. Bongs, and K. Sen-
gstock, Phys. Rev. Lett. 97, 110404 (2006).
[5] M. Moreno-Cardoner, J. Mur-Petit, M. Guilleumas,
A. Polls, A. Sanpera, and M. Lewenstein, arXiv:cond-
mat/0611379 (2006).
[6] C. K. Law, H. Pu, and N. P. Bigelow, Phys. Rev. Lett.
81, 5257 (1998).
[7] T. Ohmi and K. Machida, J. Phys. Soc. Jpn. 67, 1822
(1998).
[8] W. Zhang, D. L. Zhou, M.-S. Chang, M. S. Chapman,
and L. You, Phys. Rev. A 72, 013602 (2005).
[9] T.-L. Ho, Phys. Rev. Lett. 81, 742 (1998).
[10] W. Zhang, S. Yi, and L. You, New J. Phys. 5, 77 (2003).
[11] W. Zhang, D. L. Zhou, M.-S. Chang, M. S. Chapman,
and L. You, Phys. Rev. Lett. 95, 180403 (2005).
[12] J. Kronjäger, C. Becker, M. Brinkmann, R. Walser,
P. Navez, K. Bongs, and K. Sengstock, Phys. Rev. A
72, 063619 (2005).
[13] J. Stenger, S. Inouye, D. M. Stamper-Kurn, H.-J. Mies-
ner, A. P. Chikkatur, and W. Ketterle, Nature 396, 345
(1998).
[14] D. M. Stamper-Kurn, H.-J. Miesner, A. P. Chikkatur,
S. Inouye, J. Stenger, and W. Ketterle, Phys. Rev. Lett.
83, 661 (1999).
[15] A. Widera, F. Gerbier, S. Fölling, T. Gericke, O. Mandel,
and I. Bloch, New J. Phys. 8, 152 (2006).
[16] T. Kuwamoto, K. Araki, T. Eno, and T. Hirano, Phys.
Rev. A 69, 063604 (2004).
[17] D. M. Stamper-Kurn and W. Ketterle, in Coherent
Atomic Matter Waves, edited by R. Kaiser, C. West-
brook, and F. David (Springer, New York, 2001), no. 72
in Les Houches Summer School Series, pp. 137–217, cond-
mat/0005001.
[18] R. Dumke, M. Johanning, E. Gomez, J. D. Weinstein,
K. M. Jones, and P. D. Lett, New J. Phys. 8, 64 (2006).
[19] M. Sargent III and P. Horwitz, Phys. Rev. A 13, 1962
(1976).
[20] R. Dumke, J. D. Weinstein, M. Johanning, K. M. Jones,
and P. D. Lett, Phys. Rev. A 72, 041801(R) (2005).
[21] S. Raghavan, A. Smerzi, S. Fantoni, and S. R. Shenoy,
Phys. Rev. A 59, 620 (1999).
[22] M. Albiez, R. Gati, J. Fölling, S. Hunsmann, M. Cris-
tiani, and M. K. Oberthaler, Phys. Rev. Lett. 95, 010402
(2005).
[23] A. Crubellier, O. Dulieu, F. Masnou-Seeuws, M. Elbs,
H. Knöckel, and E. Tiemann, Eur. Phys. J. D 6, 211
(1999).
[24] J. P. Burke, C. H. Greene, and J. L. Bohn, Phys. Rev.
Lett. 81, 3355 (1998).
[25] G. A. Smith, S. Chaudhury, A. Silberfarb, I. H. Deutsch,
and P. S. Jessen, Phys. Rev. Lett. 93, 163602 (2004).
ABSTRACT
  We observe coherent spin oscillations in an antiferromagnetic spin-1
Bose-Einstein condensate of sodium. The variation of the spin oscillations with
magnetic field shows a clear signature of nonlinearity, in agreement with
theory, which also predicts anharmonic oscillations near a critical magnetic
field. Measurements of the magnetic phase diagram agree with predictions made
in the approximation of a single spatial mode. The oscillation period yields
the best measurement to date of the sodium spin-dependent interaction
coefficient, determining that the difference between the sodium spin-dependent
s-wave scattering lengths $a_{f=2}-a_{f=0}$ is $2.47\pm0.27$ Bohr radii.

<|endoftext|><|startoftext|>
Introduction
Nonlinear stability properties are often considered with respect to an equi-
librium point or to a nominal system trajectory (see e.g. [31]). By contrast,
incremental stability is concerned with the behaviour of system trajectories
with respect to each other. From the triangle inequality, global exponential
incremental stability (any two trajectories tend to each other exponentially)
is a stronger property than global exponential convergence to a single tra-
jectory.
Historically, work on deterministic incremental stability can be traced
back to the 1950’s [23, 7, 16] (see e.g. [26, 20] for a more extensive list and
historical discussion of related references). More recently, and largely inde-
pendently of these earlier studies, a number of works have put incremental
To whom correspondance should be addressed.
http://arxiv.org/abs/0704.0926v2
stability on a broader theoretical basis and made relations with more tradi-
tional stability approaches [14, 32, 24, 2, 6]. Furthermore, it was shown that
incremental stability is especially relevant in the study of such problems as
state detection [2], observer design or synchronization analysis.
While the above references are mostly concerned with deterministic sta-
bility notions, stability theory has also been extended to stochastic dynam-
ical systems, see for instance [22, 17]. This includes important recent de-
velopments in Lyapunov-like approaches [12, 27], as well as applications to
standard problems in systems and control [13, 34, 8]. However, stochastic
versions of incremental stability have not yet been systematically investi-
gated.
The goal of this paper is to extend some concepts and results in in-
cremental stability to stochastic dynamical systems. More specifically, we
derive a stochastic version of contraction analysis in the specialized context
of state-independent metrics.
We prove in section 2 that the mean square distance between any two
trajectories of a stochastically contracting system is upper-bounded by a
constant after exponential transients. In contrast with previous works on
incremental stochastic stability [5], we consider the case when the two tra-
jectories are subject to distinct and independent noises, as detailed in sec-
tion 2.2.1. This specificity enables our theory to have a number of new and
practically important applications. However, the fact that the noise does
not vanish as two trajectories get very close to each other will prevent us
from obtaining asymptotic almost-sure stability results (see section 2.3.2).
In section 3, we show that results on combinations of deterministic con-
tracting systems have simple analogues in the stochastic case. These combi-
nation properties allow one to build by recursion stochastically contracting
systems of arbitrary size.
Finally, as illustrations of our results, we study in section 4 several ex-
amples, including contracting observers with noisy measurements, stochas-
tic composite variables and synchronization phenomena in networks of noisy
dynamical systems.
2 Main results
2.1 Background
2.1.1 Nonlinear contraction theory
Contraction theory [24] provides a set of tools to analyze the incremental
exponential stability of nonlinear systems, and has been applied notably
to observer design [24, 25, 1, 21, 36], synchronization analysis [35, 28] and
systems neuroscience modelling [15]. Nonlinear contracting systems enjoy
desirable aggregation properties, in that contraction is preserved under many
types of system combinations given suitable simple conditions [24].
While we shall derive global properties of nonlinear systems, many of our
results can be expressed in terms of eigenvalues of symmetric matrices [19].
Given a square matrix A, the symmetric part of A is denoted by As. The
smallest and largest eigenvalues of As are denoted by λmin(A) and λmax(A).
Given these notations, the matrix A is positive definite (denoted A > 0) if
λmin(A) > 0, and it is uniformly positive definite if
∃β > 0 ∀x, t λmin(A(x, t)) ≥ β
The basic theorem of contraction analysis, derived in [24], can be stated
as follows
Theorem 1 (Contraction) Consider, in Rn, the deterministic system
ẋ = f(x, t) (2.1)
where f is a smooth nonlinear function. Denote the Jacobian matrix of f
with respect to its first variable by ∂f
. If there exists a square matrix Θ(x, t)
such that M(x, t) = Θ(x, t)TΘ(x, t) is uniformly positive definite and the
matrix
F(x, t) =
Θ(x, t) +Θ(x, t)
Θ−1(x, t)
is uniformly negative definite, then all system trajectories converge exponen-
tially to a single trajectory, with convergence rate | sup
x,t λmax(F)| = λ > 0.
The system is said to be contracting, F is called its generalized Jacobian,
M(x, t) its contraction metric and λ its contraction rate.
2.1.2 Standard stochastic stability
In this section, we present very informally the basic ideas of standard stochas-
tic stability (for a rigourous treatment, the reader is referred to e.g. [22]).
This will set the context to understand the forthcoming difficulties and dif-
ferences associated with incremental stochastic stability.
For simplicity, we consider the special case of global exponential stability.
Let x(t) be a Markov stochastic process and assume that there exists a non-
negative function V (V (x) may represent e.g. the squared distance of x
from the origin) such that
∀x ∈ Rn ÃV (x) ≤ −λV (x) (2.2)
where λ is a positive real number and Ã is the infinitesimal operator of the
process x(t). The operator Ã is the stochastic analogue of the deterministic
differentiation operator. In the case that x(t) is an Itô process, Ã corre-
sponds to the widely-used [27, 34, 8] differential generator L (for a proof of
this fact, see [22], p. 15 or [3], p. 42).
For x0 ∈ Rn, let Ex0(·) = E(·|x(0) = x0). Then by Dynkin’s formula
([22], p. 10), one has
∀t ≥ 0 Ex0V (x(t))− V (x0) = Ex0
ÃV (x(s))ds
≤ −λEx0
V (x(s))ds = −λ
Ex0V (x(s))ds
Applying the Gronwall’s lemma to the deterministic real-valued function
t → Ex0V (x(t)) yields
∀t ≥ 0 Ex0V (x(t)) ≤ V (x0)e−λt
If we assume furthermore that Ex0V (x(t)) < ∞ for all t, then the above
implies that V (x(t)) is a supermartingale (see lemma 3 in the Appendix for
details), which yields, by the supermartingale inequality
T≤t<∞
V (x(t)) ≥ A
≤ Ex0V (x(T ))
≤ V (x0)e
(2.3)
Thus, one obtains an almost-sure stability result, in the sense that
∀A > 0 lim
T≤t<∞
V (x(t)) ≥ A
= 0 (2.4)
2.2 The stochastic contraction theorem
2.2.1 Settings
Consider a noisy system described by an Itô stochastic differential equation
da = f(a, t)dt+ σ(a, t)dW d (2.5)
where f is a Rn × R+ → Rn function, σ is a Rn × R+ → Rnd matrix-valued
function and W d is a standard d-dimensional Wiener process.
To ensure existence and uniqueness of solutions to equation (2.5), we
assume, here and in the remainder of the paper, the following standard
conditions on f and σ
Lipschitz condition: There exists a constant K1 > 0 such that
∀t ≥ 0, a,b ∈ Rn ‖f(a, t)− f(b, t)|+ ‖σ(a, t) − σ(b, t)‖ ≤ K1‖a− b‖
Restriction on growth: There exists a constant K2 > 0
∀t ≥ 0, a ∈ Rn ‖f(a, t)‖2 + ‖σ(a, t)‖2 ≤ K2(1 + ‖a‖2)
Under these conditions, one can show ([3], p. 105) that equation (2.5)
has on [0,∞[ a unique Rn-valued solution a(t), continuous with probability
one, and satisfying the initial condition a(0) = a0, with a0 ∈ Rn.
In order to investigate the incremental stability properties of system (2.5),
consider now two system trajectories a(t) and b(t). Our goal will consist of
studying the trajectories a(t) and b(t) with respect to each other. For this,
we consider the augmented system x(t) = (a(t),b(t))T , which follows the
equation
f(a, t)
f(b, t)
σ(a, t) 0
0 σ(b, t)
dW d1
dW d2
f(x, t)dt+
σ(x, t)dW 2d (2.6)
Important remark As stated in the introduction, the systems a and
b are driven by distinct and independent Wiener processes W d1 and W
This makes our approach considerably different from [5], where the authors
studied two trajectories driven by the same Wiener process.
Our approach enables us to study the stability of the system with respect
to variations in initial conditions and to random perturbations: indeed, two
trajectories of any real-life system are typically affected by distinct “real-
izations” of the noise. In addition, it leads very naturally to nice results on
the comparison of noisy and noise-free trajectories (cf. section 2.4), which
are particularly useful in applications (cf. section 4).
However, because of the very fact that the two trajectories are driven
by distinct Wiener processes, we cannot expect the influence of the noise
to vanish when the two trajectories get very close to each other. This con-
strasts with [5], and more generally, with the standard stochastic stability
case, where the noise vanishes near the origin (cf. section 2.1.2). The con-
sequences of this will be discussed in detail in section 2.3.2.
2.2.2 The basic stochastic contraction theorem
We introduce two hypotheses
(H1) f(a, t) is contracting in the identity metric, with contraction rate λ,
(i.e. ∀a, t λmax
≤ −λ)
(H2) tr
σ(a, t)Tσ(a, t)
is uniformly upper-bounded by a constant C (i.e.
∀a, t tr
σ(a, t)Tσ(a, t)
In other words, (H1) says that the noise-free system is contracting, while
(H2) says that the variance of the noise is upper-bounded by a constant.
Definition 1 A system that verifies (H1) and (H2) is said to be stochas-
tically contracting in the identity metric, with rate λ and bound C.
Consider now the Lyapunov-like function V (x) = ‖a−b‖2 = (a−b)T (a−
b). Using (H1) and (H2), we derive below an inequality on ÃV (x), similar
to equation (2.2) in section 2.1.2.
Lemma 1 Under (H1) and (H2), one has the inequality
ÃV (x) ≤ −2λV (x) + 2C (2.7)
Proof Since x(t) is an Itô process, Ã is given by the differential operator
L of the process [22, 3]. Thus, by the Itô formula
ÃV (x) = L V (x) =
∂V (x)
f(x, t) +
σ(x, t)T
∂2V (x)
σ(x, t)
1≤i≤2n
f(x, t)i +
1≤i,j,k≤2n
σ(x, t)ij
∂xi∂xk
σ(x, t)kj
1≤i≤n
f(a, t)i +
1≤i≤n
f(b, t)i
1≤i,j,k≤n
σ(a, t)ij
∂ai∂ak
σ(a, t)kj
1≤i,j,k≤n
σ(b, t)ij
∂bi∂bk
σ(b, t)kj
= 2(a− b)T (f(a, t) − f(b, t))
+tr(σ(a, t)Tσ(a, t)) + tr(σ(b, t)Tσ(b, t))
Fix t ≥ 0 and, as in [10], consider the real-valued function
r(µ) = (a− b)T (f(µa+ (1− µ)b, t)− f(b, t))
Since f is C1, r is C1 over [0, 1]. By the mean value theorem, there exists
µ0 ∈]0, 1[ such that
r′(µ0) = r(1)− r(0) = (a− b)T (f(a)− f(b))
On the other hand, one obtains by differentiating r
r′(µ0) = (a− b)T
(µ0a+ (1− µ0)b, t)
(a− b)
Thus, one has
(a− b)T (f(a)− f(b)) = (a− b)T
(µ0a+ (1− µ0)b, t)
(a− b)
≤ −λ(a− b)T (a− b) = −2λV (x) (2.8)
where the inequality is obtained by using (H1).
Finally,
ÃV (x) = 2(a− b)T (f(a)− f(b)) + tr(σ(a, t)Tσ(a, t)) + tr(σ(b, t)Tσ(b, t))
≤ −2λV (x) + 2C
where the inequality is obtained by using (H2). �
We are now in a position to prove our main theorem on stochastic incre-
mental stability.
Theorem 2 (Stochastic contraction) Assume that system (2.5) verifies
(H1) and (H2). Let a(t) and b(t) be two trajectories whose initial condi-
tions are given by a probability distribution p(x(0)) = p(a(0),b(0)). Then
∀t ≥ 0 E
‖a(t)− b(t)‖2
+ e−2λt
‖a0 − b0‖2 −
dp(a0,b0)
(2.9)
where [·]+ = max(0, ·). This implies in particular
∀t ≥ 0 E
‖a(t)− b(t)‖2
‖a(0) − b(0)‖2
e−2λt (2.10)
Proof Let x0 = (a0,b0) ∈ R2n. By Dynkin’s formula ([22], p. 10)
Ex0V (x(t)) − V (x0) = Ex0
ÃV (x(s))ds
Thus one has ∀u, t 0 ≤ u ≤ t < ∞
Ex0V (x(t)) − Ex0V (x(u)) = Ex0
ÃV (x(s))ds
≤ Ex0
(−2λV (x(s)) + 2C)ds (2.11)
(−2λEx0V (x(s)) + 2C)ds
where inequality (2.11) is obtained by using lemma 1.
Denote by g(t) the deterministic quantity Ex0V (x(t)). Clearly, g(t) is a
continuous function of t since x(t) is a continuous process. The function g
then satisfies the conditions of the Gronwall-type lemma 4 in the Appendix,
and as a consequence
∀t ≥ 0 Ex0V (x(t)) ≤
V (x0)−
e−2λT
Integrating the above inequality with respect to x0 yields the desired
result (2.9). Next, inequality (2.10) follows from (2.9) by remarking that
‖a0 − b0‖2 −
dp(a0,b0) ≤
‖a0 − b0‖2dp(a0,b0)
‖a(0) − b(0)‖2
(2.12)
Remark Let ǫ > 0 and Tǫ =
E(‖a0−b0‖2)
. Then inequal-
ity (2.10) and Jensen’s inequality [30] imply
∀t ≥ Tǫ E(‖a(t)− b(t)‖) ≤
C/λ+ ǫ (2.13)
Since ‖a(t)−b(t)‖ is non-negative, (2.13) together with Markov inequal-
ity [11] allow one to obtain the following probabilistic bound on the distance
between a(t) and b(t)
∀A > 0 ∀t ≥ Tǫ P (‖a(t)− b(t)‖ ≥ A) ≤
C/λ+ ǫ
Note however that this bound is much weaker than the asymptotic
almost-sure bound (2.4).
2.2.3 Generalization to time-varying metrics
Theorem 2 can be vastly generalized by considering general time-dependent
metrics (the case of state-dependent metrics is not considered in this article
and will be the subject of a future work). Specifically, let us replace (H1)
and (H2) by the following hypotheses
(H1’) There exists a uniformly positive definite metric M(t) = Θ(t)TΘ(t),
with the lower-bound β > 0 (i.e. ∀x, t xTM(t)x ≥ β‖x‖2) and f(a, t)
is contracting in that metric, with contraction rate λ, i.e.
Θ(t) +Θ(t)
Θ−1(t)
≤ −λ uniformly
or equivalently
M(t) +
M(t) ≤ −2λM(t) uniformly
(H2’) tr
σ(a, t)TM(t)σ(a, t)
is uniformly upper-bounded by a constant C
Definition 2 A system that verifies (H1’) and (H2’) is said to be stochas-
tically contracting in the metric M(t), with rate λ and bound C.
Consider now the generalized Lyapunov-like function V1(x, t) = (a −
b)TM(t)(a − b). Lemma 1 can then be generalized as follows.
Lemma 2 Under (H1’) and (H2’), one has the inequality
ÃV1(x, t) ≤ −2λV1(x, t) + 2C (2.14)
Proof Let us compute first ÃV1
ÃV1(x, t) =
f(x, t) +
σ(x, t)T
σ(x, t)
= (a− b)T
(a− b) + 2(a− b)TM(t)(f(a, t) − f(b, t))
+tr(σ(a, t)TM(t)σ(a, t)) + tr(σ(b, t)TM(t)σ(b, t))
Fix t > 0 and consider the real-valued function
r(µ) = (a− b)TM(t)(f(µa+ (1− µ)b, t)− f(b, t))
Since f is C1, r is C1 over [0, 1]. By the mean value theorem, there exists
µ0 ∈]0, 1[ such that
r′(µ0) = r(1)− r(0) = (a− b)TM(t)(f(a) − f(b))
On the other hand, one obtains by differentiating r
r′(µ0) = (a− b)TM(t)
(µ0a+ (1− µ0)b, t)
(a− b)
Thus, letting c = µ0a+ (1− µ0)b, one has
(a− b)T
(a− b) + 2(a− b)TM(t)(f(a) − f(b))
= (a− b)T
(a− b) + 2(a− b)TM(t)
(c, t)
(a− b)
= (a− b)T
M(t) +M(t)
(c, t)
(c, t)
(a− b)
≤ −2λ(a− b)TM(t)(a− b) = −2λV1(x) (2.15)
where the inequality is obtained by using (H1’).
Finally, combining equation (2.15) with (H2’) allows to obtain the de-
sired result. �
We can now state the generalized stochastic contraction theorem
Theorem 3 (Generalized stochastic contraction) Assume that system
(2.5) verifies (H1’) and (H2’). Let a(t) and b(t) be two trajectories whose
initial conditions are given by a probability distribution p(x(0)) = p(a(0),b(0)).
∀t ≥ 0 E
(a(t)− b(t))TM(t)(a(t) − b(t))
+ e−2λt
(a0 − b0)TM(0)(a0 − b0)−
dp(a0,b0) (2.16)
In particular,
∀t ≥ 0 E
‖a(t)− b(t)‖2
‖a(0) − b(0)‖2
e−2λt
(2.17)
Proof Following the same reasoning as in the proof of theorem 2, one
obtains
∀t ≥ 0 Ex0V1(x(t)) ≤
V1(x0)−
e−2λt
which leads to (2.16) by integrating with respect to (a0,b0). Next, observing
‖a(t)− b(t)‖2 ≤ 1
(a(t)− b(t))TM(t)(a(t) − b(t)) = 1
EV1(x(t))
and using the same bounding as in (2.12) lead to (2.17). �
2.3 Strength of the stochastic contraction theorem
2.3.1 “Optimality” of the mean square bound
Consider the following linear dynamical system, known as the Ornstein-
Uhlenbeck (colored noise) process
da = −λadt+ σdW (2.18)
Clearly, the noise-free system is contracting with rate λ and the trace of
the noise matrix is upper-bounded by σ2. Let a(t) and b(t) be two system
trajectories starting respectively at a0 and b0 (deterministic initial condi-
tions). Then by theorem 2, we have
∀t ≥ 0 E
(a(t)− b(t))2
(a0 − b0)2 −
e−2λt (2.19)
Let us verify this result by solving directly equation (2.18). The solution
of equation (2.18) is ([3], p. 134)
a(t) = a0e
−λt + σ
eλ(s−t)dW (s) (2.20)
Next, let us compute the mean square distance between the two trajec-
tories a(t) and b(t)
E((a(t)− b(t))2) = (a0 − b0)2e−2λt +
((∫ t
eλ(s−t)dW1(s)
((∫ t
eλ(u−t)dW2(u)
= (a0 − b0)2e−2λt +
(1− e−2λt)
(a0 − b0)2 −
e−2λt
The last inequality is in fact an equality when (a0 − b0)2 ≥ σ
. Thus,
this calculation shows that the upper-bound (2.19) given by theorem 2 is
optimal, in the sense that it can be attained.
2.3.2 No asymptotic almost-sure stability
From the explicit form (2.20) of the solutions, one can deduce that the
distributions of a(t) and b(t) converge to the normal distribution N
([3], p. 135). Since a(t) and b(t) are independent, the distribution of the
difference a(t)−b(t) will then converge to N
. This observation shows
that, contrary to the case of standard stochastic stability (cf. section 2.1.2),
one cannot – in general – obtain asymptotic almost-sure incremental stability
results (which would imply that the distribution of the difference converges
instead to the constant 0).
Compare indeed equations (2.2) (the condition for standard stability, sec-
tion 2.1.2) and (2.7) (the condition for incremental stability, section 2.2.2).
The difference lies in the term 2C, which stems from the fact that the influ-
ence of the noise does not vanish when two trajectories get very close to each
other (cf. section 2.2.1). The presence of this extra term prevents ÃV (x(t))
from being always non-positive, and as a result, it prevents V (x(t)) from be-
ing always “non-increasing”. As a consequence, V (x(t)) is not – in general –
a supermartingale, and one cannot then use the supermartingale inequality
to obtain asymptotic almost-sure bounds, as in equation (2.3).
Remark If one is interested in finite time bounds then the supermartin-
gale inequality is still applicable, see ([22], p. 86) for details.
2.4 Noisy and noise-free trajectories
Consider the following augmented system
f(a, t)
f(b, t)
0 σ(b, t)
f(x, t)dt +
σ(x, t)dW2d
(2.21)
This equation is the same as equation (2.6) except that the a-system is
not perturbed by noise. Thus V (x) = ‖a − b‖2 will represent the distance
between a noise-free trajectory and a noisy one. All the calculations will be
the same as in the previous development, with C being replaced by C/2.
One can then derive the following corollary
Corollary 1 Assume that system (2.5) verifies (H1’) and (H2’). Let a(t)
be a noise-free trajectory starting at a0 and b(t) a noisy trajectory whose
initial condition is given by a probability distribution p(b(0)). Then
∀t ≥ 0 E
‖a(t)− b(t)‖2
‖a0 − b(0)‖2
e−2λt
(2.22)
Remarks
• One can note here that the derivation of corollary 1 is only permitted
by our initial choice of considering distinct driving Wiener process for
the a- and b-systems (cf. section 2.2.1).
• Corollary 1 provides a robustness result for contracting systems, in the
sense that any contracting system is automatically protected against
noise, as quantified by (2.22). This robustness could be related to the
exponential nature of contraction stability.
3 Combinations of contracting stochastic systems
Stochastic contraction inherits naturally from deterministic contraction [24]
its convenient combination properties. Because contraction is a state-space
concept, such properties can be expressed in more general forms than input-
output analogues such as passivity-based combinations [29]. The following
combination properties allow one to build by recursion stochastically con-
tracting systems of arbitrary size.
Parallel combination Consider two stochastic systems of the same di-
mension {
dx1 = f1(x1, t)dt+ σ1(x1, t)dW1
dx2 = f2(x2, t)dt+ σ2(x2, t)dW2
Assume that both systems are stochastically contracting in the same
constant metric M, with rates λ1 and λ2 and with bounds C1 and C2.
Consider a uniformly positive bounded superposition
α1(t)x1 + α2(t)x2
where ∀t ≥ 0, li ≤ αi(t) ≤ mi for some li,mi > 0, i = 1, 2.
Clearly, this superposition is stochastically contracting in the metric M,
with rate l1λ1 + l2λ2 and bound m1C1 +m2C2.
Negative feedback combination In this and the following paragraphs,
we describe combinations properties for contracting systems in constant met-
rics M. The case of time-varying metrics can be easily adapted from this
development but is skipped here for the sake of clarity.
Consider two coupled stochastic systems
dx1 = f1(x1,x2, t)dt+ σ1(x1, t)dW1
dx2 = f2(x1,x2, t)dt+ σ2(x2, t)dW2
Assume that system i (i = 1, 2) is stochastically contracting with respect to
Mi = Θ
i Θi, with rate λi and bound Ci.
Assume furthermore that the two systems are connected by negative
feedback [33]. More precisely, the Jacobian matrices of the couplings are of
the form Θ1J12Θ
2 = −kΘ2JT21Θ
1 , with k a positive constant. Hence,
the Jacobian matrix of the augmented system is given by
J1 −kΘ−11 Θ2JT21Θ
J21 J2
Consider a coordinate transform Θ =
associated with
the metric M = ΘTΘ > 0. After some calculations, one has
ΘJΘ−1
Θ1J1Θ
Θ2J2Θ
≤ max(−λ1,−λ2)I uniformly (3.1)
The augmented system is thus stochastically contracting in the metric
M, with rate min(λ1, λ2) and bound C1 + kC2.
Hierarchical combination We first recall a standard result in matrix
analysis [19]. Let A be a symmetric matrix in the formA =
A21 A2
Assume that A1 and A2 are definite positive. Then A is definite positive if
sing2(A21) < λmin(A1)λmin(A2) where sing(A21) denotes the largest singu-
lar value of A21. In this case, the smallest eigenvalue of A satisfies
λmin(A) ≥
λmin(A1) + λmin(A2)
λmin(A1)− λmin(A2)
+ sing2(A21)
Consider now the same set-up as in the previous paragraph, except that
the connection is now hierarchical and upper-bounded. More precisely, the
Jacobians of the couplings verify J12 = 0 and sing
2(Θ2J21Θ
1 ) ≤ K. The
Jacobian matrix of the augmented system is then given by J =
J21 J2
Consider a coordinate transform Θǫ =
0 ǫΘ2
associated with the
metric Mǫ = Θ
ǫ Θǫ > 0. After some calculations, one has
ΘJΘ−1
Θ1J1Θ
ǫ(Θ2J21Θ
ǫΘ2J21Θ
Θ2J2Θ
Set now ǫ =
2λ1λ2
. The augmented system is then stochastically con-
tracting in the metric Mǫ, with rate
(λ1 + λ2 −
λ21 + λ
2)) and bound
2C2λ1λ2
Small gains In this paragraph, we require no specific assumption on the
form of the couplings. Consider the coordinate transformΘ =
associated with the metric Mk = Θ
Θk > 0. Aftersome calculations, one
Θ1J1Θ
Θ2J2Θ
where Bk =
kΘ2J21Θ
Θ1J12Θ
Following the matrix analysis result stated at the beginning of the pre-
vious paragraph, if infk>0 sing
2(Bk) < λ1λ2 then the augmented system is
stochastically contracting in the metric Mk, with bound C1 + kC2 and rate
λ verifying
λ ≥ λ1 + λ2
λ1 − λ2
+ inf
sing2(Bk) (3.2)
4 Some examples
4.1 Effect of measurement noise on contracting observers
Consider a nonlinear dynamical system
ẋ = f(x, t) (4.1)
If a measurement y = y(x) is available, then it may be possible to choose
an output injection matrix K(t) such that the dynamics
˙̂x = f(x̂, t) +K(t)(ŷ − y) (4.2)
is contracting, with ŷ = y(x̂). Since the actual state x is a particular
solution of (4.2), any solution x̂ of (4.2) will then converge towards x expo-
nentially.
Assume now that the measurements are corrupted by additive “white
noise”. In the case of linear measurement, the measurement equation be-
comes y = H(t)x+Σ(t)ξ(t) where ξ(t) is a multidimensional “white noise”
and Σ(t) is the matrix of measurement noise intensities.
The observer equation is now given by the following Itô stochastic dif-
ferential equation (using the formal rule dW = ξdt)
dx̂ = (f(x̂, t) +K(t)(H(t)x −H(t)x̂))dt+K(t)Σ(t)dW (4.3)
Next, remark that the solution x of system (4.1) is a also a solution of
the noise-free version of system (4.3). By corollary 1, one then has, for any
solution x̂ of system (4.3)
∀t ≥ 0 E
‖x̂(t)− x(t)‖2
+ ‖x̂0 − x0‖2e−2λt (4.4)
where
λ = inf
∣∣∣∣λmax
∂f(x, t)
−K(t)H(t)
)∣∣∣∣
C = sup
Σ(t)TK(t)TK(t)Σ(t)
Remark The choice of the injection gain K(t) is governed by a trade-
off between convergence speed (λ) and noise sensitivity (C/λ) as quantified
by (4.4). More generally, the explicit computation of the bound on the
expected quadratic estimation error given by (4.4) may open the possibility
of measurement selection in a way similar to the linear case. If several
possible measurements or sets of measurements can be performed, one may
try at each instant (or at each step, in a discrete version) to select the
most relevant, i.e., the measurement or set of measurements which will best
contribute to improving the state estimate. Similarly to the Kalman filters
used in [9] for linear systems, this can be achieved by computing, along
with the state estimate itself, the corresponding bounds on the expected
quadratic estimation error, and then selecting accordingly the measurement
which will minimize it.
4.2 Estimation of velocity using composite variables
In this section, we present a very simple example that hopefully suggests the
many possibilities that could stem from the combination of our stochastic
stability analysis with the composite variables framework [31].
Let x be the position of a mobile subject to a sinusoidal forcing
ẍ = −U1ω2 sin(ωt) + 2U2
where U1 and ω are known parameters. We would like to compute good
approximations of the mobile’s velocity v and acceleration a using only mea-
surements of x and without using any filter. For this, construct the following
observer
−αv 1
−αa 0
(αa − α2v)x
−αaαvx− U1ω3 cos(ωt)
3 cos(ωt)
(4.5)
and introduce the composite variables v̂ = v + αvx and â = a + αax. By
construction, these variables follow the equation
v̂ − v
−U1ω3 cos(ωt)
(4.6)
and therefore, a particular solution of (v̂, â) is clearly (v, a). Choose now
αa = α
v = α
2 and let Mα =
α2 −α/2
−α/2 1
. One can then show that
system (4.6) is contracting with rate λα = α/2 in the metric Mα. Thus, by
the basic contraction theorem [24], (v̂, â) converges exponentially to (v, a)
with rate λα in the metric Mα. Also note that the β-bound corresponding
to the metric Mα is given by βα =
1+α2−
α4−α2+1
Next, assume that the measurements of x are corrupted by additive
“white noise”, so that xmeasured = x+ σξ. Equation (4.5) then becomes an
Itô stochastic differential equation
3 cos(ωt)
By definition of B, the variance of the noise in the metric Mα is upper-
bounded by α
. Thus, using again corollary 1, one obtains (see Figure 1
for a numerical simulation)
∀t ≥ 0 E
‖v̂(t)− v(t)‖2 + ‖â(t)− a(t)‖2
‖v̂0 − v0‖2 + ‖â0 − a0‖2
4.3 Stochastic synchronization
Consider a network of n dynamical elements coupled through diffusive con-
nections
dxi =
f(xi, t) +
j 6=i
Kij(xj − xi)
 dt+ σi(xi, t)dW di i = 1, . . . , n
(4.7)
x, t) =
f(x1, t)
f(xn, t)
 , ⌢σ(⌢x, t) =
σ1(x1, t) 0 0
. . . 0
0 0 σn(xn, t)
The global state
x then follows the equation
x, t)− L⌢x
x, t)dW nd (4.8)
 0  2  4  6  8  10
 0  2  4  6  8  10
Figure 1: Estimation of the velocity of a mobile using noisy measurements
of its position. The simulation was performed using the Euler-Maruyama
algorithm [18] with the following parameters: U1 = 10, U2 = 2, ω = 3,
σ = 10 and α = 1. Left plot: simulation for one trial. The plot shows
the measured position (red), the actual velocity (blue) and the estimate of
the velocity using the measured position (green). Right plot: the average
over 1000 trials of the squared error ‖v̂ − v‖2 + ‖â − a‖2 (green) and the
asymptotic bound
= 200
given by our approach (red).
In the sequel, we follow the reasoning of [28], which starts by defining an
appropriate orthonormal matrix V describing the synchronization subspace
(V represents the state projection on the subspace M⊥, orthogonal to the
synchronization subspace M = {(x1, . . . ,xn)T : x1 = . . . = xn}, see [28] for
details). Denote by
y the state of the projected system,
y = V
x. Since the
mapping is linear, Itô differentiation rule simply yields
y = Vd
x, t) −VL⌢x
x, t)dW nd
y, t) −VLVT⌢y
y, t)dW nd (4.9)
Assume now that ∂f
is uniformly upper-bounded. Then for strong
enough coupling strength, A = V ∂f
VT − VLVT will be uniformly neg-
ative definite. Let λ = |λmax(A)| > 0. System (4.9) then verifies condition
(H1) with rate λ. Assume furthermore that each noise intensity σi is upper-
bounded by a constant Ci (i.e. supx,t tr(σi(x, t)
Tσi(x, t)) ≤ Ci). Condition
(H2) will then be satisfied with the bound C =
i Ci.
Next, consider a noise-free trajectory
yu(t) of system (4.9). By theo-
rem 3 of [28], we know that
yu(t) converges exponentially to zero. Thus,
by corollary 1, one can conclude that, after exponential transients of rate λ,
E(‖⌢y(t)‖2) ≤ C
On the other hand, one can show that
‖⌢y(t)‖2 = 1
‖xi − xj‖2
Thus, after exponential transients of rate λ, we have
‖xi − xj‖2 ≤
Remarks
• The above development is fully compatible with the concurrent syn-
chronization framework [28]. It can also be easily generalized to the
case of time-varying metrics by combining theorem 3 of this paper and
corollary 1 of [28].
• The synchronization of Itô dynamical systems has been investigated
in [4]. However, the systems considered by the authors of that article
were dissipative. Here, we make a less restrictive assumption, namely,
we only require ∂f
to be uniformly upper-bounded. This enables us
to study the synchronization of a broader class of dynamical systems,
which can include nonlinear oscillators or even chaotic systems.
Example As illustration of the above development, we provide here a de-
tailed analysis for the synchronization of noisy FitzHugh-Nagumo oscillators
(see [35] for the references). The dynamics of two diffusively-coupled noisy
FitzHugh-Nagumo oscillators can be described by
dvi = (c(vi + wi − 13v
i + Ii) + k(v0 − vi))dt+ σdWi
dwi = −1c (vi − a+ bwi)dt
i = 1, 2
Let x = (v1, w1, v2, w2)
T and V = 1√
1 0 −1 0
0 1 0 −1
. The Jaco-
bian matrix of the projected noise-free system is then given by
c− c(v
− k c
−1/c −b/c
Thus, if the coupling strength verifies k > c then the projected system
will be stochastically contracting in the diagonal metric M = diag(1, c) with
rate min(k − c, b/c) and bound σ2. Hence, the average absolute difference
between the two “membrane potentials” |v1 − v2| will be upper-bounded by
min(1, c)min(k − c, b/c) (see Figure 2 for a numerical simulation).
Acknowledgments We are grateful to Dr S. Darses, Prof D. Bennequin
and Prof M. Yor for stimulating discussions, and to the Associate Editor
and the reviewers for their helpful comments.
 0  2  4  6  8  10
Figure 2: Synchronization of two noisy FitzHugh-Nagumo oscillators. The
simulation was performed using the Euler-Maruyama algorithm [18] with
the following parameters: a = 0.3, b = 0.2, c = 30, k = 40 and σ = 1. The
plot shows the “membrane potentials” of the two oscillators.
A Appendix
A.1 Proof of the supermartingale property
Lemma 3 Consider a Markov stochastic process x(t) and a non-negative
function V such that ∀t ≥ 0 EV (x(t)) < ∞ and
∀x ∈ Rn ÃV (x) ≤ −λV (x) (A.1)
where λ is a non-negative real number and Ã is the infinitesimal operator
of the process x(t). Then V (x(t)) is a supermartingale with respect to the
canonical filtration Ft = {x(s), s ≤ t}.
We need to show that for all s ≥ t, one has E(V (x(s))|Ft) ≤ V (x(t)).
Since x(t) is a Markov process, it suffices to show that
∀x0 ∈ Rn E(V (x(t))|x(0) = x0) ≤ V (x0)
By Dynkin’s formula, one has for all x0 ∈ Rn
Ex0V (x(t)) = V (x0) + Ex0
ÃV (x(s))ds
≤ V (x0)− λEx0
V (x(s))ds ≤ V (x0)
where Ex0(·) = E(·|x(0) = x0).
A.2 A variation of Gronwall’s lemma
Lemma 4 Let g : [0,∞[→ R be a continuous function, C a real number
and λ a strictly positive real number. Assume that
∀u, t 0 ≤ u ≤ t g(t)− g(u) ≤
−λg(s) + Cds (A.2)
∀t ≥ 0 g(t) ≤ C
g(0) − C
e−λt (A.3)
where [·]+ = max(0, ·).
Proof Case 1 : C = 0, g(0) > 0.
Define h(t) by
∀t ≥ 0 h(t) = g(0)e−λt
Remark that h is positive with h(0) = g(0), and satisfies (A.2) where the
inequality has been replaced by an equality
∀u, t 0 ≤ u ≤ t h(t)− h(u) = −
λh(s)ds
Consider now the set S = {t ≥ 0 | g(t) > h(t)}. If S = ∅ then the
lemma holds true. Assume by contradiction that S 6= ∅. In this case, let
m = inf S < ∞. By continuity of g and h and by the fact that g(0) = h(0),
one has g(m) = h(m) and there exists ǫ > 0 such that
∀t ∈]m,m+ ǫ[ g(t) > h(t) (A.4)
Consider now φ(t) = g(m)− λ
g(s)ds. Equation (A.2) implies that
∀t ≥ m g(t) ≤ φ(t)
In order to compare φ(t) and h(t) for t ∈]m,m+ ǫ[, let us differentiate the
ratio φ(t)/h(t).
φ′h− h′φ
−λgh+ λhφ
λh(φ − g)
Thus φ(t)/h(t) is increasing for t ∈]m,m + ǫ[. Since φ(m)/h(m) = 1, one
can conclude that
∀t ∈]m,m+ ǫ[ φ(t) ≥ h(t)
which implies, by definition of φ and h, that
∀t ∈]m,m+ ǫ[
g(s)ds ≤
h(s)ds (A.5)
Choose now t0 such that m < t0 < m+ ǫ, then one has by (A.4)
g(s)ds >
h(s)ds
which clearly contradicts (A.5).
Case 2 : C = 0, g(0) ≤ 0
Consider the set S = {t ≥ 0 | g(t) > 0}. If S = ∅ then the lemma holds
true. Assume by contradiction that S 6= ∅. In this case, let m = inf S < ∞.
By continuity of g and by the fact that g(0) ≤ 0, one has g(m) = 0 and
there exists ǫ such that
∀t ∈]m,m+ ǫ[ g(t) > 0 (A.6)
Let t0 ∈]m,m+ ǫ[. Equation (A.2) implies that
g(t0) ≤ −λ
g(s)ds
which clearly contradicts (A.6).
Case 3 : C 6= 0
Define ĝ = g − C/λ. One has
∀u, t 0 ≤ u ≤ t ĝ(t)−ĝ(u) = g(t)−g(u) ≤
−λg(s)+Cds = −
λĝ(s)ds
Thus ĝ satisfies the conditions of Case 1 or Case 2, and as a consequence
∀t ≥ 0 ĝ(t) ≤ [ĝ(0)]+e−λt
The conclusion of the lemma follows by replacing ĝ by g−C/λ in the above
equation. �
References
[1] N. Aghannan and P. Rouchon. An intrinsic observer for a class of la-
grangian systems. IEEE Transactions on Automatic Control, 48, 2003.
[2] D. Angeli. A lyapunov approach to incremental stability properties.
IEEE Transactions on Automatic Control, 47:410–422, 2002.
[3] L. Arnold. Stochastic Differential Equations : Theory and Applications.
Wiley, 1974.
[4] T. Caraballo and P. Kloeden. The persistence of synchronization under
environmental noise. Proceedings of the Royal Society A, 461:2257–
2267, 2005.
[5] T. Caraballo, P. Kloeden, and B. Schmalfuss. Exponentially stable
stationary solutions for stochastic evolution equations and their pertu-
bation. Applied Mathematics and Optimization, 50:183–207, 2004.
[6] L. d’Alto and M. Corless. Incremental quadratic stability. In Proceed-
ings of the IEEE Conference on Decision and Control, 2005.
[7] B. Demidovich. Dissipativity of a nonlinear system of differential equa-
tions. Ser. Mat. Mekh., 1961.
[8] H. Deng, M. Krstic, and R. Williams. Stabilization of stochastic nonlin-
ear systems driven by noise of unknown covariance. IEEE Transactions
on Automatic Control, 46, 2001.
[9] E. Dickmanns. Dynamic Vision for Intelligent Vehicles. Course Notes,
MIT EECS dept, 1998.
[10] K. El Rifai and J.-J. Slotine. Contraction and incremental stability.
Technical report, MIT NSL Report, 2006.
[11] W. Feller. An Introduction to Probability Theory and Its Applications.
Wiley, 1968.
[12] P. Florchinger. Lyapunov-like techniques for stochastic stability. SIAM
Journal of Control and Optimization, 33:1151–1169, 1995.
[13] P. Florchinger. Feedback stabilization of affine in the control stochastic
differential systems by the control lyapunov function method. SIAM
Journal of Control and Optimization, 35, 1997.
[14] V. Fromion. Some results on the behavior of lipschitz continuous sys-
tems. In Proceedings of the European Control Conference, 1997.
[15] B. Girard, N. Tabareau, Q.-C. Pham, A. Berthoz, and J.-J. Slotine.
Where neuroscience and dynamic system theory meet autonomous
robotics: a contracting basal ganglia model for action selection. Neural
Networks, 2008.
[16] P. Hartmann. Ordinary differential equations. Wiley, 1964.
[17] R. Has’minskii. Stochastic Stability of Differential Equations. Sijthoff
and Nordhoff, Rockville, 1980.
[18] D. Higham. An algorithmic introduction to numerical simulation of
stochastic differential equations. SIAM Review, 43:525–546, 2001.
[19] R. Horn and C. Johnson. Matrix Analysis. Cambridge University Press,
1985.
[20] J. Jouffroy. Some ancestors of contraction analysis. In Proceedings of
the IEEE Conference on Decision and Control, 2005.
[21] J. Jouffroy and T. Fossen. On the combination of nonlinear contracting
observers and uges controllers for output feedback. In Proceedings of
the IEEE Conference on Decision and Control, 2004.
[22] H. Kushner. Stochastic Stability and Control. Academic Press, 1967.
[23] D. Lewis. Metric properties of differential equations. American Journal
of Mathematics, 71:294–312, 1949.
[24] W. Lohmiller and J.-J. Slotine. On contraction analysis for nonlinear
systems. Automatica, 34:671–682, 1998.
[25] W. Lohmiller and J.-J. Slotine. Nonlinear process control using con-
traction theory. A.I.Ch.E. Journal, 2000.
[26] W. Lohmiller and J.-J. Slotine. Contraction analysis of nonlinear dis-
tributed systems. International Journal of Control, 78, 2005.
[27] X. Mao. Stability of Stochastic Differential Equations with Respect to
Semimartingales. Longman, White Plains, NY, 1991.
[28] Q.-C. Pham and J.-J. Slotine. Stable concurrent synchronization in
dynamic system networks. Neural Netw, 20(1):62–77, Jan. 2007.
[29] V. Popov. Hyperstability of Control Systems. Springer-Verlag, 1973.
[30] W. Rudin. Real and complex analysis. McGraw-Hill, 1987.
[31] J.-J. Slotine and W. Li. Applied Nonlinear Control. Prentice-Hall, 1991.
[32] E. Sontag and Y. Wang. Output-to-state stability and detectability of
nonlinear systems. Systems and Control Letters, 29:279–290, 1997.
[33] N. Tabareau and J.-J. Slotine. Notes on contraction theory. Technical
report, MIT NSL Report, 2005.
[34] J. Tsinias. The concept of “exponential iss” for stochastic systems and
applications to feedback stabilization. Systems and Control Letters,
36:221–229, 1999.
[35] W. Wang and J.-J. E. Slotine. On partial contraction analysis for cou-
pled nonlinear oscillators. Biol Cybern, 92(1):38–53, Jan. 2005.
[36] Y. Zhao and J.-J. Slotine. Discrete nonlinear observers for inertial
navigation. Systems and Control Letters, 54, 2005.
	Introduction
	Main results
	Background
	Nonlinear contraction theory
	Standard stochastic stability
	The stochastic contraction theorem
	Settings
	The basic stochastic contraction theorem
	Generalization to time-varying metrics
	Strength of the stochastic contraction theorem
	``Optimality'' of the mean square bound
	No asymptotic almost-sure stability
	Noisy and noise-free trajectories
	Combinations of contracting stochastic systems
	Some examples
	Effect of measurement noise on contracting observers
	Estimation of velocity using composite variables
	Stochastic synchronization
	Appendix
	Proof of the supermartingale property
	A variation of Gronwall's lemma
ABSTRACT
  We investigate the incremental stability properties of It\^o stochastic
dynamical systems. Specifically, we derive a stochastic version of nonlinear
contraction theory that provides a bound on the mean square distance between
any two trajectories of a stochastically contracting system. This bound can be
expressed as a function of the noise intensity and the contraction rate of the
noise-free system. We illustrate these results in the contexts of stochastic
nonlinear observers design and stochastic synchronization.

<|endoftext|><|startoftext|>
Introduction to the Theory of Numbers, fifth edition, Oxford
Science Publications, Clarendon Press, Oxford, 1995.
[Hej] D. Hejhal, On the triple correlation of zeros of the zeta function, Internat. Math. Res. Notices
1994, no. 7, 294-302.
[HM] C. Hughes and S. J. Miller, Low-lying zeros of L-functions with orthogonal symmtry, Duke
Math. J., 136 (2007), no. 1, 115–172.
[HR] C. Hughes and Z. Rudnick, Linear Statistics of Low-Lying Zeros of L-functions, Quart. J.
Math. Oxford 54 (2003), 309–333.
[HKS] D. K. Huynh, J. P. Keating and N. C. Snaith, work in progress.
[ILS] H. Iwaniec, W. Luo and P. Sarnak, Low lying zeros of families of L-functions, Inst. Hautes
Études Sci. Publ. Math. 91, 2000, 55–131.
[Ju1] M. Jutila, On character sums and class numbers, Journal of Number Theory 5 (1973), 203–
[Ju2] M. Jutila, On mean values of Dirichlet polynomials with real characters, Acta Arith. 27
(1975), 191–198.
[Ju3] M. Jutila, On the mean value of L(1/2, χ) for real characters, Analysis 1 (1981), no. 2,
149–161.
[KaSa1] N. Katz and P. Sarnak, Random Matrices, Frobenius Eigenvalues and Monodromy, AMS
Colloquium Publications 45, AMS, Providence, 1999.
[KaSa2] N. Katz and P. Sarnak, Zeros of zeta functions and symmetries, Bull. AMS 36, 1999, 1−26.
[Ke] J. P. Keating, Statistics of quantum eigenvalues and the Riemann zeros, in Supersymmetry
and Trace Formulae: Chaos and Disorder, eds. I. V. Lerner, J. P. Keating & D. E Khmelnit-
skii (Plenum Press), 1–15.
[KeSn1] J. P. Keating and N. C. Snaith, Random matrix theory and ζ(1/2+ it), Comm. Math. Phys.
214 (2000), no. 1, 57–89.
[KeSn2] J. P. Keating and N. C. Snaith, Random matrix theory and L-functions at s = 1/2, Comm.
Math. Phys. 214 (2000), no. 1, 91–110.
[KeSn3] J. P. Keating and N. C. Snaith, Random matrices and L-functions, Random matrix theory,
J. Phys. A 36 (2003), no. 12, 2859–2881.
[Mil1] S. J. Miller, 1- and 2-level densities for families of elliptic curves: evidence for the underly-
ing group symmetries, Compositio Mathematica 104 (2004), 952–992.
[Mil2] S. J. Miller, Variation in the number of points on elliptic curves and applications to excess
rank, C. R. Math. Rep. Acad. Sci. Canada 27 (2005), no. 4, 111–120.
[Mil3] S. J. Miller, Lower order terms in the 1-level density for families of holomorphic cuspidal
newforms, preprint. http://arxiv.org/abs/0704.0924
[Mon] H. Montgomery, The pair correlation of zeros of the zeta function, Analytic Number Theory,
Proc. Sympos. Pure Math. 24, Amer. Math. Soc., Providence, 1973, 181− 193.
[Od1] A. Odlyzko, On the distribution of spacings between zeros of the zeta function, Math. Comp.
48 (1987), no. 177, 273–308.
[Od2] A. Odlyzko, The 1022-nd zero of the Riemann zeta function, Proc. Confer-
ence on Dynamical, Spectral and Arithmetic Zeta-Functions, M. van Frankenhuy-
sen and M. L. Lapidus, eds., Amer. Math. Soc., Contemporary Math. series, 2001,
http://www.research.att.com/∼amo/doc/zeta.html.
A SYMPLECTIC TEST OF THE L-FUNCTIONS RATIOS CONJECTURE 29
[OS1] A. E. Özlük and C. Snyder, Small zeros of quadratic L-functions, Bull. Austral. Math. Soc.
47 (1993), no. 2, 307–319.
[OS2] A. E. Özlük and C. Snyder, On the distribution of the nontrivial zeros of quadratic L-
functions close to the real axis, Acta Arith. 91 (1999), no. 3, 209–228.
[RR] G. Ricotta and E. Royer, Statistics for low-lying zeros of symmetric power L-functions in
the level aspect, preprint. http://arxiv.org/abs/math/0703760
[Ro] E. Royer, Petits zéros de fonctions L de formes modulaires, Acta Arith. 99 (2001), no. 2,
147-172.
[Rub1] M. Rubinstein, Low-lying zeros of L–functions and random matrix theory, Duke Math. J.
109, (2001), 147–181.
[Rub2] M. Rubinstein, Computational methods and experiments in analytic number theory. Pages
407–483 in Recent Perspectives in Random Matrix Theory and Number Theory, ed. F. Mez-
zadri and N. C. Snaith editors, 2005.
[RS] Z. Rudnick and P. Sarnak, Zeros of principal L-functions and random matrix theory, Duke
Math. J. 81, 1996, 269− 322.
[So] K. Soundararajan, Nonvanishing of quadratic Dirichlet L-functions at s = 1/2, Ann. of
Math. (2) 152 (2000), 447–488.
[Yo1] M. Young, Lower-order terms of the 1-level density of families of elliptic curves, Internat.
Math. Res. Notices 2005, no. 10, 587–633.
[Yo2] M. Young, Low-lying zeros of families of elliptic curves, J. Amer. Math. Soc. 19 (2006), no.
1, 205–250.
E-mail address: sjmiller@math.brown.edu
DEPARTMENT OF MATHEMATICS, BROWN UNIVERSITY, PROVIDENCE, RI 02912
	1. Introduction
	2. Analysis of the terms from the Ratios Conjecture.
	2.1. Analysis of R(g;X)
	2.2. Secondary term (of size 1/logX) of R(g;X)
	3. Analysis of the terms from Number Theory
	3.1. Contribution from k even
	3.2. Contribution from k odd
	Appendix A. The Explicit Formula
	Appendix B. Sums over fundamental discriminants
	Appendix C. Improved bound for non-square m terms in SM(X,Y,g"0362g,)
	References
ABSTRACT
  Recently Conrey, Farmer and Zirnbauer conjectured formulas for the averages
over a family of ratios of products of shifted L-functions. Their L-functions
Ratios Conjecture predicts both the main and lower order terms for many
problems, ranging from n-level correlations and densities to mollifiers and
moments to vanishing at the central point. There are now many results showing
agreement between the main terms of number theory and random matrix theory;
however, there are very few families where the lower order terms are known.
These terms often depend on subtle arithmetic properties of the family, and
provide a way to break the universality of behavior. The L-functions Ratios
Conjecture provides a powerful and tractable way to predict these terms. We
test a specific case here, that of the 1-level density for the symplectic
family of quadratic Dirichlet characters arising from even fundamental
discriminants d \le X. For test functions supported in (-1/3, 1/3) we calculate
all the lower order terms up to size O(X^{-1/2+epsilon}) and observe perfect
agreement with the conjecture (for test functions supported in (-1, 1) we show
agreement up to errors of size O(X^{-epsilon}) for any epsilon). Thus for this
family and suitably restricted test functions, we completely verify the Ratios
Conjecture's prediction for the 1-level density.

<|endoftext|><|startoftext|>
arXiv:0704.0928v3  [hep-ph]  26 Oct 2007
Cosmology from String Theory
Luis Anchordoqui,1 Haim Goldberg,2 Satoshi Nawata,1 and Carlos Nuñez3
1Department of Physics,
University of Wisconsin-Milwaukee, Milwaukee, WI 53201
2Department of Physics,
Northeastern University, Boston, MA 02115
3 Department of Physics,
University of Swansea, Singleton Park, Swansea SA2 8PP, UK
(Dated: April 2007)
Abstract
We explore the cosmological content of Salam-Sezgin six dimensional supergravity, and find a
solution to the field equations in qualitative agreement with observation of distant supernovae,
primordial nucleosynthesis abundances, and recent measurements of the cosmic microwave back-
ground. The carrier of the acceleration in the present de Sitter epoch is a quintessence field slowly
rolling down its exponential potential. Intrinsic to this model is a second modulus which is au-
tomatically stabilized and acts as a source of cold dark matter, with a mass proportional to an
exponential function of the quintessence field (hence realizing VAMP models within a String con-
text). However, any attempt to saturate the present cold dark matter component in this manner
leads to unacceptable deviations from cosmological data – a numerical study reveals that this
source can account for up to about 7% of the total cold dark matter budget. We also show that
(1) the model will support a de Sitter energy in agreement with observation at the expense of a
miniscule breaking of supersymmetry in the compact space; (2) variations in the fine structure
constant are controlled by the stabilized modulus and are negligible; (3) “fifth” forces are carried
by the stabilized modulus and are short range; (4) the long time behavior of the model in four
dimensions is that of a Robertson-Walker universe with a constant expansion rate (w = −1/3).
Finally, we present a String theory background by lifting our six dimensional cosmological solution
to ten dimensions.
http://arxiv.org/abs/0704.0928v3
I. GENERAL IDEA
The mechanism involved in generating a very small cosmological constant that satisfies
’t Hooft naturalness is one of the most pressing questions in contemporary physics. Re-
cent observations of distant Type Ia supernovae [1] strongly indicate that the universe is
expanding in an accelerating phase, with an effective de-Sitter (dS) constant H that nearly
saturates the upper bound given by the present-day value of the Hubble constant, i.e.,
H <∼ H0 ∼ 10−33 eV. According to the Einstein field equations, H provides a measure of
the scalar curvature of the space and is related to the vacuum energy density ρvac through
Friedmann’s equation, 3M2PlH
2 ∼ ρvac, where MPl ≃ 2.4 × 1018 GeV is the reduced Planck
mass. However, the “natural” value of ρvac coming from the zero-point energies of known
elementary particles is found to be at least ρvac ∼ TeV4. Substitution of this value of ρvac into
Friedmann’s equation yields H >∼ 10−3 eV, grossly inconsistent with the set of supernova
(SN) observations. The absence of a mechanism in agreement with ’t Hooft naturalness
criteria then centers on the following question: why is the vacuum energy needed by the
Einstein field equations 120 orders of magnitude smaller than any “natural” cut-off scale in
effective field theory of particle interactions, but not zero?
Nowadays, the most popular framework which can address aspects of this question is
the anthropic approach, in which the fundamental constants are not determined through
fundamental reasons, but rather because such values are necessary for life (and hence intel-
ligent observers to measure the constants) [2]. Of course, in order to implement this idea in
a concrete physical theory, it is necessary to postulate a multiverse in which fundamental
physical parameters can take different values. Recent investigations in String theory have
applied a statistical approach to the enormous “landscape” of metastable vacua present in
the theory [3]. A vast ensemble of metastable vacua with a small positive effective cosmo-
logical constant that can accommodate the low energy effective field theory of the Standard
Model (SM) have been found. Therefore, the idea of a string landscape has been used to
proposed a concrete implementation of the anthropic principle.
Nevertheless, the compactification of a String/M-theory background to a four dimen-
sional solution undergoing accelerating expansion has proved to be exceedingly difficult.
The obstruction to finding dS solutions in the low energy equations of String/M theory
is well known and summarized in the no-go theorem of [4]. This theorem states that in
a D-dimensional theory of gravity, in which (a) the action is linear in the Ricci scalar
curvature (b) the potential for the matter fields is non-positive and (c) the massless fields
have positive defined kinetic terms, there are no (dynamical) compactifications of the form:
ds2D = Ω
2(y)(dx2d+ ĝmndy
ndym), if the d dimensional space has Minkowski SO(1, d−1) or dS
SO(1, d) isometries and its d dimensional gravitational constant is finite (i.e., the internal
space has finite volume). The conclusions of the theorem can be circumvented if some of its
hypotheses are not satisfied. Examples where the hypotheses can be relaxed exist: (i) one
can find solutions in which not all of the internal dimensions are compact [5]; (ii) one may
try to find a solution breaking Minkowski or de Sitter invariance [6]; (iii) one may try to
add negative tension matter (e.g., in the form of orientifold planes) [7]; (iv) one can even
appeal to some intrincate String dynamics [8].
Salam-Sezgin six dimensional supergravity model [9] provides a specific example where
the no-go theorem is not at work, because when their model is lifted to M theory the
internal space is found to be non-compact [10]. The lower dimensional perspective of this,
is that in six dimensions the potential can be positive. This model has perhaps attracted
the most attention because of the wide range of its phenomenological applications [11]. In
this article we examine the cosmological implications of such a supergravity model during
the epochs subsequent to primordial nucleosynthesis. We derive a solution of Einstein field
equations which is in qualitative agreement with luminosity distance measurements of Type
Ia supernovae [1], primordial nucleosynthesis abundances [12], data from the Sloan Digital
Sky Survey (SDSS) [13], and the most recent measurements from the Wilkinson Microwave
Anisotropy Probe (WMAP) satellite [14]. The observed acceleration of the universe is
driven by the “dark energy” associated to a scalar field slowly rolling down its exponential
potential (i.e., kinetic energy density < potential energy density ≡ negative pressure) [15].
Very interestingly, the resulting cosmological model also predicts a cold dark matter (CDM)
candidate. In analogy with the phenomenological proposal of [16], such a nonbaryonic matter
interacts with the dark energy field and therefore the mass of the CDM particles evolves with
the exponential dark energy potential. However, an attempt to saturate the present CDM
component in this manner leads to gross deviations from present cosmological data. We
will show that this type of CDM can account for up to about 7% of the total CDM budget.
Generalizations of our scenario (using supergravities with more fields) might account for the
rest.
II. SALAM-SEZGIN COSMOLOGY
We begin with the action of Salam-Sezgin six dimensional supergravity [9], setting to
zero the fermionic terms in the background (of course fermionic excitations will arise from
fluctuations),
R− κ2(∂Mσ)2 − κ2eκσF 2MN −
e−κσ − κ
e2κσG2MNP
. (1)
Here, g6 = det gMN , R is the Ricci scalar of gMN , FMN = ∂[MAN ], GMNP = ∂[MBNP ] +
κA[MFNP ], and capital Latin indices run from 0 to 5. A re-scaling of the constants: G6 ≡ 2κ2,
φ ≡ −κσ and ξ ≡ 4 g2 leads to
R− (∂Mφ)2 −
eφ − G6
e−φF 2MN −
e−2φG2MNP
. (2)
The length dimensions of the fields are: [G6] = L
4, [ξ] = L2, [φ] = [g2MN ] = 1, [A
M ] = L
and [F 2MN ] = [G
MNP ] = L
Now, we consider a spontaneous compactification from six dimension to four dimension.
To this end, we take the six dimensional manifold M to be a direct product of 4 Minkowski
directions (hereafter denoted by N1) and a compact orientable two dimensional manifold N2
with constant curvature. Without loss of generality, we can set N2 to be a sphere S
2, or a
Σ2 hyperbolic manifold with arbitrary genus. The metric on M locally takes the form
ds26 = ds4(t, ~x)
2 + e2f(t,~x)dσ2, dσ2 =
r2c (dϑ
2 + sin2 ϑdϕ2) for S2
r2c (dϑ
2 + sinh2 ϑdϕ2) for Σ2 ,
where (t, ~x) denotes a local coordinate system in N1, rc is the compactification radius of N2.
We assume that the scalar field φ is only dependent on the point of N1, i.e., φ = φ(t, ~x).
We further assume that the gauge field AM is excited on N2 and is of the form
b cos ϑ (S2)
b cosh ϑ (Σ2) .
This is the monopole configuration detailed by Salam-Sezgin [9]. Since we set the Kalb-
Ramond field BNP = 0 and the term A[MFNP ] vanishes on N2, GMNP = 0. The field
strength becomes
F 2MN = 2b
2e−4f/r4c . (5)
Taking the variation of the gauge field AM in Eq. (2) we obtain the Maxwell equation
2f−φFMN
= 0. (6)
It is easily seen that the field strengths in Eq. (5) satisfy Eq. (6).
With this in mind, the Ricci scalar reduces to [17]
R[M ] = R[N1] + e
−2fR[N2]− 4✷f − 6(∂µf)2 , (7)
where R[M ], R[N1], and R[N2] denote the Ricci scalars of the manifolds M, N1, and N2;
respectively. (Greek indices run from 0 to 3). The Ricci scalar of N2 reads
R[N2] =
+2/r2c (S
−2/r2c (Σ2).
To simplify the notation, from now on, R1 and R2 indicate R[N1] and R[N2], respectively.
The determinant of the metric can be written as
g6 = e
2f√g4
gσ, where g4 = det gµν and
gσ is the determinant of the metric ofN2 excluding the factor e
2f . We define the gravitational
constant in the four dimension as
2πr2c
. (9)
Hence, by using the field configuration given in Eq. (4) we can re-write the action in Eq. (2)
as follows
e2f [R1 + e
−2fR2 + 2(∂µf)
2 − (∂µφ)2]−
e2f+φ − G6b
e−2f−φ
. (10)
Let us consider now a rescaling of the metric of N1: ĝµν ≡ e2fgµν and
ĝ4 = e
4f√g4. Such a
transformation brings the theory into the Einstein conformal frame where the action given
in Eq. (10) takes the form
R[ĝ4]− 4(∂µf)2 − (∂µφ)2 −
e−2f+φ −
e−6f−φ + e−4fR2
. (11)
The four dimensional Lagrangian is then
R− 4(∂µf)2 − (∂µφ)2 − V (f, φ)
, (12)
V (f, φ) ≡ ξ
e−2f+φ +
e−6f−φ − e−4fR2 , (13)
where to simplify the notation we have defined: g ≡ ĝ4 and R ≡ R[ĝ4].
Let us now define a new orthogonal basis, X ≡ (φ + 2f)/
G4 and Y ≡ (φ − 2f)/
so that the kinetic energy terms in the Lagrangian are both canonical, i.e.,
(∂X)2 − 1
(∂Y )2 − Ṽ (X, Y )
, (14)
where the potential Ṽ (X, Y ) ≡ V (f, φ)/G4 can be re-written (after some elementary algebra)
as [18]
Ṽ (X, Y ) =
G4X −R2e−
G4X +
. (15)
The field equations are
Rµν −
gµνR =
∂µX∂νX −
∂ηX ∂
∂µY ∂νY −
∂ηY ∂
− gµνṼ (X, Y )
, (16)
✷X = ∂X Ṽ , and ✷Y = ∂Y Ṽ . In order to allow for a dS era we assume that the metric takes
the form
ds2 = −dt2 + e2h(t)d~x 2, (17)
and that X and Y depend only on the time coordinate, i.e., X = X(t) and Y = Y (t). Then
the equations of motion for X and Y can be written as
Ẍ + 3ḣẊ = −∂X Ṽ (18)
Ÿ + 3ḣẎ = −∂Y Ṽ , (19)
whereas the only two independent components of Eq. (16) are
ḣ2 =
(Ẋ2 + Ẏ 2) + Ṽ (X, Y )
2ḧ+ 3ḣ2 =
(Ẋ2 + Ẏ 2) + Ṽ (X, Y )
. (21)
The terms in the square brackets in Eq. (15) take the form of a quadratic function of
G4 X . This function has a global minimum at e−
G4 X0 = R2 r
c/(2G6 b
2). Indeed, the
necessary and sufficient condition for a minimum is that R2 > 0, so hereafter we only
consider the spherical compactification, where e−
G4 X0 = M2Pl/(4πb
2). The condition for
the potential to show a dS rather than an AdS or Minkowski phase is ξb2 > 1. Now, we
expand Eq. (15) around the minimum,
Ṽ (X, Y ) =
(X −X0)2 +O
(X −X0)3
 , (22)
where
π brc
4πr2cb
(b2ξ − 1) . (24)
As shown by Salam-Sezgin [9] the requirements for preserving a fraction of supersymmetry
(SUSY) in spherical compactifications to four dimension imply b2ξ = 1, corresponding to
winding number n = ±1 for the monopole configuration. Consequently, a (Y -dependent)
dS background can be obtained only through SUSY breaking. For now we will leave open
the symmetry breaking mechanism and come back to this point after our phenomenological
discussion. The Y -dependent physical mass of the X-particles at any time is
MX(Y ) =
G4 Y/2
MX , (25)
which makes this a varying mass particle (VAMP) model [16], although, in this case, the
dependence on the quintessence field is fixed by the theory. The dS (vacuum) potential
energy density is
K . (26)
In general, classical oscillations for the X particle will occur for
MX > H =
G4ρtot
, (27)
where ρtot is the total energy density. (This condition is well known from axion cosmol-
ogy [19]). A necessary condition for this to hold can be obtained by saturating ρ with VY
from Eq. (26) and making use of Eqs. (23) to (27), which leads to ξb2 < 7. Of course, as
we stray from the present into an era where the dS energy is not dominant, we must check
at every step whether the inequality (27) holds. If the inequality is violated, the X-particle
ceases to behave like CDM.
In what follows, some combination of the parameters of the model will be determined by
fitting present cosmological data. To this end we assume that SM fields are confined to N1
and we denote with ρrad the radiation energy, with ρX the matter energy associated with
the X-particles, and with ρmat the remaining matter density. With this in mind, Eq. (19)
can be re-written as
Ÿ + 3H Ẏ = −∂Veff
, (28)
where Veff ≡ VY + ρX and H is defined by the Friedmann equation
H2 ≡ ḣ2 = 1
3M2Pl
Ẏ 2 + Veff + ρrad + ρmat
. (29)
(Note that the matter energy associated to the X particles is contained in Veff .)
It is more convenient to consider the evolution in u ≡ − ln(1+ z), where z is the redshift
parameter. As long as the oscillation condition is fulfilled, the VAMP CDM energy density
is given in terms of the X-particle number density nX [20]
ρX(Y, u) = MX(Y ) nX(u) = C e
G4Y/2 e−3u , (30)
where C is a constant to be determined by fitting to data. Along with Eq. (26), these define
for us the effective (u-dependent) VAMP potential
Veff(Y, u) ≡ VY + ρX = A e
G4Y + C e
G4Y/2 e−3u , (31)
where a A is just a constant given in terms of model parameters through Eqs. (22) and (24).
Hereafter we adopt natural units, MPl = 1. Denoting by a prime derivatives with respect
to u, the equation of motion for Y becomes
1− Y ′2/6
+ 3 Y ′ +
∂uρ Y
′/2 + 3 ∂Y Veff
= 0 , (32)
where ρ = Veff + ρrad + ρmat. Quantities of importance are the dark energy density
H2 Y ′2 + VY , (33)
generally expressed in units of the critical density (Ω ≡ ρ/ρc)
, (34)
and the Hubble parameter
3− Y ′2/2
. (35)
The equation of state is
H2 Y ′2
H2 Y ′2
. (36)
We pause to note that the exponential potential VY ∼ eλY/MPl , with λ =
2. Asymptotically,
this represents the crossover situation with wY = −1/3 [22], implying expansion at constant
velocity. Nevertheless, we will find that there is a brief period encompassing the recent past
(z <∼ 6) where there has been significant acceleration.
Returning now to the quantitative analysis, we take ρmat = Be
−3u and ρrad =
10−4 ρmat e
−u f(u) [21] where B is a constant and f(u) parameterizes the u-dependent
number of radiation degrees of freedom. In order to interpolate the various thresholds
appearing prior to recombination (among others, QCD and electroweak), we adopt a conve-
nient phenomenological form f(u) = exp(−u/15) [23]. We note at this point that solutions
of Eq. (32) are independent by an overall normalization for the energy density. This is also
true for the dimensionless quantities of interest ΩY and wY .
With these forms for the energy densities, Eq. (32) can be integrated for various choices
of A, B, and C, and initial conditions at u = −30. We take as initial condition Y (−30) = 0.
Because of the slow variation of Y over the range of u, changes in Y (−30) are equivalent
to altering the quantities A and C [24]. In accordance to equipartition arguments [24, 25]
we take Y ′(−30) = 0.08. Because the Y evolution equation depends only on energy density
ratios, and hence only on the ratios A : B : C of the previously introduced constants, we
may, for the purposes of integration and without loss of generality, arbitrarily fix B and
then scan the A and C parameter space for applicable solutions. In Fig. 1 we show a sample
qualitative fit to the data. It has the property of allowing the maximum value of X-CDM
FIG. 1: The upper panel shows the evolution of Y as a function of u. Today corresponds to
z = 0 and for primordial nucleosynthesis z ≈ 1010. We set the initial conditions Y (−30) = 0 and
Y ′(−30) = 0.08; we take A : B : C = 11 : 0.3 : 0.1. The second panel shows the evolution of ΩY
(solid line), Ωmat (dot-dashed line), and Ωrad (dashed line) superposed over experimental best fits
from SDSS and WMAP observations [13, 14]. The curves are not actual fits to the experimental
data but are based on the particular choice of the Y evolution shown in the upper panel, which
provides eyeball agreement with existing astrophysical observations. The lower panel shows the
evolution of the equation of state wY superposed over the best fits to WMAP + SDSS data sets
and WMAP + SNGold [14] . The solution of the field equations is consistent with the requirement
from primordial nucleosynthesis, ΩY < 0.045 (90%CL) [12], it also shows the established radiation
and matter dominated epochs, and at the end shows an accelerated dS era.
(about 7% of the total dark matter component) before the fits deviate unacceptably from
data.
It is worth pausing at this juncture to examine the consequences of this model for vari-
ation in the fine structure constant and long range forces. Specifically, excitations of the
electromagnetic field on N1 will, through the presence of the dilaton factor in Eq. (2), seem-
ingly induce variation in the electromagnetic fine structure constant αem = e
2/4π, as well
as a violation of the equivalence principle through a long range coupling of the dilaton to
the electromagnetic component of the stress tensor. We now show that these effects are
extremely negligible in the present model. First, it is easily seen using Eqs. (2) and (3)
together with Eqs. (8)-(15), that the electromagnetic piece of the lagrangian as viewed from
N1 is
Lem = −
G4X f̃ 2µν , (37)
where f̃µν denotes a quantum fluctuation of the electromagnetic U(1) field. (Fluctuations of
the U(1) background field are studied in the Appendix). At the equilibrium value X = X0,
the exponential factor is
G4X0 =
, (38)
so that we can identify the electromagnetic coupling (1/e2) ≃ M2Pl/b2. This shows that
b ∼ MPl. We can then expand about the equilibrium point, and obtain an additional factor
of (X − X0)/MPl. This will do two things [26]: (a) At the classical level, it will induce a
variation of the electromagnetic coupling as X varies, with ∆αem/αem ≃ (X − X0)/MPl;
(b) at the quantum level, exchange of X quanta will induce a new force through coupling to
the electromagnetic component of matter.
Item (b) is dangerous if the mass of the exchanged quanta are small, so that the force
is long range. This is not the case in the present model: from Eq. (22) the X quanta have
mass of O(MXMPl) ∼ MPl/(rcb), so that if rc is much less than O(cm), the forces will play
no role in the laboratory or cosmologically.
As far as the variation of αem is concerned, we find that ρX/ρmat = (C/B)e
2, so that
ρX ≃ 3× 10−120e−3uM4PleY/
X(X −X0)2eY
2M2Pl . (39)
This then gives,
〈(X −X0)2〉 ≡ ∆Xrms ≈ 10−60e−3u/2MPleY/(2
2)/MX . (40)
During the radiation era, Y ≃ const ≃ 0 (see Fig. 1), so that during nucleosynthesis
(u ≃ −23) ∆Xrms/MPl ≃ 10−45/MX , certainly no threat. It is interesting that such a small
value can be understood as a result of inflation: from the equation of motion for the X field,
it is simple to see that during a dS era with Hubble constant H , the amplitude ∆Xrms is
damped as e−3Ht/2. For 50 e-foldings, this represents a damping of 1032. In order to make the
numbers match (assuming a pre-inflation value ∆Xrms/MPl ∼ 1) an additional damping of
∼ 1013 is required from reheat temperature to primordial nucleosynthesis. With the e−3u/2
behavior, this implies a low reheat temperature, about 106 GeV. Otherwise, one may just
assume an additional fine-tuning of the initial condition on X .
As mentioned previously, the solutions of Eq. (32), as well as the quantities we are fitting
to (ΩY and wY ), depend only on the ratios of the energy densities. From the eyeball fit in
Fig. 1 we have, up to a common constant, ρordinary matter ≡ ρmat ∝ 0.3 e−3u and VY ∝ 11 e
We can deduce from these relations that
VY (now)
ρmat(now)
2Y (now) ≃ 36 e
2Y (now) . (41)
Besides, we know that ρmat(now) ≃ 0.3ρc(now) ≃ 10−120 M4Pl. Now, Eqs. (22) and (24) lead
VY (now) = e
2Y (now) M
8π r2c b
(b2ξ − 1) (42)
so that from Eqs. (41) and (42) we obtain
8π r2c b
(b2ξ − 1) ≃ 10−119 . (43)
It is apparent that this condition cannot be naturally accomplished by choosing large values
of rc and/or b. There remains the possibility that SUSY breaking [27] or non-perturbative
effects lead to an exponentially small deviation of b2ξ from unity, such that b2ξ = 1 +
O(10−119) [29]. Since a deviation of b2ξ from unity involves a breaking of supersymmetry,
a small value for this dimensionless parameter, perhaps (1 TeV/MPl)
2 ∼ 10−31, can be
expected on the basis of ’t Hooft naturalness. It is the extent of the smallness, of course,
which remains to be explained.
III. THE STRING CONNECTION
We now briefly comment on how the six dimensional solution derived above reads in
String theory. To this end, we use the uplifting formulae developed by Cvetic, Gibbons and
Pope [10]; we will denote with the subscript “cgp” the quantities of that paper and with
“us” quantities in our paper. Let us more specifically look at Eq. (34) in Ref. [10], where
the authors described the six dimensional Lagrangian they uplifted to Type I String theory.
By simple inspection, we can see that the relation between their variables and fields with
the ones we used in Eq. (2) is φ|cgp = −2φ|us, F2|cgp =
G6F2|us, H3|cgp =
G6/3G3|us, and
ḡ2|cgp = ξ/(8G6)|us. Our six dimensional background is determined by the (string frame)
metric ds26 = e
− dt2 + e2hdx23 + r2c dσ22
, the gauge field Fϑϕ = −b sin ϑ, and the t-
dependent functions h(t), f(t) =
G4 (X − Y )/4, and φ(t) =
G4 (X + Y )/2. Identifying
these expressions with those in Eqs. (47), (48) and (49) of Ref. [10] one obtains a full Type
I or Type IIB configuration, consisting of a 3-form (denoted by F3),
8G6 sinh ρ̂ cosh ρ̂
ξ cosh2 2ρ̂
dρ̂ ∧
b cos ϑdϕ
b cosϑdϕ
2G6b√
ξ cosh 2ρ̂
sinϑdθ ∧ dϕ ∧
cosh2 ρ̂
b cosϑdϕ
− sinh2 ρ̂
dβ +
b cosϑdϕ
 , (44)
a dilaton (denoted by φ̂)
e2φ̂ =
cosh(2ρ̂)
, (45)
and a ten dimensional metric that in the string frame reads
ds2str = e
φ ds26 + dz
dρ̂2 +
cosh2 ρ̂
cosh 2ρ̂
b cosϑdϕ
sinh2 ρ̂
cosh 2ρ̂
dβ +
b cosϑdϕ
 , (46)
where ρ̂, z, α, and β denote the four extra coordinates. It is important to stress that though
the uplifted procedure decribed above implies a non-compact internal manifold, the metric
in Eq. (46) can be interpreted within the context of [7] (i.e., 0 ≤ ρ̂ ≤ L, with L ≫ 1 an
infrared cutoff where the spacetime smoothly closes up) to obtain a finite volume for the
internal space and consequently a non-zero but tiny value for G6.
IV. CONCLUSIONS
We studied the six dimensional Salam-Sezgin model [9], where a solution of the form
Minkowski4×S2 is known to exist, with a U(1) monopole serving as background in the two-
sphere. This model circumvents the hypotheses of the no-go theorem [4] and then when lifted
to String theory can show a dS phase. In this work we have allowed for time dependence
of the six-dimensional moduli fields and metric (with a Robertson-Walker form). Time
dependence in these fields vitiates invariance under the supersymmetry transformations.
With these constructs, we have obtained the following results:
(1) In terms of linear combinations of the S2 moduli field and the six dimensional dilaton,
the effective potential consists of (a) a pure exponential function of a quintessence field
(this piece vanishes in the supersymmetric limit of the static theory) and (b) a part which
is a source of cold dark matter, with a mass proportional to an exponential function of the
quintessence field. This presence of a VAMP CDM candidate is inherent in the model.
(2) If the monopole strength is precisely at the value prescribed by supersymmetry, the
model is in gross disagreement with present cosmological data – there is no accelerative
phase, and the contribution of energy from the quintessence field is purely kinetic.
However, a miniscule deviation of O(10−120) from this value permits a qualitative match
with data. Contribution from the VAMP component to the matter energy density can be
as large as about 7% without having negative impact on the fit. The emergence of a
VAMP CDM candidate as a necessary companion of dark energy has been a surprising
aspect of the present findings, and perhaps encouraging for future exploration of
candidates which can assume a more prominent role in the CDM sector.
(3) In our model, the exponential potential VY ∼ eλY/MPl , with Y the quintessence field
and λ =
2. The asymptotic behavior of the scale factor for exponential potentials
eh(t) ≈ t2/λ2 , so that for our case h ≈ ln t, leading to a conformally flat Robertson-Walker
metric for large times. The deviation from constant velocity expansion into a brief
accelerated phase in the neighborhood of our era makes the model phenomenologically
viable. In the case that the supersymmetry condition (b2ξ = 1) is imposed, and there is
neither radiant energy nor dark matter except for the X contribution, we find for large
times that the scale parameter eh(t) ≈
t, so that even in this case the asymptotic metric
is Robertson-Walker rather than Minkowski. Moreover, and rather intriguingly, the scale
parameter is what one would find with radiation alone [28].
In sum, in spite of the shortcomings of the model (not a perfect fit, requirement of a tiny
deviation from supersymmetric prescription for the monopole embedding), it has provided
a stimulating new, and unifying, look at the dark energy and dark matter puzzles.
Acknowledgments
We would like to thank Costas Bachas and Roberto Emparan for valuable discussions.
The research of HG was supported in part by the National Science Foundation under Grant
No. PHY-0244507.
V. APPENDIX
In this appendix we study the quantum fluctuations of the U(1) field associted to the
background configuration. We start by considering fluctuations of the background field A0M
in the 4 dimensional space, i.e,
AM → A0M + ǫ aM , (47)
where A0M = 0 if M 6= ϕ and aM = 0 if M = ϑ, ϕ. The fluctuations on A0M lead to
FMN → F 0MN + ǫ fMN . (48)
Then,
MN = gML gNP [F 0MNF
LP + ǫ F
MN fLP + ǫ
2fMN fLP ] . (49)
The second term vanishes and the first and third terms are nonzero because F 0MN 6= 0 in
the compact space and fMN 6= 0 in the 4 dimensional space. If the Kalb-Ramond potential
BNM = 0, then the 3-form field strength can be written as
GMNP = κA[M FNP ] =
[AM FNP + AP FMN −AN FMP ] . (50)
Now we introduce notation of differential forms, in which the usual Maxwell field and field
strenght read
A1 = AMdx
M and F2 = FMN dx
M ∧ dxN ; (51)
respectively. (Note that dxM ∧ dxN is antisymmetrized by definition.) With this in mind
the 3-form reads
G3 = κA1 ∧ F2 = κAMFNP dxM ∧ dxN ∧ dxP . (52)
Substituting Eqs. (47) and (48) into Eq. (52) we obtain
G3 = κ
(A0M + ǫaM )(F
NP + ǫfNP ) dx
M ∧ dxN ∧ dxP
. (53)
The background fields read
A01 = b cosϑ dϕ, F
2 = −b sinϑ dϑ ∧ dϕ , (54)
and the fluctuations on the probe brane become
a1 = aµdx
µ, f2 = fdx
µ ∧ dxν , with f = ∂µaν − ∂µaν . (55)
All in all,
= A0ϕF
ϑϕ dϕ ∧ dϑ ∧ dϕ+ ǫA0ϕfµν dϕ ∧ dxµ ∧ dxν + ǫF 0ϑϕaµ dϑ ∧ dϕ ∧ dxµ
+ ǫ2aµfζνdx
µ ∧ dxζ ∧ dxν . (56)
Using Eq. (54) and the antisymmetry of the wedge product, Eq. (56) can be re-written as
b cos ϑfµνdϕ ∧ dxµ ∧ dxν − baµ sinϑdϑ ∧ dϕ ∧ dxµ + ǫaµfζνdxµ ∧ dxζ ∧ dxν
. (57)
From the metric
ds2 = e2αdx24 + e
2β(dϑ2 + sin ϑ2dϕ2) (58)
we can write the vielbeins
ea = eαdxa, eϑ = eβdϑ, eϕ = eβ sinϑdϕ,
dxa = e−αea, dϑ = e−βeϑ, dϕ =
eϕ (59)
where β ≡ f+ln rc. (Lower latin indeces from the beginning of the alphabet indicate coordi-
nates associted to the four dimensional Minkowski spacetime with metric ηab.) Substituting
into Eq. (57) we obtain
cos ϑ
sin ϑ
e−2α−βfabe
ϕ ∧ ea ∧ eb − be−α−2βaaeϑ ∧ eϕ ∧ ea + ǫe−3αaafcbea ∧ ec ∧ eb
, (60)
where fab = ∂aab − ∂baa. Because the three terms are orthogonal to each other straightfor-
ward calculation leads to
G23 = κ
2ǫ2(b2 cot2 ϑ e−4α−2βf 2ab + b
2e−2α−4βa2a) +O(ǫ4) . (61)
Then, the 5th term in Eq. (2) can be written as
SG3 = −
e4α+2β
dϑdϕ sinϑ
κ2ǫ2b2 cot2 ϑe−4α−2β
f 2ab
κ2ǫ2b2e−2α−4β
, (62)
whereas the contribution from the 4th term in Eq. (2) can be computed from Eq. (49)
yielding
SF2 = −
η42πe
2β−φG6ǫ
2f 2ab
2f−φr2cǫ
2f 2ab . (63)
Thus,
SG3 + SF2 = −
f 2ab +
, (64)
where the four dimensional effective coupling and the effective mass are of the form
= 4 ǫ2
πe2f−φr2c +
κ2b2e−2φ
dϑdϕ sinϑ cot2 ϑ
→ ∞ (65)
πκ2b2ǫ2e2α−2β−2φ . (66)
For the moment we let
dϑdϕ sinϑ cot2 ϑ = N , where eventually we set N → ∞. Now
to make quantum particle identification and coupling, we carry out the transformation
aa → gâa [30]. This implies that the second term in the right hand side of Eq. (64) vanishes,
yielding
fab = ∂a(gâb)− ∂b(gâa) = ∂ag âb − ∂bg âa + g ∂aâb − g ∂bâa = gf̂ab + â ∧ dg (67)
and consequently to leading order in N
f 2ab =
[g2f̂ 2ab + (â ∧ dg)2 + 2 g âb f̂ab ∂ag] . (68)
If the coupling depends only on the time variable,
f 2ab → f̂ 2ab +
â2a + 2
âi f̂
ti (69)
where ġ = ∂tg and lower latin indices from the middle of the alphabet refer to the brane
space-like dimensions. If we choose a time-like gauge in which at = 0, then the term
(ġ/g) âi f̂
ti can be written as (1/2)(ġ/g)(d/dt)(âi)
2, which after an integration by parts
gives −(1/2)[(d/dt)(ġ/g)]â2i ; with g ∼ e−φ, the factor in square brackets becomes −φ̈. Since
G4(X + Y ), the rapidly varying Ẍ will average to zero, and one is left just with the
very small Ÿ , which is of order Hubble square. For the term (ġ/g)2(ai)
2, the term (Ẋ)2
also averages to order Hubble square, implying that the induced mass term is of horizon
size. These “paraphotons” carry new relativistic degrees of freedom, which could in turn
modify the Hubble expansion rate during Big Bang nucleosynthesis (BBN). Note, however,
that these extremely light gauge bosons are thought to be created through inflaton decay
and their interactions are only relevant at Planck-type energies. Since the quantum gravity
era, all the paraphotons have been redshifting down without being subject to reheating, and
consequently at BBN they only count for a fraction of an extra neutrino species in agreement
with observations.
[1] A. G. Riess et al. [Supernova Search Team Collaboration], Astron. J. 116, 1009 (1998)
[arXiv:astro-ph/9805201]; S. Perlmutter et al. [Supernova Cosmology Project Collaboration],
Astrophys. J. 517, 565 (1999) [arXiv:astro-ph/9812133]; N. A. Bahcall, J. P. Ostriker, S. Perl-
mutter and P. J. Steinhardt, Science 284, 1481 (1999) [arXiv:astro-ph/9906463].
[2] S. Weinberg, Phys. Rev. Lett. 59, 2607 (1987).
[3] R. Bousso and J. Polchinski, JHEP 0006, 006 (2000) [arXiv:hep-th/0004134]; L. Susskind
arXiv:hep-th/0302219; M. R. Douglas, JHEP 0305, 046 (2003) [arXiv:hep-th/0303194];
N. Arkani-Hamed and S. Dimopoulos, JHEP 0506, 073 (2005) [arXiv:hep-th/0405159];
M. R. Douglas and S. Kachru, arXiv:hep-th/0610102.
[4] J. M. Maldacena and C. Nunez, Int. J. Mod. Phys. A 16, 822 (2001) [arXiv:hep-th/0007018];
G. W. Gibbons, “Aspects of Supergravity Theories,” lectures given at GIFT Seminar on The-
oretical Physics, San Feliu de Guixols, Spain, 1984. Print-85-0061 (CAMBRIDGE), published
in GIFT Seminar 1984:0123.
[5] G. W. Gibbons and C. M. Hull, arXiv:hep-th/0111072.
[6] P. K. Townsend and M. N. R. Wohlfarth, Phys. Rev. Lett. 91, 061302 (2003) [arXiv:hep-
th/0303097]. See also, N. Ohta, Phys. Rev. Lett. 91, 061303 (2003) [arXiv:hep-th/0303238].
[7] S. B. Giddings, S. Kachru and J. Polchinski, Phys. Rev. D 66, 106006 (2002) [arXiv:hep-
th/0105097].
[8] S. Kachru, R. Kallosh, A. Linde and S. P. Trivedi, Phys. Rev. D 68, 046005 (2003) [arXiv:hep-
th/0301240].
[9] A. Salam and E. Sezgin, Phys. Lett. B 147, 47 (1984).
[10] M. Cvetic, G. W. Gibbons and C. N. Pope, Nucl. Phys. B 677, 164 (2004) [arXiv:hep-
th/0308026].
[11] See e.g., J. J. Halliwell, Nucl. Phys. B 286, 729 (1987); Y. Aghababaie, C. P. Burgess,
S. L. Parameswaran and F. Quevedo, JHEP 0303, 032 (2003) [arXiv:hep-th/0212091];
Y. Aghababaie, C. P. Burgess, S. L. Parameswaran and F. Quevedo, Nucl. Phys. B 680,
389 (2004) [arXiv:hep-th/0304256]; G. W. Gibbons, R. Guven and C. N. Pope, Phys. Lett.
B 595, 498 (2004) [arXiv:hep-th/0307238]; Y. Aghababaie et al., JHEP 0309, 037 (2003)
[arXiv:hep-th/0308064].
[12] K. A. Olive, G. Steigman and T. P. Walker, Phys. Rept. 333, 389 (2000) [arXiv:astro-
ph/9905320]; R. Bean, S. H. Hansen and A. Melchiorri, Nucl. Phys. Proc. Suppl. 110, 167
(2002) [arXiv:astro-ph/0201127].
[13] M. Tegmark et al. [SDSS Collaboration], Phys. Rev. D 69, 103501 (2004) [arXiv:astro-
ph/0310723].
[14] D. N. Spergel et al. [WMAP Collaboration], arXiv:astro-ph/0603449.
[15] J. J. Halliwell, Phys. Lett. B 185, 341 (1987); B. Ratra and P. J. E. Peebles, Phys. Rev. D
37, 3406 (1988); P. G. Ferreira and M. Joyce, Phys. Rev. Lett. 79, 4740 (1997) [arXiv:astro-
ph/9707286]; P. G. Ferreira and M. Joyce, Phys. Rev. D 58, 023503 (1998) [arXiv:astro-
ph/9711102]; E. J. Copeland, A. R. Liddle and D. Wands, Phys. Rev. D 57, 4686 (1998)
[arXiv:gr-qc/9711068].
[16] D. Comelli, M. Pietroni and A. Riotto, Phys. Lett. B 571, 115 (2003) [arXiv:hep-ph/0302080];
U. Franca and R. Rosenfeld, Phys. Rev. D 69, 063517 (2004) [arXiv:astro-ph/0308149].
[17] R. M. Wald, “General Relativity,” (University of Chicago Press, Chicago, 1984).
[18] A similar expression was derived by J. Vinet and J. M. Cline, Phys. Rev. D 71, 064011 (2005)
[arXiv:hep-th/0501098].
[19] J. Preskill, M. B. Wise and F. Wilczek, Phys. Lett. B 120, 127 (1983).
[20] M. B. Hoffman, arXiv:astro-ph/0307350.
[21] This assumption will be justified a posteriori when we find that ρX ≪ ρmat.
[22] E. J. Copeland, A. R. Liddle and D. Wands, op. cit. in Ref. [15].
[23] L. Anchordoqui and H. Goldberg, Phys. Rev. D 68, 083513 (2003) [arXiv:hep-ph/0306084].
[24] U. J. Lopes Franca and R. Rosenfeld, JHEP 0210, 015 (2002) [arXiv:astro-ph/0206194].
[25] P. J. Steinhardt, L. M. Wang and I. Zlatev, Phys. Rev. D 59, 123504 (1999) [arXiv:astro-
ph/9812313].
[26] S. M. Carroll, Phys. Rev. Lett. 81, 3067 (1998) [arXiv:astro-ph/9806099].
[27] Y. Aghababaie, C. P. Burgess, S. L. Parameswaran and F. Quevedo, op. cit. in Ref. [15].
[28] This comes from a behavior Y ≃ −
2u (compatible with the equations of motion), when
combined with the e−3u in Eq. (30).
[29] Before proceeding, we remind the reader that the requirements for preserving a fraction of
SUSY in spherical compactifications to four dimensions imply b2ξ = 1, corresponding to
the winding number n = ±1 for the monopole configuration. In terms of the Bohm-Aharonov
argument on phases, this is consistent with usual requirement of quantization of the monopole.
The SUSY breaking has associated a non-quantized flux of the field supporting the two sphere.
In other words, if we perform a Bohm-Aharonov-like interference experiment, some phase
change will be detected by a U(1) charged particle that circulates around the associated Dirac
string. The quantization of fluxes implied the unobservability of such a phase, and so in our
cosmological set up, the parallel transport of a fermion will be slightly path dependent. One
possibility is that the non-compact ρ coordinate (in the uplift to ten dimensions, see Sec. III) is
the direction in which the Dirac string exists. Then the cutoff necessary on the physics at large
ρ will introduce a slight (time-dependent) perturbation on the flux quantization condition. We
are engaged at present in exploring possibilities along this line.
[30] This is because the definition of the propagator with proper residue for correct Feyman rules
in perturbation theory, and therefore also the couplings, needs to be consistent with the form
of the Hamiltonian =
k ω(k)a
kak, with [a, a
†] = 1. This in turn implies that the kinetic term
in the Lagrangian has the canonical form, (1/4)f̂2ab, with the usual expansion of the vector
field aa.
ABSTRACT
  We explore the cosmological content of Salam-Sezgin six dimensional
supergravity, and find a solution to the field equations in qualitative
agreement with observation of distant supernovae, primordial nucleosynthesis
abundances, and recent measurements of the cosmic microwave background. The
carrier of the acceleration in the present de Sitter epoch is a quintessence
field slowly rolling down its exponential potential. Intrinsic to this model is
a second modulus which is automatically stabilized and acts as a source of cold
dark matter with a mass proportional to an exponential function of the
quintessence field (hence realizing VAMP models within a String context).
However, any attempt to saturate the present cold dark matter component in this
manner leads to unacceptable deviations from cosmological data -- a numerical
study reveals that this source can account for up to about 7% of the total cold
dark matter budget. We also show that (1) the model will support a de Sitter
energy in agreement with observation at the expense of a miniscule breaking of
supersymmetry in the compact space; (2) variations in the fine structure
constant are controlled by the stabilized modulus and are negligible; (3)
``fifth''forces are carried by the stabilized modulus and are short range; (4)
the long time behavior of the model in four dimensions is that of a
Robertson-Walker universe with a constant expansion rate (w = -1/3). Finally,
we present a String theory background by lifting our six dimensional
cosmological solution to ten dimensions.

<|endoftext|><|startoftext|>
Introduction
A noncommutative (NC) spacetimeM is obtained by introducing a symplectic structure B = 1
Babdy
dyb and then by quantizing the spacetime with its Poisson structure θab ≡ (B−1)ab, treating it as a
quantum phase space. That is, for f, g ∈ C∞(M),
{f, g} = θab
⇒ −i[f̂ , ĝ]. (1.1)
According to the Weyl-Moyal map [1, 2], the NC algebra of operators is equivalent to the deformed
algebra of functions defined by the Moyal ⋆-product, i.e.,
f̂ · ĝ ∼= (f ⋆ g)(y) = exp
θab∂ya∂
f(y)g(z)
. (1.2)
Through the quantization rules (1.1) and (1.2), one can define NC IR2n by the following commutation
relation
[ya, yb]⋆ = iθ
ab. (1.3)
It is well-known [2, 3] that a NC field theory can be identified basically with a matrix model or a
large N field theory. This claim is based on the following fact. Let us consider a NC IR2 for simplicity,
[x, y] = iθ, (1.4)
although the same argument equally holds for a NC IR2n as it will be shown later. After scaling the
coordinates x →
θx, y →
θy, the NC plane (1.4) becomes the Heisenberg algebra of harmonic
oscillator
[a, a†] = 1. (1.5)
It is a well-known fact from quantum mechanics that the representation space of NC IR2 is given by
an infinite-dimensional, separable Hilbert space H = {|n〉, n = 0, 1, · · · } which is orthonormal, i.e.,
〈n|m〉 = δnm and complete, i.e.,
n=0 |n〉〈n| = 1. Therefore a scalar field φ̂ ∈ Aθ on the NC plane
(1.4) can be expanded in terms of the complete operator basis
Aθ = {|m〉〈n|, n,m = 0, 1, · · · }, (1.6)
that is,
φ̂(x, y) =
Mmn|m〉〈n|. (1.7)
One can regard Mmn in (1.7) as components of an N × N matrix M in the N → ∞ limit. More
generally one may replace NC IR2 by a Riemann surface Σg of genus g which can be quantized via
deformation quantization [4]. For a compact Riemann surface Σg with finite area A(Σg), the matrix
representation can be finite-dimensional, e.g., for a fuzzy sphere [5]. In this case, A(Σg) ∼ θN but
we simply take the limit N → ∞. We then arrive at the well-known relation:
Scalar field on NC IR2 (or Σg) ⇐⇒ N ×N matrix at N → ∞. (1.8)
If φ̂ is a real scalar field, then M should be a Hermitean matrix. We will see that the above relation
(1.8) has far-reaching applications to string theory.
The matrix representation (1.7) clarifies why NC U(1) gauge theory is a large N gauge theory.
An important point is that the NC gauge symmetry acts as a unitary transformation on H for a field
φ̂ ∈ Aθ in the adjoint representation of U(1) gauge group
φ̂ → Uφ̂ U †. (1.9)
This NC gauge symmetry Ucpt(H) is so large that Ucpt(H) ⊃ U(N) (N → ∞) [6, 7], which is
rather obvious in the matrix basis (1.6). Therefore the NC gauge theory is essentially a large N gauge
theory. It becomes more precise on a NC torus through the Morita equivalence where NC U(1) gauge
theory with rational θ = M/N is equivalent to an ordinary U(N) gauge theory [8]. For this reason,
it is not so surprising that NC electromagnetism shares essential properties appearing in a large N
gauge theory such as SU(N → ∞) Yang-Mills theory or matrix models.
It is well-known [9] that 1/N expansion of any large N gauge theory using the double line for-
malism reveals a picture of a topological expansion in terms of surfaces of different genus, which
can be interpreted in terms of closed string variables as the genus expansion of string amplitudes. It
has been underlain the idea that large N gauge theories have a dual description in terms of gravita-
tional theories in higher dimensions. For example, BFSS matrix model [10], IKKT matrix model [11]
and AdS/CFT duality [12]. From the perspective (1.8), the 1/N expansion corresponds to the NC
deformation in terms of θ/A(Σg).
All these arguments imply that there exists a solid map between a NC gauge theory and a large N
gauge theory. In this work we will find a sound realization of this idea. It turns out that the emergent
gravity recently found in [13, 14, 15, 16] can be elegantly understood in this framework. Therefore
the correspondence between NC field theory and gravity [3] is certainly akin to the gauge/gravity
duality in large N limit [10, 11, 12].
This paper is organized as follows. In Section 2 we map NC U(1) gauge theory on IRdC × IR2nNC to
U(N → ∞) Yang-Mills theory on IRdC , where IRdC is a d-dimensional commutative spacetime while
IR2nNC is a 2n-dimensional NC space. The resulting U(N) Yang-Mills theory on IR
C is equivalent to
that obtained by the dimensional reduction of (d + 2n)-dimensional U(N) Yang-Mills theory onto
IRdC . In Section 3, we show that the gauge-Higgs system (Aµ,Φ
a) in the U(N → ∞) Yang-Mills
theory on IRdC leads to an emergent geometry in the (d + 2n)-dimensional spacetime whose metric
was determined by Ward [17] a long time ago. In particular, the 10-dimensional gravity for d = 4 and
n = 3 corresponds to the emergent geometry arising from the 4-dimensional N = 4 vector multiplet
in the AdS/CFT duality [12]. We further elucidate the emergent gravity in Section 4 by showing that
the gauge-Higgs system (Aµ,Φ
a) in half-BPS configurations describes self-dual Einstein gravity. A
notable point is that the emergent geometry arising from the gauge-Higgs system (Aµ,Φ
a) is closely
related to the bubbling geometry in AdS space found in [18]. Finally, in Section 5, we discuss several
interesting issues that naturally arise from our construction.
2 A Large N Gauge Theory From NC U(1) Gauge Theory
We will consider a NC U(1) gauge theory on IRD = IRdC × IR2nNC , where D-dimensional coordinates
XM (M = 1, · · · , D) are decomposed into d-dimensional commutative ones, denoted as zµ (µ =
1, · · · , d) and 2n-dimensional NC ones, denoted as ya (a = 1, · · · , 2n), satisfying the relation (1.3).
We assume the metric on IRD = IRdC × IR2nNC as the following form 1
ds2 = GMNdXMdXN
= gµνdz
µdzν +Gabdy
adyb. (2.1)
The action for D-dimensional NC U(1) gauge theory is given by
4g2YM
detGGMPGNQ(FMN + ΦMN ) ⋆ (FPQ + ΦPQ), (2.2)
where the NC field strength FMN is defined by
FMN = ∂MAN − ∂NAM − i[AM , AN ]⋆. (2.3)
The constant two-form Φ will be taken either 0 or −B = −1
Babdy
a ∧ dyb with rank(B) = 2n.
Here we will use the background independent prescription [8, 19] where the open string metric
Gab, the noncommutativity θ
ab and the open string coupling Gs are determined by
θab =
, Gab = −κ2
, Gs = gs
det′(κBg−1), (2.4)
with κ ≡ 2πα′. The closed string metric gab in Eq.(2.4) is independent of gµν in Eq.(2.1) and det′
denotes a determinant taken along NC directions only in IR2nNC . In terms of these parameters, the
couplings are related by
, (2.5)
det′G
gs|Pfθ|
. (2.6)
An important fact is that translations in NC directions are basically gauge transformations, i.e.,
eik·y ⋆ f(z, y) ⋆ e−ik·y = f(z, y+ θ · k) for any f(z, y) ∈ C∞(M). This means that translations along
NC directions act as inner derivations of the NC algebra Aθ:
[ya, f ]⋆ = iθ
ab∂bf. (2.7)
1 Here we can take the d-dimensional spacetime metric gµν with either Lorentzian or Euclidean signature since the
signature is inconsequential in our most discussions. But we implicitly assume the Euclidean signature for some other
discussions.
Using this relation, each component of FMN can be written as the following forms
Fµν = i[Dµ, Dµ]⋆, (2.8)
Fµa = θ
[Dµ, x
b]⋆ = −Faµ, (2.9)
Fab = −iθ−1ac θ−1bd
[xc, xd]⋆ − iθcd
, (2.10)
where the covariant derivative Dµ and the covariant coordinate x
a are, respectively, defined by
Dµ ≡ ∂µ − iAµ, (2.11)
xa ≡ ya + θabAb. (2.12)
Collecting all these facts, one gets the following expression for the action (2.2) with Φ = −B 2
(2πκ)
detgµνTrH
gµλgνσFµν ⋆ Fλσ +
gµνgabDµΦ
a ⋆ DνΦ
gacgbd[Φ
a,Φb]⋆ ⋆ [Φ
c,Φd]⋆
, (2.13)
where we defined adjoint scalar fields Φa ≡ xa/κ of mass dimension and
TrH ≡
(2π)n|Pfθ| . (2.14)
Note that the number of the adjoint scalar fields is equal to the rank of θab. The resulting action (2.13)
is not new but rather well-known in NC field theory, e.g., see [19, 20].
The NC algebra (1.3) is equivalent to the Heisenberg algebra of an n-dimensional harmonic os-
cillator in a frame where θab has a canonical form:
[ai, a
j ] = δij , (i, j = 1, · · · , n). (2.15)
The NC space (1.3) is therefore represented by the infinite-dimensional Hilbert space H = {|~m〉 ≡
|m1, · · · , mn〉;mi = 0, 1, · · · , N → ∞ for i = 1, · · · , n} whose set of eigenvalues forms an n-
dimensional positive integer lattice. A set of operators in H
Aθ = {|~m〉〈~n|;mi, ni = 0, 1, · · · , N → ∞ for i = 1, · · · , n} (2.16)
can be identified with the generators of a complete operator basis and so any NC field φ(z, y) ∈ Aθ
can be expanded in the basis (2.16) as follows,
φ(z, y) =
~m,~n
Ω~m,~n (z)|~m〉〈~n|. (2.17)
2If Φ = 0 in Eq.(2.2), the only change in Eq.(2.13) is [Φa,Φb] → [Φa,Φb]− i
Now we use the ‘Cantor diagonal method’ to put the n-dimensional positive integer lattice in H
into a one-to-one correspondence with the infinite set of natural numbers (i.e., 1-dimensional positive
integer lattice): |~m〉 ↔ |i〉, i = 1, · · · , N → ∞. In this one-dimensional basis, Eq.(2.17) is relabeled
as the following form
φ(z, y) =
Ωij (z)|i〉〈j|. (2.18)
Following the motivation discussed in the Introduction, we regard Ωij(z) in (2.18) as components of
an N ×N matrix Ω in the N → ∞ limit, which also depend on zµ, the coordinates of IRdC . If the field
φ(z, y) is real which is the case for the gauge-Higgs system (Aµ,Φ
a) in the action (2.13), the matrix
Ω should be Hermitean, but not necessarily traceless. So the N × N matrix Ω(z) can be regarded as
a field in U(N → ∞) gauge theory on d-dimensional commutative space IRdC , where TrH in (2.14)
is identified with the matrix trace over the basis (2.18). All the dependence on NC coordinates is now
encoded into N ×N matrices and the noncommutativity in terms of star product is transferred to the
matrix product.
Adopting the matrix representation (2.18), the D-dimensional NC U(1) gauge theory (2.2) is
mapped to the U(N → ∞) Yang-Mills theory on d-dimensional commutative space IRdC . One can
see that the resulting U(N) Yang-Mills theory on IRdC in Eq.(2.13) is equivalent to that obtained by
the dimensional reduction of (d + 2n)-dimensional U(N) Yang-Mills theory onto IRdC . It might be
emphasized that the map between the D-dimensional NC U(1) gauge theory and the d-dimensional
U(N → ∞) Yang-Mills theory is “exact” and thus the two theories should describe a completely
equivalent physics. For example, we can recover the D-dimensional NC U(1) gauge theory on IRdC ×
IR2nNC from the d-dimensional U(N → ∞) Yang-Mills theory on IRdC by recalling that the number of
adjoint Higgs fields in the U(N) Yang-Mills theory is equal to the dimension of the extra NC space
IR2nNC and by applying the dictionary in Eqs.(2.8)-(2.10).
One can introduce linear algebraic conditions of D-dimensional field strengths FMN as a higher
dimensional analogue of 4-dimensional self-duality equations such that the Yang-Mills equations in
the action (2.2) follow automatically. These are of the following type [21, 22]
TMNPQFPQ = λFMN (2.19)
with a constant 4-form tensor TMNPQ. The relation (2.19) clearly implies via the Bianchi identity
D[MFPQ] = 0 that the Yang-Mills equations are satisfied provided λ is nonzero. For D > 4, the 4-
form tensor TMNPQ cannot be invariant under SO(D) transformations and the equation (2.19) breaks
the rotational symmetry to a subgroup H ⊂ SO(D). Thus the resulting first order equations can be
classified by the unbroken symmetry H under which TMNPQ remain invariant [21, 22]. It was also
shown [23] that the first order linear equations above are closely related to supersymmetric states, i.e.,
BPS states in higher dimensional Yang-Mills theories.
The equivalence between D- and d-dimensional gauge theories can be effectively used to clas-
sify classical solutions in the d-dimensional U(N) Yang-Mills theory (2.13). The group theoretical
classification [21], integrability condition [22] and BPS states [23] for the D-dimensional first-order
equations (2.19) can be directly translated into the properties of the gauge-Higgs system (Aµ,Φ
a) in
the d-dimensional U(N) gauge theory (2.13). These classifications will also be useful to classify the
geometries emerging from the gauge-Higgs system (Aµ,Φ
a) in the U(N → ∞) Yang-Mills theory
(2.13), which will be discussed in the next section. Unfortunately, the D = 10 case is missing in
[21, 22, 23] which is the most interesting case (d = 4 and n = 3) related to the AdS/CFT duality.
3 Emergent Geometry From NC Gauge Theory
Let us first recapitulate the result in [17]. It turns out that the Ward’s construction perfectly fits with
the emergent geometry arising from the gauge-Higgs system (Aµ,Φ
a) in the U(N → ∞) Yang-Mills
theory (2.13). Suppose that we have gauge fields on IRdC taking values in the Lie algebra of volume-
preserving vector fields on an m-dimensional manifold M [24, 25]. In other words, the gauge group
G = SDiff(M). The gauge covariant derivative is given by Eq.(2.11), but the Aµ(z) are now vector
fields on M , also depending on zµ ∈ IRdC . The other ingredient in [17] consists of m Higgs fields
Φa(z) ∈ sdiff(M), the Lie algebra of SDiff(M), for a = 1, · · · , m. The idea [24, 25] is to specify
f−1(D1, · · · , Dd,Φ1, · · · ,Φm) (3.1)
forms an orthonormal frame and hence defines a metric on IRdC ×M with a volume form ν = ddz∧ω.
Here f is a scalar, a conformal factor, defined by
f 2 = ω(Φ1, · · · ,Φm). (3.2)
The result in [24, 25] immediately implies that the gauge-Higgs system (Aµ,Φ
a) leads to a metric
on the (d + m)-dimensional space IRdC × M . A local coordinate expression for this metric is easily
obtained from Eq.(3.1). Let ya be local coordinates on M . So Aµ(z) and Φa(z) have the form
Aµ(z) = A
µ(z, y)
, Φa(z) = Φ
a(z, y)
, (3.3)
where the y-dependence, originally hidden in the Lie algebra of G = SDiff(M), now explicitly
appears in the coefficients Aaµ and Φ
a. Let V
b denote the inverse of the m×m matrix Φba, and let Aa
denote the 1-form Aaµdz
µ. Then the metric is [17]
ds2 = f 2δµνdz
µdzν + f 2δabV
d (dy
c −Ac)(dyd −Ad). (3.4)
It will be shown later that the choice of the volume form ω for the conformal factor (3.2) corresponds
to that of a particular conformally flat background although we mostly assume a flat volume form,
i.e., ω ∼ dy1 ∧ · · · ∧ dy2n, unless explicitly specified.
The gauge and Higgs fields in Eq.(3.3) are not arbitrary but must be subject to the Yang-Mills
equations, for example, derived from the action (2.13), which are, in most cases, not completely
integrable. Hence to completely determine the geometric structure emerging from the gauge-Higgs
system (Aµ,Φ
a) is as much difficult as solving the Einstein equations in general. But the self-dual
Yang-Mills equations in four dimensions or Eq.(2.19) in general are, in some sense, “completely
solvable”. Thus the metric (3.4) for these cases might be completely determined. Let us discuss two
notable examples. See [17] for more examples describing 4-dimensional self-dual Einstein gravity.
• Case d = 0, m = 4: This case was dealt with in detail in [25, 26, 27]. It was proved that the
self-dual Einstein equations are equivalent to the self-duality equations
[Φa,Φb] = ±
εabcd[Φc,Φd] (3.5)
on the four Higgs fields Φa. Furthermore reinterpreting n of the Φa’s as Dµ leads to the case d =
n, m = 4− n. In Section 5, we will discuss the physical meaning about the interpretation Φa 7→ Dµ.
• Case d = 3, m = 1: Here M is one-dimensional, so the Lie algebra of vector fields on M is the
Virasoro algebra. Thus Aµ and Φ are now real-valued vector fields on M which must be independent
of y to preserve the volume form ν = d3z ∧ dy [27]. The metric (3.4) reduces to
ds2 = Φd~z · d~z + Φ−1(dy −Aµdzµ)2 (3.6)
and has a Killing vector ∂/∂y. In this case, the self-duality equations (3.5) reduce to the Abelian
Bogomol’nyi equations, ∇× ~A = ∇Φ, and the metric (3.6) describes a gravitational instanton [28].
Recently we showed in [15, 16] for the d = 0 and m = 4 case that self-dual electromagnetism in
NC spacetime is equivalent to self-dual Einstein gravity and the metric is precisely given by Eq.(3.4).
A key observation [16] was that the self-dual system (3.5) defined by vector fields on M can be
derived from the action (2.2) or (2.13) for slowly varying fields, where all ⋆-commutators between
NC fields are approximated by the Poisson bracket (1.1). An important point in NC geometry is
that the adjoint action of (covariant) coordinates with respect to star product can be identified with
(generalized) vector fields on some (curved) manifold [15, 16], as the trivial case was already used
in Eq.(2.7). In the end, a D-dimensional manifold described by the metric (3.4) corresponds to an
emergent geometry arising from the gauge-Higgs system in Eq.(3.3). Now we will show in a general
context how the nontrivial geometry (3.4) emerges from the gauge-Higgs system (Aµ,Φ
a) in the
action (2.13).
Let us collectively denote the covariant derivatives Dµ in (2.11) and the Higgs fields Da ≡
−iκBabΦb = −i(Babyb +Aa) in (2.12) as DA(z, y). Therefore DA(z, y) transform covariantly under
NC U(1) gauge transformations
DA(z, y) → g(z, y) ⋆ DA(z, y) ⋆ g−1(z, y). (3.7)
Define the adjoint action of DA(z, y) with respect to star product acting on any NC field f(z, y) ∈ Aθ:
adDA[f ] ≡ [DA, f ]⋆. (3.8)
Then it is easy to see [16] that the above adjoint action satisfies the Leibniz rule and the Jocobi
identity, i.e.,
[DA, f ⋆ g]⋆ = [DA, f ]⋆ ⋆ g + f ⋆ [DA, g]⋆, (3.9)
[DA, [DB, f ]⋆]⋆ − [DB, [DA, f ]⋆]⋆ = [[DA, DB]⋆, f ]⋆. (3.10)
These properties imply that adDA can be identified with ‘generalized’ vector fields or Lie deriva-
tives acting on the algebra Aθ, which can be viewed as a gauge covariant generalization of the inner
derivation (2.7). Note that the generalized vector field in Eq.(3.8) is a kind of general higher or-
der differential operators in [29]. Indeed it turns out that they constitute a generalization of volume
preserving diffeomorphisms to ⋆-differential operators acting on Aθ (see Eqs.(4.1) and (4.2) in [7]).
In particular, the generalized vector fields in Eq.(3.8) reduce to usual vector fields in the commu-
tative, i.e. O(θ), limit:
adDA [f ] = iθ
ab∂DA
+ · · · = i{DA, f}+O(θ3)
≡ V aA(z, y)∂af(z, y) +O(θ3) (3.11)
where we defined [∂µ, f ]⋆ = ∂µf . Note that the vector fields VA(z, y) = V
A(z, y)∂a are exactly of
the same form as Eq.(3.3) and belong to the Lie algebra of volume preserving diffeomorphisms, as
precisely required in the Ward construction (3.1), since they are all divergence free, i.e., ∂aV
A = 0.
Thus the vector fields f−1VA(z, y) for A = 1, · · · , D can be identified with the orthonormal frame
(3.1) defining the metric (3.4). It should be emphasized that the emergent gravity (3.4) arises from a
general, not necessarily self-dual, gauge-Higgs system (Aµ,Φ
a) in the action (2.13).
Note that
[DA, DB]⋆ = −i(FAB − BAB) (3.12)
where the NC field strength FAB is given by Eq.(2.3). Then the Jacobi identity (3.10) leads to the
following identity for a constant BAB
ad[DA,DB]⋆ = −i adFAB = [adDA, adDB ]⋆. (3.13)
The inner derivation (3.11) in commutative limit is reduced to the well-known map C∞(M) →
TM : f 7→ Xf between the Poisson algebra (C∞(M), {·, ·}) and vector fields in TM defined by
Xf(g) = {g, f} for any smooth function g ∈ C∞(M). The Jacobi identity for the Poisson algebra
(C∞(M), {·, ·}) then leads to the Lie algebra homomorphism
X{f,g} = −[Xf , Xg] (3.14)
where the right-hand side is defined by the Lie bracket between Hamiltonian vector fields. One can
check by identifying f = DA and g = DB that the Lie algebra homomorphism (3.14) correspond
to the commutative limit of the Jacobi identity (3.10). That is, one can deduce from Eq.(3.14) the
following identity
XFAB = −[VA, VB] (3.15)
using the relation {DA, DB} = −FAB +BAB and XDA = iVA.
Using the homomorphism (3.15), one can translate the generalized self-duality equation (2.19)
into the structure equation between vector fields
TABCDFCD = λFAB ⇔
TABCD[VC , VD] = λ[VA, VB]. (3.16)
Therefore a D-dimensional NC gauge field configuration satisfying the first-order system defined by
the left-hand side of Eq.(3.16) is isomorphic to a D-dimensional emergent geometry defined by the
right-hand side of Eq.(3.16) whose metric is given by Eq.(3.4). For example, in four dimensions
where TABCD = εABCD and λ = ±1, the right-hand side of Eq.(3.16) is precisely equal to Eq.(3.5)
describing gravitational instantons [24, 25, 26, 27]. This proves, as first shown in [15, 16], that self-
dual NC electromagnetism is equivalent to self-dual Einstein gravity. Note that the Einstein gravity
described by the metric (3.4) arises from the commutative, i.e., O(θ) limit. Therefore it is natural to
expect that the higher order differential operators in Eq.(3.11), e.g. O(θ3), give rise to higher order
gravity [16]. We will further discuss the derivative correction in Section 5.
The 10-dimensional metric (3.4) for d = 4 and n = 3 (m = 6) is particularly interesting since it
corresponds to an emergent geometry arising from the 4-dimensional N = 4 vector multiplet in the
AdS/CFT duality. Note that the gravity in the AdS/CFT duality is an emergent phenomenon arising
from particle interactions in a gravityless, lower-dimensional spacetime. As a famous example, the
type IIB supergravity (or more generally the type IIB superstring theory) on AdS5 × S5 is emergent
from the 4-dimensional N = 4 supersymmetric U(N) Yang-Mills theory [12].3 In our construction,
N × N matrices are mapped to vector fields on some manifold M , so the vector fields in Eq.(3.3)
correspond to master fields of large N matrices [30], in other words, (Aµ,Φ
a) ∼ N2. According to
the AdS/CFT duality, we thus expect that the metric (3.4) describes a deformed geometry induced
by excitations of the gauge and Higgs fields in the action (2.13). For example, we may look for
1/2 BPS geometries in type IIB supergravity that arise from chiral primaries of N = 4 super Yang-
Mills theory [18]. Recently this kind of BPS geometries, the so-called bubbling geometry in AdS
space, with a particular isometry was completely determined in [18], where the AdS5 × S5 geometry
emerges from the simplest and most symmetric configuration. In next section we will illustrate such
kind of bubbling geometry described by the metric (3.4) by considering self-dual configurations in
the gauge-Higgs system.
3The overall U(1) = U(N)/SU(N) factor actually corresponds to the overall position of D3-branes and may be
ignored when considering dynamics on the branes, thereby leaving only an SU(N) gauge symmetry.
4 Self-dual Einstein Gravity From Large N Gauge Theory
In the previous section we showed that the Ward’s metric (3.4) naturally emerges from the D-dimensional
NC U(1) gauge fields AM on IR
C × IR2nNC or equivalently the gauge-Higgs system (Aµ,Φa) in d-
dimensional U(N) gauge theory on IRdC . So, if an explicit solution for AM or (Aµ,Φ
a) is known, the
corresponding metric (3.4) is, in principle, exactly determined. However, it is extremely difficult to
get a general solution by solving the equations of motion for the action (2.2) or (2.13). Instead we
may try to solve a more simpler system such as the first-order equations (2.19), which are morally
believed to be ‘exactly solvable’ in most cases. In this section we will further elucidate the emer-
gent gravity arising from gauge fields by showing that the gauge-Higgs system (Aµ,Φ
a) in half-BPS
configurations describes self-dual Einstein gravity. Since the case for D = 4 and n = 2 has been
extensively discussed in [14, 15, 16], we will consider the other cases for D ≥ 4. For simplicity,
the metrics in the action (2.13) are supposed to be the form already used in Eq.(3.4); gµν = δµν and
gab = δab.
Note that the action (2.2) or (2.13) contains a background B, due to a uniform condensation of
gauge fields in a vacuum. But we will require a rapid fall-off of fluctuating fields around the back-
ground at infinity in IRD as usual.4 Our boundary condition is FMN → 0 at infinity. Eq.(2.10) then
requires that [xa, xb]⋆ → iθab at |y| → ∞. Thus the coordinates ya in (2.12) are vacuum expectation
values of xa characterizing the uniform condensation of gauge fields [16]. This condensation of the
B-fields endows the ⋆-algebra Aθ with a remarkable property that translations act as an inner auto-
morphism of the algebra Aθ as shown in Eq.(2.7). But the gauge symmetry on NC spacetime requires
the covariant coordinates xa in Eq.(2.12) instead of ya [31]. The inner derivation adDa in Eq.(3.8) is
then a ‘dual element’ related to the coordinate xa. This is also true for the covariant derivatives Dµ
in (2.11) since they are related to Da = −iBabxb by the ‘matrix T -duality’; Da 7→ Dµ, as will be
explained in Section 5.
It is very instructive to take an analogy with quantum mechanics. Quantum mechanical time
evolution in Heisenberg picture is defined as an inner automorphism of the Weyl algebra obtained
from a quantum phase space
f(t) = eiHtf(0)e−iHt
and its evolution equation is of the form (3.8)
df(t)
= i[H, f(t)].
Here we liberally interpret DA(z, y) in Eq.(3.8) as ‘multi-Hamiltonians’ determining the spacetime
evolution in IRD. Then it is quite natural to interpret Eq.(3.8) as a spacetime evolution equation
determined by the “covariant Hamiltonians” DA(z, y).
4In the matrix representation (2.18), this means that matrix components Ωij(z) for the fluctuations are rapidly vanish-
ing for i, j = N → ∞ as well as for |z| → ∞, since roughly N ∼ ~y · ~y.
Let us be more precise about the meaning of the spacetime evolution. If the Hamiltonian is slightly
deformed, H → H + δH , the time evolution of a system is correspondingly changed. Likewise, the
fluctuation of gauge fields AM or (Aµ,Φ
a) around the background specified by ya’s changes DA(z, y),
which in turn induces a deformation of the background spacetime according to Eq.(3.11). This is
precisely the picture about the emergent geometry in [15, 16] and also a dependable interpretation of
the Ward’s geometry (3.4). A consistent picture related to the AdS/CFT duality was also observed in
the last of Section 3.
For the above reason, all equations in the following will be understood as inner derivations acting
on Aθ like as (3.8). The adjoint action defined in this way naturally removes a contribution from
the background in the action (2.2) or (2.13) [15]. For example, the first equation in (4.1) can be
consistent only in this way since the left hand side goes to zero at infinity but the right hand side
becomes ∼ θ/κ2. It might be remarked that this is the way to define the equations of motion in
the background independent formulation [8, 19] and thus it should be equivalent to the usual NC
prescription with ΦMN = 0.
4.1 D = 4 and n = 1
NC instanton solutions in this case were constructed in [32]. As was proved in Eq.(3.16), NC U(1)
instantons are in general equivalent to gravitational instantons. We thus expect that the NC self-
duality equations for D = 4 and n = 1 are mapped to self-dual Einstein equations. We will show
that the gauge-Higgs system (Aµ,Φ
a) in this case is mapped to two-dimensional U(∞) chiral model,
whose equations of motion are equivalent to the Plebański form of the self-dual Einstein equations
[33, 17, 34].
We showed in Section 2 that 4-dimensional NC U(1) gauge theory on IR2C × IR2NC is mapped
to 2-dimensional U(N → ∞) gauge theory with the action (2.13). The 4-dimensional self-duality
equations now become the U(N → ∞) Hitchin equations on IR2C :
Fµν = ±
εµν [Φ,Φ
†], DµΦ = ±iεµνDνΦ, (4.1)
where Φ = Φ1 + iΦ2. Note that the above equations also arise as zero-energy solutions in U(N)
Chern-Simons gauge theory coupled to a nonrelativistic complex scalar field in the adjoint represen-
tation [35]. It was shown in [36] that the self-dual system in Eq.(4.1) is completely solvable in terms
of Uhlenbeck’s uniton method. A NC generalization of Eq.(4.1), the Hitchin’s equations on IR2NC ,
was also considered in [37] with very parallel results to the commutative case. We will briefly discuss
the NC Hitchin’s equations in Section 5.
The equations (4.1) for the self-dual case (with + sign) can be elegantly combined into a zero-
curvature condition [35, 36] for the new connections defined by5
A+ = A+ + Φ, A− = A− − Φ† (4.2)
5Here we will relax the reality condition of the fields (Aµ,Φ
a) and complexify them.
with A± = A1 ± iA2:
F+− = ∂+A− − ∂−A+ − i[A+,A−] = 0 (4.3)
where ∂± = ∂1 ± i∂2. Thus the new gauge fields should be a pure gauge, that is, A± = ig−1∂±g for
some g ∈ GL(N, IC). Thus we can choose them to be zero, viz.
A+ = −Φ, A− = Φ†. (4.4)
Then the self-dual equations (4.1) reduce to
† + ∂−Φ + 2i[Φ,Φ
†] = 0, (4.5)
† − ∂−Φ = 0. (4.6)
Introducing another gauge fields C+ = −2Φ and C− = 2Φ†, Eq.(4.5) also becomes the zero-curvature
condition, hence C± are a pure gauge or
Φ = − i
h−1∂+h, Φ
h−1∂−h. (4.7)
A group element h(z) defines a map from IR2C to GL(N, IC) group, which is contractible to the map
from IR2C to U(N) ⊂ GL(N, IC). Then Eq.(4.6) implies that h(z) satisfies the equation in the two-
dimensional U(N) chiral model [35, 36]
−1∂−h) + ∂−(h
−1∂+h) = 0. (4.8)
Eq.(4.8) is the equation of motion derived from the two-dimensional U(N) chiral model governed
by the following Euclidean action
d2zTr ∂µh
−1∂νhδ
µν . (4.9)
A remarkable (mysterious) fact has been known [33, 17, 34] that in the N → ∞ limit the chiral
model (4.9) describes a self-dual spacetime whose equation of motion takes the Plebański form of
self-dual Einstein equations [38]. Thus, including the case of D = 4 and n = 2 in [14, 15, 16], we
have confirmed Eq.(3.16) stating that the 4-dimensional self-dual system in the action (2.2) or (2.13)
in general describes the self-dual Einstein gravity where self-dual metrics are given by Eq.(3.4).
4.2 D = 6 and n = 1
Our current work has been particularly motivated by this case since it was already shown in [39] that
SU(N) Yang-Mills instantons in the N → ∞ limit are gravitational instantons too. Since NC U(1)
instantons are also gravitational instantons as we showed before, it implies that there should be a close
relationship between SU(N) Yang-Mills instantons and NC U(1) instantons. A basic observation was
the relation (1.8), which leads to the sound realization in Eq.(2.13). But we will simply follow the
argument in [39] for the gauge group G = U(N); in the meantime, we will confirm the results for the
emergent geometry from NC gauge fields.
Let us look at the instanton solution in U(N) Yang-Mills theory. The self-duality equation is
given by
Fµν = ±
εµναβFαβ (4.10)
where the field strength is defined by
Fµν = ∂µAν − ∂νAµ − i[Aµ, Aν ]. (4.11)
In terms of the complex coordinates and the complex gauge fields defined by
(x2 + ix1), z2 =
(x4 + ix3),
Az1 = A2 − iA1, Az2 = A4 − iA3,
Eq.(4.10) can be written as
Fz1z2 = 0 = Fz̄1z̄2 , (4.12)
Fz1z̄1 ∓ Fz2z̄2 = 0. (4.13)
Now let us consider the anti-self-dual (ASD) case. We first notice that Fz1z2 = 0 implies that there
exists a u(N)-valued function g such that Aza = ig
−1∂zag (a = 1, 2). Therefore one can choose a
gauge
Aza = 0. (4.14)
Under the gauge (4.14), the ASD equations lead to
∂z̄1Az̄2 − ∂z̄2Az̄1 − i[Az̄1 , Az̄2 ] = 0, (4.15)
∂z1Az̄1 + ∂z2Az̄2 = 0. (4.16)
First notice a close similarity with Eqs.(4.5) and (4.6). Eq.(4.16) can be solved by introducing a
u(N)-valued function Φ such that
Az̄1 = −∂z2Φ, Az̄2 = ∂z1Φ. (4.17)
Substituting (4.17) into (4.15) one finally gets
(∂z1∂z̄1 + ∂z2∂z̄2)Φ− i[∂z1Φ, ∂z2Φ] = 0. (4.18)
Adopting the correspondence (1.8), we now regard Φ ∈ u(N)⊗C∞(IR4) in Eq.(4.18) as a smooth
function on IR4 × Σg, i.e., Φ = Φ(xµ, p, q) where (p, q) are local coordinates of a two-dimensional
Riemann surface Σg. Moreover, a Lie algebra commutator is replaced by the Poisson bracket (1.1)
{f, g} = ∂f
that is,
[Φ1,Φ2] → i{Φ1,Φ2}, (4.19)
where we absorbed θ into the coordinates (p, q). After all, the ASD Yang-Mills equation (4.18) in
the large N limit is equivalent to a single nonlinear equation in six dimensions parameterized by
(xµ, p, q):
(∂z1∂z̄1 + ∂z2∂z̄2)Φ + {∂z1Φ, ∂z2Φ} = 0. (4.20)
Since Eq.(4.20) is similar to the well-known second heavenly equation [38], it was called in [39] as
the six dimensional version of the second heavenly equation.
Starting from U(N) Yang-Mills instantons in four dimensions, we arrived at the nonlinear dif-
ferential equation for a single function in six dimensions. It is important to notice that the resulting
six-dimensional theory is a NC field theory since the Riemann surface Σg carries a symplectic struc-
ture inherited from the u(N) Lie algebra through Eq.(4.19) and it can be quantized in general via
deformation quantization [4]. Since the function Φ in (4.20) is a master field of N ×N matrices [30],
so Φ ∼ N2, the AdS/CFT duality [12] implies that the master field Φ describes a six-dimensional
emergent geometry induced by Yang-Mills instantons.
To see the emergent geometry, consider an appropriate symmetry reduction of Eq.(4.20) to show
that it describes self-dual gravity in four dimensions. There are many reductions from six to four
dimensional subspace leading to self-dual four-manifolds [39]. A common feature is that the four
dimensional subspace necessarily contains the NC Riemann surface Σg. We will show later how the
symmetry reduction naturally arises from the BPS condition in six dimensions. As a specific example,
we assume the following symmetry,
∂z1Φ = ∂z̄1Φ, ∂z2Φ = ∂z̄2Φ, (4.21)
Φ(z1, z2, z̄1, z̄2, p, q) = Λ(z1 + z̄1 ≡ x, z2 + z̄2 ≡ y, p, q). (4.22)
Then Eq.(4.20) is precisely equal to the Husain’s equation [34] which is the reduction of self-dual
Einstein equations to the sdiff(Σg) chiral field equations in two dimensions:
Λxx + Λyy + ΛxqΛyp − ΛxpΛyq = 0. (4.23)
Note that we already encountered in Section 4.1 the two-dimensional sdiff(Σg) chiral field equations
since sdiff(Σg) ∼= u(N) according to the correspondence (1.8). We showed in [16] that Eq.(4.23)
can be transformed to the first heavenly equation [38] which is a governing equation of self-dual
Einstein gravity. In the end we conclude that self-dual U(N) Yang-Mills theory in the large N limit
is equivalent to self-dual Einstein gravity.
Now it is easy to see that the self-dual Einstein equation (4.23) is coming from a 1/2 BPS equation
in six dimensions (see Eq.(34) in [23]) defined by the first-order equation (2.19). According to our
construction, the six-dimensional NC U(1) gauge theory (2.2) is equivalent to the four-dimensional
U(N) gauge theory (2.13). Therefore six-dimensional BPS equations can be equivalently described
by the gauge-Higgs system (Aµ,Φ
a) in the action (2.13). Let us newly denote the NC coordinates
y1, y2 and commutative ones z3, z4 as uα, α = 1, 2, 3, 4 while z1, z2 as vA, A = 1, 2. The 1/2 BPS
equations, Eq.(34), in [23] can then be written as the following form
Fαβ = ±
εαβγδFγδ, (4.24)
FαA = FAB = 0. (4.25)
Using Eqs.(2.8)-(2.10), the above equations can be rewritten in terms of (Aµ,Φ
a) where the constant
term in (2.10) can simply be dropped for the reason explained before.
FAB = 0 in Eq.(4.25) can be solved by AB = 0 (B = 1, 2) and then FαA = 0 demand that the
gauge fields Aα should not depend on v
A. Thereby Eq.(4.24) precisely reduces to the self-duality
equation (4.1) for D = 4 and n = 1. The symmetry reduction considered above is now understood
as the condition (4.25); in specific, the coordinates vA correspond to i(z1 − z̄1) and i(z2 − z̄2) for
the reduction (4.22). However there are many different choices taking a four-dimensional subspace
in Eq.(4.24) which are related by SO(6) rotations [23]. Unless vA ∈ (y1, y2), that is, Eq.(4.24)
becomes commutative Abelian equations in which there is no non-singular solution, Eqs.(4.24) and
(4.25) reduce to four-dimensional self-dual Einstein equations, as was shown in [39]. The above BPS
equations also clarify why the two-dimensional chiral equations in Section 4.1 reappear in Eq.(4.23).
4.3 D = 8 and D = 10
The analysis for the first-order system (2.19) becomes much more complicated in higher dimensions.
The unbroken supersymmetries in D = 8 have been analyzed in [23]. Because the integrable structure
of Einstein equations in higher dimensions is little known, it is difficult to precisely identify governing
geometrical structures emergent from the gauge theory (2.2) or (2.13) even for BPS states. Neverthe-
less some BPS configurations can be easily implemented as follows. As we did in Eqs.(4.24)-(4.25),
one can imbed the 4-dimensional self-dual system for n = 1 or n = 2 into eight or ten dimensions.
The simplest case is that the metric (3.4) becomes (locally) a product manifold M4×X where M4 is
a self-dual (hyper-Kähler) four-manifold. For example, we can consider an eight-dimensional config-
uration where (A1, A2,Φ
3,Φ4) depend only on (z1, z2, y3, y4) coordinates while (Φ1,Φ2, A3, A4) do
only on (y1, y2, z3, z4) in a B-field background with θ12 6= 0 and θ34 6= 0, only non-vanishing com-
ponents. There are many similar configurations. We will not exhaust them, instead we will consider
the simplest cases which already have some relevance to other works.
The simplest BPS state in D = 8 is the case with n = 2 in the action (2.13); see Eq.(55) in [23].
The equations are of the form
Fµν = ±
εµνλσFλσ, (4.26)
[Φa,Φb] = ±1
εabcd[Φ
c,Φd], (4.27)
a = 0. (4.28)
A solution of Eq.(4.28) is given by Aµ = Aµ(z) and Φ
a = Φa(y). Then Eq.(4.26) becomes com-
mutative Abelian equations which allow no non-singular solutions, while (4.27) reduces to Eq.(3.5)
describing 4-dimensional self-dual manifolds [15]. Thus the metric (3.4) in this case leads to a half-
BPS geometry IR4×M4. Since we don’t need instanton solutions in Eq.(4.26), we may freely replace
IR4 by 4-dimensional Minkowski space IR1,3 (see the footnote 1).
The above system was considered in [40] in the context of D3-D7 brane inflationary model. The
model consists of a D3-brane parallel to a D7-brane at some distance in the presence of Fab =
(B + F )ab on the worldvolume of the D7-brane, but transverse to the D3-brane. The F -field plays
the role of the Fayet-Illiopoulos term from the viewpoint of the D3-brane worldvolume field theory.
Because of spontaneously broken supersymmetry in de Sitter valley the D3-brane is attracted towards
the D7-brane and eventually it is dissolved into the D7-brane as a NC instanton. The system ends in
a supersymmetric Higgs phase with a smooth instanton moduli space. An interesting point in [40] is
that there is a relation between cosmological constant in spacetime and noncommutativity in internal
space. Our above result adds a geometrical picture that the internal space after tachyon condensation
is developed to a gravitational instanton, e.g., an ALE space or K3.
Another interesting point, not mentioned in [40], is that it effectively realizes the dynamical com-
pactification of extra dimensions suggested in [41]. Since the D3-brane is an instanton inside the
D7-brane, particles living in the D3-brane are trapped in the core of the instanton with size ∼ θ2
where the noncommutativity scale θ is believed to be roughly Planck scale. Since the instanton
(D3-brane) results in a spontaneous breaking of translation symmetry and supersymmetry partially,
Goldstone excitations corresponding to the broken bosonic and fermionic generators are zero-modes
trapped in the core of the instanton. “Quarks” and “leptons” might be identified with these fermionic
zero-modes [41].
We argued in the last of Section 3 that the 10-dimensional metric (3.4) for d = 4 and n = 3
reasonably corresponds to an emergent geometry arising from the 4-dimensional N = 4 supersym-
metric U(N) Yang-Mills theory. Especially it may be closely related to the bubbling geometry in AdS
space found by Lin, Lunin and Maldacena (LLM) [18]. One may notice that the LLM geometry is
a bubbling geometry deformed from the AdS5 × S5 background which can be regarded as a vacuum
manifold emerging from the self-dual RR five-form background, while the Ward’s geometry (3.4) is
defined in a 2-form B-field background and becomes (conformally) flat if all fluctuations are turned
off, say, (Aµ,Φ
a) → (0, ya/κ). But it turns out that the LLM geometry is a special case of the Ward’s
geometry (3.4).
To see this, recall that the AdS5 × S5 background is conformally flat, i.e.,
ds2 =
(ηµνdz
µdzν + dyadya) =
(ηµνdz
µdzν + dρ2) + L2dΩ25 (4.29)
where ρ2 =
a=1 y
aya and dΩ25 is the spherically symmetric metric on S
5. It is then easy to see that
the metric (4.29) is exactly the vacuum geometry of Eq.(3.4) when the volume form ω in Eq.(3.2) is
given by
dy1 ∧ · · · ∧ dy6
. (4.30)
Therefore it is obvious that the Ward’s metric (3.4) with the volume form (4.30) describes a bubbling
geometry which approaches to the AdS5 × S5 space at infinity where fluctuations are vanishing,
namely, (Aµ,Φ
a) → (0, ya/κ). Note that the flat spacetime IR1,9 is coming from the volume form
ω = dy1 ∧ · · · ∧ dy6, so Eq.(4.30) should correspond to some nontrivial soliton background from the
gauge theory point of view. We will discuss in Section 5 a possible origin of the volume form (4.30).
Now let us briefly summarize half-BPS geometries of type IIB string theory corresponding to
the chiral primaries of N = 4 super Yang-Mills theory [18]. These BPS states are giant graviton
branes which wrap an S3 in AdS5 or an S̃
3 in S5. Thus the geometry induced (or back-reacted)
by the giant gravitons preserves SO(4) × SO(4) × R isometry. It turns out that the solution is
completely determined by a single function which is specified with two types of boundary conditions
on a particular plane corresponding to either of two different spheres shrinking on the plane in a
smooth fashion. The LLM solutions are thus in one-to-one correspondence with various 2-colorings
of a 2-plane, usually referred to as ‘droplets’ and the geometry depends on the shape of the droplets.
The droplet describing gravity solutions turns out to be the same droplet in the phase space describing
free fermions for the half-BPS states.
The solutions can be analytically continued to those with SO(2, 2) × SO(4) × U(1) symmetry
[18], so the solutions have the AdS3 × S3 factor rather than S3 × S̃3. After an analytic continuation,
a underlying 4-dimensional geometry M4 attains a nice geometrical structure at asymptotic region,
where AdS3 × S3 → IR1,5 and M4 reduces to a hyper-Kähler geometry. But it loses the nice picture
in terms of fermion droplet since the solution is now specified by one type of boundary condition. It
is interesting to notice that the asymptotic bubbling geometry for the type IIB case is the Gibbons-
Hawking metric [28] and the real heaven metric [42] for the M theory case, which are all solutions of
NC electromagnetism [15, 16].
It is quite demanding to completely determine general half-BPS geometries emerging from the
gauge-Higgs system in the action (2.13). Hence we will look at only an asymptotic geometry (or a
local geometry) which is relatively easy to identify. For the purpose, we consider the n = 3 case on
4-dimensional Minkowski space IR1,3. It is simple to mimic the previous half-BPS configurations in
D = 6, 8 with trivial extra Higgs fields. Then the resulting metric (3.4) will be locally of the form
M4 × IR1,5 akin to the asymptotic bubbling geometry. However M4 can be a general hyper-Kähler
manifold. Therefore the solutions we get will be more general, whose explicit form will depend on
underlying Killing symmetries and boundary conditions. For example, the type IIB case is given by a
hyper-Kähler geometry with one translational Killing vector (Gibbons-Hawking) while the M theory
case is with one rotational Killing vector (real heaven) [43]. Therefore we may get in general bubbling
geometries in the M theory as well as the type IIB string theory.
5 Discussion
We showed reasonable evidences that the 10-dimensional metric (3.4) for d = 4 and n = 3 describes
the emergent geometry arising from the 4-dimensional N = 4 supersymmetric U(N) Yang-Mills
theory and thus might explain the AdS/CFT duality [12]. An important point in this context is that
the volume form (4.30) is required to describe the AdS5 × S5 background. What is the origin of this
nontrivial volume form ? In other words, how to realize the self-dual RR five-form background from
the gauge theory point of view ?
To get some hint about the question, first note that the AdS5 × S5 geometry emerges from multi-
instanton collective coordinates which dominates the path integral in a large N limit [44]. The factor
d4zdρρ−5 appears in bosonic collective coordinate integration (with zµ the instanton 4-positions)
which agrees with the volume form of the conformally invariant space AdS5, where instanton size
corresponds to the radial coordinate ρ in Eq.(4.29). Another point is that the AdS5 × S5 space cor-
responds to the LLM geometry for the simplest and most symmetric configuration which reduces to
the usual Gibbons-Hakwing metric (3.6) at asymptotic regions [18]. This result is consistent with
the picture in Section 4.2 that U(N) instantons at large N limit are indeed gravitational instantons.
It is then tempted to speculate that the AdS5 × S5 geometry would be emerging from a maximally
supersymmetric instanton solution of Eq.(2.19) in D = 10. It should be an interesting future work.
In addition, we would like to point out that an AdSp × Sq background arises from Eq.(3.4) in the
same way as Eq.(4.30) by choosing the volume form ω as follows
dy1 ∧ · · · ∧ dyq+1
(5.1)
with ρ2 =
a=1 y
aya and (Aµ,Φ
a) = (0, ya/κ). A particularly interesting case is d = 2 and n = 2
for which the volume form (5.1) leads to the AdS3 × S3 background and the action (2.13) describes
matrix strings [45, 46]. We believe that the metric (3.4) with ω = dy1 ∧ · · · ∧ dy4/ρ2 describes a
bubbling geometry emerging from the matrix strings.
One might already notice a subtle difference between the matrix action (2.13) and the Ward’s
metric (3.4). According to our construction in Section 2, the number of the Higgs fields Φa is even
while the Ward construction has no such restriction. But it was shown in [15, 16] that the Gibbons-
Hawking metric (3.6) for the d = 3 and m = 1 case also arises from the d = 0 and m = 4. It implies
that we can replace some transverse scalars by gauge fields and vice versa. Recalling that the fields
in the action (2.13) are all N × N matrices, of course, N → ∞, it is precisely ‘matrix T -duality’
exchanging transverse scalars and gauge fields associated with a compact direction in p-brane and
(p+ 1)-brane worldvolume theories through (see Eq.(154) in [46])
Φa ↔ iDµ = i(∂µ − iAµ). (5.2)
With this identification, the d-dimensional U(N) gauge theory (2.13) can be obtained by applying the
d-fold ‘matrix T -duality’ (5.2) to the 0-dimensional IKKT matrix model [11, 20]
S = −2πκ
gMPgNQ[Φ
M ,ΦN ][ΦP ,ΦQ]
. (5.3)
However, the T -duality (5.2) gives rise to qualitatively radical changes in worldvolume theory.
First it changes the dimensionality of the theory and thus it affects its renormalizability (see Sec. VI
in [46] and references therein for this issue in Matrix theory). For example, the action (2.13) for
d > 4 is not renormalizable since the coupling constant g2YM ∼ gsκ
2 ∼ gsm4−ds has negative mass
dimension in this case. Second it also changes a behavior of the emergent metric (3.4). But these
changes are rather consistent with the fact that under the T -duality (5.2) a Dp-brane is transformed
into a D(p+ 1)-brane and vice versa.
Our construction in Section 2 raises a bizarre question about the renormalization property of NC
field theory. If we look at the action (2.2), the theory superficially seems to be non-renormalizable for
D > 4 since the coupling constant (2.5) has negative mass dimension. But this non-renormalizability
appears as a fake if we use the matrix representation (2.18) together with the redefinition of variables
in Eq.(2.4). The resulting coupling constant, denoted as gd, in the final action (2.13) depends only on
the dimension of commutative spacetime rather than the entire spacetime. Since the resulting U(N)
theory is in the limit N → ∞, while the ’t Hooft coupling λ ≡ g2dN is kept fixed, planar diagrams
dominate in this limit [9]. Since the dependence of NC coordinates in the action (2.2) has been
encoded into the matrix degrees of freedom, one may suspect that the divergence of the original theory
might appear as a divergence of perturbation series as a whole in the action (2.13). The convergence
aspect of the planar perturbation theory concerns Np(n), the number of planar diagrams in nth order
in λ. It was shown in [47] that Np(n) behaves asymptotically as
Np(n)
n→∞∼ cn, c = constant. (5.4)
Therefore the planar theory (unlike the full theory) for d ≤ 4 has a formally convergent perturbation
series, provided the ultraviolet and infrared divergences of individual diagrams are cut off [47]. It will
be interesting to carefully examine the renormalization property of NC field theories along this line.
We showed in Section 3 that the Ward metric (3.4) is emerging from commutative, i.e., O(θ),
limit. Since the vector fields in Eq.(3.11) are in general higher order differential operators acting on
Aθ, we thus expect that they actually define a ‘generalized gravity’ beyond Einstein gravity, e.g., the
NC gravity [29] or the NC unimodular gravity [48].6 It was shown in [16] that the leading derivative
6The latter seems to be quite relevant to our emergent gravity since the vector fields VM in Eq.(3.11) always belong to
the volume preserving diffeomorphisms, which is a generic property of vector fields defined in NC spacetime. It should
be interesting to more clarify the relation between the NC unimodular gravity [48] and the emergent gravity.
corrections in NC gauge theory start with four derivatives, which was conjectured to give rise to higher
order gravity. As was explicitly checked for the self-dual case, Einstein gravity maybe emerges from
NC gauge fields in commutative limit, which then implies that the leading derivative corrections give
rise to higher order terms with four more derivatives compared to the Einstein gravity. This means
that the higher order gravity starts from the second order corrections in θ with higher derivatives, that
is, no first order correction in θ to the Einstein gravity. Interestingly this result is consistent with those
in [29] and also in [49] calculated from the context of NC gravity.
It was shown in Section 4.1 that the self-duality system for the D = 4 and n = 1 case is mapped
to the two-dimensional U(∞) chiral model (4.9) which is remarkably equivalent to self-dual Einstein
gravity [33, 17, 34]. But this case should not be much different from the D = 4 and n = 2 case
in [15, 16] since they equally describe the self-dual Einstein gravity. Indeed we can make them
bear a close resemblance each other. For the purpose, let us consider a four-dimensional NC space
IR2NC × IR2NC . We can choose the matrix representation (2.18) only for the second factor, i.e.,
φ(y1, y2, y3, y4) =
Ωij (y
1, y2)|i〉〈j|. (5.5)
As a result, the action (2.13) now becomes two-dimensional U(N) gauge theory on IR2NC . The self-
dual equations in Eq.(4.1) in this case are given by the NC Hitchin equations, now defined on IR2NC
instead of IR2C . The NC Hitchin equations have been considered by K. Lee in [37] with very parallel
results with the commutative case (4.1). It is interesting that there exist two different realizations for
self-dual Einstein gravity, whose relationship should be more closely understood.
Finally it will be interesting to consider a compact NC space instead of IR2nNC , for instance, a
NC 2n-torus T2nNC . Since the module over a NC torus is still infinite dimensional [8], the matrix
representation (2.18) is also infinite dimensional. Thus we expect that our construction in Section 2
and 3 can be applied even to the NC torus without many essential changes.
Acknowledgments
After posting this paper to the arXiv, we were informed of related works on YM-Higgs BPS configu-
rations on NC spaces [50] and on the relation between a large N gauge theory, a Moyal deformation
and a self-dual gravity [51] by O. Lechtenfeld and C. Castro, respectively. We thank them for the
references. This work was supported by the Alexander von Humboldt Foundation.
References
[1] M. R. Douglas and N. A. Nekrasov, Noncommutative field theory, Rev. Mod. Phys. 73, 977
(2001), hep-th/0106048; R. J. Szabo, Quantum field theory on noncommutative spaces,
Phys. Rep. 378, 207 (2003), hep-th/0109162.
[2] R. J. Szabo, Symmetry, gravity and noncommutativity, Class. Quantum Grav. 23, R199 (2006),
hep-th/0606233.
[3] H. S. Yang, On The Correspondence Between Noncommuative Field Theory And Gravity, Mod.
Phys. Lett. A22, 1119 (2007), hep-th/0612231.
[4] M. Kontsevich, Deformation Quantization of Poisson Manifolds, Lett. Math. Phys. 66, 157
(2003), q-alg/9709040.
[5] J. Madore, The fuzzy sphere, Class. Quantum Grav. 9, 69 (1992).
[6] J. A. Harvey, Topology of the Gauge Group in Noncommutative Gauge Theory,
hep-th/0105242.
[7] F. Lizzi, R. J. Szabo and A. Zampini, Geometry of the gauge algebra in noncommutative Yang-
Mills theory, J. High Energy Phys. 08, 032 (2001), hep-th/0107115.
[8] N. Seiberg and E. Witten, String theory and noncommutative geometry, J. High Energy Phys.
09, 032 (1999), hep-th/9908142.
[9] G. ’t Hooft, A planar diagram theory for strong interactions, Nucl. Phys. B72, 461 (1974).
[10] T. Banks, W. Fischler, S. H. Shenker and L. Susskind, M theory as a matrix model: A conjecture,
Phys. Rev. D55, 5112 (1997), hep-th/9610043.
[11] N. Ishibashi, H. Kawai, Y. Kitazawa and A. Tsuchiya, A large-N reduced model as superstring,
Nucl. Phys. B498, 467 (1997), hep-th/9612115.
[12] J. M. Maldacena, The Large N Limit of Superconformal Field Theories and Supergravity, Adv.
Theor. Math. Phys. 2, 231 (1998); Int. J. Theor. Phys. 38, 1113 (1999), hep-th/9711200;
S. S. Gubser, I. R. Klebanov and A. M. Polyakov, Gauge Theory Correlators from Non-Critical
String Theory, Phys. Lett. B428, 105 (1998), hep-th/9802109; E. Witten, Anti De Sitter
Space And Holography, Adv. Theor. Math. Phys. 2, 253 (1998), hep-th/9802150.
[13] M. Salizzoni, A. Torrielli and H. S. Yang, ALE spaces from noncommutative U(1) instantons
via exact Seiberg-Witten map, Phys. Lett. B634, 427 (2006), hep-th/0510249.
[14] H. S. Yang and M. Salizzoni, Gravitational Instantons from Gauge Theory, Phys. Rev. Lett. 96,
201602 (2006), hep-th/0512215.
http://arxiv.org/abs/hep-th/0106048
http://arxiv.org/abs/hep-th/0109162
http://arxiv.org/abs/hep-th/0606233
http://arxiv.org/abs/hep-th/0612231
http://arxiv.org/abs/q-alg/9709040
http://arxiv.org/abs/hep-th/0105242
http://arxiv.org/abs/hep-th/0107115
http://arxiv.org/abs/hep-th/9908142
http://arxiv.org/abs/hep-th/9610043
http://arxiv.org/abs/hep-th/9612115
http://arxiv.org/abs/hep-th/9711200
http://arxiv.org/abs/hep-th/9802109
http://arxiv.org/abs/hep-th/9802150
http://arxiv.org/abs/hep-th/0510249
http://arxiv.org/abs/hep-th/0512215
[15] H. S. Yang, Instantons and Emergent Geometry, hep-th/0608013.
[16] H. S. Yang, Emergent Gravity from Noncommutative Spacetime, hep-th/0611174.
[17] R. S. Ward, The SU(∞) chiral model and self-dual vacuum spaces, Class. Quantum Grav. 7,
L217 (1990).
[18] H. Lin, O. Lunin and J. Maldacena, Bubbling AdS space and 1/2 BPS geometries, J. High
Energy Phys. 10, 025 (2004), hep-th/0409174; H. Lin and J. Maldacena, Fivebranes from
gauge theory, Phys. Rev. D74, 084014 (2006), hep-th/0509235.
[19] N. Seiberg, A note on background independence in noncommutative gauge theories, matrix
model and tachyon condensation, J. High Energy Phys. 09, 003 (2000), hep-th/0008013.
[20] H. Aoki, N. Ishibashi, S. Iso, H. Kawai, Y. Kitazawa and T. Tada, Non-commutative Yang-Mills
in IIB matrix model, Nucl. Phys. B565 (2000) 176, hep-th/9908141.
[21] E. Corrigan, C. Devchand, D. B. Fairlie and J. Nuyts, First-order equations for gauge fields in
spaces of dimension greater than four, Nucl. Phys. B214, 452 (1983).
[22] R. S. Ward, Completely solvable gauge-field equations in dimension greater than four, Nucl.
Phys. B236, 381 (1984).
[23] D. Bak, K. Lee and J.-H. Park, BPS equations in six and eight dimensions, Phys. Rev. D66,
025021 (2002), hep-th/0204221.
[24] A. Ashtekar, T. Jabobson and L. Smolin, A New Characterization Of Half-Flat Solutions to
Einstein’s Equation, Commun. Math. Phys. 115, 631 (1988).
[25] L. J. Mason and E. T. Newman, A Connection Between the Einstein and Yang-Mills Equations,
Commun. Math. Phys. 121, 659 (1989).
[26] S. Chakravarty, L. Mason and E. T. Newman, Canonical structures on anti-self-dual four-
manifolds and the diffeomorphism group, J. Math. Phys. 32, 1458 (1991).
[27] D. D. Joyce, Explicit Construction of Self-dual 4-Manifolds, Duke Math. J. 77, 519 (1995).
[28] G. W. Gibbons and S. W. Hawking, Gravitational Multi-instantons, Phys. Lett. 78B, 430 (1978).
[29] P. Aschieri, C. Blohmann, M. Dimitrijevic, F. Meyer, P. Schupp and J. Wess, A gravity theory on
noncommutative spaces, Class. Quant. Grav. 22, 3511 (2005), hep-th/0504183; P. Aschieri,
M. Dimitrijevic, F. Meyer and J. Wess, Noncommutative geometry and gravity, Class. Quant.
Grav. 23, 1883 (2006), hep-th/0510059.
http://arxiv.org/abs/hep-th/0608013
http://arxiv.org/abs/hep-th/0611174
http://arxiv.org/abs/hep-th/0409174
http://arxiv.org/abs/hep-th/0509235
http://arxiv.org/abs/hep-th/0008013
http://arxiv.org/abs/hep-th/9908141
http://arxiv.org/abs/hep-th/0204221
http://arxiv.org/abs/hep-th/0504183
http://arxiv.org/abs/hep-th/0510059
[30] R. Gopakumar and D. J. Gross, Mastering the master field, Nucl. Phys. B451, 379 (1995),
hep-th/9411021; I. Ya. Aref’eva and I. V. Volovich, The master field for QCD and q-
deformed quantum field theory, Nucl. Phys. B462, 600 (1996), hep-th/9510210.
[31] J. Madore, S. Schraml, P. Schupp and J. Wess, Gauge theory on noncommutative spaces, Eur.
Phys. J. C16, 161 (2000), hep-th/0001203.
[32] C.-S. Chu, V. V. Khoze and G. Travaglini, Notes on noncommutative instantons, Nucl. Phys.
B621, 101 (2002), hep-th/0108007; K.-Y. Kim, B.-H. Lee and H. S. Yang, Noncommuta-
tive instantons on R2NC ×R2C , Phys. Lett. B523, 357 (2001), hep-th/0109121.
[33] Q-H. Park, Self-dual Gravity as A Large-N Limit of the 2D Non-linear Sigma Model, Phys.
Lett. B238, 287 (1990).
[34] V. Husain, Self-Dual Gravity and the Chiral Model, Phys. Rev. Lett. 72, 800 (1994),
gr-qc/9402020.
[35] G. V. Dunne, R. Jackiw, S.-Y. Pi and C. A. Trugenberger, Self-dual Chern-Simons solitons and
two-dimensional nonlinear equations, Phys. Rev. D43, 1332 (1991); Erratum-ibid. D45, 3012
(1992).
[36] G. V. Dunne, Chern-Simons solitons, Toda theories and the chiral model, Commun. Math. Phys.
150, 519 (1992), hep-th/9204056.
[37] K.-M. Lee, Chern-Simons solitons, chiral model, and (affine) Toda model on noncommutative
space, J. High Energy Phys. 08, 054 (2004), hep-th/0405244.
[38] J. F. Plebañski, Some solutions of complex Einstein equations, J. Math. Phys. 16, 2395 (1575).
[39] J. F. Plebański and M. Przanowski, The Lagrangian of a self-dual gravitational field as a limit of
the SDYM Lagrangian, Phys. Lett. A212, 22 (1996), hep-th/9605233.
[40] K. Dasgupta, C. Herdeiro, S. Hirano and R. Kallosh, D3-D7 inflationary model and M theory,
Phys. Rev. D65, 126002 (2002), hep-th/0203019.
[41] G. Dvali and M. Shifman, Dynamical compactification as a mechanism of spontaneous super-
symmetry breaking, Nucl. Phys. B504, 127 (1997), hep-th/9611213.
[42] C. P. Boyer and J. D. Finley, III, Killing vectors in self-dual, Euclidean Einstein spaces, J. Math.
Phys. 23, 1126 (1982).
[43] I. Bakas and K. Sfetsos, Toda Fields of SO(3) Hyper-Kahler Metrics and Free Field Realizations,
Int. J. Mod. Phys. A12, 2585 (1997), hep-th/9604003.
http://arxiv.org/abs/hep-th/9411021
http://arxiv.org/abs/hep-th/9510210
http://arxiv.org/abs/hep-th/0001203
http://arxiv.org/abs/hep-th/0108007
http://arxiv.org/abs/hep-th/0109121
http://arxiv.org/abs/gr-qc/9402020
http://arxiv.org/abs/hep-th/9204056
http://arxiv.org/abs/hep-th/0405244
http://arxiv.org/abs/hep-th/9605233
http://arxiv.org/abs/hep-th/0203019
http://arxiv.org/abs/hep-th/9611213
http://arxiv.org/abs/hep-th/9604003
[44] M. Bianchi, M. B. Green, S. Kovacs and G. Rossi, Instantons in supersymmetric Yang-
Mills and D-instantons in IIB superstring theory, J. High Energy Phys. 08, 013 (1998),
hep-th/9807033; N. Dorey, T. J. Hollowood, V. V. Khoze, M. P. Mattis and S. Vandoren,
Multi-instanton calculus and the AdS/CFT correspondence in N = 4 superconformal field the-
ory, Nucl. Phys. B552, 88 (1999), hep-th/9901128.
[45] L. Motl, Proposals on nonperturbative superstring interactions, hep-th/9701025; R. Di-
jkgraaf, E. Verlinde and H. Verlinde, Matrix string theory, Nucl. Phys. B500, 43(1997),
hep-th/9703030.
[46] W. Taylor, M(atrix) theory: matrix quantum mechanics as a fundamental theory, Rev. Mod.
Phys. 73, 419 (2001), hep-th/0101126.
[47] J. Koplik, A. Neveu and S. Nussinov, Some aspects of the planar perturbation series, Nucl. Phys.
B123, 109 (1977).
[48] X. Calmet and A. Kobakhidze, Noncommutative general relativity, Phys. Rev. D72, 045010
(2005), hep-th/0506157.
[49] P. Mukherjee and A. Saha, Note on the noncommutative correction to gravity, Phys. Rev. D74,
027702 (2006), hep-th/0605287; X. Calmet and A. Kobakhidze, Second order noncommu-
tative corrections to gravity, Phys. Rev. D74, 047702 (2006), hep-th/0605275; R. Banerjee,
P. Mukherjee and S. Samanta, Lie algebraic Noncommutative Gravity, Phys. Rev. D75, 125020
(2007), hep-th/0703128.
[50] O. Lechtenfeld, A. D. Popov and R. J. Szabo, Noncommutative instantons in higher dimensions,
vortices and topological K-cycles, J. High Energy Phys. 12, 022 (2003), hep-th/0310267;
A. V. Domrin, O. Lechtenfeld and S. Petersen, Sigma-Model Solitons in the Noncommu-
tative Plane: Construction and Stability Analysis, J. High Energy Phys. 03, 045 (2005),
hep-th/0412001; O. Lechtenfeld, A. D. Popov and R. J. Szabo, Rank two quiver gauge
theory, graded connections and noncommutative vortices, J. High Energy Phys. 09, 054 (2006),
hep-th/0603232.
[51] C. Castro, SU(∞) (super)gauge theories and self-dual (super)gravity, J. Math. Phys. 34, 681
(1993); The N = 2 super-Wess-Zumino-Novikov-Witten model valued in superdiffeomorphism
(SDIFF) M2 is self-dual supergravity in four dimensions, J. Math. Phys. 35, 920 (1994); C.
Castro and J. Plebański, The generalized Moyal-Nahm and continuous Moyal-Toda equations,
J. Math. Phys. 40, 3738 (1996), hep-th/9710041; C. Castro, A Moyal quantization of the
continuous Toda field, Phys. Lett. B413, 53 (1997), hep-th/9703094; S. Ansoldi, C. Castro
and E. Spallucci, Chern-Simons hadronic bag from quenched large-N QCD, Phys. Lett. B504,
174 (2001), hep-th/0011013.
http://arxiv.org/abs/hep-th/9807033
http://arxiv.org/abs/hep-th/9901128
http://arxiv.org/abs/hep-th/9701025
http://arxiv.org/abs/hep-th/9703030
http://arxiv.org/abs/hep-th/0101126
http://arxiv.org/abs/hep-th/0506157
http://arxiv.org/abs/hep-th/0605287
http://arxiv.org/abs/hep-th/0605275
http://arxiv.org/abs/hep-th/0703128
http://arxiv.org/abs/hep-th/0310267
http://arxiv.org/abs/hep-th/0412001
http://arxiv.org/abs/hep-th/0603232
http://arxiv.org/abs/hep-th/9710041
http://arxiv.org/abs/hep-th/9703094
http://arxiv.org/abs/hep-th/0011013
	Introduction
	A Large N Gauge Theory From NC U(1) Gauge Theory
	Emergent Geometry From NC Gauge Theory
	Self-dual Einstein Gravity From Large N Gauge Theory
	D=4 and n=1
	D=6 and n=1
	D=8 and D=10
	Discussion
ABSTRACT
  We map noncommutative (NC) U(1) gauge theory on R^d_C X R^{2n}_{NC} to U(N ->
\infty) Yang-Mills theory on R^d_C, where R^d_C is a d-dimensional commutative
spacetime while R^{2n}_{NC} is a 2n-dimensional NC space. The resulting U(N)
Yang-Mills theory on R^d_C is equivalent to that obtained by the dimensional
reduction of (d+2n)-dimensional U(N) Yang-Mills theory onto R^d_C. We show that
the gauge-Higgs system (A_\mu,\Phi^a) in the U(N -> \infty) Yang-Mills theory
on R^d_C leads to an emergent geometry in the (d+2n)-dimensional spacetime
whose metric was determined by Ward a long time ago. In particular, the
10-dimensional gravity for d=4 and n=3 corresponds to the emergent geometry
arising from the 4-dimensional N=4 vector multiplet in the AdS/CFT duality. We
further elucidate the emergent gravity by showing that the gauge-Higgs system
(A_\mu,\Phi^a) in half-BPS configurations describes self-dual Einstein gravity.

<|endoftext|><|startoftext|>
Introduction 1
2 Bosonic Solution 2
3 Superstring Solution 5
4 Pure Gauge for Bosonic Solution 10
5 Conclusion 12
A B0,L0 with Split Strings 13
B Unitary eΦ 15
1 Introduction
Following the breakthrough analytic solution of Schnabl[1], our analytic understanding of open
string field theory (OSFT) has seen remarkable progress[2, 3, 4, 5, 6, 7, 8]. So far most work has
focused on the open bosonic string, but clearly it is also important to consider the superstring.
This is not just because superstrings are ultimately the theory of interest, but because there
are important physical questions, especially the holographic encryption of closed string physics
in OSFT, which may be difficult to decipher in the bosonic case[9].
Ideally, the first goal should be to find an analytic solution of superstring field theory1 on a
non-BPS brane describing the endpoint of tachyon condensation, i.e. the closed string vacuum.
However, the construction of this solution is will likely be subtle—indeed, Schnabl’s solution for
the bosonic vacuum is very close to being pure gauge[1, 2]. Thus, it may be useful to consider a
simpler problem first: constructing solutions describing marginal deformations of a (non)BPS D-
brane. Marginal deformations correspond to a one-parameter family of open string backgrounds
obtained by adding a conformal boundary interaction to the worldsheet action—for example,
turning on a Wilson line on a brane by adding the boundary term Aµ
dt∂Xµ(t) to the
worldsheet action. Such backgrounds were studied numerically for the bosonic string in ref.[11]
and for the superstring in ref.[12]. Recently, Schnabl[13] and Kiermaier et al[14] found analytic
solutions for marginal deformations in bosonic OSFT2. The solutions bear striking resemblance
1In this paper we will work with the Berkovits WZW-type superstring field theory[10].
2For previous efforts to construct such solutions analytically in bosonic and super OSFT, see refs.[15, 16].
to Schnabl’s vacuum solution, but are simpler in the sense that they are manifestly nontrivial
and can be constructed systematically with a judicious choice of gauge.
In this note, we construct solutions of super OSFT describing marginal deformations gen-
erated by on-shell vertex operators with vanishing operator products (in either the 0 or −1
picture). As was found in ref.[13, 14] such deformations are technically simpler since they
allow for solutions in Schnabl’s gauge, B0Φ = 0—though probably more general marginal solu-
tions can be obtained once the analogous problem is understood for the bosonic string, either
by adding counterterms as described in ref.[14] or by employing a “pseudo-Schnabl gauge” as
suggested in ref.[13]. The superstring solution exhibits a remarkable duality with its bosonic
counterpart: it formally represents a re-expression of the bosonic solution in pure gauge form.
It would be very interesting if this duality generalized to other solutions.
This paper is organized as follows. In section 2 we briefly review the bosonic marginal
solution in the split string formalism[2, 8, 17], which we will prove convenient for many com-
putations. In section 3 we consider the superstring, motivating the solution as analogous to
constructing an explicit pure gauge form for the bosonic marginal solution. This strategy
quickly gives a very simple expression for the complete analytic solution of super OSFT. In sec-
tion 4 we consider the dual problem: finding a pure gauge expression for the bosonic marginal
deformation describing a constant, light-like gauge field on a non-compact brane. Though quite
analogous to the superstring, this problem is slightly more complex. Nevertheless we are able
to find an analytic solution. We end with some conclusions.
While this note was in preparation, we learned of the independent solution by Yuji Okawa[18].
His paper should appear concurrently.
2 Bosonic Solution
Let us begin by reviewing the bosonic marginal solution[13, 14] in the language of the split
string formalism[2, 8, 17], which is a useful shorthand for many calculations. The first step in
this approach is to find a subalgebra of the open string star algebra, closed under the action of
the BRST operator, in which we hope to find an analytic solution. For the bosonic marginal
solution the subalgebra is generated by three string fields K,B and J :
K = Grassmann even, gh# = 0
B = Grassmann odd, gh# = −1
J = Grassmann odd, gh# = 1 (2.1)
satisfying the identities,
[K,B] = 0 B2 = J2 = 0 (2.2)
dK = 0 dJ = 0 dB = K (2.3)
where d = QB is the BRST operator and the products above are open string star products (we
will mostly omit the ∗ in this paper). The relevant explicit definitions of K,B, J are3,
K = −π
(K1)L|I〉 K1 = L1 + L−1
B = −
(B1)L|I〉 B1 = b1 + b−1
J = J(1)|I〉 (2.4)
where |I〉 is the identity string field and the subscript L denotes taking the left half of the
corresponding charge4. The operator J(z) is a dimension zero primary generating the marginal
trajectory. It takes the form,
J(z) = cO(z) (2.5)
where O is a dimension one matter primary with nonsingular OPE with itself. This is crucial
for guaranteeing that the square of the field J vanishes, as in eq.(2.2). With these preliminaries,
the marginal solution for the bosonic string is:
Ψ = λFJ
1− λB F 2−1
F (2.6)
where λ parameterizes the marginal trajectory and F = eK/2 = Ω1/2 is the square root of the
SL(2,R) vacuum (a wedge state). To linear order in λ the solution is,
Ψ = λFJF + ... = λJ(0)|Ω〉+ ... (2.7)
which is the nontrivial element of the BRST cohomology generating the marginal trajectory.
Let us prove that eq.2.6 satisfies the equations of motion. Using the identities Eqs.(2.2,2.3),
dΨ = −λFJd
1− λB F 2−1
= −λFJ
1− λB F 2−1
F 2 − 1
1− λB F 2−1
= −λ2FJ 1
1− λB F 2−1
(F 2 − 1)J 1
1− λB F 2−1
F (2.8)
3We may generalize the construction by considering other projector frames[4, 7, 8] or by allowing the field F
in eq.(2.6) to be an arbitrary function of K[2, 8]. Such generalizations do not add much to the current discussion
so we will stick with the definitions presented here.
4“Left” means integrating the current counter-clockwise on the positive half of the unit circle. This convention
differs by a sign from ref.[8] but agrees with ref.[4].
Notice the (F 2−1)J factor in the middle. Since J2 = 0, the −1)J term vanishes when multiplied
with the Js to the left—thus the necessity of marginal operators with nonsingular OPE. This
leaves,
dΨ = −λ2FJ 1
1− λB F 2−1
1− λB F 2−1
F = −Ψ2 (2.9)
i.e. the bosonic equations of motion are satisfied.
The solution has a power series expansion in λ:
λnΨn (2.10)
where,
Ψn = FJ
F 2 − 1
F (2.11)
To make contact with the expressions of refs.[13, 14], note the relation,
F 2 − 1
dtΩt (2.12)
To prove this, recall Ωt = etK and calculate5,
dtΩt =
etK = eK − 1 = F 2 − 1 (2.13)
Using this and the mapping between the split string notation and conformal field theory de-
scribed in ref.[8], the Ψns can be written as CFT correlators on the cylinder:
〈Ψn, χ〉 =
dt1...
dtn−1 〈J(tn−1 + ...+ t1 + 1)B...J(t1 + 1)BJ(1) fS ◦ χ(0)〉Ctn−1+...+t1+2
(2.14)
where fS(z) =
tan−1 z is the sliver conformal map, and in this context B is the insertion
∫ −i∞
b(z) to be integrated parallel to the axis of the cylinder in between the J insertions on
either side. This matches the expressions found in refs.[13, 14].
In passing, we mention that this solution was originally constructed systematically by using
the equations of motion to recursively determine the Ψns in Schnabl gauge. If desired, it is also
possible to perform such calculations in split string language; we offer some sample calculations
in appendix A.
5Note that, in general, the inverse of K is not well defined. However, when operating on F 2 − 1 it is. This
is why we cannot simply use F 2/K in the solution in place of F
, which would naively give a solution even
for marginal operators with singular OPEs.
3 Superstring Solution
Let us now consider the superstring. The marginal deformation is generated by a −1 picture
vertex operator,
e−φcO(z) (3.1)
where O(z) is a dimension 1
superconformal matter primary. We will use Berkovits’s WZW-
type superstring field theory[10]6, in which case the string field is given by multiplying the −1
picture vertex operator by the ξ ghost:
X(z) = ξe−φcO(z) (3.2)
This corresponds to a solution of the linearized Berkovits equations of motion,
η0QB (λX(0)|Ω〉) = 0 (3.3)
since η0 eats the ξ and the −1 picture vertex operator is in the BRST cohomology. We will
also find it useful to consider the 0 picture vertex operator,
J(z) = QB ·X(z) = cG−1/2 · O(z)− eφηO(z) (3.4)
A complimentary way of seeing the linearized equations of motion are satisfied is to note that
J(z) is in the small Hilbert space. As with the bosonic string, it is very helpful to assume that
X(z) and J(z) have vanishing OPEs:
J(z)X(w) = lim
J(z)J(w) = lim
X(z)X(w) = 0 (3.5)
We mention two examples of such deformations. The simplest is the light-like Wilson line
O(z) = ψ+(z) (α′ = 1), where
X(z) = ξe−φcψ+(z)
J(z) = i
2c∂X+(z)− eφηψ+(z) (3.6)
There is also a “rolling tachyon” marginal deformation[22] O(z) = σ1eX
2(z) on a non-BPS
brane. The corresponding vertex operators are,
X(z) = σ1ξe
−φceX
J(z) = σ2(cψ
0 − ieφη)eX0/
2(z) (3.7)
6See refs.[19, 20, 21] for nice reviews.
The Pauli matrices σ1, σ2, σ3 are “internal” Chan-Paton factors[23, 24], necessary to accom-
modate non-BPS GSO(−) states into the Berkovits framework. Though we will not write it
explicitly, in this context it is important to remember that the BRST operator and the eta zero
mode are carrying a factor of σ3 (thus the presence iσ2 = σ3σ1 in the above expression for J).
We mention that both X(0)|Ω〉 and J(0)|Ω〉 are in Schnabl gauge and annihilated by L0.
Let us describe the subalgebra relevant for finding the marginal solution. It consists of the
products of four string fields, K,B,X, J7:
K = Grassmann even, gh# = 0
B = Grassmann odd, gh# = −1
X = Grassmann even, gh# = 0
J = Grassmann odd, gh# = 1 (3.8)
All four of these have vanishing picture number. K and B are the same fields encountered
earlier in eq.(2.4); X and J are defined,
X = X(1)|I〉 J = J(1)|I〉 (3.9)
with X(z), J(z) as in Eqs.(3.2,3.4). We have the identities,
[K,B] = 0 B2 = 0 X2 = J2 = XJ = JX = 0 (3.10)
where the third set follows because the corresponding vertex operators have vanishing OPEs.
The algebra is closed under the action of the BRST operator:
dB = K dK = 0
dX = J dJ = 0 (3.11)
Note that the eta zero mode d̄ ≡ η0 annihilates K,B and J ,
d̄K = d̄B = d̄J = 0 (3.12)
since they live in the small Hilbert space. However, it does not annihilate X , and the algebra is
not closed under d̄. Though it is not a priori obvious that the K,B,X, J algebra is rich enough
to encapsulate the marginal solution, we will quickly see that it is.
7Note that for for a GSO(−) deformation the Grassmann assignments of X, J are opposite. Still, as far
as the solution is concerned X is even and J is odd because QB, η0 carry a σ3 which anticommutes with the
internal Chan-Paton matrices of the vertex operators.
We seek a one parameter family of solutions of the super OSFT equations of motion,
e−ΦdeΦ
= 0 (3.13)
where Φ is a Grassmann even, ghost and picture number zero string field which to linear order
in the marginal parameter takes the form,
Φ = λFXF + ... (3.14)
There are many strategies one could take to solve this equation, but before describing our
particular approach it is worth mentioning the “obvious” method: fixing Φ in Schnabl gauge
and attempting a perturbative solution, as in refs.[13, 14]:
λnΦn Φ1 = FXF (3.15)
At second order8, the Schnabl gauge solution is actually fairly simple:
F 2 − 1
JF + FJB
F 2 − 1
(3.16)
and seems quite similar to the bosonic solution. At third order, however, we found an extremely
complicated expression (though still within the K,B,X, J subalgebra). It seems doubtful that
a closed form solution for Φ in Schnabl gauge can be obtained.
Since the Schnabl gauge construction appears complicated, we are lead to consider another
approach. To motivate our particular strategy, we make two observations: First, the combi-
nation e−ΦdeΦ which enters the superstring equations of motion also happens to be a pure
gauge configuration from the perspective of bosonic OSFT. Second, there is a basic similarity
between the K,B, J algebra for the bosonic marginal solution and the K,B, J,X algebra for
the superstring. The main difference of course is the presence of X for the superstring, whose
BRST variation gives J . If such a field were present for the bosonic string, the bosonic marginal
solution would be pure gauge because J would be trivial in the BRST cohomology. With this
motivation, we are lead to consider the equation
e−ΦdeΦ = λFJ
1− λB F 2−1
F (3.17)
From the bosonic string perspective, this equation represents an expression of the bosonic
marginal solution in a form which is pure gauge. From the superstring perspective, this is a
8Explicitly, if we plug eq.(3.15) into the equations of motion, we find a recursive set of equations of the form
d̄dΦn = d̄Fn−1[Φ], where Fn−1[Φ] depends on Φ1, ...,Φn−1. The Schnabl gauge solution is obtained by writing
Fn−1[Φ].
partially gauge fixed form of the equations of motion, since the expression on the right hand
side is in the small Hilbert space.
Let us now solve this equation. It will turn out to be simpler to solve for the group element
g = eΦ; we make a perturbative ansatz,
g = eΦ = 1 +
λngn g1 = Φ1 = FXF (3.18)
Expanding out eq.(3.17) to second order gives,
dg2 = FJB
F 2 − 1
JF + g1dg1
= FJB
F 2 − 1
JF + FXF 2JF (3.19)
As it turns out, this equation is solved by the second order Schnabl gauge solution eq.(3.16):
g2 = Φ2 +
Φ21 =
F 2 − 1
JF + FJB
F 2 − 1
XF + FXF 2XF
(3.20)
but there is a simpler solution:
g2 = FXB
F 2 − 1
JF (3.21)
Using this form of g2 we can proceed to third order—remarkably, the solution is practically just
as simple:
g3 = FX
F 2 − 1
F (3.22)
This leads to an ansatz for the full solution:
eΦ = 1 + λFX
1− λB F 2−1
F (3.23)
To check this, calculate:
deΦ = λFJ
1− λB F 2−1
F + λFXd
1− λB F 2−1
= λFJ
1− λB F 2−1
F + λFX
1− λB F 2−1
F 2 − 1
1− λB F 2−1
= λFJ
1− λB F 2−1
F + λ2FX
1− λB F 2−1
1− λB F 2−1
1 + λFX
1− λB F 2−1
1− λB F 2−1
= eΦλFJ
1− λB F 2−1
F (3.24)
Therefore, eq.(3.23) is indeed a complete solution to the super OSFT equations of motion!
Note, however, that it is not quite a solution to the pure gauge problem of the bosonic string.
In particular, in step three we needed to assume XJ = 0—something we would not expect
to hold in the bosonic context. We will give the solution to the bosonic problem in the next
section.
Let us make a few comments about this solution. First, though the string field Φ itself is
not in Schnabl gauge, the nontrivial part of the group element eΦ is—this is not difficult to see,
but we offer one explanation in appendix A. The second comment is related to the string field
reality condition. In super OSFT, the natural reality condition is that Φ should be “imaginary”
in the following sense:
〈Φ, χ〉 = −〈Φ|χ〉 (3.25)
where 〈Φ| is the Hermitian dual of |Φ〉 and χ is any test state. In split string notation we can
write this,
Φ† = −Φ (3.26)
where † is an anti-involution on the star algebra, formally completely analogous to Hermitian
conjugation of operators. With this reality condition, the group element should be unitary:
g† = g−1
Using,
K† = K B† = B J† = J X† = −X (3.27)
it is not difficult to see that the analytic solution eΦ is not unitary9. However, it is possible to
obtain a unitary solution by a simple gauge transformation of eq.(3.23); we explain details in
appendix B.
Let us take the opportunity to express the solution in a few other forms which may be more
convenient for explicit computations. Following the usual prescription we may express the gns
as correlation functions on the cylinder:
〈gn, χ〉 =
dt1...
dtn−1 〈X(tn−1 + ...+ t1 + 1)BJ(tn−2 + ..+ t1 + 1)...BJ(1) fS ◦ χ(0)〉Ctn−1+...+t1+2
= (−1)n
dt1...
dtn−1 〈X(L+ 1)[O′(ℓn−2 + 1)...O′(ℓ1 + 1)]BJ(1) fS ◦ χ(0)〉CL+2 (3.28)
In the second line we manipulated the multiple B insertions, simplifying the vertex operators
and obtaining a single B insertion to the right; we introduced the length parameters[14]:
tk L = ℓn−1 (3.29)
9By contrast, the Schnabl gauge construction automatically gives an imaginary Φ and unitary eΦ.
and defined O′(z) = G− 1
· O(z) (times a σ3 for GSO(−) deformations). We may also express
the solution in the operator formalism of Schnabl[1]:
|gn〉 =
(−1)nO+1
dt1...
dtn−1ÛL+2 f
S ◦ (ξe
−φO(L/2))Õ′(yn−2)...Õ′(y1)
Õ′(−L
)[B+c̃(L
)c̃(−L
)− c̃(L
)− c̃(−L
)] + f−1S ◦ (ηe
φO(−L
))[B+c̃(L
) + 1]
(3.30)
where yi = ℓi −L/2 and[6] Ûr =
. Also we have used f−1S to define the tilde to hide
some factors of π
. The expression is somewhat more complicated than the bosonic solution
since the vertex operator J(z) has a piece without a c ghost, so in the bc CFT the solution has
a component not proportional to Schnabl’s ψn[1].
4 Pure Gauge for Bosonic Solution
In the last section, we found a solution for the superstring by analogy with the pure gauge
problem of the bosonic string; but we did not solve the latter. The scenario we have in mind is
a constant, lightlike gauge field on a non-compact D-brane. Since there is no flux and no way
to wind a Wilson loop, such a field configuration should be pure gauge. From the string field
theory viewpoint, this is reflected by the fact that the marginal vertex operator becomes BRST
trivial in the noncompact limit,
ic∂X+(z) = QB · 2iX+(z) (4.1)
Of course, on a compact manifold the operator X+(z) is not globally defined so the marginal
deformation is nontrivial.
Translating to split string language, we consider an algebra generated by four fieldsK,B,X, J ,
where K,B are defined as before and,
X = 2iX+(1)|I〉 J = ic∂X+(1)|I〉 (4.2)
These have the same Grassmann and ghost number assignments as eq.(3.8). We have the
algebraic relations,
[K,B] = 0 B2 = 0 J2 = 0 [X, J ] = 0 (4.3)
Note the difference from the superstring case: the products of X with itself and with J , though
well defined (the OPEs are nonsingular), are nonvanishing. However, we still have
dB = K dK = 0
dX = J dJ = 0 (4.4)
with the second set implying that J is trivial in the BRST cohomology.
We now want to solve eq.(3.17) assuming this slightly more general set of algebraic relations.
Playing around a little bit, the solution we found is,
eΛ = 1 + λFuλ(X)
1− λB F 2−1
F (4.5)
where,
uλ(X) =
eλX − 1
(4.6)
The relevant identity satisfied by this particular combination is,
duλ = J(λuλ + 1) (4.7)
Let us prove that this gives a pure gauge expression for the bosonic marginal solution:
deΛ = λFduλ
1− λB F 2−1
F + λFuλ
1− λB F 2−1
F 2 − 1
1− λB F 2−1
= λFJ(λuλ + 1)
1− λB F 2−1
F + λ2Fuλ
1− λB F 2−1
(F 2 − 1)J
1− λB F 2−1
Now we come to the critical difference from the superstring. Note the −1)J piece in the middle
of the second term. Before it vanished when multiplied by X, J to the left. This time it
contributes because XJ 6= 0; still, the Js in the denominator of the factor to the left get killed
because J2 = 0. Thus we have,
deΛ = λFJ(λuλ + 1)
1− λB F 2−1
F + λ2Fuλ
1− λB F 2−1
1− λB F 2−1
−λ2FuλJ
1− λB F 2−1
F (4.8)
where the third term comes from the −1)J piece. Note the cancellation. We get,
deΛ = λFJ
1− λB F 2−1
F + λ2Fuλ
1− λB F 2−1
1− λB F 2−1
1 + λFuλ
1− λB F 2−1
1− λB F 2−1
= eΛλFJ
1− λB F 2−1
F (4.9)
thus we have a pure gauge expression for the marginal solution.
To further emphasize the duality with the superstring, note that for the pure gauge problem
the role of the eta zero mode is played by the lightcone derivative:
d̄ ∼ d
(4.10)
In particular we have solved the equation,
e−ΛdeΛ
= 0 (4.11)
Though there are many pure gauge trajectories generated by FXF , only a trajectory which
in addition satisfies this equation will be a well-defined, nontrivial solution once spacetime is
compactified.
5 Conclusion
In this note, we have constructed analytic solutions of open superstring field theory describing
marginal deformations generated by vertex operators with vanishing operator products. We
have not attempted to perform any detailed calculations with these solutions, though such
calculations are certainly possible. The really important questions about marginal solutions—
such as mapping out the relation between CFT and OSFT marginal parameters, obtaining
analytic solutions for vertex operators with singular OPEs, or proving Sen’s rolling tachyon
conjectures[22]—require more work even for the bosonic string. Hopefully progress will translate
directly to the superstring.
For us, the main motivation was the hope that marginal solutions could give us a hint
about how to construct the vacuum for the open superstring. Indeed, for the bosonic string
the marginal and vacuum solutions are closely related: To get the vacuum solution (up to the
ψN piece), one simply replaces J with d(Bc) = cKBc and takes the limit λ→ ∞10. Perhaps a
similar trick will work for the superstring.
The author would like to thank A. Sen and D. Gross for conversations, and A. Bagchi for
early collaboration. The author also thanks Y. Okawa for correspondence which motivated
discovery of the unitary analytic solution presented in appendix B. This work was supported
in part by the National Science Foundation under Grant No.NSF PHY05-51164 and by the
Department of Atomic Energy, Government of India.
10The λ used here and the λ parameterizing the pure gauge solutions of Schnabl[1] are related by λ(Schnabl) =
A B0,L0 with Split Strings
In many analytic computations in OSFT it is useful to invoke the operators B0,L0 and their
cousins[1, 4]. To avoid unnecessary transcriptions of notation, it is nice to accommodate these
types of operations in the split string formalism.
We begin by defining the fields,
L = (L0)L|I〉 L∗ = (L∗0)L|I〉 (A.1)
and their b-ghost counterparts B,B∗. We can split the operators L0,L∗0 into left/right halves
non-anomalously because the corresponding vector fields vanish at the midpoint[4]. The fields
L,L∗ satisfy the familiar special projector algebra,
[L,L∗] = L+ L∗ (A.2)
Following ref.[4] we may define even/odd combinations,
L+ = L+ L∗ = −K L− = L − L∗ (A.3)
where K is the field introduced before. . Note that we have,
L0 ·Ψ = LΨ+ΨL∗
B0 ·Ψ = BΨ+ (−1)ΨΨB∗ (A.4)
We can use similar formulas to describe the many related operators introduced in ref.[4]
Let us now describe a few convenient facts. Let J(z) be a vertex operator for a state J(0)|Ω〉
in Schnabl gauge, and let J = J(1)|I〉 be its corresponding field. Then,
[B−, J ] = 0 (A.5)
where [, ] is the graded commutator. A similar result [L−, J ] = 0 holds if J(0)|Ω〉 is killed by
L0. We also have the useful formulas,
LF = 1
FL− FL∗ = −1
L−F [L−,Ωγ ] = 2γKΩγ (A.6)
The third equation is a special case of,
[L−, G(K)] = 2KG′(K) (A.7)
with similar formulas involving B,B∗. Of course, these equations are well-known consequences
of the Lie algebra eq.(A.2).
As an application, let us prove the identity,
J1(0)|Ω〉 ∗ J2(0)|Ω〉 = (−1)J1FJ1B
F 2 − 1
J2F (A.8)
where J1, J2(0)|Ω〉 are killed by B0,L0. This expression occurs when constructing the marginal
solution (bosonic or superstring) in Schnabl gauge. The direct approach is to compute L−10
on the left hand side in split string notation; the resulting derivation is fairly reminiscent of
ref.[14]. Instead, we will multiply this equation by L0 and prove that both sides are equal. The
left hand side gives,
B0 · FJ1F 2J2F = BFJ1F 2J2F + (−1)J1+J2FJ1F 2J2FB∗
(−1)J1FJ1[B−, F 2]J2F
= (−1)J1FJ1BF 2J2F (A.9)
The right hand side gives,
L0 · FJ1B
F 2 − 1
J2F = LFJ1B
F 2 − 1
J2F + FJ1B
F 2 − 1
J2FL∗
L−, B
F 2 − 1
= FJ1B
F 2 − 1
J2F +
L−, F
2 − 1
J2F (A.10)
Focus on the commutator:
F 2 − 1
= [L−, F 2]
+ (F 2 − 1)
= 2F 2 − 2F
2 − 1
(A.11)
where we used eq.(A.7). This computation is a somewhat formal because the inverse of K is
not generally well defined, but it can be checked using the integral representation eq.(2.12).
Plugging the commutator back in, the F
terms cancel and we are left with,
L0 · FJ1B
F 2 − 1
J2F = FJ1BF
2J2F (A.12)
which after multiplying by (−1)J1 establishes the result.
Before concluding, we mention that any state of the form,
FJ1BG2(K)J2 ... BGn(K)JnF (A.13)
with [B−, Ji] = 0, is in Schnabl gauge. The proof follows at once upon noting,
[B−, BG(K)] = −2B2G′(K) = 0 (A.14)
so the entire expression between the F s commutes with B−. This is one way of seeing that the
nontrivial part of the group element eΦ − 1 for the superstring solution is in Schnabl gauge.
B Unitary eΦ
The analytic solution eq.(3.23) is very simple, but it has the disadvantage of not satisfying the
standard reality condition, i.e. eΦ is not unitary and Φ is not imaginary. Presumably there
is an infinite dimensional array of marginal solutions which do satisfy the reality condition,
and some may have analytic descriptions. In this appendix we give one construction which
is particularly closely related to our solution eq.(3.23). For a very interesting and completely
different solution, we refer the reader to an upcoming paper by Okawa[25].
Our strategy will be to find a finite gauge transformation of g in eq.(3.23) yielding a unitary
solution. The transformation is,
U = V g (B.1)
where V is some string field of the form,
V = 1 + dv (B.2)
with v carrying ghost number −1. A little thought reveals a natural candidate for V :
(B.3)
where g† is the conjugate of eq.(3.23):
g† = 1− λF 1
1− JλB F 2−1
XF (B.4)
and we use the Hermitian definition of the square root. Intuitively, this is just taking the
original solution and dividing by its “norm.” More explicitly, if we define,
gg† = 1 + T
T = λFX
1− λB F 2−1
F − λF 1
1− JλB F 2−1
−λ2FX 1
1− λB F 2−1
1− JλB F 2−1
XF (B.5)
then the required gauge transformation is given by the formal sum,
T n (B.6)
This proposal must be subject to two consistency checks. First, of course, is that the field
U is actually unitary. The proof is straightforward:
UU † =
= gg†
U †U = g†
g = g†(g†)−1g−1g = 1 (B.7)
The second check is that V is a gauge transformation of the form eq.(B.2). This follows if the
field T is BRST exact, T = du, since then we can write (for example),
V = 1 + d
uT n−1
(B.8)
A little guesswork reveals the following BRST exact expression for T :
T = d
1− λB F 2−1
F 2 − 1
(B.9)
This establishes not only that U is an analytic solution, but (perhaps more importantly) that
the simpler expression g is in the same gauge orbit with a solution satisfying the physical reality
condition. This leaves no question as to the physical viability of our original analytic solution
eq.(3.23).
As usual, the unitary solution U can be defined explicitly in terms of cylinder correlators by
expanding eq.(B.1) as a power series in λ. Unfortunately this is somewhat tedious because the
implicit dependence on λ in eq.(B.1) is complicated. As an expansion for the imaginary field
Φ, the first two orders agree with the Schnabl gauge solution (as they must11), while at third
order we find:
F 2 − 1
F 2 − 1
JF + FJB
F 2 − 1
F 2 − 1
FXF 2JB
F 2 − 1
+ FJB
F 2 − 1
F 2 − 1
JF 2XF + F 2XB
F 2 − 1
(FXF )3 (B.10)
This expression is much simpler than the Schnabl gauge solution at third order, which involves
intricate constrained and entangled integrals over moduli separating vertex operator insertions.
11The reality condition fixes the form of the second order solution uniquely within the K,B, J,X subalgebra.
References
[1] M. Schnabl, “Analytic solution for tachyon condensation in open string field theory,” Adv.
Theor. Math. Phys. 10 (2006) 433-501, arXiv:hep-th/0511286.
[2] Y. Okawa, “Comments on Schnabl’s analytic solution for tachyon condensation in Witten’s
open string field theory,” JHEP 0604, 055 (2006), arXiv:hep-th/0603159.
[3] E. Fuchs and M. Kroyter, “On the validity of the solution of string field theory,” JHEP
0605 006 (2006), arXiv:hep-th/0603195.
[4] L. Rastelli and B. Zwiebach, “Solving open string field theory with special projectors,”
arXiv:hep-th/0606131.
[5] I. Ellwood and M. Schnabl, “Proof of vanishing cohomology at the tachyon vacuum,”
JHEP 0702 (2007) 096, arxiv:hep-th/0606142.
[6] H. Fuji, S. Nakayama, and H Suzuki, “Open string amplitudes in various gauges,” JHEP
0701 (2007) 011, arXiv:hep-th/0609047.
[7] Y. Okawa, L.Rastelli and B.Zwiebach, “Analytic Solutions for Tachyon Condensation with
General Projectors,” arXiv:hep-th/0611110.
[8] T. Erler, “Split String Formalism and the Closed String Vacuum,”
arXiv:hep-th/0611200. T. Erler, “Split String Formalism and the Closed String
Vacuum, II” arXiv:hep-th/0612050.
[9] I. Ellwood, J. Shelton, and W. Taylor, “Tadpoles and Closed String Backgrounds in Open
String Field Theory,” JHEP 0307 (2003) 059, arXiv:hep-th/0304258.
[10] N. Berkovits, “Super-Poincare Invariant Superstring Field Theory,” Nucl. Phys. B450
(1995) 90, arXiv:hep-th/9503099; N. Berkovits, “A New Approach to Superstring Field
Theory,” proceedings to the 32nd International symposium Ahrenshoop on the Theory of
Elementary Particles, Fortschritte der Physik 48 (2000) 31, arXiv:hep-th/9912121.
[11] A. Sen and B. Zwiebach, “Large Marginal Deformations in String Field Theory,” JHEP
0010 (2000) 009, arXiv:hep-th/0007153.
[12] A. Iqbal and A. Naqvi, “On Marginal Deformations in Superstring Field Theory,” JHEP
0101 (2001) 040, arXiv:hep-th/0008127.
http://arxiv.org/abs/hep-th/0511286
http://arxiv.org/abs/hep-th/0603159
http://arxiv.org/abs/hep-th/0603195
http://arxiv.org/abs/hep-th/0606131
http://arxiv.org/abs/hep-th/0606142
http://arxiv.org/abs/hep-th/0609047
http://arxiv.org/abs/hep-th/0611110
http://arxiv.org/abs/hep-th/0611200
http://arxiv.org/abs/hep-th/0612050
http://arxiv.org/abs/hep-th/0304258
http://arxiv.org/abs/hep-th/9503099
http://arxiv.org/abs/hep-th/9912121
http://arxiv.org/abs/hep-th/0007153
http://arxiv.org/abs/hep-th/0008127
[13] M. Schnabl, “Comments on Marginal Deformations in Open String Field Theory,”
arXiv:hep-th/0701248.
[14] M. Kiermaier, Y.Okawa, L.Rastelli and B.Zwiebach, “Analytic Solutions for Marginal De-
formations in Open String Field Theory,” arXiv:hep-th/0701249.
[15] J. Kluson, “Exact solutions in SFT and marginal deformations in BCFT,” JHEP 0312,
050 (2003), arXiv:hep-th/0303199; T. Takahashi and S. Tanimoto, “Wilson lines and
classical solutions in cubic open string field theory,” Prog. Theor. Phys. 106 863 (2001),
arXiv:hep-th/0107046.
[16] I. Kishimoto and T. Takahashi, “Marginal deformations and classical solutions in open
superstring field theory,” JHEP 0511, 051 (2005) arXiv:hep-th/0506240.
[17] D. J. Gross and W. Taylor, “Split String Field Theory. I,II” JHEP 0108 009 (2001),
arXiv:hep-th/0105059, JHEP 0108 010 (2001), arXiv:hep-th/0106036; L. Rastelli, A.
Sen, B. Zwiebach, “Half-strings, Projectors, and Multiple D-branes in Vacuum String Field
Theory,” JHEP 0111 (2001) 035, arXiv:hep-th/0105058; I. Bars, “Map of Witten’s ⋆
to Moyal’s ⋆,” Phys.Lett. B517 (2001) 436-444, arXiv:hep-th/0106157; M. R. Douglas,
H. Liu, G. Moore and B. Zweibach, “Open String Star as a Continuous Moyal Product,”
JHEP 0204 (2002) 022, arXiv:hep-th/0202087.
[18] Y. Okawa, “Analytic Solutions for Marginal Deformations in Open Superstring Field The-
ory,” arXiv:0704.0936.
[19] N. Berkovits, “Review of Open Superstring Field Theory,” arXiv:hep-th/0105230.
[20] K. Ohmori, “A Review on Tachyon Condensation in Open String Field Theories,”
arXiv:hep-th/0102085.
[21] P. De Smet, “Tachyon Condensation: Calculations in String Field Theory,”
arXiv:hep-th/0109182.
[22] A. Sen, “Rolling Tachyon,” JHEP 0204 (2002) 048, arXiv:hep-th/0203211; A. Sen,
“Tachyon Matter,” JHEP 0207 (2002) 065, arXiv:hep-th/0203265.
[23] N. Berkovits, “The Tachyon Potential in Open Neveu-Schwarz String Field Theory,” JHEP
0004 (2000) 022, arXiv:hep-th/0001084.
http://arxiv.org/abs/hep-th/0701248
http://arxiv.org/abs/hep-th/0701249
http://arxiv.org/abs/hep-th/0303199
http://arxiv.org/abs/hep-th/0107046
http://arxiv.org/abs/hep-th/0506240
http://arxiv.org/abs/hep-th/0105059
http://arxiv.org/abs/hep-th/0106036
http://arxiv.org/abs/hep-th/0105058
http://arxiv.org/abs/hep-th/0106157
http://arxiv.org/abs/hep-th/0202087
http://arxiv.org/abs/0704.0936
http://arxiv.org/abs/hep-th/0105230
http://arxiv.org/abs/hep-th/0102085
http://arxiv.org/abs/hep-th/0109182
http://arxiv.org/abs/hep-th/0203211
http://arxiv.org/abs/hep-th/0203265
http://arxiv.org/abs/hep-th/0001084
[24] N. Berkovits, A. Sen and B. Zwiebach, “Tachyon Condensation in Superstring Field The-
ory,” Nucl. Phys. B587 (2000) 147-178, arXiv:hep-th/0002211.
[25] Y. Okawa, to appear.
http://arxiv.org/abs/hep-th/0002211
	Introduction
	Bosonic Solution
	Superstring Solution
	Pure Gauge for Bosonic Solution
	Conclusion
	B0,L0 with Split Strings
	Unitary e
ABSTRACT
  We construct a class of analytic solutions of WZW-type open superstring field
theory describing marginal deformations of a reference D-brane background. The
deformations we consider are generated by on-shell vertex operators with
vanishing operator products. The superstring solution exhibits an intriguing
duality with the corresponding marginal solution of the {\it bosonic} string.
In particular, the superstring problem is ``dual'' to the problem of
re-expressing the bosonic marginal solution in pure gauge form. This represents
the first nonsingular analytic solution of open superstring field theory.

<|endoftext|><|startoftext|>
Introduction
Early-type galaxies form a remarkably homogeneous class of objects with a well-defined Funda-
mental Plane and with tight relations between colour and magnitude, between colour and velocity
dispersion, and between the amount of α-element enhancement and velocity dispersion (e.g., Faber
& Jackson 1976; Visvanathan & Sandage 1977; Dressler 1987; Djorgovski & Davis 1987; Bower et
al. 1992; Ellis et al. 1997). They have old stellar populations, though sometimes with a younger
component (Trager et al. 2000; Serra et al. 2006; Kuntschner et al. 2006), contain little ionized and
cold gas (Sarzi et al. 2006; Morganti et al. 2006), and are preferentially located in massive dark
matter halos (e.g., Dressler 1980; Weinmann et al. 2006).
http://arxiv.org/abs/0704.0931v1
– 2 –
Ever since the seminal work by Davies et al. (1983), however, it has become clear that early-
type galaxies encompasses two distinct families. Davies et al. showed that bright ellipticals typically
have little rotation, such that their flattening must originate from anisotropic pressure. This is
consistent with bright ellipticals being in general triaxial. Low luminosity ellipticals, on the other
hand, typically have rotation velocities that are consistent with them being oblate isotropic rotators.
With the advent of CCD detectors, it quickly became clear that these different kinematic classes also
have different morphologies. Although ellipticals have isophotes that are ellipses to high accuracy,
there are small deviations from perfect ellipses (e.g., Lauer 1985; Carter 1987; Bender & Möllenhoff
1987). In particular, bright, pressure-supported systems typically have boxy isophotes, while the
lower luminosity, rotation-supported systems often reveal disky isophotes (e.g., Bender 1988; Nieto
et al. 1988). With the high angular resolution of the Hubble Space Telescope it has become clear
that both types have different central surface brightness profiles as well. The bright, boxy ellipticals
typically have luminosity profiles that break from steep outer power-laws to shallow inner cusps
(often called ‘cores’). The fainter, disky ellipticals, on the other hand, have luminosity profiles that
lack a clear break and have a steep central cusp (e.g., Jaffe et al. 1994; Ferrarese et al. 1994; Lauer
et al. 1995; Gebhardt et al. 1996; Faber et al. 1997; Rest et al. 2001; Ravindranath et al. 2001;
Lauer et al. 2005).
The isophotal shapes of early-type galaxies have also been found to correlate with the radio
and X-ray properties of elliptical galaxies (Bender et al. 1989; Pellegrini 1999). Objects which
are radio-loud and/or bright in soft X-ray emission generally have boxy isophotes, while disky
ellipticals are mostly radio-quiet and faint in soft X-rays. As shown in Pellegrini (2005), the soft
X-ray emission of power-law (and hence disky) ellipticals is consistent with originating from X-ray
binaries. Ellipticals with a central core (which are mainly boxy), however, often have soft X-ray
emission in excess of what may be expected from X-ray binaries. This emission originates from
a corona of hot gas which is distributed beyond the optical radius of the galaxy (e.g., Trinchieri
& Fabbiano 1985, Canizares et al. 1987; Fabbiano 1989). In terms of the radio and hard X-ray
emission, thought to originate from active galactic nuclei (AGN), it is found that those ellipticals
with the highest luminosities in radio and/or hard X-rays are virtually always boxy (Bender et
al. 1989; Pellegrini 2005). This is consistent with the results of Ravindranath et al. (2001), Lauer
et al. (2005) and Pellegrini (2005), all of whom find a somewhat higher fraction of ellipticals with
optical AGN activity (i.e., nuclear line emission) among cored galaxies.
The above mentioned trends between isophotal shape and galaxy properties have mainly been
based on relatively small, somewhat heterogenious samples of relatively few objects ( <∼ 100). Re-
cently, however, Hao et al. (2006a, hereafter H06) compiled a sample of 847 nearby, early-type
galaxies from the Sloan Digital Sky Survey (SDSS) for which they measured the isophotal shapes.
Largely in agreement with the aforementioned studies they find that (i) more luminous galaxies
are on average rounder and are more likely to have boxy isophotes (ii) disky ellipticals favor field
environments, while boxy ellipticals prefer denser environments, and (iii) disky ellipticals tend to
lack powerful radio emission, although this latter trend is weak.
– 3 –
The prevailing idea as to the origin of this disky-boxy dichotomy is that it reflects the galaxy’s
assembly history. Within the standard, hierarchical formation picture, in which ellipticals are
formed via mergers, the two main parameters that determine whether an elliptical will be boxy
and cored or disky and cuspy are the progenitor mass ratio and the progenitor gas mass fractions.
Pure N -body simulations without gas show that the isophotal shapes of merger remnants depend
sensitively on the progenitor mass ratio: major mergers create ellipticals with boxy isophotes,
while minor mergers mainly result in systems with disky isophotes (Khochfar & Burkert 2005,
Jesseit et al. 2005). As shown by Naab et al. (2006), including even modest amounts of gas
has a dramatic impact on the isophotal shape of equal-mass merger remnants. The gas causes a
significant reduction of the fraction of box and boxlet orbits with respect to collisionless mergers,
and the remnant appears disky rather than boxy. Therefore, it seems that the massive, boxy
ellipticals can only be produced via dry, major mergers. The cores in these boxy ellipticals are
thought to arise from the binding energy liberated by the coalescence of supermassive binary black
holes during the major merger event (e.g., Faber et al. 1997; Graham et al. 2001; Milosavljević et
al. 2002). When sufficient gas is present, however, dissipation and sub-sequent star formation may
regenerate a central cusp. Alternatively, the gas may serve as an energy sink for the binding energy
of the black hole binary, leaving the original stellar cusp largely intact. Thus, following Lauer
et al. (2005), we may summarize this picture as implying that power-laws reflect the outcome of
dissipation and concentration, while cores owe to mixing and scattering.
But what about the correlation between isophotal shape and AGN activity? It is tempting
to believe that this correlation simply derives from the fact that both isophotal shape and AGN
activity may be related to mergers. After all, it is well known that mergers can drive nuclear inflows
of gas, which produce starbursts and feed the central supermassive black hole(s) (Toomre & Toomre
1972, Barnes & Hernquist 1991,1996, Mihos & Hernquist 1994,1996, Springel 2000, Cattaneo et
al. 2005). However, since the onset of such AGN activity requires wet mergers, this would predict
a higher frequency of AGN among disky ellipticals, contrary to the observed trend. Another
argument against mergers being responsible for the AGN-boxiness correlation is that the time scale
for merger-induced AGN activity is relatively short ( <∼ 10
8 yrs) compared to the dynamical time
in the outer parts of the merger remnant. This implies that active ellipticals should reveal strongly
distorted isophotes, which is not the case.
An important hint may come from the strong correlation between the presence of dust (either
clumpy, filamentary, or in well defined rings and disks) and the presence of optical emission line
activity (Tran et al. 2001; Ravindranath et al. 2001; Lauer et al. 2005). Although this suggests
that this dust is (related to) the actual fuel for the AGN activity, many questions remain. For
instance, it is unclear whether the origin of the dust is internal (shed by stellar winds) or external
(see Lauer et al. 2005 for a detailed discussion). In addition, it is not clear why the presence of dust,
and hence the AGN activity, would be more prevalent in boxy ellipticals. One option is that boxy
ellipticals are preferentially central galaxies (as opposed to satellites), so that they are more efficient
at accreting external gas (and dust). This is consistent with the fact that boxy ellipticals (i) are, on
– 4 –
average, brighter, (ii) reside in dense environments (Shioya & Taniguchi 1993; H06), and (iii) more
often contain hot, soft X-ray emitting halos. Another, more benign possibility, is that the relation
between morphology and AGN activity is merely a reflection of the fact that both morphology and
AGN activity depend on the magnitude of the galaxy (or on its stellar or dynamical mass). In this
case, AGN activity is only indirectly related to the morphology of its host galaxy.
In this paper we use the large data set of H06 to re-investigate the correlations between
morphology and (i) luminosity, (ii) dynamical mass, and (iii) emission line activity in the optical,
where we discriminate between AGN activity and star formation. In addition, we also examine to
what extent morphology correlates with X-ray emission (using data from ROSAT), with 1.4GHz
radio emission (using data from FIRST), and with the mass of the dark matter halo in which the
galaxy resides (using a SDSS galaxy group catalog). The outline of this paper is as follows. In § 2
we describe the data of H06; in § 3 we present the fraction of disky galaxies across the full sample
as a function of galaxy luminosity, dynamical mass and environment. In § 4 we split the sample
galaxies according to their activity in the optical, radio and X-rays, and investigate how the disky-
boxy morphology correlates with these various levels of ‘activity’. Finally, in § 5 we summarize
and discuss our findings. Throughout this paper we adopt a ΛCDM cosmology with Ωm = 0.3,
ΩΛ = 0.7, and H0 = 100h kms
−1 Mpc−1. Magnitudes are given in the AB system.
2. Data
2.1. Sample Selection
In order to investigate the interplay among AGN activity, morphology and environment for
early-type galaxies, we have analyzed the sample of H06, which consists of 847 galaxies in the SDSS
DR4 (Adelman-McCarthy et al. 2006) classified as ellipticals (E) or lenticulars (S0). As described
in H06, these objects are selected to be at z < 0.05, in order to ensure sufficient spatial resolution
to allow for a meaningful measurement of the isophotal parameters. In addition, the galaxies are
selected to have an observed velocity dispersion between 200 km s−1 and 420 km s−1 (where the
upper limit corresponds to the largest velocity dispersion that can be reliably measured from the
SDSS spectra), and are not allowed to be saturated. Note that, for the median sample distance,
the fiber radius of 1.5 arcsec corresponds to about 30% of the sample mean effective radius. From
all galaxies that obey these criteria, early-types have been selected by H06 using visual inspection.
Galaxies with prominent dust lanes have been excluded from the final sample in order to reduce
the effects of dust on the isophotal analysis.
– 5 –
2.2. Isophotal Analysis
Isophotes are typically parameterized by their corresponding surface brightness, I0, their semi-
major axis length, a, their ellipticity, ǫ, and their major axis position angle, θ0. In addition, since
isophotes are not perfectly elliptical, it is common practice to expand the angular intensity variation
along the best fit ellipse, δI(θ), in a Fourier series:
δI(θ) =
A′n cosn(θ − θ0) +B
n sinn(θ − θ0)
(e.g., Carter 1987; Jedrzejewski 1987; Bender & Möllenhoff 1987). Only the terms with n = 3 and
n = 4 are usually computed, as the data is often too noisy to reliably measure higher-order terms
(but see Scorza & Bender 1995 and Scorza & van den Bosch 1999). Note that, by definition, the
terms with n = 0, 1, and 2 are equal to zero within the errors. If the isophote is perfectly elliptical,
then A′n and B
n are also equal to zero for n ≥ 3. Non-zero A′3 and B′3 express deviations from
a pure ellipse that occur along the observed isophote every 120o. Typically, such deviations arise
from the presence of dust features or clumps. The most important Fourier coefficient, however, is
A′4, which quantifies the deviations taking place along the major and minor axes. Isophotes with
A′4 < 0 have a ‘boxy’ shape, while those with a positive A
4 parameter are ‘disk’-shaped.
For each of the 847 E/S0 galaxies in their sample H06 measured the isophotal parameters
using the IRAF1 task ELLIPSE. In particular, for each galaxy they provide the ellipticity, ǫ, the
position angle of the major axis, θ0, and the third and fourth order Fourier coefficients A3 and A4,
which are equal to A′3 and A
4, respectively, divided by the semi-major axis length and the local
intensity gradient. All the available parameters are intensity-weighted averages over the radial
interval 2Rs < R < 1.5R50. Here Rs is the seeing radius (typically lower than 1.5 arcsec, Stoughton
et al. 2002) and R50 is the Petrosian half-light radius
2. The Petrosian radius is defined as the
radius at which the ratio of the local surface brightness to the mean interior surface brightness is
0.2 (cf. Strauss et al. 2002). Therefore, R50 is the radius enclosing half of the flux measured within
a Petrosian radius and can be used as a proxy for the galaxy effective radius Re. In what follows,
we refer to galaxies with A4 ≤ 0 and A4 > 0 as ‘boxy’ and ‘disky’, respectively.
In their seminal papers on the isophotal shapes of elliptical galaxies, Bender & Möllenhoff
(1987), Bender et al. (1988) and Bender et al. (1989) define alternative structural parameters,
an/a and bn/a, which are related to the An and Bn parameters defined here as
1− ǫAn
1− ǫBn (2)
(Bender et al. 1988; Hao et al. 2006b).
IRAF is distributed by the National Optical Astronomy Observatories, which are operated by the Asssociation of
Universities for Research in Astronomy, Inc., under cooperative agreement with the National Science Foundation
2These data are publicly available at http://www.jb.man.ac.uk/∼smao/isophote.html
http://www.jb.man.ac.uk/~smao/isophote.html
– 6 –
2.3. Additional data
For all galaxies in the H06 sample we determined the absolute magnitudes in the SDSS g, r and
i bands, corrected for Galactic extinction, and K-corrected to z = 0, using the luminosity distances
corrected for Virgo-centric infall of the Local Group (LG) following Blanton et al. (2005). In order
to allow for a comparison with the samples of Bender et al. (1989) and Pellegrini (1999, 2005), we
transform these magnitudes to those in the Johnson B-band using the filter transformations given
by Smith et al. (2002).
We also estimated, for each galaxy, the total dynamical mass as
Mdyn = A
σ2corrR50
Here G is the gravitational constant, A is a normalization constant, and σcorr is the velocity dis-
persion measured from the SDSS spectra corrected for aperture effects using
σcorr = σmeasured
Rfiber
R50/8
)0.04
, (4)
with Rfiber = 1.5 arcsec (Bernardi et al. 2003). The aperture correction is meant to give the velocity
dispersion within R50/8, and to make comparable galaxies at different distance but sampled with a
spectroscopic fiber of fixed size. Throughout this paper we adopt A = 5, which has been shown to
accurately reproduce the total dynamical masses inferred from more accurate modeling (Cappellari
et al. 20063). Note that Cappellari et al. have also shown that these dynamical masses are roughly
proportional to the stellar masses of early-type galaxies.
H06 cross-correlated their E/S0 sample with the FIRST radio survey (Becker, White & Helfand
1995), which yielded the 1.4GHz fluxes for 162 objects in the sample. In order to investigate the
relation between isophotal structure and soft X-ray properties, we also matched the H06 sample to
the ROSAT All Sky Survey Catalog (Voges et al. 1999). This yields ROSAT/PSPC count-rates in
the 0.1 – 2.4 keV energy band for 40 sample galaxies. We used the WebPIMMS tool4 to transform
the observed count-rates into astrophysical fluxes, corrected for Galactic extinction, assuming an
X-ray power-law spectrum with energy index αX = 1.5 (cf. Anderson et al. 2003). In addition,
we cross-identified the H06 sample with the spectroscopic catalogs released for DR4 by Kauffmann
et al. (2003a,b), and extracted, when available, the luminosity of the [OIII] λ5007 line corrected
for dust extinction (in ergs−1), the line-flux ratios [OIII]/Hβ and [NII]/Hα, and the S/N values
associated with the [OIII] and Hα fluxes.
3Cappellari et al. (2006) use a slightly different definition of σcorr in equation (2), namely that measured within
Re rather than R50/8. Given the weak dependence of the velocity dispersion on the enclosed radius, we estimate
that this difference results in an offset of ∼0.07 dex in Mdyn
4http://heasarc.gsfc.nasa.gov/Tools/w3pimms.html
– 7 –
Finally, in order to assess the environment of the sample galaxies, we cross-identified the
H06 sample with the SDSS group catalog of Weinmann et al. (2006; hereafter WBYM), which is
constructed using the halo-based group finder of Yang et al. (2005). This yields group (i.e. dark
matter halo) masses for a total of 431 galaxies, distributed over 403 groups. Of these, 350 are
‘central’ galaxies (defined as the brightest galaxy in its group) and 81 are ‘satellites’. As for the
groups, 83 have just a single member (the early-type galaxy in our sample), while 320 groups have
2 or more members. The fact that only 51 percent of the galaxies in the H06 sample are affiliated
with a group is due to the fact that the WBYM group catalog is based on the DR2, and to the fact
that not all galaxies can be assigned to a group (see Yang et al. 2005 for details).
3. The disky fraction across the sample
The main properties of the full H06 sample (comprising all 847 early-type galaxies) are summa-
rized in Figure 1. The sample spans about 3 orders of magnitude in MB (−17.8 > MB −5 log(h) >
−21.4) and a range of about 1.5 dex in dynamical mass (10.5 < log[Mdyn(h−1M⊙)] < 12) and 3 dex
in group (halo) mass (11.8 < log[Mgroup(h
−1M⊙)] < 15). As expected, the B-band magnitude is
well correlated with the dynamical mass, independent of whether the galaxy is a central galaxy or a
satellite, and independent of whether it is disky or boxy. The absolute magnitudes and dynamical
masses of satellite galaxies are clearly separated from those of the central galaxies when plotted as
function of the group (halo) mass. This simply reflects that centrals are defined as the brightest
(and, hence, most likely the most massive) group members. This clear segregation disappears when
the galaxies are split in disky and boxy systems (lower panels), indicating that there is no strong
correlation between morphology and group hierarchy.
The upper panels of Figure 2 show scatter plots of MB, Mdyn and Mgroup as function of the
isophotal parameter A4. They indicate that the fraction of disky systems (those with A4 > 0)
increases with decreasing luminosity and dynamical mass, in qualitative agreement with Bender et
al. (1989) and H06. In the case of Mgroup, a similar trend seems to be present, but only for the
central galaxies. In order to quantify these trends, we have computed the fraction, fdisky, of disky
galaxies as a function of MB , Mdyn and Mgroup. For each bin in absolute magnitude, dynamical
mass, or group mass, fdisky is defined as the number ratio between disky galaxies and the total
number of galaxies in that bin. Each bin contains at least ten disky galaxies. For comparison, the
disky fraction of the full H06 sample is 0.66.
The lower left-hand panel of Figure 2 plots fdisky as function of MB . The errorbars are
computed assuming Poisson statistics. The fraction of disky galaxies declines by a factor of about
1.6 from ∼ 0.8 at MB − 5 log(h) = −18.7 to ∼ 0.5 at MB − 5 log(h) = −20.8, and is well fitted by
fdisky(MB) = (0.61 ± 0.02) + (0.17 ± 0.03) [MB − 5 log(h) + 20] (5)
which is shown as the solid, grey line. Note that this relation should not be extrapolated to arbitrary
faint and/or bright magnitudes. Since 0 ≤ fdisky ≤ 1 it is clear that fdisky(MB) must flatten at
– 8 –
both ends of the magnitude distribution. Apparently the magnitude range covered by our sample
roughly corresponds to the range in which the distribution transits (relatively slowly and smoothly)
from mainly disky to mainly boxy.
It has to be noted that the exact relation between fdisky and MB is somewhat sensitive to the
exact sample selection criteria, and equation (5) therefore has to be used with some care.
We have tested the robustness of the above relation by adding Gaussian deviates to each
measured value of A4, and then recomputing the best-fit relation between fdisky and MB . Figure
3 shows the slope and zero-point of this relation as function of the standard deviation of the
Gaussian deviates used (filled circles). The grey shaded horizontal bar represents the 1 σ interval
around the best-fit slope (left-hand panel) and the best-fit zero-point (right-hand panel). The grey
shaded vertical bar indicates the mean uncertainty on the observed A4 parameter obtained by
H06 (σ(A4) = 0.0012 ± 0.0008). Note that the best-fit slope and zero-point are extremely robust.
Adding an artificial error to the A4 measurements with an amplitude that is a factor five larger
than the average error quoted by H06 yields best-fit values that agree with those of equation (5)
at better than the 1σ errorbar on these parameters obtained from the fit.
The middle panel in the lower row of Figure 2 plots fdisky as function of Mdyn. As with the
luminosity, the disky fraction decreases smoothly with increasing dynamical mass, dropping from
∼ 0.80 at Mdyn = 6 × 1010h−1 M⊙ to ∼ 0.45 at Mdyn = 3 × 1011h−1 M⊙. The grey, dashed line
indicates the best-fit log-linear relation, which is given by
fdisky(Mdyn) = (0.73 ± 0.02) − (0.53 ± 0.08) log
1011h−1 M⊙
As for equation (5), the Gaussian-deviates test shows that this relation is robust against uncertain-
ties in the A4 measurements.
As is well-known from the morphology-density relation (e.g., Dressler 1980), early-type galaxies
preferentially reside in denser environments and hence in more massive halos (e.g, Croton et al. 2005;
Weinmann et al. 2006). It is interesting to investigate whether the halo mass also determines
whether the early-type galaxies are disky or boxy. We can address this using the WBYM group
catalog described in §2.3. The lower right-hand panel of Figure 2 plots the disky fraction of centrals
(crosses) and satellites (open triangles) as function of group mass. The fraction of disky centrals
decreases with increasing group (halo) mass, declining from ∼ 0.82 at Mgroup = 1.7 × 1012h−1 M⊙
to ∼ 0.54 at Mgroup = 5.0 × 1013h−1 M⊙. For the most massive groups, we have enough satellite
galaxies to also compute their disky fraction. Interestingly, these are larger (though only marginally
so) than those of central galaxies in groups of the same mass.
Although these results seem to suggest that group mass and group hierarchy (i.e., central vs.
satellite) play a role in determining the morphology of an early-type galaxy, they may also simply
be reflections of the fact that (i) satellite galaxies are fainter than central galaxies in the same
parent halo, (ii) fainter centrals typically reside in lower mass halos (cf. Figure 1), and (iii) fainter
galaxies have a larger fdisky. In order to discriminate between these options we proceed as follows.
– 9 –
Under the null-hypothesis that the isophotal structure of an early-type galaxy is only governed by
the galaxy’s absolute magnitude or dynamical mass, the predicted fraction of disky systems for a
given sub-sample is simply
fdisky,0 =
fdisky(Xi) (7)
where Xi is either MB − 5 log(h) or log(Mdyn) of the ith galaxy in the sample, and fdisky(X) is the
average relation between fdisk and X. The grey solid and dashed lines in the lower right-hand panel
of Figure 2 show the fdisky,0(Mgroup) thus obtained, using equations (4) and (5), respectively. These
are perfectly consistent with the observed trends (for both the centrals and the satellites). A possible
exception is the disky fraction of central galaxies in groups with Mgroup < 3.0× 1012h−1M⊙, which
is ∼ 2.5σ higher than predicted by the null-hypothesis. Overall, however, these results support
the null-hypothesis that the morphology of an early-type galaxy depends only on its luminosity or
dynamical mass: there is no significant indication that group mass and/or group hierarchy have a
direct impact on the morphology of early-type galaxies.
4. The disky fraction of active early-type galaxies
4.1. Defining different activity classes
In the standard unified model, AGN are distinguished in AGN of Type I when the central black
hole, its continuum emission and its broad emission-line region are viewed directly, and Type II, if
the central engine is obscured by a dusty circumnuclear medium. Our sample of early-type galaxies
does not contain any Type I AGN, simply because these systems are not part of the main galaxy
sample in the SDSS. However, the E/S0 sample of H06 is not biased against Type II AGN. In order
to identify these systems, one needs to be able to distinguish them from early-types with some
ongoing, or very recent, star formation, which also produces narrow emission lines. Since stars and
AGN produce different ionization spectra, one can discriminate between them by using line-flux
ratios. In particular, star formation and AGN activity can be fairly easily distinguished using the
so-called BPT diagram (after Baldwin, Phillips & Terlevich 1981; see also Veilleux & Osterbrock
1987), whose most common version involves the line-flux ratios [OIII]/Hβ and [NII]/Hα.
Figure 4 plots the BPT diagram for those sample galaxies whose [OIII] λ5007 and Hα lines
have been detected with a signal-to-noise ratio S/N ≥ 3. The solid curve was derived by Kauffmann
et al. (2003b) and separates star-forming galaxies from type II AGN, with the latter lying above the
curve. We follow Kauffmann et al. (2003b) and split the Type II AGN into Seyferts, LINERS, and
Transition Objects (TOs) according to their line-flux ratios: Type II Seyferts have log([OIII]/Hα) ≥
0.5 and log([NII]/Hα) ≥ −0.2, LINERS have log([OIII]/Hα) < 0.5 and log([NII]/Hα) ≥ −0.2, and
all galaxies with log([NII]/Hα) < −0.2 and laying above the curve are labelled TO. Kewley et al.
(2006) have recently studied in detail the properties of LINERs and type II Seyferts, and found
that LINERs and Seyferts form a continuous progression in the accretion rate L
, with LINERs
– 10 –
dominating at low L
and Seyferts prevailing at high L
. The results obtained by Kewley et
al. suggest that most LINERs are AGN and require a harder ionizing radiation field together with
a lower ionization parameter than Seyferts.
In order to increase the statistics of our subsequent analysis, we have organized the 847 galaxies
in the H06 sample into 3 categories:
1. AGN: This class consists of 28 early-type galaxies with a Seyfert-like activity and 286 early-
type galaxies with a LINER-like activity.
2. Emission-line (EL): This class consists of those galaxies that according to the BPT diagram
are star formers or transition objects, as well as those galaxies that lack one or both of the
BPT line-flux ratios, but that have an [OIII] emission line with a S/N ≥ 3. There are a total
of 383 early-type galaxies in the H06 sample that fall in this category.
3. Non-active (NA): These are the 150 galaxies that are not in the AGN or EL categories.
Therefore, these galaxies either have no emission lines at all, or have a detected [OIII] line
but with a S/N < 3. Among these, 43 objects (29 percent) show Hα emission with a S/N
≥ 3. Their presence could signal a problem with the spectroscopic pipeline, which failed to
properly measure the [OIII] line, or be real and due to an episode of star formation in its
early phases. In any case, their low S/N in [OIII] prevents us from classifying these galaxies
in one of the above two categories.
Given our aim to establish the presence/absence of a correlation between the AGN activity
and the disky/boxy morphology of the host early-type galaxy, the classification above is clearly
driven by the detection of the [OIII] line emission, which is commonly used as a proxy for the AGN
strength (cf. Kauffmann et al. 2003b, Kewley et al. 2006).
Along with these 3 categories which describe the galaxy activity in the optical, we have also
defined two additional activity classes: ‘FIRST’, which consists of the 162 sample galaxies with a
1.4GHz flux in the FIRST catalog (Becker et al. 1995), and ‘ROSAT’, containing the 40 sample
galaxies that have been detected in the ROSAT All Sky Survey (Voges et al. 1999). The soft
X-ray luminosities of these ROSAT galaxies span the range 41.3 < log[LX/(ergs
−1)] < 42.7 and
are consistent (though with large scatter), with the well known LX ∝ L2B relation (Trinchieri &
Fabbiano 1985; Canizares et al. 1987). This X-ray emission is therefore associated with a hot corona
surrounding the galaxy, rather than with X-ray binaries, and we can use it to indirectly probe the
environment where galaxies live. As shown by Bender et al. (1989) and O’Sullivan et al. (2001),
the LX ∝ L2B relation applies to X-ray luminosities between 10
40 and 1043 erg s−1. Our ROSAT
category is thus somehow incomplete at 40 < log[LX/(ergs
−1)] < 41 and the trends discussed below
for this class should be taken with some caution.
Table 1 lists the number of galaxies in each of these five activity classes. Note that the AGN,
EL and NA classes are mutually exclusive, but that a galaxy in each of these three classes can
– 11 –
appear also in the FIRST and ROSAT sub-samples. The vast majority of the galaxies detected by
FIRST or ROSAT reveal activity also in the optical, and are classified as either AGN or EL. The
radio and soft X-ray detections themselves, however, are not well correlated: only 12 percent of the
galaxies detected by FIRST have also been detected in soft X-rays.
Before computing fdisky for the galaxies in these various activity classes, it is useful to examine
how their respective distributions in MB, Mgroup and L[OIII] compare. This is shown in Figures 5,
6 and 7, respectively. While the luminosity distributions of the AGN and EL galaxies are in good
agreement with that of the full sample, the galaxies detected by ROSAT are on average about half
a magnitude brighter than the galaxies in the full sample. Also the non-active and radio galaxies
are brighter than average, though the differences are less pronounced. McMahon et al. (2002)
estimated a limiting magnitude of R ≃ 20 for the optical counterparts of FIRST sources at a 97
percent completeness level. Since the apparent magnitude limit of the H06 sample is brighter than
this limit, the FIRST subsample extracted from H06 is to be considered complete. Therefore, the
shift towards higher luminosities for the galaxies detected by FIRST with respect to the full sample
in Figure 5 is real rather than an artifact due to the depth of the different surveys.
Similar trends are present with respect to the group masses: whereas AGN and EL galaxies
have group masses that are very similar to those of the full sample, galaxies detected by ROSAT and
FIRST seem to prefer more massive groups. Somewhat surprisingly, the same applies to the class
of non-active galaxies. As for the luminosity of their [OIII] line plotted in Figure 7, AGN galaxies
tend to be brighter than EL and ROSAT galaxies, while the [OIII] luminosities of FIRST galaxies
are consistent with those of the EL and AGN galaxies combined (grey shaded histogram). In
agreement with Best et al. (2005), no correlation is found between the radio and [OIII] luminosities
of the sample galaxies in common with FIRST. Finally, it is worth emphasizing that the optical
activity defined in this paper occurs at log(L[OIII]/L⊙) ≥ 4.6; therefore, the class of non-active
galaxies may also contain weak AGN with [OIII] fluxes below this limit. Using the KS-test, we
have investigated whether the various distributions are consistent with being drawn from the same
parent distribution. We have found that only EL and ROSAT are consistent, in terms of their [OIII]
luminosity, with belonging to the same population, as well as the pairs (AGN,EL) and (NA,FIRST)
in terms of their absolute magnitude, the pair (AGN,EL) with respect to their dynamical mass,
and the pairs (AGN,EL), (NA,FIRST), (NA,ROSAT) and (FIRST,ROSAT) in terms of their group
halo mass.
Another aspect of defining different modes of activity is to study their actual frequency, i.e.
the fraction of galaxies sharing the same kind of activity (with respect to the full sample) as a
function of MB , Mdyn and environment. This is plotted in Figure 8, where the percentage of
NA, FIRST and ROSAT galaxies increases by a factor of about 4 towards higher luminosities and
larger dynamical masses (cf. Best et al. 2005, O’Sullivan et al. 2001), and by a factor of about 3
as their hosting group halo becomes more massive. EL and AGN galaxies define a far less clear
picture; EL galaxies seem to occur at any MB and Mgroup with a constant frequency, while their
fraction decreases by a factor of about 1.5 as Mdyn gets larger. The percentage of AGN galaxies
– 12 –
drops by a factor of about 2 at brighter MB values. It very weakly decreases in massive group
halos, and appears quite insensitive to Mdyn. As for the hierarchy inside a group, there is a weak
indication that EL, FIRST and ROSAT galaxies are preferentially associated with central galaxies,
while satellite galaxies are more frequently NA and AGN galaxies. Within the Poisson statistics,
however, none of these trends with group hierarchy is significant.
4.2. The relation between activity and morphology
A first glance at how morphology varies with activity is provided by Table 2, which lists
fdisky for the 5 classes defined in §4.1. As for Table 1, AGN, EL and NA galaxies are mutually
exclusive, while any of them can be included in the FIRST and ROSAT categories. In this case,
fdisky is derived from the pool of galaxies common to FIRST (ROSAT) and one of the optically
active sub-samples. The ROSAT galaxies are clearly biased towards boxy shapes as their fdisky
is systematically lower than ∼ 0.50. AGN and NA galaxies with or without radio emission are
generally disky (with fdisky > 0.60). The radio emission seems to make a difference in the case
of EL galaxies: while the full sub-sample of ELs is as disky as AGN and NA galaxies, those ELs
detected by FIRST are dominated by boxy systems with fdisky ≃ 0.45.
The upper panels of Figure 9 show scatter plots of the [OIII] luminosity, radio luminosity and
X-ray luminosity as function of A4. The lower panels plot the corresponding fractions of disky
systems. In the lower left-hand panel, fdisky is plotted as function of L[OIII] for both AGN (filled
squares) and EL (filled triangles) galaxies. This shows that both AGN and EL galaxies have a disky
fraction that is consistent with that of the full H06 sample (fdisky = 0.66), and with no significant
dependence on the actual [OIII] luminosity. The grey lines (solid for AGN and dotted for EL
galaxies) indicate the disky fractions predicted under the null-hypothesis that fdisky is a function
only of MB . These predictions are in excellent agreement with the data, suggesting that the (level
of) optical activity does not help in better predicting the disky/boxy morphology of an early-type
galaxy. The only possible exception is the sub-sample of EL galaxies with log(L[OIII]/L⊙) < 5.2
which has fdisky = 0.54, approximately 3σ lower than given by the null-hypothesis.
The relatively small number of sample galaxies detected by FIRST and ROSAT prevents us
from applying the above analysis as a function of radio and/or soft X-ray luminosity. Instead
we have determined fdisky separately for the sub-samples of galaxies detected and not-detected by
FIRST or ROSAT. The results are shown in the lower middle and lower right-hand panels of Figure
9. Clearly, the disky fraction of galaxies detected by ROSAT (fdisky = 0.48 ± 0.08) is significantly
lower than those with no detected soft X-ray flux (fdisky = 0.66 ± 0.02), in agreement with the
results of Bender et al. (1989) and Pellegrini (1999, 2005). The fact that galaxies detected by
ROSAT are more boxy is expected since they are significantly brighter than those with no soft
X-ray detection (cf. Figure 5). The grey lines, which correspond to fdisky,0(MB), indicate that this
explains most of the effect. Although it is intriguing that the disky fraction of ROSAT detections
is ∼ 1σ lower than predicted, a larger sample of early-type galaxies with soft X-ray detections is
– 13 –
needed to rule out (or confirm) the null-hypothesis. As for the galaxies detected by FIRST, there is
a weak indication that these galaxies have a somewhat lower fdisky: this finding is again in excellent
agreement with the predictions based on the null-hypothesis. Therefore, there is no indication that
the morphology of an early-type galaxy is directly related to whether the galaxy is active in the
radio or not.
To further test the null-hypothesis that the isophotal structure of early-type galaxies is entirely
dictated by their absolute magnitude or dynamical mass, we have derived fdisky of NA, EL and
AGN galaxies in bins of MB and Mdyn. The results are shown in Figure 10 (symbols), which shows
that the disky fraction of all three samples decreases with increasing luminosity and dynamical
mass. The grey solid and dashed lines indicate the predictions based on the null-hypothesis, which
have been computed using equations (5)–(7). Overall, these predictions are in excellent agreement
with the data, indicating that elliptical galaxies with ongoing star formation or with an AGN do
not have a significantly different morphology (statistically speaking) than other ellipticals of the
same luminosity or dynamical mass.
Finally, in Figure 11 we plot the disky fractions of NA, EL and AGN galaxies as function of
their group mass (upper panels) and group hierarchy (lower panels). For comparison, the grey
solid and dashed lines indicate the predictions based on the null-hypothesis. Although overall these
predictions are in good agreement with the data, there are a few noteworthy trends. At the massive
end (Mgroup >∼ 10
13h−1 M⊙) the disky fraction of AGN is higher than expected, while that of NA
galaxies is lower. The lower panels show that this mainly owes to the satellite galaxies in these
massive groups. Whereas the null-hypothesis accurately predicts the disky fractions of NA, EL and
AGN centrals, it overpredicts fdisky of NA satellites, while underpredicting that of AGN satellites
at the 3 σ level. These results clearly warrant a more detailed investigation with larger samples.
Note that only about half of the 847 galaxies in the H06 sample are also in our group catalog. A
future analysis based on larger SDSS samples and a more complete group catalog would sufficiently
boost the statistics to examine the trends identified here with higher confidence.
5. Discussion and conclusions
In spite of their outwardly bland and symmetrical morphology, early-type galaxies reveal a
far more complex structure, whose isophotes usually deviate from a purely elliptical shape. As
shown by Bender et al. (1989), these deviations correlate with other parameters; for example, boxy
early-type galaxies are on average brighter and bigger than disky galaxies and are supported by
anisotropic pressure. Early-type galaxies with disky isophotes, on the other hand, are consistent
with being isotropic oblate rotators. With the advent of large galaxy redshift surveys such as
the SDSS, it is now possible to collect large and homogeneous samples of early-type galaxies and
quantify these correlations in much greater detail. In addition, it also allows for a detailed study
of the relation between morphology and environment.
– 14 –
We have used a sample of 847 early-type galaxies imaged by the SDSS and analyzed by Hao et
al. (2006a) to study the fraction of disky galaxies (fdisky) as a function of their absolute magnitude
MB , their dynamical mass Mdyn and the mass of the dark matter halo Mgroup in which they are
located. Using the Hα, Hβ, [OIII] and [NII] emission lines in the SDSS spectra we have split the
sample in AGN galaxies, emission-line (EL) galaxies, and non-active (NA) galaxies (see Figure 4).
In addition we also constructed two sub-samples of those ellipticals that have also been detected in
the radio (in FIRST) or in soft X-rays (with ROSAT), and we have analyzed the relations between
fdisky and the level of AGN activity in the optical and the radio, and the strength of soft X-ray
emission.
The fraction of disky galaxies in the full sample decreases strongly with increasing luminosity
and dynamical mass (see Figure 2). More quantitatively, fdisky decreases from ∼ 0.8 at MB −
5 log(h) = −18.6 (Mdyn = 6 × 1010h−1M⊙) to ∼ 0.5 at MB − 5 log(h) = −20.6 (Mdyn = 3 ×
1011h−1M⊙). This indicates a smooth transition between disky and boxy shapes, which is well
represented by a log-linear relation between fdisky and luminosity or dynamical mass (at least over
the ranges probed here). The relatively large sample allows us to measure these relations with a
good degree of accuracy that is robust against the uncertainties involved in the measurement of
the A4 parameter.
We have used these log-linear relations to test the null-hypothesis that the isophotal shape of
early-type galaxies depends only on their absolute magnitude or dynamical mass. The main result
of this paper is that the data is fully consistent with this simple ansatz, and that the correlations
seen among group mass, group hierarchy (central vs. satellite), soft X-ray emission, activity (both
in the optical and in the radio) and the disky/boxy morphology of an early-type galaxy reflect
the dependence of each of these properties on galaxy luminosity. In fact, the luminosity (mass)
dependence of fdisky predicts, with good accuracy, the following observed trends:
1. The variation of fdisky of central and satellite galaxies in the sample as a function of their
group halo mass (see Figure 2).
2. The constancy of fdisky of EL and AGN galaxies with respect to their [OIII] luminosity (see
Figure 9).
3. The decreasing fdisky of NA, EL and AGN galaxies with increasing MB and Mdyn (see Figure
4. The dependence of fdisky of NA, EL and AGN galaxies on their group halo mass and hierarchy
(see Figure 11).
5. The average value of fdisky among those sample galaxies detected by ROSAT and FIRST.
The fact that our null-hypothesis is also consistent with the fraction of disky radio-emitters con-
tradicts Bender et al. (1989), who wrote that “the isophotal shape is the second parameter besides
– 15 –
luminosity determining the occurance of radio activity in ellipticals”. We claim instead, using a
much larger, more homogeneous sample, that the radio activity is merely a reflection of the multi-
variate dependence of radio activity, luminosity and morphology. We have further checked this
result using an inverse approach, based on the fradio - MB relation. Briefly, we derived, for the full
sample, the fraction of radio galaxies (fradio) with respect to the total as a function of MB , and
obtained a log-linear relation whereby fradio smoothly increases from 0.06 at MB−5 log(h) = −18.7
to 0.34 at MB − 5 log(h) = −20.7. The fraction of radio galaxies among disky and boxy galaxies in
the full sample turns out to be fradio(disky) = 0.17 ± 0.02 and fradio(boxy) = 0.23 ± 0.02. Entering
the mean absolute magnitude of disky and boxy galaxies in the fradio - MB relation, we obtain radio
fractions of 0.18 and 0.21, respectively, well within 1 σ from the observed values.
Although the data is in good overall agreement with the null-hypothesis, there are a few weak
deviations at the 1 (3 at most) σ level (throughout errors have been computed assuming Poisson
statistics). First of all, emission line galaxies with log(L[OIII]/L⊙) < 5.2 have a disky fraction
that is ∼ 3σ lower than predicted by the null-hypothesis. Note however, that for higher [OIII]
luminosities, the null-hypothesis is in excellent agreement with the disky fraction of EL galaxies.
Another mild discrepancy between data and null-hypothesis regards the disky fraction of ellipticals
detected by ROSAT, which is ∼ 1σ lower than predicted. Finally, the disky fraction of NA and
AGN satellites in groups with Mgroup >∼ 10
13h−1 M⊙ are slightly too high and low, with respect
to the null-hypothesis, respectively. Whether these discrepancies indicate a true shortcoming of
the null-hypothesis, and thus signal that the isophotal shape of early-type galaxies depends on
additional parameters, requires a larger sample even. In the relatively near future, the final SDSS
should be able to roughly double the size of the sample used here, while a group catalog of this final
SDSS should increase the statistics regarding the environmental dependencies by an even larger
amount.
The relations between fdisky and MB (Mdyn) derived here provide a powerful test-bench for
theories of galaxy formation. In particular, they can be used to constrain the nature and the
merging history of the progenitors of present-day early-type galaxies. In a follow-up paper, we
will use semi-analytical models featuring AGN and supernova feedback in order to predict and
understand the observed log-linear relations in terms of the amount of cold gas in the progenitors
at the time of the last merger and their mass ratio (Kang et al., in prep).
AP acknowledges useful discussions with Sandra Faber and John Kormendy. We thank an
anonymous referee for his/her useful comments on the paper. Funding for the creation and distri-
bution of the SDSS Archive has been provided by the Alfred P. Sloan Foundation, the Participating
Institutions, the National Aeronautics and Space Administration, the National Science Foundation,
the U.S. Department of Energy, the Japanese Monbukagakusho, and the Max Planck Society. The
SDSS Web site is http://www.sdss.org/. The SDSS is managed by the Astrophysical Research
Consortium (ARC) for the Participating Institutions. The Participating Institutions are The Uni-
http://www.sdss.org/
– 16 –
versity of Chicago, Fermilab, the Institute for Advanced Study, the Japan Participation Group, The
Johns Hopkins University, the Korean Scientist Group, Los Alamos National Laboratory, the Max-
Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New
Mexico State University, University of Pittsburgh, University of Portsmouth, Princeton University,
the United States Naval Observatory, and the University of Washington.
REFERENCES
Adelman-McCarthy, J.K., et al. 2006, ApJS, 162, 38
Anderson, S.F., et al. 2003, AJ, 126, 2209
Baldwin, J., Phillips, M., Terlevich, R. 1981, PASP, 93, 5
Barnes, J.E., Hernquist, L.E. 1991, ApJL, 370, L65
Barnes, J.E., Hernquist, L.E. 1996, ApJ, 471, 115
Becker, R.H., White, R.L., Helfand, D.J. 1995, ApJ, 450, 559
Bender, R. 1988, A&AS, 193, 7
Bender, R., Döbereiner, S., Möllenhoff, C., 1988, A&AS, 74, 385
Bender, R., Möllenhoff, C. 1987, A&A, 177, 71
Bender, R., Surma, P., Döbereiner, S., Möllenhoff, C., Madejsky, R. 1989, A&A, 217, 35
Bernardi, M., et al. 2003, AJ, 125, 1817
Best, P.N., Kauffmann, G., Heckman, T.M., Brinchmann, J., Charlot, S., Ivezić, Z., White, S.D.M.
2005, MNRAS, 362, 25
Blanton, M.R., et al. 2005, AJ, 129, 2562
Bower, R.G., Lucey, J.R., Ellis, R.S. 1992, MNRAS, 254, 601
Canizares, C.R., Fabbiano, G., Trinchieri, G. 1987, ApJ, 312, 503
Cappellari, M., et al. 2006, MNRAS, 366, 1126
Carter, D. 1987, ApJ, 312, 514
Cattaneo, A., Blaizot, J., Devriendt, J., Guiderdoni, B. 2005, MNRAS, 364, 407
Croton, D.J., et al. 2005, MNRAS, 356, 1155
Djorgovski, S., Davis, M. 1987, ApJ, 313, 59
– 17 –
Davies, R.L., Efstathiou, G., Fall, S.M., Illingworth, G., Schechter, P.L. 1983, ApJ, 266, 41
Dressler, A. 1980, ApJ, 236, 351
Dressler, A. 1987, ApJ, 317, 1
Ellis, R.S., Smail, I., Dressler, A., Couch, W.J., Oemler, A.Jr., Butcher, H., Sharples, R.M. 1997,
ApJ, 483, 582
Fabbiano, G. 1989, ARA&A, 27, 87
Faber, S.M., Jackson, R.E. 1976, ApJ, 204, 668
Faber, S.M., et al. 1997, AJ, 114, 1771
Ferrarese, L., van den Bosch, F.C., Ford, H.C., Jaffe, W., O’Connell, R.W. 1994, AJ, 108, 1598
Gebhardt, K., et al. 1996, AJ, 112, 105
Graham, A.W., Erwin, P, Caon, N., Trujillo, I. 2001, ApJ, 563, L11
Hao, C.N., Mao, S., Deng, Z.G., Xia, X.Y., Wu, H. 2006a, MNRAS, 370, 1339
Hao, C.N., Mao, S., Deng, Z.G., Xia, X.Y., Wu, H. 2006b, MNRAS, 373, 1264
Jaffe, W., Ford, H.C., O’Connell, R.W., van den Bosch, F.C., Ferrarese, L. 1994, AJ, 108, 1567
Jedrzejewski, R.I. 1987, MNRAS, 226, 747
Jesseit, R., Naab, Y., Burkert, A. 2005, MNRAS, 360, 1185
Kang, X., van den Bosch, F.C., Pasquali, A., 2007, preprint (arXiv:0704.0932)
Kauffmann, G., et al. 2003a, MNRAS, 341, 33
Kauffmann, G., et al. 2003b, MNRAS, 346, 1055
Kewley, L.J., Groves, B., Kauffmann, G., Heckman, T., 2006, MNRAS, 372, 961
Khochfar, S., Burkert, A. 2005, MNRAS, 359, 1379
Kuntschner, H., et al. 2006, MNRAS, 369, 497
Lauer, T.R. 1985, MNRAS, 216, 429
Lauer, T.R., et al. 1995, AJ, 110, 2622
Lauer, T.R., et al. 2005, AJ, 129, 2138
McMahon, R.G., White, R.L., Helfand, D.J., Becker, R.H. 2002, ApJ Supl.Ser., 143, 1
http://arxiv.org/abs/0704.0932
– 18 –
Mihos, J.C., Hernquist, L.E. 1994, ApJL, 425, L13
Mihos, J.C., Hernquist, L.E. 1996, ApJ, 464, 641
Milosavljević, M., Merritt D., Rest A., van den Bosch F.C. 2002, MNRAS, 331, 51
Morganti, R., et al. 2006, MNRAS, 371, 157
Naab, Y., Jesseit, R., Burkert, A. 2006, MNRAS, in press (astro-ph/0605155)
Nieto, J.-L., Capaccioli, M., Held, E.V. 1988, A&A, 195, 1
O’Sullivan, E., Forbes, D.A., Ponman, T.J. 2001, MNRAS, 328, 461
Pellegrini, S. 1999, A&A, 351, 487
Pellegrini, S. 2005, MNRAS, 364, 169
Ravindranath, S., Ho, L.C., Peng, C.Y., Filippenko, A.V., Sargent, W.L.W. 2001, AJ, 122, 653
Rest, A., et al. 2001, AJ, 121, 2431
Sarzi, M., et al. 2006, MNRAS, 366, 1151
Scorza, C., Bender, R. 1995, A&A, 293, 20
Scorza, C., van den Bosch, F.C. 1998, MNRAS, 300, 469
Serra, P., Trager, S.C., van der Hulst, J.M., Oosterloo, T.A., Morganti, R. 2006, A&A, 453, 493
Shioya, Y., Taniguchi, Y. 1993, PASJ, 43, 39
Smith, J.A. et al. 2002, AJ, 123, 2121
Springel, V. 2000, MNRAS, 312, 859
Stoughton, C., et al., 2002, AJ, 123, 485
Strauss, M.A., et al, 2002, AJ, 124, 1810
Toomre, A., Toomre, J. 1972, ApJ, 178, 623
Trager, S.C., Faber, S.M., Worthey, G., González, J.J. 2000, AJ, 119, 1645
Tran, H.D., Tsvetanov, Z., Ford, H.C., Davies, J., Jaffe, W., van den Bosch, F.C., Rest, A. 2001,
AJ, 121, 2928
Trinchieri, G., Fabbiano, G. 1985, ApJ, 296, 447
Veilleux, S., Osterbrock, D. 1987, ApJS, 63, 295
http://arxiv.org/abs/astro-ph/0605155
– 19 –
Visvanathan, N., Sandage, A. 1977, 216, 214
Voges, W., et al. 1999, A&A, 349, 389
Weinmann, S., van den Bosch, F.C., Yang, X., Mo, H.J. 2006, MNRAS, 366, 2
Yang, X., Mo, H.J., van den Bosch, F.C., Jing, Y.P. 2005, MNRAS, 356, 1293
This preprint was prepared with the AAS LATEX macros v5.2.
– 20 –
Fig. 1.— The distributions in MB , Mdyn and Mgroup for the full sample, split between central and
satellite galaxies (grey crosses and open triangles respectively, in the top panels) and between disky
and boxy galaxies (open triangles and grey crosses respectively, in the bottom panels).
– 21 –
Fig. 2.— Top panels: the distributions of MB , Mdyn and Mgroup as a function of the isophotal
parameter A4 for the full sample, also split between central (grey crosses) and satellite (open
triangles) galaxies. Bottom panels: the fraction of disky galaxies as a function of MB and Mdyn
for the full sample. The fraction of disky galaxies is also shown per bin of group halo mass Mgroup
for central (crosses) and satellite (open triangles) galaxies. The errorbars are at the 1 σ level, and
were computed assuming Poisson statistics. The grey solid and dashed lines in the left hand-side
and middle panels are the best fits to the fractions of disky galaxies across the full sample. The
same lines in the right hand-side panel represent the predicted fractions of disky galaxies from the
working null-hypothesis.
– 22 –
Fig. 3.— Impact of individual A4 measurement errors: the slope and the zero-point of the log-
linear correlation between fdisky and MB (equation 4) are shown as a function of the standard
deviation of the Gaussian used to simulate errors on the observed A4 values. The grey shaded
areas indicate the best-fitting slope (0.17 ± 0.03) and zero-point (0.61 ± 0.02) in equation 4, and
the mean uncertainty on the observed A4 parameter (0.0012 ± 0.0008) as measured by H06.
– 23 –
-1 -0.5 0 0.5 1
Liners
Seyferts
Fig. 4.— The BPT diagram for the galaxies in the full sample, whose [OIII] λ5007 and Hα emission
lines were detected with a S/N ratio larger than 3. These objects have been split among Seyfert
galaxies of type 2, LINERs, Transition Objects (TO) and star-forming (SB) according to Kauffmann
et al. (2003b).
– 24 –
Fig. 5.— The distributions of the full sample (grey shaded area) and the 5 different activity classes
defined in §4.1 in absolute magnitude MB (in AB system). Each distribution is normalized by the
size of the sample from where it was extracted.
– 25 –
Fig. 6.— As in Figure 5, but for the group halo mass Mgroup.
– 26 –
Fig. 7.— As in Figure 5, but for the luminosity in the [OIII] line. Here, the grey shaded histogram
refers to emission-line and AGN galaxies together.
– 27 –
-19 -20 -21
ROSAT
FIRST
11 11.5
12 12.5 13 13.5 14
Group hierarchy
Fig. 8.— The fraction of galaxies in the 5 different activity classes with respect to the full sample
as a function of MB , Mdyn, Mgroup and split between centrals and satellites.
– 28 –
Fig. 9.— Top panels: the distribution of the luminosities in the [OIII] line, at 1.4 GHz and in
the soft X-rays, as a function of the isophotal parameter A4. Emission-line and AGN galaxies
are represented with grey filled triangles and black filled squares respectively. Bottom panels: the
fraction of disky galaxies among emission-line (triangles) and AGN (squares) galaxies as a function
of the luminosity in the [OIII] line (left hand-side panel). The grey solid and dotted lines trace the
predictions from the working null-hypothesis in MB . The fraction of disky galaxies for the galaxies
detected and non-detected by FIRST and ROSAT are shown in the middle and right hand-side
panels, together with the predictions from equations (4) and (5). The errorbars are at the 1 σ level,
and were computed assuming Poisson statistics.
– 29 –
Fig. 10.— The fraction of disky galaxies for non-active (black filled circles), emission-line (black
filled triangles) and AGN (black filled squares) galaxies as a function of MB and Mdyn. The grey
solid and dashed lines represent the predictions from equation (5), i.e. the working null-hypothesis.
The errorbars are at the 1 σ level, and were computed assuming a Poisson statistics.
– 30 –
Fig. 11.— As in Figure 10, but splitting the non-active, emission-line and AGN galaxies between
centrals and satellites.
– 31 –
Table 1. Activity Classes
AGN EL NA FIRST ROSAT
AGN 314 −− −− 91 17
EL −− 383 −− 53 16
NA −− −− 150 18 7
FIRST 91 53 18 162 22
ROSAT 17 16 7 22 40
Note. — The number of sample galaxies in the
five different activity classes. Note that the AGN,
EL and NA classes are mutually exclusive.
– 32 –
Table 2. Fraction of disky galaxies across the activity classes
AGN EL NA FIRST ROSAT
AGN 0.69 −− −− 0.65 0.47
EL −− 0.64 −− 0.45 0.50
NA −− −− 0.63 0.61 0.43
FIRST 0.65 0.45 0.61 0.58 0.54
ROSAT 0.47 0.50 0.43 0.54 0.47
	Introduction
	Data
	Sample Selection
	Isophotal Analysis
	Additional data
	The disky fraction across the sample
	The disky fraction of active early-type galaxies
	Defining different activity classes
	The relation between activity and morphology
	Discussion and conclusions
ABSTRACT
  We study the dependence of the isophotal shape of early-type galaxies on
their absolute B-band magnitude, their dynamical mass, and their nuclear
activity and environment, using an unprecedented large sample of 847 early-type
galaxies identified in the SDSS by Hao et al (2006). We find that the fraction
of disky galaxies smoothly decreases with increasing luminosity. The large
sample allows us to describe these trends accurately with tight linear
relations that are statistically robust against the uncertainty in the
isophotal shape measurements. There is also a host of significant correlations
between the disky fraction and indicators of nuclear activity (both in the
optical and in the radio) and environment (soft X-rays, group mass, group
hierarchy). Our analysis shows however that these correlations can be
accurately matched by assuming that the disky fraction depends only on galaxy
luminosity or mass. We therefore conclude that neither the level of activity,
nor group mass or group hierarchy help in better predicting the isophotal shape
of early-type galaxies.

<|endoftext|><|startoftext|>
arXiv:0704.0932v1  [astro-ph]  6 Apr 2007
Mon. Not. R. Astron. Soc. 000, 1–13 (2000) Printed 28 August 2021 (MN LATEX style file v1.4)
On the Origin of the Dichotomy of Early-Type Galaxies:
The Role of Dry Mergers and AGN Feedback
X. Kang1,2⋆, Frank C. van den Bosch1, A. Pasquali1
1Max-Planck-Institute for Astronomy, Königstuhl 17, D-69117 Heidelberg, Germany
2Shanghai Astronomical Observatory; the Partner Group of MPA, Nandan Road 80, Shanghai 200030, China
ABSTRACT
Using a semi-analytical model for galaxy formation, combined with a large N -body
simulation, we investigate the origin of the dichotomy among early-type galaxies. In
qualitative agreement with previous studies and with numerical simulations, we find
that boxy galaxies originate from mergers with a progenitor mass ratio n < 2 and with
a combined cold gas mass fraction Fcold < 0.1. Our model accurately reproduces the
observed fraction of boxy systems as a function of luminosity and halo mass, for both
central galaxies and satellites. After correcting for the stellar mass dependence, the
properties of the last major merger of early-type galaxies are independent of their halo
mass. This provides theoretical support for the conjecture of Pasquali et al. (2007) that
the stellar mass (or luminosity) of an early-type galaxy is the main parameter that
governs its isophotal shape. If wet and dry mergers mainly produce disky and boxy
early-types, respectively, the observed dichotomy of early-type galaxies has a natural
explanation within the hierarchical framework of structure formation. Contrary to
naive expectations, the dichotomy is independent of AGN feedback. Rather, we argue
that it owes to the fact that more massive systems (i) have more massive progenitors,
(ii) assemble later, and (iii) have a larger fraction of early-type progenitors. Each
of these three trends causes the cold gas mass fraction of the progenitors of more
massive early-types to be lower, so that their last major merger involved less cold
gas (was more “dry”). Finally, our model predicts that (i) less than 10 percent of all
early-type galaxies form in major mergers that involve two early-type progenitors, (ii)
more than 95 percent of all boxy early-type galaxies with M∗ <∼ 2 × 10
10h−1 M⊙ are
satellite galaxies, and (iii) about 70 percent of all low mass early-types do not form
a supermassive black hole binary at their last major merger. The latter may help to
explain why low mass early-types have central cusps, while their massive counterparts
have cores.
Key words: dark matter — galaxies: elliptical and lenticular — galaxies: interactions
— galaxies: structure — galaxies: formation
1 INTRODUCTION
Ever since the seminal paper by Davies et al. (1983) it
is clear that early-type galaxies (ellipticals and lenticulars,
hereafter ETGs) can be split in two distinct sub-classes.
Davies et al. showed that bright ETGs typically have lit-
tle rotation, such that their flattening must originate from
anisotropic pressure. This is consistent with bright ellipti-
cals being in general triaxial. Low luminosity ellipticals, on
the other hand, typically have rotation velocities that are
consistent with them being oblate isotropic rotators (see
Emsellem et al. 2007 and Cappellari et al. 2007 for a more
contemporary description). These different kinematic classes
⋆ E-mail:kang@mpia.de
also have different morphologies and different central surface
brightness profiles. In particular, bright, pressure-supported
systems usually have boxy isophotes and luminosity pro-
files that break from steep outer power-laws to shallow in-
ner cusps (often called ‘cores’). The low luminosity, rotation
supported systems, on the other hand, often reveal disky
isophotes and luminosity profiles with a steep central cusp
(e.g., Bender 1988; Nieto et al. 1988; Ferrarese et al. 1994;
Gebhardt et al. 1996; Rest et al. 2001; Lauer et al. 2005,
2006). Finally, the bimodality of ETGs has also been found
to extend to their radio and X-ray properties. Objects which
are radio-loud and/or bright in soft X-ray emission generally
have boxy isophotes, while disky ETGs are mostly radio-
quiet and faint in soft X-rays (Bender et al. 1989; Pellegrini
1999, 2005; Ravindranath et al. 2001).
c© 2000 RAS
http://arxiv.org/abs/0704.0932v1
2 Kang et al.
Recently, Hao et al. (2006) constructed a homoge-
neous samples of 847 ETGs from the SDSS DR4 catalogue
(Adelman-McCarthy et al. 2006), and analyzed their isopho-
tal shapes. This sample was used by Pasquali et al. (2007:
hereafter P07) to investigate the relative fractions of disky
and boxy ETGs as function of luminosity, stellar mass and
environment. They found that the disky fraction decreases
smoothly with increasing (B-band) luminosity, stellar mass,
and halo mass, where the latter is obtained from the SDSS
group catalogue of Weinmann et al. (2006). In addition, the
disky fraction is found to be higher for satellite galaxies than
for central galaxies in a halo of the same mass. These data
provide a powerful benchmark against which to test models
for the formation of ETGs.
Within the framework of hierarchical structure for-
mation, elliptical galaxies are generally assumed to form
through major mergers (e.g., Toomre & Toomre 1972;
Schweizer 1982; Barnes 1988; Hernquist 1992; Kauffmann,
White & Guiderdoni 1993). In this case, it seems logical that
the bimodality in their isophotal and kinematical properties
must somehow be related to the details of their merger his-
tories. Using dissipationless N-body simulations it has been
shown that equal mass mergers of disk galaxies mainly result
in slowly rotating ETGs with boxy (but sometimes disky)
isophotes, while mergers in which the mass ratio between the
progenitor disks is significantly different from unity mainly
yields disky ETGs (Negroponte & White 1983; Barnes 1988;
Hernquist 1992; Bendo & Barnes 2000; Naab & Burkert
2003; Bournaud et al. 2004, 2005; Naab & Trujillo 2006).
However, simulations that also include a dissipative gas com-
ponent and star formation have shown that the presence of
even a relatively small amount of cold gas in the progenitors
results in a merger remnant that resembles a disky ellipti-
cal even when the mass ratio of the progenitors is close to
unity (Barnes & Hernquist 1996; Naab et al. 2006a; Cox
et al. 2006a). This suggests that boxy ETGs can only form
out of dry, major mergers (see also discussion in Faber et
al. 1997). In this paper we test this paradigm using a semi-
analytical model for galaxy formation and the observational
constraints of P07.
Our study is similar to those of Khochfar & Burk-
ert (2005; hereafter KB05) and Naab et al. (2006b; here-
after N06), who also used semi-analytical models to explore
whether the dichotomy of elliptical galaxies can be related to
their merger properties. However, our analysis differs from
theirs on the following grounds.
• We use a numerical N-body simulation to construct the
merger histories of dark matter haloes. The models of KB05
and N06, on the other hand, used merger trees based on
the extended Press-Schechter (EPS) formalism (e.g., Lacey
& Cole 1993). As we will show, this results in significant
differences.
• Because of our use of numerical N-body simulations,
our model more accurately traces the dynamical evolution of
dark matter subhaloes with their associated satellite galax-
ies. In particular, it takes proper account of dynamical fric-
tion, tidal stripping and the merging between subhaloes.
• Contrary to KB05 and N06, we include a prescription
for the feedback from active galactic nuclei (AGN) in our
semi-analytical model.
• Our semi-analytical model is tuned to reproduce the
luminosity function and the color-bimodality of the redshift
zero galaxy population (see Kang et al. 2005). The works of
KB05 and N06 do not mention such a comparison.
• Our criteria for the production of boxy ETGs are dif-
ferent from those used in KB05 and N06.
• We use a much larger, more homogeneous data set to
constrain the models.
This paper is organized as follows. In Section 2 we de-
scribe our N-body simulation and semi-analytical model,
and outline the methodology. The results are described in
Section 3 and discussed in Section 4. We summarize our
findings in Section 5
2 METHODOLOGY
The aim of this paper is to investigate to what extend semi-
analytical models of galaxy formation can reproduce the
ecology of ETGs, in particular the fractions of disky and
boxy systems as function of luminosity and halo mass. Par-
tially motivated by numerical simulations of galaxy mergers,
both with and without gas, we adopt a framework in which
(i) ETGs are red and dominated by a spheroidal compo-
nent, (ii) ETGs are the outcome of major mergers, (iii) the
remnant is boxy if the merger is sufficiently “dry” (i.e., the
progenitors have little or no cold gas) and sufficiently “ma-
jor” (i.e., the progenitors have roughly equal masses) and
(iv) a boxy elliptical becomes a disky elliptical if newly ac-
creted material builds a sufficiently large stellar disk.
2.1 N-body simulation and model descriptions
In order to have accurate merger trees, and to be able
to follow the dynamical evolution of satellite galaxies, we
use a numerical simulation of the evolution of dark mat-
ter which we populate with galaxies using a state-of-the-art
semi-analytical model for galaxy formation. The numerical
simulation has been carried out by Jing & Suto (2002) using
a vectorized-parallel P3M code. It follows the evolution of
5123 particles in a cosmological box of 100h−1 Mpc, assum-
ing a flat ΛCDM ‘concordance’ cosmology with Ωm = 0.3,
σ8 = 0.9, and h = H0/100 kms
−1 Mpc−1 = 0.7. Each parti-
cle has a mass of 6.2 × 108h−1 M⊙. Dark matter haloes are
identified using the friends-of-friends (FOF) algorithm with
a linking length equal to 0.2 times the mean particle sepa-
ration. For each halo thus identified we compute the virial
radius, rvir, defined as the spherical radius centered on the
most bound particle inside of which the average density is
340 times the average density of the Universe (cf. Bryan &
Norman 1998). The virial mass is simply defined as the mass
of all particles that have halocentric radii r ≤ rvir. Since our
FOF haloes have a characteristic overdensity of ∼ 180 (e.g.,
White 2002), the virial mass is typically smaller than the
FOF mass.
Dark matter subhaloes within each FOF (parent) halo
are identified using the SUBFIND routine described in
Springel et al. (2001). In the present study, we use all haloes
and subhaloes with masses down to 6.2 × 109h−1M⊙ (10
particles). Using 60 simulation outputs between z = 15 and
z = 0, equally spaced in log(1 + z), Kang et al. (2005; here-
after K05) constructed the merger history for each (sub)halo
c© 2000 RAS, MNRAS 000, 1–13
On the Origin of the Dichotomy of Early-Type Galaxies 3
in the simulation box, which are then used in the semi-
analytical model. In what follows, whenever we refer to a
halo, we mean a virialized object which is not a sub-structure
of a larger virialized object, while subhaloes are virialized ob-
jects that orbit within a halo. In addition, (model) galaxies
associated with haloes and subhaloes are referred to as cen-
tral galaxies and satellites, respectively.
The semi-analytical model used to populate the haloes
and subhaloes with galaxies is described in detail in K05, to
which we refer the reader for details. Briefly, the model as-
sumes that the baryonic material accreted by a dark matter
halo is heated to the virial temperature. The gas then cools
radiatively and settles down into a centrifugally supported
disk, in which the star formation rate is proportional to the
total amount of cold gas, and inversely proportional to the
dynamical time of the disk. The energy feedback from su-
pernova is related to the initial stellar mass function (IMF)
and proportional to the star formation rate. It is assumed
that the gas that is reheated by supernova feedback remains
bound to the host halo so that it can cool back onto the
disk at later stages. When the subhalo associated with a
satellite galaxy is dissolved in the numerical simulation the
satellite galaxy becomes an “orphan” galaxy, which is as-
sumed to merge with the central galaxy of the parent halo
after a dynamical friction time (computed assuming stan-
dard Chandrasekhar dynamical friction). When two galaxies
merge, the outcome is assumed to depend on their mass ratio
n ≡ M1/M2 with M1 ≥ M2. If n ≤ 3 the merger is assumed
to result in the formation of an elliptical galaxy, and we
speak of a “major merger”. Any cold gas available in both
progenitors is turned into stars. This is supported by hy-
drodynamical simulations, which show that major mergers
cause the cold gas to flow to the center where the resulting
high gas density triggers a starburst (e.g., Barnes & Hern-
quist 1991, 1996; Mihos & Hernquist 1996; Springel 2000;
Cox et al. 2006a,b; Di Matteo et al. 2007). A new disk of cold
gas and stars may form around the elliptical if new gas can
cool in the halo of the merger remnant, thus giving rise to
a disk-bulge system. If n > 3 we speak of a “minor merger”
and we simply add the cold gas of the less massive progeni-
tor to that of the disk of the more massive progenitor, while
its stellar mass is added to the bulge of the massive progen-
itor. The semi-analytical model also includes a prescription
for “radio-mode” AGN feedback as described in Kang, Jing
& Silk (2006; see also Section 3.2). This ingredient is es-
sential to prevent significant amounts of star formation in
the brightest galaxies, and thus to ensure that these systems
are predominantly members of the red sequence (e.g., Cat-
taneo et al. 2006; De Lucia et al. 2006; Bower et al. 2006;
Croton et al. 2006). Finally, luminosities for all model galax-
ies are computed using the predicted star formation histo-
ries and the stellar population models of Bruzual & Charlot
(2003). Throughout we assume a Salpeter IMF and we self-
consistently model the metalicities of gas and stars, includ-
ing metal-cooling.
As shown in K05 and Kang et al. (2006) this model ac-
curately fits, among others, the galaxy luminosity function
at z = 0, the color bimodality of the z = 0 galaxy popula-
tion, and the number density of massive, red galaxies out to
z ∼ 3. We emphasize that in this paper we use this model
without changing any of its parameters.
2.2 Predicting Isophotal Shapes
In our model we determine whether an elliptical galaxy is
disky or boxy as follows. Using the output at z = 0 we first
identify the early-type (E/S0) galaxies based on two criteria.
First of all, following Simien & de Vaucouleurs (1986), we
demand that an ETG has a bulge-to-disk ratio in theB-band
of LB,bulge/LB,total ≥ 0.4. In addition, we require the B−V
color of the galaxy to be red. Following Hao et al. (2006) and
P07, we adopt B − V > 0.8. We have verified that none of
our results are sensitive to the exact choice of these selection
criteria.
Having thus identified all ETGs at z = 0, we subse-
quently trace their formation histories back until their last
major merger, and register the mass ratio n of the merger
event, as well as the total cold gas mass fraction at that
epoch, defined as
Fcold ≡
Mcold,i
(Mcold,i +M∗,i)
Here Mcold,i and M∗,i are the cold gas mass and stellar mass
of progenitor i, respectively. We adopt the hypothesis that
the merger results in a boxy elliptical if, and only if, n < ncrit
and Fcold < Fcrit. The main aim of this paper is to use the
data of P07 to constrain the values of ncrit and Fcrit, and to
investigate whether a model can be found that is consistent
with the observed fraction of boxy (disky) ETGs as function
of both galaxy luminosity and halo mass.
The final ingredient for determining whether an ETG
is disky or boxy is the potential regrowth of a stellar disk.
Between its last major merger and the present day, new gas
in the halo of the remnant may cool out to form a new disk.
In addition, the ETG may also accrete new stars and cold
gas via minor mergers (those with n > 3). Any cold gas in
those accreted systems is added to the new disk, where it
is allowed to form new stars. Whenever the stellar disk has
grown sufficiently massive, its presence will reveal itself in
the isophotes, and the system changes from being boxy to
being disky. To take this effect into account, we follow KB05
and we reclassify a boxy system as disky if at z = 0 it has
grown a disk with a stellar mass that contributes more than
a fraction fd,max of the total stellar mass in the galaxy. In
our fiducial model we set fd,max = 0.2. This is motivated by
Rix & White (1990), who have shown that if an embedded
stellar disk contains more than ∼ 20 percent of the total
stellar mass, the isophotes of its host galaxy become disky.
Note that the same value for fd,max has also been used in
the analysis of KB05.
3 RESULTS
In Figure 1 we show the fraction of boxy ETGs, fboxy, as a
function of the luminosity in the B-band (in the AB system).
Open squares with errorbars (reflecting Poisson statistics)
correspond to the data of P07, while the various lines indi-
cate the results obtained from three different models that
we describe in detail below. The (Poisson) errors on these
model predictions are comparable to those on the P07 data
and are not shown for clarity. Each column and row show,
c© 2000 RAS, MNRAS 000, 1–13
4 Kang et al.
Figure 1. The boxy fraction of ETGs as function of their B-band magnitude (in the AB system). The open, red squares with (Poissonian)
errorbars correspond to the data of P07, and are duplicated in each panel. The solid, dotted and dashed lines correspond to the three
models discussed in the text. Different columns and rows correspond to different values for the critical progenitor mass ratio, ncrit, and
the critical cold gas mass fraction, Fcrit, respectively, as indicated.
respectively, the results for different values of ncrit and Fcrit
as indicated.
We start our investigation by setting fd,max = 1, which
implies that the isophotal shape of an elliptical galaxy (disky
or boxy) is assumed to be independent of the amount of mass
accreted since the last major merger. Although we don’t con-
sider this realistic, it is a useful starting point for our investi-
gation, as it clearly separates the effects of ncrit and Fcrit on
the boxy fraction. The results thus obtained from our semi-
analytical model with AGN feedback are shown in Figure 1
as dotted lines. If we assign the isophotal shapes of ETGs
depending only on the progenitor mass ratio n, which cor-
responds to setting Fcrit = 1, we obtain the boxy fractions
shown in the upper panels. In agreement with KB05 (see
their Figure 1) this results in a boxy fraction that is virtually
independent of luminosity, in clear disagreement with the
data. Note that, for a given value of ncrit, our boxy fractions
are significantly higher than in the model of KB05. For ex-
ample, for ncrit = 2 we obtain a boxy fraction of ∼ 0.6, while
KB05 find that fboxy ∼ 0.33. This mainly reflects the differ-
ence in the type of merger trees used. As discussed above, we
use the merger trees extracted from a N-body simulation,
while KB05 use monte-carlo merger trees based on the EPS
formalism. It is well known that EPS merger trees predict
masses for the most massive progenitors that are too large
(e.g., Lacey & Cole 1994; Somerville et al. 2000; van den
Bosch 2002; Benson, Kamionkowski & Hassani 2005). This
implies that the number of mergers with a small progenitor
c© 2000 RAS, MNRAS 000, 1–13
On the Origin of the Dichotomy of Early-Type Galaxies 5
Figure 2. The fractions of last major mergers between centrals and satellites (C-S; solid lines) and between two satellites (S-S; dashed
lines) that result in the formation of an ETG as function of its stellar mass at z = 0. Results are shown for all ETGs (left-hand panel) and
for those with Fcold < 0.1 (right-hand panel). Black and red lines correspond to models with and without AGN feedback, respectively.
Low mass ETGs that form in dry mergers, and hence end up being boxy, mainly form out of S-S mergers. At the massive end, the
fraction of ETGs that form out of S-S mergers with Fcold = 0.1 depends strongly on the presence or absence of AGN feedback. See text
for detailed discussion.
mass ratio n will be too small, which explains the difference
between our results and those of KB05. Using cosmological
SPH simulations, Maller et al. (2006) found that the distri-
bution of merger mass ratios scales as dN/dn ∝ n−0.8. This
means that 60 percent of all galaxy mergers with n < 3 have
a progenitor mass ratio n < 2, in excellent agreement with
our results.
The dotted lines in the remaining panels of Figure 1
show the results obtained for three different values of the
maximum cold gas mass fraction, Fcrit = 0.6, 0.3, and 0.1.
Lowering Fcrit has a strong impact on the boxy fraction
of low-luminosity ETGs, while leaving that of bright ETGs
largely unaffected. As we show in §4 this mainly owes to the
fact that Fcold decreases strongly with increasing luminosity.
Consequently, by changing Fcrit we can tune the slope of
the relation between fboxy and luminosity, while ncrit mainly
governs the absolute normalization. We obtain a good match
to the P07 data for ncrit = 2 and Fcrit = 0.1 (third panel in
lowest row). This implies that boxy ETGs only form out of
relatively dry and violent mergers, in good agreement with
numerical simulations.
3.1 The influence of disk regrowth
The analysis above, however, does not consider the impact
of the growth of a new disk around the merger remnant.
Since this may turn boxy systems into disky systems, it can
have a significant impact on the predicted fboxy. We now
take this effect into account by setting fd,max to its fiducial
value of 0.2.
The solid lines in Fig. 1 show the boxy fractions thus
obtained. A comparison with the dotted lines shows that the
newly formed disks only cause a significant decrease of fboxy
at the faint end. At the bright end, AGN feedback prevents
the cooling of hot gas, therewith significantly reducing the
rate at which a new disk can regrow†. However, when Fcrit =
0.1, we obtain the same boxy fractions for fd,max = 0.2 as
for fd,max = 1, even at the faint end. This implies that we
obtain the same conclusions as above: matching the data
of P07 requires ncrit ≃ 2 and Fcrit ≃ 0.1. In other words,
our constraints on ncrit and Fcrit are robust to exactly how
much disk regrowth is allowed before it reveals itself in the
isophotes.
Why do faint ETGs with Fcold < 0.1 not regrow signifi-
cant disks, while does with Fcold > 0.1 do? Note that during
the last major merger, the entire cold gas mass is converted
into stars in a starburst. Therefore, it is somewhat puzzling
that the galaxy’s ability to regrow a disk depends on its cold
gas mass fraction at the last major merger. As it turns out,
this owes to the fact that progenitors with a low cold gas
mass fraction are more likely to be satellite galaxies. Fig. 2
plots the fractions of ETGs with last major mergers be-
tween a central galaxy and a satellite (C-S; solid lines) and
between two satellites (S-S; dashed lines). Note that in our
model, S-S mergers occur whenever their dark matter sub-
haloes in the N-body simulation merge. Results are shown
for all ETGs (left-hand panel), and for only those ETGs
that have Fcold < 0.1 (right-hand panel). In our fiducial
model with AGN feedback (black lines) the most massive
ETGs almost exclusively form out of C-S mergers. Since a
satellite galaxy can never become a central galaxy, this is
† In the absence of cooling, the only way in which a galaxy can
(re)grow a disk is via minor mergers.
c© 2000 RAS, MNRAS 000, 1–13
6 Kang et al.
Figure 3. The progenitor mass ratio, n, (left-hand panel) and the cold gas mass fraction at the last major merger, Fcold, (right-hand
panel) as function of the z = 0 stellar mass, M∗, of ETGs. Solid lines with errorbars indicate the median and the 20
th and 80th percentiles
of the distributions. While the mass ratio of the progenitors of early-type galaxies is independent of its stellar mass, Fcold decreases
strongly with increasing M∗.
consistent with the fact that virtually all massive ETGs at
z = 0 are central galaxies (in massive haloes). Roughly 40
percent of all low mass ETGs have a last major merger be-
tween two satellite galaxies. However, when we only focus on
low mass ETGs with Fcold < 0.1, we find that ∼ 95 percent
of their last major mergers are between two satellite galax-
ies. Since the z = 0 descendents of S-S mergers will also
be satellites, this implies that virtually all boxy ETGs with
M∗ <∼ 2 × 10
10h−1 M⊙ are satellite galaxies. Furthermore,
since satellite galaxies do not have a hot gas reservoir (at
least not in our semi-analytical model) they can not regrow
a new disk by cooling. This explains why for Fcrit = 0.1 the
boxy fractions are independent of the value of fd,max.
3.2 The role of AGN feedback
Our semi-analytical model includes “radio-mode” AGN
feedback, similar to that in Croton et al. (2006), in order to
suppresses the cooling in massive haloes. This in turn shuts
down star formation in the central galaxies in these haloes,
so that they become red. In the absence of AGN feedback,
new gas continues to cool making the central galaxies in
massive haloes overly massive and too blue (e.g., Benson
et al. 2003; Bower et al. 2006; Croton et al. 2006; Kang et
al. 2006). In order to study its impact on fboxy as func-
tion of luminosity, we simply turn off AGN feedback in our
model. Although this results in a semi-analytical model that
no longer fits the galaxy luminosity function at the bright
end, and results in a color-magnitude relation with far too
many bright, blue galaxies, a comparison with the models
discussed above nicely isolates the effects that are directly
due to our prescription for AGN feedback.
The dashed lines in Figure 1 show the predictions of
our model without AGN feedback (and with fd,max = 0.2).
A comparison with our fiducial model (solid lines) shows
that apparently the AGN feedback has no impact on fboxy
for faint ETGs with MB −5 log h >∼ −18. At the bright end,
though, the model without AGN feedback predicts boxy
fractions that are significantly lower (for reasons that will
be discussed in §4.2). Consequently, the luminosity depen-
dence of fboxy is much weaker than in the fiducial case. The
only model that comes close to matching the data of P07
is the one with ncrit = 3 and Fcrit = 0.1. We emphasize,
though, that this model is not realistic. In addition to the
fact that this semi-analytical model does not fit the observed
luminosity function and color magnitude relation, a value of
ncrit = 3 is also very unlikely: numerical simulations have
clearly shown that mergers with a mass ratio near 1:3 almost
invariably result in disky remnants (e.g., Naab & Burkert
2003).
4 DISCUSSION
4.1 The Origin of the ETG Dichotomy
We now examine the physical causes for the various scal-
ings noted above. We start by investigating why our fiducial
model with ncrit = 2 and Fcrit = 0.1 is successful in repro-
ducing the luminosity dependence of fboxy, i.e., why it pre-
dicts that the boxy fraction increases with luminosity. Given
the method used to assign isophotal shapes to the ETGs in
our model, there are three possibilities: (i) brighter ETGs
have smaller progenitor mass ratios, (ii) brighter ETGs have
progenitors with smaller cold gas mass fractions, or (iii)
brighter ETGs have less disk regrowth after their last major
merger.
We can exclude (i) from the fact that the models that
ignore disk regrowth (i.e., with fd,max = 1) and that ignore
c© 2000 RAS, MNRAS 000, 1–13
On the Origin of the Dichotomy of Early-Type Galaxies 7
Figure 4. Contour plots for the number density of ETGs as function of their present day stellar mass, M∗, and their cold gas mass
fraction at the last major merger, Fcold. Results are shown both for our fiducial model with AGN feedback (left-hand panel), as well as
for the model without AGN feedback (right-hand panel). In both cases a clear bimodality is apparent: ETGs with large and low masses
formed out of dry and wet mergers, respectively. Note that this bimodality is present independent of the presence of AGN feedback.
the cold gas mass fractions (i.e., with Fcrit = 1) predict
that the boxy fraction is roughly independent of luminosity
(dotted lines in upper panels of Fig. 1). This suggests that
the distribution of n is roughly independent of the (present
day) luminosity of the ETGs. This is demonstrated in the
left-hand panel of Fig. 3, were we plot n as function of,
M∗, the stellar mass at z = 0. Each dot corresponds to an
ETG in our fiducial model, while the solid black line with
the errorbars indicates the median and the 20th and 80th
percentiles of the distribution: clearly the progenitor mass
ratio is independent of M∗.
The boxy fraction of our best-fit model with ncrit = 2
and Fcrit = 0.1 is also independent of the regrowth of
disks, which is evident from the fact that the models with
fd,max = 1 (dotted lines) and fd,max = 0.2 (solid lines) pre-
dict boxy fractions that are indistinguishable. Therefore op-
tion (iii) can also be excluded, and the luminosity depen-
dence of fboxy thus has to indicate that the progenitors of
more luminous ETGs have a lower gas mass fraction. That
this is indeed the case can be seen from the right-hand panel
of Fig. 3 which shows Fcold as function of M∗. Once again,
the solid black line with the errorbars indicates the median
and the 20th and 80th percentiles of the distribution. Note
that Fcold decreases strongly with increasing stellar mass;
the most massive ETGs form almost exclusively from dry
mergers with Fcold < 0.1.
The left-hand panel of Fig. 4 shows a different rendi-
tion of the relation between Fcold and M∗. Contours indicate
the number density, φ(Fcold,M∗), of ETGs in the Fcold-M∗
plane, normalized by the total number of ETGs at each given
M∗-bin, i.e.,
φ(Fcold,M∗) dFcold = 1 (2)
Note that φ(Fcold,M∗) is clearly bimodal: low mass ETGs
with M∗ <∼ 3 × 10
9h−1 M⊙ have high Fcold, while the pro-
genitors of massive ETGs have low cold gas mass fractions.
Clearly, if wet and dry mergers produce disky and boxy el-
lipticals, respectively, this bimodality is directly responsible
for the ETG dichotomy.
What is the physical origin of this bimodality? It is
tempting to expect that it owes to AGN feedback. After all,
in our model AGN feedback is more efficient in more massive
galaxies. Since more massive ETGs have more massive pro-
genitors, one could imagine that their cold gas mass fractions
are lower because of the AGN feedback. However, the right-
hand panel of Fig. 4 shows that this is not the case. Here
we show φ(Fcold,M∗) for the model without AGN feedback.
Somewhat surprisingly, this model predicts almost exactly
the same bimodality as the model with AGN feedback. Their
are subtle differences, which have a non-negligible effect on
the boxy fractions and which will be discussed in §4.2 be-
low. However, it should be clear from Fig. 4 that the overall
bimodality in φ(Fcold,M∗) is not due to AGN feedback.
In order to explore alternative explanations for the bi-
modality, Fig. 5 shows some relevant statistics. Upper and
lower panels correspond to the models with and without
AGN feedback, respectively. Here we focus on our fiducial
model with AGN feedback; the results for the model without
AGN feedback will be discussed in §4.2. The upper left-hand
panel shows the average cold gas mass fraction of individ-
ual galaxies, 〈fcold〉, as function of lookback time. Note that
here we use fcold to distinguish it from Fcold, which indi-
cates the cold gas mass fraction of the combined progeni-
tors taking part in a major merger, as defined in eq. (1).
Results are shown for galaxies of two different (instanta-
neous) stellar masses, M∗ = 3× 109h−1 M⊙ (red lines) and
M∗ = 3× 1010h−1 M⊙ (black lines), and for two (instanta-
neous) types: early-types (dotted lines) and late-types (solid
lines). Following N06, here we define early-types as systems
with a bulge-to-total stellar mass ratio of 0.6 or larger; con-
trary to our z = 0 selection criteria described in §2.1, we do
not include a color selection, simply because the overall color
of the galaxy population evolves as function of time. First
of all, note that 〈fcold〉 of galaxies of given type and given
mass decreases with increasing time (i.e., with decreasing
lookback time). This is simply due to the consumption by
star formation. Secondly, at a given time, early-type galax-
ies have lower gas mass fractions than late-type galaxies.
This mainly owes to the fact that at a major merger, which
creates an early-type, all the available cold gas is consumed
in a starburst. Consequently, each early-type starts its life
c© 2000 RAS, MNRAS 000, 1–13
8 Kang et al.
Figure 5. Various statistics of our semi-analytical models. Upper and lower panels refer to our models with and without AGN feedback,
respectively. Left-hand panels: The average cold gas mass fraction of individual galaxies as function of lookback time t. Results are shown
for galaxies of two different (instantaneous) stellar masses, M∗ = 3 × 10
9h−1 M⊙ (red lines) and M∗ = 3 × 10
10h−1 M⊙ (black lines),
and for two (instantaneous) types: early-types (dotted lines) and late-types (solid lines). Middle panels: The fractions of late-late (L-L;
solid lines) , early-late (E-L; dotted lines) and early-early (E-E; dashes lines) type mergers as function of the z = 0 stellar mass of the
resulting ETG. Right-hand panels: The average lookback time to the last major merger of z = 0 ETGs as function of their z = 0 stellar
mass. Results are shown separately for L-L mergers (solid line), E-L mergers (dotted line), and E-E mergers (dashed line). The errorbars
indicate the 20th and 80th percentiles of the distribution of the E-L mergers. For clarity, we do not show these percentiles for the L-L
and E-E mergers, though they are very similar. See text for detailed discussion.
with fcold = 0. Finally, massive galaxies have lower gas
mass fractions than their less massive counterparts. This
owes to the fact that more massive galaxies live, on average,
in more massive haloes, which tend to form (not assem-
ble!) earlier thus allowing star formation to commence at
an earlier epoch (see Neistein et al. 2006). In addition, the
star formation efficiency used in the semi-analytical model
is proportional to the mass of the cold gas times M0.73vir . As
discussed in K05, this scaling with the halo virial mass is re-
quired in order to match the observed 〈fcold〉(M∗) at z = 0
(see also Cole et al. 1994, 2000; De Lucia et al. 2004).
The middle panel in the upper row of Fig. 5 shows what
kind of galaxy types are involved in the last major mergers
of present-day ETGs. Solid, dotted and dashed curves show
the fractions of L-L, E-L and E-E mergers, where ‘L’ and ‘E’
refer to late-types and early-types, respectively. As above,
these types are based solely on the bulge-to-total mass ratio
of the galaxy and not on its color. In our semi-analytical
model, the lowest mass ETGs almost exclusively form via
L-L mergers. With increasing M∗, however, there is a pro-
nounced decrease of the fraction of L-L mergers, which are
mainly replaced by E-L mergers. The fraction of E-E merg-
ers increases very weakly with increasing stellar mass but
never exceeds 10 percent. Thus, although boxy ellipticals
form out of dry mergers, these are not necessarily mergers
between early-type galaxies. In fact, our model predicts that
the vast majority of all dry mergers involve at least one late-
type galaxy (though with a low cold gas mass fraction). This
is in good agreement with the SPH simulation of Maller et
al. (2006), who also find that E-E mergers are fairly rare.
However, it is in stark contrast to the predictions of the
semi-analytical model of N06, how find that more than 50
percent of the last major mergers of massive ellipticals are E-
E mergers. We suspect that the main reason for this strong
discrepancy is the fact that N06 used merger trees based on
the EPS formalism.
Finally, the upper right-hand panel of Fig. 5 plots the
average lookback time to the last major merger of ETGs as
function of their present day stellar mass. Results are shown
separately for L-L mergers (solid line), E-L mergers (dotted
line), and E-E mergers (dashed line). Clearly, more massive
ETGs assemble later (at lower lookback times). This mainly
owes to the fact that more massive galaxies live in more
massive haloes, which themselves assemble later (cf. Lacey
& Cole 1993; Wechsler et al. 2002; van den Bosch 2002;
Neistein et al. 2006; De Lucia et al. 2006). In addition, it is
clear that at fixed stellar mass, E-E mergers occur later than
c© 2000 RAS, MNRAS 000, 1–13
On the Origin of the Dichotomy of Early-Type Galaxies 9
L-L mergers, with E-L mergers in between. This difference,
however, is small compared to the scatter.
If we combine all this information, we infer that the
bimodality in φ(Fcold,M∗) owes to the following three facts:
• More massive ETGs have more massive progenitors
(this follows from the fact that n is independent of M∗).
Since at a given time more massive galaxies of a given type
have lower cold gas mass fractions, 〈Fcold〉 decreases with
increasing M∗.
• More massive ETGs assemble later (at lower redshifts).
Galaxies of given mass and given type have lower 〈fcold〉 at
later times. Consequently, 〈Fcold〉 decreases with increasing
• More massive ETGs have a larger fraction of early-type
progenitors. ETGs of a given mass have a lower cold gas
mass fraction than late type galaxies of the same mass, at
any redshift. In addition, E-L mergers occur at later times
than L-L mergers. Both these effects also contribute to the
fact that 〈Fcold〉 decreases with increasing M∗.
4.2 Is AGN feedback relevant?
A comparison of the upper and lower panels in Fig. 5
shows that the three effects mentioned above, and thus
the bimodality in φ(Fcold,M∗), are present independent of
whether or not the model includes feedback from active
galactic nuclei. There are only two small differences: with-
out AGN feedback massive ETGs (i) are more likely to result
from L-L mergers, and (ii) have a higher 〈fcold〉 (cf. black
dotted curves in the left-hand panels of Fig. 5). Both ef-
fects reflect that AGN feedback prevents the cooling of hot
gas around massive galaxies, therewith removing an impor-
tant channel for building a new disk. As is evident from
Fig. 4, these two effects only have a very mild impact on
φ(Fcold,M∗). We therefore conclude that the bimodality of
ETGs is not due to AGN feedback.
This does not imply, however, that AGN feedback does
not have an impact on the boxy fractions. As is evident from
Fig. 1, the models with and without AGN feedback clearly
predict different fboxy at the bright end. To understand the
origin of these differences, first focus on Fig. 4. Although
both panels look very similar, upon closer examination one
can notice that at M∗ >∼ 10
11h−1 M⊙ the number density of
ETGs with 0.1 <∼ Fcold <∼ 0.25 is significantly larger in the
model without AGN feedback. In the model with AGN feed-
back these systems all have Fcold < 0.1. This explains why
the model without AGN feedback predicts a lower boxy frac-
tion for bright galaxies when Fcrit = 0.1. However, this does
not explain why fboxy is also different when Fcrit ≥ 0.3.
After all, for those models it should not matter whether
Fcold = 0.05 or Fcold = 0.25, for example. It turns out that
in these cases the differences between the models with and
without AGN feedback are due to the regrowth of a new
disk; since AGN feedback suppresses the cooling of hot gas
around massive galaxies, it strongly suppresses the regrowth
of a new disk, thus resulting in higher boxy fractions.
Note however, that in ETGs with Fcold < 0.1, disk re-
growth is always negligible. In the presence of AGN feedback
this is due to the suppression of cooling in massive haloes.
In the absence of AGN feedback it owes to the fact that only
a very small fraction of ETGs are central galaxies. As can
Figure 6. The boxy fraction of ETGs as function of halo (group)
mass. Red triangles (for satellite galaxies) and blue circles (for
central galaxies) are taken from P07, and have been obtained us-
ing the galaxy group catalogue of Weinmann et al. (2006). Dashed
and solid lines correspond to the predictions from our fiducial
model.
be seen from the right-hand panel of Fig. 2, more than 90
percent of the ETGs have last major mergers between two
satellite galaxies (with AGN feedback this fraction is smaller
than 20 percent). Since satellite galaxies do not have hot gas
reservoirs, no significant disks can regrow around these sys-
tems.
4.3 Environment dependence
Using the SDSS galaxy group catalogue of Weinmann et
al. (2006), which has been constructed using the halo-based
group finder developed by Yang et al. (2005), P07 inves-
tigated how fboxy scales with group mass. They also split
their sample in ‘central’ galaxies (defined as the brightest
group members) and ‘satellites’. The open circles and trian-
gles in Fig. 6 show their results for centrals and satellites,
respectively. Although there are only two data points for the
satellites, it is clear that central galaxies are more likely to
be boxy than a satellite galaxy in a group (halo) of the same
mass.
We now investigate whether our fiducial semi-analytic
model that fits the luminosity dependence of the boxy frac-
tion (i.e., the one with ncrit = 2 and Fcrit = 0.1) can also
reproduce these trends. The model predictions for the cen-
trals and satellites are shown in Fig. 6 as solid and dashed
lines, respectively. Here we have associated the halo virial
mass with the group mass, and an ETG is said to be a cen-
tral galaxy if it is the brightest galaxy in its halo. The model
accurately reproduces the boxy fraction of both central and
satellite galaxies. In particular, it reproduces the fact that
fboxy of central galaxies is higher than that of satellites in
groups (haloes) of the same mass.
As shown in P07, the boxy fraction as function of group
c© 2000 RAS, MNRAS 000, 1–13
10 Kang et al.
Figure 7. The residuals of the relations between n and M∗ (left panel) and Fcold and M∗ (right panel) as functions of the virial mass of
the halo in which the ETGs reside at z = 0. As in Fig. 3 the solid lines with errorbars indicate the mean and the 20th and 80th percentiles
of the distributions. These show that after one corrects for the stellar mass dependence, the properties of the last major mergers of ETGs
are independent of their halo mass.
mass, for both centrals and satellites, is perfectly consistent
with the null-hypothesis that the isophotal shape of an ETG
depends only on its luminosity; the fact that centrals have a
higher boxy fraction than satellites in the same group, sim-
ply owes to the fact that the centrals are brighter. Also, the
increase of fboxy with increasing group mass simply reflects
that more massive haloes host brighter galaxies. It there-
fore may not come as a surprise that our semi-analytical
model that fits the luminosity dependence of fboxy also fits
the group mass dependencies shown in Fig. 6. It does mean,
though, that in our model the merger histories of ETGs of
a given luminosity do not strongly depend on the halo mass
in which the galaxy resides.
To test this we proceed as follows. For each ETG in
our model we compute 〈n〉 and 〈Fcold〉, where the average
is over all ETGs with stellar masses similar to that of the
galaxy in question. Fig. 7 plots the residuals n − 〈n〉 and
Fcold − 〈Fcold〉 as function of the virial mass, Mvir, of the
halo in which they reside. This clearly shows that after one
corrects for the stellar mass dependence, the properties of
the last major merger of ETGs are indeed independent of
their halo mass‡. This provides theoretical support for the
conclusion of P07 that the stellar mass (or luminosity) of an
ETG is the main parameter that determines whether it will
be disky or boxy.
4.4 The Origin of Cusps and Cores
As discussed in §1, the dichotomy of ETGs is not only re-
stricted to their isophotal shapes. One other important as-
pect of the dichotomy regards the central density distribu-
‡ The fact that the distribution of the progenitor mass ratio n is
independent of halo mass was also found by KB05
tion of ETGs; while disky systems typically have cuspy pro-
files, the bright and boxy ellipticals generally reveal density
profiles with a pronounced core. Here we briefly discuss how
the formation of cusps and cores fits in the picture sketched
above.
In the paradigm adopted here, low luminosity ETGs
form mainly via wet mergers. Due to the fluctuating poten-
tial of the merging galaxies and the onset of bar instabilities,
the gas experiences strong torques which causes a significant
fraction of the gas to sink towards the center of the po-
tential well where it undergoes a starburst (e.g., Shlosman,
Frank & Begelman 1989; Barnes & Hernquist 1991; Mihos
& Hernquist 1996). Detailed hydrodynamical simulations of
gas-rich mergers (e.g., Springel & Hernquist 2005; Cox et
al. 2006b) result in the formation of remnants with surface
brightness profiles that are reminiscent of cuspy ETGs (John
Kormendy, private communication). Hence, cusps seem a
natural by-product of the dissipative processes associated
with a wet merger.
Boxy ETGs, however, are thought to form via dry merg-
ers. As can be seen from Fig. 5 roughly 35 percent of all mas-
sive ETGs originate from a last major merger that involves
an early-type progenitor. If this progenitor contains a cusp,
this will survive the merger, as most clearly demonstrated by
Dehnen (2005). The only mergers that are believed to result
directly in a remnant with a core, are mergers between pure
stellar disks with a negligible fcold (e.g., Cox et al. 2006a).
Fig. 8 shows the cumulative distributions of the bulge-to-
total stellar mass ratios of the progenitors of present day
ETGs. Results are shown for ETGs in three mass ranges,
as indicated. The probability that a progenitor of a massive
ETGs (with M∗ > 10
11h−1 M⊙) has a negligible bulge com-
ponent (M∗,bulge < 0.01M∗) is only about 3 percent. Hence,
we expect that only about 1 out of every 1000 major merg-
ers that result in a massive ETG will have a remnant with
c© 2000 RAS, MNRAS 000, 1–13
On the Origin of the Dichotomy of Early-Type Galaxies 11
a core. And this is most likely an overestimate, since we
did not take the cold gas mass fractions into consideration.
Since the cusp accounts for only about one percent of the
total stellar mass (e.g., Faber et al. 1997; Milosavljević et
al. 2002), cold gas mass fractions of a few percent are prob-
ably enough to create a cusp via dissipational processes.
Therefore, an additional mechanism is required in order
to create a core (i.e., destroy a cusp). Arguably the most
promising mechanism is the orbital decay of a supermas-
sive black hole (SMBH) binary, which can scour a core by
exchanging angular momentum with the cusp stars (e.g.,
Begelman et al. 1980; Ebisuzaki et al. 1991; Quinlan 1996;
Faber et al. 1997; Milosavljević et al. 2002; Merritt 2006a).
Since virtually all spheroids contain a SMBH at their cen-
ter, with a mass that is tightly correlated with the mass of
the spheroid (e.g., Kormendy & Richstone 1995; Ferrarese &
Merritt 2000; Gebhardt et al. 2000; Marconi & Hunt 2003;
Häring & Rix 2004), it is generally expected that such bina-
ries are common in merger remnants (but see below).
While offering an attractive explanation for the pres-
ence of cores in massive, boxy ETGs, this picture simulta-
neously poses a potential problem for the presence of cusps
in disky ETGs. After all, if the progenitors of disky ETGs
also harbor SMBHs, the same process could create a core
in these systems as well. There are two possible ways out
of this paradox: (i) low mass ETGs do not form a SMBH
binary, or (ii) a cusp is regenerated after the two SMBHs
have coalesced. We now discuss these two options in turn.
In order for a SMBH binary to form, dynamical friction
must first deliver the two SMBHs from the two progenitors
to the center of the newly formed merger remnant. This
process will only be efficient if the spheroidal hosts of the
SMBHs are sufficiently massive. Consider a (small) bulge
that was part of a late-type progenitor which is now orbiting
the remnant of its merger with another galaxy. Assume for
simplicity that both the bulge and the merger remnant are
purely stellar singular isothermal spheres (ρ ∝ r−2) with
velocity dispersions equal to σb and σg, respectively. Then,
assuming that the bulge is on a circular orbit, with an initial
radius ri, Chandrasekhar’s (1943) formula gives an infall
time for the bulge of
tinfall ≈ 3.3
≈ 4.7× 108 yr
(Merritt 2006b). Here we have used that ri/σg ≃
2 tcross
with tcross ∼ 108 yr the galaxy crossing time. If the galaxy
is the remnant of an equal mass merger, so that Mb ∼
(∆/2)Mg , with ∆ the bulge-to-total stellar mass ratio of
the late-type progenitor, we find that tinfall is equal to the
Hubble time (1.3 × 1010 yr) for ∆ ≃ 0.07. As can be seen
from Fig. 8, about 70 percent of the low mass ETGs (with
109h−1M⊙ < M∗ < 10
10h−1M⊙) have at least one progeni-
tor with a stellar bulge-to-total mass ratio ∆ < 0.07. There-
fore, we expect that a similar fraction will form without a
SMBH binary, and thus will not form a core. For compari-
son, for massive ETGs (with M∗ > 10
11h−1M⊙) only about
20 percent of the progenitors will have a sufficiently small
bulge to prevent the formation of a SMBH binary.
An alternative explanation for the presence of cusps in
low mass ETGs is that the cusp is regenerated by star for-
mation from gas present at the last major merger. However,
as emphasized by Faber et al. (1997), this results in a serious
Figure 8. The cumulative probability that a progenitor of a
z = 0 ETG has a stellar bulge-to-total mass ratio less then
M∗,bulge/M∗. Results are shown for the progenitors of ETGs in
three stellar mass ranges, as indicated (masses are in h−1 M⊙).
Note that the progenitors of more massive ETGs have signifi-
cantly higher M∗,bulge/M∗. As discussed in the text, this may
help to explain why low mass ETGs have cusps, while massive
ones have cores.
timing problem, as it requires that the new stars must form
after the SMBH binary has coalesced. Another potential
problem with this picture, is that the cusp would be younger
than the main body of the ETG which may lead to observ-
able effects (i.e., cusp could be bluer than main body). How-
ever, in light of the results presented here, we believe that
neither of these two issues causes a serious problem. First of
all, the cold gas mass fractions involved with the last major
merger, and hence the mass fraction that is turned into stars
in the resulting starburst, is extremely large: 〈Fcold〉 ∼ 0.8
(see Fig. 4). As mentioned above, a significant fraction of
this gas is transported to the center, where it will function
as an important energy sink for the SMBH binary, greatly
speeding up its coalescence (Escala et al. 2004, 2005) and
therewith reducing the timing problem mentioned above. In
fact, the gas may well be the dominant energy sink, so that
the pre-existing cusps of the progenitors are only mildly af-
fected. But even if the cusps were destroyed, there clearly
should be enough gas left to build a new cusp. In fact, if,
as envisioned in our semi-analytical model, all the cold gas
present at the last major merger is consumed in a starburst,
a very significant fraction of the stars in the main body
would also be formed in this starburst (not only the cusp).
This would help to diminish potential population differences
between the cusp and the main body of the ETG. In addi-
tion, as can be seen from the right-hand panels of Fig. 5,
the last major merger of low luminosity ETGs occurred on
average ∼ 9.5 Gyr ago. Hence, the stars made in this burst
are not easily distinguished observationally from the ones
that were already present before the last major merger.
To summarize, our semi-analytical model predicts that
c© 2000 RAS, MNRAS 000, 1–13
12 Kang et al.
the progenitors of ETGs have cold gas mass fractions and
bulge-to-total mass ratios that offer a relatively natural ex-
planation for the observed dichotomy between cusps and
cores.
5 CONCLUSIONS
Using a semi-analytical model for galaxy formation, com-
bined with a large N-body simulation, we have investigated
the origin of the dichotomy among ETGs. In order to assign
isophotal shapes to the ETGs in our model we use three cri-
teria: an ETG is said to be boxy if (i) the progenitor mass
ratio at the last major merger is n < ncrit, (ii) the total
cold gas mass fraction of the sum of the two progenitors
at the last major merger is Fcold < Fcrit, and (iii) after its
last major merger the ETG is not allowed to regrow a new
disk with a stellar mass that exceeds 20 percent of the total
stellar mass.
In agreement with KB05, we find that we can not repro-
duce the observed luminosity (or, equivalently, stellar mass)
dependence of fboxy if we assign isophotal shapes based only
on the progenitor mass ratio. This owes to the fact that
the distribution of n is virtually independent of the stellar
mass, M∗, of the ETG at z = 0. Rather, to obtain a boxy
fraction that increases with increasing luminosity one also
needs to consider the cold gas mass fraction at the last ma-
jor merger. In fact, we can accurately match the data of P07
with ncrit = 2 and Fcrit = 0.1. This implies that boxy galax-
ies originate from relatively violent and dry mergers with
roughly equal mass progenitors and with less than 10 per-
cent cold gas, in good agreement with numerical simulations
(e.g., Naab et al. 2006a; Cox et al. 2006a). Our model also
nicely reproduces the observed boxy fraction as function of
halo mass, for both central galaxies and satellites. We have
demonstrated that this owes to the fact that after one cor-
rects for the stellar mass dependence, the properties of the
last major merger of ETGs are independent of their halo
mass. This provides theoretical support for the conjecture
of P07 that the stellar mass (or luminosity) of an ETG is
the main parameter that determines whether it will be disky
or boxy.
Our model predicts a number density distribution,
φ(Fcold,M∗), of ETGs in the Fcold-M∗ plane that is clearly
bimodal: low mass ETGs with M∗ <∼ 3 × 10
9h−1 M⊙ have
high Fcold, while the progenitors of massive ETGs have low
cold gas mass fractions. Clearly, if wet and dry mergers pro-
duce disky and boxy ellipticals, respectively, this bimodal-
ity is directly responsible for the ETG dichotomy. Contrary
to naive expectations, we find that this bimodality is in-
dependent of the inclusion of AGN feedback in the model.
Although AGN feedback is essential for regulating the lumi-
nosities and colors of the brightest galaxies (which end up
as ETGs with AGN feedback, but as blue disk-dominated
systems without AGN feedback), it does not explain the
bimodality among ETGs. Rather, this bimodality is due to
the fact that more massive ETGs (i) have more massive pro-
genitors, (ii) assemble later, and (iii) have a larger fraction
of early-type progenitors. Each of these three trends causes
the cold gas mass fraction of the progenitors of more mas-
sive ETGs to be lower, and thus its last major merger to
be dryer. In conclusion, the dichotomy among ETGs has a
very natural explanation within the hierarchical framework
of structure formation and does not require AGN feedback.
We also examined the morphological properties of the
progenitors of present day ETGs (at the epoch of the last
major merger). Indicating early- and late-type galaxies with
‘E’ and ‘L’, respectively, we find that the lowest mass ETGs
almost exclusively form via L-L mergers. With increasing
M∗, however, there is a pronounced decrease of the fraction
of L-L mergers, which are mainly replaced by E-L mergers.
The E-E mergers, however, never contribute more than 10
percent, in good agreement with the SPH simulations of
Maller et al. (2006). Thus, although boxy ellipticals form
out of dry mergers, these only rarely involve two early-type
systems.
Since satellite galaxies do not have a hot corona from
which new gas cools down, they typically have lower cold gas
mass fractions than central galaxies of the same mass. Con-
sequently, dry mergers are preferentially mergers between
two satellite galaxies. In fact, since a satellite galaxy can
not become a central galaxy, our model predicts that more
than 95 percent of all boxy ETGs with M∗ <∼ 2×10
10h−1M⊙
are satellites.
We also find that the progenitors of less massive ETGs
typically have lower bulge-to-total mass ratios. In fact,
for ETGs with present day stellar masses in the range
109h−1 M⊙ < M∗ < 10
10h−1 M⊙ we find that almost half
of the progenitors at the last major merger have bulges
that do not contribute more than one percent to the to-
tal stellar mass. This may have important implications for
the observed dichotomy between cusps and cores in ETGs.
Cores are believed to form via the scouring effect of a SMBH
binary, that arises when the SMBHs associated with the
spheroidal components of the progenitor galaxies form a
bound pair. This requires both spheroids to sink to the cen-
ter of the potential well of the merger remnant via dynamical
friction. However, if the time scale for this infall exceeds the
Hubble time, no SMBH binary will form, thus preventing
the creation of a core. Using our prediction for the bulge-to-
total mass ratios of progenitor galaxies, and a simple esti-
mate based on Chandrasekhar’s dynamical friction formula,
we have estimated that ∼ 70 percent of low mass ETGs
in the aforementioned mass range will not form a SMBH
binary. For massive ETGs with M∗ > 10
11h−1M⊙ this frac-
tion is only ∼ 20 percent. This may help to explain why
low mass ETGs have steep cusps, while massive ETGs have
cores.
Finally, in those low mass systems that do form a SMBH
binary, the large cold gas mass fraction at its last major
merger (〈Fcold〉 ≃ 0.8) provides more than enough raw ma-
terial for the regeneration of a new cusp. In addition, a large
fraction of the cold gas will sink to the center due to angular
momentum transfer where it will function as an important
energy sink for the SMBH binary. As shown by Escala et
al. (2004, 2005), this can cause a tremendous acceleration of
the coalescence of the SMBHs, largely removing the timing
problem interjected by Faber et al. (1997).
6 ACKNOWLEDGEMENTS
We are grateful to Eric Bell, Eric Emsellem, John Kormendy,
Thorsten Naab, Hans-Walter Rix, and the entire Galaxies-
c© 2000 RAS, MNRAS 000, 1–13
On the Origin of the Dichotomy of Early-Type Galaxies 13
Cosmology-Theory group at the MPIA for enlightening dis-
cussions.
REFERENCES
Adelman-McCarthy J.K., et al., 2006, ApJS, 162, 38
Barnes J.E., 1988, ApJ, 331, 699
Barnes J.E., Hernquist L.E., 1991, ApJ, 370, 65
Barnes J.E., Hernquist L.E., 1996, ApJ, 471, 115
Begelman M.C., Blandford R.D., Rees M.J., 1980, Nature, 287,
Bender R., 1988, A&AS, 193, 7
Bender R., Surma P., Döbereiner S., Möllenhoff C., Madejsky R.,
1989, A&A, 217, 35
Bendo G.J., Barnes J.E., 2000, MNRAS, 316, 315
Benson A.J., Bower R.G., Frenk C.S., Lacey C.G., Baugh C.M.,
Cole S., 2003, ApJ, 599, 38
Benson A.J., Kamionkowski M., Hassani S.H., 2005, MNRAS,
357, 847
Bower R.G., Benson A.J., Malbon R., Helly J.C., Frenk C.S.,
Baugh C.M., Cole S., Lacey C.G., 2006, MNRAS, 370, 645
Bournaud F., Combes F., Jog C.J., 2004, A&A, 418, 27
Bournaud F., Jog C.J., Combes F., 2005, A&A, 437, 69
Bruzual G., Charlot S., 2003, MNRAS, 344, 1000
Bryan G., Norman M., 1998, ApJ, 495, 80
Cappellari M, et al., 2007, preprint (astro-ph/0703533)
Cattaneo A., Dekel A., Devriendt J., Guiderdoni B., Blaizot J.,
2006, MNRAS, 370, 1651
Chandrasekhar S., 1943, ApJ, 97, 255
Cole S., Aragon-Salamanca A., Frenk C.S., Navarro J., Zepf S.E.,
1994, MNRAS, 271, 781
Cole S., Lacey C.G., Baugh C.M., Frenk C.S., 2000, MNRAS,
319, 168
Cox T.J., Dutta S.N., Di Matteo T., Hernquist L., Hopkins P.F.,
Robertson B., Springel V., 2006a, ApJ, 650, 791
Cox T.J., Jonsson P., Primack J.R., Somerville R.S., 2006b, MN-
RAS, 373, 1013
Croton D.J., et al., 2006, MNRAS, 365, 11
Davies R.L., Efstathiou G., Fall S.M., Illingworth G., Schechter
P.L., 1983, ApJ, 266, 41
Dehnen W., 2005, MNRAS, 360, 892
De Lucia G., Kauffmann G., White S.D.M., 2004, MNRAS, 349,
De Lucia G., Springel V., White S.D.M., Croton D., Kauffmann
G., 2006, MNRAS, 366, 499
Di Matteo P., Combes F., Melchior A.L., Semelin B., 2007,
preprint (astro-ph/0703212)
Ebisuzaki T., Makino J., Okamura S.K., 1991, Nature, 354, 212
Emsellem E., et al., 2007, preprint (astro-ph/0703531)
Escala A., Larson R.B., Coppi P.S., Mardones D., 2004, ApJ, 607,
Escala A., Larson R.B., Coppi P.S., Mardones D., 2005, ApJ, 630,
Faber S.M., et al. 1997, AJ, 114, 1771
Ferrarese L., van den Bosch F.C., Ford H.C., Jaffe W., O’Connell
R.W. 1994, AJ, 108, 1598
Ferrarese L., Merritt D., 2000, ApJ, 539, L9
Gebhardt K., et al. 1996, AJ, 112, 105
Gebhardt K., et al. 2000, ApJ, 539, L13
Hao C.N., Mao S., Deng Z.G., Xia X.Y., Wu H., 2006, MNRAS,
370, 1339
Häring N., Rix H.W., 2000, ApJ, 604,
Hernquist L., 1992, ApJ, 400, 460
Jing Y.P., Suto Y., 2002, ApJ, 574, 538
Kang X., Jing Y.P., Mo H.J., Börner G., 2005, ApJ, 631, 21 (K05)
Kang X., Jing Y.P, Silk J., 2006, ApJ, 648, 820
Kauffmann G., White S.D.M., Guiderdoni B., 1993, MNRAS, 264,
Khochfar S., Burkert A., 2005, MNRAS, 359, 1379 (KB05)
Kormendy J., Richstone D.O., 1995, ARA&A, 33, 581
Lacey C., Cole S., 1993, MNRAS, 262, 627
Lacey C., Cole S., 1994, MNRAS, 271, 676
Lauer T.R., et al., 2005, AJ, 129, 2138
Lauer T.R., et al., 2006, preprint (astro-ph/0609762)
Maller A.H., Katz N., Keres D., Davé R., Weinberg D.H., 2006,
ApJ, 647, 763
Marconi A., Hunt L.K., 2003, ApJ, 589, L21
Merritt D., 2006a, ApJ, 648, 976
Merritt D., 2006b, Rep. Prog. Phys., 69, 2513
Mihos J.C., Hernquist L.E., 1996, ApJ, 464, 641
Milosavljević M., Merritt D., Rest A., van den Bosch F.C., 2002,
MNRAS, 331, L51
Naab T., Burkert A., 2003, ApJ, 597, 893
Naab T., Jesseit R., Burkert A., 2006a, MNRAS, 372, 839
Naab T., Khochfar S., Burkert A., 2006b, ApJ, 636, L81 (N06)
Naab T., Trujillo I., 2006, MNRAS, 369, 625
Negroponte J., White S.D.M., 1983, MNRAS, 205, 1009
Neistein E., van den Bosch F.C., Dekel A., 2006, MNRAS, 372,
Nieto J.-L., Capaccioli M., Held E.V. 1988, A&A, 195, 1
Pasquali A., van den Bosch F.C., Rix H.W., 2007, preprint
(arXiv:0704.0931)
Pellegrini S. 1999, A&A, 351, 487
Pellegrini S. 2005, MNRAS, 364, 169
Quinlan G.D., 1996, New Astronomy, 1, 35
Ravindranath S., Ho L.C., Peng C.Y., Filippenko A.V., Sargent
W.L.W., 2001, AJ, 122, 653
Rest A., van den Bosch F.C., Jaffe W., Tran H., Tsvetanov Z.,
Ford H.C., Davies J., Schafer J., 2001, AJ, 121, 2431
Rix H.W., White S.D.M., 1990, ApJ, 362, 52
Schweizer F., 1982, ApJ, 252, 455
Shlosman I., Frank J., Begelman M.C., 1989, Nature, 338, 45
Simien F., de Vaucouleurs G., 1986, ApJ, 302, 564
Somerville R.S., Lemson G., Kolatt T.S., Dekel A., 2000, MN-
RAS, 316, 479
Springel V., 2000, MNRAS, 312, 859
Springel V., White S.D.M., Tormen G., Kauffmann G., 2001, MN-
RAS, 328, 726
Springel V., Hernquist L., 2005, ApJ, 622, 9
Toomre A., Toomre J., 1972, ApJ, 178, 623
van den Bosch F.C., 2002, MNRAS, 331, 98
Wechsler R.H., Bullock J.S., Primack J.R., Kravtsov A.V., Dekel
A., 2002, ApJ, 568, 52
Weinmann S.M., van den Bosch F.C., Yang X.H., Mo H.J., 2006,
MNRAS, 366, 2
White M., 2002, ApJS, 143, 241
Yang X.H., Mo H.J., van den Bosch F.C., Jing Y.P., 2005, MN-
RAS, 356, 1293
c© 2000 RAS, MNRAS 000, 1–13
ABSTRACT
  Using a semi-analytical model for galaxy formation, combined with a large
N-body simulation, we investigate the origin of the dichotomy among early-type
galaxies. We find that boxy galaxies originate from mergers with a progenitor
mass ratio $n < 2$ and with a combined cold gas mass fraction $F_{\rm cold} <
0.1$. Our model accurately reproduces the observed fraction of boxy systems as
a function of luminosity and halo mass, for both central galaxies and
satellites. After correcting for the stellar mass dependence, the properties of
the last major merger of early-type galaxies are independent of their halo
mass. This provides theoretical support for the conjecture of Pasquali et al
(2007) that the stellar mass of an early-type galaxy is the main parameter that
governs its isophotal shape. We argue that the observed dichotomy of early-type
galaxies has a natural explanation within hierarchical structure formation, and
does not require AGN feedback. Rather, we argue that it owes to the fact that
more massive systems (i) have more massive progenitors, (ii) assemble later,
and (iii) have a larger fraction of early-type progenitors. Each of these three
trends causes the cold gas mass fraction of the progenitors of more massive
early-types to be lower, so that their last major merger was dryer. Finally,
our model predicts that (i) less than 10 percent of all early-type galaxies
form in major mergers that involve two early-type progenitors, (ii) more than
95 percent of all boxy early-type galaxies with $M_* < 2 \times 10^{10} h^{-1}
\Msun$ are satellite galaxies, and (iii) about 70 percent of all low mass
early-types do not form a supermassive black hole binary at their last major
merger. The latter may help to explain why low mass early-types have central
cusps, while their massive counterparts have cores.

<|endoftext|><|startoftext|>
Introduction to Phase Transitions and Critical Phenomena (Oxford 
University Press, Oxford, 1993).
ABSTRACT
  We explore the conductance of self-healing materials as a measure of the
material integrity in the regime of the onset of the initial fatigue. Continuum
effective-field modeling and lattice numerical simulations are reported. Our
results illustrate the general features of the self-healing process: The onset
of the material fatigue is delayed, by developing a plateau-like
time-dependence of the material quality. We demonstrate that in this low-damage
regime, the changes in the conductance and similar transport/response
properties of the material can be used as measures of the material quality
degradation.

<|endoftext|><|startoftext|>
Introduction
The invariants of Lie algebras are one of their defining characteristics. They have numerous appli-
cations in different fields of mathematics and physics, in which Lie algebras arise (representation
theory, integrability of Hamiltonian differential equations, quantum numbers etc). In particular,
the polynomial invariants of a Lie algebra exhaust its set of Casimir operators, i.e., the center of
its universal enveloping algebra. That is why non-polynomial invariants are also called general-
ized Casimir operators, and the usual Casimir operators are seen as ‘trivial’ generalized Casimir
operators. Since the structure of invariants strongly depends on the structure of the algebra and
the classification of all (finite-dimensional) Lie algebras is an inherently difficult problem (actually
unsolvable), it seems to be impossible to elaborate a complete theory for generalized Casimir op-
erators in the general case. Moreover, if the classification of a class of Lie algebras is known, then
the invariants of such algebras can be described exhaustively. These problems have already been
solved for the semi-simple and low-dimensional Lie algebras, and also for the physically relevant
Lie algebras of fixed dimensions (see, e.g., references in [3, 7, 8, 18, 19]).
The actual problem is the investigation of generalized Casimir operators for classes of solvable Lie
algebras or non-solvable Lie algebras with non-trivial radicals of arbitrary finite dimension. There
are a number of papers on the partial classification of such algebras and the subsequent calculation
of their invariants [1, 6, 7, 14, 15, 16, 20, 21, 22, 23]. In particular, Tremblay and Winternitz [22]
classified all the solvable Lie algebras with the nilradicals isomorphic to the nilpotent algebra t0(n)
of strictly upper triangular matrices for any fixed dimension n. Then in [23] invariants of these
algebras were considered. The case n = 4 was investigated exhaustively. After calculating the
invariants for a sufficiently large value of n, Tremblay and Winternitz made conjectures for an
arbitrary n on the number and form of functionally independent invariants of the algebra t0(n),
and the ‘diagonal’ solvable Lie algebras having t0(n) as their nilradicals and possessing either the
maximal (equal to n − 1) or minimal (one) number of nilindependent elements. A statement on a
functional basis of invariants was only proved completely for the algebra t0(n). The infinitesimal
invariant criterion was used for the construction of the invariants. Such an approach entails the
http://arxiv.org/abs/0704.0937v4
http://arxiv.org/abs/math-ph/0602046
http://arxiv.org/abs/math-ph/0606045
necessity of solving a system of ρ first-order linear partial differential equations, where ρ has the
order of the algebra’s dimension. This is why the calculations were very cumbersome and results
were obtained due to the thorough mastery of the method.
In this paper, we use our original algebraic method for the construction of the invariants (‘gen-
eralized Casimir operators’) of Lie algebras via the moving frames approach [3, 4]. The algorithm
makes use of the knowledge of the associated inner automorphism groups and Cartan’s method
of moving frames in its Fels–Olver version [9, 10]. (For modern developments about the moving
frame method and more references, see also [17].) Unlike standard infinitesimal methods, it allows
us to avoid solving systems of differential equations, replacing them instead by algebraic equations.
As a result, the application of the algorithm is simpler. Note that a closed approach was earlier
proposed in [12, 13, 19] for the specific case of inhomogeneous algebras.
The invariants of three classes of triangular Lie algebras are exhaustively investigated (below n
is an arbitrary integer):
• nilpotent Lie algebras t0(n) of n× n strictly upper triangular matrices (Section 3);
• solvable Lie algebras t(n) of n× n upper triangular matrices (Section 4);
• solvable Lie algebras st(n) of n× n special upper triangular matrices (Section 5).
The triangular algebras are especially interesting due to their ‘universality’ properties. More pre-
cisely, any finite-dimensional nilpotent Lie algebra is isomorphic to a subalgebra of t0(n). Similarly,
any finite-dimensional solvable Lie algebra over an algebraically closed field of characteristic 0 (e.g.,
over C) can be embedded as a subalgebra in t(n) (or st(n)).
We have adapted and optimized our algorithm for the specific case of triangular Lie algebras via
special double enumeration of basis elements, individual choice of coordinates in the corresponding
inner automorphism groups and an appropriate modification of the normalization procedure of
the moving frame method. As a result, the problems related to the construction of functional
bases of invariants are reduced for the algebras t0(n) and t(n) to solving linear systems of algebraic
equations! Let us note that due to the natural embedding of st(n) to t(n) and the representation
t(n) = st(n) ⊕ Z(t(n)), where Z(t(n)) is the center of t(n), we can construct a basis in the set
of invariants of st(n) without the usual calculations from a previously found basis in the set of
invariants of t(n).
We re-prove the statement for a basis of invariants of t0(n), which was first constructed in [23]
using the infinitesimal method in a heuristic way, thereafter constructed in [4] using an empiric
technique based on the exclusion of parameters within the framework of the algebraic method. The
aim of this paper in considering t0(n) is to test and better understand the technique of working
with triangular algebras. The calculations for t(n) are similar, albeit more complex, although they
are much clearer and easier than under the standard infinitesimal approach.
As proved in [22], there is a unique algebra with the nilradical t0(n) that contains a maximum
possible number (n− 1) of nilindependent elements. A conjecture on the invariants of this algebra
is formulated in Proposition 1 of [23]. We show that this algebra is isomorphic to st(n). As a result,
the conjecture by Tremblay and Winternitz on its invariants is effectively proved.
2 The algorithm
The applied algebraic algorithm was first proposed in [3] and then developed in [4]. Ibid it was
effectively tested for the low-dimensional Lie algebras and a wide range of solvable Lie algebras
with a fixed structure of nilradicals. The presentation of the algorithm here differs from [3, 4], the
differences being important within the framework of applications.
For convenience of the reader and to introduce some necessary notations, before the description
of the algorithm, we briefly repeat the preliminaries given in [3, 4] about the statement of the
problem of calculating Lie algebra invariants, and on the implementation of the moving frame
method [9, 10]. The comparative analysis of the standard infinitesimal and the presented algebraic
methods, as well as their modifications, is given in the second part of this section.
Consider a Lie algebra g of dimension dim g = n < ∞ over the complex or real field and the
corresponding connected Lie group G. Let g∗ be the dual space of the vector space g. The map
Ad∗ : G → GL(g∗), defined for any g ∈ G by the relation
〈Ad∗gx, u〉 = 〈x,Adg−1u〉 for all x ∈ g
∗ and u ∈ g
is called the coadjoint representation of the Lie group G. Here Ad: G → GL(g) is the usual adjoint
representation of G in g, and the image AdG of G under Ad is the inner automorphism group Int(g)
of the Lie algebra g. The image of G under Ad∗ is a subgroup of GL(g∗) and is denoted by Ad∗G.
A function F ∈ C∞(g∗) is called an invariant of Ad∗G if F (Ad
gx) = F (x) for all g ∈ G and x ∈ g
The set of invariants of Ad∗G is denoted by Inv(Ad
G). The maximal number Ng of functionally
independent invariants in Inv(Ad∗G) coincides with the codimension of the regular orbits of Ad
i.e., it is given by the difference
Ng = dim g− rankAd
Here rankAd∗G denotes the dimension of the regular orbits of Ad
G and will be called the rank of the
coadjoint representation of G (and of g). It is a basis independent characteristic of the algebra g,
the same as dim g and Ng.
To calculate the invariants explicitly, one should fix a basis E = {e1, . . . , en} of the algebra g.
It leads to fixing the dual basis E∗ = {e∗1, . . . , e
n} in the dual space g
∗ and to the identification
of Int(g) and Ad∗G with the associated matrix groups. The basis elements e1, . . . , en satisfy the
commutation relations [ei, ej ] =
k=1 c
ijek, i, j = 1, . . . , n, where c
ij are components of the tensor
of structure constants of g in the basis E .
Let x → x̌ = (x1, . . . , xn) be the coordinates in g
∗ associated with E∗. Given any invariant
F (x1, . . . , xn) of Ad
G, one finds the corresponding invariant of the Lie algebra g by symmetriza-
tion, SymF (e1, . . . , en), of F . It is often called a generalized Casimir operator of g. If F is
a polynomial, SymF (e1, . . . , en) is a usual Casimir operator, i.e., an element of the center of the
universal enveloping algebra of g. More precisely, the symmetrization operator Sym acts only on
the monomials of the forms ei1 · · · eir , where there are non-commuting elements among ei1 , . . . , eir ,
and is defined by the formula
Sym(ei1 · · · eir) =
eiσ1 · · · eiσr ,
where i1, . . . , ir take values from 1 to n, r > 2. The symbol Sr denotes the permutation group
consisting of r elements. The set of invariants of g is denoted by Inv(g).
A set of functionally independent invariants F l(x1, . . . , xn), l = 1, . . . , Ng, forms a functional
basis (fundamental invariant) of Inv(Ad∗G), i.e., any invariant F (x1, . . . , xn) can be uniquely rep-
resented as a function of F l(x1, . . . , xn), l = 1, . . . , Ng. Accordingly the set of SymF
l(e1, . . . , en),
l = 1, . . . , Ng, is called a basis of Inv(g).
Our task here is to determine the basis of the functionally independent invariants for Ad∗G, and
then to transform these invariants into the invariants of the algebra g. Any other invariant of g is
a function of the independent ones.
Let us recall some facts from [9, 10] and adapt them to the particular case of the coadjoint
action of G on g∗. Let G = Ad∗G×g
∗ denote the trivial left principal Ad∗G-bundle over g
∗. The right
regularization R̂ of the coadjoint action of G on g∗ is the diagonal action of Ad∗G on G = Ad
It is provided by the map R̂g(Ad
h, x) = (Ad
h · Ad
,Ad∗gx), g, h ∈ G, x ∈ g
∗, where the action
on the bundle G = Ad∗G × g
∗ is regular and free. We call R̂g the lifted coadjoint action of G.
It projects back to the coadjoint action on g∗ via the Ad∗G-equivariant projection πg∗ : G → g
Any lifted invariant of Ad∗G is a (locally defined) smooth function from G to a manifold, which
is invariant with respect to the lifted coadjoint action of G. The function I : G → g∗ given by
I = I(Ad∗g, x) = Ad
gx is the fundamental lifted invariant of Ad
G, i.e., I is a lifted invariant, and
any lifted invariant can be locally written as a function of I. Using an arbitrary function F (x)
on g∗, we can produce the lifted invariant F ◦ I of Ad∗G by replacing x with I = Ad
gx in the
expression for F . Ordinary invariants are particular cases of lifted invariants, where one identifies
any invariant formed as its composition with the standard projection πg∗. Therefore, ordinary
invariants are particular functional combinations of lifted ones that happen to be independent of
the group parameters of Ad∗G.
The algebraic algorithm for finding invariants of the Lie algebra g is briefly formulated in the
following four steps.
1. Construction of the generic matrix B(θ) of Ad∗G. B(θ) is the matrix of an inner automorphism
of the Lie algebra g in the given basis e1, . . . , en, θ = (θ1, . . . , θr) is a complete tuple of group
parameters (coordinates) of Int(g), and r = dimAd∗G = dim Int(g) = n − dimZ(g), where Z(g) is
the center of g.
2. Representation of the fundamental lifted invariant. The explicit form of the fundamental
lifted invariant I = (I1, . . . ,In) of Ad
G in the chosen coordinates (θ, x̌) in Ad
∗ is I = x̌ ·B(θ),
i.e., (I1, . . . ,In) = (x1, . . . , xn) · B(θ1, . . . , θr).
3. Elimination of parameters by normalization. We choose the maximum possible number ρ of
lifted invariants Ij1 , . . . , Ijρ, constants c1, . . . , cρ and group parameters θk1 , . . . , θkρ such that
the equations Ij1 = c1, . . . , Ijρ = cρ are solvable with respect to θk1 , . . . , θkρ . After substituting
the found values of θk1 , . . . , θkρ into the other lifted invariants, we obtain Ng = n− ρ expressions
F l(x1, . . . , xn) without θ’s.
4. Symmetrization. The functions F l(x1, . . . , xn) necessarily form a basis of Inv(Ad
G). They
are symmetrized to SymF l(e1, . . . , en). It is the desired basis of Inv(g).
Let us give some remarks on the steps of the algorithm, mainly paying attention to the special
features of its variation in this paper, and where it differs from the conventional infinitesimal
method.
Usually, the second canonical coordinate on Int(g) is enough for the first step, although some-
times, the first canonical coordinate on Int(g) is the more appropriate choice. In both the cases, the
matrix B(θ) is calculated by exponentiation from matrices associated with the structure constants.
Often the parameters θ are additionally transformed in a trivial manner (signs, renumbering, re-
denotation etc) for simplification of the final presentation of B(θ). It is also sometimes convenient
for us to introduce ‘virtual’ group parameters corresponding to the center basis elements. Efficient
exploitation of the algorithm imposes certain constrains on the choice of bases for g, in particular,
in the enumeration of their elements; thus automatically yielding simpler expressions for elements
of B(θ) and, therefore, expressions of the lifted invariants. In some cases the simplification is
considerable.
In contrast with the general situation, for the triangular Lie algebras we use special coordinates
for their inner automorphism groups, which naturally harmonize with the canonical matrix rep-
resentations of the corresponding Lie groups and with special ‘matrix’ enumeration of the basis
elements. The application of the individual approach results in the clarification and a substantial
reduction of all calculations. In particular, algebraic systems solved under normalization become
linear with respect to their parameters.
Since B(θ) is a general form matrix from Int(g), it should not be adapted in any way for the
second step.
Indeed, the third step of the algorithm can involve different techniques of elimination of pa-
rameters which are also based on using an explicit form of lifted invariants [3, 4]. The applied
normalization procedure [9, 10] can also be subject to some variations and can applied in a more
involved manner.
As a rule, in complicated cases the main difficulty is created by the determination of the num-
ber ρ, who is actually equal to rankAd∗G, which is equivalent to finding the maximum number Ng
of functionally independent invariants in Inv(Ad∗G), since Ng = dim g − rankAd
G. The rank ρ of
the coadjoint representation Ad∗G can be calculated in different ways, e.g., by the closed formulas
ρ = max
x̌∈Rn
ckijxk
i,j=1
, ρ = max
x̌∈Rn
or with the use of indirect argumentation. The first formula is native to the infinitesimal approach
to invariants (see, e.g., [5, 16, 18, 23] and other references) since it gives the number of algebraically
independent differential equations in the linear system of first-order partial differential equations∑n
j,k=1 c
ijxkFxj = 0, which arises under this approach and is the infinitesimal criterion for invariants
of the algebra g under the fixed basis E . The second formula shows that rankAd∗G coincides
with the maximum dimension of a nonsingular submatrix in the Jacobian matrix ∂I/∂θ. The
tuples of lifted invariants and parameters associated with this submatrix are appropriate for the
normalization procedure, where the constants c1, . . . , cρ are chosen to lie in the range of values of
the corresponding lifted invariants.
If ρ is known then the sufficient number (Ng = dim g − ρ) of functionally independent invari-
ants can be found with various ‘empiric’ techniques in the frameworks of both the infinitesimal
and algebraic approaches. For example, expressions of candidates for invariants can be deduced
from invariants of similar low-dimensional Lie algebras and then tested via substitution to the
infinitesimal criterion for invariants. It is the method used in [23] to describe invariants of the Lie
algebra t0(n) of strictly upper triangular n × n matrices for any fixed n > 2. In the framework of
the algebraic approach, invariants can be constructed via the combination of lifted invariants in
expressions not depending on the group parameters [9, 10]. This method was applied, in particular,
to low-dimensional algebras and the algebra t0(n) [3, 4]. Other empiric techniques, e.g., based on
commutator properties [2] also can be used.
At the same time, a basis of Inv(Ad∗G) may be constructed without first determining the number
of basis elements. Since under such consideration the infinitesimal approach leads to the necessity
of the complete integration of the partial differential equations from the infinitesimal invariant
criterion, the domain of its applicability seems quite narrow (low-dimensional algebras and Lie
algebra of special simple structure). A similar variation of the algebraic method is based on the
following obvious statement.
Proposition 1. Let I = (I1, . . . ,In) be a fundamental lifted invariant, for the lifted invariants
Ij1 , . . . , Ijρ and some constants c1, . . . , cρ the system Ij1 = c1, . . . , Ijρ = cρ be solvable with
respect to the parameters θk1 , . . . , θkρ and substitution of the found values of θk1 , . . . , θkρ into
the other lifted invariants result in m = n− ρ expressions Îl, l = 1, . . . ,m, depending only on x’s.
Then ρ = rankAd∗G, m = Ng and Î1, . . . , Îm form a basis of Inv(Ad
Our experience on the calculation of invariants of a wide range of Lie algebras shows that the
version of the algebraic method, which is based on Proposition 1, is most effective. It is the version
that is used in this paper.
Note that the normalization procedure is difficult to be made algorithmic. There is a big
ambiguity in the choice of the normalization equations. We can take different tuples of ρ lifted
invariants and ρ constants, which lead to systems solvable with respect to ρ parameters. Moreover,
lifted invariants can be additionally combined before forming a system of normalization equations
or substitution of found values of parameters. Another possibility is to use a floating system of
normalization equations (see Section 6.2 of [4]). This means that elements of an invariant basis are
constructed under different normalization constraints. The choice of an optimal method results in
a considerable reduction of calculations and a practical form of constructed invariants.
3 Nilpotent algebra of strictly upper triangular matrices
Consider the nilpotent Lie algebra t0(n) isomorphic to the one of the strictly upper triangular n×n
matrices over the field F, where F is either C or R. t0(n) has dimension n(n − 1)/2. It is the Lie
algebra of the Lie group T0(n) of upper unipotent n × n matrices, i.e., upper triangular matrices
with entries equal to 1 in the diagonal.
As mentioned above, the basis of Inv(t0(n)) was first constructed in a heuristic way in [23]
within the framework of the infinitesimal approach. This result was re-obtained in [4] with the use
of the pure algebraic algorithm first proposed in [3] and developed in [4]. Also, it is the unique
example included among the wide variety of solvable Lie algebras investigated in [4], in which the
‘empiric’ technique of excluding group parameters from lifted invariants was applied. Although
this technique was very effective in constructing a set of functionally independent invariants (calcu-
lations were reduced via a special representation of the coadjoint action to a trivial identity using
matrix determinants, see Note 2), the main difficulty was in proving that the set of invariants is a
basis of Inv(t0(n)), i.e. cardinality of the set equals the maximum possible number of functionally
independent invariants. Under the infinitesimal approach [23] the main difficulty was the same.
In this section we construct a basis of Inv(t0(n)) with the algebraic algorithm but exclude group
parameters from lifted invariants by the normalization procedure. In contrast with the previous
expositions (Section 3 of [23] and Section 8 of [4]), sufficiency of the number of found invariants
for forming a basis of Inv(t0(n)) is proved in the process of calculating them. Investigation of
Inv(t0(n)) in this way gives us a sense of the specific features of the normalization procedure in the
case of Lie algebras having nilradicals isomorphic (or closed) to t0(n).
For the algebra t0(n) we use a ‘matrix’ enumeration of the basis elements with an ‘increasing’
pair of indices, in a similar way to the canonical basis {Enij , i < j} of the isomorphic matrix algebra.
Hereafter Enij (for fixed values of i and j) denotes the n×n matrix (δii′δjj′) with i
′ and j′ running
the numbers of rows and column respectively, i.e., the n × n matrix with a unit element on the
cross of the i-th row and the j-th column, and zero otherwise. En = diag(1, . . . , 1) is the n × n
unity matrix. The indices i, j, k and l run at most from 1 to n. Only additional constraints on the
indices are indicated.
Thus, the basis elements eij ∼ E
ij , i < j, satisfy the commutation relations
[eij , ei′j′] = δi′jeij′− δij′ei′j,
where δij is the Kronecker delta.
Let e∗ji, xji and yij denote the basis element and the coordinate function in the dual space t
and the coordinate function in t0(n), which correspond to the basis element eij , i < j. In particular,
, eij〉 = δii′δjj′. The reverse order of subscripts of the objects associated with the dual space t
is justified by the simplification of a matrix representation of lifted invariants. We complete the
sets of xji and yij in the matrices X and Y with zeros. Hence X is a strictly lower triangular
matrix and Y is a strictly upper triangular one.
We reproduce Lemma 1 from [4] together with its proof, since it is important for further con-
sideration.
Lemma 1. A complete set of independent lifted invariants of Ad∗
T0(n)
is exhaustively given by the
expressions
Iij = xij +
bii′xi′j +
bj′jxij′ +
i<i′, j′<j
bii′ b̂j′jxi′j′ , j < i,
where B = (bij) is an arbitrary matrix from T0(n), and B
−1 = (̂bij) is the inverse matrix of B.
Proof. The adjoint action of B ∈ T0(n) on the matrix Y is AdBY = BYB
−1, i.e.,
yijeij =
(BY B−1)ijeij =
i6i′<j′6j
bii′yi′j′ b̂j′jeij .
After changing eij → xji, yij → e
ji, bij ↔ b̂ij in the latter equality, we obtain the representation of
the coadjoint action of B
i6i′<j′6j
bj′jxjib̂ii′e
j′i′ =
i′<j′
(BXB−1)j′i′e
j′i′ .
Therefore, the elements Iij, j < i, of the matrix I = BXB
−1, B ∈ T0(n), form a complete set of
the independent lifted invariants of Ad∗T0(n).
Note 1. The center of the group T0(n) is Z(T0(n)) = {E
n+b1nE
1n, b1n ∈ F}. The inner automor-
phism group of t0(n) is isomorphic to the factor-group T0(n)/Z(T0(n)) and hence its dimension is
n(n−1)−1. The parameter b1n in the above representation of the lifted invariants is not essential.
Below A
i1,i2
j1,j2
, where i1 6 i2, j1 6 j2, denotes the submatrix (aij)
i=i1,...,i2
j=j1,...,j2
of a matrix A = (aij).
The conjugate value of k with respect to n is denoted by κ, i.e. κ = n − k + 1. The standard
notation |A| = detA is used.
Theorem 1. A basis of Inv(Ad∗
T0(n)
) consists of the polynomials
|, k = 1, . . . ,
Proof. Under normalization we impose the following restriction on the lifted invariants Iij, j < i:
Iij = 0 if j < i, (i, j) 6= (n− j
′ + 1, j′), j′ = 1, . . . ,
It means that we do not only fix the values of the elements of the lifted invariant matrix I, which
are situated on the secondary diagonal, under the main diagonal. The other significant elements
of I are given the value 0. As shown below, the chosen normalization is correct since it provides
satisfying the conditions of Proposition 1.
In view of the (triangular) structure of the matrices B and X the formula I = BXB−1, deter-
mining the lifted invariants implies that BX = IB. This matrix equality is also significant for the
matrix elements underlying the main diagonals of the left and right hand sides, i.e.,
xij +
bii′xi′j = Iij +
Iij′bj′j, j < i.
For convenience we divide the latter system under the chosen normalization conditions into four
sets of subsystems
Sk1 : xκj +
bκi′xi′j = 0, i = κ, j < k, k = 2, . . . ,
Sk2 : xκk +
bκi′xi′k = Iκk, i = κ, j = k, k = 1, . . . ,
Sk3 : xκj +
bκi′xi′j = Iκkbkj, i = κ, k < j < κ, k = 1, . . . ,
Sk4 : xkj +
bki′xi′j = 0, i = k, j < k, k = 2, . . . ,
and solve them one by one. The subsystem S12 consists of the single equation In1 = xn1 which gives
the simplest form of the invariant corresponding to the center of the algebra t0(n). For any fixed
k ∈ {2, . . . , [n/2]} the subsystem Sk1 ∪ S
2 is a well-defined system of linear equations with respect
to bκi′ , i
′ > κ, and Iκk. Solving it, e.g., by the Cramer method, we obtain that bκi′ , i
′ > κ, are
expressions of xi′j , i
′ > κ, j < k, the explicit form of which is not essential in what follows, and
Iκk = (−1)
1,k |
κ+1,n
1,k−1 |
, k = 2, . . . ,
The combination of the found values of Iκk results in the invariants from the statement of the
theorem. The functional independence of these invariants is obvious.
After substituting the expressions of Iκk and bκi′ , i
′ > κ, via x’s, into Sk3 , we trivially resolve
Sk3 with respect to bkj as an uncoupled system of linear equations. In performing the subsequent
substitution of the calculated expressions for bkj to S
4 , for any fixed k, we obtain a well-defined
system of linear equations, e.g., with respect to bki′ , i
′ > κ.
Under the normalization we express the non-normalized lifted invariants via x’s and find a part
of the parameters b’s of the coadjoint action via x’s and the other b’s. No equations involving
only x’s are obtained. In view of Proposition 1, this implies that the choice of the normalization
constraints is correct and, therefore, the number of functionally independent invariants found is
maximal, i.e., they form a basis of Inv(Ad∗T0(n)).
Corollary 1. A basis of Inv(t0(n)) is formed by the Casimir operators
det(eij)
i=1,...,k
j=n−k+1,...,n, k = 1, . . . ,
Proof. Since the basis elements corresponding the coordinate functions from the constructed basis
of Inv(Ad∗T0(n)) commute, the symmetrization procedure is trivial.
Note 2. The set of the invariants from Theorem 1 can be easily found from the equality I = BXB−1
by the following empiric trick used in Lemma 2 from [4]. For any fixed k ∈ {1, . . . , [n/2]} we re-
strict the equality to the submatrix with the row range κ, . . . , n and the column range 1, . . . , k:
1,k = B
1,k (B
1,k. Since |B
κ,n | = |(B
1,k| = 1, we obtain |I
1,k | = |X
1,k |, i.e., |X
1,k |
is an invariant of Ad∗T0(n) in view of the definition of an invariant. Functional independence of
the constructed invariants is obvious. The proof of Nt0(n) = [n/2] is much more difficult (see
Lemma 3 of [4]).
4 Solvable algebra of upper triangular matrices
In a way analogous to the previous section, consider the solvable Lie algebra t(n) isomorphic to the
one of upper triangular n× n matrices. t(n) has dimension n(n+1)/2. It is the Lie algebra of the
Lie group T (n) of nonsingular upper triangular n× n matrices.
Its basis elements are convenient to enumerate with a ‘non-decreasing’ pair of indices in a similar
way to the canonical basis {Enij , i 6 j} of the isomorphic matrix algebra. Thus, the basis elements
eij ∼ E
ij , i 6 j, satisfy the commutation relations
[eij , ei′j′] = δi′jeij′− δij′ei′j,
where δij is the Kronecker delta.
Hereafter the indices i, j, k and l again run at most from 1 to n. Only additional constraints
on the indices are indicated.
The center of t(n) is one-dimensional and coincides with the linear span of the sum e11+ · · ·+enn
corresponding to the unity matrix En. The elements eij , i < j, and e11 + · · ·+ enn form a basis of
the nilradical of t(n), which is isomorphic to t0(n)⊕ a. Here a is the one-dimensional (Abelian) Lie
algebra.
Let e∗ji, xji and yij denote the basis element and the coordinate function in the dual space
t∗(n) and the coordinate function in t(n), which correspond to the basis element eij , i 6 j. Thus,
, eij〉 = δii′δjj′. We complete the sets of xji and yij in the matrices X and Y with zeros. Hence
X is a lower triangular matrix and Y is an upper triangular one.
Lemma 2. A fundamental lifted invariant of Ad∗
T (n) is formed by the expressions
Iij =
i6i′, j′6j
bii′ b̂j′jxi′j′ , j 6 i,
where B = (bij) is an arbitrary matrix from T (n), and B
−1 = (̂bij) is the inverse matrix of B.
Proof. The adjoint action of B ∈ T (n) on the matrix Y is AdBY = BYB
−1, i.e.
yijeij =
(BY B−1)ijeij =
i6i′6j′6j
bii′yi′j′ b̂j′jeij .
After changing eij → xji, yij → e
ji, bij ↔ b̂ij in the latter equality, we obtain the representation
for the coadjoint action of B
i6i′6j′6j
bj′jxjib̂ii′e
j′i′ =
i′6j′
(BXB−1)j′i′e
j′i′ .
Therefore, the elements Iij, j 6 i, of the matrix
I = BXB−1, B ∈ T (n),
form a complete set of the independent lifted invariants of Ad∗T (n).
Note 3. The center of the group T (n) is Z(T (n)) = {βEn | β ∈ F/{0} }. If F = C then the group
T (n) is connected. In the real case the connected component T+(n) of the unity in T (n) is formed
by the matrices from T (n) with positive diagonal elements, i.e., T+(n) ≃ T (n)/Z
2 , where Z
{diag(ε1, . . . , εn) | εi = ±1}. The inner automorphism group Int(t(n)) of t(n) is isomorphic to the
factor-group T (n)/Z(T (n)) (or T+(n)/Z(T (n)) if F is real) and hence its dimension is
n(n+1)−1.
The value of one from the diagonal elements of the matrix B or a homogenous combination of them
in the above representation of lifted invariants can be assumed inessential. It is evident from the
proof of Theorem 2 that in all cases, the invariant sets of the coadjoint representations of Int(t(n))
and t(n) coincide.
Let us remind that A
i1,i2
j1,j2
, where i1 6 i2, j1 6 j2, denotes the submatrix (aij)
i=i1,...,i2
j=j1,...,j2
of a matrix
A = (aij). The conjugate value of k with respect to n is denoted by κ, i.e. κ = n− k + 1.
Under the proof of the below theorem the following technical lemma on matrices is used.
Lemma 3. Suppose 1 < k < n. If |X
κ+1,n
1,k−1 | 6= 0 then for any β ∈ F
1,k−1(X
κ+1,n
1,k−1 )
κ+1,n
j,j =
(−1)k+1
κ+1,n
1,k−1 |
∣∣∣∣∣
1,k−1
κ+1,n
1,k−1
κ+1,n
∣∣∣∣∣ .
In particular, xκk −X
1,k−1(X
κ+1,n
1,k−1 )
κ+1,n
= (−1)k+1|X
κ+1,n
1,k−1 |
1,k |. Analogously
xκj −X
1,k−1
κ+1,n
1,k−1
κ+1,n
xjk −X
1,k−1
κ+1,n
1,k−1
κ+1,n
κ+1,n
1,k−1
∣∣∣∣∣
1,k X
∣∣∣∣∣+
1,k |
κ+1,n
1,k−1
∣∣∣∣∣
1,k−1
κ+1,n
1,k−1 X
κ+1,n
∣∣∣∣∣ .
Theorem 2. A basis of Inv(Ad∗
T (n)) is formed by the rational expressions
1,k |
j=k+1
∣∣∣∣∣
1,k xjj
1,k X
∣∣∣∣∣ , k = 0, . . . ,
where |X
n+1,n
1,0 | := 1.
Proof. We choose the following normalization restriction on the lifted invariants Iij, j 6 i:
In−j+1,j = 1, j = 1, . . . ,
Iij = 0 if j 6 i, (i, j) 6= (j
′, j′), (n − j′ + 1, j′), j′ = 1, . . . ,
This means that we do not only fix the values of the elements of the lifted invariant matrix I, which
are situated on the main diagonal over or on the secondary diagonal. The elements of the secondary
diagonal underlying the main diagonal are given a value of 1. The other significant elements of I
are given a value 0. As shown below, the imposed normalization provides satisfying the conditions
of Proposition 1 and, therefore, is correct.
Similarly to the case of strictly triangular matrices, in view of the (triangular) structure of
the matrices B and X the formula I = BXB−1 determining the lifted invariants implies that
BX = IB. This matrix equality is significant for the matrix elements lying not over the main
diagonals of the left and right hand sides, i.e.,
bii′xi′j =
Iij′bj′j , j 6 i.
For convenience we again divide the latter system under the chosen normalization conditions into
four sets of subsystems
Sk1 :
bκi′xi′j = 0, i = κ, j < k, k = 2, . . . ,
Sk2 :
bκi′xi′j = bkj, i = κ, k 6 j 6 κ, k = 1, . . . ,
Sk3 :
bki′xi′j = 0, i = k, j < k, k = 2, . . . ,
Sk4 :
bki′xi′k = bkkIkk, i = k, j < k, k = 1, . . . ,
and solve them one by one. The subsystem S12 consists of the equations
b1j = bnnxnj
which are already solved with respect to b1j . For any fixed k ∈ {2, . . . , [n/2]} the subsystem S
is a well-defined system of linear equations with respect to bκi′ , i
′ > κ, and bkj, k 6 j 6 κ. We
can solve the subsystem Sk1 with respect to bκi′ , i
′ > κ:
κ+1,n = −bκκX
1,k−1(X
κ+1,n
1,k−1 )
and then substitute the obtained values into the subsystem Sk2 . Another way is to find the expres-
sions for bkj, k 6 j 6 κ, by the Cramer method, from the whole system S
1 ∪ S
2 at once since
only these parameters are further considered. As a result, they have two representations via bκκ
and x’s:
bkj = bκκ
xκj −X
1,k−1
κ+1,n
1,k−1
κ+1,n
(−1)k+1bκκ
κ+1,n
1,k−1
∣∣∣∣∣
1,k−1 xκj
κ+1,n
1,k−1 X
κ+1,n
∣∣∣∣∣ ,
where k 6 j 6 κ. In particular,
bkk = (−1)
k+1bκκ|X
κ+1,n
1,k−1 |
1,k |.
Analogously, for any fixed k ∈ {2, . . . , [(n + 1)/2]} the subsystem Sk3 is a well-defined system of
linear equations with respect to bkj, j > κ, and it implies
κ+1,n = −
k6j6κ
1,k−1(X
κ+1,n
1,k−1 )
Substituting the found expressions for b’s into the equations of the subsystems Sk4 , we completely
exclude the parameters b’s and obtain expressions of Ikk only via x’s. Thus, under k = 1
I11 =
b1ixi1 =
xnixi1 =
xnixi1 =
xi1 xii
xn1 xni
∣∣∣∣+
where the summation range in the first sum can be bounded by 2 and n − 1 since for i = 1 and
i = n the determinants are equal to 0. In the case k ∈ {2, . . . , [(n + 1)/2]}
bkkIkk =
bkixik =
k6j6κ
bkjxjk +
bkixik
k6i6κ
xjk −X
1,k−1(X
κ+1,n
1,k−1 )
κ+1,n
= bκκ
k6i6κ
xκj −X
1,k−1(X
κ+1,n
1,k−1 )
κ+1,n
xjk −X
1,k−1(X
κ+1,n
1,k−1 )
κ+1,n
After using the representation for bnn and manipulations with submatrices of X (see Lemma 3),
we derive that
Ikk =
(−1)k+1
1,k |
k6i6κ
∣∣∣∣∣
∣∣∣∣∣+
(−1)k+1
κ+1,n
1,k−1
k6i6κ
∣∣∣∣∣
1,k−1
κ+1,n
1,k−1
κ+1,n
∣∣∣∣∣ ,
where k = 2, . . . , [(n + 1)/2]. The summation range in the first sum can be taken from k + 1 and
κ − 1 since for i = k and i = κ the determinants are equal to 0.
The combination of the found values of Ikk in the following way
Ĩ00 =
Ijj =
xii, Ĩkk = (−1)
k+1Ikk − Ĩk−1,k−1, k = 1, . . . ,
results in the invariants Ĩk′k′, k
′ = 0, . . . , [(n − 1)/2], from the statement of the theorem. The
functional independence of these invariants is obvious.
Under the normalization we express the non-normalized lifted invariants via x’s and find a part
of the parameters b’s of the coadjoint action via x’s and the other b’s. No equations involving
only x’s are obtained. In view of Proposition 1, this implies that the choice of the normalization
constraints is correct, i.e., the number of the found functionally independent invariant is maximal
and, therefore, they form a basis of Inv(Ad∗
T (n)).
Note 4. An expanded form of the invariants from Theorem 2 is
xj1 xjj
xn1 xnj
∣∣∣∣∣∣
xj1 xj2 xjj
xn−1,1 xn−1,2 xn−1,j
xn1 xn2 xnj
∣∣∣∣∣∣
xn−1,1 xn−1,2
xn1 xn2
, . . . .
The first invariant corresponds to the center of t(n). The invariant tuple ends with
1,n+1
1,n−1
if n is odd and
∣∣∣∣∣∣
∣∣∣∣∣∣
if n is even.
Corollary 2. A basis of Inv(t(n)) consists of the rational invariants
Îk =
j=k+1
∣∣∣∣∣
j,j E
ejj E
∣∣∣∣∣ , k = 0, . . . ,
where E
i1,i2
j1,j2
, i1 6 i2, j1 6 j2, denotes the matrix (eij)
i=i1,...,i2
j=j1,...,j2
n+1,n| := 1, κ = n− k + 1.
Proof. The symmetrization procedure for the tuple of invariants presented in Theorem 2 can be
assumed trivial. To show this, we expand the determinants in each element of the tuple and obtain,
as a result, a rational expression in x’s. Only the monomials from the numerator, which do not
contain the ‘diagonal’ elements xjj, include coordinate functions associated with noncommuting
basis elements of the algebra t(n). More precisely, each of the monomials includes a single pair of
such coordinate functions, namely, xji′ and xj′j for some values i
′ ∈ {1, . . . , k}, j′ ∈ {κ, . . . , n} and
j ∈ {k + 1, . . . ,κ − 1}. Hence, it is sufficient to symmetrize only the corresponding pairs of basis
elements.
After the symmetrization and the transposition of the matrices, we construct the following
expressions for the invariants of t(n) associated with the invariants from Theorem 2:
(−1)k
j=k+1
ejj +
j=k+1
ei′jejj′+ ejj′ei′j
(−1)i
∣∣E1,k;̂i
κ,n;ĵ′
∣∣E1,k;̂i
κ,n;ĵ′
∣∣ denotes the minor of the matrix E1,kκ,n complementary to the element ei′j′. Since
ei′ieij′ = eij′ei′i + ei′j′ then
ei′ieij′+ eij′ei′i
(−1)i
∣∣E1,k;̂i
κ,n;ĵ′
∣∣∣∣∣
i,i E
∣∣∣∣∣±
|E1,k
κ,n|,
where the sign ‘+’ (resp. ‘−’) have to be taken if the elements of E
i,i are placed after (resp. before)
the elements of E
κ,n in all the relevant monomials. Up to constant summands, we obtain the
expressions for the elements of an invariant basis, which are adduced in the statement and formally
derived from the corresponding expressions given in Theorem 2 by the replacement xij → eji and
the transposition of all matrices. That is why the symmetrization procedure can be assumed trivial
in the sense described. The transposition is necessary in order to improve the representation of
invariants since xij ∼ eji, j 6 i.
Note 5. The invariants from Corollary 2 can be rewritten as
Îk =
j=k+1
∣∣∣∣∣
j,j E
∣∣∣∣∣+ (−1)
j=k+1
ejj, k = 0, . . . ,
In particular, Î0 =
j ejj .
Note 6. Let us emphasize that a uniform order of elements from E
and E
κ,n has to be fixed in all
the monomials under usage of the ‘non-symmetrized’ forms of invariants presented in Corollary 2,
Note 5 and Theorem 4 (see below).
5 Solvable algebra of special upper triangular matrices
The Lie algebra st(n) of the special (i.e., having zero traces) upper triangular n × n matrices is
imbedded in a natural way in t(n) as an ideal. dim st(n) = 1
n(n+ 1)− 1. Moreover,
t(n) = st(n)⊕ Z(t(n)),
where Z(t(n)) = 〈e11 + · · · + enn〉 is the center of t(n), which corresponds to the one-dimensional
Abelian Lie algebra of the matrices proportional to En. Due to this fact we can construct a basis of
Inv(st(n)) without the usual calculations involved in finding the basis of Inv(t(n)). It is well known
that if the Lie algebra g is decomposable into the direct sum of Lie algebras g1 and g2 then the
union of bases of Inv(g1) and Inv(g2) is a basis of Inv(g). A basis of Inv(Z(t(n))) obviously consists
of only one element, e.g., e11 + · · · + enn. Therefore, the cardinality of the basis of Inv(st(n)) is
equal to the cardinality of the basis of Inv(t(n)) minus 1, i.e., [(n − 1)/2]. To construct a basis
of Inv(st(n)), it is enough for us to rewrite [(n − 1)/2] functionally independent combinations of
elements from a basis of Inv(t(n)) via elements of st(n) and to exclude the central element from
the basis.
The following basis in st(n) is chosen as a subalgebra of t(n):
eij , i < j, fk =
eii −
i=k+1
eii, k = 1, . . . , n− 1.
(Usage of this basis allows for the presentation of our results in such a form that their identity with
Proposition 1 from [23] becomes absolutely evident.) The commutation relations of st(n) in the
chosen basis are
[eij , ei′j′] = δi′jeij′− δij′ei′j, i < j, i
′ < j′;
[fk, fk′ ] = 0, k, k
′ = 1, . . . , n− 1;
[fk, eij ] = 0, i < j 6 k or k 6 i < j;
[fk, eij ] = eij , i 6 k 6 j, i < j
and, therefore, coincide with those of the algebra L(n, n−1) from [22], i.e., L(n, n−1) is isomorphic
to st(n). Combining this observation with Lemma 6 of [22] results in the following theorem.
Theorem 3. The Lie algebra st(n) has the maximal number of dimensions (equal to 1
n(n+1)−1)
among the solvable Lie algebras which have nilradicals isomorphic to t0(n). It is the unique algebra
with such a property.
Theorem 4. A basis of Inv(st(n)) consists of the rational invariants
Ǐk =
(−1)k+1
j=k+1
∣∣∣∣∣
j,j E
∣∣∣∣∣+ fk − fn−k, k = 1, . . . ,
where E
i1,i2
j1,j2
, i1 6 i2, j1 6 j2, denotes the matrix (eij)
i=i1,...,i2
j=j1,...,j2
n+1,n| := 1, κ = n− k + 1.
Proof. It is enough to observe (see Note 5) that
Ǐk = (−1)
k+1Îk +
n− 2k
Î0, k = 1, . . . ,
These combinations of elements from a basis of Inv(t(n)) are functionally independent. They are
expressed via elements of st(n). Their number is [(n − 1)/2]. Therefore, they form a basis of
Inv(st(n)).
6 Conclusion and discussion
In this paper we extend our purely algebraic approach for computing invariants of Lie algebras by
means of moving frames [3, 4] to the classes of Lie algebras t0(n), t(n) and st(n) of strictly, non-
strictly and special upper triangular matrices of an arbitrary fixed dimension n. In contrast to the
conventional infinitesimal method which involves solving an associated system of PDEs, the main
steps of the applied algorithm are the construction of the matrix B(θ) of inner automorphisms of the
Lie algebra under consideration, and the exclusion of the parameters θ from the algebraic system
I = x̌ · B(θ) in some way. The version of the algorithm, applied in this paper, is distinguished in
that a special usage of the normalization procedure when the number, and a form of elements in a
functional basis of an invariant set, are determined under excluding the parameters simultaneously.
A basis of Inv(t0(n)) was already known and constructed by both the infinitesimal method [23]
and the algebraic algorithm with an elegant but empiric technique of excluding the parameters [4].
Note that the proof introduced in [23] is very sophisticated and was completed only due to the
thorough mastery of the used infinitesimal method. A form of elements from a functional basis of
Inv(t0(n)) was guessed via calculation of bases for a number of small n’s and then justified with
the infinitesimal method, and both the testing steps (on invariance and on sufficiency of number)
were quite complicated.
Invariants of t0(n) are considered in this paper in order to demonstrate the advantages of the
normalization technique and to pave the way for further applications of this technique to the more
complicated algebras t(n) and st(n), being too complex for the infinitesimal method (only the
lowest few were completely investigated there). First the invariants of the algebras t(n) and st(n)
are exhaustively studied in this paper. The performed calculations are simple and clear since the
normalization procedure is reduced by the choice of natural coordinates on the inner automorphism
groups and by the use of a special normalization technique to solving a linear system of algebraic
equations. The results obtained for Inv(st(n)) in Theorem 4 completely agree with the conjecture
formulated as Proposition 1 in [23] on the number and form of basis elements of this invariant set.
A direct extension of the present investigation is to describe the invariants of the subalgebras
of st(n), which contain t0(n). Such subalgebras exhaust the set of solvable Lie algebras which can be
embedded in the matrix Lie algebra gl(n) and have the nilradicals isomorphic to t0(n). A technique
similar to that used in this paper can be applied. The main difficulties will be created by breaking
in symmetry and complication of coadjoint representations. The question on ways of investigation
of the other solvable Lie algebras with the nilradicals isomorphic to t0(n) remains open. (See, e.g.,
[22] for classification of the algebras of such type.)
A more general problem is to circumscribe an applicability domain of the developed algebraic
method. It has been already applied only to the low-dimensional Lie algebras and a wide range of
classes of solvable Lie algebras in [3, 4] and this paper. The next step which should be performed
is the extension of the method to classes of unsolvable Lie algebras of arbitrary dimensions, e.g.,
with fixed structures of radicals or Levi factors. An adjoining problem is the implementation of
the algorithm with symbolic calculation systems. Similar work has already began in the framework
of the general method of moving frames, e.g., in the case of rational invariants for rational actions
of algebraic groups [11]. Some other possibilities on the applications of the algorithm are outlined
in [4].
Acknowledgments. The work was partially supported by the National Science and Engineering
Research Council of Canada, by MITACS. The research of R.P. was supported by Austrian Science
Fund (FWF), Lise Meitner project M923-N13. V. B. is grateful for the hospitality the Centre de
Recherches Mathématiques, Université de Montréal.
References
[1] Ancochea J.M., Campoamor-Stursberg R. and Garcia Vergnolle L. Solvable Lie algebras with nat-
urally graded nilradicals and their invariants, J. Phys. A: Math. Gen., 2006, V.39, 1339–1355,
math-ph/0511027.
[2] Barannyk L.F. and Fushchych W.I. Casimir operators of the generalised Poincaré and Galilei groups, in
Group theoretical methods in physics (Yurmala, 1985), Vol. II, VNU Sci. Press, Utrecht, 1986, 275–282.
[3] Boyko V., Patera J. and Popovych R. Computation of invariants of Lie algebras by means of moving
frames, J. Phys. A: Math. Gen., 2006, V.39, 5749–5762, math-ph/0602046.
[4] Boyko V., Patera J. and Popovych R. Invariants of Lie algebras with fixed structure of nilradicals,
J. Phys. A: Math. Theor., 2007, V.40, 113–130, math-ph/0606045.
[5] Campoamor-Stursberg R. An alternative interpretation of the Beltrametti–Blasi formula by means of
differential forms, Phys. Lett. A, 2004, V.327, 138–145.
[6] Campoamor-Stursberg R. Application of the Gel’fand matrix method to the missing label problem in
classical kinematical Lie algebras, SIGMA, 2006, V.2, Paper 028, 11 pages, math-ph/0602065.
[7] Campoamor-Stursberg R. Affine Lie algebras with non-compact rank one Levi subalgebra and their
invariants, Acta Phys. Polon. B, 2007, V.38, 3–20.
[8] Chaichian M., Demichev A.P. and Nelipa N.F. The Casimir operators of inhomogeneous groups, Comm.
Math. Phys., 1983, V.90, 353–372.
[9] Fels M. and Olver P. Moving coframes: I. A practical algorithm, Acta Appl. Math., 1998, V.51, 161–213.
[10] Fels M. and Olver P. Moving coframes: II. Regularization and theoretical foundations, Acta Appl.
Math., 1999, V.55, 127–208.
[11] Hubert E. and Kogan I. Rational invariants of a group action: construction and rewriting, J. Symbolic
Comp., 2007, V.42, 203–217.
[12] Kaneta H. The invariant polynomial algebras for the groups IU(n) and ISO(n), Nagoya Math. J., 1984,
V.94, 43–59.
[13] Kaneta H. The invariant polynomial algebras for the groups ISL(n) and ISp(n), Nagoya Math. J., 1984,
V.94, 61–73.
[14] Ndogmo J.C. Invariants of a semi-direct sum of Lie algebras, J. Phys. A: Math. Gen., 2004, V.37,
5635–5647.
[15] Ndogmo J.C. and Winternitz P. Solvable Lie algebras with Abelian nilradicals, J. Phys. A: Math. Gen.,
1994, V.27, 405–423.
[16] Ndogmo J.C. and Winternitz P. Generalized Casimir operators of solvable Lie algebras with Abelian
nilradicals, J. Phys. A: Math. Gen., 1994, V.27, 2787–2800.
[17] Olver P.J. and Pohjanpelto J. Moving frames for Lie pseudo-groups, Canadian J. Math., to appear.
[18] Patera J., Sharp R.T., Winternitz P. and Zassenhaus H. Invariants of real low dimension Lie algebras,
J. Math. Phys., 1976, V.17, 986–994.
[19] Perroud M. The fundamental invariants of inhomogeneous classical groups, J. Math. Phys., 1983, V.24,
1381–1391.
[20] Rubin J.L. and Winternitz P. Solvable Lie algebras with Heisenberg ideals, J. Phys. A: Math. Gen.,
1993, V.26, 1123–1138.
[21] Snobl L. and Winternitz P. A class of solvable Lie algebras and their Casimir invariants, J. Phys. A:
Math. Gen., 2005, V.38, 2687–2700, math-ph/0411023.
[22] Tremblay S. and Winternitz P. Solvable Lie algebras with triangular nilradicals, J. Phys. A: Math.
Gen., 1998, V.31, 789–806, arXiv:0709.3581.
[23] Tremblay S. and Winternitz P. Invariants of the nilpotent and solvable triangular Lie algebras,
J. Phys. A: Math. Gen., 2001, V.34, 9085–9099, arXiv:0709.3116.
http://arxiv.org/abs/math-ph/0511027
http://arxiv.org/abs/math-ph/0602046
http://arxiv.org/abs/math-ph/0606045
http://arxiv.org/abs/math-ph/0602065
http://arxiv.org/abs/math-ph/0411023
http://arxiv.org/abs/0709.3581
http://arxiv.org/abs/0709.3116
	Introduction
	The algorithm
	Nilpotent algebra of strictly upper triangular matrices
	Solvable algebra of upper triangular matrices
	Solvable algebra of special upper triangular matrices
	Conclusion and discussion
ABSTRACT
  Triangular Lie algebras are the Lie algebras which can be faithfully
represented by triangular matrices of any finite size over the real/complex
number field. In the paper invariants ('generalized Casimir operators') are
found for three classes of Lie algebras, namely those which are either strictly
or non-strictly triangular, and for so-called special upper triangular Lie
algebras. Algebraic algorithm of [J. Phys. A: Math. Gen., 2006, V.39, 5749;
math-ph/0602046], developed further in [J. Phys. A: Math. Theor., 2007, V.40,
113; math-ph/0606045], is used to determine the invariants. A conjecture of [J.
Phys. A: Math. Gen., 2001, V.34, 9085], concerning the number of independent
invariants and their form, is corroborated.

<|endoftext|><|startoftext|>
Introduction
	Distribution of nucleation times
	Relaxation of clusters to metastable equilibrium
	Structure of the nucleating droplet
	Langevin simulations
	Summary
	Relaxation of clusters at the critical temperature
	Acknowledgments
	References
ABSTRACT
  We investigate the approach to stable and metastable equilibrium in Ising
models using a cluster representation. The distribution of nucleation times is
determined using the Metropolis algorithm and the corresponding $\phi^{4}$
model using Langevin dynamics. We find that the nucleation rate is suppressed
at early times even after global variables such as the magnetization and energy
have apparently reached their time independent values. The mean number of
clusters whose size is comparable to the size of the nucleating droplet becomes
time independent at about the same time that the nucleation rate reaches its
constant value. We also find subtle structural differences between the
nucleating droplets formed before and after apparent metastable equilibrium has
been established.

<|endoftext|><|startoftext|>
Introduction
In recent papers, some new techniques have been developed for calculating quantities in
a (2+1)-dimensional SU(N) gauge theories [1], [2], [3]. These techniques exploit the fact
that in an anisotropic limit of small coupling, the gauge theory becomes a collection of
completely-integrable quantum field theories, namely SU(N)× SU(N) principal chiral
nonlinear sigma models. These integrable systems are decoupled, save for a constraint
which is necessary for complete gauge invariance. In the case of N = 2, is possible to
perturb away from integrability, using exactly-known off-shell matrix elements of the
integrable theory.
Though the gauge theory we consider s not spatially-rotation invariant, it has fea-
tures one expects of real (3+1)-dimensional QCD; it is asymptotically free and confines
quarks at weak coupling. Thus the limit of no regularization is accessible.
One can formally remove the regulator in strong-coupling expansions of (2 + 1)-
dimensional gauge theories; the vacuum state in this expansion yields a string tension
and a mass gap which have formal continuum limits. This can be done in a Hamiltonian
lattice formalism [4], or with an ingenious choice of degrees of freedom and point-
splitting regularization [5]. This leaves open the question of whether these expressions
can be trusted at weak coupling (more discussion of this issue can be found in the
introduction of reference [2]). In particular, one would like to rule out a deconfinement
transition, or different dependence of physical quantities on the coupling (as in compact
QED [6]). There is a proposal for the vacuum state [7], in the formulation of reference
[5] which seems to give correct values for some glueball masses [8], but this proposal
evidently requires more mathematical justification.
In this paper, we will work out the masses of the lightest glueballs for the case
of gauge group SU(2). Our method would also work in principle for SU(N) gauge
theories, and our reason for choosing N = 2 is that the analysis is simplest for that
case.
The basic connection between the gauge theory and integrable systems is most easily
seen in axial gauge [1]. The string tensions in the x1-direction and x2-direction (which
we called the horizontal and vertical string tensions, respectively) for very small g′0,
were found by simple physical arguments. The result for the horizontal string tension
was confirmed for gauge group SU(2), and additional corrections in g′0 were found [2],
through the use of exact form factors for the currents of the sigma model. String
tensions for higher representations can also be worked out, and adjoint sources are not
confined [3].
Careful derivations of the connection between the gauge theory and integrable sys-
tems use the Kogut-Susskind lattice formalism [1], [2]. A shorter derivation was given
in reference [9], which we summarize again here. The formalism is essentially that of
“deconstruction” [10].
The Yang-Mills action is
d3L, where the Lagrangian is L = 1
2e′ 2
TrF 201+
TrF 202−
TrF 212, and where A0,A1 and A1 are SU(N)-Lie-algebra-valued components of the
gauge field, and the field strength is Fµν = ∂µAν − ∂νAµ − i[Aµ, Aν ]. This action
is invariant under the gauge transformation Aµ(x) → ig(x)−1[∂µ − iAµ(x)]g(x), where
g(x) is an SU(N)-valued scalar field. We take e′ 6= e, thereby losing rotation invariance.
We discretize the 2-direction, so that the x2 takes on the values x2 = a, 2a, 3a . . . ,
where a is a lattice spacing. All fields are considered functions of x = (x0, x1, x2). We
define the unit vector 2̂ = (0, 0, 1). We replace A2(x) by a field U(x) lying in SU(N),
via U(x) ≈ exp−iaA2(x). There is a natural discrete covariant-derivative operator:
DµU(x) = ∂µU(x) − iAµ(x)U(x) + iU(x)Aµ(x + 2̂a), µ = 0, 1, for any N × N complex
matrix field U(x). The action is S =
x2 a L where
2(g′0)
TrF 201 +
Tr[D0U(x)]†D0U(x)−
Tr[D1U(x)]†D1U(x) , (1.1)
and where g20 = e
0a and (g
2 = e′ 2a. The Lagrangian (1.1) is invariant under the gauge
transformation: Aµ(x) → ig(x)−1[∂µ − iAµ(x)]g(x) and U(x) → g(x)−1U(x)g(x + 2̂a)
where again, g(x) ∈ SU(N) and µ is restricted to 0 or 1. The bare coupling constants g0
and g′0 are dimensionless. We recover from (1.1) the anisotropic continuum action in the
limit a→ 0. The sigma model field is U(x0, x1, x2), and each discrete x2 corresponds to
a different sigma model. The system (1.1) is a collection of parallel (1+1)-dimensional
SU(N) × SU(N) sigma models, each of which couples to the auxiliary fields A0, A1.
The sigma-model self-interaction is the dimensionless number g0.
We feel it worth commenting on the nature of the anisotropic regime and how it
is different from the standard (2 + 1)-dimensional Yang-Mills theory. The point where
the regulator can be removed in the theory is g′0 = g0 = 0. This point can be reached
in our treatment, but only if
(g′0)
2 ≪ 1
e−4π/(g
N) . (1.2)
The left-hand side and ride-hand side are proportional to the two energy scales in the
theory (the latter comes from the two-loop beta function of the sigma model). Thus
our method cannot accommodate fixing the ratio g′0/g0, which is natural in standard
perturbation theory [11]. This is why the mass gap is not of order e, e′ and the string
tension is not of order e2, (e′)2.
We now discuss the Hamiltonian in the axial gauge A1 = 0. The left-handed and
right-handed currents are, jLµ (x)b = iTr tb ∂µU(x)U(x)
† and jRµ (x)b = iTr tb U(x)
†∂µU(x),
respectively, where µ = 0, 1. The Hamiltonian obtained from (1.1) is H0 +H1, where
{[jL0 (x)b]2 + [jL1 (x)b]2} , (1.3)
(g′0)
∂1Φ(x
1, x2)∂1Φ(x
1, x2)
)2 L2−a
jL0 (x
1, x2)Φ(x1, x2)− jR0 (x1, x2)Φ(x1, x2 + a)
+ (g′0)
2qbΦ(u
1, u2)b − (g′0)2q′bΦ(v1, v2)b , (1.4)
where −Φb = A0 b is the temporal gauge field, and where in the last term we have
inserted two color charges - a quark with charge q at site u and an anti-quark with
charge q′ at site v. Some gauge invariance remains after the axial-gauge fixing, namely
that for each x2
jL0 (x
1, x2)b − jR0 (x1, x2 − a)b
− g20Q(x2)b
Ψ = 0 , (1.5)
where Q(x2)b is the total color charge from quarks at x
2 and Ψ is any physical state. To
derive the constraint (1.5) more precisely, we started with open boundary conditions
in the 1-direction and periodic boundary conditions in the 2-direction, meaning that
the two-dimensional space is a cylinder [1], [2].
From (1.4) we see that the left-handed charge of the sigma model at x2 is coupled
to the electrostatic potential Φ, at x2. The right-handed charge of the sigma model
is coupled to the electrostatic potential at x2 + a. The excitations of H0, which we
call Fadeev-Zamoldochikov or FZ particles, behave like solitons, though they do not
correspond to classical configurations. Some of these FZ particles are elementary and
others are bound states of the elementary FZ particles. An elementary FZ particle has
an adjoint charge and mass m1. An elementary one-FZ-particle state is a superposition
of color-dipole states, with a quark (anti-quark) charge at x1, x2 and an anti-quark
(quark) charge at x1, x2 + a. The interaction H1 produces a linear potential between
color charges with the same value of x2. Residual gauge invariance (1.5) requires that
at each value of x2, the total color charge is zero. If there are no quarks with coordinate
x2, the total right-handed charge of FZ particles in the sigma model at x2 − a is equal
to the total left-handed charge of FZ particles in the sigma model at x2.
The particles of the principal chiral sigma model carry a quantum number r, with
the values r = 1, . . . , N − 1 [21]. Each particle of label r has an antiparticle of the
same mass, with label N − r. The masses are given by
mr = m1
sin rπ
sin π
, m1 = KΛ(g
−1/2e
N + non−universal corrections , (1.6)
where K is a non-universal constant and Λ is the ultraviolet cut-off of the sigma model.
Lorentz invariance in each x0, x1 plane is manifest. For this reason, the linear
potential is not the only effect of H1. The interaction creates and destroys pairs of
elementary FZ particles. This effect is quite small, provided that g′0 is small enough.
Specifically, this means that the square of the 1 + 1 string tension in the x1-direction
coming fromH1 is small compared to the square of the mass of fundamental FZ particle;
this is just the condition (1.2). The effect is important, however, in that it is responsible
for the correction to the horizontal string discussed in the next paragraph in equation
(1.8).
Simple arguments readily show that at leading order in g′0, the vertical and hori-
zontal string tensions are given by
, σH =
(g′0)
CN , (1.7)
respectively, where CN is the smallest eigenvalue of the Casimir of SU(N). These naive
results for the string tension have further corrections in g′0, which were determined for
the horizontal string tension for SU(2) [2]:
0.7296
(g′0)
e4π/g
. (1.8)
The leading term agrees with (1.7). This calculation was done using the exact form
factor for sigma model currents obtained by Karowski and Weisz [12]. The form factor
can also be used to find corrections of order (g′0)
2 to the vertical string tension; this
problem should be solved soon. If the reader is not familiar with form-factor techniques
in relativistic integrable field theories, a self-contained review is in the appendix of
reference [2].
Another recent application of exact form factors to the (2 + 1)-dimensional SU(2)
gauge theory is reference [13], in which form factors of the two-dimensional Ising model
[14] are used to find the profile of the electric string near the high-temperature decon-
fining transition, assuming the Svetitsky-Yaffe conjecture [15].
A rough picture of a gauge-invariant state for the gauge group SU(2) with no quarks
is given in Figure 1. For N > 2, there are more complicated ways in which strings can
join particles. For example, a junction of N strings is possible. Figure 1 is inaccurate
in an important respect; the “ring” of particles held together by horizontal strings is
extremely broad in extent in the x2-direction compared to the x1-direction. This is
because σH ≪ σV.
The lightest states have the smallest number of particles, by virtue of σH ≪ σV.
Thus the lightest glueballs are pairs of FZ particles with the same value of x2. For
small enough g′0, the very lightest state has a mass well-approximated by 2m1. The
purpose of this paper is to find the leading corrections in (g′0)
2 to this result. This
will be done using the S-matrix of the sigma model and the WKB formula. There
are further small corrections, due to the softening of the potential near where particles
overlap, which we do do not determine.
It is clear that the lightest bound states of FZ particles are (1 + 1)-dimensional in
character. If we formulated a gauge theory in which x2 was fixed in U(x0, x1, x2), we
would find the same spectrum, as a function of m1 and σH. In the Kogut-Susskind
lattice formulation, a long row of plaquettes with open boundary conditions is a regular-
ized gauge theory of this type. The only real difference between this (1+1) dimensional
model and that we study is that σH will receive different corrections of order (g
− − − − − − − − − − − −
− − − − − − − − − − − −
− − − − − − − − − − − −
− − − − − − − − − − − −
− − − − − − − − − − − −
− − − − − − − − − − − −
− − − − − − − − − − − −
− − − − − − − − − − − −
Figure 1. A glueball state is a collection of heavy particles, held weakly together
by strings. The horizontal coordinate is x1 and the vertical coordinate is x2.
In the next section we will discuss the wave function of an unbound pair of FZ
particles. We find that this is described by phase shift for the color-singlet sector. In
Section 3, we determine the bound-state spectrum. The problem we solve is very similar
to that of two particle-states of the two-dimensional Ising model with an external
magnetic field [16] (for a good summary of this problem, see reference [17]); the only
genuine difference is the presence of a matching condition where the particles overlap.
This matching condition comes from the phase shift of the scattering problem. We
present our conclusions in Section 4.
2 Scattering states of FZ particles
The lightest glueball state, as discussed above, is simply a pair of FZ particles located
at the points (x1, x2) and (y1, x2) and bound in a linear potential. Residual gauge
invariance (1.5), demands that the state be a color singlet. To begin with, however,
we simply write the form of a free state of two particles.
The state of the SU(2) × SU(2) ≃ O(4) nonlinear sigma model with a particles of
momenta p1 and p2 and quantum numbers j1 and j2 (which take the values 1, 2, 3, 4)
is described by the wave function
ψp1p2(x
1, y1)j1,j2 =
eip1x
1+ip2y
Aj1,j2 , x
1 < y1
eip2x
1+ip1y
k1,k2=1
Sk1k2j1j2 (p1, p2)Ak2,k1 , x
1 > y1
, (2.1)
where Aj1j2 is an arbitrary set of complex numbers and S
(p1, p2) is the two-particle
S-matrix. We have not yet imposed (1.5).
The wave function (2.1) is written in a form where the O(4) symmetry is manifest.
It is straightforward to write it in a form where the left SU(2)L and the right SU(2)R
symmetries are manifest, by writing
ψp1p2(x
1, y1)
j1,j2
(δj14ac − iσj1ac)
bd − iσ
∗ ψp1p2(x
1, y1)j1,j2 (2.2)
describing a pair of color dipoles, one with quantum numbers a, b̄ and the other with
quantum numbers c̄, d, where σj , j = 1, 2, 3 denotes the Pauli matrices.
We impose the physical state condition (1.5) on (2.2) by requiring that a = b and
c = d and summing over these colors. The projected wave function is, up to an overall
constant,
ψp1p2(x
1, y1) =
eip1x
1+ip2y
, x1 < y1
eip2x
1+ip1y
S0(p1, p2) , x
1 > y1
, (2.3)
where S0(p1, p2) is the singlet projection of the O(4) S-matrix. This S-matrix was first
obtained by Zamolodchikov and Zamolodchikov [18]. A useful form is given in reference
[12]:
S0(p1, p2) = S0(θ) = −
π − iθ
π + iθ
exp i
1− e−ξ
1 + eξ
, (2.4)
where the relative rapidity θ is given by θ = θ2− θ1, p1 = m sinh θ1, p2 = m sinh θ2 and
where we denote the particle mass m1, given by (1.6), by m (because there is only one
mass for the case of N = 2). A derivation of (2.4) is in the appendix of reference [2].
The singlet S-matrix is just given by a phase shift φ(θ): S0(θ) = exp iφ(θ). The
phase shift has a simple form in the low-energy, non-relativistic limit, |p1 − p2| ≪ m.
In this limit, θ ≈ |p1− p2|/m. The integral on the right-hand side of (2.4) can be done
by Taylor expanding in |p1 − p2|/m yielding
φ(θ) = φ(p1, p2) = π −
3− 2 ln 2
|p1 − p2| +O
|p− r|2
. (2.5)
3 The low-lying glueball spectrum
Let us now consider the states of a bound pair of FZ particles in the potential V (x1, y1) =
2σH|x1 − y1| (the reason for the factor of two is simply that the particles are joined by
a pair of strings). We use the non-relativistic approximation, used to find (2.5). For
our problem, the horizontal string tension times the size of a typical bound state is
small compared to the mass, by (1.2). This justifies the non-relativistic approximation
for low-lying states. The mass of a low-lying glueball is given by
M = 2m+ E ,
where E is the energy eigenstate of the two-particle problem.
Let us introduce center-of-mass coordinates, X = (x1+y1)/2 and x = y1−x1. The
reduced mass of the system is m/2. We factor out the phase depending on X , leaving
us only with a wave function depending on x. The Schrödinger equation we consider
+ 2σH|x|ψ = Eψ
with a matching condition at x = 0 between the wave function ψ(x) at x > 0 and the
wave function at x < 0. There is actually a further complication, which we do not
consider here; the potential changes slightly in the region where x ≈ 0. This is due
to the fact that the color charge is slightly smeared out. This smearing out can be
calculated from the form factor [12].
Our results (2.3), (2.5) for the unbound two-particle state, tell us that for x1 ≈ y1,
where the effect of the potential can be ignored, the bound-state wave function in the
center-of-mass frame will be of the form
ψ(x) =
cos(px+ ω) , x < 0
cos[−px+ ω − φ(p)] , x > 0 , (3.1)
for some angle ω, where p = p1 − p2 and φ(p) = π− 3−2 ln 2πm |p|+O(|p|
2/m2). The value
of p near x = 0 is given by p = (mE)1/2, where E is the energy eigenvalue of the state.
This is the matching condition between the wave function for x > 0 and for x < 0.
The wave function for x < 0 an Airy function. So is the wave function for x > 0.
We therefore obtain the approximate WKB form
ψ(x) =
C(x+ E
)−1/4 cos
(2mσH)
1/2(x+ E
)3/2 − π
, x < 0
C ′( E
− x)−1/4 cos
(2mσH)
1/2( E
− x)3/2 + π
, x > 0
, (3.2)
for some constants C and C ′. The expression (3.2) can be made to agree with (3.1) for
small x, provided the generalization of the Bohr-Sommerfeld quantization condition
2(m)1/2
E3/2n +
3− 2 ln 2
πm1/2
E1/2n −
π = 0 , n = 0, 1, 2, . . . , (3.3)
is satisfied by E = En. The only new feature in this semi-classical formula is the
second term, produced by the phase shift. Absorbing the horizontal string tension in
the energy, by defining un = Enσ
H , this cubic equation becomes
2(m)1/2
u3/2n +
3− 2 ln 2
πm1/2
π = 0 .
The second term can be ignored for sufficiently small σH, i.e. sufficiently small g
There is a unique real solution of the cubic equation (3.3) for a given integer n ≥ 0,
because 3− 2 ln 2 = 1.613706 > 0. The low-lying glueball masses are given by
Mn = 2m+ En = 2m+
ǫ1/3n −
3(3− 2 ln 2)σH
ǫ−1/3n
, (3.4)
where
3πσH(n+
4m1/2
4m1/2(n + 1
3(3− 2 ln 2)σH
. (3.5)
4 Conclusions
We have identified the low-lying glueballs of the anisotropic Yang-Mills theory in (2 +
1) dimensions as bound pairs of the fundamental massive particles of the principal
chiral nonlinear sigma model. We found a matching condition for the bound-state
wave function at the origin, which when combined with elementary methods yields the
spectrum of the lightest states.
There are other aspects of the two-particle bound-state problem we have not con-
sidered here. First, the potential is not precisely linear in the region where the two
particles are close together. The corrections to the potential can be determined us-
ing form factors. This will slightly modify (3.3). A completely different issue is that
there are small corrections to the form factors themselves, coming from the presence
of bound states. This, in turn, will give a further correction to the horizontal string
tension found in [2]. Such corrections to form factors in theories close to integrability
were first discussed by Delfino, Mussardo and Simonetti [19]. The bound-state energies
proliferate between 2m and 4m, as g′0 → 0. Our method breaks down as the bound-
state mass reaches 4m, because the bound state develops an instability towards fission
into a pair of two-particle bound states. This is analogous to the situation for the Ising
model in a field [16], [17] as we stated earlier. It should be worthwhile to understand
the relativistic corrections to the bound-state formula, along the lines of the work of
Fonseca and Zamolodchikov [20].
A similar calculation is possible for SU(N). The exact S-matrix of the principal
chiral nonlinear sigma model is known for N > 2 [21]. An interesting feature is that the
phase shift should vanish as N → ∞, with g20N fixed, meaning that the wave function
would be continuous where FZ particles overlap.
It would be interesting to study the scattering of a glueball by an external particle.
If the scattering is sufficiently short range, the FZ particles could be liberated from the
glueball, after which hadronization would ensue.
The results of this paper and of references [1] and [2] may be extendable to the
standard (2 + 1)-dimensional isotropic Yang-Mills theory with g′0 = g0. The strat-
egy we have in mind is an anisotropic renormalization procedure. At the start is a
standard field theory with an isotropic cut-off. By anisotropically integrating out high-
momentum degrees of freedom, the isotropic theory will flow to an anisotropic theory
with a small momentum cut-off in the x2-direction and a large momentum cut-off in the
x1 direction. If the renormalized couplings satisfy the condition (1.2), we could apply
our techniques. A check of such a method would be approximate rotational invariance
of the string tension. This would give an analytic first-principles method of solving
the isotropic gauge theory with fixed dimensionful coupling constant e, and no cut-off.
The only other analytic weak-coupling argument for a mass gap and confinement in
(2 + 1)-dimensions, namely that of orbit-space distance estimates, discussed by Feyn-
man [22], by Karabali and Nair in the second of references [5], and by Semenoff and
the author [23] is suggestive, but has not yielded definite results yet1.
Acknowledgments
This research was supported in part by the National Science Foundation under Grant
No. PHY05-51164 and by a grant from the PSC-CUNY.
References
[1] P. Orland, Phys. Rev. D71 (2005) 054503.
[2] P. Orland, Phys. Rev. D74 (2006) 085001.
[3] P. Orland, Phys. Rev. D75 (2007) 025001.
[4] J.P. Greensite, Nucl. Phys. B166 (1980) 113; Q.-Z. Chen, X.-Q. Luo, S.-H. Guo,
Phys. Lett. B341 (1995) 349.
[5] D. Karabali and V.P. Nair, Nucl. Phys. B464 (1996) 135; Phys. Lett. B379
(1996) 141; D. Karabali, C. Kim and V.P. Nair, B524 (1998) 661; Phys. Lett.
B434 (1998) 103; Nucl. Phys. B566 (2000) 331, Phys. Rev. D64 (2001) 025011.
[6] A.M. Polyakov, Phys. Lett. B59 (1975) 82.
[7] R.G. Leigh, D. Minic and A. Yelnikov, hep-th/0604060 (2006).
[8] H.B. Meyer and M.J. Teper, Nucl.Phys. B668 (2003) 111.
[9] P. Orland, in Quark Confinement and the Hadron Spectrum VII, Ponta
Delgada, Azores, Portugal, 2006, AIP Conference Proceedings 892 (2007) 206,
available at http://proceedings.aip.org/proceedings.
1See also reference [24] for a general discussion of distance in orbit space.
http://arxiv.org/abs/hep-th/0604060
http://proceedings.aip.org/proceedings
[10] N. Arkani-Hamed, A.G. Cohen and H. Georgi, Phys. Rev. Lett. 86 (2001) 4757.
[11] D. Colladay and P. McDonald, hep-ph/0609084 (2006).
[12] M. Karowski and P. Weisz, Nucl. Phys. B139 (1978) 455.
[13] M. Caselle, P. Grinza and N. Magnoli, J. Stat. Mech. 0611 (2006) P003.
[14] B.M McCoy, C.A. Tracy and T.T. Wu, Phys. Rev. Lett. 38 (1977) 793; M. Sato,
T. Miwa and M. Jimbo, Publ. Res. Inst. Math. Sci. Kyoto (1978) 223; B. Berg,
M. Karowski and P. Weisz, Phys. Rev. D19 (1979) 2477.
[15] B. Svetitsky and L. G. Yaffe, Nucl. Phys. B210 (1982) 423.
[16] B. M. McCoy and T.T. Wu, Phys. Rev. D18 (1978) 1259.
[17] M.J. Bhaseen and A.M. Tsvelik, in From Fields to Strings; Circumnavigat-
ing Theoretical Physics, Ian Kogan memorial volumes, Vol. 1 (2004), pg. 661,
M. Shifman, A. Vainshtein and J. Wheater ed., cond-mat/0409602
[18] A.B. Zamolodchikov and Al. B. Zamolodchikov, Nucl. Phys. B133 (1978) 525.
[19] G. Delfino, G. Mussardo and P. Simonetti, Nucl. Phys. B473(1996) 469.
[20] P. Fonseca and A.B. Zamolodchikov, J. Stat. Phys, 110 (2003) 527.
[21] E. Abdalla, M.C.B. Abdalla and A. Lima-Santos, Phys. Lett.B140 (1984) 71; P.B.
Wiegmann; Phys. Lett. B142 (1984) 173; A.M. Polyakov and P.B. Wiegmann,
Phys. Lett. B131 (1983) 121; P.B. Wiegmann, Phys. Lett. B141 (1984) 217.
[22] R.P. Feynman, Nucl. Phys. B188 (1981) 479.
[23] P. Orland and G.W. Semenoff, Nucl. Phys. B576 (2000) 627.
[24] P. Orland, hep-th/9607134 (1996).
http://arxiv.org/abs/hep-ph/0609084
http://arxiv.org/abs/cond-mat/0409602
http://arxiv.org/abs/hep-th/9607134
	Introduction
	Scattering states of FZ particles
	The low-lying glueball spectrum
	Conclusions
ABSTRACT
  The confinement problem has been solved in the anisotropic (2+1)-dimensional
SU(N) Yang-Mills theory at weak coupling. In this paper, we find the low-lying
spectrum for N=2. The lightest excitations are pairs of fundamental particles
of the (1+1)-dimensional SU(2)XSU(2) principal chiral sigma model bound in a
linear potential, with a specified matching condition where the particles
overlap. This matching condition can be determined from the exactly-known
S-matrix for the sigma model.

<|endoftext|><|startoftext|>
Introduction
	Method
	The problem
	The solution
	Choosing a galaxy parametrization
	The models
	Errors
	Timings
	Tests on Simulated Data
	Star formation histories
	Wavelength range
	Noise
	Dust
	Results
	Handling SDSS data
	Duplicate galaxies
	Real fits
	VESPA and MOPED
	Conclusions
	Acknowledgments
ABSTRACT
  We introduce VErsatile SPectral Analysis (VESPA): a new method which aims to
recover robust star formation and metallicity histories from galactic spectra.
VESPA uses the full spectral range to construct a galaxy history from synthetic
models. We investigate the use of an adaptative parametrization grid to recover
reliable star formation histories on a galaxy-by-galaxy basis. Our goal is
robustness as opposed to high resolution histories, and the method is designed
to return high time resolution only where the data demand it. In this paper we
detail the method and we present our findings when we apply VESPA to synthetic
and real Sloan Digital Sky Survey (SDSS) spectroscopic data. We show that the
number of parameters that can be recovered from a spectrum depends strongly on
the signal-to-noise, wavelength coverage and presence or absence of a young
population. For a typical SDSS sample of galaxies, we can normally recover
between 2 to 5 stellar populations. We find very good agreement between VESPA
and our previous analysis of the SDSS sample with MOPED.

<|endoftext|><|startoftext|>
Introduction
Large interferometers are now being used to search for gravitational waves with
sufficient sensitivity to be able to detect signals from distant astrophysical sources.
At present, the three detectors of the Laser Interferometer Gravitational-wave
Observatory (LIGO) project [1] have achieved strain sensitivities consistent with their
design goals, while the GEO 600 [2] and Virgo [3] detectors are in the process of being
commissioned and are expected to reach comparable sensitivities. Experience gained
with these detectors, TAMA300 [4], and several small prototype interferometers has
Search for gravitational-wave bursts in LIGO data 5
nurtured advanced designs for future detector upgrades and new facilities, including
Advanced LIGO [5], Advanced Virgo [6], and the Large-scale Cryogenic Gravitational-
wave Telescope (LCGT) proposed to be constructed in Japan [7]. The LIGO Scientific
Collaboration (LSC) carries out the analysis of data collected by the LIGO and
GEO 600 gravitational-wave detectors, and has begun to pursue joint searches with
other collaborations (see, for example, [8]) as the network of operating detectors
evolves.
As the exploration of the gravitational-wave sky can now be carried out with
greater sensitivity than ever before, it is important to search for all plausible signals
in the data. In addition to well-modeled signals such as those from binary inspirals [9]
and spinning neutron stars [10], some astrophysical systems may emit gravitational
waves which are modeled imperfectly (if at all) and therefore cannot reliably be
searched for using matched filtering. Examples of such imperfectly-modeled systems
include binary mergers (despite recent advances in the fidelity of numerical relativity
calculations for at least some cases; see, for example, [11]) and stellar core collapse
events. For the latter, several sets of simulations have been carried out in the past
(see, for example, [12] and [13]), but more recent simulations have suggested a new
resonant core oscillation mechanism, driven by in-falling material, which appears to
power the supernova explosion and also to emit strong gravitational waves [14, 15].
Given the current uncertainties regarding gravitational wave emission by systems such
as these, as well as the possibility of detectable signals from other astrophysical sources
which are unknown or for which no attempt has been made to model gravitational
wave emission, it is desirable to cast a wide net.
In this article, we report the results of a search for gravitational-wave “bursts”
that is designed to be able to detect short-duration (≪ 1 s) signals of arbitrary form
as long as they have significant signal power in the most sensitive frequency band
of LIGO, considered here to be 64–1600 Hz. This analysis uses LIGO data from
the fourth science run carried out by the LSC, called S4, and uses the same basic
methods as previous LSC burst searches [17, 18] that were performed using data from
the S2 and S3 science runs. (A burst search was performed using data from the S1
science run using different methods [16].) We briefly describe the instruments and
data collection in section 2. In sections 3 and 4 we review the two complementary
signal processing methods—one based on locating signal power in excess of the baseline
noise and the other based on cross-correlating data streams—that are used together
to identify gravitational-wave event candidates. We note where the implementations
have been improved relative to the earlier searches and describe the signal consistency
tests which are based on the outputs from these tools. Section 5 describes additional
selection criteria which are used to “clean up” the data sample, reducing the average
rate of spurious triggers in the data. The complete analysis “pipeline” finds no event
candidates that pass all of the selection criteria, so we present in section 6 an upper
limit on the rate of gravitational-wave events which would be detected reliably by our
pipeline.
The detectability of a given type of burst, and thus the effective rate limit for a
particular astrophysical source model, depends on the signal waveform and amplitude;
in general, the detection efficiency (averaged over sky positions and arrival times) is less
than unity. We do not attempt a comprehensive survey of possible astrophysical signals
in this paper, but use a Monte Carlo method with a limited number of ad-hoc simulated
signals to evaluate the amplitude sensitivity of our pipeline, as described in section 7.
Overall, this search has much better sensitivity than previous searches, mostly due to
Search for gravitational-wave bursts in LIGO data 6
Mode Cleaner
Smoothes out fluctuations
of the input beam,
passes only fundamental
Gaussian beam mode
Stabilized
Laser
Power Recycling Mirror
(2.7% transmission)
Increases the stored power
by a factor of ~45, reducing
the photostatistics noise
Fabry-Perot Arm Cavity
Increases the sensitivity
to small length changes by
a factor of ~140
Photodiode
Input Mirror End Mirror
Beam Splitter
(50% transmission)
2 km or 4 km
Figure 1. Simplified optical layout of a LIGO interferometer.
using lower-noise data and partly due to improvements in the analysis pipeline. In
section 8 we estimate the amplitude sensitivity for certain modeled signals of interest
and calculate approximate distances at which those signals could be detected with 50%
efficiency. This completed S4 search sets the stage for burst searches now underway
using data from the S5 science run of the LIGO and GEO 600 detectors, which benefit
from much longer observation time and will be able to detect even weaker signals.
2. Instruments and data collection
LIGO comprises two observatory sites in the United States with a total of three
interferometers. As shown schematically in figure 1, the optical design is a Michelson
interferometer augmented with additional partially-transmitting mirrors to form
Fabry-Perot cavities in the arms and to “recycle” the outgoing beam power by
interfering it with the incoming beam. Servo systems are used to “lock” the mirror
positions to maintain resonance in the optical cavities, as well as to control the mirror
orientations, laser frequency and intensity, and many other degrees of freedom of the
apparatus. Interference between the two beams recombining at the beam splitter is
detected by photodiodes, providing a measure of the difference in arm lengths that
would be changed by a passing gravitational wave. The large mirrors which direct
the laser beams are suspended from wires, with the support structures isolated from
ground vibrations using stacks of inertial masses linked by damped springs. Active
feed-forward and feedback systems provide additional suppression of ground vibrations
for many of the degrees of freedom. The beam path of the interferometer, excluding
the laser light source and the photodiodes, is entirely enclosed in a vacuum system.
The LIGO Hanford Observatory in Washington state has two interferometers within
the same vacuum system, one with arms 4 km long (called H1) and the other with
arms 2 km long (called H2). The LIGO Livingston Observatory in Louisiana has a
single interferometer with 4 km long arms, called L1.
The response of an interferometer to a gravitational wave arriving at local time
Search for gravitational-wave bursts in LIGO data 7
t depends on the dimensionless strain amplitude and polarization of the wave and its
arrival direction with respect to the arms of the interferometer. In the low-frequency
limit, the differential strain signal detected by the interferometer (effective arm length
difference divided by the length of an arm) can be expressed as a projection of the
two polarization components of the gravitational wave, h+(t) and h×(t), with antenna
response factors F+(α, δ, t) and F×(α, δ, t):
hdet(t) = F+(α, δ, t)h+(t) + F×(α, δ, t)h×(t) , (1)
where α and δ are the right ascension and declination of the source. F+ and F× are
distinct for each interferometer site and change slowly with t over the course of a
sidereal day as the Earth’s rotation changes the orientation of the interferometer with
respect to the source location.
The electrical signal from the photodiode is filtered and digitized continuously at a
rate of 16 384 Hz. The time series of digitized values, referred to as the “gravitational-
wave channel” (GW channel), is recorded in a computer file, along with a timestamp
derived from the Global Positioning System (GPS) and additional information. The
relationship between a given gravitational-wave signal and the digitized time series is
measured in situ by imposing continuous sinusoidal position displacements of known
amplitude on some of the mirrors. These are called “calibration lines” because they
appear as narrow line features in a spectrogram of the GW channel.
Commissioning the LIGO interferometers has required several years of effort and
was the primary activity through late 2005. Beginning in 2000, a series of short data
collection runs was begun to establish operating procedures, test the detector systems
with stable configurations, and provide data for the development of data analysis
techniques. The first data collection run judged to have some scientific interest,
science run S1, was conducted in August-September 2002 with detector noise more
than two orders of magnitude higher than the design goal. Science runs S2 and S3
followed in 2003 with steadily improving detector noise, but with a poor duty cycle
for L1 due primarily to low-frequency, large-amplitude ground motion from human
activities and weather. During 2004, a hydraulic pre-isolation system was installed
and commissioned at the Livingston site to measure the ground motion and counteract
it with a relative displacement between the external and internal support structures
for the optical components, keeping the internal components much closer to an inertial
frame at frequencies above 0.1 Hz. At the same time, several improvements were made
to the H1 interferometer at Hanford to allow the laser power to be increased to the
full design power of 10 W.
The S4 science run, which lasted from 22 February to 23 March 2005, featured
good overall “science mode” duty cycles of 80.5%, 81.4%, and 74.5% for H1, H2,
and L1, respectively, corresponding to observation times of 570, 576, and 528 hours.
Thanks to the improvements made after the S3 run, the detector noise during S4 was
within a factor of two of the design goal over most of the frequency band, as shown in
figure 2. The GEO 600 interferometer also collected data throughout the S4 run, but
was over a factor of 100 less sensitive than the LIGO interferometers at 200 Hz and
a factor of few at and above the 1 kHz frequency range. The analysis approach used
in this article effectively requires a gravitational-wave signal to be distinguishable
above the noise in each of a fixed set of detectors, so it uses only the three LIGO
interferometers and not GEO 600. There are a total of 402 hours of S4 during which
all three LIGO interferometers were simultaneously collecting science-mode data.
Search for gravitational-wave bursts in LIGO data 8
100 1000
LIGO Detector Sensitivities During S4 Science Run
Frequency (Hz)
LIGO SRD goal (4 km)
Figure 2. Best achieved detector noise for the three LIGO interferometers during
the S4 science run, in terms of equivalent gravitational wave strain amplitude
spectral density. “LIGO SRD goal” is the sensitivity goal for the 4-km LIGO
interferometers set forth in the 1995 LIGO Science Requirements Document [19].
3. Trigger generation
The first stage of the burst search pipeline is to identify times when the GW channels
of the three interferometers appear to contain signal power in excess of the baseline
noise; these times, along with parameters derived from the data, are called “triggers”
and are used as input to later processing stages. As in previous searches [17, 18],
the WaveBurst algorithm [20] is used for this purpose; it will only be summarized
here [21].
WaveBurst performs a linear wavelet packet decomposition, using the symlet
wavelet basis [22], on short intervals of gravitational-wave data from each
interferometer. This decomposition produces a time-frequency map of the data similar
to a windowed Fourier transformation. A time-frequency data sample is referred to as a
pixel. Pixels containing significant excess signal power are selected in a non-parametric
way by ranking them with other pixels at nearby times and frequencies. As in the S3
analysis, WaveBurst has been configured for S4 to use six different time resolutions
and corresponding frequency resolutions, ranging from 1/16 s by 8 Hz to 1/512 s by
256 Hz, to be able to closely match the natural time-frequency properties of a variety
of burst signals. The wavelet decomposition is restricted to 64–2048 Hz. At any
given resolution, significant pixels from the three detector data streams are compared
and coincident pixels are selected; these are used to construct “clusters”, potentially
spanning many pixels in time and/or frequency, within which there is evidence for
a common signal appearing in the different detector data streams. These coincident
clusters form the basis for triggers, each of which is characterized by a central time,
Search for gravitational-wave bursts in LIGO data 9
Entries  8325975
Mean    2.583
RMS    0.9666
0 5 10 15 20 25 30 35 40
Figure 3. Distribution of Zg values for all WaveBurst triggers. The arrow shows
the location of the initial significance cut, Zg > 6.7.
duration, central frequency, frequency range, and overall significance Zg as defined
in [23]. Zg is calculated from the pixels in the cluster and is roughly proportional to
the geometric average of the excess signal power measured in the three interferometers,
relative to the average noise in each interferometer at the relevant frequency. Thus,
a large value of Zg indicates that the signal power in those pixels is highly unlikely
to have resulted from usual instrumental noise fluctuations. In addition, the absolute
strength of the signal detected by each interferometer within the sensitive frequency
band of the search is estimated in terms of the root-sum-squared amplitude of the
detected strain,
hrssdet =
|hdet(t)|
dt . (2)
WaveBurst was run on time intervals during which all three LIGO interferometers
were in science mode, but omitting periods when simulated signals were injected into
the interferometer hardware, any photodiode readout experienced an overflow, or the
data acquisition system was not operating. In addition, the last 30 seconds of each
science-mode data segment were omitted because it was observed that loss of “lock”
is sometimes preceded by a period of instability. These selection criteria reduced the
amount of data processed by WaveBurst from 402 hours to 391 hours.
For this analysis, triggers found by WaveBurst are initially required to have a
frequency range which overlaps 64–1600 Hz. An initial significance cut, Zg ≥ 6.7, is
applied to reject the bulk of the triggers and limit the number passed along to later
stages of the analysis. Figure 3 shows the distribution of Zg prior to applying this
significance cut.
Besides identifying truly simultaneous signals in the three data streams,
WaveBurst applies the same pixel matching and cluster coincidence tests to the three
data streams with many discrete relative time shifts imposed between the Hanford
and Livingston data streams, each much larger than the maximum light travel time
between the sites and the duration of the signals targeted by this search. The time-
shifted triggers found in this way provide a large sample to allow the “background”
(spurious triggers produced in response to detector noise in the absence of gravitational
waves) to be studied, under the assumption that the detector noise properties do not
Search for gravitational-wave bursts in LIGO data 10
−150 −100 −50 0 50 100 150
Mean = 41.1
χ2 = 130.5
d.o.f. = 97
WaveBurst trigger rate versus time shift
Time shift (s)
Figure 4. WaveBurst trigger rate as a function of the relative time shift applied
between the Hanford and Livingston data streams. The horizontal line is a fit to
a constant value, yielding a χ2 of 130.5 for 97 degrees of freedom.
vary much over the span of a few minutes and are independent at the two sites.
The two Hanford data streams are not shifted relative to one another, so that any
local environmental effects which influence both detectors are preserved. In fact,
some correlation in time is observed between noise transients in the H1 and H2 data
streams.
Initially, WaveBurst found triggers for 98 time shifts in multiples of 3.125 s
between −156.25 and −6.25 s and between +6.25 and +156.25 s. These 5119 triggers,
called the “tuning set”, were used to choose the parameters of the signal consistency
tests and additional selection criteria described in the following two sections. As
shown in figure 4, the rate of triggers in the tuning set is roughly constant for all time
shifts, with a marginal χ2 value but without any gross dependence on time shift. The
unshifted triggers were kept hidden throughout the tuning process, in order to avoid
the possibility of human bias in the choice of analysis parameters.
4. Signal consistency tests
The WaveBurst algorithm requires only a rough consistency among the different
detector data streams—namely, some apparent excess power in the same pixels
in the wavelet decomposition—to generate a trigger. This section describes more
sophisticated consistency tests based on the detailed content of the GW channels.
These tests succeed in eliminating most WaveBurst triggers in the data, while keeping
essentially all triggers generated in response to simulated gravitational-wave signals
added to the data streams. (The simulation method is described in section 7.) Similar
tests were also used in the S3 search [18].
Search for gravitational-wave bursts in LIGO data 11
 ]Hz [ strain / rssdetH1 h
-2210 -2110 -2010 -1910
-2210
-2110
-2010
-1910
 ]Hz [ strain / rssdetH1 h
-2210 -2110 -2010 -1910
-2210
-2110
-2010
-1910
 ]Hz [ strain / rssdetH1 h
-2210 -2110 -2010 -1910
-2210
-2110
-2010
-1910
(b) (c)
Figure 5. (a) Two-dimensional histogram, with bin count indicated by greyscale,
of H2 vs. H1 amplitudes reconstructed by WaveBurst for the tuning set of
time-shifted triggers. (b) Two-dimensional histogram of H2 vs. H1 amplitudes
reconstructed for simulated sine-Gaussian signals with a wide range of frequencies
and amplitudes from sources uniformly distributed over the sky (see section 7). In
these plots, the diagonal lines show the limits of the H1/H2 amplitude consistency
cut: 0.5 < ratio < 2 . (c) Two-dimensional histogram of L1 vs. H1 amplitudes for
the same simulated sine-Gaussian signals. Diagonal lines are drawn at ratios of
0.5 and 2 only to guide the eye; no cut is applied using this pair of interferometers.
4.1. H1/H2 amplitude consistency test
Because the two Hanford interferometers are co-located and co-aligned, they will
respond identically (in terms of strain) to any given gravitational wave. Thus, the
overall root-sum-squared amplitudes of the detected signals, estimated by WaveBurst
according to equation (2), should agree well if the estimation method is reliable.
Figure 5a shows that the time-shifted triggers in the tuning set often have poor
agreement between the detected signal amplitudes in H1 and H2. In contrast,
simulated signals injected into the data are found with amplitudes which usually agree
within a factor of 2, as shown in figure 5b. Therefore, we keep a trigger only if the
ratio of estimated signal amplitudes is in the range 0.5 to 2.
The Livingston interferometer is roughly aligned with the Hanford interferome-
ters, but the curvature of the Earth makes exact alignment impossible. The antenna
responses to a given gravitational wave will tend to be similar, but not reliably enough
to allow a consistency test which is both effective at rejecting noise triggers and effi-
cient at retaining simulated signals, as shown in figure 5c.
Search for gravitational-wave bursts in LIGO data 12
4.2. Cross-correlation consistency tests
The amplitude consistency test described in the previous subsection simply compares
scalar quantities derived from the data, without testing whether the waveforms are
similar in detail. We use a program called CorrPower [24], also used in the S3 burst
search [18], to calculate statistics based on Pearson’s linear correlation statistic,
i=1(xi − x̄)(yi − ȳ)
i=1(xi − x̄)
i=1(yi − ȳ)
. (3)
In the above expression {xi} and {yi} are sequences selected from the two GW channel
time series, possibly with a relative time shift, and x̄ and ȳ are their respective mean
values. The length of each sequence, N samples, corresponds to a chosen time window
(see below) over which the correlation is to be evaluated. r assumes values between
−1 for fully anti-correlated sequences and +1 for fully correlated sequences.
The r statistic measures the correlation between two data streams, such as
would be produced by a common gravitational-wave signal embedded in uncorrelated
detector noise [25]. It compares waveforms without being sensitive to the relative
amplitudes, and is thus complementary to the H1/H2 amplitude consistency test
described above. Furthermore, the r statistic may be used to test for a correlation
between H1 and L1 or between H2 and L1, even though these pairs consist of
interferometers with different antenna response factors, because each polarization
component will produce a measurable correlation for a suitable relative time delay
(unless the wave happens to arrive from one of the special directions for which one
of the detectors has a null response for that polarization component). In the special
case of a linearly polarized gravitational wave, the detected signals will simply differ
by a multiplicative factor, which can be either positive or negative depending on the
polarization angle and arrival direction.
Before calculating the r statistic for each detector pair, the data streams are
filtered to select the frequency band of interest (bandpass between 64 Hz and 1600 Hz)
and whitened to equalize the contribution of noise from all frequencies within this
band. The filtering is the same as was used in the S3 search [18] except for the
addition of a Q=10 notch filter, centered at 345 Hz, to avoid measuring correlations
from the prominent vibrational modes of the wires used to suspend the mirrors, which
are clustered around that frequency. The r statistic is then calculated over multiple
time windows with lengths of 20, 50, and 100 ms and a range of starting times,
densely placed (99% overlap) to cover the full duration of the trigger as reported by
WaveBurst; the maximum value from among these different time windows is used.
CorrPower [26] calculates two quantities, derived from the r statistic, which are
used to select triggers. The first of these, called R0, is simply the signed cross-
correlation between H1 and H2 with no relative time delay. Triggers with R0 < 0
are rejected. The second quantity, called Γ, combines the r-statistic values from the
three detector pairs, allowing relative time delays of up to 11 ms between H1 and L1
and between H2 and L1, and up to 1 ms between H1 and H2 (to allow for a possible
mismatch in time calibration). Specifically, Γ is the average of “confidence” values
calculated from the absolute value of each of the three individual r-statistic values.
A large value of Γ indicates that the data streams are correlated to an extent that
is highly unlikely to have resulted from normal instrumental noise fluctuations. This
quantity complements Zg, providing a different and largely independent means for
distinguishing real signals from background.
Search for gravitational-wave bursts in LIGO data 13
0 5 10 15 20 25 30 35 40 45 50
0 5 10 15 20 25 30 35 40 45 50
(a) (b)
Figure 6. Plots of Γ versus Zg, after the H1/H2 amplitude consistency cut but
before any other cuts. (a) Scatter plot for all time-shifted triggers in the tuning
set. (b) Two-dimensional histogram, with bin count indicated by greyscale, for
simulated sine-Gaussian signals with a wide range of frequencies and amplitudes
from sources uniformly distributed over the sky (see section 7). In both plots, the
vertical dashed line indicates the initial WaveBurst significance cut at Zg=6.7.
Figure 6 shows plots of Γ vs. Zg for time-shifted triggers and for simulated
gravitational-wave signals after the H1/H2 amplitude consistency cut but before the
R0 cut. The time-shifted triggers with Γ < 12 and Zg < 20 are the tail of the bulk
distribution of triggers. The outliers with Γ > 12 all arise from a few distinct times
when large noise transients occurred in H1 and H2; these are found many times, paired
with different L1 time shifts, and have similar values of Γ because the calculation of
Γ is dominated by the H1-H2 pair in these cases. The outliers with Γ < 12 and
Zg > 20 are artefacts of sudden changes in the power line noise at 60 Hz and 180 Hz
which WaveBurst recorded as triggers. A cut on the value of Γ can eliminate many
of the time-shifted triggers in figure 6a, but at the cost of also rejecting weak genuine
gravitational-wave signals that may have the distribution in figure 6b. Therefore, the Γ
cut is chosen only after additional selection criteria have been applied; see section 5.3.
5. Additional selection criteria for event candidates
Environmental disturbances or instrumental misbehaviour occasionally produce non-
stationary noise in the GW channel of a detector which contributes to the recording of
a WaveBurst trigger. These triggers can sometimes pass the H1-H2 consistency and
cross-correlation consistency tests, particularly since an environmental disturbance
at the Hanford site affects both H1 and H2. As noted in the previous section, the
calculated value of Γ is susceptible to being dominated by the H1-H2 pair even if
there is minimal signal power in the L1 data stream. A significant background rate of
event candidates caused by environmental or instrumental effects could obscure the
rare gravitational-wave bursts that we seek, or else require us to apply more aggressive
cuts and thus lose sensitivity for weak signals.
This section describes the two general tactics we use to reject data with
identifiable problems and thereby reduce the rate of background triggers. First,
we make use of several “data quality flags” that have been introduced in order to
describe the status of the instruments and the quality of the recorded data over time
intervals ranging from seconds to hours. Second, we remove triggers attributed to
Search for gravitational-wave bursts in LIGO data 14
short-duration instrumental or environmental effects by applying “vetoes” based on
triggers generated from auxiliary channels which have been found to correlate with
transients in the GW channel. Applying data quality conditions and vetoes to the
data set reduces the amount of “live” observation time (or “livetime”) during which
an arriving gravitational-wave burst would be detected and kept as an event candidate
at the end of the analysis pipeline. Therefore, we must balance this loss (“deadtime”)
against the effectiveness for removing spurious triggers from the data sample.
Choosing data quality and veto conditions with reference to a sample of
gravitational-wave event candidates could introduce a selection bias and invalidate any
upper limit calculated from the sample. Therefore, we have evaluated the relevance
of potential data quality cuts and veto conditions using other trigger samples. In
addition to the tuning set of time-shifted WaveBurst triggers, we have applied the
KleineWelle [27] method to identify transients in each interferometer’s GW channel.
(We have also used KleineWelle to identify transients in numerous auxiliary channels
for veto studies, as described in 5.2.) Like WaveBurst, KleineWelle is a time-frequency
method utilizing multi-resolution wavelet decomposition, but it processes each data
channel independently [28]. In analyzing data, the time series is first whitened using
a linear predictor error filter [27]. Then the time-frequency decomposition is obtained
using the Haar wavelet transform. The squared wavelet coefficients normalized to the
scale’s (frequency’s) root-mean-square provide an estimate of the energy associated
with a certain time-frequency pixel. A clustering mechanism is invoked in order to
increase the sensitivity to signals with less than optimal shapes in the time-frequency
plane and a total normalized cluster energy is computed. The significance of a
cluster is then defined as the negative natural logarithm of the probability of the
computed total normalized cluster energy to have resulted from Gaussian white noise;
we apply a threshold on this significance to define KleineWelle triggers. The samples
of KleineWelle triggers from each detector, as well as the subsample of coincident
H1 and H2 triggers, are useful indicators of localized disturbances. They may in
principle contain one or more genuine gravitational-wave signals, but decisions about
data quality and veto conditions are based on the statistics of the entire sample which
is dominated by instrumental artefacts and noise fluctuations.
5.1. Data quality conditions
We wish to reject instances of clear hardware problems with the LIGO detectors
or conditions that could affect our ability to unequivocally register the passage of
gravitational-wave bursts. Various studies of the data, performed during and after
data collection, produced a catalog of conditions that might affect the quality of the
data. Each named condition, or “flag”, has an associated list of time intervals during
which the condition is present, derived either from one or more diagnostic channels
or from entries made in the electronic logbook by operators and scientific monitors.
We have looked for significant correlations between the flagged time intervals and
time-shifted WaveBurst triggers, and also between the flagged time intervals and
KleineWelle single-detector triggers (particularly the “outliers” with large significance
and the coincident H1 and H2 triggers). Based on these studies, we decided to impose
a number of data quality conditions.
We first require the calibration lines to be continuously present. On several
occasions when they dropped out briefly, due to a problem with the excitation engine,
the data is removed from the analysis. The livetime associated with these occurrences
Search for gravitational-wave bursts in LIGO data 15
is negligible while they are all correlated with transients appearing in the GW channel.
Local winds and sound from airplanes may couple to the instrument through
the ground and result in elevated noise and/or impulsive signals. A data quality
flag was established to identify intervals of local winds at the sites with speeds of
56 km/hour (35 miles per hour) and above. We studied the correlation of these times
with the single-detector triggers produced with KleineWelle. The correlation is more
apparent in the H2 detector, for which 7.4% of the most significant KleineWelle triggers
(threshold of 1600) coincide with the intervals of strong winds at the Hanford site. The
livetime that is rejected in this way is 0.66% of the H1-H2 coincident observation time
over which this study was performed. Thanks to improved acoustic isolation installed
after the S2 science run, acoustic noise from airplanes was not found to contribute
to triggers in the GW channel in general; however, a period of 300 seconds has been
rejected around a particularly loud time when a fighter jet passed over the Hanford
site.
Elevated low-frequency seismic activity has been observed to cause noise
fluctuations and transients in the GW channel. Data from several seismometers at the
Hanford observatory was band-pass filtered in various narrow bands between 0.4 Hz
and 2.4 Hz, and the root-mean-square signal in each band was tracked over time. A
set of particularly relevant seismometers and bands was selected, and time intervals
were flagged whenever a band in this set exceeded 7 times its median value. A follow
up analysis of the single instrument as well as coincident H1-H2 KleineWelle triggers
found significant correlation with the elevated seismic noise. The strongest correlation
is observed in the outlier triggers (KleineWelle significance of 1600 or greater) in H2,
of which 41.9% coincide with the seismic flags, compared to a deadtime of 0.6%.
In the two Hanford detectors, a diagnostic channel counting ADC overflows in the
length sensing and control subsystem was used to flag intervals for exclusion from the
analysis. One minute of livetime around these overflows is rejected. Such overflows
were indeed seen to correlate with single-detector outlier triggers in H1 (44.4% of
them, with 0.68% deadtime) and H2 (74.1% of them, with 0.41% deadtime).
Two data quality cuts are derived from “trend” data (summaries of minimum,
maximum, mean and root-mean-square values over each one-second period)
monitoring the interferometry used in the LIGO detectors. The first one is based
on occasional transient dips in the stored light in the arm cavities. These have been
identified by scanning the trend data for the relevant monitoring photodiodes, defining
the size of a dip as the fractional drop of the minimum in that second relative to the
average of the previous ten seconds, and applying various thresholds on the minimum
dip size. For the three LIGO detectors, thresholds of 5%, 4% and 5% respectively
for L1, H1 and H2 are used. High correlation of such light dips with single-detector
triggers is observed, while the deadtime resulting from them in each of the three LIGO
instruments is less than 0.6%. The second data quality cut of this type is based on the
DC level of light reaching the photodiode at the output of the interferometer, which
sees very little light when the interferometer is operating properly. By thresholding
on the trend data for this channel, intervals when its value was unusually high are
identified in H1 and L1. These intervals are seen to correlate with instrument outlier
triggers significantly. The deadtime resulting from them is 1.02% in H1 and 1.74% in
Altogether, these data quality cuts result in a net loss of observation time of 5.6%.
Search for gravitational-wave bursts in LIGO data 16
5.2. Auxiliary-channel vetoes
LIGO records thousands of auxiliary read-back channels of the servo control systems
employed in the instruments’ interferometric operation as well as auxiliary channels
monitoring the instruments’ physical environment. There are plausible couplings of
environmental disturbances or servo instabilities both to these monitoring channels
and to the GW channel; thus, transients appearing in these auxiliary channels may
be used to veto triggers seen simultaneously in the GW channel. This assumes that
a genuine gravitational-wave burst would not appear in these auxiliary channels, or
at least that any coupling is small enough to stay below the threshold for selecting
transients in these channels.
We have used KleineWelle to produce triggers from over 100 different auxiliary
channels that monitor the interferometry and the environment in the three LIGO
detectors. A first analysis of single-detector KleineWelle triggers from the L1 GW
channel and coincident KleineWelle triggers from the H1 and H2 GW channels
against respective auxiliary channels identified the ones that showed high GW channel
trigger rejection power with minimal livetime loss (in the vast majority of channels
much less that 1%). In addition to interferometric channels, environmental ones
(accelerometers and microphones) located on the optical tables holding the output
optics and photodiodes appeared to correlate with GW channel triggers recorded at
the same site.
Auxiliary interferometric channels (besides the GW channel) could in principle
be affected by a gravitational wave, and a veto condition derived from such a channel
could reject a genuine signal. Hardware signal injections imitating the passage of
gravitational waves through our detectors, performed at several pre-determined times
during the run, have been used to establish under what conditions each channel is
safe to use as a veto. Non-detection of a hardware injection by an auxiliary channel
suggests the unconditional safety of this channel as a veto in the search, assuming that
a reasonably broad selection of signal strengths and frequencies were injected. But
even if hardware injections are seen in the auxiliary channels, conditions can readily
be derived under which no triggers caused by the hardware injections are used as
vetoes. This involves imposing conditions on the significance of the trigger and/or
on the ratio of the signal strength seen in the auxiliary channel to that seen in the
GW channel. We have thus established the conditions under which several channels
involved in the length and angular sensing and control systems of the interferometers
can be used safely as vetoes. (The data quality conditions described in section 5.1
were also verified to be safe using hardware injections.)
The final choice of vetoes was made by examining the tuning set of time-
shifted triggers remaining in the WaveBurst search pipeline after applying the signal
consistency tests and data quality conditions. The ten triggers from the time-shifted
analysis with the largest values of Γ, plus the ten with the largest values of Zg, were
examined and six of them were found to coincide with transients in one or more of the
following channels: the in-phase and quadrature-phase demodulated signals from the
pick-off beam from the H1 beamsplitter, the in-phase demodulated pitch signal from
one of the wavefront sensors used in the H1 alignment sensing and control system, the
beam splitter pitch and yaw control signals, and accelerometer readings on the optical
tables holding the H1 and H2 output optics and photodiodes. KleineWelle triggers
produced from these seven auxiliary channels were clustered (with a 250 ms window)
and their union was taken. This defines the final list of veto triggers for this search,
Search for gravitational-wave bursts in LIGO data 17
each indicating a time interval (generally ≪ 1 s long) to be vetoed.
The total duration of the veto triggers considered in this analysis is at the level
of 0.15% of the total livetime. However, this does not reliably reflect the deadtime
of the search since a GW channel trigger is vetoed if it has any overlap with a veto
trigger. Thus, the actual deadtime of the search depends on the duration of the
signal being sought, as reconstructed by WaveBurst. We reproduce this effect in
the Monte Carlo simulation used to estimate the efficiency of the search (described
in section 7) by applying the same analysis pipeline and veto logic. The effective
deadtime depends on the morphology of the signal and on the signal amplitude, since
larger-amplitude signals tend to be assigned longer durations by WaveBurst. For the
majority of waveforms we considered in this search and for plausible signals strengths,
the resulting effective deadtime is of the order of 2%. Because this loss is signal-
dependent, in this analysis we consider it to be a loss of efficiency rather than a loss
of live observation time; in other words, the live observation time we state reflects the
data quality cuts applied but does not reflect the auxiliary-channel vetoes.
5.3. Gamma cut
The cuts described above cleaned up the outliers in the data considerably, as shown by
the sequence of scatter plots in figure 7. Following the data quality and veto criteria
we just described, the remaining time-shifted WaveBurst triggers (shown in figure 7d)
were used as the basis for choosing the cross correlation Γ threshold. As with previous
all-sky searches for gravitational-wave bursts with LIGO, we desire the number of
background triggers expected for the duration of the observation to be much less than
1 but not zero, typically of order ∼ 0.1. On that basis, we chose a threshold of Γ > 4
which results in 7 triggers in 98 time shifts, or 0.08 such triggers normalized to the
duration of the S4 observation time.
6. Search results
After all of the trigger selection criteria had been established using the tuning set of
time-shifted triggers, WaveBurst was re-run with a new, essentially independent set
of 100 time shifts, in increments of 5 s from −250 s to −5 s and from +5 s to +250 s,
in order to provide an estimate of the background which is minimally biased by the
choice of selection criteria. The total effective livetime for the time-shifted sample is
77.4 times the unshifted observation time, reflecting the reduced overlap of Hanford
and Livingston data segments when shifted relative to one another. The unshifted
triggers were looked at for the first time. Table 1 summarizes the trigger counts for
these time-shifted and unshifted triggers at each stage in the sequence of cuts. In
addition, the expected background at each stage (time-shifted triggers normalized to
the S4 observation time) is shown for direct comparison with the observed zero-lag
counts. Figure 8 shows a scatter plot of Γ vs. Zg and histograms of Γ for both time-
shifted and unshifted triggers after all other cuts. These new time-shifted triggers
are statistically consistent with the tuning set (figure 7d), although no triggers are
found with Zg > 15 in this case. Five unshifted triggers are found, distributed in a
manner reasonably consistent with the background. All five have Γ<4 and thus fail
the Γ cut. Three time-shifted triggers pass the Γ cut, corresponding to an estimated
average background of 0.04 triggers over the S4 observation time.
Search for gravitational-wave bursts in LIGO data 18
0 5 10 15 20 25 30 35 40 45 50
0 5 10 15 20 25 30 35 40 45 50
(a) (b)
0 5 10 15 20 25 30 35 40 45 50
0 5 10 15 20 25 30 35 40 45 50
(c) (d)
Figure 7. Scatter plots of Γ versus Zg for the tuning set of time-shifted triggers.
(a) All triggers; (b) after data quality cuts; (c) after data quality and H1-
H2 consistency cuts (amplitude ratio and R0); (d) after data quality, H1-H2
consistency, and auxiliary-channel vetoes.
Table 1. Counts of time-shifted and unshifted triggers as cuts are applied
sequentially. The column labeled “Normalized” is the time-shifted count divided
by 77.4, representing an estimate of the expected background for the S4
observation time.
Time-shifted
Unshifted
Cut Count Normalized Count
Data quality 3153 40.7 44
H1/H2 amplitude consistency 1504 19.4 14
R0 > 0 755 9.8 5
Auxiliary-channel vetoes 671 8.7 5
Γ > 4 3 0.04 0
With no unshifted triggers in the final sample, we place an upper limit on the
mean rate of gravitational-wave events that would be detected reliably (i.e., with
efficiency near unity) by this analysis pipeline. Since the background estimate is small
and is subject to some systematic uncertainties, we simply take it to be zero for
purposes of calculating the rate limit; this makes the rate limit conservative. With
15.5 days of observation time, the one-sided frequentist upper limit on the rate at 90%
confidence level is − ln (0.1)/T = 2.303/(15.5 days) = 0.15 per day. For comparison,
the S2 search [17] arrived at an upper limit of 0.26 per day. The S3 search [18] had
Search for gravitational-wave bursts in LIGO data 19
5 6 7 8 9 10 11 12 13 14 15
time-shifted
unshifted
0 1 2 3 4 5 6 7 8 9 10
0 1 2 3 4 5 6 7 8 9 10
unshifted
time-shifted
rms spread
(a) (b)
Figure 8. (a) Scatter plot of Γ vs. Zg for time-shifted triggers (grey circles)
and unshifted triggers (black circles) after all other analysis cuts. The vertical
dashed line indicates the initial WaveBurst significance cut at Zg=6.7. The
horizontal dashed line indicates the final Γ cut. (b) Overlaid histograms of Γ
for unshifted triggers (black circles) and mean background estimated from time-
shifted triggers (black stairstep with statistical error bars). The shaded bars
represent the expected root-mean-square statistical fluctuations on the number of
unshifted background triggers in each bin.
an observation time of only 8 days and did not state a rate limit.
7. Amplitude sensitivity of the search
The previous section presented a limit on the rate of a hypothetical population
of gravitational-wave signals for which the analysis pipeline has perfect detection
efficiency. However, the actual detection efficiency will depend on the signal waveform
and amplitude, being zero for very weak signals and generally approaching unity for
sufficiently strong signals. The signal processing methods used in this analysis are
expressly designed to be able to detect arbitrary waveforms as long as they have
short duration and frequency content in the 64–1600 Hz band which stands out above
the detector noise. Therefore, for any given signal of this general type, we wish to
determine a characteristic minimum signal amplitude for which the pipeline has good
detection efficiency. As in past analyses, we use a Monte Carlo technique with a
population of simulated gravitational wave sources. Simulated events are generated
at random sky positions and pseudo-random times (imposing a minimum separation
of 80 s) during the S4 run; the resulting signal waveforms in each interferometer are
calculated with the appropriate antenna factors and time delays. These simulated
signals are added to the actual detector data, and the summed data streams are
analyzed using the same pipeline with the same trigger selection criteria.
The intrinsic amplitude of a simulated gravitational wave may be characterized
by its root-sum-squared strain amplitude at the Earth, without folding in antenna
response factors:
hrss ≡
(|h+(t)|2 + |h×(t)|2) dt . (4)
This quantity has units of s1/2, or equivalently Hz−1/2. In general, the root-sum-
squared signal measured by a given detector, hrssdet, will be somewhat smaller. The
Search for gravitational-wave bursts in LIGO data 20
Monte Carlo approach taken for this analysis is to generate a set of signals all with
fixed hrss and then to add this set of signals to the data with several discrete scale
factors to evaluate different signal amplitudes. For a given signal morphology and hrss,
the efficiency of the pipeline is the fraction of simulated signals which are successfully
recovered.
For this analysis, we do not attempt to survey the complete spectrum of
astrophysically motivated signals, but rather we use a limited number of ad-hoc
waveforms to characterize the sensitivity of the search in terms of hrss. Similar
sensitivities may be expected for different waveforms with similar overall properties
(central frequency, bandwidth, duration); the degree to which this is true has been
investigated in [18] and [29]. The waveforms evaluated in the present analysis are:
• Sine-Gaussian: sinusoid with a given frequency f0 inside a Gaussian amplitude
envelope with dimensionless width Q and arrival time t0:
h(t0 + t) = h0 sin(2πf0t) exp
− (2πf0t)
. (5)
These are generated with linear polarization, with f0 ranging from 70 Hz to
1053 Hz and with Q equal to 3, 8.9, and 100. The signal consistency tests
described in section 4 were developed using an ensemble of sine-Gaussian signals
with all simulated frequencies and Q values.
• Gaussian: a simple unipolar waveform with a given width τ and linear
polarization:
h(t0 + t) = h0 exp(−t
2/τ2) . (6)
• Band-limited white noise burst: a random signal with two independent
polarization components that are white over a given frequency band, described
by a base frequency f0 and a bandwidth ∆f (i.e. containing frequencies from f0
to f0 + ∆f). The signal amplitude has a Gaussian time envelope with a width
τ . Because these waveforms have two uncorrelated polarizations (in a coordinate
system at some random angle), they provide a stringent check on the robustness
of our cross-correlation test.
In all cases, we generate each simulated signal with a random arrival direction and a
random angular relationship between the wave polarization basis and the Earth.
Figures 9 and 10 show the measured efficiency of the analysis pipeline as a function
of root-sum-squared strain amplitude, ǫ(hrss), for each simulated waveform. The
efficiency data points for each waveform are fit with a function of the form
ǫ(hrss) =
)α(1+β tanh(hrss/hmidrss ))
, (7)
where ǫmax corresponds to the efficiency for strong signals (normally very close to
unity), hmidrss is the hrss value corresponding to an efficiency of ǫmax/2, β is the
parameter that describes the asymmetry of the sigmoid (with range −1 to +1), and
α describes the slope. Data points with efficiency below 0.05 are excluded from the
fit because they do not necessarily follow the functional form, while data points with
efficiency equal to 1.0 are excluded because their asymmetric statistical uncertainties
are not handled properly in the chi-squared fit. The empirical functional form in
equation 7 has been found to fit the remaining efficiency data points well.
Note that the Gaussian waveform with τ = 6.0 ms has efficiency less than 0.8
even for the largest simulated amplitude. This broad waveform, with little signal
Search for gravitational-wave bursts in LIGO data 21
]Hz [strain/rssh
-2210 -2110 -2010 -1910
70 Hz
100 Hz
153 Hz
235 Hz
361 Hz
554 Hz
849 Hz
1053 Hz
Sine-Gaussians, Q=3
]Hz [strain/rssh
-2210 -2110 -2010 -1910
70 Hz
100 Hz
153 Hz
235 Hz
361 Hz
554 Hz
849 Hz
1053 Hz
Sine-Gaussians, Q=8.9
]Hz [strain/rssh
-2210 -2110 -2010 -1910
1 100 Hz
153 Hz
235 Hz
361 Hz
554 Hz
849 Hz
1053 Hz
Sine-Gaussians, Q=100
Figure 9. Efficiency curves for simulated gravitational-wave signals: linearly-
polarized sine-Gaussian waves with (a) Q=3; (b) Q=8.9; (c) Q=100. Statistical
errors are comparable to the size of the plot symbols.
Search for gravitational-wave bursts in LIGO data 22
]Hz [strain/rssh
-2210 -2110 -2010 -1910
0.05 ms
0.1 ms
0.25 ms
0.5 ms
1.0 ms
2.5 ms
4.0 ms
6.0 ms
Gaussians
]Hz [strain/rssh
-2210 -2110 -2010 -1910
100-110 Hz, 0.1 s
100-200 Hz, 0.1 s
100-200 Hz, 0.01 s
250-260 Hz, 0.1 s
250-350 Hz, 0.1 s
250-350 Hz, 0.01 s
1000-1010 Hz, 0.1 s
1000-1100 Hz, 0.1 s
1000-1100 Hz, 0.01 s
1000-2000 Hz, 0.1 s
1000-2000 Hz, 0.01 s
1000-2000 Hz, 0.001 s
Band-limited white noise bursts
Figure 10. Efficiency curves for simulated gravitational-wave signals: (a)
linearly-polarized Gaussian waves; (b) band-limited white-noise bursts with two
independent polarization components. Note that four curves in the latter plot
are nearly identical: 100–110 Hz, 0.1 s; 100–200 Hz, 0.1 s; 250–260 Hz, 0.1 s;
and 250–350 Hz, 0.01 s. Statistical errors are comparable to the size of the plot
symbols.
power at frequencies above 64 Hz (the lower end of the nominal search range), is at
the limit of what the search method can detect. For some of the other waveforms,
the efficiency levels off at a value slightly less than 1.0 due to the application of the
auxiliary-channel vetoes, which randomly coincide in time with some of the simulated
signals. This effect is most pronounced for the longest-duration simulated signals due
to the veto logic used in this analysis, which rejects a trigger if there is any overlap
between the reconstructed trigger duration and a vetoed time interval. The 70-Hz
sine-Gaussian with Q=100 has a duration longer than 1 s and is reconstructed quite
poorly; it is omitted from figure 9c and from the following results.
The analytic expressions of the fits are used to determine the signal strength hrss
for which efficiencies of 50% and 90% are reached. These fits are subject to statistical
Search for gravitational-wave bursts in LIGO data 23
Table 2. hrss values corresponding to 50% and 90% detection efficiencies for
simulated sine-Gaussian signals with various central frequencies and Q values.
The 70 Hz sine-Gaussian with Q=100 is not detected reliably.
hrss (10
−21 Hz−1/2)
50% efficiency 90% efficiency
Central
frequency (Hz) Q=3 Q=8.9 Q=100 Q=3 Q=8.9 Q=100
70 3.4 5.8 — 19.2 52.0 —
100 1.8 1.7 2.6 10.4 9.4 17.7
153 1.5 1.4 1.7 8.2 8.3 8.7
235 1.6 1.7 1.9 11.0 9.8 12.6
361 2.4 2.7 3.2 11.5 16.7 20.9
554 3.3 3.2 3.2 16.1 17.9 20.4
849 5.9 4.9 4.5 28.4 28.9 24.9
1053 8.3 7.2 6.6 39.3 37.5 37.5
Table 3. hrss values corresponding to 50% and 90% detection efficiencies for
simulated Gaussian signals with various widths. The waveform with τ=6.0 ms
does not reach an efficiency of 90% within the range of signal amplitudes
simulated.
hrss (10
−21 Hz−1/2)
τ (ms) 50% efficiency 90% efficiency
0.05 6.6 33.9
0.1 4.4 25.3
0.25 3.0 14.3
0.5 2.2 13.5
1.0 2.2 10.6
2.5 3.4 20.5
4.0 8.3 43.3
6.0 39.0 —
errors from the limited number of simulations performed to produce the efficiency data
points. Also, the overall amplitude scale is subject to the uncertainty in the calibration
of the interferometer response, conservatively estimated to be 10% [30]. We increase
the nominal fitted hrss values by the amount of these systematic uncertainties to arrive
at conservative hrss values at efficiencies of 50% and 90%, summarized in tables 2,
3, and 4. The sine-Gaussian hrss values are also displayed graphically in figure 11,
showing how the frequency dependence generally follows that of the instrumental
noise.
Event rate limits as a function of waveform type and signal amplitude can be
represented by an “exclusion diagram”. Each curve in an exclusion diagram indicates
what the rate limit would be for a population of signals with a fixed hrss, as a
function of hrss. The curves in figure 12 illustrate, using selected sine-Gaussian and
Gaussian waveforms that were also considered in the S1 and S2 analyses, that the
amplitude sensitivities achieved by this S4 analysis are at least an order of magnitude
better than the sensitivities achieved by the S2 analysis. For instance, the 50%
efficiency hrss value for 235 Hz sine-Gaussians with Q=8.9 is 1.5 × 10
−20 Hz−1/2
for S2 and 1.7× 10−21 Hz−1/2 for S4. (Exclusion curves were not generated for the S3
Search for gravitational-wave bursts in LIGO data 24
Table 4. hrss values corresponding to 50% and 90% detection efficiencies for
simulated “white noise burst” signals with various base frequencies, bandwidths,
and durations.
hrss (10
−21 Hz−1/2)
Base frequency Bandwidth Duration
(Hz) (Hz) (s) 50% eff. 90% eff.
100 10 0.1 1.8 4.7
100 100 0.1 1.9 4.1
100 100 0.01 1.3 2.9
250 10 0.1 1.8 4.5
250 100 0.1 2.4 5.4
250 100 0.01 1.8 4.3
1000 10 0.1 6.5 15.8
1000 100 0.1 7.9 16.7
1000 100 0.01 5.5 12.7
1000 1000 0.1 19.2 42.6
1000 1000 0.01 9.7 22.3
1000 1000 0.001 9.5 23.7
100 1000
Sensitivities for Sine−Gaussian Waveforms
Frequency (Hz)
90% for Q=3
90% for Q=8.9
90% for Q=100
50% for Q=3
50% for Q=8.9
50% for Q=100
H2 noise
L1 noise
H1 noise
Figure 11. Sensitivity of the analysis pipeline for sine-Gaussian waveforms as a
function of frequency and Q. Symbols indicate the hrss values corresponding to
50% and 90% efficiency, taken from table 2. The instrumental sensitivity curves
from figure 2 are shown for comparison.
analysis, but the S3 sensitivity was 9 × 10−21 Hz−1/2 for this particular waveform.)
The improvement is greatest for lower-frequency sine-Gaussians and for the widest
Gaussians, due to the reduced low-frequency detector noise and the explicit extension
of the search band down to 64 Hz.
Search for gravitational-wave bursts in LIGO data 25
]Hz [strain/rssh
-2210 -2110 -2010 -1910 -1810 -1710 -1610
70 Hz
153 Hz
235 Hz
554 Hz
849 Hz
1053 Hz
S4 LIGO-TAMA
(a) Sine-Gaussians with Q=8.9
]Hz [strain/rssh
-2210 -2110 -2010 -1910 -1810 -1710 -1610
0.05 ms
0.1 ms
0.25 ms
1.0 ms
2.5 ms
4.0 ms
(b) Gaussians
Figure 12. Exclusion diagrams (rate limit at 90% confidence level, as a function
of signal amplitude) for (a) sine-Gaussian and (b) Gaussian simulated waveforms
for this S4 analysis compared to the S1 and S2 analyses (the S3 analysis did not
state a rate limit). These curves incorporate conservative systematic uncertainties
from the fits to the efficiency curves and from the interferometer response
calibration. The 849 Hz curve labeled “LIGO-TAMA” is from the joint burst
search using LIGO S2 with TAMA DT8 data [8], which included data subsets
with different combinations of operating detectors with a total observation time
of 19.7 days and thereby achieved a lower rate limit. The hrss sensitivity of the
LIGO-TAMA search was nearly constant for sine-Gaussians over the frequency
range 700–1600 Hz.
Search for gravitational-wave bursts in LIGO data 26
8. Astrophysical reach estimates
In order to set an astrophysical scale to the sensitivity achieved by this search, we
can ask what amount of mass converted into gravitational-wave burst energy at a
given distance would be strong enough to be detected by the search pipeline with 50%
efficiency. We start with the expression for the instantaneous energy flux emitted by a
gravitational wave source in the two independent polarizations h+(t) and h×(t) [31],
d2EGW
(ḣ+)
2 + (ḣ×)
, (8)
and follow the derivations in [32]. Plausible astrophysical sources will, in general, emit
gravitational waves anisotropically, but here we will assume isotropic emission in order
to get simple order-of-magnitude estimates. The above formula, when integrated over
the signal duration and over the area of a sphere at radius r (assumed not to be at
a cosmological distance), yields the total energy emitted in gravitational waves for a
given signal waveform. For the case of a sine-Gaussian with frequency f0 and Q ≫ 1,
we find
EGW =
(2πf0)
2h2rss . (9)
Taking the waveform for which we have the best hrss sensitivity, a 153 Hz sine-
Gaussian with Q=8.9, and assuming a typical Galactic source distance of 10 kpc, the
above formula relates the 50%-efficiency hrss = 1.4× 10
−21 Hz−1/2 to 10−7 solar mass
equivalent emission into a gravitational-wave burst from this hypothetical source and
under the given assumptions. For a source in the Virgo galaxy cluster, approximately
16 Mpc away, the same hrss would be produced by an energy emission of roughly
0.25M⊙c
2 in a burst with this highly favourable waveform.
We can draw more specific conclusions about detectability for models of
astrophysical sources which predict the absolute energy and waveform emitted. Here
we consider the core-collapse supernova simulations of Ott et al. [15] and a binary black
hole merger waveform calculated by the Goddard numerical relativity group [11] (as a
representative example of the similar merger waveforms obtained by several groups).
While the Monte Carlo sensitivity studies in section 7 did not include these particular
waveforms, we can relate the modeled waveforms to qualitatively similar waveforms
that were included in the Monte Carlo study and thus infer the approximate sensitivity
of the search pipeline for these astrophysical models.
Ott et al. simulated core collapse for three progenitor models and calculated
the resulting gravitational wave emission, which was dominated by oscillations of
the protoneutron star core driven by accretion [15]. Their s11WW model, based
on a non-spinning 11-M⊙ progenitor, produced a total gravitational-wave energy
emission of 1.6× 10−8M⊙c
2 with a characteristic frequency of ∼654 Hz and duration
of several hundred milliseconds. If this were a sine-Gaussian, it would have a Q
of several hundred; table 2 shows that our sensitivity does not depend strongly on
Q, so we might expect 50% efficiency for a signal at this frequency with hrss of
∼3.7 × 10−21 Hz−1/2. However, the signal is not monochromatic, and its increased
time-frequency volume may degrade the sensitivity by up to a factor of ∼2. Using
this EGW and hrss ≈ 7 × 10
−21 Hz−1/2 in equation 9, we find that our search
has an approximate “reach” (distance for which the signal would be detected with
50% efficiency by the analysis pipeline) of ∼0.2 kpc for this model. The m15b6
model, based on a spinning 15-M⊙ progenitor, yields a very similar waveform and
Search for gravitational-wave bursts in LIGO data 27
essentially the same reach. The s25WW model, based on a 25-M⊙ progenitor, was
found to emit vastly more energy in gravitational waves, 8.2 × 10−5M⊙c
2, but with
a higher characteristic frequency of ∼937 Hz. With respect to the Monte Carlo
results in section 7, we may consider this similar to a high-Q sine-Gaussian, yielding
hrss ≈ 5.5×10
−21 Hz−1/2, or to a white noise burst with a bandwidth of ∼100 Hz and
a duration of > 0.1 s, yielding hrss ≈ 8 × 10
−21 Hz−1/2. Using the latter, we deduce
an approximate reach of 8 kpc for this model.
A pair of merging black holes emits gravitational waves with very high efficiency;
for instance, numerical evolutions of equal-mass systems without spin have found the
radiated energy from the merger and subsequent ringdown to be 3.5% or more of the
total mass of the system [11]. From figure 8 of that paper, the frequency of the signal
at the moment of peak amplitude is seen to be
fpeak ≈
15 kHz
(Mf/M⊙)
, (10)
where Mf is the final mass of the system. Very roughly, we can consider the
merger+ringdown waveform to be similar to a sine-Gaussian with central frequency
fpeak and Q ≈ 2 for purposes of estimating the reach of this search pipeline for binary
black hole mergers. (Future analyses will include Monte Carlo efficiency studies using
complete inspiral-merger-ringdown waveforms.) Thus, a binary system of two 10-M⊙
black holes (i.e. Mf ≈ 20M⊙) has fpeak ≈ 750 Hz, and from table 2 we can estimate
the hrss sensitivity to be ∼5.5×10
−21 Hz−1/2. Using EGW = 0.035Mfc
2, we conclude
that the reach for such a system is roughly 1.4 Mpc. Similarly, a binary system with
Mf = 100M⊙ has fpeak ≈ 150 Hz, a sensitivity of ∼1.5×10
−21 Hz−1/2, and a resulting
reach of roughly 60 Mpc.
9. Discussion
The search reported in this paper represents the most sensitive search to date for
gravitational-wave bursts in terms of strain amplitude, reaching hrss values below
10−20 Hz−1/2, and covers a broad frequency range, 64–1600 Hz, with a live observation
time of 15.5 days.
Comparisons with previous LIGO [16, 17] and LIGO-TAMA [8] searches have
already been shown graphically in figure 12. The LIGO-TAMA search targeted
millisecond-duration signals with frequency content in the 700–2000 Hz frequency
regime (i.e., partially overlapping the present search) and had a detection efficiency
of at least 50% (90%) for signals with hrss greater than ∼ 2 × 10
−19 Hz−1/2
(10−18 Hz−1/2). Among other searches with broad-band interferometric detectors [33,
34, 35], the most recent one by the TAMA collaboration reported an upper limit of
0.49 events per day at the 90% confidence level based on an analysis of 8.1 days of the
TAMA300 instrument’s ninth data taking run (DT9) in 2003–04. The best sensitivity
of this TAMA search was achieved when looking for narrow-band signals at TAMA’s
best operating frequency, around 1300 Hz, and it was at hrss ≈ 10
−18 Hz−1/2 for 50%
detection efficiency [35]. Although we did not measure the sensitivity of the S4 LIGO
search with narrow-band signals at 1300 Hz, LIGO’s noise at that frequency range
varies slowly enough so that we do not expect it to be significantly worse than the
sensitivity for 1053 Hz sine-Gaussian signals described in section 7, which stands at
about 7× 10−21 Hz−1/2.
Search for gravitational-wave bursts in LIGO data 28
Comparisons with results from resonant mass detectors were detailed in our
previous publications [16, 17]. The upper limit of ∼ 4×10−3 events per day at the 95%
confidence level on the rate of gravitational wave bursts set by the IGEC consortium
of five resonant mass detectors still represents the most stringent rate limit for hrss
signal strengths of order 10−18 Hz−1/2 and above [36]. This upper limit quickly falls off
and becomes inapplicable to signals weaker than 10−19 Hz−1/2 (see figure 14 in [17].)
Furthermore, with the improvement in our search sensitivity, the signal strength of the
events corresponding to the slight excess seen by the EXPLORER and NAUTILUS
resonant mass detectors in their 2001 data [37] falls well above the 90% sensitivity of
our current S4 search: as described in [17], the optimal orientation signal strength of
these events assuming a Gaussian morphology with τ=0.1 ms corresponds to a hrss
of 1.9 × 10−19 Hz−1/2. For such Gaussians our S4 search all-sky 90% sensitivity is
2.5 × 10−20 Hz−1/2 (see Table 3) and when accounting for optimal orientation, this
improves by roughly a factor of 3, to 9.3×10−21 Hz−1/2. The rate of the EXPLORER
and NAUTILUS events was of order 200 events/year (or 0.55 events per day) [37, 38].
A steady flux of gravitational-wave bursts at this rate is excluded by our present
measurement at the 99.9% confidence level. Finally, in more recent running of the
EXPLORER and NAUTILUS detectors, an analysis of 149 days of data collected in
2003 set an upper limit of 0.02 events per day at the 95% confidence level and with a
hrss sensitivity of ∼ 2× 10
−19 Hz−1/2 [39].
The S5 science run, which began in November 2005 and is expected to continue
until late 2007, has a goal of collecting a full year of coincident LIGO science-mode
data. Searches for gravitational-wave bursts using S5 data are already underway and
will be capable of detecting any sufficiently strong signals which arrive during that
time, or else placing an upper limit on the rate of such signals on the order of a few
per year. Furthermore, the detector noise during the S5 run has reached the design
goals for the current LIGO interferometers, and so the amplitude sensitivity of S5
burst searches is expected to be roughly a factor of two better than the sensitivity of
this S4 search.
Another direction being pursued with the S5 data is to make appropriate use
of different detector network configurations. In addition to the approach used
in the S4 analysis reported here, which requires a signal to appear with excess
power in a time-frequency map in all three LIGO interferometers, data from two-
detector combinations is also being analyzed to maximize the total observation
time. Furthermore, using LIGO data together with simultaneous data from other
interferometers can significantly improve confidence in a signal candidate and allow
more properties of the signal to be deduced. The GEO 600 interferometer has joined
the S5 run for full-time observing in May 2006, and we look forward to the time
when VIRGO begins operating with sensitivity comparable to the similarly-sized
LIGO interferometers. Members of the LSC are currently implementing coherent
network analysis methods using maximum likelihood approaches for optimal detection
of arbitrary burst signal (see, for example, [40]) and for robust signal consistency
tests [41, 42]. Such methods will make the best use of the data collected from the
global network of detectors to search for gravitational-wave bursts.
Search for gravitational-wave bursts in LIGO data 29
Acknowledgments
The authors gratefully acknowledge the support of the United States National Science
Foundation for the construction and operation of the LIGO Laboratory and the
Science and Technology Facilities Council of the United Kingdom, the Max-Planck-
Society, and the State of Niedersachsen/Germany for support of the construction
and operation of the GEO600 detector. The authors also gratefully acknowledge the
support of the research by these agencies and by the Australian Research Council, the
Council of Scientific and Industrial Research of India, the Istituto Nazionale di Fisica
Nucleare of Italy, the Spanish Ministerio de Educación y Ciencia, the Conselleria
d’Economia, Hisenda i Innovació of the Govern de les Illes Balears, the Scottish
Funding Council, the Scottish Universities Physics Alliance, The National Aeronautics
and Space Administration, the Carnegie Trust, the Leverhulme Trust, the David
and Lucile Packard Foundation, the Research Corporation, and the Alfred P. Sloan
Foundation. This document has been assigned LIGO Laboratory document number
LIGO-P060016-C-Z.
References
[1] Sigg D (for the LSC) 2006 Class. Quantum Grav. 23 S51–6
[2] Lück H et al 2006 Class. Quantum Grav. 23 S71–8
[3] Acernese F et al 2006 Class. Quantum Grav. 23 S63–9
[4] Ando M and the TAMA Collaboration 2005 Class. Quantum Grav. 22 S881–9
[5] Fritschel P 2003 Gravitational-Wave Detection: Proc. SPIE vol 4856 ed Cruise M and Saulson
P (SPIE) p 282–91
[6] Acernese F et al 2006 Class. Quantum Grav. 23 S635–42
[7] Kuroda K et al 2003 Proc. 28th Int. Cosmic Ray Conf. ed Kajita T et al (Universal Academy
Press) p 3103
[8] Abbott B et al (LSC) and Akutsu T et al (TAMA Collaboration) 2006 Phys. Rev. D 73 102002
[9] Blanchet L, Damour T, Esposito-Farèse G and Iyer BR 2004 Phys. Rev. Lett. 93 091101
[10] Jaranowski P, Królak A and Schutz BF 1998 Phys. Rev. D 58 063001
[11] Baker J G, Centrella J, Choi D, Koppitz M and van Meter J 2006 Phys. Rev. D 73 104002
[12] Dimmelmeier H, Font J A and Müller E 2001 Astrophys. J. Lett. 560 L163–6
[13] Ott C D, Burrows A, Livne E and Walder R 2004 Astrophys. J. 600 834–64
[14] Burrows A, Livne E, Dessart L, Ott C D and Murphy J 2006 Astrophys. J. 640 878–90
[15] Ott C D, Burrows A, Dessart L and Livne E 2006 Phys. Rev. Lett. 96 201102
[16] Abbott B et al (LSC) 2004 Phys. Rev. D 69 102001
[17] Abbott B et al (LSC) 2005 Phys. Rev. D 72 062001
[18] Abbott B et al (LSC) 2006 Class. Quantum Grav. 23 S29–39
[19] Lazzarini A and Weiss R 1995 LIGO Science Requirements Document (SRD) LIGO technical
document LIGO-E950018-02-E
[20] Klimenko S and Mitselmakher G 2004 Class. Quantum Grav. 21 S1819–30
[21] The version of WaveBurst used for this analysis may be found at
http://ldas-sw.ligo.caltech.edu/cgi-bin/cvsweb.cgi/Analysis/WaveBurst/S4/?cvsroot=GDS
with the CVS tag “S4”
[22] Daubechies I 1992 Ten Lectures on Wavelets (Philadelphia: SIAM)
[23] Klimenko S, Yakushin I, Rakhmanov M and Mitselmakher G 2004 Class. Quantum Grav. 21
S1685–94
[24] Cadonati L and Márka S 2005 Class. Quantum Grav. 22 S1159–67
[25] Cadonati L 2004 Class. Quantum Grav. 21 S1695–703
[26] The version of CorrPower used for this analysis may be found at
http://www.lsc-group.phys.uwm.edu/cgi-bin/cvs/viewcvs.cgi/matapps/src/searches/burst/CorrPower/?cvsroot=lscsoft
with the CVS tag “CorrPower-080605”
[27] Chatterji S, Blackburn L, Martin G and Katsavounidis E 2004 Class. Quantum Grav. 21 S1809–
[28] The version of KleineWelle used for this analysis may be found at
http://ldas-sw.ligo.caltech.edu/-cgi-bin/-cvsweb.cgi/-Analysis/-WaveBurst/-S4/-?cvsroot=GDS
http://www.lsc-group.phys.uwm.edu/-cgi-bin/-cvs/-viewcvs.cgi/-matapps/-src/-searches/-burst/-CorrPower/-?cvsroot=lscsoft
Search for gravitational-wave bursts in LIGO data 30
http://ldas-sw.ligo.caltech.edu/cgi-bin/cvsweb.cgi/gds/Monitors/kleineWelle/?cvsroot=GDS
dated January 20, 2005
[29] Beauville F et al 2006 “A comparison of methods for gravitational wave burst searches from
LIGO and Virgo”, submitted to Phys. Rev. D, preprint gr-qc/0701026
[30] Dietz A, Garofoli J, González G, Landry M, O’Reilly B and Sung M 2006 LIGO technical
document LIGO-T050262-01-D
[31] Shapiro S L and Teukolsky S A 1983 Black Holes, White Dwarfs, and Neutron Stars (New York:
John Wiley & Sons)
[32] Riles K 2004 LIGO technical document LIGO-T040055-00-Z
[33] Nicholson D et al 1996 Phys. Lett. A 218 175
[34] Forward R L 1978 Phys. Rev. D 17 379
[35] Ando M et al (TAMA Collaboration) 2005 Phys. Rev. D 71 082002
[36] Astone P et al (International Gravitational Event Collaboration) 2003 Phys. Rev. D 68 022001
[37] Astone P et al 2002 Class. Quantum Grav. 19 5449
[38] Coccia E, Dubath F and Maggiore M 2004 Phys. Rev. D 70 084010
[39] Astone P et al 2006 Class. Quantum Grav. 23 S169–78
[40] Klimenko S, Mohanty S, Rakhmanov M and Mitselmakher G 2005 Phys. Rev. D 72 122002
[41] Wen L and Schutz B 2005 Class. Quantum Grav. 22 S1321–35
[42] Chatterji S, Lazzarini A, Stein L, Sutton P J, Searle A and Tinto M 2006 Phys. Rev. D 74
082005
http://ldas-sw.ligo.caltech.edu/-cgi-bin/-cvsweb.cgi/-gds/-Monitors/-kleineWelle/-?cvsroot=GDS
http://arxiv.org/abs/gr-qc/0701026
	Introduction
	Instruments and data collection
	Trigger generation
	Signal consistency tests
	H1/H2 amplitude consistency test
	Cross-correlation consistency tests
	Additional selection criteria for event candidates
	Data quality conditions
	Auxiliary-channel vetoes
	Gamma cut
	Search results
	Amplitude sensitivity of the search
	Astrophysical reach estimates
	Discussion
ABSTRACT
  The fourth science run of the LIGO and GEO 600 gravitational-wave detectors,
carried out in early 2005, collected data with significantly lower noise than
previous science runs. We report on a search for short-duration
gravitational-wave bursts with arbitrary waveform in the 64-1600 Hz frequency
range appearing in all three LIGO interferometers. Signal consistency tests,
data quality cuts, and auxiliary-channel vetoes are applied to reduce the rate
of spurious triggers. No gravitational-wave signals are detected in 15.5 days
of live observation time; we set a frequentist upper limit of 0.15 per day (at
90% confidence level) on the rate of bursts with large enough amplitudes to be
detected reliably. The amplitude sensitivity of the search, characterized using
Monte Carlo simulations, is several times better than that of previous
searches. We also provide rough estimates of the distances at which
representative supernova and binary black hole merger signals could be detected
with 50% efficiency by this analysis.

<|endoftext|><|startoftext|>
arXiv:0704.0944v1  [astro-ph]  7 Apr 2007
GLAST and Dark Matter Substructure in the Milky Way
Michael Kuhlen∗, Jürg Diemand†,∗∗ and Piero Madau†,‡
∗School of Natural Science, Institute for Advanced Study, Einstein Lane, Princeton, NJ 08540, USA
†Department of Astronomy and Astrophysics, UC Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
∗∗Hubble Fellow
‡Max-Planck-Institut für Astrophysik, Karl-Schwarzschild-Str. 1, 85740 Garching, Germany
Abstract. We discuss the possibility of GLAST detecting gamma-rays from the annihilation of neutralino dark matter in the
Galactic halo. We have used “Via Lactea”, currently the highest resolution simulation of cold dark matter substructure, to
quantify the contribution of subhalos to the annihilation signal. We present a simulated allsky map of the expected gamma-ray
counts from dark matter annihilation, assuming standard values of particle mass and cross section. In this case GLAST should
be able to detect the Galactic center and several individual subhalos.
Keywords: Gamma-rays, Dark Matter Structure, Dark Matter Annihilation
PACS: 95.55.Ka, 98.70.Rz, 95.35.+d
INTRODUCTION
One of the most exciting discoveries that the Gamma-ray Large Area Space Telescope (GLAST) could make, is the
detection of gamma-rays from the annihilation of dark matter (DM). Such a measurement would directly address one
of the major physics problems of our time: the nature of the DM particle.
Whether or not GLAST will actually detect a DM annihilation signal depends on both unknown particle physics
and unknown astrophysics theory. Particle physics uncertainties include the type of particle (axion, neutralino, Kaluza-
Klein particle, etc.), its mass, and its interaction cross section. From the astrophysical side it appears that DM is not
smoothly distributed throughout the Galaxy halo, but instead exhibits abundant clumpy substructure, in the form of
thousands of so-called subhalos. The observability of DM annihilation radiation originating in Galactic DM subhalos
depends on their abundance, distribution, and internal properties.
Numerical simulations have been used in the past to estimate the annihilation flux from DM substructure [1, 2, 3, 4],
but since the subhalo properties, especially their central density profile, which determines their annihilation luminosity,
are very sensitive to numerical resolution, it makes sense to re-examine their contribution with higher resolution
simulations.
DM ANNIHILATION IN SUBSTRUCTURE
Here we report on the substructure annihilation signal in “Via Lactea”, the currently highest resolution simulation of
an individual DM halo. Details about this simulation, including the properties of the host halo and its substructure
population, can be found in [4, 5]. To briefly summarize: The central halo is resolved with ∼ 200 million high
resolution DM particles, corresponding to a particle mass of Mp = 2× 104 M⊙. At z = 0 the host halo has a mass
of M200 = 1.8× 1012 M⊙, and it underwent its last major merger at z = 1.7. In total we resolve close to 10,000
subhalos, which make up 5.3% of the host halo mass. The subhalo mass function is well approximated by a powerlaw
dN/d lnM ∝ M−1 over three orders of magnitude down to the resolution limit of about 200 particles per subhalo
(∼ 4×106 M⊙). This power law slope corresponds to equal mass in substructure per decade, and it implies that the total
subhalo mass fraction has not yet converged. Future simulations with even lower particle masses will presumably find
an even larger subhalo mass fraction. A limitation of this present simulation is that it completely neglects the effects
of baryons. Gas cooling will likely increase the DM density in the central regions of the host halo through adiabatic
compression [6]. However, because of their shallower potential wells, the DM distribution in galactic subhalos is
unlikely to be significantly altered by baryonic effects.
http://arxiv.org/abs/0704.0944v1
FIGURE 1. Left panel: The annihilation signal of individual subhalos (crosses) in units of the total luminosity of the spherically
averaged host halo. The curves are the average (solid) and total (dotted) signal in a sliding window over one decade in mass. Right
panel: The angular size subtended by 2.0rs for a fiducial observer located 8 kpc from the halo center vs. the subhalo tidal mass. For
an NFW density profile ∼ 90% of the total luminosity originates within rs. The expected GLAST 68% angular resolution at > 10
GeV of 9 arcmin is denoted by the solid horizontal line.
We approximate the annihilation luminosity of an individual subhalo by
Ssub,i =
ρ2subdVi = ∑
jε{Pi}
ρ jmp, (1)
where ρ j is the density of the jth particle (estimated using a 32 nearest neighbor SPH kernel), and {Pi} is the set of all
particles belonging to halo i. In the left panel of Figure 1 we plot Ssub normalized by Shost, the total luminosity of the
spherically averaged host halo.
We find that the subhalo luminosity is proportional to its mass. Given our measured substructure abundance of
dN/d lnMsub ∝ M−1sub, this implies a total subhalo annihilation luminosity that is approximately constant per decade of
substructure mass, as the Figure shows (dotted line). We measure a total annihilation luminosity from the host halo that
is a factor of 2 higher than the spherically-averaged smooth signal, obtained by integrating the square of the binned
radial density profile. About half of this boost is due to resolved substructure, and we attribute the remaining half to
other deviations from spherical symmetry. Similar boost factors may apply to the luminosity of individual subhalos as
well (see next section).
The detectability of DM annihilation originating in subhalos depends not only on their luminosity, but also on
the angular size of the sources in the sky, which we can constrain by “observing” the subhalo population in our
simulation. For this purpose we have picked a fiducial observer position, located 8 kpc from the halo center along the
intermediate axis of the triaxial host halo mass distribution. In the right panel of Figure 1 we plot the angular size
∆θ of the subhalos for this observer position. For an NFW density profile with scale radius rs, about 90% of the total
annihilation luminosity originates within rs. We define ∆θ to be the angle subtended by rVmax/2.16, where rVmax is
the radius of the peak of the circular velocity curve Vc(r)
2 = GM(< r)/r, which is equal to 2.16rs for an NFW profile.
GLAST’s expected 68% containment angular resolution for photons above 10 GeV is 9 arcmin. We find that (553,
85, 20) of our subhalos have angular sizes greater than (9, 30, 60) arcmin. In the following section we consider the
brightness of these subhalos and discuss the possibility of actually detecting some of them with GLAST.
FIGURE 2. Simulated GLAST allsky map of neutralino DM annihilation in the Galactic halo, for a fiducial observer located 8
kpc from the halo center along the intermediate principle axis. We assumed Mχ = 46 GeV, 〈σv〉= 5×10−26 cm3 s−1, a pixel size
of 9 arcmin, and a 2 year exposure time. The flux from the subhalos has been boosted by a factor of 10 (see text for explanation).
Backgrounds and known astrophysical gamma-ray sources have not been included.
DM ANNIHILATION ALLSKY MAP
Using the DM distribution in our Via Lactea simulation, we have constructed allsky maps of the gamma-ray flux from
DM annihilation in our Galaxy. As an illustrative example we have elected to pick a specific set of DM particle physics
and realistic GLAST/LAT parameters. This allows us to present maps of expected photon counts.
The number of detected DM annihilation gamma-ray photons from a solid angle ∆Ω along a given line of sight (θ ,
φ ) over an integration time of τexp is given by
Nγ (θ ,φ) = ∆Ω τexp
Aeff(E)dE
ρ(l)2dl, (2)
where Mχ and 〈σv〉 are the DM particle mass and velocity-weighted cross section, Eth and Aeff(E) are the detector
threshold and energy-dependent effective area, and dNγ/dE is the annihilation spectrum.
We assume that the DM particle is a neutralino and have chosen standard values for the particle mass and annihilation
cross section: Mχ = 46 GeV and 〈σv〉= 5×10−26 cm3 s−1. These values are somewhat favorable, but well within the
range of theoretically and observationally allowed models. As a caveat we note that the allowed Mχ -〈σv〉 parameter
space is enormous (see e.g. [7]), and it is quite possible that the true values lie orders of magnitude away from the
chosen ones, or indeed that the DM particle is not a neutralino, or not even weakly interacting at all. We include only
the continuum emission due to the hadronization and decay of the annihilation products (bb̄ and uū only, for our low
Mχ ) and use the spectrum dNγ/dE given in [8].
For the detector parameters we chose an exposure time of τexp = 2 years and a pixel angular size of ∆θ = 9 arcmin,
corresponding to the 68% containment GLAST/LAT angular resolution. For the effective area we used the curve
published on the GLAST/LAT performance website [9] and adopted a threshold energy of Eth = 0.45 GeV (chosen to
maximize the significance, see below). The fiducial observer is located 8 kpc from the center along the intermediate
principle axis of the host halo’s ellipsoidal mass distribution.
Lastly, we applied a boost factor of 10 to all subhalo fluxes. The motivation for this boost factor is twofold: First, we
expect the central regions of our simulated subhalos to be artificially heated due to numerical relaxation, and hence less
dense and less luminous than in reality. Secondly, we expect the subhalo signal to be boosted by its own substructure.
We in fact observe sub-subhalos in the most massive of Via Lactea’s subhalos [4], and this sub-substructure, and
indeed sub-sub-substructure, etc., will lead to a boost in the annihilation luminosity analogous to the one for the whole
host halo, discussed in the previous section. An analytical model [10] for subhalo flux boost factors gives boosts from
a few up to ∼ 100, depending on the slope and lower mass cutoff of the subhalo mass function.
Figure 2 shows the resulting allsky map in a Mollweide projection. The coordinate system has been rotated such
that the major axis of the host halo ellipsoid is aligned with the horizontal direction, which would also correspond
to the plane of the Milky Way disk, if its angular momentum vector were aligned with the minor axis of the host
halo. The halo center (at l = 0◦, b = 0◦) is the brightest source of annihilation radiation, but the most massive subhalo
(at around l = +70◦, b = −10◦) is of comparable brightness. Additionally a large number of individual subhalos are
clearly visible, especially towards the halo center (−90◦ < l <+90◦, −60◦ < b <+60◦).
In order to quantify the detectability of individual subhalos (given our assumptions) we include diffuse Galactic and
extragalactic backgrounds, and convert our photon counts Nγ into significance S = Ns/
Nb, where Ns and Nb are the
source and background counts, respectively. For the extragalactic background we use the EGRET measurement [11]
and for the Galactic background we follow [12] and assume that it is proportional to the Galactic H I column density
[13]. Whereas the extragalactic component is uniform over the sky, the Galactic background is strongest towards the
center and in a band of b± 10◦ around the Galactic disk.
We consider all objects with S > 5 to be detectable by GLAST. With our choice of parameters the halo center
could be significantly detected, with S > 100. The number of subhalos with S > 5 depends strongly on the applied
boost factor. Without boosting the subhalo fluxes, only the most massive halo is detectable. Applying a boost factor
of 5 (10), we find that 29 (71) subhalos satisfy the S > 5 threshold for detectability. Note that subhalos below our
current resolution limit might also be detectable. Their greater abundance reduces the expected distance to the nearest
neighbor, and this may compensate for their lower intrinsic luminosities (see Koushiappas’ contribution in these
Proceedings).
In conclusion we find that with favorable particle physics parameters, GLAST may very well detect gamma-ray
photons originating from DM annihilations, either from the Galactic center or from individual subhalos. This would
be a sensational discovery of great importance, and it is worth including a search for a DM annihilation signal in the
data analysis.
ACKNOWLEDGMENTS
P.M. acknowledges support from NASA grants NAG5-11513 and NNG04GK85G, and from the Alexander von
Humboldt Foundation. J.D. acknowledges support from NASA through Hubble Fellowship grant HST-HF-01194.01.
The Via Lactea simulation was performed on NASA’s Project Columbia supercomputer system.
REFERENCES
1. Calcaneo-Roldan, C., & Moore, B. 2000, PhRvD, 62, 123005
2. Stoehr, F., White, S. D. M., Springel, V., Tormen, G., & Yoshida, N. 2003, MNRAS, 345, 1313
3. Diemand, J., Kuhlen, M., & Madau, P. 2006, ApJ, 649, 1
4. Diemand, J., Kuhlen, M., & Madau, P. 2007, ApJ, 657, 262
5. Diemand, J., Kuhlen, M., & Madau, P. 2007, submitted to ApJ, (astro-ph/0703337)
6. Blumenthal, G. R., Faber, S. M., Flores, R., & Primack, J. R. 1986, ApJ, 301, 27
7. Colafrancesco, S., Profumo, S., & Ullio, P. 2006, A&A, 455, 21
8. Bergström, L., Ullio, P., & Buckley, J. H. 1998, Astroparticle Physics, 9, 137
9. http://www-glast.slac.stanford.edu/software/IS/glast_lat_performance.htm
10. Strigari, L. E., Koushiappas, S. M., Bullock, J. S., & Kaplinghat, M. 2006, submitted to Phys. Rev. D (astro-ph/0611925)
11. Sreekumar, P., et al. 1998, ApJ, 494, 523
12. Baltz, E. A., Briot, C., Salati, P., Taillet, R., & Silk, J. 2000, Phys. Rev. D, 61, 023514
13. Dickey, J. M., & Lockman, F. J. 1990, ARAA, 28, 215
ABSTRACT
  We discuss the possibility of GLAST detecting gamma-rays from the
annihilation of neutralino dark matter in the Galactic halo. We have used "Via
Lactea", currently the highest resolution simulation of Galactic cold dark
matter substructure, to quantify the contribution of subhalos to the
annihilation signal. We present a simulated allsky map of the expected
gamma-ray counts from dark matter annihilation, assuming standard values of
particle mass and cross section. In this case GLAST should be able to detect
the Galactic center and several individual subhalos.

<|endoftext|><|startoftext|>
Introduction
We are interested in various models for random trees associated with processes of re-
cursive partitioning of a finite or infinite set, known as fragmentation processes [2, 4, 9].
We start by introducing a convenient formalism for the kind of combinatorial trees aris-
ing naturally in this context [16, 18]. Let #B be the number of elements in the finite
non-empty set B. Following standard terminology, a partition of B is a collection
πB = {B1, . . . ,Bk}
of non-empty disjoint subsets of B whose union is B. To introduce a new terminology
convenient for our purpose, we make the following recursive definition. A fragmentation
of B (sometimes called a hierarchy or a total partition) is a collection tB of non-empty
subsets of B such that
(i) B ∈ tB ;
(ii) if #B ≥ 2 then, there is a partition πB of B into k parts, B1, . . . ,Bk, called the
children of B, for some k ≥ 2, with
tB = {B} ∪ tB1 ∪ · · · ∪ tBk , (1)
This is an electronic reprint of the original article published by the ISI/BS in Bernoulli,
2008, Vol. 14, No. 4, 988–1002. This reprint differs from the original in pagination and
typographic detail.
1350-7265 c© 2008 ISI/BS
http://arxiv.org/abs/0704.0945v2
http://isi.cbs.nl/bernoulli/
http://dx.doi.org/10.3150/08-BEJ134
mailto:pmcc@galton.uchicago.edu
mailto:pitman@stat.berkeley.edu
mailto:winkel@stats.ox.ac.uk
http://isi.cbs.nl/BS/bshome.htm
http://isi.cbs.nl/bernoulli/
http://dx.doi.org/10.3150/08-BEJ134
Gibbs fragmentation trees 989
Figure 1. Two fragmentations of [9] graphically represented as trees labeled by subsets of [9].
where tBi is a fragmentation of Bi for each 1≤ i≤ k.
Necessarily, Bi ∈ tB , each child Bi of B with #Bi ≥ 2 has further children, and so on,
until the set B is broken down into singletons. We use the same notation tB both
• for such a collection of subsets of B, and
• for the tree whose vertices are these subsets of B and whose edges are defined by
the parent/child relation determined by the fragmentation.
To emphasize the tree structure, we may call tB a fragmentation tree. Thus, B is the root
of tB and each singleton subset of B is a leaf of tB (see Figure 1 – here [9] = {1, . . . ,9};
we also put [n] = {1, . . . , n}). We denote by TB the collection of all fragmentations of B.
A fragmentation tB ∈ TB is called binary if every A ∈ tB has either 0 or 2 children. We
denote by BB ⊆ TB the collection of binary fragmentations of B.
For each non-empty subset A of B, the restriction to A of tB , denoted tA,B , is the
fragmentation tree whose root is A, whose leaves are the singleton subsets of A and
whose tree structure is defined by restriction of tB . That is, tA,B is the fragmentation
{C ∩ A : C ∩ A 6= ∅,C ∈ tB} ∈ TA, corresponding to a reduced subtree, as discussed by
Aldous [1].
Given a rooted combinatorial tree with no single-child vertices and whose leaves are
labeled by a finite set B, there is a corresponding fragmentation tB , where each vertex
of the combinatorial tree is associated with the set of leaves in the subtree above that
vertex. So the fragmentations defined here provide a convenient way to label the vertices
of a combinatorial tree and to encode the tree structure in the labeling.
A random fragmentation model is an assignment, for each finite subset B of N, of a
probability distribution on TB for a random fragmentation TB of B. We assume through-
out this paper that the model is exchangeable, meaning that the distribution of TB is
invariant under the obvious action of permutations of B on fragmentations of B. The
distribution of ΠB , the partition of B generated by the branching of TB at its root, is
then of the form
P(ΠB = {B1, . . . ,Bk}) = p(#B1, . . . ,#Bk) (2)
for all partitions {B1, . . . ,Bk} with k ≥ 2 blocks and some symmetric function p of com-
positions of positive integers, called a splitting probability rule. The model is called
990 P. McCullagh, J. Pitman and M. Winkel
• consistent if for every A⊂B, the restricted tree TA,B is distributed like TA;
• Markovian if, given ΠB = {B1, . . . ,Bk}, the k restricted trees TB1,B, . . . , TBk,B are
independent and distributed as TB1 , . . . , TBk ;
• binary if TB is a binary tree with probability one, for every B.
Aldous [2] initiated the study of consistent Markovian binary trees as models for neutral
evolutionary trees. He observed parallels between these models and Kingman’s theory
of exchangeable random partitions of N, and posed the problem of characterizing these
models analogously to known characterizations of the Ewens sampling formula for random
partitions. In [9], we showed how consistent Markovian trees arise naturally in Bertoin’s
theory of homogeneous fragmentation processes [4] and deduced from Bertoin’s theory a
general integral representation for the splitting rule of a Markovian fragmentation model.
To briefly review these developments in the binary case, the distribution of a Markovian
binary fragmentation TB is determined by a splitting rule p, which is a symmetric function
p of pairs of positive integers (i, j), according to the following formula for the probability
of a given tree t ∈ BB :
P(TB = t) =
A∈t:#A≥2
p(#A1,#A2), (3)
where A1 and A2 denote the two children of A in the tree TB .
The following proposition collects some known results.
Proposition 1. (i) Every non-negative symmetric function p subject to normalization
conditions
p(k,n− k) = 1 for all n≥ 2
defines a Markovian binary fragmentation model.
(ii) A splitting rule p gives rise to a consistent Markovian binary fragmentation if
and only if
p(i, j) = p(i+1, j) + p(i, j + 1)+ p(i+ j,1)p(i, j) for all i, j ≥ 1. (4)
(iii) Every consistent splitting rule admits an integral representation
p(i, j) =
Z(i+ j)
(0,1)
xi(1− x)jν(dx) + c1{i=1 or j=1}
for all i, j ≥ 1, (5)
with characteristics c≥ 0 and ν a symmetric measure on (0,1) with
(0,1)
x(1−x)ν(dx)<
∞, and Z(n) a sequence of normalization constants.
Proof. (i) is elementary. For (ii), Ford [6], Proposition 41, gave a characterizaton of
consistency for models of unlabeled trees which is easily shown to be equivalent to the
Gibbs fragmentation trees 991
condition stated here. The interpretation (and sketch of proof) of this condition is that
for B = C ∪ {k} (with k /∈ C), the vertex C of TC splits into a particular partition of
sizes i and j if and only if TB splits into that partition with k added to one or the other
block, or if TB first splits into C and {k} and then C splits further into that partition
of sizes i and j. (iii) is directly read from [9]. �
Aldous [2] studied in some detail the beta-splitting model which arises as the particular
case of (5) with characteristics c= 0 and
ν(dx) = xβ(1− x)βdx for β ∈ (−2,∞) and ν(dx) = δ1/2(dx) for β =∞. (6)
Aldous posed the problem of characterizing this model among all consistent binary
Markov models. The main focus of this paper is the following result.
Theorem 2. Aldous’ beta-splitting models for β ∈ (−2,∞] are the only consistent
Markovian binary fragmentations with splitting rule of the form
p(i, j) =
w(i)w(j)
Z(i+ j)
for all i, j ≥ 1, (7)
for some sequence of weights w(j)≥ 0, j ≥ 1, and normalization constants Z(n), n≥ 2.
As a corollary, we extract a statement purely about measures on (0,1).
Corollary 3. Every symmetric measure ν on (0,1) with
(0,1)
x(1−x)ν(dx)<∞, whose
moments factorize into the form
(0,1)
xi(1− x)jν(dx) =w(i)w(j) for all i, j ≥ 1
for some w(i)≥ 0, i≥ 1, is a multiple of one of Aldous’ beta-splitting measures (6).
In particular, this characterizes the symmetric beta distributions among probability
measures on (0,1).
Berestycki and Pitman [3] encountered a different one-dimensional class of Gibbs split-
ting rules in the study of fragmentation processes related to the affine coalescent. These
are not consistent, but the Gibbs fragmentations are naturally embedded in continuous
time.
The rest of this paper is organized as follows. Section 2 offers an alternative char-
acterization of what we call binary Gibbs models, meaning models with splitting rule
of the form (7), without assuming consistency. Theorem 2 is then proved in Section 3.
In Section 4, we discuss growth procedures and embedding in continuous time for the
consistent case. Section 5 gives a generalization of the Gibbs results to multifurcating
trees.
992 P. McCullagh, J. Pitman and M. Winkel
2. Characterization of binary Gibbs fragmentations
The Gibbs model (7) is overparameterized: if we multiply w(k), k ≥ 1, by abk (and
then Z(m), m≥ 2, by a2bm), the model remains unchanged. Note, further, that neither
w(1) = 0 nor w(2) = 0 is possible since then (7) does not define a probability function for
n= i+ j = 3. Hence, we may assume w(1) = 1 and w(2) = 1. It is now easy to see that
for any two different such sequences, the models are different. Note that the following
result does not assume a consistent model.
Proposition 4. The following two conditions on a collection of random binary fragmen-
tations TB indexed by finite subsets B of N are equivalent:
(i) TB is for each B an exchangeable Markovian binary fragmentation with splitting
rule of the Gibbs form (7) for some sequence of weights w(j)> 0, j ≥ 1, and normaliza-
tion constants Z(n), n≥ 2;
(ii) for each B, the probability distribution of TB is of the form
P(TB = t) =
w(#B)
ψ(#A) for all t ∈ BB , (8)
for some sequence of weights ψ(j)> 0, j ≥ 1, and normalisation constants w(n), n≥ 1.
More precisely, if (i) holds with w(1) = 1, then (ii) holds for the same sequence w with
ψ(1) = 1 and ψ(k) =w(k)/Z(k), k ≥ 2. (9)
Conversely, if (ii) holds for some sequence ψ with ψ(1) = 1, then (i) holds for the sequence
w(n), n≥ 1, determined by (8); in particular, w(1) = 1.
Proof. Given a Gibbs model with w(1) = 1, we can combine (3) and (7) to get, for all
t ∈ BB ,
P(TB = t) =
A∈t:#A≥2
w(#A1)w(#A2)
Z(#A)
w(#B)
A∈t:#A≥2
w(#A)
Z(#A)
If we make the substitution (9), we can read off w(n) as the correct normalization constant
and (8) follows, with ψ(1) = 1.
On the other hand, (8) determines the sequence w(n), n≥ 1, as
w(n) =
t∈B[n]
ψ(#A).
Note, in particular, that w(1) = ψ(1). We can express the normalization constants in the
Gibbs model (7) by the formula
Z(m) =
w(k)w(m− k) (10)
Gibbs fragmentation trees 993
t1∈B[k]
ψ(#A)
t2∈B[m−k]
ψ(#A)
t∈B[m]
A∈t:A 6=[m]
ψ(#A) =w(m)/ψ(m),
as in (9). By application of the previous implication from (i) to (ii), formula (8) gives
the distribution of the Gibbs model derived from this weight sequence w(n) and the
conclusion follows. �
Note that the normalization constant Z(m) in the Gibbs splitting rule (7) model and
given in (10) is a partial Bell polynomial in w(1),w(2), . . . (see [15] for more applications of
Bell polynomials), whereas the normalization constant w(n) in the Gibbs tree formula (8)
is a polynomial in ψ(1), ψ(2), . . . of a much a more complicated form. The normalization
constant in (8) is
w(n) =
t∈B[n]
ψ(#A).
In an attempt to study this polynomial in ψ(1), ψ(2), . . . , we introduce the signature
σt : [n]→N of a tree t ∈ B[n] by
σt(j) =#{A ∈ t :#A= j}, j = 1, . . . , n.
Note that P(Tn = t) depends on t only via σt, that is, σt is a sufficient statistic for
the Gibbs probabilities (8). Denote the set of signatures by Sign = {σt : t ∈ B[n]}. The
inductive definition of B[n] yields
Sign = {σ
(1) + σ(2) + 1n :σ
(1) ∈ Sign1 , σ
(2) ∈ Sign2 , n1 + n2 = n},
where 1n(j) = 1 if j = n, 1n(j) = 0 otherwise. The coefficients Qσ in w(n), when expanded
as a polynomial in ψ(1), ψ(2), . . . , are numbers of fragmentations with the same signature
σ ∈ Sign:
w(n) =
σ∈Sign
σ, where ψσ =
ψ(j)σ(j).
Let us associate with each fragmentation t ∈ B[n] its tree shape (combinatorial tree
without labels) t◦ and denote by B◦n the collection of shapes of binary trees with n
leaves. Clearly, two fragmentations with the same tree shape have the same signature,
so we can define σ(t◦) in the obvious way. For n ≤ 8 (and many larger trees), direct
enumeration shows that the tree shape t◦ ∈ B◦n is uniquely determined by its signature
σ, and Qσ is just the number q(t
◦) of different labelings. For n≥ 9, this is false: there
are two tree shapes with signature (9,3,1,2,1,0,0,0,1); see Figure 2. If we denote by
994 P. McCullagh, J. Pitman and M. Winkel
I◦σ ⊆ B
n the set of tree shapes with signature σ, then Qσ =
t◦∈I◦
q(t◦). The remaining
combinatorial problem is therefore to study I◦σ and q(t
◦). We have not been able to solve
this problem. The preprint version [12] of the present paper includes an Appendix with
a partial study: see also Corollary 2.4.3 of [17].
3. Consistent binary Gibbs rules
The statement of Theorem 2 specifies Aldous’ [2] beta-splitting models by their integral
representation (5). Observe that the moment formula for beta distributions easily gives
p(i, j) =
Z(i+ j)
xi+β(1− x)j+β dx
Γ(i+ β + 1)Γ(j + β + 1)
R(i+ j)
for all i, j ≥ 1,
for normalization constants R(n) = Z(n)Γ(n+ 2β + 2), n ≥ 2. This is for β ∈ (−2,∞).
For β =∞, we simply get p(i, j) = 1/R(i+ j) for all i, j ≥ 1, where R(n) = Z(n)2n, n≥ 2.
Proof of Theorem 2. We start from a general Gibbs model (7) with w(1) = 1 and
follow [7], Section 2 closely, where a similar characterization is derived in a partition
rather than a tree context. Let the Gibbs model be consistent. This immediately implies
that w(j)> 0 for all j ≥ 1. The consistency criterion (4) in terms of Wj =w(j +1)/w(j)
now gives
Wi +Wj =
Z(i+ j + 1)−w(i+ j)
Z(i+ j)
for all i, j ≥ 1. (12)
The right-hand side is a function of i+ j, soWj+1−Wj is constant and henceWj = a+bj
for some b≥ 0 and a >−b. Now, either b= 0 (excluded for the time being) or
w(j) =W1 · · ·Wj−1 =
(a+ bq)
Figure 2. Two tree shapes with the same signature (here marked by subtree sizes).
Gibbs fragmentation trees 995
= bj−1
= bj−1
Γ(a/b+ j)
Γ(a/b+ 1)
and, hence, reparameterizing by β = a/b − 1 ∈ (−2,∞) and pushing bi+j−2 into the
normalization constant di+j = b
i+j−2/Z(i+ j), we have
p(i, j) =
w(i)w(j)
Z(i+ j)
= di+j
Γ(i+ 1+ β)
Γ(2 + β)
Γ(j + 1+ β)
Γ(2+ β)
The case b= 0 is the limiting case β =∞, when, clearly, w(j) ≡ 1 (now pushing ai+j−2
into the normalization constant).
These are precisely Aldous’ beta-splitting models, as in (11). �
While we identified the boundary case β =∞ as being of Gibbs type, the boundary
case β =−2 is not of Gibbs type, although it can still be made precise as a Markovian
fragmentation model with characteristics c > 0 and ν = 0 (pure erosion): p(i, j) = 0 unless
i= 1 or j = 1, so the Markovian fragmentations Tn are combs, where all n− 1 branching
vertices are lined up in a single spine.
In the proof of the theorem, we obtained as parameterization for the Gibbs models
w(j) =
Γ(j +1+ β)
Γ(2 + β)
, j ≥ 1, (13)
for some β ∈ (−2,∞), or w(j)≡ 1 for β =∞. Note that the simple convention w(2) = 1
from Section 2 is not useful here. We can now still deduce the parameterization (8) by
Proposition 4, in principle. However, since ψ(k) =w(k)/Z(k) involves partial Bell polyno-
mials Z(k) in w(1),w(2), . . . , this is less explicit in terms of β than the parameterization
ψ(2) = 2+ β, ψ(3) =
, ψ(4) =
(3 + β)(4 + β)
18 + 7β
, . . . .
Special cases that have been studied in various biology and computer science contexts
(see Aldous [2] for a review) include the following: β = −3/2,−1,0,∞. In these cases,
we can explicitly calculate the Gibbs parameters in (7) and (8) and the normalisation
constants.
If β = −3/2, we can take ψ(n) ≡ 1 and TB is uniformly distributed : if #B = n, then
P(TB = t) = 2
n−1(n − 1)!/(2n− 2)!, t ∈ BB . The asymptotics of uniform trees lead to
Aldous’ Brownian CRT [1]; see also [15], Section 6.3. Table 1 uses a different parameter-
ization via the convenient relations (9) and (13).
The case β = −1 is the limiting conditional distribution in the Ewens family as the
Ewens parameter λ→ 0, conditional on the occurrence of a split. The β = 0 case is
known as the Yule model and β = ∞ as the symmetric binary trie (see Aldous [2]).
Continuum tree limits of the beta-splitting model for β ∈ (−2,−1) are described in [9].
996 P. McCullagh, J. Pitman and M. Winkel
The normalization that leads to a compact limit tree is here T[n]/n
−β−1, where T[n] is
represented as a metric tree with unit edge lengths and the scaling T[n]/n
−β−1 refers
to scaling of edge lengths. Aldous [2] studies weaker asymptotic properties for average
distance from a leaf to the root, also for β ≥−1, where growth is logarithmic.
4. Growth rules and embedding in continuous time
In [9], we study the consistently growing sequence Tn, n≥ 1, where Tn := T[n] = T[n],[n+1]
is the restriction of Tn+1 to [n] for all n≥ 1, in a general context of consistent Marko-
vian multifurcating fragmentation models. The integral representation (5) stems from an
association with Bertoin’s theory of homogeneous fragmentation processes in continuous
time [4]. Let us here look at the binary case in general and Gibbs fragmentations in
particular.
Consider the distribution of Tn+1, given Tn. The tree Tn+1 has a vertex A ∪ {n+ 1}
with children {n + 1} and A ∈ Tn. We say that n + 1 has been attached below A. In
passing from Tn to Tn+1, leaf n+1 can be attached below any vertex A of Tn (including
[n] and all leaf nodes). Note that to construct Tn+1 from Tn, n+ 1 is also added as an
element to all vertices on the path from [n] to A. Vertex A ∈ Tn is special in that both
A and A∪ {n+ 1} are in Tn+1.
Fix a vertex A of t ∈ B[n] and consider the conditional probability, given Tn = t, of
n+ 1 being attached below A. This is the ratio of two probabilities of the form (3) in
which many common factors cancel so that only the probabilities along the path from
[n] to A remain. This yields the following result.
Proposition 5. Let t ∈ B[n] and A ∈ t. Denote by
[n] =A1 ⊃ · · · ⊃Ah =A
Table 1. Closed form expressions of the parameters for β =
−3/2,−1,0,∞
β −3/2 −1 0 ∞
(2n− 2)!
22n−2(n− 1)!
(n− 1)! n! 1
(2n− 2)!
22n−3(n− 1)!
(n− 1)!
(n− 1)n! 2n−1 − 1
2n−1 − 1
Gibbs fragmentation trees 997
the path from [n] to A. We refer to h≥ 1 as the height of A in t. The probability that
n+1 attaches below A is then
p(#Aj+1 + 1,#(Aj \Aj+1))
p(#Aj+1,#(Aj \Aj+1))
p(#Ah,1).
For the uniform model (Gibbs fragmentation with β =−3/2), this product is telescop-
ing, or we calculate directly from (8)
p(#Aj+1 +1,#(Aj \Aj+1))
p(#Aj+1,#(Aj \Aj+1))
p(#Ah,1) =
2n− 1
giving a simple sequential construction (see, e.g., [15], Exercise 7.4.11).
It was shown in [9] that consistent Markovian fragmentation models can be assigned
consistent independent exponential edge lengths, where the edge below vertex A is given
parameter λ#A, for a family (λm)m≥1 of rates, where λ1 = 0, λ2 is arbitrary and λm,
m≥ 3, is determined by λ2 and the splitting rule p, in that consistency requires
λn+1(1− p(n,1)) = λn for all n≥ 2. (14)
The interpretation is that the partition of [n+1] in Tn+1 (arriving at rate λn+1) splits [n]
only with probability 1− p(n,1) and this thinning must reduce the rate for the partition
of [n] in Tn to λn. This rate λn also applies in Tn+1 after a first split {[n],{n+ 1}}.
Using consistency, equation (14) also implies
λnp(i, j) = λn+1(p(i, j + 1)+ p(i+ 1, j)) for all i, j ≥ 1 with i+ j = n.
For the Gibbs fragmentation models, we obtain, using (14), (7), (12) and (13),
λn = λ2
1− p(j,1)
Z(j + 1)
Z(j + 1)−w(j)
= λ2Z(n)
W1 +Wj−1
= λ2Z(n)
w(j − 1)
w(2)w(j − 1) +w(j)
= λ2Z(n)
Γ(4 + 2β)
Γ(n+2+ 2β)
where we require β <∞ for the last step. Table 2 contains the rate sequences for β =
−3/2,−1,0,∞ in the case λ2 = 1.
Not only is (λn)n≥3 determined by p, but a converse of this also holds.
Proposition 6. Let (λn)n≥2 be a consistent rate sequence associated with a consistent
Markovian binary fragmentation model with splitting rule p, meaning that (14) holds.
Then, p is uniquely determined by (λn)n≥2.
998 P. McCullagh, J. Pitman and M. Winkel
Proof. It is evident from (14) that p(n,1) is determined for all n ≥ 2, and p(1,1) = 1.
Now, (4) for i= 1 determines p(i+1, j) for all j ≥ 2, and an induction in i completes the
proof. �
A more subtle question is to ask what sequences (λn)n≥2 arise as consistent rate
sequences. The above argument can be made more explicit to yield
p(k,n− k) =
(−1)k−j+1
λn−j , 1≤ k ≤ n/2,
which means that (λn)n≥2 must have a discrete complete monotonicity, in that kth
differences of (λn)n≥2 must be of alternating signs, k ≥ 1. This condition is not sufficient,
however, as simple examples for n= 3 show (λn = (n− 1)
α is completely monotone for
α ∈ (0,1), but exchangeability implies that 1/3 = p(1,2) = (λ3 −λ2)/λ3 and so λ3 = 3/2,
whereas (3− 1)α ∈ (1,2) – even in the multifurcating case, cf. Section 5, we always have
λ3 ≤ 3/2).
Proposition 7. A sequence (λn)n≥2 arises as rate sequence of a consistent Markovian
binary fragmentation model if and only if
λn = nc+
(0,1)
(1− xn − (1− x)n)ν(dx)
for some c≥ 0 and ν a symmetric measure on (0,1) with
(0,1)
x(1− x)ν(dx) <∞. The
characteristics of the splitting rules associated with (λn)n≥2 are (c, ν).
Proof. This is a consequence of the integral representation (5) and [9], Proposition 3.
Specifically, the association with Bertoin’s theory of homogeneous fragmentations yields
that each of 1, . . . , n suffer erosion (being turned into a singleton) at rate c; the measure
ν(dx) gives the rate of fragmentations into two parts, to which 1, . . . , n are allocated
independently with probabilities (x,1− x), hence splitting [n] with probability 1− xn −
(1− x)n. �
The complete monotonicity is related to the study of the block containing 1, a tagged
fragment ; see [4, 10]. Since λn is the rate at which one or more of {2, . . . , n} leave the
Table 2. Explicit rate sequences for β =−3/2,−1,0,∞
β −3/2 −1 0 ∞
22n−3
2n− 2
) n−1
3n− 3
2(1− 2−(n−1)).
Gibbs fragmentation trees 999
block containing 1, the rate is composed of three components – a rate c for the erosion
of 1, a rate (n− 1)c for the erosion of 2, . . . , n and a rate Λ(dz) of fragmentations into
two parts, to which 2, . . . , n are allocated independently with probabilities (e−z,1− e−z),
with 1 in the former part, hence splitting [n] with probability 1− e−(n−1)z . Therefore
λn = c+ (n− 1)c+
(0,∞)
(1− e−(n−1)z)Λ(dz) = cn+
(0,1)
1− ξn−1
µ(dξ) = Φ(n− 1)
for a Bernstein function Φ, a finite measure µ on (0,1) or a Lévy measure Λ on (0,∞)
(0,∞)
(1∧x)Λ(dx)<∞; (see [4, 8, 10]), that is, λn can be extended to a completely
monotone function of a real parameter.
5. Multifurcating Gibbs fragmentations and
Poisson–Dirichlet models
As a generalization of the binary framework of the previous sections, we consider in this
section consistent Markovian fragmentation models with splitting rule p as in (2) of the
Gibbs form
p(n1, . . . , nk) =
w(ni) (15)
for some w(j) ≥ 0, j ≥ 1, a(k) ≥ 0, k ≥ 2, and normalization constants c(n) > 0, n ≥ 2.
Note that we must have w(1)> 0 and a(2)> 0 to get positive probabilities for n= 2. To
remove overparameterization, we will assume w(1) = 1 and a(2) = 1. Also, if we multiply
w(j) by bj−1 and a(k) by bk (and c(n) by bn), the model remains unchanged. We will
use this observation to get a nice parameterization in the consistent case (Theorem 8
below).
In [9], we showed that consistency of the model is equivalent to the set of equations
p(n1, . . . , nk) = p(n1 + 1, n2, . . . , nk) + · · ·+ p(n1, . . . , nk + 1)+ p(n1, . . . , nk,1)
+ p(n1 + · · ·+ nk,1)p(n1, . . . , nk)
for all n1, . . . , nk ≥ 1, k ≥ 2. We also established an integral representation extending (5)
to the multifurcating case. The special case relevant for us is in terms of a measure ν on
S↓ = {s = (si)i≥1 : s1 ≥ s2 ≥ · · · ≥ 0, s1 + s2 + · · ·= 1} satisfying
(1− s1)ν(ds)<∞:
p(n1, . . . , nk) =
Z(n1 + · · ·+ nk)
i1,...,ik distinct
ν(ds). (17)
The general case has a further parameter c ≥ 0, as in (5), and also allows ν to charge
(si)i≥1 with s1 + s2 + · · ·< 1; see [9]. We will only meet the extreme case p(1, . . . ,1) = 1,
which corresponds to ν = δ(0,0,...).
1000 P. McCullagh, J. Pitman and M. Winkel
We set
a(k+ 1)
c(n+1)
w(n+ 1)
and, in analogy to Proposition 5, we find that, given Tn = t ∈ T[n], for each vertex B ∈ t,
the probability that n+ 1 attaches below B is
Wnj+1
a(2)w(nh)w(1)
c(nh +1)
where [n] ⊃ S1 ⊃ · · · ⊃ Sh = B is the path from [n] to B, nj =#Sj and kj denotes the
number of children of Sj , j = 1, . . . , h.
However, n+1 can also attach as a singleton block to an existing partition {B1, . . . ,Bk}
of B ∈ Tn. In this case, we say that n+ 1 attaches to the vertex B. For each non-leaf
vertex B ∈ t, the probability that n+ 1 attaches to the vertex B is
Wnj+1
Akhw(1)
In this framework, we have the following generalization of Theorem 2 to the multifurcat-
ing case.
Theorem 8. If p is of the Gibbs form (15) and consistent, then p is associated with the
two-parameter Ewens–Pitman family given by
w(n) =
Γ(n−α)
Γ(1− α)
, n≥ 1, and a(k) = αk−2
Γ(k+ θ/α)
Γ(2 + θ/α)
, k ≥ 2
(or limiting quantities α ↓ 0), c(n), n≥ 1, being normalization constants, for a parameter
range extended as follows:
• either 0≤ α < 1 and θ >−2α (multifurcating cases with arbitrarily high block num-
bers),
• or α < 0 and θ = −mα for some integer m ≥ 3 (multifurcating with at most m
blocks),
• or α< 1 and θ =−2α (binary case),
• or α = −∞ and θ = m for some integer m ≥ 2, that is, a(2) = 1, a(k) = (m −
2) · · · (m− k + 1), k ≥ 3, and w(j) ≡ 1 (recursive coupon collector, where a split of
[n] is obtained by letting each element of [n] pick one of m coupons at random, just
conditioned so that at least two different coupons are picked),
• or α= 1, that is, w(1) = 1, w(j) = 0, j ≥ 2 (deterministic split into singleton blocks).
In terms of the integral representation (17), the measure ν on S↓ is, respectively, size-
ordered Poisson–Dirichlet(α, θ), Dirichlet(−α, . . . ,−α), Beta(−α,−α), δ(1/m,...,1/m) and
δ(0,0,...).
Gibbs fragmentation trees 1001
Proof. For the Gibbs fragmentation model with w(1) = a(2) = 1 and w(j) > 0 for all
j ≥ 2 with notation as introduced, consistency (16) is easily seen to be equivalent to
Cn =Wn1 + · · ·+Wnk +Ak +
for all n1 + · · ·+ nk = n, (18)
where k ≤m if m= inf{i≥ 1 :a(i+ 1) = 0}<∞.
As in the proof of Theorem 2, we deduce from this (the special case k = 2) that either
Wj = a > 0 (excluded for the time being as b= 0) or
Wj = a+ bj ⇒ w(j) =W1 . . .Wj−1 = b
j−1Γ(j −α)
Γ(1− α)
for all j ≥ 1,
for some b > 0, a > −b and α := −a/b < 1. As noted above, we can reparameterize so
that we get b= 1 without loss of generality. In particular, Wj = j−α, j ≥ 1, and so (18)
reduces to
Cn = n− kα+Ak +
for all 2≤ k ≤m∧ n.
Similarly, we deduce that θ :=Ak−kα does not depend on k and so a(k) = θ
k−2 if α= 0,
and otherwise,
Ak = θ+ kα ⇒ a(k) =A2 . . .Ak−1 = α
k−2Γ(k+ θ/α)
Γ(2 + θ/α)
for all 2≤ k ≤m+ 1.
Note that this algebraic derivation leads to probabilities in (15) only in the following
cases.
• If 0 ≤ α < 1, then a(3) = A2 = θ + 2α > 0 if and only if θ > −2α, and then also
Ak = θ+ kα > 0 and a(k)> 0 for all k ≥ 3.
• If α< 0, then a(3) =A2 = θ+2α> 0 if and only if θ >−2α also, but then Ak = θ+kα
is strictly decreasing in k and Ak < 0 eventually, which impedes m=∞. If we have
m<∞, we achieve a(m+ 1) = 0 if and only if θ =−mα. The iteration only takes
us to a(m+ 1) = 0 and we specify a(k) = 0 for k >m also. We cannot specify a(k),
k > m + 1, differently, since every consistent Gibbs fragmentation with a(k) > 0
for k > m+ 1 has the property that T[k] = {[k],{1}, . . . ,{k}} has only one branch
point [k] of multiplicity k with positive probability, but then the restricted tree
T[m+1],[k] = {[m+ 1],{1}, . . . ,{m+ 1}} with positive probability, which contradicts
a(m+ 1) = 0.
• If a(3) = 0, that is, m= 2, the argument of the preceding bullet point shows that we
are in the binary case a(k) = 0 for all k ≥ 3 and we can conclude by Theorem 2.
• The case b= 0 is the limiting case α=−∞ with w(j)≡ 1. We take up the argument
to see that Ak = θ − k and so m<∞ and θ =m, where we then get a(2) = 1 and
a(k) = (m− 2) · · · (m− k+ 1), 3≤ k ≤m+ 1.
1002 P. McCullagh, J. Pitman and M. Winkel
Finally, if w(m) = 0 for some m ≥ 2, then consistency imposes w(j) = 0 for all j ≥m,
and it follows from the integral representation (17) that in fact w(j) = 0 for all j ≥ 2.
The identification of ν on the standard parameter range can be read from [15], Section
3.2. For the extension −α≥ θ≥−2α, we refer to [10]. �
Kerov [11] showed that the only exchangeable partitions of N of Gibbs type are of
the two-parameter family PD(α, θ) with usual range for parameters θ > −α, etc.; see
also [7, 14]. Theorem 8 is a generalization to splitting rules that allows an extended
parameter range for the same reason as in the binary case: the trivial partition of one
single block is excluded from p and when associating consistent exponential edge lengths
with parameters λm, m≥ 1, the first split of [m+1] happens at a higher and higher rate
and we may have λm →∞. In fact,
κ({π ∈ PN :π|[n] = {B1, . . . ,Bk}}) = λnp(#B1, . . . ,#Bk)
uniquely defines a σ-finite measure on PN \ {N}, the set of non-trivial partitions of N,
associated with a homogeneous fragmentation process. This is closely related to (17)
via Kingman’s paintbox representation κ =
κsν(ds). The extended range was first
observed by Miermont [13] in the special case θ = −1 (related to the stable trees of
Duquesne and Le Gall [5]).
We refer to [10] for a study of spinal partitions of Markovian fragmentation models.
There are notions of fine and coarse spinal partitions. First, remove from Tn the spine
of 1, that is, the path from [n] to {1}. The resulting collection is a disjoint union of
fragmentations of sets Bj , say, that form a partition of {2, . . . , n}, which is called the fine
spinal partition. Second, merge blocks (in the multifurcating case) that were children of
the same spinal vertex; the resulting partition is called the coarse spinal partition. It is
shown that for the splitting rules from the two-parameter family with parameters α and
θ (the Gibbs fragmentations), the fine partition is obtained from the coarse partition by
applying independently for each block of the coarse partition an exchangeable partition
from the two-parameter family of random partitions, with parameters α and α+ θ.
Acknowledgements
This research was supported in part by EPSRC Grant GR/T26368/01 and NSF Grants
DMS-04-05779 and DMS-03-05009. M. Winkel was also supported by the Institute of
Actuaries and the insurance group Aon Limited.
References
[1] Aldous, D. (1991). The continuum random tree. I. Ann. Probab. 19 1–28. MR1085326
[2] Aldous, D. (1996). Probability distributions on cladograms. In Random Discrete Struc-
tures (Minneapolis, MN, 1993). IMA Vol. Math. Appl. 76 1–18. New York: Springer.
MR1395604
http://www.ams.org/mathscinet-getitem?mr=1085326
http://www.ams.org/mathscinet-getitem?mr=1395604
Gibbs fragmentation trees 1003
[3] Berestycki, N. and Pitman, J. (2007). Gibbs distributions for random partitions generated
by a fragmentation process. J. Stat. Phys. 127 381–418. MR2314353
[4] Bertoin, J. (2001). Homogeneous fragmentation processes. Probab. Theory Related Fields
121 301–318. MR1867425
[5] Duquesne, T. and Le Gall, J.-F. (2002). Random trees, Lévy processes and spatial branching
processes. Astérisque 281 vi+147. MR1954148
[6] Ford, D.J. (2005). Probabilities on cladograms: Introduction to the alpha model. Preprint.
arXiv:math.PR/0511246.
[7] Gnedin, A. and Pitman, J. (2005). Exchangeable Gibbs partitions and Stirling triangles.
Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 325 (Teor. Predst.
Din. Sist. Komb. i Algoritm. Metody 12) 83–102, 244–245. MR2160320
[8] Gnedin, A. and Pitman, J. (2006). Moments of convex distribution functions and completely
alternating sequences. Preprint. arXiv:math.PR/0602091.
[9] Haas, B., Miermont, G., Pitman, J. and Winkel, M. (2006). Continuum tree asymp-
totics of discrete fragmentations and applications to phylogenetic models. Preprint.
arXiv:math.PR/0604350. Ann. Probab. To appear.
[10] Haas, B., Pitman, J. and Winkel, M. (2007). Spinal partitions and invariance under re-
rooting of continuum random trees. Preprint. arXiv:0705.3602. Ann. Probab. To ap-
pear.
[11] Kerov, S. (2005). Coherent random allocations, and the Ewens–Pitman formula. Zap.
Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 325 (Teor. Predst.
Din. Sist. Komb. i Algoritm. Metody 12) 127–145, 246. MR2160323
[12] McCullagh, P., Pitman, J. and Winkel, M. (2007). Gibbs fragmentation trees. Preprint.
arXiv:0704.0945.
[13] Miermont, G. (2003). Self-similar fragmentations derived from the stable tree. I. Splitting
at heights. Probab. Theory Related Fields 127 423–454. MR2018924
[14] Pitman, J. (2003). Poisson–Kingman partitions. In Statistics and Science: A Festschrift for
Terry Speed. IMS Lecture Notes Monogr. Ser. 40 1–34. Beachwood, OH: Inst. Math.
Statist. MR2004330
[15] Pitman, J. (2006). Combinatorial Stochastic Processes. Lecture Notes in Math. 1875. Lec-
tures from the 32nd Summer School on Probability Theory held in Saint-Flour, July
7–24, 2002. Berlin: Springer. MR2245368
[16] Schroeder, E. (1870). Vier combinatorische Probleme. Z. f. Math. Phys. 15 361–376.
[17] Semple, C. and Steel, M. (2003). Phylogenetics. Oxford Lecture Series in Mathematics and
Its Applications 24. Oxford Univ. Press. MR2060009
[18] Stanley, R.P. (1999). Enumerative Combinatorics. 2. Cambridge Studies in Advanced Math-
ematics 62. Cambridge Univ. Press. MR1676282
Received April 2007 and revised March 2008
http://www.ams.org/mathscinet-getitem?mr=2314353
http://www.ams.org/mathscinet-getitem?mr=1867425
http://www.ams.org/mathscinet-getitem?mr=1954148
http://arxiv.org/math.PR/0511246
http://www.ams.org/mathscinet-getitem?mr=2160320
http://arxiv.org/math.PR/0602091
http://arxiv.org/math.PR/0604350
http://arxiv.org/math.PR/0705.3602
http://www.ams.org/mathscinet-getitem?mr=2160323
http://arxiv.org/math.PR/0704.0945
http://www.ams.org/mathscinet-getitem?mr=2018924
http://www.ams.org/mathscinet-getitem?mr=2004330
http://www.ams.org/mathscinet-getitem?mr=2245368
http://www.ams.org/mathscinet-getitem?mr=2060009
http://www.ams.org/mathscinet-getitem?mr=1676282
	Introduction
	Characterization of binary Gibbs fragmentations
	Consistent binary Gibbs rules
	Growth rules and embedding in continuous time
	Multifurcating Gibbs fragmentations and Poisson–Dirichlet models
	Acknowledgements
	References
ABSTRACT
  We study fragmentation trees of Gibbs type. In the binary case, we identify
the most general Gibbs-type fragmentation tree with Aldous' beta-splitting
model, which has an extended parameter range $\beta>-2$ with respect to the
${\rm beta}(\beta+1,\beta+1)$ probability distributions on which it is based.
In the multifurcating case, we show that Gibbs fragmentation trees are
associated with the two-parameter Poisson--Dirichlet models for exchangeable
random partitions of $\mathbb {N}$, with an extended parameter range
$0\le\alpha\le1$, $\theta\ge-2\alpha$ and $\alpha<0$, $\theta =-m\alpha$, $m\in
\mathbb {N}$.

<|endoftext|><|startoftext|>
Submitted to the ApJ
Preprint typeset using LATEX style emulateapj v. 6/22/04
EFFICIENT SIMULATIONS OF EARLY STRUCTURE FORMATION AND REIONIZATION
Andrei Mesinger & Steven Furlanetto
Yale Center for Astronomy and Astrophysics, Yale University, New Haven, CT 06520
Submitted to the ApJ
ABSTRACT
Detailed theoretical studies of the high-redshift universe, and especially reionization, are generally
forced to rely on time-consuming N-body codes and/or approximate radiative transfer algorithms. We
present a method to construct semi-numerical “simulations”, which can efficiently generate realizations
of halo distributions and ionization maps at high redshifts. Our procedure combines an excursion-
set approach with first-order Lagrangian perturbation theory and operates directly on the linear
density and velocity fields. As such, the achievable dynamic range with our algorithm surpasses
the current practical limit of N-body codes by orders of magnitude. This is particularly significant in
studies of reionization, where the dynamic range is the principal limiting factor because ionized regions
reach scales of tens of comoving Mpc. We test our halo-finding and ionization-mapping algorithms
separately against N-body simulations with radiative transfer and obtain excellent agreement. We
compute the size distributions of ionized and neutral regions in our maps. We find even larger ionized
bubbles than do purely analytic models at the same volume-weighted mean hydrogen neutral fraction,
x̄HI, especially early in reionization. We also generate maps and power spectra of 21-cm brightness
temperature fluctuations, which for the first time include corrections due to gas bulk velocities. We
find that velocities widen the tails of the temperature distributions and increase small-scale power,
though these effects quickly diminish as reionization progresses. We also include some preliminary
results from a simulation run with the largest dynamic range to date: a 250 Mpc box that resolves
halos with massesM ≥ 2.2×108M⊙. We show that accurately modeling the late stages of reionization,
x̄HI ∼< 0.5, requires such large scales. The speed and dynamic range provided by our semi-numerical
approach will be extremely useful in the modeling of early structure formation and reionization.
Subject headings: cosmology: theory – early Universe – galaxies: formation – high-redshift – evolution
1. INTRODUCTION
Accurately modeling the formation of bound struc-
tures is invaluable for understanding any process in
the early universe. Reionization, the epoch when
radiation from early generations of astrophysical objects
managed to ionize the intergalactic medium (IGM), is
particularly sensitive to the distribution of collapsed
structure. Current observations paint a complex pic-
ture of the reionization epoch (Mesinger & Haiman
2004; Wyithe & Loeb 2004; Fan et al. 2006;
Mesinger & Haiman 2006; Malhotra & Rhoads 2004;
Furlanetto et al. 2006c; Malhotra & Rhoads 2006;
Page et al. 2006; Kashikawa et al. 2006; Totani et al.
2006). The next generation of instruments (James
Webb Space Telescope; 21-cm instruments such as the
Low Frequency Array and the Mileura Widefield Array
Low-Frequency Demonstrator; CMB polarization mea-
surements with Planck, etc.), could potentially shed light
on this poorly understood milestone. Unfortunately,
we still do not have accurate models of reionization
with which to interpret these upcoming (and current)
observations.
The main difficulty lies in the enormous dynamic range
required. Ionized regions are expected to reach charac-
teristic sizes of tens of comoving Mpc (Furlanetto et al.
2004c; Furlanetto & Oh 2005), which is over seven or-
ders of magnitude in mass larger than the pertinent cool-
ing mass, corresponding to gas with a temperature of
T ∼ 104 K (e.g. Efstathiou 1992; Thoul & Weinberg
1996; Gnedin 2000b; Shapiro et al. 1994). The required
dynamic range is even larger if smaller “minihalos” be-
low this cooling threshold are important during reion-
ization. Because of the steep mass dependence of halo
abundances, halos with masses close to the cooling mass
could dominate the photon budget. Hence modeling
reionization requires simulation box sizes of hundreds
of megaparsecs on a side, with extremely high resolu-
tion. Attempts to overcome these obstacles have gener-
ally followed the same fundamental and well-trod path
(e.g. Gnedin 2000a; Razoumov et al. 2002; Ciardi et al.
2003; Sokasian et al. 2003; Iliev et al. 2006b; Zahn et al.
2007; Trac & Cen 2006): (1) N-body codes are run to
generate halo distributions; (2) a simple prescription is
used to relate the halo mass to an ionizing efficiency; (3)
approximate methods (generally so-called ray-tracing al-
gorithms) are used to model radiative transfer (RT) on
large scales.
Even with modest halo resolution
(Springel & Hernquist 2003) of tens of dark matter
particles per halo, such schemes are computationally
limited to box sizes of tens of megaparcecs, if they
wish to resolve the likely cooling mass. McQuinn et al.
(2006a) extended the mass resolution of their sim-
ulations by using a merger tree scheme to populate
sub-grid scales with unresolved halos in a stochastic
manner. Such hybrid schemes are useful for extending
the dynamic range, but merger trees require a number
of corrections to achieve consistent mass functions (see,
e.g., Sheth & Pitman 1997; Benson et al. 2005, and Fig.
1 in McQuinn et al. 2006a) and to track individual halos
with redshift. Moreover, although they are perfectly
adequate for many purposes (including studying the
http://arxiv.org/abs/0704.0946v1
large-scale features of reionization), they prevent one
from taking full advantage of the simulation.
Aside from dynamic range, the other main limiting fac-
tor in all of the above numerical approaches is speed.
Even if the relevant scales can be resolved with N-body
codes, such as may be the case in the early phase of reion-
ization or with hybrid stochastic schemes. The codes
themselves generally take days to run on large super-
computing clusters, with the approximate RT algorithms
consuming a few additional days. The computational
cost of each simulation makes it difficult to explore the
full range of parameter space for reionization, which is
particularly large because we know so little about high-
redshift galaxies.
The computational cost becomes truly prohibitive if
hydrodynamics is included: the largest such simulation
of reionization performed to date spanned only 10h−1
Mpc (Sokasian et al. 2003). Including self-consistent de-
scriptions of galaxy formation – even at the approximate
level currently implemented in lower-redshift cosmolog-
ical simulations (e.g., Springel & Hernquist 2003) – re-
quires hydrodynamics, so N-body simulations of reion-
ization are limited to semi-analytic prescriptions for star
formation, feedback, etc. It is therefore worthwhile to
explore even simpler schemes.
The purpose of this paper is to introduce approximate
but efficient methods for generating halo distributions at
high redshifts as well as for generating the associated ion-
ization maps. We apply an excursion-set approach (e.g.
Bond et al. 1991; Lacey & Cole 1993) to the filtering of
a realization of the linear density field and then adjust
halo locations with first-order perturbation theory. We
can thus generate halo distributions at any given red-
shift, without explicitly including information from any
higher redshifts. This scheme is an updated form of
the “peak-patch” formalism developed and validated by
Bond & Myers (1996a,b), although it was conceived and
implemented completely independently. We then apply
a similar technique to obtain the ionization field from the
halo field. This part is similar to the schemes described
in Zahn et al. (2005, 2007), except applied to our effi-
ciently built halo distributions. As such, our methods
allow us to make general predictions about non-linear
processes, such as structure formation and reionization,
without making use of time-guzzling cosmological sim-
ulations. The speed of our approach also allows us to
explore a larger dynamic range than is possible with cur-
rent cosmological simulations while preserving detailed
spatial information (at least in a statistical sense), un-
like purely analytic models.
This paper is organized as follows. In § 2, we introduce
and test the components of our halo finding algorithm.
In § 3, we introduce and test our HII bubble finding
algorithm. In § 4, we use our semi-numerical scheme
to generate maps and power spectra of expected 21-cm
brightness temperature fluctuations throughout reioniza-
tion. In § 5, we summarize our key findings and present
our conclusions.
Unless stated otherwise, we quote all quantities in co-
moving units. We adopt the background cosmological
parameters (ΩΛ, ΩM, Ωb, n, σ8, H0) = (0.76, 0.24,
0.0407, 1, 0.76, 72 km s−1 Mpc−1), consistent with the
three–year results of the WMAP satellite (Spergel et al.
2006).
2. SEMI-NUMERICAL SIMULATIONS OF HALO
PROPERTIES
In brief, our algorithm generates a linear density field
and identifies halos within it. Because only linear evo-
lution is required, the algorithm is fast and flexible. We
generate 3D Monte-Carlo realizations of the linear den-
sity field on a box with sides of length L = 100 Mpc and
N = 12003 grid cells. As such, we are able to take advan-
tage of many pre-existing tools operating on the linear
density field alone. Our method consists of the following
principal steps:
1. creating the linear density and velocity fields
2. filtering halos from the linear density field using
the excursion-set formalism
3. adjusting halo locations using their linear-order
displacements
Step (1) only needs to be done once for each realization,
since it is independent of redshift. As mentioned above,
steps (2) and (3) need only be performed on redshifts of
interest, i.e. since our output at redshift z is independent
of any outputs at higher redshifts, there is no need for
our code to “run down” to z, as is the case for N-body
codes.
Our algorithm is an updated and simplified version of
the “peak-patch” algorithm of Bond & Myers (1996a);
we refer the interested reader there for more detailed
explanations of some steps. A simpler version has also
been used by Scannapieco et al. (2002) to study metal
enrichment at high redshifts.
We perform our semi-numerical simulations on a sin-
gle desktop Mac Pro with two dual-core 3.00 GHz Quad
Xeon processors and 16 GB of RAM. ForN = 12003, step
(1) takes ∼ 1 hour. For a given redshift in our range of
interest, specifically for z = 8.75, steps (2) and (3) take
∼ 2.5 hours. To achieve comparable halo mass resolution
(including halos with M ∼> 10
7M⊙) with a minimum of
∼ 500 particles per halo (Springel & Hernquist 2003), N-
body codes would require a prohibitively large number
of particles, N ∼ 1012! Below we describe in detail the
components of our model.
2.1. The Linear Density Field
Our linear density field is generated in much the same
way as it is for N-body codes. We briefly outline the
procedure here.
The density field of the universe, δ(x) ≡ ρ(x)/ρ̄ − 1,
in the linear regime1 is well-represented as a Gaussian
random field, whose statistical properties are fully de-
fined by its power spectrum, σ2(k) ≡ 〈|δ(k)|2〉. Here,
δ(k) is the Fourier transform of δ(x), and the standard
assumption of isotropy implies σ2(k) = σ2(k) while ho-
mogeneity implies that there are no density fluctuations
with wavelengths larger than the box size L = V 1/3. We
use the following, standard (e.g. Bagla & Padmanabhan
1997; Sirko 2005) Fourier transform conventions:
δ(k) =
δ(x)e−ik·x , (1)
1 In linear theory, density perturbations evolve in redshift as δ(z)
= δ(0)D(z), where D(z) is the linear growth factor normalized
so that D(0) = 1 (e.g., Liddle et al. 1996). Unless the redshift
dependence is noted explicitly, from this point forward we will work
with quantities linearly-extrapolated to z = 0.
with the inverse transform being
δ(x) =
δ(k)eik·x . (2)
The discrete simulation box only permits a finite set
of wavenumbers: k = ∆k(i, j, k), where ∆k = 2π/L and
i, j, k are integers in the range (-
N/2]. For each
independent wavenumber,2 we assign
δ(k) =
σ2(k)
(ak + ibk) , (3)
where ak and bk are drawn from a zero-mean Gaussian
distribution with unit variance. We use the power spec-
trum from Eisenstein & Hu (1999). Then the real-space
density field, δ(x), is obtained by performing an inverse
Fourier transform on δ(k).
2.2. The Linear Velocity Field
We construct a linear velocity field corresponding to
our linear density field using the standard Zel’Dovich
approximation (c.f. Zel’Dovich 1970; Efstathiou et al.
1985; Sirko 2005):
x1=x+ ψ(x) , (4)
v≡ ẋ1 = ψ̇(x) , (5)
δ(x)=−∇·[(1 + δ(x))ψ(x)]
≈−∇ · ψ(x) , (6)
where x and x1 denote initial (Lagrangian) and updated
(Eulerian) coordinates, respectively, ψ(x) is the displace-
ment vector, and the last equation follows from the conti-
nuity criterion, with the final approximation using linear-
ity, δ(x) ≪ 1. We note again that all units are comoving,
unless stated otherwise. From the above, one can relate
the velocity mode in our simulation at redshift z to the
linear density field:
v(k, z) =
Ḋ(z)δ(k) , (7)
where for computational convenience differentiation is
performed in k-space.
Another convenient property of this first-order
Zel’Dovich approximation is that the velocity field can
be decomposed into purely spatial, vx(x), and purely
temporal, vz(z), components:
v(x, z) = vz(z)vx(x) , (8)
where vz(z) = Ḋ(z) and vx(x) is the inverse Fourier
transform of ikδ(k)/k2. This is computationally conve-
nient, as we only need to compute the vx(x) field once
in order to be able to scale it for all redshifts, and it
also allows us to write a simple, exact expression for the
integrated linear displacement field, Ψ. When eq. (8) is
integrated from some large initial z0 [D(z0) ≪ D(z)], the
total displacement is just
Ψ(x, z)= [D(z)−D(z0)]vx(x)
≈D(z)vx(x) (9)
2 Since δ(x) is real-valued, only half of the k-modes defined
above are independent. The other half are determined by the usual
Hermitian constraints for real-valued functions (see for example
Hockney & Eastwood 1988; Bagla & Padmanabhan 1997).
We make use of this displacement field to adjust the halo
locations obtained by our filtering procedure (see § 2.4),
as well as to adjust the linear density field for our 21-cm
temperature maps (see § 4).
In principle, one could obtain non-linear velocities by
mapping the linear overdensity to a corresponding non-
linear overdensity obtained from a spherical collapse
model (Mo & White 1996), and then taking the time
derivative of the non-linear overdensity. However, due
to the large spread in the dynamical times of the non-
linear density field, accurately capturing the time evolu-
tion is non-trivial. Furthermore, although the non-linear
density field implicitly captures the velocities of collaps-
ing gas, mapping each pixel’s linear density to its non-
linear counterpart independently of other nearby pixels
does not properly preserve correlations on larger scales.
Hence, we choose to use the linear density field directly
in estimating velocities. For the purposes of studying the
ionization field, we are further justified in this procedure
because our final ionization maps are smoothed on large
scales, on which most pixels are still in the linear regime
at the high redshifts of interests. It is possible to include
higher-order contributions to the Zel’Dovich approxima-
tion where necessary (e.g., Scoccimarro & Sheth 2002).
2.3. Halo Filtering
In standard Press-Schechter theory (PS; see e.g.,
Press & Schechter 1974; Bond et al. 1991; Lacey & Cole
1993), the halo mass function can be written as
∂n(> M, z)
δc(z)
σ2(M)
∂σ(M)
c (z)
2σ2(M)
where n(> M, z) is the mean number density of halos
with total mass greater than M , ρ̄ = ΩMρcrit is the
mean background matter density, δc(z) ∼ 1.68/D(z) is
the scale-free critical over–density evaluated in the case
of spherically symmetric collapse (Peebles 1980), and
σ2(M) =
σ2(k)W 2(k,M) , (11)
is the squared r.m.s. fluctuation in the mass enclosed
within a region described by the filter function,W (k,M),
normalized to integrate to unity.
Although the PS mass function in eq. (10) is in fair
agreement with simulations, especially for halos near the
characteristic mass, at low redshifts it underestimates
the number of high–mass halos and overestimates the
number of low–mass halos when compared with large nu-
merical simulations (e.g. Jenkins et al. 2001). A mod-
ified expression shown to fit low-redshift simulation re-
sults more accurately (to within ∼ 10%) was obtained
by Sheth & Tormen (1999):
∂n(> M, z)
= − ρ̄
∂[ln σ(M)]
ν̂e−ν̂
where ν̂ ≡
aδc(z)/σ(M), and a, p, and A are fitting
parameters. Sheth et al. (2001) derive this form of the
mass function by including shear and ellipticity in model-
ing non–linear collapse, effectively changing the scale-free
critical over–density δc(z), into a function of filter scale,
δc(M, z) =
aδc(z)
1 + b
σ2(M)
aδ2c (z)
. (13)
Here b and c are additional fitting parameters (a is
the same as in eq. 12). For the constants above, we
adopt the recent values obtained by Jenkins et al. (2001),
who studied a large range in redshift and mass: a =
0.73, A = 0.353, p = 0.175, b = 0.34, c = 0.81. We
note, however, that the situation at high redshifts is
less clear: studies disagree on the relative accuracy of
the Press & Schechter (1974) and Jenkins et al. (2001)
forms (Reed et al. 2003; Iliev et al. 2006b; Zahn et al.
2007). Our algorithm can be trivially modified to ac-
commodate other choices for the mass function; fortu-
nately, for the purposes of the ionization maps (see §3),
the choice of mass function makes very little difference
because all have a similar dependence on the local den-
sity (Furlanetto et al. 2006a).
The mass functions in equations (10) and (12) can be
obtained by the standard excursion set random walk pro-
cedure. The approach is to smooth the density field
around a point, x, on successively smaller scales start-
ing with M → ∞ [where σ2(M) → 0] and to identify
the point as belonging to the halo with the largest M
such that δ(x,M) > δc(M, z). If W
2(k,M) is chosen to
have a sharp cut-off, this procedure amounts to a random
walk of δ(x,M) along the mass axis, since the change in
δ(x,M) as the scale is shrunk is independent of δ(x,M)
for a top-hat filter in k-space (see eq. 11).
We perform this procedure on our realization of the
linear density field by filtering the field using a real-
space top-hat filter3, starting on scales comparable to
the box size and going down to grid cell scales, in log-
arithmic steps of width ∆M/M = 1.2.4 At each filter
scale, we use the scale-dependent barrier in eq. (13)
to mark a collapsed halo if δ(x,M) > δc(M, z). Filter
scales large enough that collapsed structure is extremely
unlikely, δc(M, z) > 7σ(M), are skipped (Mesinger et al.
2005). Since this procedure treats each cell as the center
of a spherical filter, neighboring pixels are not properly
placed in the same halo. Because of this, we discount
halos which overlap with previously marked halos.
As mentioned above, this algorithm is similar
to the “peak-patch” approach first introduced by
Bond & Myers (1996a). The primary differences are:
3 There is a slight swindle in the current application of this for-
malism. The filter function is assumed to be a top-hat in k-space
in order to facilitate the analytic random walk approach described
above. However, when the power spectrum is normalized to ob-
servations [i.e. σ(R = 8h−1Mpc] = σ8), the filter that is used to
define the mass M corresponding to R is a top-hat filter in real
space. Nevertheless, it has been shown that the mass function is
not very sensitive to this filter choice (Bond et al. 1991).
4 We note that Mesinger et al. (2005) required a much smaller
step size at these redshifts, ∆M/M ∼ 0.1, in order to produce ac-
curate mass functions using 1D Monte-Carlo random walks. How-
ever, here we find that we can reproduce accurate mass functions
with a larger step size, since in our 3D realization of the density
field, “overstepping” δc(M, z) due to a large filter step size can be
compensated with a small offset in the filter center, i.e. by cen-
tering the filter in a neighboring cell. This is the case since over-
stepping δc(M, z) means that some dense matter between the two
filter scales was “missed”. In a 1D Monte-Carlo random walk this
matter is unrecoverable; however, in a 3D realization of the density
field, the missed matter will be picked up by a filter centered on a
neighboring cell.
(1) we use the Jenkins et al. (2001) barrier to identify
halos (rather than calculating the strain tensor to ac-
count for ellipsoidal collapse), (2) we do not separately
identify peaks in the density field (this step is not re-
quired given modern computing power), and (3) we use
the “full exclusion” criterion for preventing halo overlap.
Bond & Myers (1996b) found that a “binary exclusion”
method in which pairs of overlapping halos are compared
and eliminated was somewhat more accurate. However,
at the high redshifts of interest to us, halo overlap is rare,
and we are primarily interested in the large-scale prop-
erties of the halo field, which are relatively insensitive to
the details of the overlap criterion.
We also note that our halo finder is similar
in spirit to the PTHalos algorithm introduced by
Scoccimarro & Sheth (2002) to generate mock galaxy
surveys at low redshifts. There are two key differences.
First, at present we use only first-order perturbation the-
ory to displace the particles.5 This limits us to higher
redshifts, where velocities are smaller. However, our al-
gorithm does not require particles in order to resolve ha-
los and hence can accommodate a considerably larger
dynamic range than PTHalos.
Mass functions resulting from this procedure are shown
as points in Figure 1, with error bars indicating 1-σ Pois-
son uncertainties and bin widths spanning our mass filter
steps. Dotted red curves denote PS mass functions gen-
erated by eq. (10); short–dashed blue curves denote ex-
tended PS conditional mass functions generated by eq.
(10) but also taking into account the absence of den-
sity modes longer than the box size; long–dashed green
curves denote mass functions generated using the Sheth-
Tormen correction in eq. (12). The upper (lower) set
of curves and points correspond to redshifts of z=6.5
(z=10). The dotted and short–dashed curves overlap at
these redshifts due to our large box size (L = 100 Mpc),
so we are immune to the finite box effects pointed out by
Barkana & Loeb (2004).
Fig. 1 shows that we obtain accurate mass functions
for M ∼> 10
8M⊙. Our procedure seems to underpre-
dict the abundance of halos with masses approaching
the cell size, Mcell ∼ 107M⊙. However, as the Jeans
mass corresponding to a gas temperature of ∼ 104 K
is MJ(z ∼ 8) ∼ 108M⊙, in subsequent calculations, we
only use halos with masses greater than Mmin = 10
Using thisMmin, we match the collapse fraction obtained
by integrating eq. (12) to better than ∼ 10%.
This mass cutoff corresponds to the minimum tem-
perature required for efficient atomic hydrogen cooling
and would be the pertinent mass scale if: (1) the H2
cooling channel is suppressed, e.g. due to a perva-
sive Lyman-Werner (LW) background, and if (2) photo-
ionization feedback is ineffective at suppressing gas cool-
ing and collapse onto higher mass halos. While feed-
back at high redshifts remains poorly-constrained, both
of these assumptions seem reasonable during the mid-
dle stages of reionization on which we focus. A dis-
sociating LW background is likely to have established
5 We note that a similar scheme to ours has been independently
created by O. Zahn (private communication). This scheme uses
a simple Press-Schechter barrier but adjusts halo locations follow-
ing second-order Lagrangian perturbation theory. However, he has
found that the second-order corrections make very little difference
to the map.
Fig. 1.— Mass functions generated from our halo filtering pro-
cedure discussed in §2.3 are shown as points. Dotted red curves
denote PS mass functions generated by eq. (10); short–dashed
blue curves denote extended PS conditional mass functions gener-
ated by eq. (10) but also taking into account the absence of density
modes longer than the box size; long–dashed green curves denote
mass functions generated using the Sheth-Tormen correction in eq.
(12). The upper (lower) set of curves and points correspond to
redshifts of z=6.5 (z=10).
itself well before the universe is significantly ionized
(Haiman et al. 1997). Model-dependent empirical evi-
dence supporting the suppression of star formation in
smaller mass halos, M ∼< Mmin, can also be gleaned
from WMAP data (Haiman & Bryan 2006). Further-
more, although early work suggested that an ionizing
background could partially suppress star formation in
halos with virial temperatures of Tvir ∼< 3.6 × 10
(M ∼< 2×10
9M⊙) (Thoul & Weinberg 1996), more recent
studies (Kitayama & Ikeuchi 2000; Dijkstra et al. 2004)
find that at high redshifts (z ∼> 3), self-shielding and the
increased cooling efficiency could be strong countering
effects for halos with virial temperatures Tvir > 10
We postpone a more detailed analysis of the reionization
footprint left by photo-ionization feedback to a future
work.
2.4. Adjusting Halo Locations
Once the halo field is obtained, we use the displace-
ment field obtained through eq. (9) to adjust the halo
locations at each redshift. This corrects for the enhanced
halo bias in Eulerian space with respect to our filtering,
which is done in Lagrangian space (i.e. using the initial
locations at large z). For computational convenience, we
smooth the 12003 velocity field onto a coarser-grained
2003 grid before adjusting halo locations. The choice of
resolution, where each cell is (100 Mpc)/200 = 0.5 Mpc
on a side, is somewhat arbitrary here, and we have veri-
fied that our halo and 21-cm power spectra are unaffected
Fig. 2.— Halo power spectra at z = 8.7, with L = 20 h−1Mpc
and cosmological parameters taken from McQuinn et al. (2006a).
The solid red curve is the halo power spectrum from an N-body
simulation obtained from McQuinn et al. (2006a) (c.f. the bottom
panel of their Fig. 2). The short-dashed green and the long-dashed
violet curves are obtained from our filtering procedure with and
without the halo location adjustments, respectively.
by this choice. We also note that in linear theory, the
mean velocity dispersion inside a (0.5 Mpc)3 sphere with
mean density at z = 10 is a factor of ∼10 lower than the
r.m.s. bulk velocity of such regions, so smoothing over
smaller scale velocities appears reasonable. Furthermore,
we keep in mind that our “endproducts” in this work are
ionization and 21-cm temperature fluctuation maps, for
which such “low-resolution” is more than adequate (com-
pare, e.g., to N-body simulations of reionization, which
typically have similar cell sizes for the radiative transfer
component).
In Figure 2, we plot the halo power spectrum, de-
fined as ∆hh(k, z) = k
3/(2π2V ) 〈|δhh(k, z)|2〉k, where
δhh(x, z) ≡ Mcoll(x, z)/〈Mcoll(z)〉 − 1 is the collapsed
mass field.6 The solid red curve is the halo power spec-
trum from a 20 h−1 Mpc N-body simulation at z = 8.7
obtained from McQuinn et al. (2006a) (c.f. the bottom
panel of their Figure 2). The short-dashed green and the
long-dashed violet curves are obtained from our filtering
procedure (matching the assumed cosmology) with and
without the halo location adjustments, respectively. We
note that ignoring the cumulative motions of halos re-
sults in an underestimate of the power of long-wavelength
modes of the halo field by a factor of ∼ 2 in this case.
The average Eulerian bias of these halos is ∼ 2, about
half of which comes from the correction from Lagrangian
to Eulerian coordinates.
After the halo locations are adjusted according to lin-
ear theory, our halo power spectrum agrees almost per-
6 We use the collapsed mass field, rather than the individual
galaxies, because we calculate the power from the smoothed cells.
fectly with the simulation. By design our procedure in-
cludes Poisson fluctuations in the halo number counts,
which dominate the power spectrum at k ∼> 5 h/Mpc
and are lost in purely analytic estimates (McQuinn et al.
2006a). We also note that both the halo mass func-
tions and power spectra are statistical tests and hence
the agreement shown here does not imply that our halo
field has a one-to-one mapping with an N-body halo
field sourced by identical initial conditions. Indeed,
Gelb & Bertschinger (1994) showed that those particles
located nearest initial linear density peaks are not nec-
essarily incorporated into massive galaxies. The “peak
particle” algorithm is less robust than our smoothing
technique, but we still do not expect to recover halo
masses or locations precisely. We plan on doing a “one-
on-one” comparison between halo fields obtained from
our halo finder to those obtained from N-body codes
in a future work. However, it is certainly encouraging
that the very similar “peak-patch” group finding formal-
ism of Bond & Myers (1996a) did very well when com-
pared “one-on-one” to N-body codes at large mass scales
(Bond & Myers 1996b).
In Figure 3 we show slices through the halo field from
our simulation box at z = 8.25, generated by the above
procedure, again with (right panel) and without (left
panel) the halo location adjustments. In the figure, the
halo field is mapped to a lower resolution 4003 grid for
viewing purposes. Each slice is 100 Mpc on a side and
0.25 Mpc deep. Collapsed halos are shown in blue. Vi-
sually, it is obvious that peculiar motions increase halo
clustering.
3. GENERATING THE IONIZATION FIELD
Once the halo field is generated as described above,
we can perform a similar filtering procedure (also us-
ing the excursion-set formalism) to obtain the ionization
field (similar methods have been discussed by Zahn et al.
2005, 2007). The time required for this final step is a
function of x̄HI, with large x̄HI requiring less time than
small x̄HI. Specifically, at x̄HI ∼ 0.5 this step takes ∼ 15
minutes to generate a 2003 ionization box on our work-
station.
There are two main differences between the halo fil-
tering and the HII bubble filtering procedures: (1) HII
bubbles are allowed to overlap, and (2) the excursion
set barrier (the criterion for ionization) becomes, as per
Furlanetto et al. (2004a):
fcoll(x1,M, z) ≥ ζ−1 , (14)
where ζ is some efficiency parameter and fcoll(x1,M, z)
is the fraction of mass residing in collapsed halos inside
a sphere of mass M = 4/3πR3ρ̄[1 + 〈δnl(x1, z)〉R], with
mean physical overdensity 〈δnl(x1, z)〉R, centered on Eu-
lerian coordinate x1, at redshift z.
Equation (14) is only an approximate model and makes
several simplifying assumptions about reionization. In
particular, it assumes a constant ionizing efficiency per
halo and ignores spatially-dependent recombinations and
radiative feedback effects. It can easily be modified to
include these effects (e.g., Furlanetto et al. 2004b, 2006a;
Furlanetto & Oh 2005), and we plan to do so in future
work. Here we present the simplest case in order to best
match current RT numerical simulations.
This prescription models the ionization field as a two-
phase medium, containing fully-ionized regions (which
we refer to as HII bubbles) and fully-neutral regions.
This is obviously much less information than can be
gleaned from a full RT simulation, which precisely tracks
the ionized fraction. However, HII bubbles are typi-
cally highly-ionized during reionization, and for many
purposes (such as for 21 cm maps) this two-phase ap-
proximation is perfectly adequate.
In order to “find” the HII bubbles at each redshift we
smooth the halo field onto a 2003 grid. Then we filter
the halo field using a real-space top-hat filter, starting on
scales comparable to the box size and decreasing to grid
cell scales in logarithmic steps of width ∆M/M = 0.33.
At each filter scale, we use the criterion in eq. (14)
to check whether the region is ionized. If so, we flag
all pixels inside that region as ionized. We do this for
all pixels and scales, regardless of whether the resulting
bubble would overlap with other bubbles. Note, there-
fore, that the nominal ionizing efficiency ζ that we use
as an input parameter does not equal (1 − x̄HI)/fcoll.
They typically differ by . 30%, with ζfcoll < 1 − x̄HI
early in reionization and ζfcoll > 1− x̄HI late in reioniza-
tion). Unfortunately, we thus cannot use our algorithm
to self-consistently predict the time evolution of the ion-
ized fraction (rather, that must be prescribed from some
other model). Of course, the same is true for N-body
simulations, because the evolution of the ionized fraction
depends on the evolving ionization efficiency of galaxies
and cannot be self-consistently included in any present-
day simulation.
In order to obtain the density field used in eq. (14),
δnl(x1, z), we use the Zel’Dovich approximation on our
linear density field, δ(x), in much the same manner as
we did to adjust our halo field in § 2.4. Starting at some
arbitrarily large initial redshift (we use z0 = 50), we
discretize our high-resolution 12003 field into “particles”
whose mass equals that in each grid cell. We then use
the displacement field (eq. 9) to move the particles to
new locations at each redshift. This resulting mass field
is then smoothed onto our lower resolution 2003 box to
obtain δnl(x1, z). We then recalculate the velocity field
(§ 2.2) using the new densities.
Zahn et al. (2007) showed that a very similar HII bub-
ble filtering procedure performed on an N-body halo field
was able to reproduce the ionization topology obtained
through a ray-tracing RT algorithm fairly well. Their
algorithm differs from ours in two ways. First, they used
a slightly different barrier definition; however, this dif-
ference has only a small impact on the ionization topol-
ogy.7 More importantly, for each filter scale at each pixel,
Zahn et al. (2007) flag only the center pixel as ionized if
the barrier is crossed, whereas we flag the entire filtered
sphere.
In order to test our bubble filtering algorithm, we ex-
ecute it on the same N-body halo field at z = 6.89 as
was used to generate the bottom panels of Fig. 3 in
Zahn et al. (2007). We compare analogous ionization
maps created using various algorithms in Figure 4. All
7 Specifically, in order to match the physics of their simulations
better, they required
dt fcoll > ζ
−1. However, the density mod-
ulation ends up nearly identical to our model, so the topology is
almost unchanged.
Fig. 3.— Slices through the halo field from our simulation box at z = 8.25. The halo field is generated on a 12003 grid and then mapped
to a 4003 grid for viewing purposes. Each slice is 100 Mpc on a side and 0.25 Mpc deep. Collapsed halos are shown in blue. The left panel
shows the halo field directly filtered in Lagrangian space; the right panel maps the field to Eulerian space according to linear theory (see
§ 2.4 and eq. 9). The right panel corresponds to the bottom-left (x̄HI = 0.53) ionization field in Figure 5.
slices are 93.7 Mpc on a side and 0.37 Mpc deep, with ζ
adjusted so that the mean neutral fraction in the box is
x̄HI = 0.49. Ionized regions are shown as white. The left-
most and right-most panels are taken from Zahn et al.
(2007). The left-most panel was created by perform-
ing their bubble filtering procedure directly on the lin-
ear density field (without explicitly identifying halos).
The second panel was created by performing their bub-
ble filtering procedure on their N-body halo field, but
with the slightly different barrier definition in eq. (14).
The third panel was created by performing our bubble
filtering procedure on the same N-body halo field, but ig-
noring density fluctuations outside of halos (i.e. setting
〈δnl(x1, z)〉R = 0), which we have verified give nearly
identical bubble maps as our full procedure (so long as
x̄HI is fixed). The right-most panel was created using
an approximate RT algorithm (Abel & Wandelt 2002;
Sokasian et al. 2001, 2003) on the same halo field.
It is immediately obvious from Fig. 4 that all of the
approximate maps (first three panels) reproduce the RT
map (right-most panel) fairly well. Even the HII bub-
ble filtering performed directly on the linear density field
(left-most panel) performs well, which is encouraging, as
that is the starting point for our semi-numerical proce-
dure and we only improve on this scheme.
Figure 4 shows that our HII bubble filtering algorithm
is an excellent approximation to RT. The similar algo-
rithm proposed by Zahn et al. (2007) also performs well.
In comparison, our algorithm produces somewhat more
“bubbly” maps but appears to better capture the connec-
tivity of HII regions. Both are an obvious improvement
on directly filtering the linear density field.
Of course, in our full algorithm we identify halos from
the linear density field (rather than from simulations), so
our method consumes comparable processing time to the
one used to generate the leftmost panel in Figure 4, once
the halos have been identified. Moreover, we are able
to capture the “stochastic” component of the halo bias
that causes the relatively large differences between the
leftmost panel and the full RT simulation. That is, the
algorithm used to generate the leftmost panel uses the
large-scale linear density field to predict the distribution
of halos (Zahn et al. 2005, 2007). In reality, the rela-
tion is not deterministic because of random fluctuations
in the small-scale modes comprising each region. This
leads to nearly Poisson scatter in the halo number densi-
ties (Sheth & Lemson 1999; Casas-Miranda et al. 2002)
that can substantially modify the bubble size distribution
whenever sources are rare, particularly early in reion-
ization (Furlanetto et al. 2006a). By directly sampling
the small-scale modes to build the halo distribution, we
better recover this scatter (at least statistically, as illus-
trated by Fig. 2). Another way to include this scatter
is by directly sampling halos from an N-body simulation
(as in Zahn et al. 2007, or the second panel of Fig. 4),
although that obviously requires much more computing
power.
3.1. Ionization Maps
Now that we have demonstrated in turn the success of
our halo and bubble filtering procedures, we present the
resulting ionization maps when the two are combined.
In Figure 5, we show 100 Mpc × 100 Mpc × 0.5 Mpc
slices through our 2003 ionization field at z = 10, 9,
8.25, 7.25 (left to right across rows). With the assump-
tion of ζ = 15.1, these redshifts correspond to x̄HI = 0.89,
0.74, 0.53, 0.18, respectively. As has been pointed out by
Furlanetto et al. (2004c), the neutral fraction is the more
relevant descriptor; bubble morphologies at a constant
x̄HI vary little with redshift (see also McQuinn et al.
Fig. 4.— Slices from the ionization field at z = 6.89 created using different algorithms. All slices are 93.7 Mpc on a side and 0.37 Mpc
deep, with the mean neutral fraction in the box being x̄HI = 0.49. Ionized regions are shown as white. The left-most panel was created
by performing the bubble filtering procedure of Zahn et al. (2007) directly on the linear density field. The second panel was created by
performing their bubble filtering procedure on their N-body halo field, but with the slightly different barrier definition in eq. (14). The
third panel was created by performing our bubble filtering procedure described in § 3 on the same N-body halo field. The right-most panel
(from Zahn et al. 2007) was created using an approximate RT algorithm on the same halo field.
2006a). The bottom-left panel corresponds to the halo
field in the top-right panel of Fig. 3, generated on a
high-resolution 12003 grid.
To quantify the ionization topology resulting from our
method, we calculate the size distributions of both the
ionized and neutral regions. We randomly choose a pixel
of the desired phase (neutral or ionized), and record the
distance from that pixel to a phase transition along a
randomly chosen direction. We repeat this Monte Carlo
procedure 107 times. Volume-weighted probability dis-
tribution functions (PDFs) produced thusly are shown
by the solid curves in Figure 6 for ionized regions (top
panel) and neutral regions (bottom panel). Curves corre-
spond to (z, x̄HI) = (10, 0.89), (9.25, 0.79), (8.50, 0.61),
(8.00, 0.45), (7.50, 0.27), (7.00, 0.10), from left to right
in the top panel, respectively (or from right to left in
the bottom panel). All curves are normalized so that the
probability density integrates to unity.
It is useful to compare these distributions to the an-
alytic bubble mass function of Furlanetto et al. (2004c);
although this analytic approach is motivated by the same
excursion set barriers as our semi-numerical approach, it
does not account for the full geometry of sources. We
compute the probability distribution from the analytic
model by assuming purely spherical bubbles and convolv-
ing with the volume-weighted distance to the sphere’s
edge:
p(r) dr =
2πr2 dr
(1− x̄HI)
dRnb(R)
, (15)
where nb(R) is the comoving number density of bub-
bles with radii between R and R + dR (taken from
Furlanetto et al. 2004c).
Several points are evident from Figures 5 and
6. As expected (e.g., Furlanetto et al. 2004c, 2006a;
McQuinn et al. 2006a), there is a well-defined bubble
scale at each neutral fraction, despite some scatter in
the sizes. This scale also gets more pronounced (i.e. the
PDF peaks more) as reionization progresses; this is a
result of the changing shape of the underlying matter
power spectrum (Furlanetto et al. 2006a).
Also, the purely analytic estimates underpredict the
size distributions at all values of the neutral fraction,
though they do become increasingly accurate as the neu-
tral fraction decreases. This trend is perhaps counterin-
tuitive, as the analytic model, which rests on the assump-
tion of spherical bubbles, should perform best when the
bubbles are isolated, as one would expect at earlier times,
i.e. high neutral fractions. However, looking at the top-
left panel of Fig. 5, the typical bubbles filling most of
the ionized volume overlap due to the strong clustering
of early sources and bubbles. This results in many “over-
lapping pairs of spheres” at early times, resulting from
merging HII bubbles sourced by clustered sources. Thus
the spherical bubble-based analytic model underpredicts
the true size distribution, using our “mean free path” def-
inition of bubble sizes above. This effect was not noted
by previous studies (Zahn et al. 2007), because they used
a different definition of bubble sizes, based on spherical
filters used to flag regions in which x̄HI < 0.1. As time
progresses and the universe becomes more ionized, this
“overlapping pair of spheres” effect becomes less and less
dominant (see Fig. 5), and the analytic model becomes
increasingly more accurate.
Finally, the size distributions of neutral regions pre-
sented in the bottom panel of Fig. 6 are a new result and
potentially important for the 21-cm signal (which origi-
nates in neutral hydrogen, of course). In the later stages
of reionization, when the topology has transformed to
isolated neutral islands in a sea of ionized gas, this fig-
ure pinpoints the typical sizes of “mostly neutral” pixels
that continue to emit strongly. In contrast to the ionized
regions, the neutral regions (defined in this way) do not
grow substantially during reionization. From x̄HI = 0.89
to x̄HI = 0.1, the peak of the distribution shifts only by a
factor of ∼ 6, whereas the peak of the ionized region dis-
tribution shifts by a factor of ∼40 over the same range.
The reason for this is also evident in Figure 5: even when
the universe is mostly neutral, space is dotted with is-
lands of ionized gas, such that our “mean free path”–type
size distributions never become too large. The converse
does not hold true for ionized regions. However, a slight
parallel for ionized islands in a mostly neutral IGM, could
be found in Lyman limit systems (LLS) inside larger HII
regions (e.g. Barkana & Loeb 2002; Shapiro et al. 2004;
Miralda-Escudé et al. 2000), though it is not clear how
prevalent such neutral clumps are at high redshifts.
Throughout this paper, we have used a L = 100
Mpc “simulation” box. This size facilitates compar-
ison of our results with those from recent hybrid N-
Fig. 5.— Slices through the 2003 ionization field at z = 10, 9, 8.25, 7.25 (left to right across rows). With the assumption of ζ = 15.1,
these redshifts correspond to x̄HI = 0.89, 0.74, 0.53, 0.18, respectively. All slices are 100 Mpc on a side and 0.5 Mpc deep. The bottom-left
panel corresponds to the halo field in the top-right panel of Fig. 3, generated on a high-resolution 12003 grid.
body works (Zahn et al. 2007; McQuinn et al. 2006a;
Iliev et al. 2006a; Trac & Cen 2006). However, the speed
of our semi-numerical approach allows us to explore
larger cosmological scales while still consistently resolv-
ing the small halos that could dominate the photon bud-
get during reionization. As mentioned previously, exist-
ing N-body codes must resort to merger-tree methods
to populate their distribution of small-mass halos, even
for box sizes ∼< 100 Mpc (McQuinn et al. 2006a). In
this spirit, we present some preliminary results from a
N = 15003, L = 250 Mpc simulation, capable of directly
resolving halos with masses M ∼> 2.2× 10
8M⊙, with re-
sulting mass functions accurate to better than a factor
of two even at the smallest scale. This resolution pushes
the RAM limit of our machine and so each redshift can
take several hours to complete.8
In Figure 7, we compare size distributions of ionized
8 We note here that our halo-finding algorithm requires signifi-
cantly higher resolution than does predicting the ionization field
directly from the linear density field smoothed on larger scales
(Zahn et al. 2005, 2007). The latter method can be extended to
even larger boxes, though at the price of a somewhat less accurate
ionization map (compare the left and right panels in Fig. 4).
Fig. 6.— Size distributions (see definition in text) of ionized
(top panel) and neutral (bottom panel) regions. Curves correspond
to (z, x̄HI) = (10, 0.89), (9.25, 0.79), (8.50, 0.61), (8.00, 0.45),
(7.50, 0.27), (7.00, 0.10), from left to right in the top panel, re-
spectively (or from right to left in the bottom panel). Solid curves
are produced from our simulation while dotted curves correspond
to the analytic mass function. All curves are normalized so that
the probability distribution integrates to unity.
(top panel) and neutral (bottom panel) regions from our
two different simulation boxes. Curves correspond to (z,
x̄HI) = (9.00, 0.80), (8.00, 0.56), (7.00, 0.21), from left
to right in the top panel, respectively (or from right to
left in the bottom panel, respectively). Solid curves are
generated from our fiducial, N = 12003, L = 100 Mpc,
simulation while dashed curves are generated from our
larger simulation with N = 15003, L = 250 Mpc. The
cell size in all ionization maps is 0.5 Mpc on a side, with
the efficiency parameter, ζ, adjusted to obtain matching
values of x̄HI, and we set the minimum halo mass to
Mmin = 2.2× 108M⊙ even in the higher resolution runs
for easier comparison.
As reionization progresses, an increasing number of
large HII regions are “missed” by the L = 100 Mpc sim-
ulation. Interestingly, the analogous trend in the neutral
region size distributions (bottom panel) is weaker. This is
most likely because the “ionized island” effect limits the
size distributions of neutral regions as described above.
4. 21-CM TEMPERATURE FLUCTUATIONS
A natural application of our “simulation” technique is
to predict 21-cm brightness temperatures during reion-
ization. The offset of the 21-cm brightness temperature
from the CMB temperature, Tγ , along a line of sight
(LOS) at observed frequency ν, can be written as (e.g.
Furlanetto et al. 2006b):
δTb(ν)=
TS − Tγ
1 + z
(1− e−τν0 ) (16)
Fig. 7.— Size distributions of ionized (top panel) and neutral
(bottom panel) regions from different simulation boxes. Curves
correspond to (z, x̄HI) = (9.00, 0.80), (8.00, 0.56), (7.00, 0.21),
from left to right in the top panel, respectively (or from right to
left in the bottom panel). Solid curves are generated from our
fiducial, N = 12003, L = 100 Mpc, simulation while dashed curves
are generated from a larger simulation with N = 15003, L = 250
Mpc. The cell size in all ionization maps is 0.5 Mpc on a side,
with the efficiency parameter, ζ, adjusted to get matching values
of x̄HI and the minimum halo mass set to Mmin = 2.2 × 10
for comparison purposes.
≈ 9(1 + z)1/2xHI(1 + δnl)
dvr/dr +H
where TS is the gas spin temperature, τν0 is the opti-
cal depth at the 21-cm frequency ν0, δnl is the physical
overdensity (see discussion under eq. 14), H is the Hub-
ble parameter, dvr/dr is the comoving gradient of the
line of sight component of the comoving velocity, and all
quantities are evaluated at redshift z = ν0/ν − 1. The
final approximation makes the standard assumption that
TS ≫ Tγ for all redshifts of interest during reionization
(e.g. Furlanetto 2006) and also that dvr/dr ≪ H . We
verify in our simulation that dvr/dr < H for all neutral
pixels.
Maps of δTb(x, ν) generated in this manner are shown
in Figure 8. All slices are 100 Mpc on a side, 0.5 Mpc
deep, and correspond to (z, x̄HI) = (9.00, 0.74), (8.25,
0.53), (7.50, 0.27), from left to right. The top panels take
into account the velocity correction term in eq. (16),
while the bottom panels ignore it.
As seen in Fig. 8, velocities typically increase the con-
trast in temperature maps, making hot spots hotter and
cool spots cooler. We also see that temperature hot
spots, which correspond to dense pixels, tend to clus-
ter around the edges of HII bubbles, especially smaller
bubbles. This occurs because HII bubbles correlate with
peaks of the density field and long-wavelength biases in
the density field can extend beyond the edge of the ion-
ized region. This enhanced contrast might be useful in
Fig. 8.— Brightness temperature of 21-cm radiation relative to the CMB temperature. All slices are 100 Mpc on a side, 0.5 Mpc deep,
and correspond to (z, x̄HI) = (9.00, 0.74), (8.25, 0.53), (7.50, 0.27), left to right. Top panels include the velocity correction term in eq.
(16), while the bottom panels do not. For animated versions of these pictures, see http://pantheon.yale.edu/∼am834/Sim.
the detection of the boundaries of ionized regions with fu-
ture 21-cm experiments. As reionization progresses most
hot spots become swallowed up by HII bubbles, and the
effects of velocities diminish.
In Figure 9 we plot the dimensionless 21-
cm power spectrum, defined as ∆221(k, z) =
k3/(2π2V ) 〈|δ21(k, z)|2〉k, where δ21(x, z) ≡
δTb(x, z)/ ¯δTb(z) − 1. Solid blue curves take into
account gas velocities, while dashed red curves do not.
Curves correspond to (x̄HI, z) = (0.79, 9.25), (0.61,
8.50), (0.45, 8.00), (0.27, 7.50), (0.10, 7.00), bottom to
top. Error bars on the bottom dashed curve denote 1-σ
Poisson uncertainties; fractional errors in a given bin
are the same for all curves. As reionization progresses,
small-scale power is traded for large-scale power, and the
curves become flatter. Note that, with our dimensionless
definition of the power spectrum, curves with smaller
x̄HI have larger values of ∆
21(k, z). This is because the
mean brightness temperature offset drops quite rapidly
as reionization progresses, since ¯δTb(z) ∝ x̄HI, but the
scatter remains significant (see Fig. 6) and thus the
fractional perturbation, δ21(x, z), increases throughout
reionization.
Finally, in Figure 10 we plot dimensional power spec-
tra, ¯δTb(z)
2∆221(k, z). The curves correspond to (x̄HI, z)
= (0.80, 9.00), (0.56, 8.00), (0.21, 7.00), top to bottom at
large k, respectively. The dotted green curves are gen-
erated from a large, high-resolution “simulation”, with
Fig. 9.— Dimensionless 21-cm power spectra for (x̄HI, z) =
(0.79, 9.25), (0.61, 8.50), (0.45, 8.00), (0.27, 7.50), (0.10, 7.00),
bottom to top. Solid blue curves take into account gas velocities,
while dashed red curves do not.
http://pantheon.yale.edu/~am834/Sim
Fig. 10.— Dimensional 21-cm power spectra. The curves corre-
spond to (x̄HI, z) = (0.80, 9.00), (0.56, 8.00), (0.21, 7.00), top to
bottom at large k, respectively. The dotted green curves are gen-
erated from a large, high-resolution “simulation”, with N = 15003
and L = 250 Mpc, with no velocity contribution to the power spec-
tra. Solid blue curves and dashed red curves are generated with
our fiducial N = 12003 and L = 100 Mpc simulation, with and
without the velocity contribution, respectively.
N = 15003 and L = 250 Mpc, with no velocity contribu-
tion to the power spectra. Solid blue curves and dashed
red curves are generated with our fiducial N = 12003
and L = 100 Mpc simulation, with and without the ve-
locity contribution, respectively. The cell size in all δTb
maps is 0.5 Mpc on a side, with the efficiency parame-
ter, ζ, adjusted to achieve matching values of x̄HI and
the minimum halo mass set to Mmin = 2.2× 108M⊙ for
comparison purposes.
As seen in Figures 9 and 10, velocities make a mod-
est contribution to the 21 cm power spectrum, boosting
power on small scales early in reionization. Note that
the apparent slight decrease in power at small k when
velocities are included is well within the errors from av-
eraging over the few modes available to us on the largest
scales (e.g., see Poisson error bars on the bottom dashed
curve in Fig. 9). While the maximum δTb value in our
simulation box increases by a factor of a few when ve-
locities are included, most of the pixels are only slightly
affected. When the power spectrum is plotted in a di-
mensional version, ¯δTb(z)
2∆221(k, z), small-scale power is
boosted by ∼ 40% at (x̄HI, z) = (0.80, 9.00), with this en-
hancement monotonically decreasing as reionization pro-
gresses. Linear theory predicts that velocities enhance
the density power spectrum by a factor of 1.87 when
x̄HI = 1 (Kaiser 1987). In fact we do recover this en-
hancement for a fully neutral IGM; however, as predicted
by analytic models (McQuinn et al. 2006b), the ionized
bubbles rapidly remove most of this amplification.
Figure 10 also confirms the inferences drawn from Fig.
7, primarily that larger box sizes are needed to capture
the ionization topology at the end stages of reionization.
Comparing the dashed red to the dotted green curves in
Fig. 10, we note that our fiducial L = 100 Mpc simula-
tions are accurate for scales smaller than k ∼> 0.2 Mpc
(or λ . 30 Mpc). As reionization progresses, larger scales
lose power more rapidly than in the L = 250 Mpc simu-
lation. This is again evidence that very large scale simu-
lations are needed to model the middle and late stages of
reionization. Thus the speed and high resolution of our
semi-numeric approach will be extremely useful in future
modeling of reionization.
5. CONCLUSIONS
We introduce a method to construct semi-numeric sim-
ulations that can efficiently generate realizations of halo
distributions and ionization maps at high redshifts. Our
procedure combines an excursion-set approach with first-
order Lagrangian perturbation theory and operates di-
rectly on the linear density and velocity fields. As such,
our algorithm can exceed the dynamic range of exist-
ing N-body codes by orders of magnitude. As this is
the main limiting factor in simulating the ionized bubble
topology throughout reionization, when ionized regions
reach scales of tens of comoving Mpc, this will be partic-
ularly useful in such studies. Moreover, the efficiency of
the algorithm will allow us to explore the large parame-
ter space required by the many uncertainties associated
with high-redshift galaxy formation.
We find that our halo finding algorithm compares well
with N-body simulations on the statistical level, yield-
ing both accurate mass functions and power spectra. We
have not yet compared our halo distribution with sim-
ulations on a point-by-point basis, but we do not ex-
pect perfect agreement because of the vagaries of the ex-
cursion set approach. However, it is encouraging that
a very similar algorithm independently developed by
Bond & Myers (1996a) fares quite well in a comparison
of high-mass halos.
Our HII bubble finding algorithm captures the bubble
topology quite well, as compared to ionization maps from
ray-tracing RT algorithms at an identical x̄HI. Our al-
gorithm is similar to other codes, although we build the
ionization map from our excursion set halo field rather
than directly from the linear density field or from halos
found in an N-body simulation (Zahn et al. 2005, 2007).
Compared to codes built only from the linear density
field, we can better track the “stochastic” component of
the bias, though at the cost of somewhat more compu-
tation and a harder limit on resolution. On the other
hand, our scheme is much faster than using an N-body
code and offers superior dynamic range.
We create ionization maps using a simple efficiency pa-
rameter and compute the size distributions of ionized
and neutral regions. Our size distributions are gener-
ally shifted to larger scales when compared with purely
analytic models (Furlanetto et al. 2004c) at the same
mean neutral fraction. The discrepancy lies in the fact
that, at their core, the purely analytic models are based
on ensemble-averaged distributions of isolated spheres.
Hence they do not capture overlapping bubble shapes,
which are most important at large x̄HI (when the bubbles
are small and random fluctuations in the source densities,
as well as clustering, are most important).
In this paper, we have confined ourselves to a sim-
ple ionization criterion (essentially photon counting;
Furlanetto et al. 2004c). However, our algorithm can
easily accommodate more sophisticated prescriptions, so
long as they can be expressed either with the excursion
set formalism (Furlanetto & Oh 2005; Furlanetto et al.
2006a) or built from the halo field (in a similar way to
semi-analytic models of galaxy formation embedded in
numerical simulations).
We also use our procedure to generate maps and power
spectra of the 21-cm brightness temperature fluctuations
during reionization. We note that temperature hot spots
generally cluster around HII bubbles, especially in the
early phases of reionization. Because HII bubbles cor-
relate with peaks of the density field, long-wavelength
biases in the density field can extend beyond the edge of
the ionized region, with the resulting overdensities ap-
pearing as hot spots. This effect might be useful for
detecting the boundaries of ionized regions with future
21-cm experiments. We study the imprint of gas bulk ve-
locities on 21-cmmaps and power spectra, an effect which
was not included in previous studies. We find that ve-
locities do not have a major impact during reionization,
although they do increase the contrast in temperature
maps, making some hot spots hotter and some cool spots
cooler. Velocities also increase small-scale power, though
the effect decreases with decreasing x̄HI.
We also include some preliminary results from a sim-
ulation run with the largest dynamical range to date:
a 250 Mpc box which resolves halos with masses M ∼>
2.2 × 108M⊙. This simulation run confirms that ex-
tremely large scales are required to model the late stages
of reionization, x̄HI ∼< 0.5, when the typical scale of ion-
ized bubbles becomes several tens of Mpc.
The speed and dynamic range provided by our semi-
numeric approach will be extremely useful in the mod-
eling of early structure formation and reionization. Our
ionization maps can be efficiently folded into analyses of
current and upcoming high-redshift observations, espe-
cially 21-cm surveys.
We thank Greg Bryan for many helpful conversations
concerning the inner workings of cosmological simula-
tions and the generation of initial conditions. We also
thank Oliver Zahn for permitting the use of the halo field
from his simulation output as well as for several interest-
ing discussions. We thank Mathew McQuinn for provid-
ing the halo power spectra from his simulation as well
as for associated helpful comments. We thank Zoltan
Haiman, Greg Bryan, Oliver Zahn and Mathew McQuinn
for insightful comments on a draft version of this paper.
This research was supported by NSF-AST-0607470.
REFERENCES
Abel, T., & Wandelt, B. D. 2002, MNRAS, 330, L53
Bagla, J. S., & Padmanabhan, T. 1997, Pramana, 49, 161
Barkana, R., & Loeb, A. 2002, ApJ, 578, 1
—. 2004, ApJ, 609, 474
Benson, A. J., Kamionkowski, M., & Hassani, S. H. 2005, MNRAS,
357, 847
Bond, J. R., Cole, S., Efstathiou, G., & Kaiser, N. 1991, ApJ, 379,
Bond, J. R., & Myers, S. T. 1996a, ApJS, 103, 1
—. 1996b, ApJS, 103, 41
Casas-Miranda, R., Mo, H. J., Sheth, R. K., & Boerner, G. 2002,
MNRAS, 333, 730
Ciardi, B., Stoehr, F., & White, S. D. M. 2003, MNRAS, 343, 1101
Dijkstra, M., Haiman, Z., Rees, M. J., & Weinberg, D. H. 2004,
ApJ, 601, 666
Efstathiou, G. 1992, MNRAS, 256, 43P
Efstathiou, G., Davis, M., White, S. D. M., & Frenk, C. S. 1985,
ApJS, 57, 241
Eisenstein, D. J., & Hu, W. 1999, ApJ, 511, 5
Fan, X. et al. 2006, AJ, 132, 117
Furlanetto, S. R. 2006, MNRAS, 371, 867
Furlanetto, S. R., Hernquist, L., & Zaldarriaga, M. 2004a, MNRAS,
354, 695
Furlanetto, S. R., McQuinn, M., & Hernquist, L. 2006a, MNRAS,
365, 115
Furlanetto, S. R., & Oh, S. P. 2005, MNRAS, 363, 1031
Furlanetto, S. R., Oh, S. P., & Briggs, F. H. 2006b, Phys. Rep.,
433, 181
Furlanetto, S. R., Zaldarriaga, M., & Hernquist, L. 2004b, ApJ,
613, 16
—. 2004c, ApJ, 613, 1
—. 2006c, MNRAS, 365, 1012
Gelb, J. M., & Bertschinger, E. 1994, ApJ, 436, 491
Gnedin, N. Y. 2000a, ApJ, 535, 530
—. 2000b, ApJ, 542, 535
Haiman, Z., & Bryan, G. L. 2006, ApJ, 650, 7
Haiman, Z., Rees, M. J., & Loeb, A. 1997, ApJ, 484, 985
Hockney, R. W., & Eastwood, J. W. 1988, Computer simulation
using particles (Bristol: Hilger, 1988)
Iliev, I. T. et al. 2006a, MNRAS, submitted, preprint astro-
ph/0603199
Iliev, I. T., Mellema, G., Pen, U.-L., Merz, H., Shapiro, P. R., &
Alvarez, M. A. 2006b, MNRAS, 369, 1625
Jenkins, A., Frenk, C. S., White, S. D. M., Colberg, J. M., Cole,
S., Evrard, A. E., Couchman, H. M. P., & Yoshida, N. 2001,
MNRAS, 321, 372
Kaiser, N. 1987, MNRAS, 227, 1
Kashikawa, N. et al. 2006, ApJ, 648, 7
Kitayama, T., & Ikeuchi, S. 2000, ApJ, 529, 615
Lacey, C., & Cole, S. 1993, MNRAS, 262, 627
Liddle, A. R., Lyth, D. H., Viana, P. T. P., & White, M. 1996,
MNRAS, 282, 281
Malhotra, S., & Rhoads, J. E. 2004, ApJ, 617, L5
—. 2006, ApJ, 647, L95
McQuinn, M., Lidz, A., Zahn, O., Dutta, S., Hernquist, L., &
Zaldarriaga, M. 2006a, ArXiv Astrophysics e-prints
McQuinn, M., Zahn, O., Zaldarriaga, M., Hernquist, L., &
Furlanetto, S. R. 2006b, ApJ, 653, 815
Mesinger, A., & Haiman, Z. 2004, ApJ, 611, L69
—. 2006, ArXiv Astrophysics e-prints astro-ph/0610258
Mesinger, A., Perna, R., & Haiman, Z. 2005, ApJ, 623, 1
Miralda-Escudé, J., Haehnelt, M., & Rees, M. J. 2000, ApJ, 530, 1
Mo, H. J., & White, S. D. M. 1996, MNRAS, 282, 347
Page, L. et al. 2006, ArXiv Astrophysics e-prints astro-ph/0603450
Peebles, P. J. E. 1980, The large-scale structure of the
universe (Research supported by the National Science
Foundation. Princeton, N.J., Princeton University Press,
1980. 435 p.)
Press, W. H., & Schechter, P. 1974, ApJ, 187, 425
Razoumov, A. O., Norman, M. L., Abel, T., & Scott, D. 2002, ApJ,
572, 695
Reed, D., Gardner, J., Quinn, T., Stadel, J., Fardal, M., Lake, G.,
& Governato, F. 2003, MNRAS, 346, 565
Scannapieco, E., Ferrara, A., & Madau, P. 2002, ApJ, 574, 590
Scoccimarro, R., & Sheth, R. K. 2002, MNRAS, 329, 629
Shapiro, P. R., Giroux, M. L., & Babul, A. 1994, ApJ, 427, 25
Shapiro, P. R., Iliev, I. T., & Raga, A. C. 2004, MNRAS, 348, 753
Sheth, R. K., & Lemson, G. 1999, MNRAS, 304, 767
Sheth, R. K., Mo, H. J., & Tormen, G. 2001, MNRAS, 323, 1
Sheth, R. K., & Pitman, J. 1997, MNRAS, 289, 66
Sheth, R. K., & Tormen, G. 1999, MNRAS, 308, 119
Sirko, E. 2005, ApJ, 634, 728
Sokasian, A., Abel, T., Hernquist, L., & Springel, V. 2003, MNRAS,
344, 607
Sokasian, A., Abel, T., & Hernquist, L. E. 2001, New Astronomy,
6, 359
Spergel, D. N. et al. 2006, ApJ, submitted, preprint astro-
ph/0603449
Springel, V., & Hernquist, L. 2003, MNRAS, 339, 312
Thoul, A. A., & Weinberg, D. H. 1996, ApJ, 465, 608
Totani, T., Kawai, N., Kosugi, G., Aoki, K., Yamada, T., Iye, M.,
Ohta, K., & Hattori, T. 2006, PASJ, 58, 485
Trac, H., & Cen, R. 2006, ArXiv Astrophysics e-prints astro-
ph/0612406
Wyithe, J. S. B., & Loeb, A. 2004, Nature, 427, 815
Zahn, O., Lidz, A., McQuinn, M., Dutta, S., Hernquist, L.,
Zaldarriaga, M., & Furlanetto, S. R. 2007, ApJ, 654, 12
Zahn, O., Zaldarriaga, M., Hernquist, L., & McQuinn, M. 2005,
ApJ, 630, 657
Zel’Dovich, Y. B. 1970, A&A, 5, 84
ABSTRACT
  We present a method to construct semi-numerical ``simulations'', which can
efficiently generate realizations of halo distributions and ionization maps at
high redshifts. Our procedure combines an excursion-set approach with
first-order Lagrangian perturbation theory and operates directly on the linear
density and velocity fields. As such, the achievable dynamic range with our
algorithm surpasses the current practical limit of N-body codes by orders of
magnitude. This is particularly significant in studies of reionization, where
the dynamic range is the principal limiting factor. We test our halo-finding
and HII bubble-finding algorithms independently against N-body simulations with
radiative transfer and obtain excellent agreement. We compute the size
distributions of ionized and neutral regions in our maps. We find even larger
ionized bubbles than do purely analytic models at the same volume-weighted mean
hydrogen neutral fraction. We also generate maps and power spectra of 21-cm
brightness temperature fluctuations, which for the first time include
corrections due to gas bulk velocities. We find that velocities widen the tails
of the temperature distributions and increase small-scale power, though these
effects quickly diminish as reionization progresses. We also include some
preliminary results from a simulation run with the largest dynamic range to
date: a 250 Mpc box that resolves halos with masses M >~ 2.2 x10^8 M_sun. We
show that accurately modeling the late stages of reionization requires such
large scales. The speed and dynamic range provided by our semi-numerical
approach will be extremely useful in the modeling of early structure formation
and reionization.

<|endoftext|><|startoftext|>
Introduction
Active Galactic Nuclei (AGNs) are believed to be powered
by gas accretion. This gas is supplied from interstellar mat-
ter in host galaxies, and the gas may form rotationally-
supported structures around the central supermassive black
hole. If they are viewed close to edge-on, they may ob-
scure the central activity from direct view. AGNs can be
categorized as type 1 if seen face-on, and type 2 if seen
edge-on; this explanation is known as a unified model (e.g.
Antonucci & Miller 1985). Indeed, a few hundred pc res-
olution molecular gas imaging toward the central regions
of the Seyfert 2 galaxies NGC 1068 (Planesas et al. 1991;
Jackson et al. 1993) and M51 (Kohno et al. 1996) show
strong peaks at the nuclei with velocity gradients perpen-
dicular to radio jets, which suggest the existence of edge-
on circumnuclear rotating disks. Recent ∼ 50 pc resolu-
tion imaging studies toward NGC 1068 and the Seyfert
1 galaxy NGC 3227 support this view, showing more de-
tailed structures, namely warped disks (Schinnerer et al.
2000a,b). However, observations toward a few low activ-
ity AGN galaxies with < 100 pc resolution show lopsided,
weak, or no molecular gas emission toward the nuclei (e.g.
Garćıa-Burillo et al. 2003, 2005).
M51 (NGC 5194) has also been observed in detail
with molecular lines in the past, since it is one of the
nearest (7.1 Mpc; Takáts & Vinkó 2006) Seyfert galax-
Send offprint requests to: S. Matsushita, e-mail:
satoki@asiaa.sinica.edu.tw
ies. A pair of radio jets emanates from the nucleus and
narrow line regions (NLRs) are associated with the jet
(e.g., Crane & van der Hulst 1992; Grillmair et al. 1997;
Bradley et al. 2004). Interferometric images in molecular
gas show blueshifted emission on the eastern side of the
Seyfert 2 nucleus, and redshifted gas on the western side
(Kohno et al. 1996; Scoville et al. 1998). This shift is al-
most perpendicular to the jet axis, and the estimated col-
umn density is consistent with that estimated from X-ray
absorption toward the nucleus, suggesting that the molecu-
lar gas can be a rotating disk and play an important role in
obscuring the AGN. Interferometric CO(3-2) observations
suggest a velocity gradient along the jet in addition to that
perpendicular to the jet (Matsushita et al. 2004). These re-
sults imply more complicated features than a simple disk
structure. We therefore performed sub-arcsecond resolution
CO(2-1) and CO(1-0) imaging observations of the center of
M51 to study the distribution and kinematics of the molec-
ular gas around the AGN in more detail.
2. Observation and data reduction
We observed CO(2-1) and CO(1-0) simultaneously toward
the nuclear region of M51 using the IRAM Plateau de
Bure Interferometer. The array was in the new A configu-
ration, whose maximum baseline length extends to 760 m.
Observations were carried out on February 4th, 2006. The
system temperatures in DSB at 1 mm were in the range
200-700 K, except for Antenna 6, for which a new genera-
http://arxiv.org/abs/0704.0947v1
2 Matsushita et al.: Jet-disturbed molecular gas near the Sy 2 nucleus in M51
379.0 km/s 399.4 km/s 419.7 km/s
440.0 km/s 460.3 km/s 480.6 km/s
501.0 km/s 521.3 km/s 541.6 km/s
582.2 km/s 602.6 km/s561.9 km/s
2 0 −2 2 0 −22 0 −2
R.A. Offset [arcsec]
Fig. 1. Channel maps of the CO(2-1) line. The contour lev-
els are −3, 3, 5, 7, and 9σ, where 1σ corresponds to 5.2 mJy
beam−1 (= 0.96 K). The cross in each map indicates
the position of the 8.4 GHz radio continuum peak posi-
tion of R.A. = 13h29m52.s7101 and Dec. = 47◦11′42.′′696
(Hagiwara et al. 2001; Bradley et al. 2004). The R.A. and
Dec. offsets are the offsets from the phase tracking center of
R.A. = 13h29m52.s71 and Dec. = 47◦11′42.′′6. The synthe-
sized beam is shown at the bottom-left corner of the first
channel map.
tion receiver gave system temperatures of 150-230 K. Those
in SSB at 3 mm were in the range 140-250 K for Antenna
6, and 220-550 K for other antennas. Four of the corre-
lators were configured to cover a 209 MHz (272 km s−1)
bandwidth for the CO(2-1) line, and a 139 MHz (362 km
s−1) bandwidth for the CO(1-0) line. The remaining four
units of the correlator were configured to cover a 550 MHz
bandwidth for continuum observations and calibration. The
strong quasar 0923+392 was used for the bandpass calibra-
tion, and the quasars 1150+497 and 1418+546 were used
for the phase and amplitude calibrations.
The data were calibrated using GILDAS, and were im-
aged using AIPS. The data were CLEANed with natural
weighting, and the synthesized beam sizes are 0.′′40× 0.′′31
(14 pc × 11 pc) with a position angle (P.A.) of 0◦ and
0.′′85× 0.′′55 (29 pc × 19 pc) with a P.A. of 13◦ for CO(2-1)
and CO(1-0) images, respectively. Fig. 1 shows the channel
maps of CO(2-1) emission with a 20.3 km s−1 velocity res-
olution. The channel maps of CO(1-0) emission show simi-
lar features to that of CO(2-1) emission with lower spatial
resolution. Fig. 2 shows integrated intensity and intensity
weighted mean velocity maps of the CO(2-1) and CO(1-
34 pc
3 2 1 −1 −2 −30
1 20 [Jy/beam km/s]
450 550[km/s]500
3 2 1 0 −1 −2 −3
0.0 0.5 1.0 [Jy/beam km/s]
3 2 1 0 −1 −2 −3
03 12 −1 −2 −3
450 500 550 [km/s]
R.A. Offset [arcsec]
440460
(b) CO(1−0)
R.A. Offset [arcsec]
(d) CO(1−0)
(a) CO(2−1)
Moment 0 Map
(c) CO(2−1)
Moment 1 Map
Moment 0 Map
Moment 1 Map
Fig. 2. Integrated intensity (moment 0) and intensity
weighted velocity (moment 1) maps of the CO(2-1) and
CO(1-0) lines. The synthesized beams are shown at the
bottom-left corner of each image. The crosses and the ref-
erence positions of the R.A. and Dec. offsets are the same
as in Fig. 1. (a) The CO(2-1) moment 0 image. The contour
levels are (1, 3, 5, 7, 9, and 11) × 0.334 Jy beam−1 km s−1
(= 62.0 K km s−1). (b) The CO(1-0) moment 0 image. The
contour levels are (1, 3, 5, and 7) × 0.257 Jy beam−1 km
s−1 (= 50.6 K km s−1). (c) The CO(2-1) moment 1 image.
(d) The CO(1-0) moment 1 image.
0) lines. The noise levels for continuum maps are 1.2 mJy
beam−1 at 1.3 mm and 0.54 mJy beam−1 at 2.6 mm, respec-
tively. We did not detect any significant continuum emission
at either frequency.
3. Results
Most of the CO(2-1) emission is detected within ∼ 1′′
(34 pc) of the center, and is located mainly on the eastern
and western sides of the nucleus. There is also weak emis-
sion located∼ 2.′′7 northwest of the nucleus. The overall dis-
tribution and kinematics are consistent with past observa-
tions (Kohno et al. 1996; Scoville et al. 1998), if we degrade
our image to lower angular resolution; a blueshifted feature
with the average velocity of ∼ 460 km s−1 at the eastern
side of the nucleus, and a redshifted feature with an average
velocity of ∼ 500 km s−1 at the western side (Figs. 1, 2; see
also Fig. 3b). We refer to these main structures with the
same labels as in Scoville et al. (1998) (Fig. 2a).
Our higher resolution images, however, show more com-
plicated structures and kinematics than the previous low
angular resolution observations. Molecular gas on the west-
ern side of the nucleus, S1, is elongated in the north-south
direction and separated into two main peaks (S1a and b).
S1a is located 0.′′9 (30 pc) northwest of the nucleus, and S1b
Matsushita et al.: Jet-disturbed molecular gas near the Sy 2 nucleus in M51 3
is 1.′′0 (34 pc) to the southwest. On the eastern side of the
nucleus, the molecular gas has an intensity peak 0.′′6 (20 pc)
to the northeast (labeled S2), which is located closer to the
nucleus in projected distance than that of S1a/b.
The feature S1 shows a clear velocity gradient along
the north-south direction, which is shown in Fig. 2(c) and
also in the position-velocity (PV) diagram (Fig. 3a). This
gradient was previously suggested by the CO(3-2) data
(Matsushita et al. 2004), but the magnitude of the velocity
gradient is different. The computation of the magnitude of
the velocity gradient is similar to that used for the CO(3-2)
data. The fitting result indicates a velocity gradient within
S1 of 2.2 ± 0.3 km s−1 pc−1, which is larger than that re-
ported previously, 0.77± 0.01 km s−1 pc−1 (the value has
been modified by the different distance of the galaxy used).
This difference is partially due to the larger beam size of
the previous result; the CO(3-2) data set has a beam size
of 3.′′9× 1.′′6 with a P.A. of 146◦, and the velocities of S2/C
and S3 contaminate that of S1.
The CO(1-0) maps show very similar molecular gas
distribution and kinematics as those in CO(2-1) maps
(Fig. 2b,d). Only the western emission was detected in
previous observations (Aalto et al. 1999; Sakamoto et al.
1999), but our map clearly shows the emission from both
side of the nucleus.
In addition to the previously known features, our CO(2-
1) image also shows a weak emission near the nucleus with
a structure elongated in the northeast-southwest direction
(feature C in Fig. 2a). This structure could be a part of
S2, since the velocity map (Fig. 2c) and the PV diagram
(Fig. 3b) show a smooth velocity gradient, although most
of the emission in C comes from only one velocity channel
(419.7 km s−1 map in Fig. 1). The velocity gradient between
S2 and C is in an opposite sense to that previously seen
with the lower angular resolution observations mentioned
above. This structure is not detected in the CO(1-0) line,
but a hint of a velocity gradient can be seen in Fig. 2d.
The total CO(2-1) integrated intensity of S1, S2, and C
is 25.01 Jy km s−1, and that of S1 and S2 in Scoville et al.
(1998) is 33.44 Jy km s−1, so that our data detected 75%
of their intensity. Scoville et al. (1998) detected ∼ 50%
and 20% of the single dish CO(2-1) flux in redshifted and
blueshifted emission, respectively, so that our data recov-
ered ∼ 25% of the single dish flux.
4. Discussion
4.1. Jet-entrained molecular gas
Our molecular gas data show a clear north-south velocity
gradient within the feature S1. We suggested from our pre-
vious study that this velocity gradient may be due to molec-
ular gas entrainment by the radio jet (Matsushita et al.
2004). Here we revisit this possibility with higher spa-
tial and velocity resolution data. Fig. 4 shows our CO(2-
1) image overlaid on the 6 cm radio continuum image
(Crane & van der Hulst 1992). The radio continuum image
shows a compact radio core coincident with the nucleus,
and the southern jet emanating from there (note that the
northern jet is located outside our figure). The CO(2-1)
map clearly shows that S1 is aligned almost parallel to the
jet. In addition, Figs. 2 and 3 show that the velocity gradi-
ent in S1 is also almost parallel to the jet.
R.A. Offset [arcsec]Dec. Offset [arcsec]
1.51.5 1.01.0 0.50.5 0.00.0 −0.5−0.5 −1.0−1.0 −1.5−1.5
620 580
600 560
580 540
560 520
540 500
520 480
500 460
480 440
460 420
440 400
420 380
(a) (b)
Fig. 3. Position-velocity (PV) diagrams of the CO(2-1)
line. The contour levels are 3, 5, 7, and 9σ, where 1σ cor-
responds to 5.2 mJy beam−1 (= 0.96 K). (a) PV diagram
along the north-south elongated S1 feature (P.A. of the cut
is 103◦). The positions for S1a and S1b are shown with
labels. (b) PV diagram along R.A. with the cut through
the S1a, C, and S2 features (P.A. of the cut is 90◦). The
positions for S1a, S2, and C are shown with labels.
The velocity increases from ∼ 480 km s−1 at S1a to
∼ 540 km s−1 south of S1b. This increment is very simi-
lar to that observed in the NLR clouds along the radio jet;
Bradley et al. (2004) measured the velocities and velocity
dispersions of the clouds using the [O III] λ5007 line, and
showed that the velocity of the southern <≃ 1
′′ clouds from
the nucleus are at VLSR ∼ 440− 590 km s
−1 and the veloc-
ity increases as the clouds move away from the nucleus (see
Table 2 and Fig. 9 of their paper)1. This velocity range and
increment are consistent with our data. Furthermore recent
observations of H2O masers toward the nucleus also show
a velocity gradient along the jet with the same sense as our
results (Hagiwara 2007), in addtion to the good correspon-
dance of the velocity range (Hagiwara et al. 2001; Hagiwara
2007; Matsushita et al. 2004). These results suggest that
the molecular gas in S1 (and the NLR clouds and the H2O
masers) is possibly entrained by the radio jet. These results
also suggest that some of the material in NLRs is supplied
from molecular gas close to AGNs.
Another example of jet-entrained neutral gas is found in
the radio galaxy 3C293 (Emont et al. 2005). The velocity
of H I gas in absorption spectra toward the AGN matches
that of ionized gas along kpc-scale radio jets. The spatial
coincidence is not clear, since the spatial resolution of the
H I data is lower (25.′′3 × 11.′′9) than that of the ionized
gas data. Our result is therefore the first possible case of
entrainment of molecular gas by a jet at the scale of ten pc.
The better resolution of our new CO data allows us to
revisit the values of the molecular gas mass, momentum,
and energy of the entrained gas. We derive 6 × 105 M⊙,
8 × 1045 g cm s−1, and 3 × 1052 ergs for these quantities.
These values are about half of the previous values derived
from the CO(3-2) data, mainly due to the larger beam, but
the conclusion is similar; the energy of the entrained gas
could be similar to that of the radio jet (> 6.9× 1051 ergs;
1 We selected the clouds with a velocity dispersion of less than
100 km s−1; Clouds 3, 4, and 4a in Bradley et al. (2004). If we
include all the clouds, the velocity is ∼ 440 − 690 km s−1 with
a range of velocity dispersion of ∼ 25 − 331 km s−1; Clouds 2,
3, 3a, 4, 4a, and 4b.
4 Matsushita et al.: Jet-disturbed molecular gas near the Sy 2 nucleus in M51
47 11 46
52.9 52.7 52.5 52.313 29 53.1
RIGHT ASCENSION (J2000)
Fig. 4. The CO(2-1) integrated intensity image (contours)
overlaid on the VLA 6cm radio continuum image (greyscale;
Crane & van der Hulst 1992). The contour levels, the syn-
thesized beam, and the cross are the same as in Fig. 1.
Crane & van der Hulst 1992), but the momentum is much
larger than that of the jet (2 × 1041 g cm s−1). One way
to explain this discrepancy is through a continuous input
of momentum from the jet (see Matsushita et al. 2004, for
more detail discussions).
4.2. Obscuring material around the Seyfert 2 nucleus
The feature C is located in front of the Seyfert 2 nu-
cleus, and the CO(2-1) intensity is about 62.0 K km s−1
(Fig. 2a). Hence the column density can be calculated as
6.2 × 1021 cm−2 using a CO-to-H2 conversion factor of
1.0 × 1020 cm−2 (K km s−1)−1 (Matsushita et al. 2004)
and assuming a CO(2-1)/(1-0) ratio of unity. This value is
far lower than that derived from the X-ray absorption of
5.6× 1024 cm−2 (Fukazawa et al. 2001). As is mentioned in
Sect.3, the missing flux of our data is ∼ 75%. However, even
if all of this missing flux contributes to obscuring the nu-
clear emission, this large column density difference cannot
be explained. Changing the conversion factor or the ratio
by an order of magnitude also cannot explain this large
difference. One way to reconcile this disparity is to assume
that C is not spatially resovled, in which case the computed
column density is a lower limit. Alternatively, the obscur-
ing material preferentially traced by higher-J CO lines or
denser molecular gas tracers such as HCN may be involved.
The CO(3-2) intensity in brightness temperature scale is
∼ 2 times stronger than that of CO(1-0) (Matsushita et al.
2004), and the HCN(1-0) intensity is also relatively stronger
(HCN/CO ∼ 0.4; Kohno et al. 1996) than normal galaxies.
4.3. Molecular gas at ten pc scale from the Seyfert nucleus
Previous studies suggest that the blue shifted eastern fea-
ture S2 and the red shifted western feature S1 may be the
outer part of a rotating disk as in the AGN unified model.
However, our images show a more complicated nature, and
no clear evidence of simple disk characteristics.
The simplest interpretation is that S1 and S2/C are in-
dependent structures. Since S1 is affected by the jet but
S2/C is not, S1 is expected to be located closer to the nu-
cleus than S2/C, and the projection effect makes the po-
sition of S2/C closer in our images. Alternatively, S2/C
may be close to the nucleus, but the entrained gas has
been already swept away or ionized by the jet. S2/C has
a velocity gradient, and therefore can be interpreted as a
streaming gas, presumably infalling toward the nucleus, as
is observed in the Galactic Center (Lo & Claussen 1983;
Ho et al. 1991).
S1 and S2/C can also be interpreted as a rotating disk
that is largely disturbed by the jet, and only a part remains.
According to the velocity gradient along S2/C, the bluesh-
fited gas is expected at S1, which is the opposite sense to
the previous suggestion, but the gas shows no signs of it due
to the jet entrainment. This is possible from the timescale
point of view; under this interpretation, S1 should have
a blueshifted rotation velocity of ∼ 380 km s−1 based on
the velocity gradient in S2/C. S1 has a velocity ∼ 150 km
s−1 higher than the expected rotational velocity, and we
assume that this is the entrained velocity. In this case, it
takes 2 × 105 years to be elongated along the jet by ∼ 1′′
or 34 pc. On the other hand, the rotation timescale at this
radius is about 2× 106 years, an order of magnitude longer
timescale. The rotating disk can therefore be locally dis-
turbed by the jet.
However, the above two explanations have difficulty in
explaining optical images of the nucleus; the Hubble Space
Telescope images show “X” shaped dark lanes in front of the
nucleus (Grillmair et al. 1997), suggesting the existence of a
warped disk or two rings with one tilted far from another.
An alternative explanation of the dark lanes is that, as
previously proposed, there is a rotating edge-on ring with
S2 as blueshifted gas and S1a as redshifted gas. In this case,
the feature C can be the counterpart of another dark lane,
which runs northeast-southwest, although C has to be a
counter-rotating or Keplarian rotating disk to explain the
opposite sense of the velocity gradient to that of the S1a/S2
(Sect. 3). This configuration explains the “X” shape, but
has a rather complicated configuration, and it is difficult to
explain why the inner disk C is not disturbed by the jet.
We imaged the nuclear region of the Seyfert 2 galaxy
M51 at ∼ 10 pc resolution, and we see no clear evidence of
a circumnuclear rotating molecular gas disk as previously
suggested. The molecular gas along the radio jet is most
likely entrained by the jet. The explanations for other gas
components are speculative, possibly involving a circumnu-
clear rotating disk or streaming gas.
Acknowledgements. We thank Arancha Castro-Carrizo and the
IRAM staff for the new A configuration observations. We also thank
the anonymous referee for helpful comments. IRAM is supported by
INSU/CNRS (France), MPG (Germany) and IGN (Spain). This work
is supported by the National Science Council (NSC) of Taiwan, NSC
95-2112-M-001-023.
References
Aalto, S., Hüttemeister, S., Scoville, N. Z., & Thaddeus, P. 1999, ApJ,
522, 165
Matsushita et al.: Jet-disturbed molecular gas near the Sy 2 nucleus in M51 5
Antonucci, R. R. J., & Miller, J. S. 1985, ApJ, 297, 621
Bradley, L. D., Kaiser, M. E., & Baan, W. A. 2004, ApJ, 603, 463
Crane, P. C., & van der Hulst, J. M. 1992, AJ, 103, 1146
Emonts, B. H. C., Morganti, R., Tadhunter, C. N., et al. 2005,
MNRAS, 362, 931
Fukazawa, Y., Iyomoto, N., Kubota, A., Matsumoto, Y., &
Makishima, K. 2001, A&A, 374, 73
Garćıa-Burillo, S., Combes, F., Hunt, L. K., et al. 2003, A&A, 407,
Garćıa-Burillo, S., Combes, F., Schinnerer, E., Boone, F., & Hunt, L.
K. 2005, A&A, 441, 1011
Grillmair, C. J., Faber, S. M., Lauer, T. R., et al. 1997, AJ, 113, 225
Hagiwara, Y. 2007, AJ, 133, 1176
Hagiwara, Y., Henkel, C., Menten, K. M., & Nakai, N. 2001, ApJ,
560, L37
Ho, P. T. P., Ho, L. C., Szczepanski, J. C., Jackson, J. M., Armstrong,
J. T. 1991, Nature, 350, 309
Jackson, J. M., Paglione, T. A. D., Ishizuki, S., & Nguyen-Q-Rieu
1993, ApJ, 418, L13
Kohno, K., Kawabe, R., Tosaki T., & Okumura S. K. 1996, ApJ, 461,
Lo, K. Y., Claussen, M. J. 1983, Nature, 306, 647
Matsushita, S., Sakamoto, K., Kuo, C.-Y., et al. 2004, ApJ, 616, L55
Planesas, P., Scoville, N., & Myers, S. T. 1991, ApJ, 369, 364
Sakamoto, K., Okumura, S. K., Ishizuki, S., & Scoville, N. Z. 1999,
ApJS, 124, 403
Schinnerer, E., Eckart, A., & Tacconi, L. J. 2000a, ApJ, 533, 826
Schinnerer, E., Eckart, A., Tacconi, L. J., Genzel, R., & Downes, D.
2000b, ApJ, 533, 850
Scoville, N. Z., Yun, M. S., Armus, L., & Ford, H. 1998, ApJ, 493,
Takáts, K., & Vinkó, J. 2006, MNRAS, 372, 1735
	Introduction
	Observation and data reduction
	Results
	Discussion
	Jet-entrained molecular gas
	Obscuring material around the Seyfert 2 nucleus
	Molecular gas at ten pc scale from the Seyfert nucleus
ABSTRACT
  Previous molecular gas observations at arcsecond-scale resolution of the
Seyfert 2 galaxy M51 suggest the presence of a dense circumnuclear rotating
disk, which may be the reservoir for fueling the active nucleus and obscures it
from direct view in the optical. However, our recent interferometric CO(3-2)
observations show a hint of a velocity gradient perpendicular to the rotating
disk, which suggests a more complex structure than previously thought. To image
the putative circumnuclear molecular gas disk at sub-arcsecond resolution to
better understand both the spatial distribution and kinematics of the molecular
gas. We carried out CO(2-1) and CO(1-0) line observations of the nuclear region
of M51 with the new A configuration of the IRAM Plateau de Bure Interferometer,
yielding a spatial resolution lower than 15 pc. The high resolution images show
no clear evidence of a disk, aligned nearly east-west and perpendicular to the
radio jet axis, as suggested by previous observations, but show two separate
features located on the eastern and western sides of the nucleus. The western
feature shows an elongated structure along the jet and a good velocity
correspondence with optical emission lines associated with the jet, suggesting
that this feature is a jet-entrained gas. The eastern feature is elongated
nearly east-west ending around the nucleus. A velocity gradient appears in the
same direction with increasingly blueshifted velocities near the nucleus. This
velocity gradient is in the opposite sense of that previously inferred for the
putative circumnuclear disk. Possible explanations for the observed molecular
gas distribution and kinematics are that a rotating gas disk disturbed by the
jet, gas streaming toward the nucleus, or a ring with another smaller counter-
or Keplarian-rotating gas disk inside.

<|endoftext|><|startoftext|>
Introduction
Cataclysmic variables (CVs) are binary star systems in which the secondary, usually a late-
type main sequence star, fills its Roche lobe and loses mass to the white dwarf primary (Warner
1995). CVs are long-lived systems that are stable against mass transfer, so the mass transfer must
be driven by gradual changes in the orbit, or in the secondary star, or both. It is commonly
believed that the evolution of most CVs is driven by the slow loss of angular momentum from the
orbit, most likely through magnetic braking of the co-rotating secondary star, at least at longer
orbital periods Porb where gravitational radiation is ineffective (Andronov & Pinsonneault 2004
give a recent discussion). The loss of angular momentum constricts the Roche critical lobe around
the secondary and causes the system to transfer mass as it evolves toward shorter Porb. In this
scenario, Porb serves as a proxy measurement for the system’s evolutionary state. Correct and
complete orbital period measurements are fundamental to any accurate theory of CV evolution.
Given the usefulness of Porb, it is fortunate that it can usually be measured accurately and precisely.
This paper presents optical spectroscopy of the nine CVs listed in Table 1. We took these
observations mostly for the purpose of finding orbital periods using radial velocities (none of these
systems are known to eclipse). The long cumulative exposures also allowed us to look for any
unusual features. The Catalog and Atlas of Cataclysmic Variables Archival Edition (Downes et al.
2001) 2 lists seven of the stars as dwarf novae, one as either a dwarf nova or a DQ Her star, and one
simply as a cataclysmic, possibly a dwarf nova similar to U Gem or SS Cygni (type UGSS). Except
for CZ Aql, for which we confirm a 4.8-hour candidate period suggested by Cieslinski et al. (1998),
all of these objects lacked published orbital periods when we began working on them. Subsequently
Tappert & Bianchini (2003) found Porb = 0.0883 d for GZ Cancri; we had communicated our
advance findings to these authors so they could disambiguate their period determination.
2. Observations, Reductions, and Analysis
2.1. Observations
All our spectra were taken at the MDM Observatory on Kitt Peak, Arizona, using either the
1.3m McGraw-Hill telescope or the 2.4m Hiltner telescope. The earliest observations we report
here are from 1995, and the latest were obtained 2007 January. Table 2 gives a journal of the
observations.
At the 1.3m we used the Mark III spectrograph and a SITe 1024 × 1024 CCD detector. The
spectral resolution is 5.0 Å, covering a range of either 4480 to 6760 Å with 2.2 Å pixel−1 for the
2001 December BF Eri data, or 4646 to 6970 Å with 2.3 Å pixel−1 for the remaining data. The
2Available at http://archive.stsci.edu/prepds/cvcat/index.html; this had been called the Living Edition until its
author retired and ceased updates.
http://archive.stsci.edu/prepds/cvcat/index.html
– 3 –
2.4m spectra, except for those of FO Per, were obtained with the modular spectrograph and a
SITe 20482 CCD detector, with 2.0 Å pixel−1, over a range of 4210 to 7500 Å and with a spectral
resolution of 3.5 Å. The relatively small number of 2.4 m spectra of FO Per were taken with a
LORAL 20482-pixel detector, and cover from 4285 to 6870 Å at 1.25 Å pixel−1.
2.2. Reductions
For the most part we reduced the spectra using standard IRAF3 procedures. The wavelength
calibration was based on exposures of Hg, Ne, and Xe lamps. Prior to 2003 we took lamp exposures
through the night and whenever the telescope was moved. For the 2.4m data from 2003 to the
present, we used lamp exposures taken in twilight to find the shape of the pixel-to-wavelength
relation, and set the zero point individually for each nighttime exposure using the OI λ5577 night-
sky feature. The apparent velocity of the telluric OH emission bands at the far red end of the
spectrum, found with a cross-correlation routine, provided a check; although these are far from
the feature used to set the zero point, their apparent velocity typically remain within 10 km s−1
of zero. Because of the increased efficiency of this technique, we attempted to use it at the 1.3m
telescope also, during the 2004 June/July observing run. For unknown reasons the results were
unsatisfactory. To salvage the Hα emission velocities from that run, we determined a correction by
cross-correlating the night-sky emission features in the 6200-6625 Å range with a well-calibrated
night-sky spectrum obtained with a similar instrument. The correction was calculated for each
individual spectrum and then applied to each measured velocity, and it did reduce the scatter
somewhat, evidently because the wavelength range used includes the Hα emission line for which
we measured velocities.
On all our runs, we observed flux standards during twilight when the sky was clear, and applied
the resulting calibration to the data. The reproducibility of these observations suggests that our
fluxes are typically accurate to ±20 per cent. We also took short exposures of bright O and B
stars in twilight to map the telluric absorption features and divide them out approximately from
our program object spectra. Before flux calibration, we divided our program star spectra by a
mean hot-star continuum, in order to remove the bulk of the response variation. Table 1 lists
V magnitudes synthesized from our mean spectra, using the IRAF sbands task and the passband
tabulated by Bessell (1990); clouds, losses at the slit, and calibration errors make these uncertain
by a few tenths of a magnitude, but they do give a rough indication of the brightness of each system
at the time of our observation.
3IRAF is distributed by the National Optical Astronomy Observatories.
– 4 –
2.3. Analysis
Except for a few spectra taken in outburst (which show weak emission or absorption on a
strong continuum), all of the stars show the prominent emission lines. Figs. 1 and 2 show averaged
spectra, and Table 3 gives the equivalent width and FWHM of each line measured for each star
from its averaged spectrum.
Two stars, BF Eri and BI Ori, showed the spectral features of a late-type star. To quantify the
secondary contribution in these objects, we began by preparing averaged flux-calibrated spectra
(in BF Eri’s case the secondary’s radial velocity curve was measurable, so we shifted the individual
spectra to the secondary’s rest frame before averaging). Over time we have used the 2.4 m and
modular spectrograph to collect spectra of K and M stars classified by Keenan & McNeil (1989) or
Boeshaar (1976). The wavelength coverage and spectral resolution of these data are similar to the
1.3m data. We applied a range of scaling factors to the library spectra, subtracted them from the
averaged spectra, and examined the results by eye to estimate a range of spectral types and scaling
factors giving acceptable cancellation of the late-type features.
We use the spectral type and secondary flux to estimate the distance in the following manner.
We begin by finding the surface brightness of the secondary star in V , on the assumption that the
surface brightness is similar to that of main-sequence stars of the same spectral type; the Barnes-
Evans relation for late-type stars is discussed by Beuermann (2006). Combining the known Porb
with the assumption that the secondary fills its Roche critical lobe yields the secondary’s radius
R2 as a function of its mass, M2. In the relevant range of mass ratio, R2 ∝ M
2 , approximately,
and the dependence on M1 is weak enough to ignore. We generally do not know M2, so we guess
at a generous allowable range for this parameter using evolutionary simulations by Baraffe & Kolb
(2000) as a guideline; the weakness of the dependence of R2 on M2 means that this (rather ques-
tionable) step does not dominate the error budget. Combining the surface brightness with R2
yields the absolute magnitude MV . Subtracting this from the apparent magnitude measured for
the secondary star gives a distance modulus. The reddening maps of Schlegel et al. (1998) then
can be used to estimate the extinction. Note carefully that we do not assume that the secondary
is a ‘normal’ main-sequence star; we assume only that the secondary’s surface brightness is similar
to field stars of the same spectral type. The normalization of the secondary’s contribution also
depends on the assumption that the spectral features used to judge the subtraction are similar in
strength to those of a normal star.
As noted earlier, the immediate aim of our observations was to find orbital periods from radial
velocity time series data. The Hα emission line is usually the strongest feature, and it generally
gives good results in dwarf novae. All the emission-line velocities reported here are of Hα.
We measured radial velocities of Hα emission using convolution methods described by Schneider & Young
(1980) and Shafter (1983). In this technique one convolves an antisymmetric function with the line
profile, and takes the zero of the convolution (where the two sides of the line contribute equally)
as the line center. For the antisymmetric function with which the spectrum is convolved, we used
– 5 –
either the derivative of a Gaussian with adjustable width, or positive and negative Gaussians of
adjustable width offset from each other by an adjustable separation. Uncertainties in the convolu-
tion velocities are estimated by propagating forward the counting-statistics errors in the individual
data channels; in practice, these are lower limits to the true uncertainties, since the line profile
can vary in ways unrelated to the orbital modulation. The choice of convolution parameters is
dictated by the shape and width of the line, and in practice the parameters are adjusted to give
the best detection of the orbit. The physical interpretation of CV emission lines is complicated
and controversial (see, e.g., Shafter 1983, Marsh 1988, Robinson 1992), but in almost all cases the
emission-line periodicity accurately reflects Porb (though Araujo-Betancor et al. 2005 describe a
noteworthy exception to this rule). A sample of the radial velocities for each object are listed in
Table 4, while the full tables can be found online.
One of our systems, BF Eri, has a K-type absorption component in its spectrum. We measured
velocities of this using the cross-correlation radial velocity package described by Kurtz & Mink
(1998), using the region from 5000 to 6500 Å, and excluding the region containing the He I λ5876
emission line and and the NaD absorption complex. For a cross-correlation template spectrum, we
used the a velocity-compensated sum of many observations of IAU velocity standards taken with
the same instrument, as described in Thorstensen et al. (2004).
We searched for periods in all the velocity time series using the “residualgram” method
(Thorstensen et al. 1996); the resulting periodograms are given in Figs. 3 and 4. At the best
candidate periods we fitted least-squares sinusoids of the form v(t) = γ + K sin[2π(t − T0)/P ].
Fig. 5 shows the velocities folded on the best-fitting periods, and Table 5 gives the parameters of
these fits. Because of limitations of the sampling (e.g., the need to observe only at night from a
single site), a single periodicity generally manifests as a number of alias frequencies. To assess the
confidence with which we could assert that the strongest alias is the true period, we used a Monte
Carlo test described by Thorstensen & Freed (1985).
The alias problem can be particularly irksome over longer timescales; in this case the uncertain
number of cycles elapsed between observing runs causes fine-scale “ringing” in the periodogram.
The individual periods have tiny error bars, because of the large time span covered, but the am-
biguity in period means that a realistic error bar – one that covers the range of possibilities – is
much larger. In those cases, the period uncertainties given in Table 5 are estimated by analyzing
data from the individual observing runs separately. When only two observing runs are available,
the allowable fine-scale frequencies are well-described by a fitting formula
Porb = (t2 − t1)/n.
Here t1 and t2 are the epochs of blue-to-red velocity crossing observed on the two runs, and n
is the integer number of cycles that have passed between t1 and t2. The allowed range of n is
determined from the weighted average of the periods derived from separate fits to the two runs’
data. When more than two observing runs are available, the situation becomes more complex. In
some happy cases there are enough overlapping constraints that only a single, very precise period
– 6 –
remains tenable. We were able to find such precise periods for LX And, LU Cam, and BF Eri.
3. Notes on Individual Objects
We discuss the stars in alphabetical order by constellation.
3.1. LX Andromedae
LX And was first identified as a variable star (RR V-3) in the Lick RR Lyrae search (Kinman et al.
1982). It was classified incorrectly as an RV Tauri star, and its dwarf nova nature was unrecognized
until the photometric study by Uemura et al. (2000). Morales-Rueda & Marsh (2002) obtained
spectra of LX And as part of their study of dwarf novae in outburst and determined the equivalent
widths and FWHMs of the Balmer and He II lines. Our mean spectrum appears typical for a dwarf
nova at minimum light.
Because of the large hour-angle span, the radial velocity time series leaves no doubt about
the daily cycle count, which is near 6.6 cycle d−1. The several observing runs constrain the fine-
scale period in a more complicated way, but the Monte Carlo test indicates that a precise period
of 0.1509743(5) d is preferred with about 98 per cent confidence. Two other candidate periods
separated from this by 1 cycle per 53.2 d in frequency are much less likely.
3.2. CZ Aquilae
Very little has been published on CZ Aql, which is listed in the Archival Edition as a U-
Gem dwarf nova. Cieslinski et al. (1998) included the star in their spectroscopic study of irregular
variables, and noted a probable 4.8 hour period and emission lines typical of dwarf novae. Our
velocities confirm the suggested 4.8-hour period, but we cannot determine a unique cycle count
between our observing runs.
While the spectrum superficially resembles that of a dwarf nova, a closer look reveals interesting
behavior. Fig. 6, constructed using methods described by Taylor et al. (1999), presents our spectra
as a phase-averaged greyscale image. There is a striking broad component in the stronger Balmer
and HeI lines that shows a large velocity excursion, with the red wing of Hα reaching to +3100 km
s−1 at phase 0.3 (where phase 0 corresponds to the blue-to-red crossing of the line core). The broad
components around Hβ and λ6678 move in phase with those of Hα and range from 900 to 2600
and −2500 to −900 km s−1 and 700 to 2100 and −1000 to −600 km s−1, respectively. The wings
of λ5876 also move in phase with the others, but the red edge is difficult to follow at its minimum
because of interference from the NaD absorption lines, which are stationary and hence interstellar.
The maximum of the red edge is 3200 km s−1, while the blue edge ranges from −1500 to −600 km
– 7 –
s−1. The blueward wing of all these lines is noticeably weaker than the redward wing.
Other emission lines present include HeII λ4686, HeI λ4713 and λ4921, and, very weakly, FeII
λ5169. We also detect unidentified emission lines at λ6344, as is also seen in LS Peg (Taylor et al.
1999), and at λ5046. The strength of the λ5780 diffuse interstellar band (Jenniskens & Desert
1994) and the NaD lines suggest that a good deal of interstellar material lies along the line of sight,
and that the luminosity is relatively high.
High-velocity wings reminiscent of the ones seen here have been seen in V795 Her (Casares et al.
1996; Dickinson et al. 1997), LS Peg (Taylor et al. 1999), V533 Her (Thorstensen & Taylor 2000),
and RX J1643+34 (Patterson et al. 2002), all of which are SW Sex stars. We do not, how-
ever, detect another SW Sex characteristic, namely phase-dependent absorption in the HeI lines
(Thorstensen et al. 1991). The orbital periods of most SW Sex stars are shorter than 4 hours, so
CZ Aql’s 4.8-h period would be unusually long for an SW Sex star.
3.3. LU Camelopardalis
Jiang et al. (2000) obtained the first spectrum of this dwarf nova in a follow-up study of CV
candidates from the ROSAT All Sky Survey. We found no other published spectroscopic studies.
Our velocities constrain the period to a unique value, 0.1499685(7) d. The averaged spectrum
shows a rather strong, blue continuum, which may indicate a state somewhat above true minimum.
3.4. GZ Cancri
Jiang et al. (2000) confirmed the cataclysmic nature of GZ Cnc by obtaining the first spectrum
of the object. Kato et al. (2002) suggested that this star, originally labeled as a dwarf nova, could
possibly be an intermediate polar (DQ Her star), based on similarities in its long-term photometric
behavior to that of other intermediate polars. Tappert & Bianchini (2003) conducted a photometric
and spectroscopic study of the system. Using advance results from the present study to help decide
the daily cycle count, they found Porb = 0.08825(28) d, or 2.118(07) h, placing the system near
the lower edge of the so-called gap in the CV period distribution – a dearth of systems in the
period range from roughly 2 to 3 hr. Tappert & Bianchini (2003) also saw characteristics that
could indicate an intermediate polar classification, but did not claim their evidence was definitive
on this point.
Almost all our observations come from two observing runs a year apart. The full set of
velocities strongly indicates an orbital frequency near 11.4 cycle d−1, with the Monte Carlo test
giving a discriminatory power greater than 0.99 for the choice of daily cycle count. However, the
number of cycles between the two observing runs is not determined. Precise periods that fit the
combined data set are given by P = [349.785(3) d]/n, where n is the integer number of cycle counts;
– 8 –
n = 3972± 8 corresponds to roughly 1 standard deviation. While our period agrees well with that
of Tappert & Bianchini (2003), our data neither support nor disprove the claim that GZ Cnc may
be an intermediate polar.
3.5. V632 Cygni
Liu et al. (1999) offer the only published spectrum of this dwarf nova. They measured the
equivalent widths and integrated line fluxes of the Balmer, HeI, and HeII emission lines and sug-
gested that the orbital period is likely short based on the very strong Balmer emission. Our spec-
trum appears similar to theirs, and our measured flux level is also nearly the same. The periodigram
in Fig. 3 clearly favors an orbital frequency near 15.7 cycles d−1, with a discriminatory power of 95
per cent and a correctness likelihood near unity. This confirms the suggestion of Liu et al. (1999)
that the period is rather short and suggests that it is an SU UMa-type dwarf nova.
3.6. V1006 Cygni
Bruch & Schimpke (1992) present the only published spectrum we know of, and characterized
it as a “textbook example” of a dwarf nova spectrum. They noted a slightly blue continuum with
strong Balmer and He I emission, as well as clear He II λ4686 and Fe II emission. Our spectrum
(Fig. 1) is similar to theirs both in appearance and normalization, and our line measurements
(Table 3) are also comparable.
The periodogram (Fig. 4) indicates a frequency near 10.1 cycles d−1, and the Monte Carlo test
confirms that the daily cycle count is securely determined. Most of our data are from 2004 June,
but we returned in 2005 June/July to confirm the unusual period indicated in the earlier data.
The periods found by analyzing the two runs separately are consistent within their uncertainties.
As with GZ Cnc, there are multiple choices for the cycle count between the two observing runs;
the best-fitting periods are given by P = [369.006(4) d]/n, where n = 3726 ± 4 corresponds to
1 standard deviation. Including a few velocities from other observing runs suggests that n is
slightly larger, perhaps 3728. In any case, the period amounts to 2.38 h, which places V1006 Cyg
firmly in the period gap (Warner 1995), where there is apparently a true scarcity of dwarf novae
(Hellier & Naylor 1998).
3.7. BF Eri
The first evidence that BF Eridani was a cataclysmic variable came when an Einstein X-
ray source, 1ES0437-046, was matched to the variable (Elvis et al. 1992). Schachter et al. (1996)
confirmed this match and presented an optical spectrum. Kato (1999) and the Variable Star
– 9 –
Observers’ League in Japan (VSOLJ) found photometric variability characteristic of a dwarf nova.
The spectrum of BF Eri (Fig. 2) shows a significant contribution from a K star along with the
usual dwarf-nova emission lines. Normally, this suggests that Porb > 6 h. Nearly all our spectra
yielded good cross-correlation radial velocity measurements as well as emission-line velocities. The
absorption- and emission-line velocities independently give a period near 6.50 h (Table 5), in ac-
cordance with expectation based on the spectrum. There is no ambiguity in cycle count over the
5-year span of the observations, so the period is precise to a few parts per million. Fig. 6 shows a
phase-resolved average of the BF Eri spectra, with the absorption spectrum shifting in antiphase
to the emission lines.
If the emission-line velocities faithfully trace the primary’s center-of-mass motion, and the
absorption-line velocities also trace the secondary’s motion, then the two velocity curves should be
exactly one-half cycle out of phase. In BF Eri, we find a shift of 0.515 ± 0.007 cycles between the
two curves, consistent with 0.5 cycles, so we feel emboldened to explore the system dynamics.
Masses can only be derived when the orbital inclination is known, as in eclipsing systems. To
see if BF Eri might eclipse, we derived differential magnitudes from images that were taken for
astrometry (discussed below) and plotted them as a function of orbital phase. Some images were
taken at the phase at which an eclipse would appear, but no evidence for an eclipse was found.
Limits on the depth and duration of the eclipse are difficult to quantify because the data were taken
in short bursts in the presence of strong intrinsic variability, so a weak eclipse cannot be ruled out,
but the photometry does suggest that the inclination is not close to edge-on.
Because the system apparently does not eclipse, we cannot derive masses; rather, we find broad
constraints on the inclination by assuming astrophysically reasonable masses for the components.
Taken at face value, the velocity amplitudes K imply a mass ratio q = M2/M1 = 0.60 ± 0.03. If
we arbitrarily choose a white dwarf mass M1 = 0.9 M⊙ (so that M2 = 0.53 M⊙), the observed K
velocities imply i = 50 degrees. To find a rough lower limit on the inclination, we consider a massive
white dwarf (M1 = 1.2 M⊙) and, ignoring the constraint on q for the moment, take M2 = 0.4 M⊙;
this yields i = 40 degrees. For a rough upper limit, we assume M1 = 0.6 M⊙ and M2 = 0.4 M⊙,
which gives i = 67 degrees.
The decomposition procedure described earlier yielded a spectral type of K3 ±1 subclass; the
result of the subtraction is shown in Fig. 2. Using the V passband tabulated by Bessell (1990) and
the IRAF sbands task, we find a synthetic V = 16.9± 0.3 for the K star’s contribution. Taking the
range of plausible secondary star masses to be 0.4 to 0.8 M⊙ yields R2 = 0.7± 0.1 R⊙ at this Porb.
Combining this with the surface brightness expected at this spectral type yields MV = 6.8 ± 0.4
for the secondary. If there is no significant interstellar extinction, we have m − M = 10.1 ± 0.5,
or a distance of approximately 1100 ± 300 pc. The dust maps of Schlegel et al. (1998) give a total
E(B − V ) = 0.062 in this direction. Assuming that BF Eri is beyond the Galactic dust and taking
AV /E(B − V ) = 3.3 gives an extinction-corrected (m−M)0 = 9.9, and a distance estimate of 950
(+250,−200) pc.
– 10 –
We can also estimate a distance using the relation found by Warner (1987) between Porb, i,
and the absolute magnitude at maximum light MV (max). Using our inclination constraints, the
Warner relation predicts MV max = 3.9±0.7 at this orbital period. The General Catalog of Variable
Stars (Kholopov et al. 1999) lists mp = 13.2 at maximum light; taking this to be similar to Vmax
yields m−M = 9.3, or 9.1 corrected for extinction, which corresponds to 660 pc.
Given these distance estimates, it is surprising that BF Eri has a very substantial proper
motion. The Lick proper motion survey (Hanson et al. 2004) gives [µX , µY ] = [+34,−97] mas
yr−1. We have begun a series of parallax observations with the Hiltner 2.4m telescope using the
protocols described by Thorstensen (2003); so far we have five epochs from 2005 November and 2007
January. The proper motion relative to the background stars is [µX , µY ] = [32,−111] mas yr
and the parallax is not detected, with a nominal value of 1 ± 2 mas. The parallax determination
is very preliminary, but given the data so far we estimate the lower limit on the distance based on
the astrometry alone to be ∼ 200 pc.
At the nominal 950 pc distance derived from the secondary star, a 100 mas yr−1 proper
motion corresponds to a transverse velocity vT = 451 km s
−1. This is implausibly large, so we
are left wondering how we might have overestimated the distance. One effect might be as follows.
Our distance is based on the secondary’s apparent brightness, and we estimate the secondary’s
contribution to the total light by searching for the best cancellation of its features. If the secondary’s
absorption lines are weaker than those in the spectral-type standards, we would underestimate the
secondary’s contribution. In our best decomposition, the secondary is about 2.2 magnitudes fainter
than the total light in V . Assuming (unrealistically) that all the light is from the secondary would
therefore decrease the distance modulus by 2.2 magnitudes, to a distance of 340 pc.
We do not yet have enough information to resolve the conundrum posed by BF Eri’s unmis-
takably large proper motion and its apparently large distance, but a reasonable compromise might
be to put it at something like 400-500 pc, with an underluminous, low-metallicity secondary. The
cross-correlation velocities of the secondary have a zero point determined to ±5 km s−1, more or
less, and give a substantial systemic velocity of −72±3 km s−1, or −86 km s−1 in the local standard
of rest. If the star is at 450 pc, its space velocity with respect to the local standard of rest is ∼250
km s−1, with Galactic components [U, V,W ] = [−180,−180,−3] km s−1, that is, the velocity is
mostly parallel to the Galactic plane and lags far behind the rotation of the Galactic disk. This
would put BF Eri on a highly eccentric orbit; these are halo-population kinematics (even though
the star remains close to the plane). These kinematics would be qualitatively consistent with the
weak-line conjecture used earlier.
3.8. BI Ori
Szkody (1987) published the first quiescent spectrum of BI Orionis, which showed emission
lines typical of dwarf novae. Morales-Rueda & Marsh (2002) show a spectrum in outburst and
– 11 –
note the possible presence of weak HeII λ4686 emission.
Only the 2006 January velocities are extensive enough for period finding; they give Porb =
4.6 hr, with no significant ambiguity in the daily cycle count. The average spectrum shows the
usual emission lines; M-dwarf absorption features are also visible, though the signal-to-noise of the
individual spectra was not adequate for finding absorption-line velocities. Using the procedures
described earlier, we estimate the secondary’s spectral type to be M2.5 ± 1.5, with the secondary
alone having V = 20.0 ± 0.4. Assuming that the secondary’s mass lies in the broad range from
0.2 to 0.6 M⊙, its radius at this Porb would be 0.35 to 0.6 R⊙. Combining this with the surface
brightness derived from the spectral types gives an absolute magnitude MV = 10.2 ± 1.0. The
distance modulus, uncorrected for extinction, is therefore m − M = +9.8 ± 1.1, corresponding
to 910(+600,−360) pc. Schlegel et al. (1998) estimate a total reddening E(B − V ) = 0.11 in this
direction; assuming that BI Ori lies beyond all the dust, and taking AV /E(B−V ) = 3.3 reduces the
distance to ∼ 770 pc. At maximum light, BI Ori has mp = 13.2 (Kholopov et al. 1999). Assuming
the color is neutral, we find MV = 3.4 ± 1.1 at maximum. At BI Ori’s period, the Warner (1987)
relation predicts MV > 3.6 (with the brightest value corresponding to i = 0). This agrees broadly
with our nominal value based on the secondary star’s distance, but is a little fainter, suggesting
that BI Ori is not too far from face-on, or a little closer than our nominal distance, or both.
3.9. FO Per
FO Persei was apparently discovered by Morgenroth (1939), but its cataclysmic nature was
not immediately recognized. Bruch (1989) obtained spectra and gave equivalent widths for the
Balmer lines for two different nights of observations, between which the continuum changed from
relatively flat to inclined toward the red.
The emission lines in FO Per are rather narrow (Fig. 1, Table 3). This is often taken to indicate
a low orbital inclination. The velocity amplitude K is small, so that K/σ ≈ 1.6 for the best fits
(Table 5). Because of this, the daily cycle count remains ambiguous; the orbital frequency is either
5.8 or 6.8 cycle d−1, corresponding to Porb of 3.52 or 4.13 hr. CVs with periods in the 3-4 hour
range tend to be novalike variables (Shafter 1992), whereas FO Per is a dwarf nova; thus the 4.13
hr period is more likely a priori.
4. Summary
We have determined the orbital periods of eight CVs without significant daily cycle count
ambiguity; for FO Per, the period is narrowed to two choices. For three of the systems we find
high-precision periods by establishing secure cycle counts over long baselines.
While most of these objects are similar to others already known, three stand out as especially
– 12 –
interesting. CZ Aql shows asymmetric, high-velocity wings around the Balmer and HeI λ5876 and
λ6678 lines, possibly indicating a magnetic system. BF Eri’s proper motion of ∼ 100 mas yr−1 is
surprising in view of the large distance indicated by its secondary spectrum and by the Warner
relation; even if it is somewhat nearer than these indicators suggest, its kinematics are not typical of
disk stars. Finally, the orbital period of V1006 Cyg places it squarely in the middle of the so-called
period gap between 2 and 3 hours.
Acknowledgments. We are most grateful for support from the National Science Foundation
through grants AST-9987334 and AST-0307413. Bill Fenton took most of the spectra of GZ Cnc,
and J. Cameron Brueckner assisted with the BF Eri spectroscopy. Some of the astrometric images of
BF Eri were obtained by Sébastien Lépine and Michael Shara of the American Museum of Natural
History. We would like to thank the MDM Observatory staff for their skillful and conscientious
support. Finally, we are grateful to the Tohono O’odham for leasing us their mountain for a while,
so that we may study the glorious universe in which we all live.
– 13 –
REFERENCES
Andronov, N., & Pinsonneault, M. H. 2004, ApJ, 614, 326
Araujo-Betancor, S., et al. 2005, A&A, 430, 629
Baraffe, I., & Kolb, U. 2000, MNRAS, 318, 354
Bessell, M. S. 1990, PASP, 102, 1181
Beuermann, K. 2006, A&A, 460, 78
Boeshaar, P. 1976, Ph. D. thesis, Ohio State University
Bruch, A. 1989, A&AS, 78, 145
Bruch, A., & Schimpke, T. 1992, A&AS, 93, 419
Casares, J., Martinez-Pais, I. G., Marsh, T. R., Charles, P. A., & Lazaro, C. 1996, MNRAS, 278,
Cash, W. 1979, ApJ, 228, 939
Cieslinski, D., Steiner, J. E., & Jablonski, F. J. 1998, A&AS, 131, 119
Dickinson, R. J., Prinja, R. K., Rosen, S. R., King, A. R., Hellier, C., & Horne, K. 1997, MNRAS,
286, 447
Downes, R. A., Webbink, R. F., Shara, M. M., Ritter, H., Kolb, U., & Duerbeck, H. W. 2001,
PASP, 113, 764
Elvis, M., Plummer, D., Schachter, J., & Fabbiano, G. 1992, ApJS, 80, 257
Hellier, C., & Naylor, T. 1998, MNRAS, 295, L50
Jenniskens, P., & Desert, F.-X. 1994, A&AS, 106, 39
Jiang, X. J., Engels, D., Wei, J. Y., Tesch, F., & Hu, J. Y. 2000, A&A, 362, 263
Kato, T. 1999, Informational Bulletin on Variable Stars, 4745, 1
Kato, T., et al. 2002, A&A, 396, 929
Keenan, P. C., & McNeil, R. C. 1989, ApJS, 71, 245
Kholopov, P. N., et al. 1999, VizieR Online Data Catalog, 2214, 0
Kinman, T. D., Mahaffey, C. T., & Wirtanen, C. A. 1982, AJ, 87, 314
Hanson, R. B., Klemola, A. R., Jones, B. F., & Monet, D. G. 2004, AJ, 128, 1430
– 14 –
Kurtz, M. J., & Mink, D. J. 1998, PASP, 110, 934
Liu, W., Hu, J. Y., Zhu, X. H., & Li, Z. Y. 1999, ApJS, 122, 243
Marsh, T. R. 1988, MNRAS, 231, 1117
Monet, D. et al. 1996, USNO-A2.0, (U. S. Naval Observatory, Washington, DC)
Morgenroth, O. 1939, Astronomische Nachrichten, 268, 273
Morales-Rueda, L., & Marsh, T. R. 2002, MNRAS, 332, 814
Patterson, J., et al. 2002, PASP, 114, 1364
Robinson, E. L. 1992, ASP Conf. Ser. 29: Cataclysmic Variable Stars, 29, 3
Schachter, J. F., Remillard, R., Saar, S. H., Favata, F., Sciortino, S., & Barbera, M. 1996, ApJ,
463, 747
Schlegel, D. J., Finkbeiner, D. P., & Davis, M. 1998, ApJ, 500, 525
Schneider, D. P. & Young, P. 1980, ApJ, 238, 946
Shafter, A. W. 1983, ApJ, 267, 222
Shafter, A. W. 1992, ApJ, 394, 268
Szkody, P. 1987, ApJS, 63, 685
Tappert, C., & Bianchini, A. 2003, A&A, 401, 1101
Taylor, C. J., Thorstensen, J. R., & Patterson, J. 1999, PASP, 111, 184
Thorstensen, J. R. 2003, AJ, 126, 3017
Thorstensen, J. R., Fenton, W. H., & Taylor, C. J. 2004, PASP, 116, 300
Thorstensen, J. R. & Freed, I. W. 1985, AJ, 90, 2082
Thorstensen, J. R., Patterson, J. O., Shambrook, A., & Thomas, G. 1996, PASP, 108, 73
Thorstensen, J. R., Ringwald, F. A., Wade, R. A., Schmidt, G. D., & Norsworthy, J. E. 1991, AJ,
102, 272
Thorstensen, J. R., & Taylor, C. J. 2000, MNRAS, 312, 629
Uemura, M., Kato, T., & Watanabe, M. 2000, Informational Bulletin on Variable Stars, 4831, 1
Warner, B. 1987, MNRAS, 227, 23
– 15 –
Warner, B. 1995, Cambridge Astrophysics Series, Cambridge, New York: Cambridge University
Press, —c1995
Zacharias, N., Urban, S. E., Zacharias, M. I., Wycoff, G. L., Hall, D. M., Monet, D. G., & Rafferty,
T. J. 2004, AJ, 127, 3043
This preprint was prepared with the AAS LATEX macros v5.2.
– 16 –
Fig. 1.— Plots of the average flux-calibrated spectra for eight of the stars studied here. The weak
features seen near λ5577 are artifacts caused by imperfect subtraction of the strong [OI] night-sky
emission.
– 17 –
Fig. 2.— Plot of the averaged spectrum of BF Eri (top) and the spectrum after scaled late-type
(K3V) star has been subtracted (bottom). The spectra have been shifted into a rest frame before
averaging and do not include the 2006 March or 2007 January data.
– 18 –
Fig. 3.— Periodigrams for most of the stars studied here. The vertical axis in each case is the
inverse of chi-square for the least-squares best fitting sinusoid at each trial frequency. When data
from more that one observing run are combined, the periodigram can require hundreds of thousands
of points to resolve the fine-scale ringing; in those cases, the curve shown is formed by connecting
local maxima of the periodogram with straight lines. In those cases the right-hand panel gives a
close-up view of the region around the highest peak, revealing the alias structure resulting from
different choices of cycle count between the observing runs. The periodogram of BF Eri is for the
absorption-line velocities.
– 19 –
Fig. 4.— Periodigrams for the remainder of the stars, plotted in the same manner as the previous
figure. Because the choice of daily cycle count for FO Per remains ambiguous, we have not chosen
to enlarge either peak region.
– 20 –
Fig. 5.— Radial velocities plotted as a function of phase using the adopted orbital periods. For
CZ Aql, GZ Cnc, and V1006 Cyg, the number of cycle counts between observing runs is unknown,
and the exact period chosen to fold the velocities is one of a number of possibilities. The two
plots for FO Per are for different choices of the daily cycle count, and each of these in turn is also
an arbitrary choice among many finely-spaced periods. In BF Eri, both emission and absorption
velocities are plotted; the absorption velocities are shown with error bars.
– 21 –
Fig. 6.— Phase-averaged spectra of CZ Aql (top two panels) and BF Eri (bottom two panels),
presented as a greyscale. The scale is inverted, so that emission is represented by darker shades.
The two CZ Aql spectra are scaled differently to show the line cores (top) and the extent of the the
line wings. Note the NaD lines in CZ Aql remain stationary, indicating an interstellar origin. The
feature at λ6280 is telluric. BF Eri’s spectrum is plotted in two overlapping sections; the K-star’s
orbital motion is plainly visible.
– 22 –
Table 1. Stars Observed
Star α2000
a δ2000 Epoch
b Vobs
c maxd min
[hh:mm:ss] [◦:′:”] [mag.] [mag.] [mag.]
LX And 2:19:44.08 +40:27:22.3 2006.7 16.3 13.5p 16.4p
CZ Aql 19:19:58.21 −07:10:55.2 2003.4 15.4 13.p 15.p
LU Cam 5:58:17.86 +67:53:46.2 2002.0 16.3 14.v (16.v
GZ Cnc 9:15:51.68 +09:00:49.6 2000.3 15.4 13.1v 15.4v
V632 Cyg 21:36:04.22 +40:26:19.4 2000.5 17.9 12.6p 17.5p
V1006 Cyg 19:48:47.20 +57:09:22.8 2000.5 17.8 15.4p 17.0p
BF Eri 4:39:29.96 −04:35:59.5 2006.2 14.8 13.5p 15.5p
BI Ori 5:23:51.77 +01:00:30.6 2002.8 17.1 13.2p 16.7p
FO Per 4:08:34.98 +51:14:48.5 2004.0 17.1 11.8v 16.v
aPositions measured from images taken at the 2.4m Hiltner telescope, using
astrometric solutions from fits to USNO A2.0 (Monet et al. 1996) or UCAC 2
(Zacharias et al. 2004) stars. Uncertainties are of order 0.1 arcsec.
bThe date of the image used in the position measurement. The coordinate
system (equator and equinox) is J2000 in all cases.
cSynthesized from spectra, as described in text.
dTaken from the GCVS (Kholopov et al. 1999). Photgraphic magnitudes
flagged with ‘p’, visual with ‘v’.
– 23 –
Table 2. Journal of Observations
data N HA start HA end telescope
(UT) [hh:mm] [hh:mm]
LX And
2004 Jan 13 1 +2:35 +2:35 2.4m
2004 Mar 02 3 +3:45 +4:00 2.4m
2004 Nov 18 1 +0:29 +0:29 1.3m
2004 Nov 19 26 −3:36 +4:17 1.3m
2004 Nov 19 2 +2:31 +2:35 2.4m
2004 Nov 20 13 −3:17 +3:15 1.3m
2004 Nov 20 1 −1:23 −1:23 2.4m
2006 Jan 19 14 +1:31 +4:10 1.3m
2006 Jan 22 8 +3:31 +4:46 1.3m
2007 Jan 26 3 +1:36 +1:58 1.3m
2007 Jan 27 15 +0:58 +3:39 1.3m
CZ Aql
2005 Jul 02 3 +1:40 +2:04 1.3m
2005 Jul 04 48 −3:28 +3:17 1.3m
2005 Jul 05 13 −1:55 −0:06 1.3m
2005 Jul 06 12 −3:57 +2:28 1.3m
2005 Sep 03 2 +1:23 +1:32 1.3m
2005 Sep 07 2 −0:03 +0:05 1.3m
2005 Jun 28 2 −0:01 +0:04 2.4m
2006 Jun 18 2 +2:05 +2:10 2.4m
2006 Jun 19 2 +0:36 +0:40 2.4m
2006 Jun 23 3 −1:24 −1:12 2.4m
– 24 –
Table 2—Continued
data N HA start HA end telescope
(UT) [hh:mm] [hh:mm]
LU Cam
2002 Jan 22 2 −1:40 −1:19 2.4m
2002 Jan 23 8 −1:33 +2:31 2.4m
2002 Jan 24 25 −3:15 +5:47 2.4m
2002 Feb 18 2 +2:20 +2:28 2.4m
2002 Feb 19 2 +2:41 +2:50 2.4m
2002 Feb 20 4 −0:02 +3:43 2.4m
2002 Feb 22 1 +2:18 +2:18 2.4m
2004 Jan 16 2 +2:49 +2:53 2.4m
2004 Jan 17 1 +0:40 +0:40 2.4m
2004 Jan 19 1 −0:32 −0:32 2.4m
2004 Mar 02 1 +0:53 +0:53 2.4m
2004 Mar 07 6 +2:29 +3:14 2.4m
2004 Nov 19 4 +2:48 +3:27 2.4m
2005 Mar 21 1 +1:12 +1:12 2.4m
2005 Mar 22 2 +1:21 +1:30 2.4m
2005 Sep 09 1 −2:00 −2:00 2.4m
2005 Sep 12 1 −1:49 −1:49 2.4m
2006 Jan 09 2 +1:49 +1:55 2.4m
GZ Cnc
2000 Apr 07 2 +4:26 +4:32 2.4m
2000 Apr 08 1 +0:27 +0:27 2.4m
2000 Apr 10 5 −0:31 +4:32 2.4m
2000 Apr 11 15 +2:19 +4:22 2.4m
2001 Mar 24 2 +4:39 +4:49 2.4m
2001 Mar 25 21 −1:10 +3:10 2.4m
2001 Mar 26 20 −0:50 +4:27 2.4m
2001 Mar 27 2 −0:08 +0:01 2.4m
2001 Mar 28 3 +0:14 +0:25 2.4m
V632 Cyg
2005 Jul 07 2 +1:00 +1:16 1.3m
2005 Jul 08 10 −5:12 +0:33 1.3m
2005 Jul 09 18 −5:03 +1:09 1.3m
2005 Jul 10 18 −5:09 +1:07 1.3m
2005 Jul 11 3 +0:52 +1:19 1.3m
– 25 –
Table 2—Continued
data N HA start HA end telescope
(UT) [hh:mm] [hh:mm]
V1006 Cyg
2003 Jun 22 1 +1:06 +1:06 2.4m
2004 Jun 24 5 +0:52 +1:56 1.3m
2004 Jun 25 5 −1:43 +0:34 1.3m
2004 Jun 25 1 +4:06 +4:06 2.4m
2004 Jun 26 10 −4:25 +1:56 1.3m
2004 Jun 27 3 −3:00 −2:00 1.3m
2004 Jun 28 4 −2:25 +1:02 1.3m
2004 Jun 28 1 +0:28 +0:28 2.4m
2004 Jun 29 5 +0:55 +1:59 1.3m
2004 Jun 29 1 +4:07 +4:07 2.4m
2004 Jun 30 18 −4:39 +2:26 1.3m
2004 Jul 01 10 −3:58 +1:53 2.4m
2005 Jul 05 13 +0:58 +3:11 1.3m
2004 Jun 30 4 −3:41 +3:52 2.4m
2004 Jul 01 12 −4:07 +2:17 1.3m
2005 Jul 03 27 −3:57 +1:20 1.3m
2005 Jul 05 13 +0:58 +3:11 1.3m
BF Eri
2001 Dec 18 3 +2:35 +2:56 1.3m
2001 Dec 19 10 −3:06 +4:04 1.3m
2001 Dec 20 12 −3:49 +2:00 1.3m
2001 Dec 21 2 +3:04 +3:14 1.3m
2001 Dec 22 5 −3:13 −2:31 1.3m
2001 Dec 23 12 −3:18 +4:35 1.3m
2001 Dec 24 13 −3:17 +3:08 1.3m
2001 Dec 25 14 −2:20 +1:07 1.3m
2001 Dec 26 8 −2:10 +3:05 1.3m
2001 Dec 27 18 −2:39 +4:14 1.3m
2002 Jan 19 1 +1:27 +1:27 2.4m
2002 Jan 20 2 −1:33 +2:13 2.4m
2002 Jan 22 1 −2:02 −2:02 2.4m
2002 Feb 21 2 +1:26 +1:35 2.4m
2002 Feb 22 2 +0:40 +0:49 2.4m
2002 Oct 26 2 −0:11 +0:05 2.4m
2002 Oct 31 1 +3:21 +3:21 2.4m
2003 Feb 02 1 +0:04 +0:04 2.4m
2005 Sep 11 2 −0:57 −0:40 1.3m
2006 Mar 16 1 +1:56 +1:56 1.3m
2006 Mar 17 5 +2:15 +2:57 1.3m
– 26 –
Table 2—Continued
data N HA start HA end telescope
(UT) [hh:mm] [hh:mm]
2007 Jan 28 9 −1:46 −0:13 1.3m
BI Ori
2006 Jan 20 32 −2:54 +4:05 1.3m
2006 Jan 21 22 −2:47 +2:21 1.3m
2006 Jan 23 6 +1:03 +2:08 1.3m
FO Per
1995 Oct 09 9 −4:40 −3:26 2.4m
1995 Oct 10 5 −5:22 +1:26 2.4m
1996 Dec 19 14 +1:38 +4:03 1.3m
1996 Dec 20 5 +2:08 +3:43 1.3m
2001 Dec 18 9 −2:35 +4:36 1.3m
2004 Nov 18 1 −0:11 −0:11 1.3m
2004 Nov 19 9 +2:52 +5:08 1.3m
2004 Nov 19 2 +3:24 +2:48 2.4m
2004 Nov 20 20 −0:56 +5:10 1.3m
2006 Jan 10 12 −0:35 +2:35 1.3m
2006 Jan 10 1 +4:21 +4:21 2.4m
2006 Jan 11 18 −1:29 +2:41 1.3m
2006 Jan 11 1 −3:12 −3:12 2.4m
2006 Jan 12 4 −1:16 −0:36 1.3m
2006 Jan 13 10 +3:44 +5:37 1.3m
2006 Jan 16 11 −1:36 +1:00 1.3m
– 27 –
Table 3. Spectral Features in Quiescence
E.W.a Flux FWHM b
Feature (Å) (10−16 erg cm−2 s1) (Å)
LX And
Hβ 45 690 18
HeI λ4921 4 60 25
HeI λ5015 3 50 20
Fe λ5169 2 20 14
HeI λ5876 11 120 19
Hα 54 560 17
HeI λ6678 4 40 19
CZ Aql
Hβ 21 670 18
HeI λ4921 1 40 12
HeI λ5015 2 50 13
HeI λ5876 7 160 15
NaD −1 −16 · · ·
Hα 61 1250 27
HeI λ6678 5 90 16
LU Cam
Hγ 10 170 12
HeI λ4471 2 30 10
Hβ 14 190 12
HeI λ4921 1 20 12
HeI λ5015 2 20 13
Fe λ5169 1 10 12
HeI λ5876 4 50 12
Hα 27 240 13
HeI λ6678 3 20 15
HeI λ7067 3 20 · · ·
GZ Cnc
Hγ 26 940 26
HeI λ4471 8 260 28
HeII λ4686 5 140 46
Hβ 36 1040 25
HeI λ4921 5 140 26
HeI λ5015 4 100 28
Fe λ5169 2 60 26
HeI λ5876 9 200 27
– 28 –
Table 3—Continued
E.W.a Flux FWHM b
Feature (Å) (10−16 erg cm−2 s1) (Å)
Hα 38 790 25
HeI λ6678 4 90 31
HeI λ7067 3 60 32
V632 Cyg
Hβ 80 260 24
HeI λ4921 6 20 27
HeI λ5015 8 20 26
Fe λ5169 5 10 26
HeI λ5876 28 70 27
Hα 113 260 27
HeI λ6678 15 30 32
V1006 Cyg
Hβ 74 250 27
HeI λ4921 8 30 30
HeI λ5015 8 20 28
Fe λ5169 8 30 28
HeI λ5876 26 70 34
Hα 108 250 31
HeI λ6678 11 30 38
BF Eri
HeII λ4686 6 240 46
Hβ 23 1060 22
HeI λ5015 2 110 29
HeI λ5876 5 240 21
Hα 27 1200 22
HeI λ6678 3 120 27
BI Ori
Hβ 34 180 32
HeI λ4921 6 30 36
HeI λ5015 6 30 43
Fe λ5169 5 30 39
HeI λ5876 6 30 31
Hα 36 190 31
FO Per
– 29 –
Table 3—Continued
E.W.a Flux FWHM b
Feature (Å) (10−16 erg cm−2 s1) (Å)
Hβ 24 160 9
HeI λ4921 2 20 7
HeI λ5015 3 20 7
Fe λ5169 2 10 7
HeI λ5876 5 30 7
Hα 29 170 9
HeI λ6678 2 10 9
aEmission equivalent widths are counted as positive.
bFrom Gaussian fits.
– 30 –
Table 4. Radial Velocities
Star time a vabs σvabs vemn σvemn
(km s−1) (km s−1) (km s−1) (km s−1)
LX And 53017.7061 · · · · · · −44 −11
LX And 53066.6160 · · · · · · −57 −8
LX And 53066.6208 · · · · · · −37 −8
LX And 53066.6263 · · · · · · −77 −7
aHeliocentric Julian date of mid-integration, minus 2400000.
Note. — All emission-line velocities are of Hα. Emission-line velocity
uncertainties are derived from counting statistics and should be regarded as
lower limits. Table 4 is published in its entirety in the electronic version of
the Publications of the Astronomical Society of the Pacific. A short portion
is shown here for guidance regarding its form and content.
– 31 –
Table 5. Fits to Radial Velocities
Star Algorithma T0
b P K γ N σc
(d) (km s−1) (km s−1) (km s−1)
LX And G2,21,7 53754.6861(12) 0.1509743(5) 81(4) −48(3) 87 15
CZ Aql G2,18,8 53557.880(2) 0.2005(6)d 193(15) 5(10) 89 61
LU Cam D,15 52327.7421(14) 0.1499686(4) 57(4) 44(3) 66 14
GZ Cnc G2,15,9 51992.8928(13) 0.0881(4)d 79(7) 22(5) 71 26
V632 Cyg D,28 53560.9746(13) 0.06377(8) 62(8) −49(5) 51 28
V1006 Cyg G2,20,9 53187.9091(16) 0.09904(9)d 89(8) −11(6) 120 44
BF Eri emission G2,21,7 52574.0027(18) 0.2708801(6) 109(5) −91(3) 126 24
BF Eri absorption · · · 52573.8632(9) 0.2708805(4) 182(4) −72(3) 117 20
BF Eri mean: · · · · · · 0.2708804(4) · · · · · · · · ·
BI Ori 53756.541(3) G2,35,9 0.1915(5) 131(13) 24(9) 60 44
FO Per (shorter) D,11 52261.872(3) 0.1467(4)d 27(3) −49(2) 131 17
FO Per (longer) D,11 52261.893(3) 0.1719(5)d 27(3) −45(2) 131 17
Note. — Parameters of sinusoidal least-squares fits to the velocity timeseries, of the form v(t) = γ +K sin(2π(t −
T0)/P ). The quoted parameter uncertainties are based on the assumption that the scatter of the data around the best
fit is a realistic estimate of the velocity uncertainty (Cash 1979). In practice this is more conservative than assuming
that counting statistics uncertainties are realistic.
aCode for the convolution function used to derive emission line velocities; D = derivative of a Gaussian, G2 =
double-Gaussian function (see text). For the D algorithm the number that follows gives the line full-width at half-
maximum, in Å, for which the function is optimized; for the G2 algorithm the two numbers are respectively the
separation of the two Gaussians and their individual FWHMa, again in Å.
bHeliocentric Julian Date minus 2400000. The epoch is chosen to be near the center of the time interval covered
by the data, and within one cycle of an actual observation.
cRoot-mean-square residual of the fit.
dThe period determination in this case is complicated by unknown numbers of cycles between observing runs; the
uncertainty given here is an estimate based on fits to individual runs. Only certain values within the period range
given here are allowed; see text for details.
This figure "f6.png" is available in "png"
 format from:
http://arxiv.org/ps/0704.0948v1
http://arxiv.org/ps/0704.0948v1
	Introduction
	Observations, Reductions, and Analysis
	Observations
	Reductions
	Analysis
	Notes on Individual Objects
	LX Andromedae
	CZ Aquilae
	LU Camelopardalis
	GZ Cancri
	V632 Cygni
	V1006 Cygni
	BF Eri
	BI Ori
	FO Per
	Summary
ABSTRACT
  We present optical spectroscopy of nine cataclysmic binary stars, mostly
dwarf novae, obtained primarily to determine orbital periods Porb. The stars
and their periods are LX And, 0.1509743(5) d; CZ Aql, 0.2005(6) d; LU Cam,
0.1499686(4) d; GZ Cnc, 0.0881(4) d; V632 Cyg, 0.06377(8) d; V1006 Cyg,
0.09903(9) d; BF Eri, 0.2708804(4) d; BI Ori, 0.1915(5) d; and FO Per, for
which Porb is either 0.1467(4) or 0.1719(5) d.
  Several of the stars proved to be especially interesting. In BF Eri, we
detect the absorption spectrum of a secondary star of spectral type K3 +- 1
subclass, which leads to a distance estimate of approximately 1 kpc. However,
BF Eri has a large proper motion (100 mas/yr), and we have a preliminary
parallax measurement that confirms the large proper motion and yields only an
upper limit for the parallax. BF Eri's space velocity is evidently large, and
it appears to belong to the halo population. In CZ Aql, the emission lines have
strong wings that move with large velocity amplitude, suggesting a
magnetically-channeled accretion flow. The orbital period of V1006 Cyg places
it squarely within the 2- to 3-hour "gap" in the distribution of cataclysmic
binary orbital periods.

<|endoftext|><|startoftext|>
Introduction and motivation
The theory of variational calculus for problems with compositions has been re-
cently initiated in [5]. The new theory considers integral functionals that depend
not only on functions q(·) and their derivatives q̇(·), but also on compositions
(q ◦ q)(·) of q(·) with q(·). As far as chaos is often a byproduct of iteration
∗Accepted for an oral presentation at the 7th IFAC Symposium on Nonlinear Control
Systems (NOLCOS 2007), to be held in Pretoria, South Africa, 22–24 August, 2007.
†This work is part of the author’s PhD project. Supported by the Portuguese Institute for
Development (IPAD).
‡Supported by the Centre for Research on Optimization and Control (CEOC) through
the Portuguese Foundation for Science and Technology (FCT), cofinanced by the European
Community fund FEDER/POCTI.
http://arxiv.org/abs/0704.0949v1
of nonlinear maps [2], such problems serve as an interesting model for chaotic
dynamical systems. Let us briefly review this relation (for more details, we refer
the interested reader to [3, 4, 5]). Let q : [0, 1] → [0, 1] be a piecewise mono-
tonic map with probability density function fq(·), which captures the long term
statistical behavior of a nonlinear dynamical system. It is natural (see [2]) to
consider the problem of minimizing or maximizing the functional
I[q(·), fq(·)] =
(q(t) − t)
fq(t)dt , (1)
which depends on q(·) and its probability density function fq(·) (usually a com-
plicated function of q(·)). It turns out that fq(·) is the fixed point of the
Frobenius-Perron operator Pq[·] associated with q(·). For a piecewise mono-
tonic map q : [0, 1] → [0, 1] with r pieces, Pq[·] has the representation
Pq[f ](t) =
v∈{q−1(t)}
|q̇(v)|
where for any point t ∈ [0, 1] the set {q−1(t)} consists of at most r points. The
fixed point fq(·) associated with an ergodic map q(·) can be expressed as the
limit
fq = lim
P iq [1] , (2)
where 1 is the constant function 1 on [0, 1]. Substituting (2) into (1), and using
the adjoint property [2, Prop. 4.2.6], one eliminates the probability density
function fq(·), obtaining (1) in the form
I[q(·)] =
t, q(t), q(2)(t), q(3)(t), . . .
where we are using the notation q(i)(·) to denote the i-th composition of q(·)
with itself: q(1)(t) = q(t), q(2)(t) = (q ◦ q)(t), q(3)(t) = (q ◦ q ◦ q)(t), etc. In [5] a
generalized Euler-Lagrange equation, which involves the inverse images of the
extremizing function q(·) (cf. (11)), was proved for such functionals in the cases
t, q(t), q(2)(t)
t, q(t), q̇(t), q(2)(t)
t, q(t), q(2)(t), q(3)(t)
To the best of our knowledge, these generalized Euler-Lagrange equations com-
prise all the available results on the subject. Thus, one concludes that the
theory of variational calculus with compositions is in its childhood: much re-
mains to be done. Here we go a step further in the theory of functionals con-
taining compositions. We are mainly interested in Noether’s classical theo-
rem, which is one of the most beautiful results of the calculus of variations
and optimal control, with many important applications in Physics (see e.g.
[6, 13, 14]), Economics (see e.g. [1, 17]), and Control Engineering (see e.g.
[11, 15, 18, 20, 22]), and source of many recent extensions and developments
(see e.g. [7, 8, 9, 10, 16, 19, 21]). Noether’s symmetry theorem describes the
universal fact that invariance with respect to some family of parameter transfor-
mations gives rise to the existence of certain conservation laws, i.e. expressions
preserved along the Euler-Lagrange or Pontryagin extremals of the problem.
Our results are a generalized DuBois-Reymond necessary optimality condition
(Theorem 7), and a generalized Noether’s theorem (Theorem 13) for function-
als of the form
t, q(t), q̇(t), q(2)(t)
dt. In §4 an illustrative example is
presented.
2 Preliminaries – review of classical results of
the calculus of variations
There exist many different ways to prove the classical Noether’s theorem (cf.
e.g. [6, 12, 13, 17]). We review here one of those proofs, which is based on the
DuBois-Reymond necessary condition. Although this proof is not so common
in the literature of Noether’s theorem, it turns out to be the most suitable
approach when dealing with functionals containing compositions.
Let us consider the fundamental problem of the calculus of variations:
I[q(·)] =
L (t, q(t), q̇(t)) dt −→ min (P)
under the boundary conditions q(a) = qa and q(b) = qb, where q̇ =
, with q(·)
a piecewise-smooth function, and the Lagrangian L : [a, b] × Rn × Rn → R is a
C2 function with respect to all its arguments.
The concept of symmetry has a very important role in mathematics and its
applications. Symmetries are defined through transformations of the system
that leave the problem invariant.
Definition 1 (Invariance of (P)). The integral functional (P) it said to be
invariant under the ε-parameter infinitesimal transformations
t̄ = t + ετ(t, q) + o(ε) ,
q̄(t) = q(t) + εξ(t, q) + o(ε) ,
where τ and ξ are piecewise-smooth, if
L (t, q(t), q̇(t)) dt =
∫ t̄(tb)
t̄(ta)
L (t̄, q̄(t̄), ˙̄q(t̄)) dt̄ (4)
for any subinterval [ta, tb] ⊆ [a, b].
Along the work we denote by ∂iL the partial derivative of L with respect to
its i-th argument.
Theorem 2 (Necessary condition of invariance). If functional (P) is invariant
under the infinitesimal transformations (3), then
∂1L (t, q, q̇) τ + ∂2L (t, q, q̇) · ξ + ∂3L (t, q, q̇) ·
ξ̇ − q̇τ̇
+ L (t, q, q̇) τ̇ = 0 . (5)
Proof. Since (4) is to be satisfied for any subinterval [ta, tb] ⊆ [a, b], equality (4)
is equivalent to
t + ετ + o(ε), q + εξ + o(ε),
q̇ + εξ̇ + o(ε)
1 + ετ̇ + o(ε)
= L (t, q, q̇) . (6)
We obtain (5) differentiating both sides of (6) with respect to ε, and then setting
ε = 0.
Another very important notion in mathematics and its applications is the
concept of conservation law. One of the most important conservation laws was
proved by Leonhard Euler in 1744: when the Lagrangian L(q, q̇) corresponds to
a system of conservative points, then
− L (q(t), q̇(t)) +
(q(t), q̇(t)) · q̇(t) ≡ constant , (7)
t ∈ [a, b], holds along the solutions of the Euler-Lagrange equations.
Definition 3 (Conservation law). A quantity C(t, q, q̇) defines a conservation
law if
C(t, q(t), q̇(t)) = 0 , t ∈ [a, b] ,
along all the solutions q(·) of the Euler-Lagrange equation
∂3L (t, q, q̇) = ∂2L (t, q, q̇) . (8)
Conservation laws can be used to lower the order of the Euler-Lagrange equa-
tions (8) and simplify the resolution of the respective problems of the calculus of
variations and optimal control [16]. Emmy Amalie Noether formulated in 1918
a very general principle on conservation laws, with many important implications
in modern physics, economics and engineering. Noether’s principle asserts that
“the invariance of the functional
L (t, q(t), q̇(t)) dt under one-parameter in-
finitesimal transformations (3), imply the existence of a conservation law”. One
particular example of application of Noether’s theorem gives (7), which corre-
sponds to conservation of energy in classical mechanics or to the income-wealth
law of economics.
Theorem 4 (Noether’s theorem). If functional (P) is invariant, in the sense
of the Definition 1, then
C(t, q, q̇) = ∂3L (t, q, q̇) · ξ(t, q) + (L(t, q, q̇) − ∂3L (t, q, q̇) · q̇) τ(t, q) (9)
defines a conservation law.
We recall here the proof of Theorem 4 by means of the classical necessary
optimality condition of DuBois-Reymond.
Theorem 5 (DuBois-Reymond condition). If q(·) is a solution of problem (P),
∂1L (t, q, q̇) =
[L (t, q, q̇) − ∂3L (t, q, q̇) · q̇] . (10)
Proof. The DuBois-Reymond necessary optimality condition is easily proved
using the Euler-Lagrange equation (8):
[L (t, q, q̇) − ∂3L (t, q, q̇) · q̇]
= ∂1L (t, q, q̇) + ∂2L (t, q, q̇) · q̇ + ∂3L (t, q, q̇) · q̈
∂3L (t, q, q̇) · q̇ − ∂3L (t, q, q̇) · q̈
= ∂1L (t, q, q̇) + q̇ ·
∂2L (t, q, q̇) −
∂3L (t, q, q̇)
= ∂1L (t, q, q̇) .
Proof. (of Theorem 4) To prove the Noether’s theorem, we use the Euler-
Lagrange equation (8) and the DuBois-Reymond condition (10) into the neces-
sary condition of invariance (5):
0 = ∂1L (t, q, q̇) τ + ∂2L (t, q, q̇) · ξ
+ ∂3L (t, q, q̇) ·
ξ̇ − q̇τ̇
+ L (t, q, q̇) τ̇
= ∂2L (t, q, q̇) · ξ + ∂3L (t, q, q̇) · ξ̇ + ∂1L (t, q, q̇) τ
+ τ̇ (L (t, q, q̇) − ∂3L (t, q, q̇) · q̇)
∂3L (t, q, q̇) · ξ + ∂3L (t, q, q̇) · ξ̇
(L (t, q, q̇) − ∂3L (t, q, q̇) · q̇) τ
+ τ̇ (L (t, q, q̇) − ∂3L (t, q, q̇) · q̇)
∂3L (t, q, q̇) · ξ +
L(t, q, q̇) − ∂3L (t, q, q̇) · q̇
3 Main results
We consider the following problem of the calculus of variations with composition
of functions:
I[q(·)] =
L (t, q(t), q̇(t), z(t)) dt −→ min (Pc)
subject to given boundary conditions q(a) = qa, q(b) = qb, z(a) = za, and
z(b) = zb, where q̇ =
and z(t) = (q ◦ q)(t). We assume that the Lagrangian
L : [a, b] × R × R × R → R is a function of class C2 with respect to all the
arguments, and that admissible functions q(·) are piecewise-smooth. The main
result of [5] is an extension of the Euler-Lagrange equation (8) for problems of
the calculus of variations (Pc).
Theorem 6 ([5]). If q(·) is a weak minimizer of problem (Pc), then q(·) satisfies
the Euler-Lagrange equation
∂2L (x, q(x), q̇(x), z(x)) −
∂3L (x, q(x), q̇(x), z(x))
+ ∂4L (x, q(x), q̇(x), z(x)) q̇(q(x)) +
t=q−1(x)
∂4L (t, q(t), q̇(t), z(t))
|q̇(t)|
= 0 (11)
for any x ∈ (a, b).
3.1 Generalized DuBois-Reymond condition
We begin by proving an extension of the DuBois-Reymond necessary optimality
condition (10) for problems of the calculus of variations (Pc).
Theorem 7 (cf. Theorem 5). If q(·) is a weak minimizer of problem (Pc), then
q(·) satisfies the DuBois-Reymond condition
L (x, q(x), q̇(x), z(x)) − ∂3L (x, q(x), q̇(x), z(x)) q̇(x)
= ∂1L (x, q(x), q̇(x), z(x)) − q̇(x)
t=q−1(x)
∂4L (t, q(t), q̇(t), z(t))
|q̇(t)|
for any x ∈ (a, b).
Remark 8. If L (t, q, q̇, z) = L (t, q, q̇), then (12) coincides with the classical
DuBois-Reymond condition (10).
Proof. To prove Theorem 7 we use the Euler-Lagrange equation (11):
L (x, q, q̇, z) − ∂3L (x, q, q̇, z) q̇
= ∂1L (x, q, q̇, z) + ∂2L (x, q, q̇, z) q̇
+ ∂3L (x, q, q̇, z) q̈ + ∂4L (x, q, q̇, z) q̇(q(x))q̇
∂3L (x, q, q̇, z) − ∂3L (x, q, q̇, z) q̈
= ∂1L (x, q, q̇, z) + q̇
∂2L (x, q, q̇, z)
+ ∂4L (x, q, q̇, z) q̇(q(x)) −
∂3L (x, q, q̇, z)
= ∂1L (x, q, q̇, z) − q̇(x)
t=q−1(x)
∂4L (t, q(t), q̇(t), z(t))
|q̇(t)|
3.2 Noether’s theorem for functionals containing compo-
sitions
We introduce now the definition of invariance for the functional (Pc). As done
in the proof of Theorem 2 (see (6)), we get rid off of the integral signs in (4).
Definition 9 (cf. Definition 1). We say that functional (Pc) is invariant under
the infinitesimal transformations (3) if
L (t̄, q̄(t̄), q̄′(t̄), z̄(t̄))
= L (t, q(t), q̇(t), z(t)) + o(ε) , (13)
where q̄′ = dq̄/dt̄.
Along the work, in order to simplify the presentation, we sometimes omit
the arguments of the functions.
Theorem 10 (cf. Theorem 2). If functional (Pc) is invariant under the in-
finitesimal transformations (3), then
∂1L (t, q, q̇, z) τ + ∂2L (t, q, q̇, z) ξ + ∂3L (t, q, q̇, z)
ξ̇ − q̇τ̇
+ ∂4L (t, q, q̇, z) q̇(q(t))ξ + ∂4L (t, q, q̇, z) ξ(q(t)) + Lτ̇ = 0 . (14)
Proof. Equation (13) is equivalent to
t + ετ + o(ε), q + εξ + o(ε),
q̇ + εξ̇ + o(ε)
1 + ετ̇ + o(ε)
q(q + εξ + o(ε)) + εξ(q + εξ + o(ε))
× (1 + ετ̇ + o(ε))
= L (t, q, q̇, z) + o(ε) . (15)
We obtain equation (14) differentiating both sides of equality (15) with respect
to the parameter ε, and then setting ε = 0.
Remark 11. Using the Frobenius-Perron operator (see [2, Chap. 4]) and the
Euler-Lagrange equation (11), we can write (14) in the following form:
∂1L (x, q, q̇, z) τ + ∂2L (x, q, q̇, z) ξ
+ ∂3L (x, q, q̇, z)
ξ̇ − q̇τ̇
+ ∂4L (x, q, q̇, z) q̇(q(x))ξ
t=q−1(x)
∂4L (t, q(t), q̇(t), z(t))
|q̇(t)|
ξ + Lτ̇
= ∂1L (x, q, q̇, z) τ +
∂3L (x, q, q̇, z) ξ
+ ∂3L (x, q, q̇, z)
ξ̇ − q̇τ̇
+ Lτ̇ = 0 . (16)
Definition 12 (Conservation law for (Pc)). We say that a quantity C (x, q, q̇, z)
defines a conservation law for functionals containing compositions if
C (x, q(x), q̇(x), z(x)) = 0
along all the solutions q(·) of the Euler-Lagrange equation (11).
Our main result is an extension of Noether’s theorem for problems of the
calculus of variations (Pc) containing compositions.
Theorem 13 (Noether’s theorem for (Pc)). If functional (Pc) is invariant, in
the sense of the Definition 9, and there exists a function f = f(x, q, q̇, z) such
(x, q(x), q̇(x), z(x)) = τ q̇(x)
t=q−1(x)
∂4L (t, q(t), q̇(t), z(t))
|q̇(t)|
, (17)
C (x, q(x), q̇(x), z(x))
L(x, q(x), q̇(x), z(x)) − ∂3L (x, q(x), q̇(x), z(x)) q̇
τ(x, q)
+ ∂3L (x, q(x), q̇(x), z(x)) ξ(x, q) + f(x, q(x), q̇(x), z(x)) (18)
defines a conservation law (Definition 12).
Remark 14. If L (x, q, q̇, z) = L (x, q, q̇), then f is a constant and expression
(18) is equivalent to the conserved quantity (9) given by the classical Noether’s
theorem.
Proof. To prove the theorem, we use conditions (12) and (17) in (16):
0 = ∂1L (x, q, q̇, z) τ +
∂3L (x, q, q̇, z) ξ
+ ∂3L (x, q, q̇, z)
ξ̇ − q̇τ̇
+ Lτ̇
L (x, q, q̇, z) − ∂3L (x, q, q̇), z)) q̇
+ τ̇ [L (x, q, q̇, z) − ∂3L (x, q, q̇), z)) q̇]
+ ξ̇∂3L (x, q, q̇, z) + ξ
∂3L (x, q, q̇, z)
+ τ q̇(x)
t=q−1(x)
∂4L (t, q(t), q̇(t), z(t))
|q̇(t)|
∂3L (x, q, q̇, z) ξ +
L (x, q, q̇, z)
− ∂3L (x, q, q̇, z) q̇
τ + f(x, q, q̇, z)
4 An example
Let us consider the problem
I[q(·)] =
[x + q(x) + q(q(x))] dx −→ min
q(0) = 1 , q(1) = 0 , (19)
q(q(0)) = 0 , q(q(1)) = 1 .
In [5, §3] it is proven that (19) has the extremal
q(x) =
q1(x) = −2x + 1 , x ∈
q2(x) = −2x + 2 , x ∈
that is, (20) satisfies the Euler-Lagrange equation (11) for L(x, q, q̇, z) = 1
(x + q + z).
We now illustrate the application of our Theorem 13 to this problem. First, we
need to determine the variational symmetries. Substituting the Lagrangian L
in (16) we obtain that
x + q + z
τ̇ = 0 . (21)
The differential equation (21) admits the solution
τ = ke−
x+q+z , (22)
where k is an arbitrary constant. From Theorem 13 we conclude that
(x + q1 + z1)τ +
τ q̇1
|q̇1(t)|
dx, x ∈
, (23)
(x + q2 + z2)τ +
τ q̇2
|q̇2(t)|
dx, x ∈
, (24)
defines a conservation law, where τ is obtained from (22):
τ = ke−
3x = kelnx
= kx−
3 , x ∈ [0, 1] . (25)
Since for this problem we know the extremal, we can verify the validity of the
obtained conservation law directly from Definition 12: substituting equalities
(20) and (25) in (23) and (24), we obtain, as expected, a constant (zero in this
case):
(x + q1 + z1)τ +
τ q̇1
|q̇1(t)|
= 3kxτ − 2
τdx = 3kx
3 − 3kx
3 = 0 ,
(x + q2 + z2)τ +
τ q̇2
|q̇2(t)|
= 3kxτ − 2
τdx = 3kx
3 − 3kx
3 = 0 .
5 Conclusions
We proved a generalization (i) of the necessary optimality condition of DuBois-
Reymond, (ii) of the celebrated Noether’s symmetry theorem, for problems of
the calculus of variations containing compositions (respectively Theorems 7 and
13). Our main result is illustrated with the example studied in [5].
The compositional variational theory is in its childhood, so that much re-
mains to be done. In particular, it would be interesting to obtain an Hamil-
tonian formulation and to study more general optimal control problems with
compositions.
Acknowledgements
The authors are grateful to Pawe l Góra who shared Chapter 4 of [2].
References
[1] P. Askenazy (2003). Symmetry and optimal control in economics. J. Math.
Anal. Appl. 282, 603–613.
[2] P. Bracken, P. Góra (1997). Laws of chaos . Birkhaüser. Bassel.
[3] P. Bracken, P. Góra, A. Boyarsky (2001). Deriving chaotic dynamical sys-
tems from energy functionals. Stochastics and Dynamics 1, 377–388.
[4] P. Bracken, P. Góra, A. Boyarsky (2002). A minimal principle for chaotic
systems. Physica D 166, 63–75.
[5] P. Bracken, P. Góra, A. Boyarsky (2004). Calculus of variations for func-
tionals containing compositions. J. Math. Anal. Appl. 296, 658–664.
[6] D. S. Djukic, A. M. Strauss (1980). Noether’s theory for nonconservative
generalised mechanical systems. J. Phys. A 13, no. 2, 431–435.
[7] G. S. F. Frederico, D. F. M. Torres (2006). Constants of motion for frac-
tional action-like variational problems. Int. J. Appl. Math. 19, no. 1, 97–104.
[8] G. S. F. Frederico, D. F. M. Torres (2007). Nonconservative Noether’s
theorem in optimal control. Int. J. Tomogr. Stat. 5, no. W07, 109–114.
[9] J. Fu, L. Chen (2003). Non-Noether symmetries and conserved quantities
of nonconservative dynamical systems. Phys. Lett. A 317, no. 3-4, 255–259.
[10] P. D. F. Gouveia, D. F. M. Torres (2005). Automatic computation of con-
servation laws in the calculus of variations and optimal control. Comput.
Methods Appl. Math. 5, no. 4, 387–409.
[11] A. Gugushvili, O. Khutsishvili, V. Sesadze, G. Dalakishvili, N.
Mchedlishvili, T. Khutsishvili, V. Kekenadze, D. F. M. Torres (2003).
Symmetries and Conservation Laws in Optimal Control Systems . Georgian
Technical University, Tbilisi.
[12] J. Jost, X. Li-Jost (1998). Calculus of variations . Cambridge Univ. Press.
Cambridge.
[13] J. D. Logan (1987). Applied mathematics – a contemporary approach.
Wiley-Interscience Publication, John Wiley & Sons, Inc. New York.
[14] S. Moyo, P. G. L. Leach (1998). Noether’s theorem in classical mechanics.
Politehn. Univ. Bucharest Sci. Bull. Ser. A Appl. Math. Phys. 60, no. 3-4,
221–234.
[15] H. Nijmeijer, A. van der Schaft (1982). Controlled invariance for nonlinear
systems. IEEE Trans. Automat. Control 27, no. 4, 904–914
[16] E. A. M. Rocha, D. F. M. Torres (2006). Quadratures of Pontryagin ex-
tremals for optimal control problems. Control and Cybernetics 35, no. 4.
[17] R. Sato, R. V. Ramachandran (1990). Conservation laws and symmetry
– Applications to economics and finance. Kluwer Academic Publishers.
Boston, MA.
[18] H. J. Sussmann (1995). Symmetries and integrals of motion in optimal
control. Geometry in nonlinear control and differential inclusions (Warsaw,
1993), Polish Acad. Sci., Warsaw, 379–393.
[19] D. F. M. Torres (2002). On the Noether theorem for optimal control, Eu-
ropean Journal of Control 8, no. 1, 56–63.
[20] D. F. M. Torres (2002). Conservation laws in optimal control. Dynamics,
Bifurcations and Control, Springer-Verlag, Lecture Notes in Control and
Information Sciences, Berlin, Heidelberg, 287–296.
[21] D. F. M. Torres (2004). Proper extensions of Noether’s symmetry theorem
for nonsmooth extremals of the calculus of variations. Commun. Pure Appl.
Anal. 3, no. 3, 491–500.
[22] A. van der Schaft (1981/82). Symmetries and conservation laws for Hamil-
tonian systems with inputs and outputs: a generalization of Noether’s the-
orem. Systems & Control Letters 1, no. 2, 108–115.
	Introduction and motivation
	Preliminaries – review of classical results of the calculus of variations
	Main results
	Generalized DuBois-Reymond condition
	Noether's theorem for functionals containing compositions
	An example
	Conclusions
ABSTRACT
  The study of problems of the calculus of variations with compositions is a
quite recent subject with origin in dynamical systems governed by chaotic maps.
Available results are reduced to a generalized Euler-Lagrange equation that
contains a new term involving inverse images of the minimizing trajectories. In
this work we prove a generalization of the necessary optimality condition of
DuBois-Reymond for variational problems with compositions. With the help of the
new obtained condition, a Noether-type theorem is proved. An application of our
main result is given to a problem appearing in the chaotic setting when one
consider maps that are ergodic.

<|endoftext|><|startoftext|>
Introduction
	The Data
	Distribution of z with the age
	Distribution of z with the maximum heliocentric distance
	z from YOCs
	z from OB stars
	Exponential decay of the z distribution
	Distribution of z with the Galactic longitude
	Concluding remarks
	REFERENCES
ABSTRACT
  We have carried out a comparative statistical study for the displacement of
the Sun from the Galactic plane (z_\odot) following three different methods.
The study has been done using a sample of 537 young open clusters (YOCs) with
log(Age) < 8.5 lying within a heliocentric distance of 4 kpc and 2030 OB stars
observed up to a distance of 1200 pc, all of them have distance information. We
decompose the Gould Belt's member in a statistical sense before investigating
the variation in the z_\odot estimation with different upper cut-off limits in
the heliocentric distance and distance perpendicular to the Galactic plane. We
found z_\odot varies in a range of ~ 13 - 20 pc from the analys is of YOCs and
~ 6 - 18 pc from the OB stars. A significant scatter in the z_\odot obtained
due to different cut-off values is noticed for the OB stars although no such
deviation is seen for the YOCs. We also determined scale heights of
56.9(+3.8)(-3.4} and 61.4(+2.7)(-2.4) pc for the distribution of YOCs and OB
stars respectively.

<|endoftext|><|startoftext|>
Introduction
Studies of extremal black holes in string theory have regained importance with the advent of
the attractor mechanism. In its simplest form the attractor mechanism states that the near
horizon geometry of an extremal black hole is fixed in terms of its charges. Further, it has been
realized that there is a single function, called the entropy function, which determines the near
horizon geometry of extremal black holes [1] (see also [2]). Even though the entropy function
provides the non-zero charges such as the electric, magnetic charges and angular momenta, for
many extremal black holes, it does not always give the correct charges. For instance, there are
apparent discrepancies when there are Chern-Simons terms for the gauge fields present in the
Lagrangian. This is the case, for instance, in 5d minimal (and minimally gauged) supergravities.
On the other hand it has been believed [4] that the near horizon geometry of an extremal
rotating black hole of 5d supergravities knows about only part of the the full black hole angular
momentum, called the horizon angular momentum. In [4] this has been argued to be the case
for the BMPV black hole [16].
Given that finding the near horizon geometries of the yet to be discovered extremal black
hole solutions might be easier than finding the full black hole solutions, it will be useful to
have a prescription to extract the quantum numbers of the full black hole from its near horizon
geometry. In this note we show, by careful analysis of the near horizon geometries of these black
holes, that one can find the full set of asymptotic charges and angular momenta of extremal
rotating black holes that satisfy certain assumptions.
For this, we first construct gravitational Noether charges following Wald [5] for several su-
pergravity theories. These charges can be defined for Killing vectors of any given solution of the
theory of interest. We mainly focus on type IIB in 10d, minimal and gauged supergravities in 5d.
We present closed form expressions for the Nother-Wald charges of these theories as integrals
over compact submanifolds of co-dimension 2 of any given solution.
The 5d minimal gauged supergravity can be obtained by a consistent truncation of type IIB
reduced on S5 [22] (see also [23]). We show that the charges of the 5d theory can be obtained by
the same dimensional reduction of the corresponding 10d charges. We further reduce the theory
down to 3 dimensions and show that the Nother-Wald charges corresponding to Killing vectors
that generate translations along compact directions are the same as the usual Noether charges
for the corresponding Kaluza-Klein gauge fields in the dimensionally reduced theory. We use
the understanding of the charges in the reduced theory to show how the entropy function may
be modified to reproduce the charges of the 5d black holes.
We will argue that these Noether-Wald charges can be used to extract the charges of extremal
black holes from their near horizon geometries under certain assumptions which will be discussed
later on. Thus the formulae presented in this paper should prove useful in extracting the con-
served charges of an extremal black hole from only its near-horizon geometry without having to
know the full black hole solution. We exhibit the successes and limitations of our formulae by
considering the examples of Gutowski-Reall black holes [12] and their generalizations [17] and
BMPV [16, 4] black holes, black rings [18] and the 10d lift of Gutowski-Reall black holes [13].
The analysis of the conserved charges in this paper can be applied to many geometries other
than the extremal black holes considered here and in particular to non-extremal black holes too.
In addition to the charges of a black hole, one is typically interested in the entropy, the mass,
as well as the laws of black hole thermodynamics. Up to now, the entropy has been defined in
terms of a Noether charge only for non-extremal black holes [5]. To find these thermodynamic
quantities and the laws of thermodynamics on the “extremal shell”, it was necessary to take
the extremal limit of the relations defined for the non-extremal black holes (see for instance [1]).
Furthermore, computations of quantities such as the mass, the euclidean action and relations like
the first law and the Smarr formula relied on computing quantities in the asymptotic geometry.
Hence, it would be desirable to derive appropriate relations intrinsically for extremal black holes,
and with only minimal reference to the existence of an asymptotic geometry.
With this motivation, in the second part of the paper, we propose a definition of the entropy
for extremal black holes in the near horizon geometry that does not require taking the extremal
limit of Wald’s entropy, but agrees with it. With a similar approach, we also derive the extremal
limit of the first law from the extremal geometry, assuming only that the near-horizon geometry
be connected to some asymptotic geometry. This definition of the entropy further allows us
to derive a statistical version of the first law [6]. We also show that this gives us the entropy
function directly from a study of the appropriate Noether charge in the near-horizon geometry
of extremal black holes. We will comment on the interpretation of the mass as well, from the
point of view of the near horizon solution.
The rest of the paper is organized as follows. In section 2, we review Wald’s construction of
gravitational Noether charges and use it to derive the charges for type IIB supergravity (with
the metric and the five-form fields) and for the 5d minimal and gauged supergravity theories and
show that they are related by dimensional reduction. In section 3, we show that the Noether-
Wald charges are identical to the standard Noether charges for the Kaluza-Klein U(1) gauge
fields of the corresponding compact Killing vectors. We also discuss various assumptions under
which these charges, when evaluated anywhere in the interior of the geometry, match with the
standard Komar integrals evaluated in the asymptotes. Some issues of gauge (in)dependence
of our charges are also address there. In section 4, we demonstrate how our formulae work on
several examples of interest. The readers who are only interested in the formalism may skip this
section. In section 5, we turn to modifying the entropy function formalism to include the Chern-
Simons terms. In section 6, we discuss thermodynamics of the extremal black holes and define
various physical quantities like the entropy, chemical potentials for the charges and the mass.
We end with conclusions in section 7. The example for black rings is given in the appendix.
2 Charges from Noether-Wald construction
Here we derive expressions for the gravitational Noether charges corresponding to Killing isome-
tries of the gravitational actions we are interested in following Wald [5, 7]. We review first
the general formalism and point out some relevant subtleties. Then we construct these charges
for 10d type IIB supergravity and for minimally gauged supergravity and Einstein-Maxwell-CS
theory in 5d. Finally, we show how the 10d and 5d expressions can be related by dimensional
reduction.
2.1 Review of Noether construction
Let us first review the construction of the charges and discuss some of the relevant properties. In
[7], Lee and Wald described how to construct the Noether charges for diffeomorphism symmetries
of a Lagrangian L(φi = gµν , Aµ, · · · ), a d-form in d spacetime dimensions. For this, one first
writes the variation of L under arbitrary field variations δφi as
δ L = Ei(φ) δφ
i + dΘ(δφ) (1)
where Ei(φ) = 0 are the equations of motion and Θ is a (d − 1)-form. Secondly, one finds the
variation of the Lagrangian under a diffeomorphism
δξL = d(iξ L), (2)
where ξa is the (infinitesimal) generator of a diffeomorphism. Then one defines the (d− 1)-form
current Jξ
Jξ = Θ(δξφ)− iξ L (3)
where δξφ
i are the variations of the fields under the particular diffeomorphism. Then Jξ are
conserved, i.e. dJξ = 0, for any configuration satisfying the equations of motion. Since Jξ is
closed, one can write (for trivial cohomology)
Jξ = dQξ (4)
for some (d − 2)-form charge Qξ. Now consider ξ to be a Killing vector and suppose that the
field configurations on the given solution respect the symmetry generated by it, Lξφi = 0. Since
Θ(δξφ
i) is linear in Lξφi we have Θ(δξφi) = 0 and so Jξ = −iξL. Next, let us illustrate that the
charge defined as the integral
Qξ over a compact (d-2)-surface Σr is conserved when (i) ξ is
a Killing vector generating a periodic isometry or (ii) when the current Jξ = 0 (as for Killing
vectors in theories with L = 0 on the solutions). Consider a (d − 1)-hypersurface M12 which is
foliated by compact (d−2)-hypersurfaces Σr over some interval R12 ⊂ R. Using Gauss’ theorem
one has
Jξ (5)
for ∂M12 = {Σ1,Σ2}. If Jξ = 0, it follows that the charge
Qξ does not depend on Σr and
therefore is conserved along the direction r. Next, let us assume that ξ generates translations
along a periodic direction of Σr. In general,
Jξ receives contributions from terms in Jξ that
contain the one-form ξ̂ dual to the Killing vector field ξ and terms that do not. The terms not
involving ξ̂ vanish by the periodicity of ξ. Since Jξ = −iξL, there are no terms involving ξ̂.
Therefore
Qξ is again independent of Σr.
We will now discuss two important ambiguities in the above prescription. The first one is
that the charge density defined by the equation Jξ = dQξ is ambiguous as Qξ → Qξ + dΛξ does
not change Jξ for some (d-3)-form Λξ. The extra term does not contribute to the integrated
charge only if Λξ is a globally defined (d-3)-form on Σr, that is, it is periodic in the coordinates
of Σr and non-singular. While this is the case for most of our examples, there may be situations
in which, for instance, some gauge potentials that go into Qξ are only locally defined. Similarly,
conservation of Qξ is not guaranteed if any component of Qξ ∈ Ωd−1
is not globally defined.
To illustrate this, consider the Jξ = dQξ = 0 case and let n be a normal to Σr, such that dn = 0.
Qξ = (ind)
indQξ +
d (inQξ) =
d (inQξ) , (6)
which is only forced to vanish if inQξ is globally defined on Σr. The second, and a more impor-
tant, ambiguity comes from possible boundary terms in the Lagrangian L. For the boundary
terms Sbdy. =
Lbdy. =
dLbdy., the variation that gives the equations of motion is done on
the boundary,
δξSbdy. =
δLbdy.
δLbdy.
δξLbdy. =
d(δξLbdy.). (7)
Since δξLbdy. = iξ(dLbdy.) + d(iξLbdy.), the current is just given by
Jξ = −iξ(dLbdy.) + iξ(dLbdy.) + d(iξLbdy.) (8)
and hence the charge is Qξ = iξLbdy.. This implies that boundary terms contribute only to
conserved charges
Qξ of (Killing) vectors that do not lie in Σr.
2.2 The Noether-Wald charges for type IIB supergravity
Now we would like to find the Noether-Wald charges in 10d type IIB supergravity for config-
urations with just the metric and the 5-form turned on. As is standard, we work with the
action
LIIB =
16πG10
−g [R−
4 · 5!
F 2(5)] (9)
neglecting the self-duality of the 5-form and impose it only at the level of the equations of motion.
We follow the procedure outlined in section 2.1 to find the Noether-Wald currents. Using the
variations
−g R) =
−g [Rµν − 12Rgµν ] δg
−g gµν [∇σδ̄Γσµν −∇ν δ̄Γσµσ] and
−g F 2(5)) =
−g [5F (5)
µκσωλ
F (5) κσωλν − 12 gµν F
(5)] δg
−2 · 5![δC(4)
−g Fµνσωλ
)− ∂µ(δC
µνσωλ
−g)], (10)
where δ̄Γλµν =
gλσ [∇µδgσν +∇νδgµσ −∇σδgµν ], one can find the equations of motion
Rµν −
µκσωλ
F (5)ν
= 0 and ∂µ(
−g Fµνσωλ
) = 0. (11)
These are supplemented by the self-duality condition ⋆(10)F
(5) = F (5). The self-duality constraint
F (5) = ⋆F (5) implies that F 2
= 0, and then the metric equation of motion in (11) implies R = 0
for any solution. Hence the Lagrangian vanishes on the solutions and therefore the Noether-Wald
current in (3) is given entirely by the 9-form Θ (or equivalently by its dual vector field). This
can be found from the total derivative terms in δL by substituting δξg
µν = ∇µξν +∇νξµ and
and δξC
= 4 ∂[ν|(ξ
θ|σωλ]) + ξ
θνσωλ
. This gives us the current
J α = −2
−g gασ [Rσλ −
λνθωλ
F (5) νθωλσ ]ξ
+∂µ[−
−g gµνgασ(∇νξσ −∇σξν) +
2 · 3!
−g ξθC(4)
αµσωλ
] , (12)
where the first term vanishes by the equations of motion and the second term gives us the charge
density
16πG(10)
∇αξµ −∇µξα +
αµσωλ
. (13)
Noting that the self-duality constraint
−g Fµ0···µ4
ǫµ0···µ9F
µ5···µ9 implies
αµσωλ
= ξνC
3! 5!
ǫαµσωλµ5···µ9F
µ5···µ9 , (14)
the Noether-Wald charge density (13) can be equivalently written as the 8-form
= − 1
16πG10
⋆ dξ̂ − 1
(4) ∧ F (5)
where ξ̂ is the dual 1-form of the vector field ξµ. This can be integrated over a compact 8d
submanifold to get the corresponding conserved charge. A quick calculation verifies that the
current for this charge vanishes identically as expected because of the vanishing Lagrangian.
Hence, all charges that are computed from it are conserved as discussed in section 2.1. If we
further assume that LξC(4) = 0, we have iξF (5) = −d(iξC(4)). This can be used to rewrite (15)
16πG(10)
⋆ dξ̂ +
C(4) ∧ iξF (5)
up to an additional term proportional to d(C(4) ∧ iξC(4)). This extra term does not contribute
when integrated over a compact 8-manifold provided that C(4) ∧ iξC(4) is a globally well defined
7-form as we discussed in section 2.1. In such cases (16) can be used instead of (15).
In section 4, we will demonstrate that this formula reproduces conserved charges [12] of
Gutowski-Reall black holes of type IIB in 10 dimensions successfully. We hope this expression
may be useful in obtaining the charges of the yet to be discovered black holes from their near
horizon geometries alone.
2.3 The Noether-Wald charges for 5d Einstein-Maxwell-CS
The action for 5d Einstein-Maxwell-Chern-Simons gravity is
16πG5
−g (R− FµνFµν)−
ǫmnpqrAmFnpFqr
which is the same as the action for the 5d minimal gauged supergravity up to the cosmological
constant, which turns out not to contribute to the Noether charge. After a straight forward but
slightly lengthy calculation it is easy to show that the Noether current for this action is
16πG5
(Rαλ − 1
gαλR)− 2 (F λµFαµ − 14g
λαF 2)
+ 4 (ξ · A)
−gFαµ) + 2√
ǫανσωλFνσFωλ
−ggµνgαλ (∇νξλ −∇λξν)− 4
−g(ξ · A)Fαµ − 8
(ξ · A) ǫαµσωλAσFωλ
.(18)
The first two lines are simply proportional to the equations of motion and vanish on-shell and
hence the Noether-Wald charges for this theory are
16πG5
−g (∇αξµ −∇µξα) + 4(ξ ·A)(
−g Fαµ +
ǫαµσωλAσFωλ)
. (19)
These expressions have also appeared recently in [8]. An alternative derivation of (19) in terms
of KK charges will be presented in section 3.3. The charge density (19) can equivalently be
written as the 3-form
16πG5
⋆dξ̂ + 4 (iξA)
⋆ F − 4
A ∧ F
. (20)
As before the charges can be obtained by integrating Qξ over a 3d compact sub-manifold. Note
that if we set the gauge fields to zero we recover the standard Komar integral for the angular
momentum.
2.4 Reduction from 10 dimensions
Now, we will find the dimensional reduction of the 10d formula of conserved charges to the 5d
formula to show that they are indeed identical, so let us first review the reduction formulae to
obtain the equations of motion of 5d minimal gauged supergravity from 10d type IIB supergravity
with only the metric and the self-dual 5-form F (5) turned on [13, 14].
As usual, we express the metric in terms of the frame fields e0, . . . , e9 and do the dimensional
reduction along the compact 5-manifold Σc that is spanned by the 5-form e
5∧e6∧e7∧e8∧e9 =:
e56789. Then, the lift formula is [22] (see also [23])
ds210 = ds
5 + l
(dµi)
2 + µ2i
dξi +
F (5) = (1 + ∗(10))
vol(5) +
d(µ2i ) ∧ dξi ∧ ∗(5)F
, (21)
where µ1 = sinα, µ2 = cosα sin β, µ3 = cosα cosβ with 0 ≤ α ≤ π/2, 0 ≤ β ≤ π/2, 0 ≤ ξi ≤ 2π
and together they parametrise S5. Note that we define the Hodge star of a p-form ω in n-
dimensions as ∗(n)ωi1...in−p = 1p!ǫi1...in−p
j1...jpωj1...jp , with ǫ0123456789 = 1 and ǫ01234 = 1 in an
orthonormal frame. The 10d geometry is specified by {e0, · · · e4}, an orthonormal frame for the
5d metric ds25, together with
e5 = l dα, e6 = l cosαdβ, e7 = l sinα cosα [dξ1 − sin2β dξ2 − cos2β dξ3], (22)
e8 = l cosα sinβ cosβ[dξ2 − dξ3], e9 = −2√3A− l sin
2α dξ1 − l cos2α(sin2β dξ2 + cos2β dξ3).
and the five form [22, 23, 13]
F (5)=
e0···4 + e5···9
(e57 + e68) ∧ (∗(5)F − e9 ∧ F ) (23)
One can write the 5-form RR field strength as F (5) = dC(4) where
C(4) = Ω4 + cotα e
678 ∧ (e9 + 2√
A ∧ (e57 + e68) ∧ (e9 + 2√
(e9 + 2√
A) ∧ (⋆F + 2√
A ∧ F )
. (24)
where Ω4 is a 4-form such that e
01234 = dΩ4. Now we are ready to do the reduction of the 10d
charge
Qχ := −
16π G10
⋆ dχ̂− 1
(4) ∧ F (5)
where Σ8 is a compact 8d submanifold that is composed of a spacelike 3-surface Σ in 5d and
Σc. Hence, only e
5...9 will contribute to the integral. Let us consider χ to be a Killing vector of
the 10d geometry which also reduces to a Killing vector of the 5d geometry and χ̂ be its dual
1-form. Then we find from the expression for the frame fields (21, 22):
χ̂ = χ̂5 + (iχe
9) e9 = χ̂5 − 2√3(iχA) e
9 , so
⋆dχ̂ = ⋆dχ̂5 − 2√
(iχA) ⋆ d e
9 + . . . = ⋆dχ̂5 +
(iχA) ⋆ F + . . . (26)
where “. . .” denotes terms that do not contribute to Qξ. Next, let us find the relevant terms in
C(4) and F (5) (23,24). Noting that iχ
e9 + 2√
= 0, they are:
(4) = iχΩ4 − 2√
(iχA)
e57 + e68
e9 + 2√
e9 + 2√
iχ ⋆ F +
iχ(A ∧ F )
+ . . . (27)
F (5) = −4
e56789 + 2√
⋆ F − F ∧ e9
e57 + e68
+ . . . (28)
(4) ∧ F (5) = −2
iχΩ4 +
(iχA) +A ∧ iχ
⋆ F + 2√
A ∧ F
e56789 + . . . . (29)
After some algebra, the charge reads
Qχ = −
16π G5
⋆dχ̂5 + 4 (iχA) ⋆ F +
(iχA)A ∧ F +
iχΩ4 −
iχ(A ∧ ⋆F )
. (30)
We see immediately that for vectors in the directions of Σ it just reproduces the 5d Noether
charge (19). For vectors orthogonal to Σ, it is different, as is not unexpected, since typically in
dimensional reduction the actions agree only up to boundary terms.
3 Charges from dimensional reduction
In this section we will rederive the Noether-Wald charges for 5d supergravity of section (2.3)
using further dimensional reduction. In particular, we will demonstrate that the 5d Noether-
Wald charges can alternatively be obtained from Kaluza-Klein U(1) charges. For this, we will
first dimensionally reduce the 5d theory along the relevant Killing vectors and then find the
Noether charges of the resulting gauge theory.1 Then we will lift the results back to 5d and
show that they agree with the corresponding 5d Noether-Wald charges. Finally, we will discuss
in which cases the charges obtained by our methods in the interior of the solution agree with
the asymptotic ones.
3.1 Dimensional reduction
In 5 dimensions one can have two independent angular momenta, so we consider dimensional
reduction over both compact Killing vector directions which generate translations along which we
have the independent angular momenta. We will again assume that all fields obey the isometries
and hence only need to consider zero-modes in the compact directions.
We take lower case greek letters α, β, . . . ∈ {t, r, θ, φ, ψ} to be the 5d indices, upper case latin
A,B, . . . ∈ {t, r, θ} to be the 3d indices and lower case latin a, b, . . . , i, j, l,m, . . . ∈ {θ, φ} to be
the indices for the compactified directions in 5d or scalar fields in 3d. The appropriate reduction
ansatz is:
Gµν =
gMN + hijB
, Am =: Am and AM =: A
M + AaB
M , (31)
such that we get
Fµν =
FMN + (dAa ∧Ba)MN An,M
−Am,N 0
, (32)
in terms of the 3d gauge fields Ha = dBa and F 3d = dA3d, and we defined for simplicity F =
F 3d + AaH
a. The definition of A3d in (31) is needed to have the appropriate transformations
of the KK and Maxwell U(1) symmetries and arises naturally from the reduction using frame
fields (see, for instance, [9] for details). Now, we find
µν = FMNF
MN − 2habA,MA,M and
ǫαµνρσAαFµνFρσ = 4ǫ
LMNǫab
Aa,LFMNAb − A3dLAa,MAb,N
, (33)
such that the 5d Lagrangian (17) can be rewritten as :
G5 × L3d =
R3d −
HaMNH
b MN − FMNFMN + 2habAa,MA ,Mb
1This dimensional reduction has been used recently in [10, 11] for defining the entropy functions for such theories.
ǫLMNǫab
Aa,LFMNAb − A3dLAa,MAb,N
, (34)
where VT 2 is the “volume” of the compact coordinates. One can now construct conserved currents
using the Noether procedure for the gauge symmetries of the two U(1) gauge fields Baµ and A
We find the corresponding Noether charges for Baµ to be
Ja = −
16πG5
a rt + 4AaF
ǫLrtǫmnAm,LAn
. (35)
which we identify as the two independent angular momenta. The Noether charge for A3dµ works
out to be
Q = − VT 2
hF rt +
ǫLrtǫmnAm,LAn
which we identify with the 5d electric charge. Alternatively, these charges can be read off by
writing the left hand side of the equations of motion for the Lagrangian (34)
a MN + 4AFMN
+ 16Aa
ǫLMNǫmnAm,LAn
= 0 (37)
−g hFMN + 4
ǫLMNǫabAa,LAb
ǫLMNǫmnAm,LAn,M , (38)
as a total derivative and interpreting the resulting total conserved quantities as the charges.
For geometries with just one independent angular momentum, one can apply the above
formulae in a straight forward way, or do a reduction only down to 4d as in such cases only one
U(1) isometry is expected in the geometry. The computations for the latter are identical to the
ones here, so we just state the expressions for the angular momentum along ∂ξ and the charge:
J = − VT1
16πG5
e2σHrt + 4A F rt
ǫrtAB
A FAB − 2A,AA4dB
, (39)
Q = − VT1
−geσF rt + 1
ǫrtAB
3A FAB + A F
AB − 4A,AA4dB
, (40)
where e2σ = gψψ , VT 1 is the periodicity of ψ, and the conservation follows by the equations of
motion
e2σHMN + 4A FMN
ǫABMN
A FAB − 2A,AA4dB
= 0, (41)
−geσFMN + 2
ǫABMN
A FAB − 2A,AA4dB
ǫABMNFABA,M . (42)
3.2 Oxidation of the angular momentum
Now we would like to demonstrate that the lower dimensional Noether charges above, when
lifted back to 5d, give the Noether-Wald charges for the compactified Killing vectors. For
simplicity, we look at the expression with only one independent angular momentum and only
one dimension (along ψ) reduced. Our results will hold in general though, as the gauge theory
corresponding to the angular momentum is abelian, so we can examine different Killing vectors
independently. First, we note that the dimensional reduction ansatz can be obtained with the
following triangular form of the frame fields [9]:
V Iµ =
viM e
and the inverse V MI =
0 e−σ
, (43)
with (bold latin) tangent space indices A,B, . . . ∈ {0, . . . , 4} and a,b, . . . ∈ {0, . . . , 3} such that
we can write the 4d fields in terms of the 5d fields (but still in 4d coordinates):
BM = e
−σV 4M , HMN = e
−σ(dV 4
− 2e−σ
deσ) ∧B
σ = ξµV Iµ and A = ξ
µAµ . (44)
Now the conservation equation (41) for the angular momentum Jψ reads in flat indices
ηacηbd
ξµV Kµ ηKLdV
L− 2eσ(deσ) ∧B
+ 4ξµAµ(F− 2(dA ) ∧B)cd
8ξµAµ
ǫcdij
F− 2(dA ) ∧B
− 2(dA )cAd + (dA 2)cBd
= 0 . (45)
Extending the summations to A,B, .. and using the form of the frame fields and the indepen-
dence from ψ yields:
ηACηBD
d(ξµV IµηIJV
+ 4ξµAµFCD
8ξνAν
ǫCDABEA
(dξ̂)µN + 4ξ ·AFµN
8ξ · A
ǫµNαρσAαFρσ
= 0 . (46)
The conserved charge extracted from this equation exactly reproduces the charge in (19).
3.3 Generalization and Limitations
3.3.1 Relation to the Asymptotes
Let us now discuss in which situations the charges computed in the spacetime interior give
the charges as defined on the asymptotic boundary. We see most easily from (20) that when
evaluated on a hypersurface on which iξA = 0, such as a suitable asymptotic boundary, our
formulae match with the appropriate Komar integral.
We can compute a (possibly zero) KK or Noether-Wald charge, that corresponds in a specific
geometry to the angular momentum, for every U(1) isometry. However, the asymptotic hyper-
surface on which the angular momentum of a black hole is defined is an Sd−2. When in such
a geometry angular momenta are turned on, its SO(d − 1) isometry breaks (generically) down
to its U(1) subgroups whose charges give the angular momenta, so only the local U(1) factors
that correspond to the asymptotic U(1) subgroups will be related to the angular momentum.
Furthermore, the normalization of the period generated by the Killing vector also has to be
taken into account.
We saw in sections 2.1 and 3.1 how the charges of compact Killing vectors are conserved
whenever the source-free equations of motion hold. That is, they are independent of the position
of the surface on which they are computed, QΣr2 −QΣr1 =
M dMM ∂NQ
MN = 0 where Σr1 and
Σr2 are the boundaries of the volume M - provided that the U(1) theory is defined throughout
the bulk volume and we can consistently compactify the manifold (at least outside the horizon).
Hence, the black hole charge and angular momentum as defined on a spacelike d-2 hyper-
surface Σ∞ at the asymptotes are given by the corresponding KK or Noether-Wald charge,
computed over any spacelike d-2 hypersurface Σr0 in the spacetime for any (not necessarily ex-
tremal) black hole (or in general any spacetime with a suitable asymptotic boundary). That
is, provided there exists a spacelike d-1 hypersurface M with ∂M = {Σr,Σ∞} on which the
following sufficient conditions are satisfied:
1. The relevant compact Killing vector is a restriction to Σr of a Killing vector field that is
globally defined on M and generates a constant periodicity.
2. There are no sources, i.e. the vacuum equations of motion for the gauge fields are satisfied.
3. There exists a smooth fibration of surfaces
π→ [r0,∞[
= M such that π−1r0 = Σr0
limr→∞ π
−1r = Σ∞.
An example where these conditions are satisfied is the region outside the (outer) horizon
of a stationary black hole solution with an Sd−2 horizon topology, embedded in a geodesically
complete spacetime with an asymptotic Sd−2 boundary. One example where these conditions
are violated is that of black rings [18] which will be considered separately in an appendix.
3.3.2 Gauge Issues
The contributions of the CS term in the conserved quantities in (3.1) depend explicitly on the
gauge potentials. This does not however make them gauge dependent. To see this in 5d, let
us consider the electric charge computed by the Noether procedure which is given in [4] as
⋆ F + 2√
A ∧ F
. We notice that the charges get contributions of the form
A ∧ F ,
that change under a transformation δA = dΛ as
dΛ∧F =
d(ΛF ) = 0 because Σ is compact.
From the 3d point of view the KK scalars A may depend on a 5d gauge transformation. However
Λ must be periodic in the angular coordinates so that the contributions from dΛ vanish after
integration. This is also the reason why the term containing ξ ·A in eq. (19) is gauge independent
for compact Killing vectors. On the other hand, the Noether charge for a non-compact Killing
vector is gauge-dependent and hence is only physically relevant when measured with respect to
some boundary condition or as a difference of charges.
4 Examples
So far we have derived Noether charges for various supergravity theories that may be used to
calculate the electric charges and angular momenta of the solutions. In particular, they can be
used on the near horizon geometries to calculate the conserved charges of the corresponding black
holes. In this section we will demonstrate with several examples how our charges successfully
reproduce the known black hole charges in different dimensions, for equal or unequal angular
momenta and independent of the asymptotic geometries. We will start with a 10d example and
then cover 5d examples, first with one angular momentum in AdS and flat asymptotics, and
then with unequal angular momenta in asymptotic AdS.
4.1 The 10d Gutowski-Reall black hole
In [12], Gutowski and Reall found the first example of a supersymmetric black hole which
asymptotes to AdS5 as a solution to minimal gauged supergravity in 5d (see also [34, 17, 35, 36]).
Their solution was lifted to a solution to 10d type IIB supergravity in [13] and shown to admit
two supersymmetries. In [14] (see also [15]), the near horizon geometry of this 10d black hole was
studied. Here we use the formulae found in section 2.2 to calculate the Noether-Wald charges in
the near horizon geometry and show that they agree with the charges of the black hole measured
from the asymptotes. The 10d metric of this near horizon geometry is ds210 = ηabe
aeb with the
orthonormal frame
σL3 , e
, e2 =
σL1 , e
σL2 , e
λ σL3 , (47)
and the five-form is
F (5) =
(e0···4+e5···9)−
(e57+e68)∧ [−3e023+e014−
e234+e9∧ (3e14−e23−
e01)] (48)
where e5 . . . e9 are given in (22) and
dt+ ω
σL3 ) =
(e0 + 2ω
e4), λ =
l2 + 3ω2 and
σL1 = sinφdθ − sin θ cosφdψ, σL2 = cosφdθ + sin θ sinφdψ, σL3 = dφ+ cos θ dψ. (49)
The potential C(4) for the above field strength was given in section 2.4 with Ω4 =
e0234 [14].
Here we concentrate on the compact Killing vectors ∂φ and ∂ξ1 + ∂ξ2 + ∂ξ3 of this geometry and
calculate the corresponding conserved charges. For χ = ∂φ which has a period 4π, we have
χ̂ = 3ω
e0 + ωλ
e4 − ω2
e9 and
dχ̂ = −2ωλ
e01 + 3ω
e14 − (1 + ω2
)e23 + ω
(e57 + e68) (50)
and hence the relevant terms in ⋆dχ̂ are ω
(4l2 + 3ω2) e2···9. Similarly, we find
C(4) ∧ iχF (5) = ω
(2 l2 + ω2) 1
σL1 ∧ σL2 ∧ σL3 ∧ e56789 . (51)
After noting that the integral over 1
σ123 ∧ e56789 gives a factor of 2π5l5, we find
Q∂φ = −
16π4 l5G5
S3∧S5
[⋆dχ̂+
C(4) ∧ iχF (5)] = −
8 l G5
) , (52)
which agrees with the angular momentum, up to a minus sign, that comes from the definition
of the angular momentum as minus the Noether charge [12]. For χ = ∂ξ1 + ∂ξ2 + ∂ξ3 , we have
9 = −l. One can calculate the 10d current and find that
⋆ dχ̂+
C(4) ∧ iχF (5) =
(⋆5F +
A ∧ F ) ∧ e5678 ∧ (e9 +
A) + · · · . (53)
Therefore the corresponding charge is
Q∂ξ1+∂ξ2+∂ξ3
π l ω2
) . (54)
This differs from the answer Q(GR) =
3π ω2
(1 + ω
) [12] by a factor of −l/
12. The minus
sign is because of a difference in our conventions from those of [12] and the factor of l is there
to make the charge Q(GR) dimensionless. The killing vector ∂ξ1 + ∂ξ2 + ∂ξ3 has a period of 6π
and to normalise it to have a period of 2π we have to multiply it by a factor of 3. If we take this
into account the extra factor reduces to
3/2. This is precisely the factor required to define the
5d gauge field in the conventions of dimensional reduction from 10d to 5d [22]. Thus we find
complete agreement between our 10d computation of charges from the NHG and the asymptotic
black hole charges of [12].
4.2 5d Black Holes
Now we turn to black hole solutions in 5d Einstein-Maxwell-CS and minimal gauged supergravity.
4.2.1 Equal Angular Momenta: BMPV and GR
Let us consider two examples that are similar in the near-horizon geometry, with a squashed
S3 horizon, but differ by their asymptotic behaviour; the BMPV black hole [4, 16] with asymp-
totically flat geometry and the Gutowski-Reall (GR) black hole [12] with asymptotically AdS5
geometry.
Their near-horizon solutions can be put in to the form
ds2 = v1
− r2dt2 + dr
σ21 + σ
2 + η(σ3 − αr dt)2
, A = −e r dt+ p(σ3 − αr dt) (55)
which, when dimensionally reduced along the ψ-direction, gives ds24 = v1
− r2dt2 + dr2
dθ2 + sin2θ dφ2
. This has AdS2 × S2 symmetry as expected. The fields take the form
B = −rαdt+ cos θ dφ, e2σ = v2η, A = p and A4d = −e r dt. For the BMPV case, we find:
v1 = v2 =
, η = 1−
, α =
µ3 − j2
, e = −
µ3 − j2
and p =
. (56)
Evaluating the 4d quantities and noting that ǫtrφθ = 1 and VT 1 = 4π, (39, 40) gives us J = πj4G5
which is equal in magnitude to the angular momentum in [4] up to a factor of 2, which arises
from the canonical normalization of the Killing vector ξ = 2∂ψ , and Q =
For the GR case, we have:
, v2 =
, η = 1 + 3
, α = − 3ωl
4l2 + 3ω2
, e =
α, p =
. (57)
Note that we have defined A with an overall factor of −1 compared to [14] to account for a
different convention for the CS term. This gives the results J = −3πω2
(1 + 2ω
) and Q =
(1 + ω
) as expected. Note that [12] do not use the canonical normalization for ∂ψ of [4].
4.2.2 Non-equal Angular Momenta: Supersymmetric Black Holes
Here, we present as the most simple example the N=2 supersymmetric black holes with non-
equal angular momenta of [17], which are asymptotically AdS5, just as the GR case. We start
off with the metric in the form [17]
gtt =
(ρ2ΞaΞb)
ρ2ΞaΞb(1 + r
2) − ∆t(2mρ2 − q2 + 2abrρ2)
, grr =
, gθθ =
gtφ =
−∆t sin2θ
ρ4Ξ2aΞb
a(2mρ2 − q2) + bqρ2(1 + a2)
, gtψ = gtφ(a↔ b, sin θ ↔ cos θ)
gφφ =
sin2θ
ρ2Ξ2a
(r2 + a2)ρ4Ξa + a sin
a(2mρ2 − q2) + 2bqρ2
gψψ = gφφ(a↔ b, sin θ ↔ cos θ), gφψ = sin
2θ cos2θ
ρ4ΞaΞb
ab(2mρ2 − q2) + (a2 + b2)qρ2
with the the gauge field
∆tΞaΞbdt −
a sin2θ
dφ − b cos
where
ρ2 = r2 + a2 cos2θ + b2 sin2θ, ∆t = 1− a2 cos2θb2 sin2θ,
(r2+a2)(r2+b2)(1+r2)+q2+2abq
r2−2m , Ξa = 1− a
2 and Ξb = 1− b2 . (60)
We consider the case with saturated BPS-limit and no CTC’s, which requires:
1 + a+ b
, m = (a+ b)(1 + a)(1 + b)(1 + a+ b). (61)
Now we can find the near horizon geometry with explicit AdS2 symmetry as in [18], by re-defining
t̃ = ǫt, r̃ =
4(1+3a+a2+3b+b2+3ab)
(1+a)(1+b)(a+b)
a+b+ab
, d̃φ = dt+ dφ, d̃ψ = dt+ dψ, (62)
then taking the limit of ǫ→ 0 and applying a gauge transformation to get rid of a constant term
in At. We can read off the 3d scalar fields hmn and A and find
BmN = h
maGaN , gMN = GMN − BaMhabBbN and A3dM = AM − AaBaM . (63)
Noting that VT 2 = 4π2, eqns. (35) give us the angular momenta Jφ̃ = π
a2+2b2+3ab+a2b+ab2
4G5(1−a)(1−b)2 and
= π b
2+2a2+3ab+a2b+ab2
4G5(1−b)(1−a)2 . These agree precisely with the corresponding asymptotic angular
momenta of [18].
5 Charges from the entropy function
The original incarnation of the entropy function formalism [3, 1] was not only a useful tool for
finding near-horizon solutions, but also for extracting the conserved charges from a given solution.
However, in the presence of Chern-Simons terms, the entropy function formalism captures only
part of the conserved charges. We demonstrate here two equivalent ways to cure this problem.
Let us first recall the entropy function formalism [3, 1]:
One considers a general theory of gravity described by the Lagrangian density L with abelian
gauge fields F i(x) and scalar fields Φj(x). Then one writes down the most general ansatz for the
near horizon geometry assuming the isometries of AdS2 × S1 (for simplicity, we consider here
d=4 as in [3, 1]):
ds2 = v1(θ)
− r2dt2 +
dθ2 + v2(θ)
dφ2 − α r dt
F i =
ei − αbi(θ)
dr ∧ dt + ∂θbi(θ)dθ ∧ (dφ − α r dt) and Φj = uj(θ) , (64)
in terms of the parameters {α, ei, β} and θ-dependent scalars {vi(θ), bi(θ), ui(θ)}. Then, one de-
fines the “reduced action” f(α, ~e, β, ~v(θ), ~b(θ), ~u(θ)) =
dθdφL - a functional that generates
the equations of motion
δbi(θ)
δvi(θ)
δui(θ)
= 0, where the functional derivatives
can be understood in terms of the Fourier coefficients in the expansion along θ, and
= qi ,
= j , (65)
where qi and j are supposed to give the charges of the black hole. Then the entropy function is
defined to be the Legendre-transform of the reduced action
E(j, qi, β, ~v(θ), ~b(θ), ~u(θ)) = 2π(eiqi + αj − f) . (66)
Finally, the entropy of the black hole is S = E , evaluated on the solution.
5.1 Completing the equations of motion
In section 3.1, we learned how to find the conserved charges in the presence of Chern-Simons
by writing the KK gauge field equations of motion in a conserved form. Since we now know the
right reduction ansatz, we just need to find a mechanism to parametrize both the variation with
respect to At and Bt and the integration of the right hand side of the equations of motion to
obtain the closed form. One such mechanism is a modification of the ansatz with the pure gauge
terms {ǫi,ℵa} to do the variations δL
and δL
; and with a dummy function c(r), that introduces
an artificial and unphysical r-dependence into fields that are constant by the symmetries. c(r)
then allows to keep track of their, otherwise vanishing, derivatives and to do their integration
on the right hand side of the equations of motion. Hence, we write
Ai = −(ǫi + ei r)dt + c(r) pia(θ)
dφa − (ℵa + αar) dt
, (67)
ds2 = v(θ)
− r2dt2 + dr
dθ2 + ηab(θ)(dφ
a − (ℵa + αar)dt)(dφb − (ℵb + αbr)dt)
and we also wrap all scalar fields that appear in the Chern-Simons terms with a factor of c(r),
ui(θ, r) = c(r)Φi(θ). The solution corresponds to setting c(r) = 1 and c′(r) = 0, which we can
either implement by furnishing c(r) with a control parameter, or by choosing c(r), s.t. c(r0) = 1
and c′(r0) = 0 for some r0, but c
′(r0) 6= 0 for r 6= r0. The equations of motion for the gauge
fields are then ∂r
and ∂r
∂ℵa and give rise to the conserved charges
and Ja =
, (68)
evaluated on the solution. A simple variation of this is c(r) = 1 + 1
r, n being the number of
3d scalar fields in the CS term, which automatically takes care of the integration of the second
term and ensures that all remnant dummy terms will disappear in the first term at r = 0.
The other computations follow just as in the original form of the Entropy function, using
c = 1, c′ = 0 throughout. Note that the entropy function is still computed as originally defined,
E = 2π
αa + ∂L
ei − f
, i.e. not using the conserved charges.
One can easily see that this gives the equations of motion, and it also gives the correct value
for the entropy as the original derivation [3, 1] is independent of what the conserved charges are.
This can also be seen by repeating the derivation in section 6.4 with the original action (34).
As a simple example we have already written the 4d ansatz (55) in section 4.2.1 in a suggestive
form, such that the coefficients can be read off from (56) and (57) with β2 = v2. We note that
the ℵa parameters do not appear here in the action. A simple computation reveals that this
gives indeed the results in section 4.2.1.
5.2 Gauge invariance from boundary terms
In section 3.3, we found that the charges are gauge invariant. However, it would be desirable if
we could impose gauge invariance at the level of the Lagrangian of the 3d action (34). The result
can, in principle, be oxidized back to 5d, but we will stick for simplicity to 3d. The only term of
concern is the A3d ∧ dA[a ∧ dAb] in the CS term in (34), which varies under A3d → A3d + dΛ as
dΛ ∧ dA[a ∧ dAb]. This variation is a total derivative d(ΛdA[a ∧ dAb]) which, after integration,
gives a boundary term ΛdA[a ∧ dAb]. This can be re-expressed as d(ΛA[adAb])− A[adΛ ∧ dAb],
where the first term vanishes if we consider a stationary boundary. The second term is suitably
cancelled by adding a boundary term Abdy. [aA
∧ dAbdy. b], which is identical to a bulk term
d(A[aA
3d ∧ dAb]). Expressed in index notation, and furnished with appropriate factors, the
boundary term that we need to add corresponds to the bulk term is
δL 3d = −
16πG5
ǫLMNǫab
Aa,LF
MNAb + 2 A
LAa,MAb,N
, (69)
which brings the Lagrangian to
G5 × L3d =
R3d −
Ha MNH
b MN − FMNFMN + 2habAa,MA ,Mb
ǫLMNǫab
2Aa,LFMNAb + Aa,LF
, (70)
eliminating the gauge dependent term. A quick calculation shows that this does not affect the
value of the charges (35, 36). Effectively, what we have done is to differentiate the components
of the 5d gauge field in the CS term whose gauge transformations do not vanish automatically
by periodicity constraints, and remove the derivative from other components by an integration
by parts. Hence, the right hand side of each of the 3d gauge field equations of motion does
vanish, and the charges are just the conjugate momenta of the gauge fields B and A3d:
Q = −
δF 3dµν
ǫρµνdx
ρ and Ja = −
δHaµν
ǫρµνdx
ρ , (71)
as in the absence of CS terms. It is easy to verify that the value of the charges remains unchanged.
This means that, if we compute the reduced action from the gauge independent action, the
original formalism will give us the right charges. The entropy function, now computed with the
full charges, does not depend on the extra boundary term and hence also gives us the correct
value of the entropy as we shall derive directly from the Poincaré time Noether charge in section
6 Thermodynamic Charges
Having computed the charges of the Sd−2 isometries, we now turn to the charges of the AdS2
isometries. In particular, we will concentrate on the charge of ∂t, as this will be related to the
thermodynamic quantities entropy S and mass M . First we will compute the Poincar’e time
Noether charge from the Hamiltonian in the NHG and propose a new definition of the black
hole entropy for extremal black holes in the NHG in terms of this charge - similar to Wald’s
definition for non-extremal black holes. Then we (i) justify this definition by showing that it
gives the right extremal limit of the first law, (ii) derive from the Noether charge a statistical
version of the first law suitable for extremal black holes and (iii) re-derive the entropy function
directly from the definition of the entropy. Finally, we discuss the notion of mass as seen from
the NHG by deriving a Smarr-like formula.
6.1 Poincaré Time Hamiltonian
For the Poincaré time Killing vector ∂t, one expects the Noether charge to be related to the
Hamiltonian, which we will explore now.
Since the theory is generally diffeomorphism invariant, we expect the bulk contribution to
vanish. So we concentrate on boundary terms Sbdy. =
B Lbdy., that are necessary to cancel total
derivatives dΘ in the variation of the bulk action δS =
(Eiδφ
i + dΘ(δφ)). In our example,
we have to consider both the variations of the metric and of the 3d gauge fields. For the gauge
fields, the term that we ignored in the derivation of the equations of motion was
µ = ∂µ
δ Aν,µ
δAν +
δ Baν,µ
. (72)
For a complete spacetime, the textbook answer is to place the usual restriction δA|bdy. =
δB|bdy. = 0. Then, the only boundary term that one needs to add in order to make the vari-
ational principle consistent is a Gibbons-Hawking-like term, that compensates for a variation
proportional to the normal derivative of δg at the boundary. For the Einstein-Hilbert action,
that is the usual Gibbons-Hawking term
SGH =
LGH =
hK = − VT 2
16πG5
hγMNn
M ;N , (73)
where γ is the boundary metric and K is the surface gravity of the boundary B, which, in our
geometry, is just an S1 fibred over time. Note that we took n = −∂r to be inward-pointing
in order to define the bi-normal NMN :=
(∂t)[MnN]
|∂t||n| of Σbdy. with a positive signature. Now,
we can read off the Hamiltonian of the NHG if it were an isolated solution. By definition,
Lξgµν = 0, such that the canonical Hamiltonian is just HI = −
i∂tLGH with the time
slice of B being Σbdy = S1. Since ∂t is a Killing vector, a quick calculation shows |∂t|
−γK =√
−gNMN (d ∂̂t)MN , and hence the Hamiltonian is just
HI = −
Σbdy.
i∂tLGH =
16πG5
hNMN (d ∂̂t)
MN . (74)
Now, if we consider the near-horizon geometry being embedded in the full black hole solution,
we cannot put δA|bdy. = δB|bdy. = 0, but we need to satisfy the variational principle by adding
a Hawking-Ross-like boundary term as in [28]:
LHR = nM
δ AN,M
=: −nN
Q̃MNAN + J
and impose the condition to keep the charges fixed under variations of the boundary fields. Now,
the boundary action varies as:
δSHR = −
d2σ nM
δQ̃MN
δJMNa
d2σ nM
Q̃MNδAN + J
where the second term cancels the total derivative in the variation of the bulk action (note the
inward-pointing n), and the first term vanishes as the charges are fixed. A little caveat occurs if
we use the gauge-dependent form of the action (34), when Q̃ 6= Q, however the missing bit does
not depend on the 3d gauge fields, but only on the scalar fields, and hence it is invariant under
variations of the gauge fields. If we consider the gauge-independent form of the action (70), then
Q̃ = Q. Again, by definition we have LξBi = 0, and we will choose a gauge such that LξA=0,
and the canonical Hamiltonian is just
H = −
i∂t(LHR + LGH) . (77)
Because of the AdS2 symmetries, we have
Σbdy.
i∂t (Q∧A) =
Σbdy.
Q(i∂tA) and similar for Ji∧Bi.
This puts the Hawking-Ross contribution to the boundary Hamiltonian to−
Σbdy.
dθ NMN
Q̃MN (i∂tA)+
JMNa (i∂tB
. This gives for the action (34)
H = − VT 2
16πG5
dθNMN
(d ∂̂t)
MN + HaMNhab(i∂tB
b)+4FMN i∂t
ǫPMNǫabAa,PAb i∂t
We now compare (78) with the Noether charge obtained by dimensional reduction of the 5d
expression (20). For this, we work out how the individual terms look like in 3d with the notation
of section 3.1. We consider only the components QMN
in the non-compact directions, and only
zero modes of the fields in the compact directions. Hence we get from the reduction formulae
(31 - 34):
(dξ̂)MN =
dξ̂3d
ξ3d ·Bjhji + χihij
H iMN , FMN = FMN ,
ǫMNαβγ = 2ǫMNLǫijAi,LAj and ξ ·A = ξ3d ·A3d + ξ3d ·BiAi + χiAi . (79)
Now, we can write down the charges of ξ3d, the non-compact components of ξ, and χ, its compact
components, separately:
QMNξ3d = −
16πG5
dξ̂3d
+ ξ3d ·Bj
j MN + 4AiF
+ 4ξ3d ·A3dFMN
ξ3d · A3d + ξ3d ·BiAi
ǫMNLǫijAi,LAj
QMNχ = −
16πG5
iMN + 4AiF
ǫMNLǫkjAk,LAj
, (81)
where we have implicitly done an integration over the compact coordinates. Thus we see that
(78) is just the Noether charge Q∂t in 3d (80) as expected, and we have yet another confirmation
of the KK charge (35), as it matches with (80).
6.2 Entropy
The entropy S of non-extremal black holes was shown by Wald [5] to be given by the Noether
charge κS = 2π
Qξ of the timelike Killing vector ξ that generates the horizon, evaluated on
the bifurcate d-2 surface B of the horizon, and κ is the surface gravity of the horizon. Jacobsen,
Myers and Kang [19] later showed that the charge can be evaluated anywhere on the horizon,
provided all fields are regular at the bifurcation surface. After a coordinate transformation, one
sees that this requires all gauge fields to vanish on the horizon, such that the gauge is fixed to
ξ · A = 0 at the horizon, and hence eliminates the ambiguity of the gauge-dependence of the
Noether charge.
For extremal black holes, κ = 0 on the horizon (r = 0), so Wald does not give a suitable
definition of S, and furthermore there is no bifurcation surface - putting in doubt the gauge fixing.
In the AdS NHG, there should be no special point where to compute physical quantities. Using
the concept that the entropy is intrinsic to the horizon, and hence does not require embedding
the NHG into an asymptotic geometry, those problems are cured by defining the entropy as
κ(rbdy.)
HI(rbdy.) , (82)
in the dimensionally reduced theory with the boundary placed at any radius rbdy. 6= 0. The fact
that the 3d theory is static allows us to use
κ = −
gtt,r
−gttgrr
[9] that is well-defined and physically motivated as the acceleration of a probe at any radius r
with respect to an asymptotic observer and hence related to the temperature of Unruh radiation.
It also ensures that the entropy is independent of rbdy. with well-defined limits rbdy. → 0 and
rbdy. → ∞. Now, in terms of the Noether charge (80), the entropy is just as expected
Q∂t(r) (84)
in the gauge ξ · A(r) = ξ · B(r) = 0; but evaluated at r 6= 0, rather than r = 0 that one would
näıvely expect. We will see in the following three subsections that this definition of the entropy
naturally arises from black hole thermodynamics.
6.3 First Law
Since we have now an expression for the entropy intrinsic to the extremal limit, let us see whether
we can also find an expression for its variation as derived for non-extremal black holes by Wald
in [5]. First let us write the the Noether charge for the gauge-invariant action (70) in 3d for
ξ3d = ∂t as
Qξ3d(r) =
S − ξ3d · A(r)Qel. − ξ3d · Ba(r)Ja . (85)
Then, we consider variations of the dynamical fields δφi that keep the solution on-shell and use
the identity δdQξ3d = d
ξ3d · Θ
[5], with Θ defined in section 2, such that we can relate the
variation of the charge evaluated over two boundaries Σ1 and Σ2 of a spacelike d-1 surface:
δQξ3d − ξ3d ·Θ
δQξ3d − ξ3d ·Θ
. (86)
Now, let us move the boundaries into the near-horizon geometry (→ ΣH) and into some asymp-
totic limit (→ Σ∞). On ΣH , we have
ξ3d ·Θ =
L dθM ǫLMN
gOP δ̄ΓNOP + g
ON δ̄ΓPOP
δAO,N
δAO +
δκ − Qelδ(ξ3d ·A) − Jiδ(ξ3d ·Bi) , (87)
where we used for the second equality the AdS2 isometries, and assumed an Einstein-Hilbert
term for the gravitational action, and any gauge field term that can be written with only first
derivatives of A, such as (70). The right hand side of (86) can be interpreted by following Wald,
and defining the canonical energy, i.e. the Hamiltonian measured by an asymptotic observer at
Σ∞, E =
(Qξ3d − ξ3d·V ) with some d-1 form V: δ
ξ3d ·V =
ξ3d ·Θ. This corresponds,
for the asymptotic boundary conditions A = B = 0 and suitable normalization of ξ3d, to the
mass. Altogether, (87) gives us now an expression similar to the first law
δS + Φ(r) δQel. + Ω
i(r) δJi = δE (88)
at some r 6= 0, where Φ(r) = −ξ3d · A(r) and Ωi(r) = −ξ3d · Bi(r) measure the co-rotating
electric potential and angular frequency2 at r in the NHG with respect to the definition of E .
This, however is not yet a relation for the full black hole, but captures only physics outside Σr.
The extremal limit of the non-extremal first law of the full black hole solution is reproduced by
taking the limit r → 0:
ΦH δQel. + Ω
H δJi = δE , (89)
where ΦH = −ξ3d ·A(0) and ΩH = −ξ3d ·B(0) are the horizon co-rotating electric potential and
angular frequency. It is interesting to observe though, that (88) and corresponding expressions
for the Smarr formula resemble the first law of a finite temperature black hole, even though its
physical significance is limited, as Σrfor r 6= 0 is not a horizon.
An interesting observation and lesson is that when embedding the near horizon solution into
an asymptotic solution, but computing Noether charges in the NHG, we need to use the gauge
invariant action (70) and the full Noether charge, because there is no boundary of the NHG on
which we were allowed to fix the gauge fields and its gauge variations.
2To illustrate that this definition of Ω corresponds to the one in [5], consider a vector ξ = ∂t − Ω∂φ in static
coordinates with a diagonal metric g, and ξ = ∂t′ in co-rotating coordinates with a non-diagonal metric g
′. Then
ξ̂ = gttdt− Ωgφφdφ = gt′t′dt′ + Bφt′gφφdφ. A similar argument follows from requiring constant normalization of ξ and
considering gtt + gφφ = gt′t′ in the explicit coordinate transformation.
We see that our version of the first law also holds also for perturbations away from extremality,
which connects it smoothly (in a thermodynamic sense) to the near-extremal limit of the non-
extremal black hole, again supporting our definition of the entropy.
6.4 Entropy Function and the Euclidean Action
Now, let us continue following Wald [5] and relate the (integrated) mass (or energy E) to the
entropy. Starting with (85), we apply Gauss’ law to find
S − ξ3d ·A(r)Qel. − ξ3d · Ba(r)Ja = E −
Jξ3d +
ξ3d · V =: E −
I(r) , (90)
where the euclidean action3 I is now, in principle, a function of the radial position of ΣH , since
∂M = {ΣH ,Σ∞}. Even though I is defined only for κ 6= 0 as the integral of the analytically
continued Lagrangian, with τ = it having period 2π
, one would like to find a well-defined limit
as κ→ 0, i.e. r → 0, representing the full extremal black hole solution. This requires
ΦHQel. + Ω
HJa = E . (91)
This relation can be taken as a (gauge-dependent) definition of the mass of the black hole in
the near-horizon geometry. We note that since the action is gauge-invariant, (91) is gauge-
independent in the sense that a gauge transformation that changes ΦH and ΩH on Σ0 changes E
at Σ∞ accordingly. In the appropriate gauge in which E =M , it should agree with the BPS (or
extremality) condition - as we verified for BMPV and GR - and with an applicable Smarr-like
formula, supposed one has a full solution at hand. Now, let us study the remaining terms of (90).
Again, we make use of the AdS2 geometry to find that ξ3d ·
A(r)−A(0)
�κ(r) = F 3drt =: −EH
is the constant co-rotating electric field-strength in the NHG, as is ξ3d ·
Bi(r)−Bi(0)
�κ(r) =
Hrt =: −HH the field strength of the KK gauge field. Now, (90) reads
S = −2π
EHQel. + H
− I , (92)
with all terms, including I, being independent of the position r 6= 0 of ΣH in the NHG. (92)
holds also in the limit as r → 0. A similar expression was proposed and discussed in a statistical
context by Silva in [6], where it was motivated by taking the extremal limit of non-extremal black
holes, assuming an appropriate expansion of ΦH and ΩH in terms of the inverse temperature.
This is identical to (92), provided one identifies the NHG field strengths with the appropriate
expansion coefficients in [6]. Note that this relation is particular for extremal black holes and
profoundly different from the relation of the entropy to the euclidean action for non-extremal
black holes [29, 30].
Let us now show how this relates to the entropy function formalism. Given I = −2π
M iξ3dL +
iξ3dV
[5], we use the fact that the spacetime in the NHG can be trivially
foliated with spheres to re-write this as
I = −
iξ3dL +
iξ3dV −
iξ3dL
=: I0 +
L , (93)
where ∂M0 = {Σr=0,Σ∞}. Since
L is supposed to be invariant under the AdS2 isometries,
it is proportional to the volume form on AdS2 and (
L)�κ(r) = ⋆
L = const. Now,
the fact that I = const. implies that I0 = 0 and we are left with
S = −2π
EHQel. + H
HJi + ⋆
. (94)
3I equals the euclidean action only for stationary spacetimes, see [5].
This is just the entropy function for the gauge invariant action (70). The same derivation can
be applied to the original action (34) to give its corresponding entropy function. In that case E
in (91) will have a different value, because of the boundary terms in the action, stressing again
the need to work with (70) when relating the NHG to the asymptotic geometry.
6.5 Mass
Even though the mass of extremal black holes is fixed by the extremality (or BPS) relation 91,
let us now study its physical interpretation from the point of view of the NHG by deriving a
Smarr-like formula for the 5d Einstein-Maxwell-CS case.
Let us suppose there is some asymptotic geometry attached to the near horizon geometry in a
way that the conditions in section 3.3 are satisfied, and follow closely the derivation by Gauntlett,
Myers and Townsend in [4] for a few steps. The mass, E in a gauge in which A = B = 0 at Σ∞,
can be re-written using Gauss’s law in 5d as
M = −
16πG5
⋆dk̂ =
16πG5
⋆dk̂ +
, (95)
for some ∂M = {Σ,Σ∞} and k being the asymptotic unit norm timelike Killing vector. Assuming
we work in a gauge in which LξA = 0, and using the relations ✷kµ = −Rµνkν , LkΩ = ik(dΩ) +
d(ikΩ) for any form Ω and the equations of motion for g and A, the result is
16πG5
⋆dk̂ + 4(k · A) ⋆ F −
k̂ ∧ (Â · F )
(k · A)A ∧ F
, (96)
plus a term at Σ∞ that vanishes as A→ 0. In dimensions other than d = 5, there will be an extra
term that cannot be expressed as a surface integral at ΣH . For details see [4]. Now, we see that
the first, second and last terms combine to give the Noether charge (19). Decomposing k into
its compact and non-compact components, k = ∂t + Ω
iχi, and choosing Σ to be an r = const.
surface in the NHG, we find from the 3d expressions (80,81) that this gives us
S + ΩiJi
+Φ(r)Qel.−
(∂t · A) ⋆ F −
(∂̂t +Ω
iχ̂i) ∧ (Â · F )
In (∂̂t + Ω
iχ̂i) ∧ (Â · F ), we find that in terms of frame fields the relevant components are
(∂̂t+Ω̂
iχi)0, A0 and F01, since the AdS2 symmetries restrict non-vanishing FM1 to M = 0. This
makes the last term vanishing, such that we get in the limit r → 0 the Smarr formula
ΩiHJi + ΦHQel. , (98)
that agrees with the near-horizon limit of the non-extremal one. From the point of view of
the near-horizon solution, we find that the mass is now a gauge-dependent expression, with the
gauge given by the embedding of the near-horizon solution in the asymptotic solution. We find
that (98) looks different from (91), however they are in agreement since ΩH vanishes for BMPV
black holes [4].
7 Conclusions
In this paper we presented expressions for conserved currents and charges of 10d type IIB
supergravity (with the metric and five-form) and minimal (gauged) supergravity theories in 5
dimensions. These have been obtained following Wald’s construction of gravitational Noether
charges. Those of the 5d gauged supergravity can also be obtained by dimensional reduction of
the 10d formulae. We further showed that the Noether charges of the higher dimensional theories,
after dimensional reduction, match precisely with the Noether charges of gauge fields obtained
by Kaluza-Klein reduction over the compact Killing vector directions of interest. Our expressions
for the charges should be valid generally for both extremal and non-extremal geometries. We
then turned to their applications to extremal black holes and demonstrated that, when evaluated
in the near horizon geometries, our charges reproduce the conserved charges of the corresponding
extremal black holes under certain assumptions. In particular, we exhibited that our methods
give the correct electric charges and angular momenta for the BMPV and Gutowski-Reall black
holes.
A host of new solutions to supergravity theories with AdS2 isometries have been found
recently [20] and many more such solutions are expected to be found in the future. These
solutions may be interpreted as the near horizon geometries of some yet to be found black holes.
In such cases, our results should be useful in extracting the black hole charges without having to
know the full black hole solutions but just the near horizon geometries. On the other hand, the
holographic duals of string theories in the NHG are expected to be supersymmetric conformal
quantum mechanics. Our conserved charges should be part of the characterising data of these
conformal quantum mechanics.
We argued that the black holes with AdS3 near horizons do not satisfy our assumptions when
embedded in black hole asymptotes with Sd−2 isometries (rather than black string asymptotes).
Supersymmetric black rings are the main examples for which our formulae do not seem to apply.
More generally for black holes with AdS3 one has to find the correct way to extract the conserved
charges separately which we would like to return to in future.
We then presented a new entropy function valid for rotating black holes in 5d with CS terms
which gives the correct electric charges as well as the entropy. This is an improvement over [21].
We used appropriate boundary terms, that make the action fully gauge-independent which turns
out to be relevant to obtain the thermodynamics in the second part of the paper.
In the second part of the paper we exhibited a new definition of the entropy as a Noether
charge, and a derivation of the first law, which are applicable for extremal black holes directly.
We used this definition to produce the statistical version of the first law and moved on to re-
derive the entropy function from a more physical perspective. Finally, we commented on the
physical interpretation of the mass in the near-horizon solution. The relevant calculations were
done in the near-horizon geometry, only assuming an embedding into some asymptotic solution
for the purpose of formally defining the Mass. We did not, however, produce a conserved charge
corresponding to the the level number. In terms of the 5d fields, the expression in [27] is just
proportional to
⋆F , which is conserved in the NHG by the symmetries, but not by the
equations of motion in a general geometry. Various potentially interesting candidates, such as
the R-charge and global AdS2 time Noether-Wald charge did not produce an interesting result.
We find that the gauge-independent thermodynamic quantities can be evaluated everywhere
in the near-horizon geometry, as they are a statement about the near-horizon geometry. In
particular, they are the entropy, euclidean action and charges and their chemical potentials,
as well as the statistical version of the first law (92). Relations and quantities related to the
asymptotic geometry and to thermodynamics of non-extremal black holes (the mass, horizon
electric potential and angular frequency, as well as the first law and Smarr formula) however are
gauge-dependent from the point of view of the near-horizon geometry. They need to be evaluated
on a specific hypersurface, r = 0, as they come from position-dependent statements in the near-
horizon geometry. This means that the former ones may be more relevant for characterising
attractors.
Acknowledgements
We thank Rob Myers for helpful discussions and suggestions and helpful comments on the
manuscript. MW was supported by funds from the CIAR and from an NSERC Discovery grant.
Research at the KITP is supported in part by the National Science Foundation under Grant No.
PHY05-51164 and research at the Perimeter Institute in part by funds from NSERC of Canada
and MEDT of Ontario.
A Black Rings
The non-equal angular momentum generalization of the BMPV case is the supersymmetric black
ring [18]. It is an excellent counter-example in which the conditions in section 3.3 are not satisfied.
To demonstrate this, we sketch out the derivation of the asymptotic and near horizon limits as
given in [18]. The general form of the solution is given by:
ds1 = −f2(dt + ωφdφ + ωψdψ)2 +
f−1R2
(x−y)2
( dy2
+ (1−x2)dφ2 + (y2−1)dψ2
f(dt+ ω) − q
(1 + x)dφ + (1 + y)dψ
, (99)
where y ∈]−∞,−1] , x ∈ [−1, 1] , φ, ψ ∈ R�2πZ and f−1 = 1 + Q−q
(x − y) − q
(x2 − y2),
ωφ = − q8R2 (1− x
3Q− q2(3 + x+ y)
and ωψ =
(1 + y) +
(1− y2)
3Q− q2(3 + x+ y)
The asymptotic limit is given by (x + 1) → +0 and (y + 1) → −0, and its geometry of a
squashed sphere with broken isometry SO(4) → U(1)2 can be made manifest by combining
(x, y) into a radial coordinate ρ ∈ R+ and an angular coordinate Θ ∈ [−π2 ,
ρ sinΘ =
x−y and ρ cosΘ =
x−y (100)
The near horizon limit, on the other hand, is given by y → −∞, such that appropriate radial
and angular coordinates are r = −R
and cos θ = x. A first observation is that the two limits
are just points in the “opposite” coordinates, (ρ,Θ) → (R, π
) and (r, θ) → (R,π). To obtain
the near horizon geometry in a suitable form, we define χ = φ − ψ, take the limit r = ǫr̃R−1,
t = ǫ−1t̃, ǫ→ 0 and get:
ds2 =
q2dr̃2
dt̃dψ +
(q2 −Q)2 − 4q2R2
dψ2 +
dθ2 + sin2θdχ2
A = −
(q2 +Q)dψ + q2(1 + cos θ)dχ
. (101)
Now, we also see that the topology of the horizon is S1×S2 with U(1)×SO(3) ∋ U(1)2 isometry
and whose subgroup U(1)2 is not guaranteed to agree with the U(1)2 of the asymptotic geometry.
The AdS2 geometry is more apparent after dimensional reduction, when gtt ∝ r̃2 is restored,
and after suitably rescaling t̃. [18] show furthermore that the AdS2 and S
1 combine into a local
AdS3. The conserved charges are now Jψ =
(q2 −Q)2 − 12q2R2
, Jχ = − π8G5 q(q
2 +Q)
and Qel. =
(q2+Q), or in the old coordinates Jψ =
(q2−Q)2+2q2(q2−2Q−6R2
qQ . They compare to the asymptotic quantities computed in [18] Jψ =
q(3Q− q2),
q(6R2 + 3Q− q2) and Qel. =
The distinguishing feature here is that black rings have an AdS3×S2 near-horizon geometry.
Thus the S1 × S2 of the horizon and the S3 of the asymptotic hypersurface are topologically
distinct, such that there is no continuous fibration of hypersurfaces over r between them. In
particular, The coordinates that describe the asymptotic S3 shrink the horizon and the area
bounded by the black ring to a point in 3d (or an S1 × S1 in 5d), and are missing part of the
boundary of the full solution because of the difference in topology. This missing part shrinks
into the coordinate singularity that also contains the horizon, so flux that passes though that
part of the boundary will not be seen from the asymptotic geometry.
It is not inconceivable that if we consider the black rings on Taub-Nut spaces like in [31, 32, 33]
and obtain a 4d black hole which satisfies our criteria one may yet be able to recover the charges
of such black rings.
References
[1] A. Sen, “Black hole entropy function and the attractor mechanism in higher derivative
gravity,” JHEP 0509, 038 (2005) [arXiv:hep-th/0506177].
[2] P. Kraus and F. Larsen, “Microscopic black hole entropy in theories with higher derivatives,”
JHEP 0509, 034 (2005) [arXiv:hep-th/0506176].
[3] D. Astefanesei, K. Goldstein, R. P. Jena, A. Sen and S. P. Trivedi, “Rotating attractors,”
JHEP 0610, 058 (2006) [arXiv:hep-th/0606244].
[4] J. P. Gauntlett, R. C. Myers and P. K. Townsend, “Black holes of D = 5 supergravity,”
Class. Quant. Grav. 16, 1 (1999) [arXiv:hep-th/9810204].
[5] R. M. Wald, “Black hole entropy in the Noether charge,” Phys. Rev. D 48, 3427 (1993)
[arXiv:gr-qc/9307038].
[6] P. J. Silva, “Thermodynamics at the BPS bound for black holes in AdS,” JHEP 0610, 022
(2006) [arXiv:hep-th/0607056].
[7] J. Lee and R. M. Wald, “Local symmetries and constraints,” J. Math. Phys. 31, 725 (1990).
[8] M. Rogatko, “First law of black rings thermodynamics in higher dimensional Chern-Simons
gravity,” Phys. Rev. D 75, 024008 (2007) [arXiv:hep-th/0611260].
[9] T. Ortin, “Gravity And Strings,” (Cambridge University Press, Cambridge, England, 2004)
[10] G. L. Cardoso, J. M. Oberreuter and J. Perz, “Entropy function for rotating extremal black
holes in very special geometry,” JHEP 0705, 025 (2007) [arXiv:hep-th/0701176].
[11] K. Goldstein and R. P. Jena, “One entropy function to rule them all,” arXiv:hep-th/0701221.
[12] J. B. Gutowski and H. S. Reall, “Supersymmetric AdS5 black holes,” JHEP 0402, 006
(2004) [arXiv:hep-th/0401042].
[13] J. P. Gauntlett, J. B. Gutowski and N. V. Suryanarayana, “A deformation of AdS5 × S5,”
Class. Quant. Grav. 21, 5021 (2004) [arXiv:hep-th/0406188].
[14] A. Sinha, J. Sonner and N. V. Suryanarayana, “At the horizon of a supersym-
metric AdS5 black hole: Isometries and half-BPS giants,” JHEP 0701, 087 (2007)
[arXiv:hep-th/0610002].
[15] P. Davis, H. K. Kunduri and J. Lucietti, “Special symmetries of the charged Kerr-
AdS black hole of D = 5 minimal gauged supergravity,” Phys. Lett. B 628, 275 (2005)
[arXiv:hep-th/0508169].
[16] J. C. Breckenridge, R. C. Myers, A. W. Peet and C. Vafa, “D-branes and spinning black
holes,” Phys. Lett. B 391, 93 (1997) [arXiv:hep-th/9602065].
http://arxiv.org/abs/hep-th/0506177
http://arxiv.org/abs/hep-th/0506176
http://arxiv.org/abs/hep-th/0606244
http://arxiv.org/abs/hep-th/9810204
http://arxiv.org/abs/gr-qc/9307038
http://arxiv.org/abs/hep-th/0607056
http://arxiv.org/abs/hep-th/0611260
http://arxiv.org/abs/hep-th/0701176
http://arxiv.org/abs/hep-th/0701221
http://arxiv.org/abs/hep-th/0401042
http://arxiv.org/abs/hep-th/0406188
http://arxiv.org/abs/hep-th/0610002
http://arxiv.org/abs/hep-th/0508169
http://arxiv.org/abs/hep-th/9602065
[17] Z. W. Chong, M. Cvetic, H. Lu and C. N. Pope, “General non-extremal rotating black
holes in minimal five-dimensional gauged supergravity,” Phys. Rev. Lett. 95, 161301 (2005)
[arXiv:hep-th/0506029].
[18] H. Elvang, R. Emparan, D. Mateos and H. S. Reall, “A supersymmetric black ring,” Phys.
Rev. Lett. 93, 211302 (2004) [arXiv:hep-th/0407065].
[19] T. Jacobson, G. Kang and R. C. Myers, “On Black Hole Entropy,” Phys. Rev. D 49, 6587
(1994) [arXiv:gr-qc/9312023].
[20] J. P. Gauntlett, N. Kim and D. Waldram, “Supersymmetric AdS(3), AdS(2) and bubble
solutions,” JHEP 0704, 005 (2007) [arXiv:hep-th/0612253].
[21] J. F. Morales and H. Samtleben, “Entropy function and attractors for AdS black holes,”
JHEP 0610, 074 (2006) [arXiv:hep-th/0608044].
[22] A. Chamblin, R. Emparan, C. V. Johnson and R. C. Myers, “Charged AdS black holes and
catastrophic holography,” Phys. Rev. D 60, 064018 (1999) [arXiv:hep-th/9902170].
[23] M. Cvetic et al., “Embedding AdS black holes in ten and eleven dimensions,” Nucl. Phys.
B 558, 96 (1999) [arXiv:hep-th/9903214].
[24] J. P. Gauntlett, R. C. Myers and P. K. Townsend, “Supersymmetry Of Rotating Branes,”
Phys. Rev. D 59, 025001 (1999) [arXiv:hep-th/9809065].
[25] R. C. Myers and M. J. Perry, “Black Holes In Higher Dimensional Space-Times,” Annals
Phys. 172, 304 (1986).
[26] B. Sahoo and A. Sen, “BTZ black hole with Chern-Simons and higher derivative terms,”
JHEP 0607, 008 (2006) [arXiv:hep-th/0601228].
[27] R. Emparan and D. Mateos, “Oscillator level for black holes and black rings,” Class. Quant.
Grav. 22, 3575 (2005) [arXiv:hep-th/0506110].
[28] S. W. Hawking and S. F. Ross, “Duality between electric and magnetic black holes,” Phys.
Rev. D 52, 5865 (1995) [arXiv:hep-th/9504019].
[29] V. Iyer and R. M. Wald, “A Comparison of Noether charge and Euclidean methods
for computing the entropy of stationary black holes,” Phys. Rev. D 52, 4430 (1995)
[arXiv:gr-qc/9503052].
[30] S. Dutta and R. Gopakumar, “On Euclidean and noetherian entropies in AdS space,” Phys.
Rev. D 74, 044007 (2006) [arXiv:hep-th/0604070].
[31] H. Elvang, R. Emparan, D. Mateos and H. S. Reall, “Supersymmetric 4D rotating black
holes from 5D black rings,” JHEP 0508, 042 (2005) [arXiv:hep-th/0504125].
[32] D. Gaiotto, A. Strominger and X. Yin, “5D black rings and 4D black holes,” JHEP 0602,
023 (2006) [arXiv:hep-th/0504126].
[33] D. Gaiotto, A. Strominger and X. Yin, “New connections between 4D and 5D black holes,”
JHEP 0602, 024 (2006) [arXiv:hep-th/0503217].
[34] J. B. Gutowski and H. S. Reall, “General supersymmetric AdS(5) black holes,” JHEP 0404,
048 (2004) [arXiv:hep-th/0401129].
[35] H. K. Kunduri, J. Lucietti and H. S. Reall, “Supersymmetric multi-charge AdS(5) black
holes,” JHEP 0604, 036 (2006) [arXiv:hep-th/0601156].
[36] H. K. Kunduri, J. Lucietti and H. S. Reall, “Do supersymmetric anti-de Sitter black rings
exist?,” JHEP 0702, 026 (2007) [arXiv:hep-th/0611351].
http://arxiv.org/abs/hep-th/0506029
http://arxiv.org/abs/hep-th/0407065
http://arxiv.org/abs/gr-qc/9312023
http://arxiv.org/abs/hep-th/0612253
http://arxiv.org/abs/hep-th/0608044
http://arxiv.org/abs/hep-th/9902170
http://arxiv.org/abs/hep-th/9903214
http://arxiv.org/abs/hep-th/9809065
http://arxiv.org/abs/hep-th/0601228
http://arxiv.org/abs/hep-th/0506110
http://arxiv.org/abs/hep-th/9504019
http://arxiv.org/abs/gr-qc/9503052
http://arxiv.org/abs/hep-th/0604070
http://arxiv.org/abs/hep-th/0504125
http://arxiv.org/abs/hep-th/0504126
http://arxiv.org/abs/hep-th/0503217
http://arxiv.org/abs/hep-th/0401129
http://arxiv.org/abs/hep-th/0601156
http://arxiv.org/abs/hep-th/0611351
	Introduction
	Charges from Noether-Wald construction
	Review of Noether construction
	The Noether-Wald charges for type IIB supergravity
	The Noether-Wald charges for 5d Einstein-Maxwell-CS
	Reduction from 10 dimensions
	Charges from dimensional reduction
	Dimensional reduction
	Oxidation of the angular momentum
	Generalization and Limitations
	Relation to the Asymptotes
	Gauge Issues
	Examples
	The 10d Gutowski-Reall black hole
	5d Black Holes
	Equal Angular Momenta: BMPV and GR
	Non-equal Angular Momenta: Supersymmetric Black Holes
	Charges from the entropy function
	Completing the equations of motion
	Gauge invariance from boundary terms
	Thermodynamic Charges
	Poincaré Time Hamiltonian
	Entropy
	First Law
	Entropy Function and the Euclidean Action
	Mass
	Conclusions
	Black Rings
ABSTRACT
  We describe how to recover the quantum numbers of extremal black holes from
their near horizon geometries. This is achieved by constructing the
gravitational Noether-Wald charges which can be used for non-extremal black
holes as well. These charges are shown to be equivalent to the U(1) charges of
appropriately dimensionally reduced solutions. Explicit derivations are
provided for 10 dimensional type IIB supergravity and 5 dimensional minimal
gauged supergravity, with illustrative examples for various black hole
solutions. We also discuss how to derive the thermodynamic quantities and their
relations explicitly in the extremal limit, from the point of view of the
near-horizon geometry. We relate our results to the entropy function formalism.

<|endoftext|><|startoftext|>
Introduction
While QCD was established as a fundamental theory of the strong interaction
a few decades ago, its realization in hadron physics has not been understood com-
pletely. For instance, (apparent) absence of “exotic” states, which are different from
ordinary qq̄ mesons and qqq baryons, has been a long standing problem. Therefore,
the announcement1) of the discovery of Θ+ (1540), whose minimal configuration is
uudds̄, was quite striking. For the current experimental status, we refer to Ref.2)
In this report, we review the theoretical effort to search the Θ+ pentaquark
state. The main issue here is whether QCD favors its existence or not, and the
determination of possible quantum numbers for the pentaquark families (if any). In
particular, in order to understand the narrow width of Θ+ observed in the experi-
ment, it is crucial to determine the spin and parity directly from QCD.
For this purpose, we employ two frameworks, the QCD sum rule and the lattice
QCD, where both allow the nonperturbative QCD calculation without models, and
have achieved a great success for ordinary mesons/baryons. Note, however, that
neither of them is infallible, and we consider them as complementary to each other.
For instance, the lattice simulation cannot be performed at completely realistic setup,
i.e., there often exists the artifact stemming from discretization error, finite volume,
heavy u,d-quark masses and neglection of dynamical quark effect (quenching), etc.
On the other hand, the sum rule can be constructed at realistic situation, and is free
from such artifacts in lattice. Unfortunately, it suffer from another type of artifact.
Because a sum rule yields only the dispersion integral of spectrum, an interpretive
model function have to be assumed phenomenologically. Compared to the ordinary
hadron analyses, this procedure may weaken the predictability for the experimentally
uncertain system, such as pentaquarks. Another artifact in the sum rule is the OPE
truncation: one have to evaluate whether the OPE convergence is enough or not.
We also comment on the important issue common to both of the methods. Recall
that the decay channel Θ+ → N +K is open experimentally. Considering also that
both methods calculate a two-point correlator and seek for a pentaquark signal in it,
it is essential to develop a framework which can distinguish the pentaquark from the
NK state in the correlator. In the subsequent sections, we examine the literatures
and see how the above-described issues have been resolved or remain unresolved.
∗) e-mail address: doi@pa.uky.edu
typeset using PTPTEX.cls 〈Ver.0.9〉
http://arxiv.org/abs/0704.0959v1
2 T. Doi
§2. The QCD Sum Rule Work
More than ten sum rule analyses forΘ+ spectroscopy exist for J = 1/2.3), 4), 5), 6), 7), 8), 9), 10), 11), 12)
The first parity projected sum rule was studied by us5) for I = 0. The posi-
tivity of the pole residue in the spectral function is proposed as a signature of
the pentaquark signal. This is superior criterion to the consistency check of pre-
dicted/experimental masses, because it is difficult to achieve the mass prediction
within 100MeV (∼ [m(Θ+) −m(NK)]) accuracy. We also propose the diquark ex-
otic current J5q = ǫ
abcǫdef ǫcfg(uTaCdb)(u
d Cγ5de)Cs̄
g , in order to suppress the NK
state contamination. The OPE is calculated up to dimension 6, checking that the
highest dimensional contribution is reasonably small. We obtain a possible signal in
negative parity.
In the treatment of the NK state, improvement is proposed in Ref.7) There,
NK contamination is evaluated using the soft-Kaon theorem. Note here that the
NK contamination calculated by two-hadron reducible (2HR) diagrams in the OPE
level6) is invalid because what have to be calculated is the 2HR part in the hadronic
level, not in the QCD (OPE) level. The reanalysis7) of sum rule up to dimension 6
shows that the subtraction of the NK state does not change the result of Ref.5)
Yet, as described in Sec.1, the above sum rules may suffer from the OPE trun-
cation artifact. In fact, the explicit calculation up to higher dimension have shown
that this is indeed the case.10), 11), 12) Here, we refer to the elaborated work in Ref.12)
They calculate the OPE for I(JP ) = 0(1/2±) up to dimension D = 15. It is shown
that the terms with D > 6 are important as well, while further high dimensional
terms D > 15 are not significant. Another idea in Ref.12) is the use of the combi-
nation of two independent pentaquark sum rules. In fact, the proper combination
is found to suppress the continuum contamination drastically, which corresponds to
reducing the uncertainty in the phenomenological model function. Examining the
positivity of the pole residue, they conclude the pentaquark exists in positive parity.
Does the result12) definitely predict the JP = 1/2+ pentaquark ? At this mo-
ment, we conservatively point out remaining issues. The first problem is still the
NK contamination. While such contamination is expected to be partly suppressed
through the continuum suppression, it is possible that the obtained signal corre-
sponds to just scattering states. In this point, Ref.12) argues that the signal has
different dependence on the parameter 〈q̄q〉 from the NK state. We, however, con-
sider this discussion uncertain, because 〈q̄q〉 is not a free parameter independent of
other condensates. For further study, the explicit estimate in the soft-Kaon limit7) is
interesting check, but the calculation up to high dimension has not been worked out
yet. Second issue is related to the OPE. In the evaluation of the high dimensional
condensates, one have to rely on the vacuum saturation approximation practically,
while the uncertainty originating from this procedure is not known. Furthermore,
there exists an issue for the validity of the OPE itself when considering the sum rule
with high dimensionality. In fact, rough analysis of the gluonic condensates shows13)
that the nonperturbative OPE may break down around D >∼ 11− 16. One may have
to consider this effect as well, through, for instance, the instanton picture.11)
So far, we have reviewed J = 1/2 sum rules. While there are J = 3/2 works,14), 15)
Theoretical Status of Pentaquarks 3
it is likely that they suffer from slow OPE convergence. Further progress is awaited.
§3. The Lattice QCD Work
There are a dozen of quenched lattice calculations:16), 17), 18), 19), 20), 21), 22), 23), 24), 25), 26), 27), 28)
some of them16), 17), 23), 25) report the positive signal, while others18), 19), 20), 21), 26), 24)
report null results. This apparent inconsistency, however, can be understood in a
unified way, by taking a closer look at the “interpretation” of the numerical results
and the pending lattice artifact.
As discussed in Sec.1, the question is how to identify the pentaquark signal in
the correlator, because the correlator at large Euclidian time is dominated by the
ground state, the NK scattering state. In this point, we develop a new method in
Ref.19), 26) Intuitively, this method makes use of that a scattering state is sensitive to
the spacial boundary condition (BC), while a compact one-particle state is expected
to be insensitive. Practically, we calculate the correlator under two spacial BCs: (1)
periodic BC (PBC) for all u,d,s-quarks, (2) hybrid BC (HBC) where anti-periodic
BC for u,d-quarks and periodic BC for s-quark. The consequences are as follows. In
PBC, all of Θ+, N, K are subject to periodic BC. In HBC, while Θ+(uudds̄) remains
subject to periodic BC, N(uud,udd) and K(s̄d,s̄u) are subject to anti-periodic BC.
Therefore, the energy of NK will shift by PBC → HBC due to the momentum of
N and K, while there is no energy shift for Θ+. (Recall that the momentum is
quantized on lattice as 2~nπ/L for periodic BC and (2~n + 1)π/L for anti-periodic
BC, with spatial lattice extent L and ~n ∈ ZZ3.) In this way, the different behavior
between NK and Θ+ can be used to identify whether the signal is NK or Θ+. We
simulate the anisotropic lattice, β = 5.75, V = 123 × 96, aσ/aτ = 4, with the clover
fermion. The conclusion is: (1) the signal in 1/2− is found to be s-wave NK from
HBC analysis. No pentaquark is found up to ∼ 200MeV above the NK threshold.
(2) the 1/2+ state is too massive (> 2GeV) to be identified as Θ+(1540).
In comparison with other lattice results, we introduce another powerful method18)
to distinguish Θ+ from NK. This method makes use of that the volume dependence
of the spectral weight behaves as O(1) for one-particle state, and as O(1/L3) for
two-particle state. Intuitively, the latter factor O(1/L3) can be understood as the
encounter probability of the two particles. The calculation18) of the spectral weight
from 163 × 28 and 123 × 28 lattices reveals that the ground states of both the 1/2±
channels are not the pentaquark, but the scattering states. Further analysis is per-
formed in Ref.23) There, the 1st excited state in 1/2− is extracted with 2 × 2 vari-
ational method. The volume dependence of the spectral weight indicates that the
1st excited state is not a scattering state but a pentaquark state. This is consistent
with Ref.,27) where 19× 19 variational method is used to extract the excited states.
Note here that this results is consistent with the HBC analysis.19) In fact,
HBC analysis exclude the pentaquark up to ∼ 200MeV above threshold, while the
resonance observed in Ref.23) locates 200-300MeV above the threshold. The question
is whether the observed resonance is really Θ+ which experimentally locates 100MeV
above the threshold. To address this question, explicit simulation is necessary at
physically small quark mass without quenching. In particular, small quark mass
4 T. Doi
would be important considering that Refs.19), 23) are simulated at rather heavy quark
masses and expected to suffer from large uncertainty in the chiral extrapolation.
Finally, we discuss the JP = 3/2± lattice results. We performed the com-
prehensive study26) with three different operators and conclude that all the lattice
signals are too massive (> 2GeV) for Θ+, and are identified as not pentaquarks but
scattering states from the HBC analysis. On the other hand, Ref.25) claims that a
pentaquark candidate is found in 3/2+. We, however, observe that the latter result
are contaminated by significantly large statistical noise, which makes their result
quite uncertain. Note also that their criterion to distinguish Θ+ from scattering
states is based on rather limited argument compared to the HBC analysis.
§4. Conclusions
We have examined both of the QCD sum rule and lattice QCD works. In the
sum rule, progresses in OPE calculation and continuum suppression have achieved
stable analysis, while the subtraction of NK contamination remains a critical issue.
In the lattice, the framework which distinguish the pentaquark from NK have been
successfully established. In order to resolve the superficial inconsistency in the lattice
prediction, the calculation at small quark mass without quenching is highly desirable.
Acknowledgements
This work is completed in collaboration with Drs. H.Iida, N.Ishii, Y.Nemoto,
M.Oka, F.Okiharu, H.Suganuma and J.Sugiyama. T.D. is supported by Special Post-
doctoral Research Program of RIKEN and by U.S. DOE grant DE-FG05-84ER40154.
References
1) LEPS Collaboration, T. Nakano et al., Phys. Rev. Lett. 91 (2003), 012002.
2) T. Nakano, in these proceedings.
3) S.-L. Zhu, Phys. Rev. Lett. 91 (2003), 232002.
4) R.D. Matheus et al., Phys. Lett. B 578 (2004), 323.
5) J. Sugiyama, T. Doi, and M. Oka, Phys. Lett. B 581 (2004), 167.
6) Y. Kondo, O. Morimatsu, T. Nishikawa, Phys. Lett. B 611 (2005), 93.
7) S.H. Lee, H. Kim, Y. Kwon, Phys. Lett. B 609 (2005), 252.
8) Y. Kwon, A. Hosaka and S.H. Lee, hep-ph/0505040 (2005).
9) M. Eidemuller, Phys. Lett. B 597 (2004), 314.
10) B.L. Ioffe and A.G. Oganesian, JETP Lett. 80 (2004), 386, R.D. Matheus and S. Narison,
Nucl. Phys. Proc. Suppl. 152 (2006), 236, A.G. Oganesian, hep-ph/0510327 (2005).
11) H.-J. Lee et al., Phys. Rev. D 73 (2006), 014010, ibid., Phys. Lett. B 610 (2005), 50.
12) T. Kojo, A. Hayashigaki, D. Jido, Phys. Rev. C 74 (2006), 045206.
13) M.A. Shifman, A.I. Vainshtein and V.I. Zakharov, Nucl. Phys. B 147 (1979), 385, 448.
14) T. Nishikawa et al., Phys. Rev. D 71 (2005), 016001, 076004.
15) W. Wei, P.-Z. Huang and H.-X. Chen and S.-L. Zhu, JHEP 0507 (2005), 015.
16) F. Csikor et al., JHEP 0311 (2003), 070, S. Sasaki, Phys. Rev. Lett. 93 (2004), 152001.
17) T.W. Chiu and T.H. Hsieh, Phys. Rev. D 72 (2005), 034505.
18) N. Mathur et al., Phys. Rev. D 70 (2004), 074508.
19) N. Ishii et al., Phys. Rev. D 71 (2005), 034001.
20) B.G. Lasscock et al., Phys. Rev. D 72 (2005), 014502.
21) F. Csikor et al., Phys. Rev. D 73 (2006), 034506.
22) C. Alexandrou and A. Tsapalis, Phys. Rev. D 73 (2006), 014507.
23) T.T. Takahashi, T. Kunihiro, T. Onogi, and T. Umeda, Phys. Rev. D 71 (2005), 114509.
24) K. Holland, and K.J. Juge, Phys. Rev. D 73 (2006), 074505.
http://arxiv.org/abs/hep-ph/0505040
http://arxiv.org/abs/hep-ph/0510327
Theoretical Status of Pentaquarks 5
25) B.G. Lasscock et al., Phys. Rev. D 72 (2005), 074507.
26) N. Ishii, T. Doi, Y. Nemoto, M. Oka and H. Suganuma, Phys. Rev. D 72 (2005), 074503.
27) O. Jahn, J.W. Negele and D. Sigaev PoS LAT2005 (2006), 069.
28) C. Hagen, D. Hierl and A. Schafer, Eur. Phys. J. A 29 (2006), 221.
	Introduction
	The QCD Sum Rule Work
	The Lattice QCD Work
	Conclusions
ABSTRACT
  We review the current status of the theoretical pentaquark search from the
direct QCD calculation. The works from the QCD sum rule and the lattice QCD in
the literature are carefully examined. The importance of the framework which
can distinguish the exotic pentaquark state (if any) from the NK scattering
state is emphasized.

<|endoftext|><|startoftext|>
Introduction
Magnetic cataclysmic variables (CV) are accreting binary sys-
tems in which material transfers from a dwarf secondary star
onto a magnetic (∼5 < B < ∼250 MG) white dwarf (WD)
through Roche lobe overflow. Polars or AM Her systems with
magnetic fields larger than ∼ 10 MG stand out among magnetic
CVs because the spin period of the primary WD is synchronized
with the orbital period of the system. Unlike non-magnetic or
low-magnetic accreting binaries, they have neither a disk nor
the capacity to accumulate the transferred matter, so the bulk of
flux of these systems comes from the accretion flow, particularly
around magnetic poles. Therefore, their luminosity is sensitive
to the mass transfer rate Ṁ. Polars are known to have highs and
lows in their luminosity state, which is directly dependent on Ṁ.
In recent years a number of polars were identified with ex-
tremely low accretion rates. They are commonly called LARPs,
a name coined by Schwope et al. (2002). Their mass accretion
rate is estimated to be about a few 10−13 M⊙/yr, two orders
of magnitude below the average for CVs and they are distin-
guished for their prominent cyclotron emission lines on top of
otherwise featureless blue continua. The first two LARPs, in-
cluding the subject of this study, were discovered in the course
of the Hamburg QSO survey, thanks to a broad variable fea-
ture in the spectra subsequently identified with cyclotron lines
(Reimers et al. 1999; Reimers & Hagen 2000, hereafter RH).
Later, another newly identified magnetic CV from the list of
Send offprint requests to: G. Tovmassian
⋆ Visiting research fellow at Center for Astrophysics and Space
Sciences, University of California, San Diego, 9500 Gilman Drive, La
Jolla, CA 92093-0424, USA
⋆⋆ PO Box 439027, San Diego, CA, 92143-9024, USA
ROSAT sources (RX J1554.2+2721) was spotted in the low
state with a spectrum identical to LARPs (Tovmassian et al.
2001, 2004). Intrigued by that discovery, we conducted a blitz
campaign to check if canonical LARPs, namely HS 1023+3900
and HS 0922+1333, might be caught in a high state as well.
Since both objects had only recently been discovered and had
very limited observational coverage, we obtained one full binary
orbital period of spectral observations. Our instrumental setup
provided higher spectral resolution than the original discovery
observation by RH. We observed emission from the Hα line ap-
parently arising from the irradiated surface of the secondary star
facing the hot accreting spot on the WD and Na i infrared dou-
blet from the cooler parts of the secondary star (Tovmassian et al.
2004). The derived radial velocity (RV) curve from that obser-
vation did not fold well with the period estimated in the dis-
covery paper. However, the limited time coverage undermined
our ability to measure the period properly. We could only state
that the period might be exceeding what was reported by RH by
at least 1.14 times, corresponding to Pspin/Porb=0.88. It should
be noted that RH determined their period from the cyclotron
hump cycles and thus, they measured the WD spin period rather
than the binary orbital period. It would be quite usual to find
some degree of de-synchronization between the spin period of
the WD and orbital period. Nevertheless, the difference in peri-
ods was too large for an asynchronous polar and too small for an
intermediate polar. The latter mostly follow the empirical ratio
Pspin/Porb ∼ 0.1. In rare cases, Pspin/Porb ∼ 0.25 (see e.g. Norton
et al. 2004). There are also theoretical restrictions on a kind of
ratio that was indicated by our observation as evident from the
Norton et al. (2004) paper. Therefore, we conducted a new se-
ries of observations in order to establish the orbital period of the
compact binary and to classify it properly. This brief paper anal-
http://arxiv.org/abs/0704.0961v1
2 Tovmassian & Zharikov: Orbital period of the HS 0922+1333
yses a combined set of observations and discusses the reasons
that led us to an erroneous conclusion in 2004.
In Sect.2 we describe our observations and the data reduc-
tion. The data analysis and the results are presented in Sect.3,
and conclusions are drawn in Sect.4.
2. Observations and reduction
Sets of observations were collected over a four-year period and
analyzed. All observations of HS 0922+1333 reported here were
obtained at the Observatorio Astrónomico Nacional San Pedro
Martir, México. The B&Ch spectrograph installed at the 2.1 me-
ter telescope was used for the extensive spectroscopy, while a
1.5 m telescope was used to obtain simultaneous photometry
during the 2003 March run. In the first observations, upon which
Tovmassian et al. (2004) depended, we used a 600 l/mm grating
centered in the optical IR range (6200 – 8340 Å) to achieve a
spectral resolution of 4.2 Å FWHM in a sequence of 900 sec ex-
posures covering one orbital period. The controversy over the
periods led us to re-observe the object during three nights in
March 2003. This time we utilized the highest available grat-
ing of 1200 l/mm. The spectral resolution reached 2.2 Å FWHM
covering the 6100 – 7200 Å range. Later we collected more ob-
servations with lower resolution to refine the orbital period and
properly classify the secondary star.
In all observations an SITe 1024 × 1024 24 µm pixel CCD
was used to acquire the data. The slit width was usually set to 2.′′0
and oriented in the E–W direction. He-Ar arc lamp exposures
were taken at the beginning and end of each run for wavelength
calibration.
In 2003 March observations we conducted simultaneously
with differential photometry. Exposure times were 40–60 sec
with an overall time resolution of about 80–100 sec using the
Johnson-Cousins Rc filter.
The reduction of data was done in a fairly standard manner.
The bulk of reduction was performed using IRAF1 procedures,
except for removing of cosmic rays by a corresponding program
in MIDAS2, as this is an easier and more reliable tool. The bi-
ases were taken at the beginning and end of the night and were
subtracted after being combined using the CCD overscan area
for control of possible temperature-related variations during the
night. We did not do flat field correction for spectral observations
and used blank sky images taken at twilight for direct images.
The flux calibration was done by observing a spectrophotomet-
ric standard star. Feige 34 was observed during a 2002 run and
G191-B2B during the rest of the observations.
The wavelength calibration is routinely done by observing
a He–Ar arc lamp at the beginning and end of a sequence on
the object or every 2 hours if the sequence is too long. Then the
wavelength solutions calculated for each arc-lamp exposure and
an average of preceding and succeeding images are applied to
the object observed in between. The wavelength solutions are
usually good to a few 1/10 of an Angstrom, while deviations due
to the telescope position and flexations of the spectrograph can
exceed that by an order of magnitude. Usually, that does not pose
a problem since we work with moderate resolutions and the am-
plitude of radial velocity variation is on the order of hundreds
of km/sec. The sensible way of checking and correcting wave-
length calibration is to measure the night sky lines. We mea-
1 http://iraf.noao.edu
2 ESO-MIDAS is the acronym for the European Southern
Observatory Munich Image Data Analysis System which is developed
and maintained by the European Southern Observatory
Fig. 1. The CLEANed power spectrum of the RV variation is
presented by a solid line. The dashed line is the power spectrum
of photometric data. Vertical axes on the right side correspond
to the photometric power scale.
sured several lines by selecting unblended ones located close to
Hα and Na I. The measurements of sky lines show a clear trend
and indicate the scale of errors that one can incur depending on
the telescope inclination. Although the trend is unusually steep,
reaching 30 km/sec over 4 hours of observation, the scatter of
points around a linear fit is relatively small, which defines the
error of the measurements (rms) and is ≤8 km/sec. Nevertheless,
the error bars in the corresponding plots reflect the entire range
of deviation just to demonstrate the scale of corrections applied
to the data. The deviations of the linear fit to the measured night
sky lines (with an average of 2 night sky lines around each mea-
sured line) from the rest value were used to correct the wave-
length calibration by the corresponding amount.
3. Orbital period and system parameters
We measured the Hα line in the 2002 spectra with single
Gaussian fits. The resulting RV curve was reasonably smooth
and sinusoidal, but the ends of the curve would not overlap when
folded with the period reported by RH (see Fig. 5 in 2004). We
speculated that the actual orbital period is longer than the one de-
rived from the photometry. However, the measurements of new
spectra obtained in 2003 do not show such a large discrepancy,
and the period analysis of the combined dataset easily reveals
that the true period is indeed 4.0395 hours and coincides with the
photometric period derived from the synchrotron lines variabil-
ity within errors of measurement. The combination of data taken
years apart and several nights in a row each year allowed us to
determine the period very precisely. We applied the CLEAN pro-
cedure (Roberts et al. 1987) to sort out the alias periods resulting
from the uneven data sampling and daily gaps and obtained a
strong peak in the power spectrum at the 5.94131 ± 0.00065 cy-
cles/day, corresponding to a 4.0395 ± 0.0001 hour period (see
Fig. 1). Simultaneous with spectroscopy, we obtained photome-
Tovmassian & Zharikov: Orbital period of the HS 0922+1333 3
Table 1. Log of observations of HS 0922+1333
Date HJD+ Telescope Instrument/Grating Range/Band Exp.Time Duration
Spectroscopy 24530000 Num. of Integrations
2002 02 04 2309 2.1m B&Ch1 600l/mm 6200-8340Å 900s×19 4.5h
2003-03-25 2723 2.1m B&Ch 1200l/mm 6100-7200Å 900s×13 3.3h
2003-03-26 2724 2.1m B&Ch 1200l/mm 6100-7200Å 900s×6 1.3h
2003-03-27 2725 2.1m B&Ch 1200l/mm 6100-7200Å 900s×15 2.6h
2005-10-27 3670 2.1m B&Ch 400l/mm 6100-9200Å 900s×15 2.2h
2005-10-29 3672 2.1m B&Ch 400l/mm 6100-9200Å 900s×11 1.7h
2005-10-30 3673 2.1m B&Ch 400l/mm 6100-9200Å 900s×8 1.2h
2006-01-18 3673 2.1m B&Ch 400l/mm 5800-8900Å 1800s×8 3.7h
Photometry
2003-03-26 2724 1.5m RUCA2 R 120s×107 3.8h
2003-03-27 2725 1.5m RUCA R 120s×99 3.4h
2003-03-28 2726 1.5m RUCA R 120s×99 3.4h
1 B&Ch - Boller & Chivens spectrograph (http://haro.astrospp.unam.mx/Instruments/bchivens/bchivens.htm)
2 RUCA - CCD photometer (http://haro.astrospp.unam.mx/Instruments/laruca/laruca intro.htm)
Fig. 2. The light curve of HS 0922+1333 obtained in filter Rc
and folded with the orbital period. The phasing is according to
spectroscopic data.
try in Rc band that partially includes the strongest cyclotron line.
It is a dominant contributor to the light curve (Fig.2), so we can
use it to determine the spin period of the WD. The power spec-
trum calculated from photometry gives exactly the same result,
but the peak is broader, because the data lacks a longer time base.
In Fig.1 the power spectra of spectral and photometric data are
presented together.
It is clear that this system is a synchronous magnetic cata-
clysmic variable. The spin period of its white dwarf primary is
locked with the orbital and is not shorter, as suspected earlier.
We explored the cause of confusion. First of all, we corrected all
measured radial velocities using the night sky lines to remove
the trends. This decreased the gap a little between points in the
2002 data in phases 0.0 through 0.2 where they were not overlap-
ping. But even taking errors related to the wavelength calibration
into account, they still do not fold properly (see the open (blue)
square symbols in Fig.3). What is more interesting, however, is
that the amplitude of the radial velocity variation has a much
higher value in the 2002 data than in 2003 data, as measured
with single Gaussians. The careful examination of the 2003 data,
with twice the spectral resolution than in 2002, reveals that at the
bottom of the Hα emission line there is a weak and broad bump
present in most phases. We de-blended the Hα line from 2003
observations using two Gaussian components in the IRAF splot
procedure. The result is shown on the right side of the left panel
of Fig.3 only (positive phases). The strong, narrow component
basically coincides with the single Gaussian measurements. But
the weak broad component appears to show a much larger am-
plitude and reveals itself mainly between phases 0.3 through 0.9.
This component is clearly identified as the heated matter leaving
the Lagrangian L1 point, the nozzle where the accretion stream
forms. Outflowing matter has intrinsic velocity, so at phase 0.75
when the secondary star reaches maximum velocity toward the
observer, it tilts the weight of the emission line toward larger
velocity. Its phasing appears to be similar to a high-velocity
component (HVC) detected routinely in polars (Schwope et al.
1997, Tovmassian et al. 1999) that originates in the ballistic part
of the stream. In the lower-resolution spectra this component
could not be separated, therefore the radial velocity curve be-
came stretched and deformed. That and the short time coverage
limited to just a little over one orbital period led to the misinter-
pretation of the 2002 spectral data. It is very interesting that we
were able to distinguish the accretion flow onset. So far these ob-
jects have been known to show only the synchrotron humps as an
evidence of accretion processes taking place in them (Schwope
et al. 2002).
The RV curve derived from sodium lines (see Fig.3) gives the
measure of the rotation of the center of mass of the secondary in
the orbital plane, while the narrow component of the Hα line
originates from the front side of the elliptically distorted sec-
ondary. The ephemerides of HS 0922+1333 from the RV mea-
surements can be described as
T0 = HJD 2452308.336+ 0.
d168313[200]× E,
where T0 corresponds to the −/+ crossing of the RV curve as fol-
lows from the fitting of sinusoid to the RV measurements of Hα
and sodium lines separately according to the following equation:
V(t) = γ + K × sin(2π(t − t0)/Porb)
http://haro.astrospp.unam.mx/Instruments/bchivens/bchivens.htm
http://haro.astrospp.unam.mx/Instruments/laruca/laruca_intro.htm
4 Tovmassian & Zharikov: Orbital period of the HS 0922+1333
Fig. 3. The radial velocity curve of Hα line (left panel) and of Na i doublet (right). The open squares (blue) in the left panel represent
2002 data obtained with lower spectral resolution. The filled squares (red) are measurements of 2003 observations with single
Gaussian fitting. The error bars on the left side of the plots reflect the amplitude of wavelength corrections. The points are placed at
the correct positions after trend removal. The right side of the plot presents measurements of the 2003 data but with double Gaussian
de-blending of the line. The filled (green) circles are from the stronger line component originating at the irradiated secondary, the
open circles correspond to a much weaker component coming from the stream. In the right panel, measurements of the Na I lines
are presented from 2002 observations. The filled square symbols denote RV of λ 8197 Å measured with Gaussian deblending, after
velocity correction with sky lines. The open squares and triangles are measurements of the same doublet with single gaussians
(squares λ 8185Å and triangles λ 8197 Å). The diamonds are measurements of the λ 8185 Å line from 2006 observations. The curve
is a result of sin fit to the combined data. Note that scales of y-axes of panels are different.
Table 2. Radial velocity parameters of HS 0922+1333.
Line γ K Residuals
km/sec km/sec km/sec
Hα 36.6±7 132±12 25.1
Na I 8185Å -81±11 162±17 29.5
Na I 8197Å -65±13 139±20 32.7
Corresponding numbers derived from the fitting are pre-
sented in the Table 2. Unfortunately, due to the large errors there
is no marked difference between the semi-amplitude of radial
velocities between Hα and Na I lines. Otherwise, knowing the
spectral type of the secondary, we could deduce the basic pa-
rameters of the binary since that difference reflects the size of
the Roche lobe of the secondary.
The spectrum of the secondary in the absence of an accretion
disk is clearly seen, and in the phases when the magnetic ac-
creting spot that is radiating strong synchrotron emission is self-
eclipsed, one can see undisturbed secondary spectrum in the near
infrared range. In the Fig. 4 the flux calibrated spectrum of the
object obtained at phase 0.5 is presented. Overplotted are stan-
dard spectra of M3 to M5 main sequence stars (Pickles 1998)
normalized to the object. The WD’s contribution has not been
removed. However, at wavelengths above 6500 Å, its contribu-
tion is apparently insignificant and a good accordance emerges
between the object and M4 V standard star. This is also consis-
tent with what is expected from the Porb - spectral type II rela-
tion (Beuermann 2000), although the secondary is a M3.5 star
according to RH. The masses of secondaries in systems with pe-
Fig. 4. The spectrum of HS 0922+1333is presented by the solid
line. For comparison the standard spectra of M3-M5 stars are
plotted from the Pickles (1998)
Tovmassian & Zharikov: Orbital period of the HS 0922+1333 5
V (km/s)
V (km/s)
V (km/s)
V (km/s)
V (km/s)
Fig. 5. The Doppler maps of HS 0922+1333. On the top the to-
mograms of Hα emission line are presented in two panels with
different contrast levels to emphasize the concentration of the
emitting region on the facing side of the secondary on the left
and possibly some trace of mass transfer stream on the right.
The curved lines in the top panels correspond to the stream tra-
jectory, with numbers in the top right panel indicating stream
azimuth. The tomogram corresponding to the Na I line is placed
below in the right corner. The circle-shaped emission around the
center of mass is caused by the presence of the component of the
doublet line. In the bottom left corner, the observed and recon-
structed trailed spectra of Hα line (above) and Na I (below) are
presented.
riods similar to HS 0922+1333 range from 0.35 to 0.42 M⊙, in
those cases where the mass could be estimated precisely. Such
a secondary would follow the empirical mass-period and radius-
period relations from Smith & Dhillon (1998)
M2/M⊙ = 0.126(11) P(h) − 0.11(4)
R2/R⊙ = 0.117(4) P(h) − 0.041(18), (1)
Observations with higher resolution in the near IR will per-
mit investigators to precisely measure the difference between the
RV of Hα originating at the facing side of the secondary and
sodium absorption lines reflecting the motion of the center of
mass. Subsequently, it should allow for estimating the observed
radius of the star to check the possibility that it fills the Roche
lobe. For now, we can only assume that the mass transfer pro-
ceeds in a way similar to other polars, based on the detection of
a high velocity component in the emission line. Its presence can
also be illustrated by constructing Doppler tomograms.
Doppler tomography (Marsh & Horne 1988; Marsh 2001) is
a powerful tool in cases like this, where the origin of line profiles
is bound to the orbital plane and the system has relatively high
inclination. We constructed Doppler maps, or tomograms, using
both the Hα emission line and the Na I λ8197Å absorption line
to prove the accuracy of our estimate of the binary parameters.
The tomograms in Fig.5 show that the Hα line is mostly confined
to the front side of the secondary, while the sodium absorption
fills the entire body of the secondary. However, the difference is
not very obvious. The reason for that appears to be the the lower
spectral resolution and fewer spectra employed.
4. Conclusions
1. We have determined the 4.0395 hours spectroscopic period
of the LARP HS 0922+1333 based on the radial velocity
measurements of Hα emission line originating at the irra-
diated secondary star. The derived value coincides within
measurement errors with the spin period of the system, thus
proving that the object is a synchronized polar.
2. The profiles of the Hα emission line in higher-spectral res-
olution observations turned out to be complex. They are
formed basically on the irradiated surface of the secondary
star, but they also show a small contribution from the matter
in close proximity to the L1 point. The matter escaping the
secondary shows RVs with higher velocity and a different
phase.
3. The Doppler tomograms tend to confirm detection of a
stream of transfer matter.
The parameters of the system that we have obtained are in-
teresting in the context of the model proposed by Webbink and
Wickramasinghe (2005). According to it, the LARPs are rela-
tively young and are still approaching their first Roche lobe over-
flow. The accretion is due to the capture of the wind material
from the secondary by the strong magnetic field of the primary.
We think that we see evidence of a faint stream common to the
polars that transfer material through the L1 point which is usually
due to the Roche lobe overflow. However, the wind will proba-
bly also cause a flow of matter through the same trajectory, so
it is difficult to say if the observation runs against the model.
The precise measurement of the secondary star size may help to
clarify this.
Acknowledgements. This study was supported partially by grant 25454 from
CONACyT. GT acknowledges the UC-MEXUS fellowship program enabling
him to visit CASS UCSD. The authors are grateful to the anonymous referee for
careful reading of the manuscript and valuable comments. We thank L.Valencic
for help in language related issues.
References
Beuermann, K. 2000, New Astronomy Review, 44, 93
Demircan, O., & Kahraman, G. 1991, Ap&SS, 181, 313
Marsh, T. R., Horne, K. 1988, MNRAS, 235, 269
Marsh, T. R. 2001, ”Doppler Tomography”, Astrotomography, Indirect Imaging
Methods in Observational Astronomy, Edited by H.M.J. Boffin, D. Steeghs
and J. Cuypers, Lecture Notes in Physics, vol. 573, p.1
Norton, A. J., Wynn, G. A., & Somerscales, R. V. 2004, ApJ, 614, 349
Pickles, A. J. 1998, PASP, 110, 863
Roberts, D. H., Lehar, J., Dreher, J. W. 1987, AJ, 93, 968
Reimers, D., & Hagen, H.-J. 2000, A&A, 358, L45
Reimers, D., Hagen, H.-J., & Hopp, U. 1999, A&A, 343, 157
Schwope, A. D., Mantel, K.-H., & Horne, K. 1997, A&A, 319, 894
Schwope, A. D., Brunner, H., Hambaryan, V., & Schwarz, R. 2002, ASP
Conf. Ser. 261: The Physics of Cataclysmic Variables and Related Objects,
261, 102
Smith, D. A., Dhillon, V. S. 1998, MNRAS, 301, 767
Tovmassian, G. H., et al. 1999, ASP Conf. Ser. 157: Annapolis Workshop on
Magnetic Cataclysmic Variables, 157, 133
Tovmassian, G. H., Greiner, J., Zharikov, S. V., Echevarrı́a, J., & Kniazev, A.
2001, A&A, 380, 504
Tovmassian, G., Zharikov, S., Mennickent, R., & Greiner, J. 2004, ASP
Conf. Ser. 315: IAU Colloq. 190: Magnetic Cataclysmic Variables, 315, 15
6 Tovmassian & Zharikov: Orbital period of the HS 0922+1333
Warner, B. 1995, Cataclysmic variable stars, Cambridge Astrophysics Series,
Cambridge, New York: Cambridge University Press, 1995
Webbink, R. F., & Wickramasinghe, D. T. 2005, Astronomical Society of the
Pacific Conference Series, 330, 137
	Introduction
	Observations and reduction
	Orbital period and system parameters
	Conclusions
ABSTRACT
  Context: The object HS 0922+1333 was visited briefly in 2002 in a mini survey
of low accretion rate polars (LARPs) in order to test if they undergo high
luminosity states similar to ordinary polars. On the basis of that short
observation the suspicion arose that the object might be an asynchronous polar
(Tovmassian et al. 2004). The disparity between the presumed orbital and spin
period appeared to be quite unusual. Aims: We performed follow-up observations
of the object to resolve the problem. Methods: New simultaneous spectroscopic
and photometric observations spanning several years allowed measurements of
radial velocities of emission and absorption lines from the secondary star and
brightness variations due to synchrotron emission from the primary. Results:
New observations show that the object is actually synchronous and its orbital
and spin period are equal to 4.04 hours. Conclusions: We identify the source of
confusion of previous observations to be a high velocity component of emission
line arousing from the stream of matter leaving L1 point.

<|endoftext|><|startoftext|>
Introduction
The rigidly rotating disk has long been recognized as a crucial ‘missing link’
in our historical reconstruction of Einstein’s recognition of the non-Euclidean
nature of spacetime in his path toward general relativity.1, 2 Relativistic rigid
rotation combines several different but related problems: the issue of a Lorentz-
covariant definition of rigid motion, the number of degrees of freedom of a rigid
body, the reality of length contraction,3 as well as Ehrenfest’s paradox4 and the
introduction of non-Euclidean geometric concepts into the theory of relativity.5
2 Relativistic rigid motion
A relativistic definition of rigid motion was first given by Max Born.6 The
definition was given in the context of a theory of the dynamics of a model of
an extended, rigid electron, and defined a rigid body as one whose infinitesimal
volume elements appear undeformed for any observer that is comoving instanta-
neously with the (center of the) respective volume element. The definition and
its implications were discussed at the 81st meeting of the Gesellschaft Deutscher
Naturforscher und Ärzte in Salzburg in late September 1909.
Gustav Herglotz and Fritz Noether, in papers received by the Annalen der
Physik on 7 and 27 December, respectively, further elaborated on the mathe-
matical consequences of Born’s definition.7 Herglotz, in particular, reformulated
∗To appear in: Proceedings of the Eleventh Marcel Grossmann Meeting on General Rela-
tivity, ed. H. Kleinert, R.T. Jantzen and R. Ruffini, World Scientific, Singapore, 2007.
http://arxiv.org/abs/0704.0962v1
the definition in more geometric terms: A continuum performs rigid motion if
the world lines of all its points are equidistant curves. The analysis showed that
Born’s infinitesimal condition of rigidity can only be extended to the motion of
a finite continuum in special cases. It implied that a rigid body has only three
degrees of freedom. The motion of one of its points fully determines its motion.
Translation and uniform rotation are special cases. In particular, the definition
does not allow for acceleration of a rigid disk from rest to a state of uniform
rotation with finite angular velocity.
In view of these consequences, various other definitions of a rigid body were
suggested, e.g. by Born and Noether,7, 8 until it became clear that special rel-
ativity does not allow for the usual concept of a rigid body. In other words, a
relativistic rigid body necessarily has an infinite number of degrees of freedom.9
On 22 November 1909, a short note appeared by Paul Ehrenfest pointing
to a paradox that follows from Born’s relativistic definition of rigid motion of
a continuum.10 He considered a rigid cylinder rotating around its axis and
contended that its radius would have to meet two contradictory requirements.
The periphery must be Lorentz-contracted, while its diameter would show no
Lorentz contraction. The difficulty became known as the “Ehrenfest paradox.”
In a polemic exchange with von Ignatowsky,11 Ehrenfest devised the following
thought experiment to illustrate the difficulty. He imagined the rotating disk
to be equipped with markers along the diameter and the periphery. If their
positions were marked onto tracing paper in the rest frame at a fixed instant,
with the disk both at rest and in uniform rotation, the two images should show
the same radius but different circumferences.
3 The Einstein-Varićak correspondence
Immediately after the 1909 Salzburg meeting, Einstein wrote to Arnold Som-
merfeld that “the treatment of the uniformly rotating rigid body seems to me of
great importance because of an extension of the relativity principle to uniformly
rotating systems.”12 This was a necessary step for Einstein following the heuris-
tics of his equivalence hypothesis, but only in spring 1912, a few weeks before
he made the crucial transition from a scalar to a tensorial theory of gravitation
based on a general spacetime metric,5 do we find another hint at the problem
in his writings.1, 2
The Collected Papers of Albert Einstein recently published13 nine letters by
Einstein to Vladimir Varićak (1865–1942), professor of mathematics at Agram
(now Zagreb, Croatia). Varićak had published on non-Euclidean geometry14 and
is known for representing special relativistic relations in terms of real hyperbolic
geometry.15, 16 The correspondence seems to have been initiated by Varićak ask-
ing for offprints of Einstein’s papers. In his response, Einstein added a personal
tone to it with his wife Mileva Marić, a native Hungarian Serb, writing the
address in Cyrillic script in order to raise Varićak’s curiosity. After exchanging
publications, Varićak soon commented on Einstein’s (now) famous 1905 special
relativity paper, pointing to misprints but also raising doubts about his treat-
ment of reflection of light rays off moving mirrors. These were rebutted by
Einstein in a response of 28 February 1910 in which he also, with reference to
Ehrenfest’s paradox, referred to the rigidly rotating disk as the “most interesting
problem” that the theory of relativity would presently have to offer. In his next
two letters, dated 5 and 11 April 1910 respectively, Einstein argued against the
existence of rigid bodies invoking the impossibility of superluminal signalling,
and also discussed the rigidly rotating disk. A resolution of Ehrenfest’s paradox,
suggested by Varićak, in terms of a distortion of the radial lines so as to preserve
the ratio of π with the Lorentz contracted circumference, was called interesting
but not viable. The radial and tangential lines would not be orthogonal in spite
of the fact that an inertial observer comoving with a circumferential point would
only see a pure rotation of the disk’s neighborhood.
About a year later, Einstein and Varićak corresponded once more. Varićak
had contributed to the polemic between Ehrenfest and von Ignatowsky by sug-
gesting a distinction between ‘real’ and ‘apparent’ length contraction. The real-
ity of relativistic length contraction was discussed in terms of Ehrenfest’s tracing
paper experiment, but for linear relative motion. According to Varićak, the ex-
periment would show that the contraction is only a psychological effect whereas
Einstein argued that the effect will be observable in the distance of the recorded
marker positions. When Varićak published his note, Einstein responded with a
brief rebuttal.17
Despite their differences in opinion, the relationship remained friendly. In
1913, Einstein and his wife thanked Varićak for sending them a gift, commented
favorably on his son who stayed in Zurich at the time, and Einstein announced
sending a copy of his recent work on a relativistic theory of gravitation. The
Einstein-Varićak correspondence thus gives us additional insights into a signifi-
cant debate. It shows Einstein’s awareness of the intricacies of relativistic rigid
rotation and bears testimony to the broader context of the conceptual clarifica-
tions in the establishment of the special and the genesis of the general theory
of relativity.
References
[1] J. Stachel, Einstein and the Rigidly Rotating Disk, in General Relativity
and Gravitation: One Hundred Years after the Birth of Albert Einstein.
Vol. 1, ed. A. Held (Plenum, 1980), 1–15; see also “The First Two Acts,”
in J. Stachel. Einstein from ‘B’ to ‘Z’ (Birkhäuser, 2002), 261–292.
[2] G. Maltese and L. Orlando. Stud. Hist. Phil. Mod. Phys. 26, 263 (1995).
[3] M. Klein et al. (ed.) The Collected Papers of Albert Einstein. Vol. 3. The
Swiss Years: Writings, 1909–1911. (Princeton University Press, 1993),
478–480.
[4] M. Klein. Paul Ehrenfest: The Making of a Theoretical Physicist. (North-
Holland, 1970), 152–154.
[5] M. Janssen, J. Norton, J. Renn, T. Sauer, J. Stachel. The Genesis of Gen-
eral Relativity: Einstein’s Zürich Notebook. Vol. 1. Introduction and Source.
Vol. 2. Commentary and Essays. (Springer, 2007).
[6] M. Born. Ann. Phys. 30, 1 (1909); Phys. Zs. 10, 814 (1909).
[7] G. Herglotz, Ann. Phys. 31, 393 (1910); F. Noether, Ann Phys. 31, 919
(1910).
[8] M. Born, Nachr. Königl. Ges. d. Wiss. (Göttingen) 161 (1910).
[9] A. Einstein, Jahrb. Radioaktiv. Elektr. 4, 411 (1907); M. Laue, Phys. Zs.
12, 85 (1911).
[10] P. Ehrenfest, Phys. Zs. 10, 918 (1909).
[11] P. Ehrenfest, Phys. Zs. 11, 1127 (1910); 12, 412 (1911); W.v.Ignatowsky,
Ann. Phys. 33, 607 (1910); Phys. Zs. 12, 164, 606 (1911).
[12] M. Klein et al. (ed.) The Collected Papers of Albert Einstein. Vol. 5. The
Swiss Years: Correspondence, 1902–1914. (Princeton University Press,
1993).
[13] D. Buchwald et al. (ed.) The Collected Papers of Albert Einstein. Vol. 10.
The Berlin Years: Correspondence, May–December 1920 and Supplemen-
tary Correspondence, 1909–1920. (Princeton University Press, 2006).
[14] V. Varićak. Jahresber. dt. Math. Ver. 17, 70 (1908); Atti del Cong. inter-
nat. del Mat. 2, 213 (1909).
[15] V. Varićak. Phys. Zs. 11, 93, 287, 586 (1910); Jahresber. dt. Math. Ver.
21, 103 (1912).
[16] S. Walter. The Non-Euclidean Style of Minkowskian Relativity, in The
Symbolic Universe. ed. J. Gray (Oxford University Press, 1999), 91–127.
[17] V. Varićak, Phys. Zs. 12, 169 (1911); A. Einstein. Phys. Zs. 12, 509 (1911).
	Introduction
	Relativistic rigid motion
	The Einstein-Varicak correspondence
ABSTRACT
  The historical significance of the problem of relativistic rigid rotation is
reviewed in light of recently published correspondence between Einstein and the
mathematician Vladimir Varicak from the years 1909 to 1913.

<|endoftext|><|startoftext|>
Introduction
Several years ago, it was discovered that Einstein had investigated the idea
of geometric stellar lensing more than twenty years before the publication
http://arxiv.org/abs/0704.0963v1
of his seminal note on the subject.1 The analysis of a scratch notebook2
showed that he had derived equations in notes dated to the year 1912 that
are equivalent to those that he would only publish in 1936.3 In the notes
and in the paper, Einstein derived the basic lensing equation for a point-like
light source and a point-like gravitating mass. From the lensing equation it
follows readily that a terrestial observer will see a double image of a lensed
star or, in the case of perfect alignment, a so-called “Einstein ring.” Einstein
also derived an expression for the apparent magnification of the light source
as seen by a terrestial observer. The dating for the notes was based on other
entries in the notebook. Some of these entries are related to a visit by Einstein
in Berlin April 15-22, 1912, and it was conjectured that the occasion for the
lensing entries was his meeting with the Berlin astronomer Erwin Freundlich
during this week.
The lensing idea lay dormant with Einstein until in 1936 he was prodded
by the amateur scientist Rudi W. Mandl into publishing his short note in
Science. In the meantime, the idea surfaced occasionally in publications by
other authors, such as Oliver Lodge (1919), Arthur Eddington (1920), and
Orest Chwolson (1924).4 We only have one other piece of evidence that
Einstein thought about the problem between 1912 and 1936. In a letter to
his friend Heinrich Zangger, dated 8 or 15 October 1915, Einstein remarked
that he has now convinced himself that the “new stars” have nothing to do
with the lensing effect, and that with respect to the stellar populations in
the sky the phenomenon would be far too rare to be observable.5
The Albert Einstein Archives in Jerusalem recently acquired a hitherto
unknown letter by Einstein that both corroborates some of the historical
conjectures of the early history of the lensing idea and also adds significant
new insight into the context of Einstein’s early considerations. From this
letter it appears that the phenomenon of “new stars,” i.e. the observation
of this type of cataclysmic variables, played a much more prominent role in
the origin of the idea than was suggested by the side remark in Einstein’s
letter to Zangger. It also adds important new information about Einstein’s
thinking in the crucial period between losing faith in the precursor theory to
1[Renn, Sauer, and Stachel 1997] and [Renn and Sauer 2003].
2Albert Einstein Archives (AEA), call number 3-013, published as [CPAE3, Appendix
A]. A facsimile is available on Einstein Archives Online at http://www.alberteinstein.info.
3[Einstein 1936].
4[Lodge 1919], [Eddington 1920, pp. 133–135], [Chwolson 1924].
5Einstein to Heinrich Zangger, 8 or 15 October 1915 [CPAE8, Doc. 130].
http://www.alberteinstein.info
the general theory of relativity entertained in the years 1913–1915, and the
breakthrough to a general relativistic theory of gravitation in the fall of 1915.6
In fact, the new letter justifies a reexamination of our reconstruction of what
we know about Einstein’s intellectual preoccupations both in April 1912 and
in October 1915, and more generally about the genesis of the concept of
gravitational lensing.
1 Einstein’s letter to Emil Budde
The new letter is a response to Emil Arnold Budde (1842–1921), dated 22
May 1916.7 Budde had been director of the Charlottenburg works of the
company of Siemens & Halske from 1893 until 1911.8 He was the author
of a number of scientific publications, among them a monograph on ten-
sors in three-dimensional space [Budde 1914a]9 and of a critical comment on
relativity published in 1914 in the Verhandlungen of the German Physical
Society.10
In an unknown letter to Einstein, Budde apparently had written about the
possibility of observing what are now called Einstein rings, i.e. ring shaped
images of a distant star that is in perfect alignment with a lensing star and
a terrestial observer. The subject matter of Budde’s initial letter can be in-
ferred from Einstein’s response in which he pointed out that one would expect
the phenomenon to be extraordinarily rare, and that it could not be detected
on photographic plates “as little circles” since irradiation would diffuse the
images that would hence only appear as bright little discs, indistinguishable
from the image of a regular star.
6For historical discussion, see [Norton 1984], [Janssen et al. 2007], and further refer-
ences cited therein.
7AEA 123-079. The letter will be published in the forthcoming volume of the Collected
Papers of Albert Einstein.
8Budde had studied catholic theology and science, and had worked as a secondary
school teacher and as a correspondent for the German daily Kölnische Zeitung in Paris,
Rome, and Constantinople. In 1887, he became a Privatgelehrter in Berlin, edited the
journal Fortschritte der Physik, and entered the company Siemens & Halske as a physicist
in 1892. In 1911, he retired and moved to Feldafing, near Lake Starnberg, since he had been
advised by his physicians to live at an altitude of at least 600m [Laue 1921, Werner 1921].
9In [Norton 1992, pp. 309–310] this textbook is cited as evidence for the argument that
Grossmann’s generalization of the term ‘tensor’ in [Einstein and Grossmann 1913] was an
original development.
10[Budde 1914b], [Budde 1914c].
The interesting part of Einstein’s response follows after this negative com-
ment. Einstein continued to relate that he himself had put his hopes on a
different aspect, namely that “due to the lensing effect” the distant star
would appear with an “immensely increased intensity,” and that he initially
had thought that this would provide an explanation of the “new stars.” He
went on to list three reasons why he had given up this hope after more
careful consideration. First, the temporal development of the intensity of a
nova is asymmetric. The luminosity increases much faster than it declines
again. Second, the color of the novae usually changes towards the red and,
in general, its spectral character changes in a distinct and characteristic way.
Third, the phenomenon would be very unlikely for the same reasons that the
observation of an Einstein ring would be unlikely.
In the beginning of his letter, Einstein pointed out that Budde’s idea con-
cerned the same thing that “about half a year ago” (“vor etwa einem halben
Jahre”) had put him into “joyous excitement” (“freudige Aufregung”). At
the end of the letter, he again wrote that the joy had been “just as short as
it had been great.” Counting back six months from the date of Einstein’s
letter, 22 May 1916, takes us to the 22nd of November 1915, which is just
the time of the final formulation of general relativity. It is also just another
six weeks or so away from the date of his letter to Zangger of early October,
in which he wrote about the very same subject of the possible explanation
of novae as a phenomenon of gravitational lensing.
2 The lensing calculations in the scratch note-
In light of this new letter, let us briefly reexamine the calculations in the
Scratch Notebook that had been dated to April 1912.11 Stellar gravitational
lensing is an implicit consequence of a law of the deflection of light rays in
a gravitational field. Such a law had been obtained by Einstein in 1911 as
a direct consequence of the equivalence hypothesis. The angle of deflection
11The following brief recapitulation refers to [CPAE3, 585–586], or
http://www.alberteinstein.info/db/ViewImage.do?DocumentID=34432&Page=23 and
· · ·&Page=26. For a complete and detailed paraphrase of Einstein’s notes, see the
Appendix below.
http://www.alberteinstein.info/db/ViewImage.do?DocumentID=34432
Figure 1: The geometric constellation for stellar gravitational lensing as
sketched in Einstein’s Scratch Notebook. From [CPAE3, p. 585].
α̃12 was found to be
where k is the gravitational constant, M the mass of the lensing star, c
the speed of light, and ∆ the distance of closest approach of the light ray
measured from the center of the massive star.13 On [p. 43] of the Scratch
Notebook we find the sketch shown in Fig. (1) and underneath it the lensing
equation
r = ρ
R +R′
where R denotes the distance between the light emitting distant star and
the massive star that is acting as a lens, R′ the distance between the lensing
star and the position of a terrestial observer who is located a distance r away
from the line connecting light source and lensing star. ρ is the distance of
closest approach of a light ray emitted by the star and seen by the observer.
α = 2kM/c2 is a typical length (later known as the Schwarzschild radius)
that depends on the mass of the light deflecting star and that determines
12I am using the notation α̃ instead of α (as in [Einstein 1911]) in order to distinguish
this angle from the quantity α (effectively the Schwarzschild radius) in Einstein’s scratch
notebook.
13[Einstein 1911, p. 908]. Qualitatively, Einstein had already derived the consequence of
light bending in a gravitational field when he first formulated his equivalence hypothesis
[Einstein 1907, p. 461]. In the final theory of general relativity, the same relation is
obtained with an additional factor of 2, as observed explicitly in [Einstein 1915c, p. 834].
Incidentally, the relevant formula was printed incorrectly by a factor of 2 in (the first
printing of) Einstein’s 1916 review paper of general relativity [Einstein 1916, p. 822], see
[CPAE6, Doc. 30, n. 36] and also Einstein’s response to Carl Runge, 8 November 1920
[CPAE10, Doc. 195].
the angle of deflection to be α
. The lensing equation can be written in
dimensionless variables as
r0 = ρ0 −
, (1)
after defining r0 and ρ0 as
r0 = r
R′(R +R′)α
ρ0 = ρ
R +R′
. (2)
The fact that equation (1) is a quadratic equation for ρ0 entails that there are
two solutions which correspond to two light rays that can reach an observer,
along either side of the lensing star,14 and hence that a terrestial observer
will see a double image of the distant star. For perfect alignment, the double
image will turn into a ring shaped image, an “Einstein-ring” whose diameter
0 = ρ
= 1 also follows immediately from the lensing equation.
In light of Einstein’s letters to Zangger and Budde, it is interesting that
Einstein went on to compute also the apparent magnification, obtaining the
following expression:
Htot = H
. (3)
Here Htot is the total intensity received by the observer, and H the intensity
of the star light at distance R. ρ1,2 denote the two roots of the quadratic
equation (1). The term in brackets gives the relative brightness, reducing
to 1 if no lensing takes place. Finally, some order of magnitude calculations
on these pages showed that the probability of observing this effect would be
given by the probability of having two stars within a solid angle that would
cover 10−15 of the sky, which is highly improbable given that the number of
known stars at the time was of the order of 106.15
Equations that are entirely equivalent to these were published much later,
in 1936, in Einstein’s note to Science.16
14Since only three points are given, the problem is intrinsically a planar one, as long as
the three points are not in perfect alignment.
15See the discussion in the appendix.
16[Renn, Sauer, and Stachel 1997].
The dating of the lensing notes in the scratch notebook to Einstein’s visit
in Berlin in April 1912 was based on other evidence in the notebook. Most
importantly, p. [36] lists Einstein’s appointments during his Berlin visit. In
addition, pp. [38] and [39] recapitulate very specifically the equations of Ein-
stein’s two papers on the theory of the static gravitational field of February
and March 1912, respectively.17 The calculations that deal with the lensing
problem then appear on pp. [43]-[48], and on pp. [51] and [52] of the note-
book. The sheet containing pp. [44] and [45] is a loose sheet inserted between
p. [43] and p. [45]. After p. [53], three pages have been torn out, and then
follow 37 blank pages, with some pages torn out in between. The remainder
of the notebook contains entries that begin at the other end of the notebook
which was turned upside down. Except for some apparently unrelated and
undated entires on pp. [49], [50],18 and [54], the lensing calculations hence
are at the end of a more or less continuous flow of entries. These physical
characteristics of the notebook lead to an important consequence. All infor-
mation that was pointing to a date of the lensing calculations in the year
1912 preceded the actual lensing calculations. Reexaming pp. [51] and [52]
of the notebook in light of the letters to Zangger and to Budde in fact reveals
that at least these entries were not written in 1912, but rather most likely
at the time of the letter to Zangger, in early October 1915. There are two
reasons for this. First, at the top of p. [51], Einstein wrote down the title
of a book published only in 1914.19 Therefore, the following calculations are
almost certainly to be dated later than the publication of this book. Second,
at the bottom of p. [52], Einstein explicitly refers to the “apparent diameter
of a Nova st[ar].” The calculations on pp. [51] and [52] in fact are a calcula-
tion of the apparent brightness and diameter of a star. We conclude that, in
all probability, the calculations on pp. [51] and [52] were written at the time
of Einstein’s letter to Zangger, early October 1915.
Does the dating of pp. [51] and [52] to October 1915 also compel us to
17[Einstein 1912a, Einstein 1912b].
18On the bottom half of p. [49] there is a sketch of Pascal’s and Brianchon’s Theorems,
which deal with hexagons inscribed in or circumscribed on a conical section. I wish to
thank Jesper Lützen for this identification. Other entries on pp. [49] and [50] also appear
to deal with problems from projective geometry. There is also a sketch of a vessel filled
with a liquid and the words “eau glyceriné” and what appears to be sketch of a magnetic
moment in a sinusoidal magnetic field.
19[Fernau 1914]. Could it be that the book was mentioned to Einstein when he met
with Romain Rolland in Geneva in September 1915, see [CPAE8, Doc. 118]?
revise our dating of the other lensing calculations in the notebook? To answer
this question, we need to consider the broader historical context of the notes.
But before doing so, we first observe that pp. [49] and [50] contain entries
that appear unrelated to the lensing problem. As shown by the detailed
paraphrase given in the appendix, the calculations on pp. [43] to [48] on
the other hand represent a coherent train of thought, as do the calculations
of pp. [51] and [52]. We also note that Einstein used a slightly different
notation on pp. [43]ff. and on pp. [51]-[52]. In the first set, he denoted the
distances between light source and lens and between lens and observer as
R and R′, respectively. On pp. [51]-[52] he used the notation R1 and R2,
respectively. He also reversed the roles of r and ρ. We conclude that there
is a discontinuity between the first set of lensing calculations on pp. [43] to
[48] and the second set on pp. [51] and p. [52].
3 The context of Einstein’s early lensing cal-
culations
From Einstein’s letter to Budde we learn that he had investigated the idea
that stellar lensing might explain the phenomenon of the “new stars,” and
that he had given up this idea after looking more closely into the character-
istic features of novae, especially their light curves and the changes in their
spectral characteristics. Let us therefore briefly look into the astronomical
knowledge about novae at the time.
The observation of a new star is an event that, in the early twentieth
century, occurred only every few years. Between 1900 and 1915, eight novae
were observed:20 Nova Persei 1901 (GK Per), Nova Geminorum (1) 1903 (DM
Gem), Nova Aquilae 1905 (V604 Aql), Nova Vela 1905 (CN Vel), Nova Arae
1910 (OY Ara), Nova Lacertae 1910 (DI Lac), Nova Sagittarii 1910 (V999
Sgr), and Nova Geminorum (2) 1912 (DN Gem) with maximum brightness
of 0.2, 4.8, 8.2, 10.2, 6.0, 4.6, 8.0, 3.5 magnitudes, respectively. At the time,
“the two most interesting Novae of the present century,” [Campbell 1914,
p. 493], were Nova Persei of 1901 and Nova Geminorum of 1912. The next
spectacular nova to occur was the very bright Nova Aquilae 1918 (V603 Aql)
with a maximum brightness of −1.1 mag.
Nova Geminorum (2) was discovered on March 12, 1912, by the as-
20For the following, see [Duerbeck 1987].
Figure 2: The light curve of Nova Geminorum 1912 for the first three
months after its appearance, as put together by Fischer-Petersen on the
basis of 253 individual observations. The points are the magnitudes re-
ported by the individual observers, the solid line is to guide the eye. From
[Fischer-Petersen 1912, p.429].
tronomer Sigurd Enebo at Dombaas, Norway [Pickering 1912]. On a pho-
tographic plate taken at Harvard College Observatory on March 10, showing
stars of magnitude 10.5, it was not visible, but it was visible as a magni-
tude 5 star in the constellation Gemini on a Harvard plate of March 11. On
March 13, a cablegram was received at Harvard and distributed throughout
the United States. In the following days all major observatories as well as
many amateur astronomers pointed their instruments towards the new star.
The maximum brightness of mag 3.5 was reached on March 14 (Einstein’s
33rd birthday!) [Fischer-Petersen 1912]. By March 16, the brightness was
down to a magnitude of 5.5 and in the following weeks it decreased further,
with distinct oscillations. By mid-April 1912, most observers registered a
brightness of mag 6 ≈ 7, see Fig. (2). We now know that the DN Gem is a
fast nova with a t3-time of 37d. Its light curve is type Bb in the classification
of [Duerbeck 1987], i.e. it declines with major fluctuations.
Like all classical novae, Nova Geminorum is, in fact, a binary system of
a white dwarf and main sequence star, where hydrogen-rich matter is being
accreted onto the white dwarf. Recent observations have even determined the
binary period [Retter et al. 1999]. The eruption of a classical nova occurs
when a hydrogen-rich envelope of the white dwarf suffers a thermonuclear
runaway.21 This explanation of classical novae also entails that they display
the same sequence of spectral behaviour as the luminosity decreases, see also
Fig. (3) below. However, our current understanding of classical novae was
suggested only in the fifties.22
The temporal proximity of the appearance of Nova Geminorum 1912 with
Einstein’s Berlin visit during the week of April 15–22, suggests that this
astronomical event was discussed also when Einstein met with Freundlich
for the first time.23 We know that the observatory in Potsdam took a
number of photographs of the new star between March 15 and April 12
[Furuhjelm 1912, Ludendorff 1912], and that Freundlich, among others, was
charged with photometric observations of the nova [Fischer-Petersen 1912,
p. 429]. Einstein and Freundlich had earlier corresponded about the possib-
lity of observing gravitational light deflection through the gravitational field
of the sun.24 The purpose of their meeting was to discuss possible astro-
nomical tests of Einstein’s emerging relativistic theory of gravitation. The
recent observation of the brightest nova since 1901 must have been on Fre-
undlich’s mind, and it seems more than likely that the idea of explaining
the phenomenon in terms of gravitational lensing therefore came up in the
course of their conversation. We conclude that our earlier dating of the first
set of calculations of the lensing problem in the Scratch Notebook to the
time of Einstein’s encounter with Freundlich in April 1912 is the most likely
possibility.
In fact, the context of the observation of Nova Geminorum 1912 provides
an answer to the question as to why Einstein would have done the calculations
at all and, in particular, why he would not have been content at the time
with a calculation of the lensing equation, the separation of the double star
image and, perhaps, the radius of the Einstein ring. Without this context
it might seem a rather ingenious move on Einstein’s part to go ahead and
immediately compute the apparent magnification of the lensed star as well.
But this answer to the question of motivation for the specific details of the
21For a review, see [Shara 1989].
22For a historical overview of previous theories, see [Duerbeck 2007].
23For evidence that Einstein met with Freundlich, see his letter to Michele Besso, 26
March 1912, in which he mentions planned discussions (“Besprechungen”) with Nernst,
Planck, Rubens, Warburg, Haber, and “an astronomer”—presumably Freundlich [CPAE5,
Doc. 377].
24Einstein to Freundlich, 1 September 1911, 21 September 1911, and 8 January 1912
[CPAE5, Docs. 281, 287, 336].
Figure 3: Changes in the spectrum of Nova Geminorum 1912, March 22 to
August 19, 1912. From [Adams and Kohlschütter 1912].
calculations in the Scratch Notebook, immediately raises another question.
Assuming that the first set of lensing calculations were done in spring
1912, why do we have no evidence that this idea was followed up by either
Einstein or by Freundlich until the fall of 1915? To answer this question,
it should first be observed that no summarizing results and analyses of the
observations of Nova Geminorum 1912 were published before the end of the
summer.
Let us briefly recall Einstein’s intellectual preoccupations after his visit
to Berlin in April 1912.25 Shortly before his trip to Berlin he had submitted
his two papers on a theory of the static gravitational field.26 After his return
to Prague in April 1912, Einstein was preparing for his move to Zurich. The
two papers were published in the 23 May issue of the Annalen der Physik.
Einstein wrote an addendum at proof stage to the second one, in which
he showed that the equations of motion could be written in a variational
form, adding that this would give us “an idea about how the equations of
motion of the material point in a dynamic gravitational field are constructed”
[Einstein 1912b, p. 458]. He also entered into a published dispute with Max
Abraham on their respective theories of gravitation.27 At the end of July,
he departed Prague for Zurich. The next thing we know about his work on
gravitation comes from a letter to Ludwig Hopf, dated 16 August 1912, in
which he wrote:
The work on gravitation is going splendidly. Unless I am com-
pletely wrong, I have now found the most general equations.28
These most general equations are, in all probability, equations of motion
in a gravitational field, represented by a metric tensor. After his arrival
in Zurich, Einstein began a collaboration with his former classmate Marcel
Grossmann, now his colleague at the ETH. Their research on a generalized
25We will focus here on his work of gravitation yet for the sake of completeness it should
be noted that Einstein at the same time was also thinking about quantum theory, most
notably about the law of photochemical equivalence and about the problem of zero point
energy, see [CPAE4, Docs. 5, 6, 11, 12].
26[Einstein 1912a], [Einstein 1912b], were received by the Annalen der Physik on 26
February and 23 March, respectively.
27[Einstein 1912c] which was received by the Annalen on 4 July 1912 is a response to a
critique by Abraham.
28Einstein to Hopf, 16 August 1912 [CPAE5, Doc. 416].
theory of relativity is documented in Einstein’s so-called “Zurich Notebook”29
and culminates in the publication of the “Outline [Entwurf] of a generalized
theory of relativity and a theory of gravitation,” in early summer of 1913 co-
authored with Marcel Grossmann.30 This so-called Entwurf-theory contains
all the elements of the final theory of general relativity, except for generally
relativistic field equations. Einstein would hold onto this theory until his
final breakthrough to general relativity in the fall of 1915.
In conclusion, we observe that Einstein’s path toward the general theory
of relativity in 1912 took him deep into the unknown land of the mathematics
associated with the metric tensor, before there was a chance to reconsider
the lensing idea in light of the data for Nova Geminorum 1912. In any case,
he would have to rely on Freundlich or other professional astronomers for a
secure assessment of the possibilities of an observation of the lensing effect
at the time.
Freundlich, on the other hand, continued to think about ways to test
Einstein’s new theory of gravitation.31 But his focus was on observations
of light deflection during a solar eclipse.32 In August 1914, he led a first
(unsuccessful) expedition to the Crimea to observe the eclipse of 21 August
1914. Even these efforts were hampered by the lack of funding and, more
generally, by the difficulties of securing increased research time that would
have allowed Freundlich to freely pursue his collaboration with Einstein.
Given these circumstances, and the fact that order-of-magnitude calcu-
lations may have convinced Einstein already in 1912 that the phenomenon
would be rare, it seems plausible that the lensing idea was not pursued further
for some time after Einstein’s visit in Berlin in April 1912.
Let us finally reexamine the events of fall 1915. Einstein, in the meantime
had left Zurich in the spring of 1914, accepting an appointment as member
of the Prussian Academy in Berlin. In September 1915, Einstein spent a few
weeks in Switzerland where he met, among others, with Heinrich Zangger,
Michele Besso, and Romain Rolland. On 22 September 1915, he left Zurich33
but travelled via Eisenach where he was on the 24th of September.34 By the
29AEA 3-006, see [CPAE4, Doc. 10]. For a comprehensive discussion of this document,
including a facsimile, transcription, and detailed paraphrase, see [Janssen et al. 2007].
30[Einstein and Grossmann 1913].
31See [Hentschel 1994] and [Hentschel 1997].
32See his correspondence with Einstein in [CPAE5].
33[CPAE8, p. 998].
34[CPAE10, Doc. Vol. 8, 122a].
30th of September, at the latest, he was back in Berlin, and wrote a letter
to Freundlich:
I am writing you now about a scientific matter that electrifies me
enormously.35
It is clear from the letter, however, that the excitement indicated to Fre-
undlich is not about the idea of gravitational lensing. Rather, Einstein
had found an internal contradiction in his Entwurf theory that amounted
to the realization that Minkowski space-time in rotating Cartesian coordi-
nates would not be a solution of the Entwurf field equations.36 This insight
undermined his confidence in the validity of the Entwurf theory, and is later
mentioned as one of three arguments that induced Einstein to lose faith in
the Entwurf equations.37 The first of these arguments was the fact that a cal-
culation of the planetary perihelion advance in the framework of the Entwurf
theory did not produce the well-known anomaly that had been established
for Mercury. This problem had been known to Einstein for some time.38
The third argument was realized sometime in early October, a few days after
stumbling upon the problem with rotation, and concerned the mathematical
derivation of the Entwurf field equations in Einstein’s comprehensive review
of October 1914.39 In any case, we know that Einstein asked Freundlich to
look into the problem of the rotating metric, and that they met some time in
early October. This follows from a letter Einstein wrote to Otto Naumann,
35Einstein to Freundlich, 30 September 1915 [CPAE8, Doc. 123]. For a detailed discus-
sion of this letter and its significance for the reconstruction of Einstein’s final breakthrough
to general relativity, see [Janssen 1999].
36Interestingly, the Scratch Notebook contains an entry that is pertinent to this problem.
On p. [66], i.e. on the last page of the backward end of the notebook, Einstein considers the
case of rotation in a calculation that exactly matches corresponding calculations dating
from October 1915, see [Janssen 1999]. Janssen cautiously remarks that he believes this
calculation to date from 1913 [Janssen 1999, p. 139]. It seems possible, however, that these
entries as well as the immediately preceding ones on the perihelion advance (see note 38)
may well date from late 1915 as well.
37See Einstein to Arnold Sommerfeld, 28 November 1915, and to Hendrik A. Lorentz, 1
January 1916 [CPAE8, Docs. 153, 177].
38See [Earman and Janssen 1993] and [CPAE4, pp. 344–359]. The Scratch Notebook
contains some calculations related to the perihelion advance on pp. [61–66], i.e. in the
backward end of the notebook. On p. [61], Einstein there explicitly noted that the advance
of Mercury’s perihelion would be 17′′ which is the value that is obtained on the basis of
the Entwurf-theory. These calculations are undated, see note 36.
39[Einstein 1914].
dated after 1 October 1915, in which Einstein asked about possibilities to al-
low Freundlich more freedom to pursue independent research. In this letter,
Einstein mentioned that Freundlich had visited him “recently.”40
By 12 October, Einstein had realized the third problem with the Entwurf
theory, the unproven uniqueness of the Lagrangian for the Entwurf field equa-
tions, as he reported in a letter to Lorentz. In this letter, he neither men-
tioned the problem with the rotating metric nor the issue of gravitational
lensing.41
For our reconstruction of this episode, the precise date of Einstein’s letter
to Zangger in which he remarked that he had given up the hope of explaining
the “new stars” as a lensing phenomenon is relevant. It could have been
written either on the 8th or the 15th of October.42
The letter to Zangger suggests that they had talked about the idea earlier
since Einstein seems to presuppose that Zangger knew what he was talking
about and did not explain what he meant by “lens effect” (“Linsenwirkung”).
As mentioned before, Einstein had just recently met with Zangger, as well as
with Besso before returning to Berlin. The following scenario seems therefore
plausible:
Upon returning to Berlin some time after the 24th of September 1915,
Einstein realized the problem of the rotating metric solution and wrote to
Freundlich on the 30th, asking him to look into this issue. Shortly afterwards,
the two met in person. Most likely they discussed not only the rotation
problem, but also the lensing idea. Having found troubling indications of an
inner inconsistency in the very foundations of this theory, it would have been
a natural move for Einstein to go back and reconsider early arguments such as
one based safely on the equivalence hypothesis.43 After this meeting, Einstein
40“Letzter Tage war Herr Dr. Freundlich von der Sternwarte N bei mir.” [CPAE8,
Doc. 124].
41In a letter to Hilbert, dated 7 November 1915, Einstein wrote that he realized the flaw
in his proof “about four weeks ago” [CPAE8, Doc. 136].
42The editors of [CPAE8] dated this letter explicitly to the 15th of October. It seems,
however, that the 8th is also a possibility. The letter was written on a Friday between
September 30, when a fire and explosion took place in the comb factory Walter near Lake
Biel took place, mentioned in the letter, and October 22 when Einstein participated in
the first Academy session after the summer break. I see no reason why Einstein could not
have heard of the accidents from Zangger before October 8.
43It seems unlikely that Einstein at that time was already contemplating a quantitatively
different law of light deflection. Einstein first observed in [Einstein 1915c, p. 834] that an
additional factor of 2 would arise from the different first-order approximation for the
wrote to Naumann exploring possibilities to give Freundlich more research
freedom. By October 8, Einstein had convinced himself that gravitational
lensing cannot explain the “new stars.” On 12 October, he realized the third
problem of his mathematical derivation of the Entwurf field equation.
According to this reconstruction of the sequence of events, it is remarkable
that the “joyous excitement” about the lensing idea falls within days after
his being “electrified” about the realization of the rotation problem on 30
September, and his realization of the third problem of the mathematical
derivation of the Entwurf equation, on or before 12 October 1915.44
Some five weeks later, his excitement was even greater and his heart,
allegedly, skipped a beat when he found that he could derive the anomalous
advance of Mercury’s perihelion on the basis of his new field equations. And
after having submitted the last of his four November communications to the
Prussian Academy on 25 November which presented the final gravitational
field equations, the “Einstein equations,” he wrote to Sommerfeld:
You must not be cross with me that I am answering your kind
and interesting letter only today. But in the last month I had
one of the most exciting, exhausting times of my life, indeed also
one of the most successful. I could not think of writing.45
It is interesting to learn from Einstein’s letter to Budde that in addition to
the realization of the problems with the Entwurf theory and the eventual suc-
metric if the Newtonian limit is derived on the basis of generally covariant field equations
in which the Ricci tensor is directly set proportional to the energy-momentum tensor.
These latter equations were published in his second November memoir, presented on 11
November, under the assumption that the trace of the energy-momentum tensor vanish.
In his comment on the factor of 2, Einstein refers to this result as being in contrast to
“earlier calculations” where the hypothesis of vanishing energy-momentum had not yet
been made.
44For completeness, one should point one other intellectual activity of Einstein’s during
those days. In Einstein’s letter to Zangger of 8 or 15 October, he also mentioned that
he wrote “a supplementary paper to my last year’s analysis on general relativity.” The
last year’s analysis is, in all likelyhood [Einstein 1914]; the supplementary paper is, in all
likelihood, an early version of [Einstein 1916b], or, perhaps, an early version of Einstein’s
first November memoir [Einstein 1915a], see [CPAE8, Doc. 130, note 5] and [Janssen 1999,
note 51].
45“Sie dürfen mir nicht böse sein, dass ich erst heute auf Ihren freundlichen und in-
teressanten Brief antworte. Aber ich hatte im letzten Monat eine der aufregendsten,
anstrengendsten Zeiten meines Lebens, allerdings auch der erfolgreichsten.” Einstein to
Sommerfeld, 28 November 1915 [CPAE8, Doc. 153].
cess of his breakthrough to general relativity, an astronomical problem, the
idea of explaining novae in terms of gravitational lensing added to Einstein’s
excitement in the midst of what must indeed have been the most intense
period of intellectual turmoil in his life.
4 Concluding remarks
Einstein’s recollections of his thought concerning the explanation of the “new
stars” as a phenomenon of gravitational lensing in his letter to Budde add
two significant insights to our reconstruction of the genesis of general rel-
ativity. If our dating and context hypothesis of the lensing calculations in
the scratch notebook are correct, we learn that it was an astronomical ob-
servation that triggered the elaboration of a significant consequence of the
equivalence hypothesis and its consequence of gravitational light deflection.
It is also interesting that on his intellectual path from the Entwurf theory to
the final theory of general relativity, Einstein also took a detour in which he
explored further consequences of one of the solid pillars of general relativity,
the equivalence hypothesis.
Appendix: Einstein’s lensing calculations in
the Scratch Notebook AEA 3-013
The following is a self-contained line-by-line paraphrase of Einstein’s lensing
calculations in his scratch notebook, [CPAE3, pp. 585–589]. The pagination
in square brackets refers to the sequence of pages in the notebook.
The calculations start out on p. [43] with Fig. (1) and continue on the
facing page p. [46]. From the more explicit sketch in Fig. (4), we read off the
lensing equation:
r = ρ
R +R′
. (4)
Here R is the distance between the light emitting star S and the lensing star
L; R′ the distance between the massive star L and the projected position of
the observer O on the line connecting light source and lens; ρ is the distance
of closest approach of a light ray emitted from the distant star and seen by
an observer; r is the orthogonal distance of the terrestial observer to the
line connecting light source and lens. The first term in the lensing equation
Figure 4: The geometry of stellar lensing.
(4) is obtained from the similarity of triangles with baseline R and R + R′,
respectively, and the second term is the angle of deflection as given by the law
of gravitational light bending, where α is the Schwarzschild radius 2GM/c2.
If we want to write this equation in dimensionless variables, we need to
multiply it by a factor of
R′(R +R′)α
so that, when we define r0 and ρ0 as
r0 = r
R′(R +R′)α
ρ0 = ρ
R +R′
the lensing equation (4) turns into
r0 = ρ0 −
. (8)
This is a quadratic equation for r0, the two solutions of which correspond to
the two light rays passing above and below L. The observer O therefore sees
two images of S at positions S ′ and S ′′, respectively. To read off the radius
of an “Einstein ring,” obtained for perfect alignment of S, L, and O, one
only needs to set r0 ≡ 1.
In order to get an expression for the apparent magnification, Einstein
proceeded as follows. He first took the square of eq. (8) as
2 + r2 = ρ2 +
. (9)
If we multiply this equation by π and denote the areas of the circles corre-
sponding to the radii r and ρ as f = πr2 and ϕ = πρ2, respectively, we can
write this equation as
2π + f = ϕ+
. (10)
We are not interested in the full circle corresponding to these radii but in the
differential area element associated with these radii. More precisely, we are
interested in the change of the differential area element df associated with f
when we change the differential area element dϕ associated with ϕ. Hence,
Einstein wrote
dϕ. (11)
The intensity H of the brightness received at r is related to the intensity H
of the brightness at ρ by
Hdf = ±Hdϕ, (12)
where the plus and minus signs refer to the two solutions of the quadratic
equation. Since we have from (11)
, (13)
we get
H = ± H
. (14)
or, inserting the explicit solutions, we can write the total brightness at r as
Htot = H
. (15)
As Einstein remarked, the term in brackets gives the relative brightness, if
we take the value for r → ∞ to be 1.46 This result is equation number (3)
46“Klammer gibt relative Helligkeit”
in Einstein’s notes, and most of the following material on pp. [47] and [48],
as well as on the loose sheet containing pp. [44] and [45], will be a discussion
of this expression for the relative brightness.
On p. [47], Einstein first rewrote the reduced lensing equation as
− x, (16)
and then the terms in brackets as
1− x41
x42 − 1
. (17)
The next step is to bring the two terms to a common denominator47
x41 − x42
(1− x41)(1− x42)
. (18)
If one squares the lensing equation (16) twice, one obtains
− 2 + (2 + r2)2 = 1
+ x4. (19)
If we now introduce new variables A and u via
2A = −2 + (2 + r2)2, (20)
A = −1 +
(2 + r2)2 = 1 + 2r2 +
r4, (21)
u = x4, (22)
we can write the quadrupled equation (19) as
2A = u+
. (23)
Multiplication by u and adding A2 on each side gives
u2 − 2Au+ A2 = −1 + A2, (24)
47In the notes, Einstein refers to this step as “Rationalisierung”.
from which one can immediately read off the two solutions of eq. (23) as
u = −A±
A2 − 1. (25)
Given (18), the difference between the two roots,
u1 − u2 = 2
A2 − 1, (26)
provides an expression for the nominator of Hr in (18). With the two roots,
we can also rewrite the quadratic equation in the form
u2 − 2Au+ 1 = (u− u1)(u− u2), (27)
and if we now set u = 1, we obtain
2(1−A) = (1− u1)(1− u2), (28)
which gives us an expression for the denominator of Hr in (18). Combining
the two expressions, as Einstein did on p. [48], we obtain
1 + 1
) , (30)
where we have inserted (21) to obtain the second line.
We now have an explicit expression for the relative brightness as a func-
tion of the dimensionless variable r. We now evidently see that Hr → 1/r
for r → 0, and that Hr approaches 1 asymptotically from above for large r,
see Fig. (5).
Let us now reconstruct Einstein’s order-of-magnitude estimate for the
expected frequency of the phenomenon on p. [45]. The explicit expression
for the relative brightness gives us a measure of the maximal distance r for
which significant magnification is obtained. We can look at specific values of
Hr(r). For instance, for r0 = 12 we find
1 + 1
5 ≈ 2. (31)
0 1 2 3
H (r)
Figure 5: A plot of the expression (30) for the relative brightness Hr as a
function of r. The inset is from [CPAE3, p. 587].
Hence, Einstein concluded that up to a distance of r0 =
one would obtain
an increase of the intensity by a factor of 2. In other words, if we write the
intensity Hr0 asymptotically for small r0 and R′ ≫ R as
R′(R +R′)α
, (33)
we see that for a lensing star at a distance of R, the relative increase in
intensity is given by
= tg ᾱ. (34)
Here ᾱ is the angle that determines how well the distant star has to be aligned
with the lensing star and the observer to produce appreciable magnification.
In order to get an order-of-magnitude estimate for this angle, one needs an
order-of-magnitude estimate for
. In order to obtain such an estimate,
Einstein notes that the ratio of the solar Schwarschild radius α to the solar
equatorial radius Rs is given approximately by
= 3 · 10−6. (35)
The radius of the sun is 2 light seconds, and the distance of the nearest stars
is of the order of 10 light years, or
105 · 365 · 10 ≈ 4 · 108 lightseconds. (36)
It follows that α
for a star of 1 solar mass 10 lightyears away is
≈ 10−14 or
≈ 10−7. (37)
To see the distant star with double intensity, we therefore have
tg ᾱ
, (38)
so that the angle ᾱ is of order 10−7. A linear angle corresponds to a solid
angle roughly by taking its square. Thus, the angular size of the region
where the distant star needs to be found behind a massive star in order to
be magnified in the lens is of order 10−14. In angular units, the total sky has
an area of 4π ≈ 10, so that the angular size of the region in question covers a
fraction of 10−15 of the total sky. This has to be contrasted with the average
density of stellar population in the sky. The Bonner Durchmusterung listed
of the order of 3 · 105 stars to ninth magnitude for the northern hemisphere,
so a reasonable average density of the number of stars would be 1 star per
10−5 of the sky.48
On the back of the loose sheet [p. 44] we find a few more calculations
related to order-of-magnitude estimates that start from (32). Einstein here
again goes back to the definition of r0 and ρ0 in terms of R, R
′, and α.49
Again, he observes that r0 =
would give twice the usual intensity, and
rewrites (6) for this case:
R′(R +R′)α
. (39)
The latter equation for R′ ≫ R turns into
, (40)
and for R ≪ R′ into
αR′. (41)
Einstein concluded that the smaller of the two distances R and R′ determines
the angle r
. In the top right corner of the page, Einstein jotted down another
order-of-magnitude calculation, which I do not fully understand. Apparently,
he computed the distance of 100 lightyears in terms of centimeters
3 · 1010 · 3 · 107 · 102 [cm] ≈ 1020 cm (42)
48On the relevant page under discussion here, we also find a little sketch by Einstein of
a circle and the angle of its radius for a point some distance away. The precise meaning
of this sketch is unclear but the numbers written next to it suggest that Einstein was
considering the order of magnitude for the angular size of the moon. The radius of the
moon is seen under an angle of 15′ from the earth, and the mean distance between the
earth and the moon in units of the lunar radius is about 200, which translates to an angle
of 50o.
49One can see here that Einstein corrected an error in his earlier calculations on [p. 43],
where he had erroneously written the second term of the lensing equation (4) with R
instead of R′, which resulted in a confusion of the factors of R and R′ in expressions (5)
and (6).
Figure 6: A sketch in Einstein’s scratch notebook to obtain eq. (43). From
[CPAE3, p.585].
He also computed the angle x under which the star at distance R′ and
the star at distance R+R′ would be seen by an observer at distance r away
from the connecting line between the two stars if no lensing took place:
x = r
R +R′
R′(R +R′)
. (43)
The first equation can be read off from a little sketch of the geometry of light
source, lensing star, and observer, at the bottom of the page, see Fig. (6).
Let us finally comment on the calculations on pp. [51] and [52]. As men-
tioned in the main text of this article, Einstein here introduced a change
of notation. On p. [51], he sketched again the geometry for stellar lensing.
Here, the geometry has been turned by 90 degrees, and the notation changed
so that R and R′ become R1 and R2, and ρ and r are interchanged to be-
come r and ρ, respectively. This change of notation is reflected in the lensing
equation, written down on p. [52] as
ρ = r +R1
, (44)
where tanw = r/R2. Einstein then immediately proceeded to compute the
magnification by taking the square of the lensing equation and then comput-
ing the derivative as
d(ρ2)
d(r2)
(R1α)
= A ·
. (45)
Instead of pursueing this calculation further, Einstein instead wrote “appar-
ent diameter of a Nova star,” and wrote down the solution of eq. (44) for
ρ = 0, as to obtain the diameter of an Einstein ring:
R1R2α
R1 +R2
. (46)
He computed the angle w0 as
R2(R1 +R2)
. (47)
The calculation ends with an attempt at a numerical order-of-magnitude
estimation which seems to proceed along the same lines as in eqs. (35,36).
The calculation, however, was broken off, and the whole page was struck
through.
Acknowledgments
I wish to thank Diana Buchwald for a critical reading of an earlier version of
this paper, and Hilmar Duerbeck for some helpful comments. Unpublished
correspondence in the Albert Einstein Archives is quoted by kind permission.
References
[Adams and Kohlschütter 1912] Adams, Walter S., and Kohlschutter [sic],
Arnold. “Observations of the spectrum of Nova Geminorum No. 2.”
Astrophysical Journal 36 (1912), 293–321.
[Budde 1914a] Budde, Emil Arnold. Tensoren und Dyaden im dreidimen-
sionalen Raum. Braunschweig: Vieweg, 1914.
[Budde 1914b] Budde, Emil Arnold. “Kritisches zum Relativitätsprinzip.”
Verhandlungen der Deutschen Physikalischen Gesellschaft 16 (1914)
586–612.
[Budde 1914c] Budde, Emil Arnold. “Kritisches zum Relativitätsprinzip II.”
Verhandlungen der Deutschen Physikalischen Gesellschaft 16 (1914)
914–925.
[Campbell 1914] Campbell, Leon. “A systematic search for bright Novae.”
Popular Astronomy 22 (1914), 493–495.
[Chwolson 1924] Chwolson, Orest. “Über eine mögliche Form fiktiver Dop-
pelsterne.” Astronomische Nachrichten 221 (1924) 329–330.
[CPAE2] Stachel, John, et al. (eds.) The Collected Papers of Albert Einstein,
Vol. 2: The Swiss Years: Writings, 1900–1909, Princeton: Princeton
University Press, 1989.
[CPAE3] Klein, Martin, et al. (eds.) The Collected Papers of Albert Einstein,
Vol. 3: The Swiss Years: Writings, 1909–1911, Princeton: Princeton
University Press, 1993.
[CPAE4] Klein, Martin, et al. (eds.) The Collected Papers of Albert Einstein,
Vol. 4: The Swiss Years: Writings, 1912–1914, Princeton: Princeton
University Press, 1995.
[CPAE5] Klein, Martin, et al. (eds.) The Collected Papers of Albert Ein-
stein, Vol. 5: The Swiss Years: Correspondence, 1902–1914, Princeton:
Princeton University Press, 1993.
[CPAE6] Kox, A.J., et al. (eds.) The Collected Papers of Albert Einstein,
Vol. 6: The Berlin Years: Writings, 1914–1917, Princeton: Princeton
University Press, 1996.
[CPAE8] Schulmann, Robert, et al. (eds.) The Collected Papers of Albert
Einstein, Vol. 8: The Berlin Years: Correspondence, 1914–1918, Prince-
ton: Princeton University Press, 1998.
[CPAE10] Buchwald, Diana K., et al. (eds.) The Collected Papers of Al-
bert Einstein, Vol. 10: The Berlin Years: Correspondence, May–
December 1920 and Supplementary Correspondence, 1909–1920, Prince-
ton: Princeton University Press, 2006.
[Duerbeck 1987] Duerbeck, Hilmar W. “A Reference Catalogue and Atlas of
Galactic Novae.” Space Science Reviews 45 (1987) 1–212.
[Duerbeck 2007] Duerbeck, Hilmar W. “Novae - a Historical Perspective.”
In Bode, M.F., Evans, A. (eds.) Classical Novae, Cambridge University
Press, forthcoming.
[Earman and Janssen 1993] Earman, John and Janssen, Michel. “Einstein’s
Explanation of the Motion of Mercury’s Perihelion.” In: Earman, John
et al. (eds.) The Attraction of Gravitation, Boston: Birkhäuser, 1993
(Einstein Studies Vol. 5), 129–172.
[Eddington 1920] Eddington, Arthur S. Space, Time, and Gravitation Cam-
bridge: Cambridge University Press, 1920.
[Einstein 1907] Einstein, Albert. “Über das Relativitätsprinzip und die aus
demselben gezogenen Folgerungen.” Jahrbuch der Radioaktivität und
Elektronik 4 (1907) 411–462. Reprinted in [CPAE2, Doc. 47].
[Einstein 1911] Einstein, Albert. “Über den Einfluß der Schwerkraft auf
die Ausbreitung des Lichtes.” Annalen der Physik 35 (1911) 898–908.
Reprinted in [CPAE3, Doc. 23].
[Einstein 1912a] Einstein, Albert. “Lichtgeschindigkeit und Statik des Grav-
itationsfeldes.” Annalen der Physik 38 (1912) 355–369. Reprinted in
[CPAE4, Doc. 3].
[Einstein 1912b] Einstein, Albert. “Zur Theorie des statischen Gravitations-
feldes.” Annalen der Physik 38 (1912) 443–458. Reprinted in [CPAE4,
Doc. 4].
[Einstein 1912c] Einstein, Albert. “Relativität und Gravitation. Erwiderung
auf eine Bemerkung von M. Abraham” Annalen der Physik 38 (1912)
1059–1064. Reprinted in [CPAE4, Doc. 8].
[Einstein 1914] Einstein, Albert. “Die formale Grundlage der allgemeinen
Relativitätstheorie” Königlich Preußische Akademie der Wissenschaften
(Berlin). Sitzungsberichte 1914, 1030–1085. Reprinted in [CPAE6,
Doc. 9].
[Einstein 1915a] Einstein, Albert. “Zur allgemeinen Relativitätstheorie.”
Königlich Preußische Akademie der Wissenschaften (Berlin). Sitzungs-
berichte 1915, 778–786. Reprinted in [CPAE6, Doc. 21].
[Einstein 1915b] Einstein, Albert. “Zur allgemeinen Relativitätstheorie
(Nachtrag).” Königlich Preußische Akademie der Wissenschaften
(Berlin). Sitzungsberichte 1915, 799–801. Reprinted in [CPAE6,
Doc. 22].
[Einstein 1915c] Einstein, Albert. “Erklärung der Perihelbewegung des
Merkur aus der allgemeinen Relativitätstheorie (Nachtrag).” Königlich
Preußische Akademie der Wissenschaften (Berlin). Sitzungsberichte
1915, 831–839. Reprinted in [CPAE6, Doc. 24].
[Einstein 1915d] Einstein, Albert. “Die Feldgleichungen der Gravitation.”
Königlich Preußische Akademie der Wissenschaften (Berlin). Sitzungs-
berichte 1915, 844–847. Reprinted in [CPAE6, Doc. 25].
[Einstein 1916] Einstein, Albert. “Die Grundlage der allgemeinen Rela-
tivitätstheorie.” Annalen der Physik 49 (1916) 769–822. Reprinted in
[CPAE6, Doc. 30].
[Einstein 1916b] Einstein, Albert. “Eine neue formale Deutung der
Maxwellschen Feldgleichungen der Elektrodynamik.” Königlich Preußis-
che Akademie der Wissenschaften (Berlin). Sitzungsberichte 1916, 184–
188. Reprinted in [CPAE6, Doc. 27].
[Einstein and Grossmann 1913] Einstein, Albert and Grossmann, Marcel.
Entwurf einer verallgemeinerten Relativitätstheorie und einer Theorie
der Gravitation. Leipzig: Teubner, 1913. Reprinted in [CPAE4, Doc. 13].
[Einstein 1936] Einstein, Albert. “Lens-like action of a star by the deviation
of light in the gravitational field.” Science 84 (1936) 506–507.
[Fernau 1914] Fernau, Hermann. Die französische Demokratie. Sozialpoli-
tische Studien aus Frankreichs Kulturwerkstatt. München: Duncker &
Humblot, 1914.
[Fischer-Petersen 1912] Fischer-Petersen, J. “Über die Lichtkurve der Nova
(18.1912) Geminorum 2.” Astronomische Nachrichten 192 (1912) 429–
[Furuhjelm 1912] Furuhjelm, Ragnar. “Über das Spektrum der Nova Gemi-
norum 2.” Astronomische Nachrichten 192 (1912) 117–124.
[Hentschel 1994] Hentschel, Klaus. “Erwin Finlay Freundlich and Testing
Einstein’s Theory of Relativity.” Archive for History of Exact Sciences
47 (1994) 143–201.
[Hentschel 1997] Hentschel, Klaus. The Einstein Tower. An Intertexture of
Dynamic Construction, Relativity Theory, and Astronomy. Stanford:
Stanford University Press, 1997.
[Janssen 1999] Janssen, Michel. “Rotation as the Nemesis of Einstein’s En-
twurf Theory.” In: Goenner, Hubert, et al. (eds.). The Expanding Worlds
of General Relativity Boston: Birkhäuser, 1999 (Einstein Studies Vol. 7),
127–157.
[Janssen et al. 2007] Janssen, Michel, Norton, John D., Renn, Jürgen, Sauer,
Tilman, and Stachel, John. The Genesis of General Relativity. Einstein’s
Zurich Notebook. Vol. 1. Introduction and Source, Vol. 2. Commentary
and Essays. Dordrecht: Springer, 2007.
[Laue 1921] Laue, Max von. [Nachruf auf Emil Arnold Budde] Verhandlun-
gen der Deutschen Physikalischen Gesellschaft (1921) 66–68.
[Lodge 1919] Lodge, Oliver J. “Gravitation and Light” Nature 104 (1919)
[Ludendorff 1912] Ludendorff, H. “Über die schwachen Absorptionslinien im
Spektrum der Nova Geminorum 2.” Astronomische Nachrichten 192
(1912) 124–130.
[Norton 1984] Norton, John D. “How Einstein Found His Field Equations.”
Historical Studies in the Physical Sciences 14 (1984), 253–316.
[Norton 1992] Norton, John D. “The Physical Content of General Covari-
ance.” In: Eisenstaedt, Jean, and A.J.Kox (eds.). Studies in the History
of General Relativity Boston: Birkhäuser, 1992 (Einstein Studies Vol. 3),
281–315.
[Pickering 1912] Pickering, Edward C. Astronomical Bulletin of the Harvard
College Observatory 17 March 1912.
[Renn, Sauer, and Stachel 1997] Renn, Jürgen, Sauer, Tilman, and Stachel,
John. “The Origin of Gravitational Lensing: A Postscript to Einstein’s
1936 Science Paper.” Science 275 (1997) 184–186.
[Renn and Sauer 2003] Renn, Jürgen, and Sauer, Tilman. “Eclipses of the
Stars. Mandl, Einstein, and the Early History of Gravitational Lensing.”
In: A. Ashtekar et al. (eds.). Revisiting the Foundations of Relativistic
Physics, Dordrecht: Kluwer, 2003, 69–92.
[Retter et al. 1999] Retter, A., Leibowitz, E.M., Naylor, T. “An irradiation
effect in Nova DN Gem 1912 and the significance of the period gap for
classical novae.” Monthly Notices of the Royal Astronomical Society 308
(1999) 140–146.
[Shara 1989] Shara, Michael M. “Recent Progress in Understanding the
Eruptions of Classical Novae.” Publications of the Astronomical Soci-
ety of the Pacific 101 (1989) 5–31.
[Werner 1921] Werner, R. “Emil Arnold Budde.” Elektrotechnische
Zeitschrift 42 (1921) 1153–1154.
	Einstein's letter to Emil Budde
	The lensing calculations in the scratch notebook
	The context of Einstein's early lensing calculations
	Concluding remarks
ABSTRACT
  Einstein's early calculations of gravitational lensing, contained in a
scratch notebook and dated to the spring of 1912, are reexamined. A hitherto
unknown letter by Einstein suggests that he entertained the idea of explaining
the phenomenon of new stars by gravitational lensing in the fall of 1915 much
more seriously than was previously assumed. A reexamination of the relevant
calculations by Einstein shows that, indeed, at least some of them most likely
date from early October 1915. But in support of earlier historical
interpretation of Einstein's notes, it is argued that the appearance of Nova
Geminorum 1912 (DN Gem) in March 1912 may, in fact, provide a relevant context
and motivation for Einstein's lensing calculations on the occasion of his first
meeting with Erwin Freundlich during a visit in Berlin in April 1912. We also
comment on the significance of Einstein's consideration of gravitational
lensing in the fall of 1915 for the reconstruction of Einstein's final steps in
his path towards general relativity.

<|endoftext|><|startoftext|>
Introduction
	Network Model
	Network Layer
	Channel Capacity of a MIMO Link
	MIMO-BC Link Layer
	Problem Formulation
	Reformulation of CRPA
	MIMO-MAC Channel Model
	Duality between MIMO-BC and MIMO-MAC
	Convexity of MIMO-MAC Capacity Region
	Maximum Weighted Sum Rate Problem of the Dual MIMO-MAC
	Problem Reformulation
	Solution Procedure
	Conjugate Gradient Projection for Solving link(n)(u(n))
	Computing the Conjugate Gradients
	Projection onto +(Pmax(n))
	Solving the Master Dual Problem
	Cutting-Plane Method for Solving (u)
	Subgradient Algorithm for Solving (u)
	Numerical Results
	Cutting-Plane Method
	Subgradient Method
	Comparison between BC and TDM
	Related Work
	Conclusions
	References
ABSTRACT
  MIMO technology is one of the most significant advances in the past decade to
increase channel capacity and has a great potential to improve network capacity
for mesh networks. In a MIMO-based mesh network, the links outgoing from each
node sharing the common communication spectrum can be modeled as a Gaussian
vector broadcast channel. Recently, researchers showed that ``dirty paper
coding'' (DPC) is the optimal transmission strategy for Gaussian vector
broadcast channels. So far, there has been little study on how this fundamental
result will impact the cross-layer design for MIMO-based mesh networks. To fill
this gap, we consider the problem of jointly optimizing DPC power allocation in
the link layer at each node and multihop/multipath routing in a MIMO-based mesh
networks. It turns out that this optimization problem is a very challenging
non-convex problem. To address this difficulty, we transform the original
problem to an equivalent problem by exploiting the channel duality. For the
transformed problem, we develop an efficient solution procedure that integrates
Lagrangian dual decomposition method, conjugate gradient projection method
based on matrix differential calculus, cutting-plane method, and subgradient
method. In our numerical example, it is shown that we can achieve a network
performance gain of 34.4% by using DPC.

<|endoftext|><|startoftext|>
DRAFT VERSION OCTOBER 31, 2018
Preprint typeset using LATEX style emulateapj v. 08/22/09
CRITERIA IN THE SELECTION OF TARGET EVENTS FOR PLANETARY MICROLENSING FOLLOW-UP
OBSERVATIONS
CHEONGHO HAN
Program of Brain Korea 21, Institute for Basic Science Research, Department of Physics,
Chungbuk National University, Chongju 361-763, Korea; cheongho@astroph.chungbuk.ac.kr
Draft version October 31, 2018
ABSTRACT
To provide criteria in the selection of target events preferable for planetary lensing follow-up observations,
we investigate the variation of the probability of detecting planetary signals depending on the observables
of the lensing magnification and source brightness. In estimating the probability, we consider variation of
the photometric precision by using a quantity defined as the ratio of the fractional deviation of the planetary
perturbation to the photometric precision. From this investigation, we find consistent result from previous
studies that the probability increases with the increase of the magnification. The increase rate is boosted at a
certain magnification at which perturbations caused by central caustic begin to occur. We find this boost occurs
at moderate magnifications of A . 20, implying that probability can be high even for events with moderate
magnifications. The probability increases as the source brightness increases. We find that the probability of
events associated with stars brighter than clump giants is not negligible even at magnifications as low as A ∼ 5.
In the absence of rare the prime target of very high-magnification events, we, therefore, recommend to observe
events with brightest source stars and highest magnifications among the alerted events. Due to the increase of
the source size with the increase of the brightness, however, the probability rapidly drops off beyond a certain
magnification, causing detections of low mass ratio planets (q . 10−4) difficult from the observations of events
involved with giant stars with magnifications A & 70.
Subject headings: gravitational lensing – planets and satellites: general
1. INTRODUCTION
With the advantages of being able to detect very low-mass
planets and those with separations from host stars that can-
not be covered by other methods, microlensing is one of
the most important methods that can detect and character-
ize extrasolar planets (Mao & Paczyński 1994; Gould & Loeb
1992). The microlensing planetary signal is a short duration
perturbation to the standard lensing light curve produced by
the primary star. To achieve high monitoring frequency re-
quired for the detection of the short-lived planetary signal,
current lensing experiments are employing early-warning sys-
tem to issue alerts of ongoing events in the early stage of
lensing magnification (Udalski et al. 1994; Bond et al. 2002)
and follow-up observations to intensively monitor the alerted
events (Dominik et al. 2002; Yoo et al. 2004). Under current
surveys, there exist in average & 50 alerted events at a certain
time (Dominik et al. 2002). Then, an important issue related
to the follow-up observation is which event should be moni-
tored for better chance of planet detections.
There have been several estimates of microlensing planet
detection efficiencies (Bolatto & Falco 1994; Bennett & Rhie
1996; Gaudi & Sackett 2000; Peale 2001). Most of these
works estimated the efficiency as a function of the instanta-
neous angular star-planet separation normalized by the angu-
lar Einstein radius, s, and planet/star mass ratio, q. However,
the efficiency determined in this way is of little use in the point
of view of observers who are actually carrying out follow-
up observations of lensing events. This is because the planet
parameters s and q are not known in the middle of lensing
magnification and thus they cannot be used as criteria in the
selection of target events for follow-up observations. Related
to the target selection, Griest & Safizadeh (1998) proposed a
useful criterion to observers. They pointed out that by focus-
ing on very high-magnification (A & 100) events, the proba-
bility of detecting planets in the lensing zone could be very
high. However, these events are rare and thus they cannot
be usually found in the list of alerted events. Therefore, it is
necessary to have criteria applicable to general lensing events
in the absence of very high-magnification events. To provide
such criteria, we investigate the dependency of the probability
of detecting planetary signals on the observables such as the
lensing magnification and source type.
The paper is organized as follows. In § 2, we briefly de-
scribe the basics of planetary microlensing. In § 3, we investi-
gate the variation of the probability of detecting planetary sig-
nals depending on the lensing magnification and source type
for events caused by planetary systems with different masses
and separations. We analyze the result and qualitatively ex-
plain the tendencies found from the investigation. Based on
the result of the investigation, we then present criteria for the
selection of target events preferable for follow-up observa-
tions. In § 4, we summarize the results and conclude.
2. BASICS OF PLANETARY LENSING
The lensing behavior of a planetary lens system is described
by the formalism of a binary lens with a very low-mass com-
panion. Because of the very small mass ratio, planetary a lens-
ing light curve is well described by that of a single lens of the
primary star for most of the event duration. However, a short-
duration perturbation can occur when the source star passes
the region around the caustics, that are the set of source po-
sitions at which the magnification of a point source becomes
infinite. The caustics of binary lensing form a single or multi-
ple sets of closed curves where each of which is composed of
concave curves (fold caustics) that meet at points (cusps).
For a planetary case, there exist two sets of disconnected
caustics: ‘central’ and ‘planetary’ caustics. The single central
http://arxiv.org/abs/0704.0968v1
2 MICROLENSING FOLLOW-UP CRITERIA
caustic is located close to the host star. It has a wedge shape
with four cusps and its size (width along the star-planet axis)
is related to the planet parameters by (Chung et al. 2005)
∆ξcc ∝
(s − 1/s)2
. (1)
For a given mass ratio, a pair of central caustics with sepa-
rations s and s−1 are identical to the first order of approxima-
tion (Dominik 1999; Griest & Safizadeh 1998; An 2005). The
planetary caustic is located away from the host star. The cen-
ter of the planetary caustic is located on the star-planet axis
and the position vector to the center of the planetary caustic
measured from the primary lens position is related to the lens-
source separation vector, s, by
rpc = s
. (2)
Then, the planetary caustic is located on the planet side, i.e.
sign(rpc) = sign(s), when s > 1, and on the opposite side, i.e.
sign(rpc) = −sign(s), when s < 1. When s > 1, there exists a
single planetary caustic and it has a diamond shape with four
cusps. When s< 1, there are two caustics and each caustic has
a triangular shape with three cusps. The size of the planetary
caustic is related to the planet parameters by
∆ξpc ∝
q1/2/(s
s2 − 1) for s > 1,
q1/2(κ0 − 1/κ0 +κ0/s2)cosθ0 for s < 1,
where κ(θ) =
[cos2θ± (s4 − sin2 2θ)1/2]/(s2 − 1/s2)
, θ0 =
[π±sin−1(31/2s2/2)]/2, and κ0 = κ(θ0) (Han 2006). The plan-
etary caustic is always bigger than the central caustic and the
size ratio between the two types of caustics, ∆ξcc/∆ξpc, be-
comes smaller as the mass of the planet becomes smaller and
the planet is located further away from the Einstein ring. The
planetary caustic is located within the Einstein ring of the pri-
mary when the planet is located in the range of separation
from the star of 0.6 . s . 1.6. The size of the caustic, which
is directly proportional to the planet detection efficiency, is
maximized when the planet is located in this range, and thus
this range is called as the ‘lensing zone’. As the position of
the planet approaches to the Einstein ring radius, s → 1, the
location of the planetary caustic approaches the position of
the central caustic. Then, the two types of caustic eventually
merge together, forming a single large one.
3. VARIATION OF DETECTABILITY
3.1. Quantification of Detectability
The quantity that has been often used in the previous es-
timation of the planet detection probability is the ‘fractional
deviation’ of the planetary lensing light curve from that of the
single lensing event of the primary, i.e.,
A − A0
. (4)
With this quantity, however, one cannot consider the variation
of the photometric precision depending on the lensing magni-
fication. In addition, it is difficult to consider the variation of
the detectability depending on the source type.
To consider the effect of source star brightness and its
lensing-induced variation on the planet detection probability,
we carry out our analysis based on a new quantity defined as
the ratio of the fractional deviation, ǫ, to the photometric pre-
cision, σ
, i.e,
D = |ǫ|
ν,S + Fν,B)1/2
(A − 1)F
, (5)
where F
ν,S and Fν,B represent the fluxes from the source star
and blended background stars, respectively. Here we as-
sume that photometry is carried out by using the difference
imaging method (Tomaney & Crotts 1996; Alard 1999). In
this technique, photometry of the lensed source star is con-
ducted on the subtracted image obtained by convolving two
images taken at different times after geometrically and pho-
tometrically aligning them. Then the signal from the lensed
star measured on the subtracted image is the flux variation
of the lensed source star, (A − 1)F
ν,S, while the noise orig-
inates from both the source and background blended stars,
ν,S + Fν,B. Under this definition of the planetary signal de-
tectability,D = 1 implies that the planetary signal is equivalent
to the photometric precision. Hereafter we refer the quantity
D as the ‘detectability’.
3.2. Contour Maps of Detectability
To see the variation of the detectability depending on the
separation parameter s, mass ratio q, and the types of involved
source star, we construct maps of detectability as a function
of the position in the source plane. Figure 1 shows example
maps. The individual sets of panels show the maps for events
associated with different types of source stars. All lengths
are normalized by the angular Einstein radius and ξ and η
represent the coordinates parallel with and normal to the star-
planet axis, respectively. A contours (yellow curve) is drawn
at the level of D = 3.0. The maps are centered at the position
of the primary lens star and the planet is located on the left.
The dotted arc in each panel represents the Einstein ring of the
primary star. The closed figures drawn by red curves represent
the caustics.
For the construction of the maps, we assume a mass of
the primary lens star of m = 0.3 M⊙ and distances to the
lens and source of DL = 6 kpc and DS = 8 kpc, respec-
tively. Then, the corresponding Einstein radius is rE =
(4Gm/c2)[(DL(DS − DL)/DS]
= 1.9 AU. For the source
stars, we test three different types of giant, clump giant, and
main-sequence stars. The assumed I-band absolute magni-
tudes of the individual types of stars are MI = 0.0, 1.5, and
3.6, respectively. With the assumed amount of extinction to-
ward the Galactic bulge field of AI = 1.0, these correspond
to the apparent magnitudes of I = 15.5, 17, and 19.1, respec-
tively. As the source type changes, not only the brightness
but also the size of the star changes. Source size affects the
planetary signal in lensing light curves (Bennett & Rhie 1996)
and thus we take account the finite source effect into consid-
eration. The assumed source radii of the individual types of
source stars are 10.0 R⊙, 3.0 R⊙, and 1.1 R⊙, respectively.
We assume that events are affected by blended flux equivalent
to that of a star with I = 20. We note that the adopted lens
and source parameters are the typical values of Galactic bulge
events that are being detected by the current lensing surveys
(Han & Gould 2003).
For the observational condition, we assume that images are
obtained by using 1 m telescopes, which are typical ones be-
ing used in current follow-up observations. We also assume
that the photon acquisition rate of each telescope is 10 pho-
tons per second for an I = 20 star and a combined image with
HAN 3
FIG. 1.— Contour maps of the detectability of the planetary signal, D, as a function of the position in the source plane for events caused by planetary systems
with various lens-source separations and mass ratios. The detectability represents the ratio of the fractional deviation of the planetary lensing light curve from
the single lensing light curve of the primary to the photometric precision. All lengths are normalized by the angular Einstein radius and ξ and η represent the
coordinates parallel with and normal to the star-planet axis, respectively. The individual sets of panels show the maps for events associated with different types
of source stars. Contours (yellow curve) are drawn at the level of D = 3.0. The maps are centered at the position of the primary lens star and the planet is located
on the left. The dotted arc in each panel represents the Einstein ring of the primary star. The closed figures drawn by red curves represent the caustics. For the
details about the assumed lens parameters and observational conditions, see § 3.2.
FIG. 2.— Geometric representation of the probability of detecting planetary
signals, P. Under the definition of P as the average probability of detecting
planetary signals with a detectability greater than a threshold value Dth at
the time of observation with a magnification A, the probability corresponds
to the portion of the arclet(s) where the detectability is greater than a threshold
value out of a circle around the primary with a radius equal to the lens-source
separation corresponding to the magnification at the time of observation. The
individual circles in the upper panel correspond to the source positions at
which the lensing magnifications are A = 1.5 (pink), 3.0 (cyan), 5.0 (green),
and 10.0 (red), respectively. The curves in the bottom panels show the vari-
ation of the detectability as a function of the position angle (θ) of points on
the circles with corresponding colors in the upper panel. We set the thresh-
old detectability as Dth = 3.0, i.e. 3σ detection of the planetary signal. The
dashed circle represents the Einstein ring.
a total exposure time of 5 minutes is obtained from each set
of observations.
3.3. Probability of Detecting Planetary Signals
Based on the maps of detectability, we then investigate the
probability of detecting planetary signals as a function of the
lensing magnification. We define the probability P as the av-
erage probability of detecting planetary signals with a de-
tectability greater than a threshold value Dth at the time of
observation with a magnification A. Geometrically, this prob-
ability corresponds to the portion of the arclet(s) where the
detectability is greater than a threshold value out of a circle
FIG. 3.— Probability of detecting planetary signals as a function of lens-
ing magnification. The individual panels show the probabilities for events
involved with different types of source stars. The curves in each panel show
the variation of the probability for planets with different mass ratios and sep-
arations. We note that although not presented, the probabilities for planets
with separations s < 1 are similar to those of the corresponding planets with
s−1. The probability is defined the average probability of detecting plane-
tary signals with a detectability greater than a threshold value Dth at the time
of observation with a magnification A. We set the threshold detectability as
Dth = 3.0, i.e. 3σ detection of the planetary signal. We note that there is a
maximum magnification specific to the angular size of the source star and
thus the curves stop at certain magnifications.
around the primary with a radius equal to the lens-source sep-
aration corresponding to the magnification at the time of ob-
servation. This is illustrated in Figure 2. We note that the
magnification is a unique function of the absolute value of the
lens-source separation u1, and thus A = const corresponds to
a circle around the lens. The lens-source separation is related
to the magnification by
u(A) =
(1 − A−2)1/2
. (6)
We set the threshold detectability as Dth = 3.0, i.e. 3σ detec-
tion of the planetary signal.
1 Strictly speaking, the magnification depends additionally on the size of
the source star.
4 MICROLENSING FOLLOW-UP CRITERIA
TABLE 1
LIMITATION BY FINITE-SOURCE EFFECT
source type event type
giant A & 70 for planets with q . 10−3
clump giant A & 200 for planets with q . 5× 10−4
main-sequence A & 500 for planets with q . 10−4
NOTE. — Cases of planetary microlensing events
where detection of planetary signal is limited by finite
source effect. We note that “-” means the respective con-
figuration cannot be realized.
In Figure 3, we present the resulting probability as a func-
tion of magnification. The individual panels show the proba-
bilities for events involved with different types of source stars.
In each panel, we present the variations of the probability for
planets with different mass ratios and separations. We test six
different planetary separations of s = 1/1.6, 1/1.4, 1/1.2, 1.2,
1.4, and 1.6 as representative values for planets in the lensing
zone. For the mass ratio, we test five values of q = 5× 10−3,
10−3, 5× 10−4, 10−4, and 5× 10−5.
From the variation of the probability, we find the follow-
ing tendencies. First, we find that the probability increases
with the increase of the lensing magnification. This is con-
sistent with the result of K. Horne (private communication).
This tendency is due to three factors. First, the size of the
planetary caustic increases as it is located closer to the pri-
mary star. This can be seen in Figure 4, where we present the
relation between the location of the planetary caustic and its
size, which is obtained by using equations (2) and (3). Then,
higher chance of planetary perturbation is expected when the
source is located closer to the primary during which the lens-
ing magnification is high. Second, perturbation regions of
the same size cover a larger range of angle as the planetary
caustic moves closer to the lens. This also contributes to the
higher probability. Third, the photometric precision improves
with the increasing brightness of the source star due to lensing
magnification. As the photometric precision improves, it is
easier to detect small deviations induced by planets. The same
reason can explain the considerable size of the perturbation
region induced by central caustics. Perturbations induced by
the central caustics occur at high magnifications during which
the photometric precision is high. As a result, despite much
smaller size of the central caustic than that of the planetary
caustic, the central perturbation region is considerable and can
even be comparable to the perturbation region induced by the
planetary caustic. This can be seen in the detectability maps
presented in Figure 1.
However, the probability does not continue to increase with
the increase of the magnification. Instead, the probability
drops off rapidly beyond a certain magnification. This critical
value corresponds to the magnification at which finite-source
effect begins to wash out the planetary signal. In Table 1, we
present the cases where finite source effect limits planet detec-
tions. As a result, detections of planets with low mass ratios
would be difficult for events involved with giant source stars
with magnifications A & 70. We note that the finite source
effect also limits the maximum magnifications of events and
thus the curves in Figure 3 discontinue at a certain value.
Second, as the magnification increases, the probability of
detecting planetary signal increases with two dramatically dif-
ferent rates of dP/d logA. We find that this abrupt change
of dP/d logA occurs due to the transition from the regime of
FIG. 4.— Variation of the size of the planetary caustic as a function of its
location. The value rpc represents the separation between the center of the
planetary caustic and the primary lens star. The sign of rpc is positive when
the caustic is on the planet side and vice versa. We note that the caustic size
at around rpc is not presented because the analytic expression in eq. (1) is not
valid in this region. In addition, there is no distinction between the planetary
and central caustics in this region.
perturbations induced by planetary caustics into the one of
perturbations induced by central caustics. The perturbation
region induced by the central caustic forms around the pri-
mary lens and thus the probability becomes very high once the
source star is in the central perturbation regime. The boost of
the increase rate occurs at different magnifications depending
on the planetary parameters and the types of involved source
stars. The critical magnification becomes lower as the mass
ratio of the planet increases and the separation of the planet
approaches the Einstein ring radius. In Table 2, we present
these critical magnifications. An important finding to be noted
is that the critical magnification occurs at moderate magnifi-
cations of . 20 for a significant fraction of events caused by
planetary systems with planets located in the lensing zone.
This implies that probability of detecting planetary signal can
be high even for events with moderate magnifications.
Third, the probability is higher for events involved with
brighter source stars. This is because of the improved pho-
tometric precision with the increase of the source brightness.
The difference in the probability depending on the source type
is especially important at low magnifications. For example,
the probabilities at a magnification of A = 5 for events caused
by a common planetary system with q = 10−3 and s = 1.2 but
associated with different source stars of giant, clump giant,
and main-sequence are P ∼ 20%, 10%, and 1%, respectively.
In the absence of high magnification events, therefore, the sec-
ond prime candidate event for follow-up observation is the
one involved with brightest source star. As the magnification
further increases and once the source star enters the central
perturbation region, the difference becomes less important.
4. SUMMARY AND CONCLUSION
For the purpose of providing useful criteria in the selec-
tion of target events preferable for planetary lensing follow-
up observations, we investigated the variation of the proba-
bility of detecting planetary lensing signals depending on the
observables of the lensing magnification and source bright-
ness. From this investigation, we found consistent result from
previous studies that the probability increases with the in-
crease of the lensing magnification due to the improvement
HAN 5
TABLE 2
CRITICAL MAGNIFICATIONS OF CENTRAL PERTURBATION
source planetary mass ratio
type separation q = 5× 10−3 q = 10−3 q = 5× 10−4 q = 10−4 q = 5× 10−5
s = 1.2, 1/1.2 A ∼ 2.2 A ∼ 7 A ∼ 8 A ∼ 22 A ∼ 22
giant s = 1.4, 1/1.4 A ∼ 2.5 A ∼ 8 A ∼ 12 – –
s = 1.6, 1/1.6 A ∼ 3.5 A ∼ 9 A ∼ 18 – –
clump s = 1.2, 1/1.2 A ∼ 7 A ∼ 8 A ∼ 11 A ∼ 30 A ∼ 60
giant s = 1.4, 1/1.4 A ∼ 8 A ∼ 12 A ∼ 17 A ∼ 60 A ∼ 80
s = 1.6, 1/1.6 A ∼ 9 A ∼ 16 A ∼ 20 A ∼ 745 –
main s = 1.2, 1/1.2 A ∼ 6 A ∼ 11 A ∼ 20 A ∼ 55 A ∼ 100
sequence s = 1.4, 1/1.4 A ∼ 8 A ∼ 20 A ∼ 30 A ∼ 100 A ∼ 150
s = 1.6, 1/1.6 A ∼ 11 A ∼ 30 A ∼ 40 A ∼ 150 A ∼ 200
NOTE. — Critical magnifications at which transition from the regime of perturbations induced
by planetary caustics into the one of perturbations induced by central caustics occur. We note that
the critical magnifications are . 20 in many cases.
of the photometric precision combined with the expansion of
the perturbation region. The increase rate of the probabil-
ity is boosted at a certain magnification at which perturba-
tion caused by the central caustic begins to occur. We found
that this boost occurs at moderate magnifications of A . 20
for a significant fraction of events caused by planetary sys-
tems with planets located in the lensing zone, implying that
probabilities can be high even for events with moderate mag-
nifications. The probability increases with the increase of the
source star brightness. We found that the probability of events
associated with source stars brighter than clump giants is not
negligible even at magnifications as low as A ∼ 5. In the ab-
sence of rare prime target of very high-magnification events
(A & 100), we, therefore, recommend to observe events with
brightest source stars and highest magnifications among the
alerted events. Due to the increase of the source size with the
increase of the brightness, however, the probability rapidly
drops off beyond a certain magnification. As a result, detec-
tions of planets with low mass ratios (q . 10−4) would be dif-
ficult for events involved with giant source stars with magni-
fications A & 70.
This work was supported by the Astrophysical Research
Center for the Structure and Evolution of the Cosmos (ARC-
SEC) of Korea Science and Engineering Foundation (KOSEF)
through Science Research Program (SRC) program.
REFERENCES
Alard, C. 1999, A&A, 343, 10
An, J. H. 2005, MNRAS, 356, 1409
Bennett, D. P., & Rhie, S. H. 1996, ApJ, 472, 660
Bolatto, D. B., & Falco, E. E. 1994, ApJ, 436, 112
Bond, I., et al. 2002, MNRAS, 331, L19
Chung, S. J., et al. 2005, ApJ, 630, 535
Dominik, M. 1999, A&A, 349, 108
Dominik, M., et al. 2002, Planetary and Space Science, 50, 299
Gould, A., & Loeb, A. 1992, ApJ, 396, 104
Griest, K., & Safizadeh, N. 1998, ApJ, 500, 37
Gaudi, B. S., & Sackett, P. D. 2000, ApJ, 532, 340
Han, C. 2006, ApJ, 638, 1080
Han, C., & Gould, A. 2003, ApJ, 592, 172
Mao, S., & Paczyński, B. 1991, ApJ, 374, L37
Peale, S. J. 2001, ApJ, 552, 889
Tomaney, A. B., & Crotts, A. P. S. 1996, AJ, 112, 2872
Udalski, A., Szymański, M., Kałużny, J., Kubiak, M., Mateo, M.,
Krzemiński, W., & Paczyński, B. 1994, 44, 227
Yoo, J., et al. 2004, ApJ, 616, 1204
This figure "fig1.jpg" is available in "jpg"
 format from:
http://arxiv.org/ps/0704.0968v1
http://arxiv.org/ps/0704.0968v1
ABSTRACT
  To provide criteria in the selection of target events preferable for
planetary lensing follow-up observations, we investigate the variation of the
probability of detecting planetary signals depending on the observables of the
lensing magnification and source brightness. In estimating the probability, we
consider variation of the photometric precision by using a quantity defined as
the ratio of the fractional deviation of the planetary perturbation to the
photometric precision. From this investigation, we find consistent result from
previous studies that the probability increases with the increase of the
magnification. The increase rate is boosted at a certain magnification at which
perturbations caused by central caustic begin to occur. We find this boost
occurs at moderate magnifications of $A\lesssim 20$, implying that probability
can be high even for events with moderate magnifications. The probability
increases as the source brightness increases. We find that the probability of
events associated with stars brighter than clump giants is not negligible even
at magnifications as low as $A\sim 5$. In the absence of rare the prime target
of very high-magnification events, we, therefore, recommend to observe events
with brightest source stars and highest magnifications among the alerted
events. Due to the increase of the source size with the increase of the
brightness, however, the probability rapidly drops off beyond a certain
magnification, causing detections of low mass ratio planets ($q\lesssim
10^{-4}$) difficult from the observations of events involved with giant stars
with magnifications $A\gtrsim 70$.

<|endoftext|><|startoftext|>
The dissolution of the vacancy gas and grain boundary diffusion
in crystalline solids
Fedor V.Prigara
Institute of Microelectronics and Informatics, Russian Academy of Sciences,
21 Universitetskaya, Yaroslavl 150007, Russia∗
(Dated: September 12, 2021)
Abstract
Based on the formula for the number density of vacancies in a solid under the stress or tension,
the model of grain boundary diffusion in crystalline solids is developed. We obtain the activation
energy of grain boundary diffusion (dependent on the surface tension or the energy of the grain
boundary) and also the distributions of vacancies and the diffusing species in the vicinity of the
grain boundary.
PACS numbers: 61.72.Bb, 66.30.Dn, 68.35.-p
http://arxiv.org/abs/0704.0972v4
Recently, it was shown that sufficiently high pressures as well as mechanical stresses
applied to a crystalline solid lead to the decrease in the energy of the vacancy formation
and create, therefore, an additional amount of vacancies in the solid [1]. The last effect
enhances self-diffusion in the crystal which is normally vacancy-mediated, at least in simple
metals. Since large mechanical stresses are normally present in grain boundaries, these new
results can elucidate the mechanisms of grain boundary diffusion which have remained so
far unclear [2].
According to the thermodynamic equation [3]
dE = TdS − pdV, (1)
where E is the energy, T is the temperature, S is the entropy, p is the pressure, and V is
the volume of a solid, the energy of a solid increases with pressure, so the pressure acts as
the energy factor similarly to the temperature. Therefore, the number of vacancies in a solid
increases both with temperature and with pressure.
The thermodynamic consideration based on the Clausius- Clapeyron equation gives the
number density n of vacancies in a solid in the form [1]
n = (P0/T ) exp (−Ev/T ) = (n0T0/T ) exp (−Ev/T ) , (2)
where Ev is the energy of the vacancy formation, P0 = n0T0 is a constant, T0 can be put
equal to the melting temperature of the solid at ambient pressure, and the constant n0 has
an order of magnitude of the number density of atoms in the solid. Here the Boltzmann
constant kB is included in the definition of the temperature T.
The formula (2) describes the thermal expansion of the solid. It should be taken into
account that the dissolution of the vacancy gas in a solid causes the deformation of the
crystalline lattice and changes the lattice parameters.
The energy of the vacancy formation Ev depends linearly on the pressure P (in the region
of high pressures) as given by the formula
Ev = E0 − αP/n0, (3)
where α is a dimensionless constant, α ≈ 18 for sufficiently high pressures. On the atomic
scale, the pressure dependence of the energy of the vacancy formation in the equation (3) is
produced by the strong atomic relaxation in a crystalline solid under high pressure.
With increasing pressure, the number density of vacancies in a solid increases, according
to the relation
n = (n0T0/T ) exp (− (E0 − αP/n0) /T ) , (4)
and, finally, the vacancies can condense, forming their own sub-lattice. Such is the explana-
tion of the appearance of composite incommensurate structures in metals and some other
elemental solids under high pressure [4-7].
Further increase of the number density of vacancies in a solid with increasing pressure
leads to the melting of the solid under sufficiently high pressure (and fixed temperature).
Such effect has been observed in sodium [6]. In general, such behavior is universal for solids,
though the corresponding melting pressure is typically much larger than those for sodium.
We assume that the melting of the crystalline solid occurs when the critical number
density nc of vacancies is achieved. In view of the equation (2), it means that the ratio
of the energy of the vacancy formation Ev to the melting temperature Tm of the solid is
approximately constant,
Ev/Tm ≈ α. (5)
The value of the constant α in the last relation can be determined from the empirical
relation between the activation energy of self diffusion (which is approximately equal to the
energy of vacancy formation) and the melting temperature of a solid [8]:
E0 ≈ 18Tm, (6)
so that α ≈ 18.
Substituting the expression (3) in the relation (5), we obtain
(E0 − αP/n0) /Tm ≈ α. (7)
The last equation gives the melting curve of the crystalline solid in the region of high
pressures in the form
T + P/n0 ≈ E0/α ≈ T0, (8)
where T0 is the melting temperature of the solid at ambient pressure.
The constant n0 can be determined from the relation between the tensile strength σs and
the melting temperature Tm of a solid [1]
n0 ∼= σs/Tm. (9)
The numerical value of this constant is n0 ≈ 1.1× 10
22cm−3 [1].
Replacing in the relation (4) the pressure P by the absolute value of the stress or tension
σ = F/S, applied to a solid, where F is the applied force and S is the cross-section area of
the solid in the plane perpendicular to the direction of the applied force, we can estimate
the mean number density of vacancies in the solid under the stress or tension:
〈n〉 ∼= (n0T0/T ) exp (− (E0 − ασ/n0) /T ) . (10)
The dissolution of the vacancy gas in a solid under the stress or tension is responsible
for the low values of the elastic limit and the tensile strength of solids as compared with
theoretical estimations not taking into account this process [9].
As indicated above, large mechanical stresses are normally present in grain boundaries.
The absolute value σb of the mechanical stress in the close vicinity of a grain boundary is
given by the formula
σb ∼= γb/r0, (11)
where γb is the energy of the grain boundary and r0 is the radius of the atomic relaxation
region (around a vacancy) which will be estimated below.
According to the relation (10), the energy of the vacancy formation in the close vicinity
of the grain boundary is given by the formula
Eb = E0 − αγb/ (n0r0) . (12)
For the small values of misorientation angle θ 6 10 − 15 degrees, the energy of the
dislocation structure contributes to the energy of the grain boundary [10]. However, for
larger misorientation angles, the energy of the grain boundary is approximately constant
and is determined by the surface tension γ of the solid, γb ∼= γ.
Due to the Einstein relation between the mobility of an atom, µ = v/F , where v is the
velocity of the atom and F is the force acting on the atom, and the diffusion coefficient D
µ = v/F = D/T, (13)
the speed of grain boundary motion v is proportional to the diffusion coefficient D⊥ for
self-diffusion in the direction perpendicular to the plane of a grain boundary. Therefore,
the activation energy E of grain boundary motion is equal to the activation energy E⊥ of
self-diffusion across the grain boundary. The last activation energy is equal to the activation
energy Eb of grain boundary self-diffusion in the case of high-angle grain boundaries, and
is approximately equal to the activation energy E0 of bulk self-diffusion for low-angle grain
boundaries. Thus, there is a step of the activation energy for grain boundary motion at
some critical value θc of the misorientation angle (θc = 10− 15
◦, as indicated above). Such
a step of the activation energy for grain boundary motion has been observed experimentally
in high-purity aluminium, the critical value of the misorientation angle being in this case
θc = 13.6
◦ [11].
The driving force for grain boundary motion is provided by the distribution of mechanical
stresses in a crystalline solid [12].
Assuming that the free surface of a crystalline solid is formed by the plane of vacancies,
we can estimate the surface tension of the solid as follows
γ ∼= βn0E0a0, (14)
where a0 = n
∼= 0.45nm has an order of magnitude of the lattice spacing a, and β is a
dimensionless constant which has an order of unity. For hard metals such as Al, Zr, Nb, Fe,
Pt, β ∼= 0.8. In the case of mild metals, β is normally smaller, e.g. for Rb and Sr, β ∼= 1/4.
Substituting the estimation (14) for the energy of the grain boundary γb ∼= γ in the
equation (12), we find
Eb ≈ E0 (1− βαa0/r0) . (15)
Due to the atomic relaxation and thermal motion of atoms, the migration barriers are
small [2,13], and the activation energy of self-diffusion is approximately equal to the energy
of the vacancy formation. The analysis of experimental data on the activation energy of
grain boundary self-diffusion gives an empirical relation [14]
Eb ≈ 9Tm ≈ E0/2. (16)
From equations (15) and (16), we find the estimation of the radius of the atomic relaxation
region,
r0 ≈ 2βαa0 ∼= αa0, (17)
since β has an order of unity. The radius of the atomic relaxation region has an order of
r0 ∼= 18n
≈ 8nm. This value is comparable with the diameters of tracks produced by
high energy ions in metals [15-17]. The grain boundary diffusion width δ [14] is smaller than
the radius of the atomic relaxation region due to the non-uniform distribution of vacancies
inside the atomic relaxation region in the grain boundary.
If we assume that the mechanical stress σ decreases linearly with the distance x from the
plane of the grain boundary,
σ = σ0 (1− kx) , (18)
where σ0 is the stress at the boundary of the atomic relaxation region with the width r0 in
the grain boundary (this value is smaller than σb ∼= γ/r0 ∼= (1/2)n0Tm and has an order of
magnitude σ0 ∼= (1/2)n0T ), then the equation (10) gives the distribution of vacancies in the
vicinity of the grain boundary in the form
n ∼= (n0T0/T ) exp (− (E0 − ασ0 (1− kx) /n0) /T ) = nbexp (−ασ0kx/ (n0T )) , (19)
where nb is the number density of vacancies at the boundary of the atomic relaxation region.
Due to the trapping by vacancies [18], the distribution of the concentration c of the
diffusing species in the vicinity of the grain boundary follows the same law:
c ∼= cbexp (−x/l) , (20)
where cb is the concentration of the diffusing species at the boundary of the relaxation region,
and the scale l is given by the formula
l = n0T/ (ασ0k) . (21)
Here k has an order of magnitude of 1/d, d being the size of the grain, so that l ∼= d/α. The
penetration profiles described by the equation (20) have been indeed observed experimentally
in the case of grain boundary diffusion in metals [8, 18], the measured penetration depth l
having an order of a few micrometers [8].
To summerize, we obtained the dependence of the activation energy of grain boundary
self-diffusion on the energy of the grain boundary, the estimation of the surface tension of a
solid and of the energy of the grain boundary, and the width of the atomic relaxation region
in the grain boundary (or the radius of the atomic relaxation region around a vacancy). We
obtained further the distributions of vacancies and the diffusing species in the vicinity of
the grain boundary. The obtained radius of the atomic relaxation region is consistent with
the diameters of tracks produced by high energy ions in metals.
—————————————————————
[1] F.V.Prigara, E-print archives, cond-mat/0701148.
[2] A.Suzuki and Y.Mishin, J. Mater. Sci. 40, 3155 (2005).
[3] S.-K.Ma, Statistical Mechanics (World Scientific, Philadelphia, 1985).
[4] R.J.Nelmes, D.R.Allan, M.I.McMahon, and S.A.Belmonte, Phys. Rev. Lett. 83, 4081
(1999).
[5] M.I.McMahon, S.Rekhi, and R.J.Nelmes, Phys. Rev. Lett. 87, 055501 (2001).
[6] O.Degtyareva, E.Gregoryanz, M.Somayazulu, H.K.Mao, and R.J.Hemley, Phys. Rev.
B 71, 214104 (2005).
[7] V.F.Degtyareva, Usp. Fiz. Nauk 176, 383 (2006) [Physics- Uspekhi 49, 369 (2006)].
[8] B.S.Bokstein, S.Z.Bokstein, and A.A.Zhukhovitsky, Thermodynamics and Kinetics of
Diffusion in Solids (Metallurgiya Publishers, Moscow, 1974).
[9] G.I.Epifanov, Solid State Physics (Higher School Publishers, Moscow, 1977).
[10] A.A.Smirnov, Kinetic Theory of Metals (Nauka, Moscow, 1966).
[11] M.Winning, G.Gottstein, and L.S.Shvindlerman, Acta Mater. 49, 211 (2001).
[12] K.J.Draheim and G.Gottstein, in APS Annual March Meeting, 17- 21 March 1997,
Abstract D41.87.
http://arxiv.org/abs/cond-mat/0701148
[13] B.P.Uberuaga, G.Henkelman, H.Jonsson, S.T.Dunham, W.Windl, and R.Stumpf,
Phys. Stat. Sol. B 233, 24 (2002).
[14] I.Kaur and W.Gust, Fundamentals of Grain and Interphase Boundary Diffusion
(Ziegler Press, Stuttgart, 1989).
[15] F.F.Komarov, Usp. Fiz. Nauk 173, 1287 (2003) [Physics- Uspekhi 46, 1253 (2003)].
[16] F.V.Prigara, E-print archives, cond-mat/0406222.
[17] M.Toulemonde, C.Trautmann, E.Balanzat, K.Hjort, and A.Weidinger, Nucl. In-
strum. Meth. B 217, 7 (2004).
[18] W.P.Ellis and N.H.Nachtrieb, J. Appl. Phys. 40, 472 (1969).
∗ Electronic address: fvprigara@rambler.ru
http://arxiv.org/abs/cond-mat/0406222
mailto:fvprigara@rambler.ru
	References
ABSTRACT
  Based on the formula for the number density of vacancies in a solid under the
stress or tension, the model of grain boundary diffusion in crystalline solids
is developed. We obtain the activation energy of grain boundary diffusion
(dependent on the surface tension or the energy of the grain boundary) and also
the distributions of vacancies and the diffusing species in the vicinity of the
grain boundary.

<|endoftext|><|startoftext|>
Introduction
Young rotation-powered pulsars typically radiate a large fraction of their spin-down
energy at X-ray energies. Observations in this band are thus important to the study of the
spin-down evolution of such pulsars and their emission mechanism(s). The study also helps to
understand the mechanical energy output of the pulsars into their surroundings, manifested
as pulsar wind nebulae (PWNe). To this end, one needs to monitor the spin-down at various
evolutionary stages of young pulsars and to measure their energy spectra, both pulsed and
unpulsed, with various viewing angles. However, only a dozen or so of young pulsars with
PWNe have been identified and studied in detailed so far.
The recently discovered 136 ms pulsar PSR J1930+1852 at the center of the supernova
remnant (SNR) G54.1+0.3 is the latest example of a Crab-like pulsar (Camilo et al. 2002).
Known as the “Bulls-Eye” pulsar, PSR J1930+1852 is surrounded by a bright symmetric
ring of emission (Lu et al. 2002) similar to the toroidal and jet-like structure associated
with the Crab pulsar, but viewed nearly face-on. Based on the initial timing parameters,
PSR J1930+1852 is the eighth most energetic pulsar known, with a rotational energy loss
rate of Ė = 1.2 × 1037 erg s−1, well above the empirical threshold for generating a bright
pulsar wind nebula (Ė ∼> 4 × 10
36 erg s−1, Gotthelf 2004). Such young pulsars are often
embedded in observable shell-type remnant which have yet to dissipate. However, like the
Crab, G54.1+0.3 lacks evidence for a thermal remnant in any waveband (Lu et al. 2002).
Most likely, the SN ejecta in these two remnants are still expanding into a very low density
medium.
In this paper we present the first dedicated X-ray timing and spectral follow-up observa-
tions of PSR J1930+1852 since discovery. Previous X-ray results were based on archival data
of limited quality. We use the new data to characterize the pulse shape and energy spectrum
and provide a long term ephemeris. Throughout the paper, the uncertainties (statistical
fluctuation only) are quoted at the 68% confidence level.
2. Observations and Data Analysis
The pulsar PSR J1930+1852 was observed twice with RXTE on 2002 September 12
– 14 and on 2002 December 23 – 25 using a combination of event and instrument modes.
For consistency, we analyze the data taken with the proportional counter array (PCA) in
the Good Xenon mode. PCA has a field of view of 1◦ (FWHM), total collecting area of
about 6500 cm2, time resolution of 1 µs, and spectral resolution of ≤ 18% at 6 keV. The
data are reduced and analyzed using the ftools software package version v5.2. We filter
– 3 –
the data using the standard RXTE criteria, selecting time intervals for which parameters
Elevation Angles < 10◦, Time Since SAA ≥ 30 min, Pointing Offsets < 0.02◦, and the
background electron rate Electron2 < 0.1. The effective exposure time after this filtering
is 31.7 ks and 41.7 ks for the September and December observations. Since the background
of RXTE is high and the spectral resolution is relatively low, the RXTE data is used
herein exclusively for timing analysis, selecting photons detected from PCA PHA channels
0 − 35 (∼ 2 − 15 keV). This results in a total of ∼ 1 and ∼ 1.6 million counts in the two
observations for the subsequent analysis. The photon arrival times are corrected to the Solar
system barycenter, based on the DE200 Solar ephemeris time system and the Chandra J2000
coordinates of J193030.13+185214.1 (Lu et al. 2002).
SNR G54.1+0.3 was also observed with Chandra on 2003 June 30 for a total of 58.4 ks.
The pulsar was placed at the aim-point of the front-illuminated ACIS-I detector. The CCD
chip I3 was operated in continuous-clocking mode (CC-mode), providing a time resolution of
2.85 ms and an one-dimensional imaging, in which the 2-D CCD image is integrated along the
column direction in each CCD readout cycle. The photon arrival times are post-processed to
account for the spacecraft dithering and SIM motion prior to the barycenter correction. The
spectral data are corrected for the effects of CTI (Charge Transfer Inefficiency). However,
the spectral gain is not well calibrated in the CC-mode, requiring adjustment in the fitting
process (details are given in §3). Spectral response matrices are generated for the ACIS-I
aimpoint, the location of the pulsar in this observation. After filtering the data using the
standard criteria, the remaining effective exposure is 57.2 ks. Reduction and analysis of
the Chandra data are all based on the standard software package CIAO (v3.2) and CALDB
(v3.0.0).
Figure 1 presents the geometry of the CC-mode observation overlaid on an archival
Chandra X-ray image of SNR G54.1+0.3. The CCD image is summed along the dimension
perpendicular to the marked line which is orientated with a position angle P.A. = 19◦ East
of North. The count distribution along this dimension is shown in Figure 2. The central
peak corresponds to the presence of the pulsar, which significantly contributes to the six
adjacent pixels, as denoted by the upper horizontal bar. The neighboring four pixels (two
on each side of the pulsar region), marked by the two lower horizontal bars, show the nearly
same intensity level in the ACIS-I3 image-mode data with the pulsar excised. We therefore
select counts falling in the inner six pixels for both our pulsar timing and spectral analysis of
the pulsar, while those counts in the outer four pixels are used to estimate the background
from the surrounding nebula.
– 4 –
Fig. 1.— Geometry of the Chandra ACIS-I3 CCD continuous-clocking (CC-mode) obser-
vation of PSR J1930+1852 presented herein. The dashed line gives the orientation of the
CC-mode observation shown overlaid on an archival Chandra broadband (0.3−10 keV) X-ray
image of SNR G54.1+0.3.
– 5 –
Fig. 2.— Source and background region determination for the Chandra CC-mode observation
of PSR J1930+1852. The 1-D count distribution of SNR G54.1+0.3 as observed in CC-
mode using ACIS-S3 (solid line), compared with the distributions constructed from the
collapsed Chandra ACIS-I3 image-mode data, with (dashed line) and without (dotted) the
pulsar excised. All data is restricted to the 0.3−10 keV energy band. The on-pulsar (central)
thick horizontal bar denotes the 6 pixels that contains significant pulsar emission, while the
two adjacent off-pulsar (outer) thick horizontal bars mark the pixels that are used to estimate
the local nebula background (see §2 for details).
– 6 –
3. Results
3.1. Pulsar Timing
For each observation, we search for the periodic signal of PSR J1930+1852 by folding
events around the period extrapolated from the early radio ephemeris of Camilo et al (2002).
For each period folding with a period P , a χ2 is calculated from the fit to the pulse profile
with a constant count rate. The null hypothesis of no periodic signal can be ruled out when
a significant peak is seen in the resultant “periodogram” (χ2 vs. P ), which is the case for
each of the X-ray observations at a high confidence (χ2 > 300 for 10 phase bins). We further
fit the peak shape with a Gaussian profile to maximize the accuracy of our pulsar period
determination (Figure 3). The centroid of this Gaussian is then taken as the best estimate of
the pulsar period. The light curves derived of the RXTE and Chandra observations folded
at the measured periods are shown in Figures. 4–5.
To estimate the uncertainties in the period measurements, we use the bootstrap tech-
nique of Diaconis & Efron (1983). This is done as the following: (1) constructing a new data
set of the same total number of counts by re-sampling with replacement from the observed
events; (2) determining the period with this re-sampled data set in the exactly same way as
with the original data; (3) repeating the above two steps for 500 times to produce a period
distribution; (4) Using the dispersion of this distribution as an estimate of the 1σ period
uncertainty. The distributions produced for the three observations are shown in the right
column of Figure 3, while the estimated uncertainties are included in Table 1.
To compute the pulsed fraction of the X-ray emission from PSR J1930+1852, we used
the Chandra observation. We extracted a total of 5506 counts in the 0.3 − 10 keV band
from the on-pulsar pixels of the 1-D count distribution (the solid curve in Figure 2). After
subtracting the local nebular contribution estimated from the neighboring off-pulsar pixels,
the remaining 3560±92 counts are considered as the net total emission from the pulsar. This
emission can be further divided into the pulsed and persistent components. To determine
the persistent component, we construct a 1-D distribution of the persistent emission from
the off-pulse counts, defined to be in the phase interval 0.1 − 0.3 (Figure 5). The same
on-pulsar pixels as shown in Figure 2 now contain a total of 598 counts, Corrected for the
off-pulse phase fraction (1/5), the total number of persistent counts over the entire phase is
then 598×5. Therefore, the net number of the pulsed counts is (5506-598×5)=2516±143.
This results in a pulsed fraction of fp ≡ (pulsed/total counts) = 71± 5%.
– 7 –
3.2. Pulsed Emission Spectral Characteristics
To check for phase-dependent spectral variations across the pulse profile we compute
the hardness ratio in each phase bin, defined as HR = Nh/Ns, where Ns and Nh are the
counts selected from the 0.3− 3 keV and 3− 10 keV energy bands, respectively. The pulsar
counts (pulsed and unpulsed) are extracted from the 6 pixel source region as discussed in §2
and the background from the neighboring 4 pixels. The calculated HR is shown in the lower
panel of Figure 5. Fitting these HR data points assuming a constant HR value resulted in
a χ2 of 17.94 for 9 degrees of freedom, which means that the hardness ratio changes with
phase at a confidence level of 96.4%. Further more, it appears that the HR values of the
on pulse emission are higher than those of the off-pulse emission. In order to quantify this,
we computed the mean HR for the off-pulse emission (bins 1, 2 3 and 10 in the panel) as
HR = 0.77±0.08 and the on-pulse bins (4 to 9) as HR = 0.95±0.04. Therefore, the on-pulse
emission is harder than the off-pulse emission at a confidence level of ∼ 2σ, or 98%.
Next, we study the Chandra spectrum of PSR J1930+1852 using the same sources
and background counts as extracted above. For the pulsed spectrum, the phase width
corrected off-pulse counts are subtracted from the on-pulse counts in each spectral bin.
Figure 6a presents the best fit absorbed power-law model using the standard response matrix.
Although the overall χ2 is acceptable (34.4 for 35 degree-of-freedom), the residuals to this
fit display characteristic feature, indicating that the gain of the response function is not
properly calibrated for the CC-mode. Following the method suggested by Kaaret et al.
(2001) we calibrate the gain offset and scale in XSPEC by comparing the overall CC-mode
spectra of PSR J1930+1852 to that determined by the ACIS-S3 imaging data. The latter
is characterized by the same model with the absorption column density NH = 1.6 × 10
cm−2 and a photon index α = 1.35 (Camilo et al. 2002). The resulting gain scale and
offset are found to be 0.90 and -0.18, respectively. Fixing this gain correction and NH to the
above values, we re-fit the pulsed emission spectrum to obtain a photon index of 1.2 ± 0.2
(see Figure 6b). The new χ2 value is 17.7 for 34 degree-of-freedom, significantly better than
without the gain correction. The pulsed flux measured in the 2 − 10 keV energy band is
1.2× 10−12 ergs cm−2 s−1. When compared to the overall 2− 10 keV flux of 1.7× 10−12 ergs
cm−2 s−1 (Camilo et al. 2002), this implies that ∼ 70% of the total emission from the pulsar
is pulsed, consistent with the estimate in Section 3.1.
4. Discussion
The properties of PSR J1930+1852 are most similar to those found for other examples of
young, energetic pulsars. The power-law spectral index of the pulsar emission is consistent
– 8 –
with its spin-down energy according to the empirical law of Gotthelf (2003) for energetic
rotation powered pulsars with Ė > 4× 1036 erg s−1. The power law index is also consistent
with that of the pulsed emission, as found for other high Ė, Crab-like pulsars (Gotthelf
2003). As with most X-ray detected radio pulsars, the X-ray pulse morphology differs from
that of the radio pulse. The full width at half maximum (FWHM) of the X-ray pulse is 0.4
phase compared to 0.15 phase in radio. Notably, the X-ray pulse has a steep rise and slow
decline, whereas the radio pulse is inverted, with a slow rise and steep decay instead.
The unpulsed component of PSR J1930+1852 is most likely nonthermal in nature as
the thermal emission from the cooling surface of the neutron star should be negligible.
According to the standard theoretical cooling curves, the surface temperature of a 1.4 M⊙
neutron star is about 0.13 keV at the age of PSR J1930+1852 (about 3,000 years; Page 1998).
Assuming a radius of 12 km the neutron star should have an absorbed 0.2 − 10 keV flux of
∼ 8×10−15 erg cm−2 s−1, which accounts for ∼ 0.4% of our detected total 0.2−10 keV X-ray
flux or 1.4% of the unpulsed flux. Tennant et al. (2001) detected the X-ray emission of the
Crab pulsar at its pulse minimum, though accounting for only a tiny fraction of the total or
unpulsed flux. Tennant et al. (2001) further suggested that this component is nonthermal.
The unpulsed X-ray emission from PSR J1930+1852 may be of the same nature as that of
the Crab pulsar.
Together with the previous X-ray and radio periods, the three timing measurements
obtained herein provide an opportunity to study the pulsar period evolution. A linear fit to
these periods yields a Ṗ of 7.5116(6)×10−13 s s−1 with a reduced χ2ν of 3.6 (see Figure 7).
The large χ2ν value and the scattered residuals show that the period of PSR J1930+1852
evolves in a more complicated than a simply constant spin down. The period derivative
obtained here is also significantly (9σ) different from that obtained by Camilo et al. (2002).
This suggests that PSR J1930+1852 has experienced periods of timing noise and/or glitches
- not unepxected for a young pulsar (e.g., Zhang et al. 2001; Wang, et al. 2001; Crawford
& Demiańsky 2003). Arzoumanian et al. (1994) defined a quantity ∆8 to represent the
stability of a pulsar. They found an empirical relation between ∆8 and Ṗ , which predicts a
high ∆8 of -0.67 for PSR J1930+1852. This value is higher than those measured for most
ordinary pulsars and is consistent with the variability in spin-down rate observed for this
pulsar.
Indeed, PSR J1930+1852 shares other interesting properties with PSR B0540-69. For
example, the pulsed X-ray emission of PSR B0540-69 has probably a harder spectrum, with
a photon index of 1.83±0.13, than the steady component whose photon index is (2.09±0.14;
Kaaret et al. 2001), whereas PSR J1930+1852 also has a harder pulsed emission than the
steady emission. Furthermore, the pulse width of PSR B0540-69 is about 0.4 and its pulsed
– 9 –
fraction fp = 71.0 ± 5%, both nearly identical to the respective values measured herein for
PSR J1930+1852. Based on these X-ray emission similarities, the X-ray emission regions of
the two pulsars may have the similar overall structures and viewing geometries.
The project is partially supported by NASA/SAO/CXC through grant GO5-6057X.
FJL and JLQ also acknowledge support from the National Natural Science Foundation of
China.
REFERENCES
Arzoumanian, Z., Nice, D.J., Taylor, J.H., & Thorestt, S.E. 1994, ApJ, 422, 671
Camilo, F., Lorimer, D.R., Bhat, N.D.R., Gotthelf, E.V., Halpern, J.P., Wang, Q.D., Lu,
F.J., & Mirabal, N. 2002, ApJ, 574, L71
Crawford, F., & Demiański, M. 2003, ApJ, 595, 1052
Diaconis, P., & Efron, B. 1983, Scientific American, May P96
Gotthelf, E. V. 2003, ApJ, 591, 361
Gotthelf, E. V. 2004, in “Young Neutron Stars and Their Environments”, IAU Symp. 218.
Ed. F. Camilo & B. M. Gaensler (S.F. CA.: ASP) 2004, 218, 225
Kaaret, P., et al. 2001, ApJ, 546, 1159
Lu, F.J., Wang, Q.D., Aschenbach, B., Durouchoux, P., & Song, L.M. 2002, ApJ, 568, L49
Middleditch, J., et al. 2006, ApJ, 652, 1531
Page, D. 1998, in The Many Faces of Neutron Stars ed. R. Buccheri, J. van Paradijs, & M.A.
Alpar (Dordrecht: Kluwer), 539
Tennant, A.F., et al. 2001, ApJ, 554, L173
Wang, N., Wu, X.J., Manchester, R.N., et al. 2001, Chin. J. Astron. Astrophys., 1, 195
Zhang, W., Marshall, F.E., Gotthelf, E.V., Middleditch, J., & Wang, Q.D. 2001, ApJ, 554,
This preprint was prepared with the AAS LATEX macros v5.2.
– 10 –
Fig. 3.— Period and period uncertainty of PSR J1930+1852 at three epochs. Left – The
periodograms of PSR J1930+1852 constructed from the September 2002, December 2002,
and June 2003 observations and together with the respective best-fit Gaussian profiles for
the central peaks. Right – The distribution of the 500 periods from the bootstrapped data
for each observations. The Gaussian 1σ width gives an estimate of the period uncertainty.
The P0 values are given in Table 1.
– 11 –
Fig. 4.— The pulse shape of PSR J1930+1852 in the 2 − 15 keV band as obtained with
RXTE on 2002 September 12 (solid) and December 23 (dashed). Phase zero is arbitrary;
two cycles are shown for clarity. The December light curve is shifted upward by 0.002.
– 12 –
Fig. 5.— The pulse shape and its hardness ratio of PSR J1930+1852 in the 0.3 − 10 keV
band as measured with Chandra on 2002 June 30. The pulse shape (Top Panel) is folded
at the period given in Table 1 and the phase bin size is chosen so that each bin contains
almost the same counts. The hardness ratio (Bottom Panel) is as defined in the text (§3.2);
the background, as defined in §2, has been subtracted.
– 13 –
Table 1. Timing Results for PSR J1930+1852
Date Obs. Type Epoch Period
(UT) (MJD[TDB]) (s)
1997 Apr 27 ASCA 50566 0.13674374(5)a
2002 Jan 17 Radio 52280 0.136855046957(9)a
2002 Sep 12 RXTE 52530 0.136871312(4)
2002 Dec 23 RXTE 52632 0.136877919(3)
2003 Jun 30 Chandra 52820 0.136890130(5)
aTaken from Camilo et al. (2002)
Fig. 6.— The pulsed X-ray spectrum of PSR J1930+1852 obtained with Chandra ACIS-I3
in continuous-clocking mode: Left Panel: fitting with an absorbed power-law model with
the gain scale and offset fixed as 1 and 0; Right Panel: fitting with the same model but with
the gain scale and offset of 0.90 and -0.18.
– 14 –
Fig. 7.— The period residuals of PSR J1930+1852 in different epochs.
	Introduction
	Observations and Data Analysis
	Results
	Pulsar Timing
	Pulsed Emission Spectral Characteristics
	Discussion
ABSTRACT
  We present new X-ray timing and spectral observations of PSR J1930+1852, the
young energetic pulsar at the center of the non-thermal supernova remnant
G54.1+0.3. Using data obtained with the Rossi X-ray Timing Explorer and Chandra
X-ray observatories we have derived an updated timing ephemeris of the 136 ms
pulsar spanning 6 years. During this interval, however, the period evolution
shows significant variability from the best fit constant spin-down rate of
$\dot P = 7.5112(6) \times 10^{-13}$ s s$^{-1}$, suggesting strong timing noise
and/or glitch activity. The X-ray emission is highly pulsed ($71\pm5%$
modulation) and is characterized by an asymmetric, broad profile ($\sim 70%$
duty cycle) which is nearly twice the radio width. The spectrum of the pulsed
emission is well fitted with an absorbed power law of photon index $\Gamma =
1.2\pm0.2$; this is marginally harder than that of the unpulsed component. The
total 2-10 keV flux of the pulsar is $1.7 \times 10^{-12}$ erg cm$^{-2}$
s$^{-1}$. These results confirm PSR J1930+1852 as a typical Crab-like pulsar.

<|endoftext|><|startoftext|>
Introduction 
The Portevin-Le Chatelier (PLC) effect is one of the widely studied 
metallurgical phenomena, observed in many metallic alloys of technological 
importance [1-12]. It is a striking example of the complexity of the spatiotemporal 
dynamics, arising from the collective behavior of dislocations. In uniaxial loading with 
                                                 
* Corresponding author: apu@veccal.ernet.in 
constant imposed strain rate, the effect manifests itself as a series of serrations (stress 
drops) in the stress-time or strain curve. Each stress drop is associated with the 
nucleation of a band of localized plastic deformation, often designated as PLC band, 
which under certain conditions propagates along the sample. The microscopic origin 
of the PLC effect is the dynamic strain aging (DSA) [13-19] of the material due to the 
interaction between mobile dislocations and diffusing solute atoms. At the 
macroscopic scale, this dynamic strain aging leads to a negative strain rate sensitivity 
(SRS) of the flow stress and makes the plastic deformation nonuniform.  
In polycrystals three types of the PLC effect are traditionally distinguished on the 
qualitative basis of the spatial arrangement of localized deformation bands and the 
particular appearance of deformation curves [20, 21]. Three generic types of 
serrations: type A, B and C occur depending on the imposed strain rate. For 
sufficiently large strain rate, type A serrations are observed. In this case, the bands are 
continuously propagating and highly correlated.  The associated stress drops are small 
in amplitude [22,23]. If the strain rate is lowered, type B serrations with relatively 
larger amplitude occur around the uniform stress strain curve. These serrations 
correspond to intermittent band propagation. The deformation bands are formed ahead 
of the previous one in a spatially correlated manner and give rise to regular surface 
markings [22,23]. For even smaller strain rate, bands become static. This type C band 
nucleates randomly in the sample leading to large saw-tooth shaped serration in the 
stress strain curve and random surface markings [22,23]. 
From metallurgical point of view, the PLC effect is usually undesirable since it has 
detrimental influences like the loss of ductility and the appearance of surface markings 
on the specimen. Beyond its importance in metallurgy, the PLC effect is an epitome 
for a general class of nonlinear complex systems with intermittent bursts. The 
succession of plastic instabilities shares both physical and statistical properties with 
many other systems exhibiting loading-unloading cycles e.g. earthquakes. PLC effect 
is regulated by interacting mechanisms that operate across multiple spatial and 
temporal scales. The output variable (stress) of the effect exhibits complex 
fluctuations which contains information about the underlying dynamics. 
The PLC effect has been extensively studied over the last several decades with the 
goal being to achieve a better understanding of the small-scale processes and of the 
multiscale mechanisms that link the mesoscale DSA to the macroscale PLC effect. 
The technological goal is to increase the SRS to positive values in the range of 
temperatures and strain rates relevant for industrial processes. This would ensure 
material stability during processing and would eliminate the occurrence of the PLC 
effect. 
Due to a continuous effort of numerous researchers, there is now a reasonable 
understanding of the mechanisms and manifestations of the PLC effect. A review of 
this field can be found in Ref. [17,18]. The possibility of chaos in the stress drops of 
PLC effect was first predicted by G. Ananthakrishna et. al. [24] and latter by V. 
Jeanclaude et. al. [25].  This prediction generated a new enthusiasm in this field. In 
last few years, many statistical and dynamical studies have been carried out on the 
PLC effect [10-12, 26-32]. Analysis revealed two types of dynamical regimes in the 
PLC effect. At medium strain rate (type B) chaotic regime has been demonstrated [30, 
33], which is associated with the bell-shaped distribution of the stress drops. For high 
strain rate (type A) the dynamics is identified as self organized criticality (SOC) with 
the stress drops following a power law distribution [33]. The crossover between these 
two mechanisms has also been a topic of intense research for the past few years 
[29,33,34-36]. It is shown that the crossover from the chaotic to SOC dynamics is 
clearly signaled by a burst in multifractality [29,33].  
This crossover phenomenon is of interest in the larger context of dynamical 
systems as this is a rare example of a transition between two dynamically distinct 
states. Chaotic systems are characterized by the self similarity of the strange attractors 
and sensitivity to initial conditions quantified by fractal dimension and the existence 
of a positive Lyapunov exponent, respectively. On the contrary, the SOC dynamics is 
characterized by infinite number of degrees of freedom and a power law statistics.  
The general consensus that the dynamic strain aging is the cause behind the 
PLC effect suggests a discrete connection between the stress fluctuation and the band 
dynamics. We do not have a system of primitive equations to describe the dynamics of 
the band, so we must extract as much information as possible from the data itself. We 
use the stress data recorded during the plastic deformation for our analysis. However, 
we do not analyze these data blindly but in the framework of nonlinear dynamics as 
the band dynamics shows intermittency. In this work, we have carried out detailed 
recurrence analysis of the stress time data observed during the PLC effect to study the 
change in the dynamical behavior of the effect with the imposed strain rate. 
Experimental details: 
Substitutional Aluminum alloys with Mg as the primary alloying element are model 
systems for the PLC effect studies. These alloys have wide technological applications 
due to their advantageous strength to weight ratio. They show good ductility and can 
be rolled to large reductions and processed in thin sheets and are being extensively 
used in beverage packaging and other applications. However, the discontinuous 
deformation behavior of these alloys at room temperature rule them out from many 
important applications like in the automobile industry. These alloys exhibit the PLC 
effect for wide range of strain rates and temperatures. Under these conditions the 
deformation of these materials localize in narrow bands which leave undesirable band-
type macroscopic surface markings on the final products.  
 Tensile tests were conducted on flat specimens prepared from polycrystalline 
Al-2.5%Mg alloy. Specimens with gauge length, width and thickness of 25, 5 and 2.3 
mm, respectively were tested in an INSTRON (model 4482) machine. All the tests 
were carried out at room temperature (300K) and consequently there was only one 
control parameter, the applied strain rate. To monitor closely its influence on the 
dynamics of the PLC effect, strain rate was varied from 7.98×10-5 S-1 to 1.60×10-3 S-1. 
The PLC effect was observed through out the range. The stress-time response was 
recorded electronically at periodic time intervals of 0.05 seconds. Fig. 1 shows the 
observed PLC effect in a typical stress-strain curve for strain rate 1.20×10-3 S-1. The 
stress data shows an increasing trend due to the strain hardening effect. The trend is 
eliminated and analyses reported in this study are carried out on the resulting data. The 
inset in the Fig. 1 shows a typical segment of the trend corrected stress-strain curve. In 
the varied strain rate region we could observe type B, B+A and A serrations as 
reported [20,21]. We kept the sampling rate same for all the experiments. 
Consequently the number of data points was not same for different strain rate 
experiments. To analyze the data in similar footing we have carried our analysis on the 
stress data from the same strain region 0.02-0.10 for all strain rate experiments. 
Recurrence Analysis 
 Eckman, Kamphorst and Ruelle [37] proposed a new method to study the 
recurrences and nonstationary behaviour occurring in dynamical system. They 
designated the method as “recurrence plot” (RP). The method is found to be efficient 
in identification of system properties that cannot be observed using other conventional 
linear and nonlinear approaches. Moreover, the method has been found very useful for 
analysis of nonstationary system with high dimension and noisy dynamics. The 
method can be outlined as follows: given a time series {xi} of N data points, first the 
phase space vectors ui={xi,xi+τ,……,xi+(d-1)τ}are constructed using Taken’s time delay 
method. The embedding dimension (d) can be estimated from the false nearest 
neighbor method. The time delay (τ) can be estimated either from the autocorrelation 
function or from the mutual information method. The main step is then to calculate the 
N×N matrix 
)(, jiiji xxR −−Θ= ε ,               i,j=1,2,….,N        (14) 
where εi is a cutoff distance, ||..|| is a norm (we have taken the Euclidean norm), and 
Θ(x) is the Heavyside function. The cutoff distance εi defines a sphere centered at ix . 
If jx  falls within this sphere, the state will be close to ix and thus 1, =jiR . The binary 
values in jiR ,  can be simply visualized by a matrix plot with color black (1) and white 
(0). This plot is called the recurrence plot.  
 However, it is often not very straight forward to conclude about the dynamics of 
the system from the visual inspection of the RPs. Zbilut and Webber [38,39] 
developed the recurrence quantification analysis (RQA) to provide the quantification 
of important dynamical aspects of the system revealed through the plot. The RQA 
proposed by Zbilut and Webber is mostly based on the diagonal structures in the RPs. 
They defined different measures, the recurrence rate (REC) measures the fraction of 
black points in the RP, the determinism (DET) is the measure of the fraction recurrent 
points forming the diagonal line structure, the maximal length of diagonal structures 
(Lmax), the entropy (Shannon entropy of the line segment distributions) and the trend 
(measure of the paling of recurrent points away from the central diagonal). These 
variables are used to detect the transitions in the time series. Recently Gao [40] 
emphasized the importance of the vertical structures in RPs and introduced a 
recurrence time statistics corresponding to the vertical structures in RP. Marwan et. al. 
[41] extended Gao’s view and defined measures of complexity based on the 
distribution of the vertical line length. They introduced three new RP based measures: 
the laminarity, the trapping time (TT) and the maximal length of the vertical structures 
(Vmax). Laminarity is analogous to DET and gives the measure of the amount of 
vertical structure in the RP and represents laminar states in the system. TT contains 
information about the amount as well as the length of the vertical structure. Applying 
these measures to the logistic map data they found that in contrast to the conventional 
RQA measures, their measures are able to identify the laminar states i.e. chaos-chaos 
transitions. The vertical structure based measures were also found very successful to 
detect the laminar phases before the onset of life-threatening ventricular 
tachyarrhythmia [41]. Here we have applied the measures proposed by Marwan et. al. 
along with the traditional measures to find the effect of strain rate on the PLC effect. 
Results and Discussions 
 RP and RQA have been successfully applied to diverse fields starting from 
Physiology to Econophysics in recent years. A review of various applications of RPs 
and RQA can be found in the recent article by Marwan et. al. [42]. Here we extend the 
list of application of RPs and RQA and for the first time apply these methods to study 
the dynamical behavior of the PLC effect in Al-2.5%Mg alloy. In this study, we 
particularly concentrate on the strain rate region of 7.98×10-5 S-1 to 1.60×10-3 S-1.  The 
main goal is to demonstrate the ability of RQA to detect the unique crossover 
phenomenon observed in the PLC dynamics. 
 It has been shown that a RP analysis is optimal when the trajectory is embedded 
in a phase space reconstructed with an appropriate dimension d [38]. Such a 
dimension can be well estimated using a false nearest neighbor technique. The d-
dimensional phase space is then reconstructed using delay coordinate. The time delay 
τ can be estimated using the mutual information or the first zero of an auto correlation 
function. Based on the false nearest neighbor method we have chosen d to be 10 for all 
the strain rates. τ obtained from the mutual information were in the range 1-14 for 
different strain rate data. A parameter specific to the RP is the cutoff distance εi. εi is 
selected from the scaling curve of REC vs, εi as suggested in the literature [43].  Fig. 2 
shows the RPs of the stress fluctuations during the PLC effect at four different strain 
rates. From the visual inspection of the RPs it is easy to understand that the dynamical 
behavior of the PLC effect changes with the strain rate. However, it is wise to go for 
the RQA and quantify the difference in the PLC dynamics with strain rate. Fig. 3 
shows the variation of the various RQA variables with strain rate. It can be seen from 
the Fig. 3 that the RQA variables like DET and laminarity do not show any systematic 
variation with strain rate. Lmax, TT and Vmax decreased rapidly with strain rate and 
reached a plateau. Trend values remained almost constant at lower strain rates and 
decreased at higher strain rates. The variation of entropy with strain rate is rather 
interesting. The entropy initially decreased with strain rate and suddenly reached a 
higher value. However, the most important behavior was observed in the variation of 
REC and a variable derived from REC and DET, i.e. the ratio of DET and REC 
(DET/REC). The REC values decreased initially and reached a low value and then 
again started increasing. This variation is quite appealing in the sense that the REC 
value is very low in the crossover region and hence is able to detect the crossover 
phenomenon of the PLC effect. On the contrary, the DET/REC values showed an 
abrupt jump in the crossover region. 
 It is clearly evident from this study that RQA is able to detect the crossover in 
the PLC dynamics from type B to type A region. However the detailed explanation of 
the results obtained from the study are not straightforward. Further study is necessary 
which will in turn also help to understand the dislocation dynamics involved in the 
PLC effect. 
Conclusions 
 In conclusion, for the first time we have applied the recurrence analysis to study 
the dynamical behavior of the PLC effect. The study revealed that the recurrence 
analysis is efficient to detect the unique crossover, as indicated in the earlier studies, in 
the dynamics of the PLC effect.  
References 
1. F. Le Chatelier, Rev. de Metall. 6, 914 (1909). 
2. A.W. Sleeswyk, Acta Metall. 6, 598 (1958). 
3. P.G. McCormick, Acta Metall. 20, 351 (1972). 
4. J.D. Baird, The Inhomogeneity of Plastic Deformation (American Society of 
Metals, OH, 1973). 
5. A. Van den Beukel, Physica Status Solidi(a) 30, 197(1975). 
6. A. Kalk, A. Nortmann, and Ch. Schwink, Philos. Mag. A 72, 1239 (1995). 
7. M. Zaiser, P. Hahner, Phys Status Solidi (b) 199, 267 (1997). 
8. M. S. Bharathi, M. Lebyodkin, G. Ananthakrishna, C. Fressengeas, and L. P. 
Kubin, Acta Mater. 50, 2813 (2002).  
9. E. Rizzi, P. Hahner, Int. J. Plasticity 20, 121(2004). 
10. P. Barat, A. Sarkar, P. Mukherjee, S. K. Bandyopadhyay, Phys Rev Lett 94, 
05502 (2005). 
11. A. Sarkar, P. Barat, Mod Phys Lett B 20, 1075 (2006) 
12. A. Sarkar, A. Chatterjee, P. Barat, P. Mukherjee, Mat Sci Eng A, in press 
13. A.H. Cottrell, Dislocations and plastic flow in crystals (Oxford University 
Press, London, 1953). 
14. A. Van den Beukel, U. F. Kocks, Acta Metall. 30, 1027 (1982). 
15. J. Schlipf, Scripta Metall. Mater. 31, 909 (1994). 
16. P. Hahner, A. Ziegenbein, E. Rizzi, and H. Neuhauser, Phys. Rev. B 65, 
134109 (2002). 
17. Y. Estrin, L.P. Kubin, Continuum Models for Materials with Microstructure, 
ed. H.B. Muhlhaus (Wiley, New York, 1995, p. 395). 
18. L.P. Kubin, C. Fressengeas, G. Ananthakrishna, Dislocations in Solids, Vol. 
11, ed. F. R. N. Nabarro, M. S. Duesbery (Elsevier Science, Amsterdam, 
2002, p 101). 
19. A. Nortmann, and Ch. Schwink, Acta Mater. 45, 2043 (1997);  
20. K. Chihab, Y. Estrin, L.P. Kubin,  J. Vergnol, Scripta Metall. 21, 203(1987). 
21. K. Chihab, and C. Fressengeas, Mater. Sci. Eng. A 356,102 (2003). 
22. M. Lebyodkin, L. Dunin-Barkowskii, Y. Brechet, Y. Estrin and L. P. Kubin,  
Acta mater 48, 2529 (2000). 
23. P. Hahner, Mat Sci Eng A 164, 23 (1993). 
24. G. Ananthakrishna, M.C. Valsakumar, J. Phys. D 15, L171(1982). 
25. V. Jeanclaude, C. Fressengeas, L.P. Kubin, Nonlinear Phenomena in Materials 
Science II, ed. L.P. Kubin (Trans. Tech., Aldermanndorf, 1992, p. 385). 
26. M. Lebyodkin, Y. Brechet, Y. Estrin, L.P. Kubin, Acta Mater. 44, 
4531(1996). 
27. M.A. Lebyodkin, Y. Brechet, Y. Estrin, L.P. Kubin, Phys. Rev. Lett. 
74,4758(1995). 
28. S. Kok, M.S. Bharathi, A.J. Beaudoin, C. Fressengeas, G. Ananthakrishna, 
L.P. Kubin, M. Lebyodkin, Acta Mater. 51, 3651(2003).  
29. M.S. Bharathi, M. Lebyodkin, G. Ananthakrishna, C. Fressengeas, L.P. 
Kubin, Phys. Rev. Lett. 87, 165508(2001).  
30. S. Venkadesan, M. C. Valsakumar, K. P. N. Murthy, and S. Rajasekar,  Phys. 
Rev. E 54, 611 (1996). 
31. M.A. Lebyodkin and T.A. Lebedkine, Phys. Rev. E 73, 036114 (2006) 
32. D. Kugiumtzis, A. Kehagias, E. C. Aifantis, and H. Neuhäuser, Phys. Rev. E 
70, 036110 (2004)  
33. G. Ananthakrishna, S. J. Noronha, C. Fressengeas, and L. P. Kubin, Phys. 
Rev. E 60, 5455 (1999). 
34. G. Ananthakrishna and M. S. Bharathi, Phys. Rev. E 70, 026111 (2004) 
35. M. S. Bharathi, G. Ananthakrishna, EuroPhys Lett 60, 234 (2002). 
36. M. S. Bharathi and G. Ananthakrishna, Phys. Rev. E 67, 065104 (2003) 
37. J. –P. Eckmann, S. O. Kamphorst, D. Ruelle, Europhys. Lett. 4, 973 (1987).  
38. J. –P. Zbilut and C. L. Webber Jr., Phys. Lett. A 171, 199 (1992). 
39. C. L. Webber Jr. and J. –P. Zbilut, J. Appl. Physiol. 76, 965 (1994). 
40. J. Gao and H. Cai, Phys. Lett. A 270, 75 (2000).  
41. N. Marwan, N. Wessel, U. Meyerfeldt, A. Schirdewan, J. Kurths, Phys. Rev. 
E 66, 026702 (2002).  
42. N. Marwan, M. C. Romano, M. Thiel and J. Kurths, Physics Reports  438, 237 
(2007). 
43. C. L. Webber Jr, J. –P. Zbilut, In: Tutorials in contemporary nonlinear 
methods for the behavioral sciences, Chapter 2, pp. 26-94, 2005. M. A. 
Riley, G. Van Orden, eds. 
Fig. 1 True stress vs. true strain curve of Al-2.5%Mg alloy deformed at a strain rate of                    
1.20×10-3 S-1. The inset shows a typical segment of the trend corrected true stress vs true 
strain curve. 
Fig. 2 Recurrence plots at the strain rate (a) 1.60×10-3 S-1 (b) 7.97×10-4 S-1 (c) 3.85×10-4 
S-1 (d) 1.99×10-4 S-1  
Fig. 3 Variation of the Recurrence Quantification Analysis variables with strain rate. 
0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14
0.040 0.042 0.044 0.046 0.048
True Strain
True Strain
Fig. 1 
Fig. 2 
0.0 5.0x10-4 1.0x10-3 1.5x10-3
Type AType B
Strain rate (s-1)
Strain rate (s-1)
0.0 5.0x10-4 1.0x10-3 1.5x10-3
0.0 5.0x10-4 1.0x10-3 1.5x10-3
0.0 5.0x10-4 1.0x10-3 1.5x10-3
Strain rate (s-1)
0.0 5.0x10-4 1.0x10-3 1.5x10-3
Strain rate (s-1)
Strain rate (s-1)
0.0 5.0x10-4 1.0x10-3 1.5x10-3
Strain rate (s-1)
0.0 5.0x10-4 1.0x10-3 1.5x10-3
Strain rate (s-1)
0.0 5.0x10-4 1.0x10-3 1.5x10-3
Strain rate (s-1)
0.0 5.0x10-4 1.0x10-3 1.5x10-3
Type AType B
Strain rate (s-1)
Fig. 3
ABSTRACT
  Tensile tests were carried out by deforming polycrystalline samples of
Al-2.5%Mg alloy at room temperature in a wide range of strain rates where the
Portevin-Le Chatelier (PLC) effect was observed. The experimental stress-time
series data have been analyzed using the recurrence analysis technique based on
the Recurrence Plot (RP) and the Recurrence Quantification Analysis (RQA) to
study the change in the dynamical behavior of the PLC effect with the imposed
strain rate. Our study revealed that the RQA is able to detect the unique
crossover phenomenon in the PLC dynamics.

<|endoftext|><|startoftext|>
Introduction. Crisis in the Search for DM Candidates 
The origin of dark matter (DM) has intrigued researches for several decades. It has become 
increasingly clear that these are neither neutrinos with m ≥ 10 eV nor massive compact halo 
objects (MACHOs) (Evans and Belokurov, 2005). In the most recent decade, efforts were 
focused primarily on the search for weakly interacting massive particles (WIMPs) and similar 
objects with a mass of ~10-100 GeV, whose existence is predicted by some theories of 
elementary particles beyond the Standard Model. It was believed self-evident from the very 
beginning that the cross section of their interaction with nucleons, s, should be larger than that 
of their mutual annihilation (~3×10-34 cm2) (Primack et al., 1988). The level reached presently 
is s ~ 10
, but still no reliable and universally accepted results have been reported. 
One cannot avoid the impression that the researchers are pursuing an imaginary but 
nonexisting horizon, in other words, that the WIMPs as they were conceived of originally 
simply do not exist. 
The only data that could possibly be treated as evidence for the existence of WIMPs, 
or more generally a DM particle component in the galactic halo, were obtained by the DAMA 
collaboration in their 7-year-long experiment (Bernabei et al., 2003). The  evidence is a yearly 
modulation of the number of 2-6-keV signals accumulated with ~100 kg of NaI(Tl) 
scintillators. The modulation is ~5% and reaches a maximum some time at the beginning of 
June; it could be attributed to a seasonal variation of a ground-level flux of objects from the 
galactic halo caused by the Earth’s orbit being inclined with respect to the direction of motion 
of the Solar system around the galactic centre (Bernabei et al., 2003). The statistical 
significance of the modulation (6.3σ) is high enough to leave no doubt in its existence. The 
experiments being performed on other installations do not, however, support these results. To 
interpret this situation, the scientists running the DAMA have to consider different types of 
WIMPs and different modes of their interaction with matter. Recall, for instance, the recent 
assumption of a light pseudoscalar and scalar DM candidate of ~keV mass (see Bernabei et 
al., 2006, and refs. herein). Another approach was put forward by Foot (2006). He believes 
that the DAMA signals originate from the Earth crossing a stream of micrometeorites of 
mirror matter. 
The purpose of the present paper is to show that the effects observed by DAMA/NaI, 
including the yearly variation of the signal level, allow an interpretation drawn from the 
St.Petersburg  (SPb) experiments on detection of DArk Electric Matter Objects (daemons), 
which presumably are Planckian elementary black holes carrying a negative electric charge 
(md ≈ 3×10
 g, Ze = -10e) (Drobyshevski, 1997a,b). 
Starting from March 2000, we have been reliably detecting by means of thin spaced 
ZnS(Ag) scintillators, both in ground-level and underground experiments, signals whose 
separation corresponds to ~10-15 km/s, i.e., the velocity of objects falling from near Earth, 
almost circular heliocentric orbits (NEACHOs). The flux is ~10
 and it varies with P 
= 0.5 yr, to pass through maxima in March and September (Drobyshevski, 2005a; 
Drobyshevski and Drobyshevski, 2006; Drobyshevski et al., 2003). (Note, that attempts were 
made to treat these results in terms of possible properties of mirror matter also (Foot and 
Mitra, 2003), but observations of a daemon flux directed upward, i.e., from under the ground 
level, are apparently in conflict with this interpretation.) 
II. Specific Features of the Traversal of the Sun by Daemons 
As the Sun moves through the interstellar medium with a velocity V∞, particles of the latter 
(we assume for the sake of simplicity that their velocity is << V∞) become focused by 
gravitation of the Sun, so that its effective cross section becomes Seff = πpmax
[1+(Vesc/V∞)
] (Eddington, 1926). Here R
 is the radius of the Sun, Vesc = 617.7 km/s is 
the escape velocity from its surface, and pmax is the maximum value of the impact parameter 
(the impact parameter is the distance between the continuation of the V∞ vector and the center 
of the Sun). For V∞ = 20 km/s, pmax = 30.9R
In crossing the Sun with a velocity of ~10
 cm/s, daemons are slowed down. One 
cannot calculate at present the associated decelerating force, because a negative daemon 
captures protons and heavy (Zn > Z) nuclei, catalyzes proton fusion reactions, decomposes 
somehow the nucleons in nuclei etc. As a result, the effective charge of the daemon, complete 
with the particles captured and carried by it, varies continuously. Straightforward estimates 
show, however, that daemons of the galactic disk with a velocity dispersion of ~4-30 km/s 
(Bahcall et al., 1992), are slowed down strongly enough to preclude the escape to infinity of 
many of them as they pass through the Sun (Drobyshevski, 1996, 1997a). 
Such objects move along strongly elongated trajectories with perihelia within the Sun. 
Subsequent crossings of the Sun’s material bring about contraction of the orbits and their 
escape under the Sun’s surface. If, however, a daemon moving on such a trajectory passes 
through the Earth’s gravitational sphere of action, it is deflected, which will result in the 
perihelion of its orbit leaving the Sun with a high probability. The daemon will be injected 
into a stable, strongly elongated Earth crossing heliocentric orbit (SEECHO). Straightforward 
estimates made in the gas kinetic approximation and using the concepts of mean free path 
length etc. suggest that daemons build up on SEECHOs to produce an Earth crossing flux of 
~3×10-7 cm-2s-1 (Drobyshevski, 1997a). (These were fairly optimistic calculations performed  
for a rough estimation of the parameters of the daemon detector that was being designed at 
that time, in 1996.) 
In subsequent inevitable crossings by SEECHO daemons of the Earth’s sphere of 
action, their orbits deform to approach that of the Earth; these are near-Earth almost circular 
heliocentric orbits (NEACHOs). And whereas daemons moving in nearly parabolic 
SEECHOs strike the Earth with velocities of up to V⊕√3 = 51.6 km/s (here V⊕ = 29.79 km/s is 
the orbital velocity of the Earth), the NEACHO objects fall on the Earth with a velocity of 
only ~10(11.2)-15 km/s. 
Estimates of the ground-level SEECHO daemon flux made in 1996 were based on 
simple concepts of an isotropic flux of galactic disk daemons incident on the Sun. Our 
subsequent experiments that demonstrated a half-year variation made it clear that the flux is 
not isotropic, probably because of the motion of the Solar system relative to the DM 
population of the disk. We know now also that the daemons detected by us fall, judging from 
their velocity, from NEACHOs (Drobyshevski, 2005b; Drobyshevski et al., 2003). 
III. Calculation of the Passage of Daemons through the Sun 
It appears only natural to assume the flux variations with P = 0.5 yr to be a consequence of 
the composition of the Earth’s orbital motion around the Sun and of the Sun itself relative to 
the galactic disk population. 
The Sun moves relative to the nearest star population with a velocity of 19.7 km/s in 
the direction of the apex with the coordinates A = 271° and D = +30° (equatorial coordinates) 
or L” = 57° and B” = +22° (galactic coordinates) (Allen, 1973). 
Initially, rather than delving into the fundamental essence of the processes underlying 
the celestial mechanics, we invoked a simplified concept of a “shadow”, which is produced by 
daemons captured into SEECHOs from the galactic disk by the moving Sun, and of the 
corresponding “antishadow” created by some daemons crossing the Sun in the opposite 
direction (i.e., in the direction of its motion), an approach that had been reflected in our earlier 
publications (Drobyshevski, 2004; Drobyshevski et al., 2003). 
In a new approach to calculation of the passage of objects through the Sun we made 
use of the celestial mechanics integrator of Everhart (1974). It was adapted to FORTRAN by 
S.Tarasevich (Institute of Theoretical Astronomy of RAS) for use with the BESM-6 
computer. In ITA (and now in the Institute of Applied Astronomy of RAS) it was employed in 
calculation of asteroid ephemeredes and precise prediction of the apparition of comets 
allowing for the action of known planets. We made two important refinements on the code, 
more specifically, we (i) introduced the resistance of the medium in the simplest gas dynamics 
form, F = σρV2 (Drobyshevski, 1996) (where σ is the effective cross section of a particle, and 
ρ is the medium density) and (ii) took into account that the Sun is not a point object but has 
instead a density distributed over the volume (we used the model of the Sun from Allen 
(1973)). 
The very first calculations revealed that the trajectories of particles falling on the Sun 
and crossing it have a non-closed, many-loop pattern (Drobyshevski, 2005b). This should 
certainly have been expected, because inside the Sun particles move in a gravitational field 
not of a point but rather of a radius-dependent mass, so that the trajectories do not close and 
form instead a rosette, whose petals appear successively in the direction opposite to that of a 
body moving around the Sun (see, e.g., Figs.4 and 5 below). 
This prompted us to consider the possibility of combining and explaining the results of 
DAMA/NaI and of our experiments in terms of a common daemon paradigm, all the more so 
because earlier attempts (Drobyshevski, 2005a) had succeeded in proposing an interpretation 
of the so-called “Troitsk anomaly”, i.e., a displacement of the tritium β-spectrum tail 
occurring with a half-year periodicity. 
This approach: 
a) would hopefully provide an answer as to why the results of the DAMA/NaI are not 
confirmed by other WIMP experiments; 
b) would permit us to understand why the intensity of the scintillation signals assigned to 
recoil nuclei lies in the 2-6-keV interval (here 2 keV is the sensitivity threshold of the 
DAMANaI detector) and not higher, whereas elastic interaction of nuclei with WIMPs 
of the galactic halo (V = 200-300 km/s) should seemingly produce signals with 
energies of up to ~200 keV. 
IV. On How to Corroborate the St.Petersburg and DAMA Experiments 
The first question that comes immediately to mind is how could one explain the twofold 
difference in the signal periodicity between the SPb and DAMA experiments? 
Figure 1. Scheme of motion of the Sun and the Earth towards the apex (see text). 
Let us begin with the SPb experiment. 
Figure 1 shows schematically the motion of the Sun together with the Earth in the 
direction of the apex. The angle between the plane of the Earth’s orbit, the ecliptic, and the 
apex direction is approximately α0 = 52°-53° (the direction to the apex, just as the velocity V∞, 
depends on the stars (or interstellar gas clouds etc.) one chooses as references (Allen, 1973); 
we assume in what follows α0 = 52° and V∞ = 20 km/s). We assume also the angle between 
the straight line lying in the ecliptic plane and normal to the apex direction and the equinox 
line to be about 10°; as the Earth moves along its orbit, it crosses first this line, and after that, 
the equinox line. This order of crossing fits our measurements of the positions of the maxima 
in the primary daemon flux (Drobyshevski, 2005a; Drobyshevski and Drobyshevski, 2006), 
which occur some time in the first decade of March or September (and incidentally coincides 
with A = 258° for the Solar apex relative the interstellar gas). 
0 10 20 30 40
p0 (R�)
−  σ = 1.00×10
−  σ = 2.58×10
−  σ = 2.58×10
←|α2|
V∞ = 20 km/s
Figure 2. Angle of deviation for an object having passed through the Sun depending on the impact 
parameter p0 at different cross-sections σ of its interaction with matter (at the object mass 3×10
 g). α1 
is the angle of deviation from the initial velocity V∞ direction after the first passage through the Sun;  
α2 is the angle of deviation from the same direction after the second passage. 
Figure 2 plots the angle of deviation α1 of a material particle from the direction of its 
initial velocity V∞ after emergence from the Sun again to infinity or to the aphelion of the first 
loop (i.e. at R1) of its trajectory vs. the impact parameter p0 (the dependence of R1 on the 
impact parameter p0 is given in Fig.3). We consider subsequently only cases with p0 < pmax. 
Interestingly, in the case of multi-loop trajectories, which is possible for σ ≠ 0, angular 
deflections of subsequent from preceding loops differ little from α1, although they gradually 
decrease. The value of σ can be estimated from a comparison of further calculations with 
experiment. 
Straightforward reasoning suggests that the two maxima observed in March and 
September should be a consequence of passage through the Earth’s orbit of daemons with an 
impact parameter about p0 = ±9.162R
, where they are deflected by the Sun through α1 ≈ 90
0 4 8 12 16
p0 (R�)
−  σ = 1.00×10
−  σ = 3.00×10
(a.u.)
V∞ = 20 km/s
Figure 3. Maximum distance R1 the object reaches after the first passage through the Sun versus σ and 
Figure 4. An example of multi-loop (cross-like) trajectory of an object being braked by the Solar 
matter (the Sun's center is at X = 0, Y = 0) for repeated passages through the Sun's body.  Object of 
3×10-5 g mass and cross-section σ = 0.79×10-19 cm2 falls from infinity (X = -∞; V∞ = 20 km/s) with an 
impact parameter p0 = Y(-∞) = 9.162R
. The figure plane contains the direction to the apex and the 
normal to it lying in the plane of the ecliptic. The figure shows also an ellipse with the major semi-axis 
of 1 AU, i.e., the projection of the Earth’s orbit (the dotted circle of 1 AU radius is given as a scale for 
the reader’s orientation).  
to either side. Moreover, the presence of these maxima suggests also that they originate from 
the daemons that had already been captured by the Sun for σ ≥ 0.78×10-19 cm2 but return 
repeatedly to it and cross its body. Figure 4 (with a table) provides an idea of such trajectories. 
A particle moves along a trajectory making a right cross. First, in traversing the Sun, it 
is deflected through 90° and, in crossing the Earth’s orbit, escapes with σ = 0.78×10-19 cm2 to 
R1 → ∞ (calculations for Fig.4 are made for σ = 0.79×10
to avoid calculations with too 
great a value of R1). Thereafter, returning, now from outside, it crosses for the second time the 
Earth’s orbit and, hitting the Sun on completion of the first loop, it again deflected, leaving it 
through nearly the right angle. Now the daemon moves along the petal oriented in the anti-
apex direction. But here, although R2 > 1 AU, it does not cross the Earth’s orbit because of the 
large inclination of the ecliptic. The subsequent two crossings of the Earth’s orbit (here still 
R3 > 1 AU) from the side opposite to that of the first transit, are completed by the daemon 
after the return and the third crossing of the Sun. In making the fourth passage through the 
Sun after the return, the particle moves toward the apex. Here, depending on the value of σ 
and completing the cross, the daemon can move away from the Sun to a distance R4 > 1 AU 
(but here again, because of the inclination of the ecliptic to the apex direction, it does not 
cross the Earth’s orbit), but may not reach R4 = 1 AU at all. The first value of R4 > 1 AU 
corresponds to σ1 = 0.78×10
 at which the resistance of the solar material in the first 
passage was just high enough to absorb the excess energy ∆E = mdV∞
/2, i.e., the particle was 
captured by the Sun. The second (upper) value σ3 = 1.415×10
 (for the minimum value 
R3 = 1 AU) can be estimated under the assumption that on its third passage the daemon can 
finally reach and crosses slightly the Earth’s orbit (i.e., R3 > 1 AU). The validity of the latter 
assumption is argued for at least by our observation in autumn and spring of two distinct 
maxima; the fourfold (or even six-fold - see below) crossing by daemons of the Earth’s orbit 
increases, accordingly, the probability of their transfer from the loop trajectories into 
SEECHOs and, subsequently, in NEACHOs, whence they fall on the Earth with V ≈ 10-15 
km/s. Thus, we arrive at 0.78×10-19 ≤ σ < 1.415×10-19 cm2 (we point out once more that the 
values of σ thus found depend on the accepted daemon mass; they should be proportional to 
md). For σ = σ3, the Sun does not capture the daemon at p0 > 12.01R
, i.e., when the daemon 
initially passes through more rarefied outer layers of the Sun it moves to infinity again. 
To choose a still more optimistic scenario, assume a daemon that for σ1 = 
0.78(0.79)×10-19 cm2 crosses the Earth’s orbit in the fifth loop as well (with R5 = 1.0919 > 1 
AU). The condition R5 = 1 AU yields σ5 = 0.849 ×10
. With σ1 < σ < σ5, the daemon 
has now crossed the Earth’s orbit six times! In this respect, Jupiter is markedly behind the 
Earth (with only two crossings), while Venus is on the winning side (8 crossings). 
Figure 3 presenting the R1(p0) relation in a graphic form for σ = 1×10
 and 3×10-19 
 facilitates estimation of the energy losses suffered by the daemon in traversing the Sun, 
of the number of such traversals etc. 
Turning now back to the DAMA/NaI experiment, the daemon can obviously cross the 
set-up in June provided it falls after the first traversal of the Sun in the plane of the ecliptic, 
i.e., if it is deflected through α1 = 52°, which occurs at p0 = 4.90R
. The above estimates of σ 
suggest (see Fig. 5a) that R1 > 1 AU, and that the second loop extends up to R2 > 1 AU also, 
while naturally not crossing the Earth’s orbit, because it leaves the ecliptic plane. In June, the 
second loops of the trajectories with p0 = -6.26R
 enter the plane of the ecliptic as well and 
extend in it up to R2 ≈ 2 > 1AU (Figs.5c,d). In December, the second loops of trajectories with 
p0 = 2.215R
 enter the ecliptic plane (Fig.5b). At σ = σ5 = 0.849×10
 they have R2 = 
1.14 AU, i.e. they are able to cause SEECHOs in December, and only at σ ≥ 0.94×10-19 cm2 
R2 becomes ≤ 1 AU. 
Fig. 5. The same as Fig.4, but here the Earth's orbit is projected into the figure plane as a straight line 
segment of 2 AU length with 52° inclination to the apex direction (December is up-left, June is down-
right). 
Thus the calculations performed for σ = σ1 = 0.78×10
, σ = σ5 = 0.849×10
, σ = 1×10-19 cm2, and σ = σ3 = 1.415×10
 suggest that the daemons captured in 
traversing the Sun produce behind it a fairly smeared trail (“shadow”) through which the 
Earth passes in May-June-July, but which, generally speaking, does not reach the part of the 
Earth’s orbit oriented in the direction of the apex and corresponding approximately to the 
November-January period. This is easy to understand, because the second loops of the 
trajectories which fall into the apex hemisphere and could produce an “antishadow” 
correspond to small p0, i.e., to particles passing through the dense central part of the Sun, 
where they suffer the strongest deceleration. This is why, in particular, the second loop of the 
trajectory of the daemon that crossed the Sun at p0 = 2.215R
 and fell into the ecliptic plane 
exactly in December, simply cannot reach the Earth for σ ≥ 1×10-19 cm2 (Fig.5b).  It is thus 
clear that the ground level flux of daemons from SEECHOs should exhibit a distinct 1-year 
periodicity with a minimum some time in December.  
Superimposed on this is the half-year wave of the NEACHO objects, which appear as 
a result of having transferred from numerous SEECHOs just in periods before the equinoxes 
(note these SEECHOs are realizing at both signs of p0 = ±9.162R
). Such transitions are more 
probable in March and September because of the appearance of a noticeably larger number of 
objects in SEECHOs with comparatively short semimajor axes lying close to these ecliptic 
plane zones. The daemons entering these SEECHOs come from the cross-shaped rosette 
trajectories, along which the same object crosses the Earth’s orbit twice (or even thrice) back 
and forth, the second (and all the more so, the third) time doing it with a noticeably below the 
parabolic velocity. Significantly, the projection of the SEECHO object velocity vector on the 
Earth’s orbit reaches its maximum values here, with the correspondingly increasing duration 
(and efficiency) of the Earth’s gravitational perturbations. Note also that the ratio of the minor 
to major semi-axes for the SEECHOs produced in the capture of objects that had crossed the 
outer zones of the Sun (p0 ≈ 10R
) exceeds those for the objects with p0 → 0 (compare Figs.4 
and 5), which, on the whole, also acts so as to increase the velocity vector projection on the 
Earth’s orbit. 
V. The Manifestation of Daemons in the SPb and DAMA/NaI Detectors 
Our SPb detector, made up of thin spaced ZnS(Ag) scintillators, was sensitive to the passage 
of only fairly low-velocity (<30 km/s) daemons. The reason for this, we believe presently, is  
that successive “disintegrations” of daemon-containing nucleons in the Zn (or Fe) nucleus 
captured by the daemon occur with an interval of ~10
 s, whereas the characteristic 
dimension of the set-up is ~20-30 cm. At higher velocities, the complex consisting of the 
captured nucleus (and, possibly, a cluster of atoms) and the daemon traverses the system 
carrying an excessive positive charge, which is readily compensated by electrons captured on 
the way, and, therefore, the daemon does not interact with new nuclei with an attendant 
generation of a noticeable secondary signal. 
The DAMA/NaI experiment with a ~100 kg NaI(Tl) scintillators was designed for 
measurement of the annual modulation signature. In case of WIMPs the measured quantity is 
the energy of the recoil nuclei knocked out by heavy (~10-100 GeV) WIMPs of the galactic 
halo. The set-up is thought to be sensitive to interactions occurring both with I and Na.  
Note that the sensitivity threshold of the system, ~2 keV, corresponds to the velocity 
of an iodine nucleus of 55 km/s if one takes the quenching factor to be about unity (the 
quenching factor is a ratio of efficiencies of producing scintillations by particles under 
consideration and electrons of the same energy). Assuming elastic interaction with a very 
massive particle, the latter, to produce a 2-keV signal, should move with a velocity of ~30 
km/s (in an elastic head-on collision, the velocity of the light particle, a nucleus, should be 
twice that of the heavy projectile particle). If the signals are due to WIMPs of the galactic 
halo (V∞ = 200-300 km/s), the recoil energy of the iodine nucleus could, seemingly, reach as 
high as 110-240 keV. 
Information on the yearly variation of the flux of particles traversing the DAMA/NaI 
is provided primarily by signals in the 2-6-keV range (Bernabei et al., 2003, 2006). The 6-
keV signal corresponds to the velocity of an elastically colliding projectile of ~ 47 km/s. This 
figure is in good agreement with the velocity of 51.6 km/s with which a particle in a quasi-
parabolic orbit hits the Earth (29.78√3 = 51.6 km/s; this velocity would produce a recoil 
nucleus with an energy of 7 keV, but allowing for the statistics of other than head-on 
collisions we would obtain 5-6 keV). But it is with these velocities (when particle energies 
differ exactly by a factor of three; compare, on the other hand, the 6 and 2 keV which one 
measures!) that SEECHO daemons fall on the Earth. A truly remarkable coincidence indeed. 
A number of additional questions, however, immediately arise here, to which one 
cannot yet supply unambiguous answers.  
Indeed, estimates of the velocity with which daemons escape from geocentric Earth-
crossing orbits (GESCO) into the Earth suggest that the resistance offered by a metal-like 
solid to a daemon moving with a velocity of ~10 km/s is ~10
 dyne (Drobyshevski, 2004), 
which entails a release of thermal energy of ~6 MeV/cm. It is unclear what energy would be 
liberated in a dielectric (without conductive electrons) scintillator. If it is heat, the scintillator 
will not detect it. (On the other hand, it is too high for the cryogenic systems of the type 
CDMS-I and Edelweiss-I designed for the detection of WIMPs either; see refs. in Bernabei et 
al. (2003).)  
The situation is not yet clear with regard to the quenching factor, which we considered 
above to be about unity for the low-energy iodine nuclei. Neutron elastic scattering 
experiments give a value of about 0.09 for I and 0.30 for Na (see Bernabei et al., 2003, and 
refs. therein). We will not give details of such calibrations, nor discuss the different 
possibilities here (see, however, arXiv:0706.3095). 
Also, one should not forget that the DAMA/NaI system, by the multiple hit rejection 
criterion by Bernabei et al. (2003, see Sec.3.3), rejects events with signals appearing 
simultaneously in two or more scintillators (nine scintillators altogether). But then how (and 
with what efficiency) do recoil nuclei form in one detector piece only? 
One may recall that immediately after leaving a solid, the daemon moves in vacuum 
(or in air) together with a cluster of atoms, in one of whose nuclei it resides (Drobyshevski, 
2005a). When it enters a solid object again, the daemon leaves a larger part of this cluster 
close to the surface of the object, and moves inside it only with a small part of the cluster, or 
even only with the remainder of the nucleus in which it rests. It is unclear how efficiently 
such a complex can initiate scintillations. It is conceivable that as long as it carries an excess 
positive charge, it is surrounded by electrons, and, in moving as a conventional heavy atom 
(or ion with Zi = 1) with a relatively low velocity (<50 km/s; recall that this corresponds to 
less than ~2 keV energy of an iodine nucleus), but in a rectilinear trajectory and without 
noticeable deceleration, through the dielectric and only moving atoms apart rather than 
penetrating into them, it will excite only phonons but not scintillations. The daemon resides in 
this state for tens of microseconds, “digesting” gradually the nucleons of the nucleus it is 
carrying and traveling tens of cm in this time. (We are not discussing here the points bearing 
on possible modes of “digestion” by the daemon, an elementary black hole, of nucleons that 
could leave no trace in a scintillation detector.) Eventually, however, the daemon/nuclear-
remainder complex acquires zero charge to become a particle of the neutron type. This state 
lasts until the next proton in the nucleus disintegrates and the system then acquires a negative 
charge, ~10
 s (with the distance traveled ~5 cm). It is in this state that this neutral (but 
supermassive, ~3×10-5 g) complex, 1-3 fermi in size, passes through electronic shells of atoms 
and is capable, in a path length of ~5 cm, to produce a recoil nucleus with a double velocity of 
up to ~100 km/s. It is such SEECHO-daemon-caused events that satisfy the multiple hit 
rejection criterion (Bernabei et al., 2003) that are possibly detected by the DAMA/NaI. The 
processes involved here certainly need a deeper analysis. 
VI. Conclusions 
The daemon approach had offered an explanation for the ~5-eV drift of the tritium β-spectrum 
tail with a half-year period, the so-called “Troitsk anomaly”, and some predictions regarding 
the KATRIN experiment on direct measurement of the neutrino mass (Drobyshevski, 2005a). 
It now appears that one can corroborate within the daemon paradigm the results of the 
DAMA/NaI experiment with the inferences drawn from the SPb study, which, in addition to 
detecting daemon populations of different velocities captured by the Solar system and moving 
within it in orbits of different, including geocentric, populations, established also a half-year 
variation in the NEACHO flux, demonstrating the advantages of vacuum systems in daemon 
detection, and revealing some of their remarkable properties and specific features of their 
interaction with matter. 
We find it particularly impressive that the range (2-6 keV) of the recorded signals in 
which the DAMA/NaI exhibits a yearly periodicity coincides with exactly the same level (2-7 
keV) that follows from the celestial mechanics scenario. On the other hand, a more careful 
analysis of the evolution of daemons captured by the Sun from the galactic disk and governed 
by celestial mechanics sheds light on reasons underlying the detection of the yearly 
periodicity of the high-velocity population (30-50 km/s) measured by DAMA, and of the half-
year periodicity of the low-velocity population (5-10-30 km/s) in the SPb experiment (in the 
latter case, the part played by the cross-like multi-loop trajectories of daemons traversing the 
Sun and captured by it appears significant). If performed with good statistics, the more 
advanced measurements on DAMA/LIBRA will hopefully also reveal the half-year harmonic 
in low signal level events (~2-3 keV). Such events originate from the fall of daemons from 
“short” SEECHOs in March and September. 
An analysis of the conditions favouring capture of daemons by the Sun and 
corroboration of their subsequent possible celestial mechanics evolution with the results 
gained in the SPb and DAMA/NaI experiments permits one to impose fairly strong constraints 
(one would almost say, to measure) on the effective cross section σ of daemon interaction 
with the Solar material. It was found to be 0.78×10-19 ≤ σ < 1.4×10-19 cm2. This is ~500 times 
the cross section of the neutral “antineon” atom formed by a daemon (Ze = -10e) and ten 
protons it captured, while being 3000-5000 times smaller than the cross section for a 
daemon/heavy-nucleus complex with electrons surrounding it. On the other hand, σ can be 
governed also by the Coulomb interaction of the daemon changing continuously its effective 
charge with particles of the solar plasma (Drobyshevski, 1996), which, in turn, may be helpful 
in refining our knowledge of the characteristics of the solar material. 
Obviously enough, the problems addressed in the paper (interaction of daemons both 
with the solar matter and with the scintillator material, the celestial mechanics and statistical 
evolution of their ensemble after capture by the Solar system, the part played by the initial 
conditions and the starting velocity dispersion in the daemon population of the galactic disk, 
refinement of the apex relative to this population, - it seems it is closer to the apex relative the 
interstellar gas, not stars (see. Fig.1), transfer to SEECHOs and, subsequently, to NEACHOs 
and GESCOs etc.) would require a much more careful and comprehensive analysis. This 
would permit a quantitative comparison of theoretical predictions with future experimental 
results.  
References 
Allen C.W., 1973. Astrophysical Quantities, 3
 ed., Univ. of London, The Athlone Press. 
Bahcall J.H., Flynn C., Gould A., 1992. Local dark matter from a carefully selected sample, 
Astrophys. J., 389, 234-250. 
Bernabei R., Belli P., Cappella F., Cerulli R., Montecchia F., Nozzoli F., Incicchitti A., 
Prosperi D., Dai C.J., Kuang H.H., Ma J.M., Ye Z.P., 2003. Dark Matter Search, Riv. 
Nuovo Cimento, 20(1), 1-73; astro-ph/0307403. 
Bernabei R., Belli P., Montecchia F., Nozzoli F., Cappella F., Incicchitti A., Prosperi D., R., 
R. Cerulli, C.J. Dai, H.L. He, H.H. Kuang, J.M. Ma, Z.P. Ye, 2006, Investigating 
pseudoscalar  and scalar dark matter, Int. J. Mod. Phys. A 21, 1445-1469.  
Drobyshevski E.M., 1996. Solar neutrinos and dark matter: cosmions, CHAMPs or… 
DAEMONs? Mon. Not. Roy. Astron Soc., 282, 211-217. 
Drobyshevski E.M., 1997a. If the dark matter objects are electrically multiply charged: New 
opportunities, in: Dark Matter in Astro- and Particle Physics (H.V.Klapdor-
Kleingrothaus and Y.Ramachers, eds.), World Scientific, pp.417-424. 
Drobyshevski E.M., 1997b. Dark Electric Matter Objects (daemons) and some possibilities of 
their detection, in: “COSMO-97. First International Workshop on Particle Physics and 
the Early Universe” (L.Roszkowski, ed.), World Scientific, pp.266-268. 
Drobyshevski E.M., 2004. Hypothesis of a daemon kernel of the Earth, Astron. Astrophys. 
Trans., 23, 49-59; astro−ph/0111042. 
Drobyshevski E.M., 2005a. Daemons, the "Troitsk anomaly" in tritium beta spectrum, and the 
KATRIN experiment, hep-ph/0502056. 
Drobyshevski E.M., 2005b. Detection of Dark Electric Matter Objects falling out from Earth-
crossing orbits, in: “The Identification of Dark Matter” (N.J.Spooner and 
V.Kudryavtsev, eds.), World Scientific, pp.408-413. 
Drobyshevski E.M., Beloborodyy M.V., Kurakin R.O., Latypov V.G., Pelepelin K.A., 2003. 
Detection of several daemon populations in Earth-crossing orbits, Astron. Astrophys. 
Trans., 22, 19-32; astro-ph/0108231. 
Drobyshevski E.M., Drobyshevski M.E., 2006. Study of the spring and autumn daemon-flux 
maxima at the Baksan Neutrino Observatory, Astron. Astrophys. Trans., 25, 57-73; 
astro−ph/0607046. 
Eddington A.S., 1926. The Internal Constitution of the Stars, Camb. Univ. Press, Cambridge. 
Evans N.W., Belokurov V., 2005. RIP: The MACHO era  (1974-2004), in: “The Identification 
of Dark Matter” (N.J.Spooner and V.Kudryavtsev, eds.), World Scientific, 2005, 
pp.141-150; astro-ph/0411222. 
Everhart E., 1974. Implicit single sequence methods for integrating orbits, Celestial 
Mechanics, 10, 35-55. 
Foot R., 2006. Implications of the DAMA/NaI and CDMS experiments for mirror matter-type 
dark matter, Phys.Rev. D74, 023514; astro-ph/0510705. 
Foot R., Mitra S., 2003. Have mirror micrometeorites been detected? Phys.Rev. D68, 071901; 
hep-ph/0306228. 
Primack J.R., Seckel D., Sadoulet B., 1988. Detection of cosmic dark matter, Annu. Rev. 
Nucl. Part. Sci., 38, 751-807.
ABSTRACT
  The assumption of the capture by the Solar System of the electrically charged
Planckian DM objects (daemons) from the galactic disk is confirmed not only by
the St.Petersburg (SPb) experiments detecting particles with V<30 km/s. Here
the daemon approach is analyzed considering the positive model independent
result of the DAMA/NaI experiment. We explain the maximum in DAMA signals
observed in the May-June period to be associated with the formation behind the
Sun of a trail of daemons that the Sun captures into elongated orbits as it
moves to the apex. The range of significant 2-6-keV DAMA signals fits well the
iodine nuclei elastically knocked out of the NaI(Tl) scintillator by particles
falling on the Earth with V=30-50 km/s from strongly elongated heliocentric
orbits. The half-year periodicity of the slower daemons observed in SPb
originates from the transfer of particles that are deflected through ~90 deg
into near-Earth orbits each time the particles cross the outer reaches of the
Sun which had captured them. Their multi-loop (cross-like) trajectories
traverse many times the Earth's orbit in March and September, which increases
the probability for the particles to enter near-Earth orbits during this time.
Corroboration of celestial mechanics calculations with observations yields
~1e-19 cm2 for the cross section of daemon interaction with the solar matter.

<|endoftext|><|startoftext|>
Introduction
	References
ABSTRACT
  We demonstrate a scheme for quantum communication between the ends of an
array of coupled cavities. Each cavity is doped with a single two level system
(atoms or quantum dots) and the detuning of the atomic level spacing and
photonic frequency is appropriately tuned to achieve photon blockade in the
array. We show that in such a regime, the array can simulate a dual rail
quantum state transfer protocol where the arrival of quantum information at the
receiving cavity is heralded through a fluorescence measurement. Communication
is also possible between any pair of cavities of a network of connected
cavities.

<|endoftext|><|startoftext|>
Introduction to Computational 
Learning Theory. The MIT Press, Cambridge, Massachusetts, 1994. 
[7]  V. Vapnik, Statistical Learning Theory, New York etc.: John Wiley & 
         Sons, Inc. 1998 
[8]  Daniil Ryabko, Pattern Recognition for Conditionally          
Independent Data, CS.LG/0507040 
[9]  X.Yao, Evolutionary Artificial Neural Networks, Int. J. Neural         
Systems, Vol 4. pp 203-222, 1993. 
[10]  B.Muller and J.Reinhardt, Neural Networks, An Introduction, 
springer-verlag 1990.  
[11] H.A. Gutowitz, editor , Cellular Automata, MIT press, Cambridge, MA, 
1990 
[12] T.Toffoli, N.Margolus, Cellular Automata Machines, A new  
environment for modeling, MIT press, Cambridge, MA, 1987 
[13]  C.R. Stephens, I. Garc´ıa Olmedo, J. Mora Vargas, H. Waelbroeck, 
Self Adaptation in evolving systems adap-org/9708002 
[14] T. B¨ack, Evolutionary Algorithms in Theory and Practice: evolution 
strategies, evolutionary programming, genetic algorithms, (Oxford 
Univ. Press 1996). 
[15]  L. Sekanina. Towards Evolvable IP Cores for FPGAs.In Proc. of the 
2003 NASA/DoD Conference on EvolvableHardware, pages 145–154, 
Chicago, IL, 2003. IEEE ComputerSociety. 
[16] L. Sekanina. Evolvable Components: From Theory to Hardware 
Implementations. Natural Computing Series, Springer Verlag, 2004. 
[17] Mohd Abubakr, R.M.Vinay, Novel Technique for Volatile Optical 
Memory using solitons, Proceeding of IEEE WOCN Bangalore, 2006
ABSTRACT
  Advances in semiconductor technology are contributing to the increasing
complexity in the design of embedded systems. Architectures with novel
techniques such as evolvable nature and autonomous behavior have engrossed lot
of attention. This paper demonstrates conceptually evolvable embedded systems
can be characterized basing on acausal nature. It is noted that in acausal
systems, future input needs to be known, here we make a mechanism such that the
system predicts the future inputs and exhibits pseudo acausal nature. An
embedded system that uses theoretical framework of acausality is proposed. Our
method aims at a novel architecture that features the hardware evolability and
autonomous behavior alongside pseudo acausality. Various aspects of this
architecture are discussed in detail along with the limitations.

<|endoftext|><|startoftext|>
Introduction
It is a long-established practice in physics to describe the gravitational field
by means of theories invariant under local Lorentz transformations. This is
the case of the Einstein-Cartan theory, for instance, or more generally of the
metric-affine approach to the gravitational field [1]. In the latter formulation,
the theory of gravity is considered as a gauge theory of the Poincaré group.
The motivation for addressing theories of gravity by means of local Lorentz
(SO(3,1)) symmetry is partially due to the impact of the Yang-Mills gauge
theory in particle physics and quantum field theory. Because of the local
SO(3,1) symmetry, it is possible to assert that in such theories “all reference
frames are equivalent”.
The investigation of metric-affine theories of gravity is important because
one might have to go beyond the Riemannian formulation of general relativity
in order to deal with structures that pertain to a possible quantum theory
of gravity. The relevance of the Poincaré group and its representations in
quantum field theory is well known. In spite of the above mentioned feature
of the local SO(3,1) symmetry, there is no physical reason that prevents
the possibility of considering theories of gravity invariant under the global
Lorentz symmetry.
One theory that exhibits invariance under global SO(3,1) symmetry is
the teleparallel equivalent of general relativity (TEGR) [2, 3, 4, 5, 6, 7, 8].
The Lagrangian density of the theory is invariant under local SO(3,1) trans-
formations up to a nontrivial, nonvanishing total divergence [9], and for this
reason the local SO(3,1) group is not a symmetry of the theory. (From a
different perspective, the TEGR may be considered as a gauge theory for the
translation group [10].) Because of the global SO(3,1) symmetry, we must
ascribe an interpretation to six degrees of freedom of the tetrad field. In the
TEGR two sets of tetrad fields that yield the same spacetime metric tensor
are physically distinct. Thus we should interpret the tetrad fields as refer-
ence frames adapted to ideal observers in spacetime. Therefore two sets of
tetrad fields that are related by a local SO(3,1) transformation yield the same
metrical properties of the spacetime, but represent reference frames that are
characterized by different inertial accelerations. In a given gravitational field
configuration, the Schwarzschild spacetime, say, a moving observer or an ob-
server at rest are described by different sets of tetrad fields, and both sets
of tetrads are related by some sort of SO(3,1) transformation. Of course the
proper interpretation of the translational and rotational accelerations of a
frame makes sense at least in the case of asymptotically flat spacetimes.
In this paper we carry out an analysis of the inertial accelerations of a
frame in the context of the TEGR. The inertial accelerations are represented
by a second rank antisymmetric tensor under global SO(3,1) transforma-
tions that is coordinate independent. This tensor can be decomposed into
translational and rotational accelerations (the latter is in fact the rotational
frequency of the frame). By considering the weak field limit we will see that
there is a very interesting relationship between the translational acceleration
and rotational frequency of the frame, and electric and magnetic fields, re-
spectively. This relationship is explicitly investigated in the context of the
Kerr spacetime. The translational acceleration and rotational frequency that
are necessary no maintain a static frame in the spacetime are closely related
to the electric field of a point charge and to the magnetic field of a perfect
magnetic dipole, respectively. The present analysis is very much similar to
the usual formulation of gravitoelectromagnetism.
We consider the four-velocity of observers that are in free fall (radially)
in the Schwarzschild spacetime and construct the reference frame adapted to
such observers. We show that the expression for the gravitational energy-
momentum that arises in the framework of the TEGR [4, 5, 7] vanishes,
if evaluated in this frame. This is a very interesting result that shows the
consistency of the above definition with the principle of equivalence. The
local effects of gravity are not measured by an observer in free fall, who
defines a locally inertial reference frame. In this frame the acceleration of
the observer vanishes (section 3), and therefore he cannot measure neither
the gravitational force exerted on him nor the mass of the black hole. Thus
in a freely falling frame the gravitational energy should vanish. The tetrad
field that establishes the reference frame of an observer in free fall is related
to other (possibly static) frames by a frame transformation, not a coordinate
transformation. For instance, it is possible to establish a transformation from
the freely falling frame to a frame adapted to observers that are asympotically
at rest in the Schwarzschild spacetime, out of which we obtain the usual value
for the total gravitational energy of the spacetime. We believe that viable
definitions of gravitational energy-momentum should exhibit this feature.
Notation: spacetime indices µ, ν, ... and SO(3,1) indices a, b, ... run from
0 to 3. Time and space indices are indicated according to µ = 0, i, a =
(0), (i). The tetrad field is denoted by ea µ, and the torsion tensor reads
Taµν = ∂µeaν − ∂νeaµ. The flat, Minkowski spacetime metric tensor raises
and lowers tetrad indices and is fixed by ηab = eaµebνg
µν = (− + ++). The
determinant of the tetrad field is represented by e = det(ea µ).
2 The field equations of the TEGR
Einstein’s general relativity is determined by the field equations. The latter
may be written either in terms of the metric tensor or of the tetrad field.
The TEGR is a reformulation of Einstein’s general relativity in terms of
the tetrad field. Sometimes the theory is also called “tetrad gravity” [9].
The tetrad field is anyway necessary to describe the coupling of Dirac spinor
fields with the gravitational field. The formulation of general relativity in
a different geometrical framework allows a new insight into the theory, and
this is precisely what happens in the consideration of the TEGR.
The Lagrangian density for the gravitational field in the TEGR is given
L = −k e (
T abcTabc +
T abcTbac − T aTa)− LM
≡ −k eΣabcTabc − LM , (1)
where k = 1/(16π), and LM stands for the Lagrangian density for the matter
fields. As usual, tetrad fields convert spacetime into Lorentz indices and vice-
versa. The tensor Σabc is defined by
Σabc =
(T abc + T bac − T cab) +
(ηacT b − ηabT c) , (2)
and T a = T b b
a. The quadratic combination ΣabcTabc is proportional to the
scalar curvature R(e), except for a total divergence [7].
The field equations for the tetrad field read
eaλebµ∂ν(eΣ
bλν)− e(Σbν aTbνµ −
eaµTbcdΣ
bcd) =
eTaµ . (3)
where eTaµ = δLM/δe
aµ. It is possible to prove by explicit calculations that
the left hand side of Eq. (3) is exactly given by 1
e [Raµ(e)− 12eaµR(e)]. The
field equations above may be rewritten in the form
∂ν(eΣ
aλν) =
e ea µ(t
λµ + T λµ) , (4)
where
tλµ = k(4ΣbcλTbc
µ − gλµΣbcdTbcd) , (5)
is interpreted as the gravitational energy-momentum tensor [7].
The Lagrangian density defined by Eq. (1) is invariant under global
SO(3,1) transformations of the tetrad field. As we asserted before, un-
der local SO(3,1) transformations the purely gravitational part of Eq. (1),
−k eΣabcTabc, transforms into −k eΣabcTabc plus a nontrivial, nonvanishing
total divergence [9]. The integral of this total divergence in general is non-
vanishing, unless restrictive conditions are imposed on the Lorentz transfor-
mation matrices.
The Hamiltonian formulation of the TEGR is obtained by first establish-
ing the phase space variables. The Lagrangian density does not contain the
time derivative of the tetrad component ea0. Therefore this quantity will
arise as a Lagrange multiplier. The momentum canonically conjugated to
eai is given by Π
ai = δL/δėai. The Hamiltonian formulation is obtained by
rewriting the Lagrangian density in the form L = pq̇ − H , in terms of eai,
Πai and Lagrange multipliers. The Legendre transform can be successfuly
carried out, and the final form of the Hamiltonian density reads [11]
H = ea0C
a + αikΓ
ik + βkΓ
k , (6)
plus a surface term. αik and βk are Lagrange multipliers that (after solving
the field equations) are identified as αik = 1/2(Ti0k + Tk0i) and βk = T00k.
Ca, Γik and Γk are first class constraints.
The constraint Ca is written as Ca = −∂iΠai+ha, where ha is an intricate
expression of the field variables. The integral form of the constraint equation
Ca = 0 motivates the definition of the total energy-momentum four-vector
P a [4],
P a = −
d3x∂iΠ
ai . (7)
V is an arbitrary volume of the three-dimensional space. In the configuration
space we have
Πai = −4keΣa0i . (8)
The emergence of total divergences in the form of scalar or vector densities
is possible in the framework of theories constructed out of the torsion tensor.
Metric theories of gravity do not share this feature. We note that by making
λ = 0 in eq. (4) and identifying Πai in the left hand side of the latter, the
integral form of eq. (4) is written as
P a =
d3x e ea µ(t
0µ + T 0µ) . (9)
In empty spacetimes and in the framework of black holes P a does represent
the gravitational energy-momentum contained in a volume V of the three-
dimensional space. Several applications to well known gravitational field
configurations support this interpretation.
3 Reference frames in spacetime
A set of four orthonormal, linearly independent vector fields in spacetime
establish a reference frame. Altogether, they define a tetrad field ea µ, which
allows the projection of vectors and tensors in spacetime in the local frame
of an observer.
Each set of tetrad fields defines a class of reference frames [12]. If we
denote by xµ(s) the world line C of an observer in spacetime (s is the proper
time of the observer), and by uµ(s) = dxµ/ds its velocity along C, we identify
the observer’s velocity with the a = (0) component of ea
µ. Thus uµ(s) =
µ along C. The acceleration aµ of the observer is given by the absolute
derivative of uµ along C,
De(0)
= uα∇αe(0) µ , (10)
where the covariant derivative is constructed out of the Christoffel symbols.
Thus ea
µ determines the velocity and acceleration along the worldline of an
observer adapted to the frame. Therefore a given set of tetrad fields, for which
µ describes a congruence of timelike curves, is adapted to a particular
class of observers, namely, to observers characterized by the velocity field
uµ = e(0)
µ, endowed with acceleration aµ. If ea µ → δaµ in the limit r → ∞,
then ea µ is adapted to static observers at spacelike infinity.
A geometrical characterization of tetrad fields as an observer’s frame can
be given by considering the acceleration of the frame along an arbitrary
path xµ(s) of the observer in spacetime. The acceleration of the frame is
determined by the absolute derivative of ea
µ along xµ(s). Thus, assuming
that the observer carries an orthonormal tetrad frame ea
µ, the acceleration
of the latter along the path is given by [13, 14]
µ , (11)
where φab is the antisymmetric acceleration tensor. According to Refs. [13,
14], in analogy with the Faraday tensor we can identify φab → (a,Ω), where
a is the translational acceleration (φ(0)(i) = a(i)) and Ω is the frequency
of rotation of the local spatial frame with respect to a nonrotating (Fermi-
Walker transported [12]) frame. It follows from Eq. (11) that
b = eb µ
= eb µ u
λ∇λea µ . (12)
Therefore given any set of tetrad fields for an arbitrary gravitational field
configuration, its geometrical interpretation can be obtained by suitably in-
terpreting the velocity field uµ = e(0)
µ and the acceleration tensor φab. The
acceleration vector aµ defined by Eq. (10) may be projected on a frame in
order to yield
ab = eb µa
µ = eb µu
α∇αe(0) µ = φ(0) b . (13)
Thus aµ and φ(0)(i) are not different accelerations of the frame.
The expression of aµ given by Eq. (10) may be rewritten as
aµ = uα∇αe(0) µ = uα∇αuµ =
, (14)
where Γ
αβ are the Christoffel symbols. We see that if u
µ = e(0)
µ represents
a geodesic trajectory, then the frame is in free fall and aµ = φ(0)(i) = 0.
Therefore we conclude that nonvanishing values of the latter quantities do
represent inertial accelerations of the frame.
In view of the orthogonality of the tetrads we write Eq. (12) as φa
−uλea µ∇λeb µ, where ∇λeb µ = ∂λeb µ − Γσλµeb σ. Now we take into account
the identity ∂λe
µ − Γσλµeb σ + 0ωλ b cec µ = 0, where 0ωλ b c is the metric
compatible, torsion free Levi-Civita connection, and express φa
b according
b = e(0)
µ( 0ωµ
a) . (15)
At last we consider the identity 0ωµ
b = −Kµ a b, where −Kµ a b is the
contortion tensor defined by
Kµab =
ν(Tλµν + Tνλµ + Tµλν) , (16)
and Tλµν = e
λTaµν (see, for instance, Eq. (4) of Ref. [7]; the identity is
obtained by requiring the vanishing of a general SO(3,1) connection ωµab, or
by direct calculation). After simple manipulations we finally obtain
φab =
[T(0)ab + Ta(0)b − Tb(0)a] . (17)
The expression above is clearly not invariant under local SO(3,1) trans-
formations, but is invariant under coordinate transformations. The values of
φab for a given tetrad field may be used to characterize the frame. We recall
that we are assuming the observer to carry the set of tetrad fields along xµ(s),
for which we have uµ = e(0)
µ. We interpret φab as the inertial accelerations
along xµ(s).
Two simple, straightforward applications of Eq. (17) are the following:
(i) The tetrad field adapted to observers at rest in Minkowski spacetime is
given by ea µ(ct, x, y, z) = δ
µ. We consider a time-dependent boost in the x
direction, say, after which the tetrad field reads
ea µ(ct, x, y, z) =
γ −βγ 0 0
−βγ γ 0 0
0 0 1 0
0 0 0 1
, (18)
where γ = (1 − β2)−1/2, β = v/c and v = v(t). The frame above is
then adapted to observers whose four-velocity is uµ = e(0)
µ(ct, x, y, z) =
(γ, βγ, 0, 0). After simple calculations we obtain
φ(0)(1) =
[βγ] =
1− v2/c2
, (19)
φ(0)(2) = 0 ,
φ(0)(3) = 0 ,
and φ(i)(j) = 0.
(ii) A frame adapted to an observer in Minkowski spacetime whose four-
velocity is e(0)
µ = (1, 0, 0, 0) and which rotates around the z axis, say, reads
ea µ(ct, x, y, z) =
1 0 0 0
0 cosω(t) − sinω(t) 0
0 sinω(t) cosω(t) 0
0 0 0 1
. (20)
It is easy to carry out the simple calculations and obtain
φ(2)(3) = 0 , (21)
φ(3)(1) = 0 ,
φ(1)(2) = −
and φ(0)(i) = 0. Together with the discussion regarding Eq. (14), the exam-
ples above support the interpretation of φa
b as the inertial accelerations of
the frame.
4 A freely falling frame in the Schwarzschild
spacetime
We will consider in this section a frame that is in free fall in the Schwarzschild
spacetime, namely, that is radially accelerated towards the center of the black
hole. We will take into account the kinematical quantities discussed the
preceeding section, in order to illustrate the construction of the tetrad field.
The spacetime is described by the line element
ds2 = −α−2dt2 + α2dr2 + r2(dθ2 + sin2 θdφ2) , (22)
where
α−2 = 1−
. (23)
Let us define the quantity β,
= (1− α−2)1/2 , (24)
which will be useful in the following.
An observer that is in radial free fall in the Schwarzschild spacetime is
endowed with the four-velocity [15]
, 0, 0
. (25)
The simplest set of tetrad fields that satisfies the condition
α = uα , (26)
is given by
eaµ =
−1 −α2β 0 0
β sin θ cosφ α2 sin θ cos φ r cos θ cosφ −r sin θ sinφ
β sin θ sinφ α2 sin θ sinφ r cos θ sinφ r sin θ cos φ
β cos θ α2 cos θ −r sin θ 0
. (27)
We recall that the index a labels the lines, and µ the columns. Since the
frame is in free fall the equation φ(0)(i) = 0 is satisfied. It is not difficult to
show that this set of tetrad fields also satisfies the conditions
φ(i)(j) =
[T(0)(i)(j) + T(i)(0)(j) − T(j)(0)(i)] = 0 . (28)
Three of the four conditions established by Eq. (26) are more relevant
for our purposes, namely, the three components of the frame velocity in
the three-dimensional space, ui = e(0)
i. Together with the three conditions
determined by Eq. (28), we have six conditions on the frame. We may assert
that these six conditions completely fix the structure of the tetrad field, even
though Eq. (28) has been verified a posteriori. Therefore Eq. (27) describes
a nonrotating frame in radial free fall in the Schwarzschild spacetime.
We will evaluate the gravitational energy-momentum out of the tetrad
field above, but will omit the details of the calculations which are alge-
braically long, but otherwise simple. The nonvanishing components of the
torsion tensor are
T001 = −β∂rβ (29)
T101 = −α2∂rβ
T202 = −rβ
T303 = −rβ sin2 θ
T212 = r(1− α2)
T313 = r(1− α2) sin2 θ .
The gravitational energy contained within a spherical surface of constant
radius is given by
P (0) = −
dSj Π
(0)j = 4k
dS1 e(e
001 + e(0) 1Σ
101) , (30)
where
Σ001 =
(g00g11g22T212 + g
00g11g33T313) , (31)
Σ101 = −
(g00g11g22T202 + g
00g11g33T303) .
We find that
e(e(0) 0Σ
001 + e(0) 1Σ
101) = r sin θ(α2 − 1− α2β2) (32)
= 0 ,
and therefore the gravitational energy contained within a surface of constant
radius as well as the total gravitational energy of the spacetime vanishes, if
evaluated in the frame of a freely falling observer. This is a very interesting
property of the whole formalism described in section 2. The vanishing of the
gravitational energy for freely falling observers is a feature that is consistent
with (and a consequence of) the principle of equivalence, since local effects of
gravity are not measured by observers in free fall. For other frames that are
related to Eq. (27) by a local Lorentz transformation we obtain nonvanishing
values of P (0). In particular, the total gravitational energy calculated out of
frames such that ea µ(t, x, y, z) → δaµ in the asymptotic limit r → ∞ is
exactly P (0) = m [4]. The latter tetrad field is adapted to observers at rest
at spacelike infinity. Thus the vanishing of gravitational energy in freely
falling frames shows that the localizability of the gravitational energy is not
inconsistent with with the principle of equivalence. The result given by Eqs.
(30-32) is a very good example of the frame dependence of the gravitational
energy definition (7).
It can be easily verified that the gravitational momentum components
P (1) and P (2) vanish in view of integrals like
0 dφ sinφ = 0 =
0 dφ cosφ,
whereas P (3) vanishes due to
0 dθ sin θ cos θ = 0.
It is important to remark that in general the vanishing of φab does not
imply the vanishing of P a. For an observer at rest at spacelike infinity the
total gravitational energy is nonvanishing, whereas for these observers we
have φab ∼= 0 (in the limit r → ∞; see next section).
5 Static frames in the Kerr spacetime
Another interesting application of the definitions of velocity and inertial ac-
celeration of a frame discussed in section 3 is the analysis of a static frame
in Kerr’s spacetime. The latter is established by the line element
ds2 = −
dt2 −
2χ sin2 θ
dφ dt+
dr2 (33)
+ρ2dθ2 +
Σ2 sin2 θ
dφ2 ,
with the following definitions:
∆ = r2 + a2 − 2mr , (34)
ρ2 = r2 + a2 cos2 θ ,
Σ2 = (r2 + a2)2 −∆a2 sin2 θ ,
ψ2 = ∆− a2 sin2 θ ,
χ = 2amr .
A static reference frame in Kerr’s spacetime is defined by the congruence
of timelike curves uµ(s) such that ui = 0, namely, the spatial velocity of the
observers is zero with respect to static observers at spacelike infinity. Since
we identify ui = e(0)
i, a static reference frame is established by the condition
i = 0 . (35)
In view of the orthogonality of the tetrads, the equation above implies e(k) 0 =
0. This latter equation remains satisfied after a local rotation of the frame,
ẽ(k) 0 = Λ
0 = 0. Therefore condition (35) determines the static char-
acter of the frame, up to an orientation of the frame in the three-dimensional
space.
A simple form for the tetrad field that satisfies Eq. (35) (or, equivalently,
e(k) 0 = 0) reads
eaµ =
−A 0 0 −B
0 C sin θ cosφ ρ cos θ cos φ −D sin θ sinφ
0 C sin θ sin φ ρ cos θ sinφ D sin θ cosφ
0 C cos θ −ρ sin θ 0
, (36)
with the following definitions
, (37)
χ sin2 θ
In the expression of D we have
Λ = (ψ2Σ2 + χ2 sin2 θ)1/2 .
We are interested in the calculation of φab given by Eq. (17), and for this
purpose it is useful to work with the inverse tetrad field ea
µ. It reads
sin θ sinφ − ρχ
sin θ cosφ 0
sin θ cos φ
sin θ sin φ
cos θ
cos θ cosφ 1
cos θ sinφ −1
sin θ
0 −ρψ
sin θ
sin θ
, (38)
where now the index a labels the columns, and µ the lines.
The frame determined by Eqs. (36) and (38) is valid in the region outside
the ergosphere. The function ψ2 = ∆ − a2 sin2 θ vanishes over the external
surface of the ergosphere (defined by r = r⋆ = m+
m2 − a2 cos2 θ; over this
surface g00 = 0), and we see that various components of Eqs. (36) and (38)
are not well defined over this surface. It is well known that it is not possible
to maintain static observers inside the ergosphere of the Kerr spacetime.
By inspecting Eq. (38) we see that for large values of r we have
µ(t, r, θ, φ) ∼= (0, cos θ,−(1/r) sin θ, 0) ,
µ(t, x, y, z) ∼= (0, 0, 0, 1) . (39)
Therefore we may assert that the frame given by Eq. (37) is characterized by
the following properties: (i) the frame is static, because Eq. (35) is verified;
(ii) the e(3)
µ components are oriented along the symmetry axis of the black
hole (the z direction). The second condition is ultimately reponsible for the
simple form of Eq. (36).
The evaluation of φab is long but straightforward, and for this reason
we will omit the details of the calculations. For convenience of notation we
define the vectors
r̂ = sin θ cosφ x̂+ sin θ sinφ ŷ + cos θ ẑ (40)
θ̂ = cos θ cos φ x̂+ cos θ sin φ ŷ− sin θ ẑ
which have well defined meaning as unit vectors in the asymptotic limit
r → ∞. We also define the three-dimensional vectors
a = (φ01, φ02, φ03) , (41)
Ω = (φ23, φ31, φ12) . (42)
We obtain the following expressions for a and Ω:
sin θ cos θ θ̂
, (43)
Ω = −
cos θ r̂+
sin θ ∂r
sin θ ∂θ
r̂ . (44)
The specific functional form of the vectors above completely characterize
the frame determined by Eq. (36). The determination of a and Ω is equiva-
lent to the fixation of six components of the tetrad field. Equations (43) and
(44) represent the inertial accelerations that one must exert on the frame
in order to verify that (i) the frame is static (condition (35)), and that (ii)
the e(3)
µ components of the tetrad field asymptotically coincides with the
symmetry axis of the black hole.
The form of a and Ω for large values of r is very interesting. It is easy to
verify that in the limit r → ∞ we obtain
r̂ , (45)
Ω ∼= −
2 cos θ r̂+ sin θ θ̂
. (46)
After the identificationsm↔ q and 4πma↔ m̄, where q is the electric charge
and m̄ is the magnetic dipole moment, equations (45) and (46) resemble the
electric field of a point charge and the magnetic field of a perfect dipole
that points in the z direction, respectively. These equations represent a
manifestation of gravitoelectromagnetism.
If we abandon the statical condition given by Eq. (35), an observer lo-
cated at a position (r, θ, φ) will be subject to an acceleration −a and to a
rotational motion determined by −Ω = ΩD, which is the dragging frequency
of the frame. Thus the gravitomagnetic effect is locally equivalent to iner-
tial effects in a frame rotating with frequency −ΩD, the latter having the
magnetic dipole moment structure given by Eq. (46). This is precisely the
gravitational Larmor’s theorem, discussed in Ref. [16].
The emergence of gravitoelectromagnetic (GEM) field quantities in the
context of the acceleration tensor φab presents no difference with respect to
the usual approach in the literature. Let us assume that tetrad field satisfies
the boundary conditions
ea µ ∼= δaµ +
ha µ , (47)
where ha µ is the perturbation of the flat space-time tetrad field in the limit
r → ∞, and that in this limit the SO(3,1) and spacetime indices acquire the
same significance. It is straightforward to verify that in this case we have
φ(0)(i) ∼= −∂i
, (48)
φ(i)(j) ∼= −
. (49)
Thus we identify
h00 , (50)
Ai = −
h0i . (51)
The identification above is equivalent to the one usually made in the litera-
ture, namely, Φ = (1/4)h̄00 and Ai = −(1/2)h̄0i [17], where h̄µν is the trace-
reversed field quantity defined by h̄µν = hµν−(1/2)ηµνh, and h = ηµνhµν . The
latter identification is made directly in the weak field form of the metric ten-
sor of a slowly rotating source. Assuming that h00 = 2Φ/c
2 and hij = δijh00,
where c is the speed of light (according to Eq. (1.4) of Ref. [17]), we obtain
h̄00 = 2h00, and therefore V = (1/4)h̄00. To our knowledge, the identification
of the GEM field quantities out of the tensor φab has not been addressed in
the literature so far.
6 Comments
Gravity theories invariant under the global SO(3,1) group are physically ac-
ceptable. The gravitational field equations determine the gravitational field,
not the frame. A given gravitational field configuration admits an infinity
of frames which in general are distinct from each other. We know that the
physical properties of a system are different in a static and in an accelerated
frame, for instance, and this feature should also hold in general relativity.
The gravitational energy-momentum that is defined in the realm of the
TEGR is frame dependent. This issue has been partially discussed before
in Refs. [7, 8], and also in Ref. [9]. This dependence is considered here to
be a natural property of the definition. The frame may be characterized by
the six components of the antisymmetric tensor φab, defined by Eq. (17),
which determine the translational acceleration and rotational frequency of
the frame, and which resembles the electric field of a point charge and the
magnetic field of a dipole, respectively, in the weak field limit of the Kerr
spacetime (in the consideration of a static frame).
In section 4 we have shown that the gravitational energy-momentum cal-
culated out of a frame that is nonrotating and freely falling in the Schwarzschild
spacetime vanishes. We expect this property to hold in the consideration of
a general spacetime geometry, in which case the analysis is somewhat more
complicated, because the frame is expected not to rotate with respect to a
Fermi-Walker transported frame. In general the construction of the latter
frame is not trivial.
It is clear that if the gravitational energy-momentum definition were in-
variant under local Lorentz transformations, we would not arrive at the result
of section 4, since the the value of P a on a three-dimensional volume V would
be the same for all frames, and presumably nonvanishing.
A common critique of the localizability of gravitational energy is that the
latter is unattainable because of the principle of equivalence. In this paper we
have seen that this is not the case. Definition (7) for the gravitational energy-
momentum yields the expected results both for observers asymptotically at
rest and for freely falling observers.
Acknowledgement
J. W. M. is grateful to G. F. Rubilar for helpful discussions on reference
frames. This work was supported in part by CNPQ (Brazil).
References
[1] F. W. Hehl, J. D. McCrea, E. W. Mielke and Y. Ne’eman, Phys. Rep.
258, 1 (1995).
[2] F. W. Hehl, in “Proceedings of the 6th School of Cosmology and Gravi-
tation on Spin, Torsion, Rotation and Supergravity”, Erice, 1979, edited
by P. G. Bergmann and V. de Sabbata (Plenum, New York, 1980).
[3] J. M. Nester, Int. J. Mod. Phys. A 4, 1755 (1989); J. Math. Phys. 33,
910 (1992).
[4] J. W. Maluf, J. F. da Rocha-Neto, T. M. L. Toŕıbio and K. H. Castello-
Branco, Phys. Rev. D 65, 124001 (2002).
[5] J. W. Maluf, F. F. Faria and K. H. Castello-Branco, Class. Quantum
Grav. 20, 4683 (2003).
[6] Y. Obukhov and J. G. Pereira, Phys. Rev. D 67, 044016 (2003).
[7] J. W. Maluf, Ann. Phys. (Leipzig) 14, 723 (2005).
[8] J. W. Maluf, S. C. Ulhoa, F. F. Faria and J. F. da Rocha-Neto, Class.
Quantum Grav. 23, 6245 (2006).
[9] Y. N. Obukhov and G. F. Rubilar, Phys. Rev. D 73, 124017 (2006).
[10] V. C. de Andrade and J. G. Pereira, Phys. Rev. D 56, 4689 (1997).
[11] J. W. Maluf and J. F. da Rocha-Neto, Phys. Rev. D 64, 084014 (2001).
[12] F. H. Hehl, J. Lemke and E. W. Mielke, Two Lectures on Fermions and
Gravity, in “Geometry and Theoretical Physics”, edited by J. Debrus
and A. C. Hirshfeld (Springer, Berlin Heidelberg, 1991).
[13] B. Mashhoon and U. Muench, Ann. Phys. (Leipzig) 11, 532 (2002) [gr-
qc/0206082].
[14] B. Mashhoon, Ann. Phys. (Leipzig) 12, 586 (2003) [hep-th/0309124].
[15] J. B. Hartle, “Gravity: An Introduction to Einstein’s General Relativ-
ity” (Addison-Wesley, San Francisco, 2003), p. 198.
[16] B. Mashhoon, Phys. Lett. A 173, 347 (1993).
[17] B. Mashhoon, Gravitoelectromagnetism: a Brief Review [gr-qc/0311030].
ABSTRACT
  We consider the interpretation of tetrad fields as reference frames in
spacetime. Reference frames may be characterized by an antisymmetric
acceleration tensor, whose components are identified as the inertial
accelerations of the frame (the translational acceleration and the frequency of
rotation of the frame). This tensor is closely related to
gravitoelectromagnetic field quantities. We construct the set of tetrad fields
adapted to observers that are in free fall in the Schwarzschild spacetime, and
show that the gravitational energy-momentum constructed out of this set of
tetrad fields, in the framework of the teleparallel equivalent of general
relatrivity, vanishes. This result is in agreement with the principle of
equivalence, and may be taken as a condition for a viable definition of
gravitational energy.

<|endoftext|><|startoftext|>
Introduction
In SuperString Theory the elementary particles are not point-like objects but ex-
tended, string-like objects. It is surprising that this apparently small change allows
us to answer fundamental questions that in the context of the quantum field theory
of point-like particles cannot even be posed. For example: Why is the Standard
Model gauge group SU(3) × SU(2)L × U(1)Y ? Why are there three families of
particles? Why is the mass of the electron me = 0.5 MeV? Why is the fine struc-
ture constant α = 1/137? In addition, only SuperString Theory has the potential
to unify all gauge interactions with gravity in a consistent way. In this sense, it
is a crucial step in the construction of the fundamental theory of particle physics
to find a consistent SuperString model in four dimensions accommodating the ob-
served Standard Model (SM), i.e. we need to find the SuperString Standard Model
(SSSM). Actually, this is the main task of what we call String Phenomenology.
In the late eighties, the compactification of the E8 × E8 Heterotic String on
http://arxiv.org/abs/0704.0987v4
September 15, 2021 7:34 WSPC/INSTRUCTION FILE mplareviewMunoz
2 Carlos Muñoz
six-dimensional orbifolds proved to be an interesting method to carry out this taska
(for a brief historical account see the Introduction in Ref. 1 and references therein).
For example, it was shown that the use of two Wilson lines on the torus defining the
symmetric Z3 orbifold can give rise to four-dimensional supersymmetric models with
gauge group SU(3)×SU(2)×U(1)5×Ghidden and, automatically, three generations
of chiral particles2. In addition, it was also shown that the Fayet–Iliopoulos (FI)
D-term, which appears because of the presence of an anomalous U(1), can give rise
to the breaking of the extra U(1)’s. In this way it was possible to construct3,4,5
supersymmetric models with gauge group SU(3)×SU(2)×U(1)Y , three generations
of particles in the observable sector, and absence of dangerous baryon- and lepton-
number-violating operatorsb.
Unfortunately, we cannot claim that one of these Z3 orbifold models is the SSSM,
since several problems are always present. For example, the initially large number
of extra particles, which are generically present in these constructions, is highly
reduced through the FI mechanism, since many of them get a high mass (≈ 1016−17
GeV). However, in general, some extra SU(3) triplets, SU(2) doublets and SU(3)×
SU(2) singlets still remain at low energy. On the other hand, given the predicted
value for the unification scale in the Heterotic String12, MGUT ≈ gGUT ×5.27 ·1017
GeV, the values of the gauge couplings deduced from LEP experiments cannot13 be
obtainedc. It was also not possible to obtain in these models the necessary Yukawa
couplings reproducing the observed fermion masses5,4,14.
At this point, it is fair to say that almost 20 years have gone by since String
Phenomenology started, and the SSSM has not been found yetd As acquittal on the
charge we should remark that there are thousands of models (vacua) that can be
built. Some of them have the gauge group of the SM or GUT groups, three families
of particles, and other interesting properties, but many others have a number of
families different from three, no appropriate gauge groups, no appropriate matter,
etc. A perfect way of solving this problem would be to use a dynamical mechanism to
select the correct model (vacuum). Such a mechanism should be able to determine
a point in the parameter space of the Heterotic String determining the correct
compactification producing the gauge group SU(3)c×SU(2)L×U(1)Y , three families
of the known particles, the correct Yukawa couplings, etc. The problem is that such
a mechanism has not been discovered yet.
So, for the moment, the best we can do is keep trying, i.e. to use the experimental
aOther interesting attempts at model building used Calabi–Yau spaces and fermionic constructions.
bRecently, other interesting models in the context of the Z3 orbifold
6, as well as in the context of
the Z6 orbifold
7,8,9, and Z12 orbifold
10,11, have been analysed.
cRecall that this is only possible in the context of the Minimal Supersymmetric Standard Model
(MSSM) for MGUT ≈ 2× 10
16 GeV.
dAnd this sentence can also be applied to any of the interesting models constructed in more recent
years using D-brane technology15. Actually, the probability of obtaining an MSSM like gauge
group with three generation in the context of intersecting D-branes in an orientifold background
seems to be extremelly small16,17, of about 10−9.
September 15, 2021 7:34 WSPC/INSTRUCTION FILE mplareviewMunoz
A kind of prediction from string phenomenology: extra matter at low energy 3
results available (such as the SM gauge group, three families, fermion masses, mixing
angles, etc.), to discard models. Although the model space is in principle huge, a
detailed analysis can reduce this to a reasonable size. For example, within the Z3
orbifold with two Wilson lines, one can construct in principle a number of order
50000 of three-generation models with the SU(3) × SU(2) × U(1)5 gauge group
associated to the first E8 of the Heterotic String. However, a study implied that
most of them are equivalent18, and in fact, at the end of the day, only 192 different
models were found19,18. This reduction is remarkable, but we should keep in mind
that the analysis of each one of these models is really complicated.
Nevertheless, a certain degree of optimism is important when working in String
Phenomenology, and one can argue that if the SM arises from SuperString Theory
there must exist one model with the right properties. In the present review we will
adopt this viewpoint, and will assume that the SM arises from orbifolds construc-
tions. Instead of the painful work of searching for the correct orbifold model, we
will try to deduce the phenomenological properties that such a model must have
in order to solve the crucial problems mentioned above, with the hope that this
analysis will allow us to make predictions that can be tested at the LHC.
In fact, all those problems, extra matter, gauge coupling unification, and cor-
rect Yukawa couplings, are closely related. The first two because the evolution of
the gauge couplings from high to low energy through the renormalization group
equations (RGEs), depends on the existing matter20,6. In Section 2 we will dis-
cuss a solution to the gauge coupling unification problem implying the prediction of
three generations of supersymmetric Higgses and vector-like colour triplets at low
energies1. In this solution the FI scale plays an important role.
Concerning the third problem, how to obtain the observed structure of fermion
masses and mixing angles, this is in our opinion the most difficult task in String
Phenomenology. For example, the right model must reproduce also the correct mass
hierarchy for quarks and leptons, mt
∼ 105, mτ
∼ 103, etc., and this is not a trivial
task, although it is true that one can find interesting results in the literature. In
particular, orbifold spaces have a beautiful mechanism to generate a mass hierar-
chy at the renormalizable level. Namely, Yukawa couplings between twisted matter
can be explicitly computed and they get suppression factors, which depend on the
distance between the fixed points to which the relevant fields are attached21−26.
The couplings can be schematically written as λ ∼ e−
Ti , with Re Ti ∼ R2i ,
and the Ti are the moduli fields associated to the size and shape of the orbifold.
The distances can be varied by giving different vacuum expectation values (VEVs)
to these moduli, implying that one can span in principle five orders of magnitude
the Yukawa couplings24−26. Unfortunately, this is not the end of the story, since
Nature tells us that a weak coupling matrix exists with weird magnitudes for the
entries, and that therefore we must arrange our up-and down-quark Yukawa cou-
plings in order to have specific off diagonal elementse. In Section 3 we will see that
e Needless to say, the recent experimental confirmation of neutrino masses makes the task even
September 15, 2021 7:34 WSPC/INSTRUCTION FILE mplareviewMunoz
4 Carlos Muñoz
to obtain this at the renormalizable level is possible if three Higgs families and the FI
breaking are present26,27. Thus we have a common solution for the three problems
mentioned above.
On the other hand, it is well known that dangerous flavour-changing neutral
currents (FCNCs) may appear when fermions of a given charge receive their mass
through couplings with several Higgs doublets28,29. This situation might be present
here since we have three generations of supersymmetric Higgses. In Section 4 we
will address this potential problem, finding that viable scenarios can be obtained27.
2. Predictions from gauge coupling unification
Since we are interested in the analysis of gauge couplings, we need to first clarify
which is the relevant scale for the running between the supersymmetric scaleMS and
the unification point. Let us recall that in heterotic compactifications some scalars
singlets Ci develop vacuum expectation values (VEVs) in order to cancel the FI
D-term, without breaking the SM gauge group. An estimate about their VEVs
can be done with the average result 〈Ci〉 ∼ 1016−17 GeV (see e.g. Ref. 6). After
the breaking, many particles, say ξ, acquire a high mass because of the generation
of effective mass terms. These come for example from operators of the type Ciξξ.
In this way extra vector-like triplets and doublets and also singlets become very
heavy. We will use the above value as our relevant scale, the so-called FI scale
MFI ≈ 1016−17 GeV.
As discussed in the Introduction, we are interested in the unification of the gauge
couplings at MGUT ≈ gGUT ×5.27 ·1017 GeV. This is not a simple issue, and various
approaches towards understanding it have been proposed in the literature30. Some
of these proposals consist of using string GUT models, extra matter at intermedi-
ate scales, heavy string threshold corrections, non-standard hypercharge normaliza-
tions, etc. In our case, we will try to obtain this value by using first the existence of
extra matter at the scale MS. We will see that this is not sufficient and, as a conse-
quence, the FI scale must be included. Let us concentrate for the moment on α3 and
α2. Recalling that three generations appear automatically for all the matter in Z3
orbifold scenarios with two Wilson lines, the most natural possibility is to assume
the presence of three light generations of supersymmetric Higgses. This implies that
we have four extra Higgs doublets, n2 = 4, with respect to the case of the MSSM.
Unfortunately, this goes wrong. Whereas α−13 remains unchanged, since the number
of extra triplets is n3 = 0, the line for α
2 is pushed down with respect to the case
of the MSSM. As a consequence, the two couplings cross at a very low scale (≈ 1012
GeV). We could try to improve this situation by assuming the presence of extra
triplets in addition to the four extra doublets. Then the line for α−13 is also pushed
down and therefore the crossing might be obtained for larger scales. However, even
more involved. We have to explain also the weak coupling matrix with the charged leptons. Besides,
in addition to the hierarchies shown above, we have to explain others such as me
September 15, 2021 7:34 WSPC/INSTRUCTION FILE mplareviewMunoz
A kind of prediction from string phenomenology: extra matter at low energy 5
Fig. 1. Unification of the gauge couplings at MGUT ≈ gGUT × 5.27 · 10
17 GeV with three light
generations of supersymmetric Higgses and vector-like colour triplets. In this example we show
one of the four possible patterns of heavy matter in eq. (2), in particular that with a) nFI3 = 0.
The line corresponding to α1 is just one of the many possible examples.
for the minimum number of extra triplets that can be naturally obtained in our
scenario, 3 × {(3, 1) + (3̄, 1)}, i.e. n3 = 6, the “unification” scale turns out to be
too large (≈ 1021 GeV). One can check that other possibilities including more extra
doublets and/or triplets do not work1. Thus, using extra matter at MS we are not
able to obtain the Heterotic String unification scale, since α3 never crosses α2 at
MGUT ≈ gGUT ×5.27 ·1017 GeV. Fortunately, this is not the end of the story. As we
will show now, the FI scale MFI is going to play an important role in the analysis.
In order to determine whether or not the Heterotic String unification scale can
be obtained, we need to know the number of doublets nFI2 and triplets n
3 in our
construction with masses of order the FI scale MFI . It is possible to show that
within the Z3 orbifold with two Wilson lines, three-generation standard-like models
must fulfil the following relation for the extra matter: 2+n2+n
2 = n3+n
3 +12.
Then, it is now straightforward to check that only models with n2 = 4, n3 = 6, and
therefore nFI2 − nFI3 = 12, may give rise to the Heterotic String unification scale1
(other possibilities for n2, n3 do not even produce the crossing of α3 and α2). This
is shown in Fig. 1 for an example with nFI3 = 0, and assuming MS = 500 GeV.
There we are using MFI = 2 · 1016 GeV as will be discussed below.
Note that at low energy we then have (excluding singlets)
3× {(3, 2) + 2(3̄, 1) + (1, 2)}+ 3× {(3, 1) + (3̄, 1) + 2(1, 2)} , (1)
i.e. the matter content of the Supersymmetric SM with three generations of Higgses
and vector-like colour triplets.
Let us remark that in these constructions only the following patterns of matter
September 15, 2021 7:34 WSPC/INSTRUCTION FILE mplareviewMunoz
6 Carlos Muñoz
with masses of order MFI are allowed:
a) nFI3 = 0 , n
2 = 12 → 3× {4(1, 2)} ,
b) nFI3 = 6 , n
2 = 18 → 3× {(3, 1) + (3̄, 1) + 6(1, 2)} ,
c) nFI3 = 12 , n
2 = 24 → 3× {2[(3, 1) + (3̄, 1)] + 8(1, 2)} ,
d) nFI3 = 18 , n
2 = 30 → 3× {3[(3, 1) + (3̄, 1)] + 10(1, 2)} . (2)
Thus for a given FI scale, MFI , each one of the four patterns in eq. (2) will give
rise to a different value for gGUT . Adjusting MFI appropriately, we can always get
MGUT ≈ gGUT ×5.27 ·1017 GeV. In particular this is so for MFI ≈ 2×1016 GeV as
shown in Fig. 1. It is remarkable that this number is within the allowed range for
the FI breaking scale as discussed above. For the pattern in Fig. 1 corresponding
to case a) we have gGUT ≈ 1.1, and therefore MGUT ≈ 5.8 · 1017 GeV.
Of course, we cannot claim to have obtained the Heterotic String unification scale
until we have shown that the coupling α1 joins the other two couplings at MGUT .
The analysis becomes more involved now and a detailed account of this issue can be
found in Ref. 1. Let us just mention that the fact that the normalization constant,
C, of the U(1)Y hypercharge generator is not fixed in these constructions as in the
case of GUTs (e.g., for SU(5), C2 = 3/5) is crucial in order to obtain the unification
with the other couplings.
Summarizing, the main characteristic of this scenario is the presence at low
energy of extra matter. In particular, we have obtained that three generations of
Higgses and vector-like colour triplets are necessary.
Since more Higgs particles than in the MSSM are present, there will be of course
a much richer phenomenology. Note for instance that the presence of six Higgs
doublets implies the existence of sixteen physical Higgs bosons, where eleven of
them are neutral and five charged.
Concerning the three generations of vector-like colour triplets, sayD andD, they
should acquire masses above the experimental limit O(200 GeV). This is possible,
in principle, through couplings with some of the extra singlets with vanishing hy-
percharge, say Ni, which are usually left at low energies, even after the FI breaking.
For example, in the model of Ref. 3, there are 13 of these singlets. Thus couplings
NiDD might be present. From the electroweak symmetry breaking, the fields Ni
a VEV might develop. Note in this sense that the Giudice–Masiero mechanism to
generate a µ term through the Kähler potential is not available in prime orbifolds
as Z3. Thus an interesting possibility to generate VEVs, given the large number of
singlets present in orbifold models, is to consider couplings of the type NiHuHd,
similarly to the Next-to-Minimal Supersymmetric Standard Model (NMSSM). It is
also worth noticing that some of these singlets might not have the necessary cou-
plings to develop VEVs and then their fermionic partners might be candidates for
September 15, 2021 7:34 WSPC/INSTRUCTION FILE mplareviewMunoz
A kind of prediction from string phenomenology: extra matter at low energy 7
right-handed neutrinosf .
For the models studied in Refs. 4, 6 the extra colour triplets have non-standard
fractional electric charge, ±1/15 and ±1/6 respectively. In fact, the existence of this
kind of matter is a generic property of the massless spectrum of supersymmetric
models. This means that they have necessarily colour-neutral fractionally charged
states, since the triplets bind with the ordinary quarks. For example, the model
with triplets with electric charge ±1/6 will have mesons and baryons with charges
±1/2 and ±3/2. On the other hand, the model studied in Ref. 3 has ‘standard’ extra
triplets, i.e. with electric charges ∓1/3 and ±2/3; these will therefore give rise to
colour-neutral integrally charged states. For example, a d-like quark D forms states
of the type uD, uuD, etc.
Let us finally mention that a detailed discussion about the stability of these
charged states, how to solve possible conflicts with cosmological bounds, and their
production modes can be found in Ref. 1.
3. Quark and lepton masses and mixing angles
Crucial ingredients in the above analysis were that all three generations of super-
symmetric Higgses remain light (Hui , H
i ), i = 1, 2, 3, and the FI breaking. And,
precisely, both ingredients favour to obtain the correct Yukawa couplings at the
renormalizable levelg. Namely, having three families of Higgses introduces more
Yukawa couplings, and after the FI breaking some physical particles appear com-
bined with other states, and the Yukawa couplings are modified in a well controlled
way. This, of course introduces more flexibility in the computation of the mass
matrices.
Let us recall that the Z3 orbifold is constructed by dividing R
6 by the [SU(3)]3
root lattice modded by the point group (P ) with generator θ, where the action of
θ on the lattice basis is θei = ei+1, θei+1 = −(ei + ei+1), with i = 1, 3, 5. The
two-dimensional sublattices associated to [SU(3)]3 are shown in Fig. 2. In orbifold
constructions, twisted strings appear attached to fixed points under the point group.
In the case of the Z3 orbifold there are 27 fixed points under P , and therefore there
are 27 twisted sectors. We will denote the three fixed points of each two-dimensional
sublattice as shown in Fig. 2. Thus the three generations arise because in addition to
the overall factor of 3 coming from the right-moving part of the untwisted matter,
the twisted matter come in 9 sets with 3 equivalent sectors on each one. Let us
fLet us remark however, that right-handed neutrino superfields with R-parity breaking couplings
of the type NiHuHd have been proposed recently
31 to solve the µ problem.
gLet us recall that the major problem that one encounters when trying to obtain models with en-
tirely renormalizable Yukawas lies at the phenomenological level, and is deeply related to obtaining
the correct quark mixing. Summarizing the analyses of Refs. 24, 25, for prime orbifolds with the
minimal Higgs content the space selection rules and the need for a fermion hierarchy forces the
fermion mass matrices to be diagonal at the renormalizable level. Thus, in these cases the CKM
parameters must arise at the non-renormalizable level. For analyses of non-prime orbifolds see
Refs. 24, 25, 32.
September 15, 2021 7:34 WSPC/INSTRUCTION FILE mplareviewMunoz
8 Carlos Muñoz
Fig. 2. Two dimensional sublattices (i = 1, 3, 5) of the Z3 orbifold. The fixed point components
are also shown.
suppose that the two Wilson lines correspond to the first and second sublattices.
The three generations correspond to move the third sublattice component (x · o) of
the fixed point keeping the other two fixed.
As mentioned in the Introduction, we must arrange our up-and down-quark
Yukawa couplings in order to have specific off diagonal elements,
HuūLαλ
u uRγ +Hdd̄Lαλ
dRγ . (3)
In principle this property arises naturally in the Z3 orbifold with two Wilson
lines23−26. For example, if the SU(2) doublet Hu corresponds to (o o o), the three
generations of (3,2) quarks to (o o (o, x, ·)) and the three generations of (3̄, 1) up-
quarks to (o o (o, x, ·)), then there are three couplings allowed from the space group
selection rule (the components of the three fixed points in each sublattice must be
either equal or different): λttHut̄LtR associated to (o o o)(o o o)(o o o) with λtt ∼
1, λcuHuc̄LuR associated to (o o o)(o o x)(o o ·) with λcu ∼ e−T5 , and λucHuūLcR
associated to (o o o)(o o ·)(o o x) with λuc ∼ e−T5 . In this simple example one
gets one diagonal Yukawa coupling without suppression factor and two off diagonal
degenerate ones ∼ e−T5 , but other more realistic examples producing the observed
structure of quark and lepton masses and mixing angles can be obtained using three
generations of Higgses26,27.
Let us first study the situation before taking into account the effect of the FI
breaking. Consider for example the following assignments of observable matter to
fixed point components in the first two sublattices;
Q o o uc o o dc x o
Hu o o Hd · o (4)
In this case the up- and down-quark mass matrices, assuming three different radii,
are given by
Mu = gNAu , Md = gNε1A
d , (5)
where g is the gauge coupling constant, N is proportional to the square root of
September 15, 2021 7:34 WSPC/INSTRUCTION FILE mplareviewMunoz
A kind of prediction from string phenomenology: extra matter at low energy 9
volume of the unit cell for the Z3 lattice, and
vu1 v
3 ε5 v
vu3 ε5 v
vu2 ε5 v
1 ε5 v
 , Ad =
vd1 v
3ε5 v
vd3ε5 v
vd2ε5 v
1ε5 v
 . (6)
Here vui , v
i denote the VEVs of the HiggsesH
i , H
i respectively, and εi = 3 e
For example, for T5 ∼ 1.95 one has ǫ5 ∼ 0.05.
The elements in the above matrices can be obtained straightforwardly. For ex-
ample, if the Higgs Hu1 corresponds to (o,o,o), then since the three generations of
(3,2) quarks Q correspond to (o,o,(o,x,·)) and the three generations of (3̄,1) quarks
uc to (o,o,(o,x,·)), there are only three allowed couplings,
(o,o,o)(o,o,o)(o,o,o) ,
(o,o,o)(o,o,x)(o,o,·) ,
(o,o,o)(o,o,·)(o,o,x) .
The corresponding suppression factors are given by 1, ε5, ε5 respectively, and are
associated with the elements 11, 23, 32 in the matrix Mu.
These matrices clearly improve the result obtained with only one Higgs family.
However, it is possible to show that although the observed quark mass ratios and
Cabbibo angle can be reproduced correctly, the 13 and 23 elements of the CKM
matrix cannot be obtained26. Fortunately, this is not the end of the story because
the previous result is modified when one takes into account the FI breaking. In
particular, it will be possible to get the right spectrum and a CKM with the right
form26,27.
As discussed in the Introduction and Section 2, some scalars Ci develop large
VEVs in order to cancel the FI D-term generated by the anomalous U(1). Thus
many particles ξ are expected to acquire a high mass because of the generation of
effective mass terms, and in this way vector-like triplets and doublets and also sin-
glets become heavy and disappear from the low-energy spectrum. This is the type
of extra matter that typically appears in orbifold constructions. The remarkable
point is that the SM matter remain massless, surviving through certain combina-
tions with other states3,4,14. Let us consider the simplest example, a model with
the Yukawa couplings
C1ξ1f , C2ξ1ξ2 , (7)
where f denotes a SM field, ξ1,2 denote two extra matter fields (triplets, doublets
or singlets), and C1,2 are the fields developing large VEVs denoted by 〈C1,2〉 = c1,2.
It is worth noting here that f can be an uc, dc, L, νc or ec field, but not a Q field.
This is because in these orbifold models no extra (3,2) representations are present,
and therefore the Standard Model field Q cannot mix with other representations
through Yukawas.
Clearly the ‘old’ physical particle f will combine with ξ1,2. It is now straightfor-
ward to diagonalise the mass matrix arising from the mass terms in eq. (7) to find
September 15, 2021 7:34 WSPC/INSTRUCTION FILE mplareviewMunoz
10 Carlos Muñoz
two very massive and one massless combination. The latter is given by
f ′ ≡ 1√
|c1|2 + |c2|2
(c∗2f − c∗1ξ2) . (8)
Notice for example that the mass terms (7) can be rewritten as
|c1|2 + |c2|2 ξ1ξ′2,
where ξ′2 ≡ 1√|c1|2+|c2|2 (c1f + c2ξ2). Indeed the unitary combination is the massless
field in eq. (8). The Yukawa couplings and hence mass matrices of the effective low
energy theory are modified accordingly. For example, consider a model where we
begin with a Yukawa coupling HQf . Since we have
|c1|2 + |c2|2
′ + c∗1ξ
2) , (9)
then the ‘new’ coupling (involving the light state) will beh
|c1|2 + |c2|2
HQf ′ .
The situation in realistic models is more involved since the fields appear in three
copies. All these effects modify the mass matrices of the low-energy effective theory
(see Eq. (5)), which, for the example studied in Ref. 26, are now given by
Mu = gNau
, Md = gNε1ad
, (10)
where
A B =
v1ε5β v3ε5 v2α
5β v2 v1α
5β v1ε5 v3α/ε5
 , (11)
and the parameters af , α, and β depend oni c1,2 ǫ1,3, and their possible values are
discussed in Ref. 26. As shown in Ref. 27, for natural values of those parameters
and the VEVs, one can find configurations that obey the electroweak symmetry
breaking conditions, and can account for the correct quark masses and mixings.
In addition to the magnitudes of the CKM matrix elements we also require a CP
violating phase. Although it has been shown that observable CP violation cannot
be obtained at the renormalizable level in odd order orbifolds34,33 for a minimal
Higgs sector, the above matrices having in addition to the ‘mixing’ of states three
families of Higgses avoid this problem35. Thus one possibility here (in addition to
the one already mentioned in footnote i) is to assume that the VEVs of the moduli
hWe should add that the coupling HQξ2, which would induce another contribution to HQf
′, is
not in fact allowed. For this to be the case the fields ξ2 and f would have had to have exactly the
same U(1)n charges. This is not possible since different particles all have different gauge quantum
numbers.
iNote that the ci are in general complex VEVs, and therefore they can give rise to a contribution to
the CP phase. This mechanism to generate the CP phase through the VEVs of the fields cancelling
the FI D-term was used first, in the context of non-renormalisable couplings, in Ref. 24. For a
recent analysis, see Ref. 33.
September 15, 2021 7:34 WSPC/INSTRUCTION FILE mplareviewMunoz
A kind of prediction from string phenomenology: extra matter at low energy 11
(a) (b)
Fig. 3. Feynman diagrams contributing to ∆mK at tree-level. h
s(p) denote scalar (pseudoscalar)
Higgses.
have an imaginary phase, which can occur when the flat moduli directions are lifted
by supersymmetry breaking and find their minimum where the phases are non-
zero36,35,37. Such a phase feeds directly into ε5. It is easy to check that this phase
is physically observable, and leads to a non-zero δ phase for the CKM matrix which
is of order one.
Let us finally mention that the correct masses for charged leptons can be ob-
tained following a similar approach, as discussed in Ref. [26]. For neutrinos this
turns out to be not sufficient, but a see-saw mechanism arising in a natural way in
orbifolds might solve the problem [38].
4. Phenomenological viability of orbifold models with three Higgs
families
The most challenging implication of an extended Higgs sector is perhaps the oc-
currence of tree-level FCNCs mediated by the exchange of neutral Higgs states.
Clearly, having six Higgs doublets (and thus six quark Yukawa couplings) the trans-
formations diagonalising the fermion mass matrices do not diagonalise the Yukawa
interactions. Since experimental data is in good agreement with the SM predictions,
where such an effect is not present, the potentially large contributions arising from
the tree-level interactions must be suppressed in order to have a model which is
experimentally viable. In general, the most stringent limit on the flavour-changing
processes emerges from the small value of the KL −KS mass difference39.
A detailed discussion of FCNCs in multi-Higgs doublet models was presented in
Ref. 40 (see also the references therein). We summarise here some relevant points
and apply the method to the orbifold case27, focusing on the neutral kaon sector
and investigating the tree-level contributions to ∆mK . The latter is simply defined
as the mass difference between the long- and short-lived kaon masses,
∆mK = mKL −mKS ≃ 2
∣MK12
∣ = 2
∣H∆S=2eff
, (12)
where H∆S=2eff is the effective Hamiltonian for the diagrams in Fig. 3. Once all
September 15, 2021 7:34 WSPC/INSTRUCTION FILE mplareviewMunoz
12 Carlos Muñoz
the contributions to MK12 have been taken into account, the prediction of this
orbifold model regarding ∆mK should be compared with the experimental value,
(∆mK)exp ≃ 3.49× 10−12 MeV.
In Ref. 27 the numerical approach was divided in two steps. Firstly, one focus
on the string sector of the model, and for each point in the space generated by the
free parameters of the orbifold (ε5, α
f ), one derives the up- and down-quark mass
matrices and computes the CKMmatrix. Further imposing the conditions associated
with electroweak symmetry breaking, and fixing a value for tanβ, one can then
determine the values of g N and ε1. A secondary step requires specifying the several
Higgs parameters, which must obey the minimum criteria Finally, the last step
comprehends the analysis of how each of the Yukawa patterns constrains the Higgs
parameters in order to have compatibility with the FCNC data. In particular, we
want to investigate how heavy the scalar and pseudoscalar eigenstates are required
to be in order to accommodate the observed value of ∆mK .
Let us summarize the analysis of the orbifold parameter space by commenting on
the relative number of input parameters and number of observables fitted. Working
with the six Higgs VEVs (v
i ), and the orbifold parameters ε1, ε5, α
and αd
one can obtain the correct electroweak symmetry breaking (MZ), as well as the
correct quark masses and mixings (six masses and three mixing angles).
In order to discuss now the tree-level FCNCs, let us remark that the present
orbifold model does not include a specific prediction regarding the Higgs sector.
For instance, we have no hint regarding the value of the several bilinear terms, nor
towards their origin. Concerning the soft breaking terms, the situation is similar.
In the absence of further information, we merely assume that the structure of the
soft breaking terms is the usual one (see Ref. 27 for further details), taking the
Higgs soft breaking masses and the Bµ-terms as free parameters (provided that the
electroweak symmetry breaking and minimisation conditions are verified).
In the absence of orbifold predictions for the Higgs sector parameters, and mo-
tivated by an argument of simplicity, we begin our analysis by considering textures
for the soft parameters as simple as possible. In particular, we arrive to four repre-
sentative cases with the following associated scalar and pseudoscalar Higgs spectra
(a) ms = {82.5, 190.6, 493.9, 515.9, 744.4, 760.2} GeV ;
mp = {186.8, 493.9, 515.9, 744.4, 760.2} GeV .
(b) ms = {84.6, 213.9, 387.4, 560.8, 785.9, 879.1} GeV ;
mp = {215.2, 387.3, 560.5, 785.9, 878.9} GeV .
(c) ms = {83.6, 292.9, 733.6, 785.9, 987.6, 1057.0} GeV ;
mp = {291.1, 733.6, 785.9, 987.6, 1057.0} GeV .
(d) ms = {79.4, 121.5, 296.9, 354.3, 794.6, 808.8} GeV ;
mp = {114.8, 296.9, 353.7, 794.6, 808.8} GeV .
In Fig. 4 we plot the ratio ∆mK/(∆mK)exp versus ε5, for cases (a)-(d), and
tanβ = 5. All the points displayed comply with the bounds from the CKM matrix.
September 15, 2021 7:34 WSPC/INSTRUCTION FILE mplareviewMunoz
A kind of prediction from string phenomenology: extra matter at low energy 13
 0.013  0.014  0.015  0.016  0.017
 0.013  0.014  0.015  0.016  0.017
Fig. 4. ∆mK/(∆mK)exp as a function of ε5 for tanβ = 5. The Higgs parameters correspond to
textures (a)-(d).
From Fig. 4 it is clear that it is quite easy for the orbifold model to accommodate
the current experimental values for ∆mK . Even though the model presents the
possibility of important tree-level contributions to the kaon mass difference, all
the textures considered give rise to contributions very close to the experimental
value. Although (a) and (b) are not in agreement with the measured value of ∆mK ,
their contribution is within order of magnitude of (∆mK)exp. As seen from Fig. 4,
with a considerably light Higgs spectrum (i.e. mh0
< 1 TeV), one is safely below
the experimental bound, as exhibited by cases (c) and (d). This is not entirely
unexpected given the strongly hierarchical structure of the Yukawa couplings (notice
from Eq. (11) that λd21 is suppressed by ε
Let us finally mention that the analysis for other neutral meson systems, Bd,
Bs and D
0, can be carried out in an analogous way27.
Additionally, and given the existence of flavour violating neutral Higgs couplings,
and the possibility of having complex Yukawa couplings, it is natural to have tree-
level contributions to CP violation. In the kaon sector, indirect CP violation is
parameterised by εK . From experiment one has εK = (2.284 ± 0.014) × 10−3. A
comparison of this quantity with the theoretical result in orbifold models can be
found in Ref. 27.
5. Conclusions
We have attacked the problem of the unification of gauge couplings in Heterotic
String constructions. In particular, we have obtained that due to the Fayet-
Iliopoulos scale, α3 and α2 cross at the right scale when a certain type of extra
September 15, 2021 7:34 WSPC/INSTRUCTION FILE mplareviewMunoz
14 Carlos Muñoz
matter is present. In this sense three families of supersymmetric Higgses and vector-
like colour triplets might be observed in forthcoming experiments. The unification
with α1 is obtained if the model has the appropriate normalization factor of the
hypercharge. Let us recall that although we have been working with explicit orb-
ifold examples, our arguments are quite general and can be used for other schemes
where the Standard Model gauge group with three generations of particles is ob-
tained, since extra matter and anomalous U(1)’s are generically present in string
compactifications.
Another advantage of these models is that they naturally predict three gener-
ations, and also that the three generations of Higgs fields give enough freedom to
allow an entirely geometric explanation of masses and mixings. The Fayet-Iliopoulos
mechanism plays also an important role here. Namely, after the gauge breaking some
physical particles appear combined with other states, and the Yukawa couplings are
modified in a well controlled way.
On the other hand, the presence of six Higgs doublets poses the potential prob-
lem of having tree-level FCNCs. By assuming simple textures for the Higgs free
parameters, we have verified for example that the experimental data on the neutral
kaon mass difference can be easily accommodated for a quite light Higgs spectra,
namely mh0
. 1 TeV.
The presence of a fairly light Higgs spectrum, composed by a total of 21 physical
states, may provide abundant experimental signatures at future colliders, like the
Tevatron or the LHC. In fact, flavour violating decays of the form hi → qq̄, or
hi → l+l− may provide the first clear evidence of this class of models.
Acknowledgments
This work was supported in part by the Spanish DGI of the MEC under Proyec-
tos Nacionales FPA2006-05423 and FPA2006-01105; by the Comunidad de Madrid
under Proyecto HEPHACOS, Ayudas de I+D S-0505/ESP-0346; and also by the
European Union under the RTN program MRTN-CT-2004-503369.
References
1. C. Muñoz, ‘A kind of prediction from superstring model building’, JHEP 12 (2001)
015 [arXiv:hep-ph/0110381].
2. L.E. Ibáñez, J.E. Kim, H.P. Nilles and F. Quevedo, ‘Orbifolds compactifications with
three families of SU(3) × SU(2)× U(1)n’, Phys. Lett. B191 (1987) 3.
3. J.A. Casas and C. Muñoz, ‘Three generation SU(3) × SU(2) × U(1)Y models from
orbifolds’, Phys. Lett. B214 (1988) 63.
4. A. Font, L.E. Ibáñez, H.P. Nilles and F. Quevedo, ‘Yukawa couplings in degenerate
orbifolds: towards a realistic SU(3) × SU(2) × U(1) superstring’, Phys. Lett. B210
(1988) 101.
5. J.A. Casas, E.K. Katehou and C. Muñoz, ‘U(1) charges in orbifolds: anomaly cancel-
lation and phenomenological consequences’, Nucl. Phys. B317 (1989) 171.
6. J. Giedt, ‘Spectra in standard-like Z3 orbifold models’, Annals Phys. 297 (2002) 67
[arXiv:hep-th/0108244].
http://arxiv.org/abs/hep-ph/0110381
http://arxiv.org/abs/hep-th/0108244
September 15, 2021 7:34 WSPC/INSTRUCTION FILE mplareviewMunoz
A kind of prediction from string phenomenology: extra matter at low energy 15
7. T. Kobayashi, S. Raby, R.-J. Zhang ‘Searching for realistic 4d string models with a Pati-
Salam symmetry: Orbifold grand unified theories from heterotic string compactification
on a Z6 orbifold’ Nucl. Phys. B704 (2005) 3 [arXiv:hep-ph/0409098].
8. W. Buchmuller, K. Hamaguchi, O. Lebedev, M. Ratz, ‘Supersymmetric standard model
from the heterotic string’, Phys. Rev. Lett. 96 (2006) 121602 [arXiv:hep-ph/0511035];
‘Supersymmetric standard model from the heterotic string (II)’, arXiv:hep-th/0606187.
9. O. Lebedev, H.P. Nilles, S. Raby, S. Ramos-Sánchez, M. Ratz, P.K. Vaudrevange
and A. Wingerter, ‘A mini-landscape of exact MSSM spectra in heterotic orbifolds’,
arXiv:hep-th/0611095.
10. J.E. Kim and B. Kyae, ‘String MSSM through flipped SU(5) from Z12 orb-
ifold’, arXiv:hep-th/0608085; ‘Flipped SU(5) from Z12−I orbifold with Wilson line’,
arXiv:hep-th/0608086.
11. J.E. Kim, J.-H. Kim and B. Kyae, ‘Superstring standard model from Z12−I
orbifold compactification with and without exotics, and effective R-parity’,
arXiv:hep-ph/0702278.
12. V.S. Kaplunovsky, ‘One loop threshold effects in string unification’, Nucl. Phys. B307
(1988) 145, Erratum, ibid. B382 (1992) 436 [arXiv:hep-th/9205070].
13. J.A. Casas and C. Muñoz, ‘Restrictions on realistic superstring models from renor-
malization group equations’, Phys. Lett. B214 (1988) 543.
14. J.A. Casas and C. Muñoz, ‘Yukawa couplings in SU(3) × SU(2) × U(1)Y orbifolds
models’, Phys. Lett. B212 (1988) 343.
15. For a review, see for instance L.E. Ibáñez, ‘Standard Model engineering with inter-
secting branes’, hep-ph/0109082, and references therein.
16. F. Gmeiner, R. Blumenhagen, G. Honecker, D. Lüst and T. Weigand, ‘One in a billion:
MSSM-like D-brane statistics’, JHEP 01 (2006) 004 [arXiv:hep-th/0510170].
17. M.R. Douglas and W. Taylor, ‘The landscape of intersecting brane models’, JHEP 01
(2007) 031 [arXiv:hep-th/0606109].
18. J.A. Casas, M. Mondragon and C. Muñoz, ‘Reducing the number of candidates to
standard model in the Z3 orbifold’, Phys. Lett. B230 (1989) 63.
19. J. Giedt, ‘Completion of standard-like embeddings’, Ann. Phys. B289 (2001) 251
[hep-th/0009104].
20. For previous works using this mechanism, see I. Antoniadis, J. Ellis, S. Kelley and
D.V. Nanopoulos, ‘The price of deriving the standard model from string’, Phys. Lett.
B272 (1991) 31;
I. Antoniadis, G.K. Leontaris and N.D. Tracas, ‘Grand unified string models and low
energy couplings’, Phys. Lett. B279 (1992) 58;
D. Bailin and A. Love, ‘String unification, grand unification and string loop threshold
corrections’, Phys. Lett. B292 (1992) 315;
M.K. Gaillard and R. Xiu, ‘Analysis of running coupling constant unification in string
theory’, Phys. Lett. B296 (1992) 71 [hep-ph/9206206];
A.E. Faraggi, ‘Gauge coupling unification in superstring derived standard-like models’,
Phys. Lett. B302 (1993) 202 [hep-ph/9301268];
K.R. Dienes, A.E. Faraggi, ‘Making ends meet: String unification and low-energy data’,
Phys. Rev. Lett. 75 (1995) 2646 [hep-th/9505018];
K.R. Dienes, A.E. Faraggi, ‘Gauge coupling unification in realistic free fermionic string
models’, Nucl. Phys. B457 (1995) 409 [hep-th/9505046];
C. Bachas, C. Fabre and T. Yanagida, ‘Natural gauge-coupling unification at the string
scale’ Phys. Lett. B370 (1996) 49 [hep-th/9510094];
D. Emmanuel-Costa, P. Fileviez-Perez and R. Gonzalez Felipe, ‘Natural gauge and grav-
itational coupling unification and the superpartner masses’ Phys. Lett. B648 (2007) 60
http://arxiv.org/abs/hep-ph/0409098
http://arxiv.org/abs/hep-ph/0511035
http://arxiv.org/abs/hep-th/0606187
http://arxiv.org/abs/hep-th/0611095
http://arxiv.org/abs/hep-th/0608085
http://arxiv.org/abs/hep-th/0608086
http://arxiv.org/abs/hep-ph/0702278
http://arxiv.org/abs/hep-th/9205070
http://arxiv.org/abs/hep-ph/0109082
http://arxiv.org/abs/hep-th/0510170
http://arxiv.org/abs/hep-th/0606109
http://arxiv.org/abs/hep-th/0009104
http://arxiv.org/abs/hep-ph/9206206
http://arxiv.org/abs/hep-ph/9301268
http://arxiv.org/abs/hep-th/9505018
http://arxiv.org/abs/hep-th/9505046
http://arxiv.org/abs/hep-th/9510094
September 15, 2021 7:34 WSPC/INSTRUCTION FILE mplareviewMunoz
16 Carlos Muñoz
[hep-ph/0610178];
V. Barger, J. Jiang, P. Langacker and T. Li, ‘String scale gauge coupling unification
with vector-like exotics and non-canonical U(1)Y normalization’, hep-ph/0612206
21. S. Hamidi and C. Vafa, ‘Interactions on orbifolds’, Nucl. Phys. B279 (1987) 465.
22. L.J. Dixon, D. Friedan, E. Martinec and S. Shenker, ‘The conformal field theory of
orbifolds’, Nucl. Phys. B282 (1987) 13.
23. L.E. Ibáñez, ‘Hierarchy of quark-lepton masses in orbifolds superstring compactifica-
tion’, Phys. Lett. B181 (1986) 269.
24. J.A. Casas and C. Muñoz, ‘Fermion masses and mixing angles: a test for string vacua’,
Nucl. Phys. B332 (1990) 189, Erratum-ibid. B340 (1990) 280.
25. J.A. Casas, F. Gómez and C. Muñoz, ‘Fitting the quark and lepton masses in string
theories’, Phys. Lett. B292 (1992) 42 [hep-th/9206083].
26. S. A. Abel and C. Muñoz, ‘Quark and lepton masses and mixing angles from super-
string constructions’, JHEP 02 (2003) 010 [arXiv:hep-ph/0212258].
27. N. Escudero, C. Muñoz and A.M. Teixeira, ‘Phenomenological viability of orbifold
models with three Higgs families’, JHEP 07 (2006) 041 [arXiv:hep-ph/0512301].
28. S.L. Glashow and S. Weinberg, ‘Natural conservation laws for neutral currents’, Phys.
Rev. D15 (1977) 1958.
29. E.A. Paschos, ‘Diagonal neutral currents’, Phys. Rev. D15 (1977) 1966.
30. For a review, see K.R. Dienes, ‘String theory and the path to unification: a review of
recent developments’, Phys. Rep. 287 (1997) 447 [hep-th/9602045].
31. D.E. Lopez-Fogliani and C. Muñoz, ‘Proposal for a supersymmetric standard model’,
Phys. Rev. Lett. 97 (2006) 041801 [arXiv:hep-ph/0508297].
32. P. Ko, T. Kobayashi and J.H. Park, ‘Quark masses and mixing angles in het-
erotic orbifold models’, Phys. Lett. B598 (2004) 263 [hep-ph/0406041]; ‘Lepton masses
and mixing angles from heterotic orbifold models’, Phys. Rev. D71 (2005) 095010
[hep-ph/0503029].
33. J. Giedt, “The KM phase in semi-realistic heterotic orbifold models”, Nucl. Phys.
B595 (2001) 3 [Erratum-ibid. B632 (2002) 397] [arXiv:hep-ph/0007193]; “CP viola-
tion and moduli stabilization in heterotic models”, Mod. Phys. Lett. A17 (2002) 1465
[arXiv:hep-ph/0204017].
34. O. Lebedev, ‘The CKM phase in heterotic orbifold models’, Phys. Lett. B521 (2001)
71 [hep-th/0108218].
35. T. Kobayashi and O. Lebedev, ‘Heterotic string backgrounds and CP violation’, Phys.
Lett. B565 (2003) 193 [hep-th/0304212].
36. B. Acharya, D. Bailin, A. Love, W.A. Sabra and S. Thomas, ‘Spontaneous breaking
of CP symmetry by orbifold moduli’, Phys. Lett. B357 (1995) 387 [hep-th/9506143];
D. Bailin, G.V. Kraniotis and A. Love, ‘CP violation by soft supersymmetry breaking
terms in orbifold compactifications’, Phys. Lett. B414 (1997) 269 [hep-th/970524]; ‘CP
violating phases in the CKM matrix in orbifold compactifications’, Phys. Lett. B435
(1998) 323 [hep-th/9805111].
37. O. Lebedev and S. Morris, ‘Towards a realistic picture of CP violation in heterotic
string models’, JHEP 08 (2002) 007 [hep-th/0203246].
38. S. A. Abel, N. Escudero, C. Muñoz and A. M. Teixeira, in preparation.
39. B. McWilliams and L. F. Li, ‘Virtual effects of Higgs particles’, Nucl. Phys. B179
(1981) 62; O. Shanker, ‘Flavor violation, scalar particles and leptoquarks’, Nucl. Phys.
B206 (1982) 253.
40. N. Escudero, C. Muñoz and A. M. Teixeira, ‘Flavor changing neutral currents
in supersymmetric multi-Higgs doublet models’, Phys. Rev. D73 (2006) 055015
[hep-ph/0512046].
http://arxiv.org/abs/hep-ph/0610178
http://arxiv.org/abs/hep-ph/0612206
http://arxiv.org/abs/hep-th/9206083
http://arxiv.org/abs/hep-ph/0212258
http://arxiv.org/abs/hep-ph/0512301
http://arxiv.org/abs/hep-th/9602045
http://arxiv.org/abs/hep-ph/0508297
http://arxiv.org/abs/hep-ph/0406041
http://arxiv.org/abs/hep-ph/0503029
http://arxiv.org/abs/hep-ph/0007193
http://arxiv.org/abs/hep-ph/0204017
http://arxiv.org/abs/hep-th/0108218
http://arxiv.org/abs/hep-th/0304212
http://arxiv.org/abs/hep-th/9506143
http://arxiv.org/abs/hep-th/9805111
http://arxiv.org/abs/hep-th/0203246
http://arxiv.org/abs/hep-ph/0512046
ABSTRACT
  We review the possibility that the Supersymmetric Standard Model arises from
orbifold constructions of the E_8 x E_8 Heterotic Superstring, and the
phenomenological properties that such a model should have. In particular,
trying to solve the discrepancy between the unification scale predicted by the
Heterotic Superstring (g_{GUT}x5.27x10^{17} GeV) and the value deduced from LEP
experiments (2x10^{16} GeV), we will predict the presence at low energies of
three families of Higgses and vector-like colour triplets. Our approach relies
on the Fayet-Iliopoulos breaking, and this is also a crucial ingredient,
together with having three Higgs families, to obtain in these models an
interesting pattern of fermion masses and mixing angles at the renormalizable
lebel. Namely, after the gauge breaking some physical particles appear combined
with other states, and the Yukawa couplings are modified in a well controlled
way. On the other hand, dangerous flavour-changing neutral currents may appear
when fermions of a given charge receive their mass through couplings with
several Higgs doublets. We will address this potential problem, finding that
viable scenarios can be obtained for a reasonable light Higgs spectrum.

<|endoftext|><|startoftext|>
Introduction
The formation and early evolution of stellar clusters occurs in deeply embedded regions
of giant molecular clouds (Lada & Lada 2003). While much has been learned from recent sur-
veys in the infrared (Gutermuth et al. 2005; Muench et al. 2002), the earliest stages of clus-
ter formation will (at least in many cases) be hidden from all but the longest IR/millimeter
wavelengths. Due to the small size scales of young clusters–multiplicity of protostars has
been observed on scales of a few thousand AU (e.g. Megeath, Wilson, & Corbin 2005)–high
angular resolution is necessary to resolve individual objects, particularly in massive star
forming regions which lie at relatively large distances (> 1 kpc). These considerations point
to millimeter-wavelength interferometric observations of thermal dust continuum emission as
an effective means of searching for clusters of young protostars, as the mm emission remains
optically thin up to high column densities (NH∼10
25 cm−2 ).
Located ∼ 1′ north of the luminous infrared cluster S255IR, S255N is a promising tar-
get in the search for young protoclusters. As illustrated in Figure 1, S255N is located in
a complicated region of past and ongoing massive star formation. S255N and S255IR (sat-
urated in this mid-IR Spitzer IRAC image) lie between two large H ii regions, S255 and
S257. Large-scale 12CO and HCN observations show that S255IR and S255N occupy op-
posite ends of a molecular ridge between the H ii regions (Heyer et al. 1989). The total
luminosity of S255N (∼ 1× 105 L⊙) is about twice that of S255IR (Minier et al. 2005), and
single-dish continuum and spectral line observations at submillimeter and millimeter wave-
lengths have established the presence of large column densities of dust and gas toward both
regions (e.g. Richardson et al. 1985; Mezger et al. 1988; Zinchenko, Henning, & Schreyer
1997; Minier et al. 2005). Observations at infrared and radio wavelengths, however, sug-
gest that S255N is the younger of the two regions. For example, S255IR is bright (>
70 Jy) in all infrared bands of the Midcourse Space Experiment (MSX) and contains a
well-developed near-IR cluster of early-B type stars (S255-2 Howard, Pipher & Forrest 1997;
Itoh et al. 2001), a cluster of compact H ii regions (Snell & Bally 1986), and a wealth
of complex H2 emission features (Miralles et al. 1997). In contrast, S255N (also called
S255 FIR1 and G192.60-MM1) is undetected in MSX images at wavelengths shorter than
– 3 –
21 µm (Crowther & Conti 2003), and contains only a single cometary UC H ii region,
G192.584-0.041 (e.g. Kurtz, Churchwell, & Wood 1994).
Additional evidence for protostellar activity in S255N exists in the form of outflow
tracers. For example, two small knots of H2 emission bracket the UC H ii region and may
be tracing an outflow (Miralles et al. 1997). In beamsizes of ∼ 50′′, redshifted CO emission
(Heyer et al. 1989) and highly-blueshifted OH absorption (Ruiz et al. 1992) are also seen
toward the UC H ii region. Finally, 44 GHz (70-61) A
+ Class I methanol masers form a
linear feature extending northeast of the UC H ii region (Kurtz, Hofner, & Álvarez 2004);
masers of this type have been observed in association with molecular outflows in other objects
(Plambeck & Menten 1990; Kurtz, Hofner, & Álvarez 2004).
At this stage, further understanding of the state of star formation in S255N requires
resolving the mm dust continuum emission in order to search for additional compact sources
that may be present in the vicinity of the UC H ii region. The high angular resolution now
available with the Submillimeter Array (SMA)1 makes this goal possible, while the wide
bandwidth allows simultaneous observation of many spectral lines, which are sensitive to a
range of gas conditions across the region. We describe our observations in section 2, present
our results in section 3, and discuss our interpretations in section 4.
A range of distances to S255 and S255N have been used in the literature. At the
extremes, Georgelin, Georglein, & Roux (1973) find an optical distance of 3.2 kpc to each
of the adjacent H ii regions S254 and S257 (located west of S255N), while Hunter & Massey
(1990) find a distance of 1.1 kpc to the optical H ii region S255 (located east of S255N, see
Figure 1) based on spectroscopy of its exciting star. Using a LSR velocity of +9 km s−1
(typical of the centroid velocities of our observed lines), and a Galactic center distance
of 8.5 kpc, we find a kinematic distance to S255N of 2.2 kpc using the rotation curve of
Brand & Blitz (1993). In this paper we adopt a distance of 2.6 kpc for S255N (also see
Moffat et al. 1979; Minier et al. 2005).
1The Submillimeter Array is a joint project between the Smithsonian Astrophysical Observatory and the
Academia Sinica Institute of Astronomy and Astrophysics and is funded by the Smithsonian Institution and
the Academia Sinica.
– 4 –
2. Observations
2.1. Submillimeter Array (SMA)
Our SMA observations toward S255N were obtained on December 06, 2003 in the com-
pact configuration. Six antennas were operational and the projected baseline lengths ranged
from 12 to 62 kλ, resulting in a synthesized beam of 4.′′7 × 2.′′4 (P.A.=−45.65◦). In this
configuration, the interferometer is not sensitive to smooth structures larger than 17′′. The
phase center was 06h12m53s.56, 18◦00′25′′.0 (J2000), and the double-sideband SIS receivers
were tuned to an LO frequency of 222.78 GHz. With a bandwidth of ∼2 GHz, the cor-
relator covered 216.796-218.764 GHz in lower sideband (LSB) and 226.796-228.764 GHz in
upper sideband (USB). The data were resampled to provide uniform spectral resolution of
1.12 km s−1. The zenith opacity as measured by the 225 GHz tipping radiometer at the
Caltech Submillimeter Observatory (CSO) was 0.18 at the beginning of the observations
and fell as low as 0.16. Data recorded late in the observing session, when the opacity had
climbed to 0.3, were not used. The typical system temperature at source transit was 200 K.
The primary beamsize of the 6-meter SMA antennas at this frequency is 56′′.
Initial calibration of the data was performed in Miriad. The gain calibrators were
J0739+016 and J0423-013. J0423-013 was also used for passband calibration. Subsequent
processing was carried out in AIPS. The continuum emission was measured using line-free
channels and removed in the u− v plane. The resulting continuum-only data were then self-
calibrated; these complex gain solutions were also applied to the continuum-subtracted line
data. The absolute position uncertainty is estimated to be 0.′′3 and the amplitude calibration
is accurate to 20%. For maximum sensitivity, both the continuum and line data were imaged
with natural weighting. The rms sensitivity of the continuum image is 4 mJy beam−1 (8
mK) and the rms sensitivity of the line data is 89 mJy beam−1 (170 mK).
2.2. Very Large Array (VLA)
Archival 3.6 cm data from the NRAO2 Very Large Array (VLA) were calibrated and
imaged in AIPS. The observation date was 2003 June 15 and the total time on source
was ∼70 minutes (project code AH819). The VLA was in A-configuration, in which the
interferometer is not sensitive to smooth structures larger than ∼7′′. The bandwidth was
50 MHz in two IFs. The flux calibrator was 3C147, and the gain calibrator was J0613+131.
2The National Radio Astronomy Observatory is a facility of the National Science Foundation operated
under agreement by the Associated Universities, Inc.
– 5 –
The synthesized beam of the 3.6 cm continuum image is 0.′′27× 0.′′23 (P.A.=77.51◦) and the
rms sensitivity is 18 µJy beam−1. Archival 3.6 cm data from the B configuration (in which
the interferometer is not sensitive to smooth structures larger than ∼20′′), observed on 1990
August 27 (project code AM301), and from the the CnB configuration, observed on 2005
June 21 (project code TSTCJC), were also reduced and combined with the A-configuration
data. The resulting image, made with a UV taper at 500 kλ, has a synthesized beam of
1.′′03× 0.′′84 (P.A.=−81.42◦) and a rms sensitivity of 34 µJy beam−1.
Archival 1.3 cm data from the VLA A-configuration (project code AC299) were analyzed
for water maser emission at 22.235 GHz. The observation date was 1991 August 1, total
on-source integration time was ∼10 minutes. The phase center of this observation was
toward the IR cluster ∼ 1′ to the south of S255N, hence a correction for the primary beam
attenuation has been applied to the data. The bandpass calibrator was 3C84 and the flux
calibration was derived assuming a flux density of 2.16 Jy for J0528+134. The synthesized
beam is 0.′′18 × 0.′′16 (P.A.=−56.55◦), the spectral line channel spacing is 0.33 km s−1, and
the rms noise is 0.13 Jy beam−1.
2.3. Spitzer Space Telescope
Mid-infrared images of S255 were obtained with the IRAC camera (Fazio et al. 2004)
on the Spitzer Space Telescope as part of Guaranteed Time Observations program 201 (P.I.
G. Fazio) on 12 March 2004. Integrations of 0.4 s and 10.4 s were taken in the high dynamic
range mode; S255N is not saturated in the longer exposures, and only the 10.4 s exposures
are discussed in this paper. Four 10.4 s exposures covered S255N, for a total integration time
on S255N of 41.6 s. Mosaiced post-BCD 3.6, 4.5, 5.8, and 8.0 µm images, calibrated and
processed using pipeline version S13.2.0, were downloaded from the Spitzer data archive.
2.4. Caltech Submillimeter Observatory
Our submillimeter continuum observations were obtained at the CSO using the Submil-
limeter High Angular Resolution Camera (SHARC), a 3He-cooled monolithic silicon bolome-
ter array of 24 pixels in a linear arrangement (Hunter, Benford & Serabyn 1996; Wang et al.
1996). For a typical dust source with a submillimeter spectral index of ∼ 4, the effective
frequency of the broadband 350 µm filter is 852 GHz and the bandwidth is 103 GHz. An
on-the-fly (OTF) map of S255 was obtained on 21 December 1995 by scanning the array
through the source in azimuth while the secondary mirror was chopping at a rate of 4.1
– 6 –
Hz and a throw of 88′′. Successive scans were made after stepping the array in elevation
by increments of 5′′. Airmass corrections were applied to each scan using the opacity de-
rived from frequent scans of Saturn during the night. The map data were restored with a
NOD2 dual beam restoration algorithm (Emerson, Klein & Haslam 1979) and transformed
into equatorial coordinates. The resulting image was smoothed with a Gaussian to produce
an effective half-power beamsize of 15′′.
3. Results
3.1. Continuum emission
Our 1.3 mm SMA continuum data resolve three distinct sources within the previously-
observed submm/mm clump of S255N (aka S255 FIR1 and G192.60-MM1: Jaffe et al. 1984;
Mezger et al. 1988; Minier et al. 2005). Figure 2 shows the SMA 1.3 mm continuum image
with CSO 350 µm contours superposed. As illustrated in Figure 2, the strongest SMA
1.3 mm emission peak coincides with the CSO 350 µm peak (resolution 15′′). The CSO 350
µm integrated flux density is 575±20 Jy, consistent to within 10% of the value predicted
by the dust spectral energy distribution (SED) models of Minier et al. (2005). The three
mm sources resolved with the SMA are designated SMA1, SMA2, and SMA3 in order of
descending peak intensity. The observed properties of each source (peak intensity, brightness
temperature, integrated flux density, and source size) are listed in Table 1, and the sources
are labeled in Figure 2. The integrated flux densities and source sizes listed in Table 1 were
determined by fitting a single Gaussian component to each SMA source. SMA1 was not well
fit by a single Gaussian, indicating that the observed continuum emission may arise from
multiple sources unresolved by the SMA beam; this issue is discussed further in §4.2. The
total flux density of the three compact mm sources is 0.79±0.16 Jy; stated errors include the
20% uncertainty in flux calibration. This total corresponds to 15±3% of the single-dish flux
density measured by Minier et al. (2005) with the SEST 15m telescope at 1.2mm (resolution
24′′).
Figure 3 compares the morphology of the 1.3 mm dust continuum emission with the
3.6 cm free-free continuum emission from the cometary UC H ii region, G192.584-0.041.
Figure 3a shows the lower-resolution (1.′′03 × 0.′′84) 3.6 cm VLA image superposed on the
SMA 1.3 mm continuum, while Figure 3b shows the high-resolution VLA 3.6 cm image
(0.′′27× 0.′′23), with the positions of the newly-reported water maser (see §3.2) and the Class
I methanol masers detected by Kurtz, Hofner, & Álvarez (2004) indicated.
The ∼ 1′′ resolution 3.6 cm VLA image presented in Figure 3a provides the most detailed
– 7 –
view to date of the diffuse cometary “tail” of the UC H ii region. The integrated flux density
of G192.584-0.041 measured from this image is 26.0±0.1 mJy. Based on our measurement
and published 2 cm integrated flux densities for G192.584-0.041 (Kurtz, Churchwell, & Wood
1994; Rengarajan & Ho 1996), the spectral index from 3.6 cm to 2 cm is ∼-0.1, consistent
with optically thin free-free emission. The flux density is consistent with a single exciting
star of spectral type B0.5 (as determined by Kurtz, Churchwell, & Wood 1994; Snell & Bally
1986). Extrapolating to 1.3 mm, we estimate the free-free contribution of G192.584-0.041 to
the 1.3 mm flux density of SMA1 to be .20 mJy (≤3.5%).
The 3.6 cm VLA image presented in Figure 3b is the highest-resolution cm-wavelength
image of G192.584-0.041 to date. With a resolution of 0.′′27× 0.′′23, the continuum emission
from G192.584-0.041 is resolved into three components (east to west): an arc, a point source,
and an extended feature (the extended “tail” is resolved out in the higher resolution data).
All three of these components overlap with the eastern side of SMA1, but none is coincident
with the mm emission peak, in agreement with the estimate that the free-free contribution
at 1.3 mm is quite small. The arc, which is oriented with its convex side towards the mm
peak, contains the brightest 3.6 cm emission. The 3.6 cm peak is located in the southern
part of the arc, east of the point source, and is offset by 1.′′1 (∼2800 AU) from the location
of the SMA1 mm continuum peak determined by fitting a single Gaussian component. The
3.6 cm point source, which is located west of the arc and faces its concave side, is offset by 1.′′6
(∼4,200 AU) from the SMA1 mm continuum peak. The peak brightness temperature of the
3.6 cm point source is only 122 K at the current resolution. Absent a second radio frequency
image with comparable resolution, it is not currently possible to ascertain the spectral indices
of the individual components. The 3.6 cm images presented in Figure 3(a-b) place strong
limits on the presence of any additional H ii regions in S255N. Other than G192.584-0.041,
no cm-wavelength emission is detected down to a 5σ limit of 90 µJy beam−1 (high-resolution
image).
3.2. Water maser emission
Water maser emission was detected at the position 06h12m53s.71, 18◦00′27.′′6 (J2000),
offset 0.′′9 (∼2,300 AU at 2.6 kpc) to the northeast of the 1.3 mm continuum emission peak
of SMA1, as determined by fitting a single Gaussian component. This is the first report
of water maser emission from S255N. The peak intensity is 2.8 Jy beam−1 (corrected for
primary beam attenuation) at vLSR=+8.5 km s
−1. The line is barely resolved by the 0.33
km s−1 spectral resolution.
The positions of the Class I 44 GHz (70-61) A
+ methanol masers detected by Kurtz, Hofner, & Álvarez
– 8 –
(2004) are marked with crosses in Fig. 3b, which shows the 1.3 mm and 3.6 cm continuum
emission and the newly-reported water maser. Kurtz, Hofner, & Álvarez (2004) estimate an
astrometric uncertainty of 0.′′5 for the CH3OH maser spots, while the absolute astrometry
of the H2O maser is better than 0.
′′1. The position of one of the CH3OH maser spots is
consistent with SMA1, within the astrometric uncertainty. The newly detected water maser
is < 1′′ from two CH3OH masers, and falls into the linear pattern of 44 GHz (70-61) A
CH3OH maser spots that extends northeast from SMA1.
3.3. Line emission
Molecular line emission from H2CO, CH3OH, SiO, CN, DCN, and HC3N is detected in
S255N; the specific transitions, frequencies, and upper state energies are listed in Table 2.
The lines detected in S255N are the same as those detected in the spectral regions covered
by our sidebands by Sutton et al. (1985) in their line survey of the Orion A molecular cloud,
which is similar to our data in spectral resolution (1.3 km/s), rms sensitivity (0.2 K), and
linear size scale (30′′= ∼13,500 AU at 450 pc). Integrated intensity images for CH3OH, SiO,
and H2CO are presented in Figure 4(a-d) and for DCN, HC3N, and CN in Figure 5(a-c).
The distributions of molecular emission observed in S255N fall into two main categories:
CH3OH, H2CO and SiO exhibit emission from multiple locations, while DCN and HC3N are
detected only in the vicinity of SMA1. CN exhibits compact emission towards SMA1, and
is also weakly detected toward SMA3. The CN lines have the lowest Eupper of the observed
transitions, and the CN images show artifacts from large-scale emission resolved out by the
interferometer, suggesting that much of the CN emission originates in an extended, cool
envelope around the compact continuum sources.
As shown in Figure 4(a-d), the spatial distributions of the integrated emission from
H2CO, CH3OH, and SiO are similar to one another. The kinematics of these molecules
are complex, as illustrated in Figures 6 and 7, with multiple spatially and kinematically
distinct components apparent. A finder chart for the positions of the profiles displayed in
Fig. 7 (named after their relative positions with respect to SMA1) is shown in Figure 5(d).
The positions are listed in Table 3. Line centroid velocities, ∆vFWHM , and integrated line
intensities obtained from Gaussian fits to the line profiles at these positions are listed in Table
4. Unless otherwise noted, the fit parameters in Table 4 are for the strongest component in
the spectrum. Fits to SiO line profiles are not included because the SiO line shapes are so
complex.
The strongest molecular emission in S255N lies toward the “SW” position ∼ 6′′ to the
southwest of SMA1 at a peak velocity between ∼ 6 − 8 km s−1 (Figs. 4, 6, 7c, Table 4).
– 9 –
This molecular emission is not coincident with any mm continuum emission and overlaps
the southern edge of the extended “tail” of the UC H ii region (Fig. 4). The line emission
toward the “SW” position is broader than at any other location in S255N, with ∆vFWHM ∼
7 and 9 km s−1 for CH3OH and H2CO, respectively, and shows pronounced blue wings. The
velocity of the H2CO peak is slightly more blueshifted than CH3OH, while SiO is significantly
blueshifted relative to both H2CO and CH3OH (Fig. 7c, Table 4).
The second-brightest region of H2CO, CH3OH, and SiO emission in S255N is located in
the vicinity of SMA1, east of the cometary head of the UCH ii region (Fig. 4). DCN, CN,
and HC3N have their strongest emission in this area (Fig. 5). Spectral line profiles for DCN,
CN, and HC3N are shown in Figure 8 and channel maps for DCN are shown in Figure 9.
Two positionally and kinematically distinct components are evident in H2CO, DCN, CN,
HC3N, and weakly, CH3OH, in the vicinity of SMA1. One component (denoted SMA1-NE)
lies 1.′′21 to the northeast of the SMA1 mm peak at a velocity of ∼ 7 km s−1, and the other
(denoted SMA1-SW) lies 1.′′17 southwest of the SMA1 mm peak at ∼ 11.5 km s−1 (Figs. 6,
7a,b, and 8a,b). SMA1-SW is 1.′′08 east of the 3.6 cm point source.
Interestingly, SMA1-NE and SMA1-SW also show differences in their chemical proper-
ties. For example, H2CO shows nearly equal strength towards both positions, as does HC3N,
while CH3OH is much stronger toward SMA1-SW (Fig. 6, 8). In contrast, DCN and CN are
both significantly stronger toward SMA1-NE (Fig. 8). Some differences in the peak velocities
at the two positions are also apparent amongst species. Relative to the other molecules, SiO
is significantly blueshifted (vLSR< 5 km s
−1) towards both SMA1-SW and SMA1-NE (Fig. 7,
Table 4). DCN is slightly redshifted relative to CN and HC3N toward both positions (Fig. 8,
Table 4). The CN and HC3N lines are also more than twice as broad as those of H2CO or
CH3OH toward SMA1-SW (Table 4).
The H2CO, CH3OH, and SiO integrated intensity peak located ∼5
′′ north of the mm
continuum source SMA2 (Fig. 4, “NW” position in Fig. 5d), is comprised of relatively weak,
broad emission (Fig. 7f). The H2CO and CH3OH lines are narrower than at the SW position,
but broader than at any of the other positions (Table 4). As at the other positions, the peak
of the SiO line profile is blueshifted relative to H2CO and CH3OH (Fig. 7f).
The line emission north and northeast of the SMA1 region (“N” and “NE” positions,
finding chart Fig. 5d) consists of narrow velocity features in CH3OH and H2CO, and, at
the NE position, SiO (Figs. 6 & 7d-e). In contrast to the other positions, at the NE
position CH3OH, H2CO, and SiO have the same velocity, vLSR∼8 km s
−1(Fig. 7d, Ta-
ble 4). The velocity of the H2CO and CH3OH emission at the N position is similar to
the vLSR∼11.5 km s
−1 component toward SMA1-SW (Table 4). The H2CO and CH3OH
lines are narrow, and the CH3OH peak is slightly blueshifted relative to H2CO. Broad and
– 10 –
weak blueshifted SiO emission is also detected at this position (Table 4, Fig. 7e).
3.4. Spitzer Space Telescope IRAC Observations
Figure 10 shows a three-color IRAC image (red 8.0 µm, green 4.5 µm, blue 3.6 µm) of
S255N, overlaid with contours of the 1.3 mm continuum emission (yellow) and the 3.6 cm
continuum emission (white). The positions of the Class I CH3OH masers reported by
Kurtz, Hofner, & Álvarez (2004) and the newly-reported water maser are marked with crosses.
Most of the observed mid-IR emission is offset to the northwest of the UC H ii region, and this
diffuse emission appears in all IRAC bands. An exception is the linear, green 4.5 µm emis-
sion feature that extends NE from the SMA1 mm continuum peak. No mid-IR emission is
associated with either SMA2 or SMA3, indeed these two positions are notably absent of IR
emission.
4. Discussion
4.1. Mass estimates from the dust emission
With an estimate of the dust temperature, we can estimate the masses of the compact
dust sources SMA1, SMA2, and SMA3 using a simple isothermal model of optically thin
dust emission (Beltrán et al. 2006):
Mgas,thin =
R Fν D
B(ν, Td) κν
where R is the gas-to-dust mass ratio (assumed to be 100), Fν is the observed flux density,
D is the distance to the source, B(ν, Td) is the Planck function, and κν is the dust mass
opacity coefficient. At 1.3 mm, the value of κ for gas densities of 106-108 cm−3 does not
differ much for grains with thick or thin ice mantles; we adopt a value of κ1.3mm=1 cm
2 g−1
for all of the compact mm sources (Ossenkopf & Henning 1994). The assumption of low
optical depth is justified by the low observed millimeter brightness temperatures (Table 1),
however, for highest accuracy we have made the small correction to our derived masses for
non-zero optical depth using the formula: Mgas = Mgas,thinτ/(1− e
−τ ).
Previous determinations of the dust temperature and mass in S255N have relied on
fitting multiple components to the (unresolved) mid-IR to mm SED. Minier et al. (2005) fit
a hot, compact core (T = 106 K, diameter = 400 AU) and an extended warm envelope (T =
44 K, diameter = 58,000AU) to a SED comprised of MSX, IRAS, SCUBA, and SEST data,
– 11 –
assuming a distance of 2.6 kpc. The derived luminosity and gas mass are 1.1× 105 L⊙ and
220 M⊙, respectively. While many of the datapoints in the SED constructed by Minier et al.
(2005) blend emission from all three SMA sources, the very compact hot core implied by
their fits would be unresolved by our SMA beam (∼12,200 × 6,200 AU at 2.6 kpc). Thus, the
hot core temperature of 106 K derived from the SED modeling provides an upper limit for
the dust temperature of the compact SMA sources. The fitted warm envelope temperature
of 44 K is likewise a good lower limit to the temperature of SMA1 since it dominates the
1.3 mm flux, contributing 73% of the total SMA flux density.
Several single dish estimates of the gas temperature are also available. For example,
Effelsberg 100-m observations of NH3 (1,1) and (2,2) (resolution 40
′′) suggest that the kinetic
temperature of the gas is only Tkin = 23 ± 1 K (Zinchenko, Henning, & Schreyer 1997).
Measurements of CH3C2H(6-5) K=0-3 toward S255N with the Onsala 20-m (resolution 38
by Malafeev et al. (2005) yield Trot = 35 ± 1 K, in better agreement with the extended
warm component derived for the dust. These authors find significantly higher temperatures
using CH3C2H(6-5) than NH3 towards all five sources observed (including S255) and suggest
that methyl acetylene may preferentially trace warmer/denser gas. In any case, since beam
dilution may play a significant role in these single dish estimates, they can only provide a
lower limit to the gas temperatures on SMA sizescales.
From our SMA line data it is clear that SMA1 is the warmest of the compact mm
sources: the two detected transitions with the highest Eupper (both HC3N, Eupper=131 K
and 142 K) are detected only towards SMA1, suggesting this may be a hot core (e.g.
Hatchell, Millar, & Rodgers 1998). DCN is also seen at this warm position. Though DCN is
formed in cold clouds, in this case it can serve as a young hot core tracer since its presence in
this warm region suggests it has recently been liberated from the icy mantles of dust grains
(e.g. Mangum et al. 1991). The HC3N emission is consistent with an upper temperature
limit of ∼100 K; a more quantitative determination is not possible with only two observed
transitions. In contrast, the ratio of the H2CO(30,3–20,2) and H2CO(32,2–22,1) lines is a reli-
able density-independent temperature diagnostic for TK . 50 K, and N(para-H2CO)/∆v .
1013.5 cm−2 (km s−1)−1 (Mangum & Wootten 1993). In this regime, the 30,3–20,2/32,2–22,1 ra-
tio ranges from 15 to 5 for Tk=20 to 50 K. In contrast, the observed line ratios in S255N
are less than 2.5 throughout the imaged region and are smallest (∼ 1.5) toward SMA1,
suggesting that the column density (i.e. opacity) and/or temperature is too high for these
lines to be diagnostic. For the position of SMA1-NE, assuming a temperature of 75 K,
NH2CO∼3.7 × 10
13 cm−2 from the H2CO(30,3–20,2) line and NH2CO∼1.0 × 10
14 cm−2 from
the H2CO(32,2–22,1). This comparison suggests that the 30,3–20,2 line is moderately optically
thick compared to the 32,2–22,1 line, and that the low line ratios are a combination of both
the column density and temperature being higher than the diagnostic range of these two
– 12 –
transitions. Combining this analysis with the SED models and the single dish line results
described above, the allowed temperature range for SMA1 is 40 - 100 K. The resulting ranges
of gas mass, column density, and number density computed for SMA1 are shown in Table 5.
The high derived gas density (nH2∼3-16×10
6 cm−3), also implied by the presence of the
water maser, indicates that the gas and dust temperatures are likely to be well-coupled (e.g.
Kaufman, Hollenbach, & Tielens 1998; Ceccarelli, Hollenbach & Tielens 1996).
Unlike SMA1, SMA2 and SMA3 are not accompanied by significant line emission.
H2CO, CH3OH, and SiO emission are present to the north of SMA2, and CN emission
is detected toward SMA3 (§3.3), but the physical relationship (if any) between this line
emission and the dust continuum sources is unclear. Also unlike SMA1, SMA2 and SMA3
are not associated with mid-IR emission in any IRAC band (§3.4). Instead, SMA2 and SMA3
appear to be cold, dark, young mm cores, without evidence for current star formation. On
the basis of the lack of line and mid-IR emission towards SMA2 and SMA3, we adopt a
lower temperature limit of 20 K and an upper temperature limit of 40 K for these sources.
The corresponding range of masses, column densities, and number densities for SMA2 and
SMA3 are tabulated in Table 5. The mass of each (7-17 M⊙ for SMA2, 6-13 M⊙ for SMA3)
is sufficient to form a low to intermediate mass star. No other cores are detected in the field
to a 5σ upper limit of M < 3M⊙ (at T = 20 K).
4.2. Velocity Structure and Outflows
Figure 10 shows a close-up view of the Spitzer 3-color IRAC image shown in Figure 1.
The brightest mid-IR emission is extended along a NE-SW axis, approximately parallel to
the axis of the UCH ii region, but with an offset to the northwest of ∼ 2′′. The overall
morphology of S255N is consistent with the multi-band bright mid-IR emission tracing the
surface of the UCH ii region, which is less dense to the southwest (of SMA1), as indicated
by the diffuse “tail” of the cometary UC H ii region extending in this direction. However,
the detailed interpretation of the mid-IR emission toward S255N is complicated by the offset
described above, and the sharp cutoff of the 3.6 and 8.0 µm emission along the southeast
boundary of the UCH ii region. Indeed, mid-IR emission is notably absent toward SMA2
and SMA3, as well as toward much of SMA1. A likely scenario for this behavior is absorption
of the mid-IR emission by the high column density mm cores; in this scenario the bulk of
the relatively cold mm cores must be in front of the UCH ii region.
With the exception of SMA1, the molecular emission in S255N is not obviously associ-
ated with any continuum emission, and is therefore unlikely to be centrally heated. However,
as described in §4.1 the H2CO line ratios suggest the gas is warm. Thus, it is likely that
– 13 –
much of this emission is associated with outflow material, although the number of outflows
and their driving source(s) are unclear. Published data on large-scale outflows in the region
(e.g. Miralles et al. 1997; Richardson et al. 1985; Heyer et al. 1989; Ruiz et al. 1992) are un-
fortunately too low in angular resolution to be useful in distinguishing outflows associated
with S255N from those associated with S255IR to the south, and/or the maps are swamped
by emission from a large outflow flowing north from S255IR. Excluding the “SW” position,
the relatively narrow linewidths of these S255N line emission regions suggest that they are
density enhancements within a larger extended flow resolved out by the interferometer. The
relative similarity of the line center velocities further suggests that the outflows are mostly
in the plane of the sky.
The linear morphology of the (green) 4.5 µm emission northeast of SMA1 is suggestive
of an outflow (Fig. 10). Such 4.5 µm nebulosity is a conspicuous feature of IRAC images
of star forming regions. Recent analysis of the massive DR21 outflow, the best-studied ex-
ample, has shown that H2 line emission accounts for ∼50% of the observed 4.5 µm IRAC
flux, and that the outflow morphology is almost identical in IRAC 4.5 µm and narrow-band
2.122 µm (H2 1-0 S(1) line) images (Davis et al. 2007; Smith et al. 2006). In S255N, the
2.122 µm H2 clump S255:H2-3 lies at the base of the 4.5 µm nebulosity; the H2 clump is
also coincident, within reported astrometric uncertainties, with our newly-reported water
maser. Both the 2.122 µm H2 1-0 S(1) line and the 4.6947 µm H2 0-0 S(9) line, identified by
Smith et al. (2006) as the dominant contributor to IRAC band 2, trace moderate-velocity
shocks (Draine, Roberge, & Dalgarno 1983). Recent models by Smith & Rosen (2005) of
shocks in dense protostellar molecular jets predict that the integrated H2 line contribution
to IRAC band 2 will be 5-14 times greater than to IRAC band 1 (3.6 µm), consistent with
the ratio of the emission seen in these bands toward the linear 4.5 µm feature. The 44
GHz Class I methanol masers, five of which lie along the 4.5 µm emission feature, pro-
vide further evidence for its identification as an outflow. Kurtz, Hofner, & Álvarez (2004)
found that masers of this type are often found in association with such outflow tracers
as SiO (IRAS 20126+4104, G31.41+0.31 and G34.26+0.15) and H2 (IRAS 20126+4104).
In G31.41+0.31, the 44 GHz methanol masers are also associated with thermal methanol
emission (Kurtz, Hofner, & Álvarez 2004), which Liechti & Walmsley (1997) found traced
shock/clump interfaces in the DR21 outflow. The parallels between these examples and
S255N strongly suggest that the 44 GHz methanol masers, H2CO, CH3OH, SiO, H2, and
4.5 µm emission extending northeast from SMA1-NE trace a molecular outflow from a pro-
tostar, probably SMA1-NE.
At the SW position, the broad lines with strong blue wings combined with the morphol-
ogy of the H2CO and CH3OH channel maps are consistent with a blueshifted outflow lobe
driven by SMA1-SW. Notably, no 44 GHz methanol masers coincide with the very strong
– 14 –
thermal CH3OH emission of the SW line peak, although elsewhere in S255N the methanol
maser flux densities are loosely correlated with the strength of thermal CH3OH emission.
The absence of masers towards the SW position suggests that the physical conditions are
not appropriate for the collisional pumping of Class I methanol masers (Cragg et al. 1992;
Plambeck & Menten 1990).
4.3. The Nature of SMA1
The complex kinematic behavior of the molecular line emission in the vicinity of SMA1
including SMA1-NE, SMA1-SW, and position “SW” (§3.3) is difficult to explain in the
context of a single protostar. Though SMA1-NE and SMA1-SW could be interpreted as the
blue and red-shifted lobes, respectively, of an outflow, this scenario does not explain the very
blue-shifted emission further to the southwest at position “SW”. Rotation also seems like an
unlikely explanation for the velocity gradient between SMA1-NE and SMA1-SW since the
gradient is parallel to the direction of the two probable outflow regions: the 4.5 µm emission
to the northeast and “SW” to the southwest. Instead, the combination of the chemical and
kinematic differentiation between SMA1-SW and SMA1-NE suggests the presence of two
individual sources, one at +7 km s−1 and one at +11.5 km s−1. To investigate this possibility,
in Figure 11 we show a uniform weighted SMA millimeter continuum image restored with
a beam of 1.′′0 (∼ 3 times smaller than the longest observed baseline), which essentially
reveals the location of the clean components. The localization of clean components into two
main regions in the vicinity of SMA1 suggests the presence of at least two sources separated
by ∼1.′′84 (4800 AU). If these two clean component peaks correspond to real dust sources,
their positions are in good agreement with the two kinematically distinct formaldehyde peaks
(< 0.′′1 and < 0.′′5). The northeast component of the pair is also within 0.′′2 of the water maser.
That two distinct protostars would exist with this separation is reasonable, as multiplicity of
protostars has been observed on scales of <6,000 AU (e.g. Megeath, Wilson, & Corbin 2005).
Although the presence of two protostars is a plausible interpretation, it clearly requires higher
resolution continuum observations for confirmation.
In any case, the molecular line emission from SMA1-NE and SMA1-SW is reminiscent of
hot molecular cores (HMCs), particularly the detection of DCN and HC3N. These molecules–
like CH3OH and H2CO, which also show strong emission towards SMA1-NE and SMA1-SW-
are present in the gas phase in HMCs because they have been evaporated from grain mantles
(e.g. Caselli 2005; Szczepanski et al. 2005). Complex organic molecules such as HCOOCH3,
however, are believed to be “daughter” species, formed in the gas phase by reactions of
“parent” species such as H2CO and CH3OH (Caselli 2005). Thus, while SMA1-NE and
– 15 –
SMA1-SW do not exhibit the truly copious molecular emission observed towards some HMCs
(e.g. Hatchell, Millar, & Rodgers 1998; Schilke et al. 2006), this is consistent with SMA1-NE
and SMA1-SW being very young sources, in which gas-phase hot-core chemistry has not yet
produced abundant complex organic molecules.
The line emission from the SMA1 sources is unusual in that the DCN(3-2) emission is
stronger than HC3N(24–23). By modeling deuterium chemistry, Roberts & Millar (2000)
find that the steady-state abundance of DCN in molecular clouds is a complicated function
of temperature and density (see their Figure 7), but generally higher at low metallicity.
We note this because S255N is located approximately (l ∼ 192◦) in the direction of the
Galactic anticenter, and may have lower metallicity than inner-galaxy star-forming regions
(Daflon & Cunha 2004; Afflerbach et al. 1997). Our SMA data, however, do not allow us
to disentangle the effects of abundance and excitation on the strengths of DCN and HC3N
emission.
The geometry of a HMC located a few arcseconds ahead of the vertex of a cometary
UCH ii region has been seen in other objects observed at high angular resolution, and this
notable configuration has led to much discussion on the energy source responsible for these
HMCs. The best-studied case for external heating is G34.26+0.15, in which Watt & Mundy
(1999) and Mookerjea et al. (2007) argue that HMC emission (characterized by complex ni-
trogen and oxygen-rich molecules) arises in gas heated by component C, the most evolved
of three nearby UCH ii regions. In the absence of extinction, a 1.1 × 104 L⊙ source at
the location of the 3.6 cm point source could heat SMA1-NE to ∼37 K, and SMA1-SW to
∼51 K, consistent only with the lower end of the range of plausible gas temperatures. In
contrast, G29.96-0.02 is the prototype for a HMC located ahead of a cometary UCH ii region
and internally heated by a high mass protostellar object (De Buizer, Osorio, & Calvet 2005;
Gibb, Wyrowski, & Mundy 2003). In G29.96-0.02, HMC emission is coincident with a re-
solved 1.4 mm continuum source, water maser spots, and a mid-IR sub-arcsecond point source
(De Buizer, Osorio, & Calvet 2005; Gibb, Wyrowski, & Mundy 2003; Olmi et al. 2003, and
references therein). In S255N, the arrangement of the 1.3 mm and 3.6 cm sources along
with the presence of water maser emission coincident with the molecular emission closely
resembles the case of G29.96-0.02. We favor the interpretation that one or more sources
younger than the excitation source of the UCH ii region are present and responsible for the
compact dust and molecular line emission from SMA1, consistent with the interpretation of
the hot core emission outlined above.
– 16 –
5. Conclusions
Our multiwavelength observations of S255N reveal significant new details in this lu-
minous star-forming region. While the previously-identified UCH ii region dominates the
cm continuum and mid-IR emission, the 1.3 mm continuum emission has been resolved into
three compact cores (SMA1, SMA2, and SMA3) clustered on scales of 0.1-0.2 pc. Dominated
by dust emission, these cores range in mass from 6 to 35 M⊙. There are no mid-infrared
point source counterparts to any of the dust cores, suggesting an early evolutionary phase.
The spectral line emission at the position of the brightest core, SMA1, is spatially compact
and includes HC3N, CN, DCN, CH3OH, SiO, and H2CO. SMA1 appears to be a developing
hot core offset by a few thousand AU from the UCH ii region. The chemical and kinematic
structure toward SMA1 is suggestive of further multiplicity at these scales. A 4.5 µm linear
feature emanating to the northeast of SMA1 is aligned with a cluster of methanol masers and
likely traces a outflow from a protostar within SMA1. We conclude that S255N is actively
forming a cluster of intermediate to high-mass stars. In addition, we speculate that some of
the missing flux in the SMA continuum image could be in the form of additional compact
low-mass, cold dust cores that lie below the sensitivity limit of our observations (M ∼ 3M⊙
at T = 20 K). Higher-resolution and more sensitive observations are needed to search for ad-
ditional protostars in S255N and other young protoclusters. Resolving individual protostars
in regions like these is a necessary task to determine how dense these young protoclusters
are, and how interactions among protostars in protoclusters may affect the process of star
formation.
This work is based in part on observations made with the Spitzer Space Telescope, which
is operated by the Jet Propulsion Laboratory, California Institute of Technology under a
contract with NASA. This research has made use of NASA’s Astrophysics Data System
Bibliographic Services and the SIMBAD database operated at CDS, Strasbourg, France.
Research at the CSO is funded by the NSF under contract AST96-15025. CJC is supported
by a National Science Foundation Graduate Research Fellowship and acknowledges partial
support from a Wisconsin Space Grant Graduate Fellowship. CJC would like to thank the
SMA and NRAO for student research support.
REFERENCES
Afflerbach, A., Churchwell, E., & Werner, M. W. 1997, ApJ, 478, 190
– 17 –
Beltrán, M. T., Brand, J., Cesaroni, R., Fontani, F., Pezzuto, S., Testi, L., & Molinari, S.
2006, A&A, 447, 221
Brand, J., & Blitz, L. 1993, A&A, 275, 67
Caselli, P. 2005, Cores to Clusters: Star Formation with Next Generation Telescopes, 47
Ceccarelli, C., Hollenbach, D.J., & Tielens, A.G.G.M. 1996, ApJ, 471, 400
Cragg, D.M., Johns, K.P., Godfrey, P.D., Brown, R.D. 1992, MNRAS, 259, 203
Crowther, P. A., & Conti, P. S. 2003, MNRAS, 343, 143
Daflon, S., & Cunha, K. 2004, ApJ, 617, 1115
Davis, C.J., Kumar, M.S.N., Sandell, G., Froebrich, D., Smith, M.D., & Currie, M.J. 2007,
MNRAS, 374, 29
De Buizer, J.M., Osorio, M., Calvet, N. 2005, ApJ, 635, 452
Draine, B.T., Roberge, W.G., & Dalgarno, A. 1983, ApJ, 264, 485
Emerson, D.T., Klein, U. & Haslam, C.G.T. 1979, A&A, 76, 92
Fazio et al. 2004, ApJS, 154, 10
Georgelin, Y.M., Georgelin, Y.P., & Roux, S. 1973, A&A, 25, 337
Gibb, A.G., Wyrowski, F., & Mundy, L.G. 2003, in SFChem 2002: Chemistry as a Diagnostic
of Star Formation, ed. C.L. Curry & M. Fich (Ottawa: NRC Press), 214
Gutermuth, R. A., Megeath, S. T., Pipher, J. L., Williams, J. P., Allen, L. E., Myers, P. C.,
& Raines, S. N. 2005, ApJ, 632, 397
Hatchell, J., Millar, T.J., & Rodgers, S.D. 1998, A&A, 332, 695
Heyer, M. H., Snell, R. L., Morgan, J., & Schloerb, F. P. 1989, ApJ, 346, 220
Howard, E.M., Pipher, J.L. & Forrest, W.J. 1997, ApJ, 481, 327
Hunter, T.R., Benford, D.J. & Serabyn, E. 1996, PASP, 108, 1042
Hunter, D.A. & Massey, P. 1990, AJ, 99, 846
Itoh, Y., Tamura, M., Suto, H., Hayashi, S.S., Murakawa, K., Oasa, Y., Nakajima, Y., Kaifu,
N., Kosugi, G., Usuda, T., & Doi, Y. 2001, PASJ, 53, 495
– 18 –
Jaffe, D. T., Davidson, J. A., Dragovan, M., & Hildebrand, R. H. 1984, ApJ, 284, 637
Kaufman, M.J., Hollenbach, D.J., & Tielens, A.G.G.M. 1998, ApJ, 497, 276
Kurtz, S., Hofner, P., & Álvarez, C. V. 2004, ApJS, 155, 149
Kurtz, S., Churchwell, E., & Wood, D. O. S. 1994, ApJS, 91, 659
Lada, C. J., & Lada, E. A. 2003, ARA&A, 41, 57
Liechti, S. & Walmsley, C.M. 1997, A&A, 321, 625
Malafeev, S. Y., Zinchenko, I. I., Pirogov, L. E., & Johansson, L. E. B. 2005, Astronomy
Letters, 31, 239
Mangum, J. G., Plambeck, R. L., & Wootten, A. 1991, ApJ, 369, 169
Mangum, J.G., & Wootten, A. 1993, ApJS, 89, 123
Megeath, S. T., Wilson, T. L., & Corbin, M. R. 2005, ApJ, 622, L141
Mezger, P. G., Chini, R., Kreysa, E., Wink, J. E., & Salter, C. J. 1988, A&A, 191, 44
Minier, V., Burton, M. G., Hill, T., Pestalozzi, M. R., Purcell, C. R., Garay, G., Walsh,
A. J., & Longmore, S. 2005, A&A, 429, 945
Miralles, M. P., Salas, L., Cruz-Gonzalez, I., & Kurtz, S. 1997, ApJ, 488, 749
Moffat, A. F. J., Jackson, P. D., & Fitzgerald, M. P. 1979, A&AS, 38, 197
Mookerjea, B., Casper, B., Mundy, L.G., & Looney, L.W. 2007, astroph/#0701827
Muench, A. A., Lada, E. A., Lada, C. J., & Alves, J. 2002, ApJ, 573, 366
Olmi, L., Cesaroni, R., Hofner, P., Kurtz, S., Churchwell, E., & Walmsley, C.M. 2003, A&A,
407, 225
Ossenkopf, V., & Henning, Th. 1994, A&A, 291, 943
Plambeck, R. L., & Menten, K. M. 1990, ApJ, 364, 555
Rengarajan, T. N., & Ho, P. T. P. 1996, ApJ, 465, 363
Richardson, K. J., White, G. J., Gee, G., Griffin, M. J., Cunningham, C. T., Ade, P. A. R.,
& Avery, L. W. 1985, MNRAS, 216, 713
– 19 –
Roberts, H., & Millar, T.J. 2000, A&A, 361, 388
Ruiz, A., Rodriguez, L. F., Canto, J., & Mirabel, I. F. 1992, ApJ, 398, 139
Schilke, P., Comito, C., Thorwirth, S., Wyrowski, F., Menten, K. M., Güsten, R., Bergman,
P., & Nyman, L.-Å. 2006, A&A, 454, L41
Smith, H.A., Hora, J.L., Marengo, M., & Pipher, J.L. 2006, ApJ, 645, 1264
Smith, M.D., & Rosen, A. 2005, MNRAS, 357, 1370
Snell, R. L., & Bally, J. 1986, ApJ, 303, 683
Sutton, E.C., Blake, G.A., Masson, C.R., & Phillips, T.G. 1985, ApJS, 58, 341
Szczepanski, J., Wang, H., Doughty, B., Cole, J., & Vala, M. 2005, ApJ, 626, L69
Wang, N., Hunter, T.R., Benford, D.J., Serabyn, E., Lis, D.C., et al. 1996, Applied Optics,
35, 6629
Watt, S., & Mundy, L. G. 1999, ApJS, 125, 143
Zinchenko, I., Henning, T., & Schreyer, K. 1997, A&AS, 124, 385
This preprint was prepared with the AAS LATEX macros v5.2.
– 20 –
Table 1. Properties of millimeter continuum sources in S255N
Source J2000 coordinates I1.3mm Size
a F1.3mm
α (h m s) δ (◦ ′ ′′) (Jy/b) [ ′′ × ′′ (◦)] (Jy) (K)
SMA1 06 12 53.67 +18 00 26.9 0.29 3.9×2.0 (17.4) 0.58± 0.12 1.84
SMA2 06 12 52.97 +18 00 31.9 0.09 2.2×1.6 (97.5) 0.12± 0.02 0.82
SMA3 06 12 53.69 +18 00 18.5 0.05 5.2× <1.3 (169.6) 0.09± 0.02 0.33
Total 0.79± 0.16
aDeconvolved source size determined by fitting a single Gaussian component to each
source. The SMA beam is 4.′′7× 2.′′4 (P.A.=−45.65◦).
bUncertainties include 20% calibration uncertainty.
cBrightness temperature computed using the Rayleigh-Jeans approximation.
– 21 –
Table 2. Molecular species and transitions observed in S255N
Species Transition Frequency Eupper/k
(GHz) (K)
SiO 5→4 217.104980 31.2
DCN 3→2 217.238538 20.9
H2CO 30,3 → 20,2 218.222192 21.0
HC3N 24→23 218.324723 131
CH3OH 4+2,2,0 → 3+1,2,0 218.440050 45.5
H2CO 32,2 → 22,1 218.475632 68.1
H2CO 32,1 → 22,0 218.760066 68.1
CNa,b 20,3,3 → 10,2,2 226.874191 16.4
CNa,c 20,3,4 → 10,2,3 226.874781 16.4
CNa 20,3,2 → 10,2,1 226.875896 16.4
CN 20,3,2 → 10,2,2 226.887420 16.4
CN 20,3,3 → 10,2,3 226.892128 16.4
HC3N 25→24 227.418905 142
aThese components are blended in our spectra.
bThe CN hyperfine components at frequencies
lower than this one lie outside of our observed band-
pass.
cThis transition is used to set the velocity scale
for CN in Fig 8.
– 22 –
Table 3. Spectral line positions
J2000 coordinates
Name α (h m s) δ (◦ ′ ′′)
SMA1-NE 06 12 53.73 +18 00 27.8
SMA1-SW 06 12 53.64 +18 00 25.4
SW 06 12 53.45 +18 00 23.0
NE 06 12 53.76 +18 00 33.2
N 06 12 53.70 +18 00 39.8
NW 06 12 52.86 +18 00 36.2
Table 4. Fitted line properties
H2CO(30,3–20,2) CH3OH DCN HC3N(24-23)
Position Center Width
Svdv Center Width
Svdv Center Width
Svdv Center Width
km s−1 km s−1 Jy b−1*km s−1 km s−1 km s−1 Jy b−1*km s−1 km s−1 km s−1 Jy b−1*km s−1 km s−1 km s−1 Jy b−1*km s−1
SMA1-NE 6.9(0.3)a 5.9(0.6)a 5.5(0.7)a 6.6(0.3)c 3.4(0.9)c 0.9(0.3)c 8.2(0.1) 3.3(0.2) 5.2(0.4) 6.3(0.3)a 3.5(0.8)a 1.9(0.6)a
SMA1-SW 12.1(0.1) 2.9(0.2) 4.2(0.3) 11.2(0.1) 2.4(0.2) 3.1(0.4) 10.7(0.3)b 5.6(0.7)b 3.1(0.5)b 9.5(0.8)a 11.2(2.1)a 3.0(0.7)a
SW 6.3(0.1) 9.2(0.4) 24.3(1.2) 7.8(0.1) 6.9(0.3) 13.4(0.8) · · · · · · · · · · · · · · · · · ·
NE 8.3(0.1) 2.9(0.3) 2.4(0.4) 8.3(0.2) 2.5(0.4) 1.6(0.4) · · · · · · · · · · · · · · · · · ·
N 11.5(0.1) 2.6(0.3) 2.4(0.3) 10.1(0.2) 2.5(0.5) 1.5(0.4) · · · · · · · · · · · · · · · · · ·
NW 8.9(0.2) 5.8(0.6) 5.3(0.7) 8.3(0.2)a 5.7(0.5)a 3.7(0.5)a · · · · · · · · · · · · · · · · · ·
aNot well fit by a single Gaussian.
bGaussian fit encompasses two blended components.
cParameters for second-strongest velocity component. Strongest component is very similar to that towards SMA1-SW.
– 24 –
Table 5. Range of estimated masses of dust cores in S255N
κ Tdust τdust M NH2
b nH2
Sourcea (cm2 g−1) (K) (1.3mm) (M⊙) (10
23cm−2) (106cm−3)
SMA1c 1 40-100 0.04-0.02 35-13 10.5-3.8 15.9-5.8
SMA2 1 20-40 0.02-0.01 17-7 5.0-2.2 7.6-3.3
SMA3 1 20-40 0.01-0.01 13-6 3.9-1.7 5.9-2.5
Total 65-26
aAssumed distance is 2.6 kpc.
bBeam-averaged quantities.The SMA beam is 4.′′7× 2.′′4 (P.A.=−45.65◦).
cThe mass of SMA1 was calculated using the 1.3 mm flux density less
the estimated free-free contribution of 20 mJy.
– 25 –
Fig. 1.— Three-color Spitzer IRAC image of S255N and its surroundings showing 8.0
µm (red), 4.5 µm (green), and 3.6 µm (blue). S255N lies in a complex region of past
and ongoing massive star-formation.
– 26 –
Fig. 2.— Greyscale and solid contours of the SMA 1.3 mm continuum with dotted contours
of the CSO 350 µm continuum superposed. The primary beam of the SMA (56′′ at 218.7
GHz) is indicated with a black circle. Naming conventions for mm and submm sources used
in the literature and in this paper are also indicated. The black 1.3 mm contour levels
are (-3, 3, 7, 15, 31, 47, 63) × 4 mJy beam−1 (the rms noise), observed with a 4.′′7 × 2.′′4
(P.A.=−45.65◦) beam. The dotted 350 µm contour levels are (2, 2.5, 3, 4, 5) × 16.5 Jy
beam−1, resolution 15′′.
– 27 –
Fig. 3.— (a) Black contours of the SMA 1.3 mm continuum, observed with a 4.′′7 × 2.′′4
(P.A.=−45.65◦) beam, with greyscale and dotted contours of the 3.6 cm continuum, observed
with an 1.′′03×0.′′84 beam (P.A.=−81.42◦), superposed. The black 1.3 mm contour levels are
(-3, 3, 7, 15, 31, 47, 63) × 4 mJy beam−1 (the rms noise). The dotted 3.6 cm contour levels
are (-3, 3, 5, 7, 11, 21, 41, 61, 81, 101, 121, 161) × 34 µJy beam−1 (the rms noise). The
SMA beam (black ellipse) and VLA beam (filled black ellipse) are plotted at lower left. (b)
Black contours of the SMA 1.3 mm continuum with greyscale of high-resolution (0.′′27×0.′′23,
P.A.=77.51◦ beam) 3.6 cm continuum superposed. The positions of methanol masers and of
the newly-reported water maser are marked. The VLA beam (filled black ellipse) is plotted
at lower left.
– 28 –
Fig. 4.— (a-d) Colorscale of the 1.3 mm continuum with black integrated intensity contours
for molecules that exhibit emission from multiple positions. The molecular species and
upper state energies are indicated in the lower right of each panel (also see Table 2). The
white contours show 3.6 cm emission from the UC H ii region G192.584-0.041. The blue cross
marks the position of the newly detected water maser. The black integrated intensity contour
levels are (-3, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29) × the rms noise levels: (a)
CH3OH 0.82 Jy beam
−1*km s−1, (b) SiO 1.35 Jy beam−1*km s−1, (c) H2CO 30,3 → 20,2 0.8
Jy beam−1*km s−1, (d) H2CO 32,2 → 22,1 0.76 Jy beam
−1*km s−1. The white 3.6 cm contour
levels are (-3, 3, 5, 7, 11, 21, 41, 61, 81, 101, 121, 161) × 34 µJy beam−1 (the rms noise). The
4.′′7 × 2.′′4 (P.A.=−45.65◦) SMA beam (filled black ellipse) and 1.′′03× 0.′′84 (P.A.=−81.42◦)
VLA beam (filled white ellipse) are shown at lower left in each panel.
– 29 –
Fig. 5.— (a-c) Similar to Figure 4a-d except integrated intensity images are shown for
molecules that are only detected in the vicinity of SMA1. The black integrated intensity
contour levels are (-3, 3, 5, 7, 9, 11) × the rms noise levels: (a) CN 0.9 Jy beam−1*km s−1,
(b) DCN 0.55 Jy beam−1*km s−1, and (c) HC3N 24→23 0.68 Jy beam
−1*km s−1. (d) Finding
chart for representative line profiles shown in Figures 7 & 8 and discussed in §3.3.
– 30 –
Fig. 6.— Channel maps of (a) H2CO(30,3–20,2) and (b) CH3OH, showing line emission
(contours) overlaid on the 1.3 mm continuum (greyscale). The contours levels are (1, 2, 3,
4, 5, 6, 7, 8, 9) × 0.28 Jy beam−1. Each panel is labeled with the channel velocity. The
position of the 3.6 cm point source is marked with a white cross.
– 31 –
CH3OH
(d) NE (e) North (f) NW
(c) SW(a) SMA1−NE (b) SMA1−SW
Fig. 7.— Representative line profiles for molecules that exhibit emission from multiple
positions, demonstrating the wide range of velocity and chemical behavior observed. The
positions for which spectra are presented are indicated on Figure 5d and listed in Table 3.
A vertical line is drawn at 10 km s−1 for reference.
(a) SMA1−NE (b) SMA1−SW
Fig. 8.— Representative line profiles for molecules that have their strongest emission towards
SMA1. The positions for which spectra are presented are indicated on Figure 5d and listed
in Table 3. A vertical line is drawn at 10 km s−1 for reference. Note that the vertical scale
is not the same as in Figure 7. For CN, the weaker features offset by -16 and -22 km s−1
from the main feature are due to the hyperfine components (see Table 2).
– 32 –
Fig. 9.— Channel maps of DCN emission (contours) overlaid on the 1.3 mm continuum
(greyscale). Each panel is labeled with the channel velocity. The position of the 3.6 cm
point source is marked with a white cross.
– 33 –
Fig. 10.— Close-up three-color Spitzer IRAC image of S255N showing mid-IR emission
offset to the NW of the UC H ii region with yellow SMA 1.3 mm and black 3.6 cm contours
superposed. The colorscale correspond to: 8.0 µm (red), 4.5 µm (green), and 3.6 µm (blue).
The yellow SMA 1.3 mm continuum contour levels are (3, 7, 15, 31, 47, 63) × 4 mJy beam−1
(the rms noise). The black VLA 3.6 cm continuum contour levels are (3, 5, 7, 11, 21, 41, 61,
81, 101, 121, 161) × 34 µJy beam−1 (the rms noise). Positions of Class I CH3OH masers
from Kurtz, Hofner, & Álvarez (2004) are marked with red crosses, and the position of the
newly-reported water maser is marked with a blue cross.
– 34 –
Fig. 11.— Greyscale image of the high resolution VLA 3.6 cm emission shown in Figure 3b
with SMA 1.3 mm uniform weighted continuum contours superposed. The 1.3 mm image
was restored with a 1′′ beam (∼ 3 times smaller than the longest observed baseline) which
essentially shows regions where clean components are concentrated. Red + symbols show
the 44 GHz methanol maser positions, the blue △ shows the H2O maser position, the green
× symbols show the peak locations of the ∼ 7 (SMA1-NE) and ∼ 11.5 km s−1 (SMA1-SW)
H2CO components, and the black ⋄ shows the location of SMA1 reported in Table 1. The
localization of clean components into two distinct regions in the vicinity of SMA1 suggests
the presence of at least two continuum sources; this result requires confirmation by higher
resolution mm/submm data.
	Introduction
	Observations
	Submillimeter Array (SMA)
	Very Large Array (VLA)
	Spitzer Space Telescope
	Caltech Submillimeter Observatory
	Results
	Continuum emission
	Water maser emission
	Line emission
	Spitzer Space Telescope IRAC Observations
	Discussion
	Mass estimates from the dust emission
	Velocity Structure and Outflows
	The Nature of SMA1
	Conclusions
ABSTRACT
  S255N is a luminous far-infrared source that contains many indications of
active star formation but lacks a prominent near-infrared stellar cluster. We
present mid-infrared through radio observations aimed at exploring the
evolutionary state of this region. Our observations include 1.3mm continuum and
spectral line data from the Submillimeter Array, VLA 3.6cm continuum and 1.3cm
water maser data, and multicolor IRAC images from the Spitzer Space Telescope.
The cometary morphology of the previously-known UCHII region G192.584-0.041 is
clearly revealed in our sensitive, multi-configuration 3.6cm images. The 1.3mm
continuum emission has been resolved into three compact cores, all of which are
dominated by dust emission and have radii < 7000AU. The mass estimates for
these cores range from 6 to 35 Msun. The centroid of the brightest dust core
(SMA1) is offset by 1.1'' (2800 AU) from the peak of the cometary UCHII region
and exhibits the strongest HC3N, CN, and DCN line emission in the region. SMA1
also exhibits compact CH3OH, SiO, and H2CO emission and likely contains a young
hot core. We find spatial and kinematic evidence that SMA1 may contain further
multiplicity, with one of the components coincident with a newly-detected H2O
maser. There are no mid-infrared point source counterparts to any of the dust
cores, further suggesting an early evolutionary phase for these objects. The
dominant mid-infrared emission is a diffuse, broadband component that traces
the surface of the cometary UCHII region but is obscured by foreground material
on its southern edge. An additional 4.5 micron linear feature emanating to the
northeast of SMA1 is aligned with a cluster of methanol masers and likely
traces a outflow from a protostar within SMA1. Our observations provide direct
evidence that S255N is forming a cluster of intermediate to high-mass stars.

<|endoftext|><|startoftext|>
7 Enumerating limit groups
Daniel Groves and Henry Wilton
21st May 2007
Abstract
We prove that the set of limit groups is recursive, answering a
question of Delzant. One ingredient of the proof is the observation
that a finitely presented group with local retractions (à la Long and
Reid) is coherent and, furthermore, there exists an algorithm that
computes presentations for finitely generated subgroups. The other
main ingredient is the ability to algorithmically calculate centralizers
in relatively hyperbolic groups. Applications include the existence of
recognition algorithms for limit groups and free groups.
A limit group is a finitely generated, fully residually free group. Recent
research into limit groups has been motivated by their role in the theory of
the set of homomorphisms from a finitely presented group to a free group, and
in the logic of free groups. This research has culminated in the independent
solutions to Tarski’s problems on the elementary theory of free groups by
Z. Sela (see [21], [22] et seq.) and O. Kharlampovich and A. Miasnikov (see
[12], [13] et seq.). Sela’s work extends to the elementary theory of hyperbolic
groups [19].
We will be entirely concerned with finitely presentable groups. A class
of groups G is recursively enumerable if there exists a Turing machine that
outputs a list of presentations for every group G; it is recursive if, furthermore,
the Turing machine only outputs one presentation from each isomorphism
class of G. T. Delzant asked if the class of limit groups is recursive [20, I.13].
Theorem A (Corollary 3.8) The class of limit groups is recursive.
In [4] and [8, 7] it is shown that the isomorphism problem is solvable for
the class of limit groups. Therefore, if the class of limit groups is recursively
http://arxiv.org/abs/0704.0989v2
enumerable it is recursive. To enumerate limit groups, our approach is to
use the structure theory of limit groups developed in [13]. An equivalent
structure theory is described in [21], which could also be used. Either way,
two problems need to be solved. First, one needs to be able to compute
presentations for finitely generated subgroups of limit groups. We call this
property effective coherence. Secondly, one needs to be able to compute
centralizers of elements in limit groups. To solve the second problem we use
the relatively hyperbolic structure on limit groups found in [9] and [1]. Our
solution to the first problem relies on local retractions.
D. Long and A. Reid [14] defined a group to have local retractions or
property LR if every finitely generated subgroup is a retract of a finite-index
subgroup. A finitely presented group with local retractions is coherent. Fur-
thermore, one can compute presentations for subgroups.
Theorem B (Theorem 2.4) There exists an algorithm that, given a finite
presentation for a group G with local retractions and a finite set of elements
S, outputs a presentation for the subgroup generated by S.
It is a remarkable fact that limit groups are finitely presented. It was
proved in [23] that limit groups have local retractions. There is a lengthier
proof that limit groups are effectively coherent using the theorem, also proved
in [23], that iterated centralizer extensions are coset separable with respect
to their vertex groups.
As an application of Theorem A, in section 4 we prove the following
theorem.
Theorem C (Theorem 4.1) There exists an algorithm that, given as in-
put a presentation for a group G and a solution to the word problem in G,
determines whether or not G is a limit group.
In Corollary 4.3, we deduce the existence of a similar recognition algo-
rithm for free groups (pointed out to us by Gilbert Levitt).
This paper is the first of a series, in which we intend to prove algorith-
mic versions of Sela’s results. Specifically, enumerating limit groups will be
useful in the algorithmic construction of Makanin–Razborov diagrams over
free groups.
Acknowledgements
The authors would like to thank Zlil Sela for many insightful and generous
conversations, and also François Dahmani and Vincent Guirardel for pointing
out Corollary 4.2 to us. Thanks also to Gilbert Levitt for drawing Corollary
4.3 to our attention, and to Martin Bridson for explaining the ideas of the
paragraph before Theorem 3.5. The first author was supported in part by
NSF Grant DMS-0504251.
1 Effective coherence
A finitely generated group is coherent if all of its finitely generated subgroups
are finitely presented. We will be interested in the following algorithmic
version of coherence.
Definition 1.1 A coherent group G is effectively coherent if there exists an
algorithm that, given a finite subset S as input, outputs a presentation for
the subgroup generated by S.
A class G of coherent groups is uniformly effectively coherent if there
exists an algorithm that, given as input a presentation of a group G ∈ G and
a finite set S of elements of G, outputs a presentation for the subgroup of G
generated by S.
An appealing consequence of this property is that, under mild hypothe-
ses, one can decide if a homomorphism to an effectively coherent group is
injective.
Lemma 1.2 If a group G is effectively coherent then there exists an algo-
rithm that, given a presentation for a group H, a solution to the word problem
in H and a homomorphism f : H → G, determines whether f is injective.
Proof. Given a presentation for the image of f and a solution to the word
problem in H , it is easy to check whether f has a well-defined inverse and
hence is injective. Therefore, if G is effectively coherent it is easy to check if
f is injective. �
Remark 1.3 Even without a solution to the word problem in H, there exists
a Turing machine that will confirm in finite time if the homomorphism f is
injective. Indeed, if f is injective then we know what the inverse to f must
be. By effective coherence, it is possible to compute a presentation for the
image of f , and the inverse homomorphism exists if and only if the relations
for f(H) hold in H (under the supposed inverse map). Even though the word
problem for H may be unsolvable, it is straightforward to enumerate the words
which are equal to 1 in H, and if f is a homomorphism then the relations for
f(H) (interpreted as words in the generators for H) will eventually appear
on this list.
However, if the word problem in H is unsolvable then there will in general
be no Turing machine which terminates if the map f is not injective, since
we will not be able to tell, for example, if the group H is the trivial group.
Of course, a finitely generated subgroup of an effectively coherent group
is effectively coherent. If G is a class of groups, denote by S(G) the class of
finitely generated subgroups of groups in G. We are interested in effective
coherence because it allows the property of being recursively enumerable to
pass from G to S(G). Furthermore, uniform effective coherence also passes to
subgroups.
Lemma 1.4 If G is recursively enumerable and uniformly effectively coher-
ent then S(G) is recursively enumerable and uniformly effectively coherent.
Proof. Enumerating the presentations of groups G ∈ G and finite subsets
S ⊂ G, then using uniform effective coherence to compute presentations
for 〈S〉, one enumerates presentations for every group in S(G). So S(G) is
recursively enumerable.
Given a presentation for a group G ∈ S(G) and a finite subset S of G,
we can enumerate groups K ∈ G and homomorphisms f : G → K and check
whether f is an injection using the Turing machine described in Remark
1.3. Since G ∈ S(G) one will eventually find such an injection f . Using the
effective coherence of K, one can now compute a presentation for 〈f(S)〉. So
S(G) is uniformly effectively coherent. �
We approach effective coherence through local retractions.
2 Local retractions
A group G retracts onto a subgroup H if the inclusion map H →֒ G admits
a left-inverse ρ : G → H . The subgroup H is called a retract and the
map ρ is a retraction. Following [14], a group has local retractions if every
finitely generated subgroup is a retract of a finite-index subgroup. This has
immediate consequences for coherence.
Lemma 2.1 If H is a retract of a finitely presented group G then H is
finitely presented.
Proof. The proof of the lemma is a diagram chase. Let ρ : G → H be the
retraction. If B generates H then, since
G = H ker ρ
we can add elements from ker ρ to B to give a (finite) generating set A =
B ∪A′ for G. Furthermore, any finite presentation for G can be modified to
give a finite presentation with generators of this form.
Denote by FX the free group on a set X . Let ρ
′ be the obvious retraction
from FA = FB ∗ FA′ to FB that kills FA′. This gives a commutative square
−−−→ G
−−−→ H
where p and q are the natural surjections FA → G and FB → H respectively.
Denote the inclusion H →֒ G by i and the inclusion FB →֒ FA by i
′. The
lemma follows directly from the claim that ρ′ restricts to a retraction ker p →
ker q.
If l ∈ ker q then p◦ i′(l) = i◦q(l) = 1 so i′(l) ∈ ker p. Likewise, if k ∈ ker p
then q◦ρ′(k) = ρ◦p(k) = 1 so ρ′(k) ∈ ker q. This proves the claim and hence
the lemma. �
Since finite-index subgroups of finitely presented groups are finitely pre-
sented, coherence for finitely presented groups with local retractions follows
immediately.
Proposition 2.2 If a finitely presented group G has local retractions then
G is coherent.
Better still, Lemma 2.1 is effective.
Lemma 2.3 Let G be a finitely presented group with solvable word problem.
There is an algorithm that takes as input a finite presentation for G and a
collection of words which are the images of the generators under a homomor-
phism ρ : G → G that is a retract onto ρ(G), and outputs a presentation for
ρ(G).
Proof. Applying Tietze transformations, the given generating set for G will
eventually be of the form required in the proof of Lemma 2.1, namely the
union of some generators for ρ(G) and some elements of ker ρ, and since G has
solvable word problem we can tell when we have found such a presentation.
By the proof of Lemma 2.1, a presentation for ρ(G) is then obtained by
eliminating all the generators in ker ρ from the presentation of G. �
By [14, Theorem 2.4], groups with local retractions are residually finite
and hence have (uniformly) solvable word problem. Let LR be the class of
finitely presented groups with local retractions.
Theorem 2.4 The class LR is uniformly effectively coherent.
Proof. Given a finite presentation for a group G ∈ LR and a finite collection
of elements S ∈ G, we can enumerate all finite-index subgroups K of G using
the Reidemeister–Schreier Process (see, for instance, [15]). Since G ∈ LR,
there is a finite-index subgroup K of G so that 〈S〉 ⊆ K and so that there
exists a retraction ρ : K → 〈S〉.
We find such a retraction as follows. In parallel, consider each of the
finite-index subgroups of G. Given such a finite-index subgroup K, look
for the elements of S as words in the generators for K. Suppose we have
found a finite-index subgroup K so that 〈S〉 ⊆ K, and a finite presentation
〈X | R(X)〉 of K, with S = {s1(X), . . . , sn(X)} written as words in X
Now search for a collection of words Y in S± with a bijection ρ : X → Y so
that each of the relations of the form R(Y ) holds and so that for each i we
have si(Y ) = si(X). Then the map ρ extends to a retraction ρ : K → 〈S〉.
Since there is a retraction, we will eventually find such a K and Y .
The algorithm of Lemma 2.3 now computes a presentation for 〈S〉. �
3 Enumerating I and L
The class of iterated extensions of centralizers is defined inductively. If G is
a group, g ∈ G and Z(g) is the centralizer of g then an amalgamated free
product
G′ = G ∗Z(g) (Z(g)× Z
is said to be obtained from G by extension of centralizers.
Definition 3.1 The class I of iterated extensions of centralizers is the small-
est class of groups containing all finitely generated free groups and closed
under extension of centralizers. The class of limit groups is defined to be
L = S(I),
the class of finitely generated subgroups of iterated extensions of centralizers.
The usual definition of limit groups is as finitely generated fully residually
free groups.
Definition 3.2 A group G is fully residually free if, for every finite subset
X ⊂ G r 1, there exists a homomorphism to a free group G → F such that
1 /∈ f(X).
A finitely generated group is fully residually free if and only if it is in L, by
a theorem of [13]. Fully residually free groups are residually finite (since free
groups are) and so have solvable word problem. Using the fact that limit
groups are fully residually free, the following fact is well known and easy to
prove.
Lemma 3.3 If G is a limit group and g ∈ G then Z(g) is a free abelian
group.
By Theorem B of [23], limit groups have local retractions. It is clear that
all groups in I are finitely presented.
Corollary 3.4 The class I is uniformly effectively coherent.
By Lemma 1.4, to enumerate limit groups it remains only to enumerate
I. The crucial step is the ability to calculate centralizers.
For this we use the relatively hyperbolic structure of limit groups (found
independently by E. Alibegović [1] and F. Dahmani [9]). See [11] for an intro-
duction to relatively hyperbolic groups (where in Farb’s language we mean
‘relatively hyperbolic with BCP’). Limit groups are torsion-free and hyper-
bolic relative to a finite collection of maximal noncyclic abelian subgroups.
Dahmani [6] provides an algorithm which takes as input a finite presentation
of such a relatively hyperbolic group and outputs a basis for a representative
of each conjugacy class of noncyclic maximal abelian subgroup (Dahmani’s
algorithm takes as input an arbitrary finite presentation, and does not need
to be given the ‘relatively hyperbolic structure’ of the group).
Another important tool will be the universal theory of a group. The ele-
mentary theory of a group G is the set of all sentences in first-order predicate
logic (possibly with constants) that hold in G. For example, G is abelian if
and only if the sentence
∀x, y ∈ G [x, y] = 1
is in the elementary theory of G. A universal sentence is a sentence in the
elementary theory with a single universal quantifier. The universal theory of
G is the set of universal sentences in the elementary theory ofG. Deciding the
truth of universal sentences is equivalent to deciding whether finite systems
of equations and inequations (with constants) have solutions.
In [16], Makanin proved that the universal theory of a free group F is
decidable—that is, there exists an algorithm that, given as input a universal
sentence, determines whether or not it lies in the universal theory of F .
The universal theory of torsion-free relatively hyperbolic groups with abelian
parabolic subgroups is also decidable, by another algorithm of Dahmani [5]
(again the input is any finite presentation for the group, along with the
universal sentence).
There is an alternative approach to calculating centralizers using biauto-
matic structures. It follows from work of Rebbechi [18] that limit groups are
biautomatic, and the algorithm for finding automatic structures described
in [10] can be adapted to find biautomatic structures [3]. In particular, one
can calculate the fellow-traveller constant of the bicombing. Using the ideas
of [2], it is then easy to compute a presentation for the centralizer of an
arbitrary finite subset.
Theorem 3.5 There exists an algorithm that, given as input a presentation
for a group G ∈ I and an element g ∈ G, outputs a minimal set of generators
for Z(g).
Proof. Apply Dahmani’s algorithm from [6] to find a basis for a represen-
tative of each conjugacy class of maximal noncyclic abelian subgroup.
Let g ∈ G. There are two cases to consider: either g is parabolic (which
means conjugate into a noncyclic abelian subgroup) or else g is hyperbolic
(which means g is not parabolic).
It is possible to decide whether or not g is parabolic. This is because
the universal theory of G is decidable [5]. The element is parabolic if and
only if there exists an element h ∈ G so that hgh−1 commutes with each
element of one of the above bases for the noncyclic abelian subgroups. This
is a finite system of equations over G, which we can determine the truth of
by Dahmani’s algorithm from [5].
If g is parabolic, then we will find such an element h, and the conjugates
by h−1 of the basis for the maximal noncyclic abelian subgroup generates
the centralizer of g. In this case we have found a minimal generating set for
Z(g).
If g is hyperbolic then its centralizer is generated by a maximal root of g.
According to D. Osin [17, Theorem 1.16.(3)], it is possible to algorithmically
extract roots from hyperbolic elements of G. On the face of it, Osin’s algo-
rithm needs to be given as input the relatively hyperbolic structure of the
group. However, Dahmani’s algorithm from [6] will find this structure, so we
can make Osin’s algorithm take only the finite presentation as input. There-
fore, if g is hyperbolic we can find a maximal root of g, and this maximal
root is a minimal generating set for Z(g). �
Corollary 3.6 The set I is recursively enumerable.
Combining this with Theorem 3.4 it follows that the set of limit groups
L is recursively enumerable, by Lemma 1.4.
Corollary 3.7 The set of limit groups L is recursively enumerable and uni-
formly effectively coherent.
The results of [4] (see also [8, 7]) show that limit groups have solvable iso-
morphism problem. Hence we can improve recursively enumerable to recur-
sive: we can ensure that the list produced includes at most one presentation
for each isomorphism class of limit groups.
Corollary 3.8 The set of limit groups L is recursive.
On the other hand, by systematically applying Tietze transformations, it
is possible to effectively list all of the finite presentations of all limit groups.
We give a simple application to recognition algorithms here.
4 Recognition algorithms
Theorem 4.1 There exists an algorithm that, given as input a presentation
for a group G and a solution to the word problem in G, determines whether
or not G is a limit group.
Proof. Let P = 〈X | R〉 be the finite presentation defining G. We have
already noted that it is possible to enumerate all finite presentations of limit
groups. Thus if G is a limit group then P will eventually appear on this list.
Suppose then that G is not a limit group. Then G is not fully residually
free, so there is a finite set {g1, . . . , gr} of nontrivial elements of G so that
for any homomorphism φ from G to a free group F , at least one of the gi
is in ker(φ). This property of G can easily be translated into a system of
equations and inequations over F as follows. Consider both the elements
of R and each gi as a word in X
±, and write R = {r1, . . . , rk}. Then the
following sentence encodes the fact that at least one of {g1, . . . , gr} is in the
kernel of any homomorphism from G to F :
∀X ⊂ F
r1(X) = 1∧· · ·∧rk(X) = 1
g1(X) = 1∨· · ·∨gr(X) = 1
. (1)
By Makanin’s algorithm [16], it is possible to decide whether or not universal
sentences are true in a free group. Enumerate finite sets of nontrivial elements
of G (the solution to the word problem allows us to know that the elements
are nontrivial). Now, for each such finite set {r1, . . . , rk}, decide whether the
sentence (1) is true or not. If G is not a limit group, we will eventually find
a finite set for which (1) is true. �
Of course, one cannot recognize limit groups amongst arbitrary finitely
presented groups.
A cyclically pinched group is an amalgamated free product of two free
groups with cyclic amalgamated subgroup. Some, but not all, of these groups
are limit groups. In [20, I.3], Sela asks for necessary or sufficient conditions
for a cyclically pinched group to be a limit group. We do not have an answer
to this question. However, Theorem 4.1 implies that at least the question
has an answer. The following result was pointed to us by François Dahmani
and Vincent Guirardel (its proof contains the core of the proof of Theorem
4.1).
Corollary 4.2 There is an algorithm that takes as input a finite presentation
of a cyclically pinched group and decides whether or not the defined group is
a limit group.
It does not matter whether the input presentation exhibits the cyclically
pinched nature of the group, since by applying some finite number of Ti-
etze transformations it is possible to find such a presentation. Once such
a presentation is found, there is an explicit solution to the word problem.
Therefore Corollary 4.2 follows immediately from Theorem 4.1.
As remarked above, limit groups are torsion-free and hyperbolic relative
to their maximal abelian subgroups. There is an algorithm to distinguish
free groups among such relatively hyperbolic groups; indeed, it is proved in
[7, Theorem 1.4] that there exists an algorithm that computes the Grushko
decomposition from a presentation of such a group. Combining this with
Theorem 4.1, we obtain a similar recognition algorithm for free groups. This
corollary was pointed out to us by Gilbert Levitt.
Corollary 4.3 There exists an algorithm that, given as input a presentation
for a group G and a solution to the word problem in G, determines whether
or not G is free.
One can also deduce a similar result for surfaces. In [8, Theorem D] it
is shown that there exists an algorithm that computes a JSJ decomposition
for a torsion-free, freely indecomposable group that is hyperbolic relative
to its maximal abelian subgroups. In particular, combining this with the
algorithm from [7], one can decide whether or not a limit group is a surface
group. It follows as before that there exists an algorithm that, given as
input a presentation for a group G and a solution to the word problem in G,
determines whether or not G is a (fully) residually free surface group. (The
only surface groups that are not residually free are the fundamental groups
of the non-orientable surfaces of Euler characteristic 1, 0 and -1.)
References
[1] Emina Alibegović. A combination theorem for relatively hyperbolic
groups. Bull. London Math. Soc., 37(3):459–466, 2005.
[2] Martin R. Bridson. On the subgroups of semihyperbolic groups. In
Essays on geometry and related topics, Vol. 1, 2, volume 38 of Monogr.
Enseign. Math., pages 85–111. Enseignement Math., Geneva, 2001.
[3] Martin R. Bridson and Lawrence D. Reeves. On the algorithmic con-
struction of classifying spaces and the isomorphism problem for biauto-
matic groups. Preprint, 2007.
[4] I. Bumagin, O. Kharlampovich, and A. Miasnikov. Isomorphism prob-
lem for finitely generated fully residually free groups. J. Pure and Ap-
plied Algebra, 208(3):961–977, 2007.
[5] François Dahmani. Existential questions in (relatively) hyperbolic
groups. Preprint, 2006.
[6] François Dahmani. Finding relatively hyperbolic structures. Preprint,
2006.
[7] François Dahmani and Daniel Groves. Detecting free splittings in rela-
tively hyperbolic groups. TAMS, to appear.
[8] François Dahmani and Daniel Groves. The isomorphism problem for
toral relatively hyperbolic groups. Preprint, 2005.
[9] François Dahmani. Combination of convergence groups. Geom. Topol.,
7:933–963 (electronic), 2003.
[10] David B. A. Epstein, James W. Cannon, Derek F. Holt, Silvio V. F.
Levy, Michael S. Paterson, and William P. Thurston. Word processing
in groups. Jones and Bartlett Publishers, Boston, MA, 1992.
[11] B. Farb. Relatively hyperbolic groups. Geom. Funct. Anal., 8(5):810–
840, 1998.
[12] O. Kharlampovich and A. Miasnikov. Irreducible affine varieties over a
free group. I. Irreducibility of quadratic equations and Nullstellensatz.
J. Algebra, 200(2):472–516, 1998.
[13] O. Kharlampovich and A. Miasnikov. Irreducible affine varieties over a
free group. II. Systems in triangular quasi-quadratic form and descrip-
tion of residually free groups. J. Algebra, 200(2):517–570, 1998.
[14] D. D. Long and A. W. Reid. Subgroup separability and virtual retrac-
tions of groups. Topology, to appear, 2006.
[15] Wilhelm Magnus, Abraham Karrass, and Donald Solitar. Combinato-
rial group theory: Presentations of groups in terms of generators and
relations. Interscience Publishers [John Wiley & Sons, Inc.], New York-
London-Sydney, 1966.
[16] G. S. Makanin. Decidability of the universal and positive theories of a
free group. Izv. Akad. Nauk SSSR Ser. Mat., 48(4):735–749, 1984.
[17] Denis V. Osin. Relatively hyperbolic groups: intrinsic geometry, alge-
braic properties, and algorithmic problems. Mem. Amer. Math. Soc.,
179(843):vi+100, 2006.
[18] D. Y. Rebbechi. Algorithmic properties of relatively hyperbolic groups.
Thesis, 2003. ArXiv: math/0302245.
[19] Z. Sela. Diophantine geometry over groups VIII: The elementary theory
of a hyperbolic group. Preprint.
[20] Zlil Sela. Diophantine geometry over groups: a list of research problems.
http://www.ma.huji.ac.il/~zlil/problems.dvi.
[21] Zlil Sela. Diophantine geometry over groups. I. Makanin-Razborov dia-
grams. Publ. Math. Inst. Hautes Études Sci., 93:31–105, 2001.
[22] Zlil Sela. Diophantine geometry over groups. II. Completions, closures
and formal solutions. Israel J. Math., 134:173–254, 2003.
[23] Henry Wilton. Hall’s Theorem for limit groups. Preprint, 2006.
Daniel Groves, Mathematics 253-37, California Institute of Tech-
nology, Pasadena, CA 91125
E-mail: groves@caltech.edu
http://arxiv.org/abs/math/0302245
Henry Wilton, Department of Mathematics, 1 University Sta-
tion C1200, Austin, TX 78712-0257
E-mail: henry.wilton@math.utexas.edu
	Effective coherence
	Local retractions
	Enumerating I and L
	Recognition algorithms
ABSTRACT
  We prove that the set of limit groups is recursive, answering a question of
Delzant. One ingredient of the proof is the observation that a finitely
presented group with local retractions (a la Long and Reid) is coherent and,
furthermore, there exists an algorithm that computes presentations for finitely
generated subgroups. The other main ingredient is the ability to
algorithmically calculate centralizers in relatively hyperbolic groups.
Applications include the existence of recognition algorithms for limit groups
and free groups.

<|endoftext|><|startoftext|>
Microsoft Word - Nanocrystallography_w_Figs_Physics_archive.doc
Dynamics of Size-Selected Gold Nanoparticles Studied 
by Ultrafast Electron Nanocrystallography  
Chong-Yu Ruan*, Yoshie Murooka, Ramani K. Raman, Ryan A. Murdick  
Department of Physics and Astronomy 
Michigan State University, East Lansing, MI 48824, USA 
* corresponding author: ruan@pa.msu.edu 
ABSTRACT We report the studies of ultrafast electron nanocrystallography on size-selected Au 
nanoparticles (2-20 nm) supported on a molecular interface. Reversible surface melting, melting, and 
recrystallization were investigated with dynamical full-profile radial distribution functions determined 
with sub-picosecond and picometer accuracies. In an ultrafast photoinduced melting, the nanoparticles 
are driven to a non-equilibrium transformation, characterized by the initial lattice deformations, 
nonequilibrium electron-phonon coupling, and upon melting, the collective bonding and debonding, 
transforming nanocrystals into shelled nanoliquids.  The displasive structural excitation at premelting 
and the coherent transformation with crystal/liquid coexistence during photomelting differ from the 
reciprocal behavior of recrystallization, where a hot lattice forms from liquid and then thermally 
contracts. The degree of structural change and the thermodynamics of melting are found to depend on 
the size of nanoparticle. 
Understanding the phases of materials and their transformations is a fundamental problem, especially 
on the nanometer scale, where thermodynamics and kinetic processes are influenced by local 
environments, including surfaces, microsctructures, and interfacial chemistry1,2. Their elucidation 
requires characterization at the atomistic scale, both in space and time; at nanointerfaces molecular 
sensitivity is necessary. The possibility for a scattering experiment to couple with the high time 
resolution of a femtosecond laser in a pump-probe arrangement makes it a favorable option to study 
structural dynamics, as evident from recent developments3-6. Here we report a nanocrystallographic 
method, based on Ultrafast Electron Crystallography (UEC)7, which allows quantitative studies of local 
structures and transient dynamics of nanoparticles (NPs) dispersed on a molecular interface. The 
implementation is general. Specifically demonstrated here are the studies of reversible photoinduced 
melting and subsequent recrystallization of size-selective Au NPs (2nm, 10nm, 20nm), a prototypical 
system for studying nanophases1 and catalysis2, supported on a molecular surface. The size-dependent 
phase transitions are examined using a more bulk-like NP (20nm, melting point ~ 1300K8) and a much 
smaller one, where surface and confinement play significant roles (2nm, melting point ~ 800K8). By 
achieving spatial and thermal energy isolation of NPs from their environment and from each other, the 
normally irreversible phase transformations become reversible, allowing multi-shot pump-probe 
diffraction to map out their full courses.  Such implementation allows the use of a low-density electron 
pulse to avoid the pulse-broadening effect9 and has high data reproducibility compared with single-shot 
experiments where a much higher density electron pulse is required10. 
The spatial and thermal isolation of the NPs from their environment is achieved by implementing a 
buffer molecular layer, in this case aminosilane, self-assembled on a silicon substrate as shown in Fig. 
1A 11,12, with which, substrate scattering is sufficiently suppressed. Since the NPs are dispersed, the 
diffraction is via transmission, mostly through individual particles rather than multiple particles (or 
aggregates). To highlight the difference, two cases are presented in Fig. 1 (D-I). Without buffering, NPs 
tend to aggregate, as visible in the electron micrograph (D). The diffraction pattern (E) is dominated by 
the substrate, as also evident from the rocking curve (F). With buffering, however, NPs are separated 
from each other on the surface (G), from which the diffraction (H) is predominantly from the NPs and 
the buffering molecular layer, with no indication of Si periodicity in the rocking curve (I).  Samples 
were also examined following the laser irradiation experiment and showed no signs of agglomeration or 
damage. A time-resolved structural study of surface supported Au NPs was conducted earlier using a 
synchrotron X-ray source13. However, because X-ray is much more penetrating, with 5 orders of 
magnitude less scattering power than electron, its interfacial structural resolution is limited by the 
signal-to-noise level, particularly for very small NPs. High-energy electron diffraction, with its short 
wavelength and high surface sensitivity as demonstrated here, has higher structural resolution than 
small angle X-ray diffraction. 
Prior to studying dynamics, the static structure of NPs is analyzed. The static pattern shows Debye-
Scherrer diffraction rings from the Au NPs, while the ordered buffer layer produces Bragg spots, 
primarily in the surface streak regions. The Debye-Scherrer rings are radially averaged into a 1D 
diffraction intensity curve, shown in Fig. 2A. This curve, obtained from the diffraction pattern of 2nm 
NPs, shows isolated peaks, reflecting a crystalline structure, which, from inspection, resembles more 
towards cuboctahedral and/or decahedral structures than an icosahedral one.  Here, s = (4π/λ)sin(θ/2) 
represents the magnitude of the reciprocal space wave-vector of diffracted electrons, with wavelength 
λ=0.069 Å for 30keV electrons, and θ being the electron scattering angle. The deviation from an ideal 
structure can be attributed to different possible conformations and the surface strain associated with 
small particles14. Their cuboctahedra-like crystalline characteristics are more easily seen in the modified 
radial distribution functions (mRDF)15, deduced from a Fourier analysis of 1D diffraction curves, shown 
in Fig. 2B.  All the major peaks above 2.8Å are in close agreement with the Au-Au distance table based 
on a face-center cubic (FCC) motif (Fig. 2C), which constitutes the internal lattice repeat of a 
cuboctahedra. The sensitivity of our technique is sufficient to permit the observation of molecular 
density peaks as well in the mRDF, such as those at 1.5 - 1.7 Å representing the C-C, C-N, and Si-N 
bonds and ~1.1 Å for C-H and N-H bonds.   
To study the dynamics, an ultrashort laser pulse (800 nm, P-polarized, ~40 fs) is used to excite the 
NPs, while the probing electron pulse is delayed relative to the laser pulse to monitor the structural 
evolution (Fig. 1B). To improve temporal resolution, a proximity-coupled optical system allows the 
photogenerated electron beam to be focused to ~5 µm in less than 6 cm from the photocathode with 
1000 or less electrons/pulse in order to remediate the space charge induced broadening effects and to 
reduce the pulse overlap between the pump and probe. Sub-ps accuracy can be readily achieved (Fig. 
1C). Using different excitation fluences (tuned to nonmelting, surface melting, and melting), transient 
responses of atoms in the NPs are determined from dynamical full profile mRDFs, highlighting the local 
dynamics, compared with the global ones based on analyzing Bragg peaks. The dynamics of bonds 
following laser irradiation can be extracted from the mRDF maps, shown in Fig. 3, selected here with 
the surface melting (31 mJ/cm2) and melting (75 mJ/cm2) fluences for 2nm NP. Their differentiation is 
evident from the rapid change in bond densities. In general, a melting is characterized by the 
replacement of sharp 2nd nearest neighbor peaks with more diffusive ones7. Uniquely here, the peak 
density reduction is coupled to the formation of new density peaks at slightly larger distances. This 
redistribution of 2nd nearest neighbor peaks can be used to determine the onset of melting as well as 
recrystallization, defined here as a 1/e drop in peak intensity at ~5Å. Based on this criterion, at 31 
mJ/cm2, we observe no melting. The laser heating causes the NPs to expand with little adjustment of 
bond densities below 1nm. At 18 ps, breaking and forming of bonds beyond 1nm are evident, indicative 
of surface melting. The transition is rapid, within 1-2 ps, and the molten layer lasts for only tens of ps. 
By 40 ps the newly formed long bonds begin to cool and slowly replace the original broken ones. The 
surfaces revert to original crystalline structure in ns timescale.  However, at 75 mJ/cm2, the bond 
densities across the full NP length scale are modified and complete melting occurs. Using the change in 
2nd nearest neighbor peak, we determine the occurrence of melting and recrystallization at 18 ps and 110 
ps respectively. Furthermore, the melting and recrystallization dynamics display vastly different 
characters, nonreciprocal to each other, as seen in the mRDF map. Following the initial expansion of the 
lattice, we can clearly see bonding and debonding emerge already at ~ 12 ps. The depletion of the bond 
density (debonding) around the major peaks (numbered 3,5,7,12 in Fig. 2B) is coupled to the 
enhancement of bond density (bonding) at longer distances, where bonds may or may not have been 
present before. The emerging longer distance peaks, which constitute a shoulder region, gain in density 
towards melting, ultimately smearing out into bands. The coherent bonding and debonding dynamics 
observed at the premelting period and the continued development of the newly formed long distance 
peaks into liquid structures suggests that liquid structures are populated while the particle is still 
relatively cold. This reflects the displasive character of the photo-melting process during which the 
transformation of crystal into liquid is through breaking old bonds and forming new bonds. In a sense, 
this photomelting dynamics resembles a conformation change between minimum energy structures on 
the free energy landscape. In contrast, the recrystallization is much like the reverse of a ‘thermal’ 
melting in which crystal simply thermally expands and then disorders.  The liquid structure of a NP is 
unique, and can be characterized by the reduced coordination number, judging from the reduction of the 
direct bond density, and shell-like mRDF densities. The structure of the liquid is compared with an 
icosahedral NP, which also possesses shell-like structure. The average distances of the liquid shells in 
the mRDF, taken at 40 ps, match well with those of a 10% expanded icosahedral shells with the 
pronounced crystalline peaks being smeared out. This suggests the density of the transient hot 
nanoliquid is reduced compared with a room temperature crystalline structure and the atom-atom 
correlation between liquid shells is lost.  
Closer inspection of the NPs expansion before melting reveals anisotropic movement of the lattice 
depending on the irradiating fluence. By fitting the mRDF density profiles using Gaussian function, we 
follow the time-dependent evolutions of the bond distance, density, and width for bonds at 2.88, 5.00, 
and 7.64 Å. At a low fluence (<31 mJ/cm2), the changes are isotropic with thermal expansion being 
equal for all three distances. However, the deformation of the lattice sets in as the fluence increases.  At 
a threshold fluence of 38 mJ/cm2, right before a full melting occurs, the early time (1-15 ps) anisotropic 
bond movements are evident, representing a combination of shearing motion along (100) direction and 
expansion (Fig. 4A, left panel). Generally, prior to lattice disorder, the nearest neighbor bond  (2.88 Å) 
sharpens while the rest of the longer bonds decay and widen. The transient narrowing at direct bond 
distance suggests a brief reduction of strain in the NP following the impulsive laser excitation. 
Expansion of direct bond continues till 60 ps, indicating uninterrupted transfer of energy into bond 
stretching vibrations.  However, these slower dynamics do not represent the time scale of electron-
lattice equilibration. The intrinsic electron-phonon coupling time for Au NPs is less than 4±1 ps, a limit 
derived based on fitting the Debye-Waller factor from low fluence (15 mJ/cm2) data where no lattice 
disorder or coherent motion is evidently present at short times. Thus the longer period for melting (18-
20 ps) and the lattice expansion (60 ps) reflect the time scales for atomic disorder in the crystal and 
thermal energy relaxation from the initially strongly excited hot phonons to the bond stretching 
vibrations. Link and El-Sayed16 have found time constant of 30 ps for NP shape change from nanorod to 
nanosphere, a time scale comparable to phonon-phonon scattering time.  
Melting of Au NPs has been investigated by other time-resolved techniques also. Plech and coworkers 
have reported the melting transition of Au NPs (100nm) using synchrotron based time-resolved X-ray 
powder diffraction, on the 100 ps time scale (their pulse duration)17.  The sample was irradiated by a 
400 nm fs laser with steadily increasing power, and the phase changes were interpreted based on 
monitoring the deviation of the integrated area under (111) and (200) Bragg peaks from a constant 
value, at a fixed delay of 105 ps. The corresponding lattice temperature is derived based on the lattice 
expansion by monitoring the shift of Bragg peak position. They observe a sub-bulk melting temperature 
(70% of the bulk value), which is unexpected for particle size larger than 30 nm.  They attribute this 
suppression of melting temperature to possible onset of surface melting. The largest lattice expansion 
before melting was determined to be 1.2% (1.82% is expected for bulk melting), and there was no 
indication of any significant crystal anisotropy based on diffraction. In a more recent small-angle X-ray 
scattering (SAXS) study, Plech and coworkers found 15 mJ/cm2 as the melting threshold for 38 nm Au 
NPs, and have shown laser alignment effects just below the melting fluence18. Hartland, Hu, and Sader 
have addressed the melting transition by measuring the vibration frequency of the breathing mode using 
time-resolved spectroscopy, but have found no discontinuity at the melting point19.  They conclude that 
a saturation of light absorption limits the energy that can be transferred to the lattice.  To connect our 
data to these studies, we also inspect the temporal evolution of Bragg peaks in s-space.  At 38 mJ/cm2 
(the threshold fluence for 2nm NPs), we find a rapid decay of intensity (~6 ps for (111) peak to drop by 
1/e). Such a significant change, however, does not correspond to a melting, as shown from our mRDF 
analyses, rather it indicates a breaking of lattice symmetry induced by photo-excitation. This 
deformation is also evident from the anisotropic shifts of different lattice planes – (220) blue-shifts 
while (311) and (331) peaks red-shift. Their associated peak-widths exhibit instantaneous narrowing 
followed by widening, again confirming the coherent change at short times. These shifts are consistent 
with a lattice deformation along (100) direction.  The lattice then expands significantly to a maximum 
of ~1.5% at 60 ps. The ‘lattice’ is found significantly disordered, no longer suitable for s-space 
analyses. However, the disorder is just below the threshold considered as melting according to our 
mRDF analyses. At 75 mJ/cm2 (melting fluence), a rapid drop of (111) intensity to 20% of the original 
level is found to appear at 15 ps, indicating the rapid loss of long-range order. This time scale is close to 
the onset of bonding and debonding observed in the premelting period according to the mRDF analyses.  
For 20nm Au NPs, we found the threshold fluence to be between 15-20 mJ/cm2, consistent with the 
results for 38 nm NPs obtained by Plech and coworkers.  
To understand the thermal energy redistribution, we use the stretch of bonds to gauge the ‘local 
temperature’ of bonds in NPs.  The term ‘local’ used here suggests a temperature based on the 
vibrational sampling of local bonding potential between atoms. The anharmonicity of the bonding 
potential leads to the expansion of the bond, which increases with the vibrational amplitude. Coherent 
motion resulting from impulsive strain at early times from the fs excitation will cause splitting or 
broadening (if unresolved) of peaks or driven anisotropic deformation of the lattice. To this end, coarse-
graining the dynamics with longer time period should be conducted to reflect the average extension of 
the lattice due to thermal (stochastic) energy. This method based on the mRDF analysis allows 
differentiation of inhomogeneity that exists on different length scales, not possible by extracting 
temperature solely based on following the shift of a Bragg peak 17. However, such a definition of 
temperature must not be confused with the temperature of the NP as a whole, which can only be defined 
when the thermal equilibrium is reached. By using the long-time data, where thermalization has been 
established, we can extract the NPs’ true temperatures under different fluences by comparing them to a 
two-temperature model (TTM) 20 (Fig. 4A). To convert the thermal expansion into temperature, we use 
the temperature-dependent thermal expansion coefficient from reference 21 which is valid between 300 
and 1300 K. This thermal expansion coefficient is found to apply for 60nm Au NPs17. Comparing the 
long-time thermal relaxation, which is fit to a TTM, with the short-time heightened lattice expansion 
reveals the hot phonons effect caused by non-equilibrium electron-phonon coupling (Fig. 4A, right 
panel). The existence of hot phonons was recently invoked to explain the non-equilibrium electron-
phonon coupling in low-dimensional systems, such as graphite22 and nanotube23, as well as molecular 
systems24. They are the vibrational modes coupled more directly to the de-excitation of electrons, thus 
gaining higher ‘temperature’ compared with an equilibrated lattice temperature obtained from a TTM. 
These hot phonons produce large amplitude of vibration, thus leading to heightened lattice expansion. 
Because the phonon-phonon interaction time is 30-60 ps, these hot phonons are likely responsible for 
initiating melting and influencing recrystallization, making the photomelting phenomena different from 
a thermal one. Size-dependent effects in the transient heating of NPs are seen, shown in Fig. 4B.  First, 
the transient maximum bond stretch is significantly higher in the 2nm NPs at 75 mJ/cm2 (6% for 2nm, 
3.5% for 20nm), albeit, the thermal temperature is very close in both cases - a fact deduced by 
comparing the equilibrated (∆R/R) data at longer times (from 1-3 ns data, see insets).  Second, 20nm 
NPs have similar maximum bond stretch at 80 mJ/cm2 and 31 mJ/cm2, both leading to melting, but 
differing in their liquid residence time. For 2nm NPs however, melting occurs only at 75 mJ/cm2. These 
results suggest that the particle size plays a role in determining the thermodynamics of NPs. For 20 nm 
NPs, increasing the fluence does not cause a continuous rise in liquid temperature, leading instead to a 
longer liquid residence time, suggesting that latent heat already exists at 20 nm. The lack of a sharp 
transition expected from first-order phase transition reflects the ultrafast nature of transformation. 
However, for 2nm NPs, the temperature continues to rise significantly after melting, suggesting the 
transformation being a second-order phase transition25,26.  
Based on the TTM27, at the irradiating fluence of 31 mJ/cm2, 1.5×1022 e-/cm3 (~24%) are excited in 
the Au NP, whereas at 75 mJ/cm2, 3.25×1022 e-/cm3 (~57%) are excited. At these high fluences, the 
interband transition starts to play a role. The lowest interband transition energy in Au is 1.7 eV28, which 
corresponds to promoting d-electrons (5d) in the vicinity of the X-point of the first Brillouin zone to the 
conduction band (6sp) and is slightly higher than our excitation energy (1.55 eV). However, as 
conduction electrons are strongly excited, their Fermi-Dirac distribution is modified, with part of the 
electronic levels below the Fermi level being emptied, to make way for interband transition. Because of 
this hot electron effect, the contribution of the interband transition increases with fluence. The effect of 
interband transition is manifested in the lattice anisotropic deformation observed at the short times (1-15 
ps). Although, the d holes relaxation will proceed in tens of fs by hole-hole scattering29, the lattice 
deformation likely persist to the ps time scale due to the slow collective motion of atoms responding to 
the modification of the energy landscape caused by the core electron (5d) excitation. The coupling of ps 
lattice deformation to electronic heating in bulk system was also discussed recently by Guo and 
Taylor30. In addition, we find that by using 400 nm excitation this anisotropic deformation is replaced 
by an isotropic one. Because of the high excitation energy, the interband transition is no longer pinned 
to the X-point. These results suggest that the excited energy landscape can be explored by following the 
ultrafast lattice dynamics as a function of the excitation energy. 
In conclusion, using ultrafast electron nanocrystallography, we have mapped out the dynamics of 
liquid-crystalline and crystalline-liquid phase transformations for Au nanoparticles, at and beyond the 
thermodynamic limit. The accurate mRDF determinations of nanostructures allow quantitative studies 
of atomic dynamics with molecular scale resolutions. The size dependence is evident in the change of 
structures and in the extent of melting. The reversible and coherent transformation on the ultrafast time 
scale demonstrates the directed dynamics on the energy landscape of finite systems. Abundant details 
can be further extracted by comparing the dynamical mRDF maps and electron micrographs obtained 
for fluence far beyond the melting threshold, and under different excitation wavelengths. This 
methodology is general and could be implemented to study a wide class of phenomena pertaining to 
nanoscaled materials. 
ACKNOWLEDGMENT  
This work is supported by the US Department of Energy, Office of Basic Energy Sciences, Division 
of Material Sciences and Engineering and the Intramural Research Grant Program at Michigan State 
University. 
REFERENCES 
1. Marks, L.D. Rep. Prog. Phys. 1994, 57, 603. 
2. Haruta, M. Catal. Today 1997, 36, 153. 
3. Zewail, A.H.  Annu Rev. Phys. Chem. 2006, 57, 65. 
4. Rousse, A. ; Rischel, C. ; Gauthier, J.  Rev. Mod. Phys. 2001, 73, 17. 
5. Bargheer, M. ; Zhavoronkov, N. ; Woerner, M. ; Elsaesser, T. ChemPhysChem. 2006, 7, 783. 
6. Chergui, M. ; Bressler, C. Chem. Rev. 2004, 104, 1781. 
7. Ruan, C-Y. ; Vigliotti, F. ; Lobastov, V.A. ; Chen, S. ; Zewail, A.H. Proc. Natl. Acad. Sci. USA 2004, 101, 1123. 
8. Buffat, Ph. ; Borel, J-P. Phys. Rev. A 1976, 13, 2287. 
9. Schelev, M. Ya. ; Richardson, M.C. ; Alcock, A.J.  Appl. Phys. Lett. 1971, 18, 354. 
10. Siwick, B.J. ; Dwyer, J.R. ; Jordan, R.E. ; Miller, R.J.D. Science 2003, 302, 5649. 
11. Sample Preparation: Si(111) wafers were first cleaned by immersing in H2SO4/H2O2 (7:3) for 10 min, followed by 
immersion in 40% NH4F solution in de-ionized water for 20 min, producing H-terminated Si(111) surface. This was 
followed by immersion in NH4OH/H2O2/H2O (1:1:5) at 80°C for 10 min, subsequent immersion in HCl/H2O2/H2O 
(1:1:6) at 80°C for 10 min, followed by drying under high purity (99.9%) dry N2 gas to produce OH-terminated 
Si(111). After each step, the wafer was thoroughly rinsed in de-ionized water (>18Mohm.cm). The wafer was dried 
in high purity N2 gas and immersed in [3-(2-Aminoethylamino)propyl]trimethoxysilane (AEAPTMS, Sigma-
Aldrich) for 20 min, followed by heating at 80°C over dry N2 cover to allow surface functionalization. Dispersion 
and immobilization of Au NP on the wafer was finally achieved by immersing the functionalized Si(111) wafer in a 
colloidal Au NP solution (Ted Pella) and Ethanol/water (2:1) for 2 hrs. Also see Ref. 12. 
12. Sato, T. ; Brown, D. ; Johnson, B.F.G. Chem. Commun. 1997, 11, 1007. 
13. Plech, A. ; Gresillon, S. ; Plessen, G.von ; Scheidt, K. ; Naylor, G. Chem. Phys. 2004, 299, 183. 
14. Cervellino, A. ; Giannini, C. ; Guagliardi, A.  J. Appl. Cryst. 2003, 36, 1148. 
15. Warren, B.E. X-ray Diffraction; Dover publications, Dover ed.: New York, 1990. 
16. Link, S.; El-Sayed, M.A. Int. Rev. Phys. Chem. 2000, 19, 409. 
17. Plech, A.; Kotaidis, V.  Phys. Rev. B 2004, 70, 195423. 
18. Plech, A.;Kotaidis,V.;Lorenc,M.;Boneberg,J. Nature Phys. 2006, 2, 44. 
19. Hartland,G.V.;Hu,M.;Sader.J.E. J. Phys. Chem. B 2003, 107, 7472. 
20. Kaganov, M.I.; Lifshitz, I.M.; Tanatarov, L.V. Zh. Eksp. Teor. Fiz. 1956, 31, 232. [Sov. Phys. JETP 1957, 4, 173]. 
21. Touloukian,Y.S.;Kirby,R.K.;Taylor,R.E.;Desai,P.D. Thermal expansion – Metallic elements and alloys, in 
Thermophysical Properties of Matter Vol. 12; IFI Plenum: New York, 1975. 
22. Kampfrath,T.;Perfetti,L.;Schapper,F.;Frischkorn,C.;Wolf,M. Phys. Rev. Lett. 2005, 95, 187403. 
23. Auer,C.;Schurrer,F.;Ertler,C. Phys. Rev. B 2006, 74, 165409. 
24. Ruan, C-Y. ; Lobastov, V.A.; Srinivasan, R. ; Goodson, B.M. ; Ihee, H. ; Zewail, A.H. Proc. Natl. Acad. Sci. USA 
2001, 98, 7117. 
25. Liu, H.B. ; Ascencio, J.A.; Perez-Alvarez, M.; Yacaman, M.J. Surf. Sci. 2001, 491, 88. 
26. Jellinek, J. ; Beck, T.L. ; Berry, R.S. J. Chem. Phys. 1986, 84, 2783. 
27. Chen, J.K. ; Beraun, J.E.; Tham, C.L. Num.Heat.Trans. A 2003, 44, 705. 
28. Hache, F. ; Ricard, D. ; Flytzanis, C. ; Kreibig, U. Applied. Phys A 1988, 47, 347. 
29. Petek,H.;Ogawa,S. Prog. Surf. Sci. 1997, 56, 239. 
30. Guo,C.;Taylor,A.J. Phys. Rev. B 2000, 62, R11921. 
Figure 1 
Figure 1. (A) Au nanoparticles dispersed on a self-assembled molecular interface (B) Pump-probe 
arrangement of UEC. (C) Zero-of-time determination using the diffraction signals from the 
photomechanical responses of graphite multilayers.  (D) SEM image of 20 nm Au NPs dispersed on the 
surface without proper buffering. (E) Diffraction pattern from (D) showing Bragg spots of silicon 
substrate. (F) Rocking curve analysis gated at central streak in (E) with varying incident angle θi.  (G) 
SEM image of 20 nm Au NPs with proper buffering. (H) Diffraction pattern from (G) showing Debye-
Scherrer diffraction rings and Bragg spots from buffer layer (Si, N and C stack layers in self-assembled 
aminosilane, spacing 2.2Å, tilt angle 31°). (I) Rocking curve of (H).
Figure 2 
Figure 2. Structure analyses of size-selected 
Au nanoparticles. (A) 1D diffraction intensity 
curve (black) obtained from radial averaging 
of the Debye-Scherrer UEC pattern (inset) of 
the surface supported 2nm NPs. Also shown 
are simulations generated from 2nm structures 
(cuboctahedra, decahedra, and icosahedra) of 
Au NPs at 300 K. The indices show associated 
Bragg reflection planes based on an FCC 
structure. (B) Experimental modified radial 
distribution functions (mRDFs) of static Au 
NPs along with theoretical prediction for 
cuboctahedra. The numeric labels represent 
the bond order in the FCC distance table 
(below). (C) FCC coordination shells 
corresponding to interatomic distances ri, 
calculated based on the bond order i and the 
Au lattice constant a=4.08 Å. 
Figure 3 
Figure 3. The melting dynamics of 2nm Au nanoparticles. (Left) mRDF map constructed by stacking 
mRDFs of UEC patterns at a sequence of delays between 5-2300 ps  at irradiation fluence F=31 mJ/cm2. 
Surface melting (enclosed by the dashed white line) is visible. (Right) mRDF map for F=75 mJ/cm2. 
Full scale melting is observed. The liquid state (enclosed by dashed white line) is characterized by the 
drop of 2nd nearest density (at ~ 5Å) to (1-1/e) of the static value (at negative time).  
Figure 4 
Figure 4. (A) The left panel shows the short-time relative distance change ∆R/R of dominant mRDF 
peaks (numbered 1, 3, and 7 in Fig. 2B) for 2nm Au NPs at fluence of 38 mJ/cm2, from which the lattice 
deformation can be deduced. The right panel shows the extension of these bonds for longer times and 
the corresponding local temperature deduced from the bond extensions (see text) compared with a TTM 
calculation. tD is the thermal relaxation time to the environment obtained by fitting data to a two-
temperature model (TTM) after equilibration.  (B) Temporal evolution of the relative distance change 
∆R/R of nearest neighbor bond (~2.88Å) in 2nm (solid line and symbol) and 20nm (dash-dash-dot line 
and open symbol) NPs, irradiated under F=31 mJ/cm2 in the left panel, and F = 75 (for 2nm NPs) and 80 
(for 20nm NPs) mJ/cm2 in the right panel. The insets show the corresponding ∆R/R dynamics at long 
times (50-2750 ps).
ABSTRACT
  We report the studies of ultrafast electron nanocrystallography on
size-selected Au nanoparticles (2-20 nm) supported on a molecular interface.
Reversible surface melting, melting, and recrystallization were investigated
with dynamical full-profile radial distribution functions determined with
sub-picosecond and picometer accuracies. In an ultrafast photoinduced melting,
the nanoparticles are driven to a non-equilibrium transformation, characterized
by the initial lattice deformations, nonequilibrium electron-phonon coupling,
and upon melting, the collective bonding and debonding, transforming
nanocrystals into shelled nanoliquids. The displasive structural excitation at
premelting and the coherent transformation with crystal/liquid coexistence
during photomelting differ from the reciprocal behavior of recrystallization,
where a hot lattice forms from liquid and then thermally contracts. The degree
of structural change and the thermodynamics of melting are found to depend on
the size of nanoparticle.

<|endoftext|><|startoftext|>
Introduction
Stochastic optimal switching problems (or starting and stopping problems) are important subjects both in
mathematics and economics. Since there are numerous articles about real options in the economic and
financial literature in recent years, the importance and applicability of control problems including optimal
switching problems cannot be exaggerated.
A typical optimal switching problem is described as follows: The controller monitors the price of natural
resources for optimizing (in some sense) the operation of an extraction facility. She can choose when to start
extracting this resource and when to temporarily stop doing so, based upon price fluctuations she observes.
The problem is concerned with finding an optimal switching policy and the corresponding value function. A
number of papers on this topic are well worth mentioning : Brennan and Schwarz (1985) in conjunction with
convenience yield in the energy market, Dixit (1989) for production facility problems, Brekke and Øksendal
(1994) for resource extraction problems, Yushkevich (2001) for positive recurrent countable Markov chain,
and Duckworth and Zervos (2001) for reversible investment problems. Hamdadène and Jeanblanc (2004)
analyze a general adapted process for finite time horizon using reflected stochastic backward differential
equations. Carmona and Ludkovski (2005) apply to energy tolling agreement in a finite time horizon using
Monte-Carlo regressions.
A basic analytical tool for solving switching problems is quasi-variational inequalities. This method is
indirect in the sense that one first conjectures the form of the value function and the switching policy and
next verifies the optimality of the candidate function by proving that the candidate satisfies the variational
inequalities. In finding the specific form of the candidate function, appropriate boundary conditions includ-
ing the smooth-fit principle are employed. This formation shall lead to a system of non-linear equations that
are often hard to solve and the existence of the solution to the system is also difficult to prove. Moreover,
http://arxiv.org/abs/0704.0991v1
this indirect solution method is specific to the underlying process and reward/cost structure of the prob-
lem. Hence a slight change in the original problem often causes a complete overhaul in the highly technical
solution procedures.
Our solution method is direct in the sense that we first show a new mathematical characterization of
the value functions and, based on the characterization, we shall directly find the value function and optimal
switching policy. Therefore, it is free from any guesswork and applicable to a larger set of problems (where
the underlying process is one-dimensional diffusions) than the conventional methods. Our approach here
is similar to Dayanik and Karatzas (2003) and Dayanik and Egami (2005) that propose direct methods of
solving optimal stopping problems and stochastic impulse control problems, respectively.
The paper is organized in the following way. In the next section, after we introduce our setup of one
dimensional optimal switching problems, in section 2.1, we characterize the optimal switching times as
exit times from certain intervals through sequential optimal stopping problems equivalent to the original
switching problem. In section 2.2, we shall provide a new characterization of the value function, which
leads to a direct solution method described in 2.3. We shall illustrate this method through examples in
section 3, one of which is a new optimal switching problem. Section 4 concludes with comments on an
extension to a further general problem.
2 Optimal Switching Problems
We consider the following optimal switching problems for one dimensional diffusions. Let (Ω,F ,P) be a
complete probability space with a standard Brownian motion W = {Wt; t ≥ 0}. Let Zt be the indicator
vector at time t, Zt ∈ {z1, z2, ..., zm} , Z where each vector zi = (a1, a2, ..., ak) with a is either 0 (closed)
or 1 (open), so that m = 2k. In this section, we consider the case of k = 1. That is, Zt takes either 0 or 1.
The admissible switching strategy is
w = (θ0, θ1, θ2, ..., θk, ...; ζ0, ζ1, ζ2, ..., ζk, ...)
with θ0 = 0 where where where 0 ≤ θ1 < θ2 < .... are an increasing sequence of Ft-stopping times and ζ1,
ζ2... are Fθi-measurable random variables representing the new value of Zt at the corresponding switching
times θi (in this section, ζi = 1 or 0). The state process at time t is denoted by (Xt)t≥0 with state space
I = (c, d) ⊆ R and X0 = x ∈ I , and with the following dynamics:
If ζ0 = 1 (starting in open state), we have, for m = 0, 1, 2, .....,
dXt =
dX1t = µ1(X
1)dt+ σ1(X
1)dWt, θ2m ≤ t < θ2m+1,
dX0t = µ0(X
0)dt+ σ0(X
0)dWt, θ2m+1 ≤ t < θ2m+2,
(2.1)
and if ζ0 = 0 (starting in closed state),
dXt =
dX0t = µ0(X
1)dt+ σ0(X
0)dWt, θ2m ≤ t < θ2m+1,
dX1t = µ1(X
1)dt+ σ1(X
1)dWt, θ2m+1 ≤ t < θ2m+2.
(2.2)
We assume that µi : R → R and σi : R → R are some Borel functions that ensure the existence and
uniqueness of the solution of (2.1) for i = 1 and (2.2) for i = 0.
Our performance measure, corresponding to starting state i = 0, 1, is
Jwi (x) = E
e−αsf(Xs)ds −
e−αθjH(Xθj− , ζj)
 (2.3)
where H : R×Z → R+ is the switching cost function and f : R → R is a continuous function that satisfies
e−αs|f(Xs)|ds
<∞. (2.4)
In this section, the cost functions are of the form:
H(Xθ−, ζ) =
H(Xθ−, 1) opening cost,
H(Xθ−, 0) closing cost.
The optimal switching problem is to optimize the performance measure for i = 0 (start in closed state) and
1 (start in open state). That is to find, for both i = 1 and i = 0,
vi(x) , sup
Jw(x) with X0 = x (2.5)
where W is the set of all the admissible strategies.
2.1 Characterization of switching times
For the remaining part of section 2, we assume that the state space X is I = (c, d) where both c and d are
natural boundaries of X. But our characterization of the value function does not rely on this assumption. In
fact, it is easily applied to other types of boundaries, for example, absorbing boundary.
The first task is to characterize the optimal switching times as exit times from intervals in R. For this
purpose, we define two functions g0 and g1 : R+ → R with
g1(x) , sup
Jw1 (x) and g0(x) , sup
Jw0 (x). (2.6)
where W0 , {w ∈ W : w = (θ0, ζ0, θ1 = +∞)}. In other words, g1(·) is the discounted expected revenue
by starting with ζ0 = 1 and making no switches. Similarly, g0(·) is the discounted expected revenue by
staring with ζ0 = 0 and making no switches.
We set w0 , g1 and y0 , g0. We consider the following simultaneous sequential optimal stopping
problems with wn : R+ → R and yn : R+ → R for n = 1, 2, ....:
wn(x) , sup
e−αsf(Xs)ds + e
−ατ (yn−1(Xτ )−H(Xτ−, 1− Zτ−))
, (2.7)
yn(x) , sup
e−αsf(Xs)ds+ e
−ατ (wn−1(Xτ )−H(Xτ−, 1− Zτ−))
, (2.8)
where S is a set of Ft stopping times. Note that for each n, the sequential problem 2.7 (resp. (2.8)) starts in
open (resp. closed) state.
On the other hand, we define n-time switching problems for ζ0 = 1:
q(n)(x) , sup
Jw1 (x), (2.9)
where
Wn , {w ∈W ;w = (θ1, θ2, ...θn+1; ζ1, ζ2, ...ζn); θn+1 = +∞}.
In other words, we start with ζ0 = 1 (open) and are allowed to make at most n switches. Similarly, we
define another n-time switching problems corresponding to ζ0 = 0:
p(n)(x) , sup
Jw0 (x). (2.10)
We investigate the relationship of these four problems:
Lemma 2.1. For any x ∈ R, wn(x) = q(n)(x) and yn(x) = p(n)(x).
Proof. We shall prove only the first assertion since the proof of the second is similar. We have set y0(x) =
g0(x). Now we consider w1 by using the strong Markov property of X:
w1(x) = sup
e−αsf(Xs)ds+ e
−ατ (g0(Xτ )−H(Xτ−, 0))
= sup
e−αsf(Xs)ds −
e−αsf(Xs)ds− e−ατ (g0(Xτ )−H(Xτ−, 0))
= sup
e−ατ (g0(Xτ )− g1(Xτ )−H(Xτ−, 0))
+ g1(x).
On the other hand,
q(1)(x) = sup
e−αsf(Xs)ds− e−αθ1H(Xθ1− , ζ1)
= sup
[∫ θ1
e−αsf(Xs)ds +
e−αsf(Xs)ds− e−αθ1H(Xθ1− , 0)
= sup
(g1(x)− e−αθ1g1(Xθ1))− e
−αθ1(g0(Xθ1)−H(Xθ1− , 0))
= sup
e−αθ1(g0(Xθ1)− g1(Xθ1)−H(Xθ1− , 0))
+ g1(x).
Since both τ and θ1 are Ft stopping times, we have w1(x) = q(1)(x) for all x ∈ R. Moreover, by the theory
of the optimal stopping (see Appendix A, especially Proposition A.4), τ and hence θ1 are characterized as
an exit time from an interval. Similarly, we can prove y1(x) = p
(1)(x). Now we consider q(2)(x) which is
the value if we start in open state and make at most 2 switches (open → close → open). For this purpose, we
consider the performance measure q̄(2) that starts in an open state and is allowed two switches: For arbitrary
switching times θ1, θ2 > θ1 ∈ S , we have
q̄(2)(x) , Ex
e−αsf(Xs)ds−
e−αθjH(Xθj− , ζj)
e−αsf(Xs)ds +
e−αsf(Xs)ds +
e−αsf(Xs)ds
− e−αθ1H(Xθ1−, 0)− e
−αθ2H(Xθ2−, 1)
g1(x)− Ex[e−αθ1g1(Xθ1)]
x[e−αθ1g0(Xθ1)− e−αθ2g0(Xθ2)]
+ Ex[e−αθ2g1(Xθ2)]
− Ex[e−αθ1H(Xθ1−, 0) + e−αθ2H(Xθ2−, 1)].
Hence we have the following multiple optimal stopping problems:
q̄(2)(x) = sup
(θ1,θ2)∈S2
e−αθ1
(g0 − g1)(Xθ1)−H(Xθ1−, 0)
+ e−αθ2
(g1 − g0)(Xθ2)−H(Xθ2−, 1)
+ g1(x)
where S2 , {(θ1, θ2); θ1 ∈ S; θ2 ∈ Sθ1} and Sσ = {τ ∈ S; τ ≥ σ} for every σ ∈ S . Let us denote
h1(x) , g1(x)− g0(x)−H(x, 0), h2(x) , g0(x)− g1(x)−H(x, 1),
V1(x) , sup
e−ατh1(Xτ )
and V2(x) , sup
e−ατ (h2(Xτ ) + V1(Xτ ))
We also define
Γ1 , {x ∈ I : V1(x) = h1(x)} and Γ2 , {x ∈ I : V2(x) = h2(x) + V1(x)}
with σn , inf{t ≥ 0 : Xt ∈ Γn}. By using Proposition 5.4. in Carmona and Dayanik (2003), we conclude
that θ1 = σ1 and θ2 = θ1 + σ2 ◦ s(θ1) is optimal strategy where s(·) is the shift operator. Hence we only
consider the maximization over the set of admissible strategy W ∗2 where
W ∗2 , {w ∈W2 : θ1, θ2 are exit imes from an interval in I},
and can use the relation θ2 − θ1 = θ ◦ s(θ1) with some exit time θ ∈ S .
q(2)(x) = sup
w∈W ∗2
e−αsf(Xs)ds−
e−αθjH(Xθj− , ζj)
= sup
w∈W ∗2
e−αsf(Xs)ds +
e−αsf(Xs)ds +
e−αsf(Xs)ds
− e−αθ1(H(Xθ1− , 0) + e−α(θ2−θ1)H(Xθ2− , 1))
= sup
w∈W ∗2
e−αsf(Xs)ds + e
[(∫ θ
e−αsf(Xs)ds− e−αθH(Xθ−, 1)
− e−αθ1H(Xθ1− , 0)
Now by using the result for p(1), we can conclude
q(2)(x) = sup
w∈W ∗2
[∫ θ1
e−αsf(Xs)ds + e
p(1)(Xθ1)−H(Xθ1− , 0)
= sup
[∫ θ1
e−αsf(Xs)ds+ e
y1(Xθ1)−H(Xθ1− , 0)
= w2(x)
Similarly, we can prove y2(x) = p
(2)(x) and we can continue this process inductively to conclude that
wn(x) = q
(n)(x) and yn(x) = p
(n)(x) for all x and n.
Lemma 2.2. For all x ∈ R, limn→∞ q(n)(x) = v1(x) and limn→∞ p(n)(x) = v0(x).
Proof. Let us define q(x) , limn→∞ q
(n)(x). Since Wn ⊂ W , q(n)(x) ≤ v1(x) and hence q(x) ≤ v1(x).
To show the reverse inequality, we define W+ to be a set of admissible strategies such that
W+ = {w ∈W : Jw1 (x) <∞ for all x ∈ R}.
Let us assume that v1(x) < +∞ and consider a strategy w+ ∈ W+ and another strategy wn that coincides
with w+ up to and including time θn and then takes no further interventions.
1 (x)− Jw1 (x) = Ex
e−αs(f(Xs)− f(Xs−θn))−
i≥n+1
e−αθiH(Xθi−, ζi)
 , (2.11)
which implies
|Jw+1 (x)− Jw1 (x)| ≤ Ex
e−αθn −
i≥n+1
e−αθiH(Xθi−, ζi)
As n→ +∞, the right hand side goes to zero by the dominated convergence theorem. Hence it is shown
v1(x) = sup
Jw1 (x) = sup
w∈∪nWn
Jw1 (x)
so that v1(x) ≤ q(x). Next we consider v1(x) = +∞. Then we have some m ∈ N such that wm(x) =
q(m)(x) = ∞. Hence q(n)(x) = ∞ for all n ≥ m. The second assertion is proved similarly.
We define an operator L : H → H where H is a set of Borel functions
Lu(x) , sup
e−αsf(Xs)ds+ e
−ατ (u(Xτ )−H(Xτ−, 1 − Zτ−))
Lemma 2.3. The function w(x) , limn→∞wn(x) is the smallest solution, that majorizes g1(x), of the
function equation w = Lw.
Proof. We renumber the sequence (w0, y1, w2, y3...) as (u0, u1, u2, u3....). Since un is monotone increas-
ing, the limit u(x) exists. We have un+1(x) = Lun(x) and apply the monotone convergence theorem
by taking n → ∞, we have u(x) = Lu(x). We assume that u′(x) satisfies u′ = Lu′ and majorizes
g1(x) = u0(x). Then u
′ = Lu′ ≥ Lu0 = u1. Let us assume, for induction argument that u′ ≥ un, then
u′ = Lu′ ≥ Lun = un+1.
Hence we have u′ ≥ un for all n, leading to u′ ≥ limn→∞ un = u. Now we take the subsequence in
(w0, y1, w2, y3....) to complete the proof.
Proposition 2.1. For each x ∈ R, limn→∞wn(x) = v1(x) and limn→∞ yn(x) = v0(x). Moreover, the
optimal switching times, θ∗i are exit times from an interval.
Proof. We can prove the first assertion by combining the first two lemmas above. Now we concentrate on
the sequence of wn(x). For each n, finding wn(x) by solving (2.7) is an optimal stopping problem. By
Proposition A.4, the optimal stopping times are characterized as an exit time of X from an interval for all
n. This is also true in the limit: Indeed, by Lemma 2.3, in the limit, the value function of optimal switching
problem v1(x) = w(x) satisfies w = Lw, implying that v1(x) is the solution of an optimal stopping
problem. Hence the optimal switching times are characterized as exit time from an interval.
2.2 Characterization of the value functions
We go back to the original problem (2.3) to characterize the value function of the optimal switching prob-
lems. By the exit time characterization of the optimal switching times, θ∗i are given by
θ∗i =
inf{t > θi−1;X1t ∈ Γ1}
inf{t > θi−1;X0t ∈ Γ0}
(2.12)
where Γ1 = R \C1 and Γ0 = R \C0. We define here Ci and Γi to be continuation and stopping region for
Xit , respectively. We can simplify the performance measure J
w considerably. For ζ0 = 1, we have
Jw1 (x) = E
e−αsf(Xs)ds −
e−αθjH(Xθj− , ζj)
e−αsf(Xs)ds+
e−αsf(Xs)ds
− e−αθ1
H(Xθ1−, 0) +
e−α(θi−θ1)H(Xθj− , ζj)
e−αsf(Xs)ds+ e
−αθ1E
e−αsf(Xs)ds−
e−αθjH(Xθj− , ζj)
− e−αθ1H(Xθ1−, 0)
We notice that in the time interval (0, θ1), the process X is not intervened. The inner expectation is just
Jw0 (Xθ1). Hence we further simplify
Jw1 (x) = E
e−αsf(Xs)ds+ e
−αθ1(Jw0 (Xθ1)−H(Xθ1−, 0))
−e−αθ1g1(Xθ1) + e−αθ1(Jw0 (Xθ1)−H(Xθ1−, 0))
+ g1(x)
−e−αθ1g1(Xθ1) + e−αθ1Jw1 (Xθ1)
+ g1(x).
The third equality is a critical observation. Finally, we define u1 , J1 − g1 and obtain
u1(x) = J
1 (x)− g1(x) = Ex
e−αθ1u1(Xθ1)
. (2.13)
Since the switching time θ1 is characterized as a hitting time of a certain point in the state space, we can
represent θ1 = τa , inf{t ≥ 0 : Xt = a} for some a ∈ R. Hence equation (2.13) is an optimal stopping
problem that maximizes
u1(x) = J
1 (x)− g1(x) = Ex
e−ατau1(Xτa)
. (2.14)
among all the τa ∈ S . When θ1 = 0 (i.e., x = Xθ1),
Jw1 (x) = E
x [−g1(x) + Jw0 (x)−H(x, 0)] + g1(x)
and hence
u1(x) = J
0 (x)−H(x, 0)− g1(x).
In other words, we make a switch from open to closed immediately by paying the switching cost. Similarly,
for ζ0 = 0, we can simplify the performance measure J
0 (·) to obtain
Jw0 (x) = E
−e−αθ1g0(Xθ1) + e
−αθ1Jw0 (Xθ1)
+ g0(x).
By defining u0 , J
0 − g0, we have
u0(x) = J
0 (x)− g0(x) = Ex
e−αθ1u0(Xθ1)
Again, by using the characterization of switching times, we replace θ1 with τb,
u0(x) = J
0 (x)− g0(x) = Ex
e−ατbu0(Xτb)
. (2.15)
In summary, we have
u1(x) =
u0(x) + g0(x)−H(x, 0) − g1(x), x ∈ Γ1,
x [e−ατau1(Xτa)] = E
x [e−ατa(u0(Xτa) + g0(Xτa)− g1(Xτa)−H(Xτa , 0))] , x ∈ C1,
(2.16)
u0(x) =
x [e−ατbu0(Xτb)] = E
x [e−ατb(u1(Xτb) + g1(Xτb)− g0(Xτb)−H(Xτb , 1))] , x ∈ C0,
u1(x) + g1(x)−H(x, 1) − g0(x), x ∈ Γ0.
(2.17)
Hence we should solve the following optimal stopping problems simultaneously:
v̄1(x) , supτ∈S E
x [e−ατ (u1(Xτ )]
v̄0(x) , supσ∈S E
x [e−ασ(u0(Xσ)]
(2.18)
Now we let the infinitesimal generators of X1 and X0 be A1 and A0, respectively. We consider (Ai −
α)v(x) = 0 for i = 0, 1. This ODE has two fundamental solutions, ψi(·) and ϕi(·). We set ψi(·) is
an increasing and ϕi(·) is a decreasing function. Note that ψi(c+) = 0, ϕi(c+) = ∞ and ψi(d−) =
∞, ϕi(d−) = 0. We define
Fi(x) ,
ψi(x)
ϕi(x)
and Gi(x) , −
ϕi(x)
ψi(x)
for i = 0, 1.
By referring to Dayanik and Karatzas (2003), we have the following representation
x[e−ατr1{τr<τl}] =
ψ(l)ϕ(x) − ψ(x)ϕ(l)
ψ(l)ϕ(r) − ψ(r)ϕ(l)
, Ex[e−ατr1{τl<τr}] =
ψ(x)ϕ(r) − ψ(r)ϕ(x)
ψ(l)ϕ(r) − ψ(r)ϕ(l)
for x ∈ [l, r] where τl , inf{t > 0;Xt = l} and τr , inf{t > 0;Xt = r}.
By defining
W1 = (u1/ψ1) ◦G−11 and W0 = (u0/ϕ0) ◦ F
the second equation in (2.16) and the first equation in (2.17) become
W1(G1(x)) =W1(G1(a))
G1(d)−G1(x)
G1(d) −G1(a)
+W1(G1(d))
G1(x)−G1(a)
G1(d) −G1(a)
x ∈ [a, d), (2.19)
W0(F0(x)) =W0(F0(c))
F0(b)− F0(x)
F0(b)− F0(c)
+W0(F0(b))
F0(x)− F0(c)
F0(b)− F0(c)
, x ∈ (c, b], (2.20)
respectively. We should understand that F0(c) , F0(c+) = ψ0(c+)/ϕ0(c+) = 0 and that G1(d) ,
G1(d−) = −ϕ1(d−)/ψ1(d−) = 0. In the next subsection, we shall explain W1(G1(d−)) and W0(F0(c+))
in details. Both W1 and W0 are a linear function in their respective transformed spaces. Hence under the
appropriate transformations, the two value functions are linear functions in the continuation region.
2.3 Direct Method for a Solution
We have established a mathematical characterization of the value functions of optimal switching problems.
We shall investigate, by using the characterization, a direct solution method that does not require the recur-
sive optimal stopping schemes described in section 2.1. Since the two optimal stopping problems (2.18)
have to be solved simultaneously, finding u0 in x ∈ C0, for example, requires that we find the smallest
F0-concave majorant of (u1(x) + g1(x)− g0(x)−H(x, 1))/ϕ0(x) as in (2.17) that involves u1.
There are two cases, depending on whether x ∈ C1 ∩C0 or x ∈ Γ1 ∩C0, as to what u1(·) represents.
In the region x ∈ Γ1 ∩C0, u1(·) that shows up in the equation of u0(x) is of the form u1(x) = u0(x) +
g0(x)−H(x, 1, 0) − g1(x). In this case, the “obstacle” that should be majorized is in the form
u1(x) + g1(x)− g0(x)−H(x, 1)
= (u0(x) + g0(x)−H(x, 0)− g1(x)) + g1(x)− g0(x)−H(x, 1)
= u0(x)−H(x, 0)−H(x, 1) < u0(x). (2.21)
This implies that in x ∈ Γ1∩C0, the u0(x) function always majorizes the obstacle. Similarly, in x ∈ Γ0∩C1,
the u1(x) function always majorizes the obstacle.
Next, we consider the region x ∈ C0 ∩ C1. The u0(·) term in (2.16) is represented, due to its linear
characterization, as
W0(F0(x)) = β0(F0(x)) + d0
with some β0 ∈ R and d0 ∈ R+ in the transformed space. (The nonnegativity of d0 will be shown.) In
the original space, it has the form of ϕ0(x)(β0F0(x) + d0). Hence by the transformation (u1/ψ1) ◦ G−1,
W1(G1(x)) is the smallest linear majorant of
K1(x) + ϕ0(x)(β0F0(x) + d0)
ψ1(x)
K1(x) + β0ψ0(x) + d0ϕ0(x)
ψ1(x)
on (G1(d−), G1(a∗)) where
K1(x) , g0(x)− g1(x)−H(x, 0). (2.22)
This linear function passes a point (G1(d−), ld) where G1(d−) = 0 and
ld = lim sup
(K1(x) + β0ψ0(x) + d0ϕ0(x))
ψ1(x)
Let us consider further the quantity ld ≥ 0. By noting
lim sup
(K1(x) + β0ψ0(x))
ψ1(x)
≤ lim sup
(K1(x) + β0ψ0(x) + d0ϕ0(x))
ψ1(x)
≤ lim sup
(K1(x) + β0ψ0(x))
ψ1(x)
+ lim sup
d0ϕ0(x)
ψ1(x)
and lim supx↑d
d0ϕ0(x)
ψ1(x)
= 0, we can redefine ld by
ld , lim sup
(K1(x) + β0ψ0(x))
ψ1(x)
(2.23)
to determine the finiteness of the value function of the optimal switching problem, v1(x), based upon Propo-
sition A.5-A.7. Let us concentrate on the case ld = 0.
Similar analysis applies to (2.17). u1(x) in (2.17) is represented as
W1(G1(x)) = β1G1(x) + d1
with some β1 ∈ R and d1 ∈ R+. Note that d1 = ld ≥ 0. In the original space, it has the form of
ψ1(x)(β1G1(x) + d1). Hence by the transformation (u0/ϕ0(x)) ◦ F−1, W0(F0(x)) is the smallest linear
majorant of
K0(x) + ψ1(x)(β1G1(x) + d1)
ϕ0(x)
K0(x)− β1ϕ1(x) + d1ψ1(x)
ϕ0(x)
on (F0(c+), F0(b
∗)) where
K0(x) , g1(x)− g0(x)−H(x, 1). (2.24)
This linear function passes a point (F0(c+), lc) where F0(c+) = 0 and
lc = lim sup
(K0(x)− β1ϕ1(x) + d1ψ1(x))+
ϕ0(x)
Hence we have lc = d0 ≥ 0. By the same argument as for ld, we can redefine
lc , lim sup
(K0(x)− β1ϕ1(x))+
ϕ0(x)
. (2.25)
Remark 2.1. (a) Evaluation of ld or lc does not require knowledge of β0 or β1, respectively unless the
orders of max(K1(x), ψ1(x)) and ψ0(x) are equal, for example. (For this event, see Proposition 2.4.)
Otherwise, we just compare the order of the positive leading terms of the numerator in (2.23) and
(2.25) with that of the denominator.
(b) A sufficient condition for ld = lc = 0: since we have
0 ≤ ld ≤ lim sup
(K1(x))
ψ1(x)
+ lim sup
(β0ψ0(x))
ψ1(x)
a sufficient condition for ld = 0 is
lim sup
(K1(x))
ψ1(x)
= 0 and lim sup
ψ0(x)
ψ1(x)
= 0. (2.26)
Similarly,
0 ≤ lc ≤ lim sup
(K0(x))
ϕ0(x)
+ lim sup
(−β1ϕ1(x))+
ϕ0(x)
Hence a sufficient condition for lc = 0 is
lim sup
(K0(x))
ϕ0(x)
= 0 and lim sup
ϕ1(x)
ϕ0(x)
= 0. (2.27)
Moreover, it is obvious β1 < 0 and β0 > 0 since the linear majorant passes the origin of each
transformed space. Recall a points in the interval (c, d) ∈ R+ will be transformed by G(·) to
(G(c), G(d−)) ∈ R−.
We summarize the case of lc = ld = 0:
Proposition 2.2. Suppose that ld = lc = 0, the quantities being defined by (2.23) and by (2.25), respectively.
The value functions in the transformed space are the smallest linear majorants of
R1(·) ,
1 (·))
1 (·))
and R0(·) ,
0 (·))
0 (·))
where
r1(x) , g0(x)− g1(x) + β0ψ0(x)−H(x, 0)
r0(x) , g1(x)− g0(x)− β1ϕ1(x)−H(x, 1)
β0 > 0 and β1 < 0. (2.28)
Furthermore, Γ1 and Γ0 in (2.16) and (2.17) are given by
Γ1 , {x ∈ (c, d) :W1(G1(x)) = R1(G1(x))}, and Γ0 , {x ∈ (c, d) :W0(F0(x)) = R0(F0(x))}.
Corollary 2.1. If either of the boundary points c or d is absorbing, then (F0(c),W0(F0(c)) or (G1(d),W1(G1(d)))
is obtained directly. We can entirely omit the analysis of lc or ld. The characterization of the value function
(2.19) and (2.20) remains exactly the same.
Remark 2.2. An algorithm to find (a∗, b∗, β∗0 , β
1) can be described as follows:
1. Start with some β′1 ∈ R.
2. Calculate r0 and then R0 by the transformation R0(·) =
0 (·))
0 (·))
3. Find the linear majorant of R0 passing the origin of the transformed space. Call the slope of the linear
majorant, β0 and the point, F0(b), where R0 and the linear majorant meet .
4. Plug b and β0 in the equation for r1 and calculate R1 by the transformation R1(·) =
1 (·))
1 (·))
5. Find the linear majorant of R1 passing the origin of the transformed space. Call the slope of the linear
majorant, β1 and the point, G1(a), where R1 and the linear majorant meet.
6. Iterate step 1 to 5 until β1 = β
If both R1 and R0 are differentiable functions with their respective arguments, we can find (a
∗, b∗) analyti-
cally. Namely, we solve the following system for a and b:
dR0(y)
y=F0(b)
(F0(b)− F0(c)) = R0(F0(b))
dR1(y)
y=G1(a)
(G1(a)−G1(d)) = R1(G1(a))
(2.29)
where dR0(y)
y=F0(b∗)
= β∗0 and
dR1(y)
y=G1(a∗)
= β∗1 .
Once we find W1(·) and W0(·), then we convert to the original space and add back g1(x) and g0(x)
respectively so that v1(x) = ψ1(x)W1(G1(x)) + g1(x) and v0(x) = ϕ0(x)W0(F0(x)) + g0(x). Therefore,
by (2.16) and (2.17), the value functions v1(·) and v0(·) are given by:
Proposition 2.3. If the optimal continuation regions for both of the value functions are connected and if
lc = ld = 0, then the pair of the value functions v1(x) and v0(x) are represented as
v1(x) =
v̂0(x)−H(x, 0), x ≤ a∗,
v̂1(x) , ψ1(x)W1(G1(x)) + g1(x), a
∗ < x,
v0(x) =
v̂0(x) , ϕ0(x)W0(F0(x)) + g0(x) x < b
v̂1(x)−H(x, 1), b∗ ≤ x,
for some a∗, b∗ ∈ R with a∗ < b∗.
Proof. If the optimal continuation regions for both of the value functions are connected and if ld = lc = 0,
then the optimal intervention times (2.30) have the following form:
θ∗i =
inf{t > θi−1;Xt /∈ (a∗, d)}, Z = 1,
inf{t > θi−1;Xt /∈ (c, b∗)}, Z = 0.
(2.30)
Indeed, since we have lc = ld = 0, the linear majorants W1(·) and W0(·) pass the origins in their respective
transformed coordinates. Hence the continuation regions shall necessarily of the form of (2.30).
By our construction, both v1(x) and v0(x) are continuous in x ∈ R. Suppose we have a∗ > b∗. In this
case, by the form of the value functions, v0(b−)−H(b, 1, 0) = v1(b). Since the cost function H(·) > 0 and
continuous, it follows v0(b−) > v1(b). On the other hand, v0(b+) = v1(b)−H(b, 0, 1) implying v0(b+) <
v1(b). This contradicts the continuity of v0(x). Also, a
∗ = b∗ will lead to v1(x) = v1(x)−H(x, 1, 0) which
is impossible. Hence if the value functions exist, then we must necessarily have a∗ < b∗.
In relation to Proposition 2.3, we have the following observations:
Remark 2.3. (a) It is obvious that
v0(x) = v̂0(x) > v̂0(x)−H(x, 0) = v1(x), x ∈ (c, a∗),
v1(x) = v̂1(x) > v̂1(x)−H(x, 1) = v0(x), x ∈ (b∗, d).
(b) Since u1(x) is continuous in (c, d), the “obstacle” u1(x) + g1(x)− g0(x)−H(x, 1) to be majorized
by u0(x) on x ∈ C0 = (c, b∗) is also continuous, in particular at x = a∗. We proved that u0(x)
always majorizes the obstacle on (c, a∗). Hence F (a∗) ∈ {y : W0(y) > R0(y)} if there exists a
linear majorant of R0(y) in an interval of the form (F0(q), F0(d)) with some q ∈ (c, d): otherwise,
the continuity of u1(x) + g1(x) − g0(x) −H(x, 1) does not hold. Similarly, we have F (b∗) ∈ {y :
W1(y) > R1(y)} if there exists a linear majorant of R0(y) in an interval of the form (G1(c), G1(q)).
Finally, we summarize other cases than lc = ld = 0:
Proposition 2.4.
(a) If either ld = +∞ or lc = +∞, then v1(x) = v0(x) ≡ +∞.
(b) If both ld and lc are finite, then ld = lc = 0.
Proof. (a) The proof is immediate by invoking Proposition A.5. (b) When lc is finite, we know by Proposi-
tion A.5 that the value function v0(x) is finite. On x ∈ (c, a∗), u1(x)+g1(x)−g0(x)−H(x, 1) < u0(x) <
+∞ is finite (see (2.21)) and thereby
lc = lim sup
u1(x) + g1(x)− g0(x)−H(x, 1)
ϕ0(x)
The same argument for ld = 0.
Therefore, we can conclude that ld = 0 for the situation where the orders of max(K1(x), ψ1(x)) and ψ0(x)
are equal (⇒ ld is finite) as described in Remark 2.1 (a).
3 Examples
We recall some useful observations. If h(·) is twice-differentiable at x ∈ I and y , F (x), then we define
H(y) , h(F−1(y))/ϕ(F−1(y)) and we obtain H
(y) = m(x) and H
(y) = m
(x)/F
(x) with
m(x) =
(x), and H
(y)(A− α)h(x) ≥ 0, y = F (x) (3.1)
with strict inequality if H
(y) 6= 0. These identities are of practical use in identifying the concavities of
H(·) when it is hard to calculate its derivatives explicitly. Using these representations, we can modify (2.29)
F ′0(b)
(b)(F0(b)− F0(c)) = r0(b)ϕ0(b)
G′1(a)
(a)(G1(a)−G1(d)) = r1(a)ψ1(a)
(3.2)
Example 3.1. Brekke and Øksendal (1994): We first illustrate our solution method by using a resource
extraction problem solved by Brekke and Øksendal (1994). The price Pt at time t per unit of the resource
follows a geometric Brownian motion. Qt denotes the stock of remaining resources in the field that decays
exponentially. Hence we have
dPt = αPtdt+ βPtdWt and dQt = −λQtdt
where α, β, and λ > 0 (extraction rate) are constants. The objective of the problem is to find the optimal
switching times of resource extraction:
v(x) = sup
Jw(x) = sup
e−ρt(λPtQt −K)Ztdt−
e−ρθiH(Xθi−, Zθi)
where rho ∈ R+ is a discount factor with ρ > α, K ∈ R+ is the operating cost and H(x, 0) = C ∈ R+
and H(x, 1) = L ∈ R+ are constant closing and opening costs. Since P and Q always show up in the form
of PQ, we reduce the dimension by defining Xt = PtQt with the dynamics:
dXt = (α− λZt)Xtdt+ βXtdWt.
Solution: (1) We shall calculate all the necessary functions. For Zt = 1 (open state), we solve (A1 −
ρ)v(x) = 0 where A1 = (α − λ)xv′(x) + 12β
2x2v′′(x) to obtain ψ1(x) = x
ν+ and ϕ1(x) = x
ν− where
ν+,− = β
−α+ λ+ 1
(α− λ− 1
β2)2 + 2ρβ2
. Similarly, for Zt = 0 (closed state), we solve
(A0 − ρ)v(x) = 0 where A0 = αxv′(x) + 12β
2x2v′′(x) to obtain ψ0(x) = x
µ+ and ϕ0(x) = x
µ− where
µ+,− = β
−α+ 1
(α− 1
β2)2 + 2ρβ2
. Note that under the assumption ρ > α, we have
ν+, µ+ > 1 and ν−, ν− < 0.
By setting ∆1 =
(α− λ− 1
β2)2 + 2ρβ2 and ∆0 =
(α− 1
β2)2 + 2ρβ2, we have G1(x) =
−ϕ1(x)/ψ1(x) = −x−2∆1/β
and F0(x) = ψ0(x)/ϕ0(x) = x
2∆0/β
. It follows thatG−11 (y) = (−y)−β
2/2∆1
and F−10 (y) = y
β2/2∆0 . In this problem, we can calculate g1(x), g0(x) explicitly:
g1(x) = E
e−ρs(λXs −K)ds
ρ+ λ− α
and g(x) = 0. Lastly, K1(x) = g0(x) − g1(x) − H(x, 0) = −
ρ+λ−α
− C and K0(x) =
g1(x)− g0(x)−H(x, 1) = xρ+λ−α −
(2) The state space of X is (c, d) = (0,∞) and we evaluate lc and ld. Let us first note that ∆0−∆1+λ > 0.
Since limx↓0
ϕ1(x)
ϕ0(x)
= limx↓0 x
∆0−∆1+λ
β2 = 0 and limx↓0(K0(x))
+/ϕ0(x) = 0, we have lc = l0 = 0 by
(2.27). Similarly, by noting limx↑+∞
ψ0(x)
ψ1(x)
= limx↑+∞ x
−(∆0−∆1+λ)
β2 = 0 and limx↑+∞(K1(x))
+/ϕ0(x) =
0, we have ld = l+∞ = 0 by (2.26).
(3) To find the value functions together with continuation regions, we set
r1(x) = −
ρ+ λ− α
− C + β0ψ0(x) and r0(x) =
ρ+ λ− α
− L− β1ϕ1(x)
and make transformations R1(y) = r1(F
−1(y))/ψ1(F
−1(y)) and R0(y) = r0(F
−1(y))/ϕ0(F
−1(y)), re-
spectively. We examine the shape and behavior of the two functions R1(·) and R0(·) with an aid of (3.1).
By calculating (r0/ϕ0)
′(x) explicitly to examine the derivative of R0(y), we can find a critical point x = q,
at which R0(F (x)) attains a local minimum and from which R0(F (x)) is increasing monotonically on
(F0(q),∞). Moreover, we can confirm that limy→∞R′0(y) = limx→∞
(r0/ϕ0)
F ′0(x)
= 0, which shows that
there exists a finite linear majorant of R0(y). We define
p(x) = β1ωx
ν− − (ρ− α)
ρ+ λ− α
+ (K + ρL)
such that (A0 − ρ)r0(x) = p(x) where ω ,
β2ν−(ν− − 1)− αν−
(∆0 − ∆1 + λ)(∆0 +
∆1−λ) > 0. By the second identity in (3.1), the sign of the second derivative R′′0(y) is the same as the sign
of p(x). It is easy to see that p(x) has only one critical point. For any β1 < 0, the first term is dominant as
x → 0, so that limx↓0 p(x) < 0. As x gets larger, for |β1| sufficiently small, p(x) can take positive values,
providing two positive roots, say x = k1, k2 with k1 < k2. We also have limx→+∞ p(x) = −∞. In this
case, R0(y) is concave on (0, F (k1) ∪ (F (k2),+∞) and convex on (F (k1), F (k2)). Since we know that
R0(y) attains a local minimum at y = F (q), we have q < k2, and it implies that there is one and only on
tangency point of the linear majorant W (y) and R0(y) on (F (q),∞), so that the continuation region is of
the form (0, b∗).
¿From this analysis of the derivatives of R0(y), there is only one tangency point of the linear majorant
W0(y) and R0(y). (See Figure 3.1-(a)). A similar analysis shows that there is only one tangency point of
the linear majorant W1(y) and R1(y). (See Figure 3.1-(b)).
0.5 1 1.5 2 2.5 3 3.5
RHFH.L,WHFH.LL
-200 -150 -100 -50
RHGH.L,WHGH.LL
0.5 1 1.5 2 2.5
v0HxL
0.1 0.2 0.3 0.4 0.5 0.6 0.7
v1HxL
Figure 1: A numerical example of resource extraction problem. with parameters (α, β, λ, ρ,K, L,C) =
(0.01, 0.25, 0.01, 0.05, 0.4, 2, 2)(a) The smallest linear majorant W0(F0(x)) and R0(F0(x)) with b
∗ = 1.15042 and
= 10.8125. (b)The smallest linear majorantW1(G1(x)) andR1(G1(x)) with a
∗ = 0.18300 and β∗
= −0.695324.
(c) The value function v0(x). (d) The value function v1(x).
(4) By solving the system of equations (2.29), we can find (a∗, b∗, β∗0 , β
1). We transform back to the original
space to find
v̂1(x) = ψ1(x)W1(G1(x)) + g1(x) = ψ1(x)β
1G1(x) + g1(x)
= −β∗1ϕ1(x) + g1(x) = −β∗1xν− +
ρ+ λ− α
v̂0(x) = ϕ0(x)W0(F0(x)) + g0(x) = ϕ0(x)β
0F0(x) + g0(x) = β
0ψ0(x) + g0(x) = β
Hence the solution is
v1(x) =
µ+ − C, x ≤ a∗,
−β∗1xν− +
ρ+λ−α
, x > a∗,
v0(x) =
µ+ , x ≤ b∗,
−β∗1xν− +
ρ+λ−α
− L, x > b∗,
which agrees with Brekke and Økesendal (1994).
Example 3.2. Ornstein-Uhrenbeck process: We shall consider a new problem involving an Ornstein-
Uhrenbeck process. Consider a firm whose revenue solely depends on the price of one product. Due to its
cyclical nature of the prices, the firm does not want to have a large production facilty and decides to rent
additional production facility when the price is favorable. The revenue process to the firm is
dXt = δ(m −Xt − λZt)dt+ σdWt,
where λ = r/δ with r being a rent per unit of time. The firm’s objective is to maximize the incremental
revenue generated by renting the facility until the time τ0 when the price is at an intolerably low level.
Without loss of generality, we set τ0 = inf{t > 0 : Xt = 0}. We keep assuming constant operating cost K ,
opening cost, L and closing cost C . Now the value function is defined as
v(x) = sup
Jw(x) = sup
e−αt(Xt −K)Ztdt−
θi<τ0
e−αθiH(Xθi−, Zθi)
Solution: (1) We denote, by ψ̃(·) and ϕ̃(·), the functions of the fundamental solutions for the auxiliary
process Pt , (Xt −m+ λ)/σ, t ≥ 0, which satisfies dPt = −δPtdt+ dWt. For every x ∈ R,
ψ̃(x) = eδx
2/2D−α/δ(−x
2δ) and ϕ̃(x) = eδx
2/2D−α/δ(x
which leads to ψ1(x) = ψ̃((x −m + λ)/σ), ϕ1(x) = ϕ̃((x −m + λ)/σ), ψ0(x) = ψ̃((x −m)/σ), and
ϕ0(x) = ϕ̃((x −m)/σ) where Dν(·) is the parabolic cylinder function; (see Borodin and Salminen (2002,
Appendices 1.24 and 2.9) and Carmona and Dayanik (2003, Section 6.3)). By using the relation
Dν(z) = 2−ν/2e−z
2/4Hν(z/
2), z ∈ R (3.3)
in terms of the Hermite function Hν of degree ν and its integral representation
Hν(z) =
Γ(−ν)
2−2tzt−ν−1dt, Re(ν) < 0, (3.4)
(see for example, Lebedev(1972, pp 284, 290)). Since Ex[Xt] = e
−δtx + (1 − e−δt)(m − λ), we have
g0(x) = 0 and g1(x) =
x−(m−λ)
+ m−λ−K
(2) The state space of X is (c, d) = (0,+∞). Since the left boundary 0 is the absorbing, the linear majorant
passes (0, F0(0)). Since limx→+∞ ψ0(x)/ψ1(x) = 0, we have ld = 0.
(3) We formulate
r1(x) = −
x− (m− λ)
δ + α
m− λ−K
− C + β0ψ0(x)
r0(x) =
x− (m− λ)
δ + α
m− λ−K
− L− β1ϕ1(x)
and make transformations: R1(y) = r1(F
−1(y))/ψ1(F
−1(y)) and R0(y) = r0(F
−1(y))/ϕ0(F
−1(y)),
respectively. We examine the shape and behavior of the two functions R1(·) and R0(·) with an aid of (3.1).
First we check the sign of R′0(y) and find a critical point x = q, at which R0(F (x)) attains a local minimum
and from which R0(F (x)) is increasing monotonically on (F0(q),∞). It can be shown that R
0(+∞) = 0
by using (3.3) and (3.4) and the identity H′ν(z) = 2νHν−1(z), z ∈ R (see Lebedev (1972, p.289), for
example.) This shows that there must exist a (finite) linear majorant of R0(y) on (F (q),∞). To check
convexity of R0(y), we define
p(x) = −
ϕ′′1(x) + δ(m− x− λ)
δ + α
− β1ϕ′1(x)
− αr0(x)
such that (A0−α)r0(x) = p(x). We can show easily limx→+∞ p(x) = −∞ since ϕ1(+∞) = ϕ′1(+∞) =
ϕ′′1(+∞) = 0. Due to the monotonicity of ϕ1(x) and its derivatives, p(x) can have at most one critical point
and p(x) = 0 can have one or two positive roots depending on the value of β1. In either case, let us call the
largest positive root x = k2. We also have limx→+∞ p(x) = −∞. Since we know that R0(y) attains a local
minimum at y = F (q) and is increasing thereafter, we have q < k2. It follows that there is one and only on
tangency point of the linear majorant W (y) and R0(y) on (F (q),∞), so that the continuation region is of
the form (0, b∗). A similar analysis shows that there is only one tangency point of the linear majorant W1(y)
and R1(y).
(4) Solving (3.2), we we can find (a∗, b∗, β∗0 , β
1). We transform back to the original space to find
v̂1(x) = ψ1(x)W1(G1(x)) + g1(x) = ψ1(x)β
1G1(x) + g1(x) = −β∗1ϕ1(x) + g1(x)
= −β∗1e
δ(x−m+λ)2
2σ2 D−α/δ
(x−m+ λ)
x− (m− λ)
δ + α
v̂0(x) = ϕ0(x)W0(F0(x)) + g0(x) = ϕ0(x)β
0 (F0(x)− F0(0)) + g0(x)
= β∗0{ψ0(x)− F0(0)ϕ0(x)}+ g0(x)
= β∗0e
(x−m+λ)2
D−α/δ
x−m+ λ
− F (0)D−α/δ
Hence the solution is, using the above functions,
v1(x) =
v̂0(x)− C, x ≤ a∗,
v̂1(x), x > a
v0(x) =
v̂0(x), x ≤ b∗,
v̂1(x)− L, x > b∗.
See Figure 3.2 for a numerical example.
0.5 1 1.5 2 2.5 3
v0HxL
0.5 1 1.5 2 2.5 3
v1HxL
Figure 2: A numerical example of leasing production facility problem with parameters (m,α, σ, δ, λ,K, L,C) =
(5, 0.105, 0.35, 0.05, 4, 0.4, 0.2, 0.2): (a) The value function v0(x) with b
∗ = 1.66182 and β∗
= 144.313. (b)The
value function v1(x) with a
∗ = 0.781797 and β∗
= −2.16941.
4 Extensions and conclusions
4.1 An extension to the case of k ≥ 2
It is not difficult to extend to a general case of k ≥ 2 where more than one switching opportunities are
available. But we put a condition that z ∈ Z is of the form z = (a1, a2, ...., ak) where only one element of
this vector is 1 with the rest being zero, i.e., z = (0, 0, 0, ...., 1, 0, 0) for example.
We should introduce the switching operator M0 on h ∈ H,
M0h(u, z) = max
ζ∈Z\{z}
{h(u, ζ)−H(u, z; ζ)} . (4.1)
In words, this operator would calculate which production mode should be chosen by moving from the current
production mode z. Now the recursive optimal stopping (2.7) becomes
wn+1(x) , sup
e−αsf(Xs)ds+ e
−ατMwn(Xτ )
Accordingly, the optimization procedure will become two-stage. To illustrate this, we suppose k = 2 so that
i = 0, 1, and 2. By eliminating the integral in (4.1), we redefine the switching operator,
Mhz(x) , max
ζ∈Z\{z}
{hζ(x) + gζ(x)− gz(x)−H(x, z, ζ)} , (4.2)
where
gz(x) , sup
Jwz (x) = E
e−αsf(Xs)ds
Hence (2.13) will be modified to uz(x) = E
x[e−ατMuz(Xτ )]. It follows that our system of equations
(2.18) is now
v̄2(x) , supτ∈S E
x [e−ατMv̄2(Xτ )]
v̄1(x) , supτ∈S E
x [e−ατMv̄1(Xτ )]
v̄0(x) , supτ∈S E
x [e−ατMv̄0(Xτ )]
(4.3)
The first stage is optimal stopping problem. One possibility of switching production modes is (0 → 1, 1 →
2, 2 → 0). First, we fix this switching scheme, say c, and solve the system of equations (4.3) as three
optimal stopping problems. All the arguments in Section 2.3 hold. This first-stage optimization will give
(x∗0(c), x
1(c), x
2(c), β
0 (c), β
1 (c), β
2 (c)), where xi’s are switching boundaries, depending on this switching
scheme c.
Now we move to another switching scheme c′ and solve the system of optimal stopping problems until
we find the optimal scheme.
4.2 Conclusions
We have studied optimal switching problems for one-dimensional diffusions. We characterize the value
function as linear functions in their respective spaces, and provide a direct method to find the value functions
and the opening and switching boundaries at the same time. Using the techniques we developed here as
well as the ones in Dayanik and Karazas (2003) and Dayanik and Egami (2005), we solved two specific
problems, one of which involves a mean-reverting process. This problem might be hard to solve with just
the HJB equation and the related quasi-variational inequalities. Finally, an extension to more general cases
is suggested. We believe that this direct method and the new characterization will expand the coverage of
solvable problems in the financial engineering and economic analysis.
A Summary of Optimal Stopping Theory
Let (Ω,F ,P) be a complete probability space with a standard Brownian motion W = {Wt; t ≥ 0} and
consider the diffusion process X0 with state pace I ⊆ R and dynamics
dX0t = µ(X
t )dt+ σ(X
t )dWt (A.1)
for some Borel functions µ : I → R and σ : I → (0,∞). We emphasize here that X0 is an uncontrolled
process. We assume that I is an interval with endpoints −∞ ≤ a < b ≤ +∞, and that X0 is regular in
(a, b); in other words, X0 reaches y with positive probability starting at x for every x and y in (a, b). We
shall denote by F = {Ft} the natural filtration generated by X0.
Let α ≥ 0 be a real constant and h(·) a Borel function such that Ex[e−ατh(X0τ )] is well-defined for
every F-stopping time τ and x ∈ I . Let τy be the first hitting time of y ∈ I by X0, and let c ∈ I be a fixed
point of the state space. We set:
ψ(x) =
x[e−ατc1{τc<∞}], x ≤ c,
1/Ec[e−ατx1{τx<∞}], x > c,
ϕ(x) =
e−ατx1{τx<∞}
, x ≤ c,
x[e−ατc1{τc<∞}], x > c,
F (x) ,
, x ∈ I. (A.2)
Then F (·) is continuous and strictly increasing. It should be noted that ψ(·) and ϕ(·) consist of an increasing
and a decreasing solution of the second-order differential equation (A − α)u = 0 in I where A is the
infinitesimal generator of X0. They are linearly independent positive solutions and uniquely determined
up to multiplication. For the complete characterization of ψ(·) and ϕ(·) corresponding to various types of
boundary behavior, refer to Itô and McKean (1974).
Let F : [c, d] → R be a strictly increasing function. A real valued function u is called F -concave on
[c, d] if, for every a ≤ l < r ≤ b and x ∈ [l, r],
u(x) ≥ u(l)
F (r)− F (x)
F (r)− F (l)
+ u(r)
F (x)− F (l)
F (r)− F (l)
We denote by
V (x) , sup
x[e−ατh(X0τ )], x ∈ [c, d] (A.3)
the value function of the optimal stopping problem with the reward function h(·) where the supremum is
taken over the class S of all F-stopping times. Then we have the following results, the proofs of which we
refer to Dayanik and Karatzas (2003).
Proposition A.1. For a given function U : [c, d] → [0,+∞) the quotient U(·)/ϕ(·) is an F -concave function
if and only if U(·) is α-excessive, i.e.,
U(x) ≥ Ex[e−ατU(X0τ )],∀τ ∈ S,∀x ∈ [c, d]. (A.4)
Proposition A.2. The value function V (·) of (A.3) is the smallest nonnegative majorant of h(·) such that
V (·)/ϕ(·) is F -concave on [c, d].
Proposition A.3. Let W (·) be the smallest nonnegative concave majorant of H , (h/ϕ) ◦ F−1 on
[F (c), F (d)], where F−1(·) is the inverse of the strictly increasing function F (·) in (A.2). Then V (x) =
ϕ(x)W (F (x)) for every x ∈ [c, d].
Proposition A.4. Define
S , {x ∈ [c, d] : V (x) = h(x)}, and τ∗ , inf{t ≧ 0 : X0t ∈ S}. (A.5)
If h(·) is continuous on [c, d], then τ∗ is an optimal stopping rule.
When both boundaries are natural, we have the following results:
Proposition A.5. We have either V ≡ 0 in (c, d) or V (x) < +∞ for all (c, d). Moreover, V (x) < +∞ for
every x ∈ (c, d) if and only if
lc , lim sup
h+(x)
and ld , lim sup
h+(x)
(A.6)
are both finite.
In the finite case, furthermore,
Proposition A.6. The value function V (·) is continuous on (c, d). If h : (c, d) → R is continuous and
lc = ld = 0, then τ
∗ of (A.5) is an optimal stopping time.
Proposition A.7. Suppose that lc and ld are finite and one of them is strictly positive, and h(·) is continuous.
Define the continuation region C , (c, d) \ Γ. Then τ∗ of (A.5) is an optimal stopping time, if and only if
there is no r ∈ (c, d) such that (c, r) ⊂ C if lc > 0 and
there is no l ∈ (c, d) such that (l, d) ⊂ C if ld > 0.
References
Borodin, A. N. and Salminen, P. (2002). Handbook of Brownian motion - facts and formulae, 2nd Edition.
Birkhäuser, Basel.
Brekke, K. A. and Øksendal, B. (1994). Optimal switching in an economic activity under uncertainty. SIAM
J. Control Optim. 32(4), 1021–1036.
Brennan, M. J. and Schwartz, E. S. (1985). Evaluating natural resource investments. J. Business. 58, 135–
Carmona, R. and Dayanik, S. (2003). Optimal multiple-stopping of linear diffusions and swing options.
Preprint. Princeton University.
Carmona, R. and Ludkovski, M. (2005). Optimal switching with applications to energy tolling agreements.
Preprint. University of Michigan.
Dayanik, S. and Egami, M. (2005). Solving stochastic impulse control problems via optimal stopping for
one-dimensional diffusions. Preprint. www.math.lsa.umich.edu/∼ egami.
Dayanik, S. and Karatzas, I. (2003). On the optimal stopping problem for one-dimensional diffusions.
Stochastic Process. Appl. 107(2), 173–212.
Dixit, A. (1989). Entry and exit decisions under uncertainty. J. Political Economy. 97, 620–638.
Duckworth, K. (2001). A model for investment decisions with switching costs. Annuls of Appl. Prob. 11(1),
239–260.
Hamadène, S. and Jeanblanc, M. (2004). On the starting and stopping problem: Applications in reversible
investments. Preprint.
Itô, K. and McKean, Jr., H. P. (1974). Diffusions processes and their sample paths. Springer-Verlag, Berlin.
Lebedev, N. N. (1972). Special functions and their applications, Dover Publications Inc., New York. Revised
edition, translated from the Russian and editied by R. A. Silverman.
Yushkevich, A. (2001). Optimal switching problem for countable Markov chains: average reward criterion.
Math. Meth. Oper. Res.53, 1–24.
	Introduction
	Optimal Switching Problems
	Characterization of switching times
	Characterization of the value functions
	Direct Method for a Solution
	Examples
	Extensions and conclusions
	An extension to the case of k2
	Conclusions
	Summary of Optimal Stopping Theory
ABSTRACT
  In this paper, we propose a direct solution method for optimal switching
problems of one-dimensional diffusions. This method is free from conjectures
about the form of the value function and switching strategies, or does not
require the proof of optimality through quasi-variational inequalities. The
direct method uses a general theory of optimal stopping problems for
one-dimensional diffusions and characterizes the value function as sets of the
smallest linear majorants in their respective transformed spaces.

<|endoftext|><|startoftext|>
Introduction
The standard model (SM), a local quantum field theory, has served so far
as a very good description of elementary particle processes [1]. It is however
widely believed that soon, when higher energies are experimentally accessible,
new phenomena may emerge that require a description that goes beyond the
standard model. Among the various the possibilities, is the possibility that
a composite nature of the standard model constituents may be revealed [2]
and a possible failure of locality [3]. It is possible that the underlying physics
∗e-mail address:sdj@iitk.ac.in
http://arxiv.org/abs/0704.0995v2
is nonlocal at shorter distances which could be a result of composite struc-
ture of particles, or granularity of space-time, or underlying noncommutative
structure of space-time [4]. With a nonlocal interaction, often goes causality
violation that can arise because the interaction region, encloses points sep-
arated by a space-like interval. Causality violation has been studied in the
context of non-local [5] and non-commutative quantum field theories [6, 7]. It
has in fact been suggested [8, 5] that non-local quantum field theories [9, 10]
may indeed serve as effective field theories for a deeper/more fundamental
theory such as a composite model; and the former indeed show causality
violation[10, 5]. An effective tool to study causality has been developed by
Bogoliubov and Shirkov [11] and has been in particular employed for the
causality violation in non-local [5] and non-commutative QFT’s [7]. We wish
to consider the following question: in view of the possible composite nature
of elementary particles, leading to extended structures, will these leave an
observable effect in the form of a violation of causality and locality that can
be detected? A similar question regarding a violation of the Pauli exclusion
principle on account of the compositeness of particles has been earlier ad-
dressed to [12]. This question is particularly interesting since should there
be a signal of CV, it will be detected long before an explicit knowledge of
composite structure is known. In fact, it is has been suggested [5] that the
unknown physics at high energy scales (Λ) from a possible source can effec-
tively be represented in a consistent way (a unitary, gauge-invariant, finite (or
renormalizable) theory) by a nonlocal theory at energies lower than Λ, but
higher than the present ones. In other words, the nonlocal standard model,
with a parameter Λ, can serve as such an effective field theory and will afford
a model-independent way of consistently reparametrizing the effects beyond
standard model . In this model, one finds that there is but a small CV at
low energies, which grows rapidly as energies approach Λ and beyond these,
the fundamental theory is expected to take over and presumably it leads to
no CV again. The aim of the present work is to approach this question in
a model-independent way in connection with a composite structure of SM
constituents.
2 Preliminary
2.1 Definition of the problem
Suppose that the presently known standard model particles are a composite
of a set of finer constituents. Suppose that these underlying constituents
belong to a local causality-preserving fundamental theory. Suppose, further
that at lower energies, one only observes the composite bound states and
their scattering processes. These bound state particles are extended objects.
A priori, their interaction is expected to be non-local. A nonlocal covariant
interaction has, at a given instant, interaction spread over a region in space,
which therefore contains spatially separated points. An obvious question
arises: will the interactions of the composite theory be such that causality
is preserved by this low-energy theory? We need the fundamental theory
for energy scales >> Λ, and for energy scales << Λ, we have the set of
composite particles described by the ”derived” theory. Then the question,
paraphrased differently is, will the phase transition (should there be one)
from the fundamental to the composite be causality preserving or it could
lead to a breakdown of causality at short enough distances?
2.2 Definition of the system
Let, for simplicity, the fundamental theory, denoted by F , be character-
ized by a single coupling constant g. For the purpose of formulation of the
Bogoliubov-Shirkov (BS) criterion of causality, we shall need to formulate
a theory with a variable coupling g(x). Let the low-energy derived theory
be characterized by its own coupling constant λ ≡ λ [g], which for identical
reasons, we shall need to allow to depend on space-time: λ = λ (x). We shall
assume, for simplicity, that the derived theory, denoted by C, is completely
described by its scattering states: i.e. we shall assume that the model admits
no bound states. A scattering state of the derived theory can be looked upon
from two different point of views:
⋆ a scattering state, as t → ±∞, is a state of a certain set of non-
interacting composite particles of the low energy theory with certain
momenta, polarizations etc.
⋆ a scattering state, as t → ±∞, is a (complicated) configuration of fields
of the fundamental theory.
3 Causality formulation for a theory without
a well-defined S-matrix
Bogoliubov and Shirkov have shown [11] that S−operator is causal only if it
satisfies,
δg (x)
δg (y)
= 0, x <∼ y (1)
(Here, x < y⇐⇒ x0 < y0 and x ∼ y ⇐⇒ (x − y)
2 < 0 ). The condition is
obtained from the primary meaning of causality that a disturbance does not
propagate outside the forward light-cone (the disturbance considered is that
in g(x)1), and is independent of any specific field theory formulation. The
BS causality criterion holds for a theory for which an S-operator is defined.
For a theory such as QCD, some of the matrix elements of the S-operator
may not exist on account of the infrared divergences. It is nonetheless true
that an alternate formulation in terms of the U-operator (i.e. U (−T, T ′))
can be given. This is so because, the U operator is unitary as much as the
S-operator and the BS criterion depends on two points x, y with x <∼ y
which can always be chosen to be such that they both lie in (−T, T ′). The
relation would then read
δg (x)
δg (y)
= 0, x <∼ y; −T < x0, y0 < T
′ (2)
It is possible to alternately formulate the causality condition in terms of
the following choices of the couplings. [This way results when we suitably
integrate (2) over x0 < 0 and y0 > 0]. In this approach, we make a comparison
of the following two neighboring theories in the coupling constant space2:
1. Fundamental theory F ′: Coupling constants = g′2 ( a constant value)
for x0 > 0 and g1(a constant value) for x0 < 0. Corresponding derived
theory is C′.
2. Fundamental theory F ′′: Coupling constants = g′′2 ( a constant value)
for x0 > 0 and g1(the same constant value) for x0 < 0. Corresponding
derived theory is C′′.
1One may consider varying g(x) an unphysical operation, but one can look alternately
upon varying g(x) at a point x0 as insertion of a (specific) local operator
∂g(x)
at x0 and
study the propagation of its effects.
2The idea of varying the coupling with time over all space is not an entirely unfamiliar
one: it is also employed in the LSZ formulation.
All the coupling constants are (chosen to be) space-independent. It suffices
for our purpose that g′2 differs infinitesimally from g1 and g
2 . (We can in fact
assume that the infinitesimal change from g1 to g2 is carried out adiabatically
and in an infinitesimal time). Then, we can alternately formulate [13] the
causality condition as,
U [g1, g
2 ;−T, T
′]U † [g1, g
2;−T, T
′] is independent of g1 (3)
This alternate formulation makes mathematics simpler, though it may lead
to an unusual-looking Physics.
In the following, we shall adopt a ”reductio ad absurdum” approach: We
shall let, if possible, that the theory C be causality-preserving and deduce
the consequences of causality of F for C and analyze these.
4 Relations between the derived theory and
the fundamental theory
4.1 Relations between coupling constants
The coupling constant λ is a function of g. If we allow a space-time dependent
coupling, then λ = λ [g]. A small change3δg (x) in the coupling g (x) about
g(x) = g = constant, will cause a change in λ (x) as given by4 δλ(z) =
δλ(z)
δg(y)
g(y)=g
δg (y). For the BS criterion of causality of Eq. (2), we need
to know the impact of a localized change g(x) → g (x) + δg (x) ≡ g (x) +
εδ4 (x− x̃) on the function λ (x). Now, if causality is valid, λ (x) cannot
be affected for any x0 < x̃0. Assuming that the theory has T-invariance
λ (x) cannot be affected for any x0 > x̃0. Thus, this together with causality
requires that,
λ (x) → λ (x) + Cεδ4 (x− x̃)
3For the argument presented subsequently, we shall go back to a general space-time
dependent coupling and not confine ourselves to the specific couplings presented in the
previous section.
4 We shall assume the existence and non-vanishing of
δλ(z)
δg(y)
g(y)=g
. By translational
invariance, this quantity is a function of (z − y) and is independent of the point z as such.
5We shall need that the theory F with a variable coupling has a T-invariance. This is
possible to formulate a time-reversal transformation for a theory with a variable g (x): we
need to define the action of time-reversal as Tg (x, t)T−1 = g (x,−t).
+ terms having finite order derivatives of delta function
Thus,
δλ (z)
δg (y)
= Cδ4 (z − y)
+ terms having finite order derivatives of delta function.
Then, for a constant small change δg = ε, for all x0 > 0, [i.e. δg(x) = ǫθ(x0)];
we find,
δλ(z) =
δλ(z)
δg(y)
δg (y)
Cδ4 (z − y) + derivatives of delta function
εθ (y0)
= C ′εθ (z0) for z0 > 0.
We shall denote by λ′2 = λ[g
2, g1] and λ
2 = λ[g
2 , g1].
4.2 Relation between states
We shall work in the interaction picture of C. Let the derived theory C′
have as incoming states6 {|c̃m (λ1,−T )〉} which, as −T → −∞, represents
scattering states with a number of free composite particles. We shall keep
T finite and will let T → ∞ only at the end of the argument. Evidently, as
−T → −∞, |c̃m (λ1,−T )〉 depends on λ1 only through the self-interaction
of each individual non-interacting particle in the state. Let H̃ denote the
Hilbert space of states of C′. Then the hypothesis that the scattering states
of C′ forms a complete set implies that the set {|c̃m (λ1,−T )〉} spans H̃:
H̃ ≡ sp {|c̃m (λ1,−T )〉}. We shall denote by H, the Hilbert space of states
of F ′ (and likewise for F ′′ ). Consider a state |c̃m (λ1,−T )〉 ∈ H̃ in the
interaction picture. On physical grounds, we know that there is a corre-
sponding state of F ′ in the interaction picture, denoted by |cm (g1,−T )〉.
We note that H can, in addition, have states linearly independent of the
states {|cm (g1,−T )〉}. We augment this set to complete an orthonormal ba-
sis {|cm (g1,−T )〉} ∪ {|βn (g1,−T )〉} ≡ {|αp (g1,−T )〉} for H. We shall call
6For technical simplicity, we shall assume that the set of states is countably infinite.
the span of {|cm (g1,−T )〉} by Ĥ ⊂ H. A similar discussion holds for F
Let us now consider the time-evolution, from t = −T to t = T ′, of a single
particle state of C′′ denoted by |s̃p (λ1,−T )〉, which belongs to the basis of
H̃. The unitary time evolution operator Ũ [λ1, λ
2;−T, T
′] as applied to the
state leads to
Ũ [λ1, λ
2;−T, T
′] |s̃p (λ1,−T )〉 = |s̃p (λ
′)〉 ∈ H̃ (4)
This state is a single particle state of slightly different mass, on account of a
slightly different self-energy, and interacts with a coupling λ′′2. We shall also
introduce interaction picture states
d̃m (λ
. These states are at t = T ′
and as T ′ → ∞ consist of a set of non-interacting (but self-interacting) parti-
cles of a slightly different mass and coupling constant λ′′2. These are analogues
of the ”out” states. We shall assume that these also span H̃. We shall further
make a convention: Under time reversal, the quantum numbers of particles
in the state |c̃m (λ1,−T )〉 become those of
d̃m (λ1, T
. Now, consider an
exclusive process in C′′. The magnitude of the quantum mechanical ampli-
tude for it, as seen from C′′and F ′′ are identical, as these are, in principle,
experimentally observable:
|ũnm| ≡
d̃n (λ
Ũ [λ1, λ
2;−T, T
′] |c̃m (λ1,−T )〉
≡ |〈dn (g
2 , T
′)|U [g1, g
2 ;−T, T
′] |cm (g1,−T )〉| ≡ |unm| (5)
Here, we have introduced states |dn (g
2 , T
′)〉 in H analogous to
d̃n (λ
in H̃. We note that U here is the U−matrix in the interaction picture of
F ′′, as the set of states {|cm (g1,−T )〉} evolve according to the interaction
Hamiltonian H′I (g) (in the interaction picture) of the F
First we note that on account of unitarity of Ũ and Eq. (5),
d̃n (λ
Ũ [λ1, λ
2;−T, T
′] |c̃m (λ1,−T )〉
|〈dn (g
2 , T
′)|U [g1, g
2 ;−T, T
′] |cm (g1,−T )〉|
d̃n (λ
Ũ [λ1, λ
2;−T, T
′] |c̃m (λ1,−T )〉
|〈dn (g
2 , T
′)|U [g1, g
2 ;−T, T
′] |cm (g1,−T )〉|
So, the unitarity of U implies,
〈dn (g
2 , T
′)|U [g1, g
2 ;−T, T
′] |βm (g1,−T )〉 = 0
〈βn (g
2 , T
′)|U [g1, g
2 ;−T, T
′] |cm (g1,−T )〉 = 0 (8)
The relations (8) implies that U is a block-diagonal matrix. The unitarity of
U then implies that the block corresponding to the subspace ∧H, viz. Û , is
also unitary. We shall now attempt relate these further. In this connection,
we recall a result for a finite dimensional matrices:
Lemma : Let U and U ′ be two N ×N unitary matrices satisfying: |u′ij| =
|uij|; 1 ≤ i, j ≤ N . Then, there exist phases {θi : i = 1, 2, . . . , N}
and {φi : i = 2, . . . , N} such that u
ij = uij exp [i (θi + φj)] : 1 ≤ i, j ≤
N withφ1 ≡ 0..
Proof : Let the diagonal elements of U ′ and U be related by: u′ii = exp (iΘi)uii.
We define U ′′ by u′′ij = exp (−iΘi)u
ij. Then, u
ii = uii. Now U
′′ is unitary and
thus, a priori, has N2 independent parameters. The information on moduli
of elements constitutes (N − 1)2 independent conditions, corresponding to
an (N − 1)× (N − 1) dimensional submatrix; the rest of the (2N − 1) mod-
uli being determined by relations implying that the norm of each row and
column is unity. The relations u′′ii = uii imply additional N relations on the
phases on uii.This leaves N
2 − (N − 1)2 −N = N − 1 free parameters. The
phases of u′′1j , 2 ≤ j ≤ N are unconstrained by |u
ij| = |uij| : 1 ≤ i, j ≤ N
and u′′ii = uii and we define u
1j = u1j exp (iφj), 2 ≤ j ≤ N . Then, there
are no free parameters and must lead to a unique U ′′. Now, U ′′ specified by
u′′ij = uij exp (iφj − iφi), 1 ≤ i, j ≤ N (φ1 ≡ 0) is such a solution. This
together with u′′ij = exp (−iΘi)u
ij leads to the result; with the definition
Θi − φi = θi.
Thus, in view of the unitarity of Ũ , and Û and (5), we write,
d̃n (λ
Ũ [λ1, λ
2;−T, T
′] |c̃m (λ1,−T )〉
≡ 〈dn (g
2 , T
′)|U [g1, g
2 ;−T, T
′] |cm (g1,−T )〉 × exp (iθ
n + iφm) (9)
We shall assume that F and C are have time-reversal invariance and derive
the consequences. Under time reversal, we know then that,
〈β|S |α〉 = 〈T α|S |T β〉 (10)
where |T β〉 is the state obtained by time-reversing the quantum numbers of
the state |β〉. In this case, it would imply, keeping in mind our choice of
definitions,
d̃n (λ
Ũ [λ1, λ
2;−T, T
′] |c̃m (λ1,−T )〉
d̃m (λ1, T )
Ũ [λ1, λ
2;−T, T
′] |c̃n (λ
′)〉 (11)
We write a similar relation for F . Putting T ′ = T , (or equivalently, noting
that the matrix elements are insensitive to T ′ and T ), we find,
φp(λ2, λ1) = θp(λ1, λ2) (12)
5 Consequence of Causality of F for C
We shall assume that the fundamental theory F is causal and deduce the
consequences for the derived theory C(C′, C′′). The causality of F implies
U [g1, g
2 ;−T, T
′]U † [g1, g
2;−T, T
is independent of g1. Hence,
Mnm ≡ 〈dn (g
2 , T
′)|U [g1, g
2 ;−T, T
′]U † [g1, g
2;−T, T
′] |dm (g
is also independent of g1 since the state vectors 〈dn (g
2 , T
′)| and |dm (g
are independent of g1 with g
2 and g
2 fixed. We shall re-express Mnm in
terms of the matrix elements of the derived theory C(C′, C′′) and deduce the
consequences. We note,
Mnm = 〈dn (g
2 , T
′)|U [g1, g
2 ;−T, T
′]U † [g1, g
2;−T, T
′] |dm (g
〈dn (g
2 , T
′)|U [g1, g
2 ;−T, T
′] |αp (g1,−T )〉
× 〈αp (g1,−T )|U
† [g1, g
2;−T, T
′] |dm (g
′)〉 (13)
〈dn (g
2 , T
′)|U [g1, g
2 ;−T, T
′] |cp (g1,−T )〉
× 〈cp (g1,−T )|U
† [g1, g
2;−T, T
′] |dm (g
′)〉 (14)
d̃n (λ
Ũ [λ1, λ
2;−T, T
′] |c̃p (λ1,−T )〉 exp [−i(θ̃
p − θ̃
× 〈c̃p (λ1,−T )| Ũ
† [λ1, λ
2;−T, T
d̃m (λ
exp−[i(θ′′n − θ
m)](15)
≡ M̃nm(λ
2, λ1) (16)
In the 3rd step, we have employed the equations (8) and in the second step,
we have employed the closure relation for F .
In the above, θ′′n ≡ θn(λ
2, λ1), θ
m ≡ θm(λ
2, λ1), and θ̃
p ≡ θp(λ1, λ
2) etc.
Thus, the expression (15) is independent of λ1:
∂M̃nm(λ
2, λ1)
= 0 (17)
6 Analysis of Causality Condition
We shall now analyze the condition (17) obtained as an implication of causal-
ity of F . For this purpose, we shall find it useful to Taylor-expand θn as
follows7:
2, λ1) = θn(λ
2, λ1(0)) + βn∆1 + γn∆2 + δn∆1∆2 + · · ·
≡ αn + βn∆1 + γn∆2 + δn∆1∆2 + · · ·
2, λ1) = αm + βm∆1 + · · · (18)
Here, ∆1 ≡ λ1 − λ1(0); ∆2 ≡ λ
2 − λ
2; β, γ, δ refer to appropriate partial
derivatives at (λ′2, λ1(0)) and λ1(0) is some value near λ1.
We note that if
θn(λ2, λ1) is a function only of its first argument (I)
then, (θ̃′′p − θ̃
p) ≡ θp(λ1, λ
2)− θp(λ1, λ
2) is zero and (θ
n − θ
m) ≡ θn(λ
2, λ1)−
2, λ1) is independent of λ1. Also, we can then carry out the sum over p
using the completeness relation and find that the independence from λ1 of
M̃nm(λ
2, λ1)
d̃n (λ
Ũ [λ1, λ
2;−T, T
′] Ũ † [λ1, λ
2;−T, T
d̃m (λ
× exp−[i(θ′′n − θ
m)] (19)
7Throughout, we have employed only the infinitesimal variations in the couplings.
These are sufficient to determine the first order partial derivatives with respect to each λ1
and λ2. Hence, we shall content ourselves with expansion only upto O(∆1∆2)
for all m,n implies Ũ [λ1, λ
2;−T, T
′] Ũ † [λ1, λ
2;−T, T
′] is independent of λ1.
This condition is indeed necessary for causality of C. In fact, in this case, we
can rewrite8
d̃n (λ2, T
Ũ [λ1, λ2;−T, T
′] |c̃m (λ1,−T )〉
≡ 〈dn (g2, T
′)|U [g1, g2;−T, T
′] |cm (g1,−T )〉
× exp (iθn(λ2, λ1) + iθm(λ1, λ2)) (20)
d̃∗n (λ2, T
Ũ [λ1, λ2;−T, T
∣c̃∗m (λ1,−T )
≡ 〈dn (g2, T
′)|U [g1, g2;−T, T
′] |cm (g1,−T )〉 (21)
by redefining states by absorbing phases:
∣c̃∗m (λ1,−T )
= e−iθm(λ1) |c̃m (λ1,−T )〉)
etc. We note that this redefinition of the states is meaningful and compatible
with causality when θn is independent of its second argument. If on the
other hand, θn is dependent on its second argument (excepting a possibility
below), we cannot absorb a phase in a manner compatible with causality : a
state
∣c̃∗m
at t = −T cannot be made to depend on the value of coupling
λ2 it would have at a later time t > 0.
We can, in fact, liberalize somewhat the above condition by requiring
that,
βn = β and δn = 0 ∀ n (II)
In this case,
(θ̃′′p − θ̃
p) ≡ θp(λ1, λ
2)− θp(λ1, λ
= β∆2 + · · · (22)
is independent of λ1 and does not depend also on p and thus comes out of
the summation in (15). The summation in (15) can be carried out using
the completeness relation. Also, (θ′′n − θ
m) ≡ θn(λ
2, λ1) − θm(λ
2, λ1) is still
independent of λ1. Thus, the entire discussion proceeds as before: in par-
ticular, as a little analysis shows, the phases can again be absorbed into the
definition of states in a manner compatible with causality.
8We have dropped primes on λ2.
While we shall not provide the general analysis of (17), we shall establish
examples of a few specific sufficient conditions for causality violation. (These
are simple conditions that, in fact, contradict I or II above) We can easily
verify the following results:
1. There is causality violation if (i) for some n, δn 6= 0 and (ii) βn =
βm ∀ n,m
2. There is causality violation if there be m 6= n such that
M̃nm(λ
2, λ1) 6= 0, when evaluated to O(∆), and βm 6= βn.
Proof : We shall let, if possible, C be causal. We can then write,
Ũ [λ1, λ
2;−T, T
′] = Ũ [λ′′2; 0, T
′] Ũ [λ1;−T, 0] (23)
Then, we can write the expression (15) as,
M̃nm(λ
2, λ1) =
d̃n (λ
2, 0)
c̃p (λ1, 0)〉 exp [−i(θ̃
p − θ̃
× 〈c̃p (λ1, 0)
d̃m (λ
2, 0)
exp−[i(θ′′n − θ
d̃n (λ
2, 0)
d̃m (λ
2, 0)
exp [−i(θ′′n − θ
m)] (24)
where X ≡
|c̃p (λ1, 0)〉 〈c̃p (λ1, 0)| exp [−i(θ̃
p − θ̃
p)]. We shall now expand
the quantities involved to the first order in the infinitesimals as in (18). In
addition, we note that to the zeroth order in ∆2, (i.e. λ”2 − λ
2 = 0), we
have, (θ̃′′p − θ̃
p) = 0 and the completeness relation leads to X = 1. We further
define,
d̃n (λ
2, 0)
d̃m (λ
2, 0)
= δnm + iηnm∆2 + · · · (25)
Proof of (i): We define δ0 ≡ max{|δn|}; and let ±δ0 = δq for some q. We
now have,
(θ̃′′p − θ̃
p) ≡ θp(λ1, λ
2)− θp(λ1, λ
= β∆2 + δp∆1∆2 + · · · (26)
and thus,
|c̃p (λ1, 0)〉 〈c̃p (λ1, 0)| exp [−i(θ̃
p − θ̃
= exp (−iβ∆2)
|c̃p (λ1, 0)〉 〈c̃p (λ1, 0)| × exp [−i(δp∆1∆2)]
= exp (−iβ∆2)
I − i∆1∆2
|c̃p (λ1, 0)〉 〈c̃p (λ1, 0)| δp
Thus,
exp (iβ∆2)
d̃q (λ
2, 0)
d̃q (λ
2, 0)
= 1 + iηqq∆2 − i∆1∆2
δp|upq|
2 + · · · (28)
where upq ≡
d̃q (λ2, 0)
c̃p (λ1, 0)〉 (We can ignore primes on λ2 in this term).
The multiplicative exponential factor in (24) becomes:
exp (−iγq∆2 − iδq∆1∆2 + · · ·) ≈ 1− iγq∆2 − iδq∆1∆2 + · · ·
. Thus,
M̃qq = 1 + iηqq∆2 − iγq∆2 − iδq∆1∆2 − i∆1∆2
δp|upq|
2 + · · ·
= 1 + iηqq∆2 − iγq∆2 − i∆1∆2
[δq + δp]|upq|
2 + · · · (29)
In view of the fact that either δp + δq ≥ 0 ∀ p or δp+ δq ≤ 0 ∀ p the last term
is necessarily non-vanishing and dependent on ∆1
Proof of (ii): Consider the matrix element
M̃nm(λ
2, λ1) ≡
d̃n (λ
2, 0)
d̃m (λ
2, 0)
exp−[i(θ′′n − θ
m)] 6= 0(30)
for n 6= m. To the first order in the infinitesimals, the nonzero matrix element
d̃n (λ
2, 0)
d̃m (λ
2, 0)
is independent of ∆1. The multiplicative phase factor,
exp−[i(θ′′n − θ
m)] = exp{−i(αn − αm)− i(βn − βm)∆1 − iγn∆2}
is necessarily dependent on ∆1, thus implying causality violation.
9There is the obvious exception that δp = −δq for every such p such thatupq 6= 0; and
this has to be valid for each such q for which δq = ±δ0.
7 Additional comments
We comment in a qualitative way upon how a phase factor depending on
both values of the coupling can arise. Suppose that the derived theory C
is actually correctly described by a nonlocal covariant theory with a finite
non-zero non-locality scale ∆ ∼ 1/Λ. Since the theory is covariant, it is also
non-local in time. We write,
Ũ(λ1, λ2;−T, T
′) = Ũ(λ2; ∆, T
′)Ũ(λ1, λ2;−∆,∆)Ũ(λ1;−T,−∆) (31)
where the first and the third factors on the right hand side depends only on
one value of the coupling due to finite size of non-locality in time. The sec-
ond factor however depends on both couplings because in this time-interval
(−∆,∆), time evolution depends on both values of the coupling λ. On the
other hand, the fundamental theory, being local and causal, however has no
such analogue . The matrix Ũ(λ1, λ2;−∆,∆) can then give rise to phases
depending on both couplings in relation (9).
Naively, one may expect that if the fundamental theory is causal, the de-
rived theory should be so. Examples are however known where the diagrams
of the fundamental theory are associated with a different weight in the actual
phenomenology. For example, OZI rule in hadronic phenomenology gives a
suppression of a subset of the QCD diagrams. While such a possibility is
distinct from what is discussed in this work, generally such a modification
of the amplitudes within the fundamental theory may alter the underlying
properties of the fundamental theory such as causality.
References
[1] See e.g. Reviews of Particle Properties in W.-M. Yao et al., Journal of
Physics G 33, 1 (2006).
[2] See e.g. H. Harari, Phys.Rept.104:159,1984
[3] C. Bourrely , N.N. Khuri , Andre Martin , J. Soffer , Tai Tsun Wu
hep-ph/0511135; N.N. Khuri hep-ph/9512386
[4] See e.g. R.J. Szabo, Phys.Rep. 278,207 (2003).
http://arxiv.org/abs/hep-ph/0511135
http://arxiv.org/abs/hep-ph/9512386
[5] S. D. Joglekar, and A. Jain, Int. J.Mod. Phys. 19, (2004), S.D. Joglekar,
hep-th/0601006;
[6] See e.g. M. Chaichian, K. Nishijima, A. Tureanu, Phys.Lett. B568:146-
152,2003; O.W. Greenberg, Phys.Rev.D73:045014,2006
[7] A. Haque and S.D. Joglekar, hep-th/0701171
[8] S.D.Joglekar, J. Phys. A 34, 2765 (2001); S.D.Joglekar,Int.J.Mod.
Phys.A 16, (2001).
[9] E. D. Evens et al, Phys Rev D43, 499 (1991)
[10] G. Kleppe, and R. P. Woodard, Nucl. Phys. B 388, 81(1992).
[11] N. N. Bogoliubov, and D. V. Shirkov, Introduction to the theory of
quantized fields (John Wiley, New York, 1980) pg. 200-220.
[12] K. Akama et al Phys. Rev. Lett. 68, 1826 (1991); O.W. Greenberg, R.N.
Mohapatra Phys.Rev.Lett.59:2507,1987, Erratum-ibid.61:1432,1988;
Phys.Rev.Lett.62:712,1989, Erratum-ibid.62:1927,1989
[13] For a simpler and intuitive understanding of the causality condition in
either form, see e.g. S.D. Joglekar, hep-th/0601006
http://arxiv.org/abs/hep-th/0601006
http://arxiv.org/abs/hep-th/0701171
http://arxiv.org/abs/hep-th/0601006
	 Introduction
	 Preliminary
	 Definition of the problem
	 Definition of the system
	 Causality formulation for a theory without a well-defined S-matrix
	 Relations between the derived theory and the fundamental theory
	 Relations between coupling constants
	 Relation between states
	Consequence of Causality of F for C
	Analysis of Causality Condition
	Additional comments
ABSTRACT
  We study the question of whether a composite structure of elementary
particles, with a length scale $1/\Lambda$, can leave observable effects of
non-locality and causality violation at higher energies (but $\lesssim
\Lambda$). We formulate a model-independent approach based on
Bogoliubov-Shirkov formulation of causality. We analyze the relation between
the fundamental theory (of finer constituents) and the derived theory (of
composite particles). We assume that the fundamental theory is causal and
formulate a condition which must be fulfilled for the derived theory to be
causal. We analyze the condition and exhibit possibilities which fulfil and
which violate the condition. We make comments on how causality violating
amplitudes can arise.

<|endoftext|><|startoftext|>
Introduction.
Higher dimensional spacetimes are now an essential aspect of effective field
theories arising from fundamental theories of quantum gravity. The general as-
sumption implicit in such constructions was that the extra spatial dimensions
are compactified to ultrashort length scales. Hence quantum gravity effects were
relegated to very high energy scales. However in recent years the exciting possi-
bility of low scale quantum gravity effects in the brane world models have inspired
considerable interest and interesting phenomenological consequences [1–3]. The
brane world scenario envisaged the gauge sector of the fundamental interactions
to be restricted on a smooth codimension one hypersurface ( refered to as a brane)
embedded in a higher dimensional space-time and the electroweak scale as the
fundamental scale. The usual four dimensional Planck scale was then a derived
scale. In particular the Randall-Sundrum models and their variants based on a
warped non factorable compactification geometry in a bulk Anti deSitter ( AdS)
space time offered a partial resolution to the vexing hierarchy problem [4]. Al-
though the analysis was valid in a linearized framework a full non linear study
from a supergravity perspective confirmed the conclusions and their extension to
any Ricci flat geometry on the brane [5, 6].
For consistency the brane world scenario requires generic four dimensional
gravitational configurations on the brane to arise from a higher dimensional bulk.
The investigation of black hole configurations in this context has been an exciting
aspect of the study of brane world gravity [5]. Such a black hole on the brane
is expected to be a configuration extended in the bulk. Chamblin, Hawking and
Reall [7] attempted the description of a Schwarzschild black hole in a typical single
three brane five dimensional Randall-Sundrum brane world as a bulk black string.
This reproduced the usual Schwarzschild singularity on the brane but additionaly
was also singular at the AdS horizon far away from the three brane. Although a
pathology, this singularity was possibly a linearization artifact and could be shown
to be a mild p-p curvature singularity. The bulk black string was subject to the
usual instabilities against long wavelength perturbations [8,9] and was expected to
pinch off to a cigar geometry before reaching the AdS horizon. However the issue
of stability is contentious and for sphericaly symmetric solutions it was shown
that a more likely scenario is a transition to a non uniform black string [10]
In an earlier article [11]we have generalized the construction of Chamblin et.
al. [7]to consider rotating black holes in a five dimensional single three brane
RS brane world. The bulk configuration proposed was a five dimensional rotat-
ing black string which intercepted the three-brane in a four dimensional rotating
black hole described by a Kerr metric on the three brane. It was found that
the Kerr solution too was singular at the AdS horizon apart from the usual ring
singularity on the brane. The asymptotics of the equatorial geodesics at the AdS
horizon also indicated a p-p curvature singularity although an explicit determina-
tion was computationaly intractable. There have been other approaches to brane
world black holes including numerical studies for off brane metrics and a Hamil-
tonian constraint approach to charged black holes [12–18]. In lower dimensions
exact studies of brane world black holes [19] involving the AdS C-metric have
indicated that the bulk solutions are regular everywhere emphasizing that the
bulk singularity in higher dimension is possibly a linearization artifact. However
absence of exact bulk metrics in higher dimensions requires a linearized approach
and the black string framework is hence physicaly relevant in this context in spite
of such a bulk singularity.
The brane world constructions must be embedded in an appropriate string
theory for consistency, requiring the generalizations of these models to higher
dimensions. The generalization of the Randall-Sundrum construction and its
variants to higher dimensions with a single space like AdS direction and an ap-
propriate codimension one brane is straightforward. Additionaly this may easily
be extended to include the full non linear extensions of a Ricci flat metric [20–22].
In higher dimensions also the consistency of such brane world constructions re-
quire that gravitational configurations arise from appropriate bulk scenarios. In
particular this applies to higher dimensional black holes on the codimension one
brane. In this context in an earlier article [23]we had described the N dimensional
rotating Myers-Perry [24]black hole on a single (N-1) brane in a (N + 1) dimen-
sional RS brane world. The bulk solution in this case was a (N+1) dimensional
rotating black string extended in the AdS direction transverse to the (N-1) brane.
Analysis of equatorial geodesics again indicated a p-p curvature singularity in the
bulk apart from the usual extended singularity on the (N-1) brane.
In the recent past there has been remarkable and surprising progress in under-
standing higher dimensional black holes. In particular it has been realized that
the no hair and the uniqueness theorems are much less restrictive in higher dimen-
sions [27]. In four dimensions the no hair theorem characterizes any stationary
asymptoticaly flat black hole solution of Einstein-Maxwell system only by their
mass, angular momentum and conserved charges whereas the uniqueness theorem
forbids event horizons of non spherical toplogies. However the discovery [25] in
five dimensions of an asymptoticaly flat stationary black hole solution with a non
spherical ring like S2×S1 horizon topology with the possibility of dipole charges,
showed that higher dimensional black holes posess remarkably distinctive prop-
erties. The static black ring solution [28] was first obtained through the Wick
rotation of a neutral solution of an Einstein-Maxwell system [29] although they
involved conical singularities. However the stationary solution rotating in the
S1 direction was regular everywhere except the usual curvature singularity. For
fixed mass the angular momentum of the black ring was bounded below and for a
certain range of parameters two black rings and a usual five dimensional rotating
Myers-Perry black hole all with the same mass and spin coexist. The charged
versions of these black rings were first obtained in the framework of D=5 heterotic
supergravity [30] and fully supersymmeric three charged black ring solutions in
D=5 followed later from compactifications of black supertubes in D=10 [31, 32].
It was seen that these black rings could also support gauge dipoles independent
of the conserved gauge charges entailing an infinite non uniqueness and violating
the no hair theorem [33].
As emphasized earlier, for consistency of the brane world scenario it is im-
perative that gravitational configurations like black holes on the brane should
arise from appropriate bulk solutions. In this context it is but natural to inves-
tigate possible bulk configurations in a higher dimensional brane world scenario
which would describe five dimensional black rings on the brane. This is espe-
cialy relevant for the neutral rotating black rings as they are Ricci flat and hence
satisfy the criteria for embedding in higher dimensional Randall-Sundrum brane
worlds. Naturaly the absence of exact solutions in higher dimensions require the
usual linearized framework to analyse this question. The black string approach
is especialy relevant in this context to highlight the physical aspects of such an
embedding although it suffers from singular pathologies which are possibly lin-
earization artifacts.
In this article we address this issue and show that it is possible to consistently
embed the five dimensional black ring solution on a single four brane in a (5 + 1)
dimensional Randall-Sundrum brane world. Following the black string approach
we consider a six dimensional bulk rotating black string extension of the five
dimensional black ring. This bulk configuration intercepts the four brane in a
five dimensional rotating black ring. In what follows after a brief review of neutral
rotating black rings, we obtain their geodesic equations in the plane of the ring
analogous to the equatorial plane of black holes with spherical topologies. We
further investigate the asymptotic behaviour of both the null and the timelike
geodesics in this plane to elucidate the restricted causal structure of the black ring
space time. In section three we consider a bulk rotating black string extension
of a five dimensional neutral rotating black ring in a six dimensional RS brane
world with a single four brane. The bulk black string intercepts the four brane in
a five dimensional black ring with the usual spacelike curvature singularity on the
brane. Additionaly a curvature singularity also appears at the AdS horizon far
away from the four brane. Following the description of a black ring as a boosted
black string with periodic identification in a certain limit, the bulk solution may
be described as a boosted black two brane with the same periodic identification.
We then construct the six dimensional bulk geodesics in the plane of rotation of
the ring and show that their projections on the four brane reproduces the usual
five dimensional black ring geodesics in the same plane. To study of the nature
of the pathological singularity at the AdS horizon we further investigate the late
time asymptotics of these geodesics. It is shown that the curvature remains
finite along unbound geodesics which reach the AdS horizon. We also discuss the
possibility of the bulk solution to pinch off before reaching the AdS horizon due
to the usual instabilities and comment on the possible stable solution in the light
of the analysis outlined in [9] and [10]. In the last section we provide a summary
of our analysis and results and also discuss certain future open issues in this area.
2 The Rotating Neutral Black Ring .
In this section we first briefly review the neutral rotating black ring and eluci-
date the nature of the adapted coordinate system . We then construct the black
ring geodesics restricted to the plane of rotation of the ring which is analogous
to the equatorial geodesics in solutions with a spherical topology. Furthermore
we analyse the geodesic equations to study the nature of the radial orbits for
this plane and their asymptotics. The static neutral black ring was originally
discovered through a Wick rotation of certain Kaluza Klein C metrics decribing
neutral bubbles [29]. These involved conical singularities and consequent deficit
angles leading to either cosmic string defects joining these singularities or deficit
membranes. However an analytic continuation led to the original neutral rotat-
ing black ring solution which was a five dimensional asymptoticaly flat black hole
with a ringlike S2×S1 horizon topology, regular everywhere except at a spacelik
e curvature singularity. The original solution was further refined through appro-
priate factorizable choice of certain functions appearing in the metric [26,30–32].
The rotating black ring in equlibrium was parametrized by a dimensionless re-
duced angular momentum j = 27π
which was bounded from below for a fixed
mass. It could be shown that in the range 27
≤ j2 < 1 there existed one Myers-
Perry black hole with spherical topology and two black rings with identical mass
and angular momenta, in direct violation of the black hole uniqueness theorem.
2.1 Black Ring Metric
The metric of the neutral rotating five dimensional black ring in a specific adpated
coordinate system which is obtained from the foliation of space-time in terms of
the equipotentials of certain 1-form and 2-form gauge potentials is , [32]
ds2 = −
F (y)
F (x)
dt− C R
1 + y
F (y)
(x− y)2
F (x)
−G(y)
F (y)
dψ2 − dy
F (x)
, (1)
where the functions
F (ξ) = 1 + λξ, G(ξ) = (1− ξ2)(1 + νξ) , (2)
λ(λ− ν)
1 + λ
. (3)
Here R is a length scale which may be interpreted as the radius of the ring in
some limit [32] and the two dimensionless parameters λ and ν which are related
to the shape and the rotation velocity of the ring lie in the range
0 < ν ≤ λ < 1 (4)
. The range of the spatial co-ordinates (x, y) are required to be,
− 1 ≤ x ≤ +1 , −∞ ≤ y ≤ −1 . (5)
respectively.
The constant y hypersurfaces are nested deformed solid toroids with topology
S2 × S1, whereas the coordinate x is like a direction cosine, x = +1 points to
the interior of the ring and x = −1 points to the region outside the ring. The
solution is a stationary axisymmetric solution with rotation in the ψ direction,
and admits t, φ, and ψ Killing isometries.
In order to avoid conical singularities at the fixed points x = −1 and y = −1 of
the Killing isometries ∂φ and ∂ψ the co-ordinates ψ and φ require to be identified
with the equal periods
∆ψ = ∆φ = 4π
F (−1)
|G′(−1)|
. (6)
Furthermore the requirement that the orbits of the isometry ∂φ shows no deficit
angles at x = +1 lead to the condition
1 + ν2
The co-ordinates (x, φ) parametrize a two-sphere S2, the co-ordinate ψ parametrizes
a circle S1 and the solution describes a black ring having a regular horizon of
topology S1 × S2 and rotating in the S1 plane. However the horizon geometry is
not a simple product of S2 and S1 as the two sphere S2 is deformed there and
the deformation grows away from the horizon.
The metric reduces to a conventional five dimensional Myers-Perry black hole
with rotation in a single plane if, instead of (7), we consider the limit, R → 0,
(λ, ν) → 1 and the parameters
, a2 = 2R2
(1− ν)2
, (8)
are held constant. In this case the co-ordinates (x, φ, ψ) characterises a three-
sphere S3 which is a regular horizon of a five dimensional Myers-Perry black hole.
The ergosphere and the event horizon of the black ring are located at y = −1/λ
and y = −1/ν respectively. At y = −∞ there is a spacelike curvature singularity
inside the horizon. Asymptotic infinity is reached as (x, y) → −1.
The ADM mass and angular momentum are given as
λ(λ− ν)(1 + λ)
(1− ν)2
. (10)
The curvature squared for the black ring spacetime is computed to be,
RµνρσR
µνρσ =
6ν2(1 + ν2)2Q(x, y)
R4(1 + ν2 + 2νx)6
(x− y)4, (11)
where Q(x, y) is a poynomial of degree six in x and y. Hence there is a spacelike
curvature singularity at y = −∞ inside the event horizon. In terms of the Myers-
Perry co-ordinates (t, r, θ, ψ, φ) the difference (x−y) goes like 1/r2 at large r, i.e.
towards spatial infinity, so that the curvature squared goes as
RµνρσR
µνρσ ∼
as obtained in the case of five dimensional Myers-Perry black hole.
The rotating black ring in the limit of large radius R may be described after
appropriate coordinate redfinitions as a Schwarzschild black string boosted and
periodically identified along the translation invariant direction with a period 2πR
[30, 32, 33]. The black string metric is given as
ds2 = dw2 − (1−
)dt2 + (1−
)−1dr2 + r2dΩ2
, (13)
where the horizon is at r = r0 and w is the translation invariant direction. The
parameter ν = r0/R is seen to correspond to the thickness of the ring or the
ratio of the radius of the S2 at the horizon and the ring radius R . The ratio
λ/ν then measures the speed of rotation of the ring in the S1 direction and the
coordinate ψ = w/R corresponds to a redefined translation invariant direction of
the black string which is periodically identified as w = w + 2πR. The speed of
rotation is related to the local boost velocity given by
1− (ν/λ) and reduces to
1− (ν2/2) for the black ring space time to exclude any conical singularities. [30]
2.2 Black Ring Geodesics
The first order geodesic equations may be derived using the canonical framework
[34] from the Lagrangian
gµν ẋ
µẋν , (14)
here µ, ν = 0...4 and the covariant components of the metric tensor are as defined
in the previous section and ẋµ = dxµ/dρ with the affine parameter ρ = τ/m
[36]for time like geodesics, τ being the proper time andm the mass of the particle.
Consequently, for both time like and null geodesics the momenta are pµ = ẋµ.
The covariant momenta may be directly obtained from the Lagrangian and are
given as pµ = gµν ẋ
µ. The norm of the conjugate momenta is then given as,
gµνpµpν = −ǫm2 (15)
where gµν are the contravariant components of the black ring metric and ǫ = (0, 1)
for null and time like geodesics respectively.
The black ring spacetime admits three Killing isometries generated by the
vector fields ∂t, ∂ψ, and ∂φ corresponding to time translation and the two rotation
isometries in the coordinates φ, ψ. These isometries provide three conserved
conjugate momenta, pt = −E, pψ = Ψ, pφ = Φ. We consider the geodesics
restricted to the plane of rotation of the black ring, outside the ring, i.e, x = −1.
It is analogous to an equatorial plane in the spherical case in the sense that it
is reflection symmetric and hence geodesics in it with zero initial velocity in the
transverse x direction will continue to remain in the plane. The plane x = −1
being a fixed point of the ∂φ isometry, the gφφ component of the metric tensor
goes to zero smoothly there. The geodesic equations of motion in the equatorial
plane for the t and φ directions are obtained directly from the conserved conjugate
momenta. These turn out to be as follows:
1 + λy
(1− λ)2
(1 + y)4
(1 + λy)G(y)
E − C(1 + y)
R(1− λ)(G(y)
Ψ (16)
CR(1 + y)3
(1− λ)G(y)
(1 + y)2(1 + λy)
R2(1− λ)G(y)
Ψ (17)
The form of the y equation for geodesic motion in the equatorial plane is obtained
directly from eqn. (15) to be,
)2 + gyy
gttE2 − 2gtψEΨ+ gψψΨ2 + ǫm2
= 0, (18)
where gyy = 1/gyy, g
tt = gψψ/D, g
ψψ = gtt/D, g
tψ = −gtψ/D and D = gttgψψ −
Thus, the y equation may be expressed as
ẏ2 = − (1 + y)
(1− λ)2R2
C2(1 + y)3 + (1− λ)2(1 + νy)(1− y)
F (y)
2C(1 + y)2
(1 + λy)(1 + y)
− ǫ(1− λ)(1 + νy)(1− y)m2
where ǫ = (0, 1) for null and timelike geodesics respectively. It should be noted
that the co-efficient of E2 in the r.h.s of the above equation remains finite and
smooth at the ergosphere, y = −1/λ, even though the function F (y) in the
denominator vanishes. The eqn (19) should be compared with that appearing in
[35] for the null geodesics in the plane of the ring, where a a certain normalization
of the metric components have been chosen at asymptotic infinity.
The y co-ordinate ranges over the plane of rotation of the ring from the
curvature singularity to asymptotic infinity and the above equation is analogous
to particle motion in a central potential
ẏ2 + Veff(y;E,Ψ) = 0 (20)
Towards asymptotic infinity, (x, y) → −1, the effective potential for time like
geodesics tends to
Veff(y;E,Ψ) → −
2(1− ν)
R2(1 + λ)
η3(E2 −m2), (21)
where η tends to 0 towards asymptotic infinity and is given by η = −(1 + y).
Unbound time like geodesics can exist only when E2 −m2 > 0 in which case
the effective potential Veff is negative at large distances and approaches zero at
asymptotic infinity (x, y = −1). For the case E2 < m2 only bound geodesics
exist, in the sense that such geodesics do not reach upto asymptotic infinity.
Stable bound orbits are bound orbits which do not end up in the singularity. It is
common knowledge that stable bound orbits do not occur in a higher dimensional
central potential, even in the case of Newtonian gravity. Thus it is expected that
such orbits must be excluded from higher dimensional black hole space times.
This was explicitly shown for the equatorial geodesics of a five dimensional Myers-
Perry black hole in [36]. This conclusion is expected to also hold for the class
of geodesics restricted to the plane of rotation of the ring being considered here.
Their existence is indicated by the presence of stable circular orbits. For circular
orbits, we have the condition
Veff(y = yc) = 0 ,
∂Veff (y)
= 0 (22)
where y = yc is the ‘radius’ of the circular orbit.The condition for stability of the
circular orbit is
∂2Veff
> 0. (23)
−2.2 −2 −1.8 −1.6 −1.4 −1.2 −1
Figure 1: Plot of black ring effective potential for ν = 0.46, L = 4.40145 and
three different values of E as indicated in the box. Motion is allowed only in the
region where Veff < 0. The constants m = R = 1. It is apparent that there are
no stable bound orbits. E = 2.0 is close to having an unstable circular orbit,
whereas for E = 2.02 there are no inaccessible regions. Since E > 1 all the three
curves exhibit unbounded orbits. The case for E < 1 shows an exactly similar
behaviour as regards the bound orbits.
We get two simulataneous biquadratic equations in E and Ψ from Eq(22)
which can be solved in terms of the radius yc for a black ring of specific ν. These
values of Ec and Ψc can be then substituted into (23) to obtain a function of yc
for a specific black ring [36]. It is difficult to interpret the analytic expressions
for Ec,Ψc and that of Eq.(23) in terms of yc. However, numerical plots have been
obtained in Fig. 1 for the effective potential Veff(y) against y which clearly shows
that stable bound orbits are ruled out both for E2 > m2 and E2 < m2 .
3 Brane World Black Ring
In this section we very briefly outline the construction of the Randall-Sundrum
braneworld with a single (N-1)-brane in (N+1) dimensions with a single AdS
direction transverse to the brane. We then consider the specific case of the five
dimensional neutral rotating black ring on a four brane in a (5+1) dimensional
Randall-Sundrum braneworld with a single AdS direction transverse to the brane
hypersurface. We propose that the appropriate bulk description is provided by
a six dimensional rotating black string extension of the five dimensional rotating
black ring. The intercept of the bulk solution on the four brane is a five dimen-
sional black ring with the usual curvature singularity on the brane hypersurface
although an additional bulk singularity also appears at the AdS horizon. We also
compute the six dimensional bulk geodesics restricted to the plane of rotation of
the black ring. The projection of these bulk geodesics on the four brane reduces
to the appropriate class of black ring geodesics on the four brane hypersurface.
The y orbits for the bulk solution which reach the AdS horizon are then analyzed
using the geodesic equation to elucidate the natuer of the bulk singularity at the
AdS horizon. It is seen that the curvature remains finite at the AdS horizon
along the unbounded indicating the presence of a mild p-p curvature singularity.
3.1 Black Ring in a RS Brane World
The bulk metric for single brane RS brane world in (N +1) dimensions, with one
transverse AdS direction to the (N-1) brane is as follows; [19, 21]
ds2 = gmndx
mdxn =
[gµνdx
µdxν + dz2]. (24)
Here µ, ν = 0 . . . (N − 1) and m,n = 0 . . . (N) and l is the AdS length scale. The
transverse coordinate z = 0,∞ are the conformal infinity and the AdS horizon
respectively. The actual RS braneworld geometry is obtained by removing the
small z region at z = z0 and glueing a mirror copy of the large z geometry at the
location of the (N-1) brane which ensures Z2 reflection symmetry. The resulting
topology for the double brane RS scenario is essentialy RN × S1
and in the single
brane variant considered here the S1 direction is essentialy decompactified with
the second regulator brane being at z = ∞. The discontinuity of the extrinsic
curvature at the z = z0 surface corresponds to a thin distributional source of
stress-energy. From the Israel junctions conditions this may be interpreted as
a relativistic (N-1) brane (smooth domain wall) with a corresponding tension
[19,21]. The orginal RS model sliced the AdS space-time both at z = 0 and z = l
and inserted two (N-1) branes with Z2 reflection symmetry at both hypersurfaces.
The Israel junction conditions then required a negative tension for the brane at
z = l. The variant considered here may be obtained from the original RS model
by allowing the negative tension brane to approach the AdS horizon at z = ∞ .
Although we focus here only on the single brane RS model for convenience, our
construction may be generalized to the original RS model with double branes in
a straightforward manner.
The Einstein equations in (N+1) dimensions with a negative cosmological
constant continue to be satisfied for any metric gµν which is Ricci flat. The
curvature of the modified metric now satisfies
RpqrsR
pqrs =
2N(N + 1)
RµνλκR
µνλκ (25)
where (p, q) runs over (N +1) dimensions and (µ, ν) over the N dimensions of the
brane world volume. The perturbations of the (N+1) dimensional metric around
a Ricci flat background are now normalizable modes peaked at the location of
the (N-1) brane.
Having provided this brief introduction to the single brane RS model in (N+1)
dimensions we now specialize to N=5 and consider the bulk description of a five
dimensional neutral rotating black ring on the four brane in a six dimensional RS
braneworld. To this end we consider a bulk six dimensional black string extension
of the five dimensional rotating neutral black ring in the bulk. The black ring
being a Ricci flat space-time the bulk black string extension automaticaly satisfies
the Einstein equation [21] For a reflection symmetric four brane hypersurface fixed
at z = z0 we may introduce the co-ordinate w = z−z0. The bulk metric on either
side of the domain wall may now be expressed as
ds2 =
(z0 + |w|)2
dw2 −
F (y)
F (x)
dt− CR
1 + y
F (y)
(x− y)2
F (x)
−G(y)
F (y)
dψ2 − dy
F (x)
where −∞ < w <∞ and the domain wall is located at w = 0.
The induced metric on the four brane at z = z0 may be recast into the black
ring form by suitably rescaling the coordinates and the parameters. The ADM
mass and angular momentum as measured on the brane, scaled by the conformal
warp factor, are then given as
M , J∗ =
J. (27)
where M,J are the bulk parameters.
The curvature squared for the bulk black string is computed to be;
RjklmR
jklm =
6(1 + ν2)2ν2Q(x, y)
R4(1 + ν2 + 2νx)6
z4(x− y)4
Following Eq(12), towards spatial infinity on the brane the curvature squared
behaves as
RjklmR
jklm ∼
. (29)
The curvature invariant diverges at the spacelike singularity on the brane at
y = −∞. Additionaly, it is also seen to diverge at the AdS horizon z = ∞ for
finite r. As mentioned earlier, such a singularity seems to be a artifact of the
linearized approximation. In order to further investigate this issue we need to
study the geodesics and their behaviour at the AdS horizon.
As mentioned earlier the neutral rotating black ring maybe described in a cer-
tain limit as a Schwarzschild black string boosted in the translationaly invariant
direction and identified periodicaly. In the braneworld construction that we have
developed, this reduces to a six dimensional bulk black two brane boosted along
the extended direction on the four brane and identified periodically. In the 5+1
dimensional brane world Eq. (13) generalizes to;
ds2 =
dz2 + dw2 − (1− r0
)dt2 + (1− r0
)−1dr2 + r2dΩ2
Here u is the translation invariant direction of the black string along the brane
hypersurface and z describes the transverse direction. Apart from the conformal
factor the coordinate z is a spectator dimension and hence we have a six dimen-
sional bulk Schwarzschild black two brane boosted along a translation invariant
direction w and periodicaly identified as w ∼ w+2πR. This bulk black two brane
in the limit of large boost velocity and a large periodicity R intercepts the four
brane in a fast spinning thin five dimensional neutral black ring of large radius R
with the usual curvature singularity on the brane. This is obvious as the boost
does not involve the transverse z direction and the limit of large radius and high
boost velocity are z independent. So in this limit after periodic identification the
event horizon has S2×S1×R topology extended in the bulk and periodic in the
coordinate w on the four brane.
3.2 The Brane World Geodesics.
The geodesic equations for the the bulk spacetime may be obtained as earlier
from the Lagrangian
L = 1
= gjkẋ
j ẋk (31)
where gjk are the covariant components of the 5+1 dimensional metric as in eqn.
(24) and j, k = 0 . . . 5. Also ẋ = dx/dρ and on time like geodesics the affine
parameter ρ = τ/m. Accordingly we have pj = ẋj , pj = gjkẋ
k and
gjkpjpk = −ǫm2 (32)
where ǫ = 0, 1 for null and time like geodesics respectively.
The z equation for geodesic motion is obtained from the Lagrangian as
. (33)
The solution for null geodesics is either z =constant or
z = − z1l
. (34)
For timelike geodesics the solution is
z = −z1cosec(ρm/l). (35)
Herem is the particle mass for timelike geodesics and we should set z1/m=constant
for the null geodesics in this case. The null case z =constant is simply a null
geodesic of the five dimensional rotating black ring. We are interested in the
other solutions which reach the location of the bulk singularity at the AdS hori-
zon z = ∞ for ρ→ 0−.
The bulk spacetime has three killing isometries ∂t, ∂ψ, and ∂φ leading to the
corresponding conserved momenta pt = −E, pψ = Ψ and pφ = Φ for geodesic
motion. Once again we consider only those geodesics in the bulk which, on the
4-brane, are restricted to the plane of rotation of the black ring , i.e in the x = −1
plane. The gφφ component of the 5+1 dimensional metric goes to zero on the
plane of rotation so that E and Ψ are the conserved quantities for such geodesics.
The geodesic equations for the t and ψ co-ordinates in the plane of rotation of
the black ring are given as
z2(1− λ)
l2(1 + λy)
(1− λ)2
(1 + y)4
(1 + λy)G(y)
E − z
2C(1 + y)3
l2R(1− λ)(G(y)
z2CR(1 + y)3
l2(1− λ)G(y)
z2(1 + y)2(1 + λy)
l2R2(1− λ)G(y)
The y equation of motion for time like and null geodesics in the bulk which reach
the AdS horizon is given by
+ gttE2 − gtψEΨ+ gψψΨ2
= 0. (37)
Here the contravariant components of the metric in the equation are essentialy
the black ring metric without the bulk conformal factor.
The bulk timelike or null geodesics when projected onto the brane reduce
to the time like black ring geodesics restricted to the plane of rotation of the
ring. The projection to the four brane hypersurface is effected by scaling out
the z dependence of the geodesics. First, new parameters γ = z2/m2ρ for null
geodesics and γ = (−z2
/lm)cot(mρ/l) for time like geodesics are introduced. We
define the rescaled co-ordinates and parameters x = lx̃/z1, y = lỹ/z1, t = lt̃/z1 ,
R = l2R̃/z2
, λ = z1λ̃/l, ν = z1ν̃/l. The integrals of motion are also rescaled as
E = lẼ/z1,Ψ = l
3Ψ̃/z3
The geodesic equation for the y coordinate in the rescaled quantities may then
be written as,
+ Veff(ỹ; Ẽ, Ψ̃) = 0 (38)
where Veff is the same effective potential as given in eqn. (20). This is pre-
cisely the equation in y for a time like geodesic in the plane of rotation of a five
dimensional rotating black ring with an ADM mass M̃ and angular momentum
M , J̃ =
J (39)
and thus existing on the four brane hypersurface located at z = z0 = l
2/z1. The
parameter γ now serves as the proper time along the time like geodesic.
In order to ascertain the nature of the singularity at the AdS horizon (z = ∞)
we need to study the behaviour of the bulk geodesics near the AdS horizon, i.e
as ρ → 0−. This is equivalent to γ → ∞, so we need to investigate the late
time behaviour of the five dimensional time like geodesics on the four-brane. The
geodesics ending into the black ring singularity will take a finite amount of proper
time to do so. For infinite proper time the geodesics can either reach up to the
asymptotic infinity on the four brane(x̃, ỹ = −z1/l) or remain at a finite distance
from the black ring horizon. The geodesics that reach asymptotic infinity on the
brane have late time behaviour
r̃ ∼ γ
Ẽ2 −m2, (40)
where
r̃2 = − 1
z1/l + ỹ
. (41)
The co-ordinate r̃ is the radial direction on the brane and it is the same as the
radial Myers-Perry coordinate for the black ring in the asymptotic limit modulo
certain constants in the plane of rotation of the black ring.
It is expected that stable bound orbits do not exist in the case of the five
dimensional black rings. So, only unbound geodesics may reach the AdS horizon
at z → ∞. Along such orbits the curvature squared, Eq.(28), remains finite,
thus indicating the presence of a p-p curvature singularity at the AdS horizon.
To explicitly illustrate this, it is necessary to obtain the curvature components
in an orthonormal frame parallely propagated on a timelike geodesic to the AdS
horizon. Although its simple to demonstrate this in the case of the Schwarzschild
black hole in a braneworld for more complicated metrics and higher dimensions
the explicit determination of this frame involves several coupled PDE and renders
this analysis computationaly intractable. Although we have to emphasize that
such frames exist the choice is highly non unique and a specific suitable such
frame is complicated to establish even for four dimensional Kerr black holes in a
braneworld [11].
4 Summary and Discussions.
To summarize we have described a five dimensional neutral rotating black ring on
a four brane in a six dimensional Randall-Sundrum braneworld. As mentioned
earlier this has been motivated by the fact that for consistency the usual grav-
itational configurations on the brane, in particular black holes must arise from
some higher dimensional bulk solutions. The five dimensional black ring being
the first asymptoticaly flat solution with a non spherical horizon topology is an
interesting configuration to study from a bulk brane world perspective. Espe-
cialy as it explicitly violates the no hair and the uniqueness theorem. Due to the
absence of suitable exact bulk metrics in D > 4 a linearized framework around a
fixed solution is necessary for the analysis of the black ring in a brane world. In
this context the bulk black string approach of Chamblin et. al. [7] is especialy
relevant to elucidate the physical issues although the pathology of a singularity at
the AdS horizon persists. However, absence of such a singularity in lower dimen-
sional brane worlds where exact metrics are available shows the bulk singularity
to be a linearization artifact.
To this end we have considered a bulk six dimensional black string extension
of a five dimensional rotating neutral black ring in a 5+1 dimensional Randall-
Sundrum braneworld. This choice is consistent with the usual reflection symmet-
ric junction conditions on the four brane in such warped compactification models.
The bulk black string rotates in the four brane world volume and the induced
five dimensional metric on the four brane describes a neutral rotating black ring.
This reproduces the usual spacelike curvature singularity of the black ring on
the four brane hypersurface. Additionaly a singularity also appears in the bulk
at the AdS horizon. After elucidating the geodesics of the rotating black ring
restricted to the plane of rotation we have obtained both the timelike and the
null geodesics for the black string in the six dimensional bulk. We have further
shown that the restricted bulk geodesics projected on the four brane by scaling
away the AdS direction exactly match the corresponding class of five dimensional
black ring geodesics. The effective potential has been analysed numericaly and
we have shown that stable bound geodesics do n ot exist as is expected in D > 4.
It has been further shown that the curvature invariant remains finite along un-
bounded geodesics which reach the AdS horizon. This clearly indicates that the
bulk curvature singularity at the AdS horizon is possibly a p-p curvature sin-
gularity although an explicit illustration using parallely propogated orthonormal
frames is computationaly intractable.
It is mentioned earlier that a fast spinning thin neutral rotating black ring
may be described as a black string boosted along the translationaly invariant
direction and identified periodically in some limit. We have shown that from the
bulk perspective this description involves naturaly a black two brane in the six
dimensional bulk orthogonal to the four brane hypersurface. To obtain the black
ring on the four brane the black two brane must be boosted along a translationaly
invaraint direction longitudinal to the four brane and identified periodicaly along
this direction. Due to the direct equivalence of the two metrics it is obvious that
the usual matching of the geodesics on the bulk and the brane will continue to
hold in this limit . In the black ring limit the event horzion in the bulk would
constitute a base S2×S1 on the five dimensional brane hypersurface and a trivial
R fibration into the bulk.
The issue of stability of the bulk black string configuration is contentious and
remains unresolved for axialy symmetric stationary solutions. For AdS solutions
one conclusion is that the prefered phase will be an accumulation of a sequence of
lower dimensional black holes with the horzion pinched off at some scale. However
for the usual Schwarzschild black string this conclusion has been contested where
it has been shown that a more likely scenario is an evolution to a translationaly
non invariant stable solution [10]. But this although plausible has not yet been
generalized explicitly to axialy symmetric solutions. It has been argued that the
bulk solution should pinch off due to the instabilities before reaching the singu-
larity at the AdS horzion [7, 9]. However this issue is far from being completely
settled. It is possible that the pathology at the AdS horizon is a linearization
artifact especialy given that lower dimensional exact bulk solutions are regular
everywhere.
There are several open issues for future studies. Charged rotating black ring
solutions have been obtained in the context of string theory through the O(d, d)
transformations. These have been further generalized to rotating black rings with
dipole charges. In the brane world scenario, bulk configurations which reduce to
charged black holes have been investigated. It could be shown in this case that
the black hole on the brane developed a tidal charge due to the extra dimensions
apart from the usual conserved gauge charge [15]. It would be an interesting
exercise to study the brane world formulation of the dipole black rings in this
context. Very recently it has been shown that in higher diemnsions it is possible to
have stable configurations involving combinations of black rings and black holes.
These have been christened black saturn and are remarkably novel solutions of
higher dimensional general relativity [37]. Naturally it would be interesting to
investigate these configurations from a brane world perspective. It is generaly
expected that more such solutions would be possible in the context of higher
dimensions. Some of these issues are being currently studied.
5 Acknowledgements
We would like to thank A.Virmani for collaboration during early stages of this
work. GS would also like to acknowledge J. Maharana for discussions. Both of
us would like to thank D. D. B. Rao and B. N. Tiwari for computational help.
References
[1] N. Arkani-Hamed, S. Dimopoulos , G. Dvali, Phys. Lett.B 429, 263 (1998),
hep-ph/9803315; I. Antoniadis, N. Arkani-Hamed, S. Dimopoulos, G. Dvali,
Phys. Lett. B436,257 (1998), hep-ph/9804398 .
[2] J. Lykken, L. Randall, JHEP 0006,014 (2000), hep-th/9908076.
[3] C. Kokorelis, Nucl. Phys. B677 (2004) 115, hep-th/0207234; D. Cremades,
L.E.Ibanez and F.Marchesano, Nucl. Phys. B643 (2002) 93, hep-th/0205074
[4] L. Randall, R. Sundrum, Phys. Rev. Lett. 83, 4690 (1999), hep-th/9906064;
Phys. Rev. Lett.83, 3370 (1999), hep-ph/9905221 .
[5] R. Dick; Class.Quant.Grav. 18 (2001) R1-R24, hep-th/0105320
[6] A. Chamblin, G. W. Gibbons, Phys. Rev. Lett. 84 (2000), hep-th/9909130.
[7] A. Chamblin, S. Hawking, H. Reall, Phys. Rev. D 61, 065007 (2000),
hep-th/ 9909025
[8] R. Gregory, R. Laflamme, Phys. Rev. Lett. 70, 2837, (1993)
,hep-th/ 9301052; Nucl. Phys. B 428, 399 (1994), hep-th/ 9407071; Phys.
Rev D 51, 305 (1995), hep-th/ 9410050.
[9] R. Gregory, Class. Quant. Grav. 17, L125 (2000).
[10] G. T. Horowitz, K. Maeda, Phys.Rev.Lett. 87 (2001) 131301 hep-th/0105111;
ibid Phys. Rev. D 65 104028 (2002), hep-th/0201241.
[11] M.S. Modgil, S. Panda and G.Sengupta, Mod. Phys. Lett A, 17 (2002) 1479
, hep-th/0104122.
[12] I. Giannakis and H. C. Ren, Phys.Rev. D63 (2001) 024001, hep-th/0007053,
ibid Phys. Rev. D63 (2001) 125017, hep-th/0010183 Phys.Rev. D64 (2001)
065015, hep-th/0103265.
[13] R. Casadio, A. Fabbri, L. Mazzacurati;Phys.Rev. D65 (2002) 084040,
gr-qc/0111072; P. Kanti, K. Tamvakis, Phys.Rev. D65 (2002) 084010,
http://arxiv.org/abs/hep-ph/9803315
http://arxiv.org/abs/hep-ph/9804398
http://arxiv.org/abs/hep-th/9908076
http://arxiv.org/abs/hep-th/0207234
http://arxiv.org/abs/hep-th/0205074
http://arxiv.org/abs/hep-th/9906064
http://arxiv.org/abs/hep-ph/9905221
http://arxiv.org/abs/hep-th/0105320
http://arxiv.org/abs/hep-th/9909130
http://arxiv.org/abs/hep-th/9909025
http://arxiv.org/abs/hep-th/9301052
http://arxiv.org/abs/hep-th/9407071
http://arxiv.org/abs/hep-th/9410050
http://arxiv.org/abs/hep-th/0105111
http://arxiv.org/abs/hep-th/0201241
http://arxiv.org/abs/hep-th/0104122
http://arxiv.org/abs/hep-th/0007053
http://arxiv.org/abs/hep-th/0010183
http://arxiv.org/abs/hep-th/0103265
http://arxiv.org/abs/gr-qc/0111072
hep-th/0110298; S. Nojiri, S. D. Odintsov, S. Ogushi,Phys. Rev. D 65 (2002)
023521, hep-th/0108172; G. Dvali, G. Gabadadze, M. Porrati, Phys.Lett.
B485 (2000) 208, hep-th/0005016; I. Giannakis, H. Ren,Phys.Lett. B528
(2002) 133,hep-th/0111127; B. Abdesselam, N. Mohammedi,Phys.Rev. D65
(2002) 084018,hep-th/0110143; C. Charmousis, J.F. Dufaux, Class. Quant.
Grav. 19 (2002) 4671 , hep-th/0202107; A.J.M. Medved, Class.Quant.Grav.
19 (2002) 405, hep-th/0110118.
[14] Won Tae Kim, John J. Oh, Marie K. Oh, Myung Seok Yoon, J.Korean
Phys.Soc. 42 (2003) 13, hep-th/0006134.
[15] N. Dadhich, R. Maartens, P. Papadopoulos, V. Rezania, Phys.Lett.B 487,
1 (2000), hep-th/0003061; I. Oda, EDO-EP-32, (2000), hep-th/0008055; A.
Chamblin, H. S. Reall, H. Shinkai, T. Shiromizu, Phys. Rev. D 63 (2001)
064015, hep-th/0006134; H. Lu, C.N. Pope, Nucl.Phys. B 598 (2001) 492-508
[16] T. Shiromizu, M. Shibata, Phys. Rev. D 62 (2000) 127502.
[17] A.G. Cohen and D.B. Kaplan, Phys. Lett. B 470 (1999) 52, hep-th/9910132;
R.Gregory, Phys. Rev. Lett. 84 (2000) 2564, hep-th/9911015; T. Gherghetta
and M. Shaposhnikov, Phys. Rev. Lett. 85 (2000) 240, hep-th/0004014; T.
Gherghetta, E. Roessl and M. Shaposhnikov, Phys. Lett. B 491 (2000) 353,
hep-th/0006251; F. Leblond, R.C. Myers and D. J. Winters, JHEP 0107
(2001) 031, hep-th/0106140; P.Kanti, R. Madden and K.A. Olive, Phys.
Rev. D 64 (2001) 044021, hep-th/0104177.
[18] S. Kanno, J. Soda, Gen.Rel.Grav.37 (2005) 1651; A.N. Aliev, A.E. Gum-
rukcuoglu, Phys.Rev.D 71 (2005) 104027; R. Neves, hep-th/0409051; P.
Kanti, Int. J. Mod.Phys.A 19 (2004) 4899; S. Kanno, J. Soda, Class. Quant.
Grav. 21 (2004)1915; R. Neves, C. Vaz, hep-th/0309115; P. Kanti , I.
Olasagasti, K. Tamvakis, Phys. Rev. D 68 (2003) 124001; R. Neves, C. Vaz,
Phys. Lett. B5 68 (2003) 153; R. Neves, C. Vaz, Phys. Rev. D 68 (2003)
024007; H. Kudoh, T. Tanaka, T. Nakamura, Phys. Rev. D 68 (2003) 024035;
H. K. Jassal, L. Sriramkumar, gr-qc/0611102; J. de Oliveira, gr-qc/0604077;
H. Kudoh, T. Tanaka, T. Nakamura, Phys.Rev.D 68 (2003) 024035;
[19] R. Emparan, G. T. Horowitz, R. C. Myers, JHEP 0001, 021 (2000),
hep-th/9912135; R. Emparan, G. T. Horowitz, R. C. Myers, JHEP 0001,
007 (2000), hep-th/9911043 .
http://arxiv.org/abs/hep-th/0110298
http://arxiv.org/abs/hep-th/0108172
http://arxiv.org/abs/hep-th/0005016
http://arxiv.org/abs/hep-th/0111127
http://arxiv.org/abs/hep-th/0110143
http://arxiv.org/abs/hep-th/0202107
http://arxiv.org/abs/hep-th/0110118
http://arxiv.org/abs/hep-th/0006134
http://arxiv.org/abs/hep-th/0003061
http://arxiv.org/abs/hep-th/0008055
http://arxiv.org/abs/hep-th/0006134
http://arxiv.org/abs/hep-th/9910132
http://arxiv.org/abs/hep-th/9911015
http://arxiv.org/abs/hep-th/0004014
http://arxiv.org/abs/hep-th/0006251
http://arxiv.org/abs/hep-th/0106140
http://arxiv.org/abs/hep-th/0104177
http://arxiv.org/abs/hep-th/0409051
http://arxiv.org/abs/hep-th/0309115
http://arxiv.org/abs/gr-qc/0611102
http://arxiv.org/abs/gr-qc/0604077
http://arxiv.org/abs/hep-th/9912135
http://arxiv.org/abs/hep-th/9911043
[20] A. Chamblin, C. Csaki, J. Erlich, T. J. Hollowood,Phys. Rev. D 62, 044012
(2000), hep-th/0002076
[21] S. Giddings, E. Katz, L. Randall, JHEP 0003, 023 (2000), hep-th/0002091;
S. B. Giddings, E. Katz , MIT-CTP-3024 (2000), hep-th/0009176
[22] J. Podolsky, Class. Quant. Grav. 15, 719 (1998) , gr-qc/9801052.
[23] G.Sengupta, Int. Jnl. of Mod. Phys. D, 15, (2006), 171.
[24] R. C. Myers, M. J. Perry, Ann. Phys. 172 ( 1986), 304.
[25] R. Emparan and H.S.Reall, Phys. Rev. Lett. 88 (2002) 101101.
[26] K. Hong, E. Teo, Class. Quant. Grav. 22 (2005)109.
[27] M.I.Cai and C.G. Galloway, Class. Quant. Grav., 18 (2001), 2707.
[28] R. Emparan, Nucl. Phys. B610 (2001) 169; R. Emparan and H.S. Reall,
Phys.Rev. D65 (2002) 084025.
[29] A. Chamblin and R. Emparan, Phys. Rev D 55 (1997),754.
[30] H. Elvang, Phys. Rev. D 68 (2003),124016; .
[31] H. Elvang, R. Emparan, D. Mateos and H. S. Reall, Phys. Rev. Lett. 93
(2004), 211302; I. Bena, N. P. Warner, Phys. Rev. D74 (2006),066001; H.
Elvang, R. Emparan, D. Mateos and H. S. Reall, Phys. Rev. D 71(2005)
024033, hep-th/0408120; J. P. Gauntlett and J. B. Gutkowski, Phys. Rev.
D71 (2005), 025013; ibid, Phys. Rev. D71 (2005), 045002; I. Bena, Phys.
Rev. D 70 (2004)105018, hep-th/0404073.
[32] R. Emparan and H.S. Reall, Class.Quant.Grav. 23 (2006) R169,
[33] H. Elvang and R. Emparan, JHEP 0311 (2003),035; R. Emparan, JHEP
0403, (2004), 064; G. T. Horowitz and H. S. Reall, Class. Quantum Grav 22
(2005), 1289; E. Radu, D. Astefanesei, Phys. Rev. D73 (2006) 044014.
[34] R. M. Wald , General Relativity, Univ. of Chicago Press Chicago, (1984), S.
W. Hawking and G. F. Ellis, Large Scale Structure of Space-Time, Cam-
bridge,Univ. Press, Cambridge (1973), S. Chandrasekhar, Mathematical
Theory of Black Holes, Oxford Univ. Press (1985) V. P. Frolov, I. D. Novikov
Black Hole Physics: Basic Concepts and New Developments, Kluwer, (1998).
[35] H. Elvang, R. Emparan, A. Virmani, JHEP 0612 (2006) 074.
[36] V. P. Frolov, D. Stojkovic, Phys.Rev. D68 (2003) 064011; V. P. Frolov, D. V.
Fursaev, D. Stojkovic, Class.Quant.Grav.21 (2004) 3483; V. P. Frolov and
R. Goswami, gr-qc/0612033.
http://arxiv.org/abs/hep-th/0002076
http://arxiv.org/abs/hep-th/0002091
http://arxiv.org/abs/hep-th/0009176
http://arxiv.org/abs/gr-qc/9801052
http://arxiv.org/abs/hep-th/0408120
http://arxiv.org/abs/hep-th/0404073
http://arxiv.org/abs/gr-qc/0612033
[37] H. Elvang, P. Figueras , hep-th/0701035.
http://arxiv.org/abs/hep-th/0701035
	Introduction.
	The Rotating Neutral Black Ring .
	Black Ring Metric
	Black Ring Geodesics
	 Brane World Black Ring
	Black Ring in a RS Brane World
	The Brane World Geodesics.
	Summary and Discussions.
	Acknowledgements
ABSTRACT
  Five dimensional neutral rotating black rings are described from a
Randall-Sundrum brane world perspective in the bulk black string framework. To
this end we consider a rotating black string extension of a five dimensional
black ring into the bulk of a six dimensional Randall-Sundrum brane world with
a single four brane. The bulk solution intercepts the four brane in a five
dimensional black ring with the usual curvature singularity on the brane. The
bulk geodesics restricted to the plane of rotation of the black ring are
constructed and their projections on the four brane match with the usual black
ring geodesics restricted to the same plane. The asymptotic nature of the bulk
geodesics are elucidated with reference to a bulk singularity at the AdS
horizon. We further discuss the description of a brane world black ring as a
limit of a boosted bulk black 2 brane with periodic identification.

<|endoftext|><|startoftext|>
Introduction
The theory of character sheaves [L3] on a reductive group G over an alge-
braically closed field and the theory of irreducible characters of G over a finite
field are two parallel theories; the first one is geometric (involving intersection
cohomology complexes on G), the second one involves functions on the group of
rational points of G. In the case where G is connected, a bridge between the two
theories was constructed in [L1] and strengthened in [L2], [S]. In this paper we
begin the construction of the analogous bridge in the general case, extending the
method of [L1]. Here we restrict ourselves to character sheaves which are ”generic”
(in particular their support is a full connected component of G) and show how such
character sheaves are related to characters of representations (see Theorem 1.2).
Contents
1. Statement of the Theorem.
2. Constructing representations of GF .
3. Proof of Theorem 1.2.
1. Statement of the Theorem
1.1. Let k be an algebraic closure of a finite field Fq. Let G be a reductive
algebraic group over k with identity component G0 such that G/G0 is cyclic,
generated by a fixed connected component D. We assume that G has a fixed
Fq-rational structure with Frobenius map F : G −→ G such that F (D) = D. Let
l be a prime number invertible in k; let Q̄l be an algebraic closure of the l-adic
numbers. All group representations are assumed to be finite dimensional over Q̄l.
We say ”local system” instead of ”Q̄l-local system”.
Let B be the variety of Borel subgroups of G0. Now F : G −→ G induces a
morphism B −→ B denoted again by F . We fix B∗ ∈ B and a maximal torus T of
B∗ such that F (B∗) = B∗, F (T ) = T . Let U∗ be the unipotent radical of B∗. Let
Supported in part by the National Science Foundation.
Typeset by AMS-TEX
http://arxiv.org/abs/0704.0999v1
2 G. LUSZTIG
NB∗ (resp. NT ) be the normalizer of B∗ (resp. T ) in G. Let T̃ = NT ∩NB∗, a
closed F -stable subgroup of G with identity component T . Let T̃D = T̃ ∩D.
Let N = NT ∩ G0. Let W = N /T be the Weyl group. Let D : T
−→ T ,
D : W
−→ W be the automorphisms induced by Ad(d) : N −→ N where d is any
element of T̃D. Now F : N −→ N induces an automorphism ofW denoted again by
F . For w ∈ W let [w] be the inverse image of w under the obvious map N −→ W
and let w be the automorphism Ad(x) : T −→ T for any x ∈ [w]. For w ∈ W
let Ow be the G
0-orbit in B × B (G0 acting by simultaneous conjugation on both
factors) that contains (B∗, xB∗x−1) for some/any x ∈ [w]. Define the ”length
function” l : W −→ N by l(w) = dimOw − dimB. For any y ∈ G
0 we define
k(y) ∈ N by y ∈ U∗k(y)U∗. For y ∈ G0, τ ∈ T̃ we have k(τyτ−1) = τk(y)τ−1
and F (k(y)) = k(F (y)). For x ∈ G0 we define Fx : G −→ G by Fx(g) = xF (g)x
this is the Frobenius map for an Fq-rational structure on G. (Indeed if y ∈ G
such that x = y−1F (y), then Ad(y) : G
−→ G carries Fx to F .) If w ∈W satisfies
D(w) = w and x ∈ [w] then T, T̃ are Fx-stable; thus Fx is the Frobenius map for
an Fq-rational structure on T̃ whose group of rational points is T̃
Fx . Since T̃FxD
is the set of rational points of T̃D (a homogeneous T -space under left translation)
for the rational structure defined by Fx : T̃D −→ T̃D, we have T̃
D 6= ∅.
Let Z∅ = {(B0, g) ∈ B ×D; gB0g
−1 = B0}. Let d ∈ T̃D. We set
Ż∅,d = {(h0U
∗, g) ∈ (G0/U∗)×D; h−10 gh0d
−1 ∈ B∗}.
Define a∅ : Ż∅,d −→ Z∅ by (h0U
∗, g) 7→ (h0B
∗h−10 , g). Now a∅ is a principal T -
bundle where T acts (freely) on Ż∅,d by t0 : (h0U
∗, g) 7→ (h0t
0 , g). Define p∅ :
Z∅ −→ D by (B0, g) 7→ g. We define b∅ : Ż∅,d −→ T by (h0U
∗, g) 7→ k(h−10 gh0d
Note that b∅ commutes with the T -actions where T acts on T by
(a) t0 : t 7→ t0tD(t
Let L be a local system of rank 1 on T such that
(i) L⊗n ∼= Q̄l for some n ≥ 1 invertible in k;
(ii) D∗L ∼= L;
From (i),(ii) we see (using [L3, 28.2(a)]) that L is equivariant for the T -action (a)
on T . Hence b∗
L is a T -equivariant local system on Ż∅,d. Since a∅ is a principal
T -bundle there is a well defined local system L̃∅ on Z∅ such that a
L̃∅ = b
Note that the isomorphism class of L̃∅ is independent of the choice of d. Assume
in addition that:
(iii) {w ∈W ;D(w) = w,w∗L ∼= L} = {1}.
We show:
(b) p∅!L̃∅ is an irreducible intersection cohomology complex on D.
We identify Z∅ with the variety X = {(g, xB
∗) ∈ G ×G0/B∗; x−1gx ∈ NB∗} (as
in [L3, I, 5.4] with P = B∗, L = T, S = T̃D) by (g, xB
∗) ↔ (xB∗x−1, g). Then L̃∅
becomes the local system Ē on X defined as in [L3, I, 5.6] in terms of the local
system E = j∗L on T̃D where j : T̃D −→ T is y 7→ d
−1y. (Note that E is equivariant
GENERIC CHARACTER SHEAVES ON DISCONNECTED GROUPS AND CHARACTER VALUES3
for the conjugation action of T on T̃D.) In our case we have Ē = IC(X, Ē) since X
is smooth. Hence from [L3, I, 5.7] we see that p∅!Ē is an intersection cohomology
complex on D corresponding to a semisimple local system on an open dense subset
of D which, by the results in [L3, II, 7.10], is irreducible if and only if the following
condition is satisfied: if w ∈W,x ∈ [w] satisfy Ad(x)(T̃D) = T̃D and Ad(x)
∗E ∼= E ,
then w = 1. This is clearly equivalent to condition (iii). This proves (b).
From (b) and the definitions we see that p∅!L̃∅[dimD] is a character sheaf on D
in the sense of [L3, VI]. A character sheaf on D of this form is said to be generic.
We can state the following result.
Theorem 1.2. Let A be a generic character sheaf on D such that F ∗A ∼= A
where F : D −→ D is the restriction of F : G −→ G. Let ψ : F ∗A −→ A be an
isomorphism. Define χψ : D
F −→ Q̄l by g 7→
i∈Z(−1)
itr(ψ,Hig(A)) where H
the i-th cohomology sheaf and Hig is its stalk at g. There exists a G
F -module V
and a scalar λ ∈ Q̄∗l such that χψ(g) = λtr(g, V ) for all g ∈ D
The proof is given in §3. We now make some preliminary observations. In
the setup of 1.1 we have A = p∅!L̃∅[dimD] where L satisfies 1.1(i),(ii),(iii) and
F ∗(p∅!L̃∅) ∼= p∅!L̃∅. Hence we have p∅!F̃
∗L∅ ∼= p∅!L̃∅. By a computation in [L3,
IV, 21.18] we deduce that there exists w′ ∈ W such that D(w′) = w′, w′∗F ∗L ∼= L.
Setting w = F (w′) we see that
(a) D(w) = w, F ∗w∗L ∼= L.
1.3. Let w = (w1, w2, . . . , wr) be a sequence in W . Let lw = l(w1)+ l(w2)+ · · ·+
l(wr). Let
Zw = {(B0, B1, . . . , Br, g) ∈ B
r+1×D; gB0g
−1 = Br, (Bi−1, Bi) ∈ Owi(i ∈ [1, r])}.
This agrees with the definition in 1.1 when r = 0, that is w = ∅. Let d ∈ T̃D. We
define Żw,d as in 1.1 when r = 0 and by
Żw,d = {(h0U
∗, h1B
∗, . . . , hr−1B
∗, hrU
∗, g) ∈
(G0/U∗)× (G0/B∗)× . . .× (G0/B∗)× (G0/U∗)×D;
k(h−1i−1hi) ∈ [wi](i ∈ [1, r]), h
r gh0d
−1 ∈ U∗};
when r ≥ 1. Define aw : Żw,d −→ Zw as in 1.1 when r = 0 and by
∗, h1B
∗, . . . , hr−1B
∗, hrU
∗, g) 7→
∗h−10 , h1B
∗h−11 , . . . , hr−1B
∗hr−1, hrB
∗h−1r , g),
when r ≥ 1. Note that aw is a principal T -bundle where T acts (freely) on Żw,d
as in 1.1 when r = 0 and by
t0 : (h0U
∗, h1B
∗, . . . , hr−1B
∗, hrU
∗, g) 7→
∗, h1B
∗, . . . , hr−1B
∗, hrdt
−1U∗, g)
4 G. LUSZTIG
when r ≥ 1. Define pw : Zw −→ D by (B0, B1, . . . , Br, g) 7→ g.
In the remainder of this subsection we assume that w1w2 . . . wr = 1; this holds
automatically when r = 0. We define bw : Żw,d −→ T as in 1.1 when r = 0 and by
∗, h1B
∗, . . . , hr−1B
∗, hrU
∗, g) 7→ k(h−10 h1)k(h
1 h2) . . . k(h
r−1hr)
when r ≥ 1. Note that bw commutes with the T -actions where T acts on T as in
1.1(a).
Let L be a local system of rank 1 on T such that 1.1(i),(ii) hold. As in 1.1, L
is equivariant for the T -action 1.1(a) on T . Hence b∗
L is a T -equivariant local
system on Żw,d. Since aw is a principal T -bundle there is a well defined local
system L̃w on Zw such that a
L̃w = b
Lemma 1.4. Assume that w1w2 . . . wr = 1 and that L (as in 1.3) satisfies
(i) α̌∗L 6∼= Q̄l for any coroot α̌ : k
∗ −→ T .
Then pw!L̃w[lw](lw/2) ∼= p∅!L̃∅. (Note that lw is even.)
Assume first that for some i ∈ [1, r] we have wi = w
i where w
i in W
satisfy l(w′iw
i ) = l(w
i) + l(w
i ). Let
′ = (w1, w2, . . . , wi−1, w
i , wi+1, . . . , wn).
The map (B0, B1, . . . , Br+1, g) 7→ (B0, B1, Bi−1, Bi+1, . . . , Br+1, g) defines an iso-
morphism Zw′ −→ Zw compatible with the maps pw′ , pw and with the local systems
L̃w′ , L̃w. Since lw′ = lw we have
(a) pw!L̃w[lw](lw/2) ∼= pw′!L̃w′ [lw′ ](lw′/2).
Using (a) repeatedly we can assume that l(wi) = 1 for all i ∈ [1, r]. We will prove
the result in this case by induction on r. Note that r is even. When r = 0 the
result is obvious. We now assume that r ≥ 2. Since w1w2 . . . wr = 1, we can
find j ∈ [1, r − 1] such that l(w1w2 . . . wj) = j, l(w1w2 . . . wj+1) = j − 1. We can
find a sequence w′ = (w′1, w
2, . . . , w
r) in W such that l(w
i) = 1 for all i ∈ [1, r],
2 . . .w
j = w1w2 . . . wj , w
j = w
j+1, w
i = wi for i ∈ [j + 1, r]. Let
u = (w1w2 . . . wj , wj+1, . . . , wr) = (w
2 . . . w
j , w
j+1, . . . , w
Using (a) repeatedly we see that
pw!L̃w[lw](lw/2) ∼= pu!L̃u[lu](lu/2) ∼= pw′!L̃w′ [lw′ ](lw′/2).
Replacing w by w′ we see that we may assume in addition that wj = wj+1
for some j ∈ [1, r − 1]. We have a partition Zw = Z
∪ Z ′′
where Z ′
(resp.
) is defined by the condition Bj−1 = Bj+1 (resp. Bj−1 6= Bj+1). Let w
(w1, w, . . . , wj−1, wj+2, . . . , wr), w
′′ = (w1, w, . . . , wj−1, wj+1, . . . , wr). Define c :
−→ Zw′ by
(B0, B1, . . . , Br, g) 7→ (B0, B1, . . . , Bj−1, Bj+2, . . . , Br, g).
GENERIC CHARACTER SHEAVES ON DISCONNECTED GROUPS AND CHARACTER VALUES5
This is an affine line bundle and L̃w|Z′
= c∗L̃w′ . Let p
be the restriction of pw
to Z ′
. We have p′
= pw′c. Since the induction hypothesis applies to w
′ we have
w!(L̃w|Z′w)[lw](lw/2) = pw′!c!c
∗L̃w′ [lw](lw/2)
= pw′!L̃w′ [−2](−1)[lw](lw/2) = pw′!L̃w′ [lw′ ](lw′/2) = p∅!L̃∅.(b)
Define e : Z ′′
−→ Zw′′ by
(B0, B1, . . . , Br, g) 7→ (B0, B1, . . . , Bj−1, Bj+1, . . . , Br, g).
Let p′′
be the restriction of pw to Z
. We have p′′
= pw′′e. We show that
w!(L̃w|Z′′w) = 0. It is enough to show that
(c) pw′′!e!(L̃w|Z′′
) = 0.
Hence it is enough to show that e!(L̃w|Z′′
) = 0. It is also enough to show that, if
E is a fibre of e, then Hic(E, L̃w|E) = 0 for any i. As in the proof of [L3, VI, 28.10]
we may identify E = k∗ in such a way that L̃w|E becomes α̌
∗(L) for some coroot
α̌ : k∗ −→ T . We then use that Hic(k
∗, α̌∗L) = 0 which follows from α̌∗L 6∼= Q̄l.
Using (c) and the exact triangle
(pw′′!e!(L̃w|Z′′
), pw!L̃w, p
w!(L̃w|Z′w))
we see that
pw!L̃w[lw](lw/2) = p
w!(L̃w|Z′w)[lw])(lw/2) = p∅!L̃∅
(the last equality follows from (b)). The lemma is proved.
Lemma 1.5. Assume that L (as in 1.3) satisfies 1.1(iii). Then L satisfies 1.4(i).
Let RL be the set of roots α : T −→ k
∗ such that the corresponding coroot α̌
satisfies α̌∗L ∼= Q̄l. Let WL be the subgroup of W generated by the reflections
with respect to the various α ∈ RL. Since D
∗L ∼= L we have D(WL) = WL.
Assume that 1.4(i) does not hold. Then RL 6= ∅ and WL 6= {1}. By [DL, 5.17]
the fixed point set of D : WL −→ WL is 6= {1}. Let w ∈ WL − {1} be such that
D(d)w = w. Since w ∈WL we have w
∗L ∼= L (see [L3, VI, 28.3(b)]). Thus 1.1(iii)
does not hold. The lemma is proved.
2. Constructing representations of GF
2.1. In this section we construct some representations of GF using the method of
[DL]. See [M],[DM] for other results in this direction.
Let L be a local system of rank 1 on T such that 1.1(i) holds. For any t ∈ T let
Lt be the stalk of L at t. Assume that we are given w ∈W and x ∈ [w] such that
6 G. LUSZTIG
(i) F ∗xL
∼= L;
(Fx : T −→ T as in 1.1). Let φ : F
xL −→ L be the unique isomorphism of
local systems on T which induces the identity map on L1. For t ∈ T , φ induces
an isomorphism LFx(t)
−→ Lt. When t ∈ T
Fx this is an automorphism of the
1-dimensional vector space Lt given by multiplication by θ(t) ∈ Q̄
l . It is well
known that t 7→ θ(t) is a group homomorphism TFx −→ Q̄∗l .
Following [DL] we define
Y = {hU∗ ∈ G0/U∗; h−1F (h) ∈ U∗xU∗}.
For (g, t) ∈ G0F × TFx we define eg,t : Y −→ Y by hU
∗ 7→ ght−1U∗. Note
that (g, t) 7→ eg,t is an action of G
0F × TFx on Y . Hence G0F × TFx acts on
Hic(Y ) := H
c(Y, Q̄l) by (g, τ) 7→ e
g−1,τ−1
. We set
Hic(Y )θ = {ξ ∈ H
c(Y ); e
1,t−1ξ = θ(t)
−1ξ for all t ∈ TFx};
this is a G0F × TFx-stable subspace of Hic(Y ).
For g ∈ G0F we define ǫg : H
c(Y )θ −→ H
c(Y )θ by ǫg(ξ) = e
g−1,1
. This makes
Hic(Y )θ into a G
0F -module.
We can find an integer r ≥ 1 such that
F r(x) = x, xF (x) . . . F r−1(x) = 1.
Indeed we first find an integer r1 ≥ 1 such that F
r1(x) = x and then we find
an integer r2 ≥ 1 such that (xF (x) . . .F
r1−1(x))r2 = 1. Then r = r1r2 has the
required properties. Then hU∗ 7→ F r(h)U∗ is a well defined map Y −→ Y denoted
again by F r. Also,
F r = F rx : G −→ G.
(We have F rx (g) = (xF (x) . . . F
r−1(x))F r(g)(xF (x) . . .F r−1(x))−1 = F r(g).) Hence
F r acts trivially on TFx . We see that F r : Y −→ Y commutes with eg,t : Y −→ Y
for any (g, t) ∈ G0F × TFx . Hence (F r)∗ : Hic(Y ) −→ H
c(Y ) leaves stable the
subspace Hic(Y )θ. Note that:
for any i, all eigenvalues of (F r)∗ : Hic(Y ) −→ H
c(Y ) are of the form root of 1
times qnr/2 where n ∈ Z.
(See [L1, 6.1(e)] and the references there.)
Replacing r by an integer multiple we may therefore assume that r satisfies in
addition the following condition:
(a) for any i, all eigenvalues of (F r)∗ : Hic(Y ) −→ H
c(Y ) are of the form q
where n ∈ Z.
2.2. We preserve the setup of 2.1 and assume in addition that L satisfies 1.4(i).
Let i0 = 2dimU
∗ − l(w). Note that
(a) Hic(Y )θ = 0 for i 6= i0; if i = i0 then all eigenvalues of (F
r)∗ : Hic(Y )θ −→
Hic(Y )θ are of the form q
ir/2.
For the first statement in (a) see [DL, 9.9] and the remarks in the proof of [L1,
8.15]. The second statement in (a) is deduced from 2.1(a) as in the proof of [L1,
6.6(c)].
GENERIC CHARACTER SHEAVES ON DISCONNECTED GROUPS AND CHARACTER VALUES7
2.3. We preserve the setup of 2.1 and assume in addition that L satisfies 1.1(ii)
and that w ∈W satisfies D(w) = w. From the definitions we see that D : T −→ T
commutes with Fx : T −→ T hence D restricts to an automorphism of T
Fx and
(a) θ(D(t)) = θ(t) for any t ∈ TFx .
We show:
(b) there exists a homomorphism θ̃ : T̃Fx −→ Q̄∗l such that θ̃|TFx = θ.
Let d ∈ T̃FxD . Let n = |G/G
0| = |T̃Fx/TFx |. Then t0 := d
n ∈ TFx . Let c ∈ Q̄∗l
be such that cn = θ(t0). For any t ∈ T
Fx and j ∈ Z we set θ̃(djt) = cjθ(t).
This is well defined: if djt = dj
t′ with j, j′ ∈ Z, t, t′ ∈ TFx then j′ = j + nj0,
j0 ∈ Z and t
′ = t
0 t so that θ(t
′) = cnj0θ(t) and cjθ(t) = cj
θ(t′). We show that
if j, j′ ∈ Z, t, t′ ∈ TFx then θ̃(djtdj
t′) = θ̃(djt)θ̃(dj
t′) that is cj+j
θ(D−j
(t)t′) =
cjθ(t)cj
θ(t′); this follows from (a). This proves (b).
Let Γ = {(g, τ) ∈ GF×T̃Fx ; gτ−1 ∈ G0}, a subgroup of GF×T̃Fx . For (g, τ) ∈ Γ
we define eg,τ : Y −→ Y by hU
∗ 7→ ghτ−1U∗. To see that this is well defined we
assume that h ∈ G0 satisfies h−1F (h) ∈ U∗xU∗ and (g, τ) ∈ Γ; we compute
(ghτ−1)−1F (ghτ−1) = τh−1g−1gF (h)F (τ−1)
= τh−1F (h)F (τ−1) ∈ τU∗xU∗F (τ−1) = U∗τxF (τ−1)U∗ = U∗xU∗,
since τxF (τ−1) = x (that is Fx(τ) = τ). Note that (g, τ) 7→ eg,τ is an action
of Γ on Y (extending the action of G0F × TFx). Hence Γ acts on Hic(Y ) by
(g, τ) 7→ e∗
g−1,τ−1
. Note that Hic(Y )θ is a Γ-stable subspace of H
c(Y ). This follows
from the identity
eg−1,τ−1e1,t−1 = e1,τ−1t−1τeg−1,τ−1
for g ∈ GF , τ ∈ T̃Fx , t ∈ TFx together with the identity θ(t) = θ(τ−1tτ) which is
a consequence of (a).
For g ∈ GF we define ǫg : H
c(Y )θ −→ H
c(Y )θ by
ǫg(ξ) = θ̃(τ)e
g−1,τ−1ξ
for any ξ ∈ Hic(Y )θ and any τ ∈ T̃
Fx such that gτ−1 ∈ G0. Assume that τ ′ ∈ T̃Fx
is another element such that gτ ′−1 ∈ G0. Then τ ′ = τt with t ∈ TFx and
θ̃(τ ′)e∗g−1,τ ′−1ξ = θ̃(τ)θ(t)e
g−1,τ−1e
1,t−1ξ = θ̃(τ)e
g−1,τ−1ξ
so that ǫg is well defined. For g, g
′ in GF we choose τ, τ ′ in T̃Fx such that gτ−1 ∈
G0, g′τ ′−1 ∈ G0; we have
ǫgǫg′ξ = θ̃(τ
′)θ̃(τ)e∗g−1,τ−1e
g′−1,τ ′−1ξ = θ̃(ττ
′)e∗(gg′)−1,(ττ ′)−1ξ = ǫgg′ξ.
We see that
8 G. LUSZTIG
g 7→ ǫg defines a G
F -module structure on Hic(Y )θ extending the G
0F -module
structure in 2.1.
(Note that this extension depends on the choice of θ̃.) We show:
(c) If (g, τ) ∈ Γ then F reg,τ : Y −→ Y is the Frobenius map of an Fq-rational
structure on Y .
Since eg,t is a part of a Γ-action, it has finite order. Since F
r = F rx : G −→ G (see
2.1), we see that F r : Y −→ Y commutes with eg,τ : Y −→ Y . Hence (c) holds.
2.4. We preserve the setup of 2.3 and assume in addition that L satisfies 1.3(i).
Let i0 = 2dimU
∗ − l(w). Using 2.2(a), 2.3(c) and Grothendieck’s trace formula
we see that for (g, d) ∈ Γ we have
(−1)l(w)θ̃(d)qi0r/2tr(ǫg, H
c (Y )θ)
= θ̃(d)
(−1)itr((F r)∗ǫg, H
c(Y )θ) =
(−1)itr((F r)∗e∗g−1,d−1 , H
c(Y )θ)
(−1)i|TFx |−1
t∈TFx
tr((F r)∗e∗g−1,d−1e
1,t−1 , H
c(Y ))θ(t)
= |TFx |−1
t∈TFx
(−1)itr((F r)∗e∗g−1,(dt)−1 , H
c(Y ))θ(t)
= |TFx |−1
t∈TFx
g−1,(dt)−1 |θ(t)
= |TFx |−1
t∈TFx
|{hU∗ ∈ (G0/U∗); h−1F (h) ∈ U∗xU∗, h−1g−1F r(h)dt ∈ U∗}|θ(t).
3. Proof of Theorem 1.2
3.1. Let A, ψ, χψ be as in 1.2. Let L, w be as in the end of 1.2. Let x ∈ [w]. From
1.2(a) we see that 2.1(i) holds. Let r ≥ 1 be as in 2.1. Let
w = (w, F (w), . . . , F r−1(w)).
By the choice of r we have wF (w) . . . F r−1(w) = 1. Define a morphism F̃ : Zw −→
Zw by
F̃ (B0, B1, . . . , Br, g) = (F (g
−1Br−1g), F (B0), F (B1), . . . , F (Br−1), F (g)).
We show:
(a) Let g ∈ DF and let F̃g : p
(g) −→ p−1
(g) be the restriction of F̃ : Zw −→ Zw.
Then F̃g is the Frobenius map of an Fq-rational structure on p
It is enough to note that the map Br+1 −→ Br+1 given by
(B0, B1, . . . , Br) 7→ (F (g
−1Br−1g), F (B0), F (B1), . . . , F (Br−1))
GENERIC CHARACTER SHEAVES ON DISCONNECTED GROUPS AND CHARACTER VALUES9
is the composition of the map
F ′ : (B0, B1, . . . , Br) 7→ (F (B0), F (B1), . . . , F (Br))
(the Frobenius map of an Fq-rational structure on B
r+1) with the automorphism
(B0, B1, . . . , Br) 7→ (g
−1Br−1g, B0, B1, . . . , Br−1)
of Br+1 which commutes with F ′ and has finite order (since g has finite order in
Let d ∈ T̃FxD . Define a morphism F̃
′ : Żw,d −→ Żw,d by
F̃ ′(h0U
∗, h1B
∗, . . . , hr−1B
∗, hrU
∗, g) = (h′0U
∗, h′1B
∗, . . . , h′r−1B
∗, h′rU
∗, F (g))
where
h′0 = F (g
−1hr−1k(h
r−1hr))x
−1d, h′r = F (hr−1k(h
r−1hr)x
h′i = F (hi−1) for i ∈ [1, r − 1].
This is well defined since
(F (hr−1k(h
r−1hr)x
−1)−1F (g)F (g−1hr−1k(h
r−1hr))x
−1)dd−1 = 1.
We show that the T -action on Żw,d (see 1.3) satisfies F̃
′(t0x̃) = Fx(t0)F̃
′(x̃) for
t0 ∈ T, x̃ ∈ Żw,d. Let (hi) be as above. We must show:
F (g−1hr−1k(h
r−1hrdt
−1))x−1d = F (g−1hr−1k(h
r−1hr))x
−1dxF (t−10 )x
F (hr−1k(h
r−1hrdt
−1)x−1 = F (hr−1k(h
r−1hr)x
−1dxF (t0)
−1x−1d−1,
which follow from F (d) = x−1dx. Note that
(b) awF̃
′ = F̃ aw : Żw,d −→ Zw.
We show:
(c) |a−1
(y)F̃
| = |TFx | for any y ∈ ZF̃
Since a−1
(y) is a homogeneous T -space this follows from Lang’s theorem applied
to (T, Fx).
We have
(d) pwF̃ = Fpw : Zw −→ D.
3.2. We show:
(a) bwF̃
′ = Fxbw : Żw,d −→ T .
Let (h0, h1, . . . , hr, g) ∈ (G
0)r+1 ×D be such that
∗, h1B
∗, . . . , hr−1B
∗, hrU
∗, g) ∈ Żw,d.
10 G. LUSZTIG
Let (h′1, h
2, . . . , h
r) be as in 3.1. We set
µ = k(h−10 h1)k(h
1 h2) . . . k(h
r−1hr) ∈ T,
µ′ = k(h−10 h1)k(h
1 h2) . . . k(h
r−2hr−1) ∈ B
∗F r−1(x)−1B∗
µ̃ = k(h′0
−1h′1)k(h
−1h′2) . . . k(h
−1h′r) ∈ T
so that µ = µ′k(h−1r−1hr) and
µ̃ = k(d−1xF (k(h−1r−1hr)
−1h−1r−1gh0))
× k(F (h−10 h1)) . . . k(F (h
r−3hr−2))k(F (h
r−2hr−1k(h
r−1hr))x
= d−1xF (k(h−1r−1hr)
−1)F (d)k(F (d−1)F (h−1r−1gh0))F (µ
′)F (k(h−1r−1hr))x
= d−1xF (d)F (µ)x−1 = xF (µ)x−1 = Fx(µ),
as required.
3.3. Let φ : F ∗xL
−→ L, θ : TFx −→ Q̄∗l be as in 2.1. We shall denote by ? the
various isomorphisms induced by φ such as:
(a) F̃ ′∗b∗
L = b∗
F ∗xL
−→ b∗
L (see 3.2(a)),
(b) F̃ ′∗a∗
−→ a∗
L̃w (coming from (a)),
(c) a∗
F̃ ∗L̃w
−→ a∗
L̃w (see (b) and 3.1(b)),
(d) F̃ ∗L̃w
−→ L̃w (coming from (c)),
(e) pw!F̃
−→ pw!L̃w (coming from (d)),
(f) F ∗pw!L̃w
−→ pw!L̃w (coming from (e) and 3.1(d)).
(g) F ∗(pw!L̃w[lw])
−→ pw!L̃w[lw] (coming from (f)).
3.4. For any g ∈ DF we compute
(−1)itr(?,Hig(pw!L̃w)) =
(−1)itr(?, Hic(p
(g), L̃w))
(g);F̃ (y)=y
tr(?, (L̃w)y)
where Hi is the i-th cohomology sheaf. (The last two sums are equal by the
Grothendieck trace formula applied in the context of 3.1(a).) Using 3.1(c) we see
that the last sum equals
|TFx |−1
(g))F̃
tr(?, (a∗
L̃w)ỹ) = |T
Fx |−1
(g))F̃
tr(?, (b∗
Lw)ỹ)
= |TFx |−1
(g))F̃
tr(?, (Lw)bw(ỹ)).
GENERIC CHARACTER SHEAVES ON DISCONNECTED GROUPS AND CHARACTER VALUES11
Now a−1
(g))F̃
can be identified with the set of all
∗, h1B
∗, . . . , hr−1B
∗, hrU
∗) ∈ (G0/U∗)×(G0/B∗)× . . .×(G0/B∗)×(G0/U∗)
such that
(a) k(h−1i−1hi) ∈ F
i−1(x)T for i ∈ [1, r],
(b) h−1r gh0d
−1 ∈ U∗,
(c) h0U
∗ = F (g−1hr−1k(h
r−1hr))x
−1dU∗,
(d) hiB
∗ = F (hi−1)B
∗ for i ∈ [1, r − 1].
(We then have automatically hrU
∗ = F (hr−1k(h
r−1hr)x
−1U∗.) If h0U
∗ is given,
then (d) determines successively h2B
∗, . . . hr−1B
∗ in a unique way and (b) deter-
mines hrU
∗ in a unique way. We see that the equations (a)-(d) are equivalent to
the following equations for h0U
h−10 F (h0) ∈ B
∗xB∗, F r−1(h0)
−1gh0d
−1 ∈ B∗F r−1(x)B∗,
F r(h0)
−1gh0d
−1U∗ = k(F r(h0)
−1gF (h0)F (d
−1))x−1U∗
(if r ≥ 2) and
h−10 gh0d
−1 ∈ B∗xB∗, F (h0)
−1gh0d
−1U∗ = k(F (h0)
−1gF (h0)F (d
−1))x−1U∗
(if r = 1). In both cases these equations are equivalent to
(e) h−10 F (h0) ∈ U
∗txF (t)−1U∗, F r(h0)
−1gh0d
−1 ∈ F r(t)U∗
for some t ∈ T . We then have F r−1(h0)
−1gh0d
−1 ∈ U∗F r−1(t)F r−1(x)U∗. For
∗, t as in (e) we compute
k(h−10 F (h0))k(F (h0)
−1F 2(h0)) . . . k(F
r−2(h0)
−1F r−1(h0))k(F
r−1(h0)
−1gh0d
= (txF (t)−1)(F (t)F (x)F 2(t−1)) . . . (F r−2(t)F r−2(x)F r−1(t)−1)(F r−1(t)F r−1(x))
= txF (x) . . . F r−1(x) = t.
By 3.2(a) the result of the last computation is necessarily in TFx . Thus Fx(t) = t.
Hence F r(t) = t and the equations (e) become
(f) h−10 F (h0) ∈ U
∗xU∗, F r(h0)
−1gh0d
−1 ∈ TFxU∗.
We see that
(−1)itr(?,Hig(pw!L̃w)) = |T
Fx |−1
t∈TFx
at = |T
Fx |−1
t′∈TFx
where
at = |{hU
∗ ∈ (G0/U∗); h−1F (h) ∈ U∗xU∗, dh−1g−1F r(h)t ∈ U∗}|θ(t),
12 G. LUSZTIG
a′t′ = |{hU
∗ ∈ (G0/U∗); h−1F (h) ∈ U∗xU∗, h−1g−1F r(h)dt′ ∈ U∗}|θ(dt′d−1).
Comparing with the last formula in 2.4 and using θ(dt′d−1) = θ(t′) for t′ ∈ TFx
we obtain (with i0 as in 2.4):
(−1)itr(?,Hig(pw!L̃w)) = (−1)
l(w)θ̃(d)qi0r/2tr(ǫg, H
c (Y )θ).
Let us choose an isomorphism pw!L̃w[lw] ∼= p∅!L̃∅. (This exists by 1.4; note that
1.4(i) holds by 1.5.) Via this isomorphism, the isomorphism 3.3(g) corresponds to
an isomorphism F ∗(p∅!L̃∅) −→ p∅!L̃∅ that is to an isomorphism ψ
′ : F ∗A
−→ A so
that ∑
(−1)itr(?,Hig(pw!L̃w)) =
(−1)itr(ψ′,Hig(A))
for any g ∈ DF . (We use that lw is even.) Since A is irreducible, we must have
ψ = λ′ψ′ for some λ′ ∈ Q̄∗l . It follows that
(−1)itr(ψ,Hig(A)) = λ
′(−1)l(w)θ̃(d)qi0r/2tr(ǫg, H
c (Y )θ)
for any g ∈ DF . Thus Theorem 1.2 holds with V being the GF -module Hi0c (Y )θ,
which is irreducible (even as a G0F -module) if G0 has connected centre, but is not
necessarily irreducible in general.
References
[DL] P.Deligne and G.Lusztig, Representations of reductive groups over finite fields, Ann.Math.
103 (1976), 103-161.
[DM] F.Digne and J.Michel, Groupes réductifs non connexes, Ann.Sci. École Norm.Sup.
27 (1994), 345-406.
[L1] G.Lusztig, Green functions and character sheaves, Ann.Math. 131 (1990), 355-408.
[L2] G. Lusztig, Remarks on computing irreducible characters, J.Amer.Math.Soc. 5 (1992),
971-986.
[L3] G.Lusztig, Character sheaves on disconnected groups,I, Represent. Th. (electronic) 7
(2003), 374-403; II 8 (2004), 72-124; III 8 (2004), 125-144; IV 8 (2004), 145-178; Er-
rata 8 (2004), 179-179; V 8 (2004), 346-376; VI 8 (2004), 377-413; VII 9 (2005), 209-266;
VIII 10 (2006), 314-352; IX 10 (2006), 353-379.
[M] G.Malle, Generalized Deligne-Lusztig characters, J.Algebra 159 (1993), 64-97.
[S] T.Shoji, Character sheaves and almost characters of reductive groups, Adv.in Math. 111
(1995), 244-313; II 111 (1995), 314-354.
Department of Mathematics, M.I.T., Cambridge, MA 02139
ABSTRACT
  We relate a generic character sheaf on a disconnected reductive group with a
character of a representation of the rational points of the group over a finite
field extending a result known in the connected case.

<|endoftext|><|startoftext|>
Measurement of D0-D 0 mixing in D0 → K0
− decays
L. M. Zhang,37 Z. P. Zhang,37 I. Adachi,7 H. Aihara,45 V. Aulchenko,1 T. Aushev,18, 13 A. M. Bakich,40
V. Balagura,13 E. Barberio,21 A. Bay,18 K. Belous,12 U. Bitenc,14 A. Bondar,1 A. Bozek,27 M. Bračko,20,14
J. Brodzicka,7 T. E. Browder,6 P. Chang,26 Y. Chao,26 A. Chen,24 K.-F. Chen,26 W. T. Chen,24 B. G. Cheon,5
C.-C. Chiang,26 I.-S. Cho,50 Y. Choi,39 Y. K. Choi,39 J. Dalseno,21 M. Danilov,13 M. Dash,49 A. Drutskoy,3
S. Eidelman,1 D. Epifanov,1 S. Fratina,14 N. Gabyshev,1 G. Gokhroo,41 B. Golob,19, 14 H. Ha,16 J. Haba,7
T. Hara,32 N. C. Hastings,45 K. Hayasaka,22 H. Hayashii,23 M. Hazumi,7 D. Heffernan,32 T. Hokuue,22 Y. Hoshi,43
W.-S. Hou,26 Y. B. Hsiung,26 H. J. Hyun,17 T. Iijima,22 K. Ikado,22 K. Inami,22 A. Ishikawa,45 H. Ishino,46
R. Itoh,7 M. Iwasaki,45 Y. Iwasaki,7 N. J. Joshi,41 D. H. Kah,17 H. Kaji,22 S. Kajiwara,32 J. H. Kang,50 H. Kawai,2
T. Kawasaki,29 H. Kichimi,7 H. J. Kim,17 H. O. Kim,39 S. K. Kim,38 Y. J. Kim,4 K. Kinoshita,3 S. Korpar,20, 14
P. Križan,19, 14 P. Krokovny,7 R. Kumar,33 C. C. Kuo,24 A. Kuzmin,1 Y.-J. Kwon,50 J. S. Lee,39 M. J. Lee,38
S. E. Lee,38 T. Lesiak,27 J. Li,6 A. Limosani,21 S.-W. Lin,26 Y. Liu,4 D. Liventsev,13 T. Matsumoto,47 A. Matyja,27
S. McOnie,40 T. Medvedeva,13 W. Mitaroff,11 H. Miyake,32 H. Miyata,29 Y. Miyazaki,22 R. Mizuk,13 Y. Nagasaka,8
I. Nakamura,7 E. Nakano,31 M. Nakao,7 Z. Natkaniec,27 S. Nishida,7 O. Nitoh,48 S. Ogawa,42 T. Ohshima,22
S. Okuno,15 S. L. Olsen,6 Y. Onuki,35 W. Ostrowicz,27 H. Ozaki,7 P. Pakhlov,13 G. Pakhlova,13 C. W. Park,39
H. Park,17 L. S. Peak,40 R. Pestotnik,14 L. E. Piilonen,49 A. Poluektov,1 H. Sahoo,6 Y. Sakai,7 O. Schneider,18
J. Schümann,7 C. Schwanda,11 A. J. Schwartz,3 R. Seidl,9, 35 K. Senyo,22 M. E. Sevior,21 M. Shapkin,12 H. Shibuya,42
S. Shinomiya,32 J.-G. Shiu,26 B. Shwartz,1 J. B. Singh,33 A. Sokolov,12 A. Somov,3 N. Soni,33 S. Stanič,30
M. Starič,14 H. Stoeck,40 K. Sumisawa,7 T. Sumiyoshi,47 S. Suzuki,36 O. Tajima,7 F. Takasaki,7 K. Tamai,7
N. Tamura,29 M. Tanaka,7 G. N. Taylor,21 Y. Teramoto,31 X. C. Tian,34 I. Tikhomirov,13 T. Tsuboyama,7
S. Uehara,7 K. Ueno,26 T. Uglov,13 Y. Unno,5 S. Uno,7 P. Urquijo,21 Y. Usov,1 G. Varner,6 K. Vervink,18
S. Villa,18 A. Vinokurova,1 C. H. Wang,25 M.-Z. Wang,26 P. Wang,10 Y. Watanabe,15 E. Won,16 B. D. Yabsley,40
A. Yamaguchi,44 Y. Yamashita,28 M. Yamauchi,7 C. Z. Yuan,10 C. C. Zhang,10 V. Zhilich,1 and A. Zupanc14
(The Belle Collaboration)
1Budker Institute of Nuclear Physics, Novosibirsk
2Chiba University, Chiba
3University of Cincinnati, Cincinnati, Ohio 45221
4The Graduate University for Advanced Studies, Hayama
5Hanyang University, Seoul
6University of Hawaii, Honolulu, Hawaii 96822
7High Energy Accelerator Research Organization (KEK), Tsukuba
8Hiroshima Institute of Technology, Hiroshima
9University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
10Institute of High Energy Physics, Chinese Academy of Sciences, Beijing
11Institute of High Energy Physics, Vienna
12Institute of High Energy Physics, Protvino
13Institute for Theoretical and Experimental Physics, Moscow
14J. Stefan Institute, Ljubljana
15Kanagawa University, Yokohama
16Korea University, Seoul
17Kyungpook National University, Taegu
18Swiss Federal Institute of Technology of Lausanne, EPFL, Lausanne
19University of Ljubljana, Ljubljana
20University of Maribor, Maribor
21University of Melbourne, School of Physics, Victoria 3010
22Nagoya University, Nagoya
23Nara Women’s University, Nara
24National Central University, Chung-li
25National United University, Miao Li
26Department of Physics, National Taiwan University, Taipei
27H. Niewodniczanski Institute of Nuclear Physics, Krakow
28Nippon Dental University, Niigata
29Niigata University, Niigata
30University of Nova Gorica, Nova Gorica
31Osaka City University, Osaka
32Osaka University, Osaka
Typeset by REVTEX
http://arxiv.org/abs/0704.1000v2
33Panjab University, Chandigarh
34Peking University, Beijing
35RIKEN BNL Research Center, Upton, New York 11973
36Saga University, Saga
37University of Science and Technology of China, Hefei
38Seoul National University, Seoul
39Sungkyunkwan University, Suwon
40University of Sydney, Sydney, New South Wales
41Tata Institute of Fundamental Research, Mumbai
42Toho University, Funabashi
43Tohoku Gakuin University, Tagajo
44Tohoku University, Sendai
45Department of Physics, University of Tokyo, Tokyo
46Tokyo Institute of Technology, Tokyo
47Tokyo Metropolitan University, Tokyo
48Tokyo University of Agriculture and Technology, Tokyo
49Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061
50Yonsei University, Seoul
We report a measurement of D0-D 0 mixing in D0 → K0S π
+π− decays using a time-dependent
Dalitz plot analysis. We first assume CP conservation and subsequently allow for CP violation. The
results are based on 540 fb−1 of data accumulated with the Belle detector at the KEKB e+e− collider.
Assuming negligible CP violation, we measure the mixing parameters x = (0.80± 0.29+0.09 +0.10−0.07 −0.14)%
and y = (0.33 ± 0.24+0.08 +0.06−0.12 −0.08)%, where the errors are statistical, experimental systematic, and
systematic due to the Dalitz decay model, respectively. Allowing for CP violation, we obtain the
CPV parameters |q/p| = 0.86+0.30 +0.06−0.29 −0.03 ± 0.08 and arg(q/p) = (−14
+16+5+2
−18−3−4)
PACS numbers: 13.25.Ft, 11.30.Er, 12.15.Ff
Mixing in the D0-D 0 system is predicted to be very
small in the Standard Model (SM) [1] and, unlike in K0,
B0, and B0s systems, has eluded experimental observa-
tion. Recently, evidence for this phenomenon has been
found in D0 → K+K−/π+π− [2] and D0 → K+π− [3]
decays. It is important to measure D0-D 0 mixing in
other decay modes and to search for CP -violating (CPV )
effects in order to determine whether physics contribu-
tions outside the SM are present. Here we study the
self-conjugate decay D0→K0S π+π−.
The time-dependent probability of flavor eigenstates
D0 and D 0 to mix to each other is governed by the
lifetime τD0 = 1/Γ, and the mixing parameters x =
(m1 − m2)/Γ and y = (Γ1 − Γ2)/2Γ. The parameters
m1,m2 (Γ1,Γ2) are the masses (decay widths) of the mass
eigenstates |D1,2〉 = p|D0〉±q|D 0〉, and Γ = (Γ1+Γ2)/2.
The parameters p and q are complex coefficients sat-
isfying |p|2 + |q|2 = 1. Various D0 decay modes have
been exploited to measure or constrain x and y [4]. For
D0→K0S π+π− decays, the time dependence of the Dalitz
plot distribution allows one to measure x and y directly.
This method was developed by CLEO [5] using 9.0 fb−1
of data; here we extend this method to a data sample 60
times larger.
The decay amplitude at time t of an initially produced
|D0〉 or |D 0〉 can be expressed as
,m2+, t) = A(m2−,m2+)
e1(t) + e2(t)
,m2+)
e1(t)− e2(t)
,m2+, t) = A(m2−,m2+)
e1(t) + e2(t)
,m2+)
e1(t)− e2(t)
, (1)
where A and A are the amplitudes for |D0〉 and |D 0〉
decays as functions of the invariant-masses-squared vari-
ables m2
≡ m2(K0S π±). The time dependence is con-
tained in the terms e1,2(t) = exp[−i(m1,2 − iΓ1,2/2)t].
Upon squaring M and M, one obtains decay rates con-
taining terms exp(−Γt) cos(xΓt), exp(−Γt) sin(xΓt), and
exp[−(1± y)Γt].
We parameterize the K0Sπ
+π− Dalitz distribution fol-
lowing Ref. [6]. The overall amplitude as a function of
m2+ and m
is expressed as a sum of quasi-two-body am-
plitudes (subscript r) and a constant non-resonant term
(subscript NR):
,m2+) =
iφrAr(m2−,m2+) + aNReiφNR , (2)
,m2+) =
iφ̄rAr(m2+,m2−) + āNReiφ̄NR . (3)
The functions Ar are products of Blatt-Weisskopf form
factors and relativistic Breit-Wigner functions [7].
The data were recorded by the Belle detector at the
KEKB asymmetric-energy e+e− collider [8]. The Belle
detector [9] includes a silicon vertex detector (SVD), a
central drift chamber (CDC), an array of aerogel thresh-
old Cherenkov counters (ACC), a barrel-like arrangement
of time-of-flight scintillation counters (TOF), and an elec-
tromagnetic calorimeter.
We reconstruct D0 candidates via the decay chain
D∗+ → π+s D0, D0 → K0Sπ+π− [10]. Here, πs denotes
a low-momentum pion, the charge of which tags the fla-
vor of the neutral D at production. The K0S candidates
are reconstructed in the π+π− final state; we require
that the pion candidates form a common vertex sepa-
rated from the interaction region and have an invariant
mass within ±10 MeV/c2 of m
. We reconstruct D0
candidates by combining the K0S candidate with two op-
positely charged tracks assigned as pions. These tracks
are required to have at least two SVD hits in both r-φ and
z coordinates. A D∗+ candidate is reconstructed by com-
bining the D0 candidate with a low momentum charged
track (the π+s candidate); the resulting D
∗+ momentum
in the e+e− center-of-mass (CM) frame is required to be
larger than 2.5 GeV/c in order to eliminate BB events
and suppress combinatorial background.
The charged pion tracks are refitted to originate from
a common vertex, which represents the decay point of
the D0. The D∗+ vertex is taken to be the intersection
of the D0 momentum vector with the e+e− interaction
region. The D0 proper decay time is calculated from
the projection of the vector joining the two vertices (~L)
onto the momentum vector: t = ~L · (~p/p)(m
/p). The
uncertainty in t (σt) is calculated event-by-event, and we
require σt < 1 ps (for selected events, 〈σt〉 ∼ 0.2 ps).
The signal and background yields are determined from
a two-dimensional fit to the variables m
and Q ≡
−mπ)·c2. The variableQ is the kinetic
energy released in the decay and equals only 5.9 MeV for
D∗+→π+s D0 decays. We parameterize the signal shape
by a triple-Gaussian function for m
, and the sum
of a bifurcated Student t distribution and a Gaussian
function for Q. The backgrounds are classified into two
types: random πs background, in which a random πs
is combined with a true D0 decay, and combinatorial
background. The shape of the m
distribution for
the random πs background is fixed to be the same as
that used for the signal. Other background distributions
are obtained from Monte Carlo (MC) simulation. We
perform a two-dimensional fit to the measured m
distributions in a wide range 1.81 GeV/c2 < m
1.92 GeV/c2 and 0<Q< 20 MeV. We define a smaller
signal region |m
− mD0 | < 15 MeV/c2 and |Q −
5.9 MeV| < 1.0 MeV, corresponding to 3σ intervals in
these variables. In this region we find 534410±830 signal
events and background fractions of 1% and 4% for the
random πs and combinatorial backgrounds, respectively.
and Q distributions are shown in Fig. 1 along
with projections of the fit result.
mKsππ (GeV)
1.825 1.85 1.875 1.9
Signal
Random π
Combinatorial
Q (MeV)
0 5 10 15 20
FIG. 1: The distribution of (a) m
with 0 < Q < 20 MeV;
(b) Q with 1.81 GeV/c2 < m
< 1.92 GeV/c2. Superim-
posed on the data (points with error bars) are projections of
the m
-Q fit.
For the events selected in the signal region we perform
an unbinned likelihood fit to the Dalitz plot variables
and m2+, and the decay time t. For D
0 decays, the
likelihood function is
fj(mK0
ππ,i, Qi)Pj(m2−,i,m2+,i, ti) , (4)
where j = {sig, rnd, cmb} denotes the signal or back-
ground components, and the index i runs over D0 candi-
dates. The event weights fj are functions of mK0
Q and are obtained from the m
-Q fit mentioned
above.
The probability density function (PDF)
Psig(m2−,m2+, t) equals |M(m2−,m2+, t)|2 convolved
with the detector response. Resolution effects in two-
particle invariant masses are significant only for m2ππ.
The latter, and variation of the efficiency across the
Dalitz plot, are taken into account using the method
described in Ref. [6]. The resolution in decay time t
is accounted for by convolving Psig with a resolution
function consisting of a sum of three Gaussians with a
common mean and widths σk = Sk · σt,i (k = 1− 3).
The scale factors Sk and the common mean are free
parameters in the fit.
The random πs background contains real D
0 and
D 0 decays; in this case the charge of the πs is un-
correlated with the flavor of the neutral D. Thus the
Prnd PDF is taken to be (1 − fw)|M(m2−,m2+, t)|2 +
fw|M(m2−,m2+, t)|2, convolved with the same resolution
function as that used for the signal, where fw is the
wrong-tag fraction. We measure fw = 0.452±0.005 from
fitting events in the Q sideband 3 MeV< |Q−5.9 MeV| <
14.1 MeV.
For the combinatorial background, Pcmb is the prod-
uct of Dalitz-plot and decay time PDFs. The latter is
parameterized as the sum of a delta function and an
exponential function convolved with a Gaussian resolu-
tion function. The timing and Dalitz PDF parameters
are obtained from fitting events in the mass sideband
30 MeV/c2< |m
−mD0 | < 55 MeV/c2.
The likelihood function for D 0 decays, L, has the same
form as L, with M and M (appearing in Psig and Prnd)
interchanged. To determine x and y, we maximize the
sum lnL+lnL. Table I lists the results from two separate
fits. In the first fit we assume CP is conserved, i.e.,
a = ā, φ = φ̄, and p/q = 1. We fit all events in the
signal region, where the free parameters are x, y, τ
the timing resolution parameters of the signal, and the
Dalitz plot resonance parameters ar(NR) and φr(NR). The
fit gives τ
= (409.9 ± 1.0) fs, which is consistent with
the world average [11]. The results for ar and φr for the
18 quasi-two-body resonances used (following the same
model as in Ref. [6]) and the NR contribution are listed
in Table II. The Dalitz plot and its projections, along
with projections of the fit result, are shown in Fig. 2. We
estimate the goodness-of-fit of the Dalitz plot through a
two-dimensional χ2 test [6] and obtain χ2/ndf = 2.1 for
3653−40 degrees of freedom (ndf). We find that the main
features of the Dalitz plot are well reproduced, with some
significant but numerically small discrepancies at peaks
and dips of the distribution in the very high m2
region.
The decay-time distribution for all events, and the ratio
of decay-time distribution for events in theK∗(892)+ and
K∗(892)− regions, are shown in Fig. 3.
TABLE I: Fit results and 95% C.L. intervals for x and y,
including systematic uncertainties. The errors are statisti-
cal, experimental systematic, and decay-model systematic,
respectively. For the CPV -allowed case, there is another so-
lution as described in the text.
Fit case Parameter Fit result 95% C.L. interval
No x(%) 0.80 ± 0.29 +0.09+0.10−0.07−0.14 (0.0, 1.6)
CPV y(%) 0.33 ± 0.24 +0.08+0.06−0.12−0.08 (−0.34, 0.96)
CPV x(%) 0.81 ± 0.30 +0.10+0.09−0.07−0.16 |x| <1.6
y(%) 0.37 ± 0.25 +0.07+0.07−0.13−0.08 |y| <1.04
|q/p| 0.86+0.30 +0.06−0.29 −0.03 ± 0.08 -
arg(q/p)(◦) −14+16+5+2−18−3−4 -
For the second fit, we allow for CPV . This introduces
the additional free parameters |p/q|, arg(p/q), ār(NR) and
φ̄r(NR). The fit gives two solutions: if {x, y, arg(p/q)} is
a solution, then {−x, −y, arg(p/q)+π} is an equally good
solution. From the fit to data, we find that the Dalitz plot
parameters are consistent for the D0 and D 0 samples;
hence we observe no evidence for direct CPV . Results
for |p/q| and arg(p/q), parameterizing CPV in mixing
and interference between mixed and unmixed amplitudes,
respectively, are also found to be consistent with CP con-
servation. If we fit the data assuming no direct CPV , the
values for x and y are essentially the same as those for the
TABLE II: Fit results for Dalitz plot parameters. The errors
are statistical only.
Resonance Amplitude Phase (◦) Fit fraction
K∗(892)− 1.629 ± 0.006 134.3 ± 0.3 0.6227
K∗0 (1430)
− 2.12 ± 0.02 −0.9± 0.8 0.0724
K∗2 (1430)
− 0.87 ± 0.02 −47.3 ± 1.2 0.0133
K∗(1410)− 0.65 ± 0.03 111± 4 0.0048
K∗(1680)− 0.60 ± 0.25 147± 29 0.0002
K∗(892)+ 0.152 ± 0.003 −37.5 ± 1.3 0.0054
K∗0 (1430)
+ 0.541 ± 0.019 91.8 ± 2.1 0.0047
K∗2 (1430)
+ 0.276 ± 0.013 −106± 3 0.0013
K∗(1410)+ 0.33 ± 0.02 −102± 4 0.0013
K∗(1680)+ 0.73 ± 0.16 103± 11 0.0004
ρ(770) 1 (fixed) 0 (fixed) 0.2111
ω(782) 0.0380 ± 0.0007 115.1 ± 1.1 0.0063
f0(980) 0.380 ± 0.004 −147.1 ± 1.1 0.0452
f0(1370) 1.46 ± 0.05 98.6 ± 1.8 0.0162
f2(1270) 1.43 ± 0.02 −13.6 ± 1.2 0.0180
ρ(1450) 0.72 ± 0.04 41± 7 0.0024
σ1 1.39 ± 0.02 −146.6 ± 0.9 0.0914
σ2 0.267 ± 0.013 −157± 3 0.0088
NR 2.36 ± 0.07 155± 2 0.0615
1 2 3
2 (GeV2/c4)
10000
1 2 3
2 (GeV2/c4)
20000
40000
1 2 3
2 (GeV2/c4)
10000
0 0.5 1 1.5 2
2 (GeV2/c4)
FIG. 2: Dalitz plot distribution and the projections for data
(points with error bars) and the fit result (curve). Here, m2±
corresponds to m2(K0Sπ
±) for D0 decays and to m2(K0Sπ
for D 0 decays.
CP -conservation case, and the values for the CPV pa-
rameters are further constrained: |q/p| = 0.95+0.22
−0.20 and
arg(q/p) = (−2+10
◦. A check with independent fits to
theD0 andD 0 tagged samples gives consistent results for
x (y): 0.58%±0.41% (0.45%±0.33%) and 1.04%±0.41%
(0.21%± 0.34%), respectively.
We consider systematic uncertainties arising from both
experimental sources and from the D0→K0S π+π− decay
Proper time (fs)
-2000 0 2000 4000
FIG. 3: (a) The decay-time distribution for events in the
Dalitz plot fit region for data (points with error bars), and
the fit projection for the CP -conservation fit (curve). The
hatched area represents the combinatorial background contri-
bution. (b) Ratio of decay-time distributions for events in the
K∗(892)+ and K∗(892)− regions.
model. We estimate these uncertainties by varying rel-
evant parameters by their ±1σ errors and interpreting
the change in x and y as the systematic uncertainty due
to that source. The main sources of experimental uncer-
tainty are the modeling of the background, the efficiency,
and the event selection criteria. We vary the background
normalization and timing parameters within their uncer-
tainties, and we also set fw equal to its expected value
of 0.5 or alternatively let it float. To investigate possi-
ble correlations between the Dalitz plot (m2+,m
) dis-
tribution and the t distribution of combinatorial back-
ground, the Dalitz plot distribution is obtained for three
bins of decay time; these PDFs are then used accord-
ing to the reconstructed t of individual events. We also
try a uniform efficiency function, and we apply a “best-
candidate” selection to check the effect of the small frac-
tion of multiple-candidate events. We add all variations
in x and y in quadrature to obtain the overall experimen-
tal systematic error.
The systematic error due to our choice of D0 →
+π− decay model is evaluated as follows. We
vary the masses and widths of the intermediate res-
onances by their known uncertainties [11], and we
also try fits with Blatt-Weisskopf form factors set to
unity and with no q2 dependence in the Breit-Wigner
widths. We perform a series of fits successively exclud-
ing intermediate resonances that give small contributions
(ρ(1450), K∗(1680)+), and we also exclude the NR con-
tribution. We account for uncertainty in modeling of
the S-wave ππ component by using K-matrix formal-
ism [12]. We include an uncertainty due to the effect of
around 10-20% bias in the amplitudes for theK∗(1410)±,
K∗0 (1430)
+ andK∗2 (1430)
+ intermediate states, which we
observe in MC studies. Adding all variations in quadra-
ture gives the final results listed in Table I.
We obtain a 95% C.L. contour in the (x, y) plane
by finding the locus of points where −2 lnL increases
by 5.99 units with respect to the minimum value (i.e.,
−2∆ lnL=5.99). All fit variables other than x and y are
allowed to vary to obtain best-fit values at each point
on the contour. To include systematic uncertainty, we
rescale each point on the contour by a factor
1 + r2,
where r2 is a weighted average of the ratios of systematic
to statistical errors for x and y, where the weights de-
pend on the position on the contour. Both the statistical-
only and overall contours for both the CPV -allowed and
the CP -conservation case are shown in Fig. 4. We note
that for the CPV -allowed case, the reflection of these
contours through the origin (0, 0) are also allowed re-
gions. Projecting the overall contour onto the x, y axes
gives the 95% C.L. intervals listed in Table I. After
the systematics-rescaling procedure, the no-mixing point
(0,0) has a value −2∆ lnL = 7.3; this corresponds to a
C.L. of 2.6%. We have confirmed this value by generat-
ing and fitting an ensemble of MC fast-simulated exper-
iments.
x (%)
no CPV (stat. only)
no CPV
CPV (stat. only)
-1 0 1 2
FIG. 4: 95% C.L. contours for (x, y): dotted (solid) corre-
sponds to statistical (statistical and systematic) contour for
no CPV , and dash-dotted (dashed) corresponds to statisti-
cal (statistical and systematic) contour for the CPV -allowed
case. The point is the best-fit result for no CPV .
In summary, we have measured the D0-D 0 mixing
parameters x and y using a Dalitz plot analysis of
D0 → K0S π+π− decays. Assuming negligible CP vi-
olation, we measure x = (0.80 ± 0.29+0.09+0.10
−0.07−0.14)% and
y = (0.33 ± 0.24+0.08+0.06
−0.12−0.08)%, where the errors are sta-
tistical, experimental systematic, and decay-model sys-
tematic, respectively. Our results disfavor the no-mixing
point x = y = 0 with a significance of 2.2σ, while the
one dimensional significance for x > 0 is 2.4σ. We have
also searched for CPV ; we see no evidence for this and
constrain the CPV parameters |q/p| and arg(q/p).
We thank the KEKB group for excellent operation of
the accelerator, the KEK cryogenics group for efficient
solenoid operations, and the KEK computer group and
the NII for valuable computing and Super-SINET net-
work support. We acknowledge support from MEXT and
JSPS (Japan); ARC and DEST (Australia); NSFC and
KIP of CAS (China); DST (India); MOEHRD, KOSEF
and KRF (Korea); KBN (Poland); MES and RFAAE
(Russia); ARRS (Slovenia); SNSF (Switzerland); NSC
and MOE (Taiwan); and DOE (USA).
[1] A. F. Falk et al., Phys. Rev. D 65, 054034 (2002);
I. I. Bigi, N. Uraltsev, Nucl. Phys. B 592, 92 (2001);
A. F. Falk et al., Phys. Rev. D 69, 114021 (2004);
A. A. Petrov, Int. J. Mod. Phys. A21, 5686 (2006).
[2] M. Starič et al. (Belle Collaboration), Phys. Rev. Lett.
98, 211803 (2007).
[3] B. Aubert et al. (BaBar Collaboration), Phys. Rev. Lett.
98, 211802 (2007).
[4] For a review see: D. M. Asner, D0- D 0 Mixing, in Ref.
[11].
[5] D. M. Asner et al. (CLEO Collaboration), Phys. Rev. D
72, 012001 (2005) and arXiv: hep-ex/0503045v3.
[6] A. Poluektov et al. (Belle Collaboration), Phys. Rev. D
73, 112009 (2006).
[7] S. Kopp et al. (CLEO Collaboration), Phys. Rev. D 63,
092001 (2001).
[8] S. Kurokawa, E. Kikutani et al., Nucl. Instrum. Methods
Phys. Res. Sect. A 499, 1 (2003), and other papers in
this volume.
[9] A. Abashian et al. (Belle Collaboration), Nucl. In-
strum. Methods Phys. Res. Sect. A 479, 117 (2002);
Z. Natkaniec et al. (Belle SVD2 Group), Nucl. Instrum.
Methods Phys. Res. Sect. A 560, 1 (2006).
[10] Charge conjugate decays are implied unless explicitly
stated otherwise.
[11] W.-M. Yao et al. (Particle Data Group), J. Phys. G 33,
1 (2006).
[12] J. M. Link et al. (FOCUS Collaboration), Phys. Lett. B
585, 200 (2004); B. Aubert et al. (BaBar Collaboration),
arXiv: hep-ex/0507101.
http://arxiv.org/abs/hep-ex/0503045
http://arxiv.org/abs/hep-ex/0507101
ABSTRACT
  We report a measurement of D0-D0bar mixing in D0->Ks pi+ pi- decays using a
time-dependent Dalitz plot analysis. We first assume CP conservation and
subsequently allow for CP violation. The results are based on 540 fb$^{-1}$ of
data accumulated with the Belle detector at the KEKB $e^+e^-$ collider.
Assuming negligible CP violation, we measure the mixing parameters
$x=(0.80\pm0.29^{+0.09 +0.10}_{-0.07 -0.14})%$ and $y=(0.33\pm0.24^{+0.08
+0.06}_{-0.12 -0.08})%$, where the errors are statistical, experimental
systematic, and systematic due to the Dalitz decay model, respectively.
Allowing for CP violation, we obtain the $CPV$ parameters $|q/p|=0.86^{+0.30
+0.06}_{-0.29 -0.03}\pm0.08$ and $\arg(q/p)=(-14^{+16 +5 +2}_{-18 -3
-4})^\circ$.

<|endoftext|><|startoftext|>
Introduction 1
2. Gromov-Witten theory 5
3. Zwiebach invariants 9
4. Construction of correlators in Hodge field theory 12
5. String, dilaton, and tautological relations 17
6. Vanishing of the BV structure 21
7. Main Lemma 24
8. Proof of Theorem 3 27
References 33
1. Introduction
In this paper we present an attempt to formalize what may be called
a string field theory (SFT) for (closed) topological strings with Hodge
property.
From the very first days of string theory it was considered as a kind
of generalization of the perturbative expansion of the quantum field
theory in the (functional) integral representation. The space of graphs
with g loops with metrics on edges (Schwinger proper times) was gen-
eralized to moduli space of Riemann surfaces. Indeed, the latter space
really looks like a principle U(1)n bundle over the former space near the
http://arxiv.org/abs/0704.1001v1
2 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
points of maximal degeneracy (i.e., where maximal number of handles
are pinched).
A natural question is whether there are special string theories that
degenerate exactly to quantum field theories (may be, of the special
kind). Would it happen such theories should enjoy both finiteness
of string theory and (functional) integral description of quantum field
theory.
One of the first attempts to construct a theory of this type was
done by Zwiebach in [37]. He divided the moduli space into two re-
gions: the internal piece and the boundary. He observed that surfaces
representing the boundary region may be constructed from those rep-
resenting the internal piece by gluing them with the help of cylinders
(with flat metric). Therefore, he proposed to take integrals over the
moduli spaces in two steps: first, to take an integral over the internal
pieces, such that this would produce vertices, and then take an integral
along metrics on cylinders, that would exactly reproduce integral along
the Schwinger parameters on graphs in QFT prescription.
In this approach, he came with the infinite number of vertices of
different internal genera and with different number of external legs.
However, he observed that such vertices satisfy quadratic relations that
where a quantum version of some infinity-structure. At that time com-
munity of theoretical physicists seemed not to be impressed by the
Lagrangian with infinite number of (almost uncomputable1) vertices.
The next attempt was done by Witten [36]. He assumed that in
topological string theories there may be a limit in the space of two-
dimensional theories such that the measure of integration goes to the
vicinity of the points of maximal degeneration. In the type B theories
such limit seems to be the large volume limit of the target space; this
motivated Witten’s Chern-Simons-like representation for the topologi-
cal string theory. This approach was further developed by Bershadsky,
Cecotti, Ooguri, and Vafa in [3]. We note that the tropical limit of
Gromov-Witten theory [26] (type A topological strings) seems to real-
ize the same QFT degeneration of string theory. Indeed, the tropical
limit of a Riemann surface mapped to a toric variety is represented by
the graph mapped to the moment map domain.
In the development of topological string theory it became clear that
the proper object is not just a measure on the moduli space of complex
structures of Riemann surfaces, but rather a differential form on this
space. In original formulation these differential forms were assigned
to the tensor algebra of cohomology of some complex; such objects
are called Gromov-Witten invariants. We say that Gromov-Witten
invariants are QFT-like if the differential forms of non-zero degree have
support only in a vicinity of the points of maximal degeneration.
1 Note, that computation of an integral over a subspace with a boundary is
harder than that one over a compact space.
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 3
We generalized the definition of Gromov-Witten invariants in [22] by
lifting it from the cohomology of a complex to the full complex. Such
generalization involved enlargement of the moduli space from Deligne-
Mumford space to Kimura-Stasheff-Voronov space [15], and we called it
Zwiebach invariants (in fact, some pieces of this construction appeared
earlier in [37] and [8]). The complex of states involved in the definition
of Zwiebach invariants is a bicomplex due to the action of the second
differential. The second differential represents the substitution of a
special vector field corresponding to the constant rotation of the phase
of the local coordinate at a marked point into differential forms on the
Kimura-Stasheff-Voronov space.
Once we have some Zwiebach invariants, it is possible to produce new
Zwiebach invariants by contraction of an acyclic Hodge sub-bicomplex.
In fact, it is one of the main properties of Zwiebach invariants. Con-
sider a sub-bicomplex, where these two differentials act freely. We call
it Hodge contractible bicomplex. The operation of contraction of a
Hodge contractible bicomplex turns Zwiebach invariants into induced
Zwiebach invariants on the coset with respect to contactible bicomplex.
Induced Zwiebach invariants are differential forms whose support is a
union of the support of the initial Zwiebach invariants and small neigh-
bourhoods of the points of maximal degeneration. This procedure is a
generalization from intervals to cylinders of the procedure of induction
of L∞-structures, see e. g. [30, 28].
This way we can obtain QFT-like Gromov-Witten invariants. We
just should start with Zwiebach invariants that have (in some suit-
able sense) no support inside the Kimura-Stasheff-Voronov spaces. In
fact, it is even enough to consider a weaker condition, motivated by
applications. That is, usually people consider the integrals of Gromov-
Witten invarians only over the tautological classes in the moduli space
of curves. So, we call a set of Zwiebach invariants vertex-like if the
integral over the Kimura-Stasheff-Voronov spaces of any their non-
zero component multipled by the pullback (from the Deligne-Mumford
space) of any tautological class vanishes.
Consider vertex-like Zwiebach invariants. Assume we contract a
Hodge contractible bicomplex down to cohomology. We obtain differ-
ential forms on the Deligne-Mumford space, such that the integral of
the product of any such form of non-zero degree with any tautological
class vanishes the interior of the moduli space. Integrals of such forms
over the moduli spaces turn out to be sums over graphs (corresponding
to degenerate Riemann surfaces). They resemble Feynman diagramms,
and generation function for the integral over moduli spaces resemble
diagrammatic expansion of perturbative quantum field theory.
In this paper, we don’t construct examples of vertex-like Zwiebach
invariants (we are going to do this explicitly in a future publication, as
well as the corresponding theory for the spaces introduced in [20, 21]).
4 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
Rather we conjecture that they exist and study the consequences of
this assumption. We call the emerging construction the Hodge field
theory, and now we will explain it in some detail.
First of all, degree zero parts of vertex-like Zwiebach invariants in-
duce the structure of homotopy cyclic Hodge algebra on the target
complex [22]. We remind that a cyclic Hodge algebra is just a Hodge
dGBV-algebra with one additional axiom (1/12-axiom, see below).
In fact, this structure is interesting by itself, without any reference to
Zwiebach invariants. It has first appeared in the paper of Barannikov
and Kontsevich [1]; it captures the properties of polyvector fields on
Calabi-Yau manifolds. More examples of dGBV algebras are studied
in [25] and [23]. It is possible to understand the structure of dGBV-
algebra as a natural generalization of the algebraic structure studied
in [19].
In the Hodge field theory construction we consider only a particular
case, where we obtain axioms of of a cyclic Hodge algebra itself, not
up to homotopy. We are aware of the fact that demanding existance of
vertex-like Zwiebach invariants simultaneously with vanishing homo-
topy piece of cyclic Hodge algebra conditions may be too restrictive,
and while considering only those relations that lead to axioms of cyclic
Hodge algebra may be too weak, however we proceed.
In the Hodge field theory construction we define graph expressions for
the analogues of Gromov-Witten invariants multiplied by tautological
classes using only cyclic Hodge algebra data. We call them Hodge field
theory correlators. The corresponding action of the Hodge field theory
is written down explicitely in Section 6.2.
Our main result is the proof that the Hodge field theory correla-
tors satisfy all universal equations that follow from relations among
tautological classes in cohomology.
The first result of this kind is due to Barannikov and Kontsevich.
They have noticed that there is a solution of the WDVV equation that
is associated to a dGBV-algebra (this solution is the critical value of
the BCOV action [3], see [1, Appendix] and [22, Appendix]). Later, we
reproved this in [22]. Then, in [22, 31, 32, 33] we proved some other
low-genera universal equations. Here we generalize all these result and
put all calculations done before in a proper framework.
In particular, the main problem for us was to define a graph expres-
sion in tensors of a cyclic Hodge algebra that corresponds to the full
Gromov-Witten potential with descendants. The first steps were done
in [31, 32], where we introduced the definition of descendants at one
point in Hodge field theory (mostly for combinatorial reasons). But
then we observed that it is a part of a natural definition of potential
with descendants in cyclic Hodge algebras that appears as a special
case of degeneration of vertex-like Zwiebach invariants multiplied by
tautological classes.
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 5
In this paper, we present and study this construction. We prove in
a completely algebraic way that Hodge field theory correlators satisfy
the same equations as a Gromov-Witten potential: string, dilaton, and
the whole system of PDEs coming from tautological relations in the
cohomology of the moduli space of curves (see also [33] for some pre-
liminary results). In what follows we will not only present the proof
but also will do our best relating algebraic definitions and statements
on Hodge field theory to analoguous constructions and theorems in the
theory of Zwiebach invariants.
1.1. Organization of the paper. In Section 2 we remind all neces-
sary facts about the axiomatic Gromov-Witten theory. In Section 3
we define Zwiebach invariants and explain the motivation to consider
the sums over graphs in cyclic Hodge algebras. In Section 4 we de-
fine cyclic Hodge algebras and the corresponding descendant potential.
In Section 5 we state the main properties of the descendant potential
in cyclic Hodge algebras, and the rest of the paper is devoted to the
proofs.
1.2. Acknowledgments. A.L. was supported by the Russian Fed-
eral Agency of Atomic Energy and by the grants INTAS-03-51-6346,
NSh-8065.2006.2, NWO-RFBR-047.011.2004.026 (RFBR-05-02-89000-
NWO-a), and RFBR-07-02-01161-a.
S.S. was supported by the grant SNSF-200021-115907/1. S.S. is
grateful to the participants of the Moduli Spaces program at the Mittag-
Leffler Institute (Djursholm, Sweden) for the fruitful discussions of
the preliminary versions of the results of this paper. The remarks
of C. Faber, O. Tommasi, and D. Zvonkine were especially helpful.
I.S. was supported by the grant RFBR-06-01-00037.
2. Gromov-Witten theory
In this section we remind what is Gromov-Witten theory and explain
its basic properties that we are going to reproduce in Hodge field theory
construction.
2.1. Gromov-Witten invariants. Let us fix a finite dimensional vec-
tor space H0 over C together with the choice of a homogeneous basis
H0 = 〈e1, . . . , es〉 and a non-degenerate scalar product ηij = (·, ·) on it.
Let e1 be a distinguished even element of the basis.
Consider the moduli spaces of curves Mg,n. On each Mg,n we take
a differential form Ωg,n of mixed degree with values in H
0 . The whole
system of forms {Ωg,n} is called Gromov-Witten invariants, if it satisfies
the axioms [18, 24]:
(1) There are two actions of the symmetric group Sn on Ωg,n. First,
we can relabel the marked points on curves in Mg,n; second,
we can interchange the factors in the tensor product H⊗n0 . We
6 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
require that Ωg,n is equivariant with respect to these two actions
of Sn. In other words, one can think that each copy of H0 in the
tensor product is assigned to a specific marked point on curves
in Mg,n.
(2) The forms must be closed, dΩg,n = 0.
(3) Consider the mapping π : Mg,n+1 → Mg,n forgetting the last
marked point. Then the correspondence between Ωg,n and Ωg,n+1
is given by the formula
(1) π∗Ωg,n = (Ωg,n+1, e1) .
The meaning of the right hand side is the following. We want
to turn a H⊗n+10 -valued form into a H
0 -valued one. So, we
take the copy of H0 corresponding to the last marked point and
contract it with the vector e1 using the scalar product.
(4) Consider an irreducible boundary divisor inMg,n, whose generic
point is represented by a two-component curve. It is the image
of a natural mapping σ : Mg1,n1+1 ×Mg1,n2+1 → Mg,n, where
g = g1 + g1 and n = n1 + n2. We require that
(2) σ∗Ωg,n =
Ωg1,n1+1 ∧ Ωg2,n2+1, η
Here on the right hand side we contract with a scalar product
the two copies of H0 that correspond to the node.
In the same way, consider the divisor of genus g − 1 curves
with one self-intersection. It is the image of a natural mapping
σ : Mg−1,n+2 → Mg,n. In this case, we require that
(3) σ∗Ωg,n =
Ωg−1,n+2, η
As before, we contract two copies of H0 corresponding to the
node.
(5) We also assume that (Ω0,3, e1 ⊗ ei ⊗ ej) = (ei, ej) = ηij .
2.2. Gromov-Witten potential. Let us associate to each ei the set
of formal variables Tn,i, n = 0, 1, 2 . . . . By Fg denote the formal power
series in these variables defined as
(4) Fg :=
a1,...,an≥0
ψaii ,
ejTai,j
The first sum is taken over n ≥ 3 for g = 0, n ≥ 1 for g = 1, and n ≥ 0
for g ≥ 2. On the right hand side, we contract each copy of H0 with
the factor of the tensor product associated to the same marked point.
The formal power series F := exp(
g≥0 ~
g−1Fg) is called Gromov-
Witten potential associated to the system of Gromov-Witten invariants
{Ωg,n}. The coefficients of Fg, g ≥ 0, are called correlators and denoted
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 7
(5) 〈τa1,i1 . . . τan,in〉g :=
Vectors ei1 , . . . , ein are called primary fields.
The main properties of GW potentials come from geometry of the
moduli space of curves. First, one can prove that coefficients of F
satisfy string and dilaton equations:
〈τ0,1
τaj ,ij〉g =
〈τaj−1,ij
k 6=j
τak ,ik〉g;(6)
〈τ1,1
τaj ,ij〉g = (2g − 2 + n)〈
τaj ,ij〉g.(7)
The string equation is a corollary of the fact that π∗ψj = ψj − Dj ;
here π : Mg,n+1 → Mg,n is the projection forgetting the last marked
point andDj is the divisor inMg,n+1 whose generic point is represented
by a two-component curve with one node such that one component
has genus 0 and contains exactly two marked points, the i-th and the
(n+ 1)-th ones. It is assumed that
j=1 aj > 0.
The dilaton equation is a corollary of the fact that, in the same
notations, π∗ψn+1 = 2g−2+n. Of course, we assume that 2g−2+n > 0.
Second, any relation in the cohomology of Mg,n among natural ψ-κ-
strata gives a relation for the correlators. Let us explain this in more
detail.
2.3. Tautological relations.
2.3.1. Stable dual graphs. The moduli space of curves Mg,n [12] has a
natural stratification by the topological type of stable curves. We can
combine natural strata with ψ-classes at marked points and at nodes
and κ-classes on the moduli spaces of irreducible components. These
objects are called ψ-κ-strata.
A convenient way to describe a ψ-κ-stratum in Mg,n is the language
of stable dual graphs. Take a generic curve in the stratum. To each
irreducible component we associate a vertex marked by its genus. To
each node we associate an edge connecting the corresponding vertices
(or a loop, if it is a double point of an irreducible curve). If there is a
marked point on a component, then we add a leaf at the corresponding
vertex, and we label leaves in the same way as marked points. If we
multiply a stratum by some ψ-classes, then we just mark the corre-
sponding leaves or half-edges (in the case when we add ψ-classes at
nodes) by the corresponding powers of ψ. Also we mark each vertex
by the κ-class associated to it.
8 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
Let us remark that by κ-classes on Mg,n we mean the classes
(8) κk1,...,kl := π∗
where π : Mg,n+l → Mg,n is the projections forgetting the last l marked
points. It is just another additive basis in the ring generated by the
ordinary κ-classes (κk, k ≥ 1, in our notations). The basic properties
of these classes are stated in [6].
2.3.2. Integrals over ψ-κ-strata. Using the properties of GW invariants,
one can express the integral of Ωg,n over a ψ-κ-stratum S in terms of
correlators.
Consider a special case, when S is represented by a two-vertex graph
with no ψ- and κ-classes. Then, according to axiom 4, the integral of
Ωg,n is the product of integrals of Ωg1,n1+1 and Ωg2,n2+1 over the moduli
spaces corresponding to the vertices, contracted by the scalar product:
Ωg,n,
Mg1,n1+1
Ωg1,n1+1,
eij ⊗ ei′
Mg2,n2+1
Ωg2,n2+1,
eij ⊗ ei′′
Here we assume that the genus of one component of a generic curve
in S is g1 and n1 marked points with labels j ∈ J1, |J1| = n1, are on
this component. The other component has genus g2 and n2 marked
points with labels j ∈ J2, |J2| = n2, lie on it. Of course, g1 + g2 = g,
n1 + n2 = n.
Now consider a special case, when S is represented by a one-vertex
graph with ψ- and κ-classes. Let us assign a vector in the basis of H0
to each leaf (to each marked point). Then, according to axiom 3 the
integral
j κb1,...,bk ,
is equal to
Mg,n+k
Ωg,n+k
n+j ,
eij ⊗ e
Combining these two special cases one can obtain an expression in
correlators that corresponds to an arbitrary ψ-κ-stratum.
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 9
2.3.3. Relations for correlators. Suppose that we have a linear combi-
nation L of ψ-κ-strata that is equal to 0 in the cohomology of Mg,n (a
tautological relation). Since dΩg,n = 0, the integral of
Ωg,n,
j=1 eij
over L is equal to zero, for an arbitrary choice of primary fields. This
gives an equation for correlators.
Usually, one consider also the pull-backs of L to Mg,n+n′, n
′ ≥ 0,
multiplied by arbitrary monomials of ψ-classes. Of course, they are also
represented as vanishing linear combinations of ψ-κ-strata. This gives a
system of PDEs for the formal power series Fg, g ≥ 0. For the detailed
description of the correspondence between tautological relations and
universal PDEs for GW potentials see, e. g., [10] or [6].
There are 8 basic tautological relations known at the moment: WDVV,
Getzler, Belorousski-Pandharipande, and topological recursion rela-
tions in M0,4, M1,1, M2,1, M2,2, M3,1 [10, 9, 2, 16].
3. Zwiebach invariants
In Gromov-Witten theory (and also in topological string theory) the
Gromov-Witten invariants is usually a structure on the cohomology
of a target manifold (the space H0) of on the cohomology of a com-
plex of some other gometric origin. We have introduced the notion of
Zwiebach invariants in [22] in order to formalize in a convenient way
what physicists mean by topological conformal quantum field theory
at the level of a complex rather than at the level of the cohomology.
The very general principles of homological algebra imply that al-
gebraic stuctures on the cohomology are often induced by some fun-
damental structures on a full complex (the standard example is the
induction of the infinity-structures from differential graded algebraic
structures). Such induction usually can be represented as a sum over
trees with vertices corresponding to fundamental operations and edges
corresponding to the homotopy that contracts the complex to its co-
homology.
We would like to stress that Gromov-Witten invariants also can be
considered as an induced structure on the cohomology of a complex.
In this case, the fundamental structure on the whole complex is deter-
mined by Zwiebach invariants.
We are able to associate some structure on a bicomplex with a special
compactification of the moduli space curves (Kimura-Stasheff-Voronov
compactification). So, complexes are replaced by bicomplexes, where
the second differential reflects the rotation of attached cylinders (or
circles). This is an appearance of the string nature of the problem.
As an induced structure we indeed obtain a Gromov-Witten-type
theory that, under some additional assumptions, can be presented in
terms of a sum over graphs. Below we explain the whole construction,
following [22] and with some additional details.
10 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
3.1. Kimura-Stasheff-Voronov spaces. We remind the construc-
tion of the Kimura-Stasheff-Voronov compactification Kg,n of the mod-
uli space of curves of genus g with n marked point. It is a real blow-up
of Mg,n; we just remember the relative angles at double points. We
can also choose an angle of the tangent vector at each marked point;
this way we get the principal U(1)n-bundle over Kg,n. We denote the
total space of this bundle by Sg,n.
There are also the standard mappings between different spaces Sg,n.
First, one can consider the projection π : Sg,n+1 → Sg,n forgetting the
last marked point. Suppose that under the projection we have to con-
tract a sphere that contains the points xi, xn+1, and a node. Denote
the natural coordinates on the circles corresponding to xi and a node
on a curve in Sg,n+1 by φi and θ. Let φ̃i be a coordinate on the circle
corresponding to xi in Sg,n. Then φ̃i = φi + θ under the projection
π. In the same way, if we contract a sphere that contains two nodes
and xn+1, then θ̃ = θ1 + θ2, where θ1 and θ2 are the coordinates on the
circles corresponding to the two nodes of a curve in Sg,n+1 and θ̃ is a
coordinate on the circle at the resulting node in Sg,n
In the same way, when we consider the mappings σ : Sg1,n1+1 ×
Sg1,n2+1 → Sg,n representing the natural boundary components of Sg,n,
we have θ = φn1+1 + φn2+1, where φn1+1 and φn2+1 are the coordinates
on the circles corresponding the points that are glued by σ into the
node and θ is the coordinate on the circle at the node. For the map-
ping σ : Sg−1,n+2 → Sg,n we also have θ = φn+1 + φn+2 with the same
notations.
3.2. Zwiebach invariants. Let us fix a Hodge bicomplex H with two
differentials denoted by Q and G− and with an even scalar product
η = (·, ·) invariant under the differentials:
(12) (Qv,w) = ±(v,Qw), (G−v, w) = ±(v,G−w).
The Hodge property means that
(13) H = H0 ⊕
〈eα, Qeα, G−eα, QG−eα〉,
where QH0 = G−H0 = 0 and H0 is orthogonal to H4.
Below we consider the action of Q and G− on H
⊗n. We denote
by Q(k) and G
− the action of Q and G− respectively on the k-th
component of the tensor product.
On each Sg,n we take a differential form Cg,n of the mixed degree with
values in H⊗n0 . The whole system of forms {Cg,n} is called Zwiebach
invariants, if it satisfies the axioms:
(1) Ωg,n is Sn-equivariant.
(2) (d+Q)Ωg,n = 0, Q =
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 11
(3) (G
− + ık)Cg,n = 0 for all 1 ≤ k ≤ n (we denote by ık the
substitution of the vector field generating the action on Sg,n of
the k-th copy of U(1)); Cg,n is invariant under the action of
U(1)n;
(4) π∗Cg,n = (Cg,n+1, e1), where π : Sg,n+1 → Sg,n is the mapping
forgetting the last marked point.
(5) σ∗Cg,n = (Cg1,n1+1 ∧ Cg2,n2+1, η
−1), where σ : Sg1,n1+1×Sg1,n2+1 →
Sg,n represents the boundary component. In the same way,
σ∗Cg,n = (Cg−1,n+2, η
−1) for the mapping σ : Sg−1,n+2 → Sg,n.
(6) (C0,3, e1 ⊗ vα ⊗ vβ) = ((Id+ dφ2G−)vα, (Id+ dφ3G−)vβ), φ2 and
φ3 are the coordinates on the circles at the corresponding points.
Zwiebach invariants on the bicomplex with zero differentials deter-
mine Gromov-Witten invariants. Indeed, in this case the factorization
property implies that {Cg,n} is lifted from the blowdown of Kimura-
Stasheff-Voronov spaces, i.e. it is determined by a set of forms on
Deligne-Mumford spaces. Then it is easy to check that this system of
forms satisfied all axioms of Gromov-Witten invariants.
3.3. Induced Zwiebach invariants. Induced Zwiebach invariants are
obtained by the contraction of H4. We denote by G+ the contraction
operator. This means that G+H0 = 0, Π = {Q,G+} is the projection
to H4 along H0, {G+, G−} = 0, and (G+v, w) = ±(v,G+w).
We construct an induced Zwiebach form C indg,n on a homotopy equiv-
alent modification S̃g,n of the space Sg,n. At each boundary component
γ we glue the cylinder γ× [0,+∞] such that γ in Sg,n is identified with
γ × {0} in the cylinder.
So, we have the mappings σ̃ : S̃g1,n1+1×S̃g1,n2+1× [0,+∞] → S̃g,n and
σ̃ : S̃g−1,n+2 × [0,+∞] → S̃g,n representing the boundary components
with glued cylinders. We take a form Cg,n, restrict it to H
0 , and
extend it to the glued cylinder by the rule that
(14) σ̃∗C indg,n =
C indg1,n1+1 ∧ C
g2,n2+1
, [e−tΠ−dt·G+ ]
in the first case and
(15) σ̃∗C indg,n =
C indg−1,n+2, [e
−tΠ−dt·G+ ]
in the second case, where [e−tΠ−dt·G+ ] is the bivector obtained from the
operator e−tΠ−dt·G+ , t is the coordinate on [0,+∞]. This determines
C indg,n completely.
Now it is a straightforward calculation to check that the forms C indg,n
are (d+Q)-closed and satisfy the factorization property when restricted
to the strata γ × {+∞}.
3.4. Induced Gromov-Witten theory. The induced Zwiebach in-
variants determine Gromov-Witten invariants. The correlators of the
corresponding Gromov-Witten potential are given by the integrals over
12 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
the fundamental cycles of K̃g,n (we just forget the circles at marked
points in S̃g,n) of the forms C
ψaii .
In fact, the fundamental class of K̃g,n is represented as a sum over all
irreducible boundary strata in Mg,n. Indeed, a boundary stratum γ in
Mg,n has real codimension equal to the doubled number of the nodes
of its generic curve. But then we add in K̃g,n a real two-dimensional
cylinder for each node. A simple explicit calculation allows to ex-
press the integral over the component of the fundamental cycle of K̃g,n
corresponding to γ. It splits into the integrals of the initial Zwiebach
invariants (multiplied by ψ-classes) over the moduli spaces correspond-
ing to the irreducible components of curves in γ; they are contracted
with the bivectors [G−G+] (obtained from the operator G−G+ via the
scalar product) corresponding to the nodes according to the topology
of curves in γ.
So, we represent the correlators of the induced Gromov-Witten the-
ory as sums over graphs. Then one can observe that C0,3 determines
a multiplication on H . Topology of the spaces S0,4 and S1,1 implies
that the whole algebraic structure that we obtain on H is the structure
of cyclic Hodge algebra up to Q-homotopy, see [22]. Let us assume
that the initial system of Zwiebach invariants is simple enough, i. e., it
induces the explicit structure of cyclic Hodge algebra on H and only
the integrals of the zero-degree parts of the initial Zwiebach invariants
(multiplied by ψ-classes) are non-vanishing on fundamental cycles. In
this case, the induced Gromov-Witten potential can be described in
very simple algebraic terms. It is the motivation of the definition of
the Hodge field theory construction given in the next section.
4. Construction of correlators in Hodge field theory
In this section, we describe in a very formal algebraic way the sum
over graphs obtained as an expression for the Gromov-Witten potential
induced from Zwiebach invariants in the previous section.
4.1. Cyclic Hodge algebras. In this section, we recall the definition
of cyclic Hodge dGBV-algebras [22, 31, 32, 24] (cyclic Hodge algebras,
for short). A supercommutative associative C-algebra H with unit
is called cyclic Hodge algebra, if there are two odd linear operators
Q,G− : H → H and an even linear function
: H → C called integral.
They must satisfy the following axioms:
(1) (H,Q,G−) is a bicomplex:
(16) Q2 = G2− = QG− +G−Q = 0;
(2) H = H0 ⊕ H4, where QH0 = G−H0 = 0 and H4 is repre-
sented as a direct sum of subspaces of dimension 4 generated
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 13
by eα, Qeα, G−eα, QG−eα for some vectors e ∈ H4, i. e.
(17) H = H0 ⊕
〈eα, Qeα, G−eα, QG−eα〉
(Hodge decomposition);
(3) Q is an operator of the first order, it satisfies the Leibniz rule:
(18) Q(ab) = Q(a)b+ (−1)ãaQ(b)
(here and below we denote by ã the parity of a ∈ H);
(4) G− is an operator of the second order, it satisfies the 7-term
relation:
G−(abc) = G−(ab)c+ (−1)
b̃(ã+1)bG−(ac) + (−1)
ãaG−(bc)(19)
−G−(a)bc− (−1)
ãaG−(b)c− (−1)
ã+b̃abG−(c).
(5) G− satisfies the property called 1/12-axiom:
(20) str(G− ◦ a·) = (1/12)str(G−(a)·)
(here a· and G−(a)· are the operators of multiplication by a and
G−(a) respectively, str means supertrace).
Define an operator G+ : H → H related to the particular choice
of Hodge decomposition. We put G+H0 = 0, and on each subspace
〈eα, Qeα, G−eα, QG−eα〉 we define G+ as
G+eα = G+G−eα = 0,(21)
G+Qeα = eα,
G+QG−eα = G−eα.
We see that [G−, G+] = 0; Π4 = [Q,G+] is the projection to H4 along
H0; Π0 = Id− Π4 is the projection to H0 along H4.
Consider the integral
: H → C. We require that
Q(a)b = (−1)ã+1
aQ(b),(22)
G−(a)b = (−1)
aG−(b),
G+(a)b = (−1)
aG+(b).
These properties imply that
G−G+(a)b =
aG−G+(b),
Π4(a)b =
aΠ4(b), and
Π0(a)b =
aΠ0(b).
We can define a scalar product on H as (a, b) =
ab. We suppose
that this scalar product is non-degenerate. Using the scalar product
we may turn any operator A : H → H into the bivector that we denote
by [A].
14 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
4.2. Tensor expressions in terms of graphs. Here we explain a
way to encode some tensor expressions over an arbitrary vector space
in terms of graphs.
Consider an arbitrary graph (we allow graphs to have leaves and we
require vertices to be at least of degree 3, the definition of graph that
we use can be found in [24]). We associate a symmetric n-form to each
internal vertex of degree n, a symmetric bivector to each egde, and a
vector to each leaf. Then we can substitute the tensor product of all
vectors in leaves and bivectors in edges into the product of n-forms in
vertices, distributing the components of tensors in the same way as the
corresponding edges and leaves are attached to vertices in the graph.
This way we get a number.
Let us study an example:
v ⊗ v
w ⊗ w
We assign a 5-form x to the left vertex of this graph and a 3-form y
to the right vertex. Then the number that we get from this graph is
x(a, b, c, v, w) · y(v, w, d).
Note that vectors, bivectors and n-forms used in this construction
can depend on some variables. Then what we get is not a number, but
a function.
4.3. Usage of graphs in cyclic Hodge algebras. Consider a cyclic
Hodge algebra H . There are some standard tensors over H , which we
associate to elements of graphs below. Here we introduce the notations
for these tensors.
We always assign the form
(24) (a1, . . . , an) 7→
a1 · · · · · an
to a vertex of degree n.
There is a collection of bivectors that will be assigned below to edges:
[G−G+], [Π0], [Id], [QG+], [G+Q], [G+], and [G−]. In pictures, edges
with these bivectors will be denoted by
, , ,
QG+ ,
G+Q ,
respectively. Note that an empty edge corresponding to the bivector
[Id] can usually be contracted (if it is not a loop).
The vectors that we will put at leaves depend on some variables. Let
{e1, . . . , es} be a homogeneous basis of H0. In particular, we assume
that e1 is the unit of H . To each vector ei we associate formal variables
Tn,i, n ≥ 0, of the same parity as ei. Then we will put at a leaf one of
the vectors En =
i=1 eiTn,i, n ≥ 0, and we will mark such leaf by the
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 15
number n. In our picture, an empty leaf is the same as the leaf marked
by 0.
4.3.1. Remark. There is a subtlety related to the fact that H is a Z2-
graded space. In order to give an honest definition we must do the
following. Suppose we consider a graph of genus g. We can choose
g edges in such a way that the graph being cut at these edge turns
into a tree. To each of these edges we have already assigned a bivector
[A] for some operator A : H → H . Now we have to put the bivector
[JA] instead of the bivector [A], where J is an operator defined by the
formula J : a 7→ (−1)ãa.
In particular, consider the following graph (this is also an example
to the notations given above):
An empty loop corresponds to the bivector [Id]. An empty leaf corre-
sponds to the vector E0. A trivalent vertex corresponds to the 3-form
given by the formula (a, b, c) 7→
If we ignore this remark, then what we get is just the trace of the
operator a 7→ E0 · a. But using this remark we get the supertrace of
this operator.
In fact, this subtlety will play no role in this paper. It affects only
some signs in calculations and all these signs will be hidden in lemmas
shared from [22, 31]. So, one can just ignore this remark.
4.4. Correlators. We are going to define the potential using correla-
tors. Let
(27) 〈τk1(V1) . . . τkn(Vn)〉g
be the sum over graphs of genus g with n leaves marked by τki(Vi),
i = 1, . . . , n, where V1, . . . , Vn are vectors in H , and τki are just formal
symbols. The index of each internal vertex of these graphs is ≥ 3; we
associate to it the symmetric form (24). There are two possible types of
edges: edges marked by [G−G+] (thick black dots in pictures, “heavy
edges” in the text) and edges marked by [Id] (empty edges). Since
an empty edge connecting two different vertices can be contracted, we
assume that all empty edges are loops.
Consider a vertex of such graph. Let us describe all possible half-
edges adjusted to this vertex. There are 2g, g ≥ 0, half-edges coming
from g empty loops; m half-edges coming from heavy edges of graph,
and l leaves marked τka1 (Va1), . . . , τkal (Val). Then we say that the type
of this vertex is (g,m; ka1, . . . , kal). We denote the type of a vertex v
by (g(v), m(v); ka1(v), . . . , kal(v)(v)).
Consider a graph Γ in the sum determining the correlator
(28) 〈τk1(V1) . . . τkn(Vn)〉g
16 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
We associate to Γ a number: we contract according to the graph struc-
ture all tensors corresponding to its vertices, edges, and leaves (for
leaves, we take vectors V1, . . . , Vn). Let us denote this number by T (Γ).
Also we weight each graph by a coefficient which is the product of
two combinatorial constants. The first factor is equal to
(29) V (Γ) =
v∈V ert(Γ) 2
g(v)g(v)!
|aut(Γ)|
Here |aut(Γ)| is the order of the automorphism group of the labeled
graph Γ, V ert(Γ) is the set of internal vertices of Γ. In other words, we
can label each vertex v by g(v), delete all empty loops, and then we get
a graph with the order of the automorphism group equal to 1/V (Γ).
The second factor is equal to
(30) P (Γ) =
v∈V ert(Γ)
Mg(v),m(v)+l(v)
a1(v)
1 . . . ψ
al(v)(v)
The integrals used in this formula can be calculated with the help of
the Witten-Kontsevich theorem [35, 17, 29, 27, 13, 14, 4].
So, the whole contribution of the graph Γ to the correlator is equal
to V (Γ)P (Γ)T (Γ). One can check that the non-trivial contribution to
the correlator 〈τk1(V1) . . . τkn(Vn)〉g is given only by graphs that have
exactly 3g − 3 + n−
i=1 ki heavy edges.
The geometric meaning here is very clear. The number T (Γ) comes
from the integral of the induced Gromov-Witten invariants of degree
zero, while the coefficient V (Γ)P (Γ) is exactly the combinatorial in-
terpretation of the intersection number of ψk11 . . . ψ
n with the stratum
whose dual graph is obtained from Γ by the procedure described after
the definition of V (Γ).
4.5. Potential. We fix a cyclic Hodge algebra and consider the formal
power series F = F(Tn,i) defined as
(31) F = exp
g−1Fg
= exp
a1,...,an∈Z≥0
〈τa1(Ea1) . . . τan(Ean)〉g
Abusing notations, we allow to mark the leaves by τa(Ea), Ea, or a;
all this variants are possible and denote the same.
4.6. Trivial example. For example, consider the trivial cyclic Hodge
algebra: H = H0 = 〈e1〉, Q = G− = 0,
e1 = 1. Then Ea = e1 · ta, and
the correlator 〈τa1(Ea1) . . . τan(Ean)〉g consists just of one graph with
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 17
one vertex, g empty loops, and n leaves marked by a1, . . . , an. The
explicit value of the coefficient of this graph is, by definition,
(32) 〈τa1 . . . τan〉g :=
ψa11 . . . ψ
So, in the case of trivial cyclic Hodge algebra we obtain exactly the
Gromov-Witten potential of the point (i. e.,
Ωg,n, e
≡ 1)) that we
denote below by F pt.
4.6.1. Remark about notations. Abusing notation, we use the same
symbol 〈 〉g for the correlators in GW theory and in Hodge field the-
ory. We hope that it does not lead to a confusion. For instance,
〈τa1(Ea1) . . . τan(Ean)〉g in the trivial example above is the correlator of
the trivial Hodge field theory, while 〈τa1 . . . τan〉g is the correlator of the
trivial GW theory.
5. String, dilaton, and tautological relations
In this section, we prove that the potential (31) satisfies the same
string and dilaton equations as GW potentials.
5.1. String equation.
Theorem 1. If
j=1 aj > 0, we have:
(33) 〈τ0(e1)
τaj (eij )〉g =
〈τaj−1(eij )
k 6=j
τak(eik)〉g;
Proof. Consider a graph Γ contributing to the correlator on the left
hand side of the string equation. The special leaf that we are going
to remove is marked by τ0(e1) and is attached to a vertex v of genus
gv (i. e., with gv attached light loops) with lv more attached leaves
labeled by indices in Iv, |Iv| = lv, and mv attached half-edges coming
from heavy edges and loops.
Let us remove the leaf τ0(e1) and change the label of one of the leaves
attached to the same vertex from τaj (eij ) to τaj−1(eij ). This way we
obtain a graph Γj contributing to the j-th summand of the right hand
side of (33). We take the sum of these graphs over j ∈ Iv. Of course,
we skip the summands where aj = 0.
Note that this sum is not empty (if Γ gives a non-zero contribution
to the left hand side of (33)). Indeed, if it is empty, this means that
aj = 0 for all j ∈ Iv. Therefore, since we expect that the contribution
to P (Γ) of the vertex v on the left hand side is nonzero, it follows that
gv = 0 and mv + lv = 2. So, there are three possible local pictures:
τ0(ej)
τ0(e1)
, and
τ0(ej1)
τ0(ej2)
τ0(e1) .
18 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
The first picture can be replaced with the bivector [G−G+G−G+],
which is equal to zero. Therefore, T (Γ) is also equal to zero. In the
second case, we also get 0 since G−G+(e1ej) = G−G+(E0) = 0. The
third picture is possible only when it is the whole graph, and this is in
contradiction with the assumption that
j=1 aj > 0.
Note also that T (Γ) = T (Γj) and V (Γ) = V (Γj) for all j. Indeed,
we have just removed the leaf with the unit of the algebra, so this
can’t change anything in the contraction of tensors. Therefore, T (Γ) =
T (Γj). Also both the leaf τ0(e1) and the vertex v are the fixed points
of any automorphism of Γ. The same is for the vertex corresponding
to v in Γj . Therefore, the automorphism groups are isomorphic for
both graphs. Since we make no changes for empty loops, it follows
that V (Γ) = V (Γj).
Let us prove that P (Γ) =
P (Γj). Indeed, the vertices of Γ
and Γj are in a natural one-to-one correspondence. Moreover, the lo-
cal pictures for all of them except for v and its image in Γj are the
same. Therefore, the corresponding intersection numbers contributing
to P (Γ) and P (Γj) are the same. The unique difference appeares when
we take the intersection numbers corresponding to v and its images in
Γj, j ∈ Iv. But then we can apply the string equation (6) of the GW
theory of the point (32), and we see that
Mgv,kv+lv+1
Mgv,kv+lv
k 6=j
(here o : Iv → {1, . . . , l} is an arbitrary on-to-one mapping). This
implies that P (Γ) =
P (Γj).
So, we have V (Γ)P (Γ)T (Γ) =
V (Γj)P (Γj)T (Γj). In order to
complete the proof of (33), we should just notice that when we write
down this expression for all graphs contributing to the left hand side
of (33), we use each graph contributing to the right hand side of (33)
exactly once.
5.2. Dilaton equation.
Theorem 2. If 2g − 2 + n > 0, we have:
(36) 〈τ1(e1)
τaj (eij )〉g = (2g − 2 + n)〈
τaj (eij )〉g;
Proof. Consider a graph Γ contributing to the correlator on the left
hand side of (36). The special leaf that we are going to remove is
marked by τ1(e1) and is attached to a vertex v of genus gv (i. e., with
gv attached light loops) with lv more attached leaves labeled by indices
in Iv, |Iv| = lv, and mv attached half-edges coming from heavy edges
and loops.
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 19
Let us remove the leaf τ1(e1). We obtain a graph Γ
′ contributing to
the right hand side of (33). Let us prove this. Indeed, if we remove a
leaf and don’t get a proper graph, it follows that we have a trivalent
vertex. Since the contribution of this vertex to P (Γ) should be non-
zero, it follows that the unique possible local picture is
(37) τ1(e1) .
But this picture is the whole graph, and it is in contradiction with the
condition 2g − 2 + n > 0.
The same argument as in the proof of the string equation shows that
T (Γ) = T (Γ′) and V (Γ) = V (Γ′). Also, the contribution to P (Γ) and
P (Γ′) of all vertices except for the changed one is the same. The change
of the intersection number corresponding to the vertex v is captured
by the dilaton equation (6) of the trivial GW theory (32):
Mgv,kv+lv+1
= (2gv−2+kv+ lv)
Mgv,kv+lv
(again, o : Iv → {1, . . . , l} is an arbitrary on-to-one mapping). This
implies that P (Γ) = (2gv − 2 + kv + lv)P (Γ
′), and, therefore,
(39) V (Γ)P (Γ)T (Γ) = (2gv − 2 + kv + lv)V (Γ
′)P (Γ′)T (Γ′).
Let us write down the last equation for all graphs Γ contributing to
the left hand side (36). Observe that any graph Γ′ contributing to the
right hand side occurs |V ert(Γ′)| times, since the leaf τ1(e1) could be
attached to any its vertex. Therefore, any graph Γ′ contributing to the
right hand side of (36) appears in these equations with the coefficient
v∈V ert(Γ′)
(2gv − 2 + kv + lv) = 2g − 2 + n.
This completes the proof.
5.3. Tautological equations. As we have explained in Section 2.3.3,
any linear relation L among ψ-κ-strata in the cohomology of the mod-
uli space of curves gives rise to a family of universal relations for the
correlators of a Gromov-Witten theory.
Theorem 3 (Main Theorem). The system of universal relations com-
ing from a tautological relation in the cohomology of the moduli space of
curves holds for the correlators 〈
j=1 τaj (eij )〉g of cyclic Hodge algebra.
Note that some special cases of this theorem were proved in [22, 31,
32]. Our argument below is a natural generalization of the technique
introduced in these papers. Also we are able now to give an explana-
tion why we have managed to perform all our calculations there, see
Remark 8.6.1.
20 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
Let us give here a brief account of the proof of this theorem. First,
the definition of correlators of the Hodge field theory can be extended
to the intersection with an arbitrary tautological class α of degree K
in the space Mg,n, not only the a monomial in ψ-classes. In that case,
we do the following. Again, we consider the sum over all graphs Γ
with 3g − 3 + n − K heavy edges, and the number T (Γ) is defined
as above. Instead of the coefficient V (Γ)P (Γ) we use the intersection
number of α with the stratum whose dual graph is obtained from Γ by
the procedure described right after the definition of V (Γ). Namely, a
vertex with g loops is replaced by a vertex marked by g.
This definition is very natural from the point of view of Zwiebach
invariants. However, we know from Gromov-Witten theory that this
extension of the notion of correlator is unnecessary. Indeed, all integrals
with arbitrary tautological classes can be expressed in terms of the
integrals with only ψ-classes via some universal formulas.
The main question is whether these universal formulas also work
in Hodge field theory. Actually, the main result that more or less
immediately proves the theorem is the positive answer to this question.
5.3.1. Organization of the proof. The rest of the paper is devoted to
the proof of Main Theorem, and here we would like to overview it here.
In Section 6, we study the structure of graphs that can appear in
formulas for the correlators of Hodge field theory. We prove that if
T (Γ) 6= 0 and there is at least one heavy edge in Γ, then all vertices
have genus ≤ 1, i. e., there is at most one empty loop at any vertex.
This basically means that in calculations we’ll have to deal only with
genera 0 and 1. Also this allows us to write down the action of a Hodge
field theory.
In Section 7, we prove the main technical result (Main Lemma). In-
formally, it states that Q = −G−ψ when we apply these two operators
to the correlators of a Hodge field theory. In order to prove it, we
look at a small piece (consisting just of one heavy edge and one or two
vertices that are attached to it) in one of the graphs of a correlator. Of
course, in the correlator we can vary this small piece in an arbitrary
way, such that the rest of the graph remains the same. So, when we
consider the sum of all these small pieces, it is also a correlator of the
Hodge field theory. Thus we reduce the proof to a special case of the
whole statement. But since the genus of a vertex is ≤ 1, it appear now
to be a low-genera statement that can be done by a straightforward
calculation.
In Section 8, we present the proof of the Main Theorem. Consider
a ψ-κ-stratum α whose stable dual graph has k ≥ 1 edges. There is a
universal expression of the integral over α coming from Gromov-Witten
theory. It includes k entries of the scalar product restricted to H0. In
terms of graphs, it means that we are to introduce new edges with the
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 21
bivector [Π0] on them, and there are k such edges in our expression. A
direct corollary of the Main Lemma is that we can always replace [Π0]
by [Id]− ψ[G−G+]− [G−G+]ψ.
In Sections 8.2, 8.3, and 8.4, we show that when we replace [Π0] by
[Id] − ψ[G−G+] − [G−G+]ψ at all edges corresponding to the scalar
product restricted to H0, we obtain a new expression for the integral
over α that again contains only heavy edges and empty loops, as any
ordinary correlator. The main problem now is to understand the com-
binatorial coefficient of a graph Γ obtained this way.
Since we have a sum over graphs with heavy edges and empty loops,
it is natural to identify again these graphs with the corresponding strata
in the moduli space of curves. Then we can calculate the intersection
index of the stratum corresponding to a graph Γ and the initial class
α. Roughly speaking, the main thing that we have to do is to decide
about each node in α (represented initially by [Π0]), whether we have
this node in stratum corresponding to Γ. If yes, then we have an
excessive intersection (so, we must put −ψ on one of the half-edges of
the corresponding edge), and we keep this edge in Γ (so, we replace
[Π0] with −ψ[G−G+] or −[G−G+]ψ). If no, then we don’t have this
edge in Γ, so we contract [Π0], i. e., replace it with [Id].
So, the procedure that we used to get rid of the scalar product is
the same as the procedure of the intersection of α with strata of the
complementary dimension. This means (Section 8.5) that the universal
formula coming from Gromov-Witten theory is equivalent to the natu-
ral formula for the “correlator with α” coming from Zwiebach theory.
The tautological relation is a sum of classes equal to zero. So, while
the universal formula coming from Gromov-Witten theory gives (in the
case of a vanishing class) a non-trivial expression in correlators, the nat-
ural formula coming from Zwiebach theory gives identically zero. This
proves our theorem, see Section 8.6.
6. Vanishing of the BV structure
In this section, we recall several useful lemmas shared in [34, 31]. In
particular, these lemmas give some strong restrictions on graphs that
can give a non-zero contirbution to the correlators defined above.
6.1. Lemmas.
Lemma 1. [34, 31] The following vectors and bivectors are equal to
zero:
, G− .
Also let us remind another lemma in [31] that is very useful in cal-
culations.
22 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
Lemma 2. [31] For any vectors V0, V1, . . . , Vk, k ≥ 2,
...A1 A2 Ak
...G−A1 A2 Ak
+ · · ·+
...A1 Ak−1 G−Ak
,(42)
...A1 A2 Ak
...G−A1 A2 Ak
+ · · ·+
...A1 Ak−1 G−Ak
.(43)
Both lemmas are just simple corollaries of the axioms of cyclic Hodge
algebra.
6.2. Structure of graphs. Consider a graph studied in Section 4.4.
It can have leaves, empty and heavy loops, and heavy edges. Consider
a vertex of such graph. Let us assume that there are A empty loops,
B heavy loops, C heavy edges going to the other vertices of the graph,
and D leaves attached to this vertex:
(44) D C
This picture can be considered as an C +D form. Let us denote it by
Φ(A,B,C,D).
Lemma 3. If A ≥ 2 and B + C ≥ 1, then Φ(A,B,C,D) = 0.
In other words, if there are at least two empty loops at a vertex, then
there should not be any heavy loops or edges attached to this vertex.
Otherwise the contribution of the whole graph vanishes. This implies
Corollary 1. In the definition of correlators one should consider only
graphs of one of the following two types:
(1) One-vertex graphs with no heavy edges (loops).
(2) Arbitrary graphs with at most one empty loop at each vertex.
The contribution of all other graphs vanishes.
This corollary dramatically simplifies all our calculations with graphs
given below. Also we can write down now the action of the Hodge field
theory.
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 23
Let F 0g (v0, v1, v2, . . . ), vi ∈ H ⊗ C[[{Tn,i}]] be the “dimension zero”
part of the potential of the Hodge field theory, namely,
(45) F 0g :=
a1+···+an=3g−3+n
〈τa1(va1) . . . τan(van)〉g.
The first sum is taken over n ≥ 0 such that 2g − 2 + n > 0. So, it is
exactly the generating function for the vertices of our graph expressions.
Then the action of the Hodge field theory is equal to
(46) A(v) := F 00 (E0 +G−v, E1, E2, . . . ) + ~F
1 (E0 +G−v, E1, E2, . . . )
gF 0g (E0, E1, E2, . . . )−
Qv ·G−v.
If we put Tn,i = 0 for n ≥ 1, then we immediately obtain the BCOV-
type action discussed in [1, Appendix] and [22, Appendix]. The similar
actions were also studied in [5] and [7].
6.3. Proof of Lemma 3. We consider the form Φ(A,B,C,D) and we
assume that A ≥ 2.
First, let us study the case when C ≥ 1. In this case, our C+D-form
can be represented as a contraction via the bivector [Id] of two forms,
Φ(A−2, B, C−1, D+1) and Φ(2, 0, 1, 1). Let us prove that the last one
is equal to zero. Indeed, this two-form can be represented as Φ̃(·, G+·),
where the two-form Φ̃ is represented by the picture
According to Lemma 1, Φ̃ = 0. Therefore, Φ(2, 0, 1, 1) = 0 and the
whole form Φ(A,B,C,D) is also equal to zero.
Now consider the case when B ≥ 1. In this case, our C + D-form
can be represented as a contraction via the bivector [Id] of two forms,
Φ(A − 2, B − 1, C,D + 1) and Φ(2, 1, 0, 1). Let us prove that the last
one is equal to zero. Indeed,
(48) Φ(2, 1, 0, 1) = =
Here the first equality is definition of Φ(2, 1, 0, 1), the second one is just
an equivalent redrawing, the third equality is application of Lemma 2,
the fourth one is again an equivalent redrawing.
24 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
The last picture contains the bivector (47) which is equal to zero
according to Lemma 1. Therefore, the whole picture is equal to zero,
and Φ(2, 1, 0, 1) = 0. So, the whole form Φ(A,B,C,D) is equal to zero
also in this case. This proves the lemma.
7. Main Lemma
7.1. Statement. The main technical tool that we use in the proof of
Theorem 3 is the lemma that we prove in this section.
Lemma 4 (Main Lemma). For any v1, . . . , vn ∈ H, a1, . . . , an ≥ 0,
〈τa1(v1) . . . τai−1(vi−1)τai(Q(vi))τai+1(vi+1) . . . τan(vn)〉g =
〈τa1(v1) . . . τai−1(vi−1)τai+1(G−(vi))τai+1(vi+1) . . . τan(vn)〉g.
A simple corollary of this lemma is the following:
Lemma 5. For any w ∈ H, v1, . . . , vn ∈ H0,
(50) 〈τa0(Qw)τa1(v1) . . . τan(vn)〉g
+ 〈τa0+1(G−w)τa1(v1) . . . τan(vn)〉g = 0.
In other words, we can state informally that Q+ ψG− = 0.
7.2. Special cases. The proof of the lemma can be reduced to a small
number of special cases. We consider correlators whose graphs have
only one heavy edge.
7.2.1. The first case is the following: Let
i=1 ai = n− 4. We prove
that for any v1, . . . , vn ∈ H ,
〈τa1(v1) . . . τai−1(vi−1)τai(Q(vi))τai+1(vi+1) . . . τan(vn)〉0 =
〈τa1(v1) . . . τai−1(vi−1)τai+1(G−(vi))τai+1(vi+1) . . . τan(vn)〉0.
First, we see that according to the definition of the correlator, the
left hand side of Equation (51) is the sum over graphs with two vertices
and with [G−G+] on the unique edge that connects the vertices. For
each I ⊔J = {1, . . . , n} we can consider the corresponding distribution
of leaves between the vertices (to be precise, let us assume that 1 ∈ I).
Then the coefficient of such graph is 〈τ0
i∈I τai〉0〈τ0
j∈J τaj〉0, and
we take the sum over all possible positions of Q at the leaves.
Using the Leibniz rule for Q and the property that [Q,G−G+] =
−G−, we see that this sum is equal to the sum over graphs with two
vertices and with [−G−] on the unique edge that connects the vertices.
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 25
For each I ⊔J = {1, . . . , n}, |I|, |J | ≥ 2, we consider the corresponding
distribution of leaves between the vertices. Then the coefficient of
such graph is still 〈τ0
i∈I τai〉0〈τ0
j∈J τaj〉0, and the underlying tensor
expression can be written (after we multiply the whole sum by −1) as
i∈I\{1}
vi ·G−(
Let us recall that the 7-term relation for G− implies that
vj) =
i,j∈J, i<j
G−(vivj)
k∈J\{i,j}
vk(53)
− (|J | − 2)
G−(vj)
i∈J\{i,j}
Using this, we can rewrite the whole sum over graphs as
1<i<j
k 6=1,i,j
vk ·G−(vivj)
I′⊔J ′⊔{i,j}={2,...,n}
〈τa1τ0
τak〉0〈τaiτajτ0
k∈J ′
τak〉0
i 6=1
j 6=1,i
vj ·G−(vi)
I′⊔J ′⊔{i}={2,...,n}
(|J ′| − 1) 〈τa1τ0
τaj〉0〈τaiτ0
j∈J ′
τaj〉0
Using that
I′⊔J ′⊔{i,j}={2,...,n}
〈τa1τ0
τak〉0〈τaiτajτ0
k∈J ′
τak〉0 =
〈τa1+1
k 6=1
τak〉0,
Equation (53), and the fact that
(56) (n− 3)〈τa1+1
j 6=1
τaj 〉0
I′⊔J ′⊔{i}={2,...,n}
(|J ′| − 1) 〈τa1τ0
τaj〉0〈τaiτ0
j∈J ′
τaj〉0
= 〈τai+1
j 6=i
τaj〉0,
26 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
we can rewrite Expression (54) as
G−(vi),
j 6=i
〈τai+1
j 6=i
τaj 〉0.
The last formula coincides by definition with the right hand side of
Equation (51) multiplied by −1. This proves the first special case.
7.2.2. The second case is in genus 1. Let
i=1 ai = n− 1. We prove
that for any v1, . . . , vn ∈ H ,
〈τa1(v1) . . . τai−1(vi−1)τai(Q(vi))τai+1(vi+1) . . . τan(vn)〉1 =
〈τa1(v1) . . . τai−1(vi−1)τai+1(G−(vi))τai+1(vi+1) . . . τan(vn)〉1.
According to the definition of the correlator, the left hand side of
Equation (58) is the sum over graphs of two possible types. The first
type include graphs with two vertices and two edges. The first edge
is heavy and connects the vertices; the second edge is an empty loop
attached to the first vertex. For each I ⊔ J = {1, . . . , n}, |J | ≥ 2,
we can consider the corresponding distribution of leaves between the
vertices (we assume that leaves with indices in I are at the first edge).
Then the coefficient of such graph is 〈τ0
i∈I τai〉1〈τ0
j∈J τaj 〉0. The
second type include graphs with one vertex and one heavy loop. All
leaves are attached to this vertex, and the coefficient of such graph
is 〈τ 20
i=1 τai〉0. For both types of graphs, we take the sum over all
possible positions of Q at the leaves.
Using the Leibniz rule for Q and the property that [Q,G−G+] =
−G−, we get the same graphs as before, but there is no Q, and instead
of [G−G+] we have −[G−] on the corresponding edge. Using Lemma 2,
we move −G− in graphs of the first type to the leaves marked by indices
in J . Using the 1/12-axiom and Lemma 2, we move −G− in graphs of
the second type to all leaves.
This way we get graphs of the same type in both cases. We get
graphs with one vertex, one empty loop attached to it, all leaves are
also attached to this vertex, and there is −G− on one of the leaves.
One can easily check that the coefficient of the graphs with −G− at
the i-th leaf is equal to
I⊔J⊔{i}={1,...,n}
τak〉1〈τ0τai
τak〉0 +
〈τ 20
τak〉0
= 〈τai+1
k 6=i
τak〉1
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 27
It is exactly the unique graph contributing to the i-th summand of
the right hand side of Equation (58), and the coefficient is right. This
proves that special case.
7.2.3. Consider g ≥ 2. Let
i=1 ai = 3g + n − 4. In this case, the
statement that for any v1, . . . , vn ∈ H ,
〈τa1(v1) . . . τai−1(vi−1)τai(Q(vi))τai+1(vi+1) . . . τan(vn)〉g =
〈τa1(v1) . . . τai−1(vi−1)τai+1(G−(vi))τai+1(vi+1) . . . τan(vn)〉g,
is immediately reduced to 0 = 0; it is a simple corollary of Lemma 1.
7.3. Proof of Main Lemma. Consider the left hand side of Equa-
tion (49). As usual, using the Leibniz rule for Q and the property that
[Q,G−G+] = −G−, we can remove all Q, but then we must change one
of [G−G+] on edges to −[G−]. Let us cut out the peaces of graphs that
includes this edges with −[G−], all empty loops, leaves and halves of
heavy edges attached to the ends of this special edge.
Since we consider the sum over all possible graphs contributing to
correlators, these small pieces can be gathered into groups according
to the type of the rest of the initial graph. Each group forms exactly
one of the special cases studied above. So, we know that −G− should
jump either to one of the leaves or to one of the heavy edges attached
to the ends of its edge. In the first case, we get exactly the graphs in
the right hand side of Equation (49); in the second case, we get zero.
One can easily check that we get the right coefficients for the graphs
in the right hand side of Equation (49). This proves the lemma.
8. Proof of Theorem 3
8.1. Equivalence of expression in graphs. Consider the expres-
sion in correlators corresponding to a ψ-κ-stratum as it is described in
Section 2.3.2. To each vertex of the corresponding stable dual graph
we assign the sum of graphs that forms correlator in the sense of Sec-
tion 4.4. The leaves of these graphs corresponding to the edges of the
stable dual graph (nodes) are connected in these pictures by edges with
[Π0] (the restriction of the scalar product toH0). We call the edges with
[Π0] “white edges” and mark them in pictures by thick white points,
see (25).
The axioms of cyclic Hodge algebra imply a system of linear equa-
tions for the graphs of this type. In particular, it has appeared that
playing with this linear equations we can always get rid of white edges
in the sum of pictures corresponding to a stable dual graph, see [22,
31, 32]. However, previously it was just an experimental fact. Now we
can show how it works in general.
28 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
The numerous examples of the correspondence between stable dual
graphs and graphs expressions in cyclic Hodge algebras and also of the
linear relations implied by the axioms of cyclic Hodge algebra are given
in [22, 31, 32].
Below, we explain how one can represent the expression in correlators
corresponding to a ψ-κ-stratum in terms of graphs with only empty and
heavy edges and with no white edges. The unique tool that we need is
Lemmas 4 and 5 proved above.
8.2. The simplest example. Consider a stable dual graph with two
vertices and one edge connecting them:
g1 g2
ψa1 . . . ψan1 ψb1 . . . ψbn2
κu1,...,ul1 κv1,...,vl2
ψa0 ψb0
The corresponding expression in correlators is
(62) 〈τa0(ej1)
τai(e•)
τui(e1)〉g1·
ηj1j2 · 〈τb0(ej2)
τbi(e•)
τvi(e1)〉g1
(here we denote by e• an arbitrary choice of ei ∈ H0). It is convenient
for us to rewrite this expression as
(63) 〈τa0(xα1)
τai(e•)
τui(e1)〉g1·
α1α2 · 〈τb0(xα2)
τbi(e•)
τvi(e1)〉g1,
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 29
where {xα} is the basis of the whole H . Using the fact that Π0 =
Id−QG+ −G+Q and applying Lemma 5, we obtain
(64) 〈τa0(xα1)
τai(e•)
τui(e1)〉g1·
α1α2 · 〈τb0(xα2)
τbi(e•)
τvi(e1)〉g1
= 〈τa0(xα1)
τai(e•)
τui(e1)〉g1·
[Id]α1α2 · 〈τb0(xα2)
τbi(e•)
τvi(e1)〉g1
− 〈τa0+1(xα1)
τai(e•)
τui(e1)〉g1·
[G−G+]
α1α2 · 〈τb0(xα2)
τbi(e•)
τvi(e1)〉g1
− 〈τa0(xα1)
τai(e•)
τui(e1)〉g1·
[G−G+]
α1α2 · 〈τb0+1(xα2)
τbi(e•)
τvi(e1)〉g1
In all three summands of the right hand side we still have two correla-
tors, whose leaves corresponding to the nodes are connected by some
special edges. But now the connecting edge is either marked by [Id]
(an empty edge) or by [G−G+] (an ordinary heavy edge). So, this way
we get rid of the white edge in this case.
Informally, in terms of pictures, we can describe Equation (64) as
When we put ψ, we mean that we add one more ψ-class at the node
at the corresponding branch of the curve. Dashed circles denote corre-
lators.
8.3. Example with two nodes. Now we consider an example of stra-
tum, whose generic point is represented by a three-component curve.
Again, we allow arbitrary ψ-classes at marked points and two branches
at nodes.
30 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
We perform the same calculation as above, but now we explain it in
terms of informal pictures from the very beginning. So, the first step
is the same as above:
=(66)
Then we apply Lemma 4 to each of the summands in the right hand
side:
=(67)
We take the sum of these three expressions, and we see that all pictures
where we have edges with [G−] and [G+] are cancelled. So, we get an
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 31
expression for the sum of graphs representing the initial stratum in
terms of graphs with only empty and heavy edges.
8.4. General case. The general argument is exactly the same as in
the second example. In fact, this gives a procedure how to write an
expression in graphs with only empty and heavy edges (and no white
edges) starting from a stable dual graph. Let us describe this proce-
dure.
Take a stable dual graph corresponding to a ψ-κ-stratum of dimen-
sion k in Mg,n. First, we are to decorate it a little bit. For each
edge, we either leave it untouched, or substitute it with an arrow (in
two possible ways). At the pointing end of the arrow, we increase the
number of ψ-classes by 1. Each of these graphs we weight with the
inversed order of its automorphism group (automorphisms must pre-
serve all decorations) multiplied by (−1)arr, where arr is the number
of arrows.
Consider a decorated dual graph. To each its vertex we associate the
corresponding correlator of cyclic Hodge algebra (we add new leaves in
order to represent κ-classes). Then we connect the leaves corresponding
to the nodes either by empty edges (if the corresponding edge of the
decorated graph is untouched) or by heavy edges (if the corresponding
edge of the dual graph is decorated by an arrow).
It is obvious that the number of heavy edges in the final graphs is
equal to k.
8.5. Coefficients. We can simplify the resulting graphs obtained in
the previous subsection. First, we can contract empty edges (as much
as it is possible; it is forbidden to contract loops). Second, we can
remove leaves added for the needs of κ-classes. Indeed, each such leaf
is equipped with a unit of H , so it doesn’t affect the contraction of
tensors corresponding to a graph. Moreover, when we remove all leaves
corresponding to κ-classes, we still have graph with at least trivalent
vertices. Otherwise, this graph is equal to zero, c. f. arguments in the
proofs of string and dilaton equations.
So, we obtain final graphs that have the same number of heavy edges
as the dimension of the initial ψ-κ-stratum, the same number of leaves
as the initial dual graph, and some number of empty loops, at most
one at each vertex. The exceptional case is when k = 0; in this case we
obtain only one graph, with one vertex, n leaves, and g empty loops.
In the first case, let us turn a graph like this into a stable dual
graph. Just replace its vertices with no empty loops by vertices of
genus zero, vertices with empty loops by vertices of genus one, heavy
edges are edges, and leaves are leaves. There are no ψ- or κ- classes. It
is obvious that the codimension of the stratum corresponding to this
dual graph is k. Indeed, in this case it is just the number of nodes.
32 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
So, to each ψ-κ-stratum X of dimension k > 0 we associate a linear
combination
ciYi of strata of codimension k with no ψ- or κ-classes,
whose curves have irreducible components of genus 0 and 1 only.
Proposition 1. We have ci = X · Yi.
Proof. We prove it two steps. First, consider a one-vertex stable dual
graph with no edges (just a correlator). In this case, the intersection
numberX ·Yi is just by definition ci = V (Γi)P (Γi), where Γi is the cyclic
Hodge algebra graph that turns into Yi via the procedure described
above.
Then, consider a stable dual graph with one edge. It is the intersec-
tion of the one-vertex stable dual graph with an irreducible component
of the boundary. For a given Yi, this component of the boundary either
intersects it transversaly, or we have an excessive intersection. In the
first case, the corresponding node is not represented in Yi. This means
that in Γi it should be an empty edge. In the second case, this node
is one of the nodes of Yi, so it should be a heavy edge of Γi. Also, it
is an excessive intersection, so we are to add the sum of ψ-classes with
the negative sign at the marked points (half-edges) corresponding to
the node, see [11, Appendix].
Exactly the same argument works for an arbitrary number of nodes,
we just extend it by induction. �
In the case of k = 0, we get just one final graph with coefficient c.
Proposition 2. If k = 0, the coefficient of the final graph is equal to
the number of points in the initial ψ-κ-stratum.
Proof. If k = 0, this means that each of the vertices of the initial stable
dual graph also has dimension 0, and the corresponding correlator of
cyclic Hodge algebra is represented by one one-vertex graph with no
heavy edges. Also this means that each edge of the initial stable dual
graph is replaced in the algorithm above by an empty edge. So, we can
think that we just work with the correlators of the Gromov-Witten
theory of the point. In this case Proposition becomes obvious. �
Now we deduce Theorem 3 from these propositions.
8.6. Proof of Theorem 3. Consider the system of subalgebras
(70) RH∗1 (Mg,n) ⊂ RH
∗(Mg,n)
of the cohomological tautological algebras of Mg,n generated by strata
with no ψ- or κ-classes and with irreducible curves of genus 0 and 1
only.
Let L be a linear combination of ψ-κ-strata of dimension k in Mg,n.
Then the expression in correlators of cyclic Hodge algebras correspond-
ing to L is equaivalent to a sum of some graphs with coefficients equal
to the intersection of L with classes in RHk1 (Mg,n).
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 33
So, if the class of L is equal to zero, then the corresponding equation
(and also the whole system of equations that we described in Sec-
tion 2.3.3) for correlators of cyclic Hodge algebra is valid. Theorem is
proved.
8.6.1. Remark. Evidently, RH∗1 (Mg,n) is a module over RH
∗(Mg,n).
Also it is obvious that RH∗1 (Mg,n) is closed under pull-backs and push-
forwards via the forgetful morphisms. This explains why it was enough
to make only one check in the simplest case in order to get the system
of equations in [22, 32] (cf. an argument in the last section in [22]).
8.7. An interpretation of Propositions 1 and 2. From the point
of view of the theory of Zwiebach invariants, both propositions look
very natural. Indeed, we try to give a graph expression for the integral
of an induced Gromov-Witten form multiplied by a tautological class
X . Since we know that we are able to integrate only degree zero parts
of induced Gromov-Witten invariants, we should just take the sum over
all graphs that correspond to the strata of complimentary dimension
in RH∗1 (Mg,n). The coefficients are to be the intersection numbers of
these strata with X .
On the other hand, we know that in any Gromov-Witten theory it is
enough to fix the integrals of Gromov-Witten invariants multiplied by
ψ-classes. Then the integrals of Gromov-Witten invariants multiplied
by arbitrary tautological classes are expressed by universal formulas.
We can try to use these universal formulas also in Hodge field theory.
They are exactly our expressions with white edges.
So, we have two different natural ways to express in terms of graphs
the integrals of induced Gromov-Witten invariants multiplied by tau-
tological classes. Propositions 1 and 2 state that these two different
expressions coinside.
References
[1] S. Barannikov, M. Kontsevich, Frobenius manifolds and formality of Lie alge-
bras of polyvector fields, Internat. Math. Res. Notices 1998, no. 4, 201–215.
[2] P. Belorousski, R. Pandharipande, A descendent relation in genus 2, Ann.
Scuola Norm. Sup. Pisa Cl. Sci. (4) 29 (2000), no. 1, 171–191.
[3] M. Bershadsky, S. Cecotti, H. Ooguri, C. Vafa, Kodaira-Spencer theory of
gravity and exact results for quantum string amplitudes, Comm. Math. Phys.
165 (1994), no. 2, 311–427.
[4] Lin Chen, Yi Li, Kefeng Liu, Localization, Hurwitz Numbers and the Witten
Conjecture, arXiv: math.AG/0609263.
[5] R. Dijkgraaf, Chiral deformations of conformal field theories, Nuclear Phys. B
493 (1997), no. 3, 588–612.
[6] C. Faber, S. Shadrin, D. Zvonkine, Tautological relations and the r-spin Witten
conjecture, arXiv: math.AG/0612510.
[7] A. Gerasimov, S. Shatashvili, Towards integrability of topological strings I:
three-forms on Calabi-Yau manifolds, J. High Energy Phys. 2004, no. 11, 074.
http://arxiv.org/abs/math/0609263
http://arxiv.org/abs/math/0612510
34 A. LOSEV, S. SHADRIN, AND I. SHNEIBERG
[8] E. Getzler, Batalin-Vilkovisky algebras and two-dimensional topological field
theories, Comm. Math. Phys. 159 (1994), 265–285.
[9] E. Getzler, Intersection theory onM1,4 and elliptic Gromov-Witten invariants.
J. Amer. Math. Soc. 10 (1997), no. 4, 973–998.
[10] E. Getzler, Topological recursion relations in genus 2, Integrable Systems and
Algebraic Geometry (Kobe/Kyoto, 1997), World Scientific, River Edge, NJ,
1998, pp. 73–106.
[11] T. Graber, R. Pandharipande, Constructions of non-tautological classes on
moduli spaces of curves, Michigan Math. J. 51 (2003), no. 1, 93–109.
[12] J. Harris, I. Morrison, Moduli of curves. Graduate Texts in Mathematics, 187.
Springer-Verlag, New York, 1998.
[13] M. Kazarian, S. Lando, An algebro-geometric proof of Witten’s conjecture,
arXiv: math.AG/0601760.
[14] Y.-S. Kim, K. Liu, A simple proof of Witten conjecture through localization,
arXiv: math.AG/0508384.
[15] T. Kimura, J. Stasheff, A. Voronov, On operad structures of moduli spaces
and string theory. Comm. Math. Phys. 171 (1995), no. 1, 1–25.
[16] T. Kimura, X. Liu, A genus-3 topological recursion relation. Comm. Math.
Phys. 262 (2006), no. 3, 645–661.
[17] M. Kontsevich, Intersection theory on the moduli space of curves and the
matrix Airy function, Comm. Math. Phys. 147 (1992), no. 1, 1–23.
[18] M. Kontsevich, Y. Manin, Gromov-Witten classes, quantum cohomology, and
enumerative geometry. Comm. Math. Phys. 164 (1994), no. 3, 525–562.
[19] A. Losev, Hodge strings and elements of K. Saito’s theory of primitive form,
Topological field theory, primitive forms and related topics (Kyoto, 1996), 305–
335, Progr. Math., 160, Birkhaeuser Boston, Boston, MA, 1998.
[20] A. Losev, Y. Manin, New moduli spaces of pointed curves and pencils of flat
connections, Michigan Math. J. 48 (2000), 443–472.
[21] A. Losev, Yu. Manin, Extended modular operad, Frobenius manifolds, 181–
211, Aspects Math., E36, Vieweg, Wiesbaden, 2004.
[22] A. Losev, S. Shadrin, From Zwiebach invariants to Getzler relation, Comm.
Math. Phys. 271, no. 3, 649–679.
[23] Yu. I. Manin, Three constructions of Frobenius manifolds: a comparative
study, Asian J. Math. 3 (1999), no. 1, 179–220.
[24] Yu. Manin, Frobenius manifolds, quantum cohomology, and moduli spaces.
American Mathematical Society Colloquium Publications, 47. American Math-
ematical Society, Providence, RI, 1999.
[25] S. A. Merkulov, Formality of canonical symplectic complexes and Frobenius
manifolds, Internat. Math. Res. Notices 1998, no. 14, 727–733.
[26] G. Mikhalkin, Enumerative tropical algebraic geometry in R2, J. Amer. Math.
Soc. 18 (2005), no. 2, 313–377.
[27] M. Mirzakhani, Weil-Petersson volumes and intersection theory on the moduli
space of curves, J. Amer. Math. Soc. 20 (2007), no. 1, 1–23.
[28] P. Mnev, Notes on simplicial BF theory, arXiv: hep-th/0610326.
[29] A. Okounkov, R. Pandharipande, Gromov-Witten theory, Hurwitz numbers,
and matrix models, I, arXiv: math.AG/0101147.
[30] F. Schaetz, BVF-complex and higher homotopy structures, arXiv:
math.QA/0611912.
[31] S. Shadrin, A definition of descendants at one point in graph calculus, arXiv:
math.QA/0507106.
[32] S. Shadrin, I. Shneiberg, Belorousski-Pandharipande relation in dGBV alge-
bras, J. Geom. Phys. 57 (2007), no. 2, 597–615.
http://arxiv.org/abs/math/0601760
http://arxiv.org/abs/math/0508384
http://arxiv.org/abs/hep-th/0610326
http://arxiv.org/abs/math/0101147
http://arxiv.org/abs/math/0611912
http://arxiv.org/abs/math/0507106
TAUTOLOGICAL RELATIONS IN HODGE FIELD THEORY 35
[33] I. Shneiberg, Topological recursion relations in M2,2, to appear in Funct. Anal.
Appl. (2007).
[34] U. Tillmann, Vanishing of the Batalin-Vilkovisky algebra structure for TCFTs,
Comm. Math. Phys. 205 (1999), no. 2, 283-286.
[35] E. Witten, Two dimensional gravity and intersection theory on moduli space.
Surveys in Differential Geometry, vol. 1 (1991), 243–310.
[36] E. Witten, Chern-Simons gauge theory as a string theory, The Floer memorial
volume, 637–678, Progr. Math. 133, Birkhaeuser, Basel, 1995.
[37] B. Zwiebach, Closed string field theory: quantum action and the Batalin-
Vilkovisky master equation, Nuclear Phys. B 390 (1993), no. 1, 33–152.
Institute for Theoretical and Experimental Physics, Bolshaya Che-
remushkinskaya 25, Moscow, 117218, Russia.
E-mail address : losev@itep.ru
Department of Mathematics, University of Zurich, Winterthurer-
strasse 190, CH-8057 Zurich, Switzerland.
E-mail address : sergey.shadrin@math.unizh.ch
Department of Algebra, Faculty of Mechanics and Mathematics,
Moscow State University, Leninskie Gory, GSP, Moscow, 119899, Rus-
E-mail address : shneiberg@mtu-net.ru
	1. Introduction
	2. Gromov-Witten theory
	3. Zwiebach invariants
	4. Construction of correlators in Hodge field theory
	5. String, dilaton, and tautological relations
	6. Vanishing of the BV structure
	7. Main Lemma
	8. Proof of Theorem 3
	References
ABSTRACT
  We propose a Hodge field theory construction that captures algebraic
properties of the reduction of Zwiebach invariants to Gromov-Witten invariants.
It generalizes the Barannikov-Kontsevich construction to the case of higher
genera correlators with gravitational descendants.
  We prove the main theorem stating that algebraically defined Hodge field
theory correlators satisfy all tautological relations. From this perspective
the statement that Barannikov-Kontsevich construction provides a solution of
the WDVV equation looks as the simplest particular case of our theorem. Also it
generalizes the particular cases of other low-genera tautological relations
proven in our earlier works; we replace the old technical proofs by a novel
conceptual proof.

<|endoftext|><|startoftext|>
**FULL TITLE**
ASP Conference Series, Vol. **VOLUME**, **YEAR OF PUBLICATION**
**NAMES OF EDITORS**
SBF: multi-wavelength data and models
M. Cantiello1,2, G. Raimondo2, J.P. Blakeslee1, E. Brocato2, M.
Capaccioli3
Abstract. Recent applications have proved that the Surface Brightness Fluc-
tuations (SBF) technique is a reliable distance indicator in a wide range of dis-
tances, and a promising tool to analyze the physical and chemical properties of
unresolved stellar systems, in terms of their metallicity and age. We present the
preliminary results of a project aimed at studying the evolutionary properties
and distance of the stellar populations in external galaxies based on the SBF
method.
On the observational side, we have succeeded in detecting I-band SBF gra-
dients in six bright ellipticals imaged with the ACS, for these same objects we
are now presenting also B-band SBF data. These B-band data are the first
fluctuations magnitude measurements for galaxies beyond 10 Mpc.
To analyze the properties of stellar populations from the data, accurate
SBF models are essential. As a part of this project, we have evaluated SBF
magnitudes from Simple Stellar Population (SSP) models specifically optimized
for the purpose. A wide range of chemical compositions and ages, as well as
different choices of the photometric system have been investigated. All mod-
els are available at the Teramo-Stellar Populations Tools web site: www.oa-
teramo.inaf.it/SPoT.
We have measured B- and I-band SBF magnitudes for 6 elliptical galax-
ies observed with the ACS camera on board of HST: NGC1407, NGC3258,
NGC3268, NGC4696, NGC5322 and NGC5557. Concerning I-band images,
their high S/N ratio allowed us to obtain SBF measurements in different regions
of the galaxies – 5 concentric annuli (Cantiello et al. 2005). On the contrary,
the B-band images have low S/N (∼1), and SBF amplitudes can be measured
only in one single annulus. The reliability of these B-band measurements has
been verified via numerical simulations, by using a procedure which is able to
reproduce realistic images of elliptical galaxies, including the stellar SBF signal.
The general lack of B-band SBF data hampered up to now a detailed com-
parison with models, our observational data represent the first sample of B-
and I-band SBF measurements for a fair sample of distant galaxies. Figure 1
(left panels) shows the comparison of absolute SBF magnitudes versus (B-I)0
color data with SSP models from the Teramo Stellar Populations Tools group
(SPoT models, Raimondo et al. 2005). SBF and color data appear generally
well reproduced by means of standard SSP models in the M̄I vs. (B-I)0 panel.
However, there is a considerable mismatch between SBF models and data for
Dep. of Physics and Astronomy, Washington State University, Pullman, WA 99164
INAF-Oss. Astronomico di Teramo, Via M. Maggini, 64100, Teramo, Italy
INAF-Oss. Astronomico di Capodimonte, Vicolo Moiariello 16, 80131, Napoli, Italy
http://arxiv.org/abs/0704.1002v1
2 Cantiello M. et al.
some objects in the M̄B vs. (B-I)0 panel. Such disagreement does not depend on
the distance modulus adopted to estimate the absolute SBF magnitudes, in fact
the same mismatch is present also in the distance-free SBF-color vs. color (B-I)0
(Cantiello et al. 2006, ApJ submitted). In addition, adopting other standard
SSP models from literature (e.g. Blakeslee et al. 2001) or also non-standard
SSP models (e.g. alpha enhanced models) the disagreement is not removed.
Figure 1. Left panels: SBF absolute magnitudes vs. the (B-I)0 color derived
from HST data (full dots). Full squares mark the only two other galaxies with
literature data. SSP models are from the SPoT group for the labeled chemical
compositions and 2 Gyr ≤ t ≤ 14 Gyr (symbols of increasing size mark older
ages). Right panels: same data as left panels but compared to CSP models.
One possible solution seems to be the use of Composite Stellar Populations
(CSP). In Figure 1 (right panels) we compare the Blakeslee et al. (2001) CSP
models with the present data. These CSP models are obtained combining SSP
models in such a way to mimic, at least approximately, the evolution of an
elliptical galaxy. With these models the disagreement between SBF data &
models disappears, as it is completely accounted for by CSP with a fraction of
old and metal-poor (t∼ 14 Gyr, [Fe/H]∼-1.3) stars as high as 8%, combined
with a dominant contribution from an old and metal rich stellar component.
In conclusion, our data seem to show that while the integrated properties
of some galaxies might be well interpreted within the scenario of classical SSP
models, there are few objects whose observational properties can only be inter-
preted by means of more complex stellar populations systems. In this view, SBF
and SBF colors, coupled with classical photometric data appear to be a very in-
teresting tool to understand the properties of the unresolved stellar systems in
distant galaxies.
References
Blakeslee, J.P., Vazdekis, A., Ajhar, E. 2001, MNRAS, 320, 193
Cantiello, M. et al. 2005, ApJ, 634, 239
Raimondo, G., et al. 2005, AJ, 130, 2625
ABSTRACT
  Recent applications have proved that the Surface Brightness Fluctuations
(SBF) technique is a reliable distance indicator in a wide range of distances,
and a promising tool to analyze the physical and chemical properties of
unresolved stellar systems, in terms of their metallicity and age. We present
the preliminary results of a project aimed at studying the evolutionary
properties and distance of the stellar populations in external galaxies based
on the SBF method.
  On the observational side, we have succeeded in detecting I-band SBF
gradients in six bright ellipticals imaged with the ACS, for these same objects
we are now presenting also B-band SBF data. These B-band data are the first
fluctuations magnitude measurements for galaxies beyond 10 Mpc.
  To analyze the properties of stellar populations from the data, accurate SBF
models are essential. As a part of this project, we have evaluated SBF
magnitudes from Simple Stellar Population (SSP) models specifically optimized
for the purpose. A wide range of chemical compositions and ages, as well as
different choices of the photometric system have been investigated. All models
are available at the Teramo-Stellar Populations Tools web site:
www.oa-teramo.inaf.it/SPoT.

<|endoftext|><|startoftext|>
Introduction 
The goal of this paper is to lay out some ideas for the further development of an invariant-
manifold-theory inspired computational approach to the problem of coarse-graining an 
autonomous system of ODE (fine system). Coarse variables are introduced as either functions of 
the fine state or time-averages of functions of the fine state. The objective is to come up with a 
closed theory of evolution for the coarse variables. In our past work [1, 2, 3] that we have called 
the method of Parametrized Locally Invariant Manifolds (PLIM), we have shown that this goal 
can be achieved in the context of hard nonlinear problems involving dissipation and/or 
oscillatory response. Strictly speaking, the achievement is, however, only partial in the sense that 
one requires knowledge of fine initial conditions to ensure a correct coarse-grained response 
even though the developed coarse equations are posed purely in terms of the coarse variables. 
This realization is intimately tied to the understanding of the emergence of memory effects in 
coarse response of an autonomous fine theory, a feature that can also be interpreted as stochastic 
effects in coarse response. In this paper, we examine two methods for the selection of a small 
number of coarse variables designed to allow for an autonomous coarse response, thus allowing 
unambiguous initialization of the coarse theory with information only on the coarse state. 
PLIM is an algorithm for developing approximate, but micro-dynamics-consistent, equations of 
evolution for user-defined coarse variables. The broad idea here is to calculate parametrizations 
of an appropriate collection of locally invariant manifolds of the fine dynamics a priori, with the 
coordinates being the coarse variables (observables). With a database devised to store this 
information, it becomes possible to define a closed dynamics for the coarse variables.  
PLIM as well as the Mori-Zwanzig Projection Operator Technique imply that if coarse variables 
are chosen ‘arbitrarily’, then more variables will, in general, be required to have an autonomous 
coarse theory that can be initialized unambiguously. Here, we propose two possible strategies for 
making a choice of such coarse variables. Connections of our work with, and a detailed review 
of, other multiscale strategies are provided in [1,2,3]. In particular, we note the work of 
Kevrekidis and co-workers [10,11,12] where the emphasis is not on deriving the form of the 
coarse equations at all but nevertheless make consistent predictions of coarse evolution based on 
a carefully crafted strategy for utilizing short bursts of microscopic simulations. 
The main point of the paper is how to start from a fine dynamics with some idea of what time-
averaged coarse variables one might be interested in and proceed to augment this problem in a 
well-defined manner so that coarse response can indeed be computed. In a rough sense, it is 
specifically designed to deal with problems where the ‘as-received’ fine scale problem does not 
readily have an obvious slow macroscopic dynamics associated with it. The procedure tries to 
augment the problem definition so that an appropriate macroscopic dynamics becomes 
associated with the augmented microscopic problem. 
This work is mathematically formal. 
2. Background 
The autonomous fine dynamics is defined as 
( ) ( )( )
( )0 .
t H f t
 (1) 
f  is an N -dimensional vector of fine degrees of freedom and H  is a generally nonlinear 
function of  fine states, denoted as the vector field of the fine dynamical system. Equation (1) 2  
represents the specification of initial conditions. N  can be large in principle, and the function H  
rapidly oscillating. 
Let Λ  be a user-specified function of the fine states producing vectors with m  components 
whose time averages over intervals of period τ  can be measured in principle and are of physical 
interest. These time averages are considered as some of the coarse variables of interest. Let us 
also define the remaining list of ‘instantaneous’ coarse variables p  with n  components through 
the relationship 
 ( )p fΠ= , (2) 
where Π  is user-defined. Often, such variables may be required to incorporate external driving 
influences (e.g. loadings) on an assembly whose time-averaged behavior needs to be explored. 
Given the fixed time interval τ  characterizing the resolution of coarse measurements in time, a 
coarse trajectory corresponding to each fine trajectory ( )f ⋅  is defined as the following pair of 
functions of time: 
( ) ( )( )
( ) ( )( )
c t f s ds
p t f t
∫  (3) 
Roughly speaking, it is a closed statement of evolution for the pair ( ),c p  that we seek. The 
statement is unambiguous only after we specify what sort of initial conditions we may want to 
prescribe. For the purpose of this section we assume that fine initial conditions are known with 
certainty. Then, the goal is to develop a closed evolution equation for ( ),c p , i.e. an equation that 
can be used for evolving ( ),c p  without concurrently evolving (1), corresponding to fine 
trajectories out of a prescribed set of fine initial conditions.  
Clearly, 
( ) ( )( ) ( )( )
( ) ( )( ) ( )( )
t f t f t
t f t H f t
Λ τ Λ
⎡ ⎤= + −⎢ ⎥⎣ ⎦
. (4) 
If we now introduce a forward trajectory ( )ff ⋅  corresponding to a trajectory ( )f ⋅  as 
 ( ) ( ):ff t f t τ= + , (5) 
then 
 ( ) ( )( )f f
t H f t
= . (6) 
Also, given an initial state f∗  we denote by f∗∗  the state defined as the solution of (1) evaluated 
at time τ . With these definitions in hand, we augment the fine dynamics (1) to 
( ) ( )( )
( ) ( )( )
t H f t
t H f t
 (7) 
and apply invariant manifold techniques to (7). In detail, on an m n+ - dimensional coarse phase 
space whose generic element we denote as ( ),c p , we seek functions fG  and G  that satisfy the 
first-order, quasilinear partial differential equations 
( ) ( ) ( ) ( ) ( )
( ) ( ) ( ) ( ) ( )
1 1 1
1 1 1
   1 to 
I I lm n N
f fk k K I
f fk l K
k l K
II lm n N
fk k K I
fk l K
k l K
G G G H G H G
c p f
G G G H G H G
c p f
= = =
= = =
⎫⎪∂ ∂⎛ ⎞ ∂ ⎪⎡ ⎤⎟⎜ − + = ⎪⎟⎜ ⎢ ⎥⎟ ⎪⎜ ⎣ ⎦⎝ ⎠∂ ∂ ∂ ⎪⎪ =⎬
⎪∂⎛ ⎞∂ ∂ ⎪⎡ ⎤⎟⎜ ⎪− + =⎟⎜ ⎪⎢ ⎥⎟⎜ ⎣ ⎦⎝ ⎠ ⎪∂ ∂ ∂ ⎪⎭
 (8) 
 at least locally in ( ),c p -space. Assuming that we have such a pair of functions over the domain 
containing the point  
 (9) 
 defined from (1) and (3) which, moreover, satisfies the conditions 
fG c p f
G c p f
∗ ∗ ∗∗
∗ ∗ ∗
 (10) 
it is easy to see that a local-in-time fine trajectory defined by 
( ) ( ) ( )( )
( ) ( ) ( )( )
f ft G c t p t
t G c t p t
 (11) 
through the coarse local trajectory satisfying 
( )( ) ( )( )
( )( ) ( )( )
G c p G c p
G c p H G c p
⎡ ⎤= −⎢ ⎥⎣ ⎦
∑  (12) 
is the solution of (7) (locally). A solution pair ( ),fG G  of (8) represents a parametrization of a 
locally invariant manifold of the dynamics (7). By a locally invariant manifold we mean a set of 
points in phase space such that the vector field of (7) is tangent to the set at all points. Thus, a 
trajectory of (7) exits a locally invariant manifold only through the boundary of the manifold. 
Also, note that if ( ),fG G  and ( ),fG G  are two solutions to (8) and (10) on an identical local 
domain in ( ),c p -space containing ( ),c p∗ ∗  and ( ) ( )( ),c p⋅ ⋅  and ( ) ( )( ),c p⋅ ⋅  are the corresponding 
coarse trajectories defined as solutions to (12), then local uniqueness of solutions to (7) implies 
( ) ( )( ) ( ) ( ) ( ) ( )( )
( ) ( )( ) ( ) ( ) ( ) ( )( )
, : : ,
, : : , .
f f f fG c t p t t t G c t p t
G c t p t t t G c t p t
= = =
= = =
 (13) 
Thus, 
( ) ( ) ( ) ( )
( ) ( ) ( ) ( )
  ;       locally in time,
0 0   ;    0 0
dc dc dp dp
t t t t
dt dt dt dt
c c p p
 (14) 
from (12), and assuming ( ),c p  and ( ),c p  are continuous, ( ) ( ), ,c p c p≡ , locally. 
Hence, given any pair of mappings ,fG G  satisfying (8) and (10) on a domain containing 
( ),c p∗ ∗ , we consider (12) as the consistent, closed theory for the evolution of the coarse 
variables c . Obstruction to the construction of solutions to (8) is explored in [1], providing one 
reason for seeking multiple local solutions as implemented in [2]. This paper seeks to determine 
coarse functions such that multiple local solutions are not required, as far as possible. 
Notice that if equations (4)1 ,  (5), (7) are viewed as a system, then this system has a singular 
perturbation structure for τ  large. We are interested in the evolution of the ‘ c ’ variables which 
are coarse/macroscopic variables. Section 3 briefly outlines a concrete and systematic procedure 
of how to append this system with more memory variables (in mechanics parlance, internal 
variables) so as to obtain an unambiguously initializable coarse dynamics. 
As explained in [1,2,3], an arbitrary choice of coarse variables (12) will not, in general, result 
in unique evolution of the coarse state out of a specified coarse initial condition. As well, 
conservation properties of the fine system are not expected to be preserved in the behavior of the 
time averages of fine variables. Thus, fixing attention on a fixed set of arbitrarily chosen coarse 
variables would imply what would seem like stochastic coarse response with dissipation. This is 
also the content of the main result of the Mori-Zwanzig projection Operator Technique within 
which Langevin dynamics can be derived. In this work, we propose a different strategy – we 
would like to start with a particular physically motivated set of coarse variables, but then would 
like to augment this set with more appropriately chosen variables so that the augmented set 
displays autonomous response. These extra variables effectively are memory (delay) variables 
corresponding to the original set of coarse variables. In the next section, we outline a procedure 
for the selection of such extra variables and then the use of the PLIM methodology to set up the 
autonomous coarse response. 
3. Variables for autonomous coarse response: The Delay Reconstruction Technique 
The main conceptual ingredient of this technique is Takens’s embedding theorem [6]: 
Theorem: Let M  be a compact manifold of dimension m . For pairs ( ), yϕ , : M Mϕ →  
a smooth diffeomorphism and :y M R→ (reals) a smooth function, it is a generic 
property that the map ( )
y M RϕΦ
+→ , defined by 
 ( ) ( ) ( ) ( ) ( )( )
, , , ,
y x y x y x y xϕΦ ϕ ϕ=  
is an embedding; by ‘smooth’ we mean at least 2C . 
Practically, Takens’s theorems suggest, and were motivated by the idea, that a single measurable 
signal (the function y ) of a complicated, possibly high-dimensional, dynamics (the mapping ϕ ), 
can in principle reveal all qualitative features of the underlying dynamics through the study of 
the delay-reconstruction map.  
Thus the delay reconstruction technique has been used by workers to make statements about 
qualitative features of dynamics. Of particular interest is the work in [4] and [5] where an 
algorithm for a systematic unfolding of delay-reconstructed trajectories is introduced, 
corresponding to an autonomous original dynamics. Essentially, starting from a one-component 
delay reconstruction of a trajectory of the original dynamics, more delay components are added 
till the point where the trajectory in delay-reconstruction space has no self-intersections. The 
number of components required is then declared to be representative of the number 2m  of the 
theorem. Ding et al. [7] show that provided the function y  satisfies certain smoothness 
assumptions, the correlation dimension of the delay-reconstructed signal with progressively more 
components hits a plateau when it becomes just greater than the correlation dimension of the 
attractor of the original dynamics. 
On the other hand, physical intuition suggests that any kind of measuring device acts as a filter 
that cannot measure variations below its resolution. Thus if y  were to represent a moving time 
average, then it could not possibly reveal all features of the original dynamics. In a deep physical 
sense, were this not the case, macroscopic physics would not be possible – time-averaged signals 
display much gentler and lower dimensional dynamics. It is this idea that we would now like to 
pursue. Suggestive practical examples of this feature of dynamics can be found in [4]. 
Consider Axiom A diffeomorphisms for which we have ergodicity on attractors. Consider an 
observable of the form 
 ( ) ( )
y x x
′ = ∑ , (15) 
where we assume that ϕ  takes values in k  for some positive integer k . Due to ergodicity, this 
map has a constant value almost everywhere on a set of non-vanishing Lebesgue measure in k  
containing the attractor. Thus the set of points generated by the dynamics corresponding to this 
observable along almost all trajectories on the attractor has dimension 0 , whereas the original 
attractor has some finite, possibly large, dimension. If y′  (or each of its component functions) 
satisfied the smoothness hypotheses of Takens’s theorem, then this would be a contradiction, as 
can be seen by utilizing a theorem of Eckmann and Ruelle [8] related to the determination of 
correlation dimension. Of course, it does not, since the value of y′  at a fixed point of ϕ  is the 
fixed point itself whereas for an evaluation at a slight perturbation off the fixed point, the value is 
the ergodic average implying that the observable is, in all likelihood, discontinuous, let alone 
smooth. If we step back from an infinite sum as in (15) and perform finite sums with large N  for 
realistic dynamics where the map ϕ  may not have enough smoothness to be a diffeomorphism 
(e.g. the interatomic force for the Lennard-Jones potential contains odd powers of square roots of 
the interatomic distance), we do not expect the above argument to change drastically, in the 
sense of a discontinuous function being approached as a limit of smooth functions. Therefore, 
time-averaged observables, reflecting real measurements, maybe expected to display lower-
dimensional dynamics. Moreover, such variables may be very useful for coarse behavior. 
Thus the idea with regard to definition of an autonomous coarse dynamics is to choose a set of 
physically motivated, time-averaged coarse variables corresponding to the original high-
dimensional dynamics. An aperiodic, dense(on the attractor) original trajectory is then delay-
reconstructed in terms of these variables plus added delay components, up to the point where the 
reconstructed trajectory has no self-intersections. The original set of coarse variables plus their 
delay counterparts form the augmented set of coarse variables that form an autonomous coarse 
dynamics. One now uses PLIM with this set of coarse variables to establish the coarse dynamics. 
The procedure outlined in Section 2 in dealing with one set of delay variables ( )ff  is easily 
generalized to deal with multiple (sets of) delay variables with different delays. 
If indeed the delay-reconstructed trajectory has no self-intersection and the embedding 
dimension is small compared to the dimension of the attractor or its enveloping inertial manifold 
for the original dynamics, then this implies that we have a one-to-one mapping between sets of 
different dimension. In this connection, the work of Sauer et al. [9] should be noted, especially 
their filtered delay embedding theorem. Again there are strong hypotheses involved, and, 
interestingly, their Self-Intersection theorem does not put a lower-bound on the dimension of the 
self-intersection set, thus leaving a ray of hope with regard to the existence of a one-one map. Of 
course, it should be noted that one–to-one maps between sets of different dimensions can be 
continuous (in the small-to-large dimension direction) but nowhere differentiable, but this may 
be acceptable in the PLIM approximation methodology in approaching such functions as limits 
of piecewise-smooth continuous functions. 
As an example of the application of these ideas, one may consider the Frenkel-Kontorova 
model of a chain of atoms interacting through linear springs as well as with a nonlinear substrate 
potential. The chain may be assumed fixed at one end and a load applied at the other. Of interest 
is the time averaged stress strain (end-load-displacement) curve of this 1-d assembly. PLIM is 
applied to this problem setup in [2]. However, strictly speaking (and as mentioned in [2]), the 
problem there is solved only for coarse evolution corresponding to the fine trajectory starting 
from the stress free initial state. With the developments suggested in this paper, it may be hoped 
that the two average stress and strain coarse variables can be systematically augmented with 
more memory variables such that coarse evolution based on the developed theory is provably  
representative (at least in a formal sense) of coarse evolution corresponding to a large class of 
fine trajectories. 
Remark: We note here that in the conventional applications of the delay-reconstruction 
technique, non-generic or non-smooth observables leading to dimension change, e.g. Broomhead 
et al. [13] and Pecora and Carroll [14], are considered anomalous and to be avoided. Of 
particular interest to this work, Broomhead et al. [13] explicitly construct an example involving a 
nonrecursive filter, the inverse all-pole filter, that reduces dimension under delay-reconstruction 
of a particular signal. For the question of coarse-graining/averaging, however, it seems that it is 
precisely such non-generic and/or non-smooth functions that should be relevant. Indeed, it makes 
sense that a set of coarse observables executing autonomous macroscopic dynamics is special 
and cannot be chosen as any generic, smooth function(s).  
4. Variables for autonomous coarse response: Adapted projections 
In this proposed approach to find coarse variables that evolve autonomously, we consider the 
following argument: let Π  be a scalar function on the fine phase space, representing the 
definition of the sought-for coarse variable. Let f  be a fine trajectory. Defining the coarse 
trajectory corresponding to f  by c , we have 
 ( ) ( )( ) ( ) ( )( ) ( )( ):        c t f t c t f t H f t
= ⇒ =
. (16) 
Now, in general, given Π , many fine states will correspond to a single coarse state. Under the 
circumstances, one way to ensure an autonomous coarse response is to ensure that on any set of 
fine states where the evaluation of Π  agrees, so should the right-hand-side of the c evolution. 
Mathematically, we have the following: we are interested in determining a function Π  that has 
the following property. Let c  be arbitrarily fixed. Then define 
 ( ){ }: :cW f f cΠ= = . (17) 
Now require that Π  satisfy 
 ( ) ( ) cf H f Af
 on cW ,  (18) 
where cA  is a constant depending on c  but independent of f . 
Equations (17) and (18) together imply that a level set of ( )Π ⋅  should also be a level set of the 
function ( ) ( )H
. One way to require this is to demand that 
 ( ) H
f f f
⎛ ⎞∂ ∂ ∂ ⎟⎜= ⎟⎜ ⎟⎜ ⎟⎜∂ ∂ ∂⎝ ⎠
 (19) 
where λ  is an arbitrary scalar. Choosing it to be a derivative of an arbitrary function of a single 
variable, i.e. of the form λ ϕ Π=∂ ∂ ,  we have the necessary condition that 
 ( ) 0H
⎛ ⎞∂ ∂ ⎟⎜ − ⎟=⎜ ⎟⎜ ⎟⎜∂ ∂⎝ ⎠
. (20) 
Thus, if we now require Π  to satisfy the first-order linear PDE on the fine phase space 
 ( ) H
, (21) 
it can be show by reversing the above arguments that 
 ( )c cϕ=  (22) 
would be the correct autonomous, coarse evolution equation for the coarse variable defined by 
Π  obtained as a solution to (21),  for arbitrary choices of ϕ  in (21). Thus an entire class of  
coarse variables can be defined based on the choice of ϕ , which is a somewhat surprising result. 
It may be hoped that this class contains physically meaningful coarse variable definitions. More 
importantly, it perhaps suggests that the choice of appropriate coarse variables cannot be left 
completely unconstrained, and their definition requires physical guidance. 
Equation (21) is a linear first order PDE for Π . Characteristic curves for the PDE are solutions 
to the original set of fine system of ODE. They do not intersect (and meet only at fixed points) 
because of the autonomous nature and smoothness of the fine evolution. Thus shocks do not 
exist, and it appears reasonable to expect to numerically approximate the PDE without any 
further conditions. 
5. Acknowledgments: It is a pleasure to acknowledge discussions with Luc Tartar and Noel 
Walkington. In particular, LT suggested the local characterization (19) and NW pointed out the 
backward-in-time uniqueness for smooth ODE. 
6. References 
[1] Acharya, A., Parametrized invariant manifolds: A recipe for multiscale modeling? Computer    
Methods in Applied Mechanics and Engineering, 194, 3067-3089, 2005. 
[2] Acharya, A. and Sawant, A., On a computational approach for the approximate dynamics of averaged 
variables in nonlinear ODE systems: toward the derivation of constitutive laws of the rate type, 
Journal of the Mechanics and Physics of Solids, 54, 2183-2213. 
[3] Sawant, A., Acharya, A., Model reduction via Parametrized Locally Invariant manifolds: Some 
Examples, Computer Methods in Applied Mechanics and Engineering, 195, 6287-6311, 2005. 
[4] Abarbanel, H. D. I., Analysis of observed chaotic data, Institute for Nonlinear Science Series, 
Springer-Verlag. 
[5] Kennel, M. B., Abarbanel, H. D. I. False neighbors and false strands: a reliable minimum embedding 
dimension algorithm, Physical review E, 66, 026209, 2002. 
[6] Takens, F., Detecting strange attractors in turbulence, In: Dynamical Systems and Turbulence, Lecture 
Notes in Mathematics, 898, Warwick 1980, Ed. D. A. Rand, L. S. Young, 366-381, 1980. 
[7] Ding, M., Grebogi, C., Ott, E., Sauer, T., Yorke, J. A. Estimating correlation dimension from a chaotic 
time series: when does plateau onset occur? Physica D, 69, 404-424, 1993. 
[8] Eckmann J.-P., Ruelle, D. Ergodic theory of chaos and strange attractors, Reviews of Modern Physics, 
57, 3, 617-656, 1985. 
[9] Sauer, T., Yorke, J. A., Casdagli, M. Embedology, Journal of Statistical Physics, 65, 3/4, 579-616, 
1991. 
[10] Gear, C. W., Kevrekidis, I. G., Theodoropoulos, C., Coarse Integration/Bifurcation Analysis via 
Microscopic Simulators: micro-Galerkin methods, ,  Comp. Chem. Engng., 26, 941-963, 2002. 
[11] Kevrekidis, I. G., C. W. Gear, J. M. Hyman, P. G. Kevrekidis, O. Runborg and K. Theodoropoulos, 
Equation-free coarse-grained multiscale computation: enabling microscopic simulators to perform 
system-level tasks, Comm. Math. Sciences, 1(4), 715-762, 2003. 
[12] Kevrekidis, I. G., C. William Gear, G. Hummer, Equation-free: the computer-assisted analysis of 
complex, multiscale systems, A.I.Ch.E Journal, 50(7), 1346-1354, 2004. 
[13] Broomhead, D. S., Huke, J. P., Muldoon, M. R. Linear filters and non-linear systems, Journal of the 
Royal Statistical Society, Ser. B, 54(2), 373-382, 1992. 
[14] Pecora, L. M., Carroll, T. L. Discontinuous and non-differentiable functions and dimension increase 
induced by filtering chaotic data, Chaos, 6(3), 432-439, 1996.
ABSTRACT
  Two ideas for the choice of an adequate set of coarse variables allowing
approximate autonomous dynamics for practical applications are presented. The
coarse variables are meant to represent averaged behavior of a fine-scale
autonomous dynamics.

<|endoftext|><|startoftext|>
**FULL TITLE**
ASP Conference Series, Vol. **VOLUME**, **YEAR OF PUBLICATION**
**NAMES OF EDITORS**
Simulating CCD images of elliptical galaxies
M. Cantiello1,2, G. Raimondo2, E. Brocato2, J.P. Blakeslee1, M.
Capaccioli3
Abstract. We introduce a procedure developed by the “Teramo Stellar Pop-
ulations Tools” group (Teramo-SPoT), specifically optimized to obtain realistic
simulations of CCD images of elliptical galaxies.
Particular attention is devoted to include the Surface Brightness Fluctua-
tion (SBF) signal observed in ellipticals and to simulate the Globular Cluster
(GC) system in the galaxy, and the distribution of background galaxies present
in real CCD frames. In addition to the physical properties of the simulated
objects - galaxy distance and brightness profile, luminosity function of GC and
background galaxies, etc. - the tool presented allows the user to set some of the
main instrumental properties - FoV, zero point magnitude, exposure time, etc.
The light coming from distant galaxies includes a specific SBF signal, essen-
tially correlated with the properties of the host stellar system (Tonry & Schnei-
der 1988). The existence of these luminosity fluctuations is due to the statistical
correlation between adjacent galaxy regions (pixels). Since its introduction, the
SBF method has been used as a reliable distance indicator for elliptical galaxies
(e.g. Tonry et al. 2001) and, more recently, as a tracer of stellar population
properties (e.g. Cantiello et al. 2003, 2005; Raimondo et al. 2005, R05).
In order to derive SBF magnitudes from CCD images of elliptical galaxies,
very high quality CCD data are required. We have developed a tool to simulate
CCD images of elliptical galaxies including the SBF signal and other properties
of the galaxy - surface brightness profile, distance, color profiles, contamination
of background galaxies, etc.
Due to its statistical nature, a reliable simulation of the SBF signal needs: i)
to accurately reproduce the details of the statistics governing the stellar SBF sig-
nal, and ii) to take into account the presence of any other source of fluctuations.
To include SBF signal in the simulations we use the Teramo-SPoT Single-burst
Stellar Populations (SSP) models (R05, visit also the SPoT website www.oa-
teramo.inaf.it/SPoT). These models are provided by computing a number Nsim
of independent SSP simulations for a large range of ages and chemical composi-
tions. The latter property of SPoT SSP models is at the base of our simulations
of realistic galaxies: we start with a galaxy having an analytic Sersic r1/n profile,
then, for each pixel [i,j] at the radius r∗ we substitute the analytic magnitude
profile µth(r∗)[i,j] with the integrated magnitudes µsim as evaluated in one of the
Nsim independent SSP simulations, having assumed 〈µsim〉 = µth(r∗). In this
Dep. of Physics and Astronomy, Washington State University, Pullman, WA 99164
INAF-Oss. Astronomico di Teramo, Via M. Maggini, 64100, Teramo, Italy
INAF-Oss. Astronomico di Capodimonte, Vicolo Moiariello 16, 80131, Napoli, Italy
http://arxiv.org/abs/0704.1004v1
2 Cantiello et al.
way the poissonian correlation between adjacent pixels is introduced, preserving
the galaxy brightness profile.
To be realistic, the simulation must also include the presence of GC and
background galaxies - which, in addition, can strongly affect the fluctuations
signal derived from the CCD. These sources are indeed included into our simu-
lations according to their typical luminosity functions, i.e., the total luminosity
function is assumed to be the sum of a power law for galaxies, and a gaus-
sian distribution for the GC component. The characteristic parameters of these
functions can be arbitrarily set by the user. Once the luminosity functions are
randomly populated all the background galaxies are randomly distributed in
the frame, while the GC spatial distribution is additionally convolved with an
inverse power law centered on the galaxy.
Finally, a uniform sky value is included, and the detector noise is added
according to the readout-noise and gain values of the selected instrument.
After the galaxy profile - including SBF -, the GC system, the background
galaxies, and the detector noise properties have been properly chosen, the simu-
lation can be carried out. The panels of Figure 1 show the frames associated to
some of the steps described above, the final frame simulated, and the luminosity
functions of GC and background galaxies.
Figure 1. The first three panels of the figure (left to right) show the profile
of the galaxy simulated, the frame of GC and background galaxies, and the
sum of the previous two frames plus sky and noise, respectively. ACS camera
properties are adopted for the instrumental characteristics. The last plot
shows the luminosity functions adopted of the GC (short dashed line) and
the background galaxies (long dashed line), and their sum (solid line).
The capabilities of the procedure here briefly described makes it useful to
simulate astronomical data for a wide range of applications. As a specific case
we mention the use of this tool to simulate realistic runs at defined telescopes
with the aim of measuring SBF. For example, we have applied this technique to
simulate ISAAC@VLT Ks-band, and ACS@HST F814W-band (reported Figure
1) images of given ellipticals in order to evaluate the proper exposure times
required to reach a defined S/N ratio for objects at different distances.
References
Cantiello, M., Raimondo, G., Brocato, E., Capaccioli, M. 2003, AJ, 611, 670
Cantiello, M., et al. 2005, ApJ, 634, 239
Raimondo, G., et al. 2005, AJ, 130, 2625
Simulating images of elliptical galaxies 3
Tonry, J. & Schneider, D.P. 1988, AJ, 96, 807
Tonry, J. et al. 2001, AJ, 546, 681
ABSTRACT
  We introduce a procedure developed by the ``Teramo Stellar Populations
Tools'' group (Teramo-SPoT), specifically optimized to obtain realistic
simulations of CCD images of elliptical galaxies.
  Particular attention is devoted to include the Surface Brightness Fluctuation
(SBF) signal observed in ellipticals and to simulate the Globular Cluster (GC)
system in the galaxy, and the distribution of background galaxies present in
real CCD frames. In addition to the physical properties of the simulated
objects - galaxy distance and brightness profile, luminosity function of GC and
background galaxies, etc. - the tool presented allows the user to set some of
the main instrumental properties - FoV, zero point magnitude, exposure time,
etc.

<|endoftext|><|startoftext|>
Introduction
AKähler manifold with negative first Chern class admits a unique Kähler-
Einstein metric. This was shown by Yau in his seminal paper on the Calabi
conjecture [Y1], and also independently by Aubin [A]. Yau later posed the
question of whether, in general, Kähler-Einstein metrics could be obtained
as a limit of algebraic metrics induced from embeddings into projective space
[Y2]. Donaldson [D1] showed that if a polarized variety (X,L) with discrete
automorphism group admits a constant scalar curvature Kähler metric, then
there is indeed a sequence of algebraic ‘balanced metrics’ [Zh] converging to
it. Donaldson’s proof makes use of the Tian-Yau-Zelditch expansion of the
Bergman kernel [Ti], [Ze] (see also [C]) and Lu’s [L] computation of the sec-
ond coefficient. Very recently, Donaldson [D2] has described how numerical
approximations to these balanced metrics could be used to compute, to a
good accuracy, explicit Kähler-Einstein metrics on certain varieties.
Tsuji [Ts] has considered a different way of producing Kähler-Einstein
metrics by algebraic approximations. He introduced a new iterative proce-
dure on varieties of general type with the aim of describing (possibly sin-
gular) Kähler-Einstein metrics. In the case when the first Chern class is
The first-named author is on leave for the semester and is visiting MSRI, Berkeley, CA
as a postdoctoral fellow. He is supported in part by National Science Foundation grant
DMS 0604805.
Part of this research was carried out while the second-named author was a short-term
visitor at MSRI in January 2007. He is supported in part by National Science Foundation
grant DMS 0504285.
http://arxiv.org/abs/0704.1005v1
negative, Tsuji proved that his iteration converges, in a certain weak sense,
to the Kähler-Einstein metric. In this paper, we give a uniform convergence
result and describe how his procedure may be modified to obtain results in
the case of algebraic surfaces of general type, and on orbifolds with isolated
singuarities.
We now describe Tsuji’s iteration. Let X be a compact Kähler mani-
fold of complex dimension n with ample canonical bundle KX . Let ωKE =√
(gKE)ijdz
i ∧ dzj be the Kähler-Einstein metric satisfying
Ric(ωKE) = −
∂∂ logωnKE = −ωKE ∈ c1(X).
Fix a Hermitian metric hKE on KX by setting hKE = (det gKE)
Let m0 ≥ 1 be an integer such that Km0X is base point free and let hm0
be any Hermitian metric on Km0X . Define a sequence of Hermitian metrics
hm on K
X for m > m0 inductively as follows. Suppose that hm is a given
Hermitian metric on KmX . To define hm+1, first define an inner product
〈 , 〉Tm+1 on the space of sections of Km+1X by
〈s, t〉Tm+1 =
hm ⊗ s⊗ t, for s, t ∈ H0(X,Km+1X ), (1.1)
where hm⊗s⊗t is regarded as a volume form onX. Now let (σ
m+1, . . . , σ
(Nm+1)
m+1 ),
for Nm+1+1 = dimH
0(X,Km+1X ), be an orthonormal basis of H
0(X,Km+1X )
with respect to this inner product. Then define the Hermitian metric hm+1
on Km+1X by
hm+1 =
(m+ n+ 1)!
(m+ 1)!
m+1 ⊗ σ
Observe that this metric is independent of the choice of orthonormal basis.
We have the following theorem on the convergence of the metrics hm,
strengthening the result given in [Ts].
Theorem 1 Let X be a compact Kähler manifold with ample canonical
bundle. Let hKE, ωKE and the sequence of Hermitian metrics hm be as
above. There exists a constant C depending only on X and hm0 such that
for m ≥ m0,
C logm
m hKE ≤ h1/mm ≤ e
C logm
m hKE. (1.2)
Hence h
m converges uniformly on X to hKE as m→ ∞.
We now describe a modification of Tsuji’s iteration. Let β be a continu-
ous function on a variety X with 0 ≤ β ≤ 1. Here we allow X to have mild
singularities. We also remove the assumption that KX be ample. We only
require that there exists an m0 ≥ 1 such that Km0X is base point free. Let
hm0 be a Hermitian metric on K
X . Given 0 < ε ≤ 1, we inductively define
a sequence of Hermitian metrics hm,ε = hm,ε(β, hm0) on KX as follows. As-
suming that hm,ε is given, define an inner product 〈 , 〉Tm+1,ε on the space
of sections of Km+1X by
〈s, t〉Tm+1,ε =
βεhm,ε ⊗ s⊗ t, for s, t ∈ H0(X,Km+1X ).
Then define the Hermitian metric hm+1,ε on K
hm+1,ε =
(m+ n+ 1)!
(m+ 1)!
m+1,ε ⊗ σ
m+1,ε
where (σ
m+1,ε, . . . , σ
(Nm+1)
m+1,ε ) is an orthonormal basis of H
0(X,Km+1X ) with
respect to the inner product 〈 , 〉Tm+1,ε . We call hm,ε the modified Tsuji
iteration. It depends on ε and β.
First, we consider the case when X is an algebraic surface of general
type. Let E =
iEi be the sum of the nonsingular rational curves Ei of self
intersection −1 (or (−1)-curves, for short) on X. Let τ : X → Xmin be a
holomorphic map blowing down these curves onX, so that Xmin is a minimal
surface of general type. Now if h is any Hermitian metric on KXmin, then
Ω = h−1(z1, . . . , zn)(
−1/2π)ndz1∧dz1∧ · · ·∧dzn∧dzn can be regarded as
a volume form on Xmin. Using coordinates w
i on X, there is a holomorphic
section S−1 of the line bundle [E ] associated to E and vanishing of order one
on E satisfying
(τ∗h)−1 ⊗ |S−1|2
(w1, . . . , wn)
dw1 ∧ dw1 ∧ · · · ∧ dwn ∧ dwn = τ∗Ω,
for any such h.
Xmin has no (−1)-curves, but may have (−2)-curves. Let f : Xmin →
Xcan be its canonical map. Xcan is an surface with ample canonical bundle
KXcan and, at worst, isolated orbifold singularities. By the orbifold version of
the results of [Y1], [A] (see [K], for example), there exists an orbifold Kähler-
Einstein metric ωKE on Xcan, with corresponding Hermitian metric hKE on
KXcan . Define a Kähler metric on Xmin by ωmin = f
∗ωKE and a Hermitian
metric on KXmin by hmin = f
∗hKE. Note that ωmin and hmin are not smooth
in general, although hmin is continuous on Xmin (see [TZ]). Let C =
be the sum of the (−2)-curves on Xmin and let S−2 be a holomorphic section
of the line bundle [C], vanishing of order one on C. Fix a smooth Hermitian
metric hC on [C], and assume that supXmin |S−2|
= 1. Let β be the smooth
function on X defined by β = τ∗|S−2|2hC , and let h∞ = τ
∗hmin.
For this β and some initial Hermitian metric hm0 on K
X , let hm,ε =
hm,ε(β, hm0) be the sequence of Hermitian metrics in the modified Tsuji
iteration as described above. Then we have the following result.
Theorem 2 Let X be an algebraic surface of general type. With hm,ε =
hm,ε(β, hm0) and S−1 as described above, for every sequence εj → 0,
lim sup
h1/mm,εj → h∞ ⊗ |S−1|
−2, as j → ∞,
almost everywhere on X.
Note that since KX = τ
∗KXmin + [E ], we can regard h∞ ⊗ |S−1|−2 as a
singular Hermitian metric on KX .
We now consider the case whenX is an orbifold with isolated singularities
with ample canonical bundle.
Theorem 3 Let (X,ωKE) be a Kähler-Einstein orbifold with KX ample and
only isolated singularities at points p1, . . . , pk. Let β be a continuous function
on X satisfying 0 ≤ β ≤ 1 and β(x) = 0 if and only if x = pi for some i.
Then, with hm,ε = hm,ε(β, hm0) as above, for every sequence εj → 0,
lim sup
h1/mm,εj → hKE, as j → ∞,
almost everywhere on X.
We end the introduction with a couple of remarks.
1. Tsuji’s iteration has some similarities to Donaldson’s TK -iteration (see
[D2], section 2.2.2) which in the case of KX ample should yield a
‘canonically balanced metric’ using sections of a fixed power k of the
canonical line bundle. As k → ∞, the limit of these metrics is expected
to be the Kähler-Einstein metric. On the other hand, Tsuji’s method
is a single iterative process.
2. Donaldson has suggested that the dynamical systems introduced in
[D2] are likely to be discrete approximations to the Kähler-Ricci and
Calabi flows. It would be interesting to know whether Tsuji’s iteration
could be viewed in a similar light.
The outline of the paper is as follows. Our main technique is the peak
section method of [Ti]. This is described in section 2, and extended to
orbifolds with isolated singularities (for related results on the Szegö kernel
for orbifolds, see [S], [DLM], [P]). In sections 3 and 4 we prove the main
theorems.
2. Peak sections
Now suppose that (X,ω) be a compact Kähler manifold of complex di-
mension n with ω ∈ c1(L) for an ample line bundle L. Fix a Hermitian
metric h on L satisfying −
∂∂ log h = ω. Write 〈·, ·〉L2(hm) and ‖ · ‖L2(hm)
for the L2 inner product and norm on H0(X,Lm) with respect to the Her-
mitian metric hm and volume form 1
ωn. We use the following lemma from
[Ti].
Lemma 2.1 There exists m1 > 0 depending only on X, L and h such that
for every x0 ∈ X and m ≥ m1 there is a global holomorphic section sm,x0 of
Lm satisfying the following.
(i) |sm,x0 |2hm(x0) = 1 and
‖sm,x0‖2L2(hm) =
(m+ n)!
(1 + O(m−1)),
where f(m) = O(m−1) means that |f(m)|m ≤ A for a constant A
depending only on X, L and h.
(ii) For t any holomorphic section of Lm which vanishes at x0,
∣〈sm,x0 , t〉L2(hm)
∣ ≤ B
‖sm,x0‖L2(hm)‖t‖L2(hm),
with a constant B depending only on X, L and h.
Proof We outline here the proof of part (i), since we will explicitly refer to
this method in the orbifold case. For (ii) we refer the reader to [Ti]. Pick a
normal coordinate chart (U, (z1, z2, . . . , zn)) for g centered at the point x0.
Let η : [0,∞) → [0, 1] be a cut-off function satisfying η(t) = 1 for t ≤ 1
η(t) = 0 for t ≥ 1 and −4 ≤ η′(t) ≤ 0, |η′′(t)| ≤ 8. Define a weight function
ψ, which for m sufficiently large is supported in U , by setting
ψ(z) = (n+ 2)η
am|z|2
am|z|2
, (2.1)
for z ∈ U , and ψ ≡ 0 outside U , where am = m/(logm)2. A short calculation
shows that √
∂∂ψ(z) ≥ −C(n+ 2)amω, for |z| > 0. (2.2)
Now let ψi be a decreasing sequence of smooth functions on X such that
ψ = lim
ψi and
∂∂ψi ≥ −C(n+ 2)amω, (2.3)
where the constant C may be different from the one in (2.2). It follows that
for sufficiently large m,
∂∂ψi −
∂∂ log hm +Ric(ω) ≥
ω. (2.4)
Choose a local holomorphic section v of L so that |v|2h(x0) = 1 and (∂|v|2h)(x0) =
0. Let w be the smooth local section of Lm defined by
w = ∂
am|z|2
We apply the L2 estimate of Hörmander [H] (cf. Proposition 1.1 of [Ti]) to
the (1,0) form w: there exists a smooth global section u of Lm satisfying
∂u = w and
|u|2hme−ψ
|w|2hme−ψ
Here we are using the fact that the weight function ψ can be approximated
by ψi satisfying (2.4). Observe that w vanishes identically outside the region
2/am ≤ |z|2 ≤ 4/am, and that in U , |w|2hme−ψ ≤ Cam|vm|2hm. It follows that
|u|2hme−ψ
(logm)2
2/am≤|z|2≤4/am
|vm|2hm
From our choice of coordinates, |v|2h(z) = 1− |z|2 +O(|z|3), and so locally
|vm|2hm
)m(√−1
dz1 ∧ dz1 ∧ · · · ∧ dzn ∧ dzn.
Hence
|u|2hm
(logm)2
≤ C(logm)2n−2m− logm/2−n. (2.5)
Now set sm,x0 = η
am|z|2
vm − u. Since
|u|2hme−ψ
< ∞, we have
u(z) = O(|z|2) by the definition of ψ. Hence |sm,x0 |2hm(x0) = 1. Calculate
|sm,x0 |2hm
am|z|2
|vm|2hm
|u|2hm
+ 2Re
am|z|2
vm, u
. (2.6)
From (2.5) the last two terms are O(m−q) for any q. For the first term,
observe that
|z|2≤2/am
|vm|2hm
am|z|2
|vm|2hm
|z|2≤4/am
|vm|2hm
(2.7)
We will make use of the following elementary lemma:
Lemma 2.2 Fix b > 0 and q > 0. Then
|z|2≤b/am
(1−|z|2)mdz1∧dz1∧· · ·∧dzn∧dzn =
(m+ n)!
+O(m−q),
where the term O(m−q) depends only on b, q and n.
Then for any fixed b > 0 and for m sufficiently large,
|z|2≤b/am
|vm|2hm
(m+ n)!
(1 + O(m−1)).
From (2.6) and (2.7) we obtain
|sm,x0 |2hm
(m+ n)!
(1 + O(m−1)),
as required. �
Assume now that (X,ω) is a Kähler orbifold of complex dimension n
with isolated orbifold singularities at points p1, . . . , pk. Then for each x ∈
{p1, . . . , pk} there is an open neighborhood Vx ⊂ X containing x, a finite
subgroup Gx ∈ U(n), a Gx-invariant open neighborhood Ṽx of the origin in
n and a projection map πx : Ṽx → Ṽx/Gx ∼= Vx with πx(0) = x. Let h be an
orbifold Hermitian metric on an orbifold line bundle L with −
∂∂ log h =
ω. We show the following.
Lemma 2.3 There exists m1 > 0 depending only on X, L and h such that
for each m ≥ m1 the following holds. Let x0 ∈ X. Then
(i) If x0 satisfies
(d(x0, pi))
2 ≤ 1
, for some i ∈ {1, . . . , k},
where d( , ) is the distance function on X with respect to ω, then there
is a global holomorphic section sm,x0 of L
m satisfying |sm,x0 |2hm(x0) = 1
(m+ n)!
‖sm,x0‖2L2(hm) =
|Gpi |
+O(m−1).
(ii) If x0 satisfies
< (d(x0, pi))
, for some i ∈ {1, . . . , k},
where am = m/(logm)
2, then there is a global holomorphic section
sm,x0 of L
m satisfying |sm,x0 |2hm(x0) = 1 and
1 + O(m−1)
≤ (m+ n)!
‖sm,x0‖2L2(hm) ≤ C2
1 + O(m−1)
for positive constants C1 and C2 depending only on |Gpi |.
(iii) If x0 satisfies
(d(x0, pi))
, for every i ∈ {1, . . . , k},
then there exists a global holomorphic section sm,x0 of L
m satisfying
|sm,x0 |2hm(x0) = 1 and
(m+ n)!
‖sm,x0‖2L2(hm) = 1 + O(m
Proof We will choose m1 ≫ 1 later in the proof, depending only on X, L
and h. Let m ≥ m1. For (i), assume first that x0 ∈ X − {p1, . . . , pk}. We
may assume without loss of generality that
(d(x0, p1))
2 ≤ 1
and d(x0, pi) ≥ c > 0 for i = 2, . . . , k for some uniform c.
Dropping the subscript p1, we have a uniformizing coordinate system
π : Ṽ → Ṽ /G ∼= V centered at p1 ∈ V , so that at 0 ∈ Ṽ , the metric g is the
identity. The metric is G-invariant and smooth in Ṽ and has vanishing first
derivatives at the origin. Set |G| = l. Since the singularity is isolated, the
only fixed point of the action is 0 ∈ Ṽ , and so Ṽ − {0} is a l-fold cover of
V − {p1}. The preimage of x0 under the map π consists of l distinct points
which we will write as x̃1, . . . , x̃l ∈ Ṽ ⊂ Cn. We may assume that
0 < |x̃1|2 = |x̃2|2 = · · · = |x̃l|2 <
Let η and ψ be the functions defined earlier in the smooth case, and let ψ̃
be a weight function on U given by
ψ̃(z) =
ψ(z − x̃i).
Observe that ψ̃ is G-invariant, since ψ(z) is a function of |z|2 only. Hence ψ̃
can be regarded as a smooth function on X in the orbifold sense. Note also
that ψ̃ is non-positive everywhere. We have
−1∂∂ψ̃(z) ≥ −C(n+ 2)amω(z), for z ∈ Ṽ − {x1, . . . , xl}.
Hence for sufficiently large m, with ψ̃j approximating ψ̃ as before,
−1∂∂ψ̃j −
∂∂ log hm +Ric(ω) ≥
Let v be a local orbifold holomorphic section of L. We may assume without
loss of generality that |v|2h(p1) = 1. Pulling back to Ṽ we have (∂|v|2h)(0) = 0.
Let w be the smooth local section of Lm defined in Ṽ by
am|z − x̃i|2
Notice that w is G-invariant. If m1 is sufficiently large then
|x̃i − x̃j|2 ≤
for all i, j and it follows that w vanishes identically in the regions
|z − x̃i|2 ≤
|z − x̃i|2 ≥
for all i. In addition, w vanishes in the regions
|z|2 ≤ 1
|z|2 ≥ 16
It follows that w descends to a smooth global orbifold section of Lm and ψ̃
is uniformly bounded whenever w is not identically zero. Hence in V ,
|w|2hme−ψ̃ ≤ Cam|vm|2hm .
|w|2hme−ψ̃
and we can apply the orbifold version of Hörmander’s estimates to obtain a
smooth global orbifold section u of Lm satisfying ∂u = w and
|u|2hme−ψ̃
|w|2hme−ψ̃
(logm)2
1/4am≤|z|2≤16/am
|vm|2hm
We can write |v|2h(z) = 1 − |z|2 + O(|z|3) and it follows that, by a similar
argument as in the smooth case,
|u|2hm
≤ C(logm)2n−2m− logm/2−n.
Now set
sm,x0(z) =
am|z − x̃i|2
vm(z)− u(z),
so that sm,x0 is a global holomorphic orbifold section of L
m. Notice that
since
|u|2hme−ψ̃
it follows from the definition of ψ̃ that u(z − x̃i) = O(|z − x̃i|2) for each i.
Hence u(x0) = 0 and since |sm,x0 |2hm(x0) = |vm|2hm(x0), we have
1 ≥ |sm,x0 |2hm(x0) ≥ (1−
)m = em log(1−2/m
2) = 1−O(m−q), (2.8)
for any q. Calculate, remembering that Ṽ −{0} is an l-fold cover of V −{p1},
|sm,x0 |2hm
am|z − x̃i|2
|vm|2hm
am|z − x̃i|2
vm, u
|u|2hm
. (2.9)
The last two terms are O(m−q) for any q. For the first term, observe that
|z|2≤1/4am
|vm|2hm
am|z − x̃i|2
|vm|2hm
|z|2≤16/am
|vm|2hm
. (2.10)
From Lemma 2.2 we obtain
|sm,x0 |2hm
l(m+ n)!
(1 + O(m−1)).
Then from (2.8) we obtain the required section by rescaling.
The case when x0 is one of the singular points pi is easier, since we can
take the weight function to be ψ, which is of course G-invariant. The proof
follows as in the smooth case, except that a factor of l arises when estimating
the integral of |sm,x0 |2hm .
We now consider case (ii). We divide this into two parts:
< (d(x0, p1))
≤ (d(x0, p1))2 <
for a constant A≫ 0 depending on l to be determined later.
For (a), we can use almost the same argument as in (i). The only differ-
ence is that (2.8) becomes
1 ≥ |sm,x0 |2hm(x0) ≥ c > 0,
for a constant c depending on A. The required estimate follows after scaling
sm,x0 .
For (b) we argue as follows. Using the notation above, we work in the
coordinate patch Ṽ and consider the same weight function ψ̃. Let v1 be a
local holomorphic section of π∗L over Ṽ with the property that
|v1|2h(z) = 1− |z − x̃1|2 +O(|z − x̃1|3).
Observe that v1 is not G-invariant. Writing the elements of G as γ1, . . . , γl
with γ1 the identity element, we set vi = γ
i v1, so that
|vi|2h(z) = 1− |z − x̃i|2 +O(|z − x̃i|3).
Now define a local G-invariant section ŵ of Lm over Ṽ by
ŵ(z) =
am|z − x̃i|2
vmi (z).
We have
≤ |x̃i − x̃j |2 ≤
, (2.11)
and by a similar argument as in case (i),
|ŵ|2hme−ψ̃
Hence we can obtain a smooth global orbifold section û of Lm satisfying
∂û = ŵ and
|û|2ω
≤ C(logm)2n−2m− logm/2−n.
Let sm,x0 be the global holomorphic orbifold section of L
m given by
sm,x0(z) =
am|z − x̃i|2
vmi (z)− û(z).
Notice that û(x0) = 0. For i 6= j we have |vmi |2hm(x̃j) ≤ e−A/8, using (2.11),
and if A is sufficiently large depending only on l it follows that
1 ≥ |sm,x0 |2hm(x0) ≥ c > 0, (2.12)
for c depending only on l. Now
|sm,x0 |2hm
am|z − x̃i|2
am|z − x̃i|2
vmi , û
|û|2hm
. (2.13)
As before, the last two terms are O(m−q) for any q, and
(m+ n!)
(1 + O(m−1)) ≤ 1
am|z − x̃i|2
l(m+ n!)
(1 + O(m−1)), (2.14)
for a constant c′ > 0 depending only on l. Combining (2.12), (2.13) and
(2.14) completes part (b) of (ii).
For case (iii) we can avoid the singularities using the same argument as
in the smooth case. �
Remark 2.1 The result of Lemma 2.3.(ii) is clearly not sharp. It would be
interesting to know what the optimal estimates are in this case.
3. Convergence of Tsuji’s iteration
In this section we give a proof of Theorem 1. We begin with a simple and
well-known observation, which we will be useful later. Let X be any set, and
let H be a finite dimensional vector subspace of the vector space of functions
from X to C. Suppose that H is equipped with an inner product 〈 , 〉H . For
any orthonormal basis (v0, . . . , vN ) of H, define a function ρ : X → R by
ρ(x) =
i=0 |vi|2(x). Note that the function ρ is independent of the choice
of orthonormal basis. Now fix x ∈ X . Then it is possible to choose an
orthonormal basis (v0, . . . , vN ) such that vi(x) = 0 for i = 1, 2, . . . , N . The
observation is that
ρ(x) = sup
|v|2(x)
∣ v ∈ H, ‖v‖H = 1
= |v0|2(x). (3.1)
We now turn to the proof of Theorem 1. Notice that, in addition to a
Hermitian metric hm on K
X for each m ≥ m0, we have defined by (1.1) an
inner product 〈 , 〉Tm on H0(X,KmX ) for each m ≥ m0 + 1. Also, from the
Hermitian metric hKE on KX we have the L
2 inner product on H0(X,KmX )
given by hmKE and
ωnKE. We will denote this inner product simply by
〈·, ·〉KE.
We will use Lemma 2.1 on the existence of peak sections to prove the
following:
Lemma 3.1 Let m1 be the constant of Lemma 2.1 for L = KX , h = hKE.
Assume m1 ≥ m0. There exists A depending only on X such that for all
m ≥ m1,
hm1KE
≤ hm ≤ sup
hm1KE
(3.2)
Proof In the course of this proof, the constant A may change from line
to line. We will prove the upper bound on hm first. We use induction.
Obviously, the inequality holds for m = m1. Let
Cm = sup
and assume that hm ≤ CmhmKE. Notice that for any section t of H
0(X,Km+1X ),
‖t‖2Tm+1 ≤ Cm ‖t‖
Fix a point x0 ∈ X. Let sm+1,x0 ∈ H0(X,Km+1X ) be a peak section as
constructed in Lemma 2.1. Then calculate, from the definition of hm+1,
(m+ 1)!
(m+ n+ 1)!
hm+1(x0) ≤
‖sm+1,x0‖
|sm+1,x0 |
hm+1KE (x0)
≤ Cm ‖sm+1,x0‖
hm+1KE (x0)
(m+ 1)!
(m+ n+ 1)!
1 + O
hm+1KE (x0),
and it follows that
hm+1 ≤ sup
We turn now to the lower bound for hm. Again we use induction and assume
that hm ≥ DmhmKE for
Dm = inf
Fix x0 inX. Let (σ
m+1, . . . , σ
(Nm+1)
m+1 ) be an orthonormal basis of H
0(X,Km+1X )
with respect to the inner product 〈 , 〉Tm+1 . We may assume that
m+1(x0) = 0 for i = 1, . . . , Nm+1.
Observe that 1 = ‖σ(0)m+1‖2Tm+1 ≥ Dm‖σ
m+1‖2KE. Then
(m+ 1)!
(m+ n+ 1)!
hm+1(x0) = h
KE (x0)
|σ(0)m+1|2hm+1
≥ Dmhm+1KE (x0)
‖σ(0)m+1‖2KE
|σ(0)m+1|2hm+1
. (3.3)
Now let (τ
m+1, . . . , τ
(Nm+1)
m+1 ) be an orthonormal basis of H
0(X,Km+1X ) with
respect to the inner product 〈 , 〉KE. As before we may assume that
m+1(x0) = 0 for i = 1, . . . , Nm+1.
Then it follows that if t is any section of H0(X,Km+1X ), we have
‖t‖2KE
≤ |τ (0)m+1|
(x0). (3.4)
Hence
(m+ 1)!
(m+ n+ 1)!
hm+1(x0) ≥ Dmhm+1KE (x0)
|τ (0)m+1|2hm+1
. (3.5)
We consider again the peak section sm+1,x0 . Define real numbers a0, . . . , aNm+1
sm+1,x0 =
Then, using the second part of Lemma 2.1,
a2i =
sm+1,x0 ,
‖sm+1,x0‖KE
and so
a2i ≤ A
‖sm+1,x0‖2KE
(m+ 1)2
Now notice that
a20 = ‖sm+1,x0‖2KE −
≥ ‖sm+1,x0‖2KE
1−A 1
(m+ 1)2
Now since |sm+1,x0 |2hm+1
(x0) = 1, we have |τ
m+1|2hm+1
(x0) = 1/a
0. Then
from (3.5) we have
hm+1(x0) ≥ Dmhm+1KE (x0)
and the required lower bound follows. �
From this, we can prove Theorem 1.
Proof of Theorem 1 Raise (3.2) to the power 1/m. For the upper bound
of Theorem 1, observe that
for a constant C depending only on A. The lower bound follows similarly.
4. The modified iteration
We give a proof of Theorem 2. We omit the proof of Theorem 3, since it is
simpler and follows along the same lines. We first prove a convergence result
on the minimal surface Xmin of general type. Consider a Hermitian metric
hm0 on K
and write β = |S−2|2hC , for |S−2|
as in the introduction.
Recall that hmin is the Hermitian metric on KXmin given by f
∗hKE, for hKE
the Hermitian metric on KXcan corresponding to the Kähler-Einstein metric
ωKE. Consider the sequence of metrics hm,ε = hm,ε(β, hm0) on K
Theorem 4.1 For every sequence εj → 0,
lim sup
h1/mm,εj → hmin, as j → ∞,
almost everywhere on Xmin.
To prove this, we will need two lemmas.
Lemma 4.1 There exist m1 > 0 and A depending only on Xmin, β and ε
such that for all m ≥ m1,
βεhm,ε ≤ sup
hm1,ε
hmmin
. (4.1)
Proof We will use induction, for m1 to be determined later. Assume the
inequality holds for hm. Let
Cm = sup
hm1,ε
hm1min
Then βεhm,ε ≤ Cmhmmin. It follows that ‖t‖2Tm+1,ε ≤ Cm‖t‖
for t any
global section of Km+1Xmin . Since the inequality for hm+1,ε obviously holds at
points on C, it is sufficient to prove it at a fixed point y0 ∈ Xmin − C. Write
x0 = f(y0) ∈ Xcan. By Lemma 2.3 there is a global holomorphic section
sm+1,x0 of K
satisfying |sm+1,x0 |2hm+1
(x0) = 1 and
βε(y0)
|sm+1,x0 |2hm+1
(m+ 1)!
(m+ n+ 1)!
1 + O(m−1)
as long as m1 is chosen to be sufficiently large. Then
(m+ 1)!
(m+ n+ 1)!
βε(y0)hm+1,ε(y0)
≤ βε(y0)
‖f∗(sm+1,x0)‖
Tm+1,ε
|f∗(sm+1,x0)|
hm+1min (y0)
≤ Cmβε(y0)
|sm+1,x0 |2hm+1
hm+1min (y0)
(m+ 1)!
(m+ n+ 1)!
1 + O
hm+1min (y0),
and the lemma follows. �
For the lower bound of hm,ε, we use a modification of a lemma of Tsuji
[Ts]:
Lemma 4.2 There exist constants m2 and B depending only on Xmin such
that for all m ≥ m2 and 0 < ε ≤ 1,
βεh−1/mm,ε ≤ V
m−m2+1
βεh−1/m2m2,ε
)m2/m
(4.2)
for V =
ωnmin
Proof Set
Lm,ε =
βεh−1/mm,ε
(m+ n)!
(Nm + 1),
for Nm + 1 = dimH
0(Xmin,K
). Then we claim that
Lm,ε ≤ c1/mm L
(m−1)/m
m−1,ε . (4.3)
Given (4.3), we can finish the proof of the lemma as follows. First, by
Riemann-Roch, there exist constants m2 and B such that for m ≥ m2,
cm ≤ V
From (4.3), arguing by induction, we have
Lm,ε ≤ (cmcm−1 · · · cm2)
Lm2/mm2,ε ,
and the inequality (4.2) follows immediately. It remains to show (4.3). Using
Hölder’s inequality,
Lm,ε =
βεh−1/mm,ε h
1/(m−1)
m−1,ε h
−1/(m−1)
m−1,ε
βε(h−1/mm,ε h
1/(m−1)
m−1,ε )
−1/(m−1)
m−1,ε
−1/(m−1)
m−1,ε
βεh−1m,εhm−1,ε
(m−1)/m
m−1,ε
(m+ n)!
σ(i)m,ε ⊗ σ
m,ε ⊗ hm−1,ε
(m−1)/m
m−1,ε
= c1/mm L
(m−1)/m
m−1,ε ,
and this completes the proof of the lemma. �
We can now use these lemmas to prove a convergence result for the
metrics hm,ε.
Proof of Theorem 4.1 From Lemma 4.1 we have
h−1/mm,ε ≥ E(m, ε)h−1min on Xmin − C,
where E(m, ε) → 1 as m→ ∞. From Lemma 4.2, we have
βεh−1/mm,ε ≤ V F (m, ε),
where F (m, ε) → 1 as m→ ∞. Writing hε = lim supm→∞ h
m,ε , we have
h−1ε − h−1min
h−1min
h−1min =
lim inf
βε(h−1/mm,ε − E(m, ε)h−1min)
≤ lim inf
βεh−1/mm,ε −
βεh−1min
≤ V −
βεh−1min → 0,
as ε→ 0. Theorem 4.1 follows. �
Finally, we complete the proof of Theorem 2.
Proof of Theorem 2 Using the notation given in the introduction, there
is an isomorphism Θ : H0(Xmin,K
) → H0(X,KmX ) given by Θ(s) =
τ∗s⊗ Sm−1. Then, given an inner product Tm,ε on H0(X,KmX ), we can define
an inner product T̂m,ε on H
0(Xmin,K
) by 〈s, t〉
T̂m,ε
= 〈Θ(s),Θ(t)〉Tm,ε .
Then given an initial Hermitian metric hm0 on K
X we can obtain an
inner product 〈·, ·〉
T̂m0+1,ε
on Km0+1Xmin and hence a Hermitian metric ĥm0+1,ε
. Applying the modified Tsuji iteration as in the case of Theorem
4.1, we obtain a sequence of Hermitian metrics ĥm,ε form ≥ m0+1 onKmXmin.
From the definition of S−1, one can check that hm,ε = τ
∗ĥm,ε ⊗ |S−1|−2m.
Indeed, assuming inductively that hm,ε = τ
∗ĥm,ε ⊗ |S−1|−2m, denote by
〈·, ·, 〉′
T̂m+1,ε
the inner product induced by ĥm,ε on H
0(Xmin,K
). We need
to show that 〈s, t〉′
T̂m+1,ε
= 〈s, t〉
T̂m+1,ε
for s, t ∈ H0(Xmin,Km+1Xmin). But
〈s, t〉′
T̂m+1,ε
ĥm,ε ⊗ s⊗ t
hm,ε ⊗ |S−1|2m ⊗ (τ∗s)⊗ (τ∗t)⊗ |S−1|2
= 〈Θ(s),Θ(t)〉Tm+1,ε
= 〈s, t〉
T̂m+1,ε
Now by Theorem 4.1, we see that for any sequence εj → 0, we have
lim sup
ĥ1/mm,εj → hmin,
almost everywhere. Theorem 2 follows immediately. �
References
[A] Aubin, T. Équations du type Monge-Ampère sur les variétés
kählériennes compactes. Bull. Sci. Math. (2) 102 (1978), no. 1, 63–95.
[C] Catlin, D. The Bergman kernel and a theorem of Tian, Analysis and
geometry in several complex variables (Katata, 1997), 1–23, Trends
Math., Birkhäuser Boston, Boston, MA, 1999.
[DLM] Dai, X., Liu, K. and Ma, X. On the asymptotic expansion of Bergman
Kernel, J. Differential Geom. 72 (2006), no. 1, 1–41.
[D1] Donaldson, S.K. Scalar curvature and projective embeddings. I., J. Dif-
ferential Geom. 59 (2001), no. 3, 479–522.
[D2] Donaldson, S.K. Some numerical results in complex differential geom-
etry, preprint, math.DG/0512625.
[H] Hörmander, L. An introduction to complex analysis in several variables,
Second revised edition. North-Holland Mathematical Library, Vol. 7.
North-Holland Publishing Co., Amsterdam-London; American Elsevier
Publishing Co., Inc., New York, 1973.
[K] Kobayashi, R., Einstein-Kähler V -metrics on open Satake V -surfaces
with isolated quotient singularities. Math. Ann. 272 (1985), no. 3, 385–
[L] Lu, Z. On the lower order terms of the asymptotic expansion of Tian-
Yau-Zelditch, Amer. J. Math. 122 (2000), no. 2, 235–273.
[P] Paoletti, R. Szegö kernels and finite group actions, Trans. Amer. Math.
Soc. 356 (2004), no. 8, 3069–3076.
[S] Song, J. The Szegö kernel on an orbifold circle bundle, preprint,
math.DG/0405071.
[Ti] Tian, G. On a set of polarized Kähler metrics on algebraic manifolds,
J. Differential Geom. 32 (1990), 99–130.
[TZ] Tian, G. and Zhang, Z. On the Kähler-Ricci flow on projective man-
ifolds of general type, Chinese Ann. Math. Ser. B 27 (2006), no. 2,
179–192.
[Ts] Tsuji, H. Dynamical construction of Kähler-Einstein metrics, preprint,
math.AG/06066262.
[Y1] Yau, S.-T. On the Ricci curvature of a compact Kähler manifold and
the complex Monge-Ampère equation, I, Comm. Pure Appl. Math. 31
(1978), no.3, 339–411.
[Y2] Yau, S.-T. Open problems in geometry, Proc. Symposia Pure Math. 54
(1993), 1–28
http://arxiv.org/abs/math/0512625
http://arxiv.org/abs/math/0405071
http://arxiv.org/abs/math/0606626
[Ze] Zelditch, S. Szegö kernels and a theorem of Tian, Internat. Math. Res.
Notices (1998), no. 6, 317–331.
[Zh] Zhang, S. Heights and reductions of semi-stable varieties, Compositio
Math. 104 (1996), 77–105.
ABSTRACT
  We show that on Kahler manifolds with negative first Chern class, the
sequence of algebraic metrics introduced by H. Tsuji converges uniformly to the
Kahler-Einstein metric. For algebraic surfaces of general type and orbifolds
with isolated singularities, we prove a convergence result for a modified
version of Tsuji's iterative construction.

<|endoftext|><|startoftext|>
Introduction
Starting from the seminal works by Poincaré [13] and Denjoy [3], a deep theory for the dy-
namics of circle diffeomorphisms has been developed by many authors [1, 7, 8, 17], and most of
the fundamental related problems have been already solved. Quite surprisingly, the case of several
commuting diffeomorphisms is rater special, as it was pointed out for the first time by Moser [9]
in relation to the problem of the smoothness for the simultaneous conjugacy to rotations. Roughly
speaking, in this case it should be enough to assume a joint Diophantine condition on the rotation
numbers which does not imply a Diophantine condition for any of them (see the recent work [5] for
the solution of the C∞ case of Moser’s problem).
A similar phenomenon concerns the classical Denjoy Theorem. Indeed, in [4] it was proved
that if d ≥ 2 is an integer number and τ > 1/d, then the elements f1, . . . , fd of any family of
C1+τ commuting circle diffeomorphisms are simultaneously (topologically) conjugate to rotations
provided that their rotation numbers are independent over the rationals (that is, no non trivial
linear combination of them with rational coefficients equals a rational number). In other words,
the classical (and nearly optimal) C2 hypothesis for Denjoy Theorem can be weakened in the case
of several commuting diffeomorphisms. The first and main result of this work is a generalization of
this fact to the case of different regularities.
Theorem A. Let d ≥ 2 be an integer number and τ1, . . . , τd be real numbers in ]0, 1[ such that
τ1+ · · ·+ τd > 1. If fk, k ∈ {1, . . . , d}, are respectively C
1+τk circle diffeomorphisms which have ro-
tation numbers independent over the rationals and which do commute, then they are simultaneously
(topologically) conjugate to rotations.
Since the probabilistic arguments of [4] cannot be applied to the case of different regularities, the
preceding result is much more than a straightforward generalization of Theorem A of [4]. Indeed,
for the proof here we use a key new argument which is somehow more deterministic.
Theorem A is (almost) optimal (in the Hölder scale), in the sense that if one decreases slightly the
regularity assumptions then it is no longer true. The following result relies on classical constructions
by Bohl [2], Denjoy [3], Herman [7], and Pixton [12], and its proof consists on an easy extension of
the construction given by Tsuboi in [16].
http://arxiv.org/abs/0704.1006v1
Denjoy Theorem for commuting diffeomorphisms 2
Theorem B. Let d ≥ 2 be an integer number and τ1, . . . , τd be real numbers in ]0, 1[ such that
τ1 + · · ·+ τd < 1. If ρ1, . . . , ρd are elements in R/Z which are independent over the rationals, then
there exist C1+τk circle diffeomorphisms fk, k ∈ {1, . . . , d}, having rotation numbers ρk, which do
commute, and such that none of them is topologically conjugate to a rotation.
It is well known that the techniques developed for Denjoy Theory can be applied to the study of
group actions on the interval. In this direction we should point out that the methods of this paper
also allow to extend (in a straightforward way) the so called “Generalized Kopell Lemma” and
the “Denjoy-Szekeres Type Theorem” (Theorems B and C of [4] respectively) for Abelian groups
of interval diffeomorphisms under analogous hypothesis of different regularities. Furthermore, the
construction of counter-examples for both of them when these hypothesis do not hold can be also
extended to this context. We leave the verification of all of this to the reader.
Acknowledgments. It is a pleasure to thank Bassam Fayad and Sergey Voronin for their encour-
agements, as well as the Independent University of Moscow for the hospitality during the conference
“Laminations and Group Actions in Dynamics” held in February 2007. The first author was sup-
ported by the Swiss National Science Foundation. This work was also funded by the RFBR grants
7-01-00017-a and CNRS-L−a 05-01-02801, and by the CONICYT grant 7060237.
1 A general principle revisited
As it is well known since the classical works by Denjoy, Schwartz and Sacksteder [3, 14, 15], if I is
a wandering interval1 for the dynamics of a finitely generated semigroup Γ of C1+lip diffeomorphisms
of the closed interval or the circle (on which we will always consider the normalized length), one
can control the distortion of the elements of Γ over (a slightly larger interval than) I in terms of
the sum of the lengths of the images of I along the corresponding sequence of compositions and a
uniform Lipschitz constant for the derivatives of the (finitely many) generators of Γ. If τ belongs
to ]0, 1[ and Γ consists of C1+τ diffeomorphisms, the same is true provided that the sum of the
τ -powers of the lengths of the corresponding images of I is finite (this last condition does not follow
from the disjointness of these intervals !): see for instance [4], Lemma 2.2. It is not difficult to
prove a similar statement for the case of different regularities, and this is precisely the content of
the following lemma. However, to the difference of [4], here we will deal with finite sequences of
compositions by a technical reason which will be clear at the end of the next section.
Lemma 1.1. Let Γ be a semigroup of (orientation preserving) diffeomorphisms of the circle or the
closed interval which is generated by finitely many elements gk, k ∈ {1, . . . , l}, which are respectively
of class C1+τk , where τk∈]0, 1]. Let Ck denote the τk-Hölder constant of the function log(g
k), and
let C = max{C1, . . . , Cl} and τ = max{τ1, . . . , τl}. Given n0 ∈ N, for each n ≤ n0 let us chose
kn ∈ {1, . . . , l}, and for a fixed interval I let S > 0 be a constant such that
∣gkn · · · gk1(I)
τkn+1 . (1)
If n ≤ n0 is such that gkn · · · gk1(I) does not intersect I but is contained in the L-neighborhood of
I, where L := |I|/2 exp(2τCS), then gkn · · · gk1 has a hyperbolic fixed point.
1We say that an interval is wandering if its images by different elements of the underlying semigroup are disjoint.
Denjoy Theorem for commuting diffeomorphisms 3
Proof. Let J = [a, b] be the (closed) 2L-neighborhood of I, and let I ′ (resp. I ′′) the connected
component of J \I to the right (resp. to the left) of I. We will prove by induction on j∈{0, . . . , n0}
that the following two conditions are satisfied:
(i)j |gkj · · · gk1(I
′)| ≤ |gkj · · · gk1(I)|,
(ii)j sup{x,y}⊂I∪I′
(gkj ···gk1)
(gkj ···gk1)
≤ exp(2τ CS).
Condition (ii)0 is trivially satisfied, whereas condition (i)0 is satisfied since |I
′| = 2L ≤ |I|.
Assume that (i)i and (ii)i hold for each i ∈ {0, . . . , j − 1}. Then for every x, y in I ∪ I
′ we have
(gkj · · · gk1)
(gkj · · · gk1)
∣ log(g′ki+1(gki · · · gk1(x))) − log(g
(gki · · · gk1(y)))
Cki+1
∣gki · · · gk1(x)− gki · · · gk1(y)
τki+1
|gki · · · gk1(I)|+ |gki · · · gk1(I
)τki+1
≤ C 2τ
|gki · · · gk1(I)|
τki+1
≤ C 2τS.
This shows (ii)j. To verify (i)j first note that there must exist x ∈ I and y ∈ I
′ such that
|gkj · · · gk1(I)| = |I| · (gkj · · · gk1)
′(x) and |gkj · · · gk1(I
′)| = |I ′| · (gkj · · · gk1)
′(y).
Therefore, by (ii)j ,
|gkj · · · gk1(I
|gkj · · · gk1(I)|
(gkj · · · gk1)
(gkj · · · gk1)
|I ′|
≤ exp(2τCS)
|I ′|
which proves (i)j . Obviously, similar arguments show that (i)j and (ii)j also hold for every
j ∈ {0, . . . , n0} when we replace I
′ by I ′′.
Now for simplicity let us denote hj = gkj · · · gk1 . Assume that hn(I) is contained in the L-
neighborhood of the interval I (see Figure 1). Then property (i)n gives hn(J) ⊂ J , and this already
implies that hn has a fixed point x in J . (The reader will see that the existence of this fixed point
together with the fact that hn 6= id is the only information that we will retain for the proof of
Theorem A.)
To conclude we would like to show that the fixed point x is hyperbolic. To do this just note
that, if hn(I) does not intersect I, then there exists y ∈ I such that
h′n(y) =
|hn(I)|
Therefore, by (ii)n,
h′n(x) ≤ h
n(y) exp(2
τCS) ≤
L exp(2τCS)
and this finishes the proof. �
Denjoy Theorem for commuting diffeomorphisms 4
.......................
...............................
.................................................................................................................................................................................
.................................................................
...........
.................................................................................................................................................................................................................................................................................................................................................................
......................................
..............................
.........................
......................
....................
...................
. ............................
.............
.......... ..........
hn hn
...........................................................
.........
Figure 1
hn(I)I
hyperbolic
fixed point
2 Proof of Theorem A
Recall the following well known argument (see for instance [6], Proposition 6.17, or [11], Lemma
4.1.4). If f1, . . . , fd are commuting circle homeomorphisms, then there is a common invariant
probability measure µ on S1. Moreover, if the rotation number of at least one of them is irrational,
then there is no finite orbit for the group action, and the measure µ has no atom. Therefore, the
distribution function
Fµ : S
1 → R/Z, Fµ(x) := µ([0, x[),
gives a (simultaneous) semiconjugacy between the maps f1, . . . , fd and the rotations corresponding
to their rotation numbers. Thus, for the proof of Theorem A we have to show that this semiconju-
gacy is in fact a conjugacy, and our strategy for proving this (under the hypothesis of the Theorem)
is the classical one and goes back to Schwartz [15]. Indeed, in the contrary case the support of
µ would be a (minimal) invariant Cantor set, and the connected components of its complement
would correspond to the maximal wandering open intervals. Fixing one of these intervals, say I,
we will search for a sequence of compositions hn = fkn · · · fk1 satisfying the hypothesis of Lemma
1.1. This will allow us to conclude that some hn has a (hyperbolic) fixed point, thus implying that
its rotation number is equal to zero. However, this is in contradiction to the fact that the rotation
numbers of the fk’s are independent over the rationals (it is easy to verify that the rotation number
restricted to any group of circle homeomorphisms which preserves a probability measure on S1 is a
group homomorphism: see again [6] or [11]).
In order to ensure the existence of the sequence (hn) the main idea of [4] was to endow the
space of all (infinite) sequences of compositions with a natural probability measure, and then to
prove that the “generic ones” satisfy many nice properties as for instance the convergence of the
sum (1) as n0 goes to infinity. It seems that such a probabilistic argument cannot be applied to
the case of different regularities, and we will need to introduce a new argument which is somehow
more deterministic, since it gives partial information on the sequence that we find. For simplicity
we will first deal with the case d=2.
2.1 The case d = 2
Although not explicitly stated in [4], the main probabilistic argument for the proof of the
Generalized Denjoy Theorem therein is not a dynamical issue, but it is just a statement concerning
the finiteness of the sum of the τ -powers of some positive real numbers. To be more concrete (at
least in the case d = 2 and when τ > 1/2), if (ℓi,j) is a double-indexed sequence of positive numbers
with finite total sum (where i and j are non negative integers), then with respect to some natural
probability distribution on the space of infinite paths (i(n), j(n))n≥0 satisfying i(0) = j(0) = 0,
i(n+1) ≥ i(n), j(n+1) ≥ j(n) and i(n+1)+ j(n+1) = 1+ i(n)+ j(n), one has almost everywhere
Denjoy Theorem for commuting diffeomorphisms 5
the convergence of the sum
ℓτi(n),j(n).
The first goal of this section is to prove the existence of paths sharing a similar property in the
case of different exponents τ1, τ2 in ]0, 1[ (with τ1 + τ2 > 1). A substantial difference here is that
we will construct our sequence by concatenating infinitely many finite paths, and each one of these
paths will be chosen among finitely many ones. To do this we begin with the following elementary
lemma.
Lemma 2.1. Let ℓi,j be positive real numbers, where i ∈ {1, . . . ,m} and j ∈ {1, . . . , n}. Assume
that the total sum of the ℓi.j’s is less than or equal to 1. If τ belongs to ]0, 1[, then there exists
k ∈ {1, . . . , n} such that
ℓτi,k ≤
Proof. We will show that the mean value of the function k 7→
i=1 ℓ
i,k is less than or equal to
m1−τ/nτ , from where the claim of the lemma follows immediately. To do this first note that, by
Hölder’s inequality, for each fixed k ∈ {1, . . . , n} one has
ℓτi,k =
(ℓτi,k)
i=1, (1)
∥(ℓτi,k)
· ‖(1)mi=1‖1/(1−τ) =
m1−τ .
Thus, by using Hölder’s inequality again one obtains
ℓτi,k
, (1)
· ‖(1)
k=1‖1/(1−τ)
which finishes the proof. �
Now we explain the main idea of our construction. Let us assume that the total sum of the
double-indexed sequence of positive numbers ℓi,j is ≤ 1, and suppose that the numbers τ1∈]0, 1[ and
τ2∈]0, 1[ such that τ1+τ2 > 1 are fixed. Denoting by [[a, b]] the set of integers between a and b (with
a and b included when they are in Z), let us consider any sequence of rectangles Rm ⊂ N0×N0 such
that R0 = {(0, 0)}, R2m+1 = [[im, im+1]]× [[jm, jm+2]] and R2m+2 = [[im, im+2]]× [[jm+1, jm+2]],
where (im)m≥1 and (jm)m≥1 are strictly increasing sequences of non negative integers numbers
satisfying i0 = i1 = 0 and j0 = j1 = 0 (see Figure 2). Denoting by Xm and Ym respectively the
number of points on the horizontal and vertical sides of each Rm, a direct application of Lemma
2.1 gives us, for ε := 1− τ1 − τ2 > 0 and each m ≥ 0:
Denjoy Theorem for commuting diffeomorphisms 6
– an integer r(2m+ 1) ∈ [[im, im+1]] such that
r(2m+1),j
Y 1−τ22m+1
Y τ12m+1
· Y −ε2m+1,
– an integer r(2m+ 2) ∈ [[jm+1, jm+2]] such that
i,r(2m+2)
Y τ12m+2
Y τ12m+2
·X−ε2m+2.
Starting from the origin and following the corresponding horizontal and vertical lines, we find
an infinite path (i(n), j(n))n≥0 satisfying
i(0) = j(0) = 0, i(n + 1) ≥ i(n), j(n+ 1) ≥ j(n), i(n+ 1) + j(n + 1) = 1 + i(n) + j(n),
and such that the sum
τα(n)
i(n),j(n)
is bounded by
· Y −ε2m+1 +
·X−ε2m+2
, (3)
where α(n) := 1 if |i(n + 1)− i(n)| = 1 and α(n) := 2 if |j(n + 1)− j(n)| = 1.
Figure 2
i0 = i1 i2 i3 i4 i5
j0 =j1
•••••
•••••••••••••••••••••••
•••••••••••••••••••••••••••••
R5 R6
R7 R8
Denjoy Theorem for commuting diffeomorphisms 7
Now let us consider any choice such that im = [4
mτ1 ] and jm = [4
mτ2 ] for m large enough.
Writing am ≃ bm when (am) and (bm) are sequences of positive numbers such that (am/bm) remains
bounded and away from zero, for such a choice we have Xm ≃ 2
mτ1 and Ym ≃ 2
mτ2 . Thus,
(2mτ1)τ2
(2mτ2)τ1
and therefore there exists C > 0 such that, for each m ≥ 0,
Y τ1m
This implies that the sum in (3) is bounded by
S := C
 = C
4τ2ε − 1
4τ1ε − 1
, (4)
and so the value of the sum (2) is finite (and also bounded by S).
We can now proceed to the proof of Theorem A in the case d=2. Assume by contradiction that
fk, k∈{1, 2}, are respectively C
1+τk commuting circle diffeomorphisms which are not simultaneously
conjugate to rotations and which have rotation numbers independent over the rationals. Let I be a
connected component of the complement of the invariant minimal Cantor set for the group action,
and let ℓi,j = |f
2 (I)|. We obviously have
i,j ℓi,j ≤ 1, and so we can apply all our previous
discussion to this sequence. In particular, there exists an infinite path (i(n), j(n)) starting at the
origin and such that the sum
τα(n)
i(n),j(n)
is bounded by the number S > 0 defined by (4). If for n ≥ 1 we let kn = α(n − 1) ∈ {1, 2}, then
we obtain a sequence of compositions hn = fkn · · · fk1 such that the preceding sum coincides term
by term with
|fkn · · · fk1(I)|
τkn+1 .
Thus, in order to apply Lemma 1.1 to get a contradiction, we just need to verify that, for some
n ≥ 1, the hypothesis that hn(I) = fkn · · · fk1(I) is contained in the L-neighborhood of I is satisfied
(where L := |I|/2 exp(2τCS), τ := max{τ1, τ2}, and C := max{C1, . . . , Cd}, with Ck being the τk-
Hölder constant for the function log(f ′k)).
To to this first note that, if we collapse all the connected components of the complement of the
minimal invariant Cantor set, then we obtain a topological circle Ŝ1 on which the original diffeomor-
phisms induce naturally minimal homeomorphisms f̂1 and f̂2 which are simultaneously conjugate
to rotations. Moreover, the L-neighborhood of I becomes a non degenerate interval Û ; thus, there
exists N ∈ N such that the intervals f̂−11 (Û), . . . , f̂
1 (Û), as well as f̂
2 (Û), . . . , f̂
2 (Û), cover the
circle Ŝ1. This easily implies that for any image I0 of I by some element of the semigroup generated
by f1 and f2 there exists k and k
′ in {1, . . . , N} such that fk1 (I0) and f
2 (I0) are contained in the
L-neighborhood of I. Now it is easy to see that, for the sequence of compositions that we found,
for every N̄ ∈ N there exists some integer r ∈ N such that kr = kr+1 = . . . = kr+N̄ . For N̄ = N
this obviously implies that at least one of the intervals hr+1(I), . . . , hr+N (I) is contained in the
L-neighborhood of I, thus finishing the proof.
Denjoy Theorem for commuting diffeomorphisms 8
Figure 3
•••••••••
•••••••••••••
••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
R′2 R
R′4 R
R′6 R
x′0 x
y′0=y
y′2=y
y′4=y
y′6=y
We would like to close this section by giving a different type of choice for the sequence of
rectangles which is simpler to describe and for which the preceding arguments are also valable. (For
simplicity, we will use a similar construction to deal with the case d > 2, altough the preceding
one still applies). This sequence (R′m)m≥0 is of the form [[0, x
m]] × [[0, y
m]], where (x
m) and (y
are non decreasing sequences of positive integer numbers such that x′0 = y
0 = 0, x
m > x
m−1 and
y′m = y
m−1 if m is odd, and x
m = x
m−1 and y
m > y
m−1 if m is even. If (ℓi,j) is a double-indexed
sequence of positive real numbers with total sum ≤ 1, we chose these integer numbers in such a
way that x′2m+1 = x
2m+2 = [4
mτ1 ] and y′2m = y
2m+1 = [4
mτ2 ] for m large enough. As before, inside
the rectangle Rm there is a “good” vertical (resp. horizontal) segment of line Lm for m even (resp.
odd). Therefore, for each M0 ∈ N we can concatenate these segments between Lm−1 ∩ Lm and
Lm ∩ Lm+1 at the m
th step for m < M0, and between LM0−1 ∩ LM0 and the point of LM0 on the
boundary of RM0 at the last step (see Figure 3). In this way we obtain a path (starting at the
origin) of finite length n(M0)− 1 for which the sum
n(M0)−1
τα(n)
i(n),j(n)
is bounded by some number S > 0 which is independent of M0.
Now let fk, k ∈{1, 2}, be two commuting circle diffeomorphisms of class C
1+τk which are not
simultaneously conjugate to rotations. Fix again one of the maximal wandering open intervals for
the dynamics, say I, and let ℓi,j = |f
2 (I)|. (Note that
i,j ℓi,j ≤ 1.) The method above gives us
a family of finite paths, and each of these paths determines uniquely a sequence of compositions.
Remark however that there is a little difference here, since we allow the use of the inverses of f1
and f2. Therefore, in order to apply Lemma 1.1, we will need to consider now {f1, f
1 , f2, f
2 } as
being our system of generators, and therefore we put τ = max{τ1, τ2} and C = max{C1, C2, C
where Ci (resp. C
i) is a τi-Hölder constant for the function log(f
i) (resp. log((f
′)). As in the
previous proof, we need to verify that, for some M0 ∈ N, there exists a non trivial element in the
sequence of compositions (hn) associated to its corresponding finite path which sends I inside the
L-neighborhood of itself, where L := |I|/2 exp(2τCS). As before, for proving this it suffices to show
that for every N there exists r ∈ N such that one has hr+i+1 = f1hr+i for each i ∈ {0, . . . , N − 1},
or hr+i+1 = f2hr+i for each i ∈ {0, . . . , N − 1}. However, this last property is always satisfied if
Denjoy Theorem for commuting diffeomorphisms 9
M0 is big enough so that the number of points with integer coordinates in the line segment LM0
contained in RM0 \ RM0−1 is greater than N . Note that it is in this last argument where we use
the fact that we keep only finite sequences of compositions, altough our method combined with a
diagonal type argument easily shows the existence of an infinite sequence for which the sum (2)
converges.
2.2 The general case
In the case d = 2, the “good” paths leading to the sequence of compositions which allows to apply
Lemma 1.1 were obtained by concatenating horizontal and vertical lines. When d > 2 we will need
to concatenate lines in several (namely d) directions, and the geometrical difficulty for doing this
is evident: in dimension bigger than 2, two lines in different directions do not necessarily intersect.
To overcome this difficulty we will use the fact that, at each step (i.e. inside each rectangle), there
is not only one finite path which is good, but this is the case for a “large proportion” of finite paths.
We first reformulate Lemma 2.1 in this direction.
Lemma 2.2. Let ℓi,j be positive real numbers, where i ∈ {1, . . . ,m} and j ∈ {1, . . . , n}. Assume
that the total sum of the ℓi.j’s is less than or equal to 1. If τ belongs to ]0, 1[ and A > 1, then for
a proportion of indexes k ∈ {1, . . . , n} greater than or equal to (1− 1/A) we have
ℓτi,k ≤ A
Proof. As in the proof of Lemma 2.1, the mean value of the function
ℓτi,k (5)
is less than or equal to m1−τ/nτ . The claim of the lemma then follows as a direct application
of Chebychev’s inequality: the proportion of points for which the value of (5) is greater than this
mean value times A cannot exceed 1/A. �
Now let (ℓi1,...,id) be a multi-indexed sequence of positive real numbers having total sum ≤ 1,
and let τ1, . . . , τd be real numbers in ]0, 1[. Starting with R0 = [[0, 0]]
d, let us consider a sequence
(Rm)m≥0 of rectangles of the form Rm = [[0, x1,m]]× · · · × [[0, xd,m]] satisfying xk,m ≥ xk,m−1 for
each k ∈ {1, . . . , d}, with strict inequality if and only if k ≡ m (mod d). For each m ≥ 1 denote by
s(m) ∈ {1, . . . , d} the residue class (mod d) of m, and denote by Fm the face
[[0, x1,m]]× · · · × [[0, xs(m)−1,m]]× {0} × [[0, xs(m)+1,m]]× · · · × [[0, xd,m]]
of Rm. For each (i1, . . . , is(m)−1, 0, is(m)+1, . . . , id) belonging to this face Fm we consider the sum
xs(m),m
τs(m)
i1,...,is(m)−1,j,is(m)+1,...,id
By Lemma 2.2, if Am > 1 then the proportion of points in Fm for which this sum is bounded by
(1 + xs(m),m)
1−τs(m)
j 6=s(m)
(1 + xj,m)
τs(m)
= Am ·
1−τs(m)
s(m),m
j 6=s(m)
τs(m)
is at least equal to (1− 1/Am), where Xj,m := 1+xj,m. In order to concatenate the corresponding
lines we will use the following elementary lemma.
Denjoy Theorem for commuting diffeomorphisms 10
.................
.................. s(m)-direction
..................
..................
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
......
............
............
............
............
............
............
............
............
.......
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
......
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
......
..................
......................
Fm+1•
s(m+ 1)-direction
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
......
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
......
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
.....
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
............
.....
.................. ..................
Figure 4
(i1, . . . , is(m+1)−2, 0, 0, is(m+1)+1, . . . , id)
admissible in Cm
(i1, . . . , is(m+1)−1, 0, 0, is(m+1)+2, . . . , id)
admissible in Cm+1
Lemma 2.3. Let us chose inside each rectangle (Rm)m≥1 a set L(m) of (complete) lines in the
corresponding s(m)-direction whose proportion (with respect to all the lines in that direction inside
(Rm)) is at least (1 − 1/Am). If M0∈N is such that
m=1 1/Am<1, then there exists a sequence
of lines Lm ∈ L(m), m ∈ {0, . . . ,M0}, such that Lm+1 intersects Lm for every m < M0.
Proof. Let us denote by Cm the (d− 2)-dimensional face of Rm given by
[[0, x1,m]]× · · · × [[0, xs(m)−1,m]]× {0} × {0} × [[0, xs(m)+2,m]]× · · · × [[0, xd,m]].
Call a point (i1, . . . , is(m)−1, 0, 0, is(m)+2 , . . . , id) ∈ Cm admissible if there exists a sequence of lines
Li∈L(i), i∈{0, . . . ,m}, such that Li intersects Li+1 for every i∈{0, . . . ,m− 1}, and such that Lm
projects in the s(m)-direction into a point (i1, . . . , is(m)−1, 0, is(m)+1, is(m)+2, . . . , id) ∈ Fm for some
is(m)+1∈ [[0, xs(m)+1,m+1]]. We will show that the proportion of admissible points in CM0 is greater
than or equal to
P := 1−
Am > 0.
To prove this, for each m ≥ 0 let us denote by Pm the proportion of admissible points in Cm.
Since R0 reduces to the origin, it suffices to show that, for all m ≥ 0,
Pm+1 ≥ Pm −
To prove this inequality first note that each line Lm+1 ∈ L(m + 1) determines uniquely a point
(i1, . . . , is(m+1)−1, 0, is(m+1)+1, . . . , id)∈Fm+1. The projection into Cm of this line then corresponds
to the point
(i1, . . . , is(m+1)−2, 0, 0, is(m+1)+1 , . . . , id).
If this is an admissible point of Cm then we can concatenate the line Lm+1 to the sequence of
lines corresponding to it (see Figure 4). Now the proportion of lines in L(m + 1) being at least
Denjoy Theorem for commuting diffeomorphisms 11
1− 1/Am+1, the proportion of those lines which project on Cm into an admissible point is at least
equal to
− (1− Pm) = Pm −
By projecting in the (s(m+1)+1)-direction, this obviously implies that the proportion of admissible
points in Cm+1 is also greater than or equal to Pm − 1/Am+1, thus finishing the proof. �
Observe that a sequence of lines Lm as above determines a finite path (starting at the origin) of
points (x1(n), . . . , xd(n)) having non negative integer coordinates such that the distance between
two consecutive ones is equal to 1. Moreover, if we denote by n(M0) the length of this path plus 1,
the corresponding sum
n(N0)−1
τα(n)
x1(n),...,xd(n)
is bounded by
(1 + xs(m),m)
1−τs(m)
i 6=s(m)
(1 + xi,m)
τs(m)
1−τs(m)
s(m),m
j 6=s(m)
τs(m)
, (7)
where α(n) equals the unique index in {1, . . . , d} for which |xα(n)(n+ 1)− xα(n)(n)| = 1.
Now let us define Am=2
εmτs(m)/2A, where A is a large enough constant so that
m≥0 1/Am<1,
and let us consider any choice of the xk,m’s so that Xk,m ≃ 2
mτk . For such a choice we have
j 6=k
= X−εk,m ·
j 6=k
≃ 2−εmτk ·
j 6=k
(2mτk )τj
(2mτj )τk
= 2−εmτk , (8)
where ε := 1 − τ1 − · · · − τd > 0. Therefore, for each M0 ∈ N the preceding lemma provides us a
sequence of lines Lm, m ∈ {0, . . . ,M0}, such that Lm+1 intersects Lm for each m < M0, and such
that the corresponding expression (7) is bounded from above by
2εmτs(m)/2A ·
j 6=k
≤ AC ′
2−εmτs(m)/2 ≤ AC ′
2−εmτ
′/2 =: S < ∞, (9)
where τ ′ := min{τ1, . . . , τd} and C
′ is a constant (independent of M0) giving an upper bound for
the quotient between the left and the right hand expressions in (8).
With all this information in mind we can proceed to the proof of Theorem A in the case d > 2 in
the very same way as in the (second proof for the) case d = 2. Indeed, assume that fk, k ∈ {1, . . . , d},
are circle diffeomorphisms as in the statement of the theorem which are not conjugate to rotations,
and let I be a maximal open wandering interval for the dynamics (i.e. a connected component
of the complement of the minimal invariant Cantor set). Clearly, we can apply all our previous
discussion to the multi-indexed sequence (ℓi1,...,id) defined by ℓi1,...,id = |f
1 · · · f
(I)|. In particular,
for each M0 ∈ N we can find a finite path so that the sum (6) is bounded by the number S > 0
defined by (9) (which is independent of M0). Each such a path induces canonically a finite sequence
of compositions by the fk’s and their inverses. Therefore, in order to apply Lemma 1.1 to get a
contradiction, we need to verify that some of such sequences contains a (non trivial) element hn
which sends I into its L-neighborhood for L := |I|/2 exp(2τCS), where τ := max{τ1, . . . , τd} and
Denjoy Theorem for commuting diffeomorphisms 12
C := max{C1, . . . , Cd, C
1, . . . , C
d}, with Ck (resp. C
k) being the τk-Hölder constant of the function
log(f ′k) (resp. log((f
)′). To ensure this last property let U be the L-neighborhood of I, and let
N ∈ N be such that, given any wandering interval, among the first N iterates of f1, as well as for
f2, . . . , fd, at least one of them sends this interval inside U . If we take M0 large enough so that the
number of points with integer coordinates in LM0 which are contained in RM0 \RM0−1 exceeds N ,
then one can easily see that the associated sequence of compositions contains the desired element
hn. This finishes the proof of Theorem A.
3 Proof of Theorem B
The strategy for the proof of Theorem B is well known. We prescribe the rotation numbers
ρ1, . . . , ρd (which are supposed to be independent over the rationals), we fix a point p ∈ S
1, and for
each (i1, . . . , id) ∈ Z
d we replace the point Ri1ρ1 · · ·R
(p) by an interval Ii1,...,id of length ℓi1,...,id in
such a way that the total sum of the ℓi1,...,id ’s is finite. Doing this we obtain a new circle on which the
rotations Rρk induce nice homeomorphisms if we extend them apropiately to the intervals Ii1,...,id
(outside these intervals the induced homeomorphisms are canonically defined). More precisely, as
it is well explained in [4, 7, 10, 16], if there exists a constant C ′ > 0 so that for all (i1, . . . , id) ∈ Z
and all k ∈ {1, . . . , d} one has
ℓi1,...,1+ik,...,id
ℓi1,...,ik,...,id
i1,...,ik,...,id
≤ C ′, (10)
then one can perform the extension to the intervals Ii1,...,id in such a way the resulting maps fk,
k∈{1, . . . , d}, are respectively C1+τk diffeomorphisms and commute, and moreover their derivatives
are identically equal to 1 on the invariant minimal Cantor set.2 Indeed, one possible extension is
given by fk(x) = (ϕIi1,...,ik,...,id
)−1 ◦ϕIi1,...,1+ik,...,id
(x), where x belongs to the interior of the interval
Ii1...,ik,...,id. Here, ϕI:]a, b[→ R denotes the map
ϕI(x) =
It turns out that a good choice for the lengths is
ℓi1,...,id =
1 + |i1|1/τ1 + · · · |id|
Indeed, on the one hand, if we decompose the sum of the ℓi1,...,id ’s according to the biggest |ij |
we obtain
(i1,...,id)∈Z
ℓi1,...,id ≤ 1 +
|ij |
1/τj ≤ |ik|
for all j ∈ {1, . . . , d}
|ik| ≥ 1
1 + |i1|1/τ1 + · · · |id|
2Condition (10) is also necessary under these requirements. Indeed, there must exist a point in Ii1,...,ik,...,id for
which the derivative of the corresponding map fk equals ℓi1,...,1+ik,...,id/ℓi1,...,ik,...,id . Since the derivative of fk at
the end points of Ii1,...,ik,...,id is assumed to be equal to 1, condition (10) holds for C
′ being the τk-Hölder constant
of the derivative of fk.
Denjoy Theorem for commuting diffeomorphisms 13
and therefore, for some constant C > 0, this sum is bounded by
card{(i1, . . . , id) : |ij |
1/τj ≤ n1/τk for all j∈{1, . . . , d}, ik = n}
1 + n1/τk
≤ 1 + C
n1/τk
j 6=k
nτj/τk = 1 + C
j 6=k τj)/τk
n1/τk
= 1 + C
n(1−τk−ε)/τk
n1/τk
= 1 + C
n1+ε/τk
where ε := 1− (τ1 + · · ·+ τd). (Remark that, since ε > 0, the last infinite sum converges.)
On the other hand, the left hand expression in (10) is equal to
F (i1, . . . , id) :=
|1 + ik|
1/τk − |ik|
1 + |i1|1/τ1 + · · ·+ |1 + ik|
1/τk + · · ·+ |id|
1 + |i1|
1/τ1 + · · · + |ik|
1/τk + · · ·+ |id|
In order to obtain an upper bound for this expression first note that, if ik ≥ 0, then
F (i1, . . . , ik, . . . , id) ≤ F (i1, . . . ,−1− ik, . . . , id).
Therefore, we can restrict to the case where ik < 0. For this case, denoting B = 1 +
j 6=k |ij |
and a = |ik| we have
F (i1, . . . , id) =
a1/τk − (a− 1)1/τk
B + (a− 1)1/τk
B + a1/τk
a1/τk − (a− 1)1/τk
B + (a− 1)1/τk
)1−τk
B + a1/τk
B + (a− 1)1/τk
Both factors in the last expression are decreasing in B. Thus, since B ≥ 1,
F (i1, . . . , id) ≤
a1/τk − (a− 1)1/τk
1 + (a− 1)1/τk
)1−τk
1 + a1/τk
1 + (a− 1)1/τk
Now note that a ≥ 1. For a = 1 the right hand expression above equals 2τk . If a > 1 then the Mean
Value Theorem gives the estimate a1/τk − (a − 1)1/τk ≤ a
/τk, and therefore the preceding
expression is bounded from above by
((a− 1)1/τk )1−τk
a1/τk
(a− 1)1/τk
· 2 =
21/τk
We have then shown that for any (i1, . . . , id) ∈ Z
d one has
F (i1, . . . , id) ≤
21/τk .
In other words, if τ ′ = min{τ1, . . . , τd} then inequality (10) holds for each (i1, . . . , id) ∈ Z
d and
every k ∈ {1, . . . , d} for the constant C ′ = 21/τ
/τ ′, and this finishes the proof of Theorem B.
Denjoy Theorem for commuting diffeomorphisms 14
References
[1] Arnol’d, V. Small denominators I. Mapping the circle onto itself. Izv. Akad. Nauk SSSR Ser. Mat. 25
(1961), 21-86.
[2] Bohl, P. Uber die hinsichtlich der unabhängigen variabeln periodische differential gleichung erster ord-
nung. Acta Math. 40 (1916), 321-336.
[3] Denjoy, A. Sur les courbes définies par des équations différentielles à la surface du tore. J. Math. Pures
et Appl. 11 (1932), 333-375.
[4] Deroin, B., Kleptsyn, V. & Navas, A. Sur la dynamique unidimensionnelle en régularité intermédiaire.
To appear in Acta Math.
[5] Fayad, B. & Khanin, K. Smooth linearisation of commuting circle diffeomorphisms. Preprint (2006).
[6] Ghys, É. Groups acting on the circle. L’Enseig. Math. 47 (2001), 329-407.
[7] Herman, M. Sur la conjugaison différentiable des difféomorphismes du cercle à des rotations. Publ. Math.
de l’IHÉS 49 (1979), 5-234.
[8] Katznelson, Y., Ornstein, D. The differentiability of the conjugation of certain diffeomorphisms of the
circle. Erg. Theory and Dynam. Systems 9 (1989), 643-680.
[9] Moser, J. On commuting circle mappings and simultaneous Diophantine approximations. Math. Z. 205
(1990), 105-121.
[10] Navas, A. Growth of groups and diffeomorphisms of the interval. Preprint (2006).
[11] Navas, A. Grupos de difeomorfismos del ćırculo. Monograf́ıas del IMCA, Lima, Perú (2006).
[12] Pixton, D. Nonsmoothable, unstable group actions. Trans. of the AMS 229 (1977), 259-268.
[13] Poincaré, H. Mémoire sur les courbes définies par une équation différentielle. Journal de Mathématiques
7 (1881), 375-422, and 8 (1882), 251-296.
[14] Sacksteder, R. Foliations and pseudogroups. Amer. J. Math. 87 (1965), 79-102.
[15] Schwartz, A. A generalization of Poincaré-Bendixon theorem to closed two dimensional manifolds. Amer.
J. Math. 85 (1963), 453-458.
[16] Tsuboi, T. Homological and dynamical study on certain groups of Lipschitz homeomorphisms of the circle.
J. Math. Soc. Japan 47 (1995), 1-30.
[17] Yoccoz, J. C. Centralisateurs et conjugaison différentiable des difféomorphismes du cercle. Petits diviseurs
en dimension 1. Astérisque 231 (1995), 89-242.
Victor Kleptsyn
Université de Genève, 2-4 rue du Lièvre, Case postale 64, 1211 Genève 4, Suisse (Victor.Kleptsyn@math.unige.ch)
Andrés Navas
Universidad de Santiago de Chile, Alameda 3363, Santiago, Chile (andnavas@uchile.cl)
ABSTRACT
  We prove that if d is an integer number bigger than 1 and f_1,...,f_d are
commuting circle diffeomorphisms respectively of class C^(1+\tau_k), where
\tau_1 + ... + \tau_k > 1, then these maps are simultaneously conjugate to
rotations provided that their rotation numbers are independent over the
rationals.

<|endoftext|><|startoftext|>
Transient Dynamics of Sparsely Connected
Hopfield Neural Networks with Arbitrary
Degree Distributions
Pan Zhang and Yong Chen ∗
Institute of Theoretical Physics, Lanzhou University, Lanzhou 730000, China
Abstract
Using probabilistic approach, the transient dynamics of sparsely connected Hopfield
neural networks is studied for arbitrary degree distributions. A recursive scheme is
developed to determine the time evolution of overlap parameters. As illustrative
examples, the explicit calculations of dynamics for networks with binomial, power-
law, and uniform degree distribution are performed. The results are good agreement
with the extensive numerical simulations. It indicates that with the same average
degree, there is a gradual improvement of network performance with increasing
sharpness of its degree distribution, and the most efficient degree distribution for
global storage of patterns is the delta function.
Key words: neural networks, complex networks, degree distribution, probability
theory
PACS: 87.10.+e, 89.75.Fb, 87.18.Sn, 02.50.-r
As a tractable toy model of associative memories and can also be viewed as
an extension of the Ising model, Hopfield neural networks [1] received lots
of attention in recent two decades. Equilibrium properties of fully-connected
Hopfield neural network have been well studied using spin-glass theory, es-
pecially the replica method [2,3]. Dynamics is also studied using generating
functional method [4] and signal-to-noise analysis [5,6,7] .
Given the huge number of neurons, there is only small number of intercon-
nections in human brain cortex (∼ 1011 neurons and ∼ 1014 synapses). In
order to simulate a biological genuine model rather than the fully-connected
networks, various random diluted models were studied, including extremely
diluted model [8,9], finite diluted model [10,11], and finite connection model
⋆ Physica A 387, 1009(2008)
∗ Corresponding author. Email address: ychen@lzu.edu.cn
Preprint submitted to Elsevier 19 November 2018
http://arxiv.org/abs/0704.1007v2
[12,13]. But neural connectivity is suggested to be far more complex than fully
random graph, e.g. the networks of c.elegans and cat’s cortical neural were re-
ported to be small-world and scale-free, respectively [14,15]. To go one step
closer to more biological realistic model, many numerical studies are carried
out, focusing on how the topology, the degree distribution, and clustering co-
efficient of a network topology affect the computational performance of the
Hopfield model [16,17,18,19]. With the same average connection, random net-
work was reported to be more efficient for storage and retrieval of patterns
than either small-world network or regular network [17]. Torres et al. reported
that the capacity of storage is higher for neural network with scale-free topol-
ogy than for highly random diluted Hopfield networks [18]. However, to our
best knowledge, there are no any theoretical results of either dynamics or
statics yet.
The goal of this paper is to analytically study the dynamics of Hopfield model
for a sparsely connected topology whose degree distribution is not restricted
to a specific distribution (e.g. Poisson) but can take arbitrary forms. Another
question investigated in this paper is how the degree distribution of connection
topology influences the network performance, especially whether there exists
an optimal degree distribution given a fixed number of nodes and connections.
Let us consider a system of N spins or neurons, the state of the spins takes
si (t) = ±1 and updates synchronously with the following probability,
Prob[si (t + 1) |hi (t)] =
eβsi(t+1)hi(t)
2 cosh (βhi (t))
, (1)
where β is the inverse temperature and the local field of neuron i is defined
hi (t) =
Jijsj (t) . (2)
We store q = αN random patterns ξµ = (ξ
1 , . . . , ξ
N) in networks, where α is
called the loading ratio. The couplings are given by the Hebb rule,
Jij =
j , (3)
where Cij is the adjacency matrix (Cij = 1 if j is connected to i, Cij = 0
otherwise). In contrast to spin glasses or many other physical systems, the
interactions between biological neurons are not symmetric: neuron i may in-
fluence neuron j even if neuron j has no influence on neuron i. So in our model,
Cij and Cji are chosen independently. Degree of spin i, ki =
j=1Cij, denotes
the number of spins that are connected to i. We consider the case that neurons
are sparsely connected, it means that N → ∞, ki → ∞ but ki/N → 0. For
example, we can take ki = O(lnN). And in this paper, the degrees of neurons
are set as an arbitrary distribution p (ki = k).
We use g (·) to express the transfer function,
si (t+ 1) = g (hi(t)) . (4)
Without loss of generality, let us consider the case to retrieve ξ1. We define
m (t) as the overlap parameter between network state s (t) and the first pattern
ξ1 as
m (t) =
ξ1i si (t) . (5)
Then the local field at time t can be represented by
hi (t) =
j 6=i
j sj (t) +
j 6=i
j sj (t) , (6)
where the first term is the signal from ξ1 and the second one is crosstalk noise
from other patterns. Our aim is to determine the form of the local field in
the thermodynamic limit N → ∞. We apply the law of large numbers to
the signal term and find that it converges to ξ1i
m (t) in the thermodynamic
limit. To show this point intuitively, we can simply replace the signal term by
its average,
j 6=i
j sj (t) =ξ
j sj (t)
. (7)
This formula is exact in the thermodynamic limit because the whole system is
assumed to be self-averaging. Since Cij and ξ
1 are independent of each other,
we can write the average of product as the product of average,
j sj (t)
= ξ1i 〈Cij〉
ξ1j sj (t)
. (8)
Using definition of ki together with Eq. (5), we have 〈Cij〉 = ki/N and m (t) =
ξ1j sj (t)
. So we have following formula,
j 6=i
j sj (t) = ξ
m (t) . (9)
Taking a closer look at the second term in Eq. (6), if all the terms in the
sum (with regard to µ) are independent, we are able to apply the central
limit theorem to it. As pointed out in Ref. [8], two conditions are essential for
the independence of terms in the sum: first is that almost all feedback loops
are eliminated, and the second is that with probability 1, any two neurons
have different clusters of ancestors, i.e. they will remain independent because
they receive inputs from two trees which have no neurons in common. In our
model, because of the sparsely connected architecture together with the high
asymmetry of synaptic connections, two conditions are both satisfied. Thus the
second term in Eq. (6) converges to a zero-mean Gaussian form N
(q−1)ki
where
(q−1)ki
is the variance of Gaussian noise. Then the local field of neuron
i can be expressed by
hi (t) = ξ
m (t) +N
(q − 1) ki
. (10)
Note that similar treatment of local field can also be found in [7].
Then the average state of neuron i was given formally by
〈si (t + 1)〉 =
dz (2π)
−z2/2
m (t) +
(q − 1) ki
, (11)
where 〈〉ξ1 stands for averaging over distribution of ξ
i , and P (ξ) = [δ(ξ + 1) + δ(ξ − 1)] /2.
When self-averaging is assumed, the average of neuron state in the next time
can be obtained by taking average over all N neurons,
〈s (t+ 1)〉 =
dz (2π)
−z2/2
m (t) +
(q − 1) ki
.(12)
Using the concept of degree distribution, we only need to take average over
the degree distributions as
〈s (t+ 1)〉 =
dkp (k)
dz (2π)
−z2/2
m (t) +
(q − 1) k
.(13)
The overlap parameters are obtained in the similar way,
m (t+ 1) =
dkp (k)
dz (2π)
−z2/2
m (t) +
(q − 1) k
.(14)
When focusing on the most interested case of zero temperature (β → ∞),
transfer function g (·) is replaced by sgn (·). From Eq. (14) one gets
m (t+ 1)=
dkp (k)
m(t)+
(q−1)k
dz (2π)
−z2/2
m(t)+
(q−1)k
dz (2π)
−z2/2
m(t)+
(q−1)k
dz (2π)
−z2/2
m(t)+
(q−1)k
dz (2π)
−z2/2
Then, the last equation can be further simplified to
m (t+ 1) =
dkp (k) erf
m (t)
(q − 1) /k
 , (16)
where
erf (u) =
−x2/2
dx. (17)
This finishes the Signal-to-Noise derivation of overlap parameter at zero tem-
perature. As long as the degree distribution of network is determined, using
Eq. (14-17), one can calculate temporal evolution of overlap parameters up to
an arbitrary time step.
Using auxiliary thermal fields γ (t) to express the stochastic dynamics [7], it
is easy to extend the method to arbitrary temperatures by averaging the zero
temperature results over the auxiliary fields.
s (t+ 1) = g (h (t) + γ (t) /β) , (18)
and the probability density of γ (t) is given by
p (γ (t)) =
1− tanh2 (γ (t))
. (19)
0 2 4 6 8 10
 in theory
 in simulation
Fig. 1. Time evolution of overlap parameters for Hopfield network with delta func-
tion degree distribution. Initial overlaps range from 1.0 to 0.1 (top to bottom). N
is 50000. Each neuron has 100 degrees and 20 patterns are stored in networks.
For illustrative examples, we apply our theory to networks with some specific
degree distributions and numerical simulations are performed to verify the
theoretical results. In all of our numerical simulations, we set N = 5 × 104
and the average degree k̄ = 100, varying only the arrangement of connections.
Each neuron is connected on average to 0.2% of the other neurons compared
to ∼ 0.1% in the mouse cortex [20].
The first numerical experiment is the delta function
p (k) = δ
k − k̄
, (20)
which means that every neuron has exactly k̄ connections. In practice, the con-
nection topology is generated by randomizing a regular lattice which average
degree is k̄. Time evolutions of overlap parameters from theory and numerical
simulations are plotted in Fig. 1.
The second degree distribution is binomial distribution which comes from a
Erdös-Renyi random graph [21] (see the left panel of Fig. 2)
p (k) = CkN
. (21)
The temporal evolution of overlap parameters are presented in the right panel
0 2 4 6 8 10
 in theory
 in simulation
0 50 100 150 200
Degree
Fig. 2. Left panel: the normalized binomial degree distribution of networks. Right
panel: the temporal evolution of overlap parameters for Hopfield network with de-
gree distribution shown in the left panel (Erdös-Renyi random graph). Initial over-
laps range from 1.0 to 0.1 (top to bottom). N = 50000, k̄ = 100, and 20 patterns
are stored in networks.
0 2 4 6 8 10
 in theory
 in simulation
54.59815 148.41316 403.42879
9.11882E-4
0.00248
0.00674
0.01832
0.04979
0.13534
0.36788
Degree
Fig. 3. Left panel: the normalized power-law degree distribution (log-log scale).
Right panel: the time evolution of overlap parameters for Hopfield network with
degree distribution plotted in the left panel. Initial overlaps range from 1.0 to 0.1
(top to bottom). N = 50000, k̄ = 100, and 20 patterns are stored in networks.
of Fig. 2.
The third one is power-law distribution (see the left panel of Fig. 3)
p (k) =
k̄2k−3, (22)
0 5 10 15 20
1.0  A
18.0 18.5 19.0
Fig. 4. Theoretical comparison of time evolutions of overlap parameters in networks
with the same average degree but different degree distributions. Degree distribution
of A is the delta function, B is binomial, and C is power-law. The inset shows
in detail that performance of network with delta function degree distribution is
slightly better than that with binomial distribution (Erdös-Renyi random graph).
N = 50000, k̄ = 100, and 20 patterns are stored in networks.
which is of great importance because it may comes from preferential attach-
ment in the growth process of neurons [22]. The right panel of Fig. 3 shows
the temporal evolutions of overlap parameters.
It is obvious that the theoretical results from our scheme are consistent with
the simulations for the above degree distributions. Emerging naturally from
the above statements, which form of degree distribution is the best one?
To investigate how degree distributions influence the performance of networks,
we theoretically compare time evolutions of overlap parameters with delta
function (A), binomial (B), and power-law degree distribution (C) in Fig. 4.
The inset shows the details near stationary states. It indicates that network
with delta degree distribution performs slightly better than that with binomial
distribution. The most rapid degradation in overlap occurs in network with
power-law distribution. This behavior can be interpreted as follows. Using
Eq. (16), it is easy to find that an individual neuron with fewer degrees suffers
more perturbations from crosstalk noise. In the case of power-law degree dis-
tribution, degrees are not uniformly distributed in networks and there are too
many neurons with small number of connections, which leads to negative per-
formance of the entire networks. However, note that despite the disadvantages
of power-law distribution, hubs (subset of networks which has higher degrees)
0 50 100 150 200
Degree
0 5 10 15 20
17 18 19
Fig. 5. Theoretical comparison of time evolutions of overlap parameters for Hopfield
networks for uniform degree distributions with different width. Degree distribution
of A is delta function. Left panel: the uniform degree distributions with various
width, 50 (B), 100 (C), 150 (D), and 200 (E). The inset in right panel shows in
detail that the performance of A is slightly better than that of B. N = 50000,
k̄ = 100, and 55 patterns are stored in networks.
in networks may be useful for partial storage [17].
In addition, for verifying the above statements, we construct the special cases
that the degree distributions of networks is uniform with various widthes, 50
(B), 100 (C), 150 (D), 200 (E), and the delta function A which width is 0
(see the left panel of Fig. 5). The temporal evolutions of overlaps are plotted
in the right panel of Fig. 5, and the inset shows the detailed information
near stationary states. It was found that the much more widespread degree
distribution tends to induce worse performance of networks.
In summary, the transient dynamics of sparsely connected Hopfield model with
arbitrary degree distributions is studied in this paper. It was found that the
delta function degree distribution is optimal in terms of network performance,
and there is a gradual improvement for network performance with increasing
sharpness of its degree distribution. We would like to emphasize that the model
investigated in this paper is a simple relaxation to real network topology,
by neglecting loops in it. But Ref. [23] suggested that the feedback loops
together with their correlations exist in networks and play important role in
network dynamics even in the case of sparsely connected systems. It would
be interesting to investigate Hopfield model with real complicated topology
influenced both by degree distributions and loops (feedbacks and correlations).
Acknowledgements
This work was supported by the National Natural Science Foundation of China
under Grant No. 10305005 and by the Special Fund for Doctor Programs at
Lanzhou University. One of us (PZ) thanks Dong Liu for useful suggestions.
References
[1] Hopfield, J. J. (1982). Neural Networks and Physical Systems with Emergent
Collective Computational Abilities. Proc. Nat. Acad. Sci. USA, 79(8), 2554-
2558.
[2] Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985). Spin-glass models of
neural networks. Phys. Rev. A, 32(2), 1007-1018.
[3] Amit, D. J., Gutfreund H., & Sompolinsky, H. (1985). Storing Infinite Numbers
of Patterns in a Spin-Glass Model of Neural Networks. Phys. Rev. Lett., 55,
1530-1533.
[4] Coolen, A. C. C. (2000). Statistical Mechanics of Recurrent Neural Networks
II. arXiv:cond-mat/0006011.
[5] Amari, S., & Maginu, K. (1988). Statistical neurodynamics of associative
memory. Neural Networks, 1, 63-73.
[6] Okada, M. (1995). A Hierarchy of Macrodynamical Equations for Associative
Memory. Neural Networks, 8, 833-838.
[7] Bolle, D., Blanco, J. B., & Verbeiren T. (2004). The signal-to-noise analysis of
the Little-Hopfield model revisited. J. Phys. A, 37, 1951-1969.
[8] Derrida, B., Gardner E., & Zippelius, A. (1987). An exactly solvable asymmetric
neural network model. Europhys. Lett., 4, 167-173.
[9] Patrick, A. E., & Zagrebnov, V. A. (1990). Parallel dynamics for an extremely
diluted neural network. J. Phys. A, 23, L1323-L1337.
[10] Theumann, W. K. (2003). Mean-field dynamics of sequence processing neural
networks with finite connectivity. Physica A, 328, 1-12.
[11] Zhang, P., & Chen, Y. (2007). Statistical neurodynamics for sequence processing
neural networks with finite dilution. Lect. Note Comput. Sci., 4491, 1144-1152.
[12] Wemmenhove, B., & Coolen, A. C. C. (2003). Finite connectivity attractor
neural networks. J. Phys. A, 36, 9617-9633.
[13] Castillo, I. P., & Skantzos, N. S. (2004). The Little-Hopfield model on a Random
Graph. J. Phys. A, 37, 9087-9099.
http://arxiv.org/abs/cond-mat/0006011
[14] Stephan, K.E., Kamper, L., Bozkurt, A., Burns, G.A.P.C., Young, M.P.,
& Koter, R. (2001). Advanced database methodology for the collation of
connectivity data on the Macaque brain (CoCoMac). Phil. Trans. R. Soc.
Lond. B Biol. Sci. 356, 1159-1186; Cherniak, C. (1994). Component placement
optimization in the brain. J. Neurosci. 14, 2418-2427; Scannell, J. W., Burns,
G. A. P. C., Hilgetag, C. C., ONeill, M. A., & Young, M. P. (1999). The
Connectional Organization of the Cortico-thalamic System of the Cat. Cerebral
Cortex 9, 277-299.
[15] Watts D. J., & Strogatz, S. H. (1998). Collective dynamics of ’small-world’
networks. Nature(London), 393, 440-442.
[16] Simard, D., Nadeau L., & Krögerar, H. (2005). Fastest learning in small
world neural networks. Phys. lett. A 336(11), 8-15; Li, C. & Chen, G. (2003).
Stability of a neural network model with small-world connections. Phys. Rev.
E 68, 052901-4; Davey, N. & Adams, R. (2004). High capacity associative
memories and connection constraints. Connection Science 16(1), 47-65; Davey
N., Christianson, B. & Adams, R. (2004). High capacity associative memories
and small world networks. Neural Networks Proceedings 1, 182; Stauffer, D.,
Aharony A., da, da Fontoura Costa L., & Adler, J. (2003). Efficient Hopfield
pattern recognition on a scale-free neural network. Euro. Phys. J. B 32, 395-399.
[17] McGraw, P. N., & Menzinger, M. (2003). Topology and computational
performance of attractor neural networks. Phys. Rev. E, 68, 047102-047105.
[18] Torres, J. J., Munoz, M. A., Marro, J., & Garrido, P. L. (2003). Influence of
topology on the performance of a neural network, arXiv:cond-mat/0310205.
[19] Kim, B. J. (2004). Performance of networks of artificial neurons: The role of
clustering. Phys. Rev. E, 69, 045101-045104.
[20] Braitenberg, V. & Schüz, A. (1998). Cortex: Statistics and Geometry of
Neruonal Connectivity. Springer-Verlag, Berlin.
[21] Erdös, P., & Renyi, A. (1959). On random graphs. Publ. Math. (debrecen) 6,
290-297.
[22] Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks,
Science, 286, 509-512; Barabási, A. L., Albert, R., & Jeong, H. (1999). Mean-
field theory for scale-free random networks. Physica A, 272, 173-187.
[23] Zhang, P., & Chen Y. (2007). Topology and dynamics of attractor neural
networks: the role of loopiness, arxiv:cond-mat/0703405.
http://arxiv.org/abs/cond-mat/0310205
http://arxiv.org/abs/cond-mat/0703405
	References
ABSTRACT
  Using probabilistic approach, the transient dynamics of sparsely connected
Hopfield neural networks is studied for arbitrary degree distributions. A
recursive scheme is developed to determine the time evolution of overlap
parameters. As illustrative examples, the explicit calculations of dynamics for
networks with binomial, power-law, and uniform degree distribution are
performed. The results are good agreement with the extensive numerical
simulations. It indicates that with the same average degree, there is a gradual
improvement of network performance with increasing sharpness of its degree
distribution, and the most efficient degree distribution for global storage of
patterns is the delta function.

<|endoftext|><|startoftext|>
Introduction to abelian and derived categories, in Representations of reductive
groups, edited by R. W. Carter and M. Geck, Cambridge University Press 1998, 41-62 (avail-
able on Keller’s webpage).
[Ke2] B. Keller, Derived categories and their uses, in Handbook of algebra, edited by
M. Hazewinkel, Elsevier 1996 (available on Keller’s webpage).
[Ke3] B. Keller, On differential graded categories, preprint (available on Keller’s webpage).
[Ke4] B. Keller, Introduction to A∞-algebras and modules, Homology, Homotopy, and Applica-
tions 3 (2001), 1–35 (available on Keller’s webpage).
[KeVo] B. Keller, D. Vossieck, Aisles in derived categories, Bull. Soc. Math. Belg. Ser. A 40 (1988),
no. 2, 239-253.
[KaSch] M. Kashiwara, P. Schapira, Sheaves on Manifolds, Grundlehren der Mathematichen Wis-
senschaften 292, Springer-Verlag, 1990.
LECTURES ON DERIVED AND TRIANGULATED CATEGORIES 33
[Li] J. Lipman, Notes on derived categories and derived functors, available at
http://www.math.purdue.edu/∼lipman.
[Ma] S. Mac Lane, Categories for the working mathematician, Second edition. Graduate Texts in
Mathematics 5, Springer-Verlag, New York, 1998.
[Ma] H. R. Margolis, Spectra and the Steenrod algebra North-Holland Mathematical Library 29,
North-Holland Publishing Co., Amsterdam, 1983.
[May1] J. P. May A concise course in algebraic topology, Chicago Lectures in Mathematics, Uni-
versity of Chicago Press, Chicago, IL, 1999.
[May2] J. P. May, The axioms for triangulated categories, preprint.
[Mi1] B. Mitchel, The full embedding theorem, Am. J. Math. 86 (1964), 619–637.
[Mi2] B. Mitchel, Rings with several objects, Adv. in Math. 8 (1972), 1–161.
[Ne] A. Neeman, Triangulated categories, Annals of Mathematics Studies, 148. Princeton Univer-
sity Press, Princeton, NJ, 2001.
[Po] A. Polishchuk, Noncommutative two-tori with real multiplication as noncommutative projec-
tive varieties, J. Geom. Phys. 50, no. 1-4, 162-187.
[Pu] D. Puppe, On the formal structure of stable homotopy theory, in Colloq. Alg. Topology,
Aarhus University (1962), 65-71.
[Ro] A. Rosenberg, The spectrum of abelian categories and reconstructions of schemes, in Rings,
Hopf Algebras, and Brauer groups, 257–274, Lectures Notes in Pure and Appl. Math. 197,
Marcel Dekker, New York, 1998.
[Ve] J.-L. Verdier, Des catégories dérivées des catégories abéliennes, Astérisque 239, 1996.
[We] C. A. Weibel, Introduction to homological algebra, Cambridge Studies in Advanced Mathe-
matics 38, Cambridge University Press, 1994.
E-mail address: behrang@alum.mit.edu
Mathematics Department, Florida State University, 208 Love Building, Tallahassee,
FL 32306-4510, U.S.A.
	Lecture 1: Abelian categories
	1. Products and coproducts in categories
	2. Abelian categories
	3. Categories of sheaves
	4. Abelian category of quasi-coherent sheaves on a scheme
	5. Morita equivalence of rings
	6. Appendix: injective and projective objects in abelian categories
	Lecture 2: Chain complexes
	1. Why chain complexes?
	2. Chain complexes
	3. Constructions on chain complexes
	4. Basic properties of cofiber sequences
	5. Derived categories
	6. Variations on the theme of derived categories
	7. Derived functors
	Lecture 3: Triangulated categories
	1. Triangulated categories
	2. Cohomological functors
	3. Abelian categories inside triangulated categories; t-structures
	4. Producing new abelian categories
	5. Appendix I: topological triangulated categories
	6. Appendix II: different illustrations of TR4
	References
ABSTRACT
  These notes are meant to provide a rapid introduction to triangulated
categories. We start with the definition of an additive category and end with a
glimps of tilting theory. Some exercises are included.

<|endoftext|><|startoftext|>
Introduction
The aim of this work is to propose a concrete method for studying group actions
on algebraic stacks. Of course, in its full generality this problem could already
be very difficult in the case of schemes. The case of stacks has yet an additional
layer of difficulty due to the fact that stacks have two types of symmetries: 1-
symmetries (i.e., self-equivalences) and 2-symmetries (i.e., 2-morphisms between
self-equivalences).
Studying actions of a group stack G on a stack X can be divided into two sub-
problems. One, which is of geometric nature, is to understand the two types of sym-
metries alluded to above; these can be packaged in a group stack AutX. The other,
which is of homotopy theoretic nature, is to get a hold of morphisms G → AutX.
Here, a morphism G → AutX means a weak monoidal functor; two morphisms
f, g : G→ AutX that are related by a monoidal transformation ϕ : f ⇒ g should be
regarded as giving rise to the “same” action.
Therefore, to study actions of G on X one needs to understand the group stack
AutX, the morphisms G→ AutX, and also the transformations between such mor-
phisms. Our proposed method, uses techniques from 2-group theory to tackle these
problems. It consists of two steps:
1) finding suitable crossed module models for AutX and G;
2) using butterflies [No3, AlNo1] to give a geometric description
of morphisms G → AutX and monoidal transformations between
them.
Finding a ‘suitable’ crossed module model for AutX may not always be easy,
but we can go about it by choosing a suitable ‘symmetric enough’ atlas X → X.
This can be used to find an approximation of AutX (Proposition 6.2), and if we are
lucky (e.g., when X = P(n0, · · · , nr)) it gives us the whole AutX.
2 BEHRANG NOOHI
Once crossed module models for G and AutX are found, the butterfly method
reduces the action problem to standard problems about group homomorphisms and
group extensions, which can be tackled using techniques from group theory.
Organization of the paper
Sections §3–§5 are devoted to setting up the basic homotopy theory of 2-group
actions and using butterflies to formulate our strategy for studying actions. To
illustrate our method, in the subsequent sections we apply these ideas to study
group actions on weighted projective stacks. In §6 we define weighted projective
general linear 2-groups PGL(n0, n1, · · · , nr) and prove (see Theorem 6.3) that they
model AutP(n0, · · · , nr); we prove this over any base scheme S, generalizing the
case S = SpecC proved in [BeNo]:
Theorem 1.1. Let AutPS(n0, n1, · · · , nr) be the group stack of automorphisms of
the weighted projective stack PS(n0, n1, · · · , nr) relative to an arbitrary base scheme
S. Then, there is a natural equivalence of group stacks
PGLS(n0, n1, · · · , nr)→ AutPS(n0, n1, · · · , nr).
Here, PGLS(n0, n1, · · · , nr) stands for the group stack associated to the crossed
module PGLS(n0, · · · , nr) (see §6 for the defnition).
We analyze the structure of PGL(n0, n1, · · · , nr) in detail in §7. In Theorem 7.7
we make explicit the structure of PGL(n0, · · · , nr) = [Gm → G] by writing G as
a semidirect product of a reductive part (product of general linear groups) and a
unipotent part (successive semidirect product of linear affine groups).
In light of the two step approach discussed above, Theorems 6.3 and 7.7 enable
us to study actions of group schemes (or group stacks, for that matter) on weighted
projective stacks in an explicit manner. This is discussed in §9. We classify actions
of a group scheme G on PS(n0, n1, · · · , nr) in terms of certain central extensions
of G by the multiplicative group Gm. We also describe the stack structure of the
corresponding quotient (2-)stacks. As a consequence, we obtain the following (see
Theorem 9.1).
Theorem 1.2. Let k be a field and G a connected linear algebraic group over k,
assumed to be reductive if char(k) > 0. Let X = P(n0, n1, · · · , nr) be a wighted
projective stack over k. Suppose that Pic(G) = 0. Then, every action of G on X
lifts to a linear action of G on Ar+1.
In a forthcoming paper, we use the results of this paper (more specifically, The-
orem 6.3), together with the results of [No2], to give a complete classification, and
explicit construction, of twisted forms of weighted projective stacks; these are the
weighted analogues of Brauer-Severi varieties.
Contents
1. Introduction 1
2. Notation and terminology 3
3. Review of 2-groups and crossed modules 3
4. 2-groups over a site and group stacks 4
4.1. Presheaves of weak 2-groups over a site 5
4.2. Group stacks over C 5
4.3. Equivalences of group stacks 6
GROUP ACTIONS ON ALGEBRAIC STACKS VIA BUTTERFLIES 3
5. Actions of group stacks 7
5.1. Formulation in terms of crossed modules and butterflies 8
6. Weighted projective general linear 2-groups PGL(n0, n1, ..., nr) 9
6.1. Automorphism 2-group of a quotient stack 9
6.2. Weighted projective general linear 2-groups 12
7. Structure of PGL(n0, n1, ..., nr) 13
8. Some examples 18
9. 2-group actions on weighted projective stacks 20
9.1. Description of the quotient 2-stack 21
References 22
2. Notation and terminology
Our notation for 2-groups and crossed modules is that of [No1] and [No3], to
which the reader is referred to for more on 2-group theory relevant to this work.
In particular, we use mathfrak letters G, H for 2-groups or crossed modules. By a
weak 2-group we mean a strict monoidal category G with weak inverses (Definition
3.1). If the inverses are also strict, we call G a strict 2-group.
By a stack we mean a presheaf of groupoids (and not a category fibered in
groupoids) over a Grothendieck site that satisfies the decent condition. We use
mathcal letters X, Y,... for stacks.
Given a presheaf of groupoids X over a site, its stackification is denoted by Xa.
We use the same notation for the sheafification of a presheaf of sets (or groups).
Them-dimensional general linear group scheme over SpecR is denoted by GL(m,R).
When R = Z, this is abbreviated to GL(m). The corresponding projectivized gen-
eral linear group scheme is denoted by PGL(m); this notation does not conflict with
the notation PGL(n0, n1, · · · , nr) for a weighted projective general linear 2-group
(§6) because in the latter case we always assume r ≥ 1.
3. Review of 2-groups and crossed modules
A strict 2-group is a group object in the category of groupoids. Equivalently, a
strict 2-group is a strict monoidal groupoid G in which every object has a strict
inverse; that is, multiplication by an object induces an isomorphism from G onto
itself. A morphism of 2-groups is, by definition, a strict monoidal functor.
The weak 2-groups we will encounter in this paper are less weak than the ones
discussed in [No1, No3]. We hope that this change in terminology is not too con-
fusing for the reader.
Definition 3.1. A weak 2-group is a strict monoidal groupoid G in which multi-
plication by an object induces an equivalence of categories from G to itself. By a
morphism of weak 2-groups we mean a strict monoidal functor. By a weak mor-
phism we mean a weak monoidal functor. (We will not encounter weak morphisms
until later sections.)
The set of isomorphism classes of objects in a 2-group G is denoted by π0G; this
is a group. The automorphism group of the identity object 1 ∈ ObG is denoted by
π1G; this is an abelian group.
4 BEHRANG NOOHI
Weak 2-groups and strict monoidal functors between them form a category
W2Gp which contains the category 2Gp of strict 2-groups as a full subcategory.1
Morphisms in W2Gp induce group homomorphisms on π0 and π1. In other words,
we have functors π0, π1 : W2Gp → Gp; the functor π1 indeed lands in the full
subcategory of abelian groups. A morphism between weak 2-groups is called an
equivalence if the induced homomorphisms on π0 and π1 are isomorphisms. Note
that an equivalence may not have an inverse.
The following lemma is straightforward.
Lemma 3.2. Let f : H→ G be a morphism of weak 2-groups. Then f , viewed as a
morphism of underlying groupoids, is fully faithful if and only if π0f : π0H→ π0G is
injective and π1f : π1H→ π1G is an isomorphism. It is an equivalence of groupoids
if and only if both π0f and π1f are isomorphisms.
A crossed module G = [∂ : G1 → G0] consists of a pair of groups G0 and G1,
a group homomorphism ∂ : G1 → G0, and a (right) action of G0 on G1, denoted
−a. This action lifts the conjugation action of G0 on the image of ∂ and descends
the conjugation action of G1 on itself. In other words, the following axioms are
satisfied:
• ∀β ∈ G1,∀a ∈ G0, ∂(βa) = a−1∂(β)a;
• ∀α, β ∈ G1, β∂(α) = α−1βα.
It is easy to see that the kernel of ∂ is a central (in particular abelian) subgroup
of G1; we denote this abelian group by π1G. The image of ∂ is always a normal
subgroup of G0; we denote the cokernel of ∂ by π0G. A morphism of crossed
modules is a pair of group homomorphisms which commute with the ∂ maps and
respect the actions. Such a morphism induces group homomorphisms on π0 and
Crossed modules and morphisms between them form a category, which we denote
by XMod. We have functors π0, π1 : XMod → Gp; the functor π1 indeed lands
in the full subcategory of abelian groups. A morphism in XMod is said to be an
equivalence if it induces isomorphisms on π0 and π1. Note that an equivalence may
not have an inverse.
There is a well-known natural equivalence of categories 2Gp ∼= XMod; see
[No1], §3.3. This equivalence respects the functors π0 and π1. This way, we can
think of a crossed module as a strict 2-group, and vice versa. For this reason, we
may sometimes abuse terminology and use the term (strict) 2-group for an object
which is actually a crossed module; we hope that this will not cause any confusion.
Note that W2Gp contains 2Gp as a full subcategory.
4. 2-groups over a site and group stacks
First a few words on terminology. For us a stack is presheaf of groupoids over a
Grothendieck site (and not a category fibered in groupoids) that satisfies the descent
condition. This may be somewhat unusual for algebraic geometers who are used to
categories fibered in groupoids, but it makes the exposition simpler. Of course, it
is standard that this point of view is equivalent to the one via categories fibered
in groupoids. Just to recall how this equivalence works, to any category fibered in
groupoids X one can associate a presheaf X of groupoids over C which is defined
1Both W2Gp and 2Gp are 2-categories but we will ignore the 2-morphisms for the time being
and only look at the underlying 1-category.
GROUP ACTIONS ON ALGEBRAIC STACKS VIA BUTTERFLIES 5
as follows. By definition, X is the presheaf that assigns to an object U ∈ C the
groupoids X(U) := Hom(U,X), where U stands for the presheaf of sets represented
by U and Hom is computed in the category of stacks over C. Conversely, to any
presheaf of groupoids one associates a category fibered in groupoids defined via the
Grothendieck construction. For more on this we refer the reader to [Ho], especially
§5.2.
4.1. Presheaves of weak 2-groups over a site. Let C be a Grothendieck site.
Let W2GpC be the category of presheaves of weak 2-groups over C; that is, the
category of contravariant functors from C to W2Gp. We define 2GpC and XModC
analogously. There is a natural equivalence of categories 2GpC ' XModC. In
particular, we can think of a presheaf of crossed modules as a presheaf of (strict)
2-groups. Note that W2GpC contains 2GpC as a full subcategory.
Let X be a presheaf of groupoids over C. To X we associate a presheaf of weak 2-
groups AutX ∈W2GpC which parametrizes auto-equivalences of X. By definition,
AutX is the functor that associates to an object U in C the weak 2-group of self-
equivalences of XU , where XU is the restriction of X to the comma category CU .
(The ‘comma category’, or the ‘over category’, CU is the category of objects in C
over U .) Notice that in the case where X is a stack, AutX, viewed as a presheaf of
groupoids, is also a stack. Indeed, AutX is almost a group object in the category of
stacks over C. To be more precise, AutX is a group stack in the sense of Definition
4.1 below.
Let G ∈ W2GpC be a presheaf of weak 2-groups on C. We define π
0 G to
be the presheaf U 7→ π0
, and π0G to be the sheaf associated to π
Similarly, π
1 G is defined to be the presheaf U 7→ π1
, and π1G to be the
sheaf associated to π
We define π0G and π1G for a presheaf of crossed modules G ∈ XModC in a
similar manner. The equivalence of categories between 2GpC and XModC respects
0 , π0, π
1 and π1. Lemma 3.2 remains valid in this setting if instead of π0 and
π1 we use π
0 and π
4.2. Group stacks over C. We recall the definition of a group stack from [Bre].
We modify Breen’s definition by assuming that our group stacks are strictly as-
sociative and have strict units. This is all we will need because the group stack
AutX of self-equivalences of a stack X (indeed, any presheaf of groupoids X) has
this property, and that is all we are concerned with in this paper.
Definition 4.1 ([Bre], page 19). Let C be a Grothendieck site. By a group stack
over C we mean a stack G that is a strict monoid object in the category of stacks
over C and for which weak inverses exist. By a morphism of group stacks we mean
a strict monoidal functor. That is, a morphism of stacks that strictly respects the
monoidal structures. By a weak morphism we mean a weak monoidal functor.
The condition on existence of weak inverses means that for every U ∈ ObC and
every object a in the groupoid G(U), multiplication by a induces an equivalence of
categories from G(U) to itself (or equivalently, an equivalence of stacks from XU to
itself). This condition is equivalent to saying that, for every U ∈ ObC, X(U) is a
weak 2-group. More concisely, it is equivalent to
(pr,mult)
−→ G× G
being an equivalence of stacks.
6 BEHRANG NOOHI
Remark 4.2. It is well known that a weak group stack can always be strictified
to a strict one. So, theoretically speaking, the strictness of monoidal structure in
Definition 4.1 is not restrictive. However, given fixed (strict) group stacks G and H,
strict morphisms H → G are not adequate. We will see in the subsequent sections
that when studying group actions on stacks we can not avoid weak morphisms. In
this section, however, we will only discuss strict morphisms.
Let grStC be the category of group stacks and strict morphisms between them
(Definition 4.1); this is naturally a full subcategory of W2GpC. There are natural
functors
W2GpC → grStC and XModC → grStC.
The former is the stackification functor that sends a presheaf of groupoids to its as-
sociated stack; note that since the stackification functor preserves products, we can
carry over the monoidal structure from a presheaf of groupoids to its stackification.
The latter functor is obtained from the former by precomposing with the natural
fully faithful functor XModC → W2GpC (see the beginning of §4.1). Given a
presheaf of crossed modules [∂ : G1 → G0], the associated group stack has as un-
derlying stack the quotient stack [G0/G1], where G1 acts on G0 by multiplication
on the right (via ∂).
Definition 4.3. Let X be a presheaf of groupoids over C. We define πpreX to be
the presheaf that sends an object U in C to the set of isomorphism classes in X(U).
We denote the sheaf associated to πpreX by πX. For a global section e of X, we
define AutX(e) to be sheaf associated to the presheaf that sends an object U in C
to the group of automorphisms, in the groupoid X(U), of the object eU ; note that
when X is a stack this presheaf is already a sheaf and no sheafification is needed.
Note that when G is the underlying presheaf of groupoids of a presheaf of weak
2-groups G ∈W2GpC, then π0G = πG and π1G = AutG(e), where e is the identity
section of G.
4.3. Equivalences of group stacks. There are two ways of defining the notion
of equivalence between group stacks. One way is to regard them as stacks and use
the usual notion of equivalence of stacks. The other is to regard them as presheaves
of weak 2-groups and use π0 and π1 (see §4.1). The next lemma shows that these
two definitions agree.
Lemma 4.4. Let G and H be group stacks, and let f : H → G be a morphism of
group stacks. Then, the following are equivalent:
(i) f is an equivalence of stacks;
(ii) The induced maps π0f : π
0 H → π
0 G and π1f : π
1 H → π
1 G are
isomorphisms of presheaves of groups;
(iii) The induced maps π0f : π0H → π0G and π1f : π1H → π1G are isomor-
phisms of sheaves of groups.
Proof. The only non-trivial implication is (iii)⇒ (ii). In the proof we will use the
following standard fact from closed model category theory.
Theorem ([Hi], Theorem 3.2.13). Let M be a closed model cate-
gory, L a localizaing class of morphisms in M, and ML the localized
model category. Let X and Y be fibrant objects (i.e., L-local ob-
jects) in ML, and let f : Y→ X be a morphism in M that is a weak
GROUP ACTIONS ON ALGEBRAIC STACKS VIA BUTTERFLIES 7
equivalence in the localized model structure ML (that is, f is an
L-local weak equivalence). Then, f is a weak equivalence in M.
We will apply the above theorem with M being the model structure on the
category GpdC of presheaves of groupoids on C in which weak equivalences are
morphisms that induce isomorphisms (of presheaves of groups) on π
0 and π
and fibrations are objectwise. We take L to be the class of hypercovers. The
weak equivalences in the localized model structure will then be the ones inducing
isomorphism (of sheaves of groups) on π0 and π1. The main reference for this is
[Ho].
Let us now prove (iii)⇒ (ii). It is shown in [Ho] that G and H are L-local objects
(see §5.2 and §7.3 of [ibid.]). By hypothesis, f induces isomorphisms (of sheaves) on
π0 and π1, so it is a weak equivalence in the localized model structure. Therefore,
since G and H are L-local, f is already a weak equivalence in the non-localized model
structure. This exactly means that π0f : π
0 H→ π
0 G and π1f : π
1 H→ π
are isomorphisms of presheaves. �
Lemma 4.5. Let X be a presheaf of groupoids over C and ϕ : X→ Xa its stackifi-
cation. Then, we have the following (see Definition 4.3 for notation):
(i) The induced morphism πX→ π(Xa) is an isomorphism of sheaves of sets;
(ii) For every global section e of X, the natural map AutX(e)→ AutXa(e) is an
isomorphism of sheaves of groups.
Proof. This is a simple sheaf theory exercise. We include the proof of (i). Proof of
(ii) is similar.
First we prove that πϕ : πX → π(Xa) is injective. Let U ∈ ObC, and let x, y
be element in πX(U) such that πϕ(x) = πϕ(y). We have to show that x = y. By
passing to a cover of U , we may assume x and y lift to objects x̄ and ȳ in X(U). We
will show that there is an open cover of U over which x̄ and ȳ become isomorphic.
Since ϕ(x̄) and ϕ(ȳ) become equal in π(Xa), there is a cover {Ui} of U such that
there is an isomorphism αi : ϕ(x̄|Ui) ∼−→ ϕ(ȳ|Ui) in the groupoid Xa(Ui), for every
i. By replacing {Ui} with a finer cover, we may assume that αi come from X(Ui).
(More precisely, αi = ϕ(βi), where βi is a morphism in the groupoid X(Ui).) This
implies that, for every i, x̄|Ui and ȳ|Ui are isomorphic as objects of the groupoid
X(Ui). This is exactly what we wanted to prove.
Having proved the injectivity, to prove the surjectivity it is enough to show that
every object x in π(Xa)(U) is in the image of ϕ, possibly after replacing U by an
open cover. By choosing an appropriate cover, we may assume x lifts to Xa(U).
Since Xa is the stackification of X, we may assume, after refining our cover, that x
is in the image of X(U)→ Xa(U). The claim is now immediate. �
Lemma 4.6. Let G = [G1 → G0] be a presheaf of crossed modules, and let G =
[G0/G1] be the corresponding group stack. Then, we have natural isomorphisms of
sheaves of groups πiG ∼−→ πiG, i = 1, 2.
Proof. Apply Lemma 4.5. �
5. Actions of group stacks
In this section we present an interpretation of an action of a group stack on a
stack in terms of butterflies. We begin with the definition of an action.
8 BEHRANG NOOHI
Definition 5.1. Let X be a stack and G a group stack. By an action of G on X we
mean a weak morphism f : G→ AutX. We say two actions f and f ′ are equivalent
if there is a monoidal transformation ϕ : f → f ′.
In the case where G is a group (over the base site), it is easy to see that our
definition of action is equivalent to Definitions 1.3.(i) of [Ro]. A monoidal trans-
formation ϕ : f → f ′ between two such actions is the same as the structure of a
morphism of G-groupoids (in the sense of Definitions 1.3.(ii) of [Ro]) on the identity
map idX : X→ X, where the source and the target are endowed with the G-groupoid
structures coming from the actions f and f ′, respectively.
5.1. Formulation in terms of crossed modules and butterflies. Butterflies
were introduced in [No3] as a convenient way of encoding weak morphisms between
2-groups (rather, crossed modules representing the 2-groups). The theory was fur-
ther extended in [AlNo1] to the relative case (over a Grothendieck site). We will
use this theory to translate problems about 2-group actions on stacks to certain
group extension problems.
We begin by recalling the definition of a butterfly (see [No3], Definition 8.1 and
[AlNo1], §4.1.3).
Definition 5.2. Let G = [ϕ : G1 → G0] and H = [ψ : H1 → H0] be crossed
modules. By a butterfly from H to G we mean a commutative diagram of groups
H0 G0
in which both diagonal sequences are complexes, and the NE-SW sequence, that
is, G1 → E → H0, is short exact. We require that ρ and σ satisfy the following
compatibility with actions. For every x ∈ E, α ∈ G1, and β ∈ H1,
ι(αρ(x)) = x−1ι(α)x, κ(βσ(x)) = x−1κ(β)x.
A morphism between two butterflies (E, ρ, σ, ι, κ) and (E′, ρ′, σ′, ι′, κ′) is a mor-
phism f : E → E′ commuting with all four maps (it is easy to see that such an f is
necessarily an isomorphism). We define B(H,G) to be the groupoid of butterflies
from H to G.
This definition is justified by the following result (see [No3] and [AlNo1]).
Theorem 5.3. Let G = [G1 → G0] and H = [H1 → H0] be crossed modules
of sheaves of groups over the cite C. Let G = [G0/G1] and H = [H0/H1] be
the corresponding quotient group stacks. Then, there is a natural equivalence of
groupoids
B(H,G) ∼= Homweak(H,G).
Here, the right hand side stands for the groupoid whose objects are weak morphisms
of group stacks and whose morphisms are monoidal transformations.
The above result can be interpreted as follows. A butterfly as in the theorem
gives rise to a canonical zigzag in XModC
∼←− E→ G,
GROUP ACTIONS ON ALGEBRAIC STACKS VIA BUTTERFLIES 9
where E = [H1 ×G1
κ·ι−→ E]. After passing to the associated stacks, it gives rise to
a zigzag in grStC
∼←− E→ G,
which after inverting the left map (as a weak morphism), results in a weak morphism
H→ G. It follows from this description of a butterfly that π0 and π1 are functorial
with respect to butterflies. Furthermore, the equivalence of Theorem 5.3 respects
π0 and π1 (see Lemma 4.6).
When [G1 → G0] is a crossed module model for the group stack AutX of auto-
equivalences of a stack X, then it follows from Theorem 5.3 that an action of
H = [H0/H1] on X is the same thing as an isomorphism class of a butterfly as in
Definition 5.2. In other words, to give an action of H on X, we need to find an
extension E of H0 by G1, together with group homomorphisms κ : H1 → E and
ρ : E → G0 satisfying the conditions of Definition 5.2. This summarizes our strategy
for studying group actions on stacks. To show its usefulness, in the subsequent
sections we will apply this method to the case where X = PS(n0, n1, · · · , nr) is a
weighted projective stack over a base scheme S.
6. Weighted projective general linear 2-groups PGL(n0, n1, ..., nr)
In this section we introduce weighted projective general linear 2-group schemes
and prove that they model self-equivalences of weighted projective stacks (Theorem
6.3).
We begin by some general observations about automorphism 2-groups of quotient
stacks. From now on, we assume that C = SchS is the big site of schemes over
a base scheme S, endowed with a subcanonical topology (say, étale, Zariski, fppf,
fpqc, etc.).
6.1. Automorphism 2-group of a quotient stack. We define a crossed module
in S-schemes [∂ : G1 → G0] to be a pair of S-group schemes G0 and G1, an S-group
scheme homomorphism ∂ : G1 → G0, and a (right) action of G0 on G1 satisfying
the axioms of a crossed module. These are precisely the representable objects in
XModSchS ; in other words, a crossed module in schemes [∂ : G1 → G0] gives rise
to a presheaf of crossed modules
U 7→ [∂(U) : G1(U)→ G0(U)].
We often abuse terminology and call a crossed module in schemes over S simply
a strict 2-group scheme over S.
The following two propositions generalize Lemma 8.2 of [BeNo].
Proposition 6.1. Let S be a base scheme. Let A be an abelian affine group scheme
over S acting on a S-scheme X, and let X = [X/A] be the quotient stack. Let G
be those automorphisms of X which commute with the A action; this is a sheaf of
groups on SchS. We have the following:
(i) With the trivial action of G on A, the natural map ϕ : A → G becomes a
crossed modules in SchS-schemes.
(ii) Let G be the group stack associated to [ϕ : A→ G]. Then, there is a natural
morphism of group stacks G→ AutX. Furthermore, this morphism induces
an isomorphism of sheaves of groups π1G ∼−→ π1(AutX).
10 BEHRANG NOOHI
Proof. Part (i) is straightforward, because ϕ maps A to the center of G. Let G
denote the presheaf of 2-groups associated to [ϕ : A→ G]. To prove part (ii), it is
enough to construct a morphism of presheaves of 2-groups G → AutX and show
that it has the required properties. Stackification of this map gives us the desired
map (Lemma 4.6).
Let us construct the morphism G→ AutX. We give the effect of this morphism
on the sections over S. Since everything commutes with base change, the same
construction works for every U → S in the site SchS and gives rise to the desired
morphism. of presheaves.
To define G(S)→ AutX(S), recall the explicit description of the S-points of the
quotient stack [X/A]:
Ob[X/A](S) =
(T, α) | T an A-torsor over S
α : T → X an A-map
Mor[X/H](S)((T, α), (T
)) = {f : T → T ′ an A-torsor map s.t. α′ ◦ f = α}
Any element of g ∈ G(S) induces an automorphism of X relative to S (keep the
same torsor T and compose α with the action of g on X). Also, for any element
a ∈ A(S), there is a natural 2-isomorphism from the identity automorphism of X
to the automorphism induced by ϕ(a) ∈ G(S) (which is by definition the same as
the action of a). It is given by the multiplication action of a−1 on the torsor T
(remember that A is abelian) which makes the following triangle commute
Interpreted in the language of 2-groups, this gives a morphism of 2-groups
G(S)→ AutX(S).
To prove that G→ AutX induces an isomorphism on π1, we show that, for every
U → S in the site SchS , the morphism of 2-groups G(U) → AutX(U) induces
an isomorphism on π1. Again, we may assume that U = S. We know that the
group of 2-isomorphisms from the identity automorphism of X to itself is naturally
isomorphic to the group of global sections of the inertia stack of X. In the case
X = [X/A], this is naturally isomorphic to the group of elements of A(S) which act
trivially on X. Note that this group is naturally isomorphic to π1G(S). Therefore,
the map G(S)→ AutX(S) induces an isomorphism on π1. �
Proposition 6.2. Notation being as in Proposition 6.1, assume that X is a proper
Deligne-Mumford stack over S, and that X → S has geometrically connected and
reduced fibers. Also, assume that A fits in an extension
0→ A0 → A→ A/A0 → 0
where A/A0 is finite over S and A0 has geometrically connected fibers (this is auto-
matics, for example, in the case where A is smooth and the number of its geometric
connected components is a locally constant function on S). Then, G → AutX is
fully faithful (as a morphism of presheaves of groupoids). In particular, the induced
map π0G→ π0(AutX) of sheaves of groups is injective.
GROUP ACTIONS ON ALGEBRAIC STACKS VIA BUTTERFLIES 11
Proof. As in Proposition 6.1, let G denote the presheaf of 2-groups associated to
[ϕ : A → G]. We need to show that, for every U → S in the site SchS , G(U) →
AutX(U) is fully faithful; since AutX is a stack, it would then follow that the
stackified morphism G→ AutX is also fully faithful.
We may assume that U = S. By Proposition 6.1.(ii) and Lemma 3.2, it is enough
to prove that if the action of g ∈ G(S) on X is 2-isomorphic to the identity, then g
is of the form ϕ(a), for some a ∈ A(S). Let us fix such a 2-isomorphism. The effect
of this 2-isomorphism on the A-torsor on X corresponding the point X → [X/A],
viewed as an object in the groupoid [X/A](X) of X-points of [X/A], is given by
an A-torsor map F : A ×S X → A ×S X which makes the following A-equivariant
triangle commute:
A×S X
µ // X
A×S X
Here, A×S X is the trivial A-torsor on X and µ is the action of A on X.
Precomposing F with the canonical section X → A×S X (corresponding to the
identity element of A) and then projecting onto the first factor, we obtain a map
f : X → A relative to S. The proposition follows from the following.
Claim. The map f is constant, in the sense that it factors through an S-point
a : S → A of A. Furthermore, the effect of a on X (induced from the action of A
on X) is the same as the effect of g on X.
Let us prove the claim. It follows from the commutativity of the above diagram
that, for any point x in X, the effect of g on x is the same as the effect of f(x) on x.2
In other words, f(x)g−1 leaves x fixed. Applying this to ax instead of x, and using
the fact that a and f(x)g−1 commute, we find that f(ax)g−1 also leaves x fixed, for
every a ∈ A. This implies that, for any point x of X, and any a ∈ A, the element
r(a, x) := f(ax)f(x)−1 leaves x fixed. Therefore, the map ρ : A ×S X → A ×S X,
ρ(a, x) := (r(a, x), x) factors through the stabilizer group scheme τ : Σ→ X. Thus,
we have a commutative triangle
A×S X
Now, consider the short exact sequence
0→ A0 → A→ A/A0 → 0,
where A0 is a group scheme over S with geometrically connected fibers and A/A0 is
finite over S. Since τ : Σ→ X has discrete fibers (because X is Deligne-Mumford)
the restriction of ρ to A0 ×S X factors through the identity section. Hence, for
every a ∈ A0 and x ∈ X (over the same point in S), r(a, x) = f(ax)f(x)−1 is the
identity element of A. This implies that f : X → A is A0-equivariant (for the trivial
action of A0 on A). So, we obtain an induced map λ : [X/A0]→ A (relative to S).
Since [X/A0] is finite over [X/A], and [X/A] is proper over S, the structure map
2When we say a “point” of X we mean a scheme T over S and a morphism T → X relative to
12 BEHRANG NOOHI
π : [X/A0]→ S is proper. From our assumptions we have that π has geometrically
connected and reduced fibers. Base change then implies that π∗O[X/A0] = OS .
Since A is affine over S, it follows that λ is constant, i.e., factors through a section
a : S → A. Since f : X → A factors through λ, it also factors through a. By
construction, the effect of a on X is the same as the effect of g on X, which is what
we wanted to prove. �
6.2. Weighted projective general linear 2-groups. Since the construction of
the weighted projective stacks, and also of the weighted projective general linear
2-group schemes, commutes with base change, we can work over Z. We begin with
some notation. We denote the multiplicative group scheme over SpecZ by Gm,Z,
or simply Gm. The affine (r + 1)-space over a base scheme S is denoted by Ar+1S ;
when the base scheme is SpecR it is denoted by Ar+1R , and when the base scheme
is SpecZ simply by Ar+1. Since r will be fixed throughout this section, we will
usually denote Ar+1S − {0} by US . We will abbreviate USpecR and USpecZ to UR
and U, respectively. We fix a Grothendieck topology on SchS that is not coarser
than Zariski.
Let n0, n1, · · · , nr be a sequence of positive integers, and consider the weight
(n0, n1, · · · , nr) action of Gm on U = Ar+1−{0}. (That is, for every scheme T , an el-
ement t ∈ Gm(T ) acts on UT by multiplication by (tn0 , tn1 , · · · , tnr ).) The quotient
stack of this action is called the weighted projective stack of weight (n0, n1, · · · , nr)
and is denoted by PZ(n0, n1, · · · , nr), or simply by P(n0, n1, · · · , nr). The weighted
projective general linear 2-group scheme PGL(n0, n1, · · · , nr) is defined to
be the 2-group scheme associated to the crossed module
[∂ : Gm → Gn0,n1,··· ,nr ],
where Gn0,n1,··· ,nr is the group scheme, over Z, of all Gm-equivariant (for the above
weighted action) automorphisms of U. More precisely, the T -points of Gn0,n1,··· ,nr
are automorphisms
f : UT → UT
that commute with the Gm-action. The homomorphism ∂ : Gm → Gn0,n1,··· ,nr is
the one induced from the Gm-action itself. We take the action of Gn0,n1,··· ,nr on
Gm to be trivial. The associated group stack is denoted by PGL(n0, n1, · · · , nr),
and is called the projective general linear group stack of weight (n0, n1, · · · , nr).
The following theorem says that a weighted projective general linear 2-group
scheme is a model for the group stack of self-equivalences of the corresponding
weighted projective stack. A special case of this theorem (namely, the case where
the base scheme is C) was proved in ([BeNo], Theorem 8.1). We briefly sketch how
the proof in [ibid.] can be modified to cover the general case.
Theorem 6.3. Let AutP(n0, n1, · · · , nr) be the group stack of automorphisms of
the weighted projective stack P(n0, n1, · · · , nr). Then, the natural map
PGL(n0, n1, · · · , nr)→ AutP(n0, n1, · · · , nr)
is an equivalence of group stacks. In particular, we have isomorphisms of sheaves
of groups
π0AutP(n0, n1, · · · , nr) ∼= π0PGL(n0, n1, · · · , nr) ∼= π0 PGL(n0, n1, · · · , nr),
π1AutP(n0, n1, · · · , nr) ∼= π1PGL(n0, n1, · · · , nr) ∼= π1 PGL(n0, n1, · · · , nr) ∼= µd,
GROUP ACTIONS ON ALGEBRAIC STACKS VIA BUTTERFLIES 13
where d = gcd(n0, n1, · · · , nr) and µd stands for the multiplicative group scheme
of dth roots of unity.
In order to prove our main result (Theorem 6.3) we need the following result
about line bundles on weighted projective stacks. For more details on this, the
reader is referred to [No4]. More general results about Picard stacks of algebraic
stacks can be found in [Bro].
Proposition 6.4. Let P = PS(n0, n1, · · · , nr), where S = SpecR is the spectrum
of a local ring. Then every line bundle on P is of the form O(d) for some d ∈ Z.
Proof. In the proof we use stack versions of Grothendieck’s base change and semi-
continuity results ([Ha], III. Theorem 12.11). We will assume that R is Noetherian.
In the case where R is a field, the assertion is easy to prove using the fact that
the Picard group of P is isomorphic to the Weil divisor class group. To prove the
general case, let x be the closed point of S = SpecR. Let L be a line bundle
on P. After twisting with on appropriate O(d), we may assume Lx ∼= O. We
will show that L is trivial. We have H1(Px,Lx) = H
1(Px,Ox) = 0. Hence, by
semicontinuity, H1(Py,Ly) = 0 for every point y of S. Base change implies that
R1f∗(L) = 0, and that R
0f∗(L) = f∗(L) is locally free (necessarily of rank 1).
Therefore, f∗(L) is free of rank 1 and, by base change, H
0(Py,Ly) is 1-dimensional
as a k(y)-vector space, for every y in S. In fact, this is true for every tensor
power L⊗n, n ∈ Z. So, Ly is trivial for every y in S. (Note that, when k is a
field, dimkH
0(Pk(n0, n1, · · · , nr),O(d)) is equal to the number of solutions of the
equation a1n0 + a2n1 + · · ·+ arnr = d in non-negative integers ai.)
Now let s be a generating section of f∗(L) ∼= R. It follows that f∗(s) is a
generating section of L. So L is trivial. �
Proof of Theorem 6.3. We apply Propositions 6.1 and 6.2 with S = SpecZ, X =
Ar+1 − {0}, and H = Gm. This implies that
PGL(n0, n1, · · · , nr)→ AutP(n0, n1, · · · , nr)
is a fully faithful morphism of stacks. That is, for every scheme U , the morphism
of groupoids
PGL(n0, n1, · · · , nr)(U)→ AutP(n0, n1, · · · , nr)(U)
is fully faithful. All that is left to show is that it is essentially surjective. Since
PGL(n0, n1, · · · , nr) and AutP(n0, n1, · · · , nr) are both stacks, it is enough to prove
this for U = SpecR, where R is a local ring. In this case, we know by Proposition
6.4 that PicP(n0, n1, · · · , nr) ∼= Z. We can now proceed exactly as in ([BeNo],
Theorem 8.1).
The isomorphisms stated at the end of the theorem follow from Lemma 4.4 and
Lemma 4.6. �
7. Structure of PGL(n0, n1, ..., nr)
In this section we give detailed information about the structure of the group
Gn0,n1,··· ,nr . We show that, as a group scheme over an arbitrary base, it splits as a
semi-direct product of a reductive group scheme and a unipotent group scheme. The
reductive part is a product of a copies of the general linear groups. The unipotent
part is a successive semi-direct product of vector groups; see Theorem 7.7.
14 BEHRANG NOOHI
Throughout this section, the action of Gm on U = Ar+1−{0} means the weight
(n0, n1, · · · , nr) action. To shorten the notation, we denote the group Gn0,n1,··· ,nr
by G. The rank m general linear group scheme over SpecR is denoted by GL(m,R).
When R = Z, this is abbreviated to GL(m). We always assume r ≥ 1. The
corresponding projectivized group scheme is denoted by PGL(m); this notation
does not conflict with the notation PGL(n0, n1, · · · , nr) for a weighted projective
general linear 2-group as in the latter case we have at least two variables.
We begin with a simple lemma.
Lemma 7.1. Let R be an arbitrary ring, and let f be a global section of the structure
sheaf of UR = Ar+1R − {0}, r ≥ 1. Then f extends uniquely to a global section of
Ar+1R .
Proof. Let Ui = SpecR[x0, · · · , xr, x−1i ] and consider the covering UR = ∪
i=1Ui.
We show that the restriction fi := f |Ui is a polynomial for every i. To see this,
observe that, except possibly for xi, all variables occur with positive powers in fi.
To show that xi also occurs with a positive power, pick some j 6= i and use the fact
that xi occurs with a positive power in fj |Ui∩Uj = fi|Ui∩Uj .
Therefore, for every i, fi actually lies in R[x0, · · · , xr, x−1i ]. Since fj |Ui = fi|Uj ,
it is obvious that all fi are actually the same and provide the desired extension of
f to UR. �
From now on, we will use a slightly different notation with indices. Namely, we
assume that the weights are m1 < m2 < · · · < mt, with each mi appearing exactly
ri ≥ 1 times in the weight sequence (so in the previous notation we would have
r + 1 = r1 + · · · + rt). We denote the corresponding projective general linear 2-
group by PGL(m1 : r1,m2 : r2, · · · ,mt : rt). We use the coordinates xij , 1 ≤ i ≤ t,
1 ≤ j ≤ ri, for Ar+1. We think of xij as a variable of degree mi. We will usually
abbreviate the sequence xi1, · · · , xiri to x
i. Similarly, a sequence F i1, · · · , F iri of
polynomials is abbreviated to Fi.
Let R be a ring. The following proposition tells us how a Gm,R-equivariant
automorphisms of UR looks like.
Proposition 7.2. Let F : UR → UR be a Gm-equivariant map. Then F is of the
form (Fi)1≤i≤t, where for every i, each component F
j ∈ R[x
j ; 1 ≤ i ≤ t, 1 ≤ j ≤ ri]
of Fi is a weighted homogeneous polynomial of weight mi.
Proof. The fact that components of F are polynomial follows from Lemma 7.1. The
statement about homogeneity of F ij is a simple exercise in polynomial algebra and
is left to the reader. �
In the above proposition, each F ij can be written in the form F
j = L
j + P
where Lij is linear in the variables x
1, · · · , xiri , and P
j is a homogeneous polynomial
of degree mi in variables x
b with a < i. Let LF := (L
i)1≤i≤t be the linear part of
F . It is again a Gm-equivariant endomorphism of U.
Proposition 7.3. Let F be as in the Proposition 7.2. The assignment F 7→ LF
respects composition of endomorphisms. In particular, if F is an automorphism,
then so is LF .
Proof. This follows from direct calculation, or, alternatively, by using the fact that
LF is simply the derivative of F at the origin. �
GROUP ACTIONS ON ALGEBRAIC STACKS VIA BUTTERFLIES 15
Corollary 7.4. There is a natural split homomorphism
φ : G→ GL(r1)×GL(r2)× · · · ×GL(rt).
Next we give some information about the structure of the kernel U of φ. It
consists of endomorphisms F = (F ij )i,j , where F
j has the form
F ij = x
j + P
Here, P ij is a homogeneous polynomial of degree mi in variables x
b with a < i.
Indeed, it is easily seen that, for an arbitrary choice of the polynomials P ij , the
resulting endomorphism F is automatically invertible. So, to give such an F ∈ U
is equivalent to giving an arbitrary collection of polynomials {P ij}1≤i≤t,1≤j≤ri such
that each P ij is a homogeneous polynomial of degree mi in variables x
b with a < i.
So, from now on we switch the notation and denote such an element of U by (P ij )i,j .
Proposition 7.5. For each 1 ≤ a ≤ t, let Ua ⊆ U be the set of those endomor-
phisms F = (P ij )i,j for which P
j = 0 whenever i 6= a. Let Ka denote the set of
monomials of degree ma in variables x
j, i < a, and let ka be the cardinality of Ka.
(In other words, ka is the number of solutions of the equation
zi,j = ma
in non-negative integers zi,j.) Then we have the following:
(i) Ua is a subgroup of U and is canonically isomorphic to the vector group
scheme Ara ⊗ AKa ∼= Araka . (Note: U1 is trivial.)
(ii) If a < b, then Ua normalizes Ub.
(iii) The groups Ua, 1 ≤ i ≤ t, generate U and we have Ua ∩ Ub = {1} if a 6= b.
Proof of (i). The action of (P ij )i,j ∈ Ua on A
r+1 is given by
(x1, · · · ,xa, · · · ,xt) 7−→ (x1, · · · ,xa + Pa, · · · ,xt).
So, if AKa stands for the vector group scheme on the basis Ka, there is a canonical
isomorphism
Ua ∼=
AKa ∼= Ara ⊗ AKa .
Proof of (ii). Let G = (Qij)i,j be an element in Ua and F = (P
j )i,j an element in
Ub. By (i), the inverse of G is G
−1 = (−Qij)i,j . Let us analyze the effect of the
composite G ◦ F ◦G−1 on Ar+1:
(x1, · · · ,xa, · · · ,xb, · · · ,xt) G
7−→ (x1, · · · ,xa −Qa, · · · ,xb, · · · ,xt)
F7−→ (x1, · · · ,xa −Qa, · · · ,xb + Rb, · · · ,xt)
G7−→ (x1, · · · ,xa, · · · ,xb + Rb, · · · ,xt).
Here the polynomial Rbk, 1 ≤ k ≤ rb, is obtained from P
k by substituting the
variables xaj with the polynomial x
Proof of (iii). Easy. �
16 BEHRANG NOOHI
Part (ii) implies that each Ua acts by conjugation on each of Ua+1, Ua+2,· · · ,
3 To fix the notation, in what follows we let the conjugate of an automorphism
f by an automorphism g to be g ◦ f ◦ g−1.
Notation. Let {Ua}ta=1 be a family of subgroups of a group U which satisfies the
following properties: 1) Each Ua normalizes every Ub with a < b; 2) No two distinct
Ua intersect; 3) The Ua generate U . In this case, we say that U is a successive
semi-direct product of the Ua, and use the notation U ∼= Ut o · · ·o U2 o U1.
The following is an immediate corollary of Proposition 7.5.
Corollary 7.6. There is a natural decomposition of U as a semi-direct product
U ∼= Ut o · · ·o U2 o U1,
where Ua ∼= Araka is the group introduced in Proposition 7.5. (Note that U1 is
trivial.)
In the next theorem we use the notation Am for two things. One that has already
appeared is the affine group scheme of dimension m. When there is a group scheme
G involved, we also use the notation Am for the trivial representation of G on Am.
Theorem 7.7. There is a natural decomposition of G as a semi-direct product
G ∼= Ut o · · ·o U2 o U1 o
GL(r1)× · · · ×GL(rt)
where Ua ∼= Araka and ka is as in Proposition 7.5. (Note that U1 is trivial.)
Furthermore, for every 1 ≤ a ≤ t, the action of GL(ra) leaves each Ub invariant.
We also have the following:
(i) When a > b the induced action of GL(ra) on Ub is trivial.
(ii) When a = b the induced action of GL(ra) on Ua is naturally isomorphic
to the representation ρ ⊗ AKa , where ρ is the standard representation of
GL(ra) and Ka is as in Proposition 7.5. (Recall that Ua is canonically
isomorphic to Ara ⊗ AKa .)
(iii) When a < b the action of GL(ra) on Ub is naturally isomorphic to the
representation ⊕
0≤l≤bmb
Arbdl ⊗ ρ̂⊗l.
Here ρ̂ stands for the inverse transpose of ρ, and dl is the number of mono-
mials of degree mb in variables x
j, i < b, i 6= a; so dl also depends on a
and b. (In other words, dl is the number of solutions of the equation
i 6=a
zi,j = mb − lma
in non-negative integers zi,j.)
Proof. Let g ∈ GL(ra) and F ∈ Ub. As in the proof of Proposition 7.5.i, we analyze
the effect of the composite g ◦ F ◦ g−1 on Ar+1. The element g ∈ GL(ra) acts
on Ar+1 as follows: it leaves every component xji invariant if i 6= a and on the
coordinates xa1 , · · · , xara it acts linearly (like the action of an ra × ra matrix on a
column vector).
3All group actions in this section are assumed to be on the left.
GROUP ACTIONS ON ALGEBRAIC STACKS VIA BUTTERFLIES 17
Proof of (i). The effect of g ∈ GL(ra) only involves the variables xa1 , · · · , xara and
does not see any other variable, whereas the effect of F ∈ Ub only involves the
variables xij , i ≤ b. Since b < a, these two are independent of each other. That is,
F and g commute.
Proof of (ii). Assume F = (P ij )i,j ; so P
j = 0 if i 6= a. The effect of g ◦ F ◦ g
−1 can
be described as follows:
(x1, · · · ,xa, · · · ,xt) g
7−→ (x1, · · · ,ya, · · · ,xt)
F7−→ (x1, · · · ,ya + Pa, · · · ,xt)
g7−→ (x1, · · · ,xa + Qa, · · · ,xt).
Here, yaj is the linear combination of x
1 , · · · , xara , the coefficients being the en-
tries of the jth row of the matrix g−1. Similarly, Qaj is the linear combination of
P a1 , · · · , P ara , coefficients being the entries of the j
th row of the matrix g.
Proof of (iii). Assume F = (P ij )i,j ; so P
j = 0 if i 6= b. Let y
a be as in (ii). The
effect of g ◦ F ◦ g−1 can be described as follows:
(x1, · · · ,xa, · · · ,xb, · · · ,xt) g
7−→ (x1, · · · ,ya, · · · ,xb, · · · ,xt)
F7−→ (x1, · · · ,ya, · · · ,xb + Rb, · · · ,xt)
g7−→ (x1, · · · ,xa, · · · ,xb + Rb, · · · ,xt).
Here the polynomials Rbk, 1 ≤ k ≤ rb, are obtained from P
k by substituting the
variable xaj with y
Let λ be the representation of GL(ra) on the space V of homogenous polynomials
of degree mb which acts as follows: it takes a polynomial P ∈ V and substitutes
the variables xaj , 1 ≤ j ≤ ra, with y
j . From the description above, we see that the
representation of GL(ra) on Ub is a direct sum of rb copies of λ. We will show that
0≤l≤bmb
Adl ⊗ ρ̂⊗l.
To obtain the above decomposition, simply note that a polynomial in V can be
uniquely written in the form ∑
0≤l≤bmb
SlTl,
where Tl is a homogenous polynomial of degree lma in variables x
1 , · · · , xara , and
Sl is a homogenous polynomial of degree mb− lma in the rest of the variables. The
action of GL(ra) leaves Sl intact and acts on Tl by the l
th power of the inverse
transpose of the standard representation. �
The actions of various pieces in the above semi-direct product decomposition,
though explicit, are tedious to write down, except for small values of t. We give
some examples in §8.
Let us denote Ut o · · ·o U2 o U1 by U and define the crossed module
PGL(n0, n1, · · · , nr)red := [∂ : Gm → GL(r1)× · · · ×GL(rt)],
where the kth factor of ∂(λ) is the size rk scalar matrix λ
mk . Theorem 7.7 then
implies that
PGL(n0, n1, · · · , nr) ∼= U o PGL(n0, n1, · · · , nr)red.
18 BEHRANG NOOHI
We think of U as the unipotent radical and PGL(n0, n1, · · · , nr)red as the reductive
part of PGL(n0, n1, · · · , nr).
Remark 7.8. It is perhaps useful to put the above result in the general context of
algebraic group theory. Recall that every algebraic group G over a field fits in a
short exact sequence
1→ U → G→ Gred → 1,
where U is the unipotent radical of G and Gred is reductive. The sequence is not
split in general. In our case, the group scheme Gn0,n1,··· ,nr admits such a short
sequence over an arbitrary base and, furthermore, the sequence is split.
The general theory of unipotent groups tells us that any unipotent group over
a perfect field admits a filtration whose graded pieces are vector groups. This
filtration splits, but only in the category of schemes (i.e., the splitting maps may
not be group homomorphisms). In our case, however, the group scheme U admits
such a filtration over an arbitrary base. Furthermore, the filtration is split group
theoretically.
8. Some examples
In this section we look at some explicit examples of weighted projective general
linear 2-groups.
Example 8.1. Weight sequence m < n,m - n. In this case we have t = 2, and
r1 = r2 = 1 and k1 = 0. So G ∼= Gm ×Gm.
Example 8.2. Weight sequence m < n,m | n. In this case we have t = 2, r1 = r2 =
1, and k1 = 1. So we have
G ∼= Ao (Gm ×Gm).
The action of an element (λ1, λ2) ∈ Gm ×Gm on an element a ∈ A is given by
(λ1, λ2) · a = λ2λ
More explicitly, an element in G is map of the form
(x, y) 7→ (λ1x, λ2y + ax
Note the similarity with the group of 2× 2 lower-triangular matrices.
Example 8.3. Weight sequence n = m. We obviously have G ∼= GL(2).
Example 8.4. Weight sequence 1, 2, 3. First we determine U . A typical element in
U is of the form
(x, y, z) 7→ (x, y + ax2, z + bx3 + cxy).
We have U2 = A and U3 = A2. The action of an element a ∈ U2 on an element
(b, c) ∈ U3 is given by (b− ac, c). That is, a acts on U3 = A2 by the matrix(
So, U ∼= A⊕2 oA. Finally, we have
G ∼= U o (Gm)3 = A⊕2 oAo (Gm)3,
where the action of an element (λ1, λ2, λ3) ∈ (Gm)3 on an element (a, b, c) ∈ U is
given by (λ−21 λ2a, λ
1 λ3b, λ
2 λ3c).
GROUP ACTIONS ON ALGEBRAIC STACKS VIA BUTTERFLIES 19
Example 8.5. Weight sequence 1, 2, 4. An element in U has the general form
(x, y, z) 7→ (x, y + ax2, z + bx4 + cx2y + dy2).
We have U2 = A and U3 = A3. The action of an element a ∈ U2 on an element
(b, c, d) ∈ U3 is given by the matrix
 1 −a a20 1 −2a
0 0 1
So, U ∼= A⊕3 oA.
Finally, we have
G ∼= U o (Gm)3 = A⊕3 oAo (Gm)3,
where the action of an element (λ1, λ2, λ3) ∈ (Gm)3 on an element (a, b, c, d) ∈ U
is given by
(λ−21 λ2a, λ
1 λ3b, λ
2 λ3c, λ
2 λ3d).
Next we look at PGL(m1 : r1,m2 : r2, · · · ,mt : rt). Recall that, as a crossed
module, this is given by [∂ : Gm → G], where ∂ is the obvious map coming from
the action of Gm on Ar+1, and the action of G on Gm is the trivial one.
Observe that the map ∂ factors though the component GL(r1)× · · · ×GL(rt) of
G. So, let us define L to be the cokernel of the following map:
r1︷ ︸︸ ︷
m1 , . . . , λ
,·····,
rt︷ ︸︸ ︷
mt , . . . , λ
) // GL(r1)× · · · ×GL(rt).
From Theorem 7.7 we immediately obtain the following.
Proposition 8.6. Let L be the group defined in the previous paragraph, and let
k = gcd(m1, · · · ,mt). We have natural isomorphisms of group schemes
π0 PGL(m1 : r1,m2 : r2, · · · ,mt : rt) ∼= Ut o · · ·o U2 o U1 o L,
π1 PGL(m1 : r1,m2 : r2, · · · ,mt : rt) ∼= µk.
Our final result is that, if all weights are distinct (that is, ri = 1), then the
corresponding projective general linear 2-group is split.
Proposition 8.7. Let {m1, · · · ,mt} be distinct positive integers, and consider the
projective general linear 2-group PGL(m1,m2, · · · ,mt). Then, the projection map
G → π0 PGL(m1,m2, · · · ,mt) splits. In particular, PGL(m1,m2, · · · ,mt) is split.
That is, it is completely classified by its homotopy group schemes:
π0 PGL(m1, · · · ,mt) ∼= Ut o · · ·o U2 o U1 o (Gm)t−1,
π1 PGL(m1, · · · ,mt) ∼= µk.
Proof. By Theorem 7.7 and Proposition 8.6 we know that G ∼= Ut o · · · o U2 o
U1 o (Gm)t and π0 PGL(m1,m2, · · · ,mt) ∼= Ut o · · ·oU2 oU1 oL, where L is the
cokernel of the map
α : Gm
(λm1 ,··· ,λmt ) // (Gm)t.
So it is enough to show that the image of µ is a direct factor. Note that if we
divide all the mi by their greatest common divisor, the image of α does not change.
20 BEHRANG NOOHI
So, we may assume gcd(m1, · · · ,mt) = 1. Let M be a t × t integer matrix whose
determinant is 1 and whose first column is (m1, · · · ,mt). The matrix M gives rise
to an isomorphism µ : (Gm)t → (Gm)t whose restriction to the subgroup Gm ×
{1}t−1 ∼= Gm is naturally identified with α. The subgroup µ({1} × (Gm)t−1) ⊂
(Gm)t is the desired complement of the image of α. �
Corollary 8.8. Let m, n be distinct positive integers, and let k = gcd(m,n). Then
PGL(m,n) is a split 2-group. That is, it is classified by its homotopy groups:
π0 PGL(m,n) ∼=
Gm, if m < n,m - n
AoGm, if m < n,m | n
π1 PGL(m,n) ∼= µk.
(In the case m | n, the action of Gm on A in the cross product A o Gm is simply
the multiplication action.)
Proof. Everything is clear, except perhaps a clarification is in order regarding the
parenthesized statement. Observe that the Gm appearing in the cross product
AoGm is indeed the cokernel of the map
α : Gm
(λm,λn) // (Gm)2,
which is naturally identified with the subgroup {1} ×Gm ⊂ (Gm)2. Therefore, by
the formula of Example 8.2, the action of an element λ ∈ Gm on an element a ∈ A
is given by λa. �
Finally, for the sake of completeness, we include the following.
Proposition 8.9. The 2-group PGL(k, k, · · · , k), k appearing t times, is given by
the following crossed module:
(λk,··· ,λk)// GL(t)].
We have π0 PGL(k, · · · , k) ∼= PGL(t) and π1 PGL(k, · · · , k) ∼= µk. In particular,
PGL(1, 1, · · · , 1), 1 appearing t times, is equivalent to the group scheme PGL(t).
9. 2-group actions on weighted projective stacks
In this section we combine the method developed in §3–§5 with the results about
the structure of AutP(n0, n1, · · · , nr) to study 2-group actions on a weighted pro-
jective stack P(n0, n1, · · · , nr). The goal is to illustrate how one can classify 2-group
actions using butterflies and how to describe the corresponding quotient 2-stacks.
Below, all group scheme are assumed to be flat and of finite presentation over a
fixed base S.
Let H be a group stack and [ψ : H1 → H0] a presentation for it as a crossed
module in schemes. By Theorem 5.3, to give an action of H on P(n0, n1, · · · , nr)
is equivalent to giving a butterfly diagram
}} ρ &&
H0 Gn0,n1,··· ,nr
GROUP ACTIONS ON ALGEBRAIC STACKS VIA BUTTERFLIES 21
In other words, to give an action of H on X, we need to find a central extension
1→ Gm → E → H0 → 1
of H0 by Gm, together with
• a lift κ : H1 → E of ψ to E such that κ(βσ(x)) = x−1κ(β)x, for every
β ∈ H1 and x ∈ E;
• an extension of the weighted action of Gm to a linear action of E on Ar+1
which is trivial on the image of κ.
The following result is more or less immediate from the above description of a
group action.
Theorem 9.1. Let k be a field and H a connected linear algebraic group over k,
assumed to be reductive if char k > 0. Let X = P(n0, n1, · · · , nr) be a wighted
projective stack over k. Suppose that Pic(H) = 0. Then, every action of H on X
lifts to an action of H on Ar+1 via a homomorphism H → Gn0,n1,··· ,nr .
Proof. An action of H on X lifts to Ar+1 if and only if the corresponding butterfly
is equivalent to a strict one. By ([AlNo1], Proposition 4.5.3) this is equivalent to
the central extension
1→ Gm → E → H → 1
being split. By ([C-T], Corollary 5.7) such central extensions are classified by
Pic(H), which in our case is assumed to be trivial. Any choice of a splitting
amounts to a lift of the action of H to Ar+1. �
This result is essentially saying that the obstruction to lifting the H-action from
P(n0, n1, · · · , nr) to Ar+1 lies in Pic(H).
9.1. Description of the quotient 2-stack. Given an action of a group stack H on
the weighted projective stack P(n0, n1, · · · , nr), we can use the associated butterfly
to get information about the quotient 2-stack [P(n0, n1, · · · , nr)/H]. First, notice
that [κ : H1 → E] is also a crossed module in schemes. Denote the associated group
stack by H′. It acts on P(n0, n1, · · · , nr) via ρ. We have
[P(n0, n1, · · · , nr)/H] ∼= [(Ar+1 − {0})/H′].
Note that the right hand side is the quotient stack of the action of a group stack on
an honest scheme, namely, Ar+1 − {0}. It is easy to describe its 2-stack structure
by looking at cohomologies of the NW-SE sequence
κ−→ E ρ−→ Gn0,n1,··· ,nr
of the butterfly. Set
[P(n0, n1, · · · , nr)/H]1 := [(Ar+1 − {0})/ cokerκ],
[P(n0, n1, · · · , nr)/H]0 := [(Ar+1 − {0})/ im ρ].
(Here, cokerκ and im ρ are the sheaf theoretic cokernel and image of the corre-
sponding maps which we assume are representable.) Then, [P(n0, n1, · · · , nr)/H]1
is the best approximation of [P(n0, n1, · · · , nr)/H] by a 1-stack, in the sense that it
is obtained by killing off the 2-automorphisms of the 2-stack [P(n0, n1, · · · , nr)/H].
More precisely, there is a natural map
[P(n0, n1, · · · , nr)/H]→ [P(n0, n1, · · · , nr)/H]1
22 BEHRANG NOOHI
making the former a 2-gerbe over the latter for the 2-group [kerκ→ 1]. Similarly,
[P(n0, n1, · · · , nr)/H]0 is an orbifold (i.e., a Deligne-Mumford stack which is gener-
ically a scheme) obtained by quotienting out the generic 1-automorphisms. More
precisely, there is a natural map
[P(n0, n1, · · · , nr)/H]1 → [P(n0, n1, · · · , nr)/H]0
making the former a gerbe over the latter for the group ker ρ/ imκ (namely, the
middle cohomology of the NW-SE sequence).
It follows that the quotient 2-stack [P(n0, n1, · · · , nr)/H] is equivalent to a 1-
stack if and only if κ is injective; it is an orbifold if and only if the NW-SE sequence
is left exact.
Example 9.2. Suppose that H is an honest group scheme and denote it by H.
Then, to give an action of H on P(n0, n1, · · · , nr) is equivalent to giving a central
extension
1→ Gm → E → H → 1
of H by Gm, together with a linear action of E on Ar+1 extending the weighted
action of Gm. We have
[P(n0, n1, · · · , nr)/H] ∼= [(Ar+1 − {0})/E],
which is an honest 1-stack.
Example 9.3. Suppose that H is the group stack associated to [A → 1], where A
is an abelian group scheme. We rename H to A[1]. Then, to give an action of
A[1] on P(n0, n1, · · · , nr) is equivalent to giving a character κ : A → µd ⊂ Gm,
where d is the greatest common divisor of (n0, n1, · · · , nr). The quotient 2-stack
[P(n0, n1, · · · , nr)/A[1]] is a 1-stack if and only if κ is injective. Assume this to
be the case and identify A with the corresponding subgroup of µd. Then, roughly
speaking, the quotient stack [P(n0, n1, · · · , nr)/A[1]] is obtained by killing the A
in µd ⊆ Ix at every inertia group Ix of P(n0, n1, · · · , nr). (Note that the generic
inertia group of P(n0, n1, · · · , nr) is µd.) For example, if the base is an algebraically
closed field of characteristic prime to d, then
[P(n0, n1, · · · , nr)/A[1]] ∼= P(
, · · · ,
where a is the order of A.
References
[AlNo1] E. Aldrovandi, B. Noohi, Butterflies I: morphisms of 2-group stacks, Adv. Math. 221
(2009), no. 3, 687–773.
[BeNo] K. Behrend, B. Noohi, Uniformization of Deligne-Mumford analytic curves, J. reine
angew. Math. 599 (2006), 111–153.
[Bre] L. Breen, On the classification of 2-gerbes and 2-stacks, Astérisque No. 225, 1994.
[Bro] S. Brochard, Foncteur de Picard d’un champ algébrique, Math. Ann. 343 (2009), no. 3,
541–602.
[C-T] J-L. Colliot-Thélène, Résolutions flasques des groupes linéaires connexes, J. reine angew.
Math. 618 (2008), 77–133.
[Ha] R. Hartshorne, Algebraic geometry, Graduate Texts in Mathematics, No. 52, Springer-Verlag,
New York-Heidelberg, 1977.
[Hi] P. S. Hirschhorn, Model categories and their localizations, Mathematical Surveys and Mono-
graphs, 99, American Mathematical Society, Providence, RI, 2003.
[Ho] Sh. Hollander, A homotopy theory for stacks, math.AT/0110247.
GROUP ACTIONS ON ALGEBRAIC STACKS VIA BUTTERFLIES 23
[No1] B. Noohi, Notes on 2-groupoids, 2-groups and crossed modules, Homotopy, Homology, and
Applications, 9 (2007), no. 1, 75–106.
[No2] , Group cohomology with coefficients in a crossed module, J. Inst. Math. Jussieu, 10
(2011), no. 2, 359–404.
[No3] , On weak maps between 2-groups, preprint, arXiv:math/0506313v3 [math.CT].
[No4] , Picard stack of a weighted projective stack, preprint available at
http://www.maths.qmul.ac.uk/∼noohi/research.html.
[Ro] M. Romagny, Group actions on stacks and applications, Michigan Math. J. 53, Issue 1
(2005), 209–236.
	1. Introduction
	2. Notation and terminology
	3. Review of 2-groups and crossed modules
	4. 2-groups over a site and group stacks
	4.1. Presheaves of weak 2-groups over a site
	4.2. Group stacks over C
	4.3. Equivalences of group stacks
	5. Actions of group stacks
	5.1. Formulation in terms of crossed modules and butterflies
	6. Weighted projective general linear 2-groups PGL(n0,n1,...,nr)
	6.1. Automorphism 2-group of a quotient stack
	6.2. Weighted projective general linear 2-groups
	7. Structure of PGL(n0,n1,...,nr)
	8. Some examples
	9. 2-group actions on weighted projective stacks
	9.1. Description of the quotient 2-stack
	References
ABSTRACT
  We introduce an explicit method for studying actions of a group stack G on an
algebraic stack X. As an example, we study in detail the case where
X=P(n_0,...,n_r) is a weighted projective stack over an arbitrary base S. To
this end, we give an explicit description of the group stack of automorphisms
of, the weighted projective general linear 2-group PGL(n_0,...,n_r). As an
application, we use a result of Colliot-Thelene to show that for every linear
algebraic group G over an arbitrary base field k (assumed to be reductive if
char(k)>0) such that Pic}(G)=0, every action of G on P(n_0,...,n_r) lifts to a
linear action of G on A^{r+1}.

<|endoftext|><|startoftext|>
Modal Extraction in Spatially Extended Systems
Kapilanjan Krishan
Department of Physics and Astronomy, University of California - Irvine, Irvine, California 92697
Andreas Handel
Department of Biology, Emory University, Atlanta, Georgia 30322
Roman O. Grigoriev and Michael F. Schatz
Center for Nonlinear Science and School of Physics,
Georgia Institute of Technology, Atlanta, Georgia 30332
(Dated: August 10, 2021)
We describe a practical procedure for extracting the spatial structure and the growth rates of
slow eigenmodes of a spatially extended system, using a unique experimental capability both to
impose and to perturb desired initial states. The procedure is used to construct experimentally the
spectrum of linear modes near the secondary instability boundary in Rayleigh-Bénard convection.
This technique suggests an approach to experimental characterization of more complex dynamical
states such as periodic orbits or spatiotemporal chaos.
PACS numbers:
Numerous nonlinear nonequilibrium systems in nature
and in technology exhibit complex behavior in both space
and time ; understanding and characterizing such behav-
ior (spatiotemporal chaos) is a key unsolved problem in
nonlinear science [1]. Many such systems are modelled
by partial differential equations; hence, in principle, their
dynamics takes place in an infinite dimensional phase
space. However, dissipation often acts to confine these
systems’ asymptotic behavior to finite-dimensional sub-
spaces known as invariant manifolds [2]. Knowledge of
the invariant manifolds provides a wealth of dynamical
information; thus, devising methodologies to determine
invariant manifolds from experimental data would signif-
icantly advance understanding of spatiotemporal chaos.
In this Letter, we describe experiments in Rayleigh-
Bénard convection where several slow eigenmodes and
their growth rates associated with instability of roll states
are extracted quantitatively. Rayleigh-Bénard convec-
tion (RBC) serves well as a model spatially extended sys-
tem; in particular, the spiral defect chaos (SDC) state in
RBC is considered an outstanding example of spatiotem-
poral chaos. In SDC the spatial structure is primarily
composed of curved but locally parallel rolls, punctuated
by defects (Fig. 1) [3, 4]. The recurrent formation and
drift of defects in SDC is believed to play a key role in
driving spatiotemporal chaos; moreover, many aspects of
defect nucleation in SDC are related to defect formation
observed at the onset of instability in patterns of straight,
parallel rolls in RBC [10]. We obtain experimentally a
low-dimensional description of the modes responsible for
the nucleation of one important class of defects (disloca-
tions), by first imposing reproducibly a linearly stable,
straight roll state (stable fixed point) near instability on-
set. This state is subsequently subjected to a set of dis-
tinct, well-controlled perturbations, each of which initi-
FIG. 1: Shadowgraph visualization reveals spontaneous de-
fect nucleation in the spiral defect chaos state of Rayleigh-
Benard convection. Two convection rolls are compressed to-
gether (higher contrast region in left image). (b.) A short
time later (right image), one of the rolls pinches off and two
dislocations form.
ates a relaxational trajectory from the disturbed state to
the (same) fixed point. An ensemble of such trajectories
is used to construct a suitable basis for describing the em-
bedding space by means of a modified Karhunen-Loeve
decomposition. The dynamical evolution of small distur-
bances is then characterized by computing both finite-
time Lyapunov exponents and the spatial structure of
the associated eigenmodes (a similar approach was car-
ried out numerically by Egolf et al. [5]). This capability
is an important step toward developing a systematic way
of characterizing and, perhaps, controlling, spatiotempo-
rally chaotic states like SDC where localized “pivotal”
events like defect formation play a central role in driving
complex behavior.
The convection experiments are performed with
gaseous CO2 at a pressure of 3.2 MPa. A 0.697±0.06
mm-thick gas layer is contained in a 27 mm square cell,
which is confined laterally by filter paper. The layer
http://arxiv.org/abs/0704.1011v1
FIG. 2: Experimental images illustrate the flow response to
two different perturbations applied, in turn, to the same state
of straight convection rolls. Each image represents the dif-
ference between the perturbed and unperturbed convection
states and therefore, each image highlights the effect of a
given perturbation on the flow. In the two cases shown, the
localized perturbation is applied directly on a region of either
downflow (left image) or upflow (right image). In all cases,
the disturbance created by the perturbation decays away and
the flow returns to the original unperturbed state.
is bounded on top by a sapphire window and on the
bottom by a sheet of 1 mm-thick glass neutral den-
sity filter(NDF). The neutral density filter is bonded to
a heated metal plate with heat sink compound. The
temperature of the sapphire window held constant at
21.3 ◦C by water cooling. The temperature difference
between the top and bottom plates ∆T is held fixed
at 5.50 ± 0.01 ◦C by computer control of a thin film
heater attached to the bottom metal plate. These condi-
tions correspond to a dimensionless bifurcation parame-
ter ǫ=(∆T−∆Tc)/∆Tc = 0.41, where ∆Tc is the temper-
ature difference at the onset of convection. The vertical
thermal diffusion time, computed to be 2.1 s at onset,
represents the characteristic timescale for the system.
We use laser heating to alter the convective patterns
that occur spontaneously. A focused beam from an Ar-
ion laser is directed through the sapphire window at
a spot on the NDF. Absorption of the laser light by
the NDF increases the local temperature of the bottom
boundary and hence that of the gas, thereby inducing
locally a convective upflow. The convection pattern may
be modifed, either locally or globally, by rastering the
hot spot created by the laser beam. The beam is steered
using two galvanometric mirrors rotating about axes or-
thogonal to each other under computer control. The in-
tensity of the beam is modulated using an acousto-optic
modulator. This technique of optical actuation is used
to impose convection patterns with desired properties,
to perturb these convection patterns and to change the
boundary conditions. Similar approaches for manipulat-
ing convective flows were explored earlier using a high
intensity lamp and masks [11] in RBC and a rastered
infrared laser in Bénard-Marangoni convection [12].
The experiments begin by using laser heating to im-
pose a well-specified basic state of stable straight rolls.
The basic state is typically arranged to be near the on-
set of instability by imposing a sufficiently large pattern
wavenumber such that at fixed ǫ the system’s parame-
ters are near the skew-varicose stability boundary [10].
In this regime, the modes responsible for the instability
are weakly damped and, therefore, can be easily excited.
The linear stability of the basic state is probed by ap-
plying brief pulses of spatially localized laser heating. For
stable patterns, all small disturbances eventually relax.
To excite all modes governing the disturbance evolution,
we apply a set of localized perturbations consistent with
symmetries of the (ideal) straight roll pattern – continu-
ous translation symmetry in the direction along the rolls
and discrete translation symmetry in the perpendicular
direction plus the reflection symmetry in both directions.
Therefore, localized perturbations applied across half a
wavelength of the pattern form a ”basis” for all such per-
turbations – any other localized perturbation at a differ-
ent spatial location is related by symmetry. Localized
perturbations are produced in the experiment by aiming
the laser beam to create a “hot spot” whose location is
stepped from the center of a (cold) downflow region to
the center of an adjacent (hot) upflow region in differ-
ent experimental runs. The perturbations typically last
approximately 5 s and have a lateral extent of approxi-
mately 0.1 mm, which is less than 10 % of the pattern
wavelength.
The evolution of the perturbed convective flow is mon-
itored by shadowgraph visualization. A digital camera
with a low-pass filter (to filter out the reflections from the
Ar-ion laser) is used to capture a sequence of 256× 256
pixel images recorded with 12 bits of intensity resolution
at a rate of 41 images per second. A background im-
age of the unperturbed flow is subtracted from each data
image; such sequences of difference images comprise the
time series representing the evolution of the perturbation
(Fig 2).
The total power for each (difference) image in a time
series is obtained from 2-D spatial Fourier transforms.
The resulting time series of total power shows a strong
transient excursion (corresponding to the initial response
of the convective flow to a localized perturbation by laser
heating) followed by exponential decay as the system re-
laxes back to the stable state of straight convection rolls.
We restrict further analysis to the region of exponential
decay, which typically represents about 3.5 seconds of
data for each applied perturbation.
The dimensionality of the raw data is too high to
permit direct analysis, so each difference image is first
windowed (to avoid aliasing effects) and Fourier filtered
by discarding the Fourier modes outside a 31 × 31 win-
dow centered at the zero frequency. The discarded high-
frequency modes are strongly damped and contain less
than 1% of the total power. The basis of 312 Fourier
modes still contains redundant information, so we fur-
KL mode 1 KL mode 2 
KL mode 3 KL mode 4 
FIG. 3: The first four Karhunen-Loeve eigenvectors are shown
for a perturbed roll state near the skew-varicose boundary of
Rayleigh-Bénard convection. The eigenvectors are ordered by
their eigenvalues (largest to smallest), which are propotional
to the amount of power contained in the corresponding eigen-
vector.
ther reduce the dimensionality of the embedding space
by projecting the disturbance trajectories onto the “opti-
mal” basis constructed using a variation of the Karhunen-
Loeve (KL) decomposition [6, 7]. The correlation matrix
C is computed using the Fourier filtered time series xs(t),
(xs(t)− 〈xs(t)〉t)(x
s(t)− 〈xs(t)〉t)
†, (1)
where the index s labels different initial conditions and
the origin of time t = 0 corresponds to the time when
the perturbation applied by the laser is within the linear
neighbourhood of the statioary state. The angle brack-
ets with the subscript t indicate a time average. The
eigenvectors of C are the KL basis vectors. It is worth
noting that the average performed to compute C rep-
resents an ensemble average over different initial condi-
tions (obtained by applying different perturbations); this
is distinctly different from the standard implementation
of KL decomposition where statistical time averages are
typically employed.
The spatial structures of the first four KL vectors are
shown in Fig. 3. We find that the first 24 basis vectors
capture over 90% of the total power, so an embedding
space spanned by these vectors represents well the relax-
ational dynamics about the straight roll pattern. In our
convection experiments, the KL eigenvectors show two
distinct length scales. The first two dominant vectors
KL basis vector 1
FIG. 4: A two-dimensional projection of the experimental
time series (symbols) and the least squares fits (continuous
curves). The time series have been shifted such that the fixed
point is at the origin.
are spatially localized, while the remaining vectors are
spatially extended. This is consistent with earlier work
as suggested in [4].
More quantitative information can be obtained by find-
ing the eigenmodes of the system, excited by the pertur-
bation, and their growth rates. These can be extracted
from a nonlinear least squares fit with the cost function
i,s,t
i (t)−
i (∞) +
, (2)
where xsi (t) is a projection of the perturbation at time t
in the time series s onto the ith KL basis vector. In the fit
k and λk are the kth eigenmode and its growth rate and
Ask is the initial amplitude of the kth eigenmode excited
in the experimental time series s. The fixed points xs(∞)
are chosen to be different for the differing time series in
the ensemble to account for a slow drift in the parameters
and we assume that only n eigenmodes are excited.
The results for an ensemble of time series correspond-
ing to seven point perturbations applied across a wave-
length of the pattern with n = 6 are shown in Figs. 4-5.
(With seven different initial conditions we cannot hope
to distinguish more than seven different modes). In par-
ticular, Fig. 4 shows the projection of the experimental
time series and the least squares fit on the plane spanned
by the first two KL basis vectors. Such extraction of
the linear manifold in experiments on spatially extended
systems without the knowledge of the dynamical equa-
tions of the system aids in the application of techniques
that are well developed for low dimensional systems. The
manifolds of fixed points and periodic orbits are of par-
ticular interest in chaotic systems.
The extracted growth rates λk are shown in Fig. 5.
Not surprisingly, since the pattern is stable the growth
rates are negative. The leading eigenmode (see Fig. 6) is
spatially extended and shows a diagonal structure charac-
1 2 3 4 5 6
Mode number
FIG. 5: The growth rates of the six dominant eigenmodes
and the error bars extracted from the least squares fit. The
growth rates have been non-dimensionalized by the vertical
thermal diffusion time.
teristic of the skew-varicose instability in an unbounded
system. This is also expected as the pattern is near the
skew-varicose instability boundary. The second eigen-
mode is spatially localized and has no analog in spatially
unbounded systems. The subsequent modes are again
spatially delocalized and likely correspond to the Gold-
stone modes of the unbounded system (e.g., overall trans-
lation of the pattern) which are made weakly stable due
to confinement by the lateral boundaries of the convec-
tion cell.
If the system is brought across the stability boundary,
one of the modes is expected to become unstable (with-
out significant change in its spatial structure), thereby
determining further (nonlinear) evolution of the system
towards a state with a pair of dislocation defects. We
would also expect the spatially localized eigenmodes (like
the second one in Fig. 6) to preserve their structure if
the base state is smoothly distorted (as it would be, e.g.,
in the SDC state shown in Fig. 1), indicating the same
type of a spatially localized instability. Our further ex-
perimental studies will aim to confirm these expectations.
Defects represent a type of “coherent structure” in
spiral defect chaos. Previous efforts have used coher-
ent structures to characterize spatiotemporally chaotic
extended systems in both models [7] and experiments
[8]; the use of coherent structures to parametrize the in-
variant manifold was pioneered by Holmes et al. [6] in
the context of turbulence. In practice coherent struc-
tures are usually extracted using the Karhunen-Loéve
(or proper orthogonal) decomposition of time series of
system states, which picks out the statistically impor-
tant patterns. This prior work has met with only limited
success – indeed, it is unclear whether statistically im-
portant patterns are dynamically important. An alterna-
tive approach has been proposed by Christiansen et al.
Eigenmode 1 Eigenmode 2 
Eigenmode 3 Eigenmode 4 
FIG. 6: Four dominant eigenmodes extracted from the least
squares fit.
[9], who suggested instead to use the recurrent patterns
corresponding to the low-period unstable periodic orbits
(UPO) of the system, which are dynamically more impor-
tant. Our work sets the stage for attempting the more
ambitious task of extraction of UPOs and their stability
properties from experimental data.
Summing up, we have developed an experimental tech-
nique which allows extraction of quantitative information
describing the dynamics and stability of a pattern form-
ing system near a fixed point. This technique should be
applicable to a broad class of patterns, including unstable
fixed points, periodic orbits and segments of chaotic tra-
jectories. Moreover, we expect that a similar approach
could be applied to other pattern forming systems, con-
vective or otherwise, as long as a method of spatially
distibuted actuation of their state can be devised.
[1] M. C. Cross and P. C. Hohenberg, Rev. Mod. Phys. 65,
851-1112 (1993)
[2] P. Manneville, Dissipative structures and weak turbulence
(Academic Press, 1993).
[3] S. W. Morris, E. Bodenschatz, D. S. Cannel and G.
Ahlers, Phys. Rev. Lett. 71, 2026-2029 (1993)
[4] D. A. Egolf, E. V. Melnikov and E. Bodenschatz, Phys.
Rev. Lett. 80, 3228-3231 (1998).
[5] D. A. Egolf, I. V. Melnikov, W. Pesch and R. E. Ecke,
Nature 404, 733-736 (2000)
[6] P. J. Holmes, J. L. Lumley, and G. Berkooz, Turbu-
lence, coherent structures, dynamical systems and sym-
metry (Cambridge University Press, 1996).
[7] L. Sirovich, Physica A 37, 126 (1989)
[8] F. Qin, E. E. Wolf and H. C. Chang, Phys. Rev. Lett.
72(10), 1459 (1994)
[9] F. Christiansen, P. Cvitanovic and V. Putkaradze, Non-
linearity 10, 55-70 (1997).
[10] F. H. Busse, J. Math. Phys. 46, 140 (1967)
[11] M. M. Chen and J. A. Whitehead, J. Fluid Mech. 31, 1
(1968); F. H. Busse and J. A. Whitehead, J. Fluid Mech.
47, 305 (1971).
[12] D. Semwogerere and M. F. Schatz, Phys. Rev. Lett. 88,
054501(2002)
ABSTRACT
  We describe a practical procedure for extracting the spatial structure and
the growth rates of slow eigenmodes of a spatially extended system, using a
unique experimental capability both to impose and to perturb desired initial
states. The procedure is used to construct experimentally the spectrum of
linear modes near the secondary instability boundary in Rayleigh-B\'{e}nard
convection. This technique suggests an approach to experimental
characterization of more complex dynamical states such as periodic orbits or
spatiotemporal chaos.

<|endoftext|><|startoftext|>
Introduction
A galaxy is an ensemble of billions of stars, which interact by the gravitational
field which they create collectively. For galaxies, the collisional relaxation time is
much longer than the age of the universe ([8]). The collisions can therefore be
ignored and the galactic dynamics is well described by the Vlasov - Poisson system
(collisionless Boltzmann equation)
(1) ∂tf + v · ∇xf −∇xU · ∇vf = 0, ∆U = 4π
f(t, x, v)dv,
where (x, v) ∈ R3 × R3, f(t, x, v) is the distribution function and Uf (t, x) is its
gravitational potential. The Vlasov-Poisson system can also be used to describe the
dynamics of globular clusters over their period of orbital revolutions ([11]). One of
the central questions in such galactic problems, which has attracted considerable
attention in the astrophysics literature, of [7], [8], [11], [31] and the references there,
is to determine dynamical stability of steady galaxy models. Stability study can be
used to test a proposed configuration as a model for a real stellar system. On the
other hand, instabilities of steady galaxy models can be used to explain some of the
striking irregularities of galaxies, such as spiral arms as arising from the instability
of an initially featureless galaxy disk ([7]), ([32]).
In this article, we consider stability of spherical galaxies, which are the simplest
elliptical galaxy models. Though most elliptical galaxies are known to be non-
spherical, the study of instability and dynamical evolution of spherical galaxies
could be useful to understand more complicated and practical galaxy models . By
Jeans’s Theorem, a steady spherical galaxy is of the form
f0(x, v) ≡ f0(E,L2),
http://arxiv.org/abs/0704.1012v2
2 YAN GUO AND ZHIWU LIN
where the particle energy and total momentum are
|v|2 + U0(x), L2 = |x× v|2 ,
and U0(x) = U0 (|x|) satisfies the self-consistent Poisson equation. The isotropic
models take the form
f0(x, v) ≡ f0(E).
The cases when f ′0(E) < 0 has been widely studied and these models are known
to be linearly stable to both radial ([9]) and non-radial perturbations ([2]). The
well-known Casimir-Energy functional (as a Liapunov functional)
(2) H(f) ≡
Q(f) +
|v|2f − 1
|∇xUf |2,
is constant along the time evolution. If f ′0(E) < 0, we can choose the Casimir
function Q0 such that
Q′0(f0(E)) ≡ −E
for all E. By a Taylor expansion of H(f)−H(f0), it follows that formally the first
variation at f0 is zero, that is, H(1)(f0(E)) = 0 (on the support of f0(E)), and the
second order variation of H at f0 is
(3) H(2)f0 [g] ≡
{f0>0}
−f ′0(E)
dxdv − 1
|∇xUg|2dx
where Q′′(f0) =
, g = f − f0 and ∆Ug =
gdv. In the 1960s, Antonov ([1],
[2]) proved that
(4) H(2)f0 [Dh] =
∫ ∫ |Dh|2
|f ′0(E)|
dxdv −
|∇ψh|2 dx
is positive definite for a large class of monotone models. Here
D = v · ∇x −∇xU0 · ∇v,
h(x, v) is odd in v and −∆ψ =
Dhdv. He showed that such a positivity is
equivalent to the linear stability of f0(E). In [9], Doremus, Baumann and Feix
proved the radial stability of any monotone spherical models. Their proof was
further clarified and simplified in [10], [37], [22], and more recently in [33], [21].
In particular, this implies that any monotone isotropic models are at least linearly
stable.
Unfortunately, despite its importance and a lot of research (e.g., [20], [5], [6],
[13]), to our knowledge, no rigorous and explicit instability criterion of non-monotone
models has been derived. When f ′0(E) changes sign, functional H
is indefinite
and it gives no stability information, although it seems to suggest that these mod-
els are not energy minimizers under symplectic perturbations. In this paper, we
first obtain the following instability criterion for general spherical galaxies. For
any function g with compact support within the support of f0(E), we define the
|f ′0(E)| −weighted L2
R3 ×R3
space L2|f ′0|
with the norm ‖·‖|f ′0| as
(5) ||h||2|f ′
|f ′0(E)|h2dxdv.
UNSTABLE AND STABLE GALAXY MODELS 3
Theorem 1.1. Assume that f0(E) has a compact support in x and v, and f
bounded. For φ ∈ H1, define the quadratic form
(6) (A0φ, φ) =
|∇φ|2dx + 4π
f ′0(E) (φ− Pφ)
dxdv,
where P is the projector of L2|f ′0|
kerD =
and more explicitly Pφ is given by (18) for radial functions and (26) for general
functions. If there exists φ0 ∈ H1 such that
(7) (A0φ0, φ0) < 0,
then there exists λ0 > 0 and φ ∈ H2, f (x, v) given by (14), such that eλ0t[f, φ] is
a growing mode to the Vlasov-Poisson system (1) linearized around [f0(E), Uf0 ] .
A similar instability criterion can be obtained for symmetry preserving pertur-
bations of anisotropic spherical models f0
, see Remark 2. We note that the
term Pφ in the instability criterion is highly non-local and this reflects the collective
nature of stellar instability. The proof of Theorem 1.1 is by extending an approach
developed in [25] for 1D Vlasov-Poisson, which has recently been generalized to
Vlasov-Maxwell systems ([26], [28]). There are two elements in this approach. One
is to formulate a family of dispersion operators Aλ for the potential, depending
on a positive parameter λ. The existence of a purely growing mode is reduced to
find a parameter λ0 such that the Aλ0 has a kernel. The key observation is that
these dispersion operators are self-adjoint due to the reversibility of the particle
trajectories. Then a continuation argument is applied to find the parameter λ0
corresponding to a growing mode, by comparing the spectra of Aλ for very small
and large values of λ. There are two new complications in the stellar case. First,
the essential spectrum of Aλ is [0,+∞) and thus we need to make sure that the
continuation does not end in the essential spectrum.This is achieved by using some
compactness property due to the compact support of the stellar model. Secondly,
it is more tricky to find the limit of Aλ when λ tends to zero. For that, we need an
ergodic lemma (Lemma 2.4) and use the integrable nature of the particle dynamics
in a central field to derive an expression for the projection Pφ appeared in the limit.
In the second part of the article, we further study the nonlinear (dynamical)
stability of the normalized King model:
(8) f0 = [e
E0−E − 1]+
motivated by the study of the operator A0. The famous King model describes
isothermal galaxies and the core of most globular clusters [24]. Such a model
provides a canonical form for many galaxy models widely used in astronomy. Even
though f ′0 < 0 for the King model, it is important to realize that, because of the
Hamiltonian nature of the Vlasov-Poisson system (1), linear stability fails to imply
nonlinear stability (even in the finite dimensional case). The Liapunov functional is
usually required to prove nonlinear stability. In the Casimir-energy functional (2), it
is natural to expect that the positivity of such a quadratic formH(2)f0 [g] should imply
stability for f0(E). However, there are at least two serious mathematical difficulties.
First of all, it is very challenging to use the positivity of H(2)f0 [g] to control higher
4 YAN GUO AND ZHIWU LIN
order remainder in H(f)−H(f0) to conclude stability [38]. For example, one of the
remainder terms is f3 whose L2 norm is difficult to be bounded by a power of the
stability norm. The non-smooth nature of f0(E) also causes trouble here. Second of
all, even if one can succeed in controlling the nonlinearity, the positivity of H
is only valid for certain perturbation of the form g = Dh [22]. It is not clear at all
if any arbitrary, general perturbation can be reduced to the form Dh. To overcome
these two difficulties, a direct variational approach was initiated by Wolansky [39],
then further developed systematically by Guo and Rein in [14], [15], [17], [18], [19].
Their method avoids entirely the delicate analysis of the second order variation
H(2)f0 in (3), which has led to first rigorous nonlinear stability proof for a large class
of f0(E). The high point of such a program is the nonlinear stability proof for every
polytrope [18] f0(E) = (E0 − E)k+. Their basic idea is to construct galaxy models
by solving a variational problem of minimizing the energy under some constraints
of Casimir invariants. A concentration-compactness argument is used to show the
convergence of the minimizing sequence. All the models constructed in this way
are automatically stable.
Unfortunately, despite its success, the King model can not be studied by such a
variational approach. The Casimir function for a normalized King model is
(9) Q0(f) = (1 + f) ln(1 + f)− 1− f,
which has very slow growth for f → ∞. As a result, the direct variational method
fails. Recently, Guo and Rein [21] proved nonlinear radial stability among a class
of measure-preserving perturbations
Sf0 ≡
f(t, r, vr, L) ≥ 0 :
Q(f, L) =
Q(f0, L), for Q ∈ C∞c and Q(0, L) ≡ 0.
The basic idea is to observe that for perturbations in the class Sf0 , one can write
g = f − f0 as Dh = {h,E}. Therefore, H(2)f0 [g] = H
[Dh], for which the positivity
was proved in [22] for radial perturbations. To avoid the difficulty of controlling the
remainder term by H(2)f0 [g], an indirect contradiction argument was used in [21].
As our second main result of this article, we establish nonlinear stability of King’s
model for general perturbations with spherical symmetry:
Theorem 1.2. The King’s model f0 = [e
E0−E − 1]+ is nonlinearly stable under
spherically symmetric perturbations in the following sense: given any ε > 0 there
exists ε1 > 0 such that for any compact supported initial data f(0) ∈ C1c with
spherical symmetry, if d (f (0) , f0) < ε1 then
0≤t<∞
d (f (t) , f0) < ε,
where the distance functional d (f, f0) is defined by (35).
For the proof, we extended the approach in [27] for the 1 1
D Vlasov-Maxwell
model. To prove nonlinear stability, we study the Taylor expansion ofH(f)−H(f0).
Two difficulties as mentioned before are: to prove the positivity of the quadratic
form and to control the remainder. We use two ideas introduced in [27]. The first
idea is to use any finite number of Casimir functional Qi
f, L2
as constraints.
The difference from [21] is that we do not impose Qi
f, L2
f0, L
in the
perturbation class, but expand the invariance equationQi
f (t) , L2
f0, L
UNSTABLE AND STABLE GALAXY MODELS 5
f (0) , L2
f0, L
to the first order. In this way, we get a constraint for
g = f −f0 in the form that the coefficient of its projection to ∂1Qi
f0, L
is small.
Putting these constraints together, we deduce that a finite dimensional projection of
g to the space spanned by
f0, L
is small. To control the remainder term,
we use a duality argument. Noting that it is much easier to control the potential φ,
we use a Legendre transformation to reduce the nonlinear term in g to a new one
in φ only. The key observation is that the constraints on g in the projection form
are nicely suited to the Legendre transformation and yields a non-local nonlinear
term in φ only with the projections kept. By performing a Taylor expansion of this
non-local nonlinear term in φ, the quadratic form becomes a truncated version of
(A0φ, φ) defined by (6), whose positivity can be shown to be equivalent to that of
Antonov functional. The the remainder term now is only in terms of φ and can
be easily controlled by the quadratic form. The new complication in the stellar
case is that the steady distribution f0 (E) is non-smooth and compactly supported.
Therefore, we split the perturbation g into inner and outer parts, according to the
support of f0. For the inner part, we use the above constrainted duality argument
and the outer part is estimated separately.
2. An Instability Criterion
We consider a steady distribution
f0 (x, v) = f0(E)
has a bounded support in x and v and f ′0 is bounded, where the particle energy
E = 1
|v|2 + U0(x). The steady gravitational potential U0(x) satisfies a nonlinear
Poisson equation
∆U0 = 4π
f0dv.
The linearized Vlasov-Poisson system is
(11) ∂tf + v · ∇xf −∇xU0 · ∇vf = ∇xφ · ∇vf0, ∆φ = 4π
f(t, x, v)dv.
A growing mode solution (eλtf(x, v), eλtφ(x)) to (1) with λ > 0 satisfies
(12) λf + v · ∇xf −∇xU0 · ∇vf = f ′0v · ∇xφ.
We define [X(s;x, v), V (s;x, v)] as the trajectory of
dX(s;x,v)
= V (s;x, v)
dV (s;x,v)
= −∇xU0
such that X(0;x, v) = x, and V (0;x, v) = v. Notice that the particle energy E is
constant along the trajectory. Integrating along such a trajectory for −∞ ≤ s ≤ 0,
we have
f(x, v) =
eλsf ′0(E)V (s;x, v) · ∇xφ(X(s;x, v))ds(14)
= f ′0(E)φ(x) − f ′0(E)
λeλsφ(X(s;x, v))ds.
6 YAN GUO AND ZHIWU LIN
Plugging it back into the Poisson equation, we obtain an equation for φ
−∆φ+ [4π
f ′0(E)dv]φ − 4π
f ′0(E)
λeλsφ(X(s;x, v))dsdv = 0.
We therefore define the operator Aλ as
Aλφ ≡ −∆φ+ [4π
f ′0(E)dv]φ − 4π
f ′0(E)
λeλsφ(X(s;x, v))dsdv.
Lemma 2.1. Assume that f0(E) has a bounded support in x and v and f
bounded. For any λ > 0, the operator Aλ : H
2 → L2 is self-adjoint with the
essential spectrum [0,+∞) .
Proof. We denote
Kλφ = −4π[
f ′0(E)dv]φ + 4π
f ′0(E)
λeλsφ(X(s;x, v))dsdv.
Recall that f0 (x, v) = f0(E) has a compact support ⊂ S ⊂ R3x × R3v. We may
assume S = Sx×Sv, both balls in R3. Let χ = χ (|x|) be a smooth cut-off function
for the spatial support of f0 in the physical space Sx; that is, χ ≡ 1 on the spatial
support of f0 and has compact support inside Sx. Let Mχ be the operator of
multiplication by χ. Then Kλ = KλMχ =MχKλ =MχKλMχ. Indeed,
f ′0 (x, v) = f
0 (X(s;x, v), V (s;x, v))
because of the invariance of E under the flow. So
(Kλφ) (x) = −4π[
f ′0(E)dv]φ + 4π
f ′0(E)
λeλsφ(X(s;x, v))dsdv(15)
= −4π[
f ′0(E)dv]φ + 4π
∫ ∫ 0
λeλs (f ′0(E)φ) (X(s;x, v))dsdv
= (MχKλMχφ)(x).
First we claim that
‖Kλ‖L2→L2 ≤ 8π
|f ′0(E)| dv
Indeed, the L2 norm for the first term in Kλ is easily bounded by 4π
f ′0(E)dv
For the second term, we have for any ψ ∈ L2,
4πλeλsf ′0(E)φ(X(s;x, v))dsdvψ(x)dx|
|f ′0(E)|φ2(X(s;x, v))dvdx
|f ′0(E)|ψ2(x)dvdx
|f ′0(E)|φ2(x)dvdx
|f ′0(E)|ψ2(x)dvdx
|f ′0(E)|φ2(x)dvdx
|f ′0(E)|ψ2(x)dvdx
|f ′0(E)| dv
‖φ‖2 ‖ψ‖2 .
UNSTABLE AND STABLE GALAXY MODELS 7
Moreover, we have that Kλ is symmetric Indeed, for fixed s, by making a change of
variable (y, w) → (X(s;x, v), V (s;x, v)), so that (z, v) = (X(−s; y, w), V (−s; y, w)),
we deduce that
4πf ′0(E)
λeλsφ(X(s;x, v))dsdvψ(x)dx
4πf ′0(E)φ(y)ψ(X(−s; y, w))dydwds
4πf ′0(E)
λeλsψ(X(−s; y,−w))φ(y)dydwds
4πf ′0(E)
λeλsψ(X(s;x, v))φ(x)dvdxds.
Here we have used the fact [X(s; y, w), V (s; y, w)] = [X(−s; y,−w),−V (s; y,−w)]
in the last line. Hence
(Kλφ, ψ) = (φ,Kλψ).
Since Kλ = KλMχ and Mχ is compact from H
2 into L2 space with support in Sx,
so Kλ is relatively compact with respect to −∆. Thus by Kato-Relich and Weyl’s
Theorems, Aλ : H
2 → L2 is self-adjoint and σess(Aλ) = σess(−∆). �
Lemma 2.2. Assume that f ′0(E) has a bounded support in x and v and f
bounded. Let
k(λ) = inf
φ∈D(Aλ),||φ||2=1
(φ,Aλφ),
then k(λ) is a continuous function of λ when λ > 0. Moreover, there exists 0 <
Λ <∞ such that for λ > Λ
(17) k(λ) ≥ 0.
Proof. Fix λ0 > 0, φ ∈ D(Aλ), and ||φ||2 = 1. Then
k(λ0) ≤ (φ,Aλ0φ)
≤ (φ,Aλφ) + |(φ,Aλ0φ)− (φ,Aλφ)|
≤ (φ,Aλφ) + 4π
|f ′0(E)|
[λeλs − λ0eλ0s]φ(X(s;x, v))φ(x)dsdvdx
≤ (φ,Aλφ) + 4π
|f ′0(E)|
[λ̃|s|eλ̃s + eλ̃s]dλ̃φ(X(s;x, v))φ(x)dsdvdx
≤ (φ,Aλφ) + C
[λ̃|s|eλ̃s + eλ̃s]dλ̃ds
≤ (φ,Aλφ) + C| lnλ− lnλ0|.
We therefore deduce that by taking the infimum over all φ,
k(λ0) ≤ k(λ) + C| lnλ− lnλ0|.
Same argument also yields k(λ) ≤ k(λ0) + C| lnλ − lnλ0|.Thus |k(λ0)− k(λ)| ≤
C| lnλ− lnλ0| and k(λ) is continuous for λ > 0.
To prove (17), by (14), we recall from Sobolev’s inequality in R3
8 YAN GUO AND ZHIWU LIN
|(Kλφ, ψ)| =
4πf ′0(E)e
λs∇φ(X(s;x, v))V (s)dsdvψ(x)dx
|ψ|2|f ′0(E)|dvdx
|∇φ(X (s))|2|f ′0(E)||V (s) |2dxdv]1/2ds
|ψ|2|f ′0(E)|dvdx
)1/2 ∫ ∫
v2|∇φ(x)|2|f ′0(E)|dxdv]1/2ds
||ψ||6||∇φ||2 ≤
||∇ψ||2||∇φ||2,
since f0 has compact support. Therefore,
(Aλφ, φ) = ||∇φ||2 − (Kλφ, φ) ≥ (1 −
)||∇φ||2 ≥ 0
for λ large. �
We now compute limλ→0+Aλ. We first consider the case when the test function
φ is spherically symmetric.
Lemma 2.3. For spherically symmetric function φ(x) = φ (|x|) , we have
(Aλφ, φ) = (A0φ, φ) ≡
|∇φ|2dx+ 4π
f ′0(E)dvφ
− 32π3
minU0
f ′0(E)
∫ r2(E,L)
r1(E,L)
2(E−U0−L2/2r2)
∫ r2(E,L)
r1(E,L)
2(E−U0−L2/2r2)
|∇φ|2 + 32π3
f ′0(E)
∫ r2(E,L)
r1(E,L)
(φ− φ̄)2 drdEdL√
2(E − U0 − L2/2r2)
Proof. Given the steady state f0(E), U0(|x|) and any radial function φ (|x|) . To
find the limit of
(Aλφ, φ) =
|∇φ|2dx+ 4π
f ′0(E)dvφ
2dx(19)
f ′0(E)
λeλsφ(X(s;x, v))ds
φ (x) dxdv,
we study the following
(20) lim
λeλsφ(X(s;x, v))ds.
Note that we only need to study (20) for points (x, v) with E = 1
|v|2+U0| (x|) < E0
and L = |x× v| > 0, because in the third integral of (19) f ′0(E) has support in
UNSTABLE AND STABLE GALAXY MODELS 9
{E < E0} and the set {L = 0} has a zero measure. We recall the linearized Vlasov-
Poisson system in the r, vr, L coordinates takes the form
∂tf + vr∂rf +
− ∂rU0
∂vrf = ∂rUf∂vrf0,
∂rrUf +
∂rUf = 4π
For the corresponding linearized system, for points (x, v) with E < E0 and L > 0,
the trajectory of (X(s;x, v), V (s;x, v)) in the coordinate (r, E, L) is a periodic
motion described by the ODE (see [8])
dr(s)
= vr(s),
dvr(s)
= −U ′0(r) +
with the period
T (E,L) = 2
∫ r2(E,L)
r1(E,L)
2(E − U0 − L2/2r2)
where 0 < r1(E,L) ≤ r2(E,L) < +∞ are zeros of E − U0 − L2/2r2.So by Lin’s
lemma in [[25]],
λeλsφ(X(s;x, v))ds =
φ(X(s;x, v))ds.
Since φ(X(s;x, v) = φ(r(s)), a change of variable from s→ r(s) leads to
φ(X(s;x, v))ds = 2
φ(r)dr
2(E − U0 − L2/2r2)
For any function g(r, E, L), we define its trajectory average as
ḡ(E,L) ≡
∫ r2(E,L)
r1(E,L)
g(r,E,L)dr√
2(E−U0−L2/2r2)
∫ r2(E,L)
r1(E,L)
2(E−U0−L2/2r2)
λeλsφ(X(s;x, v))ds = 2
φ(r)dr
2(E − U0 − L2/2r2)
/T (E,L) = φ̄ (E,L)
10 YAN GUO AND ZHIWU LIN
and the integrand in third term of (19) converges pointwise to f ′0(E)φ̄φ. Thus by
the dominated convergence theorem, we have
(Aλφ, φ) =
|∇φ|2dx+ 4π
f ′0(E)φ
2dxdv − 4π
f ′0(E)φ̄φ dxdv
|∇φ|2dx+ 4π
f ′0(E)φ
2dxdv
− 32π3
minU0
f ′0(E)
∫ r2(E,L)
r1(E,L)
φ̄ (E,L)φ (r)
drdEdL
2(E − U0 − L/2r2)
|∇φ|2dx+ 4π
f ′0(E)φ
2dxdv
− 32π3
minU0
f ′0(E)
∫ r2(E,L)
r1(E,L)
2(E−U0−L/2r2)
∫ r2(E,L)
r1(E,L)
2(E−U0−L/2r2)
|∇φ|2 + 32π3
f ′0(E)
∫ r2(E,L)
r1(E,L)
(φ− φ̄)2
drdEdL
2(E − U0 − L/2r2)
This finishes the proof of the lemma. �
To compute limλ→0+(Aλφ, φ) for more general test function φ, we use the fol-
lowing ergodic lemma which is a direct generalization of the result in [26].
Lemma 2.4. Consider the solution (P (s; p, q) , Q (s; p, q)) to be the solution of a
Hamiltonian system
Ṗ = ∂qH (P,Q)
Q̇ = −∂pH (P,Q)
with (P (0) , Q (0)) = (p, q) ∈ Rn ×Rn. Denote
Qλm =
λeλsm (P (s) , Q (s)) ds.
Then for anym (p, q) ∈ L2 (Rn ×Rn), we have Qλm→ Pm strongly in L2 (Rn ×Rn).
Here P is the projection operator of L2 (Rn ×Rn) to the kernel of the transport
operator D = ∂qH∂p − ∂pH∂q and Pm is the phase space average of m in the set
traced by the trajectory.
Proof. Denote U (s) : L2 (Rn ×Rn) → L2 (Rn ×Rn) to be the unitary semigroup
U (s)m = m (P (s) , Q (s)). By Stone Theorem ([40]), U (s) is generated by iR = D,
where R = −iD is self-adjoint and
U (s) =
eiαsdMα
where
Mα;α ∈ R1
is spectral measure of R. So
λeλsm(P (s), Q(s))ds =
eiαsdMαm ds =
λ+ iα
dMαm.
UNSTABLE AND STABLE GALAXY MODELS 11
On the other hand, the projection is P = M{0} =
ξdMα where ξ(α) = 0 for
α 6= 0 and ξ(0) = 1. Therefore
λeλsm(P (s), Q(s))ds − Pm
λ+ iα
− ξ(α)
d‖Mαm‖2L2
by orthogonality of the spectral projections. By the dominated convergence theo-
rem this expression tends to 0 as λ→ 0+, as we wished to prove. The explaination
of Pm as the phase space average of m is in our remark below. �
Remark 1. Since
λeλsds = 1, the function
(x, v) =
λeλsm (P (s), Q(s)) ds
is a weighted time average of the observable m along the particle trajectory. By the
same proof of Lemma 2.4, we have
(22) lim
m (P (s), Q(s)) ds = Pm.
But from the standard ergodic theory ([3]) of Hamiltonian systems, the limit of the
above time average in (22) equals the phase space average of m in the set traced
by the trajectory. Thus Pm has the meaning of the phase space average of m and
Lemma 2.4 states that the limit of the weighted time average (21) yields the same
phase space average. In particular, if the particle motion is ergodic in the invariant
set SI determined by the invariants E1, · · · , Ik, and if dσI denotes the induced
measure of Rn ×Rn on SI , then
(23) Pm =
σI (SI)
m (p, q) dσI (p, q) .
For integral systems, using action angle variables (J1, · · · , Jn;ϕ1, · · · , ϕn) we have
(Pm) (J1, · · · , Jn) = (2π)−n
· · ·
m (J1, · · · , Jn, ϕ1, · · · , ϕn) dϕ1, · · · dϕn
for the generic case with independent frequencies (see [4]).
Recall the weighted L2 space L2|f ′0|
in (5). Then U (s) : L2|f ′0|
→ L2|f ′0|
defined by
U (s)m = m (X(s;x, v), V (s;x, v)) is an unitary group, where (X(s;x, v), V (s;x, v))
is the particle trajectory (13). The generator of U (s) is D = v ·∂x−∇xU0 ·∇v and
R = −iD is self-adjoint by Stone Theorem. By the same proof, Lemma 2.4 is still
valid in L2|f ′0|
. In particular, for any φ (x) ∈ L2
we have
λeλsφ(X(s;x, v))ds → Pφ
in L2|f ′0|
, where P is the projector of L2|f ′0|
to kerD.
Now we derive an explicit formula for the above limit Pφ. Note that as in the
proof of lemma 2.3, we only need to derive the formula of Pφ for points (x, v) with
E < E0 and L > 0. Since U0 (x) = U0 (r), the particle motion (13) in such a center
field is integrable and has been well studied (see e.g. [8], [4]). For particles with
12 YAN GUO AND ZHIWU LIN
energy E < E0 < 0, L > 0 and momentum ~L = x× v, the particle orbit is a rosette
in the annulus
AE,L = {r1(E,L) ≤ r ≤ r2(E,L)} =
E − U0 − L2/2r2 ≥ 0
lying on the orbital plane perpendicular to ~L. So we can consider the particle
motion to be planar. For such case, the action-angle variables are as follows (see
e.g. [30]): the actions variables are
T (E,L)
, Jθ = L,
where
T (E,L) = 2
∫ r2(E,L)
r1(E,L)
2(E − U0 − L2/2r2)
is the radial period, the angle variable ϕr is determined by
dϕr =
T (E,L)
2(E − U0 − L2/2r2)
and ϕθ = θ −∆θ where
d (∆θ) =
Lr−2 − Ωθ
2(E − U0 − L2/2r2)
Ωθ (E,L) =
T (E,L)
∫ r2(E,L)
r1(E,L)
2(E − U0 − L2/2r2)
is the average angular velocity. For any function φ (x) ∈ H2
, we denote
φ~L (r, θ) to be the restriction of φ in the orbital plane perpendicular to
~L. Then by
(24), for the generic case when the radial and angular frequencies are independent,
we have
E, ~L
= (2π)
φ~Ldϕθdϕr(26)
πT (E,L)
∫ r2(E,L)
r1(E,L)
φ~L (r, θ) dθdr
2(E − U0 − L2/2r2)
In particular, for a spherically symmetric function φ = φ (r), we recover
(27) (Pφ) (E,L) = 2
T (E,L)
∫ r2(E,L)
r1(E,L)
φ(r)dr
2(E − U0 − L2/2r2)
We thus conclude the following
Lemma 2.5. Assume that f0(E) has a bounded support in x and v and f
bounded. For any φ ∈ H1
, we have
(Aλφ, φ) = (A0φ, φ)
|∇φ|2dx+ 4π
f ′0(E)dvφ
2dx− 4π
f ′0(E) (Pφ)
|∇φ|2dx+ 4π
f ′0(E) (φ− Pφ)
UNSTABLE AND STABLE GALAXY MODELS 13
where P is the projector of L2|f ′0|
to kerD and more explicitly Pφ is given by (26).
The limiting operator A0 is
(29) A0φ = −∆φ+ [4π
f ′0(E)dv]φ − 4π
f ′0(E)Pφdv.
Now we give the proof of the instability criterion.
Proof of Theorem 1.1. We define
λ∗ = sup
k(λ)<0
By Lemmas 2.1 and 2.5, we deduce that
−∞ < λ∗ ≤ Λ <∞.
Therefore, by the continuity of k(λ), we have
k(λ∗) = 0.
Hence, there exists an increasing sequence of λn < λn+1 < λ∗ so that λn → λ∗,
kn ≡ k(λn) < 0, and
kn → k(λ∗) = 0.
Therefore, kn are negative eigenvalues. By Lemma 2.2, we get a sequence φn ∈ H2
such that
(30) Aλnφn = knφn
with kn < 0, kn → 0 and λn → λ0 > 0, as n → ∞. Recall χ the cutoff function
of the support of f0(E) such that χ ≡ 1 for f0(E) > 0. We claim that χφn is a
nonzero function for any n. Suppose otherwise, χφn ≡ 0, then from the equation
(30) we have (−∆− kn)φn = 0 which implies that φn = 0, a contradiction.Thus
we can normalize φn by ‖χφn‖2 = 1. Taking inner product of (30) with φn and
integrating by parts, we have
‖▽φn‖22 ≤ −4π
f ′0(E)φ
n dvdx+
4πf ′0(E)
λnsφn(X(s;x, v))dsφn (x) dx
= −4π
f ′0(E) (χφn)
4πf ′0(E)
λns (χφn) (X(s;x, v))ds (χφn) (x) dx
f ′0(E)dv
‖χφn‖22 .
Here in the second equality above, we use the fact χ = 1 on the support of
f ′0(E) (f0(E)) and that (χφn) (X(s;x, v)) = φn(X(s;x, v)χ due to the invariance
of the support under the trajectory flow, as in (15). In the last inequality, we use
the same estimate as in (16). Thus,
||φn||L6 ≤ C sup
‖▽φn‖2 < C
for some constant C′ independent of n. Then there exists φ ∈ L6 and ∇φ ∈ L2
such that
φn → φ weakly in L6, and ∇φn → ∇φ weakly in L2.
14 YAN GUO AND ZHIWU LIN
This implies that χφn → χφ strongly in L2. Therefore ‖χφ‖2 = 1 and thus φ 6= 0.
It is easy to show that φ is a weak solution of Aλ0φ = 0 or
(31) −∆φ = −[4π
f ′0(E)dv]φ + 4πf
λ0sφ(X(s;x, v))dsdv = ρ.
We have that
ρdx = −4π
f ′0(E)φ (x) dxdv +
4πf ′0(E)φ(X(s;x, v))dxdvds
= −4π
f ′0(E)φ (x) dxdv +
4πf ′0(E)φ(x)dxdvds = 0
and by (31) ρ has compact support in Sx, the x−support of f0(E). Therefore from
the formula φ (x) =
|x−y|
dy, we have
φ (x) =
ρ (y)
|x− y|
ρ (y)
|x− y|
ρ (y)
dy = O
|x|−2
for x large, and thus φ ∈ L2. By elliptic regularity, φ ∈ H2. We define f (x, v) by
(14), then f ∈ L∞ with the compact support in S. Now we show that eλ0t[f, φ] is a
weak solution to the linearized Vlasov-Poisson system. Since φ satisfies the Poisson
equation (31), we only need to show that f satisfies the linearized Vlasov equation
(12) weakly. For that, we take any g ∈ C1c
3 × R3
, and
R3×R3
(Dg) fdxdv
R3×R3
(Dg) (f ′0(E)φ(x)) dxdv −
R3×R3
(Dg) f ′0(E)
λ0sφ(X(s;x, v))dsdxdv
= I + II.
Since D is skew-adjoint, the first term is
I = −
R3×R3
gD (f ′0(E)φ) dxdv = −
R3×R3
f ′0(E)gDφdxdv.
UNSTABLE AND STABLE GALAXY MODELS 15
For the second term,
II = −
R3×R3
f ′0(E) Dg(x, v) φ (X(s;x, v)) dxdvds
R3×R3
f ′0(E) (Dg) (X(−s), V (−s))φ (x) dxdvds
R3×R3
f ′0(E)
g (X(−s), V (−s))
ds φ (x) dxdv
R3×R3
f ′0(E)
λ0g (x, v)−
λ0sg (X(−s), V (−s)) ds
φ (x) dxdv
R3×R3
f ′0(E)λ0φ (x)− f ′0(E)
λ0sφ (X(s), V (s)) ds
g (x, v) dxdv
R3×R3
f ′0(E)φ (x)− f ′0(E)
λ0sφ (X(s), V (s)) ds
g dxdv
= .λ0
R3×R3
fgdxdv.
Thus we have
R3×R3
(Dg) fdxdv =
R3×R3
(λ0f − f ′0(E)Dφ) gdxdv
which implies that f is a weak solution to the linearized Vlasov equation
λ0f +Df = f
0 (E) v · ∇xφ.
Remark 2. Consider an anisotropic spherical galaxy with f0 (x, v) = f0
For a radial symmetric growing mode eλt (φ, f) with φ = φ (|x|) and f = f
|x| , E, L2
The linearized Vlasov equation (11) becomes
λf + v · ∇xf −∇xU0 · ∇vf
= ∇xφ · ∇vf0 = ∇xφ ·
|x× v|2
= φ′ (|x|) x
v + 2
[(x× v)× x]
v · ∇xφ,
which is of the same form as in the isotropic case (20). So by the same proof of The-
orem 1.1, we also get an instability criterion for radial perturbations of anisotropic
galaxy, in terms of the quadratic form (18) with f ′0(E) being replaced by
3. Nonlinear Stability of the King’s Model
In the second half of the article, we investigate the nonlinear stability of the King
model (8). We first establish:
Lemma 3.1. Consider spherical models f0 = f0 (E) with f
0 < 0. The operator
A0 : H
r → L2r
A0φ = −∆φ+ [4π
f ′0dv]φ − 4π
f ′0Pφdv
16 YAN GUO AND ZHIWU LIN
is positive, where H2r and L
r are spherically symmetric subspaces of H
2 and L2,
and the projection Pφ is defined by (27). Moreover, for φ ∈ H2r we have
(32) (A0φ, φ) ≥ ε
|∇φ|22 + |φ|
for some constant ε > 0.
Proof. Define k0 = inf (A0φ, φ) / (φ, φ) .We want to show that k0 > 0. First, by
using the compact embedding of H2r →֒ L2r it is easy to show that the minimum
can be obtained and k0 is the lowest eigenvalue. Let A0φ0 = k0φ0 with φ0 ∈ H2r
and ‖φ0‖2 = 1. The fact that k0 ≥ 0 follows immediately from Theorem 1.1 and the
nonexistence of radial modes ([9], [22]) for monotone spherical models. The proof
of k0 > 0 is more delicate. For that, we relate the quadratic form (A0φ, φ) to the
Antonov functional (4). We define D = v ·∂x−∇xU0 ·∇v to be the generator of the
unitary group U (s):L
|f ′0|
→ L2,r|f ′0|
defined by U (s)m = m (X(s;x, v), V (s;x, v)) .
Here L
|f ′0|
is the spherically symmetric subspace of L2|f ′0|
, which is preserved under
the flow mapping U (s). By the definition of Pφ, we have φ0 − Pφ0 ⊥ kerD. By
Stone theorem iD is self-adjoint and in particular D is closed. Therefore by the
closed range theorem ([40]), we have (kerD)
= R (D) , where R (D) is the range
of D. So there exists h ∈ L2,r|f ′0|
such that Dh = φ0−Pφ0. Moreover, since φ0−Pφ0
is even in v and the operator D reverses the parity in v, the function h is odd in v.
Define f− = f ′0h. We have
k0 = (A0φ0, φ0) =
|∇φ0|2 dx + 4π
f ′0 (φ0 − Pφ0)
|∇φ0|2 dx− 8π
|f ′0| (φ0 − Pφ0)φ0dxdv
|f ′0| (φ0 − Pφ0)
|Df−|2
|f ′0|
dxdv + 2
Df−dvdx +
|∇φ0|2 dx
|Df−|2
|f ′0|
dxdv +
|∇φ0|2 dx
|Df−|2
|f ′0|
dxdv +
|∇φ0|2 − 2∇φ0 · ∇φ−
|Df−|2
|f ′0|
dxdv − 1
where ∆φ− = 4π
Df−dv.Notice that the last expression above is the Antonov
functional 4πH (f−, f−). Since f− is spherical symmetric and odd in v,we have
H (f−, f−) > 0 by the proof in [22] which was further clarified in [33] and [21].
Therefore we get k0 > 0 as desired and (A0φ, φ) ≥ k0 |φ|22.
UNSTABLE AND STABLE GALAXY MODELS 17
To get the estimate (32), we rewrite
(A0φ, φ) = ε
|∇φ|2 dx + 4π
f ′0 (φ− Pφ)
+ (1− ε) (A0φ, φ)
|∇φ|2 dx− 4πε ‖φ− Pφ‖2L2
+ (1− ε) k0 |φ|22
|∇φ|2 dx− 8πε ‖φ‖2L2
+ (1− ε) k0 |φ|22 (since ‖P‖L2
|∇φ|2 dx+ ((1− ε) k0 − Cε) |φ|22 ≥ ε
|∇φ|2 dx+ |φ|22
if ε is small enough. �
Next, we will approximate the kerD by a finite dimensional approximation. Let
{ξi(E,L) = αi(E)βi(L)}∞i=1 be a smooth orthogonal basis for the subspace kerD =
{g(E,L)} ⊂ L2,r|f ′0|
.Define the finite-dimensional projection operator PN : L2,r|f ′0|
|f ′0|
(33) PNh ≡
(h, ξi)|f ′0|ξi
and the operator AN : H2r → L2r by
ANφ = −∆φ+ [4π
f ′0dv]φ− 4π
f ′0PNφdv.
Lemma 3.2. There exists K, δ0 > 0 such that when N > K we have
ANφ, φ
≥ δ0 |∇φ|22
for any φ ∈ H2r .
Proof. First we have AN → A0 strongly in L2. In deed, for any φ ∈ H2r ,
∥ANφ−A0φ
4πf ′0 (PNφ− Pφ) dv
≤ C ‖PNφ− Pφ‖L2
as N → ∞.We claim that for N sufficiently large, the lowest eigenvalue of AN
is at least k0/2 where k0 > 0 is the lowest eigenvalue of A0. Suppose otherwise,
then there exists a sequence {λn} and {φn} ⊂ H2r with λn < k0/2, ‖φn‖2 =
1 and Anφn = λnφn. This implies that ∆φn is uniformly bounded in L
2, by
elliptic estimate we have ‖φn‖H2 ≤ C for some constant C independent of n.
Therefore there exists φ0 ∈ H2r such that φn → φ0 weakly in H2r . By the compact
embedding of H2r →֒ L2r, we have φn → φ0 strongly in L2r and ‖φ0‖2 = 1. The
strong convergence of Anφ0 → A0φ0 implies that
Anφn → A0φ0
weakly in L2. Let λn → λ0 ≤ k0/2, then we have A0φ0 = λ0φ0, a contradiction.
Therefore we have
ANφ, φ
≥ k0/2 |φ|22 for φ ∈ H2r , when N is large enough. The
estimate (34) is by the same proof of (32) in Lemma 3.1. �
18 YAN GUO AND ZHIWU LIN
Recalling (8) with f0 = [e
E0−E−1]+ and Q0(f) = (f+1) ln(f+1)−f, we further
define functionals (related to the finite dimensional approximation of kerD) as
Ai(f) ≡
αi(− ln(s+ 1) + E0)ds,
Qi(f, L) ≡ Ai(f)βi(L), for 1 ≤ i ≤ N.
for 1 ≤ i ≤ N. Clearly,
∂1Qi(f0, L) = αi(− ln(f0 + 1) + E0)βi(L) = αi(E)βi(L) = ξi(E,L),
where {ξi(E,L)}Ni=1 are used to define PN in Lemma 3.2. Define the Casimir
functional (E0 < 0 )
I(f) =
[Q0(f) +
|v|2f − E0f ]dxdv −
|∇φ|2dx
which is invariant of the nonlinear Vlasov-Poisson system. We introduce additional
N invariants
Ji(f, L) ≡
Qi(f, L)dxdv.
for 1 ≤ i ≤ N . We define Ω to be the support of f0(E). We first consider
I(f)− I(f0) =
[Q0(f)−Q0(f0) +
|v|2(f − f0)− E0(f − f0)]dxdv
∇U0 · ∇(U − U0)−
|∇(U − U0)|2dx
[Q0(f)−Q0(f0) + (E − E0)(f − f0)]dxdv −
|∇(U − U0)|2dx.
We define
g = f − f0, φ = U − U0
gin ≡ (f − f0)1Ω, gout ≡ (f − f0)1Ωc , ∆φin ≡
gin, ∆φout ≡
gout .
And we define the distance function for nonlinear stability as
d(f, f0) ≡
[Q0(gin + f0)−Q0(f0) + (E − E0)gin]dxdv
|∇φin|2dx
Q0(gout)dxdv +
(E − E0)goutdxdv
= din +
|∇φin|2dx + dout,
for which each term is non-negative. We therefore split:
I(f)− I(f0)
[Q0(f0 + gin)−Q0(f0) + (E − E0)gin]dxdv −
|∇φin|2dx
Q0(gout)dxdv +
(E − E0)goutdxdv −
|∇φout|2dx−
∇φout · ∇φindx
= Iin + Iout .
UNSTABLE AND STABLE GALAXY MODELS 19
In the estimates below, we use C,C′, C′′ to denote general constants depending
only on f0 and quantities like ‖f (t)‖Lp (p ∈ [1,+∞]) which equals ‖f (0)‖Lp and
therefore always under control. We first estimate ‖∇φout‖22 to be of higher order of
d, which also implies that
∇φout · ∇φindx is of higher order of d.
Lemma 3.3. For ε > 0 sufficiently small, we have
|∇φout|2dx ≤ C
εd(f, f0) +
[d(f, f0)]
Proof. In fact, since
|∇φout|2dx ≤ C||
gout dv||2L6/5
≤ C||
gout 1E0≤E≤E0+εdv||2L6/5 + C||
gout 1E>E0+εdv||2L6/5 .
The first term is bounded by
g2out dv]
1E0≤E≤E0+εdv]
3/5dx
g2out dvdx]×
1E0≤E≤E0+εdv]
3/2dx
≤ Cε[
g2out dvdx] ≤ Cε[
g2out dvdx]
≤ Cεd(f, f0).
In the above estimates, we use that
Q0(gout)dvdx ≥ c
g2out dvdx and
1E0≤E≤E0+εdv ≤ Cε,
which can be checked by an explicit computation when ε > 0 is sufficiently small
such that E0 + ε ≤ 0.
On the other hand, by the standard estimates (see [12, P. 120-121])
gout 1E>E0+εdv||2L6/5
gout 1E>E0+εdxdv
|v|2gout 1E>E0+εdxdv
(E − E0)gout 1E>E0+εdxdv
(E − E0)gout 1E>E0+εdxdv + 2 sup |U0|
gout 1E>E0+εdxdv
2 sup |U0|
d5/3.
20 YAN GUO AND ZHIWU LIN
By Lemma 3.3, we have
∇φout · ∇φindx
≤ ‖∇φout‖2 ‖∇φin‖2
ε1/3d(f, f0) +
[d(f, f0)]
and therefore for ε sufficiently small,
(36) Iout ≥ dout − C
ε1/3d(f, f0) +
[d(f, f0)]
4/3 +
[d(f, f0)]
To estimate Iin, we split it into three parts:
[Q0(f0 + gin)−Q0(f0) + (E − E0)gin + φingin]dxdv +
|∇φin|2dx
(1− τ)
[Q0(f0 + gin)−Q0(f0) + (E − E0)gin + (I − PN )φingin]dxdv +
|∇φin|2dx
+ (1− τ)
PNφingindxdv
= I1in + I
in + I
where ∆φin = 4π
gin dv. We estimate each term in the following lemmas.
Lemma 3.4.
(38) I1
din − Cτ
|∇φin|2dx.
Proof. In fact, since the integration region Ω is finite, we have
I1in =τ
[Q0(f0 + gin)−Q0(f0) + (E − E0)gin + φingin]dxdv +
|∇φin|2dx
[Q0(f0 + gin)−Q0(f0) + (E − E0)gin]dxdv − Cτ ||φin||L6 ||gin||L6/5
[Q0(f0 + gin)−Q0(f0) + (E − E0)gin]dxdv − C′τ ||∇φin||L2 ||gin||2
din − C′′τ ||∇φin||22,
since
din =
[Q0(f0 + gin)−Q0(f0) + (E − E0)gin]dxdv ≥ C||gin||22.
To estimate I2in, we need the following pointwise duality lemma from elementary
calculus.
Lemma 3.5. For any c, and any h, we have
gc,f0 (h) = Q0(h+ f0)−Q0(f0)−Q′0(f0)h− ch ≥ (f0 + 1)(1 + c− ec).
Proof. Direct computation yields that the minimizer fc of gc,f0 (h) satisfies the
Euler-Lagrange equation
ln (fc + f0 + 1)− ln (f0 + 1)− c = 0,
UNSTABLE AND STABLE GALAXY MODELS 21
fc = (f0 + 1) (e
c − 1) .
Thus by using the Euler-Lagrange equation, we deduce
min gc,f0 (h) = gc,d (fc)
= (fc + f0 + 1) ln(1 + fc + f0)
− (f0 + 1) ln(1 + f0)− [1 + ln(f0 + 1)]fc − cfc
= (fc + f0 + 1)[ln(1 + fc + f0)− ln(f0 + 1)− c]
+ fc ln(1 + f0) + c(f0 + 1)− [1 + ln(f0 + 1)]fc
= (f0 + 1)(1 + c− ec).
Lemma 3.6.
(39) I2
(1− τ) δ0
|∇φin|2dx− CeC
Proof. Recall (37). By using Lemma 3.5 for c = − (φin − PNφin) and using the
Taylor expansion, we have
I2in = (1 − τ)
[Q0(f0 + gin)−Q0(f0) + (E − E0)gin + (φin − PNφin) fin]dxdv
(1− τ)
|∇φin|2dx
(1− τ)
|∇φin|2dx+ (1− τ)
(f0 + 1)1Ω(1 + φin − PNφin − eφin−PNφin)dxdv
≥ 1− τ
|∇φin|2dx− 4π
|f ′0 (E)| (φin − PNφin)
− Ce|φin−PNφin|∞
|f ′0 (E)| |φin − PNφin|
dxdv (Note (f0(E) + 1)1Ω = |f ′0(E)|)
≥ (1− τ) δ0
|∇φin|2dx− Ce|φin−PNφin|∞
|f ′0 (E)| |φin − PNφin|
dxdv.
In the last line, we have used Lemma 3.2. To estimate the last term above and
conclude our lemma, it suffices to show
|φin − PNφin|∞ ≤ CNd
This follows from the facts that for the fixed N smooth functions ξi, we have
|PNφin|∞ =
(φin, ξi)|f ′0|ξi
≤ CN |φin|∞ ,
and since φ is spherically symmetric,
|φin| (r) =
u2ρin (u) du+
uρin (u)du
R |ρin|2 ≤ C
′′ ‖gin‖2 ≤ CNd
where ρin =
gindv and R is the support radius of ρin. �
22 YAN GUO AND ZHIWU LIN
We now estimate the term
PNφinfindxdv, for which we use the additional
invariants.
Lemma 3.7. For any ε > 0, we have
∣ ≤ C(d1/2(0) + ε1/2d1/2 + 1
d)d1/2.
Proof. By the definition of I3in in (37), it suffices to estimate (gin, ξi). We expand
Ji(f, L)− Ji(f0, L)
= Ji(f0 + gin, L)− Ji(f0, L) + Ji(gout, L)
= (gin , ξi) +O(d) + Ji(gout, L).
Notice that
|Ji(gout, L)| ≤ C||gout||L1 ≤ C||1{E0≤E≤E0+ε}gout||L1 + C||1{E≥E0+ε}gout||L1
≤ ε1/2||gout||L2 +
||1{E≥E0+ε}(E − E0)gout||L1 ≤ C[ε
1/2d1/2 +
It thus follows that
|(gin , ξi)| ≤ |Ji(f(0), L)− Ji(f0, L)|+ C[ε1/2d1/2 +
≤ C[d1/2(0) + ε1/2d1/2 + 1
Therefore
∣I3in
∣ = (1− τ)
PNφingin dxdv
(φin, ξi)|f ′0|ξi
gin dxdv
(φin, ξi)|f ′0|
|(ξi, gin)| ≤ C′
|φin|∞ |(ξi, gin)|
≤ Cd1/2[d1/2(0) + ε1/2d1/2 +
Now we prove the nonlinear stability of King model.
Proof of Theorem 1.2. The global existence of classical solutions of 3D Vlasov-
Poisson system was shown in [34] for compactly supported initial data f (0) ∈ C1c .
Let the unique global solution be (f (t) , φ (t)). Let d (t) = d(f (t) , f0). Combining
estimates (36), (38), (39) and (40), we have
I(f (0))− I(f0) = I(f (t))− I(f0)
≥ dout +
din +
(1− τ) δ0
|∇φin|2dx
ε1/3d (t) +
d (t)
d (t)
− CeC
′d(t)
d (t)
− Cd (t)1/2 [d1/2(0) + ε1/2d (t)1/2 +
d (t)].
UNSTABLE AND STABLE GALAXY MODELS 23
Thus by choosing ε and τ sufficiently small, there exists δ′ > 0 such that
I(f (0))− I(f0) ≥ δ′d(t)− C
d (t)
+ d (t)
+ d (t)
− CeC
′d(t)
d (t)
− Cd (t)1/2 d1/2(0).
It is easy to show that I(f (0)) − I(f0) ≤ C′′d (0). Define the functions y1 (x) =
δ′x2 − CeC′xx3 − C
x8/3 + x10/3 + x3
and y2 (x) = Cd (0)
x + C′′d (0). Then
above estimates implies that y1
d (t)
d (t)
. The function y1 is in-
creasing in (0, x0) where x0 is the first maximum point. So if d (0) is sufficiently
small, the line y = y2 (x) intersects the curve y = y1 (x) at points x1, x2, · · · ,
with x1 (d (0)) < x0 < x2 (d (0)) < · · · . Thus the inequality y1 (x) ≤ y2 (x) is
valid in disjoint intervals [0, x1 (d (0))] and [x2 (d (0)) , x3 (d (0))], · · · . Because d (t)
is continuous, we have that d (t)
< x1 (d (0)) for all t < ∞, provided we choose
d (0)
< x0. Since x1 (d (0)) → 0 as d (0) → 0, we deduce the nonlinear stability
in terms of the distance functional d (t)
Acknowledgements
This research is supported partly by NSF grants DMS-0603815 and DMS-0505460.
We thank the referees for comments and corrections.
References
[1] Antonov, V. A. Remarks on the problem of stability in stellar dynamics. Soviet Astr, AJ., 4,
859-867 (1961).
[2] Antonov, V. A., Solution of the problem of stability of stellar system Emden’s density law
and the spherical distribution of velocities, Vestnik Leningradskogo Universiteta, Leningrad
University, 1962.
[3] Arnold, V. I., Avez, A., Ergodic problems of classical mechanics, W. A. Benjamin, Inc., New
York-Amsterdam 1968.
[4] Arnold, V. I., Mathematical methods of classical mechanics, Springer-Verlag, New York-
Heidelberg, 1978.
[5] Barnes, J.; Hut, P.; Goodman, J., Dynamical instabilities in spherical stellar systems, Astro-
physical Journal, vol. 300, p. 112-131, 1986.
[6] Bartholomew, P., On the theory of stability of galaxies, Monthly Notices of the Royal Astro-
nomical Society, Vol. 151, p. 333 (1971).
[7] Bertin, Giuseppe, Dynamics of Galaxies, Cambridge University Press, 2000.
[8] Binney, J., Tremaine, S., Galactic Dynamics. Princeton University Press, 1987.
[9] Doremus, J. P.; Baumann, G.; Feix, M. R., Stability of a Self Gravitating System with Phase
Space Density Function of Energy and Angular Momentum, Astronomy and Astrophysics,
Vol. 29, p. 401 (1973).
[10] Gillon, D.; Cantus, M.; Doremus, J. P.; Baumann, G., Stability of self-gravitating spherical
systems in which phase space density is a function of energy and angular momentum, for
spherical perturbations, Astronomy and Astrophysics, vol. 50, no. 3, p. 467-470, 1976.
[11] Fridman, A., Polyachenko, V., Physics of Gravitating System Vol I and II, Springer-Verlag,
1984.
[12] Glassey, Robert T., The Cauchy problem in kinetic theory, SIAM, Philadelphia, PA, 1996.
[13] Goodman, Jeremy, An instability test for nonrotating galaxies, Astrophysical Journal, vol.
329, p. 612-617, 1988.
[14] Guo, Y., Variational method for stable polytropic galaxies, Arch. Rational Mech. Anal., 147,
225-243, 1999.
24 YAN GUO AND ZHIWU LIN
[15] Guo, Y., On generalized Antonov stablility criterion for polytropic steady states, Contem.
Math., 263, 85-107, 1999.
[16] Guo, Y., Rein, G., Stable steady states in stellar dynamics, Arch. Rational Mech. Anal., 147,
no. 3, 225-243, (1999).
[17] Guo, Y., Rein, G., Existence and stability of Camm type steady states in galactic dynamics,
Indiana U. Math. J., 48, 1237-1255, 1999.
[18] Guo, Y., Rein, G., Isotropic steady states in stellar dynamics, Commun. Math. Phys., 219,
2001.
[19] Guo, Y., Rein, G., Isotropic steady states in stellar dynamics revisited., Los Alamos Preprint,
2002.
[20] Henon, M., Numerical Experiments on the Stability of Spherical Stellar Systems, Astronomy
and Astrophysics, Vol. 24, p. 229 (1973).
[21] Guo, Y., Rein, G., Stability of the King Model and Symmetric Measure-Preserving Pertur-
bations, Preprint.
[22] Kandrup, H.; Signet, J. F.; A simple proof of dynamical stability for a class of spherical
clusters. The Astrophys. J. 298, p. 27-33.(1985)
[23] Kandrup, Henry E., A stability criterion for any collisionless stellar equilibrium and some
concrete applications thereof, Astrophysical Journal, vol. 370, p. 312-317, 1991.
[24] King, Ivan R., The structure of star clusters. III. Some simple dynamical models, Astronom-
ical Journal, Vol. 71, p. 64 (1966).
[25] Lin, Zhiwu, Instability of periodicBG waves, Math. Res. Letts., 8, 521-534(2001).
[26] Lin, Zhiwu and Strauss, Walter, Linear stability and instability of relativistic Vlasov-Maxwell
systems, to appear in Comm. Pure Appl. Math.
[27] Lin, Zhiwu and Strauss, Walter, Nonlinear stability and instability of relativistic Vlasov-
Maxwell systems, to appear in Comm. Pure Appl. Math.
[28] Lin, Zhiwu and Strauss, Walter, A sharp stability criterion for the Vlasov-Maxwell systems,
submitted.
[29] Lynden-Bell, D., The Hartree-Fock exchange operator and the stability of galaxies, Monthly
Notices of the Royal Astronomical Society, Vol. 144, p.189, 1969.
[30] Lynden-Bell, D. Lectures on stellar dynamics. Galactic dynamics and N-body simulations
(Thessaloniki, 1993), 3–31, Lecture Notes in Phys., 433, Springer, Berlin, 1994.
[31] Merritt, David, Elliptical Galaxy Dynamics, The Publications of the Astronomical Society of
the Pacific, Volume 111, Issue 756, pp. 129-168.
[32] Palmer, P. L., Stability of collisionless stellar systems: mechanisms for the dynamical struc-
ture of galaxies, Kluwer Academic Publishers, 1994.
[33] Perez, Jerome and Aly, Jean-Jacques, Stability of spherical stellar systems - I. Analytical
results, Monthly Notices of the Royal Astronomical Society, Volume 280, Issue 3, pp. 689-
699, 1996.
[34] Pfaffelmoser, K, Global classical solutions of the Vlasov-Poisson system in three dimensions
for general initial data, J. Differential Equations 95 (1992), no. 2, 281–303.
[35] Rein, G.: Collisionless Kinetic Equations from Astrophysics - The Vlasov-Poisson system.
Preprint 2005.
[36] Schaeffer, Jack, Steady states in galactic dynamics, Arch. Ration. Mech. Anal. 172 (2004),
no. 1, 1–19.
[37] Sygnet, J. F.; des Forets, G.; Lachieze-Rey, M.; Pellat, R., Stability of gravitational systems
and gravothermal catastrophe in astrophysics, Astrophysical Journal, vol. 276, p. 737-745,
1984.
[38] Wan, Y-H., On onlinear stability of isotropic models in stellar dynamics, Arch. Rational.
Mech. Anal., 147, (1999) 245-268.
[39] Wolansky, G., On nonlinear stability of polytropic galaxies. Ann. Inst. Henri Poincare. (1999),
16, 15-48.
[40] Yosida, Kôsaku, Functional analysis, Sixth edition. Grundlehren der Mathematischen Wis-
senschaften, 123. Springer-Verlag, 1980.
UNSTABLE AND STABLE GALAXY MODELS 25
Lefschetz Center for Dynamical Systems, Division of Applied Mathematics, Brown
University, Providence, RI 02912, USA
E-mail address: guoy@cfm.brown.edu
Mathematics Department, University of Missouri, Columbia, MO 65211 USA
E-mail address: lin@math.missouri.edu
	1. Introduction
	2. An Instability Criterion
	3. Nonlinear Stability of the King's Model
	References
ABSTRACT
  To determine the stability and instability of a given steady galaxy
configuration is one of the fundamental problems in the Vlasov theory for
galaxy dynamics. In this article, we study the stability of isotropic spherical
symmetric galaxy models $f_{0}(E)$, for which the distribution function $f_{0}$
depends on the particle energy $E$ only. In the first part of the article, we
derive the first sufficient criterion for linear instability of $f_{0}(E):$
$f_{0}(E)$ is linearly unstable if the second-order operator \[
A_{0}\equiv-\Delta+4\pi\int f_{0}^{\prime}(E)\{I-\mathcal{P}\}dv \] has a
negative direction, where $\mathcal{P}$ is the projection onto the function
space $\{g(E,L)\},$ $L$ being the angular momentum [see the explicit formula
(\ref{A0-radial})]. In the second part of the article, we prove that for the
important King model, the corresponding $A_{0}$ is positive definite. Such a
positivity leads to the nonlinear stability of the King model under all
spherically symmetric perturbations.

<|endoftext|><|startoftext|>
7 Flops connect minimal models
Yujiro Kawamata
October 30, 2018
Abstract
A result by Birkar-Cascini-Hacon-McKernan together with the bound-
edness of length of extremal rays implies that different minimal models
can be connected by a sequence of flops.
A flop of a pair (X,B) is a flip of a pair (X,B′) which is crepant for
KX + B where B
′ is a suitably chosen different boundary. We prove the
following:
Theorem 1. Let f : (X,B) → S and f ′ : (X ′, B′) → S be projective
morphisms from Q-factorial terminal pairs of varieties and Q-divisors such
that KX+B and KX′ +B
′ are relatively nef over S. Assume that there exists
a birational map α : X 99K X ′ such that α∗B = B
′, where the lower asterisk
denotes the strict transform. Then α is decomposed into a sequence of flops.
More precisely, there exist an effective Q-divisor D on X such that
(X,B +D) is klt and a factorization of the birational map α
X = X0 99K X1 99K · · · 99K Xt = X
which satisfy the following conditions:
(1) αi : Xi−1 → Xi (1 ≤ i ≤ t) is a flip for the pair (Xi, Bi +Di) over S,
where Bi and Di are strict transforms of B and D, respectively.
(2) αi is crepant for KXi−1 + Bi−1 in the sense that the pull-backs of
KXi−1 +Bi−1 and KXi +Bi coincide on a common log resolution.
We remark that the boundary B need not be assumed to be big as in
[1] Corollary 1.1.3. For example, a birational map between Calabi-Yau man-
ifolds can be decomposed into a sequence of flops. The number of marked
http://arxiv.org/abs/0704.1013v1
minimal models which are birationally equivalent to a fixed pair is finite if
B is big ([1] Corollary 1.1.5), but it is not the case in general (cf. [4]), where
a marked minimal model is a pair consisting of a minimal model and a fixed
birational map to it. If we relax the condition for the pairs to being klt, then
we should allow crepant blowings up besides flops.
The theorem was already proved in the case dimX = 3 and B = 0; first
in [2] assuming the abundance which was proved afterwards, and later in [5]
without assumption.
Proof. It is well-known that α is an isomorphism in codimension 1 because
(X,B) and (X ′, B′) are terminal and KX + B and KX′ + B
′ are relatively
nef (cf. [2]). We recall the proof for reader’s convenience. Let µ : V → X
and µ′ : V → X ′ be common log resolutions. We write
KV = µ
∗(KX +B)− µ
B + E = (µ′)∗(KX′ +B
′)− (µ′
)−1B′ + E ′
where E and E ′ are effective divisors whose supports coincide with the excep-
tional loci of µ and µ′, respectively, because (X,B) and (X ′, B′) are terminal.
Assume that there is a prime divisor on V which is contracted by µ but not
by µ′. Then it is an irreducible component of E but not of E ′. We set
F = min{E,E ′}, Ē = E − F and Ē ′ = E ′ − F . By the Hodge index theo-
rem, there exists a curve C on V which is contracted by µ and is contained
in Supp(Ē) but not in Supp(µ−1
B + Ē ′) and such that (Ē · C) < 0. Since
B ≥ (µ′
)−1B′, we have
((µ′)∗(KX′ +B
′) + µ−1
B − (µ′
)−1B′ + Ē ′) · C) ≥ 0.
But this is a contradiction to
(µ∗(KX +B) + Ē) · C) < 0.
The case where there is a prime divisor on V which is contracted by µ′ but
not by µ is treated similarly.
Let L′ be an effective f ′-ample divisor on X ′, and L its strict transform
on X . There exists a small positive number l such that (X,B + lL) is klt.
If KX + B + lL is f -nef over S, then α becomes a morphism by the base
point free theorem, hence an isomorphism since X is Q-factorial. Therefore
we may assume that KX +B + l
′L is not f -nef over S for any 0 < l′ ≤ l.
Let H be an effective divisor on X such that (X,B + lL+ tH) is klt and
KX + B + lL + tH is f -nef for some positive number t. We shall run the
MMP for the pair (X,B + l′L) over S with scaling of H for some l′. Since α
is an isomorphism in codimension 1, there are only flips in this MMP. The
following lemma shows that we can choose extremal rays such that the flips
are crepant with respect to KX +B.
Let k be a positive integer such that k(KX +B) is a Cartier divisor. We
set e = 1
2k dimX+1
Lemma 2. (1) There exists an extremal ray R for (X,B + lL) over S such
that ((KX +B) ·R) = 0.
(2) Let
t0 = min{t ∈ R | ((KX +B + lL+ tH) · R) ≥ 0 for all extremal rays R
for (X,B + lL) over S s.t. ((KX +B) ·R) = 0}.
Then KX +B + elL+ et0H is f -nef, and there exists an extremal ray R for
(X,B+elL) over S such that ((KX+B+elL+et0H)·R) = ((KX+B)·R) = 0.
Proof. (1) Since KX +B + elL is not nef, there exists an extrenal ray R for
(X,B+ elL) over S. Then R is also an extremal ray for (X,B+ lL) because
(X,B) is f -nef. Since the pair (X,B+ lL) is klt, R is generated by a rational
curve C, which is mapped to a point on S, such that
0 > ((KX +B + lL) · C) ≥ −2 dimX
by [3].
We claim that ((KX +B) ·C) = 0. Indeed we have otherwise ((KX +B) ·
C) ≥ 1/k, hence
((KX +B + elL) · C)
2k dimX + 1
((KX +B + lL) · C) +
2k dimX
2k dimX + 1
((KX +B) · C)
2k dimX + 1
(−2 dimX + 2dimX) = 0
a contradiction.
(2) If KX + B + elL + et0H is not f -nef, then there exists an extremal
ray R for (X,B + elL + et0H) over S. Then R is also an extremal ray for
(X,B+ lL+t0H) because (X,B) is f -nef. Since the pair (X,B+ lL+t0H) is
klt, R is generated by a rational curve C such that ((KX+B+lL+t0H)·C) ≥
−2 dimX by [3]. Then we have
((KX +B + elL+ et0H) · C)
2k dimX + 1
((KX +B + lL+ t0H) · C) +
2k dimX
2k dimX + 1
((KX +B) · C)
2k dimX + 1
(−2 dimX + 2dimX) = 0
a contradiction. Therefore KX +B + elL+ et0H is f -nef.
Since B + lL is f -big, the number of extremal rays for (X,B + lL) over
S is finite. Hence there exists such an R that ((KX +B + lL+ t0H) · R) =
((KX +B) · R) = 0.
We note that we can deduce (1) from only the finiteness of extremal rays,
but not (2). The point is that the number e stays independent of t0 during
the MMP.
We run the MMP for (X,B+elL) with scaling ofH . We take an extremal
ray R such that ((KX +B+ elL+ et0H) ·R) = ((KX +B) ·R) = 0. The flip
exists by [1] Corollary 1.4.1. Since (lL+ t0H) ·R) = 0, the pair (X,B+ lL+
t0H) remains to be klt after the flip. We also note that k(KX+B) remains to
be a Cartier divisor after the flip by the base point free theorem. Therefore
we can continue the process. By the termination theorem of directed flips
([1] Corollary 1.4.2), we complete our proof.
References
[1] Caucher Birkar, Paolo Cascini, Christopher D. Hacon, James McK-
ernan. Existence of minimal models for varieties of log general type.
math.AG/0610203.
[2] Kawamata, Yujiro. Crepant blowing-up of 3-dimensional canonical
singularities and its application to degenerations of surfaces. Ann.
of Math. 127 (1988), 93–163.
[3] Kawamata, Yujiro. On the length of an extremal rational curve. In-
vent. Math. 105 (1991), 609–611.
[4] Kawamata, Yujiro. On the cone of divisors of Calabi-Yau fiber
spaces. Internat. J. Math. 8 (1997), 665–687.
http://arxiv.org/abs/math/0610203
[5] Kollár, János. Flops. Nagoya Math. J. 113(1989), 15–36.
Department of Mathematical Sciences, University of Tokyo,
Komaba, Meguro, Tokyo, 153-8914, Japan
kawamata@ms.u-tokyo.ac.jp
ABSTRACT
  A remark on a paper by Birkar-Cascini-Hacon-McKernan.

<|endoftext|><|startoftext|>
7 A product formula for volumes of varieties
Yujiro Kawamata
October 28, 2018
The volume v(X) of a smooth projective variety X is defined by
v(X) = lim sup
dimH0(X,mKX)
md/d!
where d = dimX . This is a birational invariant.
Theorem 0.1. Let f : X → Y be a surjective morphism of smooth projective
varieties with connected fibers. Assume that both Y and the general fiber F
of f are varieties of general type. Then
v(Y )
v(F )
where dX = dimX, dY = dimY and dF = dimF .
Proof. Let H be an ample divisor on Y . There exists a positive integer m0
such that m0KY −H is effective.
Let ǫ be a positive integer. By Fujita’s approximation theorem ([1]),
after replacing a birational model of X , there exists a positive integer m1 and
ample divisors L on F such thatm1KF−L is effective and v(
L) > v(F )−ǫ.
By Viehweg’s weak positivity theorem ([2]), there exists a positive integer
k such that Sk(f
OX(m1KX/Y )⊗OY (H)) is generically generated by global
sections for a positive integer k. k is a function on H and m1.
We have
rank Im(SmSk(f
OX(m1KX/Y )) → f∗OX(km1mKX/Y ))
≥ dimH0(F, kmL)
≥ (v(F )− 2ǫ)
(km1m)
http://arxiv.org/abs/0704.1014v1
for sufficiently large m.
dimH0(X, km1mKX)
≥ dimH0(Y, k(m1 −m0)mKY )× (v(F )− 2ǫ)
(km1m)
≥ (v(Y )− ǫ)
(k(m1 −m0)m)
(v(F )− 2ǫ)
(km1m)
≥ (v(Y )− 2ǫ)(v(F )− 2ǫ)
(km1m)
dY !dF !
if we take m1 large compared with m0 such that
(v(Y )− ǫ)
(v(Y )− 2ǫ)
m1 −m0
)dY .
Remark 0.2. If X = Y × F , then we have an equality in the formula. We
expect that the equality implies the isotriviality of the family.
References
[1] Fujita, Takao. Approximating Zariski decomposition of big line bundles.
Kodai Math. J. 17 (1994), no. 1, 1–3.
[2] Viehweg, Eckart. Weak positivity and the additivity of the Kodaira di-
mension for certain fibre spaces. Algebraic varieties and analytic vari-
eties (Tokyo, 1981), 329–353, Adv. Stud. Pure Math., 1, North-Holland,
Amsterdam, 1983.
Department of Mathematical Sciences, University of Tokyo,
Komaba, Meguro, Tokyo, 153-8914, Japan
kawamata@ms.u-tokyo.ac.jp
ABSTRACT
  A simple application of the semipositivity.

<|endoftext|><|startoftext|>
Introduction
In [1][2], Zucchini has constructed a two dimensional topological sigma model on generalized
complex geometry [3] [4] [5] by the AKSZ formulation [6] (also see [7]), which is a general
geometrical framework to construct a topological sigma model by the Batalin-Vilkovisky
formalism [8]. Also, there are many recent papers [9]-[32] on this topic. Zucchini’s model is a
generalization of the Poisson sigma model and is similar to A model in [6]. However B model
looks different from the Zucchini model because B model has more fields than the Zucchini
model has.
In this paper, we propose an alternative realization of generalized complex geometry by
a topological field theory by the AKSZ formulation. Our model is similar to B model, not
A model in the sense of AKSZ, as a worldsheet action of a topological sigma model with
superifields on a supermaifold. Our model is the first candidate which naturally includes B
model and may be related to a topological string theory on generalized Calabi-Yau geometry
[23] [24].
First we construct a three dimensional topological field theory of generalized complex
geometry with a nontrivial 3-form H , which has Zucchini’s model as a boundary action. This
topological field theory is a reconstruction by the AKSZ formulation of the model proposed
in the paper [33]. Next after a dimensional reduction, we derive a topological field theory
of generalized complex geometry in two dimensions from three dimensions. We can see that
this model has a generalized complex structure as a consistency condition of a topological
BV action. If the generalized complex structure is a complex structure, our model has one
parameter marginal deformation of the model without changing a complex structure, and
reduces to B model in a limit of the deformation. If the generalized complex structure is a
symplectic structure, our model becomes a new 2D topological sigma model with a symplectic
structure.
The paper is organized as follows. In section 2, the AKSZ actions of A model, B model
and the Zucchini model are reviewed. In section 3, three dimensional topological field the-
ory of generalized complex geometry is rederived in the AKSZ formulation. In section 4, we
derive a two dimensional topological field theory of generalized complex geometry and check
its properties. In section 5, our model is reduced in two special ways. Section 6 includes
conclusion and discussion. In appendix A, a generalized complex structure is briefly summa-
rized. In appendix B, the AKSZ formulation of the Batalin-Vilkovisky formalism in general
n dimensions is reviewed.
2 A Model, B Model and Zucchini Model
In this section, we review the AKSZ formulation of topological sigma models such as A model,
B model and the Zucchini model.
2.1 A Model and B Model
A model and B model are defined on the graded bundle
T ∗[1]M ⊕ (T [1]M ⊕ T ∗[0]M) . (1)
Here E = TM , n = 2 and p ≥ 1 in the general graded bundles (100). Local coordinates
are written by superfields on this bundle: (φi,B1i,A1
i,B0,i). φ
i is a map φi : ΠTΣ →
M , and B1i is a basis of sections of ΠT
∗Σ ⊗ φ∗(T ∗[1]M). A1i is a basis of sections of
ΠT ∗Σ ⊗ φ∗(T [1]M), and B0i is a basis of sections of ΠT ∗Σ ⊗ φ∗(T ∗[0]M). The antibracket
on this bundle (1) is
(F,G) ≡ F
∂B1,i
∂B1,i
∂B0,i
∂B0,i
G (2)
from (102).
The A model action with a symplectic form Qij in [34] is
SAQ =
Qij(φ)dφ
idφj , (3)
where d is a superderivative d = θµ∂µ. where the integration
ΠTΣ means the integration on the
supermanifold,
ΠTΣ d
2θd2σ. This action is consistent if and only if the 2-formQ = 1
Qijdφ
satisfies the symplectic condition dMQ = 0, namely
∂kQij + ∂iQjk + ∂jQki = 0. (4)
This model is rewritten by the AKSZ formulation on the graded bundle T ∗[1]M⊕(T [1]M ⊕ T ∗[0]M).
We introduce Ai1, B0i and B1i as auxiliary fields, and rewrite the action using the first order
formalism. The action in AKSZ formulation is
SAQ =
B1idφ
i −B0idAi1 −B1iAi1 +
Qij(φ)A
. (5)
We can check that (SAQ, SAQ) = 0 if and only if the 2-formQ satisfies the symplectic condition
Also, A model action with a Poisson bivector P ij is
SAP =
B1idφ
i −B0idAi1 +
P ij(φ)B1iB1j , (6)
which is called the Poisson sigma model [35][36]. The consistency condition (SAP , SAP ) = 0
is satisfied if and only if P ij is a Poisson bivector field i.e.
P il∂lP
jk + P jl∂lP
ki + P kl∂lP
ij = 0. (7)
B model with a complex structure J ij is
B1idφ
i −B0idAi1 + J ij(φ)B1iA
∂J ik
(φ)B0iA
1, (8)
which is a covariant form of B model action in [6], but is different from the action in [37]. We
can check that the consistency condition (SB, SB) = 0 is satisfied if and only if J
j satisfies
the integrability condition for the complex structure
J li∂lJ
j − J lj∂lJki − Jkl∂iJ lj + Jkl∂jJ li = 0. (9)
2.2 Zucchini Model
In [1], Zucchini has proposed a topological sigma model with a generalized complex structure
on a two dimensional worldsheet Σ. Although he called this model ”the Hitchin sigma model”,
here we call it the Zucchini model.
First we consider H = 0 case. The action of the Zucchini’s model is
B1idφ
P ij(φ)B1iB1j +
Qij(φ)dφ
idφj + J ij(φ)B1idφ
j . (10)
The master equation (SZ , SZ) = 0 is satisfied if P , Q and J satisfy the conditions for a
generalized complex structure (73), (74) and (75). We can see that the Batalin-Vilkovisky
structure of this model defines a generalized complex structure on a target manifold M . If
J ij = 0 in the action (10), the action reduces to the summation of two realizations of A model
such that (3) + (6). However, if P ij = Qij = 0, the action (10) does not reduce to the B
model action (8). So we can not easily see whether the Zucchini model can be related to B
model.
Also, we can consider b-transformation property of this model [1]. The b-transformation
is defined by (77), (83) and
= φi,
B̂1i = B1i + bijdφ
j . (11)
The b-transformation produces the b field term such as
ŜZ = SZ −
bijdφ
idφj. (12)
This suggests that the Zucchini action with H 6= 0 should have a Wess-Zumino term
SZH =
B1idφ
P ijB1iB1j +
Qijdφ
idφj + J ijB1idφ
Hijkdφ
idφjdφk,(13)
where X is a three dimensional worldvolume such that Σ = ∂X is a two dimensional boundary
of X .
3 3D Topological Field Theory with Generalized Com-
plex Structures from 2D Zucchini Model
In this section, we review a three dimensional topological field theory with a generalized
complex structure from the Zucchini model in two dimensions. Here this topological field
theory is redefined by the AKSZ formulation, which was not explicitly written in [33].
3.1 H = 0 case
Let X be a three dimensional worldvolume with a coordinate (σM) for M = 1, 2, 3, and
Σ = ∂X be a two dimensional boundary of X . First we consider H = 0 case.
By using the Stokes theorem, we can see the action (10) as
B1idφ
P ijB1iB1j +
Qijdφ
idφj + J ijB1idφ
dB1idφ
∂P ij
dφkB1iB1j + P
ijdB1iB1j +
dφkdφidφj
∂J ij
dφkB1idφ
j + J ijdB1idφ
j , (14)
where d is a three dimensional derivative d = θM∂M . φ
i and B1i can be extended to those
on X such that φi : ΠTX → M and B1i is a basis of sections of ΠT ∗X ⊗ φ∗(T ∗[1]M). We
introduce a superfield Ai1 with total degree one, which is a basis of a section of ΠT
∗(T [1]M) such that Ai1 = dφ
i, and a superfield B2i with total degree two, which is a basis
of a section of ΠT ∗X ⊗ φ∗(T ∗[2]M) such that B2i = −dB1i. Moreover, we introduce two
Lagrange multiplier fields Y 2i and Z
1 in order to realize two equations such as A
1 = dφ
and B2i = −dB1i by the equations of motion. The superfield Y 2i with total degree two is a
section of ΠT ∗X ⊗ φ∗(T ∗[2]M), and the superfield Zi1 with total degree one is a section of
ΠT ∗X ⊗ φ∗(T [1]M). The 3D action (14) is equivalent to
−B2iAi1 +
∂P ij
1B1iB1j − P ijB2iB1j +
∂J ij
1B1iA
1 − J ijB2iA
1 + (A
1 − dφ
i)Y 2i + (B2i + dB1i)Z
1. (15)
We define Y ′2i = Y 2i − 12B2i and Z
1 = Z
1 − 12A
1. The action (15) is rewritten as
SZ = Sa + Sb + total derivative ;
−Y ′2idφi + dB1iZ ′i1 + Y ′2iAi1 +B2iZ ′i1 ,
B2idφ
B1idA
1 − J ijB2iA
1 − P ijB2iB1j +
1B1k +
∂P jk
1B1jB1k. (16)
where Sa is independent of a generalized complex structure. Sb can be written as
〈0 +B2, d(φ+ 0)〉+
〈A1 +B1, d(A1 +B1)〉
−〈0 +B2,J (A1 +B1)〉 −
〈A1 +B1,Ai1
(A1 +B1)〉+ total derivative,(17)
which is analogical with the B model action (8).
The antibracket (P-structure) on X , which is induced from the antibracket (2) on Σ, for
i, B2,i, A1
i and B1,i is given by the antibracket (102) in n = 3. In order to define the
antibrackets for Y ′2i and Z
1 , we introduce two antibracket conjugate fields X
i, which are
maps from ΠTX to M , and V 1i, which are sections of ΠT
∗X ⊗ φ∗(T ∗[1]M). The model
is defined on the graded bundle of the direct product of T ∗[2]M ⊕ (T [1]M ⊕ T ∗[1]M) and
(T [0]M⊕T ∗[2]M)⊕ (T [1]M ⊕ T ∗[1]M). The second bundle is represented by auxiliary fields.
The antibracket is
(F,G) ≡ F
∂B2,i
∂B2,i
∂B1,i
∂B1,i
∂Y ′2,i
∂Y ′2,i
∂Z1′i
∂V 1,i
∂V 1,i
∂Z1′i
G. (18)
We can check that SZ satisfies the master equation (SZ , SZ) = 0 if J , P and Q are components
of the generalized complex structure (72). We can take the proper boundary conditions
Σ = ∂X ;
//|∂X = 0,B2i//|∂X = 0,Y ′2i//|∂X = 0,Z1
//|∂X = 0, (19)
such that the total derivative terms on the master equation (SZ , SZ) vanish. Here // means
that we take the components which are tangent to the boundary ∂X .
Also, because (Sa, Sa) = (Sa, Sb) = 0, Sb satisfies the master equation (Sb, Sb) = 0
Aijk = Bijk = Cijk = 0,
∂iDjkl + (ijkl cyclic) = 0, (20)
where Aijk, Bijk, Cijk and Djkl are defined in Appendix A. Therefore, we can see Sb as a three
dimensional AKSZ action with generalized complex structure. We discuss why the condition
is not Djkl = 0 but ∂iDjkl + (ijkl cyclic) = 0 in subsection 3.3.
We call Sb three dimensional generalized complex sigma model.
We consider 3D b-transformation property from the 2D b-transformations (11) and the
conditions Ai1 = dφ and B2i = −dB1i. 3D b-transformations are
= φi,
= Ai1,
B̂1i = B1i + bijA
B̂2i = B2i − d(bijAj1),
2i = Y
bijdA
−J lk
1 − P lk
1B1k + d(J
kbliA
1) + d(P
lkbliB1k),
= Z ′i1 . (21)
We can see that 3D action (16) is invariant under the b-transformation such that
ŜZ = SZ . (22)
3.2 H 6= 0 case I :Action induced from the Zucchini model
In the similar way, we can consider the case of a twisted generalized complex structure with
H 6= 0. From the Zucchini model with H 6= 0 (13), a three dimensional action is derived as
SZH = Sa + SHb + total derivative ;
−Y ′2idφ
i + dB1iZ
1 + Y
i +B2iZ
SHb =
B2idφ
B1idA
1 − J ijB2iA
1 − P ijB2iB1j +
Hijk +
1B1k +
∂P jk
1B1jB1k. (23)
This action (23) satisfies the master equation (SZH, SZH) = 0, if J , P , Q and H are compo-
nents of a twisted generalized complex structure (87). However, this action is not b-invariant
under the b-transformation (21), (77) and (83). The action (23) transforms under the b-
transformation as
ŜZH = SZH −
1 = SZH −
(dMb)[ijk]A
1, (24)
which has been expected from b-transformation property (12) in the two dimensional model.
Since H is closed, from the Poincaré Lemma, we can locally write H with a 2-form q on
M such as
Hijk =
. (25)
1 term in (23) can be absorbed to Q by a local b-transformation qij = bij in
the action (23), and we obtain just the H = 0 action (16). In other words, the H terms in
(23) are consistent up to H-exact terms as a global theory, and this model is meaningful only
as a cohomology class in H3(M). It is a gerbe gauge transformation dependence [1].
If we set Qij = J
j = 0 in (13), we obtain the AKSZ formulation of the WZ-Poisson sigma
model [38]:
SWZP =
B1idφ
P ijB1iB1j +
Hjkldφ
idφjdφk. (26)
From (23), the 3D topological sigma model equivalent to (26) is
SWZP = Sa + SWZPb ;
−Y ′2idφ
i + dB1iZ
1 + Y
1 +B2iZ
SWZPb =
B2idφ
dB1iA
1 − P ijB2iB1j +
HijkA
∂P jk
1B1jB1k. (27)
3.3 H 6= 0 case II : b-invariant action
We can construct a b-invariant action with H 6= 0 in three dimensional manifold X . We
introduce other H terms.
SI = Sa + SIb ;
−Y ′2idφ
i + dB1iZ
1 + Y
1 +B2iZ
SIb =
B2idφ
B1idA
1 − J ijB2iA
1 − P ijB2iB1j +
J liHjkl +
−P klHijl −
1B1k +
∂P jk
1B1jB1k. (28)
SI satisfies the master equation (SI , SI) = 0 under the antibracket (18) if and only if J , P ,
Q and H are components of a twisted generalized complex structure. Namely, the master
equation (SI , SI) = 0 gives
AHijk = BHijk = CHijk = 0,
∂iDHjkl + (ijkl cyclic) = 0, (29)
where AHijk, BHijk, CHijk and DHjkl are defined in Appendix A. The integrability condition
is not DHijk = 0 but ∂iDHjkl + (ijkl cyclic) = 0 because the action SI is b-transformation
invariant, H ijk has b-transformation ambiguity by (83), and H is defined as a cohomology
class in H3(M) in a twisted generalized complex structure.
Since SIa does not depend on a twisted generalized complex structure, (SIb, SIb) = 0 is
satisfied under the condition (29). We can introduce the coupling constants by redefining Y ′2i
and Z1
′i to g1Y
2i and g2Z
1 . If we take the limits that g1 → 0 and g2 → 0, then SI → SIb
and a twisted generalized complex structure does not change. We call this model SIb a three
dimensional twisted generalized complex sigma model.
We can change the b-transformation so that the action SI is invariant, though the action
(28) is not invariant under the original b-transformation (21). The b-transformations for B2i
and Y ′2i are changed to
B̂2i = B2i −
2i = Y
1 − bijdZ
1, (30)
and b-transformations for the other fields are the same as (21). Then we can check ŜI = SI
after short calculation.
4 2D Topological Field Theory of Generalized Complex
Geometry
In this section, we propose a new two dimensional topological field theory of generalized
complex geometry using the 3D topological field theory. First, only a part of the 3D BV
formalism action is dimensionally reducted to in two dimension, and next this is modified
in the 2D BV formalism such that the master equations determine just generalized complex
structures. One important reason to have to take this unusual way is that generally, master
equations of BV formalisms are not kept by a dimensional reduction.
4.1 H = 0
First we consider the H = 0 case. We consider a dimensional reduction, which can keep a
generalized complex structure, from a three dimensional worldvolume X to a two dimensional
manifold Σ′. X is compactified to Σ′ × S1. Then ΠTX is compactified to ΠTΣ′ × ΠTS1. It
should be noticed that Σ′ is generally a different manifold from Σ.
Here, we take X = Σ×R+, where Σ has a local coordinate (σ1, σ2) and R+ = [0,∞) has
a local coordinate (σ3). The second component (σ2) is compactified such that Σ′ = L×R+,
whose local coordinate is (σ1, σ3), where L is a manifold in one dimension. We formulate
the dimensional reduction from a general three dimensional manifold X to a general two
dimensional manifold Σ′. Here we ignore Kaluza-Klein modes and consider only massless
sectors, because we will see that the consistent BV action can be constructed in two di-
mension even if these KK modes are omitted. It is not our purpose that we derive the
two dimensional model which is completely equivalent to the 3D topological field theory.
The target graded bundle for the three dimensional model, T ∗[2]M ⊕ (T [1]M ⊕ T ∗[1]M), re-
duces to the graded bundle for the two dimensional model, (T ∗[1]M ⊕ (T [−1]M ⊕ T ∗[2]M))⊕
((T [0]M ⊕ T ∗[1]M)⊕ (T [1]M ⊕ T ∗[0]M)). Under the dimensional reduction (σ1, σ2, σ3) →
(σ1, σ3), the fields are reduced as follows.
i(σ1, σ2, σ3) = φ̃
(σ1, σ3) + θ2φ̃−1
(σ1, σ3),
1, σ2, σ3) = Ã1
(σ1, σ3) + θ2α̃0
i(σ1, σ3),
B1i(σ
1, σ2, σ3) = B̃1i(σ
1, σ3) + θ2β̃0i(σ
1, σ3),
B2i(σ
1, σ2, σ3) = B̃2i(σ
1, σ3) + θ2β̃1i(σ
1, σ3), (31)
where φ̃−1
has the total degree −1, φ̃
, α̃0
i and β̃0i have the total degree 0, Ã1
, B̃1i and
β̃1i have the total degree 1, and B̃2i has the total degree 2. All these superfields do not
depend on θ2.
The antibracket induced from three dimensions is
(F,G) ≡ F
∂β̃1i
∂β̃1i
∂φ̃−1
∂B̃2i
∂B̃2i
∂φ̃−1
∂β̃0i
∂β̃0i
∂B̃1i
∂B̃1i
G. (32)
We take a three dimensional AKSZ action Sb (16) with a generalized complex structure. The
existence of the negative total degree superfield φ̃−1
complexifies the dimensional reduction
in the AKSZ formulation. Generally in [39], it is known that even if we substitute (31) to
(16), we do not obtain the correct AKSZ action in two dimensions, and we need more φ̃−1
terms.
In order to derive the correct AKSZ action, first we should consider the dimensional
reduction via the non-BV formalism. The superfields are expanded by the ghost numbers to
i = φ(0)i + φ(−1)i + φ(−2)i + φ(−3)i,
B1i = B
1,i +B
1,i +B
1,i +B
1,i ,
i = A
1 + A
1 + A
(−1)i
1 + A
(−2)i
B2,i = B
2,i +B
2,i +B
2,i +B
2,i , (33)
where φ(−1)i ≡ θMφ(−1)iM , etc. After setting all the antifield with negative ghost numbers to
zero, the following non-BV action is
2i dφ
(0)i +
1i dA
1 − J ijB
1 − P ijB
∂φ(0)i
(φ(0)i)A
∂φ(0)i
∂φ(0)j
(φ(0)i)A
∂P jk
∂φ(0)i
(φ(0)i)A
1k . (34)
Since by the dimensional reduction, the fields reduce to
φ(0)i(σ1, σ2, σ3) = φ̃
(σ1, σ2),
1, σ2, σ3) = Ã1
(σ1, σ3) + θ2α̃0
(0)i(σ1, σ3),
1i (σ
1, σ2, σ3) = B̃1
1, σ3) + θ2β̃0
1, σ3),
2i (σ
1, σ2, σ3) = B̃2
1, σ3) + θ2β̃1
1, σ3), (35)
the action (34) reduces to
i dφ̃
+ B̃1
i dα̃0
(0)i + Ã1
− J ijÃ1
i + P
ijB̃1
i β̃1
 α̃0
(0)k +
− ∂J
 β̃0
 Ã1
 α̃0
(0)j − ∂P
 Ã1
∂P jk
 B̃1
j B̃1
J ijα̃0
(0)j + P ijβ̃0
i , (36)
up to total derivative terms. Therefore the action S
R of a 2D topological field theory is
R = S
0 + S
i dφ̃
+ B̃1
i dα̃0
(0)i + Ã1
−J ijÃ1
i + P
ijB̃1
i β̃1
 α̃0
(0)k +
− ∂J
 β̃0
 Ã1
 α̃0
(0)j − ∂P
 Ã1
∂P jk
 B̃1
j B̃1
J ijα̃0
(0)j + P ijβ̃0
i . (37)
Next we formulate the action SR by the AKSZ formulation. We define SR = S0 + S1
where S0 and S1 are AKSZ actions for S
0 and S
1 , respectively. S0 is easily derived after
substituting (31) to (16);
β̃1idφ̃
− B̃2idφ̃−1
+ B̃1idα̃0
i + Ã1
dβ̃0i
up to total derivative terms. The condition (S1, S1) = 0 comes from (38) and (SR, SR) = 0.
We introduce an negative total degree, which is defined as one for φ̃−1, and zero for the other
fields. We can expand S1 for the negative total degree such as S1 =
p=0 S
1 , where
i1 · · · φ̃−1
ipL[p]i1···ip(φ̃, Ã1
, α̃0, B̃1i, β̃0i, B̃2i, β̃1i) (39)
are the negative total degree p terms. Therefore
SR = S0 +
1 . (40)
Here we write the first two actions S
1 and S
1 with the negative total degree zero and one
by substituting (31) to (16),
−J ijÃ1
β̃1i + P
B̃1iβ̃1j
 α̃0
 β̃0k
 Ã1
 α̃0
j − ∂P
 Ã1
B̃1k +
∂P jk
 B̃1jB̃1k
J ijα̃0
j + P ijβ̃0j
B̃2i, (41)
∂J ij
B̃2iÃ1
∂P ij
B̃2iB̃1j −
∂2Qjk
 Ã1
B̃1k −
∂2P jk
B̃1jB̃1k
. (42)
1 for p > 1 are recursively derived from the master equation (S1, S1) =
p=0{(S1, S1)}[p] =
0. It should be noticed that since a target space M has finite dimensions, S
1 is nonzero
for only a finite number of p. This action is a special case of a nonlinear gauge theory with
2-forms (a generalization of the Poisson sigma model) analyzed in the paper [39][40][41].
4.2 H 6= 0
Here we consider H 6= 0 case. A 2D topological field theory of twisted generalized complex
geometry is derived in a similar way in subsection 4.1 from H-terms in section 3.2:
SR = S0 +
1 ; (43)
−J ijÃ1
β̃1i + P
B̃1iβ̃1j
3Hijk +
 α̃0
 β̃0k
 Ã1
 α̃0
j − ∂P
 Ã1
B̃1k +
∂P jk
 B̃1jB̃1k
J ijα̃0
j + P ijβ̃0j
B̃2i, (44)
∂J ij
B̃2iÃ1
∂P ij
B̃2iB̃1j −
Hijk +
 Ã1
 Ã1
B̃1k −
∂2P jk
B̃1jB̃1k
, (45)
and S
1 for p > 1 are recursively derived from (SR, SR).
Also, from b-invariant H-terms in section 3.3,
SR = S0 +
1 ; (46)
−J ijÃ1
β̃1i + P
B̃1iβ̃1j
3J liHjkl +
 α̃0
−P klHjkl −
 β̃0k
 Ã1
P klHjkl +
 α̃0
j − ∂P
 Ã1
B̃1k +
∂P jk
 B̃1jB̃1k
J ijα̃0
j + P ijβ̃0j
B̃2i, (47)
∂J ij
B̃2iÃ1
∂P ij
B̃2iB̃1j −
JmiHjkm +
 Ã1
−P kmHjkm −
 Ã1
B̃1k −
∂2P jk
B̃1jB̃1k
, (48)
and S
1 for p > 1 are recursively derived from (SR, SR).
5 Two Special Reductions to Complex Geometry and
Symplectic Geometry
In this section, we consider two special reductions related to complex geometry and of sym-
plectic geometry.
5.1 Complex geometry
First we consider our model in complex geometry, which is the case that P = Q = H = 0 in
the action (40). We redefine superfields as
, φ̃−1
= λφ̃′
, α̃0
i = λα̃′
B̃1i = λB̃
, β̃0i = −β̃′0i,
B̃2i = λB̃
, β̃1i =
, (49)
where λ is a constant. After this redefinition, the action (40) is
SR = S0 +
1 , (50)
i − Ã′
+ B̃′
, (51)
J ijβ̃′1iÃ
− J ijα̃′0
 , (52)
λφ̃−1
∂J ij
∂2Jkj
, (53)
and S
1 has at least the higher order of λ than λ
p because φ̃−1
= λφ̃′
. We can take the
limit λ −→ 0 with preserving the complex structure. S [p]1 for p > 0 reduces to zero, and the
2D action is
SRJ =
β̃1idφ̃
− Ã1
dβ̃0i + J
jβ̃1iÃ1
∂J ik
β̃0iÃ1
. (54)
This action is nothing but the B model action (8) up to a total derivative and the all over
factor 1
, which depends on only J ij. The master equation (SbJ , SbJ) = 0 impose the condition
that J ij is a complex structure.
We make a comment about the difference between the action (50) with a finite λ and the
B model action (54) with λ → 0. Following the well-known method in [6], we can see that
the topological string theory has to be deformed by the other terms in (50) than in the B
model. In the calculation of [6], we may locally take the complex structure as a constant, and
the kinetic terms (51) and two terms in (52) without the derivatives of J ij are only different
parts from those in the B model. Here it should be noted that although these deformed parts
may seem to decouple to the B model part, the interactions between them can come from the
non-constant metric. These deformed parts couple to only the metric on the bosonic space
of φ̃
, which is independent of φ̃−1
, because there is no metric with fermionic indices on
the fermionic space of φ̃−1
. So these deformed parts can be seen as a topological theory
with only B field-like couplings on the fermionic space of φ̃−1
. Physically, we may assume
that there is no topological information along fermionic directions, although this situation
with no metric is special. Therefore in this assumption, we can see that our action (50) is
equivalent to topological string theory, called topological B model. As a future work, it would
be interesting to check this equivalence more carefully.
5.2 Symplectic geometry
Next we consider our model in symplectic geometry, which is the case that J = H = 0 in the
action (40). We redefine superfields as
, φ̃−1
= µφ̃′
= µÃ′
, α̃0
i = α̃′
B̃1i =
, β̃0i = −µβ̃′0i,
B̃2i = µB̃
, β̃1i =
, (55)
where µ is a constant. After this redefinition, the 2D action (40) reduces to
SR = S0 +
+ B̃′
− Ã′
P ijB̃′
∂P jk
 (57)
 α̃′
j − 1
∂P jk
+ P ijβ̃′
µφ̃−1
∂P ij
B̃2iB̃1j −
∂2Qjk
∂2P jk
,(58)
and S
1 is at least the higher order of λ than λ
p because φ̃−1
= µφ̃′
After taking the limit µ −→ 0 with preserving the symplectic structure, S [p]1 for p > 0
reduces to zero, and the 2D action is
SRP =
β̃1idφ̃
+ B̃1idα̃0
i + P ijB̃1iβ̃1j +
∂P jk
B̃1jB̃1k. (59)
The BV condition (SbP , SbP ) = 0 is satisfied if and only if P
ij is a Poisson structure (the
inverse of a symplectic structure) (7). It should be noticed that although this action (59)
depends on only a symplectic structure P ij, this action is a different realization of the Poisson
structure from the A model (6), because we can also check that this model is not equivalent
to topological string theory following the similar way as in [6].
6 Conclusions and Discussion
We have constructed a topological field theory with a generalized complex structure in three
dimensions and two dimensions using the AKSZ formulation. Our model reduces to B model
in a limit if the generalized complex structure is only a complex structure, although Zucchini
model reduces to A model in the limit that the generalized complex structure is only a
symplectic structure.
It would be interesting to check that the Zucchini model and our model are equivalent to
a topological string theory with a generalized complex structure [23][24], which is constructed
from the twisted N = (2, 2) supersymmetric sigma model with a non-trivial B field.
Appendix A. Generalized Complex Structure
In this appendix A, we summarize a generalized complex structure, based on description of
section 3 in [11] and section 2 in [1].
Let M be a manifold of even dimension d with a local coordinate {φi}. We consider
the vector bundle TM ⊕ T ∗M . We denote a section as X + ξ ∈ C∞(TM ⊕ T ∗M) where
X ∈ C∞(TM) and ξ ∈ C∞(T ∗M).
TM ⊕ T ∗M is equipped with a natural indefinite metric of signature (d, d) defined by
〈X + ξ, Y + η〉 = 1
(iXη + iY ξ), (60)
for X + ξ, Y + η ∈ C∞(TM ⊕ T ∗M), where iV is an interior product with a vector field V .
In the Cartesian coordinate (∂/∂φi, dφi), The metric is written as follows:
, (61)
We define a Courant bracket on TM ⊕ T ∗M as follows:
[X + ξ, Y + η] = [X, Y ] + LXη −LY ξ −
dM(iXη − iY ξ), (62)
with X + ξ, Y + η ∈ C∞(TM ⊕T ∗M), where LV denotes Lie derivation with respect a vector
field V and dM is the exterior differential of M . This bracket is antisymmetric but do not
satisfy the Jacobi identity. We may consider a so called Dorfman bracket as follows:
(X + ξ) ◦ (Y + η) = [X, Y ] + LXη − iY dξ, (63)
which satisfies the Jacobi identity but is not antisymmetric. Antisymmetrization of a Dorfman
bracket coincides with a Courant bracket.
A generalized almost complex structure J is a section of C∞(End(TM ⊕ T ∗M)), which is
an isometry of the metric 〈 , 〉, J ∗IJ = I, and satisfies
J 2 = −1. (64)
A b-transformation is an isometry defined by
exp(b)(X + ξ) = X + ξ + iXb, (65)
where b ∈ C∞(∧2T ∗M) is a 2–form. A Courant bracket is covariant under the b-transformation
[exp(b)(X + ξ), exp(b)(Y + η)] = exp(b)[X + ξ, Y + η], (66)
if the 2–form b is closed. The b-transform of J is defined by
Ĵ = exp(−b)J exp(b). (67)
J has the ±
−1 eigenbundles because J 2 = −1, In order to divide TM ⊕ T ∗M to each
eigenbundle, we need complexification of TM ⊕ T ∗M , (TM ⊕ T ∗M)⊗ C. The projectors on
the eigenbundles are defined by
−1J ). (68)
The generalized almost complex structure J is integrable if
Π∓[Π±(X + ξ),Π±(Y + η)] = 0, (69)
for any X + ξ, Y + η ∈ C∞(TM ⊕ T ∗M), where the bracket is the Courant bracket. Then J
is called a generalized complex structure. Integrability is equivalent to the single statement
N(X + ξ, Y + η) = 0, (70)
for all X + ξ, Y + η ∈ C∞(TM ⊕ T ∗M), where N is the generalized Nijenhuis tensor defined
N(X + ξ, Y + η) = [X + ξ, Y + η]− [J (X + ξ),J (Y + η)] + J [J (X + ξ), Y + η]
+J [X + ξ,J (Y + η)]. (71)
The b-transform Ĵ of a generalized complex structure J is a generalized complex structure
if the 2–form b is closed.
We decompose a generalized almost complex structure J in coordinate form as follows
, (72)
where J,K ∈ C∞(TM ⊗ T ∗M), P ∈ C∞(∧2TM), Q ∈ C∞(∧2T ∗M).
Then the conditions J ∗IJ = I, and J 2 = −1 derive
i = −J ij
J ikJ
j + P
ikQkj + δ
j = 0,
J ikP
kj + J jkP
ki = 0,
j +QjkJ
i = 0, (73)
where
P ij + P ji = 0,
Qij +Qji = 0. (74)
The integrability condition (69) is equivalent to the following condition
Aijk = Bijk = Cijk = Dijk = 0, (75)
where
Aijk = P il∂lP jk + P jl∂lP ki + P kl∂lP ij,
Bijk = J li∂lP jk + P jl(∂iJkl − ∂lJki) + P kl∂lJ j i − J j l∂iP lk,
Cijk = J li∂lJkj − J lj∂lJki − Jkl∂iJ lj + Jkl∂jJ li
+P kl(∂lQij + ∂iQjl + ∂jQli),
Dijk = J li(∂lQjk + ∂kQlj) + J lj(∂lQki + ∂iQlk)
+J lk(∂lQij + ∂jQli)−Qjl∂iJ lk −Qkl∂jJ li −Qil∂kJ lj . (76)
Here ∂i is a differentiation with respect to φ
i. The b–transform is
Ĵ ij = J
j − P ikbkj,
P̂ ij = P ij,
Q̂ij = Qij + bikJ
j − bjkJki + P klbkiblj . (77)
where bij + bji = 0.
The usual complex structures J is embedded in generalized complex structures as the
special form
0 −tJ
. (78)
Indeed, one can check this form satisfies conditions, (73) and (75) if and only if J is a
complex structure. Similarly, the usual symplectic structures Q is obtained as the special
form of generalized complex structures
0 −Q−1
. (79)
This satisfies (73) and (75) if and only if Q is a symplectic structure, i. e. it is closed. Other
exotic examples exist. There exists manifolds which cannot support any complex or symplectic
structure, but admit generalized complex structures.
The Courant bracket on TM ⊕ T ∗M can be modified by a closed 3–form. Let H ∈
C∞(∧3T ∗M) be a closed 3–form. We define the H twisted Courant brackets by
[X + ξ, Y + η]H = [X + ξ, Y + η] + iX iYH, (80)
where X + ξ, Y + η ∈ C∞(TM ⊕ T ∗M). Under the b-transform with b a closed 2–form,
[exp(b)(X + ξ), exp(b)(Y + η)] = exp(b)[X + ξ, Y + η], (81)
holds with the brackets [ , ] replaced by [ , ]H . For a non closed b, one has
[exp(b)(X + ξ), exp(b)(Y + η)]H−dM b = exp(b)[X + ξ, Y + η]H . (82)
So, the b-transformation shifts H by the exact 3–form dMb:
Ĥ = H − dMb. (83)
One can define an H twisted generalized Nijenhuis tensor NH as follows
N(X + ξ, Y + η) = [X + ξ, Y + η]H − [J (X + ξ),J (Y + η)]H + J [J (X + ξ), Y + η]H
+J [X + ξ,J (Y + η)]H , (84)
by using the brackets [ , ]H instead of [ , ]. A generalized almost complex structure J is H
integrable if
NH(X + ξ, Y + η) = 0, (85)
for all X + ξ, Y + η ∈ C∞(TM ⊕ T ∗M). Then we call J an twisted generalized complex
structure.
The H integrability conditions is as follows:
AHijk = BHijk = CHijk = DHijk = 0, (86)
where
AHijk = Aijk,
BHijk = Bijk + P jlP kmHilm
CHijk = Cijk − J liP kmHjlm + J ljP kmHilm,
DHijk = Dijk −Hijk + J liJmjHklm + J ljJmkHilm + J lkJmiHjlm. (87)
Appendix B. AKSZ Formulation of Batalin-Vilkovisky
Formalism
In the appendix B, we review the AKSZ formulation in any dimension [42]. In order to
construct and analyze topological field theories systematically, it is useful to use Batalin-
Vilkovisky formalism. The geometric structure of the AKSZ formulation is called Batalin-
Vilkovisky Structures.
B-1. Batalin-Vilkovisky Structures on Graded Vector Bundles
Let M be a smooth manifold in d dimensions. If we consider We define a supermanifold
ΠT ∗M . Mathematically, ΠT ∗M , whose bosonic part is M , is defined as a cotangent bundle
with reversed parity of the fiber. That is, a base manifold M has a Grassman even coordinate
and the fiber of ΠT ∗M has a Grassman odd coordinate. We introduce a grading called total
degrees, which is denoted |F | for a function F . The coordinates of the base manifold have
grade zero and the coordinates of the fiber have grade one. Similarly, we can define ΠTM for
a tangent bundle TM . ΠTM is also called a supermanifold.
We must consider more general assignments for the degree of the fibers of T ∗M or TM .
For an integer p, we define T ∗[p]M , which is called a graded cotangent bundle. T ∗[p]M is a
cotangent bundle, whose fiber has the degree p. This degree is also called the total degree.
A coordinate of the bass manifold have the total degree zero and a coordinate of the fiber
have the total degree p. If p is odd, the fiber is Grassman odd, and if p is even, the fiber is
Grassman even. We define a graded tangent bundle T [p]M in the same way.
We consider a vector bundle E. A graded vector bundle E[p] is defined in the similar way.
E[p] is a vector bundle whose fiber has a shifted degree by p. Note that only the degree of
fiber is shifted, and the degree of base space is not shifted.
We consider a Poisson manifold N with a Poisson bracket {∗, ∗}. If we shift the total
degree, we can construct a graded manifold (a graded cotangent bundle or a graded vector
bundle) Ñ from N . Then a Poisson structure {∗, ∗} shifts to a graded Poisson structure
by grading of Ñ . The graded Poisson bracket is called an antibracket and denoted by (∗, ∗).
(∗, ∗) is graded symmetric and satisfies the graded Leibniz rule and the graded Jacobi identity
with respect to grading of the manifold. The antibracket (∗, ∗) with the total degree −n + 1
satisfies the following identities:
(F,G) = −(−1)(|F |+1−n)(|G|+1−n)(G,F ),
(F,GH) = (F,G)H + (−1)(|F |+1−n)|G|G(F,H),
(FG,H) = F (G,H) + (−1)|G|(|H|+1−n)(F,H)G,
(−1)(|F |+1−n)(|H|+1−n)(F, (G,H)) + cyclic permutations = 0, (88)
where F,G and H are functions on Ñ , and |F |, |G| and |H| are total degrees of the functions,
respectively. The graded Poisson structure is also called P-structure. If n = 1, the antibracket
is equivalent to the Schouten bracket. For higher n, the antibracket is equivalent to the Loday
bracket [43] with the degree −n + 1.
Typical examples of Poisson manifold N are a cotangent bundle T ∗M and a vector bundle
E ⊕ E∗. First we consider a cotangent bundle T ∗M . Since T ∗M has a natural symplectic
structure, we can define a Poisson bracket induced from the symplectic structure. If we take
a local coordinate φi on M and a local coordinate Bi of the fiber, we can define a Poisson
bracket as follows:
{F,G} ≡ F
G, (89)
where F and G are functions on T ∗M , and
∂ /∂ϕ and
∂ /∂ϕ are the right and left differen-
tiations with respect to ϕ, respectively. Here we shift the degree of fiber by p, i.e. the space
T ∗[p]M . Then a Poisson structure shifts to a graded Poisson structure. The corresponding
graded Poisson bracket is called antibracket, (∗, ∗). Let φi be a local coordinate of M and
Bn−1,i a basis of the fiber of T
∗[p]M . The antibracket (∗, ∗) on a cotangent bundle T ∗[p]M
is expressed as:
(F,G) ≡ F
∂Bp,i
∂Bp,i
G. (90)
The total degree of the antibracket (∗, ∗) is −p. This antibracket satisfies the property (88)
for −p = −n + 1.
Next, we consider a vector bundle E ⊕ E∗. There is a natural Poisson structure on the
fiber of E ⊕ E∗ induced from a paring of E and E∗. If we take a local coordinate Aa on the
fiber of E and Ba on the fiber of E
∗, we can define
{F,G} ≡ F
G, (91)
where F and G are functions on E ⊕ E∗. We shift the degrees of fibers of E and E∗ like
E[p]⊕E∗[q], where p and q are positive integers. The Poisson structure changes to a graded
Poisson structure (∗, ∗). Let Apa be a basis of the fiber of E[p] and Bq,a a basis of the fiber
of E∗[q]. The antibracket is represented as
(F,G) ≡ F
∂Bq,a
G− (−1)pqF
∂Bq,a
G. (92)
The total degree of the antibracket (∗, ∗) is −p − q. This antibracket satisfies the property
(88) for −p− q = −n + 1.
We define a Q-structure. A Q-structure is a function S on a graded manifold Ñ which
satisfies the classical master equation (S, S) = 0. S is called a Batalin-Vilkovisky action. We
require that S satisfy the compatibility condition
S(F,G) = (SF ,G) + (−1)|F |+1(F, SG), (93)
where F andG are arbitrary functions. (S, F ) = δF generates an infinitesimal transformation,
which is a BRST transformation, which coincides with the gauge transformation of the theory.
The AKSZ formulation of the Batalin-Vilkovisky formalism is defined as a P-structure and
a Q-structure on a graded manifold.
B-2. Batalin-Vilkovisky Structures of Topological Sigma Models
In this subsection, we explain Batalin-Vilkovisky structures of topological sigma models. Let
X be a base manifold in n dimensions, with or without boundary, and M be a target manifold
in d dimensions. We denote φ a smooth map from X to M .
We consider a supermanifold ΠTX , whose bosonic part is X . ΠTX is defined as a tangent
bundle with reversed parity of the fiber. We take a local coordinate of ΠTX , (σµ, θµ), where σµ
is a coordinate on the base space and θµ is a super coordinate on the fiber and µ = 1, 2, · · · , n.
We extend a smooth function φ to a function on the supermanifold φ : ΠTX → M . φ is
called a superfield and an element of ΠT ∗X ⊗M . We introduce a new non-negative integer
grading on ΠT ∗X . A coordinate σµ on a base manifold has zero and a coordinate θµ on the
fiber has one. This grading is called the form degree. We denote degF the form degree of the
function F . The total degree defined in the previous section is a grading with respect to M ,
on the other hand The form degree is a grading with respect to X . We define a ghost number
ghF such that ghF = |F | − degF . W assign the ghost numbers of σµ and θµ zero. Thus σµ
has the total degree zero and θµ has total degree one.
We consider a P-structure on T ∗[p]M . We take p = n−1 to construct a Batalin-Vilkovisky
structure in a topological sigma model on a general n dimensional worldvolume. We consider
T ∗[n− 1]M for an n-dimensional base manifold X . Let a superfield φi be local a coordinate
of ΠT ∗X ⊗M , where i, j, k, · · · are indices of the local coordinate on M . Let a superfield
Bn−1,i be a basis of sections of ΠT
∗X ⊗φ∗(T ∗[n− 1]M). Expansions to component fields of
the superfields are the following:
i = φ(0)i + θµ1φ(−1)iµ1 +
θµ1θµ2φ(−2)iµ1µ2 + · · ·+
θµ1 · · · θµnφ(−n)iµ1···µn , (94)
Bn−1,i = B
(n−1)
n−1,i + · · ·+
(n− 1)!
θµ1 · · · θµn−1B(0)µ1···µn−1n−1,i +
θµ1 · · · θµnB(−1)µ1···µnn−1,i,
where (p) is the ghost number of the component field.
From (90) in the previous subsection, we define an antibracket (∗, ∗) on a cotangent bundle
T ∗[n− 1]M as
(F,G) ≡ F
∂Bn−1,i
∂Bn−1,i
G, (95)
where F and G are functions of φi and Bn−1,i. The total degree of the antibracket is −n+1.
If F and G are functionals of φi and Bn−1,i, we understand an antibracket is defined as
(F,G) ≡
∂Bn−1,i
∂Bn−1,i
G, (96)
where the integration
ΠTX means the integration on the supermanifold,
ΠTX d
nθdnσ. Through
this article, we always understand an antibracket on two functionals in a similar manner and
abbreviate this notation.
Next we consider a P-structure on E ⊕ E∗. In a topological sigma model in n dimension
worldvolume, we assign the total degree of p and q such that p+ q = n− 1. The total graded
bundle is E[p] ⊕ E∗[n − p − 1], where −n + 1 ≤ p ≤ n − 1, p 6= 0. Let Apap be a basis of
sections of ΠT ∗X ⊗φ∗(E[p]) and Bn−p−1,ap a basis of the fiber of ΠT ∗X ⊗φ∗(E∗[n− p− 1]).
Expansions to component fields of the superfields are
ap = A(p)app + θ
µ1A(p−1)apµ1p + · · ·++
(p− 1)!
θµ1 · · · θµ(p−1)A(0)apµ1···µ(p−1)p
+ · · ·+ 1
θµ1 · · · θµn , A(−n+p)apµ1···µnp (97)
Bn−p−1,ap = B
(n−p−1)
n−p−1,ap
+ θµ1B
(n−p−2)
µ1n−p−1,ap
+ · · ·+ 1
(n− p− 1)!
θµ1 · · · θµ(n−p−1)B(0)µ1···µ(n−p−1)n−p−1,ap
+ · · ·+ 1
θµ1 · · · θµnB(−p−1)µ1···µnn−p−1,ap,
From (92), we define the antibracket as
(F,G) ≡ F
∂Apap
∂Bn−p−1,ap
G− (−1)npF
∂Bn−p−1,ap
∂Apap
G. (98)
We need to consider various grading assignments for E ⊕ E∗, because each assignment
induces different Batalin-Vilkovisky structures. In order to consider all independent assign-
ments, we define the following bundle. Let Ep be series of vector bundles, where −n + 1 ≤
p ≤ n− 1. We consider a direct sum of each bundle Ep[p] :
p=−n+1,p 6=0
Ep[p], (99)
and we can define a P-structure on the graded vector bundle
T ∗[n− 1]M ⊕
p=−n+1,p 6=0
Ep[p]⊕ E∗p [n− p− 1]
 , (100)
which is isomorphic to the graded bundle
T ∗[n− 1]
p=−n+1,p 6=0
Ep[p]
 . (101)
as a sum of (95) and (98):
(F,G) ≡
p=−n+1
∂Apap
∂Bn−p−1 ap
G− (−1)npF
∂Bn−p−1 ap
∂Apap
G. (102)
where A0
a0 = φi, that is p = 0 component is the antibracket (95) on the graded cotangent
bundle T ∗[n− 1]M . Note that all terms of the antibracket have the total degree −n+1, and
we can confirm that the antibracket (102) satisfies the identity (88).
References
[1] R. Zucchini, “A sigma model field theoretic realization of Hitchin’s generalized complex
geometry,” JHEP 0411 (2004) 045 [arXiv:hep-th/0409181].
[2] R. Zucchini, “Generalized complex geometry, generalized branes and the Hitchin sigma
model,” JHEP 0503 (2005) 022 [arXiv:hep-th/0501062].
[3] S. J. Gates, C. M. Hull and M. Roček, “Twisted multiplets and new supersymmetric
nonlinear sigma models”, Nucl. Phys. B 248, 157 (1984).
[4] N. Hitchin, “Generalized Calabi Yau manifolds”, Q. J. Math. 54 no. 3 (2003) 281,
[arXiv:math.dg/0209099].
[5] M. Gualtieri, “Generalized complex geometry”, Oxford University doctoral thesis,
[arXiv:math.dg/0401221].
[6] M. Alexandrov, M. Kontsevich, A. Schwartz and O. Zaboronsky, “The Geometry of the
master equation and topological quantum field theory,” Int. J. Mod. Phys. A 12 (1997)
1405 [arXiv:hep-th/9502010].
[7] D. Roytenberg, “AKSZ-BV formalism and Courant algebroid-induced topological field
theories,” Lett. Math. Phys. 79, 143 (2007) [arXiv:hep-th/0608150].
[8] I. A. Batalin and G. A. Vilkovisky, “Gauge Algebra And Quantization,” Phys. Lett.B102
(1981) 27; “Quantization Of Gauge Theories With Linearly Dependent Generators,”
Phys. Rev. D28 (1983) 2567, [Erratum-ibid. D 30 (1984) 508].
[9] A. Kapustin, “Topological strings on noncommutative manifolds”, Int. J. Geom. Meth.
Mod. Phys. 1 nos. 1 & 2 (2004) 49, [arXiv:hep-th/0310057].
[10] U. Lindström, “Generalized N = (2,2) supersymmetric non-linear sigma models”, Phys.
Lett. B 587 (2004) 216, [arXiv:hep-th/0401100].
[11] U. Lindström, R. Minasian, A. Tomasiello and M. Zabzine, “Generalized complex
manifolds and supersymmetry”, Commun. Math. Phys. 257 (2005) 235 [arXiv:hep-
th/0405085].
[12] M. Zabzine, “Geometry of D-branes for general N=(2,2) sigma models”, Lett. Math.
Phys. 70 (2004) 211 [arXiv:hep-th/0405240].
[13] A. Kapustin and Y. Li, “Topological sigma-models with H-flux and twisted generalized
complex manifolds”, [arXiv:hep-th/0407249].
[14] S. Chiantese, F. Gmeiner and C. Jeschek, “Mirror symmetry for topological sigma models
with generalized Kähler geometry”, Int. J. Mod. Phys. A 21 (2006) 2377 [arXiv:hep-
th/0408169].
[15] L. Bergamin, “Generalized complex geometry and the Poisson sigma model,” Mod. Phys.
Lett. A 20 (2005) 985 [arXiv:hep-th/0409283].
[16] U. Lindström, M. Roček, R. von Unge and M. Zabzine, “Generalized Kähler geometry
and manifest N=(2,2) supersymmetric nonlinear sigma-models,” JHEP 0507 (2005) 067
[arXiv:hep-th/0411186].
[17] M. Zabzine, “Hamiltonian perspective on generalized complex structure,” Commun.
Math. Phys. 263 (2006) 711 [arXiv:hep-th/0502137].
[18] R. Zucchini, “A topological sigma model of biKähler geometry,” JHEP 0601 (2006) 041
[arXiv:hep-th/0511144].
[19] U. Lindstrom, M. Roček, R. von Unge and M. Zabzine, “Generalized Kähler mani-
folds and off-shell supersymmetry,” Commun. Math. Phys. 269 (2007) 833 [arXiv:hep-
th/0512164].
[20] A. Bredthauer, U. Lindstrom, J. Persson and M. Zabzine, “Generalized Kähler geome-
try from supersymmetric sigma models,” Lett. Math. Phys. 77 (2006) 291 [arXiv:hep-
th/0603130].
[21] V. Pestun, “Topological strings in generalized complex space,” [arXiv:hep-th/0603145].
[22] M. Zabzine, “Lectures on generalized complex geometry and supersymmetry,”
[arXiv:hep-th/0605148].
[23] W. y. Chuang, “Topological twisted sigma model with H-flux revisited,” [arXiv:hep-
th/0608119].
[24] R. Zucchini, “The biHermitian topological sigma model,” JHEP 0612 (2006) 039
[arXiv:hep-th/0608145].
[25] S. Guttenberg, “Brackets, sigma models and integrability of generalized complex struc-
tures,” JHEP 0706 (2007) 004 [arXiv:hep-th/0609015].
[26] N. Ikeda and T. Tokunaga, “Topological membranes with 3-form H flux on generalized
geometries,” [arXiv:hep-th/0609098].
[27] W. Merrell, L. A. P. Zayas and D. Vaman, “Gauged (2,2) sigma models and generalized
Kaehler geometry,” [arXiv:hep-th/0610116].
[28] A. Kapustin and A. Tomasiello, “The general (2,2) gauged sigma model with three-form
flux,” [arXiv:hep-th/0610210].
[29] R. Zucchini, “BiHermitian supersymmetric quantum mechanics,” Class. Quant. Grav.
24 (2007) 2073 [arXiv:hep-th/0611308].
[30] J. Persson, “T-duality and generalized complex geometry,” JHEP 0703 (2007) 025
[arXiv:hep-th/0612034].
[31] U. Lindstrom, M. Roček, R. von Unge and M. Zabzine, “Linearizing generalized Kähler
geometry,” JHEP 0704 (2007) 061 [arXiv:hep-th/0702126].
[32] U. Lindstrom, M. Roček, R. von Unge and M. Zabzine, “A potential for generalized
Kähler geometry,” [arXiv:hep-th/0703111].
[33] N. Ikeda, “Three dimensional topological field theory induced from generalized complex
structure,” [arXiv:hep-th/0412140].
[34] E. Witten, “Topological Sigma Models,” Commun. Math. Phys. 118 (1988) 411.
[35] N. Ikeda and K. I. Izawa, “General form of dilaton gravity and nonlinear gauge the-
ory,” Prog. Theor. Phys. 90 (1993) 237 [arXiv:hep-th/9304012]. For reviews, N. Ikeda,
“Two-dimensional gravity and nonlinear gauge theory,” Annals Phys. 235 (1994) 435
[arXiv:hep-th/9312059].
[36] P. Schaller and T. Strobl, “Poisson structure induced (topological) field theories,” Mod.
Phys. Lett. A 9 (1994) 3129 [arXiv:hep-th/9405110]. “Poisson sigma models: A general-
ization of 2-d gravity Yang-Mills systems,” [arXiv:hep-th/9411163].
[37] C. Hofman, “On the open-closed B-model,” JHEP 0311 (2003) 069 [arXiv:hep-
th/0204157].
[38] C. Klimcik and T. Strobl, “WZW-Poisson manifolds”, J. Geom. Phys. 43 (2002) 341,
[arXiv:math.sg/0104189].
[39] N. Ikeda and K. I. Izawa, “Dimensional reduction of nonlinear gauge theories,” JHEP
0409 (2004) 030 [arXiv:hep-th/0407243].
[40] I. Batalin and R. Marnelius, “Generalized Poisson sigma models,” Phys. Lett. B 512
(2001) 225 [arXiv:hep-th/0105190].
[41] I. Batalin and R. Marnelius, “Superfield algorithms for topological field theories,”
[arXiv:hep-th/0110140].
[42] N. Ikeda, “Deformation of Batalin-Vilkovisky Structures,” [arXiv:math.sg/0604157].
[43] J. -L. Loday, Une version non commutative des algebres de Lie: les algebres de Leibniz,
Enseign. Math. 39, 269 (1993).
ABSTRACT
  We propose a new topological field theory on generalized complex geometry in
two dimension using AKSZ formulation. Zucchini's model is $A$ model in the case
that the generalized complex structuredepends on only a symplectic structure.
Our new model is $B$ model in the case that the generalized complex structure
depends on only a complex structure.

<|endoftext|><|startoftext|>
Introduction
After many years of theoretical investigations, the nature
of the ground-state of the spin- 1
Heisenberg model on the
kagome lattice is still not known. Although all numerical
studies have concluded to the absence of long-range mag-
netic (Néel) order [1,2,3,4,5,6,7,8], many basic questions
such as the existence of spontaneously broken symmetries,
or the existence of a finite gap to magnetic excitations re-
main open. In fact, many different states of matter have
been proposed for the kagome Heisenberg antiferromag-
net: Z2 gapped topological liquids [9,10], valence-bond
crystals [11,12,13,14], critical spin liquids with gapless
spinons [15,12].
Recently, a promising spin- 1
antiferromagnetic insula-
tor with an ideal kagome geometry, ZnCu3(OH)6Cl2, has
been synthesized and studied for its magnetic properties
[16,17,18,19,20]. Because these studies did not detect any
kind ordering (nor spin freezing) down to 50mK, it could
represent one of the first and most remarkable realization
of a 2D quantum spin liquid [21,22].
To extract some information about the low-energy physics
of the kagome Heisenberg model from the experiments on
ZnCu3(OH)6Cl2, it is important to first analyze in a quan-
titative way the possible role of magnetic defects (and
other “perturbations”to this model) in this compound.
In this paper we compare the experimental data for the
magnetic susceptibility χ and specific heat cv obtained by
Correspondence to: gregoire.misguich@cea.fr
Helton et al.[16] with calculations for the spin- 1
Heisen-
berg model on the kagome lattice based on exact diago-
nalization (ED) data (partial spectrum of a 36-site cluster
and full spectrum for 24-site and 18-sites clusters) and
high-temperature series expansion [23,24]. Down to tem-
perature kBT/J = 0.2, the experimental susceptibility
χexp(T ) can be very well fitted by that of the kagome lat-
tice Heisenberg model with J ' 190 K plus a contribution
of about 4% of impurity spins with weak mutual inter-
actions (modeled by a ferromagnetic Curie-Weiss temper-
ature of ' −6 K), likely mostly due to antisite disorder
(Cu substituted to Zn on sites between the kagome planes)
[20,25]. The low temperature specific heat is dominated
by impurities (and other perturbations) below 2 K and
by phonons above 15 K. In the intermediate range, the
calculated specific heat appears to be larger than in the
experiment. We comment on this feature at the end of the
paper.
2 Uniform static susceptibility
The spin- 1
Heisenberg model reads:
H = J
〈i,j〉
Si · Sj − gµBH
Szi (1)
where the sum runs over pairs of nearest neighbor sites on
the kagome lattice and H is an external magnetic field.
2 G. Misguich and P. Sindzingre: Magnetic susceptibility and specific heat of the spin- 1
Heisenberg model ...
To fix the notations, we define the (zero-field) uniform
susceptibility per site χ(T ) as χ(T ) = gµB
〈Szi 〉
where N is the total number of spins.
The high-temperature expansion of χ has been com-
puted up to order O
(included) by Elstner and
Young [23]:
χ(T ) =
χth(t = kBT/J)
χth(t) = (1/4)t
−1 − (1/4)t−2 + · · · (2)
where C0 = 0.25(gµB)2 is the Curie constant and t the re-
duced temperature. The truncated series χn(t) =
i=0 cit
at order n = 14 and n = 15 agree with a relative error
smaller than 10−2 for t > 1. Down to this temperature,
they already provide good approximations to χth(t). The
convergence of high-T series can be improved using Padé
approximants (PA). In the present case the PA provide a
reliable estimate of χth(t) at least down to t ∼ 0.5.1 One
representative PA (numerator of degree 8 and denomina-
tor of degree 7) is displayed Fig. 1.
On a small enough system, it is possible to obtain
by ED the full spectrum (2N energy levels for N spins).
We have done so for two 18-site and 24-site kagome clus-
ters (with periodic boundary conditions). Then thermo-
dynamic quantities can be computed exactly as a func-
tion of T . For bigger systems, where one can still compute
some eigenstates by ED, one may use the approximate
method described in reference [26] to compute thermody-
namic quantities. footnote In this method, one constructs
the density of states in each symmetry sector from the
exact low and high energy states obtained from ED and
approximating in between the unknown part of the spec-
trum with a smooth density of states. This smooth part
is constructed so that the first moments of the density of
states (Tr[Hn]), in each symmetry sector of the finite clus-
ter, are exact up to n = 5. This Ansatz guaranties that the
thermodynamics becomes exact at low T (when thermal
excitations only involve the eigenstates computed exactly)
as well as at high T (when a re-summed high-temperature
expansion up to T−5 is valid).
The results for the susceptibility χ(t) are shown in
Fig. 1. Above t = 0.2, the relative difference between the
(exact) N = 18 and N = 24 curves is smaller than 0.5%.
We therefore make the (rather safe) assumption that our
finite-size results are good approximations to the thermo-
dynamic limit down to tmin ' 0.2. This represents a small
gain over the coupled-cluster expansion of reference [27],
which is valid above t ∼ 0.3.
The χ(t) obtained with the approximate method for
N = 36 sites also agrees (with a relative error smaller than
2%) with the N = 24 results down to t ' 0.2. Slightly be-
low, the 36-site susceptibility is still increasing and might
be a better approximation to the infinite-size limit than
the 24-site curve. Still, it is not possible decide at which t
finite-size (and/or errors due to the approximation in the
1 Comparing the PA and the exact curves for 18 and 24-site
clusters shows that the PA is in fact correct down to T ∼ 0.4J ,
see Fig. 1.
density of states) will become too important. Safely, we
only use the theoretical results (noted χth) above t ' 0.2
to fit the experimental data χexp for the susceptibility.
We fit χexp to a sum of contributions from the kagome
spins χs and the impurities χimp in the following way:
χexp(T ) =
χs(kBT/J) + χimp(T ) (3)
with χimp(T ) =
T + θimp
where J is the (unknown) magnetic exchange in between
the spins in the kagome planes, x an impurity concentra-
tion, C0 = 0.25(gµB)2/kB the Curie constant and θimp
the Curie-Weiss temperature of the system of impurities.
Eq. 4 is the leading term in a high-temperature expan-
sion for the system of impurities, which provides a sim-
plified picture of their interactions. To be applicable, T
should therefore be large compared to θimp. We assume
that the system of impurities does not perturb the the
kagome spins.
We optimized numerically the parameters so that χs
fits the theoretical results χth for t ≥ 0.2. As can be seen
on Fig. 1, an almost perfect agreement can be obtained be-
tween χs = J4C0 (χexp(T )−χimp(T )) (squares) and the the-
oretical estimates for the kagome susceptibility χth with
the following parameters: C0 = 0.504 K cm3/(mol of Cu)
(equivalent to a gyromagnetic factor g = 2.32), J = 190.4 K,
x = 0.03655 and θimp = −6.1 K. We note that the value
of J is in rough agreement with the values reported in
reference [16] (17meV'200 K) and [27] (170 K). We also
check a posteriori that the lowest temperature of the fit
(t = 0.2 ' 38 K) is much bigger than |θimp|, so that a
Curie-Weiss approximation for the impurities is justified.
6 K is also approximatively the transition temperature re-
ported in ZnxCu4−x(OH)6Cl3 for x < 0.6 (replacing some
Zn by Cu between the kagome planes) [17], so this energy
scale may correspond to some couplings for spins located
between the planes, where magnetic impurities could sit.
We eventually notice that down to t = 0.15 (that is below
the lowest temperature used for the fit), χexp − χimp con-
tinues to increase and to follow the N = 36 curve. This
suggests that the maximum of the kagome susceptibility
could indeed be below t = 0.15.
Rigol and Singh [27] analyzed the same experimen-
tal data with another high-temperature method (coupled
cluster expansion) and obtained a somewhat different con-
clusion. They argued that Dzyaloshinskii-Moriya (DM) in-
teractions provide a better description of the low-temperature
increase of the susceptibility than impurities. Although
we agree that DM interactions are certainly present in
ZnCu3(OH)6Cl2 and that they should affect the physics
of the system (at least at low-temperatures), it is also
clear that a few percent of impurities must be present too
and should have a visible effect on the susceptibility, even
at rather high temperatures. According to reference [27],
free impurities cannot explain the sharp increase of χexp
below 60 K. In our analysis, this issue is solved by al-
lowing for a small ferromagnetic Curie-Weiss temperature
G. Misguich and P. Sindzingre: Magnetic susceptibility and specific heat of the spin- 1
Heisenberg model ... 3
Fig. 1. (color online) Magnetic susceptibility per spin as a
function of temperature. Dashed (magenta) curve : experimen-
tal data (Helton et al.), multiplied by J/(4C0) as a function
of kBT/J with J ' 190 K and C0 = 0.504 K cm3/(mol of
Cu). Black squares: χs =
(χexp(T ) − χimp(T )), obtained
from the experimental data χexp by subtracting the contribu-
tion χimp of a concentration x = 0.03655 of impurity spins
with a Curie-Weiss temperature θimp = −6.1 K. Red (resp.
cyan) curve : Exact χth for a N = 24 (resp. 18) site kagome
cluster with periodic boundary conditions. Blue curve: Results
for N = 36 spins obtained with the (approximate) method of
reference [26]. Green curve: Padé approximant from the high-
temperature expansion at order t−15. The Padé approximant
is not converged below t ' 0.4 whereas the finite-size curves
are practically converged to the thermodynamic limit down to
t ' 0.2. Below t = 0.2J , the later curves are only indicative.
θimp ' −6 K for the impurities.2 One can indeed see that,
once χimp has been subtracted, the experimental results
show a saturation of χ around 20 K, in rough agreement
with the measurements of reference [19]. The location of
the maximum we obtain is however quite sensitive to the
value x of the impurity concentration.
3 Specific heat
The experimental data for the specific heat cv(T ) are only
available at very low temperature where the size effects
on the ED results are large. We therefore also applied
the high-temperature entropy method [28]. It combines
three pieces of information about the system: 1) The high
2 The Curie-Weiss temperature is an average of the differ-
ent exchange constants. However, due to the complexity of the
interactions between impurities (disorder), the ferromagnetic
sign of θimp does not necessarily imply that they behave ferro-
magnetically at low temperatures.
temperature series expansion of cv(T ), up to T−17 [23,
24]. 2) The ground-state energy per site e0 of the Hamil-
tonian. Here we use the following estimate e0 = 2〈0|Si ·
Sj |0〉 = −0.44 [24]. 3) The exponent α describing the low-
temperature behavior of the specific heat: cv(t→ 0) ∼ tα.
The method then provides a set of cv(t) curves (for dif-
ferent Padé approximants) which all satisfy exactly the
following properties: i) cv(T → 0) ∼ Tα, ii) cv(T →∞) ∼
the series expansion, iii)
cv(T )dT = −Ne0, and iv)∫∞
cv(T )/TdT = NkB ln(2). When the value of e0 and α
are both correct, one usually gets a large number of very
similar curves but if either is incorrect, only a few and scat-
tered curves will be obtained (more details in Refs. [28,
24]). In the case of the kagome antiferromagnet, the value
of α is not known and the entropy method gives a reason-
able convergence for α = 1 and α = 2 [24]. Motivated by
the experimental observation [16] of a low-temperature cv
with an exponent smaller than one, we also include here
a calculation with α = 0.5.
The results for some valid3 PA from orders 14 to 17 are
displayed in Fig. 2 together with the exact result obtained
by ED of the full spectrum of a 24-site kagome cluster. For
t > 0.3, these results are practically exact.
In the two scenarios α = 1 and 2 there is a significant
dispersion of the results for t < 0.3 from one PA to another
[24].4 In both case, cv show a low-t peak or shoulder as
found from ED of finite-size systems. The choice α = 0.5
leads to a smoother cv(t) and improves significantly the
convergence. It is however not clear if this improved con-
vergence for small values of α (0.5 . α . 1) indicates that
α is actually smaller than one or an artifact of the present
entropy method, which might be “slower” to stabilize a
low-t peak (as with α = 2, see Fig. 2), than a smooth
curve (as with α = 0.5). In any case, this is clearly re-
lated to the unusually large low-temperature entropy of
the kagome system.
Fig. 2 also displays the experimental results (black
squares) obtained by Helton et al. [16]. The only parame-
ter here is the exchange constant, taken to be J ' 190 K
from the fits of the susceptibility data. Above 15 K, the
phonon contribution is dominant and we have to focus on
the lower temperatures to analyze the magnetic contribu-
tion. Below 15 K the order of magnitude agrees with our
calculation but there is no quantitative agreement. Several
“perturbations” such as weakly interacting magnetic im-
purities or magnetic anisotropies should indeed contribute
to the specific heat at such low temperatures.
Results for the integrated entropy S(T ) =
cv(x)/xdx
are plotted in Fig. 3, where one sees that the choice of the
3 By the entropy method, the entropy s(e) is obtained as the
power α/(α + 1) of a rational fraction (PA) of the energy per
site e. The specific heat curve is then obtained parametrically
through T (e) = 1/s′(e) and cv(e) = −s′(e)2/s′′(e). Only the
PA which satisfy s(e) > 0, s′(e) > 0 and s′′(e) < 0 in the range
]e0, 0[ are physically “valid”.
4 This is due to the finite order in the high-temperature
expansion. Still, for a given value of α, we believe that this
method gives a qualitatively correct picture for cv(T ), even at
low T.
4 G. Misguich and P. Sindzingre: Magnetic susceptibility and specific heat of the spin- 1
Heisenberg model ...
Fig. 2. (color online) Specific heat cv vs temperature T .
Black squares: experimental data from Helton et al. [16], as-
suming J ' 190 K. Red triangles: exact specific heat of a 24-
site kagome cluster. Green (full), blue (dashed) and magenta
(dotted) curves: cv calculated by the entropy method for three
different values of the low-temperature exponent (α = 2, 1 and
0.5). All valid PA from order 14 to 17 with numerator and de-
nominator of degrees ≥ 3 are shown. For a given α and at each
temperature, their dispersion provides a rough estimate of the
error bars.
Fig. 3. (color online) Entropy S vs temperature. Square, tri-
angles and circles: experimental data from Helton et al. [16]
with magnetic field B= 0, 5 and 14 Teslas, plotted as a func-
tion of kBT/J with J = 190 K. Green (full), blue (dashed)
and magenta (dotted) curves: entropy calculated by the en-
tropy method for α = 2, α = 1 and α = 0.5 (same PA as in
Fig. 2).
low-T exponent α of cv has practically no influence on the
theoretical S(t = kBT/J) above t ' 0.06 and that, around
this temperature, the experimental value is significantly
lower than in our calculations. Of course, subtracting the
contribution of the phonons (hard to estimate quantita-
tively) and from the impurities would make the discrep-
ancy even larger. Concerning the impurities, one sees in
Fig. 3 that an applied magnetic field of 5 and 14 Teslas
is enough to significantly reduce S(t) for t . 0.06. Such
fields are low in comparison to J but of the order of the
estimated coupling between the impurities. The difference
between the curves at 0 and 5 (or 14) Teslas may thus
provide a rough estimate of the contribution of the impu-
rities. Note that S(t) at 5 Teslas become closer to the S(t)
computed by the entropy method with α = 2. The experi-
mental value α < 1 could be due to the impurities and the
actual value of α for the kagome spins could be larger than
1. In any case, at t = 0.06(' 11 K), the experimental en-
tropy (' 0.2 ln 2) is thus at least 7% of ln 2 below the the-
oretical estimates (0.27 ln 2). This seems a rather robust
indication that some additional interactions play some role
in this energy range, by freezing some degrees of freedom
of the spins in the kagome planes and pushing the corre-
sponding entropy to higher energies. We may mention in
particular DM interactions [27], non-magnetic impurities
in the kagome planes (“dilution”) [20] and interactions
between impurities and the kagome spins. We conclude
that 15∼20 K is a minimal temperature for a kagome lat-
tice Heisenberg model description of ZnCu3(OH)6Cl2 to
be valid.
Acknowledgments
We are grateful to C. Lhuillier for many discussions and
comments about the manuscript. We also thank P. Mendels
and F. Bert for useful discussions. GM also thanks Y. S. Lee
and J. Helton for discussions and for providing their data
as well as P.A. Lee, Y. Ran, T. Senthil, X.-G. Wen for
discussions on related topics.
Note added : After the first submission of this manuscript,
two preprints [20,25] (neutron scattering) confirmed the
importance of magnetic impurities (from 6% to 10%) in
ZnCu3(OH)6Cl2. The smaller value found here could be
due to our simplified model to fit the magnetic suscepti-
bility.
References
1. V. Elser, Phys. Rev. Lett. 62, 2405 (1989)
2. C. Zeng and V. Elser, Phys. Rev. B 42, 8436 (1990)
3. J. T. Chalker and J. F. G. Eastmond, Phys. Rev. B 46,
14201 (1992)
4. R. R. P. Singh and D. A. Huse, Phys. Rev. Lett. 68 1766,
(1992)
5. P. W. Leung and V. Elser, Phys. Rev. B 47, 5459 (1993)
6. T. Nakamura and S. Miyashita, Phys. Rev. B 52, 9174
(1995)
http://link.aps.org/abstract/PRL/v62/p2405
http://link.aps.org/abstract/PRB/v42/p8436
http://link.aps.org/abstract/PRB/v46/p14201
http://link.aps.org/abstract/PRB/v46/p14201
http://link.aps.org/abstract/PRL/v68/p1766
http://link.aps.org/abstract/PRL/v68/p1766
http://link.aps.org/abstract/PRB/v47/p5459
http://link.aps.org/abstract/PRB/v52/p9174
http://link.aps.org/abstract/PRB/v52/p9174
G. Misguich and P. Sindzingre: Magnetic susceptibility and specific heat of the spin- 1
Heisenberg model ... 5
7. P. Lecheminant, B. Bernu, C. Lhuillier, L. Pierre and
P. Sindzingre Phys. Rev. B 56, 2521 (1997)
8. C. Waldtmann, H. U. Everts, B. Bernu, C. Lhuillier,
P. Sindzingre, P. Lecheminant and L. Pierre, Eur. Phys.
J. B 2, 501 (1998)
9. S. Sachdev, Phys. Rev. B 45, 12 377 (1992)
10. Fa Wang and A. Vishwanath, Phys. Rev. B 74, 174423
(2006)
11. J.B. Marston and C. Zeng, J. Appl. Phys. 69, 5962 (1991)
12. M. B. Hastings, Phys. Rev. B 63, 014413 (2000)
13. P. Nikolic and T. Senthil, Phys. Rev. B 68, 214415 (2003)
14. R. Budnik and A. Auerbach, Phys. Rev. Lett. 93, 187205
(2004)
15. Y. Ran, M. Hermele, P. A. Lee and X.-G. Wen, Phys. Rev.
Lett. 98, 117205 (2007)
16. J. S. Helton, K. Matan, M. P. Shores, E. A. Nytko,
B. M. Bartlett, Y. Yoshida, Y. Takano, A. Suslov, Y. Qiu,
J.-H. Chung, D. G. Nocera, Y. S. Lee, Phys. Rev. Lett.
98, 107204 (2007)
17. P. Mendels, F. Bert, M. A. de Vries, A. Olariu, A. Harrison,
F. Duc, J. C. Trombe, J. S. Lord, A. Amato, and C. Baines,
Phys. Rev. Lett. 98, 077204 (2007)
18. O. Ofer, A. Keren, E. A. Nytko, M. P. Shores,
B. M. Bartlett, D. G. Nocera, C. Baines, A. Amato, cond-
mat/0610540
19. T. Imai, E. A. Nytko, B. M. Bartlett, M. P. Shores,
D. G. Nocera, cond-mat/0703141
20. M. A. de Vries, K. V. Kamenev, W. A. Kockelmann,
J. Sanchez-Beniteza and A. Harrison, arXiv:0705.0654
21. P. W. Anderson, Mater. Res. Bull. 8, 153 (1973)
22. G. Misguich and C. Lhuillier, in Frustrated Spin Systems,
edited by H. T. Diep (World Scientific, Singapore, 2005)
[cond-mat/0310405]
23. N. Elstner and A. P. Young, Phys. Rev. B 50, 6871 (1994)
24. G. Misguich and B. Bernu, Phys. Rev. B 71, 014417 (2005)
25. S.-H. Lee, H. Kikuchi, Y. Qiu, B. Lake, Q. Huang, K.
Habicht, K. Kiefer, arXiv:0705.2279
26. P. Sindzingre, G. Misguich, C. Lhuillier, B. Bernu,
L. Pierre, C. Waldtmann and H. U. Everts, Phys. Rev.
Lett. 84, 2953 (2000)
27. M. Rigol and R. R. P. Singh, Phys. Rev. Lett. 98, 207204
(2007)
28. B. Bernu and G. Misguich, Phys. Rev. B 63, 134409 (2001)
http://link.aps.org/abstract/PRB/v56/p2521
http://publish.edpsciences.com/abstract/EPJB/v2/p501
http://publish.edpsciences.com/abstract/EPJB/v2/p501
http://link.aps.org/abstract/PRB/v45/p12377
http://link.aps.org/abstract/PRB/v74/e174423
http://link.aps.org/abstract/PRB/v74/e174423
http://dx.doi.org/10.1063/1.347830
http://link.aps.org/abstract/PRB/v63/e014413
http://link.aps.org/abstract/PRB/v68/e214415
http://link.aps.org/abstract/PRL/v93/e187205
http://link.aps.org/abstract/PRL/v93/e187205
http://link.aps.org/abstract/PRL/v98/e117205
http://link.aps.org/abstract/PRL/v98/e117205
http://link.aps.org/abstract/PRL/v98/e107204
http://link.aps.org/abstract/PRL/v98/e107204
http://link.aps.org/abstract/PRL/v98/e077204
http://arxiv.org/abs/cond-mat/0610540
http://arxiv.org/abs/cond-mat/0610540
http://arxiv.org/abs/cond-mat/0703141
http://arxiv.org/abs/0705.0654
http://dx.doi.org/10.1016/0025-5408(73)90167-0
http://arxiv.org/abs/cond-mat/0310405
http://link.aps.org/abstract/PRB/v50/p6871
http://link.aps.org/abstract/PRB/v71/e014417
http://arxiv.org/abs/0705.2279
http://link.aps.org/abstract/PRL/v84/e2953
http://link.aps.org/abstract/PRL/v84/e2953
http://link.aps.org/abstract/PRL/v98/e207204
http://link.aps.org/abstract/PRL/v98/e207204
http://link.aps.org/abstract/PRB/v63/e134409
	Introduction
	Uniform static susceptibility
	Specific heat
ABSTRACT
  We compute the magnetic susceptibility and specific heat of the spin-1/2
Heisenberg model on the kagome lattice with high-temperature expansions and
exact diagonalizations. We compare the results with the experimental data on
ZnCu3(OH)6Cl2 obtained by Helton et al. [Phys. Rev. Lett. 98, 107204 (2007)].
Down to k_BT/J~0.2, our calculations reproduce accurately the experimental
susceptibility, with an exchange interaction J~190K and a contribution of 3.7%
of weakly interacting impurity spins. The comparison between our calculations
of the specific heat and the experiments indicate that the low-temperature
entropy (below ~20K) is smaller in ZnCu3(OH)6Cl2 than in the kagome Heisenberg
model, a likely signature of other interactions in the system.

<|endoftext|><|startoftext|>
C:/Documents and Settings/yohno/デスクトップ/岩崎論文/revtex4/manuscript070408.dvi
Environmental dielectric screening e�ect on exciton transition
energies in single�walled carbon nanotubes
Yutaka Ohno��� � Shinya Iwasaki�� Yoichi Murakami�� Shigeru
Kishimoto�� Shigeo Maruyama�� and Takashi Mizutani�� y
�Department of Quantum Engineering� Nagoya University�
Furo�cho� Chikusa�ku� Nagoya ��������� Japan
�Department of Mechanical Engineering�
The University of Tokyo� 	���
 Hongo�
Bunkyo�ku� Tokyo 
�������� Japan
�Dated� April �� �����
Abstract
Environmental dielectric screening e	ects on exciton transition energies in single
walled carbon
nanotubes �SWNTs� have been studied quantitatively in the range of dielectric constants from
��� to 
� by immersing SWNTs bridged over trenches in various organic solvents by means of
photoluminescence and the excitation spectroscopies� With increasing environmental dielectric
constant ��env�� both E�� and E�� exhibited a redshift by several tens meV and a tendency to
saturate at a �env � � without an indication of signi�cant �n�m� dependence� The redshifts can be
explained by dielectric screening of the repulsive electron
electron interaction� The �env dependence
of E�� and E�� can be expressed by a simple empirical equation with a power law in �env� Eii �
ii � A�
env� We also immersed a sample in sodium
dodecyl
sulfate �SDS� solution to investigate
the e	ects of wrapping SWNTs with surfactant� The resultant E�� and E��� which agree well with
Weisman�s data �Nano Lett� �� ��
� ����
��� are close to those of �env of �� However� in addition
to the shift due to dielectric screening� another shift was observed so that the ��n � m�
family
patterns spread more widely� similar to that of the uniaxial
stress
induced shift�
I� INTRODUCTION
Optical spectroscopy of single�walled carbon nanotubes �SWNTs� has received increasing
attention� not only for assigning the chiral vector� �n�m�� of the SWNTs��� but also to study
the physics of the one�dimensional excitons���� In paticular� photoluminescence �PL� and the
excitation spectroscopies are the most widely used for characterizing SWNTs with resonant
Raman scattering spectroscopy� Two�photon absorption spectroscopy� and Rayleigh scat�
tering spectroscopy combined with TEM or SEM observation	 are also powerful techniques
to investigate excitonic states of SWNTs and the bundle e�ect�
Recently� the environmental e�ect is one of the topics most investigated for its e�ect on
the optical properties of SWNTs�
��� It is known that the optical transition energies vary�
depending on the kind of surrounding surfactant used to individualize SWNTs�
��� Lefebvre
et al� have reported the optical transition energies of SWNTs bridging between two micro�
pillars fabricated on a Si wafer show a blueshift as compared to the SDS�wrapped SWNTs���
We have also compared air�suspended SWNTs grown on a quartz substrate with a grating
structure to SDS�wrapped SWNTs� and have shown that the energy di�erences between
air�suspended and SDS�wrapped SWNTs depend on �n�m�� especially on chiral angle and
on type of SWNTs� type�I �	n 
 m mod � � �� or type�II �	n 
 m mod � �	���� These
energy variations due to environmental conditions have been thought to be produced by
the di�erence in the dielectric constant of the surrounding materials� The energy of the
many�body Coulomb interactions between carriers depends on the environmental dielectric
constant ��env�� because the electric force lines contributing to the Coulomb interactions pass
through the matrix� as well as the inside of the SWNT�����	 Note that the environmental
e�ect is caused not only by the dielectric screening of Coulomb interactions� but also by
chemical����� and mechanical conditions��� Finnie et al� have reported that gas adsorption
a�ects the emission energy of SWNTs���
Investigations into the environmental e�ect have not been comprehensive� despite its
importance in understanding the optical properties of SWNTs� Quantitative and separate
investigations are necessary to understand the contributions of various environmental factors�
In this study� we have focused on the dielectric screening e�ect� which has been quantitatively
investigated by immersing the SWNTs grown over trenches into various liquids with di�erent
�env� from ��
 to ��� by means of photoluminescence and the excitation spectroscopies�
II� EXPERIMENTAL
For the study on environmental e�ects� the SWNTs grown between two micro�pillars����
or over a trench����� are suitable because the environmental conditions can be controlled
intentionally� The samples used in this work are SWNTs grown on a quartz substrate with
periodic trenches� as shown in Fig� �� Both the period and depth of the trenches are 	
�m� The trenches were formed by photolithography� Al�metal evaporation and lift�o�� and
subsequent reactive�ion etching of the quartz with the Al mask� The SWNTs were grown by
alcohol catalytic chemical vapor deposition�� at ����C for �� s� after spin coating of a water
solution of Co acetate� By optimizing the growth condition� isolated bridging SWNTs can
be formed over the trenches� The density of the SWNTs was ���	 �m�� along the direction
of the trench� Such low�density� isolated SWNTs were necessary to observe luminescence
when the sample was immersed in liquids� On the other hand� in case of a sample with a
relatively high�density of SWNTs as �	 �m���� PL intensity was degraded by immersing in
liquids� even though strong PL was obtained in air� This is probably because the SWNTs
form bundles with neighbors upon immersion in liquids� in the case of a sample with a
high�density of SWNTs�
The sample was mounted in a vessel with a quartz window� and immersed in various
organic solvents with �env from ��
 to �� �see Table I�� We also immersed a sample in
��wt� D�O solution of sodium�dodecyl�sulfate �SDS� to investigate the e�ect of wrapping
with surfactant molecules� PL and the excitation spectra were measured using a home�
made facility consisting of a tunable� continuous�wave Ti�Sapphire laser �������� nm�� a
monochromator with a focal length of 	� cm� and a liquid�N��cooled InGaAs photomultiplier
tube ���������� nm�� The excitation wavelength was monitored by a laser wavelength
meter� The diameter of the laser spot on the sample surface was �� mm� so that an
ensemble of many SWNTs was detected�
III� RESULTS AND DISCUSSION
Figure 	 shows PL maps of SWNTs �a���g� in various organic solvents with di�erent
�env and �h� in SDS solution� By immersing the sample in the organic solvents� the emission
and excitation peaks� which respectively correspond to E�� and E��� showed redshifts and
spectral broadening� In the PL map measured in SDS solution��� the peak positions agree
well with those of Weisman�s empirical Kataura plot for SDS�wrapped SWNTs represented
by crosses��
The E���E�� plots in air ��env � ����� hexane ���
�� chloroform ������ and SDS solution
are shown in Fig� � � Both of E�� and E�� showed redshifts with increasing �env� without
signi�cant �n�m� dependence� The amounts of the redshifts are ����
 meV for E�� and
	���� meV for E��� The redshift with increasing �env is consistent with the theoretical
work by Ando��� The optical transition energy in SWNTs is given by a summation of the
bandgap and the exciton binding energies� When the �env increases� the Coulomb interaction
is enhanced and the exciton binding energy decreases� This leads to a blueshift of the optical
transition energy in the SWNT� It should be noted that the bandgap is renormalized by the
electron�electron repulsion interaction� and consequently the change in the �env a�ects not
only the exciton binding energy� but also the bandgap� According to theoretical work������
the repulsive electron�electron energy is larger in magnitude than the exciton binding energy�
Therefore� if we consider that both energies show a similar dependence on �env� the decrease
in the electron�electron repulsion energy exceeds the decrease in the exciton binding energy
when �env increases� This results in a redshift in optical transition energies� The redshifts
observed in the present experiment are attributed to the decrease in the repulsion energy
of the electron�electron interaction with an increase in the �env� Note that even though
the redshift with increasing �env is consistent with the theoretical studies� the amount of
the redshift is much smaller than the calculations� This is probably because in the present
experiments� the �env only outside of the SWNTs was varied� whereas the theoretical studies
used a dielectric constant for the whole system�
E�� and E�� are plotted as a function of �env in Fig� �� The energy shifts show a tendency
to saturate at �env � �� The energy variations can be �tted to an empirical expression with
a power law in �
Eii � E
env ���
where E�ii corresponds to a transition energy when �env is in�nity� A corresponds to the
maximum value of the energy change by �
� and � is a �tting parameter� A and � are ��
meV and ��
 for E��� and 	
 meV and ��� for E��� respectively� on average� Perebeinos et al�
have reported power law scaling for excitonic binding energy by dielectric constant� where
the scaling factor � is estimated to be � ��� in the range of � � ���	 Although the power law
expression of Eq� � is quite similar to the Perebeinos�s scaling law for exciton binding energy�
the present empirical expression is attributed to the scaling of electron�electron repulsion
energy by �env� rather than exciton �electron�hole� binding energy as described above� Quite
recently� such a power�law�like�downshift in optical transition energy with �env has been
obtained by theoretical calculations based on a tight�binding model���
We have previously pointed out that the E�� and E�� varies depending on �n�m�� in
particular on chiral angle and on the type of SWNTs �type�I or type�II�� comparing those
of SDS�wrapped SWNTs to those of air�suspended SWNTs� The same behavior occurred
by immersing a sample in SDS solution� as shown in Fig� 	�h�� Most E�� and E�� of SDS�
wrapped SWNTs show a redshift� except for E�� of near�zigzag type�II SWNTs� which
show a blueshift as compared to those in air� This behavior is inconsistent with the results
of the �env dependence as described above� in which �n�m� dependence is not signi�cant�
Comparing the PL maps in SDS solution to those in organic solvents as shown in Fig� �� the
equivalent �env of SDS�wrapped SWNTs would be � 	� In addition to the shift due to the
dielectric screening� the peak positions of SDS�wrapped SWNTs shift so as the �	n 
family patterns spread more widely� This suggest that wrapping with SDS has another
e�ect in addition to the dielectric screening e�ect� The behavior of the additional shifts is
similar to the shift induced by uniaxial stress� reported by Arnold et al��� At present� it still
remains an issue whether such uniaxial strain is induced in the SWNTs by wrapping with
surfactants or not� A charging e�ect due to SDS� which is an anionic surfactant� should also
be considered� Further study is necessary to understand the e�ects of surfactants on optical
transition energies in SWNTs�
Finally� we note the broadening of the PL spectra in liquids� The representative spectra
are shown in Fig� �� The peaks show a broadening� in addition to the redshift with increasing
�env� The linewidth increased from 	� meV in air to �� meV in acetonitrile for �
��� SWNTs�
This linewidth broadening is probably attributed to inhomogeneous broadening due to local
�uctuations of �env on the nano�scale dimension� The dielectric constants we used in Table I
are macroscopic values� On such small dimensions as exciton diameters of a few nm��� the
size of molecules of the organic solvents would be considerable so that the local �env would
�uctuate depending on the number and orientation of the organic molecules� This would
result in inhomogeneous broadening of the PL spectrum�
IV� SUMMARY
In summary� dielectric screening e�ects due to the environment around SWNTs on exciton
transition energies in the SWNTs have been studied quantitatively by means of PL and the
excitation spectroscopies� We varied the �env from ��� to �� by immersing the samples
with SWNTs bridging over trenches in various organic solvents with di�erent �envs� With
increasing �env� both E�� and E�� showed a redshift by ����
 meV for E�� and 	���� meV
for E��� and a tendency to saturate at �env � �� without a signi�cant �n�m� dependence� The
redshift can be explained by dielectric screening of repulsive electron�electron energy� The
�env dependence of E�� and E�� were expressed by a simple empirical equation with a power
law in �env� The equivalent �env of SDS�wrapped SWNTs was estimated to be � 	� It was
suggested that the e�ect of wrapping SWNTs with SDS was not only a dielectric screening
e�ect� but also another e�ect which caused an energy shift like a uniaxial�stress�induced
shift�
� Also at PRESTO� Japan Science and Technology Agency� �
� Honcho� Kawaguchi� Saitama
����� Japan� Electronic address� yohno�nuee�nagoya�u�ac�jp
y Also at Institute for Advanced Research� Nagoya University� Furo
cho� Chikusa
ku� Nagoya�
����� Japan
� H� Kataura� Y� Kumazawa� Y� Maniwa� I� Umezu� S� Suzuki� Y� Ohtsuka� and Y� Achiba�
Synthetic Metals ���� ���� �������
� S� M� Bachilo� M� S� Strano� C� Kittrell� R� H� Hauge� R� E� Smalley� and R� B� Weisman�
Science ���� �
�� �������
� R� B� Weisman and S� M� Bachilo� Nano Lett� �� ��
� ����
��
� Y� Miyauchi� S� Chiashi� Y� Murakami� Y� Hayashida� S� Maruyama� Chem� Phys� Lett� ����
��� �������
� H� Htoon� M� J� O�Connell� P� J� Cox� S� K� Doorn� and V� I� Klimov� Phys� Rev� Lett� ���
������ �������
� F� Wang� G� Dukovic� L� E� Brus� T� F� Heinz� Science ���� �
� �������
	 F� Wang� M� Y� Sfeir� L� Huang� X� M� H� Huang� Y� Wu� J� Kim� J� Hone� S� O�Brien� L� E�
Brus� and T� F� Heinz� Phys� Rev� Lett �	� ������ �������
� P� T� Araujo� S� K� Doorn� S� Kilina� S� Tretiak� E� Einarsson� S� Maruyama� H� Chacham� M�
A� Pimenta� and A� Jorio� Phys� Rev� Lett� ��� ������ �������
 V� C� Moore� M� S� Strano� E� H� Haroz� R� H� Hauge� R� E� Smalley� J� Schmidt� and Y�
Talmon� Nano Lett� �� �
�� ����
��
�� S� G� Chou� H� B� Ribeiro� E� B� Barros� A� P� Santos� D� Nezich� Ge� G� Samsonidze� C�
Fantini� M� A� Pimenta� A� Jorio� F� Plentz Filho� M� S� Dresselhaus� G� Fresselhaus� R� Saito�
M� Zheng� G� B� Onoa� E� D� Semke� A� K� Swan� M� S� �Unl�u� and B� B� Goldberg� Chem�
Phys� Lett� ���� ��� �������
�� C� Fantini� A� Jorio� M� Souza� M� S� Strano� M� S� Dresselhaus� and M� A� Pimenta� Phys�
Rev� Lett� ��� ������ �������
�� J� Lefebvre� J� M� Fraser� Y� Homma� and P� Finnie� Appl� Phys� A ��� ���� �������
�� T� Hertel� A� Hagen� V� Talalaev� K� Arnold� F� Hennrich� M� Kappes� S� Rosenthal� J� McBride�
H� Ulbricht� and E� Flahaut� Nano Lett� 
� ��� �������
�� P� Finnie� Y� Homma� and J� Lefebvre� Phys� Rev� Lett� ��� ������ �������
�� Y� Ohno� S� Iwasaki� Y� Murakami� S� Kishimoto� S� Maruyama� and T� Mizutani� Phys� Rev�
B ��� �
���� �������
�� T� Ando J� Phys� Soc� Japan ��� ��� �������
�	 V� Perebeinos� J� Terso	� and Ph� Avouris� Phys� Rev� Lett� ��� ������ �������
�� K� Arnold� S� Lebedkin� O� Kiowski� F� Hennrich� and M� M� Kappes� Nano Lett� �� �
�� �������
 J� Lefebvre� Y� Homma� and P� Finnie� Phys� Rev� Lett� ��� ������ ����
��
�� Y� Ohno� S� Kishimoto� and T� Mizutani� Nanotechnology ��� ��� �������
�� Y� Murakami� Y� Miyauchi� S� Chiashi� and S� Maruyama� Chem� Phys� Lett� ���� �
 ����
��
�� In order to form micelle structure� the sample was kept in SDS solution for one night before the
PL measurement�
�� Y� Miyauchi� R� Saito� K� Saito� Y� Ohno� S� Iwasaki� T� Mizutani� J� Jiang� S� Maruyama �will
be published elsewhere�
�� C� D� Spataru� S� Ismail
Beigi� L� X� Benedict� and S� G� Louie� Phys� Rev� Lett� ��� ������
�������
TABLE I� �env of air and various liquids used in this study�
solvent �env
air ���
hexane ���
chloroform ���
ethyl acetate ���
dichloromethane ���
acetone ��
acetonitrile 
�
FIG� �� Plan
view and bird
view �inset� SEM images of SWNTs bridging over trenches on a quartz
substrate� Pt thin �lm was deposited on the sample in order to avoid charge up of the insulating
substrate and to observe individual SWNTs� By optimizing growth condition� individual SWNTs
can be obtained�
0.8 0.9 1.0 1.1
0.9 1.0 1.1
0.9 1.0 1.1
0.9 1.0 1.1
(a) air (b) hexane (c) chloroform (d) ethyl acetate
(e) dichloromethane (f) acetone (g) acetonitrile (h) SDS
FIG� �� PL contour maps for SWNTs� �a� in air ��env � ����� �b� hexane ��env � ����� �c� chloroform
��env � ����� �d� ethyl acetate ��env � ����� �e� dichloromethane ��env � ����� �f� acetone ��env �
������ and �g� acetonitrile ��env � 
����� and �h� SDS�D�O solution� The crosses and open circles
represent peak positions in air and in the liquid� In �h�� stars represent the peak positions of
wrapped SWNTs reported by Weisman et al�� The PL intensity color schemes are linear� but
di	erent scaling factors were used for each maps� The emission and excitation maxima correspond
to E�� and E��� respectively�
FIG� 
� E�� versus E�� plots in air ��env � ����� hexane ��env � ����� chloroform ��env � ����� The
stars represent the peak positions of SDS
wrapped SWNTs�� Both E�� and E�� show redshifts
with increasing �env with a small �n�m� dependence� The peak positions of SDS
wrapped SWNTs
deviate from the line of dielectric screening e	ect�
FIG� �� �env dependences of E�� and E�� of various �n�m� SWNTs� The solid lines are �tting
curves given by an empirical expression with power law in �
FIG� �� PL spectra in various liquids� With increasing �
� the emission spectra show linewidth
broadening in addition to redshift�
ABSTRACT
  Environmental dielectric screening effects on exciton transition energies in
single-walled carbon nanotubes (SWNTs) have been studied quantitatively in the
range of dielectric constants from 1.0 to 37 by immersing SWNTs bridged over
trenches in various organic solvents by means of photoluminescence and the
excitation spectroscopies. With increasing environmental dielectric constant
($\epsilon_{\rm env}$), both $E_{11}$ and $E_{22}$ exhibited a redshift by
several tens meV and a tendency to saturate at a $\epsilon_{\rm env} \sim 5$
without an indication of significant ($n$,$m$) dependence. The redshifts can be
explained by dielectric screening of the repulsive electron-electron
interaction. The $\epsilon_{\rm env}$ dependence of $E_{11}$ and $E_{22}$ can
be expressed by a simple empirical equation with a power law in $\epsilon_{\rm
env}$, $E_{\rm ii} = E_{\rm ii}^{\infty} + A\epsilon_{\rm env}^{-\alpha}$. We
also immersed a sample in sodium-dodecyl-sulfate (SDS) solution to investigate
the effects of wrapping SWNTs with surfactant. The resultant $E_{11}$ and
$E_{22}$, which agree well with Weisman's data [Nano Lett. {\bf 3}, 1235
(2003)], are close to those of $\epsilon_{\rm env}$ of 2. However, in addition
to the shift due to dielectric screening, another shift was observed so that
the ($2n+m$)-family patterns spread more widely, similar to that of the
uniaxial-stress-induced shift.

<|endoftext|><|startoftext|>
Introduction
Cyclic cohomology groups of topological algebras play an essential role in non-
commutative geometry [2]. There has been a number of papers addressing the
calculation of cyclic-type continuous homology and cohomology groups of some Ba-
nach, C∗- and topological algebras; see, e.g., [2, 8, 14, 17, 18, 20, 32]. However, it
remains difficult to describe these groups explicitly for many topological algebras.
To compute the continuous Hochschild and cyclic cohomology groups of Fréchet
algebras one has to deal with complexes of complete DF -spaces. Here, in addition
to presenting known homological techniques we also supply technical enhancements
that permit the necessary generalization of results known in the case of Banach
algebras and their inverse limits to wider classes of topological algebras notably to
those that occur in noncommutative geometry.
The category of Banach spaces has the useful property that it is closed under
passage to dual spaces. Fréchet spaces do not have this property: the strong dual
Date: 22 August 2007.
Key words and phrases. Cyclic cohomology, Hochschild cohomology, nuclear DF -spaces, locally
convex algebras, nuclear Fréchet algebra.
I am indebted to the Isaac Newton Institute for Mathematical Sciences at Cambridge for hos-
pitality and for generous financial support from the programme on Noncommutative Geometry
while this work was carried out.
http://arxiv.org/abs/0704.1019v2
2 Z. A. Lykova
of a Fréchet space is a complete DF -space. DF -spaces have the awkward feature
that their closed subspaces need not be DF -spaces. However, closed subspaces of
complete nuclear DF -spaces are again DF -spaces [21, Proposition 5.1.7].
In Section 3 we use the strongest known results on the open mapping theorem to
give sufficient conditions on topological spaces E and F to imply that any continuous
linear operator T from E∗ onto F ∗ is open. This allows us to prove the following
results. In Lemma 3.6 we show that, for a continuous morphism ϕ : X → Y of
complexes of complete nuclear DF -spaces, the isomorphism of cohomology groups
Hn(ϕ) : Hn(X )→ Hn(Y) is automatically topological.
We use this fact to describe explicitly up to topological isomorphism the continuous
Hochschild and cyclic cohomology groups of nuclear ⊗̂-algebras A which are Fréchet
spaces or DF -spaces and have trivial Hochschild homology HHn(A) for all n ≥ 1
(Theorem 5.3). In Proposition 4.4, under the same condition on HHn(A), we give
explicit formulae, up to isomorphism of linear spaces, for continuous cyclic-type
homology of A in a more general category of underlying spaces.
In Theorem 6.8 the continuous cyclic-type homology and cohomology groups are
described up to topological isomorphism for the following classes of biprojective ⊗̂-
algebras: the tensor algebra E⊗̂F generated by the duality (E, F, 〈·, ·〉) for nuclear
Fréchet spaces or for nuclear complete DF -spaces E and F ; nuclear biprojective
Fréchet Köthe algebras λ(P ); nuclear biprojective Köthe algebras λ(P )∗ which are
DF -spaces; the algebra of distributions E∗(G) and the algebra of smooth functions
E(G) on a compact Lie group G.
2. Definitions and notation
We recall some notation and terminology used in homology and in the theory of
topological algebras. Homological theory can be found in any relevant textbook, for
instance, Loday [16] for the pure algebraic case and Helemskii [7] for the continuous
case.
Throughout the paper ⊗̂ is the projective tensor product of complete locally
convex spaces, by X⊗̂n we mean the n-fold projective tensor power X⊗̂ . . . ⊗̂X of
X , and id denotes the identity operator.
We use the notation Ban, Fr and LCS for the categories whose objects are Banach
spaces, Fréchet spaces and complete Hausdorff locally convex spaces respectively,
and whose morphisms in all cases are continuous linear operators. For topological
homology theory it is important to find a suitable category for the underlying spaces
of the algebras and modules. In [7] Helemskii constructed homology theory for the
following categories Φ of underlying spaces, for which he used the notation (Φ, ⊗̂).
Definition 2.1. ([7, Section II.5]) A suitable category for underlying spaces of
the algebras and modules is an arbitrary complete subcategory Φ of LCS having the
following properties:
(i) if Φ contains a space, it also contains all those spaces topologically isomorphic
to it;
Cyclic cohomology of nuclear Fréchet and DF algebras 3
(ii) if Φ contains a space, it also contains any of its closed subspaces and the
completion of any its Hausdorff quotient spaces;
(iii) Φ contains the direct sum and the projective tensor product of any pair of its
spaces;
(iv) Φ contains C.
Besides Ban, Fr and LCS important examples of suitable categories Φ are the
categories of complete nuclear spaces [31, Proposition 50.1], nuclear Fréchet spaces
and complete nuclear DF -spaces. As to the above properties for the category of
complete nuclear DF -spaces, recall the following results. By [12, Theorem 15.6.2],
if E and F are complete DF -spaces, then E⊗̂F is a complete DF -space. By [21,
Proposition 5.1.7], a closed linear subspace of a complete nuclear DF -space is also
a complete nuclear DF -space. By [21, Proposition 5.1.8], each quotient space of a
complete nuclear DF -space by a closed linear subspace is also a complete nuclear
DF -space.
By definition a ⊗̂-algebra is a complete Hausdorff locally convex algebra with
jointly continuous multiplication. A left ⊗̂-module X over a ⊗̂-algebra A is a com-
plete Hausdorff locally convex spaceX together with the structure of a leftA-module
such that the map A×X → X , (a, x) 7→ a ·x is jointly continuous. For a ⊗̂-algebra
A, ⊗̂A is the projective tensor product of left and right A-⊗̂-modules (see [6], [7,
II.4.1]). The category of left A-⊗̂-modules is denoted by A-mod and the category
of A-⊗̂-bimodules is denoted by A-mod-A.
Let K be one of the above categories. A chain complex X∼ in the category K is a
sequence of Xn ∈ K and morphisms dn
· · · ← Xn
← Xn+1
← Xn+2 ← . . .
such that dn ◦ dn+1 = 0 for every n. The homology groups of X∼ are defined by
Hn(X∼) = Ker dn−1/Im dn.
A continuous morphism of chain complexes ψ∼ : X∼ → P∼ induces a continuous
linear operator Hn(ψ∼) : Hn(X∼)→ Hn(P∼) [9, Definition 0.4.22].
If E is a topological vector space E∗ denotes its dual space of continuous linear
functionals. Throughout the paper, E∗ will always be equipped with the strong
topology unless otherwise stated. The strong topology is defined on E∗ by taking as
a basis of neighbourhoods of 0 the family of polars V 0 of all bounded subsets V of
E; see [31, II.19.2].
For any ⊗̂-algebra A, not necessarily unital, A+ is the ⊗̂-algebra obtained by
adjoining an identity to A. For a ⊗̂-algebra A, the algebra Ae = A+⊗̂A
called the enveloping algebra of A, where A
+ is the opposite algebra of A+ with
multiplication a · b = ba.
A complex of A-⊗̂-modules and their morphisms is called admissible if it splits
as a complex in LCS [7, III.1.11]. A module Y ∈ A-mod is called flat if for any
admissible complex X of right A-⊗̂-modules the complex X⊗̂AY is exact. A module
Y ∈ A-mod-A is called flat if for any admissible complex X of A-⊗̂-bimodules the
4 Z. A. Lykova
complex X⊗̂AeY is exact. For Y,X ∈ A-mod-A, we shall denote by Tor
n (X, Y ) the
nth homology of the complex X⊗̂AeP, where 0← Y ← P is a projective resolution
of Y in A-mod-A, [7, Definition III.4.23].
It is well known that the strong dual of a Fréchet space is a complete DF -space
and that nuclear Fréchet spaces and complete nuclear DF -spaces are reflexive [21,
Theorem 4.4.12]. Moreover, the correspondence E ↔ E∗ establishes a one-to-one
relation between nuclear Fréchet spaces and complete nuclear DF -spaces [21, The-
orem 4.4.13]. DF -spaces were introduced by A. Grothendieck in [5].
Further we shall need the following technical result which extends a result of
Johnson for the Banach case [11, Corollary 1.3].
Proposition 2.2. Let (X , d) be a chain complex of
(a) Fréchet spaces and continuous linear operators, or
(b) complete nuclear DF -spaces and continuous linear operators,
and let N ∈ N. Then the following statements are equivalent:
(i) Hn(X , d) = {0} for all n ≥ N and HN−1(X , d) is Hausdorff;
(ii) Hn(X ∗, d∗) = {0} for all n ≥ N.
Proof. Recall that Hn(X , d) = Ker dn−1/Im dn and H
n(X ∗, d∗) = Ker d∗n/Im d
Let L be the closure of Im dN−1 in XN−1. Consider the following commutative
diagram
0 ← L
←− XN
←− XN+1
←− . . .
↓ i ւ dN−1
in which i is the natural inclusion and j is a corestriction of dN−1. The dual com-
mutative diagram is the following
0 → L∗
−→ X∗N
−→ X∗N+1
−→ . . .
↑ i∗ ր d∗N−1
X∗N−1
It is clear that HN−1(X , d) is Hausdorff if and only if j is surjective. Since i is
injective, condition (i) is equivalent to the exactness of diagram (1). On the other
hand, by the Hahn-Banach theorem, i∗ is surjective. Thus condition (ii) is equivalent
to the exactness of diagram (2).
In the case of Fréchet spaces, by [18, Lemma 2.3], the exactness of the complex
(1) is equivalent to the exactness of the complex (2).
In the case of complete nuclear DF -spaces, by [21, Proposition 5.1.7], L is the
strong dual of a nuclear Fréchet space. By [21, Theorem 4.4.12], complete nuclear
DF -spaces are reflexive, and therefore the complex (1) is the dual of the complex
(2) of nuclear Fréchet spaces and continuous linear operators. By [18, Lemma 2.3],
the exactness of the complex (1) is equivalent to the exactness of the complex (2).
The proposition is proved. �
Cyclic cohomology of nuclear Fréchet and DF algebras 5
3. The open mapping theorem in complete nuclear DF -spaces
It is known that there exist closed linear subspaces of DF -spaces that are not
DF -spaces. For nuclear spaces, however, we have the following.
Lemma 3.1. [21, Proposition 5.1.7] Each closed linear subspace F of the strong dual
E∗ of a nuclear Fréchet space E is also the strong dual of a nuclear Fréchet space.
In a locally convex space a subset is called a barrel if it is absolutely convex,
absorbent and closed. Every locally convex space has a neighbourhood base con-
sisting of barrels. A locally convex space is called a barrelled space if every barrel
is a neighbourhood [26]. By [26, Theorem IV.1.2], every Fréchet space is barrelled.
By [26, Corollary IV.3.1], a Hausdorff locally convex space is reflexive if and only
if it is barrelled and every bounded set is contained in a weakly compact set. Thus
the strong dual of a nuclear Fréchet space is barrelled. For a generalization of the
open mapping theorem to locally convex spaces, V. Pták introduced the notion of
B-completeness in [24]. A subspace Q of E∗ is said to be almost closed if, for each
neighbourhood U of 0 in E, Q∩U0 is closed in the relative weak* topology σ(E∗, E)
on U0. A locally convex space E is said to be B-complete or fully complete if each
almost closed subspace of E∗ is closed in the weak* topology σ(E∗, E).
Theorem 3.2. [24]. Let E be a B-complete locally convex space and F be a barrelled
locally convex space. Then a continuous linear operator f of E onto F is open.
Recall [10, Theorem 4.1.1] that a locally convex space E is B-complete if and only
if each linear continuous and almost open mapping f of E onto any locally convex
space F is open. By [10, Proposition 4.1.3], every Fréchet space is B-complete.
Theorem 3.3. Let E be a semi-reflexive metrizable barrelled space, F be a Hausdorff
reflexive locally convex space and let E∗ and F ∗ be the strong duals of E and F
respectively. Then a continuous linear operator T of E∗ onto F ∗ is open.
Proof. By [10, Theorem 6.5.10] and by [10, Corollary 6.2.1], the strong dual E∗
of a semi-reflexive metrizable barrelled space E is B-complete. By [26, Corollary
IV.3.2], if a Hausdorff locally convex space is reflexive, so is its dual under the strong
topology. By [26, Corollary IV.3.1], a Hausdorff reflexive locally convex space is
barrelled. Hence F ∗ is a barrelled locally convex space. Therefore, by Theorem 3.2,
T is open. �
Corollary 3.4. Let E and F be nuclear Fréchet spaces and let E∗ and F ∗ be the
strong duals of E and F respectively. Then a continuous linear operator T of E∗
onto F ∗ is open.
For a continuous morphism of chain complexes ψ∼ : X∼ → P∼ in Fr, a surjective
map Hn(ϕ) : Hn(X ) → Hn(Y) is automatically open, see [7, Lemma 0.5.9]. To get
the corresponding result for dual complexes of Fréchet spaces one has to assume
nuclearity.
6 Z. A. Lykova
Lemma 3.5. Let (X , dX ) and (Y , dY) be chain complexes of nuclear Fréchet spaces
and continuous linear operators and let (X ∗, d∗X ) and (Y
∗, d∗Y) be their strong dual
complexes. Let ϕ : X ∗ → Y∗ be a continuous morphism of complexes. Suppose that
ϕ∗ = H
n(ϕ) : Hn(X ∗, d∗X )→ H
n(Y∗, d∗Y)
is surjective. Then ϕ∗ is open.
Proof. Let σY∗ : Ker (d
Y)n → H
n(Y∗, d∗Y) be the quotient map. Consider the map
ψ : Ker (d∗X )n ⊕ Y
n−1 → Ker (d
Y)n ⊂ Y
given by (x, y) 7→ ϕn(x) + (d
Y)n−1(y).
By Lemma 3.1, Ker (d∗X )n and Ker (d
Y)n are the strong duals of nuclear Fréchet
spaces and hence are barrelled. By [10, Theorem 6.5.10] and [10, Corollary 6.2.1],
the strong dual of a semi-reflexive metrizable barrelled space is B-complete. Thus
Ker (d∗X )n, Y
n−1 and
Ker (d∗X )n ⊕ Y
∼= [(Ker (d∗X )n)
∗ ⊕ Yn−1]
are B-complete. By assumption ϕ∗ maps H
n(X ∗, d∗X ) onto H
n(Y∗, d∗Y), which im-
plies that ψ is a surjective linear continuous operator from the B-complete locally
convex space Ker (d∗X )n ⊕ Y
n−1 to the barrelled locally convex space Ker (d
Therefore, by Theorem 3.2, ψ is open. Consider the diagram
Ker (d∗X )n ⊕ Y
→ Ker (d∗X )n
→ Hn(X ∗, d∗X )
↓ ψ ↓ ϕ∗
Ker (d∗Y)n
−→ Hn(Y∗, d∗Y)
in which j is a projection onto a direct summand and σX ∗ and σY∗ are the natural
quotient maps. Obviously this diagram is commutative. Note that the projection j
and quotient maps σX ∗ , σY∗ are open. As ψ is also an open map, so is σY∗ ◦ ψ =
ϕ∗ ◦ σX ∗ ◦ j. Since σX ∗ ◦ j is continuous, ϕ∗ is open. �
Corollary 3.6. Let (X , dX ) and (Y , dY) be cochain complexes of complete nuclear
DF -spaces and continuous linear operators, and let ϕ : X → Y be a continuous
morphism of complexes. Suppose that ϕ∗ = H
n(ϕ) : Hn(X , dX ) → H
n(Y , dY) is
surjective. Then ϕ∗ is open.
Proof. By [21, Theorem 4.4.13], (X , dX ) and (Y , dY) are strong duals of chain com-
plexes (X ∗, d∗X ) and (Y
∗, d∗Y) of nuclear Fréchet spaces and continuous operators.
The result follows from Lemma 3.5. �
4. Cyclic and Hochschild cohomology of some ⊗̂-algebras
One can consult the books by Loday [16] or Connes [2] on cyclic-type homological
theory.
Let A be a ⊗̂-algebra and let X be an A-⊗̂-bimodule. We assume here that the
category of underlying spaces Φ has the properties from Definition 2.1. Let us recall
the definition of the standard homological chain complex C∼(A, X). For n ≥ 0, let
Cyclic cohomology of nuclear Fréchet and DF algebras 7
Cn(A, X) denote the projective tensor product X⊗̂A
. The elements of Cn(A, X)
are called n-chains. Let the differential dn : Cn+1 → Cn be given by
dn(x⊗ a1 ⊗ . . .⊗ an+1) = x · a1 ⊗ . . .⊗ an+1+
(−1)k(x⊗ a1 ⊗ . . .⊗ akak+1 ⊗ . . .⊗ an+1) + (−1)
n+1(an+1 · x⊗ a1 ⊗ . . .⊗ an)
with d−1 the null map. The homology groups of this complex Hn(C∼(A, X)) are
called the continuous Hochschild homology groups of A with coefficients in X and
denoted by Hn(A, X) [7, Definition II.5.28]. We also consider the cohomology
groups Hn((C∼(A, X))
∗) of the dual complex (C∼(A, X))
∗ with the strong dual
topology. For Banach algebras A, Hn((C∼(A, X))
∗) is topologically isomorphic to
the Hochschild cohomologyHn(A, X∗) of A with coefficients in the dual A-bimodule
X∗ [7, Definition I.3.2 and Proposition II.5.27]. The weak bidimension of a Fréchet
algebra A is
dbwA = inf{n : H
n+1(C∼(A, X)
∗) = {0} for all Fréchet A−bimodules X}.
The continuous bar and ‘naive’ Hochschild homology of a ⊗̂-algebra A are defined
respectively as
Hbar∗ (A) = H∗(C(A), b
′) and Hnaive∗ (A) = H∗(C(A), b),
where Cn(A) = A
⊗̂(n+1), and the differentials b, b′ are given by
b′(a0 ⊗ · · · ⊗ an) =
(−1)i(a0 ⊗ · · · ⊗ aiai+1 ⊗ · · · ⊗ an) and
b(a0 ⊗ · · · ⊗ an) = b
′(a0 ⊗ · · · ⊗ an) + (−1)
n(ana0 ⊗ · · · ⊗ an−1).
Note that Hnaive∗ (A) is just another way of writing H∗(A,A), the continuous homol-
ogy of A with coefficients in A, as described in [7, 11].
There is a powerful method based on mixed complexes for the study of the cyclic-
type homology groups; see papers by C. Kassel [13], J. Cuntz and D. Quillen [4] and
J. Cuntz [3]. We shall present this method for the category LCS of locally convex
spaces and continuous linear operators; see [1] for the category of Fréchet spaces. A
mixed complex (M, b, B) in the category LCS is a familyM = {Mn}n≥0 of locally
convex spaces Mn equipped with continuous linear operators bn : Mn → Mn−1 and
Bn : Mn → Mn+1, which satisfy the identities b
2 = bB + Bb = B2 = 0. We assume
that in degree zero the differential b is identically equal to zero. We arrange the
8 Z. A. Lykova
mixed complex (M, b, B) in the double complex
. . . . . . . . . . . .
b ↓ b ↓ b ↓
b ↓ b ↓
There are three types of homology theory that can be naturally associated with a
mixed complex. The Hochschild homology Hb∗(M) of (M, b, B) is the homology of
the chain complex (M, b), that is,
Hbn(M) = Hn(M, b) = Ker {bn :Mn →Mn−1}/Im {bn+1 :Mn+1 →Mn}.
To define the cyclic homology of (M, b, B), let us denote by BcM the total complex
of the above double complex, that is,
· · · → (BcM)n
→ (BcM)n−1 → · · ·
→ (BcM)0 → 0,
where the spaces
(BcM)0 =M0, . . . , (BcM)2k−1 =M1 ⊕M3 ⊕ · · · ⊕M2k−1
(BcM)2k =M0 ⊕M2 ⊕ · · · ⊕M2k
are equipped with the product topology, and the continuous linear operators b+B
are defined by
(b+B)(y0, . . . , y2k) = (by2 +By0, . . . , by2k +By2k−2)
(b+B)(y1, . . . , y2k+1) = (by1, . . . , by2k+1 +By2k−1).
The cyclic homology of (M, b, B) is defined to be H∗(BcM, b+B). It is denoted by
Hc∗(M, b, B).
The periodic cyclic homology of (M, b, B) is defined in terms of the complex
· · · → (BpM)ev
→ (BpM)odd
→ (BpM)ev
→ (BpM)odd → . . . ,
where even/odd chains are elements of the product spaces
(BpM)ev =
M2n and (BpM)odd =
M2n+1,
respectively. The spaces (BpM)ev/odd are locally convex spaces with respect to the
product topology [15, Section 18.3.(5)]. The continuous differential b+B is defined
as an obvious extension of the above. The periodic cyclic homology of (M, b, B) is
Hpν (M, b, B) = Hν(BpM, b+B), where ν ∈ Z/2Z.
There are also three types of cyclic cohomology theory associated with the mixed
complex, obtained when one replaces the chain complex of locally convex spaces
Cyclic cohomology of nuclear Fréchet and DF algebras 9
by its dual complex of strong dual spaces. For example, the cyclic cohomology
associated with the mixed complex (M, b, B) is defined to be the cohomology of
the dual complex ((BcM)
∗, b∗ + B∗) of strong dual spaces and dual operators; it is
denoted by H∗c (M
∗, b∗, B∗).
Consider the mixed complex (Ω̄A+, b̃, B̃), where Ω̄
nA+ = A
⊗̂(n+1) ⊕A⊗̂n and
b 1− λ
0 −b′
; B̃ =
where λ(a1⊗· · ·⊗an) = (−1)
n−1(an⊗a1⊗· · ·⊗an−1) and N = id+λ+· · ·+λ
n−1 [16,
1.4.5]. The continuous Hochschild homology of A, the continuous cyclic homology
of A and the continuous periodic cyclic homology of A are defined by
HH∗(A) = H
∗(Ω̄A+, b̃, B̃), HC∗(A) = H
∗(Ω̄A+, b̃, B̃) and
HP∗(A) = H
∗ (Ω̄A+, b̃, B̃)
where Hb∗, H
∗ and H
∗ are Hochschild homology, cyclic homology and periodic cyclic
homology of the mixed complex (Ω̄A+, b̃, B̃) in LCS, see [17].
There is also a cyclic cohomology theory associated with a complete locally convex
algebra A, obtained when one replaces the chain complexes of A by their dual
complexes of strong dual spaces.
Lemma 4.1. (i) Let A be a [nuclear] Fréchet algebra. Then the following complexes
(C(A), b), (Ω̄A+, b̃), (BcΩ̄A+, b̃+ B̃) and (BpΩ̄A+, b̃+ B̃) are complexes of [nuclear]
Fréchet spaces and continuous linear operators.
(ii) Let A be a [nuclear] ⊗̂-algebra which is a DF -space. Then the following com-
plexes (C(A), b), (Ω̄A+, b̃), and (BcΩ̄A+, b̃+ B̃) are complexes of [nuclear] complete
DF -spaces and continuous linear operators, and (BpΩ̄A+, b̃ + B̃) is a complex of
[nuclear] complete locally convex spaces and continuous linear operators, but it is
not a DF -space in general.
Proof. It is well known that Fréchet spaces are closed under countable cartesian
products and projective tensor product [31]; nuclear locally convex spaces are closed
under cartesian products, countable direct sums and projective tensor product [12,
Corollary 21.2.3]; complete DF -spaces are closed under countable direct sums, pro-
jective tensor product, but not under infinite cartesian products [12, Theorem 12.4.8
and Theorem 15.6.2]. �
Propositions 4.2 and 4.3 below are proved by the author in [17, 18] and show
the equivalence between the continuous cyclic (co)homology of A and the contin-
uous periodic cyclic (co)homology of A when A has trivial continuous Hochschild
(co)homology HHn(A) for all n ≥ N for some integer N . Here we add in these
statements certain topological conditions on the algebra which allow us to show
that isomorphisms of (co)homology groups are automatically topological.
Proposition 4.2. [17, Proposition 3.2] Let A be a complete locally convex algebra.
Then, for any even integer N , say N = 2K, and the following assertions, we have
(i)N ⇒ (ii)N ⇒ (iii)N ⇒ (ii)N+1 and (ii)N ⇒ (iv)N :
10 Z. A. Lykova
(i)N H
naive
n (A) = {0} for all n ≥ N and H
n (A) = {0} for all n ≥ N − 1;
(ii)N HHn(A) = {0} for all n ≥ N ;
(iii)N for all k ≥ K, up to isomorphism of linear spaces,
HC2k(A) = HCN(A) and HC2k+1(A) = HCN−1(A);
(iv)N up to isomorphism of linear spaces, HP0(A) = HCN(A) and HP1(A) =
HCN−1(A).
For Fréchet algebras the isomorphisms in (iii)N and (iv)N are automatically topolog-
ical. For a nuclear ⊗̂-algebra A which is a DF -space the isomorphisms in (iii)N are
automatically topological.
Proof. A proof of the statement is given in [17, Proposition 3.2]. Here we add a
part on the automatic continuity of the isomorphisms. In view of the proofs of
[17, Propositions 2.1 and 3.2] it is easy to see that isomorphisms of homology in
(iii)N and (iv)N are induced by continuous morphisms of complexes. Note that by
Lemma 4.1, for a Fréchet algebra A, the following complexes (BcΩ̄A+, b̃ + B̃) and
(BpΩ̄A+, b̃ + B̃) are complexes of Fréchet spaces and continuous linear operators.
Thus, for Fréchet algebras, by [7, Lemma 0.5.9], isomorphisms of homology groups
are topological.
By Lemma 4.1, for a nuclear ⊗̂-algebra A which is a DF -space, the following com-
plex (BcΩ̄A+, b̃ + B̃) is a complex of nuclear complete DF -spaces and continuous
linear operators. By Corollary 3.6, for complete nuclear DF -spaces the isomor-
phisms for homology groups in (iii)N are also topological. �
Proposition 4.3. [18, Proposition 3.1] Let A be a complete locally convex algebra.
Then, for any even integer N , say N = 2K, and the following assertions, we have
(i)N ⇒ (ii)N ⇒ (iii)N ⇒ (ii)N+1 and (ii)N ⇒ (iv)N :
(i)N H
naive(A) = {0} for all n ≥ N and H
bar(A) = {0} for all n ≥ N − 1;
(ii)N for all n ≥ N, HH
n(A) = {0};
(iii)N for all k ≥ K, up to isomorphism of linear spaces, HC
2k(A) = HCN(A) and
HC2k+1(A) = HCN−1(A);
(iv)N up to isomorphism of linear spaces, HP
0(A) = HCN(A) and HP 1(A) =
HCN−1(A).
For nuclear Fréchet algebras the isomorphisms in (iii)N and (iv)N are topological
isomorphisms. For a nuclear ⊗̂-algebra A which is a DF -space the isomorphisms
in (iii)N are topological isomorphisms.
Proof. We need to add to the proof of [18, Proposition 3.1] the following part on
automatic continuity. In view of the proof of [18, Proposition 3.1] it is easy to
see that the isomorphisms of cohomology groups in (iii)N and (iv)N are induced by
continuous morphisms of complexes.
For nuclear Fréchet algebras, by Lemma 4.1, the complexes ((BcΩ̄A+)
∗, b̃∗ + B̃∗)
and ((BpΩ̄A+)
∗, b̃∗ + B̃∗) are complexes of strong duals of nuclear Fréchet spaces.
By Lemma 3.5, the isomorphisms of cohomology groups in (iii)N and (iv)N are
topological.
Cyclic cohomology of nuclear Fréchet and DF algebras 11
For a nuclear ⊗̂-algebra A which is a DF -space, by Lemma 4.1 and by [21,
Theorem 4.4.13], the chain complex (BcΩ̄A+, b̃+ B̃) is the strong dual of a complex
of nuclear Fréchet spaces. By [21, Theorem 4.4.12], complete nuclear DF -spaces and
nuclear Fréchet spaces are reflexive. Therefore ((BcΩ̄A+)
∗, b̃∗ + B̃∗) is a complex of
nuclear Fréchet spaces. Thus, by [7, Lemma 0.5.9], the isomorphisms of cohomology
groups in (iii)N are topological. �
The space of continuous traces on a topological algebra A is denoted by Atr, that
Atr = {f ∈ A∗ : f(ab) = f(ba) for all a, b ∈ A}.
The closure in A of the linear span of elements of the form {ab− ba : a, b ∈ A} is
denoted by [A,A]. Recall that b0 : A⊗̂A → A is uniquely determined by a ⊗ b 7→
ab− ba.
Proposition 4.4. Let A be in Φ and be a ⊗̂-algebra.
(i) Suppose that the continuous cohomology groups Hnaiven (A) = {0} for all n ≥ 1
and Hbarn (A) = {0} for all n ≥ 0. Then, up to isomorphism of linear spaces,
HHn(A) = {0} for all n ≥ 1 and HH0(A) = A/Im b0;
HC2ℓ(A) = A/Im b0 and HC2ℓ+1(A) = {0} for all ℓ ≥ 0;
HP0(A) = A/Im b0 and HP1(A) = {0}.
(ii) Suppose that the continuous cohomology groups Hnnaive(A) = {0} for all n ≥ 1
and Hnbar(A) = {0} for all n ≥ 0. Then, up to isomorphism of linear spaces,
HHn(A) = {0} for all n ≥ 1 and HH0(A) = Atr;
HC2ℓ(A) = Atr and HC2ℓ+1(A) = {0} for all ℓ ≥ 0;
HP0(A) = Atr and HP1(A) = {0}.
Proof. (i). One can see that Hbarn (A) = {0} for all n ≥ 0 implies that
HHn(A) = H
naive
n (A) for all n ≥ 0,
see [17, Section 3]. Note that by definition of the ‘naive’ Hochschild homology of
A, Hnaive0 (A) = A/Im b0. Therefore, HHn(A) = {0} for all n ≥ 1 and HH0(A) =
A/Im b0.
From the exactness of the long Connes-Tsygan sequence of continuous homology
it follows that
HC0(A) = H
naive
0 (A) = A/Im b0 and HC1(A) = {0}.
The rest of Statement (i) follows from Proposition 4.2.
(ii) It is known that Hnbar(A) = {0} for all n ≥ 0, implies HH
n(A) = Hnnaive(A)
for all n ≥ 0. By definition of the ‘naive’ Hochschild cohomology of A, H0naive(A) =
Atr. Thus HHn(A) = {0} for all n ≥ 1 and HH0(A) = Atr.
From the exactness of the long Connes-Tsygan sequence of continuous cohomology
it follows that HC0(A) = H0naive(A) = A
tr and HC1(A) = {0}. The rest of
Statement (ii) follows from Proposition 4.3. �
12 Z. A. Lykova
5. Cyclic-type cohomology of biflat ⊗̂-algebras
Recall that a ⊗̂-algebra A is said to be biflat if it is flat in the category of A-⊗̂-
bimodules [7, Def. 7.2.5]. A ⊗̂-algebra A is said to be biprojective if it is projective
in the category of A-⊗̂-bimodules [7, Def. 4.5.1]. By [7, Proposition 4.5.6], a ⊗̂-
algebra A is biprojective if and only if there exists an A-⊗̂-bimodule morphism
ρA : A → A⊗̂A such that πA ◦ ρA = idA, where πA is the canonical morphism
πA : A⊗̂A → A, a1 ⊗ a2 7→ a1a2. It can be proved that any biprojective ⊗̂-algebra
is biflat and A = A2 = Im πA [7, Proposition 4.5.4]. Here A2 is the closure of
the linear span of the set {a1 · a2 : a1, a2 ∈ A} in A. A ⊗̂-algebra A is said to be
contractible if A+ is is projective in the category of A-⊗̂-bimodules. A ⊗̂-algebra A
is contractible if and only if A is biprojective and has an identity [7, Def. 4.5.8]. For
biflat Banach algebras A, Helemskii proved A = A2 = Im πA [7, Proposition 7.2.6]
and gave the description of the cyclic homology HC∗ and cohomology HC
∗ groups
of A in [8]. Later the author generalized Helemskii’s result to inverse limits of biflat
Banach algebras [17, Theorem 6.2] and to locally convex strict inductive limits of
amenable Banach algebras [18, Corollary 4.9].
Proposition 5.1. Let A be in Φ and be a biflat ⊗̂-algebra such that A = Im πA;
in particular, let A ∈ Φ be a biprojective ⊗̂-algebra. Then
(i) Hnaiven (A) = {0} for all n ≥ 1, H
naive
0 (A) = A/Im b0 and H
n (A) = {0} for all
n ≥ 0;
(ii) for the homology groups HH∗, HC∗ and HP∗ of A we have the isomorphisms of
linear spaces (5).
If, furthermore, A is a Fréchet space or A is a nuclear DF -space, then Hnaive0 (A) =
A/[A,A] is Hausdorff, and, for a biflat A, A = A2 implies that A = Im πA.
Proof. By [7, Theorem 3.4.25], up to topological isomorphism, the homology groups
Hnaiven (A) = Hn(A,A) = Tor
n (A,A+)
for all n ≥ 0. Since A is biflat, by [7, Proposition 7.1.2], Hnaiven (A) = {0} for all
n ≥ 1.
By [7, Theorem 3.4.26], up to topological isomorphism, the homology groups
Hbarn (A) = Hn+1(A,C) = Tor
n+1(C,C)
for all n ≥ 0, where C is the trivial A-bimodule. Note that, for the trivial A-
bimodule C, there is a flat resolution
0← C← A+ ← A← 0
in the category of left or right A-⊗̂-modules. By [7, Theorem 3.4.28], Hbarn (A) =
TorAn+1(C,C) = {0} for all n ≥ 1. By assumption, A = Im πA, hence H
0 (A) =
A/Im πA = {0}. Thus the conditions of Proposition 4.4 (i) are satisfied.
In the categories of Fréchet spaces and complete nuclear DF -spaces, the open
mapping theorem holds – see Corollary 3.4 for DF -spaces. Thus, by [7, Proposi-
tions 3.3.5 and 7.1.2], up to topological isomorphism, Hnaive0 (A) = Tor
0 (A,A+) is
Cyclic cohomology of nuclear Fréchet and DF algebras 13
Hausdorff. Since A is biflat, by [7, Proposition 7.1.2], TorA0 (C,A) is also Hausdorff.
By [7, Proposition 3.4.27], A2 = Im πA. �
A ⊗̂-algebra A is amenable if A+ is a flat A-⊗̂-bimodule. For a Fréchet algebra A
amenability is equivalent to the following: for all Fréchet A-bimodules X , H0(A, X)
is Hausdorff and Hn(A, X) = {0} for all n ≥ 1. Recall that an amenable Banach
algebra A is biflat and has a bounded approximate identity [7, Theorem VII.2.20].
Lemma 5.2. Let A be an amenable ⊗̂-algebra which is a Fréchet space or a nu-
clear DF -space. Then Hnaiven (A) = {0} for all n ≥ 1, H
naive
0 (A) = A/[A,A] and
Hbarn (A) = {0} for all n ≥ 0.
Proof. In the categories of Fréchet spaces and complete nuclear DF -spaces, the open
mapping theorem holds. Therefore, by [7, Theorem III.4.25 and Proposition 7.1.2],
up to topological isomorphism, for the trivial A-bimodule C,
Hbarn (A) = Hn+1(A,C) = Tor
n+1(C,A+) = {0}
for all n ≥ 0;
Hnaiven (A) = Hn(A,A) = Tor
n (A,A+) = {0}
for all n ≥ 1 and Hnaive0 (A) = Tor
0 (A,A+) is Hausdorff, that is, H
naive
0 (A) =
A/[A,A]. �
Theorem 5.3. Let A be a ⊗̂-algebra which is a Fréchet space or a nuclear DF -
space. Suppose that the continuous homology groups Hnaiven (A) = {0} for all n ≥ 1,
Hnaive0 (A) is Hausdorff and H
n (A) = {0} for all n ≥ 0. In particular, asssume
that A is a biflat algebra such that A = A2 or A is amenable. Then
(i) up to topological isomorphism,
HHn(A) = {0} for all n ≥ 1 and HH0(A) = A/[A,A];
HC2ℓ(A) = A/[A,A] and HC2ℓ+1(A) = {0} for all ℓ ≥ 0;
(ii) up to topological isomorphism for Fréchet algebras and up to isomorphism of
linear spaces for nuclear DF -algebras,
(8) HP0(A) = A/[A,A] and HP1(A) = {0};
(iii)
Hnnaive(A) = {0} for all n ≥ 1;
Hnbar(A) = {0}; for all n ≥ 0;
(iv) up to topological isomorphism for nuclear Fréchet algebras and nuclear DF -
algebras and up to isomorphism of linear spaces for Fréchet algebras,
HHn(A) = {0} for all n ≥ 1 and HH0(A) = Atr;
HC2ℓ(A) = Atr and HC2ℓ+1(A) = {0} for all ℓ ≥ 0;
(v) up to topological isomorphism for nuclear Fréchet algebras and up to isomorphism
of linear spaces for Fréchet algebras and for nuclear DF -algebras,
(11) HP0(A) = Atr and HP1(A) = {0}.
14 Z. A. Lykova
Proof. In view of Proposition 5.1 and Lemma 5.2, a biflat algebra A such that
A = A2 and an amenable A satisfy the conditions of the theorem.
By Proposition 2.2, firstly, Hbarn (A) = {0} for all n ≥ 0 if and only if H
bar(A) =
{0} for all n ≥ 0; and, secondly, Hnnaive(A) = {0} for all n ≥ 1 if and only if
Hnaiven (A) = {0} for all n ≥ 1 and H
naive
0 (A) is Hausdorff.
By Proposition 4.4, we have isomorphisms of linear spaces in (i) – (v). In Propo-
sitions 4.2 and 4.2 we show also when the above isomorphisms are automatically
topological. �
Remark 5.4. Recall that, for a biflat Banach algebra A, dbwA ≤ 2 [27, Theorem
6]. By [14, Theorem 5.2], for a Banach algebra A of a finite weak bidimension dbwA,
we have isomorphisms between the entire cyclic cohomology and the periodic cyclic
cohomology of A, HE0(A) = HP 0(A) = Atr and HE1(A) = HP 1(A) = {0}. The
entire cyclic cohomology HEk(A) of A for k = 0, 1 are defined in [2, IV.7]. In
[25, Theorem 6.1] M. Puschnigg extended M. Khalkhali’s result on the isomorphism
HEk(A) = HP k(A) for k = 0, 1 from Banach algebras to some Fréchet algebras.
The following statement shows that the above theorems give the explicit descrip-
tion of cyclic type homology and cohomology of the projective tensor product of two
biprojective ⊗̂-algebras.
Proposition 5.5. Let B and C be biprojective ⊗̂-algebras. Then the projective
tensor product A = B⊗̂C is a biprojective ⊗̂-algebra.
Proof. Since B is biprojective, there is a morphism of B-⊗̂-bimodules ρB : B → B⊗̂B
such that πB ◦ ρB = idB. A similar statement is valid for C. Let i be the topological
isomorphism
i : (B⊗̂B)⊗̂(C⊗̂C)→ (B⊗̂C)⊗̂(B⊗̂C)
given by (b1⊗ b2)⊗ (c1⊗ c2) 7→ (b1⊗ c1)⊗ (b2⊗ c2). Note that πB⊗̂C = (πB⊗̂πC)◦ i
It is routine to check that
ρB⊗̂C : B⊗̂C → (B⊗̂C)⊗̂(B⊗̂C)
defined by ρB⊗̂C = i◦(ρB⊗ρC) is a morphism of B⊗̂C-⊗̂-bimodules and πB⊗̂C◦ρB⊗̂C =
idB⊗̂C. �
Remark 5.6. For amenable Banach algebras B and C, B. E. Johnson showed that
the Banach algebra A = B⊗̂C is amenable [11]. By [19, Proposition 5.4], for a biflat
Banach algebra A, each closed two-sided ideal I with bounded approximate identity
is amenable and the quotient algebra A/I is biflat. Thus the explicit description of
cyclic type homology and cohomology of such I and A/I is also given in Theorem
5.3. One can find a number of examples of biflat and simplicially trivial Banach and
C∗- algebras in [17, Example 4.6, 4.9].
Cyclic cohomology of nuclear Fréchet and DF algebras 15
6. Applications to the cyclic-type cohomology of biprojective
⊗̂-algebras
In this section we present examples of nuclear biprojective ⊗̂-algebras which are
Fréchet spaces or DF -spaces and the continuous cyclic-type homology and cohomol-
ogy of these algebras.
Example 6.1. Let G be a compact Lie group and let E(G) be the nuclear Fréchet
algebra of smooth functions on G with the convolution product. It was shown by
Yu.V. Selivanov that A = E(G) is biprojective [29].
Let E∗(G) be the strong dual to E(G), so that E∗(G) is a complete nuclear DF -
space. This is a ⊗̂-algebra with respect to convolution multiplication: for f, g ∈
E∗(G) and x ∈ E(G), < f ∗ g, x >=< f, y >, where y ∈ E(G) is defined by
y(s) =< g, xs >, s ∈ G and xs(t) = x(s
−1t), t ∈ G. J.L. Taylor proved that the
algebra of distributions E∗(G) on a compact Lie group G is contractible [30].
Example 6.2. Let (E, F ) be a pair of complete Hausdorff locally convex spaces
endowed with a jointly continuous bilinear form 〈·, ·〉 : E ×F → C that is not iden-
tically zero. The space A = E⊗̂F is a ⊗̂-algebra with respect to the multiplication
defined by
(x1 ⊗ x2)(y1 ⊗ y2) = 〈x2, y1〉x1 ⊗ y2, xi ∈ E, yi ∈ F.
Yu.V. Selivanov proved that this algebra is biprojective and usually non unital
[28, 29]. More exactly, if A = E⊗̂F has a left or right identity, then E or F
respectively is finite-dimensional. If the form 〈·, ·〉 is nondegenerate, then A = E⊗̂F
is called the tensor algebra generated by the duality (E, F, 〈·, ·〉).
In particular, if E is a Banach space with the approximation property, then the
algebra A = E⊗̂E∗ is isomorphic to the algebra N (E) of nuclear operators on E [7,
II.2.5].
6.1. Köthe sequence algebras. The following results on Köthe algebras can be
found in A. Yu. Pirkovskii’s papers [22, 23].
A set P of nonnegative real-valued sequences p = (pi)i∈N is called a Köthe set if
the following axioms are satisfied:
(P1) for every i ∈ N there is p ∈ P such that pi > 0;
(P2) for every p, q ∈ P there is r ∈ P such that max{pi, qi} ≤ ri for all i ∈ N.
Suppose, in addition, the following condition is satisfied:
(P3) for every p ∈ P there exist q ∈ P and a constant C > 0 such that pi ≤ Cq
for all i ∈ N.
For any Köthe set P which satisfies (P3), the Köthe space
λ(P ) = {x = (xn) ∈ C
N : ‖x‖p =
|xn|pn <∞ for all p ∈ P}
is a complete locally convex space with the topology determined by the family of
seminorms {‖x‖p : p ∈ P} and a ⊗̂-algebra with pointwise multiplication. The
⊗̂-algebras λ(P ) are called Köthe algebras.
By [21] and [6], for a Köthe set, λ(P ) is nuclear if and only if
(P4) for every p ∈ P there exist q ∈ P and ξ ∈ ℓ1 such that pi ≤ ξiqi for all i ∈ N.
16 Z. A. Lykova
By [22, Theorem 3.5], λ(P ) is biprojective if and only if
(P5) for every p ∈ P there exist q ∈ P and a constant M > 0 such that p2i ≤Mqi
for all i ∈ N.
The algebra λ(P ) is unital if and only if
n pn <∞ for every p ∈ P .
Example 6.3. Fix a real number 1 ≤ R ≤ ∞ and a nondecreasing sequence
α = (αi) of positive numbers with limi→∞ αi =∞. The power series space
ΛR(α) = {x = (xn) ∈ C
N : ‖x‖r =
|xn|r
αn <∞ for all 0 < r < R}
is a Fréchet Köthe algebra with pointwise multiplication. The topology of ΛR(α) is
determined by a countable family of seminorms {‖x‖rk : k ∈ N} where {rk} is an
arbitrary increasing sequence converging to R.
By [23, Corollary 3.3], ΛR(α) is biprojective if and only if R = 1 or R =∞.
By the Grothendieck-Pietsch criterion, ΛR(α) is nuclear if and only if for limn
0 for R <∞ and limn
<∞ for R =∞, see [22, Example 3.4].
The algebra ΛR((n)) is topologically isomorphic to the algebra of functions holo-
morphic on the open disc of radius R, endowed with Hadamard product, that is,
with “co-ordinatewise” product of the Taylor expansions of holomorphic functions.
Example 6.4. The algebra H(C) ∼= Λ∞((n)) of entire functions, endowed with the
Hadamard product, is a biprojective nuclear Fréchet algebra [23].
Example 6.5. The algebra H(D1) ∼= Λ1((n)) of functions holomorphic on the open
unit disc, endowed with the Hadamard product, is a biprojective nuclear Fréchet
algebra. Moreover it is contractible, since the function z 7→ (1− z)−1 is an identity
for H(D1) [23].
For any Köthe space λ(P ) the dual space λ(P )∗ can be canonically identified with
{(yn) ∈ C
N : ∃p ∈ P and C > 0 such that |yn| ≤ Cpn for all n ∈ N}.
It is shown in [23] that, for a biprojective Köthe algebra λ(P ), λ(P )∗ is a sequence
algebra with pointwise multiplication.
The algebra λ(P )∗ is unital if and only if there exists p ∈ P such that inf i pi > 0.
Example 6.6. The nuclear Fréchet algebra of rapidly decreasing sequences
s = {x = (xn) ∈ C
N : ‖x‖k =
|xn|n
k <∞ for all k ∈ N}
is a biprojective Köthe algebra [22]. The algebra s is topologically isomorphic to
Λ∞(α) with αn = logn [23]. The nuclear Köthe ⊗̂-algebra s
∗ of sequences of poly-
nomial growth is contractible [30].
Example 6.7. [23, Section 4.2] Let P be a Köthe set such that pi ≥ 1 for all p ∈ P
and all n ∈ N. Then the formula 〈a, b〉 =
i aibi defines a jointly continuous,
nondegenerate bilinear form on λ(P ) × λ(P ). Thus M(P ) = λ(P )⊗̂λ(P ) can be
considered as the tensor algebra generated by the duality (λ(P ), λ(P ), 〈·, ·〉), and so
is biprojective. There is a canonical isomorphism between M(P ) and the algebra
Cyclic cohomology of nuclear Fréchet and DF algebras 17
λ(P × P ) of N×N complex matrices (aij)(ij)∈N×N satisfying the condition ‖a‖p =
i,j |aij|pipj <∞ for all p ∈ P with the usual matrix multiplication.
In particular, for P = {(nk)n∈N : k = 0, 1, . . . }, we obtain the biprojective nuclear
Fréchet algebra ℜ = s⊗̂s of “smooth compact operators” consisting of N×N com-
plex matrices (aij) with rapidly decreasing matrix entries. Here s is from Example
Theorem 6.8. Let A be a ⊗̂-algebra belonging to one of the following classes:
(i) A = E(G) or A = E∗(G) for a compact Lie group G;
(ii) A = E⊗̂F , the tensor algebra generated by the duality (E, F, 〈·, ·〉) for nuclear
Fréchet spaces E and F (e.g., ℜ = s⊗̂s) or for nuclear complete DF -spaces E and
(iii) Fréchet Köthe algebras A = λ(P ) such that the Köthe set P satisfies (P3),
(P4) and (P5); in particular, Λ1(α) such that limn
= 0 or Λ∞(α) such that
<∞. (e.g., H(D1), s, H(C)).
(iv) Köthe algebras A = λ(P )∗ which are the strong duals of λ(P ) from (iii).
(v) the projective tensor product A = B⊗̂C of biprojective nuclear ⊗̂-algebras B
and C which are Fréchet spaces or DF -spaces; in particular, A = E(G)⊗̂ℜ.
Then, up to topological isomorphism,
Hnaiven (A) = {0} for all n ≥ 1 and H
naive
0 (A) = A/[A,A];
Hbarn (A) = {0} for all n ≥ 0;
HHn(A) = {0} for all n ≥ 1 and HH0(A) = A/[A,A];
HC2ℓ(A) = A/[A,A] and HC2ℓ+1(A) = {0} for all ℓ ≥ 0;
Hnnaive(A) = {0} for all n ≥ 1; H
bar(A) = {0} for all n ≥ 0;
HHn(A) = {0} for all n ≥ 1 and HH0(A) = Atr;
HC2ℓ(A) = Atr and HC2ℓ+1(A) = {0} for all ℓ ≥ 0;
and, up to topological isomorphism for Fréchet algebras and up to isomorphism of
linear spaces for DF -algebras,
HP0(A) = A/[A,A] and HP1(A) = {0};
HP0(A) = Atr and HP1(A) = {0}.
Proof. We have mentioned above that the algebras in (i)-(iii) and (v) are biprojective
and nuclear. By [23, Corollary 3.10], for any nuclear biprojective Fréchet Köthe
algebra λ(P ), the strong dual λ(P )∗ is a nuclear, biprojective Köthe ⊗̂-algebra
which is a DF -space. For nuclear Fréchet algebras and for nuclear DF -algebras, the
conditions of Theorem 5.3 are satisfied. Therefore, for the homology and cohomology
groups HH and HC of A we have the topological isomorphisms (7) and (10). For
the periodic cyclic homology and cohomology groups HP of A, for Fréchet algebras,
we have topological isomorphisms and, for nuclear DF -algebras, isomorphisms of
linear spaces (8) and (11). It is obvious that, for commutative algebras, Atr = A∗
and A/[A,A] = A. �
18 Z. A. Lykova
The cyclic-type homology and cohomology of E(G) for a compact Lie group G
were calculated in [20].
References
[1] J. Brodzki and Z. A. Lykova: “Excision in cyclic type homology of Fréchet algebras”, Bull.
London Math. Soc., Vol. 33, (2001), pp. 283-291.
[2] A. Connes: Noncommutative geometry, Academic Press, London, 1994.
[3] J. Cuntz: “Cyclic theory and the bivariant Chern-Connes character”, In: Noncommutative
geometry,” Lecture Notes in Math., no. 1831, Springer, Berlin, 2004, pp. 73–135.
[4] J. Cuntz and D. Quillen: “Operators on noncommutative differential forms and cyclic homol-
ogy”, In: Geometry, Topology and Physics, Conf. Proc. Lecture Notes Geom. Topology VI,
Internat. Press, Cambridge, MA, 1995, pp. 77-111.
[5] A. Grothendieck: “Sur les espaces (F ) et (DF )”, Summa Brasil. Math., Vol. 3, (1954), pp.
57–123.
[6] A. Grothendieck: Produits tensoriels topologiques et espaces nucléaires, Mem. Amer. Math.
Soc., Vol. 16, 1955.
[7] A.Ya. Helemskii: The homology of Banach and topological algebras, Kluwer Academic Pub-
lishers, Dordrecht, 1989.
[8] A.Ya. Helemskii: “Banach cyclic (co)homology and the Connes-Tzygan exact sequence”, J.
London Math. Soc. (2), Vol. 46, (1992), no. 3, pp. 449–462.
[9] A.Ya. Helemskii: Banach and polynormed algebras. General theory, representations, homology,
Oxford University Press, Oxford, 1992.
[10] T. Husain: The open mapping and closed graph theorems in topological vector spaces, Friedr.
Vieweg and Sohn, Braunschweig, 1965.
[11] B.E. Johnson: Cohomology in Banach algebras, Mem. Amer. Math. Soc., Vol. 127, 1972.
[12] H. Jarchow: Locally convex spaces, B.G. Teubner, Stuttgart, 1981.
[13] C. Kassel: “Cyclic homology, comodules, and mixed complexes”, J. Algebra, Vol. 107, (1987),
pp. 195-216.
[14] M. Khalkhali: “Algebraic connections, universal bimodules and entire cyclic cohomology”,
Comm. Math. Phys., Vol. 161, (1994), pp. 433–446.
[15] G. Köthe: Topological Vector Spaces, I, Die Grundlehren der mathematischen Wissenschaften,
159, Springer Verlag, Berlin, New-York, 1979.
[16] J.-L. Loday: Cyclic Homology, Springer Verlag, Berlin, 1992.
[17] Z.A. Lykova: “Cyclic cohomology of projective limits of topological algebras”, Proc. Edin-
burgh Math. Soc., Vol. 49, (2006), pp. 173–199.
[18] Z.A. Lykova: “Cyclic-type cohomology of strict inductive limits of Fréchet algebras”, J. Pure
Appl. Algebra, Vol. 205, (2006), pp. 471–497.
[19] Z.A. Lykova and M.C. White: “Excision in the cohomology of Banach algebras with coeffi-
cients in dual bimodules”, In: E. Albrecht and M. Mathieu (Eds.), Banach Algebras’97, Walter
de Gruyter Publishers, Berlin, 1998, pp. 341–361.
[20] R. Meyer: “Comparisons between periodic, analytic and local cyclic cohomology”,
ArXiv:math.KT/0205276 v2 18 Dec 2003.
[21] A. Pietsch: Nuclear Locally Convex Spaces, Springer Verlag, Berlin, 1972.
[22] A.Yu. Pirkovskii: “Biprojective topological algebras of homological bidimension 1”, J. Math.
Sci., Vol. 111, No. 2, (2001), pp. 3476-3495.
[23] A.Yu. Pirkovskii: “Homological bidimension of biprojective topological algebras and nuclear-
ity”, Acta Univ. Oulu. Ser Rerum Natur., Vol. 408, (2004), pp. 179–196.
[24] V. Pták: “On complete topological linear spaces”, Czech. Math. J., Vol. 78 (3), (1953), pp.
301–364. (Russian with English summary)
http://arxiv.org/abs/math/0205276
Cyclic cohomology of nuclear Fréchet and DF algebras 19
[25] M. Puschnigg: “Excision in cyclic homology theories”, Invent. Math., Vol. 143, (2001), pp.
249–323.
[26] A.P. Robertson and W. Robertson: Topological vector spaces, Cambridge Univ. Press, 1973.
[27] Yu.V. Selivanov: “Cohomology of biflat Banach algebras with coefficients in dual bimodules”,
Functional Anal. Appl., Vol. 29, (1995), pp. 289–291.
[28] Yu.V. Selivanov: “Biprojective Banach algebras”, Math. USSR Izvestija, Vol. 15, (1980), pp.
387–399.
[29] Yu.V. Selivanov: “Biprojective topological algebras”, Unpublished manuscript, 1996.
[30] J.L. Taylor: “Homology and cohomology for topological algebras”, Adv. Math., Vol. 9, (1972),
pp. 137–182.
[31] F. Treves: Topological vector spaces, distributions and kernels, Academic Press, New York,
London, 1967.
[32] M. Wodzicki: “Vanishing of cyclic homology of stable C∗-algebras”, C. R. Acad. Sci. Paris I
, Vol. 307, (1988), pp. 329–334.
School of Mathematics and Statistics, University of Newcastle,, Newcastle
upon Tyne, NE1 7RU, UK (Z.A.Lykova@newcastle.ac.uk)
	1. Introduction
	2. Definitions and notation
	3. The open mapping theorem in complete nuclear DF-spaces
	4. Cyclic and Hochschild cohomology of some -algebras
	5. Cyclic-type cohomology of biflat -algebras
	6. Applications to the cyclic-type cohomology of biprojective -algebras
	6.1.  Köthe sequence algebras
	References
ABSTRACT
  We give explicit formulae for the continuous Hochschild and cyclic homology
and cohomology of certain topological algebras. To this end we show that, for a
continuous morphism $\phi: \X\to \Y$ of complexes of complete nuclear
$DF$-spaces, the isomorphism of cohomology groups $H^n(\phi): H^n(\X) \to
H^n(\Y)$ is automatically topological. The continuous cyclic-type homology and
cohomology are described up to topological isomorphism for the following
classes of biprojective $\hat{\otimes}$-algebras: the tensor algebra $E
\hat{\otimes} F$ generated by the duality $(E, F, < \cdot, \cdot >)$ for
nuclear Fr\'echet spaces $E$ and $F$ or for nuclear $DF$-spaces $E$ and $F$;
nuclear biprojective K\"{o}the algebras $\lambda(P)$ which are Fr\'echet spaces
or $DF$-spaces; the algebra of distributions $\mathcal{E}^*(G)$ on a compact
Lie group $G$.

<|endoftext|><|startoftext|>
Introduction
In a sequential decision problem, a decision maker (or forecaster) performs a sequence of actions.
After each action the decision maker suffers some loss, depending on the response (or state) of the
environment, and its goal is to minimize its cumulative loss over a certain period of time. In the
setting considered here, no probabilistic assumption is made on how the losses corresponding to
different actions are generated. In particular, the losses may depend on the previous actions of the
decision maker, whose goal is to perform well relative to a set of reference forecasters (the so-called
“experts”) for any possible behavior of the environment. More precisely, the aim of the decision
maker is to achieve asymptotically the same average (per round) loss as the best expert.
Research into this problem started in the 1950s (see, for example, Blackwell [5] and Hannan
[18] for some of the basic results) and gained new life in the 1990s following the work of Vovk [29],
Littlestone and Warmuth [24], and Cesa-Bianchi et al. [7]. These results show that for any bounded
loss function, if the decision maker has access to the past losses of all experts, then it is possible
to construct on-line algorithms that perform, for any possible behavior of the environment, almost
as well as the best of N experts. More precisely, the per round cumulative loss of these algorithms
is at most as large as that of the best expert plus a quantity proportional to
lnN/n for any
bounded loss function, where n is the number of rounds in the decision game. The logarithmic
dependence on the number of experts makes it possible to obtain meaningful bounds even if the
pool of experts is very large.
In certain situations the decision maker has only limited knowledge about the losses of all
possible actions. For example, it is often natural to assume that the decision maker gets to know
only the loss corresponding to the action it has made, and has no information about the loss it would
have suffered had it made a different decision. This setup is referred to as the multi-armed bandit
problem, and was considered, in the adversarial setting, by Auer et al. [1] who gave an algorithm
whose normalized regret (the difference of the algorithm’s average loss and that of the best expert)
is upper bounded by a quantity which is proportional to
N lnN/n. Note that, compared to the
full information case described above where the losses of all possible actions are revealed to the
decision maker, there is an extra
N factor in the performance bound, which seriously limits the
usefulness of the bound if the number of experts is large.
Another interesting example for the limited information case is the so-called label efficient
decision problem (see Helmbold and Panizza [22]) in which it is too costly to observe the state of
the environment, and so the decision maker can query the losses of all possible actions for only a
limited number of times. A recent result of Cesa-Bianchi, Lugosi, and Stoltz [9] shows that in this
case, if the decision maker can query the losses m times during a period of length n, then it can
achieve O(
lnN/m) average excess loss relative to the best expert.
In many applications the set of experts has a certain structure that may be exploited to construct
efficient on-line decision algorithms. The construction of such algorithms has been of great interest
in computational learning theory. A partial list of works dealing with this problem includes Herbster
and Warmuth [19], Vovk [30], Bousquet and Warmuth [6], Helmbold and Schapire [27], Takimoto
and Warmuth [28], Kalai and Vempala [23], György at al. [13, 14, 15]. For a more complete survey,
we refer to Cesa-Bianchi and Lugosi [8, Chapter 5].
In this paper we study the on-line shortest path problem, a representative example of structured
expert classes that has received attention in the literature for its many applications, including,
among others, routing in communication networks; see, e.g., Takimoto andWarmuth [28], Awerbuch
et al. [3], or György and Ottucsák [17], and adaptive quantizer design in zero-delay lossy source
coding; see, György et al. [13, 14, 16]. In this problem, a weighted directed (acyclic) graph is given
whose edge weights can change in an arbitrary manner, and the decision maker has to pick in each
round a path between two given vertices, such that the weight of this path (the sum of the weights
of its composing edges) be as small as possible.
Efficient solutions, with time and space complexity proportional to the number of edges rather
than to the number of paths (the latter typically being exponential in the number of edges), have
been given in the full information case, where in each round the weights of all the edges are revealed
after a path has been chosen; see, for example, Mohri [26], Takimoto and Warmuth [28], Kalai and
Vempala [23], and György et al. [15].
In the bandit setting only the weights of the edges or just the sum of the weights of the edges
composing the chosen path are revealed to the decision maker. If one applies the general bandit
algorithm of Auer et al. [1], the resulting bound will be too large to be of practical use because
of its square-root-type dependence on the number of paths N . On the other hand, using the
special graph structure in the problem, Awerbuch and Kleinberg [4] and McMahan and Blum [25]
managed to get rid of the exponential dependence on the number of edges in the performance bound.
They achieved this by extending the exponentially weighted average predictor and the follow-the-
perturbed-leader algorithm of Hannan [18] to the generalization of the multi-armed bandit setting
for shortest paths, when only the sum of the weights of the edges is available for the algorithm.
However, the dependence of the bounds obtained in [4] and [25] on the number of rounds n is
significantly worse than the O(1/
n) bound of Auer et al. [1]. Awerbuch and Kleinberg [4] consider
the model of “non-oblivious” adversaries for shortest path (i.e., the losses assigned to the edges can
depend on the previous actions of the forecaster) and prove an O(n−1/3) bound for the expected
per-round regret. McMahan and Blum [25] give a simpler algorithm than in [4] however obtain a
bound of the order of O(n−1/4) for the expected regret.
In this paper we provide an extension of the bandit algorithm of Auer et al. [1] unifying the
advantages of the above approaches, with a performance bound that is polynomial in the number
of edges, and converges to zero at the right O(1/
n) rate as the number of rounds increases. We
achieve this bound in a model which assumes that the losses of all edges on the path chosen by the
forecaster are available separately after making the decision. We also discuss the case (considered
by [4] and [25]) in which only the total loss (i.e., the sum of the losses on the chosen path) is known
to the decision maker. We exhibit a simple algorithm which achieves an O(n−1/3) per-round regret
with high probability against “non-oblivious” adversary. In this case it remains an open problem
to find an algorithm whose cumulative loss is polynomial in the number of edges of the graph and
decreases as O(n−1/2) with the number of rounds.
In Section 2 we formally define the on-line shortest path problem, which is extended to the
multi-armed bandit setting in Section 3. Our new algorithm for the shortest path problem in
the bandit setting is given in Section 4 together with its performance analysis. The algorithm
is extended to solve the shortest path problem in a combined label efficient multi-armed bandit
setting in Section 5. Another extension, when the algorithm competes against a time-varying path
is studied in Section 6. An algorithm for the “restricted” multi-armed bandit setting (when only the
sums of the losses of the edges are available) is given in Section 7. Simulation results are presented
in Section 8.
2 The shortest path problem
Consider a network represented by a set of vertices connected by edges, and assume that we have
to send a stream of packets from a distinguished vertex, called source, to another distinguished
vertex, called destination. At each time slot a packet is sent along a chosen route connecting source
and destination. Depending on the traffic, each edge in the network may have a different delay, and
the total delay the packet suffers on the chosen route is the sum of delays of the edges composing
the route. The delays may change from one time slot to the next one in an arbitrary way, and
our goal is to find a way of choosing the route in each time slot such that the sum of the total
delays over time is not significantly more than that of the best fixed route in the network. This
adversarial version of the routing problem is most useful when the delays on the edges can change
dynamically, even depending on our previous routing decisions. This is the situation in the case
of ad-hoc networks, where the network topology can change rapidly, or in certain secure networks,
where the algorithm has to be prepared to handle denial of service attacks, that is, situations where
willingly malfunctioning vertices and links increase the delay; see, e.g., Awerbuch et al. [3].
This problem can be cast naturally as a sequential decision problem in which each possible
route is represented by an action. However, the number of routes is typically exponentially large
in the number of edges, and therefore computationally efficient algorithms are called for. Two
solutions of different flavor have been proposed. One of them is based on a follow-the-perturbed-
leader forecaster, see Kalai and Vempala [23], while the other is based on an efficient computation
of the exponentially weighted average forecaster, see, for example, Takimoto and Warmuth [28].
Both solutions have different advantages and may be generalized in different directions.
To formalize the problem, consider a (finite) directed acyclic graph with a set of edges E =
{e1, . . . , e|E|} and a set of vertices V . Thus, each edge e ∈ E is an ordered pair of vertices (v1, v2).
Let u and v be two distinguished vertices in V . A path from u to v is a sequence of edges e(1), . . . , e(k)
such that e(1) = (u, v1), e
(j) = (vj−1, vj) for all j = 2, . . . , k − 1, and e(k) = (vk−1, v). Let
P = {i1, . . . , iN} denote the set of all such paths. For simplicity, we assume that every edge in
E is on some path from u to v and every vertex in V is an endpoint of an edge (see Figure 1 for
examples).
PSfrag replacements
Figure 1: Two examples of directed acyclic graphs for the shortest path problem.
(a) (b)
In each round t = 1, . . . , n of the decision game, the decision maker chooses a path It among
all paths from u to v. Then a loss ℓe,t ∈ [0, 1] is assigned to each edge e ∈ E. We write e ∈ i if the
edge e ∈ E belongs to the path i ∈ P, and with a slight abuse of notation the loss of a path i at
time slot t is also represented by ℓi,t. Then ℓi,t is given as
ℓi,t =
and therefore the cumulative loss up to time t of each path i takes the additive form
Li,t =
ℓi,s =
where the inner sum on the right-hand side is the loss accumulated by edge e during the first t
rounds of the game. The cumulative loss of the algorithm is
L̂t =
ℓIs,s =
ℓe,s .
It is well known that for a general loss sequence, the decision maker must be allowed to use
randomization to be able to approximate the performance of the best expert (see, e.g., Cesa-Bianchi
and Lugosi [8]). Therefore, the path It is chosen randomly according to some distribution pt over
all paths from u to v. We study the normalized regret over n rounds of the game
L̂n −min
where the minimum is taken over all paths i from u to v.
For example, the exponentially weighted average forecaster ([29], [24], [7]), calculated over all
possible paths, has regret
L̂n −min
ln(1/δ)
with probability at least 1− δ, where N is the total number of paths from u to v in the graph and
K is the length of the longest path.
3 The multi-armed bandit setting
In this section we discuss the “bandit” version of the shortest path problem. In this setup, which is
more realistic in many applications, the decision maker has only access to the losses corresponding
to the paths it has chosen. For example, in the routing problem this means that information is
available on the delay of the route the packet is sent on, and not on other routes in the network.
We distinguish between two types of bandit problems, both of which are natural generalizations
of the simple bandit problem to the shortest path problem. In the first variant, the decision maker
has access to the losses of those edges that are on the path it has chosen. That is, after choosing a
path It at time t, the value of the loss ℓe,t is revealed to the decision maker if and only if e ∈ It.
We study this case and its extensions in Sections 4, 5, and 6.
The second variant is a more restricted version in which the loss of the chosen path is observed,
but no information is available on the individual losses of the edges belonging to the path. That
is, after choosing a path It at time t, only the value of the loss of the path ℓIt,t is revealed to the
decision maker. Further on we call this setting as the restricted bandit problem for shortest path.
We consider this restricted problem in Section 7.
Formally, the on-line shortest path problem in the multi-armed bandit setting is described as
follows: at each time instance t = 1, . . . , n, the decision maker picks a path It ∈ P from u to v.
Then the environment assigns loss ℓe,t ∈ [0, 1] to each edge e ∈ E, and the decision maker suffers
loss ℓIt,t =
ℓe,t. In the unrestricted case the losses ℓe,t are revealed for all e ∈ It, while in
the restricted case only ℓIt,t is revealed. Note that in both cases ℓe,t may depend on I1, . . . , It−1,
the earlier choices of the decision maker.
For the basic multi-armed bandit problem, Auer et al. [1] gave an algorithm, based on exponen-
tial weighting with a biased estimate of the gains (defined, in our case, as gi,t = K− ℓi,t), combined
with uniform exploration. Applying their algorithm to the on-line shortest path problem in the
bandit setting results in a performance that can be bounded, for any 0 < δ < 1 and fixed time
horizon n, with probability at least 1− δ, by
L̂n −min
≤ 11K
N ln(N/δ)
K lnN
(The constants follow from a slightly improved version; see Cesa-Bianchi and Lugosi [8].)
However, for the shortest path problem this bound is unacceptably large because, unlike in the
full information case, here the dependence on the number of all paths N is not merely logarithmic,
while N is typically exponentially large in the size of the graph (as in the two simple examples
of Figure 1). In order to achieve a bound that does not grow exponentially with the number of
edges of the graph, it is imperative to make use of the dependence structure of the losses of the
different actions (i.e., paths). Awerbuch and Kleinberg [4] and McMahan and Blum [25] do this
by extending low complexity predictors, such as the follow-the-perturbed-leader forecaster [18],
[23] to the restricted bandit setting. However, in both cases the price to pay for the polynomial
dependence on the number of edges is a worse dependence on the length n of the game.
4 A bandit algorithm for shortest paths
In this section we describe a variant of the bandit algorithm of [1] which achieves the desired
performance for the shortest path problem. The new algorithm uses the fact that when the losses
of the edges of the chosen path are revealed, then this also provides some information about the
losses of each path sharing common edges with the chosen path.
For each edge e ∈ E, and t = 1, 2, . . ., introduce the gain ge,t = 1− ℓe,t, and for each path i ∈ P,
let the gain be the sum of the gains of the edges on the path, that is,
gi,t =
ge,t .
The conversion from losses to gains is done in order to facilitate the subsequent performance
analysis. To simplify the conversion, we assume that each path i ∈ P is of the same length K
for some K > 0. Note that although this assumption may seem to be restrictive at the first
glance, from each acyclic directed graph (V,E) one can construct a new graph by adding at most
(K−2)(|V |−2)+1 vertices and edges (with constant weight zero) to the graph without modifying
the weights of the paths such that each path from u to v will be of length K, where K denotes the
length of the longest path of the original graph. If the number of edges is quadratic in the number
of vertices, the size of the graph is not increased substantially.
A main feature of the algorithm below is that the gains are estimated for each edge and not
for each path. This modification results in an improved upper bound on the performance with
the number of edges in place of the number of paths. Moreover, using dynamic programming as
in Takimoto and Warmuth [28], the algorithm can be computed efficiently. Another important
ingredient of the algorithm is that one needs to make sure that every edge is sampled sufficiently
often. To this end, we introduce a set C of covering paths with the property that for each edge
e ∈ E there is a path i ∈ C such that e ∈ i. Observe that one can always find such a covering set
of cardinality |C| ≤ |E|.
We note that the algorithm of [1] is a special case of the algorithm below: For any multi-
armed bandit problem with N experts, one can define a graph with two vertices u and v, and N
directed edges from u to v with weights corresponding to the losses of the experts. The solution
of the shortest path problem in this case is equivalent to that of the original bandit problem with
choosing expert i if the corresponding edge is chosen. For this graph, our algorithm reduces to the
original algorithm of [1].
A BANDIT ALGORITHM FOR SHORTEST PATHS
Parameters: real numbers β > 0, 0 < η, γ < 1.
Initialization: Set we,0 = 1 for each e ∈ E, wi,0 = 1 for each i ∈ P, and W 0 = N .
For each round t = 1, 2, . . .
(a) Choose a path It at random according to the distribution pt on P, defined by
pi,t =
(1− γ)wi,t−1
W t−1
if i ∈ C
(1− γ)wi,t−1
W t−1
if i 6∈ C.
(b) Compute the probability of choosing each edge e as
qe,t =
i:e∈i
pi,t = (1− γ)
i:e∈iwi,t−1
W t−1
|{i ∈ C : e ∈ i}|
(c) Calculate the estimated gains
g′e,t =
ge,t+β
if e ∈ It
otherwise.
(d) Compute the updated weights
we,t = we,t−1e
ηg′e,t
wi,t =
we,t = wi,t−1e
where g′
i,t =
e∈i g
e,t, and the sum of the total weights of the paths
W t =
wi,t.
The main result of the paper is the following performance bound for the shortest-path bandit
algorithm. It states that the per round regret of the algorithm, after n rounds of play, is, roughly,
of the order of K
|E| lnN/n where |E| is the number of edges of the graph, K is the length of the
paths, and N is the total number of paths.
Theorem 1 For any δ ∈ (0, 1) and parameters 0 ≤ γ < 1/2, 0 < β ≤ 1, and η > 0 satisfying
2ηK|C| ≤ γ, the performance of the algorithm defined above can be bounded, with probability at
least 1− δ, as
L̂n −min
≤ Kγ + 2ηK2|C|+
+ |E|β.
In particular, choosing β =
, γ = 2ηK|C|, and η =
4nK2|C|
yields for all n ≥
, 4|C| lnN
L̂n −min
4K|C| lnN +
|E| ln |E|
The proof of the theorem is based on the analysis of the original algorithm of [1] with necessary
modifications required to transform parts of the argument from paths to edges, and to use the
connection between the gains of paths sharing common edges.
For the analysis we introduce some notation:
Gi,n =
gi,t and G
i,n =
g′i,t
for each i ∈ P and
Ge,n =
ge,t and G
e,n =
g′e,t
for each e ∈ E, and
Ĝn =
gIt,t.
The following lemma, shows that the deviation of the true cumulative gain from the estimated
cumulative gain is of the order of
n. The proof is a modification of [8, Lemma 6.7].
Lemma 1 For any δ ∈ (0, 1), 0 ≤ β < 1 and e ∈ E we have
Ge,n > G
e,n +
Proof. Fix e ∈ E. For any u > 0 and c > 0, by the Chernoff bound we have
P[Ge,n > G
e,n + u] ≤ e−cuEec(Ge,n−G
e,n) . (1)
Letting u = ln(|E|/δ)/β and c = β, we get
e−cuEec(Ge,n−G
e,n) = e− ln(|E|/δ)Eeβ(Ge,n−G
e,n) =
Eeβ(Ge,n−G
e,n) ,
so it suffices to prove that Eeβ(Ge,n−G
e,n) ≤ 1 for all n. To this end, introduce
Zt = e
β(Ge,t−G
e,t) .
Below we show that Et[Zt] ≤ Zt−1 for t ≥ 2 where Et denotes the conditional expectation
E[·|I1, . . . , It−1] . Clearly,
Zt = Zt−1 exp
ge,t −
1{e∈It}ge,t + β
Taking conditional expectations, we obtain
Et[Zt]
= Zt−1Et
ge,t −
1{e∈It}ge,t + β
= Zt−1e
qe,t Et
ge,t −
1{e∈It}ge,t
≤ Zt−1e
qe,t Et
1 + β
ge,t −
1{e∈It}ge,t
ge,t −
1{e∈It}ge,t
= Zt−1e
qe,t Et
1 + β2
ge,t −
1{e∈It}ge,t
≤ Zt−1e
qe,t Et
1 + β2
1{e∈It}ge,t
≤ Zt−1e
≤ Zt−1. (4)
Here (2) holds since β ≤ 1, ge,t −
1{e∈It}
≤ 1 and ex ≤ 1 + x + x2 for x ≤ 1. (3) follows from
1{e∈It}
= ge,t. Finally, (4) holds by the inequality 1 + x ≤ ex. Taking expectations on both
sides proves E[Zt] ≤ E[Zt−1]. A similar argument shows that E[Z1] ≤ 1, implying E[Zn] ≤ 1 as
desired. ✷
Proof of Theorem 1. As usual in the analysis of exponentially weighted average forecasters, we
start with bounding the quantity ln Wn
. On the one hand, we have the lower bound
i,n − lnN ≥ ηmax
G′i,n − lnN . (5)
To derive a suitable upper bound, first notice that the condition η ≤ γ
2K|C|
implies ηg′
i,t ≤ 1 for
all i and t, since
ηg′i,t = η
g′e,t ≤ η
1 + β
≤ ηK(1 + β)|C|
where the second inequality follows because qe,t ≥ γ/|C| for each e ∈ E.
Therefore, using the fact that ex ≤ 1 + x+ x2 for all x ≤ 1, for all t = 1, 2, . . . we have
W t−1
wi,t−1
W t−1
pi,t − γ|C|1{i∈C}
pi,t − γ|C|1{i∈C}
1 + ηg′i,t + η
ηg′i,t + η
pi,tg
i,t +
pi,tg
i,t (7)
where (6) follows form the definition of pi,t, and (7) holds by the inequality ln(1 + x) ≤ x for all
x > −1.
Next we bound the sums in (7). On the one hand,
pi,tg
i,t =
g′e,t =
g′e,t
i∈P:e∈i
g′e,tqe,t = gIt,t + |E|β.
On the other hand,
pi,tg
i,t =
g′e,t
pi,tK
i∈P:e∈i
e,tqe,t
qe,tg
β + 1{e∈It}ge,t
≤ K(1 + β)
g′e,t
where the first inequality is due to the inequality between the arithmetic and quadratic mean, and
the second one holds because ge,t ≤ 1. Therefore,
W t−1
(gIt,t + |E|β) +
η2K(1 + β)
g′e,t .
Summing for t = 1, . . . , n, we obtain
Ĝn + n|E|β
η2K(1 + β)
G′e,n
Ĝn + n|E|β
η2K(1 + β)
|C|max
G′i,n (8)
where the second inequality follows since
e∈E G
e,n ≤
i∈C G
i,n. Combining the upper bound
with the lower bound (5), we obtain
Ĝn ≥ (1− γ − ηK(1 + β)|C|)max
G′i,n −
lnN − n|E|β. (9)
Now using Lemma 1 and applying the union bound, for any δ ∈ (0, 1) we have that, with probability
at least 1− δ,
Ĝn ≥ (1− γ − ηK(1 + β)|C|)
Gi,n −
− 1− γ
lnN − n|E|β ,
where we used 1− γ − ηK(1 + β)|C| ≥ 0 which follows from the assumptions of the theorem.
Since Ĝn = Kn− L̂n and Gi,n = Kn− Li,n for all i ∈ P, we have
L̂n ≤ Kn (γ + η(1 + β)K|C|) + (1− γ − η(1 + β)K|C|)min
+ (1− γ − η(1 + β)K|C|)
lnN + n|E|β
with probability at least 1− δ. This implies
L̂n −min
Li,n ≤ Knγ + η(1 + β)nK2|C|+
lnN + n|E|β
≤ Knγ + 2ηnK2|C|+ K
+ n|E|β
with probability at least 1− δ, which is the first statement of the theorem. Setting
and γ = 2ηK|C|
results in the inequality
L̂n −min
Li,n ≤ 4ηnK2|C|+
nK|E| ln
which holds with probability at least 1 − δ if n ≥ (K/|E|) ln(|E|/δ) (to ensure β ≤ 1). Finally,
setting
4nK2|C|
yields the last statement of the theorem (n ≥ 4 lnN |C| is required to ensure γ ≤ 1/2). ✷
Next we analyze the computational complexity of the algorithm. The next result shows that
the algorithm is feasible as its complexity is linear in the size (number of edges) of the graph.
Theorem 2 The proposed algorithm can be implemented efficiently with time complexity O(n|E|)
and space complexity O(|E|).
Proof. The two complex steps of the algorithm are steps (a) and (b), both of which can be
computed, similarly to Takimoto and Warmuth [28], using dynamic programming. To perform
these steps efficiently, first we order the vertices of the graph. Since we have an acyclic directed
graph, its vertices can be labeled (in O(|E|) time) from 1 to |V | such that u = 1, v = |V |, and if
(v1, v2) ∈ E, then v1 < v2. For any pair of vertices u1 < v1 let Pu1,v1 denote the set of paths from
u1 to v1, and for any vertex s ∈ V , let
Ht(s) =
i∈Ps,v
Ĥt(s) =
i∈Pu,s
we,t .
Given the edge weights {we,t}, Ht(s) can be computed recursively for s = |V | − 1, . . . , 1, and Ĥt(s)
can be computed recursively for s = 2, . . . , |V | in O(|E|) time (letting Ht(v) = Ĥt(u) = 1 by
definition). In step (a), first one has to decide with probability γ whether It is generated according
to the graph weights, or it is chosen uniformly from C. If It is to be drawn according to the graph
weights, it can be shown that its vertices can be chosen one by one such that if the first k vertices of
It are v0 = u, v1, . . . , vk−1, then the next vertex of It can be chosen to be any vk > vk−1, satisfying
(vk−1, vk) ∈ E, with probability w(vk−1,vk),t−1Ht−1(vk)/Ht−1(vk−1). The other computationally
demanding step, namely step (b), can be performed easily by noting that for any edge (v1, v2),
q(v1,v2),t = (1− γ)
Ĥt−1(v1)w(v1,v2),t−1Ht−1(v2)
Ht−1(u)
|{i ∈ C : (v1, v2) ∈ i}|
as desired. ✷
5 A combination of the label efficient and bandit settings
In this section we investigate a combination of the multi-armed bandit and the label efficient
problems. This means that the decision maker only has access to the loss of the chosen path upon
request and the total number of requests must be bounded by a constant m. This combination is
motivated by some applications, in which feedback information is costly to obtain.
In the general label efficient decision problem, after taking an action, the decision maker has the
option to query the losses of all possible actions. For this problem, Cesa-Bianchi et al. [9] proved an
upper bound on the normalized regret of order O(K
ln(4N/δ)/(m)) which holds with probability
at least 1− δ.
Our model of the label-efficient bandit problem for shortest paths is motivated by an application
to a particular packet switched network model. This model, called the cognitive packet network,
was introduced by Gelenbe et al. [11, 12]. In these networks a particular type of packets, called
smart packets, are used to explore the network (e.g., the delay of the chosen path). These packets
do not carry any useful data; they are merely used for exploring the network. The other type of
packets are the data packets, which do not collect any information about their paths. The task of
the decision maker is to send packets from the source to the destination over routes with minimum
average transmission delay (or packet loss). In this scenario, smart packets are used to query the
delay (or loss) of the chosen path. However, as these packets do not transport information, there
is a tradeoff between the number of queries and the usage of the network. If data packets are on
the average α times larger than smart packets (note that typically α ≫ 1) and ǫ is the proportion
of time instances when smart packets are used to explore the network, then ǫ/(ǫ+ α(1− ǫ)) is the
proportion of the bandwidth sacrificed for well informed routing decisions.
We study a combined algorithm which, at each time slot t, queries the loss of the chosen path
with probability ǫ (as in the solution of the label efficient problem proposed in [9]), and, similarly
to the multi-armed bandit case, computes biased estimates g′
i,t of the true gains gi,t. Just as in the
previous section, it is assumed that each path of the graph is of the same length K.
The algorithm differs from our bandit algorithm of the previous section only in step (c), which
is modified in the spirit of [9]. The modified step is given below:
MODIFIED STEP FOR THE LABEL EFFICIENT BANDIT ALGORITHM FOR
SHORTEST PATHS
(c’) Draw a Bernoulli random variable St with P(St = 1) = ǫ, and compute the
estimated gains
g′e,t =
ge,t+β
ǫqe,t
St if e ∈ It
ǫqe,t
St if e /∈ It .
The performance of the algorithm is analyzed in the next theorem, which can be viewed as a
combination of Theorem 1 in the preceding section and Theorem 2 of [9].
Theorem 3 For any δ ∈ (0, 1), ǫ ∈ (0, 1], parameters η =
ǫ lnN
4nK2|C|
, γ =
2ηK|C|
≤ 1/2, and
n|E|ǫ
≤ 1, and for all
K2 ln2(2|E|/δ)
|E| lnN
|E| ln(2|E|/δ)
, 4|C| lnN
the performance of the algorithm defined above can be bounded, with probability at least 1− δ, as
L̂n −min
K|C| lnN + 5
|E| ln
8K ln
|E| ln 2N
If ǫ is chosen as (m −
2m ln(1/δ))/n then, with probability at least 1 − δ, the total number
of queries is bounded by m (see [8, Lemma 6.1]) and the performance bound above is of the order
|E| ln(N/δ)/m.
Similarly to Theorem 1, we need a lemma which reveals the connection between the true and the
estimated cumulative losses. However, here we need a more careful analysis because the “shifting
term”
ǫqe,t
St, is a random variable.
Lemma 2 For any 0 < δ < 1, 0 < ǫ ≤ 1, for any
n ≥ 1
K2 ln2(2|E|/δ)
|E| lnN
K ln(2|E|/δ)
parameters
2ηK|C|
≤ γ, η =
ǫ lnN
4nK2|C|
and β =
n|E|ǫ
≤ 1 ,
and e ∈ E, we have
Ge,n > G
e,n +
Proof. Fix e ∈ E. Using (1) with u = 4
and c =
, it suffices to prove for all n that
ec(Ge,n−G
≤ 1 .
Similarly to Lemma 1 we introduce Zt = e
c(Ge,t−G
e,t) and we show that Z1, . . . , Zn is a super-
martingale, that is Et[Zt] ≤ Zt−1 for t ≥ 2 where Et denotes E[·|(I1, S1), . . . , (I t−1, St−1)]. Taking
conditional expectations, we obtain
Et[Zt] = Zt−1Et
ge,t−
1{e∈It}
Stge,t+Stβ
qe,tǫ
≤ Zt−1Et
1 + c
ge,t −
1{e∈It}Stge,t + Stβ
qe,tǫ
ge,t −
1{e∈It}Stge,t + Stβ
qe,tǫ
. (10)
Since
ge,t −
1{e∈It}Stge,t + Stβ
qe,tǫ
= − β
ge,t −
1{e∈It}Stge,t
qe,tǫ
1{e∈It}Stge,t
qe,tǫ
qe,tǫ
we get from (10) that
Et[Zt]
≤ Zt−1Et
1− cβ
qe,tǫ
21{e∈It}Stge,tβ
q2e,tǫ
2ge,tStβ
qe,tǫ
q2e,tǫ
≤ Zt−1
qe,tǫ
. (11)
Since c = βǫ/4 we have
− β + c
qe,tǫ
= −3β
qe,tǫ
4qe,t
4qe,t
β3|C|
≤ 0, (13)
where (12) follows from qe,t ≥ γ|C| and (13) holds by
β2|C|
≤ 1 ,
and the last inequality is ensured by n ≥ K
2 ln2(2|E|/δ)
ǫ|E| lnN
, the assumption of the lemma.
Combining (11) and (13) we get that Et[Zt] ≤ Zt−1. Taking expectations on both sides of the
inequality, we get E[Zt] ≤ E[Zt−1] and since E[Z1] ≤ 1, we obtain E[Zn] ≤ 1 as desired. ✷
Proof of Theorem 3. The proof of the theorem is a generalization of that of Theorem 1, and
follows the same lines with some extra technicalities to handle the effects of the modified step (c’).
Therefore, in the following we emphasize only the differences. First note that (5) and (7) also hold
in this case. Bounding the sums in (7), one obtains
pi,tg
i,t =
(gIt,t + |E|β)
and ∑
pi,tg
i,t ≤
K(1 + β)
g′e,t .
Plugging these bounds into (7) and summing for t = 1, . . . , n, we obtain
(gIt,t + |E|β ) +
η2K(1 + β)
(1− γ)ǫ
|C|max
G′i,n .
Combining the upper bound with the lower bound (5), we obtain
(gIt,t + |E|β ) ≥
1−γ− ηK(1 + β)|C|
G′i,n−
. (14)
To relate the left-hand side of the above inequality to the real gain
t=1 gIt,t, notice that
(gIt,t + |E|β) − (gIt,t + |E|β)
is a martingale difference sequence with respect to (I1, S1), (I2, S2), . . .. Now for all t = 1, . . . , n,
we have the bound
X2t |(I1, S1), . . . , (It−1, St−1)
(gIt,t + |E|β)2
∣∣∣∣(I1, S1), . . . , (I t−1, St−1)
≤ (K + |E|β)
= σ2, (15)
where (15) holds by n ≥ |E| ln(2|E|/δ)
(to ensure β|E| ≤ K). We know that
for all t. Now apply Bernstein’s inequality for martingale differences (see Lemma 9 in the Appendix)
to obtain
Xt > u
, (16)
where
From (16) we get
(gIt,t + |E|β) ≥ Ĝn + βn|E|+ u
. (17)
Now Lemma 2, the union bound, and (17) combined with (14) yield, with probability at least
1− δ,
Ĝn ≥
1− γ −
ηK(1 + β)|C|
Gi,n −
− βn|E| − u
since the coefficient of G′i,n is greater than zero by the assumptions of the theorem.
Since Ĝn = Kn− L̂n and Gi,n = Kn− Li,n, we have
L̂n ≤
1− γ − K(1 + β)η|C|
Li,n +Kn
K(1 + β)η|C|
1− γ −
K(1 + β)η|C|
+ βn|E|+
≤ min
Li,n +Kn
K(1 + β)η|C|
+ 5βn|E|+
+ u ,
where we used the fact that K
= βn|E|.
Substituting the value of β, η and γ, we have
L̂n −min
Li,n ≤Kn
2Kη|C|
2Kη|C|
+ 5βn|E|+ u
n|C| lnN
n|E|K ln(2|E|/δ)
K|C| lnN + 5
|E| ln(2|E|/δ) +
8K ln (2/δ)
ln (2/δ)
as desired. ✷
6 A bandit algorithm for tracking the shortest path
Our goal in this section is to extend the bandit algorithm so that it is able to compete with time-
varying paths under small computational complexity. This is a variant of the problem known
as tracking the best expert ; see, for example, Herbster and Warmuth [19], Vovk [30], Auer and
Warmuth [2], Bousquet and Warmuth [6], Herbster and Warmuth [20].
To describe the loss the decision maker is compared to, consider the following “m-partition”
prediction scheme: the sequence of paths is partitioned into m+1 contiguous segments, and on each
segment the scheme assigns exactly one of the N paths. Formally, an m-partition Part(n,m, t, i) of
the n paths is given by an m-tuple t = (t1, . . . , tm) such that t0 = 1 < t1 < · · · < tm < n+1 = tm+1,
and an (m+ 1)-vector i = (i0, . . . , im) where ij ∈ P. At each time instant t, tj ≤ t < tj+1, path ij
is used to predict the best path. The cumulative loss of a partition Part(n,m, t, i) is
L(Part(n,m, t, i)) =
tj+1−1∑
ℓij ,t =
tj+1−1∑
ℓe,t.
The goal of the decision maker is to perform as well as the best time-varying path (partition),
that is, to keep the normalized regret
L̂n −min
L(Part(n,m, t, i))
as small as possible (with high probability) for all possible outcome sequences.
In the “classical” tracking problem there is a relatively small number of “base” experts and the
goal of the decision maker is to predict as well as the best “compound” expert (i.e., time-varying
expert). However in our case, base experts correspond to all paths of the graph between source and
destination whose number is typically exponentially large in the number of edges, and therefore we
cannot directly apply the computationally efficient methods for tracking the best expert. György,
Linder, and Lugosi [15] develop efficient algorithms for tracking the best expert for certain large
and structured classes of base experts, including the shortest path problem. The purpose of the
following algorithm is to extend the methods of [15] to the bandit setting when the forecaster only
observes the losses of the edges on the chosen path.
A BANDIT ALGORITHM FOR TRACKING SHORTEST PATHS
Parameters: real numbers β > 0, 0 < η, γ < 1, 0 ≤ α ≤ 1.
Initialization: Set we,0 = 1 for each e ∈ E, wi,0 = 1 for each i ∈ P, and W 0 = N .
For each round t = 1, 2, . . .
(a) Choose a path It according to the distribution pt defined by
pi,t =
(1− γ)wi,t−1
W t−1
if i ∈ C;
(1− γ)wi,t−1
W t−1
if i 6∈ C.
(b) Compute the probability of choosing each edge e as
qe,t =
i:e∈i
pi,t = (1− γ)
i:e∈iwi,t−1
W t−1
|{i ∈ C : e ∈ i}|
(c) Calculate the estimated gains
g′e,t =
ge,t+β
if e ∈ It;
otherwise.
(d) Compute the updated weights
vi,t = wi,t−1e
wi,t = (1− α)vi,t +
where g′
i,t =
e∈i g
e,t and W t is the sum of the total weights of the paths, that
W t =
vi,t =
wi,t.
The following performance bounds shows that the normalized regret with respect to the best
time-varying path which is allowed to switch pathsm times is roughly of the order ofK
(m/n)|C| lnN .
Theorem 4 For any δ ∈ (0, 1) and parameters 0 ≤ γ < 1/2, α, β ∈ [0, 1], and η > 0 satisfying
2ηK|C| ≤ γ, the performance of the algorithm defined above can be bounded, with probability at
least 1− δ, as
L̂n −min
L(Part(n,m, t, i))
≤ Kn (γ + η(1 + β)K|C|) +
K(m+ 1)
|E|(m+ 1)
+ βn|E|+ 1
αm(1− α)n−m−1
In particular, choosing
K(m+ 1)
|E|(m+ 1)
, γ = 2ηK|C|, α =
(m+ 1) lnN +m ln
e(n−1)
4nK2|C|
we have, for all n ≥ max
K(m+1)
|E|(m+1)
, 4|C|D
L̂n −min
L(Part(n,m, t, i)) ≤ 2
4K|C|D +
|E|(m + 1) ln
|E|(m + 1)
where
D = (m+ 1) lnN +m
1 + ln
The proof of the theorem is a combination of that of our Theorem 1 and Theorem 1 of [15]. We
will need the following three lemmas.
Lemma 3 For any 1 ≤ t ≤ t′ ≤ n and any i ∈ P,
vi,t′
wi,t−1
≥ eηG
i,[t,t′](1− α)t′−t
where G′
i,[t,t′]
τ=t g
i,τ .
Proof. The proof is a straightforward modification of the one in Herbster and Warmuth [19]. From
the definitions of vi,t and wi,t (see step (d) of the algorithm) it is clear that for any τ ≥ 1,
wi,τ = (1− α)vi,τ +
W τ ≥ (1− α)eηg
i,τwi,τ−1 .
Applying this equation iteratively for τ = t, t+1, . . . , t′ − 1, and the definition of vi,t (step (d)) for
τ = t′, we obtain
vi,t′ = wi,t′−1e
i,t′ ≥ eηg
t′−1∏
(1− α)eηg
wi,t−1
i,[t,t′](1− α)t′−twi,t−1
which implies the statement of the lemma. ✷
Lemma 4 For any t ≥ 1 and i, j ∈ P, we have
Proof. By the definition of wi,t we have
wi,t = (1− α)vi,t +
W t ≥
W t ≥
vj,t .
This completes the proof of the lemma. ✷
The next lemma is a simple corollary of Lemma 1.
Lemma 5 For any δ ∈ (0, 1), 0 ≤ β ≤ 1, t ≥ 1 and e ∈ E we have
Ge,t > G
e,t +
|E|(m+ 1)
|E|(m+ 1)
Proof of Theorem 4. The theorem is proved the same way as Theorem 1 until (8), that is,
Ĝn + n|E|β
η2K(1 + β)
|C|max
G′i,n . (18)
Let Part(n,m, t, i) be an arbitrary partition. Then the lower bound is obtained as
vim,n
(recall that im denotes the path used in the last segment of the partition). Now vim,n can be
rewritten in the form of the following telescoping product
vim,n = wi0,t0−1
vi0,t1−1
wi0,t0−1
wij ,tj−1
vij−1,tj−1
vij ,tj+1−1
wij ,tj−1
Therefore, applying Lemmas 3 and 4, we have
vim,n ≥ wi0,t0−1
)m m∏
(1− α)tj+1−1−tjeηG
ij ,[tj ,tj+1−1]
′(Part(n,m,t,i))(1− α)n−m−1.
Combining the lower bound with the upper bound (18), we have
αm(1− α)n−m−1
ηG′(Part(n,m, t, i))
Ĝn + n|E|β
η2K(1+β)
|C|maxi∈P G′i,n ,
where we used the fact that Part(n,m, t, i) is an arbitrary partition. After rearranging and using
maxi∈P G
i,n ≤ maxt,iG′(Part(n,m, t, i)) we get
Ĝn ≥ (1− γ − ηK(1 + β)|C|) max
G′(Part(n,m, t, i))
−n|E|β − 1− γ
αm(1− α)n−m−1
Now since 1 − γ − ηK(1 + β)|C| ≥ 0, by the assumptions of the theorem and from Lemma 5 with
an application of the union bound we obtain that, with probability at least 1− δ,
Ĝn ≥ (1− γ − ηK(1 + β)|C|)
G(Part(n,m, t, i))−
K(m+ 1)
|E|(m+ 1)
− n|E|β − 1− γ
αm(1− α)n−m−1
Since Ĝn = Kn− L̂n and G(Part(n,m, t, i)) = Kn− L(Part(n,m, t, i)), we have
L̂n ≤ (1− γ − ηK(1 + β)|C|) min
L(Part(n,m, t, i)) +Kn (γ + η(1 + β)K|C|)
+ (1− γ − η(1 + β)K|C|) K(m+ 1)
|E|(m+ 1)
+ n|E|β
αm(1− α)n−m−1
This implies that, with probability at least 1− δ,
L̂n −min
L(Part(n,m, t, i))
≤ Kn (γ + η(1 + β)K|C|) +
K(m+ 1)
|E|(m+ 1)
+ n|E|β + 1
αm(1− α)n−m−1
. (20)
To prove the second statement, let H(p) = −p ln p − (1 − p) ln(1 − p) and D(p ‖ q) = p ln p
(1− p) ln 1−p
. Optimizing the value of α in the last term of (20) gives
αm(1− α)n−m−1
(m+ 1) ln (N) +m ln
+ (n−m− 1) ln
(m+ 1) ln (N) + (n − 1)(Db(α∗ ‖ α) +Hb(α∗))
where α∗ = m
. For α = α∗ we obtain
αm(1− α)n−m−1
((m+ 1) ln (N) + (n− 1)(Hb(α∗)))
((m+ 1) ln (N) +m ln((n − 1)/m)
+(n−m− 1) ln(1 +m/(n−m− 1)))
((m+ 1) ln (N) +m ln((n − 1)/m) +m)
(m+ 1) ln (N) +m ln
e(n− 1)
where the inequality follows since ln(1 + x) ≤ x for x > 0. Therefore
L̂n −min
L(Part(n,m, t, i))
≤ Kn (γ + η(1 + β)K|C|) +
K(m+ 1)
|E|(m+ 1)
+ n|E|β +
which is the first statement of the theorem. Setting
K(m+ 1)
|E|(m+ 1)
, γ = 2ηK|C|, and η =
4nK2|C|
results in the second statement of the theorem, that is,
L̂n −min
L(Part(n,m, t, i))
4K|C|D +
|E|(m+ 1) ln
|E|(m+ 1)
Similarly to [15], the proposed algorithm has an alternative version, which is efficiently com-
putable:
AN ALTERNATIVE BANDIT ALGORITHM FOR TRACKING SHORTEST
PATHS
For t = 1, choose I1 uniformly from the set P. For t ≥ 2,
(a) Draw a Bernoulli random variable Γt with P(Γt = 1) = γ.
(b) If Γt = 1, then choose It uniformly from C.
(c) If Γt = 0,
(c1) choose τt randomly according to the distribution
P{τt = t′} =
(1−α)t−1Z1,t−1
for t′ = 1
α(1−α)t−t
Wt′Zt′,t−1
for t′ = 2, . . . , t
where Zt′,t−1 =
i∈P e
i,[t′,t−1] for t′ = 1, . . . , t− 1, and Zt,t−1 = N ;
(c2) given τt = t
′, choose It randomly according to the probabilities
P{It = i|τt = t′} =
i,[t′,t−1]
Zt′,t−1
for t′ = 1, . . . , t− 1
for t′ = t.
In a way completely analogous to [15], in this alternative formulation of the algorithm one can
compute the probabilities P{It = i|τt = t′} and the normalization factors Zt′,t−1 efficiently. Using
the fact that the baseline bandit algorithm for shortest paths has an O(n|E|) time complexity by
Theorem 2, it follows from Theorem 3 of [15] that the time complexity of the alternative bandit
algorithm for tracking the shortest path is O(n2|E|).
7 An algorithm for the restricted multi-armed bandit problem
In this section we consider the situation where the decision maker receives information only about
the performance of the whole chosen path, but the individual edge losses are not available. That
is, the forecaster has access to the sum ℓIt,t of losses over the chosen path It but not to the losses
{ℓe,t}e∈It of the edges belonging to It.
This is the problem formulation considered by McMahan and Blum [25] and Awerbuch and
Kleinberg [4]. McMahan and Blum provided a relatively simple algorithm whose regret is at
most of the order of n−1/4, while Awerbuch and Kleinberg gave a more complex algorithm to
achieve O(n−1/3) regret. In this section we combine the strengths of these papers, and propose
a simple algorithm with regret at most of the order of n−1/3. Moreover, our bound holds with
high probability, while the above-mentioned papers prove bounds for the expected regret only. The
proposed algorithm uses ideas very similar to those of McMahan and Blum [25]. The algorithm
alternates between choosing a path from a “basis” B to obtain unbiased estimates of the loss
(exploration step), and choosing a path according to exponential weighting based on these estimates.
A simple way to describe a path i ∈ P is a binary row vector with |E| components which are
indexed by the edges of the graph such that, for each e ∈ E, the eth entry of the vector is 1 if e ∈ i
and 0 otherwise. With a slight abuse of notation we will also denote by i the binary row vector
representing path i. In the previous sections, where the loss of each edge along the chosen path
is available to the decision maker, the complexity stemming from the large number of paths was
reduced by representing all information in terms of the edges, as the set of edges spans the set of
paths. That is, the vector corresponding to a given path can be expressed as the linear combination
of the unit vectors associated with the edges (the eth component of the unit vector representing
edge e is 1, while the other components are 0). While the losses corresponding to such a spanning
set are not observable in the restricted setting of this section, one can choose a subset of P that
forms a basis, that is, a collection of b paths which are linearly independent and each path in P
can be expressed as a linear combination of the paths in the basis. We denote by B the b × |E|
matrix whose rows b1, . . . , bb represent the paths in the basis. Note that b is equal to the maximum
number of linearly independent vectors in {i : i ∈ P}, so b ≤ |E|.
Let ℓ
t denote the (column) vector of the edge losses {ℓe,t}e∈E at time t, and let ℓ
(ℓb1,t, . . . , ℓbb,t)
T be a b-dimensional column vector whose components are the losses of the paths
in the basis B at time t. If α
(i,B)
, . . . , α
(i,B)
are the coefficients in the linear combination of the
basis paths expressing path i ∈ P, that is, i =
j=1 α
(i,B)
j, then the loss of path i ∈ P at time t
is given by
ℓi,t = 〈i, ℓ
t 〉 =
(i,B)
〈bj, ℓ(E)t 〉 =
(i,B)
bj ,t (21)
where 〈·, ·〉 denotes the standard inner product in R|E|. In the algorithm we obtain estimates ℓ̃
bj ,t
of the losses of the basis paths and use (21) to estimate the loss of any i ∈ P as
ℓ̃i,t =
(i,B)
bj ,t . (22)
It is algorithmically advantageous to calculate the estimated path losses ℓ̃i,t from an intermediate
estimate of the individual edge losses. LetB+ denote the the Moore-Penrose inverse ofB defined by
B+ = BT (BBT )−1, where BT denotes the transpose of B and BBT is invertible since the rows of
B are linearly independent. (Note thatB+ = B−1 if b = |E|). Then letting ℓ̃(B)t = (ℓ̃b1,t, . . . , ℓ̃bb,t)
t = B
it is easy to see that ℓ̃i,t in (22) can be obtained as ℓ̃i,t = 〈i, ℓ̃
t 〉, or equivalently
ℓ̃i,t =
ℓ̃e,t.
This form of the path losses allows for an efficient implementation of exponential weighting via
dynamic programming [28].
To analyze the algorithm we need an upper bound on the magnitude of the coefficients α
(i,B)
For this, we invoke the definition of a barycentric spanner from [4]: the basis B is called a C-
barycentric spanner if |α(i,B)
| ≤ C for all i ∈ P and j = 1, . . . , b. Awerbuch and Kleinberg [4] show
that a 1-barycentric spanner exists if B is a square matrix (i.e., b = |E|) and give a low-complexity
algorithm which finds a C-barycentric spanner for C > 1. We use their technique to show that a
1-barycentric spanner also exists in case of a non-square B, when the basis is chosen to maximize
the absolute value of the determinant of BBT . As before, b denotes the maximum number of
linearly independent vectors (paths) in P.
Lemma 6 For a directed acyclic graph, the set of paths P between two dedicated nodes has a 1-
barycentric spanner. Moreover, let B be a b×|E| matrix with rows from P such that det[BBT ] 6= 0.
If B−j,i is the matrix obtained from B by replacing its jth row by i ∈ P and
∣∣det
B−j,iB
]∣∣ ≤ C2
∣∣det
]∣∣ (23)
for all j = 1, . . . , b and i ∈ P, then B is a C-barycentric spanner.
Proof. Let B be a basis of P with rows b1, . . . , bb ∈ P that maximizes |det[BBT ]|. Then, for
any path i ∈ P, we have i =
j=1 α
(i,B)
j for some coefficients {α(i,B)
}. Now for the matrix
B−1,i = [i
T , (b2)T , . . . , (bb)T ]T we have
∣∣det
B−1,iB
∣∣∣det
B−1,ii
T ,B−1,i(b
2)T ,B−1,i(b
3)T , . . . ,B−1,i(b
∣∣∣∣∣∣∣
(i,B)
B−1,ib
,B−1,i(b
2)T ,B−1,i(b
3)T , . . . ,B−1,i(b
∣∣∣∣∣∣∣
∣∣∣∣∣∣
(i,B)
B−1,i(b
j)T ,B−1,i(b
2)T ,B−1,i(b
3)T , . . . ,B−1,i(b
∣∣∣∣∣∣
= |α(i,B)
∣∣det
B−1,iB
(i,B)
)2 ∣∣det
where last equality follows by the same argument the penultimate equality was obtained. Repeating
the same argument for B−j,i, j = 2, . . . , b we obtain
∣∣det
B−j,iB
]∣∣ =
(i,B)
)2 ∣∣det
]∣∣ . (24)
Thus the maximal property of |det[BBT ]| implies |α(i,B)
| ≤ 1 for all j = 1, . . . , b. The second
statement follows trivially from (23) and (24). ✷
Awerbuch and Kleinberg [4] also present an iterative algorithm to find a C-barycentric spanner
if B is a square matrix. Starting from the identity matrix, their algorithm replaces a row of the
matrix in each step by maximizing the determinant with respect to the given row. This is done by
calling an oracle function, and it is shown that the oracle is called O(b logC b) times. In case B is
not a square matrix, the algorithm carries over if we have access to an alternative oracle that can
maximize |det[BBT ]|: Starting from an arbitrary basis B we can iteratively replace one row in
each step, using the oracle, to maximize the determinant |det[BBT ]| until (23) is satisfied for all j
and i. By Lemma 6, this results in a C-barycentric spanner. Similarly to [4], it can be shown that
the oracle is called O(b logC b) times for C > 1.
For simplicity (to avoid carrying the constant C), assume that we have a 2-barycentric spanner
B. Based on the ideas of label efficient prediction, the next algorithm gives a simple solution to
the restricted shortest path problem. The algorithm is very similar to that of the algorithm in the
label efficient case, but here we cannot estimate the edge losses directly. Therefore, we query the
loss of a (random) basis vector from time to time, and create unbiased estimates ℓ̃
bj ,t of the losses
of basis paths ℓ
bj ,t, which are then transformed into edge-loss estimates.
A BANDIT ALGORITHM FOR THE RESTRICTED SHORTEST PATH
PROBLEM
Parameters: 0 < ǫ, η ≤ 1.
Initialization: Set we,0 = 1 for each e ∈ E, wi,0 = 1 for each i ∈ P, W 0 = N . Fix a
basis B, which is a 2-barycentric spanner. For each round t = 1, 2, . . .
(a) Draw a Bernoulli random variable St such that P(St = 1) = ǫ;
(b) If St = 1, then choose the path It uniformly from the basis B. If St = 0, then
choose It according to the distribution {pi,t}, defined by
pi,t =
wi,t−1
W t−1
(c) Calculate the estimated loss of all edges according to
t = B
where ℓ̃
t = {ℓ̃
e,t }e∈E , and ℓ̃
t = (ℓ̃
, . . . , ℓ̃
) is the vector of the estimated
losses
bj ,t =
bj ,t1{It=b
for j = 1, . . . , b.
(d) Compute the updated weights
we,t = we,t−1e
−ηℓ̃e,t ,
wi,t =
we,t = wi,t−1e
e∈i ℓ̃e,t ,
and the sum of the total weights of the paths
W t =
wi,t .
The performance of the algorithm is analyzed in the next theorem. The proof follows the argu-
ment of Cesa-Bianchi et al. [9], but we also have to deal with some additional technical difficulties.
Note that in the theorem we do not assume that all paths between u and v have equal length.
Theorem 5 Let K denote the length of the longest path in the graph. For any δ ∈ (0, 1), parameters
0 < ǫ ≤ 1
and η > 0 satisfying η ≤ ǫ2, and n ≥ 8b
ln 4bN
, the performance of the algorithm defined
above can be bounded, with probability at least 1− δ, as
L̂n −min
Li,n ≤ K
+ nǫ+
2nǫ ln 4
In particular, choosing
and η = ǫ2
we obtain
L̂n −min
Li,n ≤ 9.1K2b (Kb ln(4bN/δ))1/3 n2/3 .
The theorem is proved using the following two lemmas. The first one is an easy consequence of
Bernstein’s inequality:
Lemma 7 Under the assumptions of Theorem 5, the probability that the algorithm queries the basis
more than nǫ+
2nǫ ln 4
times is at most δ/4.
Using the estimated loss of a path i ∈ P given in (22), we can estimate the cumulative loss of
i up to time n as
L̃i,n =
ℓ̃i,t .
The next lemma demonstrates the quality of these estimates.
Lemma 8 Let 0 < δ < 1 and assume n ≥ 8b
ln 4bN
. For any i ∈ P, with probability at least
1− δ/4,
pi,tℓi,t −
pi,tℓ̃i,t ≤
Furthermore, with probability at least 1− δ/(4N),
L̃i,n − Li,n ≤
Proof. We may write
pi,tℓi,t −
pi,tℓ̃i,t =
(i,B)
bj ,t − ℓ̃bj ,t
pi,tα
(i,B)
bj ,t − ℓ̃bj ,t
bj ,t . (25)
Note that for any bj, X
bj ,t, t = 1, 2, . . . is a martingale difference sequence with respect to (It, St),
t = 1, 2, . . . as Etℓ̃b,t = ℓb,t. Also,
bj ,t
pi,tα
(i,B)
bj ,t
(i,B)
)2 K2b
bj ,t| ≤
∣∣∣∣∣
pi,tα
(i,B)
∣∣∣∣∣
∣∣∣ℓbj ,t − ℓ̃bj ,t
∣∣∣ ≤
∣∣∣α(i,B)
where the last inequalities in both cases follow from the fact that B is a 2-barycentric spanner.
Then, using Bernstein’s inequality for martingale differences (Lemma 9), we have, for any fixed bj ,
bj ,t ≥
where we used (26), (27) and the assumption of the lemma on n. The proof of the first statement
is finished with an application of the union bound and its combination with (25).
For the second statement we use a similar argument, that is,
(ℓ̃i,t − ℓi,t) =
(i,B)
bj ,t − ℓbj ,t) ≤
∣∣∣α(i,B)
∣∣∣∣∣
bj ,t − ℓbj ,t)
∣∣∣∣∣
∣∣∣∣∣
bj ,t − ℓbj ,t)
∣∣∣∣∣ . (29)
Now applying Lemma 9 for a fixed bj we get
bj ,t − ℓbj ,t) ≥
because of Et[(ℓ̃bj ,t − ℓbj ,t)2] ≤
and −K ≤ ℓ̃
bj ,t − ℓbj ,t ≤ K
. The proof is completed by
applying the union bound to (30) and combining the result with (29). ✷
Proof of Theorem 5. Similarly to earlier proofs, we follow the evolution of the term ln Wn
the same way as we obtained (5) and (7), we have
≥ −ηmin
L̃i,n − lnN
pi,tℓ̃i,t +
pi,tℓ̃
Combining these bounds, we obtain
L̃i,n −
pi,tℓ̃i,t +
pi,tℓ̃
−1 + ηKb
pi,tℓ̃i,t ,
because 0 ≤ ℓ̃i,t ≤ 2Kbǫ . Applying the results of Lemma 8 and the union bound, we have, with
probability 1− δ/2,
Li,n −
−1 + ηKb
)( n∑
pi,tℓi,t −
pi,tℓi,t +
. (31)
Introduce the sets
= {t : 1 ≤ t ≤ n and St = 0} and T n
= {t : 1 ≤ t ≤ n and St = 1}
of “exploitation” and “exploration” steps, respectively. Then, by the Hoeffding-Azuma inequality
[21] we obtain that, with probability at least 1− δ/4,
pi,tℓi,t ≥
ℓIt,t −
|Tn|K2
Note that for the exploration steps t ∈ T n, as the algorithm plays according to a uniform distribu-
tion instead of pi,t, we can only use the trivial lower bound zero on the losses, that is,
t∈T n
pi,tℓi,t ≥
t∈T n
ℓIt,t −K|T n| .
The last two inequalities imply
pi,tℓi,t ≥ L̂n −
|Tn|K2
−K|T n| . (32)
Then, by (31), (32) and Lemma 7 we obtain, with probability at least 1− δ,
L̂n −min
+ nǫ+
2nǫ ln 4
where we used L̂n ≤ Kn and |Tn| ≤ n. Substituting the values of ǫ and η gives
L̂n −min
Li,n ≤ K2bnǫ+
Knǫ+Knǫ+
Knǫ+ nǫ
≤ 9.1K2bnǫ
where we used
2nǫ ln 4
ln 4N
= nǫ, and lnN
≤ nǫ (from the
assumptions of the theorem). ✷
8 Simulation results
To further investigate our new algorithms, we have conducted some simple simulations. As the main
motivation of this work is to improve earlier algorithms in case the number of paths is exponentially
large in the number of edges, we tested the algorithms on the small graph shown in Figure 1 (b),
which has one of the simplest structures with exponentially many (namely 2|E|/2) paths.
The losses on the edges were generated by a sequence of independent and uniform random
variables, with values from [0, 1] on the upper edges, and from [0.32, 1] on the lower edges, resulting
in a (long-term) optimal path consisting of the upper edges. We ran the tests for n = 10000 steps,
with confidence value δ = 0.001. To establish baseline performance, we also tested the EXP3
algorithm of Auer et al. [1] (note that this algorithm does not need edge losses, only the loss of the
chosen path). For the version of our bandit algorithm that is informed of the individual edge losses
(edge-bandit), we used the simple 2-element cover set of the paths consisting of the upper and
lower edges, respectively (other 2-element cover sets give similar performance). For our restricted
shortest path algorithm (path-bandit) the basis {uuuuu, uuuul, uuull, uulll, ullll} was used, where
u (resp. l) in the kth position denotes the upper (resp. lower) edge connecting vk−1 and vk. In
this example the performance of the algorithm appeared to be independent of the actual choice of
the basis; however, in general we do not expect this behavior. Two versions of the algorithm of
Awerbuch and Kleinberg [4] were also simulated. With its original parameter setting (AwKl), the
algorithm did not perform well. However, after optimizing its parameters off-line (AwKl tuned),
substantially better performance was achieved. The normalized regret of the above algorithms,
averaged over 30 runs, as well as the regret of the fixed paths in the graph are shown in Figure 2.
Although all algorithms showed better performance than the bound for the edge-bandit algo-
rithm, the latter showed the expected superior performance in the simulations. Furthermore, our
algorithm for the restricted shortest path problem outperformed Awerbuch and Kleinberg’s (AwKl)
algorithm, while being inferior to its off-line tuned version (AwKl tuned). It must be noted that
similar parameter optimization did not improve the performance of our path-bandit algorithm,
which showed robust behavior with respect to parameter tuning.
9 Conclusions
We considered different versions of the on-line shortest path problem with limited feedback. These
problems are motivated by realistic scenarios, such as routing in communication networks, where
the vertices do not have all the information about the state of the network. We have addressed the
problem in the adversarial setting where the edge losses may vary in an arbitrary way; in particular,
they may depend on previous routing decisions of the algorithm. Although this assumption may
neglect natural correlation in the loss sequence, it suits applications in mobile ad-hoc networks,
where the network topology changes dynamically in time, and also in certain secure networks that
has to be able to handle denial of service attacks.
Efficient algorithms have been provided for the multi-armed bandit setting and in a combined
label efficient multi-armed bandit setting, provided the individual edge losses along the chosen
path are revealed to the algorithms. The normalized regrets of the algorithms, compared to the
performance of the best fixed path, converge to zero at an O(1/
n) rate as the time horizon n
grows to infinity, and increases only polynomially in the number of edges (and vertices) of the
graph. Earlier methods for the multi-armed bandit problem either do not have the right O(1/
convergence rate, or their regret increase exponentially in the number of edges for typical graphs.
 0  2000  4000  6000  8000  10000
Number of packets
edge-bandit
path-bandit
AwKl tuned
bound for edge-bandit
Figure 2: Normalized regret of several algorithms for the shortest path problem. The gray dotted
lines show the normalized regret of fixed paths in the graph.
The algorithm has also been extended so that it can compete with time varying paths, that is, to
handle situations when the best path can change from time to time (for consistency, the number
of changes must be sublinear in n).
In the restricted version of the shortest path problem, where only the losses of the whole paths
are revealed, an algorithm with a worse O(n−1/3) normalized regret was provided. This algorithm
has comparable performance to that of the best earlier algorithm for this problem [4], however,
our algorithm is significantly simpler. Simulation results are also given to assess the practical
performance and compare it to the theoretical bounds as well as other competing algorithms.
It should be noted that the results are not entirely satisfactory in the restricted version of the
problem, as it remains an open question whether the O(1/
n) regret can be achieved without the
exponential dependence on the size of the graph. Although we expect that this is the case, we have
not been able to construct an algorithm with such a proven performance bound.
10 Appendix
Lemma 9 (Bernstein’s inequality for martingale differences [10].) Let X1, . . . ,Xn be a martingale
difference sequence such that Xt ∈ [a, b] with probability one (t = 1, . . . , n). Assume that, for all t,
X2t |Xt−1, . . . ,X1
≤ σ2 a.s.
Then, for all ǫ > 0,
Xt > ǫ
2nσ2+2ǫ(b−a)/3
and therefore
2nσ2 ln δ−1 + 2 ln δ−1(b− a)/3
References
[1] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire. The non-stochastic multi-armed bandit
problem. SIAM Journal on Computing, 32(1):48–77, 2002.
[2] P. Auer and M. K. Warmuth, “Tracking the best disjunction,” Machine Learning, vol. 32,
no. 2, pp. 127–150, 1998.
[3] B. Awerbuch, D. Holmer, H. Rubens, and R. Kleinberg. Provably competitive adaptive routing.
In Proceedings of IEEE INFOCOM 2005, volume 1, pages 631–641, March 2005.
[4] B. Awerbuch and R. D. Kleinberg. Adaptive routing with end-to-end feedback: distributed
learning and geometric approaches. In Proceedings of the 36th Annual ACM Symposium on the
Theory of Computing, STOC 2004, pages 45–53, Chicago, IL, USA, Jun. 2004. ACM Press.
[5] D. Blackwell. An analog of the minimax theorem for vector payoffs. Pacific Journal of Math-
ematics, 6:1–8, 1956.
[6] O. Bousquet and M. K. Warmuth. Tracking a small set of experts by mixing past posteriors.
Journal of Machine Learning Research, 3:363–396, Nov. 2002.
[7] N. Cesa-Bianchi, Y. Freund, D. P. Helmbold, D. Haussler, R. Schapire, and M. K. Warmuth.
How to use expert advice. Journal of the ACM, 44(3):427–485, 1997.
[8] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University
Press, Cambridge, 2006.
[9] N. Cesa-Bianchi, G. Lugosi, and G. Stoltz. Minimizing regret with label efficient prediction.
IEEE Trans. Inform. Theory, IT-51:2152–2162, June 2005.
[10] D. A. Freedman. On tail probabilities for martingales. Annals of Probability, 3:100–118, Feb.
1975.
[11] E. Gelenbe, M. Gellman, R. Lent, P. Liu, and P. Su. Autonomous smart routing for network
QoS. In Proceedings of First International Conference on Autonomic Computing, pages 232–
239, New York, May 2004. IEEE Computer Society.
[12] E. Gelenbe, R. Lent, and Z. Xhu. Measurement and performance of a cognitive packet network.
Journal of Computer Networks, 37:691–701, 2001.
[13] A. György, T. Linder, and G. Lugosi. Efficient algorithms and minimax bounds for zero-delay
lossy source coding. IEEE Transactions on Signal Processing, 52:2337–2347, Aug. 2004.
[14] A. György, T. Linder, and G. Lugosi. A ”follow the perturbed leader”-type algorithm for
zero-delay quantization of individual sequences. In Proc. Data Compression Conference, pages
342–351, Snowbird, UT, USA, Mar. 2004.
[15] A. György, T. Linder, and G. Lugosi. Tracking the best of many experts. In Proceedings of
the 18th Annual Conference on Learning Theory, COLT 2005, pages 204–216, Bertinoro, Italy,
Jun. 2005. Springer.
[16] A. György, T. Linder, and G. Lugosi. Tracking the best quantizer. In Proceedings of the IEEE
International Symposium on Information Theory, pages 1163–1167, Adelaide, Australia, June-
July 2005.
[17] A. György and Gy. Ottucsák. Adaptive routing using expert advice. The Computer Journal,
49(2):180–189, 2006.
[18] J. Hannan. Approximation to Bayes risk in repeated plays. In M. Dresher, A. Tucker, and
P. Wolfe, editors, Contributions to the Theory of Games, volume 3, pages 97–139. Princeton
University Press, 1957.
[19] M. Herbster and M. K. Warmuth. Tracking the best expert. Machine Learning, 32(2):151–178,
1998.
[20] M. Herbster and M. K. Warmuth, “Tracking the best linear predictor,” Journal of Machine
Learning Research, vol. 1, pp. 281–309, 2001.
[21] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the
American Statistical Association, 58:13–30, 1963.
[22] D.P. Helmbold and S. Panizza. Some label efficient learning results. In Proceedings of the 10th
Annual Conference on Computational Learning Theory, pages 218–230. ACM Press, 1997.
[23] A. Kalai and S Vempala. Efficient algorithms for the online decision problem. In B. Schölkopf
and M. Warmuth, editors, Proceedings of the 16th Annual Conference on Learning Theory
and the 7th Kernel Workshop, COLT-Kernel 2003, pages 26–40, New York, USA, Aug. 2003.
Springer.
[24] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and
Computation, 108:212–261, 1994.
[25] H. B. McMahan and A. Blum. Online geometric optimization in the bandit setting against an
adaptive adversary. In Proceedings of the 17th Annual Conference on Learning Theory, COLT
2004, pages 109–123, Banff, Canada, Jul. 2004. Springer.
[26] M. Mohri. General algebraic frameworks and algorithms for shortest distance problems. Tech-
nical Report 981219-10TM, AT&T Labs Research, 1998.
[27] R. E. Schapire and D. P. Helmbold. Predicting nearly as well as the best pruning of a decision
tree. Machine Learning, 27:51–68, 1997.
[28] E. Takimoto and M. K. Warmuth. Path kernels and multiplicative updates. Journal of Machine
Learning Research, 4:773–818, 2003.
[29] V. Vovk. Aggregating strategies. In Proceedings of the Third Annual Workshop on Computa-
tional Learning Theory, pages 372–383, Rochester, NY, Aug. 1990. Morgan Kaufmann.
[30] V. Vovk. Derandomizing stochastic prediction strategies. Machine Learning, 35(3):247–282,
Jun. 1999.
	Introduction
	The shortest path problem
	The multi-armed bandit setting
	A bandit algorithm for shortest paths
	A combination of the label efficient and bandit settings
	A bandit algorithm for tracking the shortest path
	An algorithm for the restricted multi-armed bandit problem
	Simulation results
	Conclusions
	Appendix
ABSTRACT
  The on-line shortest path problem is considered under various models of
partial monitoring. Given a weighted directed acyclic graph whose edge weights
can change in an arbitrary (adversarial) way, a decision maker has to choose in
each round of a game a path between two distinguished vertices such that the
loss of the chosen path (defined as the sum of the weights of its composing
edges) be as small as possible. In a setting generalizing the multi-armed
bandit problem, after choosing a path, the decision maker learns only the
weights of those edges that belong to the chosen path. For this problem, an
algorithm is given whose average cumulative loss in n rounds exceeds that of
the best path, matched off-line to the entire sequence of the edge weights, by
a quantity that is proportional to 1/\sqrt{n} and depends only polynomially on
the number of edges of the graph. The algorithm can be implemented with linear
complexity in the number of rounds n and in the number of edges. An extension
to the so-called label efficient setting is also given, in which the decision
maker is informed about the weights of the edges corresponding to the chosen
path at a total of m << n time instances. Another extension is shown where the
decision maker competes against a time-varying path, a generalization of the
problem of tracking the best expert. A version of the multi-armed bandit
setting for shortest path is also discussed where the decision maker learns
only the total weight of the chosen path but not the weights of the individual
edges on the path. Applications to routing in packet switched networks along
with simulation results are also presented.

<|endoftext|><|startoftext|>
Introduction 1
2. Curvature estimates 5
3. Proof of Theorem 1.5 9
References 9
1. Introduction
Let N = Nn+1 be a Riemannian manifold, Ω ⊂ N open, connected and
precompact, and M ⊂ Ω a closed connected hypersurface with second fun-
damental form hij , induced metric gij and principal curvatures κi. M is said
to be a Weingarten hypersurface, if, for a given curvature function F , its
principal curvatures lie in the convex cone Γ ⊂ Rn in which the curvature
function is defined, M is then said to be admissible, and satisfies the equation
(1.1) F |M = f
where the right-hand side f is a prescribed positive function defined in Ω̄.
When proving a priori estimates for solutions of (1.1) the concavity of F
plays a central role. As usual we consider F to be defined in a cone Γ as well
as on the space of admissible tensors such that
(1.2) F (hij) = F (κi).
Date: September 2, 2021.
2000 Mathematics Subject Classification. 35J60, 53C21, 53C44, 53C50, 58J05.
Key words and phrases. curvature estimates, Weingarten hypersurface, curvature flows.
This work has been supported by the Deutsche Forschungsgemeinschaft.
http://arxiv.org/abs/0704.1021v2
2 CLAUS GERHARDT
Notice that curvature functions are always assumed to be symmetric and if
F ∈ Cm,α(Γ ), 2 ≤ m, 0 < α < 1, then F ∈ Cm,α(SΓ ), where SΓ ⊂ T
0,2(M)
is the open set of admissible symmetric tensors with respect to the given
metric gij . The result is due to Ball, [1], see also [7, Theorem 2.1.8].
The second derivatives of F then satisfy
(1.3) F
ij,klηijηkl =
∂κi∂κj
ηiiηjj +
Fi − Fj
κi − κj
(ηij)
2 ≤ 0 ∀ η ∈ S,
where S ⊂ T 0,2(M) is the space of symmetric tensors, if F is concave in Γ ,
cf. [4, Lemma 1.1].
However, a mere non-positivity of the right-hand side is in general not
sufficient to prove a priori estimates for the κi resulting in the fact that only
for special curvature functions for which a stronger estimate was known such
a priori estimates could be derived and the problem (1.1) solved, if further
assumptions are satisfied.
Sheng et al. then realized in [9] that the term
(1.4)
Fi − Fj
κi − κj
(ηij)
was all that is needed to obtain the stronger concavity estimates under certain
circumstances. Indeed, if the κi are labelled
(1.5) κ1 ≤ · · · ≤ κn,
then there holds:
1.1. Lemma. Let F be concave and monotone, and assume κ1 < κn, then
(1.6)
Fi − Fj
κi − κj
(ηij)
κn − κ1
(Fn − Fi)(ηni)
for any symmetric tensor (ηij), where we used coordinates such that gij = δij .
Proof. Without loss of generality we may assume that the κi satisfy the strict
inequalities
(1.7) κ1 < · · · < κn,
since these points are dense. The concavity of F implies
(1.8) F1 ≥ · · · ≥ Fn,
cf. [2, Lemma 2], where
(1.9) Fi =
the last inequality is the definition of monotonicity. The inequality then
follows immediately. �
CURVATURE ESTIMATES 3
The right-hand side of inequality (1.6) is exactly the quantity that is
needed to balance a bad technical term in the a priori estimate for κn, at
least in Riemannian manifolds, as we shall prove. Unfortunately, this doesn’t
work in Lorentzian spaces, because of a sign difference in the Gauß equations.
The assumptions on the curvature function are very simple.
1.2. Assumption. Let Γ ⊂ Rn be an open, symmetric, convex cone con-
taining Γ+ and let F ∈ C
m,α(Γ ) ∩ C0(Γ̄ ), m ≥ 4, be symmetric, monotone,
homogeneous of degree 1, and concave such that
(1.10) F > 0 in Γ
(1.11) F |∂Γ = 0.
These conditions on the curvature function will suffice. They could have
been modified, even relaxed, e.g., by only requiring that logF is concave, but
then the condition
(1.12) F ijgij ≥ c0 > 0,
which automatically holds, if F is concave and homogeneous of degree 1,
would have been added, destroying the aesthetic simplicity of Assumption 1.2.
Our estimates apply equally well to solutions of an equation as well as to
solutions of curvature flows. Since curvature flows encompass equations, let
us state the main estimate for curvature flows.
Let Ω ⊂ N be precompact and connected, and 0 < f ∈ Cm,α(Ω̄). We
consider the curvature flow
(1.13)
ẋ = −(Φ− f̃)ν
x(0) = x0,
where Φ is Φ(r) = r and f̃ = f , x0 is the embedding of an initial admissible
hypersurface M0 of class C
m+2,α such that
(1.14) Φ− f̃ ≥ 0 at t = 0,
where of course Φ = Φ(F ) = F . We introduce the technical function Φ in the
present case only to make a comparison with former results, which all use
the notation for the more general flows, easier.
We assume that Ω̄ is covered by a Gaussian coordinate system (xα), 0 ≤
1 ≤ n, such that the metric can be expressed as
(1.15) ds̄2 = e2ψ{(dx0)2 + σijdx
and Ω̄ is covered by the image of the cylinder
(1.16) I × S0
where S0 is a compact Riemannian manifold and I = x
0(Ω̄), x0 is a global
coordinate defined in Ω̄ and (xi) are local coordinates of S0.
4 CLAUS GERHARDT
Furthermore we assume that M0 and the other flow hypersurfaces can
be written as graphs over S0. The flow should exist in a maximal time
interval [0, T ∗), stay in Ω, and uniform C1-estimates should already have
been established.
1.3. Remark. The assumption on the existence of the Gaussian co-
ordinate system and the fact that the hypersurfaces can be written as
graphs could be replaced by assuming the existence of a unit vector field
η ∈ C2(T 0,1(Ω̄)) and of a constant θ > 0 such that
(1.17) 〈η, ν〉 ≥ 2θ
uniformly during the flow, since this assumption would imply uniform C1-
estimates, which are the requirement that the induced metric can be esti-
mated accordingly by controlled metrics from below and above, and because
the existence of such a vector field is essential for the curvature estimate.
If the flow hypersurfaces are graphs in a Gaussian coordinate system, then
such a vector field is given by
(1.18) η = (ηα) = e
ψ(1, 0, . . . , 0)
and the C1-estimates are tantamount to the validity of inequality (1.17).
In case N = Rn+1 and starshaped hypersurfaces one could also use the
(1.19) 〈x, ν〉,
cf. [3, Lemma 3.5].
Then we shall prove:
1.4. Theorem. Under the assumptions stated above the principal curva-
tures κi of the flow hypersurfaces are uniformly bounded from above
(1.20) κi ≤ c,
provided there exists a strictly convex function χ ∈ C2(Ω̄). The constant c
only depends on |f |2,Ω, θ, F (1, . . . , 1), the initial data, and the estimates for
χ and those of the ambient Riemann curvature tensor in Ω̄.
Moreover, the κi will stay in a compact set of Γ .
As an application of this estimate our former results on the existence of
a strictly convex hypersurface M solving the equation (1.1), [4, 5], which we
proved for curvature functions F of class (K), are now valid for curvature
functions F satisfying Assumption 1.2 with Γ = Γ+.
We are even able to solve the existence problem by using a curvature
flow which formerly only worked in case that the sectional curvature of the
ambient space was non-positive.
CURVATURE ESTIMATES 5
1.5. Theorem. Let F satisfy the assumptions above with Γ = Γ+ and
assume that the boundary of Ω has two components
(1.21) ∂Ω = M1
∪ M2,
where the Mi are closed, connected strictly convex hypersurfaces of class
Cm+2,α, m ≥ 4, which can be written as graphs in a normal Gaussian coordi-
nate system covering Ω̄, and where we assume that the normal of M1 points
outside of Ω and that of M2 inside. Let 0 < f ∈ C
m,α(Ω̄), and assume that
M1 is a lower barrier for the pair (F, f) and M2 an upper barrier, then the
problem (1.1) has a strictly convex solution M ∈ Cm+2,α provided there exists
a strictly convex function χ ∈ C2(Ω̄). The solution is the limit hypersurface
of a converging curvature flow.
2. Curvature estimates
Let M(t) be the flow hypersurfaces, then their second fundamental form
i satisfies the evolution equation, cf. [7, Lemma 2.4.1]:
2.1. Lemma. The mixed tensor h
i satisfies the parabolic equation
(2.1)
i − Φ̇F
i;kl =
Φ̇F klhrkh
i − Φ̇Fhrih
rj + (Φ− f̃)hki h
− f̃αβx
gkj + f̃αν
i + Φ̇F
kl,rshkl;ih
+ Φ̈FiF
j + 2Φ̇F klR̄αβγδx
− Φ̇F klR̄αβγδx
rj − Φ̇F klR̄αβγδx
+ Φ̇F klR̄αβγδν
νγxδl h
i − Φ̇F R̄αβγδν
γxδmg
+ (Φ− f̃)R̄αβγδν
γxδmg
+ Φ̇F klR̄αβγδ;ǫ{ν
mj + ναx
Let η be the vector field (1.18), or any vector field satisfying (1.17), and
(2.2) ṽ = 〈η, ν〉,
then we have:
2.2. Lemma (Evolution of ṽ). The quantity ṽ satisfies the evolution equa-
(2.3)
˙̃v − Φ̇F ij ṽij =Φ̇F
j ṽ − [(Φ− f̃)− Φ̇F ]ηαβν
− 2Φ̇F ijhkjx
ηαβ − Φ̇F
ijηαβγx
− Φ̇F ijR̄αβγδν
xδjηǫx
− f̃βx
k ηαg
6 CLAUS GERHARDT
The derivation is elementary, see the proof of the corresponding lemma in
the Lorentzian case [7, Lemma 2.4.4].
Notice that ṽ is supposed to satisfy (1.17), hence
(2.4) ϕ = − log(ṽ − θ)
is well defined and there holds
(2.5) ϕ̇− Φ̇F ijϕij = −{ ˙̃v − Φ̇F
ij ṽij}
ṽ − θ
− Φ̇F ijϕiϕj .
Finally, let χ be the strictly convex function. Its evolution equation is
(2.6)
χ̇− Φ̇F ijχij = −[(Φ− f̃)− Φ̇F ]χαν
α − Φ̇F ijχαβx
≤ −[(Φ− f̃)− Φ̇F ]χαν
α − c0Φ̇F
ijgij
where c0 > 0 is independent of t.
We can now prove Theorem 1.4:
Proof of Theorem 1.4. Let ζ and w be respectively defined by
ζ = sup{ hijη
iηj : ‖η‖ = 1 },(2.7)
w = log ζ + ϕ+ λχ,(2.8)
where λ > 0 is supposed to be large. We claim that w is bounded, if λ is
chosen sufficiently large.
Let 0 < T < T ∗, and x0 = x0(t0), with 0 < t0 ≤ T , be a point in M(t0)
such that
(2.9) sup
w < sup{ sup
w : 0 < t ≤ T } = w(x0).
We then introduce a Riemannian normal coordinate system (ξi) at x0 ∈
M(t0) such that at x0 = x(t0, ξ0) we have
(2.10) gij = δij and ζ = h
Let η̃ = (η̃i) be the contravariant vector field defined by
(2.11) η̃ = (0, . . . , 0, 1),
and set
(2.12) ζ̃ =
hij η̃
gij η̃
ζ̃ is well defined in neighbourhood of (t0, ξ0).
Now, define w̃ by replacing ζ by ζ̃ in (2.8); then, w̃ assumes its maximum
at (t0, ξ0). Moreover, at (t0, ξ0) we have
(2.13)
ζ = ḣnn,
and the spatial derivatives do also coincide; in short, at (t0, ξ0) ζ̃ satisfies the
same differential equation (2.1) as hnn. For the sake of greater clarity, let us
therefore treat hnn like a scalar and pretend that w is defined by
(2.14) w = log hnn + ϕ+ λχ.
CURVATURE ESTIMATES 7
From the equations (2.1), (2.5), (2.6) and (1.6), we infer, by observing the
special form of Φ, i.e., Φ(F ) = F , Φ̇ = 1, f̃ = f and using the monotonicity
and homgeneity of F
(2.15) F = F (κi) = F (
, . . . , 1)κn ≤ F (1, . . . , 1)κn
that in (t0, ξ0)
(2.16)
0 ≤ − 1
Φ̇F ijhkih
ṽ − θ
− fhnn + c(θ)Φ̇F
ijgij + λc
− λc0Φ̇F
gij − Φ̇F
ϕiϕj + Φ̇F
ij(log hnn)i(log h
κn − κ1
(Fn − Fi)(h
ni; )
2(hnn)
Similarly as in [6, p. 197], we distinguish two cases
Case 1. Suppose that
(2.17) |κ1| ≥ ǫ1κn,
where ǫ1 > 0 is small, notice that the principal curvatures are labelled ac-
cording to (1.5). Then, we infer from [6, Lemma 8.3]
(2.18) F ijhkih
F ijgijǫ
(2.19) F ijgij ≥ F (1, . . . , 1),
for a proof see e.e., [7, Lemma 2.2.19].
Since Dw = 0,
(2.20) D log hnn = −Dϕ− λDχ,
we obtain
(2.21) Φ̇F ij(log hnn)i(log h
n)j = Φ̇F
ϕiϕj + 2λΦ̇F
ϕiχj + λ
χiχj ,
where
(2.22) |ϕi| ≤ c|κi|+ c,
as one easily checks.
Hence, we conclude that κn is a priori bounded in this case.
Case 2. Suppose that
(2.23) κ1 ≥ −ǫ1κn,
8 CLAUS GERHARDT
then, the last term in inequality (2.16) is estimated from above by
(2.24)
1 + ǫ1
(Fn − Fi)(h
ni; )
2(hnn)
1 + 2ǫ1
(Fn − Fi)(h
nn; )
2(hnn)
+ c(ǫ1)Φ̇
(Fi − Fn)κ
where we used the Codazzi equation. The last sum can be easily balanced.
The terms in (2.16) containing the derivative of hnn can therefore be esti-
mated from above by
(2.25)
1− 2ǫ1
1 + 2ǫ1
nn; )
2(hnn)
1 + 2ǫ1
(h inn; )
2(hnn)
≤ Φ̇Fn
(h inn; )
2(hnn)
= Φ̇Fn‖Dϕ+ λDχ‖
= Φ̇Fn{‖Dϕ‖
2 + λ2‖Dχ‖2 + 2λ〈Dϕ,Dχ〉}.
Hence we finally deduce
(2.26)
0 ≤ −Φ̇1
ṽ − θ
+ cλ2Φ̇Fn(1 + κn)− fκn + λc
+ (c(θ) − λc0)Φ̇F
ijgij
Thus, we obtain an a priori estimate
(2.27) κn ≤ const,
if λ is chosen large enough. Notice that ǫ1 is only subject to the requirement
0 < ǫ1 <
2.3. Remark. Since the initial condition F ≥ f is preserved under the
flow, a simple application of the maximum principle, cf. [4, Lemma 5.2], we
conclude that the principal curvatures of the flow hypersurfaces stay in a
compact subset of Γ .
2.4. Remark. These a priori estimates are of course also valid, if M is a
stationary solution.
CURVATURE ESTIMATES 9
3. Proof of Theorem 1.5
We consider the curvature flow (1.13) with initial hypersurface M0 = M2.
The flow will exist in a maximal time interval [0, T ∗) and will stay in Ω̄.
We shall also assume that M2 is not already a solution of the problem for
otherwise the flow will be stationary from the beginning.
Furthermore, the flow hypersurfaces can be written as graphs
(3.1) M(t) = graphu(t, ·)
over S0, since the initial hypersurface has this property and all flow hypersur-
faces are supposed to be convex, i.e., uniform C1-estimates are guaranteed,
cf. [4].
The curvature estimates from Theorem 1.4 ensure that the curvature op-
erator is uniformly elliptic, and in view of well-known regularity results we
then conclude that the flow exists for all time and converges in Cm+2,β(S0)
for some 0 < β ≤ α to a limit hypersurface M , that will be a stationary
solution, cf. [8, Section 6].
References
[1] J. M. Ball, Differentiability properties of symmetric and isotropic functions, Duke
Math. J. 51 (1984), no. 3, 699–728.
[2] Klaus Ecker and Gerhard Huisken, Immersed hypersurfaces with constant Weingarten
curvature., Math. Ann. 283 (1989), no. 2, 329–332.
[3] Claus Gerhardt, Flow of nonconvex hypersurfaces into spheres, J. Diff. Geom. 32
(1990), 299–314.
[4] , Closed Weingarten hypersurfaces in Riemannian manifolds, J. Diff. Geom. 43
(1996), 612–641, pdf file.
[5] , Hypersurfaces of prescribed Weingarten curvature, Math. Z. 224 (1997),
167–194, pdf file.
[6] , Hypersurfaces of prescribed scalar curvature in Lorentzian manifolds, J. reine
angew. Math. 554 (2003), 157–199, math.DG/0207054.
[7] , Curvature Problems, Series in Geometry and Topology, vol. 39, International
Press, Somerville, MA, 2006, 323 pp.
[8] , Curvature flows in semi-Riemannian manifolds, arXiv:0704.0236, 48 pages.
[9] Weimin Sheng, John Urbas, and Xu-Jia Wang, Interior curvature bounds for a class
of curvature equations, Duke Math. J. 123 (2004), no. 2, 235–264.
Ruprecht-Karls-Universität, Institut für Angewandte Mathematik, Im Neuen-
heimer Feld 294, 69120 Heidelberg, Germany
E-mail address: gerhardt@math.uni-heidelberg.de
URL: http://www.math.uni-heidelberg.de/studinfo/gerhardt/
http://www.math.uni-heidelberg.de/studinfo/gerhardt/Gerhardt-JDG-96.pdf
http://www.math.uni-heidelberg.de/studinfo/gerhardt/MZ224,97.pdf
http://arXiv.org/pdf/math.DG/0207054
http://arXiv.org/abs/0704.0236
	1. Introduction
	2. Curvature estimates
	3. Proof of Theorem 1.5
	References
ABSTRACT
  We prove curvature estimates for general curvature functions. As an
application we show the existence of closed, strictly convex hypersurfaces with
prescribed curvature $F$, where the defining cone of $F$ is $\C_+$. $F$ is only
assumed to be monotone, symmetric, homogeneous of degree 1, concave and of
class $C^{m,\al}$, $m\ge4$.

<|endoftext|><|startoftext|>
Introduction and main result
We prove a quenched functional central limit theorem for non-nestling random walk
in random environment (RWRE) on the d-dimensional integer lattice Zd in dimensions
d ≥ 2. Here is a general description of the model, fairly standard since quite a while.
An environment ω is a configuration of transition probability vectors ω = (ωx)x∈Zd ∈
Ω = PZd , where P = {(pz)z∈Zd : pz ≥ 0,
z pz = 1} is the simplex of all probability
vectors on Zd. Vector ωx = (ωx,z)z∈Zd gives the transition probabilities out of state
x, denoted by πx,y(ω) = ωx,y−x. To run the random walk, fix an environment ω and
an initial state z ∈ Zd. The random walk X0,∞ = (Xn)n≥0 in environment ω started
at z is then the canonical Markov chain with state space Zd whose path measure P ωz
satisfies
P ωz (X0 = z) = 1 and P
z (Xn+1 = y|Xn = x) = πx,y(ω).
On the space Ω we put its product σ-fieldS, natural shifts πx,y(Tzω) = πx+z,y+z(ω),
and a {Tz}-invariant probability measure P that makes the system (Ω,S, (Tz)z∈Zd,P)
ergodic. In this paper P is an i.i.d. product measure on PZd . In other words, the
vectors (ωx)x∈Zd are i.i.d. across the sites x under P.
Date: November 21, 2018.
2000 Mathematics Subject Classification. 60K37, 60F05, 60F17, 82D30.
Key words and phrases. Random walk, non-nestling, random environment, central limit theorem,
invariance principle, point of view of the particle, environment process, Green function.
1Department of Mathematics, University of Utah.
1Supported in part by NSF Grant DMS-0505030.
2Mathematics Department, University of Wisconsin-Madison.
2Supported in part by NSF Grant DMS-0402231.
http://arxiv.org/abs/0704.1022v2
2 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
Statements, probabilities and expectations under a fixed environment, such as
the distribution P ωz above, are called quenched. When also the environment is av-
eraged out, the notions are called averaged, or also annealed. In particular, the
averaged distribution Pz(dx0,∞) of the walk is the marginal of the joint distribution
Pz(dx0,∞, dω) = P
z (dx0,∞)P(dω) on paths and environments.
Several excellent expositions on RWRE exist, and we refer the reader to the lectures
[3], [15] and [18]. We turn to the specialized assumptions imposed on the model in
this paper.
The main assumption is non-nestling (N) which guarantees a drift uniformly over
the environments. The terminology was introduced by Zerner [19].
Hypothesis (N). There exists a vector û ∈ Zd \ {0} and a constant δ > 0 such that
z · û π0,z(ω) ≥ δ
There is no harm in assuming û ∈ Zd, and this is convenient. We utilize two
auxiliary assumptions: an exponential moment bound (M) on the steps of the walk,
and some regularity (R) on the environments.
Hypothesis (M). There exist positive constants M and s0 such that
es0|z|π0,z(ω) ≤ es0M
Hypothesis (R). There exists a constant κ > 0 such that
z: z·û=1
π0,z(ω) ≥ κ
= 1. (1.1)
Let J = {z : Eπ0,z > 0} be the set of admissible steps under P. Then
P{∀z : π0,0 + π0,z < 1} > 0 and J 6⊂ Ru for all u ∈ Rd. (1.2)
Assumption (1.1) above is stronger than needed. In the proofs it is actually used in
the form (7.5) [Section 7] that permits backtracking before hitting the level x · û = 1.
At the expense of additional technicalities in Section 7 quenched assumption (1.1)
can be replaced by an averaged requirement.
Assumption (1.2) is used in Lemma 7.10. It is necessary for the quenched CLT as
was discovered already in the simpler forbidden direction case we studied in [10] and
[11]. Note that assumption (1.2) rules out the case d = 1. However, the issue is not
whether the walk is genuinely d-dimensional, but whether the walk can explore its
environment thoroughly enough to suppress the fluctuations of the quenched mean.
Most work on RWRE takes uniform ellipticity and nearest-neighbor jumps as standing
assumptions, which of course imply Hypotheses (M) and (R).
QUENCHED FUNCTIONAL CLT FOR RWRE 3
These assumptions are more than strong enough to imply a law of large numbers:
there exists a velocity v 6= 0 such that
n−1Xn = v
= 1. (1.3)
Representations for v are given in (2.5) and Lemma 5.1. Define the (approximately)
centered and diffusively scaled process
Bn(t) =
X[nt] − [nt]v√
. (1.4)
As usual [x] = max{n ∈ Z : n ≤ x} is the integer part of a real x. Let DRd[0,∞) be
the standard Skorohod space of Rd-valued cadlag paths (see [6] for the basics). Let
Qωn = P
0 (Bn ∈ · ) denote the quenched distribution of the process Bn on DRd[0,∞).
The results of this paper concern the limit of the process Bn as n → ∞. As
expected, the limit process is a Brownian motion with correlated coordinates. For a
symmetric, non-negative definite d × d matrix D, a Brownian motion with diffusion
matrix D is the Rd-valued process {B(t) : t ≥ 0} with continuous paths, independent
increments, and such that for s < t the d-vector B(t)−B(s) has Gaussian distribution
with mean zero and covariance matrix (t − s)D. The matrix D is degenerate in
direction u ∈ Rd if utDu = 0. Equivalently, u · B(t) = 0 almost surely.
Here is the main result.
Theorem 1.1. Let d ≥ 2 and consider a random walk in an i.i.d. product random
environment that satisfies non-nestling (N), the exponential moment hypothesis (M),
and the regularity in (R). Then for P-almost every ω distributions Qωn converge weakly
on DRd [0,∞) to the distribution of a Brownian motion with a diffusion matrix D
that is independent of ω. utDu = 0 iff u is orthogonal to the span of {x − y :
E(π0x)E(π0y) > 0}.
Eqn (2.6) gives the expression for the diffusion matrix D, familiar for example from
[14]. Before turning to the proofs we discuss briefly the current situation in this area
of probability and the place of this work in this context.
Several different approaches can be identified in recent work on quenched central
limit theorems for multidimensional RWRE. (i) Small perturbations of classical ran-
dom walk have been studied by many authors. The most significant results include the
early work of Bricmont and Kupiainen [4] and more recently Sznitman and Zeitouni
[16] for small perturbations of Brownian motion in dimension d ≥ 3. (ii) An aver-
aged CLT can be turned into a quenched CLT by bounding certain variances through
the control of intersections of two independent paths. This idea was introduced by
Bolthausen and Sznitman in [2] and more recently applied by Berger and Zeitouni
in [1]. Both utilize high dimension to handle the intersections. (iii) Our approach
is based on the subdiffusivity of the quenched mean of the walk. That is, we show
that the variance of Eω0 (Xn) is of order n
2α for some α < 1/2. We also achieve this
through intersection bounds. Instead of high dimension we assume strong enough
4 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
drift. We introduced this line of reasoning in [9] and later applied it to the case of
walks with a forbidden direction in [11]. The significant advance taken in the present
paper over [9] and [11] is the elimination of restrictions on the admissible steps of
the walk. Theorem 2.1 below summarizes the general principle for application in this
paper.
As the reader will see, the arguments in this paper are based on quenched expo-
nential bounds that flow from Hypotheses (N), (M) and (R). It is common in this
field to look for an invariant measure P∞ for the environment process that is mutu-
ally absolutely continuous with the original P, at least on the part of the space Ω
to which the drift points. In this paper we do things a little differently: instead of
the absolute continuity, we use bounds on the variation distance between P∞ and P.
This distance will decay exponentially in the direction û.
In the case of nearest-neighbor, uniformly elliptic non-nestling walks in dimension
d ≥ 4 the quenched CLT has been proved earlier: first by Bolthausen and Sznitman
[2] under a small noise assumption, and recently by Berger and Zeitouni [1] without
the small noise assumption. Berger and Zeitouni [1] go beyond non-nestling to more
general ballistic walks. The method in these two papers utilizes high dimension
crucially. Whether their argument can work in d = 3 is not presently clear. The
approach of the present paper should work for more general ballistic walks in all
dimensions d ≥ 2, as the main technical step that reduces the variance estimate to
an intersection estimate is generalized (Section 6 in the present paper).
We turn to the proofs. The next section collects some preliminary material and
finishes with an outline of the rest of the paper.
2. Preliminaries for the proof.
As mentioned, we can assume that û ∈ Zd. This is convenient because then the
lattice Zd decomposes into levels identified by the integer value x · û.
Let us summarize notation for the reader’s convenience. Constants whose exact
values are not important and can change from line to line are often denoted by C
and s. The set of nonnegative integers is N = {0, 1, 2, . . . }. Vectors and sequences
are abbreviated xm,n = (xm, xm+1, . . . , xn) and xm,∞ = (xm, xm+1, xm+2, . . . ). Similar
notation is used for finite and infinite random paths: Xm,n = (Xm, Xm+1, . . . , Xn)
and Xm,∞ = (Xm, Xm+1, Xm+2, . . . ). X[0,n] = {Xk : 0 ≤ k ≤ n} denotes the set of
sites visited by the walk. Dt is the transpose of a vector or matrix D. An element
of Rd is regarded as a d× 1 column vector. The left shift on the path space (Zd)N is
(θkx0,∞)n = xn+k.
E, E0, and E
0 denote expectations under, respectively, P, P0, and P
0 . P∞ will
denote an invariant measure on Ω, with expectation E∞. We abbreviate P
0 (·) =
0 (·) and E∞0 (·) = E∞Eω0 (·) to indicate that the environment of a quenched
expectation is averaged under P∞. A family of σ-algebras on Ω that in a sense look
towards the future is defined by Sℓ = σ{ωx : x · û ≥ ℓ}.
QUENCHED FUNCTIONAL CLT FOR RWRE 5
Define the drift
D(ω) = Eω0 (X1) =
zπ0z(ω).
The environment process is the Markov chain on Ω with transition kernel
Π(ω,A) = P ω0 (TX1ω ∈ A).
The proof of the quenched CLT Theorem 1.1 utilizes crucially the environment
process and its invariant distribution. A preliminary part of the proof is summarized
in the next theorem quoted from [9]. This Theorem 2.1 was proved by applying
the arguments of Maxwell and Woodroofe [8] and Derriennic and Lin [5] to the
environment process.
Theorem 2.1. [9] Let d ≥ 1. Suppose the probability measure P∞ on (Ω,S) is
invariant and ergodic for the Markov transition Π. Assume that
z |z|2E∞(π0z) <∞
and that there exists an α < 1/2 such that as n→ ∞
|Eω0 (Xn)− nE∞(D)|
= Ŏ (n2α). (2.1)
Then as n → ∞ the following weak limit happens for P∞-a.e. ω: distributions Qωn
converge weakly on the space DRd[0,∞) to the distribution of a Brownian motion with
a symmetric, non-negative definite diffusion matrix D that is independent of ω.
Another central tool for the development that follows is provided by the Sznitman-
Zerner regeneration times [17] that we now define. For ℓ ≥ 0 let λℓ be the first time
the walk reaches level ℓ relative to the initial level:
λℓ = min{n ≥ 0 : Xn · û−X0 · û ≥ ℓ}.
Define β to be the first backtracking time:
β = inf{n ≥ 0 : Xn · û < X0 · û}.
Let Mn be the maximum level, relative to the starting level, reached by time n:
Mn = max{Xk · û−X0 · û : 0 ≤ k ≤ n}.
For a > 0, and when β <∞, consider the first time by which the walker reaches level
Mβ + a:
λMβ+a = inf{n ≥ β : Xn · û−X0 · û ≥Mβ + a}.
Let S0 = λa and, as long as β ◦ θSk−1 < ∞, define Sk = Sk−1 + λMβ+a ◦ θSk−1 for
k ≥ 1. Finally, let the first regeneration time be
Sℓ1I{β ◦ θSk <∞ for 0 ≤ k < ℓ and β ◦ θSℓ = ∞}. (2.2)
Non-nestling guarantees that τ
1 is finite, and in fact gives moment bounds uniformly
in ω as we see in Lemma 3.1 below. Consequently we can iterate to define τ
0 = 0,
and for k ≥ 1
k = τ
k−1 + τ
1 ◦ θτ
k−1 . (2.3)
6 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
When the value of a is not important we simplify the notation to τk = τ
k . Sznit-
man and Zerner [17] proved that the regeneration slabs
τk+1 − τk, (Xτk+n −Xτk)0≤n≤τk+1−τk ,
{ωXτk+z : 0 ≤ z · û < (Xτk+1 −Xτk) · û}
) (2.4)
are i.i.d. for k ≥ 1, each distributed as
τ1, (Xn)0≤n≤τ1 , {ωz : 0 ≤ z·û < Xτ1 ·û}
under
P0( · | β = ∞). Strictly speaking, uniform ellipticity and nearest-neighbor jumps were
standing assumptions in [17], but these assumptions are not needed for the proof of
the i.i.d. structure.
From the renewal structure and moment estimates a law of large numbers (1.3) and
an averaged functional central limit theorem follow, along the lines of Theorem 2.3
in [17] and Theorem 4.1 in [14]. These references treat walks that satisfy Kalikow’s
condition, considerably more general than the non-nestling walks we study. The
limiting velocity for the law of large numbers is
E0(Xτ1 |β = ∞)
E0(τ1|β = ∞)
. (2.5)
The averaged CLT states that the distributions P0{Bn ∈ · } converge to the distri-
bution of a Brownian motion with diffusion matrix
(Xτ1 − τ1v)(Xτ1 − τ1v)t|β = ∞
E0[τ1|β = ∞]
. (2.6)
Once we know that the P-a.s. quenched CLT holds with a constant diffusion matrix,
this diffusion matrix must be the same D as for the averaged CLT. We give here the
argument for the degeneracy statement of Theorem 1.1.
Lemma 2.1. Define D by (2.6) and let u ∈ Rd. Then utDu = 0 iff u is orthogonal
to the span of {x− y : E(π0x)E(π0y) > 0}.
Proof. The argument is a minor embellishment of that given for a similar degeneracy
statement on p. 123–124 of [10] for the forbidden-direction case where π0,z is supported
by z · û ≥ 0. We spell out enough of the argument to show how to adapt that proof
to the present case.
Again, the intermediate step is to show that utDu = 0 iff u is orthogonal to the
span of {x− v : E(π0x) > 0}. The argument from orthogonality to utDu = 0 goes as
in [10, p. 124].
Suppose utDu = 0 which is the same as
P0(Xτ1 · u = τ1v · u | β = ∞) = 1.
Suppose z is such that Eπ0,z > 0 and z · û < 0. By non-nestling there must exist w
such that Eπ0,zπ0,w > 0 and w · û > 0. Pick m > 0 so that (z + mw) · û > 0 but
QUENCHED FUNCTIONAL CLT FOR RWRE 7
(z + (m− 1)w) · û ≤ 0. Take a = 1 in the definition (2.2) of regeneration. Then
P0[Xτ1 = z + 2mw, τ1 = 2m+ 1 | β = ∞]
[ ( m−1∏
πiw,(i+1)w
πmw,z+mw
( m−1∏
πz+(m+j)w,z+(m+j+1)w
P ωz+2mw(β = ∞)
Consequently
(z + 2mw) · u = (1 + 2m)v · u. (2.7)
In this manner, by replacing σ1 with τ1 and by adding in the no-backtracking
probabilities, the arguments in [10, p. 123] can be repeated to show that if Eπ0x > 0
then x · u = v · u for x such that x · û ≥ 0. In particular the very first step on p. 123
of [10] gives w · u = v · u. This combines with (2.7) above to give z · u = v · u. Now
simply follow the proof in [10, p. 123–124] to its conclusion. �
Here is an outline of the proof of Theorem 1.1. It all goes via Theorem 2.1.
(i) After some basic estimates in Section 3, we prove in Section 4 the existence of the
ergodic equilibrium P∞ required for Theorem 2.1. P∞ is not convenient to work with
so we still need to do computations with P. For this purpose Section 4 proves that in
the direction û the measures P∞ and P come exponentially close in variation distance
and that the environment process satisfies a P0-a.s. ergodic theorem. In Section 5
we show that P∞ and P are interchangeable both in the hypotheses that need to be
checked and in the conclusions obtained. In particular, the P∞-a.s. quenched CLT
coming from Theorem 2.1 holds also P-a.s. Then we know that the diffusion matrix
D is the one in (2.6).
The bulk of the work goes towards verifying condition (2.1), but under P instead
of P∞. There are two main stages to this argument.
(ii) By a decomposition into martingale increments the proof of (2.1) reduces to
bounding the number of common points of two independent walks in a common
environment (Section 6).
(iii) The intersections are controlled by introducing levels at which both walks
regenerate. These common regeneration levels are reached fast enough and the pro-
gression from one common regeneration level to the next is a Markov chain. When
this Markov chain drifts away from the origin it can be approximated well enough by
a symmetric random walk. This approximation enables us to control the growth of
the Green function of the Markov chain, and thereby the number of common points.
This is in Section 7 and in an Appendix devoted to the Green function bound.
3. Basic estimates for non-nestling RWRE
This section contains estimates that follow from Hypotheses (N) and (M), all col-
lected in the following lemma. These will be used repeatedly. In addition to the
8 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
stopping times already defined, let
Hz = min{n ≥ 1 : Xn = z}
be the first hitting time of site z.
Lemma 3.1. If P satisfies Hypotheses (N) and (M), then there exist positive constants
η, γ, κ, (Cp)p≥1, and s1 ≤ s0, possibly depending on M , s0, and δ, such that for all
x ∈ Zd, n ≥ 0, s ∈ [0, s1], p ≥ 1, ℓ ≥ 1, for z such that z · û ≥ 0, a ≥ 1, and for
P-a.e. ω,
Eωx (e
−sXn·û) ≤ e−sx·û(1− sδ/2)n, (3.1)
Eωx (e
s|Xn−x|) ≤ esMn, (3.2)
P ωx (X1 · û ≥ x · û+ γ) ≥ κ, (3.3)
Eωx (λ
ℓ) ≤ Cpℓp, (3.4)
Eωx (|Xλℓ − x|p) ≤ Cpℓp, (3.5)
Eω0 [(MHz − z · û)p1I{Hz < n}] ≤ CpℓpP ω0 (Hz < n) + Cps−pe−sℓ/2, (3.6)
P ωx (β = ∞) ≥ η, (3.7)
Eωx (|τ
1 |p) ≤ Cp ap, (3.8)
Eωx (|Xτ (a)1 +n −Xn|
p) ≤ Cq aq, for all q > p. (3.9)
The particular point in (3.8)–(3.9) is to make the dependence on a explicit. Note
that (3.7)–(3.8) give
E0(τj − τj−1)p <∞ (3.10)
for all j ≥ 1. In Section 4 we construct an ergodic invariant measure P∞ for the
environment chain in a way that preserves the conclusions of this lemma under P∞.
Proof. Replacing x by 0 and ω by Txω allows us to assume that x = 0. Then for all
s ∈ [0, s0/2]
∣∣Eω0 (e−sX1·û)− 1 + sEω0 (X1 · û)
∣∣ ≤ |û|2Eω0 (|X1|2es0|X1|/2)
≤ (2|û|/s0)2es0Ms2 = cs2,
where we used moment assumption (M). Then by the non-nestling assumption (N)
Eω0 (e
−sXn·û|Xn−1) = e−sXn−1·ûEωXn−1(e
−s(X1−X0)·û) ≤ e−sXn−1·û(1− sδ + cs2).
Taking now the quenched expectation of both sides and iterating the procedure proves
(3.1), provided s1 is small enough. To prove (3.2) one can instead show that
Eω0 (e
k=1 |Xk−Xk−1|) ≤ esnM .
This can be proved by induction as for (3.1), using only Hypothesis (M) and Hölder’s
inequality (to switch to s0).
QUENCHED FUNCTIONAL CLT FOR RWRE 9
Concerning (3.3), we have
P ω0 (X1 · û ≥ γ) ≥ (1− eγs(1− sδ/2))−→
sδ/2.
So taking γ small enough and κ slightly smaller than sδ/2 does the job.
Notice next that P ω0 (λ1 <∞) = 1 due to (3.1). P-a.s. Then
Eω0 (λ
(n+ 1)pP ω0 (λ1 > n) ≤
(n+ 1)pP ω0 (Xn · û ≤ 1)
(n + 1)pEω0 (e
−sXn·û).
The last expression is bounded if s is small enough. Therefore,
Eω0 (λ
ℓ) ≤ Eω0
[ ∣∣∣
[ℓ]+1∑
(λi − λi−1)
≤ ([ℓ] + 1)p−1
[ℓ]+1∑
EωXλi−1
≤ Cpℓp.
Bound (3.5) is proved similarly: by the Cauchy-Schwarz inequality, Hypothesis (M)
and (3.1),
Eω0 (|Xλ1 |p) ≤
Eω0 (|Xn|2p)1/2P ω0 (Xn−1 · û < 1)1/2
[2p]! s
−[2p]
s0Mes
)1/2 ∑
(1− sδ/2)(n−1)/2np ≤ Cp.
To prove (3.6), write
Eω0 [(MHz − z · û)p1I{Hz < n}]
ℓp−1P ω0 (MHz − z · û ≥ ℓ,Hz < n) + Cpℓ
0 (Hz < n)
ℓp−1Eω0 [P
Xλz·û+ℓ
(Xk · û−X0 · û ≤ −ℓ)] + Cpℓp0P ω0 (Hz < n)
ℓp−1e−sℓ + Cpℓ
0 (Hz < n) ≤ Cps−pe−sℓ0/2 + Cpℓ
0 (Hz < n).
To prove (3.7), note that Chebyshev inequality and (3.1) give, for s > 0 small
enough, ℓ ≥ 1, and P-a.e. ω
P ω0 (λ−ℓ+1 <∞) ≤
P ω0 (Xn · û ≤ −(ℓ− 1)) ≤ 2(sδ)−1e−s(ℓ−1).
On the other hand, for an integer ℓ ≥ 2 we have
P ω0 (λℓ < β) ≥
P ω0 (λℓ−1 < β,Xλℓ−1 = x)P
x (λ−ℓ+1 = ∞).
10 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
Therefore, taking ℓ to infinity one has, for ℓ0 large enough,
P ω0 (β = ∞) ≥ P ω0 (λℓ0 < β)
(1− 2(sδ)−1e−sℓ).
Markov property and (3.3) give P ω0 (λℓ0 < β) ≥ κℓ0/γ+1 > 0 and (3.7) is proved.
Now we will bound the quenched expectation of λ
1I{β < ∞} uniformly in ω.
To this end, for p1 > p and q1 = p1(p1 − p)−1, we have by (3.7)
Eω0 (λ
1I{β <∞}) ≤
Eω0 (λ
1I{β = n})
Eω0 (λ
)p/p1(
P ω0 (β = n)
)1/q1
By (3.4) one has, for p2 > p1 > p and q2 = p2(p2 − p1)−1,
Eω0 (λ
Eω0 (λ
m+1+a)
)p1/p2(
P ω0 ([Mn] = m)
)1/q2
(m+ 1 + a)p1
P ω0 (Xi · û ≥ m)
)1/q2
where Cp really depends on p1 and p2, but these are chosen arbitrarily, as long as
they satisfy p2 > p1 > p. Using (3.2) one has
P ω0 (Xi · û ≥ m) ≤
1 if m < 2M |û|i,
e−smeM |û|si if m ≥ 2M |û|i.
Hence,
Eω0 (λ
) ≤ Cp
(m+ 1 + a)p1(n1I{m < 2Mn|û|}+ e−sm/2)1/q2
≤ Cpn(n+ a)p1n1/q2 + Cp
(m+ 1)p1e−sm/2q2 + Cpa
e−sm/2q2
≤ Cpn1+1/q2(n+ a)p1 .
Since {β = n} ⊂ {Xn · û ≤ 0}, one can use (3.1) to conclude that
Eω0 (λ
1I{β <∞}) ≤ Cp
np/p1+p/(p1q2)(n+ a)p(1− sδ/2)n/q1 ≤ Cpap.
QUENCHED FUNCTIONAL CLT FOR RWRE 11
In the last inequality we have used the fact that a ≥ 1. Using, (3.7), the definition
of the times Sk, and the Markov property, one has
Eω0 [S
ℓ 1I{β ◦ θSk <∞ for 0 ≤ k < ℓ and β ◦ θSℓ = ∞}]
≤ (ℓ+ 1)p−1
Eω0 [λ
a1I{β ◦ θSk <∞ for 0 ≤ k < ℓ}]
Eω0 [λ
◦ θSj1I{β ◦ θSk <∞ for 0 ≤ k < ℓ}]
≤ (ℓ+ 1)p−1
p(1− η)ℓ +
(1− η)jCpap(1− η)ℓ−j−1
≤ Cp(ℓ + 1)p(1− η)ℓ−1ap.
Bound (3.8) follows then from (2.2). To prove (3.9) let q > p and write
Eω0 (|Xτ (a)1 +n −Xn|
Eω0 (|Xk+1+n −Xk+n|p|τ
1 |p−11I{k < τ
k−1−q+pEω0 (|Xk+1+n −Xk+n|p|τ
1 |q)
k−1−q+pEω0 (|τ
1 |2q)1/2 ≤ Cq aq,
where we have used Hypothesis (M) along with the Cauchy-Schwarz inequality in
the second to last inequality and (3.8) in the last. This completes the proof of the
lemma. �
4. Invariant measure and ergodicity
For ℓ ∈ Z define the σ-algebras Sℓ = σ{ωx : x · û ≥ ℓ} on Ω. Denote the restriction
of the measure P to the σ-algebra Sℓ by P|Sℓ . In this section we prove the next
two theorems. The variation distance of two probability measures is dVar(µ, ν) =
sup{µ(A)− ν(A)} with the supremum taken over measurable sets A.
Theorem 4.1. Assume P is product non-nestling (N) and satisfies the moment hy-
pothesis (M). Then there exists a probability measure P∞ on Ω with these properties.
(a) P∞ is invariant and ergodic for the Markov transition kernel Π.
(b) There exist constants 0 < c, C <∞ such that for all ℓ ≥ 0
dVar(P∞|Sℓ ,P|Sℓ) ≤ Ce
−cℓ. (4.1)
(c) Hypotheses (N) and (M) and the conclusions of Lemma 3.1 hold P∞-almost
surely.
Along the way we also establish this ergodic theorem under the original environ-
ment measure. E∞ denotes expectation under P∞.
12 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
Theorem 4.2. Assumptions as in Theorem 4.1 above. Let Ψ be a bounded S−a-
measurable function on Ω, for some 0 < a <∞. Then
Ψ(TXjω) = E∞Ψ P0-almost surely. (4.2)
The ergodic theorem tells us that there is a unique invariant P∞ in a natural
relationship to P, and that P∞ ≪ P on each σ-algebra S−a. Limit (4.2) cannot
hold for all bounded measurable Ψ on Ω because this would imply the absolute
continuity P∞ ≪ P on the entire space Ω. A counterexample that satisfies (N)
and (M) but where the quenched walk is degenerate was given by Bolthausen and
Sznitman [2, Proposition 1.5]. Whether regularity assumption (R) or ellipticity will
make a difference here is not presently clear. For the simpler case of space-time walks
(see description of model in [9]) with nondegenerate P ω0 absolute continuity P∞ ≪ P
does hold on the entire space. Theorem 3.1 in [2] proves this for nearest-neighbor
jumps with some weak ellipticity. The general case is no harder.
Proof of Theorems 4.1 and 4.2. Let Pn(A) = P0(TXnω ∈ A). A computation shows
fn(ω) =
(ω) =
P ωx (Xn = 0).
By hypotheses (M) and (N) we can replace the state space Ω = PZd with the
smaller space Ω0 = PZ
0 where
P0 = {(pz) ∈ P :
es0|z|pz ≤ es0M and
z · û pz ≥ δ }. (4.3)
Fatou’s lemma shows that the exponential bound is preserved by pointwise conver-
gence in P0. Then the exponential bound shows that the non-nestling property is also
preserved. Thus P0 is compact, and then Ω0 is compact under the product topology.
Compactness gives a subsequence {nj} along which the averages nj−1
m=1 Pm
converge weakly to a probability measure P∞ on Ω0. Hypotheses (N) and (M) transfer
to P∞ by virtue of having been included in the state space Ω0. Thus the proof of
Lemma 3.1 can be repeated for P∞-a.e. ω. We have verified part (c) of Theorem 4.1.
Next we check that P∞ is invariant under Π. Take a bounded, continuous local
function F on Ω0 that depends only on environments (ωx : |x| ≤ K). For ω, ω̄ ∈ Ω0
∣∣ΠF (ω)− ΠF (ω̄)
∣∣Eω0 [F (TX1ω)]−Eω̄0 [F (TX1ω̄)]
|z|≤C
∣∣∣π0,z(ω)F (Tzω)− π0,z(ω̄)F (Tzω̄)
∣∣∣+ ‖F‖∞
|z|>C
π0,z(ω) + π0,z(ω̄)
From this we see that ΠF is continuous. For let ω̄ → ω in Ω0 so that ω̄x,z → ωx,z at
each coordinate. Since the last term above is controlled by the uniform exponential
QUENCHED FUNCTIONAL CLT FOR RWRE 13
tail bound imposed on P0, continuity of ΠF follows. Consequently the weak limit
m=1 Pm → P∞ together with Pn+1 = PnΠ implies the Π-invariance of P∞.
We show the exponential bound (4.1) on the variation distance next because the
ergodicity proof depends on it. On metric spaces total variation distance can be
characterized in terms of continuous functions:
dVar(µ, ν) =
fdν : f continuous, sup |f | ≤ 1
This makes dVar(µ, ν) lower semicontinuous which we shall find convenient below.
Fix ℓ > 0. Then
dPn|Sℓ
dP|Sℓ
P ωx (Xn = 0,max
Xj · û ≤ ℓ/2)
E[P ωx (Xn = 0,max
Xj · û > ℓ/2)|Sℓ].
(4.4)
The L1(P)-norm of the second term is bounded by
In,ℓ = P0(max
Xj · û > Xn · û+ ℓ/2)
and (3.1) tells us that
In,ℓ ≤
e−sℓ/2(1− sδ/2)n−j ≤ Ce−sℓ/2. (4.5)
The integrand in the first term of (4.4) is measurable with respect to σ(ωx : x·û ≤ ℓ/2)
and therefore independent of Sℓ. The distance between the whole first term and 1 is
then Ŏ (In,ℓ). Thus for large enough ℓ,
dVar(Pn|Sℓ ,P|Sℓ) ≤
∫ ∣∣∣
dPn|Sℓ
dP|Sℓ
∣∣∣dP ≤ 2In,ℓ ≤ Ce−cℓ.
By the construction of P∞ as the Cesàro limit and by the lower semicontinuity and
convexity of the variation distance
dVar(P∞|Sℓ ,P|Sℓ) ≤ lim
dVar(Pm|Sℓ ,P|Sℓ) ≤ Ce
Part (b) has been verified.
As the last point we prove the ergodicity. Recall the notation E∞0 = E∞E
0 . Let
Ψ be a bounded local function on Ω. It suffices to prove that for some constant b
∣∣∣n−1
Ψ(TXjω)− b
∣∣∣ = 0. (4.6)
14 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
By an approximation it follows from this that for all F ∈ L1(P∞)
ΠjF (ω) → E∞F in L1(P∞). (4.7)
By standard theory (Section IV.2 in [12]) this is equivalent to ergodicity of P∞ for
the transition Π.
We combine the proof of Theorem 4.2 with the proof of (4.6). For this purpose let
Ψ be S−a+1-measurable with a <∞. Take a to be the parameter in the regeneration
times (2.2). Let
τi+1−1∑
Ψ(TXjω).
From the i.i.d. regeneration slabs and the moment bound (3.10) follows the limit
τm−1∑
Ψ(TXjω) = lim
ϕi = b0 P0-almost surely, (4.8)
where the constant b0 is defined by the limit.
To justify this more precisely, recall the definition of regeneration slabs given in
(2.4). Define a function Φ of the regeneration slabs by
Φ(S0,S1,S2, . . . ) =
τ2−1∑
Ψ(TXjω).
Since each regeneration slab has thickness in û-direction at least a, the Ψ-terms in
the sum do not read the environments below level zero and consequently the sum is
a function of (S0,S1,S2, . . . ). Next one can check for k ≥ 1 that
Φ(Sk−1,Sk,Sk+1, . . . ) =
τ2(Xτk−1+ · −Xτk−1 )−1∑
j=τ1(Xτk−1+ · −Xτk−1)
TXτk−1+j−Xτk−1 (TXτk−1ω)
= ϕk.
Now the sum of ϕ-terms in (4.8) can be decomposed into
ϕ0 + ϕ1 +
Φ(Sk,Sk+1,Sk+2, . . . ).
The limit (4.8) follows because the slabs (Sk)k≥1 are i.i.d. and the finite initial terms
ϕ0 + ϕ1 are eliminated by the m
−1 factor.
Let αn = inf{k : τk ≥ n}. Bounds (3.7)–(3.8) give finite moments of all orders to
the increments τk−τk−1 and this implies that n−1(ταn−1−ταn) → 0 P0-almost surely.
QUENCHED FUNCTIONAL CLT FOR RWRE 15
Consequently (4.8) yields the next limit, for another constant b:
Ψ(TXjω) = b P0-almost surely. (4.9)
By boundedness this limit is valid also in L1(P0) and the initial point of the walk is
immaterial by shift-invariance of P. Let ℓ > 0 and choose a small ε0 > 0. Abbreviate
Gn,x(ω) = E
[ ∣∣∣n−1
Ψ(TXjω)− b
∣∣∣1I
Xj · û ≥ X0 · û− ε0ℓ/2
I = {x ∈ Zd : x · û ≥ ε0ℓ, |x| ≤ Aℓ}
for some constant A. Use the bound (4.1) on the variation distance and the fact that
the functions Gn,x(ω) are uniformly bounded over all x, n, ω, and, if ℓ is large enough
relative to a and ε0, for x ∈ I the function Gn,x is Sε0ℓ/3-measurable.
P ω0 [Xℓ = x]Gn,x(ω) ≥ ε1
P∞{Gn,x(ω) ≥ ε1/(Cℓd)}
≤ Cℓdε−11
E∞Gn,x ≤ Cℓdε−11
EGn,x + Cℓ
2dε−11 e
−cε0ℓ/3.
By (4.9) EGn,x → 0 for any fixed x. Thus from above we get for any fixed ℓ,
1I{Xℓ ∈ I}Gn,Xℓ
≤ ε1 + Cℓ2dε−11 e−cε0ℓ/3. (4.10)
The reader should bear in mind that the constant C is changing from line to line.
Finally, we write
∣∣∣n−1
Ψ(TXjω)− b
≤ lim
1I{Xℓ ∈ I}
∣∣∣n−1
n+ℓ−1∑
Ψ(TXjω)− b
∣∣∣1I
Xj · û ≥ Xℓ · û− ε0ℓ/2
+ CP∞0 {Xℓ /∈ I} + CP∞0
Xj · û < Xℓ · û− ε0ℓ/2
≤ lim
1I{Xℓ ∈ I}Gn,Xℓ
+ CP∞0 {Xℓ · û < ε0ℓ}
+ CP∞0 { |Xℓ| > Aℓ} + CE∞0 P ωXℓ
Xj · û < X0 · û− ε0ℓ/2
As pointed out, P∞ satisfies Lemma 3.1 because hypotheses (N) and (M) were built
into the space Ω0 that supports P∞. This enables us to make the error probabilities
above small. Consequently, if we first pick ε0 and ε1 small enough, A large enough,
then ℓ large, and apply (4.10), we will have shown (4.6). Ergodicity of P∞ has been
shown. This concludes the proof of Theorem 4.1.
16 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
Thereom 4.2 has also been established. It follows from the combination of (4.6)
and (4.9). �
5. Change of measure
There are several stages in the proof where we need to check that a desired con-
clusion is not affected by choice between P and P∞. We collect all instances of such
transfers in this section. The standing assumptions of this section are that P is an
i.i.d. product measure that satisfies Hypotheses (N) and (M), and that P∞ is the
measure given by Theorem 4.1. We show first that P∞ can be replaced with P in the
key condition (2.1) of Theorem 2.1.
Lemma 5.1. The velocity v defined by (2.5) satisfies v = E∞(D). There exists a
constant C such that
|E0(Xn)− nE∞(D)| ≤ C for all n ≥ 1. (5.1)
Proof. We start by showing v = E∞(D). The uniform exponential tail in the defi-
nition (4.3) of P0 makes the function D(ω) bounded and continuous on Ω0. By the
Cesàro definition of P∞,
E∞(D) = lim
nj−1∑
Ek(D) = lim
nj−1∑
E0[D(TXkω)].
The moment bounds (3.7)–(3.9) imply that the law of large numbers n−1Xn → v
holds also in L1(P0). From this and the Markov property
v = lim
E0(Xk+1 −Xk) = lim
E0[D(TXkω)].
We have proved v = E∞(D).
The variables (Xτj+1 −Xτj , τj+1−τj)j≥1 are i.i.d. with sufficient moments by (3.7)–
(3.9). With αn = inf{j ≥ 1 : τj − τ1 ≥ n} Wald’s identity gives
E0(Xταn−Xτ1) = E0(αn)E0(Xτ1 |β = ∞) and E0(ταn−τ1) = E0(αn)E0(τ1|β = ∞).
Consequently, by the definition (2.5) of v,
E0(Xn)− nv = vE0(ταn − τ1 − n)− E0(Xταn −Xτ1 −Xn).
It remains to show that E0(ταn − τ1−n) and E0(Xταn −Xτ1 −Xn) are bounded by
constants. We do this with a simple renewal argument. Let Yj = τj+1 − τj for j ≥ 1
and V0 = 0, Vm = Y1 + · · · + Ym. The quantity to bound is the forward recurrence
time Bn = min{k ≥ 0 : n+ k ∈ {Vm}} because ταn − τ1 − n = Bn.
We can write
Bn = (Y1 − n)+ +
1I{Y1 = k}Bn−k ◦ θ
QUENCHED FUNCTIONAL CLT FOR RWRE 17
where θ shifts the sequence {Yk} and makes Bn−k ◦ θ independent of Y1. The two
main terms on the right multiply to zero, so for any integer p ≥ 1
Bpn = ((Y1 − n)+)p +
1I{Y1 = k}(Bn−k ◦ θ)p.
Set z(n) = E0((Y1 − n)+)p. Moment bounds (3.7)–(3.8) give E0(Y p+11 ) < ∞ which
implies
z(n) <∞. Taking expectations and using independence gives the discrete
renewal equation
n = z(n) +
P0(Y1 = k)E0B
Induction on n shows that E0B
k=1 z(k) ≤ C(p) for all n. In particular,
E0(ταn − τ1 − n)p is bounded by a constant uniformly over n. To extend this to
E0|Xταn − Xτ1 − Xn|p apply an argument like the one given for (3.9) at the end of
Section 3. �
Proposition 5.2. Assume that there exists an ᾱ < 1/2 such that
|Eω0 (Xn)− E0(Xn)|
= Ŏ (n2ᾱ). (5.2)
Then condition (2.1) is satisfied for some α < 1/2.
Proof. By (5.1) assumption (5.2) turns into
|Eω0 (Xn)− nv|
= Ŏ (n2ᾱ). (5.3)
In the rest of this proof we use the conclusions of Lemma 3.1 under P∞ instead of P.
This is justified by part (c) of Theorem 4.1.
For k ≥ 1, recall that λk = inf{n ≥ 0 : (Xn − X0) · û ≥ k}. Take k = [nρ] for
a small enough ρ > 0. The point of the proof is to let the walk run up to a high
level k so that expectations under P∞ can be profitably related to expectations under
P through the variation distance bound (4.1). Estimation is needed to remove the
dependence on the environment on low levels. First compute as follows.
|Eω0 (Xn − nv)|
|Eω0 (Xn − nv, λk ≤ n) + Eω0 (Xn − nv, λk > n)|
≤ 2E∞
[ ∣∣Eω0 (Xn −Xλk − (n− λk)v, λk ≤ n)− Eω0 (λkv, λk ≤ n) + Eω0 (Xλk , λk ≤ n)2
n2E∞[P
0 (λk > n)]
≤ 8E∞
[ ∣∣∣
0≤m≤n
x·û≥k
P ω0 (Xm = x, λk = m)E
x {Xn−m − x− (n−m)v}
+ Ŏ (k2 + n2esk(1− sδ/2)n).
(5.4)
The last error term above is Ŏ (n2ρ). We used the Cauchy-Schwarz inequality and
Hypothesis (M) to get the second term in the first inequality, and then (3.1), (3.4),
and (3.5) in the last inequality.
18 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
To handle the expectation on line (5.4) we introduce a spanning set of vectors
that satisfy the main assumptions that û does. Namely, let {ûi}di=1 span Rd and
satisfy these conditions: |û − ûi| ≤ δ/(2M), where δ and M are the constants from
Hypotheses (N) and (M), and
αiûi with αi > 0. (5.5)
Then non-nestling (N) holds for each ûi with constant δ/2, and all the conclusions
of Lemma 3.1 hold when û is replaced by ûi and δ by δ/2. Define the event Ak =
{infiXi · û ≥ k} and the set
Λ = {x ∈ Zd : min
x · ûi ≥ 1}.
The point of introducing Λ is that the number of points x in Λ on level x · û = ℓ > 0
is of order Ŏ (ℓd−1).
By Jensen’s inequality the expectation on line (5.4) is bounded by
x∈Λ, x·û≥k
0≤m≤n
P ω0 (Xm = x, λk = m)
∣∣Eωx {Xn−m − x− (n−m)v, Ak/2}
(5.6)
0≤m≤n
x 6∈Λ
P ω0 (Xm = x, λk = m)
∣∣Eωx {Xn−m − x− (n−m)v}
+ 2E∞
0≤m≤n
x·û≥k
P ω0 (Xm = x, λk = m)
∣∣Eωx {Xn−m − x− (n−m)v, Ack/2}
By Cauchy-Schwarz, Hypothesis (M) and (3.1), the third term is Ŏ (n2e−sk/2) =
Ŏ (1). The second term is of order
n2max
0 (Xλk · ûi < 1)] ≤ n2max
P ω0 (Xm · ûi < 1)P ω0 (Xm · û ≥ k)
)1/2 ]
≤ es/2n2
(1− sδ/4)m/2e−µk/2eµM |û|m/2
= Ŏ (n2e−µk/2) = Ŏ (1),
for µ small enough. It remains to bound the term on line (5.6). To this end, by
Cauchy-Schwarz, (3.2) and (3.1),
P ω0 (Xm = x, λk = m) ≤ {e−sx·û/2esM |û|m/2∧1}×{eµk/2(1−µδ/2)(m−1)/2∧1} ≡ px,m,k.
Notice that ∑
{e−sx·û/2esM |û|m/2 ∧ 1} = Ŏ (md)
QUENCHED FUNCTIONAL CLT FOR RWRE 19
and ∑
md{eµk/2(1− µδ/2)(m−1)/2 ∧ 1} = Ŏ (kd+1).
Substitute these back into line (5.6) to eliminate the quenched probability coefficients.
The quenched expectation in (5.6) is Sk/2-measurable. Consequently variation dis-
tance bound (4.1) allows us to switch back to P and get this upper bound for line
(5.6):
x∈Λ, x·û≥k
0≤m≤n
px,m,kE[|Eωx {Xn−m − x− (n−m)v, Ak/2}|2] + Ŏ (kd+1n2e−ck/2).
The error term is again Ŏ (1).
Now insert Ac
back inside the quenched expectation, incurring another error term
of order Ŏ (kd+1n2e−sk/2) = Ŏ (1). Using the shift-invariance of P, along with (5.3),
and collecting all of the above error terms, we get
|Eω0 (Xn − nv)|
x∈Λ, x·û≥k
0≤m≤n
px,m,kE[ |Eωx {Xn−m − x− (n−m)v}|2] + Ŏ (n2ρ)
= Ŏ (kd+1n2ᾱ + n2ρ) = Ŏ (nρ(d+1)+2ᾱ).
Pick ρ > 0 small enough so that 2α = ρ(d + 1) + 2ᾱ < 1. The conclusion (2.1)
follows. �
Once we have verified the assumptions of Theorem 2.1 we have the CLT under
P∞-almost every ω. But we want the CLT under P-almost every ω. Thus as the final
point of this section we prove the transfer of the central limit theorem from P∞ to P.
This is where we use the ergodic theorem, Theorem 4.2. Let W be the probability
distribution of the Brownian motion with diffusion matrix D.
Lemma 5.3. Suppose the weak convergence Qωn ⇒ W holds for P∞-almost every ω.
Then the same is true for P-almost every ω.
Proof. It suffices to show that for any bounded uniformly continuous F on DRd[0,∞)
and any δ > 0
Eω0 [F (Bn)] ≤
F dW + δ P-a.s.
By considering also −F this gives Eω0 [F (Bn)] →
F dW P0-a.s. for each such func-
tion. A countable collection of them determines weak convergence.
Fix such an F . Let c =
F dW and
h(ω) = lim
Eω0 [F (Bn)].
20 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
For ℓ > 0 define the events
A−ℓ = {inf
Xn · û ≥ −ℓ}
and then
hℓ(ω) = lim
Eω0 [F (Bn), A−ℓ] and Ψ(ω) = 1I{ω : h̄ℓ(ω) ≤ c+ 12δ}.
The assumed quenched CLT under P∞ gives P∞{h = c} = 1. By (3.1), and by its
extension to P∞ in Theorem 4.1(c), there are constants 0 < C, s <∞ such that
|h(ω)− hℓ(ω)| ≤ Ce−sℓ
uniformly over all ω that support both P and P∞. Consequently if δ > 0 is given,
E∞Ψ = 1 for large enough ℓ. Since Ψ is S−ℓ-measurable Theorem 4.2 implies that
Ψ(TXjω) → 1 P0-a.s.
By increasing ℓ if necessary we can ensure that {h̄ℓ ≤ c + 12δ} ⊂ {h̄ ≤ c + δ} and
conclude that the stopping time
ζ = inf{n ≥ 0 : h̄(TXnω) ≤ c+ δ}
is P0-a.s. finite. From the definitions we now have
0 [F (Bn)] ≤
F dW + δ P0-a.s.
Then by bounded convergence
Eω0 E
0 [F (Bn)] ≤
F dW + δ P-a.s.
Since ζ is a finite stopping time, the strong Markov property, the uniform continuity
of F and the exponential moment bound (3.2) on X-increments imply
Eω0 [F (Bn)] ≤
F dW + δ P-a.s.
This concludes the proof. �
6. Reduction to path intersections
The preceding sections have reduced the proof of the main result Theorem 1.1 to
proving the estimate
|Eω0 (Xn)−E0(Xn)|
= Ŏ (n2α) for some α < 1/2. (6.1)
The next reduction takes us to the expected number of intersections of the paths of
two independent walks X and X̃ in the same environment. The argument uses a de-
composition into martingale differences through an ordering of lattice sites. This idea
QUENCHED FUNCTIONAL CLT FOR RWRE 21
for bounding a variance is natural and has been used in RWRE earlier by Bolthausen
and Sznitman [2].
Let P ω0,0 be the quenched law of the walks (X, X̃) started at (X0, X̃0) = (0, 0) and
P0,0 =
P ω0,0 P(dω) the averaged law with expectation operator E0,0. The set of sites
visited by a walk is denoted by X[0,n) = {Xk : 0 ≤ k < n} and |A| is the number of
elements in a discrete set A.
Proposition 6.1. Let P be an i.i.d. product measure and satisfy Hypotheses (N) and
(M). Assume that there exists an ᾱ < 1/2 such that
E0,0(|X[0,n) ∩ X̃[0,n)|) = Ŏ (n2ᾱ). (6.2)
Then condition (6.1) is satisfied.
Proof. For L ≥ 0, define B(L) = {x ∈ Zd : |x| ≤ L}. Fix n ≥ 1, c > |û|, and let
(xj)j≥1 be some fixed ordering of B(cMn) satisfying
∀i ≥ j : xi · û ≥ xj · û.
For B ⊂ Zd let SB = σ{ωx : x ∈ B}. Let Aj = {x1, . . . , xj}, ζ0 = E0(Xn), and for
j ≥ 1
ζj = E(E
0 (Xn)|SAj).
(ζj − ζj−1)j≥1 is a sequence of L2(P)-martingale differences and we have
E[ |Eω0 (Xn)− E0(Xn)| 2] (6.3)
|E0(Xn)− E{Eω0 (Xn)|SB(cMn)}|2
[ ∣∣Eω0 (Xn,max
|Xi| > cMn)
− E{Eω0 (Xn,max
|Xi| > cMn) |SB(cMn)}
∣∣2 ]
|B(cMn)|∑
E( |ζj − ζj−1|2 ) + Ŏ (n3e−sM(c−|û|)n). (6.4)
In the last inequality we have used (3.2). The error is Ŏ (1). For z ∈ Zd define
half-spaces
H(z) = {x ∈ Zd : x · û > z · û}.
Since Aj−1 ⊂ Aj ⊂ H(xj)c,
E(|ζj − ζj−1|2)
P(dωAj)
P(dωAc
)P(dω̃xj)
Eω0 (Xn)− E
〈ω,ω̃xj 〉
0 (Xn)
P(dωH(xj)c)P(dω̃xj)
P(dωH(xj))
Eω0 (Xn)− E
〈ω,ω̃xj 〉
0 (Xn)
. (6.5)
Above 〈ω, ω̃xj〉 denotes an environment obtained from ω by replacing ωxj with ω̃xj .
22 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
We fix a point z = xj to develop a bound for the expression above, and then return
to collect the estimates. Abbreviate ω̃ = 〈ω, ω̃xj〉. Consider two walks Xn and X̃n
starting at 0. Xn obeys environment ω, while X̃n obeys ω̃. We can couple the two
walks so that they stay together until the first time they visit z. Until a visit to z
happens, the walks are identical. So we write
P(dωH(z))
Eω0 (Xn)− Eω̃0 (Xn)
(6.6)
P(dωH(z))
P ω0 (Hz = m)
Eωz (Xn−m − z)− Eω̃z (Xn−m − z)
P(dωH(z))
P ω0 (Hz = m, ℓ− 1 ≤ max
0≤j≤m
Xj · û− z · û < ℓ)
Eωz (Xn−m − z)− Eω̃z (Xn−m − z)
(6.7)
Decompose H(z) = Hℓ(z) ∪H′ℓ(z) where
Hℓ(z) = {x ∈ Zd : z · û < x · û < z · û+ ℓ} and H′ℓ(z) = {x ∈ Zd : x · û ≥ z · û+ ℓ}.
Take a single (ℓ,m) term from the sum in (6.7) and only the expectation Eωz (Xn−m−
P(dωH(z))P
0 (Hz = m, ℓ− 1 ≤ max
0≤j≤m
Xj · û− z · û < ℓ)
×Eωz (Xn−m − z)
P(dωH(z))P
0 (Hz = m, ℓ− 1 ≤ max
0≤j≤m
Xj · û− z · û < ℓ)
×Eωz (Xτ (ℓ)1 +n−m −Xτ (ℓ)1 )
(6.8)
P(dωH(z))P
0 (Hz = m, ℓ− 1 ≤ max
0≤j≤m
Xj · û− z · û < ℓ)
×Eωz (Xn−m −Xτ (ℓ)1 +n−m +Xτ (ℓ)1 − z)
(6.9)
The parameter ℓ in the regeneration time τ
1 of the walk started at z ensures that the
subsequent walkX
1 + ·
stays inH′ℓ(z). Below we make use of this to get independence
from the environments in H′ℓ(z)c. By (3.9) the quenched expectation in (6.9) can be
bounded by Cpℓ
p, for any p > 1.
QUENCHED FUNCTIONAL CLT FOR RWRE 23
Integral (6.8) is developed further as follows.
P(dωH(z))P
0 (Hz = m, ℓ− 1 ≤ max
0≤j≤m
Xj · û− z · û < ℓ)
× Eωz (Xτ (ℓ)1 +n−m −Xτ (ℓ)1 )
P(dωHℓ(z))P
0 (Hz = m, ℓ− 1 ≤ max
0≤j≤m
Xj · û− z · û < ℓ)
P(dωH′
(z))E
z (Xτ (ℓ)1 +n−m
P(dωHℓ(z))P
0 (Hz = m, ℓ− 1 ≤ max
0≤j≤m
Xj · û− z · û < ℓ)
× Ez(Xτ (ℓ)1 +n−m −Xτ (ℓ)1 |SH′ℓ(z)c)
P(dωHℓ(z))P
0 (Hz = m, ℓ− 1 ≤ max
0≤j≤m
Xj · û− z · û < ℓ)
× E0(Xn−m|β = ∞).
(6.10)
The last equality above comes from the regeneration structure, see Proposition 1.3 in
Sznitman-Zerner [17]. The σ-algebra SH′
(z)c is contained in the σ-algebra G1 defined
by (1.22) of [17] for the walk starting at z.
The last quantity (6.10) above reads the environment only until the first visit to z,
hence does not see the distinction between ω and ω̃. Hence when the integral (6.7) is
developed separately for ω and ω̃ into the sum of integrals (6.8) and (6.9), integrals
(6.8) for ω and ω̃ cancel each other. We are left only with two instances of integral
(6.9), one for both ω and ω̃. The last quenched expectation in (6.9) we bound by
p as was mentioned above.
Going back to (6.6), we get this bound:
P(dωH(z))
Eω0 (Xn)− Eω̃0 (Xn)
P(dωH(z))
ℓpP ω0 (Hz < n, ℓ− 1 ≤ max
0≤j≤Hz
Xj · û− z · û < ℓ)
P(dωH(z))E
0 [(MHz − z · û)p1I{Hz < n}]
≤ Cpnpε
P(dωH(z))P
0 (Hz < n) + Cps
−pe−sn
For the last inequality we used (3.6) with ℓ = nε and some small ε, s > 0. Square,
take z = xj , integrate as in (6.5), and use Jensen’s inequality to bring the square
inside the integral to get
E( |ζj − ζj−1|2 ) ≤ 2Cpn2pεE[ |P ω0 (Hxj < n)|2 ] + 2Cps−2pe−sn
24 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
Substitute these bounds into line (6.4) and note that the error there is Ŏ (1).
E[ |Eω0 (Xn)−E0(Xn)|2 ]
≤ Cpn2pε
E[ |P ω0 (Hz < n)|2 ] + Ŏ (nds−2pe−sn
) + Ŏ (1)
= Cpn
P0,0(z ∈ X[0,n) ∩ X̃[0,n)) + Ŏ (1)
= Cpn
2pεE0,0[ |X[0,n) ∩ X̃[0,n)| ] + Ŏ (1).
Utilize assumption (6.2) and take ε > 0 small enough so that 2α = 2pε + 2ᾱ < 1.
(6.1) has been verified. �
7. Bound on intersections
The remaining piece of the proof of Theorem 1.1 is this estimate:
E0,0( |X[0,n) ∩ X̃[0,n)| ) = Ŏ (n2α) for some α < 1/2. (7.1)
X and X̃ are two independent walks in a common environment with quenched
distribution P ωx,y[X0,∞ ∈ A, X̃0,∞ ∈ B] = P ωx (A)P ωy (B) and averaged distribution
Ex,y(·) = EP ωx,y(·).
To deduce the sublinear bound we introduce regeneration times at which both
walks regenerate on the same level in space (but not necessarily at the same time).
Intersections happen only within the regeneration slabs, and the expected number
of intersections decays exponentially in the distance between the points of entry of
the walks in the slab. From regeneration to regeneration the difference of the two
walks operates like a Markov chain. This Markov chain can be approximated by
a symmetric random walk. Via this preliminary work the required estimate boils
down to deriving a Green function bound for a Markov chain that can be suitably
approximated by a symmetric random walk. This part is relegated to an appendix.
Except for the appendix, we complete the proof of the functional central limit theorem
in this section.
To aid our discussion of a pair of walks (X, X̃) we introduce some new notation.
We write θm,n for the shift on pairs of paths: θm,n(x0,∞, y0,∞) = (θ
mx0,∞, θ
ny0,∞). If
we write separate expectations for X and X̃ under P ωx,y, these are denoted by E
x and
Ẽωy .
By a joint stopping time we mean pair (α, α̃) that satisfies {α = m, α̃ = n} ∈
σ{X0,m, X̃0,n}. Under the distribution P ωx,y the walks X and X̃ are independent.
Consequently if α ∨ α̃ <∞ P ωx,y-almost surely then for any events A and B,
P ωx,y[(X0,α, X̃0,α̃) ∈ A, (Xα,∞, X̃α̃,∞) ∈ B]
= Eωx,y
1I{(X0,α, X̃0,α̃) ∈ A}P ωXα, eXα̃{(X0,∞, X̃0,∞) ∈ B}
QUENCHED FUNCTIONAL CLT FOR RWRE 25
This type of joint restarting will be used without comment in the sequel.
For this section it will be convenient to have level stopping times and running
maxima that are not defined relative to the initial level.
γℓ = inf{n ≥ 0 : Xn · û ≥ ℓ} and γ+ℓ = inf{n ≥ 0 : Xn · û > ℓ}.
Since û ∈ Zd, γ+ℓ is simply an abbreviation for γℓ+1. Let Mn = sup{Xi · û : i ≤ n}
be the running maximum. M̃n, γ̃ℓ and γ̃
ℓ are the corresponding quantities for the X̃
walk. The first backtracking time for the X̃ walk is β̃ = inf{n ≥ 1 : X̃n · û < X̃0 · û}.
Define
L = inf{ℓ > (X0 · û) ∧ (X̃0 · û) : Xγℓ · û = X̃γ̃ℓ · û = ℓ}
as the first fresh common level after at least one walk has exceeded its starting level.
Set L = ∞ if there is no such common level. When the walks are on a common level,
their difference will lie in the hyperplane
Vd = {z ∈ Zd : z · û = 0}.
We start with exponential tail bounds on the time to reach the common level.
Lemma 7.1. There exist constants 0 < a1, a2, C < ∞ such that, for all x, y ∈ Zd,
m ≥ 0 and P-a.e. ω,
P ωx,y(γL ∨ γ̃L ≥ m) ≤ Cea1|y·û−x·û|−a2m. (7.2)
For the proof we need a bound on the overshoot.
Lemma 7.2. There exist constants 0 < C, s < ∞ such that, for any level k, any
b ≥ 1, any x ∈ Zd such that x · û ≤ k, and P-a.e. ω,
P ωx [Xγk · û ≥ k + b] ≤ Ce−sb. (7.3)
Proof. From (3.1) it follows that for a constant C, for any level ℓ, any x ∈ Zd, and
P-a.e. ω,
Eωx [number of visits to level ℓ] =
P ωx [Xn · û = ℓ] ≤ C. (7.4)
(This is certainly clear if x · û = ℓ. Otherwise wait until the process first lands on
level ℓ, if ever.)
26 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
From this and the exponential moment hypothesis we deduce the required bound
on the overshoots: for any k, any x ∈ Zd such that x · û ≤ k, and P-a.e. ω,
P ωx [Xγk · û ≥ k + b] =
z·û<k
P ωx [γk > n, Xn = z, Xn+1 · û ≥ k + b]
z·û=k−ℓ
P ωx [γk > n, Xn = z]P
z [X1 · û ≥ k + b]
z·û=k−ℓ
P ωx [Xn = z]Ce
−s(ℓ+b)
≤ Ce−sb
e−sℓ ≤ Ce−sb. �
Proof of Lemma 7.1. Consider first γL, and let us restrict ourselves to the case where
the initial points x, y satisfy x · û < y · û.
Perform an iterative construction of stopping times ηi, η̃i and levels ℓ(i), ℓ̃(i). Let
η0 = η̃0 = 0, x0 = x and y0 = y. ℓ(0) and ℓ̃(0) need not be defined. Suppose that
the construction has been done to stage i − 1 with xi−1 = Xηi−1 , yi−1 = X̃η̃i−1 , and
xi−1 · û < yi−1 · û. Then set
ℓ(i) = Xγ(yi−1·û) · û, ℓ̃(i) = X̃γ̃(ℓ(i)) · û, ηi = γ(ℓ̃(i)) and η̃i = γ̃(Xηi · û+ 1).
In words, starting at (xi−1, yi−1) with yi−1 above xi−1, let X reach the level of yi−1
and let ℓ(i) be the level X lands on; let X̃ reach the level ℓ(i) and let ℓ̃(i) be the level
X̃ lands on. Now let X try to establish a new common level at ℓ̃(i) with X̃ : in other
words, follow X until the time ηi it reaches level ℓ̃(i) or above, and stop it there.
Finally, reset the situation by letting X̃ reach a level strictly above the level of Xηi ,
and stop it there at time η̃i. The starting locations for the next step are xi = Xηi ,
yi = X̃η̃i that satisfy xi · û < yi · û.
We show that within each step of the iteration there is a uniform lower bound on
the probability that a fresh common level was found. For this purpose we utilize
assumption (1.1) in the weaker form
P{ω : P ω0 (Xγ1 · û = 1) ≥ κ } = 1. (7.5)
Pick b large enough so that the bound in (7.3) is < 1. For z, w ∈ Zd such that
z · û ≥ w · û define a function
ψ(z, w) = P ωz,w[ Xγk · û = k for each k ∈ {z · û, . . . , z · û+ b},
X̃γ̃(z·û) · û− z · û ≤ b ]
≥ κb(1− Ce−sb) ≡ κ2 > 0.
QUENCHED FUNCTIONAL CLT FOR RWRE 27
The uniform lower bound comes from the independence of the walks, from (7.3) and
from iterating assumption (7.5). By the Markov property
P ωxi−1,yi−1[Xγ(ℓ̃(i)) · û = ℓ̃(i)]
≥ P ωxi−1,yi−1[ X̃γ̃(ℓ(i)) · û− ℓ(i) ≤ b, Xγk · û = k for each k ∈ {ℓ(i), . . . , ℓ(i) + b} ]
≥ Eωxi−1,yi−1
ψ(Xγ(yi−1·û), yi−1)
≥ κ2.
The first iteration on which the attempt to create a common level at ℓ̃(i) succeeds
I = inf{i ≥ 1 : Xγ
ℓ̃(i)
· û = ℓ̃(i)}.
Then ℓ̃(I) is a new fresh common level and consequently L ≤ ℓ̃(I). This gives the
upper bound
γL ≤ γℓ̃(I).
We develop an exponential tail bound for γℓ̃(I), still under the assumption x · û < y · û.
From the uniform bound above and the Markov property we get
P ωx,y[I > i] ≤ (1− κ2)i.
Lemma 7.2 gives an exponential bound
P ωx,y
(X̃η̃i − X̃η̃i−1) · û ≥ b
≤ Ce−sb (7.6)
because the distance (X̃η̃i − X̃η̃i−1) · û is a sum of four overshoots:
X̃η̃i − X̃η̃i−1
· û =
X̃γ̃(Xηi ·û+1) · û−Xηi · û− 1
+ 1 +
Xγ(ℓ̃(i)) · û− ℓ̃(i)
X̃γ̃(ℓ(i)) · û− ℓ(i)
γ( eXη̃i−1 ·û)
· û− X̃η̃i−1 · û
Next, from the exponential tail bound on
X̃η̃i − X̃η̃i−1
· û and from
ℓ̃(i) ≤ X̃η̃i · û =
X̃η̃j − X̃η̃j−1
· û+ y · û
we get the large deviation estimate
P ωx,y[ℓ̃(i) ≥ bi+ y · û] ≤ e−sbi for i ≥ 1 and b ≥ b0,
for some constants 0 < s < ∞ (small enough) and 0 < b0 < ∞ (large enough).
Combine this with the bound above on I to write
P ωx,y[ℓ̃(I) ≥ a] ≤ P ωx,y[I > i] + P ωx,y[ℓ̃(i) ≥ a]
≤ e−si + esy·û−sa ≤ 2esy·û−sa
where we assume a ≥ 2b0 + y · û and set the integer i = ⌊b−10 (a− y · û)⌋. Recall that
0 < s <∞ is a constant whose value can change from line to line.
28 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
From (3.1) and an exponential Chebyshev
P ωx,y[γk > m] ≤ P ωx [Xm · û ≤ k] ≤ esk−sx·û−h1m
for all x, y ∈ Zd, k ∈ Z and m ≥ 0. Above and in the remainder of this proof h1, h2
and h3 are small positive constants. Finally we derive
P ωx,y[γℓ̃(I) > m] ≤ P ωx,y[ℓ̃(I) ≥ k + x · û] + P ωx,y[γk+x·û > m]
≤ 2es(y−x)·û−sk + esk−h1m ≤ Ces(y−x)·û−h2m.
To justify the inequalities above assume m ≥ 4sb0/h1 > 4s/h1 and pick k in the
range
+ (y − x) · û ≤ k ≤ 3h1m
+ (y − x) · û.
To summarize, at this point we have
P ωx,y[γL > m] ≤ Ces(y−x)·û−h2m for x · û < y · û. (7.7)
To extend this estimate to the case x · û ≥ y · û, simply allow X̃ to go above x and
then apply (7.7). By an application of the overshoot bound (7.3) and (7.7) at the
point (x, X̃γ̃(x·û+1))
P ωx,y[γL > m] ≤ Eωx,yP ωx, eXγ̃(x·û+1)[γL > m]
≤ P ωx,y[X̃γ̃(x·û+1) > x · û+ εm] + Cesεm−h2m ≤ Ce−h3m
if we take ε > 0 small enough.
We have proved the lemma for γL, and the same argument works for γ̃L. �
Assuming that X0 · û = X̃0 · û define the joint stopping times
(ρ, ρ̃) = (γ+
, γ̃+
(ν1, ν̃1) =
(ρ, ρ̃) + (γL, γ̃L) ◦ θρ,ρ̃ if ρ ∨ ρ̃ <∞
∞ if ρ = ρ̃ = ∞.
(7.8)
Notice that ρ and ρ̃ are finite or infinite together, and they are infinite iff neither
walk backtracks below its initial level (β = β̃ = ∞). Let ν0 = ν̃0 = 0 and for k ≥ 0
define
(νk+1, ν̃k+1) = (νk, ν̃k) + (ν1, ν̃1) ◦ θνk,ν̃k .
Finally let (ν, ν̃) = (γL, γ̃L), K = sup{k ≥ 0 : νk ∨ ν̃k <∞}, and
(µ1, µ̃1) = (ν, ν̃) + (νK , ν̃K) ◦ θν,ν̃ . (7.9)
These represent the first common regeneration times of the two paths. Namely,
Xµ1 · û = X̃µ̃1 · û and for all n ≥ 1,
Xµ1−n · û < Xµ1 · û ≤ Xµ1+n · û and X̃µ̃1−n · û < X̃µ̃1 · û ≤ X̃µ̃1+n · û.
QUENCHED FUNCTIONAL CLT FOR RWRE 29
Next we extend the exponential tail bound to the regeneration times.
Lemma 7.3. There exist constants 0 < C < ∞ and η̄ ∈ (0, 1) such that, for all
x, y ∈ Vd = {z ∈ Zd : z · û = 0}, k ≥ 0, and P-a.e. ω, we have
P ωx,y(µ1 ∨ µ̃1 ≥ k) ≤ C(1− η̄)k. (7.10)
Proof. We prove geometric tail bounds successively for γ+1 , γ
ℓ , γ
, ρ, ν1, νk, and
finally for µ1. To begin, (3.1) implies that
P ω0 (γ
1 ≥ n) ≤ P ω0 (Xn−1 · û ≤ 1) ≤ es2(1− η1)n−1
with η1 = s2δ/2, for some small s2 > 0. By summation by parts
Eω0 (e
1 ) ≤ es2Js3 ,
for a small enough s3 > 0 and Js = 1 + (e
s − 1)/(1 − (1 − η1)es). By the Markov
property for ℓ ≥ 1,
Eω0 (e
ℓ ) ≤
x·û>ℓ−1
Eω0 (e
ℓ−1 , Xγ+
= x)Eωx (e
But if x · û > ℓ− 1, then Eωx (es3γ
ℓ ) ≤ ETxω0 (es3γ
1 ). Therefore by induction
Eω0 (e
ℓ ) ≤ (es2Js3)ℓ for any integer ℓ ≥ 0. (7.11)
Next for an integer r ≥ 1,
Eω0 (e
Mr ) =
Eω0 (e
ℓ ,Mr = ℓ) ≤
Eω0 (e
ℓ )1/2P ω0 (Mr = ℓ)
(es2J2s4)
ℓ/2(1I{ℓ < 3Mr|û|}+ e−s5ℓ)1/2 ≤ C(es2J2s4)Cr,
for some C and for positive but small enough s2, s4, and s5. In the last inequality
above we used the fact that es2J2s4 converges to 1 as first s4 ց 0 and then s2 ց 0.
In the second-to-last inequality we used (3.2) to get the bound
P ω0 (Xi · û ≥ ℓ) ≤
e−sℓeM |û|si ≤ Ce−s5ℓ if ℓ ≥ 3M |û|r.
Above we assumed that the walk X starts at 0. Same bounds work for any x ∈ Vd
because a shift orthogonal to û does not alter levels, in particular P ωx (Mr = ℓ) =
P Txω0 (Mr = ℓ).
By this same observation we show that for all x, y ∈ Vd
Eωx,y(e
fMr ) ≤ C(es2J2s4)Cr
by repeating the earlier series of inequalities.
30 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
Using (3.1) and these estimates gives for x, y ∈ Vd
P ωx,y(ρ ≥ n, β ∧ β̃ <∞) =
P ωx,y(γ
Mr∨fMr
≥ n, β ∧ β̃ = r)
≤ e−s4n/2
1≤r≤εn
Eωx (e
Mr )1/2Eωx,y(e
fMr )1/2
P ωx {Xr · û < x · û}+ P ωy {X̃r · û < y · û}
≤ Cεne−s4n/2(es2J2s4)Cεn + C(1− s6δ/2)εn.
Taking ε > 0 small enough shows the existence of a constant η2 > 0 such that for all
x, y ∈ Vd, n ≥ 1, and P-a.e. ω,
P ωx,y(ρ ≥ n, β ∧ β̃ <∞) ≤ C(1− η2)n.
Same bound works for ρ̃ also. We combine this with (7.2) to get a geometric tail
bound for ν11I{β ∧ β̃ <∞}. Recall definition (7.8) and take ε > 0 small.
P ωx,y[ν1 ≥ k, β ∧ β̃ <∞]
≤ P ωx,y[ρ ≥ k/2, β ∧ β̃ <∞] + P ωx,y[β ∧ β̃ <∞, |Xρ · û− X̃ρ̃ · û| > εk]
+ P ωx,y[γL ◦ θρ,ρ̃ ≥ k/2, β ∧ β̃ <∞, |Xρ · û− X̃ρ̃ · û| ≤ εk ].
On the right-hand side above we have an exponential bound for each of the three
probabilities: the first probability gets it from the estimate immediately above, the
second from a combination of that and (3.2), and the third from (7.2):
P ωx,y[γL ◦ θρ,ρ̃ ≥ k/2, β ∧ β̃ <∞, |Xρ · û− X̃ρ̃ · û| ≤ εk ]
= Eωx,y
1I{β ∧ β̃ <∞, |Xρ · û− X̃ρ̃ · û| ≤ εk}P ωXρ , eXρ̃{γL ≥ k/2}
≤ Cea1εk−a2k/2.
The constants in the last bound above are those from (7.2), and we choose ε <
a2/(2a1). We have thus established that
Eωx,y(e
s7ν11I{β ∧ β̃ <∞}) ≤ J̄s7
for a small enough s7 > 0, with J̄s = C(1− (1− η3)es)−1 and η3 > 0.
To move from ν1 to νk use the Markov property and induction:
Eωx,y(e
s7νk1I{νk ∨ ν̃k <∞})
Eωx,y(e
s7νk−11I{νk−1 ∨ ν̃k−1 <∞, Xνk−1 = z, X̃ν̃k−1 = z̃})
× Eωz,z̃(es7ν11I{β ∧ β̃ <∞})
≤ J̄s7Eωx,y(es7νk−11I{νk−1 ∨ ν̃k−1 <∞}) ≤ · · · ≤ J̄ks7.
QUENCHED FUNCTIONAL CLT FOR RWRE 31
Next, use the Markov property at the joint stopping times (νk, ν̃k), (7.2), (3.7),
and induction to derive
P ωx,y(K ≥ k) ≤ P ωx,y(νk ∨ ν̃k <∞)
P ωx,y(νk−1 ∨ ν̃k−1 <∞, Xνk−1 = z, X̃ν̃k−1 = z̃)P ωz,z̃(β ∧ β̃ <∞)
≤ (1− η2)P ωx,y(νk−1 ∨ ν̃k−1 <∞) ≤ (1− η2)k.
Finally use the Cauchy-Schwarz and Chebyshev inequalities to write
P ωx,y(νK ≥ n) =
P ωx,y(νk ≥ n,K = k)
(1− η2)k + e−s7n
1≤k≤εn
Eωx,y(e
s7νk1I{νk ∨ ν̃k <∞})
≤ C(1− η2)εn + Cεne−s7nJ̄εns7 .
Looking at the definition (7.9) of µ1 we see that an exponential tail bound follows by
applying (7.2) to the ν-part and by taking ε > 0 small enough in the last calculation
above. Repeat the same argument for µ̃1 to conclude the proof of (7.10). �
After these preliminaries define the sequence of common regeneration times by
µ0 = µ̃0 = 0 and
(µi+1, µ̃i+1) = (µi, µ̃i) + (µ1, µ̃1) ◦ θµi,µ̃i. (7.12)
The next tasks are to identify suitable Markovian structures and to develop a cou-
pling.
Proposition 7.4. The process (X̃µ̃i−Xµi)i≥1 is a Markov chain on Vd with transition
probability
q(x, y) = P0,x[X̃µ̃1 −Xµ1 = y | β = β̃ = ∞]. (7.13)
Note that the time-homogeneous Markov chain does not start from X̃0−X0 because
the transition to X̃µ̃1 −Xµ1 does not include the condition β = β̃ = ∞.
Proof. Express the iteration of the common regeneration times as
(µi, µ̃i) = (µi−1, µ̃i−1) +
(ν, ν̃) + (νK , ν̃K) ◦ θν,ν̃
◦ θµi−1,µ̃i−1 , i ≥ 1.
Let Ki be the value of K at the ith iteration:
Ki = K ◦ θν,ν̃ ◦ θµi−1,µ̃i−1 .
32 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
Let n ≥ 2 and z1, . . . , zn ∈ Vd. Write
P0,z[X̃µ̃i −Xµi = zi for 1 ≤ i ≤ n] (7.14)
(ki,mi,m̃i,vi,ṽi)1≤i≤n−1∈Ψ
Ki = ki, µi = mi, µ̃i = m̃i,
Xmi = vi and X̃m̃i = ṽi for 1 ≤ i ≤ n− 1,
(X̃µ̃1 −Xµ1) ◦ θmn−1,m̃n−1 = zn
Above Ψ is the set of vectors (ki, mi, m̃i, vi, ṽi)1≤i≤n−1 such that ki is nonnegative and
mi, m̃i, vi · û, and ṽi · û are all positive and strictly increasing in i, and ṽi − vi = zi.
Define the events
Ak,b,b̃ = {ν + νk ◦ θν,ν̃ = b , ν̃ + ν̃k ◦ θν,ν̃ = b̃}
Bb,b̃ = {Xj · û ≥ X0 · û for 1 ≤ j ≤ b , X̃j · û ≥ X̃0 · û for 1 ≤ j ≤ b̃}.
Let m0 = m̃0 = 0, bi = mi −mi−1 and b̃i = m̃i − m̃i−1. Rewrite the sum from above
(ki,mi,m̃i,vi,ṽi)1≤i≤n−1∈Ψ
[ n−1∏
1I{Aki,bi,b̃i} ◦ θ
mi−1,m̃i−1
1I{Bbi, b̃i} ◦ θ
mi−1,m̃i−1 , Xmi = vi and X̃m̃i = ṽi for 1 ≤ i ≤ n− 1,
β ◦ θmn−1 = β̃ ◦ θm̃n−1 = ∞ , (X̃µ̃1 −Xµ1) ◦ θmn−1,m̃n−1 = zn
Next restart the walks at times (mn−1, m̃n−1) to turn the sum into the following.
(ki,mi,m̃i,vi,ṽi)1≤i≤n−1∈Ψ
Eω0,z
[ n−1∏
1I{Aki,bi,b̃i} ◦ θ
mi−1,m̃i−1
1I{Bbi, b̃i} ◦ θ
mi−1,m̃i−1 , Xmi = vi and X̃m̃i = ṽi for 1 ≤ i ≤ n− 1
× P ωvn−1 , ṽn−1
β = β̃ = ∞ , X̃µ̃1 −Xµ1 = zn
Inside the outermost braces the events in the first quenched expectation force the
level
ℓ = Xmn−1 · û = vn−1 · û = X̃m̃n−1 · û = ṽn−1 · û
QUENCHED FUNCTIONAL CLT FOR RWRE 33
to be a new maximal level for both walks. Consequently the first quenched expecta-
tion is a function of {ωx : x · û < ℓ} while the last quenched probability is a function
of {ωx : x · û ≥ ℓ}. By independence of the environments, the sum becomes
(ki,mi,m̃i,vi,ṽi)1≤i≤n−1∈Ψ
[ n−1∏
1I{Aki,bi,b̃i} ◦ θ
mi−1,m̃i−1 (7.15)
1I{Bbi, b̃i} ◦ θ
mi−1,m̃i−1 , Xmi = vi and X̃m̃i = ṽi for 1 ≤ i ≤ n− 1
× Pvn−1 , ṽn−1
β = β̃ = ∞ , X̃µ̃1 −Xµ1 = zn
By a shift and a conditioning the last probability transforms as follows.
Pvn−1 , ṽn−1
β = β̃ = ∞ , X̃µ̃1 −Xµ1 = zn
= P0,zn−1
X̃µ̃1 −Xµ1 = zn
∣∣ β = β̃ = ∞
Pvn−1 , ṽn−1
β = β̃ = ∞
= q(zn−1, zn)Pvn−1 , ṽn−1
β = β̃ = ∞
Now reverse the above use of independence to put the probability
Pvn−1 , ṽn−1 [β = β̃ = ∞]
back together with the expectation (7.15). Inside this expectation this furnishes the
event β ◦ θmn−1 = β̃ ◦ θm̃n−1 = ∞ and with this the union of the entire collection of
events turns back into X̃µ̃i −Xµi = zi for 1 ≤ i ≤ n−1. Going back to the beginning
on line (7.14) we see that we have now shown
P0,z[X̃µ̃i −Xµi = zi for 1 ≤ i ≤ n]
= P0,z[X̃µ̃i −Xµi = zi for 1 ≤ i ≤ n− 1]q(zn−1, zn).
Continue by induction. �
The Markov chain Yk = X̃µ̃k − Xµk will be compared to a random walk obtained
by performing the same construction of joint regeneration times to two independent
walks in independent environments. To indicate the difference in construction we
change notation. Let the pair of walks (X, X̄) obey P0 ⊗Pz with z ∈ Vd, and denote
the first backtracking time of the X̄ walk by β̄ = inf{n ≥ 1 : X̄n · û < X̄0 · û}.
Construct the common regeneration times (ρk, ρ̄k)k≥1 for (X, X̄) by the same recipe
[(7.8), (7.9) and (7.12)] as was used to construct (µk, µ̃k)k≥1 for (X, X̃). Define
Ȳk = X̄ρ̄k − Xρk . An analogue of the previous proposition, which we will not spell
out, shows that (Ȳk)k≥1 is a Markov chain with transition
q̄(x, y) = P0 ⊗ Px[X̄ρ̄1 −Xρ1 = y | β = β̄ = ∞]. (7.16)
34 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
In the next two proofs we make use of the following decomposition. Suppose
x · û = y · û = 0, and let (x1, y1) be another pair of points on a common, higher level:
x1 · û = y1 · û = ℓ > 0. Then we can write
{(X0, X̃0) = (x, y), β = β̃ = ∞, (Xµ1 , X̃µ̃1) = (x1, y1)}
(γ,γ̃)
{X0,n(γ) = γ, X̃0,n(γ̃) = γ̃, β ◦ θn(γ) = β̃ ◦ θn(γ̃) = ∞}. (7.17)
Here (γ, γ̃) range over all pairs of paths that connect (x, y) to (x1, y1), that stay
between levels 0 and ℓ−1 before the final points, and for which a common regeneration
fails at all levels before ℓ. n(γ) is the index of the final point along the path, so for
example γ = (x = z0, z1, . . . , zn(γ)−1, zn(γ) = x1).
Proposition 7.5. The process (Ȳk)k≥1 is a symmetric random walk on Vd and its
transition probability satisfies
q̄(x, y) = q̄(0, y − x) = q̄(0, x− y) = P0 ⊗ P0[X̄ρ̄1 −Xρ1 = y − x | β = β̄ = ∞].
Proof. It remains to show that for independent (X, X̄) the transition (7.16) reduces to
a symmetric random walk. This becomes obvious once probabilities are decomposed
into sums over paths because the events of interest are insensitive to shifts by z ∈ Vd.
P0 ⊗ Px[β = β̄ = ∞ , X̄ρ̄1 −Xρ1 = y]
P0 ⊗ Px[β = β̄ = ∞ , Xρ1 = w , X̄ρ̄1 = y + w]
(γ,γ̄)
P0[X0,n(γ) = γ, β ◦ θn(γ) = ∞]Px[X0,n(γ̄) = γ̄, β ◦ θn(γ̄) = ∞]
(γ,γ̄)
P0[X0,n(γ) = γ]Px[X0,n(γ̄) = γ̄]
P0[β = ∞]
(7.18)
Above we used the decomposition idea from (7.17). Here (γ, γ̄) range over the
appropriate class of pairs of paths in Zd such that γ goes from 0 to w and γ̄ goes from
x to y + w. The independence for the last equality above comes from noticing that
the quenched probabilities P ω0 [X0,n(γ) = γ] and P
w [β = ∞] depend on independent
collections of environments.
The probabilities on the last line of (7.18) are not changed if each pair (γ, γ̄) is
replaced by (γ, γ′) = (γ, γ̄−x). These pairs connect (0, 0) to (w, y−x+w). Because
x ∈ Vd satisfies x · û = 0, the shift has not changed regeneration levels. This shift
turns Px[X0,n(γ̄) = γ̄] on the last line of (7.18) into P0[X0,n(γ′) = γ
′]. We can reverse
the steps in (7.18) to arrive at the probability
P0 ⊗ P0[β = β̄ = ∞ , X̄ρ̄1 −Xρ1 = y − x].
This proves q̄(x, y) = q̄(0, y − x).
QUENCHED FUNCTIONAL CLT FOR RWRE 35
Once both walks start at 0 it is immaterial which is labeled X and which X̄ , hence
symmetry holds. �
It will be useful to know that q̄ inherits all possible transitions from q.
Lemma 7.6. If q(z, w) > 0 then also q̄(z, w) > 0.
Proof. By the decomposition from (7.17) we can express
Px,y[(Xµ1 , X̃µ̃1) = (x1, y1)|β = β̃ = ∞] =
(γ,γ̃)
EP ω[γ]P ω[γ̃]P ωx1 [β = ∞]P
[β = ∞]
Px,y[β = β̃ = ∞]
If this probability is positive, then at least one pair (γ, γ̃) satisfies EP ω[γ]P ω[γ̃] > 0.
This implies that P [γ]P [γ̃] > 0 so that also
Px ⊗ Py[(Xµ1 , X̃µ̃1) = (x1, y1)|β = β̃ = ∞] > 0. �
In the sequel we detach the notations Y = (Yk) and Ȳ = (Ȳk) from their original
definitions in terms of the walks X , X̃ and X̄, and use (Yk) and (Ȳk) to denote
canonical Markov chains with transitions q and q̄. Now we construct a coupling.
Proposition 7.7. The single-step transitions q(x, y) for Y and q̄(x, y) for Ȳ can be
coupled in such a way that, when the processes start from a common state x,
Px,x[Y1 6= Ȳ1] ≤ Ce−α1|x|
for all x ∈ Vd. Here C and α1 are finite positive constants independent of x.
Proof. We start by constructing a coupling of three walks (X, X̃, X̄) such that the
pair (X, X̃) has distribution Px,y and the pair (X, X̄) has distribution Px ⊗ Py.
First let (X, X̃) be two independent walks in a common environment ω as before.
Let ω̄ be an environment independent of ω. Define the walk X̄ as follows. Initially
X̄0 = X̃0. On the sites {Xk : 0 ≤ k < ∞} X̄ obeys environment ω̄, and on all other
sites X̄ obeys ω. X̄ is coupled to agree with X̃ until the time
T = inf{n ≥ 0 : X̄n ∈ {Xk : 0 ≤ k <∞}}
it hits the path of X .
The coupling between X̄ and X̃ can be achieved simply as follows. Given ω and
ω̄, for each x create two independent i.i.d. sequences (zxk )k≥1 and (z̄
k)k≥1 with distri-
butions
Qω,ω̄[zxk = y] = πx,x+y(ω) and Q
ω,ω̄[z̄xk = y] = πx,x+y(ω̄).
Do this independently at each x. Each time the X̃-walk visits state x, it uses a new
zxk variable as its next step, and never reuses the same z
k again. The X̄ walk operates
the same way except that it uses the variables z̄xk when x ∈ {Xk} and the zxk variables
when x /∈ {Xk}. Now X̄ and X̃ follow the same steps zxk until X̄ hits the set {Xk}.
It is intuitively obvious that the walksX and X̄ are independent because they never
use the same environment. The following calculation verifies this. Let X0 = x0 = x
36 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
and X̃ = X̄ = y0 = y be the initial states, and Px,y the joint measure created by the
coupling. Fix finite vectors x0,n = (x0, . . . , xn) and y0,n = (y0, . . . , yn) and recall also
the notation X0,n = (X0, . . . , Xn).
The description of the coupling tells us to start as follows.
Px,y[X0,n = x0,n, X̄0,n = y0,n] =
P(dω)
P(dω̄)
P ωx (dz0,∞)1I{z0,n = x0,n}
i:yi /∈{zk : 0≤k<∞}
πyi,yi+1(ω) ·
i:yi∈{zk: 0≤k<∞}
πyi,yi+1(ω̄)
[by dominated convergence]
= lim
P(dω)
P(dω̄)
P ωx (dz0,N) 1I{z0,n = x0,n}
i:yi /∈{zk : 0≤k≤N}
πyi,yi+1(ω) ·
i:yi∈{zk: 0≤k≤N}
πyi,yi+1(ω̄)
= lim
z0,N :z0,n=x0,n
P(dω)P ωx [X0,N = z0,N ]
i:yi /∈{zk: 0≤k≤N}
πyi,yi+1(ω)
P(dω̄)
i:yi∈{zk: 0≤k≤N}
πyi,yi+1(ω̄)
[by independence of the two functions of ω]
= lim
z0,N :z0,n=x0,n
P(dω)P ωx [X0,N = z0,N ]
P(dω)
i:yi/∈{zk : 0≤k≤N}
πyi,yi+1(ω)
P(dω̄)
i:yi∈{zk: 0≤k≤N}
πyi,yi+1(ω̄)
= Px[X0,n = x0,n] · Py[X0,n = y0,n].
Thus at this point the coupled pairs (X, X̃) and (X, X̄) have the desired marginals
Px,y and Px ⊗ Py.
Next construct the common regeneration times (µ1, µ̃1) for (X, X̃) and (ρ1, ρ̄1) for
(X, X̄) by the earlier recipes. Define two pairs of walks stopped at their common
regeneration times:
(Γ, Γ̄) ≡
(X0, µ1 , X̃0, µ̃1), (X0, ρ1 , X̄0, ρ̄1)
. (7.19)
Suppose the sets X[0, µ1∨ρ1) and X̃[0, µ̃1∨ρ̄1) do not intersect. Then the construction
implies that the path X̄0, µ̃1∨ρ̄1 agrees with X̃0, µ̃1∨ρ̄1, and this forces the equalities
(µ1, µ̃1) = (ρ1, ρ̄1) and (Xµ1 , X̃µ̃1) = (Xρ1 , X̄ρ̄1). We insert an estimate on this event.
QUENCHED FUNCTIONAL CLT FOR RWRE 37
Lemma 7.8. There exist constants 0 < C, s < ∞ such that, for all x, y ∈ Vd and
P-a.e. ω,
P ωx,y(X[0, µ1∨ρ1) ∩ X̃[0, µ̃1∨ρ̄1) 6= ∅) ≤ Ce−s|x−y|. (7.20)
Proof. Write
P ωx,y(X[0, µ1∨ρ1) ∩ X̃[0, µ̃1∨ρ̄1) 6= ∅) ≤ P ωx,y(µ1 ∨ µ̃1 ∨ ρ1 ∨ ρ̄1 > ε|x− y|)
+ P ωx ( max
1≤i≤ε|x−y|
|Xi − x| ≥ |x− y|/2)
+ P ωy ( max
1≤i≤ε|x−y|
|Xi − y| ≥ |x− y|/2).
By (7.10) and its analogue for (ρ1, ρ̄1) the first term on the right-hand-side decays
exponentially in |x − y|. Using (3.2) the second and third terms are bounded by
ε|x − y|e−s|x−y|/2eεs|x−y|M , for s > 0 small enough. Choosing ε > 0 small enough
finishes the proof. �
From (7.20) we obtain
(Xµ1 , X̃µ̃1) 6= (Xρ1, X̄ρ̄1)
≤ Px,y
Γ 6= Γ̄
≤ Ce−s|x−y|. (7.21)
But we are not finished yet: it remains to include the conditioning on no back-
tracking. For this purpose generate an i.i.d. sequence (X(m), X̃(m), X̄(m))m≥1, each
triple constructed as above. Continue to write Px,y for the probability measure of
the entire sequence. Let M be the first m such that the paths (X(m), X̃(m)) do not
backtrack, which means that
k · û ≥ X
0 · û and X̃
k · û ≥ X̃
0 · û for all k ≥ 1.
Similarly define M̄ for (X(m), X̄(m))m≥1. M and M̄ are stochastically bounded by
geometric random variables by (3.7).
The pair of walks (X(M), X̃(M)) is now distributed as a pair of walks under the
measure Px,y[ · |β = β̃ = ∞], while (X(M̄), X̄(M̄)) is distributed as a pair of walks
under Px ⊗ Py[ · |β = β̄ = ∞].
Let also again
Γ(m) = (X
0 , µ
0 , µ̃
) and Γ̄(m) = (X
0 , ρ
0 , ρ̄
be the pairs of paths run up to their common regeneration times. Consider the two
pairs of paths (Γ(M), Γ̄(M̄)) chosen by the random indices (M, M̄). We insert one more
lemma.
Lemma 7.9. For s > 0 as above, and a new constant 0 < C <∞,
Γ(M) 6= Γ̄(M̄)
≤ Ce−s|x−y|/2. (7.22)
38 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
Proof. Let Am be the event that the walks X̃(m) and X̄(m) agree up to the maximum
1 ∨ ρ̄
1 of their regeneration times. The equalities M = M̄ and Γ
(M) = Γ̄(M̄) are a
consequence of the event A1∩· · ·∩AM , for the following reason. As pointed out earlier,
on the event Am we have the equality of the regeneration times µ̃(m)1 = ρ̄
1 and of the
stopped paths X̃
0 , µ̃
0 , ρ̄
. By definition, these walks do not backtrack after
the regeneration time. Since the walks X̃(m) and X̄(m) agree up to this time, they
must backtrack or fail to backtrack together. If this is true for each m = 1, . . . ,M ,
it forces M̄ = M , since the other factor in deciding M and M̄ are the paths X(m)
that are common to both. And since the paths agree up to the regeneration times,
we have Γ(M) = Γ̄(M̄).
Estimate (7.22) follows:
Γ(M) 6= Γ̄(M̄)
≤ Px,y
Ac1 ∪ · · · ∪ AcM
Px,y[M ≥ m, Acm ] ≤
Px,y[M ≥ m]
)1/2(
Px,y[Acm]
≤ Ce−s|x−y|/2.
The last step comes from the estimate in (7.20) for each Acm and the geometric bound
on M . �
We are ready to finish the proof of Proposition 7.7. To create initial conditions
Y0 = Ȳ0 = x take initial states (X
0 , X̃
0 ) = (X
0 , X̄
0 ) = (0, x). Let the final
outcome of the coupling be the pair
(Y1, Ȳ1) =
− X(M)
− X(M̄)
under the measure P0,x. The marginal distributions of Y1 and Ȳ1 are correct [namely,
given by the transitions (7.13) and (7.16)] because, as argued above, the pairs of
walks themselves have the right marginal distributions. The event Γ(M) = Γ̄(M̄)
implies Y1 = Ȳ1, so estimate (7.22) gives the bound claimed in Proposition 7.7. �
The construction of the Markov chain is complete, and we return to the main
development of the proof. It remains to prove a sublinear bound on the expected
number E0,0|X[0,n)∩ X̃[0,n)| of common points of two independent walks in a common
environment. Utilizing the common regeneration times, write
E0,0|X[0,n) ∩ X̃[0,n)| ≤
E0,0|X[µi,µi+1) ∩ X̃[µ̃i,µ̃i+1)|. (7.23)
The term i = 0 is a finite constant by bound (7.10) because the number of common
points is bounded by the number µ1 of steps. For each 0 < i < n apply a decom-
position into pairs of paths from (0, 0) to given points (x1, y1) in the style of (7.17):
QUENCHED FUNCTIONAL CLT FOR RWRE 39
(γ, γ̃) are the pairs of paths with the property that
(γ,γ̃)
{X0,n(γ) = γ, X̃0,n(γ̃) = γ̃, β ◦ θn(γ) = β̃ ◦ θn(γ̃) = ∞}
= {X0 = X̃0 = 0, Xµi = x1, X̃µ̃i = y1}.
Each term i > 0 in (7.23) we rearrange as follows.
E0,0|X[µi,µi+1) ∩ X̃[µ̃i,µ̃i+1)|
x1,y1
(γ,γ̃)
EP ω0,0[X0,n(γ) = γ, X̃0,n(γ̃) = γ̃]
×Eωx1,y1(1I{β = β̃ = ∞}|X[0 , µ1) ∩ X̃[0 , µ̃1)| )
x1,y1
(γ,γ̃)
EP ω0,0[X0,n(γ) = γ, X̃0,n(γ̃) = γ̃]P
x1,y1
[β = β̃ = ∞]
×Eωx1,y1( |X[0 , µ1) ∩ X̃[0 , µ̃1)| | β = β̃ = ∞ )
x1,y1
EP ω0,0[Xµi = x1, X̃µ̃i = y1]E
x1,y1
( |X[0 , µ1) ∩ X̃[0 , µ̃1)| | β = β̃ = ∞ ).
The last conditional quenched expectation above is handled by estimates (3.7), (7.10),
(7.20) and Schwarz inequality:
Eωx1,y1( |X[0 , µ1) ∩ X̃[0 , µ̃1)| | β = β̃ = ∞ ) ≤ η
−2Eωx1,y1( |X[0 , µ1) ∩ X̃[0 , µ̃1)| )
≤ η−2Eωx1,y1(µ1 · 1I{X[0 , µ1) ∩ X̃[0 , µ̃1) 6= ∅} )
≤ η−2
Eωx1,y1[µ
)1/2(
P ωx1,y1{X[0 , µ1) ∩ X̃[0 , µ̃1) 6= ∅}
≤ Ce−s|x1−y1|/2.
Define h(x) = Ce−s|x|/2, insert the last bound back up, and appeal to the Markov
property established in Proposition 7.4:
E0,0|X[µi,µi+1) ∩ X̃[µ̃i,µ̃i+1)| ≤ E0,0
h(X̃µ̃i −Xµi)
P0,0[X̃µ̃1 −Xµ1 = x]
qi−1(x, y)h(y).
In order to apply Theorem A.1 from the Appendix, we check its hypotheses in the
next lemma. Assumption (1.2) enters here for the first and only time.
Lemma 7.10. The Markov chain (Yk)k≥0 with transition q(x, y) and the symmet-
ric random walk (Ȳk)k≥0 with transition q̄(x, y) satisfy assumptions (A.i), (A.ii) and
(A.iii) stated in the beginning of the Appendix.
Proof. From Lemma 7.3 and (3.2) we get moment bounds
E0,x|X̄ρ̄k |m + E0,x|Xρk |m <∞
40 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
for any power m < ∞. This gives assumption (A.i), namely that E0|Ȳ1|3 < ∞. The
second part of assumption (A.ii) comes from Lemma 7.6. Assumption (A.iii) comes
from Proposition 7.7.
The only part that needs work is the first part of assumption (A.ii). We show
that it follows from part (1.2) of Hypothesis (R). By (1.2) and non-nestling (N) there
exist two non-zero vectors y 6= z such that z · û > 0 and Eπ0,yπ0,z > 0. Now we have
a number of cases to consider. In each case we should describe an event that gives
Y1−Y0 a particular nonzero value and whose probability is bounded away from zero,
uniformly over x = Y0.
Case 1: y is noncollinear with z. The sign of y · û gives three subcases. We do
the trickiest one explicitly. Assume y · û < 0. Find the smallest positive integer b
such that (y + bz) · û > 0. Then find the minimal positive integers k,m such that
k(y + bz) · û = mz · û. Below Px is the path measure of the Markov chain (Yk) and
then P0,x the measure of the walks (X, X̃) as before.
Px{Y1 − Y0 = ky + (kb−m)z}
≥ P0,x
X̃µ̃1 = x+ ky + (k + 1)bz , Xµ1 = (m+ b)z , β = β̃ = ∞
P Txω0 {Xi(b+1)+1 = i(y + bz) + z, . . . , Xi(b+1)+b = i(y + bz) + bz,
X(i+1)(b+1) = (i+ 1)(y + bz) for 0 ≤ i ≤ k − 1, and then
Xk(b+1)+1 = k(y + bz) + z , . . . , Xk(b+1)+b = k(y + bz) + bz }
× P ω0 {X1 = z , X2 = 2z , . . . , Xm+b = (m+ b)z }
× P ωx+ky+(k+1)bz{β = ∞}P ω(m+b)z{β = ∞}
Regardless of possible intersections of the paths, assumption (1.2) and inequality (3.7)
imply that the quantity above has a positive lower bound that is independent of x.
The assumption that y, z are nonzero and noncollinear ensures that ky+(kb−m)z 6= 0.
Case 2: y is collinear with z. Then there is a vector w 6∈ Rz such that Eπ0,w > 0. If
w · û ≤ 0, then by Hypothesis (N) there exists u such that u · û > 0 and Eπ0,wπ0,u > 0.
If u is collinear with z, then replacing z by u and y by w puts us back in Case 1. So,
replacing w by u if necessary, we can assume that w · û > 0. We have four subcases,
depending on whether x = 0 or not and y · û < 0 or not.
(2.a) The case x 6= 0 is resolved simply by taking paths consisting of only w-steps
for one walk and only z-steps for the other, until they meet on a common level and
then never backtrack.
(2.b) The case y · û > 0 corresponds to Case 3 in the proof of [11, Lemma 5.5].
(2.c) The only case left is x = 0 and y · û < 0. Let b and c be the smallest positive
integers such that (y + bw) · û ≥ 0 and (y + cz) · û > 0. Choose minimal positive
QUENCHED FUNCTIONAL CLT FOR RWRE 41
integeres m ≥ b and n > c such that m(w · û) = n(z · û). Then,
P0{Y1 − Y0 = nz −mw}
≥ P0,0{X̃µ̃1 = y + bw + nz,Xµ1 = y + (b+m)w}
P ω0 {Xi = iw for 1 ≤ i ≤ b and Xb+1+j = y + (b+ j)w for 0 ≤ j ≤ m}
× P ω0 {Xi = iw for 0 ≤ i ≤ b,Xb+1 = bw + z and then
Xb+1+j = y + bw + jz for 1 ≤ j ≤ n}
× P ωy+(b+m)w(β = ∞)P ωy+bw+nz(β = ∞)
Since w and z are noncollinear, mw 6= nz. For the same reason, w-steps are always
taken at points not visited before. This makes the above lower bound positive. By
the choice of b and z · û > 0, neither walk dips below level 0.
We can see that the first common regeneration level for the two paths is (y+ bw+
nz)·û. The first walk backtracks from level bw ·û so this is not a common regeneration
level. The second walk splits from the first walk at bw, takes a z-step up, and then
backtracks using a y-step. So the common regeneration level can only be at or above
level (y+ bw+(c+1)z) · û. The fact that n > c ensures that (y+ bw+nz) · û is high
enough. The minimality of n ensures that this is the first such level. �
Now that the assumptions have been checked, Theorem A.1 gives constants 0 <
C <∞ and 0 < η < 1 such that
qi−1(x, y)h(y) ≤ Cn1−η for all x ∈ Vd and n ≥ 1.
Going back to (7.23) and collecting the bounds along the way gives the final estimate
E0,0|X[0,n) ∩ X̃[0,n)| ≤ Cn1−η
for all n ≥ 1. This is (6.2) which was earlier shown to imply condition (2.1) required
by Theorem 2.1. Previous work in Sections 2 and 5 convert the CLT from Theorem
2.1 into the main result Theorem 1.1. The entire proof is complete, except for the
Green function estimate furnished by the Appendix.
Appendix A. A Green function type bound
Let us write a d-vector in terms of coordinates as x = (x1, . . . , xd), and similarly
for random vectors X = (X1, . . . , Xd).
Let Y = (Yk)k≥0 be a Markov chain on Z
d with transition probability q(x, y),
and let Ȳ = (Ȳk)k≥0 be a symmetric random walk on Z
d with transition probability
q̄(x, y) = q̄(y, x) = q̄(0, y − x). Make the following assumptions.
(A.i) A third moment bound E0|Ȳ1|3 <∞.
42 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
(A.ii) Some uniform nondegeneracy: there is at least one index j ∈ {1, . . . , d} and
a constant κ0 such that the coordinate Y
j satisfies
Px{Y j1 − Y
0 ≥ 1} ≥ κ0 > 0 for all x. (A.1)
(The inequality ≥ 1 can be replaced by ≤ −1, the point is to assure that a cube is
exited fast enough.) Furthermore, for every i ∈ {1, . . . , d}, if the one-dimensional
random walk Ȳ i is degenerate in the sense that q̄(0, y) = 0 for yi 6= 0, then so is
the process Y i in the sense that q(x, y) = 0 whenever xi 6= yi. In other words, any
coordinate that can move in the Y chain somewhere in space can also move in the Ȳ
walk.
(A.iii) Most importantly, assume that for any initial state x the transitions q and
q̄ can be coupled so that
Px,x[Y1 6= Ȳ1] ≤ Ce−α1|x|
where 0 < C, α1 <∞ are constants independent of x.
Throughout the section C will change value but α1 remains the constant in the
assumption above. Let h be a function on Zd such that 0 ≤ h(x) ≤ Ce−α2|x| for
constants 0 < α2, C < ∞. This section is devoted to proving the following Green
function type bound on the Markov chain.
Theorem A.1. There are constants 0 < C, η <∞ such that
Ezh(Yk) =
P0(Yk = y) ≤ Cn1−η for all n ≥ 1 and z ∈ Zd.
To prove the estimate, we begin by discarding terms outside a cube of side r =
c1 logn. Bounding probabilities crudely by 1 gives
|y|>c1 logn
Pz(Yk = y) ≤ n
|y|>c1 logn
h(y) ≤ Cn
k>c1 logn
kd−1e−α2k
k>c1 logn
e−(α2/2)k ≤ Cne−(α2/2)c1 logn ≤ Cn1−η
as long as n is large enough so that kd−1 ≤ eα2k/2, and this works for any c1.
B = [−c1 log n, c1 log n]d.
Since h is bounded, it now remains to show that
Pz(Yk ∈ B) ≤ Cn1−η. (A.2)
For this we can assume z ∈ B since accounting for the time to enter B for the first
time can only improve the estimate.
QUENCHED FUNCTIONAL CLT FOR RWRE 43
Bound (A.2) will be achieved in two stages. First we show that the Markov chain Y
does not stay in B longer than a time whose mean is a power of the size of B. Second,
we show that often enough Y follows the random walk Ȳ during its excursions outside
B. The random walk excursions are long and thereby we obtain (A.2). Thus our first
task is to construct a suitable coupling of Y and Ȳ .
Lemma A.1. Let ζ = inf{n ≥ 1 : Ȳ ∈ A} be the first entrance time of Ȳ into some
set A ⊆ Zd. Then we can couple Y and Ȳ so that
Px,x[ Yk 6= Ȳk for some 1 ≤ k ≤ ζ ] ≤ CEx
e−α1|Ȳk|.
The proof shows that the statement works also if ζ = ∞ is possible, but we will
not need this case.
Proof. For each state x create an i.i.d. sequence (Zxk , Z̄
k )k≥1 such that Z
k has distri-
bution q(x, x+ · ), Z̄xk has distribution q̄(x, x+ · ) = q̄(0, · ), and each pair (Zxk , Z̄xk )
is coupled so that P (Zxk 6= Z̄xk ) ≤ Ce−α1|x|. For distinct x these sequences are inde-
pendent.
Construct the process (Yn, Ȳn) as follows: with counting measures
Ln(x) =
1I{Yk = x} and L̄n(x) =
1I{Ȳk = x} (n ≥ 0)
and with initial point (Y0, Ȳ0) given, define for n ≥ 1
Yn = Yn−1 + Z
Ln−1(Yn−1)
and Ȳn = Ȳn−1 + Z̄
Ȳn−1
L̄n−1(Ȳn−1)
In words, every time the chain Y visits a state x, it reads its next jump from a new
variable Zxk which is then discarded and never used again. And similarly for Ȳ . This
construction has the property that, if Yk = Ȳk for 0 ≤ k ≤ n with Yn = Ȳn = x, then
the next joint step is (Zxk , Z̄
k ) for k = Ln(x) = L̄n(x). In other words, given that the
processes agree up to the present and reside together at x, the probability that they
separate in the next step is bounded by Ce−α1|x|.
44 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
Now follow self-evident steps.
Px,x[ Yk 6= Ȳk for some 1 ≤ k ≤ ζ ]
Px,x[ Yj = Ȳj ∈ Ac for 1 ≤ j < k, Yk 6= Ȳk ]
1I{ Yj = Ȳj ∈ Ac for 1 ≤ j < k }PYk−1,Ȳk−1(Y1 6= Ȳ1)
1I{ Yj = Ȳj ∈ Ac for 1 ≤ j < k }e−α1|Ȳk−1|
≤ CEx
e−α1|Ȳm|. �
For the remainder of this section Y and Ȳ are always coupled in the manner that
satisfies Lemma A.1.
Lemma A.2. Let j ∈ {1, . . . , d} be such that the one-dimensional random walk Ȳ j is
not degenerate. Let r0 be a positive integer and w̄ = inf{n ≥ 1 : Ȳ jn ≤ r0} the first
time the random walk Ȳ enters the half-space H = {x : xj ≤ r0}. Couple Y and Ȳ
starting from a common initial state x /∈ H. Then there is a constant C independent
of r0 such that
Px,x[ Yk 6= Ȳk for some k ∈ {1, . . . , w̄} ] ≤ Ce−α1r0 for all r0 ≥ 1.
The same result holds for H = {x : xj ≥ −r0}.
Proof. By Lemma A.1
Px,x[ Yk 6= Ȳk for some k ∈ {1, . . . , w̄} ] ≤ CEx
[ w̄−1∑
e−α1|Ȳk|
≤ CExj
[ w̄−1∑
e−α1Ȳ
t=r0+1
e−α1tg(xj, t)
where for s, t ∈ [r0 + 1,∞)
g(s, t) =
Ps[Ȳ
n = t , w̄ > n]
is the Green function of the half-line (−∞, r0] for the one-dimensional random walk
Ȳ j. This is the expected number of visits to t before entering (−∞, r0], defined on
p. 209 in Spitzer [13]. The development in Sections 18 and 19 in [13] gives the bound
g(s, t) ≤ C(1 + (s− r0 − 1) ∧ (t− r0 − 1)) ≤ C(t− r0), s, t ∈ [r0 + 1,∞). (A.3)
QUENCHED FUNCTIONAL CLT FOR RWRE 45
Here is some more detail. Shift r0 + 1 to the origin to match the setting in [13].
Then P19.3 on p. 209 gives
g(x, y) =
u(x− n)v(y − n) for x, y > 0
where the functions u and v are defined on p. 201. For a symmetric random walk
u = v. P18.7 on p. 202 implies that
v(m) =
P[Z1 + · · ·+ Zk = m]
where c is a certain constant and {Zi} are i.i.d. strictly positive, integer-valued ladder
variables for the underlying random walk. Now one can show inductively that v(m) ≤
v(0) for each m so the quantities u(m) = v(m) are bounded. This justifies (A.3).
Continuing from further above we get the estimate claimed in the statement:
[ w̄−1∑
e−α1|Ȳk|
(t− r0)e−α1t ≤ Ce−α1r0. �
For the next lemmas abbreviate Br = [−r, r]d for d-dimensional centered cubes.
Lemma A.3. With α1 given in the coupling hypothesis (A.iii), fix any positive constant
κ1 > 2α
1 . Consider large positive integers r0 and r that satisfy
2α−11 log r ≤ r0 ≤ κ1 log r < r.
Then there exist a positive integer m0 and a constant 0 < α3 <∞ such that, for large
enough r,
x∈BrrBr0
Px[without entering Br0 chain Y exits Br by time r
m0 ] ≥ α3
. (A.4)
Proof. We consider first the case where x ∈ BrrBr0 has a coordinate xj that satisfies
xj ∈ [−r,−r0 − 1] ∪ [r0 + 1, r] and Ȳ j is nondegenerate. For this case we can take
m0 = 4. A higher m0 may be needed to move a suitable coordinate out of the interval
[−r0, r0]. This is done in the second step of the proof.
The same argument works for both xj ∈ [−r,−r0−1] and xj ∈ [r0+1, r]. We treat
the case xj ∈ [r0 + 1, r]. One way to realize the event in (A.4) is this: starting at xj ,
the Ȳ j walk exits [r0 + 1, r] by time r
4 through the right boundary into [r + 1,∞),
and Y and Ȳ stay coupled together throughout this time. Let ζ̄ be the time Ȳ j exits
[r0+1, r] and w̄ the time Ȳ
j enters (−∞, r0]. Then w̄ ≥ ζ̄. Thus the complementary
probability of (A.4) is bounded by
Pxj{ Ȳ j exits [r0 + 1, r] into (−∞, r0] }
+ Pxj{ζ̄ > r4} + Px,x{ Yk 6= Ȳk for some k ∈ {1, . . . , w̄} }.
(A.5)
46 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
We treat the terms one at a time. From the development on p. 253-255 in [13] we
get the bound
Pxj{ Ȳ j exits [r0 + 1, r] into (−∞, r0] } ≤ 1−
(A.6)
for some constant α4 > 0. In some more detail: P22.7 on p. 253, the inequality in
the third display of p. 255, and the third moment assumption on the steps of Ȳ give
a lower bound
Pxj{ Ȳ j exits [r0 + 1, r] into [r + 1,∞) } ≥
xj − r0 − 1− c1
r − r0 − 1
(A.7)
for the probability of exiting to the right. Here c1 is a constant that comes from
the term denoted in [13] by M
s=0(1 + s)a(s) whose finiteness follows from the
third moment assumption. The text on p. 254-255 suggests that these steps need the
aperiodicity assumption. This need for aperiodicity can be traced back via P22.5 to
P22.4 which is used to assert the boundedness of u(x) and v(x). But as we observed
above in the derivation of (A.3) boundedness of u(x) and v(x) is true without any
additional assumptions.
To go forward from (A.7) fix any m > c1 so that the numerator above is positive
for xj = r0 + 1 +m. The probability in (A.7) is minimized at x
j = r0 + 1, and from
xj = r0 + 1 there is a fixed positive probability θ to take m steps to the right to get
past the point xj = r0 + 1 +m. Thus for all x
j ∈ [r0 + 1, r] we get the lower bound
Pxj{ Ȳ j exits [r0 + 1, r] into [r + 1,∞) } ≥
θ(m− c1)
r − r0 − 1
and (A.6) is verified.
As in (A.3) let g(s, t) be the Green function of the random walk Ȳ j for the half-
line (−∞, r0], and let g̃(s, t) be the Green function for the complement of the interval
[r0 + 1, r]. Then g̃(s, t) ≤ g(s, t), and by (A.3) we get this moment bound:
Exj [ ζ̄ ] =
t=r0+1
g̃(xj , t) ≤
t=r0+1
g(xj, t) ≤ Cr2.
Consequently, uniformly over xj ∈ [r0 + 1, r],
Pxj [ζ̄ > r
4] ≤ C
. (A.8)
From Lemma A.2
Px[ Yk 6= Ȳk for some k ∈ {1, . . . , w̄} ] ≤ Ce−α1r0 . (A.9)
Putting bounds (A.6), (A.8) and (A.9) together gives an upper bound of
1 − α4
+ Ce−α1r0
QUENCHED FUNCTIONAL CLT FOR RWRE 47
for the sum in (A.5) which bounds the complement of the probability in (A.4). By
assumption r0 > 2α
1 log r, so for large enough r the sum above is not more than
1− α3/r for some constant α3 > 0.
The lemma is now proved for those x ∈ Br r Br0 for which some
j ∈ J ≡ {1 ≤ j ≤ d : the one-dimensional walk Ȳ j is nondegenerate}
satisfies xj ∈ [−r,−r0 − 1] ∪ [r0 + 1, r]. Now suppose x ∈ Br r Br0 but all j ∈ J
satisfy xj ∈ [−r0, r0]. Let
T = inf{n ≥ 1 : Y jn /∈ [−r0, r0] for some j ∈ J}.
The first part of the proof gives Px-almost surely
PYT [without entering Br0 chain Y exits Br by time r
4/2] ≥ α3
Replacing r4 by r4/2 only affects the constant in (A.8). It can of course happen that
YT /∈ Br but then we interpret the above probability as one.
By the Markov property it remains to show that for a suitable m0
Px[T ≤ rm0/2] : x ∈ Br r Br0 but xj ∈ [−r0, r0] for all j ∈ J
(A.10)
is bounded below by a positive constant. Hypothesis (A.1) implies that for some
constant b1, ExT ≤ br01 uniformly over the relevant x. This is because one way to
realize T is to wait until some coordinate Y j takes 2r0 successive identical steps.
By hypothesis (A.1) this random time is stochastically bounded by a geometrically
distributed random variable.
It is also necessary for this argument that during time [0, T ] the chain Y does not
enter Br0. Indeed, under the present assumptions the chain never enters Br0 . This is
because for x ∈ BrrBr0 some coordinate i must satisfy xi ∈ [−r,−r0−1]∪ [r0+1, r].
But now this coordinate i /∈ J , and so by hypothesis (A.ii) the one-dimensional
process Y i is constant, Y in = x
i /∈ [−r0, r0] for all n.
Finally, the required positive lower bound for (A.10) comes by Chebychev. Take
m0 ≥ κ1 log b1 +1 where κ1 comes from the assumptions of the lemma. Then, by the
hypothesis r0 ≤ κ1 log r,
Px[T > r
m0/2] ≤ 2r−m0br01 ≤ 2rκ1 log b1−m0 ≤ 12
for r ≥ 4. �
We come to one of the main auxiliary lemmas of this development.
Lemma A.4. Let U = inf{n ≥ 0 : Yn /∈ Br} be the first exit time from Br for the
Markov chain Y . Then there exist finite positive constants C1, m1 such that
Ex(U) ≤ C1rm1 for all 1 ≤ r <∞.
48 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
Proof. First observe that supx∈Br Ex(U) < ∞ by assumption (A.1) because by a
geometric time some coordinate Y j has experienced 2r identical steps in succession.
Throughout, let r0 < r satisfy the assumptions of Lemma A.3. Once the statement
is proved for large enough r, we obtain it for all r ≥ 1 by increasing C1.
Let 0 = T0 = S0 ≤ T1 ≤ S1 ≤ T2 ≤ · · · be the successive exit and entrance times
into Br0 . Precisely, for i ≥ 1 as long as Si−1 <∞
Ti = inf{n ≥ Si−1 : Yn /∈ Br0} and Si = inf{n ≥ Ti : Yn ∈ Br0}
Once Si = ∞ then we set Tj = Sj = ∞ for all j > i. If Y0 ∈ Br r Br0 then also
T1 = 0. Again by assumption (A.1) (and as observed in the proof of Lemma A.3)
there is a constant 0 < b1 <∞ such that
x∈Br0
Ex[T1] ≤ br01 . (A.11)
So a priori T1 is finite but S1 = ∞ is possible. Since T1 ≤ U <∞ we can decompose
as follows:
Ex[U ] =
Ex[U, Tj ≤ U < Sj ]
Ex[Tj , Tj ≤ U < Sj] +
Ex[U − Tj , Tj ≤ U < Sj].
(A.12)
We first treat the last sum in (A.12). By an inductive application of Lemma A.3,
for any z ∈ Br rBr0 ,
Pz[U > jr
m0 , U < S1] ≤ Pz[ Yk ∈ Br r Br0 for k ≤ jrm0 ]
1I{ Yk ∈ Br r Br0 for k ≤ (j − 1)rm0 }PY(j−1)rm0 { Yk ∈ Br r Br0 for k ≤ r
≤ · · · ≤ (1− α3r−1)j .
Utilizing this, still for z ∈ Br r Br0 ,
Ez[U, U < S1] =
Pz[U > m , U < S1]
≤ rm0
Pz[U > jr
m0 , U < S1] ≤ rm0+1α−13 .
(A.13)
Next we take into consideration the failure to exit Br during the earlier excursions
in Br rBr0 . Let
Hi = {Yn ∈ Br for Ti ≤ n < Si}
be the event that in between the ith exit from Br0 and entrance back into Br0 the
chain Y does not exit Br. We shall repeatedly use this consequence of Lemma A.3:
for i ≥ 1, on the event {Ti <∞}, Px[Hi | FTi] ≤ 1− α3r−1. (A.14)
QUENCHED FUNCTIONAL CLT FOR RWRE 49
Here is the first instance.
Ex[U − Tj , Tj ≤ U < Sj ] = Ex
[ j−1∏
1IHk · 1I{Tj <∞} · EYTj (U, U < S1)
≤ rm0+1α−13 Ex
[ j−1∏
1IHk · 1I{Tj−1 <∞}
≤ rm0+1α−13 (1− α3r−1)j−1.
Note that if YTj above lies outside Br then EYTj (U) = 0. In the other case YTj ∈
Br rBr0 and (A.13) applies. So for the last sum in (A.12):
Ex[U − Tj , Tj ≤ U < Sj] ≤
rm0+1α−13 (1− α3r−1)j−1 ≤ rm0+2α−23 . (A.15)
We turn to the second-last sum in (A.12). Utilizing (A.11) and (A.14),
Ex[Tj , Tj ≤ U < Sj] ≤
[ j−1∏
1IHk · 1I{Tj <∞} · (Ti+1 − Ti)
≤ br01 (1− α3r−1)j−1
[ i−1∏
1IHk · (Ti+1 − Ti)1IHi · 1I{Ti+1 <∞}
(1− α3r−1)j−1−i.
(A.16)
Split the last expectation as
[ i−1∏
1IHk · (Ti+1 − Ti)1IHi · 1I{Ti+1 <∞}
[ i−1∏
1IHk · (Ti+1 − Si)1IHi · 1I{Si <∞}
[ i−1∏
1IHk · (Si − Ti)1IHi · 1I{Ti <∞}
[ i−1∏
1IHk · 1I{Si <∞} · EYSi (T1)
[ i−1∏
1IHk · 1I{Ti <∞} ·EYTi (S1 · 1IH1)
[ i−1∏
1IHk · 1I{Ti−1 <∞}
(br01 + r
m0+1α−13 )
≤ (1− α3r−1)i−1(br01 + rm0+1α−13 ). (A.17)
In the second-last inequality above, before applying (A.14) to theHk’s, EYSi (T1) ≤ b
comes from (A.11). The other expectation is estimated again by iterating Lemma
50 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
A.3 and again with z ∈ Br rBr0 :
Ez(S1 · 1IH1) =
Pz[S1 > m , H1] ≤
Pz[ Yk ∈ Br rBr0 for k ≤ m ]
≤ rm0
Pz[ Yk ∈ Br r Br0 for k ≤ jrm0 ] ≤ rm0+1α−13 .
Insert the bound from line (A.17) back up into (A.16) to get the bound
Ex[Tj , Tj ≤ U < Sj] ≤ (2br01 + rm0+1α−13 )j(1− α3r−1)j−2.
Finally, bound the second-last sum in (A.12):
Ex[Tj , Tj ≤ U < Sj] ≤
2br01 r
2α−23 + r
m0+3α−33
(1− α3r−1)−1.
Taking r large enough so that α3r
−1 < 1/2 and combining this with (A.12) and
(A.15) gives
Ex[U ] ≤ rm0+2α−21 + 4br01 r2α−23 + 2rm0+3α−33 .
Since r0 ≤ κ1 log r for some constant C, the above bound simplifies to C1rm1 . �
For the remainder of the proof we work with B = Br for r = c1 log n. The above
estimate gives us one part of the argument for (A.2), namely that the Markov chain
Y exits B = [−c1 log n, c1 logn]d fast enough.
Let 0 = V0 < U1 < V1 < U2 < V2 < · · · be the successive entrance times Vi into B
and exit times Ui from B for the Markov chain Y , assuming that Y0 = z ∈ B. It is
possible that some Vi = ∞. But if Vi < ∞ then also Ui+1 < ∞ due to assumption
(A.1), as already observed. The time intervals spent in B are [Vi, Ui+1) each of length
at least 1. Thus, by applying Lemma A.4,
Pz(Yk ∈ B) ≤
(Ui+1 − Vi)1I{Vi ≤ n}
EYVi (U1)1I{Vi ≤ n}
≤ C(logn)m1Ez
1I{Vi ≤ n}
(A.18)
QUENCHED FUNCTIONAL CLT FOR RWRE 51
Next we bound the expected number of returns to B by the number of excursions
outside B that fit in a time of length n:
1I{Vi ≤ n}
(Vj − Vj−1) ≤ n
(Vj − Uj) ≤ n
(A.19)
According to the usual notion of stochastic dominance, the random vector (ξ1, . . . , ξn)
dominates (η1, . . . , ηn) if
Ef(ξ1, . . . , ξn) ≥ Ef(η1, . . . , ηn)
for any function f that is coordinatewise nondecreasing. If the {ξi : 1 ≤ i ≤ n} are
adapted to the filtration {Gi : 1 ≤ i ≤ n}, and P [ξi > a|Gi−1] ≥ 1 − F (a) for some
distribution function F , then the {ηi} can be taken i.i.d. F -distributed.
Lemma A.5. There exist positive constants c1, c2 and γ such that the following holds:
the excursion lengths {Vj − Uj : 1 ≤ j ≤ n} stochastically dominate i.i.d. variables
{ηj} whose common distribution satisfies P[η ≥ a] ≥ c1a−1/2 for 1 ≤ a ≤ c2nγ.
Proof. Since Pz[Vj − Uj ≥ a|FUj ] = PYUj [V ≥ a] where V means first entrance time
into B, we shall bound Px[V ≥ a] below uniformly over
x /∈ B :
Pz[YU1 = x] > 0
Fix such an x and an index 1 ≤ j ≤ d such that xj /∈ [−r, r]. Since the coordinate Y j
can move out of [−r, r], this coordinate is not degenerate, and hence by assumption
(A.ii) the random walk Ȳ j is nondegenerate. As before we work through the case
xj > r because the argument for the other case xj < −r is the same.
Let w̄ = inf{n ≥ 1 : Ȳ jn ≤ r} be the first time the one-dimensional random walk
Ȳ j enters the half-line (−∞, r]. If both Y and Ȳ start at x and stay coupled together
until time w̄, then V ≥ w̄. This way we bound V from below. Since the random
walk is symmetric and can be translated, we can move the origin to xj and use classic
results about the first entrance time into the left half-line, T̄ = inf{n ≥ 1 : Ȳ jn < 0}.
Pxj [w̄ ≥ a] ≥ Pr+1[w̄ ≥ a] = P0[T̄ ≥ a] ≥
for a constant α5. The last inequality follows for one-dimensional symmetric walks
from basic random walk theory. For example, combine equation (7) on p. 185 of [13]
with a Tauberian theorem such as Theorem 5 on p. 447 of Feller [7]. Or see directly
Theorem 1a on p. 415 of [7].
52 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
Now start both Y and Ȳ from x. Apply Lemma A.2 and recall that r = c1 log n.
Px[V ≥ a] ≥ Px,x[V ≥ a, Yk = Ȳk for k = 1, . . . , w̄ ]
≥ Px,x[w̄ ≥ a, Yk = Ȳk for k = 1, . . . , w̄ ]
≥ Pxj [w̄ ≥ a]− Px,x[ Yk 6= Ȳk for some k ∈ {1, . . . , w̄} ]
≥ α5√
− Cn−c1α1 .
This gives a lower bound
Px[V ≥ a] ≥
if a ≤ α25(2C)−2n2c1α1 . This lower bound is independent of x. We have proved the
lemma. �
We can assume that the random variables ηj given by the lemma satisfy 1 ≤ ηj ≤
γ and we can assume both c2, γ ≤ 1 because this merely weakens the result. For
the renewal process determined by {ηj} write
S0 = 0 , Sk =
ηj , and K(n) = inf{k : Sk > n}
for the renewal times and the number of renewals up to time n (counting the renewal
S0 = 0). Since the random variables are bounded, Wald’s identity gives
EK(n) · Eη = ESK(n) ≤ n+ c2nγ ≤ 2n,
while
∫ c2nγ
ds ≥ c3nγ/2.
Together these give
EK(n) ≤ 2n
≤ C2n1−γ/2.
Now we pick up the development from line (A.19). Since the negative of the
function of (Vj − Uj)1≤i≤n in the expectation on line (A.19) is nondecreasing, the
stochastic domination of Lemma A.5 gives an upper bound of (A.19) in terms of the
i.i.d. {ηj}. Then we use the renewal bound from above.
1I{Vi ≤ n}
(Vj − Uj) ≤ n
ηj ≤ n
= EK(n) ≤ C2n1−γ/2.
QUENCHED FUNCTIONAL CLT FOR RWRE 53
Returning back to (A.18) to collect the bounds, we have shown that
Pz(Yk ∈ B) ≤ C(logn)m1Ez
1I{Vi ≤ n}
≤ C(logn)m1C2n1−γ/2
and thereby verified (A.2).
References
[1] N. Berger and O. Zeitouni. A quenched invariance principle for certain ballistic
random walks in i.i.d. environments.
http://front.math.ucdavis.edu/math.PR/0702306.
[2] E. Bolthausen and A.-S. Sznitman. On the static and dynamic points of view for
certain random walks in random environment. Methods Appl. Anal., 9(3):345–
375, 2002. Special issue dedicated to Daniel W. Stroock and Srinivasa S. R.
Varadhan on the occasion of their 60th birthday.
[3] E. Bolthausen and A.-S. Sznitman. Ten lectures on random media, volume 32 of
DMV Seminar. Birkhäuser Verlag, Basel, 2002.
[4] J. Bricmont and A. Kupiainen. Random walks in asymmetric random environ-
ments. Comm. Math. Phys., 142(2):345–420, 1991.
[5] Y. Derriennic and M. Lin. The central limit theorem for Markov chains started
at a point. Probab. Theory Related Fields, 125(1):73–76, 2003.
[6] S. N. Ethier and T. G. Kurtz. Markov processes. Wiley Series in Probability and
Mathematical Statistics: Probability and Mathematical Statistics. John Wiley
& Sons Inc., New York, 1986. Characterization and convergence.
[7] W. Feller. An introduction to probability theory and its applications. Vol. II.
Second edition. John Wiley & Sons Inc., New York, 1971.
[8] M. Maxwell and M. Woodroofe. Central limit theorems for additive functionals
of Markov chains. Ann. Probab., 28(2):713–724, 2000.
[9] F. Rassoul-Agha and T. Seppäläinen. An almost sure invariance principle for
random walks in a space-time random environment. Probab. Theory Related
Fields, 133(3):299–314, 2005.
[10] F. Rassoul-Agha and T. Seppäläinen. Ballistic random walk in a random en-
vironment with a forbidden direction. ALEA Lat. Am. J. Probab. Math. Stat.,
1:111–147 (electronic), 2006.
[11] F. Rassoul-Agha and T. Seppäläinen. Quenched invariance principle for multi-
dimensional ballistic random walk in a random environment with a forbidden
direction. Ann. Probab., 35(1):1–31, 2007.
[12] M. Rosenblatt. Markov processes. Structure and asymptotic behavior. Springer-
Verlag, New York, 1971. Die Grundlehren der mathematischen Wissenschaften,
Band 184.
[13] F. Spitzer. Principles of random walks. Springer-Verlag, New York, second
edition, 1976. Graduate Texts in Mathematics, Vol. 34.
54 F. RASSOUL-AGHA AND T. SEPPÄLÄINEN
[14] A.-S. Sznitman. Slowdown estimates and central limit theorem for random walks
in random environment. J. Eur. Math. Soc. (JEMS), 2(2):93–143, 2000.
[15] A.-S. Sznitman. Topics in random walks in random environment. In School
and Conference on Probability Theory, ICTP Lect. Notes, XVII, pages 203–266
(electronic). Abdus Salam Int. Cent. Theoret. Phys., Trieste, 2004.
[16] A.-S. Sznitman and O. Zeitouni. An invariance principle for isotropic diffusions
in random environment. Invent. Math., 164(3):455–567, 2006.
[17] A.-S. Sznitman and M. Zerner. A law of large numbers for random walks in
random environment. Ann. Probab., 27(4):1851–1869, 1999.
[18] O. Zeitouni. Random walks in random environments, volume 1837 of Lecture
Notes in Mathematics. Springer-Verlag, Berlin, 2004. Lectures from the 31st
Summer School on Probability Theory held in Saint-Flour, July 8–25, 2001,
Edited by Jean Picard.
[19] M. P. W. Zerner. Lyapounov exponents and quenched large deviations for mul-
tidimensional random walk in random environment. Ann. Probab., 26(4):1446–
1476, 1998.
F. Rassoul-Agha, 155 S 1400 E, Salt Lake City, UT 84112
E-mail address : firas@math.utah.edu
URL: www.math.utah.edu/∼firas
T. Seppäläinen, 419 Van Vleck Hall, Madison, WI 53706
E-mail address : seppalai@math.wisc.edu
URL: www.math.wisc.edu/∼seppalai
	1. Introduction and main result
	2. Preliminaries for the proof.
	3. Basic estimates for non-nestling RWRE
	4. Invariant measure and ergodicity
	5. Change of measure
	6. Reduction to path intersections
	7. Bound on intersections
	Appendix A. A Green function type bound
	References
ABSTRACT
  We consider a non-nestling random walk in a product random environment. We
assume an exponential moment for the step of the walk, uniformly in the
environment. We prove an invariance principle (functional central limit
theorem) under almost every environment for the centered and diffusively scaled
walk. The main point behind the invariance principle is that the quenched mean
of the walk behaves subdiffusively.

<|endoftext|><|startoftext|>
The Effect of Annealing Temperature on Statistical Properties of WO3 Surface
G. R. Jafari a,b, A. A. Saberi c, R. Azimirad c, A. Z. Moshfegh c, and S. Rouhani c
Department of Physics, Shahid Beheshti University, Evin, Tehran 19839, Iran
Department of Nano-Science, IPM, P. O. Box 19395-5531, Tehran, Iran
Department of Physics, Sharif University of Technology, P.O. Box 11365-9161, Tehran, Iran
We have studied the effect of annealing temperature on the statistical properties of WO3 surface
using atomic force microscopy techniques (AFM). We have applied both level crossing and structure
function methods. Level crossing analysis indicates an optimum annealing temperature of around
400oC at which the effective area of the WO3 thin film is maximum, whereas composition of the
surface remains stoichiometric. The complexity of the height fluctuation of surfaces was charac-
terized by roughness, roughness exponent and lateral size of surface features. We have found that
there is a phase transition at around 400oC from one set to two sets of roughness parameters. This
happens due to microstructural changes from amorphous to crystalline structure in the samples that
has been already found experimentally.
I. INTRODUCTION
Transition metal oxides represent a large family of
materials possessing various interesting properties, such
as superconductivity, colossal magneto-resistance and
piezoelectricity. Among them, tungsten oxide is of in-
tense interest and has been investigated extensively for its
distinctive properties. With outstanding electrochromic
[1, 2, 3, 4, 5, 6], photochromic [7], gaschromic [8], gas
sensor [9, 10, 11], photo-catalyst [12], and photolumines-
cence properties [13], tungsten oxide has been used to
construct ”smart-window”, anti-glare rear view mirrors
of automobile, non-emissive displays, optical recording
devices, solid-state gas sensors, humidity and tempera-
ture sensors, biosensors, photonic crystals, and so forth.
WO3 thin films can be prepared by various deposition
techniques such as thermal evaporation [3, 8], spray py-
rolysis [14], sputtering [6], pulsed laser ablation [10, 11],
sol-gel coating [2, 5, 15], and chemical vapor deposition
[16].
The gas sensitivity of WO3 heavily depends upon film
parameters such as composition, morphology (e.g. grain
size), nanostructure and microstructure (e.g. porosity,
surface-to-volume ratio). Film parameters are related to
the deposition technique used, the deposition conditions
and the subsequent annealing process. Annealing,
which is an essential process to obtain stable films with
well-defined microstructure, causes stoichiometry and
microstructural changes that have a high influence on the
sensing characteristics of the films [17]. Moreover, the
surface structure and surface morphology of the metal
oxides are also important for different applications. In
fact, the electrochromic devices are made of amorphous
oxides [1], while crystalline phase plays a major role in
catalysts and sensors [17]. This is because, the minor
change in their chemical composition and crystalline
structure could modify different properties of the metal
oxides.
In practice, one of the effective ways to modify the
surface morphology is annealing process at various
temperatures. So far, most of morphological analysis
related to the WO3 surface were accessible through
the experimental methods. Usually these analysis
are rigorous and time consuming. Moreover, lack of
the suitable analysis for AFM data to find the nano
and microstructural properties of surfaces was feeling
perfectly.
In this article, we introduce the methods: roughness
analysis and level crossing as suitable candidates and
show that we can get easily the structure and morpho-
logical properties of a surface in a fast manner, only
using the AFM observation as an initial data.
The roughness of a surface has been studied as a
simple growth model using analytical and numerical
methods [18, 19, 20, 21, 22, 23, 24, 25, 26]. These
studies quite generally proposed that the height fluc-
tuations have a self-similar character and their average
correlations exhibit a dynamic scaling form. Also some
authors recently use the average frequency of positive
slope level crossing to provide further complete analysis
on roughness of a surface [27]. This stochastic approach
has turned out to be a promising tool also for other
systems with scale dependent complexity, such as in
surface growth where one would like to measure the
roughness [28]. Some authors have applied this method
to study the fluctuations of velocity fields in Burgers
turbulence [29] and the Kardar-Parisi-Zhang equation in
(d+1)-dimensions [30] and analyzing the stock market
[31].
In this work, we have used the scaling analysis to
determine the roughness, roughness exponent and the
lateral size of surface features. Moreover, level crossing
analysis has been utilized to estimate the effective area
of a surface.
This paper is organized as follows: In section II, we
have discussed about the film preparation and experi-
http://arxiv.org/abs/0704.1023v1
mental results obtained from AFM, XPS and UV-visible
spectrophotometer for the annealed samples at the vari-
ous temperatures. In section III, we have introduced the
analytical methods briefly. Data description and data
analysis based on the statistical parameters of WO3 sur-
face as a function of annealing temperatures are given in
section IV.Finally, section V concludes presented results.
II. EXPERIMENTALS
Thin films of WO3 were deposited on microscope slide
glass using thermal evaporation method. The deposition
system was evacuated to a base pressure of ∼ 4×10−3Pa.
Thickness of the deposited films was considered about
200 nm measured by the stylus and optical techniques.
More details about the other deposition parameters of
the films are recently reported elsewhere [32].
To study the effect of annealing temperature on sur-
face structure and optical properties of the samples, they
were annealed at 200, 300, 350, 400, 450, and 500oC in
air for a period of 60 min. Optical transmission and
reflection measurements of the deposited films were per-
formed in a range of 300-1100 nm wavelength using a
Jascow V530 ultraviolet (UV)-visible spectrophotometer
with resolution of 1 nm.
X-ray photoelectron spectroscopy (XPS) using a Specs
EA 10 Plus concentric hemispherical analyzer (CHA)
with Al Kα anode at energy of 1486.6 eV was employed
to study the atomic composition and chemical state of
the tungsten oxide thin films. The pressure in the ultra
high vacuum surface analysis chamber was less than
1.0×10−7Pa. All binding energy values were determined
by calibration and fixing the C(1s) line to 285.0 eV . The
XPS data analysis and deconvolution were performed
by SDP (version 4.0) software. The nanoscale Surface
topography of the deposited films was investigated by
Thermo Microscope Autoprobe CP-Research atomic
force microscopy (AFM) in air with a silicon tip of 10
nm radius in contact method. The AFM images were
recorded with resolution of about 20 nm in a scale of
5× 5 µm.
A. XPS Characterization
The elemental and chemical characterizations of the
films were performed by XPS. Figure 1a shows theW (4f)
core level spectra recorded on the ”as deposited” WO3
sample, and the results of its fitting analysis. To repro-
duce the experimental data, one doublet function was
used for the W (4f) component. This contains W (4f7/2)
at 35.6 eV and W (4f5/2) at 37.8 eV with a full-width
at half-maximum (FWHM) of 1.75± 0.04 eV . The area
ratio of these two peaks is 0.75 which is supported by
the spin-orbit splitting theory of 4f levels. Moreover,
the structure was shifted by 5 eV toward higher energy
FIG. 1: W (4f) core level spectra of WO3 thin films: a) ”as
deposited” and b) annealed at 500oC.
relative to the metal state.
It is thus clear that the main peaks in our XPS spectrum
attributed to the W 6+ state on the surface [1, 2, 33]. In
stoichiometricWO3, the six valence electrons of the tung-
sten atom are transferred into the oxygen p-like bands,
which are thus completely filled. In this case, the tung-
sten 5d valence electrons have no part of their wave func-
tion near the tungsten atom and the remaining electrons
in the tungsten atom experience a stronger Coulomb in-
teraction with the nucleus than in the case of tungsten
atom in a metal, in which the screening of the nucleus
has a component due to the 5d valence electrons. There-
fore, the binding energy of the W (4f) level is larger in
WO3 than in metallic tungsten. If an oxygen vacancy
exists, the electronic density near its adjacent W atom
increases, the screening of its nucleus is higher and, thus,
the 4f level energy is expected to be at lower binding
energy [1].
By increasing the annealing temperature it was ob-
served that the position of W (4f) peak did not obviously
change. But for WO3 thin film annealed at 500
oC (Fig.
1b), the W (4f) peak moved to a lower binding energy so
that W (4f7/2) position was observed at 35.0 eV . This
can be related to oxygen vacancy at this high annealing
temperature and formation of W 5+.
B. Optical Characterization
The transmittance and reflectance spectra in the
visible and infrared range recorded for the WO3 thin
films before and after annealing at different temperatures
(Fig. 2a). It is seen that, the transmittance of the ”as
deposited” films in the visible range varies from about
80 up to nearly 100% (without considering the substrate
contribution). Correspondingly, maximum value of the
reflectance for both the film and the substrate is about
20% (the reflectance from the bare glass substrate was
measured about 10%). The sharp reduction in the
transmittance spectrum at the wavelength of ∼ 350nm
is due to the fundamental absorption edge that was also
reported previously [1, 2, 3].
The oscillations in the transmission and reflection
spectra are caused by optical interference. The optical
transmittance of WO3 films strongly depends on the
oxygen content of the films. In fact, non-stoichiometric
films with composition of WO3−x show a blue tinge for
x > 0.03 [34].
The ”as deposited” pure tungsten oxide films were
highly transparent with no observable blue coloration,
under our experimental conditions. As can be seen
from Fig. 2a after annealing process at 200 to 400oC,
the transmittance and reflectance of the WO3 films
have not changed significantly. Only, the position of
the oscillations altered due to thickness reduction and
film condensation after the heat-treatment process
[1]. At 500oC transmittance and reflectance of the
annealed WO3 film is reduced about 10%, therefore at
this temperature, the film turn into non-stoichiometric
composition, so that it could be seen from changing
color of the film.
The optical gap (Eg) was evaluated from the ab-
sorption coefficient (α) using the standard relation:
(αhν)1/η = A(hν − Eg), in which η depends on the
kind of optical transition in semiconductors, and α was
determined near the absorption edge using the simple
relation: α = ln[(1 − R)2/T ]/d ,where d is thickness
of the film. More useful explanation about the optical
band gap calculation reported in [32]. The relationship
between the optical band gap energy and annealing
temperature for WO3 thin films has been shown in Fig.
2b. As can be seen from it, the optical band gap for
the ”as deposited” WO3 evaluated 3.4 eV . Amorphous
structure of the ”as deposited” WO3 causes to Eg is
bigger than 2.7 eV . After annealing samples at 200 and
300oC, the optical band gap decreased slightly about 0.1
eV which can be related to condensation of the films.
But the optical band gap of the WO3 annealed at 400
reduced to 3.1 eV due to crystallization of the film. This
reduction continues to 2.5 eV for the sample annealed
at 500oC. Reason of the Eg becomes smaller than 2.7
eV is oxygen vacancy at this temperature as was seen in
Fig 1b. It is to note that for evaporated WO3 films one
has found 2.7 < Eg < 3.5 eV [1].
λ ( )
400 600 800 1000
27 oC
200 oC
300 oC
400 oC
500 oC
100 200 300 400 500
FIG. 2: a) Optical transmittance (T) and reflectance (R) and
b) Optical band gap energy of the WO3 thin films annealed
at different temperatures.
C. AFM Analysis
To study the effect of the annealing process on the sur-
face morphology of the films, we have shown AFM images
of the WO3 surfaces annealed at the different tempera-
tures : 200, 300, 350, 400, 450, 500oC in Figure 1. As
can be seen from Fig. 1, for the annealed film at 200oC,
it seems that the surface morphology of the film is rela-
tively the same with a smooth surface, amorphous struc-
ture and nanometric grain size, as also reported by other
investigators for WO3 films [35, 36]. We have also ob-
served similar image for the ”as deposited” WO3 which
is not shown here. For WO3 thin films, increasing an-
nealing temperature to 350oC did not significantly ef-
fect on surface parameters because it is low temperature
for crystallization of WO3 [1]. But at higher annealing
temperatures 400, 450 and 500oC, surface grain size and
FIG. 3: AFM images of WO3 thin films annealed at various
temperatures a) 200, b) 300, c) 350, d) 400, e) 450 and f)
500oC, respectively.
roughness begin to increase. The more precise analysis
of these surfaces are given in the next section.
III. STATISTICAL QUANTITIES
A. Roughness Analysis
It is also known that to derive the quantitative infor-
mation of the surface morphology one may consider a
sample of size L and define the mean height of growing
film h and its roughness σ by:
σ(L, t) = (〈(h − h)2〉)1/2 (1)
where t is growing time and 〈· · ·〉 denotes an averaging
over different samples, respectively. Moreover, growing
time is a factor which can be applied to control the sur-
face roughness of thin films.
Let us now calculate the roughness exponent of the
growing surface. Starting from a flat interface (one of
the possible initial conditions), it is conjectured that a
scaling of lenght by factor b and of time by factor bz (z is
the dynamical scaling exponent), rescales the roughness
σ by factor bχ as follows [18]:
σ(bL, bzt) = bασ(L, t) (2)
which implies that
σ(L, t) = Lαf(t/Lz). (3)
For large t and fixed L (i.e.x = t/Lz → ∞) σ saturate.
However, for fixed and large L and t ≪ Lz, one expects
that correlations of the height fluctuations are set up only
within a distance t1/z and thus must be independent of
L. This implies that for x ≪ 1, f(x) ∼ xβg′(λ) with
β = α/z. Thus, dynamic scaling postulates that
σ(L, t) =
tβ, t≪ Lz;
Lα, t≫ Lz.
The roughness exponent α and the dynamic exponent β
characterize the self-affine geometry of the surface and its
dynamics, respectively. In the present work, we see the
surfaces at the limit t → ∞ and so we will only obtain
the α exponent.
The common procedure to measure the roughness ex-
ponent of a rough surface is use of the surface structure
function depending on the length scale l which is defined
S2(l) = 〈|h(x+ l)− h(x)|2〉. (5)
It is equivalent to the statistics of height-height corre-
lation function C(l) for stationary surfaces, i.e. S2(l) =
2σ2(1−C(l)). The second order structure function S2(l),
scales with l as l2α.
B. Level Crossing Analysis
Let ν+α denotes the number of positive slope crossing
of h(x) − h̄ = α for interval L.
Since the process is homogeneous, if we take a second
time interval of L immediately following the first we shall
obtain the same result, and for two intervals together we
shall therefore obtain [28]:
N+α (2L) = 2N
α (L), (6)
from which it follows that, for a homogeneous process,
the average number of crossing is proportional to the in-
terval L. Hence
N+α (L) ∝ L, (7)
N+α (L) = ν
αL, (8)
where ν+α is the average frequency of positive slope cross-
ing of the level h(x) − h̄ = α. We now consider how the
frequency parameter ν+α can be deduced from the under-
lying probability distributions for h(x)− h̄.
Consider a small length scale δx of a typical sample func-
tion. Since we are assuming that the process h(x)− h̄ is a
smooth function of x, with no sudden ups and downs, if
δx is small enough, the sample can only cross h(x)−h̄ = α
with positive slope if h(x) − h̄ < α at the beginning of
the interval L. Furthermore, there is a minimum slope
at x if the level h(x) − h̄ = α is to be crossed in interval
∆x depending on the value of h(x)− h̄ at position x. So
there will be a positive crossing of h(x) − h̄ = α in the
next interval ∆x if, at x,
h(x)− h̄ < α and
h(x) − h̄
h(x) − h̄
As shown in [28], the frequency ν+α can be written in
terms of joint PDF (probability distribution function ) of
p(α, y′) as follows:
ν+α =
p(α, y′)y′dy′. (10)
and then the quantity N+tot which is defined as:
N+tot =
ν+α |α− ᾱ|dα. (11)
will measure the total number of crossing the surface with
positive slope. So, the N+tot and square area of growing
surface are in the same order. Concerning this, it can be
utilized as another quantity to study further the rough-
ness of a surface [27].
l ( m)
0.5 1 1.5 2
FIG. 4: Log-Log plot of the selection structure function of
various annealed temperature: a) 27, b) 200, c) 300, d) 350,
e) 400, f) 450, g) 500oC.
IV. RESULTS AND DISCUSSION
Thin films of WO3 were deposited by using thermal
evaporation method and then surface micrographs of
WO3 samples were obtained by AFM technique after
annealed at different temperatures (Fig.3).
These micrographs were then analyzed using methods
from stochastic data analysis have introduced in the
last section. Figure 3 shows AFM images of WO3 thin
films annealed at 200 ,300 ,350 ,400 ,450, and 500oC.
The ”as deposited” and annealed sample at 200oC
(Fig. 3a) have columnar structure, indicating that up
to 200oC no significant changes in the microstructure
occurs. However, at higher temperatures (figs. 3b-3f) we
have observed increased grain size and rougher surface.
Specifically at 500oC (Fig. 3f) we observe stark changes
in the micrograph which is accompanied by composition
changes in the surface. This can be related to the phase
transition to Magneli phase e.g. WO3−x in the annealing
process [36]. This is also confirmed by our XPS and
UV-visible spectrophotometry analysis (Sec. II). These
are shown the significant formation of W 5+ state in the
surface at 500oC.
Also our analysis shows that below 400oC the surfaces
are in amorphous phase with the same behavior for all
scales, but as soon as the crystalline phase appears the
system behaves differently which diagnostics at small and
large scale for temperatures above 400oC. By using pa-
rameters of the analytical method given here, these tran-
sitions can be quantified.
h ( m)
-0.02 -0.01 0 0.01 0.02
10000
200 oC
300 oC
400 oC
500 oC
FIG. 5: The average frequency ν+α as a function of height h
Now, we will use the statistical parameters introduced
in the last section and will obtain some quantitative
information about the effect of annealing temperature
on the surface topography of the WO3 samples.
The structure function S2(l) as defined in Eq.(5) can
be used to quantify the topology of a rough surface. The
structure function S2(l) is plotted against the length
scale of the sample in Fig.4 . The saturated S2(l) is an
indication of the surface roughness, as 2σ2. The most
obvious observation indicates that roughness is raised
with increasing annealing temperature. Roughness has
a minimum of 0.91nm at 27 and 200oC and a maximum
of 48nm at 500oC. This is because higher temperatures
create higher peaks (i.e. peaks with more deviations
from the average) . All exponents which is derivable
from S2(l) have been summarized and given in Table I.
As depicted in Fig.4 , the structure function S2(l),
has a different behavior in the various temperatures. So
that, in the annealing temperature range 27-350oC it
has a typical behavior in all scales, but in the higher
temperature range 400-500oC its behavior is different
in the small and large scales. In the other words, the
phase transition is occurred at 400oC, because for higher
temperatures, there are two sets of roughness parameters
needed to simulate the surface morphology. It can be
related to the phase transition in the structure of the
surface from amorphous to crystalline phase has been
yielded from the band gap energy (see the section II.B).
The slope of each S2(l) curves at the small and
large scales yields the roughness exponents α and α′ of
100 200 300 400 500
FIG. 6: The normalized N+
behavior as a function of an-
nealing temperature. The solid line is plotted according to
Eq.(12) around 400oC .
the corresponding surface. Hence, it is seen that the
mono roughness exponent increases with the addition of
annealing temperature up to 400oC. In the higher tem-
peratures, we have obtained two roughness exponents(
α-α′) equal to the 0.40-0.14, 0.71-0.20, and 0.69-0.24
for temperatures 400, 450 and 500oC, respectively.
Difference in the α values, in these temperatures, are in
agreement with changes of correlation length. Where the
correlation length, is the distance at which the structure
function behaves differently.
The range of the scaling upon correlation length listed
in the forth column in table I. The value of C∗s denotes
the correlation length at small scales and C∗l for large
scales. The higher C∗ value represents a smoother
surface (as we expected from Fig.3). The correlation
length obtained from the structure function is also a
measure of minimum lateral size of surface features at
each annealing temperature.
The another important WO3 film parameter is the ef-
fective area of the sample which has an important role
in the gas sensitivity of WO3 surfaces. To obtain a mea-
sure for this, we utilize the level crossing analysis. As
shown in Fig.5, the average frequency ν+α as a function
of height h, is plotted for the various annealing tempera-
tures. The broad curves indicate the higher magnitude of
height fluctuations around the average, and sharp curves
show that the most of fluctuations are around the height
average. This conclusion is in the correspondence with
the results obtained from Fig.3.
According to the Eq.(11) N+α i.e. The total number of
the crossing surface with positive slope is proportional to
the square of area of the growing surface. To obtain the
optimum value of the effective area, we have calculated
the ratio of effective areas with respect to the area of
the ”as deposited” surface (27oC). The values are given
TABLE I: The Roughness exponent, roughness, correlation
length and effective area relative to the ”as deposited” sample
area (27oC).
T [oC] α− α′ σ[nm] C∗s -C
[nm] N+/N+(27oC)
27 0.15− none 0.91 60− none 1.00
200 0.15− none 0.98 60− none 1.08± 0.02
300 0.61− none 2.20 100− none 1.17± 0.02
350 0.62− none 11.50 100− none 1.67± 0.02
400 0.40 − 0.15 17.00 100 − 300 2.00± 0.02
450 0.71 − 0.20 30.00 400− 1000 1.72± 0.02
500 0.69 − 0.24 48.00 200− 1400 1.41± 0.02
in the last column in Table I. It means, although the
roughness increases by the annealing temperature but
the effective square area of the rough surface has a
maximum value of N+tot [27].
For more clarity, we have calculated the temperature
dependence of normalized N+α numerically (Fig.6)
around 400oC, and we have obtained the three following
functions for this quantity :
N+tot(T ) = (5.0718− 0.0223× T + 2.72× 10
−5 × T 2)−1(12)
ln(N+tot(T )) = −6.3632+ 0.0344× T − 4.20× 10
−5 × T 2(13)
N+tot(T ) = −8.7057+ 0.0520× T − 6.37× 10
−5 × T 2(14)
According to this figure, the maximum value of the
effective area is at 400oC (with respect to its value
at 27oC) with the relative value equal to 2.00. Thus,
applying this analysis easily shows that if one follows
the condition which the effective area as an important
parameter in the gas sensitivity of WO3 surfaces is
optimum and furthermore, the film composition has not
been changed (e.g. The Magneli phase transition has
not been occurred), should choose the annealed surface
at 400oC for better performance.
V. CONCLUSIONS
We have investigated the role of annealing tempera-
ture, as an external parameter, to control the statisti-
cal properties of a rough WO3 surface. The AFM mi-
crostructure of the surfaces is just needed to apply in our
analysis. We have computed the statistical quantities
such as roughness exponent, roughness and lateral size
of surface features of the ”as deposited” and annealed
surfaces at 200, 300, 350, 400, 450, and 500oC, using the
structure function. We have seen a phase transition at
400oC, because for higher temperatures there are two sets
of roughness parameters, due to structural changes from
amorphous to the crystalline phase. Moreover, using the
level crossing analysis we have obtained an optimum an-
nealing temperature, 400oC in which the surface of the
WO3 has maximum value about twice relative to the ”as
deposited” film without any changes in the film composi-
tion that may increase surface reaction of the WO3 film
as the gas sensor or photo-catalyst.
VI. ACKNOWLEDGMENT
GRJ and AAS would like to thank S.M.Fazeli for his
useful comments and especially M.R.Rahimitabar for his
useful lectures on ”stochastic data analysis”. AZM would
like to acknowledge research council of Sharif University
of Technology for financial support of the work.
[1] C. G. Granqvist, Handbook of Electrochromic Materials
(Elsevier, Amsterdam, 1995).
[2] P. R. Bueno, F. M. Pontes, E. R. Leite, L. O. S. Bulhes,
P. S. Pizani, P. N. Lisboa-Filho, and W. H. Schreiner, J.
Appl. Phys. 96, 2102 (2004).
[3] R. Azimirad, O. Akhavan, and A. Z. Moshfegh, J. Elec-
trochem. Soc.153, E11(2006).
[4] A. Siokou, S. Ntais, S. Papaefthimiou, G. Leftheriotis,
and P. Yianoulis, Surface Science, 566/568, 1168 (2004).
[5] S.-L. Kuai, G. Bader, and P. V. Ashrit, App. Phys. Lett.,
86, 221110 (2005).
[6] Y. Takeda, N. Kato, T. Fukano, A. Takeichi, and T. Mo-
tohiro, J. Appl. Phys., 96, 2417 (2004).
[7] C. O. Avellaneda and L. O. S. Bulhes, Solid State Ionics,
165, 117 (2003).
[8] S.-H. Lee, H. M. Cheong, P. Liu, D. Smith, C. Edwin
Tracy, A. Mascanrenhas, J. R. Pitts, and S.K. Deb, J.
Appl. Phys., 88, 3076 (2000).
[9] Y. S. Kim, S.-C. Ha, K. Kim, H. Yang, S.-Y. Choi, Y.
T. Kim, J. T. Park, C. H. Lee, J. Choi, J. Paek, and K.
Lee, Appl. Phys. Lett., 86, 213105 (2005).
[10] E. Gyrgy, G. Socol, I. N. Mihailescu, C. Ducu, and S.
Ciuca, J. Appl. Phys., 97, 093527 (2005).
[11] H. Kawasaki, T. Ueda, Y. Suda, and T. Ohshima, Sens.
Actuators B, 100, 266 (2004).
[12] M. A. Gondal, A. Hameed, Z. H. Yamani, and A.
Suwaiyan, Chem. Phys. Lett., 385, 111 (2004).
[13] M. Feng, A. L. Pan, H. R. Zhang, Z. A. Li, F. Liu, H.
W. Liu, D. X. Shi, B. S. Zou, and H. J. Gao, Appl. Phys.
Lett., 86, 141901 (2005).
[14] J. Hao, S. A. Studenikin, and M. Cocivera, J. Appl.
Phys., 90, 5064 (2001).
[15] G. Garcia-Belmonte, P. R. Bueno, F. Fabregat-Santiago,
and J. Bisquert, J. Appl. Phys., 96, 853 (2004).
[16] M. Seman and C. A. Wolden, J. Vac. Sci. Technol. A, 21,
1927 (2003).
[17] M. Stankova, X. Vilanova, E. Llobet, J. Calderer, C. Bit-
tencourt, J. J. Pireaux and X. Correig, Sens. Actuators
B, 105, 271 (2005).
[18] A.L. Barabasi and H.E. Stanley, Fractal Concepts in Sur-
face Growth (Cambridge University Press, New York,
1995).
[19] G. R. Jafari, S.M. Fazeli, F. Ghasemi, S.M. Vaez Allaei,
M. Reza Rahimi Tabar, A. Irajizad, and G. Kavei, Phys.
Rev. Lett. 91, 226101 (2003).
[20] G. R. Jafari, S. M. Mahdavi, A. Iraji zad, and P. Kag-
hazchi, Surface And Interface Analysis; 37: 641 645
(2005).
[21] A. Irajizad, G. Kavei, M. Reza Rahimi Tabar, and S.M.
Vaez Allaei, J. Phys.: Condens. Matter 15, 1889 (2003).
[22] T. Halpin-Healy and Y.C. Zhang, Phys. Rep. 254, 218
(1995); J. Krug, Adv. Phys. 46, 139 (1997).
[23] J. Krug and H. Spohn “In Solids Far From Equilibrium
Growth, Morphology and Defects”, edited by C. Godreche
(Cambridge University Press, New York, 1990).
[24] P. Meakin Fractals, Scaling and Growth Far From Equi-
librium (Cambridge University Press, Cambridge, 1998).
[25] M. Kardar, Physica A 281, 295 (2000).
[26] A.A. Masoudi, F. Shahbazi, J. Davoudi, and M. Reza
Rahimi Tabar, Phys. Rev. E 65, 026132 (2002).
[27] P. Sangpour, G. R. Jafari, O. Akhavan, A.Z. Mosh-
fegh, and M. Reza Rahimi Tabar, Phys.Rev.B 71, 155423
(2005).
[28] F. Shahbazi, S. Sobhanian, M. Reza Rahimi Tabar, S.
Khorram, G.R. Frootan, and H. Zahed, J. Phys. A 36,
2517 (2003).
[29] M. Sadegh Movahed, A. Bahraminasab, H. Rezazadeh,
A. A. Masoudi, cond-mat/0509077 (2005).
[30] A. Bahraminasab, M. Sadegh Movahed, S. D. Nassiri and
A. A. Masoudi, cond-mat/0508180 (2005).
[31] G. R. Jafari, M. S. Movahed, S. M. Fazeli, M. Reza
Rahimi Tabar, and S. F. Masoudi, Jstat. Mech. P06008,
(2006).
[32] A. Z. Moshfegh, R. Azimirad, and O. Akhavan, Thin
Solid Films, 484, 124 (2005).
[33] B. V. Crist, Handbook of Monochromatic XPS Spectra:
The Elements and Native Oxides, Vol. 1 (John Wiley &
Sons Ltd, Chichester, 2000).
[34] K. D. Lee, Thin Solid Films, 302, 84 (1997).
[35] M. D. Antonik, J. E. Schneider, E. L. Wittman, K. Snow,
J. F. Vetelino, and R. J. Lad, Thin Solid Films, 256, 247
(1995).
[36] A. Al-Mohammad and M. Gillet, Thin Solid Films, 408,
302 (2002).
http://arxiv.org/abs/cond-mat/0509077
http://arxiv.org/abs/cond-mat/0508180
ABSTRACT
  We have studied the effect of annealing temperature on the statistical
properties of $WO_3$ surface using atomic force microscopy techniques (AFM). We
have applied both level crossing and structure function methods. Level crossing
analysis indicates an optimum annealing temperature of around 400$^oC$ at which
the effective area of the $WO_3$ thin film is maximum, whereas composition of
the surface remains stoichiometric. The complexity of the height fluctuation of
surfaces was characterized by roughness, roughness exponent and lateral size of
surface features. We have found that there is a phase transition at around
400$^oC$ from one set to two sets of roughness parameters. This happens due to
microstructural changes from amorphous to crystalline structure in the samples
that has been already found experimentally.

<|endoftext|><|startoftext|>
Determination of Low-Energy Parameters of Neutron–Proton Scattering
on the Basis of Modern Experimental Data from Partial-Wave Analyses
V. A. Babenko∗ and N. M. Petrov
Bogolyubov Institute for Theoretical Physics, National Academy of Sciences of Ukraine,
Metrologicheskaya ul. 14b, 03143 Kiev, Ukraine
The triplet and singlet low-energy parameters in the effective-range expansion for neutron–
proton scattering are determined by using the latest experimental data on respective phase
shifts from the SAID nucleon–nucleon database. The results differ markedly from the analogous
parameters obtained on the basis of the phase shifts of the Nijmegen group and contradict the
parameter values that are presently used as experimental ones. The values found with the
aid of the phase shifts from the SAID nucleon–nucleon database for the total cross section
for the scattering of zero-energy neutrons by protons, σ0 = 20.426 b, and the neutron–proton
coherent scattering length, f = −3.755 fm, agree perfectly with the experimental cross-section
values obtained by Houk, σ0 = 20.436 ± 0.023 b, and experimental scattering-length values
obtained by Houk andWilson, f = −3.756±0.009 fm, but they contradict cross-section values of
σ0 = 20.491±0.014 b according to Dilg and coherent-scattering-length values of f = −3.7409±
0.0011 fm according to Koester and Nistler.
PACS: 13.75.Cs, 21.30.-x, 25.40.Dn
DOI: 10.1134/S1063778807040072
1. Along with the deuteron parameters, the low-energy parameters in the effective-range
expansion for neutron–proton scattering,
k cot δ = −
rk2 + v2k
4 + v3k
6 + v4k
8 + . . . , (1)
are fundamental quantities that play a key role in studying strong nucleon–nucleon interaction.
∗E-mail: pet@online.com.ua
http://arxiv.org/abs/0704.1024v1
These parameters are of great importance for constructing various realistic nuclear-force models,
which, in turn, form a basis for studying the structure of nuclei and various nuclear processes.
For this reason, it is highly desirable to determine reliably and accurately the parameters in
the effective-range expansion, including the scattering length a, the effective range r, the shape
parameter v2, and higher order parameters vn.
Although low-energy parameters for neutron–proton scattering have been determined and
studied since the early 1950s, even the experimental values of such parameters as the scattering
length a and the effective range r are ambiguous to date. As for the shape parameter v2, even
its sign is unknown at the present time. The theoretical value of this parameter depends greatly
on the nuclear-force model used: as we go over from one model to another, the parameter v2
in the triplet state changes within a broad interval, from −0.95 [1, 2] to 1.371 fm3 [3], whence
it follows that the shape parameter is a very subtle and sensitive feature of nucleon–nucleon
interaction.
We would like to note that not only does the shape parameter v2 depend on the form of
interaction, but it is also strongly dependent on the scattering length a and the effective range r.
In particular, a change of only a few tenths of a percent in the scattering length a may lead to a
severalfold change in the shape parameter v2 [4]. The shape parameters vn of order higher than
that of v2 have been still more poorly determined and are more sensitive to details of nucleon–
nucleon interaction. The aforesaid highlights once again the importance of reliably determining
the scattering length a and the effective range r, the more so as these are quantities that are
most frequently used as inputs in constructing various models of nucleon–nucleon interaction.
2. It is well known [5] that the neutron–proton system may occur either in the triplet (the
total spin is S = 1) or the singlet (the total spin is S = 0) spin state. In determining the
scattering lengths a and the effective ranges r in the triplet (t) and singlet (s) spin states,
one employs the experimental dependence of the total (spin-averaged) cross section for the
scattering of slow neutrons by free protons and data characterizing the scattering of zero-
energy neutrons by para-hydrogen. In order to determine the triplet and singlet scattering
lengths (at and as, respectively), use is usually made of equations that relate these quantities
to the total cross section for the scattering of zero-energy neutrons by protons,
σ0 = π
, (2)
and to the coherent scattering length,
(3at + as) . (3)
In this case, the cross section σ0 is determined from the results of experiments that study
slow-neutron scattering on protons bound in various molecules (H2, H2O, C6H6, CH3OH),
corrections associated with neutron capture by a proton and with effects of proton binding
in molecules being subsequently eliminated. The elimination of binding-effect corrections is a
nontrivial many-body problem, since, in addition to proton and neutron motion, it is necessary
to take into account the motion of the molecular residue. A number of significant simplifications
and approximations are made in solving this problem [6]. A compendium of experimental results
from [7–13] on the total cross section for the scattering of zero-energy neutrons by free protons,
σ0, is given in Table 1.
Two values of the total cross section σ0 are recommended at the present time. These are
the value obtained by Houk (1971) [12],
σ0 = 20.436(23) b, (4)
and the value obtained by Dilg (1975) [13],
σ0 = 20.491(14) b. (5)
Since these two values of σ0 are inconsistent, their weighted-mean value
σ0 = 20.476(12) b (6)
can also be used in determining the scattering lengths.
It should be noted that the total cross section σ0 has not been measured since 1975.
The coherent scattering length f , which is determined by relation (3), is found either from
experiments where slow neutrons are scattered by pure para-hydrogen [8, 14, 15] or by crystals
[16] or — and this is a more precise method — from experiments where neutrons are reflected
by a liquid mirror and where use is made of a number of pure hydrocarbons [9, 10, 17–22]. Also,
a method for determining the coherent scattering length by means of neutron interferometry
from experiments to study neutron scattering on molecular hydrogen was proposed in [23]. The
values found by various authors for the neutron–proton coherent scattering length f are quoted
in Table 2, whence it can be seen that the value of this quantity is even more ambiguous than
the value of σ0.
In determining the scattering lengths in the triplet and the singlet state (at and as, respec-
tively), one employs most frequently, at the present time, the coherent-length value obtained
by Koester and Nistler [22],
f = −3.7409(11) fm, (7)
and the coherent-length value presented in the compilation of Dumbrajs et al. [24],
f = −3.738(1) fm. (8)
Recent experiments aimed at determining the neutron–proton coherent scattering length by
means of neutron interferometry [23], which were mentioned above, yielded the value
f = −3.7384(20) fm. (9)
Within the experimental errors, the value in (9) agrees with the result of Koester and Nistler
in (7) and with the value in (8), which was used by Dumbrajs et al. [24].
Table 3 presents values obtained in a number of previous studies [9, 10, 13, 18, 21, 22,
24–28] for the scattering lengths and effective ranges in the triplet and singlet spin states. All
of them have been used as experimental values. The values of the triplet (at) and singlet (as)
scattering lengths from Table 3 were obtained on the basis of formulas (2) and (3) by using
various values for the total cross section σ0 and the neutron–proton coherent scattering length
The values of the triplet effective range rt in Table 3 were determined primarily in an
approximation that does not depend on the form of interaction; that is,
rt ≡ ρ (−εd, 0) = 2R
, (10)
where ρ (−εd, 0) is the mixed effective radius of the deuteron;
R = 1/α (11)
is a parameter that characterizes the spatial dimensions of the deuteron; and α is the deuteron
wave number, which is related to the deuteron binding energy εd by the equation
εd = h̄
2α2/mN . (12)
In a number of studies [24, 26], the triplet effective range was determined in accordance
with the formula
rt = ρ (−εd, 0) + δrt , (13)
where the correction δrt is a model-dependent quantity. According to the estimates obtained by
Noyes on the basis of the dispersion relations [26], the correction δrt arising owing to one-pion
exchange is
δrt = −0.013 fm . (14)
According to other estimates [24], this correction is
δrt ≃ −0.001 fm , (15)
which is an order of magnitude smaller in absolute value than the estimate in (14). In the
latter case, the effective range rt is therefore nearly coincident with the mixed effective radius
ρ (−εd, 0).
The singlet effective range rs is usually determined on the basis of an analysis of the total
cross section for neutron–proton scattering, σ (E), in the low-energy region at fixed values of
the parameters at, as, and rt. The values found in this way for the singlet effective range
rs appear to be even more ambiguous than the values of the triplet effective range. As can
be seen from Table 3, the scattering-length and effective-range values used as experimental
ones change within rather broad ranges. The scatter of these values is due first of all to the
fact that different experimental values of the cross section for the scattering of zero-energy
neutrons by free protons, σ0, and of the neutron–proton coherent scattering length f are used
to determine these quantities. The ambiguity in determining the singlet effective range rs is also
associated with an insufficient accuracy of the experimental total cross sections for neutron–
proton scattering at energies below 5MeV. The values found by different authors for the singlet
effective range rs change within a broad range, from 2.42 [9] to 2.81 fm [13].
Thus, the accuracy of the experiments performed in the 1950s–1970s is insufficient for
unambiguously determining the low-energy parameters of neutron–proton scattering. At the
same time, these parameters play an important role in the theory of few-nucleon systems, which
is based on nucleon–nucleon interaction. As was shown in [29, 30], the binding energies of the
3H and 4He nuclei depend greatly on the singlet effective range rs, increasing as rs becomes
smaller. By way of example, we indicate that, as rs decreases by 0.1 fm, the binding energies
of the 3H and 4He nuclei increase by 0.3 and 1.5MeV, respectively. We note that the decrease
of 0.01 fm in the triplet scattering length at also leads to the increase of 0.025MeV in the
triton binding energy [31, 32]. At the same time, it is well known that, in calculations with
realistic nucleon–nucleon potentials, the binding energies of few-nucleon systems prove to be
underestimated. In such calculations, the 3H binding energy is as a rule underestimated by
1MeV. A reliable and precise determination of the low-energy parameters of neutron–proton
scattering and their use in calculating the binding energies of systems that contain three or
more nucleons may contribute to solving the problem of underestimating the binding energies
of few-nucleon systems without introducing three-particle forces, quark degrees of freedom,
and other concepts that would require revising basic points in the traditional theory of nuclear
forces, which relies on pair nucleon–nucleon interaction.
To conclude this section, we present, for low-energy parameters, values that are currently
used as experimental ones. Most frequently, the present-day literature quotes two sets of low-
energy parameters. These are the set from [24],
at = 5.424(4) fm, rt = 1.759(5) fm;
as = −23.748(10) fm, rs = 2.75(5) fm,
which is matched with the experimental value (5) of the total cross section at zero energy due
to Dilg [13] and with the value in (8) for the neutron–proton coherent scattering length from
[24], and the set from [28],
at = 5.419(7) fm, rt = 1.753(8) fm;
as = −23.740(20) fm, rs = 2.77(5) fm,
which corresponds to the weighted-mean value (6) of the cross sections presented by Houk [12]
and Dilg [13] and to the value in (7) for the coherent length due to Koester and Nistler [22].
It should be noted that the experiments performed in the 1950–1970s were the main source
of information used to deduce the values in (16) and (17) for the low-energy parameters of
neutron–proton scattering.
3. In recent years, the accuracy of experimental data on nucleon–nucleon scattering has
been improved considerably; moreover, methods of their partial-wave analysis, which make it
possible to describe the results of scattering experiments in terms of phase shifts, have also been
refined [33, 34]. Owing to this, the triplet and singlet low-energy parameters of neutron–proton
scattering can be determined independently of one another by using the 3S1- and
1S0-state
phase shifts [4, 35]. The results of the partial-wave analysis performed by the GWU group [33]
(data from the well-known SAID nucleon–nucleon database) and by the Nijmegen group [34] are
presently the most precise and most widely used data on the phase shifts for nucleon–nucleon
scattering. The most popular modern realistic nucleon–nucleon potentials constructed within
the last decade, which include the Nijm-I, Nijm-II, Reid93 [36], Argonne V18 [37], CD-Bonn [28,
38], and Moscow [39] potentials, are based on fits to data of the Nijmegen group [34]. However,
it should be noted that the partial-wave analysis of the Nijmegen group is a result of processing
and averaging experimental data on nucleon–nucleon scattering over a period from 1955 to
1992, but this analysis provides an insufficiently accurate description of modern experimental
data on nucleon–nucleon scattering. Despite the proximity of the phase shifts for neutron–
proton scattering that were obtained by the GWU and Nijmegen groups, the corresponding
values of the low-energy parameters in the effective-range expansion are markedly different [4],
this difference being not only quantitative but also qualitative.
Using the approximation of the effective-range function k cot δ at low energies by polyno-
mials and Padé approximants within the least squares method, we calculated the triplet and
singlet low-energy parameters of neutron–proton scattering for the experimental data on the
GWU [33] and Nijmegen [34] phase shifts. The results obtained for the low-energy parameters
in the present study by employing the data from the partial-wave analysis of the GWU group,
at = 5.4030 fm, rt = 1.7494 fm, v2t = 0.163 fm
as = −23.719 fm, rs = 2.626 fm, v2s = −0.005 fm
differ significantly from the parameter values
at = 5.420 fm, rt = 1.753 fm, v2t = 0.040 fm
as = −23.739 fm, rs = 2.678 fm, v2s = −0.48 fm
which were obtained on the basis of the data from the partial-wave analysis of the Nijmegen
group. The triplet low-energy parameters calculated here for the phase shifts of the Nijmegen
group are virtually coincident with the analogous parameters obtained previously in [35]. Un-
fortunately, the data presented by the Nijmegen group do not contain the singlet low-energy
parameters of neutron–proton scattering. The value of the singlet shape parameter v2s for the
Nijmegen phase shifts was calculated in [1], and it is in agreement with our value.
Using expressions (18) and (19) for the scattering lengths and relying on formulas (2) and
(3), we find for the cross section σ0 and for the coherent scattering length f that
σ0 = 20.426 b , f = −3.755 fm (20)
in the case of the GWU phase shifts and that
σ0 = 20.473 b , f = −3.7395 fm (21)
in the case of the Nijmegen phase shifts.
The values in (21) are in good agreement with the weighted mean of the cross sections
obtained by Houk and Dilg, σ0 = 20.476(12) b, and with the coherent-scattering-length value
of f = −3.7409(11) fm according to Koester and Nistler [22]. It should be emphasized, however,
that this agreement is not accidental; it is directly related to the fact that, in the partial-wave
analysis of the Nijmegen group, the cross-section values obtained by Houk [12] and Dilg [13]
and the coherent-scattering-length value obtained by Koester and Nistler [22] were used as
input experimental parameters. It is precisely the reason why all of the experimental low-
energy parameters in (17), with the exception of the singlet effective range, agree within the
experimental error with the corresponding parameters in (19), which were calculated on the
basis of the Nijmegen phase shifts.
The singlet-effective-range value of rs = 2.678 fm, which was calculated for the phase shifts
obtained by the Nijmegen group, is much smaller than the experimental value of rs = 2.77(5) fm,
which was quoted by Dilg in [13]. In this connection, it should be noted that, in [13], the
singlet effective range rs was determined from experimental data on the total cross section
for neutron–proton scattering at energies below 5MeV at the scattering-length values fixed
at at = 5.423(4) fm and as = −23.749 fm and the triplet-effective-range value fixed at rt =
1.760(5) fm, but, as was indicated above, this method for determining the effective range is
highly unreliable (see Table 3). A determination of the singlet effective range rs directly from
the singlet phase shift irrespective of the triplet parameters is more correct and consistent,
which reduces substantially the uncertainty in this quantity.
For the sake of comparison, the low-energy parameters for neutron–proton scattering that
correspond to the GWU (GWU PWA) and Nijmegen (Nijm PWA) phase shifts are given in Ta-
ble 4, along with the values of these parameters for a number of the realistic potentials (Argonne
V18 [37], CD-Bonn [28, 38], and Moscow [39] potentials) whose parameters were matched with
the Nijmegen nucleon–nucleon database. Also quoted there are the experimental values of the
low-energy parameters. Table 4 shows that the values of the low-energy parameters obtained
for the Nijmegen phase shifts are in perfect agreement with the corresponding parameters for
the potentials fitted to the Nijmegen nucleon–nucleon database.
A significant distinction between the values of the triplet low-energy parameters for the
GWU and Nijmegen data was discussed in detail in our previous article [4]. Here, we only
indicate that the difference of the triplet scattering lengths by 0.3% is in fact a more important
circumstance than the fourfold distinction between the values of the triplet shape parameters.
This is because many important features of the neutron–proton system — such as the asymp-
totic deuteron normalization factor AS and the root-mean-square radius rd of the deuteron
— are highly sensitive to variations in the triplet scattering length [40]. We also note that,
although the triplet effective ranges obtained from experimental data of the two main groups
are close to each other, the values of the difference δrt of the effective range rt and the mixed
effective radius ρ (−εd, 0) for the GWU [33] and Nijmegen [34] phase shifts differ significantly.
For example, the correction δrt for the phase shifts of the GWU group is positive, taking the
value
δrt = 0.0163 fm . (22)
For the phase shifts of the Nijmegen group, this correction is negative and, in absolute value, is
an order of magnitude smaller than the correction in (22): δrt = −0.001 fm. The value of the
singlet effective range for the phase shifts of the GWU group also differs from its counterpart
for the Nijmegen phase shifts (by about 2%), and the corresponding difference of the singlet
shape parameters is formidable, reaching two orders of magnitude.
In contrast to the partial-wave analysis of the Nijmegen group, the partial-wave analysis of
the GWU group does not employ the values of the cross section σ0 and the coherent scattering
length f as input parameters. The theoretical values of σ0 = 20.426 b and f = −3.755 fm,
which we obtained here for the cross section in question and for the neutron–proton coherent
scattering length from data of the partial-wave analysis performed by the GWU group, are in
perfect agreement with the experimental cross-section value of σ0 = 20.436(23) b according to
Houk [12], and the experimental coherent-scattering-length value obtained by Houk and Wilson
[9, 10],
f = −3.756(9) fm , (23)
but they contradict the cross-section value of σ0 = 20.491(14) b according to Dilg [13] and the
coherent-scattering-length value of f = −3.7384(20) fm, which was obtained recently by the
neutron-interferometry method in [23]. Thus, we see that a reliable experimental determination
of the total cross section for neutron–proton scattering at zero energy, σ0, and of the coherent
scattering length, f , is now quite a pressing problem. Precise values of these quantities would
make it possible to determine unambiguously the triplet and singlet scattering lengths and to
solve the problem of choosing a correct set of the low-energy parameters and phase shifts among
currently recommended experimental values.
REFERENCES
1. T. D. Cohen and J. M. Hansen, Phys. Rev. C 59, 13 (1999).
2. T. D. Cohen and J. M. Hansen, Phys. Rev. C 59, 3047 (1999).
3. M. W. Kermode, A. McKerrell, J. P. McTavish, and L. J. Allen, Z. Phys. A 303, 167
(1981).
4. V. A. Babenko and N. M. Petrov, Yad. Fiz. 68, 244 (2005) [Phys. At. Nucl. 68, 219
(2005)].
5. A. G. Sitenko and V. K. Tartakovskĭı, Lectures on the Theory of the Nucleus (Atomizdat,
Moscow, 1972; Pergamon, Oxford, 1975).
6. L. Hulthén and M. Sugawara, in Handbuch der Physik, Ed. by S. Flügge (Springer-Verlag,
New York, Berlin, 1957), p. 1.
7. E. Melkonian, Phys. Rev. 76, 1744 (1949).
8. A. T. Stewart and G. L. Squires, Phys. Rev. 90, 1125 (1953).
9. T. L. Houk and R. Wilson, Rev. Mod. Phys. 39, 546 (1967).
10. T. L. Houk and R. Wilson, Rev. Mod. Phys. 40, 672 (1968).
11. J. M. Neill, J. L. Russell, and J. R. Brown, Nucl. Sci. Eng. 33, 265 (1968).
12. T. L. Houk, Phys. Rev. C 3, 1886 (1971).
13. W. Dilg, Phys. Rev. C 11, 103 (1975).
14. G. L. Squires and A. T. Stewart, Proc. Roy. Soc. A 230, 19 (1955).
15. J. Callerame, D. J. Larson, S. J. Lipson, and R. Wilson, Phys. Rev. C 12, 1423 (1975).
16. C. G. Shull, E. O. Wollan, G. A. Morton, and W. L. Davidson, Phys. Rev. 73, 842
(1948).
17. D. J. Hughes, M. T. Burgy, and G. R. Ringo, Phys. Rev. 77, 291 (1950).
18. M. T. Burgy, G. R. Ringo, and D. J. Hughes, Phys. Rev. 84, 1160 (1951).
19. W. C. Dickinson, L. Passell, and O. Halpern, Phys. Rev. 126, 632 (1962).
20. L. Koester, Z. Phys. 198, 187 (1967).
21. L. Koester and W. Nistler, Phys. Rev. Lett. 27, 956 (1971).
22. L. Koester and W. Nistler, Z. Phys. A 272, 189 (1975).
23. K. Schoen, D. L. Jacobson, M. Arif et al., Phys. Rev. C 67, 044005 (2003).
24. O. Dumbrajs, R. Koch, H. Pilkuhn et al., Nucl. Phys. B 216, 277 (1983).
25. H. P. Noyes, Phys. Rev. 130, 2025 (1963).
26. H. P. Noyes, Ann. Rev. Nucl. Sci. 22, 465 (1972).
27. E. L. Lomon and R. Wilson, Phys. Rev. C 9, 1329 (1974).
28. R. Machleidt, Phys. Rev. C 63, 024001 (2001).
29. V. F. Kharchenko, N. M. Petrov, and S. A. Storozhenko, Nucl. Phys. A 106, 464 (1968).
30. V. F. Kharchenko, Fiz. Élem. Chastits At. Yadra 10, 884 (1979) [Sov. J. Part. Nucl.
10, 349 (1979)].
31. V. F. Kharchenko and S. A. Storozhenko, Nucl. Phys. A 137, 437 (1969).
32. N. M. Petrov and I. V. Simenog, Yad. Fiz. 28, 381 (1978) [Sov. J. Nucl. Phys. 28, 193
(1978)].
33. R. A. Arndt, W. J. Briscoe, I. I. Strakovsky, and R. L. Workman, Partial-Wave Analysis
Facility SAID, The George Washington University [http://gwdac.phys.gwu.edu] ; R. A.
Arndt, I. I. Strakovsky, and R. L. Workman, Phys. Rev. C 62, 034005 (2000).
http://gwdac.phys.gwu.edu]
34. Nijmegen NN-Online program [http://nn-online.org] ; V. G. J. Stoks, R. A. M. Klomp,
M. C. M. Rentmeester, and J. J. de Swart, Phys. Rev. C 48, 792 (1993).
35. J. J. de Swart, C. P. F. Terheggen, and V. G. J. Stoks, Invited talk at the 3rd International
Symposium ”Dubna Deuteron 95”, Dubna, Russia, 1995, nucl-th/9509032.
36. V. G. J. Stoks, R. A. M. Klomp, C. P. F. Terheggen, and J. J. de Swart, Phys. Rev. C
49, 2950 (1994).
37. R. B. Wiringa, V. G. J. Stoks, and R. Schiavilla, Phys. Rev. C 51, 38 (1995).
38. R. Machleidt, F. Sammarruca and Y. Song, Phys. Rev. C 53, 1483 (1996).
39. V. I. Kukulin, V. N. Pomerantsev, and A. Faessler, nucl-th/9903056.
40. V. A. Babenko and N. M. Petrov, Yad. Fiz. 66, 1359 (2003) [Phys. At. Nucl. 66, 1319
(2003)].
http://nn-online.org]
http://arxiv.org/abs/nucl-th/9509032
http://arxiv.org/abs/nucl-th/9903056
Table 1. Total cross section for neutron scattering on a proton at zero energy
No. References σ0 , b
1 Melkonian [7] (1949) 20.36(10)
2 Stewart and Squires [8] (1953) 20.41(14)
3 Houk and Wilson [9] (1967) 20.37(2)
4 Houk and Wilson [10] (1968) 20.442(23)
5 Neill et al. [11] (1968) 20.366(76)
6 Houk [12] (1971) 20.436(23)
7 Dilg [13] (1975) 20.491(14)
Table 2. Amplitude for coherent neutron–proton scattering
No. References f , fm
1 Shull et al. [16] (1948) −3.900(100)
2 Hughes et al. [17] (1950) −3.75(3)
3 Burgy et al. [18] (1951) −3.78(2)
4 Stewart and Squires [8] (1955) −3.80(5)
5 Dickinson et al. [19] (1962) −3.740(20)
6 Koester [20] (1967) −3.719(2)
7 Houk and Wilson [9, 10] (1967, 1968) −3.756(9)
8 Koester and Nistler [21] (1971) −3.740(3)
9 Koester and Nistler [22] (1975) −3.7409(11)
10 Callerame et al. [15] (1975) −3.733(4)
11 Schoen et al. [23] (2003) −3.7384(20)
Table 3. Low-energy parameters of neutron–proton scattering from various studies
No. References at , fm as , fm rt , fm rs , fm
1 Burgy et al. [18] (1951) 5.377(21) −23.690(55) 1.704(28) −
2 Noyes [25] (1963) 5.396(11) −23.678(28) 1.727(14) 2.51(11)
5.392(6) −23.689(13) 1.724(7) 2.42(9)
3 Houk and Wilson [9] (1967) 5.399(11) −23.680(28) 1.732(12) 2.48(11)
5.411(4) −23.671(12) 1.747(4) 2.59(8)
4 Houk and Wilson [10] (1968) 5.405(6) −23.728(13) 1.738(7) 2.56(10)
5 Koester and Nistler [21] (1971) 5.414(5) −23.719(13) − −
6 Noyes [26] (1972) 5.413(5) −23.719(13) 1.735 2.66
5.423(5) −23.712(13) 1.748(6) 2.75(10)
7 Lomon and Wilson [27] (1974) 5.414(5) −23.719(13) 1.750(5) 2.76(5)
2.77(5)
8 Dilg [13] (1975) 5.423(4) −23.749(9) 1.760(5) 2.81(5)
2.78(5)
9 Koester and Nistler [22] (1975) 5.424(3) −23.749(8) 1.760(5) 2.81(5)
10 Dumbrajs et al. [24] (1983) 5.424(4) −23.748(10) 1.759(5) 2.75(5)
11 Machleidt [28] (2001) 5.419(7) −23.740(20) 1.753(8) 2.77(5)
Table 4. Low-energy parameters of neutron–proton scattering that were obtained on the basis
of the present-day data of the partial-wave analysis and modern realistic models of nucleon–
nucleon interaction
No. Model at , fm as , fm rt , fm rs , fm σ0 , b f , fm
1 GWU PWA 5.4030 −23.719 1.7494 2.626 20.426 −3.755
2 Nijm PWA 5.420 −23.739 1.753 2.678 20.473 −3.7395
3 Argonne V18 5.419 −23.732 1.753 2.697 20.461 −3.7375
4 CD Bonn 5.4199 −23.738 1.751 2.671 20.471 −3.7392
5 Moscow 5.422 −23.740 1.754 2.66 20.476 −3.7370
6 Expt. [10, 12] 5.405(6) −23.728(13) 1.738(7) 2.56(10) 20.436(23) −3.756(9)
7 Expt. [24] 5.424(4) −23.748(10) 1.759(5) 2.75(5) 20.491(14) −3.738(1)
8 Expt. [28] 5.419(7) −23.740(20) 1.753(8) 2.77(5) 20.476(12) −3.7409(11)
ABSTRACT
  The triplet and singlet low-energy parameters in the effective-range
expansion for neutron--proton scattering are determined by using the latest
experimental data on respective phase shifts from the SAID nucleon--nucleon
database. The results differ markedly from the analogous parameters obtained on
the basis of the phase shifts of the Nijmegen group and contradict the
parameter values that are presently used as experimental ones. The values found
with the aid of the phase shifts from the SAID nucleon--nucleon database for
the total cross section for the scattering of zero-energy neutrons by protons,
$\sigma_{0}=20.426 $b, and the neutron--proton coherent scattering length,
$f=-3.755 $fm, agree perfectly with the experimental cross-section values
obtained by Houk, $\sigma_{0}=20.436\pm 0.023 $b, and experimental
scattering-length values obtained by Houk and Wilson, $f=-3.756\pm 0.009 $fm,
but they contradict cross-section values of $\sigma_{0}=20.491\pm 0.014 $b
according to Dilg and coherent-scattering-length values of $f=-3.7409\pm 0.0011
$fm according to Koester and Nistler.

<|endoftext|><|startoftext|>
Introduction
In recent years interest has grown in the detection of very high energy cosmic ray neutrinos [1].
Such particles could be produced in the cosmic particle accelerators which make the charged
primaries or they could be produced by the interactions of the primaries with the Cosmic Mi-
crowave Background, the so called GZK effect [2]. The flux of neutrinos expected from these
two sources has been calculated [3,4]. It is found to be very low so that large targets are needed
for a measurable detection rate. It is interesting to measure this neutrino flux to see if it is
compatible with the values expected from these sources, incompatibility implying new physics.
Searches for cosmic ray neutrinos are ongoing in AMANDA [5], IceCube [6], ANTARES
[7] and NESTOR [8], detecting upward going muons from the Cherenkov light in either ice or
water. In general, these experiments are sensitive to lower energies than discussed here since the
Earth becomes opaque to neutrinos at very high energies. The experiments could detect almost
horizontal higher energy neutrinos but have limited target volume due to the attenuation of the
light signal in the ice. The Pierre Auger Observatory, an extended air shower array detector, will
also search for upward and almost horizontal showers from neutrino interactions [9]. In addition
to these detectors there are ongoing experiments to detect the neutrino interactions by either
radio or acoustic emissions from the resulting particle showers [1]. These latter techniques,
with much longer attenuation lengths, allow very large target volumes utilising either large
ice fields or dry salt domes for radio or ice fields, salt domes and the oceans for the acoustic
technique.
In order to assess the feasibility of each technique the production of the particle shower from
neutrino interactions needs to be simulated. Since experimental data on the interactions of such
high energy particles do not exist it is necessary to use theoretical models to simulate them.
The most extensive ultra high energy simulation program which has so far been developed is
CORSIKA [10]. However, this program has been used previously only for the simulation of
cosmic ray air showers. The program is readily available [10].
Different simulations are necessary for the radio and acoustic techniques. Radio emission
occurs due to coherent Cherenkov radiation from the particles in the shower, the Askaryan
Effect [11]. The emitted energy is sensitive to the distribution of the electron-positron asymme-
try which develops in the shower and which grows for lower energy electromagnetic particles.
Hence, to simulate radio emission, the electromagnetic component of the shower must be fol-
lowed down to very low kinetic energies (∼ 100 keV) [12]. In contrast, an acoustic signal
is generated by the sudden local heating of the surrounding medium induced by the particle
shower [13]. Thus to simulate the acoustic signal the spatial distribution of the deposited en-
ergy is needed. Once the electromagnetic energy in the shower reaches the MeV level (electron
range ∼ 1 cm) the energy can be simply added to the total deposited energy and the simula-
tion of such particles discontinued. Extensive simulations have been carried out for the radio
technique [14]. However, the simulations for the acoustic technique are less advanced. Some
work has been done [15,16] using the Geant4 package [17]. However, this work is restricted to
energies less than 105 GeV for hadron showers since the range of validity of the physics models
in this package does not extend to higher energy hadrons.
In this paper the energy distributions of showers produced by neutrino interactions in sea
water at energies up to 1012 GeV are discussed. The distributions are generated using the air
shower program CORSIKA [10] modified to work in a sea water medium. The salt compo-
nent of the sea water has a negligible effect1 and the results are presented in distance units of
g cm−2, hence they should be applicable to ice also. The computed distributions have been pa-
rameterised and this parameterisation is used to develop a simple program to simulate neutrino
interactions and the resulting particle showers. The properties of the acoustic signals from the
generated showers are also presented.
2 Adaptation of the CORSIKA program to a water medium
The air shower program, CORSIKA (version 6204) [10], has been adapted to run in sea water
i.e. a medium of constant density of 1.025 g cm−3 rather than the variable density needed for
an air atmosphere. Sea water was assumed to consist of a medium in which 66.2% of the atoms
are hydrogen, 33.1% of the atoms are oxygen and 0.7% of the atoms are made of common salt,
NaCl. The salt was assumed to be a material with atomic weight and atomic number A=29.2 and
Z=14, the mean of sodium and chlorine. The purpose of this is to maintain the structure of the
program as closely as possible to the air shower version which had two principal atmospheric
components (oxygen and nitrogen) with a trace of argon. The presence of the salt component
had an almost undetectable effect on the behaviour of the showers.
Other changes made to the program to accommodate the water medium include a modifica-
tion of the stopping power formula to allow for the density effect in water 2. This only affects
the energy loss for hadrons since the stopping powers for electrons are part of the EGS [18]
package which is used by CORSIKA to simulate the propagation of the electromagnetic com-
ponent of the shower. Smaller radial binning of the shower was also required since shower radii
in water are much smaller than those in air. In addition the initial state energy for electrons
and photons above which the LPM effect [19] was simulated in the program was reduced to the
much lower value necessary for water 3. The LPM effect suppresses pair production from high
energy photons and bremsstrahlung from high energy electrons. Similarly, the interactions of
neutral pions had to be simulated at lower energy than in air because of the higher density water
medium. In all about 100 detailed changes needed to be made to the CORSIKA program to
accommodate the water medium.
To test the implementation of the LPM effect [19] in the program 100 showers from incident
gamma ray photons at several different energies were generated and the mean depth of the first
interaction (the mean free path) calculated. The observed mean free path was found to be
in agreement with the expected behaviour when both the suppression of pair production and
photonuclear interactions were taken into account (see Figure 1). This showed that the LPM
effect had been properly implemented in CORSIKA.
Considerable fluctuations between showers occurred. These are expressed in terms of the
ratio of the root mean square (RMS) deviation of a given parameter to its mean value: the
1The shower maximum was observed to peak at a depth 2.4 ± 1.1% less in sea water than in fresh water with
the same peak energy deposited, for protons of energy 105 GeV.
2The stopping power was computed using the Bethe-Bloch formula [20] and the density effect from the formu-
lae of Sternheimer et al [21].
3The level was set at 1 TeV compared to the characteristic energy for water ELPM = 270 TeV [20].
RMS peak energy deposit to the mean peak energy deposit was observed to be 14% at 105 GeV
reducing to 4% at 1011 GeV, that for the depth of the peak position varied from 19% to 7.4%
and for the full width at half maximum of the shower from 63% to 18%. To smooth out such
fluctuations averages of 100 generated showers will be taken in the following. The statistical
error on the averages is then given by these RMS values divided by 10. The hadronic energy
contributes only about 10% to the shower energy at the shower peak, the remainder being carried
by the electromagnetic part of the shower.
The simulations were all carried out in a vertical column of sea water 20 m long. The
deposited energy generated by CORSIKA was binned into 20 g cm−2 slices longitudinally
and 1.025 g cm−2 annular cylinders radially for 0 < r < 10.25 g cm−2 and 10.25 g cm−2
for 10.25 < r < 112.75 g cm−2 where r is the distance from the vertical axis. To reduce
computing times, the thinning option was used i.e. below a certain fraction of the primary
energy (in this case 10−4) only one of the particles emerging from the interaction is followed
and an appropriated weight is given to it [22]. The simulation of particles continued down to cut-
off energies of 3 MeV for electromagnetic particles and 0.3 GeV for hadrons. When a particle
reached this cut-off, the energy was added to the slice where this occurred. The QGSJET [23]
model was used to simulate the hadronic interactions.
3 Comparison with other simulations
3.1 Comparison with Geant4
Proton showers were generated in sea water using the program Geant4 (version 8.0) [17] and
compared with those generated in CORSIKA. Unfortunately, the range of validity of Geant4
physics models for hadronic interactions does not extend beyond an energy of 105 GeV. Hence
the comparison is restricted to energies below this.
Figure 2 shows the longitudinal distributions of proton showers at energies of 104 and 105
GeV (averaged over 100 showers) as determined from Geant4 and CORSIKA. The showers
from CORSIKA tend to be slightly broader and with a smaller peak energy than those generated
by Geant4. The difference in the peak height is ∼ 5% at 104 GeV rising to ∼ 10% at energy 105
GeV. Figure 3 shows the radial distributions. The differences in the longitudinal distributions
are reflected in the radial distributions. However, the shapes of the radial distributions are very
similar between Geant4 and CORSIKA, with CORSIKA producing ∼ 10% more energy near
the shower axis at depths between 450 and 850 g cm−2 where most of the energy is deposited.
The acoustic signal from a shower is most sensitive to the radial distribution, particularly near
the axis (r ∼ 0). It is relatively insensitive to the shape of the longitudinal distribution.
3.2 Comparison with the simulation of Alvarez-Muñiz and Zas
The CORSIKA simulation was also compared with the longitudinal shower profile for protons
computed in the simulation by Alvarez-Muñiz and Zas (AZ) [24]. There was a reasonable
agreement between the longitudinal shower shapes from CORSIKA and those shown in Figure
2 of ref. [24]. However, the numbers of electrons and positrons at the peak of the CORSIKA
showers was ∼ 20% lower than those from ref. [24]. This number is sensitive to the energy
below which these particles are counted and this is not specified in [24]. Hence the agreement
between CORSIKA and their simulation is probably satisfactory within this uncertainty.
In conclusion, the modifications made to CORSIKA to simulate high energy showers in a
water medium give results which are compatible with the predictions from the Geant4 simula-
tions for energy less than 105 GeV and the simulation of AZ within 20%. This is taken to be
the accuracy of the simulation program assuming that there are no unexpected and unknown
interactions between the centre of mass energy explored at current accelerators and those stud-
ied in these simulations. Studies of the sensitivity of the CORSIKA simulation to the different
models of the hadronic interactions have been reported in reference [25]. They find that the
peak number of electrons plus positrons varies by ∼ 20% for proton showers in air depending
on the choice of the hadron interaction model used. These differences are similar in magnitude
to the differences between the AZ, Geant4 and CORSIKA simulations reported here. Hence
the observed differences between the Geant4, AZ and CORSIKA simulations in water could be
within the uncertainties of the hadronic interaction models.
4 Simulation of neutrino induced showers
Neutrinos interact with the nuclei of the detection medium by either the exchange of a charged
vector boson (W+), i.e. charged current (CC) interactions or the exchange of the neutral vector
boson (Z0), i.e. neutral current (NC) deep inelastic scattering interactions (see for example
[26]). The ratio of the CC to NC interaction cross sections is approximately 2:1. The CC
interactions produce charged secondary scattered leptons while the NC interactions produce
neutrinos. The hadron shower carries a fraction y of the energy of the incident neutrino and
the scattered lepton the remaining fraction 1 − y. We assume that the neutrino flavours are
homogeneously mixed when they arrive at the Earth by neutrino oscillations. Hence in the
CC interactions electrons, µ and τ leptons will be produced as the scattered leptons in equal
proportions. At the energies we shall consider, these particles behave in a manner similar to
minimum ionising particles for µ and τ leptons. This is almost true also for electrons for
which the bremsstrahlung process will be suppressed by the LPM effect. Hence the charged
scattered leptons contribute little to the energy producing an acoustic signal. In the case of NC
interactions there is no contribution to this energy from the scattered lepton. For these reasons
the contribution of the scattered lepton to the shower profile is ignored beyond z = 20 m in
what follows.
It is interesting to note that a τ lepton can decay to hadrons or a very high energy electron or
muon can produce bremsstrahlung photons at large distances from the interaction point. These
can initiate further distant showers, the so called “double bang” effect. The stochastic nature of
such electron showers is studied in [15, 16]. These effects are not considered in this study.
4.1 Neutrino-nucleon interaction cross sections.
A number of groups have computed the high energy neutrino-nucleon interaction cross sections,
σ, [27–29]. In the quark parton model of the nucleon for the single vector boson exchange pro-
cess, the differential cross section for CC interactions can be expressed in terms of the measured
structure functions of the target nucleon F2 and xF3 as
dQ2dy
Q2 +M2W
(F2(x,Q
2)(1− y + y2/2)± y(1− y/2)xF3(x,Q
2)) (1)
where GF is the Fermi weak coupling, MW is the mass of the weak vector boson, Q
2 is the
square of the four momentum transferred to the target nucleon, y = ν/E where ν is the energy
transferred to the nucleon (ν = E−E ′ with E and E ′ the energies of the incident and scattered
leptons) and x = Q2/2Mν is the fraction of the momentum of the target nucleon carried by the
struck quark (here x and y are defined for a stationary target nucleon). The plus (minus) sign
is for neutrino (anti-neutrino) interactions. It can be seen that y is the fraction of the neutrino’s
energy which is converted into the energy of the hadron shower. A similar expression can
be written down for the NC interaction (see for example [26]) which has a ratio to the CC
cross section varying from 0.33 to 0.41 as the neutrino energy increases from 104 to 1013 GeV.
The structure functions F2 and xF3 are the sum of the quark distribution functions which have
been parameterised by fitting data [30, 31]. It can be shown that Q2 = sxy where s = 2ME
is the square of the centre of mass energy (M is the target nucleon mass). To compute the
cross sections the structure functions must be calculated at values of x . M2W/s i.e. at values
well outside the region of the fits to the parton distribution functions (PDFs) which have been
performed for x & 10−5, the range of current measurements. The extrapolation outside the
measurement range is discussed in [27], [29] and [32, 33]. Here we adopt the procedure of
extrapolating linearly on a log-log scale from the parameterised parton distribution functions
of [30] computed at x = 10−4 and x = 10−5. By considering various theoretical evolution
procedures it is estimated in [29] that the procedure has an accuracy of ∼ 32% per decade
and we use this as an estimate of the accuracy of the calculation. However, this could be an
underestimate [34].
The expression in equation 1 for charged current interactions and the one for neutral current
interactions were integrated to obtain the total neutrino-nucleon interaction cross section, the
value of the fraction of events per interval of y, 1/σdσ/dy, and the mean value of y. The total
cross section was found to be in good agreement with the values in [27, 29] and in reasonable
agreement with [28] which is based on a model different from the quark parton model. Fig-
ure 4 shows the mean value of y obtained from this procedure (solid curve) and the effect of
multiplying or dividing the PDFs by a factor 1.32 per decade (dashed curves) as an indication
of the possible range of uncertainties in the extrapolation of the PDFs. Figure 5 shows the y
dependence of the cross section for different neutrino energies.
4.2 A simple generator for neutrino interactions.
A simple generator for neutrino interactions in a column of water of thickness 20 m was con-
structed as follows. The neutrino interacts at the top of the water column (z=0, with the z axis
along the axis of the column). The energy fraction transferred, y, for the interaction was gener-
ated, distributed according to the curve for the energy of the neutrino shown in Figure 5. This
allows the energy of the hadron shower to be calculated for the event. The assumption was
made that these hadron showers will have approximately the same distributions as those of a
proton interaction at z=0 (see Section 4.3 for a test of this assumption). A series of files of
100 such proton interactions were generated at energies in steps of half an order of magnitude
between 105 and 1012 GeV. The hadron shower for each neutrino interaction was selected at
random from the 100 showers in the file at the proton energy closest to the energy of the hadron
shower. The deposited energy in each bin was then multiplied by the ratio of the energy of the
hadron shower to that of the proton shower. This is made possible because the shower shapes
vary slowly with shower energy. For example, the ratio of the peak energy deposit per 20 g
cm−2 slice to the shower energy varies from 0.037 to 0.030 as the proton shower energy varies
from 105 to 1012 GeV.
4.3 The HERWIG neutrino generator.
The CORSIKA program has an option to simulate the interactions of neutrinos at a fixed
point [35]. The first interaction is generated by the HERWIG package [36]. This option was
adapted to our version of CORSIKA in sea water. Some problems were encountered with the y
dependence of the resulting interactions due to the extrapolation of the PDFs to very small x at
high energies. This only affects the rate of the production of the showers at different y and the
distribution of the hadrons produced in the interaction at a given y should be unaffected.
A total of 700 neutrino interactions were generated at an incident neutrino energy of 2 · 1011
GeV. These were divided into the shower energy intervals 0.5−2 ·1010, 2−4 ·1010, 4−7.5 ·1010,
0.75− 1.3 · 1011 and 1.3− 2 · 1011. The showers in which the scattered lepton energy disagreed
with the shower energy by more than 20% were eliminated leading to a loss of 17% of the
events with shower energy greater than 0.5 · 1010 GeV. This is due to radiative effects and
misidentification of the scattered lepton. Approximately 70 events remained in each energy
interval. The energy depositions from these were averaged and compared to the averages from
proton showers scaled by the ratio of the shower energy to the proton energy. Figure 6 shows
the longitudinal distributions of the hadronic shower energy deposited for the different energy
intervals (labelled EW ) compared to the scaled proton distributions. Figure 7 shows a sample
of the transverse distributions.
There is a good consistency between the proton and neutrino induced showers. The proton
showers peak, on average, 20 g cm−2 shallower in depth with a peak energy 2% larger than the
neutrino induced showers. This is small compared to the overall uncertainty. The slight shift in
the longitudinal distribution is reflected as a normalisation shift in the radial distributions. We
conclude therefore that to equate a proton induced shower starting at the neutrino interaction
point to that from a neutrino is a satisfactory approximation.
5 Parameterisation of showers
In this section a parameterisation of the energy deposited by the showers generated by COR-
SIKA (averaged over 100 showers depositing the same total energy) is described. Other avail-
able parameterisations will then be compared with the showers generated by CORSIKA.
The acoustic signal generated by a hadron shower depends mainly on the energy deposited
in the inner core of the shower. This is illustrated in figure 8 which shows the contribution to
the acoustic signal from cores of different radii. This figure shows that it is crucial to represent
the deposited energy well at radius less than 2.05 g cm−2. The calculation of the acoustic signal
from the deposited energy is described in section 6.
5.1 Parameterisation of the CORSIKA Showers
The differential energy deposited was parameterised as follows
= L(z, EL) · R(r, z, EL) (2)
where the function L(z, EL) represents the longitudinal distribution of deposited energy and
R(r, z, EL) the radial distribution. Here EL is log10E with E the total shower energy.
The function L(z, EL) = dE/dz is a modified
4 version of the Gaisser-Hillas function [37].
This function represents the longitudinal distribution of the energy deposited.
L(z, EL) = P1L
z − P2L
P3L − P2L
(P3L−P2L)
P4L+P5Lz+P6Lz
P3L − z
P4L + P5Lz + P6Lz2
Here the parameters PnL were fitted to quadratic functions of EL = log10E with values
= 2.760 · 10−3 − 1.974 · 10−4EL + 7.450 · 10
−6E2L (4)
P2L = −210.9− 6.968 · 10
−3EL + 0.1551E
L (5)
P3L = −41.50 + 113.9EL − 4.103E
L (6)
P4L = 8.012 + 11.44EL − 0.5434E
L (7)
P5L = 0.7999 · 10
−5 − 0.004843EL + 0.0002552E
L (8)
P6L = 4.563 · 10
−5 − 3.504 · 10−6EL + 1, 315 · 10
−7E2L. (9)
The parameter P1L represents the peak energy deposited and P3L the depth in the z coordinate
at this peak while P2L, P4L, P5L and P6L are related to the shower width and shape in z.
The radial distribution was represented by the NKG function [37]
R(r, z, EL) =
)(P2R−1)
)(P2R−4.5)
where the integral
)(P2R−1)
)(P2R−4.5)
dr = P1R
Γ(4.5− 2P2R)Γ(P2R)
Γ(4.5− P2R)
4 The modification is to replace the shape parameter λ in equation 3.5 of reference [37] by the quadratic
expression in z in equation 3.
The parameter P1R was found to vary strongly with depth while P2R was only a weak function
of depth. The parameters PnR (with n = 1,2) were each represented by the quadratic form
PnR = A +Bz + Cz
2 (11)
and the quantities A,B,C parameterised as quadratic functions of EL. This gave for P1R
A = 0.01287E2L − 0.2573EL + 0.9636 (12)
B = −0.4697 · 10−4E2L + 0.0008072EL + 0.0005404 (13)
C = 0.7344 · 10−7E2L − 1.375 · 10
−6EL + 4.488 · 10
−6 (14)
and for the parameter P2R
A = −0.8905 · 10−3E2L + 0.007727EL + 1.969 (15)
B = 0.1173 · 10−4E2L − 0.0001782EL − 5.093 · 10
−6 (16)
C = −0.1058 · 10−7E2L + 0.1524 · 10
−6EL − 0.1069 · 10
−8. (17)
The fit was made in a depth range where dE/dz was greater than 10% of the peak value
defined by equation 4. The program MINUIT [38] was used to minimise the squared fractional
deviations
Fi −Di
Fi +Di
where Fi and Di refer to the fitted value and the value observed in the ith bin from the COR-
SIKA showers, respectively. In order to improve the fit at small radii the contributions to χ2
were arbitrarily weighted by 10 for r < 2.05 g cm−2, 4 for 2.05 < r < 3.075 g cm−2, unity
for 3.075 < r < 51.25 g cm−2 and 0.25 for r > 51.25 g cm−2. The RMS value of the frac-
tional deviations was 3.4% for radii less than 51.25 g cm−2 and for energies greater than 106.5
GeV. The fit becomes poorer at lower energies and greater radii than these. Integrating the pa-
rameterisation shows that the fraction of the total energy computed from the fit within the fit
range was 91% averaged over the deposited energy range 107 to 1012 GeV. The corresponding
fraction directly from the CORSIKA distributions was 92.5%, averaged over the same energy
range. When applying this parameterisation at depths with smaller energy deposit than 10% of
the peak value, the energy was assumed to be confined to an annular radius of 1.025 g cm−2.
There was a good agreement (within 5% at the peak) between the acoustic signal computed
using this parameterisation and that taken directly from the CORSIKA showers.
5.2 The parameterisation used by the SAUND Collaboration
The SAUND Collaboration [39] uses the following parameterisation [40], based on the NKG
formulae (e.g. see reference [37]), for the energy deposited per unit depth, z, and per unit
annular thickness at radius r from a shower of energy E
= Ek(
)t exp (t− z/λ) 2πrρ(r) (19)
where zmax = 0.9X0 ln(E/Ec) is the maximum shower depth, X0 = 36.1 g cm
−2 is the radia-
tion length and Ec = 0.0838GeV. The constants t = zmax/λwhere λ = 130−5 log10(E/10
4GeV)
g cm−2 and k = tt−1/ exp (t)λΓ(t). The radial density is given by
ρ(r) =
as−2(1 + a)s−4.5
Γ(4.5− s)
2πΓ(s)Γ(4.5− 2s)
where a = r/rM with rM = 9.04 g cm
−2, the Molière radius in water, and s = 1.25. Figure
9 shows the radial distributions from CORSIKA compared with the absolute predictions of this
parameterisation.
There is qualitative agreement between the parameterisation and the CORSIKA results. The
difference in normalisation is explained by the somewhat different longitudinal profiles of the
CORSIKA showers from the SAUND parameterisation. The latter are broader with a lower
peak energy deposit and a depth of the maximum which is larger than the CORSIKA showers.
CORSIKA predicts more energy at small r than the SAUND parameterisation. Quantitatively,
51% of the shower energy is contained within a cylinder of radius 4 cm for the CORSIKA
showers compared to 35% from the SAUND parameterisation. These fractions are approxi-
mately independent of energy. Hence, in acoustic detectors a harder frequency spectrum for the
acoustic signals is predicted by CORSIKA than by the SAUND parameterisation. Note that in
the fit described in Section 5.1 the values of the parameter P1R (equivalent to RM in equation
20) were strongly depth dependent and much lower than the Molière radius in water, assumed
by the SAUND collaboration. In addition, the value of P2R (equivalent to s in equation 20)
while relatively constant tended to be at a higher value (∼ 1.9) than that assumed by SAUND.
5.3 The parameterisation used by Niess and Bertin
Hadron showers, generated by Geant4 (version 4.06 p03), were studied up to energies of 105
GeV and electromagnetic showers to higher energies by Niess and Bertin [15,16]. The hadronic
showers were parameterised as follows.
= rf(z)g(r, z) (21)
f(z) =
(bz′)a−1 exp−bz′
where E is the energy of the hadron shower, X0 is the radiation length in water, z
′ = z/X0,
b = 0.56 as determined from the fit and a is chosen to satisfy z′max = (a − 1)/b. Here z
max is
the depth in radiation lengths at which the shower maximum occurs. This is parameterised as
z′max = 0.65 log(
) + 3.93 (23)
with Ec = 0.05427 GeV. The radial distribution function is parameterised as
g(r, z) = g0
where ri = 3.5 cm, n = n1 = 1.66 − 0.29(z/zmax) for r < ri and n = n2 = 2.7 for r > ri.
The constant g0 is chosen to be (2− n1)(n2 − 2)/((n2 − n1)r
i ) so that the integral of the radial
distribution is unity.
Figure 10 shows the radial distributions from this parameterisation compared with the pre-
dictions of CORSIKA. There is quite good agreement between the two. There is a difference in
the normalisation with depth since Geant4, on which this parameterisation is based, produces
showers which tend to develop more slowly with depth than those from CORSIKA (see Fig-
ure 2). Furthermore, both this and the SAUND parameterisation (Section 5.2) assume a linear
variation of the shower peak depth with logE whereas CORSIKA gives a clear parabolic shape
(see equation 6). This is illustrated in Figure 11. The Niess-Bertin parameterisation predicts that
56% of the shower energy is contained within a cylinder of radius 4 cm in reasonable agreement
with the value of 51% from CORSIKA (these values are almost independent of energy).
6 The acoustic signals from the showers.
The pressure, P , from a hadron shower depositing total energy E at time t resulting from the
deposition of relative energy density ǫ = (1/E)(1/2πr)d2E/drdz at a point distant d from the
volume, dV , follows the form [13]
P (d, t) =
δ(t− d/c)
dV (25)
where the integral is over the total volume of the shower. Here β = 2.0 · 10−4 is the thermal
expansion coefficient of the medium at 14◦C, Cp = 3.8 · 10
3 J kg−1 K−1 is the specific heat
capacity and c = 1500 ms−1 is the velocity of sound in the sea water.
Acoustic signals seen by an observer at distance r from the shower centre are computed from
equation (25) as follows. Points are produced randomly throughout the volume of the shower
with density proportional to the deposited energy density and the time of flight from every
produced point to the observer calculated. The flight times to the observer are histogrammed
over 2n bins (in this case n = 10 is chosen) centred on the mean flight time and with a suitable
bin width, τ (chosen here to be 1µs). The counts in each bin of the histogram are divided by τ
yielding the function Exyz(t). The Fourier transform of the pressure wave is then
P (ω) =
Exyz(t)e
−iωtdt =
Exyz(t)e
−iωtdt =
iωExyz(ω)
using the standard Fourier transform theorem, that taking the derivative in the time domain is the
same as multiplying by iω in the frequency domain. The Fourier transform Exyz(ω) at angular
frequency ω is evaluated numerically by a fast Fourier Transform (FFT) from the histogram
Exyz(t). A correction is applied for attenuation in the water by a factor A(ω) = e
−α(ω)r where
α(ω) is the frequency dependent attenuation coefficient. The pressure as a function of time is
then evaluated numerically by an inverse FFT using frequency steps from zero to the sampling
frequency (the inverse of the bin width τ i.e. 1 MHz in this case). This gives
P (t) =
n=511
n=−512
P (ωn)A(ωn)e
inΩ (27)
where Ω = 2π/1024 radians and ωn/2π = nΩ/2π MHz is the nth frequency. The attenuation
coefficient α(ω) is computed either according to the formulae in [42] or using the complex
attenuation given in [15, 16]. This method of calculation was computationally much faster than
the evaluation of the space integral given in equation 18 of reference [13] and gave identical
results.
Acoustic pulses, computed with the complex attenuation described in [15, 16], using the
parameterisations of the shower profile given above are shown in Figure 12. It can be seen that
the parameterisation developed here gives similar results to that described in [15, 16] despite
the fact that the latter was an extrapolation from low energy simulations. The parameterisation
used by SAUND [39, 40] gives smaller signals concentrated at somewhat lower frequencies.
Further properties of the acoustic signals are shown in Figures 13 to 16. The pulses tend
to be somewhat asymmetric with the asymmetry defined by |Pmax| − |Pmin|/|Pmax| + |Pmin|.
The complex nature of the attenuation enhances this asymmetry. This is most evident in the far
field conditions e.g. at 5km where non complex attenuation would yield a totally symmetric
pulse. Figure 13 shows the angular dependence of the peak pressure. Here the angle is that
subtended by the acoustic detector relative to the plane, termed the median plane, through the
shower maximum at right angles to the axis of the shower. The parameterisation derived here
gives a somewhat narrower angular spread than the others. This could be due to the slightly
longer showers predicted by CORSIKA than the others. Figure 13 also shows the asymmetry of
the pulse as a function of this angle. The pulse initially becomes more symmetric moving out of
the median plane and then the asymmetry becomes negative at larger angles. Figure 14 shows
the decrease of the pulsed peak pressure with distance from the shower in the median plane and
the asymmetry with distance in this plane. Figures 15 and 16 show the frequency composition
of the pulses at different angles to the median plane at 1 km from the shower and at different
distance in the median plane, respectively.
7 Conclusions
The simulation program for high energy cosmic ray air showers, CORSIKA, has been modified
to work in a water or ice medium. This allows both hadron and neutrino showers to be generated
in the medium over a wide range of energy (105 to 1012 GeV). The properties of hadronic
showers in water simulated by CORSIKA agree with those from other simulations to within
10 − 20%. A similar uncertainty has been noted previously from the variations in CORSIKA
showers in air generated by different models of the hadron interactions. However, none of
the other available simulations for water cover the range of energies accessible to CORSIKA.
The hadronic showers produced by neutrino interactions are shown to have similar profiles to
proton showers which deposit the same amount of energy to that from the neutrino and which
start at the interaction point of the neutrino. The properties of the neutrino interactions are
described. A parameterisation of the shower profiles generated by CORSIKA is given. There is
reasonable agreement with the parameterisation based on the Geant4 simulations at low energy
(< 105 GeV) developed by Niess and Bertin. However, the agreement with the parameterisation
used by the SAUND Collaboration, which is based on the NKG formalism, is less good. The
position of the shower maximum, determined from the CORSIKA program, is found to vary
quadratically with logE rather than linearly as assumed in the latter two parameterisations.
The acoustic signals generated by neutrino interactions using CORSIKA and by the two
other parameterisations are described and their properties are studied. The acoustic signal is
found to be very sensitive to the energy deposited close to the shower axis.
7.1 Acknowledgments
We wish to thank Ralph Engel, Dieter Heck, Johannes Knapp and Tanguy Pierog for their
assistance in modifying the CORSIKA program. We also thank Valentin Niess and Justin Van-
denbroucke for valuable discussions.
References
[1] Proceedings of the Workshop on Acoustic and Radio EeV Neutrino Detection Activities
(ARENA), DESY, Zeuthen (May 2005), Editors R. Nahnhauer and S. Böser
[2] K. Griesen, Phys. Rev. Lett.16 (1966) 748,
G.T. Zaptsepin, V.A. Kuzmin, JETP Lett. 4 (1966) 78.
[3] E. Waxman and J. Bahcall, Phys. Rev. D59 (1999) 023002, (hep-ph/9807282).
[4] R.D. Engel, D. Seckel and T.Stanev, Phys. Rev. D64 (2001) 093010 (astro-ph/0101216).
[5] See for example M. Ackermann et al., (astro-ph/0412347) Phys. Rev. D71 (2005) 077102.
[6] See for example Nucl. Instrum. Meth. A567 (2006) 438
[7] J.A. Aguilar et al., astro-ph/0606229.
[8] See for example G. Aggouras et al., Nucl. Instrum. and Meth. A552 (2005) 420.
[9] See for example Nucl. Phys. Proc. Suppl. 143 (2005) 373.
[10] “CORSIKA: A Monte Carlo Code to Simulate Extensive Air Showers”, D. Heck et al.,
Karlsruhe Report FZKA 6019. (http://www-ik.fzk.de/corsika).
[11] G. Askar’yan, Soviet Physics JETP 14 (1962) 441 and 21 (1965) 658.
[12] J. Alvarez-Muñiz, E. Marqués, R.A. Vázquez and E. Zas Phys. Rev. D68 (2003) 043001
(astro-ph/0206043).
[13] J.G. Learned Phys. Rev. D19 (1979) 3293.
[14] J. Alvarez-Muñiz and E.Zas, Phys. Lett. B434 (1998) 396 (astro-ph/9806098).
[15] V. Niess and V. Bertin astro-ph/0511617 and V. Niess, PhD Thesis, CPPM, Marseille.
[16] V. Niess, PhD Thesis, CPPM, Marseille, see equations 1-55 and 1-56.
http://arxiv.org/abs/hep-ph/9807282
http://arxiv.org/abs/astro-ph/0101216
http://arxiv.org/abs/astro-ph/0412347
http://arxiv.org/abs/astro-ph/0606229
http://www-ik.fzk.de/corsika
http://arxiv.org/abs/astro-ph/0206043
http://arxiv.org/abs/astro-ph/9806098
http://arxiv.org/abs/astro-ph/0511617
[17] Geant4, J. Allison et al., Nucl. Inst. and Meths. in Phys. Research A506 (2003) 250 and
IEEE Transactions on Nucl. Science 53 (2006) 270.
[18] “The EGS4 Code System” W.R. Nelson, H.Hirayama and D.W.O. Rogers, report number
SLAC-265.
[19] L.D. Landau and I.J. Pomeranchuk, Dokl. Akad. Nauk. SSSR 92 (1953) 535 and 92 (1953)
735. These papers are available in English in L. Landau, “The Collected Papers of L.D.
Landau”, Pergamon Press 1965.
A.B. Migdal, Phys. Rev. 103 (1956) 1811.
[20] Particle data table, Phys. Lett. 592 (2004) 1.
[21] R. M. Sternheimer, S.M. Seltzer and M.J.Berger, Atomic Data and Nuclear Data Tables
30 (1984) 261.
[22] D. Heck et al., Forschungszentrum Karlsruhe GmbH, Karlsruhe, Report number FZKA
6019 (1998).
[23] N.N. Kalmykov and S. Ostapchenko, Phys. Atom. Nucl. 56 (1993) 346, N.N. Kalmykov
et al., Nucl. Phys. Proc. Suppl. 52B (1997) 17.
[24] J. Alvarez-Muniz and E. Zas, Phys. Lett. B434 (1998) 396 (astro-ph/9806098).
[25] Influence of Hadronic Interaction Model on the Development of EAS in Monte Carlo
Simulations, D. Heck, J.Knapp and G. Schatz, Nucl. Phys. B (Proc. Suppl.) 52B (1997)
139-141.
[26] “An Introduction to the Physics of Quarks and Leptons” by P. Renton (published by Cam-
bridge University Press, 1990)
[27] J. Kwiecinski, A.D. Martin and A.M. Stasto Acta Phys. Polon. B31 (2000) 1273
(hep-ph/0004109).
[28] A.Z. Gazizov and S.I. Yanush Phys. Rev. D65 (2002) 093003 (hep-ph/0105368)
[29] R. Ghandi, C. Quigg, M.H. Reno, I. Sarcevic, Astroparticle Physics 5 (1996) 81.
[30] A.D. Martin, R.G. Roberts, W.J.Stirling and R.S. Thorne, Eur. Phys. J. C14 (2000) 133
(hep-ph/9907231).
[31] http://www.phys.psu.edu/∼cteq/
[32] J. Kwiecinski, A.D. Martin and A.M. Stasto, Phys. Rev. D59 (1999) 093002.
[33] A.D. Martin, M.G.Ryskin and A.M. Stasto, Acta Phys. Polon. B34 (2003)3273.
[34] R.S. Thorne, private communication.
[35] O. Pisanti, private communication, see also M. Ambrosio et al., astro-ph/0302062.
[36] HERWIG, G. Corcella et al., hep-ph/0011363.
http://arxiv.org/abs/astro-ph/9806098
http://arxiv.org/abs/hep-ph/0004109
http://arxiv.org/abs/hep-ph/0105368
http://arxiv.org/abs/hep-ph/9907231
http://www.phys.psu.edu/~cteq/
http://arxiv.org/abs/astro-ph/0302062
http://arxiv.org/abs/hep-ph/0011363
[37] “Introduction to Ultra High Energy Cosmic Rays” by P. Sokolsky (published by Addison-
Wesley, 1989).
[38] F. James and M. Roos, “Minuit, A System for Function Minimization and Analysis of the
Parameter Errors and Correlations”, Comput. Phys. Commun. 10 (1975) 343.
[39] J. Vandenbroucke, G. Gratta, N. Lehtenin, Astrophys. J. 621 (2005) 301.
(astro-ph/0406105).
[40] J. Vandenbroucke, private communication.
[41] N.G. Lehtinen, S. Adam, G.Gratta, T.K. Berger and M.J. Buckingham (astro-ph/0104033).
[42] M.A. Ainslie and J.G. McColm, J.Acoust. Soc. Am 103 (1998) 1671.
http://arxiv.org/abs/astro-ph/0406105
http://arxiv.org/abs/astro-ph/0104033
Interaction length of photons in water (CORSIKA)
Pair production length
with LPM Effect
(dash dotted curve)
Total Interaction length
including photonuclear
interaction (solid curve)
9/7 X0 Water (No LPM effect)
Log10 Photon Energy/GeV
Corsika - LPM Effect On
Corsika - LPM Effect Off
4 5 6 7 8 9 10 11 12
Figure 1: The interaction length for high energy gamma rays versus the photon energy measured
in CORSIKA (data points with statistical errors). The dash dotted curve shows the pair produc-
tion length computed from the LPM effect using the formulae of Migdal [19]. The solid curve
shows the computed total interaction length, including both pair production and photonuclear
interactions with the cross section from CORSIKA. The dashed line labelled 9/7X0 shows the
expected pair production length without the LPM effect. Here X0 is the radiation length of the
material.
z (cm)
0 200 400 600 800 1000 1200 1400 1600 1800 2000
Average de/dz
Geant4  seawater
Corsika seawater
Average dE/dz of 100 proton showers
z (cm)
0 200 400 600 800 1000 1200 1400 1600 1800 2000
Average de/dz
Geant4  seawater
Corsika seawater
Average dE/dz of 100 proton showers
Figure 2: Averaged longitudinal energy deposited per unit path length of 100 proton showers
at energy 104 GeV (upper plot) and 105 GeV (lower plot) generated in Geant4 and CORSIKA
versus depth in the water.
Geant4  seawater
Corsika seawater
Depth = 250cm
210 Depth = 450cm
Depth = 650cm
210 Depth = 850cm
0 5 10 15 20 25 30 35 40
210 Depth = 1050cm
Geant4  seawater
Corsika seawater
Depth = 250cm
10 Depth = 450cm
) Depth = 650cm
10 Depth = 850cm
0 5 10 15 20 25 30 35 40
10 Depth = 1050cm
Figure 3: Averaged radial energy deposited per 20 g cm−2 vertical slice per unit radial distance
for 100 proton showers at energy 104 GeV (left hand plots) and 105 GeV (right hand plots)
generated in Geant4 and CORSIKA versus distance from the axis in the water for different
depths of the shower.
Average y for different structure function extrapolations
Standard extrapolation (solid)
Scaled extrapolations (dashed)
log Eν/GeV
4 6 8 10 12
Figure 4: The mean value of y as a function of energy for νµ interactions computed according
to the standard model with the PDFs of MRS99 [30], extrapolating x and Q2 out of the fit
range from x = 10−4 linearly on a log-log scale. The upper dashed curve shows the result of
multiplying the PDFs by 1.32log(10
−4/x) for PDFs with x < 10−4 and the lower dashed curve by
dividing by this factor. The deviations of the dashed curve from the solid one is an indication
of the precision of the standard model.
Differential cross section for νµ Scattering
E=104 GeV
E=1013 GeV
E=106 GeV
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Figure 5: The fraction of events per unit y interval for different νµ energies computed by inte-
grating the expressions for the CC and NC cross sections.
x 10 5
10000
x 10 5
x 10 6
x 10 6
Mean EW = 10
10 GeV
Proton (dashed) νµ (solid) showers of same energy
   EW = 3 10
10 GeV
    EW = 5.75 10
10 GeV
    EW = 10
11 GeV
    EW = 1.65 10
11 GeV
Depth gm cm-2
x 10 6
200 400 600 800 1000 1200 1400 1600 1800
Figure 6: The longitudinal distribution of the deposited energy for neutrino showers (solid)
generated by the Herwig-CORSIKA package and proton showers (dashed) scaled to the same
values of shower energy EW . The scaling factors applied to the average of the protons showers
with energy 1010 GeV were 1.0 and 3.0 for EW = 10
10 GeV and EW = 3 · 10
10 GeV, respec-
tively. Those applied to proton showers with energy 1011 GeV were 0.575, 1.0 and 1.65 for
EW = 5.75 · 10
10 EW = 10
11 and EW = 1.65 · 10
11 GeV, respectively.
Depth 250 g/cm2
Proton (dashed) νµ (solid) EW = 3 10
10 GeV
Depth 450 g/cm2
Depth 650 g/cm2
Depth 850 g/cm2
Depth 1050 g/cm2
Radius(cm)
5 10 15 20 25 30 35 40
Depth 250 g/cm2
Proton (dashed) νµ (solid) EW = 1.65 10
11 GeV
Depth 450 g/cm2
Depth 650 g/cm2
Depth 850 g/cm2
Depth 1050 g/cm2
Radius(cm)
5 10 15 20 25 30 35 40
Figure 7: The solid curves show the averaged radial energy deposited per 20 g cm−2 vertical
slice per unit radial distance for 70 neutrino showers with energy transfer EW = 3 · 10
10 GeV
(left hand plots) and EW = 1.65 · 10
11 GeV (right hand plots). The incident neutrino energy
was 2 ·1011 GeV. For comparison the dashed curves show the distributions from proton showers
scaled to these energies. In the left (right) hand plots protons of energy 1010 GeV (1011 GeV)
were scaled by a factor of 3 (1.65).
−100 −80 −60 −40 −20 0 20 40 60 80 100
time (µs)
 Pulse @1km, z=8m (109GeV Primary)
0−1 cm
0−2 cm
Total
Figure 8: The acoustic signal at a distance of 1 km from the shower axis in the median plane
computed from the average of 100 CORSIKA showers each depositing a total energy of 109
GeV in the water. The dotted, dashed and solid curves shows the signals computed from the
deposited energies within cores of radius 1.025 g cm−2, 2.05 g cm−2 and the whole shower
(solid curve), respectively. It can be seen that most of the amplitude of the signal comes from
the energy within a core of radius 2.05 g cm−2.
Solid CORSIKA, dash SAUND parameterisation
Depth 250 g/cm2
106 GeV protons 100 shower average.
Depth 450 g/cm2
Depth 650 g/cm2
Depth 850 g/cm2
Depth 1050 g/cm2
Radius(cm)
5 10 15 20 25 30 35 40
Solid CORSIKA, dash SAUND parameterisation
Depth 250 g/cm2
1011 GeV protons 100 shower average.
Depth 450 g/cm2
Depth 650 g/cm2
Depth 850 g/cm2
Depth 1050 g/cm2
Radius(cm)
5 10 15 20 25 30 35 40
Figure 9: The radial distributions of the deposited energy at different depths from CORSIKA
compared to the parameterisation used by the SAUND collaboration [39] for 106 GeV and 1011
GeV proton induced showers.
Solid CORSIKA, dash NB parameterisation
Depth 250 g/cm2
106 GeV protons 100 shower average.
Depth 450 g/cm2
Depth 650 g/cm2
Depth 850 g/cm2
Depth 1050 g/cm2
Radius(cm)
5 10 15 20 25 30 35 40
Solid CORSIKA, dash NB parameterisation
Depth 250 g/cm2
1011 GeV protons 100 shower average.
Depth 450 g/cm2
Depth 650 g/cm2
Depth 850 g/cm2
Depth 1050 g/cm2
Radius(cm)
5 10 15 20 25 30 35 40
Figure 10: The radial distributions of the deposited energy at different depths from CORSIKA
compared to the parameterisation used by the Niess and Bertin [15, 16] (labelled NB parame-
terisation) for 106 GeV and 1011 GeV proton induced showers.
CORSIKA
SAUND
Log10E/GeV
6 7 8 9 10 11 12
Figure 11: The depth of the shower peak as a function of log10 E from CORSIKA (black points)
for the showers starting at z = 0. The solid curve shows the parameterisation according to
equation 6. The dashed (dash dotted) lines show the values assumed by the SAUND (Niess-
Bertin) Collaborations.
−100 −50 0 50 100
−0.05
time (µs)
The Acoustic Pulse from a 1011GeV Shower at 1km
ACoRNE
SAUND
0 5 10 15 20 25 30 35 40
Frequency (kHz)
Relative Power Spectrum of a Pulse from a 1011GeV Shower at 1km
ACoRNE
SAUND
Figure 12: The left hand plot shows acoustic pulses generated from the parameterisation de-
scribed in section 5.1 labelled ACoRNE, the parameterisation from reference [15, 16] labelled
NB and that from reference [39,40] labelled SAUND. These pulses were evaluated for a hadron
shower from a neutrino interaction depositing hadronic energy of 1011 GeV 1 km distant from
an acoustic detector in a plane perpendicular to the shower axis at the shower maximum (the
median plane). The right hand plot shows the frequency decomposition of the pulses in the left
hand plot.
−6 −4 −2 0 2 4 6
Angle (degrees)
Peak Pressure vs Angle from a 1011GeV Shower at 1km
ACoRNE
SAUND
0 0.5 1 1.5 2 2.5 3
−0.15
−0.05
Angle (degrees)
The Asymmetry of a Pulse from a 1011GeV Shower
ACoRNE
SAUND
Figure 13: The left hand plot shows the variation of the peak pressure in the pulse with angle
from the median plane at 1 km from the shower. The right hand plot shows the variation of
the pulse asymmetry with this angle at the same distance. The curves were computed from the
parameterisations labelled.
Distance (meters)
Peak Pressure vs Distance from a 1011GeV Shower
ACoRNE
SAUND
Distance (Meters)
The Asymmetry of a Pulse from a 1011GeV Shower
ACoRNE
SAUND
Figure 14: The left hand plot shows the decrease of the pulse peak pressure and the right hand
plot the pulse asymmetry, both in the median plane, as a function of distance from the shower
computed from the parameterisations.
0 5 10 15 20 25 30 35 40
Frequency (kHz)
Relative Power Spectrum of a Pulse from a 1011GeV Shower at 1km
0 5 10 15 20 25 30 35 40
Frequency (kHz)
Relative Power Spectrum of a Pulse from a 1011GeV Shower at 1km
Figure 15: The left hand plot shows the frequency decomposition of the acoustic signal, com-
puted from the parameterisation of the CORSIKA showers, at different angles to the median
plane at a distance of 1km from the shower and the right hand plot shows the cumulative fre-
quency spectrum i.e. the integral of the left hand plot.
0 5 10 15 20 25 30 35 40
Frequency (kHz)
Relative Power Spectrum of a Pulse from a 1011GeV Shower
1000m
2000m
0 5 10 15 20 25 30 35 40
Frequency (kHz)
Relative Power Spectrum of a Pulse from a 1011GeV Shower
1000m
2000m
Figure 16: The left hand plot shows the frequency decomposition of the acoustic signal, com-
puted from the parameterisation of the CORSIKA showers, at different distances from the hy-
drophone in the median plane and the right hand plot shows the cumulative frequency spectrum
i.e. the integral of the left hand plot.
April 8, 2007
Simulation of Ultra High Energy Neutrino Interactions in Ice
and Water
(the ACoRNE Collaboration)a
S. Bevan1, S. Danaher2, J. Perkin3, S. Ralph3y, C. Rhodes4, L. Thompson3, T. Sloan5b and
D. Waters1.
1 Physics and Astronomy Dept, University College London, UK.
2 School of Computing Engineering and Information Sciences, University of Northumbria,
Newcastle-upon-Tyne, UK.
3 Dept of Physics and Astronomy, University of Sheffield, UK.
4 Institute for Mathematical Sciences, Imperial College London, UK.
5 Department of Physics, University of Lancaster, Lancaster, UK
y Deceased
a Work supported by the UK Particle Physics and Astronomy Research Council and by the
Ministry of Defence Joint Grants Scheme
b Author for correspondence, email t.sloan@lancaster.ac.uk
Abstract
The CORSIKA program, usually used to simulate extensive cosmic ray air showers, has
been adapted to work in a water or ice medium. The adapted CORSIKA code was used
to simulate hadronic showers produced by neutrino interactions. The simulated showers
have been used to study the spatial distribution of the deposited energy in the showers.
This allows a more precise determination of the acoustic signals produced by ultra high
energy neutrinos than has been possible previously. The properties of the acoustic signals
generated by such showers are described.
(Submitted to Astroparticle Physics)
1 Introduction
In recent years interest has grown in the detection of very high energy cosmic ray neutrinos [1].
Such particles could be produced in the cosmic particle accelerators which make the charged
primaries or they could be produced by the interactions of the primaries with the Cosmic Mi-
crowave Background, the so called GZK effect [2]. The flux of neutrinos expected from these
two sources has been calculated [3,4]. It is found to be very low so that large targets are needed
for a measurable detection rate. It is interesting to measure this neutrino flux to see if it is
compatible with the values expected from these sources, incompatibility implying new physics.
Searches for cosmic ray neutrinos are ongoing in AMANDA [5], IceCube [6], ANTARES
[7] and NESTOR [8], detecting upward going muons from the Cherenkov light in either ice or
water. In general, these experiments are sensitive to lower energies than discussed here since the
Earth becomes opaque to neutrinos at very high energies. The experiments could detect almost
horizontal higher energy neutrinos but have limited target volume due to the attenuation of the
light signal in the ice. The Pierre Auger Observatory, an extended air shower array detector, will
also search for upward and almost horizontal showers from neutrino interactions [9]. In addition
to these detectors there are ongoing experiments to detect the neutrino interactions by either
radio or acoustic emissions from the resulting particle showers [1]. These latter techniques,
with much longer attenuation lengths, allow very large target volumes utilising either large
ice fields or dry salt domes for radio or ice fields, salt domes and the oceans for the acoustic
technique.
In order to assess the feasibility of each technique the production of the particle shower from
neutrino interactions needs to be simulated. Since experimental data on the interactions of such
high energy particles do not exist it is necessary to use theoretical models to simulate them.
The most extensive ultra high energy simulation program which has so far been developed is
CORSIKA [10]. However, this program has been used previously only for the simulation of
cosmic ray air showers. The program is readily available [10].
Different simulations are necessary for the radio and acoustic techniques. Radio emission
occurs due to coherent Cherenkov radiation from the particles in the shower, the Askaryan
Effect [11]. The emitted energy is sensitive to the distribution of the electron-positron asymme-
try which develops in the shower and which grows for lower energy electromagnetic particles.
Hence, to simulate radio emission, the electromagnetic component of the shower must be fol-
lowed down to very low kinetic energies (� 100 keV) [12]. In contrast, an acoustic signal
is generated by the sudden local heating of the surrounding medium induced by the particle
shower [13]. Thus to simulate the acoustic signal the spatial distribution of the deposited en-
ergy is needed. Once the electromagnetic energy in the shower reaches the MeV level (electron
range � 1 cm) the energy can be simply added to the total deposited energy and the simula-
tion of such particles discontinued. Extensive simulations have been carried out for the radio
technique [14]. However, the simulations for the acoustic technique are less advanced. Some
work has been done [15,16] using the Geant4 package [17]. However, this work is restricted to
energies less than 105 GeV for hadron showers since the range of validity of the physics models
in this package does not extend to higher energy hadrons.
In this paper the energy distributions of showers produced by neutrino interactions in sea
water at energies up to 1012 GeV are discussed. The distributions are generated using the air
shower program CORSIKA [10] modified to work in a sea water medium. The salt compo-
nent of the sea water has a negligible effect1 and the results are presented in distance units of
g cm�2, hence they should be applicable to ice also. The computed distributions have been pa-
rameterised and this parameterisation is used to develop a simple program to simulate neutrino
interactions and the resulting particle showers. The properties of the acoustic signals from the
generated showers are also presented.
2 Adaptation of the CORSIKA program to a water medium
The air shower program, CORSIKA (version 6204) [10], has been adapted to run in sea water
i.e. a medium of constant density of 1.025 g cm�3 rather than the variable density needed for
an air atmosphere. Sea water was assumed to consist of a medium in which 66:2% of the atoms
are hydrogen, 33:1% of the atoms are oxygen and 0:7% of the atoms are made of common salt,
NaCl. The salt was assumed to be a material with atomic weight and atomic number A=29.2 and
Z=14, the mean of sodium and chlorine. The purpose of this is to maintain the structure of the
program as closely as possible to the air shower version which had two principal atmospheric
components (oxygen and nitrogen) with a trace of argon. The presence of the salt component
had an almost undetectable effect on the behaviour of the showers.
Other changes made to the program to accommodate the water medium include a modifica-
tion of the stopping power formula to allow for the density effect in water 2. This only affects
the energy loss for hadrons since the stopping powers for electrons are part of the EGS [18]
package which is used by CORSIKA to simulate the propagation of the electromagnetic com-
ponent of the shower. Smaller radial binning of the shower was also required since shower radii
in water are much smaller than those in air. In addition the initial state energy for electrons
and photons above which the LPM effect [19] was simulated in the program was reduced to the
much lower value necessary for water 3. The LPM effect suppresses pair production from high
energy photons and bremsstrahlung from high energy electrons. Similarly, the interactions of
neutral pions had to be simulated at lower energy than in air because of the higher density water
medium. In all about 100 detailed changes needed to be made to the CORSIKA program to
accommodate the water medium.
To test the implementation of the LPM effect [19] in the program 100 showers from incident
gamma ray photons at several different energies were generated and the mean depth of the first
interaction (the mean free path) calculated. The observed mean free path was found to be
in agreement with the expected behaviour when both the suppression of pair production and
photonuclear interactions were taken into account (see Figure 1). This showed that the LPM
effect had been properly implemented in CORSIKA.
Considerable fluctuations between showers occurred. These are expressed in terms of the
ratio of the root mean square (RMS) deviation of a given parameter to its mean value: the
1The shower maximum was observed to peak at a depth 2:4� 1:1% less in sea water than in fresh water with
the same peak energy deposited, for protons of energy 105 GeV.
2The stopping power was computed using the Bethe-Bloch formula [20] and the density effect from the formu-
lae of Sternheimer et al [21].
3The level was set at 1 TeV compared to the characteristic energy for water E
= 270 TeV [20].
RMS peak energy deposit to the mean peak energy deposit was observed to be 14% at 105 GeV
reducing to 4% at 1011 GeV, that for the depth of the peak position varied from 19% to 7:4%
and for the full width at half maximum of the shower from 63% to 18%. To smooth out such
fluctuations averages of 100 generated showers will be taken in the following. The statistical
error on the averages is then given by these RMS values divided by 10. The hadronic energy
contributes only about 10% to the shower energy at the shower peak, the remainder being carried
by the electromagnetic part of the shower.
The simulations were all carried out in a vertical column of sea water 20 m long. The
deposited energy generated by CORSIKA was binned into 20 g cm�2 slices longitudinally
and 1.025 g cm�2 annular cylinders radially for 0 < r < 10:25 g cm�2 and 10.25 g cm�2
for 10:25 < r < 112:75 g cm�2 where r is the distance from the vertical axis. To reduce
computing times, the thinning option was used i.e. below a certain fraction of the primary
energy (in this case 10�4) only one of the particles emerging from the interaction is followed
and an appropriated weight is given to it [22]. The simulation of particles continued down to cut-
off energies of 3 MeV for electromagnetic particles and 0.3 GeV for hadrons. When a particle
reached this cut-off, the energy was added to the slice where this occurred. The QGSJET [23]
model was used to simulate the hadronic interactions.
3 Comparison with other simulations
3.1 Comparison with Geant4
Proton showers were generated in sea water using the program Geant4 (version 8.0) [17] and
compared with those generated in CORSIKA. Unfortunately, the range of validity of Geant4
physics models for hadronic interactions does not extend beyond an energy of 105 GeV. Hence
the comparison is restricted to energies below this.
Figure 2 shows the longitudinal distributions of proton showers at energies of 104 and 105
GeV (averaged over 100 showers) as determined from Geant4 and CORSIKA. The showers
from CORSIKA tend to be slightly broader and with a smaller peak energy than those generated
by Geant4. The difference in the peak height is � 5% at 104 GeV rising to � 10% at energy 105
GeV. Figure 3 shows the radial distributions. The differences in the longitudinal distributions
are reflected in the radial distributions. However, the shapes of the radial distributions are very
similar between Geant4 and CORSIKA, with CORSIKA producing � 10% more energy near
the shower axis at depths between 450 and 850 g cm�2 where most of the energy is deposited.
The acoustic signal from a shower is most sensitive to the radial distribution, particularly near
the axis (r � 0). It is relatively insensitive to the shape of the longitudinal distribution.
3.2 Comparison with the simulation of Alvarez-Muñiz and Zas
The CORSIKA simulation was also compared with the longitudinal shower profile for protons
computed in the simulation by Alvarez-Muñiz and Zas (AZ) [24]. There was a reasonable
agreement between the longitudinal shower shapes from CORSIKA and those shown in Figure
2 of ref. [24]. However, the numbers of electrons and positrons at the peak of the CORSIKA
showers was � 20% lower than those from ref. [24]. This number is sensitive to the energy
below which these particles are counted and this is not specified in [24]. Hence the agreement
between CORSIKA and their simulation is probably satisfactory within this uncertainty.
In conclusion, the modifications made to CORSIKA to simulate high energy showers in a
water medium give results which are compatible with the predictions from the Geant4 simula-
tions for energy less than 105 GeV and the simulation of AZ within 20%. This is taken to be
the accuracy of the simulation program assuming that there are no unexpected and unknown
interactions between the centre of mass energy explored at current accelerators and those stud-
ied in these simulations. Studies of the sensitivity of the CORSIKA simulation to the different
models of the hadronic interactions have been reported in reference [25]. They find that the
peak number of electrons plus positrons varies by � 20% for proton showers in air depending
on the choice of the hadron interaction model used. These differences are similar in magnitude
to the differences between the AZ, Geant4 and CORSIKA simulations reported here. Hence
the observed differences between the Geant4, AZ and CORSIKA simulations in water could be
within the uncertainties of the hadronic interaction models.
4 Simulation of neutrino induced showers
Neutrinos interact with the nuclei of the detection medium by either the exchange of a charged
vector boson (W+), i.e. charged current (CC) interactions or the exchange of the neutral vector
boson (Z0), i.e. neutral current (NC) deep inelastic scattering interactions (see for example
[26]). The ratio of the CC to NC interaction cross sections is approximately 2:1. The CC
interactions produce charged secondary scattered leptons while the NC interactions produce
neutrinos. The hadron shower carries a fraction y of the energy of the incident neutrino and
the scattered lepton the remaining fraction 1 � y. We assume that the neutrino flavours are
homogeneously mixed when they arrive at the Earth by neutrino oscillations. Hence in the
CC interactions electrons, � and � leptons will be produced as the scattered leptons in equal
proportions. At the energies we shall consider, these particles behave in a manner similar to
minimum ionising particles for � and � leptons. This is almost true also for electrons for
which the bremsstrahlung process will be suppressed by the LPM effect. Hence the charged
scattered leptons contribute little to the energy producing an acoustic signal. In the case of NC
interactions there is no contribution to this energy from the scattered lepton. For these reasons
the contribution of the scattered lepton to the shower profile is ignored beyond z = 20 m in
what follows.
It is interesting to note that a � lepton can decay to hadrons or a very high energy electron or
muon can produce bremsstrahlung photons at large distances from the interaction point. These
can initiate further distant showers, the so called “double bang” effect. The stochastic nature of
such electron showers is studied in [15, 16]. These effects are not considered in this study.
4.1 Neutrino-nucleon interaction cross sections.
A number of groups have computed the high energy neutrino-nucleon interaction cross sections,
�, [27–29]. In the quark parton model of the nucleon for the single vector boson exchange pro-
cess, the differential cross section for CC interactions can be expressed in terms of the measured
structure functions of the target nucleon F
and xF
)(1� y + y
=2)� y(1� y=2)xF
)) (1)
where G
is the Fermi weak coupling, M
is the mass of the weak vector boson, Q2 is the
square of the four momentum transferred to the target nucleon, y = �=E where � is the energy
transferred to the nucleon (� = E�E 0 with E and E 0 the energies of the incident and scattered
leptons) and x = Q2=2M� is the fraction of the momentum of the target nucleon carried by the
struck quark (here x and y are defined for a stationary target nucleon). The plus (minus) sign
is for neutrino (anti-neutrino) interactions. It can be seen that y is the fraction of the neutrino’s
energy which is converted into the energy of the hadron shower. A similar expression can
be written down for the NC interaction (see for example [26]) which has a ratio to the CC
cross section varying from 0.33 to 0.41 as the neutrino energy increases from 104 to 1013 GeV.
The structure functions F
and xF
are the sum of the quark distribution functions which have
been parameterised by fitting data [30, 31]. It can be shown that Q2 = sxy where s = 2ME
is the square of the centre of mass energy (M is the target nucleon mass). To compute the
cross sections the structure functions must be calculated at values of x . M2
=s i.e. at values
well outside the region of the fits to the parton distribution functions (PDFs) which have been
performed for x & 10�5, the range of current measurements. The extrapolation outside the
measurement range is discussed in [27], [29] and [32, 33]. Here we adopt the procedure of
extrapolating linearly on a log-log scale from the parameterised parton distribution functions
of [30] computed at x = 10�4 and x = 10�5. By considering various theoretical evolution
procedures it is estimated in [29] that the procedure has an accuracy of � 32% per decade
and we use this as an estimate of the accuracy of the calculation. However, this could be an
underestimate [34].
The expression in equation 1 for charged current interactions and the one for neutral current
interactions were integrated to obtain the total neutrino-nucleon interaction cross section, the
value of the fraction of events per interval of y, 1=�d�=dy, and the mean value of y. The total
cross section was found to be in good agreement with the values in [27, 29] and in reasonable
agreement with [28] which is based on a model different from the quark parton model. Fig-
ure 4 shows the mean value of y obtained from this procedure (solid curve) and the effect of
multiplying or dividing the PDFs by a factor 1.32 per decade (dashed curves) as an indication
of the possible range of uncertainties in the extrapolation of the PDFs. Figure 5 shows the y
dependence of the cross section for different neutrino energies.
4.2 A simple generator for neutrino interactions.
A simple generator for neutrino interactions in a column of water of thickness 20 m was con-
structed as follows. The neutrino interacts at the top of the water column (z=0, with the z axis
along the axis of the column). The energy fraction transferred, y, for the interaction was gener-
ated, distributed according to the curve for the energy of the neutrino shown in Figure 5. This
allows the energy of the hadron shower to be calculated for the event. The assumption was
made that these hadron showers will have approximately the same distributions as those of a
proton interaction at z=0 (see Section 4.3 for a test of this assumption). A series of files of
100 such proton interactions were generated at energies in steps of half an order of magnitude
between 105 and 1012 GeV. The hadron shower for each neutrino interaction was selected at
random from the 100 showers in the file at the proton energy closest to the energy of the hadron
shower. The deposited energy in each bin was then multiplied by the ratio of the energy of the
hadron shower to that of the proton shower. This is made possible because the shower shapes
vary slowly with shower energy. For example, the ratio of the peak energy deposit per 20 g
cm�2 slice to the shower energy varies from 0.037 to 0.030 as the proton shower energy varies
from 105 to 1012 GeV.
4.3 The HERWIG neutrino generator.
The CORSIKA program has an option to simulate the interactions of neutrinos at a fixed
point [35]. The first interaction is generated by the HERWIG package [36]. This option was
adapted to our version of CORSIKA in sea water. Some problems were encountered with the y
dependence of the resulting interactions due to the extrapolation of the PDFs to very small x at
high energies. This only affects the rate of the production of the showers at different y and the
distribution of the hadrons produced in the interaction at a given y should be unaffected.
A total of 700 neutrino interactions were generated at an incident neutrino energy of 2 � 1011
GeV. These were divided into the shower energy intervals 0:5�2�1010, 2�4�1010, 4�7:5�1010,
0:75� 1:3 � 10
11 and 1:3� 2 � 1011. The showers in which the scattered lepton energy disagreed
with the shower energy by more than 20% were eliminated leading to a loss of 17% of the
events with shower energy greater than 0:5 � 1010 GeV. This is due to radiative effects and
misidentification of the scattered lepton. Approximately 70 events remained in each energy
interval. The energy depositions from these were averaged and compared to the averages from
proton showers scaled by the ratio of the shower energy to the proton energy. Figure 6 shows
the longitudinal distributions of the hadronic shower energy deposited for the different energy
intervals (labelled E
) compared to the scaled proton distributions. Figure 7 shows a sample
of the transverse distributions.
There is a good consistency between the proton and neutrino induced showers. The proton
showers peak, on average, 20 g cm�2 shallower in depth with a peak energy 2% larger than the
neutrino induced showers. This is small compared to the overall uncertainty. The slight shift in
the longitudinal distribution is reflected as a normalisation shift in the radial distributions. We
conclude therefore that to equate a proton induced shower starting at the neutrino interaction
point to that from a neutrino is a satisfactory approximation.
5 Parameterisation of showers
In this section a parameterisation of the energy deposited by the showers generated by COR-
SIKA (averaged over 100 showers depositing the same total energy) is described. Other avail-
able parameterisations will then be compared with the showers generated by CORSIKA.
The acoustic signal generated by a hadron shower depends mainly on the energy deposited
in the inner core of the shower. This is illustrated in figure 8 which shows the contribution to
the acoustic signal from cores of different radii. This figure shows that it is crucial to represent
the deposited energy well at radius less than 2.05 g cm�2. The calculation of the acoustic signal
from the deposited energy is described in section 6.
5.1 Parameterisation of the CORSIKA Showers
The differential energy deposited was parameterised as follows
= L(z; E
) �R(r; z; E
) (2)
where the function L(z; E
) represents the longitudinal distribution of deposited energy and
R(r; z; E
) the radial distribution. Here E
is log
E with E the total shower energy.
The function L(z; E
) = dE=dz is a modified4 version of the Gaisser-Hillas function [37].
This function represents the longitudinal distribution of the energy deposited.
L(z; E
) = P
z � P
z + P
Here the parameters P
were fitted to quadratic functions of E
= log
E with values
= 2:760 � 10
� 1:974 � 10
+ 7:450 � 10
= �210:9� 6:968 � 10
+ 0:1551E
= �41:50 + 113:9E
� 4:103E
= 8:012 + 11:44E
� 0:5434E
= 0:7999 � 10
� 0:004843E
+ 0:0002552E
= 4:563 � 10
� 3:504 � 10
+ 1; 315 � 10
: (9)
The parameter P
represents the peak energy deposited and P
the depth in the z coordinate
at this peak while P
and P
are related to the shower width and shape in z.
The radial distribution was represented by the NKG function [37]
R(r; z; E
�4:5)
where the integral
�4:5)
dr = P
�(4:5� 2P
�(4:5� P
4The modification is to replace the shape parameter � in equation 3.5 of reference [37] by the quadratic expres-
sion in z in equation 3.
The parameter P
was found to vary strongly with depth while P
was only a weak function
of depth. The parameters P
(with n = 1,2) were each represented by the quadratic form
= A +Bz + Cz
2 (11)
and the quantities A;B;C parameterised as quadratic functions of E
. This gave for P
A = 0:01287E
� 0:2573E
+ 0:9636 (12)
B = �0:4697 � 10
+ 0:0008072E
+ 0:0005404 (13)
C = 0:7344 � 10
� 1:375 � 10
+ 4:488 � 10
�6 (14)
and for the parameter P
A = �0:8905 � 10
+ 0:007727E
+ 1:969 (15)
B = 0:1173 � 10
� 0:0001782E
� 5:093 � 10
�6 (16)
C = �0:1058 � 10
+ 0:1524 � 10
� 0:1069 � 10
: (17)
The fit was made in a depth range where dE=dz was greater than 10% of the peak value
defined by equation 4. The program MINUIT [38] was used to minimise the squared fractional
deviations
where F
and D
refer to the fitted value and the value observed in the ith bin from the COR-
SIKA showers, respectively. In order to improve the fit at small radii the contributions to �2
were arbitrarily weighted by 10 for r < 2:05 g cm�2, 4 for 2:05 < r < 3:075 g cm�2, unity
for 3:075 < r < 51:25 g cm�2 and 0.25 for r > 51:25 g cm�2. The RMS value of the frac-
tional deviations was 3:4% for radii less than 51.25 g cm�2 and for energies greater than 106:5
GeV. The fit becomes poorer at lower energies and greater radii than these. Integrating the pa-
rameterisation shows that the fraction of the total energy computed from the fit within the fit
range was 91% averaged over the deposited energy range 107 to 1012 GeV. The corresponding
fraction directly from the CORSIKA distributions was 92:5%, averaged over the same energy
range. When applying this parameterisation at depths with smaller energy deposit than 10% of
the peak value, the energy was assumed to be confined to an annular radius of 1.025 g cm�2.
There was a good agreement (within 5% at the peak) between the acoustic signal computed
using this parameterisation and that taken directly from the CORSIKA showers.
5.2 The parameterisation used by the SAUND Collaboration
The SAUND Collaboration [39] uses the following parameterisation [40], based on the NKG
formulae (e.g. see reference [37]), for the energy deposited per unit depth, z, and per unit
annular thickness at radius r from a shower of energy E
= Ek(
exp (t� z=�) 2�r�(r) (19)
where z
= 0:9X
ln(E=E
) is the maximum shower depth, X
= 36:1 g cm�2 is the radia-
tion length andE
= 0:0838GeV. The constants t = z
=�where � = 130�5 log
(E=10
g cm�2 and k = tt�1= exp (t)��(t). The radial density is given by
�(r) =
(1 + a)
s�4:5
�(4:5� s)
2��(s)�(4:5� 2s)
where a = r=r
with r
= 9:04 g cm�2, the Molière radius in water, and s = 1:25. Figure
9 shows the radial distributions from CORSIKA compared with the absolute predictions of this
parameterisation.
There is qualitative agreement between the parameterisation and the CORSIKA results. The
difference in normalisation is explained by the somewhat different longitudinal profiles of the
CORSIKA showers from the SAUND parameterisation. The latter are broader with a lower
peak energy deposit and a depth of the maximum which is larger than the CORSIKA showers.
CORSIKA predicts more energy at small r than the SAUND parameterisation. Quantitatively,
51% of the shower energy is contained within a cylinder of radius 4 cm for the CORSIKA
showers compared to 35% from the SAUND parameterisation. These fractions are approxi-
mately independent of energy. Hence, in acoustic detectors a harder frequency spectrum for the
acoustic signals is predicted by CORSIKA than by the SAUND parameterisation. Note that in
the fit described in Section 5.1 the values of the parameter P
(equivalent to R
in equation
20) were strongly depth dependent and much lower than the Molière radius in water, assumed
by the SAUND collaboration. In addition, the value of P
(equivalent to s in equation 20)
while relatively constant tended to be at a higher value (� 1:9) than that assumed by SAUND.
5.3 The parameterisation used by Niess and Bertin
Hadron showers, generated by Geant4 (version 4.06 p03), were studied up to energies of 105
GeV and electromagnetic showers to higher energies by Niess and Bertin [15,16]. The hadronic
showers were parameterised as follows.
= rf(z)g(r; z) (21)
f(z) =
exp�bz
where E is the energy of the hadron shower, X
is the radiation length in water, z0 = z=X
b = 0:56 as determined from the fit and a is chosen to satisfy z0
= (a � 1)=b. Here z0
the depth in radiation lengths at which the shower maximum occurs. This is parameterised as
= 0:65 log(
) + 3:93 (23)
with E
= 0:05427 GeV. The radial distribution function is parameterised as
g(r; z) = g
where r
= 3:5 cm, n = n
= 1:66 � 0:29(z=z
) for r < r
and n = n
= 2:7 for r > r
The constant g
is chosen to be (2� n
� 2)=((n
) so that the integral of the radial
distribution is unity.
Figure 10 shows the radial distributions from this parameterisation compared with the pre-
dictions of CORSIKA. There is quite good agreement between the two. There is a difference in
the normalisation with depth since Geant4, on which this parameterisation is based, produces
showers which tend to develop more slowly with depth than those from CORSIKA (see Fig-
ure 2). Furthermore, both this and the SAUND parameterisation (Section 5.2) assume a linear
variation of the shower peak depth with logE whereas CORSIKA gives a clear parabolic shape
(see equation 6). This is illustrated in Figure 11. The Niess-Bertin parameterisation predicts that
56% of the shower energy is contained within a cylinder of radius 4 cm in reasonable agreement
with the value of 51% from CORSIKA (these values are almost independent of energy).
6 The acoustic signals from the showers.
The pressure, P , from a hadron shower depositing total energy E at time t resulting from the
deposition of relative energy density � = (1=E)(1=2�r)d2E=drdz at a point distant d from the
volume, dV , follows the form [13]
P (d; t) =
Æ(t� d=
)
dV (25)
where the integral is over the total volume of the shower. Here � = 2:0 � 10�4 is the thermal
expansion coefficient of the medium at 14ÆC, C
= 3:8 � 10
3 J kg�1 K�1 is the specific heat
capacity and 
 = 1500 ms�1 is the velocity of sound in the sea water.
Acoustic signals seen by an observer at distance r from the shower centre are computed from
equation (25) as follows. Points are produced randomly throughout the volume of the shower
with density proportional to the deposited energy density and the time of flight from every
produced point to the observer calculated. The flight times to the observer are histogrammed
over 2n bins (in this case n = 10 is chosen) centred on the mean flight time and with a suitable
bin width, � (chosen here to be 1�s). The counts in each bin of the histogram are divided by �
yielding the function E
(t). The Fourier transform of the pressure wave is then
P (!) =
using the standard Fourier transform theorem, that taking the derivative in the time domain is the
same as multiplying by i! in the frequency domain. The Fourier transform E
(!) at angular
frequency ! is evaluated numerically by a fast Fourier Transform (FFT) from the histogram
(t). A correction is applied for attenuation in the water by a factor A(!) = e��(!)r where
�(!) is the frequency dependent attenuation coefficient. The pressure as a function of time is
then evaluated numerically by an inverse FFT using frequency steps from zero to the sampling
frequency (the inverse of the bin width � i.e. 1 MHz in this case). This gives
P (t) =
n=511
n=�512
 (27)
where 
 = 2�=1024 radians and !
=2� = n
=2� MHz is the nth frequency. The attenuation
coefficient �(!) is computed either according to the formulae in [42] or using the complex
attenuation given in [15, 16]. This method of calculation was computationally much faster than
the evaluation of the space integral given in equation 18 of reference [13] and gave identical
results.
Acoustic pulses, computed with the complex attenuation described in [15, 16], using the
parameterisations of the shower profile given above are shown in Figure 12. It can be seen that
the parameterisation developed here gives similar results to that described in [15, 16] despite
the fact that the latter was an extrapolation from low energy simulations. The parameterisation
used by SAUND [39, 40] gives smaller signals concentrated at somewhat lower frequencies.
Further properties of the acoustic signals are shown in Figures 13 to 16. The pulses tend
to be somewhat asymmetric with the asymmetry defined by jP
j � jP
j + jP
The complex nature of the attenuation enhances this asymmetry. This is most evident in the far
field conditions e.g. at 5km where non complex attenuation would yield a totally symmetric
pulse. Figure 13 shows the angular dependence of the peak pressure. Here the angle is that
subtended by the acoustic detector relative to the plane, termed the median plane, through the
shower maximum at right angles to the axis of the shower. The parameterisation derived here
gives a somewhat narrower angular spread than the others. This could be due to the slightly
longer showers predicted by CORSIKA than the others. Figure 13 also shows the asymmetry of
the pulse as a function of this angle. The pulse initially becomes more symmetric moving out of
the median plane and then the asymmetry becomes negative at larger angles. Figure 14 shows
the decrease of the pulsed peak pressure with distance from the shower in the median plane and
the asymmetry with distance in this plane. Figures 15 and 16 show the frequency composition
of the pulses at different angles to the median plane at 1 km from the shower and at different
distance in the median plane, respectively.
7 Conclusions
The simulation program for high energy cosmic ray air showers, CORSIKA, has been modified
to work in a water or ice medium. This allows both hadron and neutrino showers to be generated
in the medium over a wide range of energy (105 to 1012 GeV). The properties of hadronic
showers in water simulated by CORSIKA agree with those from other simulations to within
10 � 20%. A similar uncertainty has been noted previously from the variations in CORSIKA
showers in air generated by different models of the hadron interactions. However, none of
the other available simulations for water cover the range of energies accessible to CORSIKA.
The hadronic showers produced by neutrino interactions are shown to have similar profiles to
proton showers which deposit the same amount of energy to that from the neutrino and which
start at the interaction point of the neutrino. The properties of the neutrino interactions are
described. A parameterisation of the shower profiles generated by CORSIKA is given. There is
reasonable agreement with the parameterisation based on the Geant4 simulations at low energy
(< 105 GeV) developed by Niess and Bertin. However, the agreement with the parameterisation
used by the SAUND Collaboration, which is based on the NKG formalism, is less good. The
position of the shower maximum, determined from the CORSIKA program, is found to vary
quadratically with logE rather than linearly as assumed in the latter two parameterisations.
The acoustic signals generated by neutrino interactions using CORSIKA and by the two
other parameterisations are described and their properties are studied. The acoustic signal is
found to be very sensitive to the energy deposited close to the shower axis.
7.1 Acknowledgments
We wish to thank Ralph Engel, Dieter Heck, Johannes Knapp and Tanguy Pierog for their
assistance in modifying the CORSIKA program. We also thank Valentin Niess and Justin Van-
denbroucke for valuable discussions.
References
[1] Proceedings of the Workshop on Acoustic and Radio EeV Neutrino Detection Activities
(ARENA), DESY, Zeuthen (May 2005), Editors R. Nahnhauer and S. Böser
[2] K. Griesen, Phys. Rev. Lett.16 (1966) 748,
G.T. Zaptsepin, V.A. Kuzmin, JETP Lett. 4 (1966) 78.
[3] E. Waxman and J. Bahcall, Phys. Rev. D59 (1999) 023002, (hep-ph/9807282).
[4] R.D. Engel, D. Seckel and T.Stanev, Phys. Rev. D64 (2001) 093010 (astro-ph/0101216).
[5] See for example M. Ackermann et al., (astro-ph/0412347) Phys. Rev. D71 (2005) 077102.
[6] See for example Nucl. Instrum. Meth. A567 (2006) 438
[7] J.A. Aguilar et al., astro-ph/0606229.
[8] See for example G. Aggouras et al., Nucl. Instrum. and Meth. A552 (2005) 420.
[9] See for example Nucl. Phys. Proc. Suppl. 143 (2005) 373.
[10] “CORSIKA: A Monte Carlo Code to Simulate Extensive Air Showers”, D. Heck et al.,
Karlsruhe Report FZKA 6019. (http://www-ik.fzk.de/corsika).
[11] G. Askar’yan, Soviet Physics JETP 14 (1962) 441 and 21 (1965) 658.
[12] J. Alvarez-Muñiz, E. Marqués, R.A. Vázquez and E. Zas Phys. Rev. D68 (2003) 043001
(astro-ph/0206043).
[13] J.G. Learned Phys. Rev. D19 (1979) 3293.
[14] J. Alvarez-Muñiz and E.Zas, Phys. Lett. B434 (1998) 396 (astro-ph/9806098).
[15] V. Niess and V. Bertin astro-ph/0511617 and V. Niess, PhD Thesis, CPPM, Marseille.
[16] V. Niess, PhD Thesis, CPPM, Marseille, see equations 1-55 and 1-56.
[17] Geant4, J. Allison et al., Nucl. Inst. and Meths. in Phys. Research A506 (2003) 250 and
IEEE Transactions on Nucl. Science 53 (2006) 270.
[18] “The EGS4 Code System” W.R. Nelson, H.Hirayama and D.W.O. Rogers, report number
SLAC-265.
[19] L.D. Landau and I.J. Pomeranchuk, Dokl. Akad. Nauk. SSSR 92 (1953) 535 and 92 (1953)
735. These papers are available in English in L. Landau, “The Collected Papers of L.D.
Landau”, Pergamon Press 1965.
A.B. Migdal, Phys. Rev. 103 (1956) 1811.
[20] Particle data table, Phys. Lett. 592 (2004) 1.
[21] R. M. Sternheimer, S.M. Seltzer and M.J.Berger, Atomic Data and Nuclear Data Tables
30 (1984) 261.
[22] D. Heck et al., Forschungszentrum Karlsruhe GmbH, Karlsruhe, Report number FZKA
6019 (1998).
[23] N.N. Kalmykov and S. Ostapchenko, Phys. Atom. Nucl. 56 (1993) 346, N.N. Kalmykov
et al., Nucl. Phys. Proc. Suppl. 52B (1997) 17.
[24] J. Alvarez-Muniz and E. Zas, Phys. Lett. B434 (1998) 396 (astro-ph/9806098).
[25] Influence of Hadronic Interaction Model on the Development of EAS in Monte Carlo
Simulations, D. Heck, J.Knapp and G. Schatz, Nucl. Phys. B (Proc. Suppl.) 52B (1997)
139-141.
[26] “An Introduction to the Physics of Quarks and Leptons” by P. Renton (published by Cam-
bridge University Press, 1990)
[27] J. Kwiecinski, A.D. Martin and A.M. Stasto Acta Phys. Polon. B31 (2000) 1273 (hep-
ph/0004109).
[28] A.Z. Gazizov and S.I. Yanush Phys. Rev. D65 (2002) 093003 (hep-ph/0105368)
[29] R. Ghandi, C. Quigg, M.H. Reno, I. Sarcevic, Astroparticle Physics 5 (1996) 81.
[30] A.D. Martin, R.G. Roberts, W.J.Stirling and R.S. Thorne, Eur. Phys. J. C14 (2000) 133
(hep-ph/9907231).
[31] http://www.phys.psu.edu/ cteq/
[32] J. Kwiecinski, A.D. Martin and A.M. Stasto, Phys. Rev. D59 (1999) 093002.
[33] A.D. Martin, M.G.Ryskin and A.M. Stasto, Acta Phys. Polon. B34 (2003)3273.
[34] R.S. Thorne, private communication.
[35] O. Pisanti, private communication, see also M. Ambrosio et al., astro-ph/0302062.
[36] HERWIG, G. Corcella et al., hep-ph/0011363.
[37] “Introduction to Ultra High Energy Cosmic Rays” by P. Sokolsky (published by Addison-
Wesley, 1989).
[38] F. James and M. Roos, “Minuit, A System for Function Minimization and Analysis of the
Parameter Errors and Correlations”, Comput. Phys. Commun. 10 (1975) 343.
[39] J. Vandenbroucke, G. Gratta, N. Lehtenin, Astrophys. J. 621 (2005) 301. (astro-
ph/0406105).
[40] J. Vandenbroucke, private communication.
[41] N.G. Lehtinen, S. Adam, G.Gratta, T.K. Berger and M.J. Buckingham (astro-ph/0104033).
[42] M.A. Ainslie and J.G. McColm, J.Acoust. Soc. Am 103 (1998) 1671.
Interaction length of photons in water (CORSIKA)
Pair production length
with LPM Effect
(dash dotted curve)
Total Interaction length
including photonuclear
interaction (solid curve)
9/7 X0 Water (No LPM effect)
Log10 Photon Energy/GeV
Corsika - LPM Effect On
Corsika - LPM Effect Off
4 5 6 7 8 9 10 11 12
Figure 1: The interaction length for high energy gamma rays versus the photon energy measured
in CORSIKA (data points with statistical errors). The dash dotted curve shows the pair produc-
tion length computed from the LPM effect using the formulae of Migdal [19]. The solid curve
shows the computed total interaction length, including both pair production and photonuclear
interactions with the cross section from CORSIKA. The dashed line labelled 9=7X
shows the
expected pair production length without the LPM effect. Here X
is the radiation length of the
material.
z (cm)
0 200 400 600 800 1000 1200 1400 1600 1800 2000
Average de/dz
Geant4  seawater
Corsika seawater
Average dE/dz of 100 proton showers
z (cm)
0 200 400 600 800 1000 1200 1400 1600 1800 2000
Average de/dz
Geant4  seawater
Corsika seawater
Average dE/dz of 100 proton showers
Figure 2: Averaged longitudinal energy deposited per unit path length of 100 proton showers
at energy 104 GeV (upper plot) and 105 GeV (lower plot) generated in Geant4 and CORSIKA
versus depth in the water.
Geant4  seawater
Corsika seawater
Depth = 250cm
210 Depth = 450cm
Depth = 650cm
210 Depth = 850cm
0 5 10 15 20 25 30 35 40
210 Depth = 1050cm
Geant4  seawater
Corsika seawater
Depth = 250cm
10 Depth = 450cm
) Depth = 650cm
10 Depth = 850cm
0 5 10 15 20 25 30 35 40
10 Depth = 1050cm
Figure 3: Averaged radial energy deposited per 20 g cm�2 vertical slice per unit radial distance
for 100 proton showers at energy 104 GeV (left hand plots) and 105 GeV (right hand plots)
generated in Geant4 and CORSIKA versus distance from the axis in the water for different
depths of the shower.
Average y for different structure function extrapolations
Standard extrapolation (solid)
Scaled extrapolations (dashed)
log Eν/GeV
4 6 8 10 12
Figure 4: The mean value of y as a function of energy for �
interactions computed according
to the standard model with the PDFs of MRS99 [30], extrapolating x and Q2 out of the fit
range from x = 10�4 linearly on a log-log scale. The upper dashed curve shows the result of
multiplying the PDFs by 1:32log(10
=x) for PDFs with x < 10�4 and the lower dashed curve by
dividing by this factor. The deviations of the dashed curve from the solid one is an indication
of the precision of the standard model.
Differential cross section for νµ Scattering
E=104 GeV
E=1013 GeV
E=106 GeV
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Figure 5: The fraction of events per unit y interval for different �
energies computed by inte-
grating the expressions for the CC and NC cross sections.
x 10 5
10000
x 10 5
x 10 6
x 10 6
Mean EW = 10
10 GeV
Proton (dashed) νµ (solid) showers of same energy
   EW = 3 10
10 GeV
    EW = 5.75 10
10 GeV
    EW = 10
11 GeV
    EW = 1.65 10
11 GeV
Depth gm cm-2
x 10 6
200 400 600 800 1000 1200 1400 1600 1800
Figure 6: The longitudinal distribution of the deposited energy for neutrino showers (solid)
generated by the Herwig-CORSIKA package and proton showers (dashed) scaled to the same
values of shower energy E
. The scaling factors applied to the average of the protons showers
with energy 1010 GeV were 1.0 and 3.0 for E
10 GeV and E
= 3 � 10
10 GeV, respec-
tively. Those applied to proton showers with energy 1011 GeV were 0.575, 1.0 and 1.65 for
= 5:75 � 10
11 and E
= 1:65 � 10
11 GeV, respectively.
Depth 250 g/cm2
Proton (dashed) νµ (solid) EW = 3 10
10 GeV
Depth 450 g/cm2
Depth 650 g/cm2
Depth 850 g/cm2
Depth 1050 g/cm2
Radius(cm)
5 10 15 20 25 30 35 40
Depth 250 g/cm2
Proton (dashed) νµ (solid) EW = 1.65 10
11 GeV
Depth 450 g/cm2
Depth 650 g/cm2
Depth 850 g/cm2
Depth 1050 g/cm2
Radius(cm)
5 10 15 20 25 30 35 40
Figure 7: The solid curves show the averaged radial energy deposited per 20 g cm�2 vertical
slice per unit radial distance for 70 neutrino showers with energy transfer E
= 3 � 10
10 GeV
(left hand plots) and E
= 1:65 � 10
11 GeV (right hand plots). The incident neutrino energy
was 2 �1011 GeV. For comparison the dashed curves show the distributions from proton showers
scaled to these energies. In the left (right) hand plots protons of energy 1010 GeV (1011 GeV)
were scaled by a factor of 3 (1.65).
−100 −80 −60 −40 −20 0 20 40 60 80 100
time (µs)
 Pulse @1km, z=8m (109GeV Primary)
0−1 cm
0−2 cm
Total
Figure 8: The acoustic signal at a distance of 1 km from the shower axis in the median plane
computed from the average of 100 CORSIKA showers each depositing a total energy of 109
GeV in the water. The dotted, dashed and solid curves shows the signals computed from the
deposited energies within cores of radius 1.025 g cm�2, 2.05 g cm�2 and the whole shower
(solid curve), respectively. It can be seen that most of the amplitude of the signal comes from
the energy within a core of radius 2.05 g cm�2.
Solid CORSIKA, dash SAUND parameterisation
Depth 250 g/cm2
106 GeV protons 100 shower average.
Depth 450 g/cm2
Depth 650 g/cm2
Depth 850 g/cm2
Depth 1050 g/cm2
Radius(cm)
5 10 15 20 25 30 35 40
Solid CORSIKA, dash SAUND parameterisation
Depth 250 g/cm2
1011 GeV protons 100 shower average.
Depth 450 g/cm2
Depth 650 g/cm2
Depth 850 g/cm2
Depth 1050 g/cm2
Radius(cm)
5 10 15 20 25 30 35 40
Figure 9: The radial distributions of the deposited energy at different depths from CORSIKA
compared to the parameterisation used by the SAUND collaboration [39] for 106 GeV and 1011
GeV proton induced showers.
Solid CORSIKA, dash NB parameterisation
Depth 250 g/cm2
106 GeV protons 100 shower average.
Depth 450 g/cm2
Depth 650 g/cm2
Depth 850 g/cm2
Depth 1050 g/cm2
Radius(cm)
5 10 15 20 25 30 35 40
Solid CORSIKA, dash NB parameterisation
Depth 250 g/cm2
1011 GeV protons 100 shower average.
Depth 450 g/cm2
Depth 650 g/cm2
Depth 850 g/cm2
Depth 1050 g/cm2
Radius(cm)
5 10 15 20 25 30 35 40
Figure 10: The radial distributions of the deposited energy at different depths from CORSIKA
compared to the parameterisation used by the Niess and Bertin [15, 16] (labelled NB parame-
terisation) for 106 GeV and 1011 GeV proton induced showers.
CORSIKA
SAUND
Log10E/GeV
6 7 8 9 10 11 12
Figure 11: The depth of the shower peak as a function of log
E from CORSIKA (black points)
for the showers starting at z = 0. The solid curve shows the parameterisation according to
equation 6. The dashed (dash dotted) lines show the values assumed by the SAUND (Niess-
Bertin) Collaborations.
−100 −50 0 50 100
−0.05
time (µs)
The Acoustic Pulse from a 1011GeV Shower at 1km
ACoRNE
SAUND
0 5 10 15 20 25 30 35 40
Frequency (kHz)
Relative Power Spectrum of a Pulse from a 1011GeV Shower at 1km
ACoRNE
SAUND
Figure 12: The left hand plot shows acoustic pulses generated from the parameterisation de-
scribed in section 5.1 labelled ACoRNE, the parameterisation from reference [15, 16] labelled
NB and that from reference [39,40] labelled SAUND. These pulses were evaluated for a hadron
shower from a neutrino interaction depositing hadronic energy of 1011 GeV 1 km distant from
an acoustic detector in a plane perpendicular to the shower axis at the shower maximum (the
median plane). The right hand plot shows the frequency decomposition of the pulses in the left
hand plot.
−6 −4 −2 0 2 4 6
Angle (degrees)
Peak Pressure vs Angle from a 1011GeV Shower at 1km
ACoRNE
SAUND
0 0.5 1 1.5 2 2.5 3
−0.15
−0.05
Angle (degrees)
The Asymmetry of a Pulse from a 1011GeV Shower
ACoRNE
SAUND
Figure 13: The left hand plot shows the variation of the peak pressure in the pulse with angle
from the median plane at 1 km from the shower. The right hand plot shows the variation of
the pulse asymmetry with this angle at the same distance. The curves were computed from the
parameterisations labelled.
Distance (meters)
Peak Pressure vs Distance from a 1011GeV Shower
ACoRNE
SAUND
Distance (Meters)
The Asymmetry of a Pulse from a 1011GeV Shower
ACoRNE
SAUND
Figure 14: The left hand plot shows the decrease of the pulse peak pressure and the right hand
plot the pulse asymmetry, both in the median plane, as a function of distance from the shower
computed from the parameterisations.
0 5 10 15 20 25 30 35 40
Frequency (kHz)
Relative Power Spectrum of a Pulse from a 1011GeV Shower at 1km
0 5 10 15 20 25 30 35 40
Frequency (kHz)
Relative Power Spectrum of a Pulse from a 1011GeV Shower at 1km
Figure 15: The left hand plot shows the frequency decomposition of the acoustic signal, com-
puted from the parameterisation of the CORSIKA showers, at different angles to the median
plane at a distance of 1km from the shower and the right hand plot shows the cumulative fre-
quency spectrum i.e. the integral of the left hand plot.
0 5 10 15 20 25 30 35 40
Frequency (kHz)
Relative Power Spectrum of a Pulse from a 1011GeV Shower
1000m
2000m
0 5 10 15 20 25 30 35 40
Frequency (kHz)
Relative Power Spectrum of a Pulse from a 1011GeV Shower
1000m
2000m
Figure 16: The left hand plot shows the frequency decomposition of the acoustic signal, com-
puted from the parameterisation of the CORSIKA showers, at different distances from the hy-
drophone in the median plane and the right hand plot shows the cumulative frequency spectrum
i.e. the integral of the left hand plot.
	Introduction
	Adaptation of the CORSIKA program to a water medium
	Comparison with other simulations
	Comparison with Geant4
	Comparison with the simulation of Alvarez-Muñiz and Zas
	Simulation of neutrino induced showers
	Neutrino-nucleon interaction cross sections.
	A simple generator for neutrino interactions.
	The HERWIG neutrino generator.
	Parameterisation of showers
	Parameterisation of the CORSIKA Showers
	The parameterisation used by the SAUND Collaboration
	The parameterisation used by Niess and Bertin
	The acoustic signals from the showers.
	Conclusions
	Acknowledgments
ABSTRACT
  The CORSIKA program, usually used to simulate extensive cosmic ray air
showers, has been adapted to work in a water or ice medium. The adapted CORSIKA
code was used to simulate hadronic showers produced by neutrino interactions.
The simulated showers have been used to study the spatial distribution of the
deposited energy in the showers. This allows a more precise determination of
the acoustic signals produced by ultra high energy neutrinos than has been
possible previously. The properties of the acoustic signals generated by such
showers are described.

<|endoftext|><|startoftext|>
Introduction
It is claimed that our vacuum is one of the 10500 possible meta-stable vacua in the
string theory landscape [1]. If this is true, then the physical parameters labeling which
vacuum we are living in can not be calculated from the first principle. Theoretically,
these parameters may only be explained by some anthropic reasoning [2], or by pure
chance.
From the cosmological point of view, in the framework of eternal inflation [3], the
vast landscape of vacua is not only a logic possibility but also the reality. If we demand
that our observable universe is not too special in the multiverse, in principle, we can
make predictions in the multiverse by calculating the probability of the corresponding
universe history.
A serious problem arises at this point. A measure of the history space is essentially
needed in order to compare different histories of the universe. But in general relativity,
it is not straightforward to construct such a measure. It is because there is no preferred
space slicing and time notation in general relativity, and singularities commonly arise
in the cosmic solutions. Even in the much simplified Friedmann-Robertson-Walker
universe, the measure problem is not easy to solve. The construction of a measure of
the history space is considered as one of the central problems in cosmology. Attempts
for this problem can be found in [4, 5].
To analyze this problem in detail, let us construct the history space of the universe
and discuss the measure. In general relativity, all the trajectories in the phase space
should lie on the hypersurface H−1(0) due to the Hamiltonian constraint H = 0. Now
we want to consider the history space, where a trajectory is represented by a single
point. So we have to identify the points in H−1(0) which can be linked by the time
evolution. Then, the history space, or the multiverse, takes the form
M = H−1(0)/R. (1)
The next step is to construct a measure on the history space. To make sense and
to be natural in physics, a measure of the history space should satisfy three conditions
[4, 6]: (i) It should be positive. (ii) It should depend only on the intrinsic dynamics
and neither on any choice of time slicing nor on the choice of dependent variables.
(iii) It should respect all the symmetries of the space of solutions.
A measure of the history space satisfying these three requirements can be con-
structed from the phase space symplectic form [4, 6, 7]. The symplectic form of the
phase space ω can be written in terms of the canonical coordinates and momenta as
dpi ∧ dq
i. (2)
where m is the number of canonical coordinates.
If we choose pm = H, then from the Hamilton’s equations, q
m = t is the time
coordinate. And the symplectic form (2) can be written as
dpi ∧ dq
i + dH ∧ dt. (3)
The Hamiltonian constraint H = 0 naturally yields a two-form transverse to the
time evolution. This is the two-form in the history space,
ωC ≡ ω|H=0 =
dpi ∧ dq
i. (4)
The measure of the history space can be constructed by raising ωC to the (m−1)th
power,
(−1)(m−1)(m−2)/2
(m− 1)!
ωm−1C . (5)
Note that ΩM is an exact form. It can be globally written as ΩM ∼ dA with
A ≡ p1dq
dpi ∧ dq
i. (6)
This measure of the history space can be applied to the inflationary cosmology in
determining the probability of inflation. At first, it was believed that the canonical
measure favors inflation [6]. But soon it is realized that both inflationary and non-
inflationary history have infinite measure [8]. So the measure problem in cosmology
remained unsolved.
Recently, Gibbons and Turok [4] suggested a solution to this measure problem.
They noticed that a universe with a very small spacial curvature at the present time
can not be distinguished from a flat one. So physically, it makes sense to cut off the
history space by identifying a universe with a very small spacial curvature with a
flat universe. As was shown in [4], the measure for some quantities, like the spacial
curvature, is cutoff dependent, and dominated by the cutoff. While the measure
for some other quantities, for example, the e-folding number of inflation, is cutoff
independent. So by applying this cutoff, the question whether a N e-folds’ inflation is
natural can be well defined, and investigated explicitly. It is shown that the history
space volume for slow roll inflation is suppressed by a factor of exp(−3N), where N
is the e-folding number.
The work [4] concentrates on a single field minimal coupled inflation model. There
is a vast variety of inflation models in addition to a single inflaton model, thus it is
interesting to ask how other models weigh in this measure. Some of the inflation mod-
els involve a modified Lagrangian density other than the minimal one, some involve
multi-fields and some modify the Einstein gravity. We want to know whether these
inflation models are also suppressed for a large e-folding number. An investigation of
these models is the main task of this paper.
This paper is organized as follows. In Section 2, we review the approach by
Gibbons and Turok [4] for gravity minimally coupled to a scalar field. It is shown
that the inflation probability can be calculated directly as a function of N . In Section
3, we discuss the measure for the scalar field with a more general Lagrangian. We
find that in this generalized case, the measure for the slow roll inflationary history is
suppressed by exactly the same factor exp(−3N). In Section 4, we consider the multi-
field inflation. It can be shown that with the assumption of slow roll for the Hubble
constant, the measure is a lot more suppressed by the exponential factor exp(−3nN),
where n is the number of inflaton fields. So it seems much more unnatural for multi-
field inflation to happen. In Section 5, we investigate the generalized Lagrangian for
multi-field inflation. We find that the generalization of the Lagrangian can not solve
the measure problem raised in Section 4. Finally, we summarize the paper in the last
section.
2 Single Field Inflation Models
In this section, we consider a single scalar field minimally coupled with gravity with
the action
−3a(N−2ȧ2 − k) +
a3N−2ϕ̇2 − a3V (ϕ)
, (7)
where N is the lapse function, and k = 0,±1 represents the spacial curvature, and
dot denotes the derivative with respect to time. For simplicity, we have set M2p ≡
1/(8πG) = 1.
By varying the action with respect to the lapse function N , we obtain the Fried-
mann equation
3H2 =
ϕ̇2 + V (ϕ)−
, (8)
where after the variation, we have set N = 1. Varying the action with respect to ϕ
leads to the scalar field equation of motion
ϕ̈+ 3Hϕ̇+ Vϕ = 0, (9)
where the subscript ϕ in V denotes derivative with respect to ϕ.
From the time derivative of (8) and using (9), we get
Ḣ = −
ϕ̇2 +
. (10)
To construct the history space, we need to slice the H = 0 hypersurface of the
phase space. A good way to do this is to choose a constant H surface H = HS as
a slicing [4], where HS is chosen low enough that it just above the end of inflation
and the universe evolves adiabatically from then on. To choose a constant H slice is
because for the flat or open universe, and non-negative potential V (ϕ), each history
trajectory crosses a constant H surface exactly once. And the reason for choosing
H low enough is that only this choice can result in a cutoff independent measure of
e-folds, and this choice is in agreement with the anthropic “top down” approach to
cosmology [9].
On a constant H surface, the measure for the history space takes the form
dpϕ ∧ dϕ, (11)
where pϕ ≡ a
3ϕ̇ denotes the canonical momentum for ϕ. It can be calculated that
dϕda 3a2
6H2S − 2V + 4ka
6H2S − 2V + 6ka
. (12)
A divergence occurs in the large a limit of (12). This is the infinity discovered in
[8]. Following [4], we set a cutoff for the spacial curvature to critical density ratio
Ωk ≡ −k/(a
2H2S) as
|Ωk| ≥ ∆Ωk, (13)
The cutoff makes sense physically because a small enough Ωk is neither geometrically
meaningful nor physically observable. As we are working on a constant HS surface,
the cutoff can be translated into the cutoff of the scale factor
a2 ≤ a2max ≡
. (14)
Recall that ΩM = dA, The measure can be reduced to a surface integral around a
constant a surface of the constant HS history space,
pϕdϕ = a
ϕ̇dϕ. (15)
To investigate the probability distribution for inflation, now concentrate on an
history space volume element A ∼ ϕ̇∆ϕ. Where we have dropped the a3max term as
it is a constant. Since the variation operation ∆ is taken on a constant H surface, it
is convenient to convert the time derivative ∂t to the derivative with respect to the
Hubble constant ∂H , using
∂t = Ḣ∂H = −
ϕ̇2∂H . (16)
Then we can take the advantage that ∂H ·∆ = ∆ · ∂H .
Note thatH do not change when we move on the history space. Then the equation
(8) leads to a constraint for the history space variation
ϕ̇∆ϕ̇ + Vϕ∆ϕ = 0, (17)
Given this constraint, the Hubble evolution for A can be calculated as
∂HA = −
A. (18)
where we have neglected the spatial curvature energy density, because it have to be
small during the last e-folds of inflation.
Note that for the e-folding number N , −H = ∂tN = Ḣ∂HN , the equation (18)
takes the form
∂HA = 3A∂HN, (19)
which can be integrated out to give
A = e3NA(HS). (20)
The above equation tells us that as we stand at the end of inflation and track back-
wards with time, a volume in the history space expands exponentially. In order not
to break the slow roll condition along the whole 60 e-folds’ inflation, The volume el-
ement A must lie in a exponentially narrow corner in the constant HS history space.
So the probability for inflation is suppressed by the exp(−3N) factor. This suppres-
sion shows that inflation is not as natural as we intuitively think. It may have not
solved the naturalness problems of the hot big bang cosmology because of its unnat-
ural nature, or there remains some unknown mechanism to produce a exponentially
sharp peak for the possibility distribution of the history space.
3 Generalized Single Field Models
In this section, we consider the action
−3aN−2ȧ2 + a3f
ϕ,N−1ϕ̇
. (21)
A good many inflation models can be described using this action. For example,
K-Inflation [10], Phantom Inflation [11], Inflation driven by the brane DBI action
[12, 13], etc.
Choosing ϕ as a canonical coordinate and using the proper time, the canonical
momentum for ϕ takes the form
p = πa3, where π ≡ fϕ̇. (22)
Take variation with respect to N and ϕ, one obtains
3H2 = ϕ̇π − f, π̇ + 3Hπ − fϕ = 0, Ḣ = −
ϕ̇π. (23)
Using (23), the constraint for the variation in the history space can be written as
ϕ̇∆π = fϕ∆ϕ. (24)
And using the definition of π, we have the variation relation
∆ϕ̇ = f−1ϕ̇ϕ̇ (∆π − fϕ̇ϕ∆ϕ). (25)
where we have assumed that fϕ̇ϕ̇ 6= 0, in order that ϕ can be treated as a dynamical
degree of freedom.
Now we take the cutoff as discussed in the last section, and reduce the integration
of the history space to the boundary integration
pdϕ = a3max
πdϕ. (26)
Then it can be calculated that the variation of the volume element in the history
space A ∼ π∆ϕ evolves along the constant H surfaces as
∂HA = −
A = 3A∂HN (27)
So the conclusion is exactly the same as that of the last section. In order to get
N e-folds’ slow roll inflation, the volume element in the history space should be
exponentially fine turned.
It should be noticed that in this general case, there is the possibility that even
the history evolution is not slow rolling, accelerated expansion with a large e-folding
number can be achieved in models such as the Kflation or the phantom inflation. But
it is difficult to get a scale invariant perturbation spectrum if the slow roll condition
is not satisfied [11].
4 Multi-Field Inflation Models
Multi-field inflation models take an important part in the inflationary model build-
ing. In string theory, there can be a number of scalar fields at the inflation scale.
Phenomenally, in multi-field models, slow roll condition is less stringent and can be
satisfied in more models [14]. Moreover, there are interesting inflation models, like
the hybrid inflation model [15], which requires essentially more than one field. So it is
useful to study the measure for multi-field inflation and investigate the corresponding
probability.
The action for the multi-field inflation takes the form
−3aN−2ȧ2 +
a3N−2ϕ̇i
2 − a3V (ϕi)
, (28)
where the duplicate index i is summed over the n scalar fields.
Choosing to use the proper time, the canonical momentum for ϕi is pi ≡ a
3ϕ̇i.
And the equations of motion takes the form
3H2 =
ϕ̇2i + V, Ḣ =
ϕ̇2i , ϕ̈i + 3Hϕ̇i + Vϕi = 0. (29)
The constraint for constant HS variation is
ϕ̇i∆ϕ̇i + Vϕi∆ϕi = 0. (30)
It can be checked by direct calculation that ∂H ·∆ = ∆ · ∂H is also true operating on
ϕ̇i. In this multi-field case
A ∼ ϕ̇1 ∆ϕ1 ∧∆ϕ̇2 ∧∆ϕ2 ∧ . . . ∧∆ϕ̇n ∧∆ϕn. (31)
Using the constraint (30), each term in ∂HA is proportional to A, and ∂HA turns
out to be
∂HA = 3n
A∂HN. (32)
In a multi-field inflation model, Ḧ/(HḢ) should also be small and rolling slowly as
in the single field case. If one assumes that ϕ̈1/(Hϕ̇1) is also small and slow rolling,
then the integration can be carried out as
A = e3nNA(HS), (33)
which shows that the departure from slow-roll evolves much faster than that in the
single field case. As a result, multi-field inflation is much more unnatural then the
single-field inflation with a much smaller measure in the history space. This result
is not surprising. It is because from the first equation in (29), the Hubble constant
has contribution from the energy density of all inflation fields. While from the third
equation in (29), the Hubble constant appears as a friction in the evolution of each
single inflaton field. So in the multi-field inflation case, the friction of each single field
is contributed by all the fields, and the history space for slow roll inflation is much
more concentrated then the single field models.
As analyzed in [16], |ϕ̈1/(Hϕ̇1)| ≪ 1 may break down in some multi-field inflation
models. Now let’s see whether a fast rolling ϕ̇1 can result in something more natural.
If we want ϕ̈1/(Hϕ̇1) to cancel the exponential expansion of the history space
volume, we need
H, (34)
which amounts to demanding that
∣ϕ̇1/a
∣ increases with time. As n ≥ 2, |ϕ̇1| must
be increasing faster than a3 to make this cancellation possible. And this cancellation
need to be valid along the whole 60 e-folds of inflation. It seems impossible for ϕ̇1 to
behave like this. So even a fast rolling ϕ̇1 can not make the situation more natural.
A few words are in order here. We have picked a specific field ϕ1 out of many other
fields in studying the measure, this is just the result of integrating out pϕ1, namely,
we have allowed ϕ̇1 to vary as much as possible. We could have picked out another
field, then we would be discussing the differential measure in a different region on the
history space.
Now we see that the multi-field inflation is even more impossible than the single
field inflation. Then, if for anthropic principle or some other reasons that a 60 e-folds’
inflation has to have happened in our history, it should be single field inflation rather
than multi-field inflation, because the latter has much smaller measure.
5 Generalized Multi-Field Models
In this section, we do the generalizations one step further to consider the action
−3aN−2ȧ2 + a3f
−1ϕ̇i
, (35)
which has the features of the actions in both Section 3 and Section 4.
Using the proper time, the canonical momentum for ϕ takes the form
pi = πia
3, where πi ≡ fϕ̇i, (36)
with the equations of motion
3H2 = ϕ̇iπi − f, π̇i + 3Hπi − fϕi = 0, Ḣ = −
ϕ̇iπi. (37)
and the constraint for the history space variation
ϕ̇i∆πi = fϕi∆ϕi. (38)
We assume that the matrix fϕ̇iϕ̇j has inverse matrix. This should be true when all the
constraints in (35) are solved and ϕi only denotes the dynamical degree of freedom.
We use f ϕ̇iϕ̇j as the inverse matrix of fϕ̇iϕ̇j Then it can be shown that
∆ϕ̇i = f
ϕ̇iϕ̇j (∆πj − fϕ̇jϕk∆ϕk). (39)
In this generalized case,
A ∼ π1 ∆ϕ1 ∧∆π2 ∧∆ϕ2 ∧ . . . ∧∆πn ∧∆ϕn. (40)
using the same technique developed in Section 3 and Section 4, one finds
∂HA =
f ϕ̇1ϕ̇iπ̇i
ϕ̇1Ḣ
ϕ̇iπ̇i
ϕ̇iϕ̇k π̇k
f ϕ̇1ϕ̇jfϕ̇jϕk ϕ̇k
Ḣϕ̇1
ϕ̇iϕ̇jfϕ̇jϕkϕ̇k
A (41)
To see the implications of this equation, let us concentrate on the double field in-
flation models. It is because it seems more difficult to cancel the −3nH
term for
lager n. Lagrangian densities like f = g(ϕ1)ϕ̇
1 + h(ϕ2)ϕ̇
2 are not of special interest
here, because they can be transformed into the case discussed in Section 4 by a field
redefinition. As another example, let us consider the Lagrangian density
f = f
ϕ1, ϕ2,
(ϕ21 + ϕ
, (42)
in this case, the equation (41) takes the form
∂HA = −6A
∂HN, (43)
where
f ′ ≡
(ϕ̇21 + ϕ̇
] . (44)
So a fast rolling f ′ϕ̇21 is required to cancel the exponential expansion of the volume
of the history space.
To see the physical implications for this condition, consider the DBI inflation
model by [13]. The action of the DBI inflation model is given by
SDBI = −
1− f̃ ϕ̇2 − 1
+ V (ϕ)
where the angular motion of ϕi has been ignored, so ϕ̇
2 ≡ ϕ̇2i . Then f
′ = 1/
1− f̃ ϕ̇2 ≡
γ is just the relativistic factor defined in [13]. From the spectral index
ns − 1 = 4
, (46)
We conclude that γ should not be a fast rolling quantity along the whole history of
observable inflation. Moreover, from the equation Ḣ = −γϕ̇2/2, we see again that
γ can not be large for a long time during inflation. So the cancellation of the e−3nN
factor can not be obtained.
6 Conclusion and Discussions
In this paper, we have reviewed the measure problem in cosmology. We calculated the
measure and the probability for inflation in single and multi-field models with gener-
alized Lagrangian density. It is shown that the measure for the single field inflation
and the corresponding generalizations are suppressed by a factor of exp(−3N). While
the n-field and generalized multi-field inflation models has a measure proportional to
exp(−3nN).
This work can be understood in another way. Taking apart the discussion for the
measure and the slow roll condition, other parts of this paper can be thought of as a
proof of the attractor behavior of various kinds of generalized inflation models. On
the one hand, it is a proof that the attractor behavior is very common in inflationary
models. While on the other hand, to take the measure into consideration, we see that
it is far from obvious for an attractor to be a natural solution in cosmology. And it
is just this early time attractor combined with the requirement of slow roll that puts
inflation into a highly unnatural situation.
We did not study explicitly the inflation models with non-minimal coupling to
gravity [17]. But these models do not seem to bring large correction for the sup-
pression factor. It is because through conformal transformation, these non-minimal
coupled models are generally equivalent with the corresponding minimal coupled in-
flation models with the same number or one more inflation fields. Another reason
not to consider these models in this work is that, as the energy scale commonly drops
during inflation, near the end of inflation, the non-minimal coupling effect may not
be so important.
There are also inflation models with extra components or special spacetime prop-
erties. Examples of this kind are inflation with holographic dark energy [18, 19] or
in the non-commutative spacetime [20, 21, 22]. These models do not seem to change
the results much either. Because in the former example, the holographic dark energy
is diluted during inflation so do not seem to cause large corrections near the end
of inflation. In the latter case, although the spectrum for perturbations is greatly
modified in the non-commutative spacetime, the isotropic and homogeneous inflating
background do not change much because it belongs to a lower energy scale. So the
corrections to the probability can not be large.
As a closing remark, we noticed that some non-inflationary models do not share
the small measure problem. One example is the cyclic universe model [23]. Although
the cyclic model is controlled by gravity coupled with a scalar field, it do not have
slow roll behavior backwards in time in the cycle we live. So the key observation that
the exponentially expansion of the phase space volume breaks the slow roll condition
do not apply in the cyclic model. Nevertheless, the number of cycles in the cyclic
universe must be finite [23], so it remains to explain how all the cycles begin in the
first place.
Acknowledgments
This work was supported by grants of NSFC. We thank Bin Chen, Yi-Fu Cai, Chao-
Jun Feng, and Yushu Song for discussion.
References
[1] R. Bousso, J. Polchinski, JHEP 0006:006,2000, hep-th/0004134. S. Kachru,
R. Kallosh, A. Linde and S. Trivedi, Phys. Rev. D68 (2003) 046005, hep-
th/0301240. L. Susskind, hep-th/0302219. M. R. Douglas, JHEP 0305, 046
(2003). hep-th/0303194.
[2] Steven Weinberg, Rev. Mod. Phys. 61:1-23 (1989).
[3] Steinhardt, P J 1983, Proceedings of the Nuffield Workshop, Cambridge, 21 June
- 9 July, 1982, eds: Gibbons, G W, Hawking, S W and Siklos, S T C (Cambridge:
Cambridge University Press), pp. 251-66. Vilenkin, A, Phys. Rev. D27, 2848-55
(1983). A. D. Linde, Phys. Lett. B175, 395 (1986).
[4] G. W. Gibbons and Neil Turok, hep-th/0609095.
[5] Jaume Garriga, Delia Schwartz-Perlov, Alexander Vilenkin, Sergei Winitzki,
JCAP 0601 (2006) 017, hep-th/0509184. Richard Easther, Eugene A. Lim,
Matthew R. Martin, JCAP 0603 (2006) 016, astro-ph/0511233. Vitaly
Vanchurin, Alexander Vilenkin, Phys. Rev. D74 (2006) 043520, hep-th/0605015.
R. Bousso, Phys. Rev. Lett. 97, 191302 (2006), hep-th/0605263.
[6] G. W. Gibbons, S. W. Hawking and J. M. Stewart, Nucl. Phys. B 281, 736
(1986).
[7] M. Henneaux, Nuovo Cim Lett. 38, 609 (1983).
[8] S. W. Hawking and D. N. Page, Nucl. Phys. B298, 789 (1988).
[9] S. W. Hawking, Thomas Hertog, Phys. Rev. D73 (2006) 123527,
hep-th/0602091.
[10] C. Armendariz-Picon, T. Damour, V. Mukhanov, Phys.Lett. B458 (1999) 209-
[11] Yun-Song Piao and Yuan-Zhong Zhang, Phy. Rev. D 70, 063513 (2004),
astro-ph/0401231. P. F. Gonzalez-Diaz and J. A. Jimenez-madrid, Phys. Lett.
B596, 16 (2004).
http://arxiv.org/abs/hep-th/0004134
http://arxiv.org/abs/hep-th/0302219
http://arxiv.org/abs/hep-th/0303194
http://arxiv.org/abs/hep-th/0609095
http://arxiv.org/abs/hep-th/0509184
http://arxiv.org/abs/astro-ph/0511233
http://arxiv.org/abs/hep-th/0605015
http://arxiv.org/abs/hep-th/0605263
http://arxiv.org/abs/hep-th/0602091
http://arxiv.org/abs/astro-ph/0401231
[12] Eva Silverstein and David Tong, Phys. Rev. D70:103505,2004, hep-th/0310221.
Mohsen Alishahiha, Eva Silverstein, David Tong, Phys. Rev. D70:123505,2004,
hep-th/0404084.
[13] Gary Shiu and Bret Underwood, Phys. Rev. Lett. 98:051301,2007,
hep-th/0610151.
[14] Yun-Song Piao, Rong-Gen Cai, Xin-min Zhang, Yuan-Zhong Zhang, Phys. Rev.
D66:121301,2002. hep-ph/0207143.
[15] A. Linde, Phys. Rev. D49:748-754,1994, astro-ph/9307002.
[16] Robert Brandenberger, Pei-Ming Ho and Hsien-chung Kao, JCAP 0411 (2004)
011, hep-th/0312288.
[17] A. Starobinsky, Phys. Lett. B91 (1980) 99; A. Starobinsky, Sov. Astron. Lett.
9 (1983) 302. Viatcheslav F. Mukhanov, H. A. Feldman and Robert H. Bran-
denberger, Phys. Rept. 215,(1992) 203 Jai-chan Hwang and Hyerim Noh, Phys.
Rev. D71, (2005) 063536, gr-qc/0412126. Miao Li, astro-ph/0607525, JCAP
0610 (2006) 003. Bin Chen, Miao Li, Tower Wang, Yi Wang, astro-ph/0610514.
[18] M. Li, Phys.Lett. B603 (2004) 1, hep-th/0403127; Q.-G. Huang and M, Li, JCAP
0408 (2004) 013, astro-ph/0404229. Qing-Guo Huang, Miao Li, hep-th/0410095,
JCAP 0503 (2005) 001. Bin Chen, Miao Li, Yi Wang, astro-ph/0611623.
[19] Qing-Guo Huang, Yungui Gong, astro-ph/0403590, JCAP 0408 (2004) 006.
Hsien-Chung Kao, Wo-Lung Lee, Feng-Li Lin, astro-ph/0501487, Phys.Rev. D71
(2005) 123518. Xin Zhang, astro-ph/0504586, Int. J. Mod. Phys. D14 (2005)
1597-1606; Xin Zhang, Feng-QuanWu, astro-ph/0506310, Phys. Rev. D72 (2005)
043524; Zhe Chang, Feng-Quan Wu, Xin Zhang, astro-ph/0509531, Phys. Lett.
B633 (2006) 14-18; Xin Zhang, astro-ph/0609699, Phys. Rev. D 74 (2006) 103505.
Xin Zhang, Feng-Quan Wu, astro-ph/0701405.
[20] Robert Brandenberger, Pei-Ming Ho, Phys. Rev. D66:023517,2002, AAPPS
Bull.12N1:10-20,2002, hep-th/0203119. Q.-G. Huang and M. Li, JHEP
http://arxiv.org/abs/hep-th/0310221
http://arxiv.org/abs/hep-th/0404084
http://arxiv.org/abs/hep-th/0610151
http://arxiv.org/abs/hep-ph/0207143
http://arxiv.org/abs/astro-ph/9307002
http://arxiv.org/abs/hep-th/0312288
http://arxiv.org/abs/gr-qc/0412126
http://arxiv.org/abs/astro-ph/0607525
http://arxiv.org/abs/astro-ph/0610514
http://arxiv.org/abs/hep-th/0403127
http://arxiv.org/abs/astro-ph/0404229
http://arxiv.org/abs/hep-th/0410095
http://arxiv.org/abs/astro-ph/0611623
http://arxiv.org/abs/astro-ph/0403590
http://arxiv.org/abs/astro-ph/0501487
http://arxiv.org/abs/astro-ph/0504586
http://arxiv.org/abs/astro-ph/0506310
http://arxiv.org/abs/astro-ph/0509531
http://arxiv.org/abs/astro-ph/0609699
http://arxiv.org/abs/astro-ph/0701405
http://arxiv.org/abs/hep-th/0203119
0306(2003)014, hep-th/0304203; S. Tsujikawa, R. Maartens and R. Bran-
denberger, astro-ph/0308169. Q.-G. Huang and M. Li, JCAP 0311(2003)001,
astro-ph/0308458; Nucl. Phys. B713(2005)219, astro-ph/0311378;
[21] Q.-G. Huang and M, Li, astro-ph/0603782; X. Zhang and F.-Q, Wu,
astro-ph/0604195, Phys. Lett. B638 (2006) 396; Qing-Guo Huang, Phys. Rev.
D74:063513,2006, astro-ph/0605442. X. Zhang, hep-th/0608207; Q.-G. Huang,
astro-ph/0610389, JCAP 0611:004,2006.
[22] Seoktae Koh, Robert H. Brandenberger, hep-th/0702217. Robert Brandenberger,
hep-th/0703173.
[23] J. Khoury, B.A. Ovrut, P.J. Steinhardt and N. Turok, Phys. Rev. D64 (2001)
123522; P.J. Steinhardt and N. Turok, Science 296, (2002) 1436; Phys. Rev. D65
126003 (2002); P.J. Steinhardt and N. Turok, astro-ph/0404480, New Astron.
Rev. 49 (2005) 43-57.
http://arxiv.org/abs/hep-th/0304203
http://arxiv.org/abs/astro-ph/0308169
http://arxiv.org/abs/astro-ph/0308458
http://arxiv.org/abs/astro-ph/0311378
http://arxiv.org/abs/astro-ph/0603782
http://arxiv.org/abs/astro-ph/0604195
http://arxiv.org/abs/astro-ph/0605442
http://arxiv.org/abs/hep-th/0608207
http://arxiv.org/abs/astro-ph/0610389
http://arxiv.org/abs/hep-th/0702217
http://arxiv.org/abs/hep-th/0703173
http://arxiv.org/abs/astro-ph/0404480
	Introduction
	Single Field Inflation Models
	Generalized Single Field Models
	Multi-Field Inflation Models
	Generalized Multi-Field Models
	Conclusion and Discussions
ABSTRACT
  We investigate the measure problem in the framework of inflationary
cosmology. The measure of the history space is constructed and applied to
inflation models. Using this measure, it is shown that the probability for the
generalized single field slow roll inflation to last for $N$ e-folds is
suppressed by a factor $\exp(-3N)$, and the probability for the generalized
$n$-field slow roll inflation is suppressed by a much larger factor
$\exp(-3nN)$. Some non-inflationary models such as the cyclic model do not
suffer from this difficulty.

<|endoftext|><|startoftext|>
ZJOU-PHY-TH-07-01
NJNU-TH-07-05
Studies of B0s → η(′)η(′) decays in the pQCD approach
Xin Liua∗, Zhen-Jun Xiaob†, Hui-Sheng Wangc
a. Department of Physics, Zhejiang Ocean University,
Zhoushan, Zhejiang 316000, P.R. China
b. Department of Physics and Institute of Theoretical Physics,
Nanjing Normal University, Nanjing, Jiangsu 210097, P.R. China and
c. Department of Applied Mathematics and Physics,
Anhui University of Technology and Science,
Wuhu, Anhui 241000, P.R. China
(Dated: November 18, 2018)
Abstract
We calculate the CP averaged branching ratios and CP-violating asymmetries for B0s → ηη, ηη′
and η′η′ decays in the perturbative QCD (pQCD) approach here. The pQCD predictions for
the CP-averaged branching ratios are Br(B0s → ηη) =
14.2+18.0−7.5
× 10−6, Br(B0s → ηη′) =
12.4+18.2−7.0
×10−6, and Br(B0s → η′η′) =
9.2+15.3−4.9
×10−6, which agree well with those obtained
by employing the QCD factorization approach and also be consistent with available experimental
upper limits. The gluonic contributions are small in size: less than 7% for Bs → ηη and ηη′
decays, and around 18% for Bs → η′η′ decay. The CP-violating asymmetries for three decays
are very small: less than 3% in magnitude.
PACS numbers: 13.25.Hw, 12.38.Bx, 14.40.Nd
∗ liuxin@zjou.edu.cn
† xiaozhenjun@njnu.edu.cn
http://arxiv.org/abs/0704.1027v1
Among various B → M1M2 decay channels ( here Mi refers to the light pseudo-scalar
or vector mesons ), the decays involving the isosinglet η or η′ mesons in the final state are
phenomenologically very interesting and have been studied extensively during the past
decade because of the so-called Kη′ puzzle or other special features [1, 2, 3, 4].
Motivated by the large number of Bs production and decay events expected at the
forthcoming LHC experiments, the studies about the Bs meson decays become more
attractive than ever before. Very recently, some two-body Bs → Miη(′) decays, such as
Bs → (π, ρ, ω, φ)η(′) decays have been studied in Refs. [5, 6] in the perturbative QCD
(pQCD ) factorization approach [7, 8, 9]. In this paper, we would like to calculate the
branching ratios and CP asymmetries for the three B0s → ηη, ηη′ and η′η′ decays by
employing the low energy effective Hamiltonian [10] and the pQCD approach. Besides
the usual factorizable contributions, we here are able to evaluate the non-factorizable and
the annihilation contributions to these decays.
On the experimental side, only the poor upper limit on Br(B0s → ηη) is available now
[11] (upper limits at 90% C.L.):
Br(B0s → ηη) < 1.5× 10−3 , (1)
Of course, this situation will be improved rapidly when LHC experiment starts to run at
the end of 2007.
This paper is organized as follows. In Sec. I, we calculate analytically the related
Feynman diagrams and present the various decay amplitudes for the studied decay modes.
In Sec. II, we show the numerical results for the branching ratios and CP asymmetries
of B0s → η(′)η(′) decays. A short summary and some discussions are also included in this
section.
I. PERTURBATIVE CALCULATIONS
Since the b quark is rather heavy we consider the Bs meson at rest for simplicity. It
is convenient to use light-cone coordinate (p+, p−,pT ) to describe the meson’s momenta:
p± = (p0± p3)/
2 and pT = (p
1, p2). Using the light-cone coordinates the Bs meson and
the two final state meson momenta can be written as
(1, 1, 0T ), P2 =
(1, 0, 0T ), P3 =
(0, 1, 0T ), (2)
respectively, here the light meson masses have been neglected. Putting the light (anti-)
quark momenta in Bs, η
′ and η mesons as k1, k2, and k3, respectively, we can choose
k1 = (x1P
1 , 0,k1T ), k2 = (x2P
2 , 0,k2T ), k3 = (0, x3P
3 ,k3T ). (3)
Then, after the integration over k−1 , k
2 , and k
3 , the decay amplitude for Bs → ηη′ decay,
for example, can be conceptually written as
A(Bs → ηη′) ∼
dx1dx2dx3b1db1b2db2b3db3
C(t)ΦBs(x1, b1)Φη′(x2, b2)Φη(x3, b3)H(xi, bi, t)St(xi) e
−S(t)
, (4)
where ki are momenta of light quarks included in each meson, term Tr denotes the trace
over Dirac and color indices, C(t) is the Wilson coefficient evaluated at scale t, the function
H(k1, k2, k3, t) is the hard part and can be calculated perturbatively, the function ΦM is
the wave function, the function St(xi) describes the threshold resummation [12] which
smears the end-point singularities on xi, and the last term, e
−S(t), is the Sudakov form
factor which suppresses the soft dynamics effectively. We will calculate analytically the
function H(xi, bi, t) for the considered decays in the first order in αs expansion and give
the convoluted amplitudes in next section.
For the two-body charmless Bs meson decays, the related weak effective Hamiltonian
Heff can be written as [10]
Heff =
uq (C1(µ)O
1 (µ) + C2(µ)O
2 (µ))− VtbV ∗tq
Ci(µ)Oi(µ)
, (5)
where Ci(µ) are Wilson coefficients at the renormalization scale µ and Oi are the four-
fermion operators for the case of b → q (q = d, s) transition [5, 10]. For the Wilson
coefficients Ci(µ) (i = 1, . . . , 10), we will use the leading order (LO) expressions, although
the next-to-leading order (NLO) results already exist in the literature [10]. This is the
consistent way to cancel the explicit µ dependence in the theoretical formulae. For the
renormalization group evolution of the Wilson coefficients from higher scale to lower scale,
we use the formulae as given in Ref.[13] directly.
A. Decay amplitudes
We firstly take Bs → ηη′ decay mode as an example, and then extend our study to
Bs → ηη and η′η′ decays. Similar to the B0s → π0η(′) decays in [5], there are 8 type
diagrams contributing to the Bs → ηη
decays, as illustrated in Fig.1. We first calculate
the usual factorizable diagrams (a) and (b). Operators O1,2,3,4,9,10 are (V − A)(V − A)
currents, the sum of their amplitudes is given as
Feη = 8πCFm
dx1dx3
b1db1b3db3 φBs(x1, b1)
(1 + x3)φ
η (x3, b3) + (1− 2x3)rη(φPη (x3, b3) + φTη (x3, b3))
·αs(t1e) he(x1, x3, b1, b3) exp[−Sab(t1e)]
+2rηφ
η (x3, b3)αs(t
e)he(x3, x1, b3, b1) exp[−Sab(t2e)]
. (6)
where rη = m
0/mB; CF = 4/3 is a color factor. The explicit expressions of the function
he, the scales t
e and the Sudakov factors Sab can be found Ref. [5]. The form factors of
Bs to η decay, F
Bs→ηss̄
0,1 (0), can thus be extracted from the expression in Eq. (6).
The operators O5,6,7,8 have a structure of (V − A)(V + A). Some of these operators
can contribute to the decay amplitude in a factorizable way, but others may contribute
after making a Fierz transformation in order to get right flavor and color structure for
FIG. 1: Typical Feynman diagrams contributing to the Bs → ηη′ decays, where diagram (a)
and (b) contribute to the Bs → η form factor FBs→η0,1 .
factorization to work. Such kinds of contributions can be written as
F P1eη = −Feη . (7)
F P2eη = 16πCFm
(f sη′ − fuη′)m2η′
2msmBs
dx1dx3
b1db1b3db3 φBs(x1, b1)
φAη (x3, b3) + rη((2 + x3)φ
η (x3, b3)− x3φTη (x3, b3))
·αs(t1e)he(x1, x3, b1, b3) exp[−Sab(t1e)]
η (x3, b3)− 2(x1 − 1)rηφPη (x3, b3)
·αs(t2e)he(x3, x1, b3, b1) exp[−Sab(t2e)]
. (8)
For the non-factorizable diagrams 1(c) and 1(d), the corresponding decay amplitudes
can be written as
Meη =
dx1dx2 dx3
b1db1b2db2 φBs(x1, b1)φ
η′(x2, b2)
2x3rηφ
η (x3, b1)− x3φη(x3, b1)
·αs(tf)hf (x1, x2, x3, b1, b2) exp[−Scd(tf )]} , (9)
MP1eη = 0, (10)
MP2eη = −Meη . (11)
For the non-factorizable annihilation diagrams 1(e) and 1(f), we find
Maη =
dx1dx2 dx3
b1db1b2db2 φBs(x1, b1)
η (x3, b2)φ
η′(x2, b2)
+rηrη′
(x2 + x3 + 2)φ
η′(x2, b2) + (x2 − x3)φTη′(x2, b2)
φPη (x3, b2)
+rηrη′
(x2 − x3)φPη′(x3, b2) + (x2 + x3 − 2)φTη′(x2, b2)
φTη (x3, b2)
·αs(t3f)h3f (x1, x2, x3, b1, b2) exp[−Sef (t3f )]
η (x3, b2)φ
η′(x2, b2)
+rηrη′
(x2 + x3)φ
η′(x2, b2) + (x3 − x2)φTη′(x2, b2)
φPη (x3, b2)
+rηrη′
(x3 − x2)φPη′(x2, b2) + (x2 + x3)φTη′(x2, b2)
φTη (x3, b2)
·αs(t4f)h4f (x1, x2, x3, b1, b2) exp[−Sef (t4f )]
, (12)
MP1aη =
dx1dx2 dx3
b1db1b2db2 φBs(x1, b1)
(x3 − 2)rηφAη′(x2, b2)(φPη (x3, b2) + φTη (x3, b2))− (x2 − 2)rη′φAη (x3, b2)
(φPη′(x2, b2) + φ
η′(x2, b2))
· αs(t3f)h3f (x1, x2, x3, b1, b2) exp[−Sef(t3f )]
x3rηφ
η′(x2, b2)(φ
η (x3, b2) + φ
η (x3, b2))
−x2rη′φAη (x3, b2)(φPη′(x2, b2) + φTη′(x2, b2))
·αs(t4f )h4f(x1, x2, x3, b1, b2) exp[−Sef (t4f)]
, (13)
MP2aη =
dx1dx2 dx3
b1db1b2db2 φBs(x1, b1)
η (x3, b2)φ
η′(x2, b2)
+rηrη′
(x2 + x3 + 2)φ
η′(x2, b2) + (x3 − x2)φTη′(x2, b2)
φPη (x3, b2)
+rηrη′
(x3 − x2)φPη′(x3, b2) + (x2 + x3 − 2)φTη′(x2, b2)
φTη (x3, b2)
·αs(t3f )h3f(x1, x2, x3, b1, b2) exp[−Sef (t3f)]
η (x3, b2)φ
η′(x2, b2)
+rηrη′
(x2 + x3)φ
η′(x2, b2) + (x2 − x3)φTη′(x2, b2)
φPη (x3, b2)
+rηrη′
(x2 − x3)φPη′(x2, b2) + (x2 + x3)φTη′(x2, b2)
φTη (x3, b2)
·αs(t4f )h4f(x1, x2, x3, b1, b2) exp[−Sef (t4f)]
. (14)
For the factorizable annihilation diagrams 1(g) and 1(h), we have
Faη = F
aη = −8πCFm4Bs
dx2 dx3
b2db2b3db3
η (x3, b3)φ
η′(x2, b2)
+2rηrη′((x3 + 1)φ
η (x3, b3) + (x3 − 1)φTη (x3, b3))φPη′(x2, b2)
·αs(t3e)ha(x2, x3, b2, b3) exp[−Sgh(t3e)]
η (x3, b3)φ
η′(x2, b2)
+2rηrη′((x2 + 1)φ
η′(x2, b2) + (x2 − 1)φTη′(x2, b2))φPη (x3, b3)
·αs(t4e)ha(x3, x2, b3, b2) exp[−Sgh(t4e)]
F P2aη = −16πCFm4Bs
dx2 dx3
b2db2b3db3
x3rη(φ
η (x3, b3)− φTη (x3, b3))φAη′(x2, b2) + 2rη′φAη (x3, b3)φPη′(x2, b2)
·αs(t3e)ha(x2, x3, b2, b3) exp[−Sgh(t3e)]
x2rη′(φ
η′(x2, b2)− φTη′(x2, b2))φAη (x3, b3) + 2rηφAη′(x2, b2)φPη (x3, b3)
·αs(t4e)ha(x3, x2, b3, b2) exp[−Sgh(t4e)]
. (16)
For the Bs → ηη′ decay, besides the Feynman diagrams as shown in Fig. 1 where
the upper emitted meson is the η′, the Feynman diagrams obtained by exchanging the
position of η and η′ also contribute to this decay mode. The corresponding expressions
of amplitudes for new diagrams will be similar with those as given in Eqs.(6-14), since
the η and η′ are all light pseudoscalar mesons and have the similar wave functions. The
expressions of amplitudes for new diagrams can be obtained by the replacements
φAη ←→ φη′ , φPη ←→ φPη′ , φTη ←→ φtη′ , rη ←→ rη′ . (17)
For example, we find that:
Feη′ = Feη, Faη′ = −Faη, F P1aη′ = −F P1aη , F P2aη′ = F P2aη . (18)
Before we write down the complete decay amplitude for the studied decay modes, we
firstly give a brief discussion about the η − η′ mixing and the gluonic component of the
η′ meson. There exist two popular mixing basis for η − η′ system, the octet-singlet and
the quark flavor basis, in literature. Here we use the SU(3)F octet-singlet basis with the
two mixing angle (θ1, θ8) scheme [14] to describe the mixing of η and η
′ mesons. In the
numerical calculations, we will use the following mixing parameters [14]
θ8 = −21.2◦, θ1 = −9.2◦, f1 = 1.17fπ, f8 = 1.26fπ. (19)
In this paper, we firstly take η and η′ as a linear combination of light quark pairs
uū, dd̄ and ss̄, and then estimate the possible gluonic contributions to Bs → η(′)η(′)
decays by using the formulae as presented in Ref. [4]. We found that the possible gluonic
contributions are indeed small.
B. Complete decay amplitudes
For B0s → ηη′ decay, by combining the contributions from different diagrams, the total
decay amplitude can be written as
M(ηη′) =
FeηF2f
η′ + Feη′F
C4 − C5 −
FeηF2f
η′ + Feη′F
C4 − 2C5 −
F P2eη F2 + F
C5 + C6 −
+ (Meη +Meη′)F2F
ξuC2 − ξt
C3 + 2C4 −
− [MeηF1F ′2 +Meη′F ′1F2] ξt
MP2eη +M
2C6 +
MP2eη F1F
F ′1F2
+ (Maη +Maη′)F1F
ξuC2 − ξt
C3 + 2C4 −
− (Maη +Maη′)F2F ′2ξt
MP1aη +M
MP2aη +M
2C6 +
MP2aη +M
−fBs ·
F P2aη + F
C5 + C6 −
, (20)
where ξu = V
ubVus and ξt = V
tbVts, and the relevant mixing parameters and decay con-
stants are
cos θ8 −
sin θ1, F2 = −
sin θ8 +
cos θ1, (21)
F ′1 =
sin θ8 +
cos θ1, F
2 = −
sin θ8 +
cos θ1, (22)
f dη =
cos θ8 −
sin θ1, f
η = −
cos θ8 −
sin θ1, (23)
f dη′ =
sin θ8 +
cos θ1, f
η′ = −
sin θ8 +
cos θ1. (24)
Similarly, the decay amplitudes for B0s → ηη and B0s → η′η′ decay can be obtained
easily from Eq.(20) by the following replacements
f dη , f
η ←→ f dη′ , f sη′ ; F1(θ1, θ8)←→ F ′1(θ1, θ8); F2(θ1, θ8)←→ F ′2(θ1, θ8). (25)
Note that the contributions from the possible gluonic component of η′ meson have not
been included here.
II. NUMERICAL RESULTS AND DISCUSSIONS
In this section, we will calculate the CP-averaged branching ratios and CP violating
asymmetries for those considered decay modes. The input parameters and the wave
functions to be used are given in Appendix A. In numerical calculations, central values
of input parameters will be used implicitly unless otherwise stated.
Using the decay amplitudes obtained in last section, it is straightforward to calculate
the branching ratios. By employing the two mixing angle scheme of η − η′ system and
using the mixing parameters as given in Eq. (19), one finds the CP-averaged branching
ratios for the considered three decays as follows
Br( B0s → ηη) =
14.2+6.6−4.2(ωb)
+16.7
−6.2 (m
× 10−6, (26)
Br( B0s → ηη′) =
12.4+5.7−3.6(ωb)
+17.3
−6.0 (m
× 10−6, (27)
Br( B0s → η′η′) =
9.2+3.0−2.0(ωb)
+15.0
−4.5 (m
× 10−6, (28)
where the main errors are induced by the uncertainties of ωb = 0.50 ± 0.05 GeV, and
0 = [1.49−2.38] GeV (corresponding to ms = 130±30 MeV), respectively. The above
pQCD predictions agree well with those obtained in the QCD facterization approach [2].
As for the gluonic contributions, we follow the same procedure as being used in Ref. [4]
to include the possible gluonic contributions to the Bs → η(′) transition form factors
0,1 and found that the gluonic contributions to the branching ratios are less than
3% for B → ηη decay, ∼ 7% for B → ηη′ decay, and around 18% for B → η′η′ decay.
The central values of the pQCD predictions for Bs → η(′)η(′) decays after the inclusion of
possible gluonic contributions are the following
Br(B0s → ηη) =
13.7+6.4−4.0(ωb)
+16.5
−6.1 (m
× 10−6,
Br( B0s → ηη′) =
11.6+5.3−3.4(ωb)
+16.8
−5.7 (m
× 10−6,
Br( B0s → η′η′) =
10.8+3.7−2.4(ωb)
+16.2
−5.2 (m
× 10−6. (29)
Now we turn to the evaluations of the CP-violating asymmetries of Bs → η(′)η(′) decays
in pQCD approach. For B0s meson decays, a non-zero ratio (∆Γ/Γ)Bs is expected in the
SM [15, 16]. For Bs → η(′)η(′) decays, three quantities to describe the CP violation can
be defined as follows [16]:
AdirCP =
|λCP |2 − 1
1 + |λCP |2
, AmixCP =
2Im(λCP )
1 + |λCP |2
, A∆Γs =
2Re(λCP )
1 + |λCP |2
, (30)
λCP = ηf
V ∗tbVts〈f |Heff |B̄0s〉
ts〈f |Heff |B0s 〉
〈f |Heff |B̄0s〉
〈f |Heff |B0s〉
, (31)
in a very good approximation. Here AdirCP and AmixCP means the direct and mixing-induced
CP violation respectively, while the third term A∆Γs is related to the presence of a non-
negligible ∆Γs. By using the mixing parameters in Eq. (19) and the input parameters as
given in Appendix A, one found the pQCD predictions for AdirCP , AmixCP and Hf
AdirCP (B0s → ηη) =
−0.2± 0.1(γ)± 0.1(ωb)+0.4−0.2(m
× 10−2,
AdirCP (B0s → ηη′) =
+0.6+0.1−0.2(γ)± 0.1(ωb)± 0.3(m
× 10−2,
AdirCP (B0s → η′η′) =
−0.8+0.2−0.1(γ)± 0.1(ωb)± 0.7(m
× 10−2, (32)
AmixCP (B0s → ηη) = [−0.3± 0.1(γ)± 0.2(ωb)± 0.5(m
0 )]× 10−2,
AmixCP (B0s → ηη′) = [−0.8± 0.2(γ)± 0.1(ωb)± 0.2(m
0 )]× 10−2,
AmixCP (B0s → η′η′) =
+1.8+0.3−0.5(γ)± 0.0(ωb)+0.5−0.3(m
× 10−2, (33)
A∆Γs(ηη) ≈ A∆Γs(ηη′) ≈ A∆Γs(η′η′) ≈ 1, (34)
where the dominant errors come from the variations of CKM angle γ = 60◦ ± 20◦, ωb =
0.50± 0.05 GeV and mηss̄0 = [1.49− 2.38] GeV ( corresponding to ms = 130 ± 30 MeV),
respectively. It is easy to see that both the direct and mixing-induced CP violations of the
considered Bs decays are very small in magnitude, and thus almost impossible to measure
them even in the LHC experiments. The above pQCD predictions are also consistent with
the QCDF predictions [1, 2].
In short, we calculated the branching ratios and CP-violating asymmetries of B0s → ηη,
ηη′ and η′η′ decays at the leading order by using the pQCD factorization approach. Besides
the usual factorizable diagrams, the non-factorizable and annihilation diagrams are also
calculated analytically in the pQCD approach. From our calculations and phenomeno-
logical analysis, we found the following results:
• Using the two mixing angle scheme, the pQCD predictions for the CP-averaged
branching ratios are
Br(B0s → ηη) =
14.2+18.0−7.5
× 10−6,
Br(B0s → ηη′) =
12.4+18.2−7.0
× 10−6,
Br(B0s → η′η′) =
9.2+15.3−4.9
× 10−6, (35)
where the various errors as specified previously have been added in quadrature. The
pQCD predictions for the three decay channels agree well with those obtained by
employing the QCDF approach.
• The gluonic contributions are small in size: less than 7% for Bs → ηη and ηη′
decays, and around 18% for Bs → η′η′ decay.
• The direct and mixing-induced CP violations of the considered three decay modes
are very small: less than 3% in magnitude.
Note added: After completion of this paper, the paper in Ref.[18] appeared, and where
a systematic study for the Bs → M1M2 decays in the pQCD factorization approach has
been done. Since different mixing-scheme of η − η′ system have been used, the explicit
expressions of the decay amplitudes of the relevant decays are different in these two papers,
but the numerical predictions for branching ratios and CP violations agree well with each
other. The possible gluonic contributions are estimated here.
Acknowledgments
X. Liu would like to acknowledge the financial support of The Scientific Research
Start-up Fund of Zhejiang Ocean University under Grant No.21065010706. This work was
partially supported by the National Natural Science Foundation of China under Grant
No.10575052, and by the Specialized Research Fund for the Doctoral Program of Higher
Education (SRFDP) under Grant No. 20050319008.
APPENDIX A: INPUT PARAMETERS AND WAVE FUNCTIONS
In this Appendix we show the input parameters and the light meson wave functions to
be used in the numerical calculations.
The masses, decay constants, QCD scale and B0s meson lifetime are
(f=4)
= 250MeV, fπ = 130MeV, fBs = 230MeV,
0 = 1.4GeV, ms = 130MeV, fK = 160MeV,
MBs = 5.37GeV, MW = 80.41GeV, τB0s = 1.46× 10
−12s (A1)
For the CKM matrix elements, here we adopt the Wolfenstein parametrization for the
CKM matrix, and take λ = 0.2272, A = 0.818, ρ = 0.221 and η = 0.340 [11].
For the B meson wave function, we adopt the model
φBs(x, b) = NBsx
2(1− x)2exp
M2Bs x
(ωbb)
, (A2)
where ωb is a free parameter and we take ωb = 0.50± 0.05 GeV in numerical calculations,
and NBs = 63.67 is the normalization factor for ωb = 0.50.
For the distribution amplitudes φAη
, φPη
and φTη
, we utilize the result from the
light-cone sum rule [17] including twist-3 contribution. For the corresponding Gegenbauer
moments and relevant input parameters, we here use a
2 = 0.115, a
4 = −0.015, ρηdd̄ =
0 , η3 = 0.015 and ω3 = −3.0. We also assume that the wave function of uū is the
same as the wave function of dd̄ [3]. For the wave function of the ss̄ components, we also
use the same form as dd̄ but with mss̄0 and fy instead of m
0 and fx, respectively:
fx = fπ, fy =
2f 2K − f 2π . (A3)
These values are translated to the values in the two mixing angle method:
f1 = 152.1MeV, f8 = 163.8MeV,
θ1 = −9.2◦, θ8 = −21.2◦. (A4)
The parameters mi0 (i = ηdd̄(uū), ηss̄) are defined as:
dd̄(uū)
0 ≡ mπ0 ≡
(mu +md)
2M2K −m2π
(2ms)
. (A5)
[1] M. Beneke and M. Neubert, Nucl. Phys. B 675, 333 (2003).
[2] J.F. Sun, G.H. Zhu, D.S. Du, Phys. Rev. D 68 (2003) 054003.
[3] E. Kou, Phys. Rev. D 63, 054027 (2001); E. Kou and A.I. Sanda, Phys. Lett. B 525, 240
(2002).
[4] Y.Y. Charng, T. Kurimoto, and H.N. Li, Phys. Rev. D 74, 074024 (2006).
[5] Z.J. Xiao, X. Liu, and H.S. Wang, Phys. Rev. D 75,034017 (2007).
[6] X.F. Chen, D.Q. Guo and Z.J. Xiao, hep-ph/0701146.
[7] C.-H. V. Chang and H.N. Li, Phys. Rev. D 55, 5577 (1997); T.-W. Yeh and H.N. Li, Phys.
Rev. D 56, 1615 (1997).
[8] H.N. Li, Prog.Part.& Nucl.Phys. 51, 85 (2003), and reference therein.
[9] G.P. Lepage and S.J. Brodsky, Phys. Rev. D 22, 2157 (1980).
[10] G. Buchalla, A.J. Buras, and M.E. Lautenbacher, Rev. Mod. Phys. 68, 1125 (1996).
[11] Particle Data Group, W.-M. Yao et al., J. Phys. G 33, 1 (2006).
[12] H.N. Li, Phys. Rev. D 66, 094010 (2002).
[13] C.-D. Lü, K. Ukai and M.Z. Yang, Phys. Rev. D 63, 074009 (2001).
[14] Th. Feldmann, P. Kroll and B. Stech, Phys. Rev. D 58, 114006 (1998); R. Escribano and
J.M. Frere, J. High Energy Phys. 0506 (2005) 029.
[15] M. Beneke, G. Buchalla, C. Greub, A. Lenz and U. Nierste, Phys. Lett. B 459, 631 (1999).
[16] L. Fernandez, Ph.D Thesis, CERN-Thesis-2006-042.
[17] P. Ball, J. High Energy Phys. 9809, 005 (1998); P. Ball, J. High Energy Phys. 9901, 010
(1999); P. Ball and R. Zwicky, Phys. Rev. D 71, 014015 (2005).
[18] A. Ali, G. Kramer, Y. Li, C.D. Lü, Y.L. Shen, W. Wang, and Y.M. Wang, hep-ph/0703162.
http://arxiv.org/abs/hep-ph/0701146
http://arxiv.org/abs/hep-ph/0703162
	Perturbative calculations
	Decay amplitudes
	Complete decay amplitudes
	Numerical results and Discussions
	Acknowledgments
	Input parameters and wave functions
	References
ABSTRACT
  We calculate the CP averaged branching ratios and CP-violating asymmetries
for $B_s^0 \to \eta \eta, \eta \eta^\prime$ and $\eta^\prime \eta^\prime$
decays in the perturbative QCD (pQCD) approach here. The pQCD predictions for
the CP-averaged branching ratios are $Br(B_s^0 \to \eta \eta) = \left
(14.2^{+18.0}_{-7.5}) \times 10^{-6}$, $Br(B_s^0 \to \eta \eta^\prime)= \left
(12.4 ^{+18.2}_{-7.0}) \times 10^{-6}$, and $Br(B_s^0 \to \eta^{\prime}
\eta^{\prime}) = \left (9.2^{+15.3}_{-4.9}) \times 10^{-6}$, which agree well
with those obtained by employing the QCD factorization approach and also be
consistent with available experimental upper limits. The gluonic contributions
are small in size: less than 7% for $B_s \to \eta \eta$ and $ \eta \eta^\prime$
decays, and around 18% for $B_s \to \eta' \eta'$ decay. The CP-violating
asymmetries for three decays are very small: less than 3% in magnitude.

<|endoftext|><|startoftext|>
Introduction
Ordinal regression (or ranking learning) is an impor-
tant supervised problem of learning a ranking or or-
dering on instances, which has the property of both
classification and metric regression. The learning task
of ordinal regression is to assign data points into a set
of finite ordered categories. For example, a teacher
rates students’ performance using A, B, C, D, and E
(A > B > C > D > E) (Chu & Ghahramani, 2005a).
Ordinal regression is different from classification due
to the order of categories. In contrast to metric re-
gression, the response variables (categories) in ordinal
regression is discrete and finite.
The research of ordinal regression dated back to the
ordinal statistics methods in 1980s (McCullagh, 1980;
McCullagh & Nelder, 1983) and machine learning re-
search in 1990s (Caruana et al., 1996; Herbrich et al.,
1998; Cohen et al., 1999). It has attracted the con-
siderable attention in recent years due to its poten-
tial applications in many data-intensive domains such
as information retrieval (Herbrich et al., 1998), web
page ranking (Joachims, 2002), collaborative filtering
(Goldberg et al., 1992; Basilico & Hofmann, 2004; Yu
et al., 2006), image retrieval (Wu et al., 2003), and pro-
tein ranking (Cheng & Baldi, 2006) in Bioinformatics.
A number of machine learning methods have been de-
veloped or redesigned to address ordinal regression
problem (Rajaram et al., 2003), including perceptron
(Crammer & Singer, 2002) and its kernelized gener-
alization (Basilico & Hofmann, 2004), neural network
with gradient descent (Caruana et al., 1996; Burges
et al., 2005), Gaussian process (Chu & Ghahramani,
2005b; Chu & Ghahramani, 2005a; Schwaighofer et
al., 2005), large margin classifier (or support vec-
tor machine) (Herbrich et al., 1999; Herbrich et al.,
2000; Joachims, 2002; Shashua & Levin, 2003; Chu
& Keerthi, 2005; Aiolli & Sperduti, 2004; Chu &
Keerthi, 2007), k-partite classifier (Agarwal & Roth,
2005), boosting algorithm (Freund et al., 2003; Dekel
et al., 2002), constraint classification (Har-Peled et al.,
2002), regression trees (Kramer et al., 2001), Naive
Bayes (Zhang et al., 2005), Bayesian hierarchical ex-
perts (Paquet et al., 2005), binary classification ap-
proach (Frank & Hall, 2001; Li & Lin, 2006) that de-
composes the original ordinal regression problem into
a set of binary classifications, and the optimization of
nonsmooth cost functions (Burges et al., 2006).
Most of these methods can be roughly classified into
two categories: pairwise constraint approach (Herbrich
et al., 2000; Joachims, 2002; Dekel et al., 2004; Burges
et al., 2005) and multi-threshold approach (Cram-
mer & Singer, 2002; Shashua & Levin, 2003; Chu &
Ghahramani, 2005a). The former is to convert the full
ranking relation into pairwise order constraints. The
latter tries to learn multiple thresholds to divide data
A Neural Network Approach to Ordinal Regression
into ordinal categories. Multi-threshold approaches
also can be unified under the general, extended binary
classification framework (Li & Lin, 2006).
The ordinal regression methods have different advan-
tages and disadvantages. Prank (Crammer & Singer,
2002), a perceptron approach that generalizes the bi-
nary perceptron algorithm to the ordinal multi-class
situation, is a fast online algorithm. However, like a
standard perceptron method, its accuracy suffers when
dealing with non-linear data, while a quadratic kernel
version of Prank greatly relieves this problem. One
class of accurate large-margin classifier approaches
(Herbrich et al., 2000; Joachims, 2002) convert the
ordinal relations into O(n2) (n: the number of data
points) pairwise ranking constraints for the structural
risk minimization (Vapnik, 1995; Schoelkopf & Smola,
2002). Thus, it can not be applied to medium size
datasets (> 10,000 data points), without discarding
some pairwise preference relations. It may also overfit
noise due to incomparable pairs.
The other class of powerful large-margin classifier
methods (Shashua & Levin, 2003; Chu & Keerthi,
2005) generalize the support vector formulation for or-
dinal regression by finding K − 1 thresholds on the
real line that divide data into K ordered categories.
The size of this optimization problem is linear in the
number of training examples. However, like support
vector machine used for classification, the prediction
speed is slow when the solution is not sparse, which
makes it not appropriate for time-critical tasks. Simi-
larly, another state-of-the-art approach, Gaussian pro-
cess method (Chu & Ghahramani, 2005a), also has the
difficulty of handling large training datasets and the
problem of slow prediction speed in some situations.
Here we describe a new neural network approach for
ordinal regression that has the advantages of neural
network learning: learning in both online and batch
mode, training on very large dataset (Burges et al.,
2005), handling non-linear data, good performance,
and rapid prediction. Our method can be considered
a generalization of the perceptron learning (Crammer
& Singer, 2002) into multi-layer perceptrons (neural
network) for ordinal regression. Our method is also
related to the classic generalized linear models (e.g.,
cumulative logit model) for ordinal regression (Mc-
Cullagh, 1980). Unlike the neural network method
(Burges et al., 2005) trained on pairs of examples
to learn pairwise order relations, our method works
on individual data points and uses multiple output
nodes to estimate the probabilities of ordinal cate-
gories. Thus, our method falls into the category of
multi-threshold approach. The learning of our method
proceeds similarly as traditional neural networks using
back-propagation (Rumelhart et al., 1986).
On the same benchmark datasets, our method yields
the performance better than the standard classifica-
tion neural networks and comparable to the state-of-
the-art methods using support vector machines and
Gaussian processes. In addition, our method can learn
on very large datasets and make rapid predictions.
2. Method
2.1. Formulation
Let D represent an ordinal regression dataset consist-
ing of n data points (x, y) , where x ∈ Rd is an input
feature vector and y is its ordinal category from a fi-
nite set Y . Without loss of generality, we assume that
Y = 1, 2, ...,K with ”<” as order relation.
For a standard classification neural network without
considering the order of categories, the goal is to pre-
dict the probability of a data point x belonging to
one category k (y = k). The input is x and the
target of encoding the category k is a vector t =
(0, ..., 0, 1, 0, ..., 0), where only the element tk is set to
1 and all others to 0. The goal is to learn a function
to map input vector x to a probability distribution
vector o = (o1, o2, ...ok, ...oK), where ok is closer to 1
and other elements are close to zero, subject to the
constraint
i=1 oi = 1.
In contrast, like the perceptron approach (Crammer &
Singer, 2002), our neural network approach considers
the order of the categories. If a data point x belongs
to category k, it is classified automatically into lower-
order categories (1, 2, ..., k − 1) as well. So the target
vector of x is t = (1, 1, .., 1, 0, 0, 0), where ti (1 ≤ i ≤ k)
is set to 1 and other elements zeros. Thus, the goal
is to learn a function to map the input vector x to
a probability vector o = (o1, o2, ..., ok, ...oK), where
oi (i ≤ k) is close to 1 and oi (i ≥ k) is close to 0.∑K
i=1 oi is the estimate of number of categories (i.e.
k) that x belongs to, instead of 1. The formulation
of the target vector is similar to the perceptron ap-
proach (Crammer & Singer, 2002). It is also related
to the classical cumulative probit model for ordinal re-
gression (McCullagh, 1980), in the sense that we can
consider the output probability vector (o1, ...ok, ...oK)
as a cumulative probability distribution on categories
(1, ..., k, ..., K), i.e.,
is the proportion of cate-
gories that x belongs to, starting from category 1.
The target encoding scheme of our method is related to
but, different from multi-label learning (Bishop, 1996)
and multiple label learning (Jin & Ghahramani, 2003)
A Neural Network Approach to Ordinal Regression
because our method imposes an order on the labels (or
categories).
2.2. Learning
Under the formulation, we can use the almost exactly
same neural network machinery for ordinal regression.
We construct a multi-layer neural network to learn
ordinal relations from D. The neural network has d
inputs corresponding to the number of dimensions of
input feature vector x and K output nodes correspond-
ing to K ordinal categories. There can be one or more
hidden layers. Without loss of generality, we use one
hidden layer to construct a standard two-layer feedfor-
ward neural network. Like a standard neural network
for classification, input nodes are fully connected with
hidden nodes, which in turn are fully connected with
output nodes. Likewise, the transfer function of hid-
den nodes can be linear function, sigmoid function,
and tanh function that is used in our experiment. The
only difference from traditional neural network lies in
the output layer. Traditional neural networks use soft-
max e
(or normalized exponential function) for
output nodes, satisfying the constraint that the sum of
outputs
i=1 oi is 1. zi is the net input to the output
node Oi.
In contrast, each output node Oi of our neural net-
work uses a standard sigmoid function 1
1+e−zi
, with-
out including the outputs from other nodes. Output
node Oi is used to estimate the probability oi that a
data point belongs to category i independently, with-
out subjecting to normalization as traditional neural
networks do. Thus, for a data point x of category
k, the target vector is (1, , 1, .., 1, 0, 0, 0), in which the
first k elements is 1 and others 0. This sets the target
value of output nodes Oi (i ≤ k) to 1 and Oi (i > k)
to 0. The targets instruct the neural network to ad-
just weights to produce probability outputs as close
as possible to the target vector. It is worth pointing
out that using independent sigmoid functions for out-
put nodes does not guaranteed the monotonic relation
(o1 >= o2 >= ... >= oK), which is not necessary but,
desirable for making predictions (Li & Lin, 2006). A
more sophisticated approach is to impose the inequal-
ity constraints on the outputs to improve the perfor-
mance.
Training of the neural network for ordinal regres-
sion proceeds very similarly as standard neural net-
works. The cost function for a data point x can
be relative entropy or square error between the tar-
get vector and the output vector. For relative en-
tropy, the cost function for output nodes is fc =∑K
i=1 (ti log oi + (1− ti) log(1− oi)). For square er-
ror, the error function is fc =
i=1 (ti − oi)
2. Pre-
vious studies (Richard & Lippman, 1991) on neural
network cost functions show that relative entropy and
square error functions usually yield very similar re-
sults. In our experiments, we use square error function
and standard back-propagation to train the neural net-
work. The errors are propagated back to output nodes,
and from output nodes to hidden nodes, and finally to
input nodes.
Since the transfer function ft of output node Oi is
the independent sigmoid function 1
1+e−zi
, the deriva-
tive of ft of output node Oi is
(1+e−zi )2
1+e−zi
(1 − 1
1+e−zi
) = oi(1 − oi). Thus, the net error
propagated to output node Oi is
= ti−oi
oi(1−oi)
oi(1 − oi) = ti − oi for relative entropy cost function,
= −2(ti−oi)×oi(1−oi) = −2oi(ti−oi)(1−oi)
for square error cost function. The net errors are prop-
agated through neural networks to adjust weights us-
ing gradient descent as traditional neural networks do.
Despite the small difference in the transfer function
and the computation of its derivative, the training of
our method is the same as traditional neural networks.
The network can be trained on data in the online
mode where weights are updated per example, or in
the batch mode where weights are updated per bunch
of examples.
2.3. Prediction
In the test phase, to make a prediction, our method
scans output nodes in the order O1, O2, ..., OK . It
stops when the output of a node is smaller than the
predefined threshold T (e.g., 0.5) or no nodes left. The
index k of the last node Ok whose output is bigger than
T is the predicted category of the data point.
3. Experiments and Results
3.1. Benchmark Data and Evaluation Metric
We use eight standard datasets for ordinal regres-
sion (Chu & Ghahramani, 2005a) to benchmark our
method. The eight datasets (Diabetes, Pyrimidines,
Triazines, Machine CUP, Auto MPG, Boston, Stocks
Domain, and Abalone) are originally used for metric
regression. Chu and Ghahramani (Chu & Ghahra-
mani, 2005a) discretized the real-value targets into
five equal intervals, corresponding to five ordinal cat-
egories. The authors randomly split each dataset into
training/test datasets and repeated the partition 20
times independently. We use the exactly same parti-
tions as in (Chu & Ghahramnai, 2005a) to train and
test our method.
A Neural Network Approach to Ordinal Regression
We use the online mode to train neural networks. The
parameters to tune are the number of hidden units, the
number of epochs, and the learning rate. We create
a grid for these three parameters, where the hidden
unit number is in the range [1..15], the epoch number
in the set (50, 200, 500, 1000), and the initial learning
rate in the range [0.01..0.5]. During the training, the
learning rate is halved if training errors continuously
go up for a pre-defined number (40, 60, 80, or 100) of
epochs. For experiments on each data split, the neural
network parameters are fully optimized on the training
data without using any test data.
For each experiment, after the parameters are opti-
mized on the training data, we train five models on
the training data with the optimal parameters, start-
ing from different initial weights. The ensemble of five
trained models are then used to estimate the general-
ized performance on the test data. That is, the average
output of five neural network models is used to make
predictions.
We evaluate our method using zero-one error and mean
absolute error as in (Chu & Ghahramani, 2005a).
Zero-one error is the percentage of wrong assignments
of ordinal categories. Mean absolute error is the root
mean square difference between assigned categories
(k′) and true categories (k) of all data points. For
each dataset, the training and evaluation process is
repeated 20 times on 20 data splits. Thus, we com-
pute the average error and the standard deviation of
the two metrics as in (Chu & Ghahramani, 2005a).
3.2. Comparison with Neural Network
Classification
We first compare our method (NNRank) with a stan-
dard neural network classification method (NNClass).
We implement both NNRank and NNClass using
C++. NNRank and NNClass share most code with
minor difference in the transfer function of output
nodes and its derivative computation as described in
Section 2.2.
As Table 1 shows, NNRank outperforms NNClass in
all but one case in terms of both the mean-zero error
and the mean absolute error. And on some datasets
the improvement of NNRank over NNClass is sizable.
For instance, on the Stock and Pyrimidines datasets,
the mean zero-one error of NNRank is about 4% less
than NNClass; on four datasets (Stock, Pyrimidines,
Triazines, and Diabetes) the mean absolute error is
reduced by about .05. The results show that the or-
dinal regression neural network consistently achieves
the better performance than the standard classifica-
tion neural network. To futher verify the effectiveness
of the neural network ordinal regression approach, we
are currently evaluating NNRank and NNclass on very
large ordinal regression datasets in the bioinformatics
domain (work in progress).
3.3. Comparison with Gaussian Processes and
Support Vector Machines
To further evaluate the performance of our method, we
compare NNRank with two Gaussian process meth-
ods (GP-MAP and GP-EP) (Chu & Ghahramani,
2005a) and a support vector machine method (SVM)
(Shashua & Levin, 2003) implemented in (Chu &
Ghahramani, 2005a). The results of the three meth-
ods are quoted from (Chu & Ghahramani, 2005a). Ta-
ble 2 reports the zero-one error on the eight datasets.
NNRank achieves the best results on Diabetes, Tri-
azines, and Abalone, GP-EP on Pyrimidines, Auto
MPG, and Boston, GP-MAP on Machine, and SVM
on Stocks.
Table 3 reports the mean absolute error on the eight
datasets. NNRank yields the best results on Diabetes
and Abalone, GP-EP on Pyrimidines, Auto MPG, and
Boston, GP-MAP on Triazines and Machine, SVM on
Stocks.
In summary, on the eight datasets, the performance
of NNRank is comparable to the three state-of-the-art
methods for ordinal regression.
4. Discussion and Future Work
We have described a simple yet novel approach to
adapt traditional neural networks for ordinal regres-
sion. Our neural network approach can be consid-
ered a generalization of one-layer perceptron approach
(Crammer & Singer, 2002) into multi-layer. On the
standard benchmark of ordinal regression, our method
outperforms standard neural networks used for classi-
fication. Furthermore, on the same benchmark, our
method achieves the similar performance as the two
state-of-the-art methods (support vector machines and
Gaussian processes) for ordinal regression.
Compared with existing methods for ordinal regres-
sion, our method has several advantages of neural net-
works. First, like the perceptron approach (Crammer
& Singer, 2002), our method can learn in both batch
and online mode. The online learning ability makes
our method a good tool for adaptive learning in the
real-time. The multi-layer structure of neural network
and the non-linear transfer function give our method
the stronger fitting ability than perceptron methods.
Second, the neural network can be trained on very
A Neural Network Approach to Ordinal Regression
large datasets iteratively, while training is more com-
plex than support vector machines and Gaussian pro-
cesses. Since the training process of our method is the
same as traditional neural networks, average neural
network users can use this method for their tasks.
Third, neural network method can make rapid
prediction once models are trained. The ability of
learning on very large dataset and predicting in
time makes our method a useful and competitive
tool for ordinal regression tasks, particularly for
time-critical and large-scale ranking problems in
information retrieval, web page ranking, collaborative
filtering, and the emerging fields of Bioinformat-
ics. We are currently applying the method to
rank proteins according to their structural rele-
vance with respect to a query protein (Cheng &
Baldi, 2006). To facilitate the application of this
new approach, we make both NNRank and NNClass
to accept a general input format and freely available at
http://www.eecs.ucf.edu/∼jcheng/cheng software.html.
There are some directions to further improve the neu-
ral network (or multi-layer perceptron) approach for
ordinal regression. One direction is to design a trans-
fer function to ensure the monotonic decrease of the
outputs of the neural network; the other direction
is to derive the general error bounds of the method
under the binary classification framework (Li & Lin,
2006). Furthermore, the other flavors of implemen-
tations of the multi-threshold multi-layer perceptron
approach for ordinal regression are possible. Since ma-
chine learning ranking is a fundamental problem that
has wide applications in many diverse domains such
as web page ranking, information retrieval, image re-
trieval, collaborative filtering, bioinformatics and so
on, we believe the further exploration of the neural net-
work (or multi-layer perceptron) approach for ranking
and ordinal regression is worthwhile.
References
Agarwal, S., & Roth, D. (2005). Learnability of bipar-
tite ranking functions. In Proc. of the 18th annual
conference on learning theory (colt-05).
Aiolli, F., & Sperduti, A. (2004). Learning preferences
for multiclass problems. In Advances in neural in-
formation processing systems 17 (nips).
Basilico, J., & Hofmann, T. (2004). Unifying collabo-
rative and content-based filtering. In Proceedings of
the twenty-first international conference on machine
learning (icml), 9. New York, USA: ACM press.
Bishop, C. (1996). Neural networks for pattern recog-
nition. USA: Oxford University Press.
Burges, C., Ragno, R., & Le, Q. V. (2006). Learning
to rank with nonsmooth cost functions. In Advances
in neural information processing systems (nips) 20.
Cambridge, MA: MIT press.
Burges, C. J. C., Shaked, T., Renshaw, E., Lazier, A.,
Deeds, M., Hamilton, N., & Hullender, G. (2005).
Learning to rank using gradient descent. In Proc. of
internaltional conference on machine learning (icml-
05), 89–97.
Caruana, R., Baluja, S., & Mitchell, T. (1996). Using
the future to sort out the present: Rankprop and
multitask learning for medical risk evaluation. In
Advances in neural information processing systems
8 (nips).
Cheng, J., & Baldi, P. (2006). A machine learning in-
formation retrieval approach to protein fold recog-
nition. Bioinformatics, 22, 1456–1463.
Chu, W., & Ghahramani, Z. (2005a). Gaussian pro-
cesses for ordinal regression. Journal of Machine
Learning Research, 6, 1019–1041.
Chu, W., & Ghahramani, Z. (2005b). Preference learn-
ing with Gaussian processes. In Proc. of inter-
national conference on machine learning (icml-05),
137–144.
Chu, W., & Keerthi, S. (2005). New approaches to
support vector ordinal regression. In Proc. of inter-
national conference on machine learning (icml-05),
145–152.
Chu, W., & Keerthi, S. (2007). Support vector ordinal
regression. Neural Computation, 19.
Cohen, W. W., Schapire, R. E., & Singer, Y. (1999).
Learning to order things. Journal of Artificial Intel-
ligence Research, 10, 243–270.
Crammer, K., & Singer, Y. (2002). Pranking with
ranking. In Advances in neural information pro-
cessing systems (nips) 14, 641–647. Cambridge, MA:
MIT press.
Dekel, O., Keshet, J., & Singer, Y. (2004). Log-linear
models for label ranking. In Proc. of the 21st inter-
national conference on machine learning (icml-06),
209–216.
Frank, E., & Hall, M. (2001). A simple approach to
ordinal classification. In Proc. of the european con-
ference on machine learning.
A Neural Network Approach to Ordinal Regression
Freund, Y., Iyer, R., Schapire, R., & Singer, Y. (2003).
An efficient boosting algorithm for combining pref-
erences. Journal of Machine Learning Research, 4,
933–969.
Goldberg, D., Nichols, D., Oki, B., & Terry, D. (1992).
Using collaborative filtering to weave an information
tapestry. Communications of the ACM, 35, 61–70.
Har-Peled, S., Roth, D., & Zimak, D. (2002). Con-
straint classification: a new approach to multiclass
classification and ranking. In Advances in neural
information processing systems 15 (nips).
Herbrich, R., Graepel, T., Bollmann-Sdorra, P., &
Obermayer, K. (1998). Learning preference relations
for information retrieval. In Proc. of icml workshop
on text categorization and machine learning, 80–84.
Herbrich, R., Graepel, T., & Obermayer, K. (1999).
Support vector learning for ordinal regression. In
Proc. of 9th international conference on artificial
neural networks (icann), 97–102.
Herbrich, R., Graepel, T., & Obermayer, K. (2000).
Large margin rank boundaries for ordinal regres-
sion. In A. J. Smola, P. Bartlett, B. Scholkopf and
D. Schuurmans (Eds.), Advances in large margin
classifiers, 115–132. Cambridge, MA: MIT Press.
Jin, R., & Ghahramani, Z. (2003). Learning with
multiple labels. In Advances in neural information
processing systems (nips) 15. Cambridge, MA: MIT
press.
Joachims, I. (2002). Optimizing search engines us-
ing clickthrough data. In D. Hand, D. Keim and
R. NG (Eds.), Proc. of 8th acm sigkdd international
conference on knowledge discovery and data mining,
133–142.
Kramer, S., Widmer, G., Pfahringer, B., & DeGroeve,
M. (2001). Prediction of ordinal classes using regres-
sion trees. Fundamenta Informaticae, 47, 1–13.
Li, L., & Lin, H. (2006). Ordinal regression by ex-
tended binary classification. In Advances in neu-
ral information processing systems (nips) 20. Cam-
bridge, MA: MIT press.
MacKay, D. J. C. (1992). A practical bayesian frame-
work for back propagation networks. Neural Com-
putation, 4, 448–472.
McCullagh, P. (1980). Regression models for ordinal
data. Journal of the Royal Statistical Society B, 42,
109–142.
McCullagh, P., & Nelder, J. A. (1983). Generalized
linear models. London: Chapman and Hall.
Minka, T. P. (2001). A family of algorithms for ap-
proximate bayesian inference. PhD Thesis, Mas-
sachusetts Institute of Technology.
Paquet, U., Holden, S., & Naish-Guzman, A. (2005).
Bayesian hierarchical ordinal regression. In Proc. of
the international conference on artifical neural net-
works.
Rajaram, S., Garg, A., Zhou, X., & Huang, T. (2003).
Classification approach towards ranking and sort-
ing problems. In Machine learning: Ecml 2003,
vol. 2837 of lecture notes in artificail intelligence
(n. lavrac, d. gamberger, h. blockeel and l. todorovski
eds.), 301–312. Springer-Verlag.
Richard, M., & Lippman, R. (1991). Neural network
classifiers estimate bayesian a-posteriori probabili-
ties. Neural Computation, 3, 461–483.
Rumelhart, D., Hinton, G., & Williams, R. (1986).
Learning Internal Representations by Error Propa-
gation. In D. E. Rumelhart and J. L. McClelland
(Eds.), Parallel distributed processing: Explorations
in the microstructure of cognition. vol. i: Founda-
tions, 318–362. Bradford Books/MIT Press, Cam-
bridge, MA.
Schölkopf, B., & Smola, A. (2002). Learning with Ker-
nels, Support Vector Machines, Regularization, Op-
timization and Beyond. Cambridge, MA: MIT Uni-
versity Press.
Schwaighofer, A., Tresp, V., & Yu, K. (2005). Hiear-
achical bayesian modelling with gaussian processes.
In Advances in neural information processing sys-
tems 17 (nips). MIT press.
Shashua, A., & Levin, A. (2003). Ranking with large
margin principle: two approaches. In Advances in
neural information processing systems 15 (nips).
Vapnik, V. (1995). The nature of statistical learning
theory. Berlin, Germany: Springer-Verlag.
Wu, H., Lu, H., & Ma, S. (2003). A practical svm-
based algorithm for ordinal regression in image re-
trieval. 612–621.
Yu, S., Yu, K., Tresp, V., & Kriegel, H. P. (2006).
Collaborative ordinal regression. In Proc. of 23rd
international conference on machine learning, 1089–
1096.
A Neural Network Approach to Ordinal Regression
Zhang, H., Jiang, L., & Su, J. (2005). Augmenting
naive bayes for ranking. In International conference
on machine learning (icml-05).
A Neural Network Approach to Ordinal Regression
Table 1. The results of NNRank and NNClass on the eight datasets. The results are the average error over 20 trials along
with the standard deviation.
Mean zero-one error Mean absolute error
Dataset NNRank NNClass NNRank NNClass
Stocks 12.68±1.8% 16.97± 2.3% 0.127±0.01 0.173±0.02
Pyrimidines 37.71±8.1% 41.87±7.9% 0.450±0.09 0.508±0.11
Auto MPG 27.13±2.0% 28.82±2.7% 0.281±0.02 0.307±0.03
Machine 17.03±4.2% 17.80±4.4% 0.186±0.04 0.192±0.06
Abalone 21.39±0.3% 21.74± 0.4% 0.226±0.01 0.232±0.01
Triazines 52.55±5.0% 52.84±5.9% 0.730±0.06 0.790±0.09
Boston 26.38±3.0% 26.62±2.7% 0.295±0.03 0.297±0.03
Diabetes 44.90±12.5% 43.84±10.0% 0.546±0.15 0.592±0.09
Table 2. Zero-one error of NNRank, SVM, GP-MAP, and GP-EP on the eight datasets. SVM denotes the support vector
machine method (Shashua & Levin, 2003; Chu & Ghahramani, 2005a). GP-MAP and GP-EP are two Gaussian process
methods using Laplace approximation (MacKay, 1992) and expectation propagation (Minka, 2001) respectively (Chu &
Ghahramani, 2005a). The results are the average error over 20 trials along with the standard deviation. We use boldface
to denote the best results.
Data NNRank SVM GP-MAP GP-EP
Triazines 52.55±5.0% 54.19±1.5% 52.91±2.2% 52.62±2.7%
Pyrimidines 37.71±8.1% 41.46±8.5% 39.79±7.2% 36.46±6.5%
Diabetes 44.90±12.5% 57.31±12.1% 54.23±13.8% 54.23±13.8%
Machine 17.03±4.2% 17.37±3.6% 16.53±3.6% 16.78±3.9%
Auto MPG 27.13±2.0% 25.73±2.2% 23.78±1.9% 23.75±1.7%
Boston 26.38±3.0% 25.56±2.0% 24.88±2.0% 24.49±1.9%
Stocks 12.68±1.8% 10.81±1.7% 11.99±2.3% 12.00±2.1%
Abalone 21.39±0.3% 21.58±0.3% 21.50±0.2% 21.56±0.4%
Table 3. Mean absolute error of NNRank, SVM, GP-MAP, and GP-EP on the eight datasets. SVM denotes the support
vector machine method (Shashua & Levin, 2003; Chu & Ghahramani, 2005a). GP-MAP and GP-EP are two Gaussian
process methods using Laplace approximation and expectation propagation respectively (Chu & Ghahramani, 2005a).
The results are the average error over 20 trials along with the standard deviation. We use boldface to denote the best
results.
Data NNRank SVM GP-MAP GP-EP
Triazines 0.730±0.07 0.698±0.03 0.687±0.02 0.688±0.03
Pyrimidines 0.450±0.10 0.450±0.11 0.427±0.09 0.392±0.07
Diabetes 0.546±0.15 0.746±0.14 0.662±0.14 0.665±0.14
Machine 0.186±0.04 0.192±0.04 0.185±0.04 0.186±0.04
Auto MPG 0.281±0.02 0.260±0.02 0.241±0.02 0.241±0.02
Boston 0.295±0.04 0.267±0.02 0.260±0.02 0.259±0.02
Stocks 0.127±0.02 0.108±0.02 0.120±0.02 0.120±0.02
Abalone 0.226±0.01 0.229±0.01 0.232±0.01 0.234±0.01
ABSTRACT
  Ordinal regression is an important type of learning, which has properties of
both classification and regression. Here we describe a simple and effective
approach to adapt a traditional neural network to learn ordinal categories. Our
approach is a generalization of the perceptron method for ordinal regression.
On several benchmark datasets, our method (NNRank) outperforms a neural network
classification method. Compared with the ordinal regression methods using
Gaussian processes and support vector machines, NNRank achieves comparable
performance. Moreover, NNRank has the advantages of traditional neural
networks: learning in both online and batch modes, handling very large training
datasets, and making rapid predictions. These features make NNRank a useful and
complementary tool for large-scale data processing tasks such as information
retrieval, web page ranking, collaborative filtering, and protein ranking in
Bioinformatics.

<|endoftext|><|startoftext|>
Introduction
	Experimental
	Statistical quantities
	roughness exponents
	 The Markov nature of height fluctuations
	The level crossing analysis
	Results and discussion
	The tip convolution effect
	Conclusions
	Acknowledgments
	References
ABSTRACT
  The effect of bias voltages on the statistical properties of rough surfaces
has been studied using atomic force microscopy technique and its stochastic
analysis. We have characterized the complexity of the height fluctuation of a
rough surface by the stochastic parameters such as roughness exponent, level
crossing, and drift and diffusion coefficients as a function of the applied
bias voltage. It is shown that these statistical as well as microstructural
parameters can also explain the macroscopic property of a surface. Furthermore,
the tip convolution effect on the stochastic parameters has been examined.

<|endoftext|><|startoftext|>
Etched Glass Surfaces, Atomic Force Microscopy and Stochastic Analysis
G. R. Jafari a,b, M. Reza Rahimi Tabar c,d, A. Iraji zad c, G. Kavei f 1
1a Department of Physics, Shahid Beheshti University, Evin, Tehran 19839, Iran
b Department of Nano-Science, IPM, P. O. Box 19395-5531, Tehran, Iran
c Department of Physics, Sharif University of Technology, P. O. Box 11365-9161, Tehran, Iran
d CNRS UMR 6529, Observatoire de la Côte d’Azur, BP 4229, 06304 Nice Cedex 4, France
e Material and Energy, Research Center, P.O. Box 14155-4777, Tehran, Iran
The effect of etching time scale of glass surface on its statistical properties has been studied using atomic force
microscopy technique. We have characterized the complexity of the height fluctuation of a etched surface by
the stochastic parameters such as intermittency exponents, roughness, roughness exponents, drift and diffusion
coefficients and find their variations in terms of the etching time.
PACS numbers:
I. INTRODUCTION
The complexity of rough surfaces is subject of a large va-
riety of investigations in different fields of science1,2. Sur-
face roughness has an enormous influence on many important
physical phenomena such as contact mechanics, sealing, ad-
hesion, friction and self-cleaning paints and glass windows,3,4.
A surface roughness of just a few nanometers is enough to re-
move the adhesion between clean and (elastically) hard solid
surfaces3. The physical and chemical properties of surfaces
and interfaces are to a significant degree determined by their
topographic structure. The technology of micro fabrication of
glass is getting more and more important because glass sub-
strates are currently being used to fabricate micro electro me-
chanical system (MEMS) devices5. Glass has many advan-
tages as a material for MEMS applications, such as good me-
chanical and optical properties. It is a high electrical insulator,
and it can be easily bonded to silicon substrates at tempera-
tures lower than the temperature needed for fusion bonding6.
Also micro and nano-structuring of glass surfaces is impor-
tant for the production of many components and systems such
as gratings, diffractive optical elements, planar wave guide de-
vices, micro-fluidic channels and substrates for (bio) chemical
applications7. Wet etching is also well developed for some of
these applications8,9,10,11,12,13,14.
One of the main problems in the rough surface is the scal-
ing behavior of the moments of height h and evolution of the
probability density function (PDF) of h, i.e. P (h, x) in terms
of the length scale x. Recently some authors have been able
to obtain a Fokker-Planck equation describing the evolution
of the probability distribution function in terms of the length
scale, by analyzing some stochastic phenomena, such as rough
surfaces15,16,17, turbulent system18, financial data19, cosmic
background radiation20 and heart interbeats21 etc. They no-
ticed that the conditional probability density of field incre-
ment satisfies the Chapman-Kolmogorov equation. Mathe-
matically, this is a necessary condition for the fluctuating data
to be a Markovian process in the length (time) scales22.
In this work, we investigate the etching process as a
stochastic process. We measure the intermittency exponents
of height structure function, roughness, roughness exponents
and Kramers-Moyal‘s (KM) coefficients. Indeed we consider
the etching time t, as an external parameter, to control the sta-
tistical properties of a rough surface and find their variations
with t. It is shown that the first and second KM‘s coefficients
have well-defined values, while the third and fourth order co-
efficients tend to zero. The first and second KM‘s coefficients
for the fluctuations of h(x), enables us to explain the height
fluctuation of the etched glass surface.
II. EXPERIMENTAL
We started with glass microscope slides as a sample. Only
one side of samples was etched by HF solution for different
etching time (less than 20 minutes). HF concentration was
%40 for all the experiments. The surface topography of the
etched glass samples in the scale (< 5µm) was obtained us-
ing an AFM (Park Scientific Instruments). The images in
this scale were collected in a constant force mode and digi-
tized into 256×256 pixels. A commercial standard pyramidal
Si3N4 tip was used. A variety of scans, each with size L,
were recorded at random locations on the surface. Figure 1
shows typical AFM image with resolutions of about 20nm.
III. STATISTICAL QUANTITIES
A. Multifractal Analysis and the Intermittency Exponent
Assuming statistical translational invariance, the structure
functions Sq(l) =< |h(x + l) − h(x)|q >, (moments of the
increment of the rough surface height fluctuation h(x)) will
depend only on the space deference of heights l, and has a
power law behavior if the process has the scaling property:
Sq(l) =< |h(x + l)− h(x)|q >∝ Sq(L0)(
)ξ(q) (1)
where L0 is the fixed largest length scale of the system,
< ··· > denotes statistical average (for non-overlapping incre-
ments of length l), q is the order of the moment (we take here
q > 0), and ξ(q) is the exponents of structure function. The
http://arxiv.org/abs/0704.1030v1
FIG. 1: AFM surface image of etched glass film with size 5× 5µm2
after 12 minutes.
second moment is linked to the slope β of the Fourier power
spectrum: β = 1 + ξ2. The main property of a multifractal
processes is that it is characterized by a non-linear ξq function
verses q. Monofractals are the generic result of this linear be-
havior. For instance, for Brownian motion (Bm) ξq = q/2,
and for fractional Brownian motion (fBm) ξq ∝ q.
B. Roughness and Roughness Exponents
It is also known that to derive the quantitative information
of the surface morphology one may consider a sample of size
L and define the mean height of growing film h and its vari-
ance, σ by:
σ(L, t) = (〈(h − h)2〉)1/2 (2)
where t is etching time and 〈· · · 〉 denotes an averaging over
different samples, respectively. Moreover, etching time is a
factor which can apply to control the surface roughness of thin
films.
Let us now calculate also the roughness exponent of the
etched glass. Starting from a flat interface (one of the possible
initial conditions), it is conjectured that a scaling of space by
factor b and of time by factor bz (z is the dynamical scaling
exponent), rescales the variance, σ by factor bχ as follows1:
σ(bL, bzt) = bασ(L, t) (3)
which implies that
σ(L, t) = Lαf(t/Lz). (4)
If for large t and fixed L (x = t/Lz → ∞) σ saturate. How-
ever, for fixed large L and t ≪ Lz , one expects that corre-
lations of the height fluctuations are set up only within a dis-
tance t1/z and thus must be independent of L. This implies
that for x ≪ 1, f(x) ∼ xβ with β = α/z. Thus dynamic
scaling postulates that
σ(L, t) ∝
tβ , t≪ Lz;
Lα, t≫ Lz.
The roughness exponent α and the dynamic exponent β char-
acterize the self-affine geometry of the surface and its dynam-
ics, respectively.
The common procedure to measure the roughness exponent
of a rough surface is use of the surface structure function de-
pending on the length scale l which is defined as:
S2(l) = 〈|h(x + l)− h(x)|2〉. (6)
It is equivalent to the statistics of height-height correlation
function C(l) for stationary surfaces, i.e. S2(l) = 2σ2(1 −
C(l)). The second order structure function S(l), scales with l
as l2α1.
C. The Markov Nature of Height Fluctuations: Drift and
Diffusion Coefficients
We check whether the data of height fluctuations follow a
Markov chain and, if so, measure the Markov length scale
lM . As is well-known, a given process with a degree of
randomness or stochasticity may have a finite or an infinite
Markov length scale23. The Markov length scale is the min-
imum length interval over which the data can be considered
as a Markov process. To determine the Markov length scale
lM , we note that a complete characterization of the statisti-
cal properties of random fluctuations of a quantity h in terms
of a parameter x requires evaluation of the joint PDF, i.e.
PN (h1, x1; ....;hN , xN ), for any arbitrary N . If the process
is a Markov process (a process without memory), an im-
portant simplification arises. For this type of process, PN
can be generated by a product of the conditional probabili-
ties P (hi+1, xi+1|hi, xi), for i = 1, ..., N − 1. As a nec-
essary condition for being a Markov process, the Chapman-
Kolmogorov equation,
P (h2, x2|h1, x1) =
d(hi)P (h2, x2|hi, xi)P (hi, xi|h1, x1) (7)
should hold for any value of xi, in the interval x2 < xi <
The simplest way to determine lM for homogeneous sur-
face is the numerical calculation of the quantity, S =
|P (h2, x2|h1, x1)−
dh3P (h2, x2|h3, x3)P (h3, x3|h1, x1)|,
for given h1 and h2, in terms of, for example, x3 − x1
and considering the possible errors in estimating S. Then,
lM = x3 − x1 for that value of x3 − x1 such that, S = 0
It is well-known, the Chapman-Kolmogorov equation
yields an evolution equation for the change of the distribu-
tion function P (h, x) across the scales x. The Chapman-
Kolmogorov equation formulated in differential form yields a
l (nm)
50 100 150 200
FIG. 2: Scaling of the structure functions in log-log plot for mo-
ments less than 8. (from bottom to top).
master equation, which can take the form of a Fokker-Planck
equation22,23:
P (h, x) = [−
D(1)(h, x) +
D(2)(h, x)]P (h, x).(8)
The drift and diffusion coefficients D(1)(h, r), D(2)(h, r) can
be estimated directly from the data and the moments M (k) of
the conditional probability distributions:
D(k)(h, x) =
limr→0M
M (k) =
dh′(h′ − h)kP (h′, x+ r|h, x). (9)
The coefficients D(k)(h, x)‘s are known as Kramers-Moyal
coefficients. According to Pawula‘s theorem22, the Kramers-
Moyal expansion stops after the second term, provided that
the fourth order coefficient D(4)(h, x) vanishes22. The forth
order coefficients D(4) in our analysis was found to be about
D(4) ≃ 10−4D(2). In this approximation, we can ignore the
coefficients D(n) for n ≥ 3. We note that this Fokker-Planck
equation is equivalent to the following Langevin equation (us-
ing the Ito interpretation)22:
h(x) = D(1)(h, x) +
D(2)(h, x)f(x) (10)
where f(x) is a random force, zero mean with gaussian statis-
tics, δ-correlated in x, i.e. 〈f(x)f(x′)〉 = 2δ(x−x′). Further-
more, with this last expression, it becomes clear that we are
able to separate the deterministic and the noisy components
of the surface height fluctuations in terms of the coefficients
D(1) and D(2).
2 4 6 8 10
FIG. 3: The results of scaling exponent ξq which is clearly linear vs.
l (nm)
500 1000
6 min
8 min
10 min
12 min
15 min
FIG. 4: Log-Log plot of selection structure function of the etched
glass surfaces.
IV. RESULTS AND DISCUSSION
Now, using the introduced statistical parameters in the pre-
vious sections, it is possible to obtain some quantitative infor-
mation about the effect of etching time on surface topography
of the glass surface. To study the effect of the etching time on
the surface statistical characteristics, we have utilized AFM
imaging technique in order to obtain microstructural data of
the etched glass surfaces at the different etching time in the
HF. Figure 1 shows the AFM image of etched glass after 12
minuets etched. To investigate the scaling behavior of the mo-
ments of δhl = h(x + l) − h(x), we consider the samples
that they reached to the stationary state. This means that their
statistical properties do not change with time. In our case the
-0.2 0 0.2
-0.08
-0.06
-0.04
-0.02
2 min
6 min
8 min
10 min
12 min
15 min
FIG. 5: Drift coefficients of the surfaces at different etching time
less than 20 minutes.
samples with etching time more than 20 minutes are almost
stationary. Figure 2 shows the log-log plot of the structure
functions verses length scale l for different orders of moments.
The straight lines show that the moments of order q have the
scaling behavior. We have checked the scaling relation up to
moment q = 10. The resulting intermittency exponent ξq is
shown in figure 3. It is evident that ξq has a linear behavior.
This means that the height fluctuations are mono-fractal be-
havior. We also directly estimated the scaling exponent of the
linear term lqH/ < (h(x + l)− h(x))q > and obtain the fol-
lowing values for the samples with 20 minuets etching time,
ξ1 = 0.70± 0.04 and ξ2 = 1.40 ± 0.04. This means etching
memorize fractal feature during etching. Therefore using the
scaling exponent ξ2 we obtain the roughness exponent α as
ξ2/2 = 0.70± 0.04. Figure 4 presents the structure function
S(l) of the surface at the different etching time, using equation
(6). It is also possible to evaluate the grain size dependence
to the etching time, using the correlation length achieved by
the structure function represented in figure 4. The correlation
lengths increase with etching time. Its value has a exponen-
tial behavior 448(1 − exp(−0.15t))nm. Also we find that
the dynamical exponent is given by β = 0.6 ± 0.1. Also
we measured the variation of the Markov length with etching
time t (min), and obtain lM = 40 + 3t (nm) for time scales
t < 20min.
Finally to obtain the stochastic equation of the height fluc-
tuations behavior of the surface, we need to measure the
Keramer- Moyal Coefficients. In our analysis the forth order
coefficients D(4) is less than Second order coefficients, D(2),
about D(4) ≃ 10−4D(2). In this approximation, we ignore the
coefficients D(n) for n ≥ 3. So, to discuss the surfaces it just
needs to measure the drift coefficient D(1)(h
) and diffusion
coefficient D(2)(h
) using Eq. (9). Figures 5 and 6 show the
drift coefficient D(1)(h
) and diffusion coefficients D(2)(h
for the surfaces at the different etching time, respectively. It
-0.2 0 0.2
0.001
0.002
0.003
0.004 2 min
6 min
8 min
10 min
12 min
15 min
FIG. 6: Diffused coefficients of the surface at different etching time
less than 20 minutes.
can be shown that the drift and diffusion coefficients have the
following behavior,
D(1)(
, t) = −f (1)(t)
D(2)(
, t) = f (2)(t)(
)2 (12)
The two coefficients f (1)(t) and f (2)(t) increase with the
then is saturated. Using the data analysis we obtain that
they are linear verses time (min): f (1)(t) = 0.005t and
f (2)(t) = 0.0003t for time scales t < 20 min. To better
comparing the parameter of samples we divided the heights
to their variances. In this case, maximum and minimum of
heights are about plus 1 and mines 1, respectively. Compar-
ing samples with etching times 2 and 6 minutes, shows f (1)
increases 300 percent after 4 minutes (from 2 min to 6 min)
from f (1)(t = 2×60) = 0.6 to f (1)(t = 6×60) = 1.8. Also,
f (2) is 0.006 and 0.018 after 2 and 6 minutes, respectively.
V. CONCLUSIONS
We have investigated the role of etching time, as an exter-
nal parameter, to control the statistical properties of a rough
surface. We have shown that in the saturate state the struc-
ture of topography has fractal feature with fractal dimension
Df = 1.30. In addition, Langevin characterization of the
etched surfaces enable us to regenerate the rough surfaces
grown at the different etching time, with the same statistical
properties in the considered scales15.
VI. ACKNOWLEDGMENT
We would like to thank S. M. Mahdavi for his useful com-
ments and discussions and Also P. Kaghazchi and M. Shirazi
for samples preparation.
1 A.L. Barabasi and H.E. Stanley, Fractal Concepts in Surface
Growth (Cambridge University Press, New York, 1995).
2 S. Davies, P. Hall , J. Roy. Stat. Soc. B 61 (1999) 3.
3 A. G. Peressadko, N. Hosoda, and B. N. J. Persson, PRL 95,
124301 (2005), B N J Persson, O Albohr, U Tartaglino, A I Volok-
itin and E Tosatti, J. Phys.: Condens. Matter 17 (2005) R1R62.
4 Zhao Y-P,Wang L S and Yu T X, J. Adhes. Sci. Technol. 17 519
(2003)
5 Won Ick Jang, Chang Auck Choi, Myung Lae Lee, Chi Hoon Jun
and Youn Tae Kim, J. Micromech. Microeng. 12 (2002) 297306.
6 M. Bu , T. Melvin a, G. J. Ensell, J. S. Wilkinson, A. G.R. Evans,
Sensors and Actuators A 115:pp. 476-482 (2004).
7 Yu-Cheng Lin, Hsiao-Ching Ho, Chien-Kai Tseng and Shao-Qin
Hou 2001 J. Micromech. Microeng. 11 189-194
8 D.M. Knotter, J. Am. Chem. Soc. 122 (2000) 4345.
9 G.A.C.M. Spierings, J. Mater. Sci. 28 (1993) 6261.
10 R. Schuitema, et al, Light scattering at rough interfaces of
thin film solar cells to improve the efficiency and stability,
IEEE/ProRISC99, pp 399-404 (1999).
11 L. B. Glebov, et al, Photo induced chemical etching of silicate and
borosilicate glasses, Glasstech. Ber. Glass Sci. Technol. 75 C2 pp
298 - 301 (2002).
12 G. R. Jafari, S. M. Mahdavi, A. Iraji zad, and P. Kaghazchi, Sur-
face And Interface Analysis; 37: 641 645 (2005).
13 N. Silikas, k.E.R. England, D.C Wattes, K.D Jandt, J. Dentistry
27 (1999) 137.
14 A. Irajizad, G. Kavei, M. Reza Rahimi Tabar, and S.M. Vaez Al-
laei, J. Phys.: Condens. Matter 15, 1889 (2003).
15 G.R. Jafari, S.M. Fazeli, F. Ghasemi, S.M. Vaez Allaei, M. Reza
Rahimi Tabar, A. Irajizad, and G. Kavei, Phys. Rev. Lett. 91,
226101 (2003).
16 M. Waechter, F. Riess, Th. Schimmel, U. Wendt and J. Peinke,
Eur. Phys. J. B 41, 259-277 (2004).
17 P. Sangpour, G. R. Jafari, O. Akhavan, A.Z. Moshfegh, and M.
Reza Rahimi Tabar, Phys. Rev. B 71, 155423 (2005).
18 Christoph Renner, Joachim Peinke, and Rudolf Friedrich, Journal
of Fluid Mechanics, 433:383409, 2001.
19 Ch. Renner, J. Peinke, R. Friedrich, Physica A 298, 499 (2001).
20 F. Ghasemi, A. Bahraminasab, S. Rahvar, and M. Reza Rahimi
Tabar, Preprint arxiv:astro-phy/0312227, 2003.
21 F. Ghasemi, J. Peinke, M. Sahimi and M. Reza Rahimi Tabar, Eur.
Phys. J. B 47, 411(2005)
22 H. Risken, The Fokker-Planck equation (Springer, Berlin, 1984).
23 R. Friedrich, J. Zeller, and J. Peinke, Europhysics Letters 41, 153
(1998).
http://arxiv.org/abs/astro-phy/0312227
ABSTRACT
  The effect of etching time scale of glass surface on its statistical
properties has been studied using atomic force microscopy technique. We have
characterized the complexity of the height fluctuation of a etched surface by
the stochastic parameters such as intermittency exponents, roughness, roughness
exponents, drift and diffusion coefficients and find their variations in terms
of the etching time.

<|endoftext|><|startoftext|>
Introduction. pp elastic scattering is one of the most fundamental reactions in
particle-nuclear physics and is described in transition amplitudes by use of helicity of
initial and final states. Requiring that the interaction is invariant under space inversion,
time reversal and rotation in spin space, pp scattering in a given spin state is described
in five independent transition amplitudes (φi, i = 1− 5) as functions of the center-of-
mass energy squared, s, and t [1]. The understanding of these amplitudes would provide
crucial guidelines to investigate the reaction mechanism.
Each transition amplitude is described as a sum of the hadronic amplitude (φ hadi ) and
the electro-magnetic amplitude (φ emi ). In the small −t region, φ emi and φ hadi become
similar in strength and interfere with each other. We call this interference the Coulomb
Nuclear Interference (CNI). Thanks to the great successes of QED and the past pre-
http://arxiv.org/abs/0704.1031v1
RHIC proton beam
Forward scattered 
proton
proton target 
recoil proton recoil proton 
measure!measure!
( ) 0Tm2ppt Rp2inout <-=-=
-t (GeV/c)
-310 -210
-0.01
FIGURE 1. Left: Example of "parallel" case, p↑p↑ → pp. Right: εNN , εN and εb at
s = 13.7 GeV as
a function of −t.
cisely measured quantities (ex. magnetic moment), φ emi is precisely described. On the
other hand, φ hadi is not fully described by theory, because the perturbative QCD is not
applicable in the CNI region. By use of the experimental data of total and differen-
tial cross-section of unpolarized pp elastic scattering, we can determine the sum of two
non-spin-flip hadronic amplitudes (φ had+ = φ
3 ) [3]. In order to approach to the
hadronic single and double spin-flip amplitudes (φ had5 and φ
2 ), we measure AN and
ANN in the CNI region. AN is defined by the asymmetry of cross-section with up-down
transverse polarization for one of the protons. Similarly ANN is defined by the asymme-
try of cross-section for parallel and anti-parallel transverse polarization for both of the
protons. The left side of Fig. 1 depicts "parallel" case. We define the scattering plane
from 3-momenta of incident and recoil particles, which is normal to the spin directions.
By use of transition amplitudes, AN is expressed as,
−Im[φ em5 (s, t)φ
+ (s, t)+φ
5 (s, t)φ
+ (s, t)]
|φ+(s, t)|2
. (1)
The first term of Eq. 1 is calculable and has a peak around −t ≃ 0.003 (GeV/c)2 [2]
which is generated by proton’s anomalous magnetic moment. Because the presence of
φ had5 introduces a deviation in shape and magnitude from the first term, a measurement
of AN in the CNI region, therefore, can be a sensitive probe for φ had5 .
ANN is expressed as,
ANN ≈
2|φ had5 (s, t)|
2+Re[(φ+(s, t))∗φ had2 (s, t)]
|φ+(s, t)|2
. (2)
Because the first term is 2nd order of φ had5 and the second term is 1st order of φ
2 , ANN
is sensitive to φ had2 [4]. From a consequence of angular momentum conservation at small
−t and large
s, we use φ had4 ∝ t → 0 for these expressions.
Experiment. Experiment has been performed using a polarized hydrogen gas jet
target and polarized RHIC proton beam at 24 GeV/c (
s = 6.7 GeV) and 100 GeV/c
s = 13.7 GeV). We detect the recoil protons by silicon detectors which are located on
both sides of the target. The details of experimental setup are described in [5, 6].
In the pp elastic scattering process, both forward-scattered particle and recoil particle
are protons and there are no other particles involved nor new particles produced in the
process. Since initial states are well defined, the elastic process can be, in principle,
identified by detecting the recoil particle only. By measuring kinetic energy TR, time of
flight, and recoil angle of recoil particle, we measure the mass of recoil particle and all
the forward scattered rest particles. We collected 4.3 M events at
s = 13.7 GeV and
0.8 M events at
s = 6.7 GeV, respectively. The details of event selection is described
in [5].
The selected event yield is sorted by −t bins, which is obtained measuring the kinetic
energy of the recoil particle: −t = 2mpTR, and spin states. mp is the proton mass. Then
we calculate two types of single spin raw asymmetries, εN for the target spin state and
εb for the beam spin state. We also calculate double spin raw asymmetry, εNN for the
target and beam spin states.
The right side of Fig. 1 displays these raw asymmetries of
s = 13.7 GeV data as a
function of −t in the region 0.001 ≤ −t ≤ 0.035 (GeV/c)2 (0.5 ≤ TR ≤ 17 MeV).
The polarized gaseous proton target allowed us to achieve the measurement in the CNI
region for the first time. In order to cancel out the asymmetries of up-down luminosity,
and detector acceptance, we employed so-called "square root formula" for εN and εb
calculation. On the other hand, εNN needs to be corrected by the luminosity asymmetry.
The target spin flips every 5 minutes and the density of both spin states are the same
and stable during the experimental period (∼ 90 hours). The beam intensity , which is
measured by the wall current monitor [7], varies by bunch (every 106 nsec in 2004)
and fill (every several hours). By accumulating intensity for the experimental period, the
variation is compensated. Therefore the luminosity asymmetry is quite small compared
to statistical error of εNN .
Results and discussions. AN is measured normalizing εN by well measured the target
polarization Pt [6],
. (3)
Utilizing the measured AN , we also measure the beam polarization, Pb = εb/AN 1.
Normalizing εNN by Pt and Pb, ANN is obtained via
ANN =
. (4)
The left and right plots of Fig. 2 display the results of AN and ANN at
s = 6.7 GeV
with filled circles and 13.7 GeV with open circles, respectively. The errors on the data
1 This experimental setup also plays an important role in the RHIC spin program to measure the absolute
beam polarization [8].
-t (GeV/c)
-310 -210
=6.7 GeVs
=13.7 GeVs
 = 6.7 GeVsNo had. func. 
 = 13.7 GeVsNo had. func. 
-t (GeV/c)
-310 -210
-0.06
-0.04
-0.02
=6.7 GeVs
=13.7 GeVs
FIGURE 2. The left and right plots display the results of AN and ANN at
s = 6.7 GeV with filled
circles and 13.7 GeV with open circles, respectively. The errors on the data points are statistical. The
lower bands represents the total systematic errors. The solid and dashed lines in the left plots correspond
to the first term in Eq. 1 for these
s, respectively.
points are statistical. The lower bands represent the total systematic errors. The solid and
dashed lines correspond to the first term in Eq. 1 for these
s, respectively.
AN at
s = 13.7 GeV are consistent with the dashed line (χ2/ndf=13.4/14). On the
other hand, although the accuracy is statistically limited, AN at
s = 6.7 GeV are not
consistent with the solid line (χ2/ndf=35.5/9) and this discrepancy implies the presence
of φ had5 . ANN for these
s have no clear −t dependence and the average values are
consistent with zero within 1.5 σ .
In summary, measurements of AN and ANN provide experimental knowledge to poorly
known φ had5 and φ
2 . The
s dependence of φ had5 is provided by AN results and
the theoretical interpretation is under way [9]. However, there is no comprehensive
understanding of φ had2 and φ
5 yet. Further measurements at different
s are required
to fully describe the behavior of φ had2 and φ
REFERENCES
1. M. Jacob and G.C. Wick , Annals Phys. 7, 404 (1959).
2. N.H. Buttimore et al., Phys. Rev. D 59, 114010 (1999).
3. M.M. Block and R.N. Cahn, Czech. J. Phys. 40, 164-175 (1990).
4. T.L. Trueman, RHIC Spin Note, September 27, (2005), hep-ph/0604153.
5. H. Okada et al., Phys. Lett. B 638, 450 (2006); H. Okada, Doctororal thesis, July (2006);
http://www.star.bnl.gov/∼hiromi/HiromiOkadaThesis.pdf
6. A. Zelenski et al., Nucl. Inst. and Meth. A 536, 248 (2005);
T. Wise et al., Nucl. Inst. and Meth. A 559, 1 (2006).
7. P.R. Cameron et al., Nucl.Instrum.Meth. A345, 226-229 (1994).
8. K.O. Eyser, these proceedings (2006).
9. Private discussion with L. Trueman.
http://arxiv.org/abs/hep-ph/0604153
http://www.star.bnl.gov/~hiromi/HiromiOkadaThesis.pdf
ABSTRACT
  Precise measurements of the single spin asymmetry, $A_N$ and the double spin
asymmetry, $A_{NN}$, in proton-proton (\textit{pp}) elastic scattering in the
region of four-momentum transfer squared $0.001 < -t < 0.032 ({\rm GeV}/c)^2$
have been performed using a polarized atomic hydrogen gas jet target and the
RHIC polarized proton beam at 24 GeV/$c$ and 100 GeV/$c$. The polarized gaseous
proton target allowed us to achieve the measurement of $A_{NN}$ in the CNI
region for the first time. Our results of $A_N$ and $A_{NN}$ provide
significant constraints to determine the magnitude of poorly known hadronic
single and double spin-flip amplitudes at this energy.

<|endoftext|><|startoftext|>
Introduction
Let R be a Riemann surface with the equation:
(z − λi)(∗).
We find relations that are satisfied by theta constants with rational characteristics
evaluated at τR, the period matrix of R. Special type identities for period matrices
are known in the case of a general Riemann surface ( Schottky-Jung identities).
For hyperelliptic curves there are vanishing theta constants of even characteristics
that characterize the associated period matrix. According to Mumford, [Mu] spe-
cial relations of non vanishing of theta constants evaluated at period matrices of
hyperelliptic curves were obtained by Frobenius.
The original Schottky problem seeks special relations among theta constants that
characterize the entire moduli space of algebraic curves of genus g. In this note we
seek special relations that are satisfied by n-sheeted cyclic covers of the sphere.
When n = 2 cyclic covers are just hyperelliptic curves. The next case is n = 3 and
we find relations between theta constants with rational characteristics evaluated at
τR the period matrices of such curves .
These identities are a result of Thomae formula for cyclic n sheeted covers of
the sphere. This formula expresses powers of such theta constants evaluated at the
period matrix τR through polynomial expression of λi. A relation between these
polynomials produces a relation between associated theta constants. Applying the
representation theory of the symmetric group, S3m we produce a basis for the vector
space spanned by the polynomials and as a result relations between the associated
theta constants.
For the simplest case of 6 branch points our results overlap with results of Mat-
sumoto [Ma]. In his paper Matsumoto finds the explicit action of S6 on theta
Date: 04 April 2007.
http://arxiv.org/abs/0704.1032v1
2 KOPELIOVICH
constants evaluated at τR and expresses branch points λi as rational functions of
theta constants. As a result he writes identities between cubic powers of these
constants which essentially coincide with the identities obtained by us in the last
section of our note. Using the representation theory of S6 we see that the space
generated by theta constants is 5 dimensional. This seems to be a new result even
in this case. We note that the Algebraic dimension of this particular family of
curves is 3.
This work was partially done during a visit to the TAMU math department and
the author thanks the department for the invitation and kind hospitality. I thank
Samuel Grushevsky and Mike Fried for constructive remarks on this note.
2. Thomae formula for cyclic covers and relations between theta
constants
We explain the general Thomae formula following [Na] for an algebraic curve R
given by the equation:
(z − λi)(∗)
We denote f : R 7→ CP1 the projection (z, y) 7→ z. Define Qi = f
−1(λi), to be
the unique branch point on R that is the pre image of λi. Fix a homology basis
a1, a2...a3m−2, b1, b2, ..., b3m−2 on R such that the intersections are aiaj = 0 = bibj
and aibj = 1. Let v1...v3m−2 be a basis of standard holomorphic differentials dual
to the basis a1, a2...a3m−2, b1, b2, ..., b3m−2 i.e.
vj = 0,
vj = δij . Now fix an
ordering of λi. Let φ be the automorphism of order 3 defined by (z, y) 7→ (z, ωy) for
ω3 = 1.We write α ≡ β for linear equivalent of divisors, i.e. if there exists a function
g : R 7→ CP 1 and div(g) = α − β. The group Div0/ ≡ is Jac(R), the Jacobian of
R. (Div0 - divisors of degree 0.) Let ψ be the mapping ψ : Div 7→ Div/ ≡ . Then
the following lemma is true:
Lemma 2.1. Let P1, P2 ∈ R,P1 6= P2 and
Di = Pi + φ(Pi) + φ
2(Pi), i = 1, 2
then ψ(D1) ≡ ψ(D2).
Proof. Let f1(P ) =
f(P )−f(P1)
f(P )−f(P2)
, then div(f1) = D1 −D2. �
Define D = ψ
P + φ(Pi) + φ
2(Pi)
as the equivalence class in the Jacobian.
Lemma 2.2. Let K be the canonical divisor of R Then the following holds:
D ≡ 3Qi ≡ ∞1 +∞2 +∞3
K ≡ (2m− 2)D
Qi ≡ mD
Proof. The first item follows exactly as in the previous lemma. To show the rest,
note that z dz
is a holomorphic differential with the divisor Q6m−61 . �
THETA CONSTANTS FOR 3-SHEETED COVERS OF THE SPHERE 3
Now let Λ = {Λ1,Λ2,Λ3} be a partition of {1, 2, 3, 4, 5, ...3m} with |Λi| = m
for i = 1, 2, 3. We are interested in the following divisor eΛ associated with the
partition:
eΛ = XΛ1 + 2XΛ2 −D −∆
where for each subset S of {1, 2...3m} we set
Fix a point P0 ∈ R and let ΦP0 : R→ Jac(R) be given by ΦP0 (P ) =
v1...
v3m−2
Definition 2.3. Let Hg denote the set of g × g symmetric matrices, τ such that
the imaginary part of τ is positive definite. For ε, ε′,∈ Rg and τ ∈ Hg we denote
(τ) =
lεZ2g
exp 2πi
)t ε′
This series is uniformly and absolutely convergent on compact subsets of Cg×Hg.
To each w ∈ C3m−2 associate a unique w1, w2 ∈ R
g such that w = w1 + τw2.
[Na] proves the following formula for theta constants with characteristics associ-
ated to divisors eΛ. see [BR] as well:
Theorem 2.4. The divisor eΛ is a point of order 6 on the Jacobian and
(1) θ[eΛ]
6 (τR) = CΛ(detA)
3((Λ0Λ0)(Λ1Λ1)(Λ2Λ2))
(Λ0Λ1)(Λ1Λ2)(Λ0Λ2)
Here A is the matrix of certain differentials integrated with respect to ai. and if
Λi = {i1 < ... < im} ,Λj = {j1 < ... < jm}
(ΛiΛi) =
(λik − λil) , (ΛiΛj) =
k=1,l=1
(λik − λjl)
We apply the theorem to generate special relations between theta functions
with characteristics eΛ, evaluated at τR. For each partition Λ denote the poly-
nomial on the right hand side of the last equation by pΛ. To obtain identities
for θ[eΛ] we search for identities between pΛ. The key observation that allows us
to simplify the problem is the following form of the polynomials: choose Λ =
{{1, 2...,m}, {m+ 1, ..., 2m}, {2m, ..., 3m}}. Then by definition of pΛ the factor
ΛiΛj is the discriminant and a common factor for each pΛ which does not
depend on the partition Λ. Thus identities between θ6[eΛ] are equivalent to identi-
ties between the polynomials
((Λ0Λ0)(Λ1Λ1)(Λ2Λ2))
Consequently, identities between
θ6[eΛ] are equivalent to identities between the
polynomials:
((Λ0Λ0)(Λ1Λ1)(Λ2Λ2)) .
To get a hint for the result observe that the group S3m acts naturally on the
polynomials ((Λ0Λ0)(Λ1Λ1)(Λ2Λ2)) via its action on the partitions of {1...3m} .
Thus Span(((Λ0Λ0)(Λ1Λ1)(Λ2Λ2)) , is a vector space and has a representation of
S3m on it.
4 KOPELIOVICH
3. Explicit Basis
In this section we provide an explicit basis for the space of polynomials from the
previous section. We imitate the process described in [J] to construct a basis for
the irreducible representation of the symmetric group of Sn. For complex numbers
these representations are completely classified. We describe the construction for
any representation of the symmetric group and obtain the relevant case of cyclic
covers as an immediate corollary of the general case. We remind the reader some
facts from the representation theory of Sn.
Let n be a natural number and let k1...km be a partition of n. i.e.
i=1 ki = n
and k1 ≥ k2 ≥ k3... ≥ km.
Definition 3.1. A Young diagram associated to a partition consists of m rows
such that i’th row has ki elements.
Definition 3.2. Let Y be a Young diagram; a tableau is obtained by distributing
the numbers {1...n} within the m rows with the following properties
• Each row contains exactly ki elements
• The numbers in each row form an increasing sequence
Assume that Λ = {Λ0, ...,Λk} is a tableau of n. Define the polynomial:
(ΛiΛi) =
ik<il,{ik,il}∈Λi
(λik − λil ), where pΛ =
(ΛiΛi).
The symmetric group, Sn acts on Λ and therefore acts on the polynomials pΛ. To
find the basis for pΛ we use a modification of Garnier relation [J] (7.1) to construct
a basis for the polynomials.1 Arrange the tableau in columns ( i.e. the first column
will be elements of Λ1 the second column elements of Λ2 etc). Overall we have k
columns for Λ. Let X be a subset of the i − th column of Λ and Y is a subset of
the i+1− th column of Λ. Let σ1...σk be coset representatives for SX×Y in SX
Then we have the Garnier relations:
Theorem 3.3. Let µi denote the number of elements in the i− th column of Λ. if
Y | > µi then
sign(σm) (pσmΛ) = 0.
Proof. If |X
Y | > µi, by the pigeon hole principle there exists an involution δ
such that σmΛ is invariant under it. Thus
sign(σm) (pσmΛ) =
sign(σm) (pδσmΛ) =
signσm (pσmΛ) = 0
In order to exhibit an explicit basis we define a standard Young tableau
1We were not able to find a reference to our approach of constructing Specht modules though
we are confident its a folklore.
THETA CONSTANTS FOR 3-SHEETED COVERS OF THE SPHERE 5
Definition 3.4. A standard tableau is a tableau where the rows and the columns
are arranged in an increasing order.
Definition 3.5. We define an ordering on the set of tableaux by setting Λ1 < Λ2
if there is an i such that
• if j > i than j is in the same column of Λ1,Λ2
• i is in more left column in Λ1 than Λ2.
Theorem 3.6. Let Λ1...Λk be the collection of standard tableaux for a given parti-
tion. Then pΛ1 ...pΛk is a basis for the vector space spanned by Λ.
Proof. We follow [J] in the proof. We show that pΛk spans any other polynomial
corresponding to our partition. Let t be a tableau and suppose by induction that
the theorem is proved for each t1 tableau such that t1 < t. If t is non standard
there exists adjacent columns a1 < ... < aq < ... < ar and b1 < b2... < bq < ...bs
such that aq > bq. Apply Garnier relation for X = a1...ar, Y = b1...bq. For each σ
a representative in SX
Y in SX×Y we have that [tσ] < t by the definition of the
order < . The result follows immediately from the induction hypothesis. �
Definition 3.7. For an element k of the tableau t Let Ck, Rk be the unique column
and row k belongs to. The hook of k, hk is the number of elements beneath k in
Ck plus the number of elements to the right of k in Rk (include the element itself
in the row but not in the column.)
It is well known that the number of standard tableaux equals to
See [J].
4. The ideal of theta identities
We apply the theory of the previous paragraph to cyclic covers of order 3. Ac-
cording to the theory, the hooks of the partitions correspond to tableau with 3 rows
and m elements in each row. Our first corollary is
Corollary 4.1. The dimension of the polynomials pΛ (and hence the vector space
spanned by
θ6[eΛ] (τR) corresponding to them) is:
(3m)!×2
(m+2)!(m+1)!m!
Hence we can also give a basis for θ6[eΛ] (τR) that correspond to the different
partitions eΛ.
Corollary 4.2. The set of
θ6[eΛS ] (τR), ΛS is a standard partition is a basis for
a vector space spanned by
θ6[eΛ] (τR). In particular each
θ6[eΛ] (τR), Λ can be
written as a linear combination of elements from the set
θ6[eΛS ] (τR).
5. Example
Let us revisit the case when there are 6 branch points and the genus of the
surface is 4. In this case, by the formula for the dimension, the number of basis
functions, θ3[eΛ] is : 2 ×
4!3!2!
= 5. We enumerate the 15 partitions as well as the
the polynomials that correspond to them:
(1) Λ = {(1, 2), (3, 4), (5, 6)}pΛ = (λ1 − λ2)(λ3 − λ4)(λ5 − λ6)
(2) Λ = {(1, 2), (3, 5), (4, 6)}pΛ = (λ1 − λ2)(λ3 − λ5)(λ4 − λ6)
6 KOPELIOVICH
(3) Λ = {(1, 2), (3, 6), (4, 5)}pΛ = (λ1 − λ2)(λ3 − λ6)(λ4 − λ5)
(4) Λ = {(1, 3), (2, 4), (5, 6)}pΛ = (λ1 − λ3)(λ2 − λ4)(λ5 − λ6)
(5) Λ = {(1, 3), (2, 5), (4, 6)}pΛ = (λ1 − λ3)(λ2 − λ5)(λ4 − λ6)
(6) Λ = {(1, 3), (2, 6), (4, 5)}pΛ = (λ1 − λ3)(λ2 − λ6)(λ4 − λ5)
(7) Λ = {(1, 4), (2, 5), (3, 6)}pΛ = (λ1 − λ4)(λ2 − λ5)(λ3 − λ6)
(8) Λ = {(1, 4), (2, 6), (3, 5)}pΛ = (λ1 − λ4)(λ2 − λ6)(λ3 − λ5)
(9) Λ = {(1, 4), (2, 3), (5, 6)}pΛ = (λ1 − λ4)(λ2 − λ3)(λ5 − λ6)
(10) Λ = {(1, 5), (2, 3), (4, 6)}pΛ = (λ1 − λ5)(λ2 − λ3)(λ4 − λ6)
(11) Λ = {(1, 5), (2, 4), (3, 6)}pΛ = (λ1 − λ5)(λ2 − λ4)(λ3 − λ6)
(12) Λ = {(1, 5), (2, 6), (3, 4)}pΛ = (λ1 − λ5)(λ2 − λ6)(λ3 − λ4)
(13) Λ = {(1, 6), (2, 3), (4, 5)}pΛ = (λ1 − λ6)(λ2 − λ3)(λ4 − λ5)
(14) Λ = {(1, 6), (2, 4), (3, 5)}pΛ = (λ1 − λ6)(λ2 − λ4)(λ3 − λ5)
(15) Λ = {(1, 6), (2, 5), (3, 4)}pΛ = (λ1 − λ6)(λ2 − λ5)(λ3 − λ4)
The basis for the vector space of the polynomials corresponds to the following
standard tableaux:
(1) Λ = {(1, 2), (3, 4), (5, 6)}pΛ = (λ1 − λ2)(λ3 − λ4)(λ5 − λ6)
(2) Λ = {(1, 2), (3, 5), (4, 6)}pΛ = (λ1 − λ2)(λ3 − λ5)(λ4 − λ6)
(3) Λ = {(1, 3), (2, 4), (5, 6)}pΛ = (λ1 − λ3)(λ2 − λ4)(λ5 − λ6)
(4) Λ = {(1, 3), (2, 5), (4, 6)}pΛ = (λ1 − λ3)(λ2 − λ5)(λ4 − λ6)
(5) Λ = {(1, 4), (2, 5), (3, 6)}pΛ = (λ1 − λ4)(λ2 − λ5)(λ3 − λ6)
The rest of the 10 polynomials can be rewritten as a linear combination of the set
above applying Garnier’s algorithm as in Theorem 3.7. For example we have:
(λ1−λ2)(λ3−λ6)(λ4−λ5) = −(λ1−λ2)(λ3−λ4)(λ5−λ6)+(λ1−λ2)(λ3−λ5)(λ4−λ6)
(λ1−λ3)(λ2−λ6)(λ4−λ5) = −(λ1−λ3)(λ2−λ4)(λ5−λ6)+(λ1−λ3)(λ2−λ5)(λ4−λ6)
(λ1−λ6)(λ2−λ5)(λ3−λ4) = (λ1−λ4)(λ2−λ5)(λ3−λ6)−(λ1−λ3)(λ2−λ5)(λ4−λ6)
The others polynomials can be expressed in a similar way leading to identities
between
θ6[eΛ] (τR) in this case. Let us conclude with the following remarks on
the identities above: In the hyperelliptic curve case the identities between integral
characteristics of theta functions evaluated at period matrix of hyperelliptic curves
arise from vanishing properties of theta functions. In our case it is interesting to
investigate whether an analogous situation can arise. The only source of cubic theta
identities known to the author, is the following theorem in [Ko]:
Theorem 5.1. Let
be an odd integral theta characteristics in genus 3m− 2
Then for any τ ∈ H3m−2:
0≤νi≤3
µiνiθ3
µ′ + 2ν
(0, τ) = 0
It is plausible that the vanishing of theta constants with rational characteristics
of order 3 on τR will produce a new proof for the special identities obtained in
this note using Thomae formula. Finally note that for all the identities (4) the
coefficients are ±1 It is plausible that this a general phenomenon.
THETA CONSTANTS FOR 3-SHEETED COVERS OF THE SPHERE 7
6. conclusion
There exists an extensive literature on Schottky-Jung identities and on theta
constants for hyperelliptic curves. In this note we obtained special identities for
other classes of algebraic curves. In subsequent notes we plan to pursue and develop
further the themes touched in this note, especially applications of similar methods
to general Hurwitz spaces and their mapping class groups.
References
[AK] R.Adin, Y.Kopeliovich, Short Eigenvectors and Multidimensional Theta Functions, Linear
Algebra and Appl. 257(1)(1997) 49-63
[BR] M. Bershadsky, A. Radul, Fermionic fields on Zn curves Comm. in Mathematical Phys.
116(4)(1988) 689-700
[FK1] H. Farkas, Y. Kopeliovich, New Theta Constant Identities Israel Journal of Mathematics
82(1)(1993) 133-140
[FK2] H. Farkas, Y.Kopeliovich, New Theta Constant Identities II Proceeding of
AMS.123(4)(1995) 1009-1020
[J] G.D.James The representation theory of the Symmetric Groups Lecture Notes in Math. vol.
682 (Springer Verlag 1978)
[Ko] Y. Kopeliovich, Multi Dimensional Theta Constant Identities Journal of Geometric Analysis
8 (4)(1998) 571-581
[Ma] K.Matsumoto Theta constants associated with the cyclic triple coverings of the complex
projective line branching at six points Publ. Res. Inst. Math. Sci. 37 (3) (2001) 419-440
[Mu] D. Mumford, Tata Lectures on Theta II (Progress in Mathematics, Birkhauser 1984)
[Na] A. Nakayashiki, On the Thomae formula for ZN curves Publ. Res. Inst. Math Sci. 33
(6)(1997) 987-1015
[Th] J. Thomae, Beitrag zur Bestimmung θ(0, 0...,0) durch di klassenmoduln algebraicer Funk-
tionen Crelle’s Journal 71(1870) 201-222
5736 Las Virgenes Rd. Calabasas CA 91302
	1. Introduction
	2. Thomae formula for cyclic covers and relations between theta constants
	3. Explicit Basis
	4. The ideal of theta identities
	5. Example
	6. conclusion
	References
ABSTRACT
  We find identities between theta constants with rational characteristics
evaluated at period matrix of $R,$ a cyclic 3 sheeted cover of the sphere with
$3k$ branch points $\lambda_1...\lambda_{3k}.$ These identities follow from
Thomae formula \cite{BR}. This formula expresses powers of theta constants as
polynomials in $\lambda_1...\lambda_{3k}.$ We apply the representation of the
symmetric group to find relations between the polynomials and hence between the
associated theta constants.

<|endoftext|><|startoftext|>
Topology of spaces
of equivariant symplectic embeddings
Alvaro Pelayo
Abstract
We compute the homotopy type of the space of Tn-equivariant symplectic embeddings
from the standard 2n-dimensional ball of some fixed radius into a 2n-dimensional symplectic–
toric manifold (M, σ), and use this computation to define a Z≥0-valued step function on R≥0
which is an invariant of the symplectic–toric type of (M, σ). We conclude with a discussion
of the partially equivariant case of this result.
1 The main theorem
Let (M, σ) be a 2n-dimensional symplectic manifold and write Br for the compact 2n-ball of
radius r > 0 in the complex space Cn equipped with the restriction of the standard symplectic
form σ0 of C
n. (The proofs of the results in this paper hold verbatim for the open ball.) Recently a
lot of effort has been put into understanding the topological and geometric properties of the space
of symplectic embeddings from Br into M . This question is not only intriguing, but it is also very
fundamental because it acknowledges one of the main differences that exist between Riemannian
and symplectic geometry, e.g. Gromov’s non–squeezing theorem [12].
Figure 1: An equivariant and symplectic embedding B2r → S2s with r/s =
This question, posed with such generality, has proven to be extremely difficult to answer.
Significant progress has been made by McDuff [17], [18], Biran [3], [5] and most recently by
Lalonde–Pinsonnault [14], among other authors. One of the most general results is due to McDuff;
she showed the connectedness of the space of 4-balls into 4-manifolds with non-simple Seiberg–
Witten type, in particular rational or ruled surfaces. Recall that we say that a symplectic 4-manifold
Q has simple Seiberg–Witten type or just simple type if the only non–zero Gromov invariants of Q
occur in classes A ∈ H2(Q) for which k(A) = K · A + A2 = 0. It follows from work of Taubes
http://arxiv.org/abs/0704.1033v1
and Li–Liu that the symplectic 4-manifolds with non–simple type are blow–ups of (i) rational and
ruled manifolds; (ii) manifolds with b1 = 0, b
2 = 1, like the Enriques or Barlow surface; and
(iii) manifolds with b1 = 2 and (H1(X))
2 6= 0; examples (with K = 0) are hyperelliptic surfaces,
some non–Kähler T2-bundles over T2 and quotients T2 × Σg/G where Σg is a surface of genus g
greater than 1, and G is certain finite group. See [17] for further references and examples.
McDuff’s techniques are unique to dimension 4 and do not extend at all to higher dimensions—
this is also the case in the other authors’ work—existence of J-holomorphic curves with special
homological properties is essential in their proofs. Although J-holomorphic curves exist in all
even dimensions, it is only in dimension 4 where these homological properties hold.
In the present paper we study a special case of this question: M is a symplectic–toric manifold
of arbitrary dimension, and the symplectic embeddings that we consider preserve the toric struc-
ture, see Figure 1. Precisely this means that there exists an automorphism Λ of the n-torus Tn such
that the following diagram commutes:
n × Br
Λ×f //
f // M
, (1.1)
where ψ is a fixed effective and Hamiltonian Tn-action on M and · denotes the standard action by
rotations on Br (component by component). In this case we say that f is a Λ-equivariant mapping.
(1,0)(0,0)
(1,1)(0,1)(1,1)(0,1)
(2,0)(0,0)(1,0)(0,0)
(0,1)
Figure 2: The momentum polytope of CP2 and B21 (left), of a Hirzebruch surface (center) and of
(CP1)2 (right).
The feature that makes the study of symplectic manifolds equipped with Hamiltonian torus
actions richer than the study of generic symplectic manifolds is the presence of the smooth mo-
mentum map µM : M → Lie(Tn)∗, whose image ∆M is a convex polytope (called the momentum
polytope of M , cf. Figure 2) as shown independently by Atiyah and Guillemin–Sternberg [1], [8].
Here we are identifying the Lie algebra Lie(Tn) and its dual Lie(Tn)∗ with Rn. Since this identi-
fication is not canonical, we need to specify the convention we adopt in this paper. This amounts
to choosing an epimorphism R → T1 which we take to be x 7→ e2
−1x. This epimorphism in-
duces an isomorphism between Lie(T1) and R via ∂
7→ 1/2, giving rise to a new isomorphism
Lie(Tn) → Rn, ∂
7→ 1/2 ek, by canonically identifying Lie(Tn) with the product of n copies of
Lie(T1) (see [9] for more details).
For example, under the convention of the previous paragraph, the momentum map µBr of Br
is a mapping from Br into R
n with components µBrk (z) = |zk|2, for all integer k with 1 ≤ k ≤ n.
(There are a number of different conventions used in the literature, and our choice is intended to
give the simplest formula for the momentum map of Br.) The simplest symplectic manifolds which
admit Hamiltonian effective torus actions are called symplectic–toric.
Definition 1.1 A symplectic–toric manifold M , also called a Delzant manifold, is a compact
connected symplectic manifold equipped with an effective Hamiltonian action of a torus of dimen-
sion half of the dimension of the manifold. In this case the momentum polytope ∆M is called the
Delzant polytope of M . ⊘
Symplectic–toric manifolds were classified by Delzant in [7]. In particular, he showed that the
momentum image of such a manifold under the momentum map completely determines M up to
equivariant symplectomorphisms.
The main result of this paper, Theorem 1.2 below, describes the topology of the space of equiv-
ariant embeddings of symplectic balls into a symplectic–toric manifold. We denote by χ(M) the
Euler characteristic of M .
Theorem 1.2. For every symplectic–toric 2n-manifold M there is an associated Z-valued non–
increasing step function Emb(M,σ) : R≥0 → [0, n!χ(M)] such that for each r ≥ 0 the space of
equivariant symplectic embeddings from the 2n-ball Br into M is homotopically equivalent to a
disjoint union of Emb(M,σ)(r) subspaces, each of which is homeomorphic to the n-torus T
As a matter of fact we can explicitly and easily read Emb(M,σ) from the polytope ∆
Example 1.3 Let (M, σ) equal the blow–up of S2r0×S
with r0 = 1/
2 whose Delzant polytope
has vertices at (0, 0), (2, 0), (2, 1), (1, 2) and (0, 2) (see Figure 4). Then Emb(M,σ) = 10χ[0,1) +
2χ[1,
2), where χA denotes the characteristic function of A ⊂ R. We identify the 2-sphere of
radius r equipped with the standard area form with (CP1, 4r2 ·σFS), where σFS is the Fubini–Study
form. ⊘
Proposition 1.4. The function Emb(M,σ)(r) given in Theorem 1.2 is an invariant of the symplectic–
toric type of M and is given by the formula
Emb(M,σ)(r) = n!
p∈MTn
cp(r), (1.2)
where for each fixed point p ∈M , cp(r) = 1 if the infimum of the SL(n, Z)-lengths of the edges of
∆M meeting at µM(p) is strictly greater than r2, and cp(r) = 0 otherwise.
Example 1.5 Let σFS be the Fubini–Study form on CP
n and observe that Tn acts naturally on
(CPn, λ · σFS), λ > 0, with n + 1 fixed points. The momentum polytope is a tetrahedrum with
vertices at 0 and λ ei, where the ei are the canonical basis vectors in R
n. So if M = CPn × CPm,
the space of equivariant symplectic embeddings from Br into M is homotopically equivalent to
(n+m)!(n+1)(m+1)⊔
if r <
λ, and it is empty otherwise. ⊘
The study of the space of symplectic embeddings is directly related to the study of the sym-
plectic ball packing problem cf. [4], the equivariant version of which was treated in [19].
2 Proof of Theorem 1.2
In this section we prove Theorem 1.2. For clarity, the proof is divided in three steps, which we
describe next.
We start by introducing the notation and making the following observations:
i) Throughout the proof Rwill denote the space of rotations by matrices of the form (δi τ(i)θi j)ni, j=1
with τ ∈ Sn (the symmetric group) and θi j ∈ T1, and ET
x will denote the space of equivari-
ant symplectic embeddings f from Br into M such that f(0) = p and µ
M(p) = x ∈ ∆M ,
equipped with the Cm-Whitney topology (m ≥ 0). Throughout the present section we fix f .
Since each component of R is canonically identified with the n-torus Tn (cf. Corollary 2.5),
Theorem 1.2 amounts to prove that if p ∈MTn (MTn denotes the Tn-fixed point set) is such
that µM(p) = x, then the space ETnx gets identified with R via a homotopy equivalence.
ii) Secondly let BTnr denote the space of equivariant symplectomorphisms of Br (again with
respect to the Cm-Whitney topology).
Recall that, for example, the Cm-Whitney topology on BTnr is given by the well–known norm
‖ φ ‖Cm= max
0≤k≤m
‖ Tkφ ‖M(C),
where we are taking the norm ‖ · ‖M(C) on the right–hand side of this expression to be the
canonical Euclidean norm on the space of n× n matrices with complex entries.
iii) We identify the automorphism group Aut(Tn) with the matrix group
GL(n, Z).
iv) The elements αpi ’s, 1 ≤ i ≤ n, denote the weights of the isotropy representation of Tn on
TpM ; the canonical basis vectors ei ∈ Rn represent the weights of the isotropy representa-
tion of Tn on T0Br.
Step 1: Invariance of the image f(Br).
In this step we first show how to go from smooth maps on manifolds to affine maps on polytopes
(see diagram (2.3)), and secondly we use this to show the invariance of the image f(Br) ⊂ M .
Precisely, one can think of an embedding being equivariant in the sense of commuting with the
n-action, and it is when we reparametrize the torus that Λ appears.
Lemma 2.1. Let g be any Λ-equivariant and symplectic embedding such that the normalization
condition f(0) = g(0) = p holds. Then for all z ∈ Br, if Tn · z denotes the Tn-orbit that passes
through z, the identity f(Tn · z) = g(Tn · z) holds, and therefore f(Br) = g(Br).
Proof. Let f, g be Λ-equivariant and symplectic embeddings from Br into M with f(0) = g(0) =
p. Under the identifications described in Section 1, the following diagram commutes, where the
top arrow stands for the affine map with linear part (Λt)−1, which takes 0 to x:
(Λt)−1+x
// ∆M
. (2.3)
In order to prove the commutativity of diagram (2.3), we denote by ξM the vector field induced
by the element ξ ∈ Lie(Tn) via the exponential map and note that from the definition of the
momentum maps µM and µBr , the Λ-equivariance of f , and the fact that f ∗σ = σ0, we have
the following sequence of equalities, where Tf(z)µ
M and Tzµ
Br denote, respectively, the tangent
mapping of µM at f(z) and of µBr at z,
〈TzµBr(v), ξ〉z = (σ0)z(v, ξBr(z))
= σf(z)(Tzf(v), Λ(ξ)M(f(z)))
= 〈Tf(z)µM(Tzf(v)), Λ(ξ)〉f(z)
= 〈Λt ◦ Tf(z)µM(Tzf(v)), ξ〉z, (2.4)
where z ∈ Br, v ∈ TzBr and ξ ∈ Lie(Tn). Therefore by equation (2.4) and by using the chain rule
we obtain that for all z ∈ Br and v ∈ TzBr
Br(v) = Tz(Λ
t ◦ µM ◦ f)(v). (2.5)
Considering equation (2.5), f(0) = p and µM(p) = x and composing with (Λt)−1, after integration
we obtain the commutativity condition on diagram (2.3). Notice that diagram (2.3) also holds for
the embedding g.
Then it follows from the conjunction of diagram (2.3) and diagram (1.1) that for all t ∈ Tn the
following identities hold:
µM(ψ(Λ(t), f(z))) = µM(f(t · z))
= (Λt)−1 ◦ µBr(z) + x
= µM(g(t · z)) = µM(ψ(Λ(t), g(z))). (2.6)
Expression (2.6) is clearly equivalent to µM(Tn · f(z)) = µM(Tn · g(z)), since Λ is an auto-
morphism. Now since M is symplectic–toric, by the proof of the Atiyah–Guillemin–Sternberg
convexity theorem we know that each fiber of the momentum map µM consists of a single con-
nected orbit which together with the last equality implies that Tn · f(z) = Tn · g(z). Since f(Br)
is the union of the orbit images f(Tn · z), we immediately obtain that f(Br) = g(Br), which
concludes the proof.
It is possible to explicitly describe the momentum image µM(f(Br)), and for this purpose we
recall the notion of SL(n, Z)–length: if x, y ∈ Rn, we say that a segment line [x, y] joining x to y
(0,0)
(0,1) (1,1)
(1,0)
(0,1) (1,1)
(0,0) (1,0)
Figure 3: Equivariant symplectic ball embeddings in (CP1)2 (left); the triangle on the right does
not come from such embedding.
in Rn has SL(n, Z)-length d if there exists a matrix A ∈ SL(n, Z) such that A(d e1) = y − x (d
is not defined in general, only for segments of rational slope). The Euclidean length of a segment
line agrees with its SL(n, Z)-length if and only if the segment is parallel to one of the coordinate
axes in Rn.
In their article [16], Karshon and Tolman made the following two definitions. Let (Q, σQ) be
a connected symplectic 2m-dimensional manifold with momentum map µQ for an action of an
m-torus Tm on Q, and let Γ ⊂ (Lie(Tm))∗ be an open convex subset which contains the image
of Q under the momentum map µQ. The quadruple (Q, σQ, µQ, Γ) is a proper Hamiltonian Tm-
manifold if the momentum map µQ is proper as a map to Γ.
The proper Hamiltonian Tm-manifold (Q, σ, µQ, Γ) is said to be centered about a point α ∈ Γ
if α is contained in the momentum map image of every component of QK , for each K ⊂ Tm. Here
QK := {q ∈ Q |ψQ(a, q) = q, ∀a ∈ K},
where ψQ : T
m ×Q→ Q denotes the action of Tm on Q. The following lemma is Proposition 2.8
in [16].
Lemma 2.2 (Karshon–Tolman, [16]). Let the quadruple (Q, σQ, µQ, Γ) be a proper Hamiltonian
2m-dimensional Tm-manifold. Suppose that (Q, σQ, µQ, Γ) is centered about α ∈ Γ and that the
preimage (µQ)−1({α}) consists of a single fixed point q. Then Q is equivariantly symplectomor-
phic to
{z ∈ Cm |α+
|zj |2 ηqj ∈ Γ},
where ηq1, . . . , η
m are the weights of the isotropy representation of T
m on TqQ.
We use Lemma 2.2 in order to prove the following lemma.
Lemma 2.3. The momentum image µM(f(Br)) equals the subset of R
n given by the convex hull of
x and x+ r2 αpi , where 1 ≤ i ≤ n. Furthermore, the infimum of the SL(n, Z)-lengths of the edges
of ∆M meeting at x is greater than or equal to r2, if and only if for all 0 < s < r there exists an
embedding h : Bs → M which is Λ-equivariant and symplectic, satisfying h(0) = p.
Proof. The first observation is that ∆Br equals the convex hull in Rn of 0 and r2 e1, . . . , r
2 en.
Secondly, since µBr : Br → ∆Br is onto, it follows from diagram (2.3) that
µM(f(Br)) = (Λ
t)−1(∆Br) + x.
Since Λ is an automorphism, (Λt)−1 is an automorphism of the corresponding dual spaces and
therefore there exists a permutation τ ∈ Sn such that (Λt)−1(ei) = αpτ(i). Then the linearity of Λ
implies that µM(f(Br)) equals the the convex hull in R
n of the points x and x+ r2 αpi , 1 ≤ i ≤ n,
which proves the first claim.
Suppose that the infimum of the SL(n, Z)-lengths of the edges meeting at x is greater than or
equal to r2. Let Σ be the convex hull of x and x + r2 αpi , with 1 ≤ i ≤ n, and let Z be the convex
hull of x+r2 αpi , with 1 ≤ i ≤ n. Notice that Σ ⊂ ∆M , Σ\Z is open in ∆M , and let Γ ⊂ Rn be the
open half–space of Rn, whose closure’s boundary ∂(cl(Γ)) is the hyperplane of Rn that contains
Z, and such that Σ \ Z ⊂ Γ.
Let N := (µM)−1(Σ \ Z) and let σN be the symplectic form obtained by restricting σ to N .
The set N is open in M because it is the preimage of the open set Σ \Z under the momentum map
µM : M → ∆M . By the proof of Atiyah–Guillemin–Sternberg convexity theorem, cf. [1], [8], N
is a connected manifold. Since M is compact, the momentum map µM : M → ∆M is a proper
map and therefore its restriction µM : N → Σ \Z is a proper map, which means that µM : N → Γ
is proper, since (µM)−1(Γ \ (Σ \ Z)) = ∅. Therefore (N, σN , ψN) is a connected symplectic
manifold with momentum map µM , and the quadruple (N, σN , µM , Γ) is a proper Hamiltonian
n-space.
On the other hand, notice that the quadruple (N, σN , µM , Γ) is centered about the point x, and
(µM)−1({x}) = p, so we can apply Lemma 2.2, and conclude that N is equivariantly symplecto-
morphic to the submanifold X ⊂ Cn given by
X := {z ∈ Cn | x+
|zi|2 αpi ∈ Σ \ Z}
= {z ∈ Cn | x+
|zi|2 αpi ∈ Σ} \ {z ∈ Cn | x+
|zi|2 αpi ∈ Z}
= Br \ ∂Br = Int(Br).
Hence there exists an equivariant symplectomorphism φ : Int(Br) → N , and by letting j : Bs →
Int(Br) be the standard inclusion, if s < r, the map h := jN ◦φ◦ j : Bs → M , where jN : N →M
is the inclusion map, is an equivariant symplectic embedding for all s < r with h(0) = p. The
converse follows from the first statement of the lemma.
Note that µM(f(Br)) only depends on the fixed point p and the radius r (which was fixed a
priori) and not on f . In Figure 3 several momentum ball images are drawn using Lemma 2.3. Note
that the shaded triangle on the right picture is not a Delzant polytope since it fails to be smooth at
(0, 0). Delzant polytopes are simple, edge–rational and smooth polytopes, cf. Figure 2 (see [10]
or [6] for a definition of these notions).
Step 2: A deformation retraction on BTnr .
In this step we use Alexander’s trick to construct a deformation retraction from the space of
equivariant symplectomorphisms of the 2n-dimensional ball Br in C
n onto a disjoint union of
copies of Tn. The continuity of this deformation is standard and may be found in [13].
Lemma 2.4. The space BTnr of equivariant symplectomorphisms of the 2n-dimen-
sional ball Br in C
n, with respect to the standard symplectic form σ0 and the canonical action
of Tn by rotations, deformation retracts onto its subspace of linear, equivariant and symplectic
rotations given by matrices in R.
Proof. We define the transformationHT
from BTnr × [0, 1] into BT
r , by the formulaH
(φ, t) :=
φt, where φt is the composite map
φt := (mt)
−1 ◦ φ ◦mt, t 6= 0. (2.7)
The map mt in expression (2.7) denotes the linear contraction of factor 0 ≤ t ≤ 1 on Br, mt(z) =
t z; and when t = 0, φt = φ0 is defined to be the tangent mapping Tφ of the map φ, evaluated at 0.
(This expression for φt is known as Alexander’s trick.) It is easy to check that H
is continuous
and that the evaluation map [0, 1] × Br → Br given by (t, z) 7→ φt(z) is smooth. Since HT
the identity on linear maps, we conclude that it is a deformation retraction, not only a homotopy,
onto the space of ball rotations by matrices (δi τ(i)θi j)
i, j=1 with τ ∈ Sn (the symmetric group) and
θi j ∈ T1.
We have left to check that HT
is well defined, i.e. that φt ∈ BT
r . Indeed, the equivariance
of the mapping φt follows directly from formula (2.7); explicitely we have that if φ is equivariant
with respect to Λ ∈ Aut(Tn), then φt(s · z) = 1/t φ(s · tz) = Λ(s) · φt(z) for all s ∈ Tn. By
differentiating formula (2.7) we obtain that
T(φt) = (mt)
−1 ◦ Tφ ◦mt, (2.8)
and since mt is a linear isomorphism, the mapping φt is a diffeomorphism. Furthermore, since
the mapping φ is symplectic, it follows from expression (2.8) that for all z ∈ Br we have that
(φ∗tσ0)z(u, v) = (σ0)φ(z)(Tzφ(u), Tzφ(v)) = (σ0)z(u, v), for every pair of vectors u, v ∈ TzBr,
and hence φt is a symplectic mapping. Therefore φt is a diffeomorphism, which is equivariant and
symplectic, or equivalently φt ∈ BT
r . We have been assuming that t 6= 0, but if t = 0, it is trivial
that φ0 ∈ BT
Corollary 2.5. The space BTnr of equivariant symplectomorphisms of the 2n-dim-
ensional ball Br in C
n, with respect to the standard symplectic form σ0 and the canonical action
of Tn by rotations, is homotopically equivalent to a disjoint union of n! copies of Tn.
Proof. Apply Lemma 2.4 and observe that the space of ball rotations by matrices (δi τ(i)θi j)
i, j=1
with τ ∈ Sn and θi j ∈ T1 is homotopically equivalent to a disjoint union of n! copies of Tn.
We conclude the proof with Step 3, in which Lemma 2.1, Lemma 2.3 and Lemma 2.4 are
combined in order to prove Theorem 1.2. The proof of Proposition 1.4 will follow from the proof
of Theorem 1.2, since the function Emb(M,σ) will be explicitly computed.
Step 3: Lifting the deformation φt to ET
x and conclusion.
In this final step we show that ETnx is homotopically equivalent to a disjoint union of copies of
Lemma 2.6. Suppose that the infimum of the SL(n, Z)-lengths of the edges of ∆M meeting at x is
strictly greater than r2. Then there exists an equivariant and symplectic embedding u : Br → M
with u(0) = p such that if ρ is the identification map on ETnx which takes values on BT
r and is
given by formula ρ(h) := u−1 ◦ h, where h ∈ ETnx , the space ET
x is homotopically equivalent to
the space ρ−1(R).
Proof. The first observation is that by Lemma 2.3 there exists a Λ-equivariant and symplectic
embedding u from Br intoM with u(0) = p. In order to construct homotopy equivalences between
ETnx and R, we define ρ to be the identification map on ET
x which takes values in BT
r and is given
by formula ρ(h) := u−1◦h for every h ∈ ETnx . Now we claim that the mapHT
x from ET
x ×[0, 1] to
ETnx , given by the commutative diagram (2.9) below, is a well–defined and continuous homotopy
satisfying HT
x (ET
x × {0}) = ρ−1(R), while ET
x is preserved at time t = 1, i.e. we have that
x (ET
x × {1}) = ET
x . The diagram is the following:
ETnx × [0, 1]
BTnr × [0, 1]
Br // BTnr
. (2.9)
The mapping HT
x is well defined by Lemma 2.1. Note that H
x is continuous, since the identifi-
cations ρ and ρ−1 are obviously continuous and we showed in Lemma 2.4 that HT
is continuous.
We can therefore conclude, from the previous considerations and the fact that that HT
is a de-
formation retraction in the Cm-Whitney topology, that HT
x induces homotopy equivalences ρ and
δ(f) := HT
x (f, 0) between ET
x and ρ
−1(R), with ρ◦δ homotopic to idETn
and δ◦ρ = idρ−1(R).
In order to conclude the proof of Theorem 1.2 we simply make the following observations:
• First, the space described in it is precisely the disjoint union of the ETnx , x being a vertex of
∆M , because 0 is to be mapped to a fixed point of ψ.
• The number of Tn-fixed points, which is the same as the number of vertices of ∆M , is pre-
cisely χ(M). This follows from the analysis of the momentum map as in Atiyah–Delzant–
Guillemin–Sternberg theory (see for example [10], [11]).
• If we denote by Emb(M,σ)(r) the number of copies of Tn onto which the space considered in
Theorem 1.2 retracts (see formula (1.2)), Emb(M,σ)(r) is obtained by multiplying the number
of fixed points that admit such an embedding (see Lemma 2.3) by the number of copies of
n onto which ETnx (for the particular point) retracts; this latter number is n! (see Corollary
2.5), i.e. as many copies of Tn as possible ways that the canonical basis vectors ei may be
mapped onto the basis of weights αpi (for the particular point). Also, the former number is by
Lemma 2.3 controlled by the Boolean variable cp(r) defined in Proposition 1.4. Therefore
Emb(M,σ)(r) is given by Emb(M,σ)(r) = n!
p∈MTn cp(r), as we wanted to show.
• It is obviously true that if (M, σ) is equivariantly symplectomorphic to (M̃, σ̃), then Emb(M,σ)(r) =
(M̃, σ̃)
(r), so the integer Emb(M,σ)(r) is a symplectic–toric invariant.
(2,1)
(1,2)(0,2)
(2,0)(0,0)
(2,1)
(1,2)(0,2)
(2,0)(0,0)
Figure 4: Polytope corresponding to the Delzant manifold (M, σ) obtained by blowing up S2r0×S2r0
with r0 = 1/
2. Observe that Emb(M,σ)(
2) = 0 (proof in left figure) and Emb(M,σ)(1/
10 (proof in right figure). See Lemma 2.7.
As a final remark we observe that the invariant function Emb(M,σ) associated to the Delzant
manifold (M, σ) always reaches its minimum and maximum values on an interval of strictly posi-
tive length.
Lemma 2.7. There exist numbers r0, s0 > 0 such that if r ≤ r0, then the space of equivariant
symplectic embeddings from Br into M is homotopically equivalent to a disjoint union of n!χ(M)
copies of Tn, and if s ≥ s0, then it is empty.
Proof. It follows easily from Lemma 2.3, Corollary 2.5 and the previous observations.
This concludes the proof of Theorem 1.2 (and hence by construction the proof of Proposition
1.4).
3 Remarks on the partially equivariant case of Theorem 1.2
In this section we initiate a discussion on the topology of the space of partially equivariant sym-
plectic embeddings and sketch some suggestions to answer a question in this direction.
First the notion of Λ-equivariance (Λ ∈ Aut(Tn)) in Section 1 has a natural extension: we
say that an embedding from the 2n-ball Br into the 2n-dimensional Delzant manifold M is γ-
-equivariant with respect to a monomorphism γ : Tn−k → Tn, 1 ≤ k ≤ n − 1, if the following
diagram commutes:
n−k × Br
γ×f //
f // M
For example, Mγ is the set of p ∈ M such that ψ(γ(t), p) = p for all t ∈ Tn−k, and the rest of
terminology is also analogous. This definition extends naturally to the case when k = n, in which
the embeddings considered are purely symplectic, as well as to the case when k = 0, in which the
embeddings are fully equivariant, case which we treated previously in the paper. Unless otherwise
specified we do not consider these two cases in the discussion that follows. The question we would
like to address is the following:
Question 3.1 Let r be such that any connected component C ofMγ admits a Darboux–Weinstein
neighborhood of radius r, and by this we mean a neighborhood that is equivariantly symplecto-
morphic to a bundle over C with fiber the standard ball of radius r. Is the space of γ-equivariant
symplectic embeddings from Br intoM homotopically equivalent to the space of purely symplectic
embeddings from B2kr into M
γ up to reparametrization groups (as explained below)? ⊘
To analyze Question 3.1 first define B̂2kr to be the embedded 2k-ball in Br, i.e. the set of
points (z1, . . . , zk, 0) in Br so that
i=1 |zi|2 ≤ r2. The preimage under the momentum map of
the k-face corresponding to γ is the fixed point locus Mγ . Now consider any symplectic embed-
ding f : B̂2kr → Mγ . We want to find a canonical way to extend f to an equivariant symplectic
embedding can(f) : Br →M up to homotopy.
Here is an attempt to construct can(f): near the image of f , we can apply the equivariant ver-
sion of the Darboux–Weinstein’s theorem in order to find a neighborhood of Im(f) in M which
is symplectomorphic to Im(f)× B2(n−k), with the action of Tn−k given by the standard action on
2(n−k), and the symplectic form coinciding with the product symplectic form. Note that the sym-
plectic normal bundle toMγ is trivial over Im(f) because Im(f) is contractible, so a neighborhood
of Im(f) looks like Im(f)× B2(n−k) with a product symplectic form, and the action of Tn−k on it
is conjugate to the standard one. Using this identification M is described as a product, and we can
define can(f)(z) := (f(z1, . . . , zk), zk+1, . . . , zn). This expression for can(f) is clearly symplec-
tic and equivariant with respect to Tn−k-actions on the last n− k coordinates but is not canonical
because the local symplectomorphism given by Darboux–Weinstein’s theorem is not unique. We
cannot expect it to always be the same independently of f , because it is not true that globally the
normal bundle to Mγ is symplectically trivial, it only becomes true over a neighborhood of Im(f).
So this construction depends on choices of parameters.
Calling CAN the space of canonical embeddings can(f) : Br → ∆M , where f : B̂2kr → Mγ is
a symplectic embedding, observe that CAN is naturally identified with the space of purely sym-
plectic embeddings from the standard B2kr into M
γ , up to homotopy. The question then becomes
whether any γ-equivariant symplectic embedding f : B2kr → M may be deformed through a con-
tinuous family of equivariant symplectic embeddings to an embedding in CAN.
Equivalently, we ask the question: is the natural map between the space of partially equivariant
embeddings from Br into M and the space of symplectic embeddings from B
r into the fixed point
set Mγ (given by the restriction to the fixed ball B2kr ) a fibration? Note that the construction of
can(f) would give a section of this fibration.
Conjecture 3.1. Question 3.1 has an affirmative answer.
Example 3.2 If M = S2 × S2 with a product symplectic form and product T2-action (this space
has been carefully studied by Lalonde–Pinsonnault [14] and Anjos [2] among other authors), the
fixed point locus of the second S1 factor is S2×{a, b}, where a, b are the fixed points of the action
of S1 on S2. Now, given a symplectic embedding f of the ball B2 into S2, it is easy to build an
1-equivariant embedding of B4 into S2 × S2 canonically by (z1, z2) 7→ (f(z1), z2), where z2 is
taken to be a coordinate centered at the fixed point a. In this case the normal bundle to the fixed
point component S2 × {a} is globally trivial. ⊘
The combination of purely symplectic results of Biran, Lalonde–Pinsonnault and others and
an affirmative answer to Question 3.1 would give insight into the partially equivariant case in
higher dimensions; for example McDuff showed that if M is a symplectic 4-manifold with non-
simple Seiberg–Witten type, then the space of symplectic embeddings from Br into M is path
connected (which extends results of Biran). This is a consequence of the non–trivial result: any
two cohomologous and deformation equivalent symplectic forms on M are isotopic (proved in
[17]). Examples are known in dimensions 6 and above of cohomologous symplectic forms that are
deformation equivalent but not isotopic, so these techniques do not help to understand the topology
of the space of symplectic embeddings from Br into M . A positive answer to Question 3.1 would
give the first non–trivial result in dimension 6.
Another way of trying to generalize Theorem 1.2 is to consider embeddings equivariant with
respect to a complexity one action, that is, an action of Tn−1 on M2n. This is a hopeful approach
since a complete classification of complexity one actions has been recently achieved by Karshon
and Tolman [15].
Acknowledgments
The author is grateful to D. Auroux and V. Guillemin for discussions, and for hosting him at the
M.I.T. regularly during the Fall and Spring semesters of 2003 and 2004. He thanks D. Auroux,
J.J. Duistermaat, Y. Karshon and M. Pinsonnault for making comments on a preliminary version
of this paper. Finally, the author is grateful to an anonymous referee for helpful suggestions that
have shortened the proof in Section 2, as well as for his/her interesting comments on Section 3.
References
[1] M. Atiyah, Convexity and commuting Hamiltonians. Bull. London Math. Soc., 1 (1982) 1–15.
[2] S. Anjos, Homotopy type of symplectomorphism groups of S2 × S2. Geom. Topol. 6 (2002)
195–218.
[3] P. Biran, Connectedness of spaces of symplectic embeddings, Int. Math. Res. Lett., 10 (1996)
487–491.
[4] P. Biran, From symplectic packing to algebraic geometry and back, Proc. 3rd Europ. Congr.
of Math., Barcelona 2, Prog. Math., Birkhauser (2001) 507–524.
[5] P. Biran, Geometry of symplectic intersections, Proc. ICM Beijing III (2004) 1-3.
[6] A. Cannas da Silva, Lectures in symplectic geometry, Springer-Verlag, Berlin (2000).
[7] T. Delzant, Periodic Hamiltonians and convex images of the momentum mapping, Bull. Soc.
Math. France, 116 (1988) 315–339.
[8] V. Guillemin and S. Sternberg, Convexity properties of the momentum mapping, Invent.
Math., 67 (1982) 491–513.
[9] V. Guillemin and S. Sternberg, Symplectic techniques in physics, Cambr. Univ. Press (1984).
[10] V. Guillemin, Moment maps and combinatorial invariants of Hamiltonian Tn spaces, Prog.
Math., Birkhauser (1994).
[11] V. Guillemin, V. Ginzburg and Y. Karshon, Moment maps, cobordisms and Hamiltonian
group actions, Amer. Math. Soc. Monograph 98, (2002).
[12] M. Gromov, Pseudoholomorphic curves in symplectic geometry, Invent. Math., 82 (1985)
307–347.
[13] M. Hirsch, Differential topology. Springer-Verlag 33 (1980).
[14] F. Lalonde and M. Pinsonnault, The topology of the space of symplectic balls in rational
4-manifolds, Duke Math. J. 122 (2004), no. 2, 347–397.
[15] Y. Karshon and S. Tolman, Centered complexity one Hamiltonian torus actions, Trans. Amer.
Math. Soc. 353 (2001), no. 12, 4831–4861 (electronic).
[16] Y. Karshon and S. Tolman, The Gromov width of complex Grasmannians, Alg. and Geom.
Topol. 5 paper 38, 911–922.
[17] D. McDuff, From deformation to isotopy, Topics in Symplectic 4-manifolds, Internat. Press,
Cambridge, MA (1998).
[18] D. McDuff, The structure of rational and ruled symplectic 4-manifolds, J. Amer. Math. Soc.
3(3) (1990) 679–712.
[19] A. Pelayo, Toric symplectic ball packing, Top. and its Appl. 153 (2006) 3633–3644.
A. Pelayo
Department of Mathematics, University of Michigan
2074 East Hall, 530 Church Street, Ann Arbor, MI 48109–1043, USA
e-mail: apelayo@umich.edu
	The main theorem
	Proof of Theorem ??
	Remarks on the partially equivariant case of Theorem ??
ABSTRACT
  We compute the homotopy type of the space of T^n-equivariant symplectic
embeddings from the standard 2n-dimensional ball of some fixed radius into a
2n-dimensional symplectic-toric manifold M, and use this computation to define
a Z-valued step function on the positive real line which is an invariant of the
symplectic-toric type of M. We conclude with a discussion of the partially
equivariant case of this result.

<|endoftext|><|startoftext|>
Introduction to toric varieties, Ann. Math. Stud. 131, Princ. Univ. Press 1993.
[12] V. Guillemin and S. Sternberg, Convexity properties of the momentum mapping, Invent.
Math., 67 (1982) 491–513.
[13] V. Guillemin and S. Sternberg, Symplectic techniques in physics, Camb. Univ. Press, 1984
[14] V. Guillemin, Moment maps and combinatorial invariants of Hamiltonian Tn spaces, Prog.
in Math., Birkauser, (1994).
[15] V. Guillemin, V. Ginzburg and Y. Karshon, Moment maps, cobordisms and Hamiltonian
group actions, Amer. Math. Soc. Monograph 98, (2002).
[16] B.S. Kruglikov, A remark on symplectic packings. (Russian) Dokl. Akad. Nauk 350 (1996),
no. 6, 730–734.
[17] F. Lalonde, M. Pinsonnault. The topology of the space of symplectic balls in rational 4-
manifolds. Duke Math. J. 122 (2004), no. 2, 347–397.
[18] D. McDuff, From deformation to isotopy, Topics in Symplectic 4-manifolds, Internat. Press,
Cambridge, MA (1998).
[19] D. McDuff, L. Polterovich, Symplectic packings and algebraic geometry. With an appendix
by Y. Karshon. Invent. Math. 115 (1994), no. 3, 405–434.
[20] D. McDuff and D.A. Salamon, Introduction to Symplectic Topology, 2nd edition, OUP,
(1998).
[21] F. M. Maley, J. Mastrangeli, L. Traynor, Symplectic packings in cotangent bundles of tori.
Experiment. Math. 9 (2000), no. 3, 435–455.
[22] A. Pelayo, Topology of spaces of equivariant symplectic embeddings, Proc. Amer. Math.
Soc. 135 (2007) 277–288.
[23] L. Traynor, Symplectic packing constructions. J. Diff. Geom. 41 (1995), no. 3, 735–751.
[24] G. Xu, Curves in P 2 and symplectic packings. Math. Ann. 299 (1994), no. 4, 609–613.
A. Pelayo
Department of Mathematics, University of Michigan
2074 East Hall, 530 Church Street, Ann Arbor, MI 48109–1043, USA
e-mail: apelayo@umich.edu
	The Main Theorem
	From Symplectic Geometry to Combinatorics
	Proof of Theorem ?? and Proposition ??
	A Remark on blowing up
	Further questions
ABSTRACT
  We define and solve the toric version of the symplectic ball packing problem,
in the sense of listing all 2n-dimensional symplectic-toric manifolds which
admit a perfect packing by balls embedded in a symplectic and torus equivariant
fashion.
  In order to do this we first describe a problem in geometric-combinatorics
which is equivalent to the toric symplectic ball packing problem. Then we solve
this problem using arguments from Convex Geometry and Delzant theory.
  Applications to symplectic blowing-up are also presented, and some further
questions are raised in the last section.

<|endoftext|><|startoftext|>
Introduction
The type Ia supernovae (SNe Ia) [1] observations provide the first evidence for the accelerating expan-
sion of the present universe. These results, when combined with the observations on the anisotropy
spectrum of cosmic microwave background (CMB) [2] and the results on the power spectrum of large
scale structure (LSS) [3], strongly suggest that the universe is spatially flat and dominated by a
component, though arguably exotic, with large negative pressure, referred to as dark energy [4]. The
nature of such dark energy constitutes an open and tantalizing question connecting cosmology and
particle physics. Different mechanisms have been suggested over the past few years to accommodate
dark energy. The simplest form of dark energy is the vacuum energy (the cosmological constant).
A tiny positive cosmological constant which can naturally explain the current acceleration would
encounter many theoretical problems such as the fine-tuning problem and the coincidence problem.
The former can be stated as the existence of an enormous gap between the vacuum expectation value,
in other words the cosmological constant, in particle physics and that observed over cosmic scales.
The absence of a fundamental mechanism which sets the cosmological constant to zero or to a very
small value is also known as the cosmological constant problem. The latter however relates to the
question of the near equality of energy densities of the dark energy and dark matter today.
Another possible form of dark energy is a dynamical, time dependent and spatially inhomogeneous
component, called the quintessence [5]. An example of quintessence is the energy associated with a
scalar field φ slowly evolving down its potential V (φ) [6, 7]. Slow evolution is needed to obtain a
negative pressure, pφ =
φ̇2+V (φ), so that the kinetic energy density is less than the potential energy
density. Yet another phenomenological explanation based on current observational data is given by
the x-matter (xCDM) model which is associated with an exotic fluid characterized by an equation
of state px = wxρx (wx < −
is the necessary condition to make a universe accelerate), where the
parameter wx can be a constant or more generally a function of time [8].
∗email: m-heydarifard@sbu.ac.ir
†email: hr-sepangi@sbu.ac.ir
http://arxiv.org/abs/0704.1035v1
Over the past few years, we have been witnessing a phenomenal interest in the possibility that
our observable four-dimensional (4D) universe may be viewed as a brane hypersurface embedded
in a higher dimensional bulk space. Physical matter fields are confined to this hypersurface, while
gravity can propagate in the higher dimensional space-time as well as on the brane. The most
popular model in the context of brane world theory is that proposed by Randall and Sundrum
(RS). In the so-called RSI model [9], the authors proposed a mechanism to solve the hierarchy
problem with two branes, while in the RSII model [10], they considered a single brane with a positive
tension, where 4D Newtonian gravity is recovered at low energies even if the extra dimension is not
compact. This mechanism provides us with an alternative to compactification of extra dimensions.
The cosmological evolution of such a brane universe has been extensively investigated and effects such
as a quadratic density term in the Friedmann equations have been found [11]-[13]. This term arises
from the imposition of the Israel junction conditions which is a relationship between the extrinsic
curvature and energy-momentum tensor of the brane and results from the singular behavior in the
energy-momentum tensor. There has been concerns expressed over applying such junction conditions
in that they may not be unique. Indeed, other forms of junction conditions exist, so that different
conditions may lead to different physical results [14]. Furthermore, these conditions cannot be used
when more than one non-compact extra dimension is involved. To avoid such concerns, an interesting
higher-dimensional model was introduced in [15] where particles are trapped on a 4-dimensional
hypersurface by the action of a confining potential V. In [16], the dynamics of test particles confined
to a brane by the action of such potential at the classical and quantum levels were studied and the
effects of small perturbations along the extra dimensions investigated. Within the classical limits,
test particles remain stable under small perturbations and the effects of the extra dimensions are not
felt by them, making them undetectable in this way. The quantum fluctuations of the brane cause
the mass of a test particle to become quantized and, interestingly, the Yang-Mills fields appear as
quantum effects. Also, in [17], a braneworld model was studied in which the matter is confined to the
brane through the action of such a potential, rendering the use of any junction condition unnecessary
and predicting a geometrical explanation for the accelerated expansion of the universe.
In brane theories the covariant Einstein equations are derived by projecting the bulk equations
onto the brane. This was first done by Shiromizu, Maeda and Sasaki (SMS) [18] where the Gauss-
Codazzi equations together with Israel junction conditions were used to obtain the Einstein field
equations on the 3-brane. In a series of recent papers a number of authors [23, 24] have presented
detailed descriptions of the dynamic of homogeneous and anisotropic brane worlds in the SMS model.
The study of anisotropic homogeneous brane world cosmological models has shown an important
difference between these models and standard 4D general relativity, namely, that brane universes are
born in an isotropic state. For the anisotropic Bianchi type I and V geometries, with a conformally
flat bulk (vanishing Weyl tensor), this type of behavior has been found by exactly solving the field
equations [25].
In this paper, we follow [17] and consider an m-dimensional bulk space without imposing the
Z2 symmetry. As mentioned above, to localize the matter on the brane, a confining potential is
used rather than a delta-function in the energy-momentum tensor. The resulting equations on the
anisotropic brane are modified by an extra term that may be interpreted as the x-matter, providing
a possible phenomenological explanation for the accelerated expansion of the universe. The behavior
of the observationally important physical quantities such as anisotropy and deceleration parameter
is studied in this scenario. We should emphasize here that there is a difference between the model
presented in this work and models introduced in [19, 20] in that in the latter no mechanism is
introduced to account for the confinement of matter to the brane.
2 The model
In this section we present a brief review of the model proposed in [16]. Consider the background
manifold V 4 isometrically embedded in a pseudo-Riemannian manifold Vm by the map Y : V 4 → Vm
such that
,ν = ḡµν , GABY
a = 0, GABN
b = gab = ±1, (1)
where GAB (ḡµν) is the metric of the bulk (brane) space Vm(V 4) in arbitrary coordinates, {Y
A} ({xµ})
is the basis of the bulk (brane) and NAa are (m − 4) normal unit vectors, orthogonal to the brane.
Perturbation of V̄4 in a sufficiently small neighborhood of the brane along an arbitrary transverse
direction ξ is given by
ZA(xµ, ξa) = YA + (LξY)
A, (2)
where L represents the Lie derivative and ξa (a = 1, 2, ...,m− 4) is a small parameter along NAa that
parameterizes the extra noncompact dimensions. By choosing ξ orthogonal to the brane, we ensure
gauge independency [16] and have perturbations of the embedding along a single orthogonal extra
direction N̄a giving local coordinates of the perturbed brane as
ZA,µ(x
ν , ξa) = YA,µ + ξ
aN̄Aa,µ(x
ν). (3)
In a similar manner, one can find that since the vectors N̄A depend only on the local coordinates xµ,
they do not propagate along the extra dimensions. The above assumptions lead to the embedding
equations of the perturbed geometry
Gµν = GABZ
,ν , Gµa = GABZ
a, GABN
b = gab. (4)
If we set NAa = δ
a , the metric of the bulk space can be written in the following matrix form
GAB =
gµν +AµcA
ν Aµa
Aνb gab
, (5)
where
gµν = ḡµν − 2ξ
aK̄µνa + ξ
aξbḡαβK̄µαaK̄νβb, (6)
is the metric of the perturbed brane, so that
K̄µνa = −GABY
a;ν , (7)
represents the extrinsic curvature of the original brane (second fundamental form). We use the
notation Aµc = ξ
dAµcd, where
Aµcd = GABN
c = Āµcd, (8)
represents the twisting vector fields (the normal fundamental form). Any fixed ξa signifies a new
perturbed geometry, enabling us to define an extrinsic curvature similar to the original one by
K̃µνa = −GABZ
a;ν = K̄µνa − ξ
K̄µγaK̄
νb +AµcaA
. (9)
Note that definitions (5) and (9) require
K̃µνa = −
. (10)
In geometric language, the presence of gauge fields Aµa tilts the embedded family of sub-manifolds
with respect to the normal vector NA. According to our construction, the original brane is orthogonal
to the normal vector NA. However, equation (4) shows that this is not true for the deformed geometry.
Let us change the embedding coordinates and set
XA,µ = Z
,µ − g
abNAa Abµ. (11)
The coordinates XA describe a new family of embedded manifolds whose members are always or-
thogonal to NA. In this coordinates the embedding equations of the perturbed brane is similar to
the original one, described by equation (1), so that YA is replaced by XA. This new embedding of
the local coordinates is suitable for obtaining induced Einstein field equations on the brane. The
extrinsic curvature of a perturbed brane then becomes
Kµνa = −GABX
a;ν = K̄µνa − ξ
bK̄µγaK̄
νb = −
, (12)
which is the generalized York’s relation and shows how the extrinsic curvature propagates as a result
of the propagation of the metric in the direction of extra dimensions. The components of the Riemann
tensor of the bulk written in the embedding vielbein {XA,α,N
a }, lead to the Gauss-Codazzi equations
Rαβγδ = 2g
abKα[γaKδ]βb +RABCDX
,δ , (13)
2Kα[γc;δ] = 2g
abA[γacKδ]αb +RABCDX
,δ , (14)
where RABCD and Rαβγδ are the Riemann tensors for the bulk and the perturbed brane respectively.
Contracting the Gauss equation (13) on α and γ, we find
Rµν = (KµαcK
ν −KcK
µν ) +RABX
,ν − g
abRABCDN
b , (15)
and the Einstein tensor of the brane is given by
Gµν = GABX
,ν +Qµν + g
abRABN
b gµν − g
abRABCDN
b , (16)
where
Qµν = −g
KγµaKγνb −KaKµνb
KαβaK
αβa −KaK
gµν . (17)
As can be seen from the definition of Qµν , it is independently a conserved quantity, that is Q
;ν = 0
[19]. Using the decomposition of the Riemann tensor into the Weyl curvature, the Ricci tensor and
the scalar curvature
RABCD = CABCD −
(m− 2)
GB[DRC]A − GA[DRC]B
(m− 1)(m − 2)
R(GA[DGC]B), (18)
we obtain the 4D Einstein equations as
Gµν = GABX
,ν +Qµν − Eµν +
(m− 2)
gabRABN
b gµν
(m− 2)
(m− 1)(m − 2)
Rgµν , (19)
where
Eµν = g
abCABCDN
,ν , (20)
is the electric part of the Weyl tensor CABCD. Now, let us write the Einstein equation in the bulk
space as
AB + Λ
(b)GAB = α
∗SAB, (21)
where
SAB = TAB +
VGAB , (22)
here α∗ = 1
(M∗ is the fundamental scale of energy in the bulk space), Λ
(b) is the cosmological
constant of the bulk and TAB is the energy-momentum tensor of the matter confined to the brane
through the action of the confining potential V. We require V to satisfy three general conditions:
firstly, it has a deep minimum on the non-perturbed brane, secondly, depends only on extra coordi-
nates and thirdly, the gauge group representing the subgroup of the isometry group of the bulk space
is preserved by it [16]. The vielbein components of the energy-momentum tensor are given by
Sµν = SABX
,ν , Sµa = SABX
a , Sab = SABN
b . (23)
Use of equation (22) then gives
RAB = −
(m− 2)
GABS +
(m− 2)
Λ(b)GAB + α
∗SAB, (24)
R = −
(α∗S −mΛ(b)). (25)
Substituting RAB and R from the above into equation (19) and using equation (23), we obtain
Gµν = Qµν − Eµν +
(m− 3)
(m− 2)
α∗gabSabgµν +
(m− 2)
Sµν −
(m− 4)(m− 3)
(m− 1)(m− 2)
α∗Sgµν
(m− 7)
(m− 1)
Λ(b)gµν . (26)
On the other hand, again from equation (21), the trace of the Codazzi equation (14) gives the “gravi-
vector equation”
Kδaγ;δ −Ka,γ −AbaγK
b +AbaδK
bδ +Baγ =
3(m− 4)
α∗Saγ , (27)
where
Baγ = g
mnCABCDN
n . (28)
Finally, the “gravi-scalar equation” is obtained from the contraction of (15), (19) and using equation
S − gmnSmn
gab =
(Q+R+W ) gab −
Λ(b)gab, (29)
where
W = gabgmnCABCDN
a . (30)
Equations (26)-(30) represent the projections of the Einstein field equations on the brane-brane,
bulk-brane, and bulk-bulk directions.
As was mentioned in the introduction, localization of matter on the brane is realized in this model
by the action of a confining potential. Since the potential V is assumed to have a minimum on the
brane, which can be taken as zero, localization of matter may simply be realized by taking equation
(22) and consider its components on the brane, in which case we may write
ατµν =
(m− 2)
Tµν , Tµa = 0, Tab = 0, (31)
where α is the scale of energy on the brane. Now, the induced Einstein field equations on the original
brane can be written as
Gµν = ατµν −
(m− 4)(m− 3)
2(m− 1)
ατgµν − Λgµν +Qµν − Eµν , (32)
where Λ = −
(m−7)
(m−1)
Λ(b) and Qµν is a conserved quantity which according to [19] may be considered
as an energy-momentum tensor of a dark energy fluid representing the x-matter, the more common
phrase being ‘x-Cold-Dark Matter’ (xCDM). This matter has the most general form of the equation of
state which is characterized by the following conditions [21]: violation of the strong energy condition
at the present epoch for ωx < −1/3 where px = ωxρx, local stability i.e. c
s = δpx/δρx ≥ 0 and
preservation of causality i.e. cs ≤ 1. Ultimately, we have three different types of ‘matter’ on the
right hand side of equation (32), namely, ordinary confined conserved matter represented by τµν , the
matter represented by Qµν which will be discussed later and finally, the Weyl matter represented by
Eµν .
3 Field equations on the anisotropic brane
In the following we will investigate the influence of the extrinsic curvature terms on the anisotropic
universe described by Bianchi type I and V geometries. We restrict our analysis to a constant
curvature bulk, so that Eµν = 0. The constant curvature bulk is characterized by the Riemann tensor
RABCD = k∗(GACGBD − GADGBC), (33)
where GAB denotes the bulk metric components in arbitrary coordinates and k∗ is either zero for
the flat bulk, or proportional to a positive or negative bulk cosmological constant respectively, corre-
sponding to two possible signatures (4, 1) for the dS5 bulk and (3, 2) for the AdS5 bulk. We take, in
the embedding equations, g55 = ε = ±1. With this assumption the Gauss-Codazzi equations reduce
Rαβγδ =
(KαγKβδ −KαδKβγ) + k∗(gαγgβδ − gαδgβγ), (34)
Kα[β;γ] = 0. (35)
The effective equations derived in the previous section with constant curvature bulk can be written
Gµν = ατµν − λgµν +Qµν . (36)
Here, λ is the effective cosmological constant in four dimensions with Qµν being a completely geo-
metrical quantity given by
Qµν =
(KKµν −KραK
ν ) +
αβ −K2
, (37)
where K = gµνKµν . To proceed further, the confined source on the brane should now be specified.
The most common matter source used in cosmology is that of a perfect fluid which, in co-moving
coordinates, is given by
τµν = (ρ+ p)uµuν + pgµν , uµ = −δ
µ, p = (γ − 1)ρ, 1 ≤ γ ≤ 2, (38)
where γ = 2 represents the stiff cosmological fluid describing the high energy density regime of the
early universe.
From a formal point of view the Bianchi type I and V geometries are described by the line element
ds2 = −dt2 + a21(t)dx
2 + a22(t)e
−2βxdy2 + a23(t)e
−2βxdz2. (39)
The metric for the Bianchi type I geometry corresponds to the case β = 0, while for the Bianchi type
V case we have β = 1. Here ai(t), i = 1, 2, 3 are the expansion factors in different spatial directions.
For later convenience, we define the following variables [25]
ai, Hi =
, i = 1, 2, 3, 3H =
Hi, ∆Hi = Hi −H, i = 1, 2, 3. (40)
In equation (40), v is the volume scale factor, Hi, i = 1, 2, 3 are the directional Hubble parameters,
and H is the mean Hubble parameter. From equation (40) we also obtain H = v̇
. The physical
quantities of observational importance in cosmology are the expansion scalar Θ, the mean anisotropy
parameter A, and the deceleration parameter q, which are defined according to
Θ = 3H, 3A =
, q =
H−1 − 1 = −H−2
Ḣ +H2
. (41)
We note that A = 0 for an isotropic expansion. Moreover, the sign of the deceleration parameter
indicates how the universe expands. A positive sign for q corresponds to the standard decelerating
models whereas a negative sign indicates an accelerating expansion in late times.
Using the York’s relation
Kµνa = −
, (42)
we realize that in a diagonal metric, Kµνa is diagonal. After separating the spatial components, the
Codazzi equations reduce to (here α, β, γ, σ = 1, 2, 3)
Kαγa,σ +K
βσ = K
σa,γ +K
βγ , (43)
Kαγa,0 +
Kαγa =
K00a, i = 1, 2, 3. (44)
The first equation gives K11a,σ = 0 for σ 6= 1, since K
1a does not depend on the spatial coordinates.
Repeating the same procedure for α, γ = i, i = 2, 3, we obtain K22a,σ = 0 for σ 6= 2 and K
3a,σ = 0
for σ 6= 3. Assuming K11a = K
2a = K
3a = ba(t), where ba(t) are arbitrary functions of t, the second
equation gives
ḃa +
K00a, i = 1, 2, 3. (45)
Summing equations (45) we find
K00a = −
3ḃav
. (46)
For µ, ν = 1, 2, 3 we obtain
Kµνa = bagµν . (47)
Assuming further that the functions ba are equal and denoting ba = b, θ =
and Θ = v̇
, we find from
equation (37) that
αβ = b2
, K = b
, (48)
Qµν = −
gµν , µ, ν = 1, 2, 3, Q00 =
. (49)
Now, using these relations and equation (36), the modified Friedmann equations become
3Ḣ +
H2i = λ−
ρ(3γ − 2) +
, (50)
(vHi) = 2β
2v−2/3 + λ−
ρ(γ − 2) +
, i = 1, 2, 3. (51)
For β = 0 we obtain the field equations for Bianchi type I geometry, while β = 1 gives the Bianchi type
V equations on the brane world. These equations are modified with respect to the standard equations
by the components of the extrinsic curvature. Such term may be used to offer an explanation for
the x-matter. In the next section we discuss the ramifications of this term on the cosmology of our
model. We also note the implicit effects of the bulk signature ε on the expansion of the universe.
For the sake of completeness, let us compare the model presented in this work to the usual brane
worlds models where the Israel junction condition is used to calculate the extrinsic curvature in terms
of the energy-momentum tensor on the brane and its trace, that is
Kµν = −
α∗2(τµν −
τgµν), (52)
where α∗ is proportional to the gravitational constant in the bulk. If we did that, we would obtain
b(t) = −1
α∗2ρ, which, upon its substitution in equation (50), gives
3Ḣ +
H2i = λ+
ρ(2− 3γ) +
ρ2(1− 3γ), (53)
which is same as equation (16) in [25]. Therefore, the emergence of a ρ2 term in the Friedmann
equations is a consequence of the discontinuity in the bulk and the brane system. The existence of
this term either does not agree with observations or requires extra parameters and fine tuning.
4 Dark energy and role of extrinsic curvature
As we noted before, Qµν is an independently conserved quantity, suggesting that an analogy with the
energy momentum of an uncoupled non-conventional energy source would be in order. To evaluate
the compatibility of such geometrical model with the present experimental data, we identify Qµν with
x-matter [21] by defining Qµν as a perfect fluid and write
Qµν =
[(ρx + px)uµuν + pxgµν ] , px = (γx − 1)ρx. (54)
Comparing Qµν , µ, ν = 1, 2, 3 and Q00 from equation (54) with the components of Qµν and Q00 given
by equation (49), we obtain
px = −
, ρx =
. (55)
Equation (54) was chosen in accordance with the weak-energy condition corresponding to the positive
energy density and negative pressure with ε = +1. Use of the above equations leads to an equation
for b(t)
. (56)
If γx is taken as a constant, the solution for b(t) is
b(t) = b0v
−γx/2. (57)
Using equation (55) and this solution, the energy density of xCDM becomes
v−γx . (58)
A brief discussion on the energy-momentum conservation on the brane would be appropriate at
this point. The contracted Bianchi identities in the bulk space, GAB;A = 0, using equation (21),
imply
TAB +
= 0. (59)
Since the potential V has a minimum on the brane, the above conservation equation reduces to
τµν;µ = 0, (60)
and gives
γρ = 0. (61)
Thus, the time evolution of the energy density of the matter is given by
ρ = ρ0v
−γ . (62)
Using the geometrical energy density for Qµν and the evolution law of the matter energy density, the
field equations (50)-(51) become
3Ḣ +
H2i = λ+
ρ0(2− 3γ)v
b20(2− 3γx)v
−γx , (63)
(vHi) = 2β
2v−2/3 + λ+
ρ0(2− γ)v
b20(2− γx)v
−γx , i = 1, 2, 3. (64)
Summing equations (64) we find
(vH) = 2β2v−2/3 + λ+
ρ0(2− γ)v
b20(2− γx)v
−γx . (65)
Now, substituting back equation (65) into equations (64) we obtain
Hi = H +
, i = 1, 2, 3, (66)
with hi, i = 1, 2, 3 being constants of integration satisfying the consistency condition
i=1 hi = 0.
The basic equation describing the dynamics of the anisotropic brane world with a constant curvature
bulk can be written as
v̈ = 6β2v1/3 + 3λv +
ρ0(2− γ)v
1−γ +
b20(2− γx)v
1−γx . (67)
Here, we note that for a stiff fluid (γ = 2), the dynamics of the matter on the brane is solely
determined by the geometrical matter (xCDM). The general solution of equation (67) becomes
t− t0 =
9β2v4/3 + 3λv2 + 3αρ0v
2−γ + 9b20v
2−γx + C
)−1/2
dv, (68)
where C is a constant of integration. The time variation of the physically important parameters
described above in the exact parametric form, with v taken as a parameter, is given by
Θ = 3H =
9β2v4/3 + 3λv2 + 3αρ0v
2−γ + 9b20v
2−γx + C
, (69)
ai = a0iv
1/3 exp
9β2v4/3 + 3λv2 + 3αρ0v
2−γ + 9b20v
2−γx +C
)−1/2
, i = 1, 2, 3, (70)
A = 3h2
9β2v4/3 + 3λv2 + 3αρ0v
2−γ + 9b20v
2−γx + C
, (71)
9β2v4/3 +
2−γ +
2−γx + 3C
9β2v4/3 + 3λv2 + 3αρ0v2−γ + 9b
2−γx + C
) − 1, (72)
where h2 =
i=1 h
i . In addition, the integration constants hi and C must satisfy the consistency
condition h2 = 2
C. For β = 0 we obtain the general solutions for Bianchi type I geometry, while
β = 1 gives the Bianchi type V solutions on the anisotropic brane world.
In a matter dominated Bianchi type I universe, γ = 1 with γx = 0, equation (68) becomes
t− t0 =
3αρ0v + 9b
2 + C
, (73)
where for later convenience, we take C =
3α2̺2
. The time dependence of the volume scale factor of
the Bianchi type I universe is given by
v(t) = e3b0(t−t0) −
, (74)
which for t = t0+
becomes zero. By reparameterizing the initial value of the cosmological
time according to e−3b0t0 = αρ0
, the evolution of the anisotropic brane universe starts at t = 0 from
a singular state v(t = 0) = 0. Therefore the expansion scalar, scale factors, mean anisotropy, and
decelerating parameter are given by
Θ(t) =
3b0(t−t0)
e3b0(t−t0) − αρ0
, (75)
ai = a0i
e3b0(t−t0) −
]1/3 [
e−3b0(t−t0)
] 2b0hi
, i = 1, 2, 3, (76)
A(t) =
3b20e
6b0(t−t0)
, (77)
q(t) =
e−3b0(t−t0) − 1. (78)
We consider λ = 0 and show that, within the context of the present model, the extrinsic curvature
can be used to account for the accelerated expansion of the universe. In figure 1 we have plotted the
behavior of the deceleration parameter for different values of γ. The behavior of this parameter shows
that when the geometrical energy density is positive and the pressure is negative the AdS5 bulk is
not compatible with the expansion of the universe. Also, this behavior is much dependent on the
range of the values that γx can take. The use of the de Sitter bulk with ρx > 0 allows us to use the
wealth of available data from the recent measurements to determine limits on the values of γx in our
geometric model. For having an accelerating universe we distinguish γx <
. As mentioned before,
q(t) > 0 corresponds to the standard decelerating models whereas q(t) < 0 indicates an accelerating
expansion at late times. Therefore, the universe undergoes an accelerated expansion at late times in
0.5 1 1.5 2
0.5 1 1.5 2
Figure 1: Left, the deceleration parameter of the Bianchi type I universe and right, the same parameter in the Bianchi
type V universe for the de Sitter bulk with γ = 1 (solid line), γ = 4/3 (dashed line), γ = 2 (dot-dashed line), γx = 0.3
and λ = 0.
the absence of a positive cosmological constant. As has been noted in [20], it should be emphasised
here too that the geometrical approach considered here is based on three basic postulates, namely,
the confinement of the standard gauge interactions, the existence of quantum gravity in the bulk
and finally, the embedding of the brane world. All other model dependent properties such as warped
metric, mirror symmetries, radion or extra scalar fields, fine tuning parameters like the tension of the
brane and the choice of a junction condition are left out as much as possible in our calculations.
To understanding the behavior of the mean anisotropy parameter in our model, let us consider it
as a function of the volume scale factor
A(v) =
9β2v4/3 + 3λv2 + 3αρ0v
2−γ + 9b20v
2−γx +C
. (79)
The behavior of the anisotropy parameter at the initial state depends on the values of γ and γx. For
an accelerating universe we obtain γx <
. From equation (79), in the limit v → 0 (singular state)
and taking γx <
, we find
A(v) =
, 1 ≤ γ ≤ 2. (80)
Therefore for a brane world scenario with a confining potential, the initial state is always anisotropic.
In our model the behavior of the anisotropy parameter coincides with the standard 4D cosmology
and is different from the brane world models where a delta-function in the energy-momentum tensor
is used [26] to confine the mater on the brane. The behavior of the mean anisotropy parameter of
the Bianchi type I and V geometries are illustrated, for γx = 0.3 and different values of γ, in figure 2.
The behavior of this parameter shows that the universe starts from a singular state with maximum
anisotropy and ends up in an isotropic de Sitter inflationary phase at late times. In figure 3 we have
plotted the deceleration parameter and the anisotropy parameter of the Bianchi type I geometry for
γ = 1 and different values of γx = 0, 0.3, 0.5.
At this point it would be appropriate to compare our model with other forms of dark energy such
as the 4D quintessence. One may consider a 4D effective Lagrangian whose variation with respect
to gµν would result in the dynamical equations (36) compatible with the embedding and with the
confined matter hypotheses [20]. In contrast to the standard model, our model corresponds to an
Einstein-Hilbert Lagrangian which is modified by extrinsic curvature terms. The resulting Einstein
equations are thus modified by the term Qµν . Since, as was mentioned before, this quantity is
independently conserved, there is no exchange of energy between this geometrical correction and the
confined matter source. Such an aspect has one important consequence however; if Qµν is to be related
to dark energy, as we did in this paper, it does not exchange energy with ordinary matter, like the
coupled quintessence models [27]. The coupled scalar field models may avoid the cosmic coincidence
problem with the available data being used to fix the corresponding dynamics and, consequently, the
scalar field potential responsible for the present accelerating phase of the universe.
0.2 0.4 0.6 0.8 1 1.2 1.4
0.2 0.4 0.6 0.8 1 1.2 1.4
Figure 2: Left, the anisotropy parameter of the Bianchi type I universe and right, the same parameter in the Bianchi
type V universe for the de Sitter bulk with γ = 1 (solid line), γ = 4/3 (dashed line), γ = 2 (dot-dashed line), γx = 0.3
and λ = 0.
0.5 1 1.5 2 2.5 3
0.5 1 1.5 2
Figure 3: Left, the deceleration parameter of the Bianchi type I geometry and right, the anisotropy parameter in the
Bianchi type I geometry for the de Sitter bulk with γx = 0 (solid line), γx = 0.3 (dashed line), γx = 0.5 (dot-dashed
line), γ = 1 and λ = 0.
5 Conclusions
In this paper, we have studied an anisotropic brane world model in which the matter is confined to
the brane through the action of a confining potential, rendering the use of any junction condition
redundant. We have shown that in an anisotropic brane world embedded in a constant curvature dS5
bulk the accelerating expansion of the universe can be a consequence of the extrinsic curvature and
thus a purely geometrical effect. The study of the behavior of the anisotropy parameter shows that
in our model the universe starts as a singular state with maximum anisotropy and reaches, for both
Bianchi type I and V space-times, an isotropic state in the late time limit. The study of this scenario
in an inhomogeneous brane will be the subject of a future investigation.
References
[1] A. G. Riess et. al., Astron. J. 116 1009 (1998),
S. Perlmutter et. al., Astrophys. J. 517 565 (1999)
[2] C. L. Bennett et. al., Astrophys. J. Suppl. 148 1 (2003),
D. N. Spergel et. al., Astrophys. J. Suppl. 148 175 (2003)
[3] M. Tegmark et. al., Phys. Rev. D 69 103501 (2004),
M. Tegmark et. al., Astrophys. J. 606 702 (2004)
[4] S. Weinberg, Rev. Mod. Phys. 61 1 (1989),
P. J. E. Peebles and B. Ratra, Rev. Mod. Phys. 75 559 (2003),
T. Padmanabhan, Phys. Rept. 380 235 (2003)
[5] R. R. Caldwell, R. Dave and P. J. Steinhardt, Phys. Rev. Lett. 80 1582 (1998)
[6] P. J. E. Peebles and B. Ratra Astrophys. J. 325 L17 (1988) ,
B. Ratra and P. J. E. Peebles Phys. Rev. D 37 3406 (1988)
[7] P. G. Ferreira and M. Joyce, Phys. Rev. Lett 79 4740 (1997),
E. J. Copeland, A. R. Liddle and D. Wands, Phys. Rev. D 57 4686 (1998) ,
C. Wetterich, Nucl. Phys. B 302 668 (1988)
[8] M. Turner and M. White, Phys. Rev. D 56 4439 (1997),
Z. H. Zhu, M. K. Fujimoto and D. Tatsumi, Astron. Atrophys. 372 377 (2001),
P. S. Corasaniti, M. Kunz, D. Parkinson, E. J. Copeland and B. A. Bassett, Phys. Rev. D 70
083006 (2004)
[9] L. Randall and R. Sundrum, Phys. Rev. Lett. 83 3370 (1999)
[10] L. Randall and R. Sundrum, Phys. Rev. Lett. 83 4690 (1999)
[11] P. Brax and C. van de Bruck, Class. Quant. Grav. 20 R201 (2003),
D. Langlois, Prog. Theor. Phys. Suppl. 148 181 (2003),
R. Maartens, Reference Frames and Gravitomagnetism, ed. J Pascual-Sanchez et. al., (World
Scientific,2001), p.93-119.
[12] P. Binetruy, C. Deffayet and D. Langlois, Nucl. Phys. B 565 269 (2000)
[13] P. Binetruy, C. Deffayet, U. Ellwanger and D. Langlois Phys. Lett. B 477 285 (2000)
[14] R. A. Battye and B. Carter, Phys. Lett. B 509 331 (2001)
[15] V. A. Rubakov and M. E. Shaposhnikov, Phys. Lett. B 125 136 (1983)
[16] S. Jalazadeh and H. R. Sepangi, Class. Quant. Grav. 22 2035 (2005)
[17] M. Heydari-fard, M. Shirazi, S. Jalazadeh and H. R. Sepangi, Phys. Lett. B 640 1 (2006)
[18] T. Shiromizu, K. Maeda and M. Sasaki, Phys. Rev. D 62 024012 (2000),
M. Sasaki, T. Shiromizu and K. Maeda, Phys. Rev. D 62 024008 (2000)
[19] M. D. Maia, E. M. Monte, J. M. F. Maia and J. S. Alcaniz, Class. Quant. Grav. 22 1623 (2005)
[20] M. D. Maia, E. M. Monte and J. M. F. Maia, Phys. Lett. B 585 11 (2004)
[21] M. Turner and M. White, Phys. Rev. D 56 4439 (1997),
T. Chiba, N. Sugiyama and T. Nakamura, Mon. Not. R. Astron. Soc. 289 L5 (1997)
[22] L. P. Eisenhart 1966 Riemannian Geometry, Princeton University Press, Princeton NJ (1966)
[23] M. G. Santos, F. Vernizzi, P. G. Ferreira, Phys. Rev. D 64 063506 (2001),
A. Campos and C. F. Sopuerta, Phys. Rev. D 63 104012 (2001),
A. Campos and C. F. Sopuerta, Phys. Rev. D 64 104011 (2001),
A. A. Coley, Class. Quant. Grav. 19 L45 (2002),
A. A. Coley, Phys. Rev. D 66 023512 (2002)
[24] R. Maartens, V. Sahni and T. D. Saini, Phys. Rev. D 63 063509 (2001),
R. J. van den Hoogen, A. A. Coley and H. Ye, Phys. Rev. D 68 023502 (2003),
R. J. van den Hoogen and J. Ibanez, Phys. Rev. D 67 083510 (2003),
N. Goheer and P. K. S. Dunsby, Phys.Rev. D 66 043527 (2002),
N. Goheer and P. K. S. Dunsby, Phys. Rev. D 67 103513 (2003)
[25] C. M. Chen, T. Harko and M. K. Mak, Phys. Rev. D 64 044013 (2001)
[26] T. Harko and M. K. Mak, Class. Quant. Grav. 21 1489 (2004)
[27] W. Zimdahl, D. Pavon and L. P. Chimento, Phys. Lett. B 521 133 (2001),
D. Tocchini-Valentini and L. Amendola, Phys.Rev. D 65 063508 (2002),
J. M. F. Maia and J. A. S. Lima Phys.Rev. D 65 083513 (2002)
ABSTRACT
  We consider an anisotropic brane world with Bianchi type I and V geometries
where the mechanism of confining the matter on the brane is through the use of
a confining potential. The resulting equations on the anisotropic brane are
modified by an extra term that may be interpreted as the x-matter, providing a
possible phenomenological explanation for the accelerated expansion of the
universe. We obtain the general solution of the field equations in an exact
parametric form for both Bianchi type I and V space-times. In the special case
of a Bianchi type I the solutions of the field equations are obtained in an
exact analytic form. Finally, we study the behavior of the observationally
important parameters.

<|endoftext|><|startoftext|>
Introduction and Main Result
1.1 Motivation
What portion of a manifold can be filled by disjointly embedded balls? Answers to this ball packing question
depend considerably on the manifold being packed, the types of embeddings allowed, the number of balls
allowed, and the radii of balls allowed. In this paper, we consider a packing problem for a specific class of
compact symplectic manifolds. Throughout, we let (M2n, σ) denote a 2n-dimensional compact, connected,
smooth manifold equipped with a symplectic form σ. Before stating our main result, we discuss some of
the foundational results concerning ball packings of symplectic manifolds.
In [9], M. Gromov considered the problem of finding the supremum of the densities
νN (M, σ) = sup
N vol(Br)
volσ(M)
of packings of a symplectic manifold (M2n, σ) by a fixed number N of disjoint symplectic embeddings of
the standard open radius r symplectic ball Br ⊂ C
n. Therein, Gromov found obstructions to a full (also
http://arxiv.org/abs/0704.1036v2
called perfect) packing by too few balls of the same radius; specifically, he established that νN (B1) ≤ N/2
for each 1 < N < 2n. Motivated by techniques from algebraic geometry, D. McDuff and L. Polterovich
established a correspondence between symplectically embedded balls and symplectic blowing-up in [11].
They showed, for instance, that when N is a square, CP2 is fully packable by N disjoint symplectically
embedded balls. In contrast, they also established that νN (CP
2) < 1 for nonsquare 1 < N ≤ 8. In
[1], P. Biran demonstrated that for any symplectic four manifold (M4, σ) with rational cohomology class
[σ] ∈ H2(M, Q), νN(M, σ) = 1 for all sufficiently large N . We refer the reader to [2] for more in this
direction.
In this article, we consider the symplectic packing problem in a toral equivariant setting. Specifically,
we consider toric ball packings of symplectic-toric manifolds (see Definition 2.4). In this particular ball
packing problem, the Euler characteristic χ(M) gives an upper bound on the number of disjoint balls in
any possible packing. For this reason, we allow the balls in an equivariant packing to have varying radii as
opposed to the packings described in the previous paragraph where all balls have the same radius.
The study of toric ball packings of symplectic-toric manifolds was initiated by the first author in [12]
where the homotopy type of the space of equivariant and symplectic embeddings of a fixed ball is described,
and in [13] where the symplectic-toric manifolds admitting a full toric ball packing are classified. The
present work was motivated by the latter:
Theorem 1.1. [13, Thm. 1.7] A symplectic-toric manifold (M2n, σ, ψ) admits a full toric ball packing if
and only if there exists λ > 0 such that
• if n = 2, (M4, σ, ψ) is equivariantly symplectomorphic to either (CP2, λ · σFS) or a product (CP
1, λ · (σFS ⊕ σFS)) (where σFS denotes the Fubini-Study form and these manifolds are equipped
with the standard actions of T2), or
• if n = 1 or n > 2, (M2n, σ, ψ) is equivariantly symplectomorphic to (CPn, λ · σFS) (where σFS
denotes the Fubini-Study form and this manifold is equipped with the standard action of Tn).
1.2 Main Results
Our motivation for this paper was to understand when it is possible to list (up to equivalence) all symplectic-
-toric manifolds admitting a maximal toric ball packing of a specified density, such as in Theorem 1.1. As
it turns out, this is possible only when that density is one. To be more precise, let S2n denote the set
of equivalence classes of 2n-dimensional symplectic-toric manifolds (see Section 2 for definitions). The
maximal density function
Ω2n : S
2n → (0, 1]
associates to each equivalence class [(M2n, σ, ψ)] the largest density from all possible symplectic-toric
packings of M .
Theorem 1.1 classifies the set Ω−12n ({1}). In contrast with Theorem 1.1, we prove the following:
Theorem 1.2. Let S2n denote the set of equivalence classes of 2n-dimensional symplectic-toric manifolds
and let Ω2n : S
2n → (0, 1] be the maximal density function. Then Ω−1({x}) is uncountable for all x ∈
(0, 1).
Theorem 1.2 answers [13, Quest. 5.1]. The proof of Theorem 1.2 follows from the proof of the next
theorem, which asserts that there are uncountable families of equivariantly diffeomorphic symplectic toric
manifolds with the same maximal density that are not equivariantly symplectomorphic.
Theorem 1.3. Let S2n denote the set of equivalence classes of 2n-dimensional symplectic-toric manifolds
and let
Ω2n : S
2n → (0, 1]
be the maximal density function, n ≥ 2. Suppose that (M, σ, ψ) is a symplectic-toric manifold with Euler
characteristic χ(M) ≥ ⌊(n + 2)/2⌋ · ⌈(n + 2)/2⌉ + 1. Then for any ǫ > 0, there exists a constant c > 0
and a family F of equivariantly diffeomorphic symplectic-toric manifolds satisfying
• | volσ′(M
′)− volσ(M)| < ǫ for all (M
′, σ′, ψ′) ∈ F ,
• Ω−12n ({x}) ∩ F is uncountable for all x ∈ (δ − c, δ) or for all x ∈ (δ, δ + c).
To prove Theorem 1.3 we exploit the following:
Proposition 1.4. Let (M2n, σ, ψ) be a symplectic-toric manifold of dimension 2n ≥ 4. The set of symplectic-
-toric packings of (M2n, σ, ψ) has the structure of a convex polytope in RV , where V denotes the Euler
characteristic χ(M) of M . Moreover, (M2n, σ, ψ) admits finitely many maximal toric ball packings.
The proof of Theorem 1.3 essentially follows from the fact that there are certain linear perturbations
of a given symplectic-toric manifold along which the maximal density function is locally convex. The
proof of its local convexity follows from the Brunn-Minkowski inequality once an explicit description of
the maximal density function in terms of the polytope from Proposition 1.4 is given.
In the next section, we collect together preliminary material and reduce Theorem 1.3 to a proposition
concerning polytopes. In the last section we prove these propositions using basic convexity techniques.
Acknowledgements:
We are indebted to Professor Novik for bringing to our attention an explicit example showing that the
density function is surjective during the first author’s visit to the University of Washington, as well as for
providing us with the statement and proof of Lemma 2.16. The first author is grateful to her for stimulating
conversations and for hospitality at the University of Washington. Part of the work of the first author was
funded by Rackham Fellowships from the University of Michigan. His research was partially conducted
while visiting Oberlin College. The second author was partially funded by the Clay Mathematics Institute as
a Liftoff Fellow as well as by an NSF Postdoctoral Fellowship during the period this research was conducted.
2 Reduction to Convex Geometry
Throughout this section (M2n, σ) denotes a 2n-dimensional compact connected smooth manifold equipped
with a symplectic form. We let Tk ∼= (S1)k denote the k-dimensional torus and identify its Lie algebra t
with Rk. This identification is not unique and throughout we make the convention that the identification
comes from the isomorphism Lie(S1) ∼= R given by ∂
7→ 1/2. Using the standard inner product on Rk,
we identify the dual space t∗ with t.
Definition 2.1. A σ-preserving action ψ : Tk ×M → M of a k-dimensional torus is Hamiltonian if for
each ξ ∈ t there exists a smooth function µξ : M → R such that iξMσ = dµξ, and the map
t → C∞(M, R), ξ 7→ µξ,
is a Lie algebra homomorphism. Here, ξM is the vector field on M infinitesimally generating the one
parameter action coming from ξ and the Lie algebra structure on C∞(M, R) is given by the Poisson bracket.
It follows from Definition 2.1 that a σ-preserving action ψ : Tk ×M → M of a k-dimensional torus is
Hamiltonian if and only if there exists a momentum map µM :M → t∗ satisfying Hamilton’s equation
iξMσ = d〈µ
M , ξ〉,
for all ξ ∈ t. Note that a momentum map is well defined up to translation by an element of t∗. Nevertheless,
we will ignore this ambiguity and refer to the momentum map. It is well known that if (M2n, σ) admits an
effective and Hamiltonian action of Tk, then k ≤ n (see for instance [3, Theorem 27.3]). The maximal case,
usually referred to as a symplectic-toric manifold or Delzant manifold, is a triple (M2n, σ, ψ) consisting
of a compact connected symplectic manifold (M, σ) equipped with an effective and Hamiltonian action
ψ : Tn ×M →M .
Example 2.2. Equip the open radius r ball Br ⊂ Cn with the standard symplectic form σ0 =
j dzj∧dzj .
The action Rot : Tn×Br → Br of T
n given by (θ1, . . . , θn)·(z1, . . . , zn) = (θ1 z1, . . . , θn zn) is Hamiltonian.
Its momentum map µBr has components µBrk = |zk|
2. Its image, which we shall denote by ∆n(r), is given
∆n(r) = ConvHull(0, r2 e1, . . . , r
2 en) \ ConvHull(r
2 e1, . . . , r
2 en),
where {ei}
i=1 is the standard basis of R
n. When the dimension n is clear from the context, we shall write
∆(r) = ∆n(r). ⊘
The next two definitions define the packings considered in this paper. These definitions first appeared in
[13] in a slightly different but equivalent form.
Definition 2.3. Let (M2n, σ, ψ) be a symplectic-toric manifold, let Λ ∈ Aut(Tn) and let r > 0. A subset
B ⊂ M is said to be a Λ-equivariantly embedded symplectic ball of radius r if there exists a symplectic
embedding f : Br →M with image B and such that the following diagram commutes:
Tn × Br
Λ×f //
Tn ×M
f // M
We say that the Λ-equivariantly embedded symplectic ball B has center f(0) ∈M .
We shall say that a subset B′ ⊂ M is an equivariantly embedded symplectic ball of radius r′ if there
exists Λ′ ∈ Aut(Tn) such that B′ is a Λ′-equivariantly embedded symplectic ball of radius r′ (although this
is a slight abuse of the standard use of the word “equivariantly”). ⊘
We define the symplectic volume of a subset A ⊂M by volσ(A) :=
Definition 2.4. Let (M2n, σ, ψ) be a symplectic-toric manifold. A toric ball packing of M is a disjoint
union P :=
α∈ABα of equivariantly embedded symplectic balls Bα (of possibly varying radii) in M . The
density Ω(P) of a packing P is defined by Ω(P) := volσ(P)/ volσ(M). The density Ω(M
2n, σ, ψ) of a
symplectic-toric manifold (M2n, σ, ψ) is defined by
Ω(M, σ, ψ) := sup{Ω(P) | P is a toric packing ofM}.
A packing achieving this density is said to be a maximal density packing. If in addition, this density is one,
then (M2n, σ, ψ) is said to admit a full or perfect toric ball packing. ⊘
For a symplectic-toric manifold (M2n, σ, ψ), the number of fixed points of ψ is known to coincide with
the Euler characteristic χ(M) (see e.g. [5]). It follows that a toric ball packing P consists of at most χ(M)
disjoint equivariant balls. By a well known theorem of Atiyah and Guillemin-Sternberg, the image of the
momentum map of a toral Hamiltonian action is the convex hull of the images of the fixed points (see for
instance [3, Theorem 27.1]). The images of momentum maps for symplectic-toric manifolds are a particular
class of polytopes. Recall that an n-dimensional polytope is simple if there are precisely n edges meeting
at each one of its vertices.
Definition 2.5. A simple n-dimensional convex polytope ∆ ⊂ Rn is said to be Delzant if for each vertex
v, the edges meeting at v are all of the form v + ti ui where ti > 0 and {u1, . . . , un} define a basis of the
Z-module Zn. ⊘
A polytope is describable as the intersection of closed half-spaces
{x ∈ Rn | 〈x, ui〉 ≥ λi},
where the vector ui is an inward pointing normal vector to the i
th facet of ∆ and each λi is a real scalar. In
this notation, the polytope ∆ is Delzant if and only if there are precisely n facets incident to each vertex of
∆ and the inward pointing normals to these facets u1, . . . , un can be chosen to be a Z-basis of Z
Given a symplectic-toric manifold M , the symplectic-toric manifolds obtained from M by scaling the
symplectic form, changing the time parameter in the acting torus by an automorphism, and any others that
are equivariantly symplectomorphic to one of these will clearly have the same maximal density. Therefore,
for the purpose of this paper we will say that two symplectic-toric manifolds (M1, σ1, ψ1) and (M2, σ2, ψ2)
are equivalent if there exists an automorphism Λ ∈ Aut(Tn) ∼= GL(n, Z), a positive number λ > 0, and a
symplectomorphism h : (M1, σ1) → (M2, λ · σ2) such that the following diagram commutes:
Tn ×M1
Λ×h //
Tn ×M2
h // M2
We recall the result of Delzant [4]:
Theorem 2.6. Suppose that (M1, σ1, ψ1) and (M2, σ2, ψ2) are two 2n-dimensional symplectic-toric man-
ifolds with momentum maps µM1 and µM2 respectively. Then there exists a (ψ1, ψ2)-equivariant symplecto-
morphism h : (M1, σ1) → (M2, σ2) such that µ
M1 = µM2 ◦ h if and only if µM1(M1) = µ
M2(M2).
In view of this theorem, there is a natural equivalence relation one can put on the set of Delzant polytopes
so that momentum maps will induce a bijective correspondence between equivalence classes of symplectic-
-toric manifolds as above and equivalence classes of Delzant polytopes. To be more specific, first note that
scaling Rn leaves invariant the Delzant polytopes. Define two Delzant polytopes ∆1,∆2 ⊂ R
n to be in the
same projective class if there exists λ > 0 such that λ∆1 = ∆2. The group AGL(n, Z) consisting of affine
transformations of Rn with linear part in GL(n, Z) acts on the set of projective classes of Delzant polytopes
in Rn. For the purpose of this paper we say that two Delzant polytopes ∆1 and ∆2 are equivalent if the
projective classes of ∆1 and ∆2 are in the same AGL(n, Z) orbit. By applying Theorem 2.6, it is standard
to show that there is a bijective correspondence between equivalence classes of symplectic-toric manifolds
and equivalence classes of Delzant polytopes as defined here.
We exploit this correspondence in order to reduce Theorem 1.3 and Proposition 1.4 to propositions
concerning packing Delzant polytopes. The next few definitions are translations of the above definitions
into the appropriate definitions concerning Delzant polytopes. Again, they first appeared in [13] in a slightly
different but equivalent form.
Definition 2.7. Let ∆ be a Delzant polytope. A subset Σ ⊂ ∆ is said to be an admissible simplex of radius
r with center at the vertex v ∈ ∆ if Σ is the image of ∆(r1/2) by an element of AGL(n, Z) which takes the
origin to v and the edges of ∆(r1/2) to the edges of ∆ meeting v. For a vertex v ∈ ∆, we put
rv := max{r > 0 | ∃ an admissible simplex of radius r with center v}. ⊘ (1)
Remark 2.8. In view of Example 2.2, the simplex ∆(r1/2) may be identified with the set obtained by
removing from ConvHull(0, re1, . . . , ren) the facet not containing the origin. For this reason we say that
AGL(n, Z) images of ∆(r1/2) as in the above definition have radius r instead of radius r1/2. ⊘.
We denote the Euclidean volume of a subset A ⊂ ∆ by voleuc(A).
Definition 2.9. Let ∆ be a Delzant polytope. An admissible packing of ∆ is a disjoint union P :=
α∈A Σα
of admissible simplices (of possibly varying radii) in ∆. The density Ω(P) of a packing P is defined by
Ω(P) := voleuc(P)/ voleuc(∆). The density Ω(∆) of a Delzant polytope ∆ is defined by
Ω(∆) := sup{Ω(P) | P is an admissible packing of∆}.
A packing achieving this density is said to be a maximal density packing. If in addition, this density is one,
then ∆ is said to admit a full or perfect packing. ⊘
The next lemma shows that admissible simplices in ∆ are parametrized by their centers and radii. The
rational or SL(n, Z)-length of an interval I ⊂ Rn with rational slope is the unique number l := lengthQ(I)
such that I is AGL(n, Z)-congruent to an interval of length l on a coordinate axis. For a vertex v in a
Delzant polytope, we denote the n edges leaving v by e1v, . . . , e
v . By Definition 2.5, each e
v is of the form
v+ tiv u
v with t
v > 0 and {u
i=1 defining a Z-basis of Z
n. In this notation, we have that lengthQ(e
v) = t
Lemma 2.10. Let ∆ be a Delzant polytope. Then for each vertex v ∈ ∆,
rv = min{lengthQ(e
v), . . . , lengthQ(e
v )}.
There is an admissible simplex Σ(v, r) of radius r with center v if and only if 0 ≤ r ≤ rv. Moreover this
admissible simplex is unique and voleuc(Σ(v, r)) = r
n/n!.
Proof. We first argue the uniqueness of an admissible simplex Σ of radius r and center v, assuming its
existence. Suppose Σ1 and Σ2 were two such. By definition, there exist two affine transformationsA1, A2 ∈
AGL(n, Z) satisfying A1(∆(r
1/2)) = Σ1 and A2(∆(r
1/2)) = Σ2. As both Σi are centered at v, both Ai
have translational part given by v. Write Ai(·) = Λi(·)+ v with Λi ∈ GL(n, Z). By the Delzant property of
∆, c.f. Definition 2.5, the automorphisms Λ1 and Λ2 both take the standard basis {ei}
i=1 of Z
n bijectively
onto the basis {uiv}
i=1 of Z
n as unordered sets. Therefore Λ−11 Λ2 leaves invariant the standard basis of Z
as an unordered set and hence leaves ∆(r1/2) invariant as a set. It follows that Σ1 = Σ2. Next we argue
existence.
As above, write eiv = v + t
v with {u
i=1 forming a Z-basis of Z
n. Let Λ ∈ GL(n, Z) be the
automorphism of Zn defined by Λ(ei) = u
v, with {ei}
i=1 the standard basis of Z
n. The affine transfor-
mation A : Rn → Rn defined by A(x) = Λ(x) + v satisfies A(t ei) = t u
v + v for each t > 0. There-
fore, A(∆(r1/2)) defines an admissible simplex for ∆ with center v if and only if r ≤ min{t1v, . . . , t
min{lengthQ(e
v), . . . , lengthQ(e
v )}, concluding the proof of existence.
Finally, for each 0 ≤ r ≤ rv, let Σ(v, r) be the unique admissible simplex with radius r and center v
guaranteed by the previous two paragraphs. Since elements in AGL(n, Z) preserve Euclidean volume, we
have voleuc(Σ(v, r)) = voleuc(∆(r
1/2)) = rn/n!, completing the proof.
Lemmas 2.13 and 2.14 below reduce the problem of analyzing the densities of toric ball packings of
symplectic-toric manifolds to analyzing the densities of admissible packings of Delzant polytopes. These
lemmas appear in [12] and [13] respectively, in slightly different form. We include the arguments here for
the reader’s convenience. We first recall some definitions and a result due to Y. Karshon and S. Tolman.
Let (N2n, σN) denote a connected (and not necessarily compact) symplectic manifold equipped with
an effective Hamiltonian action of Tn with momentum map µN : N → t∗. Suppose that T ⊂ t∗ is an open
and convex subset containing µN(N) with the property that µN : N → T is a proper map. The quadruple
(N, σN , µN , T ) is said to be a proper Hamiltonian Tn-manifold. For a subgroup K ⊂ Tn, denote the
fixed point set for the Tn action on N by NK . A proper Hamiltonian Tn-manifold (N, σN , µN , T ) is said
to be centered at α ∈ T provided that α is in the momentum map image of every component of NK for
all subgroups K ⊂ Tn. For a fixed point p ∈ NT
, there exist isotropy weights η1, . . . , ηn ∈ t
∗ such that
the induced linear symplectic action on Tp(N) is isomorphic to the action on (C
n, σ0) generated by the
momentum map
(x) = µN(p) +
|zj |
2ηj .
For a symplectic-toric manifold M with fixed point p ∈ MT
, and corresponding vertex v := µ(p) ∈ ∆,
and edges eiv = v + t
v (i = 1, . . . , n) emanating from v , the Z-basis {u
i=1 of Z
n coincides with the
set of isotropy weights at p.
Proposition 2.11. [10, Prop. 2.8] Let (N2n, σN , µN , T ) be a proper Hamiltonian Tn-manifold. Assume
that N is centered about α ∈ T and that (µN)−1({α}) consists of a single fixed point p. Then N is
equivariantly symplectormorphic to
{z ∈ Cn |α+
|zj |
2ηj ∈ T },
where η1, . . . , ηn ∈ t
∗ are the isotropy weights at p.
Remark 2.12. The reader who consults [10] will find an additional factor of π in the definition of isotropy
weights and in the statement of their Proposition 2.8. This factor does not appear in the present work
because of the particular identification chosen between t and Rn. ⊘
Lemma 2.13. Let (M2n, σ, ψ) be a symplectic-toric manifold with momentum map µM : M → t∗ and
associated Delzant polytope ∆ := µM(M). Let B ⊂ M be an equivariantly embedded symplectic ball of
radius r and center p ∈ M . Then µM(B) is an admissible simplex of radius r2 in ∆ with center µM(p).
Conversely, if Σ ⊂ ∆ is an admissible simplex of radius r, then there exists an equivariantly embedded
symplectic ball B ⊂M of radius r1/2 satisfying µM(B) = Σ.
Proof. First suppose that B ⊂ M is an equivariantly embedded ball of radius r. By definition, there is an
automorphism Λ ∈ Aut(Tn) ∼= GL(n, Z) and a symplectic embedding f : (Br, σ0) → (M, σ) with image
B such that the following diagram commutes:
Tn × Br
Λ×f //
Tn ×M
f // M
We denote by ξBr and ξM the vector fields on Br and M infinitesimally generating the actions of the one
paramater group coming from ξ ∈ t. Fix z ∈ Br and a tangent vector v ∈ TzBr.
Note that from the definition of the momentum maps µM and µBr , commutativity of the above diagram,
and the fact that f ∗σ = σ0, we have the following sequence of equalities:
〈dµBrz (v), ξ〉µBr (z) = σ0(v, ξBr)z
= σ(dfz(v), dfz(ξBr))f(z)
= σ(dfz(v), (Λ ξ)M)f(z)
= 〈dµMf(z)(dfz(v)), Λ ξ〉µM (f(z))
= 〈Λt dµMf(z)(dfz(v)), ξ〉µBr (z). (2)
By equation (2) and the chain rule we obtain that for all z ∈ Br and v ∈ TzBr,
dµBrz (v) = d(Λ
t ◦ µM ◦ f)z(v). (3)
As Br is connected, (3) implies that there exists x
′ ∈ Rn such that µBr + x′ = Λt ◦ µM ◦ f as maps
Br → R
n. Letting x = (Λt)−1(x′) yields commutativity of the following diagram:
(Λt)−1+x
// ∆M
. (4)
It follows from commutativity of this diagram that x = µM(f(0)) = µM(p) so that µ(B) = µ(f(Br)) is an
admissible simplex of radius r2 and center µM(p), completing the proof of the first statement.
Next suppose that Σ ⊂ ∆ is an admissible simplex of radius r. By applying a translation, we may
assume that Σ is centered at the origin. Identify Σ with the set
ConvHull(0, r η1, . . . , r ηn) \ ConvHull(r η1, . . . , r ηn).
Let T ⊂ t∗ be the the unique open half space of t∗ containing Σ with bounding hyperplane containing
ConvHull(r η1, . . . r ηn). Denote by σ
N , µN the restrictions of the symplectic form σ, and of the momen-
tum map µM , to the open submanifold N := (µM)−1(Σ) ⊂ M . The quadruple (N, σN , µN , T ) is a
proper Hamiltonian Tn manifold centered at 0 ∈ T . It now follows from Proposition 2.11, that (N, σN ) is
equivariantly symplectomorphic to
{z ∈ Cn |
|zj |
2ηj ∈ T } = Br1/2 .
In other words, the set N ⊂ M is an equivariantly embedded symplectic ball of radius r1/2 satisfying
µM(N) = Σ (c.f. [13, Lem 2.3] for an explicit verification).
Lemma 2.14. Let (M2n, σ, ψ) be a symplectic-toric manifold with momentum map µM :M → Rn and as-
sociated Delzant polytope ∆ := µM(M). Then for each toric ball packing P of M , µM(P) is an admissible
packing of ∆ satisfying Ω(P) = Ω(µM(P)). Moreover, given an admissible packing Q of ∆, there exists a
toric ball packing P of M satisfying µM(P) = Q.
Proof. Let P be a toric ball packing ofM . For each equivariant symplectic ball B in the packing P , µM(B)
is an admissible simplex in ∆ by the previous lemma. Since the fibers of the momentum map µM :M → ∆
are connected (see for instance [3, Theorem 27.1]), disjoint equivariant symplectic balls in the packing P
are sent to disjoint admissible simplices in ∆. Hence, µM(P) is a toric packing of ∆. By the Duistermaat-
-Heckman Theorem (see for instance [3, Theorem 30.3]) the push forward of symplectic volume under the
momentum map satisfies µM∗ (volM) = K(n) voleuc, where K(n) > 0 is a dimensional constant. It follows
that Ω(P) = Ω(µM(P)). It remains to argue that given an admissible packing Q of ∆, there exists a toric
ball packing P of M satisfying µM(P) = Q. By the Lemma 2.13, for each admissible simplex Σ there
exists an equivariant symplectic ball B ⊂ M with µM(B) = Σ. Choosing one such equivariant symplectic
ball for each admissible simplex in Q defines a disjoint collection P of equivariant symplectic balls mapping
onto Q under the momentum map. Hence P is a toric packing of M satisfying µM(P) = Q.
Before concluding this section by reformulating our main results in terms of polytopes, we must first
introduce the complete regular n–dimensional fan associated to an n-dimensional Delzant polytope ∆, as
in [4, Sec. 5]. As above, write
{x ∈ Rn | 〈x, ui〉 ≥ λi},
where F is the number of facets of ∆ and ui is the unique primitive integral normal vector to the i
th facet.
For each face ∆′ of ∆ of codimension k, there is a unique multi-index I∆′ of length k, I∆′ = {i1, . . . , ik},
1 ≤ i1 < . . . < ik ≤ F , such that
∆′ = {x ∈ Rn | 〈x, uj〉 = λj , ∀j ∈ I∆′}.
Letting σ∆′ denote the cone in R
n generated by the vectors {uj | j ∈ I∆′}, the complete regular n-
-dimensional fan associated to ∆ is given by {σ∆′ |∆
′ is a face of∆}. For our purposes here, we state
the well known fact that if two Delzant polytopes have the same associated fan, their associated symplectic-
-toric manifolds are equivariantly diffeomorphic. This is a standard fact that follows from the construction
of a symplectic-toric manifold starting from a given Delzant polytope.
It now follows from Lemma 2.13 and Lemma 2.14 that proving Theorem 1.2 is reduced to establishing
the following proposition:
Proposition 2.15. Let Dn denote the set of equivalence classes of n-dimensional Delzant polytopes and
Ω : Dn → (0, 1]
be the maximal density function, n ≥ 2. Then Ω−1({x}) is uncountable for all x ∈ (0, 1).
Recall that the number of fixed points of the Tn-action on M equals the Euler Characteristic of M and
the number of vertices of the momentum polytope µ(M), c.f. [5]. We are grateful to Professor Novik for
the following observation:
Lemma 2.16. An n-dimensional Delzant polytope ∆ with at least ⌊(n+2)/2⌋ · ⌈(n+2)/2⌉+1 vertices has
at least n + 3 facets. Moreover, this bound is sharp in the sense that there exists an n-dimensional Delzant
polytope with ⌊(n+ 2)/2⌋ · ⌈(n+ 2)/2⌉ vertices and n+ 2 facets.
Proof. Indeed, to see that a fewer number of vertices is not enough, let ∆ be the direct product of two
regular simplexes, one of dimension ⌊n/2⌋ and another one of dimension ⌈n/2⌉. Their product is an n-
dimensional Delzant polytope that has ⌊(n+ 2)/2⌋ · ⌈(n+ 2)/2⌉ vertices and only (n+ 2) facets. It is well
known, see e.g. [7, pp. 98-100] for the proof of (the dual) statement, that ∆ has the maximal number of
vertices amongst all simple n-polytopes with n+ 2 facets.
It then follows from Lemma 2.13, Lemma 2.14 and Lemma 2.16 that proving Theorem 1.3 is reduced
to establishing the following proposition:
Proposition 2.17. Let Dn denote the set of equivalence classes of n-dimensional Delzant polytopes and
Ω : Dn → (0, 1]
be the maximal density function, n ≥ 2. Suppose that ∆ is a Delzant polytope having at least n + 3 facets
and let Ω(∆) := δ ∈ (0, 1). Then for any ǫ > 0, there exists a constant c > 0 and a family F of Delzant
polytopes satisfying
• the polytopes in F determine a common fan,
• | voleuc(∆
′)− voleuc(∆)| < ǫ for all ∆
′ ∈ F ,
• Ω−1({x}) ∩ F is uncountable for all x ∈ (δ − c, δ) or for all x ∈ (δ, δ + c).
Similarly we have reduced proving Proposition 1.4 to showing:
Proposition 2.18. Let ∆ ⊂ Rn be a Delzant polytope. The set of admissible packings of ∆ has the structure
of a convex polytope in RV , where V is the number of vertices of ∆. Moreover, ∆ admits finitely many
maximal density packings.
We prove these propositions in the next section.
3 Proofs of propositions 2.15, 2.17, and 2.18.
In this section, we prove Propositions 2.15, 2.17, and 2.18 using convexity arguments. First we recall some
preliminary notions. For a set A ⊂ Rn, denote the closure of A by A. Now suppose that A is a convex set.
A function f : A→ R is said to be convex if for all x1, x2 ∈ A and t ∈ (0, 1),
f((1− t) x1 + t x2) ≤ (1− t) f(x1) + t f(x2)
and strictly convex if the inequality is always strict. Similarly, the function f is said to be concave if for all
x1, x2 ∈ A and t ∈ (0, 1),
f((1− t) x1 + t x2) ≥ (1− t) f(x1) + t f(x2)
and strictly concave if the inequality is always strict. It follows from the definitions that if f is a convex
function on A and g is a positive concave function on A, then f/g is a convex function which is strict if one
of f or g is strict. Moreover, if f1, . . . , fk are convex functions onA then so is the function max{f1, . . . , fk}.
We let C(Rn) denote the space of compact convex subsets of Rn and endow it with the Hausdorff metric
dH given by
dH(A, B) := inf{ǫ > 0 |A ⊂ Nǫ(B) andB ⊂ Nǫ(A)},
where Nǫ(X) denotes the open ǫ-neighborhood of a subset X ⊂ R
n. A compact convex set A ∈ C(Rn)
with non-empty interior is said to be a convex body. If λ > 0 and A and B are convex bodies then so are the
λA := {λ a | a ∈ A} A+B = {a+ b | a ∈ A and b ∈ B}.
Subsets A, B ⊂ Rn are said to be homothetic if there exists v ∈ Rn and λ > 0 such that λA + {v} = B.
We recall the celebrated Brunn-Minkowski inequality (see [8] for a detailed survey):
Theorem 3.1 (Brunn-Minkowski). Let A, B be convex bodies in Rn and 0 < λ < 1. Then
vol1/neuc ((1− λ)A+ λB) ≥ (1− λ) vol
euc (A) + λ vol
euc (B),
with equality if and only if A and B are homothetic.
In the remainder of this section, we let ∆n =
i=1{x ∈ R
n | 〈x, ui〉 ≥ λi} denote an n-dimensional
Delzant polytope with F facets and V vertices. We enumerate the vertices v1, v2, . . . , vV and facets
F1, . . . ,FF . When two vertices vj and vk are adjacent in ∆, we denote their common edge by ej,k. By
the Delzant property, each vertex vi is the unique intersection point of n facets,
F jvi = {x ∈ R
n | 〈x, uvi〉 = λvi} ∩∆, j = 1, . . . , n,
where the inward normal vectors ujvi to the j
th facet F jvi collectively define a Z-basis of Z
n. In particular, ∆
is simple and there are precisely n edges ejvi , j = 1, . . . n leaving each vertex vi of ∆. We shall denote the
set of admissible packings of ∆ by AP(∆).
Proposition 2.18. Let ∆ ⊂ Rn be a Delzant polytope. The set AP(∆) of admissible packings of ∆ has the
structure of a polytope in RV , where V is the number of vertices of ∆. Moreover, ∆ admits finitely many
maximal toric ball packings.
Proof. Each packing Q ∈ AP(∆) consists of a disjoint union
i=1Σ(vi, Ri(Q)) of admissible simplices
Σ(vi, Ri(Q)) centered at the vertex vi with (possibly zero) radius Ri(Q). Define the map
R : AP(∆) → ΠVi=1[0, rvi], Q 7→ (R1(Q), . . . , RV (Q)).
By Lemma 2.10, the map R is injective so that we can identify AP(∆) with its image in ΠVi=1[0, rvi ].
Note that R is not surjective as the admissible simplices with given radii (x1, . . . , xV ) ∈ Π
i=1[0, r
vi] will
not in general be disjoint. We must argue that the image set of R is precisely the solution set to finitely
many linear inequalities.
As a first step, we give a criterion for admissible simplices to be disjoint in terms of their intersections
along the edges of ∆. To this end, let F denote a finite family of admissible simplices with pairwise distinct
centers and fix an admissible simplex Σ from the family. Let v denote the center of Σ.
Disjointness Condition: For Σ to be disjoint from the rest of the familyF , it is a necessary and sufficient
condition that its closure Σ intersects the closure of the other admissible simplices in the family in at most
one point in each of the edges ejv, j = 1, . . . , n.
To see that the above disjointness condition is necessary, suppose that the closure of another admissible
simplex in F , say Σ′, intersects Σ in more than a single point along some edge ejv. Then Σ ∩ Σ
′ ∩ ejv is a
convex subset of ejv with at least two points and is therefore a closed subsegment e ⊂ e
i with nonempty
interior. The interior of e is contained in both Σ and Σ′, whence Σ ∩ Σ′ 6= ∅. Next we argue that the
above condition is sufficient. To see this, let Σ′ be another admissible simplex in the family F . Denote
by v′ ∈ ∆ the center of Σ′ and let xjv′ denote the unique point in the set Σ
′ ∩ e
v′ \ Σ
′ ∩ e
v′ for each
j = 1, . . . , n. If the above condition holds, then {v′, x1v′ , . . . , x
v′} ⊂ ∆ \Σ. But since ∆ \Σ is a convex set
and Σ′ = ConvHull(v′, x1v′ , . . . , x
v′), it follows that Σ
′ ⊂ ∆ \ Σ, concluding the proof of sufficiency.
For distinct vertices vi, vj ∈ {v1, . . . , vV }, define Li,j by
Li,j =
lengthQ(ei,j) if vi and vj are adjacent
rvi + rvj otherwise
It follows from (1) and Lemma 2.10 and the preceding disjointness condition that a point (x1, . . . , xV ) ∈
ΠVi=1[0, rvi] lies in the image of R if and only if the equations
xi + xj ≤ Li,j ,
hold for all i 6= j ∈ {1, . . . , V }. Therefore,
R(AP(∆)) =
i 6=j∈{1,...,V }
{(x1, . . . , xV ) ∈ R
≥0 | xi + xj ≤ Li,j}. (5)
By Lemma 2.10, the density function in the coordinates of RV≥0 is given by
Ω((x1, . . . , xV )) = (Σ
i )/(n! voleuc(∆)).
As n > 1, Ω is a strictly convex function on AP(∆) so that its maximum value can only be obtained at its
vertices, establishing the last part of the proposition.
In view of the injectivity of R, we will henceforth identify AP(∆) with the polytope R(AP(∆)).
Before proving Proposition 2.17, we need to set up some additional notation and prove a lemma. A
natural way to perturb the Delzant polytope ∆ is by perturbing its defining linear equations. Define the map
F → C(Rn), s = (s1, . . . , sF ) 7→ ∆s (6)
∆s :=
{x ∈ Rn | 〈x, ui〉 ≥ λi + s
This map is continuous and has image in the set of polytopes with not more than F facets (we regard the
empty set as a polytope). For small enough r > 0, the polytopes ∆s with s ∈ B(0, r) ⊂ R
F still have
F facets, are Delzant, and all determine the same fan. Let B∆ denote the largest open ball centered at the
origin with these three properties and let Dn denote the set of equivalence classes of Delzant polytopes. The
map B∆ → D
n induced by (6) induces an equivalence relation on B∆ by declaring points in the fibers to
be equivalent. We denote the set of equivalence classes by B̂∆ and endow it with the quotient topology. As
GL(n, Z) is discrete, the dimension of a fixed equivalence class of Delzant polytopes is n+1, the dimension
of the group of homotheties of Rn. It follows that the dimension of B̂∆ satisfies dim(B̂∆) ≥ F − (n+ 1).
Define the function
Ω : B∆ → (0, 1]
by Ω(s) := Ω(∆s), and for s1, s2 ∈ B∆, define the function
Ωs1, s2 : [0, 1] → (0, 1]
by Ωs1, s2(t) := Ω((1 − t) s1 + t s2) = Ω(∆(1−t) s1+t s2).
Lemma 3.2. The function Ω : B∆ → (0, 1] is continuous. For distinct s1, s2 ∈ B∆, there exists a suitably
small ǫ(s1, s2) > 0 such that the restriction Ω
s1, s2
|[0, ǫ] : [0, ǫ] → (0, 1] is convex. Furthermore, if ∆s1 and
∆s2 are not homothetic, then Ω
s1, s2
|[0, ǫ] is strictly convex.
Proof. Define || · ||n : R
≥0 → R by ||(x1, . . . , xV )||n = (
i=1 x
1/n. It is well known that || · ||n is a strictly
convex function. For s ∈ B∆, we have that
Ω1/n(s) =
max || · ||n|AP(∆s)
(n!)1/n vol1/neuc (∆s)
Each of the vertices v1, . . . , vV of ∆ are defined as the unique solution to the linear system:
〈ujvi, vi〉 = λ
j = 1, . . . n.
Similarly, ∆s has V vertices v1(s), . . . , vV (s), each defined by the linear system:
〈ujvi, vi(s)〉 = λ
+ sjvi j = 1, . . . n.
Hence, the vertices define affine maps vi : B∆ → R
n. It follows that the edges of ∆s and their rational
lengths also vary linearly with s ∈ B∆ and that, as we have already remarked, s 7→ ∆s defines a continuous
map B∆ → C(R
By the proof of Proposition 2.18, c.f. expression (5),
AP(∆s) =
i 6=j∈{1,...,V }
{(x1, . . . , xV ) ∈ R
≥0 | xi + xj ≤ Li,j(s)},
where
Li,j(s) =
lengthQ(ei,j(s)) if vi(s) and vj(s) are adjacent;
rvi(s) + rvj (s) otherwise,
rvi(s) = min{lengthQ(e
(s)), . . . , lengthQ(e
(s))}.
Hence, the defining equations of AP(∆s) vary continuously with s ∈ B∆ so that s 7→ AP(∆s) defines a
continuous map B∆ → C(R
≥0). Since vol
1/n and max || · ||n define continuous maps on C(R
n) and C(RV≥0),
it follows that Ω is continuous.
Now fix s1, s2 ∈ B∆ and t ∈ (0, 1). We first claim that
∆(1−t) s1+t s2 = (1− t)∆s1 + t∆s2.
Note that
∆(1−t) s1+t s2 = ConvHull(v1((1− t) s1 + t s2), . . . , vV ((1− t) s1 + t s2)).
Since vi((1− t) s1 + t s2) = (1− t) vi(s1) + t vi(s2),
{v1((1− t) s1 + t s2), . . . , vV ((1− t) s1 + t s2)} ⊂ (1− t)∆s1 + t∆s2 ,
whence
∆(1−t) s1+t s2 ⊂ (1− t)∆s1 + t∆s2 .
Now consider the (n+ 1)-dimensional polytope
∆(s1, s2) :=
{(x, t) ∈ Rn × [0, 1] | 〈x, ui〉 ≥ λi + (1− t) s1 + t s2}.
Let Ht = ∆(s1, s2) ∩ {xn+1 = t} and note that Ht is naturally identified with ∆(1−t) s1+t s2 . Now, if
(x, 0) ∈ H0, (y, 1) ∈ H1, and t ∈ (0, 1), then (1 − t) (x, 0) + t (y, 1) ∈ Ht, concluding the proof of the
claim.
By Theorem 3.1, the map [0, 1] → R given by t 7→ vol1/neuc (∆(1−t) s1+t s2) is concave and strictly concave
if and only if ∆s1 and ∆s2 are not homothetic. To conclude the proof of the Lemma, we must argue that there
is a suitably small ǫ = ǫ(s1, s2) > 0 such that the map [0, 1] → R given by t 7→ max || · ||n|AP(∆(1−t) s1+t s2)
is convex when restricted to the interval [0, ǫ). Let p1, p2, . . . , pk ∈ R
V denote the vertices of AP(∆s1).
It follows from the description above that for t sufficiently small AP(∆(1−t) s1+t s2) also has k vertices
v1(t), . . . , vk(t) ∈ R
V and that the maps t 7→ vi(t) define (possibly constant) line segments in R
V . By
convexity of || · ||n,
max || · ||n|AP(∆(1−t) s1+t s2 ) = max{||v1(t)||n, . . . , ||vk(t)||n},
and the result follows.
We are now ready to prove:
Proposition 2.17. Let Dn denote the set of equivalence classes of n-dimensional Delzant polytopes and
Ω : Dn → (0, 1]
be the maximal density function, n ≥ 2. Suppose that ∆ is a Delzant polytope having at least n + 3 facets
and let Ω(∆) := δ ∈ (0, 1). Then for any ǫ > 0, there exists a constant c > 0 and a family F of Delzant
polytopes satisfying
• the polytopes from F determine a common fan,
• | voleuc(∆
′)− voleuc(∆)| < ǫ for all ∆
′ ∈ F ,
• Ω−1({x}) ∩ F is uncountable for all x ∈ (δ − c, δ) or for all x ∈ (δ, δ + c).
Proof. Let ∆ be a Delzant polytope with with at least n + 3 facets and with Ω(∆) = δ and let ǫ > 0. By
continuity of the volume function with respect to the Hausdorff metric, there is a suitably small connected
neighborhood N ⊂ B∆ of the origin for which | voleuc(∆s)− voleuc(∆)| < ǫ for each s ∈ N . We define the
desired family F by
F = {∆s | s ∈ N}
and remark that by construction all ∆′ ∈ F determine the same fan. Therefore, it remains to establish the
third item of the proposition.
As dim(N) ≥ n + 3 and since the space of homotheties of Rn has dimension n + 1, there exists
s ∈ N\{0} such that ∆s is not homothetic to∆. By Lemma 3.2, there exists ǫ
′ > 0 such that Ω
0, s : [0, ǫ
[0, 1) is strictly convex and therefore Ω: N → (0, 1] is not the constant map. Since Ω is continuous and N
is connected, there exists c > 0, such that (δ, δ + c) ⊂ Ω(N) or (δ − c, δ) ⊂ Ω(N).
Suppose that (δ − c, δ) ⊂ Ω(N). Note that Ω : B∆ → (0, 1] descends to a continuous map Ω̂ : B̂∆ →
(0, 1]. Suppose that for some x ∈ (δ − c, δ), Ω̂−1({x}) is countable. As dim(B̂∆) ≥ 2, B̂∆ \ Ω̂
−1({x})
is connected. Hence, Ω̂(B̂∆ \ Ω̂
−1({x})) is connected, a contradiction. In case that (δ, δ + r) ⊂ Ω(N) an
analogous argument concludes the proof.
Remark 3.3. The proof of Proposition 2.17 above also establishes the following: suppose that ∆ is an
n-dimensional Delzant polytope with at least n+ 3 facets and with Ω(∆) := δ ∈ (0, 1). Also suppose that
Ω(B∆) contains an open neighborhood (δ−c, δ+c) of δ. Then for each x ∈ (δ−c, δ+c), Ω
−1({x}) ⊂ Dn
is uncountable.
Proposition 2.15. Let Dn denote the set of equivalence classes of n-dimensional Delzant polytopes and
Ω : Dn → (0, 1]
be the maximal density function, n ≥ 2. Then Ω−1({x}) is uncountable for all x ∈ (0, 1).
Proof. By Remark 3.3, it suffices to show that there exists an n-dimensional Delzant polytope ∆ with at
least n + 3 facets for which Ω(B∆) = (0, 1). Consider the polytope ∆(ǫ1, ǫ2) obtained by removing from
the standard n-dimensional simplex an admissible simplex of radius ǫi at the vertex ei for i = 1, 2. For
compatible choices of ǫ1 and ǫ2, we obtain a Delzant polytope with n + 3 facets. Fix ǫ
1 and ǫ
2 both very
close to zero and let E ⊂ [0, 1]2 be the set of pairs (x, y) for which ∆(x, y) = ∆s for some s ∈ B∆(ǫ0
By definition, Ω({∆(x, y) | (x, y) ∈ E}) ⊂ Ω(B∆(ǫ01, ǫ02)), a connected subset of the interval (0, 1). We
conclude by showing Ω({∆(x, y)|(x, y) ∈ E}) contains deleted open neighborhoods of 0 and 1 in [0, 1].
To obtain a deleted neighborhood of 1 we remark that as (x, y) → (0, 0), (x, y) ∈ E and Ω(∆(x, y)) → 1.
Similarly, to obtain a deleted neighborhood of 0 we remark that as y → 0, the pairs of the form (1− 2y, y)
are in E and Ω(1− 2y, y) → 0.
We thank Professor Novik for bringing the example above to our attention. We conclude with the
following:
Question. Let [(M2n, σ, ψ)] be an equivalence class of symplectic-toric manifolds. Is there a formula
for the number of different maximal toric packings of M in terms of its equivariant symplectic invariants?
References
[1] P. Biran: A stability property of symplectic packing, Invent. Math. 136 (1999), no. 1, 123–155.
[2] P. Biran: From symplectic geometry to algebraic geometry and back, Proceedings of the 3’rd Eu-
ropean Congress of Mathematics (Barcelona 2000), Vol. II, 507-524, Progr. Math, 202, Birkhauser
2001.
[3] A. Cannas da Silva: Lectures in symplectic geometry, Springer–Verlag, Berlin, (2000).
[4] T. Delzant: Hamiltoniens périodiques et image convexe de l’application moment, Bull. Soc. Math.
France, 116 (1988) 315–339.
[5] W. Fulton: Introduction to toric varieties, Princeton Univ. Press, Ann. Math. Stud. 131, (1993).
[6] V. Guillemin: Moment Maps and Combinatorial Invariants of Hamiltonian T n-spaces. Birkhäuser,
Boston, etc., 1994.
[7] B. Grünbaum, Convex polytopes, Second edition, (prepared and with a preface by V. Kaibel, V. Klee
and G. M. Ziegler), Graduate Texts in Mathematics, 221, Springer-Verlag, New York, 2003.
[8] R. Gardner: The Brunn-Minkowski inequalty, Bull. Amer. Math. Soc. 39(2002), 355-405.
[9] M. Gromov: Pseudo-holomorphic curves in symplectic manifolds, Invent. Math. 82 (1985), 307-
[10] Y. Karshon and S. Tolman: The Gromov width of complex Grassmannians, Alg. Geom. Top. 5
(2005), 911-922.
[11] D. McDuff and L. Polterovich: Symplectic packings and algebraic geometry. With an appendix by
Y. Karshon, Invent. Math. 115 (1994), no. 3, 405–434.
[12] A. Pelayo: Topology of spaces of equivariant symplectic embeddings, Proc. Amer. Math. Soc. 135
(2007) 277–288.
[13] A. Pelayo: Toric symplectic ball packing, Top. and its Appl. 153 (2006) 3633–3644.
Alvaro Pelayo
Department of Mathematics, University of Michigan
2074 East Hall, 530 Church Street, Ann Arbor, MI 48109–1043, USA
e-mail: apelayo@umich.edu
Benjamin Schmidt
Department of Mathematics, University of Chicago
5734 South University Avenue, Chicago, Illinois 60637
e-mail: schmidt@math.uchicago.edu
	Introduction and Main Result
	Motivation
	Main Results
	Reduction to Convex Geometry
	Proofs of propositions ??, ??, and ??.
ABSTRACT
  Let M be a symplectic-toric manifold of dimension at least four. This paper
investigates the so called symplectic ball packing problem in the toral
equivariant setting. We show that the set of toric symplectic ball packings of
M admits the structure of a convex polytope. Previous work of the first author
shows that up to equivalence, only CP^1 x CP^1 and CP^2 admit density one
packings when n=2 and only CP^n admits density one packings when n>2. In
contrast, we show that for a fixed n>=2 and each r in (0, 1), there are
uncountably many inequivalent 2n-dimensional symplectic-toric manifolds with a
maximal toric packing of density r. This result follows from a general analysis
of how the densities of maximal packings change while varying a given
symplectic-toric manifold through a family of symplectic-toric manifolds that
are equivariantly diffeomorphic but not equivariantly symplectomorphic.

<|endoftext|><|startoftext|>
Introduction to Quantum Mechanics: A
Time-Dependent Perspective, (University Science Books,
2006).
[14] L.D. Landau, and E.M. Lifshitz Quantum Mechan-
ics: Non-relativistic Theory, 3rd edition (Butterworth-
Heinemann, Oxford, 2003).
[15] C. Zener, Proc. R. Soc. London A137, 696 (1932).
[16] A. Auerbach, and S. Kivelson, Nucl. Phys. B 257, 799
(1985).
[17] D. Blume, C.H. Greene, Phys. Rev. A 65, 043613 (2002).
[18] E.L. Bolda, E. Tiesinga, and P.S. Julienne, Phys. Rev. A
66, 013403 (2002).
[19] Z. Idziaszek, and T. Calarco, Phys. Rev. Lett. 96, 013201
(2006).
[20] T. Bergeman, M.G. Moore, and M. Olshanii, Phys. Rev.
Lett. 91, 163201 (2003).
[21] Z. Idziaszek, and T. Calarco, Phys. Rev. A 74, 022712
(2006).
[22] B.E. Granger, D. Blume, Phys. Rev. Lett. 92, 133202
(2004).
[23] M. Abramowitz and I.E. Stegun Handbook of Mathemat-
ical Functions, 10th edition, (Department of Commerce,
Washington DC, 1972).
[24] For atomic states with a permanent quadrupole moment,
the long-range part of the potential comes from the inter-
action of ion charge with the atom quadrupole moment.
[25] 1D solutions (10) and (11) are valid at all distances for
k → 0 (E → ~ω⊥).
[26] COM and relative motions can be decoupled in the lowest
order when the size of the molecule is much smaller than
the harmonic oscillator lengths li and la.
[27] At least for the internal states with the total angular
momentum J = 1/2.
ABSTRACT
  We consider a system composed of a trapped atom and a trapped ion. The ion
charge induces in the atom an electric dipole moment, which attracts it with an
r^{-4} dependence at large distances. In the regime considered here, the
characteristic range of the atom-ion interaction is comparable or larger than
the characteristic size of the trapping potential, which excludes the
application of the contact pseudopotential. The short-range part of the
interaction is described in the framework of quantum-defect theory, by
introducing some short-range parameters, which can be related to the s-wave
scattering length. When the separation between traps is changed we observe
trap-induced shape resonances between molecular bound states and vibrational
states of the external trapping potential. Our analysis is extended to
quasi-one-dimensional geometries, when the scattering exhibit
confinement-induced resonances, similar to the ones studied before for
short-range interactions. For quasi-one-dimensional systems we investigate the
effects of coupling between the center of mass and relative motion, which
occurs for different trapping frequencies of atom and ion traps. Finally, we
show how the two types of resonances can be employed for quantum state control
and spectroscopy of atom-ion molecules.

<|endoftext|><|startoftext|>
Introduction
For the past while there has been intense interest in finite N partition functions for
Yang Mills theories, especially in super-symmetric ones, particularly with regard to their
construction for BPS states and the counting thereof [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15].
Much attention has been devoted to this issue for N = 4 super Yang Mills theory, it
being by now the archetypal example of a conformal field theory for which we have a
dual description in terms of string theory, by means of the AdS/CFT conjecture [16].
An example is in the assiduous efforts that have been made to explain the entropy of
certain BPS black holes in AdS5×S5 [17,18,19,20] in terms of microscopic counting of dual
operators in N = 4 super Yang Mills, with gauge group SU(N), by means of partition
functions [1]. While this problem remains unsolved to date, essentially due to the difficulty
of defining what is meant by these dual operators for finite N , there are other equally
interesting sectors of (super) Yang Mills theories where more progress with counting has
been made, examples being in free N = 4 super Yang Mills and in chiral ring sectors
involving BPS operators.
For N = 4 super Yang Mills, the half BPS sector consists of multi-trace operators
involving a single bosonic operator, Z. Similarly, the quarter BPS sector consists of multi-
trace operators involving two bosonic operators Z, Y while the eighth consists of ones
involving three bosonic operators Z, Y,X and two fermionic operators λ, λ̄. (All these
operators are here assumed to belong to the Lie algebra of U(N).)
For the chiral ring, the commuting/anti-commuting of these operators is at the heart
of why we can write very concise and elegant generating functions for the finite N multi-
trace partition functions [1]. Analysis of these partition functions, in terms of the counting
of operators, has become a sophisticated industry where such approaches as the so-called
‘Plethystic Program’ have provided substantial results [3,9].
This paper is devoted largely to the issue of partition functions for free field theory
and particularly to the counting of gauge invariant multi-trace operators for the case of two
bosonic fundamental fields. This is of relevance to the quarter BPS sector of N = 4 super
Yang Mills in the free field limit when the operators Z, Y do not commute, in contrast to
the chiral ring when they do.
The partition function for a free, massless quark-gluon gas was computed long ago [21].
This involved taking particle statistics into account using coherent state techniques and
then imposing the gauge singlet condition by integrating over the relevant gauge group.
With some modifications to the expression thus derived we may write the multi-trace
partition function for some generic bosonic/fermionic fundamental fields in terms of an
integral over the gauge group, involving the single particle partition function [22,23]. This
is the starting point here.
For U(N) gauge theories we may easily write down the integral, though, even for this
case, its evaluation is far from simple. One approach (which we adapt here for the SU(2)
case) is to rewrite the expression in terms of an N -fold contour integral, whereby it may in
principle be evaluated by summing the contributions from poles inside origin centred unit
discs, in each of the N complex planes - similar techniques have been used in [10]. Due to
the number of poles this becomes unfeasible for higher values of N . Another approach is
to use the fact that the complex integral that interests us provides for an inner product
for symmetric polynomials - see Macdonald [24], pp. 363 - 372, for a related discussion.
Taking this point more seriously reveals an alternative route to evaluating the free field
theory partition function which exposes not only the large N case in an almost trivial way,
but also how and where this differs from the finite N case.
This treatment also reveals an alternative interpretation of the free field partition
function at finite N - it is related to a gauge group average of the cycle polynomial for the
symmetric permutation group (after a certain identification of ‘letters’ with gauge group
valued variables). This point is not dwelt on further here though makes the connection
between the partition function and Polya enumeration explicit. For single trace operators
at large N this connection has already been made [22,25] for N = 4 super Yang Mills,
whereby the partition function for single trace operators is related to the cycle polynomial
for the cyclic permutation group.1
Another issue is how to use the expression for the free field partition function to give
explicit counting of gauge invariant multi-trace operators in a Yang-Mills theory, with
gauge group U(N). The case for one bosonic fundamental field has been widely discussed
and, for finite N , the operators are counted by partition numbers of non-negative integers
into at most N parts (which is here denoted by pN (n) - no closed formula for these numbers
for arbitrary N, n exist, though they have a ‘nice’ generating function). Here, this result is
re-derived, from a symmetric function perspective, by employing the well known Cauchy-
Littlewood formula.
To proceed further with counting, for the quarter BPS sector of N = 4 super Yang
Mills, for instance, character methods prove to be both natural and indispensable. Char-
acters in relation to conformal field theories prove to be very convenient for encoding
the allowable representations [15,29,30,31,32,33] and for studying related partition func-
tions, [34,35,36,33]. For N = 4 super Yang Mills, it was shown in [33] that, if we are
1 For more recent applications of symmetric polynomials and Polya enumeration to super-
symmetric quantum mechanical models, analysed in terms of Fock space methods, see [26,27,28].
to distinguish among primary operators with differing conformal dimensions, spins and
R-symmetry charges, such counting is most easily achieved using reductions of the full
N = 4 superconformal characters, in certain limits that isolate corresponding sectors of
short/semi-short operators. (One such limit corresponds to the index constructed in [1].)
This point may be easily illustrated for quarter BPS primary operators in N = 4 su-
per Yang Mills, the case of two bosonic fundamental fields here. (See [37] for an explicit
construction and counting of quarter BPS operators.) For N = 4 super Yang Mills, the
counting of quarter BPS primary operators is complicated by the fact that, if we are to
keep track of differing R-symmetry representations, any partition function restricted to
this sector must be expanded in terms of U(2) characters (or two-variable Schur polyno-
mials). Denoting some partition function restricted to this sector by Z(t, u), where t, u are
letters corresponding to the fields Z, Y , then by expanding
Z(t, u) =
N(n,m) s(n,m)(t, u) , s(n,m)(t, u) =
tn+1um − tmun+1
t− u , (1.1)
in terms of two-variable Schur polynomials s(n,m)(t, u), we obtain the numbers N(n,m) of
gauge invariant quarter BPS operators belonging to the [m,n−m,m] SU(4)R R-symmetry
representation and having conformal dimension n + m (so that they are superconformal
primary highest weight states in the corresponding quarter BPS supermultiplets). (The
case m = 0 counts gauge invariant half BPS primary operators.)
Here, the free field partition function is thus expanded in terms of Schur polynomials,
depending on the same variables as the one particle partition function, the two boson
case being a specialisation. This is quite naturally achieved using the Cauchy-Littlewood
formula (and, if we include fermions, another formula due to Littlewood). Generally, we
may obtain a result that relates the counting numbers to a sum over Kronecker coefficients.
These arise naturally in the theory of the symmetric permutation group, though remain
somewhat mysterious from a combinatorial perspective.
Specialising to the two boson case, a recursive procedure is employed here for the
counting of multi-trace quarter BPS operators in free field theory at finite N . This issue
was given considerable discussion for the large N case in [33] - here the results of [33] are
generalised in terms of a generating function that may be employed to count quarter BPS
operators for any R-symmetry charges, at large N . Asymptotic counting is also addressed
in the latter case for the numbers N(n,m) in (1.1) for large n and fixed m.
To complete the discussion, counting of quarter BPS operators in the chiral ring of
N = 4 super Yang Mills is investigated in terms of expanding over U(2) characters as in
(1.1). An explicit formula is given for the corresponding finite N partition function, with
a short combinatorial interpretation given in terms of plane partitions, and specialised to
large N . For the latter case, the exponential behaviour of the numbers N(n,m) in (1.1) is
found for n and m both comparably large. This behaviour is consistent with a special case
addressed in [3,9]. By way of completion, a similar discussion for an arbitrary number of
bosonic fundamental fields in the chiral ring is included.
Two appendices are included; the first establishes some notation used for partitions
and gives some standard results for the symmetric group and symmetric polynomials, the
second gives some tables of numbers of quarter BPS operators in free N = 4 super Yang
Mills, with gauge group U(N), for which explicit formulae are given in the main text.
Footnotes contain further details and points of clarification.
2. Free Field Partition Functions
We start from the single particle partition function which is here denoted by f(t) for
some variables2 t = (t1, t2 . . .). The general form of f(t) is
f(t) =
aiti , (2.1)
where each ti is a letter corresponding to a fundamental field and ai are signs, being +1
for a bosonic field, or −1 for a fermionic field.
For compact gauge Lie group G, the multi-trace partition function is then given by
[21], (see [22,23] for refinements,)
ZG(t) =
dµG(g) exp
f(tn)χR(g
, (2.2)
involving the Haar (or G-invariant, or Hurwitz) measure dµG(g) for g ∈ G (so that∫
dµG(g)F (g) =
dµG(g)F (gh) =
dµG(g)F (hg) for all h ∈ G and
dµG(g) = 1)
and where χR(g) is the character for the R representation of G, assuming that the funda-
mental fields transform in identical gauge group representations R.
For G = U(N), so that for any matrix U ∈ U(N) we may write U = VΘV †, where V
is a unitary matrix and Θ = diag.(eiθ1 , . . . , eiθN ), 0 ≤ θi < 2π, and for some F (U) = F (Θ),
independent of V , then we may write
dµU(N)(U)F (U) =
(2π)NN !
1≤k<l≤N
|eiθk − eiθl |2 F (Θ) , (2.3)
2 In what follows roman letters are used to denote a collection of variables and, for x =
(x1, . . . , xi), y = (y1, . . . , yj) for example, the shorthand x
α is used to mean (x1
α, . . . , xi
α) and
z = xy to mean z = (z11, . . . , zij) where zrs = xrys. The latter convenient notation has been used
by Macdonald [24].
which is, of course, related to the Weyl parametrisation of U(N). Thus, for such F (U), the
left-hand side of (2.3) simplifies to an integral over the N torus. Of course, as χR(U) gen-
erally depends on linear combinations of tr(U j)ktr(U †l)m for various non-negative integers
j, k, l,m then any function of χR(U) is an example of such an F (U).
We are interested in the case where R = Adj. is the adjoint representation so that for
U(N) we have that χAdj.(U) = trU trU
† (while for SU(N) then χAdj.(U) = trU trU
†−1).
For U(N) we then find that, using (2.2) with (2.3),
ZU(N)(t) =
(2π)NN !
1≤k<l≤N
|eiθk − eiθl |2 exp
f(tn)
j,k=1
ein(θj−θk)
(2.4)
We may write (2.4) as an N -fold contour integral by first making the variable change
zj = e
iθj so that the integrals in (2.4) are around unit circles in each zj complex plane and
then we obtain
ZU(N)(t) =
(2πi)NN !
∆(z)∆(z−1) exp
f(tn)pn(z)pn(z
, (2.5)
where ∆(z) =
1≤i<j≤N (zi − zj) is the Vandermonde determinant and pn(z) =
i=1 zi
is a power symmetric polynomial - see appendix A for a brief discussion of symmetric
polynomials. This integral may then in principle be evaluated by deforming the contours
so as to extract the residues at poles within the discs |zj | < 1, 1 ≤ j ≤ N .
A crucial observation is that, for some N variable symmetric polynomials g(z), h(z),
(2πi)NN !
∆(z)∆(z−1)g(z)h(z−1) , (2.6)
acts as an inner product - this is easy to see in terms of Schur polynomials which provide
an orthonormal basis for symmetric polynomials. The reader may now wish to peruse
appendix A where notation regarding partitions and a short discussion of symmetric poly-
nomials is included.
The General and Large N Cases for U(N)
For application of inner products to (2.5) we have that, in terms of power symmetric
polynomials pλ(z) for partitions λ,
f(tn)pn(z)pn(z
nanan!
f(tn)anpn(z)
anpn(z
−1)an ,
fλ(t)pλ(z)pλ(z
−1) ,
(2.7)
with the definitions of
nanan! , fλ(t) =
f(tn)an , (2.8)
being in terms of the frequency representation of λ, (1a1 , 2a2 , . . .) with
n≥1 n an = |λ|,
the weight of the partition λ (note that the frequency representation of λ is simply a
convenient re-ordering of the parts of λ).
In (2.7) the numbers zλ have a standard combinatorial interpretation - for a given
permutation σ ∈ Sm with a1 1-cycles, a2 2-cycles etc., so that
n≥1 n an = m = |λ|, then
n≥1 n
anan! is the size of the centraliser Zσ = {τ ; τ ∈ Sm, τστ−1 = σ} of σ ∈ Sm.
(This may be easily seen as under conjugation of σ by τ then τ can permute the cycles
of length n among themselves in an! ways and/or render a cyclic rotation on each of the
individual cycles in nan ways.) More details of the symmetric group are to be found in
appendix A.
We may immediately observe that (2.7) represents a sum over cycle polynomials of
the symmetric group Sm. This is given by, for letters u1, u2, . . . um,3
Cm(u) =
a1,...,am≥0
δa1+2a2+...+mam,m
nanan!
uλ , (2.9)
where ‘λ ⊢ m’ means that λ is any partition of m - see appendix A for notation - and
an , (2.10)
in terms of the frequency representation of λ above. Identifying un = f(t
n)pn(z)pn(z
then we may rewrite
ZU(N)(t) =
(2πi)NN !
∆(z)∆(z−1)Cm(u) , (2.11)
3 This formula is easy to see from the definition of the cycle polynomial for a subgroup G
of Sm. This is given by
j1(g) · · ·um
jm(g) =
|Kg|u1
j1(g) · · ·um
jm(g) ,
where ji(g) denotes the number of i cycles in the unique decomposition of g into disjoint cycles
and Kg denotes the conjugacy classes of G with class representatives g. The size of the conjugacy
class Kg is given by |Kg| = |G|/|Zg| where Zg is the centraliser of g ∈ G. For the present case
then, G = Sm, |Sm| = m!, and |Zσ| = zλ, where λ gives the cycle structure of σ ∈ Sm, and thus,
for the corresponding conjugacy class Kλ, |Kλ| = m!/zλ.
the sum of the U(N) group averages of each of the cycle polynomials Cm(u). (Physically,
the interpretation is that the cycle index for the symmetric permutation group accounts
for particle statistics while integration over the gauge group imposes the gauge singlet
condition. For purposes of clarity, the U(N) case has been focused upon here, though
from the form of (2.2) it is easy to see how this generalises for other gauge groups whereby
the letters un = f(t
n)χR(g
n) for the fundamental fields transforming in identical gauge
group representations, R.)
Directly from (2.7), in terms of the inner product (2.6), then
ZU(N)(t) =
fλ(t)
pλ, pλ
fλ(t)
µ⊢|λ|
ℓ(µ)≤N
, (2.12)
where on the right-hand side of (2.12) we have used an expression for the inner product
of two power symmetric polynomials expressed in terms of the characters of the symmet-
ric group, given in appendix A. (Here ‘ℓ(µ)’ means the number of non-zero parts of the
partition µ.)
Using a result of appendix A (essentially orthogonality relations for symmetric group
characters), (2.12) may be rewritten as
ZU(N)(t) =
|λ|≤N
fλ(t) +
|λ|>N
fλ(t)
µ⊢|λ|
ℓ(µ)≤N
. (2.13)
In the large N limit, ZU(N)(t) simplifies considerably as only the first term in (2.13)
need be considered. Using the frequency representation of λ then
ZU(∞)(t) =
fλ(t) =
f(tn)an =
1− f(tn) , (2.14)
a result which has been obtained using Polya counting methods for single trace operators
and saddle point approximations [22,23].
Higher order corrections in |λ|, the weight of the partition λ, to (2.13) may be obtained
by successive evaluation of
µ⊢|λ|
ℓ(µ)≤N
. One method is to employ the Murnaghan-
Nakayama Rule, used to compute χ
λ using skew hooks and Young diagrams. (A readable
account of the Murnaghan-Nakayama Rule may be found in [38], though of course it is
explained in many standard textbooks that discuss the symmetric group.)
For the case of |λ| = N+1 then we may observe that,
µ⊢N+1
ℓ(µ)≤N
µ⊢N+1
ℓ(µ)≤N+1
)2 − (χνλ)2 = zλ − (χνλ)2 , ν = (1N+1) , (2.15)
since the partition ν = (1N+1) is the only one excluded among those partitions µ of
N+1 with ℓ(µ) ≤ N . By applying the Murnaghan-Nakayama Rule we may determine, for
ν = (1L),
(χνλ)
1 for |λ| = L
0 otherwise
, (2.16)
since χνλ in this case is just a sign. (This may be easily seen as there is only one possible
way to remove successive skew hooks, which in this case are just column Young diagrams
of length λi, from the (1
L) column Young diagram to leave one of normal shape, in this
case, another column Young diagram.) Thus, using (2.15) with (2.16) in (2.13), we obtain
ZU(N)(t) =
|λ|≤N+1
fλ(t)−
|λ|=N+1
fλ(t) +
|λ|>N+1
fλ(t)
µ⊢|λ|
ℓ(µ)≤N
. (2.17)
By a similar line of argument we may do the same for the case of |λ| = N+2. We
have that, for ν1 = (1
N+2) and ν2 = (2, 1
N+1),
µ⊢N+2
ℓ(µ)≤N
µ⊢N+2
ℓ(µ)≤N+2
)2 − (χν1λ )
2 − (χν2λ )
2 = zλ − (χν1λ )
2 − (χν2λ )
2 . (2.18)
We may determine, for ν = (2, 1L),
(χνλ)
(a1 − 1)2 for |λ| =
n≥1 n an = L+2
0 otherwise
. (2.19)
Using (2.16), (2.19) with (2.18) in (2.17) then we obtain
ZU(N)(t) =
|λ|≤N+2
fλ(t)−
N+1≤|λ|≤N+2
fλ(t)−
|λ|=N+2
fλ(t)
|λ|>N+2
fλ(t)
µ⊢|λ|
ℓ(µ)≤N
(2.20)
where, in the frequency representation of λ,
(a1−1)2
(a1 − 1)!
(a1 − 2)!
)/ ∞∏
nanan! . (2.21)
We may proceed in this manner to compute explicit higher order corrections though
this becomes cumbersome save for the first few cases as shown. (For |λ| > N + 2 the
corrections will always involve contributions from (2.16) and (2.19) as well as extra ones
coming from
µ⊢|λ|
ℓ(µ)≤|λ|
µ⊢|λ|
ℓ(µ)≤N
The One Boson Case for U(N)
For the case of one bosonic fundamental field (applicable to half BPS operators for
N = 4 super Yang Mills), we have f(t) = t in (2.5), so that we may write
ZU(N)(t) =
(2πi)NN !
∆(z)∆(z−1)
j,k=1
1− tzjzk−1
. (2.22)
To evaluate this integral we may use the Cauchy-Littlewood formula,
1− xiyj
ℓ(λ)≤min.{L,M}
sλ(x1, . . . , xL)sλ(y1, . . . , yM) , (2.23)
where the sum on the right-hand side is over all partitions λ such that the corresponding
Young diagrams have no more than min.{L,M} rows, ℓ(λ) ≤ min.{L,M}. With xi = tzi,
yi = zi
−1, i = 1, . . . , N , in (2.23), so that sλ(tz1, . . . , tzN ) = t
|λ|sλ(z), and employing also
(2.6) and the orthonormality of Schur polynomials, we may easily obtain,
ZU(N)(t) =
ℓ(λ)≤N
sλ, sλ
ℓ(λ)≤N
t|λ| . (2.24)
By changing summation variables so that λi − λi+1 = ai, i = 1, . . . , N−1, λN = aN then
we may write
ZU(N)(t) =
a1,...,aN=0
ta1+2a2+...+NaN = PN (t) , (2.25)
where4
PN (t) =
1− ti . (2.26)
Of course this is nothing other than the generating function for the number pN (n) of
partitions of n into no more than N parts since, by definition,
ℓ(λ)≤N
δ|λ|,n = pN (n) , (2.27)
so that by the above
pN (n)t
n = PN (t) . (2.28)
4 1/P∞(t) =
(1−tn) is commonly called the Euler function, denoted by Φ(t). P∞(t) =∑∞
p(n)tn acts as a generating function for the number of unordered partitions of n, p(n). Note
that pN (n) = p(n) for n ≤ N , i.e. the number of partitions of n into no more than N parts is the
same as the total number of partitions of n so long as n ≤ N .
This makes explicit the connection between ZU(N)(t) and the partition numbers pN (n).
The SU(2) Gauge Group Case
Here we first consider f(t) =
j=1 tj in (2.1) so that the variables 0 ≤ ti < 1
represent k bosons in the single particle partition function. For such fields transforming in
the adjoint representation of SU(2) then (2.2) simplifies significantly. For any U ∈ SU(2)
we may write U = VΘV †, where V is unitary and Θ = diag.(eiθ, e−iθ), for 0 ≤ θ < 2π, so
that for F (U) = F (θ), then in usual Weyl parametrisation,
SU(2)
dµSU(2)(U)F (U) =
dθ sin2 θ F (θ)
1− cos θ
F ( θ
1− cos θ
F ( θ
(2.29)
where F (θ) = F (θ + π) is assumed in writing the last line. In the present case, F (U) =
F (θ) =
n≥1 f(t
n)χAdj.(U
n)/n, where χAdj.(U) = tr(U)tr(U
†) − 1 = e2iθ + e−2iθ + 1 =
2 cos 2θ + 1, so that
ZSU(2)(t) =
dθ (1− cos θ)
(1− tj)(1− tjeiθ)(1− tje−iθ)
. (2.30)
Making the variable change z = eiθ, and using that F (θ) = F (−θ) is even, then
ZSU(2)(t) =
(1− z)
(1− tj)(1− tjz)(1− tjz−1)
, (2.31)
where the integral is around the unit circle |z| = 1. The residues in (2.31) may be easily
computed since all the relevant (simple) poles in the disc |z| < 1 occur at the points z = tj .
ZSU(2)(t) =
1− ti2
j 6=i
(ti − tj)(1− titj)(1− tj)
. (2.32)
This partition function has an interesting interpretation from a group theory perspec-
tive. We may write,
1 + t
(1− tz)(1− tz−1) =
χn(z)t
n , (2.33)
where
χj(z) =
2 − z−j− 12
2 − z− 12
, j ∈ 1
Z , (2.34)
is an SU(2) character, corresponding to the spin j irreducible representation, Rj. Now the
integral in (2.31) acts as an SU(2) inner product,
χj , χk
(1− z)χj(z)χk(z−1) = δjk , (2.35)
for j, k ∈ N. Thus, from (2.31) with (2.33) and (2.35),
(1− tj2)ZSU(2)(t) =
n1,...,nk=0
χn1 · · ·χnk , 1
n1 · · · tknk , (2.36)
acts as a generating function for the number of singlets in the decomposition of the SU(2)
representation Rn1 ⊗· · ·⊗Rnk .5 By using that χn(z) =
j=−n z
j we may use the Cauchy
residue theorem to compute explicitly that
χn1 · · ·χnk , 1
· · ·
(δj1+···+jk,n1+···+nk − δj1+···+jk,n1+···+nk+1) . (2.37)
If we modify the one particle partition function to include k bosons and k̄ fermions
and hence consider (2.1) in the form f(t, t̄) =
j=1 tj −
̄=1 t̄̄ then we may similarly as
above evaluate
ZSU(2)(t, t̄) =
(1− z)
1≤j≤k
1≤̄≤k̄
(1− t̄̄)(1− t̄̄z)(1− t̄̄z−1)
(1− tj)(1− tjz)(1− tjz−1)
, (2.38)
where the contour is around the unit disc |z| = 1. So long as k > k̄ then (2.38) receives
contributions from only those simple poles at z = tj so that for this case we obtain
ZSU(2)(t, t̄)
k−k̄−1
1− ti2
1≤̄≤k̄
1≤j≤k,j 6=i
(ti − t̄̄)(1− tit̄̄)(1− t̄̄)
(ti − tj)(1− titj)(1− tj)
. (2.39)
For k ≤ k̄ then (2.38) also receives contributions from poles at z = 0. For instance
ZSU(2)(t, t̄)
ti(1− ti2)
1≤j,̄≤k
j 6=i
(ti − t̄̄)(1− tit̄̄)(1− t̄̄)
(ti − tj)(1− titj)(1− tj)
1≤j,̄≤k
t̄̄(1− t̄̄)
tj(1− tj)
(2.40)
5 Generating functions for products of Lie algebra representations have been considered
elsewhere, in [39] for instance. A generating function for the number of singlets in n products of
the fundamental times n products of the anti-fundamental representations for SU(N) was found
by Gessel [40] in terms of Toeplitz determinants involving Bessel functions. See also [41] for a
nice physics oriented discussion of similar issues. The special case of R 1
⊗ · · · ⊗R 1
(2n products
of the fundamental) for SU(2), contains a Catalan number, 1
, of singlet representations.
where the last term on the right-hand side of (2.40) comes from the simple pole at z = 0.
These formulae should be useful for computing the multi-trace partition functions,
for fundamental fields transforming in an SU(2) gauge group, in other sectors of N = 4
super Yang Mills. For instance, after a suitable identification of the variables tj , t̄̄ with
variables in single particle partition functions for semi-short sectors of N = 4 super Yang
Mills, described in detail in [33], then (2.40) should allow for an explicit expression for
corresponding multi-trace partition functions. They may also be useful for computing the
N = 4 superconformal index of [1] for SU(2) gauge group, or at least for restrictions of it
such as described in [33] or [42].
3. Counting Operators in Free N = 4 Super Yang Mills
In this section, the counting of half and quarter BPS operators for free N = 4 super
Yang Mills, when the fundamental fields transform in the adjoint representation of U(N),
is discussed in some detail.
Counting Operators Directly
We may, of course, proceed to count multi-trace half and quarter BPS primary oper-
ators directly, in terms of the fundamental fields, Z, for half BPS operators and Z, Y , for
quarter BPS operators.
(Z, Y ) forms a U(2) doublet, where U(2) has generators given by a subset of the
SU(4)R generators, Hi, Ei±, 1 ≤ i ≤ 3, where Hi are the Cartan sub-algebra genera-
tors and Ei± are ladder operators satisfying (in the Chevalley-Serre basis) [Hi, Ej±] =
±KijEj±, with [Kij] being the usual SU(4) Cartan matrix. The U(2) generators consist
of the SU(2) generators H2, E2±, where explicitly [(H1, H2, H3), E2±] = ∓(1,−2, 1)E2±,
along with the generator H1+H2+H3, whose eigenvalues give the conformal dimensions
in this case, [H1+H2+H3, (Z, Y )] = (Z, Y ) [33]. Explicitly, we have that [E2+, Z] = 0,
[E2−, Z] = Y , [(H1, H2, H3), Z] = (0, 1, 0)Z, [(H1, H2, H3), Y ] = (1,−1, 1)Y so that an
operator involving n Z’s and m Y ’s transforms in the [m,n−m,m] SU(4)R R-symmetry
representation.
For k-trace half BPS primary operators transforming in the [0, n, 0] SU(4)R R-
symmetry representation, with conformal dimension n, then in terms of the fundamental
field Z a basis is provided by,
tr(Zn1) · · · tr(Znk) ,
ni = n . (3.1)
We have that, due to trace identities for finite N , tr(Zn) for n > N is expressible in terms
of a sum over multi-trace operators of the form (3.1), for k > 1, and thus, a minimal basis
for multi-trace half BPS primary operators consists of (3.1) for all 1 ≤ k ≤ n and with
every ni ≤ N , ordered so that n1 ≥ n2 ≥ . . . ≥ nk ≥ 0, i.e. so that (n1, . . . , nk) is a
partition of n where each part ni ≤ N . With this restriction, the number of multi-trace
half BPS primary operators for a given n is
N(n) = pN (n) , (3.2)
since the number pN (n), in (2.27), of partitions of n into ≤ N parts is the same as the
number of partitions of n in which each part is ≤ N - see [43] for a simple proof employing
generating functions.
For quarter BPS operators belonging to the [m,n−m,m] SU(4)R R-symmetry repre-
sentation, a basis for k-trace operators is
Zn1jY m1j
· · · tr
ZnkjY mkj
nij = n ,
mij = m, (3.3)
where there is a choice of ordering in each trace. (Note that the m = 0 case corresponds to
the half BPS case already considered.) Using the basis provided by (3.3) for all allowable k,
then to avoid over-counting of multi-trace quarter BPS operators, the cyclicity of each trace
and also trace identities for finite N must be accounted for. Assuming that this is done,
let M(n,m) denote the number of elements in this minimal basis for multi-trace operators
of the form (3.3). Then, to obtain the number N(n,m) of multi-trace quarter BPS primary
operators in the SU(4)R representation [m,n−m,m], the number of U(2) descendants, in
the SU(4)R representation [m,n−m,m], of multi-trace quarter BPS primary operators, in
SU(4)R representations [j, n+m−2j, j], 0 ≤ j ≤ m−1, must be subtracted from M(n,m).
(These descendants arise due to the relation [E2−, Z] = Y . Acting with (E2−)
m−j on the
highest weight state in the SU(4)R representation [j, n+m−2j, j] we obtain a descendant in
the SU(4)R representation [m,n−m,m].) The number of such U(2) descendants coincides
with N(n+m−j,j), the number of corresponding primary operators. In this way, we obtain
M(n,m) = N(n,m) +N(n+1,m−1) + . . .+N(n+m−1,1) +N(n+m) , (3.4)
so that N(n,m) = M(n,m) −M(n+1,m−1) may be obtained recursively for each m.
We may illustrate by counting all multi-trace quarter BPS primary operators in the
[1, n−1, 1] R-symmetry representation. In this case a basis for k+1-trace operators is
provided by
tr(Zn1) · · · tr(Znk)tr(ZjY ) ,
ni = n− j . (3.5)
Cyclicity of traces implies that we may arrange Y as shown, to avoid over-counting. U(N)
trace identities imply, similarly as for the half BPS case, that a minimal basis for multi-
trace operators requires j < N and each ni ≤ N in (3.5) for every 1 ≤ k ≤ n−j, so
that (n1, . . . , nk) forms a partition of n−j, with every part ≤ N . Thus, by a similar
argument as for the half BPS case, M(n,1) =
j=0 pN (n−j). Finally, to ensure that only
primary operators are counted then we must subtract off contributions from descendants
of half BPS primary operators in the [0, n+1, 0] SU(4)R representation, of which there are
pN (n+1). Using (3.4) with (3.2) we then conclude that
N(n,1) =
pN (n− j)− pN (n+1) , (3.6)
gives the number of multi-trace quarter BPS primary operators in the [1, n−1, 1] R-
symmetry representation.
Counting in this fashion becomes more difficult for greater m and now a procedure is
described employing symmetric polynomials to find a generating function for the numbers
of multi-trace quarter BPS primary operators in the [m,n−m,m] SU(4)R representation,
for m = 0, 1, 2 at finite N and for any n,m at large N . This generating function is
subsequently used to provide asymptotic counting for fixed m, large n in the large N limit.
Counting Operators via Expansion of Partition Functions in Schur Polynomials
For k bosonic fundamental fields, we may take f(t) =
j=1 tj in (2.1) so that (2.5)
may be written as
ZU(N)(t) =
(2πi)NN !
∆(z)∆(z−1)
r,s=1
1− tjzrzs−1
. (3.7)
Often it is the case that such partition functions should be expanded in terms of sλ(t), the
k variable Schur polynomial labelled by partitions λ. An example is provided by (1.1) for
counting multi-trace quarter BPS operators. We may use the Cauchy-Littlewood formula
(2.23) to expand in this way, to obtain
ZU(N)(t) =
ℓ(λ)≤min.{k,N2}
Nλ sλ(t) , (3.8)
where
(2πi)NN !
∆(z)∆(z−1) sλ(zz
−1) , (3.9)
where zz−1 has components zizj
−1, 1 ≤ i, j ≤ N .
From Macdonald [24] we have that
sλ(xy) =
µ,ν⊢|λ|
γλµνsµ(x)sν(y) , (3.10)
in terms of Kronecker coefficients,
γλµν =
σ∈S|λ|
χλ(σ)χµ(σ)χν(σ)
ρ⊢|λ|
χλρ χ
(3.11)
being a sum over irreducible S|λ| characters evaluated at σ ∈ S|λ|, related to a sum over
irreducible S|λ| characters evaluated on the conjugacy classes labelled by the partitions ρ in
the second line. Using (2.6) along with the orthonormality property of Schur polynomials
we find that
µ⊢|λ|
ℓ(µ)≤N
γλµµ . (3.12)
The situation becomes much more involved if we include also k̄ fermionic fields, so
that (2.1) may be written in the form f(t, t̄) =
j=1 tj −
̄=1 t̄̄, and attempt to ex-
pand ZU(N)(t, t̄) in terms of products of Schur polynomials sλ(t)sµ(t̄). Such expansion
is required for counting, for instance, for the free field partition function in the eighth
BPS sector of N = 4 super Yang Mills. In this case the partition function is expanded,
analogous to (1.1), in terms of SU(2|3) characters, which may be expressed in terms of a
linear combination of products of two-variable and three-variable Schur polynomials. (See
[33] for a discussion of counting for the eighth BPS sector along these lines.) Including
fermions, (3.7) becomes modified by
ZU(N)(t, t̄) =
(2πi)NN !
∆(z)∆(z−1)
1≤j≤k
1≤̄≤k̄
r,s=1
1− t̄̄zrzs−1
1− tjzrzs−1
. (3.13)
To achieve the expansion, we may use the Cauchy-Littlewood formula (2.23) along with
another formula of Littlewood,
(1 + xiyj) =
ℓ(λ)≤L,ℓ(λ̃)≤M
sλ(x1, . . . , xL) sλ̃(y1, . . . , yM ) , (3.14)
where λ̃ is the partition conjugate to λ (where the rows and columns of the Young diagram
corresponding to λ are interchanged) and where the sum is restricted to those λ whereby
the corresponding Young diagrams have at most L rows, ℓ(λ) ≤ L, and M columns,
ℓ(λ̃) ≤ M . We may thus write
ZU(N)(t, t̄) =
ℓ(λ)≤min.{k,N2}
ℓ(µ)≤k̄,ℓ(µ̃)≤N2
Nλ,µ sλ(t) sµ(t̄) , (3.15)
where
Nλ,µ =
(−1)|µ|
(2πi)NN !
∆(z)∆(z−1) sλ(zz
−1)sµ̃(zz
−1) . (3.16)
Obviously these numbers are considerably more involved than those in (3.9). We may of
course use (3.10) again to interpret (3.16) in terms of Kronecker coefficients.
Counting Quarter BPS Operators by Symmetric Polynomial Methods
The two bosonic fundamental field case is now focused upon.6 In particular, the
numbers N(n,m) in (1.1) are evaluated using results of the last sub-section.
We may proceed to evaluate Nλ recursively. The simplest case is for Nλ = N(n),
whereby introducing a formal variable t then it is clear, by (2.23) with (2.22), (2.25) and
(2.26), that
N(n)tn =
(2πi)NN !
∆(z)∆(z−1)
j,k=1
1− tzjzk−1
= PN (t) , (3.17)
so that, by (2.28), N(n) is given by (3.2).
More generally to evaluate N(n,m) from (3.9) we may use, for y = zz−1,
s(m)(y)s(n)(y) = s(n,m)(y) + s(n+1,m−1)(y) + . . .+ s(n+m−1,1)(y) + s(n+m)(y) ,
s(m)(zz
−1) =
ℓ(µ)≤N
sµ(z)sµ(z
−1) , (3.18)
6 The two boson case leads to an interesting generalisation of an identity in [44] involv-
ing Littlewood-Richardson coefficients cνλµ, the coefficients that appear in the decomposition
sλ(x)sµ(x) =
cνλµsν(x). With f(t) in (2.1) given by f(t) = t1 + t2, and expanding appro-
priately the corresponding integrand in (3.7) using (2.23); then using (2.6), the orthonormality of
Schur polynomials and the result (2.14), we obtain (note that cνλµ = 0 if |ν| 6= |λ|+ |µ|)
ZU(∞)(t1, t2) =
〈sλsµ, sλsµ〉∞ =
λ,µ,ν
ν⊢|λ|+|µ|
1− t1n − t2n
which reduces to Theorem 4.1 of [44] if we take t1 = t2 = t.
where the expression in the first line of (3.18) may be easily seen using Young tableaux
multiplication rules while (2.23) determines the expression in the second line. From (3.9)
with (3.18), we may find a useful generating function, in terms of a formal variable t, for
the numbers in (3.4) as follows,
F (m)N (t) =
M(n,m)tn
(2πi)NN !
∆(z)∆(z−1) s(m)(zz
s(n)(zz
−1)tn
(2πi)NN !
∆(z)∆(z−1)
ℓ(µ)≤N
sµ(z)sµ(z
j,k=1
1− tzjzk−1
ℓ(µ)≤N
sλsµ, sλsµ
(3.19)
so that we may write
N(n,m) =
F (m)N (t)−
F (m−1)N (t)
, (3.20)
which allows for recursive determination of N(n,m).
Applying this to the case of Nλ = N(n,1) we have, from (3.18)
s(1)(zz
−1) = s(1)(z)s(1)(z
−1) , (3.21)
so that, from (3.19),
F (1)N (t) =
N(n,1) +N(n+1)
ℓ(λ)≤N
s(1)sλ, s(1)sλ
. (3.22)
Using (again, this may be easily seen from Young tableaux multiplication rules)
s(1)(z)sλ(z) =
sλ+er (z) , (3.23)
for {er; 1 ≤ r ≤ N, er · es = δrs} being usual orthonormal vectors, we find that
F (1)N (t) =
N(n,1) +N(n+1)
ℓ(λ)≤N
r,s=1
sλ+er , sλ+es
. (3.24)
Now for any partition λ,
sλ+er , sλ+es
vanishes unless er = es for any r, s and λr−1−λr >
0 for r = 2, . . . , N , due to
s(λ1,...,λr−1,λr+1,...,λN )(z) = 0 for λr−1 = λr , r > 1 . (3.25)
Changing summation variables to those in (2.25) then we have, with the definition (2.26),
F (1)N (t) =
N(n,1) +N(n+1)
a1,...,aN≥0
ta1+...+NaN +
a1,...,aN≥0
ta1+...+NaN
a1,...,aN≥0
ta1+...+NaN =
1− tPN−1(t) .
(3.26)
Thus, using (2.28), (3.2) with (3.26),7
N(n,1) =
pN−1(j)−N(n+1) =
pN−1(j)− pN (n+ 1) . (3.27)
For the case of Nλ = N(n,2) we have that, from (3.18),
s(2)(zz
−1) = s(2)(z)s(2)(z
−1) + s(1,1)(z)s(1,1)(z
−1) , (3.28)
so that, from (3.19), we have
F (2)N (t) =
ℓ(λ)≤N
s(2)sλ, s(2)sλ
s(1,1)sλ, s(1,1)sλ
. (3.29)
Using
s(2)(z)sλ(z) =
sλ+2er (z) +
1≤r<s≤N
sλ+er+es(z) ,
s(1,1)(z)sλ(z) =
1≤r<s≤N
sλ+er+es(z) ,
(3.30)
7 This formula agrees with (3.6) due to
pN (n − j) =
pN−1(j) which follows
because the corresponding generating functions match,
pN (n− j)t
= (1 + t+ . . .+ t
)PN (t) =
PN−1(t) =
pN−1(j)t
along with (3.25) and
s(λ1,...,λr−1,λr+2,...,λN )(z) = −s(λ1,...,λr+1,λr−1+1,...,λN )(z) , (3.31)
for the cases where λr−1 = λr, we may obtain, with the definition (2.26),
F (2)N (t) =
1−tN+1
(1−t)(1−t2)PN−1(t) +
(1−t)(1−t2)PN−2(t) , (3.32)
where the first contribution comes from
|λ|〈s(2)sλ, s(2)sλ
while the second comes
|λ|〈s(1,1)sλ, s(1,1)sλ
. Since the partition number pk(−n) = 0 for n = 1, 2, . . .
we may write, using (2.28),8
(1− t)(1− t2)Pk(t) =
n,i,j=0
pk(n− i− 2j)tn . (3.33)
Thus, from (3.32) with (3.27),
N(n,2) = −
pN−1(j)
i,j=0 (pN−2(n−i−2j) + pN−1(n−i−2j)) if n ≤ N ,∑∞
i,j=0 (pN−2(n−i−2j) + pN−1(n−i−2j)− pN−1(N+1−n−i−2j)) if n ≥ N+1.
(3.34)
Tables of the numbers (3.2), (3.27) and (3.34) are given in appendix B for some few
cases of n,N . Notice from these tables that the numbers N(n,m) below the diagonal line
N ≥ n+m for a given n are the same for all N . This is a general feature that derives from
values of N(n,m) for N ≥ n + m, which numbers may be obtained from a corresponding
generating function that is now constructed.
Using these techniques, we may provide a consistency check of (3.17), (3.26), (3.32)
along with a general result for N(n,m) for high enough values of N , N ≥ m + n. This
employs the orthogonality property of power symmetric polynomials pλ(z) (in the large N
8 This is a special case of the following: for any f(n), n ∈ Z, that satisfies f(−n) = 0,
n = 1, 2, . . ., then we may (at least formally) write
Pk(t)
f(n)t
n,i1,...,ik=0
f(n− i1 − 2i2 − . . .− kik)t
limit) along with
s(n)(z) =
pλ(z)
i1,...,in=0
i1!i2! · · · in!
δi1+2i2+...+nin,n p1(z)
p2(z)
)i2 · · ·
pn(z)
(3.35)
Using the trivial identity pλ(xy) = pλ(x)pλ(y) then from (2.6), (3.19) with (3.35) we have
F (m)∞ (t) =
pλpµ, pλpµ
pν , pν
(3.36)
where for (1a1 , 2a2 , . . .) being the frequency representation of λ and (1b1 , 2b2 , . . .) being
that of µ then ν has frequency representation (1a1+a2 , 2a2+b2 , . . .) so that |ν| = n + m.
This agrees with F (m)N (t) in a series expansion up to O(tN−m) (since the last equation in
(3.36) is also valid for finite N so long as |ν| = n +m ≤ N , by a result of appendix A).
Now, since
(aj+bj)!
aj !bj!
, (3.37)
we obtain from (3.36) that,
F (m)∞ (t) =
a1,...,an=0
b1,...,bm=0
δa1+···+nan,nδb1+···+mbm,m
(aj + bj)!
aj !bj!
b1,...,bm=0
δb1+···+mbm,m
(aj + bj)!
aj!bj !
1− tj
b1,...,bm=0
δb1+···+mbm,m
(1− tj)bj+1
1− tj .
(3.38)
For the first few cases we have that, with PN (t) as defined in (2.26),
F (m)∞ (t) =
P∞(t) for m = 0
1−tP∞(t) for m = 1
(1−t)(1−t2)P∞(t) for m = 2
, (3.39)
whose series expansion agrees with (3.17), (3.26), (3.32) up to O(tN−m) for, respectively,
m = 0, 1, 2. We may use (3.20) with (3.36) to determine N(n,m) exactly for N ≥ n+m.
Asymptotic Counting of Quarter BPS Operators at Large N
Asymptotic counting for the one boson case in the large N limit, for which, with PN (t)
as defined in (2.26), with p(n) being the total number of (unordered) partitions of n,
ZU(∞)(t) = P∞(t) =
p(n)tn , (3.40)
is the multi-trace partition function, entails finding an asymptotic value for the partition
number p(n) for ‘large’ n. This may be achieved by performing a saddle point approxima-
tion of p(n) = 1
dtP∞(t)t
−n−1. The function P∞(t) has a ‘large’ singularity at t = 1,
but in addition has singularities at all other roots of unity - see [45] on the validity of ignor-
ing these contributions asymptotically. This method was used by Hardy and Ramanujan
to find their celebrated formula, here given in a less detailed form as,
p(n) ∼ 1
, (3.41)
which was improved by Rademacher to give p(n) exactly. Their method relied crucially on
the modular properties of P∞(t).
Focusing now on the two bosonic fundamental field case for which, in the large N
limit, (3.20) with (3.38) gives exact counting, at issue is first finding asymptotic values for
the numbers Q(n,m, b) = Q(n,m, b1, . . . , bm), with constraint equation
j=1 jbj = m,
defined by
(1− tj)bj+1
1− tj = 1 +
Q(n,m, b)tn . (3.42)
Having found these we may then attempt to find the dominant contribution to (3.20) with
(3.38) for large N . In order to give asymptotic values for Q(n,m, b) we may follow [46] and
apply a formula due to Meinardus which gives a general result for the generating function
(1− tn)−an = 1 +
r(n)tn . (3.43)
A detailed version of Meinardus’ theorem may be found in [46] but for purposes of brevity
we may note that it implies that, as n → ∞,
r(n) ∼ C nκ exp
AΓ(α+ 1)ζ(α+ 1)nα
)1/(α+1)
(α+ 1)/α
, (3.44)
where ζ(s) =
j=1 j
−s is the Riemann zeta function and the constants C, κ, α, A are
determined by the auxiliary Dirichlet series,
D(s) =
, (3.45)
which must converge for Re(s) > α, a positive real number, and possess an analytic
continuation in the region Re(s) ≥ c, −1 < c < 0, such that, in this region, D(s) is
analytic except at a simple pole at s = α where it has residue A. In terms of α,A then
2π(1+α)
AΓ(α+ 1)ζ(α+ 1)
)(1−2D(0))/2(α+1)
expD′(0) ,
κ = (D(0)− 1− 1
α)/(α+ 1) .
(3.46)
Applying Meinardus’ theorem to the case of (3.42), clearly we have
D(s) =
+ ζ(s) , (3.47)
so that, assuming m =
j=1 jbj is fixed, D(s) has a simple pole at s = α = 1 where it
has residue A = 1. Using
D(0) = −1
bj , expD
′(0) =
, (3.48)
then, from (3.44) with (3.46), we may easily determine that, as n → ∞,
Q(n,m, b) ∼ 1
bj m∏
. (3.49)
This reduces to (3.41) when bj = 0, 1 ≤ j ≤ m, whereby Q(n, 0, . . . , 0) = p(n). Using
(3.20) for (3.38) with (3.42) and (3.49) then, as n → ∞,
N(n,m) =
b1,...,bm≥0∑
jbj=m
Q(n,m, b)−
b1,...,bm−1≥0∑
jbj=m−1
Q(n+1, m−1, b)
(3.50)
since Q(n,m,m, 0, . . . , 0), for b1 = m, bj = 0, j > 1, dominates over all other terms in
(3.50). This gives asymptotic values for the numbers in (1.1), for counting quarter BPS
operators, transforming in [m,n−m,m] SU(4)R representations, in the large N limit of
free N = 4 super Yang Mills, as previously described.
4. Counting Operators in the Chiral Ring of N = 4 Super Yang Mills
For the purposes of counting operators in the chiral ring of N = 4 super Yang Mills,
we denote corresponding multi-trace partition functions by CU(N)(t).
The generating function for CU(N)(t) for the case of one bosonic fundamental field has
been written in the form [1,3,9]
C(ν, t) =
1− νtn =
νNCU(N)(t) , (4.1)
so that ν acts as a chemical potential for the rank of the gauge group U(N). The equiv-
alence CU(N)(t) = ZU(N)(t) = PN (t), with ZU(N)(t) as in (2.25), is actually a special case
of the q-Binomial theorem. Writing - see [43] for notation -
(a; q)k = (1− a)(1− aq) · · · (1− aqk−1) , (4.2)
then the q-Binomial theorem is, for |x|, |q| < 1,
(a; q)k
(q; q)k
(ax; q)∞
(x; q)∞
. (4.3)
(Identifying ν = x and q = t and setting a = 0 in (4.3), so that 1/(ν; t)∞ = C(ν, t) above
and 1/(t; t)N = PN (t) in (2.26), then CU(N)(t) = ZU(N)(t) = PN (t) straightforwardly.
This special case of the q-Binomial theorem is due to Euler.)
For the two boson case, so that the single particle partition function is given by
f(t, u) = t + u for some t, u, then the generating function for the finite N chiral ring
partition function CU(N)(t, u) is given by [1,3,9]
C(ν, t, u) =
n,m=0
1− νtnum =
νNCU(N)(t, u) . (4.4)
This function is more difficult to analyse in terms of counting though has been investigated
by Stanley [47] in relation to partitions - there it has been dubbed the ‘double Eulerian’
generating function. Through use of the Cauchy-Littlewood formula, then we may expand
CU(N)(t, u) in terms of partitions of N as,
CU(N)(t, u) =
hλ(t)hλ(u) , (4.5)
where
hλ(t) = sλ(1, t, t
2, . . .) , (4.6)
so that using an identity for Schur polynomials to be found in [24,47] then
CU(N)(t, u) =
1≤i<j≤N (1− tλi−λj+j−i)(1− uλi−λj+j−i)∏N
i=1(t; t)λi+N−i(u; u)λi+N−i
(i−1)λi . (4.7)
(4.5) with (4.6) has a natural interpretation in terms of plane partitions in that, for π
being all column-strict plane partitions of shape λ, |π| =
i,j πij ,
hλ(t) = sλ(1, t, t
2, . . .) =
t|π| . (4.8)
Obviously, (4.5) with (4.8) generalise for other chiral ring sectors. (For a different connec-
tion between the ‘double Eulerian’ generating function and major indices of permutations
see [47], p. 385.) As an illustration of (4.5) with (4.8), we may consider the case N = 2
whereby λ = (2, 0), (1, 1) gives the two possible partitions of 2. For λ = (2, 0) (corre-
sponding to a Young diagram with a single row of two boxes) π11 ≥ π12 ≥ 0 gives all
column-strict plane partitions of shape (2, 0), while for λ = (1, 1) (corresponding to a
Young diagram with a single column of two boxes) then π11 > π21 ≥ 0 gives all column-
strict plane partitions of shape (1, 1). Thus,
h(2,0)(t) =
π11,π12≥0
π11≥π12
tπ11+π12 =
(1− t)(1− t2) ,
h(1,1)(t) =
π11,π21≥0
π11>π21
tπ11+π21 =
(1− t)(1− t2) ,
(4.9)
so that, from (4.5) for N = 2,
CU(2)(t, u) = h(2,0)(t)h(2,0)(u)+h(1,1)(t)h(1,1)(u) =
1 + ut
(1− t)(1− t2)(1− u)(1− u2) , (4.10)
which is the correct result as may be verified by extracting the ν2 coefficient in an expansion
of (4.4) up to O(ν2).
9 See [47] for a detailed description of plane partitions. Briefly, a column-strict plane parti-
tion of shape λ is an array π = (πij) of non-negative integers with finitely many non-zero entries,
that is arranged in a Young tableaux with shape λ - see appendix A - such that the numbers πij
are weakly decreasing along each row, πij ≥ πi j+1 ≥ 0, and strictly decreasing down each column,
πij > πi+1 j ≥ 0. The sum of the parts of π is given by |π| =
πij. (Note that in contrast to
the definition in [47], here we are allowing πij = 0, for some i, j, to be a part of the plane partition
π with shape λ.)
In the large N limit,
CU(∞)(t, u) =
n1,n2≥0
n1+n2>0
1− tn1un2 , (4.11)
upon which attention is shortly focused.
For the numbers N(n,m) → N̂(n,m) counting quarter BPS primary operators for the
chiral ring of N = 4 super Yang Mills, belonging to [m,n − m,m] SU(4)R R-symmetry
representations, as in (1.1), we have 10
N̂(n,m) =
dt du CU(N)(t, u) s(n,m)(t−1, u−1) (t−1 − u−1)2 . (4.12)
These may be more conveniently evaluated in terms of the numbers in (3.4) M(n,m) →
M̂(n,m), counting all chiral ring quarter BPS operators in the [m,n−m,m] SU(4)R repre-
sentation, given by
M̂(n,m) =
(2πi)2
dt du CU(N)(t, u) t−n−1u−m−1 , (4.13)
so that N̂(n,m) = M̂(n,m)−M̂(n+1,m−1). Defining Pλ(n) to be the number of column-strict
plane partitions π of shape λ so that |π| =
i,j πij = n, then, from (4.5) with (4.8) and
(4.13), M̂(n,m) =
λ⊢N Pλ(n)Pλ(m). Thus,
N̂(n,m) =
(Pλ(n)Pλ(m)− Pλ(n+1)Pλ(m−1)) , (4.14)
counts chiral ring quarter BPS primary operators in SU(4)R representations [m,n−m,m]
for any n,m at finite N .
Asymptotic Counting for Chiral Ring BPS Operators at Large N
For asymptotic counting of operators in the chiral ring of N = 4 super Yang Mills
at large N , a relatively crude method is employed here which nevertheless captures the
exponential behaviour of counting numbers of interest. This method is based on saddle
point approximations of functions near a dominant singularity - see [45] for a useful sum-
mary. (Often for physical applications in thermodynamics, e.g. for entropy formulae, we
are interested only in the exponential behaviour of such numbers anyhow.)
10 This formula employs the orthonormality relation of Schur polynomials described here
and has appeared in a similar context in [33], appendix B.
To illustrate, we consider the one boson case in the large N limit again. We first find
a convenient ‘approximating function’ as follows,
P∞(t) =
1− tn = exp
ln(1− tn)
∼ exp
ds ln(1− ts)
= exp
6 ln t
(4.15)
which has an ‘easier’ singularity structure. (The approximation in the second step may
be justified by the Euler-Maclaurin formula for approximating sums by integrals.) Using
(4.15) then for large enough n,
p(n) =
dt P∞(t)t
−n−1 ∼ 1
dt eg(t) , g(t) = − π
6 ln t
− n ln t . (4.16)
We may approximate the latter integral for large n by noting that the dominant contribu-
tion is at the saddle point t′ = e−π/
6n ∼ 1 for which
g(t′) = π
n , g′(t′) = 0 , g′′(t′) =
6n3 eπ
2/3n = α , (4.17)
so that, for t′′ = t− t′,
p(n) ∼ eπ
2n/3 1
dt′′ e
αt′′2 ∼ eπ
2n/3 1
ds e−
αs2 =
2n/3 .
(4.18)
Thus,
ln p(n) ∼ π
n , (4.19)
which captures the correct behaviour of ln p(n) for large n, according to (3.41).
We may proceed analogously for the quarter BPS chiral ring multi-trace partition
function at large N , (4.11), which we approximate by
CU(∞)(t, u) =
n1,n2≥0
n1+n2>0
1− tn1un2 = exp
n1,n2≥0
n1+n2>0
ln(1− tn1un2)
∼ exp
dv dw ln(1− tvuw)
= exp
( ζ(3)
ln t lnu
(4.20)
In this case we have, from (4.13),
M̂(n,m) ∼
(2πi)2
dt du eg(t,u) , (4.21)
where
g(t, u) =
ln t lnu
− n ln t−m lnu , (4.22)
for n,m large. The dominant contribution to the integral, for n,m both large and of the
same order, occurs about the point (t′, u′) ∼ (1, 1) where
t′ = e−(ζ(3)mn
−2)1/3 , u′ = e−(ζ(3)nm
−2)1/3 , (4.23)
for which
g(t′, u′) = 3 3
ζ(3)nm ,
g(t, u)
(t′,u′)
g(t, u)
(t′,u′)
= 0 ,
g(t, u)
(t′,u′)
= 2(ζ(3)−1n5m−1)
3 e2(ζ(3)mn
−2)1/3 = α ,
g(t, u)
(t′,u′)
= 2(ζ(3)−1m5n−1)
3 e2(ζ(3)nm
−2)1/3 = β ,
g(t, u)
(t′,u′)
= (ζ(3)−1n2m2)
3 e(ζ(3)mn
−2)1/3+(ζ(3)nm−2)1/3 = γ .
(4.24)
So long as m,n are both large and of the same order, the saddle point approximation is
justified and we obtain
M̂(n,m) ∼ e3
ζ(3)nm 1
dv dw e−
(αv2+βw2+2γvw) = h(α, β, γ)e3
ζ(3)nm ,
(4.25)
where
h(α, β, γ) =
αβ − γ2
(ζ(3)m−2n−2)
3 e−(ζ(3)mn
−2)1/3−(ζ(3)nm−2)1/3 . (4.26)
We thus have that for n,m both comparably large,
lnM̂(n,m) ∼ 3 3
ζ(3)nm , (4.27)
so that
ln N̂(n,m) = ln
M̂(n,m) − M̂(n+1,m−1)
∼ lnM̂(n,m) ∼ 3 3
ζ(3)nm . (4.28)
It is difficult to check the consistency of this result given the dearth of literature on
these types of multi-variable generating functions and their asymptotic behaviour, however,
we may consider the simpler function, also considered in [9],
CU(∞)(t, t) =
(1− tn)n+1 =
E(r) tr , (4.29)
where, in terms of the counting numbers N̂(n,m), from (1.1),
CU(∞)(t, t) =
(n−m+1)N̂(n,m) tn+m ⇒ E(r) =
(r−2m+1)N̂(r−m,m) .
(4.30)
Hence, E(r) counts quarter BPS primary operators in the chiral ring of N = 4 super
Yang Mills, transforming in [m,n−m,m] SU(4)R representations, with the same conformal
dimensions r = n+m. Extracting the dominant contribution to ln E(r) from (4.30), which
occurs at the maximum value of m, mM = [
r], and using (4.28), we obtain
ln E(r) ∼ ln N̂(r−mM,mM) ∼ 32
2ζ(3)r2 . (4.31)
By considering (4.29) directly, we may employ Meinardus’ theorem (described in the
third section) to find the behaviour of ln E(r) as r → ∞. Note, however that Meinardus’
theorem may not be applied directly to (4.29) since the corresponding auxiliary Dirichlet
series (3.45), with aj = j+1, has two simple poles. To overcome this difficulty we split
(4.29) into a product of two functions, both separately amenable to application of Meinar-
dus’ theorem. One is the reciprocal of the Euler function, P∞(t) in (4.15). The other, the
MacMahon function, is given by
M(t) =
(1− tn)n =
q(r) tr , (4.32)
and has been considered in a similar context as here in [3].11 Writing
CU(∞)(t, t) = P∞(t)M(t) , (4.33)
with P∞(t) as in (4.15), so that, using (4.29),
E(r) =
r1,r2≥0
r1+r2=r
p(r1)q(r2) , (4.34)
we may find the asymptotic behaviour of ln E(r), as r → ∞, by extracting the dominant
contribution from (4.34) using the asymptotic behaviour of p(r), q(r). The auxiliary
Dirichlet series for M(t) in (4.32) is, from (3.45) with aj = j,
D(s) = ζ(s− 1) , (4.35)
11 The relation of M(t) in (4.32) to plane partitions is given a description in [46]. Briefly,
q(r) gives the number of ordinary plane partitions π, so that πij ≥ πi+1 j > 0, πij ≥ πi j+1 > 0,
with |π| =
πij = r. p(r) < q(r) as ordinary partitions λ are a special case of plane partitions.
In fact, the formula for ln q(r) found here is a special case of a more exact asymptotic formula
first found by Wright [48] for the number of plane partitions q(r) of the number r.
which has a simple pole at s = α = 2, at which the residue is A = 1. Thus, from (3.44),
ln q(r) ∼ 3
2ζ(3)r2 . (4.36)
This is consistent with (4.31) as the dominant contribution to lnE(r) comes from the r1 = 0
term in (4.34) (since p(r) ≪ q(r) as r → ∞) so that lnE(r) ∼ ln q(r).
It has not escaped attention that the method used here, to capture the exponential
behaviour of asymptotic values for the numbers M̂(n,m), may be easily extended to chiral
ring sectors other than the quarter BPS one. Suppose, for simplicity, that Zj , 1 ≤ j ≤ k−1,
are commuting bosonic fundamental fields, in the U(N) Lie algebra, so that the single
particle partition function is given by f(t) =
j=1 tj , in terms of the corresponding letters
tj . Let M̂(m1,...,mk−1) denote the number of independent operators involving products of
m1 Z1’s, m2 Z2’s etc. in corresponding multi-trace operators. The multi-trace partition
function, in the large N limit, is given by,
CU(∞)(t) =
n1,...,nk−1≥0
n1+...+nk−1>0
1− t1n1 · · · tk−1nk−1
, (4.37)
which may be crudely approximated by, similarly as before,12
CU(∞)(t) ∼ exp
dvj ln(1−t1v1 · · · tk−1vk−1)
= exp
(−1)k+1ζ(k)
ln t1 · · · ln tk−1
. (4.38)
Thus, without going into as much detail, for the analogue of (4.27) we have, (assuming mj
are all comparably large,)
lnM̂(m1,...,mk−1) ∼ g(t
1, . . . , t
k−1) = k
ζ(k)m1 · · ·mk−1 , (4.39)
where g(t′1, . . . , t
k−1) is the value of
g(t1, . . . , tk−1) =
(−1)k+1ζ(k)
ln t1 · · · ln tk−1
−m1 ln t1 − . . .−mk−1 ln tk−1 , (4.40)
12 For Lin(x) =
xj/jn being the usual Polylogarithm, with Lin(1) = ζ(n), n > 1,
Lin(0) = 0, then with the convention Li1(x) = − ln(1− x), the following integral
Lin(zx) = Lin+1(z) ,
may be useful for showing this, after a suitable change of variables.
at the saddle point (t′1, . . . , t
k−1) ∼ (1, . . . , 1), where
(ln t′1, . . . , ln t
k−1) = − k
ζ(k)m1 · · ·mk−1 (1/m1, . . . , 1/mk−1) , (4.41)
so that
g(t1, . . . , tk−1)
,...,t′
= 0 , j = 1, . . . , k−1 . (4.42)
(4.39) is consistent with a result implied by Meinardus’ theorem. The function,13
CU(∞)(t, . . . , t) ∼
(1− tn)−n
k−2/(k−2)! =
c(k, r)tr , (4.43)
has auxiliary Dirichlet series, from (3.45) with aj = j
k−2/(k − 2)!,
D(s) =
(k − 2)! ζ(s+ 2− k) , (4.44)
which has a simple pole at s = α = k−1 at which the residue is A = 1/(k − 2)!, so that,
from (3.44),
ln c(k, r) ∼ k
(k−1) ζ(k) rk−1 . (4.45)
(4.45) is precisely the result that may be obtained from (4.39) if we maximise the product
m1 · · ·mk−1, subject to the constraint
j=1 mj = r, for which the solution is mj = mM =
r/(k − 1) (relaxing the constraint that mj be non-negative integers, which is irrelevant
asymptotically), so that lnM̂(mM,...,mM) ∼ ln c(k, r).
This is applicable to counting multi-trace operators in the eighth BPS chiral ring sec-
tor for N = 4 super Yang Mills with fundamental fields Z, Y,X involving m1 Z’s, m2 Y ’s,
m3 X ’s. Expanding the corresponding partition function (4.37), with k = 4, in terms of
Schur polynomials s(m1,m2,m3)(t), m1 ≥ m2 ≥ m3 ≥ 0, similar to (1.1), the expansion
coefficients N̂(m1,m2,m3) count spinless multi-trace primary operators transforming in the
[m2 +m3, m1 −m2, m2 −m3] SU(4)R R-symmetry representation, with conformal dimen-
sionsm1+m2+m3 [33]. Just as in (4.28), asymptotically lnN(m1,m2,m3) ∼ lnM(m1,m2,m3).
This counting, however, ignores contributions of the fermionic fields λ, λ̄, which it may be
important to include in order to give correct counting of eighth BPS chiral ring operators.
13 This may be easily seen from (4.37), as the number of solutions to
mj = n, where
mj are non-negative integers, is the binomial number
n+k−2
which, to leading order in large n,
behaves like nk−2/(k−2)!. More properly, we should split the product CU(∞)(t, . . . , t) into pieces
separately amenable to Meinardus’ theorem, as for the prior case for k = 3, however, just as for
that case, the numbers c(k, r) dominate, and so other contributions are ignored here.
5. Conclusions
There are some obvious questions not answered by this work. The first is whether
or not the approach in the second section using symmetric polynomials can give insight
into thermodynamics at finite N , such as for the Hagedorn transition, for example. While
it gives the large N expression (2.14) in an elementary way, its wider applicability or
usefulness to such questions is unclear. The approach is undoubtedly useful for finding
exact expressions for counting numbers (as in (3.2), (3.27) and (3.34) for quarter BPS
operators) and (3.9), (3.16) may be useful for analysing counting for more complicated
sectors of N = 4 super Yang Mills, with gauge group U(N).
The second question is how the arguments employing symmetric polynomial tech-
niques here may be extended to other gauge groups, the most pertinent being perhaps
SU(N). Arguments here employing (2.6) and the orthonormality property of Schur poly-
nomials should remain largely unaffected for SU(N). Exact values for counting numbers
obtained here should require some modification for SU(N), though asymptotic values may
be unchanged.
The third question concerns asymptotic values for counting numbers and how these
may be improved. The asymptotic counting formulae given in such papers as [3,9] for
chiral ring sectors are special cases of formulae such as those of Hardy and Ramanujan,
Meinardus, etc., all of which derive from single variable generating functions. It is hoped
that the expressions (3.50), (4.28), (4.39), given here for asymptotic counting of BPS
operators, that distinguishes between differing R-symmetry charges, represents a serious
attempt at going beyond consideration of single variable generating functions.14 Improving
upon these formulae will require more sophisticated techniques, perhaps along the lines
used to find those of Hardy and Ramanujan or Meinardus and employing any modular
properties of the multi-variable functions involved. This issue may also be important for
microscopic counting for Black Holes, as the BPS solutions found thus far, for N = 4
superconformal symmetry, depend on special values of R-symmetry charges [17,18,19,20]
- see [50] for a related detailed discussion.
Thus far, the elegant results for finite N partition functions for chiral ring sectors have
been interpreted from a largely geometric perspective - it may be interesting to investigate
more how such results are related to the theory of random matrices and/or symmetric
polynomials.
14 After submission of the first version of this paper to the electronic archive, I received
an e-mail from Hai Lin pointing out an interesting comparison between (4.28) here and (2.14) of
[49], obtained in quite a different context. The two formulae are essentially the same given the
numerical value 3 3
ζ(3) = 3.189 . . ., correct to three decimal places.
Acknowledgements
I warmly thank Yang-Hui He, Paul Heslop, Hugh Osborn, Christian Romelsberger and
Christian Saemann for useful comments and discussions. This work is supported by an
IRCSET (Irish Research Council for Science, Engineering and Technology) Post-doctoral
Fellowship.
Appendix A. Partitions, symmetric group characters, symmetric polynomials
and inner products
A generic partition λ is any finite or infinite sequence λ = (λ1, λ2, . . .) of non-negative
integers in decreasing order λ1 ≥ λ2 ≥ . . . ≥ 0 containing only finitely many non-zero
terms. Often it is convenient to omit zero entries. The non-zero entries are called the
parts of λ the number of which we denote by ℓ(λ). The sum of the parts of λ is called
the weight of λ which we denote by |λ| =
i λi. If |λ| = L then λ is a partition of L
and we write λ ⊢ L. For convenience we sometimes write λ in its frequency representation
which is a reordering of the entries in λ, indicating the number of times each successive
non-negative integer occurs, (1a1 , 2a2 , . . .) so that exactly an of the parts of λ equal n and
|λ| =
n≥1 n an.
In terms of standard Young diagrams, λ corresponds to a Young diagram of shape λ,
with λ1 boxes in the first row, λ2 boxes in the second row etc.; the number of parts ℓ(λ)
is simply the number of rows and the weight |λ| is the total number of boxes.
For the symmetric group, SN , the irreducible representations are labelled by par-
titions λ ⊢ N - see [38] for a useful summary - so that, for Xλ(σ), σ ∈ SN , being a
corresponding matrix representation, then the character of σ ∈ SN in the representation
Xλ is χλ(σ) = tr(Xλ(σ)). The characters are class functions so that they take a constant
value on conjugacy classes and, recalling that for SN the conjugacy classes Kµ are labelled
by partitions µ ⊢ N , corresponding to the cycle structure of a class representative, then
χλ(σ) = χλµ for all σ ∈ Kµ. With zλ as defined in (2.8), a crucial property of SN charac-
ters is the orthogonality of the matrix [zµ
−1/2χλµ]λµ. This gives rise to the orthogonality
relations, for λ, µ ⊢ N , (see also Ch. IV of [51] for a related discussion,)
χλ(σ)χµ(σ) =
χλν χ
ν = δλµ , (A.1)
and ∑
χνλ χ
µ = zλδλµ . (A.2)
A convenient basis for N variable symmetric polynomials are Schur polynomials
sλ(z) = sλ(z1, . . . , zN ) labelled by λ = (λ1, . . . , λN ). They may be expressed in a number
of ways [24,47]. For convenience we write them as
sλ(z) = aλ+ρ(z)/aρ(z) , (A.3)
where ρ, the Weyl vector, is given by ρ = (N − 1, N − 2, . . . , 1, 0) and
aλ+ρ(z) =
sgn(σ) zσ(1)
λ1+N−1 · · · zσ(j)λj+N−j · · · zσ(N)λN = det[ziλj+N−j ] , (A.4)
aρ(z) = det[zi
N−j ] =
1≤i<j≤N
(zi − zj) = ∆(x) , (A.5)
being the Vandermonde determinant. Schur polynomials sλ(z) have a standard interpre-
tation as corresponding to the characters of irreducible U(N) (or, for
i zi = 1, SU(N))
Lie algebra representations. Here, λ gives the shape of the Young tableaux for the corre-
sponding U(N) Lie algebra representation.
For λ = (λ1, . . . , λN ) and µ = (µ1, . . . , µN ) where λi, µi ∈ Z then, from the definition
of (2.6) along with (A.3),
sλ, sµ
sgn(σ)δλσµ =
sgn(σ)δλµσ , (A.6)
where, for any λ′ = (λ′1, . . . , λ
N ), λ
′σ = σ(λ′+ρ)−ρ is the shifted Weyl reflection of λ′ by σ,
with the action of SN on λ′ being given by σ(λ′1, . . . , λ′N ) = (λ′σ(1), . . . , λ′σ(N)). (Equation
(A.6) is a reflection of sλ(x) = sgn(σ)sλσ(x) for any partition λ and σ ∈ SN - note that this
property is useful for showing (3.25), (3.31). λσ has a standard interpretation in terms of
U(N) Lie algebra representations - for the Verma module with dominant integral highest
weight having orthonormal basis labels λ, λ1 ≥ λ2 ≥ . . . ≥ λN ≥ 0, then λσ, for σ 6= idSN ,
are the orthonormal basis labels for the highest weights of all invariant sub-modules. This
fact may be exploited to derive the Weyl character formula (A.3) for the irreducible U(N)
Lie algebra representation with dominant integral highest weight having orthonormal basis
labels λ, or, alternatively, Young tableaux of shape λ.)
When λ, µ are partitions so that λ1 ≥ . . . ≥ λN ≥ 0 and µ1 ≥ . . . ≥ µN ≥ 0 then
(A.6) reduces to a well defined inner product,
sλ, sµ
= δλµ , (A.7)
so that in this case the Schur polynomials are orthonormal. Note that in order that sλ(x)
be non-zero for some arbitrary partition λ then ℓ(λ) ≤ N , so that (A.7) is zero for ℓ(λ) > N
or ℓ(µ) > N .
Another basis for symmetric polynomials are the power symmetric polynomials, pλ(z),
for λ = (λ1, . . . , λL) ⊢ L, which are defined by
pλ(z) = pλ1(z)pλ2(z) · · ·pλL(z) , pn(z) =
n . (A.8)
Note that there is no longer the restriction that ℓ(λ) ≤ N as for Schur polynomials.
Symmetric group characters may be used to relate the two bases for symmetric poly-
nomials [24,47] so that, with the definition of zλ in (2.8),
sλ(z) =
χλµ pµ(z) , (A.9)
(a theorem of Frobenius) and, for λ ⊢ L,
pλ(z) =
ℓ(µ)≤N
λ sµ(z) . (A.10)
((A.9) with χ
λ = 1 for all λ ⊢ N is useful for obtaining (3.35).)
Regarding the inner product (2.6), then using (A.10) along with (A.7), we then have
that, for λ ⊢ L, µ ⊢ M , 〈
pλ, pµ
= δLM
ℓ(ν)≤N
µ . (A.11)
Orthogonality of symmetric group characters implies, from (A.2), that for |λ|, |µ| ≤ N
then (A.11) simplifies to, with the definition of zλ in (2.8),
pλ, pµ
= zλδλµ . (A.12)
Appendix B. Tables
N n 2 3 4 5 6 7 8 9 10 11
1 1 1 1 1 1 1 1 1 1 1
2 2 2 3 3 4 4 5 5 6 6
3 2 3 4 5 7 8 10 12 14 16
4 2 3 5 6 9 11 15 18 23 27
5 2 3 5 7 10 13 18 23 30 37
6 2 3 5 7 11 14 20 26 35 44
Numbers of multi-trace half BPS primary operators, with conformal dimension n
and belonging to [0, n, 0] R-symmetry representations, for free N = 4 SYM with
U(N) gauge group. (For every N there is one [0, 0, 0] and [0, 1, 0] representation -
these are omitted above.)
N(n,1)
N n 2 3 4 5 6 7 8 9 10 11
2 1 1 2 2 3 3 4 4 5 5
3 1 2 4 5 8 10 13 16 20 23
4 1 2 5 7 12 16 23 30 40 49
5 1 2 5 8 14 20 30 41 57 74
6 1 2 5 8 15 22 34 48 69 92
7 1 2 5 8 15 23 36 52 76 104
Numbers of multi-trace quarter BPS primary operators, with conformal dimension
n+1 and belonging to [1, n−1, 1] R-symmetry representations, for free N = 4 SYM
with U(N) gauge group. (n = 0, 1 cases are all zero.)
N(n,2)
N n 2 3 4 5 6 7 8 9 10 11
3 3 5 10 14 21 27 36 44 55 65
4 3 6 14 21 36 50 73 96 130 163
5 3 6 15 25 44 66 101 142 200 267
6 3 6 15 26 48 74 118 171 251 346
7 3 6 15 26 49 78 126 188 281 398
8 3 6 15 26 49 79 130 196 298 428
Numbers of multi-trace quarter BPS primary operators, with conformal dimension
n+2 and belonging to [2, n−2, 2] R-symmetry representations, for free N = 4 SYM
with U(N) gauge group. (n = 0, 1 cases are all zero.)
References
[1] J. Kinney, J.M. Maldacena, S. Minwalla and S. Raju, An Index for 4 Dimensional
Super Conformal Theories, hep-th/0510251.
[2] D. Martelli, J. Sparks and S.T. Yau, Sasaki-Einstein Manifolds and Volume Min-
imisation, hep-th/0603021.
[3] S. Benvenuti, B. Feng, A. Hanany, Y.H. He, Counting BPS Operators in Gauge
Theories: Quivers, Syzygies and Plethystics, hep-th/0608050.
[4] D. Martelli and J. Sparks, Dual Giant Gravitons in Sasaki-Einstein Backgrounds,
Nucl. Phys. B 759 (2006) 292, hep-th/0608060.
[5] A. Basu and G. Mandal, Dual Giant Gravitons in AdSm × Y n (Sasaki-Einstein),
hep-th/0608093
[6] A. Butti, D. Forcella and A. Zaffaroni, Counting BPS Baryonic Operators in CFTs
with Sasaki-Einstein Duals, hep-th/0611229.
[7] A. Hanany and C. Romelsberger, Counting BPS Operators in the Chiral Ring of
N = 2 Supersymmetric Gauge Theories or N = 2 Braine Surgery, hep-th/0611346.
[8] L. Grant and K. Narayan, Mesonic Chiral Rings in Calibi-Yau Cones from Field
Theory, hep-th/0701189.
[9] B. Feng, A. Hanany, Y.H. He, Counting Gauge Invariants: The Plethystic Program,
hep-th/0701063.
[10] D. Forcella, A. Hanany and A. Zaffaroni, Baryonic Generating Functions, hep-
th/0701236.
[11] S. Lee, S. Lee and J. Park, Toric AdS4/CFT3 Duals and M-theory Crystals, hep-
th/0702120.
[12] G. Mandal and N.V. Suryanarayana, Counting 1/8 BPS Dual Giants, hep-
th/0606088.
[13] A. Sinha and N.V. Suryanarayana, Two-charge Small Black Hole Entropy: String-
loops and Multi-strings, JHEP 0610 (2006) 034, hep-th/0606218.
[14] N.V. Suryanarayana, Half-BPS Giants, Free Fermions and Microstates of Super-
stars, JHEP 0601 (2006) 082, hep-th/0411145.
[15] A. Barabanschikov, L. Grant, L.L. Huang and S. Raju, The Spectrum of Yang Mills
on a Sphere, hep-th/0501063.
http://arxiv.org/abs/hep-th/0510251
http://arxiv.org/abs/hep-th/0603021
http://arxiv.org/abs/hep-th/0608050
http://arxiv.org/abs/hep-th/0608060
http://arxiv.org/abs/hep-th/0608093
http://arxiv.org/abs/hep-th/0611229
http://arxiv.org/abs/hep-th/0611346
http://arxiv.org/abs/hep-th/0701189
http://arxiv.org/abs/hep-th/0701063
http://arxiv.org/abs/hep-th/0701236
http://arxiv.org/abs/hep-th/0701236
http://arxiv.org/abs/hep-th/0702120
http://arxiv.org/abs/hep-th/0702120
http://arxiv.org/abs/hep-th/0606088
http://arxiv.org/abs/hep-th/0606088
http://arxiv.org/abs/hep-th/0606218
http://arxiv.org/abs/hep-th/0411145
http://arxiv.org/abs/hep-th/0501063
[16] J.M. Maldacena, The Large N Limit of Superconformal Field Theories and Super-
gravity, Adv. Theor. Math. Phys. 2 (1998) 231, hep-th/9711200.
[17] J.B. Gutowski and H.S. Reall, Supersymmetric AdS(5) Black Holes, JHEP 0402
(2004) 006, hep-th/0401042.
[18] J.B. Gutowski and H.S. Reall, General Supersymmetric AdS(5) Black Holes, JHEP
0404 (2004) 048, hep-th/0401129.
[19] Z.W. Chong, M. Cvetic, H. Lu and C.N. Pope, General Non-extremal Rotating
Black Holes in Minimal Five-Dimensional Gauged Supergravity, Phys. Rev. Lett.
95 (2005) 161301 hep-th/0506029.
[20] H.K. Kunduri, J. Lucietti and H.S. Reall, Supersymmetric Multi-Charge AdS(5)
Black Holes, JHEP 0604 (2006) 036, hep-th/0601156.
[21] B.-S. Skagerstam, On the Large Nc Limit of the SU(Nc) Colour Quark-Gluon Par-
tition Function, Z. Phys. C 24 (1984) 97.
[22] B. Sundborg, The Hagedorn Transition, Deconfinement and N = 4 SYM Theory,
Nucl. Phys. B 573 (2000) 349, hep-th/9908001.
[23] O. Aharony, J. Marsano, S. Minwalla, K. Papadodimas and M. Van Raamsdonk,
The Hagedorn/Deconfinement Phase Transition in Weakly Coupled Large N Gauge
Theories, Adv. Theor. Math. Phys. 8 (2004) 603, hep-th/0310285.
[24] I.G. Macdonald, Symmetric Functions and Hall Polynomials, Second Edition,
Clarendon Press, Oxford, 1995.
[25] A.M. Polyakov, Gauge Fields and Space-Time, Int. J. Mod. Phys. A 17S1 (2002)
119, hep-th/0110196.
[26] E. Onofri, G. Veneziano and J. Wosiek, Supersymmetry and Combinatorics, math-
ph/0603082.
[27] R. De Pietri, S. Mori and E. Onofri, The Planar Spectrum in U(N)-Invariant
Quantum Mechanics by Fock Space Methods: I. The Bosonic Case, JHEP 0701
(2007) 018, hep-th/0610045.
[28] M. Bonini, G.M. Cicuta and E. Onofri, Fock Space Methods and Large N , J. Phys.
A 40 (2007) F229, hep-th/0701076.
[29] V.K. Dobrev and E. Sezgin, Spectrum and Character Formulae of SO(3, 2) Unitary
Representations, Lecture Notes in Physics, Vol. 379, eds. J.D Hennig, W. Lücke and
J. Tolar, Springer-Verlag, Berlin, 1990.
http://arxiv.org/abs/hep-th/9711200
http://arxiv.org/abs/hep-th/0401042
http://arxiv.org/abs/hep-th/0401129
http://arxiv.org/abs/hep-th/0506029
http://arxiv.org/abs/hep-th/0601156
http://arxiv.org/abs/hep-th/9908001
http://arxiv.org/abs/hep-th/0310285
http://arxiv.org/abs/hep-th/0110196
http://arxiv.org/abs/math-ph/0603082
http://arxiv.org/abs/math-ph/0603082
http://arxiv.org/abs/hep-th/0610045
http://arxiv.org/abs/hep-th/0701076
[30] V.K. Dobrev, Positive Energy Representations of Non-compact Quantum Algebras,
Proceedings of the Workshop on Generalized Symmetries in Physics, Clausthal,
July 1993, eds. H.D. Doebner et al., World Sci. Singapore, 1994.
[31] F.A. Dolan, Character Formulae and Partition Functions in Higher Dimensional
Conformal Field Theory, J. Math. Phys. 47 (2006) 062303, hep-th/0508031.
[32] V.K. Dobrev, Characters of the Positive Energy UIRs of D = 4 Conformal Super-
symmetry, hep-th/0406154.
[33] M. Bianchi, F.A. Dolan, P.J. Heslop and H. Osborn, N = 4 Superconformal Char-
acters and Partition Functions, Nucl. Phys. B 767 [FS] (2007) 163, hep-th/0609179.
[34] G.W. Gibbons, M.J. Perry and C.N. Pope, Partition Functions, the Bekenstein
Bound and Temperature Inversion in anti-de Sitter Space and its Conformal Bound-
ary, Phys. Rev. D 74 (2006) 084009, hep-th/0606186.
[35] J.L. Cardy, Operator Content and Modular Properties of Higher Dimensional Con-
formal Field Theories, Nucl. Phys. B366 (1991) 403.
[36] D. Kutasov and F. Larsen, Partition Sums and Entropy Bounds in Weakly Coupled
CFT, JHEP 0101 (2001) 001, hep-th/0009244.
[37] E. D’Hoker, P. Heslop, P. Howe and A.V. Ryzhov, Systematics of Quarter BPS
Operators in N = 4 SYM, JHEP 0304 (2003) 038, hep-th/0301104.
[38] B.E. Sagan, The Symmetric Group, Representations, Combinatorial Algorithms,
and Symmetric Functions, Wadsworth and Brooks/Cole Mathematics Series, Cali-
fornia, 1991.
[39] L. Begin, C. Cummins and P. Mathieu, Generating Functions for Tensor Products,
hep-th/9811113.
[40] I.M. Gessel, Symmetric Functions and P-Recursiveness, T. Combin. Theory Ser. A
53 (1990) 257.
[41] J. Carlsson and B.H.J. McKellar, SU(N) Glueball Masses in 2 + 1 Dimensions,
Phys. Rev. D 68 (2003) 074502, hep-lat/0303016.
[42] Y. Nakayama, Finite N Index and Angular Momentum Bound from Gravity, hep-
th/0701208.
[43] G.E. Andrews, R. Askey and R. Roy, Special Functions, Cambridge University
Press, Cambridge, 1999.
[44] J.F. Willenbring, Stable Hilbert Series of S(g)K for Classical Groups, math/0510649
http://arxiv.org/abs/hep-th/0508031
http://arxiv.org/abs/hep-th/0406154
http://arxiv.org/abs/hep-th/0609179
http://arxiv.org/abs/hep-th/0606186
http://arxiv.org/abs/hep-th/0009244
http://arxiv.org/abs/hep-th/0301104
http://arxiv.org/abs/hep-th/9811113
http://arxiv.org/abs/hep-lat/0303016
http://arxiv.org/abs/hep-th/0701208
http://arxiv.org/abs/hep-th/0701208
http://arxiv.org/abs/math/0510649
[45] A.M. Odlyzko, Asymptotic Enumeration Methods, Handbook of Combinatorics,
Vol. 2, eds. R.L. Graham, M. Grötschel and L. Lovász, North-Holland, Amster-
dam, 1995.
[46] G.E. Andrews, The Theory of Partitions, Encyclopedia of Mathematics and its
Applications, Vol. 2, ed. G.C. Rota, Addison-Wesley, Reading, Massachusetts, 1976.
[47] R.P. Stanley, Enumerative Combinatorics, Vol. 2, Cambridge University Press,
Cambridge, 1999.
[48] E.M. Wright, Asymptotic Partition Formulae, I: Plane Partitions, Quart. J. Math.
Oxford Ser. 2 (1931) pp. 177-189.
[49] H. Lin and J.M. Maldacena, Fivebranes from Gauge Theory, Phys. Rev. D 74 (2006)
084014, hep-th/0509235.
[50] H.K. Kunduri, J. Lucietti and H.S. Reall, Do Supersymmetric Anti-de Sitter Black
Rings Exist?, JHEP 0702 (2007) 026, hep-th/0611351.
[51] D.E. Littlewood, The Theory of Group Characters and Matrix Representations of
Groups, Clarendon Press, Oxford, 1940.
http://arxiv.org/abs/hep-th/0509235
http://arxiv.org/abs/hep-th/0611351
ABSTRACT
  The free field partition function for a generic U(N) gauge theory, where the
fundamental fields transform in the adjoint representation, is analysed in
terms of symmetric polynomial techniques. It is shown by these means how this
is related to the cycle polynomial for the symmetric group and how the large N
result may be easily recovered. Higher order corrections for finite N are also
discussed in terms of symmetric group characters. For finite N, the partition
function involving a single bosonic fundamental field is recovered and explicit
counting of multi-trace quarter BPS operators in free \N=4 super Yang Mills
discussed, including a general result for large N. The partition function for
BPS operators in the chiral ring of \N=4 super Yang Mills is analysed in terms
of plane partitions. Asymptotic counting of BPS primary operators with
differing R-symmetry charges is discussed in both free \N=4 super Yang Mills
and in the chiral ring. Also, general and explicit expressions are derived for
SU(2) gauge theory partition functions, when the fundamental fields transform
in the adjoint, for free field theory.

<|endoftext|><|startoftext|>
Introduction 2
2 Symplectic homology 6
3 The Morse-Bott chain complex 11
4 Morse-Bott moduli spaces 21
4.1 Transversality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2 Compactness for Morse-Bott trajectories . . . . . . . . . . . . . . 26
4.3 Gluing for Morse-Bott moduli spaces . . . . . . . . . . . . . . . . 30
4.4 Coherent orientations . . . . . . . . . . . . . . . . . . . . . . . . 69
A Appendix: Asymptotic estimates 84
http://arxiv.org/abs/0704.1039v2
Symplectic homology for autonomous Hamiltonians 2
1 Introduction
One crucial hypothesis in the definition of Floer homology [12] of a Hamiltonian
H on a symplectic manifold (W,ω) is that the 1-periodic orbits of the Hamilto-
nian vector field XH are nondegenerate. Unless they are all constant – which
happens if the Hamiltonian is C2-small – this forces H to be time-dependent.
The purpose of this paper is to define Floer homology for a time-independent, or
autonomous Hamiltonian H :W → R under the assumption that its 1-periodic
orbits are transversally nondegenerate. This last condition is generic in the
space of autonomous Hamiltonians.
Although this generalization of Floer homology is interesting by itself, our
main motivation is to understand the relationship between linearized contact
homology of a fillable contact manifold (M, ξ) and symplectic homology of the
filling (W,ω). In this case there is a natural class of time-independent Hamilto-
nians on W whose nonconstant 1-periodic orbits correspond precisely to closed
Reeb orbits on M = ∂W , and for which the Floer trajectories can be related
to holomorphic cylinders in the symplectization M × R [3]. The goal of the
present paper is to relate the Floer trajectories of a specific time-dependent
perturbation to the Floer trajectories of the unperturbed Hamiltonian. Thus
Floer homology for time-independent Hamiltonians serves as a bridge between
symplectic homology and linearized contact homology. Moreover, the moduli
spaces of Floer trajectories for autonomous Hamiltonians are related to the
moduli spaces defining S1-equivariant symplectic homology [4, 3].
The Morse-Bott analysis in this paper is, to the best of our knowledge, new
to the literature, being based on ideas contained in the first author’s Ph.D.
dissertation [1] within the context of contact homology. Although our situation
is that of critical manifolds of dimension one, the complexity of the analytical
setup is the same as that of the higher dimensional case.
We must mention at this point Frauenfelder’s inspired approach [16, Ap-
pendix A] in which he defines a complex for a Morse-Bott function on a finite
dimensional manifold via “flow lines with cascades” – these being our Floer
trajectories with gradient fragments – and in which, without proving the cor-
respondence with gradient trajectories for some perturbed Morse function, he
directly shows deformation invariance of the resulting chain complex.
We now describe the structure of the paper. We give in the introduction only
a loose statement of our main Correspondence Theorem 3.7 and we recall in
Section 2 the construction of symplectic homology. Although this is well-known
to specialists we still need to establish notations, and we seize the occasion to
set up a general framework using the Novikov ring and nontrivial homotopy
classes of periodic orbits.
Section 3 describes the Morse-Bott complex and formally states the Corre-
spondence Theorem 3.7. The latter is complemented by Proposition 3.9 which
describes how the coherent orientation signs for the Morse-Bott complex are
related to the ones for the Floer complex.
Section 4 contains the proofs of the previous transversality, compactness,
Symplectic homology for autonomous Hamiltonians 3
gluing and orientation statements. Finally, the Appendix contains the state-
ments concerned with the asymptotic behaviour of the various types of Floer
trajectories that we use. These asymptotic estimates enter crucially in the proof
of the compactness statements, as well as in the definition of the Fredholm setup
for gluing.
We end the introduction with an informal presentation of our results. Let
H : W → R be an autonomous Hamiltonian defined on a symplectic manifold
(W,ω). We assume that H is a Morse function and that the nonconstant 1-
periodic orbits ofH are transversally nondegenerate. The set P(H) of 1-periodic
orbits of H is the set of critical points of the Hamiltonian action functional and
consists of isolated elements γep corresponding to critical points p̃ ∈ Crit(H),
and of nonisolated elements coming in families Sγ which are Morse-Bott non-
degenerate circles. These correspond to reparametrizations of some given orbit
γ ∈ P(H), with γ : S1 = R/Z →W .
For each circle Sγ we choose a perfect Morse function fγ : Sγ → R with
exactly one maximum Max and one minimum min . We denote by γmin , γMax
the orbits in Sγ corresponding to the minimum and the maximum of fγ respec-
tively. We choose a chart S1 × R2n−1 ∋ (τ, p) and a smooth cut-off function
ργ : S
1 × R2n−1 → R in the neighbourhood of each γ(S1) ⊂W , and we denote
by ℓγ ∈ Z+ the maximal positive integer such that γ(θ + 1/ℓγ) = γ(θ), θ ∈ S1.
Following [8], for δ > 0 small enough the time-dependent Hamiltonian
Hδ : S
1 ×W → R,
Hδ(θ, τ, p) := H − δ
ργ(τ, p)fγ(τ − ℓγθ)
has only nondegenerate 1-periodic orbits. Moreover, these are of the following
two types: they are either constant orbits γep corresponding to critical points
p̃ ∈ Crit(H), or they are nonconstant orbits of the form γp ∈ P(H) for p ∈
Crit(fγ). Thus, out of each circle Sγ of periodic orbits for H there are exactly
two orbits surviving for Hδ, namely γmin and γMax .
Let J be a generic time-dependent almost complex structure on W . Given
p ∈ Crit(fγ), q ∈ Crit(fγ) we denote by
M(γp, γq;Hδ, J)
the moduli space of Floer trajectories for the pair (Hδ, J) modulo reparametriza-
tion, with negative asymptote γp and positive asymptote γq. We also denote
M(Sγ , Sγ ;H, J)
the moduli space of Floer trajectories for the pair (H, J) modulo reparametriza-
tion, with negative asymptote in Sγ and positive asymptote in Sγ . Our goal is
to describe the moduli spaces of the first type in terms of moduli spaces of the
second type.
Symplectic homology for autonomous Hamiltonians 4
We denote by
M(p, q;H, {fγ}, J)
the moduli space of Floer trajectories for the pair (H, J) with intermediate
gradient fragments, consisting of tuples
[u] = (cm, [um], cm−1, [um−1], . . . , [u1], c0), m ≥ 0
such that:
(i) [ui] ∈ M(Sγi , Sγi−1 ;H, J), i = 1, . . . ,m with γm := γ, γ0 := γ;
(ii) cm is a semi-infinite gradient trajectory of fγ = fγm connecting γp to the
endpoint of um;
(iii) cj , j = 1, . . . ,m − 1 is a finite gradient trajectory of fγj connecting the
endpoints of uj+1 and uj ;
(iv) c0 is a semi-infinite gradient trajectory of fγ = fγ0 connecting the endpoint
of u1 to γq.
We give a pictogram of such an element [u] with m ≥ 1 in Figure 4 on page 45,
where one should read ci instead of vi. If m = 0 such an element [u] is simply
an infinite gradient trajectory of some fγ . Let us note that, just as the space
of Floer trajectories for a nondegenerate Hamiltonian can be compactified by
adding “broken” Floer trajectories, the space of Floer trajectories with interme-
diate gradient fragments can be compactified by adding “broken” such objects,
with an obvious meaning. We denote by M(p, q;H, {fγ}, J) these compactified
moduli spaces.
Our main result is the following comparison theorem.
Theorem. The following assertions hold.
(i) any sequence [vn] ∈ M(γp, γq;Hδn , J), δn → 0 converges to an element of
M(p, q;H, {fγ}, J);
(ii) any element of M(p, q;H, {fγ}, J) can be obtained as such a limit;
(iii) there is a bijective correspondence between elements of M(γp, γq;Hδ, J)
and elements of M(p, q;H, {fγ}, J) if the difference of index of the end-
points is equal to one, or equivalently if the moduli spaces have dimension
zero.
The rigorous forms for the statements (i), (ii), (iii) are given in Proposi-
tion 4.7, Proposition 4.22 and Theorem 3.7 respectively. Unsurprisingly, the
Fredholm setup for the previous theorem uses Sobolev norms with exponential
weights since we have degenerate asymptotics. Similarly, due to the convergence
estimates in the Appendix, there are such weights centered on the portions of the
Floer cylinders approaching gradient fragments. For each peak in the weight,
there is a special section supported around this peak which has constant norm
Symplectic homology for autonomous Hamiltonians 5
with respect to δ → 0. For each gradient fragment this section corresponds to
the reparametrization shift of the underlying gradient trajectory. As δ → 0, the
corresponding peak explodes and thus forbids all infinitesimal variations except
for the single degree of freedom coming from Morse theory.
To be useful for homological calculations the above theorem needs to be
complemented by a statement concerning signs. We describe in Section 4.4 how
to construct coherent orientations on the relevant spaces of Fredholm operators
and how to obtain signs ǫ(u) and ǫ(uδ) for elements [u] ∈ M(p, q;H, {fγ}, J)
and uδ ∈ M(γp, γq;Hδ, J) when the corresponding moduli spaces are zero-
dimensional. We recall in Remark 3.2 the definition of good orbits borrowed
from Symplectic Field Theory, where it plays a crucial role in all orientation
and signs problems. In the following statement we denote again by m ≥ 0 the
number of nonconstant Floer trajectories involved in u.
Proposition 3.9. Assume the moduli spaces under consideration have dimen-
sion zero. The bijective correspondence between elements uδ ∈ M(γp, γq;Hδ, J)
and [u] ∈ M(p, q;H, {fγ}, J) changes signs as follows:
(i) If m ≥ 1 we have
ǫ(u) = (−1)m−1ǫ(uδ);
(ii) If m = 0 we have u = uδ and ǫ(u) = ǫ(uδ), p is the minimum and q is the
maximum of the same function fγ , the moduli space M(p, q;H, {fγ}, J)
consists of the two gradient lines of fγ running from p to q, and their signs
are different if and only if the underlying orbit γ is good.
This result has two pleasant consequences. On the one hand we can construct
a “Morse-Bott” chain complex which computes symplectic homology by count-
ing with suitable signs rigid elements in the moduli spaces M(p, q;H, {fγ}, J).
On the other hand, this chain complex singles out algebraically the good orbits
and can be used to relate the symplectic homology of a manifold (W,ω) with
contact type boundary to the linearized contact homology of its boundary – the
latter being defined by a chain complex involving only good orbits. As already
mentioned at the beginning of this section, this is achieved in [3].
Acknowledgements. A.O. has benefited from a Swiss National Fund grant
under the supervision of Prof. Dietmar Salamon at ETH Zürich. Both authors
acknowledge financial support from the Fonds National de la Recherche Scien-
tifique, Belgium, the Forschungsinstitut für Mathematik, Zürich, the Institut de
Recherche Mathématique Avancée at Université Louis Pasteur, Strasbourg, as
well as from the Mathematisches Forschungsinstitut, Oberwolfach. The authors
would like to thank an anonymous referee for having carefully read through the
arguments in the paper and for having kindly suggested a solution to a gap in
the original proof of Proposition 4.9.
Symplectic homology for autonomous Hamiltonians 6
2 Symplectic homology
We define in this section the symplectic homology groups of a symplectically
aspherical manifold with contact type boundary. Our construction is modelled
on those of Cieliebak, Floer, Hofer and Viterbo [6, 7, 13, 28]. We consider
nontrivial homotopy classes of loops and we use the Novikov ring.
Let (W,ω) be a compact symplectic manifold with contact type boundary
M := ∂W . This means that there exists a vector field X defined in a neigh-
bourhood of M , transverse and pointing outwards along M , and such that
LXω = ω.
Such an X is called a Liouville vector field. The 1-form λ := (ιXω)|M is a
contact form on M . We denote by ξ the contact distribution defined by λ. The
Reeb vector field Rλ is uniquely defined by the conditions ker ω|M = 〈Rλ〉
and λ(Rλ) = 1. We denote by φλ the flow of Rλ. The action spectrum of
(M,λ) is defined by
Spec(M,λ) := {T ∈ R+ | there is a closed Rλ-orbit of period T }.
We assume throughout this paper the condition
f∗ω = 0 for all smooth f : T 2 →W. (1)
This guarantees that the energy of a Floer trajectory does not depend on its
homology class, but only on its endpoints (see below). Condition (1) plays an
important role in the Morse-Bott description of the symplectic homology groups.
Our main class of examples is provided by exact symplectic forms.
Let φ be the flow of X . We parametrize a neighbourhood U of M by
G :M × [−δ, 0] → U, (p, t) 7→ φt(p).
Then d(etλ) is a symplectic form on M ×R+ and G satisfies G∗ω = d(etλ). We
denote
Ŵ :=W
M × R+
and endow it with the symplectic form
ω̂ :=
ω, on W,
d(etλ), on M × R+.
Given a time-dependent HamiltonianH : S1×Ŵ → R, we define theHamil-
tonian vector field XθH by
ω̂(XθH , ·) = dHθ, θ ∈ S1 = R/Z,
where Hθ := H(θ, ·). We denote by φH the flow of XθH , defined by φ0H = Id and
φθH(x) = X
H(x)), θ ∈ R.
Symplectic homology for autonomous Hamiltonians 7
Let H be the set of admissible Hamiltonians, consisting of functions
H : S1 × Ŵ → R which satisfy
(i) H < 0 on W ;
(ii) H(θ, p, t) = αet + β for t large enough, with α /∈ Spec(M,λ);
(iii) every 1-periodic orbit γ : S1 → Ŵ of XθH is nondegenerate, i.e.
1l− dφ1H(γ(0))
6= 0.
We denote by P(H) the set of 1-periodic orbits of XθH and by Pa(H) the set of
1-periodic orbits in a given free homotopy class a in Ŵ .
Let J denote the set of admissible almost complex structures
J : S1 → End(TŴ ), J2 = −1l
which are compatible with ω̂ and have the following standard form for t large
enough: {
J(p,t)|ξ = J0,
J(p,t)
= Rλ.
Here J0 is any compatible complex structure on the symplectic bundle (ξ, dλ)
which is independent of θ and t.
Let us fix a reference loop la : S
1 → Ŵ for each free homotopy class a in
Ŵ such that [la] = a. If a is the trivial homotopy class we choose la to be a
constant loop. Recall that free homotopy classes of loops in Ŵ are in one-to-one
correspondence with conjugacy classes in π1(Ŵ ). As a consequence, the inverse
a−1 of a free homotopy class is well-defined. We require that la−1 coincides with
the loop la with the opposite orientation.
The Hamiltonian action functional acts on pairs (γ, [σ]) consisting of a
loop γ ∈ C∞(S1, Ŵ ) and the homology class (rel boundary) of a map σ : Σ → Ŵ
defined on a Riemann surface Σ with two boundary components ∂0Σ (with the
opposite boundary orientation) and ∂1Σ (with the boundary orientation), which
satisfies
σ|∂0Σ = l[γ], σ|∂1Σ = γ. (3)
Its values are defined by
AH(γ, [σ]) := −
σ∗ω̂ −
H(θ, γ(θ)) dθ. (4)
The differential dAH(γ, [σ]) : C∞(S1, γ∗TŴ ) → R is given by
dAH(γ, [σ])ζ :=
ω̂(γ̇ −XθH(γ), ζ) dθ.
Therefore the critical points of AH are pairs (γ, [σ]) such that γ ∈ P(H). We
fix from now on, for each γ ∈ P(H), a map σγ satisfying (3); then the set of all
pairs (γ, [σ]) can be identified with H2(W ;Z) for fixed γ.
Symplectic homology for autonomous Hamiltonians 8
Let us choose a symplectic trivialization
Φa : S
1 × R2n → l∗aTŴ
for each free homotopy class a in Ŵ . If a is the trivial homotopy class we
choose the trivialization to be constant. Moreover, we require that Φa−1(θ, ·) =
Φa(−θ, ·), θ ∈ S1 = R/Z. For each γ ∈ P(H) there exists a unique (up to
homotopy) trivialization
Φγ : Σ× R2n → σ∗γTŴ
such that Φγ = Φ[γ] on ∂0Σ× R2n. Let
Ψ : [0, 1] → Sp(2n), Ψ(θ) := Φ−1γ ◦ dφθH(γ(0)) ◦ Φγ . (5)
Because γ is nondegenerate we can define the Conley-Zehnder index µ(γ) by
µ(γ) := µ(γ, σγ) := −µCZ(Ψ), (6)
where µCZ(Ψ) is the Conley-Zehnder index of a path of symplectic matrices [23].
Remark 2.1. If, in the previous construction, we replace σγ with σγ#A for
some A ∈ H2(W ;Z), then the resulting index will be
µ(γ, σγ#A) = µ(γ, σγ)− 2〈c1(TW ), A〉. (7)
We define the Novikov ring Λω as the set of formal linear combinations
A∈H2(W ;Z)
A, λA ∈ Z such that
# {A |λA 6= 0, ω(A) ≤ c} <∞
for all c > 0. The multiplication in Λω is given by
λ ∗ λ′ :=
A,B∈H2(W ;Z)
We note that, if ω is exact, then Λω = Z[H2(W ;Z)]. We define a grading on
Λω by |eA| := −2〈c1(TW ), A〉. For each free homotopy class a in Ŵ and each
admissible Hamiltonian H we define the symplectic chain group SCa∗ (H) as
the free Λω-module generated by elements γ ∈ Pa(H). The grading is given by
|eAγ| := µ(γ)− 2〈c1(TW ), A〉.
We define the space of Floer trajectories M̂A(γ, γ;H, J) as the set of solu-
tions u : R× S1 → Ŵ of the equation
∂su+ Jθ(∂θu−XθH) = 0, (8)
such that
u(s, θ) = γ(θ), lim
u(s, θ) = γ(θ), lim
∂su = 0 (9)
uniformly in θ and
[σγ#u] = [σγ#A]. (10)
Symplectic homology for autonomous Hamiltonians 9
Remark 2.2. Under the nondegeneracy assumption on γ, γ condition (9) is
equivalent to the finiteness of the energy
E(u) := EJ,H(u) :=
|∂su|2θ + |∂tu−XθH |2θ
dsdθ. (11)
Because γ, γ are nondegenerate the linearized operator Du : W
1,p(R ×
S1, u∗TŴ ) → Lp(R× S1, u∗TŴ ), p > 2 given by
Duζ := ∇sζ + Jθ∇θζ + (∇ζJθ)∂θu−∇ζ
, u ∈ M̂A(γ, γ;H, J) (12)
is Fredholm with index
ind(Du) = µ(γ)− µ(γ) + 2〈c1(TW ), A〉. (13)
An almost complex structure J ∈ J is called regular for u ∈ M̂A(γ, γ;H, J) if
Du is surjective, and it is called regular if Du is surjective for all γ, γ ∈ P(H),
A ∈ H2(W ;Z) and u ∈ M̂A(γ, γ;H, J). It is proved in [15] that the space
Jreg(H) of regular almost complex structures is of the second category in J . For
every J ∈ Jreg(H) the space M̂A(γ, γ;H, J) is a smooth manifold of dimension
µ(γ)− µ(γ) + 2〈c1(TW ), A〉. From now on we fix some J ∈ Jreg(H).
If γ 6= γ or A 6= 0, the additive group R acts freely on M̂A(γ, γ;H, J) by
s0 · u(·, ·) := u(s0 + ·, ·). We define the moduli space of Floer trajectories
MA(γ, γ;H, J) := M̂A(γ, γ;H, J)/R.
Its dimension is
dim MA(γ, γ;H, J) := µ(γ)− µ(γ) + 2〈c1(TW ), A〉 − 1.
If γ = γ and A = 0, the space M̂0(γ, γ;H, J) consists of a single point, cor-
reponding to a constant solution (i.e. independent of s). The R action is then
trivial and we define the moduli space by M0(γ, γ;H, J) := M̂0(γ, γ;H, J).
A straightforward application of the maximum principle [28] using the special
form of admissible Hamiltonians for large t shows that all solutions of equa-
tions (8) and (9) are contained in a compact set. Moreover, by condition (1),
there are no J-holomorphic spheres that can bubble off. Therefore the mod-
uli space MA(γ, γ;H, J) can be compactified [12] to a space MA(γ, γ;H, J)
consisting of all tuples
([uk], [uk−1], . . . , [u1]), [ui] ∈ MAi(γi, γi;H, J)
such that γ
= γ, γi = γi+1, γk = γ and
iAi = A. We call such a tuple
([uk], [uk−1], . . . , [u1]) a broken trajectory of level k. The topology of the
compactified moduli space is described by the following notion of convergence:
a sequence [uν ] ∈ MA(γ, γ;H, J) is said to converge to the broken trajectory
Symplectic homology for autonomous Hamiltonians 10
γ = γ
γ1 = γ2
γ2 = γ3
γ = γ3
Figure 1: Broken trajectory.
([uk], [uk−1], . . . , [u1]) if there exist sequences s
i ∈ R, 1 ≤ i ≤ k such that sνi ·uν
converges uniformly on compact sets to ui.
If the space M̂A(γ, γ;H, J) is nonempty then its dimension is strictly positive
due to the action of R. In this case, the broken trajectories involved in the
compactification have level at most dim M̂A(γ, γ;H, J). In particular, when
µ(γ) − µ(γ) + 2〈c1(TW ), A〉 = 1 the moduli space MA(γ, γ;H, J) is compact
and consists of a finite number of points. In this situation one can associate a
sign ǫ(u) to each element [u] of this moduli space [13] (see also Section 4.4). We
define the Floer differential
∂ : SCa∗ (H) → SCa∗−1(H)
∂γ :=
µ(γ)−µ(γ)+2〈c1(TW ),A〉=1
[u]∈MA(γ,γ;H,J)
ǫ(u)eAγ. (14)
According to Floer [12] we have ∂2 = 0. We define the symplectic homol-
ogy groups of the pair (H, J) by
SHa∗ (H, J) := H∗(SC
∗ (H), ∂).
Remark 2.3. In view of condition (1) the Novikov ring Λω can be replaced by
Z[H2(W ;Z)], or even by Z at the price of losing the grading. Indeed, the energy
of a Floer trajectory depends only on its endpoints, hence the moduli spaces
M(γ, γ;H, J) :=
(γ, γ;H, J) are compact. Therefore the sum (14) in-
volves only a finite number of classes A.
By a standard argument [12] the groups SHa∗ (H, J) do not depend on J ∈
Jreg(H). Nevertheless, they do depend onH and, in order to obtain an invariant
of (W,ω), we need an additional algebraic limit construction. We define an
Symplectic homology for autonomous Hamiltonians 11
admissible homotopy of Hamiltonians as a map H : R×S1×Ŵ → R with
the following properties:
(i) H(s, ·, ·) = H− ∈ H for s ≤ −1, H(s, ·, ·) = H+ ∈ H for s ≥ 1;
(ii) H < 0 on W and there exist t0 ≥ 0 and functions α, β : R → R such that,
for all t ≥ t0, we have
H(s, t, p) = α(s)et + β(s);
(iii) ∂sH ≥ 0.
An admissible homotopy of almost complex structures is a map J : R →
J such that J(s) = J− for s ≤ −1 and J(s) = J+ for s ≥ 1. Given an admissible
homotopy of Hamiltonians one defines regular admissible homotopies of almost
complex structures in the usual way, by linearizing the equation
∂su(s, θ) + J(s, θ, u(s, θ))(∂θu(s, θ)−XθH(s, θ, u(s, θ))) = 0, (15)
subject to the limit conditions
u(s, ·) = γ ∈ P(H−), lim
u(s, ·) = γ ∈ P(H+). (16)
Regular admissible homotopies of almost complex structures form again a set
of the second category in the space of admissible homotopies and the rigid
behaviour of H for t ≥ t0, together with the condition ∂sH ≥ 0, ensures again
that solutions of (15) and (16) stay in a compact set (see [22]). The usual count
of solutions of (15) and (16) induces the monotonicity morphism
σ : SHa∗ (H−) → SHa∗ (H+), (17)
which does not depend on the choice of admissible homotopy connecting H−
and H+. These morphisms form a direct system on the set {SHa∗ (H), H ∈ H}
and we define the symplectic homology groups of (W,ω) by
SHa∗ (W,ω) := lim→
SHa∗ (H).
According to [6, Lemma 3.7] and [28, Theorem 1.7] these groups do not depend
on the choice of the Liouville vector field X .
3 The Morse-Bott chain complex
In this section we apply the Morse-Bott formalism of [1] to the case of Hamil-
tonians H : Ŵ → R having circles of 1-periodic orbits.
We denote by Pλ the set of closed unparametrized Rλ-orbits in M . For each
free homotopy class of loops b in M we denote by Pbλ the set of all γ ∈ Pλ in the
homotopy class b. The inclusion i : M →֒ W induces a map (still denoted by
Symplectic homology for autonomous Hamiltonians 12
i) between the sets of free homotopy classes of loops in M and W respectively.
For each free homotopy class a in W we denote
−1(a)
b∈i−1(a)
We assume in this section that the closed Reeb orbits on M are transversally
nondegenerate in M . This means that, for every orbit γ of period T > 0, we
1l− dφTλ (γ(0))|ξ
6= 0.
This can always be achieved by an arbitrarily small perturbation of λ or, equiv-
alently, of X , and such perturbations do not change the symplectic homology
groups. If all orbits γ ∈ Pλ are transversally nondegenerate one can assign to
each of them a Conley-Zehnder index µCZ(γ) according to the following recipe.
We fix a reference loop lb : S
1 → M for each free homotopy class b in M
such that [lb] = b. If b is the trivial homotopy class we choose lb to be a constant
loop and we require that lb−1 coincides with lb with the opposite orientation.
We define symplectic trivializations
Φb : S
1 × R2n−2 → l∗bξ
as follows. For each class b we choose a homotopy hab : S
1 × [0, 1] → W from
la, a = i(b) to lb such that
ha−1b−1(τ, ·) = hab(−τ, ·). (18)
We extend the trivialization Φa : S
1 × R2n → l∗aTŴ over the homotopy hab to
get a trivialization Φ′b : S
1 × R2n → l∗bTŴ . This trivialization is homotopic to
another one, still denoted Φ′b, such that
Φ′b(S
1 × R2n−2 × {0} × {0}) = l∗bξ,
Φ′b(S
1 × {0} × R× {0}) = l∗b 〈
〉, (19)
Φ′b(S
1 × {0} × {0} × R) = l∗b 〈Rλ〉.
We define Φb := Φ
b|S1×R2n−2×{0}×{0}. If b is the trivial homotopy class we
choose hab to be a path of constant loops, so that Φb is constant.
We fix for each γ ∈ Pλ a map σγ : Σ → M , where Σ is a Riemann surface
with two boundary components ∂0Σ (with the opposite boundary orientation)
and ∂1Σ (with the boundary orientation), satisfying
σ|∂0Σ = l[γ], σ|∂1Σ = γ. (20)
For each γ ∈ Pλ there exists a unique (up to homotopy) trivialization
Φγ : Σ× R2n−2 → σ∗γξ
Symplectic homology for autonomous Hamiltonians 13
such that Φγ = Φ[γ] on ∂0Σ× R2n−2. Let
Ψ : [0, T ] → Sp(2n− 2), Ψ(τ) := Φ−1γ ◦ dφτλ(p) ◦ Φγ , p ∈ im γ. (21)
Because γ is nondegenerate we can define the Conley-Zehnder index µ(γ) by
µ(γ) := µ(γ, σγ) := µCZ(Ψ), (22)
where µCZ(Ψ) is the Conley-Zehnder index of a path of symplectic matrices [23].
Remark 3.1. If, in the previous construction, we replace σγ with σγ#A for
some A ∈ H2(M ;Z), then the resulting index will be
µ(γ, σγ#A) = µ(γ, σγ) + 2〈c1(ξ), A〉. (23)
Note that c1(ξ) = i
∗c1(TW ) because i
∗TW = ξ⊕〈 ∂
, Rλ〉. Moreover, the parity
of µ(γ) is well-defined independently of the trivialization of ξ along γ.
Remark 3.2. For each simple orbit γ ∈ Pλ we denote by γk, k ∈ Z+ its positive
iterates. The parity of the Conley-Zehnder index of all the odd, respectively even
iterates is the same. If these two parities differ we say that all even iterates γ2k,
k ∈ Z+ are bad orbits. It is proved in [27, Lemma 3.2.4] that the even iterates
of a simple orbit γ of period T are bad if and only if dφTλ (p)|ξ, p ∈ im γ has
an odd number of real negative eigenvalues strictly smaller than −1 (see also
Lemma 4.25). The orbits in Pλ which are not bad are called good orbits.
We define a new class H′ of admissible Hamiltonians consisting of elements
H : Ŵ → R such that
(i) H |W is a C2-small Morse function and H < 0 on W ;
(ii) H(p, t) = h(t) outside W , where h(t) is a strictly increasing function with
h(t) = αet + β, α, β ∈ R, α /∈ Spec(M,λ) for t bigger than some t0, and
such that h′′ − h′ > 0 on [0, t0[.
Note that the 1-periodic orbits of XH in W are constant and nondegenerate
by assumption (i). A direct computation shows that
Xh(p, t) = −e−th′(t)Rλ. (24)
The 1-periodic orbits of XH fall in two classes:
(1) critical points of H in W ;
(2) nonconstant 1-periodic orbits of Xh, located on levels M × {t}, t ∈]0, t0[,
which are in one-to-one correspondence with closed −Rλ-orbits of period
e−th′(t).
Symplectic homology for autonomous Hamiltonians 14
Recall that, for every critical point p̃ ∈ Crit(H), the corresponding constant
XH-orbit γep has Conley-Zehnder index
µ(γep) = ind(p̃;−H)− n, n =
dim W,
where ind(p̃;−H) is the Morse index of p̃ with respect to −H [25, Lemma 7.2].
Let α := limt→∞ e
−tH(p, t). We denote by P≤αλ the set of all γ ∈ Pλ such
γ∗λ ≤ α. BecauseH is independent of θ, every orbit γ ∈ P≤αλ gives rise to
a whole circle of nonconstant 1-periodic orbits γH of XH . We denote by Sγ the
set of such orbits and identify Sγ with its image under the natural embedding
Sγ → Ŵ given by γH 7→ γH(0). Note that all elements of Sγ differ by a shift in
the parametrization, and that the γH are noninjective if their minimal period
is smaller than 1.
Lemma 3.3. Let H ∈ H′. Every nonconstant 1-periodic orbit γH of H is
transversally nondegenerate in Ŵ .
Proof. We have to show that the only eigenvector of dφ1H(γH(0)) corresponding
to the eigenvalue 1 is γ̇H(0). To this effect we note that ξ is an invariant space
and that
dφ1H(γH(0))|ξ =
e−th′(t)
(γH(0))|ξ.
Because we have assumed that all Rλ-orbits are transversally nondegenerate in
M , it follows that dφ1H(γH(0))|ξ has no eigenvalue equal to one. On the other
hand we have
dφ1H(γH(0))
− e−t(h′′ − h′)Rλ.
The conclusion follows because h′′(t)− h′(t) > 0.
For each γ ∈ P≤αλ we choose a Morse function fγ : Sγ → R with exactly one
maximum M and one minimum m. We fix from now on an element H ∈ H′
and, for each γ ∈ Pλ corresponding to a nonconstant γH ∈ P(H), we denote
by ℓγ ∈ Z+ the maximal positive integer such that γH(θ + 1ℓγ ) = γH(θ) for
all θ ∈ S1. We choose a symplectic trivialization ψ := (ψ1, ψ2) : Uγ
V ⊂ S1 × R2n−1 between open neighbourhoods Uγ ⊂ Ŵ of γH(S1) and V of
S1 × {0}, such that ψ1(γ(θ)) = ℓγθ. Here S1 × R2n−1 is endowed with the
symplectic form ω0 :=
i=1 dqi ∧ dpi, q1 ∈ S1, (p1, q2, p2, . . . , qn, pn) ∈ R2n−1.
Let ρ : S1 × R2n−1 → [0, 1] be a smooth cutoff function supported in a small
neighbourhood of S1 × {0} such that ρ|S1×{0} ≡ 1. For δ > 0 and (θ, p, t) ∈
S1 × Uγ we define
Hδ(θ, p, t) := h(t) + δρ(ψ(p, t))fγ(ψ1(p, t)− ℓγθ). (25)
The Hamiltonian Hδ coincides with H outside the open sets S
1 × Uγ . This is
precisely the perturbation described in [8, Proposition 2.2]. It is shown therein
that, for δ sufficiently small, the set P(Hδ) consists of the following elements:
Symplectic homology for autonomous Hamiltonians 15
(1) constant orbits, which are the same as those of H ;
(2) nonconstant orbits, which are nondegenerate and form pairs (γmin , γMax ),
where γ ∈ P≤αλ and γmin , γMax coincide with the orbits in Sγ starting at
the minimum and the maximum of fγ respectively.
Lemma 3.4. The periodic orbits γmin , γMax ∈ P(Hδ) satisfy
µ(γmin) = µ(γ) + 1, µ(γMax ) = µ(γ). (26)
Proof. We denote by γH the 1-periodic orbit of XH corresponding to γ ∈ P≤αλ .
We define the Robbin-Salamon index of γH by
µRS(γH) := µ(Ψ),
where Ψ : [0, 1] → Sp(2n) is given by (5) and µ(Ψ) is the Robbin-Salamon
index of an arbitrary path of symplectic matrices [23, §4]. It is shown in [8,
Proposition 2.2] that
− µ(γmin) = µRS(γH)−
, −µ(γMax ) = µRS(γH) +
. (27)
Note that γH has the orientation of −Rλ. Define Ψ̃ : [0, 1] → Sp(2n) by
Ψ̃(θ) := Φ−1γ (−θ) ◦ dφ−θH (γ(0)) ◦Φγ(0),
where Φγ : R/Z × R2n → γ∗HTŴ is the trivialization involved in (5). Then
µRS(−γH) = −µRS(γH) = µ(Ψ̃).
Let Sp∗(2n) ⊂ Sp(2n) be the set of symplectic matrices with no eigenvalue
equal to 1 and recall that we have denoted by a free homotopy classes of loops
in W and by b free homotopy classes of loops in M . By our choice (18) and (19)
of trivializations of TŴ over the reference loops lb, b ∈ i−1(a) we deduce that
the path Ψ̃ is homotopic with endpoint in Sp∗(2n) to the path
[0, 1] → Sp(2n) : θ 7→ Ψλ (Tθ)⊕
Here T := e−t(h′′(t) − h′(t)) and Ψλ : [0, T ] → Sp(2n − 2) is defined by (21).
By the symplectic shear axiom for the Robbin-Salamon index [23, Theorem 4.1]
the index of the above path is µ(γ) + 1
. As a consequence
−µRS(γH) = µRS(−γH) = µ(γ) +
Together with (27) this yields the conclusion of the Lemma.
Let p ∈ Crit(fγ); then γp ∈ P(Hδ) for all δ ∈]0, δ0] if δ0 is small enough, and
Lemma 3.4 says that µ(γp) = µ(γ) + ind(p; fγ). If p̃ is a critical point of H in
Symplectic homology for autonomous Hamiltonians 16
W we denote by γ
∈ P(H) the corresponding constant orbit. Our goal is to
describe the boundary points as δ → 0 of
MA]0,δ0[(γp, γq;H, {fγ}, J) :=
0<δ<δ0
{δ} ×MA(γp, γq;Hδ, J), (28)
µ(γp)− µ(γq) + 2〈c1(TW ), A〉 = 1,
where
γ, γ ∈ P≤αλ , p ∈ Crit(fγ), q ∈ Crit(fγ), A ∈ H2(W ;Z), J ∈ J
γ ∈ P≤αλ , p ∈ Crit(fγ), q ∈ Crit(H), A ∈ H2(W ;Z), J ∈ J .
Our description is very similar to that of [1] within the setting of contact
homology. We fix J ∈ J , γ, γ ∈ P≤αλ and q̃ ∈ Crit(H). We define two Morse-
Bott spaces of Floer trajectories M̂A(Sγ , Sγ ;H, J) and M̂A(Sγ , q̃;H, J) as
follows.
For γ, γ ∈ P≤αλ we denote by M̂A(Sγ , Sγ ;H, J) the set of solutions u :
R× S1 → Ŵ of the Floer equation (8) subject to the asymptotic conditions
u(s, θ) = γH(θ), lim
u(s, θ) = γ
(θ), lim
∂su = 0 (29)
uniformly in θ, with
γH ∈ Sγ , γH ∈ Sγ (30)
[σγ#u] = [σγ#A].
It is implicit in the above definition that the orbits γH and γH may vary for
different elements of M̂A(Sγ , Sγ ;H, J).
For γ ∈ P≤αλ and q̃ ∈ Crit(H) we denote by M̂A(Sγ , q̃;H, J) the set of
solutions u : R× S1 → Ŵ of the Floer equation (8) subject to the asymptotic
conditions
u(s, θ) = γH(θ), lim
u(s, θ) = q̃, lim
∂su = 0 (31)
uniformly in θ, with
γH ∈ Sγ (32)
[σγ#u] = A.
Again, the orbit γH may vary for different elements of M̂A(Sγ , q̃;H, J).
Symplectic homology for autonomous Hamiltonians 17
If γ 6= γ or A 6= 0, the additive group R acts freely on M̂A(Sγ , Sγ ;H, J)
and M̂A(Sγ , q̃;H, J) by s0 · u(·, ·) := u(s0 + ·, ·). We define the Morse-Bott
moduli spaces of Floer trajectories by
MA(Sγ , Sγ ;H, J) := M̂A(Sγ , Sγ ;H, J)/R
MA(Sγ , q̃;H, J) := M̂A(Sγ , q̃;H, J)/R.
If γ = γ and A = 0, the space M̂0(Sγ , Sγ ;H, J) is diffeomorphic to Sγ , con-
sists of constant cylinders (i.e. independent of s) and the R action is trivial.
In this case, we define the Morse-Bott moduli spaces by M0(Sγ , Sγ ;H, J) :=
M̂0(Sγ , Sγ ;H, J). We have natural evaluation maps
ev : MA(Sγ , Sγ ;H, J) → Sγ , ev : MA(Sγ , Sγ ;H, J) → Sγ
ev : MA(Sγ , q̃;H, J) → Sγ
defined by
ev([u]) := lim
u(s, ·), ev([u]) := lim
u(s, ·).
In the statement of the next result we denote by J ′ the set of almost complex
structures J ∈ J which are independent of θ ∈ S1.
Proposition 3.5. (i) Given H ∈ H′, let J ′(H) ⊂ J ′ be the (nonempty and
open) set of almost complex structures J such that, for any x ∈ Ŵ located
on a simple 1-periodic orbit of XH , we have
[XH , JXH ](x) 6= 0 and [XH , JXH ](x) /∈ 〈XH , JXH〉. (33)
There exists a set of second category J ′reg(H) ⊂ J ′(H) consisting of almost
complex structures J that are regular for all u ∈ M̂A(Sγ , Sγ ;H, J) or u ∈
M̂A(Sγ , q̃;H, J) with γ or γ being a simple orbit, and such that Jξ = ξ,
= Rλ outside a fixed open neighbourhood of the nonconstant periodic
orbits of XH .
(ii) Given H ∈ H′, there exists a set of second category Jreg(H) ⊂ J con-
sisting of regular almost complex structures J which, outside a fixed open
neighbourhood of the nonconstant periodic orbits of XH , are independent
of θ and satisfy Jξ = ξ, J ∂
= Rλ.
In each of the previous cases the relevant moduli spaces MA(Sγ , Sγ ;H, J),
MA(Sγ , q̃;H, J) are smooth manifolds of dimension
dim MA(Sγ , Sγ ;H, J) = µ(γ)− µ(γ) + 2〈c1(TW ), A〉,
dim MA(Sγ , q̃;H, J) = µ(γ)− µ(γeq) + 2〈c1(TW ), A〉,
and the evaluation maps ev, ev are smooth.
Symplectic homology for autonomous Hamiltonians 18
The proof of this statement is given in Section 4. Unless the contrary is
explicitly mentioned, all statements in this section hold both for J ∈ Jreg(H)
or J ∈ J ′reg(H), provided one considers moduli spaces with at least one simple
asymptotic orbit in the latter case.
Let now J ∈ Jreg(H) and fix for each γ ∈ P≤αλ a metric on Sγ such that
Rλ has length one. Let Freg(H, J) be the set of regular Morse functions,
consisting of families {fγ}, γ ∈ P≤αλ of perfect Morse functions fγ : Sγ → R
such that all the maps ev are transverse to the unstable manifolds Wu(p),
p ∈ Crit(fγ), all the maps ev are transverse to the stable manifolds W s(p),
p ∈ Crit(fγ) and all pairs
(ev, ev) : MA(Sγ , Sγ ;H, J) → Sγ × Sγ ,
(ev, ev) : MA1(Sγ , Sγ1 ;H, J) ev ×ev MA2(Sγ1 , Sγ ;H, J) → Sγ × Sγ (34)
are transverse to productsWu(p)×W s(q), p ∈ Crit(fγ), q ∈ Crit(fγ). Here and
in the sequel the unstable and stable manifolds are understood with respect to
∇fγ . Denote by C∞p (Sγ ,R) the set of perfect Morse functions on Sγ .
Lemma 3.6. The set Freg(H, J) is of the second Baire category in the space∏
C∞p (Sγ ,R).
Proof. The first two transversality conditions on ev, ev are satisfied if and only
if the maximum of each function fγ is a regular value of all the evaluation maps
ev having Sγ as target space, and if the minimum of each fγ is a regular value
of all the evaluation maps ev mapping to Sγ . The third transversality condition
requires in addition that each pair (M,m) ∈ Sγ × Sγ , with M the maximum of
fγ and m the minimum of fγ , is a regular value of (ev, ev).
By Sard’s theorem the minimum and maximum of each fγ can be chosen
inside a set of second category in Sγ . The conclusion follows.
Let now J ∈ Jreg(H) and {fγ} ∈ Freg(H, J). For p ∈ Crit(fγ) we denote
the Morse index by
ind(p) := dim Wu(p;∇fγ).
Let γ, γ ∈ P≤αλ and p ∈ Crit(fγ), q ∈ Crit(fγ). For m ≥ 0 we denote by
MAm(p, q;H, {fγ}, J) (35)
the union for γ1, . . . , γm−1 ∈ P≤αλ and A1+ . . .+Am = A of the fibered products
Wu(p)×ev (MA1(Sγ , Sγ1)×R+)ϕfγ1◦ev×ev (M
A2(Sγ1 , Sγ2)×R+)ϕfγ2◦ev×ev
. . . ϕfγm−1 ◦ev
×ev MAm(Sγm−1 , Sγ)ev×W s(q),
Symplectic homology for autonomous Hamiltonians 19
with the convention γ0 = γ. This is well defined as a smooth manifold of
dimension
dim MAm(p, q;H, {fγ}, J)
= ind(p)− 1 + (dim MA1(Sγ , Sγ1) + 1)− 1
+(dim MA2(Sγ1 , Sγ2) + 1)− 1 + ...
+dim MAm(Sγm−1 , Sγ)− 1 + (1− ind(q))
= µ(γ) + ind(p)− µ(γ)− ind(q) + 2〈c1(TW ), A1 + ...+Am〉 − 1
= µ(γp)− µ(γq) + 2〈c1(TW ), A〉 − 1.
The last equality follows from Lemma 3.4. Note that MA0 (p, q;H, {fγ}, J) is
naturally a submanifold of MA(Sγ , Sγ ;H, J). We denote
MA(p, q;H, {fγ}, J) :=
MAm(p, q;H, {fγ}, J)
and we call this the moduli space of Morse-Bott broken trajectories,
whereasMAm(p, q;H, {fγ}, J) is called the moduli space of Morse-Bott bro-
ken trajectories with m sublevels (see also Definition 4.1 and Figure 4).
Similarly, given γ ∈ P≤αλ , p ∈ Crit(fγ), q̃ ∈ Crit(H), we define mod-
uli spaces of Morse-Bott broken trajectories MAm(p, q̃;H, {fγ}, J), m ≥ 0 and
MA(p, q̃;H, {fγ}, J) by replacing the last term MAm(Sγm−1 , Sγ)ev×W s(q) in
the definition (35) with MAm(Sγm−1 , q̃;H, J). This is again well defined as a
smooth manifold of dimension
dim MA(p, q̃;H, {fγ}, J) = µ(γp)− µ(γeq) + 2〈c1(TW ), A〉 − 1.
Again, MA0 (p, q̃;H, {fγ}, J) is naturally a submanifold of MA(Sγ , q̃;H, J).
The significance of the above moduli spaces of broken Morse-Bott trajec-
tories is explained by the following theorem, which describes the boundary of
]0,δ0[
(γp, γq;H, {fγ}, J) in (28) as δ → 0.
Theorem 3.7 (Correspondence Theorem). Let H ∈ H′ be fixed and let
α := limt→∞ e
−tH(p, t) be the maximal slope of H. Let J ∈ Jreg(H) and
{fγ} ∈ Freg(H, J). There exists
δ1 := δ1(H, J) ∈ ]0, δ0[
such that, for any
γ, γ ∈ P≤αλ , p ∈ Crit(fγ), q ∈ Crit(fγ),
γ ∈ P≤αλ , p ∈ Crit(fγ), q ∈ Crit(H),
and any A ∈ H2(W ;Z) with
µ(γp)− µ(γq) + 2〈c1(TW ), A〉 = 1,
the following hold:
Symplectic homology for autonomous Hamiltonians 20
(i) J is regular for MA(γp, γq;Hδ, J) for all δ ∈]0, δ1[;
(ii) the space MA
]0,δ1[
(γp, γq;H, {fγ}, J) is a 1-dimensional manifold having a
finite number of components which are graphs over ]0, δ1[, i.e. the natural
projection MA
]0,δ1[
(γp, γq;H, {fγ}, J) →]0, δ1[ is a submersion;
(iii) there is a bijective correspondence between points
[u] ∈ MA(p, q;H, {fγ}, J)
and connected components of MA]0,δ1[(γp, γq;H, {fγ}, J).
The proof of this statement, including a discussion of gluing and compactness
for Morse-Bott moduli spaces, is given in Section 4.
We assume in the remainder of this section that the conclusions of Theo-
rem 3.7 are satisfied. For each [u] ∈ MA(p, q;H, {fγ}, J) the sign ǫ(uδ) is con-
stant on the corresponding connected component C[u] for continuity reasons.
We define a sign ǭ(u) by
ǭ(u) := ǫ(uδ), δ ∈]0, δ1[, (δ, [uδ]) ∈ C[u]. (36)
We define the Morse-Bott chain groups by
BCa∗ (H) :=
i−1(a),≤α
Λω〈γmin , γMax 〉, a 6= 0, (37)
BC0∗(H) :=
ep∈Crit(H)
Λω〈p̃〉 ⊕
i−1(0),≤α
Λω〈γmin , γMax 〉. (38)
where α := limt→∞ e
−tH(p, t) and P i
−1(0),≤α
λ = P
λ ∩ P
i−1(0)
λ . The grading is
defined by
|eAp̃| := ind(p̃;−H)− n− 2〈c1(TW ), A〉,
|eAγmin | := µ(γ) + 1− 2〈c1(TW ), A〉,
|eAγMax | := µ(γ)− 2〈c1(TW ), A〉.
We define the Morse-Bott differential
∂ : BCa∗ (H) → BCa∗−1(H)
∂p̃ :=
eq∈Crit(H)
|ep|−|eq|=1
[u]∈M0(ep,eq;H,{fγ},J)
ǭ(u)q̃, (39)
∂γp :=
eq∈Crit(H)
|γp|−|e
eq|=1
[u]∈MA(γp,eq;H,{fγ},J)
ǭ(u)eAq̃ (40)
,q∈Crit(fγ)
|γp|−|e
[u]∈MA(γp,γ
;H,{fγ},J)
ǭ(u)eAγ
, p ∈ Crit(fγ).
Symplectic homology for autonomous Hamiltonians 21
The sums (39) and (40) clearly involve only periodic orbits in the same free
homotopy class as that of p̃ or γp respectively.
Remark 3.8. Since H is C2-small, the moduli spaces MA(p̃, q̃;Hδ, J), p̃, q̃ ∈
Crit(H) of expected dimension ind(p̃;−H)−ind(q̃;−H)+2〈c1(TW ), A)〉−1 = 0
are independent of δ and consist exclusively of gradient trajectories of H in
W [17, Theorem 6.1](see also [25, Theorem 7.3]). As a consequence, these
moduli spaces are empty whenever A 6= 0.
We have, following directly from the definitions, an obvious isomorphism of
free Λω-modules
SCa∗ (Hδ) ≃ BCa∗ (H), δ ∈]0, δ1[.
It follows now from Theorem 3.7 and the definition (36) of signs in the Morse-
Bott complex that the corresponding differentials, defined by (14) and (39-40),
also coincide. Here we use the fact that the Hamiltonian action functional
decreases along Floer trajectories, hence the differential (14) applied to elements
p̃ ∈ Crit(H) does not involve nonconstant elements of P(Hδ) and reduces to (39)
by Remark 3.8. As a consequence, we have
H∗(BC
∗ (H), ∂) = SH
∗ (Hδ, J).
We shall construct in Section 4.4 a system of coherent orientations on the
Morse-Bott moduli spaces
MA(Sγ , Sγ ;H, J), MA(Sγ , q̃;H, J)
whenever γ, γ ∈ P≤αλ are good orbits. This in turn determines signs ǫ(u) via an
orientation rule for fiber products (see (87)).
Proposition 3.9. Assume dim MA(p, q;H, {fγ}, J) = 0. The bijective corre-
spondence between elements [u] ∈ MAm(p, q;H, {fγ}, J), m ≥ 1 and elements of
[uδ] ∈ MA(γp, γq;Hδ, J) given by Theorem 3.7 changes the signs by the rule
ǫ(u) = (−1)m−1ǫ(uδ).
Moreover, if m = 0 then u = uδ and ǫ(u) = ǫ(uδ), p is a minimum and q
is a maximum, the moduli space MA0 (p, q;H, {fγ}, J) consists of the two gra-
dient lines running from p to q and their signs are different if and only if the
underlying orbit is good.
In view of (36) and (39–40), this identification of signs between ǫ(u) and ǭ(u)
allows to define the Morse-Bott differential exclusively in terms of Morse-Bott
data.
4 Morse-Bott moduli spaces
The structure of this section is as follows. We give in §4.1 the proof of Proposi-
tion 3.5, whereas Theorem 3.7 is proved in §4.2–§4.3, which treat compactness
and gluing and correspond to assertions (i-ii) and (iii) respectively. Finally §4.4
contains a full discussion of orientation issues and the proof of Proposition 3.9.
Symplectic homology for autonomous Hamiltonians 22
4.1 Transversality
Proof of Proposition 3.5. We first prove (ii). Let J ℓ ⊂ J be the space of
admissible almost complex structures of class Cℓ, ℓ ≥ 1, and let J ℓ(H) ⊂ J ℓ be
the set of almost complex structures J which, outside a fixed neighbourhood of
the nonconstant periodic orbits of XH , are independent of θ and satisfy Jξ = ξ,
= Rλ. By a standard trick of Taubes [15, Theorem 5.1] it is enough to show
that there exists an open and dense set J ℓreg(H) ⊂ J ℓ(H) consisting of regular
elements. We define the universal moduli spaces
MA(Sγ , Sγ ;H,J ℓ(H)) = {(u, J) | J ∈ J ℓ(H), u ∈ MA(Sγ , Sγ ;H, J)}
MA(Sγ , q̃;H,J ℓ(H)) = {(u, J) | J ∈ J ℓ(H), u ∈ MA(Sγ , q̃;H, J)}.
The main point is to show that these universal moduli spaces are Banach man-
ifolds. Then the sets J ℓreg(H) consist of the regular values of the natural pro-
jections from the universal moduli spaces to J ℓ(H). We only treat the case of
MA(Sγ , Sγ ;H,J ℓ(H)) since the second case is entirely similar, and we assume
without loss of generality that γ 6= γ. This universal moduli space is the zero
set of a distinguished section of a Banach vector bundle E → BA×J ℓ(H) which
we now define.
Let p > 2 and d > 0. Let BA = B1,p,d(Sγ , Sγ , A;H) be the space of proper
maps u : R× S1 → Ŵ which are locally in W 1,p and satisfy
(i) the map u converges uniformly in θ as s→ ±∞ to γ(·+ θ0), respectively
γ(· + θ0), for some θ0, θ0 ∈ S1, and represents the homology class A ∈
H2(W ;Z);
(ii) there exist tubular neighbourhoods U and U of γ and γ respectively, to-
gether with parametrizations Ψ : U → S1×R2n−1 and Ψ : U → S1×R2n−1
such that
Ψ ◦ γ(θ) = {θ} × {0}, Ψ ◦ γ(θ) = {θ} × {0},
Ψ ◦ γ(θ + θ0)−Ψ ◦ u(s, θ) ∈ W 1,p(]−∞,−s0], ed|s|ds dθ),
Ψ ◦ γ(θ + θ0)−Ψ ◦ u(s, θ) ∈ W 1,p([s0,∞[, ed|s|ds dθ),
for some s0 > 0 sufficiently large.
Then BA is a Banach manifold and, for d/p strictly smaller than the constant
r in Proposition A.1, it contains the moduli spaces MA(Sγ , Sγ ;H, J) for all
J ∈ J ℓ. Let E → BA ×J ℓ(H) be the Banach vector bundle with fiber E(u,J) =
Lp(R×S1, u∗TŴ ; ed|s|ds dθ). Let ∂̄H : BA×J ℓ(H) → E be the section defined
∂̄H(u, J) := ∂su+ Jθ(∂θu−XH). (41)
Symplectic homology for autonomous Hamiltonians 23
Then MA(Sγ , Sγ ;H,J ℓ(H)) = ∂̄−1H (0) and it remains to show that ∂̄H is
transverse to the zero section. This means that the vertical differential
D∂̄H(u, J) : TuBA × TJJ ℓ(H) → E(u,J)
is surjective for all (u, J) ∈ ∂̄−1H (0). We have
TuBA =W 1,p(R× S1, u∗TŴ ; ed|s|ds dθ)⊕ V ⊕ V ,
where V , V are the one-dimensional real vector spaces generated by two sections
of u∗TŴ of the form (1 − β(s, θ))XH(γ(θ)) and β(s, θ)XH(γ(θ)) respectively,
with β(s, θ) = β(s) a smooth cutoff function which vanishes for s ≤ 0 and is
equal to 1 for s ≥ 1. The space TJJ ℓ(H) consists of matrix valued functions
Y : S1 → End(TŴ ) of class Cℓ satisfying the conditions
JθYθ + YθJθ = 0, ω̂(Yθv, w) + ω̂(v, Yθw) = 0, ∀v, w ∈ TŴ , (42)
and such that, outside fixed neighbourhoods of the nonconstant periodic orbits
of XH , they are independent of θ and have the form
with respect
to the splitting ξ ⊕ Span(Rλ, ∂∂t ). The operator D∂̄H(u, J) can be written
D∂̄H(u, J) · (ζ, Y ) = Duζ + Yθ(u)(∂θu−XH(u)).
Du : W
1,p(R× S1, u∗TŴ ; ed|s|ds dθ)⊕ V ⊕ V → Lp(R× S1, u∗TŴ ; ed|s|ds dθ)
is the linearization of the Cauchy-Riemann operator associated to the pair (H, J)
and is explicitly given by formula (12). It is proved in [5, Proposition 4] that
Du is a Fredholm operator. It is at this point that the exponential weight
plays a crucial role, due to the degeneracy of the asymptotic orbits. As a
consequence the range of D∂̄H(u, J) is closed and we are left to prove that it
is also dense. Let q > 1 be such that 1/p + 1/q = 1. We show that every
η ∈ Lq(R× S1, u∗TŴ ; ed|s|ds dθ) satisfying
〈η,Duζ〉ed|s|ds dθ = 0,
〈η, Yθ(u)(∂θu−XH(u))〉ed|s|ds dθ = 0
for all ζ and Y vanishes. The first equation implies, by elliptic regularity, that η
is of class Cℓ and has the unique continuation property. Assume by contradiction
that η does not vanish. Then the set {(s, θ) : η(s, θ) 6= 0} is open and dense.
On the other hand, it is proved in [15, Theorem 4.3] that the set
R(u) := {(s, θ) : ∂su(s, θ) 6= 0, u(s, θ) 6= γ(θ), γ(θ), u(s, θ) /∈ u(R \ {s}, θ)}
of regular points of u is open and dense (although nondegeneracy of the asymp-
totic orbits is a standing assumption in [15], it does not play any role in the
Symplectic homology for autonomous Hamiltonians 24
proof of this result). Let z0 = (s0, θ0) be a point in R(u) with η(z0) 6= 0 and
u(z0) belonging to the fixed open neighbourhood of γ (such a point exists since
we have assumed γ 6= γ). One can choose a matrix Yθ0(u(z0)) satisfying (42)
such that
〈η(z0), Yθ0(u(z0))J(u(z0))∂su(z0)〉 6= 0.
Because z0 is a regular point we can choose a time-dependent cutoff function
ρ : S1 × Ŵ → [0, 1] supported near (θ0, u(z0)) such that Y := ρYθ0(u(z0))
satisfies ∫
〈η, Yθ(u)(∂θu−XH(u))〉ed|s|ds dθ 6= 0.
This contradicts (43) and shows thatD∂̄H(u, J) is surjective, hence the universal
moduli space MA(Sγ , Sγ ;H,J ℓ(H)) is a Banach manifold as claimed.
We now prove (i). The set J ′(H) is obviously open. The fact that it is
nonempty can be seen as follows. The space S1 ×R2 admits the “skating ring”
contact form α = sin θdx− cos θdy, (θ, x, y) ∈ S1×R2 for which ∂
∈ ξ = kerα.
If J denotes the almost complex structure on ξ satisfying J ∂
= cos θ ∂
sin θ ∂
, then [ ∂
, J ∂
] 6= 0 and [ ∂
, J ∂
] /∈ ξ = 〈 ∂
, J ∂
〉. This simple model
can be adapted to our situation as follows. We can symplectically trivialize a
neighbourhood of the simple orbit γ as S1 × R2n−1 ∋ (θ, t, q2, p2, . . . , qn, pn)
with the standard symplectic form dθ ∧ dt + dq2 ∧ dp2 + . . . + dqn ∧ dpn, so
that XH corresponds to
. Let J be a compatible almost complex structure
such that J ∂
+ cos θ ∂
+ sin θ ∂
. Since ∂
and ∂
commute we have
[XH , JXH ] = [
, cos θ ∂
+ sin θ ∂
] 6= 0 and [XH , JXH ] /∈ 〈XH , JXH〉, so
that J ∈ J ′(H).
Let J ′ℓ ⊂ J ′ be the space of admissible almost complex structures of class
Cℓ, ℓ ≥ 1 which are independent of θ ∈ S1, and let J ′ℓ(H) ⊂ J ′ℓ be the space
of almost complex structures J which, outside a fixed neighbourhood of the
nonconstant periodic orbits of XH , satisfy Jξ = ξ, J
= Rλ. It is enough
to show that there exists an open and dense set J ′ℓreg(H) ⊂ J ′ℓ(H) consisting
of elements which are regular for Floer trajectories with one nontrivial simple
asymptote.
We have J ′ℓ ⊂ J ℓ and the main point is to show that the corresponding
universal moduli spaces MA(Sγ , Sγ ;H,J ′ℓ(H)) ⊂ MA(Sγ , Sγ ;H,J ℓ(H)) and
MA(Sγ , q̃;H,J ′ℓ(H)) ⊂ MA(Sγ , q̃;H,J ℓ(H)) are Banach manifolds. We again
treat only MA(Sγ , Sγ ;H,J ′ℓ(H)) and assume without loss of generality that
γ is a simple orbit and γ 6= γ. This universal moduli space is the zero set of
the section of the restricted bundle E → BA × J ′ℓ(H) defined by (41), and we
have to show that the vertical differential D∂̄H(u, J) : TuBA × TJJ ′ℓ(H) →
E(u,J) is surjective. Arguing by contradiction, we get an element η ∈ Lq(R ×
S1, u∗TŴ ; ed|s|ds dθ) of class Cℓ which does not vanish on an open and dense
subset of R×S1 and satisfies
〈η, Y (u)(∂θu−XH(u))〉ed|s|ds dθ = 0 for any
Y ∈ TJJ ′ℓ(H). The main difference with respect to (ii) is that TJJ ′ℓ(H) ⊂
TJJ ℓ(H) consists of elements Y ∈ End(TŴ ) which are independent of θ.
Symplectic homology for autonomous Hamiltonians 25
I(u) :=
(s, θ) : ∂su(s, θ) 6= 0, u−1(u(s, θ)) = {(s, θ)}
be the set of injective points, and denote IR(u) := I(u)∩ ]R,∞[×S1, R > 0.
The main observation is that our special choice of J ∈ J ′(H) implies that
IR(u) is open and dense in ]R,∞[×S1 for R large enough. This is proved
exactly as in [15, §7], and the main steps are the following. Since γ is simple,
every u as above is simple, i.e. for every integer m > 1 there exists a point
(s, θ) ∈ R × S1 = R × R/Z such that u(s, θ + 1
) 6= u(s, θ). Let U be a
neighbourhood of γ in which [XH , JXH ] 6= 0 and [XH , JXH ] /∈ 〈XH , JXH〉. We
call a point (s, θ) regular if ∂su, ∂θu, XH(u), JXH(u) are linearly independent
at (s, θ), and we denote by R(u) the set of regular points. Then [15, Lemma 7.6]
holds and [15, Lemma 7.7] shows that the set {(s, θ) ∈ R(u) : u(s, θ) ∈ U} is
open and dense in u−1(U). Note that we crucially use here our hypothesis
J ∈ J ′(H), which plays the role of the hypothesis J ∈ Jad(M,ω,X) in [15].
Finally [15, Lemma 7.8] shows that the set of points which are regular and
injective is open and dense in u−1(U), and in particular IR(u) is open and
dense in ]R,∞[×S1 for R large enough.
We can then choose a point z0 = (s0, θ0) ∈ IR(u) such that η(z0) 6= 0 and
a matrix Y (u(z0)) satisfying (42) and 〈η(z0), Y (u(z0))J(u(z0))∂su(z0)〉 6= 0.
Since z0 is an injective point we can choose a cutoff function ρ : Ŵ → R
supported near u(z0) such that Y := ρY (u(z0)) satisfies
〈η, Y (u)(∂θu −
XH(u))〉ed|s|ds dθ 6= 0. This contradiction shows that D∂̄H(u, J) is surjective
and therefore MA(Sγ , Sγ ;H,J ′ℓ(H)) is a Banach manifold as claimed.
The dimension of the moduli space MA(Sγ , Sγ ;H, J), J ∈ Jreg(H) is equal
to ind(Du) − 1. The restriction of the operator Du to the subspace W 1,p(R ×
S1, u∗TŴ ; ed|s|ds dθ) is conjugated to a Cauchy-Riemann operator
Du :W 1,p(R× S1, u∗TŴ ; ds dθ) → Lp(R× S1, u∗TŴ ; ds dθ)
via multiplication by e
|s|. If the asymptotics of Du were nondegenerate, the
Fredholm index of Du would be given by [26]
µRS(γ)− µRS(γ) + 2〈c1(TW ), A〉.
Due to the one-dimensional degeneracy of γ and γ, the actual index of Du is
obtained by a calculation analogous to [5, Proposition 4] (see also Lemma 3.4) :
(µRS(γ)−
)− (µRS(γ) +
) + 2〈c1(TW ), A〉. (44)
We have proved in Lemma 3.4 that µRS(γ) = µ(γ) +
, hence
ind(Du) = ind(Du) + 2 = µ(γ)− µ(γ) + 2〈c1(TW ), A〉+ 1.
Finally note that the evaluation maps ev, ev are well-defined and smooth on
BA. Hence their restrictions to the moduli spaces are smooth as well.
Symplectic homology for autonomous Hamiltonians 26
4.2 Compactness for Morse-Bott trajectories
Definition 4.1. Let H, {fγ} and J be fixed as above, and let p ∈ Crit(fγ),
q ∈ Crit(fγ). The space M̂A(p, q;H, {fγ}, J) of parametrized Morse-Bott
broken trajectories consists of tuples
u = (cm, um, cm−1, um−1, . . . , u1, c0)
such that
(i) ui ∈ M̂Ai(Sγi , Sγi−1 ;H, J), i = 1, . . . ,m with γm = γ, γ0 = γ and A1 +
. . .+Am = A;
(ii) c0 : [−1,+∞[→ Sγ0 , ci : [−Ti/2, Ti/2] → Sγi , i = 1, . . . ,m − 1 and
cm :]−∞, 1] → Sγm satisfy ċi = ∇fγi ◦ ci, i = 0, . . . ,m;
(iii) ev(ui) = ev(ci), ev(ui) = ev(ci−1), i = 1, . . . ,m and c0(+∞) = q,
cm(−∞) = p.
The space MA(p, q;H, {fγ}, J) of unparametrized Morse-Bott broken tra-
jectories consists of equivalence classes
[u] = (cm, [um], cm−1, [um−1], . . . , [u1], c0)
such that u ∈ M̂A(p, q;H, {fγ}, J).
Definition 4.2. Let
uk = (cmk,k, umk,k, cmk−1,k, . . . , u1,k, c0,k) ∈ M̂A(pk, qk;H, {fγ}, J)
with k = 1, . . . , ℓ, and satisfying qk = pk−1 for k = 2, . . . , ℓ. We denote p := pℓ,
q := q1. A sequence vn ∈ M̂A(γp, γq;Hδn , J) with δn → 0, n → ∞ is said to
converge to u := (uℓ, . . . ,u1) if there exist shifts (s
i,k) ∈ R, i = 1, . . . ,mk such
vn(·+ sni,k , ·) → ui,k, n→ ∞
uniformly on compact sets in R× S1. We write in this case vn → u.
A sequence [ṽn] ∈ MA(γp, γq;Hδn , J) with δn → 0, n → ∞ is said to
converge to [u] ∈ MA(p, q;H, {fγ}, J) if there exist representatives vn and
v such that vn → v (this condition is obviously independent on the choice of
representatives). We write in this case [ṽn] → [u].
We call u a broken Floer trajectory with gradient fragments. We
call each of the uk’s a Floer trajectory with gradient fragments. Each uk
is a level of u and each ui,k is a sublevel of uk.
Definition 4.3. An element
u = (cm, um, cm−1, . . . , u1, c0) ∈ M̂A(p, q;H, {fγ}, J)
with m ≥ 1 is stable if each ui, i = 1, . . . ,m is a nonconstant Floer trajectory
and if each ci, i = 1, . . . ,m − 1 defined on an interval of nonzero length is
Symplectic homology for autonomous Hamiltonians 27
nonconstant. An element u = (c0) ∈ MA(p, q;H, {fγ}, J) is stable if p 6= q.
A broken Floer trajectory with gradient fragments u = (uℓ, . . . ,u1) is stable if
each uk, k = 1, . . . , ℓ is stable.
Remark 4.4. A convergent sequence vn of nonconstant Floer trajectories has
a stable limit u which is unique up to shifts on the ci,k and ui,k.
The proofs of the next two lemmas use the asymptotic estimates proved
in the Appendix. The relevant notation is introduced at the beginning of the
Appendix, and we briefly recall it here for the reader’s convenience. For each
γ ∈ P(H) we choose coordinates (ϑ, z) ∈ S1 × R2n−1 parametrizing a tubular
neighbourhood of γ, such that ϑ ◦ γ(θ) = θ and z ◦ γ(θ) = 0. Given a smooth
function fγ : Sγ → R, we denote by ϕ
s the gradient flow of fγ with respect to
the natural metric on S1.
In a neighbourhood of γ ∈ P(H) the Floer equation ∂su+ J∂θu− JXH = 0
becomes ∂sZ+J∂θZ+J
−JXH = 0, where Z(s, θ) := (ϑ◦u(s, θ)−θ, z◦u(s, θ)).
Since XH =
on {z = 0} this can be rewritten as ∂sZ + J∂θZ + Sz = 0 for
some matrix-valued function S = S(ϑ, z). The matrix S∞(θ) := S(θ, 0) is
symmetric. Let A∞ : H
k(S1,R2n) → Hk−1(S1,R2n) be the operator defined by
A∞Z := J
Z + S∞(θ)z. The kernel of A∞ has dimension one and is spanned
by the constant vector e1 := (1, 0, . . . , 0). We denote by Q∞ the orthogonal
projection onto (ker A∞)
⊥ and we set P∞ := 1l−Q∞.
Lemma 4.5. Let vn ∈ M̂A(γp, γq;Hδn , J) with δn → 0, n→ ∞ and s
1 < s
shifts such that vn(·+ sn1 , ·) → u1, vn(·+ sn2 , ·) → u2 uniformly on compact sets,
with u1 ∈ M̂A1(Sγ1 , Sγ ;H, J) and u2 ∈ M̂A2(Sγ , Sγ2 ;H, J). Any two sequences
of shifts sn1 < s
+ < s
− < s
2 satisfying s
+ − sn1 → ∞, sn2 − sn− → ∞ and
+ − sn1 ) → 0, δn(sn2 − sn−) → 0, (45)
have the property that
vn(·+ sn+, ·) → ev(u1), vn(·+ sn−, ·) → ev(u2)
uniformly on compact sets.
Proof. We claim that there exists K > 0 such that vn([s
1 +K, s
2 −K] × S1)
is contained in a given small neighbourhood of Sγ . If that was not the case, we
could find a sequenceKn → ∞ and a sequence (sn, θn) ∈ [sn1 +Kn, sn2 −Kn]×S1
such that dist(vn(sn, θn), Sγ) is bounded away from zero. Up to a subsequence,
vn(· + sn, ·) converges to some Floer trajectory v which must be nonconstant.
On the other hand, for any s ∈ R and for any K > 0 we have, for n large
enough,
vn(s+ s
2 −K, ·)
≤AHδn
vn(s+ sn, ·)
≤AHδn
vn(s+ s
1 +K, ·)
and in the limit AH(u2(s − K, ·)) ≤ AH(v(s, ·)) ≤ AH(u1(s + K, ·)). We let
K go to infinity and obtain AH(γ) ≤ AH(v(s, ·)) ≤ AH(γ). This holds for all
Symplectic homology for autonomous Hamiltonians 28
s ∈ R and therefore the cylinder v is constant over some element of P(H), a
contradiction which proves the claim.
By (98) in the proof of Proposition A.3 applied to vn on [s
1 + K, s
2 −K]
we get
|Q∞vn(s, θ)| ≤ Cmax(‖Q∞vn(sn1 +K)‖, ‖Q∞vn(sn2 −K)‖). (46)
Let γ+ be the limit in Sγ of vn(s
+, ·), and let In(ǫ) := [sn+(ǫ), sn+] ⊂ [sn1 +
K, sn+] be the maximal subinterval containing s
+ such that P∞vn(s), s ∈ In(ǫ)
is at distance at least ǫ from the critical points of fγ , except maybe γ+ (if the
latter is a critical point). By the second part of Proposition A.3 applied to vn
on In(ǫ) we obtain
|ϑ ◦ vn(s, θ)− θ−ϕ
δs (θ0)| ≤ Cmax(‖Q∞vn(s
+(ǫ))‖, ‖Q∞vn(sn+)‖)eMδn(s
Since δn(s
+ − sn1 ) → 0 and taking into account (46) we get
|ϑ ◦ vn(s, θ)− ϑ ◦ γ+(θ)| ≤ C1 max(‖Q∞vn(sn1 +K)‖, ‖Q∞vn(sn2 −K)‖). (47)
For K large enough the right hand term becomes so small that the distance
between P∞vn(s), s ∈ In(ǫ) and the critical points of fγ , except possibly γ+, is
strictly bigger than ǫ, hence In(ǫ) = [s
1 +K, s
+] by maximality (this holds for
K large enough). Applying (47) to s = sn1 +K we obtain
|ϑ ◦ vn(sn1 +K, θ)− ϑ ◦ γ+(θ)| ≤ C1 max(‖Q∞vn(sn1 +K)‖, ‖Q∞vn(sn2 −K)‖).
Passing to the limit in the above inequality we obtain
|ϑ ◦ u1(K, θ)− ϑ ◦ γ+(θ)| ≤ C1 max(‖Q∞u1(K)‖, ‖Q∞u2(−K)‖).
Letting K → ∞ we obtain ev(u1) = γ+. That this implies uniform convergence
on compact sets to the constant cylinder over ev(u1) can be seen in two ways:
either one notices that the above estimates hold uniformly when sn+ is replaced
with sn++K+, whereK+ is a bounded constant, or one uses the fact that a Floer
trajectory passing through a periodic orbit is necessarily a constant cylinder, by
unique continuation applied to the infinite jet at that orbit [21, Theorem 2.3.2].
A similar argument proves the assertion involving ev(u2).
Lemma 4.6. Let vn ∈ M̂A(p, q;Hδn , J) with δn → 0, n→ ∞. Assume we are
given two sequences of shifts sn1 < s
2 such that vn(· + sni , ·), i = 1, 2 converge
uniformly on compact sets to constant cylinders uγi over orbits γi belonging to
the same family Sγ . Then there exists a (possibly broken) gradient trajectory
of fγ starting at γ1 and ending at γ2. Moreover, the length of the gradient
trajectory is T = limn→∞ δn(s
2 − sn1 ).
Proof. We claim that, for n large enough, vn([s
1 , s
2 ]×S1) is entirely contained
in an arbitrarily small neighbourhood of γ. By contradiction, if this fails we
can reparametrize the sequence vn so that it converges to a nonconstant Floer
Symplectic homology for autonomous Hamiltonians 29
trajectory v ∈ MB(γ′, γ′;H, J) for some class B ∈ H2(M ;Z) and some γ′, γ′ ∈
P(H) such that their actions satisfy AH(γ) ≤ AH(γ′) < AH(γ′) ≤ AH(γ),
which is impossible.
We first assume that γ1 is not a critical point of fγ . Let ǫ > 0 be fixed and
denote by In(ǫ) = [s
1 , s
2 (ǫ)] ⊂ [sn1 , sn2 ] the maximal subinterval containing sn1
such that the distance between P∞vn(s), s ∈ In(ǫ) and Crit(fγ) is at least ǫ.
We can apply Proposition A.3 to vn and In(ǫ). In particular, for some sequence
θn ∈ S1 we have
(s,θ)∈In(ǫ)×S1
|ϑ ◦ vn(s, θ)− θ − ϕδnfγs (θn)| = 0.
Since vn(s
1 , ·) converges to γ1, we also have
(s,θ)∈In(ǫ)×S1
|ϑ ◦ vn(s, θ)− θ − ϕ
δn(s−s
(γ1)| = 0. (48)
Modulo passing to a subsequence we know that vn(s
2 (ǫ), ·) converges, which
together with (48) implies that δn(s
2 (ǫ)−sn1 ) converges to T (ǫ) ∈ R+. This holds
for each ǫ > 0 and, since sn2 (ǫ) < s
′) if ǫ > ǫ′, the limit limǫ→0 T (ǫ) = T ∈ R+
exists. Then ϕ
s (γ1), s ∈ [0, T ] is a gradient trajectory starting at γ1.
If T is finite then this trajectory, and therefore vn(In(ǫ)×S1) stay at a fixed
distance from Crit(fγ) for n large enough. Hence In(ǫ) = In for ǫ sufficiently
small and we are done. If T is infinite and the limit lims→∞ ϕ
s (γ1) is equal
to γ2, we are also done. Otherwise we are in the next case, with shifts s̃
limǫ→0 s
2 (ǫ) and s̃
2 := s
We now assume that γ1 is a critical point of fγ and γ1 6= γ2. Given ǫ > 0 we
denote by In(ǫ) = [s
1 , s
2 (ǫ)] ⊂ [sn1 , sn2 ] the maximal subinterval containing sn1
such that the distance between P∞vn(s), s ∈ In(ǫ) and Crit(fγ)\{γ1} is at least
ǫ. For ǫ > 0 small enough the loops P∞vn(s
2 (ǫ)) are at a distance bigger than ǫ
from γ1 and, up to a subsequence, vn(s
2 (ǫ), ·) converges to some γ̃2 ∈ Sγ which
is not a critical point of fγ . The same argument as in the previous case applied
“backwards” to the shifts sn1 < s
2 (ǫ) produces a negative gradient trajectory
running from γ̃2 to some critical point γ̃1. By definition of In(ǫ), we must have
γ̃1 = γ1 and we thus obtain a gradient trajectory from γ1 to γ̃2. We are now in
the first case with shifts s̃n1 := s
2 (ǫ) and s̃
2 := s
We successively apply the above two cases in order to produce a broken gra-
dient trajectory from γ1 to γ2. This is a finite process since a broken trajectory
has a finite number of nonconstant fragments.
Proposition 4.7. Let vn ∈ M̂A(p, q;Hδn , J) with δn → 0, n → ∞. There
exists a broken Floer trajectory with gradient fragments u and a subsequence
(still denoted by vn) such that vn → u.
Proof. The energy E(vn) := EJ,Hδn (vn) defined in (11) satisfies
E(vn) = −
Hδn(θ, γp(θ)) dθ +
Hδn(θ, γq(θ)) dθ.
Symplectic homology for autonomous Hamiltonians 30
Since Hδn → H we infer that E(vn) is uniformly bounded.
Floer’s compactness theorem [12, Proposition 3c] applies to our situation and
provides a collection of Floer trajectories ui, i = 1, . . . ,m for the pair (H, J)
together with holomorphic spheres attached to them, as well as shifts (sni ) such
that vn(·+ sni , ·) converges to ui and its associated holomorphic spheres in the
sense of nodal curves. Condition (1) implies symplectic asphericity 〈ω, π2(Ŵ )〉 =
0, therefore holomorphic spheres in (Ŵ , J) are constant and the shifted vn
converge to ui uniformly on compact sets.
Because the action spectrum of ∂W was assumed to be discrete and injective
the trajectories ui connect with each other, in the sense that ev(ui) and ev(ui+1)
belong to the same family of trajectories Sγi , i = 1, . . . ,m−1. Moreover, ev(um)
belongs to Sγ and ev(u1) belongs to Sγ .
By Lemma 4.5 there exist shifts sni,± such that vn(· + sni,+, ·) converges to
the constant cylinder over ev(ui), and vn(· + sni,−, ·) converges to the constant
cylinder over ev(ui). Applying Lemma 4.6 with shifts s
i,+ < s
i−1,−, i = 2, . . . ,m
and n large enough, we obtain broken gradient trajectories ci−1 starting at
ev(ui) and ending at ev(ui−1). Let now s
+ be shifts such that vn(·+sn−, ·) →
p and vn(· + sn+, ·) → q. Applying Lemma 4.6 with shifts sn− < snm,− and with
shifts sn1,+ < s
+ we obtain broken gradient trajectories cm starting at p and
ending at ev(um) and c0 starting at ev(u1) and ending at q. Since all Sγ are
circles of periodic orbits, the broken gradient trajectories ci, i = 0, . . . ,m consist
each of a single fragment.
The construction of a stable broken Floer trajectory with gradient fragments
out of the data ci, ui is straightforward and goes as follows. The collection of
points of the form ev(ui+1), ev(ui) which are critical points of fγi determine a
partition
(cmℓ,ℓ, umℓ,ℓ, cmℓ−1,ℓ, . . . , c1,ℓ, u1,ℓ, c0,ℓ), . . . , (cm1,1, um1,1, . . . , u1,1, c0,1)
of the ordered tuple (cm, um, . . . , c1, u1, c0). Note that the cmk,k and c0,k may
either be missing or be constant and exactly one of c0,k and cmk−1,k−1 is missing.
In such a situation we set cmk,k or c0,k to be a constant trajectory at the relevant
critical point, defined on a semi-infinite interval.
4.3 Gluing for Morse-Bott moduli spaces
We prove in this subsection the assertions (i-ii) of Theorem 3.7. The following
notation was introduced in the previous subsection. For γ ∈ P(H) we choose
coordinates (ϑ, z) ∈ S1 × R2n−1 parametrizing a tubular neighbourhood of γ,
such that ϑ ◦ γ(θ) = θ and z ◦ γ(θ) = 0. Given a smooth function fγ : Sγ → R,
we denote by ϕ
s the gradient flow of fγ with respect to the natural metric on
S1. The orthogonal projection onto the 1-dimensional kernel of the asymptotic
operator at γ ∈ P(H) is denoted by P∞, and we denote Q∞ := 1l− P∞.
Let p > 2, d > 0 and δ > 0. Let BAδ = B
1,p,d
δ (γp, γq, A;H, {fγ}) be the space
of proper maps u : R× S1 → Ŵ which are locally in W 1,p and satisfy
Symplectic homology for autonomous Hamiltonians 31
(i) the map u converges uniformly in θ as s → ±∞ to γ
, respectively γp,
and represents the homology class A ∈ H2(W ;Z);
(ii) there exist tubular neighbourhoods U and U of γ and γ respectively,
parametrized by (ϑ, z) ∈ S1 × R2n−1 such that
ϑ ◦ u(s, θ)− θ − ϕδfγs (θ0) ∈ W 1,p(]−∞,−s0]× S1,R; ed|s|ds dθ),
z ◦ u(s, θ) ∈ W 1,p(]−∞,−s0]× S1,R2n−1; ed|s|ds dθ),
ϑ ◦ u(s, θ)− θ − ϕδfγs (θ0) ∈ W 1,p([s0,∞[×S1,R; ed|s|ds dθ),
z ◦ u(s, θ) ∈ W 1,p([s0,∞[×S1,R2n−1; ed|s|ds dθ),
for some s0 > 0 sufficiently large and some θ0, θ0 ∈ S1 satisfying
ϕfγs (θ0) = p, lim
ϕfγs (θ0) = q. (49)
Then BAδ is a Banach manifold and, for d > 0 sufficiently small, it contains
the moduli spaces MA(γp, γq;Hδ, J) for all J ∈ J (see Proposition A.2 in the
Appendix). Let E → BAδ be the Banach vector bundle with fiber E(u,J) =
Lp(R× S1, u∗TŴ ; ed|s|ds dθ). Let ∂̄Hδ,J : BAδ → E be the section defined by
∂̄Hδ,J(u) := ∂su+ Jθ(∂θu−XHδ ).
Then MA(γp, γq;Hδ, J) = ∂̄
(0). From now on we fix J ∈ Jreg(H). In
order to prove (i) in Theorem 3.7 we need to show that the vertical differential
Du : TuBAδ → Eu defined by (12) is surjective for all u ∈ MA(γp, γq;Hδ, J)
when δ > 0 is sufficiently small and the expected dimension of the moduli space
is zero. We have
TuBAδ =W 1,p(R× S1, u∗TŴ ; ed|s|ds dθ)⊕ V u ⊕ V u,
where V u, V u are real vector spaces of dimension
dim V u = ind(p), dim V u = 1− ind(q).
When their dimension is nonzero V u and V u are respectively generated by two
sections of u∗TŴ of the form
(1− β(s, θ))∇fγ(ϑ ◦ u(s, 0)) and β(s, θ)∇fγ(ϑ ◦ u(s, 0)),
with β(s, θ) = β(s) a smooth cutoff function which vanishes for s ≤ 0 and is
equal to 1 for s ≥ 1. The fact that V u and V u have varying dimensions is a
consequence of condition (49).
We shall prove surjectivity of Du by showing that the elements of the moduli
spaceMA(γp, γq;Hδ, J) can be approximated, for δ > 0 small enough, by gluing
the elements of MA(Sγ , Sγ ;H, J) with fragments of gradient trajectories of the
Morse functions fγ .
Symplectic homology for autonomous Hamiltonians 32
Given a, b ∈ R, a < b we define intervals
I(a, b) =
[a, b] if a, b ∈ R,
]−∞, b] if a = −∞, b ∈ R,
[a,∞[ if a ∈ R, b = ∞.
For b − a > 4 and |ǫ| < 1, we let ha,b,ǫ : R → I(a, b) ⊂ R be a collection of
smooth increasing functions such that ha,b,ǫ(s) = a if s ≤ a − ǫ/2, ha,b,ǫ(s) =
b if s ≥ b + ǫ/2 and ha,b,ǫ(s) := s if a − ǫ/2 + 1 < s < b + ǫ/2 − 1. We
can of course make the family {ha,b,ǫ} depend smoothly on a, b and ǫ. We
define ka,b,ǫ(s) :=
|σ=0h′a−σ,b+σ,ǫ(s). The support of ka,b,ǫ is contained in
[a − ǫ/2, a− ǫ/2 + 1] ∪ [b + ǫ/2 − 1, b + ǫ/2]. We may assume without loss of
generality that h′a,b,ǫ and ka,b,ǫ are uniformly bounded.
Convention. If ǫ = 0 we shall omit it from all subsequent decorations, and we
set ǫ = 0 if a = −∞ or b = +∞.
a, b ∈ R a ∈ R, b = +∞
ka,b,ǫ
ha,b,ǫ
h′a,b,ǫ
ka,b,ǫ
ha,b,ǫ
h′a,b,ǫ
Figure 2: The reparametrization function ha,b,ǫ and its derivatives.
Let γ ∈ Pλ and c : I(a, b) → Sγ ⊂ Ŵ be a fragment of gradient trajectory
for the function fγ , i.e. ċ = ∇fγ ◦ c. We define the corresponding gradient
cylinder
uδ,γ,a,b,ǫ : R× S1 → Sγ ⊂ Ŵ
by the equation
ϑ ◦ uδ,γ,a,b,ǫ(s, θ) = ϑ ◦ c(δh a
(s)) + θ. (50)
Then lim
ϑ ◦ uδ,γ,a,b,ǫ(s, θ) = ϑ ◦ c(a) + θ and lim
ϑ ◦ uδ,γ,a,b,ǫ(s, θ) = ϑ ◦
c(b) + θ.
Symplectic homology for autonomous Hamiltonians 33
For γ ∈ Pλ we define Banach manifolds B1,p,dδ (Sγ , Sγ ; fγ), B
1,p,d
δ (p, Sγ ; fγ),
p ∈ Crit(fγ) and B1,p,dδ (Sγ , q; fγ), q ∈ Crit(fγ) consisting of maps u : R× S1 →
Ŵ which are locally of class W 1,p, whose asymptotics are translates of γ, which
represent the zero homology class and which satisfy the following asymptotic
conditions.
(i) for B1,p,dδ (Sγ , Sγ ; fγ): there exists a neighbourhood U of Sγ together with
a parametrization (ϑ, z) : U → S1 × R2n−1 such that
ϑ ◦ u(s, θ)− θ − θ0 ∈ W 1,p(]−∞,−s0]× S1,R; ed|s|ds dθ),
z ◦ u(s, θ) ∈ W 1,p(]−∞,−s0]× S1,R2n−1; ed|s|ds dθ),
ϑ ◦ u(s, θ)− θ − θ0 ∈ W 1,p([s0,∞[×S1,R; ed|s|ds dθ),
z ◦ u(s, θ) ∈ W 1,p([s0,∞[×S1,R2n−1; ed|s|ds dθ),
for some θ0, θ0 ∈ S1 and some s0 > 0. Moreover, there exists T > 0 such
T (θ0) = θ0; (51)
(ii) for B1,p,dδ (p, Sγ ; fγ): there exists a neighbourhood U of Sγ parametrized
by (ϑ, z) ∈ S1 × R2n−1 such that
ϑ ◦ u(s, θ)− θ − ϕδfγs (θ0) ∈ W 1,p(]−∞,−s0]× S1,R; ed|s|ds dθ),
z ◦ u(s, θ) ∈ W 1,p(]−∞,−s0]× S1,R2n−1; ed|s|ds dθ),
ϑ ◦ u(s, θ)− θ − θ0 ∈ W 1,p([s0,∞[×S1,R; ed|s|ds dθ),
z ◦ u(s, θ) ∈ W 1,p([s0,∞[×S1,R2n−1; ed|s|ds dθ),
for some θ0, θ0 ∈ S1 such that lims→−∞ ϕ
s (θ0) = lims→−∞ ϕ
s (θ0) = p
and some s0 > 0;
(iii) for B1,p,dδ (Sγ , q; fγ): there exists a neighbourhood U of Sγ parametrized
by (ϑ, z) ∈ S1 × R2n−1 such that
ϑ ◦ u(s, θ)− θ − θ0 ∈ W 1,p(]−∞,−s0]× S1,R; ed|s|ds dθ),
z ◦ u(s, θ) ∈ W 1,p(]−∞,−s0]× S1,R2n−1; ed|s|ds dθ),
ϑ ◦ u(s, θ)− θ − ϕδfγs (θ0) ∈ W 1,p([s0,∞[×S1,R; ed|s|ds dθ),
z ◦ u(s, θ) ∈ W 1,p([s0,∞[×S1,R2n−1; ed|s|ds dθ),
for some θ0, θ0 ∈ S1 such that lims→∞ ϕ
s (θ0) = lims→∞ ϕ
s (θ0) = q and
some s0 > 0.
We will designate one of the above three spaces by B′δ. We define evaluation
maps ev and ev on B′δ by
ev(u) = lim
u(s, ·), ev(u) = lim
u(s, ·).
Symplectic homology for autonomous Hamiltonians 34
Any map u = uδ,γ,a,b,ǫ belongs to a suitable space B′δ, depending on a, b being
finite or not. The tangent space TuB′δ has a natural decomposition
TuB′δ =W 1,p,d(R× S1, u∗TŴ )⊕ V
u ⊕ V
u, (52)
where V
u are real vector spaces of dimensions
1 if a ∈ R,
ind(p) if a = −∞, dim V
1 if b ∈ R,
1− ind(q) if b = +∞. (53)
When the dimensions are respectively nonzero the generators of V
u are
sections given as follows.
(i) for B1,p,dδ (Sγ , Sγ ; fγ) the sections are
(1− β(s, θ))XH(γ(θ + θ0)) and β(s, θ)XH(γ(θ + θ0));
(ii) for B1,p,dδ (p, Sγ ; fγ) the sections are
(1− β(s, θ))∇fγ(ϑ ◦ u(s, 0)) and β(s, θ)XH(γ(θ + θ0));
(iii) for B1,p,dδ (Sγ , q; fγ) the sections are
(1− β(s, θ))XH(γ(θ + θ0)) and β(s, θ)∇fγ(ϑ ◦ u(s, 0)).
We recall that β(s, θ) = β(s) is a smooth cutoff function which vanishes for
s ≤ 0 and is equal to 1 for s ≥ 1. The norm on TuB′δ is chosen such that the
norm of the above generators of V
u is equal to 1. Let E → B′δ be the
Banach vector bundle with fiber
Eu = Lp(R× S1, u∗TŴ ; ed|s|ds dθ).
We are interested in the family of sections ∂̄a,b,ǫ := ∂̄H′
a,b,ǫ
,J : B′δ → E , with
H ′a,b,ǫ = H + h
(s)(Hδ −H)
= H + δh′a
(s)ρfγ(ℓγϑ− ℓγθ). (54)
Here we use the definition (25) of Hδ. This is a three-parameter family in case
(i) and a two-parameter family in cases (ii) and (iii). Its main feature is that
∂̄a,b,ǫ(uδ,γ,a,b,ǫ) = 0.
We note that neither of the operators ∂̄H,J and ∂̄Hδ,J defines a section B′δ → E
if a or b is infinite. The vertical differential Du := D
a,b,ǫ
u : TuB′δ → Eu of each
of the sections ∂̄a,b,ǫ is given by formula (12) and is a Fredholm operator whose
index has the following values (see also (44)).
Symplectic homology for autonomous Hamiltonians 35
(i) for B1,p,dδ (Sγ , Sγ ; fγ)
ind(Du) = (µRS(γ)−
)− (µRS(γ) +
) + 2 = 1,
(ii) for B1,p,dδ (p, Sγ ; fγ)
ind(Du) = (µRS(γ)−
)− (µRS(γ) +
) + ind(p) + 1 = ind(p),
(iii) for B1,p,dδ (Sγ , q; fγ)
ind(Du) = (µRS(γ)−
)− (µRS(γ) +
) + 1 + 1− ind(q) = 1− ind(q).
In formulas (ii) and (iii) the asymptotics of the operator obtained by conjugation
with e
|s| do not depend on ind(p), ind(q) because, for δ small, the exponential
weight d
overrides the contribution of the perturbation Hδ −H .
Proposition 4.8. Let u = uδ,γ,a,b,ǫ ∈ B′δ. The operator
Du :W
1,p(R× S1, u∗TŴ ; ed|s|)⊕ V ′u ⊕ V ′u → Lp(R× S1, u∗TŴ ; ed|s|dsdθ)
is surjective for δ > 0 small enough.
Proof. In order to compute Du we choose ∇ to be the Levi-Civita connection
corresponding to a (split) metric given by (dλ + dt ∧ λ)(·, J ·). It is a general
fact that the operator Du can be written in a unitary trivialization of u
∗TŴ as
(Duζ)(s, θ) = ∂sζ + J0∂θζ + S(s, θ)ζ(s, θ),
where J0 is the standard complex structure on R
2n and S is asymptotically
symmetric as s → ±∞. We can choose the trivialization so that XH and ∂/∂t
correspond to constant vectors in R2n. We denote S := lims→−∞ S(s, ·). In this
situation the matrix S has the following properties:
(i) ‖S(s, θ) − S(ϑ ◦ u(s, θ) − ϑ ◦ c(a))‖ is bounded by a constant multiple of
δ. This is because, for s ∈ R, the restriction of u to [s − 1, s+ 1] × S1 is
δ-close to the constant cylinder over the orbit u(s, ·) ∈ Sγ .
(ii) the action of S(s, θ) on the (constant) vector of R2n corresponding to XH
is multiplication by
δk(s) := δh′a
(s)f ′′γ (ϑ(u(s, θ))− θ),
and this expression goes to zero with δ.
(iii) the matrix S(s, θ) sends the subspace corresponding to ξ to itself and
sends ∂/∂t on a multiple of the form (E + δF (s, θ))∂/∂t, with E > 0 and
F a bounded function of (s, θ). This follows from (12), (24) and the fact
that ∇∂/∂tRλ = 0 and ∇vRλ ∈ ξ, v ∈ ξ;
Symplectic homology for autonomous Hamiltonians 36
(iv) there is a constant C > 0 such that ‖S′(s, θ)‖ ≤ Cδ for all s ∈ R and
θ ∈ S1. This follows from (50) due to the presence of the factor δ in front
of the reparametrization function h a
We characterize now the kernel of Du. We first show that each ζ ∈ ker Du is
a multiple of the (constant) vector corresponding to XH , or that its component
ζ⊥ on the orthogonal complement vanishes. Let F (s) denote the self-adjoint
operator J0∂θ + S(s, θ), so that Du = ∂s + F (s). If ζ ∈ ker Du we have
(∂s − F (s))(∂s + F (s))ζ = 0, i.e.
∂2sζ − F (s)2ζ + S′(s)ζ = 0.
By taking the scalar product in L2(S1,R2n) with ζ⊥ and using property (ii) for
S we get
〈ζ⊥, ∂2sζ⊥〉 − ‖F (s)ζ⊥‖2 + 〈ζ⊥, S′(s)ζ⊥〉 = 0.
The Morse-Bott assumption and property (i) guarantee that ‖F (s)ζ⊥‖L2 ≥
c‖ζ⊥‖L2 for some c > 0. We obtain
∂2s‖ζ⊥‖2L2 ≥ 2〈ζ⊥, ∂2sζ⊥〉L2 ≥ 2(c2 − Cδ)‖ζ⊥‖2L2 ≥ c2‖ζ⊥‖2L2
if δ > 0 is sufficiently small. In particular ‖ζ⊥‖2L2 can have no local maximum
on R. Since ‖ζ⊥‖2
→ 0 as s→ ±∞ we deduce that ζ⊥ ≡ 0.
We now show that all elements of ker Du are independent of θ. Let ζ ∈
ker Du. Because ζ
⊥ = 0 we have ∂sζ + J0∂θζ + δk(s)ζ = 0, with ∂sζ + δk(s)ζ
and ∂θζ pointwise colinear with XH . Hence ∂sζ + δk(s)ζ = 0 and ∂θζ = 0.
This shows that the elements of ker Du also belong to the kernel of the
linearized Morse operator
ζ 7→ ∂sζ + δh′a
(s)f ′′γ (ϑ ◦ c(δh a
This is a differential equation on R for which the Cauchy problem has a unique
solution. Hence the space of solutions is one-dimensional in C∞(R,R) and, in
order to determine the dimension of ker Du, we just have to check whether the
solutions belong or not to its domain.
If a and b are finite the solutions are constant near ±∞, hence belong to the
domain of Du and dim ker Du = 1. If a = −∞ (and b is finite) we distinguish
two cases: either p is a maximum, in which case f ′′γ (p) < 0, the solutions are
unbounded near −∞ and ker Du = 0, or p is a minimum, in which case f ′′γ (p) >
0, the solutions coincide near −∞ with the elements of V ′u and dim ker Du = 1.
Hence dim ker Du = ind(p). A similar argument shows that dim ker Du =
1− ind(q) if b = +∞ (and a is finite). In all cases we have
dim ker Du = ind(Du),
so that Du is surjective.
Symplectic homology for autonomous Hamiltonians 37
Up to a translation, the defining interval I(a, b) of a gradient cylinder can be
considered to be [−T/2, T/2], T > 0 in case (i), or ]−∞, 1], [−1,∞[ in cases (ii)
and (iii) respectively. We shall thus assume in the sequel that the parameters
a, b take the values
a = −T/2, b = T/2 for T > 0, or a = −∞, b = 1, or a = −1, b = +∞.
We consider a tuple (γ, a, b, ǫ) and the gradient cylinder u := uδ := uδ,γ,a,b,ǫ
for δ small enough. Let (sδ) be a family of parameters such that sδ ≤ s∗δ and
→ 1 as δ → 0, where
s∗δ :=
(T + ǫ)/2δ, if a = −T/2, b = T/2,
1/δ, otherwise.
In particular we have sδ → ∞ as δ → 0. Our goal now is to define modified
norms ‖ · ‖1,δ and ‖ · ‖δ on the domain and target of the operators Du = Duδ
such that they admit uniformly bounded right inverses with respect to δ → 0.
Let wδ : R → R+ be the weight function defined by
wδ(s) =
ed||s|−sδ|, if a and b are finite,
ed|s−sδ|, if a = −∞ and b is finite,
ed|s+sδ|, if a is finite and b = ∞.
0 sδ−sδ
Figure 3: Weight function ||s| − sδ| for a, b finite (logarithmic scale).
The new norm ‖ · ‖δ on the target of Du is the Lp-norm with weight wδ, and
we emphasize it by writing the target as
Lp(R× S1, u∗TŴ ;wδ(s)dsdθ).
Let V
u,δ, V
u,δ be vector spaces of the same dimension as V
u, given
by (53), and which, when their dimension is nonzero, are spanned by the fol-
lowing sections.
(i) if a, b are both finite the sections are
(1− β(s+ sδ, θ))XH(γ(θ + θ0)) and β(s− sδ, θ)XH(γ(θ + θ0));
Symplectic homology for autonomous Hamiltonians 38
(ii) if a = −∞ and b is finite the sections are
(1− β(s− sδ, θ))∇fγ(ϑ ◦ u(s, 0)) and β(s− sδ, θ)XH(γ(θ + θ0));
(iii) if a is finite and b = +∞ the sections are
(1− β(s+ sδ, θ))XH(γ(θ + θ0)) and β(s+ sδ, θ)∇fγ(ϑ ◦ u(s, 0)).
In case a = −∞, b finite or a finite, b = +∞ we define the new norm ‖ · ‖1,δ
on the domain of Du by splitting it as
domDu =W
1,p(R× S1, u∗TŴ ;wδ(s)dsdθ) ⊕ V
u,δ ⊕ V
and setting the norm of the above generators of V
u,δ, V
u,δ to be equal to 1.
In case a, b are finite we split the domain of Du as above and further modify
the weighted norm on the W 1,p-space. We recall from the proof of Proposi-
tion 4.8 that kerDu is 1-dimensional and is spanned by a section ζδ which is
constant for |s| ≥ s∗δ . We normalize ζδ by requiring that its value at 0 be equal
to the constant vector corresponding to XH . Let 〈·, ·〉 be the scalar product in
L2(S1). For an element ζ ∈ W 1,p(R× S1, u∗TŴ ;wδ(s)dsdθ) we denote
κδ :=
〈ζ(0, ·), ζδ(0, ·)〉
〈ζδ(0, ·), ζδ(0, ·)〉
We denote
χδ(s, θ) := β(s+ sδ)β(−s+ sδ)ζδ(s, θ)
and define the norm ‖ · ‖1,δ on W 1,p(R× S1, u∗TŴ ;wδ(s)dsdθ) by
‖ζ‖1,δ := ‖ζ − κδχδ‖1,p,δ + |κδ|.
Here ‖ · ‖1,p,δ is the weighted norm on W 1,p(R× S1, u∗TŴ ;wδ(s)dsdθ).
Proposition 4.9. Let u = uδ = uδ,γ,a,b,ǫ as above. There exists δ2 ∈]0, δ0] such
that the operator
Du : (domDu, ‖ · ‖1,δ) → (Lp(R× S1, u∗TŴ ;wδ(s)dsdθ), ‖ · ‖δ)
is surjective and has a uniformly bounded right inverse Qu = Quδ for δ ∈]0, δ2].
Proof. We choose a unitary trivialization of u∗TŴ as in the proof of Proposi-
tion 4.8, so that XH and ∂/∂t correspond to constant vectors in R
2n, and so
that the operator Du takes the form
(Duζ)(s, θ) = ∂sζ + J0∂θζ + S(s, θ)ζ(s, θ).
Here J0 is the standard complex structure on R
2n and S is asymptotically
symmetric as s→ ±∞. The matrix S(s, θ) can be written as S′(s, θ)⊕S′′(s, θ)
with respect to the splitting ξ ⊕ 〈∂/∂t,XH〉, so that the operator Du is also
Symplectic homology for autonomous Hamiltonians 39
split with respect to the decomposition ξ ⊕ L, where L := 〈∂/∂t,XH〉. It
is therefore enough to find uniformly bounded right inverses for each of the
surjective operators
D′u :W
1,p(R× S1, u∗ξ;wδ(s)dsdθ) → Lp(R× S1, u∗ξ;wδ(s)dsdθ),
D′′u :W
1,p(R× S1, u∗L; ‖ · ‖1,δ)⊕ V
u,δ ⊕ V ′u,δ → Lp(R× S1, u∗L;wδ(s)dsdθ).
Here we use the fact that the norm ‖·‖1,δ coincides with the weightedW 1,p-norm
on sections with values in the subbundle u∗ξ. Note that D′u is an isomorphism
since it has index 0, whereas ind(D′′u) = ind(Du) is either 0 or 1.
We treat D′′u and consider first the case of a semi-infinite gradient trajec-
tory. The two possible cases are entirely similar, and we assume without loss of
generality that a = −∞, b = 1. Let
S′′0 :=
so that limδ→0 S
′′(s, θ) = S′′0 uniformly in (s, θ). Consider the operator
D′′0,δ :W
1,p(R×S1, u∗L;wδ(s)dsdθ)⊕V
u,δ⊕V
u,δ → Lp(R×S1, u∗L;wδ(s)dsdθ)
defined by D′′0,δ := ∂s + J0∂θ + S
0 . As in the proof of Proposition 4.8 one sees
that D′′0,δ is surjective, and we claim that it admits a right inverse Q
0,δ that
is uniformly bounded with respect to δ. Indeed, let Q′′0 be a right inverse of
D′′0 := D
0,δ=1 and consider the shift operators
(Tδζ)(s) := ζ(s+ sδ)
acting from dom(D′′0,δ) → dom(D′′0 ) and from Lp(wδ(s)dsdθ) → Lp(ed|s|dsdθ).
It follows from the definitions of ‖ · ‖1,δ and ‖ · ‖δ that the operators Tδ are
isometries, and we have D′′0,δ = T
0Tδ since D
0 is independent of s ∈ R.
Hence Q′′0,δ = T
0Tδ is a right inverse for D
0,δ such that ‖Q′′0,δ‖ = ‖Q′′0‖,
and the claim is proved.
Now, if δ is small enough we have ‖S′′(s, θ) − S′′0 ‖ ≤ 1/2‖Q′′0‖, s ∈ R and
therefore ‖D′′u −D′′0,δ‖ ≤ 1/2‖Q′′0‖. This implies that
‖D′′uQ′′0,δ − Id‖ = ‖D′′uQ′′0,δ −D′′0,δQ′′0,δ‖ ≤
Thus D′′uQ
0,δ is invertible and the norm of its inverse is ≤ 2. Finally a right
inverse for D′′u is given by Q
0,δ(D
−1 and has norm ≤ 2‖Q′′0‖.
We now treat the case a = −T/2, b = T/2 for T > 0. Let u := uδ,γ,−(T+ǫ)/2,0,
u := uδ,γ,0,(T+ǫ)/2 and
:= D′′u :W
1,p(R× S1, u∗L; ed|s|dsdθ) ⊕ V ′u ⊕ V ′u → Lp(ed|s|dsdθ),
D′′ := D′′u :W
1,p(R× S1, u∗L; ed|s|dsdθ) ⊕ V ′u ⊕ V
u → Lp(ed|s|dsdθ).
Symplectic homology for autonomous Hamiltonians 40
The same argument as above, using the constant operator D′′0 , shows that D
and D′′ admit right inverses which are uniformly bounded with respect to δ →
0. Both operators have index 1 and it follows from the description of their
kernels given in the proof of Proposition 4.8 that their restrictions toW 1,p⊕V ′u,
respectively W 1,p ⊕V ′u are isomorphisms. We choose the right inverses Q
, Q′′
to be the inverses of their respective restrictions.
Let ζ ∈ kerD′′, ζ ∈ kerD′′ be two sections such that their values at +∞
and respectively −∞ are equal to the (constant) vector corresponding to XH
in the chosen trivialization of u∗TŴ . Let V ′, V
be the 1-dimensional vector
spaces spanned by βζ and (1 − β)ζ respectively. Setting the norm of these
generators to be equal to 1 defines a new norm on dom(D
) and dom(D′′),
which we emphasize by decomposing the latter as
dom(D
) =W 1,p(R× S1, u∗L; ed|s|dsdθ)⊕ V ′u ⊕ V ′,
dom(D′′) = D′′u :W
1,p(R× S1, u∗L; ed|s|dsdθ) ⊕ V ′ ⊕ V ′u.
It follows from our special choice of the right inverses Q
, Q′′ that the latter
are also uniformly bounded with respect to this new norm as δ → 0.
Let D′′ := D
′′ be the operator obtained by gluing D
cut at sδ and
D′′ cut at −sδ, with the ‖ · ‖1,δ-norm on its domain and the ‖ · ‖δ-norm on its
target. It follows as in [5, Proposition 5] that the right inverses Q
, Q′′ give rise
to a uniformly bounded right inverse Q′′ for D′′ as δ → 0. On the other hand,
we have ‖D′′u − D′′‖ → 0 as δ → 0, and we obtain a uniformly bounded right
inverse for D′′u by the previous formula Q
u := Q
′′(D′′uQ
′′)−1. We note that,
upon gluing, the exponential weights at ±∞ for D′′, D′′ give rise to the peak
in the weight function wδ for D
′′, and the fibered sum operation on V ′, V
which the norm is fixed, is responsible for the appearance of the distinguished
cutoff section ζδ leading to the modified norm ‖ · ‖1,δ.
We now treat D′u and start by making a few general remarks. For each
s0 ∈ R the operator
D′(s0) := ∂s+J0∂θ+S(s0, θ) :W
1,p(R×S1, u∗ξ; dsdθ) → Lp(R×S1, u∗ξ; dsdθ)
is δ-close to the R-invariant operator with nondegenerate asymptotics corre-
sponding to the constant cylinder over the orbit u(s0, ·). Hence, for δ > 0
small enough, both operators are isomorphisms [24, Lemma 2.4]. Moreover,
this property also holds in the presence of weights ed|s|, eds or e−ds. For
the weight ed|s| we argue as follows. The operator is still Fredholm between
the W 1,p and Lp spaces with weights, of the same index 0. Since the corre-
sponding W 1,p space is contained in W 1,p(R × S1, u∗ξ; dsdθ) we infer that the
operator is injective, hence an isomorphism. For the weight eds we argue as
follows. Multiplication by e
s determines linear isomorphisms M : W 1,p(R ×
S1, u∗ξ; edsdsdθ) →W 1,p(R×S1, u∗ξ; dsdθ) andM : Lp(R×S1, u∗ξ; edsdsdθ) →
Symplectic homology for autonomous Hamiltonians 41
Lp(R× S1, u∗ξ; dsdθ). The operator M−1D′(s0)M is an isomorphism and, for
ζ ∈ W 1,p(R× S1, u∗ξ; edsdsdθ), we have
M−1D′(s0)Mζ = D
′(s0)ζ +
Since d > 0 is small as in Proposition A.2 and p > 2, the operatorM−1D′(s0)M
is R-invariant and has nondegenerate asymptotics, hence is an isomorphism [24,
Lemma 2.4]. An analogous reasoning using the multiplication by e−
s proves
the claim for the weight e−ds.
We now prove that D′u admits a uniformly bounded right inverse in the case
a = −T/2, b = T/2, sδ = (T+ǫ)/2δ. We recall the notation u := uδ,γ,−(T+ǫ)/2,0,
u := uδ,γ,0,(T+ǫ)/2 and set
:= D′u :W
1,p(R× S1, u∗ξ; ed|s|dsdθ) → Lp(R× S1, u∗ξ; ed|s|dsdθ),
D′ := D′u :W
1,p(R× S1, u∗ξ; ed|s|dsdθ) → Lp(R× S1, u∗ξ; ed|s|dsdθ).
We claim that each of the operatorsD
, D′ is an isomorphism with uniformly
bounded right inverse as δ → 0. We give the proof forD′ since the proof forD′ is
entirely analogous. We choose a finite number of points −∞ = s−m < s−m+1 <
· · · < s−1 < 0 = s0 < s1 < · · · < sm+1 = +∞ such that ‖S(s, θ) − S(s′, θ)‖ ≤
1/4C for all θ ∈ S1 and s, s′ ∈ [si, si+1], i = −m, . . . ,m, with C > 0 a constant
to be chosen below. Let
bi−1 := ai := c
−1(u(si, 0)), i = −m, . . . ,m+ 1.
We consider the operators
D′i := D
uδ,γ,ai−1,bi−1
, i = −m+ 1, . . . ,−1,
D′0 := D
uδ,γ,a−1,b0
D′i := D
uδ,γ,ai,bi
, i = 1, . . . ,m.
For each i = −m + 1, . . . ,m we denote by ui = ui,δ the gradient cylinder
corresponding to the operator D′i. The domain and range of the operators D
are as follows:
D′i :W
1,p(R× S1, u∗i ξ; e−dsdsdθ) → Lp(R× S1, u∗i ξ; e−dsdsdθ), i < 0,
D′0 :W
1,p(R× S1, u∗0ξ; ed|s|dsdθ) → Lp(R× S1, u∗0ξ; ed|s|dsdθ),
D′i :W
1,p(R× S1, u∗i ξ; edsdsdθ) → Lp(R× S1, u∗i ξ; edsdsdθ), i > 0.
We have seen that D′(s0) is an isomorphism for all s0 ∈ R if one uses any of
the weights ed|s|, eds, e−ds. Since S(s0, ·) belongs to a compact set of loops of
matrices we infer that the norm of the inverse Q′(s0) := D
′(s0)
−1 is uniformly
bounded with respect to s0 ∈ R for each of these three weights. We choose
C := maxweight∈{ed|s|,eds,e−ds} maxs0∈R ‖Q′(s0)‖.
Symplectic homology for autonomous Hamiltonians 42
The same argument as forD′′u shows that the inverse of eachD
i is bounded by
2C independently of δ. We glue together the operators D′i into D̃
′ by cutting at
ai/δ and bi/δ. Then D̃
′ is still surjective and the norm of its inverse is bounded
by 2CC̃2m−1, with C̃ a universal constant (see [24, Proposition 3.9]). Note that
our choice of weights for the operators D′i is such that the resulting weight for
the domain and target of D̃′ is still ed|s|. On the other hand we have
‖D̃′ −D′‖ → 0, δ → 0.
This is because the two operators coincide outside 2m− 1 intervals of length 2,
where the variation of S tends to zero as δ → 0. As a consequence the inverse
is also uniformly bounded when δ is small enough.
We now glue the operator D
cut at sδ with the operator D
′ cut at −sδ,
and denote the resulting operator by D′. The argument in [5, Proposition 5]
shows that D′ admits a uniformly bounded right inverse Q′, provided one uses
the weight wδ(s) on its domain and target. On the other hand
‖D′u −D′‖ → 0, δ → 0
since the two operators differ on a segment of length 2 where the variation of S
goes to zero. We infer that D′u also admits a uniformly bounded right inverse.
The cases when a = −∞, b = 1 or a = −1, b = ∞ follow now easily by
combining the proof of the existence of uniformly bounded right inverses for the
operators D
with the previous use of a shift operator (Tδζ)(s) = ζ(s± sδ).
Remark 4.10. Note that, if a = −T/2, b = T/2, Our construction of a right
inverse for D′′ in the proof of Proposition 4.9 is such that its norm is uniformly
bounded as δ → 0 even if one uses the “non-compensated” norm ‖·‖1,p,δ instead
of ‖ · ‖1,δ. However, our choice of the norm ‖ · ‖1,δ will be essential in the proof
of Proposition 4.16.
In order to describe the pregluing construction it is convenient to work with
a single section over B′δ rather than with a family of sections. We recall that,
up to a translation, the defining interval I(a, b) of a gradient cylinder can be
considered to be [−T/2, T/2], T > 0 in case (i), or ] − ∞, 1], [−1,∞[ in cases
(ii) and (iii) respectively. We are therefore led to consider the section
∂̄ : B′δ → E (56)
defined by ∂̄ := ∂̄−∞,1 and ∂̄ := ∂̄−1,∞ in cases (ii) and (iii), and by
∂̄(u) = ∂̄ǫ(u) := ∂̄−Tu/2,Tu/2,ǫ(u)
in case (i). Here Tu > 0 is the time needed to flow along the gradient of fγ from
the negative limit to the positive limit of u (see (51)).
Remark 4.11. In case (i) the section ∂̄ can be described as follows. The one
parameter family of sections ∂̄T := ∂̄−T/2,T/2,ǫ gives rise to a section denoted
Symplectic homology for autonomous Hamiltonians 43
{∂̄T } of the pull-back bundle pr∗1E → B′δ × R+. There is a codimension one
embedding ι : B′δ → B′δ × R+ given by ι(u) = (u, Tu), the composition pr1 ◦ ι is
the identity and we have
∂̄ = {∂̄T }|im ι.
The situation is summarized in the following commutative diagram.
pr∗1E //
� ι //
B′δ × R+
pr1 //
{∂̄T }
Given u ∈ B′δ we denote by D′u : TuB′δ → Eu the vertical differential of ∂̄. In
cases (ii) and (iii) we have seen that D′u is a Fredholm operator of index ind(p)
and 1− ind(q) respectively. In case (i) the vertical differential can be computed
explicitly as follows. The vertical differential of {∂̄T }, denoted by D{∂̄T }, is
D{∂̄T }(u, T ) · (ζ, τ) = D−T/2,T/2,ǫu ζ − τ(JXHδ−H)
h′−T/2δ,T/2δ,ǫ/δ(s)
= D−T/2,T/2,ǫu ζ −
(JXHδ−H)k−T/2δ,T/2δ,ǫ/δ(s).
Let us write a section ζ ∈ TuB′δ as ζ = ζ0 + aζ + bζ, with ζ0 ∈ W 1,p,d, a, b ∈ R
and ζ, ζ being the distinguished generators of V
u respectively. The vertical
differential D′u acts by
D′uζ = D{∂̄T}(u, Tu) · (ζ, dTu · ζ)
= D−Tu/2,Tu/2,ǫu ζ −
dTu · ζ
(JXHδ−H)k−Tu/2δ,Tu/2δ,ǫ/δ(s).
One can explicitly compute
dTu · ζ = dTu · (aζ + bζ) =
ċ(−Tu/2)b− ċ(Tu/2)a
ċ(−Tu/2) · ċ(Tu/2)
where c : R → Sγ is the gradient trajectory satisfying c(−Tu/2) = θ0, c(Tu/2) =
θ0 and ċ is the derivative with respect to the XH -parametrization of Sγ .
Proposition 4.12. Let T > 0 and u = uδ,γ,−T/2,T/2,ǫ. The index of D
equal to 1, its kernel has dimension 2 and a complement of imD′u is spanned by
a section supported in
[−(T + ǫ)/2δ,−(T + ǫ)/2δ + 1]× S1
[(T + ǫ)/2δ − 1, (T + ǫ)/2δ]× S1.
Morever, D′u admits a right inverse defined on its image which is uniformly
bounded with respect to δ → 0.
Symplectic homology for autonomous Hamiltonians 44
Proof. The first order differential operators D′u and D
−Tu/2,Tu/2,ǫ
u differ by a
term of order zero, hence their indices are equal and indD′u = 1.
The operator D{∂̄T }(u, Tu) is surjective and has index 2. As a consequence
dimkerD′u ≤ dim kerD{∂̄T }(u, Tu) = 2. Let c : R → Sγ be the gradient curve
defining u = uδ,γ,−T/2,T/2,ǫ. For σ close to zero we define c
σ(s) := c(σ + s)
and denote by uσ1 := u
δ,γ,−T/2,T/2,ǫ
the gradient cylinder defined by cσ. Then
∂̄(uσ1 ) = ∂̄T (u
1 ) = 0, hence ζ
1 := d
|σ=0uσ1 ∈ kerD′u. We also define uσ2 :=
uδ,γ,−(T+σ)/2,(T+σ)/2,ǫ to be the gradient cylinder associated to c. Then ∂̄(u
2 ) =
∂̄T+σ(u
2 ) = 0, hence ζ
2 := d
|σ=0uσ2 ∈ kerD′u. Since ζ1 and ζ2 are linearly
independent, we infer that dimkerD′u = 2.
We claim that the section η := 1
(JXHδ−H)k−Tu/2δ,Tu/2δ,ǫ/δ(s) spans a com-
plement of imD′u. This follows from (i) in Lemma 4.13 below with ℓ := dTu,
φ := D
−Tu/2,Tu/2,ǫ
u , φ̃ := D
u, y := η and xy := ζ
2. That D′u admits a uniformly
bounded right inverse defined on its image follows from (ii) in Lemma 4.13 and
the fact that D
−Tu/2,Tu/2,ǫ
u has a uniformly bounded right inverse by Proposi-
tion 4.9.
Lemma 4.13. Let φ : E → F be a surjective map of Banach vector spaces,
ℓ : E → R be a nonzero linear functional, y = φ(xy) ∈ F be fixed and φ̃ : E → F
be defined by
φ̃(x) = φ(x) − ℓ(x)y.
We assume that kerφ ⊂ ker ℓ. Then im φ̃ = φ(ker ℓ) if and only if ℓ(xy) = 1, in
which case the following hold.
(i) The element y spans a complement of im φ̃.
(ii) If Q : F → E is a right inverse for φ, then Q|φ(ker ℓ) is a right inverse for
φ̃ defined on its image.
Proof. We first note that im φ̃ ⊇ φ(ker ℓ). Let us now assume that im φ̃ =
φ(ker ℓ). For x /∈ ker ℓ we obtain φ(x)− l(x)φ(xy) ∈ φ(ker ℓ), hence x− l(x)xy ∈
ker ℓ, implying ℓ(x) − ℓ(x)ℓ(xy) = 0 and ℓ(xy) = 1. Conversely, if ℓ(xy) = 1 we
obtain x− ℓ(x)xy ∈ ker ℓ for any x ∈ E, hence φ̃(x) = φ(x− ℓ(x)xy) ∈ φ(ker ℓ).
The element y does not belong to φ(ker ℓ) because y = φ(xy) with ℓ(xy) = 1
and the preimage xy is well-defined up to an element of kerφ ⊂ ker ℓ. This
proves the equivalence in the statement of the Lemma, as well as (i).
To prove (ii) we need to show that Q(φ(ker ℓ)) ⊂ ker ℓ. We prove the stronger
statement imQ ∩ ker ℓ = Q(φ(ker ℓ)). The inclusion imQ ∩ ker ℓ ⊂ Q(φ(ker ℓ))
follows from the observation that, given x = Qz with ℓ(x) = 0, we have z =
φ(Qz) = φ(x) ∈ φ(ker ℓ). On the other hand note that Qφ is the projection to
imQ along kerφ. Since kerφ ⊂ ker ℓ, it follows that Qφ(ker ℓ) ⊂ imQ∩ker ℓ.
We describe now the pre-gluing construction for elements of the Morse-Bott
moduli spaces and gradient cylinders of the form uδ,γ,a,b,ǫ. We define the space
B̃δ := B̃1,p,dδ (γp, Sγm−1 , . . . , Sγ1 , γq, A;H, {fγ})
Symplectic homology for autonomous Hamiltonians 45
consisting of tuples w̃ := (u1, . . . , um, v0, . . . , vm) satisfying the following condi-
tions.
(i) ui ∈ B1,p,d(Sγi , Sγi−1 , Ai;H), i = 1, . . . ,m, with Sγ0 := Sγ , Sγm := Sγ ,
Sγi 6= Sγi−1 , i = 1, . . . ,m and A1 + . . .+Am = A;
(ii) v0 ∈ B1,p,dδ (Sγ , q; fγ), vi ∈ B
1,p,d
δ (Sγi , Sγi ; fγi) for i = 1, . . . ,m − 1, and
vm ∈ B1,p,dδ (p, Sγ ; fγ);
(iii) ev(vi−1) = ev(ui) and ev(vi) = ev(ui) for i = 1, . . . ,m;
(iv) ev(v0) belongs to the stable manifold of q, and ev(vm) belongs to the
unstable manifold of p.
By the definition of the spaces B1,p,dδ (Sγi , Sγi ; fγi) we have ev(vi) 6= ev(vi) for
i = 1, . . . ,m − 1. We denote by Ti > 0 the unique positive real number such
that ϕ
(ev(vi)) = ev(vi), where ϕ
s is the gradient flow of fγ .
Figure 4: Broken Morse-Bott trajectory w̃.
Let us choose a tubular neighbourhood Uγ ⊂ Ŵ for each γ ∈ P(H),
parametrized by (ϑ, z) ∈ S1 × R2n−1. Given any subset
K ⊂ B̃1,p,dδ (γp, Sγm−1 , . . . , Sγ1 , γq, A;H, {fγ})
for which there exists s0 > 0 such that, for |s| ≥ s0, the components of any
w̃ ∈ K belong to the respective tubular neighbourhoods of their asymptotics,
Symplectic homology for autonomous Hamiltonians 46
we construct, for δ > 0 small enough and ǫi ∈ R, i = 1, . . . ,m− 1 small enough
in absolute value, a pre-gluing map
Gδ,ǫ : K → B1,p,dδ (γp, γq, A;H, {fγ}), ǫ := (ǫ1, . . . , ǫm−1).
Let β : R → [0, 1] be a smooth increasing cutoff function vanishing on ]−∞, 0]
and identically equal to 1 on [1,∞[. Define the gluing profile R = R(δ) by
. (57)
We define for i = 1, . . . ,m the maps ûi : [−R,R]× S1 → Ŵ by
ûi(s, θ) :=


z(s, θ) = β(s+R)z ◦ ui(s, θ),
ϑ(s, θ) = θ + β(s+R)(ϑ ◦ ui(s, θ)− θ),
s ∈ [−R,−R+ 1],
ui(s, θ), s ∈ [−R+ 1, R− 1],{
z(s, θ) = β(−s+R)z ◦ ui(s, θ),
ϑ(s, θ) = θ + β(−s+R)(ϑ ◦ ui(s, θ)− θ),
s ∈ [R− 1, R].
We define for i = 1, . . . ,m− 1 the maps
v̂i : [−(Ti + ǫi)/2δ, (Ti + ǫi)/2δ]× S1 → Ŵ
by the analogous formulas in which we replace R by Ti+ǫi
. We also define
v̂0 : [−1/δ,+∞[×S1 → Ŵ
v̂0(s, θ) :=
z(s, θ) = β(s+ 1
)z ◦ v0(s, θ),
ϑ(s, θ) = θ+β(s+ 1
)(ϑ◦v0(s, θ)−θ),
s ∈ [− 1
+ 1],
v0(s, θ), s ∈ [− 1δ + 1,+∞[,
as well as
v̂m :]−∞, 1/δ]× S1 → Ŵ
by the analogous formula with s replaced by −s and v0 replaced by vm. Finally,
we define
Gδ,ǫ(w̃)
as the catenation v̂m, ûm, v̂m−1, . . . , û1, v̂0. The catenation of these maps is
performed in the above order and with (obvious) shifts
0 = svm < sum < svm−1 < . . . < su1 < sv0
in the domain defined by
suj = svj + ℓj, (58)
svj−1 = suj + ℓj−1
Symplectic homology for autonomous Hamiltonians 47
for j = 1, . . . ,m. Here we denote
ℓi := R + (Ti + εi)/2δ (59)
for i = 0, . . . ,m, with the convention Tm = T0 = 2 and εm = ε0 = 0. We have
in particular
v̂i(s, θ) = Gδ,ǫ(w̃)(s+ svi , θ), (s, θ) ∈ dom(v̂i), i = 0, . . . ,m,
ûj(s, θ) = Gδ,ǫ(w̃)(s+ suj , θ), (s, θ) ∈ dom(ûj), j = 1, . . . ,m.
Given u = (cm, um, . . . , u1, c0) ∈ M̂A(p, q;H, {fγ}, J), we denote by
Gδ,ǫ(u)
the element Gδ,ǫ(w̃) ∈ Bδ, where w̃ := (vm, um, . . . , u1, v0) and vi := uδ,γi,ai,bi,ǫi ,
i = 0, . . . ,m is the gradient cylinder corresponding to the gradient trajectory
ci : I(ai, bi) → Sγi .
The section ∂̄Hδ,J(Gδ,ǫ(w̃)) belongs to the space
Lp(R× S1, Gδ,ǫ(w̃)∗TŴ ; gδ,ǫ(s)dsdθ),
where the continuous function gδ,ǫ(s) is the catenation of the following functions:
(i) gδ,ui(s) := e
d|s| on the domain [−R,R] of ûi;
(ii) gδ,ǫi,vi(s) = e
d||s|−si,δ| on the domain [−(Ti + ǫi)/2δ, (Ti + ǫi)/2δ] of v̂i,
where si,δ =
Ti+ǫi
−R ≤ s∗i,δ =
Ti+ǫi
, i = 1, . . . ,m− 1;
(iii) gδ,v0(s) := e
d|s+s0,δ| on the domain [−1/δ,+∞[ of v̂0, where s0,δ = 1/δ −
R ≤ s∗0,δ = 1/δ;
(iv) gδ,vm(s) := e
d|s−sm,δ| on the domain ]−∞, 1/δ] of v̂m, with sm,δ = 1/δ −
R ≤ s∗m,δ = 1/δ.
We denote the norm on the above Lp space with weight gδ,ǫ by ‖ · ‖δ, omitting
in the notation the dependence on the numbers Ti + ǫi, i = 1, . . . ,m − 1. We
define a norm ‖ · ‖1,δ on the space
W 1,p(R× S1, Gδ,ǫ(w̃)∗TŴ ; gδ,ǫ(s)dsdθ)
as follows. For j = 1 . . . ,m let
〈ζ(suj −R, ·), XH〉
〈XH , XH〉
, κj =
〈ζ(suj +R, ·), XH〉
〈XH , XH〉
, (60)
where 〈·, ·〉 is the inner product in L2(S1). Here suj − R and suj + R are the
coordinates of the catenation circles between ûj and v̂j , respectively ûj and
v̂j−1. For i = 1, . . . ,m− 1 let
〈ζ(svi , ·), ζi,δ(0, ·)〉
〈ζi,δ(0, ·), ζi,δ(0, ·)〉
Symplectic homology for autonomous Hamiltonians 48
suj +R
sujsvj suj − 2R suj −R svj−1suj + 2R
Figure 5: The definition of ‖ · ‖1,δ.
where the section ζi,δ generates the kernel of the operator Dvi as in Proposition
4.8. The norm ‖ · ‖1,δ is then defined by
‖ζ‖1,δ :=
∥∥ζ −
κjβ(−s+ suj )β(s − suj + 2R)XH (61)
− κjβ(s− suj )β(−s+ suj + 2R)XH
κiβ(s− svi + ℓi − 2R)β(−s+ svi + ℓi − 2R)ζi,δ(· − svi , ·)
W 1,p(gδ,ǫ)
|κj |+ |κj |
|κi|.
Here ℓj is defined by (59), β : R → [0, 1] is the smooth cutoff function which
vanishes on ]−∞, 0] and is equal to 1 on [1,∞[, and ‖ · ‖W 1,p(gδ,ǫ) is the W 1,p-
norm with weight gδ,ǫ on W
1,p(ed|s|dsdθ). The graph of the function
β(−s+ suj )β(s− suj + 2R) + β(s− suj )β(−s+ suj + 2R)
+β(s− svj + ℓj − 2R)β(−s+ svj + ℓj − 2R)
+β(s− svj−1 + ℓj−1 − 2R)β(−s+ svj−1 + ℓj−1 − 2R)
is depicted in Figure 5.
Remark 4.14. The definition of ‖ · ‖1,δ is such that the norm of the gluing
map G constructed in the proof of Proposition 4.18 below is uniformly bounded
with respect to δ → 0.
Proposition 4.15. Let w̃ ∈ B̃δ and ǫ(δ) := (ǫ1(δ), . . . , ǫm−1(δ)) be such that
(i) ǫi(δ) → 0, δ → 0 for i = 1, . . . ,m− 1;
(ii) ui ∈ MAi(Sγi , Sγi−1 ;H, J), i = 1, . . . ,m;
Symplectic homology for autonomous Hamiltonians 49
(iii) the components vi are of the form uδ,γi,ai,bi,ǫi , with bi = −ai = Ti/2 for
i = 1, . . . ,m − 1, b0 = +∞, a0 = −1, ǫ0 = 0 and bm = 1, am = −∞,
ǫm = 0.
‖∂̄Hδ,J(Gδ,ǫ(w̃))‖δ = 0.
Proof. We must check that ‖∂̄Hδ,J(Gδ,ǫ(w̃))|I×S1‖δ → 0 as δ → 0 when I ⊂ R
is an interval of the following type.
(i) I = [−R + 1, R − 1] is contained in the domain of ûi. Then ∂̄Hδ,J(ûi) =
−J(XHδ − XH) ◦ ûi. The norm of this map is pointwise bounded by a
constant multiple of δ. Hence its δ-norm is bounded by a constant multiple
of δedR → 0, δ → 0;
(ii) I = [−R,−R + 1] or I = [R − 1, R] is contained in the domain of ûi.
We have ∂̄Hδ,J(ûi) = ∂̄H,J(ûi) − J(XHδ − XH) ◦ ûi. The second term is
bounded as in (i). The term ∂̄H,J(ûi) is pointwise bounded by the norms
of z ◦ ûi, ϑ ◦ ûi − θ and of their derivatives. By Proposition A.1 their
δ-norm is bounded by a constant multiple of e(d−r)R → 0, δ → 0;
(iii) I = [−(Ti + ǫi)/2δ + 1, (Ti + ǫi)/2δ − 1] for i = 1, . . .m − 1, or I =
[−1/δ + 1,+∞[ or I =] −∞, 1/δ − 1] and is contained in the domain of
some v̂i. Since ∂̄H′
−Ti/2,Ti/2,ǫi
,J(v̂i) = 0 and H
−Ti/2,Ti/2,ǫi
= Hδ for s ∈ I,
we already have ‖∂̄Hδ,J(Gδ,ǫ(w̃))|I×S1‖δ = 0;
(iv) I = [−(Ti+ ǫi)/2δ,−(Ti+ ǫi)/2δ+1] or I = [(Ti+ ǫi)/2δ− 1, (Ti+ ǫi)/2δ]
for i = 1, . . .m − 1, or I = [−1/δ,−1/δ + 1], or I = [1/δ − 1, 1/δ] and is
contained in the domain of some v̂i. Then ∂̄Hδ,J(v̂i) involves only ϑ◦ v̂i−θ,
its derivative with respect to s and δ∇fγi . By formula (50) the norm
of these expressions is pointwise bounded by a constant multiple of δ,
therefore their δ-norms are bounded by δedR → 0 as δ → 0.
Proposition 4.16. Let [ṽn] ∈ MA(γp, γq;Hδn , J) with δn → 0, n → ∞ and
let [u] ∈ MA(p, q;H, {fγ}, J) be a broken Floer trajectory of level ℓ = 1 whose
intermediate gradient fragments c1, . . . , cm−1 are nonconstant. Then [ṽn] → [u],
n→ ∞ if and only if there exist
• representatives vn ∈ [ṽn], v ∈ [u],
• real parameters ǫn = (ǫn1 , . . . , ǫnm−1) with ǫni → 0, n→ ∞,
• vector fields ζn ∈ TGδn,ǫn (v)Bδ with ζn = (ζ
n, ζn, ζn), such that
‖ζn‖1,δn := ‖ζ0n‖1,δn + ‖ζn‖+ ‖ζn‖ → 0, n→ ∞,
Symplectic homology for autonomous Hamiltonians 50
satisfying
vn := expGδn,ǫn(v)
(ζn).
Proof. We first prove the converse implication, namely that convergence in norm
implies geometric convergence. We define shifts (sni ), i = 1, . . . ,m inductively
snm := 1/δn +Rn, s
i := s
i+1 + 2Rn + (Ti + ǫ
i )/δn.
We claim that vn(·+sni , ·) → ui, n→ ∞ uniformly on compact sets. Let R0 > 0
be fixed. By assumption
‖ζ0n(·+ sni , ·)|[−R0,R0]×S1‖1,δn → 0, n→ ∞.
By the Sobolev embedding theorem this implies
‖ζ0n(·+ sni , ·)|[−R0,R0]×S1‖C0 → 0, n→ ∞.
Since
Gδn,ǫn(v)(· + sni , ·)|[−R0,R0]×S1 = ui|[−R0,R0]×S1
for n sufficiently large, the conclusion follows.
We now prove the direct implication. Let us pick a representative
v = (cm, um, cm−1, . . . , u1, c0) ∈ [u]
and let Ti, i = 0, . . . ,m be the lengths of the intervals of definition of ci, with the
convention T0 = Tm = +∞. We also choose arbitrary representatives vn ∈ [ṽn].
By assumption there exist shifts (sni ) such that vn(· + sni , ·) converges to ui
uniformly on compact sets. We define
ǫni := δn(s
i − sni+1 − 2Rn)− Ti, i = 1, . . . ,m− 1. (62)
By Lemma 4.6 we have ǫni → 0, n→ ∞. We define partitions of the real line
−∞ = anm ≤ bnm ≤ anm−1 ≤ . . . ≤ an0 ≤ bn0 = +∞
by bnm := 1/δn and
ani−1 := b
i + 2Rn, b
i−1 := a
i−1 + (Ti−1 + ǫ
i−1)/δn, i = 1, . . . ,m.
We define a sequence of shifts (sn) by
sn := snm − 1/δn −Rn
and we still denote by vn the shifted sequence vn(·+ sn, ·).
We first show the existence of a unique vector field ζn satisfying vn =
expGδn,ǫn(v)
(ζn). For that it is enough to prove
s∈In,θ∈S1
dist(vn(s, θ), Gδn,ǫn(v)(s, θ)) = 0, (63)
where In is an interval of the following form:
Symplectic homology for autonomous Hamiltonians 51
(i) [bni , a
i−1], i = 1, . . . ,m;
(ii) [ani , b
i ], i = 1, . . . ,m− 1;
(iii) [bnm −K/δn, bnm] or [an0 , an0 +K/δn], for any K > 0.
The asymptotic behaviour of vn and Gδn,ǫn(v) ensures that ζn is an element of
the relevant W 1,p-space.
We prove case (i) by contradiction. Assume that there exists ǫ > 0 and a
sequence (s̃n, θ̃n) ∈ [bmi , ani−1]× S1 such that
dist(vn(s̃n, θ̃n), Gδn,ǫn(v)(s̃n, θ̃n)) ≥ ǫ.
Since (63) is satisfied if one replaces vn by ui(·−sni , ·) (by definition ofGδn,ǫn(v)),
we also have
dist(vn(s̃n, θ̃n), ui(s̃n − sni , θ̃n)) ≥ ǫ/2 (64)
for n large enough. By the assumption of uniform convergence on compact
sets vn(· + sni , ·) → ui(·, ·), up to passing to a subsequence we can assume that
s̃n − sni → ±∞. We treat the case s̃n − sni → ∞, the other case being similar.
Since s̃n ∈ [bni , ani−1] and δn(ani−1 − bni ) = 2δnRn → 0, we have δn(s̃n − sni ) → 0.
By Lemma 4.5 we infer that vn(·+ s̃n, ·) → ev(ui), which means
vn(s̃n, ·) = lim
ui(s̃n − sni , ·)
and this contradicts (64).
Note that the above proof shows that vn(· + ani−1, ·) → ev(ui) and vn(· +
bni , ·) → ev(ui), i = 1, . . . ,m uniformly on compact sets.
We now prove case (ii). Let us fix 1 ≤ i ≤ m − 1. An action argument as
the one in the proof of Lemma 4.6 shows that vn(In × S1) is entirely contained
in a small neighbourhood of Sγi . We apply Proposition A.3 to vn and In × S1
to obtain
(s,θ)∈In×S1
|z ◦ vn(s, θ)| = 0
(s,θ)∈In×S1
|ϑ ◦ vn(s, θ)− θ − ϕ
δn(s−a
(ev(ui+1))| = 0.
The same two equations hold, by definition, if one replaces vn by Gδn,ǫn(v), and
the conclusion follows.
We now prove (iii). We treat only the case In = [a
0 , a
0 +K/δn], the other
case being similar. An action argument as above shows that vn(·+an0 +K/δn, ·)
converges uniformly on compact sets to a constant cylinder over some orbit
γ ∈ Sγ0 . By Lemma 4.6 we know that γ = ϕ
K (ev(u1)), and in particular is
not a critical point of fγ0 . Now the conclusion follows in the same way as in
case (ii).
We now show that
‖ζn|In×S1‖1,δn = 0 (65)
Symplectic homology for autonomous Hamiltonians 52
in each of the cases (i)-(iii). We denote in the sequel
|ζ(s, θ)|1 := |ζ(s, θ)| + |∇sζ(s, θ)|+ |∇θζ(s, θ)|.
We first consider case (i). Let us fix K > 0 large enough. For n large enough
we can write
In = [s
i −Rn, sni −K] ∪ [sni −K, sni +K] ∪ [sni +K, sni +Rn].
We first note that
∫ sni +K
|ζ(s, θ)|p1 gδn,ǫn(s)dsdθ =
∫ sni +K
|ζ(s, θ)|p1 ed|s−s
i |dsdθ
≤ sup
s∈[sn
−K,sn
|ζ(s, θ)|p1 · e
Since vn(· + sni , ·) and Gδn,ǫn(v)(· + sni , ·) converge uniformly on compact sets
together with their derivatives to ui, the last term goes to zero as n→ ∞.
In order to estimate the integral on the interval [sni −Rn, sni −K] we apply
Proposition A.3 on [sni+1 +K, s
i −K] to vn to obtain
|z ◦ vn(s, θ)|1 ≤ C(K)
cosh(ρ(s− s
i+1+s
cosh(ρ(
≤ C1C(K)eρ(s−s
i +K)
|ϑ ◦ vn(s, θ)− θ − ϕ
δn(s−b
(pni )|1 ≤ C1C(K)eρ(s−s
i +K),
where | · |1 stands for the pointwise C1-norm, for some pni ∈ Sγi such that
pni → ev(ui), n → ∞. Similar estimates hold, by definition, if one replaces vn
by Gδn,ǫn(v) and p
i with ev(ui). Hence we obtain
|ζn(s, θ)− κni XH |1 ≤ C1C(K)eρ(s−s
i +K), (66)
where κni → 0 as n→ ∞ and
C(K) = Cmax(‖Q∞vn(sni+1 +K)‖, ‖Q∞vn(sni −K)‖,
‖Q∞ṽn(sni+1 + sn +K)‖, ‖Q∞ṽn(sni + sn −K)‖). (67)
We obtain
∫ sni −K
|ζn(s, θ)− κni XH |
1 gδn(s)dsdθ
∫ sni −K
|ζn(s, θ)− κni XH |
−d(s−sni )dsdθ
≤ C2C(K)pedK .
Symplectic homology for autonomous Hamiltonians 53
A similar estimate holds when the interval of integration is [sni +K, s
i + Rn],
with C(K) replaced with C′(K). Letting n→ ∞ we obtain
In×S1
|ζn(s, θ)− κni β(−s+ sni )XH − κni β(s− sni )XH |
1 gδn(s)dsdθ
≤ C2(C(K)p + C′(K)p)edK .
We let now K → ∞. Proposition A.3 implies that, for K > K ′, we have
C(K ′) ≤ C3C(K)e−ρ(K
′−K), hence C(K)pedK → 0 as K → ∞ because d < ρp.
The equality (65) follows.
We now consider case (ii). We fix K > 0 large enough and apply Proposi-
tion A.3 on the interval [sni+1+K, s
i −K] ⊃ [sni+1+Rn, sni −Rn] = In to obtain
as in case (i)
|ζn(s, θ)− κni ζi,δ(s, θ)|1 ≤ C(K)
cosh(ρ(s− s
i+1+s
cosh(ρ(
where C(K) is given by (67), ζi,δ(s −
sni+1+s
) generates the kernel of the lin-
earized operator corresponding to gradient trajectory ci as in Proposition 4.8
and κni ζi,δ(b
i , ·) = κni XH . In particular, we have κni → 0, n→ ∞. We get
∫ sni −Rn
|ζn(s, θ)− κni ζi,δ|
1 gδn(s)dsdθ ≤ C2C(K)
pe(d−ρp)(Rn−K).
The last term goes to zero as n → ∞. Equality (65) follows now as in case (i).
Case (iii) is entirely similar to case (ii).
In order to complete the proof of ‖ζn‖1,δn → 0, n→ ∞, it is enough to show
that ‖ζn|In×S1‖1,δn → 0 if In =] − ∞, bnm − K/δn] or In = [an0 + K/δn,+∞[,
for any K > 1. The two cases are entirely similar and we give the argument
only for In =]−∞, bnm −K/δn]. By Proposition A.2, for n sufficiently large we
have vn(s, θ) = expuδn,γm,−∞,1(s,θ)(ηn(s, θ)), with ηn = (η
n, ηn), η
n ∈W 1,p(In×
S1, u∗δn,γm,−∞,1TŴ ; e
r|s|ds dθ), ηn ∈ V
. Since vn(b
n , ·) → ev(um) we have
‖ηn‖∞ → 0. Since Gδn,ǫn(v) = uδn,γm,−∞,1 on In, we obtain ζn = ηn, so that
‖ζn‖ → 0. The fact that ‖ζ0n‖1,δn → 0 follows from the fact that d < r.
We explain now how to construct a right inverse for DGδ,ǫ( ew) which is uni-
formly bounded with respect to δ → 0. The space B̃δ is a Banach manifold
whose tangent space at w̃ is
T ewB̃δ = TvmB′δ dev ⊕dev TumB dev ⊕dev Tvm−1B′δ dev ⊕ . . .⊕dev Tv0B′δ. (68)
Recall that the fibered sum of two vector spaces W1, W2 with respect to linear
maps fi :Wi →W is the vector space
W1 f1 ⊕f2 W2 := {(w1, w2) ∈W1 ⊕W2 : f1(w1) = f2(w2)}.
Symplectic homology for autonomous Hamiltonians 54
If (W1, ‖ · ‖1), (W2, ‖ · ‖2) and W are normed vector spaces, and f1, f2 are
continuous linear maps, then W1 f1 ⊕f2 W2 is a closed subspace of W1⊕W2 and
inherits the norm ‖ · ‖1 + ‖ · ‖2 from W1 ⊕W2. In our case
dev : TvmB′δ =W 1,p,d ⊕ V
′ ⊕ V ′ → Tev(vm)Sγm
factors through the projection on V ′, and similarly for the other evaluation
maps. Therefore the above fibered sum only affects the summands V , V , V
V ′, so that T ewB̃δ is a subspace of codimension 2m in
TvmB′δ ⊕ TumB ⊕ Tvm−1B′δ ⊕ . . .⊕ Tv0B′δ.
As above, the norm on T ewB̃δ is induced from the ambient space. Recall that
the W 1,p-component has weight ed|s| for each TujB, weight ed||s|−si,δ| for each
TviB′δ, i = 1, . . . ,m − 1, weight ed|s+s0,δ | for i = 0 and weight ed|s−sm,δ| for
i = m, with si,δ as in the definition of gδ,ǫ.
The sections ∂̄H,J : B → E and ∂̄ : B′δ → E defined by (41) and (56) give rise
to a section over B̃δ. We denote its vertical differential by
D ew : T ewB̃δ → Lp,d(v∗mTŴ )⊕ Lp,d(u∗mTŴ )⊕ . . .⊕ Lp,d(v∗0TŴ ),
where
Lp,d(v∗i TŴ ) := L
p(R× S1, v∗i TŴ ; gδ,ǫi,vi(s)dsdθ),
Lp,d(u∗iTŴ ) := L
p(R× S1, u∗iTŴ ; gδ,ui(s)dsdθ).
Lemma 4.17. Let J ∈ Jreg(H) and {fγ} ∈ Freg(H, J). Let ǫ =(ǫ1, . . . , ǫm−1)
and let w̃ ∈ B̃δ be as in Proposition 4.15. The image of the operator D ew has codi-
mension m−1 and admits a complement spanned by sections ηi ∈ Lp,d(v∗i TŴ ),
i = 1, . . . ,m− 1 which are respectively supported in
[−(Ti + ǫi)/2δ,−(Ti + ǫi)/2δ + 1]× S1 ∪ [(Ti + ǫi)/2δ − 1, (Ti + ǫi)/2δ]× S1.
The operator D ew admits a right inverse Q ew defined on its image and whose
norm is uniformly bounded with respect to δ → 0.
Proof. We show that
imD ew = imD
⊕ imDum ⊕ imD′vm−1 ⊕ . . .⊕ imD
=: E. (69)
By definition we have imD ew ⊂ E. Let us now choose (xm, ym, . . . , x0) ∈ E and
x̃i and ỹj such that D
(x̃i) = xi, Duj (ỹj) = yj . We need to modify x̃i and ỹj
by elements lying in the kernels of the corresponding operators so that
dev(ỹj) = dev(x̃j), dev(ỹj) = dev(x̃j−1), j = 1, . . . ,m. (70)
Let us first assume m > 1. We have
TvmM′δ,1,−∞(Sγ , Sγ ;H, J)× TumMAm(Sγ , Sγm−1 ;H, J) = kerD′vm × kerDum
Symplectic homology for autonomous Hamiltonians 55
and, because {fγ} ∈ Freg(H, J), the map
(dev, dev) : kerD′vm × kerDum → Tev(vm)Sγ × Tev(um)Sγ
is transverse to the diagonal. We can therefore modify x̃m and ỹm so that
dev(ỹm) = dev(x̃m). Similarly the map
(dev, dev) : kerDu1 × kerD′v0 → Tev(u1)Sγ × Tev(v0)Sγ
is transverse to the diagonal and we can modify ỹ1, x̃0 in order to achieve
dev(ỹ1) = dev(x̃1). For i = 1, . . . ,m− 1 the maps
(dev, dev) : kerD′vi → Tev(vi)Sγi × Tev(vi)Sγi
are surjective and we can modify x̃i so that (70) is satisfied.
If m = 1 the regularity hypothesis on fγ ensures that the map
(dev, dev, dev, dev) : kerD′v1 × kerDu1 × kerD
→ Tev(v1)Sγ × Tev(u1)Sγ × Tev(u1)Sγ × Tev(v0)Sγ
is transverse to the product of the diagonals in the first two and in the last
two factors. We can therefore modify simultaneously x̃1, ỹ1, x̃0 in order to
achieve (70). Therefore (69) is proved. It then follows from Proposition 4.12
that the image of D ew has codimension m − 1 and is spanned by sections ηi ∈
Lp,d(v∗i TŴ ) supported in the desired intervals.
We now prove that D ew admits a uniformly bounded right inverse defined on
its image. We observe that D ew is the restriction to dom(D ew) of the direct sum of
operators D := D′vm ⊕Dum ⊕D
⊕· · ·⊕Du1 ⊕D′v0 . Let ζm, ζ0 be generators
of kerD′vm , kerD
and, for i = 1, . . . ,m− 1, let ζ1i , ζ2i be the basis of kerD′vi
constructed in Proposition 4.12. We denote by K the vector space spanned by
these 2m sections, viewed as elements of dom(D). Then dim K = 2m and K
is a complement of dom(D ew). Let P : dom(D) → dom(D ew) be the projection
parallel to K, let Quj , j = 1, . . . ,m be uniformly bounded right inverses forDuj ,
let Qvi , i = 0, . . . ,m be uniformly bounded right inverses for D
defined on
their images as in Proposition 4.12, and denote Q := Qvm ⊕Qum⊕Qvm−1 ⊕· · ·⊕
Qu1 ⊕Qu0 . Since K ⊂ kerD the operator P ◦Q : im(D) = im(D ew) → dom(D ew)
is a right inverse for D ew defined on its image, and we claim that its norm is
uniformly bounded for δ → 0. The norm of Q is uniformly bounded for δ → 0,
so that it is enough to prove that the norm of P is uniformly bounded for δ → 0.
The sections ζ0, ζm and ζ
i , ζ
i for i = 1, . . . ,m − 1 have the property that
their respective asymptotic values (obtained by applying dev and dev) are not
simultaneously zero. Moreover, the same is true for any linear combination of ζ1i
and ζ2i for i = 1, . . . ,m− 1. As a consequence, there exists a uniform constant
C > 0 such that, for any x = (xm, 0, xm−1, . . . , 0, x0) ∈ K, we have
‖x‖1,δ ≤ C
|dev(xm)|+ |dev(x0)|+
|dev(xi)|+ |dev(xi)|
. (71)
Symplectic homology for autonomous Hamiltonians 56
Given v ∈ dom(D) we have P (v) = v + w for some vector w ∈ K which is
uniquely determined by the asymptotic values of the components of v, and it
follows from (71) that
‖w‖1,δ ≤ C‖v‖1,δ.
We obtain
‖P (v)‖1,δ
‖v‖1,δ
‖v + w‖1,δ
‖v‖1,δ
≤ 1 + C,
so that the norm of P is uniformly bounded by 1+C. This proves the Lemma.
Proposition 4.18. Let J ∈ Jreg(H) and {fγ} ∈ Freg(H, J). Let w̃ ∈ B̃δ and
ǫ(δ) = (ǫ1(δ), . . . , ǫm−1(δ)) be as in Proposition 4.15. The operator
DGδ,ǫ( ew) : W
1,p(R× S1, Gδ,ǫ(w̃)∗TŴ ; gδ,ǫ(s)dsdθ)⊕ V
⊕ V ′v0
→ Lp(R× S1, Gδ,ǫ(w̃)∗TŴ ; gδ,ǫ(s)dsdθ)
is surjective and admits a right inverse Qδ = Qδ,ǫ, ew whose δ-norm is uniformly
bounded with respect to δ → 0.
Proof. Our proof is modelled on the proof of the gluing theorem for holomorphic
spheres by McDuff and Salamon [21, Ch. 10] . Let
vδm, u
m−1, . . . , u
be the extensions of v̂m, ûm, v̂m−1, . . . , û1, v̂0 to R × S1 defined by the same
formulas. Note that
uδj(s, θ) = uj(s, θ), s ∈ [−R+ 1, R− 1],
vδm(s, θ) = vm(s, θ), s /∈ [1/δ − 1, 1/δ],
vδ0(s, θ) = v0(s, θ), s /∈ [−1/δ,−1/δ+ 1]
and vδi (s, θ) = vi(s, θ) for s outside [−(Ti + ǫi)/2δ,−(Ti + ǫi)/2δ + 1] ∪ [(Ti +
ǫi)/2δ − 1, (Ti + ǫi)/2δ] and i = 1, . . . ,m− 1. The difference between vδi and vi
on the one hand, and that between uδj and uj on the other hand is exponentially
small as δ → 0. This implies that the operatorsDuδ
andD′
are surjective
for δ small enough and admit uniformly bounded right inverses, while the op-
erators D′
, i = 1, . . . ,m− 1 have a codimension one image with a supplement
spanned by a smooth section ηi supported in [−(Ti+ ǫi)/2δ,−(Ti+ ǫi)/2δ+1]×
S1 ∪ [(Ti + ǫi)/2δ − 1, (Ti + ǫi)/2δ]× S1, and admit uniformly bounded “right
inverses” defined on their image. It follows that the vertical differential D ewδ
satisfies the conclusions of Lemma 4.17, where w̃δ := (uδ1, . . . , u
0 , . . . , v
In particular, it admits a uniformly bounded right inverse defined on its image,
which we denote by Q ewδ (see [21, Lemma 10.6.1] for a similar statement in the
case of holomorphic spheres). This means that there exists a constant c0 > 0
such that
‖Q ewδx‖W 1,p,d ≤ c0‖x‖Lp,d
Symplectic homology for autonomous Hamiltonians 57
for all x ∈ imD ewδ and δ > 0.
We define an operator Tδ by the commutative diagram
T ewδ B̃δ
Lp,d(w̃δ∗TŴ )
dom(DGδ,ǫ( ew)) L
p(R× S1, Gδ,ǫ(w̃)∗TŴ ; gδ,ǫ(s)dsdθ)
where
Lp,d(w̃δ∗TŴ ) := Lp,d(vδm
TŴ )⊕ Lp,d(uδm
TŴ )⊕ . . .⊕ Lp,d(vδ0
TŴ ).
In the rest of the proof we shall omit the subscript ǫ from Gδ,ǫ and gδ,ǫ. An
element of Lp,d(w̃δ∗TŴ ) is denoted by
x = (xm, ym, . . . , x0).
The mixing map P , the splitting map S and the gluing map G are defined below,
and we shall prove that P, S,G are uniformly bounded with respect to δ → 0.
We shall also prove that Tδ is an approximate right inverse for DGδ( ew), i.e.
‖DGδ( ew)Tδη − η‖δ ≤
‖η‖δ (72)
for δ sufficiently small and η ∈ Lp(R × S1, Gδ(w̃)∗TŴ ; gδ(s)dsdθ). This im-
plies that DGδ( ew)Tδ is invertible (with the norm of its inverse bounded by 2),
and Tδ(DGδ( ew)Tδ)
−1 is a right inverse for DGδ( ew). Since P, S,G are uniformly
bounded, the norm of Tδ(DGδ( ew)Tδ)
−1 is bounded by a constant multiple of
‖Q ewδ‖, hence is uniformly bounded and the conclusion of the Proposition fol-
lows.
For every L > 0 we fix a smooth function
βL : R → [0, 1]
which vanishes for s ≤ 0, which is constant equal to 1 for s ≥ L and whose
derivative is bounded by 2/L. We moreover require that, for L large enough,
the function βL vanishes for s ≤ 1.
We define the mixing map P . Let
pi : L
p,d(w̃δ∗TŴ ) → imD′
, i = 0, . . . ,m
be the projection on Lp,d((vδi )
∗TŴ ) followed by the projection on imD′
allel to ηi. Recall the definition (59) of ℓi for i = 0, . . . ,m and let
qj : L
p,d(w̃δ∗TŴ ) → imDuδ
qj(x)(s, θ) := yj(s, θ)
+ β1(s− ℓj) ·
(1l− pj)(xj)
(s− ℓj, θ)
+ (1− β1(s− ℓj−1)) ·
(1l− pj−1)(xj−1)
(s− ℓj−1, θ)
Symplectic homology for autonomous Hamiltonians 58
for j = 1, . . . ,m. We define
P : Lp,d(w̃δ∗TŴ ) → imD ewδ
P := pm + qm + pm−1 + . . .+ q1 + p0.
The norm of P is uniformly bounded with respect to δ → 0 since the norm of
each pi is uniformly bounded by 1.
We define now the splitting map
S(η) := x = (xm, ym, . . . , x0).
We recall the definition (58) of the catenation shifts
0 = svm < sum < svm−1 < . . . < su1 < sv0 ,
and set
xm(s, θ) := β1(1/δ − s)η(s, θ),
x0(s, θ) := β1(1/δ + s)η(s+ sv0 , θ),
and, for i = 1, . . . ,m− 1, j = 1, . . . ,m,
yj(s, θ) :=
(1− β1(−R− s))η(s+ suj , θ), s ≤ 0,
(1− β1(−R+ s))η(s+ suj , θ), s ≥ 0,
xi(s, θ) :=
β1((Ti + ǫi)/2δ + s)η(s+ svi , θ), s ≤ 0,
β1((Ti + ǫi)/2δ − s)η(s+ svi , θ), s ≥ 0.
It follows from the definition that the norm of S is uniformly bounded by 1.
We define now the gluing map ζ := G(x̃), x̃ = (x̃m, ỹm, x̃m−1, . . . , x̃0) ∈
T ewδ B̃δ by “slowly interpolating” the components of x̃. For j = 1, . . . ,m, i =
1, . . . ,m− 1 we put
ζ(s, θ) :=
x̃m(s, θ), −∞ < s ≤ 1/δ −R/2,
ỹj(s− suj , θ), suj −R/2 ≤ s ≤ suj +R/2,
x̃i(s− svi , θ), svi − ℓi + 3R/2 ≤ s ≤ svi + ℓi − 3R/2,
x̃0(s− sv0 , θ), sv0 − 1/δ +R/2 ≤ s < +∞.
The above formula leaves out two types of intervals, on which the actual inter-
polation takes place (see Figure 6).
• If svj + ℓj − 3R/2 ≤ s ≤ suj −R/2 (interval of length R), we define
ζ(s, θ) := x̃j(+∞, θ)
+ (1− βR
(s− svj − ℓj +R))
x̃j(s− svj , θ)− x̃j(+∞, θ)
+ (1− βR
(−s+ suj −R))
ỹj(s− suj , θ)− ỹj(−∞, θ)
Symplectic homology for autonomous Hamiltonians 59
R 0 −R−R
x̃j−1 0 −Tj−1+ǫj−12δ
−Tj−1+ǫj−1
Tj+ǫj
Tj+ǫj
Figure 6: The gluing map G.
• If suj +R/2 ≤ s ≤ svj−1 − ℓj−1 + 3R/2 (interval of length R), we define
ζ(s, θ) := x̃j−1(−∞, θ)
+ (1−βR
(−s+svj−1−ℓj−1 +R))
x̃j−1(s− svj−1 , θ)− x̃j−1(−∞, θ)
+(1− βR
(s− suj −R))
ỹj(s− suj , θ)− ỹj(+∞, θ)
The section ζ is indeed of class W 1,p because
ỹj(−∞, θ) = x̃j(+∞, θ), ỹj(+∞, θ) = x̃j−1(−∞, θ).
That the norm of G is uniformly bounded with respect to δ → 0 follows directly
from the definition (68) of the norm on T ewδ B̃δ, as well as from the definition (61)
of the norm ‖ · ‖1,δ on dom(DGδ,ǫ( ew)) (see also Remark 4.14).
Let us now prove the estimate (72). On each of the intervals appearing
in (73) we have (DGδ( ew)Tδη)(s, θ) = η(s, θ) and we are therefore left to examine
intervals of the type [svj + ℓj − 3R/2, suj −R/2] and [suj +R/2, svj−1 − ℓj−1 +
3R/2]. We treat only the first case since the second one is entirely similar.
Upon applying the operator DGδ( ew) to ζ we obtain five types of terms as
following.
• DGδ( ew)x̃j(+∞, θ). Since x̃j(+∞, θ) does not depend on s we can view
DGδ( ew) as a family of operators on S
1. Then we have
‖DGδ( ew)x̃j(+∞, θ)‖δ = ‖(DGδ( ew) −Dvj(+∞,θ))x̃j(+∞, θ)‖δ
≤ ‖DGδ( ew) −Dvj(+∞,θ)‖δ‖x̃j(+∞, θ)‖
≤ C(δ)‖η‖δ.
Symplectic homology for autonomous Hamiltonians 60
Here Dvj(+∞,θ) denotes the linearized operator at the constant cylinder
vj(+∞, θ), the norm ‖x̃j(+∞, θ)‖ is induced from the (1-dimensional)
space V ′vj , and
C(δ) → 0, δ → 0.
This last statement and the last inequality follow from
‖DGδ( ew)−Dvj(+∞,θ)‖δ
≤ C(‖v̂j−vj(+∞, θ)‖L1,p,d([(Tj+ǫj)/2δ−R/2,(Tj+ǫj)/2δ]×S1)
+ ‖ûj−uj(−∞, θ)‖L1,p,d([−R,−R/2]×S1))
and the fact that the intervals of integration migrate to ±∞. The above
inequality makes crucial use of the fact that the weight gδ on the necks is
given by the exponential weight of the ambient spaces Bδ, B′δ. Moreover,
we have ‖x̃j(+∞, θ)‖ ≤ ‖x̃‖ ≤ C‖η‖δ because Q ew, P and S are uniformly
bounded with respect to δ.
• −β′
(s− svj − ℓj +R)
x̃j(s− svj , θ)− x̃j(+∞, θ)
, as well as β′
suj − R))
ỹj(s − suj , θ) − ỹj(−∞, θ)
. The δ-norm of each of these two
terms is bounded by C(δ)‖η‖δ, with C(δ) → 0 as δ → 0. To see this we
first use that |β′
| ≤ 4/R → 0, δ → 0. Secondly we use that ‖x̃j(s −
svj , θ) − x̃j(+∞, θ)‖ ≤ ‖x̃‖ ≤ C‖η‖δ and ‖ỹj(s − suj , θ) − ỹj(−∞, θ)‖ ≤
‖x̃‖ ≤ C‖η‖δ.
• (1 − βR/2(s− svj − ℓj + R))DGδ( ew)
x̃j(s− svj , θ)− x̃j(+∞, θ)
and (1−
βR/2(−s+suj−R))DGδ( ew)
ỹj(s−suj , θ)−ỹj(−∞, θ)
. The parts involving
x̃j(+∞, θ) = ỹj(−∞, θ) are bounded by C(δ)‖η‖δ as above. On the other
hand we write
DGδ( ew)x̃j = (DGδ( ew) −D ewδ)x̃j +D ewδ x̃j
and similarly for DGδ( ew)ỹj. The first term of such a sum is bounded by
C(δ)‖η‖δ as above, with C(δ) → 0, δ → 0. We are left with
(1− βR/2)D ewδ x̃j(s− svj , θ) + (1− βR/2)D ewδ ỹj(s− suj , θ)
(P ◦ S)vjη
(s− svj , θ) +
(P ◦ S)ujη
(s− suj , θ) = η.
Here we denote by (P ◦ S)vj , (P ◦ S)uj the components of P ◦ S in
Lp,d(vδj
TŴ ) and Lp,d(uδj
TŴ) respectively. The first equality uses the
fact that 1− βR/2 ≡ 1 on the support of (P ◦S)vjη and on the support of
(P ◦ S)ujη, as well as D ewδ ◦Q ewδ = 1l.
As a conclusion we have
‖DGδ( ew)Tδη − η‖δ ≤ C(δ)‖η‖δ, C(δ) → 0, δ → 0,
and the estimate (72) holds for δ small enough.
Symplectic homology for autonomous Hamiltonians 61
We shall use the following quantitative form of the implicit function theorem
from McDuff and Salamon [21, A.3.4].
Theorem 4.19. Let X and Y be Banach spaces, U ⊂ X be an open set, and
f : U → Y be a continuously differentiable map. Let x0 ∈ U be such that
D := df(x0) : X → Y is surjective and has a bounded right inverse Q : Y → X.
Choose positive constants ε and c such that ‖Q‖ ≤ c, Bε(x0) ⊂ U , and
‖x− x0‖ < ε =⇒ ‖df(x)−D‖ ≤ 1/2c. (74)
Then, for any x1 ∈ X satisfying
‖f(x1)‖ < ε/4c, ‖x1 − x0‖ < ε/8, (75)
there exists a unique x ∈ X such that
f(x) = 0, x− x1 ∈ imQ, ‖x− x0‖ ≤ ε. (76)
Moreover, ‖x− x1‖ ≤ 2c‖f(x1)‖.
The above theorem will be used within the following setup. Consider an
element [u] ∈ MA(p, q;H, {fγ}, J) and denote u0 := Gδ,ǫ(u). Given ε > 0 we
denote by Bε(0) the ball of radius ε centered at 0 inW
1,p(R×S1, u∗0TŴ ; ‖·‖1,δ),
where ‖ · ‖1,δ is defined by (61). For ζ ∈W 1,p(R× S1, u∗0TŴ ; ‖ · ‖1,δ) we write
ζ = ζ1 +
κjβ(−s+ suj )β(s− suj + 2R)XH
κjβ(s− suj )β(−s+ suj + 2R)XH
κiβ(s− svi + ℓi − 2R)β(−s+ svi + ℓi − 2R)ζi,δ(· − svi , ·)
with ℓi = R+ (Ti + ǫi)/2δ and ζi,δ the generator of ker Dvi whose value at 0 is
the vector field XH along γi. Then
‖ζ‖1,δ = ‖ζ1‖W 1,p(gδ,ǫ) +
(|κj |+ |κj |) +
|κi|.
We denote
ζ̃ := ζ1+
κjβ(−s+suj )β(s−suj+2R)XH+κjβ(s−suj )β(−s+suj+2R)XH
so that ζ̃(svi , ·) is L2-orthogonal to ζi,δ(0, ·). For each i = 1, . . . ,m − 1 we
consider the smooth cutoff function
ρi,δ,ǫ(s) := β(s− svi + ℓi − 2R)β(−s+ svi + ℓi − 2R),
Symplectic homology for autonomous Hamiltonians 62
so that ρi,δ,ǫ vanishes outside [svi − Ti+ǫi2δ , svi +
Ti+ǫi
] and ρi,δ,ǫ ≡ 1 on the
interval [svi − Ti+ǫi2δ + 1, svi +
Ti+ǫi
− 1].
We define ϕζ(u0) : R× S1 → Ŵ by
ϕζ(u0)(s, θ) :=
u0(s, θ), suj −R ≤ s ≤ suj +R,
ρi,δ,ǫ(s)κi
(u0(s, ·))(θ), svi − Ti+ǫi2δ ≤ s ≤ svi +
Ti+ǫi
Note that the last formula can also be written in the chart (ϑ, z) around Sγi as
ϑ ◦ϕζ(u0)(s, θ) = ϑ ◦ϕ
ρi,δ,ǫ(s)κi
(u0(s, 0))+ θ. Given a vector field ξ along u0 we
define the vector field ϕζ∗ξ along ϕζ(u0) by
ϕζ∗ξ(s, θ) :=
ξ(s, θ), suj −R ≤ s ≤ suj +R,
ρi,δ,ǫ(s)κi∗
(u0)ξ(s, θ), svi − Ti+ǫi2δ ≤ s ≤ svi +
Ti+ǫi
We define a map
Φ : Bε(0) → Bδ = B1,p,dδ (γp, γq, A;H, {fγ}) (77)
Φ(ζ) := expϕζ(u0)(ϕζ∗ζ̃).
Since ρi,δ,ǫ is precisely the coefficient of ζi,δ in our splitting for ζ, it follows that
dΦ(0) = Id. Hence, for ε > 0 small enough the map Φ is a diffeomorphism onto
its image, i.e. a chart.
We denote X := W 1,p(R × S1, u∗0TŴ ; ‖ · ‖1,δ), U := Bε(0) ⊂ X , Y :=
Lp(R × S1, u∗0TŴ ; gδ,ǫdsdθ), x0 = 0. For ε > 0 small enough the Banach
bundle E → Bδ can be trivialized over the image of Φ as Bε(0) × Y , and we
denote by f : Bε(0) → Y the section ∂̄Hδ,J ◦Φ read in this trivialization. Then
df(0) = Du0 is surjective and has a right inverse Qδ whose δ-norm is uniformly
bounded with respect to δ → 0 by Proposition 4.18. In order for the hypotheses
of Theorem 4.19 to be satisfied we need to check that (74) holds.
Lemma 4.20. There exists a constant C > 0 independent of δ such that, for
all x ∈ Bε(0), we have
‖df(x)− df(0)‖ ≤ C‖x‖1,δ.
Remark 4.21. The motivation for introducing the chart Φ is that we must
use the “compensated” norm ‖ · ‖1,δ. The lemma would fail if one used the
usual exponential chart ζ 7→ expu0(ζ) instead of Φ, because the estimate for the
expression (82) in the proof below would not hold.
Proof. We need to prove the existence of a uniform constant C > 0 such that
‖D(∂̄Hδ,J ◦ Φ)(x) · ζ −D(∂̄Hδ ,J ◦ Φ)(0) · ζ‖δ ≤ C‖x‖1,δ‖ζ‖1,δ (78)
Symplectic homology for autonomous Hamiltonians 63
for all ζ ∈ X . We recall the decomposition ζ = ζ̃ +
i=1 κiρi,δ,ǫζi,δ, which
satisfies ‖ζ‖1,δ = ‖ζ̃‖1,δ +
i=1 |κi|. It is therefore enough to prove (78) sep-
arately for ζ = ζ̃ and for ζ = ρi,δ,ǫζi,δ, i = 1, . . . ,m− 1. We abbreviate in the
following computations ∂̄ = ∂̄Hδ,J .
We first assume ζ = ρi,δ,ǫζi,δ. Given x = x̃+
j=1 κjρj,δ,ǫζj,δ we have
D(∂̄ ◦ Φ)(x)ζ −D(∂̄ ◦ Φ)(0)ζ
= D(∂̄ ◦ Φ)(x)ζ −D(∂̄ ◦ Φ)(
κjρj,δ,ǫζj,δ)ζ (79)
+D(∂̄ ◦ Φ)(
κjρj,δ,ǫζj,δ)ζ −D(∂̄ ◦ Φ)(0)ζ. (80)
The term (79) is further equal to
∂̄(expϕx+tζ(u0)(ϕx+tζ∗x̃))−
∂̄(expϕx+tζ(u0)(0))
= Dexpϕx(u0)(ϕx∗ex)
·D2 expϕx(u0)(ϕx∗x̃) · ∇tϕx+tζ∗x̃
+ Dexpϕx(u0)(ϕx∗ex)
·D1 expϕx(u0)(ϕx∗x̃) · ρi,δ,ǫ∇fγi(ϕx(u0))
− Dϕx(u0) ·D1 expϕx(u0)(0) · ρi,δ,ǫ∇fγi(ϕx(u0))
= Dexpϕx(u0)(ϕx∗ex)
·D2 expϕx(u0)(ϕx∗x̃) · ∇tϕx+tζ∗x̃ (81)
+ Dexpϕx(u0)(ϕx∗ex)
· (D1 expϕx(u0)(ϕx∗x̃)− T ·D1 expϕx(u0)(0))
·ρi,δ,ǫ∇fγi(ϕx(u0))
+ (Dexpϕx(u0)(ϕx∗ex)
· T −Dϕx(u0)) ·D1 expϕx(u0)(0) · ρi,δ,ǫ∇fγi(ϕx(u0)).
Here T is the parallel transport in Ŵ along the geodesic τ 7→ expϕx(u0)(τϕx∗x̃),
τ ∈ [0, 1], and we have ρi,δ,ǫ∇fγi(ϕx(u0)) = ρi,δ,ǫ(ϕ
ρi,δ,ǫκi)∗ζi,δ.
We study the first term in (81). We have pointwise bounds
|∇tϕx+tζ∗x̃| ≤ C(1 + |κi|)|x̃|,
|∇∇tϕx+tζ∗x̃| ≤ C(1 + |κi|)(|x̃|+ |∇x̃|)
for some universal constant C > 0. In particular
‖∇tϕx+tζ∗x̃‖W 1,p(gδ,ǫ) ≤ C‖x̃‖1,δ
if |κi| ≤ ‖x‖1,δ ≤ ε, with C > 0 a universal constant. On the other hand the
operators D2 expϕx(u0)(ϕx∗x̃) : W
1,p(gδ,ǫ) → W 1,p(gδ,ǫ) and Dexpϕx(u0)(ϕx∗ex) :
W 1,p(gδ,ǫ) → Lp(gδ,ǫ) are uniformly bounded if ‖x‖∞ ≤ C‖x‖1,δ ≤ Cε (we use
here the Sobolev inequality). This implies that the δ-norm of the first term
in (81) is bounded by a constant multiple of ‖x̃‖1,δ.
We now study the second term in (81). Let ‖| · ‖| be the operator norm for
continuous linear maps
W 1,p(ϕx(u0)
∗TŴ ; ‖ · ‖1,δ) →W 1,p(expϕx(u0)(ϕx∗x̃)
∗TŴ ; gδ,ǫdsdθ).
Symplectic homology for autonomous Hamiltonians 64
We claim that ‖|D1 expϕx(u0)(ϕx∗x̃)− T ·D1 expϕx(u0)(0)‖| ≤ C‖x̃‖1,δ for some
uniform constant C > 0, provided ‖x‖1,δ ≤ ε. Indeed, since the metric on Ŵ
varies smoothly, for any ξ = ξ̃ +
ℓ=1 κ
ℓρℓ,δ,ǫ∇fγℓ(ϕx(u0)) we have pointwise
bounds
∣∣(D1 expϕx(u0)(ϕx∗x̃)− T ·D1 expϕx(u0)(0))ξ̃
∣∣ ≤ C|x̃||ξ̃|,
∣∣∇(D1 expϕx(u0)(ϕx∗x̃)− T ·D1 expϕx(u0)(0))ξ̃
∣∣ ≤ C(|∇x̃||ξ̃|+ |x̃||∇ξ̃|),
∣∣(D1 expϕx(u0)(ϕx∗x̃)− T ·D1 expϕx(u0)(0))ρℓ,δ,ǫ∇fγℓ(ϕx(u0))
∣∣ ≤ C|x̃|,
∣∣∇(D1 expϕx(u0)(ϕx∗x̃)−T ·D1 expϕx(u0)(0))ρℓ,δ,ǫ∇fγℓ(ϕx(u0))
∣∣ ≤ C(|x̃|+|∇x̃|).
The claim then follows by integration with respect to the weight gδ,ǫ and by
using the Sobolev inequalities ‖x̃‖L∞ ≤ C‖x̃‖1,δ and ‖ξ̃‖L∞ ≤ C‖ξ̃‖1,δ. On the
other hand, as already seen above, the operator Dexpϕx(u0)(ϕx∗ex)
acting from
the space W 1,p(gδ,ǫ) to L
p(gδ,ǫ) is uniformly bounded for ‖x‖1,δ ≤ ε, since its
coefficients are bounded. We infer that the δ-norm of the second term in (81)
is bounded by a constant multiple of ‖x̃‖1,δ.
We finally study the third term in (81). We claim that ‖Dexpϕx(u0)(ϕx∗ex) ·
T −Dϕx(u0)‖ ≤ C‖x̃‖1,δ for some uniform constant C > 0, provided ‖x‖1,δ ≤ ε.
This follows from the pointwise bounds
∣∣(Dexpϕx(u0)(ϕx∗ex) · T −Dϕx(u0))ξ̃
∣∣ ≤ C|x̃|(|ξ̃|+ |∇ξ̃|),
∣∣(Dexpϕx(u0)(ϕx∗ex) · T −Dϕx(u0))ρℓ,δ,ǫ∇fγℓ(ϕx(u0))
∣∣ ≤ C|x̃|
by integrating with respect to the weight gδ,ǫ and by using the previous Sobolev
inequalities. Since D1 expϕx(u0)(0) = Id, we infer that the δ-norm of the third
term in (81) is bounded by a constant multiple of ‖x̃‖1,δ.
As a conclusion, the δ-norm of the expression in (79) is bounded by a con-
stant multiple of ‖x̃‖1,δ.
We now consider the expression in (80), which can be written as
D(∂̄ ◦ Φ)(κiρi,δ,ǫζi,δ)ζ −D(∂̄ ◦ Φ)(0)ζ (82)
∂̄(ϕκiζ+tζ(u0))−
∂̄(ϕtζ(u0))
∂̄(u0(·+ (κi + t)ρi,δ,ǫ, ·))−
∂̄(u0(·+ tρi,δ,ǫ, ·)).
Each term in the above difference is supported in the intervals [svi − Ti+ǫi2δ , svi −
Ti+ǫi
+1] and [svi+
Ti+ǫi
−1, svi+ Ti+ǫi2δ ]. Moreover, their difference is pointwise
bounded by C|κi| for some uniform constant C > 0. Since the weight gδ,ǫ is
uniformly bounded on the above intervals of length 1, we infer that the δ-norm
of the expression in (80) is bounded by C|κi|, hence by C‖x‖1,δ for some uniform
constant C > 0.
Symplectic homology for autonomous Hamiltonians 65
We now assume ζ = ζ̃ and we again decompose D(∂̄ ◦Φ)(x)ζ −D(∂̄ ◦Φ)(0)ζ
as the sum of the expressions in (79) and (80).
The expression in (79) can be written
∂̄(expϕx(u0)(ϕx∗(x̃ + tζ̃)))−
∂̄(expϕx(u0)(ϕx∗tζ̃))
= Dexpϕx(u0)(ϕx∗ex)
·D2 expϕx(u0)(ϕx∗x̃) · ϕx∗ζ̃
− Dϕx(u0) ·D2 expϕx(u0)(0) · ϕx∗ζ̃
= Dexpϕx(u0)(ϕx∗ex)
· (D2 expϕx(u0)(ϕx∗x̃)− T ·D2 expϕx(u0)(0)) · ϕx∗ζ̃
+ (Dexpϕx(u0)(ϕx∗ex)
· T −Dϕx(u0)) ·D2 expϕx(u0)(0) · ϕx∗ζ̃. (83)
Here T denotes the same parallel transport map as above.
We claim that the δ-norm of the first term in the expression (83) is bounded
by C‖x̃‖1,δ‖ζ̃‖1,δ when ‖x‖1,δ ≤ ε, for some uniform constant C > 0. We have
the pointwise estimates
|ϕx∗ζ̃| ≤ C(1 +
|κj |)|ζ̃|,
|∇ϕx∗ζ̃| ≤ C(1 +
|κj |)(|ζ̃|+ |∇ζ̃|),
which imply ‖ϕx∗ζ̃‖W 1,p(gδ,ǫ) ≤ C‖ζ̃‖1,δ for some uniform constant C > 0,
provided ‖x‖1,δ ≤ ε. On the other hand, the pointwise estimates
∣∣(D2 expϕx(u0)(ϕx∗x̃)− T ·D2 expϕx(u0)(0))ξ
∣∣ ≤ C|x̃||ξ|,
∣∣∇(D2 expϕx(u0)(ϕx∗x̃)− T ·D2 expϕx(u0)(0))ξ
∣∣ ≤ C(|∇x̃||ξ|+ |x̃||∇ξ|)
show that the norm of the operator D2 expϕx(u0)(ϕx∗x̃) − T · D2 expϕx(u0)(0)
acting from W 1,p(gδ,ǫ) to itself is bounded by C‖x̃‖1,δ. Finally, we have already
seen that the operator Dexpϕx(u0)(ϕx∗ex)
acting between W 1,p(gδ,ǫ) and L
p(gδ,ǫ)
is uniformly bounded, and the claim follows.
We now claim that the δ-norm of the second term in the expression (83)
is also bounded by C‖x̃‖1,δ‖ζ̃‖1,δ when ‖x‖1,δ ≤ ε, for some uniform constant
C > 0. We have the pointwise estimate
∣∣(Dexpϕx(u0)(ϕx∗ex) · T −Dϕx(u0))ξ
∣∣ ≤ C|x̃|(|ξ|+ |∇ξ|),
which implies that the norm of the operator Dexpϕx(u0)(ϕx∗ex)
·T −Dϕx(u0) acting
from W 1,p(gδ,ǫ) to L
p(gδ,ǫ) is bounded by C‖x̃‖1,δ for some uniform constant
C > 0. Since ‖ϕx∗ζ̃‖W 1,p(gδ,ǫ) ≤ C‖ζ̃‖1,δ and D2 expϕx(u0)(0) = Id, the claim
follows.
Symplectic homology for autonomous Hamiltonians 66
We finally study the term (80) in the decomposition of D(∂̄ ◦Φ)(x)ζ−D(∂̄ ◦
Φ)(0)ζ, which can be written
∂̄(expϕx(u0) ϕx∗tζ̃)−
∂̄(expu0 tζ̃)
= Dϕx(u0) ·D2 expϕx(u0)(0) · ϕx∗ζ̃ −Du0 ·D2 expu0(0) · ζ̃
= Dϕx(u0) · ϕx∗ζ̃ −Du0 · ζ̃.
This last expression is pointwise bounded by C(
j=1 |κj |)(|ζ̃| + |∇ζ̃|), which
implies that its δ-norm is bounded by C‖x‖1,δ‖ζ̃‖1,δ for some uniform constant
C > 0.
This proves the lemma.
Proposition 4.22. Let [u] ∈ MA(p, q;H, {fγ}, J). There exists δ1 > 0 and a
one-parameter family [uδ] ∈ MA(γp, γq;Hδ, J), 0 < δ < δ1 such that
[uδ] → [u], δ → 0.
Here convergence is understood in the sense of Definition 4.2. Moreover, if
dim MA(p, q;H, {fγ}, J) = 0 then the intermediate gradient fragments in [u]
are nonconstant and the above one-parameter family is unique.
Remark 4.23. The fact that the intermediate gradient fragments in [u] are
nonconstant is the reason why we had to prove the gluing theorem only in the
case where the intermediate lengths of gradient trajectories are strictly positive:
Ti > 0, i = 1, . . . ,m− 1, where m is the number of sublevels in [u].
Proof. We choose a representative u = (cm, um, . . . , u1, c0) of [u] and we apply
Theorem 4.19 in a chart of Bδ as above. By Proposition 4.18 the operator D
admits a right inverse Qδ which is uniformly bounded with respect to δ by
some constant c. By Lemma 4.20 there exists ε > 0 independent of δ such that
condition (74) is satisfied. We set x0 := Gδ(u). By Proposition 4.15 we have
‖f(x0)‖ = 0
and therefore condition (75) is satisfied on some open neighbourhood of x0 if δ
is small enough. Taking x1 := x0 in the statement of Theorem 4.19 provides us
with an element x ∈ X satisfying (76). We set
uδ := x.
Then [uδ] ∈ MA(γp, γq;Hδ, J). Because ‖x − x0‖ ≤ 2c‖f(x0)‖ → 0 and x0 =
Gδ(u) → u by construction, we infer by Proposition 4.16 that [uδ] → [u], δ → 0.
We now assume that the dimension of MA(p, q;H, {fγ}, J) is zero. We have
dimMA(γp, γq;Hδ, J) =
= µ(γp)− µ(γq) + 2〈c1(TW ), A〉 − 1
= µ(γ)− µ(γ) + 2〈c1(TW ), A〉 − 1 + ind(p)− ind(q),
Symplectic homology for autonomous Hamiltonians 67
hence µ(γ)−µ(γ)+2〈c1(TW ), A〉 = 1− ind(p)+ ind(q) ≤ 2. On the other hand
µ(γ) − µ(γ) + 2〈c1(TW ), A〉 =
i=1 µ(γi) − µ(γi−1) + 2〈c1(TW ), Ai〉, where
m ≥ 0 is the number of sublevels of u. Each of the summands is nonnegative
by transversality, and the only possibilities occuring are the following:
(i) each summand is zero, which means that all the Floer trajectories involved
in u are rigid;
(ii) one of the summands is 1 and the others vanish. Since [u] is rigid, the
only nonrigid summand must be u0 or um, while c0, respectively cm have
to be constant.;
(iii) two of the summands are 1, and the others vanish. As above, the nonrigid
summands must be u0 and um, while c0 and cm are constant;
(iv) one of the summands is 2 and the others vanish. Since [u] is rigid we must
have m = 1 and c0, c1 have to be constant.
In each of the cases (i-iii) the intermediate gradient trajectories have to be
nonconstant by transversality of the evaluation maps (34).
Let now [ṽδ] → [u], δ → 0. Since the only possible intermediate gradient
trajectory in [u] is nonconstant, we can apply Proposition 4.16. We obtain
representatives vδ ∈ [ṽδ], v ∈ [u] and functions ǫ = ǫ(δ) = (ǫ1(δ), . . . , ǫm−1(δ))
such that vδ, Gδ,ǫ(v) belong to some ‖ · ‖1,δ,ǫ -chart in Bδ and
‖vδ −Gδ,ǫ(v)‖1,δ,ǫ → 0, δ → 0. (84)
We have to prove that uδ and vδ differ by a shift for δ > 0 sufficiently small.
Let us choose a continuous path vt ∈ M̂A(p, q;H, {fγ}, J), t ∈ [0, 1] with
v0 = u, v1 = v. We denote yt = yt(δ) := Gδ,tǫ(vt) and ‖ · ‖t := ‖ · ‖1,δ,tǫ. Note
that, for each δ > 0, there exists a continuous function Cδ : [0, 1]× [0, 1] → R+
such that
Cδ(t, t′)
‖ · ‖t′ ≤ ‖ · ‖t ≤ Cδ(t, t′)‖ · ‖t′ , t, t′ ∈ [0, 1]
satisfying Cδ(t, t) = 1, t ∈ [0, 1].
As yt(δ) and Dyt(δ) vary continuously with t and δ, we can choose ε > 0 and
c > 0 so that the hypotheses of the implicit function theorem 4.19 are satisfied
for each yt(δ), t ∈ [0, 1], 0 < δ < δ1 and some suitable constant δ1 > 0. After
further shrinking δ1 we can also assume that ‖f(yt)‖t < ε/4c for all t ∈ [0, 1]
and 0 < δ < δ1. Finally, in view of (84), for some smaller δ1 we can achieve
‖vδ − y1(δ)‖1 ≤ ε.
We define
Iδ := {t ∈ [0, 1] : ∃x ∈ [vδ], ‖x− yt(δ)‖t ≤ ε}, 0 < δ < δ1.
We prove that Iδ = [0, 1] by showing that it is a nonempty open and closed
subset of [0, 1]. Note that Iδ is nonempty since 1 ∈ Iδ. We prove now that Iδ
Symplectic homology for autonomous Hamiltonians 68
is closed. Assume that tn ∈ Iδ is such that tn → t. Let xn ∈ [vδ] be such that
‖xn−ytn(δ)‖tn ≤ ε. By the triangular inequality we see that ‖xn−yt(δ)‖t stays
bounded, hence the sequence of shifts defining xn is also bounded and, up to a
subsequence, we may assume that xn → x ∈ [vδ]. Then
‖xn − yt(δ)‖t ≤ Cδ(t, tn)‖xn − yt(δ)‖tn
≤ Cδ(t, tn)(‖xn − ytn(δ)‖tn + ‖ytn(δ)− yt(δ)‖tn)
≤ Cδ(t, tn)(ε+ ‖ytn(δ)− yt(δ)‖tn).
We pass to the limit n→ ∞ and obtain ‖x−yt(δ)‖t ≤ ε, hence t ∈ Iδ. We prove
now that Iδ is open. Let t ∈ Iδ and choose an open interval J containing t such
that Cδ(t
′, t) < 2 and ‖yt′(δ)−yt(δ)‖t < ε/8 for all t′ ∈ J . Theorem 4.19 applied
to x0 := yt(δ) and x1 := yt′(δ) yields x such that f(x) = 0 and ‖x−yt(δ)‖t ≤ ε.
The uniqueness statement in the implicit function theorem ensures that the
intersection of the space of solutions with the ‖ · ‖t-ball of radius ε centered
at x0 is a graph over ker D. Since dim ker D = 1 and since translation in
the s-variable already provides a 1-parameter family of solutions, we infer that
x ∈ [vδ]. Moreover, the last statement in Theorem 4.19 gives ‖x−yt′(δ)‖t ≤ ε/2.
Then ‖x− yt′(δ)‖t′ ≤ Cδ(t′, t)ε/2 < ε, so that t′ ∈ Iδ and Iδ is open.
The upshot is that there exists x ∈ [vδ] such that ‖x−y0(δ)‖0 ≤ ε, 0 < δ < δ1.
But y0(δ) = Gδ(u) and, again by the uniqueness statement in the implicit
function theorem, we get that x and uδ differ by a shift. Hence [uδ] = [vδ].
Proof of Theorem 3.7. We first prove (i) and show the existence of δ1. As-
sume by contradiction that there exists a sequence δn → 0 and Floer trajec-
tories vn ∈ M̂A(γp, γq;Hδn , J) such that J is not regular for vn. By Propo-
sition 4.7 we may assume, up to shifting and passing to a subsequence, that
vn → u ∈ M̂A(p, q;H, {fγ}, J). As seen in the proof of Proposition 4.22, the
limit u has nonconstant intermediate gradient trajectories since J is regular for
u. We can therefore apply Proposition 4.16 and get parameters ǫn and vector
fields ζn such that vn = expGδn,ǫn (u)
(ζn) and ‖ζn‖1,δn,ǫn → 0. By Proposi-
tion 4.18 the operator DGδn,ǫn (u) is surjective and admits a right inverse which
is uniformly bounded with respect to δ. We infer that the operator Dvn is also
surjective for n large enough, a contradiction.
Let us prove (ii). Let (δ, vδ) ∈ M̂A]0,δ1[(γp, γq;H, {fγ}, J) and let I(δ) ⊂]0, δ1[
be a small relatively compact open interval containing δ. Since the norms ‖·‖1,δ′
are equivalent for δ′ ∈ I(δ), the space BI(δ) :=
δ′∈I(δ){δ′} × Bδ′ is a Banach
manifold. Similarly, there is a Banach vector bundle EI(δ) → BI(δ) endowed with
an obvious section ∂̄HI(δ),J whose restriction to Bδ′ is ∂̄Hδ′ ,J . The restriction
of its linearization D(δ,vδ) at (δ, vδ) to TvδBδ is the surjective operator Dvδ
of index 1, hence D(δ,vδ) is surjective and has index 2. Therefore ker D(δ,vδ)
projects surjectively onto TδI(δ) = R and the projection in (ii) is a submersion.
Symplectic homology for autonomous Hamiltonians 69
We now prove (iii). Let us note that, by Proposition 4.22, we have a map
MA(p, q;H, {fγ}, J) → π0(MA]0,δ1[(γp, γp;H, {fγ}, J)),
[u] 7→ C[u] :=
δ∈]0,δ1[
{(δ, [uδ])},
where [uδ] is the uniquely defined one-parameter family of Proposition 4.22 such
that [uδ] → [u]. This map is injective because the limit of such a family [uδ]
as δ → 0 is unique. In order to prove surjectivity, let C = {(δ, [vδ])} be a
connected component of MA
]0,δ1[
(γp, γp;H, {fγ}, J). By Proposition 4.7 there
exists a sequence δn → 0 and [u] ∈ MA(p, q;H, {fγ}, J) such that [vδn ] → [u].
By the uniqueness statement in Proposition 4.22 we get that C = C[u].
4.4 Coherent orientations
The structure of this section is as follows. We first present the construction
of coherent orientations in the usual Floer setting for (Hδ, J) by adopting the
point of view of [5]. We construct coherent orientations on the moduli spaces of
Morse-Bott trajectories, out of which we get orientations on the space of Morse-
Bott trajectories with gradient fragments. Finally, we prove Proposition 3.9.
We denote S1 := R/Z and, for a path of symmetric matrices S : S1 →
M2n(R), we denote by ΨS the unique solution of the Cauchy problem
Ψ̇(θ) = J0S(θ)Ψ(θ), Ψ(0) = 1l, θ ∈ [0, 1], (85)
where J0 is the standard complex structure on R
2n. Then ΨS is a path of
symplectic matrices and we denote
S := {S : S1 →M2n(R) : tS = S and det(1l−ΨS(1)) 6= 0}.
Let us denote by E a symplectic vector bundle of rank 2n over CP 1, or
R×S1, or C, with fixed trivializations in neighbourhoods of infinity in the case
of R× S1 and C. We denote by
O(CP 1, E)
the space of linear operators D : W 1,p(CP 1, E) → Lp(CP 1,Λ0,1E) of the form
(∂x + J0∂y + S(z))dz̄ in a local trivialization of E, where z = x + iy is a local
coordinate on CP 1. Given S, S ∈ S we denote by
O(R× S1, E;S, S)
the space of linear operators D : W 1,p(R × S1, E) → Lp(R × S1,Λ0,1E) of the
form (∂s + J0∂θ + S(s, θ))(ds − idθ) in a local trivialization of E, such that
lims→−∞ S(s, ·) = S and lims→∞ S(s, ·) = S in the given trivializations of E.
Given S0 ∈ S we denote by
O±(C, E;S0)
Symplectic homology for autonomous Hamiltonians 70
O+ O+
O(R× S1)O−O+O(CP1)
Figure 7: The four possibilities of gluing (O = O(R × S1)).
the space of linear operators D :W 1,p(C, E) → Lp(C,Λ0,1E) of the form (∂x +
J0∂y + S(z))dz̄ in a local trivialization of E and such that, when expressed in
holomorphic cylindrical coordinates (s, θ) with e±2π(s+iθ) = z as (∂s + J0∂θ +
S(s, θ))(ds − idθ), we have lims→±∞ S(s, θ) = S0(θ) in the given trivialization
of E. Intuitively, the space O+ corresponds to the sphere with one positive
puncture, while O− corresponds to the sphere with one negative puncture.
It is a standard fact in the literature that each of the above spaces O is
contractible and consists of Fredholm operators. Moreover, they each come
equipped with a canonical real line bundle Det(O) whose fiber atD is Det(D) :=
(Λmax ker D)⊗ (ΛmaxcokerD)∗. Each of the bundles Det(O) is trivial since the
base is contractible.
We now define gluing operations between elements of the above spaces (see
Figure 7). Let K ∈ O+(C, E;S0) or K ∈ O(R × S1, E;S, S0), and L ∈
O−(C, F ;S0) or L ∈ O(R × S1, F ;S0, S). Let us choose a cutoff function
β : R → [0, 1] such that β(s) = 0 if s ≤ 0 and β(s) = 1 if s ≥ 1. Given
R > 0 large we define operators KR and LR by replacing S in the asymptotic
expressions ofK and L by S0+β(R−s)(S−S0) and S0+β(R+s)(S−S0) respec-
tively. We cut out semi-infinite cylinders {s > R} from the base of E, {s < −R}
from the base of F , then identify their boundaries using the coordinate θ. We
glue the vector bundles E and F using their given trivializations near infinity
and denote the resulting vector bundle by E#F . We define K#RL by concate-
nating KR and LR, so that K#RL belongs to one of the spaces O(CP 1, E#F ),
O+(C, E#F ;S), O−(C, E#F ;S), or O(R× S1, E#F ;S, S).
Following [5, Corollary 7], for R large enough there is a natural isomor-
phism Det(K)⊗Det(L) ∼→ Det(K#RL) defined up to homotopy. In particular,
given orientations oK of Det(K) and oL of Det(L), we induce a canonical ori-
entation oK#oL of Det(K#RL). Moreover, this operation on orientations is
associative [13, Theorem 10].
We describe now, following [5], a procedure for constructing orientations on
Symplectic homology for autonomous Hamiltonians 71
the spaces O(R × S1, E;S, S) which are coherent with respect to the gluing
operation, in the sense of [13, Definition 11]. We denote by θn a trivial sym-
plectic vector bundle of rank 2n. We first note that each determinant bundle
Det(O(CP 1, E)) is naturally oriented since O(CP 1, E) contains the connected
space of complex linear operators and the latter have kernels and cokernels
which are canonically oriented as complex vector spaces. We now choose arbi-
trary orientations of the determinant bundles Det(O+(C, E;S0)) such that the
trivialization of E at infinity extends to C.
Remark 4.24. Note that, if S0 commutes with J0, the set of C-linear operators
in O+(C, E;S0) forms a nonempty convex set, hence Det(O+(C, E;S0)) has a
canonical orientation.
We induce orientations on the determinant bundles Det(O−(C, E;S0)) such that
the trivialization of E at infinity extends to C by requiring that the orientation
induced by gluing on Det(O(CP 1, θn)) is the canonical one. Finally, we induce
orientations on Det(O(R×S1, E;S, S)) by requiring that the orientation induced
on Det(O(CP 1, θn#E#θn)) by the gluing operation
O+(C, θn;S)×O(R× S1, E;S, S)×O−(C, θn;S) → O(CP 1, θn#E#θn)
is the canonical one. It is proved in [5] that this defines a system of coherent
orientations.
The general procedure for inducing orientations of the spaces of Floer trajec-
tories M̂A(γp, γq;Hδ, J) out of a system of coherent orientations goes as follows.
Let Ψp, Ψq denote the linearizations of the Hamiltonian flow of Hδ along γp, γq
in their fixed respective trivializations and let Sp, Sq ∈ S be the corresponding
paths of symmetric matrices as in (85). Let E be a symplectic vector bundle
over R × S1 with fixed trivializations at infinity and relative first Chern class
equal to 〈c1(TŴ ), A〉. For each u ∈ M̂A(γp, γq;Hδ, J) there is an isomorphism
of symplectic vector bundles Φu : u
∼→ E, chosen to depend continuously
on u. There is a map
M̂A(γp, γq;Hδ, J) → O(R× S
1, E;Sp, Sq), u 7→ Φu ◦ D̃u ◦Φ−1u ,
where D̃u has the same analytical expression as the linearized operator Du :
W 1,p(R×S1, u∗TŴ ; ed|s|ds dθ)⊕V u⊕V u → Lp(R×S1, u∗TŴ ; ed|s|ds dθ) con-
sidered in Section 4.3. Under the assumption that J is regular and because of
elliptic regularity, the operators D̃u and Du have the same kernel, consisting of
smooth elements. Hence their determinant lines are naturally isomorphic. It fol-
lows that the pull-back of Det(O) under the above map is naturally isomorphic
to ΛmaxTM̂ = Λmax ker Du, and we get an orientation on M̂.
If the dimension is one, the space M̂ has a canonical orientation given at each
point u by the vector field ∂su. Comparing this with the orientation constructed
above associates to each connected component [u] of M̂ a sign ǫ(u).
Symplectic homology for autonomous Hamiltonians 72
Lemma 4.25. Let S1 ∈ S and define Sm(θ) := S1(mθ). Assume Sm ∈ S and
define an automorphism φm of O+(C, E;Sm) by conjugation with the map z 7→
e2iπ/mz. Then φm is orientation reversing for Det(O+(C, E;Sm)) if and only if
m is even and the difference of Conley-Zehnder indices µCZ(ΨSm)− µCZ(ΨS1)
is odd.
Proof. We start by explaining how φm acts on the orientations of the deter-
minant bundle. The operators K ∈ O+(C, E;Sm) which are invariant under
conjugation by φm, i.e. K(ζ ◦φm) ◦φ−1m = K(ζ) for all ζ, form a convex and, in
particular, connected set. Since φm acts on ker K and cokerK, it also acts on
Det(K) and the induced action on orientations extends to Det(O+(C, E;Sm)).
There is a bijective correspondence between operators K1 ∈ O+(C, E;S1)
and operators K ∈ O+(C, E;Sm) which are invariant under conjugation by φm,
in which case the pull-back of ker K1 under z 7→ zm is the 1-eigenspace of φm
acting on ker K. Since ker K splits as a direct sum of eigenspaces corresponding
to them-th roots of unity and since imaginary roots give rise to even-dimensional
eigenspaces, we infer that the dimension of the −1-eigenspace has the parity of
dimker K − dimker K1. This fact is relevant in our situation since φm reverses
the orientation of ker K if and only if this dimension is odd. Similarly, φm
reverses the orientation of cokerK if and only if dim cokerK − dim cokerK1
is odd. As a conclusion, φm reverses the orientation of Det(K) if and only if
ind(K)− ind(K1) = µCZ(ΨSm)− µCZ(ΨS1) is odd. This can happen of course
only if −1 is an m-th root of unity, i.e. m is even.
Remark 4.26. The proof of Lemma 4.25 shows that, if m is odd, the difference
of Conley-Zehnder indices is automatically even.
Lemma 4.27. Let S1 ∈ S, define Sm(θ) := S1(mθ) and assume Sm ∈ S. Let
T ∈ O(R× S1, θn;Sm, Sm) be an element of the form
T := ∂s + J0∂θ + Sm(θ − β(s)/m),
with β : R → [0, 1] a smooth function satisfying β(s) = 0 near −∞, β(s) = 1
near +∞ and with derivative uniformly bounded by some small constant c. We
denote by O one of the spaces O+(C, E;Sm) or O(R × S1, E;S, Sm), S ∈ S.
The family ψ = {ψR}, R > 0 of automorphisms of O defined by
ψR(D) := D#RT
induces an action on the orientations of Det(O) which is reversing if and only if
m is even and the difference of Conley-Zehnder indices µCZ(ΨSm)− µCZ(ΨS1)
is odd.
Proof. Note that T is an isomorphism if c is small enough, by the same argument
as the one for D′′u in the proof of Proposition 4.9.
We now explain what is the action of ψ on the orientations of Det(O).
Let D ∈ O and let V ⊂ Lp be a finite dimensional vector space spanned by
Symplectic homology for autonomous Hamiltonians 73
smooth sections with compact support, such that V + imD = Lp. We define
the stabilization of D by V as
DV : V ⊕W 1,p → Lp, (v, ζ) 7→ v +Dζ.
ThenDV is a surjective Fredholm operator and there is a canonical isomorphism
Det(D) ≃ Λmax ker DV ⊗ ΛmaxV ∗. For R large enough the glued operator
DR = D
V #RT : V ⊕W 1,p → Lp is surjective with a uniformly bounded right
inverse QR, and moreover the projection onto ker DR given by 1l − QRDR is
an isomorphism when restricted to ker DV (see [5, Corollary 6], as well as [13,
Proposition 9] for a slightly different setup). Since DV #RT = (D#RT )
V , this
induces a natural isomorphism between Det(D) and Det(ψR(D)).
The gluing of orientations is associative, hence it is enough to prove the
statement for O = O+(C, θn;Sm). We claim that the action induced by ψ is
the same as the one induced by φm in Lemma 4.25. Let us choose D ∈ O which
is s-independent for s large enough, and let DV be a surjective stabilization.
We construct a continuous path in O from ψR(DV ) := ψR(D)V to φm(DV ) :=
φm(D)
V as follows. Let DVt be the conjugation of D
V by rt : C → C, z 7→
e−2iπt/mz, and let Tt be the operator ∂s + J0∂θ + Sm(θ − (t+ (1− t)β(s))/m).
Then DVt #RTt interpolates between ψR(D
V ) and φm(D
V ) as t varies from 0 to
1. This is a path of surjective operators admitting a continuous family of right
inverses Qt. Given a basis (ζ1, . . . , ζk) of ker D
V , a basis of ker DVt is given by
(ζ1, . . . , ζk) ◦ rt. By projecting along imQt we obtain a basis of ker DVt #RTt.
For t = 1, since DV1 = D
1 #RT1, the elements ζi ◦ r1 are preserved by the
projection and form a basis of φ−1m (D
V ) which is exactly the one giving the
action of φ−1m (or φm) on orientations, as explained in Lemma 4.25.
Lemma 4.28. Let γ ∈ P≤αλ and γp, γq be the orbits corresponding to the
minimum p and maximum q of fγ respectively. For δ > 0 small enough, the
moduli space MA(γp, γq;Hδ, J) is empty if A 6= 0, while for A = 0 it consists
of exactly two elements u1, u2 corresponding to the two gradient trajectories of
fγ running from p to q. Moreover, they satisfy
ǫ(u1) + ǫ(u2) =
0, if γ is a good orbit,
±2, if γ is a bad orbit.
Proof. Let c1, c2 be the gradient trajectories of fγ running from p to q. By
Theorem 3.7, for δ > 0 small enough each element [uδ] ∈ MA(γp, γq;Hδ, J)
corresponds to a unique Floer trajectory with gradient fragments [u] whose
endpoints are p and q. For energy reasons there can be no nonconstant Floer
trajectory involved in [u] and therefore [u] is either c1 or c2. Since the cylinders
u1 and u2 of the form uδ,γ,−∞,+∞ associated to c1 and c2 are already Floer
trajectories forHδ, we infer that [uδ] equals either [u1] or [u2], and the homology
class A is necessarily zero. Let us introduce the notation ǫ(γ) := 1 if γ is a good
orbit and ǫ(γ) := −1 if γ is a bad orbit. The conclusion of the Lemma is
equivalent to the relation
ǫ(u1) = −ǫ(γ)ǫ(u2). (86)
Symplectic homology for autonomous Hamiltonians 74
Let us choose a symplectic trivialization Φγ : TŴ |Sγ → Sγ×(R×R2n−1) such
that Φγ(XH) = (1, 0). We assume without loss of generality that ċ1 is a positive
multiple of XH , so that Φγ(∂su1) = (f1, 0) with f1 > 0 and Φγ(∂su2) = (f2, 0)
with f2 < 0. We denote by Du1 , Du2 the elements of O(R× S1, θn;Sp, Sq) ob-
tained by conjugation of D̃u1 , D̃u2 with Φγ . The main point is to consider the
operator ψ(Du1) = Du1#RT , with T as in Lemma 4.27. A basis of Det(Dui)
corresponding to the coherent orientation is by definition ǫ(ui)(fi, 0), i = 1, 2.
The image of this basis under the action of ψ is given by ǫ(u1)(f
1 , 0), for some
1 ∈ W 1,p(R× S1,R) with ‖f
1 − f1‖1,p arbitrarily small with R → ∞, hence
1 > 0 for R large enough. By Lemma 4.27, a basis of Det(Du1#RT ) corre-
sponding to the coherent orientation is ǫ(γ)ǫ(u1)(f
1 , 0). Finally, the operators
Du1#RT and Du2 can be connected by a continuous path of operators Dt,
t ∈ [0, 1] satisfying properties (ii)-(iv) in the proof of Proposition 4.9, as well as
the following weaker form of property (i) therein.
(i’) there exists a smooth path c : R → Sγ with c(±∞) being fixed critical
points of fγ , such that ‖S(s, θ)− S(θ+ ϑ ◦ c(s)− ϑ ◦ c(−∞))‖ is bounded
by a constant multiple of δ.
The connected components of the set of operators satisfying (i’) and (ii)-(iv)
are indexed by homotopy classes of paths c as above. Gluing Du1 to T has
precisely the effect of concatenating c1 with (γq|[0,1/m])−1, which is homotopic
to c2. The proof of Proposition 4.9 works the same with the weaker assumption
(i’) and shows that the operators Dt are surjective and that ker Dt is generated
by an element of the form (ft, 0), where ft ∈ W 1,p(R×S1,R) has constant sign
for t ∈ [0, 1]. We conclude that ǫ(u1)ǫ(γ)f#1 and ǫ(u2)f2 have the same sign,
hence (86) is proved.
We now generalize the construction of coherent orientations to the moduli
spaces of Morse-Bott trajectories with gradient fragments. We define S̃ to be
the space of loops of symmetric matrices S : S1 → M2n(R) such that the
symplectic matrix ΨS(1) defined by (85) has exactly one eigenvalue equal to 1,
corresponding to the eigenspace R⊕0 ⊂ R⊕R2n−1 = R2n. Let β : R → [0, 1] be
a smooth function equal to 0 near −∞ and equal to 1 near +∞. We define V ,
V to be the one-dimensional real vector spaces generated by the vector-valued
functions (1−β(s))(1, 0) and β(s)(1, 0) respectively. In the following we denote
by W 1,p,d = W 1,p(ed|s|ds dθ), Lp,d = Lp(ed|s|ds dθ). Given S, S ∈ S̃ we denote
Õ(R× S1, E;S, S)
the space of linear operators D : W 1,p,d(R × S1, E) ⊕ V ⊕ V → Lp,d(R ×
S1,Λ0,1E) of the form (∂s + J0∂θ + S(s, θ))(ds− idθ) in a local trivialization of
E, for which there exist θ, θ ∈ R/Z such that lims→−∞ S(s, θ) = S(θ + θ) and
lims→∞ S(s, θ) = S(θ + θ) in the given trivializations at infinity of E. Given
S0 ∈ S, S ∈ S̃ we denote by
Õu(R× S1, E;S0, S)
Symplectic homology for autonomous Hamiltonians 75
the space of linear operators
D :W 1,p(R× S1, E; g+(s)ds dθ) ⊕ V → Lp(R× S1,Λ0,1E; g+(s)ds dθ)
with g+(s) := max(1, e
ds), which are of the form (∂s + J0∂θ + S(s, θ))(ds −
idθ) in a local trivialization of E, and for which there exists θ ∈ R/Z such
that lims→−∞ S(s, θ) = S0(θ) and lims→∞ S(s, θ) = S(θ + θ) in the given
trivializations at infinity of E. Given S ∈ S̃, S0 ∈ S we denote by
Õs(R× S1, E;S, S0)
the space of linear operators
D :W 1,p(R× S1, E; g−(s)ds dθ) ⊕ V → Lp(R× S1,Λ0,1E; g−(s)ds dθ)
with g−(s) := max(1, e
−ds), which are of the form (∂s + J0∂θ + S(s, θ))(ds −
idθ) in a local trivialization of E, and for which there exists θ ∈ R/Z such
that lims→−∞ S(s, θ) = S(θ + θ) and lims→∞ S(s, θ) = S0(θ) in the given
trivializations at infinity of E. Given S̃ ∈ S̃ we denote by
Õ±(C, E; S̃)
the space of linear operators D : W 1,p,d(C, E) ⊕ V± → Lp,d(C,Λ0,1E) of the
form (∂x + J0∂y + S(z))dz̄ in a local trivialization of E and such that, when
expressed in holomorphic cylindrical coordinates (s, θ) with e±2π(s+iθ) = z as
(∂s+J0∂θ+S(s, θ))(ds− idθ), there exists θ± ∈ R/Z so that lims→±∞ S(s, θ) =
S̃(θ+θ±) in the given trivialization of E near infinity. Here we use the notation
V+ := V and V− := V .
Due to the exponential weights, each of the above spaces Õ consists of Fred-
holm operators and comes equipped with a canonical real line bundle Det(Õ)
whose fiber at D is Det(D). Unlike in the nondegenerate case, the spaces Õ
are not generally contractible, hence we have to investigate the orientability of
Det(Õ).
Given S ∈ S̃ we define m = m(S) to be the maximal positive integer such
that S(θ + 1/m) = S(θ), θ ∈ R/Z. The number m is infinite if and only if the
loop S is constant, in which case the spaces Õ±(C, E;S), Õu(R×S1, E;S0, S),
Õs(R×S1, E;S, S0) are contractible. In the following we shall restrict ourselves
to nonconstant loops S ∈ S̃, in which case the above spaces have the homotopy
type of S1, while Õ(R× S1, E;S, S) has the homotopy type of S1 × S1 (this is
because they fiber over S1, respectively S1 × S1 with contractible fibers). We
denote by S1 ∈ S̃ the unique loop such that S(θ) = S1(mθ).
Lemma 4.29. Let S ∈ S̃ be nonconstant. Then Det(Õ±(C, E;S)) is nonori-
entable if and only if m is even and µRS(S)− µRS(S1) is odd.
Proof. We prove the statement only for Õ+ := Õ±(C, E;S), the proof of the
other case being similar. The following two remarks will allow us to apply
Symplectic homology for autonomous Hamiltonians 76
Lemma 4.25. First, Det(D) is naturally isomorphic to Det(D|W 1,p,d) ⊗ V+ and
V+ is a trivial bundle over Õ+. Second, the operator D|W 1,p,d is conjugated to
an operator D̃ ∈ O+(C, E;S− dp1l). Hence it is enough to study the orientability
of the bundle D̃et(Õ+) over Õ+ with fiber Det(D̃).
The bundle D̃et(Õ+) is orientable if and only if its restriction to a loop
generating π1(Õ) is orientable. After choosing D ∈ Õ+ which is invariant
under conjugation with z 7→ e−2iπ/m, the conjugation of D by rt : C → C,
z 7→ e−2iπt/mz provides such a loop Dt, t ∈ [0, 1] with D0 = D1 = D. The
orientation on Det(D̃1) obtained by continuation along the path Dt from an
orientation on Det(D̃0) is the same as the one induced by the action of φ
m (or
φm) in Lemma 4.25. Since µCZ(S − dp1l) = µRS(S)− 1/2 and µCZ(S1 −
1l) =
µRS(S1)− 1/2, the statement follows from Lemma 4.25.
The same kind of argument gives the following result.
Lemma 4.30. Let S, S, S ∈ S̃ be nonconstant and S0, S0 ∈ S. The line bundles
Det(Õ(R× S1, E;S, S)), Det(Õu(R× S1, E;S0, S)), Det(Õs(R× S1, E;S, S0))
are nonorientable if and only if the condition in Lemma 4.29 holds for S and
for one of S, S. �
The previous results motivate the following definition. We denote
S̃good := {S ∈ S̃ : S constant or µRS(S)− µRS(S1) is even},
S̃bad := {S ∈ S̃ : S nonconstant and µRS(S)− µRS(S1) is odd},
so that S̃bad = S̃ \ S̃good. Although the determinant lines over the various
spaces Õ are nonorientable if one of the asymptotes is in S̃bad, we can construct
covers
O of Õ over which the determinant lines become orientable. Let S0 ∈ S
and S ∈ S̃bad with S(θ) = S1(mθ), θ ∈ R/Z. We define
Os(R× S1, E;S, S0) to
consist of pairs (D, θ) such that θ ∈ R/ 2
Z, D = (∂s+J0∂θ+S(s, θ))(ds−idθ) ∈
Õs(R× S1, E;S, S0) with lims→−∞ S(s, θ) = S(θ + θ). The obvious projection
Os → Õs is a double cover and the lift of the determinant bundle to
Os is
orientable. We define in a completely analogous manner double covers
O±(S) →
Õ±(S),
Ou(S0, S) → Õu(S0, S), S ∈ S̃bad and a cover
O(S, S) → Õ(S, S) which
is double if exactly one of S, S is in S̃bad, and quadruple if both S, S are in S̃bad.
We define now gluing operations between elements of the various spaces Õ.
Let K in Õ+(C, E;S), Õ(R × S1, E;S, S), or Õu(R × S1, E;S0, S), and L in
Õ−(C, F ;S), Õ(R × S1, F ;S, S), or Õs(R × S1, F ;S, S0). We denote by SK ,
respectively SL the matrix valued functions involved in K and L near infinity.
We assume that
SK(s, ·) = lim
SL(s, ·) = S(·+ θ0) =: Sθ0
Symplectic homology for autonomous Hamiltonians 77
for some θ0 ∈ R/Z. We choose a cutoff function β : R → [0, 1] such that
β(s) = 0 if s ≤ 0 and β(s) = 1 if s ≥ 1. Given R > 0 large we define
operators KR and LR by replacing SK and SL by Sθ0 +β(R− s)(SK −Sθ0) and
Sθ0+β(R+s)(SL−Sθ0) respectively. We cut out semi-infinite cylinders {s > R}
from the base of E, {s < −R} from the base of F , then identify their boundaries
using the coordinate θ. We glue the vector bundles E and F using their given
trivializations near infinity and denote the resulting vector bundle by E#F . We
define K#RL by concatenating KR and LR, so that K#RL belongs to one of
the spaces O(CP 1), Õ+(C;S), O+(C;S0), or Õ−(R × S1;S), Õ(R × S1;S, S),
Õs(R × S1;S, S0), or O−(C;S0), Õu(R × S1;S0, S), O(R × S1;S0, S0), where
we have omitted the symbol E#F from the notation.
The above gluing operations admit a straightforward extension to the spaces
O. For example, two elements (K, θ) ∈
Ou(S0, S), (L, θ) ∈
Os(S, S0) can be
glued if θ = θ, in which case they give rise to an element K#RL ∈ O(S0, S0).
Recall that the domain of an operator D in some Õ contains a canoni-
cally oriented 1-dimensional summand for each asymptote in S̃, together with
a canonical isomorphism with R. We denote by VK , VL the summands corre-
sponding to the asymptote S of K and L respectively, and we let V := VK⊕RVL
be their (canonically oriented) fibered sum. By [5, Corollary 6], for R > 0 large
enough there is a natural isomorphism Det(K ⊕R L) ≃ Det(K#RL) defined up
to homotopy, where K ⊕R L is the restriction of K ⊕ L to the fibered sum of
their domains. Since V is canonically oriented, it follows that Det(K ⊕R L) is
canonically isomorphic to Det(K ⊕ L) ≃ Det(K) ⊗ Det(L). Hence we obtain
a canonical isomorphism Det(K) ⊗ Det(L) ∼→ Det(K#RL) defined up to ho-
motopy, and inducing an associative gluing operation for orientations. Similar
considerations apply to the elements of the spaces
Remark 4.31. We can construct a system of coherent orientations on the
determinant line bundles Det(Õ±(C, E;S)) and Det(Õ(R × S1, E;S, S)) with
S, S, S ∈ S̃good by the same procedure as for the spaces O. We can moreover
extend this to a system of coherent orientations involving all spaces O, Õ and
O. Nevertheless, if we want that certain orientations have a geometric meaning,
we have to impose compatibility conditions which seem ad-hoc in such a general
setup. This is why we restrict ourselves in the sequel to the spaces O, Õ and
which are relevant for our geometric situation.
We use now the notations of Section 3. Given γ ∈ P(H) we denote by
Ψγ the linearization of the Hamiltonian flow along γ given by (21) and let
Sγ : R/Z → M2n(R) be the corresponding loop of symmetric matrices defined
by Ψ̇γ = J0SγΨγ . Then Sγ ∈ S̃good if and only if γ is a good orbit. We similarly
define Sγq for each γq ∈ P(Hδ), with q ∈ Crit(fγ). For γ ∈ P(H), γq ∈ P(Hδ)
we denote Õs(R× S1, E; γ, γ
) := Õs(R× S1, E;Sγ , Sγ
) etc.
Convention. In what follows the spaces Õ will be understood to be indexed
Symplectic homology for autonomous Hamiltonians 78
only by good orbits, whereas if one of the asymptotic orbits is bad we use the
corresponding double or quadruple cover
We construct orientations on the determinant bundles over all spaces O, Õ,
O indexed by the elements of P(H) and P(Hδ) as follows. We start by choosing
arbitrary orientations of Det(Õ+(C, E; γ)), respectively Det(
O+(C, E; γ)), γ ∈
P(H) such that the trivialization of E at infinity extends to C. We then choose
orientations of Det(Õs(R×S1, E; γ, γq)), respectively Det(
Os(R×S1, E; γ, γq)),
γ ∈ P(H), q ∈ Crit(fγ) such that the trivializations of E at infinity extend
to R × S1, as follows. If γ is good, the space Õs(R × S1, θn; γ, γq) contains
a distinguished family of operators of the form Φγ ◦ Du ◦ Φ−1γ , where u =
uδ,γ,−1,∞ is the cylinder corresponding to a semi-infinite gradient trajectory
ending at q and Φγ : TŴ |Sγ → Sγ × R2n is a fixed trivialization satisfying
Φγ(XH) = (1, 0) ∈ R⊕R2n−1. This family is naturally parametrized by W s(q),
hence it is connected. As seen in Proposition 4.9 the above Fredholm operators
are surjective and have index 1 − ind(q). If the index is zero we choose the
orientation sign to be +1. If the index is one the kernel is generated by a
nonvanishing section of the form (f, 0), hence is canonically isomorphic to R⊕0
and therefore admits a canonical orientation. If γ is bad, we choose in an
arbitrary way a lift of the operator Φγ ◦Du ◦ Φ−1γ , where u = uδ,γ,−1,∞ is the
cylinder corresponding to a constant semi-infinite gradient trajectory at q. This
determines a lift of the whole path of operators described above, and hence an
orientation of Det(
Os(R× S1, E; γ, γq)) by the previous rule.
We induce orientations on Det(O+(θn)) by gluing orientations on the line
bundles Det(Õ+(θn)) and Det(Õs(θn)). The orientations on Det(Õ+(θn)) and
Det(O+(θn)) determine orientations on Det(Õ±(E)) and Det(O±(E)) by re-
quiring that the glued orientation on Det(O(CP 1, E)) is the canonical one. We
get orientations of Det(Õ(R×S1, E)) by requiring that the orientation induced
on Det(O(CP 1, θn#E#θn)) by the gluing operation
Õ+(C, θn; γ) ev ×ev Õ(R× S1, E; γ, γ)ev ×ev Õ−(C, θn; γ)→ O(CP 1, θn#E#θn)
is the canonical one. Here we have denoted by ev, ev the evaluation maps to
S1 at −∞ and +∞ respectively. Similarly, we get orientations on Det(O(R ×
S1, E)), Det(Õu(R×S1, E)) and Det(Õs(R×S1, E)), as well as orientations on
O) for the various spaces
Lemma 4.32. The above recipe defines a system of coherent orientations.
Proof. We have to prove that, given operators K, L that can be glued lying
in one of the spaces O, Õ or
O, the coherent orientations oK , oL of Det(K)
and Det(L) induce an orientation oK#oL that coincides with the coherent ori-
entation of Det(K#L). In the case when K, L belong to some O(R × S1),
Õ(R × S1) or
O(R × S1) this means that, for a suitable choice of operators
Symplectic homology for autonomous Hamiltonians 79
O+(C, θn), A ∈ Õ+(C, θn) or A ∈ O+(C, θn), and B ∈
O−(C, θn),
B ∈ Õ−(C, θn) or B ∈ O−(C, θn), with oA, oB the coherent orientations on
the respective determinant line bundles, oA#(oK#oL)#oB is the canonical ori-
entation on Det(O(CP 1)).
Let E and F be the symplectic vector bundles corresponding to K and L
respectively. If E = θn or F = θn the conclusion is a direct consequence of the
definitions and of the associativity of gluing. In the general case E 6= θn and F 6=
θn we give the proof whenK ∈ Õu(R×S1, E; γp, γ) and L ∈ Õs(R×S1, F ; γ, γq),
the other cases being similar. Let us introduce an auxiliary loop of symmetric
matrices S0 ∈ S such that [S0, J0] = 0, and we define the orientations on
Det(O±(C, E′;S0)) to be the canonical ones (see Remark 4.24). This determines
in turn orientations on Det(Õu(R×S1, E′;S0, Sγ)), γ ∈ P(H) by requiring that
gluing induces the coherent orientation on Det(Õ+(C, θn#E′; γ)).
Let A1 ∈ O+(C, E1;S0), K1 ∈ Õu(R× S1, θn;S0, Sγ) with E1#θn = θn#E.
By the above definition, we have oA1#oK1 = oA#oK . We obtain
oA#(oK#oL)#oB = (oA#oK)#oL#oB = (oA1#oK1)#oL#oB
= oA1#oK1#L#oB = oA1#oK1#L#B.
The operators A1 and K1#L#B are homotopic to C-linear operators with
asymptotic condition S0. The main observation now is that the gluing of two
C-linear operators is again C-linear, hence the gluing of the above orientations
is the canonical one on Det(O(CP 1)).
Let γ, γ be good orbits. In this case the procedure for orienting the Morse-
Bott spaces of Floer trajectories M̂A(Sγ , Sγ ;H, J) is entirely similar to the
corresponding procedure in the nondegenerate case (it is actually simpler since
we do not need the intermediate transition from Lp,d to Lp spaces). Namely,
we pull back the orientation on Det(Õ) using the natural map M̂ → Õ. This
in turn induces orientations on the quotient spaces MA(Sγ , Sγ ;H, J). Recall
that, given oriented vector spaces V ⊂W , we define an orientation on W/V by
requiring that the isomorphism V ⊕ (W/V ) ≃W is orientation preserving.
Since the stable and unstable manifolds of the functions fγ are canoni-
cally oriented, one gets orientations (i.e. signs) on all zero-dimensional moduli
spaces of Floer trajectories with gradient fragments MA(p, q;H, {fγ}, J) which
involve only good orbits. This is done by the following fibered sum rule.
Let fi : Wi → W , i = 1, 2 be linear maps of oriented vector spaces such that
f : W1 ⊕W2 → W , (w1, w2) 7→ f1(w1) − f2(w2) is surjective. The orientation
on the fibered sum W1f1⊕f2W2 := ker f is defined such that the isomorphism
of vector spaces (W1 ⊕W2)/ ker f
∼→ W induced by f changes orientations by
the sign (−1)dim W2·dim W . Note that this rule is such that the fibered sum
operation is associative for oriented vector spaces, and moreover, if f2 is an ori-
entation preserving isomorphism, the natural isomorphismW1f1⊕f2W2 ≃W1 is
orientation preserving. Similarly, if f1 is an orientation preserving isomorphism,
the natural isomorphism W1f1⊕f2W2 ≃W2 is orientation preserving.
Symplectic homology for autonomous Hamiltonians 80
The important remark now is that, although the spaces M̂A(Sγ , Sγ ;H, J)
with γ or γ being a bad orbit may not be orientable, we can nevertheless define
orientations (i.e. signs) on all zero-dimensional moduli spaces of Floer trajec-
tories with gradient fragments MA(p, q;H, {fγ}, J). The sign of an (isolated)
point [u] = (cm, [um], . . . , c1, [u1], c0) in this moduli space is determined as fol-
lows. For each operator Dui , i = 1, . . . ,m with at least one bad asymptote we
choose a lift in the corresponding space
O(R×S1). For each ci, i = 0, . . . ,m ly-
ing on a bad orbit γi the corresponding operatorDuδ,γi,−Ti/2,Ti/2 admits a unique
lift to the space
O(Sγi , Sγi) such that it can be glued with both Dui+1 and Dui .
Since all these operators are surjective, the orientations of the determinant line
bundles over the spaces Õ and
O induce orientations on TuiM̂Ai(Sγi , Sγi−1),
respectively TWu(p), TW s(q) and T(ci(−Ti/2),Ti)(Sγi × R+), i = 1, . . . ,m − 1.
By the fibered sum rule we get an orientation on TuM̂A(p, q;H, {fγ}, J) which
we call “the coherent orientation”. On the other hand this vector space carries
the “geometric orientation” of the basis (∂sum, . . . , ∂su1). We define the sign
ǫ(u) = ǫ([u]) (87)
to be +1 if these two orientations coincide, and −1 if they are different.
We now want to compare the signs ǫ(u) with the signs ǫ(uδ) of the glued
trajectories uδ corresponding to u. The situation is expressed by the follow-
ing diagram, in which we dropped the decorations A, (H, {fγ}, J) and (Hδ, J)
and in which we have indicated on the morphism arrows the way in which the
corresponding isomorphisms of vector spaces act on orientations.
Coherent
orientation
// TM̂(p, q)
Id ǫ(u)
TM̂(γp, γq)⊕ R
Id ǫ(uδ)
Coherent
orientation
Geometric
orientation
〈∂sum,...,∂su1〉
// TM̂(p, q)
? // TM̂(γp, γq)⊕ R
Geometric
orientation
〈∂suδ〉⊕ R
The map φ is defined from gluing as follows. The tangent space TM̂(p, q)
is the kernel of the operator D ew, w̃ = (vm, um, . . . , v1, u1, v0) considered in
Lemma 4.17. Moreover, since the cokernel ofD ew is naturally oriented, the coher-
ent orientation of Det(D ew) induces a “coherent” orientation on ker D ew. Recall
that the analytical expression ofD ew isDvm⊕Dum⊕D′vm−1⊕. . .⊕D
⊕Du1⊕Dv0 ,
and note that D ew admits a natural stabilization D
ew obtained by replacing
D′vi , i = 1, . . . ,m− 1 with D{∂̄T }(vi, Tvi) (see Remark 4.11 for the definitions).
By [5, Corollary 6] there is a natural isomorphism φ̃ : ker DR
∼→ ker DRm−1
Gδ( ew)
which preserves the coherent orientations. We denote by φ : ker DR
ker DR
the composition of φ̃ with the projection Π on ker DR
along the
image of the right inverse Qδ of DGδ( ew) given by Proposition 4.18. Since DGδ( ew)
Symplectic homology for autonomous Hamiltonians 81
and Duδ are close in the relevant δ-norm, we get that φ is an isomorphism pre-
serving coherent orientations.
The vertical maps change orientations by ǫ(u), respectively ǫ(uδ) by defi-
nition, and the whole work now goes into determining the action of φ on the
geometric orientations.
Remark 4.33. If γ is a good orbit and p ∈ Crit(fγ), the geometric orientations
on Wu(p) and W s(p) coincide with the coherent ones. Indeed, the unstable
manifold Wu(p) is naturally identified with the zero set of the section ∂̄−∞,1
defined on B1,p,dδ (p, Sγ ; fγ) by (54), whereas the stable manifold W s(p) is natu-
rally identified with the zero set of the section ∂̄−1,∞ defined on B1,p,dδ (Sγ , q; fγ).
The assertion for W s(p) is then a direct consequence of the definition of the ori-
entation on Det(Õs(R×S1, θn; γ, γq)). As for Wu(p), let us consider the gluing
operation
Õu(R× S1, θn; γp, γ)ev ×ev Õs(R× S1, θn; γ, γp) → O(R× S1, θn; γp, γp).
We choose the surjective operators D1 := Duδ,γ,−∞,1 , D2 := Duδ,γ,−1,∞ corre-
sponding to the constant gradient trajectory at p. With these choicesD1#D2 =
Duδ,γ,−∞,∞ =: D also corresponds to the constant gradient trajectory at p. The
operator D is an isomorphism and, by the coherent choice of the orientations,
the determinant line Det(D) ≃ R is positively oriented. If p is the maximum of
fγ then ker D2 ≃ TpSγ as oriented vector spaces (by definition), the kernel of
D1 is trivial and its determinant line must be positively oriented. If p is the min-
imum of fγ then ker D2 is trivial and its determinant line is positively oriented
by definition, therefore ker D1 ≃ TpSγ must have the geometric orientation.
Lemma 4.34. Assume dim MA(p, q;H, {fγ}, J) = 0 and fix an element [u] ∈
MA(p, q;H, {fγ}, J) with m ≤ 2 sublevels. Then ǫ(u) = ǫ(uδ) if m = 0, 1 and
ǫ(u) = −ǫ(uδ) if m = 2.
Proof. If m = 0 the statement is obvious since u consists of a single gradient
trajectory and u = uδ (see Lemma 4.28). We now have to show that the map φ in
the previous diagram preserves the geometric orientation if m = 1, respectively
reverses it if m = 2. Since a shift σ on u1 produces a glued trajectory uδ shifted
by the same amount σ, we infer that φ(∂su1) = ∂suδ and, for m = 1, the
statement follows from the commutativity of the diagram.
Let us now examine the case m = 2. We recall that φ = Π ◦ φ̃, where the
isomorphism φ̃ is the composition of the gluing map G in the proof of Propo-
sition 4.18 with the projection to ker DR
Gδ( ew)
along the image of Qδ (see [5]).
We first show that φ(∂su1 + ∂su2) is close in ‖ · ‖1,δ-norm to ∂suδ. We denote
w̃σ,σ := (v2, u2(·+ σ), v1, u1(·+ σ), v0), σ ∈ R. Then G(0⊕ ∂su2 ⊕ 0⊕ ∂su1 ⊕ 0)
is ‖ · ‖1,δ-close to ddσ
Gδ(w̃
σ,σ), which is ‖ · ‖1,δ-close to ddσ
Gδ(w̃)(·+ σ),
which is in turn close to d
vδ(· + σ) = ∂svδ. Then φ(∂su1 + ∂su2) =
Π(G(0 ⊕ ∂su2 ⊕ 0⊕ ∂su1 ⊕ 0)) is ‖ · ‖1,δ-close to Π(∂suδ) = ∂suδ.
We now show that φ(∂su1−∂su2) ∈ ker Duδ ⊕R is a vector having a negative
component in the R direction and whose component on ker Duδ is ‖ · ‖1,δ-close
to −∂su1. Then the conclusion follows.
Symplectic homology for autonomous Hamiltonians 82
Let w̃−σ,σ := (v2, u2(· − σ), v1, u1(·+ σ), v0), σ ∈ R. Then G(0⊕ (−∂su2)⊕
0 ⊕ ∂su1 ⊕ 0) is ‖ · ‖1,δ-close to ddσ
Gδ(w̃
−σ,σ). We define ǫ(σ) := 2δσ and
the section
Gδ,ǫ(σ)(w̃
−σ,σ) =
Gδ(w̃
−σ,σ) + 2δ
Gδ,ǫ(w̃) (88)
is by construction ‖ · ‖1,δ-close to ddσ
Gδ(w̃
−σ,−σ), hence ‖ · ‖1,δ-close to
−∂suδ. By adapting the arguments in the proof of Proposition 4.15 one sees
that the section
∂̄Tv1+ǫGδ,ǫ(w̃) = D∂̄Tv1
Gδ,ǫ(w̃) +D{∂̄T }(Gδ(w̃), Tv1) · (0, 1)
is ‖ · ‖δ-small. Here the sections ∂̄T are of the form ∂̄HT ,J , where HT is the s-
dependent Hamiltonian given respectively by (54) on the intervals of definition
of v2, v1, v0, and equal toH on the intervals of definition of u1, u2. The previous
equation shows that ( d
Gδ,ǫ(w̃), 1) ∈ dom(DRGδ( ew)) is ‖·‖1,δ-close to ker D
On the other hand, equation (88) shows that G(0 ⊕ (−∂su2)⊕ 0⊕ ∂su1 ⊕ 0) is
‖ · ‖1,δ-close to −∂suδ − 2δ ddǫ
Gδ,ǫ(w̃). Hence, after projecting to ker D
ker Duδ ⊕ R, we get a vector having a negative component in the R direction
and whose component on ker Duδ is ‖ · ‖1,δ-close to −∂suδ.
Proof of Proposition 3.9. The special statement concerning the case m = 0
was proved in Lemmas 4.28 and 4.34, whereas the equality ǫ(u) = (−1)m−1ǫ(uδ)
in case m = 1, 2 was the content of Lemma 4.34. The proof in the case m ≥ 3
is just a more elaborate version of the proof of Lemma 4.34. We consider the
basis of TuM̂(p, q) given by
e0 := ∂sum + ∂sum−1 + . . .+ ∂su2 + ∂su1,
e1 := −∂sum + ∂sum−1 + . . .+ ∂su2 + ∂su1,
em−2 := −∂sum − ∂sum−1 − . . .+ ∂su2 + ∂su1,
em−1 := −∂sum − ∂sum−1 − . . .− ∂su2 + ∂su1.
It is easy to see that the orientation determined by (e0, . . . , em−1) is the same
as the geometric orientation determined by (∂sum, . . . , ∂su1). We have to show
that the orientation of the basis (φ(e0), . . . , φ(em−1)) differs from the canonical
orientation of 〈∂suδ〉 ⊕ Rm−1 by (−1)m−1.
As in Lemma 4.34 we see that φ(e0) is ‖ · ‖1,δ-close to ∂suδ. We now show
that φ(ek) ∈ kerDuδ ⊕Rm−1, k = 1, . . . ,m− 1 has a negative component which
is bounded away from zero along the corresponding factor R ⊂ Rm−1, that
the other components in Rm−1 are close to zero, whereas the component along
kerDuδ is close to −∂suδ in ‖ · ‖1,δ-norm. Then the conclusion will follow since
the orientation defined by (φ(e0), . . . , φ(em−1)) is the same as the orientation
defined by
(∂suδ, 0, . . . , 0), (−∂suδ,−1, 0, . . . , 0), . . . , (−∂suδ, 0, . . . , 0,−1)).
Symplectic homology for autonomous Hamiltonians 83
Let us fix k = 1, . . . ,m−1. We shall freely use the notation ek for the vector
0⊕ (−∂sum)⊕ 0⊕ . . .⊕ (−∂sum−k+1)⊕ 0⊕∂sum−k⊕ . . . ∂su1⊕ 0 in the domain
of the gluing map G defined in the proof of Proposition 4.18. For σ > 0 we
denote
k :=(vm, um(·−σ), . . . , um−k+1(·−σ), vm−k, um−k(·+σ), . . . , u1(·+σ), v0),
w̃−σ,−σ :=(vm, um(·−σ), . . . , um−k+1(·−σ), vm−k, um−k(·−σ), . . . , u1(·−σ), v0).
Then G(ek) is ‖ · ‖1,δ-close to ddσ
Gδ(w̃
k ). We denote
ǫk(ǫ) := (0, . . . , ǫ, . . . , 0),
where the parameter ǫ > 0 appears on position m− k. The section
Gδ,ǫk(2δσ)(w̃
k ) =
Gδ(w̃
k ) + 2δ
Gδ,ǫk(w̃) (89)
is by construction ‖·‖1,δ-close to ddσ
Gδ(w̃
−σ,−σ), hence ‖·‖1,δ-close to −∂suδ.
As in Lemma 4.34, by adapting the arguments in the proof of Proposition 4.15
one sees that the section
∂̄Tvm−k+ǫGδ,ǫk(ǫ)(w̃) = D∂̄Tvm−k
Gδ,ǫk(ǫ)(w̃)
+ D{∂̄T }(Gδ(w̃), Tvm−k) · (0, 1)
is ‖ · ‖δ-small. As before, the sections ∂̄T are of the form ∂̄HT ,J , where HT
is the s-dependent Hamiltonian given respectively by (54) on the intervals of
definition of vm, vm−1, . . . , v0, and equal to H on the intervals of definition of
um, . . . , u1. The previous equation shows that
Gδ,ǫk(ǫ)(w̃), 0, . . . , 1, . . . , 0) ∈ dom(D
Gδ( ew)
is ‖·‖1,δ-close to ker DR
. On the other hand, equation (89) shows that G(ek)
is ‖·‖1,δ-close to −∂suδ−2δ ddǫ
Gδ,ǫk(ǫ)(w̃). After projecting to ker D
get a vector whose k-th component in Rm−1 is negative, whose other components
in Rm−1 are small, and whose component on ker Duδ is ‖ · ‖1,δ-close to −∂suδ.
Remark 4.35. We chose to define the signs ǫ(u) by comparing the orientation
induced on TuM̂A(p, q;H, {fγ}, J) by the fiber sum rule from the coherent ori-
entations on TM̂Ai(Sγi , Sγi−1 ;H, J), i = 1, . . . ,m with the orientation of the
basis (∂sum, . . . , ∂su1). Another possible recipe would have been the following:
induce orientations on TMAi(Sγi , Sγi−1 ;H, J) out of the coherent orientations
by quotienting out 〈∂s〉, then apply the fiber sum rule in order to get a sign on
the zero-dimensional spaces T[u]MA(p, q;H, {fγ}, J). The sign obtained in this
way would have differed from the previously defined ǫ(u) by a factor ±1 which
Symplectic homology for autonomous Hamiltonians 84
can be explicitly computed and which depends on the combinatorics of the lev-
els of u. The curious reader can test this procedure in the case m = 1: it gives
a sign equal to ǫ(u) if p, q are both minima, respectively equal to −ǫ(u) if p, q
are both maxima. The following two properties of the fibered sum constitute
a useful tool for making the verification (here W1 and W2 are oriented vector
spaces).
• the natural isomorphism W1 0
∼→ W1 ⊕ ker f2 changes the orien-
tation by (−1)dim W1·(dim W2+1);
• the natural isomorphism W1 f1
∼→ ker f1 ⊕W2 preserves the orien-
tation.
A Appendix: Asymptotic estimates
For all γ ∈ P(H), we choose coordinates (ϑ, z) ∈ S1 × R2n−1 parametrizing a
tubular neighbourhood of γ, such that ϑ ◦ γ(θ) = θ and z ◦ γ(θ) = 0. Given a
smooth function fγ : Sγ → R, we denote by ϕ
s the gradient flow of fγ with
respect to the natural metric on S1.
In a neighbourhood of γ ∈ P(H) the Floer equation ∂su+ J∂θu− JXH = 0
becomes ∂sZ+J∂θZ+J
−JXH = 0, where Z(s, θ) := (ϑ◦u(s, θ)−θ, z◦u(s, θ)).
Since XH =
on {z = 0} this can be rewritten as
∂sZ + J∂θZ + Sz = 0
for some matrix-valued function S = S(ϑ, z). The matrix S∞(θ) := S(θ, 0) is
symmetric. Let A∞ : H
k(S1,R2n) → Hk−1(S1,R2n) be the operator defined by
A∞Z := J
Z + S∞(θ)z.
The kernel of A∞ has dimension one and is spanned by the constant vector
e1 := (1, 0, . . . , 0). We denote by Q∞ the orthogonal projection onto (ker A∞)
and we set P∞ := 1l − Q∞. Then A∞ is invertible when restricted to imQ∞
and Q∞A∞ = A∞.
Proposition A.1. Let H ∈ H′ be fixed. There exists r > 0 such that for all
J ∈ J ℓ and for all u ∈ MA(Sγ , Sγ ;H, J), γ, γ ∈ P(H) we have
ϑ ◦ u(s, θ)− θ − θ0 ∈ W 1,p(]−∞,−s0]× S1,R; er|s|ds dθ),
z ◦ u(s, θ) ∈ W 1,p(]−∞,−s0]× S1,R2n−1; er|s|ds dθ),
ϑ ◦ u(s, θ)− θ − θ0 ∈ W 1,p([s0,∞[×S1,R; er|s|ds dθ),
z ◦ u(s, θ) ∈ W 1,p([s0,∞[×S1,R2n−1; er|s|ds dθ),
for some θ0, θ0 ∈ S1 and some s0 > 0 sufficiently large.
Symplectic homology for autonomous Hamiltonians 85
Proof. We make the proof only at +∞ since the case of −∞ is entirely similar.
For s large enough we set S(s, θ) := S(ϑ ◦ u(s, θ), z ◦ u(s, θ)), so that S∞(θ) =
lims→∞ S(s, θ) and lims→∞ |∂sS(s, θ)| = 0.
Let A(s) : Hk(S1,R2n) → Hk−1(S1,R2n) be the operator defined by
A(s)Z := J
Z + S(s, θ)z,
so that A∞ = lims→∞A(s). We have A(s) = A(s)Q∞, ∂sQ∞ = Q∞∂s. Since
A∞ is invertible when restricted to imQ∞ and Q∞A∞ = A∞, the operators
A(s) andQ∞A(s) are also invertible when restricted to imQ∞ for s large enough
and there exists c > 0 such that
‖A(s)Q∞Z‖2 ≥ ‖Q∞A(s)Q∞Z‖2 ≥ c‖Q∞Z‖2
for all Z ∈ Hk(S1,R2n). For s large enough we define
f(s) :=
‖Q∞Z(s)‖2.
We have
f ′′(s) = ‖∂sQ∞Z‖2 + 〈Q∞Z, ∂2sQ∞Z〉
= ‖∂sQ∞Z‖2 − 〈Q∞Z, ∂sQ∞A(s)Q∞Z〉
= ‖Q∞A(s)Q∞Z‖2 − 〈Q∞Z,Q∞(∂sA(s))Q∞Z −Q∞A(s)2Q∞Z〉
≥ (c− ε)‖Q∞Z‖2+〈(A(s)∗ −A(s))Q∞Z,A(s)Q∞Z〉+‖A(s)Q∞Z‖2
≥ (2c− 2ε)‖Q∞Z‖2 ≥ 4ρ2f(s).
Here A(s)∗ is the adjoint of A(s) and we used the fact that ‖∂sA(s)‖ → 0,
A(s)∗ −A(s) → 0 for s→ ∞ and ‖A(s)‖ is uniformly bounded.
Let now s0 be large enough and define g(s) := f(s0)e
−2ρ(s−s0). Then g′′ =
4ρ2g, (f − g)′′ ≥ 4ρ2(f − g), (f − g)(s0) = 0 and lims→∞ f(s) − g(s) = 0.
Then f − g ≤ 0 on [s0,∞[ because it cannot have a strictly positive maximum.
Therefore
‖Q∞Z(s)‖ ≤ ‖Q∞Z(s0)‖e−ρ(s−s0).
It is important to note that this estimate holds for any Sobolev normHk. By
the Sobolev embedding theorem this implies the following pointwise estimate
|Q∞Z(s, θ)| ≤ Ce−ρs, |∂θQ∞Z(s, θ)| = |∂θZ(s, θ)| ≤ Ce−ρs, s ≥ s0.
Because ∂sZ +A(s)Z = ∂sZ +A(s)Q∞Z = 0 we obtain
|∂sZ(s, θ)| ≤ Ce−ρs, s ≥ s0
and, by integration on [s,∞[ and taking into account that Z(s, θ) converges to
(θ0, 0, . . . , 0) for s→ ∞, we obtain the pointwise estimate
|(ϑ− θ − θ0, z)| ≤ Ce−ρs.
This implies the conclusion for any r < ρ.
Symplectic homology for autonomous Hamiltonians 86
Proposition A.2. Let H ∈ H′ and {fγ : Sγ → R} be a collection of perfect
Morse functions indexed by γ ∈ Pλ. There exist r > 0 and δ0 > 0 such that
for all J ∈ J , γ, γ ∈ P(H), p ∈ Crit(fγ), q ∈ Crit(fγ) and for all (δ, u) ∈
]0,δ0]
(γp, γq;H, {fγ}, J), we have
ϑ ◦ u(s, θ)− θ − ϕδfγs (θ0) ∈ W 1,p(]−∞,−s0]× S1,R; er|s|ds dθ),
z ◦ u(s, θ) ∈ W 1,p(]−∞,−s0]× S1,R2n−1; er|s|ds dθ),
ϑ ◦ u(s, θ)− θ − ϕδfγs (θ0) ∈ W 1,p([s0,∞[×S1,R; er|s|ds dθ),
z ◦ u(s, θ) ∈ W 1,p([s0,∞[×S1,R2n−1; er|s|ds dθ),
for some θ0, θ0 ∈ S1 and some s0 > 0 sufficiently large.
Proof. The proof is similar to the one of Proposition A.1. With the same nota-
tions as before the Floer equation satisfied by u can be written in local coordi-
nates Z = (ϑ− θ, z) as
∂sZ + J∂θZ + Sz − δ∇fγ(Z1) = 0, (90)
where Z1 := ϑ−θ. We again show that f(s) = 12‖Q∞Z‖
2 satisfies an inequality
of the form f ′′(s) ≥ 4ρ2f(s). There are two additional terms to estimate in the
expression of f ′′(s), namely
〈Q∞Z, δQ∞A(s)∇fγ(Z1)〉 (91)
〈Q∞Z, δQ∞∂s(∇fγ(Z1))〉. (92)
Let P∞ := 1l−Q∞ be the orthogonal projection on ker A∞. The main observa-
tion is that Q∞∇fγ(P∞Z1) = 0. As a consequence there exists a matrix-valued
function L = L(s, θ) such that
Q∞∇fγ(Z1) = LQ∞(Z1).
The term (91) is then estimated by
〈Q∞Z, δQ∞A(s)∇fγ(Z1)〉 = 〈Q∞Z, δQ∞A(s)Q∞∇fγ(Z1)〉
≤ Cδ‖Q∞Z‖2
for s ≥ s0, where s0 depends on u, but C depends only on γ and fγ . Similarly,
the term (92) is estimated by
〈Q∞Z, δQ∞∂s(∇fγ(Z1))〉 = 〈Q∞Z, δ∂sQ∞∇fγ(Z1)〉
≤ Cδ‖Q∞Z‖‖∂sQ∞(Z1)‖
‖Q∞Z‖2 + ‖∂sQ∞Z‖2
The norm of ∂sQ∞Z = Q∞∂sZ satisfies
‖∂sQ∞Z‖ = ‖Q∞A(s)Z − δQ∞∇fγ(Z1)‖ ≤ C‖Q∞Z‖.
Symplectic homology for autonomous Hamiltonians 87
As a consequence, there exists δ0 > 0 and ρ > 0 such that f
′′(s) ≥ 4ρ2f(s) for
s ≥ s0 and 0 < δ ≤ δ0. As before, we infer the pointwise bounds
|Q∞Z(s, θ)| ≤ Ce−ρs, |∂θQ∞Z(s, θ)| = |∂θZ(s, θ)| ≤ Ce−ρs, s ≥ s0. (93)
It remains to estimate P∞Z. For that we write ∇fγ(Z1) = ∇fγ(P∞(Z1)) +
KQ∞(Z1) for some matrix-valued function K = K(s, θ). Again, for s ≥ s0, the
norm ‖K‖ is uniformly bounded by a constant depending only on γ and fγ . By
applying P∞ to the equation (90) and using the fact that P∞∇fγ(P∞(Z1)) =
∇fγ(P∞(Z1)) and P∞(Z1) = P∞(Z) we obtain
|∂s(P∞Z)− δ∇fγ(P∞Z)| ≤ Ce−ρs. (94)
We claim that this implies
|P∞Z(s)− ϕ
s (θ0)| ≤ Ce−ρs, s ≥ s0 (95)
for a suitable θ0. We choose a Morse coordinate x on Sγ around the critical
point q of fγ in which the gradient ∇fγ(x) = ±Mx,M > 0. Then equation (94)
becomes
∂s(P∞Z)(s)∓ δMP∞Z(s) = G(s)
with |G(s)| ≤ Ce−ρs. Then P∞Z(s) = c(s)e±δMs with e±δMs∂sc(s) = G(s).
As a consequence, for δ < ρ/M the function c admits a limit c∞ as s→ ∞ and
c(s) = c∞ −
G(σ)e∓δMσ dσ. Let θ0 be such that ϕ
s (θ0) = c∞e
±δMs (note
that c∞ = 0 if q is a maximum). Then
|P∞Z(s)− ϕ
s (θ0)| = |e±δMs
G(σ)e∓δMσ dσ|
≤ Ce−ρs.
The estimates (93) and (95) imply the conclusion.
Proposition A.3. Let δ ∈]0, δ0] and let uδ ∈ M̂A(γp, γq;Hδ, J). Let Iδ =
[s0(δ), s1(δ)] ⊂ R be an interval such that uδ(Iδ×S1) is contained in the domain
of a coordinate chart Z = (ϑ, z) around Sγ for some γ ∈ P(H).
There exist ρ > 0, θ0 ∈ S1, C > 0 and M > 0 such that z ◦ u(s, θ) and its
(first order) derivatives are bounded by
Cmax(‖Q∞Z(s0)‖, ‖Q∞Z(s1)‖)
cosh(ρ(s− s0+s1
cosh(ρ(s1 − s0)/2)
for s ∈ Iδ, θ ∈ S1. If P∞Z(s), s ∈ Iδ stays away from all but one of the critical
points of fγ, then ϑ ◦ u(s, θ)− θ − ϕ
s (θ0) and its (first order) derivatives are
bounded by
Cmax(‖Q∞Z(s0)‖, ‖Q∞Z(s1)‖)eδM(s1−s0)
cosh(ρ(s− s0+s1
cosh(ρ(s1 − s0)/2)
Moreover, if P∞Z(s), s ∈ Iδ stays away from all critical points of fγ , the above
bound is improved to (96).
Symplectic homology for autonomous Hamiltonians 88
Proof. With the notations of Proposition A.2, the Floer equation satisfied by u
can be written in local coordinates Z = (ϑ− θ, z) as
∂sZ + J∂θZ + Sz − δ∇fγ(Z1) = 0, (97)
where Z1 := ϑ − θ. Let A∞ = J ddθ + S∞(θ) the asymptotic operator at γ, let
Q∞ be the orthogonal projection onto (ker A∞)
⊥ and P∞ := 1l − Q∞. Then,
as in Proposition A.2, the quantity f(s) = 1
‖Q∞Z‖2 satisfies an inequality of
the form f ′′(s) ≥ 4ρ2f(s). Define
g(s) := max(f(s0), f(s1))
cosh(2ρ(s− s0+s1
cosh(ρ(s1 − s0))
Then (f − g)′′ ≥ 4ρ2(f − g) and f − g cannot have a strictly positive maximum.
Since f−g ≤ 0 at s0 and s1, we infer that f−g ≤ 0 on Iδ. As in Proposition A.1,
we infer the pointwise bounds for s ≥ s0
|Q∞Z(s, θ)| ≤ Cg1(s), (98)
|∂θQ∞Z(s, θ)| = |∂θZ(s, θ)| ≤ Cg1(s),
|∂s(P∞Z)(s)− δ∇fγ(P∞Z)(s)| ≤ C1g1(s),
where
g1(s) := max(‖Q∞Z(s0)‖, ‖Q∞Z(s1)‖)
cosh(2ρ(s− s0+s1
cosh(ρ(s1 − s0))
If P∞Z(s) stays away from Crit(fγ), we can assume that ∇fγ(P∞Z(s)) = M
in some suitable coordinate on S1. Then the last equation becomes
∂s(P∞Z)(s)− δM = G(s),
where |G(s)| ≤ C1g1(s). By direct integration we obtain
|(P∞Z)(s)− δMs− c0| =
s0+s1
G(σ) dσ
s0+s1
cosh(2ρ(s− s0 + s1
)) dσ
∣∣ sinh(ρ(s− s0 + s1
cosh(ρ(s−
s0 + s1
Here C2 = C1 max(‖Q∞Z(s0)‖, ‖Q∞Z(s1)‖)/
cosh(ρ(s1 − s0)) and we have
used the inequality
coshx ≤
2 cosh(x/2). Therefore, there exists a uniquely
Symplectic homology for autonomous Hamiltonians 89
determined θ0 ∈ S1 such that
|(P∞Z)(s)− ϕδfγs (θ0)|
max(‖Q∞Z(s0)‖, ‖Q∞Z(s1)‖)
cosh(ρ(s− s0+s1
cosh(ρ(s1 − s0))
max(‖Q∞Z(s0)‖, ‖Q∞Z(s1)‖)
cosh(ρ(s− s0+s1
cosh(ρ(s1 − s0)/2)
The last inequality follows from cosh(x/2) ≤
coshx. A similar manipulation
on (98) gives
|Q∞Z(s, θ)| ≤ C
2max(‖Q∞Z(s0)‖, ‖Q∞Z(s1)‖)
cosh(ρ(s− s0+s1
cosh(ρ(s1 − s0)/2)
The last two inequalities imply the conclusion of the Proposition in the case
when P∞Z(s), s ∈ Iδ stays away from Crit(fγ).
If P∞Z(s) is allowed to approach one of the critical points of fγ , the estimate
on |Q∞Z(s)| stays the same, but the estimate involving P∞Z(s) has to be
modified as follows. In a suitable Morse coordinate chart around the critical
point we can assume that ∇fγ(x) = ±Mx, M > 0 and we have to study the
equation
∂s(P∞Z)(s)∓ δMP∞Z(s) = G(s),
with |G(s)| ≤ C1g1(s). As in Proposition A.2 we have P∞Z(s) = c(s)e±δMs
with e±δMs∂sc(s) = G(s). Then c(s) = c0 +
s0+s1
e∓δMσG(σ) dσ and there
exists a θ0 ∈ S1 such that ϕ
s (θ0) = c0e
±δMs. We obtain
|(P∞Z)(s)− ϕδfγs (θ0)| ≤
s0+s1
e±δM(s−σ)G(σ) dσ
≤ eδM(s1−s0)
s0+s1
G(σ) dσ
The last integral is bounded by
max(‖Q∞Z(s0)‖, ‖Q∞Z(s1)‖)
cosh(ρ(s− s0+s1
cosh(ρ(s1 − s0)/2)
as in the previous case and the conclusion follows.
References
[1] F. Bourgeois, A Morse-Bott approach to contact homology, Ph.D. dissertation, Stanford Uni-
versity, 2002.
[2] ——, Contact homology and homotopy groups of the space of contact structures, Math. Res.
Lett. 13 (2006), 71–85.
Symplectic homology for autonomous Hamiltonians 90
[3] F. Bourgeois, A. Oancea, An exact sequence for contact and symplectic homology. Preprint
arXiv: 0704.2169.
[4] ——, The Gysin exact sequence for S1-equivariant symplectic homology. Work in progress.
[5] F. Bourgeois, K. Mohnke, Coherent orientations in symplectic field theory, Math. Z. 248
(2004), 123–146.
[6] K. Cieliebak, Handle attaching in symplectic homology and the chord conjecture, J. Eur.
Math. Soc. (JEMS) 4 (2002), 115–142.
[7] K. Cieliebak, A. Floer, and H. Hofer, Symplectic homology. II. A general construction, Math.
Z. 218 (1995), 103–122.
[8] K. Cieliebak, A. Floer, H. Hofer, and K. Wysocki, Applications of symplectic homology II:
Stability of the action spectrum, Math. Z. 223 (1996), 27–45.
[9] K. Cieliebak, K. Mohnke, Compactness for punctured holomorphic curves, J. Symplectic
Geom. 3 (2005), 589–654.
[10] D. Dragnev, Fredholm theory and transversality for noncompact pseudoholomorphic maps in
symplectizations, Comm. Pure Appl. Math. 57 (2004), 726–763.
[11] Y. Eliashberg, A. Givental, and H. Hofer, Introduction to symplectic field theory, Geom.
Funct. Anal., Special Volume (2000), Part II, 560–673.
[12] A. Floer, Symplectic fixed points and holomorphic spheres, Comm. Math. Phys. 120 (1989),
575–611.
[13] A. Floer, H. Hofer, Coherent orientations for periodic orbit problems in symplectic geometry,
Math. Z. 212 (1993), 13–38.
[14] ——, Symplectic homology. I. Open sets in Cn, Math. Z. 215 (1994), 37–88.
[15] A. Floer, H. Hofer, and D. Salamon, Transversality in elliptic Morse theory for the symplectic
action, Duke Math. J. 80 (1995), 251–292.
[16] U. Frauenfelder, The Arnold-Givental conjecture and moment Floer homology, Int. Math.
Res. Not. 2004, no. 42, 2179–2269.
[17] H. Hofer, D. Salamon, Floer homology and Novikov rings, in The Floer memorial volume,
Eds. H. Hofer et al. Progress in Math., vol. 133, Birkhäuser, 1995, pp. 483–524.
[18] H. Hofer, K. Wysocki, and E. Zehnder, Properties of pseudoholomorphic curves in symplecti-
zations. I. Asymptotics. Ann. Inst. H. Poincaré Anal. Non Linéaire 13 (1996), 337–379.
[19] H. Hofer, K. Wysocki, and E. Zehnder, Properties of pseudoholomorphic curves in symplectiza-
tions. IV. Asymptotics with degeneracies. in Contact and symplectic geometry (Cambridge,
1994), Ed. C.B. Thomas. Publ. Newton Inst., vol. 8, Cambridge Univ. Press, Cambridge,
1996, 78–117.
[20] H. Hofer, K. Wysocki, and E. Zehnder, Polyfolds and Fredholm theory. Part I, Preliminary
preprint, 182 pp., 2005.
[21] D. McDuff, D. Salamon, J-holomorphic curves and symplectic topology, AMS Colloquium
Publications, vol. 52, AMS, 2004.
[22] A. Oancea, A survey of Floer homology for manifolds with contact type boundary or sym-
plectic homology, Ensaios Mat. 7 (2004), 51–91.
[23] J. Robbin, D. Salamon, The Maslov index for paths, Topology 32 (1993), 827–844.
[24] D. Salamon, Lectures on Floer Homology, in Symplectic Geometry and Topology, Eds.
Y. Eliashberg and L. Traynor. IAS/Park City Math. Series, vol. 7, AMS, 1999, pp. 143–229.
[25] D. Salamon, E. Zehnder, Morse theory for periodic solutions of Hamiltonian systems and the
Maslov index, Comm. Pure Appl. Math. 45 (1992), 1303–1360.
Symplectic homology for autonomous Hamiltonians 91
[26] M. Schwarz, Cohomology operations from S1-cobordisms in Floer homology, PhD thesis, ETH
Zurich, 1995.
[27] I. Ustilovsky, Contact homology and contact structures on S4m+1, Ph.D. dissertation, Stan-
ford University, 1999.
[28] C. Viterbo, Functors and computations in Floer homology with applications. I. Geom. Funct.
Anal. 9 (1999), 985–1033.
Index
A∞, asymptotic operator, 27, 84
BCa∗ (H), BC
∗(H), Morse-Bott chain groups, 20
Du, linearized operator, 9
D ew, fibered sum linearized operator, 54
G, gluing map, 58
Gδ,ǫ( ew), pre-glued curve, 46
H′a,b,ǫ, 34
Hδ , perturbed Hamiltonian, 14
P , mixing map, 57
P∞, asymptotic operator, 27, 84
Qδ , right inverse for gluing theorem, 56
Q∞, asymptotic operator, 27, 84
R = R(δ), gluing profile, 46
Rλ, Reeb vector field, 6
S, splitting map, 58
SCa∗ (H), symplectic chain group, 8
SHa∗ (W,ω), symplectic homology, 11
Sγ , circle of orbits, 14
S∞, asymptotic matrix, 27, 84
X, Liouville vector field, 6
XH , Hamiltonian vector field, 6
Λω , Novikov ring, 8
ǭ(u), 20
β, cutoff function, 23, 31, 34, 46, 48, 70
βL, cutoff function, 57
AH , action functional, 7
B′δ, 33
BA = B1,p,d(Sγ , Sγ , A;H), 22
BAδ = B
1,p,d
(γp, γq
, A;H, {fγ}), 30
E(u), energy, 9
Freg(H, J), regular families {fγ}, 18
H, admissible Hamiltonians, 7
H′, autonomous admissible Hamiltonians, 13
J , admissible a.c. structures, 7
J ′, time-indep. admissible a.c. structures, 17
J ′reg(H), 17
Jreg(H), 9, 17
M0(γ, γ;H, J), cM0(γ, γ;H, J), 9
MA(Sγ , Sγ ;H, J), cM
A(Sγ , Sγ ;H, J), 16–17
MA(Sγ , eq;H, J), cM
A(Sγ , eq;H, J), 16–17
MA(γ, γ;H, J), cMA(γ, γ;H, J), 8–9
MA]0,δ0]
(γp, γq
;H, {fγ}, J), 16
MAm(p, q;H, {fγ}, J), M
A(p, q;H, {fγ}, J), 18–
O, space of CR operators, 69–70
P(H),Pa(H), 7
i−1(a)
, 11–12
S, loops of symmetric matrices without degen-
erate directions, 69
∂̄H , 22
∂̄Hδ,J , 31
∂̄a,b,ǫ := ∂̄H′
a,b,ǫ
,J , 34
ǫ(u), 21, 80
ǫ(u), ǫ(uδ), 10, 20
γmin , γMax , surviving orbits, 15
ind(p), index of critical point of fγ , 18
µ(γ), index of Reeb orbit, 8, 15
ev, ev, evaluation maps, 17
∂, Floer differential, 10
∂, Morse-Bott differential, 20
eO, space of CR operators, 74–75
eS, loops of symmetric matrices with one degen-
erate direction, 74
Spec(M,λ), 6
O, cover of space of CR operators, 76
cW , symplectic completion, 6
bω, symplectic form on the completion cW , 6
eBδ , 44
ξ, contact distribution, 6
{∂̄T }, 43
fγ , Morse function on Sγ , 14, 18
gδ,ǫ(s), weight function for gluing, 47
ha,b,ǫ, 32
ka,b,ǫ, 32
uδ,γ,a,b,ǫ, gradient cylinder, 32
	Introduction
	Symplectic homology
	The Morse-Bott chain complex
	Morse-Bott moduli spaces
	Transversality
	Compactness for Morse-Bott trajectories
	Gluing for Morse-Bott moduli spaces
	Coherent orientations
	Appendix: Asymptotic estimates
ABSTRACT
  We define Floer homology for a time-independent, or autonomous Hamiltonian on
a symplectic manifold with contact type boundary, under the assumption that its
1-periodic orbits are transversally nondegenerate. Our construction is based on
Morse-Bott techniques for Floer trajectories. Our main motivation is to
understand the relationship between linearized contact homology of a fillable
contact manifold and symplectic homology of its filling.

<|endoftext|><|startoftext|>
Introduction are closely connected with the values of the reflection coefficients
at zero Matsubara frequency. For later use we discuss it for the various cases.
• For ideal metals40 it holds
r‖(0, k⊥) = r⊥(0, k⊥) = 1. (4)
• For real metals described by the dielectric function of the Drude model,
ε(iξl) = 1 +
ξl[ξl + ν(T )]
, (5)
where ωp is the plasma frequency and ν(T ) is the relaxation parameter, it
holds41
r‖(0, k⊥) = 1, r⊥(0, k⊥) = 0. (6)
Eq. (6) results in the discontinuity between the cases of ideal and real metals
and leads to the violation of the Nernst heat theorem for metallic plates
having perfect crystal lattices.52,53,54,56
• For real metals described by the dielectric function of the plasma model,
ε(iξl) = 1 +
, (7)
from Eq. (2) it follows:44,45
r‖(0, k⊥) = 1, r⊥(0, k⊥) =
c2k2⊥ + ω
p − ck⊥
c2k2⊥ + ω
p + ck⊥
. (8)
Here, in the limit of ideal metals (ωp → ∞) the continuity is preserved
because r⊥(0, k⊥) in Eq. (8) goes to unity. The free energy (1) calculated
with the permittivity (8) is also consistent with thermodynamics.
• For dielectrics and semiconductors the dielectric permittivities at the
imaginary Matsubara frequencies are given by the Ninham-Parsegian
representation,69,70
ε(iξl) = 1 +
1 + ξ2l /ω
, (9)
where the parameters Cj are the absorption strengths satisfying the condi-
Cj = ε0 − 1 (10)
and ωj are the characteristic absorption frequencies. Here, the static dielec-
tric permittivity ε0 ≡ ε(0) is supposed to be finite. Although Eq. (10) is an
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
Thermal Casimir force between dielectrics 7
approximate one, it gives a very accurate description for many materials.71
By the substitution of Eq. (9) in Eq. (2) one arrives at
r‖(0, k⊥) ≡ r0 =
ε0 − 1
ε0 + 1
, r⊥(0, k⊥) = 0. (11)
Note that the vanishing of the transverse reflection coefficient for dielectrics at zero
frequency in Eq. (11) has another meaning than for the Drude metals in Eq. (6).
For Drude metal the parallel reflection coefficient is equal to the physical value for
real photons at normal incidence, i.e., to unity, and the transverse one vanishes
instead of taking unity, its physical value. This results in the violation of the Nernst
heat theorem for perfect crystal lattices. In the case of dielectrics both reflection
coefficients at zero frequency in Eq. (11) depart from the physical value for real
photons which is equal to (
ε0−1)/(
ε0+1). In this case, however, one of them is
larger and the other one is smaller than the physical value. As we will see below, this
leads to the preservation of Nernst’s heat theorem confirming that Eq. (9), despite
being approximate, describes the material properties of dielectric and semiconductor
plates in a thermodynamic consistent way.
Now we derive the analytic representation for the Casimir free energy in Eq. (1)
at low temperatures. For convenience in calculations, we introduce the dimensionless
variables
= τl, y = 2aql, (12)
where ξc = c/(2a) is the characteristic frequency, τ = 4πkBaT/(~c), and ql was
defined in Eq. (3). Then the Lifshitz formula (1) takes the form
F(a, T ) = ~cτ
32π2a3
1− δl0
dy f(ζl, y) , (13)
where
f(ζ, y) = f‖(ζ, y) + f⊥(ζ, y), (14)
f‖,⊥(ζ, y) = y ln
1− r2‖,⊥(ζ, y)e−y
, (15)
and reflection coefficients (2), in terms of variables (12), being given by
r‖(ζl, y) =
εly −
y2 + ζ2l (εl − 1)
εly +
y2 + ζ2l (εl − 1)
, r⊥(ζl, y) =
y2 + ζ2l (εl − 1)− y
y2 + ζ2l (εl − 1) + y
. (16)
To separate the temperature independent contribution and thermal correction
in Eq. (13) we apply the Abel-Plana formula,3,6
1− δl0
F (l) =
F (t) dt+ i
F (it)− F (−it)
e2πt − 1
, (17)
where F (z) is an analytic function in the right half-plane. Here, taking it as
F (x) =
dy f(x, y) (18)
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
8 B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko
and using Eq. (17), we can identically rearrange Eq. (13) to the form
F(a, T ) = E(a) + ∆F(a, T ) , (19)
where E(a) is the energy of the van der Waals or Casimir interaction at zero tem-
perature,
E(a) =
32π2a3
dy f(ζ, y) , (20)
and ∆F(a, T ) is the thermal correction to this energy,
∆F(a, T ) =
32π2a3
F (iτt)− F (−iτt)
e2πt − 1
. (21)
Note that, in fact, Eq. (21) describes the dependence of the free energy on the tem-
perature arising from the dependence on temperature of the Matsubara frequencies.
Thus, ∆F(a, T ) in (21) coincides with the thermal correction to the energy, defined
as F(a, T )− F(a, 0), only for plate materials with temperature independent prop-
erties.
The asymptotic expressions for the energy E(a) at both short and large sep-
arations are well known.6,37,38 Below we find the asymptotic expressions for the
thermal correction (21) under the conditions τ ≪ 1 and τ ≫ 1. Taking into account
the definition of τ in Eq. (12), the asymptotic expressions at τ ≪ 1 are applicable
both at small and large separations if the temperature is sufficiently low.
We begin with condition τ ≪ 1. Let us substitute Eq. (9) in Eqs. (14) – (16),
expand the function f(x, y) in powers of x = τt, and than integrate the obtained
expansion with respect to y from x to infinity in order to find F (x) in Eq. (18) and
F (ix)− F (−ix) in Eq. (21).
It is easy to check that f⊥(ζ, y) does not contribute to the leading, second, order
in the expansion of F (ix)− F (−ix) in powers of x. Thus, we can restrict ourselves
by the consideration of the expansion
f‖(x, y) = y ln
1− r20e−y
2ε0 r
ε0 + 1
y (ey − r20)
4ξ2c r
(ε0 + 1)ω
ey − r20
+O(x3), (22)
where r0 was defined in Eq. (11). Note that for simplicity we consider here only one
oscillator in Eq. (9) and put ωj = ω1. The case of several oscillator modes can be
considered in an analogous way.
As a next step, we integrate Eq. (22) term by term according to Eq. (18), expand
the partial results in powers of x and sum up the obtained series. Thereby we obtain
the following expressions:
Z1(x) ≡
y dy ln
1− r20e−y
(1 + nx)e−nx
= −Li3(r20)−
ln(1− r20) + O(x3) , (23)
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
Thermal Casimir force between dielectrics 9
Z2(x) ≡
ε0 + 1
y (ey − r20)
= − 2ε0x
ε0 + 1
r2n0 Ei(−nx) , (24)
Z3(x) ≡
4r20ξ
(ε0 + 1)ω
ey − r20
4ξ2cx
(ε0 + 1)ω
(1 + nx)e−nx
(ε0 + 1)ω
x2 Li2(r
1− r20
+O(x5)
, (25)
where Lin(z) is the polylogarithm function and Ei(z) is the exponential integral
function.
From these equations it follows
Z1(ix) − Z1(−ix) = O(x3) , Z3(ix)− Z3(−ix) = O(x5) , (26)
Z2(ix) − Z2(−ix) = 2iπ
ε0 + 1
1− r20
x2 +O(x3) , (27)
and, thus, Z1 and Z3 do not contribute to the leading order in the expansion of
F (ix)− F (−ix). The latter is determined by Z2 only. As a result, we arrive at
F (ix)− F (−ix) = iπ
(ε0 − 1)2
2(ε0 + 1)
x2 − iαx3 +O(x4), (28)
where r0 was substituted from Eq. (11) and α was introduced for the still unknown
real coefficient of the next to leading order resulting from Z1 and Z2 as well as,
possibly, from f⊥(ζ, y). At this stage it is difficult to determine the value of this
coefficient because all powers in the expansion of f(x, y) contribute to it. Remark-
ably, the two leading orders depend only on the static dielectric permittivity ε0 and
are not influenced by the dependence of the dielectric permittivity on the frequency
contained in Z3.
Substituting Eq. (28) in Eq. (21) and using Eq. (19), we obtain
F(a, T ) = E(a)− ~c
32π2a3
(ε0 − 1)2
ε0 + 1
τ3 − C4τ4 +O(τ5)
, (29)
where C4 ≡ α/240 and ζ(z) is the Riemann zeta function.
So far we have considered the free energy. The thermal pressure is obtained as
P (a, T ) = −∂F(a, T )
= P0(a)−
32π2a4
4 +O(τ5)
, (30)
where P0(a) = −∂E(a)/∂a is the Casimir pressure at zero temperature.
In order to determine the value of the coefficient C4 of the leading term, we
express the pressure directly through the Lifshitz formula
P (a, T ) = − ~cτ
32π2a4
1− δl0
(ζl, y)
ey − r2
(ζl, y)
r2⊥(ζl, y)
ey − r2⊥(ζl, y)
. (31)
Again, applying the Abel-Plana formula (17), we represent the pressure as follows,
P (a, T ) = P0(a) + ∆P (a, T ), (32)
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
10 B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko
where the thermal correction to P0(a), the pressure at zero temperature, is
∆P (a, T ) = − i~cτ
32π2a4
Φ(iτt)− Φ(−iτt)
e2πt − 1
and the function Φ(x) is given by
Φ(x) ≡ Φ‖(x) + Φ⊥(x) , Φ‖,⊥(x) =
y2 r2
(x, y)
ey − r2
(x, y)
. (34)
First, we determine the leading term of the expansion of Φ⊥(x) in powers of
x. For this purpose, let us introduce the new variable v = y/x and note that the
reflection coefficient r⊥(x, v) depends on x only through the frequency dependence
of ε given by Eq. (9). Thus, we can rewrite and expand Eq. (34) as follows:
Φ⊥(x) = x
v2 r2⊥(x, v)
evx − r2⊥(x, v)
v2 r2⊥(v)
1− r2⊥(v)
+ O(x4), (35)
where, according to Eq. (16),
r⊥(v) ≡ r⊥(0, v) =
v2 + ε0 − 1− v√
v2 + ε0 − 1 + v
. (36)
Integration in Eq. (35) with account of Eq. (36) results in
Φ⊥(x) =
1− ε0(3− ε0)
+ O(x4) , (37)
from which it follows:
Φ⊥(ix)− Φ⊥(−ix) = −i
ε0 (3− ε0)
+ O(x5). (38)
The expansion of Φ‖(x) from Eq. (34) in powers of x is somewhat more cumber-
some. It can be performed in the following way. As is seen from the second equality
in Eq. (26), the dependence of the dielectric permittivity on frequency contributes
to F (ix)−F (−ix) starting from only the 5th power in x. Bearing in mind the con-
nection between free energy and pressure, we can conclude that the dependence on
the frequency contributes to Φ‖(ix)−Φ‖(−ix) starting from the 4th order. We are
looking for the lowest (third) order expansion term of Φ‖(ix)−Φ‖(−ix). Because of
this, it is permissible to disregard the frequency dependence of ε and describe the
dielectric by its static dielectric permittivity.
To begin with, we identically rearrange Φ‖(x) in Eq. (34) by subtracting and
adding the two first expansion terms of the function under the integral in powers
of x,
Φ‖(x) =
(x, y)
ey − r2
(x, y)
ey − r20
ε0 + 1
(1− r20e−y)
ey − r20
− x2 2ε0
ε0 + 1
(1− r20e−y)
, (39)
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
Thermal Casimir force between dielectrics 11
and consider these three integrals separately. The first integral in terms of the new
variable v = y/x reads
Q1(x) ≡ x3
evx − r2
− v2 r
evx − r20
ε0 + 1
(1− r20e−vx)
, (40)
where, in accordance with Eq. (16),
r‖(v) ≡ r‖(0, v) =
ε0v −
v2 + ε0 − 1
ε0v +
v2 + ε0 − 1
. (41)
Expanding Q1(x) in powers of x and explicitly calculating the remaining inte-
grals for the lowest, third, power of x results in
Q1(x) = x
1− r2
1− r20
ε0 + 1
(1− r20)
+O(x4) (42)
1 + 3ε0 − 2 ε20
2 r20
1− r20
− 12 ε0
ε0 + 1
(1− r20)
+O(x4).
The second and third integrals on the right-hand side of Eq. (39) are simply deter-
mined with the following result:
Q2(x) ≡
ey − r20
(2 + 2nx+ n2x2)e−nx
= 3Li3(r
1− r20
+O(x4) , (43)
Q3(x) ≡ −
ε0 + 1
(1− r20e−y)
= − 2ε0x
ε0 + 1
1− r20e−x
= − 2ε0
ε0 + 1
1− r20
− x3 r
(1− r20)2
+O(x4) . (44)
Substituting Eqs. (42), (43) and (44) into Φ‖(x) = Q1(x) + Q2(x) +Q3(x), we
arrive at
Φ‖(ix)− Φ‖(−ix) = −i
2ε20 − 3ε0 − 1
+ O(x4). (45)
Then, by summing Eqs. (38) and (45), the result is obtained
Φ(ix)− Φ(−ix) = −i
2 + ε
0 − ε
0 − 2
+ O(x4). (46)
Now we substitute Eq. (46) in Eq. (33) and perform integration. Finally, from
Eq. (32) the desired expression for the Casimir pressure is derived
P (a, T ) = P0(a)−
32π2a4
ε0 − 1
ε20 + ε
0 − 2
τ4 +O(τ5)
. (47)
By comparison with Eq. (30) the explicit form of the coefficient C4 is found as
ε0 − 1
ε20 + ε
0 − 2
/720 (48)
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
12 B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko
and, thus, both two first perturbation orders in the expansion for the free energy
(29) are determined.
Equations (29), (47) and (48) solve the fundamental problem of the thermody-
namic consistency of the Lifshitz theory in the case of two dielectric plates. From
Eqs. (29) and (48) the entropy of the van der Waals and Casimir interaction between
plates takes the form
S(a, T ) = −
∂F(a, T )
3kBζ(3)(ε0 − 1)2
64π3a2(ε0 + 1)
τ2 (49)
2π2(ε0 + 1)
0 + 2ε0 + 2
ε0 + 2
135ζ(3)(
ε0 + 1)2
τ + O(τ2)
As is seen from Eq. (49), in the limit τ → 0 (T → 0) the lower order contributions
to the entropy are of the second and the third powers in the small parameter τ .
Thus, the entropy vanishes when the temperature goes to zero as it must be in
accordance with the third law of thermodynamics (the Nernst heat theorem).
A similar behavior was obtained for ideal metals40,72,73 and for real metals
described by the plasma model.45,52 For example, in the case of plates made of
ideal metal the entropy at low temperatures is given by40,72,73
S(a, T ) =
3kBζ(3)
32π3a2
1− 2π
135ζ(3)
τ + O(τ2)
. (50)
Note, however, that the expansion coefficients in Eq. (50) cannot be obtained as
a straightforward limit |ε0| → ∞ in Eq. (49) and the above equations for the
free energy and pressure. The mathematical reason is that it is impermissible to
interchange the limiting transitions τ → 0 and |ε0| → ∞ in the power expansions
of functions depending on ε0 as a parameter.
Remarkably, the low-temperature behavior of the free energy, pressure and en-
tropy of nonpolar dielectrics in Eqs. (29), (47) and (49) is universal, i.e., is deter-
mined only by the static dielectric permittivity. The absorption bands included in
Eq. (9) do not influence the low-temperature behavior. A more simple derivation
of the results (29), (47)–(49) for dielectrics with constant ε is contained in Ref. 74.
As was demonstrated above, all these results remain unchanged if the dependence
of dielectric permittivity on frequency is taken into account.
In Ref. 60 more general results were obtained related to two dissimilar dielectric
plates with dielectric permittivities ε(1)(ω) and ε(2)(ω). For brevity here we present
only the final expressions for the low-temperature behavior of the Casimir free
energy, pressure and entropy between dissimilar plates. They are as follows:60
F(a, T ) = E(a)− ~c
32π2a3
0 + ε
0 + 2ε
0 + 1)(ε
0 + 1)
0 − 1)(ε
0 − 1)
0 + ε
− C4 τ +O(τ2)
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
Thermal Casimir force between dielectrics 13
P (a, T ) = P0(a)−
32π2a4
4 +O(τ5)
, (51)
S(a, T ) =
3ζ(3)
0 + ε
0 + 2ε
0 + 1)(ε
0 + 1)
0 − 1)(ε
0 − 1)
0 + ε
− C4 τ +O(τ2)
Here ε
(1,2)
0 ≡ ε(1,2)(0) and the coefficient C4 is given by60
0 + ε
0 + ε
0 + 2ε
0 − ε
0 − 3ε
0 − 3ε
0 + 1
0 − ε
0 − ε
0 − 1
0 − 1
0 + ε
Artanh
0 + ε
0 − ε
0 − ε
It is easily seen that in the limit ε
0 = ε
0 = ε0 equations (51), (52) coincide with
equations (29), (47)–(49) having obtained above. Note that in the application region
of low-temperature asymptotic expressions the entropy of the Casimir interaction
between dielectric plates is nonnegative.
The obtained analytic behavior of the free energy, pressure and entropy at low
temperatures can be compared with the results of numerical computations using
the Lifshitz formula. Dielectric properties of the plates can be described by the
static dielectric permittivity or more precisely using the optical tabulated data for
the complex index of refraction. As an example, in Fig. 1 we present the thermal
corrections to the Casimir energy (a) and pressure (b) at a separation a = 400 nm
as functions of temperature in the configuration of two dissimilar plates made of
high-resistivity Si and SiO2. The dielectric permittivities of both materials along
the imaginary frequency axis were computed in Ref. 75 using the optical data of
Ref. 76. The precise thermal corrections computed by taking into account these
permittivities are shown by the solid lines and corrections computed by our analyt-
ical asymptotic expressions are shown by the long-dashed lines. Short-dashed lines
indicate the results computed by the Lifshitz formula with constant dielectric per-
mittivities of Si and SiO2 equal to ε
0 = 11.67 and ε
0 = 3.84, respectively. As is
seen in Fig. 1a,b, at T < 60K the results obtained using the analytical asymptotic
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
14 B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko
25 50 75 100 125 150 175 200
T (K)
−∆F (pJ)
25 50 75 100 125 150 175 200
T (K)
−∆P (µPa)
Fig. 1. Magnitudes of the thermal corrections to the energy (a) and pressure (b) in configuration
of two plates, one made of Si and another one of SiO2, at a separation a = 400 nm as a function of
temperature calculated by the use of different approaches: by the Lifshitz formula and tabulated
optical data (solid lines), by the Lifshitz formula and static dielectric permittivities (short-dashed
lines), by the asymptotic expressions in Eqs. (51) and (52) (long-dashed lines).
expressions practically coincide with the solid lines computed using the tabulated
optical data for the materials of the plates.
Now we return to the case of two similar dielectric plates and consider the
asymptotic expressions under the condition τ ≫ 1, i.e., at high temperatures (large
separations). It is well known37,38,77 that in this case the approximation of static
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
Thermal Casimir force between dielectrics 15
dielectric permittivity works good and the main contribution is given by the zero-
frequency term of the Lifshitz formula (13)
F(a, T ) = ~cτ
64π2a3
ydy ln
1− r20e−y
(the other terms being exponentially small). Performing the integration in Eq. (53)
we obtain
F(a, T ) = − kBT
16πa2
. (54)
In a similar manner for the Casimir pressure and entropy at τ ≫ 1 it follows
P (a, T ) = −
, S(a, T ) =
16πa2
. (55)
Equations (54) and (55) are simply generalized60 for the case of two dissimilar di-
electric plates by performing the replacement r20 → r
0 , where r
(1,2)
0 are defined
by Eq. (11) with the static dielectric permittivities of dissimilar plates ε
(1,2)
3. IS THE DC CONDUCTIVITY RELATED TO THE CASIMIR
INTERACTION BETWEEN DIELECTRICS?
As was discussed in the previous section, the zero-frequency term in formula (13),
i.e., the contribution with l = 0, is of prime importance and determines many of
the basic properties of the Casimir interaction. In the above consideration we have
described dielectric materials by Eq. (9) with finite static dielectric permittivity ε0.
This resulted in Eq. (11) where one reflection coefficient at zero frequency is larger
and the other one is smaller than the physical value for real photons at normal
incidence. However, in the Lifshitz theory, the departure of both coefficients from
their physical values is coordinated in such a way that the Nernst heat theorem
remains valid.
It is common knowledge that at nonzero temperatures dielectric materials pos-
sess a negligibly small but not equal to zero dc conductivity. From physical intuition
it is reasonable to expect that the influence of this conductivity on the van der Waals
and Casimir forces should be also negligible. In Ref. 57 it was shown that, on the
contrary, the inclusion of small dielectric dc conductivity in the model of dielectric
response leads to a large effect in dispersion forces. This raises the question if dc
conductivity is related to dispersion forces or if, on the contrary, the zero-frequency
contribution should be understood not literally but as an analytic continuation from
the region of high frequencies determining the physical phenomenon of dispersion
forces.
To illustrate this problem in more details, we consider the asymptotic behavior
of the free energy and entropy at low temperature with included dc conductivity.
What this means is that, instead of the dielectric permittivity ε(iξl) given in Eq. (9),
one uses57,58
ε̃(iξl) = ε(iξl) +
= εl(iξ) +
β(T )
, (56)
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
16 B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko
where σ0 is the dc conductivity of the plate material and β(T ) = 2~σ0/(kBT ). The
conductivity of dielectrics depends on temperature as σ0 ∼ exp(−b/T ) where b is
determined by the energy gap ∆which differs for different materials.78 The small-
ness of the dc conductivity of dielectrics can be illustrated79 by the example of
SiO2 where at T = 300K it holds β ∼ 10−12. Thus, the role of the dc conductivity
is really negligible for all l ≥ 1. In addition, β(T ) quickly decreases with decrease
of T and, as a consequence, remains negligible at any T . In spite of this, the substi-
tution of Eq. (56) into the reflection coefficients (16) leads to different result than
in Eq. (11):
r̃‖(0, y) = 1, r̃‖(0, y) = 0. (57)
Equation (57) is in some analogy to Eq. (6), obtained for metals described by the
Drude model, which leads to the violation of the Nernst heat theorem in the case
of perfect crystal lattices.
Now we substitute the dielectric permittivity ε̃l ≡ ε̃(ξl) in Eqs. (13) – (16)
instead of εl and find the Casimir free energy F̃(a, T ) with included dc conductivity.
For convenience, we separate the zero-frequency term, subtract and add the usual
zero-frequency contribution for dielectric without dc conductivity. The result is
F̃(a, T ) =
16πa2
1− e−y
1− r20e−y
16πa2
ydy ln
1− r20e−y
1− r̃2‖(ζl, y)e
1− r̃2⊥(ζl, y)e−y
where r0 was defined in Eq. (11). Let us expand the last, third, integral on the right-
hand side of Eq. (58) in powers of the small parameters β/l. Then, we combine the
zero-order contribution in this expansion with the second integral on the right-
hand side of Eq. (58) and obtain the Casimir free energy F(a, T ) calculated with
the dielectric permittivities εl. The first integral on the right-hand side of Eq. (58)
is calculated explicitly. Then, Eq. (58) can be rewritten as
F̃(a, T ) = F(a, T )− kBT
16πa2
ζ(3)− Li3
+R(a, T ). (59)
Here, R(a, T ) is of order O(β/l). It represents the first and higher-order contribu-
tions in the expansion of the third integral on the right-hand side of Eq. (58) in
powers of β/l. Restricting its explicit form to the first order contribution we get
R(a, T ) = R1(a, T ) +O
(β/l)
, (60)
R1(a, T ) =
dy y2e−y
y2 + ζ2l (εl − 1)
(2− εl) ζ2l − 2y2
y2 + ζ2l (εl − 1) + εly
r‖(ζl, y)
1− r2
(ζl, y)e−y
y2 + ζ2l (εl − 1) + y
r⊥(ζl, y)
1− r2⊥(ζl, y)e−y
. (61)
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
Thermal Casimir force between dielectrics 17
Calculating the entropy by the first equality in Eq. (49), we arrive at
S̃(a, T ) = S(a, T ) +
16πa2
ζ(3)− Li3
− ∂R(a, T )
, (62)
where S(a, T ) is the entropy for the plates with the dielectric permittivity εl given
by Eq. (49).
Let us now show that the quantity R(a, T ) exponentially goes to zero with the
decrease of T . First we consider only the integral in Eq. (61), expand the integrated
function in powers of τ (we recall that ζl = τl), restrict ourselves to the main
contribution, resulting for τ = 0, and rearrange it appropriately:
(ε0 + 1)2
1− r20e−y
= − 2
(ε20 − 1)
1− r20e−y
(ε20 − 1)
dy ye−ny = −
(ε20 − 1)
1 + nζl
e−nζl . (63)
Substituting this in Eq. (61), we find
R1(a, T ) = −
4πa2 (ε20 − 1)
e−nτl
e−nτl
= − kBTβ
4πa2 (ε20 − 1)
1− e−nτ
enτ − 1
kB Li2
2πa2 (ε20 − 1)
Tβ ln τ + TβO(τ0) . (64)
Here, the last line is obtained by using the equality
1− e−nτ
enτ − 1
= − ln τ + 1− lnn+O(τ2) , (65)
substituting only its leading term and observing the definition of the integral log-
arithm, Li2 (z) = (1/2)
zn/n2. Taking into account that β ∼ (1/T ) exp(−b/T ),
we get the conclusion that the temperature dependence of R1(a, T ) is given by
R1(a, T ) ∼ e−b/T lnT. (66)
Thus, both R1(a, T ) and its derivative with respect to T in Eqs. (59) and (62) go
to zero. The terms of the second and higher powers in β in Eq. (60) go to zero even
faster than R1 when T → 0.
Finally, in the limit T → 0 from Eq. (62) it follows
S̃(a, 0) =
16πa2
ζ(3)− Li3
> 0. (67)
The right-hand side of this equation depends on the parameter of the system under
consideration (the separation distance a) and implies a violation of the Nernst heat
theorem. An analogous result was obtained60 in the case of two dissimilar dielectrics.
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
18 B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko
The violation of the Nernst heat theorem in the Casimir interaction for dielectrics
originates from the inclusion of the dc conductivity in the model of dielectric re-
sponse. This violation is, however, of different nature than the one discussed above
in the case of Drude metals. In the case of dielectrics the entropy at zero tempera-
ture is positive but in the case of Drude metals it is negative. In the case of metals
the violation is caused by the vanishing contribution from the transverse electric
mode at zero frequency whereas the other reflection coefficient takes the physical
value 1 [see Eq. (6)]. For dielectrics the situation is quite opposite. In this case the
transverse reflection coefficient at zero frequency is always equal to zero [compare
Eqs. (11) and (57) in the absence and in the presence of the contribution from
dc conductivity]. Here the violation occurs due to the unity value of the parallel
reflection coefficient at zero frequency in Eqs. (57) which departs from the value
r0 = (ε0 − 1)/(ε0 + 1) coordinated with the zero value of the transverse coefficient
in Eq. (11).
One can conclude that the dc conductivity of a dielectric is not related to the na-
ture of the van der Waals and Casimir forces and must not be included in the model
of dielectric response. Ignoring this rule results in a violation of thermodynamics.
Physically it is amply clear that there is no fluctuating field of zero frequency and
that for such high-frequency phenomena as the van der Waals and Casimir forces the
low-frequency behavior should be obtained by analytic continuation from the region
of high frequencies. This permits to conclude that the correct procedure consists in
the substitution of the finite static dielectric permittivities into the zero-frequency
term of the Lifshitz formula, as Lifshitz and his collaborators really did.36,37,38
4. THERMAL CASIMIR FORCE BETWEEN DIELECTRIC
AND METAL PLATES
The Casimir interaction between metal and dielectric plates suggests the interest-
ing opportunity to verify the thermodynamic consistency of the Lifshitz theory
with different models of the dielectric response. This configuration was first inves-
tigated in Ref. 61 where it was proved that the Casimir entropy is in accordance
with the demands of the Nernst heat theorem if the static permittivity of the di-
electric plate is finite. In Ref. 61, however, only the first leading terms in the low-
temperature asymptotic expressions for the free energy and entropy were obtained
and the Casimir pressure was derived only in the dilute approximation. Here we
derive the more precise low-temperature behavior for the Casimir free energy, pres-
sure and entropy in the configuration of one plate made of ideal metal and another
plate made of dielectric with any finite static dielectric permittivity.
For the configuration of metal and dielectric plates the Lifshitz formula takes
the form80 analogical to Eq. (13)
F(a, T ) = ~cτ
32π2a3
1− δl0
ydy (68)
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
Thermal Casimir force between dielectrics 19
1− rM‖ (ζl, y) rD‖ (ζl, y) e−y
1− rM⊥ (ζl, y) rD⊥ (ζl, y) e−y
Here the reflection coefficients r
for metal and dielectric, respectively, are given
by Eqs. (16) where εl should be changed for ε
l = ε
M,D(iξl).
For an ideal metal rM‖,⊥(ζl, y) = 1 and Eq. (68) takes the more simple form
F(a, T ) =
32π2a3
ydy (69)
1− r‖(ζl, y)e−y
1− r⊥(ζl, y)e−y
(here and below we omit the index D near the reflection coefficient and permit-
tivity of a dielectric plate). We admit that the dielectric permittivity calculated at
Matsubara frequencies εl ≡ ε0, i.e., is equal to its static value and find the asymp-
totic behavior of Eq. (69) at small τ . [In analogy with Sec. 2 it is possible to prove
that the deviations of ε(iξl) from ε0 at high frequencies do not influence the low-
temperature behavior of the Casimir free energy, pressure and entropy. It can be
shown also that the results of this section are valid not only for ideal metal plate
but for plate made of real metal as well.] The free energy (69) can be represented
by Eqs. (18) – (21), with the function f(ζ, y) replaced by
f̂(ζ, y) = y ln
1− r‖(ζ, y)e−y
+ y ln
1− r⊥(ζ, y)e−y
≡ f̂‖(ζ, y) + f̂⊥(ζ, y). (70)
In the case of one dielectric and one metal plate both f̂‖ and f̂⊥ contribute to
F (ix)− F (−ix). The expansion of f̂(x, y) in powers of x takes the form
f̂(x, y) = y ln(1− r0e−y)−
ε0 − 1
e−y − ε0
ε0 + 1
x2 +O(x3). (71)
Now we integrate Eq. (71) in accordance with Eq. (18) to find the function F (x).
The integral of the first term on the right-hand side of Eq. (71) is evaluated using
the new variable v = y − x:
ydy ln(1− r0e−y) =
vdv ln(1− r0e−v) + O(x2), (72)
where the coefficient near the first-order contribution in x vanishes. As a result, this
term could contribute to F (ix) − F (−ix) only starting from the third expansion
order. The integrals of the second-order terms on the right-hand side of Eq. (71)
are simply calculated using the formulas
= −Ei(−x),
= −Ei(−nx). (73)
Finally, we obtain
F (ix)− F (−ix) = iπ
(ε0 − 1)2
4 (ε0 + 1)
x2 − iγ x3 +O(x4), (74)
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
20 B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko
where the unknown third order expansion coefficient is designated as γ.
Substituting Eq. (74) in Eq. (21) and using Eq. (19), we find the free energy in
the system metal-dielectric in the form
F(a, T ) = E(a)− ~c
32π2a3
(ε0 − 1)2
ε0 + 1
τ3 −K4 τ4 +O(τ5)
, (75)
where K4 ≡ γ/240.
The Casimir pressure in the configuration of metal and dielectric plates obtained
from Eq. (75) is equal to
P (a, T ) = P0(a)−
32π2a4
4 +O(τ5)
. (76)
The direct application of the Lifshitz formula gives the expression for the pressure
analogical to Eq. (31),
P (a, T ) = − ~cτ
32π2a4
1− δl0
r‖(ζl, y)
ey − r‖(ζl, y)
r⊥(ζl, y)
ey − r⊥(ζl, y)
. (77)
Using the Abel-Plana formula (17), Eq. (77) can be represented in the form of
Eqs. (32), (33) where
Φ‖,⊥(x) =
y2 r‖,⊥(x, y)
ey − r‖,⊥(x, y)
. (78)
Again, we deal first with Φ⊥(x). By adding and subtracting the asymptotic
behavior of the integrated function at small x,
y2r⊥(x, y)
ey − r⊥(x, y)
(ε0 − 1)x2e−y +O(x3) , (79)
and introducing the new variable v = y/x, the function Φ⊥(x) can be identically
rearranged and expanded in powers of x as follows:
Φ⊥(x) =
(ε0 − 1)x2e−x + x3
rn⊥(v)e
−nvx − 1
(ε0 − 1)e−vx
(ε0 − 1)x2(1− x) + x3
v2r⊥(v)
1− r⊥(v)
ε0 − 1
+O(x4). (80)
The integral on the right-hand side of Eq. (80) is converging and can be simply
calculated with the result
Φ⊥(x) =
ε0 − 1
x2 − 1
ε0 − 1)x3 +O(x4). (81)
To deal with Φ‖(x) we add and subtract in Eq. (78) the two first expansion
terms of the integrated function in powers of x,
Φ‖(x) =
ey − r0
− ε0r0e
y2(ε0 + 1) (1− r0e−y)2
r‖(x, y)
ey − r‖(x, y)
ey − r0
ε0r0e
y2(ε0 + 1) (1− r0e−y)2
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
Thermal Casimir force between dielectrics 21
The asymptotic expansion of the first integral on the right-hand side of Eq. (82) is
given by
2Li3(r0)−
ε0(ε0 − 1)
2(ε0 + 1)
(ε0 − 1)(3ε0 − 2)x3 +O(x4), (83)
and of the second one by
ε0(ε0 − 1)−
ε0(ε0
ε0 − 1) +
ε0(ε0 − 1)
x3 +O(x4). (84)
By summing Eqs. (83) and (84) we find
Φ‖(x) = 2Li3(r0)−
ε0(ε0 − 1)
2(ε0 + 1)
x2 (85)
(ε0 − 1) + (ε0
ε0 − 1)− 3ε0(ε0 − 1)
x3 +O(x4).
Finally we add Eq. (85) to Eq. (81) and arrive at
Φ(ix) − Φ(−ix) = −i
1− 2ε0
ε0 + ε
x3 +O(x4). (86)
Substituting this in Eq. (33) and using Eq. (32), the asymptotic expression for the
Casimir pressure is obtained
P (a, T ) = P0(a)−
32π2a4
1− 2ε0
ε0 + ε
τ4 +O(τ5)
. (87)
Comparing Eqs. (87) and (76), the explicit expression for the coefficient K4 reads
1− 2ε0
ε0 + ε
. (88)
Now we are in a position to find the asymptotic behavior of the entropy in the
limit of low temperatures in the configuration of two parallel plates one of which
is metallic and the other one dielectric. By calculating the negative derivative of
Eq. (75) with respect to temperature, one arrives at
S(a, T ) =
3kBζ(3)(ε0 − 1)2
128π3a2(ε0 + 1)
τ2 (89)
8π2(ε0 + 1)
1− 2ε0
ε0 + ε
135ζ(3)(ε0 − 1)2
τ +O(τ2)
This equation is in analogy to Eq. (49) obtained for the case of two dielectrics.
As is seen from Eq. (89), the entropy of the Casimir and van der Waals interactions
between metal and dielectric plates vanishes when the temperature goes to zero,
i.e., the Nernst heat theorem is satisfied. Note that the first term of order τ2 on the
right-hand side of Eq. (89) was already obtained in Ref. 61. It is notable also that at
low temperatures the entropy goes to zero remaining positive. At the same time, as
was shown in Ref. 61, at larger temperatures entropy is nonmonotonous and may
take negative values. This interesting property distinguishes the configuration of
metal and dielectric plates from two dielectric plates. In the latter configuration the
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
22 B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko
negative entropy appears only for nonphysical dielectrics with anomalously large
and frequency independent dielectric permittivities.53
Now let us consider the case τ ≫ 1, i.e., high temperatures (large separations).
In the same way as for two dielectric plates, here the main contribution to the free
energy is given by the term with l = 0 in Eq. (69),
F(a, T ) = ~cτ
64π2a3
ydy ln
1− r0e−y
. (90)
Performing the integration in Eq. (90), we obtain
F(a, T ) = − kBT
16πa2
Li3 (r0) . (91)
From this equation, for the Casimir pressure and entropy at τ ≫ 1 it follows
P (a, T ) = − kBT
Li3 (r0) , S(a, T ) =
16πa2
Li3 (r0) . (92)
The results (91) and (92) are analogous to (54) and (55) for two dielectric plates.
5. THE PROBLEM ORIGINATING FROM THE ACCOUNT OF
DIELECTRIC DC CONDUCTIVITY
In the previous section it was supposed that the static dielectric permittivity ε0 of
the dielectric plate is finite. Now we will deal with the configuration of metal and
dielectric plates with included dc conductivity of the dielectric material. In doing
so the permittivity of the dielectric plate is given by
ε̃(iξl) = ε0 +
β(T )
, (93)
where all notations were introduced in and below Eq. (56). Thus, the reflection
coefficients at zero frequency satisfy Eq. (4) for a plate made of ideal metal and
Eq. (57) for a plate made of dielectric with included dc conductivity. Let us find
the low-temperature behavior of the Casimir entropy and verify the consistency of
the Lifshitz theory with thermodynamics in this nonstandard situation.
For this purpose we substitute the dielectric permittivity (93) in Eq. (69) in-
stead of ε0 and find the Casimir energy F̃(a, T ) with included dc conductivity of
a dielectric plate. In the same way as in Sec. 3, it is convenient to separate the
zero-frequency term of F̃(a, T ) and subtract and add the usual zero-frequency con-
tribution for metal-dielectric plates computed with the dielectric permittivity ε0,
F̃(a, T ) = kBT
16πa2
1− e−y
1− r0e−y
16πa2
ydy ln
1− r0e−y
1− r̃‖(ζl, y)e−y
1− r̃⊥(ζl, y)e−y
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
Thermal Casimir force between dielectrics 23
Here the reflection coefficients r̃‖,⊥ are calculated with the permittivity (93). We
expand the third integral on the right-hand side of Eq. (94) in powers of the small
parameter β/l. The zero-order contribution in this expansion together with the
second integral of Eq. (94) form the Casimir free energy F(a, T ) calculated with
dielectric permittivity ε0. The first integral on the right-hand side of Eq. (94) is
calculated explicitly. As a result, Eq. (94) is rearranged to
F̃(a, T ) = F(a, T )− kBT
16πa2
[ζ(3)− Li3 (r0)] +Q(a, T ), (95)
where Q(a, T ) contains the first and higher-order contributions in the expansion of
the third integral on the right-hand side of Eq. (94) in powers of β/l. The explicit
form of the main first-order term in Q(a, T ) is the following:
Q1(a, T ) =
dy y2e−y
y2 + ζ2l (ε0 − 1)
(2− ε0)ζ2l − 2y2
y2 + ζ2l (ε0 − 1) + ε0y
1− r‖(ζl, y)e−y
y2 + ζ2l (ε0 − 1) + y
1− r⊥(ζl, y)e−y
. (96)
In the same way as in Sec. 3, we expand the integrated function in Eq. (96) in
powers of τ (bearing in mind that ζl = τl) and preserve only the main contribution
at τ = 0:
Q1(a, T ) = −
dy yr0e
(ε20 − 1) (1− r0e−y)
= − kBTβ
4πa2(ε20 − 1)
e−nτl
e−nτl
Dealing with this expression in the same way as with Eq. (64), we arrive at
Q1(a, T ) ∼ e−b/T lnT. (98)
The Casimir entropy in the configuration of metal and dielectric plates with
included dc conductivity of the dielectric plate is obtained as minus derivative of
Eq. (95) with respect to temperature,
S̃(a, T ) = S(a, T ) +
16πa2
[ζ(3)− Li3 (r0)]−
∂Q(a, T )
. (99)
Using Eq. (98), the calculation of the limiting value at T → 0 is straightforward:
S̃(a, 0) =
16πa2
[ζ(3)− Li3 (r0)] > 0. (100)
From this equation it follows that the inclusion of the dc conductivity of dielectric
plate in the configuration metal-dielectric results in a violation of the Nernst heat
theorem. In the above this result was obtained for a metallic plate made of ideal
metal. It can be shown that it remains valid for a metallic plate made of real metal
with finite conductivity.
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
24 B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko
Thus, both configurations (two dielectric plates or one metal plate and one
dielectric) lead to the same conclusion that when the dc conductivity is included
in the model of dielectric response of the dielectric plate, the Lifshitz theory loses
its consistency with thermodynamics. This confirms the conclusion made in Sec. 3
that the actual properties of dielectric materials at very low frequencies are in fact
not related to the van der Waals and Casimir forces.
6. QUALITATIVE DISCUSSION OF CASIMIR INTERACTION
BETWEEN METAL AND SEMICONDUCTOR
As was mentioned in the Introduction, semiconductors possess a wide variety of
electric and optical properties ranging from metallic to dielectric. This opens the
possibility to modulate the van der Waals and Casimir forces by changing the charge
carrier density. Bearing in mind the discussed above problems on the consistency
of the Lifshitz theory with thermodynamics, semiconductors can provide us with
a test for the validity of different approaches. Thus, if for good dielectric the dc
conductivity does not play any real role in the van der Waals and Casimir forces,
the question arises on how much it should be increased in order to become a relevant
factor in the description of dispersion forces.
In Ref. 81 the Casimir force acting between two Si plates was calculated using the
simple analytic expression for Si dielectric permittivity as a function of frequency.
The complete tabulated optical data of Si were used in Ref. 75 to calculate the van
der Waals interaction of different atoms with a Si wall. The first attempt to measure
the van der Waals force between a glass lens and a Si plate and to modify it by light
due to the change of carrier density was undertaken in Ref. 82. However, glass is a
dielectric and therefore the electric forces due to localized point charges could not
be controlled. This might explain that no force change occured on illumination at
small separations where the effect should be most pronounced.
The first measurements of the Casimir force between a gold coated sphere and
a single crystal Si plate were performed in Refs. 62, 63 by means of the atomic
force microscope. The experiments used a p-type B doped Si plate of resistivity
ρ = 0.0035 Ω cm. The chosen resistivity of the plate is in some sense intermediate
between the resistivity of metals (which is usually two or three orders of magnitude
lower) and the resistivity of dielectrics (it can be by about a factor of 105 larger; for
instance, high-resistivity “dielectric” Si has the resistivity ρ0 = 1000 Ω cm). Thus,
the used Si plate had a relatively large absorption typical for semiconductors but it
was also enough conductive to avoid the accumulation of charges.63
In Fig. 2, taken from Ref. 63, the differences of the theoretical and mean ex-
perimental Casimir forces acting between Au sphere and Si plate are presented as
functions of separation. In Fig. 2a the theoretical force F theor is computed using
the Lifshitz formula and the dielectric permittivity of a Si plate with the relatively
low resistivity ρ used in experiment. This dielectric permittivity goes to infinity as
ξ−1 with decreasing frequency (see the solid line in Fig. 3). In Fig. 2b the theo-
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
Thermal Casimir force between dielectrics 25
80 100 120 140
  (nm)
F theor − F̄ expt (pN)
80 100 120 140
 (nm)
F̃ theor − F̄ expt (pN)
Fig. 2. Differences of the theoretical and mean experimental Casimir forces versus separation.
Theoretical forces are computed (a) for the Si plate used in experiment and (b) for dielectric Si.
Solid and dashed lines indicate 95 and 70% confidence intervals, respectively.
retical Casimir force F̃ theor is computed using the dielectric permittivity of the Si
plate made of “dielectric” Si with high resistivity ρ0. The dielectric permittivity of
high-resistivity Si is shown by the dashed line in Fig. 3. It is characterized by the
finite static value εSi(0) = 11.67. The solid and dashed lines in Fig. 2a,b indicate
the 95% and 70% confidence intervals, respectively. As is seen from Fig. 2, the the-
oretical approach using the dielectric permittivity of high-resistivity “dielectric” Si
is excluded by experiment within the separation range from 60 to 110nm at 70%
confidence. At the same time, the theory using the dielectric permittivity of Si with
a low resistivity ρ is consistent with experiment.
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
26 B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko
13.5 14 14.5 15 15.5 16 16.5 17
Log10 [ξ (rad/s)]
ε (iξ)D
Fig. 3. Dielectric permittivity of Si plate used in experiment along the imaginary frequency axis
(solid line). Dashed line shows the dielectric permittivity of dielectric Si.
The above results suggest an approach on how to correctly determine the pos-
sible role of the low-frequency conductivity properties in dispersion forces. As is
seen from Fig. 3, the dielectric permittivity of low-resistivity Si (solid line) sig-
nificantly departs from the dielectric permittivity of “dielectric” Si in the region
around the important dimensional parameter of the problem, the characteristic fre-
quency c/(2a) ∼ 1014 − 1015 rad/s. That is why the low-resistivity sample cannot
be described at low frequencies by the static dielectric permittivity of Si equal to
εSi(0) = 11.67. To describe it, the term β(T )/l, like in Eq. (93), should be added
to εSi(0). Note that in this case β(T )/l > 1 at the first Matsubara frequencies with
l = 1, 2, 3, . . . and, thus, this quantity cannot be considered as a small parameter.
At the same time, for a high-resistivity sample the inclusion of the dc conduc-
tivity would lead to deviations from the dashed line in Fig. 3 only at frequencies
ξ < 108 rad/s, which are much less than the characteristic frequency. This compari-
son permits to make a conclusion in what experimental situations the conductivity
properties of semiconductors at low frequencies should be taken into account and
when they should be omitted as being not related to dispersion forces.
The future experiments on the modification of semiconductor charge carrier
density by laser light83 will bring a more clear understanding of this problem on
the connection between the low-frequency material properties and dispersion forces.
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
Thermal Casimir force between dielectrics 27
7. DOES SPATIAL DISPERSION LEAD TO AN IMPORTANT
IMPACT ON THERMAL CASIMIR FORCE?
The presented above new analytic results on the low-temperature behavior of the
Casimir free energy, pressure and entropy between dielectrics or between dielectric
and metal are based on the conventional Lifshitz theory which describes dielectric
materials by means of the frequency dependent dielectric permittivity. In fact, the
assumption that the material of the plates possesses only the frequency dispersion
means that the components of electric displacement are connected with the compo-
nents of electric field by the relation
Dk(r, ω) = εkl(r, ω)El(r, ω). (101)
This equation is central in all different derivations of the Lifshitz formula (see, e.g.,
Refs. 2, 6, 36–38, 69, 70, 84). The effects of spatial nonlocality (spatial dispersion)
are in fact essential only at shortest separations between the plates comparable with
atomic dimensions and also for metals at sufficiently large separations (typically of
about 2–3µm) in the frequency region of the anomalous skin effect. The Casimir
force in the latter region was described by the Lifshitz theory reformulated in terms
of the Leontovich impedance.85
Recently the spatial dispersion came to the attention in connection with the
thermal Casimir force.64,65,66,67,68 In particular, it was claimed64,65 that for real
metals at any separation the account of spatial dispersion leads to practically the
same result (6) for the reflection coefficients at zero frequency as was obtained earlier
using the Drude model dielectric function (5). This conclusion, if it is correct, not
only returns us to the contradiction with experiment30,32,33 but also casts doubts
on all results obtained by means of the conventional Lifshitz theory accounting for
only the frequency dispersion. It is natural when the spatial dispersion contributes
a small fraction of a percent as it is generally believed in numerous applications of
the Lifshitz theory. It is, however, quite another matter when the account of the
spatial dispersion results in some “dramatic effects”,64 i.e., in several hundred times
larger thermal correction than is obtained in the local case. Below we demonstrate
that the conclusions of Refs. 64 – 68 are in fact not reliable because they use the
Lifshitz theory of dispersion forces outside of its application range.86
To find the electromagnetic modes associated with an empty gap between the
plates, Refs. 64 – 68 use the standard continuity boundary conditions,
E1t = E2t, B1n = B2n, D1n = D2n, B1t = B2t, (102)
which are commonly applied in the derivation of the Lifshitz formula for spatially
local nonmagnetic materials. Here B is the magnetic induction, n is the normal to
the boundary directed inside the medium, the subscripts n, t refer to the normal
and tangential components, respectively, the subscript 1 refers to the vacuum and
subscript 2 to the plate material. In Refs. 64 – 68 the spatial dispersion is described
by the longitudinal and transverse dielectric permittivities depending on the wave
vector and frequency: εkl = εkl(q, ω). However, as is shown below, in the theory
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
28 B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko
of the Casimir effect both the boundary conditions (102) and permittivities εkl =
εkl(q, ω) are inapplicable.
We start from the boundary conditions and recall the set of Maxwell equations
in a metal describing the Casimir effect,
rotE +
= 0, divD = 0, rotB − 1
= 0 divB = 0. (103)
Equations (103) do not contain any external, i.e., independent on E, D and B,
current or charge densities. The definition of the electric displacement is
+ 4πi, (104)
where the volume current i is induced by E and B and takes into account the
conduction electrons.
In electrodynamics with spatial dispersion the electric field and magnetic induc-
tion are finite at the boundary surfaces, whereas the electric displacement can tend
to infinity.87 Then, integrating Eqs. (103) over the thickness of the boundary layer
as is done in Ref. 88, we reproduce the first two conditions in Eq. (102) and arrive
at the modified third and fourth conditions,87,89
E1t = E2t, B1n = B2n, D2n −D1n = 4πσ, [n× (B2 −B1)] =
j, (105)
where the induced surface charge and current densities are given by
div[n× [D × n]]dl, j = 1
dl. (106)
Note that the boundary conditions (105), (106) are obtained from the macroscopic
Maxwell equations for physical fields. They should not be mixed with the boundary
conditions arising in perturbative theories and for the fictitious fields (see below).
In linear electrodynamics for a medium with time-independent properties with-
out spatial dispersion the material equation connecting the electric displacement
and electric field takes the form
Dk(r, t) =
ε̂kl(r, t− t′)El(r, t′)dt′. (107)
According to this equation, the electric displacement at a point r and moment t is
determined by the electric field at the same point r at different moments t′ ≤ t (the
spatial dispersion is absent but the temporal may be present). It is easily seen that
the substitution of Eq. (107) in Eq. (106) leads to σ = 0, j= 0 and, as a result, the
boundary conditions (105) coincide with the standard continuity conditions (102). It
is unjustified, however, to use conditions (102) in the presence of spatial dispersion.
In Refs. 87, 90 a few examples are presented illustrating that in this case neither σ
nor j is equal to zero.
We now turn to a discussion of the use of dielectric permittivity εkl(q, ω) in the
theory of the Casimir effect with account of spatial dispersion. In the presence of
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
Thermal Casimir force between dielectrics 29
only frequency dispersion, it is possible to perform the Fourier transformation of
the fields
E(r, t) =
E(r, ω)e−iωtdω, D(r, t) =
D(r, ω)e−iωtdω, (108)
in Eq. (107) and arrive at Eq. (101) where
εkl(r, ω) =
ε̂kl(r, τ)e
iωτdτ (109)
is the frequency-dependent dielectric permittivity and τ ≡ t− t′. In fact Eqs. (101),
(108) and (109) are used in parallel with the boundary conditions (102) in all
derivations of the Lifshitz formula.2,6,36,37,38,69,70,84
If the material of the plates is characterized not only by temporal but also spatial
dispersion, Eq. (107) is generalized to
Dk(r, t) =
dr′ε̂kl(r, r
′, t− t′)El(r′, t′). (110)
If the material medium were uniform in space (i.e., all points were equivalent), the
kernel ε̂ would not depend on r and r′ separately, as in Eq. (110), but on the
difference R ≡ r − r′. In this case, by performing the Fourier transformation,
E(r, t) =
dqE(q, ω)ei(qr−ωt),
D(r, t) =
dqD(q, ω)ei(qr−ωt), (111)
and substituting it in Eq. (110), one could introduce the dielectric permittivities
εkl(q, ω) =
dR ε̂kl(R, τ)e
−i(qR−ωτ), (112)
as Refs. 64–68 do, and rearrange Eq. (110) to the form
Dk(q, ω) = εkl(q, ω)El(q, ω). (113)
In the Casimir effect, however, the material medium is not uniform due to the
presence of a macroscopic gap between the two plates (half spaces). Because of this,
the assumption that the kernel ε̂ depends only on R and τ is wrong. As a result,
it is not possible to introduce the dielectric permittivity εkl(q, ω) depending on
the wave vector and frequency. In fact, for systems with spatial dispersion in the
presence of boundaries the kernel ε̂ depends not only on R and τ but also on the
distance from the boundary.87 In this complicated situation the following approx-
imate phenomenological approach is sometimes applicable.87 For electromagnetic
waves with a wavelength λ the kernel ε̂(r, r′, τ) in Eq. (110) differs essentially from
zero only in a certain vicinity of the point r with characteristic dimensions l ≪ λ
(for nonmetallic condensed media l is of the order of the lattice constant). Then it is
reasonable to assume that ε̂ is a function of R=r–r′, except for a layer of thickness
l adjacent to the boundary surface. If one is mostly interested in bulk phenomena
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
30 B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko
and neglects the role and influence of a subsurface layer, the quantity εkl(q, ω) may
be employed as a reasonable approximation.
This approximate phenomenological approach is widely applied in the theoretical
investigation of the anomalous skin effect (see, e.g., Ref. 91). Note that in Ref. 91
some kind of fictitious infinite system was introduced and electromagnetic fields in
this system are discontinuous on the boundary surface. This discontinuity should
not be confused with the discontinuity of physical fields of a real system in the
presence of spatial dispersion described by Eqs. (105) and (106). (There is also
another approach to the description of the anomalous skin effect in polycrystals
using the generalizations of the local Leontovich impedance which takes into account
the shape of Fermi surface92). The frequency and wave vector dependent dielectric
permittivity in the presence of boundaries is also approximately applied in the
theory of radiative heat transfer93 or in the study of electromagnetic interaction of
molecules with metal surfaces.94 In all these applications the boundary effects are
usually taken into account by the boundary conditions (105) supplemented by so
called “additional boundary conditions”.
It is unlikely, however, that the approximate phenomenological approach using
such quantity as εkl(q, ω) in the presence of boundaries would be applicable in the
theory of the Casimir force where the boundary effects on the zero-point electromag-
netic oscillations are of prime importance. It is notable also that this approach faces
serious theoretical difficulties including the violation of the law of conservation of
energy.95 It is then not surprising that the application of this approach in Refs. 64,
65 results in the contribution to the Casimir free energy from the transverse electric
mode which is in contradiction with experiment.30,32,33
One more shortcoming of Refs. 64–68 is that they substitute the dielectric per-
mittivity εkl(q, ω), depending on both wave vector and frequency, into the conven-
tional Lifshitz formula derived in the presence of only temporal dispersion. In the
famous review paper96 it has been noticed, however, that with the inclusion of spa-
tial dispersion the free energy of a fluctuating field takes the form F = FL +∆F ,
where FL is given by the conventional Lifshitz expression derived in a spatially
local case and written in terms of the Fresnel reflection coefficients, and ∆F is an
additional term which can be expressed in terms of the thermal Green’s function of
the electromagnetic field and polarization operator. Review96 calls as not reliable
the results of, e.g., Ref. 97 obtained by the substitution of dielectric permittivity
εkl(q, ω), taking account of spatial dispersion, into the conventional Lifshitz for-
mula. It can be true that the Lifshitz formula written in terms of general reflection
coefficients is applicable in both spatially local and nonlocal situations. However,
as far as the exact reflection coefficients in a spatially nonlocal case are unknown,
the use of some approximate phenomenological models, elaborated in literature for
applications different than the Casimir effect, may lead to incorrect results for ∆F
and create inconsistencies with experiment.
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
Thermal Casimir force between dielectrics 31
To conclude this section, the results of Refs. 64–68 on the influence of spatial
nonlocality on the Casimir interaction are shown to be not reliable. Although at
present there is no fundamental theory of the thermal Casimir force incorporating
spatial dispersion, there is no reason to expect that it can play any significant role
in the frequency region of infrared optics (experimental separations) or normal skin
effect (i.e., at separations between plates greater than 4–5µm).
8. CONCLUSIONS AND DISCUSSION
In the foregoing we have presented the derivation of analytic asymptotic expres-
sions for the free energy, pressure and entropy of the Casimir interaction between
two dielectric plates and between metal and dielectric plates at low and high temper-
atures. It was shown that the low-temperature behavior of the Casimir interaction
between dielectrics and between dielectric and metal is determined by the static
dielectric permittivities of nonpolar dielectrics. The obtained results were shown to
be in agreement with thermodynamics when the static dielectric permittivities of
dielectrics are finite. In particular, the entropy of the Casimir interaction goes to
zero when the temperature vanishes, i.e., the Nernst heat theorem is satisfied. This
demonstrates the consistency of the original Lifshitz’s approach to the van der Waals
forces between dielectrics which disregards the small conductivity of dielectrics at
constant current.
The second important result shown above is that the inclusion of the dc con-
ductivity of dielectrics into the model of dielectric response leads not to some small
corrections to the characteristics of the Casimir interaction, as one could expect,
but makes the Lifshitz theory inconsistent with thermodynamics leading to the vi-
olation of the Nernst heat theorem. This reveals that real material properties at
very low, quasistatic frequencies are in fact not related to the phenomenon of the
van der Waals and Casimir forces which is actually determined by sufficiently high
frequencies. In this case the zero-frequency contribution to the Casimir force should
be understood not literally but as analytic continuation to zero of the material
physical behavior in the region around the characteristic frequency.
The presented results provide a basis for the calculation of the van der Waals
and Casimir forces between real materials. Such calculations are much needed for
numerous applications of the Casimir force discussed in the Introduction, in par-
ticular for the applications in nanotechnology and for constraining predictions of
fundamental physical theories beyond the Standard Model. Bearing in mind that
semiconductors are the main constituent materials in nanotechnological devices, it
is a subject of high priority to understand the Casimir effect with semiconductor
boundaries. In this connection we have discussed new experimental results and the-
oretical ideas on the Casimir interaction between a metal sphere and semiconductor
plate. It was stressed that by changing the charge carrier density in the semicon-
ductor it is possible to bring it in different intermediate states between metallic
and dielectric. In this case the problem arises when the conductivity properties of
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
32 B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko
semiconductor are not related to dispersion forces and when they are becoming rel-
evant. A criterion for the resolution of this problem was formulated based on the
relation between the typical frequency at which the dc conductivity properties come
into play and the characteristic frequency of the Casimir effect. In fact the thermal
Casimir interaction between semiconductors remains an open question and much
work should be done both in experiment and theory to gain a better insight into
this subject.
The last major problem discussed in this review is whether or not the spatial
dispersion influences essentially the thermal Casimir force between real materials. In
the present literature there is no agreement on this subject. We adduced arguments
in favor of the statement that in the region of experimental separations the influence
of the spatial dispersion on the Casimir force is negligible small. The statements
on the opposite, contained in the literature, were shown to be not reliable because
they are obtained by the application of the Lifshitz theory outside of its application
range. At the same time it was ascertained that at the moment there is no consistent
fundamental theory of the van der Waals and Casimir forces taking the spatial
dispersion into account. This is the problem to solve in the foreseeable future.
ACKNOWLEDGMENTS
G.L.K. and V.M.M. are grateful to the Center of Theoretical Studies and the Insti-
tute for Theoretical Physics, Leipzig University for their kind hospitality. This work
was supported by Deutsche Forschungsgemeinschaft grant 436RUS113/789/0-2.
References
1. H. B. G. Casimir, Proc. K. Ned. Akad. Wet. 51, 793 (1948).
2. P. W. Milonni, The Quantum Vacuum (Academic Press, San Diego, 1994).
3. V. M. Mostepanenko and N. N. Trunov, The Casimir Effect and its Applications
(Clarendon Press, Oxford, 1997).
4. K. A. Milton, The Casimir Effect (World Scientific, Singapore, 2001).
5. M. Kardar and R. Golestanian, Rev. Mod. Phys. 71, 1233 (1999).
6. M. Bordag, U. Mohideen and V. M. Mostepanenko, Phys. Rep. 353, 1 (2001).
7. S. K. Lamoreaux, Rep. Progr. Phys. 68, 201 (2005).
8. P. Candelas and S. Weinberg, Nucl. Phys. B237, 397 (1984).
9. M. Bordag, B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko, Phys. Rev. D58,
075003 (1998); D60, 055004 (1999); D62, 011701(R) (2000).
10. J. C. Long, H. W. Chan and J. C. Price, Nucl. Phys. B539, 23 (1999).
11. E. Fischbach, D. E. Krause, V. M. Mostepanenko and M. Novello, Phys. Rev. D64,
075010 (2001).
12. V. M. Mostepanenko, Int. J. Mod. Phys. A17, 722 (2002); A17, 4143 (2002).
13. G. L. Klimchitskaya and U. Mohideen, Int. J. Mod. Phys. A17, 4143 (2002).
14. J. F. Babb, G. L. Klimchitskaya and V. M. Mostepanenko, Phys. Rev. A70, 042901
(2004).
15. M. Antezza, L. P. Pitaevskii and S. Stringari, Phys. Rev. A70, 053619 (2004).
16. M. Krech, The Casimir Effect in Critical Systems (World Scientific, Singapore, 1994).
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
Thermal Casimir force between dielectrics 33
17. H. B. Chan, V. A. Aksyuk, R. N. Kleiman, D.J. Bishop and F. Capasso, Science 291,
1941 (2001); Phys. Rev. Lett. 87, 211801 (2001).
18. E. V. Blagov, G. L. Klimchitskaya and V. M. Mostepanenko, Phys. Rev. B71, 235401
(2005).
19. E. Elizalde, Ten Physical Applications of Spectral Zeta Functions (Springer, Berlin,
1995).
20. S. K. Lamoreaux, Phys. Rev. Lett. 78, 5 (1997).
21. U. Mohideen and A. Roy, Phys. Rev. Lett. 81, 4549 (1998).
22. G. L. Klimchitskaya, A. Roy, U. Mohideen and V. M. Mostepanenko, Phys. Rev. A60,
3487 (1999).
23. A. Roy and U. Mohideen, Phys. Rev. Lett. 82, 4380 (1999).
24. A. Roy, C.-Y. Lin and U. Mohideen, Phys. Rev. D60, 111101(R) (1999).
25. B. W. Harris, F. Chen and U. Mohideen, Phys. Rev. A62, 052109 (2000).
26. T. Ederth, Phys. Rev. A62, 062104 (2000).
27. G. Bressi, G. Carugno, R. Onofrio and G. Ruoso, Phys. Rev. Lett. 88, 041804 (2002).
28. F. Chen, U. Mohideen, G. L. Klimchitskaya and V. M. Mostepanenko, Phys. Rev.
Lett. 88, 101801 (2002).
29. F. Chen, U. Mohideen, G. L. Klimchitskaya and V. M. Mostepanenko, Phys. Rev.
A66, 032113 (2002).
30. R. S. Decca, E. Fischbach, G. L. Klimchitskaya, D. E. Krause, D. López and
V. M. Mostepanenko, Phys. Rev. D68, 116003 (2003).
31. F. Chen, G. L. Klimchitskaya, U. Mohideen and V. M. Mostepanenko, Phys. Rev.
A69, 022117 (2004).
32. R. S. Decca, D. López, E. Fischbach, G. L. Klimchitskaya, D. E. Krause and
V. M. Mostepanenko, Ann. Phys. (N.Y.) 318, 37 (2005).
33. G. L. Klimchitskaya, R. S. Decca, D. López, E. Fischbach, D. E. Krause and
V. M. Mostepanenko, Int. J. Mod. Phys. A20, 2205 (2005).
34. Y. Srivastava, A. Widom and M. H. Friedman, Phys. Rev. Lett. 55, 2246 (1985).
35. Y. Srivastava and A. Widom, Phys. Rep. 148, 1 (1987).
36. E. M. Lifshitz, Sov. Phys. JETP 2, 73 (1956).
37. I. E. Dzyaloshinskii, E. M. Lifshitz and L. P. Pitaevskii, Sov. Phys. Usp. 4, 153 (1961).
38. E. M. Lifshitz and L. P. Pitaevskii, Statistical Physics, Part II (Pergamon, Oxford,
1980).
39. J. Schwinger, L. L. DeRaad and K. A. Milton, Ann. Phys. (N.Y.) 115, 1 (1978).
40. L. S. Brown and G. J. Maclay, Phys. Rev. 184, 1272 (1969).
41. M. Boström and B. E. Sernelius, Phys. Rev. Lett. 84, 4757 (2000).
42. J. Feinberg, A. Mann and M. Revzen, Ann. Phys. (N.Y.) 288, 103 (2001).
43. A. Scardiccio and R. L. Jaffe, Nucl. Phys. B743, 249 (2006).
44. C. Genet, A. Lambrecht and S. Reynaud, Phys. Rev. A62, 012110 (2000).
45. M. Bordag, B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko, Phys. Rev. Lett.
85, 503 (2000).
46. J. S. Høye, I. Brevik, J. B. Aarseth and K. A. Milton, Phys. Rev. E67, 056116 (2003).
47. I. Brevik, J. B. Aarseth, J. S. Høye and K. A. Milton, Phys. Rev. E71, 056101 (2005).
48. V. B. Bezerra, G. L. Klimchitskaya and C. Romero, Phys. Rev. A65, 012111 (2002).
49. B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko, Phys. Rev. A67, 062102
(2003).
50. J. R. Torgerson and S. K. Lamoreaux, Phys. Rev. E70, 047102 (2004).
51. S. K. Lamoreaux and W. T. Buttler, Phys. Rev. E71, 036109 (2005).
52. V. B. Bezerra, G. L. Klimchitskaya and V. M. Mostepanenko, Phys. Rev. A66, 022112
(2002).
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
34 B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko
53. V. B. Bezerra, G. L. Klimchitskaya, V. M. Mostepanenko and C. Romero, Phys. Rev.
A69, 022119 (2004).
54. V. B. Bezerra, R. S. Decca, E. Fischbach, B. Geyer, G. L. Klimchitskaya, D. E. Krause,
D. López, V. M. Mostepanenko and C. Romero, Phys. Rev. E73, 028101 (2006).
55. I. Brevik, S. A. Ellingsen and K. A. Milton, New J. Phys. 8, 236 (2006).
56. V. M. Mostepanenko, V. B. Bezerra, R. S. Decca, E. Fischbach, B. Geyer, G. L. Klim-
chitskaya, D. E. Krause, D. López and C. Romero, J. Phys. A39, 6589 (2006).
57. J. R. Zurita-Sánchez, J.-J. Greffet and L. Novotny, Phys. Rev. A69, 022902 (2004).
58. K. Joulain, J.-P. Mulet, F. Marquier, R. Carminati and J.-J. Greffet, Surf. Sci. Rep.
57, 59 (2005).
59. B. S. Stipe, H. J. Mamin, T. D. Stowe, T. W. Kenny and D. Rugar, Phys. Rev. Lett.
87, 096801 (2001).
60. B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko, Phys. Rev. D72, 085009
(2005).
61. B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko, Phys. Rev. A72, 022111
(2005).
62. F. Chen, U. Mohideen, G. L. Klimchitskaya and V. M. Mostepanenko, Phys. Rev.
A72, 020101(R) (2005); A73, 019905(E) (2006).
63. F. Chen, U. Mohideen, G. L. Klimchitskaya and V. M. Mostepanenko, Phys. Rev.
A74, 022103 (2006).
64. B. E. Sernelius, Phys. Rev. B71, 235114 (2005).
65. V. B. Svetovoy and R. Esquivel, Phys. Rev. E72, 036113 (2005).
66. R. Esquivel and V. B. Svetovoy, Phys. Rev. A69, 062102 (2004).
67. R. Esquivel, C. Villarreal and W. L. Mochán, Phys. Rev. A68, 052103 (2003).
68. A. M. Contreras-Reyes and W. L. Mochán, Phys. Rev. A72, 034102 (2005).
69. J. Mahanty and B. W. Ninham, Dispersion Forces (Academic, New York, 1976).
70. V. A. Parsegian and B. W. Ninham, Nature 224, 1197 (1969).
71. L. Bergström, Adv. Coll. Interface Sci. 70, 125 (1997).
72. H. Mitter and D. Robaschik, Eur. Phys. J. B13, 335 (2000).
73. B. W. Ninham and J. Daicic, Phys. Rev. A57, 1870 (1998).
74. G. L. Klimchitskaya, B. Geyer and V. M. Mostepanenko, J. Phys. A39, 6495 (2006).
75. A. O. Caride, G. L. Klimchitskaya, V. M. Mostepanenko and S. I. Zanette, Phys. Rev.
A71, 042901 (2005).
76. Handbook of Optical Constants of Solids, ed. E. D. Palik (Academic, New York, 1985).
77. B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko, Int. J. Mod. Phys. A16,
3291 (2001).
78. J. C. Slater, Insulators, Semiconductors and Metals. Quantum Theory of Molecules
and Solids, Vol. 3, (McGraw-Hill, New York, 1967).
79. Materials Science and Engineering Handbook, eds. J. F. Shackelford and W. Alexander
(CRC, Boca Raton, 2001).
80. B. Geyer, G. L. Klimchitskaya and V. M. Mostepanenko, Phys. Rev. A65, 062109
(2002).
81. N. Inui, J. Phys. Soc. Jpn. 72, 2198 (2003); 73, 2198 (2004); 75, 024004 (2006).
82. W. Arnold, S. Hunklinger and K. Dransfeld, Phys. Rev. B19, 6049 (1979).
83. F. Chen and U. Mohideen, J. Phys. A39, 6233 (2006).
84. C. Genet, A. Lambrecht and S. Reynaud, Phys. Rev. A67, 043811 (2003).
85. E. I. Kats, Sov. Phys. JETP 46, 109 (1977).
86. G. L. Klimchitskaya and V. M. Mostepanenko, Phys. Rev. B75, 036101 (2007).
87. V. M. Agranovich and V. L. Ginzburg, Crystal Optics with Spatial Dispersion, and
Excitons (Springer, Berlin, 1984).
October 22, 2018 0:31 WSPC/INSTRUCTION FILE textDr
Thermal Casimir force between dielectrics 35
88. J. A. Stratton, Electromagnetic Theory (McGraw-Hill, New York, 1941).
89. V. L. Ginzburg, Physics and Astrophysics (Pergamon Press, Oxford, 1985).
90. V. M. Agranovich and V. L. Ginzburg, Sov. Phys. JETP 36, 440 (1973).
91. K. L. Kliewer and R. Fuchs, Phys. Rev. 172, 607 (1968).
92. I. M. Kaganova and M. I. Kaganov, Phys. Rev. B63, 054202 (2001).
93. S. M. Rytov, Yu. A. Kravtsov and V. I. Tatarskii, Principles of Statistical Radio-
physics, Vol. 3 (Springer, Berlin, 1989).
94. G. W. Ford and W. H. Weber, Phys. Rep. 113, 195 (1984).
95. J. T. Foley and A. J. Devaney, Phys. Rev. B12, 3104 (1975).
96. Yu. S. Barash and V. L. Ginzburg, Sov. Phys. Usp. 18, 305 (1975).
97. G. G. Kleinman and U. Landman, Phys. Rev. Lett. 33, 524 (1974).
ABSTRACT
  We review recent results obtained in the physics of the thermal Casimir force
acting between two dielectrics, dielectric and metal, and between metal and
semiconductor. The detailed derivation for the low-temperature behavior of the
Casimir free energy, pressure and entropy in the configuration of two real
dielectric plates is presented. For dielectrics with finite static dielectric
permittivity it is shown that the Nernst heat theorem is satisfied. Hence, the
Lifshitz theory of the van der Waals and Casimir forces is demonstrated to be
consistent with thermodynamics. The nonzero dc conductivity of dielectric
plates is proved to lead to a violation of the Nernst heat theorem and, thus,
is not related to the phenomenon of dispersion forces. The low-temperature
asymptotics of the Casimir free energy, pressure and entropy are derived also
in the configuration of one metal and one dielectric plate. The results are
shown to be consistent with thermodynamics if the dielectric plate possesses a
finite static dielectric permittivity. If the dc conductivity of a dielectric
plate is taken into account this results in the violation of the Nernst heat
theorem. We discuss both the experimental and theoretical results related to
the Casimir interaction between metal and semiconductor with different charge
carrier density. Discussions in the literature on the possible influence of
spatial dispersion on the thermal Casimir force are analyzed. In conclusion,
the conventional Lifshitz theory taking into account only the frequency
dispersion remains the reliable foundation for the interpretation of all
present experiments.

<|endoftext|><|startoftext|>
Introduction
Typically when we do geometry we concentrate on a specific venue in a par-
ticular space. Often the context is Euclidean space, and often the work is
done in R2 or R3. But in modern work there are many aspects of analysis
that are linked to concrete aspects of geometry. And there is often interest
in rendering the ideas in Hilbert space or some other infinite dimensional set-
ting. Thus one wants to see how the finite-dimensional result in RN changes
as N → +∞.
In the present paper we study some particular aspects of the geometry of
N and their asymptotic behavior as N → ∞. We choose these particular
examples because the results are surprising or especially interesting. One
may hope that they will lead to further studies.
1 Volume in RN
Let us begin by calculating the volume of the unit ball in RN and the surface
area of its bounding unit sphere. We let ΩN denote the former and ωN−1
denote the latter. In addition, we let Γ(x) be the celebrated Gamma function
of L. Euler. It is a helpful intuition (which is literally true when x is an
integer) that Γ(x) ≈ (x− 1)!. We shall also use Stirling’s formula which says
k! ≈ kk · e−k ·
1We are happy to thank the American Institute of Mathematics for its hospitality and
support during this work.
http://arxiv.org/abs/0704.1041v1
or, more generally,
Γ(x) ≈ (x− 1)x−1e−(x−1)
2π(x− 1)
for x ∈ R, x > 0.
Lemma 1 We have that
e−π‖x‖
dx = 1.
Proof: The case N = 1 is familiar from calculus. We write
hence
e−π‖x‖
(polar coordinates)
‖x‖=1
r ds(x)dr
hence S = 1.
For the N−dimensional case, write
e−π|x|
1dx1 · · ·
and apply the one-dimensional result.
Let σ be the unique rotationally invariant area measure on SN−1 = ∂BN .
Lemma 2 We have
ωN−1 =
2πN/2
Γ(N/2)
Proof: Introducing polar coordinates we have
e−π|x|
rN−1dr
Letting s = r2 in this last integral and doing some obvious manipulations
yields the result.
Corollary 3 The volume of the unit ball in RN is
2πN/2
Γ(N/2) ·N
Proof: We calculate that
1 dV (x)
(polar coordinates)
‖x‖=1
1·rN−1 dσ(x)dr = ωN−1·
That completes the proof.
Now the first nontrivial fact that we wish to observe about the volume of
the Euclidean unit ball in N -space is that that volume tends to 0 at N → ∞.
More formally,
Proposition 4 We have the limit
Ω(N) = 0 .
Proof: We calculate that
(Volume of Unit Ball) =
2πN/2
Γ(N/2) ·N
2πN/2
((N − 2)/2)(N−2)/2e−(N−2)/2
2π[(N − 2)/2] ·N
(2πe)N/2 · 2
N (N−1)/2 ·
(2πe)N/2 · 2
N (N+1)/2 ·
This expression clearly tends to 0 as N → +∞.
In fact we can actually say something about the rate at which the volume
of the ball tends to zero. We have
Proposition 5 We have the estimate
0 ≤ ΩN ≤ 2 ·
20N/2
N (N+1)/2
Proof: Follows by inspection of the last line of the proof of Proposition 4.
In fact something more is true about the volumes of balls in high-dimensional
Euclidean space.
Proposition 6 Let R > 0 be fixed. Then
Vol(B(0, R)) = 0 .
In other words, the volume of the ball of radius R tends to 0.
Proof: From the formula for the volume of the unit ball we have that
Vol(B(0, R)) = lim
2πeR2
This expression clearly tends to 0 as N → +∞.
We leave the proof of the next result as an exercise for the reader; simply
examine the formula for ωN−1:
Proposition 7 Let R > 0. Then the surface area of the sphere of radius R
in RN tends to 0 as N → +∞.
The following very simple but remarkable fact comes up in considerations
of spherical summation of Fourier series.
Proposition 8 As N → +∞, the volume of the unit ball in RN is con-
centrated more and more out near the boundary sphere. More precisely, let
δ > 0. Then
volume(B(0, 1) \B(0, 1− δ))
volume(B(0, 1)
= 1 .
Proof: We have
volume(B(0, 1) \B(0, 1− δ))
volume(B(0, 1)
= lim
[1− (1− δ)N ] · [2πN/2]/[Γ(N/2) ·N ]
[2πN/2]/[Γ(N/2) ·N ]
= lim
1− (1− δ)N
= 1 .
That is the desired conclusion.
2 A Case of Leakage
The title of this section gives away the punchline of the example. Or so it
may seem to some.
Consider at first a square box of side two with sides parallel to the coor-
dinate axes in the Euclidean plane. We may inscribe in this box four discs of
diameter 1, as shown in Figure 1. These discs will be called primary discs.
Once those four discs are inscribed, we may inscribe a small, shaded disc in
the middle as shown in Figure 2. We set
area of shaded disc
area of large box
The same construction may be performed in Euclidean dimension 3. Ex-
amine Figure 3. It suggests a rectangular parallelepiped with all sides equal
Figure 1: The configuration in dimension 2.
Figure 2: The shaded disc in dimension 2.
Figure 3: The configuration in dimension 3.
to 2, and 8 unit balls inscribed inside in a canonical fashion. These eight
primary balls determine a unique inscribed shaded ball in the center. We set
volume of shaded ball
volume of large box
A similar construction may be performed in any dimension N ≥ 2, with
2N balls inscribed in a rectangular box of side 2. The ratio RN is then
calculated in just the same way. The question is then
What is the limit limN→∞RN as N → +∞?
It is natural to suppose, and most people do suppose, and that this limit
(assuming it exists) is between 0 and 1. All other things being equal, it is
likely equal to either 0 or 1. Thus it comes as something of a surprise that
this limit is in fact equal to +∞. Let us now enunciate this result and prove
Proposition 9 The limit
RN = +∞ .
Figure 4: The disc trapped in dimension 2.
Of course this result is counter-intuitive, because we all instinctively be-
lieve that the shaded ball, in any dimension, is contained inside the big box.
Such is not the case. We are being fooled by the 2-dimensional situation de-
picted in Figure 1. In that special situation, any of the two adjacent primary
discs actually touch in such a way as to trap the shaded disc in a particular
convex subregion of the big box (see Figure 4). So certainly it must be that
R2 < 1. But such is not the case in higher dimensions. There is actually a
gap on each side of the box through which the shaded ball can leak. And
indeed it does.
This is what we shall now show. First we shall perform the calculation of
RN for each N and confirm that the expression tends to +∞ as N → +∞.
Then we shall calculate the first dimension in which the shaded ball actually
leaks out of the box.
Proof of the Proposition: Notice that the center of one of the primary
balls is at the point (1, 1, . . . , 1). It is a simple matter to calculate that a
boundary point of this ball that is nearest to the center of the box is located
at P ∗ ≡ (1 − 1/
N, 1 − 1/
N, . . . , 1 − 1/
N . Since the shaded ball will
osculate the primary ball at that point, we see that the shaded ball has center
the origin and radius equal to
dist(0, P ∗) =
(1− 1/
N)2, 1− 1/
N)2, . . . , 1− 1/
N)2 =
N + 1− 2
Thus we see that the volume of the shaded ball is
[N + 1− 2
N ]N/2 · ΩN .
The ratio RN is then
[N + 1− 2
N ]N/2 · ΩN
Now we may simplify this last expression to
2 · πN/2
Γ(N/2) ·N
· [N + 1− 2
N ]N/2
After some simplification we find that
2(π/4)N/2 · [N + 1− 2
N ]N/2
Γ(N/2) ·N
By Stirling’s formula, this last expression is approximately equal to
2 · (π/4)N/2(N + 1− 2
N)N/2
N − 2
)(2−N)/2
· e(N−2)/2 · 1√
π(N − 2)
(N + 1− 2
N) · 2
N − 2
· N − 2
π(N − 2)
After some manipulation, we finally find that
RN = lim
N + 1
N − 2
· N − 2
π(N − 2)
= lim
)N/2 (
N − 2
· N − 2
π(N − 2)
Now, in the limit, we may replace expressions like N − 2 by N . And we may
reparametrize N as 3N . The result is
N − 2
π(N − 2)
= lim
)3N/2
N − 2
π(N − 2)
= lim
N − 2
π(N − 2)
What we see now is that this last equals
Plainly, because πe/2 > 4, this limit is +∞. That proves the result.
And now we turn to the question of when the shaded ball starts to leak
out of the big box. This is in fact easy to analyze. We need only determine
when the radius of the shaded ball exceeds 1. First notice that the radius of
the shaded ball is monotone increasing in N . Now we need to solve
N + 1− 2
N > 1 .
This is a simple algebra problem, and the solution is N > 4. Thus, beginning
in dimension 5, the shaded ball will “leak out of” the large box.
It may be noted that RichardW. Cottle has made a study of mathematical
phenomena that change (in the manner of a catastrophe—see [ZEE]) between
dimensions 4 and dimensions 5. The results may be found in [COT].
3 Centroids
This final section of the paper will be more like an invitation to further
exploration. We cannot include all the details of the calculations, as they
are too recondite and complex. Yet the topic is very much in the spirit of
the theme of this paper, and we cannot resist including a few pointers to this
new and interesting work (for which see [KRA1] and [KRMP]).
Figure 5: Centroids for a triangle.
The inspiration for this work is the following somewhat surprising obser-
vation. Let T be a triangle in the plane (see Figure 5). There are three ways
to calculate the centroid of this figure: (i) average the vertices, (ii) average
the edges, or (iii) average the 2-dimensional solid figure. And the question
is: are these three versions of the centroid the same? The answer is that
(i) and (iii) are always the same. Generically (ii) is different. In fact the
three versions of the centroid coincide if and only if the triangle is equilateral
[KRMP].
We used this fact as a springing-off point to investigate analogous ques-
tions in higher dimensions. Consider the simplex S in RN that is the con-
vex hull of the points 0 = (0, 0, . . . , 0), (1, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . ,
(0, 0, . . . , 0, 1). Refer to Figure 6. Such an N -dimensional geometric figure
comes equipped with (N+1) notions of centroid: one can average the vertices
(or 1-dimensional skeleton) S0, or one can average over the 1-dimensional
skeleton S1, or one can average over the two-dimensional skeleton S2A, or
. . . one can average over the (N − 1)-dimensional skeleton SN−1, or one can
average over the N -dimensional solid SN . There results the centroids C0,N ,
C1,N , . . . , CN,N . And the question is: Are these different notions of centroid
all the same? And here is the somewhat surprising answer:
In dimensions 2 through 12 (for the ambient space), the skeletons
S0 and SN for the simplex S have the same centroid. In those
Figure 6: A simplex in RN .
same dimensions, the skeletons S1, S2, . . . , SN−1 all have different
centroids, and the centroids all differ from the common centroid
for S0 and SN . But in dimension 13 things are different. In fact
in that dimension the skeletons S3 and S8 have the same centroid.
Let us say a word about why these facts are true. Let ej denote the j
coordinate vector in RN (i.e., the vector with a 1 in the jth position and
0s in all other slots). Then a sophisticated computation with elementary
calculus yields that the centroid of the k-skeleton Sk of the simplex which is
the convex hull of 0, e1, e2, . . . , eN is
Ck,N =
k + (N − k)
k + 1
(k + 1) + (N − k)
k + 1
(e1 + e2 + · · ·+ eN) .
From this formula it can immediately be verified that
S0 = SN =
N + 1
(e1 + e2 + · · ·+ eN ) .
It can also be checked that, in dimensions 2 through 12, all the intermediate
skeletons have distinct centroids. But, in dimension N = 13, we observe that
C3,13 = C8,13 =
13 · 24
(e1 + e2 + · · ·+ eN ) .
One may well ask whether dimension N = 13 is the only dimension in
which there are two intermediate skeletons with the same centroid. The
answer is “no”; there are in fact infinitely many such dimensions (although
they are quite sparse—sparser than the prime integers). One may verify this
assertion by using the following Diophantine formula.
Theorem 10 Fix a dimension N ≥ 2. Consider the simplex S as described
above. There are skeletons of dimension k1 and k2, 1 ≤ k1 < k2 ≤ N − 1,
of the simplex S which have the same centroid if and only if k1 = a
2 − 1,
k2 = b
2 − 1 (for positive integers a and b) and, in addition,
N = (b2 + ab+ a2)− (b+ a)− 1 . (⋆)
Obviously this theorem gives us a tool for finding dimensions in which
the simplex S has two intermediate skeletons with the same centroid. The
following table gives some values of the dimension, and of the intermediate
dimensions of skeletons which have the same centroid. Of course this data
may be confirmed by direct calculation with the formula (⋆). We stress
that there are in fact infinitely many dimensions in which this phenomenon
occurs. The proof of this statement is a nontrivial exercise in elementary
number theory (see [KRMP]).
value of N value of k1 value of k2 approx. coord. of centroid
13 3 8 0.0737179487
21 3 15 0.0464285714
29 8 15 0.0340038314
31 3 24 0.0317204301
40 8 24 0.0247619047
43 3 35 0.0229789590
51 15 24 0.0194852941
53 8 35 0.0187368973
57 3 48 0.0173872180
65 15 35 0.0153133903
We conclude this discussion by recording the fact that it is impossible in
any dimension for there to be three intermediate skeletons with the same
centroid.
Proposition 11 For no dimension N can there exists 3 distinct number
1 ≤ k1 < k2 < k3 ≤ N − 1 such that the centroids Ck1,N , Ck2,N , Ck3,N for the
simplex S coincide.
Proof: We let
Q(a, b) = (b2 + ab+ a2)− (b+ a)− 1 .
It suffices for us to show that there do not exist natural numbers a < b < c
such that Q(a, b) = Q(a, c). Seeking a contradiction, we suppose that such a
triple does indeed exist.
b2 + ab− b = c2 + ac− c
b2 + (a− 1)b = c2 + (a− 1)c .
Since a ≥ 1, the function b 7→ b2+(a−1)b is strictly increasing, which yields
a contradiction.
The exploration of centroids for simplices of high dimension is a new venue
of exploration. There are many new phenomena, and more to be discovered.
See [KRMP] for more results along these lines. The reference [ZON] is also
of interest.
References
[COT] R. W. Cottle, Quartic barriers, Computational Optimization and Ap-
plications 12(1999), 81–105.
[KRA1] S. G. Krantz, A Matter of gravity, Amer. Math. Monthly 110(2003),
465–481.
[KRMP] S. G. Krantz, J. E. McCarthy, and H. R. Parks, Geometric characteri-
zations of centroids of simplices, Journal of Mathematical Analysis and
Applications 316(2006), 87–109.
[ZEE] E. C. Zeeman, Catastrophe Theory. Selected Papers, 1972–1977, Addison-
Wesley, Reading, MA, 1977.
[ZON] C. Zong, Strange Phenomena in Convex and Discrete Geometry, Springer-
Verlag, New York, 1996.
STEVEN G. KRANTZ received his B.A. degree from the University of
California at Santa Cruz in 1971. He earned the Ph.D. from Princeton Uni-
versity in 1974. He has taught at UCLA, Princeton University, Penn State,
and Washington University in St. Louis. Krantz is the holder of the UCLA
Alumni Foundation Distinguished Teaching Award, the Chauvenet Prize,
and the Beckenbach Book Prize. He is the author of 150 papers and 50
books. His research interests include complex analysis, real analysis, har-
monic analysis, and partial differential equations. Krantz is currently the
Deputy Director of the American Institute of Mathematics.
American Institute of Mathematics, 360 Portage Avenue,
Palo Alto, CA 94306
skrantz@aimath.org
	Introduction
	Volume in RN
	A Case of Leakage
	Centroids
ABSTRACT
  We study asymptotics of various Euclidean geometric phenomena as the
dimension tend to infinity.

<|endoftext|><|startoftext|>
Entangling and disentangling capacities of nonlocal maps
Berry Groisman
Centre for Quantum Computation, DAMTP, Centre for Mathematical Sciences,
University of Cambridge, Wilberforce Road, Cambridge CB3 0WA, United Kingdom.
Entangling and disentangling capacities are the key manifestation of the nonlocal content of a
quantum operation. A lot of effort has been put recently into investigating (dis)entangling capacities
of unitary operations, but very little is known about capacities of non-unitary operations. Here we
investigate (dis)entangling capacities of unital CPTP maps acting on two qubits.
I. INTRODUCTION
Entanglement content is one of the fundamental ways
to characterize nonlocal quantum resources (nonlocal
states and operations). For pure bipartite states the ul-
timate measure of entanglement, the von Neumann en-
tropy of entanglement, had been recently discovered [1].
A universal measure of entanglement for mixed states
had not been found yet and different measures are used
depending on the operational context. Nevertheless, the
important feature of all entanglement measures of states
is that their values are directly inferred using the param-
eters of a state itself.
Similarly to mixed states, the entanglement content
of quantum operations can be characterized in different
ways, e.g. via the amount of entanglement necessary to
generate that operation or via the amount of entangle-
ment the operation is able to produce/destroy (the so
called entangling/disentangling capacities). This article
is concerned with the two latter measures.
Unlike the amount of entanglement in a state, the
(dis)entangling capacities of an operation do not have
an operational interpretation on their own. They mani-
fest themselves via the change of the entanglement of a
particular state that the operation acts upon. And the
operation has to act on a specific state (the “optimal”
state) in order to realize its (dis)entangling capacity in
full. Thus, the straightforward way to calculate these
quantities is to maximize the change of entanglement over
all possible initial states.
Substantial progress have been made recently in inves-
tigating (dis)entangling capacities of unitary operations.
The capacities of two-qubit unitary operations were ex-
plicitly calculated [2, 3, 4]. It was also shown that single-
shot capacities are equal to asymptotic capacities [3, 5, 6].
Some results for higher dimensions were also obtained [7].
However, extending these techniques to systems of higher
dimensionality seems to be a very difficult task. Even in
the two-qubit case the capacities of a general unitary op-
eration have been calculated numerically, no analytical
technique is known.
In all real situations an experimentalist never deals
with a perfect unitary in the laboratory. And it is need-
less to say that calculating capacities of non-unitary op-
erations, i.e. nonlocal quantum maps, is even a bigger
challenge.
In this article we consider nonlocal completely positive
trace preserving (CPTP) maps of the form
τ(ρ) →
pkUkρU
, (1)
where Uk are unitary transformations. Maps of this type
are often called random unitary processes, and they are
doubly stochastic. The map (1) may arise, for example,
if the desired unitary transformation can be implemented
successfully only with certain probability, while another
unitary is realized in the case of failure. A continuous
version of the map (1) may arise naturally in experiment
if parameters of a desired unitary transformation are sub-
ject to a noise (the case of Gaussian noise will be analyzed
in detail in Section IVC). The scope of this article covers
the case of τ that act on two qubits. We calculate single-
shot (dis)entangling capacities of τ in some particular
cases.
The structure of the article is as follows. In Sec. II
the definition(s) of (dis)entangling capacities of unitaries
are presented and some recent results concerning two-
qubit unitaries. Sec. III generalizes the definition of
(dis)entangling capacities for non-unitaries. Some nu-
merical results for (lower bounds on) (dis)entangling ca-
pacities for discreet and continuous mixtures of unitaries
are presented in Sec. IV.
II. (DIS)ENTANGLING CAPACITY OF A
UNITARY: DEFINITIONS AND SOME RELATED
RESULTS
Consider an unitary operation UAB that acts on a
tensor product Hilbert space HA ⊗ HB of two spatially
separated particles A and B. If UAB can not be de-
composed into a tensor product of local unitaries, i.e.
UAB 6= VA⊗WB, then we say that UAB is nonlocal. Un-
like local unitaries, nonlocal unitaries have an ability to
produce or destroy entanglement. This ability is usually
characterized by the entangling, E↑(U), and the disen-
tangling, E↓(U), capacities, i.e. by the maximal increase
(decrease) of entanglement that can be achieved when
U acts on quantum states. To quantify these capacities
we have to choose appropriate measures of entanglement.
The most sensible choice is to use the entanglement of for-
mation [11] as a measure of entanglement of the initial
state ρ, and the distillable entanglement [12] as a measure
http://arxiv.org/abs/0704.1042v1
of entanglement of the final state UρU †. The reason for
this asymmetric choice is purely operational one. What
counts is the amount of resources (pure maximally en-
tangled states) needed to create ρ (asymptotically) and
the amount of pure-state entanglement one will be able
to extract from UρU †, again asymptotically. Thus the
most general definition is
E↑(U) = max
[D(UρU †)− EF (ρ)],
E↓(U) = max
[EF (ρ)−D(UρU †)],
where the maximization is over all possible states ρ
(mixed and pure) accessible to U . The Hilbert space of
an accessible ρ is not necessarily restricted to HA ⊗HB.
It turns out to be the case that some U create more en-
tanglement if the original particles are entangled with
local ancillary particles [2, 3]. It also appears to be the
case that the maximization in Eq. (2) can be restricted
to pure-states only [3], therefore, the definition (2) can
be simplified as
E↑(U) = max
[E(U |ψ〉) − E(|ψ〉)]
E↓(U) = max
[E(|ψ〉) − E(U |ψ〉)],
where E is an entanglement measure for pure state
(Throughout this paper we will use the von Neumann
entropy of entanglement as the most appropriate mea-
sure). This obviously simplifies the job significantly.
Let us briefly recall the main results for A and B being
two-level particles, qubits.
Any UAB acting on qubits can be decomposed as [2, 10]
UAB = [VA ⊗ VB] e
α=x,y,z
α [WA ⊗WB] , (4)
where π/4 ≥ ξx ≥ ξy ≥ |ξz | ≥ 0. The middle term sand-
wiched by local unitaries is called the canonical decom-
position of U . Any U can be transformed to its canon-
ical form by sandwiching it with Hermitian conjugates
of corresponding local unitaries. That means that the
canonical form is genuinely nonlocal part of U - every-
thing else is local. The beauty of this results is that
out of 15 real parameters that parameterize a general
two-qubit unitary only three are necessary to describe
its nonlocal nature. It simplifies considerably the clas-
sification of nonlocal unitaries. For the purpose of our
discussion three classes can be identified; namely, the
Controlled-NOT(CNOT)-class (ξx 6= 0, ξy = ξz = 0),
the DoubleCNOT-class (ξx 6= 0, ξy 6= 0, ξz = 0), and the
SWAP-class (all three ξα are not equal zero) [15, 16]. The
names reflect the fact that the corresponding “mother”
unitary transformation (i.e. with ξα = π/4 for α 6= 0)
belongs to that class.
The main results for qubits are [2, 3]:
(a) E↑(U) = E↓(U).
(b) For CNOT-class the optimal state, i.e. the state
that satisfies definition (3), lives solely in the Hilbert
space of particles A and B (no ancillas are needed) and
takes the form
|ψopt〉 = cosα|0〉A|0〉B ± i sinα|1〉A|1〉B, (5)
where ± correspond to E↑ and E↓ respectively. Thus
all U from that class achieve their capacity by acting on
pure states with the same Schmidt basis (only values of
Schmidt coefficients differ depending on the value of ξx).
The values α = f(ξx) can be obtained by straightforward
numerical optimization.
(c) If ξα < π/4 the maximal capacity is achieved when
|ψopt〉 is already entangled.
(d) Unitaries of the CNOT-class achieve their capaci-
ties by acting on optimal states that lie inHA⊗HB. How-
ever, unitaries of the DCNOT and SWAP-classes achieve
their capacities only if the original particles are entangled
with local ancillas. It was conjectured that it is sufficient
to take the size of ancillas equal to the size of original
particles. This conjecture was supported by numerical
simulations for qubits [3].
III. ENTANGLING AND DISENTANGLING
CAPACITIES OF A NON-UNITARY
For non-unitaries we will use a definition similar to Eq.
(2), i.e. we define
E↑(τ) = max
[D(τ(ρ)) − EF (ρ)]
E↓(τ) = max
[EF (ρ)−D(τ(ρ))].
However, in general here we cannot justify reducing
the search to pure states. This is due to the fact that
distillable entanglement is not necessary a convex mea-
sure.
We can argue, nevertheless, that in the case of mix-
tures of unitaries acting on two qubits without ancillas
the distillable entanglement can be regarded as a convex
measure. Indeed, a mixture of optimal states (5) forms a
Bell-diagonal state for which the lower and upper bounds
on distillable entanglement [17]
S(ρA)− S(ρAB) ≤ D(ρ) ≤ ERE(ρAB) (7)
coincide. Here we recall that the relative entropy of en-
tanglement ERE(x) is a convex measure.
If ancillas are used then the situation is more compli-
cated. We leave the question of whether the capacities
are attained on pure states as an open question and cal-
culate the lower bounds on these capacities using pure
states.
IV. MIXTURES OF UNITARIES ACTING ON
TWO QUBITS
The properties of two-qubit unitaries described above
in Section II can help us to generalize that approach to
-0.75-0.5-0.25 0.25 0.5 0.75
FIG. 1: (color online) E↑(τ ) (solid line) and E↓(τ ) (dashed
line) as functions of ∆ for different values of ξ, where p = 0.5.
The highest curve corresponds to ξ = π/4. The lowest curve
corresponds to ξ = 0. Here ∆ is measured in radians.
mixtures of unitaries as in Eq. (1)[18]. Here we use two
methods for calculating E↑(τ) and E↓(τ).
Method I: We make an assumption about the partic-
ular form of the optimal input state, and subsequently
find the optimal values of its parameters.
Method II: We perform a direct numerical optimiza-
tion without making any a priori assumption about the
optimal state (except of its purity).
A. Example I: Discreet CNOT-mixtures
Consider a mixture of unitaries of the CNOT-class,
Uk = e
x . (8)
Here we use Method I. From continuity it follows that
the optimal state is expected to lie on the 2-dimensional
manifold of (superpositions of) states of the form (5) or
their convex mixtures. Moreover, in this special case we
can adopt the argument of [3] (see Sec. II) and claim that
the search can be restricted to pure states only. Thus,
the state optimal for τ is again of the type (5).
As a simplest case let us consider only two unitaries
U1 and U2:
τ(ρ) → pU1ρU †1 + (1 − p)U2ρU
For convenience let us define ∆ = ξ2x − ξ1x, and denote
ξ1x simply by ξ, so ξ
x = ξ +∆. We will fix ξ and analyze
E↑ and E↓ for various ∆. For ∆ = 0 the map reduces
to a unitary (with an appropriate capacity). As smaller
angle means smaller E↑(U), we would expect that if U1 is
mixed with U2, where ∆ < 0, then the entangling capac-
ity of the resulting map will decrease relative to E↑(U).
This intuition is consistent with the results presented on
Fig. 1. Similarly, we might expect that the entangling
capacity of the map will increase with ∆ > 0, and that
-0.4 -0.2 0.2 0.4
FIG. 2: (color online) DCNOT: E↑(τ ) (empty diamonds with
solid line fit) and E↓(τ ) (empty triangles with dashed line fit)
as functions of ∆ for ξ1x = ξ
y = π/8. SWAP: E
↑(τ ) (filled
diamonds with solid line fit) and E↓(τ ) (filled triangles with
dashed line fit) as functions of ∆ for ξ1x = ξ
y = ξ
z = π/8. In
both cases p = 0.5. Here ∆ is measured in radians.
this capacity will reach its maximum for maximal ∆, i.e.
maximal E↑(U2). However, Fig. 1 shows that this is not
the case. E↑(τ) indeed grows while ∆ is positive and
relatively small, reaching its maximum for certain inter-
mediate positive value of ∆ and then starting to decrease.
In other words, if U1 and p are fixed, then maximal E
is achieved for some intermediate U2 with ξ
x > ξ
x, but
not for ξ2x = π/4. This result might seem counterintuitive
from the first sight, but it has a clear explanation. Let α1,
α2 be the corresponding optimal values of α in Eq. (5)
for U1 and U2 respectively, then the optimal value of ατ
will lie somewhere in between, i.e. satisfy α1 > ατ > α2.
For ξ2x = π/4, U2 can realize its entangling capacity of 1
ebit if α2 = 0. However, when U2 acts on a state with
ατ > α2, then it creates less than 1−H [(cosατ )2] ebit.
Disentangling capacity, E↓(τ), behaves differently. It
is monotonic with ∆. It equals E↑ for ∆ = 0 as expected,
and it is strictly larger than E↑ for all other values of ∆.
The last observation shows behavior completely opposite
to that of unitaries. In a sense, it is easier for non-local
map to destroy entanglement rather than create it, while
for unitary operations the opposite holds [7].
This approach can be similarly applied to any finite
number of unitaries and to continuous distribution of uni-
taries, which will be discussed in Section IVC.
B. Example II: Discreet DCNOT and
SWAP-mixtures
For mixtures of unitaries of DCNOT and SWAP-class
Method II was used. The details of numerical calcula-
tions are presented in Appendix. We conjectured that
it is sufficient if local ancillas are qubits. Selected re-
sults are shown in Fig. 2. We can see that for DCNOT-
mixture the behavior of E↑(τ) and E↓(τ) is qualitatively
similar to CNOT-mixture (Fig. 1). However, for SWAP-
mixture slightly different behavior is obtained. In partic-
ular E↓(τ) exhibits maximum at an intermediate value
of ∆. It is also noticeable that for D > 0.357, E↑(τ) is
smaller for SWAP-mixture than for DCNOT-mixture. It
is a counterintuitive result that SWAP-mixture which is
naturally considered as “stronger” that DCNOT-mixture
has lower entangling capacity. However, again similar to
the line of thought in Example I we can argue that for
(relatively) large ∆ the second unitary, U2 is too strong,
and therefore when it acts on the optimal state (optimal
for the mixture, not for itself) it causes more destruc-
tion that corresponding U2 of DCNOT-class would have
caused.
C. Example III: Entangling capacity of noisy
unitary with Gaussian fluctuations
So far we analyzed discrete mixtures of unitaries. In
this section we analyze a continuous distribution, which is
usually what experimentalists deal with. These distribu-
tions arise due to uncertainty in one or more parameters.
Such uncertainties may be caused by the limits of cali-
bration precision of the devices and by high sensitivity of
systems used to generate desired interactions. For exam-
ple, the strength of exchange coupling between donors in
silicon based solid-state architectures for quantum com-
puting exhibit significant uncertainty resulting in error
in gate operation [19].
In particular, we consider the case when a non-unitary
map arises if a unitary from CNOT-class is subject to a
Gaussian noise.
Recent work [20] analyzed the capability of noisy
Hamiltonians to create entanglement. In particular, in-
teractions of the form Eq. (8), where ξ is Gaussian dis-
tributed with the mean ξ = π/4 and standard deviation
Ω, were considered. Without noise this operation (CNOT
operation) is able to create a maximally entangled state
if it acts on a disentangled pure state. The authors an-
alyzed the situation when the noisy operation acts on
initially disentangled state, which is by itself subject to
a Gaussian noise. Its capability to create entanglement
was characterized in terms of the condition for insepara-
bility of the resulting mixed state (via Peres-Horodecci
separability criterion).
The aim of our analysis is different. We consider noisy
interactions with ξ ∈ [0, π/4] and calculate their entan-
gling and disentangling capacities in terms of ξ and Ω.
Thus, we give a comprehensive quantitative characteriza-
tion of the non-local content of these noisy maps in terms
of their entangling and disentangling capacities. Unlike
[20] we do not test the resulting state on inseparability,
rather calculate its distillable entanglement explicitly.
The action of a unitary U = exp[iξσAx σ
x ], where ξ is
Gaussian distributed with the mean ξ and the standard
deviation Ω, on the state ρAB can be seen as a non-
0.2 0.4 0.6 0.8
FIG. 3: (color online) E↑(τG) (solid line) and E
↓(τG) (dashed
line) as a function of ξ for several values of Ω: 0.01, 0.18, 0.35,
0.52, 0.69, and 0.86. The dotted line corresponds to Ω = 0, i.e.
a unitary transformation. ξ and Ω are measured in radians.
unitary CPTP map
τG(ρ) →
(ξ−ξ)2
2Ω2 UρU †dξ, (10)
which is a continuous mixture of unitaries of the CNOT-
class. Similarly to the Sec. IVA we consider pure initial
state, i.e. ρ = |ψ〉〈ψ|, where ψ takes the form (5), cal-
culate the distillable entanglement of the output mixed
state, and maximize it over α. Figure 3 presents nu-
merical results for E↑(τ) and E↓(τ) as functions of ξ
for several values of Ω. We can see that already for
Ω ≈ 0.01 rad we obtain only very small deviation from
the (dis)entangling capacity of the unitary, i.e. τG with-
out noise - Ω = 0. As Ω increases the disentangling
capacity increases and the entangling capacity decreases.
The former fact should not be surprizing as it is known
that entanglement can be destroyed even by local CPTP
unital maps [21]. Thus, the more dispersed the distribu-
tion of ξ becomes, the easier for τG to destroy entangle-
ment and the harder to create it. Nevertheless, we see
that even when Ω is relatively large τG is still able to
create considerable amount of entanglement.
V. DISCUSSION AND CONCLUSION
We have discussed the entangling and disentangled ca-
pacities of nonlocal CPTP unital maps, i.e. maps that
can be represented as probabilistic mixtures of unitaries,
and have calculated these capacities in some particular
cases for two qubits. Three classes of unitaries were con-
sidered, namely the CNOT, DCNOT, and SWAP classes.
We have observed that the disentangling capacity was
always larger than corresponding entangling capacity,
which contrasts with the unitary case where the both
capacities are equal for qubits and for higher dimensions
disentangling capacity cannot be greater than entangling
capacity [7].
In the case of the CNOT-class our results were
obtained via straightforward generalization of the
method for CNOT-class unitaries. We argue that the
(dis)entangling capacity is achieved when a map acts on
the optimal pure state from the same family as in the uni-
tary case. Both discrete and continuous mixtures were
analyzed. In the case of the DCNOT and SWAP-class
direct numerical optimization was performed. We have
conjectured that dimensions of the local ancillas are equal
to the dimensions of the original particles, i.e. the ancil-
las were taken to be qubits.
A number of open question can be addressed in a future
research.
It will be interesting and useful to prove (or disprove)
the general conjecture that the sizes of local ancillas can
be taken equal to the sizes of original particles.
Here we have calculated single-shot capacities. In the
case of unitaries it had been shown that in the asymptotic
regime one cannot do better [5, 6]. It is important to
check whether this result holds in the non-unitary case.
In the case of DCNOT and SWAP-mixtures we per-
formed maximization over pure states only thereby ob-
taining lover bounds on E↑(τ) and E↓(τ), but not their
actual values.
In our future research we will address the question of
whether these bounds are tight. It might be the case
that optimal states for these operations are mixed and,
consequently, the capacities are higher than we have cal-
culated.
Acknowledgments
This work was funded by the U.K. Engineering
and Physical Sciences Research Council, Grant No.
EP/C528042/1, and supported by the European Union
through the Integrated Project QAP (IST-3-015848) and
SECOQC.
APPENDIX A
We have used two-dimensional ancilla on each side.
Consider a general state of four qubits in the tensor-
product of the computational bases of the original parti-
cles A, B and the ancillary particles A′, B′
|Ψ〉AA′BB′ =
i,j,k,l
ci,j,k,l|i〉A|j〉A′ |k〉B |l〉B′ . (A1)
There are 16 terms in the above superposition with 16
complex amplitudes ci,j,k,l, therefore |Ψ〉 can be parame-
terized using 30 real numbers (if we take into account the
global phase and normalization). We will parameterize it
in the following way [22]. First, to facilitate our analysis
it is easier to incorporate four indexes i, j, k and l, each of
which runs from 0 to 1, into a single index, x, that runs
from 1 to 16. This can be done by using the formula
x = 8i + 4j + 2k + l + 1, which is essentially a formula
for converting a number from the Boolean representation
to the decimal. Second, we present amplitudes cx in the
cx = |cx|eiθx , (A2)
where
|cx|2 = 1 and θ1 = 0. Third, we introduce
new parameters φx such that
cx = sinφx−1
cosφy, (A3)
where φ0 = π/2. Thus the state |Ψ〉 is parameterized by
30 angles. The advantage of this parametrization is that
we restrict their values only to the interval [0, 2π] that
simplifies numerics.
We proceed as follows. A program generates a vec-
tor of 30 random numbers in the interval [0, 2π]. This is
the initial state. We then apply the non-local map and
obtain a final state. We calculate the value of the gain
in entanglement ∆S = S(τ(Ψ)BB′ ) − S(τ(Ψ)AA′BB′) −
S(TrAA′ |Ψ〉〈Ψ|). After that we vary the values of the ran-
dom vector by a small amount and repeat these calcula-
tions again, thereby obtaining a gradient of the change in
entanglement in that point. We move along the gradient
to obtain the next |Ψ〉, and the procedure is repeated.
Eventually, the program reaches the maximum where it
stops.
[1] C.H.Bennett, H.J. Bernstein, S. Popescu, B. Schumacher,
Phys. Rev. A , 53 2046 (1996).
[2] W. Dür, G. Vidal, J. I. Cirac, N. Linden, and S. Popescu,
Phys. Rev. Lett. , 87, 137901 (2001).
[3] M.S. Leifer, L. Henderson, and N. Linden, Phys. Rev. A
67, 012306 (2003).
[4] D.W.Berry and B.C.Sanders, Phys. Rev. A 71, 022304
(2005); P.Zanardi, C. Zalka, and Lara Faoro, Phys. Rev.
A 62, 030301(R) (2000); L. Clarisse, S. Ghosh, S. Sev-
erini, A. Sudbery, e-print arXiv:quant-ph/0611075v2.
[5] C.H. Bennett, A. Harrow, D.W. Leung, and J.A. Smolin,
IEEE Tran. Inf. Theory, 49, 8, 1895 (2003).
[6] A.M. Childs, D.W. Leung, F. Verstraete, and G. Vidal,
Quant. Inf. Comp. 3, 97 (2003).
[7] N. Linden, J.A. Smolin, and A. Winter, e-print
quant-ph/0511217.
[8] M.A. Nielsen and I.L.Chuang, Quantum Computation
and Quantum Information, Cambridge University Press
(2004).
[9] It was explicitly shown [3] that the optimization can be
restricted to pure states.
[10] B. Kraus and J.I. Cirac, Phys. Rev. A 63, 062309 (2001).
http://arxiv.org/abs/quant-ph/0611075
http://arxiv.org/abs/quant-ph/0511217
[11] W.K. Wootters, Phys. Rev. Lett. 80, 2245 (1998).
[12] C.H. Bennett, D.P. DiVincenzo, J.A. Smolin, and W.K.
Wootters, Phys. Rev. A 54, 3824 (1996).
[13] J. I. Cirac, W. Dür, B. Kraus and M. Lewenstein, Phys.
Rev. Lett. , 86, 544 (2001); W. Dür and J. I. Cirac, Phys.
Rev. A , 64, 012317 (2001).
[14] B. Kraus and J.I. Cirac, Phys. Rev. A 63, 062309 (2001).
[15] Strictly speaking DCNOT-class and SWAP-class should
be unified under a single class if analyzed according to
the criteria of interconvertability under LOCC [16]. In
the framework of (dis)entangling capacity it is useful to
identify them as separate classes, because their behavior
differs qualitatively.
[16] W. Dür, G. Vidal, J. I. Cirac, Phys. Rev. Lett. , 89,
057901 (2002).
[17] V. Vedral and M. B. Plenio, Phys. Rev. A 57, 1619
(1998).
[18] Here we assume that all unitaries in Eq. (1) belong to the
same class, which is typical for real situations where only
parameters of the coupling are subject to variations.
[19] M. J. Testolin, C. D. Hill, C. J. Wellard, L. C. L. Hollen-
berg, e-print quant-ph/0701165.
[20] S. Bandyopadhyay and D.A. Lidar, Phys. Rev. A 70,
010301(R) (2004).
[21] B.Groisman, S. Popescu, and A. Winter, Phys. Rev. A
72, 032317 (2005).
[22] This method is a partial adaptation of the method used
in Ref. [17] for calculation the relative entropy of entan-
glement.
http://arxiv.org/abs/quant-ph/0701165
ABSTRACT
  Entangling and disentangling capacities are the key manifestation of the
nonlocal content of a quantum operation. A lot of effort has been put recently
into investigating (dis)entangling capacities of unitary operations, but very
little is known about capacities of non-unitary operations. Here we investigate
(dis)entangling capacities of unital CPTP maps acting on two qubits.

<|endoftext|><|startoftext|>
Introduction to Kolmogorov Complexity and Its
Applications, Springer, 1997.
¶It can be reached at arXiv: http://arxiv.org/abs/0704.1043.
A website with the complete results of the whole experiment is available at
http://www.mathrix.org/experimentalAIT/
http://arxiv.org/abs/0704.1043
October 23, 2018 8:33 World Scientific Review Volume - 9in x 6in LongAbstractDelahaye2
6 Jean-Paul Delahaye and Hector Zenil
7. A.K.Zvonkin, L. A. Levin. ”The Complexity of finite objects and the Algo-
rithmic Concepts of Information and Randomness.”, UMN = Russian Math.
Surveys, 25(6):83-124, 1970.
8. S. Lloyd, Programming the Universe, Knopf, 2006.
9. R. Solomonoff, The Discovery of Algorithmic Probability, Journal of Com-
puter and System Sciences, Vol. 55, No. 1, pp. 73-88, August 1997.
10. R. Solomonoff, A Preliminary Report on a General Theory of Inductive In-
ference, (Revision of Report V-131), Contract AF 49(639)-376, Report ZTB-
138, Zator Co., Cambridge, Mass., Nov, 1960
11. S. Wolfram, A New Kind of Science, Wolfram Media, 2002.
	*-14pt
	1. On the Kolmogorov-Chaitin Complexity for short sequences
	Jean-Paul Delahaye and Hector Zenil
ABSTRACT
  A drawback of Kolmogorov-Chaitin complexity (K) as a function from s to the
shortest program producing s is its noncomputability which limits its range of
applicability. Moreover, when strings are short, the dependence of K on a
particular universal Turing machine U can be arbitrary. In practice one can
approximate it by computable compression methods. However, such compression
methods do not always provide meaningful approximations--for strings shorter,
for example, than typical compiler lengths. In this paper we suggest an
empirical approach to overcome this difficulty and to obtain a stable
definition of the Kolmogorov-Chaitin complexity for short sequences.
Additionally, a correlation in terms of distribution frequencies was found
across the output of two models of abstract machines, namely unidimensional
cellular automata and deterministic Turing machine.

<|endoftext|><|startoftext|>
Introduction 
In cosmology, the redshift is one of the most important observations to study the 
origin and the evolution of the universe and the motion of celestial bodies. Based on the 
redshift observation, in 1927, Belgian Priest Georges Lemaître was the first to propose 
that the universe began with the explosion of a primeval atom, such hypothesis is the 
origin of the Big Bang theory. The hypothesis was supported by Edwin Hubble (Hubble, 
1929), he found that distant galaxies in every direction are going away from us with 
speeds proportional to their distance, which is based on the observation that the redshift is 
proportional to the distance. According to the Big Bang theory, as light from distant 
galaxies approaches the Earth there is an increase of space between the Earth and the 
galaxies due to the expansion of the universe, which leads to wavelengths being stretched, 
i.e., the light is expanding with the universe. The Big Bang theory becomes a dominant 
theory at current stage in study of the origin and the evolution of the universe. It can 
explain many important observations such as the redshifts, the ratio of light elements in 
the universe, cosmic microwave background radiation, etc.  Tired light theory is an 
alternate explanation of the redshift effect. Tired light was first proposed in 1929 by Fritz 
Zwicky (Zwicky, 1929) who suggested that photons might slowly lose energy as they 
travel vast distances. The major problem associated with this theory is that there is no 
notable machnism causing such energy drop during its journey. 
           In 2004, the author proposed the dark matter field fluid model at the DPF2004 
(Pan, 2005). In this model, the interstellar space is assumed to be, for simplicity, more or 
less uniformly filled with the dark matter field fluid which has fluid property and field 
property and all “baryonic” matter objects are saturated with such dark matter field fluid. 
Any motional celestial object will experience the dragging force of the dark matter field 
fluid. It is demonstrated that the current behavior and past evolution of Earth-Moon 
system can be described very well by this model (Pan, 2007) and the dragging effect of 
the dark matter field fluid dominates the evolution of the Earth-Moon system.  This paper 
will extend the application of the dark matter field fluid model to the light traveling 
through space and compare the results with observations; the geometrical aspects of 
photons is also discussed based on this model.  
2. The possible redshift effect of dark matter field fluid 
In the proposed dark matter field fluid model (Pan, 2005 and 2007), a spherical 
body moving  through the dark matter field fluid at low Reynolds number condition 
experiences the following dragging force F, 
mvrF n−−= 16πη      (1) 
where η is the dark matter field fluid constant which is equivalent to regular fluid 
viscosity constant, r is the radius of the sphere, n is the constant rising from saturation 
effect,  m is the mass of the sphere and the v is the moving velocity of the sphere.  The 
direction of the force F is opposite to the direction of velocity. The equation of motion of 
the body is  
m n−−= 16πη      (2) 
The equation (2) can be written as  
)( 1 mvr
mvd n−−= πη     (3) 
The momentum of the sphere is p=mv, so the equation (3) can be written as 
    pr
dp n−−= 16πη      (4) 
The equation (4) shows that under the dragging force of dark matter field fluid at low 
Reynolds number condition, the decrease rate of momentum of the spherical body is 
proportional to its momentum. It is further assumed that the general form of Eq. (4) can 
be applied to all ordinary matter objects (including photons) which move through the 
dark matter field fluid at low Reynolds number condition, i.e.,  
α−=       (5) 
where α is a parameter depending on the geometrical characteristics of the object (such as 
the size, shape, etc), the dark matter field fluid constant. Eq. (5) is the law of motion for 
ordinary objects moving through the dark matter field fluid with low Reynolds number. 
 It is well known that a photon has momentum p=h/λ, where λ is the wavelength of 
photon and h is the Planck’s constant. For a photon traveling through the dark matter 
field fluid, if the Eq. (5) is applicable, then, 
     (6) 
and  
          (7) teαλλ 0=
where the λ0 is the wavelength of the photon at time t=0, i.e., the wavelength of the 
photon which is just emitted by the emitter. The wavelength of the photon exponentially 
increases with the time it travels. This is the redshift effect of dark matter field fluid. The 
time t for the photons traveling from the emitter to the observer is 
t =        (8) 
where the D is the distance from the emitter to the observer and c is the speed of light. 
Let β =α/c, the redshift constant of dark matter field fluid, the Eq. (7) can be written as 
    .      (9) Deβλλ 0=
By convention, the redshift z is defined as 
=z .      (10) 
So the redshift caused by dark matter field fluid is 
    .      (11) 1−= Dez β
When the βD « 1, Eq, 11 reduces to the regular cosmological redshift formula 
    D
β == .     (12) 
Eq. 12 indicates that for a sufficient short distance, the cosmological redshift is directly 
proportional to the distance. The conventional cosmological redshift formula is 
    D
z =       (13) 
where the H is the Hubble constant. One can see that α is equivalent to H. Although the 
Eq. 12 and Eq. 13 are the same in form, their physical meanings are completely different. 
In the Eq. 12, the redshift is caused by the dragging effect of dark matter field fluid and 
the parameter α is a measure of the dragging effect of the dark matter field fluid on the 
light and has a unique value; but in the Eq. 13, the redshift is caused by universe 
expansion or the stretching of the space.  
 The emitter, however, may move with speed v relative to the observer, such 
motion has a Doppler effect on the emitted light (v « c) 
    )1(0 c
r += λλ       (14) 
where the λr is the wavelength of the light when the emitter is at rest. Therefore, the 
actual wavelength of light detected by the observer is 
    Dr ec
v βλλ )1( += .     (15) 
The observed redshift is 
    1)1( −+=
.    (16) 
Obviously, the relativistic Doppler formula has to be used when the v is close to c. Eq. 16 
indicates that the observed redshift z depends on not only the speed of emitter but also the 
distance. For sufficiently short distance, the Doppler motion effect may dominate the 
redshift z; for a large distance D, the contribution from the dragging force of the dark 
matter field fluid may dominate the redshift z. 
 Fiq. 1 shows how the redshift z varies with the speed of the emitter with 
parameter βD=0.05. By convention, if z is positive, it is redshift; if z is negative, it is 
blueshift. Referring to Fig. 1, the line intercepts with the speed axis v/c at (exp[-βD]-1), 
no at zero. When the emitter (such as galaxies, supernova, pulsars, etc) moves away from 
the observer (on the Earth), v is positive, both the dragging effect of dark matter field 
fluid and the effect of motion (Doppler effect) have positive contribution to z, z is 
positive, so a redshift is observed. When the speed of the emitter is zero, only the effect 
of dark matter field fluid contributes to z which equals to (exp[βD]-1), and a redshift is 
observed. When the speed is negative, i.e., the emitter moves toward the observer with 
the speed in the range between (exp[-βD]-1) and 0, the positive contribution by the dark 
matter field fluid is greater than the negative contribution by the motion, z is still positive, 
i.e., a redshift is observed, so the observer can not know which direction the emitter 
moves based on only the sign of z. When the speed of the emitter equals to (exp[-βD]-1), 
the positive contribution from the dragging effect of dark matter field fluid equals the 
negative contribution from the motion effect, the two effects cancel each other, no shift is 
observed. When the speed of the emitter is at the left side of the (exp[-βD]-1), the 
negative contribution from motion effect is greater than the positive contribution from the 
dark matter field fluid, a blueshift is observed. The observer knows that the emitter 
moves towards to him/her. When the distance D increases, the interception (exp[-βD]-1) 
moves to the left, covers a larger redshift range, a higher negative speed is needed in 
order to observe a blueshift. According to this result, due to the redshift effect of dark 
matter field fluid, much more redshifts than blueshifts should be observed for the celestial 
objects in the sky and a redshift (z > 0) is always observed  for any celestial objects in all 
directions as long as their distance is sufficiently large. This conclusion is exactly what is 
observed. So far only few galaxies show blueshifts, the most famous being M31 galaxy 
(Andromeda). As M31 is our near neighbor and the distance is relatively short, the speed 
is at the left side of (exp[-βD]-1) on Fig. 1. In this case, the redshift associated with the 
dark matter field fluid is less than the blueshift from the motional Doppler effect. Most 
galaxies are much further away from us. It is possible that some of those far away 
galaxies may move towards us, but the redshift effect of dark matter field fluid is greater 
than the blueshift of the motional Doppler effect, the final observed shifts are to the red. 
The significance of the Eq. (16) is that a large value of the redshift z does not necessarily 
means that the distant emitters(galaxies, supernovae, etc) have a large receding speed. 
The speed deduced from a redshift z using conventional Doppler formula or conventional 
cosmological redshift formula (Hubble’s law, v = HD) will be overestimated for v ≥ 0, 
misleading for c(exp[-βD]-1)≤ v ≤ 0, and underestimated for v < c(exp[-βD]-1). Note, the 
speed v here is the actual speed of the emitters relative to the observer at the moment it 
emits the light. 
 The parameter β will be one the important parameters of the nature. Finding the 
value of β is a challenge. The redshift z of a remote celestial object can be accurately 
measured; however, accurately measuring the speeds and distances of the distant celestial 
objects is difficult. We can roughly estimate the range of β. According to the available 
data, the M31 galaxy has a redshift z = -0.000991 (Huchra et al. 1999) and distance about 
2.9 million light years which equals 0.889 Mpc (mega parsec). The speed of M31 would 
be -297 km/s if only based on the Doppler effect. This speed will be underestimated 
according to the Eq. 16. If the actual speed is -600 km/s (most likely exaggerated), then, 
β= 1.14 × 10-3 /Mpc; if the actual speed is -300 km/s, then, β = 1.09 × 10-5 /Mpc. So β is 
probably in the range of 10-3 to 10-5 /Mpc. According to Eq. 12 and Eq. 13, β = H/C = 
2.5×10-4 /Mpc with assumption of H = 75 km/s/Mpc, it is just in the middle of the 
estimated range. Therefore, the current value of Hubble’s constant can be used as a good 
guess for the β. With this estimated value of β, the speed of M31 towards us is about -363 
km/s. When the sufficiently accurate value of β and the distance of the objects are found, 
it will be possible to calculate the speed of the objects with the observed value of z. 
Theoretically, the mechanism of the redshift effect by the dark matter field fluid 
model can be tested by observing the change of the redshift values of remote celestial 
objects over the time. As indicated above that when the emitter moves toward the 
observer with the speed in the range between (exp[-βD]-1) and 0, the positive 
contribution by the dark matter field fluid is greater than the negative contribution by the 
motion, z is still positive, i.e., a redshift is observed. However, the value of the redshift z 
will decrease with time as long as the emitter keeps moving towards the observer, and 
eventually it will become blueshift when the distance between the emitter and observer is 
short enough that redshift by dark matter field fluid is less than the blueshift by the 
motional Doppler effect. However, it will take very long time for any detectable change 
with reasonable accuracy. For example, it will take about 97 million years for the 
blueshift of M31 changing from -0.000991 to -0.001000 based on the current data and 
assuming β = 2.5 x 10-4/Mpc.  
As indicated in the previous paper (Pan, 2005), the dark matter field fluid may 
have thermal property with temperature about 2.7 K, the observed cosmic microwave 
background radiation is the black body radiation of the dark matter field fluid at 2.7K. It 
is certain that the density distribution and the thermal temperature of the dark matter field 
fluid are not uniform through the space and not constant in all time scale. The uneven 
distribution of density and temperature of the dark matter field fluid in the space causes 
uneven black body radiation which could be the origin of the observed anisotropies of 
cosmic microwave background radiation. Furthermore, it is very possible that the dark 
matter field fluid may be converted to other type of matter by certain physical 
mechanisms, or vise verse, which causes the change of density and temperature 
distribution, therefore, the current distribution of the density and the temperature of the 
dark matter field fluid could be different from remote past. 
It is interesting to notice that the redshift effect of dark matter field fluid is 
another version of “Tired light” model. But in this model, it is clear that the mechanism 
to cause the energy loss for photons to pass through the dark matter field fluid is the 
dragging force, and their wavelength becomes increasingly longer as they do so. No one 
ever thinks before that there is any relationship between the evolution of the Earth-Moon 
system and the redshift, but the dark matter field fluid model demonstrates that those two 
events obey the same law of motion (Eq. 5) and share the same mechanism. Such 
mechanism should not cause any blurring to the observed light.  
3. The geometrical aspects of photons 
Light is the most mysterious and common phenomenon in the nature, people are 
always fascinated by the properties of light with all kinds of imagination, such 
imagination may be more important than the knowledge and can make people thinking in  
unusual ways without being limited by their knowledge. At 16 years old, Albert Einstein 
wondered what it would be like to ride on a beam of light. Maxwell described the light as 
a wave; Einstein described the light as a stream of energy packs which are now called 
photons, i.e. particles. So the light has wave-particle dual properties. It is quite often that 
people with different backgrounds and ages wonder what a photon looks like and how big 
it is.  Such question is very interesting, but does not have an answer now, may never have 
one. However, we can still address some of the geometrical aspects of the photons and 
interesting information can be extracted out from the above results.  
We must accept the facts that photons are real matter objects and each photon is 
created as a whole entity and travels in the space as a whole entity without falling apart 
during its journey, therefore there must be some kind of an internal force existed inside 
the photon to hold “all components” of the photon together and keep it stable during the 
journey, such internal force is equivalent to the internal force to hold “all components” of 
an electron together, and is equivalent to the internal force to hold all components of an 
atom together and is equivalent to the internal force to hold all components of solar 
system together. In general, when we talk about the particle property, we will naturally 
think about the size and shape which are associated with particles. For wave-like property, 
on other hand, the size and the shape may lose their meanings according to quantum 
mechanics. However, this does not mean that those objects which can be perfected 
treated by quantum mechanics do not have, at least in time average, the geometrical 
characteristics such as size and shape. The instant geometries of photons, electrons and 
other particles may vary rapidly with time, however, in time average, they should have 
certain stable sizes and shapes, although it is hard to know what such sizes and shapes are.  
For example, the instant size and shape of hydrogen atom rapidly changes with time due 
to the fact that the distance and orientation of electron to the proton rapidly changes 
because of the fast motion of electron around the proton. However, in time average, the 
electron cloud is distributed around the proton which makes the hydrogen atom have a 
“ball” shape with the stable average size well represented by the Bohr radius (0.53 Å). 
From above results, one can see that the observed redshifts agree with the 
description of the model very well and photons with all wavelength-band follow the law 
of motion Eq. 5. As indicated above, the parameter α in Eq. 5 depends on the geometrical 
characteristics of the object (such as the size, shape, etc) and the α is the same for all 
photons; therefore, based on such information, we can conclude that all photons have the 
same geometrical characteristics (size and shape, at least in time average). This means 
that when a photon travels through the dark matter field fluid, it gradually loses its energy 
due to the dragging effect of the dark matter field fluid, but its geometry remains the 
same all the way in its journey. We can further conclude that at least in time average all 
photons in all inertial reference frames have the same geometry; therefore, photons do not 
have the length contraction effect. This is a very important property of photons in 
addition to that all photons have the same speed in all inertial reference frames. In 
contrast, an object with rest mass has length contraction effect when it is observed in 
different inertial reference frames with relative motions according to the special relativity. 
4. Conclusion 
 The dark matter field fluid model has been successfully applied to the 
cosmological redshift, the results deduced from this model agree very well with the 
observations. The observed cosmological redshift of light depends on both the speed of 
the emitter and the distance between the emitter and the observer. If the emitter moves 
away from us, a redshift is observed. If the emitter moves towards us, whether a redshift, 
a blueshift or no shift is observed will depend on the speed vs. the distance. If the speed is 
in the range of c(exp[-βD] – 1) < v < 0, a redshift is observed; if the speed equals c(exp[-
βD] – 1), no shift is observed; if the speed v < c(exp[-βD] – 1), a blueshift is observed. A 
redshift will be always observed in all directions for any celestial objects as long as their 
distance from us is large enough. Therefore, many more redshifts than blueshifts should 
be observed for galaxies and supernovae, etc in the sky. This conclusion agrees with 
current observations. The estimated value of the redshift constant β of the dark matter 
field fluid is in the range of 10-3 ~ 10-5 /Mpc. A large redshift value from a distant 
celestial object may not necessarily indicate that it has a large receding speed. At least in 
time average, all photons have the same geometry in any inertial reference frames and do 
not have length contraction effect. 
5. References 
1. Hubble, Edwin, "A Relation between Distance and Radial Velocity among Extra-
Galactic Nebulae" (1929) Proceedings of the National Academy of Sciences of the United 
States of America, Volume 15, Issue 3, pp. 168-173 
2. Zwicky, F.  On the Red Shift of Spectral Lines through Interstellar Space. PNAS 
15:773-779, 1929. 
3. Pan, H. Application of fluid mechanics to the dark matter, Internat. J. Modern 
Phys. A, 20(14), 3135 (2005). 
4. Pan, H.  The evolution of the Earth-Moon system based on the dark matter field 
fluid model, arXiv:0704.0003 (2007). 
5. Huchra, J. P., Vogeley, M. S., Geller, M. J., Astrophys. J. Suppl. Ser, 121(2), 
287(1999). 
http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1929PNAS...15..168H&db_key=AST&data_type=HTML&format=&high=42ca922c9c30954
http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=1929PNAS...15..168H&db_key=AST&data_type=HTML&format=&high=42ca922c9c30954
http://arxiv.org/abs/0704.0003
The dependence of observed redshift z on the speed v of the emitter with βD = 
0.05. The speed v of the emitter is in unit of speed of light c.
ABSTRACT
  The cosmological redshift phenomenon can be described by the dark matter
field fluid model, the results deduced from this model agree very well with the
observations. The observed cosmological redshift of light depends on both the
speed of the emitter and the distance between the emitter and the observer. If
the emitter moves away from us, a redshift is observed. If the emitter moves
towards us, whether a redshift, a blueshift or no shift is observed will depend
on the speed vs. the distance. If the speed is in the range of
c(exp[-beta*D]-1) < v < 0, a redshift is observed; if the speed equals
c(exp[-beta*D]-1), no shift is observed; if the speed v less than
c(exp[-beta*D]-1), a blueshift is observed. A redshift will be always observed
in all directions for any celestial objects as long as their distance from us
is large enough. Therefore, many more redshifts than blueshifts should be
observed for galaxies and supernovae, etc in the sky. This conclusion agrees
with current observations. The estimated value of the redshift constant beta of
the dark matter field fluid is in the range of 10^(-3) ~ 10^(-5)/Mpc. A large
redshift value from a distant celestial object may not necessarily indicate
that it has a large receding speed. Based on the redshift effect of dark matter
field fluid, it is concluded that at least in time average all photons have the
same geometry (size and shape) in all inertial reference frames and do not have
length contraction effect.

<|endoftext|><|startoftext|>
Cool Stars in Hot Places
ASP Conference Series, Vol. To appear in proceedings of Cool Stars 14,
Ed. Gerard Van Belle
Cool Stars in Hot Places
S. T. Megeath
Ritter Observatory, Department of Physics and Astronomy, University
of Toledo, Toledo, OH 43606
E. Gaidos
Department of Geology & Geophysics, University of Hawaii, Honolulu,
HI 96822
J. J. Hester
Arizona State University, Department of Physics & Astronomy, Tempe,
AZ 85287
F. C. Adams
Physics Department, University of Michigan, Ann Arbor, MI 48109
J. Bally
Center for Astrophysics and Space Astronomy, University of Colorado,
Boulder, CO 80309
J.-E. Lee
Physics and Astronomy Department, The University of California at
Los Angeles, Los Angeles, CA 90095
S. Wolk
Harvard Smithsonian Center for Astrophysics, Cambridge, MA 02138
Abstract.
During the last three decades, evidence has mounted that star and planet
formation is not an isolated process, but is influenced by current and previous
generations of stars. Although cool stars form in a range of environments, from
isolated globules to rich embedded clusters, the influences of other stars on cool
star and planet formation may be most significant in embedded clusters, where
hundreds to thousands of cool stars form in close proximity to OB stars. At
the cool stars 14 meeting, a splinter session was convened to discuss the role
of environment in the formation of cool stars and planetary systems; with an
emphasis on the “hot” environment found in rich clusters. We review here the
basic results, ideas and questions presented at the session. We have organized
this contribution into five basic questions: what is the typical environment of
cool star formation, what role do hot star play in cool star formation, what role
does environment play in planet formation, what is the role of hot star winds and
supernovae, and what was the formation environment of the Sun? The intention
is to review progress made in addressing each question, and to underscore areas
of agreement and contention.
http://arxiv.org/abs/0704.1045v1
2 Megeath et al.
1. What is the Typical Environment of Cool Star Formation?
Cool stars form in a range of environments, from isolated Bok globules, to modest
sized clusters containing 100-200 stars, and finally to large, dense clusters with
thousands of cool stars and several to tens of OB stars. This is in sharp contrast
to OB stars, which form almost entirely in large clusters. This motivates the
question: in what environment do most cool stars form?
Surveys of the molecular gas in our Galaxy indicate that most of the cold
molecular gas is in giant molecular clouds (GMCs) with masses of 105 to 106 M⊙
(Heyer & Terebey 1998). These massive molecular clouds are thought to form
entire associations of hot OB stars as well thousands of low mass stars. Coupled
with analyses indicating that 80-90% of cool stars form in large clusters (Porras
2003; Lada & Lada 2003; Carpenter 2000); these results seemed to point to a
galaxy in which the vast majority of cool star formation takes place in rich
crowded clusters in close proximity to hot stars. However, since there was little
information on the numbers of isolated stars, the analyses of Porras (2003) and
Lada & Lada (2003) considered only stars in groups and clusters. In an analysis
of the 2MASS point source catalog toward several molecular clouds, Carpenter
(2000) found evidence for substantial numbers of isolated stars, but the estimates
contained significant uncertainties.
More recently, surveys of giant molecular clouds with the Spitzer space
telescope provided the means to identify isolated young stars and protostars
through the infrared excesses from their disks and envelopes (Allen et al. 2007).
Spitzer surveys of four giant molecular clouds containing young massive hot stars,
the Orion A cloud, Orion B cloud, Cep OB3 cloud and Mon R2 cloud, show that
in addition to clusters associated with regions of massive star formation, there
are large number of stars in small groups or isolation. In these clouds, 46% of
the young stars with excesses are found in clusters with over 90 sources, 11%
are found in small clusters of 90-30 stars, 8% in groups of 30-10 stars, and the
remaining 35% in groups with less than 10 members or isolation (Megeath et al.
2007; Gutermuth et al. 2007). About 33% of the stars are found in the two
largest clusters with over 700 members each. Thus, although most cool stars
may form in OB associations, young cool stars in OB associations are not found
primarily in large clusters. Instead, they are found in a range of environments,
with a significant fraction of stars forming in relative isolation several to tens of
parsecs away from the nearest OB stars (Fig. 1).
2. What Role do Hot Stars play in Cool Star Formation?
Although cool stars dominate star-forming regions in both number and total
stellar mass, hot stars are thought to be the primary agents of molecular cloud
evolution. The extreme-UV radiation from young O and early B-type stars pho-
toionizes the surfaces of molecular clouds, resulting in flows of ionized gas which
erode the clouds. Far-UV radiation may play a similar role in regions where only
B-stars are present by heating and photodissociating the molecular gas. Clus-
ters with O and/or B stars and ages of only a few million years appear to have
partially or fully dispersed their molecular clouds. An example is the 2.5 Myr
old σ Ori cluster (Sherry et al. 2004); this cluster sits outside the Orion B cloud
Cool Stars in Hot Places 3
Figure 1. The Orion OB association. The contours show an AV map of
the Orion region made from the 2MASS database (Gutermuth, p. com.), the
gray circles are the O stars (large circles) and B stars (small circles) from
Brown et al. (1994) and the dots are Spitzer identified young cool stars and
protostars (Megeath et al. 2007; Hernandez et al. 2007). Only regions of
high molecular column density and the σ Ori cluster have been surveyed by
Spitzer, and many more young cool stars certainly exist in the OB association
(Fig. 1). Other examples are discussed in Allen et al. (2007). It is estimated
that only 10% of embedded clusters survive gas dispersal and presist as clusters
for more than 10 Myr (Lada & Lada 2003).
The detection of Evaporating Gaseous Globules (EGGs), 1000 AU diameter
photoevaporating dark globules, demonstrated that hot stars may directly im-
pact protostellar evolution (Hester et al. 1996). EGGs appear to be protostellar
or prestellar cores which emerge from their parental clouds as the surrounding
lower density gas is ionized (Hester & Desch 2005). In M16, 15% of the EGGs
contain embedded stars, indicating that they are the sites of recent or ongoing
star formation (McCaughrean & Anderson 2002). This suggests that hot stars
can directly affect protostellar evolution by photoevaporating the infalling gas
4 Megeath et al.
and limiting the ultimate mass of the nascent star. However, it is not known
what fraction of stars emerge from their clouds in EGGs.
A more controversial issue is whether OB stars trigger cool star forma-
tion. This possibility has been discussed in the literature for decades (e.g.
Elmegreen & Lada 1977). Hester & Desch (2005) proposed that in regions with
hot stars, cool star formation is driven primarily by shock fronts preceeding ad-
vancing ionization fronts. The shock fronts overtake and compress pre-existing
density enhancements, inducing collapse and the formation of clusters of low
mass stars. Evidence for this is found in the detection of clusters of young stars
at the surfaces of molecular faces being eroded by hot stars (Sugitani et al. 1995;
Megeath et al. 2004; Allen et al. 2006). However, additional evidence, such as
the detection of the shock fronts, is needed to determine whether the clusters
have been triggered, or whether they are regions of ongoing star formation which
have been overtaken by ionization fronts (Megeath & Wilson 1997).
Although there is growing evidence that triggering does happen, it is not
clear what fraction of cool star formation is triggered. Assessing the overall
importance of triggered star formation can be difficult due to the rapid evolution
and even rapid motions of OB stars. For example, Hoogerwerf et al. (2001)
argued that the interaction of the ι Ori binary system with a second system led
to the ejection of the runaway stars AE Aur and µ Col 2.5 Myr ago (both are 09.5
stars). Although they suggested that these stars originated in the Orion Nebula
Cluster, the lack of a visible HII region surrounding ι Or, an O9 III star which in
projection appears conicident with the Orion A cloud, suggests that it is several
to tens of parsecs away from the Orion A molecular cloud and is part of the 5 Myr
OB1c association (Brown et al. 1994). At the time of their ejection, these three
O–stars may have had a significant impact on the Orion A cloud, and could
have been responsible for triggering star formation in the Orion Nebula Cluster.
Another possible example is the LDN 1551 dark cloud in the Taurus dark cloud
complex. This cloud has a cometary morphology with the “head” of the comet
pointing toward the Orion constellation. Moriarty-Schieven et al. (2005) argued
that the cometary shape may be due to the interaction of LDN 1551 (149 pc from
the Sun) with the B8I star Rigel (Hipparcos distance is 240 pc) and the M2I star
Betelgeuse (Hipparcos distance 130 pc). The high proper motion of Betelgeuse
would place it southeast of LDN 1551 several million years ago; hence, both
Betelgeuse and Rigel could have plausibly interacted with LDN 1551, creating
the cometary morphology.
These observations demonstrate the difficulties in determining causal rela-
tionships between subsequent generations of star formation and establishing the
importance of triggering. Although ongoing triggering can be identified by the
detection of clusters near ionization fronts, in many cases, evidence of triggering
may be erased by the evolution and motion of massive stars.
3. What Role does Environment Play in Planet Formation?
Environment may also play a role in planet formation by altering the properties
of protoplanetary disks. We discuss here two mechanisms: tidal interactions
between stars in clusters and the photo-ablation of disks by UV photons from
nearby OB stars.
Cool Stars in Hot Places 5
Tidal interactions occur when a disk around a star in a cluster is distorted
or stripped during a close encounter with another cluster member. Such inter-
actions appear to be unimportant. Adopting a stellar density of 104 pc−3 (the
peak density for many embedded clusters) and assuming virialized velocities,
Gutermuth et al. (2005) used a simple mean free path argument to estimate the
frequency of close approaches. They estimated that even in the dense, central
cores of clusters, close approaches at distances of 100 AU would occur once
in a 10 Myr interval. However, the high stellar densities assumed by Guter-
muth et al. may only persist for a few million years before the clusters begin
to expand. This result is supported by N-body simulations of bound clusters
which show that such interactions are rare over the lifetime of an embedded
cluster (Adams et al. 2006; Throop & Bally in prep). Adams et al. (2006) find
that each star in a 1000-member (initially) embedded cluster will experience
one close-approach within 700-4000 AU over a 10 Myr interval. This distance
is more than three times the typical radius of observed circumstellar disks in
nearby dark clouds (Andrews et al. 2007) and much larger than the size of the
Solar System. Since the adopted timescale for gas removal in these simulations
was 5 Myr, longer than the observed timescale (Sec. 2), the close-approach dis-
tances should be considered lower limits. In summary, the results from three
independent investigations are in agreement; unless embedded clusters exist in
our galaxy with much higher stellar densities than observed in nearby regions
such as the Orion Nebula Cluster, tidal interactions in clusters rarely influence
disk evolution and planet formation.
In contrast, photoevaporation of disks by nearby OB stars appears to be a
much more influential process. The UV radiation from the OB stars heats the
gas in disks through photoionization and photodissociation, resulting in flows
of gas off the disks. This process was discovered in VLA and HST observations
of young stars in the Orion Nebula (Churchwell et al. 1987; O’Dell & Zheng
1994). The inferred mass loss rates were 10−7 M⊙, suggesting disk lifetimes of
only a few hundred thousand years (Bally et al. 1998). However, the mass loss
occurs in the outer disk where the thermal velocity of ionized gas exceeds the
escape velocity from the star, and the gas in the inner disk may not be strongly
affected. More recent calculations include the effect of the far-UV radiation
and the time dependent nature of the UV-field as the stars orbit within the
cluster potential. Adams et al. (2004) calculate the mass loss from a disk as a
function of the intensity of the far-UV radiation field. They find the radiation
field can truncate a disk to the size of our solar system in several million years;
the exact radius depends on the duration of the exposure to UV radiation, the
intensity of the UV radiation, and the mass of the central star (Adams et al.
2004). Throop & Bally (in prep) use N-body simulations to calculate the time
dependent flux of UV radiation incident on a young star with disk as it orbits
in a cluster which contains OB stars in its center. They find that typical stars
experience only a brief exposure to intense UV as they pass within 10,000 AU
of the central OB stars. Consequently, the UV flux incident on a disk varies in
an stochastic manner over the lifetime of the cluster.
Recently, Throop & Bally (2005) proposed that the photoevaporation of
disks may in fact trigger the formation of planets. In their model, grain growth
and dust settling concentrates dust grains in the midplane of the disk. Conse-
quently, the ablation of the gas from the disk surface (as well as the remaining
6 Megeath et al.
dust grains entrained in the gas) reduces the ratio of the gas surface density
to dust surface density. If the surface density of gas is reduced to less then
10 times the dust density, the disk becomes unstable to gravitational collapse
(Sekiya 1998; Youdin & Shu 2002).
Although photoevaporation may be important in rich embedded clusters
with OB stars, many young cool stars in OB associations are not found in such
clusters. Young cool stars with disks identified in the Spitzer survey of the
Orion A cloud have a median projected distance of 4.1 pc to the nearest O to
B0 star, and a median projected distance of 2.1 pc to the nearest B1-B3 stars
(Megeath et al. 2007). Hence, in OB associations, most cool stars may form
at large distances from the central OB stars and are unaffected by their UV
radiation.
4. What is Role of Hot Star Winds and Supernovae?
Chandra X-ray observations of young stellar clusters have detected diffuse X-
ray emission in nine regions. The total luminosities of this gas range from
1− 200× 1033 erg s−1 (Wolk et al. 2002; Townsley 2006). Although supernovae
could generate this gas, in most cases the diffuse gas appears to be generated by
stellar winds from massive stars colliding with other winds or the surrounding
HII region. However, in the Carina region, a component of hot gas enriched in Fe
was likely created by a supernova (Townsley 2006). The impact of the extremely
hot gas on star and planet formation is not well understood. In addition to
destroying the surrounding the cloud, the blast waves from a supernova could
compress surrounding cores of gas causing them to collapse into stars (Boss 1995;
Melioli et al. 2006). Disks can survive at distances of ≤ 1 pc from a supernova
(Chevalier 2000); however, these disks will be heated by the radiation and blast
wave, and may also be stripped by the blast wave when the disks are only 0.25 pc
from the supernova (Chevalier 2000). The hot X-ray gas created by winds may
fill bubbles within the larger HII region. This hot, low density gas would be
transparent to UV photons, and hence any young stars within the bubble may
be exposed to a more intense UV field than those in the surrounding HII region.
5. What Was the Formation Environment of the Sun?
Did our Sun also form in the “hot” environment of a large embedded cluster?
Tremaine (1991) and Gaidos (1995) proposed that our Solar System might pre-
serve dynamical evidence of its birth environment. Gaidos (1995) and Adams & Laughlin
(2001) used the low inclination and eccentricity of Neptune to place constraints
on the time-integrated tidal field of a cluster and the closest stellar passage.
However, such reasoning must now be re-examined in light of the expectation
that most embedded clusters expand and disperse in a few Myr (although some
clusters would form bound open clusters, Sec. 2) and the realization that Nep-
tune (and Uranus) migrated outward to its present orbit by scattering in a
residual planetesimal disk, a process that was probably not completed until
after a parental cluster dispersed (Hahn & Malhotra 2005). Scattering inside
the disk itself, which dampens any non-circular motion, could have produced
the low eccentricity and inclination observed today. Similar arguments can
Cool Stars in Hot Places 7
be made that other parts of the outer Solar System (the Edgewood-Kuiper
belt, Oort Cloud) formed after the cluster evaporated (Levison & Morbidelli
2003). Kenyon & Bromley (2004) and Morbidelli & Levison (2004) proposed
that Sedna, a member of the scattered Kuiper Belt, was produced by the
close passage of a star, but there are other explanations (Barucci et al. 2005;
Gladman & Chan 2006). Thus, it is likely that the structure of the outer Solar
System post-dates an embedded cluster phase.
The strongest evidence for an early cluster environment is the inferred pres-
ence of short-lived radionuclides (SLRs) during the formation of solids now found
in meteorites. There are at least three possible sources of SLRs: particle irradi-
ation within the primordial solar nebula, the wind from a nearby AGB star, and
the wind and/or supernova ejecta from a nearby massive star. The discovery
of 60Fe in the early Solar System (Tachibana & Huss 2003) firmly establishes
that the Sun formed in a rich cluster containing massive stars (Hester 2004;
Hester & Desch 2005). Neutron-rich isotopes such as 60Fe cannot be produced
by particle irradiation. The uniform distribution of the SLR 26Al makes it
unlikely it was produced by irradiation (Thrane et al. 2006). Finally, it is sta-
tistically unlikely that the SLRs originated in an AGB star (Kastner & Myers
1994).
Further evidence is found in the mass-independent fractionation of the oxy-
gen isotopes (17O and 18O) in meteorites. Following a proposal by Clayton, Grossman & Mayeda
(1973), Lee et al. (2007) have made a theoretical analysis of the time-dependent
chemistry in a collapsing envelope subjected to an external UV field. Due to
“self-shielding” of the much more abundant C16O, the UV field preferentially
dissociates C18O and C17O, producing an enhancement of 18O and 17O in the
gaseous envelope. These heavier isotopes are then incorporated (as water) into
ice grains and transported into the inner region of the solar nebula. This pro-
cess depends on the intensity of the external UV radiation field (from OB stars)
so that the measured fractionation can constrain the formation environment of
the Sun. Lee et al. (2007) conclude that the observed isotopic ratios are best
explained by a radiation field 105 greater than the interstellar field, again sup-
porting the presence of nearby massive stars.
The current evidence firmly indicates that the Sun formed in a hot environ-
ment enriched by the ejecta of one or more nearby supernova; however, there
is a continuing debate over how the solar nebula was enriched. Cameron et al.
(1995) argued that the enrichment occurred when the collapse of the proto-
solar molecular cloud was triggered by the blast wave of a supernova (also see
Vanhala & Boss 2002). Hester & Desch (2005) question whether this process
could enrich the collapsing molecular gas. Alternatively, the protostellar en-
velope of the Sun may have been directly enriched while collapsing onto the
proto-Sun (Looney et al. 2006). For example, if the solar system formed in an
EGG, then it may have been subjected to a blast wave from a supernova. Finally,
the SLRs may have been injected directly into the disk of the solar nebula when
the Sun was in its T-Tauri phase; a possible mechanism for this is the “aerogel”
model, in which grains in SN ejecta are deaccelerated and vaporized within the
gaseous primordial disk (Ouellette et al. 2005). This scenario is supported by
observations showing that 40% of disks may persist for 4 Myr (Hernandez 2007),
the lifetime of a 60 M⊙ star.
8 Megeath et al.
Recent quantitative analyses have constrained the distance between the
Sun and the supernova from which the SLRs presumably originated. If the en-
richment occurred while the Sun was in a T Tauri phase with a 200 AU disk
(Andrews et al. 2007), the estimated distance is between 0.04-0.4 pc (Looney et al.
2006; Ellinger et al. in prep). If the enrichment occurred in the protostellar
phase (5000 AU diameter), the estimated distance is between 0.12- 1.6 pc (Looney et al.
2006). The question has been raised whether these distances are consistent with
observations showing that embedded clusters largely disrupt their parental cloud
and disperse in a few million years (see Sec. 2). The dispersal of the molecular
gas makes the presence of nearby protostars unlikely, and the subsequent expan-
sion of the cluster make the presence of young stars with disks less likely. There
are possible solutions to this problem. The Sun may have remained close to a
hot star as the cluster dispersed. Only one low mass star with a disk is found
within a projected distance of ∼ 0.3 pc of the O6 star HD206267 in the 4 Myr
old IC 1396 association (Sicilia-Aguilar et al. 2006), suggesting that this may be
rare occurrence. The Sun may have been a bound companion to a massive star,
such as the companions with disks found around the OB stars comprising the
Orion Trapezium (Schertl et al. 2003); however, it unclear how long such a disk
may survive. The Sun could have formed in a massive embedded cluster which
evolved into a bound open cluster. In this case, the solar system would have to
survive photoevaporation and perturbations from tidal interactions as it orbited
within the cluster (Adams & Laughlin 2001). Finally, the solar system may
have been enriched by the combined ejecta of many supernova (Hester & Desch
2005; Williams & Gaidos in prep). Additional data on SLRs in meteorites, de-
tailed modeling of the evolution and dispersal of embedded clusters, and the
study of other planetary systems in hot environments should bring a more de-
tailed understanding of our Sun’s formation environment.
The presence of SLRs may have had a significant impact on planet formation
in the solar nebula. Radioactive decay of 26Al and 60Fe provides by far the
largest source of energy for melting and differentiating planetesimals in the early
Solar System (Bizzarro et al. 2005; Hevey & Sanders 2006). In summary, it has
been amply demonstrated by observation and theory that environment plays
a significant role in the formation of cool stars and planets. A comprehensive
understanding of star and planet formation must not treat young stars and
protoplanetary solely as isolated objects, but as parts of larger associations and
clusters in which the formation of cool and hot stars are inextricably linked.
References
Adams,F. C., Hollenbach, D., Laughlin, G. Gorti, U. 2004 ApJ, 611, 360.
Adams, F. C., Proszkow, E. M., Garuzzo, M., Myers, P. C. 2006 ApJ 641, 504.
Adams, F. C., & Laughlin, G. 2001, Icarus 150, 151.
Allen, L. E., Hora, J. L., Megeath, S. T., Deutsch, L K., Fazio, G. G., Chavarria, L.,
Dell, R. W., 2005, IAU 227, eds. Cesaroni, R., Felli, M., Churchwell, E.
Allen, L., Megeath, S. T., Gutermuth, R., Myers, P. C., Wolk, S., Adams, F. C.,
Muzerolle, J., Young, E. T., & Pipher J. R. 2007, Protostars and Planets V.
Andrew, S. M., Williams, J. P. 2007 ApJ in press.
Bally, J., Sutherland, R. S., Devine, D., & Johnstone, D. 1998, ApJ 116, 293.
Barucci, M. A., Cruikshank, D. P., Dotto, E., et al., 2005 A&A 439, L1.
Bizzarro, M., Baker, J. A., Haack, H & Lundgaard, K. L. 2005, ApJ 632, L41.
Cool Stars in Hot Places 9
Boss, A. P., 1995, ApJ 439, 224.
Brown, A. G. A., de Geuss, E. J. & de Zeeuw, P. T., 1994, A&A 289, 101.
Cameron, A. G. W., Hoeflich, P., Myers, P. C., & Clayton, D. D., 1995, ApJ, 447, L53.
Carpenter, J. M. 2000 AJ, 120, 3139.
Chevalier, R. A., 2000 ApJ, 538, L151.
Churchwell, E. B., Felli, M., Wood, D. O. S. & Massi, M. 1987, ApJ, 321, 515
Clayton R. N., Grossman, L., & Mayeda, T. K. 1973, Science, 182, 485.
Ellinger et al., in prep.
Elmegreen, B. G. & Lada, C. J. 1977, ApJ 214, 725.
Gaidos, E. J., 1995, Icarus 114, 258.
Gladman, Chan 2006, ApJ 643, L135.
Gutermuth, R. A., Megeath, S. T., Pipher, et al. 2005 ApJ 632, 397.
Gutermuth, R. A., Pipher, J. L., Megeath, S. T. et al., 2007, in prep.
Hahn, J. M. & Malhotra, R., 2005, AJ 130, 2392.
Hevey, P. J. & Sanders, I. S. 2006, Meteoritics & Planetary Science, 41, 95.
Heyer, M. H., Terebey, S., 1998, ApJ 502, 265.
Hernandez, J., Hartmann, L., Megeath, T. 2007 ApJ in press.
Hester, J. J. & Desch, S. J. 2005, in ASP Conf. Ser. 341: Chondrules and the Proto-
planetary Disk ed. Krot, Scott, & Reipurth, 527.
Hester, J. J., Desch, S. J., Healy, K. R., Leshin, L. A. 2004, Science 304, 1116.
Hester, J. J., Scowen, P. A.,Sankrit, R. et al., 1996, AJ, 111, 2349.
Hoogerwerf, R., de Bruijne, J. H. J. & P. T. de Zeeuw, 2001, A&A 365, 49.
Kastner, J. H. & Myers, P. C., 1994, ApJ 421, 605.
Kenyon, S. J. & Bromley, B. C., 2004, Nature 432, 598.
Lada, C. J. & Lada, E. A., 2003 ARA&A 41, 57.
Lee, J.-E., Bergin, E. & Lyons, J. submitted to Meteoritics & Planetary Science
Levison, H. F. & Morbidelli, A., Nature 2003, 426, 419.
Looney, L. W., Tobin, J. J., Fields, B. D. 2006, ApJ 652, 1755.
Megeath, S. T., Wilson, T. L. AJ, 2007, AJ 114, 1106.
Megeath, S. T., Allen, L. E., Gutermuth, R. A. et al., 2004, ApJS, 154, 367.
Megeath, S. T., Gutermuth, R. A., Hora, J. L., et al., 2007, in prep.
Melioli, C., De Gouveia Dal Pino, E. M., et al., 2006, MNRAS, 373, 811.
McCaughrean, M. J., Andersen, M., 2002, A&A 389, 513.
Morbidelli, A. & Levison, H. F., 2004, AJ 128, 2564.
Moriarty-Schieven, G. H., Johnston, D., Bally, J. & Jenness, T., 2006, ApJ 645, 357.
O’dell, C. R. & Zhen, W., 1994, ApJ, 436, 194.
Ouellette, N., Desch, S. J., Hester, J. J., Leshin, L. A., 2005, in ASP Conf. Ser. 341:
Chondrules and the Protoplanetary Disk ed. Krot, Scott, & Reipurth, 527.
Porras, A., Micol, C., Allen, L, et al., 2003, AJ 126, 1916.
Reach, W., Rho, J., Young, E., et al., 2004, ApJS 154, 385.
Sekiya, M., 1998, Icarus 133, 298.
Sherry, W. H., Wlater, F. M. & Wolk, S. J. 2004, AJ 128, 2316.
Sicilia-Aguilar, A., Hartmann, L., Calvet, N. et al., 2006, ApJ 638, 897.
Schertl, D., Balega, Y. Y., Preibisch, Th. & Weigelt, G., 2003, A&A 402, 267.
Sugitani, Motohide, T., Ogura, K., 1995, ApJ 455, L39.
Tachibana, S. & Huss, G. R., 2003. ApJ, 588, L41.
Thrane, K., Bizzarro, M. & Baker, J. A. 2006 ApJ 646, L159.
Throop, H. B., Bally, J., 2005, ApJ 623, L149.
Throop, H. B., Bally, J. 2007 in prep.
Townsley, L. K. 2006 to appear in “Massive Stars: From Pop III and GRBs to the Milky
Way”, ed. M. Livio (astro-ph/0608173).
Tremaine, S., 1991, Icarus, 89, 85.
Vanhala, H, A. T. & Boss, A. P. 2002, ApJ 576, 1144.
Williams, J. P., & Gaidos, E. J. in prep.
Wolk, S. J., Bourke, T. L., Smith, R. K., et al., 2002 ApJ 580, L161.
http://arxiv.org/abs/astro-ph/0608173
10 Megeath et al.
Youdin, A. N. & Shu, F. H., 2002, ApJ 580, 494.
ABSTRACT
  During the last three decades, evidence has mounted that star and planet
formation is not an isolated process, but is influenced by current and previous
generations of stars. Although cool stars form in a range of environments, from
isolated globules to rich embedded clusters, the influences of other stars on
cool star and planet formation may be most significant in embedded clusters,
where hundreds to thousands of cool stars form in close proximity to OB stars.
At the cool stars 14 meeting, a splinter session was convened to discuss the
role of environment in the formation of cool stars and planetary systems; with
an emphasis on the ``hot'' environment found in rich clusters. We review here
the basic results, ideas and questions presented at the session. We have
organized this contribution into five basic questions: what is the typical
environment of cool star formation, what role do hot star play in cool star
formation, what role does environment play in planet formation, what is the
role of hot star winds and supernovae, and what was the formation environment
of the Sun? The intention is to review progress made in addressing each
question, and to underscore areas of agreement and contention.

<|endoftext|><|startoftext|>
Introduction 2
2 Differential forms with logarithmic singularities 5
3 Logarithmically singular hermitian vector bundles 13
4 Global bounds for real log-log growth (1,1)-forms 17
5 Bounding height integrals 25
6 Arakelovian heights 39
7 Examples 46
8 Appendix 52
Conventions
We fix some conventions and notations to be followed throughout this paper.
The open disc of C centered at 0 and of radius ε > 0 will be denoted by ∆ε.
If f, g : E → R are two functions on a set E 6= ∅, we write f ≺ g to mean
that there exists a constant C such that f(x) ≤ Cg(x) for all x ∈ E. If the
involved constant depends on some data D that we want to specify, we write
f ≺D g.
http://arxiv.org/abs/0704.1046v1
If X is a complex analytic manifold we decompose the exterior differential
operator as d = ∂+ ∂̄ and we define dc = (4πi)−1(∂− ∂̄), so that ddc = i ∂ ∂̄ /2π.
Let k be a field. By an algebraic variety X over k we mean a separated and
reduced scheme of finite type over k. In particular, X is noetherian and it has
a finite number of irreducible components. When k = C, for every separated
scheme of finite type X over C there is an associated complex analytic space
Xan, whose underlying topological space equals the set of complex points X(C).
If F is a closed subscheme of X , then F an is an analytic subspace of Xan. The
scheme X is proper over C if, and only if, Xan is compact. Also, X is a
nonsingular variety over C if, and only if, Xan is a complex analytic manifold.
To simplify notations we will write X instead of Xan or X(C) (it will be clear
from the context the category we are working on).
Let K be a number field. Its ring of integers is denoted by OK . Let S =
SpecOK . An arithmetic variety over S will be a flat and projective scheme
π : X → S , with regular generic fiber XK = X ×S SpecK of pure dimension
n. The set of complex points X (C) has a natural structure of complex analytic
manifold, and it can be partitioned as
X (C) =
σ:K →֒C
Xσ(C).
The complex conjugation induces an antiholomorphic involution F∞ : X (C) →
X (C).
1 Introduction
In this paper we establish a common generalization of the following two state-
ments.
Theorem 1.1 (Faltings [7], Lemma 3). Let X be a projective arithmetic variety
and Y ⊆ X a Zariski closed subset. Let L = (L , ‖ ·‖) be an ample line bundle
on X endowed with a smooth hermitian metric on L |X (C)\Y (C). Suppose that
‖ · ‖ has logarithmic singularities along Y (C). Fix a number field K. For any
real constant C, there are only finitely many points P ∈ X (K) \ Y (K) with
(P ) ≤ C.
The notions of function and metric with logarithmic singularities are recalled
in §2 and §3 below.
Theorem 1.2 (Bost-Gillet-Soulé [1], Proposition 3.2.4 and Theorem 3.2.5).
Let X be a projective arithmetic variety and L a smooth hermitian ample line
bundle on X .
1. For any real constant A, there are only finitely many effective cycles z ∈
Zp(X ) such that deg
z ≤ A and h
(z) ≤ A.
2. There exists a positive constant κ such that h
(z) ≥ −κ deg
z for every
effective cycle z ∈ Zp(X ).
Let us place under the hypothesis of Theorem 1.1. Let P ∈ X (K) \ Y (K),
extended to P : SpecOK → X by properness of X . Since L is ample, there
exists some positive power L ⊗N admitting a global section s non-vanishing at
P . Then
(P ) = log ♯
(L ⊗N )
(sOK)
σ:K →֒C
log ‖s‖σ(Pσ),
where Pσ is the point in Xσ(C) induced by P . Therefore, in Faltings’ result,
only the metric as a function is required. The definition of the height of a
cycle of positive relative dimension, with respect to a smooth hermitian line
bundle, involves the derivatives of the metric up to second order (see [1] §3 and
also §6 below). Therefore, to extend both theorems, we need to describe the
kind of logarithmic singularities allowed to the derivatives of the metric, up to
second order. The arithmetic intersection theory of Gillet and Soulé needs to
be reinterpreted so we can deal with such metrics. This has been done in [2],
where Burgos, Kramer and Kühn develop a theory of abstract arithmetic Chow
groups, and apply it to the case of logarithmic singularities.
Before the statement of our main theorem we introduce some notations. Let
K be a number field, OK its ring of integers and X a projective arithmetic
variety over SpecOK . Suppose that D ⊆ XK is a divisor such that D(C) has
normal crossings. Write U = X (C) \D(C). By Z
U (X ) we denote the group of
codimension p cycles z on X , such that z(C) intersects D(C) properly. A cycle
on X is called horizontal if it is the Zariski closure of a cycle on XK . We write
U (XK) for the subgroup of Z
U (X ) consisting of horizontal cycles. Now let L
be a line bundle on X , endowed with a pre-log-log hermitian metric ‖ · ‖, with
singularities along D (see §3 below for the definition). According to Burgos,
Kramer and Kühn, this is the notion of metric with logarithmic singularities
well suited to define heights on Z
U (X ) (see [2] §7). If LK is ample there is a
well defined normalized height h̃
U (XK) with respect to L = (L , ‖ · ‖).
We refer to §6 for a summary of the theory of heights in Arakelov theory.
Theorem 1.3 (Main Theorem). Let X be a projective arithmetic variety over
SpecOK , D ⊆ XK a reduced divisor such that D(C) ⊆ X (C) has simple
normal crossings and U = X (C) \ D(C). Let L be a line bundle on X with
LK ample. Let ‖ · ‖ be a pre-log-log hermitian metric on L , with singularities
along D, and ‖ · ‖0 a smooth metric on L . Then there exist constants α, β,
γ > 0 and R ∈ Z≥0 such that for every effective cycle z ∈ Z
U (XK) we have
(z) + γ ≥ 1 and
(1.1)
∣∣∣h̃
(z)− h̃
∣∣∣ ≤ α+ β logR
(z) + γ
If moreover ‖ · ‖ is good along D, then we can take R = 1.
In conjunction with Theorem 1.2, Theorem 1.3 immediately yields the de-
sired finiteness property as well as the existence of a universal lower bound.
Corollary 1.4. Let X be a projective arithmetic variety over SpecOK , D ⊆
XK a reduced divisor such that D(C) has simple normal crossings and U =
X (C) \D(C). Let L be a pre-log-log hermitian ample line bundle on X , with
singularities along D.
1. For any real constant A, there are only finitely many effective cycles z ∈
U (X ) such that degLK z ≤ A and hL (z) ≤ A.
2. There exists a positive constant κ such that h
(z) ≥ −κ deg
z for every
effective cycle z ∈ Z
U (X ).
The techniques employed for the proof of the main theorem were initially in-
spired by the work of Carlson and Griffiths on the defect relation for equidimen-
sional holomorphic maps, in higher dimensional Nevanlinna theory [4]. Moreover
(1.1) may be interpreted as a vast generalization of the naive Liouville’s inequal-
ity for the distance between an algebraic number and a rational number. This
point of view is conceptually interesting, since it opens the natural question of
finding an analogue to Roth’s theorem for the distance(1) between a divisor with
simple normal crossings and effective cycles of arbitrary dimension (all defined
over a number field)(2).
This paper is organized as follows.
In Section 2 we review the theory of differential forms with logarithmic sin-
gularities necessary in the rest of the paper. In Section 3 we study in detail
several notions of logarithmically singular hermitian vector bundles. The re-
sults we recall provide a wealth of constructions to which Theorem 1.1 and
Theorem 1.3 apply. Both sections are complemented with examples for a bet-
ter understanding of the theory. In Section 4 we establish global bounds for
real log-log growth (1,1)-forms. An outstanding consequence is a decomposition
theorem (Theorem 4.3) for pre-log-log functions, which plays a crucial role in
the proof of the main theorem. In Section 5 we prove estimates for integrals of
pre-log-log forms appearing in the archimedian part of the definition of height.
This leads to the proof of the main theorem in Section 6, where the reader will
find an overview of the theory of heights in Arakelov geometry. In Section 7
we present some examples of good hermitian line bundles interesting for arith-
metic purposes (for instance in relation with Theorem 1.3). We treat the case
of fully decomposed automorphic vector bundles on locally symmetric varieties,
the relative dualizing sheaf of the universal curve over the moduli space of stable
curves, equipped with the family hyperbolic metric, the Weil-Petersson metric
on the moduli space of curves and the Kähler-Einstein metric on quasi-projective
varieties. Finally, in the Appendix we prove a Bertini’s type theorem needed
for the preliminary reductions in the proof of the main theorem.
Acknowledgements. I am deeply indebted to J.-B. Bost and J. I. Burgos Gil
for proposing me to work on this subject as the starting point of my PhD. thesis,
and for their guidance and constant encouragement. During the preparation of
this paper I benefited from stimulating discussions with several people: I warmly
thank S. Fischler, U. Kühn, V. Maillot, C. Mourougane and D. Roessler for the
time they devoted to me. Their useful advice is reflected throughout the paper.
This work was presented, in a preliminary form, at the Internationales
Graduiertenkolleg Arithmetic and Geometry of the Humboldt-Universität zu
Berlin and the ETH of Zürich, in May 2005. I am grateful to J. Kramer for
inviting me to talk to their summer school.
(1)Suitable candidates for the logarithm of the distance are provided by the height integrals
introduced in §5 below.
(2)The height morphism h
is defined for any pre-log-log hermitian line bundle L , with
singularities along a divisor in X (C). However, for the main theorem to hold, the divisor
needs to be defined over a number field. This essentially goes back to the construction of
transcendental numbers due to Liouville.
2 Differential forms with logarithmic singulari-
Let X be a complex analytic manifold and F a closed analytic subspace. In this
section we introduce several notions of differential forms on X with logarithmic
singularities along F , relevant to our work. We distinguish the case of functions
from the case of differential forms, since the former can be presented in a more
general geometric frame. Indeed, while we can define functions with logarithmic
singularities along an arbitrary closed analytic subspace F , the appropriate
analogues for differential forms of any order require F to be a divisor with
normal crossings.
2.1 Functions with logarithmic singularities
Let X be a complex analytic manifold and F a closed analytic subspace of X .
We denote by IF ⊆ OX the ideal sheaf defining F and supp(F ) for the support
of F . For every x ∈ X , there exist an open neighborhood V and holomorphic
functions s1, . . . , sm ∈ OX(V ) such that
i. the germ of ideal sheaf IF,x is generated by s1, . . . , sm;
ii. the trace of the support of F on V is supp(F ) ∩ V = {z ∈ V | s1(z) =
. . . = sm(z) = 0}.
Since OX is a coherent sheaf, so is IF . Then, given s1, . . . , sm as above, after
possibly shrinking V , s1, . . . , sm generate all the germs IF,z for z ∈ V . In this
case we say that s1, . . . , sm generate IF |V and we write IF |V = (s1, . . . , sm) as
a shortcut. The reader is referred to [6] for further details on analytic spaces.
Definition 2.1. Let X be a complex analytic manifold and F ⊆ X a closed
analytic subspace. A smooth function f : X \ supp(F ) → C has logarithmic sin-
gularities along F if for every open subset V of X such that IF |V = (s1, . . . , sm)
and every relatively compact open subset V ′ ⊂⊂ V , there exists an integer
N ≥ 0 such that
(2.1) |f|V ′\supp(F ) | ≺
∣∣∣∣log( maxi=1,...,m |si|)
For a function f : X \ supp(F ) → C to be with logarithmic singularities
along F it is enough that (2.1) be satisfied for a given open covering of X and
given local generators of IF . This is the content of the next lemma.
Lemma 2.2. Let X be a complex analytic manifold and F a closed analytic
subspace. Fix an open covering {Vα}α of X such that IF |Vα is generated by
sα1 , . . . , s
. Suppose given V ′α ⊂⊂ Vα still forming an open covering of X.
Then a smooth function f : X \ supp(F ) → C has logarithmic singularities
along F if, and only if, for every α there exists an integer N ≥ 0 such that
|f|V ′α\supp(F )
∣∣∣∣log( maxi=1,...,mα
|sαi |)
Proof. Left as an easy exercise.
Lemma 2.3. Let X be a complex analytic manifold and F , G closed analytic
subspaces with supp(F ) = supp(G). A smooth function f : X\supp(F ) → C has
logarithmic singularities along F if, and only if, it has logarithmic singularities
along G.
Proof. Write I, J for the ideal sheaves of F , G, respectively. Let V be an open
neighborhood of x ∈ X such that I|V = (s1, . . . , sl) and J|V = (t1, . . . , tm).
By Hilbert’s Nullstellensatz (see (4.22) in [6]), after possibly shrinking V , there
exists an integer N ≥ 0 such that
sNi =
i tj , t
µijsi.
for some holomorphic functions λ
i , µ
j . Hence, if V
′ ⊆ V is a relatively compact
open subset, there exists a constant C > 0 such that
N ≤ C max
j=1,...,m
|tj |, |tj |
N ≤ C max
i=1,...,l
|si|.
The lemma follows.
The meaning of the lemma is that the notion of function with logarithmic sin-
gularities along F depends only on the support of F .
Proposition 2.4. i. Let X be a complex analytic manifold and F,G closed an-
alytic subspaces with supp(F ) ⊆ supp(G). A smooth function f : X\supp(F ) →
C with logarithmic singularities along F has logarithmic singularities along G.
ii. Let ϕ : X → Y be a morphism of complex analytic manifolds and F ⊆ Y a
closed analytic subspace. If f : Y \ supp(F ) → C has logarithmic singularities
along F , then ϕ∗f = f ◦ ϕ has logarithmic singularities along ϕ−1(F ). If ϕ is
surjective and proper, the converse holds.
Proof. The first item i is straightforward. We shall prove the second part ii.
Let V be an open subset of Y such that IF |V = (s1, . . . , sm). The ideal sheaf
of ϕ−1(F ∩ V ) is (ϕ∗s1, . . . , ϕ
∗sm). Let {Vα}α be an open covering of V by
relatively compact subsets. For every open U ⊂⊂ ϕ−1(V ) define Uα = U ∩
ϕ−1(Vα). Then Uα is relatively compact in ϕ
−1(V ) and ϕ(Uα) ⊆ Vα. The
estimate
|f|Vα\supp(F )| ≺
∣∣∣∣log( maxi=1,...,m |si|)
implies the corresponding inequality for ϕ∗(f) on Uα.
We now prove the converse under the surjectivity and properness assumption.
Let V be as above and V ′ ⊂⊂ V an open subset. Since ϕ is proper, ϕ−1(V ′)
is relatively compact in ϕ−1(V ). The hypothesis of logarithmic singularities of
ϕ∗f asserts the existence of an integer N ≥ 0 such that
|ϕ∗f|ϕ−1(V ′\supp(F ))| ≺
∣∣∣∣log( maxi=1...,m |ϕ
∗si|)
We come up with the conclusion by the surjectivity of ϕ.
2.2 Differential forms with logarithmic singularities
Definition 2.5 (Divisor with normal crossings). Let X be a complex analytic
manifold of dimension n. A reduced analytic subspace D of X is a divisor
with normal crossings if X can be covered by open subsets V with coordinates
z1, . . . , zn such that D ∩ V is defined by z1 · . . . · zm = 0, for some 0 ≤ m ≤ n.
We say that D has simple normal crossings if it can be written as a finite union
of smooth analytic hypersurfaces of X .
Definition 2.6 (Adapted analytic chart [2]). Let X be a complex analytic
manifold of dimension n and D a divisor with normal crossings in X . An
analytic chart (V ; {zi}
i=1) is said to be adapted to D if |zi| < 1/e, i = 1, . . . , n
and D ∩ V is defined by z1 · . . . · zm = 0, for some 0 ≤ m ≤ n. The integer m
will be understood when no confusion can arise.
Notation 2.7. Let X be a complex analytic manifold and D ⊆ X a divisor
with normal crossings. Let (V ; {zi}i) be an analytic chart such that D ∩ V is
defined by z1 · . . . · zm = 0. We define
dζk =
zk log |zk|−1
, if 1 ≤ k ≤ m
dzk, if k > m
and similarly for dζ̄k. Given I, J ordered subsets of {1, . . . , n}, we abbreviate
dζI ∧ dζ̄J =
dζi ∧
dζ̄j .
In the following definitions we write X for a complex analytic manifold of
dimension n, D a divisor with normal crossings in X , U = X \D and ι : U →֒ X
for the natural open immersion. The sheaf of C∞ complex differential forms on
U is denoted by E∗U .
Definition 2.8 (Poincaré growth forms [14]). The sheaf of Poincaré growth
forms on X , with singularities along D, is the subalgebra PD of ι∗E
U gener-
ated, on every open analytic chart (V ; {zi}i) adapted to D, by C
∞(V \D,C) ∩
L∞loc(V,C) and the differential forms dζk, dζ̄k, k = 1, . . . , n. Namely, for every
analytic chart (V ; {zi}i) adapted to D, ι∗E
U (V ) is the C-vector space generated
by differential forms ∑
αI,JdζI ∧ dζ̄J
where αI,J ∈ C
∞(V \D) are locally bounded on V .
Definition 2.9 (Good forms [14]). A good differential form on an open subset
V of X , with singularities along D, is a section ω ∈ Γ(V,PD) such that dω ∈
Γ(V,PD).
Definition 2.10 (log-log growth forms [2]). The sheaf of log-log growth forms
on X , with singularities along D, is the subalgebra LD of ι∗E
U generated, on
every analytic chart (V ; {zi}i) adapted to D, by the functions f ∈ C
∞(V \D,C)
such that on every open V ′ ⊂⊂ V
(2.2) |f(z1, . . . , zn)| ≺
(log log |zk|
for some integer N ≥ 0, together with the differential forms dζk, dζ̄k, k =
1, . . . , n.
Remark 2.11. Observe that the following inequalities hold:
(log log |zk|
−1)N ≤
(log log |zk|
−1)Nm ≤ 2m−1
(log log |zk|
−1)Nm.
Therefore, a smooth function f on V \ D has log-log growth along D if, and
only if, for every open V ′ ⊂⊂ V there is an estimate
(2.3) |f|V ′\D| ≺
(log log |zk|
−1)N .
In concrete computations involving functions with log-log growth, we may use
either formulations (2.2) or (2.3).
Proposition 2.12. Let f : X \D → C be a smooth function. Suppose that df
has log-log growth with singularities along D. Then f has log-log growth with
singularities along D. More precisely, for every analytic chart (V ; {zi}i) adapted
to D and every open V ′ ⊂⊂ V , if df =
j gjdζj +
j hjdζ̄j with
|gj|V ′\D |, |hj|V ′\D | ≺
(log log |zk|
−1)N ,
then we have
|f|V ′\D| ≺
(log log |zi|
−1)N+1
i<j≤m
(log log |zj |
−1)N .
Proof. After localizing to an analytic chart adapted to D we reduce to V =
with V \D = ∆∗r
. Let 0 < ε < 1 and Uε = ∆
be contained in V . Let (w1, . . . , wn) ∈ Uε \D. Fix 1 ≤ i ≤ r. Define the curve
γ : [0, 1] −→ V
t 7−→ (tw1 + (1− t)
e|w1|
, w2, . . . , wn).
Then we have
(f ◦ γ)(1)− (f ◦ γ)(0) =
γ∗(df).
Now we write
γ∗(df|Uε\D) = γ
z1 log |z1|−1
z1 log |z1|−1
where g, h ∈ C∞(Uε \D,C) satisfy
|g|, |h| ≺
(log log |zk|
−1)N on Uε \D.
A straightforward computation yields
df|Uε\D
) ∣∣∣∣
≺f,Uε
1<j≤r
(log log |wj |
−1)N ·
(log log γ∗|z1|
log γ∗|z1|
log γ∗|z1|−1
≤ (log log |w1|
−1)N+1
1<j≤r
(log log |wj |
−1)N .
It follows that
|f(w1, . . . , wn)| ≺f,Uε
∣∣∣∣f
e|w1|
, w2, . . . , wn
)∣∣∣∣
+(log log |w1|
−1)N+1
1<j≤r
(log log |wj |
−1)N .
By induction we find
|f(w1, . . . , wn)| ≺f,Uε sup
|f(z1, . . . zn)|
(log log |wi|
−1)N+1
i<j≤r
(log log |wj |
Observe that the sup is finite because f is smooth on X \ D. The proof is
complete.
Definition 2.13 (pre-log-log forms [2]). The sheaf of pre-log-log forms on X ,
with singularities along D, is the subalgebra E∗X〈〈D〉〉pre of ι∗E
U generated by
log-log growth forms ω in LD such that ∂ ω, ∂̄ ω and ∂ ∂̄ ω are also log-log growth
forms in LD.
A particular case of pre-log-log forms that deserves to be distinguished is
that of P-singular functions.
Definition 2.14 (P-singular function). Let f : X\D → C be a smooth function.
We say that f is a P-singular function, with singularities along D, if, and only
if, df and ddcf have Poincaré growth along D.
Corollary 2.15. Let f : X \D → C be a P-singular function. Then, for every
adapted analytic chart (V ; {zi}i) and every open V
′ ⊂⊂ V , we have
|fV ′\D| ≺
(log log |zk|
Consequently, f is a pre-log-log function.
Proof. This is a straightforward application of Proposition 2.12.
For later computations it will be worth having at our disposal the following
basic properties of log-log growth forms.
Proposition 2.16. i. Any log-log growth form is locally integrable. Moreover,
log-log growth functions and log-log growth 1-forms are locally L2.
ii. (Stokes’ theorem for pre-log-log forms.) If ω ∈ Γ(X, E∗X〈〈D〉〉pre) and [ω]
denotes its associated current, then
d[ω] = [dω]
and similarly for ∂, ∂̄ and ∂ ∂̄.
iii. If f : X → Y is a morphism of complex analytic manifolds and DX ⊆ X,
DY ⊆ Y normal crossing divisors with f
−1(DY ) ⊆ DX , then f
∗PDY ⊆ PDX
and f∗LDY ⊆ LDX . Therefore f
∗E∗Y 〈〈DY 〉〉pre ⊆ E
X〈〈DX〉〉pre. In particular,
this is true for f being the natural inclusion of a complex analytic submanifold
X ⊆ Y intersecting DY transversally and DX = DY ∩X.
Proof. The proposition quotes [2], Proposition 7.5 and Proposition 7.6. How-
ever, for later use, we may comment on the proof of i. After changing to polar
coordinates, it is enough to observe that for every 0 < δ < 1 we have an estimate
∫ ε/e
(log log t−1)N
t(log t)2
∫ ε/e
t(log t−1)1+δ
< +∞.
We finally give an example showing that the notion of pre-log-log function
depends on the compactification X of U .
Example 2.17. Let X = P2
with projective coordinates (w0 : w1 : w2). As
divisor with normal crossings set D = (w0 = 0) ∪ (w1 = 0). Define the smooth
function on U = X \D
g(w0 : w1 : w2) =
|w0|2 + |w1|2
Denote by X̃ the blowing-up of X at (0 : 0 : 1). X̃ admits the following
description:
((w0 : w1 : w2), (z0 : z1)) ∈ X × P
| w0z1 = w1z0
The map realizing the blowing-up is the projection onto the first factor π : X̃ →
X . In particular, since (0 : 0 : 1) ∈ D, we have an isomorphism π−1(U)
→ U ,
and π−1(D) is a divisor with normal crossings. Observe that the pullback of g
by π is
f((w0 : w1 : w2), (z0 : z1)) =
|z0|2 + |z1|2
The function f extends to a smooth function on the whole X̃, in particular pre-
log-log along π−1(D). However, we claim that g is not a pre-log-log function
alongD. To see this we compute ∂ g. We may localize at the affine neighborhood
w2 6= 0 of (0 : 0 : 1) and write u = w0/w2, v = w1/w2. In coordinates u, v we
∂ g =
|u|2|v|2
(|u|2 + |v|2)2
But the function |u|2|v|2 log |u|−1/(|u|2 + |v|2)2 does not have log-log growth
along D, as we see after restriction to |u| = |v|. This proves the claim.
2.3 Variants: log-log forms
Following [3] we introduce a variant of the sheaf of pre-log-log forms, by imposing
bounds on all the derivatives of the component functions of the differential forms.
There are also corresponding variants for Poincaré growth forms and good forms,
for which we refer to loc. cit.
We fix a complex analytic manifold X and D ⊆ X a divisor with normal
crossings. Write U = X \D and ι : U →֒ X for the natural open immersion.
Definition 2.18 (log-log functions of infinite order [3]). A smooth function
f : X \D → C is said to be a log-log function of infinite order, with singularities
along D, if for every analytic chart (V ; {zi}i) adapted toD, every open V
′ ⊂⊂ V
and multi-indices α = (α1, . . . , αn), β = (β1, . . . , βn), there is a bound on V
∣∣∣∣∣
f(z1, . . . , zn)
∣∣∣∣∣ ≺
k=1(log log |zk|
where zα
= zα11 . . . z
m (similarly for z
β≤m) and N depends on V ′, α, β.
Definition 2.19 (log-log growth forms of infinite order [3]). The sheaf of log-log
growth forms of infinite order onX , with singularities along D, is the subalgebra
of ι∗E
U generated, on every analytic chart V adapted to D, by log-log growth
functions of infinite order and the differential forms dζk, dζ̄k, k = 1, . . . , n (see
Notation 2.7).
Remark 2.20. Let ω be a log-log growth form of infinite order alongD, defined
on some analytic open subset of X . Then the complex conjugate ω̄ is also a
log-log growth form of infinite order along D.
Definition 2.21 (log-log forms [3]). The sheaf of log-log forms on X , with
singularities along D, is the subalgebra E∗X〈〈D〉〉 of ι∗E
U generated by log-log
growth forms ω of infinite order, such that ∂ ω, ∂̄ ω and ∂ ∂̄ ω are also log-log
growth forms of infinite order.
Remark 2.22. There is an obvious inclusion E∗X〈〈D〉〉 ⊆ E
X〈〈D〉〉pre.
Log-log growth forms of infinite order enjoy of analogous properties to the
log-log growth forms introduced before. We refer to [3] for details. An advan-
tage of the sheaf of log-log forms over the sheaf of pre-log-log forms is that a
Poincaré’s type lemma holds for the former. The next essential property follows.
Theorem 2.23 ([3]). The natural inclusion
Ω∗X −→ E
X〈〈D〉〉
is a filtered quasi-isomorphism with respect to the Hodge filtration.
Proposition 2.24. Let f : X \ D → C be a smooth function. Then f is a
log-log form, with singularities along D, if, and only if, df is locally L2 on X
and ddcf is a log-log growth form of infinite order along D.
Proof. The direct implication is an easy exercise. Let us see the converse. By
hypothesis, ∂̄ ∂ f is a log-log form, with singularities along D. Let x ∈ X . By
Theorem 2.23, there exists an open neighborhood V of x and a log-log form ω
on V such that
∂̄ ∂ f = ∂̄ ω.
Therefore, we can write
∂ f = ω + θ.
for some holomorphic form θ on V \D. Observe that θ is locally L2, because
∂ f is locally L2 by hypothesis and ω is a log-log 1-form (see Proposition 2.16).
By Lemma 2.25 below, θ must be holomorphic on V . This proves that ∂ f has
log-log growth of infinite order along D. The same reasoning applied to ∂̄ ∂ f̄
(the complex conjugate of ∂ ∂̄ f) proves that ∂̄ f has log-log growth of infinite
order along D. Therefore df has log-log growth of infinite order.
Again by Theorem 2.23, after possibly shrinking V , there exists a log-log func-
tion of infinite order g and a holomorphic function h on V \D such that f = g+h.
We claim that h is locally L2. Since this is true for g, we are reduced to prove
it for f . But we have already shown that df has log-log growth of infinite order
along D, so that Proposition 2.12 implies that f has log-log growth along D. In
particular, f is locally L2 (Proposition 2.16). By Lemma 2.25, g is holomorphic
on V and hence f has log-log growth of infinite order along D. This finishes the
proof.
Lemma 2.25. Let X be a complex analytic manifold and D ⊆ X a divisor with
normal crossings. Let θ be a holomorphic function on X \D. If θ is locally L2
on X, then θ extends to a holomorphic function on X.
Proof. The lemma is well-known, but we include the proof for lack of reference.
It is enough to treat the case whenX = ∆nε ⊆ C
n andD is defined by z1·. . .·zr =
0, so that X \D = ∆∗rε ×∆
ε. We write δ = (δ1, . . . , δr) ∈ R
>0. Since θ is locally
(2.4) ‖θ‖2ε := lim
Iδ < +∞
where
∆ε/2\∆δk )×∆
∣∣∣∣∣
dzk ∧ dz̄k
∣∣∣∣∣ .
The Laurent series development
(2.5) θ(z1, . . . , zn) =
is absolutely and uniformly convergent on any (
k=1 ∆ε/2\∆δk)×∆
. There-
fore, the integral Iδ can be computed term by term:
ν,µ∈Zn
∆ε/2\∆δk
k |dzk ∧ dz̄k|
zνkk z
k |dzk ∧ dz̄k|
Recall that given integers a, b we have
eaiθebiθdθ = 2πδa,b,
so that
|aν |
∆ε/2\∆δk
2νk |dzk ∧ dz̄k|
2νk |dzk ∧ dz̄k|
(2.6)
We reason by contradiction and assume that θ does not extend to a holomorphic
function on ∆n
. We can suppose that in (2.5) there appears a term aνz
ν 6= 0,
with ν = (ν1, . . . , νn) ∈ Z
<0 × Z
≥0 , 1 ≤ l ≤ r. From (2.6) and by direct
computation we find
(2.7) Iδ ≥ (4π)
n|aν |
k=l+1
(ε/2)2νk+2
2νk + 2
δ2νk+2k
2νk + 2
(ε/2)2νk+2
2νk + 2
where
Jδk =
if νk = −1,
(ε/2)2νk+2
2νk+2
2νk+2
2νk+2
if νk < −1.
Since aν 6= 0 and Jδk → +∞ as δ → 0, we see from (2.7) that Iδ → +∞ as
δ → 0. This contradicts (2.4). The proof is complete.
3 Logarithmically singular hermitian vector bun-
Let X be a complex analytic manifold and D ⊆ X a divisor with normal cross-
ings. Write U = X \D and ι : U →֒ X for the natural open immersion. In this
section we study vector bundles endowed with hermitian metrics with singular-
ities of logarithmic type along D. The reader is referred to §2 for the several
definitions and properties of differential forms with singularities of logarithmic
type along D.
Definition 3.1 ([3] and [14]). Let E be a vector bundle of rank r on X . A
smooth hermitian metric h on E|U is said to have logarithmic singularities along
D if, for every trivializing open subset V and holomorphic frame e1, . . . , er of
E|V , putting hij = h(ei, ej) and H = (hij) on V \D, the following condition is
fulfilled:
(L(E, h)) the functions |hij |, detH
−1 have logarithmic singularities along
D ∩ V (see Definition 2.1).
We say that h is (pre-)log-log (resp. good) along D if moreover, for every such
data V and e1, . . . , er, the following property holds:
(G(E, h)) the entries of the matrix (∂ H)H−1 are (pre-)log-log (resp.
good) forms on V , with singularities along D (see Definition 2.9).
We will usually write E = (E, h) when no confusion on the metric can arise.
Sometimes we use some variants of the definition, and we say for instance “E
has logarithmic singularities along D” or “E is (pre-)log-log (resp. good) along
In the case of line bundles, the notions of (pre-)log-log and good hermitian
metrics can be characterized by slightly simpler properties.
Proposition 3.2. Let L be a hermitian line bundle on X and h a smooth
hermitian metric on L|U . Write ‖ · ‖ for the norm associated to h.
i. The metric h is (pre-)log-log (resp. good) with singularities along D if, and
only if, for every trivializing open subset V and holomorphic frame e of L|V , the
function log h(e, e) is (pre-)log-log (resp. P-singular) on V , with singularities
along D.
ii. The metric h is log-log with singularities along D if, and only if, for every
trivializing open subset V and holomorphic frame e of L|V , the form ∂ log h(e, e)
is locally L2 on V and ∂̄ ∂ log h(e, e) has log-log growth of infinite order, with
singularities along D.
Proof. This follows from the definitions and Proposition 2.12, Proposition 2.15
and Proposition 2.24 applied to the smooth real function log h(e, e) on V \D.
An essential extension property of hermitian vector bundles with logarithmic
singularities is the following observation due to Mumford.
Proposition 3.3. Let (E◦, h) be a smooth hermitian vector bundle on U . Then
there exists at most one extension of (E◦, h) to a hermitian vector bundle (E, h)
on X, with logarithmic singularities along D. More precisely, if (E, h) is such
an extension, then for every open subset V in X
Γ(V,E) =
s ∈ Γ(V, ι∗E
◦) | h(s, s) has log. sing. along D ∩ V
Proof. This is Proposition 1.3 in [14].
Hermitian vector bundles with logarithmic singularities along D admit the
following characterization.
Proposition 3.4. Let E be a vector bundle on X and hE a smooth hermitian
metric on E|U . Denote by hE∨ the dual metric. Then hE has logarithmic
singularities along D if, and only if, the following condition is satisfied with
F = E and F = E
(L̃(F, hF )) for every open subset V and any holomorphic section s of F|V ,
the function hF (s, s) has logarithmic singularities along D ∩ V .
Proof. For the direct implication, first take a holomorphic section s of E over
an open subset V . We may assume that s does not vanish on V . After pos-
sibly shrinking V , we can complete s to a holomorphic frame e1 = s, . . . , er
of E|V . By the definition of metric with logarithmic singularities, the function
hE(s, s) = hE(e1, e1) has logarithmic singularities along D.
Secondly, let V be a trivializing open subset of E and e1, . . . , er a holomor-
phic frame of E|V . Write H = (hij) for the matrix of hE in base {ei}i and
H−1 = (gij) for the inverse matrix. From the very construction of H
−1 and
the logarithmic singularities of the functions hij and detH
−1, it is immedi-
ate to check that the functions gij have logarithmic singularities along D. If
B is the matrix of hE∨ in any holomorphic frame of E
, then there exists
A ∈ GLr(Γ(V,OX)) such that
B = At ·H−1 · A.
Since the entries of A are holomorphic, the entries of B inherit from H−1 the
logarithmic singularities along D ∩ V .
Let now s be a holomorphic section of E∨ over an open subset V . Replacing
V by a smaller open subset, we can complete s to a holomorphic frame of E∨
v1 = s, . . . , vr. As we have just proven the functions hE∨(vi, vj) have logarithmic
singularities along D, in particular so does hE∨(s, s) = h(v1, v1).
Now for the converse. Let V be a trivializing open subset, adapted to D. Let
e1, . . . , er be a frame for E|V . Write H = (hij) for the matrix of the hermitian
metric hE in base {ei}i. By hypothesis, for every open subset V
′ ⊂⊂ V , there
exists an integer N ≥ 0 such that on V ′
hE(ei, ei) ≺ (log |z1 · . . . · zm|
−1)N .
Applying Schwarz’s inequality we get
|hij |
2 ≺ (log |z1 · . . . · zm|
−1)2N .
The same argument provides similar bounds for the entries of the matrix of hE∨
in the dual basis, namely H−1. Since the determinant of H−1 is a polynomial
in the entries of this matrix, we derive a bound
detH−1 ≺ (log |z1 · . . . · zm|
for some integer M . This concludes the proof.
As an immediate consequence of the proposition we establish the following
corollary.
Corollary 3.5. Let E = (E, hE) be a hermitian vector bundle with logarithmic
singularities along D. For every exact sequence of vector bundles
0 −→ F −→ E −→ Q −→ 0,
the induced hermitian vector bundles F = (F, hF ) (restricted metric) and Q =
(Q, hQ) (quotient metric) have logarithmic singularities along D.
Proof. It is enough to prove that for every exact sequence as in the statement,
conditions (L̃(F )) and (L̃(Q)) hold. Indeed, since E
has logarithmic singular-
ities along D, conditions (L̃(F
)) and (L̃(Q
)) automatically follow by duality.
Then we conclude applying Proposition 3.4. The validity of L̃(F ) is clear. For
L̃(Q), we just observe that if s is a holomorphic section of Q|V and s̃ is a
holomorphic section of E|V lifting s, then
hQ(s, s) ≤ hE(s̃, s̃).
Thus we see that L̃(E) implies L̃(Q).
We next state the main formal properties of logarithmically singular (resp.
(pre-)log-log, resp. good) hermitian vector bundles.
Proposition 3.6. Let E, F be two vector bundles on X and hE and hF smooth
hermitian metrics on E|U and F|U , respectively. If hE and hF have logarithmic
(resp. (pre-)log-log, resp. good) singularities along D, then E
, E ⊗ F , SkE
and ∧kE have logarithmic (resp. (pre-)log-log, resp. good) singularities along
Proof. Left as an elementary exercise.
Proposition 3.7. Let X, Y be complex analytic manifolds and DX ⊆ X,
DY ⊆ Y normal crossing divisors. Let f : X → Y be a morphism of com-
plex analytic manifolds. Let E = (E, h) be a hermitian vector bundle on Y
whose metric is defined and smooth on Y \DY .
i. If f−1(DY ) ⊆ DX and h has logarithmic (resp. (pre-)log-log, resp. good)
singularities along DY , then the metric f
∗(h) on f∗(E) has logarithmic (resp.
(pre-)log-log, resp. good) singularities along DX .
ii. Suppose that f is surjective, proper and f−1(DY ) = DX . Then h has
logarithmic singularities along DY if, and only if, f
∗(E) has logarithmic singu-
larities along DX .
Proof. The first item i follows from Proposition 2.4 i and Proposition 2.16 iii.
The second item ii is automatically deduced from Proposition 2.4 ii.
Corollary 3.8. Let (E, h) be a hermitian vector bundle on X, with singularities
along D. Let OE(1) be the dual of the trivial vector bundle of P(E), the projective
space of lines in E∨. Denote by π : P(E) → X the natural projection. Then the
metric on OE(1) induced by π
∗(h) has logarithmic singularities along π−1(D).
Proof. By definition, the line bundle OE(1) is a quotient of π
∗(E). The hermi-
tian metric on OE(1) is the quotient metric from π
∗(E). By Proposition 3.7,
π∗(E) has logarithmic singularities along π−1(D). Then, by Corollary 3.5, so
does the induced metric on OE(1).
The end of this section is devoted to some counter-examples.
Example 3.9. i. Counter-example to Corollary 3.5 and Corollary 3.8 for pre-
log-log hermitian vector bundles. Let X = A1
be the complex line, with analytic
coordinate z. Let D be the divisor with normal crossings z = 0. As vector
bundle we take E = O⊕2X . We consider a hermitian metric h on E such that, in
the standard basis e1, e2 and near the origin, its matrix H looks like
log |z|−1 0
It is easily seen that (E, h) is pre-log-log along D (actually good). However, the
induced metric on the line bundle OE(−1) ⊆ π
∗(E∨) on P(E) = P1
is not
pre-log-log. Indeed, identifiy A1
as an open subset of P1
via t 7→ (1 : t). If e∨1 ,
e∨2 is the dual basis, define the section s of OE(−1)|A1
s = e∨1 + te
Then h∨(s, s) = (log |z|−1)−1 + |t|2 and
log h∨(s, s) =
(log |z|−1)−1 + |t|2
If we restrict ∂ log h∨(s, s)/ ∂ t to the set C : t = (log |z|−1)−1/2, we find
log h∨(s, s)
(log |z|−1)1/2
which does not have log-log growth near z = 0.
ii. The notion of hermitian vector bundle with logarithmic singularities depends
on the compactification. Let Y be a smooth complex projective surface. Let p
be a closed point in Y and π : X → Y the blowing-up of Y at p. Let D be a
divisor with normal crossings in Y with p ∈ D. Then π−1(D) is a divisor with
normal crossings. Define U = X \ π−1(D) and V = Y \D. Then π induces an
isomorphism between U and V . Let h be a smooth hermitian metric on ωY |V ,
and endow ωX |U with the induced metric π
∗(h). Assume that (ωX , π
∗(h)) has
logarithmic singularities along π−1(D). Then we claim that h does not define
a metric on ωY with logarithmic singularities along D. Indeed, suppose that
ωY = (ωY , h) had logarithmic singularities along D. Then, by Proposition 3.7,
π∗(ωY ) would have logarithmic singularities along π
−1(D). Observe that
π∗(ωY )|U = ωX |U .
By Proposition 3.3 we would derive the equality π∗(ωY ) = ωX . However we
know that
π∗(ωY ) = ωX ⊗O(−E)
where E is the exceptional divisor π−1(p). Since the self-intersection (E2) = −1,
O(−E) is not trivial. We thus arrive to a contradiction and the claim is proved.
We remark that we can produce such examples just endowing ωX with a smooth
hermitian metric and then restricting it to U .
4 Global bounds for real log-log growth (1,1)-
forms
4.1 Statement of the theorem and consequences
Let X be a complex analytic manifold and D ⊆ X a divisor with simple normal
crossings. DecomposeD into smooth irreducible components,D = D1∪. . .∪Dm.
For every Dk we fix a global section sk of O(Dk) with divisor div sk = Dk. We
endow O(Dk) with a smooth hermitian metric ‖ · ‖k such that ‖sk‖
k ≤ e
Therefore 1 ≤ log log ‖sk‖
−2 ≤ +∞ on X .
Notation 4.1. For every integer N ≥ 0 we define the real positive smooth
function on X \D
(log log ‖sk‖
The purpose of this section is the proof of the following global bounds for
real log-log growth (1,1)-forms on a compact complex analytic manifold.
Theorem 4.2. Suppose that X is compact and let ω be a smooth positive (1,1)-
form on X. Let η be a real log-log growth (1,1)-form on X, with singularities
along D. Then there exist constants A,B > 0 and an integer N ≥ 0 such that
on X \D
(4.1) η +BΘN (dd
c(−Θ1) +Aω) ≥ 0.
If moreover η has Poincaré growth along D, then N can be chosen to be 0.
The proof of the theorem is postponed until §4.3. Now we may discuss a
result appearing as a particular instance of Theorem 4.2.
Theorem 4.3. Suppose that X is compact and let ω be a smooth positive (1,1)-
form on X. Let f : X \D → R be a pre-log-log function, with singularities along
D. Then there exist positive pre-log-log functions, with singularities along D,
ϕ, ψ : X \D −→ R≥0
and constants A,B ≥ 0, N ∈ Z≥0 with the properties
i. f is the difference of ϕ and ψ: f = ϕ− ψ;
ii. the following inequalities hold on X \D:
ωϕ :=dd
c(−ϕ) +BΘN (dd
c(−Θ1) +Aω) ≥ 0,
ωψ :=dd
c(−ψ) +BΘN (dd
c(−Θ1) +Aω) ≥ 0.
If f is P-singular, then N can be chosen to be 0;
iii. if f is P-singular, one can take ϕ, ψ to be P-singular with
ddc(−ϕ) +Aω ≥ 0,
ddc(−ψ) +Aω ≥ 0
on X \D.
Proof. Since X is compact, from the log-log growth of f it is easily seen that
for some constant C > 0 and integer M ≥ 0,
f + CΘM ≥ 0 on X \D.
If f is P-singular, then by Corollary 2.15 we can take (as we do) M ≤ 1. We
define ϕ̃ = f + CΘM and ψ̃ = CΘM . These are positive pre-log-log functions,
with singularities along D (see Lemma 4.7 below). If f is P-singular (hence
M ≤ 1), then ϕ̃, ψ̃ are P-singular (again by Lemma 4.7). By Theorem 4.2 there
exist constants A,B ≥ 0 and N ∈ Z≥0 such that
eϕ := dd
c(−ϕ̃) +BΘN (dd
c(−Θ1) +Aω) ≥ 0
:= ddc(−ψ̃) +BΘN (dd
c(−Θ1) +Aω) ≥ 0
hold on X \D. Hence ϕ = ϕ̃ and ψ = ψ̃ satisfy the requirements of i and ii. If
f is P-singular, then ddc(−ϕ̃), ddc(−ψ̃) have Poincaré growth along D and we
may take N = 0. In this case we have
eϕ =dd
c(−ϕ̃) +mBddc(−Θ1) +mABω
=ddc(−(ϕ̃+mBΘ1)) +mABω
and similarly
= ddc(−(ψ̃ +mBΘ1)) +mABω
In view of these equalities, ϕ = ϕ̃ + mBΘ1 and ψ = ψ̃ + mBΘ1 fulfill the
requirement of iii.
We include the next corollary for its own interest, but we will not need it in
the sequel.
Corollary 4.4. Suppose that X is compact and Kähler. Let ω be a Kähler
form on X. Let f : X \ D → R be a P-singular function and f = ϕ − ψ a
decomposition as in Theorem 4.3 iii. Then the functions
−ϕ,−ψ : X \D −→ R≤0
uniquely extend to quasiplurisubharmonic functions(3) on X.
Proof. First of all, since −ϕ and −ψ are pre-log-log alongD, by Proposition 2.16
we have the equality of currents ddc[−ϕ] = [ddc(−ϕ)] and ddc[−ψ] = [ddc(−ψ)]
on X . The inequalities
ddc[−ϕ] +Aω ≥ 0,
ddc[−ψ] +Aω ≥ 0
(4.2)
then hold on X in the sense of currents. Let U ⊂ X be an open subset diffeo-
morphic to a complex euclidian ball. Because ω is d-closed (Kähler assumption),
by Poincaré’s lemma ω|U is d-exact. Since U itself is Kähler, ω|U is in fact dd
exact. Write ω|U = dd
ch for some smooth function h on U . Then the currents
ddc[−ϕ|U +Ah] and dd
c[−ψ|U +Ah] are positive on U , by (4.2). Since −ϕ and
−ψ are bounded above and D is polar, ϕ̃ := −ϕ|U + Ah, ψ̃ := −ψ|U + Ah
uniquely extend to plurisubharmonic functions on U (see [6], Theorem 5.24).
Since h is smooth, these extensions determine extensions of −ϕ|U and −ψ|U
to quasiplurisubharmonic functions on U , clearly unique. The corollary fol-
lows.
4.2 Construction of pre-log-log functions
4.2.1 Preliminaries
Lemma 4.5. Let M be a complex analytic manifold and α, β, C∞ differential
forms of type (1,0) on M . For every function K > 0 on M , the following
inequality holds:
2Re(iα ∧ β) ≥ −
α ∧ α− iKβ ∧ β.
(3)A quasiplurisubharmonic function on a complex analytic manifold M is an upper semi-
continuous function h : M → [−∞,+∞[ which is locally the sum of a smooth function and a
plurisubharmonic function.
Proof. On one hand, the (1,1)-form
µ = i(α/K1/2 +K1/2β) ∧ (α/K1/2 +K1/2β)
is semi-positive. On the other hand, there is an equality
α ∧ α+ 2Re(iα ∧ β) + iKβ ∧ β.
The lemma follows.
Lemma 4.6. Let L be a line bundle on X. Suppose that s ∈ Γ(X,L) is a global
section such that div s is a divisor with normal crossings. Let ‖ · ‖ be a smooth
hermitian metric on L with ‖s‖2 ≤ e−e. For every integer N ≥ 0, the smooth
function
θN = (log log ‖s‖
−2)N : X \ div s −→ R≥1
is pre-log-log, with singularities along div s. On X \div s the following identities
hold:
∂ θN = NθN−1
∂ log ‖s‖−2
log ‖s‖−2
ddc(−θN ) =
θ1 −N + 1
∂ θN ∧ ∂̄ θN −NθN−1
c1(L)
log ‖s‖−2
provided N ≥ 1. If N = 1, θ1 is P-singular along div s.
Proof. The lemma is trivial for N = 0. We suppose N ≥ 1. First of all we
remark that θN has log-log growth along div s. Next we compute ∂ θN and
ddc(−θN ) on X \ div s. We find
(4.3) ∂ θN = NθN−1
∂ log ‖s‖−2
log ‖s‖−2
ddc(−θN ) = −
∂ ∂̄ θN = −
N ∂ θN−1 ∧
∂̄ log ‖s‖−2
log ‖s‖−2
NθN−1
∂ log ‖s‖−2 ∧ ∂̄ log ‖s‖−2
(log ‖s‖−2)2
NθN−1
∂ ∂̄ log ‖s‖−2
log ‖s‖−2
To simplify the last equality, rearrange terms, use (4.3) and the trivial fact
θN+M = θNθM , and recall that on X \ div s we have c1(L) = dd
c log ‖s‖−2. We
(4.4) ddc(−θN ) =
θ1 −N + 1
∂ θN ∧ ∂̄ θN −NθN−1
c1(L)
log ‖s‖−2
Observe that the quotient (θ1−N+1)/NθN is bounded, so it has log-log growth
along div s. Also the function 1/ log ‖s‖−2 is bounded and c1(L) is smooth, so
that c1(L)/ log ‖s‖
−2 has log-log growth along div s. Hence from (4.3) and (4.4)
we see that it is enough to prove that ∂ θN has log-log growth along div s or,
still, that ∂ log ‖s‖−2/ log ‖s‖−2 has log-log growth along div s.
Let V be an open analytic chart adapted to div s such that L|V can be trivialized
and s = z1 . . . zme, where e is a holomorphic frame of L|V . We can write
∂ log ‖s‖−2
log ‖s‖−2
∂ log ‖e‖−2
log ‖s‖−2
log |zk|
log ‖s‖−2
zk log |zk|−1
The differential form ∂ log ‖e‖−2 is smooth on V and (log ‖s‖−2)−1 is bounded
on V , so that the first term is bounded on any small enough open V ′ ⊂⊂ V . As
for the sum, we observe that log |zk|/ log ‖s‖
−2 is bounded on any small enough
open V ′ ⊂⊂ V , because log ‖s‖−2 = log ‖e‖−2 +
j=1 log |zj|
−2 and log ‖e‖−2
is smooth. Hence ∂ log ‖s‖−2/ log ‖s‖−2 has log-log growth along div s. The
proof is complete.
Lemma 4.7. Under the hypothesis of §4.1 and with the notations therein, the
functions ΘN are pre-log-log, with singularities along D. If N = 1, then Θ1 is
P-singular.
Proof. Write θ
N = (log log ‖sk‖
N , ΘN =
k=1 θ
N and apply Lemma 4.6.
4.2.2 Local results
Let L be a line bundle on X admitting a global section s ∈ Γ(X,L) whose
associated divisor div s is smooth and irreducible. Let ‖·‖ be a smooth hermitian
metric on L with ‖s‖2 ≤ e−e on X . Define, as before, the function
θ1 = log log ‖s‖
−2 : X \ div s −→ R≥1.
By Lemma 4.6 we can write
(4.5) ddc(−θ1) =
∂ log ‖s‖−2 ∧ ∂̄ log ‖s‖−2
(log ‖s‖−2)2
c1(L)
log ‖s‖−2
Let (V ; z1, . . . , zn) be an analytic chart adapted to div s with V ∩div s 6= ∅. We
suppose that L|V can be trivialized and we denote u for a holomorphic frame
such that s = z1u. In local coordinates equality (4.5) becomes dd
c(−θ1) =
α+ β + γ, where
∂ log |z1|
−2 ∧ ∂̄ log |z1|
(log ‖s‖−2)2
dz1 ∧ dz̄1
|z1|2(log |z1|−2)2
β :=− 2Re
∂ log |z1|
−2 ∧ ∂̄ log ‖u‖−2
(log ‖s‖−2)2
=2aRe
dz1 ∧ ∂̄ log ‖u‖
z1(log |z1|−2)2
∂ log ‖u‖2 ∧ ∂̄ log ‖u‖2
(log ‖s‖−2)2
c1(L)
log ‖s‖−2
and a = (log |z1|/ log ‖s‖)
2. Decompose ∂̄ log ‖u‖2 =
j=1 qjdz̄j , where the
functions qj are smooth on V . Then β can be expanded as a sum β =
j=1 βj ,
βj := 2aRe
dz1 ∧ dz̄j
z1(log |z1|−2)2
Proposition 4.8. Let L be a line bundle on X admitting a global section s ∈
Γ(X,L) such that div s is smooth and irreducible. Let ‖·‖ be a smooth hermitian
metric on L with ‖s‖2 ≤ e−e and define θ1 = log log ‖s‖
−2. Let ω be a positive
(1,1)-form on X. For every analytic chart (V ; z1, . . . , zn) adapted to div s and
any open V ′ ⊂⊂ V intersecting div s, there exists A > 0 such that on V ′ \ div s
ddc(−θ1) +Aω ≥
dz1 ∧ dz̄1
|z1|2(log |z1|−2)2
Proof. Without loss of generality, suppose supV | log ‖u‖| and supV |qj | finite for
all j (otherwise, replace V by a relatively compact open subset containing V ′).
We divide the proof into three steps.
Step 1. Observe that because log ‖s‖ = log |z1|+log ‖u‖ and log ‖u‖ is bounded
on V , the function a uniformly tends to 1 as z1 tends to 0. Therefore there
exists an open V ′′ ⊂ V ′ such that 1/2 ≤ a ≤ 2 on V ′′ \ div s. For later need,
we take V ′′ so that V ′ \ V ′′ does not intersect div s. On V ′′ \div s the following
inequality holds:
(4.6) α ≥
dz1 ∧ dz̄1
|z1|2(log |z1|−2)2
Step 2. Define C = maxj supV |qj |. Shrinking V
′′ if necessary, we assume that
|z1| ≤ 1/16n(C + 1) on V
′′. Since a ≤ 2 on V ′′, we have
β1 = 2aRe
dz1 ∧ dz̄1
|z1|2(log |z1|−2)2
≥ −4C|z1|
dz1 ∧ dz̄1
|z1|2(log |z1|−2)2
dz1 ∧ dz̄1
|z1|2(log |z1|−2)2
(4.7)
on V ′′ \ div s. Fix a constant K ≥ 8nC. Applying Lemma 4.5, for every j > 1
we find
(4.8) βj ≥ −2
dz1 ∧ dz̄1
|z1|2(log |z1|−2)2
− 2CK
dzj ∧ dz̄j
(log |z1|−2)2
also on V ′′ \ div s. From inequalities (4.7) and (4.8) we derive that
(4.9) β ≥ −
dz1 ∧ dz̄1
|z1|2(log |z1|−2)2
− 2CK
dzj ∧ dz̄j
(log |z1|−2)2
holds on V ′′ \ div s.
Step 3. To conclude, add up (4.6) and (4.9) to find
(4.10) ddc(−θ1) ≥
dz1 ∧ dz̄1
|z1|2(log |z1|−2)2
− 2CK
dzj ∧ dz̄j
(log |z1|−2)2
on V ′′ \ div s. The last two terms in (4.10) define a smooth differential form on
V \div s, bounded on V ′′ \div s. Hence, there exists a constant A > 0 such that
(4.11) ddc(−θ1) +Aω ≥
dz1 ∧ dz̄1
|z1|2(log |z1|−2)2
holds on V ′′\div s. Because θ1 is smooth away from div s and V ′ \ V ′′ is compact
and disjoint from div s, after possibly increasing A inequality (4.11) holds on
V ′ \ div s as well, as was to be shown.
4.2.3 Global results
We keep the hypothesis and notations of §4.1.
Proposition 4.9. Suppose that X is compact and let ω be a smooth positive
(1,1)-form on X. Let L be a line bundle on X admitting a global section s ∈
Γ(X,L) such that div s is a divisor with normal crossings. Let ‖ · ‖ be a smooth
hermitian metric on L such that ‖s‖2 ≤ e−e. Define θN = (log log ‖s‖
for any N ∈ Z≥0. Then there exists a constant A = A(N) > 0 such that
ddc(−θN ) +Aω ≥ 0 on X \ div s.
Proof. The case N = 0 is trivial. We treat the case N ≥ 1. By Lemma 4.6 the
following identity holds:
ddc(−θN ) =
θ1 −N + 1
∂ θN ∧ ∂̄ θN −NθN−1
c1(L)
log ‖s‖−2
First of all, the function θN−1/ log ‖s‖
−2 is bounded and the differential form
c1(L) is smooth on X . Since X is compact, there exists a constant A > 0 such
(4.12) −NθN−1
c1(L)
log ‖s‖−2
ω ≥ 0 on X \ div s
Still by the compactness hypothesis and by the very definition of θ1, there
exists an open neighborhood V of div s such that θ1|V ≥ N − 1. Moreover
i ∂ θN ∧ ∂̄ θN ≥ 0, so that
(4.13)
θ1 −N + 1
∂ θN ∧ ∂̄ θN ≥ 0 on V \ div s.
Finally, since θM ≥ 1 is smoooth on X \div s for every integerM ≥ 0 and X \V
is compact, after possibly increasing A we have
(4.14)
θ1 −N + 1
∂ θN ∧ ∂̄ θN +
ω ≥ 0 on X \ V.
Equations (4.12), (4.13) and (4.14) together give the desired positivity property.
Corollary 4.10. Suppose that X is compact and let ω be a smooth positive
(1,1)-form on X. For every integer N ≥ 0 there exists A = A(N) > 0 such that
ddc(−ΘN ) +Aω ≥ 0 holds on X \D.
Proof. This follows from Proposition 4.9 by writing ΘN =
k=1 θ
N , with the
notation θ
N = (log log ‖sk‖
The last proposition of this subsection provides a first approach to Theorem
4.2. We may thus place under the hypothesis and notations therein.
Proposition 4.11. Suppose that X is compact and let ω be a smooth positive
(1,1)-form on X. For every finite covering {(Vα; z
1 , . . . , z
n )}α of X by analytic
charts adapted to D, together with relatively compact open subsets V ′α ⊂⊂ Vα
still forming a covering, there exists a constant A > 0 such that
ddc(−Θ1) +Aω ≥
dzαk ∧ dz̄
|zαk |
2(log |zαk |
holds on V ′α \D for every α.
Proof. This follows immediately from Proposition 4.8.
4.3 Proof of Theorem 4.2
We now complete the proof of Theorem 4.2.
Since X is compact, we can choose a finite open covering Vα of X as in
Proposition 4.11. It is enough to prove the existence of constants A,B,N ful-
filling (4.1) on a single V ′α ⊂⊂ Vα. We write {zi}i for the coordinates on Vα,
instead of {zαi }i. Following Notation 2.7, we develop
dζj ∧ dζ̄j +
2Re(hjk
dζj ∧ dζ̄k),
where the functions hjk, j ≤ k, have log-log growth along D ∩ Vα. There exist
a constant C > 0 and an integer N ≥ 0 such that
|hjk|V ′α\D| ≤ CΘN .
Therefore, by Lemma 4.5, on V ′α \D there is a lower bound
η ≥ −CΘN
dζj ∧ dζ̄j − CΘN
dζj ∧ dζ̄j +
dζk ∧ dζ̄k
From this inequality and Proposition 4.11 we see that there exist B ≥ 1 and a
smooth differential form σ on Vα such that
η +BΘN (dd
c(−Θ1) +Aω) ≥ ΘNσ
holds on V ′α \D. After possibly increasing A, we also have σ +Aω ≥ 0 on V
Since B ≥ 1, we finally find
η +BΘN(dd
c(−Θ1) + 2Aω) ≥ 0
on V ′α \D, as was to be shown. Observe that if η has Poincaré growth, then we
can choose N = 0.
5 Bounding height integrals
5.1 Geometric assumptions and statement of the theorem
Let X be a complex analytic manifold and D ⊂ X a divisor with normal cross-
ings. Let L be a line bundle on X and ‖ · ‖ a pre-log-log hermitian metric on L,
with singularities along D (see Definition 3.1). If ‖ · ‖0 is any smooth hermitian
metric on L, then we can write
‖ · ‖ = e−f/2‖ · ‖0,
where f : X \D → R is a pre-log-log function. If ‖ · ‖ is good along D, then f
is P-singular along D. As usual, abbreviate L = (L, ‖ · ‖) and L0 = (L, ‖ · ‖0).
Suppose now that Y ⊂ X is a compact complex analytic submanifold of pure
dimension d. We assume the following conditions are fulfilled:
i. the submanifold Y meets D in a divisor with normal crossings E in Y ;
ii. the restriction ω := c1(L0)|Y is semi-positive and
degL Y :=
ωd > 0;
iii. there exists a global section s ∈ Γ(Y, L) such that ‖s‖20 ≤ e
−e on Y and
div s is a divisor with normal crossings containing E. In particular div s is
reduced, so that we may indistinctly treat div s as a reduced Weil divisor
or a reduced scheme. For every integer N ≥ 0, we define the pre-log-log
function, with singularities along div s,
ℓN = (log log ‖s‖
N : Y \ div s −→ R≥1;
iv. there exist pre-log-log functions, with singularities along E,
Θ1,ΘN , ϕ, ψ : Y \ E −→ R≥0 (N ∈ Z≥0 depending on f)
with f = ϕ− ψ, and bounds
ϕ ≤ CℓM , ψ ≤ CℓM ,Θ1 ≤ Cℓ1,ΘN ≤ CℓN ,
for some constant C ≥ 0 and integer M ≥ 0. Moreover, if f is P-singular,
we suppose that M = 1 and N = 0;
v. there exists A > 0 such that τ := ddc(−Θ1) + Aω ≥ 0, and for every
integer Q ≥ 0 there exists AQ > 0 such that τQ := dd
c(−ℓQ) + AQω ≥ 0.
For Q = 0, we can choose A0 = 1, so that τ0 = ω;
vi. there exists B > 0 such that
ddc(−ϕ) +BΘNτ ≥ 0,
ddc(−ψ) +BΘNτ ≥ 0
hold on Y \E (and so on Y \div s). Observe that by the bounds in iv, we
then have
ωϕ := dd
c(−ϕ) +BCℓNτ ≥ 0,
ωψ := dd
c(−ψ) +BCℓNτ ≥ 0
on Y \ div s.
The aim of this section is to find bounds for the height integrals
Jp :=
f c1(L)
p c1(L0)
d−p, 0 ≤ p ≤ d.
We observe that f c1(L)
p c1(L0)
is a pre-log-log differential form on Y with
singularities along E, hence locally integrable (see Proposition 2.16). In partic-
ular, since E ⊆ div s and div s is Lebesegue negligible, the integrals Jp can be
computed on Y \ div s.
Theorem 5.1. There exist constants α, β > 0, R ∈ Z≥0, depending only on A,
{AN}N , B, C, M and N , such that, for any p ∈ {0, . . . , d},
|Jp| ≤ α degL Y + β · (degL Y ) · log
log ‖s‖−20
c1(L0)
degL Y
If ‖ · ‖ is good along E, then we can take R = 1, so that
|Jp| ≤ α degL Y + β · (degL Y ) · log
log ‖s‖−20
c1(L0)
degL Y
The theorem will be reduced to the bounds claimed by the following two
propositions.
Proposition 5.2. Let σ be a closed, real and semi-positive pre-log-log (t,t)-form
on Y , with singularities along div s. Let a, b be integers such that a+ b+ t = d.
If a, b ≥ 0, define the integral
I(M,a, b, σ) =
Otherwise set I(M,a, b, σ) = 0. Then, if a > 0, the following bound holds
I(M,a, b, σ) ≤ BCI(M +N, a− 1, b, στ)
+ CI(M,a− 1, b, στM )
+ bBC2I(2M,a− 1, b− 1, σττN )
+ (a− 1)BC2I(2M,a− 2, b, σττN ).
If f is P-singular (in which case N = 0 and M = 1), then
I(1, a, b, σ) ≤ ABCI(1, a− 1, b, στ0)
+ (B + 1)CI(1, a− 1, b, στ1).
Similar bounds are true if b > 0, exchanging the role of a and b.
Proposition 5.3. Let σ be a (d, d)-form which is a product of (1,1)-forms of
the kind τ or τQ, Q ≥ 0. Let W be the set of integers Q ≥ 0 such that τQ
appears in σ. Fix an integer K ≥ 0. Then there exist constants α, β > 0 and
an integer R ≥ 0, depending only on A, {AN}N , C, K and W , such that
ℓKσ ≤ α degL Y + β(degL Y ) log
log ‖s‖−20
c1(L0)
degL Y
If K = 1 and σ is a product of differential forms of the kind τ0, τ1 (i.e. W ⊆
{0, 1}), then we can take R = 1.
Assuming for the moment the propositions, we prove Theorem 5.1.
Proof of Theorem 5.1. We first observe that c1(L) = dd
cf + c1(L0), so that
(5.1) Jp =
f(ddcf)jωd−j.
Next, on Y \ div s we write f = ϕ− ψ and ddcf = ωψ − ωϕ. We get
(−1)j−k
ϕωkψω
d−j −
ψωkψω
The coefficients
can be bounded in terms of dimX (hence independently
of Y and the hermitian line bundles). Therefore we are reduced to bound
integrals of the kind ∫
ϕωaϕω
ψωaϕω
for integers a, b, c ≥ 0 such that a + b + c = d. Since the differential forms
c are semi-positive and 0 ≤ ϕ, ψ ≤ CℓM , we have to find upper bounds
for the integrals
I(M,a, b, ωc) =
Successively applying Proposition 5.2, we reduce our problem to find bounds for
integrals
ℓKσ, where σ is a product of (1,1)-forms of type τ or τQ, for some
integers K, Q ≥ 0. If f is P-singular, then K = 1 and σ is a product of forms
of type τ0 and τ1. We conclude by Proposition 5.3.
5.2 Proofs of Proposition 5.2 and Proposition 5.3
We proceed to prove Proposition 5.2 and Proposition 5.3. The proofs make an
extensive use of Stokes’ theorem for pre-log-log differential forms. We refer to
Proposition 2.16 for the statement and references. To simplify the exposition, it
will be worth having at our disposal the computations summarized in the next
lemma.
Lemma 5.4. Let a, b ≥ 0 be integers. On Y \ E the following equalities hold:
∂(ωaϕω
ψ) = aBC(∂ ℓN)ω
ψτ + bBC(∂ ℓN )ω
ψ τ ;
∂ ∂̄(ωaϕω
ψ) =aBC(∂ ∂̄ ℓN)ω
−aBC(∂̄ ℓN ) ∂(ω
+bBC(∂ ∂̄ ℓN )ω
−bBC(∂̄ ℓN ) ∂(ω
ψ )τ.
Proof. It is enough to apply the definition of ωϕ, ωψ, Leibniz’ rule and the fact
that ddc(−ϕ), ddc(−ψ) and τ are ∂ and ∂̄-closed.
Proof of Proposition 5.2. Write I = I(M,a, b, σ). We suppose that a > 0 and
b ≥ 0. We decompose
ωaϕ = dd
c(−ϕ)ωa−1ϕ +BCℓNτω
Accordingly, the integral I decomposes as I = I1 +BCI2, with
c(−ϕ)ωa−1ϕ ω
ℓM+Nω
Bounding I1. To get bounds on I1 we apply Stokes’ theorem for pre-log-log
forms. For this, we first recall that σ is closed of degree (t, t) and ωϕ, ωψ are of
degree (1, 1). Then, by Leibniz’ rule, we compute
∂̄(−ϕ)ωa−1ϕ ω
=ℓMdd
c(−ϕ)ωa−1ϕ ω
+(∂ ℓM )
∂̄(−ϕ)ωa−1ϕ ω
∂̄(−ϕ) ∂(ωa−1ϕ ω
where we used that ddc = i ∂ ∂̄ /2π. By Stokes’ theorem, we find I1 = I1,1+I1,2,
where
I1,1 = −
(∂ ℓM ) ∂̄(−ϕ)ω
I1,2 =
ℓM ∂̄(−ϕ) ∂(ω
Bounding I1,1. Again we apply Stokes’ theorem. By Lebniz’ rule we have
(−ϕ)(∂ ℓM )ω
(∂ ℓM ) ∂̄(−ϕ)ω
(−ϕ)(∂ ∂̄ ℓM )ω
(−ϕ)(∂ ℓM ) ∂̄(ω
Therefore, by Stokes’ theorem, we get I1,1 = I1,1,1 + I1,1,2, where
I1,1,1 =
ϕddc(−ℓM )ω
I1,1,2 =
(−ϕ)(∂ ℓM ) ∂̄(ω
Bounding I1,1,1. Recall that, by assumption, there exists a constant AM > 0
such that τM = dd
c(−ℓM )+AMω ≥ 0. Then, since ω
ψωσ is a semi-positive
form and 0 ≤ ϕ ≤ CℓM , we have a bound
I1,1,1 ≤
ϕ(ddc(−ℓM ) +AMω)ω
=CI(M,a− 1, b, στM ).
(5.2)
Bounding I1,1,2. To bound the integral I1,1,2 we first appeal to Lemma 5.4 to
develop
∂̄(ωa−1ϕ ω
ψ) = (a− 1)BC(∂̄ ℓN )ω
ψτ + bBC(∂̄ ℓN)ω
Then we write I1,1,2 = (a− 1)BCI
1,1,2 + bBCI
1,1,2, where
1,1,2 =
(−ϕ)(∂ ℓM )(∂̄ ℓN )ω
1,1,2 =
(−ϕ)(∂ ℓM )(∂̄ ℓN )ω
ψ στ.
In these integrals we observe that
(5.3) i ∂ ℓM ∧ ∂̄ ℓN = NMℓN+M−2i ∂ ℓ1 ∧ ∂̄ ℓ1,
which is a semi-positive differential form. Since ϕ ≥ 0 and ωϕ, ωψ, σ and τ are
also semi-positive, we find I
1,1,2 ≤ 0 and I
1,1,2 ≤ 0. This shows
(5.4) I1,1,2 = (a− 1)BCI
1,1,2 + bBCI
1,1,2 ≤ 0.
From (5.2) and (5.4) we conclude with the bound
(5.5) I1,1 = I1,1,1 + I1,1,2 ≤ CI(M,a− 1, b, στM ).
Bounding I1,2. As before we proceed by successive applications of Stokes’ the-
orem. First of all we compute
ℓM (−ϕ) ∂(ω
ℓM ∂̄(−ϕ) ∂(ω
(−ϕ)(∂̄ ℓM ) ∂(ω
(−ϕ)ℓM ∂̄ ∂(ω
By Stokes’ theorem we find I1,2 = I1,2,1 + I1,2,2, with
I1,2,1 =
ϕ(∂̄ ℓM ) ∂(ω
I1,2,2 =
(−ϕ)ℓM ∂ ∂̄(ω
Bounding I1,2,1. By Lemma 5.4 we get the expansion
∂(ωa−1ϕ ω
ψ) = (a− 1)BC(∂ ℓN )ω
ψτ + bBC(∂ ℓN)ω
Accordingly, we decompose I1,2,1 = (a− 1)BCI
1,2,1 + bBCI
1,2,1, where
1,2,1 = −
ϕ(∂ ℓN )(∂̄ ℓM )ω
1,2,1 = −
ϕ(∂ ℓN )(∂̄ ℓM )ω
ψ στ.
By (5.3) the differential form i ∂ ℓN ∧ ∂̄ ℓM is semi-positive. Since ϕ ≥ 0 and
ωϕ, ωψ, σ and τ are also semi-positive, we have I
1,2,1, I
1,2,1 ≤ 0. This proves
(5.6) I1,2,1 = (a− 1)BCI
1,2,1 + bBCI
1,2,1 ≤ 0.
Bounding I1,2,2. To bound the integral I1,2,2 we first recall from Lemma 5.4
∂ ∂̄(ωa−1ϕ ω
ψ) = (a− 1)BC(∂ ∂̄ ℓN)ω
− (a− 1)BC(∂̄ ℓN) ∂(ω
+ bBC(∂ ∂̄ ℓN)ω
− bBC(∂̄ ℓN ) ∂(ω
ψ )τ.
Corresponding to this expansion, we write I1,2,2 = (a − 1)BCI
1,2,2 + (a −
1)BCI
1,2,2 + bBCI
1,2,2 + bBCI
1,2,2, with the obvious notations for the inte-
grals I
1,2,2 (see below).
Bounding I
1,2,2. We have
1,2,2 =
ϕℓMdd
c(−ℓN)ω
Recall that for some constant AN > 0 the differential form τN = dd
c(−ℓN) +
ANω is semi-positive. Moreover 0 ≤ ϕ ≤ CℓM . Thus we find the bound
1,2,2 ≤C
ψσττN
=CI(2M,a− 2, b, σττN ).
(5.7)
Bounding I
1,2,2. We write
1,2,2 =
ϕℓM (∂̄ ℓN) ∂(ω
ψ)στ.
By (5.3) and Lemma 5.4, i ∂(ωa−1ϕ ω
ψ) ∧ ∂̄ ℓN is semi-positive, so that
(5.8) I
1,2,2 ≤ 0.
Bounding I
1,2,2. We have
1,2,2 =
ϕℓMdd
c(−ℓN)ω
Reasoning as for I
1,2,2, we arrive to
1,2,2 ≤C
ψ σττN
=CI(2M,a− 1, b− 1, σττN ).
(5.9)
Bounding I
1,2,2. We finally bound the integral
1,2,2 =
ϕℓM (∂̄ ℓN ) ∂(ω
ψ )στ.
Again by (5.3) and Lemma 5.4, the differential form i ∂(ωa−1ϕ ω
ψ ) ∧ (∂̄ ℓN ) is
semi-positive, so that
(5.10) I
1,2,2 ≤ 0.
We now conclude with a bound for I1,2, since the inequalities (5.6)–(5.10) yield
I1,2 ≤ I1,2,2 ≤(a− 1)BC
2I(2M,a− 2, b, σττN )
+bBC2I(2M,a− 1, b− 1, σττN ).
(5.11)
As for I1, the bounds (5.5) and (5.11) lead to
I1 = I1,1 + I1,2 ≤CI(M,a− 1, b, στM )
+bBC2I(2M,a− 1, b− 1, σττN )
+(a− 1)BC2I(2M,a− 2, b, σττN ).
(5.12)
Bounding I2. We have
(5.13) I2 =
ℓM+Nω
ψστ = I(M +N, a− 1, b, στ)
To conclude we put (5.12) and (5.13) together and we get
I = I1 +BCI2 ≤BCI(M +N, a− 1, b, στ)
+CI(M,a− 1, b, στM )
+bBC2I(2M,a− 1, b− 1, σττN )
+(a− 1)BC2I(2M,a− 2, b, σττN ),
as was to be shown.
Suppose now that f is P-singular (so that N = 0 andM = 1). We can write
ωϕ = dd
c(−ϕ̃) +ABCω,
ωψ = dd
c(−ψ̃) +ABCω,
where ϕ̃ = ϕ + BCΘ1 and ψ̃ = ψ + BCΘ1. The same method followed above
allows to establish the bound
I(1, a, b, σ) ≤ ABCI(1, a− 1, b, στ0)
+ (B + 1)CI(1, a− 1, b, στ1).
The details are left to the reader.
Proof of Proposition 5.3. Let σ be a (d, d)-form which is a product of (1,1)-
forms of type τ or τQ, Q ≥ 0. We write σ = τ
sσ1, for some s ≥ 0 and σ1 a
product of forms of type τQ. Define
(5.14) J(K, s, σ1) =
First of all we show how to reduce s to 0. The argument is by induction. If
s > 0, recalling that τ = ddc(−Θ1) +Aω we write
(5.15) τs = ddc(−Θ1)τ
s−1 +Aωτs−1.
Since τ0 = ω, we get from the definition of J in (5.14) and from (5.15)
(5.16) J(K, s, σ1) = AJ(K, s− 1, σ1τ0) +
c(−Θ1)τ
s−1σ1.
We next bound the integral on the right hand side of (5.16). Since τ and σ1 are
∂ and ∂̄-closed, applying Stokes’ theorem for pre-log-log forms we get
(5.17)
c(−Θ1)τ
s−1σ1 =
c(−ℓK)τ
s−1σ1.
Now ddc(−ℓK) = τK −AKτ0, the forms τ0, τK are semi-positive and 0 < Θ1 ≤
Cℓ1, so that from (5.17) and the definition of J we derive
(5.18)
c(−Θ1)τ
s−1σ1 ≤ CJ(1, s− 1, σ1τK).
Observe that because ℓK = ℓ
1 and ℓ1 ≥ 1, the inequality ℓ1 ≤ ℓK holds.
Therefore
(5.19) J(1, s− 1, σ1τK) ≤ J(K, s− 1, σ1τK).
From (5.16)–(5.19) we arrive to
J(K, s, σ1) ≤ AJ(K, s− 1, σ1τ0) + CJ(K, s− 1, σ1τK).
Hence we may suppose that s = 0, so that σ is a product τQ1 . . . τQd . We have
to deal with
L(K,Q1, . . . , Qd) =
ℓKτQ1 . . . τQd .
Again by an inductive argument we show how to reduce all the integers Qi to
0. Suppose that Q1 > 0. Then we write τQ1 = dd
c(−ℓQ1) +AQ1τ0, so that
(5.20) L(K,Q1, . . . , Qd) = AQ1L(K, 0, Q2, . . . , Qd) +
c(−ℓQ1)σ1
where σ1 = τQ2 . . . τQd . We study the integral on the right hand side of (5.20).
Because σ1 is ∂ and ∂̄-closed, by Stokes’ theorem for pre-log-log forms we find
L2 :=
c(−ℓQ1)σ1 =
(∂ ℓK)(∂̄ ℓQ1)σ1
KQ1ℓK+Q1−2(∂ ℓ1)(∂̄ ℓ1)σ1.
(5.21)
By the very definition of ℓ1, we have
(5.22) ∂ ℓ1 ∧ ∂̄ ℓ1 =
∂ log ‖s‖−20 ∧ ∂̄ log ‖s‖
(log ‖s‖−20 )
On the other hand, since ‖s‖20 ≤ e
−e, there exists a constant D depending only
on K +Q1 − 2 such that
(5.23)
ℓK+Q1−2
(log ‖s‖−20 )
(log log ‖s‖−20 )
K+Q1−2
(log ‖s‖−20 )
(log ‖s‖−20 )
Moreover observe that
(5.24)
(log ‖s‖−20 )
∂ log ‖s‖−20 = −2 ∂(log ‖s‖
−1/2.
Because i ∂ log ‖s‖−20 ∧ ∂̄ log ‖s‖
0 and σ1 are semi-positive, combining (5.21)–
(5.24) we get
(5.25) L2 ≤ −2
KQ1D(∂(log ‖s‖
−1/2)(∂̄ log ‖s‖−20 )σ1.
By Lemma 5.5 below, we can apply Stokes’ theorem to the right hand side of
(5.25) and obtain
(5.26) L2 ≤ 2
KQ1D(log ‖s‖
−1/2ωσ1.
By the hypothesis on ‖ · ‖0, (log ‖s‖
−1/2 ≤ 1. Using the positivity of ωσ1,
from (5.26) we derive
(5.27) L2 ≤ 2KQ1D
Now recall that ω is ∂ and ∂̄-closed, and τQ = dd
c(−ℓQ) +AQω, so that
ωσ1 = AQ2 . . . AQdω
d + ddcσ2
for some pre-log-log form σ2. Applied to (5.27) this provides
(5.28) L2 ≤ 2Q1AQ2 . . . AQdKD
ωd = 2Q1AQ2 · . . . · AQdKD degL Y.
The identity (5.20) together with (5.28) imply the inequality
L(K,Q1, . . . , Qd) ≤ AQ1L(K, 0, Q2, . . . , Qd) + 2Q1AQ2 · . . . · AQdKD degL Y.
Successively repeating this argument, we reduce Q2, . . . , Qd to 0. Hence it
remains to treat the integrals
M(K) =
We want to apply Jensen’s inequality. First of all, rewrite
M(K) = (degL Y )
degL Y
so that ωd/ degL Y defines a probability measure on Y . Secondly, the function
x 7→ (log x)K is concave on ]eK−1,+∞[, because
(log x)K =
(log x)K−2(K − 1− log x).
Since ‖s‖−20 ≥ e
e, in particular eK+1 log ‖s‖−20 > e
K−1. We use that logK is an
increasing function and apply Jensen’s inequality:
M(K) ≤(degL Y )
logK(eK+1 log ‖s‖−20 )
degL Y
≤(degL Y ) log
log ‖s‖−20
degL Y
By the trivial inequality x+ y ≤ 2xy for real x, y ≥ 1, we finally arrive to
M(K) ≤ (degL Y )(2K + 2)
K logK
log ‖s‖−20
degL Y
This concludes the proof of the proposition, except for the fact that we can take
R = 1 when K = 1 and σ is a product of forms τ0 and τ1. This last case is
similarly treated and left to the reader.
Lemma 5.5. Let µ be a closed pre-log-log (d-1,d-1)-form on Y , with singulari-
ties along div s. Then we have
(log ‖s‖−20 )
∧ ∂̄ log ‖s‖−20 µ =
(log ‖s‖−20 )
∧ ωµ.
Proof. The proof of the lemma follows the ideas of Lemma 7.36 in [2].
First of all, as for log-log growth differential forms, the form
(log ‖s‖−20 )
∧ ∂̄ log ‖s‖−20 µ = −
(log ‖s‖−20 )
1/2 ∂ ℓ1 ∧ ∂̄ ℓ1µ
is locally integrable on Y . Indeed, after localizing to an analytic chart adapted
to div s and changing to polar coordinates, we are reduced to point out that,
for every 0 < δ < 1/2, we have an estimate
(5.29)
∫ ε/e
(log log t−1)N (log t−1)1/2
t(log t)2
∫ ε/e
t(log t−1)3/2−δ
< +∞.
Define
I = −
(log ‖s‖−20 )
∧ ∂̄ log ‖s‖−20 µ.
We construct a finite open covering {(V ′α; {z
i }i)}α of div s, by adapted analytic
open charts. Suppose that via the coordinates {zαi }α, V
α is identified with
(r = r(α)), so that V ′α \ D corresponds to ∆
further assume that in these coordinates we can write s = zα1 · . . . · z
r uα, where
uα is a holomorphic unit. After possibly adding a finite number of adapted
analytic charts to {(V ′α; {z
i }i)}α, the open subsets Vα ⊂⊂ V
α identified with
via the coordinates {zαi }i still cover div s. Write Ω = ∪αVα. Take
a finite open covering {Vβ}β of X \ Ω, so that Vβ ∩ (div s) = ∅ for all β. Let
{χα}α ∪ {χβ}β be a partition of unity subordinate to {Vα}α ∪ {Vβ}β, with χγ
vanishing outside Vγ for all γ = α, β. We can expand
where
Iγ := −
(log ‖s‖−20 )
∧ ∂̄ log ‖s‖−20 χγµ.
We first treat the integrals Iβ . Observe that for any C
∞ differential form ν on
Y \ div s, the differential form χβν is C
∞ on Y , because χβ vanishes on Y \ Vβ
and Vβ ∩ (div s) = ∅. Moreover the equality d(χβν) = dχβ ∧ ν + χβ ∧ dν holds
on Y . For ν = i ∂ ∂̄ log ‖s‖−20 /2π we find χβν = χβω on Y . These observations,
together with dµ = 0, yield, by Stokes’ theorem,
(log ‖s‖−20 )
(log ‖s‖−20 )
∂̄ log ‖s‖−20 (dχβ)µ,
(5.30)
On the other hand, for every α define Bαε (div s) to be the ε-neighborhood of
div s given by
Bαε (div s) =
Bαε (Tk),
where Tk is the divisor z
k = 0 in Vα and
Bαε (Tk) = ∆
×∆ε ×∆
×∆s1/2e ⊂ Vα.
Then we write Iα = limε→0 Iα,ε, where Iα,ε = I
α,ε + I
α,ε + I
α,ε and
I(1)α,ε :=−
Y \Bαε (div s)
(log ‖s‖−20 )
∂̄ log ‖s‖−20 χαµ
I(2)α,ε :=
Y \Bαε (div s)
(log ‖s‖−20 )
I(3)α,ε :=−
Y \Bαε (div s)
(log ‖s‖−20 )
∂̄ log ‖s‖−20 (dχα)µ.
The differential form in I
α,ε is integrable on Y , since it has log-log growth along
div s (Proposition 2.16). The differential form in I
α,ε is integrable on Y , too.
Indeed, after localizing to an analytic chart adapted to div s and changing to
polar coordinates, we are reduced to prove the convergence of the integrals
(5.31)
∫ 1/e
(log log t−1)N
(log t−1)1/2
(5.32)∫ 1/e
(log log t−1)N
(log t−1)1/2
t log t−1
∫ 1/e
(log log t−1)N
t(log t−1)3/2
From the boundedness of (log log t−1)N/(log t−1)1/2 the convergence of (5.31)
follows. The second one (5.32) has already been treated (5.29). Therefore
limε→0 I
α,ε = I
α and limε→0 I
α,ε = I
α , where
I(2)α :=
(log ‖s‖−20 )
ωχαµ,
I(3)α :=−
(log ‖s‖−20 )
∂̄ log ‖s‖−20 (dχα)µ.
As for the integral I
α,ε, after applying Stokes’ theorem we find
I(1)α,ε =
∂ Bαε (div s)
(log ‖s‖−20 )
∂̄ log ‖s‖−20 χαµ.
Taking into account that χα vanishes on ∂ Vα, we easily see that
(5.33) |I(1)α,ε| ≤
∂∗ Bαε (Tk)
(log ‖s‖−20 )
∣∣∂̄ log ‖s‖−20 χαµ
with the notation
∂∗Bαε (Tk) = ∆
× ∂∆ε ×∆
Observe that ∂∗Bαε (Tk) is fibered in circles over Tk, via the projection
pαk,ε : ∂
∗Bε(Tk) −→ Tk
(zα1 , . . . , z
d ) 7−→ (z
1 , . . . , z
k−1, 0, z
k+1 . . . , z
we will mean integral along the fibers of pαk,ε. We claim that, for every
k = 1, . . . , r,
(5.34) lim
(log ‖s‖−20 )
∣∣∂̄ log ‖s‖−20 χαµ
∣∣ = 0.
Indeed, write s = zα1 . . . z
r uα on V
α, where uα is a holomorphic unit. Since χαµ
is a (d− 1, d− 1) pre-log-log form, the differential form to integrate under
has the shape
f(zα1 , . . . , z
(log log |zαk |
(log ‖s‖−20 )
∣∣∣∣∣∣
dz̄αk
j 6=k
dzαj ∧ dz̄
∣∣∣∣∣∣
where f behaves as follows:
0 ≤ f(zα1 , . . . , z
d ) ≺
j 6=k
(log log |zαj |
|zαj |
2(log |zαj |
We point out that |dz̄αk /z̄
k | is bounded along the fibers of p
k,ε, that the form
j 6=k dz
j ∧ dz̄
∣∣∣ is integrable and (log log |zαk |−1)M/(log ‖s‖
1/2 vanishes
along Tk. This is enough to prove the claim (5.34). Therefore, from (5.33)
limε→0 I
α,ε = 0 and consequently
Iα =I
α + I
(log ‖s‖−20 )
(log ‖s‖−20 )
∂̄ log ‖s‖−20 (dχα)µ.
(5.35)
Finally, since the open covering {Vα}α ∪ {Vβ}β is finite, from (5.30) and (5.35)
we derive
(log ‖s‖−20 )
χγ)ωµ
(log ‖s‖−20 )
∂̄ log ‖s‖−20 d(
χγ)µ.
(5.36)
The assertion of the lemma follows from (5.36) once we note that
γ χγ = 1
and d(
γ χγ)= 0.
5.3 Application to the projective case
We suppose that X is a nonsingular complex projective variety (non necessarily
connected) and D ⊆ X a divisor with simple normal crossings. Consider the
following elements:
i. as in section 4, for every integer N ≥ 0 introduce a function ΘN (see
Notation 4.1). By Lemma 4.7, the functions ΘN are pre-log-log, with
singularities along D;
ii. an ample line bundle L on X , admitting a global section s ∈ Γ(X,L)
such that div s is a divisor with simple normal crossings containing D. In
particular div s is reduced and may be seen as a reduced Weil divisor or a
reduced scheme;
iii. a pre-log-log metric ‖ · ‖ on L, with singularities along D;
iv. a smooth hermitian metric ‖·‖0 on L with ω := c1(L0) > 0 and ‖s‖
0 ≤ e
v. as in §5.1 we introduce ℓQ = (log log ‖s‖
Q, Q ∈ Z≥0. By Lemma 4.6,
the function ℓQ is pre-log-log, with singularities along div s. Moreover,
Proposition 4.9 asserts the existence of a constant AQ > 0 such that
τQ := dd
c(−ℓQ) +AQω ≥ 0. We can take A0 = 1.
We write ‖ · ‖ = e−f/2‖ · ‖0, where f : X \D → R is a pre-log-log function, with
singularities along D. If ‖ · ‖ is good, then f is P-singular. Since X is compact,
associated to ω = c1(L0) there is a decomposition f = ϕ − ψ as in Theorem
4.3. Moreover, because D ⊆ div s, there exist a constant C ≥ 0 and an integer
M ≥ 0 such that
ϕ ≤ CℓM , ψ ≤ CℓM ,Θ1 ≤ Cℓ1,ΘN ≤ CℓN
hold on X \div s. If ‖ · ‖ is good, then we can take M = 1. Finally, by Theorem
4.3 and Proposition 4.9, there exist constants A,B > 0 and an integer N ≥ 0
such that τ := ddc(−Θ1) +Aω ≥ 0 and
ωϕ :=dd
c(−ϕ) +BΘN(dd
c(−Θ1) +Aω) ≥ 0,
ωψ :=dd
c(−ψ) +BΘN (dd
c(−Θ1) +Aω) ≥ 0.
If ‖ · ‖ is good along D, then we can take N = 0. Therefore the assumptions of
§5.1 are fulfilled for X .
Let now π : X ′ → X be a morphism of complex analytic manifolds such that the
inverse image schemes π−1(D) ⊆ π−1(div s) are divisors with normal crossings.
Let Y ⊆ X ′ be a compact complex analytic submanifold of pure dimension d.
Suppose that Y meets π−1(div s) in a divisor with normal crossings in Y . Then
Y ∩D is a divisor with normal crossings in Y , too. We pull-back by π all the
objects introduced above (π−1(D), π∗ΘN , π
∗L, π∗s, π∗‖ · ‖, etc.) and then
we restrict them to Y . We obtain corresponding objects on Y (π−1(D) ∩ Y ,
(π∗ΘN)|Y , (π
∗L)|Y , (π
∗s)|Y , (π
∗‖ · ‖)|Y , etc.). Provided that degπ∗L Y > 0, the
requirements of §5.1 are fulfilled on Y . It is important to point out that the
involved constants A, {AQ}Q, B, C, M , N don’t depend on the data X
′, π, Y .
For every integer 0 ≤ p ≤ d define
J∗p =
(π∗f) c1(π
∗L)p c1(π
As a consequence of Theorem 5.1 we get the following corollary.
Corollary 5.6. Let π : X ′ → X be a morphism of complex analytic manifolds
such that π−1(D) and π−1(div s) are divisors with normal crossings. Let Y ⊆ X ′
be a compact complex analytic submanifold of pure dimension d, intersecting
div s in a divisor with normal crossings in Y . Suppose that degπ∗L Y > 0.
There exist constants α, β > 0 and an integer R ≥ 0, depending only on A,
{AQ}Q, B, C, M and N such that, for all p ∈ {0, . . . , d},
|J∗p | ≤ α degπ∗L Y + β · (degπ∗L Y ) · log
log π∗‖s‖−20
degπ∗L Y
If ‖ · ‖ is good along D, then we can take R = 1:
|J∗p | ≤ α degπ∗L Y + β · (degπ∗L Y ) · log
log π∗‖s‖−20
degπ∗L Y
Let Z be a reduced closed subscheme of X of pure dimension d, intersecting
div s properly. Denote by Z1, . . . , Zr the irreducible components of Z. For every
i = 1, . . . , r, let πi : Xi → X be an imbedded resolution of singularities of Zi,
such that π−1i (div s) is a divisor with normal crossings intersecting the strict
transform Z̃i of Zi in a divisor with normal crossings in Z̃i (see [2], Theorem
7.27). Then Corollary 5.6 applies to πi, Zi, for every i = 1, . . . , r. If θ is a
smooth differential form of degree 2d on X \div s, locally integrable on X , then
we adopt the convention ∫
π∗i θ.
This definition intrinsically depends on Z, and not on the choice of the resolu-
tions πi. With this convention, define
f c1(L)
p c1(L0)
In this situation Corollary 5.6 reads as follows.
Corollary 5.7. Let Z be a reduced closed subscheme of X of pure dimension d,
intersecting div s properly. There exist positive constants α, β, and an integer
R ≥ 0, depending only on A, {AQ}Q, B, C, M and N such that, for all
p ∈ {0, . . . , d},
|Jp| ≤ α degL Z + β · (degL Z) · log
log ‖s‖−20
c1(L0)
degL Z
with R = 1 whenever ‖ · ‖ is good along D.
Proof. Decompose Z into irreducible components: Z = Z1∪ . . .∪Zr. Following
the convention above, define
J (i)p =
f c1(L)
p c1(L0)
so that
(5.37) Jp =
J (i)p .
By Corollary 5.6, there exist constants α, β > 0 and an integer R ≥ 0, depending
only on A, {AQ}Q, B, C, M and N such that, for all p ∈ {0, . . . , d},
(5.38) |J (i)p | ≤ α degL Zi + β · (degL Zi) · log
log ‖s‖−20
c1(L0)
degL Zi
If ‖ · ‖ is good along D, then we can take R = 1. Now recall that the function
logR is increasing and concave on ]eR−1,+∞[, so that
degL Zi
degL Z
log ‖s‖−20
c1(L0)
degL Zi
≤ logR
degL Zi
degL Z
log ‖s‖−20
c1(L0)
degL Zi
= logR
log ‖s‖−20
c1(L0)
degL Z
≤(2R+ 2)R logR
log ‖s‖−20
c1(L0)
degL Z
(5.39)
For the last inequality we used that x + y ≤ 2xy for real x, y ≥ 1. The lemma
follows combining (5.37)–(5.39).
6 Arakelovian heights
In this section we turn to an arithmetic situation and deal with arakelovian
heights on arithmetic varieties. We prove Theorem 1.3, which can be seen as an
arithmetic counterpart of the bounds in §5. A remarkable and straightforward
outcome is the finiteness property of arakelovian heights with respect to pre-log-
log hermitian ample line bundles, as well as the existence of a universal lower
bound (Corollary 1.4).
6.1 Heights attached to pre-log-log hermitian line bundles
Let K be a number field and OK its ring of integers. Write S = SpecOK .
Throughout this section we work with a fixed arithmetic variety π : X → S
of relative dimension n. We recall this means that X is a flat and projective
scheme over S , with regular generic fiber XK = X ×S SpecK of pure di-
mension n. The set of complex points X (C) of X has a natural structure of
complex analytic manifold, and it can be decomposed as
X (C) =
σ:K →֒C
Xσ(C).
Complex conjugation induces an antiholomorphic involution
F∞ : X (C) −→ X (C).
We fixD ⊂ XK a divisor, such that D(C) ⊂ X (C) has simple normal crossings.
Write U = X (C) \D(C).
Notation 6.1 ([2]). We define Z
U (X ) to be the free abelian group generated
by the irreducible reduced subschemes Z ⊆ X of codimension p, such that Z(C)
intersects D(C) properly. We call Z
U (X ) the group of cycles of codimension
p, intersecting D(C) properly. A cycle z is said to be vertical if its components
are supported on closed fibers X℘, ℘ ∈ S \ {(0)}. A cycle z is said to be
horizontal if its irreducible components are flat over S . We denote by Z
U (XK)
the subgroup of Z
U (X ) of horizontal cycles.
Definition 6.2. A pre-log-log hermitian line bundle on X , with singularities
along D, is a couple L = (L , ‖ · ‖) formed by
i. a line bundle (invertible sheaf) L on X ;
ii. a pre-log-log hermitian metric ‖·‖ on the line LC on X (C), with singular-
ities along D(C), and invariant under the action of complex conjugation
F∞: F
∞‖ · ‖ = ‖ · ‖.
In [2], [3], Burgos, Kramer and Kühn attach a height morphism to a pre-
log-log hermitian line bundle L ,
U (X ) −→ R,
generalizing the height morphism for smooth hermitian line bundles introduced
by Bost, Gillet and Soulé in [1]. We refer the reader to the cited bibliography
for the precise definition and basic properties of h
, both in the smooth and
pre-log-log case. For our purposes, it will be enough to state the following
propositions summarizing the main features of h
Proposition 6.3. Let L be a pre-log-log hermitian line bundle on X , with
singularities along D. The height h
satisfies the following three properties:
H1. if PK : SpecK → X is a K-valued point whose image does not belong to
D, and P : S → X denotes its extension to S , then
(P∗S ) = d̂eg(P
L )(4)
(4)The arithmetic degree ddeg of a hermitian line bundle M = (M, ‖ · ‖) over SpecOK is
defined as follows: if s is a non zero global section of M, then ddeg(M) = log ♯
σ:K →֒C log ‖s‖σ .
H2. if z is a vertical cycle supported on a closed fiber X℘, then
(z) = log(N℘) deg
where N℘ denote the norm of the ideal ℘;
H3. let z ∈ Z
n+1−p
U (XK) be irreducible and reduced. Let s be a rational section
of L ⊗N which does not identically vanish on z. If (div(s).z)(C) intersects
D(C) properly, then
(z) = h
(div(s).z)−
log(‖s‖L⊗N ) c1(L )
Proof. This follows from the definition of h
and the extended arithmetic in-
tersection theory in [2].
Remark 6.4. i. The convergence of the integral in H3 is implicit in the state-
ment.
ii. If LK is an ample line bundle, then an easy inductive argument shows that
the properties H1, H2 and H3 actually characterize h
Proposition 6.5. Let L be a line bundle on X , ‖ · ‖ a pre-log-log hermitian
metric on L , with singularities along D, and ‖ · ‖0 a smooth hermitian metric
on L . Write L = (L , ‖ · ‖), L 0 = (L , ‖ · ‖0) and ‖ · ‖ = e
−f/2‖ · ‖0, where
f : X (C) \D(C) → R is a pre-log-log function, with singularities along D(C).
For any cycle z ∈ Z
n+1−p
U (X ) we have
(z) = h
(z) +
f c1(L )
k c1(L 0)
p−1−k.
Proof. This is contained in [3], Theorem 4.1.
Remark 6.6. Proposition 6.5 allows to recover Proposition 6.3 once it is known
for smooth hermitian line bundles. In this case the properties H1, H2 and H3
are already established in [1].
Let F |K be a finite extension of fields and write T = SpecOF . Base
changing by T → S , we get an arithmetic variety XT → T , together with a
finite flat morphism g : XT → X of degree [F : K]. If D ⊆ XK is an effective
divisor such that D(C) ⊆ X (C) has simple normal crossings, then DF ⊆ XF is
an effective divisor and DF (C) ⊆ XT (C) has simple normal crossings as well.
Let L be a pre-log-log hermitian line bundle on X , with singularities along
D. The pull-back g∗L of L to XT is a pre-log-log hermitian line bundle,
with singularities along DF . Since g is flat, for every cycle z on X there
is a well defined pull-back cycle g∗(z). Observe that if z(C) intersects D(C)
properly, then g∗(z)(C) intersects DF (C) properly. Namely, the correspondence
z 7→ g∗(z) induces a morphism
U (X ) −→ Z
V (XT )
z 7−→ g∗(z),
(6.1)
where V = XT (C) \DF (C). This morphism maps Z
U (XK) into Z
V (XF ).
Lemma 6.7. Let L be a pre-log-log hermitian line bundle on X , with singu-
larities along D. Let F |K be a finite extension of fields and T = SpecOF .
Write g : XT → X for the finite flat projection induced by T → S . Let
w ∈ Z
V (XF ) be an irreducible and reduced cycle and set z = g(w)red. Let δ be
the degree of g |w. Then we have the equality
(w) = δh
Consequently, for the morphism g∗ of (6.1), we have
hg∗L (g
∗(z)) = [F : K]h
for every z ∈ Z
U (X ) and every p.
Proof. This follows for instance from the case of smooth metrics (see [1], §3.1.4
and Proposition 3.2.1) and Proposition 6.5, since g|w(C) : w(C) → z(C) is gener-
ically smooth and finite of degree δ, so that
f c1(L )
k c1(L 0)
p−1−k
f c1(L )
k c1(L 0)
p−1−k.
Notation 6.8. Let L be a pre-log-log hermitian line bundle, with singularities
along D. Let z ∈ Z
U (XK) with degLK z 6= 0. We define its normalized height
to be
(z) =
[K : Q] deg
Lemma 6.9. Let g : XT → X be as before. Let w ∈ Z
V (XF ) be an irreducible
and reduced cycle and set z = g(w)red. For the normalized height we have
(w) = h̃
(z) and h̃
(g∗z) = h̃
Proof. For every w ∈ Z
n+1−p
V (XF ) irreducible and reduced and z = g(w)red, we
have the equalities
deg(g∗L )F w =
[F : Q]
L )p−1
[F : Q]
c1(L )
[F : K]
where δ is the degree of g|w. It follows that deg(g∗L )F (g
∗z) = deg
z. Com-
bined with Lemma 6.7, we get h̃g∗L (w) = h̃L (z) and h̃g∗L (g
∗z) = h̃
Remark 6.10. The normalization h̃
just introduced is not the standard one.
However it appears naturally in the statement and proof of Theorem 1.3.
6.2 Proof of the main theorem
We now proceed to prove Theorem 1.3. The argument mainly relies on the
bounds established in §5 , and more concretely the situation studied in §5.3.
However, in the reduction steps we will need the following Bertini’s type theo-
rem. The proof is essentially well known, but we include it in the Appendix for
lack of reference.
Proposition 6.11. Let X be a nonsingular projective scheme over an alge-
braically closed field k. Let D ⊆ X be a divisor with simple normal crossings.
Let L be an ample line bundle on X. Then there exist an integer N > 0 and
global sections s1, . . . , sr ∈ H
0(X,L⊗N ) such that supp(div s1),. . . , supp(div sr)
are divisors with simple normal crossings and the following equality of schemes
holds:
D = (supp(div s1) ∩ . . . ∩ supp(div sr))red.
Under the hypothesis of Theorem 1.3, since D(C) has simple normal cross-
ings, DK has also simple normal crossings in XK . We will apply Proposition
6.11 through the following straightforward corollary.
Corollary 6.12. There exist a finite extension K ′|K, a positive integer N and
global sections s1, . . . , sr ∈ H
0(XK′ ,L
K′ ) such that
B1. supp(div sj)(C) is a divisor with simple normal crossings in X (C), for
every j = 1, . . . , r;
B2. DK′ = (supp(div s1) ∩ . . . ∩ supp(div sr))red.
The next two lemmas provide the final reductions before the proof of Theo-
rem 1.3.
Lemma 6.13. It is enough to proof Theorem 1.3 in the following situations:
i. after some finite extension K ′ | K;
ii. L is very ample and z ∈ Z
U (XK) is irreducible and reduced.
Proof. The first claim i is clear. For the proof of ii, we first note that the
statement of Theorem 1.3 for L ⊗N already implies the statement of the theorem
for L , since h̃
⊗N (z) = Nh̃
(z) and h̃
(z) = Nh̃
(z). Hence we assume
LK is very ample. We then proceed in two steps.
Step 1. We can suppose that L is very ample. Indeed, there exists some model
(Y ,A ) of (XK ,LK) with A very ample. The metrics ‖ · ‖, ‖ · ‖0 on L induce
metrics on A , and we write A and A 0 for the corresponding hermitian line
bundles. For every effective cycle z ∈ Z
U (XK), we write z̃ for the corresponding
effective and horizontal cycle on Y . By Proposition 3.2.2 in [1], there exists a
positive constant C, independent of z, such that
(6.2)
(z̃)− h
∣∣ ≤ C deg
Since A C and L C are isometric, Proposition 6.5 and (6.2) together give
(z̃)− h
(z̃)− h
∣∣ ≤ C deg
Step 2. We can suppose that z is irreducible and reduced. Indeed, suppose that
we have shown the existence of constants α, β, γ > 0 and R ∈ Z≥0 such that,
for every w ∈ Z
U (XK) irreducible and reduced, we have h̃L 0(w) + γ ≥ 1 and
(6.3)
∣∣∣h̃
(w)− h̃
∣∣∣ ≤ α+ β logR
(w) + γ
After possibly increasing γ we can suppose that h̃
(w) + γ > eR−1, for every
irreducible and reduced w ∈ Z
U (XK). Let us now consider w =
i∈I wi ∈
U (XK) \ {0} where the wi ∈ Z
U (XK) are irreducible and reduced. Then (6.3)
yields
∣∣∣h̃
(w) − h̃
∣∣∣ ≤
∣∣∣h̃
(wi)− h̃L 0(wi)
≤ α+ β
(wi) + γ
Since h̃
(wi) + γ > e
R−1 for all i and logR is concave on ]eR−1,+∞[, we
conclude
(wi) + γ
≤ logR
(wi) + γ)
= logR
(w) + γ
This completes the proof.
By Corollary 6.12 and Lemma 6.13, after possibly extending K and choosing
a suitable model of (XK ,LK), we can suppose that L is very ample and there
exist sections s1, . . . , sr ∈ H
0(XK ,LK) with the properties B1, B2 above (with
K ′ = K). After possibly multiplying the sections sj by a sufficiently divisible
integer, we can even suppose that s1, . . . , sr ∈ H
0(X ,L ). We denote Ej =
supp(div sj)K . We fix these data until the end of the proof.
Lemma 6.14. Let z ∈ Z
U (XK) be irreducible and reduced. Let F be a finite
extension of K over which all the irreducible components of zK are defined.
Let T = SpecOF and g : XT → X be the finite flat projection induced by
T → S . Write g∗(z) =
i∈I zi, zi irreducible, reduced and flat over T . Then
z(C) intersects D(C) properly if, and only if, every zi(C) intersects one of the
Ej,F (C) properly.
Proof. Straightforward.
Now we can complete the proof of Theorem 1.3.
Proof of Theorem 1.3. First of all, for every integer M ≥ 0 we construct a
function ΘM , as in §4, for the complex analytic variety X (C) and the divisor
with simple normal crossings D(C).
Observe that we can suppose that the metric ‖ · ‖0 satisfies c1(L 0) > 0 and
0 ≤ e
−e for every j = 1, . . . , r. Indeed, by [1], Proposition 3.2.2 (or also
Proposition 6.5 for smooth metrics), a change of smooth metric causes only
bounded variations of the normalized height.
We introduce the pre-log-log functions
Q = (log log ‖sj‖
Q : X (C) \ Ej(C) −→ R.
For everyQ we fix a positive constantAQ > 0 such that dd
Q )+AQ c1(L 0) ≥
0, A0 = 1, for all j = 1, . . . , r (see Proposition 4.9).
Write ‖ · ‖ = e−f/2‖ · ‖0, where f : X (C) \ D(C) → R is a pre-log-log
function (resp. P-singular if ‖ · ‖ is good). Attached to c1(L 0) we perform
a decomposition f = ϕ − ψ as in Theorem 4.3. Recall that ϕ, ψ are positive
pre-log-log (resp. P-singular) functions along D(C), with
ωϕ :=dd
c(−ϕ) +BΘN(dd
c(−Θ1) +A c1(L 0)) ≥ 0,
ωψ :=dd
c(−ψ) + BΘN(dd
c(−Θ1) +A c1(L 0)) ≥ 0,
for some A,B > 0 and N ∈ Z≥0. If ‖ · ‖ is good, then we can take N = 1.
For every j = 1, . . . , r, D ⊆ Ej . By compactness of X (C) there exist constants
C > 0, M ∈ Z≥0 such that
ϕ ≤ Cℓ
M , ψ ≤ Cℓ
M ,Θ1 ≤ Cℓ
1 ,ΘN ≤ Cℓ
for all j ∈ {1, . . . , r}.
Let z ∈ Z
n+1−p
U (XK) be irreducible and reduced. Denote by F an extension
of K such that all the irreducible components of zK are defined over F . Let
T = SpecOF , g : XT → X be as before. Decompose g
∗(z) =
i∈I zi, with
the zi irreducible and flat over T . By Lemma 6.14, for every zi there exists
j = j(i) ∈ {0, . . . , r} such that zi(C) intersects Ej,F (C) properly. By Corollary
5.7 and Proposition 6.5 we have
(6.4)
∣∣∣h̃g∗L (zi)− h̃g∗L 0(zi)
∣∣∣ ≤ α+β logR
zi(C)
log g∗‖sj‖
[K : Q] deg(g∗L )F zi
for α, β > 0, R ∈ Z≥0 depending only on A, {AQ}Q, B, C,M and N . Moreover,
if ‖ · ‖ is good, then we can take R = 1. Applying the property H3 of heights
(see Proposition 6.3), we rewrite (6.4) as
(6.5)∣∣∣h̃g∗L (zi)− h̃g∗L 0(zi)
∣∣∣ ≤ α+ β logR
g∗L 0
(zi)− 2h̃g∗L 0(div(g
∗sj).zi)
To derive this inequality we point out that
deg(g∗L )F (div(g
∗sj).zi) = deg(g∗L )F zi,
so that
hg∗L 0(div(g
∗sj).zi)
[F : Q] deg(g∗L )F zi
= h̃g∗L 0(div(g
∗sj).zi).
By Theorem 1.2 there exists a positive constant κ > 0 such that, for every effec-
tive cycle w 6= 0 on X , we have h̃
(w) > −κ. Combined with Lemma 6.9 this
yields h̃g∗L 0(div(g
∗sj).zi) > −κ, because div(g
∗sj).zi is effective. Therefore,
from (6.5) we deduce
∣∣∣h̃g∗L (zi)− h̃g∗L 0(zi)
∣∣∣ ≤α+ β logR
2h̃g∗L 0(zi) + 2κ+ 2e
≤α+ 2Rβ logR
h̃g∗L 0(zi) + κ+ e
(6.6)
where we applied the trivial inequalities log 2 ≤ 1 and x + y ≤ 2xy for real
x, y ≥ 1. Now h̃g∗L 0(zi)+κ+ e
R+1 > eR−1 and logR is concave on ]eR−1,+∞[.
From (6.6) we derive
∣∣∣h̃g∗L (g
∗z)−h̃g∗L 0(g
deg(g∗L )F (zi)
deg(g∗L )F (g
∣∣∣h̃g∗L (zi)− h̃g∗L 0(zi)
≤α+ 2Rβ logR
deg(g∗L )F (zi)
deg(g∗L )F (g
h̃g∗L 0(zi) + κ+ e
=α+ 2Rβ logR
h̃g∗L 0(g
∗z) + κ+ eR+1
(6.7)
By Lemma 6.9, h̃
(g∗z) = h̃
(z) and h̃
g∗L 0
(g∗z) = h̃
(z). Hence (6.7) is
equivalent to
(6.8)
∣∣∣h̃
(z)− h̃
∣∣∣ ≤ α+ 2Rβ logR
(z) + κ+ eR+1
The constants α, 2Rβ, γ := κ + eR+1 > 0, R ∈ Z≥0 (R = 1 if ‖ · ‖ is good) in
(6.8) depend only on L and L 0, and not on z. This concludes the proof of the
theorem.
7 Examples
7.1 Automorphic vector bundles on toroidal compactifi-
cations
The first natural examples of good hermitian vector bundles are provided by the
theory of (fully decomposed) automorphic vector bundles on locally symmetric
varieties, and their extensions to smooth toroidal compactifications. These have
been firstly worked by Mumford in his proof of Hirzebruch’s proportionality
principle in the non-compact case [14]. In this section we quote from loc. cit.
the main construction and Mumford’s theorem. As an application, we briefly
discuss the example of modular forms.
Let B be a bounded symmetric domain. We can write B = G/K, where
G is a semi-simple adjoint group and K is a maximal compact subgroup. De-
note KC, GC the complexifications of K and G. Inside GC there is a parabolic
subgroup of the form P+ · KC (P+ being its unipotent radical), such that
K = G∩ (P+ ·KC) and G · (P+ ·KC) is open in GC. Then B̌ := GC/G · (P+ ·KC)
is a rational projective variety and there is a G-equivariant immersion B →֒ B̌
compatible with the complex structure of B. Let E0 be a G-equivariant vector
bundle on B attached to a representation σ : K → GLr(C). We complexify
σ and extend it to P+ · KC, by letting it act trivially on P+. This extension
induces a GC equivariant analytic vector bundle Ě0 on B̌, with ι
∗(Ě0) = E0.
This way we get a holomorphic structure on E0 (which depends on the chosen
extension of σ to P+ ·KC).
Let Γ be a neat arithmetic subgroup of G acting on B. Then X = Γ\B is a
smooth quasi-projective complex variety. The vector bundle E0 descends to a
holomorphic vector bundle E on X . Such a vector bundle is called fully decom-
posed automorphic vector bundle. Since K is compact, there is a G-invariant
hermitian metric h0 on E0, thus inducing a hermitian metric h on E.
Theorem 7.1. Let X be a smooth toroidal compactification of X with D =
X \X a divisor with normal crossings. Then the automorphic vector bundle E
extends to a vector bundle E1 over X, such that h is good along D.
The following proposition may be interesting for some arithmetic purposes.
Proposition 7.2. Suppose that E0 = ωB is the canonical bundle of B. Then
E1 = ωX(D) and coincides with the pull-back of an ample line bundle O(1) on
the Baily-Borel compactification X∗ of X. The global sections of O(n) naturally
correspond to the modular forms whose automorphy factor is the nth power of
the jacobian.
Remark 7.3. Under the hypothesis of the proposition, the line bundle ωX(D)
is not ample in general.
Equip the line bundle E0 = ωB with the hermitian metric h0 induced by the
Kähler-Einstein metric on B, say with Einstein constant −1. The existence and
uniqueness is guaranteed by a result of Mok and Yau [13]. Since the Kähler-
Einstein metric is invariant under automorphisms, h0 is G-equivariant. By The-
orem 7.1 h0 induces a good hermitian metric h on E1 = ωX(D), with singulari-
ties along D. Observe that this metric is induced by the Kähler-Einstein metric
on X with Einstein constant −1, by uniqueness. Since the Kähler-Einstein met-
ric has negative Ricci curvature, c1(ωX(D), h) ≥ 0 on X . Together with the
fact that ωX(D) is the pull-back of an ample line bundle O(1) on X
∗, this can
be shown to be enough for the main theorem to hold, as soon as X and X
are defined over a number field K and we have chosen suitable models X of
X, X ∗ of X∗, etc. over SpecOK . Suppose that O(1) extends to an ample
line bundle A on X ∗, that there is a morphism π : X → X ∗ extending the
natural projection X → X∗ and put L = π∗(A). The line bundle L is a model
of ωX(D) and it can be endowed with the good hermitian metric induced by
h. Then Corollary 1.4 hold for L , provided we restrict to effective horizontal
cycles. The proof follows the same lines as for pre-log-log hermitian ample line
bundles, and it will be detailed elsewhere.
7.2 Some natural hermitian vector bundles on the moduli
space of curves
Let g ≥ 2 be an integer and Mg the moduli space of complex stable curves of
genus g. We denote by π : Cg → Mg the universal curve. By Mg we mean the
open subset of Mg parametrizing smooth curves, and we write Cg = π
−1(Mg).
The boundary ∂Mg = Mg \ Mg, which classifies singular stable curves, is a
divisor with normal crossings. We write ∂ Cg = π
−1(∂Mg), which is a divisor
with normal crossings, too. For the sake of simplicity we neglect that Mg and
Cg are actually V -manifolds, and we work as if they were complex analytic
manifolds (see [17] for the definition of V -manifold and the description of the
moduli space of curves as a V -manifold).
The first example concerns ωCg/Mg , the relative dualizing sheaf of π. Every
fiber of π|Cg admits a unique complete hyperbolic metric of constant negative
curvature −1. By Teichmüller’s theory these metrics glue together and define
a smooth hermitian metric on ω∨
Cg/Mg
. We get a smooth hermitian metric on
ωCg/Mg . By a theorem of Wolpert [18] this metric extends to a good hermitian
metric on ωCg/Mg , with singularities along ∂ Cg. It is well known that ωCg/Mg
is relatively ample ([5], Corollary to Theorem 1.2).
Let us now consider (ΩMg , hWP ) the cotangent bundle of Mg with the Weil-
Petersson metric. Recall that if p is a point of Mg representing a Riemann
surface R, then ΩMg ,p is isomorphic to the space of holomorphic quadratic
differentials on R. If R is written as H/Γ (H Poincaré’s upper half plane and
Γ ⊆ PSL2(R) a discrete subgroup), then the metric hWP on the stalk ΩMg ,p is
the usual scalar product
〈ϕ, ψ〉 =
ϕ(z)ψ(z)y2dxdy
on automorphic forms of weight (2,0) for the group Γ. By a result of Trapani [16],
(detΩMg , dethWP ) extends to a good hermitian line bundle ωMg (log ∂Mg),
with singularities along ∂Mg. Moreover Trapani shows that ωMg (log ∂Mg)
admits a smooth hermitian metric with positive curvature form. Therefore, its
pull-back to the moduli space of curves of genus g with level n structure (n ≥ 3)
is ample.
In an ambitious program pioneered by [11], [12], Liu, Sun and Yau study
the goodness and bounded geometry of several natural Kähler metrics on the
moduli space of curves. The interested reader is referred to loc. cit. for precise
statements.
7.3 Kähler-Einstein metrics on quasi-projective varieties
7.3.1 Complex theory
The main references we follow are [10], [15], and [19].
Let M be a complex analytic manifold of dimension n and Ω a smooth
positive (n, n)-form on M , namely a volume form. For every analytic chart
(V ; z1, . . . , zn) of M write
Ω|V = ξV
dzk ∧ dz̄k
The functions {ξV }V define a smooth hermitian metric ‖ · ‖Ω on the canonical
line bundle ωM . By definition, the Ricci form of Ω is the real (1,1)-form
RicΩ = c1(ωM , ‖ · ‖Ω)
which is locally given by
RicΩ |V= dd
c log ξV .
The generalized Fefferman operator J acting on volume forms is defined as
J : Ω 7−→
(RicΩ)n
Theorem 7.4 (Kobayashi [10]). Let X be a compact complex analytic manifold
and D ⊆ X a divisor with simple normal crossings. Suppose that the line bundle
ωX(D) is ample on X. Then there exists a unique complete Kähler-Einstein
metric gKE on X \D with constant negative Ricci curvature -1. If ΩKE is the
volume form of gKE, then gKE is characterized by being complete on X \D and
by the equation
J(ΩKE) = 1.
Proposition 7.5. Let X be a compact complex analytic manifold and D ⊆
X a divisor with simple normal crossings. Suppose that ωX(D) is ample on
X. Let U = X \ D and endow ωU with the smooth hermitian metric ‖ · ‖KE
induced by ΩKE. Then (ωU , ‖ · ‖KE) extends to a good hermitian line bundle
(ωX(D), ‖ · ‖KE), with singularities along D.
The proof of Proposition 7.5 follows easily from the more precise growth
properties established in the proof of Theorem 7.4. For the sake of completeness
we now deepen some of the details involved. In the sequel we fix X and D as
in the proposition.
Definition 7.6. Let M be a complex analytic manifold of dimension n. Let
V ⊆ Cn be an open subset. A holomorphic map φ : V → M is called a
quasicoordinate if rank(dvφ) = n for every v ∈ V . Then (V, φ) is called a local
quasicoordinate of M .
Let x ∈ D and (V = V (x); z1, . . . , zn) be an analytic chart of X centered at
x, by means of which V gets identified with ∆r1 ×∆
1 and V \D with ∆
(r = r(x)). For every η ∈ (0, 1)r, define the quasicoordinate
φη : Vη = (∆3/4)
r ×∆s1 −→ V \D
v = (v1, . . . , vn) 7−→ (φη,1(v), . . . , φη,n(v))
where
φη,k(v) =
if 1 ≤ k ≤ r,
vk if k > r.
Observe that
η∈(0,1)r
φη(Vη).
We now construct a quasicoordinate covering V = {(Vβ , φβ)}β , containing ex-
actly:
• all the quasicoordinates {(Vη, φη)}η∈(0,1)r , for (V = V (x); z1, . . . , zn),
x ∈ D, as above. Denote by W the union of the images of all these
quasicoordinates. This is an open neighborhood of D;
• a finite coordinate covering of the compact subset X \W .
We introduce the Hölder space of Ck,α-functions on U = X \D, with respect to
the quasicoordinate covering V .
Definition 7.7 (Hölder spaces). Let k ≥ 0 be an integer and α ∈ (0, 1). The
Ck,α-norm with respect to V of a function u ∈ Ck(U) is
|u‖k,α,V = sup
(V,φ)∈V
‖φ∗(u)‖k,α
= sup
(V,φ)∈V
|p|+|q|≤k
∣∣∣∣∣
∂|p|+|q|
∂ vp ∂ vq
φ∗(u)(v)
∣∣∣∣∣
+ sup
v,v′∈V
|p|+|q|=k
|v − v′|−α
∣∣∣∣∣
∂|p|+|q|
∂ vp ∂ vq
φ∗(u)(v)
|p|+|q|
∂ vp ∂ vq
φ∗(u)(v′)
∣∣∣∣∣
We define the space of Ck,α-functions on U as
Ck,α(U) = {u ∈ Ck(U) | ‖u‖k,α,V < +∞},
which is seen to be a Banach space with respect to the norm ‖ · ‖k,α,V .
Definition 7.8. We define Rr,s(U) as the space of (r, s)-differential forms ω on
U such that, for all quasicoordinate (V, φ) ∈ V ,
φ∗(ω) =
|p|=r
|q|=s
(apdv
p + bqdv
‖ap‖k,α, ‖bq‖k,α < +∞
for all multi-indices p, q with |p| = r, |q| = s and all k ≥ 0, α ∈ (0, 1).
If (v1, . . . , vn) are the standard coordinates on V ⊂ C
n and p = (i1, . . . , ir),
q = (j1, . . . , js) are ordered multi-indices, we wrote
dvp = dvi1 ∧ . . . ∧ dvir , dv
q = dvj1 ∧ . . . ∧ dvjs .
Lemma 7.9. i. Rr,s(U) is a C-vector space;
ii. Rr,s(U) ∧Rr
′,s′(U) ⊆ Rr+r
′,s+s′(U);
iii. ∂ Rr,s(U) ⊆ Rr+1,s(U) and ∂̄ Rr,s(U) ⊆ Rr,s+1(U).
Proof. Immediate to check from the definition of Rr,s(U).
Recall that PD denotes the sheaf of Poincaré forms on X with singularities
along D (Definition 2.8).
Lemma 7.10. We have an inclusion Rr,s(U) ⊆ Γ(X,PD)
(r,s), where the su-
perscript stands for the (r, s) part with respect to the usual bigrading of complex
differential forms.
Proof. We localize near the divisor D and consider a quasicoordinate φη : Vη →
V \ D in V . Hence V has coordinates z1, . . . , zn and V \ D is the divisor
z1 . . . zr = 0. For simplicity we consider the differential form
ω = h
z1 log(|z1|−1)
∧ . . . ∧
zr log(|zr|−1)
∧ dzr+1 ∧ . . . dzn
and suppose that φ∗η(ω) has finite C
k,α-norm for all k ≥ 0 and α ∈ (0, 1). We
have to prove that h is bounded on the image of φη. From the definition of φη,
a straightforward computation shows that
φ∗η(ω) = φ
1− |vi|2
|vi − 1|
(vi − 1)2
dv1 ∧ . . . ∧ dvn.
The hypothesis implies the coefficient of dv1 ∧ . . .∧ dvn has bounded sup-norm.
Since |vi| ≤ 3/4 for i = 1, . . . , r, this immediately yields the boundedness of
φ∗(h).
Corollary 7.11. Let u : U = X \D → C be a smooth function. If u ∈ R0,0(U),
then u is a P-singular function with singularities along D.
Proof. By Lemma 7.9, du ∈ R1,0(U)⊕R0,1(U) and ddcu ∈ R1,1(U). By Lemma
7.10, du and ddcu have Poincaré growth with singularities along D. Hence u is
a P-singular function with singularities along D.
Theorem 7.12 (Kobayashi [10]). Let X be a compact complex analytic manifold
and D ⊆ X a divisor with simple normal crossings. Suppose that ωX(D) is
ample on X. Let D1, . . . , Dm be the irreducible components of D and si ∈
Γ(X,O(Di)) sections with div si = Di, for all i = 1, . . . ,m. Let Ω be a volume
form on X. There exist suitable smooth hermitian metrics on the line bundles
O(Di), that we write ‖ · ‖ for simplicity, with ‖si‖ small enough, and a function
u ∈ R0,0(U) such that the volume form ΩKE of the Kähler-Einstein metric on
ΩKE = e
u Ω∏m
i=1 ‖si‖
2 log(‖si‖2)2
With Theorem 7.12 at hand, we can prove Proposition 7.5.
Proof of Proposition 7.5. We may localize at an analytic chart (V ; z1, . . . , zn)
of X by means of which V gets identified with ∆r1 ×∆
1 and V \D = ∆
For simplicity suppose that Di ∩ V gets identified with zi = 0, for i = 1, . . . , r
and Di ∩ V = ∅ for i > r. A local analytic frame of ωX(D)|V is
, . . . ,
, dzr+1, . . . , dzn.
Write ‖si‖
2 = |zi|
2hi for i = 1, . . . , r. Then by Theorem 7.12 we can write
ΩKE|V = e
hk log(‖sk‖2)2
‖sk‖2 log(‖sk‖2)2
dzk ∧ dz̄k
where γ is a smooth positive function. Observe also that the functions hk as
well as the second product are smooth positive functions. We are thus reduced
to prove that u and log(log(‖si‖
2)2) are P-singular functions with singularities
along D. On one hand, Lemma 7.11 proves that u is P-singular. On the other
hand, log(log(‖si‖
2)2) is P-singular by Lemma 4.6.
7.3.2 Arithmetic theory
Let K be a number field and X a nonsingular projective variety over K. Let
D ⊆ X be a reduced effective divisor such that D(C) ⊆ X(C) has simple
normal crossings. Suppose that ωX(D) is an ample divisor on X . Then, for
every σ : K →֒ C, ωXσ,C(Dσ,C) is ample and there exists a unique Kähler-
Einstein metric on Xσ(C) \ Dσ(C) with constant negative Ricci curvature -1
(see Theorem 7.4). By Proposition 7.5, these metrics induce good hermitian
metrics on the lines ωXσ,C(Dσ,C), with singularities along Dσ(C), respectively.
The collection of these metrics, for varying σ : K →֒ C, is invariant under the
action of complex conjugation F∞. Indeed, let gKE,σ be the Kähler-Einstein
metric on Xσ(C) \Dσ(C). Then F
∞(gKE,σ) defines a complete Kähler metric
on Xσ(C) \Dσ(C). Let ΩKE,σ be the volume form of gKE,σ, so that F
∞ΩKE,σ
is the volume form of F ∗∞(gKE,σ). Since RicF
∞ΩKE,σ = F
∞ RicΩKE,σ, we find
J(F ∗∞ΩKEσ) = F
∞J(ΩKE,σ) = 1.
By uniqueness we derive F ∗∞(gKE,σ) = gKE,σ. We write ωX(D)KE for the
resulting good hermitian line bundle, with singularities along D.
Let now (X ,L ) be any model of (X,ωX(D)KE) over S = SpecOK . Then
Theorem 1.3 can be applied for any choice of smooth metric ‖ · ‖0 on L . If L
is ample over X , then Corollary 1.4 applies to (X ,L ). In this situation L
verifies the finiteness and the universal lower bound properties.
8 Appendix
The appendix is aimed to prove Proposition 6.11.
Proof of Proposition 6.11. Decompose D into smooth irreducible components,
D = D1 ∪ . . . ∪ Dm. Let D
∗ be the Weil divisor D1 + . . . + Dm. Denote by
F the family of nonsingular subschemes of X consisting of X itself and all the
irreducible components of the intersections
where I runs over all the subsets of {1, . . . ,m}. Since L is ample, there ex-
ists some positive integer N such that L⊗N and L⊗N(−D∗) are very ample.
Consider the exact sequence
0 → L⊗N (−D∗) → L⊗N → L⊗ND → 0.
Taking global sections, we find the exact sequence
(8.1) 0 → Γ(X,L⊗N(−D∗))
→ Γ(X,L⊗N )
→ Γ(X,L⊗ND ).
For every Y ∈ F , Bertini’s theorem ([9], Chapter II, Theorem 8.18) provides us
with a dense open subset UY of the projective space P = P(Γ(X,L
⊗N(−D∗)))
such that, for any t ∈ UY , supp(div t) intersects Y transversally. Since F is
finite and the open subsets UY are dense, the intersection
is a non-empty open subset of P. Therefore we can take t1, . . . , tr ∈ U such
that supp(div t1) ∩ . . . ∩ supp(div tr)=∅. Let t1, . . . , tr ∈ Γ(X,L
⊗N(−D∗)) be
representatives of t1, . . . , tr, respectively. Let s1 = j(t1), . . . , sr = j(tr) be their
images by the morphism j of (8.1). Since p(si) = 0, D ⊆ supp(div si) for all
i = 1, . . . , r. Actually, for every i = 1, . . . , r, we have
div si = div ti +D
Therefore, for the support of div si we find
supp(div si) = supp(div ti) ∪D.
By the choice of the sections ti (ti ∈ U), supp(div si) is a divisor with simple
normal crossings. Finally, since supp(div t1) ∩ . . . ∩ supp(div tr) = ∅, we have
an equality of reduced closed subschemes of X
D = (supp(div s1) ∩ . . . ∩ supp(div sr))red.
The proof of the proposition is complete.
References
[1] J.-B. Bost, H. Gillet and C. Soulé, Heights of projective varieties and pos-
itive Green forms, J. Amer. Math. Soc. 7 (1994), 903–1027.
[2] J.I. Burgos, J. Kramer and U. Kühn, Cohomological arithmetic Chow rings,
to appear in J. Inst. Math. Jussieu (2007).
[3] , Arithmetic characteristic classes of automorphic vector bundles,
Documenta Math. 10 (2005), 619–716.
[4] J. Carlson and P.A. Griffiths, A defect relation for equidimensional holo-
morphic mappings between algebraic varieties, Ann. Math. 95 (1972), 557–
[5] P. Deligne and D. Mumford, The irreducibility of the space of curves of
given genus, Publ. Math. IHES 36 (1969), 75–109.
[6] J.-P. Demailly, Complex Analytic and Differential Geometry, unpublished
book.
[7] G. Faltings, Finiteness theorems for abelian varieties over number fields,
Arithmetic Geometry (G. Cornell and J.H. Silverman, eds.), Springer-
Verlag, 1986, 9–27.
[8] H. Gillet and C. Soulé, Arithmetic intersection theory, Publ. Math. IHES
72 (1990), 94–174.
[9] R. Hartshorne, Algebraic Geometry, Graduate Texts in Math., vol. 52,
Springer-Verlag, 1977.
[10] R. Kobayashi,Kähler-Einstein metric on an open algebraic manifold, Osaka
J. Math 21 (1984), 399–418.
[11] K. Liu, X. Sun and S.-T. Yau, Canonical metrics on the moduli space of
Riemann surfaces I, J. Differential Geometry 68 (2004), 571–637.
[12] , Canonical metrics on the moduli space of Riemann surfaces II, J.
Differential Geometry 69 (2005), 163–216.
[13] N. Mok and S.T. Yau, Completeness of the Kähler-Einstein metric on
bounded domains and the characterization of domains of holomorphy by
curvature conditions, Proc. Symp. in Pure Math. 39 (1983), 41–59.
[14] D. Mumford, Hirzebruch’s proportionality theorem in the non-compact case,
Invent. Math. 42 (1977), 239–272.
[15] G. Tian and S.T. Yau, Existence of Kähler-Einstein metrics on complete
Kähler manifolds and their applications to algebraic geometry, Math. As-
pects of String Theory (San Diego, California, 1986), Adv. Ser. Math.
Phys., 1, World Sci. Publishing, Singapore, 1987.
[16] S. Trapani, On the determinant of the bundle of meromorphic quadratic
differentials on the Deligne-Mumford compactification of the moduli space
of Riemann surfaces, Math. Ann. 293 (1992), 681–705.
[17] S.A. Wolpert, On obtaining a positive line bundle from the Weil-Petersson
class, Amer. J. of Math. 107 No. 6 (1985), 1485–1507.
[18] S.A. Wolpert, The hyperbolic metric and the geometry of the universal
curve, J. Differential Geometry 31 (1990) 417–472.
[19] D. Wu, Higher canonical asymptotics of Kähler-Einstein metrics on quasi-
projective manifolds, Comm. Anal. and Geom., 14 No. 4 (2006), 795–845.
G. Freixas i Montplet, Département de Mathématiques, Univer-
sité Paris-Sud, Bâtiment 425, 91405 Orsay cedex, France
E-mail address : gerard.freixas@math.u-psud.fr
	Introduction
	Differential forms with logarithmic singularities
	Logarithmically singular hermitian vector bundles
	Global bounds for real log-log growth (1,1)-forms
	Bounding height integrals
	Arakelovian heights
	Examples
	Appendix
ABSTRACT
  We prove lower bound and finiteness properties for arakelovian heights with
respect to pre-log-log hermitian ample line bundles. These heights were
introduced by Burgos, Kramer and K\"uhn, in their extension of the arithmetic
intersection theory of Gillet and Soul\'e, aimed to deal with hermitian vector
bundles equipped with metrics admitting suitable logarithmic singularities. Our
results generalize the corresponding properties for the heights of
Bost-Gillet-Soul\'e, as well as the properties established by Faltings for
heights of points attached to hermitian line bundles whose metrics have
logarithmic singularities. We also discuss various geometric constructions
where such pre-log-log hermitian ample line bundles naturally arise.

<|endoftext|><|startoftext|>
Microsoft PowerPoint - figures
Anisotropic quasiparticle renormalization in Na0.73CoO2: role of
inter-orbital interactions and magnetic correlations
J. Geck1,2,∗, S.V. Borisenko1, H. Berger3, H. Eschrig1, J. Fink1, M. Knupfer1,
K. Koepernik1, A. Koitzsch1, A.A. Kordyuk1,4, V.B. Zabolotnyy1, and B. Büchner1
1IFW Dresden, P.O. Box 270116,D-01171 Dresden, Germany
2Department of Physics and Astronomy,
University of British Columbia, Vancouver, BC, V6T 1Z1, Canada
3Institut de physique de la matiére complex,
EPF Lausannne, 1015 Lausanne, Switzerland and
4Institute of Metal Physics of the National Academy
of Sciences of Ukraine, 03142 Kyiv, Ukraine
(Dated: Received: November 8, 2018)
Abstract
We report an angular resolved photoemission study of NaxCoO2 with x≃0.73 where it is found
that the renormalization of the quasiparticle dispersion changes dramatically upon a rotation from
ΓM to ΓK. The comparison of the experimental data to the calculated band structure reveals that
the quasiparticle renormalization is most pronounced along the ΓK-direction, while it is significantly
weaker along the ΓM-direction. We discuss the observed anisotropy in terms of multiorbital effects
and point out the relevance of magnetic correlations for the band structure of NaxCoO2 with
x ≃ 0.75.
PACS numbers: 71.27.+a, 71.18.+y,74.25.Jb,74.70.-b
∗ Email: geck@physics.ubc.ca
http://arxiv.org/abs/0704.1048v2
The unconventional behavior of correlated electrons in materials comprised of square lat-
tices has attracted a vast amount of attention [1, 2]. However, besides strong electronic
correlations, the topology of the underlying lattice structure itself constitutes another im-
portant ingredient that can produce exotic electronic ground states.
The CoO2-layers in the NaxCoO2 materials, which are built up from edge-sharing CoO6
octahedra, constitute a realization of a correlated electron system based on a triangular
lattice. More specifically, these compounds possess a layered hexagonal structure, where
strongly covalent CoO2- and ionic Na-layers (ab-planes) alternate along the perpendicular
c-axis [3]. Upon changing the Na content x, these materials can be doped with electrons and,
in addition, water molecules can be intercalated. Both changing x and water intercalation
drastically alter the electronic properties of NaxCoO2, leading most notably to the emergence
of superconductivity upon hydration [4, 5]. We will focus on the non-hydrated compounds
with x ≃ 0.7, where an anomalous metallic state with an extremely large and field dependent
thermoelectric power as well as a giant field dependent scattering rate was observed [6, 7].
There is also evidence for the seemingly paradoxical coexistence of electron itinerancy and
localized magnetic moments, as well as for unusual charge order CDW phenomena in these
macroscopically metallic compounds [8, 9, 10].
The unconventional electronic properties described above strongly motivate the study
of the electronic structure of NaxCoO2 by means of angular resolved photoemission spec-
troscopy (ARPES), which provides direct and unique experimental access to single-particle
excitations and, thus, to the many-body effects in these materials. Previous ARPES stud-
ies on NaxCoO2 [12, 13, 14, 15, 16, 17] showed pronounced deviations from the electronic
structure predicted by LDA calculations [11]. In addition, a strongly renormalized heavy
QP band, displaying a kink feature was reported [13, 14, 15].
In this ARPES study we focus on the momentum (k) dependent renormalization effects
in Na0.73CoO2, which is essential to unravel the relevant couplings that govern the low
energy physics. ARPES experiments were carried out using a lab-based system equipped
with a SCIENTA SES 200 analyser and a Gammadata He discharge lamp at an excitation
energy hν = 21.2 eV (energy resolution 30meV, angular resolution 0.3◦). The high-quality
Na0.73CoO2 single crystals examined in this study were grown by the sodium chloride flux
methods as described in Ref. [18].
A typical Fermi level (EF ) crossing observed at T=25K is shown in Fig. 1, where the
image on the left hand side shows the photoelectron intensity as a function of momentum k
and binding energy EB. A single and well defined QP band, which crosses the Fermi energy
at EB = 0 eV can be observed. On the right hand side of Fig. 1 an energy distribution
curve (EDC) and a momentum distribution curve (MDC) are shown. By mapping the EF
crossings over a large area of k-space, a cut through the Fermi surface (FS) of Na0.73CoO2
parallel to the CoO2 planes was measured. The data allows for a precise calibration of the
k-scale, because dispersions from the first into the second Brillouin zone were captured.
In Figs. 2 (a),(b) the k-dependent photoelectron intensity for two fixed binding energies of
EB = 0 eV and 0.132 eV are shown. Focussing on the cut at EB = 0 eV in Fig. 2 (a), a large
hole-pocket centered around the Brillouin zone center Γ is observed. The FS displays a clear
hexagonal topology. The intensity variations along the FS, leading to a seemingly lower
symmetry, are caused by matrix element effects. The size of the observed FS agrees well
with the doping level, provided that only one of the two a1g-bands (cf. Fig. 2 (c)) crosses
EF . This agrees with the data shown in Fig. 1, where a single QP-band crosses EF .
In accordance with previous studies, the hole-pockets along ΓK are not observed for
EB = 0 eV. It can be seen in Fig. 2 (b) that the corresponding e
g-band (cf. Fig. 2 (c)) lies
well below EF , as reported earlier [15]. This can also be observed in Fig. 2 (c), where the
measured band structure along ΓK and ΓM is compared to the band structure obtained by
LDA [11]. There is a fair agreement of the ARPES data and LDA at higher binding energies
above 1.5 eV. However, close to EF deviations occur which are particularly pronounced along
In Fig. 3 two cuts of the data set shown in Fig. 2 along the ΓK- (left) and ΓM-direction
(right) are shown. Focussing on the cut along ΓK, two prominent features can be observed.
First, there is a band crossing EF that forms the FS. Second, there is another band at higher
binding energies, which can be identified as the e′g-band. The data indicates that the top of
the e′g-band is at about 85meV.
By tracking the maximum of the MDCs for different EB, the QP-dispersion (E(k)) in-
dicated by black symbols is obtained. Close to EF the dispersion can be well described by
a linear behavior, yielding the Fermi velocity vF = (0.3 ± 0.05) eV Å along ΓK. However,
at slightly higher EB, the measured dispersion is bent and thus deviates from the linear
behavior close to EF . We define the energy where this deviation becomes significant as Ed,
resulting in Ed = (26 ± 8)meV for the ΓK-direction. The second bend in the dispersion
around 80meV is related to the crossing of the two bands [17]. Applying the same analysis
to the cut along ΓM leads to vF = (0.6± 0.08) eV Å and Ed = (66 ± 5)meV. Clearly, both
vF and Ed depend on the direction in k-space.
The observed bending of the dispersion that sets in around Ed could be due to a coupling
of the QP to bosonic excitations. In fact, consistent with this interpretation, we observe an
enhanced increase of the scattering rate around Ed. It has to be noted, however, that the
feature in the dispersion observed here is not as well defined as the kink in the cuprates,
for example. This means that the values of Ed cannot be identified straightforwardly with
a specific mode energy and, moreover, a coupling to several different bosonic excitations is
possible.
In the following we will focus on vF , which is a well defined quantity: Cuts for different
ψ-values (cf. Fig. 1 (b)) were systematically analyzed in the same way as described above.
The obtained variation of vF as determined from the ARPES data is shown in Fig. 4 (a).
Upon rotating the direction of the cut from ΓM to ΓK, vF decreases by about a factor of 2.
Using the function kF (ψ) that was determined from the data in Fig. 2 (a) together with the
obtained values of vF , the variation of the effective mass m
∗ with ψ can be calculated. The
result is given in Fig. 4 (c). Remarkably, the QP along the ΓK direction is about twice as
heavy as the QP along ΓM. This pronounced anisotropy of m∗ is expected to have a strong
impact on the in-plane electronic properties of Na0.73CoO2.
To determine the renormalization effects in Na0.73CoO2, we use the LDA band structure
as a reference and compare the ARPES Fermi velocities (vPESF ) to the corresponding LDA
values (vLDAF ). As it can be observed in Figs. 4 (a),(b), these two quantities show exactly
the opposite behavior: vPESF decreases and v
F increases upon rotation from ΓM to ΓK.
In order to check whether the deviation between ARPES and LDA is related to a lattice
distortion at the surface, LDA calculations were performed for structures where the distance
between the oxygen and the cobalt layers, i.e. the Co–O–Co bond angle, was changed. In
agreement with previous calculations, we observe that the top of the e′g band is shifted to
higher binding energies upon increasing the Co–O–Co bond angle. At the same time the
anisotropy of vLDAF is slightly reduced, but unchanged qualitatively. This strongly suggests
that the measured anisotropy of vPESF is not caused by a lattice distortion at the surface.
The comparison of LDA and ARPES therefore shows that the deviation of the LDA band-
structure from the measured QP-dispersion increases dramatically close to the ΓK-direction
(cf. inset of Fig. 3). In the following we will refer to this deviation as QP-renormalization
(QPR). This QPR can be characterized using a constant κ defined by (1+ κ) vPESF = v
The ψ- or, in other words, k-dependence of κ is shown in Fig. 4 (d), revealing the strong
anisotropy of the QPR in Na0.73CoO2. We note that, although the photoelectron intensity
in Fig. 2 (a) at kF is also influenced by matrix element effects, it is always significantly lower
in the ΓK- than in the ΓM-direction, in agreement with enhanced renormalization effects
along ΓK.
We find ∂kE
≃ ∂kE
LDA/2 along ΓK in the whole energy range up to EB = 85meV
(inset of Fig. 3). At the same time it is remarkable that the QPR gets stronger the closer
the so-called e′g-band gets to the Fermi level (c.f. Fig.2 (b)). This points to an effect related
to coupling between the a1g- and the e
g-bands. In fact, a strong interaction between these
bands is manifested by a large hybridization gap at higher binding energies and the polar-
ization dependence along ΓK found in a recent ARPES study [17]. Hence, the k-dependent
QPR at EF is most likely caused by multiorbital effects, i.e. interactions between the states
of e′g and a1g symmetry. In this case, the QP-states along ΓM and ΓK display different prop-
erties: Along ΓM the QP-states have largely a1g symmetry, while they display pronounced
multiorbital properties along ΓK. This is of crucial importance for the many-body effects
in these materials, since the coupling of the QP to bosonic excitations depends critically on
the symmetry of the QP-states [21]. To conclude so far, the observed anisotropies clearly
indicate that multiorbital effects play an important role for the QP-dynamics at EF .
Furthermore, our DFT studies –details will be provided in a forthcoming publication–
show that magnetic correlations play an important role for the QP-dynamics as well: Accord-
ing to non-magnetic LDA calculations the band structure of Na0.75CoO2 displays a strong 3D
character. In agreement with a previous DFT study [22], we obtain a sizeable kz-dispersion
parallel to the c-axis that leads to additional caps of the FS as shown in Figs. 5 (a),(b). Such
a strong 3D character is not in agreement with ARPES data: (i) in general, the QP peaks
at EF are expected to be considerably broadened in a 3D system, in particular because the
short life time of the final states becomes important [19]. This is not the case (Fig. 1). (ii)
ARPES measurements at various excitation energies do not show any evidence for a strong
dispersion along c [17].
However, magnetic LSDA calculations yield an AFM ground state, where ferromagnetic
ab-planes are coupled antiferromagnetically along c. This agrees well with neutron data [23].
In the AFM state, the kz-dispersion is strongly reduced, which removes the aforementioned
FS-caps and yields the FS shown in Fig. 5 (c). In other words, according to LSDA, 3D
AFM correlations render the electronic structure of Na0.75CoO2 effectively 2D. The top of
the e′g-band at EB ≃ 70meV as well as the topology and size of the FS obtained in LSDA
are in good agreement with the ARPES data as demonstrated in 5 (d). The above results
together with the neutron data indicate that AFM correlations have a strong influence on
the electronic structure of NaxCoO2 with x ≃ 0.75.
In conclusion, we have shown that the QPR in Na0.73CoO2 is strongly anisotropic and
provided clear evidence for the relevance of multiorbital effects for the QP dynamics in this
material. In addition, detailed DFT studies highlight the impact of magnetic correlations on
the QP-states near EF , which is expected to be directly related to the unusual temperature
as well as the field dependencies of the thermopower and the QP scattering rates [6, 7, 14].
Hence, both the interactions between the a1g and e
g states as well as magnetic correlations
have to be taken into account in order to obtain a realistic description of these materials.
Acknowledgements: We thank Dr. Bussy (Univ. of Lausanne) for the micro probe
analysis and I. Elfimov, K.M. Shen, D.G. Hawthorn and G.A. Sawatzky for helpful discus-
sions. This work was supported by the Swiss NCCR research pool MaNEP of the Swiss NSF,
the DFG (FOR 538 research unit, Grant 51195121) and the BMBF (Grant 05KS4OD2/8).
J.G. gratefully acknowledges the support by the DFG.
Figure captions
FIG. 1: Left: Typical Fermi level crossing observed along a cut close to ΓM (ψ =
173◦, cf. Fig. 2 (b)) Right: Corresponding energy distribution curve (EDC) and momentum
distribution curve (MDC) at k = kF and EB = 0 eV, respectively.
FIG. 2: ARPES data for Na0.73CoO2 (excitation energy hν = 21.2 eV). (a), (b): Momen-
tum distribution maps of the photoelectron intensity integrated over a small energy interval
(EB ± 3meV) at EB = 0 eV and 0.132 eV measured at T=25K. The measured k-region is
indicated by the black dotted line in (a). The other regions in k-space have be obtained by
rotating this data set by 120◦ and 240◦. The broken white lines show the two-dimensional
Brillouin zone. High-symmetry points Γ, K, and M are indicated in (a) and the definition
of ψ is given in (b). A fit to kF = kF (ψ) is shown as a solid black line. (c): Comparison
of the measured band structure and the LDA calculation by Singh (black lines) [11]. The
crystal field split e′g- and a1g-manifolds are indicated.
FIG. 3: Cuts through the map data shown in Fig. 2 (a) and (b). The data is normalized to
binding energies above 0.25 eV. Black symbols: QP-dispersion determined by fitting MDCs
at different EB. Broken lines indicate the fitted linear dispersions (see text). The inset
shows the ARPES- (symbols) and LDA- (lines) dispersions as a function of k−kF (ψ). LDA
for x = 0.73 in the rigid band approximation (cf. Fig. 4).
FIG. 4: (a),(b): vPESF and v
F as a function of ψ. The experimental v
F values at a
given value of ψ were obtained by averaging over two equivalent cuts (e.g. ψ = 150◦, 210◦).
The LDA calculations were performed in the rigid band approximation for the low temper-
ature lattice structure using Wien2K. The same behavior was also found by LDA/LSDA
calculations in the virtual crystal approximation (cf. Fig. 5). (c): Effective mass of the QP.
(a)-(c): hν = 21.2 eV. Solid curves are fits to a sinus-function intended to serve as guides to
the eye. (d): κ = vLDAF /v
F − 1 characterizing the QPR.
FIG. 5: FS obtained for x = 0.75 by LDA in the virtual crystal approximation (VCA),
revealing a three-dimensional band structure. (c): FS for the AFM ground state obtained
by LSDA in the VCA where the band structure retains its pronounced two-dimensionality.
The color scale in (a)-(c) indicates vF . (d) Comparison of the measured and LSDA FS. The
DFT calculations have been performed using the FPLO code [20].
[1] J. Orenstein and A. Millis, Science 288, 468 (2000).
[2] Y. Tokura and N. Nagaosa, Science 288, 462 (2000).
[3] Q. Huang, M. L. Foo, R. A. Pascal, J. W. Lynn, B. H. Toby, T. He, H. W. Zandbergen, and
R. J. Cava, Phys. Rev. B 70, 184110 (2004).
[4] M. L. Foo, Y. Wang, S. Watauchi, H. Zandbergen, T. He, R. J. Cava, and N. P. Ong, Phys.
Rev. Lett. 92, 247001 (2004).
[5] K. Takada, H. Sakurai, E. Takayama-Muromachi, F. Izumi, R. Dilanian, and T. Sasaki, Nature
422, 53 (2003).
[6] Y. Wang, N. Rogado, R. J. Cava, and N. Ong, Nature 423, 425 (2003).
[7] S. Li, L. Taillefer, D. Hawthorn, M. Tanatar, J. Paglione, M. Sutherland, R. Hill, C. Wang,
and X. Chen, Phys. Rev. Lett. 93, 056401 (2004).
[8] J. Gavilano, D. Rau, B. Pedrini, J. Hinderer, H. Ott, S. Kazakov, and J. Karpinski, Phys.
Rev. B 69, 100404(R) (2004).
[9] C. Bernhard, A. Boris, N. Kovaleva, G. Khaliullin, A. Pimenov, L. Yu, D. Chen, C. Lin, and
B. Keimer, Phys. Rev. Lett. 93, 176401 (2004).
[10] F. Ning, T. Imai, B. Statt, and F. Chou, Phys. Rev. Lett. 93, 237201 (2004).
[11] D. J. Singh, Phys. Rev. B 61, 13397 (2000).
[12] T. Valla, P. D. Johnson, Z. Yusof, B. Wells, Q. Li, S. M. Loureiro, R. J. Cava, M. Mikamik,
Y. Morik, M. Yoshimurak, et al., Nature 417, 627 (2002).
[13] H.-B. Yang, S.-C.Wang, A. K. P. Sekharan, H. Matsui, S. Souma, T. Sato, T. Takahashi,
T. Takeuchi, J. C. Campuzano, R. Jin, et al., Phys. Rev. Lett. 92, 246403 (2006).
[14] M. Z. Hasan, Y.-D. Chuang, D. Qian, Y. W. Li, Y. Kong, A. Kuprin, A. Fedorov, R. Kim-
merling, E. Rotenberg, K. Rossnagel, et al., Phys. Rev. Lett. 92, 246402 (2004).
[15] H.-B. Yang, Z.-H. Pan, A. K. P. Sekharan, T. Sato, S. Souma, T. Takahashi, R. Jin, B. C.
Sales, D. Mandrus, A. Fedorov, et al., Phys. Rev. Lett. 95, 146401 (2006).
[16] D. Qian, L. Wray, D. Hsieh, D. Wu, J. L. Luo, N. L. Wang, A. Kuprin, A. Fedorov, R. J.
Cava, L. Viciu, et al., Phys. Rev. Lett. 96, 046407 (2006).
[17] D. Qian, L. Wray, D. Hsieh, L. Viciu, R. Cava, J. Luo, D. Wu, N. Wang, and M. Hasan1,
Phys. Rev. Lett. 97, 186405 (2006).
[18] M. Iliev, A. Litvinchuk, R. Meng, Y. Sun, J. Cmaidalka, and C. W. Chu, Physica C 402, 239
(2004).
[19] H. Starnberg, H. Brauer, and P. Nilsson, Phys. Rev. B 48, 621 (1993).
[20] H. Eschrig and K. Koepernik, Phys. Rev. B 59, 1743 (1999).
[21] T. P. Devereaux, T. Cuk, Z.-X. Shen, and N. Nagaosa, Phys. Rev. Lett. 93, 117004 (2004).
[22] M. D. Johannes, D. A. Papaconstantopoulos, D. J. Singh, and M. J. Mehl, Europhys. Lett.
68, 433 (2004).
[23] L. M. Helme, A. T. Boothroyd, R. Coldea, D. Prabhakaran, D. Tennant, A. Hiess, and
J. Kulda, Phys. Rev. Lett. 94, 157206 (2005).
Fig.1
Fig.2
Fig.3
Fig.4
Fig.5
	Figure captions
	References
ABSTRACT
  We report an angular resolved photoemission study of Na0.73CoO2 where it is
found that the renormalization of the quasiparticle (QP) dispersion changes
dramatically upon a rotation from GM to GK. The comparison of the experimental
data to the calculated band structure reveals that the QP-renormalization is
most pronounced along the GK-direction, while it is significantly weaker along
the GM-direction. We discuss the observed anisotropy in terms of multiorbital
effects and point out the relevance of magnetic correlations for the band
structure of Na0.73CoO2.

<|endoftext|><|startoftext|>
Introduction, in this exploratory work we use the phenomenological factorization
model rather than in the established theories based on a heavy quark expansion. Consequently,
uncertainties due to power corrections, at this stage, are not included in our calculations, by
assumption. In view of such shortcomings we must emphasize that the additional errors due to
such model dependent assumptions may be sizable.
From Table III we see that the predicted rates for resonant and nonresonant components are
consistent with experiment within errors. The nonresonant contribution arises dominantly from
the transition process (88%) via the scalar-density-induced vacuum to KK̄ transition, namely,
〈K+K−|s̄s|0〉, and slightly from the current-induced process (3%). Therefore, it is natural to
conjecture that nonresonant decays could also play a prominent role in other penguin dominated
3-body B decays.
The K+K−KS mode is an admixture of CP -even and CP -odd components. By excluding the
major CP -odd contribution from φKS , the 3-bodyK
+K−KS final state is primarily CP -even. The
K+K− mass spectra of the B0 → K+K−KS decay from CP -even and CP -odd contributions are
shown in Fig. 2. For the CP -even spectrum, there are peaks at the threshold and mK+K− = 1.5
GeV region. The threshold enhancement arises from the f0(980)KS and the nonresonant f
1 1.5 2 2.5 3 3.5 4 4.5
m K+ K- HGeVL
1 1.5 2 2.5 3
m K+ K- HGeVL
1 1.02 1.04
   (GeV)K+K-m
1 1.5 2 2.5 3 3.5 4 4.5
   (GeV)K+K-m
1 1.5 2 2.5 3 3.5 4 4.5
BABAR
preliminary
(a) (b) (c)
FIG. 2: The K+K− mass spectra for B0 → K+K−KS decay from (a) CP -even and (b) CP -odd
contributions. The insert in (b) is for the φ region. The full K+K−KS spectrum, which is the sum
of CP -even and CP -odd parts, measured by BaBar [16] is depicted in (c).
contributions [see Eq. (2.18)]. 2 For the CP -odd spectrum, the peak on the lower end corresponds
to the φKS contribution, which is also shown in the insert. The b → u transition is governed by
the current-induced process 〈B0 → K+K0〉 × 〈0 → K−〉 [see Eq. (A4)]. From Eq. (2.8) it is clear
that the b→ u amplitude prefers a small invariant mass of K+ and K0 and hence a large invariant
mass of K+ and K−. In contrast, the b → c amplitude prefers a small s23. Consequently, their
interference is largely suppressed. The full K+K−KS spectrum, which is the sum of the CP -even
and the CP -odd parts, has been measured by BaBar [Fig. 2(c)]. It clearly shows the phenomenon
of threshold enhancement and the scalar resonances X0(1550) and χc0.
The decay B
0 → KSKSKS is a pure penguin-induced mode [cf. Eq. (A7)] and it receives
intermediate pole contributions only from the iso-singlet scalar mesons such as f0(980). Just like
other KKK modes, this decay is governed by the nonresonant background dominated by the σ
term defined in Eq. (2.18). Hence, this mode is ideal for determining the unknown parameter σ
which is given in Eq. (2.26). Time-dependent CP violation in neutral 3-body decay modes with
fixed CP parity was first discussed by Gershon and Hazumi [51].
Results for the decay rates and CP asymmetries in B0 → K+K−KS(L), KSKSKS(L) are dis-
played in Table IV and Table V, respectively. (For the decay amplitudes of B0 → KSKSKS(L), see
[40] for details.) The mixing-induced CP violations are defined by
SKKK,CP± =
Im(e−2iβACP±Ā
CP±)ds12ds23
|ACP±|2ds12ds23 +
|ĀCP±|2ds12ds23
SKKK =
Im(e−2iβAĀ∗)ds12ds23
|A|2ds12ds23 +
|Ā|2ds12ds23
= f+ SKKK,CP+ + (1− f+)SKKK,CP−, (2.27)
2 In our previous work [40] we have argued that the spectrum should have a peak at the large mK+K−
end. This is because we have introduced an additional nonresonant contribution to the ω− parameter
parametrized as ωNR− = κ
2pB ·p2
and employed the B− → D0K0K− data and applied isospin symmetry
to the B → KK matrix elements to determine the unknown parameter κ. Since this nonresonant term
favors a small mK+KS region, a peak of the spectrum at large mK+K− is thus expected. However, such a
bump is not seen experimentally [16]. In this work we will no longer consider this term.
TABLE IV: Branching ratios for B0 → K+K−KS , KSKSKS , KSKSKL decays and the fraction
of CP-even contribution to B
0 → K+K−KS , f+. The branching ratio of CP-odd K+K−KS with
φKS excluded is shown in parentheses. Results for (K
+K−KL)CP± are identical to those for
(K+K−KS)CP∓. For theoretical errors, see Table III. Experimental results are taken from [50].
Final State B(10−6)theory B(10−6)expt
K+K−KS 9.89
+0.19+2.28+0.07
−0.21−1.81−0.08 12.4± 1.2
(K+K−KS)CP+ 8.33
+0.10+1.82+0.05
−0.12−1.49−0.06
(K+K−KS)CP− 1.57
+0.09+0.46+0.02
−0.10−0.32−0.02
(0.14+0.06+0.14+0.01−0.06−0.06−0.01)
KSKSKS input 6.2± 0.9
KSKSKL 7.63
+0.01+1.37+0.03
−0.01−1.19−0.03 < 14
theory
K+K−KS 0.98
+0.01+0.01+0.00
−0.01−0.02−0.00 0.91 ± 0.07
theory
K+K−KL 0.98
+0.01+0.01+0.00
−0.01−0.02−0.00
where A is the decay amplitude of B
0 → K+K−KS(L) or KSKSKS(L) and Ā is the conjugated B0
decay amplitude, and f+ is the CP even fraction defined by
ΓCP+ + ΓCP+
Γ + Γ
φKS excluded.
(2.28)
Generally, it is more convenient to define an effective sin 2β via Sf ≡ −ηf sin 2βeff with ηf = 2f+−1
for K+K−KS . The predicted value of f+ is consistent with the data but it is on the higher end of
the experimental measurement because the CP -odd contributions from the vector mesons ρ, ω, · · · ,
are OZI suppressed and the CP -odd nonresonant contribution is constrained by the π+π−π− rate.
The deviation of the mixing-induced CP asymmetry in B0 → K+K−KS and KSKSKS from
that measured in B → φcc̄KS , i.e. sin 2βφcc̄KS = 0.681± 0.025 [50], namely, ∆ sin 2βeff ≡ sin 2βeff −
sin 2βφcc̄KS , is calculated from Table V to be
∆ sin 2βK+K−KS = 0.047
+0.028
−0.033 ,
∆sin 2βKSKSKS = 0.038
+0.027
−0.032 . (2.29)
The corresponding experimental values are 0.049 ± 0.10 and −0.101 ± 0.20, respectively. Due to
the presence of color-allowed tree contributions in B
0 → K+K−KS , it is naively expected that this
penguin-dominated mode is subject to a potentially significant tree pollution and hence ∆ sin 2βeff
can be as large as O(10%). However, our calculation indicates the deviation of the mixing-induced
CP asymmetry in B
0 → K+K−KS from that measured in B
0 → φcc̄KS is very similar to that of
the KSKSKS mode as the tree pollution effect in the former is somewhat washed out. Nevertheless,
TABLE V: Mixing-induced and direct CP asymmetries sin 2βeff (top) and Af (in %, bottom),
respectively, in B
0 → K+K−KS and KSKSKS decays. Experimental results for K+K−KS and
K+K−KL modes are obtained from the data of B
0 → K+K−K0. Results for (K+K−KL)CP± are
identical to those for (K+K−KS)CP∓. For theoretical errors, see Table III. Experimental results
are taken from [50].
Final state sin 2βeff Expt.
(K+K−KS)φKS excluded 0.728
+0.001+0.002+0.009
−0.002−0.001−0.020 0.73 ± 0.10
(K+K−KS)CP+ 0.732
+0.003+0.006+0.009
−0.004−0.004−0.020
(K+K−KL)φKL excluded 0.728
+0.001+0.002+0.009
−0.002−0.001−0.020 0.73 ± 0.10
KSKSKS 0.719
+0.000+0.000+0.008
−0.000−0.000−0.019 0.58 ± 0.20
KSKSKL 0.718
+0.000+0.000+0.008
−0.000−0.000−0.019
Af (%) Expt.
(K+K−KS)φKS excluded −4.63
+1.35+0.53+0.40
−1.01−0.54−0.34 −7± 8
(K+K−KS)CP+ −4.86+1.43+0.52+0.42−1.09−0.55−0.35
(K+K−KL)φKL excluded −4.63
+1.35+0.53+0.40
−1.01−0.54−0.34 −7± 8
KSKSKS 0.69
+0.01+0.01+0.05
−0.01−0.01−0.06 14 ± 15
KSKSKL 0.77
+0.01+0.01+0.05
−0.01−0.03−0.07
direct CP asymmetry of the former, being of order −4%, is more prominent than the latter.3
B. B− → KKK decays
The B− → K+K−K− decay amplitude has a similar expression as Eq. (A4) except that one
also needs to add the contributions from the interchange s23 → s12 and put a factor of 1/2 in the
decay rate to account for the identical particle effect.
Branching ratios of resonant and nonresonant contributions to B− → K+K−K− are shown in
Table VI. It is clear that the predicted rates of resonant and nonresonant components are consistent
with the data except for the broad scalar resonance X0(1550). Both BaBar and Belle have seen
a large fraction from X0(1550), (121 ± 19 ± 6)% by BaBar [8] and (63.4 ± 6.9)% by Belle [9], 4
while our prediction is similar to that in B
0 → K+K−K0. It is not clear why there is a huge
3 In our previous work [40], ∆ sin 2βeff is found to be
∆ sin 2βK+K−KS = 0.06
+0.09
−0.04 , ∆sin 2βKSKSKS = 0.06
+0.03
−0.04 ,
for sin 2βJ/ψKS = 0.687 ± 0.032, while direct CP asymmetry is less than 1% in both modes. Note that
due to an oversight the experimental error bars were not included in our previous paper for the theoretical
calculation of ∆ sin 2βeff .
4 Belle [9] actually found two solutions for the fraction of X0(1550)K
−: (63.4± 6.9)% and (8.21± 1.94)%.
The first solution is preferred by Belle.
disparity between B− → K+K−K− and B0 → K+K−K0 as far as the X0(1550) contribution is
concerned. Obviously, a refined measurement of the X0(1550) contribution to the K
+K−K− mode
is urgently needed in order to clarify this issue. Our result for the nonresonant contribution is in
good agreement with Belle, but disagrees with BaBar. Notice that Belle did not see the scalar
resonance f0(980) as Belle employed the E791 result [53] for g
f0→KK̄ which is smaller than gf0→ππ.
In contrast to E791, the ratio gf0→KK̄/gf0→ππ is measured to be larger than 4 in the existing e+e−
experiments [45, 54]
TABLE VI: Branching ratios (in units of 10−6) of resonant and nonresonant (NR) contributions to
B− → K+K−K−. For theoretical errors, see Table III.
Decay mode BaBar [8] Belle [9] Theory
φK− 4.14 ± 0.32± 0.33 4.72 ± 0.45 ± 0.35+0.39−0.22 2.9
+0.0+0.5+0.0
−0.0−0.5−0.0
f0(980)K
− 6.5± 2.5± 1.6 < 2.9 7.0+0.0+0.4+0.1−0.0−0.7−0.1
X0(1550)K
− 43± 6± 3 1.1+0.0+0.2+0.0−0.0−0.2−0.0
f0(1710)K
− 1.7± 1.0± 0.3
NR 50± 6± 4 24.0 ± 1.5± 1.8+1.9−5.7 25.3
+0.9+4.8+0.3
−1.0−4.4−0.3
Total 35.2 ± 0.9± 1.6 32.1 ± 1.3± 2.4 25.5+0.5+4.4+0.2−0.6−4.1−0.2
We next turn to the decay B− → K−KSKS . Following [55], let us consider the symmetric state
of K0K
|K0K0〉sym ≡
|K0(p1)K
(p2)〉+ |K
(p1)K
0(p2)〉
= [|KS(p1)KS(p2)〉 − |KL(p1)KL(p2)〉] /
2. (2.30)
Hence,
B(B− → K−KSKS) =
[B(B− → K−KSKS) + B(B− → K−KLKL)]
B(B− → K−(K0K0)sym). (2.31)
The factorizable amplitude of B− → K−K0K0 is given by Eq. (A8). Just as other KKK modes,
this decay is also expected to be dominated by the nonresonant contribution (see Table VII).
The calculated total rate is in good agreement with experiment. Just as the pure penguin mode
KSKSKS , the decay B
− → K−KSKS also can be used to constrain the nonresonant parameter
As pointed out in [55], isospin symmetry implies the relation
A(B− → K−K0K0) = −A(B0 → K0K+K−). (2.32)
This leads to
B(B− → K−(K0K0)sym) =
τ(B−)
τ(B0)
B(B0 → K+K−K0)φK excluded. (2.33)
Experimentally, this relation is well satisfied: LHS=(23.0±2.6)×10−6 and RHS=(22.1±2.1)×10−6 .
Hence, the isospin relation Eq. (2.32) is well respected.
TABLE VII: Branching ratios (in units of 10−6) of resonant and nonresonant (NR) contributions
to B− → K−KSKS . For theoretical errors, see Table III.
Decay mode f0(980)K
− X0(1550)K
− NR total
Theory 5.2+0.0+0.3+0.1−0.0−0.5−0.1 0.92
+0.00+0.16+0.00
−0.00−0.15−0.00 12.4
+0.2+2.1+0.1
−0.3−2.0−0.1 12.2
+0.0+1.5+0.0
−0.0−1.7−0.0
Expt. 11.5 ± 1.3
III. B → Kππ DECAYS
In this section we shall consider five B → Kππ decays, namely, B− → K−π+π−, K0π−π0,
0 → K−π+π0, K0π+π− and K0π0π0. They are dominated by b → s penguin transition and
consist of three decay processes: (i) the current-induced process, 〈B → ππ〉 × 〈0 → K〉, (ii) the
transition processes, 〈B → π〉 × 〈0 → πK〉, and 〈B → K〉 × 〈0 → ππ〉, and (iii) the annihilation
process 〈B → 0〉 × 〈0 → Kππ〉.
The factorizable amplitudes for B− → K−π+π−, K0π−π0, B0 → K−π+π0, K0π+π− and
π0π0 are given in Eqs. (A10-A14), respectively. All five channels have the three-body matrix
element 〈ππ|(q̄b)V−A|B〉 which has the similar expression as Eqs. (2.3) and (2.4) except that the
pole B∗s is replaced by B
∗ and the kaon is replaced by the pion. However, there are additional
resonant contributions to this three-body matrix element due to the intermediate vector ρ and
scalar f0 mesons
〈π+(p2)π−(p3)|(ūb)V −A |B
−〉R =
→π+π−
m2ρi − s23 − imρiΓρi
ε∗ · (p2 − p3)〈ρ0i |(ūb)V −A |B
gf0i→π
m2f0i
− s23 − imf0iΓf0i
〈f0i|(ūb)V −A |B
−〉, (3.1)
where ρi denote generic ρ-type vector mesons, e.g. ρ = ρ(770), ρ(1450), ρ(1700), · · ·. Applying Eqs.
(B1) and (B6) we are led to
〈π+(p2)π−(p3)|(ūb)V−A|B−〉R 〈K−(p1)|(s̄u)V −A |0〉
→π+π−
− s23 − imρiΓρi
(s12 − s13)
(mB +mρi)A
mB +mρi
(s12 + s13 − 3m2π)− 2mρi [A
2)−ABρi0 (q
f0i→π
m2f0i
− s23 − imf0iΓf0i
(m2B −m2f0i)F
2). (3.2)
Likewise, the 3-body matrix element 〈K−π+|(s̄b)
|B0〉 appearing in B0 → K−π+π0 also receives
the following resonant contributions
〈K−(p1)π+(p2)|(s̄b)V −A |B
0〉R =
→K−π+
− s12 − imK∗
ε∗ · (p1 − p2)〈K
i |(s̄b)V −A |B
(3.3)
with K∗i = K
∗(892),K∗(1410),K∗(1680), · · ·.
For the two-body matrix elements 〈π+K−|(s̄d)V−A|0〉, 〈π+π−|(ūu)V−A|0〉 and 〈π+π−|s̄s|0〉, we
note that
〈K−(p1)π+(p2)|(s̄d)V −A |0〉 = 〈π
+(p2)|(s̄d)V −A |K
+(−p1)〉 = (p1 − p2)µFKπ1 (s12)
m2K −m2π
(p1 + p2)µ
−FKπ1 (s12) + FKπ0 (s12)
, (3.4)
where we have taken into account the sign flip arising from interchanging the operators s ↔ d.
Hence,
〈K−(p1)π+(p2)|(s̄d)V−A|0〉〈π−(p3)|(d̄b)V−A|B−〉
= FBπ1 (s12)F
1 (s12)
s23 − s13 −
(m2B −m2π)(m2K −m2π)
+ FBπ0 (s12)F
0 (s12)
(m2B −m2π)(m2K −m2π)
. (3.5)
However, the form factor F1 also receives resonant contributions
− s12 − imK∗
− s12 − imK∗
(p1 − p2)µ
, (3.6)
K∗πK = 〈K
−(p1)π
+(p2)|K∗〉 = gK
∗→πK ε∗ · (p1 − p2), (3.7)
where K∗0 i = K
0 (1430), · · ·. Hence, the resonant contributions to the form factor FKπ1 are
FKπ1,R (s) =
− s− imK∗
− s− imK∗
. (3.8)
In principle, the weak vector form factor F π
+π− defined by
〈π+(pπ+)π−(pπ−)|ūγµu|0〉 = (pπ+ − pπ−)µF π
+π− , (3.9)
can be related to the time-like pion electromagnetic form factors. However, unlike the kaon case,
the time-like e.m. form factors of the pions are not well measured enough allowing us to determine
the resonant and nonresonant parts. Therefore, we shall only consider the resonant part which has
the expression
F ππR (s) =
mρifρig
ρi→ππ
m2ρi − s− imρiΓρi
. (3.10)
Following Eq. (2.18), the relevant matrix elements of scalar densities read
〈π+(p2)π−(p3)|s̄s|0〉 =
mf0i f̄
gf0i→π
m2f0i
− s23 − imf0iΓf0i
+ 〈π+(p2)π−(p3)|s̄s|0〉NR, (3.11)
〈K−(p1)π+(p2)|s̄d|0〉 =
→K−π+
− s12 − imK∗
+ 〈K−(p1)π+(p2)|s̄d|0〉NR. (3.12)
Note that for the scalar meson, the decay constants fS and f̄S are defined in Eq. (B1) and they
are related via Eq. (B2). The nonresonant contribution 〈π+(p2)π−(p3)|s̄s|0〉NR vanishes under the
OZI rule, while under SU(3) symmetry5
〈K−(p1)π+(p2)|s̄d|0〉NR = 〈K+(p1)K−(p2)|s̄s|0〉NR = fNRs (s12), (3.13)
with the expression of fNRs given in Eq. (2.18).
It is known that in the narrow width approximation, the 3-body decay rate obeys the factoriza-
tion relation
Γ(B → RP → P1P2P ) = Γ(B → RP )B(R→ P1P2), (3.14)
with R being a resonance. This means that the amplitudes A(B → RP → P1P2P ) and A(B → RP )
should have the same expressions apart from some factors. Hence, using the known results for
quasi-two-body decay amplitude A(B → RP ), one can have a cross check on the three-body decay
amplitude of B → RP → P1P2P . For example, from Eq. (A12) we obtain the factorizable
amplitude A(B
0 → K∗00 (1430)π0;K∗00 (1430) → K−π+) as
〈K−(p1)π+(p2)π0(p3)|Tp|B
0〉K∗0
(1430) =
(1430)→K−π+
− s12 − imK∗
−ap4 + r
10 − r
FBπ0 (m
)(m2B −m2π)
a2δpu +
(a9 − a7)
B −m2K∗
, (3.15)
where
χ (µ) =
2m2K∗
mb(µ)(ms(µ)−mq(µ))
. (3.16)
The expression inside {· · ·} is indeed the amplitude of B0 → K∗00 (1430)π0 given in Eq. (A6) of
[48].
The strong coupling constants such as gρ→π
+π− and gf0(980)→π
+π− are determined from the
measured partial widths through the relations
8πm2S
g2S→P1P2 , ΓV =
4πm2V
g2V→P1P2 , (3.17)
for scalar and vector mesons, respectively, where pc is the c.m. momentum. The numerical results
+π− = 6.0, gK
∗→K+π− = 4.59,
gf0(980)→π
+π− = 1.33+0.29−0.26 GeV, g
→K+π− = 3.84GeV. (3.18)
In determining the coupling of f0 → π+π−, we have used the partial width
Γ(f0(980) → π+π−) = (34.2+13.9+8.8−11.8−2.5)MeV (3.19)
5 The matrix elements of scalar densities can be generally decomposed into D-, F - and S(singlet)-type
components. Assuming that the singlet component is OZI suppressed, SU(3) symmetry leads to, for
example, the relation 〈Kπ|s̄q|0〉NR = 〈KK̄|s̄s|0〉NR.
measured by Belle [56]. The momentum dependence of the weak form factor FKπ(q2) is
parametrized as
FKπ(q2) =
FKπ(0)
1− q2/Λχ2 + iΓR/Λχ
, (3.20)
where Λχ ≈ 830 MeV is the chiral-symmetry breaking scale [57] and ΓR is the width of the relevant
resonance, which is taken to be 200 MeV [38].
The results of the calculation are summarized in Tables VIII-XII. We see that except for
f0(980)K, the predicted rates for K
∗π, K∗0 (1430)π and ρK are smaller than the data. Indeed, the
predictions based on QCD factorization for these decays are also generally smaller than experiment
by a factor of 2∼5. This will be discussed in more details in Sec. VI.
TABLE VIII: Branching ratios (in units of 10−6) of resonant and nonresonant (NR) contributions
to B− → K−π+π−. For theoretical errors, see Table III.
Decay mode BaBar [6] Belle [7] Theory
π− 9.04± 0.77 ± 0.53+0.21−0.37 6.45 ± 0.43 ± 0.48
+0.25
−0.35 3.0
+0.0+0.8+0.0
−0.0−0.7−0.0
0 (1430)π
− 34.4± 1.7 ± 1.8+0.1−1.4 32.0 ± 1.0± 2.4
−1.9 10.5
+0.0+3.2+0.0
−0.0−2.7−0.1
ρ0K− 5.08± 0.78 ± 0.39+0.22−0.66 3.89 ± 0.47 ± 0.29
+0.32
−0.29 1.3
+0.0+1.9+0.1
−0.0−0.7−0.1
f0(980)K
− 9.30± 0.98 ± 0.51+0.27−0.72 8.78 ± 0.82 ± 0.65
+0.55
−1.64 7.7
+0.0+0.4+0.1
−0.0−0.8−0.1
NR 2.87± 0.65 ± 0.43+0.63−0.25 16.9 ± 1.3± 1.3
−0.9 18.7
+0.5+11.0+0.2
−0.6− 6.3−0.2
Total 64.4± 2.5 ± 4.6 48.8 ± 1.1± 3.6 45.0+0.3+16.4+0.1−0.4−10.5−0.1
TABLE IX: Same as Table VIII except for the decay B− → K0π−π0.
Decay mode Theory Decay mode Theory
K∗−π0 1.5+0.0+0.3+0.2−0.0−0.3−0.2 K
π− 1.5+0.0+0.4+0.0−0.0−0.3−0.0
K∗−0 (1430)π
0 5.5+0.0+1.6+0.1−0.0−1.4−0.1 K
0 (1430)π
− 5.2+0.0+1.6+0.0−0.0−1.4−0.0
1.3+0.0+3.0+0.0−0.0−0.9−0.0 NR 10.0
+0.2+7.1+0.0
−0.2−3.7−0.0
Total 27.0+0.3+15.4+0.2−0.2− 8.8−0.2
While Belle has found a sizable fraction of order (35 ∼ 40)% for the nonresonant signal in
K−π+π− and K
π+π− modes (see Table I), BaBar reported a small fraction of order 4.5% in
K−π+π−. The huge disparity between BaBar and Belle is ascribed to the different parameteri-
zations adopted by both groups. BaBar [6] used the LASS parametrization to describe the Kπ
S-wave and the nonresonant component by a single amplitude suggested by the LASS collaboration
to describe the scalar amplitude in elastic Kπ scattering. As commented in [7], while this approach
is experimentally motivated, the use of the LASS parametrization is limited to the elastic region
of M(Kπ) <∼ 2.0 GeV, and an additional amplitude is still required for a satisfactory description
of the data. In our calculations we have taken into account the nonresonant contributions to the
TABLE X: Same as Table VIII except for the decay B
0 → K0π+π−.
Decay mode Belle [13] Theory
K∗−π+ 5.6± 0.7± 0.5+0.4−0.3 2.1
+0.0+0.5+0.3
−0.0−0.5−0.3
K∗−0 (1430)π
+ 30.8± 2.4 ± 2.4+0.8−3.0 10.1
+0.0+2.9+0.1
−0.0−2.5−0.2
6.1± 1.0± 0.5+1.0−1.1 2.0
+0.0+1.9+0.1
−0.0−0.9−0.1
f0(980)K
7.6± 1.7± 0.7+0.5−0.7 7.7
+0.0+0.4+0.0
−0.0−0.7−0.0
NR 19.9± 2.5 ± 1.6+0.7−1.2 15.6
+0.1+8.3+0.0
−0.1−4.9−0.0
Total 47.5± 2.4 ± 3.7 42.0+0.3+15.7+0.0−0.2−10.8−0.0
TABLE XI: Branching ratios (in units of 10−6) of resonant and nonresonant (NR) contributions
0 → K−π+π0. Note that the branching ratios for K∗−π+ and K∗0π0 given in [14] and
[15] are their absolute ones. We have converted them into the product branching ratios, namely,
B(B → Rh)× B(R→ hh). For theoretical errors, see Table III.
Decay mode BaBar [14] Belle [15] Theory
K∗−π+ 3.6± 0.8± 0.5 4.9+1.5+0.5+0.8−1.5−0.3−0.3 1.0
+0.0+0.3+0.1
−0.0−0.3−0.1
π0 2.0± 0.6± 0.3 < 2.3 1.0+0.0+0.3+0.2−0.0−0.2−0.1
K∗−0 (1430)π
+ 11.2± 1.5 ± 3.5 5.1± 1.5+0.6−0.7 5.0
+0.0+1.5+0.1
−0.0−1.3−0.1
0 (1430)π
0 7.9± 1.5± 2.7 6.1+1.6+0.5−1.5−0.6 4.2
+0.0+1.4+0.0
−0.0−1.2−0.0
ρ+K− 8.6± 1.4± 1.0 15.1+3.4+1.4+2.0−3.3−1.5−2.1 2.5
+0.0+3.6+0.2
−0.0−1.4−0.2
NR < 4.6 5.7+2.7+0.5−2.5−0.4 < 9.4 9.6
+0.3+6.6+0.0
−0.2−3.5−0.0
Total 34.9± 2.1 ± 3.9 36.6+4.2−4.1 ± 3.0 28.9
+0.2+16.1+0.2
−0.2− 9.4−0.2
two-body matrix elements of scalar densities, 〈Kπ|s̄q|0〉. Recall that a large nonresonant contribu-
tion from 〈KK|s̄s|0〉 is needed in order to explain the observed decay rates of B0 → KSKSKS and
B− → K−KSKS . From Tables VIII-XII we see that our predicted nonresonant rates are in agree-
ment with the Belle measurements. The reason why the nonresonant fraction is as large as 90% in
KKK decays, but becomes only (35 ∼ 40)% in Kππ channels (see Table I) can be explained as
follows. Under SU(3) flavor symmetry, we have the relation 〈Kπ|s̄q|0〉NR = 〈KK̄|s̄s|0〉NR. Hence,
the nonresonant rates in the K−π+π− and K
π+π− modes should be similar to that in K+K−K
or K+K−K−. Since theKKK channel receives resonant contributions only from φ and f0i mesons,
while K∗i ,K
0i, ρi, f0i resonances contribute to Kππ modes, this explains why the nonresonant frac-
tion is of order 90% in the former and becomes of order 40% in the latter. Note that the predicted
nonresonant contribution in the K−π+π0 mode is larger than the BaBar’s upper bound and barely
consistent with the Belle limit. It is conceivable that the SU(3) breaking effect in 〈Kπ|s̄q|0〉NR
may lead to a result consistent with the Belle limit.
It is interesting to notice that, based on a simple fragmentation model and SU(3) symmetry,
Gronau and Rosner [55] found the relations
Γ(B− → K+K−K−)NR = 2Γ(B
0 → K+K−K0)NR = 2Γ(B− → K−π+π−)NR
TABLE XII: Same as Table VIII except for the decay B
0 → K0π0π0.
Decay mode f0(980)K
0 (1430)π
0 NR Total
Theory 3.8+0.0+2.0+0.0−0.0−0.4−0.0 0.55
+0.00+0.16+0.00
−0.00−0.13−0.00 2.3
+0.0+0.8+0.0
−0.0−0.6−0.0 5.3
+0.0+1.8+0.0
−0.0−1.1−0.0 12.9
+0.0+4.0+0.1
−0.0−3.0−0.1
TABLE XIII: Branching ratios, mixing-induced and direct CP asymmetries for B
0 → KSπ+π−
decays. Results for (KLππ)CP± are identical to those for (KSππ)CP∓. For theoretical errors, see
Table III.
Final state Branching ratio
+π−)CP+ 13.52
+0.02+4.03+0.01
−0.03−3.06−0.01
+π−)CP− 7.45
+0.10+3.79+0.02
−0.08−2.32−0.02
f+ 0.65
+0.00+0.03+0.00
−0.00−0.04−0.00
Final state sin 2βeff
+π−)CP+ 0.693
+0.000+0.003+0.003
−0.000−0.002−0.014
+π−)full 0.718
+0.001+0.017+0.008
−0.001−0.007−0.018
Final state Af (%)
+π−)CP+ 4.27
+0.00+0.19+0.28
−0.00−0.12−0.35
+π−)full 4.94
+0.03+0.03+0.32
−0.02−0.05−0.40
= 2Γ(B
0 → K0π+π−)NR = 4Γ(B
0 → K−π+π0)NR. (3.21)
Again, a large nonresonant background in K−π+π− and K
π+π− is favored by this model.
Although the B
0 → KSπ0π0 rate has not been measured, its time-dependent CP asymmetries
have been studied by BaBar [58] with the results
sin 2βeff = −0.72± 0.71 ± 0.08, ACP = −0.23 ± 0.52 ± 0.13 . (3.22)
Note that this mode is a CP-even eigenstate. We found that its branching ratio is not so small,
of order 6 × 10−6, in spite of the presence of two neutral pions in the final state (see Table XII).
Theoretically, we obtain
sin 2βeff = 0.729
+0.000+0.001+0.009
−0.000−0.001−0.020 , ACP =
0.28+0.09+0.07+0.02−0.06−0.06−0.02
%. (3.23)
Finally, we consider the mode KSπ
+π− which is an admixture of CP-even and CP-odd compo-
nents. Results for the decay rates and CP asymmetries are displayed in Table XIII. We see that
the effective sin 2β is of order 0.718 and direct CP asymmetry of order 4.9% for KSπ
IV. B → KKπ DECAYS
We now turn to the three-body decay modes dominated by b → u tree and b → d penguin
transitions, namely, KKπ and πππ. We first consider the decay B− → K+K−π− whose factorizable
TABLE XIV: Same as Table VIII except for the decay B− → K+K−π−.
Decay mode f0(980)π
− K∗0K− K∗00 (1430)K
− NR Total
Theory 0.50+0.00+0.06+0.02−0.00−0.04−0.02 0.23
+0.00+0.04+0.02
−0.00−0.04−0.02 0.82
+0.00+0.18+0.09
−0.00−0.16−0.08 1.8
+0.5+0.4+0.2
−0.5−0.2−0.2 4.0
+0.5+0.7+0.3
−0.6−0.5−0.3
Expt. < 6.3 (BaBar)[59]
< 13 (Belle) [11]
TABLE XV: Same as Table VIII except for B− → π+π−π−. The nonresonant background is used
as an input to fix the parameter α
defined in Eq. (2.8).
Decay mode BaBar [5] Theory
ρ0π− 8.8 ± 1.0± 0.6+0.1−0.7 7.7
+0.0+1.7+0.3
−0.0−1.6−0.2
f0(980)π
− 1.2 ± 0.6± 0.1± 0.4 < 3.0 0.39+0.00+0.01+0.03−0.00−0.01−0.02
NR 2.3 ± 0.9± 0.3± 0.4 < 4.6 input
Total 16.2 ± 1.2± 0.9 12.0+1.1+2.0+0.4−1.2−1.8−0.3
amplitude is given by Eq. (A9). Note that we have included the matrix element 〈K+K−|d̄d|0〉.
Although its nonresonant contribution vanishes as K+ and K− do not contain the valence d or d̄
quark, this matrix element does receive a contribution from the scalar f0 pole
〈K+(p2)K−(p3)|d̄d|0〉R =
mf0i f̄
gf0i→π
m2f0i
− s23 − imf0iΓf0i
, (4.1)
where 〈f0|d̄d|0〉 = mf0 f̄df0 . In the 2-quark model for f0(980), f̄
f0(980)
= f̄f0(980) sin θ/
2. Also note
that the matrix element 〈K−(p3)|(s̄b)V −A|B−〉〈π−(p1)K+(p2)|(d̄s)V−A|0〉 has a similar expression
as Eq. (3.5) except for a sign difference
〈K−(p3)|(s̄b)V−A|B−〉〈π−(p1)K+(p2)|(d̄s)V−A|0〉
= −FBK1 (s12)FKπ1 (s12)
s23 − s13 −
(m2B −m2K)(m2K −m2π)
−FBK0 (s12)FKπ0 (s12)
(m2B −m2K)(m2K −m2π)
. (4.2)
As in Eq. (3.8), the form factor FKπ1 receives a resonant contribution for the K
∗ pole.
The nonresonant and various resonant contributions to B− → K+K−π− are shown in Table
XVI. The predicted total rate is consistent with upper limits set by BaBar and Belle.
V. B → πππ DECAYS
The factorizable amplitudes of the tree-dominated decay B− → π+π−π− and B0 → π+π−π0
are given by Eqs. (A15) and (A16), respectively. We see that the former is dominated by the ρ0
TABLE XVI: Same as Table VIII except for the decay B
0 → π+π−π0.
Decay mode ρ+π− ρ−π+ ρ0π0 f0(980)π
0 NR Total
Theory 8.5
+0.0+1.1+0.2
−0.0−1.0−0.1
+0.0+4.0+0.3
−0.0−3.5−0.3
+0.0+0.3+0.0
−0.0−0.2−0.0
0.010
+0.000+0.003+0.000
−0.000−0.002−0.000
+0.02+0.01+0.00
−0.02−0.01−0.00
+0.0+5.6+0.2
−0.0−5.0−0.2
pole, while the latter receives ρ± and ρ0 contributions. As a consequence, the π+π−π0 mode has a
rate larger than π+π−π− even though the former involves a π0 in the final state.
The π+π−π− mode receives nonresonant contributions mostly from the b → u transition as
the nonresonant contribution in the matrix element 〈π+π−|d̄d|0〉 is suppressed by the smallness of
penguin Wilson coefficients a6 and a8. Therefore, the measurement of the nonresonant contribution
in this decay can be used to constrain the nonresonant parameter α
in Eq. (2.8)
VI. DIRECT CP ASYMMETRIES
Direct CP asymmetries for various charmless three-body B decays are collected in Table XVII.
Mixing-induced and direct CP asymmetries in B0 → K+K−KS,L and KSKSKS,L decays are al-
ready shown in Table V. It appears that direct CP violation is sizable in K+K−K− and K+K−π−
modes.
The major uncertainty with direct CP violation comes from the strong phases which are needed
to induce partial rate CP asymmetries. In this work, the strong phases arise from the effective
Wilson coefficients a
i listed in (A3) and from the Breit-Wigner formalism for resonances. Since
direct CP violation in charmless two-body B decays can be significantly affected by final-state
rescattering [60], it is natural to extend the study of final-state rescattering effects to the case of
three-body B decays. We will leave this to a future investigation.
TABLE XVII: Direct CP asymmetries (in %) for various charmless three-body B decays. For
theoretical errors, see Table III. Experimental results are taken from [50].
Final state BaBar Belle Theory
K+K−K− −2± 3± 2 −10.4+1.7+0.9+0.9−1.3−1.0−0.8
K−KSKS −4± 11± 2 −3.9+0.0+0.6+0.3−0.0−0.8−0.3
K+K−π− 0± 10± 3 17.5+1.9+2.2+0.0−3.8−3.4−0.2
K−π+π− −1.3± 3.7± 1.1 4.9± 2.6± 2.0 −3.3+0.7+0.4+0.3−0.5−0.4−0.2
K−π+π0 7± 11± 1 6.3+0.6+1.4+0.5−0.7−1.4−0.5
π+π− 4.9+0.0+0.0+0.3−0.0−0.1−0.4
π0π0 −23± 52± 13 −17± 24± 6 0.28+0.09+0.07+0.02−0.06−0.06−0.02
π−π0 0.4+0.0+0.4+0.0−0.0−0.4−0.0
π+π−π− −1± 8± 3 4.4+0.8+1.2+0.0−0.6−0.9−0.2
π+π−π0 −3.0+0.1+0.2+0.3−0.1−0.3−0.2
VII. TWO-BODY B → V P AND B → SP DECAYS
Thus far we have considered the branching ratio products B(B → Rh1)B(R → h2h3) with
the resonance R being a vector meson or a scalar meson. Using the experimental information on
B(R→ h2h3) [4]
B(K∗0 → K+π−) = B(K∗+ → K0π+) = 2B(K∗+ → K+π0) =
B(K∗00 (1430) → K+π−) = 2B(K∗+0 (1430) → K
+π0) =
(0.93 ± 0.10),
B(φ→ K+K−) = 0.492 ± 0.006 . (7.1)
one can extract the branching ratios of B → V P and B → SP . The results are summarized in
Table XVIII.
Two remarks about the experimental branching ratios are in order: (i) The BaBar results for the
branching ratios of B
0 → K∗−π+, K∗0π0, K∗−0 (1430)π+ are inferred from the three-body decays
0 → K0π+π− (see Table XI) and Belle results are taken from B0 → K−π+π0 (see Table X). (ii)
Branching ratios of B
0 → φK0 shown in Table XVIII are not inferred from the Dalitz plot analysis
of B → KKK decays.
For comparison, the predictions of the QCD factorization approach for B → V P [61] and
B → SP [48] are also exhibited in Table XVIII. In order to compare theory with experiment for
B → f0(980)K → π+π−K, we need an input for B(f0(980) → π+π−). To do this, we shall use the
BES measurement [45]
Γ(f0(980) → ππ)
Γ(f0(980) → ππ) + Γ(f0(980) → KK)
= 0.75+0.11−0.13 . (7.2)
Assuming that the dominance of the f0(980) width by ππ and KK and applying isospin relation,
we obtain
B(f0(980) → π+π−) = 0.50+0.07−0.09 , B(f0(980) → K
+K−) = 0.125+0.018−0.022 . (7.3)
At first sight, it appears that the ratio defined by
R ≡ B(f0(980) → K
B(f0(980) → π+π−)
= 0.25± 0.06 (7.4)
is not consistent with the value of 0.69 ± 0.32 inferred from the BaBar data (see Tables VI and
VIII)
Γ(B− → f0(980)K−; f0(980) → K+K−)
Γ(B− → f0(980)K−; f0(980) → π+π−)
6.5± 2.5 ± 1.6
9.3± 1.0+0.6−0.9
, (7.5)
where we have applied the narrow width approximation Eq. (3.14).
The above-mentioned discrepancy can be resolved by noting that the factorization relation Eq.
(3.14) for the resonant three-body decay is applicable only when the two-body decays B → RP
and R → P1P2 are kinematically allowed and the resonance is narrow, the so-called narrow width
approximation. However, as the decay f0(980) → K+K− is kinematically barely or even not
allowed, the off resonance peak effect of the intermediate resonant state will become important.
TABLE XVIII: Branching ratios of quasi-two-body decays B → V P and B → SP obtained from the
studies of three-body decays based on the factorization approach. Unless specified, the experimental
results are obtained from the 3-body Dalitz plot analyses given in previous Tables. Theoretical
uncertainties have been added in quadrature. QCD factorization (QCDF) predictions taken from
[61] for V P modes and from [48] for SP channels are shown here for comparison.
Decay mode BaBar Belle QCDF This work
φK0 8.4+1.5−1.3 ± 0.5 a 9.0
−1.8 ± 0.7 b 4.1
+0.4+1.7+1.8+10.6
−0.4−1.6−1.9− 3.0 5.3
φK− 8.4± 0.7 ± 0.7 9.60 ± 0.92+1.05−0.84 4.5
+0.5+1.8+1.9+11.8
−0.4−1.7−2.1− 3.3 5.9
π− 13.5 ± 1.2+0.8−0.9 9.8± 0.9
−1.2 3.6
+0.4+1.5+1.2+7.7
−0.3−1.4−1.2−2.3 4.4
π0 3.0± 0.9 ± 0.5 < 3.5 0.7+0.1+0.5+0.3+2.6−0.1−0.4−0.3−0.5 1.5
K∗−π+ 11.0 ± 1.5± 0.7 8.4± 1.1+0.9−0.8 3.3
+1.4+1.3+0.8+6.2
−1.2−1.2−0.8−1.6 3.1
K∗−π0 6.9 ± 2.0± 1.3 b 3.3+1.1+1.0+0.6+4.4−1.0−0.9−0.6−1.4 2.2
K∗0K− 0.30+0.11+0.12+0.09+0.57−0.09−0.10−0.09−0.19 0.35
+0.06
−0.06
ρ0K− 5.1± 0.8+0.6−0.9 3.89 ± 0.47
+0.43
−0.41 2.6
+0.9+3.1+0.8+4.3
−0.9−1.4−0.6−1.2 1.3
4.9± 0.8 ± 0.9 6.1± 1.0 ± 1.1 4.6+0.5+4.0+0.7+6.1−0.5−2.1−0.7−2.1 2.0
ρ+K− 8.6± 1.4 ± 1.0 15.1+3.4+2.4−3.3−2.6 7.4
+1.8+7.1+1.2+10.7
−1.9−3.6−1.1− 3.5 2.5
8.0+1.4−1.3 ± 0.5 b 5.8
+0.6+7.0+1.5+10.3
−0.6−3.3−1.3− 3.2 1.3
ρ0π− 8.8± 1.0+0.6−0.9 8.0
−2.0 ± 0.7 b 11.9
+6.3+3.6+2.5+1.3
−5.0−3.1−1.2−1.1 7.7
ρ−π+ 21.2+10.3+8.7+1.3+2.0− 8.4−7.2−2.3−1.6 15.5
ρ+π− 15.4+8.0+5.5+0.7+1.9−6.4−4.7−1.3−1.3 8.5
ρ0π0 1.4± 0.6 ± 0.3 3.1+0.9+0.6−0.8−0.8 0.4
+0.2+0.2+0.9+0.5
−0.2−0.1−0.3−0.3 1.0
f0(980)K
0; f0 → π+π− 5.5± 0.7 ± 0.6 7.6± 1.7+0.8−0.9 6.7
+0.1+2.1+2.3
−0.1−1.5−1.1
c 7.7+0.4−0.7
f0(980)K
−; f0 → π+π− 9.3± 1.0+0.6−0.9 8.8± 0.8
−1.8 7.8
+0.2+2.3+2.7
−0.2−1.6−1.2
c 7.7+0.4−0.8
f0(980)K
0; f0 → K+K− 5.3± 2.2 5.8+0.1−0.5
f0(980)K
−; f0 → K+K− 6.5± 2.5 ± 1.6 < 2.9 7.0+0.4−0.7
f0(980)π
−; f0 → π+π− < 3.0 0.5+0.0+0.2+0.1−0.0−0.1−0.0 c 0.39
+0.03
−0.02
f0(980)π
−; f0 → K+K− 0.50+0.06−0.04
f0(980)π
0; f0 → π+π− 0.02+0.01+0.02+0.04−0.01−0.00−0.01 c 0.010
+0.003
−0.002
0 (1430)π
− 36.6 ± 1.8± 4.7 51.6 ± 1.7+7.0−7.4 11.0
+10.3+7.5+49.9
− 6.0−3.5−10.1 16.9
0 (1430)π
0 12.7 ± 2.4± 4.4 9.8± 2.5 ± 0.9 6.4+5.4+2.2+26.1−3.3−2.1− 5.7 6.8
K∗−0 (1430)π
+ 36.1 ± 4.8 ± 11.3 49.7 ± 3.8+4.0−6.1 11.3
+9.4+3.7+45.8
−5.8−3.7− 9.9 16.2
K∗−0 (1430)π
0 5.3+4.7+1.6+22.3−2.8−1.7− 4.7 8.9
K∗00 (1430)K
− < 2.2 b 1.3+0.3−0.3
aFrom the Dalitz plot analysis of B0 → K+K−K0 decay measured by BaBar (see Table III), we obtain
B(B0 → φK0) = (6.2± 0.9)× 10−6. The experimental value of BaBar cited in the Table is obtained from a
direct measurement of B0 → φK0.
bnot determined directly from the Dalitz plot analysis of three-body decays.
cWe have assumed B(f0(980) → π+π−) = 0.50 for the QCDF calculation.
Therefore, it is necessary to take into account the finite width effect of the f0(980) which has a
width of order 40-100 MeV [4]. In short, one cannot determine the ratio R by applying the narrow
width approximation to the three-body decays. That is, one should employ the decays B → Kππ
rather than B → KKK to extract the experimental branching ratio for B → f0(980)K provided
B(f0(980) → ππ) is available.
We now compare the present work for B → V P and B → SP with the approach of QCD
factorization [34, 48]. In this work, our calculation of 3-body B decays is similar to the simple
generalized factorization approach [62, 63] by assuming a set of universal and process independent
effective Wilson coefficients a
i with p = u, c in Eq. (A3). In QCDF, the calculation of a
rather sophisticated. They are basically the Wilson coefficients in conjunction with short-distance
nonfactorizable corrections such as vertex corrections and hard spectator interactions. In general,
they have the expressions [34, 61]
i (M1M2) =
Ni(M2) +
Vi(M2) +
Hi(M1M2)
i (M2), (7.6)
where i = 1, · · · , 10, the upper (lower) signs apply when i is odd (even), ci are the Wilson coefficients,
CF = (N
c − 1)/(2Nc) with Nc = 3, M2 is the emitted meson and M1 shares the same spectator
quark with the B meson. The quantities Vi(M2) account for vertex corrections, Hi(M1M2) for hard
spectator interactions with a hard gluon exchange between the emitted meson and the spectator
quark of the B meson and Pi(M2) for penguin contractions. Hence, the effective Wilson coefficients
i (M1M2) depend on the nature ofM1 andM2; that is, they are process dependent. Moreover, they
depend on the order of the argument, namely, a
i (M2M1) 6= a
i (M1M2) in general. In the above
equation, Ni(M2) vanishes for i = 6, 8 and M2 = V , and equals to unity otherwise. For three-
body decays, in principle one should also compute the vertex, gluon and hard spectator-interaction
corrections. Of course, these corrections for the three-body case will be more complicated than the
two-body decay one. One possible improvement of the present work is to utilize the QCDF results
for the effective parameters a
i (M1M2) in the vicinity of the resonance region.
We next proceed to the comparison of numerical results. For φK, K∗π and K∗K modes, the
QCDF and the present work have similar predictions. For the ρ meson in the final states, QCDF
predicts slightly small ρK and too large ρπ compared to experiment. 6 In contrast, in the present
work we obtain reasonable ρπ but too small ρK. This is ascribed to the form factor A
0 (0) =
0.37 ± 0.06 employed in [61] that is too large compared to ours ABρ0 (0) = 0.28 ± 0.03 (see Table
XIX). Recall that the recent QCD sum rule calculation also yields a smaller one A
0 (0) = 0.30
+0.07
−0.03
[64].
For B → f0(980)K and B → f0(980)π, QCDF [48] and this work are in agreement with
experiment. The large rate of the f0(980)K mode is ascribed to the large f0(980) decay constant,
f̄f0(980) ≈ 460 MeV at the renormalization scale µ = 2.1 GeV [48]. In contrast, the predicted
0 (1430)π
− and K∗−0 (1430)π
+ are too small compared to the data. The fact that QCDF leads to
too small rates for φK, K∗π, ρK and K∗0 (1430)π may imply the importance of power corrections
6 Recall that the world average of the branching ratio of B0 → ρ±π∓ is (24.0±2.5)×10−6 [50], while QCDF
predicts it to be ∼ 36.6× 10−6 [61].
due to the non-vanishing ρA and ρH parameters arising from weak annihilation and hard spectator
interactions, respectively, which are used to parametrize the endpoint divergences, or due to possible
final-state rescattering effects from charm intermediate states [60]. However, this is beyond the
scope of the present work.
VIII. CONCLUSIONS
In this work, an exploratory study of charmless 3-body decays of B mesons is presented using
a simple model based on the framework of the factorization approach. The 3-body decay process
consists of resonant contributions and the nonresonant signal. Since factorization has not been
proved for three-body B decays, we shall work in the phenomenological factorization model rather
than in the established theories such as QCD factorization. That is, we start with the simple idea
of factorization and see if it works for three-body decays. Our main results are as follows:
• If heavy meson chiral perturbation theory (HMChPT) is applied to the three-body matrix
elements for B → P1P2 transitions and assumed to be valid over the whole kinematic region,
then the predicted decay rates for nonresonant 3-body B decays will be too large and even
exceed the measured total rate. This can be understood because chiral symmetry has been
applied beyond its region of validity. We assume the momentum dependence of nonresonant
amplitudes in the exponential form e−αNRpB ·(pi+pj) so that the HMChPT results are recovered
in the soft meson limit pi, pj → 0. The parameter αNR can be fixed from the tree-dominated
decay B− → π+π−π−.
• Besides the nonresonant contributions arising from B → P1P2 transitions, we have identified
another large source of the nonresonant background in the matrix elements of scalar densi-
ties, e.g. 〈KK|s̄s|0〉 which can be constrained from the KSKSKS (or K−KSKS) mode in
conjunction with the mass spectrum in the decay B
0 → K+K−K0 .
• All KKK modes are dominated by the nonresonant background. The predicted branching
ratios of K+K−KS(L), K
+K−K− and K−KSKS modes are consistent with the data within
the theoretical and experimental errors.
• Although the penguin-dominated B0 → K+K−KS decay is subject to a potentially signifi-
cant tree pollution, its effective sin 2β is very similar to that of the KSKSKS mode. However,
direct CP asymmetry of the former, being of order −4%, is more prominent than the latter,
• The role played by the unknown scalar resonance X0(1550) in the decay B− → K+K−K−
should be clarified in order to see if it behaves in the same way as in the K+K−K
mode.
• Applying SU(3) symmetry to relate the nonresonant component in the matrix element
〈Kπ|s̄q|0〉 to that in 〈KK|s̄s|0〉, we found sizable nonresonant contributions in K−π+π−
and K
π+π− modes, in agreement with the Belle measurements but larger than the BaBar
results. In particular, the predicted nonresonant contribution in the K−π+π0 mode is consis-
tent with the Belle limit and larger than the BaBar’s upper bound. It will be interesting to
have a refined measurement of the nonresonant contribution to this mode to test our model.
• The π+π−π0 mode is predicted to have a rate larger than π+π−π− even though the former
involves a π0 in the final state. This is because the latter is dominated by the ρ0 pole, while
the former receives ρ± and ρ0 resonant contributions.
• Among the 3-body decays we have studied, the decay B− → K+K−π− dominated by b→ u
tree transition and b→ d penguin transition has the smallest branching ratio of order 4×10−6.
It is consistent with the current bound set by BaBar and Belle.
• Decay rates and time-dependent CP asymmetries in the decays KSπ0π0, a purely CP -even
state, and KSπ
+π−, an admixture of CP -even and CP -odd components, are studied. The
corresponding mixing-induced CP violation is found to be of order 0.729 and 0.718, respec-
tively.
• Since the decay f0(980) → K+K− is kinematically barely or even not allowed, it is crucial
to take into account the finite width effect of the f0(980) when computing the decay B →
f0(980)K → KKK. Consequently, one should employ the Dalitz plot analysis of Kππ mode
to extract the experimental branching ratio for B → f0(980)K provided B(f0(980) → ππ) is
available. The large rate of B → f0(980)K is ascribed to the large f0(980) decay constant,
f̄f0(980) ≈ 460 MeV.
• The intermediate vector meson contributions to 3-body decays e.g. ρ, φ, K∗
are identified through the vector current, while the scalar meson resonances e.g.
f0(980), X0(1550), K
0 (1430) are mainly associated with the scalar density. Their effects
are described in terms of the Breit-Wigner formalism.
• Based on the factorization approach, we have computed the resonant contributions to 3-
body decays and determined the rates for the quasi-two-body decays B → V P and B → SP .
The predicted ρπ, f0(980)K and f0(980)π rates are consistent with experiment, while the
calculated φK, K∗π, ρK and K∗0 (1430)π are too small compared to the data.
• Direct CP asymmetries have been computed for the charmless 3-body B decays. We found
sizable direct CP violation in K+K−K− and K+K−π− modes.
• In this exploratory work we use the phenomenological factorization model rather than in
the established theories based on a heavy quark expansion. Consequently, we don’t have
1/mb power corrections within this model. However, systematic errors due to such model
dependent assumptions may be sizable and are not included in the error estimates that we
give.
Note added: After the paper was submitted for publication, BaBar (arXiv:0708.0367 [hep-ex]) has
reported the observation of the decay B+ → K+K−π+ with the branching ratio (5.0± 0.5± 0.5)×
10−6. Our prediction for this mode (see Table XIV) is consistent with experiment.
Acknowledgments
This research was supported in part by the National Science Council of R.O.C. under Grant
Nos. NSC95-2112-M-001-013, NSC95-2112-M-033-013, and by the U.S. DOE contract No. DE-
AC02-98CH10886(BNL).
http://arxiv.org/abs/0708.0367
APPENDIX A: DECAY AMPLITUDES OF THREE-BODY B DECAYS
In this appendix we list the factorizable amplitudes of the 3-body decays B →
KKK,KKπ,Kππ, πππ. Under the factorization hypothesis, the decay amplitudes are given by
〈P1P2P3|Heff |B〉 =
p=u,c
λ(r)p 〈P1P2P3|Tp|B〉, (A1)
where λ
p ≡ VpbV ∗pr with r = d, s. For KKK and Kππ modes, r = s and for KKπ and πππ
channles, r = d. The Hamiltonian Tp has the expression [34]
Tp = a1δpu(ūb)V−A ⊗ (s̄u)V−A + a2δpu(s̄b)V−A ⊗ (ūu)V−A + a3(s̄b)V−A ⊗
(q̄q)V−A
(q̄b)V−A ⊗ (s̄q)V−A + a5(s̄b)V −A ⊗
(q̄q)V+A
−2ap6
(q̄b)S−P ⊗ (s̄q)S+P + a7(s̄b)V−A ⊗
eq(q̄q)V+A
−2ap8
(q̄b)S−P ⊗
eq(s̄q)S+P + a9(s̄b)V−A ⊗
eq(q̄q)V−A
(q̄b)V−A ⊗
eq(s̄q)V−A, (A2)
with (q̄q′)V±A ≡ q̄γµ(1 ± γ5)q′, (q̄q′)S±P ≡ q̄(1 ± γ5)q′ and a summation over q = u, d, s being
implied. For the effective Wilson coefficients, we use
a1 ≈ 0.99 ± 0.037i, a2 ≈ 0.19 − 0.11i, a3 ≈ −0.002 + 0.004i, a5 ≈ 0.0054 − 0.005i,
au4 ≈ −0.03− 0.02i, ac4 ≈ −0.04 − 0.008i, au6 ≈ −0.06− 0.02i, ac6 ≈ −0.06− 0.006i,
a7 ≈ 0.54 × 10−4i, au8 ≈ (4.5 − 0.5i) × 10−4, ac8 ≈ (4.4 − 0.3i) × 10−4, (A3)
a9 ≈ −0.010 − 0.0002i, au10 ≈ (−58.3 + 86.1i) × 10−5, ac10 ≈ (−60.3 + 88.8i) × 10−5,
for typical ai at the renormalization scale µ = mb/2 = 2.1 GeV which we are working on.
Various three-body B decay amplitudes are collected below.
B → KKK
〈K0K+K−|Tp|B
0〉 = 〈K+K0|(ūb)V−A|B0〉〈K−|(s̄u)V−A|0〉
a1δpu + a
4 + a
10 − (a
6 + a
+〈K+K−|(d̄b)V−A|B0〉〈K
0|(s̄d)V−A|0〉
+〈K0|(s̄b)V−A|B0〉〈K+K−|(ūu)V−A|0〉(a2δpu + a3 + a5 + a7 + a9)
+〈K0|(s̄b)V−A|B0〉〈K+K−|(d̄d)V−A|0〉
a3 + a5 −
(a7 + a9)
+〈K0|(s̄b)V−A|B0〉〈K+K−|(s̄s)V−A|0〉
a3 + a
4 + a5 −
(a7 + a9 + a
+〈K0|s̄b|B0〉〈K+K−|s̄s|0〉(−2ap6 + a
+〈K+K−|d̄(1− γ5)b|B0〉〈K
0|s̄(1 + γ5)d|0〉 (−2ap6 + a
+〈K+K−K0|(s̄d)V −A|0〉〈0|(d̄b)V−A|B0〉
+〈K+K−K0|s̄γ5d|0〉〈0|d̄γ5b|B0〉(−2ap6 + a
8), (A4)
with rPχ =
mb(µ)(m2+m1)(µ)
〈K+K−K−|Tp|B−〉 = 〈K+K−|(ūb)V−A|B−〉〈K−|(s̄u)V−A|0〉
a1δpu + a
4 + a
10 − (a
6 + a
+〈K−|(s̄b)V−A|B−〉〈K+K−|(ūu)V−A|0〉(a2δpu + a3 + a5 + a7 + a9)
+〈K−|(s̄b)V−A|B−〉〈K+K−|(d̄d)V −A|0〉
a3 + a5 −
(a7 + a9)
+〈K−|(s̄b)V−A|B−〉〈K+K−|(s̄s)V−A|0〉
a3 + a
4 + a5 −
(a7 + a9 + a
+〈K−|s̄b|B−〉〈K+K−|s̄s|0〉(−2ap6 + a
+〈K+K−|ū(1− γ5)b|B0〉〈K−|s̄(1 + γ5)u|0〉 (−2ap6 + a
+〈K+K−K−|(s̄u)V−A|0〉〈0|(ūb)V−A|B−〉
+〈K+K−K−|s̄γ5u|0〉〈0|ūγ5b|B−〉(−2ap6 + a
8). (A5)
Since there are two identical K− mesons in this decay, one should take into account the identical
particle effects. For example,
〈K+K−|(ūb)V−A|B−〉〈K−|(s̄u)V−A|0〉 = 〈K+(p1)K−(p2)|(ūb)V −A|B−〉〈K−(p3)|(s̄u)V−A|0〉
+ 〈K+(p1)K−(p3)|(ūb)V −A|B−〉〈K−(p2)|(s̄u)V−A|0〉,
and a factor of 1
should be put in the decay rate.
〈K0K0K0|Tp|B
0〉 = 〈K0K0|(d̄b)V−A|B0〉〈K0|(s̄d)V−A|0〉
10 − (a
+〈K0|(s̄b)V −A|B0〉〈K0K0|(d̄d)V−A|0〉
a3 + a5 −
(a7 + a9)
+〈K0|(s̄b)V −A|B0〉〈K0K0|(s̄s)V−A|0〉
a3 + a
4 + a5 −
(a7 + a9 + a
+〈K0|s̄b|B0〉〈K0K0|s̄s|0〉(−2ap6 + a
+〈K0K0K0|(s̄d)V−A|0〉〈0|(d̄b)V−A|B0〉
(a7 + a9 + a
+〈K0K0K0|s̄γ5d|0〉〈0|d̄γ5b|B0〉(−2ap6 + a
8). (A7)
The second and third terms do not contribute to the purely CP -even decay B
0 → KSKSKS .
〈K−K0K0|Tp|B−〉 = 〈K0K
0|(ūb)V−A|B−〉〈K−|(s̄u)V−A|0〉
a1δpu + a
4 + a
10 − (a
6 + a
+〈K0K−|(d̄b)V−A|B−〉〈K0|(s̄d)V−A|0〉
10 − (a
+〈K−|(s̄b)V−A|B−〉〈K0K0|(d̄d)V−A|0〉
a3 + a5 −
(a7 + a9)
+〈K−|(s̄b)V−A|B−〉〈K0K0|(s̄s)V−A|0〉
a3 + a
4 + a5 −
(a7 + a9 + a
+〈K−|s̄b|B−〉〈K0K0|s̄s|0〉(−2ap6 + a
+〈K−K0K0|(s̄u)V −A|0〉〈0|(ūb)V−A|B0〉(a1δpu + ap4 + a
+〈K−K0K0|s̄γ5u|0〉〈0|ū(1− γ5)b|B−〉(2ap6 + 2a
8). (A8)
The third and fourth terms do not contribute to the decay B− → K−KSKS .
B → KKπ
〈π−K+K−|Tp|B−〉 = 〈K+K−|(ūb)V−A|B−〉〈π−|(d̄u)V −A|0〉
a1δpu + a
4 + a
10 − (a
6 + a
+〈π−|(d̄b)V−A|B−〉〈K+K−|(ūu)V−A|0〉(a2δpu + a3 + a5 + a7 + a9)
+〈π−|d̄b|B−〉〈K+K−|d̄d|0〉(−2ap6 + a
+〈π−|(d̄b)V−A|B−〉〈K+K−|(s̄s)V−A|0〉
a3 + a5 −
(a7 + a9)
+〈K−|(s̄b)V−A|B−〉〈K+π−|(d̄s)V−A|0〉(ap4 −
+〈K−|s̄b|B−〉〈K+π−|d̄s|0〉(−2ap6 + a
+〈K+K−π−|(d̄u)V−A|0〉〈0|(ūb)V −A|B−〉
a1δpu + a
4 + a
+〈K+K−π−|d̄γ5u|0〉〈0|ūγ5b|B−〉(2ap6 − a
8). (A9)
B → Kππ
〈K−π+π−|Tp|B−〉 = 〈π+π−|(ūb)V−A|B−〉〈K−|(s̄u)V−A|0〉
a1δpu + a
4 + a
10 − (a
6 + a
+〈K−|(s̄b)V−A|B−〉〈π+π−|(ūu)V −A|0〉
a2δpu +
(a7 + a9)
+〈K−|s̄b|B−〉〈π+π−|s̄s|0〉(−2ap6 + a
+〈π−|(d̄b)V−A|B−〉〈K−π+|(s̄d)V−A|0〉(ap4 −
+〈π−|d̄b|B−〉〈K−π+|s̄d|0〉(−2ap6 + a
+〈K−π+π−|(s̄u)V−A|0〉〈0|(ūb)V−A|B−〉(a1δpu + ap4 + a
+〈K−π+π−|s̄γ5u|0〉〈0|ūγ5b|B−〉(2ap6 + 2a
8). (A10)
〈K0π+π−|Tp|B
0〉 = 〈π+π−|(d̄b)V−A|B
0〉〈K0|(s̄d)V −A|0〉
10 − (a
+〈K0|(s̄b)V−A|B
0〉〈π+π−|(ūu)V−A|0〉
a2δpu +
(a7 + a9)
+〈K0|s̄b|B0〉〈π+π−|s̄s|0〉(−2ap6 + a
+〈π+|(ūb)V−A|B
0〉〈K0π−|(s̄u)V−A|0〉(a1 + ap4 + a
+〈π+|ūb|B0〉〈K0π−|s̄u|0〉(−2ap6 − 2a
+〈K0π+π−|(s̄d)V −A|0〉〈0|(d̄b)V−A|B
0〉(a1δpu + ap4 + a
+〈K0π+π−|s̄(1 + γ5)d|0〉〈0|d̄γ5b|B
0〉(2ap6 − a
8). (A11)
〈K−π+π0|Tp|B
0〉 = 〈π+π0|(ūb)V−A|B
0〉〈K−|(s̄u)V −A|0〉
a1δpu + a
4 + a
10 − (a
6 + a
+〈K−π+|(s̄b)V−A|B
0〉〈π0|(ūu)V−A|0〉
a2δpu +
(−a7 + a9)
+〈π+|(ūb)V−A|B
0〉〈K−π0|(s̄u)V−A|0〉 [a1δpu + ap4 + a
+〈π0|(d̄b)V −A|B
0〉〈K−π+|(s̄d)V −A|0〉(ap4 −
+〈π+|ūb|B0〉〈K−π0|s̄u|0〉(−2ap6 − 2a
+〈π0|d̄b|B0〉〈K−π+|s̄d|0〉(−2ap6 + a
+〈K−π+π0|(s̄d)V−A|0〉〈0|(d̄b)V−A|B
0〉(ap4 −
+〈K−π+π0|s̄(1 + γ5)d|0〉〈0|d̄γ5b|B
0〉(2ap6 − a
8). (A12)
〈K0π−π0|Tp|B−〉 = 〈π0π−|(d̄b)V−A|B−〉〈K
0|(s̄d)V −A|0〉
10 − (a
+〈K0π−|(s̄b)V−A|B−〉〈π0|(ūu)V−A|0〉
a2δpu +
(−a7 + a9)
+〈π0|(ūb)V−A|B−〉〈K
π−|(s̄u)V−A|0〉 [a1δpu + ap4 + a
+〈π−|(d̄b)V−A|B−〉〈K
π0|(s̄d)V−A|0〉
+〈π0|ūb|B−〉〈K0π−|s̄u|0〉(−2ap6 − 2a
+〈π−|d̄b|B−〉〈K0π0|s̄d|0〉(−2ap6 + a
+〈K0π−π0|(s̄u)V −A|0〉〈0|(ūb)V−A|B−〉(a1δpu + ap4 + a
+〈K0π−π0|s̄(1 + γ5)u|0〉〈0|ūγ5b|B−〉(2ap6 + 2a
8). (A13)
〈K0π0π0|Tp|B
0〉 = 〈π0π0|(d̄b)V−A|B
0〉〈K0|(s̄d)V −A|0〉
10 − (a
+〈K0π0|(s̄b)V−A|B
0〉〈π0|(ūu)V−A|0〉
a2δpu +
(−a7 + a9)
+〈π0|(d̄b)V−A|B
0〉〈K0π0|(s̄d)V−A|0〉
+〈π0π0|(ūu)
|0〉〈K0|(s̄b)
a2δpu + 2a3 + 2a5 +
(a7 + a9)
+〈K0π0|s̄d|0〉〈π0|d̄b|B0〉(−2ap6 + a
+〈π0π0|s̄s|0〉〈K0|s̄b|B0〉(−2ap6 + a
+〈K0π0π0|(s̄d)V−A|0〉〈0|(d̄b)V−A|B
0〉(ap4 −
+〈K0π0π0|s̄(1 + γ5)d|0〉〈0|d̄γ5b|B
0〉(2ap6 − a
8). (A14)
B → πππ
〈π−π+π−|Tp|B−〉 = 〈π+π−|(ūb)V −A|B−〉〈π−|(d̄u)V−A|0〉
a1δpu + a
4 + a
10 − (a
6 + a
+〈π−|(d̄b)V−A|B−〉〈π+π−|(ūu)V−A|0〉
a2δpu − ap4 +
(a7 + a9) +
+〈π−|d̄b|B−〉〈π+π−|d̄d|0〉(−2ap6 + a
+〈π−π+π−|(d̄u)V−A|0〉〈0|(ūb)V−A|B−〉(a1δpu + ap4 + a
+〈π−π+π−|d̄(1 + γ5)u|0〉〈0|ūγ5b|B−〉(2ap6 + 2a
8). (A15)
〈π0π+π−|Tp|B
0〉 = 〈π+π0|(ūb)V−A|B
0〉〈π−|(d̄u)V−A|0〉
a1δpu + a
4 + a
10 − (a
6 + a
+〈π+π−|(d̄b)V −A|B
0〉〈π0|(ūu)V−A|0〉
a2δpu − ap4 + (a
(a7 + a9) +
+〈π+|(ūb)V−A|B
0〉〈π−π0|(d̄u)V−A|0〉 [a1δpu + ap4 + a
+〈π0|(d̄b)V−A|B
0〉〈π+π−|(ūu)V−A|0〉
a2δpu − ap4 +
(a7 + a9) +
+〈π0|d̄b|B0〉〈π+π−|d̄d|0〉(−2ap6 + a
(A16)
APPENDIX B: DECAY CONSTANTS, FORM FACTORS AND OTHERS
In this appendix we collect the numerical values of the decay constants, form factors, CKM
matrix elements and quark masses needed for the calculations. We first discuss the decay constants
of the pseudoscalar meson P and the scalar meson S defined by
〈P (p)|q̄2γµγ5q1|0〉 = −ifPpµ, 〈S(p)|q̄2γµq1|0〉 = fSpµ, 〈S|q̄2q1|0〉 = mS f̄S , (B1)
and 〈V (p, ε)|Vµ|0〉 = fVmV ε∗µ for the vector meson. For the scalar mesons, the vector decay
constant fS and the scale-dependent scalar decay constant f̄S are related by equations of motion
µSfS = f̄S, with µS =
m2(µ)−m1(µ)
, (B2)
where m2 and m1 are the running current quark masses. The neutral scalar mesons σ, f0 and a
cannot be produced via the vector current owing to charge conjugation invariance or conservation
of vector current:
fσ = ff0 = fa0
= 0. (B3)
However, the decay constant f̄S is non-vanishing. In [48] we have applied the QCD sum rules to
estimate this quantity. In this work we folow [48] to use
f̄f0(980) = 460MeV, f̄K∗0 (1430)
= 550MeV, (B4)
at µ = 2.1 GeV. As for the decay constants of vector mesons, we use (in units of MeV).
fρ = 216, fK∗ = 218, f̄f0(980) = 460, f̄K∗0 = 550. (B5)
Form factors for B → P, S transitions are defined by [41]
〈P (p′)|Vµ|B(p)〉 =
(p+ p′)µ −
m2B −m2P
FBP1 (q
m2B −m2P
〈S(p′)|Aµ|B(p)〉 = −i
(p + p′)µ −
m2B −m2S
FBS1 (q
m2B −m2S
〈V (p′, ε)|Vµ|B(p)〉 =
mB +mV
ǫµναβε
∗νpαp′βV (q2),
〈V (p′, ε)|Aµ|B(p)〉 = i
(mB +mV )ε
ε∗ · p
mB +mV
(p+ p′)µA
ε∗ · p
2)−ABV0 (q2)]
, (B6)
where q = p− p′, F1(0) = F0(0), A3(0) = A0(0), and
mP +mV
2)− mP −mV
2), (B7)
where Pµ = (p + p
′)µ, qµ = (p − p′)µ. As shown in [65], a factor of (−i) is needed in B → S
transition in order for the B → S form factors to be positive. This also can be checked from heavy
quark symmetry [65].
Various form factors for B → S transitions have been evaluated in the relativistic covariant
light-front quark model [65]. In this model form factors are first calculated in the spacelike region
and their momentum dependence is fitted to a 3-parameter form
F (q2) =
F (0)
1− a(q2/m2B) + b(q2/m2B)2
. (B8)
The parameters a, b and F (0) are first determined in the spacelike region. This parametrization is
then analytically continued to the timelike region to determine the physical form factors at q2 ≥ 0.
The results relevant for our purposes are summarized in Table XIX. In practical calculations, we
shall assign the form factor error to be 0.03. For example, FBK0,1 (0) = 0.35 ± 0.03.
The form factor for B to f0(980) is of order 0.25 at q
2 = 0 [48]. In the qq̄ model for the f0(980),
0 = FBf0 sin θ/
For the heavy-flavor independent strong coupling g in HMChPT, we use |g| = 0.59± 0.01± 0.07
as extracted from the CLEO measurement of the D∗+ decay width [39]. The sign is fixed to be
negative in the quark model [23].
For the CKM matrix elements, we use the Wolfenstein parameters A = 0.806, λ = 0.22717,
ρ̄ = 0.195 and η̄ = 0.326 [52]. The corresponding CKM angles are (sin 2β)CKM = 0.695
+0.018
−0.016 and
γ = (59 ± 7)◦ [52]. For the running quark masses we shall use
mb(mb) = 4.2GeV, mb(2.1GeV) = 4.95GeV, mb(1GeV) = 6.89GeV,
mc(mb) = 1.3GeV, mc(2.1GeV) = 1.51GeV,
ms(2.1GeV) = 90MeV, ms(1GeV) = 119MeV,
md(1GeV) = 6.3MeV, mu(1GeV) = 3.5MeV. (B9)
The uncertainty of the strange quark mass is specified as ms(2.1GeV) = 90± 20 MeV.
TABLE XIX: Form factors of B → π,K,K∗0 (1430), ρ transitions obtained in the covariant light-
front model [65].
F F (0) F (q2max) a b F F (0) F (q
max) a b
FBπ1 0.25 1.16 1.73 0.95 F
0 0.25 0.86 0.84 0.10
FBK1 0.35 2.17 1.58 0.68 F
0 0.35 0.80 0.71 0.04
1 0.26 0.70 1.52 0.64 F
0 0.26 0.33 0.44 0.05
V Bρ 0.27 0.79 1.84 1.28 A
0 0.28 0.76 1.73 1.20
1 0.22 0.53 0.95 0.21 A
2 0.20 0.57 1.65 1.05
0.31 0.96 1.79 1.18 ABK
0 0.31 0.87 1.68 1.08
1 0.26 0.58 0.93 0.19 A
2 0.24 0.70 1.63 0.98
[1] A.E. Snyder and H.R. Quinn, Phys. Rev. D 48, 2139 (1993).
[2] M. Ciuchini, M. Pierini, and L. Silvestrini, Phys. Rev. D 74, 051301 (2006).
[3] M. Gronau, D. Pirjol, A. Soni, and J. Zupan, Phys. Rev. D 75, 014002 (2007).
[4] Particle Data Group, Y.M. Yao et al., J. Phys. G 33, 1 (2006).
[5] B. Aubert et al. (BaBar Collaboration), Phys. Rev. D 72, 052002 (2005).
[6] B. Aubert et al. (BaBar Collaboration), Phys. Rev. D 72, 072003 (2005).
[7] A. Garmash et al. (Belle Collaboration), Phys. Rev. Lett. 96, 251803 (2006).
[8] B. Aubert et al. (BaBar Collaboration), Phys. Rev. D 74, 032003 (2006).
[9] A. Garmash et al. (Belle Collaboration), Phys. Rev. D 71, 092003 (2005).
[10] B. Aubert et al. (BaBar Collaboration), Phys. Rev. Lett. 93, 181805 (2004).
[11] A. Garmash et al. (Belle Collaboration), Phys. Rev. D 69, 012001 (2004).
[12] B. Aubert et al. (BaBar Collaboration), Phys. Rev. D 73, 031101 (2006).
[13] A. Garmash et al. (Belle Collaboration), Phys. Rev. D 75, 012006 (2007).
[14] B. Aubert et al. (BaBar Collaboration), hep-ex/0408073.
[15] P. Chang et al. (Belle Collaboration), Phys. Lett. B 599, 148 (2004).
[16] B. Aubert et al. (BaBar Collaboration), arXiv:0706.3885 [hep-ex].
[17] B. Aubert et al. (BaBar Collaboration), Phys. Rev. Lett. 95, 011801 (2005).
[18] P. Singer, Phys. Rev. D 16, 2304 (1977); Nuovo Cim. 42A, 25 (1977).
[19] Yu. L. Kalinovsk and V.N. Pervushin, Sov. J. Nucl. Phys. 29, 225 (1979).
[20] H.Y. Cheng, Z. Phys. C 32, 243 (1986).
[21] L.L. Chau and H.Y. Cheng, Phys. Rev. D 41, 1510 (1990).
[22] F.J. Botella, S. Noguera, and J. Portolés, Phys. Lett. B 360, 101 (1995).
[23] T.M. Yan, H.Y. Cheng, C.Y. Cheung, G.L. Lin, Y.C. Lin, and H.L. Yu, Phys. Rev. D 46,
1148 (1992); 55, 5851(E) (1997).
[24] M.B. Wise, Phys. Rev. D 45, 2118 (1992).
[25] G. Burdman and J.F. Donoghue, Phys. Lett. B 280, 287 (1992).
http://arxiv.org/abs/hep-ex/0408073
http://arxiv.org/abs/0706.3885
[26] D.X. Zhang, Phys. Lett. B 382, 421 (1996).
[27] A.N. Ivanov and N.I. Troitskaya, Nuovo Cim. A 111, 85 (1998).
[28] N.G. Deshpande, G. Eilam, X.G. He, and J. Trampetić, Phys. Rev. D 52, 5354 (1995).
[29] S. Fajfer, R.J. Oakes, and T.N. Pham, Phys. Rev. D 60, 054029 (1999).
[30] B. Bajc, S. Fajfer, R.J. Oakes, T.N. Pham, and S. Prelovsek, Phys. Lett. B 447, 313 (1999).
[31] A. Deandrea, R. Gatto, M. Ladisa, G. Nardulli, and P. Santorelli, Phys. Rev. D 62, 036001
(2000); ibid 62, 114011 (2000).
[32] A. Deandrea and A.D. Polosa, Phys. Rev. Lett. 86, 216 (2001).
[33] S. Fajfer, T.N. Pham, and A. Prapotnik, Phys. Rev. D 70, 034033 (2004).
[34] M. Beneke, G. Buchalla, M. Neubert, and C.T. Sachrajda, Phys. Rev. Lett. 83, 1914 (1999);
Nucl. Phys. B 591, 313 (2000); ibid. B 606, 245 (2001).
[35] Y.Y. Keum, H.n. Li, and A.I. Sanda, Phys. Rev. D 63, 054008 (2001); Phys. Lett. B 504, 6
(2001).
[36] C.W. Bauer, S. Fleming, D. Pirjol, and I.W. Stewart, Phys. Rev. D 63, 114020 (2001).
[37] C. L. Y. Lee, M. Lu, and M. B. Wise, Phys. Rev. D 46, 5040 (1992).
[38] H. Y. Cheng and K. C. Yang, Phys. Rev. D 66, 054015 (2002).
[39] S. Ahmed et al. (CLEO Collaboration), Phys. Rev. Lett. 87, 251801 (2001).
[40] H.Y. Cheng, C.K. Chua, and A. Soni, Phys. Rev. D 72, 094003 (2005).
[41] M. Wirbel, B. Stech, and M. Bauer, Z. Phys. C 29, 637 (1985).
[42] S.J. Brodsky and G.R. Farrar, Phys. Rev. D 11, 1309 (1975).
[43] C. K. Chua, W. S. Hou, S. Y. Shiau, and S. Y. Tsai, Phys. Rev. D 67, 034012 (2003).
[44] D. Bisello et al. (DM2 Collaboration), Z. Phys. C 39, 13 (1988).
[45] M. Ablikim et al. (BES Collaboration), Phys. Rev. D 72, 092002 (2005).
[46] H. Y. Cheng and K. C. Yang, Phys. Rev. D 71, 054020 (2005).
[47] H. Y. Cheng, Int. J. Mod. Phys. A 4, 495 (1989).
[48] H.Y. Cheng, C.K. Chua, and K.C. Yang, Phys. Rev. D 73, 014017 (2006).
[49] H.Y. Cheng, Phys. Rev. D 67, 034024 (2003).
[50] Heavy Flavor Averaging Group, http://www.slac.stanford.edu/xorg/hfag.
[51] T. Gershon and M. Hazumi, Phys. Lett. B 596, 163 (2004).
[52] CKMfitter Group, J. Charles et al., Eur. Phys. J. C 41, 1 (2005) and updated results from
http://ckmfitter.in2p3.fr; UTfit Collaboration, M. Bona et al., JHEP 0507, 028 (2005) and
updated results from http://utfit.roma1.infn.it.
[53] E.M. Aitala et al. (E791 Collaboration), Phys. Rev. Lett. 91, 201801 (2003).
[54] R.R. Akhmetshin et al., Phys. Lett. B 462, 380 (1999); M.N. Achasov et al., Phys. Lett. B
485, 349 (2000); A. Aloisio et al., Phys. Lett. B 537, 21 (2003).
[55] M. Gronau and J.L. Rosner, Phys. Rev. D 72, 094031 (2005).
[56] K. Abe et al. (Belle Collaboration), Phys. Rev. D 75, 051101 (2007).
[57] H.Y. Cheng, Int. J. Mod. Phys. A 4, 495 (1989).
[58] B. Aubert et al. (BaBar Collaboration), hep-ex/0702010.
[59] B. Aubert et al. (BaBar Collaboration), Phys. Rev. Lett. 91, 051801 (2003).
[60] H.Y. Cheng, C.K. Chua, and A. Soni, Phys. Rev. D 71, 014030 (2005).
http://www.slac.stanford.edu/xorg/hfag
http://ckmfitter.in2p3.fr
http://utfit.roma1.infn.it
http://arxiv.org/abs/hep-ex/0702010
[61] M. Beneke and M. Neubert, Nucl. Phys. B 675, 333 (2003).
[62] A. Ali and C. Greub, Phys. Rev. D 57, 2996 (1998); A. Ali, G. Kramer, and C.D. Lü, ibid.
58, 094009 (1998).
[63] Y.H. Chen, H.Y. Cheng, B. Tseng, and K.C. Yang, Phys. Rev. D 60, 094014 (1999).
[64] P. Ball and R. Zwicky, Phys. Rev. D 71, 014015 (2005).
[65] H. Y. Cheng, C. K. Chua, and C. W. Hwang, Phys. Rev. D 69, 074025 (2004).
	Introduction
	BKKK decays
	B0KKK decays
	B-KKK decays
	BK decays
	BKK decays
	B decays
	Direct CP asymmetries
	Two-body BVP and BSP decays
	Conclusions
	Acknowledgments
	Decay amplitudes of three-body B decays
	Decay constants, form factors and others
	References
ABSTRACT
  Charmless 3-body decays of B mesons are studied in the framework of the
factorization approach. The nonresonant contributions arising from $B\to
P_1P_2$ transitions are evaluated using heavy meson chiral perturbation theory
(HMChPT). The momentum dependence of nonresonant amplitudes is assumed to be in
the exponential form $e^{-\alpha_{NR}} p_B\cdot(p_i+p_j)}$ so that the HMChPT
results are recovered in the soft meson limit $p_i, p_j\to 0$. In addition, we
have identified another large source of the nonresonant signal in the matrix
elements of scalar densities, e.g. $<K\bar K|\bar ss|0>$, which can be
constrained from the decay $\bar B^0\to K_SK_SK_S$ or $B^-\to K^-K_SK_S$. The
intermediate vector meson contributions to 3-body decays are identified through
the vector current, while the scalar meson resonances are mainly associated
with the scalar density. Their effects are described in terms of the
Breit-Wigner formalism. Our main results are: (i) All KKK modes are dominated
by the nonresonant background. The predicted branching ratios of
$K^+K^-K_{S(L)}$, $K^+K^-K^-$ and $K^-K_SK_S$ modes are consistent with the
data within errors. (ii) Although the penguin-dominated $B^0\to K^+K^-K_{S}$
decay is subject to a potentially significant tree pollution, its effective
$\sin 2\beta$ is very similar to that of the $K_SK_SK_S$ mode. However, direct
CP asymmetry of the former, being of order -4%, is more prominent than the
latter. (iii) For $B\to K\pi\pi$ decays, we found sizable nonresonant
contributions in $K^-\pi^+\pi^-$ and $\bar K^0\pi^+\pi^-$ modes, in agreement
with the Belle measurements but larger than the BaBar result.

<|endoftext|><|startoftext|>
Electrical transport properties of polar heterointerface between
KTaO3 and SrTiO3
A. Kalabukhov,1, ∗ R. Gunnarsson,1 T. Claeson,1 and D. Winkler1
1Department of Microtechnology and Nanoscience
(MC2), Chalmers University of Technology
Göteborg, Sweden
(Dated: August 10, 2021)
Abstract
Electrical transport of a polar heterointerface between two insulating perovskites, KTaO3 and
SrTiO3, is studied. It is formed between a thin KTaO3 film deposited on a top of TiO2-terminated
(100) SrTiO3 substrate. The resulting (KO)
1−(TiO2)
0 heterointerface is expected to be hole-
doped according to formal valences of K (1+) and Ti (4+). We observed electrical conductivity and
mobility in the KTaO3/SrTiO3 similar to values measured earlier in electron-doped LaAlO3/SrTiO3
heterointerfaces. However, the sign of the charge carriers in KTaO3/SrTiO3 obtained from the
Hall measurements is negative. The result is an important clue to the true origin of the doping at
perovskite oxide hetero-interfaces.
PACS numbers: 73.20.-r,73.21.Ac,73.40.-c
http://arxiv.org/abs/0704.1050v1
The mechanism of doping in hetero-interfaces between two insulating perovskite oxides
has been intensively discussed since the observation of large electrical conductivity in the
hetero-structure between LaAlO3 (LAO) and SrTiO3 (STO).
1,2,3,4,5,6 It was argued that
when a thin LAO film is coherently grown on the TiO2-terminated STO substrate, the
resulting interface (LaO)+(TiO2)
0 is expected to be polar, provided that the bulk formal
valences of Ti and La are maintained at the interface and that the structure of the interface
is atomically perfect. The polar structure at the interface results in an infinitely growing
electrostatic potential in the (001) direction when the thickness of LAO film is increasing.
In order to compensate for the charge discontinuity at the interface, half of an electron per
square unit cell may be released leading to conductivity at the interface.7 The possibility
of this doping mechanism was supported by theoretical works.8,9 However, there are other
possible doping mechanisms in perovskite oxides. It is well known that electrical properties of
STO can be changed from insulating to metallic ones by a small reduction of oxygen from its
stoichiometric composition.10,11 The possibility that the electrical property at the LAO/STO
heterointerface is not due to the oxygen vacancies was presumably ruled out by keeping the
STO substrate at deposition conditions which does not result in bulk conductivity due to
oxygen vacancies. However, it is still possible that the deposition of the LAO film itself
can reduce oxygen from a shallow layer at the STO substrate, as argued by Siemons et al.6
We have previously suggested that oxygen vacancies play an important role in the electrical
properties of the LAO/STO heterointerface.4 However, the true microscopic origin of the
conductivity at the interface between LAO and STO could not be understood.
In this work we treat another polar interface between two insulating perovskite oxides,
KTaO3 (KTO) and STO. KTO is a well known material with a cubic structure and lattice
parameter of 3.99 Å(compare with 3.905 Åin STO). It is incipient ferroelectric at room
temperature with a dielectric constant of about 260.12,13 Tantalum has a formal valence
of 5+, and potassium 1+ in KTO. The KTO film should grow as a sequence of layers on
a single TiO2-terminated STO substrate in the (001) direction and the resulting interface
should have the structure of (KO)−(TiO2)
0. This means that half a hole per square unit
cell should be released. This is opposite to the (LaO)+(TiO2)
0 heterointerface, where half
an electron per unit cell is transferred to the interface.
We have grown thin KTO films on STO substrates and found that the KTO/STO inter-
face is indeed conducting with electrical properties very similar to the LAO/STO interface.
0 1 2 3 4 5 6
Position ( m)
FIG. 1: (Color online) Atomic force microscope image (top) and cross section (bottom) of the
surface of the 6 nm thick KTaO3 film grown on a TiO2-terminated (100) SrTiO3 substrate. Unit
cell steps are seen about every 250 nm along the surface.
However, the charge of electrical carriers deduced from Hall effect measurements is nega-
tive. We discuss possible reasons for this interesting result in view of interface structure and
possible doping mechanisms.
KTO films were prepared by pulsed laser deposition with in-situ reflection high-energy
electron diffraction (RHEED) used to monitor film growth during deposition. The growth
conditions were similar to what we used previously to fabricate LAO/STO hetero-interfaces:4
deposition temperature TD = 750
◦C, oxygen pressure pO2 = 10
−4 mbar, laser energy density
J = 1.5 J/cm2. RHEED oscillations could be observed during the initial part of the film
growth. However the intensity decreased rapidly and after 3 unit cells it was too low to
observe oscillations. The deposition rate estimated from the first RHEED oscillations was 1
unit cell per 10 pulses. The thickness of the KTO films was 13 u.c. layers (approx. 6 nm).
Atomic force microscopy (AFM) showed very smooth step-like surface of the KTO film, see
-2 -1 0 1 2
 6 nm LAO/STO
 6 nm KTO/STO
T = 300 K
H (T)
FIG. 2: (Color online) (a) Experimental configuration for determination of Hall coefficient; (b) Hall
resistance RXY for KTO/STO and LAO/STO heterostructures measured at room temperature and
the same experimental configuration.
Fig.1.
Electrical measurements were made in a four point van der Pauw configuration14 in the
temperature range 2 K – 300 K and in magnetic field up to 5 T. First we proved that the
KTO film itself is not conducting by using ”soft” contacts: silver wires glued on the film
surface using silver epoxy. In order to reach the interface, we used Ti/Au contact pads
fabricated by sputtering through metal mask. The resistance between Ti/Au contacts and
contacts glued by silver epoxy was above 10 MΩ, indicating an absence of pinholes in the
KTO film.
The electrical properties of KTO films may be compared to those of 15 u.c. thick LAO
films on TiO2-terminated STO substrates prepared in the same conditions. Both hetero-
structures show metallic conductivity with relatively high mobilities and charge carrier con-
centrations. Fig.2 shows Hall resistances measured at room temperature under the same
experimental configuration (i.e. magnetic field and current direction, see Fig.2a). Both
KTO/STO and LAO/STO heterointerfaces had the same sign of Hall coefficient. The sign
of the charge carriers is negative according to the magnetic field and bias current directions.
The values of the sheet resistance RS, the Hall mobility µH and the charge carrier density
nS of the KTO/STO heterointerfaces are very similar to those of LAO/STO, see Fig.3. We
1 10 100
T (K)
 6 nm KTO/STO
 6 nm LAO/STO
1 10 100
15 (b)
T (K)
1 10 100
T (K)
FIG. 3: (Color online) Temperature dependence of sheet resistivity RS (a), charge carriers density
nS (b) and Hall mobility µH for LAO/STO and KTO/STO heterointerfaces prepared at 10
−4 mbar
oxygen pressure.
measured three KTO films prepared in similar deposition conditions and they all showed
similar transport properties.
It is known that potassium deficiency is a significant problem in growth of KTO films
due to the high vapor pressure of potassium at high temperatures.13 If this were the case
here, the actual heterointerface between KTO/STO may have different reconstruction from
the one described above. This possibility needs to be ruled out in a future determination of
the microstructure of the hetero-structure by electron microscopy.
Independent of the KTO/STO heterointerface microstructure being perfect or not, it
is quite remarkable that the electrical properties are very similar to those of LAO/STO
heterointerface. This could suggests that there is a common doping mechanism where the
type and concentration of charge carriers do not directly depend on the sign of the polar
interface deduced from the formal bulk valences. We have previously argued that the high
conductivity, mobility, and charge carrier density found in hetero-junctions of LAO/STO
prepared at low oxygen pressure mainly are due to oxygen vacancies residing in STO close
to the interface. That conclusion is further strengthened by the present findings.
The work was supported by the Swedish KAW, SSF, and VR foundations, the EU
NANOXIDE, and ESF THIOX programs.
∗ Electronic address: alexei.kalaboukhov@mc2.chalmers.se
1 A. Ohtomo, and H.Y. Hwang, Nature 427, 423 (2004); 441, 120 (2006).
mailto:alexei.kalaboukhov@mc2.chalmers.se
2 M. Huijben, G. Rijnders,D. H. A. Blank, S. Bals, S. van Aert, J. Verbeeck, G. van Tendeloo,
A. Brinkman and H. Hilgenkamp, Nature Materials 5, 556 (2006).
3 S. Thiel, G. Hammerl, A. Schmehl, C. W.Schneider and J. Mannhart, Science 313, 1935 (2006).
4 A. Kalabukhov, R. Gunnarsson, J. Börjesson, E. Olsson, T. Claeson and D. Winkler, Phys.
Rev. B 75, 121404(R) (2007).
5 G. Herranz, M. Basletic, M. Bibes, R. Ranchal, A. Hamzic, E. Tafra, K. Bouzehouane, E.
Jacquet, J. P. Contour, A. Barthelemy and A. Fert, Phys. Rev. B 73, 064403 (2006).
6 W. Siemons, G. Koster, H. Yamamoto, W. A. Harrison, G. Lukovsky, T. H. Geballe, D. H. A.
Blank and M. R. Beasley, cond-mat/0612223 (2006).
7 N. Nakagawa, H. Y. Hwang and D. A. Muller, Nature Mat., 5, 204 (2006).
8 R. Pentcheva and W. E. Pickett, Phys. Rev. B 74, 035112 (2006)
9 M. S. Park, S. H. Rhim and A. J. Freeman, Phys. Rev. B 74, 205416 (2006)
10 O. N. Tufte and P. W. Chapman, Phys. Rev. 155, 796 (1967).
11 C. S. Koonce, M. L. Cohen, J. F. Schooley, W. R. Hosler and E. R. Pfeiffer, Phys. Rev. 163,
380 (1967).
12 D. Z. Bozinis, J. P. Hurrel, Phys. Rev. B 13, 3109 (1976).
13 H.-J. Bae, J. Sigman, S.-J. Park, Y.-H. Heo, L. A. Boatner and D. P. Norton, Solid State
Electronics 48, 51 (2004).
14 J. L. van der Pauw Philips Res. Rep. 13, 1 (1958).
http://arxiv.org/abs/cond-mat/0612223
	References
ABSTRACT
  Electrical transport of a polar heterointerface between two insulating
perovskites, KTaO3 and SrTiO3, is studied. It is formed between a thin KTaO3
film deposited on a top of TiO2- terminated (100) SrTiO3 substrate. The
resulting (KO)1-(TiO2)0 heterointerface is expected to be hole-doped according
to formal valences of K (1+) and Ti (4+). We observed electrical conductivity
and mobility in the KTaO3/SrTiO3 similar to values measured earlier in
electron-doped LaAlO3/SrTiO3 heterointerfaces. However, the sign of the charge
carriers in KTaO3/SrTiO3 obtained from the Hall measurements is negative. The
result is an important clue to the true origin of the doping at perovskite
oxide hetero-interfaces.

<|endoftext|><|startoftext|>
Mon. Not. R. Astron. Soc. 000, 1–?? (2007) Printed 29 October 2018 (MN LATEX style file v2.2)
The Angular Separation of the Components of the Cepheid
AW Per
D. Massa
1⋆ † and N. R. Evans2 ⋆
1SGT, Inc. NASA’s GSFC, Code 665, Greenbelt, MD 20771, USA
2Center for Astrophysics, 60 Garden St., MS 4, Cambridge, MA 02138
Accepted 2007 December 15. Received 2007 December 14; in original form 2007 August 13
ABSTRACT
The 6.4 day classical Cepheid AW Per is a spectroscopic binary with a period of 40
years. Analyzing the centroids of HST/STIS spectra obtained in November 2001, we
have determined the angular separation of the binary system. Although we currently
have spatially resolved data for a single epoch in the orbit, the success of our approach
opens the possibility of determining the inclination, sin i, for the system if the mea-
surements are repeated at additional epochs. Since the system is potentially a double
lined spectroscopic binary, the combination of spectroscopic orbits for both compo-
nents and the visual orbit would give the distance to the system and the masses of its
components, thereby providing a direct measurement of a Cepheid mass.
Key words: Cepheids – stars: AW Per – binaries: visual – binaries: spectroscopic.
1 INTRODUCTION
Cepheids are important stars in many respects, most notably
for their roles as fundamental rungs on the cosmic distance
ladder and the challenges their structure pose to stellar inte-
riors modelling. The use of Cepheids as primary extragalac-
tic distance indicators makes a quantitative understanding
of their properties extremely valuable. While the Magellanic
Clouds are perhaps the best laboratory to study cepheids,
the dependence of the period–luminosity relation on metal-
icity is still not fully understood (Romaniello et al. 2005).
Consequently, accurate distances (absolute magnitudes) to
Galactic cepheids are needed to fully understand and quan-
tify this dependence and to apply cepheid scale to more
metal rich spiral galaxy stars which are more commonly used
in extragalactic distance determinations.
Cepheids also present important tests for interiors cal-
culations since, as evolved stars, their structure is dictated
by their evolutionary history. In addition, the models must
predict the puslational properties of cepheids, making the
modelling especially challenging. This complexity is codified
in the term “the Cepheid mass problem”. Forty years ago,
when the first hydrodynamic pulsation calculations were
made, it was realized that the mass could be derived by
either matching the Herzsprung progression of secondary
⋆ E-mail: massa@derckmassa.net; evans@head-cfa.harvard.edu
† Based on observations with the NASA/ESA Hubble Space Tele-
scope, obtained at the Space Telescope Science Institute, which
is operated by the Association of Universities for Research in As-
tronomy, Inc. under NASA contract No. NAS5-26555.
maxima or by a parameterization of the pulsation con-
stant. These masses were as much as a factor of two smaller
than evolutionary calculations. A reconciliation was recently
achieved from re-evaluation of the interior opacities (see Si-
mon, 1990, for a summary). We see, therefore, that in ad-
dition to absolute magnitudes, obtaining accurate Cepheid
masses is also important.
If we can determine the angular separations of binary
systems containing a Cepheid, which are double lined spec-
troscopic binaries (SB2s), then the distances and masses of
the Cepheids can be derived from basic physics. Because of
the central roles of Cepheids in fundamental astrophysics,
it is important to have such direct measurements. While
several Cepheid distances have been measured directly by
the Hipparcos satellite, the quality of these measurements
was only sufficient for statistical considerations (e.g., Groe-
newegen & Oudmaijer 2000). More recently, a large cam-
paign using the Fine Guidance Sensor on HST has begun
to yield accurate distances to single Cepheids (Benedict et
al. 2002). However, to date the mass of only one cepheid,
Polaris, has been directly determined from fundamental ob-
servations (Evans, et al. 2007).
Although SB2s containing a Cepheid and an A or B star
are common (see, Evans 1995), these stars are difficult to re-
solve in the optical. This is because of the inevitable, enor-
mous magnitude differences of the components in the opti-
cal, which result from massive stars evolving toward cooler
temperatures at nearly constant luminosity. The top panel
Figure 1 shows a typical example of a Cepheid + B star bi-
nary, and the contrast between the primary and secondary
throughout the optical and IR is obvious.
c© 2007 RAS
http://arxiv.org/abs/0704.1051v2
2 D. Massa and N. R. Evans
Figure 1. Kurucz models for a typical Cepheid (large/red) + hot
star (small/blue) binary. The top panel shows how the secondary
is roughly 10 times fainter in the optical, making the system ex-
tremely difficult to resolve from the ground. On the other hand,
the secondary dominates the flux from the system in the UV. The
remaining 5 panels demonstrate how the wavelength dependence
of the spectrum centroid changes with orientation of the axis of
the binary relative to the dispersion for 5 different orientations,
shown to the left of each panel. Notice that in the spectral re-
gion accessible from the ground, the centroid shifts by less than
10% of the full separation. The “cross-over” point is not reached
until λ ∼ 2500Å. A color version of the figure is available in the
electronic version of the paper.
Thus, while the measurement of a Cepheid mass by di-
rectly imaging a double lined spectroscopic binary with a
Cepheid primary and an A or B star secondary has been
a long-sought goal, ground-based studies have not, as yet,
been able to accomplish this (even though they have been
able to resolve the stellar disks of some Cepheids, e.g.,
Kervella et al. 2004, and references therein). As a result, in-
direct methods have been developed to determine the masses
of Cepheids. The most popular of these uses a combination
of UV and optical spectroscopy to obtain radial velocity
curves for both components. Then the UV spectral energy
distribution (SED) of the hot secondary is used to obtain
its temperature. Finally, the mass – temperature relation
for main sequence A or B stars is used to infer the mass
of the secondary and, thus, (since the system is an SB2)
the mass of the primary. This approach has been applied to
several systems (SU Cyg, S Mus and V350 Sgr), using IUE
or HST spectra to determine the radial velocity curves and
SEDs of the secondaries (Evans, et al., 1998). The masses
obtained by this approach agree, on average, with the mass-
luminosity predictions from evolutionary calculations with
moderate convective overshoot (e.g. Schaller, et al. 1992).
However, this approach requires an exact understanding of
the evolutionary phase of the hot secondary and relies on its
spectroscopic parallax to determine the distances to the sys-
tems. Clearly, a direct measurement of the masses of both
components is more desirable.
In this paper, we describe how we used the Space Tele-
scope Imaging Spectrograph (STIS) on HST to resolve a the
potential SB2 Cepheid binary AW Per using an approach we
call “cross-dispersion imaging”. AW Per is a 6.4 day Cepheid
which is in a roughly 40 year orbit with its hot secondary
(Evans et al. 2000). Evans (1989) studied the system and
determined that the secondary is a main sequence B7-8 star
and that the color excess of the system is E(B−V ) = 0.52.
The Teff of the secondary is expected to be∼ 12000K (Evans
1994).
The remainder of the paper is organized as follows: §2
provides an overview the approach used to “resolve” the bi-
nary, §3 describes the observations, §4 gives the data anal-
ysis, §5 details the analysis of the observations, §6 presents
the results, §7 discusses the results and their implications,
and §8 summarizes the findings.
2 THE APPROACH(CROSS-DISPERSION
IMAGING)
2.1 Basic Principles
Massa & Endal (1987) describe how combining imaging and
spectroscopy can dramatically increase the effective “resolv-
ing power” of an instrument. Specifically, they showed how
the wavelength dependence of the centroid of a spectrum can
determine the angular separation of an unresolved binary
whose components have distinctly different spectra. The ba-
sic concept of this approach is quite simple. It is based on
an idea put forth by Beckers (1982) and has been indepen-
dently discovered by a number of others (see, e.g., Porter et
al. 2004, and references therein).
Like all cross-dispersion imaging techniques, some sort
of a model is required to interpret the observations. This
model might be extremely simple, as in the case of a binary
where one assumes that the system is composed of exactly
two stars, and that one contributes all of the flux at one
wavelength and the other contributes all of the flux at an-
other wavelength. This crude model would be sufficient to
“resolve” the binary from the properties of its spectrum.
Consider the image of a highly unresolved binary sys-
tem. To first order, the image of the combined light from the
system is indistinguishable from a point source. However,
the position of an image at any given wavelength will be dis-
placed toward the location of the binary component which
contributes most of the light that wavelength. In principle,
one could obtain images at several different wavelengths and
determine how the centers of the images shift from one ex-
posure to the next. Analysis of this set of data (along with a
model for the flux ratios in each band) would then determine
the separation of the two components (Becker 1982). The
drawback of this direct approach is that all of the exposures
c© 2007 RAS, MNRAS 000, 1–??
AW Per 3
would have to be obtained using different optical elements,
making alignment at the sub-pixel level effectively impossi-
ble. Instead, Massa & Endal (1987) show that tracking the
centroid of the spectrum of the binary has the same effect.
Furthermore, because all of the position measurements (the
centroid of the spectrum at each wavelength) are obtained
at one time, this method is more efficient and the measure-
ments are differential in nature, freeing them from several
sources of systematic error.
To make these notions quantitative, let x and y be the
angular coordinates on the detector which are parallel and
perpendicular to the wavelength dispersion. Therefore, the
wavelengths, λ, are given by λ = λ(x). Now, consider a bi-
nary whose components have an angular separation θ and
photon fluxes per unit wavelength Np(λ) and Ns(λ) for the
primary and secondary, respectively. Further, let φ be the
position angle of the binary on the sky (measured c.c. from
north toward east of a line from the primary to the sec-
ondary) and let α be a similarly measured angle between
north and a line in the dispersion direction pointing in the
direction of decreasing wavelength. Thus, α can be varied by
changing the orientation of the telescope. With these defi-
nitions, the wavelength dependence of the centroid of the
spectrum of a single observation of a binary is
y(λ) =
1 +Np(λ)/Ns(λ)
+ Const. where (1)
∆y = θ sin(φ− α) (2)
(see the appendix). Thus, if Np(λ)/Ns(λ) is known, then
measurements at two or more orientations (α’s) enables one
to determine θ and φ, the separation and position angle of
the binary. Note that if the spectral energy distributions
(SEDs) of the components are vastly different, then the po-
sition of the centroid shifts from one to the other, depend-
ing upon which star dominates the flux at each wavelength.
On the other hand, if the binary components have identical
SEDs, then no spatial information can be gained from the
centroid positions.
Figure 1 is a cartoon depicting how the centroid of the
spectrum of a binary star, whose components have very dif-
ferent effective temperatures, is influenced by the relative
energy distributions of the two components and the orien-
tation of the binary relative to the dispersion direction of
a spectrograph. In this case, the centroid shifts from the
cool component at long wavelengths to the hot component
at short wavelengths. We define the cross-over wavelength
as that wavelength were each binary component contributes
equally to the flux. For Cepheid binaries, this wavelength is
typically in the near UV (∼ 3000Å for the case shown). In
order to infer spatial information from the centroids, it is de-
sirable to span as large a wavelength baseline as possible, to
maximize the deflections in the centroid positions. The best
case would be to cover a large enough wavelength range with
a single setting, so that one end of the spectrum is totally
dominated by one star and the other end is dominated by
the other. If this is not practical, a wavelength band centered
on the cross-over wavelength and covering a baseline large
enough to experience more than a 50% centroid deflection
is adequate. However, in this case, one needs an estimate
of the SEDs of the two binary components in order to ex-
tract the angular separation. Note that if the absolute flux
calibration of the instrument is well-determined, then the
Figure 2. Relative error in the angular separation of a binary
determined from fitting a cosine curve to measurements obtained
at three orientations, {−∆α, 0,+∆α} versus ∆α (abscissa) over
the interval ∆α = 1 → 90◦. The different curves are for different
values of the orientation of the system on the sky, φ, between
φ = 1 → 90◦.
flux observations can provide additional information which
can be incorporated into the determination of the angular
separation (see §5).
Finally, to unambiguously determine the separation and
position angle of the binary, two or more observations are
required in order to solve eq. (2) for θ and φ in terms of the
measured quantities ∆y(n) and α(n), for n > 2.
The final error associated with the angular separation
and the position angle measurements depends upon the
band pass of the observation, the signal-to-noise of the data
(discussed in the next section), the number of independent
orientations obtained and the relation between the these an-
gles and φ. We have examined the relative error for sampling
three orientations, α(n) = {−∆α, 0,+∆α}, for position an-
gles between 1 and 90◦. Figure 2 demonstrates how the rel-
ative accuracy of the observations changes as a function of
sampling interval, ∆α, and relative orientations, φ, for this
case. For most orientations, any sampling with ∆α & 30◦
provides comparable accuracy.
The approximations developed in this section are only
valid in the sub-Rayleigh regime. Once the sources are re-
solved at any wavelength, the entire image must be modeled
using a an accurate representation of the point spread func-
tion as well as the fluxes of the two objects.
2.2 Exposure Times and Random Errors
The counts needed to centroid to an accuracy σ[y(λ)] can be
estimated for an instrument whose spread function perpen-
dicular to the dispersion is a Gaussian with FWHM = ξ.
A single count is equivalent to one estimate of the center
of the spectrum drawn from a sample with an RMS disper-
sion σ = ξ/
8 ln 2 = 0.42ξ. Therefore, N samples (counts)
determine the centroid to an accuracy of
σ[y(λ)] =
0.42ξ
. (3)
c© 2007 RAS, MNRAS 000, 1–??
4 D. Massa and N. R. Evans
Equation (3) gives the counts needed over a wavelength
band to obtain the desired accuracy. The FWHM of the
STIS PSF varies from ∼ 0.05 − 0.07′′ (depending on wave-
length) and the minimum number of counts obtained in one
10 min observation over a spectral resolution element (2 pix-
els) was 4000, and we obtained 3 of these. Therefore, the
poorest precision we can expect based upon simple sampling
arguments is ∼ 3 × 10−4′′, and this is for a single resolu-
tion element. In all, there are 512 independent resolution
elements which will be combined to determine a single mea-
surement of ∆y through the use of eq. (1). Therefore, ran-
dom noise in the angular separation determinations should
be . 10−4′′, and not a limiting factor for our observations.
However, as is typical for most observations, we shall see
that systematic effects will dominate the error budget (see,
3 THE OBSERVATIONS
As can be seen from the top panel of Figure 1, a broad wave-
length baseline is needed to optimize the extraction process.
Furthermore, good spectral resolution is also advantageous,
since spectral features provide additional constraints. Con-
sequently, we employed the STIS on HST to obtain high
spatial resolution, excellent wavelength coverage and good
spectral resolution. We used the STIS NUV-MAMA detec-
tor together with its G230L grating, since this combination
provided good coverage (1600 6 λ 6 3160 Å) of the ex-
pected cross-over point (see, Kim Quijano, J., et al. 2003).
Spectra were obtained at three distinct roll angles (see,
Table 1) which differ by ∼ ±20◦. Although rolls of ±60◦
would be optimal, we were limited to smaller rolls by HST
restrictions for objects at the declination of AW Per. Al-
though not optimal, Figure 2 shows that this restricted range
does not sacrifice very much in theoretical accuracy. After a
standard STIS target acquisition, which centers the binary
within a 0.1′′ aperture, we obtained the science observations
through the 25MAMA aperture, which provides slitless spec-
tra of the binary. At each roll, we offset the star by ±0.1′′
and obtained additional science exposures. This procedure
allows us to characterize localized distortions in the detec-
tor. It is also useful for determining the sensitivity of the
observations to their position on the detector, since each
spectrum is sampled differently by the pixel lattice. Since
the spectrum was repositioned to within 2 pixels (< 0.05′′)
after each roll, the dispersion of measurements obtained at
the ±0.1′′ offsets should provide a good characterization of
the errors that result from all of the changes encountered in
the positioning of the spectrum. The reproducibility of these
observations also provides a more realistic measurements of
the centroiding errors than those based on simple signal-to-
noise considerations. As a result of our observing strategy,
we obtained 3 observations at each of 3 rolls, for a total of
9 spectra, with exposure times of roughly 10 min each.
The orientations mentioned above are measured with
respect to the STIS coordinate system, which we define as
the x0 − y0 system. In this system, the dispersion direction
(from red to blue) makes an angle (measured in the c.c.
direction) of −1.4◦ with the x0 axis.
4 DATA REDUCTION
4.1 Centroids
The first step in the reduction process was to extract the
centroids. This presents a problem, since the STIS detector
does not oversample the HST PSF. However, since (as will
be explained shortly) only relative centroids will be needed,
we can accept some level of bias in the extraction process, as
long as it is consistent. This is because the ultimate measure-
ments will be differences of the centroids, which will cancel
small, uniform biases introduced in the extraction process.
We used three separate approaches to extract the cen-
troids, y(λ), from the raw images. We chose to analyze the
raw images (in their native “highres” 2048×2048 format) be-
cause initial experimentation showed that the geometrically
corrected images did little to improve the relative positions
of the centroids over the a range of 10 pixels or less (which
are the scales important to us). Thus it was felt best to
avoid the inevitable smoothing which is introduced by the
resampling involved in geometric corrections.
The first approach we used was a simple cross-
correlation technique relative to a set of 0.025′′ FWHM gaus-
sians. The second one employed a standard cross-correlation
technique using the cross dispersion profiles of a spectrum
of a standard star (the wd GD71) which was observed at
roughly the same position on the detector with the same
grating. We used sinc interpolation in the cross-correlation
to compensate for the undersampling of the PSF by the
MAMA detector. Finally, we used a non-linear least squares
fit to a set of gaussians whose FWHMs, central positions
and amplitudes were allowed to vary at each pixel. No sys-
tematic differences were found among all three approaches.
However, the results from the non-linearly extracted cen-
troids produced the results with the lowest pixel-to-pixel
scatter, and these were adopted for the following analysis.
The 3 sets of centroid measurements at each roll an-
gle were rebinned to 512 elements from the 2048 elements
available in the raw images, and these were used to con-
struct mean centroids at each roll and their standard devi-
ations. Because the centroids near the edges of the detec-
tor are poorly determined, of the 512 bined pixels (in the
wavelength direction) only about 490 are well-behaved. The
standard deviations for these 490 pixels determined for each
roll angle are over plotted as a function of wavelength in Fig-
ure 3. The RMS means for each roll angle are 0.027, 0.024,
and 0.027 pixels or (0.67, 0.59, and 0.67 mas). Remember,
these are the single observation standard deviations for a
single pixel, and there are 9 independent observations with
490 useful pixels. Notice also that this scatter is significantly
larger than the one expected from the simple signal-to-noise
arguments of the previous section. The reason is that the
actual uncertainties are set by random differences between
the photometric and geometric centroids of the pixels, and
by localized geometric distortions in the detector over the
range of a few pixels. Nevertheless, the repeatability of the
centroids (to a few percent of a pixel) is considered quite
good, and we will use this scatter to characterize the actual
measurement errors in the centroid positions.
Since the centroids are extracted from the raw images,
they contain large scale geometric distortions. Consequently,
we will analyze the relative centroids. To construct these, we
first combine the centroids determined at each offset for a
c© 2007 RAS, MNRAS 000, 1–??
AW Per 5
Table 1. Observation log
Obs ID Off Set Roll Obs Date Exp. Time Phasea V (B − V )
arc sec Deg. MJD - 52235 Min. Φ Mag. Mag.
o6f104010 +0.0 175.526 0.34765625 10.0 0.906 7.40 1.02
o6f104020 +0.1 175.526 0.35546875 10.0 0.907 7.39 1.01
o6f104020 −0.1 175.526 0.36328125 11.4 0.909 7.38 1.01
o6f105010 +0.0 205.000 0.41406250 10.0 0.916 7.34 1.00
o6f105020 +0.1 205.000 0.42187500 10.0 0.918 7.33 0.99
o6f105030 −0.1 205.000 0.42968750 11.4 0.919 7.32 0.99
o6f106010 +0.0 146.526 0.48046875 10.0 0.927 7.27 0.97
o6f106020 +0.1 146.526 0.48828125 10.0 0.928 7.26 0.97
o6f106030 −0.1 146.526 0.49609375 11.4 0.929 7.26 0.97
a Phase, V and (B − V ) are derived from sources in the literature, as discussed in the text.
Figure 3. Standard deviations of the three independent spectra
of AW Per obtained at each roll angle. The standard deviations
for each roll angle are over plotted.
particular roll angle to produce a mean centroid, 〈y〉, at each
roll. These measurements contain geometric distortions and
any systematic effects introduced by the centroid extrac-
tion technique. However, when we analyze the differences
between each individual mean and the grand mean of all
the observations, these systematic affects are removed. This
is because the offsets at each roll are larger than the dis-
placements from one roll to another, and the scatter that
the former exhibit (Fig. 3) demonstrates that localized ge-
ometric distortions are small. Similarly, any systematic af-
fects that result from mis-matches between the actual PSF
orthogonal to the dispersion and the gaussian used to deter-
mine the centroids will cancel out, since the same process is
used in each case.
Finally, we must account for the fact that y(λ) is not
exactly perpendicular to the dispersion. As a result, we must
divide the final displacements that we measure by cos(1.4◦).
4.2 Fluxes
STIS fluxes were extracted from the images using the CAL-
STIS IDL software package developed by Lindler (1998) for
Figure 4. Plots of the mean STIS spectrum of AW Per (solid
curve) together with the available IUE spectra (dotted), cali-
brated to the HST flux system.
the STIS Instrument Definition Team. In order to constrain
the B star flux contribution, we also incorporate the avail-
able IUE spectra (obtained when the Cepheid component
was near minimum light), into the analysis given in §5. The
IUE fluxes were placed upon the HST/STIS flux system us-
ing the transformations described by Massa & Fitzpatrick
(2000). Figure 4 compares the IUE and STIS spectra. It
is immediately clear that the IUE long wavelength spectra
were obtained when the Cepheid was near minimum light
(Φ = 0.53, Evans 1989), while the STIS observations were
near maximum light (Table 1). The effects of extinction are
also clearly apparent, as is the fact that the IUE fluxes are
a factor of 1.146 smaller than the STIS fluxes. This discrep-
ancy is a constant over the region of overlap, and its origin
is unknown. Consequently, we cannot be certain which set
of fluxes is correct. In §6 we show that this ambiguity intro-
duces a significant uncertainty into our results.
The variability of the Cepheid is clearly detectable in
the STIS spectra. Figure 5 shows STIS flux ratios for the
mean spectra obtained at the second and third roll angles
divided by the first. The time lapsed between the mean ob-
servations is 1.59 and 3.19 hours, respectively. This plot
c© 2007 RAS, MNRAS 000, 1–??
6 D. Massa and N. R. Evans
Figure 5. Plots of the ratios of mean STIS spectra of AW Per
obtained at the second and third roll angles divided by the mean
flux obtained at the first roll angle. These plots demonstrate how
the Cepheid component brightened over the 3.5 hour observing
sequence. Notice that the flux at the shortest wavelengths does
not change, since it is dominated by the B star secondary.
demonstrates two things. First, the Cepheid flux changed
significantly throughout the three HST orbits spanned by
the observations. Second, the flux ratios decrease with wave-
length, becoming unity at the shortest wavelengths. This is
contrary to what is normally seen in single Cepheids like
δ Cep (Schmidt & Parsons 1982) where the flux changes
typically increase with decreasing wavelength. Consequently,
this figure shows that the flux at the shortest wavelengths
is dominated by the B star, which does not vary.
The following analysis also requires the color and mag-
nitude of the system the time of the observations. We com-
bined the data from Szabados (1980), Moffett & Barnes
(1984), Szabados (1991), and Kiss (1998), using the period
and HJD for zero phase from Kiss (1998). The combined
data were fit with a high order polynomial, and this was
used to determine the V and (B − V ) photometry at the
times of the STIS observations. The resulting phases and
photometry are listed in Table 1.
5 ANALYSIS
5.1 Overview
Because our spectra cover a limited band-pass, we require
an estimate for the flux ratio of the binary components in
order to extract the wavelength dependence of the centroids.
This flux ratio is constrained, since it must also satisfy the
observed flux of the system, which is the reddened, com-
bined flux of the two binary components. Ideally, one would
fit the observed flux and centroid positions with a combi-
nation of single star spectra obtained with the same instru-
ment and which experience the same reddening. However,
because there is no such library of single star spectra avail-
able, we used an approach which employs a model for the
B star SED star and for the UV extinction to construct the
combined flux and the centroids. We then used a non-linear
least squares fitting procedure1 to fit the centroids and fluxes
simultaneously. This method is described in detail in § 5.3.
5.2 Model Components
We now describe the components of the model used to fit the
observations. In a few instances, refinements might increase
the accuracy, but in the interest of expediency, certain ef-
fects were ignored for the first attempt. First, we use Kurucz
(1991) Atlas 9 models with updated metallicities2 for the B
star. We use only models with a micro-turbulent velocity
of 2.0 km s−1. The synthetic photometry for the models
was calibrated as in Fitzpatrick & Massa (2005). We set
log g = 4.0 for the B star atmosphere. The sensitivity of our
results to this assumption is tested once a fit is achieved.
The model atmosphere fluxes were prepared in the man-
ner described by Fitzpatrick & Massa (2005), which is best
suited to the IUE fluxes. The dust model is quite general.
We use the Fitzpatrick (1999) formulation of the Fitzpatrick
& Massa (1990) model since we need a representation of the
near-UV extinction, and the original Fitzpatrick & Massa
(1990) formulation does not provide one. Although the Fitz-
patrick (1999) curve for the near UV is largely untested, it
is reasonable and the best currently available. To provide
additional flexibility to the Fitzpatrick model, we allow the
bump strength (c3), the width of the 2175 Å (γ) and far-UV
curvature term (c4) to vary independently. In this way, we
can accommodate any observed extinction curve. As a result,
the RV parameter (the ratio of visual extinction to color ex-
cess) only affects the general slope of the UV extinction and
the shape of the near-UV curve, and the wavelength depen-
dence of the total extinction to an object can be expressed
Aλ ≡ A[RV , E(B − V ), γ, c3, c4;λ] . (4)
5.3 Details of the Fitting Procedures
We simultaneously fit the STIS centroids at all three roll
angles and the IUE flux from the B star. We constrain the
reddened model for the B star by assuming that all of the
flux from the system for λ 6 1650 Å is due to the B star.
The difference between the observed flux and the reddened
B star model provides the Cepheid SED which is used in
fitting the centroids. The free parameters of the fit are: The
three ∆y(n) (displacements perpendicular to the dispersion
at each roll angle), Tseff (the effective temperature of the B
star secondary), [m/H]
(the abundance parameter for the
B star), E(B−V ) (the color excess of the system, consistent
with the fluxes), RV (which determines the slope of the UV
extinction curve), γ (the width of the 2175 Å bump), c3
(the bump strength), and c4 (the strength of the far UV
curvature) – 10 parameters in all. The V magnitude of the B
star, Vs, is fixed by the observed flux attributed to the B star
at λ = 1650 Å and the extinction at that wavelength relative
to V . In addition to the separations, the results also yield
an empirical, unreddened UV SED and photometry for the
1 We use the Markwardt non-linear IDL fitting procedure, avail-
able at http://astrog.physics.wisc.edu/∼craigm/idl/idl.html.
2 We used the the Kurucz “preferred models” available at
http://kurucz.harvard.edu/.
c© 2007 RAS, MNRAS 000, 1–??
http://astrog.physics.wisc.edu/~craigm/idl/idl.html
http://kurucz.harvard.edu/
AW Per 7
Cepheid. These can then be and compared to models or to
actual Cepheids. Since the derived Cepheid flux is identical
to the observed flux minus the B star flux for wavelengths
longward of 1650 Å, the flux in this region is fit exactly. The
equation used to fit the centroids is:
− θ2sN(Ts, log gs, vt, [m/H];λ)
θ2sN(Ts, log gs, vt, [m/H];λ)
and the unreddened flux of the Cepheid is given by
p = [N(λ)
− θ2sN(Ts, log gs, vt, [m/H];λ)]
×10A[RV ,E(B−V ),γ,c3,c4;λ] (6)
where θs is the angular diameter of the B star (fixed by the
flux at 1650Å) and n = 1, 2, 3 represents the observations
obtained at each roll angle, which are means of the data
for the three off-set positions. We cannot use a single mean
for the fluxes, since significant changes in V , (B − V ) and
the UV SED occur over the course of the observations (see,
Table 1, Fig. 5) and must be taken into account. However,
the data were averaged at each roll, since the time between
off-sets was much smaller than the time between rolls.
A major advantage of our approach is that it only relies
on a Kurucz Atlas 9 model for the B star, and recent work
by Fitzpatrick & Massa (1999, 2005) has demonstrated that
these provide excellent representations of low resolution B
star SEDs. Further, it avoids using the Atlas 9 models for the
Cepheid component, which is desirable since the accuracy of
Cepheid model atmospheres has not been fully tested, espe-
cially in the UV. This issue is addressed further in §6. The
disadvantage of our approach is that we must have extremely
well calibrated fluxes, and we have already seen an inconsis-
tency between the poorly exposed IUE fluxes and the STIS
data.
5.4 Determining the Separations
The final step in the analysis is to fit the angular separations
derived at each roll angle to a sine curve whose phase and
amplitude are related to the position angle and separation
of the binary (eq. 2). The amplitude of the curve is the full
separation of the system and the phase is the position angle
of the system on the sky. The abscissa of the plot is the
position angle in the x − y system, which is equal to the
values listed in Table 1 minus 1.41◦ (which accounts for the
rotation to align the spectra with the y axis). Figure 6 shows
the definitions of the different angles used in the analysis,
and their relations to one another.
5.5 Weights
The non-linear least squares involves fitting an array which
consists 3 sets of centroids and the IUE fluxes all at once.
To perform the fit, we must provide errors for the different
components of this array. The measurement errors affecting
the centroids were obtained from the standard deviations
of the three independent sets of measurements obtained at
each offset position. For the IUE data, we used the error
vector which accompanies the MXLO fluxes (see, Nicholes
& Linsky 1996).
Figure 6. Diagram showing the definitions of the different angles
and coordinate systems used in the analysis, and their relations to
one another. The position angle on the sky of the binary angle, φ,
is defined as the angle measured the c.c. from north to east, with
the primary at the origin. The x− y system is the standard STIS
coordinate system, with x parallel to the dispersion (increasing
in the direction of increasing wavelength) and y perpendicular
to it. The angle α (also measured the c.c. from north to east) is
defined as the angle between North and x for a given telescope
orientation. Thus, φ− α is the angle between the dispersion and
a line connecting the binary components and ∆y = θ sin(φ−α) is
the displacement of the two spectra of the binary perpendicular
to the dispersion. If φ− α = 0 or ±180◦, then ∆y = 0.
6 RESULTS
In fitting the data, we assumed a microturbulent velocity
of 2.0 km s−1, which is typical for main sequence B stars
(e.g., Fitzpatrick & Massa 2005). Because the B star is over-
whelmed by the Cepheid in the optical and near-UV, we
do not have access to the classical log g diagnostics for B
stars, namely, the Balmer jump and Balmer lines. Conse-
quently, we fixed the surface gravity at 4.0, again typical for
main sequence B stars. We allowed the abundance parame-
ter, [m/H]
, and the effective temperature of the B star to be
optimized by the least squares routine, along with the ∆y’s
and the extinction parameters. In addition, we assumed that
the IUE fluxes were correct (so the STIS fluxes were divided
by 1.146 to make them agree with the IUE data). In apply-
ing our model, we also assume that all of the STIS flux in
a 30 Å band centered at 1650 Å is due to the B star. We
shall examine the effects of our assumptions shortly. Only
the IUE fluxes between 1250 and 1700Å are incorporated
into the fit of the SED, which constrains the physical prop-
erties of the B star. This extends slightly beyond the 1650Å
limit used for the STIS data, but recall that the IUE data
were obtained when the Cepheid was near minimum light,
and nearly a factor of two fainter in the UV (see, Fig. 4).
The parameters determined from the fit are given in Ta-
ble 2, where parameters that were fixed in the fit are enclosed
in parentheses. Figure 7 shows our fits to the centroids. The
points are the observed data and the solid curves are the
fits obtained simultaneously with the fit to the fluxes. The
effects of spectral features on the centroids are clearly seen.
c© 2007 RAS, MNRAS 000, 1–??
8 D. Massa and N. R. Evans
Table 2. Parameter Values
Parameter Value Parameter Value
∆y1 −0.010 c3 4.13
∆y2 0.279 c4 0.82
∆y3 −0.269 γ 0.9686
[6297] Vs (11.084)
15735 (B − V )s0 (−0.156)
log gp (4.00) (U −B)
0 (−0.597)
log gs [1.60] Vp (7.362)
[m/H]p [0.00] (B − V )
0 (0.494)
[m/H]s -0.20 (U −B)
0 (0.359)
E(B − V ) 0.53 ∆ logL (0.95)
R(V ) 3.11
Values in parenthesis were not involved in the fitting procedure.
Values in square brackets were determined from a fit to the
Cepheid SED derived from the initial fit.
Figure 8 shows the fit to the SED below 1650Å. We do not
show the fit to the binary SED longward of 1650Å since it
is, by definition, exact. The extinction curve derived from
the best fit is also shown in Figure 8, where it is compared
to a standard RV = 3.1 curve from Fitzpatrick (1999).
We can also estimate the physical parameters of the
Cepheid component of the binary by fitting its mean SED
inferred from fit. This SED is found by subtracting the red-
dened B star model from the observed SED of the system
and then correcting this difference for the effects of extinc-
tion. The unreddened SED plus its V , (B−V )0 and (U−B)0
(also inferred from the fit) were then fit to an Atlas 9 model.
The V , (B − V ) and (U −B) photometry were initially as-
signed errors of 0.02, 0.01 and 0.02 mag, respectively. In
performing this fit, we fixed the micro-turbulent velocity at
2 km s−1, and allowed T
eff (the effective temperature of the
primary), log gp (the surface gravity of the primary) and
[m/H]
(the abundance of the primary), to vary. We had
to restrict the surface gravity to be larger than 1.6, other-
wise the fitting routine would seek log gp values that were
unrealistically small (we expect a log gp ≃ 2.0, e.g., Evans
1994). Furthermore, we had to increase the weight (decrease
the error) of the (B − V ) photometry by a factor of 10 in
order to obtain reasonable agreement with the photometry.
Figure 9 compares the unreddened SED of the Cepheid to
the best fit model. The parameters derived from the fit are
also listed in Table 2 and are enclosed in square brackets,
to distinguish them from the parameters derived from the
initial fit to the data.
It is also possible to test the reasonableness of the in-
ferred UV Cepheid SED by comparing it to IUE observa-
tions of the single Cepheid star δ Cep. δ Cep has a period
of 5.4 days, compared to 6.5 days for AW Per, and its mean
unreddened color is 〈(B − V )〉 = 0.57. To obtain the in-
trinsic color of AW Per, we use our derived color excess
for the system and the intrinsic colors of the B star sec-
ondary from Table 2 and the mean magnitude of the system,
〈V 〉 = 7.49 mag, to correct the observed mean color of the
system, 〈(B − V )〉 = 1.06 mag, for both extinction and the
presence of the companion. The result is 〈(B−V )p0〉 = 0.57,
identical to that of δ Cep (recall that the intrinsic color
we derive for AW Per is at Φ ≃ 0.92). Thus, the compar-
ison between these two stars is expected to be quite good.
Figure 7. Fits to the mean centroids at each roll angle for AW
Per. Each mean centroid was fit simultaneously with the corre-
sponding fluxes, optical photometry and interstellar extinction.
A Kurucz model was used to fit the B star component, and the
Cepheid flux was taken to be the difference between the reddened
B star model and the observed flux.
The bottom plot in Figure 9 compares the unreddened IUE
data (points) for δ Cep from several exposures obtained
for 0.9 6 Φ(δCep) 6 1.0 to the unreddened Cepheid STIS
spectrum (solid curve) of AW Per. Several IUE exposures
are required to produce the δ Cep spectrum since the dy-
namic range of IUE was so limited and the range of the
UV SED of δ Cep is so large. The IUE data had the Massa
& Fitzpatrick (2000) corrections applied, were dereddened
by an E(B − V ) = 0.09 (Dean et al. 1987) and scaled by
10−0.4(7.37−3.54) , which corresponds to magnitude difference
of AW Per at Φ = 0.92 (the mean for the STIS data) and
δ Cep at Φ = 0.95 (the mean of the IUE data).
Finally, we utilize the ∆y(n) which resulted from the fits
to derive the separation of the system and its position angle
on the sky. These are found by fitting eq. (2) to the plot
of ∆y versus roll angle shown in Figure 10. The error bars
at each orientation are the quadratic mean errors for that
roll determined from the dispersion in the fits to the three
individual sets of observations obtained at each orientation
(see, next section). The inverse of the errors squared were
c© 2007 RAS, MNRAS 000, 1–??
AW Per 9
Figure 8. Top: Best fit B star (thin curve) compared to the IUE
(points) and STIS (thick curve) fluxes. The model includes red-
dening. We only show the far-UV region, since the fit is, by defini-
tion, exact for wavelengths longward of 1650Å. Bottom: AW Per
extinction curve determined by the simultaneous fit of the flux
and centroids (solid curve) compared to a standard RV = 3.1
curve (dotted) from Fitzpatrick (1999).
used to weight the fit. The final result of the analysis is a
separation of θ = 13.74 ± 0.26 mas and a position angle
φ = 184.16 ± 1.94 deg, for an accuracy of ∼ 2%.
6.1 Errors in the parameters
In this section, we describe the internal, random, errors af-
fecting our parameter determinations, and also examine the
influence of systematic effects upon the results.
The random errors were evaluated in two, independent
ways. One is the error estimates calculated by the least
squares routine, which are determined by evaluating deriva-
tives of the model. These errors are listed in the second col-
umn of Table 3. We also obtained error estimates by fitting
the sets of observations obtained at the same off-set at each
roll angle, independently. These provide 3 sets of indepen-
dent observations and we used the parameters determined
from each set to obtain standard deviations (S.D.s) of the
model parameters. These estimates (divided by
3 applica-
ble to the error in the mean) are listed in the third column
of Table 3. Notice that the errors in the ∆y(n) determined
from the S.D.s are nearly twice as large. To be conservative,
Figure 9. Top: Inferred dereddened Cepheid SED (points) com-
pared to the best fitting Kurucz model (solid) and the dered-
dened flux of the best fit B star (dashed). Bottom: Comparison
of the unreddened Cepheid flux (solid curve) and an unreddened
IUE spectrum (dots) of δ Cep observations for 0.90 6 Φ 6 0.95.
The δ Cep flux is scaled by the difference between V = 3.54 at
Φ = 0.925 for δ Cep and V = 7.37, the magnitude of the primary
in AW Per at Φ = 0.92 (the mean phase of the STIS observations).
As discussed in the text, the δ Cep spectrum is a combination of
several IUE spectra.
these errors were used as the error shown in Figure 10 and
in determining the errors in θ and φ.
Beside the random (or measurement) errors, systematic
effects will also be present. We characterize these by varying
the different assumptions which enter the fitting procedure,
and then examining their influence on the result. To begin,
we varied the assumed value of log g used to fit the B star
by ±0.5, which should encompass all plausible values. The
result (the difference divided by 2) is listed in column 4 of
Table 3. Next, we tested the affect of assuming that the
STIS (and not the IUE) fluxes are correct and allowed for
the possibility that the B star accounts for only 95%, instead
of 100% of the flux at 1650Å. These results are listed in the
last two columns of Table 3
As can be seen from Table 3, the varying the log g can
cause a significant change in Teff
s, but has little effect on
the ∆y(n), which are the object of our analysis. In fact, the
only significant change in the ∆y(n) result from our inability
to determine whether the STIS or IUE fluxes are correct,
c© 2007 RAS, MNRAS 000, 1–??
10 D. Massa and N. R. Evans
Table 3. Errors
Param. Prog. S.D. |δ log g| |δ
fSTIS
fP+fs
∆y(1) 0.004 0.015 1.4× 10−4 6.5× 10−6 5.0× 10−4
∆y(2) 0.005 0.017 0.0019 1.4× 10−4 0.015
∆y(3) 0.005 0.014 0.0021 1.3× 10−4 0.015
248 105 1205 9.1 37
[m/H]
0.057 0.025 7.5× 10−5 0.0016 0.0064
E(B − V ) 0.001 0.038 0.018 0.0026 0038
RV 0.031 0.12 0.11 0.0090 2.7× 10
γ 0.015 2.4× 10−4 0.019 8.6× 10−4 0.0025
c3 0.14 0.32 0.049 0.029 0.0055
c4 0.019 0.066 0.014 4.3× 10
−3 0.0068
Figure 10. Determination of the angular separation of AW Per.
The observational errors for ∆y were determined from individual
fits to the 3 independent offset observations at each roll angle.
and even these errors are only of the same order of the errors
determined from the repeated observations. As a result, we
conclude that the angular separation determined from our
analysis is very robust to variations in the assumptions or
input parameters.
7 DISCUSSION
We have seen that the separation determined from the fit
is quite stable. We now discuss the physical parameters de-
termined from our fits (Table 2), their reliability and their
implications.
We first consider the Cepheid SED derived from the
fit. It is compared to the best fitting Atlas 9 model in top
panel of Figure 9. This “best fitting” model is not a very
good fit, since it lies systematically below the observed flux
in far-UV flux and over it in the near-UV flux. Furthermore,
the agreement with the optical photometry is not very good.
The model predicts V = 7.362, (B − V ) = 0.470 and (U −
B) = 0.309. The the agreement with the (B − V ) color
given in Table 2 is fair, but recall that it was given a very
large weight. The agreement with the inferred (U−B) is not
very good at all. The poor overall fit probably results from
the short comings of Atlas 9 models for Cepheids discussed
below.
The bottom panel of Figure 9 compares the unreddened
SED of the Cepheid component of AW Per to the unred-
dened SED of the single Cepheid, δ Cep at approximately
the same phase. This figure demonstrates three points. First,
the two SEDs agree surprising well. Second, the strong far-
UV flux in the derived SED relative to the models is also
present (and slightly larger) in δ Cep, so the derived SED is
quite reasonable. Third, the flux in δ Cep is extremely small
for wavelengths shortward of 1650Å, bolstering our assump-
tion that all of the flux in AW Per observed below 1650Å is
due to the B star secondary.
So, why is the Atlas 9 model fit of the Cepheid so poor?
One must remember that Cepheid UV SEDs depend on nu-
merous, ill-defined physical processes that are not fully in-
corporated into the Atlas 9 models. These include spherical
extension, which can enhance the UV flux from an atmo-
sphere (see Fig. 4 in Hauschildt et al. 1999), chromospheres
(e.g., Sasselov & Lester 1994), the amount of convective en-
ergy transport (Castelli, Gratton, & Kurucz 1997) and the
details of the line blanketing (Prieto, Hubeny, & Lambert,
2003). In addition, there are inevitably dynamical effects
that are not treated by the models.
In fact, we initially attempted to fit the data with using
an approach that employed models for both the Cepheid and
the B star. However, we abandoned it because it produced
poor fits and the separations that were ∼ 10% larger than
those derived from the adopted technique. The origin of the
systematic difference in the centroids can be traced to the
gradient in the flux residuals seen in the top of Figure 9.
These propagate into the fits of the centroids. Perhaps the
use of more detailed Cepheid models could solve this prob-
In spite of these difficulties, it is of interest to exam-
ine the physical parameters determined from the Cepheid
model. To begin, Teff of the best fit model agrees reason-
ably well with previous estimates for Cepheid temperatures
near maximum light (Evans & Teays 1996, Fry & Carney
1999, Kovtyukh & Gorlova 2000). On the other hand, the
fit selects a very low surface gravity and would have set-
tled on an even lower value if it had been allowed to do so.
It is also interesting that the Cepheid model has a signif-
icantly different metallicity than the B star. However, this
may not be too strange. Instead, it may simply reflect the
fact that the [m/H] parameter in cooler models responds
more to spectral features produced by CNO elements, while
the same parameter in the B stars responds to the Fe abun-
dance (Fitzpatrick & Massa 1999).
c© 2007 RAS, MNRAS 000, 1–??
AW Per 11
Next, we consider the parameters determined for the
B star. The model fit to the far-UV (Fig. 8) is quite good,
and the extinction curve, while distinctly different from the
canonical RV = 3.1 curve, is rather unremarkable, with
parameters well within normal bounds (e.g., Fitzpatrick &
Massa 1990, Valencic et al. 2004). Also, the [m/H] for the
B star is well within the expected range for such stars (e.g.,
Fitzpatrick & Massa 1999, 2005) and the inferred color ex-
cess is quite close to previous determinations (Evans 1994).
It should not be surprising that these fits are so good, since
both the extinction model and the ability of the Atlas 9 mod-
els to describe normal B star spectra are well documented.
Notice that Teff we derive is considerably hotter than previ-
ously estimated by Evans (1994), and lies somewhat closer
to the ZAMS (see, Fig. 7 in Evans 1994). However, its prob-
able mass, ∼ 5M⊙ (based on its Teff , Andersen, 1991), re-
mains significantly less than the lower limit of ∼ 6.6M⊙
determined from the radial velocity orbit of the primary by
Evans et al. (2000). Thus, it still appears likely that the B
star component of AW Per must also be a binary.
8 SUMMARY
We have shown that the signatures of the Cepheid and B
star components of AW Per are clearly present in the wave-
length dependence of the centroid of its spectrum. This
result demonstrates the power of our approach. A simple
model was devised to extract the angular separation of the
binary from the centroid measurements. The accuracy of
the angular separation is ∼ 2%, or ± a few ×10−4′′! We also
demonstrated that the results are extremely stable to vari-
ations in the expected systematic effects in the data and its
analysis. We also showed that one possible source of uncer-
tainty in the current data is the absolute level of the far-UV
data. Higher quality far-UV observations to secure the B
star flux level and secure its parameters would be extremely
useful.
Our final results are listed in Table 2. In addition to
the angular separations and position angle, these include
a Cepheid temperature and systemic extinction that agree
with previous estimates and a B star secondary temperature
that is considerably hotter than previously thought (e.g.,
Evans, 1994). However, the likely mass of the secondary
still appears too small to account for the minimum mass
of the secondary inferred by the radial velocity of the pri-
mary. Consequently, it is likely that the B star component
of AW Per is also be a binary.
Finally, the long period of AW Per’s orbit means that
it will be a few years before the separation changes enough
for the second independent observation needed to determine
sin i can be obtained.
ACKNOWLEDGMENTS
We would like to thank Karla Peterson and Charles Proffit
of STScI, who provided valuable guidance in preparing the
observations. This work was supported by NASA grants to
SGT, Inc. and SAO.
REFERENCES
Andersen, J. 1991, A&A Rev, 3, 91
Beckers, J.M. 1982, Opt. Acta, 29, 361
Benedict, G.F., McArthur, B.E., Fredrick, L. W., et al.
2002 AJ, 124, 1695
Castelli, F., Gratton, R.G. & Kurucz, R.L. 1997, A&A,
318, 841
Dean, J.F., Warren, P.R. & Cousins, A.W.J. 1987, MN-
RAS, 183, 569
Evans, N.R. 1989, AJ, 97, 1737
Evans, N.R. 1994, ApJ, 436, 273
Evans, N.R. 1995, ApJ, 445, 393
Evans, N.R. & Teays, T.J. 1996, AJ, 112, 761
Evans, N.R., Bohm-Vitense, E., Carpenter, K., Beck-
Winchatz, B. & Robinson, R. 1998, ApJ, 494, 768
Evans, N.R., Vinko, J. & Wahlgren, G.M. 2000, AJ, 120,
Evans, N. R., et al. 2007, in Hartkopf, W.I. Guinan, E.F.
& Harmanec, P. eds., “Binary Stars as Critical Tools and
Tests in Contemporary Astrophysics”, IAU. Symp 240,
Fitzpatrick. E.L. 1999, PASP, 111, 63
Fitzpatrick, E.L. & Massa, D. 1990, ApJS, 72, 163
Fitzpatrick, E.L. & Massa, D. 1999, ApJ, 525, 1011
Fitzpatrick, E.L. & Massa, D. 2005, AJ, 129, 1642
Fry, A.M. & Carney, B.W. 1999, AJ, 118, 1806
Groenewegen, M.A.T. & Oudmaijer R.D. 2000, A&A, 356,
Hauschildt, P.H., Allard, F., Ferguson, J., Baron, E. &
Alexander, D.R. 1999, ApJ, 525, 871
Kervella, P., Fouqué, P., Storm, J., Gieren, W.P., Bersier,
D., Mourard, D., Nardetto, N., Foresto, V. 2004, ApJ,
604, L113
Kovtyukh, V.V. & Gorlova, N.I. 2000, A&A, 358, 587
Kiss, L.L. 1998, MNRAS, 297, 825
Kurucz, R.L. 1991, in “Stellar Atmospheres: Beyond Classi-
cal Models”, ed. L. Crivellari, I. Hubeny, & D.G. Hummer
(NATO ASI Ser. C.; Dordrect: Kluwer), 441
Lindler, D. 1998, CALSTIS Reference Guide (CALSTIS
Version 5.1) (Greenbelt, MD:GSFC)
Massa, D. & Endal, A.S. 1987, AJ, 93, 760
Massa, D. & Fitzpatrick, E. L. 2000, ApJS, 126, 517
Moffett, T.J., & Barnes, T.G. 1984, ApJS, 55, 389
Nichols, J.S. & Linsky, J.L. 1996, AJ, 111, 517
Prieto, C.A. , Hubeny, I. & Lambert, D.L. 2003, ApJ, 591,
Porter, J.M., Oudmaijer, R.D.& Baines, D. 2004, A&A,
428, 327
Kim Quijano, J., et al. 2003, “STIS Instrument Hand-
book”, Version 7.0, (Baltimore: STScI).
Romaniello, M., Primas, F., Mottini, M., Groenewegen, M.,
Bono, G. & Franois, P. 2005, A&A, 429L, 37
Sasselov, D.D. & Lester, J.B. 1994, ApJ, 423, 795
Schaller, G., Schaerer, D., Meynet, G., & Maeder, A. 1992,
A&AS, 96, 269
Schmidt, G.S. & Parsons, S.B. 1982, ApJS, 48, 185
Simon, N.R. 1990 in Cacciari, C. & Clementini, G. eds.,
ASP Conf. Ser., 11, Confrontation between stellar pulsa-
tion and evolution, p193
Szabados, L. 1980, Commun. Konkoly Obs. Hung. Acad.
Sci., Budapest, No.76
c© 2007 RAS, MNRAS 000, 1–??
12 D. Massa and N. R. Evans
Szabados, L. 1991, Commun. Konkoly. Obs. Hung. Acad.
Sci., Budapest, No.96
Valencic, L.A., Clayton, G.C., & Gordon, K.D. 2004, ApJ,
616, 912
APPENDIX A: MATHEMATICAL DETAILS
This appendix provides a detailed derivation of how the
wavelength dependence of the centroid of the a dispersed
image can be used to determine the separation of a binary
whose components have different colors.
Consider the set of angular coordinates x and y which
are parallel and perpendicular to the dispersion, respec-
tively, with x increasing in the direction of increasing wave-
length (this is the standard STIS coordinate system, Kim et
al. 2003). Now, let h(y) be the instrumental profile in the
cross-dispersion direction, y. Then the spectrum of a single
star located at y = y0 can be expressed as
f(λ, y) = N(λ)h(y − y0) (A1)
where λ = λ(x), andN(λ) is the photon flux at λ (we assume
infinite resolution in the wavelength direction).
If the spectra of the primary and secondary components
of the binary are centered at yp, and ys, then their spectra
are separated by ∆y = ys − yp, and the image of the binary
spectrum is given by
f(λ, y) = Np(λ)h(y − yp) +Ns(λ)h(y − yp −∆y). (A2)
If ∆y is small compared to structure in h(y), this equation
can be approximated by
f(λ, y) ≃ Np(λ)h(y − yp)
+Ns(λ)
4h(y − yp) +∆y
dh(y)
y=y−yp
= [Np(λ) +Ns(λ)]×
4h(y − yp) +
Ns(λ)
Np(λ) +Ns(λ)
dh(y)
y=y−yp
≃ [Np(λ) +Ns(λ)]×
∆yNs(λ)
Np(λ) +Ns(λ)
Therefore, the wavelength dependence of the centroid of the
spectrum will vary as
y(λ) = yp +
∆yNs(λ)
Np(λ) +Ns(λ)
= yp +
1 +R(λ)
where R(λ) = Np(λ)/Ns(λ) is the flux ratio of the binary
components.
Now, the separation ∆y depends on both the separation
of the binary, θ, and the orientation of the system relative
to the dispersion direction. The position angle on the sky
of the binary, φ, is defined as the angle measured the c.c.
from north to east, with the primary at the origin. The angle
α(n) (also measured the c.c. from north to east) is defined
as the angle of a line in the dispersion direction pointing in
the direction of increasing wavelength for the nth telescope
orientation. In this case, φ − α(n) is the angle between the
dispersion and a line connecting the binary components and
∆y(n) = θ sin(φ−α(n)) is the displacement of the two spectra
of the binary (note that when φ − α = 0, ±180◦, ∆y = 0).
Therefore, the observation obtained with the telescope in
the nth orientation can be expressed as
p + θ sin(φ− α(n))[1 +R(λ)]−1 (A5)
where y
p is the wavelength independent displacement of
the nth exposure in y.
To extract both θ and φ from the observed centroids, at
least two observations at different α’s are required. There-
fore, as long as the relative fluxes of the binary components
are known, a linear regression of the wavelength dependence
of the centroid against [1+R(λ)]−1 gives ∆y(n) for that ob-
servation. The y
p are constant terms related to the abso-
lute position of the primary star, although in practice the
they cannot be reliably disentangled from the large random
errors in the absolute position of the binary on the detector
at each orientation.
Once the ∆y(n) are determined for each orientation,
these are plotted against the known quantities, α(n). Since
= θ sin(φ− α(n)) (A6)
fitting a sine function to the ∆y(n) as a function of the α(n)
determines φ and θ, the observables of an astrometric binary
at the epoch of the observations.
c© 2007 RAS, MNRAS 000, 1–??
	INTRODUCTION
	THE APPROACH(Cross-Dispersion Imaging)
	Basic Principles
	Exposure Times and Random Errors
	THE OBSERVATIONS
	DATA REDUCTION
	Centroids
	Fluxes
	ANALYSIS
	Overview
	Model Components
	Details of the Fitting Procedures
	Determining the Separations
	Weights
	RESULTS
	Errors in the parameters
	DISCUSSION
	SUMMARY
	MATHEMATICAL DETAILS
ABSTRACT
  The 6.4 day classical Cepheid AW Per is a spectroscopic binary with a period
of 40 years. Analyzing the centroids of HST/STIS spectra obtained in November
2001, we have determined the angular separation of the binary system. Although
we currently have spatially resolved data for a single epoch in the orbit, the
success of our approach opens the possibility of determining the inclination,
sini, for the system if the measurements are repeated at additional epochs.
Since the system is potentially a double lined spectroscopic binary, the
combination of spectroscopic orbits for both components and the visual orbit
would give the distance to the system and the masses of its components, thereby
providing a direct measurement of a Cepheid mass.

<|endoftext|><|startoftext|>
arXiv:0704.1052v2  [cond-mat.mes-hall]  2 Aug 2007
Transverse field effect in graphene ribbons
D. S. Novikov
W. I. Fine Theoretical Physics Institute, University of Minnesota, Minneapolis, MN 55455
(Dated: July 31, 2007)
It is shown that a graphene ribbon, a ballistic strip of carbon monolayer, may serve as a quan-
tum wire whose electronic properties can be continuously and reversibly controlled by an externally
applied transverse voltage. The electron bands of armchair-edge ribbons undergo dramatic transfor-
mations: The Fermi surface fractures, Fermi velocity and effective mass change sign, and excitation
gaps are reduced by the transverse field. These effects are manifest in the conductance plateaus, van
Hove singularities, thermopower, and activated transport. The control over one-dimensional bands
may help enhance effects of electron correlations, and be utilized in device applications.
PACS numbers: 73.23.-b 73.63.-b 72.80.Rj
Building nanoscale systems with pre-determined prop-
erties has long been the focus of basic and applied re-
search. Progress in this field is tied to the recent ad-
vancements in the synthesis of quantum nanowires [1]
and quantum dots [2] via control of the growth process,
as well as in the growth and selection of carbon nanotubes
[3]. The characteristics of these devices, however, are set
by design and are typically difficult to modify. Ideally,
one would like to be able to tune the system’s properties
reversibly after synthesis.
In the present work we suggest that a graphene ribbon
(GR), a ballistic strip of recently discovered [4] carbon
monolayer, may serve as a quantum wire whose electronic
properties can be continuously and reversibly controlled
by the external transverse voltage. The setup makes use
of the massless relativistic electron dispersion in graphene
[4–6], with the valence and conduction bands touching at
a conical Dirac point [3, 7].
Electron dispersion in GRs varies depending on their
chirality, as the transverse confinement of Dirac fermions
is sensitive to the boundary conditions [8–16]. This
has prompted proposals for GR applications as field-
effect transistiors [17] and valley filters [18]. Further-
more, GRs have been suggested as a host of interesting
many-body phenomena, including spin polarization on
the edges [19, 20], and as a basis for building coupled
electron spin qubits [21]. Recently spectral gaps in GRs
have been measured [22], scaling approximately inversely
with the ribbon width.
The basic idea of the present proposal is that the prop-
erties at the Dirac point are fragile and can be affected by
external fields [23]. Not suprisingly, the proposed strong
electric field effect is similar to that considered for carbon
nanotubes [24–26]. Unfortunately, small radiusR ∼ 1 nm
of single-walled tubes requires very large transverse fields
E[MV/cm] ≃ 25/R2[nm]; this has so far hindered observation
of band transformation. Remarkably, with GRs, the rib-
bon width (that plays the role of the tube circumference)
may vary in a broad range, L ∼ 10−200 nm [22], and the
effects of strong band transformation become realistic.
In the setup shown in Fig. 1, electrons in a GR are con-
fined along the x axis, while the longitudinal momentum
ky ≡ k is conserved. The effect of the applied transverse
voltage V is to induce the potential
U(x) = −eE(x− L/2) . (1)
The acting field E ∝ Eext ∝ V can be assumed uni-
form and proportional to the external field Eext as long
as the bands are not strongly mixed (as described be-
low); e is the unit charge. We subtracted the average,
setting
dxU(x) = 0 (the subtracted constant adds to
the chemical potential controlled by the gate voltage Vg).
The natural units for the transverse voltage and energy
u = eEL/∆L , ∆L = h̄v/L ≃ 0.7 eV/L[nm] , (2)
where v ≃ 106m/s is graphene’s Fermi velocity. The
ballistic limit of transport is implied.
We now give an overview of the results. The electron
band transformation is shown in Figs. 2 and 3. The lon-
gitudinal electron bands change qualitatively when the
FIG. 1: (color online). The setup. Left and right electrodes
carry the voltage ±V/2, producing the external field Eext. The
carrier concentration is controlled by the back gate voltage Vg.
http://arxiv.org/abs/0704.1052v2
−10 −5 0 5 10
Wave vector  kL along the ribbon
10 20 30
FIG. 2: (color online). Transverse field effect in metallic GRs.
Inset: Velocity reversals in metallic GRs occur at the zeros
of the function g(u), Eq. (9); first reversal voltage u1 ≈ 9.2.
Fine lines show the u ≪ 1 and u ≫ 1 asymptotic behavior
of g(u). Top: The voltage u = 15 above the reversal value
u1. The Fermi surface acquires a pair of small pockets. Small
gaps at k = 0 are due to imperfect boundaries [20]. Bottom:
Landauer conductance G (bold integers and colors) in the
units of G0 = 2e
2/h, as the number of transverse modes at the
Fermi energy Vg. Dashed cut corresponds to the top panel.
dimensionless transverse voltage approaches u ∼ 10. A
number of effects follow:
(i) The Landauer conductance [27] is quantized in the
units of G0 = 2e
2/h, similar to that in point contacts in
GaAs [28]. The crucial difference is that now the posi-
tions and widths of the plateaus can be controlled by the
transverse voltage. The sharp steps in Figs. 2 and 3 in
real systems will be smoothened out by finite tempera-
ture or weak disorder, while the conductance values on
the plateaus will remain equal to the quantized values.
−4 −3 −2 −1 0 1 2 3 4
Wave vector  kL  along the ribbon
2 ∆(u) 
5 10 15
FIG. 3: (color online). Transverse field effect in semiconduct-
ing GRs. For voltages u > ut ≈ 4.5 above threshold the
effective mass at k = 0 is negative. Top: Bands transforma-
tion, starting from u = 0 (dotted), to just above threshold
u = 6 (solid), to above threshold, u = 10 (dashed, only lowest
subbands shown), where the gap is minimal at k 6= 0. In-
set: Gap suppression occurs above the threshold ut. Bottom:
Landauer conductance plateaus in the units of G0 = 2e
(ii) The thermopower S ∝ −∂ lnG/∂Vg, being propor-
tional to the conductance derivative [29], peaks at the
borders between the domains in Figs. 2 and 3 (bottom).
(iii) The Fermi velocity in metallic GRs is reduced by
the field, v → vg(u) [Fig. 2 inset and Eq. (9) below]. As
a consequence, the one-dimensional density of states at
the band center ν(0) = 2/{πh̄v|g(u)|} increases [factor 2
is due to spin degeneracy]. This increase magnifies the
effects of electron interactions. The latter may manifest
themselves via the increase of the Luttinger liquid expo-
nent in a sufficiently long ribbon, and through excitonic
instabilities (resulting in interaction-induced gaps).
(iv) The Fermi velocity changes sign for the field val-
ues corresponding to zeroes of g(u), causing strong van
Hove singularities in metallic GRs. The Fermi surface
fractures, with each sign change adding a pair of small
pockets to the Fermi surface (Fig. 2, top). This effect
produces extra conductance plateaus (Fig. 2, bottom).
(v) There is a threshold voltage ut ≃ 4.5 above which
the effective mass of the lowest energy subband in semi-
conducting GRs changes sign, so that the longitudinal
electron dispersion acquires symmetric minima at small
but nonzero k (Fig. 3). The excitation gap is then re-
duced by the field (Fig. 3 inset). This effect can be de-
tected in the shift of the conductance plateaus (Fig. 3,
bottom), and in the activated transport measurements.
(vi) The band structure remains electron-hole symmet-
ric at any field for both metallic and semiconducting GRs
due to the Dirac symmetry of the problem. Thus the
conductance plots of Figs. 2 and 3 are independent of
the polarity of the gate and transverse voltages.
Turning to possible applications, the setup may serve
as a field-effect transistor with a tunable working point,
in which the “transverse”, V , and the “normal”, Vg, field
effects can be utilized separately. Furthermore, one may
selectively amplify combinations α(V −V 0)+β(Vg−V 0g ),
α = ∂G/∂V |V and β = ∂G/∂Vg|V by choosing an appro-
priate working point V = (V 0, V 0g ) on the edge of the
conductance plateau. Tuning the parameters to achieve
a large gain for say, V − Vg, combined with the device’s
large input and low output impedance, is reminiscent of
an operational amplifier. By the same token, strong con-
ductance nonlinarity in both inputs V and Vg may render
this setup into a few-nm size signal multiplier, or even
into a logic gate.
We now outline the details of the calculation. At the π-
band center (ǫ = 0), the electron dispersion is determined
by the two inequivalent Dirac points in the Brillouin zone.
The low-energy states Ψ(r) = eiKxψ+(r) + e
−iKxψ−(r)
are represented [3] in terms of the smoothly varying en-
velope ψ = {ψ+, ψ−} that consists of the pair of the two-
component spinors ψ+ and ψ− with values on the two
sublattices of the honeycomb lattice [here K = −4π/3a0,
where a0 =
3acc is the graphene lattice constant, and
acc = 0.144 nm is the Carbon bond length]. The dy-
namics of the envelope is governed by the Dirac equation
Hψ = ǫψ, with the effective Hamiltonian
, H± = ±ih̄vσ1∂x − h̄vkσ2 + U(x) ,
where σ1,2 are the Pauli matrices. The boundary condi-
tions Ψ(r)|x=0,L = 0 at the armchair edges dictate [14]
ψ+(0) + ψ−(0) = 0 , ψ+(L) + e
iφnψ−(L) = 0 , (4)
where the phase φn = KL = − 2π3 (n + 1 − δ), and
L = 1
(n+1− δ)a0 is the effective ribbon width (the dis-
tance between the sites on which Ψ vanishes). The phase
φn may incorporate corrections coming from imperfect
edges, similar to the curvature-induced corrections in
nanotubes [30]. (For example, the δt/t ≈ 0.12 change
in the hopping amplitude at the edges due to the passi-
vated bonds [20] reduces the effective width L and the
boundary phase φn by the amount ∝ δ = 3
≈ 0.20.)
The system (3) and (4) is solved numerically (Figs. 2
and 3) via the transfer matrix approach similar to that
of Refs. [24, 25, 31]. Eq. (3) is equivalent to
∂xψ± = ±Pψ± , P(x) = kσ3 + iσ1(U − ǫ)/h̄v . (5)
The armchair boundary conditions (4) require tr (SS̃) =
2 cosφn for the product of the transfer matrices
S = Txe
P(x)dx , S̃ = T̃xe
P(x)dx , (6)
where Tx and T̃x symbolize the “chronological” and “anti-
chronological” orderings of the operators P(x) that do
not commute for different x.
In the absence of the field, the GR spectrum consists of
one-dimensional Dirac bands with |ǫk=0| = ∆L× π3 |n+1−
3p− δ|, p = 0,±1,±2, ... . Thus GRs with n = 3p− 1 are
metallic (with small gap ∝ δ originating from imperfect
boundaries [20]), in which case the lowest energy mode is
non-degenerate, and the rest are doubly-degenerate (the
latter degeneracy is lifted by the finite δ). The ribbons
with n = 3p and n = 3p − 2 are semiconducting, with
non-degenerate bands, and excitation gaps |ǫk=0| = ∆L×
(1∓ δ) correspondingly.
To study the transverse field effect it is convenient to
employ the chiral gauge transformation [24, 25]
ψ± = e
±iσ1ϕ(x)ψ̃± , ϕ =
U(x′)dx′/h̄v , (7)
that preserves the boundary conditions (4) and trans-
forms the system (3), H± → H̃±,
H̃± = h̄v
−kσ2e±2iσ1ϕ(x) ± iσ1∂x
. (8)
The transformation (7) shows that the spectrum at k = 0
is unaffected by the field. For the metallic GRs with
ideal edges (eiφn ≡ 1), the two degenerate k = 0, ǫ = 0
eigenstates, each consisting of a pair {ψ̃+, ψ̃−}, are
|1〉 =
and |2〉 =
Projecting the Hamiltonian (8) onto these states, H̃ →
h̄vkg(u)σ2, we find the spectrum around k = 0
ǫ = ±h̄v|kg(u)| , g =
dξ cos [uξ(1− ξ)] . (9)
The function g(u) is plotted in Fig. 2 inset. For |u| ≪ 1,
g ≃ 1 − u2/60. Its |u| ≫ 1 form g ≃
π/|u| cos[(|u| −
π)/4] determines the successive voltages un ≈ ±(3+4n)π,
n = 0, 1, 2, ..., where the k = 0 velocity changes sign. At
those voltages the dispersion ǫ ∼ k3 at k = 0, causing
the van Hove singularity ν(ǫ) ∼ |ǫ|−2/3 in the density
of states at ǫ = 0, and an additional pair of pockets
of Fermi surface emerges. In the |u| ≫ 1 limit, such
pockets appear at the zeroes of g for both metallic and
semiconducting GRs.
Electron interactions in graphene result in the RPA
screening of the external field Eext. The screening is
scale-invariant, E = Eext/κ, for an infinite sheet [32],
κ = 1 + 2πe2/4h̄v ≃ 5. The depolarization problem
in nanotubes [24–26, 33–36] also yields κ ≃ 5 practi-
cally independent of the tube radius and chirality. This
linear-screening estimate will remain valid in GRs as long
as the subbands are not strongly mixed. In the oppo-
site case the field on the ribbon edges, estimated in the
Thomas-Fermi fashion, develops an algebraic singular-
ity [37], which corresponds to filling the Fermi-surface
pockets. As a result, the uniform field model (1) is jus-
tified for weak to moderate fields. For the acting field
u = 10, the required external field uext ≃ 50 is achieved
at EextL ≃ 1V across the ribbon width L = 30 nm.
To conclude, the transverse voltage applied across a
graphene nanoribbon dramatically affects its longitudinal
electronic dispersion. The Fermi surface breaks up into
pockets for the metallic ribbons, and the excitation gap
closes for the semiconducting ones. The strong field effect
can lead to interesting physical phenomena as well as be
utilized in carbon-based electronic devices.
This work has benefited from illuminating discussions
with M. Fogler, L. Glazman and L. Levitov. The research
was sponsored by NSF grants DMR 02-37296 and DMR
04-39026.
[1] X. Duan and C. M. Lieber, Adv. Mater. 12, 298 (2000).
[2] C. B. Murray, C. R. Kagan, and M. G. Bawendi, Annu.
Rev. Mater. Sci. 30, 545-610 (2000).
[3] M.S. Dresselhaus, G. Dresselhaus, Ph. Avouris, Carbon
Nanotubes: Synthesis, Structure, Properties and Appli-
cations (Springer, New York, 2001).
[4] K.S. Novoselov et al., Science 306, 666 (2004); Nature
438, 197 (2005).
[5] C. Berger et al., J. Phys. Chem. B 108, 19912 (2004).
[6] Y. Zhang, J. P. Small, M. E. S. Amori, and P. Kim, Phys.
Rev. Lett. 94, 176803 (2005).
[7] P. R. Wallace, Phys. Rev. 71, 622 - 634 (1947).
[8] K. Nakada, M. Fujita, G. Dresselhaus, and M. S. Dres-
selhaus, Phys. Rev. B 54, 17954 (1996).
[9] M. Fujita, K. Wakabayashi, K. Nakada, and K. Kusak-
abe, J. Phys. Soc. Jpn. 65, 1920 (1996).
[10] K. Wakabayashi and T. Aoki, Int. J. Mod. Phys. B 16,
4897-4909 (2002).
[11] S. Ryu and Y. Hatsugai, Phys. Rev. Lett. 89, 077002
(2002).
[12] M. Ezawa, Phys. Rev. B 73, 045432 (2006).
[13] N. M. R. Peres, F. Guinea, and A. H. Castro Neto, Phys.
Rev. B 73, 125411 (2006).
[14] L. Brey and H. A. Fertig, Phys. Rev. B 73, 195408 (2006);
Phys. Rev. B 73, 235411 (2006).
[15] V. Barone, O. Hod, and G. E. Scuseria, Nano Lett. 6,
2748 (2006).
[16] P. G. Silvestrov and K. B. Efetov, Phys. Rev. Lett. 98,
016802 (2007).
[17] B. Obradovic, R. Kotlyar, F. Heinz, P. Matagne, T. Rak-
shit, M. D. Giles, M. A. Stettler, and D. E. Nikonov,
Appl. Phys. Lett. 88, 142102 (2006); Y. Ouyang, Y.
Yoon, J. K. Fodor, and J. Guo, Appl. Phys. Lett. 89,
203107 (2006).
[18] A. Rycerz, J. Tworzdlo, and C. W. J. Beenakker, Nature
Physics 3, 172 (2007).
[19] Y.-W. Son, M. L. Cohen, and S. G. Louie, Nature 444,
347 (2006).
[20] Y.-W. Son, M. L. Cohen, and S. G. Louie, Phys. Rev.
Lett. 97, 216803 (2006).
[21] B. Trauzettel, D. V. Bulaev, D. Loss, and G. Burkard,
Nature Physics 3, 192 (2007).
[22] Z. Chen, Y.-M. Lin, M. J. Rooks, and P. Avouris,
cond-mat/0701599 (2007); M. Y. Han, B. Oezyilmaz, Y.
Zhang, and P. Kim, Phys. Rev. Lett. 98, 206805 (2007).
[23] Effect of strong transverse field on Landau levels in
graphene has been studied by V. Lukose, R. Shankar,
and G. Baskaran, Phys. Rev. Lett. 98, 116802 (2007).
[24] D. S.Novikov and L. S. Levitov, preprint arXiv.org/cond-
mat/0204499 (2002); also in B. Altshuler, A. Tagliacozzo
and V. Tognetti (Eds.), Quantum Phenomena in Meso-
scopic Systems (IOS Press, Amsterdam, 2003).
[25] D. S.Novikov and L. S. Levitov, Phys. Rev. Lett. 96,
036402 (2006).
[26] Y. Li, S.V.Rotkin, and U.Ravaioli, Nano Lett. 3, 183
(2003).
[27] M. Buttiker, Y. Imry, R. Landauer and S. Pinhas, Phys.
Rev. B 31, 6207 (1985).
[28] B. J. van Wees et al., Phys. Rev. Lett. 60, 848 - 850
(1988); D. A. Wharam et al., J. Phys. C 21, L209-L214
(1988).
[29] L. W. Molenkamp, H. van Houten, C. W. J. Beenakker,
R. Eppenga, and C. T. Foxon, Phys. Rev. Lett. 65, 1052
(1990); L. W. Molenkamp, Th. Gravier, H. van Houten,
O. J. A. Buijk, M. A. A. Mabesoone, and C. T. Foxon,
Phys. Rev. Lett. 68, 3765 (1992).
[30] C. L.Kane and E. J.Mele, Phys. Rev. Lett. 78, 1932
(1997).
[31] H.-W. Lee and D. S. Novikov, Phys. Rev. B 68, 155402
(2003).
[32] J. Gonzalez, F. Guinea, and M. A. H. Vozmediano,
Phys. Rev. B 59, R2474 (1999); D. T. Son, preprint
arXiv.org/cond-mat/0701501 (2007).
[33] L. X.Benedict, S.G. Louie, and M.L.Cohen, Phys. Rev.
B 52, 8541 (1995).
[34] M.Krcmar, W.M. Saslow, and A. Zangwill, J. Appl.
Phys. 93, 3495 (2003).
[35] E. N. Brothers, K. N. Kudin, G. E. Scuseria, and C. W.
Bauschlicher, Jr., Phys. Rev. B 72, 033402 (2005).
[36] B. Kozinsky and N. Marzari, Phys. Rev. Lett. 96, 166801
(2006).
[37] D. B. Chklovskii, B. I. Shklovskii, and L. I. Glazman,
Phys. Rev. B 46, 4026 (1992).
ABSTRACT
  It is shown that a graphene ribbon, a ballistic strip of carbon monolayer,
may serve as a quantum wire whose electronic properties can be continuously and
reversibly controlled by an externally applied transverse voltage. The electron
bands of armchair-edge ribbons undergo dramatic transformations: The Fermi
surface fractures, Fermi velocity and effective mass change sign, and
excitation gaps are reduced by the transverse field. These effects are manifest
in the conductance plateaus, van Hove singularities, thermopower, and activated
transport. The control over one-dimensional bands may help enhance effects of
electron correlations, and be utilized in device applications.

<|endoftext|><|startoftext|>
J. Y. Jo et al.
Domain Switching Kinetics in Disordered Ferroelectric Thin Films
J. Y. Jo,1 H. S. Han,1 J.-G. Yoon,2 T. K. Song,3 S.-H. Kim,4 and T. W. Noh1,∗
ReCOE&FPRD, Department of Physics andAstronomy, Seoul National University, Seoul 151-747, Korea
Department of Physics,University of Suwon, Suwon, Gyeonggi-do 445-743, Korea
School of Nano Advanced Materials, Changwon National University, Changwon, Gyeongnam 641-773, Korea
R&D center, Inostek Inc., Ansan, Gyeonggi-do 426-901, Korea
(Dated: October 23, 2018)
We investigated domain kinetics by measuring the polarization switching behaviors of polycrys-
talline Pb(Zr,Ti)O3 films, which are widely used in ferroelectric memory devices. Their switching
behaviors at various electric fields and temperatures could be explained by assuming the Lorentzian
distribution of domain switching times. We viewed the switching process under an electric field as
a motion of the ferroelectric domain through a random medium, and we showed that the local field
variation due to dipole defects at domain pinning sites could explain the intriguing distribution.
PACS numbers: 77.80.Fm, 77.80.Dj, 77.84.Dy
Domain switching kinetics in ferroelectrics (FEs) un-
der an external electric field Eext have been extensively
investigated for several decades [1, 2, 3, 4, 5, 6, 7, 8, 9].
The traditional approach to explain the FE switching
kinetics, often called the Kolmogorov-Avrami-Ishibashi
(KAI) model, is based on the classical statistical theory
of nucleation and unrestricted domain growth [10, 11].
For a uniformly polarized FE sample under Eext, the
KAI model gives the time (t)-dependent change in polar-
ization ∆P (t) as
∆P (t) = 2Ps[1− exp{−(t/t0)
n}], (1)
where n and t0 are the effective dimension and character-
istic switching time for the domain growth, respectively,
and Ps is spontaneous polarization. When the nuclei are
appearing in time with the same probability, n = 3 for
bulk samples and n = 2 for thin films [12]. In addition,
t0 is proportional to the average distance between the
nuclei, divided by the domain wall speed. Several stud-
ies have used the KAI model successfully to explain the
∆P (t) behaviors of FE single crystals and epitaxial thin
films [2].
Recently, FE thin films have been intensively investi-
gated for FE random access memory (FeRAM) [1]. Most
commercial FeRAM use polycrystalline Pb(Zr,Ti)O3
(poly-PZT) films, and their ∆P (t) behaviors determine
the reading and writing speeds of the FeRAM. In such
non-epitaxial FE films, a domain cannot propagate in-
definitely due to pinning caused by numerous defects, so
the KAI model cannot be applied. Therefore, it is impor-
tant both scientifically and technologically to clarify the
domain switching kinetics of polycrystalline FE films.
Numerous studies have examined the ∆P (t) behaviors
of polycrystalline FE films, and the reported results vary
markedly [3, 4, 5, 6, 7]. Lohse et al. measured the polar-
ization switching currents of poly-PZT films, and showed
that ∆P (t) slowed significantly compared to Eq. (1) [3].
Tagantsev et al. observed similar phenomena for poly-
PZT films. To explain these behaviors, they developed
the nucleation-limited-switching (NLS) model. They as-
sumed that films consist of several areas that have inde-
pendent switching kinetics:
∆P (t) = 2Ps
[1− exp{−(t/t0)
n}]F (log t0)d(log t0),
where F (log t0) is the distribution function for log t0
[4]. They assumed a very broad mesa-like function for
F (log t0), and could explain their ∆P (t) data. The same
500 ns
@ 1.7 V
1.7 V
1.4 V
1.2 V
1.1 V
1.0 V
0.9 V
0.8 V
-7 -6 -5 -4 -3 -2
300 K
150 K
  80 K
  25 K
  15 K
log t (s)
@ 300 K
Pole Read P
Write
A1 A2
Pole Read P
500 ns
A1 A2(a)
500 ns
FIG. 1: (color online). Schematic diagrams of the pulse trains
used to measure (a) non-switching polarization (Pns) and (b)
switching polarization (Psw). Time (t)-dependent switched
polarization ∆P (t) (c) under various external voltages (Vext)
at room temperature and (d) under 1.7 V at various temper-
atures. The dotted and solid lines correspond to fitted results
using the KAI model and the Lorentzian distribution function
in log t0, respectively.
http://arxiv.org/abs/0704.1053v1
0 100 200 300
  15 K
  25 K
  80 K
 (kV/cm)
150 K
300 K
-7 -6 -5 -4 -3
log t (s)
@ 80 K, 1.7 V 
 Experimental results
 Lorentzian distribution
 Gaussian distribution
 KAI model
-6 -4 -2
log t(s)
FIG. 2: (color online). (a) Values of n for various T and
Eext. (b) ∆P (t) results for (solid symbols) experimental data
and fitted results using the Lorentzian (solid line), Gaussian
(dashed line), and delta (dotted line) distributions for log t0.
The inset shows the distribution functions corresponding to
the fitted results.
authors also studied La-doped poly-PZT films and found
that ∆P (t) at room temperature is limited mainly by
nucleation, while at a low temperature (T ), the switching
kinetics are governed by domain wall motion, implying
the validity of the KAI model [5].
In this Letter, we investigate the polarization switch-
ing behaviors of poly-PZT films. We can explain the
measured ∆P (t) in terms of the Lorentzian distribution
function for F (log t0), irrespective of T . We show that
such distribution arises from local field variation in a dis-
ordered system with dipole-dipole interactions.
Note that (111)-oriented poly-PZT films with a Ti
concentration near 0.7 are the most widely used mate-
rial in FeRAM applications. We prepared our polycrys-
talline PbZr0.3Ti0.7O3 thin film on Pt/Ti/SiO2/Si sub-
strates using the sol-gel method. The poly-PZT film
had a thickness of 150 nm. X-ray diffraction studies
showed that it has the (111)-preferred orientation, and
scanning electron microscopy studies indicated that our
poly-PZT film consists of grains with a size of about 200
nm. We deposited Pt top electrodes using sputtering
with a shadow mask. The areas of the top electrodes
were about 7.9×10−9 m2.
We obtain the ∆P (t) values of our Pt/PZT/Pt capac-
itors using pulse measurements [2, 4, 13, 14]. Figure
1(a) shows the pulse trains used to measure the non-
switching polarization change (Pns). Using pulse A1, we
poled all the FE domains in one direction. Then, we ap-
plied pulse A2 with the same polarity, and measure the
current passing a sensing resistor. By integrating the cur-
rent, we could obtain the Pns values. Figure 1(b) shows
the pulse trains used to measure the switching polariza-
tion (Psw). Inserting pulse B with the opposite polarity
between pulses A1 and A2, we could reverse some portion
of the FE domains, so the difference between the values
of Psw and Pns represents the polarization change due
to domain switching, namely ∆P (t). We varied t from
200 ns to 1 ms, and Vext from 0.8 to 4 V. The value of
Eext can be estimated easily by dividing Vext by the film
thickness. At T of 80∼300 K, we used pulses A1 and A2
with a height of 4 V, which was larger than the coercive
voltage. Below 80 K, the coercive voltage increases, so
we increased the pulse height to 6 V [15].
Figure 1(c) shows the values of ∆P (t)/2Ps at room
temperature with numerous values of Vext. Figure 1(d)
shows the values of ∆P (t)/2Ps at various T with Vext
= 1.7 V. The dotted lines in both figures are the curves
best fitting Eq. (1). The KAI model predictions deviated
markedly from the experimental ∆P (t) values in the late
switching stage, in agreement with Gruverman et al. [6].
In addition, the best fitting results with the KAI model
gave unreasonable values of n. As shown in Fig. 2(a),
the values of n varied markedly with T and Eext. In ad-
dition, in the low Eext region, we obtained n values much
smaller than 1, which are not proper as an effective di-
mension of domain growth. Therefore, Eq. (1) fails to
describe the polarization switching behaviors of our PZT
films.
To explain the measured ∆P (t), we tried simple func-
tions for F (log t0) in Eq. (2). The opposite domain,
once nucleated, will propagate inside the film, so we fixed
n=2. The solid circles in Fig. 2(b) show the experimen-
tal ∆P (t) at 80 K with Vext = 1.7 V. For F (log t0),
we tried the delta, Gaussian, and Lorentzian distribu-
tion functions, as shown in the inset. The dotted line
indicates the fitting results using Eq. (2) with a delta
function. Note that this curve corresponds to a fit with
the KAI model, and thus the classical theory cannot ex-
plain our experimental data. The dashed line shows the
Gaussian fitting results. Although this fitting seems rea-
sonable, some discrepancies occur. The solid black shows
the fitting results with the Lorentzian distribution:
F (log t0) =
(log t0−log t1)2+w2
, (3)
where A is a normalization constant, and w (log t1) is
the half-width at half-maximum (a central value) [16].
The Lorentzian fit can account for our observed ∆P (t)
behaviors quite well.
We applied the Lorentzian fit to all of the other exper-
imental ∆P (t) data. As shown by the solid lines in Figs.
1(c) and (d), the Lorentzian fit provides excellent expla-
nations. Figure 3(a) presents the Lorentzian distribution
-4 -2 0 2 4
0.8 V
1.0 V
1.2 V
1.4 V
1.7 V
log t - log t
-8 -7 -6 -5 -4 -3 -2
log t
 0.8 V
 1.0 V
 1.2 V
 1.4 V
 1.7 V
@ 300 K
@ 300 K
FIG. 3: (color online). (a) The Eext-dependent Lorentzian
distribution functions at room temperature. (b) Rescaled
∆P (t) using fitting parameters for the Lorentzian distribu-
tion function.
functions used for the 300 K data. As Vext increases,
log t1 and w decrease. We rescaled the experimental
∆P (t)/2Ps data using (log t - log t1)/w. All the data
merge into a single line, an arctangent function [16], as
shown in Fig. 3(b). Although not indicated in this fig-
ure, the experimental data for all other T also merged
with this line. This scaling behavior suggests that the
Lorentzian distribution function for log t0 is intrinsic.
Note that F (log t0) follows not the Gaussian distribu-
tion, but the Lorentzian distribution. For a statistically
independent random process, it is a basic statistical rule
that the resulting distribution should become Gaussian,
regardless of the process details [17]. For example, im-
purities (or crystal defects) inside a real crystal result in
inhomogeneous broadening of the light absorption line,
which has a Gaussian line shape.
However, some studies have observed that magnetic
resonance line broadening of randomly distributed dipole
impurities follows the Lorentzian distribution [18]. The
first rigorous theoretical result for this problem is that of
Anderson, who showed that the distribution of any in-
teraction field component in the system of dilute aligned
dipoles should be Lorentzian [19, 20]. Polycrystalline FE
films should contain many dipole defects that will act
as pinning sites for the domain wall motion. To explain
our observed Lorentzian distribution of log t0, we assume
that a local field E exists at the FE domain pinning sites
and that it has a Lorentzian distribution:
F (E) = A
, (4)
where ∆ is the half-width at half-maximum of the E dis-
tribution function, related to the concentration of pin-
ning sites.
In the low Eext region, the domain wall motion should
be governed by thermal activation process at the pin-
ning sites. Without E effects, thermal activation results
in a domain wall speed in the form v ∝ 1/t0 ∝ exp[-
(U/kBT )(E0/Eext)], where U is the energy barrier and
E0 is the threshold electric field for pinned domains [21].
Since E results in a change in the effective electric field
at pinning sites, the associated t0 can be expressed as
t0 ∼ exp
)( E0
Eext+E
. (5)
Then, the distribution of E results in a distribu-
tion in log t0, using the relation F (log t0) = F (E) ·
|dE(log t0)/d(log t0)|. With
log t1 ≈
w ≈ UE0∆
, (7)
we can obtain the desired Lorentzian distribution for
F (log t0), i.e., Eq. (3), from Eqs. (4) and (5).
Our experimental values for log t1 and w agree with
the analytical forms. Figures 4(a) and (b) plot log t1 vs.
1/Eext and w vs. 1/E
ext at various T , respectively. Both
log t1 and w follow the expected Eext-dependence in the
low Eext region. Note that Eq. (6) is consistent with
Merz’s law [22], which states that the current coming
from FE polarization switching should have a character-
istic time of exp(α/Eext), where α is the activation field.
Using this empirical law, several studies have measured α
values. For example, So et al. reported α ≈1700 kV/cm
for 100-nm-thick epitaxial PZT films [2], and Scott et
al. reported α ≈ 270 kV/cm for 350-nm-thick poly-PZT
films [23]. These values are consistent with our room
temperature value of UE0/kBT , i.e., 1400 kV/cm.
Our model viewed the FE domain switching kinet-
ics as domain wall motion driven by Eext with a ran-
dom pinning potential. In the low Eext region, thermal
activation at the pinning sites can be important, result-
ing in the so-called domain wall creep motion. Applying
atomic force microscopy, Tybell et al. [21] and Paruch
et al. [24] demonstrated that the domain-switching ki-
netics in epitaxial PZT films are governed by the domain
wall creep motion. Some theoreticians studied the do-
main wall creep motion of an elastic string in a random
potential. They found a linear increase in U with an in-
crease in T [25]. The insets in Fig. 4(a) show that the
0.0000 0.0001 0.0002 0.0003 0.0004 0.0005
0 150 300
0 150 300
0.000 0.005 0.010 0.015 0.020 0.025 0.030
  15 K
  25 K
  80 K
150 K
300 K
T (K)
(cm/kV)
  15 K
  25 K
  80 K
150 K
300 K
 (cm/kV)
FIG. 4: (color online). Eext-dependent (a) log t1 and (b) w at
various T . Note that log t1 and w are proportional to 1/Eext
and to 1/E2extin the low Eext region, respectively. The insets
show UE0/kB and ∆. The solid lines are guidelines for eyes.
value of UE0/kB obtained from the linear fits in Fig. 4(a)
increase linearly with T , consistent with the theoretical
prediction for U [25]. The inset in Fig. 4(b) shows ∆
obtained from the fits to Fig. 4(b). Similar exponential
decay behavior was predicted in a magnetic resonance
study of randomly distributed dipoles [26].
At this point, we wish to compare our model with the
NLS model. Although both models use Eq. (2), the ori-
gins and forms for F (log t0) are quite different. In the
NLS model, the FE film consists of numerous areas, each
with its own and independent t0. Subsequently, it was
suggested that the individually switched regions corre-
spond to single grains or clusters of grains in which the
grain boundaries act as frontiers limiting the propaga-
tion of the switched region [8]. Consequently, the NLS
model can be applied for polycrystalline films only, and
the form of F (log t0) should depend on their microstruc-
ture. Conversely, in our model, the interaction between
dipole defects inside the FE film induces a distribution
in the local field, which results in F (log t0). Therefore,
both point defects and the grain boundaries could act
as pinning sites. Using the Lorentzian distribution for
F (log t0), our model can be used for both epitaxial and
polycrystalline FE films [2]. Using Eqs. (2) and (3) with
small w values, we could successfully explain the ∆P (t)
for FE single crystals or epitaxial thin films [2]. We also
found that our model can explain the ∆P (t) data for
poly-PZT films with Ti concentrations of 0.48 and 0.65.
Note that our model for thermally activated domain
switching kinetics can be viewed as the famous prob-
lem that treats the propagation of elastic objects driven
by an external force in presence of a pinning potential
[21, 24, 25]. It can be applied to many FE films, since
the domain wall motion with a disordered pinning po-
tential should be the dominant mechanism for ∆P (t).
Therefore, the ∆P (t) studies can be used to investi-
gate numerous intriguing issues concerning nonlinear sys-
tems, such as creep motion, avalanche phenomenon, pin-
ning/depinning transition, and so on.
In summary, we investigated the polarization switch-
ing behaviors of (111)-oriented poly-PZT films and
found that the characteristic switching time obeyed the
Lorentzian distribution. We explained this intriguing
phenomenon by introducing the local electric field due
to the defect dipole.
The authors thank D. Kim for fruitful discussions.
This study was financially supported by Creative Re-
search Initiatives (Functionally Integrated Oxide Het-
erostructure) of MOST/KOSEF.
∗ Electronic address: twnoh@snu.ac.kr
[1] Ferroelectric Memories, edited by J. F. Scott (Springer-
Verlag, Berlin, 2000).
[2] Y. W. So et al., Appl. Phys. Lett. 86, 92905 (2005) and
references therein.
[3] O. Lohse et al., J. Appl. Phys. 89, 2332 (2001).
[4] A. K. Tagantsev et al., Phys. Rev. B 66, 214109 (2002).
[5] I. Stolichnov et al., Appl. Phys. Lett. 83, 3362 (2003).
[6] A. Gruverman et al., Appl. Phys. Lett. 87, 082902 (2005).
[7] V. Shur et al., J. Appl. Phys. 84, 445 (1998).
[8] I. Stolichnov et al., Appl. Phys. Lett. 86, 012902 (2005).
[9] B. H. Park et al., Nature 401, 682 (1999).
[10] N. Kolmogorov, Izv. Akad. Nauk. Ser. Math. 3, 355
(1937).
[11] M. Avrami, J. Chem. Phys. 8, 212 (1940).
[12] If all nuclei of opposite polarization arise through whole
process, n could be larger than the actual dimension.
[13] Y. S. Kim et al., Appl. Phys. Lett. 86, 102907 (2005).
[14] J. Y. Jo et al., Phys. Rev. Lett. 97, 247602 (2006).
[15] Complications can occur due to charge trapping or do-
main pinning, called the imprint effect. Refer to Ref. [1].
To prevent the imprint effect, we applied a pulse with
the opposite polarity at the end of each pulse train mea-
surement (i.e., after pulse A2).
[16] A double exponential function exp[-{10log t/10log t0}n]
with n >1 can be approximated as a step function cen-
tered at log t0=log t. As a result, Eq. (2) can be approx-
imated as 2PsA/π · [arctan{(log t− log t1)/w}+ π/2].
[17] F. Reif, Fundamentals of Statistics and Thermal Physics
(McGraw-Hill, Singapore, 1985).
[18] J. H. V. Vleck, Phys. Rev. 74, 1168 (1948).
[19] P. W. Anderson, Phys. Rev. 82, 342 (1951).
[20] J. R. Klauder and P. W. Anderson, Phys. Rev. 125, 912
(1962).
[21] T. Tybell et al., Phys. Rev. Lett. 89, 097601 (2002).
[22] W. J. Merz, Phys. Rev. 95, 690 (1954).
[23] J. F. Scott et al., J. Appl. Phys. 64, 787 (1998).
[24] P. Paruch et al., Phys. Rev. Lett. 94, 197601 (2005).
[25] A. B. Kolton et al., Phys. Rev. Lett. 94, 047002 (2005).
mailto:twnoh@snu.ac.kr
[26] M. W. Klein, Phys. Rev. 173, 552 (1968).
ABSTRACT
  We investigated domain kinetics by measuring the polarization switching
behaviors of polycrystalline Pb(Zr,Ti)O$_{3}$ films, which are widely used in
ferroelectric memory devices. Their switching behaviors at various electric
fields and temperatures could be explained by assuming the Lorentzian
distribution of domain switching times. We viewed the switching process under
an electric field as a motion of the ferroelectric domain through a random
medium, and we showed that the local field variation due to dipole defects at
domain pinning sites could explain the intriguing distribution.

<|endoftext|><|startoftext|>
Introduction
Consciousness is probably the most difficult problem attempted by human
scientific endeavor, and is developing into an eclectic discipline. In this paper,
I introduce certain new set theoretic ideas (among others) in the already
inter-disciplinary field of consciousness studies. At the outset, I shall clearly
state relevant points of my philosophical stance.
A. Nature of Consciousness.
Conscious observers are different from brain and bodies, but interact
through them. This is identical to the dualistic school of thought, with
Ecclles [1] as a representative. This apriori does not rule out emergence of
temporary consciousness in matter, as a result of various postulated mecha-
nisms [2], such as Bose-Einstein condensations, or self organizing behavior,
or phase locked dynamical neural networks, strange attractors etc.. Whereas
the material consciousness is temporary (depending upon stability of the
physical system), the non-material observer is eternal in time. This does
not imply that observer is always conscious, or observer is conscious only in
body. The term, ’Soul’, is more accurate, as consciousness and observation
are ’temporary phenomena’ accompanying ’Soul’, in certain conditions 1. I
shall however continue to use the term, ”conscious observer”, to represent
soul. Consciousness may thus be regarded as a common emergent property
of both the observer, as well as matter. As can be seen, the stance is flexible,
enough to be compatible with almost diverging views on the subject, and
may be termed as ’Non-dualistic’ in the sense that it allows simultaneous
existence of almost divergent view points. Further, the conscious observers
are distinct from each other, and inhabit the same (one) physical universe.
B. Nature of Time.
It is probably unlikely that complete problem of consciousness can be
solved without understanding nature of time. Time has been a subject of
many monographs and papers [3, 4, 5] in physics. The unresolved issues here
are -
1. Arrow of Time, i.e., its irreversibility,
2. Origin of present moment, alternatively the observed subjective dis-
tinction between, past, present and future, with conscious observers’
1For instancee, during dreamless sleep and coma, both consciousness and observations
are absent
attention confined to present (light-like hyper-surface), (ignoring out
of body and similar psychic experiences [7] , for the time-being).
3. Overall geometry of time, i.e., cyclic or linear etc., [8].
In periodic time, it can be shown that cause and effect are connected by,
what in set theory is called an equivalence relation [9]. However, this causal
structure could be important, as it has been pointed out that EPR paradox
[38], which has been experimentally verified in recent years [11, 12, 13, 14,
15, 16], does suggest that causality is an equivalence relationship [17]. While
this circular geometry of time, would appear to lead to the usual causal
paradoxes of time loops in physics, such as killing one’s mother before one’s
birth (or conception!) - the causal paradoxes are avoided by withholding free-
will to the conscious observers or actors or participants (in the time cycle!).
Thus, consciousness is only an observer of events pre-recorded within itself,
in an over all cyclic time, within this philosophical framework. Any apparent
freewill is actually illusory.
Further cyclic nature of time blurs the distinction between past and fu-
ture, as the two are globally connected. In introduction of his monograph,
Zeh [3] quotes Lewis Carroll from the book ”Through the Looking Glass” -
White queen to Alice: ”Its a bad memory which works only backwards”.
In cyclic time, a good or perfect memory, will be able to remember all
events which the possessor (conscious observer) would have observed with in
the complete time cycle.
I should add that while cyclic time has been a hallmark of ancient cos-
mological systems, Poincare , Zermelo, Caratheodoty, and Nietzsche [18, 19,
20, 21] did attempt a mathematical formulation of the idea at the turn of the
century, and the idea has recently been evoked by contemporary mathemat-
ical physicists such as Segal [23], Guillemin [22] and others [24, 25] , with a
view of solving certain problems of observational cosmological physics, and
particle physics (macro and micro cosmos). Idea behind introducing cyclic
time concept is that if problem of consciousness is going to be solved [or the
other way round), it may be so only in cyclic time - or what the mathematical
physicists would call the S1 (circle) topology of time.
2 Set Theoretic Connection.
Cantor defined set as ”collection into a whole, of objects of our intuition or
thought”. The definition is very psychological, and the phrase ”collection
into a whole” is of special relevance. The phrase actually implies simulta-
neous perception of constituent set elements as a whole. Even a sentence
is understood only when perceived as a whole. Role of short term memory
in verbal comprehension comes to mind when perceiving or comprehending
very large sentences - here 7± 2 chunks of short term memory are probably
being used at various levels, and the sentence has to be read many times,
before comprehension (collection into a whole) occurs. Interestingly 7±2 has
been derived by statistical mechanical considerations, by developing so called
Fokker-Plank equation of brain neurons [26]. I equate verbal comprehension
with (verbal) perception, as such an equality is the reason for use of phrases
such as ”Now I see”, when verbal comprehension occurs.
While verbal perception occurring in sentence comprehension involves
only a small number of elements, visual perception by contrast requires inte-
gration of a very large number of elements. Same can be said of other modes
of perception, such as auditory, tactile, kinesthetic, olfactory, propeoceptory.
Further, in the conscious experience of an observer, these various percep-
tions from different senses, are further integrated into a whole, which even
has non-sensory components such as thoughts, memories, feelings, emotions.
The complete momentary experience of a conscious observer is thus a set,
in context of Cantor’s original definition. Further its a finite set, as it has a
finite number of elements, as represented by finite number of brain neurons.
3 Quantum Mechanical Aspects
von Neumann [27], Wigner [28] and Pauli [29] have suggested the that wave-
packet reduction was occurring when the wave-packet was interacting with
the conscious observer. Precise mechanism for this reduction by interac-
tion with consciousness has never been worked out. On the other hand cer-
tain mechanisms for wave-packet reduction have indeed been worked out by
physicists, e.g., wave-packet reduction occurs when a coherent (unreduced)
wave-packet approaches a system with infinite degrees of freedom [30, 31].
Hawking [?] has also obtained interesting results concerning wave-packet re-
duction near a black-hole when a coherent state approaches it.
There exist other interpretations of quantum mechanics which try to do
away with concept of wave-packet reduction all together. These include Ev-
erett’s many world interpretation [33] , Bohm’s interpretation Bohm 1957,
various hidden variable scenarios Bellifante 1973 etc.. However, experimental
evidence of quantum zeno effect Itano 1990, and EPR suggests that wave-
packet reduction is a good concept to explain the results. I therefore assume
that wave-packet reduction does occur, and further it is caused by conscious
observer. What are the properties of consciousness which solves these quan-
tum measurement problem? Let’s refer to this set of properties of conscious-
ness as OM ( O - Observer, M - Measurement). Acronym OM has been
selected, because of special significance word ’Om’ has in context of search
for one’s true identity in ancient Indian philosophy.
von Neumann’s model can be understood in terms of ”Quantum Mea-
surement Chain” (QMC). In this when an attempt is made to record state of
say Schrodinger’s cat, by a camera, the wave function of camera also passes
into a coherent state. Same happens to the wave functions of film, human
eyes, retina, brain neurons and so on. In von Neumann’s interpretation, ob-
server lies at the end of this quantum measurement chain, and leads to wave
packet reduction of all the intermediary links (camera, brain, neurons) in
this quantum measurement chain. This chain is originating at Schrodinger’s
cat. Its possible for more than one QMC to originate from the same system,
e.g., when multiple observers are monitoring state of Schrodinger’s cat. Such
quantum measurement chaincs will be called linked.
Now, apriori there is no reason, why if consciousness is causing wave-
packet reduction (and influencing physical universe), it is not through the
two mechanisms already outlined by physicists Hepp, Fakuda, and Hawking
(above). In absence of any other description of process of consciousness
causing wave-packet reduction, I accept what is not forbidden. Thus, as a
conceptual agent responsible for wave-packet reduction, I attribute following
properties to consciousness -
OM-1. Consciousness is a system of infinite degrees of freedom.
OM-2. Embodied Consciousness is associated with a black-hole.
Reason for use of letters O and M in OM will become clear in sections
4 and 5.
4 Black-hole in brain!
While popular notion of black-holes as massive astro-physical objects still
awaits experimental confirmation, theoretical physics has moved ahead with
concept of mini-black-holes, which are much less massive, and those which
can be light enough to have mass of elementary particles, such as protons
[37, 38]. While the astro-physical black-holes are caused by gravitational
collapse of stars whose mass exceeds the so called Chandershekhar limit, the
concept of micro-mini-black-hole (MMBH) is more like a singularity or hole in
fabric of space, i.e., its a place, where physical space ceases to exist, so to say.
Its gravitational influence is negligible, being proportional to its mass, and so
is its size. Its interesting because of its relativistic, quantum mechanical, and
philosophical properties. The reader thus need not be alarmed, that all of
his or her gray matter will be sucked down this infernal black-hole in brain.
There is an interesting philosophical reason for existence of a black-hole
associated with consciousness in brain. While the soul or conscious ob-
server, is regarded as a non-physical object, non-localizable in space-time,
the present brain studies have almost localized it to be the region within
brain. Black-hole provides a escape for the non-material spirit. While the
black-hole can be given physical co-ordinates, the area within the event hori-
zon, of the black-hole, effectively does not belong to the physical universe
- it lies beyond the physical, universe so to say. Thus conscious observer
located within such a black-hole strictly speaking does not exist within phys-
ical space-time. In the event of physical death, the mind-body connection
is severed, (e.g. Moody [39] , and one experts this black-hole to dissolve,
or evaporate by a mechanism, analogous to ”Hawking radiation” [40]. Its
energy will be carried away by gravitational waves, and therefore will lead to
a mass loss equal to mass of MMBH (few grams). Also effect of these grav-
itational waves generated at the moment of death, will be similar to high
frequency acoustic waves, and would lead to cracking of any glass enclosure
containing the physical body. Experiments verifying these phenomena have
actually been done at the turn of this century [41]. The kind of tunnel vision
reported in near death experiences, i.e., motion through a long dark tunnel,
with light at the end of tunnel [39], is actually in accordance with optics
near black-holes - an observer escaping through a black-hole would actually
experience, similar tunnel vision!
Black-holes have many other interesting properties such as existence of
local closed time like curves, Morris 1988 which could explain ability of clair-
voyants to see past and future events, while existing within the physical
body (by motion of point like conscious observer, on one of the local closed
time like curves near MMBH). Black holes also provide a handle, or gate-
way to other dimension, and non-physical universes, which are of special
interest Brahma Kumaris and possibly other workers in Transcendental. In-
terestingly, tachyons (particles traveling faster than light) falling through a
black-hole, leave the conventional physical universe of three space and one
time dimension, and enter a universe with three time and one space dimen-
sion Chandola 1986. This latter universe or rather meta-universe, could be
of special interest for actual meta-physical experiences.
5 Records within Consciousness
Apriori, von Neumann’s interpretation of consciousness causing wave-packet
reduction, does not determine, as to which particular outcome is actually
selected. Neither do any other mechanisms such as Fakuda’s, Hepp’s or
Hawking’s - all they do is reduce a coherent superposition of states into an
incoherent mixture - actual outcome then being a psychological process of
observation. Thus if embodied consciousness is monitoring quantum states
of 109 neurons, and causing their wave-packet reductions, which leads to per-
ception, and the complete momentary experience of that observer, then these
wave-packet reductions must be recorded within the conscious observer, in
a consistent fashion, and cannot be random because - random reductions,
will not lead to the perceived order of the universe. Various laws of physics,
such as those of continuity, conservation, invariances, etc., are result of per-
ceptions, based upon these wave-packet reductions. Hence we can identity
another property of consciousness -
OM-3. Recorded within conscious observer is outcome of all quantum mea-
surements performed by it (as reflected in coherent brain states). These
reductions are not random, but have a logical relation to each other, which
is the basis for invariances observed in physics, and results of EPR and Bell’s
inequality experiments etc.
Now this selection of a particular outcome, from the set of all possible out-
comes is another psychological process, which was encountered quite early ;in
development of set theory. Its called Axiom of Choice [44]. Briefly, it states
that, given a set, there exists a choice function, which selects an element of
the set. Only problem is, that while it is vital to almost all of mathematics,
its use has lead to paradoxical Banach-Tarski theorems, involving duplica-
tion of spheres, and show that concept of additive measure is not sound [45].
Details of how axiom of choice relates to quantum measurement process are
worked out in [?]
There exists another reason for records within consciousness, and evi-
dence from it comes from following scenario reminiscent of EPR of quantum
mechanics. The argument has a philosophical flavor - if distinct observers A
and B separated by a space-like interval, observe state of Schrodinger’s cat in
an experiment at a particular instant, it is required that both should observe
it either dead or alive. It should not be that observer A, finds cat alive,
and observer B finds cat dead. Thus if wave function of Schrodinger’s cat
as represented in brain neurons of observer A is collapsing, due to a record-
ing within A, this collapse is compatible with a similar collapse occurring in
brain of observer B. To ensure this compatibility we require -
OM-4. Recording within conscious observers with respect to measurement
performed on the same quantum system are mutually compatible. Alterna-
tively, outcomes observed by observers lying at ends of distinct but linked
quantum measurement chains, are compatible.
Support of EPR results of quantum mechanics for records within con-
sciousness is as follows. Lets say that observers A and B are separated by
a space-like interval, and perform measurement on a correlated photon pair.
EPR results indicate that wave functions of both the photons are collapsing
only at the moment of measurement, and the collapses are mutually com-
patible, which A and B will also notice, when they compare notes latter on.
Now, there exists no way for observer A to send a signal to B, regarding his
or her outcome, within the frame work of present day physics (light speed
limit and all that). The question therefore exists, if the conscious observers
A and B are indeed causing these distinct but correlated collapses, how is
mutual compatibility being ensured? OM3 is thus related to OM4. This
is also where cyclic nature of time may be playing an important role. In
cyclic time, the same measurements would have been performed in all the
past (infinite) time cycles, and identical outcomes would have been recorded,
in all the time cycles. This geometry of time, appears to provide at least
a chance for observers to compare note and correlate their outcomes. See
[9] for a possible scenario of communication between observers in EPR type
experiments using light like signals in cyclic time.
I close this section, with an argument, for why the embodied consciousness
should exist in a MMBH (micro-mini-black-hole). If outcomes of quantum
measurement are pre-recorded in the conscious observers, (OM-3 and OM-4)
such a recording constitutes ”hidden variable [35] determining the quantum
measurement outcome”. Now Bell’s theorem [47] yielded inequalities, which
would distinguish between hidden variable scenario, and actual quantum me-
chanics (without hidden variables). Experiments [11]-[16] to test between the
two yielded results in accordance with quantum mechanics, i.e., no functions
or additional physical hidden variables, which would determine the outcome.
Locating the consciousness within a black-hole resolves this problem, because
now the recording (hidden variable determining the outcome) is lying beyond
the event horizon of the black-hole, and thus effectively outside the universe,
and therefore is beyond the purview of present formulation of Bell’s theorem
[47].
6 Neuro-Biological-Quantum-Zeno-Effect
Readers would be familiar with ancient Greek Zeno’s paradox [49], which
questioned the concept of motion, by arguing that if an arrow in flight was
being continuously observed, and occupied a position at every time instant,
as to how could the apparent motion observed was actually possible? Though
the paradox was resolved in continuum based classical mechanics, it has reap-
peared in grab of quantum zeno effect (QZE) - so christened by physicist
Sudarshan [49]. Briefly, if a system is in state A, and about to change to
state B, before it does so, its wave function (mathematical object describing
its physical state), has to go into a superposition of states A and B, i.e., the
system exists in a sort of (A+B) state. Now when a quantum measurement
is performed on this superposed (A+B) state, the wave function collapses
to either A or B. So, if the measurements are being performed sufficiently
rapidly, the wave function of the system cannot evolve to first state (A+B)
and than state B. As a result it remains frozen in state A. This effect is also
called ”watch dog effect”, [50] (thief moves only when watch dog closes its
eyes), and the ”boiled kettle phenomena”, (kettle appears to boil over and
spill, just when one’s attention is diverted). Thus in terms of quantum me-
chanics, paradox of Zeno’s arrow can be formulated and resolved as follows.
Where as, wave-function of arrow, which also describes its position is evolv-
ing continuously, the actual act of wave-packet reduction, by monitoring or
perception of a conscious observer, is a non-continuos phenomena. This is
because the process of human perception requires a large number of photons
from Zeno’s arrow to reach human eye and retina, where after a time de-
lay, a signal is relayed to brain, and a quantum mechanical representation
of arrow’s (superposed) wave function is formed by observer’s neurons. This
quantum measurement chain of coherent (superposed) state collapses when
the conscious observer perceives arrow’s position, and is a non-continuos phe-
nomena. Thus, in between these non-continuos perceptions, wave function of
arrow can evolve to different positions. QZE has been verified for ensembles
of atoms about to make electron transitions from a higher energy state to a
lower energy state [36]. Monitoring at progressively smaller intervals, reduces
actual number of atoms making the transition, in a given period of time.
Neuro-biological-quantum-zeno-effect (NBQZE), as the term suggests,
implies that, if brain state of a person is being monitored at sufficiently
small space-time scales, (by another person, with data being recorded onto
say a computer, all of which is later examined by the experimenter), then
neurons of the subject will not be able to evolve to a coherent state and make
transition, to another state which would represent transitions from one per-
ception to another. Thus person’s subjective experience would be blocked.
Even though external sensory stimulus may be applied, the subject would
not report perception of the stimulus. The resultant state would be similar to
highest states of meditation, which involve complete withdrawal of conscious-
ness from the body and senses - effectively the consciousness has ceased to
interact with the physical universe, and is no longer performing any quantum
measurement - wave-packet reductions, in his or her brain are being caused
by the experimenter, and are preventing brain state from evolving along with
the wave function of the changing environment or universe.
References
[1] Ecclles, J. C., and Popper, K. R. : Self and its Brain, Routledge and
Kegan Paul (1977).
[2] Crick, F. : The Astonishing Hypothesis - The Scientific Search for
the Soul, Simon and Schuster, London (1994).
[3] Zeh, H.: The Physical Basis of the Direction of Time, Springer-Verlag
(1989).
[4] Sachs, R.G.: The Physics of Time Reversal, University of Chicago
Press (1987).
[5] Mackey, M. C. : Time’s Arrow: The Origins of Thermodynamic
Behavior, Spriger-Verlag (1992).
[6] Davies, P. : The Physics of Time Asymmetry, University of California
Press (1977).
[7] Blackmore, S.J.: Beyond the Body, Academy Chicago Publishers
(1992).
[8] Newton-Smith, W.H. : The Structure of Time, Routledge and Kegan
Paul (1980).
[9] Modgil, M. S. : Time irreversibility implies energy non-conservation
or global causality violation, Pre-print (1995).
[10] Einstein, A., Podolsky, B., Rosen, N.: Phy. Rev., 47, 777-80 (1935).
[11] Freedman, F. J. and Clauser, J.F. : Phys. Rev. Lett., 28, 938-941
(1972).
[12] Clauser, J.F. : Phys. Rev. Lett., 36, 1223-6 (1976).
[13] Fry, E.S. and Thompson, S.C. : Phys. Rev. Lett., 37, 465-8 (1976).
[14] Aspect, A., Grangier, P. and Roger, G. : Phys. Rev. Lett. 47, 460-3
(1981); Phys. Rev. Lett., 49, 91-4 (1982).
[15] Aspect, A., Dalibard, J. and Roger, G. : Phys. Rev. Lett., 49, 1804-7
(1982).
[16] Aspect, A. : Wave Particle Dualism, pp337-390, ed. Dinner, S., Far-
gue, D., Lochak, G. and Selleri F., Reidel, Dordrecht (1984) .
[17] de Beauregard, C. : Found. Phys., 15, 871 (1985).
[18] Poincare, H. : Acta Mathematica, 13, 1-270 (1890); Revue de Meta-
physique et de Morale 1, 534-7 (1893).
[19] Zermelo, E. : Annalen der Physik, 57, 485-94 (1896); Annalen der
Physik, 59, 793-801 (1896); Caratheodory, C., Sitzber, Preuss. Akad.
Wiss., 579 (1919).
[20] Brush, S.G. : Kinetic Theory, Vol.2 Irreversible Processes, Pergamon
Press 1966. This book also gives English translations of relevant parts
of papers of Nitzsche, Poncare, Zermelo and Boltzmann; Kaufman,
W. A. : Nietzche: Philosopher, Psychologist, Antichrist, Chapter 11,
Princeton University Press (1950) .
[21] Nietzsche, C. A.; sa Vie sa Pensee, Gallimard, Paris, Vol. 4, Livre
2, chap. I and 3; Der Wille zur Macht,in his, Gesammelte Werke,
Musarion Verlag, Munich (1926), Vol. 19, Book 4, Part 3; English
translation by, Manthey-Zorn, O., in Nietzsche, An Anthology of his
works, Washington Square Press, New York (1964) p.90.
[22] Guillemin, V. : Cosmology in (2+1)-dimensions, cyclic models and
deformations of M − 2, 1 , Princeton University Press (1989).
[23] Segal, I. E.: Mathematical Cosmology and Extra-galactic Astronomy,
Cambridge University Press (1984).
[24] Schechter, B.M. : Phy. Rev. D 16, 3015 (1977); Villaroel, J. : I1
Nuo. Cim. A 94, 405 (1986).
[25] Castell, L. : Phy. Rev. D 6, 536 (1972).
[26] Ingberg, L. : Phys. Rev., A 8, 396 (1983); 29, 346 (1984); 31, 1183
(1985).
[27] von Neumann J.: Mathematical foundations of quantum mechanics,
Princepton University Press (1955).
[28] Wigner, E.W.: A scientist speculates, (1966).
[29] W. Pauli quoted in, Mind, Matter, and Quantum Mechanics,
(Springer-Verlag, Berlin, 1993), Chap. 7.
[30] Hepp, K. : Helv. Phys. Acta., 45, 237 (1972).
[31] Fakuda, R. : Phys. Rev., A, 35, 8 (1987); 36, 3023 (1987).
[32] Hawking, S. W. : Quantum Coherence down the wormhole, in , Quan-
tum Gravity, Proceedings of the Fourth Seminar on Qunatum Grav-
ity, Moscow, Markov, M.N., Berezin, V. A., and Frolov, V. P. (eds.),
World Scientific, Singapore, p125.
[33] Everett E.: Rev. Mod. Phys., 29, 454-462, (1957).
[34] Bohm, D. : textbfWholeness and Implicate Order, Routledge, 1980.
[35] Bellinfante, F. J.: A Survey of Hidden Variables Theories, Pergamon
Press, 1973.
[36] Itano, W. M., Heinzen, D.K., Bollinger, J. J. and Wineland, D. J.:
Phys. Rev. A 41, 2295 (1990).
[37] Recami, E. and Tonin-Zanchin, V.: Unusual Black Holes, in, About
some stable (non-evaporating) Extremal Solutions of Einstein Equa-
tions, Mukherjee, S., Prasanna, A.R. and Kembhavi, A.K. (eds.),
Wiley Eastern Ltd. (1992).
[38] Einstein, A. and Rosen, N.: Phys. Rev., 48, 73 (1935).
[39] Moody Jr., R. A. : Reflections on Life after Life, Bantam (1977).
[40] Hawking, S. W. : Nature, 248, 30 (1974); Commun. Math. Phys. ,
43, 199 (1975).
[41] MacDougall, D. : Hypothesis concerning Soul Substance Together
with Experimental Evidence for the Existence of Such Substance, in,
Journal of American Society for Psychical Research, 1, 237-44 (1907).
[42] Morris, M.S., Throne, K.S., and Yurtsever, U.: Phys. Rev. Lett., 61,
1446-1449, (1988).
[43] Chandola, H. C., Rajput, B.. S., Sagar, R. and Verma, R. C. : Indian
J. Pure and Appl. Phys. , 24, 51 (1986).
[44] Moore, G. M.: Axiom of Choice, Its Otigin, Development and Influ-
ence, Springer-Verlag (1982).
[45] Wagon, S.: The Banach-Tarski Paradox, Encylop. of Math. and its
applications, Vol. 24, Cambridge University Press (1985).
[46] Modgil, M., S. : Axiom of Choice and Quantum Measurement,
preprint (1992).
[47] Bell, J. S.: Physics, 1, 195 (1964).
[48] Grunbaum, A.: Modern Science and Zeno’s Paradoxes, George Allen
and Unwin Ltd, London (1967).
[49] Mishra, B. and Sudarshan, E. C. G.: J. Math. Phys. , 18, 756 (1977).
[50] Joos, E. : Phys. Rev., D 29, 1626 (1984).
	 Introduction
	 Set Theoretic Connection.
	 Quantum Mechanical Aspects
	 Black-hole in brain!
	 Records within Consciousness
	 Neuro-Biological-Quantum-Zeno-Effect
ABSTRACT
  Role of axiom of choice in quantum measurement is highlighted by suggesting
that the conscious observer chooses the outcome from a mixed state. Further, in
a periodically repeating universe, these outcomes must be pre-recorded within
the non-physical conscious observers, which precludes free will. Free will
however exists in a universe with open time, It is suggested that psychology's
binding problem is connected with Cantor's original definition of set.
Influence of consciousness on material outcome through quantum processes is
discussed and interesting constraints derived. For example, it is predicted
that quantum mechanical brain states should get frozen if monitored at
sufficiently small space-time intervals - a neuro-biological version of the so
called quantum zeno effect, which has been verified in domain of micro-physics.
Existence of a very small micro-mini-black-hole in brain is predicted as a
space-time structural interface between consciousness and brain, whose
vaporization explains mass-loss reported in weighing experiments, conducting
during the moments of death.

<|endoftext|><|startoftext|>
Introduction
Gamma-ray burst (GRB) jet structure, that is, the energy distribution E(θ) in the ultra-
relativistic collimated outflow, is at present not yet fully understood (Zhang & Mészáros
1 Department of Physics, Hiroshima University, Higashi-Hiroshima, Hiroshima 739-8526, Japan;
takami@theo.phys.sci.hiroshima-u.ac.jp, ryo@theo.phys.sci.hiroshima-u.ac.jp.
2 NASA Goddard Space Flight Center, Greenbelt, MD 20771.
http://arxiv.org/abs/0704.1055v3
– 2 –
2002b). There are many jet models proposed in addition to the simplest uniform-jet model:
the power-law jet model (Rossi et al. 2002; Zhang & Mészáros 2002a), the Gaussian jet
model (Zhang et al. 2004), the annular jet model (Eichler & Levinson 2004), the multiple
emitting subshell model (Kumar & Piran 2000; Nakamura 2000), the two-component jet
model (Berger et al. 2003), and so on. The jet structure may depend on the generation
process of the jet and therefore may provide us important information about the central
engine of the GRB. For example, in the collapsar model for long GRBs (e.g., Zhang et al.
2003, 2004), the jet penetrates into and breaks out of the progenitor star, resulting in the
E(θ) ∝ θ−2 profile (Lazzati & Begelman 2005). For the compact binary merger model for
short GRBs, hydrodynamic simulations have shown that the resulting jet tends to have a
flat core surrounded by the power-law-like envelope (Aloy et al. 2005).
In the pre-Swift era, there were many attempts to constrain the GRB jet structure.
Thanks to the HETE-2, statistical properties of long GRBs, X-ray-rich GRBs, and X-
ray flashes were obtained (Sakamoto et al. 2005), which were thought to constrain the
jet models (Lamb et al. 2004). These observational results constrain various jet models,
such as the uniform-jet model (Yamazaki et al. 2004a; Lamb et al. 2005; Donaghy 2006),
the multiple subshell model (Toma et al. 2005b), the Gaussian jet model (Dai & Zhang
2005), and so on. For BATSE long GRBs, Yonetoku et al. (2005) derived the distribu-
tion of the pseudo-opening angle, inferred from the Ghirlanda (Ghirlanda et al. 2004) and
Yonetoku (Yonetoku et al. 2004) relations, as f(θj)dθj ∝ θ
j dθj , which is compatible with
that predicted by the power-law jet model as discussed in Perna et al. (2003) (however, see
Nakar et al. 2004). Afterglow properties are also expected to constrain the jet structure
(e.g., Granot & Kumar 2003); however, energy redistribution effects prevent us from reach-
ing a definite conclusion. Polarization measurements of optical afterglows bring us important
information (Lazzati et al. 2004).
In the Swift era, rapid follow-up observation reveals prompt GRBs followed by a steep de-
cay phase in the X-ray early afterglow (Tagliaferri et al. 2005; Nousek et al. 2006; O’Brien et al.
2006a). In the most popular interpretations, the steep decay component is the tail emis-
sion of the prompt GRB (the so called high-latitude emission), i.e., the internal shock
origin (Zhang et al. 2006; Yamazaki et al. 2006; Liang et al. 2006; Dyks et al. 2005), al-
though there are some other possibilities (e.g., Kobayashi et al. 2007; Panaitescu et al. 2006;
Pe’er et al. 2006; Lazzati & Begelman 2006; Dado et al. 2006). Then, for the uniform-jet
case, the predicted decay index is α = 1 − β, where we use the convention Fν ∝ T
−αν1+β
(Kumar & Panaitescu 2000). For power-law jet case (E(θ) ∝ θ−q), the relation is modified
to α = 1−β+(q/2). However, these simple analytical relations cannot be directly compared
with observations, because they are for the case in which the observer’s line of sight is along
the jet axis and because changing the zero of time, which potentially lies anywhere within
– 3 –
the epoch where we see the bright pulses, substantially alters the early decay slope.
Recently, Yamazaki et al. (2006) (Y06) investigated the tail emission of the prompt
GRB, finding that the jet structure can be described and that the global decay slope is
not so much affected by the local angular inhomogeneity as it is affected by the global
energy distribution. They also argued that the structured jet model is preferable, because
steepening GRB tail breaks appeared in some events. In this paper, we calculate for the first
time the distribution of the decay index of the prompt tail emission for various jet models
and find that the derived distributions can be distinguished from each other, so that the jet
structure can be more directly constrained than previous arguments. This paper is organized
as follows. We describe our model in § 2. In § 3, we investigate the distribution of the decay
index of the prompt GRB emission. Section 4 is devoted to discussions.
2. Tail Part of the Prompt GRB Emission
We consider the same model as discussed in the previous works (Y06; Yamazaki et al.
2004b; Toma et al. 2005a,b). The whole GRB jet, whose opening half-angle is ∆θtot, consists
of Ntot emitting subshells. We introduce the spherical coordinate system (r, ϑ, ϕ, t) in the
central engine frame, where the origin is at the central engine and ϑ = 0 is the axis of the
whole jet. Each emitting subshell departs at time t
(0 < t
< tdur, where j = 1, · · · , Ntot,
and tdur is the active time of the central engine) from the central engine in the direction of
~n(j) = (ϑ(j), ϕ(j)), and emits high-energy photons, generating a single pulse as observed. The
direction of the observer is denoted by ~nobs = (ϑobs, ϕobs). The observed flux from the jth
subshell is calculated when the following parameters are determined: the viewing angle of the
subshell θ(j)v = cos
−1(~nobs ·~n
(j)), the angular radius of the emitting shell ∆θ
, the departure
time t
, the Lorentz factor γ(j) = (1− β2
)−1/2, the emitting radius r
0 , the low- and high-
energy photon indices α
B and β
B , the break frequency in the shell comoving frame ν
(Band et al. 1993), the normalization constant of the emissivity A(j), and the source redshift
z. The observer time T = 0 is chosen as the time of arrival at the observer of a photon
emitted at the origin r = 0 at t = 0. Then, at the observer, the starting and ending times
of the jth subshell emission are given by
start ∼ t
dep +
1 + γ2
, (1)
1 + γ2
, (2)
where θ
+ = θ
v +∆θ
= max{0, θ(j)v −∆θ
}, and we use the formulas β(j) ∼ 1−1/2γ
and cos θ ∼ 1 − θ2/2 for γ(j) ≫ 1 and θ ≪ 1, respectively. The whole light curve from the
– 4 –
GRB jet is produced by the superposition of the subshell emission.
Y06 discussed some kinematical properties of prompt GRBs in our model and found
that each emitting subshell with θ(j)v ≫ ∆θ
produces a single, smooth, long-duration, dim,
and soft pulse, and that such pulses overlap with each other and make the tail emission of
the prompt GRB. Local inhomogeneities in the model are almost averaged during the tail
emission phase, and the decay index of the tail is determined by the global jet structure,
that is the mean angular distribution of the emitting subshell because in this paper all
subshells are assumed to have the same properties unless otherwise stated. Therefore, we
are essentially studying the tail emission from the usual continuous jets at once, i.e., from
uniform- or power-law jets with no local inhomogeneity. In the following, we study various
energy distributions of the GRB jet through the change of the angular distribution of the
emitting subshell.
3. Decay Index of the Prompt Tail Emission
In this section, we perform Monte Carlo simulations in order to investigate the jet
structure by calculating the statistical properties of the decay index of the tail emission.
For a fixed-jet model, we randomly generate 104 observers with their line of sights (LOSs)
~nobs = (ϑobs, ϕobs). For each LOS, the light curve, F (T ) of the prompt GRB tail in the
15–25 keV band is calculated, and the decay index is determined. The adopted observation
band is the low-energy end of the Burst Alert Telescope(BAT) detector and near the high-
energy end of the X-Ray Telescope(XRT) on Swift. Hence, one can observationally obtain
continuous light curves, beginning with the prompt GRB phase to the subsequent early
afterglow phase (Sakamoto et al. 2007), so that it is convenient for us to compare theoretical
results with observations. However, our actual calculations have shown that our conclusion is
not qualitatively altered, even if the observation band is changed, for example, to 0.5–10 keV,
as usually considered for other references.
For each light curve, the decay index is calculated by fitting F (T ) with a single power-
law form, ∝ (T − T∗)
−α, as in the following (see Fig. 1). The decay index α depends on the
choice of T∗ (Zhang et al. 2006; Yamazaki et al. 2006)
1. Let Ts and Te be the start and end
time, respectively, of the prompt GRB, i.e.,
Ts = min{T
start} (3)
1 Recently, Kobayashi & Zhang (2007) have discussed the way to choose the time zero. According to
their arguments, the time zero is near the rising epoch of the last bright pulse in the prompt GRB phase.
– 5 –
Te = max{T
} . (4)
Then, we take T∗ as the time until 99% of the total fluence, which is defined by Stotal =
F (T ′) dT ′, is radiated, that is,
F (T ′) dT ′ = 0.99Stotal . (5)
Then, the prompt GRB is in the main emission phase for T < T∗, while it is in the tail emis-
sion phase for T > T∗. The time interval [Ta, Tb], in which the decay index α is determined
assuming the form F (T ) ∝ (T − T∗)
−α, is taken to satisfy
F (Ta,b) = qa,bF (T∗) , (6)
where we adopt qa = 1 × 10
−2 and qb = 1 × 10
−3, unless otherwise stated. We find that in
this epoch the assumed fitting form gives a well approximation.
At first, we consider the uniform-jet case, in which the number of subshells per unit
solid angle is approximately given by dN/dΩ = Ntot/(π∆θ
tot) for ϑ < ∆θtot, where ∆θtot =
0.25 rad is adopted. The departure time of each subshell t
is assumed to be homogeneously
random between t = 0 and t = tdur = 20 sec. The central engine is assumed to produce
Ntot = 1000 subshells. In this section, we assume that all subshells have the same values of
the following fiducial parameters: ∆θsub = 0.02 rad, γ = 100, r0 = 6.0×10
14 cm, αB = −1.0,
βB = −2.3, hν
0 = 5 keV, and A = constant. Our assumption of constant A is justified
as follows. Note that the case in which N subshells that have the same brightness A are
launched into the same direction, but a different departure time, is equivalent to the case of
one subshell emission with the brightness of NA. This is because in the tail emission phase,
the second terms in the r.h.s. of Eqs. (1) and (2) dominate the first terms, so that the time
difference effect, which arises from the difference of tdep for each subshell, can be obscured.
Hence, giving the angular distribution of the emission energy is equivalent to giving the
angular distribution of the subshells with constant A. Also, Y06 showed that to obtain a
smooth, monotonic tail emission as observed by Swift, the subshell properties, hν ′0
and/or
A(j), cannot have wide scatter in the GRB jet. Therefore, we can expect, at least as the
zeroth-order approximation, that the subshells have the same properties.
The left panel of Fig. 2 shows the decay index α as a function of ϑobs. For ϑobs . ∆θtot
(on-axis case), α clusters around ∼ 3. On the other hand, when ϑobs & ∆θtot (off-axis case),
α rapidly increases with ϑobs. The reason is as follows. If all subshells are seen sideways
(that is, θ(j)v ≫ ∆θ
for all j), the bright pulses in the main emission phase followed by the
tail emission disappear because of the relativistic beaming effect, resulting in a smaller flux
contrast between the main emission phase and the tail emission phase compared with the on-
axis case. Then T∗ becomes larger. Furthermore, in the off-axis case, the tail emission decays
– 6 –
more slowly (|dF/dT | is smaller) than in the on-axis case. Then both Ta−T∗ and Tb−T∗ are
larger for the off-axis case than for the on-axis case. As can be seen in Fig. 3 of Zhang et al.
(2006), the emission seems to decay rapidly, so that the decay index α becomes large. The
left panel of Fig. 3 shows α as a function of the total fluence Stotal which is the sum of the
fluxes in the time interval, [Ts, Te]. In Fig. 3, both α and Stotal are determined observationally,
so that our theoretical calculation can be directly compared with the observation.
A more realistic model is the Gaussian jet model, in which the number of subshells per
unit solid angle is approximately given by dN/dΩ = C exp(−ϑ2/2ϑ2c) for 0 ≦ ϑ ≦ ∆θtot,
where C = Ntot/2πϑ
c [1 − exp(−∆θ
tot/2ϑ
c)] is the normalization constant. We find only a
slight difference between the results for the uniform- and the Gaussian jet models. Therefore,
we do not show the results for the Gaussian jet case in this paper.
Next, we consider the power-law distribution. In this case, the number of subshells per
unit solid angle is approximately given by dN/dΩ = C[1 + (ϑ/ϑc)
2]−1 for 0 ≦ ϑ ≦ ∆θtot,
i.e., dN/dΩ ≈ C for 0 ≦ ϑ ≪ ϑc and dN/dΩ ≈ C(ϑ/ϑc)
−2 for ϑc ≪ ϑ ≦ ∆θtot, where C =
(Ntot/πϑ
c)[ln(1 + (∆θtot/ϑc)
2)]−1 is the normalization constant and we adopt ϑc = 0.02 rad
and ∆θtot = 0.25 rad. The other parameters are the same as for the uniform-jet case.
As can be seen in the right panels of Figs. 2 and 3, both the ϑobs–α and Stotal–α diagrams
are complicated compared with the uniform-jet case. When ϑobs . ϑc, the observer’s LOS is
near the whole jet axis. Compared with the uniform-jet case, α is larger, because the power-
law jet is dimmer in the outer region, i.e., emitting subshells are sparsely distributed near
the periphery of the whole jet (see also the solid lines of Figs. 1 and 3 of Y06). If ϑobs ≫ ϑc,
the scatter of α is large. Some bursts have an especially small α of around 2. This comes
from the fact that the power-law jet has a core region (0 < ϑ . ϑc), where emitting subshells
densely distributed compared with the outer region. The core generates the light-curve break
in the tail emission phase, as can be seen in Fig. 4 (Y06). In the epoch before the photons
emitted by the core arrive at the observer (e.g., T − Ts . 7.5 × 10
2 s for the solid line in
Fig. 4), the number of subshells that contribute to the flux at time T , Nsub(T ), increases
with T more rapidly than for the uniform-jet case. Then, the light curve shows a gradual
decay. If the fitting region [Ta, Tb] lies in this epoch, the decay index α is around 2. In the
epoch after the photons arising from the core are observed (e.g., T −Ts & 7.5× 10
2 s for the
solid line in Fig. 4), the subshell emission with θ(j)v & ϑobs + ϑc is observed. Then Nsub(T )
rapidly decreases with T , and the observed flux suddenly drops. If the interval [Ta, Tb] lies
in this epoch, the decay index becomes larger than 4.
To compare the two cases considered above more clearly, we derive the distribution of
the decay index α. Here we consider the events whose peak fluxes are larger than 10−4 times
of the largest one in all simulated events, because the events with small peak fluxes are not
– 7 –
observed. Fig. 5 shows the result. For the uniform-jet case (solid line), α clusters around 3,
while for the power-law jet case (dotted line), the distribution is broad (1 . α . 7) and has
multiple peaks.
So far, we have considered the fiducial parameters. In the following, we discuss the
dependence on parameters, r0, γ, βB, tdur, and ∆θtot (It is found that the α-distribution
hardly depends on the value of αB, ∆θsub, and ν
0 within reasonable parameter ranges). At
first, we consider the case in which r0 = 1.0 × 10
14 cm is adopted, with other parameters
being fiducial. Fig. 6 shows the result. The shape of the α-distribution is almost the
same as that for the fiducial parameters, in both the uniform- and the power-law jet cases.
This comes from the fact that in a tail emission phase, the light curve for a given r0 is
approximately written as F (T ; r0) ≈ g(cT/r0), where a function g determines the light-curve
shape of the tail emission for other given parameters. Then, the light curves in the case of
r0 = r0,1 and r0 = r0,2, namely, F (T ; r0,1) and F (T ; r0,2), satisfy the relation F (T ; r0,2) ≈
F ((r0,1/r0,2)T ; r0,1). This can be seen, for example, by comparing the solid line with the
dotted one in Fig. 4. Hence, T∗, Ta, and Tb are approximately proportional to r0; in this
simple scaling, one can easily find that α remains unchanged for different values of r0.
Second, we consider the case of γ = 200 and r0 = 2.4× 10
15 cm, with other parameters
being fiducial. In this case, the angular spreading timescale (∝ r0/γ
2) is the same as in the
fiducial case, so that the tail emissions still show smooth light curves, although the whole
emission ends later, according to the scaling Te ∝ r0 (see the dot-dashed line in Fig. 4).
Fig. 7 shows the result. For large γ, the relativistic beaming effect is more significant, so
that the events in ϑobs & ∆θtot, which cause large α, are dim compared with the small-γ
case. Such events cannot be observed. For the power-law jet case, therefore, the number of
large-α events becomes small, although the distribution is still broad (1 . α . 4) and has
two peaks. On the other hand, for the uniform-jet case, the distribution of the decay index
α is almost the same as for the fiducial parameter set, because the value of the decay index
α in the case of ϑobs . ∆θtot is almost the same as that in the case of ϑobs & ∆θtot.
Third, we change the value of the high-energy photon index βB from −2.3 to −5.0, with
other parameters being fiducial. Fig. 8 shows the result. For the uniform-jet case, the mean
value is 〈α〉 ∼ 4, while 〈α〉 ∼ 3 for the fiducial parameters, so that the decay index defined
in this paper does not obey the well-known formula α = 1−βB (Kumar & Panaitescu 2000).
For the power-law jet case, the whole distribution shifts toward the higher value, and the
ratio of the two peaks changes. In the tail emission phase, the spectral peak energy Epeak is
below 15 keV (see also Y06), so that the steeper the spectral slope of the high-energy side
of the Band function, the more rapidly the emission decays, resulting in the dimmer tail
emission (see the dashed line in Fig. 4). Then, the fitting region [Ta, Tb] shifts toward earlier
– 8 –
epochs, because T∗ becomes small. Therefore, the number of events with small α increases,
and the number of events with large α decreases. Furthermore, we comment on the case in
which βB is varied for each event in order to more directly compare with the observation.
Here we randomly distribute βB according to the Gaussian distribution with a mean of −2.3
and a variance of 0.4. It is found that the results are not qualitatively changed.
Next, we change the value of the duration time tdur from 20 sec to 200 sec, with other
parameters being fiducial. The epoch of the bright pulses in the main emission phase becomes
longer than that in tdur = 20 sec. However, the behavior of the tail emission does not depend
on tdur very much (see Fig. 9). Therefore, the distribution of the decay index α is almost
the same as that for the fiducial parameters for both the uniform-jet case and the power-
law jet case. Even if we consider the case in which tdur is randomly distributed for each
event according to the lognormal distribution with an average of log(20 s) and a logarithmic
variance of 0.6, the results are not significantly changed.
Finally, we discuss the dependence on ∆θtot. Only the uniform-jet case is considered,
because the structured jet is usually quasi-universal and because we focus our attention
on the behavior of the uniform-jet model. The dotted line in Fig. 10 shows the result for
constant ∆θtot = 0.1 rad with other parameters being fiducial. We can see many events with
large α. The large α is observed because for small ∆θtot, although the off-axis events (i.e.,
∆θtot . ϑobs) are still dim because of the relativistic beaming effect, a fraction of such events
survives the flux threshold condition and are observable. Such events have large α & 5 (see
the 4th paragraph of this section, which explains the left panel of Fig. 2). This does not
occur in the large-∆θtot case. However, we still find in this case that there are no events
with α . 2. We consider another case in which ∆θtot is variable. Here we generate events
whose ∆θtot distributes as f∆θtotd(∆θtot) ∝ ∆θtot
−2d(∆θtot) (0.05 . ∆θtot . 0.4). Then for
a given ∆θtot, the quantities ν
0 and A are determined by hν
0 = (∆θtot/0.13)
−3.6 keV and
A ∝ (∆θtot)
−7.3, respectively. Other parameters are fiducial. If the model parameters are
chosen in this way, the Amati and Ghirlanda relations (Amati et al. 2002; Ghirlanda et al.
2004) are satisfied, and the event rates of long GRBs, X-ray-rich GRBs and X-ray flashes
become similar (Donaghy 2006). The solid line in Fig. 10 shows the result. Again we find
that there are no events with α . 2.
In summary, when we adopt model parameters within reasonable ranges, the decay
index becomes larger than ∼ 2 for the uniform- and the Gaussian jet cases, while a significant
fraction of events with α . 2 is expected for the power-law jet case. Therefore, if a non-
negligible number of events with α . 2 are observed, both the uniform- and the Gaussian
jet models will be disfavored. Furthermore, if we observationally derive the α-distribution,
the structure of GRB jets will be more precisely determined.
– 9 –
4. Discussion
We have calculated the distribution of the decay index, α, for the uniform-, Gaussian,
and the power-law jet cases. For the uniform-jet case, α becomes larger than ∼ 2, and its
distribution has a single peak. The Gaussian jet model predicts almost the same results as
the uniform-jet model. On the other hand, for the power-law jet case, α ranges between
∼ 1 and ∼ 7, and its distribution has multiple peaks. Therefore, we can determine the jet
structure of GRBs by analyzing a lot of early X-ray data showing a steep decay component
that is identified as a prompt GRB tail emission. However, one of the big challenges in the
Swift data for calculating the decay index in our definition is to derive the composite light
curve of BAT and XRT. Since the observed energy bands of BAT and XRT do not overlap,
we are forced to extrapolate one of the data sets to plot the light curve in a given energy
band. To derive the composite light curve unambiguously for a prompt and an early X-ray
emission, we need an observation of a prompt emission by current instruments, which overlap
the energy range of XRT.
The tail behavior with α . 2 does not appear in the uniform- and the Gaussian jet
models; hence, it is important to constrain the jet structure. However, in practical observa-
tions, such gradually decaying prompt tail emission might be misidentified with the external
shock component, as expected in the pre-Swift era. Actually, some events have shown such
a gradual decay, without the steep and the shallow decay phases, and their temporal and
spectral indices are consistent with a classical afterglow interpretation (O’Brien et al. 2006b).
Hence, in order to distinguish the prompt tail emission from the external shock component
at a time interval [Ta, Tb], one should study the spectral evolution and/or the continuity
and smoothness of the light curve (i.e., whether breaks appear or not) over the entire burst
emission.
In this paper, we adopt qa = 1×10
−2 and qb = 1×10
−3 when the fitting epoch [Ta, Tb] is
determined [see Eq. (6)]. Then, the prompt tail emission in this time interval is so dim that
it may often be obscured by the external shock component, causing a subsequent shallow
decay phase of the X-ray afterglow. One possible way to resolve this problem is to adopt
larger values of qa and qb, e.g., qa = 1/30 and qb = 1 × 10
−2, in which the interval [Ta, Tb]
shifts toward earlier epochs, so that the flux then is almost always dominated by the prompt
tail emission. We have calculated the decay index distribution for this case (qa = 1/30
and qb = 1 × 10
−2) and have found that the differences between uniform- and power-law
jets still arises as can be seen in the case of qa = 1 × 10
−2 and qb = 1 × 10
−3, so that our
conclusion remains unchanged. However, the duration of the interval, Tb−Ta, becomes short,
which might prevent us from observationally fixing the decay index at high significance. If
qa & 1/30, the emission at [Ta, Tb] is dominated by the last brightest pulse. Then, the light-
– 10 –
curve shape at [Ta, Tb] does not reflect the global jet structure, but reflects the properties of
the emitting subshell causing the last brightest pulse. Another way to resolve the problem
is to remove the shallow decay component. For this purpose, the origin of the shallow decay
phase should be clarified in order to extract the dim prompt tail emission exactly. The other
problem is contamination of X-ray flares, whose contribution has to be removed in order to
investigate the tail emission component. In any case, if the GRB occurs in an extremely
low-density region (a so-called naked GRB), where the external shock emission is expected
to be undetectable, our method may be a powerful tool to investigate the GRB jet structure.
This work was supported in part by Grants-in-Aid for Scientific Research of the Japanese
Ministry of Education, Culture, Sports, Science, and Technology 18740153 (R. Y.). T.S. was
supported by an appointment of the NASA Postdoctoral Program at the Goddard Space
Flight Center, administered by Oak Ridge Associated Universities through a contract with
NASA.
REFERENCES
Aloy, M. A., Janka, H.-T., & Müller, E. 2005, A&A, 436, 273
Amati, L., et al. 2002, A&A, 390, 81
Band, D. L., et al. 1993, ApJ, 413, 281
Berger, E., et al., 2003, Nature, 426, 154
Dado, S., Dar, A., & De Rujula, A. 2006, ApJ, 646, L21
Dai, X. & Zhang, B. 2005, ApJ, 621, 875
Donaghy, T. Q. 2006, ApJ, 645, 436
Dyks, J., Zhang, B., & Fan, Y. Z. 2005, astro-ph/0511699
Eichler, D., & Levinson, A. 2004, ApJ, 614, L13
Ghirlanda, G., Ghisellini, G., & Lazzati, D. 2004, ApJ, 616, 331
Granot, J. & Kumar, P. 2003, ApJ, 591, 1086
Kobayashi, S. & Zhang, B. Mészáros, P., & Burrows, D., 2007, ApJ, 655, 391
Kobayashi, S. & Zhang, B. 2007, ApJ, 655, 973
http://arxiv.org/abs/astro-ph/0511699
– 11 –
Kumar, P. & Panaitescu, A. 2000, ApJ, 541, L51
Kumar, P., & Piran, T. 2000, ApJ, 535, 152
Lamb, D. Q., et al. 2004, NewA Rev., 48, 423
Lamb, D. Q., Donaghy, T. Q., & Graziani, C. 2005, ApJ, 620, 355
Lazzati, D., et al. 2004, A&A, 422, 121;
Lazzati, D. & Begelman, M. C. 2005, ApJ, 629, 903
Lazzati, D. & Begelman, M. C. 2006, ApJ, 641, 972;
Liang, E. W., et al. 2006, ApJ, 646, 351
Nakamura, T. 2000, ApJ, 534, L159
Nakar, E., Granot, J., & Guetta, D. 2004, ApJ, 606, L37
Nousek, J. A., et al. 2006, ApJ, 642, 389
O’Brien, P. T., et al. 2006a, ApJ, 647, 1213
O’Brien, P. T., Willingale, R., Osborne, J. P., & Goad, M. R. 2006b, New J. Phys. 8, 121
Panaitescu, A., Mészáros, P., Gehrels, N., Burrows, D., & Nousek, J. 2006, MNRAS, 366,
Pe’er, A., Mészáros, P., Rees, M. J., 2006, ApJ, 652, 482
Perna, R., Sari R., & Frail, D. 2003, ApJ, 594, 379
Rossi, E., Lazzati, D., & Rees, M. J. 2002, MNRAS, 332, 945
Sakamoto, T., et al. 2005, ApJ, 629, 311
Sakamoto, T., et al. 2007, submitted to ApJ
Tagliaferri, G., et al. 2005, Nature, 436, 985
Toma, K., Yamazaki, R., & Nakamura, T. 2005a, ApJ, 620, 835
Toma, K., Yamazaki, R., & Nakamura, T. 2005b, ApJ, 635, 481
Yamazaki, R., Ioka, K., & Nakamura, T. 2004a, ApJ, 606, L33
– 12 –
Yamazaki, R., Ioka, K., & Nakamura, T. 2004b, ApJ, 607, L103
Yamazaki, R., Toma, K., Ioka, K., & Nakamura, T. 2006, MNRAS, 369, 311 (Y06)
Yonetoku, D., Murakami, T., Nakamura, T., Yamazaki, R., Inoue, A. K., & Ioka, K. 2004,
ApJ, 609, 935
Yonetoku, D., Yamazaki, R., Nakamura, T., & Murakami, T. 2005, MNRAS, 362, 1114
Zhang, B., & Mészáros, P. 2002a, ApJ, 571, 876
Zhang, B., & Mészáros, P. 2002b, Int. J. Mod. Phys. A, 19, 2385
Zhang, B., Dai, X., Lloyd-Ronning, N. M., & Mészáros, P. 2004, ApJ, 601, L119
Zhang, B., et al. 2006, ApJ, 642, 354
Zhang, W., Woosley, S. E., & MacFadyen, A. I. 2003, ApJ, 586, 356
Zhang, W., Woosley, S. E., & Heger, A. 2004, ApJ, 608, 365
This preprint was prepared with the AAS LATEX macros v5.2.
– 13 –
Ts T Te
 F(Tb)
 F(Ta)
 F(T  )
Ta Tb
1/100 1/1000
T [sec]*
fitting region
Fig. 1.— Example of how the decay index α is determined by the calculated light curve
F (T ). The start and end time of the burst are denoted by Ts and Te, respectively. The time
T∗ is determined by Eq. (5). The decay index α is determined by fitting F (T ) ∝ (T − T∗)
in the time interval [Ta, Tb].
Fig. 2.— Decay index α as a function of ϑobs, the angle between the whole jet axis and the
observers’ lines of sight. Red and green points represent events whose peak fluxes are larger
and smaller than 10−4 times the largest one in all simulated events, respectively. Left and
right panels are for the uniform- and power-law jet cases, respectively.
– 14 –
Fig. 3.— Decay index α as a function of the total fluence Stotal, the sum of the fluxes in
the time interval [Ts, Te]. Red and green points represent events whose peak fluxes are larger
and smaller than 10−4 times the largest one in all simulated events, respectively. Left and
right panels are for the uniform- and power-law jet cases, respectively.
– 15 –
10-12
10-10
102 103 104
T-Ts [sec]
fiducial
r0=1.0x10
γ=200 , r0=2.4x10
βB=-5.0
Fig. 4.— Examples of light curves of the prompt tail emission in the 15–25 keV band
for the power-law jet case and ϑobs > ϑc (ϑobs = 0.27 rad and ϑc = 0.02 rad). The solid
line shows the fiducial parameters. A bump caused by the core emission can be seen at
T − Ts ∼ 7.5 × 10
2 s. The dotted, dot-dashed, and dashed lines are for r0 = 1.0× 10
14 cm;
r0 = 2.4 × 10
15 cm and γ = 200; and βB = −5, respectively, with other parameters being
fiducial. Time intervals [Ta, Tb] for each case are denoted by the thick solid lines. The flux
is normalized by the peak value.
– 16 –
 0.05
 0.15
 0.25
 0  1  2  3  4  5  6  7  8
α ( decay index ) 
Fig. 5.— Distributions of the decay index α for uniform-jet (dN/dΩ = const.;solid line) and
power-law jet (dN/dΩ ∝ [1 + (ϑ/ϑc)
2]−1;dotted line) models, respectively. We assume that
all subshells have the same values of the following fiducial parameters: ∆θsub = 0.02 rad,
γ = 100, r0 = 6.0 × 10
14 cm, αB = −1.0, βB = −2.3, hν
0 = 5 keV, and A = constant. We
consider events whose peak fluxes are larger than 10−4 times the largest one in all simulated
events (red points in Fig. 2).
– 17 –
 0.05
 0.15
 0.25
 0  1  2  3  4  5  6  7  8
α ( decay index ) 
Fig. 6.— Same as Fig. 5, but for r0 = 1.0× 10
14 cm.
– 18 –
 0.05
 0.15
 0.25
 0.35
 0  1  2  3  4  5  6  7  8
α ( decay index ) 
Fig. 7.— Same as Fig. 5, but for γ = 200 and r0 = 2.4× 10
15 cm.
– 19 –
 0.05
 0.15
 0.25
 0  1  2  3  4  5  6  7  8
α ( decay index ) 
Fig. 8.— Same as Fig. 5, but for βB = −5.0.
– 20 –
 0.05
 0.15
 0  1  2  3  4  5  6  7  8
α ( decay index ) 
Fig. 9.— Same as Fig. 5, but for tdur = 200 sec.
– 21 –
 0.05
 0.15
 0.25
 0.35
 0  5  10  15  20
α ( decay index ) 
Fig. 10.— Distribution of the decay index α for the uniform-jet profile. The dotted line is for
∆θtot = 0.1 rad with other fiducial parameters. The solid line is for the variable-∆θtot case,
in which we generate events whose ∆θtot distributes as f∆θtotd(∆θtot) ∝ ∆θtot
−2d(∆θtot)
(0.05 . ∆θtot . 0.4), and for given ∆θtot, the quantities ν
0 and A are determined by
hν ′0 = (∆θtot/0.13)
−3.6 keV and A ∝ (∆θtot)
−7.3, respectively. Other parameters are fiducial.
	Introduction
	Tail Part of the Prompt GRB Emission
	Decay Index of the Prompt Tail Emission
	Discussion
ABSTRACT
  We show that the jet structure of gamma-ray bursts (GRBs) can be investigated
with the tail emission of the prompt GRB. The tail emission which we consider
is identified as a steep-decay component of the early X-ray afterglow observed
by the X-ray Telescope onboard Swift. Using a Monte Carlo method, we derive,
for the first time, the distribution of the decay index of the GRB tail
emission for various jet models. The new definitions of the zero of time and
the time interval of a fitting region are proposed. These definitions for
fitting the light curve lead us an unique definition of the decay index, which
is useful to investigate the structure of the GRB jet. We find that if the GRB
jet has a core-envelope structure, the predicted distribution of the decay
index of the tail has a wide scatter and has multiple peaks, which cannot be
seen for the case of the uniform and the Gaussian jet. Therefore, the decay
index distribution tells us the information on the jet structure. Especially,
if we observe events whose decay index is less than about 2, both the uniform
and the Gaussian jet models will be disfavored according to our simulation
study.

<|endoftext|><|startoftext|>
Introduction
We briefly review recent developments in the mini black hole production and evaporation
mainly based on the series of works done by Ida, Oda and Park [1, 2, 3, 4]. In the low energy
gravity scenarios such as ADD and RS-I, the CERN Large Hadronic collider (LHC) will become
a black hole factory [5, 6]. Above the TeV Planck scale, the classical production cross section of
the (4+n)-dimensional black hole grows geometrically σ ∼ ŝ1/(n+1), with
ŝ being the center of
mass energy of the parton scattering.
Once produced, black hole loses its energy or mass primarily via Hawking (thermal) radia-
tion. The Hawking radiation goes mainly into the standard model quarks and leptons (spinors) and
gauge bosons (vectors) localized on the brane, except for a few gravitons and Higgs boson(s). The
quanta of Hawking radiation will have characteristic energy spectrum determined by the Hawking
temperature and the greybody factor. The process of Hawking radiation in four dimensional rotat-
ing black hole has been treated in detail by Teukolsky, Press, Page and others in 1970s’. In higher
dimensions, however, it is shown that the process has quite different features.
• Hawking temperaure TH ∝ (Mbh)/M∗)1/(n+1) of a (4+ n) dimensional black hole is much
higher than 4 dimensional one with the small fundamental scale M∗ ∼ TeV ≪ MPlanck. With
this high temperature, the number of available degrees of freedom for Hawking radiation
are much bigger in (4+n) dimensions with all the standard model particles localized on the
brane [8].
• The near horizon geometry of (4+ n) dimensional black hole is quite complicated. Its ge-
ometry is different from that of a four dimensional Kerr black hole. With the highly mod-
ified geometry in the vicinity of the event horizon, frequency dependent correction factor
of Hawking radiation, i.e., greybody factor, is also largely modified (also see the references
[9, 10, 11] as independent studies on the same topic).
To understand the physics of those black holes, we have to understand the greybody factor of
higher dimensional, rotating black hole [12, 13].
2. Generalized Teukolsky equation and greybody factor
The induced metric on the three-brane in the (4+ n)-dimensional Myers-Perry solution [7]
with a single nonzero angular momentum is given by
∆−a2 sin2 ϑ
dt2 +
2a(r2 +a2 −∆)sin2 ϑ
2 +a2)2 −∆a2 sin2 ϑ
sin2 ϑdϕ2 − Σ
dr2 −Σdϑ2, (2.1)
where
Σ = r2 +a2 cos2 ϑ , ∆ = r2 +a2 −µr1−n. (2.2)
Black hole’s Life at colliders Seong Chan Park
The parameters µ and a are equivalent to the total mass M and the angular momentum J
(2+n)A2+nµ
16πG4+n
, J =
A2+nµa
8πG4+n
(2.3)
evaluated at the spatial infinity of the (4+n)-dimensional space-time, respectively, where A2+n =
2π(3+n)/2/Γ((3+ n)/2) is the area of a unit (2+ n)-sphere and G4+n is the (4+ n)-dimensional
Newton constant of gravitation.
1.Subtracting outgoing wave contamination at NH and separating ingoing and outgoing wave
at FF are described. 2.Here we answer the question :what fraction of energy would be radiated into
Hawking radiation in spin-down phase.
2.1 Asymptotic solutions in Kerr-Newman frame
We are given a linear, second-order equation, say
+ τR = 0, (2.4)
where η and τ are determined in Kerr-Newman frame as:
η = −
(s−1)∆′+2iK
, (2.5)
2iωr(2s−1)−λ
. (2.6)
∆ = r2 +a2 − (r2H +a2)
K = (r2 +a2)w−ma. (2.7)
The asymptotic solutions are given at NH and FF limits:
RNH ∼ Win +Woute2ikr∗∆s, (2.8)
RFF ∼ Yinr2s−1 +Youte2ikr∗/r. (2.9)
2.2 BC: Subtracting outgoing contamination at NH
The solutions near the horizon r → rH are
RNHin = 1+a1(r− rH)+
(r− rH)2 + · · · ,
RNHout = e
2ikr∗(r− rH)s (1+b1(r− rH)+ · · ·) , (2.10)
where the coefficients ai’s and bi’s are straightforward to compute:
a1 = −
, (2.11)
a2 = −
(η0 + τ−1)a1 + τ0
1+η−1
, (2.12)
· · · , (2.13)
Black hole’s Life at colliders Seong Chan Park
where η j and τ j are j-th order coefficients of Taylor expansion of η and τ , respectively.
For s = 1/2 and 1,
τ−1 =
2iωδs,1 −λ
, (2.14)
τ0 = λ
+δs,1
(∆1 −∆2), (2.15)
η−1 =
δs,1/2 −2i
, (2.16)
η0 = −
(K1∆1 −K0∆2), (2.17)
where
∆1 = 2+(n−1)(1+a2), (2.18)
∆2 = 1−n(n−1)(1+a2)/2, (2.19)
K0 = (1+a
2)ω −am, (2.20)
K1 = 2ω . (2.21)
The problem is to integrate Eq.2.4 from purely ingoing initial conditions at r = r0 out to r → ∞.
Choosing the positive choice for s makes Yin stable and easily determined by an outward integration.
However, for such an integration RNHout is unstable against contaminating the purely ingoing solution.
We can avoid the difficulty in mid-integration, relying on a mathematical transformation of the
equation to stabilize the solutions in the two asymptotic regimes r → rH and r → ∞ as follows. To
counteract the above contamination, let
R̃ = R− (1+a1(r− rH)). (2.22)
Then f satisfies the equation
L R̃ = g, (2.23)
where L = d2/dr2+ηd/dr+τ and g=−L (1+a1(r− rH)) =−ηa1−τ (1+a1(r− rH)). Equa-
tion 2.23 is now stably integrated through both asymptotic regimes, i.e., from the near horizon to
far field regimes.
Near the horizon, R̃ becomes
R̃(r → rH) =
(r− rH)2 + · · · . (2.24)
2.3 Separating solution at FF
For s = 1/2,
RFFin ∼ 1+
+ · · · ,
RFFout ∼ e2ikr∗
· · ·), (2.25)
Black hole’s Life at colliders Seong Chan Park
where
1 = −i
, (2.26)
· · · (2.27)
Then, R̃1/2 becomes
R̃1/2(r → ∞)≃ (Yin +1−a1)+a1r+
+Yout
e2iωr∗
. (2.28)
Using this expression, we can easily read out Y ’s without numerical difficulties.
Finally, the greybody factor could be written as
Γs=1/2 = 1−
|c f1 |
|Yout|2
|Yin|2
. (2.29)
For s = 1,
RFFin ∼ r(1+
+ · · ·),
RFFout ∼ e2ikr∗
+ · · ·), (2.30)
where
cv1 = −i
, (2.31)
cv2 = −
λ 2 −4aω(aω −m)
, (2.32)
· · · .
Then, R̃1 becomes
R̃1(r → ∞)≃ (Yin −a1) r+(Yincv1 −1+a1)+
+Yout
e2iωr∗
. (2.33)
Using this expression, we can easily read out Y ’s without numerical difficulties.
Finally, the greybody factor could be written as
Γs=1 = 1−
|cv2|
|Yout|2
|Yin|2
. (2.34)
3. Hawking radiations of mini-black hole
Schematically black hole evolution follows five successive steps as is depicted in Fig.1: the
production phase, the balding phase, the spin-down phase, the Schwarzschild phase and the Planck
phase. When a black hole is produced in high energy collision (production phase), the geometry
is highly irregular, and could even be topologically non-trivial. By emitting (bulk) gravitons and
other particles, the black hole will be settled down to a rotating black hole which is supposed to be
well described by Myers-Perry solution in (4+n) dimensional spacetime (balding phase).
Black hole’s Life at colliders Seong Chan Park
The decay in spin-down and Schwarzschild phases are calculable in terms of Hawking radi-
ation. We are interested in those phases (spin-down phase and Schwarzschild phase) and would
answer what fraction of energy will be lost in each of these phases.
The rate of energy (and angular momentum) loss by Hawking radiation is given as follows:
2π ∑s,l,m
dω〈Ns,l,m〉
, (3.1)
where gs is the number of “massless" degrees of freedom at temperature T , namely, the number of
degrees of freedom whose masses are smaller than T , with spin s. The expected number of particles
of the species of spin s emitted in the mode with spheroidal harmonics l, axial angular momentum
〈Ns,l,m〉=
sΓl,m(a,ω)
e(ω−mΩ)/T − (−1)2s
. (3.2)
1 2 3 4 5
Ω r_h
D=10, s=1, a*=0.01
1 2 3 4 5 6
Ω r_h
D=10, s=1, a*=0.3
2 4 6 8
Ω r_h
D=10, s=1, a*=0.6
2 4 6 8 10 12
Ω r_h
D=10, s=1, a*=0.9
2.5 5 7.5 10 12.5 15
Ω r_h
D=10, s=1, a*=1.2
5 10 15 20
Ω r_h
D=10, s=1, a*=1.5
Figure 1: Hawking radiation from D = 10 black hole. s = 1,a = 0.01,0.3,0.6,0.9,1.2,1.5.
Black hole’s Life at colliders Seong Chan Park
4. Time evolution
From the ratio of energy and angular momentum in eq.3.1, we can define a scale invariant
function γ(as = a/rs) as follows:
γ−1(as) ≡
d lnas
d lnM
(4.1)
. (4.2)
Now we calculate the ratio of final(M f ) and initial(Mi) energy of black hole by integrating the
eq.4.1 with as(ini) for initial angular momentum.
= Exp
∫ as(final)
as(ini)
γ(as)
. (4.3)
The amount of energy which is radiated in spin-down phase (0 ≈ as(final)6 as 6 as(ini)) is (Mi −
M f ) and M f will be also radiated in Schwarzschild phase where the angular momentum of black
hole is zero.
Next, let us consider the evolution of the black hole. Since time scales as rn+3s in (4+ n)
dimensions 1, it is convenient to introduce scale invariant rates for energy and angular momentum
as follows.
α(as) ≡ −rn+3s
d lnM
, (4.4)
β (as) ≡ −rn+3s
d lnJ
, (4.5)
with these new variables γ(as) can be written as γ−1(as) = β/α(as)− (n+ 2)/(n+ 1). We also
introduce dimensionless variables y and z to take angular momentum and mass of the hole:
y ≡ − lnas, (4.6)
z ≡ − ln
, (4.7)
then finally we get the time variation of energy and angular momentum in terms of scale-invariant
time parameter (τ = r−n−3s (ini)t) with initial mass of the hole:
= (β −α
n+1 z. (4.8)
After finding the solutions z(y) and τ(y) of the coupled differential equations 4.8, one can get
y(τ) and z(τ), hence as and M/Mi, as a function of time. From these, one can find how other
quantities evolve, such as the area.
1We can easily understand this by simply looking at the formula −dM/dt ∼ AT 4 where the surface area of horizon
A ∼ r2s for brane fields and the temperature of the hole T ∼ 1/rs and M ∼ rn+1s .
Black hole’s Life at colliders Seong Chan Park
0 0.5 1 1.5 2 2.5
PSfrag replacements
0 0.2 0.4 0.6 0.8 1
PSfrag replacements
Figure 2: Evolution of bh in D = 10.
Up to now we have used a unit where the size of event horizon is fixed as rh = 1 and angular
momentum of the hole is parameterized by (ah = a/rh). For conversion of unit, the following
expressions are useful with as = ah/(1+a
1/(n+1).
α(as) = −ιn+1n (1+a2h)
n+1 r2h
, (4.9)
β (as) = −κn+1n (1+a2h)
n+1 r2h
, (4.10)
where
ιn = rsM−
n+1 =
(n+2)Ωn+2
, (4.11)
κn = ιn(
n+1 . (4.12)
In Fig.2, black hole evolution in units of the initial mass as a function of rotation parameter as
for scalar(s), fermion(f), vector(v), and sum of all the standard model particles(SM) in D = 5(left)
and D = 10 (right) are shown. The initial angular momentum parameter is fixed by as = 0.83
and 2.67 in D = 5 and D = 10 that are the maximal rotations allowed by the initial collision,
respectively. The mass of the hole goes to zero before the rotation parameter goes to zero when
only scalar emission is available. However, when all the standard model fields are turned on, the
evolution is essentially determined by the spinor and vector radiation. It is found that a black hole
spins down quickly at the first stage with large rotation parameter as and the decrease of rotation
parameter slows down as angular momentum of the hole is reduced.
When all the standard model fields are turned on (SM), the evolution is essentially determined
by the spinor and vector radiation. The figures show that a black hole spins down quickly at the first
stage with large rotation parameter and the decrease of rotation parameter slows down as angular
momentum of the hole is reduced.
5. Summary and Discussion
The complete description of Hawking radiation to the brane localized SM fields and the con-
sequent time evolution of mini black hole in the context of low energy gravity scenario has been
made.
Black hole’s Life at colliders Seong Chan Park
We have developed analytic and numerical methods to solve the radial Teukolsky equation
which has been generalized to the higher dimension (D = 4+n). Two main points in our numerical
methods are as follows. First, we have imposed the proper purely-ingoing boundary condition near
the horizon without the growing contamination of the out-going wave by extracting lower order
terms explicitly. Second, we have developed the method to fit the in-going and out-going part from
the numerically integrated wave solution at far field region by explicitly obtaining the next-to-next
order expansion (or next-to-next-to-next order in vector case) of the solution. With these progress
in numerical treatment, we can safely integrate the generalized Teukolsky equation up to very large
r without out-going wave contamination.
Then we have calculated all the possible modes to completely determine the radiation rate of
the mass and angular momentum of the hole. Totally 3407 are computed explicitly, other than the
modes which are confirmed to be negligible. A black hole tends to lose its angular momentum at
the early stage of evolution. However the black hole still have a sizable rotating parameter after
radiating half of its mass. More than 70% or 80% of black hole’s mass is lost during the spin down
phase.
Now that we have completely determined the radiation and evolution of the spin-down and
Schwarzschild phases, only remaining hurdle is the evaluation of the balding phase, which is still
being disputed due to its non-purturbative nature, to extract the quantum gravitational information
at the Planck phase from the experimental data at LHC.
References
[1] D. Ida, K. y. Oda and S. C. Park, Phys. Rev. D 67, 064025 (2003) [Erratum-ibid. D 69, 049901
(2004)] [arXiv:hep-th/0212108].
[2] D. Ida, K. y. Oda and S. C. Park, arXiv:hep-ph/0501210.
[3] D. Ida, K. y. Oda and S. C. Park, Phys. Rev. D 71, 124039 (2005) [arXiv:hep-th/0503052].
[4] D. Ida, K. y. Oda and S. C. Park, Phys. Rev. D 73, 124022 (2006) [arXiv:hep-th/0602188].
[5] S. B. Giddings and S. D. Thomas, Phys. Rev. D 65, 056010 (2002) [arXiv:hep-ph/0106219].
[6] S. Dimopoulos and G. Landsberg, Phys. Rev. Lett. 87, 161602 (2001) [arXiv:hep-ph/0106295].
[7] R. C. Myers and M. J. Perry, Annals Phys. 172, 304 (1986).
[8] R. Emparan, G. T. Horowitz and R. C. Myers, Phys. Rev. Lett. 85, 499 (2000)
[arXiv:hep-th/0003118].
[9] C. M. Harris and P. Kanti, Phys. Lett. B 633, 106 (2006) [arXiv:hep-th/0503010].
[10] M. Casals, P. Kanti and E. Winstanley, JHEP 0602, 051 (2006) [arXiv:hep-th/0511163].
[11] M. Casals, S. R. Dolan, P. Kanti and E. Winstanley, arXiv:hep-th/0608193.
[12] S. C. Park and H. S. Song, J. Korean Phys. Soc. 43, 30 (2003) [arXiv:hep-ph/0111069].
[13] S. C. Park, J. Korean Phys. Soc. 45, 208 (2004).
ABSTRACT
  In the series of papers by Ida, Oda and Park, the complete description of
Hawking radiation to the brane localized Standard Model fields from mini black
holes in the low energy gravity scenarios are obtained. Here we briefly review
what we have learned in those papers.

<|endoftext|><|startoftext|>
Introduction
The Collatz conjecture (also known as the 3n + 1 conjecture, Ulam’s con-
jecture, the Syracuse problem, Kakutani’s problem, Hasse’s algorithm, etc.)
was first proposed by Lothar Collatz in 1937 [2]. In terms of the function
T (n), defined by
T (n) =
, n ≡ 1(mod 2)
, n ≡ 0(mod 2)
, n ∈ N, (1)
∗e-mail: dominicd@newpaltz.edu
http://arxiv.org/abs/0704.1057v1
the conjecture claims that for all natural numbers n, there exists a natural
number k such that
T (k) (n) = T ◦ T ◦ · · · ◦ T
︸ ︷︷ ︸
k times
(n) = 1.
For example, we have
T (3) = 5, T (2)(3) = 8, T (3)(3) = 4, T (4)(3) = 2, T (5)(3) = 1,
T (7) = 11, T (2)(7) = 17, T (3)(7) = 26, T (4)(7) = 13, T (5)(7) = 20, (2)
T (6)(7) = 10, T (7)(7) = 5, T (8)(7) = 8, T (9)(7) = 4, T (10)(7) = 2, T (11)(7) = 1.
We define T (∞) (n) = 1.
Exercise 1 Prove that if ∀n ∈ N ∃ k ∈ N such that T (k) (n) < n, then the
Collatz conjecture is true. The number k is called the stopping time of n.
As of February 2007, the Collatz conjecture has been verified for numbers
up to 13×258 = 3, 746 , 994, 889, 972, 252, 672 [7]. However, the general case
remains open.
Introducing the total stopping time function σ∞(n), defined by σ∞(1) = 0
σ∞(n) = inf
k ∈ N ∪ {∞} | T (k) (n) = 1
, n ≥ 2,
we can reformulate the Collatz conjecture as
C = N, (C1)
where
C = {n ∈ N | σ∞(n) < ∞} . (3)
From (2), we have
n 2 3 4 5 6 7 8
σ∞(n) 1 5 2 4 6 11 3
Exercise 2 Find σ∞(n) for 9 ≤ n ≤ 100.
Hint: The web page http://www.numbertheory.org/php/collatz.html con-
tains an implementation which allows the computation of σ∞(n) for large
values of n.
http://www.numbertheory.org/php/collatz.html
One could consider the inverse problem and try to characterize the sets
Sk, defined by S0 = {1} and
Sk = {n ∈ N | σ∞(n) = k} , k ≥ 1. (4)
The first few Sk are
S1 = {2} , S2 = {4} , S3 = {8} , S4 = {5, 16} , S5 = {3, 10, 32} , (5)
S6 = {6, 20, 21, 64} , S7 = {12, 13, 40, 42, 128} , S8 = {24, 26, 80, 84, 85, 256} .
It is clear from (4) that 2k ∈ Sk ∀k ∈ N0, where N0 = N ∪ {0} . In terms of
the sets Sk, the Collatz conjecture reads
Sk = N. (C2)
Exercise 3 Compute Sk for 9 ≤ k ≤ 100.
Hint: Consider the inverse map T−1 : N → P (N) , given by
T−1(n) =
{2n} , n ≡ 0, 1(mod 3)
2n, 1
(2n− 1)
, n ≡ 2(mod 3)
The sequence of natural numbers
n , n ≥ 0
, defined by x
0 = m
n+1 = T
x(m)n
, 0 ≤ n, (6)
is called the trajectory or forward orbit of m ∈ N. From (2), we have
= {2, 1} ,
= {3, 5, 8, 4, 2, 1} ,
= {4, 2, 1} ,
x(5)n
= {5, 8, 4, 2, 1} ,
x(6)n
= {6, 3, 5, 8, 4, 2, 1} ,
x(7)n
= {7, 11, 17, 26, 13, 20, 10, 5, 8, 4, 2, 1} ,
x(8)n
= {8, 4, 2, 1} .
Exercise 4 Find
for 9 ≤ m ≤ 100.
Using the sequences
we can restate Collatz’s conjecture as
x(m)n
= {2, 1} . (C3)
We can also consider higher order recurrences, i.e., instead of (6), use
n+i = T
, 0 ≤ n,
where
T (i) (x) = fi,j(x), if x ≡ j
mod 2i
, 0 ≤ j ≤ 2i − 1. (7)
For i = 1, 2, 3, we have
f1,0(x) =
, f1,1(x) =
3x+ 1
f2,0(x) =
, f2,1(x) =
3x+ 1
, f2,2(x) =
3x+ 2
, f2,3(x) =
9x+ 5
f3,0(x) =
, f3,1(x) =
9x+ 7
, f3,2(x) =
3x+ 2
, f3,3(x) =
9x+ 5
f3,4(x) =
3x+ 4
, f3,5(x) =
3x+ 1
, f3,6(x) =
9x+ 10
, f3,7(x) =
27x+ 19
Exercise 5 Prove that if the sequence {3k + 4 } ⊂ C, then the Collatz con-
jecture is true.
In terms of (7), the Collatz conjecture reads
∀n ∈ N ∃ m ∈ N such that n ≡ k (mod 2m) and
fm,k(x) < 1. (C4)
For example, we have 11 ≡ 11 (mod 25) and
f5,10(x) =
3x+ 2
, f5,11(x) =
27x+ 23
Thus,
5 = f5,11(11) = 10, x
10 = f5,10(10) = 1.
The literature on the Collatz conjecture is vast and growing rapidly.
Rather than attempting to cover it, we refer the reader to the excellent
survey papers [5] and [6].
2 Representation of natural numbers
Let the sets Λm be defined by Λm = {2
m} , 0 ≤ m ≤ 3 and
n ∈ N | n =
, m ≥ 4,
for some m, l, b1, . . . , bl ∈ N0, with
0 ≤ l ≤ m− 3 and 0 ≤ b1 < b2 < · · · < bl ≤ m− 4.
The first few Λm are
Using the (l + 2)−tuple (l, b1, . . . , bl, m) to represent the number
we can write
Λ4 = {(0, 4) , (1, 0, 4)} , Λ5 = {(0, 5) , (1, 1, 5) , (2, 0, 1, 5)} ,
Λ6 = {(0, 6) , (1, 0, 6) , (1, 2, 6) , (2, 1, 2, 6)} , (8)
Λ7 = {(0, 7) , (1, 1, 7) , (1, 3, 7) , (2, 0, 3, 7), (2, 2, 3, 7)} ,
Λ8 = {(0, 8) , (1, 0, 8) , (1, 2, 8) , (1, 4, 8) , (2, 1, 4, 8), (2, 3, 4, 8)} .
Exercise 6 Compute Λm for 9 ≤ m ≤ 100.
Hint: (a) If (v1, v2, . . . , vl+2) ∈ Λm, then (v1, v2 + 1, . . . , vl+2 + 1) ∈ Λm+1.
(b) (1, 0, 2m) ∈ Λ2m for all m ≥ 2.
Comparing (5) with (8), it seems that Sm = Λm. The next results will
show this to be true.
Lemma 7 For all m ∈ N0, we have
T (Λm+1) ⊂ Λm. (9)
Proof. Let n ∈ Λm+1. Then,
T (n) =
2bk−1
if b1 > 0 (n even) or
T (n) =
2bk+1
2bk+1−1
if b1 = 0 (n odd). In either case, T (n) ∈ Λm.
Lemma 8 For all m ∈ N0, we have
Λm ⊂ Sm. (10)
Proof. We use induction on m. The case of m = 0 is clearly true, since
Λ0 = {1} = S0.
Assuming (10) to be true for m, let n ∈ Λm+1. From (9) we have T (n) ∈
Λm and therefore σ∞ [T (n)] = m. Thus, σ∞(n) = m+1 and the result follows.
Exercise 9 Show that
T (n) =
. (11)
The other inclusion is also true.
Theorem 10 For all m ∈ N0,
Sm ⊂ Λm.
Proof. Clearly, Sm = {2
m} = Λm, 0 ≤ m ≤ 3.
Let m ≥ 4 and s ∈ Sm. Using (11) we can write the recurrence (6) as
n+1 =
x(s)n +
0 = s, (12)
where
θn = sin
x(s)n
, i.e., x(s)n ≡ θ
n (mod 2) . (13)
Assuming {θn} to be a known sequence, the solution of (12) is [1]
x(s)n = 2
j + 1
j + 1
or using (13)
n = 2
−n3Θ(n−1)
3Θ(k)
, (14)
Θ (x) =
j . (15)
Setting n = m and solving for s in (14), we obtain
m = 1
3Θ(m−1)
3Θ(k)
. (16)
Let l = Θ (m− 1) . From (13) and (15), we see that Θ (x) is a step function
with unit jumps at x = b1, b2, . . . , bl, m, where 0 ≤ b1 < b2 < · · · < bl < m.
Therefore, we can rewrite (16) as
Finally, since x
m−3 = 8, x
m−2 = 4, x
m−1 = 2 and x
m = 1, the penul-
timate jump must occur before or at x = m − 4. Thus, bl ≤ m − 4 and
l = Θ (m− 1) ≤ m− 3.
Corollary 11 The Collatz conjecture is true if and only if every natural
number n can be represented in the form
for some m, l, b1, . . . , bl ∈ N0, with
0 ≤ l ≤ m− 3 and 0 ≤ b1 < b2 < · · · < bl ≤ m− 4.
Corollary 11 is not a proof of the Collatz conjecture, but it provides a lot
of information on the set C and the function σ∞(n). When l = 0, we recover
the known fact that 2m ∈ Sm, ∀m ∈ N0. For l = 1, we have the following
result.
Lemma 12 For all m ∈ N, we have
∈ Sm, 0 ≤ k ≤
, m even,
22k+1
∈ Sm, 0 ≤ k ≤
, m odd.
Proof. Let n ∈ Λm, with l = 1. We have
2m − 2b1
= 2b1 ×
2m−b1 − 1
, 0 ≤ b ≤ m− 4.
Thus, 2m−b1 ≡ 1 (mod 2) and therefore m− b1 ≡ 0 (mod 2) . Considering the
cases m even and m odd, the result follows.
When l = 2, the situation is slightly more complicated. To simplify
matters, we restrict ourselves to the case of n being odd.
Proposition 13 For all m ≥ 5, with m 6= 6, 8, we have
2m−2−6k
∈ Sm, 1 ≤ k ≤
, m ≥ 10 even,
2m−4−6k
∈ Sm, 0 ≤ k ≤
, m ≥ 5 odd.
Proof. Let n ∈ Λm, odd, with l = 2. Then,
2m − 3− 2b2
and therefore
2m − 2b2 = 2b2 ×
2m−b2 − 1
≡ 3 (mod 9) .
Considering all possible cases, we have
1) 2b2 ≡ 1 (mod 9) and 2m−b2 − 1 ≡ 3 (mod 9), which implies
b2 ≡ 0 (mod 6) , m− b2 ≡ 2 (mod 6) .
2) 2b2 ≡ 2 (mod 9) and 2m−b2 − 1 ≡ 6 (mod 9), which implies
b2 ≡ 1 (mod 6) , m− b2 ≡ 4 (mod 6) .
3) 2b2 ≡ 4 (mod 9) and 2m−b2 − 1 ≡ 3 (mod 9), which implies
b2 ≡ 2 (mod 6) , m− b2 ≡ 2 (mod 6) .
4) 2b2 ≡ 5 (mod 9) and 2m−b2 − 1 ≡ 6 (mod 9), which implies
b2 ≡ 5 (mod 6) , m− b2 ≡ 4 (mod 6) .
5) 2b2 ≡ 7 (mod 9) and 2m−b2 − 1 ≡ 3 (mod 9), which implies
b2 ≡ 4 (mod 6) , m− b2 ≡ 2 (mod 6) .
6) 2b2 ≡ 8 (mod 9) and 2m−b2 − 1 ≡ 6 (mod 9), which implies
b2 ≡ 3 (mod 6) , m− b2 ≡ 4 (mod 6) .
Thus, for m even we shall have m − b2 ≡ 2 (mod 6) or b2 ≡ m − 2 (mod 6)
and for m odd we need m − b2 ≡ 4 (mod 6) or b2 ≡ m − 4 (mod 6) , with
1 ≤ b2 ≤ m− 4. Writing b2 in terms of m, the result follows.
From Corollary 11, we can also get an idea of how the total stopping time
σ∞(n) behaves if the Collatz conjecture is true. Solving for m in (C5) we
ln (2)
3ln+ 3l
In other words, σ∞(n) lies on the family of parametric curves
ln (2)
3in+ j
, i, j ∈ N0, i ≤ j.
For example, we have
n 2 3 4 5 6 7 8
σ∞(n) 1 5 2 4 6 11 3
(i, j) (0, 0) (2, 5) (0, 0) (1, 1) (2, 10) (5, 347) (0, 0)
Exercise 14 Prove that
ln(n)
ln(2)
≤ σ∞(n) ∀n ∈ N.
2.0.1 Binary sequences
Another approach is to study the sequence
, k ≥ 0
, which contains a
wealth of information.
Definition 15 Let τ : C → N be defined by
τ (n) =
σ∞(n)
2k. (17)
For example, we have
n 1 2 3 4 5 6 7 8
τ(n) 1 2 35 4 17 70 2199 8
Clearly, τ (2n) = 2n, ∀n ∈ N0.
Exercise 16 Find τ(n) for 9 ≤ n ≤ 100.
Let’s study the image of Λm by τ. We have
τ (Λ0) = {1} , τ (Λ1) = {2} , τ (Λ2) = {4} , τ (Λ3) = {8} ,
τ (Λ4) = {16, 17} , τ (Λ5) = {32, 34, 35} , τ (Λ6) = {64, 65, 68, 70} , (18)
τ (Λ7) = {128, 130, 136, 137, 140} , τ (Λ8) = {256, 257, 260, 272, 274, 280} .
Exercise 17 Let
1 , . . . , λ
where Nm = #Λm denotes the number of elements in the set Λm. Prove that
∀m ∈ N0 there exist a sequence
2m > α
1 ≥ · · · ≥ α
= 0, (19)
such that
= 2m + α
From (18), we have
m 1 2 3 4 5 6 7 8
1 0 0 0 1 3 6 12 24
2 0 0 0 0 2 4 9 18
It follows from (19) that
#Λm ≤ α
1 + 1, ∀m ≥ 0.
Using (16), we can define an inverse function for τ(n).
Definition 18 Let φ : N → Q be defined by
φ(n) =
3Φ(m−1)
3Φ(k)
where β
m−1 . . . β
0 is the binary representation of n, i.e.,
= n and Φ (x) =
For example, we have
n 1 2 3 4 5 6 7 8
φ(n) 1 2 1
4 1 2
Exercise 19 Find φ(n) for 9 ≤ n ≤ 100.
It follows from Theorem 10 that
φ ◦ τ (n) = n, ∀n ∈ N,
while (19) implies that
Sm = φ
2m, . . . , 2m + α
∩ N, ∀m ≥ 0.
In terms of φ, the Collatz conjecture reads
N ⊂ φ (N) . (C6)
With (C6), we finally reach a statement equivalent to the Collatz con-
jecture, which is independent of the original formulation in terms of T (x).
Although we have not succeeded in proving (C6), we hope that studying the
function φ(n) will shed new light on the Collatz problem.
3 Further problems
In the spirit of the Monthly, we offer a series of problems to the curious
reader. Those labeled ”Exercise” are relatively easy to prove, ”Problems”
denote results strongly supported by numerical evidence and ”Conjectures”
are those that we would really wish to prove, but that may turn out to be
false.
Conjecture 20 Prove that
σ∞(n) ≤ δ(n) ln(n) ∀n ≥ 2, (20)
where δ(n) is a slowly varying function, which might be eventually constant.
Definition 21 The Abby-Normal numbers (AN numbers). Let the scaled
total stopping time γ (n) be defined by
γ (n) =
σ∞(n)
ln(n)
, n ≥ 2.
We say that ak is the k−th AN number if
γ (ak) = max {γ (n) | ak−1 ≤ n ≤ ak} , k ≥ 1
with a0 = 2. In other words, {γ (ak)} is an increasing sequence of sharp lower
bounds for the function δ(n) defined in (20).
Exercise 22 Show that
n 1 2 3 4 5 6 7
a(n) 3 7 9 27 230, 631 626, 331 837, 799
γ(n) ≃ 4.55 5.65 5.92 21.24 22.51 23.90 24.12
From the results obtained by Eric Roosendaal [8], it follows that
6, 649, 279, 8, 400, 511, 11, 200, 681, 15, 733, 191,
63, 728, 127, 3, 743, 559, 068, 799, 100, 759, 293, 214, 567,
are possible AN numbers. We have
γ(100, 759, 293, 214, 567)≃ 35.17.
Exercise 23 Find all AN numbers in the interval [1, 000, 000, 100, 759, 293, 214, 567].
Conjecture 24 Prove that there exist infinitely many AN numbers.
Problem 25 Let Vm be the m−vector
0 , . . . , x
σ∞(m)
and L : RN → RN the linear operator defined by
L ([v1, v2, . . . vN ]) = [v2, v3, . . . vN , v1] .
Let θ (m) be the angle between Vm and L (Vm) . Prove that
< cos [θ (m)] <
, ∀m ∈ N.
< cos [θ (m)] <
, ∀m ≡ 1(2).
(iii)
cos [θ (ak)] =
Hint: See [4].
Problem 26 Prove that:
(i) ∀k ≥ 6 ∃ mk ∈ Sk such that mk + 1 ∈ Sk.
(ii) ∀k ≥ 7 ∃ mk ∈ Sk such that mk + 2 ∈ Sk.
(iii) ∀l ∈ N ∃ K ∈ N such that ∀k ≥ K ∃ mk ∈ Sk such that mk + l ∈ Sk .
Hint: See [3].
Exercise 27 Prove that
#Sk+1
Exercise 28 Let
ζ (m) = # {2 ≤ k ≤ m | σ∞(k) = σ∞(k − 1)} .
Show that
ζ (m) = 0, 2 ≤ m ≤ 12, ζ (m) = 1, 13 ≤ m ≤ 14,
ζ (m) = 2, 15 ≤ m ≤ 18, ζ (m) = 3, 19 ≤ m ≤ 20.
Conjecture 29 Prove that
ζ (m)
Problem 30 Prove that
3× 2m−5
, m ≥ 0 (21)
605× 2m−13
, m ≥ 0,
where ⌊·⌋ denotes the greatest integer function.
Problem 31 Let
1 , . . . , s
, Nm = #Sm,
2m = s
1 > s
2 > · · · > s
Prove that
2m + 22k
k+2, 0 ≤ k ≤
, m ≥ 4, m ≡ 0(mod 2)
2m + 22k+1
k+2, 0 ≤ k ≤
, m ≥ 5, m ≡ 1(mod 2).
References
[1] R. P. Agarwal. Difference equations and inequalities. Marcel Dekker Inc.,
New York, 2nd ed., 2000.
[2] L. Collatz. On the origin of the (3n+1)−problem. J. Qufu Normal Univ.
(Nat. Sci.), 12(3):9–11, 1986.
[3] L. E. Garner. On heights in the Collatz 3n+ 1 problem. Discrete Math.,
55(1):57–64, 1985.
[4] D. Gluck and B. D. Taylor. A new statistic for the 3x+1 problem. Proc.
Amer. Math. Soc., 130(5):1293–1301 (electronic), 2002.
[5] J. C. Lagarias. The 3x+1 Problem: An Annotated Bibliography (1963–
2000). eprint: arxiv:NT/0309224.
[6] J. C. Lagarias. The 3x + 1 Problem: An Annotated Bibliography, II
(2001-). eprint: arxiv:math.NT/0608208.
[7] T. Oliveira e Silva. Computational verification of the 3x+ 1 conjecture.
http://www.ieeta.pt/˜tos/3x+1.html
[8] E. Roosendaal. On the 3x + 1 problem.
http://www.ericr.nl/wondrous/index.html
http://arxiv.org/abs/math/0608208
http://www.ieeta.pt/~tos/3x+1.html
http://www.ericr.nl/wondrous/index.html
	Introduction
	Representation of natural numbers
	Binary sequences
	Further problems
ABSTRACT
  We establish an equivalent condition to the validity of the Collatz
conjecture, using elementary methods. We derive some conclusions and show
several examples of our results. We also offer a variety of exercises, problems
and conjectures.

<|endoftext|><|startoftext|>
Introduction 1
2 The Analytical Problem (in two dimensions) 3
2.1 Both Fixed Points Are Finite . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 The Differential Equation . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.2 The Solution of the Differential Equation . . . . . . . . . . . . . . . . 6
2.2 One Fixed Point At Infinity; One Fixed Point Finite . . . . . . . . . . . . . 7
2.2.1 The Differential Equation . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.2 The Solution of the Differential Equation . . . . . . . . . . . . . . . . 8
2.3 Both Fixed Points are at Infinity . . . . . . . . . . . . . . . . . . . . . . . . 9
3 Descartes’ Theorem 9
3.1 Drucker’s Characterization of a Surface of Revolution . . . . . . . . . . . . 9
3.2 Proof of Descartes’ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1 Introduction
Almost two thousand three hundred years ago, the hellenistic mathematician Diokles [4]
gave the first proof that a mirror in the shape of a paraboloid of revolution reflects all incident
light rays, which are parallel to its axis of symmetry, to a single point, which Kepler [8],
in 1604, called the focus.
http://arxiv.org/abs/0704.1059v4
After the advent of the calculus it was possible to prove that the only such reflecting
surface is generated by revolving a (proper or degenerate) parabola around its axis of sym-
metry. This is a very famous and well-known result, and is treated in many easily accessible
sources. See, for example, Spiegel [16].
All these proofs are based on Heron’s Law of Reflection, θI = θRf , where θRf is
the angle the reflected ray makes with the normal to the reflecting surface at the point of
incidence of the incoming ray and θI is the angle the incident ray makes with the normal.
One transforms Heron’s equation into the ordinary differential equation (ODE) of
the cross-section curve of the mirror. Drucker’s paper [5] would seem to be the final word
on the subject.
Unfortunately and surprisingly, the corresponding result for a lens, instead of a mirror,
is less well-known, at least among mathematicians (although [11] is pleasant attempt to alter
that). Yet the case of a lens, too, is quite fascinating and is treatable by elementary means.
The purpose of this paper is to remedy the situation and fill this gap.
Indeed, it all started in 1637, when Descartes [3] asked for the refractive analogue of
the parabolic mirror:
Which shape of lens will focus all rays from one radiant point source to one
single image point?
We will call such a lens a perfect lens .
Descartes discovered that the cross-section curve of the perfect lens, assumed to be
a surface of revolution, is a fourth degree curve known today as the cartesian oval . It
can be defined as the locus of points the ”weighted” sum of whose distances from two fixed
points is a constant:
d1 + nd2 = c (1.0.1)
where d1 and d2 are the distances from any point on the curve to the two fixed points, called
the foci, and n is a constant. If one focus is at the origin and the other is at the point (b, 0)
where b > 0 , the equation can be written:
(1 − n2)(x2 + y2) + 2n2bx + c2 − n2b2
= 4c2(x2 + y2) (1.0.2)
If n = ±1, the oval is the conic section :
x − b
c2−b2
) = 1 (1.0.3)
More information on cartesian ovals can be found in [14] and [15] and [18].
Descartes’ own treatment, which is not altogether easy to read (see [1]), shows that
the oval is a solution, but does not show that it is the only solution.
The only treatments of Descartes’ result that we have seen in the literature do not
appear in books on mathematics (!), but rather on optics (see Hecht [7] and Klein [9])
and use Fermat’s Principle: A light ray traverses the path between two points which takes
the least time.
A non-trivial computation, based on the calculus of variations, shows that the time the
light ray takes to go from the radiant point to the image point is constant for every point
of the cross-section curve of the perfect lens (since if the time were different in two points of
the curve, it would not be minimal), and therefore its equation is that of the cartesian oval.
Moreover such treatments make physical assumptions about the velocity of light in differ-
ent media, while, as our treatment will show, the problem is really one in pure mathematics.
We have not seen any treatment of the subject which is founded purely on Snell’s law
of refraction, which describes the relationship between the angle of incidence and the angle
of refraction when light passes the boundary between two isotropic media (media in which
the path of a light ray is a straight line). The law states:
If θI is the angle the incident ray makes with the normal to the boundary at the
point of refraction, and if θR is the angle the refracted ray makes with the normal,
then at all points of the boundary the ratio
sin θI
sin θR
where n, called the index of refraction, is constant .
Such a treatment of Descartes’ theorem would seem desireable, since it is the immediate
generalization of the corresponding treatment of the perfect reflective mirror .
In this paper, we will present a new, self-contained, elementary, purely analytical proof,
based on Snell’s Law and Drucker’s paper [5], of the following complete form of Descartes’
theorem:
Theorem 1. (Descartes’ Theorem) A smooth connected surface is a perfect lens if and
only if it is a connected subset of the ovoid obtained by revolving a cartesian oval around its
axis of symmetry.
2 The Analytical Problem (in two dimensions)
We begin by solving the following two-dimensional purely analytical problem:
It is required to find the equation, f(x, y) = 0, of a smooth connected curve, C,
for which the straight lines from two fixed points cut the normal in two angles
whose sines are in constant ratio.
Please note the absence of physical modeling. The problem is purely mathematical, as its
its solution.
We will find and solve an ordinary differential equation (ODE) for which the equa-
tion of the curve is the general solution. The ODE, in fact, will be a restatement of Snell’s
Definition 1. We call any curve C that solves the problem a perfect two-dimensional
lens with respect to the points F and F ′.
2.1 Both Fixed Points Are Finite
2.1.1 The Differential Equation
We assume a cartesian coordinate system in the xy-plane.
Let the two fixed points be O(0, 0) and B(b, 0) with b > 0 (Here is we use the assumption
that F and F ′ are finite and distinct). Let P(x, y) be a variable point on the curve f(x, y) = 0.
We assume P is in the first quadrant and we assume that the curve is concave downwards
at P (that is, if y(x) is the function defined implicitly by the equation f(x, y) = 0, then
y′′(x0) < 0) . Let l1 be the length of the line segment OP and let l2 be the length of PB.
Let MN be the normal to the curve where N is on the concave side of the curve and P is
between M and N. Let θ1 := MP̂O, the angle that OP forms with the normal MN , and let
θ2 := BP̂N , the angle that BP forms with the normal MN . Let PT be the tangent line
to the curve at P where T is the point on the x-axis where the tangent line crosses it. Let
φ := P T̂B, the angle, measured counter clockwise, the tangent line forms with the x-axis.
We will use the geometry of the figure to obtain formulas for sin θ1 and sin θ2 in terms
of x, y, and the derivative, y′. When we substitute these expressions into Snell’s Law, we
obtain the desired differential equation for the curve f(x, y) = 0.
By the law of cosines applied twice to the △OPB
cos PÔB =
b2 + l2
cos PB̂O :=
b2 + l2
cos PÔB = cos(θ1 + φ − 90) cosPB̂O = cos(90 − θ2 − φ)
= sin(θ1 + φ) = sin(θ2 + φ)
So, we obtain our fundamental formulas:
sin(θ1 + φ) =
b2 + l2
sin(θ2 + φ) :=
b2 + l2
(2.1.1)
Moreover, it is evident that
< θ1 + φ < π 0 6 θ2 + φ <
(2.1.2)
b2 + l2
b2 + l2
2b2 − 2bx
b − x
Therefore, equations (2.1.1) become
sin(θ1 + φ) =
sin(θ2 + φ) =
b − x
By (2.1.1) and (2.1.2) and the definition of the arcsin function, we obtain
π − (θ1 + φ) = arcsin
θ2 + φ = arcsin
b − x
and therefore
θ1 = (π − φ) − arcsin
θ2 = arcsin
b − x
whence,
sin θ1 = sin(π − φ) cos
arcsin
− cos(π − φ) sin
arcsin
= sin φ
+ cos φ
1 + y′2
1 + y′2
sin θ2 = cos φ sin
arcsin
b − x
− sin φ cos
arcsin
b − x
1 + y′2
b − x
1 + y′2
b − x
1 + y′2
b − x
(1 + y′2)
Now, our assumption is that Snell’s Law holds, i.e., that
sin θ1
sin θ2
where n is a constant, holds for every point P (x, y) of the curve. Substituting our two
formulas for sin θ1 and sin θ2 into this equation gives us the equation:
1 + y′2
1 + y′2
1 + y′2
b − x
(1 + y′2)
= n (2.1.3)
Solving this equation (2.1.3) for y′ we obtain the differential equation of the curve:
b − x
) (2.1.4)
2.1.2 The Solution of the Differential Equation
We use the “arrow” notation. “P ⇒ Q” means “the proposition P (logically) implies the
proposition Q.”
(2.1.4) ⇒
b − x
yy′ + x
nyy′ − n(b − x)
yy′ + x
x2 + y2
+ n ·
−(b − x) + yy′
(b − x)2 + y2
2yy′ + 2x
x2 + y2
+ n ·
−2(b − x) + 2yy′
(b − x)2 + y2
x2 + y2 + n
(b − x)2 + y2
x2 + y2 + n
(b − x)2 + y2 = c
for some (arbitrary) constant c. We have therefore proved:
Theorem 2. The general solution for the differential equation (2.1.4) of the perfect two-
dimensional lens, C, with respect to the points F = (0, 0) and F ′ = (b, 0), where b > 0, is
given by the equation:
x2 + y2 + n
(b − x)2 + y2 = c. (2.1.5)
As we saw, this is the equation (1.0.1) of a cartesian oval with foci at the points (0, 0)
and (b, 0).
We have assumed that b is finite in this analysis, i.e., that the two foci are a finite distance
apart.
Now we consider the limiting cases where one or both foci are “at infinity.” We will see
that we obtain proper or degenerate conic sections for these cases.
2.2 One Fixed Point At Infinity; One Fixed Point Finite
2.2.1 The Differential Equation
We will slightly alter the treatment for the case of two finite foci. To do so, we begin with
the following:
Definition 2. A point at infinity is specified by means of a line through the origin. The
line joining P to a point at infinity is the line through P parallel to the given line.
Points at infinity are not considered to be on C.
We assume that the fixed point F is at −∞ along the x-axis and that the fixed point F ′
is at the point (b, 0) of the x-axis, where b > 0.
Intuitively, this means that a beam of light from −∞, parallel to the x-axis, is brought to
a point focus at (b, 0) by a single refracting curve, f(x, y) = 0, of index n.
The line joining P to the point at infinity is the line parallel to the x-axis through P . θ1
is the angle the horizontal line through P (x, y) makes with the normal while θ2 is the angle
PF makes with the normal. Finally, l be the length of PF .
Then, the earlier derivation of the ODE is applicable. We need only observe that
θ1 + φ =
So, substituting our two new formulas for sin θ1 and sin θ2 into Snell’s Law gives us, instead
of (2.1.3), the new equation:
1 + y′2
1 + y′2
b − x
(1 + y′2)
After some rearrangement, we obtain the differential equation of the curve:
(b − x) − yy′
(b − x)2 + y2
· n = 0. (2.2.1)
2.2.2 The Solution of the Differential Equation
The ODE (2.2.1) can be solved by the same computations as we did for the ODE (2.1.4)
which lead us to the
Theorem 3. The general solution for the differential equation (2.2.1) of the perfect two-
dimensional lens, C, with the radiant point at −∞ is given by the equation:
x + n
(b − x)2 + y2 = c. (2.2.2)
where c is an arbitrary constant.
We observe that the equation (2.2.2) has the following interesting interpretation. The
equation (2.2.2) says that the ratio of the distance of the point P from the line x =
c − bn
1 − n
its distance from the point (b, 0) is the constant ±n, and thererefore, by the focus-directrix
definition, is a conic section.
This theorem takes a more elegant form if we assume that the curve C passes through the
origin. Then, the constant c = nb and, after rationalizing (2.2.2), we obtain ([10], problem
B-10, Chapter 20):
Theorem 4. The general solution for the differential equation (2.2.1) of the perfect two-
dimensional lens, C, is a conic section whose focus is the point where the light is focused
and whose excentricity is the reciprocal of the index of refraction.
1. If n2 6= 1, C given by the equation:
x − nb
) = 1 (2.2.3)
Therefore C is an ellipse if n2 > 1 or an hiperbola if n2 < 1, either one of which is
centered at
n + 1
2. If n = 1, then C is the segment of the x-axis given by 0 6 x 6 b.
3. If n = −1, then C is the parabola
y2 = 4bx (2.2.4)
The reader should compare this result with that of the form of the perfect reflecting
mirror already cited in [5]. If n < 0, then we get reflection instead of refraction.
Maesumi [11] used Fermat’s Principle to treat this case in a very elegant paper, al-
though his definition of the index of refraction is the reciprocal of our (standard) one.
2.3 Both Fixed Points are at Infinity
Keeping the notation of the case of the radiant point at −∞, we assume that the refracted
rays form a parallel beam in the direction such that
θ2 + φ = Constant,
but, this means that
sin(θ2 + φ) =
b − x
(b − x)2 + y2
where C is some constant. But the condition that C goes through the origin means that
C = 1,
and rationalizing the resulting equation we obtain:
Theorem 5. If both fixed points are at infinity, then the perfect lens C has the equation:
x = 0 (2.3.1)
That is, it is the vertical y-axis.
3 Descartes’ Theorem
3.1 Drucker’s Characterization of a Surface of Revolution
In 1992 [5] Drucker published a very interesting paper in which he treated the problem
of finding all perfect mirrors, i.e., mirrors which reflect all rays issuing from one radiant
point to one image point.
After showing that the two dimensional curve with the perfect reflecting property is a
proper or degenerate conic section, he (implicitly) proved the following characterization of a
surface of revolution. Drucker, himself, did not state it explicitly.
Theorem 6. Let F and F ′ be two fixed points. If, for each point P of the smooth connected
surface S the normal ~N at P lies in the subspace spanned by the vectors
FP and
F ′P , then
S is a surface of revolution whose axis of revolution is the line through F and F ′.
Proof. We offer a new proof of Drucker’s theorem. It is based on an idea in Salmon
[15] which goes back to Monge [12].
Since, by definition, the normal MN is in the subspace spanned by FP and F ′P , it is in
the plane of △FPF ′.
If MN is always parallel to FF ′, then S is a plane which is perpendicular to FF ′. We
exclude this degenerate case for the rest of the argument. (See (2.3.1)).
Therefore MN is not always parallel to FF ′. Thus, the infinite line MN intersects FF ′
at some point. This is the characteristic property of the surface S.
Let (α, β, γ) be a point on FF ′ and let (l, m, n) be the line’s direction numbers where we
assume l · m · n 6= 0. The corollaries deal with the case where one or more coefficients are
equal to zero. Then, the equation of the line FF ′ is
x − α
y − β
z − γ
= t (3.1.1)
where t is the common value of the three fractions.
F (x, y, z) = 0 (3.1.2)
be the equation of S, where F is a continuously differentiable function of x, y, and z in some
open set R, and let (x0, y0, z0) be the point P on S..
Since MN is normal to S at P , its equation is:
x − x0
Fx(x0, y0, z0)
y − y0
Fy(x0, y0, z0)
z − z0
Fz(x0, y0, z0)
= T (3.1.3)
where T is the common value of the three fractions, and where Fx(x0, y0, z0) ≡
evaluated
in (x0, y0, z0), and where the other denominators have a similar interpretation. We assume
that all three denominators are different from zero. The corollaries deal with the cases where
the denominators are equal to zero.
Solving equations (3.1.1) and (3.1.3) for x, y, and z, and then equating the values ob-
tained, we get the following homogeneous linear system for the unknowns t, T , and 1:
lt − Fx(x0, y0, z0)T + (α − x0) · 1 = 0
mt − Fy(x0, y0, z0)T + (β − y0) · 1 = 0
nt − Fz(x0, y0, z0)T + (γ − z0) · 1 = 0
The analytical condition that this system have a nontrivial solution, which it does by as-
sumption, is that the determinant of their coefficients vanish:
∣∣∣∣∣∣
Fx(x0, y0, z0) Fy(x0, y0, z0) Fz(x0, y0, z0)
l m n
x0 − α y0 − β z0 − γ
∣∣∣∣∣∣
= 0 (3.1.4)
The determinant on the left-hand side of (3.1.4) is (one half of ) the Jacobian of the three
functions
Ω := F (x, y, z), u := lx + my + nz, v := (x − α)2 + (y − β)2 + (z − γ)2, (3.1.5)
evaluated at the point P of S.
But the point P is totally arbitrary, which means the Jacobian (3.1.4) vanishes in a full
neighborhood of P , since F is a continuously differentiable function of x, y, and z in some
open set R. According to a classical theorem (see Buck [2], Goursat [6], Osgood [13],
and Taylor [17], if the Jacobian of the three functions vanishes identically, then three
functions are functionally dependent.
That means that there is a function, Ω(u, v), of the two variables u and v, defined and
continuously differentiable in a neighborhood of the point (u0, v0), where
u0 := lx0 + my0 + nz0, v0 := (x0 − α)2 + (y0 − β)2 + (z0 − γ)2
for which the equation
F (x, y, z) = Ω
lx + my + nz, (x − α)2 + (y − β)2 + (z − γ)2
(3.1.6)
holds identically in a neighborhood of (x0, y0, z0).
Now, the equation
lx + my + nz = u (3.1.7)
represents a plane which cuts the line FF ′ (represented by (3.1.1)) perpendicularly, while
the equation
(x − α)2 + (y − β)2 + (z − γ)2 = v (3.1.8)
represents a sphere of radius
v and with center (α, β, γ) on the line FF ′.
The points (x, y, z) which are on the plane, (3.1.7), and on the sphere, (3.1.8), simulta-
neously, are on their circle of intersection and this circle has its center on the line FF ′.
Therefore, the equation (3.1.2) of S, i.e., Ω(u, v) = 0, represents a surface generated
by a circle of variable radius whose center moves along the line FF ′ and whose plane is
perpendicular to that line.
Thus, every planar transverse section of S, perpendicular to FF ′, consists of one or more
circles whose centers are on the line FF ′.
That is, S is a surface of revolution with axis FF ′.
This completes the proof of Drucker’s theorem.
Corollary 1. If the z-axis is the axis of revolution, we may take the origin as the point
(α, β, γ), and the equation (3.1.2) becomes
F (x, y, z) = Ω
z, x2 + y2 + z2
(3.1.9)
There are similar simplifications in (3.1.6) if we take the other coordinate axes as the
axis of revolution.
Corollary 2. If Fx(x0, y0, z0) ≡ 0, then the normal is everywhere perpendicular to the x −
axis and the equation (3.1.2) becomes the cylinder of revolution:
F (x, y, z) = Ω
my + nz, (y − β)2 + (z − γ)2
(3.1.10)
There are similar simplifications if the other components of the normal are zero.
Remark 1. In order to apply the theorem on functional dependence which we used in
the above proof, we have to make sure that we comply with all the hypotheses. The only
one, which we did not explicitly state in the body of the proof is, using the notations of
(3.1.7) and (3.1.8), is that at least one of the three jacobians
∂(u, v)
∂(x, y)
∂(u, v)
∂(y, z)
∂(u, v)
∂(x, z)
, (3.1.11)
is different from zero at (x0, y0, z0).
We claim that even more is true in our case. We will prove that at least two of the
jacobians (3.1.11) are different from zero.
Suppose, to the contrary, that at least two of them are equal to zero, say
∂(u, v)
∂(x, y)
∂(u, v)
∂(y, z)
= 0 (3.1.12)
This leads to
x − α
y − β
y − β
z − γ
, (3.1.13)
respectively. By (3.1.13)
x − α
y − β
z − γ
(3.1.14)
which is the equation of the axis FF ′. But, this means that S is just the straight line axis,
which is excluded by the hypothesis that S is a smooth surface. Therefore, at least two of
the jacobians (3.1.11) are different from zero and the theorem on functional dependence is
applicable.
Remark 2. The proof shows that the characteristic property of a surface S of revolution is
that the normal to any point of S intersects the axis of revolution.
3.2 Proof of Descartes’ Theorem
We adapt Drucker’s definition
Definition 3. Let S be a smooth connected surface and let F and F ′ be points not in S. We
say that S is a perfect lens relative to F and F ′ if, for each point P in S:
1. the normal ~N at P lies in the subspace spanned by the vectors
FP and
F ′P , and
2. the sines of the angles which
FP and
F ′P form with that normal are in constant
ratio for every point P in S.
By condition 2 of the definition, the cross-section of S sliced out by the xy-plane is a
plane curve C which is a perfect two-dimensional lens relative to F and F ′.
That means that C is either (part of) a cartesian oval, or (part of) a conic section,
or a degenerate case of either one.
Therefore, by condition 1 and Drucker’s Theorem, a three dimensional perfect lens S
is (part of) a surface of revolution with axis FF ′ obtained by rotating a two-dimensional
perfect lens S around it.
This completes the proof of Descartes’ Theorem.
Acknowledgment
Support from the Vicerrectoŕıa de Investigación of the University of Costa Rica is acknowl-
edged.
References
[1] Barbin, E; Guitart, R., La pulsation entre les conceptions optiques, algbriques,
articules, et projetive des ovales cartsiennes, Actes de la septime universit d’t interdis-
ciplinaire sur l’historie des mathmatiques, (Nantes, juillet 1997)
[2] Buck, R.C., Advanced Calculus, McGraw-Hill, Inc., New York, 1978
[3] R. Descartes, The Geometry, Part II, Great Books, Volume 31, Encyclopedia Bri-
tannica, New York, 1952
[4] Diokles, On Burning Mirrors, G J Toomer, Sources in the History of Mathematics
and the Physical Sciences 1, Springer, (New York, 1976).
[5] Drucker, D., Reflection Properties of Curves and Surfaces, Mathematics Magazine,
65 (1992), 147-157
[6] Goursat, E., A Course in Mathematical Analysis, Vol. 1, Dover Publications New
York, 1964
[7] Hecht, E., Optics, 4th edition, Barnes and Noble New York, 2001
[8] Kepler, Astronomia, pars optica iv.4, 1604.
[9] Klein, M.V., Optics, John Wiley & Sons New York, 1970
[10] Leighton, R.B. ; Vogt, R.E., Exercises in Introductory Physics, Addison-Wesley,
Phillipines, 1969
[11] Maesumi, M., Parabolic Mirrors, Elliptic and Hyperbolic Lenses, AMM, 99 (1992),
558-560
[12] Monge, G., R Taton (ed.), L’Oeuvre scientifique de Monge Paris, 1951.,
[13] Osgood, W.F., Functions of Real Variables, G.E. Stechert & Co., New York, 1938
[14] Ferrol, R., Mandonnet, J., Ovale de Descartes From
http://www.mathcurve.com/courbes2d/descartes/descartes.shtml(2007)
[15] Salmon, G., A Treatise on the Analytic Geometry of Three Dimensions, Hodges, Fig-
gis, & Co, Dublin, 1882
[16] Spiegel, M., Applied Differential Equations, Prentice-Hall, New York, 1958
[17] Taylor, A.E.; Mann, W.R., Advanced Calculus, 3rd edition, John Wiley & Sons
New York, 1986
[18] Weisstein, Eric W. ”Cartesian Ovals.” From MathWorld–A Wolfram Web Re-
source. http://mathworld.wolfram.com/CartesianOvals.html
http://www.mathcurve.com/courbes2d/descartes/descartes.shtml(2007
http://mathworld.wolfram.com/CartesianOvals.html
	Introduction
	The Analytical Problem (in two dimensions)
	Both Fixed Points Are Finite
	The Differential Equation
	The Solution of the Differential Equation
	One Fixed Point At Infinity; One Fixed Point Finite
	The Differential Equation
	The Solution of the Differential Equation
	Both Fixed Points are at Infinity
	Descartes' Theorem
	Drucker's Characterization of a Surface of Revolution
	Proof of Descartes' Theorem
ABSTRACT
  We give a new, elementary, purely analytical development of
\textsc{Descartes}' theorem that a smooth connected surface is a perfect
focusing lens if and only if it is a connected subset of the ovoid obtained by
revolving a cartesian oval around its axis of symmetry.

<|endoftext|><|startoftext|>
Introduction
For more than two decades, effective superpotential has been a central object in the
nonperturbative study of N = 1 supersymmetric theories. This object is protected from
perturbative corrections in the conventional sense [1], and yet receives important nonpertur-
bative corrections (see for example [2, 3]). In recent years, analyses from superstring theory
have revealed an interesting perturbative window into nonperturbative physics with the use
of the gluino condensate superfield variable [4, 5, 6, 7]. In [8], field theoretic discussion based
on the model with U(N) gauge group and rigid N = 1 supersymmetry (see eq. (2.2) for its
action SN=1) is given and this is in accord with the string theory based developments.
Superstring theory, on the other hand, insists upon maximally extended supersymmetry
with no adjustable parameter. A scenario that one may draw is that this extended super-
symmetry becomes spontaneously broken to N = 1. Along this vein, a field theory model
with U(N) gauge group and rigid N = 2 supersymmetry spontaneously broken to N = 1
has been introduced in [9, 10, 11] (see eq. (2.1) for its action SN=2), generalizing the abelian
counterpart of [12]. (See also [13] for N = 2 supergravity and [14] for related discussions.)
Several properties of this model have been derived.
In this letter, we make a first analysis on the interplay between the effective superpotential
and partially as well as spontaneously broken N = 2 supersymmetry, shedding a light upon
the comparison of the two models mentioned above. A key aspect of this comparison is that
the fermionic shift symmetry of SN=1 gets replaced by the second (spontaneously broken)
supersymmetry of SN=2. In fact, this is one of the original motivations/results of [9].
The fermionic shift symmetry of SN=1 supplies the well-known formula [7, 8] constraining
the form of the effective superpotential, which is originally proposed from flux compactifi-
cation of string theory [15, 16]. Based on a diagrammatic analysis [17] (for a review see
[18]), we are able to state how this form undergoes modifications in the model SN=2. After
giving a few accounts of the model in the next section, we present a diagrammatic analysis
of Weff in section III. Our final understanding is summarized in eq. (3.10). This is followed
by a computation of the two-loop contribution to Weff in section IV. In the final section, we
derive a set of two equations on the two generating functions R(z) and T (z) of the one-point
functions, generalizing the argument based on the chiral ring and the Konishi anomaly in
[8]. We observe a modification from that given in [8] here as well.
II. The U(N) gauged model with spontaneously broken N = 2
supersymmetry
Let us briefly recall a few ingredients of the model, which are needed in what follows.
The action [9] given in the Wess-Zumino gauge can be written as
SN=2 =
d4xd4θ
Φ̄eadV
∂F(Φ)
− h.c.
+ ξV 0
d4xd2θ
∂2F(Φ)
∂Φa∂Φb
WaWb + eΦ0 +m
∂F(Φ)
+ h.c.
, (2.1)
where V = V ata and W
α are the vector superfield and the gauge superfield strength re-
spectively and Φ = Φata (a = 0, 1, . . . , N
2 − 1) is the chiral superfield ∗. There are three
Fayet-Iliopoulos parameters (e,m, ξ) which are all real. For simplicity, we choose the prepo-
tential as a single trace function of degree n + 2: F(Φ) =
k=1 gkTrΦ
k+1/(k + 1)!. While
this action is shown to be invariant under the N = 2 supersymmetry transformations [9, 10],
the vacuum breaks half of the N = 2 supersymmetries. Extremizing the scalar potential,
we obtain the condition 〈 ∂
∂Φ0∂Φ0
〉 = −(e± iξ)/m, which is a polynomial of order n and this
determines the expectation value of the scalar field.
The action SN=2 in (2.1) is to be compared with that of the N = 1, U(N) gauge model
with a single trace tree level superpotential W (Φ):
SN=1 =
d4xd4θTr Φ̄eadV Φ+
d4xd2θTr (iτWW +W (Φ)) + h.c.
, (2.2)
where τ is a complex gauge coupling τ = θ/2π + 4πi/g2.
In [9], it is checked that the second supersymmetry reduces to the fermionic shift symme-
try in the limit m → ∞. The action SN=2 in fact reduces to SN=1 in the limit m, e, ξ → ∞
with mgk (k ≥ 2) fixed [19]. We show that our result reduces to that of [7, 17] in this limit.
III. Diagrammatic analysis of the effective superpotential
In this letter, we consider the matter-induced part of the effective superpotential by
integrating out the massive degrees of freedom Φ:
d4x(d2θWeff+h.c.+d
4θ(nonchiral terms)) =
DΦDΦ̄eiSN=2. (3.1)
∗a = 0 corresponds to the overall U(1) part.
Let us take Wα (or V ) as the background field †. We consider the case of unbroken U(N)
gauge group. For simplicity, we choose 〈Φ〉 = 0 by setting g1 = −(e± iξ)/m.
We are interested in the holomorphic superpotential which does not contain the anti-
holomorphic couplings ḡk. We can take ḡk = 0 for k ≥ 3 without loss of generality. Collecting
the Φ̄ dependent terms, we obtain
SΦ̄ =
d4xd4θ
Φ̄eadV
∂F(Φ)
− (ḡ1Φ̄ +
Φ̄2)eadV Φ
d4xd2θ̄
TrΦ̄2
d4xd4θTr
Φ̃ḡ2
ḡ1Φ−
. (3.2)
In the last expression, we have introduced a covariantly anti-chiral superfield Φ̃ = Φ̄eadV ,
which satisfies ∇αΦ̃ = 0 (∇α = e
−adV Dαe
adV ). Eq. (3.2) is quadratic in Φ̃ and can be
integrated straightforwardly. As a result, we obtain the following terms,
16ḡ2
ḡ1Φ−
ḡ1Φ−
(Img1)
8mḡ2
Φ∇2Φ+ . . . , (3.3)
where . . . denotes the higher order interaction terms, which we will not consider here. Indeed,
these interaction vertices are higher order in m−1 compared to the vertices which we consider
below. These contribute to our main result (3.10) as higher order corrections in m−1 and do
not spoil our conclusion that the effective superpotential is modified from the case of SN=1
(2.2).
Replacing d2θ̄ integration by −∇̄2/4 and collecting the terms which are not in SΦ̄, we
obtain an action after the Φ̄ integration ‡:
d4xd2θTr
(Img1)
32mḡ2
Φ∇̄2∇2Φ +m
(WΦsWΦk−1−s)
. (3.4)
The first two terms are already present in the integrations with regard to the action SN=1
(2.2). The last term is new and originates from the gauge kinetic term in eq. (2.1). As we
will see below, this last term does contribute to the effective superpotential and becomes
responsible for the violation of the well-known relation [7, 8] between the effective superpo-
tential of the gauge theory and the planar free energy of the matrix model having the tree
level (bare) superpotential as its potential.
†The simplest background is that consisting of a vanishing gauge field Aµ and a constant gaugino λ
which satisfies {λα, λβ} = 0 [18]. This configuration implies that traces of more than two W vanish.
‡In eq. (3.4), it is understood that the generating functional has a renormalized perturbation expansion
in which a nonvanishing tadpole is always canceled by a nonvanishing value of the source coupled to Φ. This
implies that the tadpole can in practice be ignored.
After rescaling Φ → aΦ with a2 = mḡ2/(Img1)
2, the quadratic part of the action (3.4)
reduces to
−�+m′ +
adWαDα
(2WWΦ2 +WΦWΦ),
where we have used the relation ∇̄2∇2Φ = 16(�Φ − adWαDαΦ/2) and introduced m
a2mg2 and g
3 = a
2g3/12. The propagator in the momentum space is
∆(p, π) =
dse−s(p
2+m′+ 1
adWαπα−ig
The Grassmann momentum πα is Fourier transformation of superspace coordinate θα and
the matrix M is
Mabcd = (WW)daδbc + (WW)bcδda +WdaWbc, (3.5)
where we have exhibited the gauge index dependence explicitly. This matrix is not present
in the propagator of [17]. Using eq. (3.5), we are able to insert W without involving the
momentum πα.
The interaction terms in eq. (3.4) are divided into the following two types:
type I. m
Tr Φk, k = 3, . . . , n+ 1.
type II. −
Tr(WΦsWΦk−1−s), k = 4, . . . , n+ 1.
Type I vertices are already present in [17]. Type II vertices are not present in [17]. They
insert two W in specific ways.
Before going on to consider loop diagrams, let us first demonstrate that we have only
to consider planar diagrams in our case as well [17, 18]. For a given diagram, we denote
by V the number of vertices, by P the number of propagators and by h the number of
holes (or index loops). There are V sets of chiral superspace integrations from V vertices.
One of them becomes the chiral superspace integration over the effective superpotential,
and the number of remaining πα momentum integrations is P − V + 1. These Grassmann
integrations must be saturated by 1
adWαπα terms in the propagators. Furthermore, we can
freely insert W both from the M terms in the propagators and from the type II vertices. If
we denote the number of these additional insertions by 2α, the total number of W insertions
is 2(P − V + 1 + α). On the other hand, one index loop can accommodate at most two W.
Thus we have h ≥ P − V + 1+ α. This implies that only the planar diagrams contribute to
the effective superpotential as the Euler number of the diagram is χ = V − P + h.
A planar diagram with h index loops has (h − 1) loop momenta. Let us consider the
(h− 1)-loop planar diagrams (contributing to the (h− 1)-loop vacuum amplitude) in which
all vertices are type I. Let us, for a moment, ignore the M term of (3.5). The calculation is
then the same as that of [17] which we briefly describe. Each diagram is a product of the
bosonic part obtained by integrating over the momentum p and the fermionic one coming
from the πα integrations. As we have seen in the last paragraph, we have exactly 2(h − 1)
W insertions in the fermion part. There are two possibilities for these W insertions. The
one is to keep one of the index loops empty, filling the remaining index loops with two
W. This yields NSh−1 term, where S = − 1
TrU(N)W
αWα. The other is to fill each of
two index loops chosen with single W, which yields Sh−2wαwα terms where w
α = 1
TrWα.
After calculating the both parts, we perform the Schwinger parameter integrals. Clearly this
procedure is universal to every (h− 1)-loop planar diagram up to the multiplications by the
symmetric factor and by the coupling constants. Therefore every such diagram is a product
of these factors with the following expression
{NhSh−1 + hC22S
h−2wαwα} ≡
(h−1)
0 ,(3.6)
where we have introduced A
(h−1)
0 . The factor h of the first term comes from the choice of
the empty index loop, and hC2 of the second term is the combination of inserting two W
into different index loops. The most important fact is that the dependence on Schwinger
parameters of the bosonic part is cancelled by that of the fermionic part. This explains
that the calculation of the effective superpotential of the gauge theory reduces to that of the
matrix model [17].
There are two types of corrections to A
(h−1)
0 . The one is due to the presence of the M
terms in the propagators, which we denote by A
(h−1)
1 . The other is due to the type II vertices,
which is obtained by replacing one of the type I vertices in A
(h−1)
0 by the corresponding type
II vertex and by summing over all possibilities. We denote this by A
(h−1)
2 . We consider them
in order.
Let us see the effects of the M term, namely, eq. (3.5). It plays a role of inserting two
W further. Thus we will obtain terms which are proportional to Sh. Note that we cannot
insert more than two W because, in such case, at least one of the index loops has more
than two insertions of W. For the parts contributing to NSh−1, which have an empty index
loop, we can further insert WαWα from the first two terms in (3.5). In the case in which
they are inserted in the a-th index loop, we obtain
TrWW, where ia
labels the propagators which form the a-th index loop. The absence of factor N is explained
by the absence of an empty index loop. The factor h is not present as we have so far
restricted ourselves to the a-th index loop. Summing over all index loops, we obtain the first
contribution to A
(h−1)
TrWW = 2ig′3
TrWW,
where we have used that when all index loops are summed, they pass through each double
line propagator exactly twice.
Let us note that the parts contributing to the second term of eq. (3.6) can receive further
insertions of W as well. They have two index loops with a single W insertion, for which
we can exploit the last term of M . An insertion of this term requires that two index loops
share a propagator. Let us define the index A = 1, . . . , hC2 as labeling the combinations of
such two index loops and the index Ã labeling the cases which have a common propagator
in the two index loops chosen. Let us further introduce the index iÃ labeling the common
propagator in case Ã. With these notations, we obtain the second contribution to A
(h−1)
2Sh−2
WαabWαcdW
Wβdc = ig
TrWαWα.
Putting all these together, we obtain the contributions from the vertices of type I,
(h−1)
(h−1)
1 (si))
)h−1(
16π2iPg3S
wαwα. (3.7)
It is important that the above new term has Schwinger parameter dependence aside from
the exponential factor. In [17], it was pointed out that the cancellation of this dependence
represents the reduction of the system to the matrix model. The appearance of this new
term with Schwinger parameter dependence may spoil this reduction. Note also that this new
term does not have an overall factor N , indicating the violation of the well-known relation
due to Dijkgraaf-Vafa [7].
We now turn to the vertices of type II which contain two W insertions. The ℓ-th order
vertex in Φ is
Tr(2WWΦℓ +WΦWΦℓ−1 + . . .+WΦℓ−1WΦ). (3.8)
where we have omitted the overall factors. The first term inserts two W into an index loop
while the remainder insert them into two different index loops. Having done 2(h − 1) πα
Figure 1: two-loop planar diagrams
integrations, we obtain 2(h − 1) W insertions. We can therefore use vertex (3.8) only once
in a diagram. When this is done, insertion of the M term from the propagator is disallowed.
Let us consider A
(h−1)
2 and suppose that one of the type I vertices, TrΦ
ℓ, is replaced by
the above vertex (3.8). The first term connects ℓ index loops and we can insert W2 into ℓ
different ways. Thus we obtain
2ℓTrWW as a contribution to A
(h−1)
2 . For the other
terms of eq. (3.8), there are in total ℓ(ℓ − 1) ways of inserting two W into different index
loops. These give
2Sh−2
ℓ(ℓ− 1)
WαabWαcdW
baWβdc =
ℓ(ℓ− 1)TrWW.
Summing the above two contributions, we obtain
ℓ(ℓ + 1)TrWW. Thus, in any
(h − 1)-loop diagram, changing a vertex from type I to type II is equivalent to considering
only NSh−1 terms in eq. (3.6) and changing the coupling constant by
mgℓ →
16π2igℓ+1S
, for ℓ ≥ 3. (3.9)
Therefore, we obtain a formula for the contribution from the (h− 1)-loop diagrams with
P propagators to Weff in (2.1),
(h−1)
∂F (h−1)
∂2F (h−1)
wαwα −
16π2iPmg3
∂F (h−1)
(h−1)
2 , (3.10)
where W
(h−1)
2 is defined by replacing, in the first term, one coupling constant according to
eq. (3.9) and summing over all possibilities. We have denoted by F (h−1) the (h − 1)-loop
contribution to the planar free energy of the matrix model.
IV. Example
As a sample computation, let us take the two-loop contribution to the effective superpo-
tential. There are two two-loop planar diagrams depicted in Fig.1. Collecting all possible
insertions of W, we obtain
(mg3)
32(mg2)3
NS2 −
(mg3)
16(mg2)3
Swαwα +
π2i(mg3)
2(mg2)4
π2i(mg3)(mg4)
3(mg2)3
. (4.1)
The first two terms are the ones which are present in the computation based on [7, 8] with
SN=1. The third one comes from the M term in the propagator and the last one from the
type II vertices. Note that, in the limit m → ∞ with mgk (k ≥ 2) fixed, we reproduce the
result of [17]. In an arbitrary loop amplitude, the situation is the same: new terms are of
order m−1 in this limit.
The overall U(1) part does not decouple from the SU(N) part. This can be easily seen by
translating S into the glueball superfield Ŝ = − 1
TrSU(N) W
αWα and extracting the factor
in front of wαwα. By the existence of the last two terms in eq. (3.10), it is nonvanishing.
For example, in the two-loop example, this part in (4.1) reads
πi(mg3)[2(mg2)(mg4)− 3(mg3)
2(mg2)4
wαwα 6= 0,
V. The chiral ring and the generalized Konishi anomaly
An alternative approach to the effective superpotential is to exploit and extend the prop-
erties of the N = 1 chiral ring and the generalized Konishi anomaly equations based on
reference [20, 8]. The anomalous Ward identity of our model for the general transformation
δΦ = f(Φ,W) is
= 〈TrfW ′(Φ)〉Φ −
Tr(fF ′′′(Φ)WαWα)
, (5.1)
where W ′′(Φ) = mF ′′′(Φ). In terms of the two generating functions of chiral one-point
functions
R(z) = −
TrWαWα
z − Φ
T (z) =
z − Φ
the anomalous Ward identities (5.1) are
R(z)2 = W ′(z)R(z) +
f(z),
2R(z)T (z) = W ′(z)T (z) +
c(z) + 16π2iF ′′′(z)R(z) +
c̃(z),
where f(z) and c(z) are polynomials of degree n− 1 in z and c̃(z) is a polynomial of degree
n− 2:
f(z) = −
(W ′(Φ)−W ′(z))WαWα
z − Φ
c(z) = 4
W ′(Φ)−W ′(z)
z − Φ
c̃(z) = −i
(F ′′′(Φ)− F ′′′(z))WαWα
z − Φ
The last term of eq. (5.1) does not contribute to the equation for R(z) because of the chiral
ring relation TrWαWαW
βWβ = 0. The equation for R(z) is the same as that of [8], which
is the loop equation of the matrix model. On the other hand, the equation for T (z) alters
from that of [8].
The final step of this approach is to express the effective superpotential in terms of R(z)
and T (z). Taking a variational derivative of (3.1) with respect to the coupling gk, we obtain
∂Weff
dzzkT (z) +
16π2i
(k − 1)!
dzzk−1R(z).
Hence we can determine the effective superpotential up to gk independent terms.
Acknowledgements
We thank Kazuhito Fujiwara, Yosuke Imamura, Hiroaki Kanno, Hironobu Kihara, Ya-
sunari Kurita, Kazutoshi Ohta and Makoto Sakaguchi for useful discussions. We are grateful
to Hiraku Yonemura for his collaboration at an early stage. This work is supported in part by
the Grant-in-Aid for Scientific Research (18540285) from the Ministry of Education, Science
and Culture, Japan. Support from the 21 century COE program “Constitution of wide-angle
mathematical basis focused on knots” is gratefully appreciated. The preliminary version of
this work was presented in YITP workshop “Fundamental Problems and Applications of
Quantum Field Theory”, YITP-W-06-16 in Yukawa Institute for Theoretical Physics, Kyoto
University (December 14-16 2006). We wish to acknowledge the participants for stimulating
discussions.
References
[1] M. T. Grisaru, W. Siegel and M. Rocek, Nucl. Phys. B 159 (1979) 429.
[2] G. Veneziano and S. Yankielowicz, Phys. Lett. B 113 (1982) 231.
[3] I. Affleck, M. Dine and N. Seiberg, Phys. Rev. Lett. 51 (1983) 1026; Nucl. Phys. B 241
(1984) 493.
[4] C. Vafa, J. Math. Phys. 42 (2001) 2798 [arXiv:hep-th/0008142].
[5] F. Cachazo, K. A. Intriligator and C. Vafa, Nucl. Phys. B 603 (2001) 3
[arXiv:hep-th/0103067].
[6] F. Cachazo and C. Vafa, [arXiv:hep-th/0206017].
[7] R. Dijkgraaf and C. Vafa, Nucl. Phys. B 644 (2002) 3 [arXiv:hep-th/0206255]; Nucl.
Phys. B 644 (2002) 21 [arXiv:hep-th/0207106]; [arXiv:hep-th/0208048].
[8] F. Cachazo, M. R. Douglas, N. Seiberg and E. Witten, JHEP 0212 (2002) 071
[arXiv:hep-th/0211170].
[9] K. Fujiwara, H. Itoyama and M. Sakaguchi, Prog. Theor. Phys. 113 (2005) 429
[arXiv:hep-th/0409060]; [arXiv:hep-th/0410132].
[10] K. Fujiwara, H. Itoyama and M. Sakaguchi, Nucl. Phys. B 723 (2005)
33 [arXiv:hep-th/0503113]; Prog. Theor. Phys. Suppl. 164 (2007) 125
[arXiv:hep-th/0602267].
[11] K. Fujiwara, H. Itoyama and M. Sakaguchi, Nucl. Phys. B 740 (2006) 58
[arXiv:hep-th/0510255]; [arXiv:hep-th/0611284].
[12] I. Antoniadis, H. Partouche and T.R. Taylor, Phys. Lett. B 372 (1996) 83,
[arXiv:hep-th/9512006]; I. Antoniadis and T. R. Taylor, Fortsch. Phys. 44 (1996) 487
[arXiv:hep-th/9604062]; H. Partouche and B. Pioline, Nucl. Phys. Proc. Suppl. 56B
(1997) 322 [arXiv:hep-th/9702115].
[13] S. Ferrara, L. Girardello and M. Porrati, Phys. Lett. B 366 (1996) 155
[arXiv:hep-th/9510074]; P. Fre, L. Girardello, I. Pesando and M. Trigiante, Nucl.
Phys. B 493 (1997) 231 [arXiv:hep-th/9607032]; M. Porrati, Nucl. Phys. Proc. Suppl.
55B (1997) 240 [arXiv:hep-th/9609073]; J. Louis, [arXiv:hep-th/0203138]; H. Itoyama
and K. Maruyoshi, Int. J. Mod. Phys. A 21 (2006) 6191 [arXiv:hep-th/0603180];
K. Maruyoshi, [arXiv:hep-th/0607047].
[14] P. Kaste and H. Partouche, JHEP 0411 (2004) 033 [arXiv:hep-th/0409303]; P. Merlatti,
Nucl. Phys. B 744 (2006) 207 [arXiv:hep-th/0511280]; L. Girardello, A. Mariotti and
G. Tartaglino-Mazzucchelli, JHEP 0603 (2006) 104 [arXiv:hep-th/0601078].
http://arxiv.org/abs/hep-th/0008142
http://arxiv.org/abs/hep-th/0103067
http://arxiv.org/abs/hep-th/0206017
http://arxiv.org/abs/hep-th/0206255
http://arxiv.org/abs/hep-th/0207106
http://arxiv.org/abs/hep-th/0208048
http://arxiv.org/abs/hep-th/0211170
http://arxiv.org/abs/hep-th/0409060
http://arxiv.org/abs/hep-th/0410132
http://arxiv.org/abs/hep-th/0503113
http://arxiv.org/abs/hep-th/0602267
http://arxiv.org/abs/hep-th/0510255
http://arxiv.org/abs/hep-th/0611284
http://arxiv.org/abs/hep-th/9512006
http://arxiv.org/abs/hep-th/9604062
http://arxiv.org/abs/hep-th/9702115
http://arxiv.org/abs/hep-th/9510074
http://arxiv.org/abs/hep-th/9607032
http://arxiv.org/abs/hep-th/9609073
http://arxiv.org/abs/hep-th/0203138
http://arxiv.org/abs/hep-th/0603180
http://arxiv.org/abs/hep-th/0607047
http://arxiv.org/abs/hep-th/0409303
http://arxiv.org/abs/hep-th/0511280
http://arxiv.org/abs/hep-th/0601078
[15] S. Gukov, C. Vafa and E. Witten, Nucl. Phys. B 584 (2000) 69 [Erratum-ibid. B 608
(2001) 477] [arXiv:hep-th/9906070].
[16] T. R. Taylor and C. Vafa, Phys. Lett. B 474 (2000) 130 [arXiv:hep-th/9912152].
[17] R. Dijkgraaf, M. T. Grisaru, C. S. Lam, C. Vafa and D. Zanon, Phys. Lett. B 573
(2003) 138 [arXiv:hep-th/0211017].
[18] R. Argurio, G. Ferretti and R. Heise, Int. J. Mod. Phys. A 19 (2004) 2015
[arXiv:hep-th/0311066].
[19] K. Fujiwara, Nucl. Phys. B 770 (2007) 145 [arXiv:hep-th/0609039].
[20] K. Konishi, Phys. Lett. B 135 (1984) 439.
http://arxiv.org/abs/hep-th/9906070
http://arxiv.org/abs/hep-th/9912152
http://arxiv.org/abs/hep-th/0211017
http://arxiv.org/abs/hep-th/0311066
http://arxiv.org/abs/hep-th/0609039
	Introduction
	The U(N) gauged model with spontaneously broken N=2 supersymmetry
	Diagrammatic analysis of the effective superpotential
	Example
	The chiral ring and the generalized Konishi anomaly
ABSTRACT
  It is known that the fermionic shift symmetry of the N=1, U(N) gauge model
with a superpotential of an adjoint chiral superfield is replaced by the second
(spontaneously broken) supersymmetry in the N=2, U(N) gauge model with a
prepotential and Fayet-Iliopoulos parameters. Based on a diagrammatic analysis,
we demonstrate how the well-known form of the effective superpotential in the
former model is modified in the latter. A set of two equations on the one-point
functions stating the Konishi anomaly is modified accordingly.

<|endoftext|><|startoftext|>
Comment on “Chiral Suppression of Scalar Glueball Decay”
Kuang-Ta Chao1, Xiao-Gang He2, and Jian-Ping Ma3,1
Department of Physics, Peking University, Beijing
Department of Physics and Center for Theoretical Sciences, National Taiwan University, Taipei
Institute Of Theoretical Physics, Academia Sinica, Beijing
PACS numbers: PACS numbers: 12.39.Mk, 12.38.Bx
In a recent letter, based on an effective Lagrangian,
Chanowitz[1] showed that in the limit that the mass mq
of a light quark q goes to zero, the decay amplitude for
a scalar glueball Gs decaying into qq̄ goes to zero, and
conjectured further that this chiral suppression also oc-
curs at the hadron level for Gs decays into ππ,KK with
the ratio of these two branching ratios to be of the or-
der O(m2u,d/m
s) for finite quark masses. Here we show
that the decay Gs → qq̄ is forbidden in the chiral limit
in QCD without assumptions. More essentially, we show
that this chiral suppression may be spoiled and may not
materialize itself at the hadron level.
A glueball here is assumed to be a pure gluonic state.
It decays into a qq̄ pair through a multi-gluon annihila-
tion process. The decay amplitude for Gs → q(p1)q̄(p2)
can be written as a product of a spinor pair ū(p1) and
v(p2) with a product of any number of γ matrices sand-
wiched between the spinors. Because vector-like coupling
in QCD, for mq = 0 the number of the γ-matrices is
an odd number which can always be reduced to one γ-
matrix. Therefore the amplitude can be written as:
Tqq̄ = ū(p1)γµA
µv(p2).
Lorentz covariance of the amplitude then dictates
Aµ(p1, p2) to be of the form a1p
+ a2p
. Therefore in
the chiral limit mq = 0, Tqq̄ = 0. The result also applies
to a pseudoscalar glueball decays into a qq̄ pair.
To study whether there is a chiral suppression in
Gs → ππ,KK or not, we work with an effective La-
grangian, Ls = fgG
a,µνGaµνGs, as in [1], and employ
QCD factorization[2] to calculate the amplitude Tππ for
Gs → π
+π−. To the leading twist-2 order, there are two
diagrams with the two gluons splitting into two quarks
and two anti-quarks, and then form two pions. The two
gluons are off-shell by the scale at order of MGs . A direct
calculation gives:
Tππ = −αsfg
du1du2φπ+(u1)φπ−(u2)
(1 − u1)(1 − u2)
[1 +O(αs, λ/MGs)] ,
where φπ is normalized as
duφπ(u) = 1. ui(i = 1, 2) is
the momentum fraction carried by the anti-quark in the
meson. In the above, λ can be any soft scale, such as
quark mass, ΛQCD and mπ. Clearly, Tππ is not zero in
the chiral limit mq = 0.
The amplitude for Gs → K
+K− decay can be ob-
tained by replacing quantities related to π by those re-
lated to K correspondingly. We would obtain, R =
B(Gs → ππ)/B(Gs → KK) ≈ f
K = 0.48, which is
substantially different from 1. This suppression is much
milder compared with the one at the quark level. This is
due to the fact that in perturbative QCD (pQCD) calcu-
lation the decay of Gs → ππ,KK is related to the cou-
pling of Gs to two pairs of qq̄ compared with conjectured
by Chanowitz in [1], where it is assumed that Gs just cou-
ples to one qq̄ pair. We should point out that whether
the chiral suppression at quark level can be realized still
waits for better non-perturbative calculation for the di-
rect two quark hadronization into ππ and KK. If the
pQCD contribution dominates, the result of R ≈ f4π/f
can be obtained without the assumption of the effective
Lagrangian. Because glueball is a pure gluon state, the
amplitude of the decay Gs → π
+π− can always be writ-
ten with QCD factorization as Tππ = f
πHg ⊗φπ+ ⊗φπ− ,
where the higher-twist effects related to π’s are neglected
andHg consists of some perturbative coefficient functions
and some quantities related to the structure of Gs. Al-
though Hg is unknown, one can easily find the result of
R ≈ f4π/f
The f0(1710) is a candidate for scalar glueball. Early
measurement obtained R ≤ 0.11[3], and a larger one by
BES[4] R = 0.41+0.11
−0.17 recently. It is interesting to notice
that the later is consistent with our result and may fa-
vor that the f0(1710) is a gluebal. However one should
remember that the prediction R ≈ f4π/f
K can have sub-
stantial non-perturbative corrections and there may be
further complication by mixing effects of a glueball with
qq̄ states. A more detailed study can be found in [5].
Acknowledgments: This work was supported in part
by grants from NSC and NNSFC (No 10421503).
[1] M.S. Chanowitz, Phys. Rev. Lett. 95, 172001(2005) .
[2] S.J. Brodsky and G.P. Lepage, Phys. Rev. D24,
2848(1981), G.P. Lepage and S.J. Broadsky, Phys. ReV.
D22, 2157(1980).
[3] W.-.M Yao et al., (Particle Data Group), J. Phys. G33,
1(2006).
http://arxiv.org/abs/0704.1061v1
[4] M. Ablikim et al. (BES Collaboration), Phys. Lett. B642,
441(2006).
[5] K.T. Chao, Xiao-Gang He and J.P. Ma, hep-ph/0512327.
http://arxiv.org/abs/hep-ph/0512327
ABSTRACT
  Comment on ``Chiral Suppression of Scalar Glueball Decay''

<|endoftext|><|startoftext|>
Scaling pT distributions for p and p̄ produced in Au+Au collisions at RHIC
W.C. Zhang, Y. Zeng, W.X. Nie, L.L. Zhu and C.B. Yang
Institute of Particle Physics, Hua-Zhong Normal University, Wuhan 430079, P.R. China
With the experimental data from STAR and PHENIX on the centrality dependence of the pT
spectra of protons and anti-protons produced at mid-rapidity in Au+Au collisions at 200 GeV,
we show that for protons and anti-protons there exists a scaling distribution independent of the
colliding centrality. The scaling functions can also describe data from BRAHMS for both proton
and anti-proton spectra at y = 2.2 and 3.2. The scaling behaviors are shown to be incompatible
with the usual string fragmentation scenario for particle production.
PACS numbers: 25.75.Dw,13.85.Ni
I. INTRODUCTION
One of the most important quantities in investigating
properties of the medium produced in high energy col-
lisions is the particle distribution for different species of
final state particles. RHIC experiments have found a lot
of novel phenomena from the particle spectra, such as the
unexpectedly large p/π ratio at pT ∼ 3 GeV/c [1], the
constituent quark number scaling of the elliptic flows [2],
and strong nuclear suppression of the pion spectrum in
central Au+Au collisions [3], etc. From the spectrum one
can learn a lot on the dynamics for particle production.
In many studies, searching for a scaling behavior of
some quantities vs suitable variables is useful for unveil-
ing potential universal dynamics. A typical example is
the proposal of the parton model from the x-scaling of
the structure functions in deep-inelastic scatterings [4].
Quite recently, a scaling behavior [5] of the pion spectrum
at mid-rapidity in Au+Au collisions at RHIC was found,
which related spectra with different collision centralities.
In [6] the scaling behavior was extended to non-central
region, up to η = 3.2 for both Au+Au and d+Au colli-
sions. The same scaling function can be used to describe
pion spectra for pT up to a few GeV/c from different col-
liding systems at different rapidities and centralities. The
shape of pion spectrum in those collisions is determined
by only one parameter 〈pT 〉, the mean transverse momen-
tum of the particle. It is very interesting to ask whether
similar scaling behaviors can be found for spectra of other
particles produced in Au+Au collisions at RHIC. In this
paper, the scaling property of the spectra for protons and
anti-protons is investigated and compared with that for
pions.
The organization of this paper is as follows. In Sec. II
we will address the procedures for searching the scaling
behaviors. Then in Sec. III the scaling properties of the
spectra for protons and anti-protons produced in Au+Au
collisions at RHIC at
sNN = 200 GeV will be studied.
We discuss mainly the centrality scaling of the spectra at
mid-rapidity and extend the discussion to very forward
region with rapidity y = 2.2 and 3.2 briefly. Sec. IV is for
discussions on the relation between the scaling behaviors
and the string fragmentation scenario.
II. METHOD FOR SEARCHING THE SCALING
BEHAVIOR OF THE SPECTRUM
As done in [5, 6], the scaling behavior of a set of spectra
at different centralities can be searched in a few steps.
First, we define a scaled variable
z = pT /K , (1)
and the scaled spectrum
Φ(z) = A
2πpTdpTdy
pT=Kz
, (2)
with K and A free parameters. As a convention, we
choose K = A = 1 for the most central collisions. With
this choice Φ(z) is nothing but the pT distribution for
the most central collisions. For the spectra with other
centralities, we try to coalesce all data points to one curve
by choosing proper parameters A and K. If this can
be achieved, a scaling behavior is found. The detailed
expression of the scaling function depends, of course, on
the choice of A and K for the most central collisions.
This arbitrary can be overcome by introducing another
scaling variable
u = z/〈z〉 = pT /〈pT 〉 , (3)
and the normalized scaling function
Ψ(u) = 〈z〉2Φ(〈z〉u)/
Φ(z)zdz . (4)
Here 〈z〉 is defined as
〈z〉 ≡
zΦ(z)zdz/
Φ(z)zdz . (5)
By definition,
Ψ(u)udu =
uΨ(u)udu = 1. This
scaled transverse momentum distribution is in essence
similar to the KNO-scaling [7] on multiplicity distribu-
tion.
III. SCALING BEHAVIORS OF PROTON AND
ANTI-PROTON DISTRIBUTIONS
Now we focus on the spectra of protons and anti-
protons produced at mid-rapidity in Au+Au collisions at
http://arxiv.org/abs/0704.1062v2
sNN = 200 GeV. STAR and PHENIX Collaborations
at RHIC published spectra for protons and anti-protons
at mid-rapidity for a set of colliding centralities [8, 9].
STAR data have a pT coverage larger than PHENIX ones.
As shown in Fig. 1, all data points for proton spectra at
different centralities can be put to the same curve with
suitably chosen A and K, by the procedure explained
in last section. The parameters are shown in Table I.
Except a few points for very peripheral collisions (cen-
tralities 60-92% for PHENIX data and 60-80% for STAR
data), all points agree well with the curve in about six
orders of magnitude. The larger deviation of data at
centralities 60-92% for PHENIX and 60-80% for STAR
from the scaling curve may be due to the larger central-
ity coverage, because the size of colliding system changes
dramatically in those centrality bins. For simplicity we
define v = ln(1+ z), and the curve can be parameterized
Φp(z) = 0.052 exp(14.9v − 16.2v2 + 3.3v3) . (6)
2 4 6 8 10 12
PHENIX
0−10%
20−30%
40−50%
60−92%
0−12%
10−20%
20−40%
40−60%
60−80%
p in Au+Au
FIG. 1: Scaling behavior of the spectrum for protons pro-
duced at mid-rapidity in Au+Au collisions at RHIC. The data
are taken from [8, 9]. Feed-down corrections are considered
in the data. The solid curve is from Eq. (6).
Similarly, one can put all data points for anti-proton
spectra at different centralities to a curve with other sets
of parameters A and K which are given also in TABLE I.
The agreement is good, as can be seen from Fig. 2, with
only a few points in small pT region for peripheral colli-
sions departing a little from the curve. For anti-proton
the scaling function is
Φp̄(z) = 0.16 exp(13v − 14.9v2 + 2.9v3) , (7)
with v defined above.
To see how good is the agreement between the fitted
curves in Figs. 1 and 2 and the experimental data, one
can calculate a ratio
B = experimental data/fitted results ,
2 4 6 8 10 12
STARPHENIX
0−12%0−10%
10−20%20−30%
20−40%40−50%
40−60%60−92%
60−80%
40−80%
 pbar in Au+Au
FIG. 2: Scaling behavior of the spectrum for anti-protons
produced at mid-rapidity in Au+Au collisions at RHIC. The
data are taken from [8, 9]. Feed-down effects are not corrected
in the STAR data for p̄. The solid curve is from Eq. (7).
STAR p p̄
centrality K A K A
0-12% 1 1 1 1
10-20% 0.997 1.203 1.005 1.417
20-40% 0.986 2.009 0.991 2.305
40-60% 0.973 4.432 0.993 5.414
60-80% 0.941 13.591 0.959 16.686
40-80% 0.986 8.126
PHENIX p p̄
centrality K A K A
0-10% 1.042 1.226 1.068 2.404
20-30% 1.026 2.532 1.045 4.901
40-50% 1.031 6.253 1.013 11.754
60-92% 0.934 39.056 0.935 69.31
BRAHMS p p̄
centrality K A K A
y = 2.2 0.930 0.921
y = 3.2 1.079 0.754 1.153 6.985
TABLE I: Parameters for coalescing all data points to the
same curves in Figs. 1 and 2.
and show B as a function of pT in linear scale for all
the data sets, as shown in Fig. 3 for the case of proton.
From the figure one can see that almost all the points
have values of B within 0.7 to 1.3, which means that the
scaling is true within an accuracy of 30%. This is quite
a good fit, considering the fact that the data cover about
6 orders of magnitude. For anti-protons, the agreement
is better than for protons.
Now one can see that the transverse momentum dis-
tributions for protons and anti-protons satisfy a scaling
law. For large pT (thus large z) the scaling functions in
Eqs. (6) and (7) behave as powers of pT , though the
2 4 6 8 10 12
FIG. 3: Ratio between experimental data and the fitted re-
sults shown in Fig. 1. STAR and PHENIX data are taken
from [8, 9]. Symbols are the same as in Fig. 1.
2 4 6 8 10 12
PHENIX
0−10%
20−30%
40−50%
60−92%
BRAHMS
y=3.2
p in Au+Au
0−12%
10−20%
20−40%
40−60%
60−80%
FIG. 4: Normalized scaling distribution for protons produced
at mid-rapidity and very forward direction in Au+Au col-
lisions at RHIC with the scaling variable u. STAR and
PHENIX data are taken from [8, 9] and BRAHMS data from
[10].
expressions are not in powers of z or pT . The scaling
functions in Eqs. (6) and (7) depend on the choices of A
and K for the case with centrality 0-12% for STAR data.
With the variable u defined in Eq. (3) this dependence
can be circumvented. 〈z〉’s for protons and anti-protons
are 1.14 and 1.08, respectively, with integration over z in
the range from 0 to 12, roughly corresponding to the pT
range measured by STAR. The normalized scaling func-
tions Ψ(u) for protons and anti-protons can be obtained
easily from Eqs. (6) and (7) and are shown in Figs. 4
and 5, respectively together with scaled data points as
in Figs. 1 and 2. A simple parameterization for the two
normalized scaling functions in Figs. 4 and 5 can be given
as follows
Ψp(u) = 0.064 exp(13.6v − 16.67v2 + 3.6v3) ,
Ψp̄(u) = 0.086 exp(12.41v − 15.31v2 + 3.16v3) ,
with v = ln(1 + u).
2 4 6 8 10 12
PHENIX
0−10%
20−30%
40−50%
60−92%
BRAHMS
y=3.2
y=2.2
pbar in Au+Au
0−12%
10−20%
20−40%
40−60%
60−80%
40−80%
FIG. 5: Normalized scaling distribution for anti-protons pro-
duced at mid-rapidity and very forward direction in Au+Au
collisions at RHIC with the scaling variable u. STAR and
PHENIX data are taken from [8, 9] and BRAHMS data from
[10].
As in the case for pion distributions, one can also in-
vestigate the pT distributions of protons and anti-protons
in non-central rapidity regions in Au+Au collisions. The
only data set we can find is from BRAHMS [10] at rapid-
ity y = 2.2 and 3.2 with centrality 0-10%. It is found that
the BRAHMS data can also be put to the same scaling
curves, as shown in Figs. 4 and 5. The values of corre-
sponding parameters A and K are also given in TABLE
I. Thus the scaling distributions found in this paper may
be valid in both central and very forward regions for pro-
tons and anti-protons produced in Au+Au collisions at
RHIC at
sNN=200 GeV.
Now one can ask for the difference between the scaling
functions for protons and anti-protons. After normaliza-
tion to 1 the difference between the scaling distributions
Ψ(u) for protons and anti-protons is shown in Fig. 6. In
log scale the difference between the two scaling functions
is invisible at low u. To show the difference clearly a ra-
tio r = Ψp(u)/Ψp̄(u) is plotted in the inset of Fig. 6 as
a function of u. The increase of r with u is in agreement
qualitatively with data shown in [9] where it is shown that
p̄/p decreases with pT monotonically. The difference in
the two scaling functions can be understood physically.
In Au+Au collisions there are much more quarks u, d
than ū and d̄ in the initial state. In the central region in
the state just before hadronization, more u and d quarks
can be found because of the nuclear stopping effect in
the interactions. As a consequence, more protons can
be formed from the almost thermalized quark medium
than anti-protons in the small pT regime. Experimen-
tal data show that in low pT region the yield of anti-
proton is about 80% that of protons in central Au+Au
collisions at RHIC. This difference contributes to the net
baryon density in the central region in Au+Au collisions
at RHIC. On the other hand, in the large pT region, pro-
tons and anti-protons are formed mainly from fragmen-
tation of hard partons produced in the QCD interactions
with large momentum transfer. As shown in [11], the
gluon yield from hard processes is about five times that
of u and d quarks. The fragmentation from a gluon to
p and p̄ is the same. The amount of u, d quarks from
hard processes is about 10 times that of ū, d̄ when the
hard parton’s transverse momentum is high enough. It
is well-known that the fragmentation function for a gluon
to p or p̄ is much smaller than that for a u or d (ū or d̄)
to p (p̄) because of the dominant valence quark contribu-
tion to the latter process. As a result, the ratio of yields
of proton over anti-proton at large pT is even more than
that at small pT . After normalizing the distributions to
the scaling functions the yield ratio of proton over anti-
proton increases approximately linearly with u when u is
large. It should be mentioned that no such difference for
π+, π− and π0, because they all are composed of a quark
and an antiquark.
2 4 6 8 10 12
2 4 6 8 10 12
FIG. 6: Comparison between the scaling functions for pro-
tons and anti-protons produced at mid-rapidity in Au+Au
collisions at RHIC with the scaling variable u. The inset if
for the ratio Ψp(u)/Ψp̄(u).
The scaling behaviors of the pT distribution functions
for protons and anti-protons can be tested experimen-
tally from studying the ratio of moments of the mo-
mentum distribution, 〈pnT 〉/〈pT 〉n =
unΨ(u)udu for
n = 2, 3, 4, · · ·. From the determined normalized distri-
butions, the ratio can be calculated by integrating over
u in the range from 0 to 12, as mentioned above, and
the results are tabulated in TABLE II. The values of the
ratio are independent of the parameters A and K in the
fitting process but only on the functional form of the
scaling distributions. If the scaling behaviors of particle
distributions are true, such ratios should be constants in-
n p p̄ π
2 1.194 1.215 1.65
3 1.717 1.775 4.08
4 2.978 3.064 14.4
5 6.415 6.417 64.73
6 19.045 17.253 373.82
TABLE II: Ratio of moments 〈pnT 〉/〈pT 〉
n for protons, anti-
protons and pions produced in Au+Au collisions at RHIC.
dependent of the colliding centralities and rapidities. For
comparison, the corresponding values of the ratio for pi-
ons produced in the same interactions, calculated in [6],
are also given in TABLE II. Because of very small dif-
ference in the scaling distributions for protons and anti-
protons at small u, the ratio for protons increases with
n at about the same rate as for anti-proton for small n.
For large n, the ratio for p becomes larger than that for p̄
because of the big difference in the scaling functions for p
and p̄ at large u. Because of the very strong suppression
of high transverse momentum proton production relative
to that of pions, the ratio for pions increases with n much
more rapidly than for p and p̄.
Another important question is about the difference be-
tween the scaling functions for protons in this paper and
for pions in [5, 6]. Experiments at RHIC have shown that
the ratio of proton yield over that of pion increases with
pT up to 1 in the region pT ≤ 3 GeV/c and saturates
in large pT region. This behavior should be seen from
the scaling functions for these two species of particles.
For the purpose of comparing the scaling distributions
we define a ratio
R = Ψp(u)/Ψπ(u) , (8)
and plot the ratio R as a function of u in Fig. 7. The ratio
increases with u, when u is small, reaches a maximum at
u about 1 and then decreases. Finally it decreases slowly
to about 0.1 for very large u. The highest value of R
is about 1.6, while the experimentally observed p over π
ratio is about 1 at pT ∼ 3 GeV/c. The reason for this
difference is two-fold. One is the normalization difference
in defining R and the experimental ratio. Another lies in
the different mean transverse momenta 〈pT 〉’s for pions
and protons with which the scaling variable u is defined
and used in getting the ratio R.
The existence of difference in the scaling distributions
for different species of particles produced in high energy
collisions is not surprising, because the distributions re-
flect the particle production dynamics which may be dif-
ferent for different particles. In the quark recombina-
tion models [12, 13, 14] pions are formed by combining a
quark and an anti-quark while protons by three quarks.
Because different numbers of (anti)quarks participate in
forming the particles, their scaling distributions must be
different. In this sense, our investigation results urge
more studies on particle production mechanisms.
0 1 2 3 4 5
FIG. 7: Ratio Ψp(u)/Ψπ(u) between the scaling functions for
protons and pions produced in Au+Au collisions at RHIC as
a function of the scaling variable u. The pion scaling distri-
bution is from [5, 6].
IV. DISCUSSIONS
From above investigation we have found scaling distri-
butions for protons and anti-protons produced in Au+Au
collisions at RHIC in both mid-rapidity and forward re-
gion. The difference between those two scaling distri-
butions is quite small, but they differ a lot from that for
pions and the ratio Ψp/Ψπ exhibits a nontrivial behavior.
Investigations in [5, 6] and in this paper have shown
that particle distributions can be put to the same curve
by linear transformation on pT . Though we have not
yet a uniform picture for the particle productions in high
energy nuclear collisions, the scaling behaviors can, in
some sense, be compared to that from the string frag-
mentation picture [15]. In that picture if there are n
strings, they may overlap in an area of Sn and the
average area for a string is then Sn/n. It is shown
that the momentum distributions can be related to the
case in pp collisions also by a linear variable change
pT → pT ((Sn/n)AuAu/(Sn/n)pp)1/4. Viewed from that
picture, our fitted K gives the degree of string overlap.
The average area for a string in most central Au+Au
collisions is about 70 percent of that in peripheral ones
from the values of K obtained from fitting the spectra of
proton. If string fragmentation is really the production
mechanism for all species of particles in the collisions, one
can expect that the overlap degree obtained is the same
from the changes of spectrum of any particle. In the lan-
guage in this work, values of K are expected the same
for pions, protons and other particles in the string frag-
mentation picture for particle production. Our results
show the opposite. Comparing the values of K from [5]
and this work, one can see that for pion spectrum K is
larger for more peripheral collisions but smaller for pro-
ton and anti-proton spectra. Our results indicate that
other particle production mechanisms may also provide
ways to the scaling distributions. Obviously more de-
tailed studies, both theoretically and experimentally, are
needed.
Acknowledgments
This work was supported in part by the National
Natural Science Foundation of China under Grant Nos.
10635020 and 10475032, by the Ministry of Education of
China under Grant No. 306022 and project IRT0624.
[1] S.S. Adler et al., PHENIX Collaboration, Phys. Rev. C
69, 034909 (2004).
[2] D. Molnár and S.A. Voloshin, Phys. Rev. Lett. 91,
092301 (2003); P.Sorensen,STAR Collaboration, J.Phys.
G30, S217 (2004).
[3] See, for example, S.S.Adler,PHENIX Collaboration,
Phys. Rev. Lett. 91, 072301 (2003).
[4] J.D. Bjorken and E.A. Paschos, Phys. Rev. 185, 1975
(1969).
[5] R.C. Hwa and C.B. Yang, Phys. Rev. Lett. 90, 212301
(2003).
[6] L.L. Zhu and C.B. Yang, Phys. ReV. C 75, 044904
(2007).
[7] Z. Koba, H.B. Nielsen and P. Olesen, Nucl. Phys. B 40,
317 (1972).
[8] S.S. Adler et al., PHENIX Collaboration, Phys. Rev.
Lett. 91, 172301 (2003).
[9] B.I. Abelev et al., STAR Collaboration, Phys. Rev. Lett.
97, 152301 (2006).
[10] R. Karabowicz for the BRAHMS Collaboration, talk at
Quark Matter 2005, Budapest, Hungary, Nucl. Phys. A
774, 447 (2006); I. Arsene et al., BRAHMS collaboration,
nucl-ex/0610021.
[11] D.K. Srivastava, C. Gale and R.J. Fries, Phys. Rev. C
67, 034903 (2003).
[12] R.C. Hwa, and C.B. Yang, Phys. Rev. C 67, 034902
(2003).
[13] V. Greco, C.M. Ko, and P. Lévai, Phys. Rev. Lett. 90,
202302 (2003); Phys. Rev. C 68, 034904 (2003).
[14] R.J. Fries, B. Müller, C. Nonaka and S.A. Bass, Phys.
Rev. Lett. 90, 202303 (2003); Phys. Rev. C 68, 044902
(2003).
[15] M.A. Braun, F. del Moral, and C. Pajares, Nucl. Phys.
A 715, 791 (2003).
http://arxiv.org/abs/nucl-ex/0610021
ABSTRACT
  With the experimental data from STAR and PHENIX on the centrality dependence
of the $p_T$ spectra of protons and anti-protons produced at mid-rapidity in
Au+Au collisions at 200 GeV, we show that for protons and anti-protons there
exists a scaling distribution independent of the colliding centrality. The
scaling functions can also describe data from BRAHMS for both proton and
anti-proton spectra at $y=2.2$ and 3.2. The scaling behaviors are shown to be
incompatible with the usual string fragmentation scenario for particle
production.

<|endoftext|><|startoftext|>
Introduction
An important problem in the studies on the SiO maser was that SiO maser sources ever
known were considerably biased. Specifically, the dust (effective) temperature of known
SiO maser sources, which was calculated from mid-infrared flux densities (such as the
IRAS andMSX flux densities), was limited roughly in a range of 250 K. Tdust . 2000 K.
This is because the previous SiO maser surveys have been limited to relatively warm
dust-temperature ranges. Consequently a non-negligible number of potential SiO maser
sources (especially with a low dust-temperature) have been slipped from the previous
SiO maser surveys.
Nyman et al.(1993) first realized the importance of SiO maser sources exhibiting a
low dust-temperature. They investigated how SiO maser emission behaves in a low
dust-temperature range by observing OH/IR stars in the SiO J = 1−0 v = 1&2 and
J =2−1 v = 1 lines. The OH/IR stars often exhibit a low dust-temperature less than
Tdust = 250 K. In their observation cold objects clearly show a larger intensity ratio of the
SiO J = 1−0 v = 2 to v = 1 lines. Both collisional and radiative schemes cannot fully ex-
plain this observational properties of the SiO masers (Bujarrabal 1994; Deol et al. 1995).
Nyman et al.(1993) suggested that an infrared H2O line (116,6 ν2 = 1→127,5 ν2 = 0)
overlapping with the SiO J = 0 v = 1→J = 1 v = 2 transition might play an important
role. However, in early 1990s the number of cold SiO maser sources (like OH/IR stars)
was quite limited, and it was difficult to statistically investigate the relation between
infrared colors and intensity ratios of SiO maser lines.
Nakashima & Deguchi(2003b) recently extended the Nyman’s study by surveying the
SiO maser emission in cold, dusty IRAS sources exhibiting low dust temperature less
http://arxiv.org/abs/0704.1063v1
2 J. Nakashima & S. Deguchi
-0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8
corr: 0.59 A
log(F25/F12)
-0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8
corr: 0.40
log(F25/F12)
-0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8
log(F25/F12)
corr: 0.35
Figure 1. Infrared colors versus intensity ratio of SiO maser lines. The horizontal axes represent
infrared colors. F25 and F12 denote the IRAS flux densities at λ = 25 and 12 µm, respectively.
The filled dots (•), upward triangles (△) and downward triangles (▽) respectively represent the
intensity ratios of the SiO maser lines, lower limits of the ratio and upper limits of the ratio.
Correlation coefficients are given in the upper-left corners of each panel. The dashed lines are
the results of least-square fitting of a first order polynomial.
than 250 K. They found roughly 40 new SiO maser sources in the cold dusty objects,
and in conjunction with the results of another SiO maser survey of relatively warm IRAS
objects (Nakashima & Deguchi 2003a) they clearly demonstrated that the intensity ratio
of the SiO J = 1−0 v = 2 to v = 1 lines increases in inversely proportional to the
dust temperature. Nakashima & Deguchi(2003b) again suggested that the overlap line
of H2O might explain this correlation if the overlap line becomes stronger with decrease
of the dust temperature. To consider further this problems, we need to confirm whether
properties of the SiO lines other than J = 1−0 v = 1 and 2 lines are consistent with
the existence of the H2O overlap line. In this contributed paper we present the result
of quasi-simultaneous observations in the multiple different SiO rotational lines with the
Nobeyama 45m telescope. The main aim of the observation is to check the behavior of
SiO maser intensity ratios including lines other than the J = 1−0 v = 1 and 2 lines.
2. Observations and Results
The observing targets were selected from Nakashima & Deguchi(2003a, b) and the
Nobeyama SiO maser source catalog (Gorny et al. in preparation) in terms of the IRAS
colors and flux densities. The targets are distributed roughly in the right ascension range
between 18h and 22h, because the cold SiO maser sources found by Nakashima & Deguchi(2003b)
are distributed roughly in this range. We selected the observing targets basically in order
of the brightness at λ = 12 µm, but we also paid attention to the source distribution in
the IRAS two-color diagram so that the observing targets continuously cover the entire
color range.
SiO line observations with the Nobeyama 45m telescope were made in two separated
periods: May 11–19, 2004 and February 15–19, 2006. In the first period we observed, in
total, 38 objects. The observed SiO transitions in the first period were J = 1−0 v =1, 2, 3
and J = 2−1 v =1, 2. We also observed in the 29SiO J = 1−0 v =0 and J = 2−1 v =0
lines. In addition, we observed 27 objects in the H2O maser line at 22 GHz (61,6−52,3)
as a backup observation under rainy/heavy cloudy condition. In the second period we
observed, in total, 53 objects. The observed transitions in the second period were SiO
J = 1−0 v =0, 1, 2, 3, 4, 29SiO J = 1−0 v =0 and 30SiO J = 1−0 v =0. The technical
details of the observations will be presented in our future paper (Nakashima & Deguchi,
in preparation).
In this paper, we focus on the properties of the SiO J = 1−0 v = 1, 2 and 3 lines, in
SiO maser intensity ratios 3
-0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8
log(F25/F12)
8 micron flux (MSX)
SiO J=1-0 v=1
-0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8
SiO J=1-0 v=2
log(F
SiO J=1-0 v=3
C1.71+0.26 2.47+0.25
2.12+0.20
Figure 2. [Panels A, B and C]: Relation between infrared colors and absolute intensity of SiO
maser lines. The notation of the infrared colors is same with that used in Figure 1. The intensity
of SiO maser lines is standardized at the distance of 1 kpc. The thick dashed lines represent the
results of least-square-fitting of a first order polynomial. The inclinations of the fitted lines (thick
dashed lines) are given at the lower-right corners of each panel with statistical uncertainty. In
the panel A, only the data points below log(F25/F12) = 0.5 were fitted by the polynomial. The
data points above log(F25/F12) = 0.5 are independently fitted by a first order polynomial, and
the results of the fitting is given as the chain line. [Panel D]: Relation between 8µm absolute
flux density and infrared color. The 8µm flux is standardized at the distance of 1 kpc.
which we detected an enough number of objects for statistical analysis. Figure 1 shows
the relations between infrared colors and intensity ratios among the SiO maser lines. The
line intensities used to calculate the intensity ratios are velocity-integrated intensities.
In the panel A of Figure 1 we can clearly confirm the positive correlation between the
log(F25/F12) color and the intensity ratio of the SiO J = 1−0 v = 2 to v = 1 lines
as reported by Nakashima & Deguchi(2003b). Interestingly, the intensity ratios of the
J = 1−0 v = 3 to v = 2 lines and of the J = 1−0 v = 3 to v = 1 lines seem to also
correlate with the log(F25/F12) color (see, panels B and C) even though the correlation
coefficients are slightly smaller than that of panel A.
Figure 2 shows the relations between infrared colors and absolute intensities of the SiO
maser lines. The intensity of the SiO maser lines is standardized at the distance of 1 kpc
using the luminosity distances. The panels A, B and C of Figure 2 show the relations
between the log(F25/F12) color and the absolute intensity of the SiO J = 1−0 v = 1,
2 and 3 lines. A notable feature seen in these panels is that the SiO maser absolute
intensities undoubtedly correlate with the log(F25/F12) color. Another clear feature is
that the higher the vibrational transitions, the steeper the inclination of the dashed lines
representing the results of least-square-fitting of a first order polynomial. This tendency
is consistent with the correlation seen in Figure 1. In the panel A in Figure 2, the values
of the absolute intensity of SiO maser emission seem to maximize at log(F25/F12) ∼
0.5, and the values tend to decrease with increase of the color in the red region above
log(F25/F12) = 0.5. The log(F25/F12) color of 0.5 corresponds the boundary between
distributions of AGB and post-AGB stars in the log(F25/F12) color. In fact, the panel A
4 J. Nakashima & S. Deguchi
in Figure 1 (and Figure 8 in Nakashima & Deguchi 2003b) shows a sudden change of the
feature at log(F25/F12) ∼ 0.5. No such change is seen in the panels B and C in Figure
2, simply because the SiO J = 1−0 v = 2 and 3 lines have not been detected above
log(F25/F12) = 0.5 in the present observations.
A possible reason for the correlation seen in panels A, B and C in Figure 2 is that
the energy input to the SiO maser region increases with the infrared colors. To confirm
this possibility, in panel D in Figure 2 we plotted the 8µm flux densities as a function of
the infrared colors. The values of the 8µm flux densities were taken from the MSX point
source catalog. If we rely on the radiative scheme the 8µm flux should well represent
the energy input to the SiO maser region, because the λ = 8µm corresponds to the
∆v = 1 SiO transition (e.g., Deguchi & Iguchi 1976). In panel D in Figure 2 the 8µm flux
densities are standardized at the distance of 1kpc using the luminosity distances. The
distribution of the data points seen in panel D is, in fact, strikingly similar with those
seen in panels A, B and C, supporting that the 8µm flux tightly correlates with the SiO
maser intensity as suggested by Bujarrabal et al.(1987).
3. Discussion
In this section we discuss the possible explanation for the correlation between infrared
colors and SiO maser intensity ratios among the v = 1, 2 and 3 lines at 43 GHz. One
possible explanation is to introduce the overlap line of H2O (116,6 ν2 = 1 → 127,5 ν2 = 0),
which has been first suggested by Olofsson et al.(1981) to explain the anomalous, weak
intensity of the SiO J = 2−1 v = 2 line in oxygen-rich (O-rich) stars. This H2O line
overlaps with the SiO J = 0 v = 1→J = 1 v = 2 transition with a velocity difference of
1 km s−1. With this line overlap, the J = 1 v = 2 level is overpopulated, and the weakness
of the SiO J = 2−1 v = 2 line is explained by this overpopulation. The overpopulation
at the J = 1 v = 2 level is also consistent with the strong intensity of the J = 1−0 v = 2
line. Thus, the correlation between the infrared colors and the intensity ratio of the SiO
J = 1−0 v = 2 to v = 1 lines may be explained if this overlap line of H2O becomes
stronger with increase of the infrared colors. One problem in this interpretation is that
the intensity ratios of the SiO J = 1−0 v = 3 to v = 1&2 lines cannot be explained only
by the H2O 116,6 ν2 = 1 → 127,5 ν2 = 0 line. However Cho et al.(2007) recently reported
an interesting detection of the SiO J = 2−1 v = 3 line toward an S-type star, χ Cyg.
They also confirmed that the SiO J = 2−1 v = 3 line is weak in O-rich stars. The S-
type stars have almost same amount of oxygen and carbon atoms in their envelopes, and
consequently they have few H2O molecules in the envelopes. These results potentially
suggest that another overlap line of H2O affects on the population distribution of SiO in
O-rich stars, and Cho et al.(2007) have suggested that the H2O 50,5 ν2 = 2→63,4 ν2 = 1
line overlapping with the SiO J = 0 v = 2→J = 1 v = 3 line (with a velocity difference
of about 1.5 km s−1) acts on the population distribution of SiO. Thus, if both H2O
116,6 ν2 = 1→127,5 ν2 = 0 and 50,5 ν2 = 2→63,4 ν2 = 1 lines becomes stronger with
increase of infrared colors, all correlations between infrared colors and the SiO maser
intensity ratios among the J = 1−0 v = 1, 2 and 3 lines might be explained. The line
intensity of the H2O 50,5 ν2 = 2→63,4 ν2 = 1 line is usually weaker than that of the
116,6 ν2 = 1→127,5 ν2 = 0line. This fact also seems to be consistent with the relatively
weak intensity of the SiO J = 1−0 v = 3 line.
However, there are some other problems on the explanation with the overlap line of
H2O. First, we have to explain how the H2O infrared lines overlapping with the SiO
lines become stronger with increase of infrared colors. The relative abundance of H2O
molecules possibly increases with infrared colors, but this is not conclusive. Second, the
SiO maser intensity ratios 5
correlation between infrared colors and the intensity ratios of the SiO J = 1−0 v = 2 to
v = 1 lines might be explained without the overlap line of H2O. In the envelopes of very
cold objects, strong 8µm emission comes from every direction to the SiO masing region,
causing ineffective pumping through the SiO ∆v = 1 transition. On the other hand, 4µm
emission corresponding to the SiO ∆v = 2 transition is more effectively pump the SiO
population instead of the 8µm. These processes might explain the correlation seen in
Figure 3 (e.g., Doel et al. 1995). Third, a recent theoretical calculation predicted that if
we introduce the overlap line of H2O the spatial distribution of the maser spots cannot
be theoretically reproduced (Soria-Ruiz et al. 2004). Thus, this problem will be remain
controversial for some more time.
4. Summary
In this research we observed 75 known SiO maser sources quasi-simultaneously in the
SiO J = 1−0, v = 0, 1, 2, 3 and 4 lines, SiO J = 2−1 v = 1 and 2, 29SiO J = 1−0 v = 0
and J = 2−1 v = 0, and 30SiO J = 1−0 v = 0 lines. We also observed the targets in the
H2O 61,6−52,3 line under rainy/heavy cloudy condition. The sample continuously covers
a very wide dust-temperature range from 150 K to 2000 K. The correlation between
infrared colors and the intensity ratio of the SiO J = 1−0 v = 2 to v = 1 lines is
confirmed as reported by Nakashima & Deguchi(2003b). The intensity rations of SiO
J = 1−0 v = 3 to v = 1&2 lines possibly correlate with infrared colors. The overlap lines
of H2O might explain the correlations between the infrared colors and the SiO maser
intensity ratios among the J = 1−0 v = 1, 2 and 3 lines, although there are alternative
ways to interpret the phenomena.
Acknowledgements
The present research has been supported by the Academia Sinica Institute of Astron-
omy & Astrophysics in Taiwan.
References
Bujarrabal, V. 1994, A&A 285, 953
Bujarrabal, V., Planesas, P., & del Romero, A. 1987, A&A 175, 164
Cho, S.-H., Lee, C. W., & Park, Y.-S. 2007, ApJ 657, 482
Deguchi, S., & Iguchi, T. 1976, PASJ 28, 307
Doel, R. C., et al. 1995, A&A 302, 797
Nakashima, J., & Deguchi, S. 2003a, PASJ 55, 203
Nakashima, J., & Deguchi, S. 2003b, PASJ 55, 229
Nyman, L.-Å., Hall, P. J., & Le Bertre, T. 1993, A&A 280, 551
Olofsson, H., Rydbeck, O. E. H., Lane, A. P., & Predmore, C. R. 1981, ApJ 247, L81
Olofsson, H., Rydbeck, O. E. H., & Nyman, L.-A. A&A 150, 169
Soria-Ruiz, R., et al. 2004, A&A 426, 131
Discussion
Elitzur: In our recent work, infrared intensity and colors are uniquely determined by
the optical depth of the dust shell. SiO maser intensity is also correlated to the optical
depth. Therefore, what you are finding is somehow related to the effect of the activity of
the photosphere such as the variation of mass loss rates.
Nakashima: Thanks for useful comments. (We took account of Elitzur’s comments in
the text.)
	Introduction
	Observations and Results
	Discussion
	Summary
ABSTRACT
  We present the results of SiO line observations of a sample of known SiO
maser sources covering a wide dust-temperature range. The aim of the present
research is to investigate the causes of the correlation between infrared
colors and SiO maser intensity ratios among different transition lines. We
observed in total 75 SiO maser sources with the Nobeyama 45m telescope
quasi-simultaneously in the SiO J=1-0 v=0, 1, 2, 3, 4 and J=2-1 v=1, 2 lines.
We also observed the sample in the 29SiO J=1-0 v=0 and J=2-1 v=0, and 30SiO
J=1-0 v=0 lines, and the H2O 6(1,6)-5(2,3) line. As reported in previous
papers, we confirmed that the intensity ratios of the SiO J=1-0 v=2 to v=1
lines clearly correlate with infrared colors. In addition, we found possible
correlation between infrared colors and the intensity ratios of the SiO J=1-0
v=3 to v=1&2 lines.

<|endoftext|><|startoftext|>
Excitation Spectrum Gap and Spin-Wave Velocity of XXZ Heisenberg Chains:
Global Renormalization-Group Calculation
Ozan S. Sarıyer1, A. Nihat Berker2−4, and Michael Hinczewski4
1Department of Physics, Istanbul Technical University, Maslak 34469, Istanbul, Turkey,
2College of Sciences and Arts, Koç University, Sarıyer 34450, Istanbul, Turkey,
3Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, U.S.A., and
4Feza Gürsey Research Institute, TÜBİTAK - Bosphorus University, Çengelköy 34684, Istanbul, Turkey
The anisotropic XXZ spin- 1
Heisenberg chain is studied using renormalization-group theory. The
specific heats and nearest-neighbor spin-spin correlations are calculated thoughout the entire tem-
perature and anisotropy ranges in both ferromagnetic and antiferromagnetic regions, obtaining a
global description and quantitative results. We obtain, for all anisotropies, the antiferromagnetic
spin-liquid spin-wave velocity and the Isinglike ferromagnetic excitation spectrum gap, exhibiting
the spin-wave to spinon crossover. A number of characteristics of purely quantum nature are found:
The in-plane interaction sxi s
j + s
induces an antiferromagnetic correlation in the out-of-plane szi
component, at higher temperatures in the antiferromagnetic XXZ chain, dominantly at low temper-
atures in the ferromagnetic XXZ chain, and, in-between, at all temperatures in the XY chain. We
find that the converse effect also occurs in the antiferromagnetic XXZ chain: an antiferromagnetic
szi s
j interaction induces a correlation in the s
component. As another purely quantum effect,
(i) in the antiferromagnet, the value of the specific heat peak is insensitive to anisotropy and the
temperature of the specific heat peak decreases from the isotropic (Heisenberg) with introduction
of either type (Ising or XY) anisotropy; (ii) in complete contrast, in the ferromagnet, the value and
temperature of the specific heat peak increase with either type of anisotropy.
PACS numbers: 67.40.Db, 75.10.Pq, 64.60.Cn, 05.10.Cc
I. INTRODUCTION
The quantum Heisenberg chain, including the possi-
bility of spin-space anisotropy, is the simplest nontrivial
quantum spin system and has thus been widely studied
since the very beginning of the spin concept in quan-
tum mechanics [1, 2, 3]. Interest in this model continued
[4, 5, 6, 7, 8, 9, 10] and redoubled with the exposition
of its richly varied low-temperature behavior [11, 12, 13]
and of its relevance to high-temperature superconductiv-
ity [14, 15, 16, 17, 18]. It has become clear that antifer-
romagnetism and superconductivity are firmly related to
each other, adjoining and overlapping each other.
A large variety of theoretical tools have been employed
in the study of the various isotropic and anisotropic
regimes of the quantum Heisenberg chain, including
finite-systems extrapolation [6, 19], linked-cluster [7] and
dimer-cluster [20] expansions, quantum decimation [21],
decoupled Green’s functions [22], quantum transfer ma-
trix [23, 24], high-temperature series expansion [25], and
numerical evaluation of multiple integrals [26]. The XXZ
Heisenberg chain retains high current interest as a theo-
retical model [27, 28] with direct experimental relevance
[29].
In the present paper, a position-space renormalization-
group method introduced by Suzuki and Takano [30, 31]
for d = 2 dimensions and already applied to a number
of d > 1 systems [30, 31, 32, 33, 34, 35, 36] is used
to compute the spin-spin correlations and the specific
heat of the d = 1 anisotropic quantum XXZ Heisenberg
model, easily resulting in a global description and de-
tailed quantitative information for the entire temperature
and anisotropy ranges including the ferromagnetic and
antiferromagnetic, the spin-liquid and Isinglike regions.
We obtain, for all anisotropies, the antiferromagnetic
spin-liquid spin-wave velocity and the Isinglike ferromag-
netic excitation spectrum gap, exhibiting the spin-wave
to spinon crossover. A number of other characteristics of
purely quantum nature are found: The in-plane interac-
tion sxi s
j induces an antiferromagnetic correlation
in the out-of-plane szi component, at higher temperatures
in the antiferromagnetic XXZ chain, dominantly at low
temperatures in the ferromagnetic XXZ chain, and, in-
between, at all temperatures in the XY chain. We find
that the converse effect also occurs in the antiferromag-
netic XXZ chain: an antiferromagnetic szi s
j interaction
induces a correlation in the s
i component. As another
purely quantum effect, (i) in the antiferromagnet, the
value of the specific heat peak is insensitive to anisotropy
and the temperature of the specific heat peak decreases
from the isotropic (Heisenberg) with introduction of ei-
ther type (Ising or XY) anisotropy; (ii) in complete con-
trast, in the ferromagnet, the value of the specific heat
peak is strongly dependent on anisotropy and the tem-
perature of the specific heat peak increases with either
type of anisotropy. This purely quantum effect is a pre-
cursor to different phase transition temperatures in three
dimensions [32, 36, 37, 38]. Our calculational method
is relatively simple, readily yields global results, and is
overall quantitatively successful.
http://arxiv.org/abs/0704.1064v2
II. THE ANISOTROPIC QUANTUM
HEISENBERG MODEL AND THE
RENORMALIZATION-GROUP METHOD
A. The Anisotropic Quantum Heisenberg Model
The spin- 1
anisotropic Heisenberg model (XXZ model)
is defined by the dimensionless Hamiltonian
−βH =
sxi s
j + s
+ Jzs
, (1)
where β = 1/kBT and 〈ij〉 denotes summation over
nearest-neighbor pairs of sites. Here the sui are the quan-
tum mechanical Pauli spin operators at site i. The ad-
ditive constant G is generated by the renormalization-
group transformation and is used in the calculation of
thermodynamic functions. The anisotropy coefficient is
R = Jz/Jxy. The model reduces to the isotropic Heisen-
berg model (XXX model) for |R| = 1, to the XY model
for R = 0, and to the Ising model for |R| → ∞.
B. Renormalization-Group Recursion Relations
The Hamiltonian in Eq.(1) can be rewritten as
− βH =
{−βH(i, i+ 1)} . (2)
where βH(i, i+ 1) is a Hamiltonian involving sites i and
i + 1 only. The renormalization-group procedure, which
eliminates half of the degrees of freedom and keeps the
partition function unchanged, is done approximately [30,
Trodde
−βH =Trodde
{−βH(i,i+1)} (3)
=Trodde
{−βH(i−1,i)−βH(i,i+1)}
{−βH(i−1,i)−βH(i,i+1)}
′H′(i−1,i+1) ≃ e
i {−β′H′(i−1,i+1)} = e−β
′H′ .
Here and throughout this paper, the primes are used for
the renormalized system. Thus, at each successive length
scale, we ignore the non-commutativity of the operators
beyond three consecutive sites, in the two steps indicated
by ≃ in the above equation. Since the approximations
are applied in opposite directions, one can expect some
mutual compensation. Earlier studies [30, 31, 33, 34,
35] have been successful in obtaining finite-temperature
behavior on a variety of quantum systems.
The transformation above is summarized by
p s ms Two-site basis eigenstates
+ 1 1 |φ1〉 = | ↑↑〉
0 |φ2〉 =
{| ↑↓〉+ | ↓↑〉}
− 0 0 |φ4〉 =
{| ↑↓〉 − | ↓↑〉}
TABLE I: The two-site basis eigenstates that appear in
Eq.(8). These are the well-known singlet and triplet states.
The state |φ3〉 is obtained by spin reversal from |φ1〉, with the
same eigenvalue.
′H′(i,k) = Trj e
{−βH(i,j)−βH(j,k)}, (4)
where i, j, k are three successive sites. The operator
−β′H′(i, k) acts on two-site states, while the operator
−βH(i, j)−βH(j, k) acts on three-site states, so that we
can rewrite Eq.(4) in the matrix form,
〈uivk|e−β
′H′(i,k)|ūiv̄k〉 =
〈uiwj vk|e−βH(i,j)−βH(j,k)|ūiwj v̄k〉 , (5)
where state variables u, v, w, ū, v̄ can take spin-up or spin-
down values at each site. The unrenormalized 8× 8 ma-
trix on the right-hand side is contracted into the renor-
malized 4× 4 matrix on the left-hand side of Eq.(5). We
use two-site basis states vectors {|φp〉} and three-site ba-
sis states vectors {|ψq〉} to diagonalize the matrices in
Eq.(5). The states {|φp〉}, given in Table I, are eigen-
states of parity, total spin magnitude, and total spin z-
component. These {|φp〉} diagonalize the renormalized
matrix, with eigenvalues
J ′z +G
′, Λ2 = +
J ′xy −
J ′z +G
Λ4 = −
J ′zxy −
J ′z +G
′. (6)
The states {|ψq〉}, given in Table II, are eigenstates of
parity and total spin z-component. The {|ψp〉} diagonal-
ize the unrenormalized matrix, with eigenvalues
Jz + 2G, λ4 = 2G, (7)
λ2 = −
8J2xy + J
+ 2G,
λ3 = −
8J2xy + J
+ 2G.
With these eigenstates, Eq.(5) is rewritten as
γp ≡ 〈φp|e−β
′H′(i,k)|φp〉 =
u,v,ū,
v̄,w,q
〈φp|uivk〉〈uiwjvk|ψq〉·
〈ψq|e−βH(i,j)−βH(j,k)|ψq〉〈ψq |ūiwj v̄k〉〈ūiv̄k|φp〉 . (8)
p ms Three-site basis eigenstates
+ 3/2 |ψ1〉 = | ↑↑↑〉
1/2 |ψ2〉 = µ{| ↑↑↓〉 + σ| ↑↓↑〉 + | ↓↑↑〉}
|ψ3〉 = ν{−| ↑↑↓〉 + τ | ↑↓↑〉 − | ↓↑↑〉}
− 1/2 |ψ4〉 =
{| ↑↑↓〉 − | ↓↑↑〉}
TABLE II: The three-site basis eigenstates that appear in
Eq.(8) with coefficients σ = (−Jz +
8J2xy + J
z )/2Jxy , τ =
(Jz +
8J2xy + J
z )/2Jxy and normalization factors µ, ν. The
states |ψ5−8〉 are obtained by spin reversal from |ψ1−4〉, with
the same respective eigenvalues.
Thus, there are three independent γp that determine the
renormalized Hamiltonian and, therefore, three renor-
malized interactions in the Hamiltonian closed under
renormalization-group transformation, Eq.(1). These γp
γ1 = e
J′z+G
= e2G−
Jz + cosh
8J2xy + J
Jz sinh
8J2xy + J
8J2xy + J
γ2 = e
J′xy−
J′z+G
=2e2G−
8J2xy + J
Jz sinh
8J2xy + J
8J2xy + J
γ4 = e
+G′ = 2e2G , (9)
which yield the recursion relations
J ′xy = ln
, J ′z = ln
, G′ =
γ21γ2γ4
As expected, J ′xy and J
z are independent of the additive
constant G and the derivative ∂GG
′ = bd = 2, where b =
2 is the rescaling factor and d = 1 is the dimensionality
of the lattice.
For Jxy = Jz, the recursion relations reduce to the spin-
isotropic Heisenberg (XXX) model recursion relations,
while for Jxy = 0 they reduce to spin-
Ising model re-
cursion relations. The Jz = 0 subspace (XY model) is
not (and need not be) closed under these recursion rela-
tions [30, 31]: The renormalization-group transformation
induces a positive Jz value, but the spin-space easy-plane
aspect is maintained.
In addition, there exists a mirror symmetry along
the Jz-axis, so that J
xy (−Jxy, Jz) = J ′xy (Jxy, Jz) and
J ′z (−Jxy, Jz) = J ′z (Jxy, Jz). The thermodynamics of
the system remains unchanged under flipping the in-
teractions of the x and y spin components, since the
renormalization-group trajectories do not change. In
fact, this is part of a more general symmetry of the XYZ
model, where flipping the signs of any two interactions
leaves the spectrum unchanged [8]. Therefore, with no
loss of generality, we take Jxy > 0. Independent of the
sign of Jxy, Jz > 0 gives the ferromagnetic model and
Jz < 0 gives the antiferromagnetic model.
C. Calculation of Densities and Response
Functions by the Recursion-Matrix Method
Just as the interaction constants of two consecutive
points along the renormalization-group trajectory are re-
lated by the recursion relations, the densities are con-
nected by a recursion matrix T̂ , which is composed of
derivatives of the recursion relations. For our Hamilto-
nian, the recursion matrix and density vector ~M are
∂J′xy
szi s
. (11)
These are densities Mα associated with each interaction
∂ lnZ
, (12)
where Nα is the number of α-type interactions and Z
is the partition function for the system, which can be
expressed both via the unrenormalized interaction con-
stants as Z( ~K) or via the renormalized interaction con-
stants as Z( ~K ′). By using these two equivalent forms,
one can formulate the density recursion relation [39]
Mα = b
M ′βTβα , Tβα ≡
∂K ′β
. (13)
Since the interaction constants, under renormalization-
group transformation, stay the same at fixed points such
as critical fixed points or sinks, the above Eq.(13) takes
the form of a solvable eigenvalue equation,
bd ~M∗ = ~M∗ · T̂ , (14)
at fixed points, where ~M = ~M ′ = ~M∗. The fixed point
densities are the components of the left eigenvector of the
recursion matrix with left eigenvalue bd [39]. At ordinary
points, Eq.(13) is iterated until a sink point is reached
under successive renormalization-group transformations.
In algebraic form, this means
~M (0) = b−nd ~M (n)T̂ (n)T̂ (n−1) · · · T̂ (1) , (15)
where the upper indices indicate the number of iteration
(transformation), with ~M (n) ≃ ~M∗.
This method is applied on our model Hamiltonian.
The sink of the system is at infinite temperature J∗xy =
J∗z = 0 for all initial conditions (Jxy, Jz).
Response functions are calculated by differentiation
of densities. For example, the internal energy is U =
szi s
, employing T = 1/Jxy, and U =
szi s
, employing T = 1/|Jz|. The spe-
cific heat C = ∂TU follows from the chain rule,
C = J2xy
szi s
, for T = 1/Jxy,
C = J2z
szi s
∂|Jz|
, for T = 1/|Jz|.
III. CORRELATIONS SCANNED WITH
RESPECT TO ANISOTROPY
The ground-state and excitation properties of the XXZ
model offer a variety of behaviors [11, 12, 40, 41]: The
antiferromagnetic model with R < −1 is Isinglike and
the ground state has Néel long-range order along the z
spin component with a gap in the excitation spectrum.
For −1 ≤ R ≤ 1, the system is a ”spin liquid”, with a
gapless spectrum and power-law decay of correlations at
zero temperature. The ferromagnetic model with R > 1
is also Isinglike, the ground state is ferromagnetic along
the z spin component, with an excitation gap.
Our calculated
szi s
sxi s
nearest-neighbor spin-spin correlations for the
whole range of the anisotropy coefficient R are shown
in Fig.1, for various temperatures. The xy correlation is
always non-negative. Recall that we use Jxy > 0 with
no loss of generality. In the Isinglike antiferromagnetic
(R < −1) region, the z correlation is expectedly anti-
ferromagnetic. As the
szi s
correlation saturates for
large |R|, the transverse
correlation is some-
what depleted. In the Isinglike ferromagnetic (R > 1)
region, the
szi s
correlation is ferromagnetic, saturates
quickly as the
correlation quickly goes to zero.
In the spin-liquid (|R| < 1) region, the
szi s
correlation
monotonically passes through zero in the feromagnetic
side, while the
correlation is maximal. The re-
markable quantum behavior of
szi s
around R = 0 is
discussed in Sec.V below. It is seen in the figure that
these changeovers are increasingly sharp as temperature
is decreased and, at zero temperature, become discon-
tinuous at R = 1. As seen in Fig.1(b), at zero temper-
ature, our calculated
szi s
correlations
FIG. 1: (a) Calculated nearest-neighbor spin-spin correlations
szi s
(thick curves from lower left) and
(thin curves
from upper left) as a function of anisotropy coefficient R for
temperatures 1/Jxy = 0, 0.1, 0.2, 0.4, 0.8. (b) Calculated zero-
temperature nearest-neighbor spin-spin correlations (thin and
thick curves, as in the upper panel) compared with the exact
points of Ref.[4, 40, 42, 43, 44] shown with filled and open
symbols for
szi s
respectively. At R = 1, the
calculated
szi s
discontinuously goes from antiferromagnetic
to the exact result of 0.25 [40] of saturated ferromagnetism
and the calculated
discontinuously goes from ferro-
magnetic to the exact result of constant zero [40].
show very good agreement with the known exact points
[4, 42, 43, 44]. Also, our results for R > 1 fully overlap
the exact results of
szi s
= 0.25 and
= 0 [40].
We also note that zero-temperature is the limit in which
our approximation is at its worst.
FIG. 2: Calculated nearest-neighbor spin-spin corre-
lations
(upper panels) and
szi s
(lower
panels) for the antiferromagnetic XXZ chain, as a
function of temperature, for anisotropy coefficients
R = 0,−0.25,−0.50,−0.75,−1,−2,−4,−8,−∞ spanning the
spin-liquid (left panels) and Isinglike (right panels) regions.
Note that, in every one of the panels, the correlation curves
cross each other. This remarkable quantum phenomenon is
discussed in the text.
IV. ANTIFERROMAGNETIC XXZ CHAIN
For the antiferromagnetic XXZ chain, our calculated
szi s
nearest-neighbor spin-spin correla-
tions as a function of temperature are shown in Fig.2
for various anisotropy coefficients R. We find that when
Jxy is the dominant interaction (spin liquid), the corre-
lations are weakly dependent on anisotropy R. When Jz
is the dominant interaction (Isinglike), the correlations
are weakly dependent on anisotropy R only at the higher
temperatures. Our results are compared with multiple-
integral results [26] in Fig.3.
In every one of the panels of Fig.2, the correlation
curves cross each other, revealing a remarkable quantum
phenomenon. In a classical system, the correlation be-
tween a given spin component (e.g.,
) is expected
to decrease when the coupling of another spin component
(e.g., |Jz |) is increased. It is found from the antiferromag-
netic XXZ chain in Fig.2 that the opposite may occur
in a quantum system: In this figure, an increase in Jxy
causes an increase in |
szi s
| for 1/|Jz| > 0.9 and 0.4
in the spin-liquid and Isinglike regions respectively. Con-
versely, an increase in |Jz| causes an increase in
for 1/Jxy > 0.4 and 2.1 in the spin-liquid and Isinglike
regions respectively. This quantum effect can be called
cross-component spin correlation.
FIG. 3: Comparison of our results (thick lines) for the cor-
relation functions of the antiferromagnetic XXZ chain, with
the multiple-integral results of Ref.[26] (thin lines), for var-
ious anisotropy coefficients R spanning the spin-liquid and
Isinglike regions.
The antiferromagnetic specific heats calculated with
Eq.(16) are shown in Fig.4 for various anisotropy coeffi-
cients and compared, in Figs.5, 6, with finite-lattice ex-
pansion [6, 19], quantum decimation [21], transfer matrix
[24], high-temperature series expansion [25] results and,
for the R = 0 case, namely the XY model, with the exact
result [5] C = (1/4πT )
cosω/ cosh
dω. The
C(T ) peak temperature is highest for the isotropic case
(Heisenberg) and decreases with anisotropy increasing in
either direction (towards Ising or XY). The peak value
of C(T ) is only weakly dependent on anisotropy, espe-
cially for the Isinglike systems. A strong contrast to this
behavior will be seen, as another quantum mechanical
phenomenon, in the ferromagnetic XXZ chain.
The linearity, at low temperatures, of the spin liq-
uid (|R| ≤ 1) specific heat with respect to temperature
is expected on the basis of spin-wave calculations for
the antiferromagnetic XXZ model [45, 46]. This linear
form of C(T ) reflects the linear energy-momentum dis-
persion of the low-lying excitations, the magnons. The
low-temperature magnon dispersion relation is ~ω = ckn,
where c is the spin-wave velocity and n = 1 for the an-
tiferromagnetic XXZ model in d = 1 [40]. The internal
energy, given by U = (1/2π)
dk~ω(k)/(eβ~ω(k)−1), is
dominated by the magnons at low temperatures, yielding
U ∼ T 2 and C ∼ T for n = 1 in the dispersion relation.
From this relation, our calculated spin-wave velocity c as
a function of anisotropy R is given in Fig.7 and compares
well with the also shown exact result [47]. A simultaneous
fit to the dispersion relation exponent n, expected to be
1, yields 1.00± 0.02. However, for the Isinglike −R > 1,
the unexpected linearity instead of an exponential form
FIG. 4: Calculated specific heats C of the antiferromagnetic
XXZ chain, as a function of temperature for anisotropy coeffi-
cients R = 0,−0.25,−0.50,−0.75,−1,−2,−4,−8,−∞ span-
ning the spin-liquid (upper panel) and Isinglike (lower panel)
regions.
caused by a gap in the excitation spectrum, points to the
approximate nature of our renormalization-group calcu-
lation. The correct exponential form is obtained in the
large −R limit, where the renormalization-group calcu-
lation becomes exact.
Rojas et al. [25] have obtained the high-temperature
expansion of the free energy of the XXZ chain to order
β3, where β is the inverse temperature. The specific heat
from this expansion is
2 +R2
J2xy −
J3xy +
6− 8R2 −R4
J4xy. (17)
This high-temperature specific heat result is also com-
pared with our results, in Fig.6, and very good agreement
is seen. In fact, when in the high-temperature region of
0 < β < 0.1, we fit our numerical results for C(β) to the
fourth degree polynomial C = Σ4i=0Aiβ
i, and we do find
(1) the vanishing A0 < 10
−5 and A1 < 10
−7 for all R and
(2) the comparison in Fig.8 between our results for A2
and A3 and those of Eq.(17) from Ref.[25], thus obtaining
excellent agreement for all regions of the model.
V. FERROMAGNETIC XXZ CHAIN
For the ferromagnetic (i.e., R > 0) systems in Fig.1,
szi s
expectation value becomes rapidly negative
at lower temperatures for R < 1, even though for
R ≥ 0 all couplings in the Hamiltonian are ferromag-
FIG. 5: Comparison of our antiferromagnetic specific heat re-
sults (thick lines) with the results of Refs.[5] (open circles),
[6] (dotted), [19] (thin lines), [21] (dash-dotted), and [23, 24]
(dashed), for anisotropy coefficients R = 0,−0.5,−1,−2 span-
ning the spin-liquid and Isinglike regions.
FIG. 6: Comparison of our antiferromagnetic specific heat
results (thick lines) with the high-temperature J → 0 behav-
iors (thin lines) obtained from series expansion in Ref.[25],
for anisotropy coefficients R = 0,−0.50,−0.75,−1,−2,−∞
spanning the spin-liquid and Isinglike regions.
netic. This is actually a real physical effect, not a nu-
merical anomaly. In fact, we know the spin-spin corre-
lations for the ground state of the one-dimensional XY
model (the R = 0 case of our Hamiltonian), and we can
compare our low-temperature results with these exact
values. The ground-state properties of the spin- 1
model are studied by making a Jordan-Wigner trans-
FIG. 7: Our calculated antiferromagnetic spin-wave veloc-
ity c versus the anisotropy coefficient R. The dashed line,
2π sin(γ)/γ where γ ≡ cos−1(−R), is the exact result [47].
FIG. 8: Comparison of our results with the high-temperature
expansion of Ref. [25] for all regions: antiferromagnetic (outer
panels) and ferromagnetic (inner panels), spin-liquid (left
panels) and Isinglike (right panels). Triangles and circles de-
note our results, while solid and dashed lines denote the re-
sults of Ref.[25] for A2 and A3, respectively. The error bars,
due to the statistical fitting procedure of the coefficients A2
and A3, have half-heights of 1.7×10
−4 and 2.6×10−3 respec-
tively.
formation, yielding a theory of non-interacting spinless
fermions. Analysis of this theory yields the exact zero-
temperature nearest-neighbor spin-spin correlations [4]
shown in Table III. Our renormalization-group results
in the zero-temperature limit, also shown in this table,
compare quite well with the exact results, as with the
other exact points in Fig.1(b), although in the worst re-
gion for our approximation. Finally, by continuity, it is
reasonable that for a range of R positive but less than
one, the z component correlation function is as we find,
intriguingly but correctly negative at low temperatures.
Thus, the interaction sxi s
j (irrespective of its sign,
due to the symmetry mentioned at the end of Sec.IIB)
induces an antiferromagnetic correlation in the szi com-
ponent, competing with the szi s
j interaction when the
FIG. 9: Calculated nearest-neighbor spin-spin correlations
(upper panels) and
szi s
(lower panels) for the
ferromagnetic XXZ chain, as a function of temperature,
for anisotropy coefficients R = 0, 0.25, 0.50, 0.75, 1, 2, 4, 8,∞
spanning the spin-liquid (left panels) and Isinglike (right pan-
els) regions.
FIG. 10: Left panel: Calculated nearest-neighbor spin-
spin correlations
szi s
for the ferromagnetic XXZ chain,
as a function of temperature 1/Jxy in the spin liquid, for
anisotropy coefficients R = 0, 0.25, 0.50, 0.75, 1. Right panel:
The sign-reversal temperature T0 of the nearest-neighbor cor-
relation 〈szi s
j 〉: our results (full curve) and the analytical re-
sult from the quantum transfer matrix method (dashed) [23].
latter is ferromagnetic.
For finite temperatures, our calculated nearest-
neighbor spin-spin correlations are shown in Figs.9, 10,
for different values of R. These results are compared with
Green’s function calculations [22] in Fig.11. As expected
from the discussion at the beginning of this section, in
the spin-liquid region, the correlation 〈szi szj 〉 is negative
at low temperatures. Thus, a competition occurs in the
FIG. 11: Comparison of our ferromagnetic R = 1, 5
results
with Green’s function calculations [22] .
FIG. 12: Calculated specific heats C of the ferromagnetic
XXZ chain, as a function of temperature for anisotropy coef-
ficients R = 0, 0.25, 0.50, 0.75, 1, 2, 4, 8,∞ spanning the spin-
liquid (upper panel) and Isinglike (lower panel) regions.
correlation 〈szi szj 〉 between the XY-induced antiferromag-
netism and the ferromagnetism due to the direct cou-
pling between the sz spin components. In fact, the rein-
forcement of antiferromagnetic correlations of 〈szi szj 〉 by
increasing Jxy (and also its converse) was seen in the
antiferromagnetic XXZ chain discussed in the previous
section. Thus, we see that whereas this cross-component
effect is dominant at low temperatures in the ferromag-
netic XXZ chain, it is seen at higher temperatures in the
antiferromagnetic XXZ chain and, in-between, through-
out the temperature range in the XY chain.
In the ferromagnetic XXZ chain, as a consequence
of the competition mentioned above, a sign reversal in
〈szi szj 〉 occurs from negative to positive correlation, at
temperatures J−1xy = T0(R).[48] At this temperature, by
cancelation of the competing effects, the nearest-neighbor
FIG. 13: Comparison of our ferromagnetic specific heat re-
sults (thick lines) with the high-temperature J → 0 behaviors
(thin lines) obtained from series expansion [25], for anisotropy
coefficients R = 0.25, 0.50, 0.75, 1, 2,∞ spanning the spin-
liquid and Isinglike regions.
Zero-temperature
correlations of the
spin- 1
XY chain
Exact values
from Ref. [4]
results
〉 0.15915 0.17678
〈szi s
j 〉 −0.10132 −0.12500
TABLE III: Zero-temperature nearest-neighbor correlations
of the spin- 1
XY chain.
correlation 〈szi szj 〉 is zero. Our calculated T0(R) curve is
shown in Fig.10, and has very good agreement with the
exact result T0 = (
3 sin γ/4γ) tan[π(π − γ)/2γ] where
γ ≡ cos−1(−R) [23].
The calculated ferromagnetic specific heats are shown
in Fig.12 for various anisotropy coefficients and com-
pared, in Figs.13, 14, with finite-lattice expansion [6],
quantum decimation [21], decoupled Green’s functions
[22], transfer matrix [23, 24], high-temperature se-
ries expansion [25] results and, for the R = 0 case,
namely the XY model, with the exact result [5] C =
(1/4πT )
cosω/ cosh
dω. In sharp contrast
to the antiferromagnetic case in Sec.IV, the peak C(T )
temperature is highest for the most anisotropic cases
(XY or Ising) and decreases with anisotropy decreas-
ing from either direction (towards Heisenberg). In the
same contrast, the peak value of C(T ) is dependent
on anisotropy, decreasing, eventually to a flat curve, as
anisotropy is decreased. This contrast between the fer-
romagnetic and antiferromagnetic systems is a purely
quantum phenomenon. Specifically, the marked contrast
between the specific heats of the isotropic antiferromag-
netic and ferromagnetic systems, seen in the full curves
of Figs.4 and 12 respectively, translates into the different
critical temperatures of the respective three-dimensional
systems.[32, 36, 37, 38] Classical ferromagnetic and anti-
FIG. 14: Comparison of our ferromagnetic specific heat re-
sults (thick lines) with the results of Refs.[5] (dash-double-
dotted), [6] (dotted), [21] (dash-dotted), [22] (open cir-
cles), and [23, 24] (dashed), for anisotropy coefficients R =
0, 0.5, 1, 5
, 2, 5 spanning the spin-liquid and Isinglike re-
gions.
ferromagnetic systems are, on the other hand, identically
mapped onto each other.
The low-temperature specifics heats are discussed in
detail and compared to other results in Sec.VI.
VI. LOW-TEMPERATURE SPECIFIC HEATS
Properties of the low-temperature specific heat of
the ferromagnetic XXZ chain have been derived from
the thermodynamic Bethe-ansatz equations [40]. For
anisotropy coefficient |R| ≤ 1, the model is gapless
[11, 12] and, except at R = 1, the specific heat is lin-
ear in T = J−1xy in the zero-temperature limit, C/T =
2γ/(3 sinγ) where again γ ≡ cos−1(−R). Note that
this result contradicts the spin-wave theory prediction of
C ∼ T 1/2 for the ferromagnetic chain (n = 2 for the fer-
romagnetic magnon dispersion relation of the kind given
above in Sec.IV). The spin-wave result is valid only for
R = 1, the isotropic Heisenberg case. From the expres-
sion given above, we see that C/T diverges as R → 1−,
and at exactly R = 1 it has been shown that C ∼ T 1/2
[40].
In the Isinglike region R > 1, the system exhibits a
FIG. 15: The calculated excitation spectrum gap ∆ versus
anisotropy.
FIG. 16: Calculated specific heat coefficient C/T as a function
of anisotropy R, for T = 0.10, 0.05, 10−10 .
gap in its excitation spectrum and the specific heat be-
haves as C ∼ T−a exp(−∆/T ), with ∆ being the excita-
tion spectrum gap [11, 12, 40]. There exist two gaps for
the energy, called the spinon gap and the spin-wave gap,
given by ∆spinon =
1−R−2 and ∆spinwave = 1−R−1.
These are the minimal energies of elementary excita-
tions [10, 40]. A crossover between them occurs atR = 5
below this value, the spinon gap is lower, while above
this value the spin-wave gap is lower. We have double-
fitted our calculated specific heats with respect to the
gap ∆ and the leading exponent a, for the entire range
of anisotropy R between 0 < R−1 < 1 (Fig.15). Our
calculated gap ∆ behaves linearly in R−1 for R−1 close
to 1, and crosses over to 1/2 at R−1 = 0, as expected.
We also obtain the exponent a = 1.99± 0.02 in the Ising
FIG. 17: Calculated specific heat coefficient C/T as a func-
tion of temperature for anisotropy coefficient R = −5 (thin
grey), R = −2 (thick grey), -1 (dotted), -0.5 (dash-dotted),
0.5 (dashed), and 2 (thin black).
limit R−1 ≤ 0.2 and a = 1.52 ± 0.10 in the Heisenberg
limit R−1 ≥ 0.9. These exponent values are respectively
expected to be 2 and 1.5 [9, 10].
We now turn to the discussion of our specific heat re-
sults for the entire ferromagnetic and antiferromagnetic
ranges. Our calculated C/T curves are plotted as a func-
tion of anisotropy and temperature in Figs.16 and 17
respectively. We discuss each region of the anisotropy R
separately:
(i) R > 1 : The specific heat coefficient C/T vanishes in
the T → 0 limit and has the expected exponential form as
discussed above in this section. The spin-wave to spinon
excitation gap crossover is obtained.
(ii) R ≈ 1 : The double-peak structure of C/T in Fig.16
is centered at R = 1. As temperature goes to zero, the
peaks narrow and diverge.
(iii) −1 ≤ R < 1 : The specific heat coefficient is
C/T = 2γ/(3 sinγ) in this region [11, 40], and our calcu-
lated specific heat is indeed linear at low temperatures.
The C/T curves for R = −1,−0.5, 0.5 in Fig.17 all ex-
trapolate to nonzero limits at T = 0. The spin-wave
dispersion relation exponent and velocity, for the antifer-
romagnetic system, is correctly obtained for the isotropic
case and for all anisotropies, as seen in Fig.7. Fig.18
directly compares C/T = 2γ/(3 sinγ) with our results:
The curves have the same basic form, gradually rising
from R = −1, with a sharp divergence as R nears 1.
At R = 1+, we expect C/T = 0. Our T = 10−10
curve diverges at R = 1 and indeed returns to zero at
R = 1.0000001.
(iv) R < −1 : We expect a vanishing C/T , which we do
find as seen in Fig.16 and in the insets of Fig.17. The
exponential behavior of the specific heat is clearly seen
in the Ising limit.
VII. CONCLUSION
A detailed global renormalization-group solution of
the XXZ Heisenberg chain, for all temperatures and
anisotropies, for both ferromagnetic and antiferromag-
netic couplings, has been obtained. In the spin-liquid
region, the linear low-temperature specific heat and, for
the antiferromagnetic chain, the spin-wave dispersion re-
lation exponent n and velocity c have been obtained. In
the Isinglike region, the spin-wave to spinon crossover of
the excitation spectrum gap of the ferromagnetic chain
has been obtained from the exponential specific heat,
as well as the correct leading algebraic behaviors in the
Heisenberg and Ising limits. Purely quantum mechani-
cal effects have been seen: We find that the xy corre-
lations and the antiferromagnetic z correlations mutu-
FIG. 18: Calculated specific heat coefficient C/T as a func-
tion of anisotropy coefficient R in the spin-liquid region,
−1 ≤ R ≤ 1, at constant temperature T = 10−10 . Our
renormalization-group result (grey curve) is compared to the
zero-temperature Bethe-Ansatz result (black curve). Inset:
our calculation (grey curve) at constant T = 10−2 is again
compared to the zero-temperature Bethe-Ansatz result (black
curve).
ally reinforce each other, for different ranges of temper-
atures and anisotropies, in ferromagnetic, antiferromag-
netic, and XY systems. The behaviors, with respect to
anisotropy, of the specific heat peak values and locations
are opposite in the ferromagnetic and antiferromagnetic
systems. The sharp contrast found in the specific heats
of the isotropic ferromagnetic and antiferromagnetic sys-
tems is a harbinger of the different critical temperatures
in the respective three-dimensional systems. When com-
pared with existing calculations in the various regions of
the global model, good quantitative agreement is seen.
Even at zero temperature, where our approximation is at
its worst, good quantitative agreement is seen with exact
data points for the correlation functions (Fig.1(b)), which
we extend to all values of the anisotropy. Finally, the rel-
ative ease with which the Suzuki-Takano decimation pro-
cedure is globally and quantitatively implemented should
be noted.
Acknowledgments
This research was supported by the Scientific and
Technological Research Council (TÜBİTAK) and by the
Academy of Sciences of Turkey. One of us (O.S.S.) grate-
fully acknowledges a scholarship from the Turkish Sci-
entific and Technological Research Council - Scientist
Training Group (TÜBİTAK-BAYG).
[1] F. Bloch, Z. Phys. 61, 206 (1930).
[2] H. Bethe, Z. Phys. 71, 205 (1931).
[3] L. Hulthén, Ark. Mat. Astron. Fys. A 26, 1 (1938).
[4] E. Lieb, T. Schultz, and D. Mattis, Ann. of Phys. 16,
407 (1961).
[5] S. Katsura, Phys. Rev. 127, 1508 (1962).
[6] J.C. Bonner and M.E. Fisher, Phys. Rev. 135, A640
(1964).
[7] S. Inawashiro and S. Katsura, Phys. Rev. 140, A892
(1965).
[8] C.N. Yang and C.P. Yang, Phys. Rev. 147, 303 (1966).
[9] J.D. Johnson and J.C. Bonner, Phys. Rev. Lett. 44, 616
(1980).
[10] J.D. Johnson and J.C. Bonner, Phys. Rev. B 22, 251
(1980).
[11] F.D.M. Haldane, Phys. Rev. Lett. 45, 1358 (1980).
[12] F.D.M. Haldane, Phys. Rev. B 25, 4925 (1982).
[13] F.D.M. Haldane, Phys. Lett. A 93, 464 (1983).
[14] J.G. Bednorz and K.A. Müller, Z. Phys. B 64, 189 (1986).
[15] E. Manousakis, Rev. Mod. Phys. 63, 1 (1991) and refer-
ences therein.
[16] B. Keimer, N. Belk, R.J. Birgeneau, A. Cassanho, C.Y.
Chen, M. Greven, M.A. Kastner, A. Aharony, Y. Endoh,
R.W. Erwin, and G. Shirane, Phys. Rev. B 46, 14034
(1992).
[17] M. Greven, R.J. Birgeneau, Y. Endoh, M.A. Kastner, M.
Matsuda, and G. Shirane, Z. Phys. B 96, 465 (1995).
[18] R.J. Birgeneau, A. Aharony, N.R. Belk, F.C. Chou, Y.
Endoh, M. Greven, S. Hosoya, M.A. Kastner, C.H. Lee,
Y.S. Lee, G. Shirane, S. Wakimoto, B.O. Wells, and K.
Yamada, J. Phys. Chem. Solids 56, 1913 (1995).
[19] R. Narayanan and R.R.P. Singh, Phys. Rev. B 42, 10305
(1990).
[20] M. Karbach, K.H. Mütter, P. Ueberholz, and H. Kröger,
Phys. Rev. B 48, 13666 (1993).
[21] C. Xi-Yao and G.F. Tuthill, Phys. Rev. B 32, 7280
(1985).
[22] W.J. Zhang, J.L. Shen, J.H. Xu, and C.S. Ting, Phys.
Rev. B 51, 2950 (1995).
[23] K. Fabricius, A. Klümper, and B.M. McCoy, Competi-
tion of Ferromagnetic and Antiferromagnetic Order in
the Spin-1/2 XXZ Chain at Finite Temperature, in Sta-
tistical Physics on the Eve of the 21st Century, M.T.
Batchelor and L.T. Wille, eds., p.351 (World Scientific,
Singapore 1999); arXiv: cond-mat/9810278.
[24] A. Klümper, Integrability of Quantum Chains: Theory
and Applications to the Spin-1/2 XXZ Chain, Lecture
Notes in Physics 645, 349 (Springer, Berlin-Heidelberg
2004); arXiv: cond-mat/0502431
[25] O. Rojas, S.M. de Souza, and M.T. Thomaz, J. Math.
Phys. 43, 1390 (2002).
[26] M. Bortz and F. Göhman, Eur. Phys. J. B 46, 399 (2005).
[27] J. Damerau, F. Göhman, N.P. Hasenclever, and A.
Klümper, J. Phys. A 40, 4439 (2007).
[28] A.Y. Hu, Y. Chen, and L.J. Peng, J. Magnetism and
Magnet. Mat. 313, 366 (2007).
[29] S. Thanos, Physica A 378, 273 (2007).
[30] M. Suzuki and H. Takano, Phys. Lett. A 69, 426 (1979).
[31] H. Takano and M. Suzuki, J. Stat. Phys. 26, 635 (1981).
[32] A. Falicov and A.N. Berker, Phys. Rev. B 51, 12458
(1995).
[33] P. Tomczak, Phys. Rev. B 53, R500 (1996).
[34] P. Tomczak and J. Richter, Phys. Rev. B 54, 9004 (1996).
[35] P. Tomczak and J. Richter, J. Phys. A 36, 5399 (2003).
[36] M. Hinczewski and A.N. Berker, Eur. Phys. J. B 48, 1
(2005).
[37] G.S. Rushbrooke and P.J.Wood, Mol. Phys. 6, 409
(1963).
[38] J. Oitmaa and W. Zheng, J. Phys.: Condens. Matter 16,
8653 (2004).
[39] S.R. McKay and A.N. Berker, Phys. Rev. B 29, 1315
(1984).
[40] M. Takahashi, Thermodynamics of One-Dimensional
Solvable Models, pgs. 41,56, 152-158, Cambridge Univer-
sity Press, Cambridge (1999), and references therein.
[41] D.V. Dmitriev, V.Ya. Krivnov, and A.A. Ovchinnikov,
Phys. Rev. B 65, 172409 (2002).
[42] G. Kato, M. Shiroishi, M. Takahashi, and K. Sakai, J.
Phys. A 37,5097 (2004).
[43] N. Kitanine, J.M. Maillet, N.A. Slavnov, and V. Terras,
J. Stat. Mech. L09002 (2005).
[44] J. Sato, M. Shiroishi, and M. Takahashi, Nucl. Phys. B
729, 441 (2005).
[45] R. Kubo, Phys. Rev. 87, 568 (1952).
[46] J. Van Kranendonk and J.H. Van Vleck, Rev. Mod. Phys.
30, 1 (1958).
http://arxiv.org/abs/cond-mat/9810278
http://arxiv.org/abs/cond-mat/0502431
[47] J. des Cloizeaux and M.Gaudin, J. Math. Phys 7, 1384
(1966).
[48] C. Schindelin, H. Fehske, H. Büttner, and D. Ihle, Phys.
Rev. B 62, 12141 (2000).
ABSTRACT
  The anisotropic XXZ spin-1/2 Heisenberg chain is studied using
renormalization-group theory. The specific heats and nearest-neighbor spin-spin
correlations are calculated thoughout the entire temperature and anisotropy
ranges in both ferromagnetic and antiferromagnetic regions, obtaining a global
description and quantitative results. We obtain, for all anisotropies, the
antiferromagnetic spin-liquid spin-wave velocity and the Isinglike
ferromagnetic excitation spectrum gap, exhibiting the spin-wave to spinon
crossover. A number of characteristics of purely quantum nature are found: The
in-plane interaction s_i^x s_j^x + s_i^y s_j^y induces an antiferromagnetic
correlation in the out-of-plane s_i^z component, at higher temperatures in the
antiferromagnetic XXZ chain, dominantly at low temperatures in the
ferromagnetic XXZ chain, and, in-between, at all temperatures in the XY chain.
We find that the converse effect also occurs in the antiferromagnetic XXZ
chain: an antiferromagnetic s_i^z s_j^z interaction induces a correlation in
the s_i^xy component. As another purely quantum effect, (i) in the
antiferromagnet, the value of the specific heat peak is insensitive to
anisotropy and the temperature of the specific heat peak decreases from the
isotropic (Heisenberg) with introduction of either type (Ising or XY)
anisotropy; (ii) in complete contrast, in the ferromagnet, the value and
temperature of the specific heat peak increase with either type of anisotropy.

<|endoftext|><|startoftext|>
Microsoft Word - preprint.doc
April 9, 2007, submitted to J. Phys. Soc. Jpn. 
Layered sodium cobaltate NaxCoO2
1) provides us with a 
fascinating playground to study the physics of strongly 
correlated electrons on the frustrated triangular lattice in a 
wide range of band filling.  Recent two important 
discoveries have accelerated the research of NaxCoO2: one 
is an unusually large thermoelectric power for x ~ 2/32) and 
the other is superconductivity below 4.5 K in a hydrated 
compound with x ~ 1/3.3)  Particularly, a lot of studies have 
focused on the hydrated superconductor to clarify the 
mechanism of the superconductivity and to understand 
underlying electronic states realized in the triangular lattice.  
In spite of extensive research, however, there still remain 
some controversial issues possibly resulting from difficulty 
in the chemistry of the compounds that put obstacles in the 
way of obtaining high-quality samples.4, 5)   
The electronic phase diagram of NaxCoO2 as a function 
of x has been proposed by several groups.  Foo et al. gave a 
typical one where a charge-ordered magnetic insulator exists 
at x = 0.5 with a "paramagnetic (PM) metal" on the left (x < 
0.5) and a "Curie-Weiss (CW) metal" on the right (x > 
0.5).6)  In contrast, Yokoi et al. found that the boundary 
between the two metals is located approximately at x = 0.6.7)  
The reason for this discrepancy is not known. Moreover, the 
origin of the change in metallic character has not yet been 
interpreted in a clear manner.  On the charge-ordered 
insulator at x = 0.5, it is known that the ordering of Na ions, 
which is already present at high temperature, triggers a 
magnetic as well as a metal-insulator transitions at low 
temperatures.6-8) 
The electronic states of NaxCoO2 near the Fermi level 
come from the Co d derived t2g state that splits into an a1g 
singlet and an e'g doublet.  These bands are filled 
progressively with x, because one Na+ ion donates one 
electron to the CoO2 layer.  According to band structure 
calculations,9-14) the a1g band always crosses the Fermi level, 
irrespective of band filling, which gives a large circular 
(cylindrical) Fermi surface (FS) around the Γ point in the 
Brillouin zone.  In addition, two important features on the 
FS are found in the local-density approximation 
calculations: one is a set of small hole pockets originating 
from the e'g band near the K points, which are expected to 
appear at low fillings such as x = 0.3.  The other is a small 
concentric electron FS around Γ coming from a dip in the 
band energy of the a1g state, which may appear at high 
fillings such as x = 0.7.  However, the absence of the latter 
was suggested for any doping levels by more sophisticated 
band structure calculations incorporating spin polarizations 
or the on-site Coulomb interaction U.11)  Experimentally, 
angle-resolved photoemission spectroscopy (ARPES) study 
on Na0.7CoO2 observed only a large circular FS around Γ 
and failed to detect the other FSs.15) Recently, Mochizuki 
and Ogata pointed out in their tight binding model that band 
dispersions and FS topology change sensitively with the 
thickness of the CoO2 layer:
16, 17)  the hole pockets near K 
would appear for a thinner CoO2 layer, while the small 
electron pocket around Γ would be present for a thicker 
CoO2 layer.  Thus, the current status is miles away from a 
complete understanding of the basic electronic structures of 
NaxCoO2. 
In this letter, we study systematically the transport and 
thermodynamic properties of NaxCoO2 using a series of 
polycrystalline samples.  The key point of the present study 
is a novel method used to prepare samples: most of samples 
studied so far were obtained by a soft-chemistry method at 
ambient temperature,3) which might cause an 
inhomogeneous distribution of Na ions or otherwise lead to 
Precise Control of Band Filling in NaxCoO2 
Daisuke YOSHIZUMI, Yuji MURAOKA**, Yoshihiko OKAMOTO, Yoko KIUCHI, 
Jun-Ichi YAMAURA, Masahito MOCHIZUKI1, ***, Masao OGATA2 and Zenji HIROI* 
Institute for Solid State Physics, University of Tokyo, Kashiwa, Chiba 277-8581 
1RIKEN, Hirosawa, Wako, Saitama 351-0198 
2Department of Physics, University of Tokyo, Hongo, Bunkyo-ku, Tokyo 113-0033 
Electronic properties of the sodium cobaltate NaxCoO2 are systematically studied through a precise 
control of band filling.  Resistivity, magnetic susceptibility and specific heat measurements are carried 
out on a series of high-quality polycrystalline samples prepared at 200°C with Na content in a wide 
range of 0.35 ≤ x ≤ 0.70.  It is found that dramatic changes in electronic properties take place at a critical 
Na concentration x* that lies between 0.58 and 0.59, which separates a Pauli paramagnetic and a Curie-
Weiss metals.  It is suggested that at x* the Fermi level touches the bottom of the a1g band at the Γ point, 
leading to a crucial change in the density of states across x* and the emergence of a small electron pocket 
around the Γ point for x > x*. 
KEYWORDS:  sodium cobaltate, resistivity, magnetic susceptibility, specific heat, Sommerfeld 
coefficient, Fermi surface, band filling 
*E-mail: hiroi@issp.u-tokyo.ac.jp 
**Present address: The Graduate School of Natural Science and Technology, Okayama University, 
1-1, Naka-3, Tsushima, Okayama 700-8530 
***Present address: Tokura Multiferroics Project, ERATO, Japan Science and Technology Agency (JST) 
a strong tendency for Na ordering at certain fractional 
compositions such as x = 0.5.  Particularly, the Na ordering 
strongly influences the electronic state of the (CoO2)
x- layer 
and may mask the intrinsic features.  We adopted 
alternatively a solid-state reaction at higher temperatures 
starting from two end members of Na0.35CoO2 and 
Na0.70CoO2 and succeeded in controlling x on a scale of 0.01.  
Moreover, since the high-temperature reaction favoured 
disordering of Na ions, more intrinsic properties of the 
(CoO2)
x- layer have been clarified.   
We prepared a series of polycrystalline samples of 
NaxCoO2 by a solid-state reaction of Na0.35CoO2 and 
Na0.70CoO2.  First, powders of Na0.70CoO2 were synthesized 
by a reaction of stoichiometric amounts of Na2CO3 and 
Co3O4 in air at 860°C.  Sodium deintercalation was then 
carried out in a 1.0 M Br2 solution in acetnitrile to obtain 
powders of Na0.35CoO2.  The Na content was determined by 
the inductively coupled plasma-atomic emission 
spectroscopy (ICP) method.  A dozen of samples with 
intermediate compositions were prepared by reacting the 
two powders of Na0.35CoO2 and Na0.70CoO2 in an 
appropriate ratio in a sealed quartz tube at 200°C for 24 
hours, followed by slow cooling to room temperature.  The 
Na content of the products was also examined by the ICP 
analysis.  The phase purity was confirmed by means of 
powder x-ray diffraction.  More detail on the sample 
preparation and characterization will be reported 
elsewhere.18)  Resistivity and specific heat were measured in 
a Quantum Design PPMS system, and magnetic 
susceptibility measurements were performed in a Quantum 
Design MPMS system. 
Figure 1 shows the systematic variation of resistivity ρ 
with x.  For clarity, each set of data is divided by the value 
at 250 K.  The ρ of x = 0.50 exhibits a smooth, 
semiconductor-like increase at low temperature with no 
anomalies for a metal-insulator transition, in contrast to the 
previous report where a soft-chemically prepared sample of 
x = 0.50 exhibits a sharp rise in ρ below 53 K.6, 7)  We 
examined our sample by electron diffraction and found that 
a superstructure coming from Na ordering was almost 
absent.18)  We think that this semiconducting behavior is due 
to a partial ordering of Na ions, because even at room 
temperature the Na ions tend to align.18)  As x increases 
from 0.50, such a semiconducting increase in ρ is 
progressively suppressed, and finally a metallic T 
dependence appears above 0.55.  Nevertheless, a small 
upturn is observed below 20 K, as shown in the inset to Fig. 
1, which may be ascribed to weak localization due to a 
random electrostatic potential from disordered Na 
distributions. 
Compared with the T dependence in ρ for x ≤ 0.58, which 
is always concave-downward, those of x ≥ 0.59 are 
apparently dissimilar; almost T linear for x = 0.59 and 
concave-upward for x = 0.66 and 0.70.  The last one shows a 
steep decrease below 30 K, which is similar as reported 
previously for x = 0.75.6)  Therefore, the  T dependence of 
ρ changes significantly across x = 0.58 ~ 0.59. 
Another related change is found in the evolution of 
magnetic susceptibility χ, as shown in Fig. 2.  The χ for x ≤ 
0.58 shown in the left panel is relatively small in magnitude 
and weakly T dependent, while, in distinct contrast, the χ for 
x ≥ 0.59 is large in magnitude and shows a characteristic 
CW divergence that is progressively enhanced with 
increasing x.  Hence, a substantial change in magnetism 
must occur across a critical Na concentration x* between 
0.58 and 0.59, accurately corresponding to the above results 
from resistivity. 
Looking in more detail, the χ of x = 0.35 decreases 
gradually with decreasing T from 300 K, exhibits a 
minimum at 120 K and then shows a CW divergence at low 
Fig. 1.  Evolution of the temperature dependence of 
resistivity ρ normalized to the value at 250 K for 0.50 ≤ 
x ≤ 0.58 (left) and 0.59 ≤ x ≤ 0.70 (right). 
Fig. 2.  Magnetic susceptibility χ measured on heating in a 
magnetic field of 10 kOe for  0.35 ≤ x ≤ 0.58 (left) and 
0.59 ≤ x ≤ 0.70 (right).  Note the difference in the 
vertical scale. 
temperature.  This low-temperature CW component may be 
due to a minor portion of spins that come from defects or 
impurities and are noninteracting with majority spins. The 
positive slope above 120 K suggests the existence of a peak 
in χ at a higher temperature.  In fact, the 0.50 and 0.52 
samples seem to show a broad maximum near 300 K.  
Moreover, a broad peak is clearly observed at around 150 K 
for x = 0.56.  The peak is possibly further shifted to a lower 
T for x = 0.58 and becomes obscure.  This broad peak 
indicates the development of antiferromagnetic (AF) short-
range order on cooling.  Then, its systematic shift to lower 
temperatures implies that the characteristic energy of AF 
spin fluctuations or effective superexchange Jeff decreases 
with increasing x.  Since the density of spins (1-x) decreases 
as x increases, we expect that Jeff between nearest-neighbour 
spins also decreases, consistent with the observed behavior 
in χ.  For x ≥ 0.59, on the other hand, the Weiss temperature 
deduced from fitting the data to the CW law χ = C/(T - Θ) is 
always negative in sign (AF) and gradually increases from -
156 K (x = 0.59) to -99 K (x = 0.70).  This fact indicates that 
AF fluctuations are suppressed or additional ferromagnetic 
interactions are enhanced with increasing x. 
In order to demonstrate more explicitly the dramatic 
changes across x*, the ρ and χ data of x = 0.58 and 0.59 are 
compared in Fig. 3.  The ρ of 0.58 is proportional to T2 in a 
wide T range below 70 K, as shown in the inset, provided 
that the low-temperature upturn is ignored.  Such T2 
behavior is what one expects generally  for a strongly 
correlated electron system.  In contrast, the T dependence in 
ρ of 0.59 is rather unusual, showing an almost linear 
variation in a similarly wide T range below 80 K.  Hence, 
there must be a substantial difference in the scattering 
mechanism of carriers between the two samples. 
On the other hand, it is apparent from Fig. 3 that χ is 
enhanced enormously from 0.58 to 0.59.  Moreover, the T 
dependence above 50 K changes from linear to CW-like.  
This T linear behavior for 0.58 may be accidental as a few 
contributions of different origins coexist. The unusual T-
linear dependence in ρ for 0.59 must be related to the 
appearance of the CW component in χ: additional magnetic 
scattering can be the source. 
To evaluate the change in the T dependence of χ, the 
slope at 100 K is plotted in Fig. 4, following the previous 
analysis done by Yokoi et al.7)  The slope is close to zero or 
takes small negative values for x ≤ 0.58, while it decreases 
almost discontinuously to a relatively large negative value at 
0.59, followed by a further decrease with increasing x.   
Another experimental evidence to support the existence 
of a critical x value has been obtained from specific heat 
measurements.  The x dependence of the Sommerfeld 
coefficient γ determined from the intercept of the C/T versus 
T2 plot is shown in Fig. 4.  Obviously, there is a change in γ  
at x*: γ increases slightly with x for x ≤ 0.58 and suddenly 
rises at x* by 17 %, from 16.8 mJ K-2 mol-1 for 0.58 to 19.7 
mJ K-2 mol-1 for 0.59.  Then, it increases rapidly to be 
saturated at 31.5 mJ K-2 mol-1 for x = 0.70.  Note that the γ 
of 0.50 is finite because of the metallic nature of our sample 
in the absence of charge order.  Since γ is proportional to the 
density of states (DOS), we conclude that the DOS at the 
Fermi level suddenly increases above x*. 
All the above results indicate that the electronic structure 
of NaxCoO2 does not change smoothly with x, but there is a 
well-defined boundary at x*.  It is found that the critical 
concentration lies between 0.58 and 0.59, not at 0.5 as 
reported by Foo et al.,6) but close to 0.60 as reported by 
Fig. 4. x dependences of the slope in χ at 100 K (top) and 
the Sommerfeld coefficient γ (bottom).  Crosses in the 
bottom panel represent the data reported by Yokoi et 
al.7)  Dotted lines serve as guides to the eye. 
Fig. 3.  Comparisons of resistivity (left) and magnetic 
susceptibility (right) between x = 0.58 and 0.59.  Inset in 
the left panel shows a ρ - ρ0 versus T
2 plot for 0.58, 
where ρ0 is the residual resistivity and is 1.83 mΩ cm.  
Inset in the right panel shows the T dependence of 
inverse χ for 0.59. 
Yokoi et al.7)  The sharp boundary breaks the phase diagram 
into two regions: a Pauli paramagnetic metal with a 
relatively small DOS for x < x*and a CW metal with an 
enhanced DOS for x > x* (Fig. 5).  A main reason why we 
observed such sharp changes in properties at x* in the 
present study may be ascribed to the high quality of our 
samples prepared at high temperature as well as the reduced 
influence of the Na ordering. 
On the basis of these experimental lines of evidence, we 
are now ready to consider what is going on with x in 
NaxCoO2 in terms of band structures.  As x decides filling in 
the t2g band, it is reasonable to assume that the band 
structure changes at a certain filling in the rigid band picture.  
According to band structure calculations,9-13) there is a dip 
around the Γ point in the a1g band, though it has not yet been 
detected experimentally.15)  Then, it is likely that at x* the 
Fermi level with energy E* touches the bottom of the a1g 
band exactly at the Γ point, as schematically depicted in Fig. 
5.  An approximate profile of DOS expected from the band 
structure is also illustrated in Fig. 5, which is constant below 
E* due to the two dimensionality and exhibits a 
discontinuous jump at E*, followed by a further increase 
above E*.  This profile of DOS is qualitatively in good 
agreement with the observed x dependence of γ shown in 
Fig. 4.  A minor difference between them may come from 
the actual three dimensionality of the band structure that 
splits the a1g band into two.
9, 12)  On the other hand, the top 
of the e'g band must be below the Fermi level even at the 
lowest filling, because no enhancement in γ is observed for 
0.35 ≤ x < x* in Fig. 4.  
Furthermore, one expects a distinct change in the 
topology of the FS at x*:  an additional small electron pocket 
should appear for x > x*.  Because of this small electron 
pocket, strong ferromagnetic correlations are expected for x 
> x*, which has been predicted theoretically10, 13, 19) and 
evidenced by neutron diffraction20, 21) and other 
experiments.22-24)  The observed change from a PM metal to 
a CW metal is obviously attributed to the emergence of this 
small electron FS. 
It seems difficult to estimate the accurate value of x* 
from band structure calculations, because the band structure 
is highly sensitivity to the crystal structure or the effect of 
electron correlations.9-14)  However, Korshunov et al. 
estimated x* (xm in their report) to be 0.56 based on a tight-
binding fit to an LAPW calculation, which is close to our 
value.13, 14)  They pointed out that taking account of electron 
correlations would push it up to 0.68,13, 14) which suggests 
that electron correlations may not be so important in 
NaxCoO2. 
Concerning superconductivity found in the hydrated 
compound, Sakurai et al. found that superconductivity  
appears at two specific Co valences of +3.48 and +3.40.4, 5) 
Interestingly, the latter corresponds to x = 0.60 in NaxCoO2, 
just above our x*.  Kuroki and his coworkers pointed out 
theoretically that the presence of the two concentric FSs 
around the Γ point leads to an enhanced spin fluctuation and 
thus gives rise to an extended s-wave superconductivity in 
the hydrated compound.25)  We think that superconductivity 
would show up at just above x* even in nonhydrated 
NaxCoO2, if the influence of disordered Na ions is 
appropriately taken away.  The other Co valence of +3.48 
corresponds to x = 0.52 in NaxCoO2, which is a simple PM 
metal.  However, a neutron diffraction experiment found 
that hydration squashes the CoO2 layer.
26)  It is theoretically 
predicted that this structural change with hydration pushes 
the e'g band above the Fermi level, leading to an 
enhancement in DOS and thus the occurrence of a spin-
triplet superconductivity.16, 17)  Therefore, the two 
superconducting states may be associated with the two 
corresponding FSs with enhanced DOS. 
In summary, we have studied the electronic properties of 
NaxCoO2 with varying x, using a series of high quality 
samples prepared at high temperature.  Dramatic changes in 
various quantities and thus in the electronic structure are 
found at a critical Na concentration x* between 0.58 and 
0.59.  This provides strong evidence of the presence of an 
electron pocket around the Γ point for high band fillings of x 
> x*.  The intrinsic phase diagram of the (CoO2)
x- layer is 
rather simple, as depicted in Fig. 5, in the absence of an 
electrostatic potential superimposed from the Na layers 
above and below. 
Fig. 5. Schematic representation of the band structure of 
NaxCoO2.  Band dispersions along the Γ-K line (left), an 
expected profile of density of states (middle) and an x-T 
phase diagram (right) are depicted.  The critical Na 
content x* in the phase diagram corresponds to a band 
filling with the Fermi energy equal to E*, as shown by a 
broken line, where the Fermi level touches the bottom of 
the a1g band at the Γ point.  Hexagons represent the 
Brillouin zone with the Γ point at the center and the K 
points at the corners. A small electron Fermi surface 
appears for the Fermi energy above E* in addition to a 
large Fermi surface around Γ.  In the phase diagram, a 
Curie-Weiss metal exists above x*, while a Pauli 
paramagnetic metal below x*.  A charge-ordered 
insulator reported at x = 0.5 is excluded in this phase 
diagram, because it is extrinsic, coming from the Na 
ordering. 
Acknowledgment 
We thank M. Ichihara for his help in electron microscopy 
observations. 
1) C. Fouassier et al.: J. Solid State Chem. 6 (1973) 532. 
2) I. Terasaki, Y. Sasago and K. Uchinokura: Phys. Rev. B 56 
(1997) R12685. 
3) K. Takada et al.: Nature 422 (2003) 53. 
4) H. Sakurai et al.: J. Phys. Soc. Jpn. 74 (2005) 2909. 
5) H. Sakurai et al.: Phys. Rev. B 74 (2006) 092502. 
6) M. L. Foo et al.: Phys. Rev. Lett. 92 (2004) 247001. 
7) M. Yokoi et al.: J. Phys. Soc. Jpn. 74 (2005) 3046. 
8) G. Gaparovi et al.: Phys. Rev. Lett. 96 (2006) 046403. 
9) D. J. Singh: Phys. Rev. B 61 (2000) 13397. 
10) D. J. Singh: Phys. Rev. B 68 (2003) 020503(R). 
11) P. Zhang et al.: Phys. Rev. Lett. 93 (2004) 236402. 
12) M. D. Johannes et al.: Europhys. Lett. 68 (2004) 433. 
13) M. M. Korshunov et al.: JETP Letters 84 (2007) 650. 
14) M. M. Korshunov et al.: Phys. Rev. B 75 (2007) 94511. 
15) M. Z. Hasan et al.: Phys. Rev. Lett. 92 (2004) 246402. 
16) M. Mochizuki and M. Ogata: J. Phys. Soc. Jpn. 75 (2006) 
113703. 
17) M. Mochizuki and M. Ogata: J. Phys. Soc. Jpn. 76 (2007) 
013704. 
18) D. Yoshizumi et al.: in preparation. 
19) K. Kuroki et al.: Phys. Rev. Lett. 98 (2007) 136401. 
20) S. P. Bayrakci et al.: Phys. Rev. B 69 (2004) 100410(R). 
21) A. T. Boothroyd et al.: Phys. Rev. Lett. 92 (2004) 197201. 
22) K. Ishida et al.: J. Phys. Soc. Jpn. 72 (2003) 3041. 
23) I. R. Mukhamedshin et al.: Phys. Rev. Lett. 94 (2005) 
247602. 
24) Y. Ihara et al.: J. Phys. Soc. Jpn. 75 (2006) 124714. 
25) K. Kuroki et al.: Phys. Rev. B 73 (2006) 184503. 
26) J. W. Lynn et al.: Phys. Rev. B 68 (2003) 214516.
ABSTRACT
  Electronic properties of the sodium cobaltate NaxCoO2 are systematically
studied through a precise control of band filling. Resistivity, magnetic
susceptibility and specific heat measurements are carried out on a series of
high-quality polycrystalline samples prepared at 200 C with Na content in a
wide range of 0.35 =< x =< 0.70. It is found that dramatic changes in
electronic properties take place at a critical Na concentration x* that lies
between 0.58 and 0.59, which separates a Pauli paramagnetic and a Curie-Weiss
metals. It is suggested that at x* the Fermi level touches the bottom of the
a1g band at the gamma point, leading to a crucial change in the density of
states across x* and the emergence of a small electron pocket around the gamma
point for x > x*.

<|endoftext|><|startoftext|>
Introduction
Degenerate saddle point problems, e.g., can be viewed as
limit cases of mixed formulations of symmetric problems
with large jumps in coefficients, corresponding to an infi-
nite jump.We prove that the degeneracy does not affect the
wellposedness in a standard norm under some natural as-
sumptions, using ideas that are initiated by [3, 4, 5, 6, 7, 14,
15]. By wellposedness, contrary to illposedness, we mean a
stable dependence of the solution on the right-hand side.
Results of this paper provide a foundation for research on
uniform wellposedness of mixed formulations of symmet-
ric problems with large jumps in coefficients in a standard
norm, independent of the jumps.
Email address: andrew.knyazev[AT]cudenver.edu (Andrew V.
Knyazev).
URL: http://math.cudenver.edu/˜ aknyazev/ (Andrew V.
Knyazev).
1 Partially supported by the National Science Foundation award
DMS-0612751.
The necessary and sufficient condition, e.g., [9, 10], of
the standard wellposedness of an operator equation with
an arbitrary right–hand side is the existence of a bounded
inverse of the operator. We argue that in some practical
cases the equation is degenerate, i.e. the inverse operator
does not exist. Assuming that the right–hand side is in the
operator range, a solution exists, but is not unique. Tomake
the solution unique we factor out the operator null–space.
This leads to a natural generalization, where boundedness
of the pseudoinverse of the operator is used as the necessary
and sufficient condition of wellposedness of a degenerate
operator equation, by analogy with [13, 15].
With this idea in mind, we revisit necessary and sufficient
conditions of wellposedness of an abstract mixed problem.
In the symmetric case we consider here, the mixed problem
can be interpreted as a variational saddle point problem.
For generalized saddle point problems we refer the reader,
e.g., to [11].
We start in Section 2 with a standard abstract symmetric
mixed problem as in [9, 10]. By analogy with [14, 17], we
Preprint submitted to Comp. Meth. Applied Mech. Engineering, accepted and published as doi:10.1016/j.cma.2006.10.01926 October 2018
http://arxiv.org/abs/0704.1066v1
split the saddle point problem into two equations, for the
primary unknown and for the Lagrange multiplier. This
split is somewhat implicit in [9, 10]. The equation for the
primary unknown is self-consistent, since here we eliminate
the Lagrange multiplier from the mixed system using an
orthogonal projector.
Following, e.g., [10], we discuss the traditional neces-
sary and sufficient conditions of wellposedness, namely, the
Ladygenskaya–Babuška–Brezzi (LBB) or inf-sup condition
and the coercivity condition. The LBB or inf-sup condition,
considered in Section 3, is necessary and sufficient for a sta-
ble dependence of the Lagrange multiplier on an arbitrary
right-hand side.
We review the traditional point of view that the coerciv-
ity condition is a necessary and sufficient condition of well-
posedness of the problem. In Section 4, an operator form
of the dual variational problem without assuming the co-
ercivity condition is considered. We examine the unique-
ness of the solution and describe all possible multiple solu-
tions for a given right-hand side. All admissible right-hand
sides are determined. We formulate several equivalent nec-
essary and sufficient conditions of wellposedness in terms
of closedness of relevant subspaces. We also derive a geo-
metrical condition—a positiveness of a minimum gap [12]
between relevant closed subspaces.
A possible application of our theory is the Hellinger–
Reissner formulation, e.g., [1], of nonhomogeneous Lamé
equations for media with (almost) rigid inclusions, where
the Lagrange multiplier is the displacement, and we get an
operator equation for the stress on the closed subspace of
divergence free (in a weak sense) stresses. Infinitely large
Lamé coefficients λ and µ, in a subdomain, result in a
null-space of the operator in the equation for the stress,
so the inverse operator does not exist and the problem is
not wellposed in a traditional sense. Our abstract geomet-
rical condition of generalized wellposedness in this example
is equivalent to a possibility of extension of displacements
preserving the energy norm of the Lamé operator. It has
been proved in [4, 7] that such an extension is possible un-
der some assumptions. We expect that in the limit case of
infinitely large Lamé coefficients λ and µ in a subdomain
the pseudoinverse of the operator is bounded, which makes
the problem wellposed for the stresses in the L2 sense, i.e.
the L2 norm of the stress is stable even if the Lamé coef-
ficients are large in a subdomain. We plan to address this
application in the future.
2. Abstract symmetric saddle point problems
In this section we essentially follow well known argu-
ments, e.g., [10], with some simplifications due to the sym-
metry of the saddle point problem and our unwillingness
to introduce dual spaces. Straightforward manipulations,
using a pair of complementary closed subspaces, allow us,
as in [14, 17], to formulate separate equations for the pri-
mary unknown and for the Lagrange multiplier of the sad-
dle point problem; see, e.g., survey [8, Sec. 6] for similar
matrix null-space methods. We start by formulating and
investigating the problem using bilinear forms, and then
repeat the arguments for operator-based formulations that
are used in the last section of the paper.
2.1. Formulations using bilinear forms
LetH andV be two real Hilbert spaces with scalar prod-
ucts and norms denoted by (·, ·)H, ‖ · ‖H and (·, ·)V, ‖ · ‖V
correspondingly. Let a(·, ·) : H × H → R and b(·, ·) :
H ×V → R be two continuous bilinear forms with a(·, ·)
symmetric and nonnegative definite. We consider the fol-
lowing problem: for a given g ∈ H and f ∈ H find σ ∈ H,
called the “primary unknown,” and u ∈ V, called the “La-
grange multiplier,” such that
a(σ, ǫ) + b(ǫ, u) = (g, ǫ)H, ∀ǫ ∈ H,
b(σ − f, v) = 0, ∀v ∈ V.
We place the right-hand side f “inside” of the form b as it
allows us to take f ∈ H, not to introduce the dual space
V′, and makes several statements somewhat simpler. We
call (1) a saddle point problem, since equations (1) are the
optimality conditions and their solution is a saddle point
for the Lagrangian, e.g., [10], defined by a(σ, σ) + 2b(σ −
f, u)− 2(g, σ)H.
We call a linear manifold, not necessarily closed, a “sub-
space” and a closed linear manifold a “closed subspace.”
Let us introduce a special notation N ⊆ H for the closed
subspace, which is the null-space of the bilinear form b(·, ·)
with respect to its first argument, i.e. N = {ǫ ∈ H :
b(ǫ, v) = 0, ∀v ∈ V}. Let us denote by P ≡ N⊥ ⊆ H the
closed subspace which is H-orthogonal (complementary)
to N. Closed subspaces N and P play important roles in
this paper, so let us introduce anH-orthogonal projector P
on H such that N(P ) = N and R(P ) = P and the com-
plementary projector P⊥ = I − P with R(P⊥) = N and
N(P⊥) = P, where by R(P ) we denote the range of opera-
tor P and, with a slight abuse of the notation, by N(P ) we
denote the null-space of operator P . We assume through-
out the paper, unless stated otherwise, that a bounded op-
erator is defined everywhere on a corresponding space. As
an orthogonal projector, operator P : H → H is bounded
H-selfadjoint, P = P ∗, and satisfies P = P 2.
In the first equation of system (1), let us split it into two
equations, by plugging separately ǫ = Pǫ ∈ P and ǫ =
P⊥ǫ ∈ N and using the fact that b(P⊥ǫ, u) = 0, ∀ǫ ∈ H.
The second equation in system (1) has a simple equivalent
geometric interpretation: σ − f ∈ N, or (σ − f, ǫ)H =
0, ∀ǫ ∈ P. We then rewrite system (1) in the following
equivalent form:
a(σ, ǫ) + b(ǫ, u) = (g, ǫ)H, ∀ǫ ∈ P,
a(σ, ǫ) = (g, ǫ)H, ∀ǫ ∈ N,
(σ − f, ǫ)H = 0, ∀ǫ ∈ P.
Nowwemake an important observation that we can treat
the first line in system (2) as an equation for the Lagrange
multiplier u, given the primary unknown σ, i.e.
b(ǫ, u) = (g, ǫ)H − a(σ, ǫ), ∀ǫ ∈ P. (3)
The last two lines in system (2) involve neither the Lagrange
multiplier u, nor the bilinear form b, and can be used to
determine the primary unknown σ:
a(σ, ǫ) = (g, ǫ)H, ∀ǫ ∈ N,
(σ − f, ǫ)H = 0, ∀ǫ ∈ P.
System (4) describes, e.g., [10], the optimality conditions
of the constrained minimization problem inf {a(σ, σ) −
2(g, σ)H}, σ ∈ H : (σ − f, ǫ)H = 0, ∀ǫ ∈ P.
2.2. Operator-based formulations
In addition to the formulations above involving bilin-
ear forms, it is convenient to consider equivalent operator-
based formulations.We associatewith the forms a and b two
linear continuous operators A : H → H and B : H → V
defined by (Aσ, ǫ)H = a(σ, ǫ), (Bσ, v)V = b(σ, v), ∀ǫ, σ ∈
H, v ∈ V. In this definition of A and B we follow a
slightly simplified, e.g., [11, 17], rather than standard [10],
approach, namely, we do not need dual spaces H′ and V′.
Now, we reformulate the main statements of subsection 2.1
using the just defined operators A and B. The following
operator formulation
Aσ + B∗u = g in H,
B(σ − f) = 0 in V
is equivalent to the original problem (1) with the bilinear
forms, where the adjoint operator B∗ : V → H is defined,
as usual, by (σ,B∗v)H = (Bσ, v)V, ∀σ ∈ H, v ∈ V. The
operator A is selfadjoint and nonnegative definite, A =
A∗ ≥ 0 on H since it is defined by the symmetric and
nonnegative definite form a.
We notice that the second equation in system (5) has
the same geometric interpretation as in the case of bilinear
forms-based system (1): σ − f ∈ N(B). The null-space
N(B) ⊆ H and its H-orthogonal complement R(B∗) ⊆ H
have already been denoted by N and P, correspondingly,
and introduced together with the H-orthogonal projector
P on H such that N = N(P ) = N(B) and P = R(P ) =
R(B∗) in the previous subsection.
We split the first equation in system (5) in two orthogonal
parts corresponding to N and P, using that PB∗u = B∗u
and P⊥B∗u = 0, since R(B∗) ⊆ P. We replace B with P ,
since they share the same null-space, in the second equation
in system (5) to get the following equivalent form of system
PAσ +B∗u = Pg in H,
P⊥Aσ = P⊥g in H,
P (σ − f) = 0 in H.
We notice that the first line in system (6) is an equation
for the Lagrange multiplier u, given the primary unknown
σ, as in (3), i.e. B∗u = P (g −Aσ).
We next discuss the necessary and sufficient conditions
from [10] of wellposedness of the problem and make it clear
why one can find weaker necessary and sufficient conditions.
To simplify our arguments, we take advantage in the rest
of the paper of the split of the original system into separate
equations for the Lagrange multiplier u and the primary
unknown σ that we have described in this section. It is
important to realize, however, that we have not made any
substitutions, neither in the solutions u and σ, nor in the
right-hand sides f and g. So whatever statements we next
prove concerning the dependence of the solutions u and
σ on the right-hand sides f and g, these statements are
equally applicable to both the separate equations and to the
original system in either bilinear form- or operator-based
context.
3. Inf-sup or LBB condition
In this section, we discuss a traditional assumption, be-
ing recently referred to as Ladygenskaya–Babuška–Brezzi
(LBB) condition, see Babuška and Aziz [2], Brezzi and
Fortin [10], Ladyzhenskaya [16], that the range of operator
B : H → V, denoted by R(B), is closed. The closedness of
a range of a closed operator is ultimately connected to the
boundedness of the operator (pseudo-)inverse, e.g., [12].
In our specific situation, operator B is bounded with
the closed domain H and, thus, is closed, so its (pseudo-
)inverse B−1 : R(B) → H/N(B) is also closed. It is neces-
sary to use a factor-space here to define the inverse, since
the standard operator inverse B−1 : R(B) → H does not
exist if N(B) is nontrivial. We note that N(B) is closed
and that the factor-space H/N(B) is a Hilbert space, as
is H. In a Hilbert space, a convenient set of representants
for the classes in the factor-space is simply the correspond-
ing orthogonal complement, e.g., H/N(B) is isometrically
isomorphic to P = (N(B))⊥ ⊆ H, so we set ‖σ‖H/N(B) =
‖Pσ‖H. The subspaceR(B) is the domain of the closed op-
erator B−1 : R(B) → H/N(B) therefore, R(B) is closed
if and only if B−1 : R(B) → H/N(B) is bounded. Closed-
ness of R(B) is equivalent to closedness of R(B∗), so all
the arguments above can be equivalently reformulated for
the adjoint operator B∗ and its (pseudo-)inverse.
When written in terms of inequalities involving the bi-
linear form b :
b(σ, v)
‖σ‖H/N(B)‖v‖V
= inf
‖Bσ‖V
‖σ‖H/N(B)
‖B−1‖R(B)→H/N(B)
or, equivalently,
b(σ, v)
‖σ‖H‖v‖V/N(B∗)
= inf
‖B∗v‖H
‖v‖V/N(B∗)
‖B−∗‖R(B∗)→V/N(B∗)
the LBB condition is also known as the inf-sup condition,
see Babuška and Aziz [2], Brezzi and Fortin [10], where
V/N(B∗) means the factor-space of V with respect to the
closed subspace N(B∗). We implicitly assume that the ar-
guments in the inf-sup formulas above and throughout the
paper do not make both the numerator and the denomi-
nator vanish. In Ladyzhenskaya [16], the inf-sup condition
does not appear to be explicitly formulated, instead, closed-
ness of a range of the gradient operator is investigated in
connection with wellposedness of the diffusion equation.
We note that the induced norms of an operator and its
adjoint are equal, so both inf-sup expressions above are
equal to the same constant that we call cb. If at least one
of the spaces H or V is finite dimensional then the value cb
is positive automatically, so it becomes important how cb
depends on some parameters, e.g., on the dimension.
Let us mention that in many practical applications the
space V can be naturally defined such that N(B∗) = {0},
so the latter inf-sup expression of the LBB condition takes
the form
b(σ, v)
‖σ‖H‖v‖V
= cb > 0,
which can be most often seen in publications on the subject.
We now contribute our own equivalent formulations of the
LBB condition.
Lemma 3.1 Subspaces R(B) ⊆ V and R(BB∗) ⊆ V are
closed simultaneously. Moreover, if either of them is closed
we have R(BB∗) = R(B).
Proof. If BB∗v = 0 then (B∗v,B∗v)H = 0, i.e. B
∗v = 0,
which proves that N(BB∗) = N(B∗). Taking an orthogo-
nal complement to both parts givesR(BB∗) = R(B) as the
operator BB∗ is selfadjoint. Trivially, R(BB∗) ⊆ R(B).
If the range R(BB∗) is closed then R(B) = R(BB∗) =
R(BB∗) ⊆ R(B), but clearlyR(B) ⊆ R(B), which proves
closedness of R(B) = R(BB∗).
To prove the inverse statement, assuming that R(B)
is closed, we invoke the orthogonal decomposition argu-
ment 2 : H = R(B∗) ⊕ (R(B∗))⊥ = R(B∗) ⊕ N(B) since
R(B) and thus R(B∗) are closed. Multiplying this equal-
ity by B gives R(B) = BH = B(R(B∗) ⊕ N(B)) =
BR(B∗) = R(BB∗). ✷
2 This proof is suggested by an anonymous referee
We use the previous lemma to introduce (BB∗)−1 :
R(BB∗) → V/N(B∗) in the next Lemma 3.2. It is nec-
essary to use the factor-space V/N(B∗) here, since the
standard inverse (BB∗)−1 : R(BB∗) → V does not exist
if N(B∗) is nontrivial.
Lemma 3.2 Closedness of R(B) ⊆ V is equivalent
to boundedness of the operator (BB∗)−1 : R(BB∗) →
V/N(B∗).
Proof. By Lemma 3.1, closedness of R(B) ⊆ V is equiv-
alent to closedness of R(BB∗) ⊆ V. We use several well-
known statements on closed operators, e.g., [12], applied
to the operator BB∗, that we have already reviewed in the
second paragraph of this section for the operator B. The
operator BB∗ is bounded and has the closed domain V, so
the operator is closed and its (pseudo-)inverse (BB∗)−1 :
R(BB∗) → V/N(B∗) with the domain R(BB∗) ⊆ V is
also closed. The domain R(BB∗) ⊆ V of the closed opera-
tor B−1 : R(BB∗) → H/N(B) is closed if and only if the
operator is bounded. ✷
If R(B) is closed then, using Lemmas 3.1 and 3.2,
R(B) = R(BB∗) and we can derive the following useful
formula
P = B∗(BB∗)−1B : H → H. (7)
Indeed, we first note that R((BB∗)−1) ⊆ V/N(B∗) is
multiplied by B∗ in (7), so the product is independent
of the choice of a representant from the equivalence class
V/N(B∗) and, thus, is correctly defined. Second, righ-hand
side of (7) is a linear and bounded operator as a product of
linear and bounded operators. Moreover, it is an orthogonal
projector on H since it is selfadjoint and idempotent, and
has the null-space the same as the orthoprojector P has.
If the LBB condition is not satisfied, i.e. R(B) is not
closed, then the domain of definition of the operator
B∗(BB∗)−1B is the subspace R(B∗) ⊕ N(B), which is
not closed, and formula (7), where P is the orthogonal
projector on H with N(P ) = N(B), clearly does not hold.
Let us note that in the case of finite dimensional spaces
H and V the range R(B) is evidently closed, the opera-
tor (B∗)+ = (BB∗)+B is the well-known Moore–Penrose
pseudo inverse of B∗, and P = B∗(B∗)+ is the well known
formula for the orthogonal projector onto the range of B∗.
If σ is an exact solution of system (5), then u in (5) can be
found from the equationB∗u = −Aσ+g ∈ R(B∗). If σ is an
approximate solution of system (5) such that the condition
Aσ − g ∈ R(B∗), which is necessary and sufficient for the
existence of u, does not hold, then u can be computed from
the projected equation B∗u = P (−Aσ + g) ∈ P. Both the
original and the projected equations for u are wellposed by
the LBB assumption, i.e. R(B∗) = P and
‖u‖V/N(B∗) ≤
‖σ‖H +
‖g‖H.
Whether the LBB assumption is necessary for wellposed-
ness of the equation for u depends on if the set of all possible
right-hand sides g − Aσ gives the whole subspace R(B∗),
see [10]. For example, in a practically important case g = 0
we have B∗u = −Aσ = −PAσ ∈ R(PA) ⊆ R(P ). If the
latter inclusion is strict, it opens up an opportunity for a
weaker, compared to the original LBB, assumption of well-
posedness of the above equation for u.
In the present paper, however, we are concerned with
finding σ, not u. The LBB condition for the bilinear form
b appears to be of no importance for our results in the
next section where we analyze wellposedness of system (5)
with respect to the σ unknown only, assuming that the u
unknown is of no interest, or can be found for a given σ
using some postprocessing.
4. Coercivity conditions
4.1. The standard coercivity condition
We finally get to the main topic of the paper: an assump-
tion on A which is a condition of wellposedness of (5) with
respect to σ. For the reader’s convenience, we briefly repeat
the necessary notation and the system of equations for σ
to make this section self-consistent. LetH be a real Hilbert
space and P be an orthoprojector in H with a null-space
N(P ) = N and a rangeR(P ) ≡ P—we emphasize that the
range of any orthoprojector in a Hilbert space is closed. Let
A be a linear and bounded operator such that 0 ≤ A∗ = A
on H. The last two lines in system (6) represent an oper-
ator form of system (4); they do not involve the Lagrange
multiplier u or the operator B and determine the primary
unknown σ ∈ H:
P⊥(Aσ − g) = 0 in H,
P (σ − f) = 0 in H,
where g ∈ H and f ∈ H are given and P⊥ ≡ I−P.We can
also replace system (8) with the following equivalent single
equation:
P⊥A |N ψ = P⊥g − P⊥APf ∈ N, σ = ψ + Pf, (9)
where in (9) we take a restriction of the operator P⊥A on
its invariant closed subspace N, and we are looking for a
solutionψ ∈ N. Then the necessary and sufficient condition
of wellposedness of problem (9) for an arbitrary g ∈ H is,
clearly, that the range of P⊥A |N is N. This leads to the
traditional assumption, see [10], a(σ, σ) ≥ ca > 0, ∀σ ∈
N, ‖σ‖H = 1 or, in an operator form, A ≥ caI on N ⊆ H,
since A is selfadjoint nonnegative. Thus, this assumption
is also necessary and sufficient [9, 10] for wellposedness of
system (5) with respect to σ for an arbitrary g ∈ H. In
the rest of the section, we analyze the scenario, where A
is selfadjoint nonnegative on H, but may be degenerate
on N, so we impose necessary restrictions on g ∈ H, and
determine a generalized coercivity condition that covers the
case of the degeneracy.
4.2. Existence, uniqueness, and wellposedness
Before we investigate the existence and uniqueness of the
solution σ, we prove the following technical, but important,
lemma.
Lemma 4.1 Let P be an orthoprojector in H with a null-
space N(P ) = N and a range R(P ) ≡ P = N⊥, and A be
a linear and bounded operator such that 0 ≤ A∗ = A on H.
N(P⊥A) ∩N = N(A) ∩N, (10)
{N(P⊥A) ∩N}⊥ = R(A) +P, (11)
N(P⊥A) ∩N
⊕ P⊥R(A). (12)
Proof.We first verify (10). It follows fromN(P⊥A) ⊇ N(A)
that the right-hand side of (10) is included in the left-hand
side. To prove the reverse inclusion, let ϕ ∈ N and P⊥Aϕ =
0, then 0 = (P⊥Aϕ,ϕ) = (Aϕ,ϕ) = ‖A1/2ϕ‖2 (recall that
A ≥ 0). Then A1/2ϕ = 0 and Aϕ = 0. Therefore, equality
(10) holds.
Equality (11) follows from (10), by substituting N(A) =
F⊥ and R(A) = F in the well-known simple identity
F⊥ ∩ P⊥ = (F + P)⊥ and noting that (R(A) + P)⊥⊥ =
R(A) +P = R(A) +P by properties of the closure.
Finally, to obtain the second term in the orthogonal de-
composition (12) of N we see that by (11) {N(P⊥A) ∩
N}⊥ ∩N = R(A) +P ∩N; at the same time
R(A) +P ∩N= P⊥R(A) +P ∩N
P⊥R(A)⊕P
∩N = P⊥R(A),
which completes the proof of the lemma. ✷
We start with the solution uniqueness.
Lemma 4.2 Suppose that for some fixed g ∈ H and f ∈ H
there exists a solution σ of (8). Then it is unique provided
that N(A) ∩N = {0};otherwise, all possible solutions yield
the hyperplane σ+ {N(A)∩N} and there exists the unique
normal (with minimal norm in H) solution of (8) that can
be also defined as a common element of the above hyperplane
and the closed subspace R(A) +P, which is the set of all
normal solutions for all possible f and g.
Proof. All solutions of (8) with g = f = 0 constitute
the closed subspace N(P⊥A) ∩N(may be 0-dimensional),
which by (10) is the same as N(A) ∩ N. Hence, all solu-
tions of (8) with the given g and f, provided that there
exists at least one solution σ, constitute the hyperplane
σ + N(A) ∩ N. It is known that each closed hyperplane
in a Hilbert space has a unique element with the minimal
norm, i.e. the element that is orthogonal to the directing
closed subspace N(A)∩N of the hyperplane. The orthogo-
nal complement to the directing closed subspace is already
given by (11). ✷
In the rest of the subsection we use the following equation
equivalent to (9):
(P⊥A+ P )σ = P⊥g + Pf. (13)
The assumptions on the right-hand side of the system (8)
which ensure the existence of a solution are rather standard
and follow from (13) easily.
Lemma 4.3 For any f ∈ H there exists a solution of (8) if
and only if g ∈ R(A)+P, i.e. P⊥g+Pf ∈ P⊥R(A)+P =
R(A) +P.
Proof. The subspace (not necessarily closed) P⊥R(A) +P
is simply the range of the operator P⊥A + P of equation
(13). ✷
The subspaceR(A)+P that appears in Lemmas 4.2 and
4.3 plays the central role in the following necessary and
sufficient conditions of wellposedness.
Theorem 4.1 The following statements are equivalent:
(i) The subspace R(A) +P is closed.
(ii) The subspace AN+P is closed.
(iii) The subspace P⊥R(A) is closed.
(iv) The subspace P⊥AN is closed.
(v) Problem (13) with f ∈ H and g ∈ R(A) +P is well-
posed in the factor-space, ‖σ‖H/{N(A)∩N} ≤ c(‖g‖ +
‖f‖), or, equivalently, ‖σ‖ ≤ c(‖g‖ + ‖f‖) for the
normal solution σ ∈ R(A) +P.
Proof. (1)⇔(3) We have R (A)+ P= P⊥R(A)⊕P.
(1)⇔(2) The subspace P⊥R(A) ⊕ P = R(A) + P is the
range of the operator P⊥A + P. The range of a bounded
operator is closed if and only if the range of the conjugate
operator is closed.
(2)⇔(4) Using the same arguments as above, AN + P =
P⊥AN⊕P.
(1)⇔(5) The operator P⊥A + P is bounded and defined
everywhere on a Hilbert space, thus it is closed. Therefore,
the (pseudo)inverse operator
(P⊥A+ P )−1 : R(P⊥A+ P ) → H/{N(A) ∩N}
is closed. It is bounded if and only if its domain of definition
R(P⊥A + P ) is closed. A normal solution is a convenient
representant of a factor-class in a Hilbert space. ✷
4.3. Generalized coercivity conditions
Statements (1)–(4) in Theorem 4.1 may not be so easily
verifiable in practice, so we want to find a somewhat eas-
ier assumption that generalizes the standard coercivity as-
sumption A ≥ caI on N ⊆ H, which itself does not hold if
the operatorA vanishes on a nontrivial subspace ofN ⊆ H.
Let us return back to equation (9). We remind the reader
that the first equation in (8) is equivalent to the orthog-
onal expansion σ = ψ + Pf, where ψ = P⊥σ ∈ N. This
and the second equation in (8) lead to (9) that we present
here, introducing a special notation K = P⊥A |N, in the
equivalent form
Kψ = φ, ψ ∈ P⊥R(A), φ = P⊥g − P⊥APf ∈ P⊥R(A).(14)
under the assumption that g ∈ R(A) +P.
The operatorK is bounded, selfadjoint, and nonnegative
definite on N, where N ⊆ H inherits the scalar product
and the norm of H, so there exists a bounded, selfadjoint,
and nonnegative definite square root
K on N. Applying
the inf-sup condition to the operator
K on N, by direct
analogy with Lemmas 3.1 and 3.2 and their proofs, we have
that N
= N(K) and
Theorem 4.2 The following statements are equivalent:
(i) The subspace R
⊆ N is closed.
(ii) The subspace R(K) ⊆ N is closed.
(iii) The inf-sup condition for the operator
K on N
Kǫ, σ
‖ǫ‖N/N(K)‖σ‖N
= inf
‖ǫ‖N/N(K)
> 0 (15)
holds.
(iv) The norm of the operator K−1 : R(K) → N/N(K)
is equal to ρ <∞.
Moreover, under either of the assumptions we have
= R(K).
Noticing that R(K) = P⊥AN, we immediately see that
statements (4) in Theorem 4.1 and (2) in Theorem 4.2 are
the same, so all statements of Theorems 4.1 and 4.2 are
equivalent. Our last goals in this subsection are to present
statement (3) of Theorem 4.2 in original terms, so that it
resembles the coercivity condition, and to bound the norm
of the solution in terms of the norms of the right-hand sides,
using statement (4) of Theorem 4.2.
Theorem 4.3 For any g ∈ R(A) + P the following as-
sumption
A ≥ 1
I on the subspace P⊥R(A) (16)
with a (finite) constant ρ > 0 is necessary and sufficient for
the normal solution σ with P⊥σ ∈ P⊥R(A) to exist and
to be unique and continuous in f ∈ H and g ∈ R(A) + P.
Moreover, assumption (16) implies
‖σ‖2 ≤ ‖f‖2 + ρ2‖g −APf‖2. (17)
Proof. First, we note that inequality (16) on the subspace
P⊥R(A) is equivalent to the same inequality on its clo-
sure P⊥R(A) because of the continuity of A and the scalar
product. Second, as (ǫ,Kǫ) = (ǫ, P⊥Aǫ) = (ǫ, Aǫ) for all
ǫ ∈ P⊥R(A) ⊆ N, inequality (16) is also equivalent to
I on the closed subspace P⊥R(A). (18)
Now we show that (18) is equivalent to (15), which is
condition (3) of Theorem 4.2. For the numerator in (15),
we have
= (ǫ,Kǫ). To handle the denominator in
(15), we remind the reader the orthogonal decomposition
N(P⊥A) ∩N
⊕P⊥R(A) stated as (12) and proved
in Lemma 4.1. Splitting ǫ ∈ N according to this orthogo-
nal decomposition, we see that its first component—from
N(P⊥A) ∩ N = N(K)—vanishes both in the numerator,
since it is in the null-space of K, and in the denominator
of (15), by the definition of the factor-norm, which gives
(18), where only the second component—from P⊥R(A)—
survives.
We conclude that (16) is equivalent to (15), which is
condition (3) of Theorem 4.2, and thus, to all statements
of Theorems 4.1 and 4.2. Finally, if (16) holds then the
subspace R(P⊥A) is closed, the operator K : P⊥R(A) 7→
P⊥R(A) is an isomorphism and problem (14) is wellposed
for f ∈ H and g ∈ R(A) +P, i.e.
‖ψ‖ ≤ ρ‖P⊥g − P⊥APf‖ ≤ ρ‖g −APf‖ (19)
by Theorem 4.2. Estimate (17) follows from σ = ψ + Pf
and (19) due to the statement of Lemma 4.2 that the nor-
mal solution σ ∈ R(A) +P, that is, ψ ∈ P⊥R(A) =
P⊥R(A) +P = P⊥ ∩ (P⊥ ∩N(A))⊥ is the corresponding
part of the orthogonal expansion σ = ψ + Pf for the nor-
mal solution. ✷
4.4. Minimum gap between subspaces
The rest of the section concerns the case where the range
of A is closed, so assumption (16) can be equivalently re-
formulated using the minimum gap between some relevant
subspaces. We first find a simple way to check if the range
of A is closed.
Lemma 4.4 Condition
A ≥ 1
I on the subspaceR(A) ≡ D (20)
with a (finite) constant ρD > 0 is equivalent to closedness
of D.
Proof. The operator A is a linear, bounded, and every-
where defined. Thus, it is closed and its inverse A−1 : D →
H/N(A) is also closed. Boundedness of the inverse is equiv-
alent, on the one hand, to condition (20) and, on the other
hand, to closedness of D. ✷
Now we are ready to present a simplified version of the
necessary and sufficient condition of wellposedness (16),
assuming that the range of A is closed.
Theorem 4.4 Let the range R(A) ≡ D be closed, the or-
thoprojector onD be denoted byD, and the constant ρD > 0
be defined by (20). Then inequality (16) is equivalent to the
inequality
κ ≡ inf
ψ∈P⊥D
> 0. (21)
In particular, (20) and (21) lead to (16) with ρ = ρD/κ
Proof. We have R(P⊥A) = P⊥R(A) = P⊥D, i.e. the
subspaces indicated in (16) and (21) coincide. Now the
main assertion of the Lemma is a consequence of relations
(Dψ,Dψ) ≤ (Aψ,ψ) = (ADψ,Dψ) ≤ ‖A‖(Dψ,Dψ)
which hold for an arbitrary ψ ∈ H. ✷
The next two lemmas provide alternative assumptions,
equivalent to (21), which are necessary and sufficient for
wellposedness, assuming that the range of A is closed. It is
important to have a choice of a criterion that may be easier
to check in a practical application. For aesthetic reasons we
denote N ≡ P⊥.
Lemma 4.5 Let D and P be orthogonal projectors onto
closed subspaces D and P, and let D⊥ = I −D and P⊥ =
I − P be orthogonal projectors onto the orthogonal comple-
ments D⊥ and P⊥, respectively. The following statements
are equivalent:
(i) The subspace P⊥D is closed.
(ii) The subspace D+P is closed.
(iii) The subspace D⊥ +P⊥ is closed.
(iv) The subspace PD⊥ is closed.
Proof. (1)⇔(2) The subspace P⊥D is closed iff the sub-
space P⊥D⊕P = D+P is closed as the terms are orthog-
onal in the first expression.
(2)⇔(3) By Theorem IV-4.8 of [12], a sum of closed sub-
spaces in a Hilbert space is closed if and only if the sum of
their orthogonal complements is closed.
(3)⇔(4) Using the same arguments as above, P⊥ +D⊥ =
P⊥ ⊕ PD⊥. ✷
Lemma 4.6 Using the notation of Lemma 4.5, the follow-
ing equalities hold:
ψ∈P, ψ 6∈D
dist{ψ;D}
dist{ψ;D ∩P}
= inf
ψ∈D, ψ 6∈P
dist{ψ;P}
dist{ψ;P ∩D}
= inf
ψ∈D⊥, ψ 6∈P⊥
dist{ψ;P⊥}
dist{ψ;P⊥ ∩D⊥}
= inf
ψ∈P⊥, ψ 6∈D⊥
dist{ψ;D⊥}
dist{ψ;D⊥ ∩P⊥}
= inf
ψ∈P⊥D
= inf
ψ∈D⊥P
= inf
ψ∈PD⊥
‖D⊥ψ‖
= inf
ψ∈DP⊥
‖P⊥ψ‖
Moreover, each statement in the previous Lemma is equiv-
alent to the positiveness κ > 0 in (21).
Proof. The first three equalities are derived in Section IV-4
of [12] on the minimum gap between subspaces, along with
a statement that positiveness of the minimum gap between
two given subspaces is a necessary and sufficient condition
of the sum of the subspaces, in our case,D+P, to be closed.
We now prove that
ψ∈P⊥, ψ 6∈D⊥
dist{ψ;D⊥}
dist{ψ;D⊥ ∩P⊥}
= inf
ψ∈P⊥D\{0}
All other equalities can be then trivially derived from the
previous ones just by interchanging P and D.
We first notice that in the right-hand side we can apply
the inf to the closure P⊥D \ {0} as well, because a norm is
a continuous function,
ψ∈P⊥D\{0}
‖ψ‖ = infψ∈P⊥D\{0}
‖ψ‖ .
We have, P⊥D = P⊥ ∩ (P⊥ ∩D⊥)⊥ as N(DP⊥) = P ⊕
(P⊥ ∩D⊥). The latter can be checked directly.
We always have dist{ψ;D⊥} = ‖Dψ‖. If ψ ∈ P⊥D =
P⊥∩(P⊥∩D⊥)⊥ ⊆ (P⊥∩D⊥)⊥,we also have dist{ψ;D⊥∩
P⊥} = ‖ψ‖. Thus,
dist{ψ;D⊥}
dist{ψ;D⊥ ∩P⊥}
, ψ ∈ P⊥D \ {0}.
Finally, using the orthogonal representation P⊥ = (P⊥ ∩
D⊥)⊕P⊥D, everyϕ ∈ P⊥ can be written as the orthogonal
sum ϕ = (ϕ−ψ)⊕ψ, where ϕ−ψ ∈ P⊥ ∩D⊥, ψ ∈ P⊥D.
Then dist{ψ;D⊥} = dist{ϕ;D⊥} and also dist{ψ;D⊥ ∩
P⊥} = dist{ϕ;D⊥ ∩P⊥}; so the value of the ratio
dist{ψ;D⊥}
dist{ψ;D⊥ ∩P⊥}
dist{ϕ;D⊥}
dist{ϕ;D⊥ ∩P⊥}
does not depend on ϕ−ψ and its two infimum values, taken
with respect to ψ ∈ P⊥D \ {0} and ϕ ∈ P⊥, ϕ 6∈ D⊥,
coincide. ✷
Finally, we notice that g = 0 if we apply a saddle point
approach to diffusion or linear elasticity equations. Indeed,
in the Hellinger–Reissner formulation of nonhomogeneous
Lamé equations, our σ represents the stress tensor, the La-
grange multiplier u is the displacement, and if we also in-
troduce the stain ǫ by the stain-displacement relation ǫ =
−B∗u, then the first line in system (5) becomes Aσ − ǫ =
g, which is the constitutive equation (3-D Hooke’s law),
where of course g = 0. The second line in (5) is the equi-
librium equation, where all body and traction forces are
represented by f 6= 0. The assumption g = 0 allows us to
look for even weaker conditions of wellposedness that we
plan to investigate in the future.
Acknowledgments The author thanks Ivo Babuška and
Franco Brezzi for discussions. This work has been stimu-
lated by collaboration with Nikolai S. Bakhvalov. The au-
thor thanks an anonymous referee, who has made numer-
ous useful suggestions to improve the original version of
the paper, and CU-Denver students Donald McCuan and
Christopher Harder for proofreading the paper.
References
[1] D. N. Arnold and R. Winther. Mixed finite elements
for elasticity. Numer. Math., 92(3):401–419, 2002.
[2] I. Babuška and A. K. Aziz. Survey lectures on
the mathematical foundations of the finite element
method. In The mathematical foundations of the fi-
nite element method with applications to partial differ-
ential equations, pages 1–359. Academic Press, New
York, 1972. With the collaboration of G. Fix and R.
B. Kellogg.
[3] N. S. Bakhvalov and A. V. Knyazev. A new iterative
algorithm for solving problems of the fictitious flow
method for elliptic equations. Soviet Math. Doklady,
41(3):481–485, 1990.
[4] N. S. Bakhvalov and A. V. Knyazev. Fictitious domain
methods and computation of homogenized properties
of composites with a periodic structure of essentially
different components. In Gury I. Marchuk, editor,
Numerical Methods and Applications, pages 221–276.
CRC Press, Boca Raton, 1994.
[5] N. S. Bakhvalov and A. V. Knyazev. Preconditioned
iterative methods in a subspace for linear algebraic
equations with large jumps in the coefficients. In
D. Keyes and J. Xu, editors, Domain Decomposition
Methods in Science and Engineering, volume 180 of
Contemporary Mathematics, pages 157–162.American
Mathematical Society, Providence, 1994.
[6] N. S. Bakhvalov, A. V. Knyazev, and G. M. Kobel’kov.
Iterative methods for solving equations with highly
varying coefficients. In Roland Glowinski, Yuri A.
Kuznetsov, Gérard A. Meurant, Jacques Périaux, and
Olof Widlund, editors, Fourth International Sympo-
sium on Domain Decomposition Methods for Partial
Differential Equations, pages 197–205, Philadelphia,
PA, 1991. SIAM.
[7] N. S. Bakhvalov, A. V. Knyazev, and R. R.
Parashkevov. Extension theorems for Stokes and Lame
equations for nearly incompressible media and their
applications to numerical solution of problems with
highly discontinuous coefficients. Numerical Linear
Algebra with Applications, 9(2):115–139, 2002.
[8] M. Benzi, G. H. Golub, and J. Liesen. Numerical solu-
tion of saddle point problems. Acta Numer., 14:1–137,
2005.
[9] F. Brezzi. On the existence, uniqueness and approx-
imation of saddle point problems arising from La-
grangian multipliers. RAIRO Anal. Numer., 2:129–
151, 1974.
[10] F. Brezzi and M. Fortin. Mixed and Hybrid Finite
Element Methods. Springer–Verlag, New York, 1991.
[11] P. Ciarlet, Jr., J. Huang, and J. Zou. Some observa-
tions on generalized saddle-point problems. SIAM J.
Matrix Anal. Appl., 25(1):224–236, 2003.
[12] T. Kato. Perturbation Theory for Linear Operators.
Springer–Verlag, New–York, 1976.
[13] A. V. Knyazev. Analysis of transmission problems on
Lipschitz boundaries in stronger norms. J. Numer.
Math., 11(3):225–234, 2003.
[14] A. V. Knyazev. Iterative solution of PDEwith strongly
varying coefficients: algebraic version. In R. Beauwens
and P. de Groen, editors, Iterative Methods in Linear
Algebra, pages 85–89, Amsterdam, 1992. Elsevier.
[15] A. V. Knyazev and O. Widlund. Lavrentiev regular-
ization + Ritz approximation= uniform finite element
error estimates for differential equations with rough
coefficients. Mathematics of Computation, 72:17–40,
2003.
[16] O. A. Ladyzhenskaya. The boundary value problems
of mathematical physics, volume 49 of Applied Mathe-
matical Sciences. Springer-Verlag, New York, 1985.
[17] J. Xu and L. Zikatanov. Some observations onBabuška
and Brezzi theories. Numer. Math., 94(1):195–202,
2003.
	Introduction
	Abstract symmetric saddle point problems
	Formulations using bilinear forms
	Operator-based formulations
	Inf-sup or LBB condition
	Coercivity conditions
	The standard coercivity condition
	Existence, uniqueness, and wellposedness
	Generalized coercivity conditions
	Minimum gap between subspaces
ABSTRACT
  We investigate degenerate saddle point problems, which can be viewed as limit
cases of standard mixed formulations of symmetric problems with large jumps in
coefficients. We prove that they are well-posed in a standard norm despite the
degeneracy. By wellposedness we mean a stable dependence of the solution on the
right-hand side. A known approach of splitting the saddle point problem into
separate equations for the primary unknown and for the Lagrange multiplier is
used. We revisit the traditional Ladygenskaya--Babu\v{s}ka--Brezzi (LBB) or
inf--sup condition as well as the standard coercivity condition, and analyze
how they are affected by the degeneracy of the corresponding bilinear forms. We
suggest and discuss generalized conditions that cover the degenerate case. The
LBB or inf--sup condition is necessary and sufficient for wellposedness of the
problem with respect to the Lagrange multiplier under some assumptions. The
generalized coercivity condition is necessary and sufficient for wellposedness
of the problem with respect to the primary unknown under some other
assumptions. We connect the generalized coercivity condition to the
positiveness of the minimum gap of relevant subspaces, and propose several
equivalent expressions for the minimum gap. Our results provide a foundation
for research on uniform wellposedness of mixed formulations of symmetric
problems with large jumps in coefficients in a standard norm, independent of
the jumps. Such problems appear, e.g., in numerical simulations of composite
materials made of components with contrasting properties.

<|endoftext|><|startoftext|>
EXPECTED PLANETS IN GLOBULAR CLUSTERS
Noam Soker1 & Alon Hershenhorn 1
ABSTRACT
We argue that all transient searches for planets in globular clusters have
a very low detection probability. Planets of low metallicity stars typically do
not reside at small orbital separations. The dependance of planetary system
properties on metallicity is clearly seen when the quantity Ie ≡ Mp[a(1 − e)]
is considered; Mp, a, e, are the planet mass, semi-major axis, and eccentricity,
respectively. In high metallicity systems there is a concentration of systems at
high and low values of Ie, with a low-populated gap near Ie ∼ 0.3MJ AU
2, where
MJ is Jupiter’s mass. In low metallicity systems the concentration is only at the
higher range of Ie, with a tail to low values of Ie. Therefore, it is still possible
that planets exist around main sequence stars in globular clusters, although at
small numbers because of the low metallicity, and at orbital periods of & 10 days.
We discuss the implications of our conclusions on the role that companions can
play in the evolution of their parent stars in globular clusters, e.g., influencing
the distribution of horizontal branch stars on the Hertzsprung-Russell diagram
of some globular clusters, and in forming low mass white dwarfs.
Subject headings: stars: horizontal-branch globular clusters: general - stars:
rotation - planets
1. INTRODUCTION
Evolved sun-like stars that burn helium in their cores occupy the horizontal branch
(HB) in the Hertzsprung-Russel (HR) diagram. HB stars that have low metallicity and/or
low envelope mass are blue, and are termed blue HB (BHB) stars in globular clusters (GCs),
and sdB or sdO (sdOB together; termed also extreme HB stars or EHB) in the field (not
in GCs). There are strong indications that many of the sdOB stars in the field are in close
binary systems (e.g., Maxted et al. 2001; Maxted 2004), and there is a strong support to
1Department of Physics, Technion−Israel Institute of Technology, Haifa 32000 Israel;
soker@physics.technion.ac.il
http://arxiv.org/abs/0704.1067v2
– 2 –
the idea that the sdOB phenomena is caused by binary companions (e.g., Han et al. 2003;
Maxted 2004). The large fraction of stars in the field that are likely to have planets around
them (Lineweaver & Grether 2003) hint that planets can also lead to the formation of sdOB
stars (Soker 1998). We note the recent tentative claim for a planet orbiting an sdB star at
an orbital separation of a = 1.7 AU (Silvotti et al. 2007), that might hint that closer planets
existed in the system before the progenitor turned into a red giant star.
The distributions of stars on the HB (the HB morphology; also referred to as the color-
magnitude diagram [CMD]) differ substantially from one GC to another. It has long been
known that metallicity is the main, but not sole, factor which determines the HB morphology
(for an historical review see, e.g., Fusi Pecci & Bellazzini 1997). The other factor (or factors)
which determines the HB morphology is termed the ‘second parameter’. There is a debate on
what are the main processes influencing the second parameter (Catelan 2007 and references
therein), with a binary interaction being one of these processes. In the low mass binary
second parameter model the companion is very light (a very low mass main sequence [MS]
star, a brown dwarf, or a massive planet), and in most cases will be destroyed as it falls
deeper into the envelope (Soker 1998; Soker & Harpaz 2007). Therefore, the non-detection
of companions to HB stars in GCs (Moni Bidin et al. 2006) is not in contradiction with the
model.
Although close brown dwarf companions exist (e.g., Zucker & Mazeh 2000), and are
included in the low mass binary second parameter models, they are rare (e.g., Mazeh et al.
2003; Grether & Lineweaver 2006), and so we concentrate on planets and low mass MS stars.
More problematic to the binary second parameter model might seem to be the null
detection of planets to MS stars in GCs (Weldrake et al. 2007b). Weldrake et al. (2007b)
looked for transits in the GCs ω Cen and 47 Tuc, and did not find any close planets, but
did find stellar companions (Weldrake et al. 2007a). The null detection of planets in 47 Tuc
(Gilliland et al. 2000) does not contradict the binary model (Soker & Hadar 2001). The
reason is that this GC has only few stars on its blue HB. If there were many planets in this
GC, then they would be swallowed by the red giant branch (RGB) star progenitor of the
HB star, causing high mass loss rate (Livio & Soker 2002) and the formation of many BHB
stars.
The null detection of planets in the GCs ω Cen (Weldrake et al. 2007b) can be com-
patible with the planet binary model for the second parameter if the planets don’t reside
in close orbits. Namely, orbital periods & 10 days. For that, Soker & Harpaz (2007)
predicted-conjectured that planet companions might exist in GCs, but at orbital separations
of 0.3 AU . a . 3 AU. Planetary systems in the upper orbital separation range will be
destroyed in GCs, but those in the lower range can survive (Fregeau et al. 2006).
– 3 –
Soker & Harpaz (2007) only based their prediction on the requirement of their model,
but did not bring any observations to support this conjecture. Our main goal is to further
explore the binary model for the second parameter, and in particular to examine the possible
role of planets. For that, we turn to examine what can be learned from the wealth of data
acquired in the field of exoplants. We use recent results from the field of exoplants to support
the claim that if planets exist in GCs, they are at larger orbital separations than planets
around stars close to the sun. Unlike planets, low mass MS companions can reside close to
the parent star in GCs (Weldrake et al. 2007a), and be much more significant in influencing
the evolution of their parent star. Namely, the low mass binary second parameter model in
low metallicity GCs must be based mainly on low mass main sequence stellar companions,
but with contribution of massive planets as well.
2. PLANET COMPANIONS
To support the conjecture that if planets exist in GCs they don’t reside at small orbital
separations we present several correlations between properties of known extra solar planets.
We start by presenting the well known distribution of planets by their orbital period P
(Figure 1), but motivated by the recent results of Grether & Lineweaver (2007) we do
so separately for three ranges of metallicity of the parent stars: [Fe/H] < −0.1 (black),
−0.1 ≤ [Fe/H] ≤ 0.1 (gray), and 0.1 < [Fe/H] (white). Here and in the rest of the diagrams
in the paper, each bin shows the number of planets with a period (or other relevant quantities)
greater than the number to the left of the bin and smaller than the number to the right of the
bin. The leftmost bin shows the number of planets with a period smaller than the number
to the right of the bin. The rightmost bin shows the number of planets with a period greater
than the number to the left of the bin. All planets data used here are from the Extrasolar
Planets Encyclopaedia maintained by Schneider (2007, and references therein), as of June
1, 2007. We skip the comparison of the planet hosting star metallicity distribution with that
of other field stars (see, e.g., Santos et al. 2001).
There is a population gap in this histogram, i.e., the planets are concentrated in two
regions. The gap exists only for the high and medium metallicity systems, as marked by
the two thick horizontal arrows in Figure 1. The long period region is limited from above
by selection effects. This gap is well known, e.g., Udry et al. (2003) noted the shortage of
planets in the range 0.06 AU ≤ a ≤ 0.6 AU.
We seek a better quantity to distinguish between close and wide planets, and between
the metallicity ranges. For that, and motivated by known correlations between planets’ mass
and period (e.g., Mazeh et al. 2005), we plot in Figure 2 planets in the Mp − a plane, where
– 4 –
Log[ Period/Day ]
Period/Day
1 2 4 7 10 20 40 70 10
           [Fe/H] < -0.1
 -0.1    [Fe/H]    0.1
  0.1 < [Fe/H]
Period
Fig. 1.— Histogram of the number of planets as a function of the orbital period P in days,
for three ranges of metallicity as indicated. Each bin shows the number of planets with a
period greater than the number to the left of the bin and smaller than the number to the
right of the bin. The leftmost bin shows the number of planets with a period smaller than
the number to the right of the bin. The rightmost bin shows the the number of planets with
a period greater than the number to the left of the bin. The horizontal thick arrows mark
the gaps in the respective two high metallicity ranges. Data from the Extrasolar Planets
Encyclopaedia maintained by Schneider (2007, June 1, and references therein).
– 5 –
Mp is the minimum planet mass, used here in units of Jupiter mass MJ , and a is the orbital
semi-major axis, used here in units of AU. Filled and empty circles are for systems where
the host star metallicity is below and above solar metallicity, respectively. We note that
there is a morphological structure along a few lines of
α = constant. (1)
Two lines are marked by their α value on Figure 2. These lines bound a low-populated
stripe between them, and emphasize two populated clumps: one consists of planets having
high masses and large orbital separations, and the other consists of planets having low masses
and small orbital separations. Based on this, we take α = 2 as our standard value to further
analyze the correlations.
In Figures 3-9 we show the distribution of the entire sample as a function of the following
quantities: Mpa
2, Mpa
Ie ≡ Mp[a(1− e)]
2, (2)
[a(1 − e)]2, Mp[a(1 − e)]
3, [a(1 − e)]2/Mp, and [a(1 + e)]
2/Mp, respectively. Planets with
unknown eccentricity were calculated with e = 0. The quantity Ie has the dimension of
moment of inertia, and might therefore indicate the importance of some kind of interaction
between the planet and the parent star, as will be discussed in section 4.3. As the strongest
interaction occurs near periastron, the relevant distance is a(1− e) rather than a alone.
From Figures 1-9 we learn the following.
1. As is well known (e.g., Udry et al. 2003; Marcy et al. 2005) there is a concentration of
planets at very short periods of days. Then there is a low-populated range, the gap,
and a rise to a second grouping of planets at hundreds to thousands of days. The
gap exists only at the two higher metallicity ranges, as marked on Figure 1 by the
horizontal thick arrows.
2. At lower metallicities planets tend to have longer orbital periods. The ratio of planets
with P > 100 day to planets with P < 100 day is 28/16 = 1.75 and 31/17 = 1.8 for
[Fe/H] < −0.1 and −0.1 ≤ [Fe/H] ≤ 0.1, respectively, while it is only 64/53 = 1.2 for
0.1 < [Fe/H]. This trend was found also by Marchi (2007) in the C2 and C3 sub
samples defined there. This trend will have to be checked with much larger samples in
the future.
3. We find that the double-peak distribution of planets at high metallicity is more pro-
nounced when instead of the period other quantities are used, e.g., Mpa
2 or [a(1 −
e)]2/Mp, or [a(1 + e)]
2/Mp, and even more so when the quantity Ie = Mp[a(1 − e)]
used.
– 6 –
Log[ a/AU ]
-2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0
1 2 4 7 10
[Fe/H] < 0
[Fe/H] > 0
1.7α =
2.36α =
Fig. 2.— The distribution of all known planets (from The Extrasolar Planets Encyclopaedia)
in the minimum mass (in MJ)−semi-major axis (in AU) plane. Filled and empty circles are
for host stars metallicity below and above solar, respectively. Two lines are drawn to show
morphological features in the distribution, with the value of α marked (eq. 1). The
morphological feature we refer to is a low-populated stripe between the two lines, and two
populated clumps, one above the upper line and one below the lower line.
– 7 –
Log[ (M
p/MJ)(a/AU)
.5 0.
           [Fe/H] < -0.1
 -0.1    [Fe/H]    0.1
  0.1 < [Fe/H]
Fig. 3.— Histogram of the number of planets as a function of Mpa
2 (in MJ AU
2) for the
three metallicity ranges.
– 8 –
Log[ (M
p/MJ)(a/AU)
1/2  ]
           [Fe/H] < -0.1
 -0.1    [Fe/H]    0.1
  0.1 < [Fe/H]
Fig. 4.— Histogram of the number of planets as a function of Mpa
1/2 (in MJ AU
1/2) for
the three metallicity ranges.
– 9 –
Log[ (Mp/MJ)([a/AU][1-e])
.5 0.
           [Fe/H] < -0.1
 -0.1    [Fe/H]    0.1
  0.1 < [Fe/H]
( )[ ]
M a e−
Fig. 5.— Histogram of the number of planets as a function of Ie = Mp[a(1−e)]
2 (inMJ AU
for the three metallicity ranges.
– 10 –
Log[ ([a/AU][1-e])
.5 0.
           [Fe/H] < -0.1
 -0.1    [Fe/H]    0.1
  0.1 < [Fe/H]
( )[ ]
1a e−
Fig. 6.— Histogram of the number of planets as a function of [a(1 − e)]2 (in AU2) for the
three metallicity ranges.
– 11 –
Log[ (Mp/MJ)([a/AU][1-e])
.5 0.
           [Fe/H] < -0.1
 -0.1    [Fe/H]    0.1
  0.1 < [Fe/H]
( )[ ]
M a e−
Fig. 7.— Histogram of the number of planets as a function of Mp[a(1 − e)]
3 (in MJ AU
for the three metallicity ranges.
– 12 –
Log[ ([a/AU][1-e])2/(Mp/MJ) ]
           [Fe/H] < -0.1
 -0.1    [Fe/H]    0.1
  0.1 < [Fe/H]
( )[ ]
a e M−
Fig. 8.— Histogram of the number of planets as a function of [a(1− e)]2/Mp (in AU
2 M−1
for the three metallicity ranges.
– 13 –
Log[ ([a/AU][1+e])2/(Mp/MJ) ]
           [Fe/H] < -0.1
 -0.1    [Fe/H]    0.1
  0.1 < [Fe/H]
≤ ≤( )[ ]
a e M+
Fig. 9.— Histogram of the number of planets as a function of [a(1 + e)]2/Mp (in AU
2 M−1
for the three metallicity ranges.
– 14 –
4. From the different parameters we have tried, Ie = Mp[a(1 − e)]
2 shows most clearly
the two groups of planets at high metallicity, and the differences between planets with
different metallicity of their parent star. The criteria we use to prefer the parameter Ie
are (a) a smooth variation in the peaks of the histogram, i.e., small fluctuations in peaks
(e.g., the peaks of the graph of [a(1 − e)]2/Mp in Fig. 8 have larger fluctuations than
the peaks of Ie); (b) a sharp jump between the group of planets with large separations
and the deep gap for the two high metallicity ranges (e.g., for Mpa
2, presented in
Fig. 3, the difference between the peaks and gap is not as sharp as for Ie presented
in Fig. 5); (c) A wide gap (e.g., for Mpa
1/2 presented in Fig. 4 the gap is very
narrow); and (d) A clear different behavior of the low metallicity range and the two
high metallicity ranges. In particular, when using Ie such a jump does not exist for
the low metallicity range. For the highest metallicity range used here there is a large
jump at Ie ≃ 0.3MJ AU
2 (log Ie = −0.5), which clearly separates two Ie-populations
of planets. While in high metallicity systems there are two well defined populations of
planets, in low metallicity systems there is only one peak: Planets of low metallicity
stars have typically larger orbital separations. Because Mp is the minimum mass, in
the histogram showing [a(1 − e)]2/Mp (Fig. 8) of the real distribution, systems will
move to the left smearing the peak of planets in the lowest metallicity range. In the
histogram using Ie (Fig. 5), on the other hand, using the real mass will move systems
to the right. This might make the peak on the right for the lowest metallicity range
more pronounced.
5. We have tried to use the same quantities listed above with 1 + e instead of 1− e. For
most cases the separation between close and wide planets is worse than when 1 − e is
used. This implies that the periastron is physically more influential than the apastron
for defining close and wide planets. However, for the quantity [a(1 + e)]2/Mp shown in
figure 9 a partition to two groups is evident. Still, we regard Ie to be the best indicator
of the two planet populations.
Burkert & Ida (2007) find that a gap in the semimajor axis distribution does not exist
if only a sample of systems with hosting stellar mass of M < 1.2M⊙ is used. We find
that when using only hosting stars with masses of M < 1.2M⊙ a gap does exist in the Ie
distribution for the high metallicity range. There is a gap in the medium metallicity range
as well. We also find that ∼ 70% of the hosting stars with masses of M > 1.2M⊙ are
in our high metallicity range. The planets in lower metallicity systems with stellar mass
of M > 1.2M⊙ tend to have larger values of Ie than planets in systems with stellar mass
of M < 1.2M⊙. We therefore propose that both stellar mass (Burkert & Ida 2007) and
metallicity are fundamental quantities influencing the distribution of planets around stars.
– 15 –
Burkert & Ida (2007) also find that the gap in the high hosting stellar mass sample
exists for planets with Mp sin i > 0.8M⊙, but not for M sin i ≤ 0.8M⊙. The dependance
on the planet mass is included together with the semimajor axis and eccentricity in Ie. We
find that the Ie distribution does not have a gap for systems with both Mp > 0.8MJ and
M > 1.2M⊙, unlike the semimajor axis distribution found by Burkert & Ida (2007). We do
see the gap in the Ie distribution of systems with Mp > 0.8MJ and in the Ie distribution of
systems with M < 1.2M⊙.
It is not clear if stars in GCs have planets at all, and in particular close-in planets that
could be found with the transit technique, as the fraction of detected planetary systems
decreases sharply with decreasing metallicity (e.g., Santos et al. 2001; Fischer & Valenti
2005; Grether & Lineweaver 2007). According to Grether & Lineweaver (2007) the most
probable value of this fraction for the metallicity range appropriate for GCs is . 0.1%,
although the uncertainty is large, and values of ∼ 1% are still possible. Contrary to the
general trend is the low metallicity of M-dwarfs (low mass MS stars) hosting planets (Bean
et al. 2006), which still leaves hope for planetary systems in GCs. If some planets do exist
in GCs, the implications of the results presented here are clear: the planets will not be on
short orbital periods. Therefore, we conclude, all transient searches for planets in GCs have
a very low detection probability. (At least in the statistical sense, as rare close planets might
exist.)
However, planets might be detected in metal-rich clusters. The open cluster NGC 6791
has [Fe/H]∼ 0.4, and has a large population of EHB stars and low mass WDs (Kalirai et
al. 2007), both of which were formed by stars having increased mass loss on the RGB.
Stellar companions are not likely to cause this increased mass loss (Kalirai et la. 2007). We
propose that the increased mass loss on the RGB in this cluster is partially caused by planets
swallowed by the RGB progenitors of EHB stars and low mass WDs. We therefore predict
that many transient events can be detected in this cluster.
3. LOW MASS MAIN SEQUENCE COMPANIONS
A similar analysis was conducted as in the previous section but for stellar companions
based on the same sample of 135 systems analyzed by Grether & Lineweaver (2006). We
present two histograms of the period of the secondary. In figure 10 the sample was divided
into three ranges of metallicity as in the previous section. In figure 11 we follow the division
of Grether & Lineweaver (2006) and divide the sample into two ranges of color of the parent
star as marked on the figure.
– 16 –
Log[Period/Day]
Period/Day
1 2 4 7 10 20 40 70 10
           [Fe/H] < -0.1
 -0.1    [Fe/H]    0.1
  0.1 < [Fe/H]
≤≤Period
Fig. 10.— Histogram of the number of stellar companions as a function of the orbital period
P in days, for three ranges of metallicity as indicated. Data from Grether & Lineweaver
(2006, 2007).
– 17 –
Log[Period/Day]
Period/Day
1 2 4 7 10 20 40 70 10
           B-V < 0.75
0.75 < B-V
Period
Fig. 11.— Histogram of the number of Stellar companions as a function of the orbital period
P in days, for two ranges of color as indicated.
– 18 –
From figures 10 and 11 we can deduce the following:
1. As with planets, there are two concentrations of stellar companions: at short and long
orbital periods.
2. Contrary to the behavior of planets, this double-distribution is more pronounced in the
low and medium metallicity ranges, with longer orbital periods at higher metallicities.
The ratio of stellar companions with P > 100 day to P < 100 day is 27/49 = 0.55,
15/27 = 0.56 and 10/7 = 1.43 for [Fe/H] < −0.1, −0.1 ≤ [Fe/H] ≤ 0.1 and 0.1 <
[Fe/H], respectively.
3. Redder systems tend to have a slightly shorter orbital periods. The ratio of stellar
companions with P > 100 to P < 100 day is 34/50 = 0.68 and 18/33 = 0.55 for
B− V < 0.75 and B− V > 0.75, respectively.
4. The width of the gap between long and short orbital periods is ∼ 70 (from ∼ 30
to ∼ 100 day) and ∼ 460 day (from ∼ 100 to ∼ 550 day) for [Fe/H] < −0.1 and
−0.1 ≤ [Fe/H] ≤ 0.1, respectively. The center of the gap is located at P ≈ 56 and
P ≈ 247 day for [Fe/H] < −0.1 and −0.1 ≤ [Fe/H] ≤ 0.1, respectively.
We have tried to use a finer classification of metallicity and color ranges, namely
[Fe/H] < −0.3, −0.3 < [Fe/H] < −0.1, −0.1 ≤ [Fe/H] ≤ 0.1, and 0.1 < [Fe/H], for metal-
licity, and B− V < 0.6, 0.6 < B− V < 0.7, 0.7 < B− V < 0.8, and 0.8 < B−V for the
color (data not shown). These did not add new information. We find that the ratio between
the number of systems with P < 100 days to systems with P > 100 days is 49/27 = 1.8,
33/15 = 2.2, and 28/10 = 2.8, for [Fe/H] < −0.1, [Fe/H] < −0.3, and [Fe/H] < −0.4,
respectively. This shows that the tendency of low metallicity systems to harbor short period
companions is robust.
Although we find that low metallicity stars tend to be slightly closer (shorter orbital
periods) than higher metallicity systems, this does not automatically imply the same trend
to low mass systems. Maxted & Jeffries (2005) find that a large fraction of very low mass
stars seem to be in binary systems, but not very close ones. In addition, a large fraction
of stellar companions to low mass stars can have very low mass M2 < 0.3M⊙ (Mazeh et al.
2003), and low mass stellar companions tend to be at large orbital separation (Grether &
Lineweaver 2007).
– 19 –
4. DISCUSSION AND SUMMARY
4.1. Main Results
As is well known and can be seen in figures 1-9, there are two regions highly-populated
with planets, with a low-populated gap between them. What we have found here (sections
2 and 3) is the following.
1. In planets the partition to two groups is more significant for high metallicity systems.
2. From the different quantities we have tried, the quantity Ie ≡ Mp[a(1 − e)]
2 both
distinguishes between high and low metallicity systems, and shows best this partition
to two planet populations in the high metallicity range.
3. In high metallicity systems planets tend to reside on an average closer orbital periods.
We note that these two properties depend also on the hosting stellar mass (Burkert &
Ida 2007).
4. This trend for metallicity dependance is opposite for stellar companions (section 3).
5. This trend for stellar companions is mainly due to metallicity and not to the parent
star’s mass. There is only a small difference in the ratio of stellar companions with
P > 100 to P < 100 day for the two color ranges used.
We note that there are other properties for which stars with planetary systems and
with stellar companions show opposite behavior. The most important is the finding that as
metallicity decreases the star is much more likely to have a stellar companion than to harbor
a planetary system (Grether & Lineweaver 2007).
Our results have implications for two areas.
4.2. Implications for Globular Clusters
Since in galactic GCs the metallicity is very low, close-in planets are not expected
there. However, if planets do exist around some stars in GCs, they will most likely have
large orbital separations. Therefore, the transit search for planets in GCs has a very low
detection probability, and the non-detection of close-in planets should not be considered as
evidence against the presence of planets in GCs. We should stay open to the possibility that
wide (large orbital periods) planetary systems exist in GCs.
– 20 –
The large fraction of low metallicity stars that have stellar companions (Grether &
Lineweaver 2007) implies that a large fraction of stars in GCs might have formed with
stellar companions around them. It seems that binary systems are indeed common in GCs
(Leigh et al. 2007).
Not only metallicity, but other properties at the formation epoch of globular clusters
(Soker & Hadar 2001) might determine the presence of companions and planets. For example,
density in star forming regions can determine stellar rotation (Wolff et al. 2007).
Put together, our results support the low mass binary second parameter model, but
the companions in low metallicity star clusters are more likely to be very low mass stellar
companions, as observed by, e.g., Weldrake et al. (2007a), rather than massive planets. In
that model a low mass companion (a very low mass main sequence star, a brown dwarf,
or a massive planet) influences the post-main sequence evolution of stars, in particular the
properties of the parent stars on the horizontal branch. (Soker 1998; Soker & Harpaz 2007).
In high metallicity clusters, such as NGC 6791 (Kalirai et al. 2007), planets might be more
important than stellar companions in forming EHB stars and low mass (undermassive) white
dwarfs. We predict that many transient events can be detected in this cluster.
4.3. Implications for Planetary Systems
This topic is beyond our scope. However, our findings suggest that the migration of
planets from large to small orbital separation depends on a combination of parameters ex-
pressed by Ie (eq. 2). This quantity has the dimension of moment of inertia, and may imply
that a process reminiscent of the Darwin instability (e.g., Eggleton 2006) is in operation. In
particular, the strong dependance on eccentricity, in that 1− e is a much better factor than
1 + e in showing the two planet populations, suggests that some sort of tidal interaction is
operating.
Although the situation cannot be simple, let us try the following. The left limit of the
right populated area in figure 5 is Ie ≃ 0.3. Let us substitute this in the condition for the
Darwin instability to occur (Eggleton 2006, sec. 4.2)
> λcrit, (3)
where M1 is the stellar mass, R1 is the stellar radius, kR1 the radius of gyration with
k2 ≃ 0.05 − 0.1 for main sequence stars, Ω is the stellar angular velocity, and ω = 2π/P is
the mean orbital angular velocity. The critical value λcrit rapidly decreases with increasing
eccentricity, with λcrit = 1/3, 0.171, 0.102, and 0.052, for e = 0, 0.3, 0.4, and 0.5, respectively.
– 21 –
As most planets have e . 0.4, in this range we see that
λcrit ≃
(1− e)2 for e . 0.5. (4)
Substituting approximation (4) in equation (3), and taking M1 = 1M⊙, k
2 = 0.075, and
Ω ≃ ω, the condition for the Darwin instability to occur becomes
R1 & 8
0.3MJ AU
R⊙ (5)
This explanation requires that the planets interact with an inflated pre-main sequence star.
This possibility will be studied in a future paper.
We thank Charles Lineweaver and Daniel Grether for giving us their data on stellar
companions. We thank David Weldrake for useful comments, and an anonymous referee for
comments that improved the presentation of our results. This research was supported by a
grant from the Asher Space Research Institute at the Technion.
REFERENCES
Bean, J. L., Benedict, G. F., & Endl, M. 2006, ApJ, 653, L65
Burkert, A. & Ida, S. 2007, ApJ, 660, 845
Catelan, M. 2007, in Resolved Stellar Populations, eds. D. Valls-Gabaud & M. Chavez (ASP
Conf. Ser., in press) (astro-ph/0507464)
Eggleton, P. 2006, Evolution Processes in Binary and Multiple Stars, Cambridge university
Press (Cambridge, UK).
Fischer, D. A., & Valenti, J. 2005, ApJ, 622, 1102
Fregeau, J. M., Chatterjee, S., & Rasio, F. A. 2006, ApJ, 640, 1086
Fusi Pecci, F. & Bellazzini, M. 1997, in The Third Conference on Faint Blue Stars, ed. A.
G. D. Philip, J. Liebert, R. Saffer, and D. S. Hayes, Published by L. Davis Press, p.
255 (astro-ph/9701026).
Gilliland, R. L., et al. 2000, ApJ, 545, L47
Grether, D. & Lineweaver, C. H. 2006, ApJ, 640, 1051
Grether, D. & Lineweaver, C. H. 2007, (astro-ph/0612172)
Han, Z., Podsiadlowski, Ph., Maxted, P. F. L., & Marsh, T. R. 2003, MNRAS, 341, 669
http://arxiv.org/abs/astro-ph/0507464
http://arxiv.org/abs/astro-ph/9701026
http://arxiv.org/abs/astro-ph/0612172
– 22 –
Kalirai, J. S., Bergeron, P., Hansen, B. M. S., Kelson, D. D., Reitzel, D. B., Rich, R. M., &
Richer, H. B. (astro-ph/arXiv:0705.0977)
Leigh, N., Sills, A., & Knigge, C. (astro-ph/0702349)
Lineweaver, C. H., & Grether, D. 2003, ApJ, 598, 1350
Livio, M., & Soker, N. 2002, ApJ, 571, L161
Marchi, S. 2007, ApJ, in press (arXiv:0705.0910)
Marcy, G., Butler, R. P., Fischer, D., Vogt, S., Wright, J. T., Tinney, C. G., & Jones, H. R.
A. 2005, PThPS, 158, 24
Maxted, P. 2004, A&G, 45, e24
Maxted, P. f. L., Heber, U., Marsh, T. R., & North, R. C. 2001, MNRAS, 326, 1391
Maxted, P. F. L., & Jeffries, R. D. 2005, MNRAS, 362, L45
Mazeh, T., Simon, M., Prato, L., Markus, B., & Zucker, S. 2003, ApJ, 599, 1344
Mazeh, T., Zucker, S., Pont, F. 2005, MNRAS, 356, 955
Moni Bidin, C., Moehler, S., Piotto, G., Recio-Blanco, A., Momany, Y., & Mendez, R. A.
2006, A&A, 451, 499
Santos, N. C., Israelian, G., Mayor, M. 2001, A&A, 373, 1019
Schneider, J. 2007, The Extrasolar Planets Encyclopaedia (http://exoplanet.eu/).
Silvotti, R. et al. ASP conf. series (astro-ph/0703753)
Soker, N. 1998, AJ, 116, 1308
Soker, N., & Harpaz, A. 2007, ApJ, 660, 699
Soker, N., & Hadar, R. 2001, MNRAS, 324,213
Udry, S., Mayor, M., & Santos, N. C. 2003, A&A, 407, 369
Weldrake, D. T. F., Sackett, P. D., & Bridges, T. J. 2007a, AJ, 133, 1447
Weldrake, D. T. F., Sackett, P. D., & Bridges, T. J. 2007b, in Transiting Extrasolar Planets
Workshop, Eds: C. Afonso, D. Weldrake & T. Henning PAS proceedings, in press
(astro-ph/0612215)
Wolff, S. C., Strom, S. E., Dror, D. & Venn, K. 2007, AJ, 133, 1092
Zucker, S., & Mazeh, T. 2000, ApJ, 531, L67
This preprint was prepared with the AAS LATEX macros v5.2.
http://arxiv.org/abs/astro-ph/0702349
http://arxiv.org/abs/0705.0910
http://exoplanet.eu/
http://arxiv.org/abs/astro-ph/0703753
http://arxiv.org/abs/astro-ph/0612215
	INTRODUCTION
	PLANET COMPANIONS
	LOW MASS MAIN SEQUENCE COMPANIONS
	DISCUSSION AND SUMMARY
	Main Results
	Implications for Globular Clusters
	Implications for Planetary Systems
ABSTRACT
  We argue that all transient searches for planets in globular clusters have a
very low detection probability. Planets of low metallicity stars typically do
not reside at small orbital separations. The dependance of planetary system
properties on metallicity is clearly seen when the quantity Ie=Mp[a(1-e)]^2 is
considered; Mp, a, e, are the planet mass, semi-major axis, and eccentricity,
respectively. In high metallicity systems there is a concentration of systems
at high and low values of Ie, with a low-populated gap near Ie~0.3 M_J AU^2,
where M_J is Jupiter's mass. In low metallicity systems the concentration is
only at the higher range of I_e, with a tail to low values of Ie. Therefore, it
is still possible that planets exist around main sequence stars in globular
clusters, although at small numbers because of the low metallicity, and at
orbital periods of >~10 days. We discuss the implications of our conclusions on
the role that companions can play in the evolution of their parent stars in
globular clusters, e.g., influencing the distribution of horizontal branch
stars on the Hertzsprung-Russell diagram of some globular clusters, and in
forming low mass white dwarfs.

<|endoftext|><|startoftext|>
Introduction
Several cars are now fitted with a Global Positioning System (GPS) terminal which gives the exact
geographic location of the vehicle on the surface of the earth. All of these GPS terminals are now endowed
with detailed road network databases which allow them to compute the shortest path (in terms of distance)
between the current vehicle location (source) and another location given by the driver (destination).
Naturally, drivers are more interested in the source-destination fastest path (i.e. shortest in terms of
travelling time). The greatest difficulty to overcome is that the travelling time depends heavily on the
amount of traffic on the chosen road. Currently, some state agencies as well as commercial enterprises
are charged with monitoring the traffic situation in certain pre-determined strategic places. Furthermore,
traffic reports are collected from police cars as well as some taxi services. The dynamic traffic information,
however, is as yet limited to a small proportion of the whole road network.
The problem faced by traffic information providers is currently that of offering GPS terminal enabled
drivers a source-destination path subject to the following constraints: (a) the path should be fast in terms
of travelling time subject to dynamic traffic information being available on part of the road network; (b)
traffic information data are updated approximately each minute; (c) answers to path queries should be
computed in real time. Given the data communication time and other overheads, constraint (c) practically
asks for a shortest path computation time of no more than 1 second. Constraint (b) poses a serious
problem, because it implies that the fastest source-destination path may change each minute, giving
an on-line dimension to the problem. A source-destination query spanning several hundred kilometers,
which would take several hours to travel, would need a system recomputing the fastest path each minute;
this in turn would mean keeping track of each query for potentially several hours. As the estimated
computational cost of this requirement is superior to the resources usually devoted to the task, a system
based on dynamic traffic information will not, in practice, ever compute the on-line fastest path. As a
typical national road network for a large European country usually counts several million junctions and
road segments, constraint (c) implies that a straight Dijkstra’s algorithm is not a viable option. In view of
constraint (a), in our solution method fast paths can be efficiently computed by means of a point-to-point
hierarchy-based shortest path algorithm for static large-scale networks, where the hierarchy is built using
static information and each query is answered on the dynamically evaluated network.
http://arxiv.org/abs/0704.1068v2
{giacomon,baptiste,dk,liberti}@lix.polytechnique.fr
{giacomo.nannicini,gilles.barbier,contact}@v-trafic.com
1 INTRODUCTION 2
This paper makes two original scientific contributions (i) We extend a known hierarchy-based shortest
path algorithm for static large-scale undirected graphs (the Highway Hierarchies algorithm [SS05]) to
the directed case. The method has been developed and tested on real road network data taken from
the TeleAtlas France database [NV05]. We note that the original authors of [SS05] have extended the
algorithm to work on directed graphs in a slightly different way than ours (see [SS06]). (ii) We propose a
method for efficiently finding fast paths on a large-scale dynamic road network where arc travelling times
are updated in quasi real-time (meaning very often but not continuously).
In the rest of this section, we discuss the state of the art as regards shortest path algorithms in dynamic
and large-scale networks, and we describe the proposed solution. The rest of the paper is organized as
follows. In Section 2 we briefly review the highway hierarchy-based shortest path algorithm for static large-
scale networks, which is one of the important building blocks of our method, and discuss the extension of
the existing shortest-path algorithm to the directed case. Section 3 discusses the computational results,
and Section 4 concludes the paper.
1.1 Shortest path algorithms in road networks
The problem of computing fastest paths in graphs whose arc weights change over time is termed the
Dynamic Shortest Path Problem (DSPP) [BRTed]. The work that laid the foundations for solving
the DSPP is [CH66] (a good review of this paper can be found in [Dre69], p. 407): Dijkstra’s algorithm
is extended to the dynamic case through a recursion formula based on the assumption that the network
G = (V,A) has the FIFO property: for each pair of time instants t, t′ with t < t′:
∀ (u, v) ∈ A τuv(t) + t ≤ τuv(t
′) + t′,
where τuv(t) is the travelling time on the arc (u, v) starting from u at time t. The FIFO property
is also called the non-overtaking property, because it basically says that if A leaves u at time t and
B at time t′ > t, B cannot arrive at v before A using the arc (u, v). The shortest path problem in
dynamic FIFO networks is therefore polynomially solvable [Cha98], even in the presence of traffic lights
[AOPS03]. Dijkstra’s algorithm applied to dynamic FIFO networks has been optimized in various ways
[BRTed, Cha98]; the A∗ one-to-one shortest path algorithm has also been extended to dynamic networks
[CS02]. The DSPP is NP-hard in non-FIFO networks [Dea04].
Although in this paper we do not assume any knowledge about the statistical distribution of the arc
weights in time, it is worth mentioning that a considerable amount of work has been carried out for
computing shortest paths in stochastic networks. A good review is [FHK+05].
The computation of exact shortest paths in large-scale static networks has received a good deal of
attention [CZ01]. The established practice is to delegate a considerable amount of computation to a
preprocessing phase (which may be very slow) and then perform fast source-destination shortest path
queries on the pre-processed data. Recently, the concept of highway hierarchy was proposed in [SS05,
Sch05, SS06]. A highway hierarchy of L levels of a graph G = (V,A) is a sequence of graphs G =
G0, . . . , GL with vertex sets V 0 = V, V 1 ⊇ . . . ⊇ V L and arc sets A0 = A,A1 ⊇ . . . ⊇ AL; each arc has
maximum hierarchy level (the maximum i such that it belongs to Ai) such that for all pairs of vertices
there exists between them a shortest path (a1, . . . , ak), where ai are the consecutive path arcs, whose
search level first increases and then decreases, and each arc’s search level is not greater than its maximum
hierarchy level. A more precise description is given in Section 2. The A∗ algorithm has also been extended
to use a concept, reach, which has turned out to be closely related to highway hierarchies (see [GKW05]).
1.2 Description of the solution method
The solution method we propose in this paper efficiently finds fast paths by deploying Dijkstra-like
queries on a highway hierarchy built using the static arc weights found in the road network database,
2 HIGHWAY HIERARCHIES ALGORITHM ON DYNAMIC DIRECTED GRAPHS 3
but used with the dynamic arc weights reflecting quasi real-time traffic observations. This implies using
two main building blocks: highway hierarchy construction (the Highway Hierarchies1 algorithm extended
to directed graphs), and the query algorithm. Consequently, the implementation is a complex piece of
software whose architecture has been detailed in the appendix.
• Highway hierarchy. Apply the directed graph extension of the HH algorithm (see Section 2) to
construct a highway hierarchy using the static road network information. In particular, arc travelling
times are average estimations found in the database. This is a preprocessing step that has to be
performed only when the topology of the road network changes. The CPU time taken for this step
is not an issue.
• Efficient path queries. Efficiently address source-destination fast path requests by employing a
multi-level bidirectional Dijkstra’s algorithm on the dynamically evaluated graph using the highway
hierarchy structure constructed during preprocessing. This algorithm is carried out each time a path
request is issued; its running time must be as fast as possible, in any case not over 1 second.
2 Highway Hierarchies algorithm on dynamic directed graphs
The Highway Hierarchies algorithm [SS05, Sch05] is a fast, hierarchy-based shortest paths algorithm
which works on static undirected graphs. HH algorithm is specially suited to efficiently finding shortest
paths in large-scale networks. Since the HH algorithm is one of our main building blocks, we briefly
review the necessary concepts.
The Highway Hierarchies algorithm is heavily based on Dijkstra’s algorithm [Dij59], which finds the
tree of all shortest paths from a root vertex r to all other vertices v ∈ V of a weighted digraph G = (V,A)
by maintaining a heap H of reached vertices u with their associated (current) shortest path length c(u)
(elements of the heap are denoted by [u, c(u)]. Vertices which have not yet entered the heap (i.e. which are
still unvisited) are unreached, and vertices which have already exited the heap (i.e. for which a shortest
path has already been found) are settled. Dijkstra’s algorithm is as follows.
1. Let H = {[r, 0]}.
2. If H = ∅, terminate.
3. Let u be the vertex in H with minimum associated path length c(u).
4. Let H = H r {u}.
5. For all v ∈ δ+(u), if c(u) + τuv < c(v) then let H = (H r {[v, c(v)]}) ∪ {[v, c(u) + τuv]}.
6. Go to 2.
A bidirectional Dijkstra algorithm works by keeping track of two Dijkstra search scopes: one from the
source, and one from the destination working on the reverse graph. When the two search scopes meet
it can be shown that the shortest path passes through a vertex that has been reached from both nodes
([Sch05], p. 30). A set of shortest paths is canonical2 if, for any shortest path p = 〈u1, . . . , ui, . . . , uj . . . , uk〉
in the set, the canonical shortest path between ui and uj is a subpath of p.
The HH algorithm works in two stages: a time-consuming pre-processing stage to be carried out only
once, and a fast query stage to be executed at each shortest path request. Let G0 = G. During the first
stage, a highway hierarchy is constructed, where each hierarchy level Gl, for l ≤ L, is a modified subgraph
1From now on, simply HH
2Dijkstra’s algorithm can easily be modified to output a canonical shortest paths tree (see [Sch05], Appendix A.1 — can
be downloaded from http://algo2.iti.uka.de/schultes/hwy/).
http://algo2.iti.uka.de/schultes/hwy/
2 HIGHWAY HIERARCHIES ALGORITHM ON DYNAMIC DIRECTED GRAPHS 4
of the previous level graph Gl−1 such that no canonical shortest path in Gl−1 lies entirely outside the
current level for all sufficiently distant path endpoints: this ensures that all queries between far endpoints
on level l−1 are mostly carried out on level l, which is smaller, thus speeding up the search. Each shortest
path query is executed by a multi-level bidirectional Dijkstra algorithm: two searches are started from the
source and from the destination, and the query is completed shortly after the search scopes have met; at
no time do the search scopes decrease hierarchical level. Intuitively, path optimality is due to the fact that
by hierarchy construction there exist no canonical shortest path of the form 〈a1, . . . , ai, . . . , aj . . . , ak . . .〉,
where ai, aj , ak ∈ A and the search level of aj is lower than the level of both ai, ak; besides, each arc’s
search level is always lower or equal to that arc’s maximum level, which is computed during the hierarchy
construction phase and is equal to the maximum level l such that the arc belongs to Gl. The speed of
the query is due to the fact that the search scopes occur mostly on a high hierarchy level, with fewer arcs
and nodes than in the original graph.
2.1 Highway hierarchy
As the construction of the highway hierarchy is the most complicated part of HH algorithm, we endeavour
to explain its main traits in more detail. Given a local extensionality parameter H (which measures
the degree at which shortest path queries are satisfied without stepping up hierarchical levels) and the
maximum number of hierarchy levels L, the iterative method to build the next highway level l+1 starting
from a given level graph Gl is as follows:
1. For each v ∈ V , build the neighbourhoodN lH(v) of all vertices reached from v with a simple Dijkstra
search in the l-th level graph up to and including the H-st settled vertex. This defines the local
extensionality of each vertex, i.e. the extent to which the query “stays on level l”.
2. For each v ∈ V :
(a) Build a partial shortest path tree B(v) from v, assigning a status to each vertex. The initial
status for v is “active”. The vertex status is inherited from the parent vertex whenever a
vertex is reached or settled. A vertex w which is settled on the shortest path 〈v, u, . . . , w〉
(where v 6= u 6= w) becomes “passive” if
|N lH(u) ∩N
H(w)| ≤ 1. (1)
The partial shortest path tree is complete when there are no more active reached but unsettled
vertices left.
(b) From each leaf t of B(v), iterate backwards along the branch from t to v: all arcs (u,w) such
that u 6∈ N lH(t) and w 6∈ N
H(v), as well as their adjacent vertices u,w, are raised to the next
hierarchy level l + 1.
3. Select a set of bypassable nodes on level l + 1; intuitively, these nodes have low degree, so that the
benefit of skipping them during a search outweights the drawbacks (i.e., the fact that we have to
add shortcuts to preserve the algorithm’s correctness). Specifically, for a given set Bl+1 ⊂ Vl+1
of bypassable nodes, we define the set Sl+1 of shortcut edges that bypass the nodes in Bl+1: for
each path p = (s, b1, b2, . . . , bk, t) with s, t ∈ Vl+1 r Bl and bi ∈ Bl+1, 1 ≤ i ≤ k, the set Sl+1
contains an edge (s, t) with c(s, t) = c(p). The core G′l+1 = (V
l+1, E
l+1) of level l + 1 is defined
as:V ′
= Vl+1 rBl+1, E
= (El+1 ∩ (V
× V ′
)) ∪ Sl+1.
The result of the contraction is the contracted highway network G′l+1, which can be used as input for
the following iteration of the construction procedure. It is worth noting that higher level graphs may be
disconnected even though the original graph is connected.
2.1 Example
Take the directed graph G = (V,A) given in Fig. 1 (above). We are going to construct a road hierarchy
with H = 3 and L = 1 on G. First we compute N0
(v) for all v ∈ V = {v0, . . . , v6}.
2 HIGHWAY HIERARCHIES ALGORITHM ON DYNAMIC DIRECTED GRAPHS 5
v0, v1, v2 {0, 1, 2}
v3, v4, v5 {3, 4, 5}
v6 {3, 5, 6}
Next, we compute B(v) for all v ∈ V and raise the hierarchy level of the relevant arcs from the leaves to
B(v) to v. We only discuss the computation of B(v0) in detail as the others are similar.
1. Vertex v0 is initialized as an active vertex.
2. Dijkstra’s algorithm is started.
(a) v0 is settled (cost 0) on the empty path, so the passivity condition (1) does not apply;
(b) v1 and v2 are reached from v0 with costs resp. 1 and 2, and inherit its active status;
(c) v1 is settled (cost 1) on the path 〈v0, v1〉 and condition (1) does not apply;
(d) v6 is reached from v1 with cost 1 + 4 = 5 and set to active;
(e) v2 (cost 2) is settled on 〈v0, v2〉;
(f) v4 is reached from 2 with cost 2 + 6 = 8 and set to active;
(g) v6 (cost 5) is settled on the path 〈v0, v1, v6〉: since N
(v1) ∩ N
(v6) = ∅, condition (1) is
verified, and v6 is labeled passive;
(h) v3 is reached from v6 with cost 1 + 4 + 4 = 9 and set to passive.
(i) v4 (cost 8) is settled on the path 〈v0, v2, v4〉: since N
3 (v2) ∩ N
3 (v4) = ∅, condition (1) is
verified, and v4 is labeled passive;
(j) v5 is reached from v4 with cost 2 + 6 + 2 = 10 and set to passive;
(k) the only unsettled vertices are v3 and v5. Since both are reached and passive, the search
terminates.
3. The leaf vertices of B(v0) are v4 and v6.
(a) From t = v4, we iterate backwards along the arcs on the path 〈v0, v2, v4〉: the arc (v2, v4) has
the property that v2 6∈ N
(v4) and v4 6∈ N
(v2), so its hierarchy level is raised to l + 1 = 1
(the other arc on the path, (v0, v2), stays at level l = 0);
(b) from t = v6, we iterate backwards along the arcs on the path 〈v0, v1, v6〉: the arc (v1, v6) has
the property that v1 6∈ N
3 (v6) and v6 6∈ N
3 (v1), so its hierarchy level is raised to 1 (the other
arc on the path stays at level 0).
Fig. 1 shows the hierarchy at level 1.
2.2 Extension to directed graphs
The original description of the HH algorithm [SS05] applies to undirected graphs only; in this section we
provide an extension to the directed case. It should be noted that the HH algorithm was extended to the
directed case by the authors (see [SS06]) in a way which is very similar to that described here. However,
we believe our slightly different exposition helps to clarify these ideas considerably.
The algorithm for hierarchy construction, as explained in Section 2.1, works with both undirected and
directed graphs. However, storing all neighbourhoods N lH(v) for each v and l has prohibitive memory
requirements. Thus, the original HH implementation for checking whether a vertex v is in N lH(u) is
based on comparing the distance d(u, v) with the “distance-to-border” (also called slack) from u to the
border of its neighbourhood N lH(u). The “distance-to-border” d
H(u) is a measure of a neighbourhood’s
2 HIGHWAY HIERARCHIES ALGORITHM ON DYNAMIC DIRECTED GRAPHS 6
Figure 1: The graph of Example 2.1 (left) and its highway hierarchy for H = 3, L = 1 (right): the dashed
lines indicate arcs at level 0, the solid lines indicate arcs at level 1.
radius, and is defined as the distance d(u, v) where v is the farthest node in N lH(u), i.e. the cost of the
shortest path from u to the H-th settled node when applying Dijkstra’s algorithm on node u at level l.
This is the basis of the slack-based method in [Sch05], p. 19 (from which we draw our notation). In the
partial shortest paths tree B(s0) computed in Step 2a of the algorithm in Section 2.1, the slack ∆(u) is
recursively computed for all u ∈ B(s0) starting from the leaves t0 of B(s0), as follows.
1. Initialise a FIFO queue Q to contain all nodes u of B(s0), ordering them from the farthest one to
the nearest one with respect to s0.
2. Set ∆(u) = dlH(u) for u a leaf of B(s0) and +∞ otherwise.
3. If Q is empty, terminate.
4. Remove u from Q, and let p be its predecessor in B(s0).
5. If ∆(p) = +∞ and p 6∈ N lH(s0), p is added to Q.
6. Let ∆(p) = min(∆(p),∆(u)− d(p, u)).
7. If ∆(p) < 0, the edge (p, u) is lifted to the higher hierarchical level.
8. Return to Step 3.
The algorithm works because Thm. 2 in [Sch05] proves that condition ∆(p) < 0 is equivalent to the
condition of Step 2b of the algorithm in Section 2.1. The cited theorem is based on the following
assumption:
∀u ∈ V (u 6∈ N lH(t0) → d
H(t0)− d(u, t0) < 0). (2)
This condition may fail to hold for directed graphs, since d(u, t0) 6= d(t0, u).
To make Assumption 2 hold, we have to consider a neighbourhood radius computed on the reverse
graph, that is the graph G = (V,A) such that (u, v) ∈ A ⇔ (v, u) ∈ A. Thus, we modified the original
implementation to compute, for each node, a reverse neighbourhood N
H(v) (see Figure 2), so that we
are able to store the corresponding reverse neighbourhood radius d
H(u)∀u ∈ V . We replace Step 2 in
the algorithm above by:
2a. Set ∆(u) = d
H(u) for u a leaf of B(s0) and +∞ otherwise.
We are now going to prove our key lemma.
3 COMPUTATIONAL RESULTS 7
2.2 Lemma
Let u, s ∈ V and t a leaf in B(s). If u 6∈ N
H(t) then d
H(t)− d(u, t) < 0.
Proof. Suppose d(u, t) ≤ d
H(t). By definition, this means that there is a shortest path in N
H(t) which
connects u to t. Therefore, u ∈ N
H(t) against the hypothesis. ✷
It is now straightforward to amend Thm. 2 in [Sch05] to hold in the directed case; all other theorems
in [Sch05] need similar modifications, replacing N lH(t) with N
H(t) and d
H(t) with d
H(t) whenever t is
target node or is “on the right side” of a path - it will always be clear from the context. The query
algorithm must me modified to cope with these differences, using d
H(t) instead of d
H(t) whenever we are
searching in the backwards direction.
Figure 2: An example which shows neighbourhoods and reverse neighbourhoods with H = 3; only solid
arcs are lifted to a higher level in the hierarchy. Note that arcs (p, t) and (p′, t) are not lifted even if
p, p′ /∈ N lH(t); this is because p, p
′ ∈ N
H(t), and for target node we consider the reverse neighbourhood.
Interestingly, the problem with the slack-based method was first detected when our original imple-
mentation of the HH algorithm failed to construct a correct hierarchy for the Paris urban area. This
shows that the extension of the algorithm to the directed case actually arises from real needs.
2.3 Heuristic application to dynamic networks
The original Highway Hierarchies algorithm, as described above, finds shortest paths in networks whose
arc weights do not change in time. By forsaking the optimality guarantee, we adapt the algorithm to the
case of networks whose arc weights are updated in quasi real-time. Whereas the highway hierarchy is
constructed using the static arc travelling times from the road network database, each point-to-point path
query is deployed on a dynamically evaluated version of the highway hierarchy where the arcs are weighted
using the quasi real-time traffic information. In particular, in all tests that involved a comparison with
neighbourhood radius we use the static arc travelling times, while for all evaluations of path lengths or
of node distances we use the real-time (dynamic) travelling times. This means that the static travelling
times are used to determine neighbourhood’s crossings, and thus to determine when to switch to a higher
level in the hierarchy, while the key for the priority queue for HH algorithm is computed using only
dynamic travelling times.
3 Computational results
In this section we discuss the computational results obtained by our implementation. As there seems
to be no other readily available software with equivalent functionality, the computational results are not
comparative. However, we establish the quality of the heuristic solutions by comparing them against
3 COMPUTATIONAL RESULTS 8
the fastest paths found by a plain Dijkstra’s algorithm. We mention here two different approaches:
dynamic highway-node routing ([SS07]), which uses a selection of nodes operated by the HH algorithm to
build an overlay graph (see [HSW06]), and dynamic ALT ([DW07]), which is a dynamic landmark-based
implementation of A∗. Both approaches, however, although very performing with respect to query times,
require a computationally heavy update phase (which takes time in the order of minutes), and thus are
not suitable for our scenario, where, supposedly, each arc can have its cost changed every 2 minutes
(roughly).
We performed the tests on the entire road network of France, using a highway hierarchy with H = 65
and L = 9. The original network has 7778913 junctions and 17154042 road segments; the number of
nodes and arcs in each level is as follows.
level 0 1 2 3 4 5 6 7 8 9
nodes 7778913 1517291 433286 182474 91888 53376 34116 23338 16445 11790
arcs 17154042 3461385 1283000 583380 308249 183659 119524 81170 57235 41092
We show the results for queries on the full graph without dynamic travelling times in Table 1; in
this case, all paths computed with the HH algorithm are fastest paths. In Table 2, instead, we record
our results on a graph with dynamic travelling times; we also report the relative distance of the solution
found with our heuristic version of the HH algorithm and the fastest path computed with Dijkstra, and,
for comparative reasons, the results of the naive approach which consists in computing the traffic-free
optimal solution with the HH algorithm (i.e., on the static graph) and then applying dynamic times on
the so-found solution. Dynamic travelling times were taken choosing, for each query, one out of five sets
of values recorded in different times of the day for each of the 29384 arcs with dynamic information.
Although this number is small with respect to the total number of arcs in the graph, it should be
noted that most of these arcs correspond to very important road segments (highways and national roads).
All arcs (i, j) that did not have a dynamic travelling time were assigned a different weight at each query,
chosen at random with a uniform distribution over [τij , 15τij ], where τij is the reference time for arc (i, j).
This choice has been made in order to recreate a difficult scenario for the query algorithm: even if the
number of arcs with real traffic information is still small, it is going to increase rapidly as the means for
obtaining dynamic information increase (e.g. number of road cameras, etc.), and thus, to simulate an
instance where most arcs have their travelling time changed several times per hour, we generated each
arc’s cost at random. The interval [τij , 15τij ] is simply a rough estimation of a likely cost interval, based
on the analysis of historical data. All tables report average values over 5000 queries. All computational
results in Table 1 and 2 have been obtained on a multiprocessor Intel Xeon 2.6 GHz with 8GB RAM
running Microsoft Windows Server 2003, compiling with Miscrosoft Visual Studio 2005 and optimization
level 2.
Computational results show that, although with no guarantee of optimality, our heuristic version
of the algorithm works well in practice, with 0.55% average deviation from the optimal solution and a
recorded maximum deviation of 17.59%; query times do not seem to be influenced by our changes with
respect to the original version of the algorithm. The naive approach of computing the shortest path
on the static graph, and then applying dynamic times, records an average error of 2%, but it has a
much higher variance, and a maximum error of 27.95%; although the average error is not high, it’s still
almost 4 times the average error of the more sofisticated approach, and the high variance suggests lack of
stability in the solution’s quality. The low value recorded for the average error with the naive approach (in
absolute terms) can be explained as a consequence of the following two facts: travelling times generated
at random on arcs without real-time traffic information cannot simulate real traffic situation, because
they lack spatial coherence (i.e. they do not simulate congested nearby zones) and traffic behaviour
information (i.e. the fact that during peak hours important road segments are likely to be congested,
while less important roads are not), thus making the task of finding a fast path easier; besides, the
average query on such a large graph corresponds to a very long path (296 minutes on the traffic-free
graph, 2356 minutes on the dynamic graph), and on long paths it is usually necessary to use highways
4 CONCLUSION 9
or national roads regardless of their congestion status - which is exactly what the HH algorithm does.
This last sentence is supported by the fact that, if we consider only the 500 shortest queries in terms of
path length, the average error of the naive approach increases to 3.60%, while the average error of the
heuristic version increases to 0.97%; this is in accord with the fact that on short paths the influence of
traffic is greater, because alternative routes that do not use highways are more appealing, while on long
paths using highways is often a necessary step. However, in relative terms, the heuristic version of the
HH algorithm performs significantly better than the naive approach proposed for comparison, and we
expect the difference to increase (in favour of the heuristic algorithm) if applied to a graph fully covered
with real traffic information.
Figure 3 shows how the optimal and the heuristic path may differ; since the hierarchy built on the
static graph emphasizes important roads, the heuristic algorithm applied on the dynamically weighted
graph still tends to use highways and national roads even when they are congested (up to a certain
degree), thus sometimes losing optimality.
Dijkstra’s algorithm HH algorithm
# settled nodes 2275563 18966
# explored nodes 2587112 36200
query time [sec] 11.830 0.099
Table 1: Computational results on the static graph: average values
Dijkstra’s algorithm HH algorithm HH algorithm
naive approach heuristic version
# settled nodes 2280872 19174 19099
# explored nodes 2594361 36581 36492
query time [sec] 11.917 0.100 0.099
distance from optimum (variance) 0% 2.00% (5.00) 0.55% (0.45)
Table 2: Computational results on the graph with dynamic times: average values
4 Conclusion
We present a heuristic algorithm for efficiently finding fast paths in large-scale partially dynamically
weighted road networks, and benchmark its application on real-world data. The proposed solution is
based on fast multi-level bidirectional Dijkstra queries on a highway hierarchy built on the statically
weighted version of the network using the Highway Hierarchies algorithm, and deployed using the dynamic
arc weights. Computational results show that, although with no guarantee of optimality, the proposed
solution works well in practice, computing near-optimal fast paths quickly enough for our purposes.
Acknowledgements
We are grateful to Ms. Annabel Chevaux, Mr. Benjamin Simon and Mr. Benjamin Becquet for invaluable
practical help with Oracle and the real data, and to the rest of the Mediamobile’s energetic and youthful
staff for being always friendly and helpful.
REFERENCES 10
Figure 3: Fast paths calculated with different algorithms; each number identifies a path, paths are par-
tially overlapping. 1: Dijkstra’s algorithm (optimal solution) with real-time arc costs; dynamic travelling
time: 24 minutes and 6 seconds. 2: HH algorithm (heuristic solution) with real-time arc costs; dynamic
travelling time: 25 minutes and 5 seconds. 3: HH algorithm without real-time arc costs (traffic-free
optimal solution); dynamic travelling time: 37 minutes and 5 seconds.
References
[AOPS03] R.K. Ahuja, J.B. Orlin, S. Pallottino, and M.G. Scutellà. Dynamic shortest paths minimizing
travel times and costs. Networks, 41(4):197–205, 2003.
[BRTed] L.S. Buriol, M.G.C. Resende, and M. Thorup. Speeding up dynamic shortest path algorithms.
INFORMS Journal on Computing, submitted.
[CH66] K.L. Cooke and E. Halsey. The shortest route through a network with time-dependent intern-
odal transit times. Journal of Mathematical Analysis and Applications, 14:493–498, 1966.
[Cha98] I. Chabini. Discrete dynamic shortest path problems in transportation applications: complex-
ity and algorithms with optimal run time. Transportation Research Records, 1645:170–175,
1998.
[CS02] I. Chabini and L. Shan. Adaptations of the A∗ algorithm for the computation of fastest
paths in deterministic discrete-time dynamic networks. IEEE Transactions on Intelligent
Transportation Systems, 3(1):60–74, 2002.
REFERENCES 11
Figure 4: Highway hierarchy near the Champs Elysées, Paris; colour intensity and line width increase
with hierarchy level, in that order (a wide line with a lighter colour has a higher hierarchy level than a
thin line with dark colour).
[CZ01] E.P.F. Chan and N. Zhang. Finding shortest paths in large network systems. In GIS ’01:
Proceedings of the 9th ACM international symposium on Advances in geographic information
systems, pages 160–166, New York, NY, USA, 2001. ACM Press.
[Dea04] B.C. Dean. Shortest paths in fifo time-dependent networks: theory and algorithms. Technical
report, MIT, Cambridge MA, 2004.
[Dij59] E.W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik,
1:269–271, 1959.
[Dre69] S.E. Dreyfus. An appraisal of some shortest-path algorithms. Operations Research, 17(3):395–
412, 1969.
[DW07] D. Delling and D. Wagner. Landmark-based routing in dynamic graphs. InWEA 2007, volume
4525 of Lecture Notes in Computer Science. Springer, 2007.
[FHK+05] T. Flatberg, G. Hasle, O. Kloster, E.J. Nilssen, and A. Riise. Dynamic and stochastic aspects
in vehicle routing – a literature survey. Technical Report STF90A05413, SINTEF, Oslo,
Norway, 2005.
[GKW05] A.V. Goldberg, H. Kaplan, and R.F. Werneck. Reach for A∗: Efficient point-to-point shortest
path algorithms. Technical Report MSR-TR-2005-132, Microsoft Research, 2005.
[HSW06] M. Holzer, F. Schulz, and D. Wagner. Engineering multi-level overlay graphs for shortest-path
queries. In SIAM, volume 129 of Lecture Notes in Computer Science, pages 156–170. Springer,
2006.
[Ker04] B. S. Kerner. The Physics of Traffic. Springer, Berlin, 2004.
[NV05] TeleAtlas NV. Tele Atlas Multinet ShapeFile 4.3.1 Format Specifications. TeleAtlas NV, May
2005.
REFERENCES 12
[Sch05] D. Schultes. Fast and exact shortest path queries using highway hierarchies. Master Thesis,
Informatik, Universität des Saarlandes, June 2005.
[SS05] P. Sanders and D. Schultes. Highway hierarchies hasten exact shortest path queries. In
G. Stølting Brodal and S. Leonardi, editors, ESA, volume 3669 of Lecture Notes in Computer
Science, pages 568–579. Springer, 2005.
[SS06] P. Sanders and D. Schultes. Engineering highway hierarchies. In ESA 2006, volume 4168 of
Lecture Notes in Computer Science, pages 804–816. Springer, 2006.
[SS07] P. Sanders and D. Schultes. Dynamic highway-node routing. In WEA 2007, volume 4525 of
Lecture Notes in Computer Science, pages 66–79. Springer, 2007.
	Introduction
	Shortest path algorithms in road networks
	Description of the solution method
	Highway Hierarchies algorithm on dynamic directed graphs
	Highway hierarchy
	Extension to directed graphs
	Heuristic application to dynamic networks
	Computational results
	Conclusion
ABSTRACT
  Efficiently computing fast paths in large scale dynamic road networks (where
dynamic traffic information is known over a part of the network) is a practical
problem faced by several traffic information service providers who wish to
offer a realistic fast path computation to GPS terminal enabled vehicles. The
heuristic solution method we propose is based on a highway hierarchy-based
shortest path algorithm for static large-scale networks; we maintain a static
highway hierarchy and perform each query on the dynamically evaluated network.

<|endoftext|><|startoftext|>
Optical Zeno Gate: Bounds for Fault Tolerant Operation
Patrick M. Leung∗ and Timothy C. Ralph
Centre for Quantum Computer Technology, Department of Physics,
University of Queensland, Brisbane 4072, Australia
(Dated: September 8, 2021)
In principle the Zeno effect controlled-sign gate of Franson et al’s (PRA 70, 062302, 2004) is a
deterministic two-qubit optical gate. However, when realistic values of photon loss are considered its
fidelity is significantly reduced. Here we consider the use of measurement based quantum processing
techniques to enhance the operation of the Zeno gate. With the help of quantum teleportation, we
show that it is possible to achieve a Zeno CNOT gate (GC-Zeno gate) that gives (near) unit fidelity
and moderate probability of success of 0.76 with a one-photon to two-photon transmission ratio
κ = 104. We include some mode-mismatch effects and estimate the bounds on the mode overlap
and κ for which fault tolerant operation would be possible.
PACS numbers: 03.67.Lx, 42.50.-p
I. INTRODUCTION
Photons are a natural choice for making qubits be-
cause the quantum information encoded can have a long
decoherence time and is easy to manipulate and mea-
sure. Also, photonic qubits are the only type of qubits
that are feasible for long distance quantum communica-
tion. Quantum information processing requires universal
two-qubit entangling gates. Knill et al [1] showed that
it is theoretically possible to do scalable quantum com-
puting with linear optics by using measurement induced
interactions to perform the two-qubit gates. However, de-
spite a continuous effort in reducing the resource require-
ment [2, 3, 4, 5], the resource overhead is still high for
linear optical quantum computing. Franson et al[6] pro-
posed the use of two-photon absorption non-linearity and
exploiting the quantum Zeno effect to implement a con-
trol sign (CZ) gate that requires much less resources than
linear optics schemes. However, the problem with the
Zeno gate is that photon loss affects the performance of
the Zeno gate significantly and the single photon to two-
photon loss ratio requirement is very stringent. One solu-
tion couldbe to combine measurement induced quantum
processing with the Zeno gate. Previously we have shown
that when using the Zeno gate for qubit fusion [8], state
distillation [9] with post-selection can boost the gate fi-
delity to unity and that for less stringent absorption ra-
tios the gate has an advantage in success probability over
linear optics gates for fusing clusters of qubits. These
clusters of qubits can then be used for cluster state quan-
tum computing [10]. In addition, Myers and Gilchrist [7]
have shown that the performance of the Zeno gate may
be enhanced by using error correction codes such as re-
dundancy and parity encoding.
Here we design a high fidelity Zeno CNOT gate
suitable for circuit-based quantum computing. Although
with the current estimate of the photon loss ratio, only
a poor fidelity Zeno gate is directly achievable, here we
show that it is possible to use two of these Zeno gates
to do Bell measurements and implement a Gottesman-
Chuang[11] teleportation type of CNOT gate (GC-Zeno
gate) that, like the fusion gate, gives high fidelity via
state distillation and moderate success probability via
partially off-line state preparation. We include the effect
of mode-mismatch and detector efficiency on the scheme
and estimate lower bounds on the parameter which in
principle allow fault tolerant operation.
The paper is arranged in the following way. The in-
troduction continues with a subsection on the Zeno CZ
gate, which describes the scheme and modelling of the
gate and give descriptions on the modelling parameters
that are also used for modelling the GC-Zeno CZ gate.
In section II, we discuss the GC-Zeno gate and the effect
of mode-mismatch and detector efficiency on the gate. In
section III, we give estimates of the lower bounds on the
photon loss ratio and mode-matching, followed by a sub-
section on the advantage in using state distillation. We
conclude and summarize in section IV.
Ia. Zeno CZ Gate
Franson et al’s control sign gate scheme consists of a
pair of optical fibers weakly evanescently coupled and
doped with two-photon absorbing atoms. The purpose of
the two-photon absorbers is to suppress the occurrence
of two photon state components in the two fibre modes
via the Quantum Zeno effect. This allows the state to
remain in the computational basis. After a length of
fibre corresponding to a complete swap of the two fibre
modes, a π phase difference is produced between the |11〉
term and the other three basis terms. After swapping
the fibre modes by simply crossing them, a CZ gate
is achieved. The gate becomes near deterministic and
performs a near unitary operation when the Quantum
Zeno effect is strong and photon loss is insignificant.
However, with current technology, the strength of the
http://arxiv.org/abs/0704.1069v1
Quantum Zeno effect is a few orders of magnitude below
what is required, and thus the Zeno gate has significant
photon loss.
In [8], the gate is modelled as a succession of n weak
beamsplitters followed by two-photon absorbers as shown
in Fig. 1. As n → ∞ the model tends to the continu-
ous coupling limit envisaged for the physical realization.
The gate operates on the single-rail encoding for which
|0〉L = |0〉 and |1〉L = |1〉 with the kets representing pho-
ton Fock states. Fig. 2 shows how the single rail CZ can
be converted into a dual rail [13] CZ with logical encoding
|0〉L = |H〉 and |1〉L = |V 〉. Let L be the total length of n
absorbers. Also, let γ1 = exp(
) and γ2 = exp(
the probability of single photon and two-photon trans-
mission respectively for one absorber. Here the param-
eter λ = χL, where L is the length of the absorber and
χ is the corresponding proportionality constant related
to the absorption cross section. Furthermore, κ specifies
the relative strength of the two transmissions and relates
them by γ2 = γ
1 . This CZ gate does the following oper-
ation:
|00〉 → |00〉
|01〉 → γn/2
|10〉 → γn/2
|11〉 → −γn
τ |11〉+ f(|02〉, |20〉) (1)
where the new expression for τ is given by:
τn,λ =
(gn,λ +
dn,λ√
2dn,λ − hn,λ)
+(gn,λ −
dn,λ√
2dn,λ + hn,λ)
dn,λ =
(1 + cos
)(1 + γ2) + 2
γ2(cos(
)− 3)
gn,λ = (cos
γ2 + 1)
hn,λ = 2(cos
γ2 − 1) (2)
The explicit form of the |02〉, |20〉 state components
are suppressed, as they lie outside the computational
basis and so do not explicitly contribute to the fidelity.
FIG. 1: Construction of our CZ gate.
FIG. 2: CZ gate in dual rail implementation.
From equation 1, it is clear that the amplitude of the
four computational states are unequal and this lowers the
gate fidelity. With the current best estimate of κ = 104,
the unherald fidelity is only 0.94. If the gate is used in
a measurement based strategy then state distillation can
be used and the fidelity of the gate can be improved by
trading off some success probability. Figure 3 shows the
distillated Zeno CZ gate circuit. The τ gate is simply
an interferometer consisting of two 50-50 beam splitters
with a two-photon absorber in each arm, which gives op-
eration: |00〉 → |00〉, |01〉 →
|01〉, |10〉 →
|10〉,
and |11〉 → γ′
τ |11〉. Here γ′
= τ1/κ is the single photon
transmission coefficient of the absorber. The distillation
beam splitters labelled 1 to 3 have transmission coeffi-
cient
τ respectively. With these dis-
tillations in place, the operation of the distillated Zeno
CZ gate is:
|00〉 → γn
τ |00〉
|01〉 → γn1 γ
1τ |01〉
|10〉 → γn
τ |10〉
|11〉 → −γn
τ |11〉+ f(|02〉, |20〉) (3)
After measuring the output (detectors at output are
not shown in fig.3) and treating the photon bunching
terms (|02〉, |20〉) and the terms with photon loss as fail-
ures (excluded from Eq. 3 for clarity), renormalising the
states gives unit heralded fidelity independent of λ and
probability of success Ps = γ
τ2 = e−2λ/κτ2+2/κ.
FIG. 3: Schematic of distillated Zeno CZ gate.
II. ZENO GATE USING GOTTESMAN-CHUANG
SCHEME
As argued above, state distillation can improve the
fidelity of the Zeno gate to unity by trading off success
probability. However, the output of the distillated Zeno
FIG. 4: Schematic of GC-Zeno gate. The state |χ〉 is
((|00〉 + |11〉)|00〉 + (|01〉 + |10〉)|11〉)/2
gate contains terms outside the computational basis
due to photon loss and photon bunching. Hence if we
want the gate to have unit fidelity, it is necessary to
measure the output and exclude these failure terms by
post-selection. However, such post-selection means that
the gate can no longer be used directly as a CNOT gate
for circuit-based quantum computing.
Gottesman and Chuang [11] showed the viabil-
ity in using state teleportation and single qubit
operations to implement a CNOT gate. The
scheme requires the four qubit entangled state
|χ〉 = ((|00〉+ |11〉)|00〉+ (|01〉+ |10〉)|11〉)/2. Preparing
the entangled state requires a CZ operation, which can
be done off-line with linear optics with high fidelity. Bell
measurements are made between the input qubits and
the first and last qubits of |χ〉. The measurement results
are fed forward for some single qubit corrections such
that the circuit gives a proper CNOT operation. Here
we propose using such scheme, as shown in figure 4, to
implement a GC-Zeno CNOT gate with high fidelity.
Since this gate includes state distillation, post-selection
and off-line state preparation, the gate has unit fidelity
(under perfect mode-matching) and moderate success
probability. Figure 5 plots the probability of success
against the one-photon to two-photon transmission ratio
κ. It shows that with κ = 104 (current best estimate),
the probability of success is about 0.76, which is better
than the break even point of 0.25 for the linear optics
version of this gate [1].
IIa. Effect of Mode-Mismatch
From source preparation to gate operation to detec-
tion, mode-mismatch is an unavoidable issue in optical
quantum computing that causes unlocated errors which
lowers the fidelity of the device[15]. Fortunately, with
the help of quantum error correction, a certain amount of
unlocated error rate, including but not limited to mode-
mismatch errors, can be tolerated. A reliable quantum
FIG. 5: Plot of probability of success versus log(κ) (in base
10) for GC-Zeno gate. Note that the success probability is not
one at κ = 108, but that the curve asymptotically approaches
one for very large κ. Result is per two input qubits. Detector
inefficiency is taken into account in accord with Dawson et
al’s bound.
gate must therefore have unlocated error rates below this
threshold.
The dominant source of mode-mismatch error in the
GC-Zeno gate is from the CZ gate and τ gate, where
two-photon interaction occurs. Here we follow Rohde et
al’s [14] analysis to examine the effect of such error. We
take the simplest model in which the mode-mismatch is
present between the photons entering the gate but re-
main constant through the gate. In this case, the mode-
mismatch in two-photon interaction can be analysed as
having two-photons fail to interact with some probabil-
ity. This allows us to write the operations for the CZ
gate as follow, where 0 < Γ < 1 quantifies the over-
lap of the two wavepackets. Γ2 is the probability that
the two photons successfully interacted and Γ = 0 for
completely mode-mismatched and Γ = 1 for completely
mode-matched. The bar in the |1̄1〉 term indicates mode-
mismatched component of the state.
|00〉 → |00〉
|01〉 →
|10〉 →
|11〉 → −Γγn1 τ |11〉+
1− Γ2γn1 |1̄1〉
+f(|02〉, |20〉) (4)
And similarly for the operations of τ gate:
|00〉 → |00〉
|01〉 →
|10〉 →
|11〉 → Γγ
τ |11〉+
1− Γ2γ
|1̄1〉
+f(|02〉, |20〉) (5)
With the equations for the CZ and τ gate[16], and
given a normalized input state (α|00〉 + β|01〉 + δ|10〉 +
ǫ|11〉), we can obtain analytical expression for the fidelity
F (per qubit) and success probability Ps (per qubit) of
the GC-Zeno gate as follow. Equation 6 and 7 show
that both the fidelity and success probability are state
dependent due to mode-mismatch. The worst case of
fidelity occurs when the input state is the equal super-
position state (|00〉+ |01〉+ |10〉+ |11〉)/2 (i.e. α = δ =
β = ǫ = 1/2) and the worst case of success probability
occurs when the input state is the pure state |11〉 (i.e.
α = β = δ = 0 and ǫ = 1).
α∗A1 + β
∗A2 + δ
∗A3 + ǫ
|A1|2 + |A2|2 + |A3|2 + |A4|2
e−2λ/κτ2/κ
2(1 + e−λ/κτ2+1/κ)
|A1|2 + |A2|2 + |A3|2 + |A4|2 (7)
where a1 = (τ+τΓ+
1− Γ2), a2 = (τ−τΓ+
1− Γ2),
a3 = (τ − τΓ−
1− Γ2), a4 = (τ + τΓ−
1− Γ2), and
A1 = αa
1 + βa1a2 + δa1a2 + ǫa
2, A2 = αa1a3 + βa2a3 +
δa1a4 + ǫa2a4, A3 = αa1a3 + βa1a4 + δa2a3 + ǫa2a4,
A4 = αa
3 + βa3a4 + δa3a4 + ǫa
IIb. Effect of Detector Efficiency
In practice, even for the most advanced photon detec-
tor, detector inefficiency is always present. The effect of
this noise is to reduce the probability of success of the
gate but not the fidelity because the errors are locatable.
III. ESTIMATE OF BOUNDS FOR FAULT
TOLERANCE
We now wish to estimate lower bounds on the mode-
matching, Γ, and photon loss ratio, κ, that will still allow
fault-tolerant operation. We allow a small amount of
detector inefficiency but assume all other parameters are
ideal. To make this estimate we directly use the bounds
obtained by Dawson et al [12] for a deterministic error
correction protocol. For this protocol, they numerically
derived one bound using the 7-qubit Steane code and
another bound using the 23-qubit Golay code.
In order to use the Dawson et al’s bounds we need
to identify the unlocated and located error rates for our
gate. In general, the unlocated error rate is less than
1 − F but here we take it to be 1 − F because in our
analysis, γ is almost 1, which means the other terms in-
volved are very small. The located error rate is simply
1 − Ps (both F and Ps are per qubit). Using these re-
lationships, we convert each of the bounds into a fidelity
versus success probability bound. For a gate built with
two-photon absorbers that have a certain single-photon
to two-photon transmission ratio κ, we can find an op-
timal λ (i.e. choosing an optimal absorber length) that
FIG. 6: Lower bounds of amount of mode-matching Γ
required for a fault tolerant GC-Zeno gate versus single-
photon to two-photon transmission ratio κ. The bounds are
derived from Dawson et al’s [12] results on deterministic
error correction protocol. The top and bottom curves are
for the 7-qubit Steane code and the 23-qubit Golay code
respectively. Above the curves are the regions where the
amount of mode-mismatch is tolerable. Here we have used
the worst case input.
gives a maximum success probability. Hence, by match-
ing the success probability with the bound, we can deter-
mine the corresponding fidelity threshold and therefore
find the least amount of mode-matching required for fault
tolerant gate operation. We note that the error model
used by Dawson et al is specific to their optical cluster
state architecture and will differ in detail from the appro-
priate error model for the GC-Zeno gate. Nonetheless we
assume that a comparison based on the total error rates
will give a good estimate of the bounds.
Figure 6 shows the lower bounds on the mode-
matching parameter Γ for a gate with a certain κ. Since
the fidelity and success probability are state dependent
due to mode-mismatch, in that figure, we have plotted
for the case of worst fidelity input state (i.e. the equal
superposition state). The top and bottom curves are
best fit curves for using the 7-qubit Steane code and the
23-qubit Golay code respectively. The curves show that
highly mode-matched photons are essential for robust
gate operation. With the worst fidelity input state,
(|00〉 + |01〉 + |10〉 + |11〉)/2, for the Steane code, the
lowest Γ required for fault tolerant operation is about
0.998, where κ = 106, and for the Golay code, the lowest
Γ required is about 0.996, where κ = 5 × 105. With
the worst success probability input state, |11〉, for the
Steane code, the lowest Γ required for fault tolerant
operation is about 0.995, where κ = 106, and for the
Golay code, the lowest Γ required is about 0.989, where
κ = 5 × 105. Figure 6 also shows that under (near)
perfect mode-matching, the required κ can be as low as
approximately 6000 for the Steane code and 2000 for the
Golay code. Two-photon absorbers with such κ values
may be achievable with the best of current technology.
IIIa. Advantage of Using State Distillation
State distillation allows us to trade off some success
probability against fidelity for the GC-Zeno gate, or in
other words, reducing the unlocated error rate by hav-
ing a larger located error rate. Since the determinis-
tic error correction protocol can tolerate both unlocated
and located errors, therefore we should ask whether state
distillation is truly advantageous? We can answer this
question by comparing two GC-Zeno gates in the case of
perfect mode matching, where one has complete distil-
lation and the other has no distillation. For the case of
complete distillation, the deterministic error correction
protocol with the 7-qubit Steane code can tolerate errors
of a GC-Zeno gate with κ = 6100, and with the 23-qubit
Golay code, it can tolerate errors of a GC-Zeno gate with
κ = 2100. For state distillation to be advantageous un-
der the same protocol, these values of κ must be smaller
than the values of κ for the case of no distillation[17].
For the case of no distillation, the fidelity and suc-
cess probability of the gate becomes state dependent. In
the parameters space of interest, the input that gives the
worst fidelity is (|00〉 + |01〉 + |10〉 − |11〉)/2. With this
input state, we find that for the protocol using the 7-
qubit Steane code and no distillation, the critical κ is
12000. Similarly, for the protocol using the 23-qubit Go-
lay code and no distillation, the critical κ is 4300. With
an arbitrary amount of distillation, the value of κ lies
between the limit of no distillation and full distillation
cases. Hence it is evident that state distillation is advan-
tageous. Also, it should be noted it is better to have only
located error, which is the case when there is full distilla-
tion, than have both located and unlocated errors, which
is the case when no or some distillation is utilized.
IV. CONCLUSION
In this paper, we have shown that it is possible to
build a high fidelity Zeno CNOT gate with two distil-
lated Zeno gates implemented in the Gottesman-Chuang
teleportation CNOT scheme. For one-photon to two-
photon transmission ratio κ = 104 (current best esti-
mate), the gate has a success probability of 0.76 under
perfect mode-matching. When including measurement
noise that equals one-tenth of the gate’s noise, and the
effect of mode-mismatch in the CZ and τ gate, we find
that with the deterministic error correction protocol us-
ing the 7-qubit Steane code, the lowest Γ required for
fault tolerant gate operation is 0.998, where κ = 106.
For using the 23-qubit Golay code, the lowest Γ required
is 0.996, where κ = 5 × 105. Hence, the requirement on
mode-matching is stringent for a fault tolerant GC-Zeno
gate.
∗ Electronic address: pmleung@physics.uq.edu.au
[1] E. Knill, R. Laflamme, and G.J. Milburn, Nature 409,
46-52 (2001)
[2] N.Yoran and B.Reznik, Phys. Rev. Lett. 91, 037903
(2003)
[3] M.A. Nielsen, Phys. Rev. Lett. 93, 040503 (2004)
[4] A.J.F.Hayes, A.Gilchrist, C.R.Myers and T.C.Ralph,
J.Opt.B 6, 533 (2004)
[5] D.E.Browne and T.Rudolph, Phys. Rev. Lett. 95, 010501
(2005)
[6] J.D. Franson, B.C. Jacobs, and T.B. Pittman, Phys. Rev.
A 70, 062302 (2004)
[7] C.R. Myers and A. Gilchrist, quant-ph 0612176 (2006)
[8] P.M. Leung and T.C. Ralph, Phys. Rev. A 74, 062325
(2006)
[9] R.T. Thew and W.J. Munro, Phys. Rev. A 63, 030302(R)
(2001)
[10] R. Raussendorf and H.J. Briegel, Phys. Rev. Lett. 86
5188 (2001)
[11] D. Gottesman, I.L. Chuang, Nature 402, 6760 (1999)
[12] C.M. Dawson, H.L. Haselgrove, and M.A. Nielsen, Phys.
Rev. Lett. 96, 020501 (2006)
[13] A.P. Lund, T.C. Ralph, Phys. Rev. A, 66, 032307 (2002)
[14] P.P. Rohde and T.C. Ralph, Phys. Rev. A 73, 062312
(2006)
[15] Note that mode mismatch in CZ and τ gate causes unlo-
cated error that lowers the fidelity, in which we find that
it cannot be improved with state distillation.
[16] Due to mode-mismatch, the τ gate is less effective in two-
photon distillation. It is true that we can increase the
two-photon absorption strength in the τ gate to make
up for the inefficiency. However, here we assume that we
do not know the mode-matching parameter Γ of the in-
put wavepackets, and therefore this adjustment cannot
be made. In addition, increasing the two-photon distilla-
tion will increase single-photon loss as well, which lowers
the probability of success.
[17] An ideal Zeno gate requires strong quantum Zeno effect
and strong quantum Zeno effect corresponds to a large
κ value. However such large non-linearity is difficult to
engineer. Hence it is desirable to have a gate that works
with modest κ.
mailto:pmleung@physics.uq.edu.au
ABSTRACT
  In principle the Zeno effect controlled-sign gate of Franson et al's (PRA 70,
062302, 2004) is a deterministic two-qubit optical gate. However, when
realistic values of photon loss are considered its fidelity is significantly
reduced. Here we consider the use of measurement based quantum processing
techniques to enhance the operation of the Zeno gate. With the help of quantum
teleportation, we show that it is possible to achieve a Zeno CNOT gate (GC-Zeno
gate) that gives (near) unit fidelity and moderate probability of success of
0.76 with a one-photon to two-photon transmission ratio $\kappa=10^4$. We
include some mode-mismatch effects and estimate the bounds on the mode overlap
and $\kappa$ for which fault tolerant operation would be possible.

<|endoftext|><|startoftext|>
Differential Diversity Reception of MDPSK over Independent
Rayleigh Channels with Nonidentical Branch Statistics and
Asymmetric Fading Spectrum
Hua Fu and Pooi Yuen Kam
ECE Department, National University of Singapore
Singapore 117576, Email: {elefh, elekampy}@nus.edu.sg
Abstract— This paper is concerned with optimum diversity re-
ceiver structure and its performance analysis of differential phase
shift keying (DPSK) with differential detection over nonselective,
independent, nonidentically distributed, Rayleigh fading chan-
nels. The fading process in each branch is assumed to have an
arbitrary Doppler spectrum with arbitrary Doppler bandwidth,
but to have distinct, asymmetric fading power spectral density
characteristic. Using 8-DPSK as an example, the average bit error
probability (BEP) of the optimum diversity receiver is obtained
by calculating the BEP for each of the three individual bits.
The BEP results derived are given in exact, explicit, closed-form
expressions which show clearly the behavior of the performance
as a function of various system parameters.
I. INTRODUCTION
The receiver structure and bit error probability (BEP) per-
formance of differential phase shift keying (DPSK) with differ-
ential detection over nonselective, independent and identically
distributed (i.i.d.), Rayleigh fading channels with combining
diversity reception have been well known in the literature
[1]−[4]. However, reaserch shows that in some practical
systems, the independent, non-identically distributed (i.n.i.d.)
channel model is more accurate [5], [6]. In i.n.i.d. channel,
the fading processes and possibly the additive, white Gaussian
noise (AWGN) on the diversity branches have non-uniform
power profiles which are distinct from one another. The effect
of the nonidentical diversity branch statistics on the receiver
structure is studied in [7]. Recently, based on the maximum a
posteriori probability (MAP) criterion, an explicit structure of
the optimum combining differential receiver and a complete
set of closed-form BEP expressions and their Chernoff upper
bounds, for 2-, 4- and 8-DPSK, both with optimum combining
reception and suboptimum combining reception, are derived
in [8]−[10]. The purpose of this paper is to provide a further
extension. The results derived in this paper, together with those
in [8]−[10], form a benchmark counterpart to the classic ones
for the i.i.d. channel given in [1]−[4].
In a Rayleigh channel, the fading gain is usually modeled
as a zero-mean, stationary, complex, Gaussian random pro-
cess. The most widely accepted model [1]−[10] is that the
spectrum of the fading process over each diversity branch is
symmetric around the carrier so that the quadrature processes
are independent of each other. This assumption is valid for
various fading spectra. For example, see [11] and its refer-
ences. However, in some fading environments such as the land
mobile channel with Jakes model [12], the Doppler spectrum
becomes asymmetric when the multipath signals are absorbed
by obstacles or the propagation environment is characterised
by directional non-isotropic scattering [13]−[15]. Thus, it is
of great practical importance to take account of the effect of
the asymmetric fading spectrum on the receiver structure and
the performance analysis of differentially detected DPSK over
i.n.i.d. channels, the topic of this paper.
The paper is orgainzed as follows. In Section II, the signal
model is introduced and different optimum diversity receivers
are derived for different Rayleigh fading scenarios (see eqs.
(17)−(20) below). In Section III, we use 8-DPSK as an
example to study the BEP performance. Here, the average BEP
of the optimum diversity receiver is obtained by calculating the
BEP for each of the three individual bits. The results are given
in exact, explicit, closed-form expressions which show clearly
the behavior of the performance as a function of signal-to-
noise ratio (SNR), fading correlation coefficient, and diversity
order. Section IV presents numerical examples. Throughout
this paper, overhead ∼ denotes a complex quantity, superscript
∗ will denote its conjugate, E is the ensemble average operator,
δ represents the Kronecker delta, and [·]T denotes transposition
of the vector and matrix.
II. SIGNAL MODEL AND RECEIVER STRUCTURE
With space diversity reception over L frequency nonse-
lective, i.n.i.d., Rayleigh fading branches with AWGN, the
received signal over the ith branch, i = 1, 2, · · · , L, during
the kth symbol interval kT ≤ t < (k + 1)T is given, after
matched filtering and sampling at time t = (k + 1)T , by the
statistic r̃i(k), where
r̃i(k) = E
jφ(k)c̃i(k) + ñi(k). (1)
Here, Es is the energy per symbol, and for DPSK, φ(k) is
the data-modulated phase with Gray encoding of bits onto
the phase transition ∆φ(k) = φ(k) − φ(k − 1). The kth data
symbol is conveyed in ∆φ(k). We assume here that all symbol
points are equally likely. In (1), a rectangular data pulse shape
g(t), where g(t) = 1/
T for 0 ≤ t < T and zero elsewhere,
is assumed so that each matched filter has a rectangular low-
pass-equivalent impulse response hi(t) = g(T − t) for all i.
Thus, the filtered noise ñi(k) is given by
ñi(k) =
∫ (k+1)T
ñi(t)√
dt. (2)
Here, {ñi(t)}Li=1 is a set of i.n.i.d., lowpass, complex AWGN
processes with E [ñi(t)] = 0 and E[ñi(t)ñ
i (t− τ)] = Niδ(τ)
so that {ñi(k)}k is a sequence of zero-mean, complex Gaus-
sian variables with covariance function for each branch i
E[ñi(k)ñ
i (j)] = Ni δkj (3)
The multiplicative distortion c̃i(k) in (1) is given by
c̃i(k) =
∫ (k+1)T
c̃i(t)
dt. (4)
http://arxiv.org/abs/0704.1070v1
Here,
c̃i(t) = ai(t) + jbi(t)
is a set of i.n.i.d., lowpass,
zero-mean, stationary, complex, Gaussian random processes.
Each c̃i(t) represents the complex gain due to frequency
nonselective Rayleigh fading of the ith branch. For asymmetric
spectrum in each i, the inphase fading process ai(·) and the
quadrature phase fading process bi(·) are generally correlated.
At any time instant t, however, ai(t) and bi(t) are always
uncorrelated. With reference to Fig. 1, it is shown in [16] that
the covariance function of ai(·) and bi(·) can be obtained as
E[ai(t)bi(t)] = 0 (5a)
E[ai(t− τ)ai(t)] = E[bi(t− τ)bi(t)] = Ri(τ) (5b)
E[ai(t)bi(t− τ)] = −E[bi(t)ai(t− τ)] = Qi(τ) (5c)
Note that if the spectrum of each c̃i(t) is symmetric, the
processes ai(·) and bi(·) will be independent (i.e., we have
Qi(τ) = 0) with the same covariance function Ri(τ).
Letting c̃i(k) = ai(k) + jbi(k), it follows from (4) and (5)
that both {ai(k)}k and {bi(k)}k are sequences of zero-mean,
real-valued, Gaussian random variables with
E[ai(k)bi(k)] = 0 (6a)
E[ai(k − l)ai(k)] = E[bi(k − l)bi(k)] = Ci(l) (6b)
∫ (k+1)T
∫ (k+1−l)T
(k−l)T
Ri(u − v)
du dv
E[ai(k)bi(k − l)] = −E[bi(k)ai(k − l)] = Di(l) (6c)
∫ (k+1)T
∫ (k+1−l)T
(k−l)T
Qi(u− v)
du dv
Thus, the covariance matrix can be obtained as
Γi = E
ai(k)
ai(k − l)
bi(k)
bi(k − l)
ai(k) ai(k − l) bi(k) bi(k − l)
Ci(0) Ci(l) 0 Di(l)
Ci(l) Ci(0) −Di(l) 0
0 −Di(l) Ci(0) Ci(l)
Di(l) 0 Ci(l) Ci(0)
 (7)
For each i , c̃i(k) and ñi(k) are mutually independent. For
i 6= j, {c̃i(k), ñi(k)} are independent of {c̃j(k), ñj(k)}.
The diversity branches are nonidentical since the covariance
functions Ri(τ), Qi(τ) and Niδ(τ) depend on the branch
index i. For convenience of later application, the following
parameters are defined. The fading correlation coefficient at
the matched filter output over a symbol interval of T for the
ith diversity branch is defined as
ρ̃i =
E[c̃i(k)c̃
i (k − 1)]√
|c̃i(k)|2
|c̃i(k − 1)|2
Ci(1)− jDi(1)
Ci(0)
From (8), we note that ρ̃i is a complex quantity. It is a measure
of the fluctuation rate of the channel fading process. The mean
received SNR per symbol over the ith branch is defined as
|E1/2s ejφ(k)c̃i(k)|2
2EsCi(0)
We consider 2-, 4- and 8-DPSK with Gray encoding of bits
onto ∆φ(k) as shown in [4, Fig.1] for 4- and 8-DPSK, the
mean received SNR per bit γbi is given by γ
i = γi for 2-DPSK,
γbi = γi/2 for 4-DPSK, and γ
i = γi/3 for 8-DPSK.
Using the MAP criterion, the aim of the receiver is to
determine from the received signals {r̃i(k), r̃i(k − 1)}Li=1
which one of the possible values 2πm/M , m = 0, 1, · · · ,M−
1, of the phase difference ∆φ(k) has maximum probability
of occurrence. Following [9], it can be shown that MAP
detection is equivalent to maximum log-likelihood detection.
Specifically, based on {r̃i(k), r̃i(k − 1)}Li=1, we decide that
∆φ(k) = 2πn/M whenever the log-likelihood function
logΨm =
r̃i(k)
∣∣∣r̃i(k − 1),∆φ(k) =
is maximized for m = n.
To proceed with evaluating (10), we need to verify that
c̃i(k) = ai(k)+jbi(k) and c̃i(k−1) = ai(k−1)+jbi(k−1) are
jointly complex Gaussian. By being jointly complex Gaussian,
it means that if x̃ = xR + jxI and ỹ = yR + jyI are two
column complex random vector, then [xR
has a real multivatiate Gaussian probability density function
(PDF), and furthermore, if u = [xR
T ]T and v =
T ]T , then the real covariance matrix of [uT vT ]T
has a special form given in [18, Theorem 15.1] that satis-
fies Goodman’s theorem [19]. After careful examination, it
follows from (7) that c̃i(k) and c̃i(k − 1) are indeed jointly
complex Gaussian1. Thus, conditioned on c̃i(k − 1), c̃i(k) is
conditionally complex Gaussian with mean [18]
E [c̃i(k)|c̃i(k − 1)] = ρ̃i c̃i(k − 1) (11)
and variance
{∣∣c̃i(k)− E[c̃i(k)|c̃i(k − 1)]
∣∣∣c̃i(k − 1)
= 2Ci(0)− 2
C2i (1) +D
i (1)
Ci(0)
Moreover, conditioned on the vector [ai(k − 1) bi(k − 1)]T ,
the vector [ai(k) bi(k)]
T is conditionally Gaussian with
covariance matrix given by
Ci(0)− C
(1)+D2
Ci(0)
0 Ci(0)− C
(1)+D2
Ci(0)
which is a diagonal matrix. This shows that Re[c̃i(k)|c̃i(k−1)]
and Im[c̃i(k)|c̃i(k − 1)] are independent.
Applying (11), (12) and (13) to (10), we obtain
logΨm = ζ + (14)
2Es [Ci(1) + jDi(1)] e
−j 2πm
M r̃i(k)r̃
i (k − 1)
[2EsCi(0) +Ni]2 − 4E2s [C2i (1) +D2i (1)]
or, equivalently
logΨm = ζ + (15)
|ρ̃i| γi e−j∠ρ̃i
(1 + γi)2 − (|ρ̃i|γi)2
r̃i(k) r̃
i (k − 1)e−j
where ζ represents the constant term which does not affect
the decision. In (15), the quantities |ρ̃i| =
(1)+D2
∠ρ̃i = − tan−1
Di(1)
Ci(1)
represent the magnitude and phase of
the correlation coefficient ρ̃i given in (8), respectively.
1We also call them the proper complex Gaussian random variables [17].
Defining the real-valued weighting factors
|ρ̃i|γi
(1 + γi)2 − (|ρ̃i|γi)2
, (16)
it follows from (15) that the optimum combining differential
receiver will now compute, for the kth symbol, the decision
statistics {Λm(k)}M−1m=0 , and declares that ∆φ(k) =
Λn(k) = maxm {Λm(k)}, where
Λm(k) = Re
wi r̃i(k) r̃
i (k − 1) e−j∠ρ̃i
If the spectrum of the channel complex gain is symmetric,
ρ̃i is a real-valued quantity. Then, the optimum combining
differential receiver (17) will become [9]
Λ′m(k) = Re
wi r̃i(k) r̃
i (k − 1)
If the diversity branches are i.i.d., but the fading gains have
asymmetric spectrum, the optimum receiver will become
Λ′′m(k) = Re
M e−j∠ρ̃
r̃i(k) r̃
i (k − 1)
where ρ̃ = ρ̃i for i = 1, 2, · · · , L. For i.i.d. branches with
fading gains having symmetric spectrum, the optimum receiver
is the well-known product detector, given by [4]
Λ′′′m(k) = Re
r̃i(k) r̃
i (k − 1)
Comparing (20) with (17), we see that in the case of i.n.i.d.
channels with asymmetric power spectrum, the receiver first
rotates the product phasor r̃i(k)r̃
i (k − 1) between the two
received signal samples at each diversity branch by the angle
−∠ρ̃i, then scales each resulting phasor by the weight wi, and
finally sums all L rotated and scaled phasors to form a decision
variable. Clearly, in order to form the optimum detector (17),
besides the received signal samples r̃i(k) and r̃i(k − 1),
the receiver requires the a priori knowledge of the channel
statistics, including the power spectral densities of AWGN
Ni, both the magnitude and phase of the fading correlation
coefficient ρ̃i, and the mean received SNR γi. These quantities
can be pre-computed according to our knowledge of the
channel statistics at the receiver.
III. PERFORMANCE ANALYSIS
In this section, we will derive exact, explicit and closed-
form BEP expressions for differentially detected DPSK for
the optimum receiver (17). Due to space limitation, we only
consider 8-DPSK in this paper. The signal constellation, bit
mapping and the decision region Rm for 8-DPSK is shown in
Fig. 2. In [4] and [9], the average BEP is computed using
the binary reflected Gray code (BRGC) approach through
Hamming weight spectrum [20]. It is shown in [21] that the
BRGC approach with Hamming weight is less accurate for
M ≥ 16. In this paper, we adopt a new approach, namely,
the average BEP is obtained by calculating the BEP for each
of the three individual bits in 8-DPSK. This approach has
the advantage of showing explicitly the BEP performance
differently for the three different transmitted information bits.
Therefore, using the bit which has lower BEP to convey more
important information can improve communication reliability.
From Fig. 2, we see that each signal point is represented
by a 3-bit symbol (j1, j2, j3). We use Pj1 , Pj2 and Pj3 to
denote the corresponding individual BEP. Since the three bits
are equally likely, the average BEP is given by
(Pj1 + Pj2 + Pj3) (21)
We begin with computing Pj1 . Without loss of generality, it
is assumed that j1 = 0. The case where j1 = 1 gives an
identical result. From Fig. 2, we see that the bit j1 = 0 is
associated with the symbols 000 (∆φ(k) = 0), 001 (∆φ(k) =
π/4), 011 (∆φ(k) = π/2), and 010 (∆φ(k) = 3π/4). Thus,
conditioning on j1 = 0, the BEP Pj1 will be given by
Pj1 =
Pj1 (e|∆φ(k) = 0) + Pj1(e|∆φ(k) = π/4) (22)
+Pj1(e|∆φ(k) = π/2) + Pj1 (e|∆φ(k) = 3π/4)
Here, Pj1(e|∆φ(k) = mπ/4),m = 0, 1, 2, 3, is the probabil-
ity that conditioning on ∆φ(k) = mπ/4, the decision j1 = 1
is made. With reference to Fig. 2, this is equivalent to the
probability that conditioning on ∆φ(k) = mπ/4, the phasor∑L
i=1 wi r̃i(k) r̃
i (k − 1) e−j∠ρ̃i lies outside the half-plane
region R0+R1+R2+R3 (i.e., in the region R4+R5+R6+R7).
The BEP Pj1(e|∆φ(k) = mπ/4) is thus obtained as
e|∆φ(k) = mπ/4
8 (23)
wir̃i(k)r̃
i (k − 1)e−j∠ρ̃i
∣∣∣∣∣
∆φ(k) =
To evaluate (23), first, it follows from (11) and (12) that con-
ditioning on ∆φ(k) = mπ/4 and on r̃i(k− 1) ej∠ρ̃i ej3π/8 =
α̃i, for i = 1, 2, · · · , L, the quantity r̃i(k) is condition-
ally Gaussian with mean α̃i
ρ̃iγi
e−j∠ρ̃ie−j3π/8ejmπ/4 =
|ρ̃i|γi
e−j3π/8ejmπ/4, where ρ̃i = |ρ̃i|ej∠ρ̃i has been used,
and variance (1+γi)
2−(|ρ̃i|γi)2
Ni. Then, in (23) the quantity
Re[e−j
i=1 wir̃i(k)r̃
i (k − 1)e−j∠ρ̃i)] is conditionally
Gaussian with mean cos
mπ/4− 3π/8
i=1 wi
|ρ̃i|γi
|α̃i|2
and variance 1
i=1 w
(1+γi)
2−(|ρ̃i|γi)2
Ni |α̃i|2. Finally,
following the derivation procedure detailed in [9], the BEP
in (23) can be obtained as
e|∆φ(k) = mπ
= (24)
+ 1/λi
where the quantity Gi is given by
j=1,j 6=i
λi − λj
, and λi =
(|ρ̃i|γi)2
(1 + γi)2 − (|ρ̃i|γi)2
Putting (24) into (22) leads to the BEP Pj1. An interesting
observation from (24) is that the BEP does not depend on the
phase, ∠ρ̃i, of the fading correlation coefficient ρ̃i. Intuitively,
this is because the optimum receiver (17) can provide “phase
compensation” for each diversity branch before combining
using the channel statistic knowledge e−j∠ρ̃i . As such, we
expect that the receivers (18) and (20) are suboptimum over
the channel with asymmetric fading spectrum.
Next, we compute Pj2 in (21). The procedure for obtaining
the conditional BEP for j2 = 0 is parallel to that followed in
the case for j1 = 0. From Fig. 2, the bit j2 = 0 is associated
with the symbols 001 (∆φ(k) = π/4), 000 (∆φ(k) = 0),
100 (∆φ(k) = 7π/4), and 101 (∆φ(k) = 3π/2). Hence,
conditioning on j2 = 0, the BEP Pj2 is given by
Pj2 =
Pj2(e|∆φ(k) = π/4) + Pj2(e|∆φ(k) = 0) (26)
+Pj2(e|∆φ(k) = 7π/4) + Pj2 (e|∆φ(k) = 3π/2)
where Pj2(e|∆φ(k) = nπ/4), n = 0, 1, 6, 7, is the conditional
probability that the phasor
i=1 wir̃i(k)r̃
i (k− 1)e−j∠ρ̃i lies
in the half-plane region R2 +R3 +R4 +R5, i.e.,
e|∆φ(k) = nπ/4
8 (27)
wir̃i(k)r̃
i (k − 1)e−j∠ρ̃i
∣∣∣∣∣
∆φ(k) =
which has solution
Pj2 (e|∆φ(k) = nπ/4)= (28)
+ 1/λi
Putting (28) into (26) leads to the BEP Pj2.
Finally, we compute Pj3 in (21). From Fig. 2, the bit j3 = 0
is associated with the symbols 100 (∆φ(k) = 7π/4), 000
(∆φ(k) = 0), 010 (∆φ(k) = 3π/4), and 110 (∆φ(k) = π).
Thus, conditioning on j3 = 0, the BEP Pj3 is given by
Pj3 =
Pj3 (e|∆φ(k) = 7π/4) + Pj3(e|∆φ(k) = 0) (29)
+Pj3(e|∆φ(k) = 3π/4) + Pj3 (e|∆φ(k) = π)
where Pj3(e|∆φ(k) = lπ/4), l = 0, 3, 4, 7, is the conditional
probability that the phasor
i=1 wir̃i(k)r̃
i (k− 1)e−j∠ρ̃i lies
in the region R1 + R2 + R5 + R6. This is equivalent to the
conditional probability that after rotating by −π/8, the product
of the inphase and quadrature-phase components of the phasor∑L
i=1 wir̃i(k)r̃
i (k − 1)e−j∠ρ̃i is greater than zero, i.e.,
e|∆φ(k) = l π/4
= (30)
wir̃i(k)r̃
i (k − 1)e−j∠ρ̃i
wir̃i(k)r̃
i (k − 1)e−j∠ρ̃i
∣∣∣∣∣
∆φ(k) =
From the argument for deriving (24), we note that condi-
tioning on ∆φ(k) = lπ/4 and on r̃i(k − 1) ej∠ρ̃i ejπ/8 =
β̃i, for i = 1, 2, · · · , L, the inphase component in (30),
Re[e−j
i=1 wir̃i(k)r̃
i (k − 1)e−j∠ρ̃i)] is conditionally
Gaussian with mean cos
lπ/4− π/8
i=1 wi
|ρ̃i|γi
|β̃i|2
and variance 1
i=1 w
(1+γi)
2−(|ρ̃i|γi)2
Ni |β̃i|2. Similarly,
the component Im[e−j
i=1 wir̃i(k)r̃
i (k − 1)e−j∠ρ̃i)] in
(30) is also a conditionally Gaussian random variable,
with mean sin
lπ/4− π/8
i=1 wi
|ρ̃i|γi
|β̃i|2 and vari-
ance 1
i=1 w
(1+γi)
2−(|ρ̃i|γi)2
Ni |β̃i|2. Moreover, it fol-
lows from (13) and the properties of the complex Gaussian ran-
dom variables [18] that the conditional inphase and quadrature-
phase components Re[e−j
i=1 wir̃i(k)r̃
i (k − 1)e−j∠ρ̃i)]
and Im[e−j
i=1 wir̃i(k)r̃
i (k−1)e−j∠ρ̃i)] in (30) are also
independent. Therefore, conditioning on ∆φ(k) = lπ/4 and
on r̃i(k − 1) ej∠ρ̃i ejπ/8 = β̃i, and denoting the inphase and
quadrature-phase components as
X ∼ N
lπ/4− π/8
u, η2
Y ∼ N
lπ/4− π/8
u, η2
where u and η2 are given, respectively, by
|ρ̃i|γi
1 + γi
|β̃i|2
(1 + γi)
2 − (|ρ̃i|γi)2
1 + γi
Ni |β̃i|2 (32)
the conditional BEP Pj3
∣∣∆φ(k) = lπ
, β̃i
is given by
∣∣∆φ(k) =
, β̃i
X Y > 0
∣∣∆φ(k) =
, β̃i
This is probability that the product of two independent real-
valued Gaussian random variables with non-zero, nonidentical
means and identical variances is greater than zero. This is a
special case of the results given in [2, Appendix B] concerning
the probability that a general quadratic form in complex-valued
Gaussian random variables is less than zero. Using [2, (B-21)
of Appendix B], (33) can be evaluated as
∣∣∆φ(k) =
, β̃i
= 1− (34)
g[1− sin (lπ/2− π/4)],
g[1 + sin (lπ/2− π/4)]
I0 [g| cos (lπ/2− π/4)|] exp(−g)
where, Q1(a, b) is first-order Marcum’s Q-function and Ik(x)
is the kth-order modified Bessel function of the first kind. In
(34), the quantity g =
i=1 w
i |β̃i|2 has PDF given by [9]
p(g) =
w′i Ni (1 + γi)
w′iNi (1 + γi)
where w′i =
(|ρ̃i|γi)2
(1+γi)[(1+γi)2−(|ρ̃i|γi)2] . Averaging the condi-
tional probability (34) over g using the PDF (35) gives the
BEP Pj3
∣∣∆φ(k) = lπ/4
in (30), i.e.,
∣∣∆φ(k) =
∣∣∆φ(k) =
, β̃i
p(g)dg
Substituting (34) and (35) into (36), we obtain, after manipu-
lation and simplification,
∣∣∆φ(k) =
A2i − cos2 ( lπ2 −
1− | cos (
−cos2 ( lπ
− 1/2√
A2i − cos2 ( lπ2 −
where Ai is given by
1 + γi
|ρ̃i|γi
Putting (37) into (29) leads to the BEP Pj3. Substituting (22),
(26) and (29) in (21), we obtain the average BEP P .
IV. NUMERICAL EXAMPLE
Fig. 3 plots the BEP performance for the three individual
bits in (22), (26) and (29) and the average BEP in (21) of
8-DPSK against the total average received SNR per bit. The
order of diversity is set to L = 2. The abscissa represents the
total mean SNR per bit which is given by γb =
i=1 γ
i=1 γi. The average received bit energy distribution among
the two branches is set to γb1 : γ
2 = 30% : 70%. It is
assumed that the fading correlation coefficient (the normalized
covariance function) model follows [14, eq.(10)], given by
E[c̃i(t)c̃
i (t− τ)]
|c̃i(t)|2
κ2 − 4π2f2dτ2 + j4πκfdτ
I0(κ)
where fd is the Doppler frequency, and κ is a parameter that
controls the width of the angle of arrival of scatter components
[14, eq.(1)]. Note that if κ = 0, (38) results in the correlation
coefficient for the Jakes two-dimensional isotropic scattering
model, i.e., E[c̃i(t)c̃
i (t− τ)]/E
|c̃i(t)|2
= I0(j2πfmτ) =
J0(2πfmτ), where J0(·) is the zeroth-order Bessel function.
We assume that the normalized Doppler spread fdT = 0.03
and 0.05 for diversity branches 1 and 2, respectively, and the
parameter κ is set to 3. Thus, we have ρ̃1 = 0.9871+ j0.1519
and ρ̃2 = 0.9642 + j0.2511. It is seen from Fig. 3 that the
third bit j3 has the lowest BEP, whereas, the BEP Pj1 for the
first bit j1 is equal to the BEP Pj2 for the second bit j2.
REFERENCES
[1] M. Schwartz, W.R. Bennett, and S. Stein, Communication Systems and
Techniques, New York: McGraw-Hill, 1966.
[2] J. G. Proakis, Digital Communications, 4th edition, New York: McGraw-
Hill, 2001.
[3] M. K. Simon and M. S. Alouini,, Digital communication over fading
channels, 2nd Edition, New York: John Wiley & Sons, 2005.
[4] P. Y. Kam, “Bit error probabilities of MDPSK over the nonselective
Rayleigh fading channel with diversity reception,” IEEE Trans. Com-
mun., vol.39, pp.220-224, February 1991.
[5] M.Z. Win and J.H. Winters, “Analysis of hybrid selection/maximal-ratio
combining of diversity channels with unequal SNR in Rayleigh fading,”
Proc. 49th IEEE VTC, pp.215-220, May 16-20, 1999.
[6] P. Polydorou and P. Ho, “Error performance of MPSK with diversity
combining in non-uniform Rayleigh fading and non-ideal channel esti-
mation,” Proc. 51st IEEE VTC, pp.627-631, May 15-18, 2000.
[7] F. Adachi, “Postdetection optimal diversity combiner for DPSK differen-
tial detection,” IEEE Trans. Veh. Technology, vol.42, pp.326-337, August
1993.
[8] H. Fu and P. Y. Kam, “Performance of Optimum and Suboptimum Com-
bining Diversity Reception for Binary DPSK over Independent, Non-
identical Rayleigh Fading Channels,” Proc. 40th IEEE ICC, pp.2367-
2371, May 16-20, 2005.
[9] H. Fu and P. Y. Kam, “MDPSK diversity receiver over Rayleigh fading
channels with differential detection and nonidentical branch statistics,”
Proc. 63rd IEEE VTC, pp.1660-1664, May 7-10, 2006.
[10] H. Fu and P. Y. Kam, “Performance of Optimum and Suboptimum
Combining Diversity Reception for Binary and Quadrature DPSK over
Independent, Nonidentical Rayleigh Fading Channels,” to appear in the
IEEE Trans. Commun., May 2007.
[11] L. J. Mason “Error probability evaluation for systems employing differ-
ential detection in a Rician fast fading environment and Gaussian noise,”
IEEE Trans. Commun., vol.35, pp.39-46, January 1987.
[12] W.C. Jakes, Microwave Mobile Communications, NJ: IEEE Press, 1974.
[13] M. Patzold, Y.C. Li and F. Laue “A study of a land mobile satel-
lite channel model with asymmetrical Doppler power spectrum and
lognormally distributed line-of-sight component,” IEEE Trans. Veh.
Technology, vol.47, pp.297-310, February 1998.
[14] A. Abdi, J.A. Barger and M. Kaveh “A parametric model for the
distribution of the angle of arrival and the associated correlation function
and power spectrum at the mobile station,” IEEE Trans. Veh. Technology,
vol.51, pp.425-434, May 2002.
[15] K. Anim-Appiah “Complex envelope correlations for nonisotropic scat-
tering,” Electron. Lett., vol.34, pp.918-919, April 1998.
[16] W. B. Davenport and W. L. Root, An introduction to the theory of
random signals and noise, New York: McGraw-Hill, 1958
[17] F. D. Neeser and J.L. Massey “Proper complex random processes with
applications to information theory,” IEEE Trans. Inform. Theory, vol.39,
pp.1293-1302, July 1993.
[18] S.M. Kay, Fundamentals of Statistical Signal Processing: Estimation
Theory, New Jersey: Prentice-Hall, 1998.
[19] A. Papoulis and S.U. Pillai, Probability, Random Variables and Stochas-
tic Processes, 4th Ed., MA: McGraw-Hill, 2002.
[20] P. J. Lee, “Computation of the bit error rate of coherent M-ary PSK
with Gray code bit mapping,” IEEE Trans. Commun., vol.34, pp.488-
491, May 1986.
[21] J. Lassing, E.G. Strom, E. Agrell and L. Ottosson, “Computation of the
exact bit-error rate of coherent M-ary PSK with Gray code bit mapping,”
IEEE Trans. Commun., vol.51, pp.1758-1760, November 2003.
Fig. 1. Illustration of complex channel fading process.
Fig. 2. 8-DPSK constellation and decision region.
10 15 20 25 30 35 40
SNR (γ
), dB
Fig. 3. BEP comparison of the three individual bits and the average of all
bits for 8-DPSK.
	INTRODUCTION
	SIGNAL MODEL AND RECEIVER STRUCTURE
	PERFORMANCE ANALYSIS
	NUMERICAL EXAMPLE
	References
ABSTRACT
  This paper is concerned with optimum diversity receiver structure and its
performance analysis of differential phase shift keying (DPSK) with
differential detection over nonselective, independent, nonidentically
distributed, Rayleigh fading channels. The fading process in each branch is
assumed to have an arbitrary Doppler spectrum with arbitrary Doppler bandwidth,
but to have distinct, asymmetric fading power spectral density characteristic.
Using 8-DPSK as an example, the average bit error probability (BEP) of the
optimum diversity receiver is obtained by calculating the BEP for each of the
three individual bits. The BEP results derived are given in exact, explicit,
closed-form expressions which show clearly the behavior of the performance as a
function of various system parameters.

<|endoftext|><|startoftext|>
Introduction to the Theory of Su-
perfluidity (Addison-Wesley, New York, 1989).
[18] D. T. Son, Int. J. Mod. Phys. A16S1C, 1284 (2001).
[19] M. E. Gusakov and N. Andersson, Mon. Not. Astron.
Soc. 372, 1776 (2006).
[20] B. Carter, in: Journeés Relativistes 1976, edited by Ca-
hen M., Debever R., Geheniau J. (Université Libre de
Bruxelles, Brussells, 1976), pp. 12–27
[21] B. Carter, in: Journeés Relativistes 1979, edited by
Moret-Baily I. and Latremoliére C. (Faculté des Sciences,
Anger, 1979), pp. 166–182
[22] B. Carter, in: A Random Walk in Relativity and Cosmol-
ogy, Proceedings of the Vadya-Raychaudhuri Festschrift,
IAGRG, 1983, edited by Dadhich N., Krishna Rao J.,
Narlikar J.V., and Vishveshwara C.V. (Wiley Eastern,
Bombay, 1985), pp. 48–62
[23] I.M. Khalatnikov and V.V. Lebedev, Phys. Lett. A91,
70 (1982).
[24] V.V. Lebedev and I.M. Khalatnikov, Zh. Eksp. Teor. Fiz.
83, 1623 (1982) [Sov. Phys. JETP 56, 923 (1982)].
[25] G. L. Comer, D. Langlois, and L. M. Lin, Phys. Rev.
D60, 104025 (1999).
[26] N. Andersson, G. L. Comer, and D. Langlois, Phys. Rev.
D66, 104002 (2002).
[27] S. Yoshida and U. Lee, Phys. Rev. D67, 124019 (2003).
[28] L. Tisza, Nature 141, 913 (1938).
[29] L.D. Landau, Zh. Eksp. Teor. Fiz. 11, 592 (1941).
[30] L.D. Landau, J. Physics 11, 91 (1947).
[31] I.M. Khalatnikov, Zh. Eksp. Teor. Fiz. 23, 169 (1952).
[32] N. Andersson and G. L. Comer, Living Reviews in Rela-
tivity 10, 1 (2007).
[33] S. Weinberg, Astrophys. J. 168, 175 (1971).
[34] I. M. Khalatnikov, Zh. Eksp. Teor. Fiz. 23, 169 (1952).
[35] C. Pujol and D. Davesne, Phys. Rev.C67, 014901 (2003).
[36] G. Mendell, Astrophys. J. 380, 515 (1991a).
[37] G. Mendell, Astrophys. J. 380, 530 (1991b).
[38] L. Lindblom and G. Mendell, Astrophys. J. 421, 689
(1994).
[39] N. Andersson and G. L. Comer, Mon. Not. R. Astron.
Soc. 328, 1129 (2001).
[40] N. Andersson and G. L. Comer, Class. Quant. Grav. 23,
5505 (2006).
[41] A. F. Andreev and E. P. Bashkin, Zh. Eksp. Teor. Fiz.
69, 319 (1975) [Sov. Phys. JETP 42, 164 (1976)].
[42] M. Borumand, R. Joynt, and W. Kluźniak, Phys. Rev.
C54, 2745 (1996).
[43] M. E. Gusakov and P. Haensel, Nucl. Phys. A761, 333
(2005).
[44] D. G. Yakovlev, A. D. Kaminker, O. Y. Gnedin, and P.
Haensel, Phys. Rep. 354, 1 (2001).
[45] R. F. Sawyer, Phys. Rev. D39, 3804 (1989).
[46] P. Haensel, R. Schaeffer, Phys. Rev. D45, 4708 (1992).
[47] E. M. Lifshitz and L. P. Pitaevskii, Statistical Physics,
Part 2, (Pergamon Press, Oxford, 1980).
[48] R. P. Feynman, Statistical Mechanics, (Benjamin, Mas-
sachusetts, 1972).
[49] A. Reisenegger, Astrophys. J. 442, 749 (1995).
[50] M. E. Gusakov, D. G. Yakovlev, and O. Y. Gnedin, Mon.
Not. Astron. Soc. 361, 1415 (2005).
[51] R. I. Epstein, Astrophys. J. 333, 880 (1988).
[52] H. Heiselberg and M. Hjorth-Jensen, Astrophys. J. 525,
L45 (1999).
[53] A. D. Jackson, E. Krotscheck, D. E. Meltzer, and R. A.
Smith, Nucl. Phys. A386, 125 (1982).
[54] R. Fernández and A. Reisenegger, Astrophys. J. 625, 291
(2005).
[55] A. Reisenegger, P. Jofre, R. Fernandez, and E. Kantor,
Astrophys. J. 653, 568 (2006).
[56] P. Jofré, A. Reisenegger, and R. Fernández, Phys. Rev.
Lett. 97, 131102 (2006).
[57] A. Reisenegger, Astrophys. J. 485, 313 (1997).
http://arxiv.org/abs/astro-ph/0012209
http://arxiv.org/abs/astro-ph/0607643
ABSTRACT
  The hydrodynamics, describing dynamical effects in superfluid neutron stars,
essentially differs from the standard one-fluid hydrodynamics. In particular,
we have four bulk viscosity coefficients in the theory instead of one. In this
paper we calculate these coefficients, for the first time, assuming they are
due to non-equilibrium beta-processes (such as modified or direct Urca
process). The results of our analysis are used to estimate characteristic
damping times of sound waves in superfluid neutron stars. It is demonstrated
that all four bulk viscosity coefficients lead to comparable dissipation of
sound waves and should be considered on the same footing.

<|endoftext|><|startoftext|>
Introduction
Experimental situation on the existence of Θ+ is currently filled with conflicting
positive and negative evidence and the question is still not yet fully settled.1)
Even if it turns out that Θ+ does not exist, it would still be theoretically interest-
ing to understand the absence of such an exotic. Many theoretical approaches have
been used, including quark models, QCD sum rules, and lattice QCD, in addition to
the chiral soliton model, to understand the properties and structure of Θ+ since its
first sighting.2) One of the most intriguing theoretical ideas suggested for Θ+ is the
diquark picture of Jaffe and Wilczek (JW)3) in which Θ+ is considered as a three-
body system consisted of two scalar, isoscalar, color 3̄ diquarks (D’s) and a strange
antiquark (s̄). It is based, in part, on group theoretical consideration. It would hence
be desirable to examine such a scheme from a more dynamical perspective.
It is known that diquark arises naturally from Nambu-Jona-Lasinio (NJL) model,
an effective quark theory in the low energy region.4) NJL model conveniently incor-
porates chiral symmetry and its spontaneously breaking which dictates the hadronic
physics at low energy. Models based on NJL type of Lagrangians have been very
successful in describing the low energy meson physics. Based on relativistic Fad-
∗) e-mail address: mineo@gate.sinica.edu.tw
∗∗) e-mail address: J.A.Tjon@phys.uu.nl
∗∗∗) e-mail address: tsushima@usal.es
†) e-mail address: snyang@phys.ntu.edu.tw
typeset using PTPTEX.cls 〈Ver.0.9〉
http://arxiv.org/abs/0704.1072v2
2 H. Mineo, J.A. Tjon, K. Tsushima, and S.N. Yang
deev equation, the NJL model has also been applied to the baryon systems.5), 6)
It has been shown that, using the quark-diquark approximation, one can explain
the nucleon static properties and the qualitative features of the empirical valence
quark distribution reasonably well.6) Consequently, we will employ NJL model to
describe the dynamics of a (s̄DD) three-particle system in Faddeev formalism. We
use relativistic equations to describe both the three-particle and its two-particle sub-
systems, namely, the Bethe-Salpeter-Faddeev (BSF) equation7) and Bethe-Salpeter
(BS) equations. In practice, Blankenbecler-Sugar reduction scheme is used to reduce
the four-dimensional integral equation into three-dimensional ones.
§2. SU(3)f NJL model and the diquark
The SU(3)f NJL model is a chirally symmetric four-fermi contact interaction
Lagrangian. With the use of Fierz transformations, the original NJL interaction
Lagrangian LI can be rewritten, for the qq̄ channel, as
LI,qq̄ = G1
(ψ̄λafψ)
2 − (ψ̄γ5λafψ)
(ψ̄γµλafψ)
2 + (ψ̄γµγ5λafψ)
(ψ̄γµλ0fψ)
2 + (ψ̄γµγ5λ0fψ)
(ψ̄γµλ0fψ)
2 − (ψ̄γµγ5λ0fψ)
+ · · · , (2.1)
where a = 0 ∼ 8, and λ0
I. For later use, we define G5 = G2 +
Gv , with
Gv ≡ G3 + G4. For the scalar, isoscalar diquark channel, interaction Lagrangian is
given by
LI,s = Gs
ψ̄(γ5C)λ2fβ
ψT (C−1γ5)λ2fβ
, (2.2)
where βAc =
λA(A = 2, 5, 7) corresponds to one of the color 3̄c states. C = iγ
is the charge conjugation operator, and λ′s are the Gell-Mann matrices.
The constituent quark and diquark masses can be obtained from the gap equation
and t-matrix of the diquark. Since we are only interested in a qualitatively study of
the interactions inside Θ+, we will use the empirical values of the constituent quark
masses Mu,d = 400 MeV, Ms = 600 MeV, and the diquark mass MD = 600 MeV as
obtained in Ref.8)
§3. Two-body interactions for s̄D and DD channels
In the JW model for Θ+,3) symmetry consideration requires that the the spatial
wave function of the two scalar-isoscalar, color 3̄ diquarks must be antisymmetric
and the lowest possible state is p-state. Since Θ+ is of JP = 1
, s̄ would be in
relative s-wave to the DD pair. Accordingly, we will consider only the configuration
where s̄D and DD are in relative s- and p-waves, respectively.
Fig. 1 shows the lowest order diagram, i.e., first order in LI,qq̄ in s̄D scat-
tering. Trace properties in Dirac and flavor space limit the vertex Γ to only the
vector-isoscalar term, (ψ̄γµλ0
ψ)2. For the DD interaction, the quark rearrangement
Faddeev Calculation for Pentaquark Θ+ 3
diagram gives no contribution because of its color structure. The lowest order non-
vanishing diagram of the first order in LI,qq̄ is given in Fig. 2 where only the direct
term is shown. The corresponding exchange diagram vanishes again because of the
color structure.
Γ=λ f
aγµ (a=0,8)
Fig. 1. s̄D potential of the lowest order
in LI,qq̄ .
Fig. 2. Lowest order diagrams in DD
scattering.
With the use of the interaction Lagrangians of Eqs. (2.1-2.2), we obtain the
following driving terms of Fig. 1 and 2, in the BS equations for s̄D and DD two-
particle systems,
< s̄fDf |V |s̄iDi > = (−v̄(ps̄i))(−iVs̄D)(pDi, pDf )v(ps̄f ),
Vs̄D =
GvFv(q
2)Ṽs̄D(pDi, pDf ), Ṽs̄D(pDi, pDf ) = (6pDi + 6pDf )/2 (3.1)
− iVDD(~pDi, ~pDf ) = 128i
2)−G5(pD1i + pD1f ) · (pD2i + pD2f )F
,(3.2)
where ps̄ and pD denote the four-momentum of the s̄-quark and diquark etc. The Fv
and Fs are the vector and scalar form factors of the scalar diquark. For simplicity,
we will assume that both take the dipole form, (1−q2/Λ2)−2, with Λ = 0.84 GeV. In
the NJL model calculation with the Pauli-Villars cutoff,8) the coupling constants are
related to the mesonic coupling constants by G1 = Gπ/2, G2 = Gρ/2 and G5 = Gω/2
which give Gv = −0.78 GeV
−2. We remark that the sign of Gv is definitely negative
since omega meson is heavier than the rho meson.
The potential matrix elements of Eqs. (3.1-3.2) can then be used in the scattering
equations obtained with the use of Blankenbecler-Sugar three-dimensional reduction
scheme7) for the BS equation for both the s̄D and DD systems. The resultant
scattering equations are solved to obtain the two-body t−matrix elements and the
phase shifts.
§4. Results for Θ+ and discussion
Our results for the phase shifts are shown in Fig. 3(a). We see that the s-wave
phase shifts for s̄D is positive which indicates the interaction is attractive, while the
p-wave DD interaction is repulsive since their phase shift is negative. In Fig. 3(b)
4 H. Mineo, J.A. Tjon, K. Tsushima, and S.N. Yang
we show the Gv dependence of the two-body s̄D binding energy. We see that with
the type of interaction constructed in Sec. 3, a s̄D bound state begins to appear only
when Gv becomes less than −5 ∼ −6 Gev
−2, far too negative as compared to the
physical value of -0.78 Gev−2 determined from the ρ− ω mass difference.
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5
pE [GeV](a)
6 8 10 12 14
-Gv [GeV](b)
Fig. 3. (a) Phase shifts δl for the s̄D in s-wave with Gv = −0.78 GeV
−2 (solid line) and DD in
p-wave (dashed line). (b) Gv dependence of the s̄D binding energy.
The three-body BSF equation7) takes the same form as the nonrelativistic one,
Ti(s) = ti(s) + ti(s)G0(s) [Tj(s) + Tk(s)] , (4.1)
where G0 is the free three-particle Green’s function and ti(s) is the two-particle
t−matrix of particles j and k with (i, j, k) being a cyclic permutation of (1, 2, 3). If
one uses the Blankenbecler-Sugar approximation for G0 and the two-body t−matrix
elements obtained in Sec. 3, then the homogeneous equation of Eq. (4.1) can be
solved9) to look for a possible three-body s̄DD bound state. We could not find
a bound pentaquark in JP = 1
channel. However, we do see a pentaquark in
JP = 1
channel when Gv becomes less that ∼ −8.0 GeV
−2. Pentaquark binding
energy EB(5q) grows from 77 to 505 MeV as Gv decreases from -8.0 to -14.0 GeV
The effect of consierdable weaker attraction in the JP = 1/2+ channel is caused by
the spectator particle being in a p-wave state.
Acknowledgements
One of the authors (S.N.Y.) thanks the Yukawa Institute for Theoretical Physics
at Kyoto University, for warm hospitality extended to him during the YKIS2006 on
”New Frontiers on QCD”.
References
1) T. Nakano, talk at YKIS2006 (Kyoto, Japan, November 20 - December 8, 2006).
2) T. Nakano et al. [LEPS Collaboration], Phys. Rev. Lett. 91, (2003) 012002.
3) R.L. Jaffe and F. Wilczek, Phys. Rev. Lett. 91, (2003) 232003.
4) Y. Nambu and G. Jona-Lasinio, Phys. Rev. 122, 345 (1960); 124, 246 (1961).
5) S. Huang and J. Tjon, Phys. Rev. C49, 1702 (1994).
6) H. Mineo et al., Phys. Rev. C60, 065201 (1999); Nucl. Phys. A703, 785 (2002).
7) G. Rupp and J.A. Tjon, Phys. Rev. C37, (1988) 1729.
8) H. Mineo et al., Phys. Rev. C72, (2005) 025202.
9) A. Ahmadzadeh and J. Tjon, Phys. Rev. 147, (1966) 1111.
	Introduction
	SU(3)f NJL model and the diquark
	Two-body interactions for D and DD channels
	Results for + and discussion
ABSTRACT
  A Bethe-Salpeter-Faddeev (BSF) calculation is performed for the pentaquark
$\Theta^+$ in the diquark picture of Jaffe and Wilczek in which $\Theta^+$ is a
diquark-diquark-${\bar s}$ three-body system.
  Nambu-Jona-Lasinio (NJL) model is used to calculate the lowest order diagrams
in the two-body scatterings of ${\bar s}D$ and $D D$. With the use of coupling
constants determined from the meson sector, we find that ${\bar s}D$
interaction is attractive while $DD$ interaction is repulsive, and there is no
bound $\frac 12^+$ pentaquark state. A bound pentaquark $\Theta^+$ can only be
obtained with unphysically strong vector mesonic coupling constants.

<|endoftext|><|startoftext|>
arXiv:0704.1075v2  [hep-ph]  3 May 2007
Preprint typeset in JHEP style - HYPER VERSION TUHEP-TH-07156
New Terms for the Compact Form of Electroweak
Chiral Lagrangian
Hong-Hao Zhang, Wen-Bin Yan, and J. K. Parry
Center for High Energy Physics & Department of Physics, Tsinghua University,
Beijing 100084, China
E-mail: zhanghonghao@tsinghua.org.cn, wenbin.yan@gmail.com,
jkparry@tsinghua.edu.cn
Xue-Song Li
Science College, Hunan Agricultural University, Changsha 410128, China
E-mail: lixuesong@tsinghua.org.cn
Abstract: The compact form of the electroweak chiral Lagrangian is a reformulation of
its original form and is expressed in terms of chiral rotated electroweak gauge fields, which
is crucial for relating the information of underlying theories to the coefficients of the low-
energy effective Lagrangian. However the compact form obtained in previous works is not
complete. In this letter we add several new chiral invariant terms to it and discuss the
contributions of these terms to the original electroweak chiral Lagrangian.
Keywords: Electroweak Chiral Lagrangian, Beyond Standard Model.
http://arxiv.org/abs/0704.1075v2
mailto:zhanghonghao@tsinghua.org.cn
mailto:wenbin.yan@gmail.com
mailto:jkparry@tsinghua.edu.cn
mailto:lixuesong@tsinghua.org.cn
http://jhep.sissa.it/stdsearch
So far the postulated Higgs particle of the standard model has not been observed in ex-
periments. We do not know what the electroweak symmetry breaking mechanism in nature
is. The electroweak chiral Lagrangian is a general low-energy description for electroweak
symmetry breaking patterns [1, 2], especially for those strong dynamical electroweak sym-
metry breaking mechanisms [3]. All the coefficients of the electroweak chiral Lagrangian,
in principle, can be fixed by experiments [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. Due to the non-
perturbative property of the possible strong dynamics, it is difficult to relate the measured
coefficients of the electroweak chiral Lagrangian to underlying theories. Recently, the series
of work of Ref. [14, 15, 16, 17, 18, 19, 20, 21] successfully produced the predictions for the
coefficients of the chiral Lagrangian from the underlying theory of QCD, which lights up
the hope of building up the relationship between the coefficients of the electroweak chiral
Lagrangian and underlying strong dynamical models. As seen in Ref. [14], a crucial first
step of the derivation of the chiral Lagrangian from QCD is to reformulate the original
chiral Lagrangian in terms of chiral rotated external source fields. For the case of the elec-
troweak chiral Lagrangian, we need to similarly reformulate it in terms of chiral rotated
electroweak gauge fields in order to deduce the information of underlying theories. An
attempt of this line of thought is the work of Ref. [22], which tries to give a so-called com-
pact form of the electroweak chiral Lagrangian in terms of chiral rotated electroweak gauge
fields. However, there are still several relevant terms not included in the reformulation of
Ref. [22]. In this letter, we shall present 3 new terms for this reformulation and give their
contributions to the original electroweak chiral Lagrangian.
We refer the interested reader to Ref. [22] for all the details on the compact form of the
electroweak chiral Lagrangian and the relations between the compact form and the original
form. Besides the inner-product terms c2g
b′µV b
′ν , c8g
ν − dνV
a′µAν ,
and c9g
(dµV a
ν already included in the compact form of Ref. [22], there should be
3 new cross-product terms which are given by,
∆L = c′
2 + c′
a′b′(dµV
ν − dνV
ν + c′
a′b′(dµV a
where a′, b′ run from 1 to 2, and ǫ12 = −ǫ21 = 1. Here and henceforth V a
′µ and Aµ
are short for the chiral rotated gauge fields V
and A
in Ref. [22] respectively. There
are also 3 cross-product terms corresponding to the c15,17,25-terms in that paper, but they
vanish. Let us consider Eq. (1) term by term. From the definitions of Ref. [22], it is
straightforward to obtain the relation between the first term of Eq. (1) and the ordinary
terms of the electroweak chiral Lagrangian as follows,
2 = −4c′
[tr(XµXν)]
− [tr(XµX
µ)]2 − tr(XµXν)tr(τ
µ)tr(τ3Xν)
+tr(XµX
µ)[tr(τ3Xν)]
, (2)
where Xµ ≡ U
†(DµU). And the last two terms of Eq. (1) are respectively given by,
′b′(dµV
ν − dνV
b′µAν
– 1 –
tr(τ3Xν)
tr(τ3Xµ)tr(XµXν)− tr(τ
3Xν)tr(XµX
+ itr(W µνX
tr(τ3W µν)tr(τ
, (3)
with Wµν ≡ U
W aµνU , and
′b′(dµV a
(1− 4β1)c
tr(τ3Xµ)tr(τ3Xν)tr(XµXν)
−[tr(τ3Xµ)tr(τ
. (4)
From Eqs. (2), (3) and (4), we obtain the contributions of these 3 new terms to the original
electroweak chiral Lagrangian as follows,
∆α3 =
, ∆α4 = −4c
, ∆α5 = 4c
∆α6 = 4c
(1− 4β1)c
, ∆α7 = −4c
∆α9 = −
, ∆α10 = (1− 4β1)c
. (5)
The coefficients c′
2,8,9 and ci (i = 1, 2, . . . , 25) in this compact form of the electroweak chiral
Lagrangian are determined by the underlying ultraviolet theories. For example, if we take
the one-doublet technicolor model as the underlying theory, it can be shown that these 3
new coefficients c′
2,8,9 are all non-vanishing, and full details will be presented in forthcoming
publications [23].
In summary, we have provided 3 new terms to the compact form of the electroweak
chiral Lagrangian introduced in Ref. [22]. These additional terms were not considered in
the previous work. In this letter we have related these new terms to the original electroweak
chiral Lagrangian, which will be crucial in forthcoming studies of strong dynamical models.
Acknowledgments
We are indebted to Qing Wang for all our knowledge about the chiral Lagrangian and his
helps and supports for this work. This work is supported in part by the National Natural
Science Foundation of China.
References
[1] T. Appelquist and C. W. Bernard, Phys. Rev. D 22, 200 (1980). T. Appelquist and
G. H. Wu, Phys. Rev. D 48, 3235 (1993) [arXiv:hep-ph/9304240].
[2] A. C. Longhitano, Phys. Rev. D 22, 1166 (1980); Nucl. Phys. B 188, 118 (1981).
[3] C. T. Hill and E. H. Simmons, Phys. Rept. 381, 235 (2003) [Erratum-ibid. 390, 553 (2004)]
[arXiv:hep-ph/0203079].
– 2 –
[4] J. F. Donoghue, C. Ramirez and G. Valencia, Phys. Rev. D 39, 1947 (1989); J. F. Donoghue
and C. Ramirez, Phys. Lett. B 234, 361 (1990). A. Dobado and M. T. Urdiales, Z. Phys. C
71, 659 (1996) [arXiv:hep-ph/9502255].
[5] H. J. He, Y. P. Kuang and C. P. Yuan, Phys. Lett. B 382, 149 (1996)
[arXiv:hep-ph/9604309]; Phys. Rev. D 55, 3038 (1997) [arXiv:hep-ph/9611316]; “Global
analysis for probing electroweak symmetry breaking mechanism at high energy colliders,”
arXiv:hep-ph/9704276.
[6] H. J. He, “Quartic gauge boson couplings,” arXiv:hep-ph/9804210, Invited overview talk
published in the Proceedings of Workshop on Physics at the First Muon Collider,
pp. 685-700, Fermilab, Batavia, USA, Nov. 6-9, 1997; T. Han, H. J. He and C. P. Yuan,
Phys. Lett. B 422, 294 (1998) [arXiv:hep-ph/9711429].
[7] S. Alam, S. Dawson and R. Szalapski, Phys. Rev. D 57, 1577 (1998) [arXiv:hep-ph/9706542].
[8] H. J. He, Y. P. Kuang, C. P. Yuan and B. Zhang, Phys. Lett. B 554, 64 (2003)
[arXiv:hep-ph/0211229].
[9] B. Zhang, Y. P. Kuang, H. J. He and C. P. Yuan, Phys. Rev. D 67, 114024 (2003)
[arXiv:hep-ph/0303048].
[10] H. J. He, Y. P. Kuang, C. P. Yuan and B. Zhang, arXiv:hep-ph/0401209, Published in the
Proceedings of the 3rd Les Houches Workshop: Physics at TeV Colliders, Les Houches,
France, May 26-Jun6, 2003.
[11] R. S. Chivukula, E. H. Simmons, H. J. He, M. Kurachi and M. Tanabashi, Phys. Rev. D 72,
075012 (2005) [arXiv:hep-ph/0508147].
[12] T. Han, Y. P. Kuang and B. Zhang, Phys. Rev. D 73, 055010 (2006)
[arXiv:hep-ph/0512193].
[13] E. Boos, H. J. He, W. Kilian, A. Pukhov, C. P. Yuan and P. M. Zerwas, Phys. Rev. D 57,
1553 (1998) [arXiv:hep-ph/9708310]; Phys. Rev. D 61, 077901 (2000)
[arXiv:hep-ph/9908409]. M. Beyer, W. Kilian, P. Krstonosic, K. Monig, J. Reuter,
E. Schmidt and H. Schroder, Eur. Phys. J. C 48, 353 (2006) [arXiv:hep-ph/0604048].
[14] Q. Wang, Y. P. Kuang, M. Xiao and X. L. Wang, Phys. Rev. D 61, 054011 (2000)
[arXiv:hep-ph/9903201].
[15] Q. Wang, Y. P. Kuang, X. L. Wang and M. Xiao, arXiv:hep-ph/9910289.
[16] X. L. Wang, Z. M. Wang and Q. Wang, Commun. Theor. Phys. 34, 683 (2000).
[17] X. L. Wang and Q. Wang, Commun. Theor. Phys. 34, 519 (2000).
[18] H. Yang, Q. Wang, Y. P. Kuang and Q. Lu, Phys. Rev. D 66, 014019 (2002)
[arXiv:hep-ph/0203040].
[19] Q. Wang, Y. P. Kuang, H. Yang and Q. Lu, J. Phys. G 28, L55 (2002)
[arXiv:hep-ph/0209201].
[20] Q. Wang, Int. J. Mod. Phys. A 20, 1627 (2005).
[21] Y. L. Ma and Q. Wang, Phys. Lett. B 560, 188 (2003) [arXiv:hep-ph/0302143].
[22] Z. M. Wang and Q. Wang, Commun. Theor. Phys. 36, 417 (2001).
[23] H. H. Zhang, S. Z. Jiang and Q. Wang, “Dynamical Computation on Coefficients of
Electroweak Chiral Lagrangian from One-doublet and Topcolor-assisted Technicolor
Models,” arXiv:0705.0115 [hep-ph].
– 3 –
ABSTRACT
  The compact form of the electroweak chiral Lagrangian is a reformulation of
its original form and is expressed in terms of chiral rotated electroweak gauge
fields, which is crucial for relating the information of underlying theories to
the coefficients of the low-energy effective Lagrangian. However the compact
form obtained in previous works is not complete. In this letter we add several
new chiral invariant terms to it and discuss the contributions of these terms
to the original electroweak chiral Lagrangian.

<|endoftext|><|startoftext|>
Introduction 
The principle of maximum entropy (maxent) is widely used in the statistical sciences and 
engineering as a powerful tool and fundamental rule. The maxent approach in statistical 
mechanics can be traced back to the works of Boltzmann and Gibbs[3] and finally be given 
the status of principle thanks to the work of Jaynes[4] who used it with Boltzmann-Gibbs-
Shannon (BGS) entropy (see below) to derive the canonical probability distribution for 
statistical mechanics in a simple manner. However, in spite of its success and popularity, 
maxent has always been at the center of scientific and philosophical discussions and has 
raised many questions and controversies[4][5][6]. A central question is why a thermodynamic 
system chooses the equilibrium microstates such that the BGS entropy gets to maximum. As a 
basic assumption of scientific theory, maxent is not directly or indirectly related to 
observation and undoubted facts. In the literature, maxent is postulated as such or justified 
either a priori by the second laws with additional hypothesis such as the entropy functional 
(Boltzmann or Shannon entropy)[6], or a posteriori by the correctness of the probability 
distributions derived from it[4]. In statistical inference theory, it was often justified by 
intuitive arguments based on the subjectivity of probability[4] or by relating it to other 
principles such as the consistency requirement and the principle of insufficient reason of 
Laplace, which have been the object of considerable criticisms[5].  
Another important question about maxent is whether or not the BGS entropy is unique as 
the measure of uncertainty or disorder that can be maximized in order to determine 
probability distributions. This was already an question raised 40 years ago by the scientists 
who tried to generalize the Shannon entropy by mathematical considerations [9][10].  
In the present work, we try to contribute to the debate around maxent by an attempt to 
derive maxent from a well known fundamental principle of classical mechanics, the virtual 
work principle or Lagrange-d’Alembert principle (LAP) [1][2] without additional hypotheses 
to LAP and about entropy property. LAP is widely used in physical sciences as well as in 
mechanical engineering. It is a basic principle capable of yielding all the basic laws of statics 
and of dynamics of mechanical systems. It is in addition a simple, clearly defined, easily 
understandable and palpable law of physics. It is hoped that this derivation is scientifically 
and pedagogically beneficial for the understanding of maxent and of the relevant questions 
and controversies around it. In this work, the term entropy, denoted by S, is used in the sense 
of the second law of thermodynamics for equilibrium system. 
2) Principle of virtual work 
The variational calculus in mechanics has a long history which may be traced back to 
Galilei and other physicists of his time who studied the equilibrium problem of statics with 
LAP (or virtual displacement1). LAP gets unified and concise mathematical forms thanks to 
Lagrange[1] and d’Alembert[2] and is considered as a most basic principle of mechanics from 
which all the fundamental laws of statics and dynamics can be understood thoroughly.   
LAP says that the total work done by all forces acting on a system in static equilibrium is 
zero on all possible virtual displacements which are consistent with the constraints of the 
system. Let us suppose a simple case of a system of N points of mass in equilibrium under the 
action of N forces Fi (i=1,2,…N) with Fi on the point i, and imagine virtual displacement of 
each point ir
δ  for the point i. According to LAP, the virtual work Wδ  of all the forces Fi on 
all ir
δ  cancels itself for static equilibrium, i.e.  
i rFW
     (1) 
This principle for statics was extended to dynamical equilibrium by d’Alembert[2] in the LAP 
by adding the initial force iiam
− on each point: 
=⋅−∑=
i ramFW
     (2) 
where mi is the mass of the poin i and ia
 its acceleration. From this principle, we can not only 
derive Newtonian equation of dynamics, but also other fundamental principles such as least 
action principle.  
3) Why maximum thermodynamic entropy ? 
We suppose that the mechanics laws are usable not only for mechanical system containing 
small number of particles in regular motion, but also for large number of particles in random 
and stochastic motion for which one has to use statistical approach introducing probability 
distribution of mechanical states. Let us first consider an ensemble of equilibrium systems, 
                                                 
1 In mechanics, the virtual displacement of a system is a kind of imaginary infinitesimal displacement with no 
time passage and no influence on the forces. It should be perpendicular to the constraint forces. 
each composed of N particles in random motion with vi
v  the velocity of the particle i. It will 
be shown that the result for canonical ensemble can be easily extended to microcanonical 
ensemble and grand-canonical ensemble. Without loss of generality, let us look at a system 
without macroscopic motion, i.e., 0
v .  
We imagine that the system in thermodynamic equilibrium leaves the equilibrium state by 
a reversible infinitesimal virtual process.  Let Fi
 be the force on a particle i of the system at 
that moment. Fi
 includes all the interacting forces particles-particles and particles-walls of 
the container. During the virtual process, each particle with acceleration ir&&
 has a virtual 
displacement r i
vδ . The total virtual work on this displacement is given by 
i rrmFW
δδ ⋅−∑=
     (3) 
Although the sum of the accelerations of all the particles vanishes, i.e., 0
irm &&
v , the 
acceleration ir&&
 on each particle can be nonzero. So in general 0
i rrm
v&&v δ . As a matter of 
fact, we have kiiiiii ermrrmrrm δδδδ ==⋅=⋅ )2
v&v&vv&&v  where eki is the kinetic energy of the 
particle. On the other hand, we suppose these are no dissipative forces in the system or on the 
particles. It means that the energy of the system will not change if the system is completely 
closed and isolated. Let epi be the potential energy of a particle i subjet to the force Fi
, we 
should have pii eF −∇=
 and 
i ererF ∑−=⋅∑ ∇−=⋅∑
=== 111
vvv  
     (4) 
So finally it follows that 
∑−=∑ +−=
kipi eeeW
)( δδδδ  
     (5) 
where eiδ  is a virtual variation of the total energy eee kipii +=  of the particle i and ∑=
the total energy of the N particles. 
At this stage, no statistics has been done. The particles are treated as if they had regular 
dynamics. As a matter of fact, when the dynamics is random such as in a thermodynamic 
system, a microscopic process can leads the N particles from a given microstate to different 
microstates j with different probability pj (j=1,2 … w). If we looks at the system in the phase 
space, the considered process with given virtual displacements can take different directions or 
paths each leading to a given microstate with some likelihood. Hence the virtual work given 
by Eq.(5) is not a correct and complete expression for the random dynamics. It is in fact the 
virtual work of a possible process leading to a microstate j. It should be written as 
( )∑−=
jij eW
δδ  instead of Eq.(5). This "partial" virtual work cannot be used in the LAP 
since it is only a possible part of the total virtual work whose correct expression needs the 
introduction of the probability distribution of microstates pj. Logically, the total virtual work 
should be an average of the work given by Eq.(5) over all the possible microstates, i.e., 
WpW j
jδδ ∑=
     (6) 
This expression is essential in the application of LAP, an approach originally for regular 
dynamics, to irregular and random dynamics. Eq.(6) makes it possible to introduce the 
dynamic uncertainty (entropy) into the variational approach as shown below. In terms of 
thermodynamic ensemble, Eq.(6) is the ensemble average of the virtual works of all the 
members of an ensemble of systems distributed over the microstates. It is this average which 
is measurable and has a physical sense in the case of random dynamics just as the usual 
average energy in thermodynamics. It is not conceivable to let the partial virtual work of 
Eq.(5) vanish because this would signifies that the random motion in each direction in phase 
space of the virtual process is regular according to LAP and there would be only one direction 
or phase path of the virtual process leading to only one microstate, which is contradictory 
with the hypothesis of the random dynamics. This reasoning is the essential difference of the 
present approach from the simple search for mechanics principle and a direct use of the latter 
to each possible state or trajectory in phase space. The use of mechanics principle in regular 
way in general yields regular mechanical laws irrelevant to thermodynamics. The dynamical 
randomness is the fact that not all the possible states or trajectories follow the regular 
mechanical laws due to noises, certain chaos, or to quantum mechanics in which the 
Newtonian laws are obeyed only statistically. The statistical Newtonian second law given in 
[11] is an example.  
Eq.(6) can be accounted for in an explicit way as follows. A microstate j is some 
distribution of the N particles over the one particle states k with energy εk where k varies from, 
say, 1 to g (g can be very large). We imagine Nj identical particles distributed over the g states 
at a microstate j which is here a combination of g numbers nk of particles over the g states, 
i.e., j={n1, n2, … ng, }. We naturally have ( )∑=
jkj nN
 and ( ) ε k
jkj nE ∑=
. During the 
process of virtual work, only the energy of the particle can change (the fact that virtual work 
does not affect nk can be understood from quantum point of view since kε  is discrete but 
virtual work is infinitesimal and continuous). For a given j with probability pj, the virtual 
work given in Eq.(5) is now 
∑+−=∑ ∑+−∑ =−=
kjkkjk
kjkj nEnnnW εδδεδεδδεδ )()()()( .      (7) 
The first term of the right hand side is the total energy variation due to the one particle energy 
variation kδε  caused by the virtual work as well as to the variation in particle number jNδ  of 
the system. The second term is just the energy variation caused by the particle number 
variation ∑=
jkj nN )(δδ . Hence Eq.(6) reads 
NEnEnpEpW
j δμδδεδδεδδ +−=∑+−=∑∑+∑−= )( . 
     (8) 
where we put an expression for the chemical potential Nn
kk δδεμ /∑=  with 
( ) ∑=∑ ∑=∑=
j nnpNpN δδδδ  and j
j EpE δδ ∑= . Since  ∑−=
jj pEEE δδδ  and  
pNNN δδδ  with j
jEpE ∑=  and j
jNpN ∑= , we get 
∑ −++−∑ =−∑ ++−=
jj pNENEpNNpEEW δμμδδδμμδδδδ )(  
     (9) 
Now using the first law NWQE μδδδδ +−=  for Grand-canonical ensemble, we identify the 
heat transfer ∑ −=
jjj pNEQ δμδ )( . For a reversible virtual process, we can write 
∑ −==
jjj pNEQS δμββδδ )(  and get  
NEW ++−= . 
     (10) 
where S is the thermodynamic entropy of the second law.  
The following variational calculus for different ensemble is straightforward. According to 
LAP 0=Wδ , we have 
0)( =+− NES βμβδ       (11) 
which is the usual algorithm of maxent for grand-canonical ensemble. The only difference is 
that here the "constraints" associated with energy and particle number appear in the 
variational calculus as a simple consequence of LAP, in contrast to the introduction of these 
constraints in the inference theory or inferential statistical mechanics[4] by the argument that 
an averaged value of an observable quantity represents a factual information to be put into the 
maximization of information in order to derive unbiased probability distribution[5]. 
In order to see further details about this maxent, let us suppose the entropy is a function of 
the probability distribution pj of the considered moment, i.e., ...)...( ,2,1 jpppfS = . We can 
write j
S δδ ∑
=  due to the variations of the virtual process. On the other hand, we have 
jjj pNES δμβδ )( which implies 
0)( =+−∑
δβμβ . 
     (12) 
By virtue of the normalization condition 0=∑
jpδ , one can prove [12] that 
αβμβ =+−
     (13) 
with a constant α. Eq.(13) can be used for deriving the probability distribution of the 
nonequilibrium component of the dynamics if the functional f is given. Inversely, if the 
probability distribution is known, one can derive the functional of S. 
For canonical ensemble, we have 0=Nδ  and 
0)( =− ES βδ       (14) 
or, by the same argument as above, 
αβ =−
     (15) 
For microcanonical ensemble, the system is completely closed and isolated with constant 
energy 0=Eδ  and constant particle number 0=Nδ . When the virtual displacements occur, 
the total virtual work would be transformed into virtual heat such that Eq.(10) becomes 
0=− QW δδ . LAP 0=Wδ  leads to 
0=Sδ  or α=
     (16) 
which necessarily yields uniform probability distribution over the different microstates j, i.e., 
pj =1/w whatever is the form of the entropy S. Note that here the uniform distribution over the 
microstates is not an a priori assumption, but a consequence of LAP.  
This equiprobability can be proven as follows only by supposing that ...)...( ,2,1 jpppfS =  
is a strictly increasing or decreasing function of all pj throughout the interval 10 ≤≤ jp , i.e., 
its derivatives 0or  0 <>
 and are zero only at some finite number of points on the 
interval. However, Eq.(16) tells us that α=
 is a constant independent of pj, implying that 
...)...( ,2,1 jpppfS =  is either a linear function of all pj, or all pj are identical. It is evident that 
entropy cannot be linear function of pj. The equal probability of all microstates follows.  
The conclusion of this section is that, at thermodynamic equilibrium, the maxent under the 
constraints of energy is a consequence of the equilibrium condition LAP extended to random 
motion. From Eq.(8), one notices that maxent can be written in the following concise form for 
any ensemble with n random variables Xi (i=1,2 …n): 
ii Xδχ  
where iχ  is some constant corresponding to iχ . For grand-canonical ensemble, 
     (17) 
this is 0=− NE δμδ  and for canonical ensemble, it is 0=Eδ . 
We stress that in the above derivation, the only essential assumptions or fundamental 
physical hypotheses used before the LAP are the first and second laws of thermodynamics for 
equilibrium system and reversible process. Hence the three algorithms of maxent for the three 
statistical ensembles are in principle valid for all systems for which the first and second laws 
are valid. We would like to mention here that this derivation of maxent is not associated with 
any given form of entropy like in the original version of Jaynes principle.  
4) Concluding remarks 
This work shows that the maximum entropy principle has a close connection with the 
fundamental principle of classical mechanics, the principle of virtual work, i.e., for a 
mechanical system to be in thermodynamics equilibrium with maximum entropy, the total 
virtual work of all the forces on all the elements (particles) of the system should vanish. 
Indeed, if one admits that thermodynamic entropy is a measure of dynamical disorder and 
randomness, it is natural to say that this disorder must get to maximum in order that all the 
random forces act on each degree of freedom of the motion in such a way that over any 
possible (virtual) displacement, the work of all the forces is zero. In other words, this 
vanishing work can be obtained if and only if the randomness of the forces is at maximum 
over all degree of freedom allowed by the constraints to get stable equilibrium state.  
To our opinion, the present result is helpful not only for the understanding of maxent 
derived from a more basic and well understood mechanical principle, it also shows that 
entropy in physics is not necessarily a subjective quantity reaching maximum for correct 
inference, and that maximum entropy is a law of physics but not merely an inference 
principle.  
After finishing this paper, the author became aware of a work of Plastino and Curado[12] 
on the equivalence between the particular thermodynamic relation ES βδδ =  and maxent in 
the derivation of probability distribution. They consider the particular thermodynamic process 
affecting only the microstate population in order to find a different way from maxent to derive 
probability. The work part is not considered in their work. Their analysis is pertinent and 
consequential. The present work provides a substantial support of their reasoning from a basic 
principle of mechanics.   
References 
[1] J.L. Lagrange,   Mécanique analytique, Blanchard, reprint , Paris  (1965)  (Also: 
Oeuvres, Vol. 11.) 
[2] J. D’Alembert,   Traité de dynamique, Editions Jacques Cabay , Sceaux  (1990) 
[3] J. Willard Gibbs, Principes élémentaires de mécanique statistique (Paris, Hermann, 
1998) 
[4] E.T. Jaynes, The evolution of Carnot's principle, The opening talk at the EMBO 
Workshop on Maximum Entropy Methods in x-ray crystallographic and biological 
macromolecule structure determination, Orsay, France, April 24-28, 1984; Gibbs 
vs Boltzmann entropies, American Journal of Physics, 33,391(1965) ; Where do 
we go from here? in Maximum entropy and Bayesian methods in inverse problems, 
pp.21-58, eddited by C. Ray Smith and W.T. Grandy Jr., D. Reidel, Publishing 
Company (1985) 
[5] Jos Uffink, Can the maximum entropy principle be explained as a consistency 
requirement, Studies in History and Philosophy of Modern Physics, 26B (1995): 
223-261 
[6] L.M. Martyushev and V.D. Seleznev, Maximum entropy production principle in 
physics, chemistry and biology, Physics Reports, 426, 1-45 (2006) 
[7] Y.P. Terletskii, Statistical physics, North-Holland Publishing Company, 
Amsterdam, 1971 
[8] Q.A. Wang, Some invariant probability and entropy as a maximizable measure of 
uncertainty, to appear in J. Phys. A (2008); cond-mat/0612076 
[9] A. Rényi, Calcul de probabilité, Paris, Dunod, 1966, P522 
A. Wehrl, Rev. Mod. Phys., 50(1978)221 
[10] M.D. Esteban, Kybernetika, 31(1995)337 
[11] R.P. Feynman and A.R. Hibbs, Quantum mechanics and path integrals, 
McGraw-Hill Publishing Company, New York, 1965 
[12] A. Plastino and E.F.M. Curado, Phys. Rev. E, 72(2005)047103; 
E.F.M. Curado and A. Plastino, arXiv:cond-mat/0601076; also cond-mat/0509070
ABSTRACT
  We propose an extension of the principle of virtual work of mechanics to
random dynamics of mechanical systems. The total virtual work of the
interacting forces and inertial forces on every particle of the system is
calculated by considering the motion of each particle. Then according to the
principle of Lagrange-d'Alembert for dynamical equilibrium, the vanishing
ensemble average of the virtual work gives rise to the thermodynamic
equilibrium state with maximization of thermodynamic entropy. This approach
establishes a close relationship between the maximum entropy approach for
statistical mechanics and a fundamental principle of mechanics, and constitutes
an attempt to give the maximum entropy approach, considered by many as only an
inference principle based on the subjectivity of probability and entropy, the
status of fundamental physics law.

<|endoftext|><|startoftext|>
Introduction
Various nonlinear theories of generalized functions have been developed over the past twenty
years, with contributions by many authors. These theories have in common that the space of
distributions is enlarged or embedded into algebras so that nonlinear operations on distribu-
tions become possible. These methods have been especially efficient in formulating and solving
nonlinear differential problems with irregular data.
Most of the algebras of generalized functions possess the structure of sheaves or presheaves,
which may contain some sub(pre)sheaves with particular properties. For example, the sheaf
∗Supported by FWF (Austria), grant Y237.
http://arxiv.org/abs/0704.1077v1
G of the special Colombeau algebras [2, 7, 15] contains the subsheaf G∞ of so-called regular
sections of G such that the embedding: G∞ → G is the natural extension of the classical one:
C∞ → D′. This notion of regularity leads to G∞-local or microlocal analysis of generalized
functions, extending the classical results on the C∞-microlocal analysis of distributions due
to Hörmander [8]. This concept has been slightly extended in [4] to less restrictive kinds
of measuring regularity. In [14], microlocal regularity theory in analytic and Gevrey classes
has been generalized to algebras of generalized functions. Many results on propagation of
singularities and pseudodifferential techniques have been obtained during the last years (see
[5, 6, 9, 10, 11]). Nevertheless, these results are still mainly limited to linear cases, since they
use frequential methods based on the Fourier transform.
In this paper, we develop a new type of asymptotic local and microlocal analysis of gener-
alized functions in the framework of (C, E ,P)-algebras [12, 13], following first steps undertaken
in [12]. An example of the construction is given by taking G as a special case of a (C, E ,P)-
structure (see Subsection 2.2 for details). Let F be a subsheaf of vector spaces (or algebras) of
G and (uε)ε a representative of u ∈ G (Ω) for some open set Ω ⊂ R
n. We first define OFG (u) as
the set of all x ∈ Ω such that uε tends to a section of F above some neighborhood of x. The
F-singular support of u is Ω\OFG (u). For fixed x and u, Nx(u) is the set of all r ∈ R+ such
that εruε tends to a section of F above some neighborhood of x. The F-singular spectrum of
u is the set of all (x, r) ∈ Ω × R+ such that r ∈ R+\Nx(u). It gives a spectral decomposition
of the F-singular support of u.
This asymptotic analysis is extended to (C, E ,P)-algebras. This gives the general asymptotic
framework, in which the net (εr)ε is replaced by any net a satisfying some technical conditions,
leading to the concept of the (a,F)-singular asymptotic spectrum. The main advantage is that
this asymptotic analysis is compatible with the algebraic structure of the (C, E ,P)-algebras.
Thus, the (a,F)-singular asymptotic spectrum inherits good properties with respect to nonlin-
ear operations (Theorem 15 and Corollary 16).
The paper is organized as follows. In Section 2, we introduce the sheaves of (C, E ,P)-algebras
and develop the local asymptotic analysis. Section 3 is devoted to the (a,F)-microlocal analysis
and specially to the nonlinear properties of the (a,F)-singular asymptotic spectrum. In Section
4 various examples of the propagation of singularities through non linear differential operators
are given.
2 Preliminary definitions and local parametric analysis
2.1 The presheaves of (C, E ,P)-algebras: the algebraic structure
We begin by recalling the notions from [12, 13] that form the basis for our study.
(a) Let:
(1) Λ be a set of indices;
(2) A be a solid subring of the ring KΛ (K = R or C); this means that whenever (|sλ|)λ ≤ (|rλ|)λ
for some ((sλ)λ, (rλ)λ) ∈ K
Λ ×A, that is, |sλ| ≤ |rλ| for all λ, it follows that (sλ)λ ∈ A ;
(3) IA be a solid ideal of A ;
(4) E be a sheaf of K-topological algebras over a topological space X .
Moreover, suppose that
(5) for any open set Ω in X, the algebra E(Ω) is endowed with a family P(Ω) = (pi)i∈I(Ω)
of semi-norms such that if Ω1, Ω2 are two open subsets of X with Ω1 ⊂ Ω2, it follows that
I(Ω1) ⊂ I(Ω2) and if ρ
1 is the restriction operator E(Ω2) → E(Ω1), then, for each pi ∈ P(Ω1)
the semi-norm p̃i = pi ◦ ρ
1 extends pi to P(Ω2) .
(6) Let Θ = (Ωh)h∈H be any family of open sets in X with Ω = ∪h∈HΩh. Then, for each
pi ∈ P(Ω), i ∈ I(Ω), there exist a finite subfamily of Θ: Ω1, . . . , Ωn(i) and corresponding
semi-norms p1 ∈ P(Ω1), . . . , pn(i) ∈ P(Ωn(i)), such that, for any u ∈ E(Ω)
pi (u) ≤ p1 (u |Ω1 ) + . . .+ pn(i)(u |Ωn(i)).
(b) Define |B| = {(|rλ|)λ , (rλ)λ ∈ B}, B = A or IA, and set
H(A,E,P)(Ω) =
(uλ)λ ∈ [E(Ω)]
| ∀i ∈ I(Ω), ((pi(uλ))λ ∈ |A|
J(IA,E,P)(Ω) =
(uλ)λ ∈ [E(Ω)]
| ∀i ∈ I(Ω), (pi(uλ))λ ∈ |IA|
C = A/IA,
Note that, from (2), |A| is a subset of A and that A+ = {(bλ)λ ∈ A, ∀λ ∈ Λ, bλ ≥ 0} = |A|.
The same holds for IA. Furthermore, (2) implies also that A is a K-algebra. Indeed, it suffices
to show that A is stable under multiplication by elements of K. Let c be in K and (aλ)λ ∈ A.
Then (caλ)λ satisfies (|caλ|)λ ≤ (|naλ|)λ for some n ∈ N. We have (naλ)λ ∈ A since A is stable
under addition. Thus, using (2), we get that (caλ)λ ∈ A.
For later reference, we recall the following notions entering in the definition of a sheaf A on
X. Let (Ωh)h∈H be a family of open sets in X with Ω = ∪h∈HΩh.
(F1) (Localization principle) Let u, v ∈ A(Ω). If all restrictions u|Ωh and u|Ωh , h ∈ H, coincide,
then u = v in A(Ω).
(F2) (Gluing principle) Let (uh)h∈H be a coherent family of elements of A(Ωh), that is, the
restrictions to the non-void intersections of the Ωh coincide. Then there is an element
u ∈ A(Ω) such that u|Ωh = uh for all h ∈ H.
Proposition 1 (i) H(A,E,P) is a sheaf of K-subalgebras of the sheaf E
(ii) J(IA,E,P) is a sheaf of ideals of H(A,E,P).
Proof. The proof can be found in [12, 13], so we just recall the main steps. We start from
the statement that E and EΛ are already sheaves of algebras. From (5), we infer that H(A,E,P)
and J(IA,E,P) are a presheaves (the restriction property holds) and that the localization property
(F1) is valid. To obtain the gluing property (F2) we need property (6), which generalizes the
situation from C∞ to E .
Theorem 2 The factor H(A,E,P)/J(IA,E,P) is a presheaf satisfying the localization principle
(F1).
Proof. From the previous proposition, we know that A = H(A,E,P)/J(IA,E,P) is a presheaf.
For Ω1 ⊂ Ω2, the restriction is defined by
A (Ω2)
−→ A (Ω1)
u 7−→ u |Ω1 := [uλ |Ω1 ]
where (uλ)λ is any representative of u ∈ A(Ω2) and [uλ |Ω1 ] denotes the class of (uλ |Ω1)λ.
The definition is consistent and independent of the representative because for each (uλ)λ∈Λ ∈
H(A,E,P)(Ω2) and (ηλ)λ∈Λ ∈ J(IA,E,P)(Ω2), we have
(uλ)λ |Ω1 := (uλ |Ω1 )λ ∈ H(A,E,P)(Ω1) , (ηλ)λ |Ω1 := (ηλ |Ω1 )λ ∈ J(IA,E,P)(Ω1)
The localization principle is also obviously fulfilled because J(IA,E,P) is itself a sheaf.
Proposition 3 Under the hypothesis (2), the constant sheaf H(A,K,|.|)/J(IA,K,|.|) is exactly the
ring C = A/IA.
Proof. We clearly have H(A,K,|.|) = A and J(IA,K,|.|) = IA.
Definition 1 The factor presheaf of algebras over the ring C = A/IA:
A = H(A,E,P)/J(IA,E,P)
is called a presheaf of (C, E ,P)-algebras.
Notation 1 We denote by [uλ] the class in A(Ω) defined by (uλ)λ∈Λ ∈ H(A,E,P)(Ω). For u ∈ A,
the notation (uλ)λ∈Λ ∈ u means that (uλ)λ∈Λ is a representative of u.
Remark 1 The problem of rendering A a sheaf (and even a fine sheaf) is not studied here. It
is well known that the Colombeau algebra G, which is a special case of a (C, E ,P)-algebra (see
Subsection 2.2), forms a fine sheaf [1, 7]. The sheaf property can be inferred from the existence
of a C∞-partition of unity associated to any open covering of an open set Ω of Rd. This existence
is fulfilled because X = Rd is a locally compact Hausdorff space. On the other hand, C∞ is a fine
sheaf because multiplication by a smooth function defines a sheaf homomorphism in a natural
way. Hence the usual topology and C∞-partition of unity defines the required sheaf partition of
unity. Observing that G is a sheaf of C∞-modules and using the well known result that a sheaf of
modules on a fine sheaf is itself a fine sheaf, we obtain the corresponding assertion about G. In
the general case, turning A into a sheaf requires additional hypotheses, which are not necessary
for the results in this paper. Indeed, the presheaf structure of A and the (F1)-principle are
sufficient to develop our local and microlocal asymptotic analysis.
Remark 2 The map ι : K → A defined by ι (r) = (r)λ is an embedding of algebras and induces
a ring morphism from K → C if, and only if, A is unitary (Lemma 14, [13]). Indeed, if A is
unitary, (r)λ = r (1λ)λ is an element of A since A is a K-algebra, and ι is clearly an injective
ring morphism. The converse is obvious. Moreover, if Λ is a directed set with partial order
relation ≺ and if
(7) IA ⊂
(aλ)λ ∈ A | lim
aλ = 0
then the morphism ι is injective. Indeed, if [ι (r)] = 0, relation (7) implies that the limit of the
constant sequence (r)λ is null, thus r = 0.
2.2 Relationship with distribution theory and Colombeau algebras
One main feature of this construction is that we can choose the triple (C, E ,P) such that the
sheaves C∞ and D′ are embedded in the corresponding sheaf A. In particular, we can multiply
(the images of) distributions in A.
We consider the sheaf E = C∞ over Rd, whereP is the usual family of topologies (PΩ)Ω∈O(Rd).
Here O
denotes the set of all open sets of Rd; this notation will be used in the sequel. Let
us recall that PΩ is defined by the family of semi-norms (pK,l)K⋐Ω,l∈N with
∀f ∈ C∞ (Ω) , pK,l (f) = sup
x∈K,|α|≤l
|∂αf (x)| .
From Lemma 14 in [13], it follows that the canonical maps, defined for any Ω ∈ O
σΩ : C
∞ (Ω) → H(A,E,P)(Ω) f 7→ (f)λ ,
are injective morphism of algebras if, and only if, A is unitary. Under this assumption, these
maps give rise to a canonical sheaf embedding of C∞ into H(A,E,P) and (using a partition of
unity in C∞ inducing a sheaf structure on A) to a canonical sheaf morphism of algebras from
C∞ into A. This sheaf morphism turns out to be a sheaf morphism of embeddings if Λ is a
directed set with respect to a partial order ≺ and if relation (7) holds.
We shall address the question of the embedding of D′ for the simple case of Λ = (0, 1]. For
a net (ϕε)ε of mollifiers given by
ϕε (x) =
, x ∈ Rd where ϕ ∈ D(Rd) and
ϕ (x) dx = 1,
and T ∈ D′
, the net (T ∗ ϕε)ε is a net of smooth functions in C
, moderately
increasing in
. This means that
(8) ∀K ⋐ Rd,∀l ∈ N, ∃m ∈ N : pK,l (T ∗ ϕε) = o(ε
−m), as ε→ 0.
This justifies to choose
(rε)ε ∈ R
(0,1] | ∃m ∈ N : |uε| = o(ε
−m), as ε→ 0
(rε)ε ∈ R
(0,1] | ∀q ∈ N : |uε| = o(ε
q), as ε→ 0
In this case (with E = C∞), the sheaf of algebras A = H(A,E,P)/J(IA,E,P) is exactly the so-called
special Colombeau algebra G [2, 7, 16]. Then, for all Ω ∈ O
, C∞ (Ω) is embedded in A (Ω)
σΩ : C
∞(Ω) → A(Ω) f 7→ [fε] with fε = f for all ε in (0, 1] ,
because the constant net (f)ε belongs to H(A,E,P)
and (f)ε ∈ J(IA,E,P) implies f = 0 in
C∞(Ω). Furthermore, D′
is embedded in A
by the mapping
ι : T 7→ (T ∗ ϕε)ε
Indeed, relation (8) implies that (T ∗ ϕε)ε belongs to H(A,E,P)
and (T ∗ ϕε)ε ∈ J(IA,E,P)
implies that T ∗ϕε → 0 in D
, as ε→ 0 and T = 0. Thus, ι is a well defined injective map.
With the help of cutoff functions, we can define analogously, for each open set Ω in Rd, an
embedding ιΩ of D
′ (Ω) into A (Ω), and finally a sheaf embedding D′ → A. This embedding
depends on the choice of the net of mollifiers (ϕε)ε. We refer the reader to [3, 15] for more
complete discussions about embeddings in Colombeau’s case and to [13] for the case of (C, E ,P)-
algebras.
2.3 An association process
We return to the general case with the assumption that A is unitary and Λ is a directed set
with partial order relation ≺ .
Let us denote by:
• Ω an open subset of X,
• F a given sheaf (or presheaf) of topological K-vector spaces (resp. K-algebras) over X
containing E as a subsheaf of topological algebras,
• a a map from R+ to A+ such that a(0) = 1 (for r ∈ R+, we denote a (r) by (aλ (r))λ).
In the Colombeau case, a typical example would be aε(r) = ε
r, ε ∈ (0, 1].
For (vλ)λ ∈ H(A,E,P) (Ω), we shall denote the limit of (vλ)λ for the F-topology by lim
F(Ω) vλ
when it exists. We recall that lim
F(V ) uλ |V = f ∈ F(V ) iff, for each F-neighborhood W of f ,
there exists λ0 ∈ Λ such that
λ ≺ λ0 =⇒ uλ|V ∈W.
We suppose also that we have, for each open subset V ⊂ Ω,
(9) J(IA,E,P)(V ) ⊂
(vλ)λ ∈ H(A,E,P)(V ) : lim
F(V ) vλ = 0
Definition 2 Consider u = [uλ] ∈ A(Ω), r ∈ R+, V an open subset of Ω and f ∈ F(V ). We
say that u is a (r)-associated with f in V :
F(V )
if lim
F(V ) (aλ (r) uλ |V ) = f.
In particular, if r = 0, u and f are called associated in V .
To ensure the independence of the definition with respect to the representative of u, we
must have, for any (ηλ)λ ∈ J(IA,E,P)(Ω), that lim
F(V ) aλ (r) ηλ |V = 0. As J(IA,E,P)(V ) is a
module over A, (aλ (r) ηλ |V )λ is in J(IA,E,P)(V ). Thus, our claim follows from hypothesis (9).
Example 1 Take X = Rd, F = D′, Λ =]0, 1], A = G, V = Ω, r = 0. The usual association
between u = [uε] ∈ G (Ω) and T ∈ D
′ (Ω) is defined by
u ∼ T ⇐⇒ u
D′(Ω)
T ⇐⇒ lim
D′(Ω) uε = T.
2.4 The F-singular support of a generalized function
We use the notations of Subsection 2.3. According to the hypothesis (9), we have, for any open
set Ω in X,
J(IA,E,P)(Ω) ⊂
(uλ)λ ∈ H(A,E,P)(Ω) : lim
F(V ) uλ = 0
FA(Ω) =
u ∈ A(Ω) | ∃ (uλ)λ ∈ u, ∃f ∈ F(Ω) : lim
F(V ) uλ = f
FA(Ω) is well defined because if (ηλ)λ belongs to J(IA,E,P)(Ω), we have lim
F(V ) ηλ = 0.
Moreover, FA is a sub-presheaf of vector spaces (resp. algebras) of A. Roughly speaking,
it is the presheaf whose sections above some open set Ω are the generalized functions of A (Ω)
associated with an element of F (Ω).
Thus, for u ∈ A (Ω), we can consider the set OFA (u) of all x ∈ Ω having an open neighbor-
hood V on which u is associated with f ∈ F (V ), that is:
OFA (u) = {x ∈ Ω | ∃V ∈ Vx : u |V ∈ FA(V )} ,
Vx being the set of all the open neighborhoods of x.
This leads to the following definition:
Definition 3 The F-singular support of u ∈ A(Ω) is denoted SFA (u) and defined as
SFA (u) = Ω\O
A (u) .
Remark 3 (i) The validity of the gluing principle (F2) is not necessary to get the notion of
support (and of F-singular support) of a section u ∈ A(Ω). More precisely, the localization
principle (F1) is sufficient to prove the following: The set
A (u) = {x ∈ Ω | ∃V ∈ Vx, u |V = 0}
is exactly the the union ΩA (u) of the open subsets of Ω on which u vanishes.
Indeed, (F1) allows to show that u vanishes on an open subset O of Ω if, and only if, it vanishes
on an open neighborhood of every point of O. This leads immediately to the required assertion.
Moreover, ΩA (u) = O
A (u) is the largest open set on which u vanishes, S
A (u) = Ω\O
A (u)
is exactly the support of u in its classical definition, and the F-singular support of u is a closed
subset of its support.
(ii) In contrast to the situation described above for the support, we need the gluing principle
(F2) if we want to prove that the restriction of u to O
A (u) belongs to FA(O
A (u)). We make
this precise in the following lemma.
Lemma 4 Take u ∈ A(Ω) and set ΩFA (u) = ∪i∈IΩi, (Ωi)i∈I denoting the collection of the open
subsets of Ω such that u |Ωi ∈ FA (Ωi). Then, if FA is a sheaf (even if A is only a prehesaf),
(i) ΩFA (u) is the largest open subset O of Ω such that u |O belongs to FA (O);
(ii) ΩFA (u) = O
A(u) and S
A (u) = Ω \ Ω
A (u).
Proof. (i) For i ∈ I, set u |Ωi = fi ∈ FA (Ωi). The family (fi)i∈I is coherent by assumption:
From (F2), there exists f ∈ FA(Ω
A (u)) such that f |Ωi = fi. But from (F1), we have f = u on
∪i∈IΩi = Ω
A (u). Thus u |ΩF
(u) ∈ FA(Ω
A (u)), and Ω
A (u) is clearly the largest open subset of
Ω having this property.
(ii) First, OFA (u) is clearly an open subset of Ω. For x ∈ O
A (u), set u |Vx = fx ∈ FA (Vx) for
some suitable neighborhood Vx. The open set O
A (u) can be covered by the family (Vx)x∈OF
As the family (fx) is coherent, we get from (F2) that there exists f ∈ FA
∪x∈OF
(u)Vx
such that
f |Vx = fx. From (F1), we have u = f on ∪x∈OF
(u)Vx and, therefore, u |OF
(u) ∈ FA(O
A (u)).
Thus OFA (u) is contained in Ω
A (u). Conversely, if x ∈ Ω
A (u), there exists an open neigh-
borhood Vx of x such that u |Vx ∈ FA (Vx). Thus x ∈ O
A (u) and the assertion (ii) holds.
Proposition 5 For any u, v ∈ A(Ω), if F is a presheaf of topological vector spaces, (resp.
algebras), we have:
SFA (u+ v) ⊂ S
A (u) ∪ S
A(v).
Moreover, in the resp. case, we have
SFA (uv) ⊂ S
A(u) ∪ S
A(v).
Proof. If x ∈ Ω belongs to OFA(u) ∩ O
A(v), there exist V and W in Vx such that u |V ∈
FA(V ) and v |W ∈ FA(W ). Thus (u + v)|V ∩W ∈ FA(V ∩W ) (resp. (uv)|V ∩W ∈ FA(V ∩W )),
which implies
OFA(u) ∩ O
A(v) ⊂ O
A(u+ v) (resp. O
A(u) ∩O
A(v) ⊂ O
A (uv) ).
The result follows by taking the complementary sets in Ω.
This proposition leads easily to the following:
Corollary 6 Let (uj)1≤j≤p be any finite family of elements in A(Ω). If F is a presheaf of
topological vector spaces, (resp. algebras), we have
SFA (
1≤j≤p
uj) ⊂
1≤j≤p
SFA(uj).
Moreover, in the resp. case, we have
SFA (
1≤j≤p
uj) ⊂
1≤j≤p
SFA(uj).
In particular, if uj = u for 1 ≤ j ≤ p, we have S
p) ⊂ SFA(u).
Example 2 Taking E = C∞; F = D′; A = G leads to the D′-singular support of an element of
the Colombeau algebra. This notion is complementary to the usual concept of local association
in the Colombeau sense. We refer the reader to [12, 13] for more details.
Example 3 In the following examples we consider X = Rd, E = C∞ and A = G.
(i) Take u ∈ σΩ (C
∞ (Ω)), where σΩ : C
∞ (Ω) → G (Ω) is the canonical embedding defined in
Subsection 2.2. Then SC
G (u) = ∅, for all p ∈ N.
(ii) Take ϕ ∈ D (R), with
ϕ (x) dx = 1, and set ϕε (x) = ε
−1ϕ (x/ε). As ϕε
D′(R)
δ, we have:
G ([ϕε]) = {0}. We note also that S
G ([ϕε]) = {0}. Indeed, for any K ⋐ R
∗ = R\ {0} and ε
small enough, ϕε is null on K and, therefore, ϕε
C∞(R∗)
(iii) Take u = [uε] with uε(x) = ε sin(x/ε). We have that lim pK,0(uε) = 0, for all K ⋐ R,
whereas lim pK,1(uε) does not exist for l ≥ 1. Therefore
G (u) = ∅ , S
G (u) = R.
Remark 4 For any (p, q) ∈ N
with p ≤ q, and u ∈ G, it holds that SC
G (u) ⊂ S
G (u).
3 The concept of (a,F)-microlocal analysis
Let Ω be an open set in X. Fix u = [uλ] ∈ A(Ω) and x ∈ Ω. The idea of the (a,F)-microlocal
analysis is the following: (uλ)λ may not tend to a section of F above a neighborhood of x,
that is, there exists no V ∈ Vx and no f ∈ F (V ) such that lim
F(V ) uλ = f . Nevertheless,
in this case, there may exist V ∈ Vx, r ≥ 0 and f ∈ F (V ) such that lim
F(V ) aλ(r)uλ = f ,
that is [aλ(r)uλ |V ] belongs to the subspace (resp. subalgebra) FA(V ) of A(V ) introduced in
Subsection 2.4. These preliminary remarks lead to the following concept.
3.1 The (a,F)-singular parametric spectrum
We recall that a is a map from R+ to A+ such that a(0) = 1 and F is a presheaf of topological
vector spaces (or topological algebras). For any open subset Ω of X, u = [uλ] ∈ A(Ω) and
x ∈ Ω, set
N(a,F),x (u) =
r ∈ R+ | ∃V ∈ Vx, ∃f ∈ F(V ) : lim
F(V ) (aλ(r)uλ |V ) = f
r ∈ R+ | ∃V ∈ Vx : [aλ (r)uλ |V ] ∈ FA(V )
It is easy to check that N(a,F),x (u) does not depend on the representative of u. If no confusion
may arise, we shall simply write
N(a,F),x (u) = Nx(u).
Theorem 7 Suppose that:
(a) For all λ ∈ Λ
∀ (r, s) ∈ R+, aλ(r + s) ≤ aλ(r)aλ(s),
and, for all r ∈ R+\ {0}, the net (aλ (r))λ converges to 0 in K.
(b) F is a presheaf of separated locally convex topological vector spaces.
Then we have, for u ∈ A(Ω):
(i) If r ∈ Nx(u), then [r,+∞) is included in Nx(u). Moreover, for all s > r, there exists V ∈ Vx
such that: lim
F(V ) (aλ(s)uλ |V ) = 0. Consequently, Nx(u) is either empty, or a sub-interval
of R+.
(ii) More precisely, suppose that for x ∈ Ω, there exist r ∈ R+, V ∈ Vx and f ∈ F(V ),
nonzero on each neighborhood of x included in V , such that lim
F(V ) (aλ(r)uλ |V ) = f . Then
Nx(u) = [r,+∞) .
(iii) In the situation of (i) and (ii), we have that 0 ∈ Nx(u) iff Nx(u) = R+. Moreover, if one
of these assertions holds, the limits lim
F(V ) (aλ (s) uλ |V ) can be non null only for s = 0.
Proof. (i) If r ∈ Nx(u), there exist V ∈ Vx and f ∈ F(V ) such that lim
F(V ) (aλ(r)uλ|V ) =
f. As F(V ) is locally convex, its topology may be described by a family QV = (qj)j∈J(V ) of
semi-norms. For all s > r, we have, for any j ∈ J (V ),
qj(aλ(s) (uλ |V )) = aλ(s) qj(uλ |V ) ≤ aλ(s− r) aλ(r) qj(uλ |V ) ≤ aλ(s − r) qj(aλ(r) uλ |V ).
From lim
qj (aλ(r) (uλ |V − f)) = 0, we have qj(aλ(r) uλ |V ) < +∞ and lim
qj (aλ(s)(uλ |V )) =
0, since aλ(s− r)
→ 0. Thus lim
F(V ) (aλ(s)uλ |V ) = 0.
(ii) From (i), we have [r,+∞) ⊂ Nx(u). Suppose that there exists t < r in Nx(u). Then we get
W ∈ Vx, which can be chosen included in V , and g ∈ F(W ) such that lim
F(W ) (aλ(t)uλ |W ) =
g. With the notations of the proof of (i), we have
qj(aλ(r) (uλ |W )) ≤ aλ(r − t)qj(aλ(t)uλ |W ).
As qj(aλ(t)uλ |V ) is bounded, it follows that lim
qj(aλ(r) (uλ |W )) = 0, which is in contradiction
with lim
F(V ) (aλ(r) (uλ |V ) = f 6≡ 0 on W.
(iii) The first assertion follows directly from (i) and the second from (ii).
From now on, we suppose that the hypotheses (a) and (b) of Theorem 7 are fulfilled. We
Σ(a,F),x(u) = Σx(u) = R+\Nx(u),
R(a,F),x (u) = Rx(u) = inf Nx(u).
According to the previous remarks and comments, Σ(a,F),x(u) is an interval of R+ of the form[
0, R(a,F),x (u)
0, R(a,F),x (u)
, the empty set, or R+.
Definition 4 The (a,F)-singular spectrum of u ∈ A(Ω) is the set
(a,F)
A (u) = {(x, r) ∈ Ω× R+ | r ∈ Σx(u)} .
Example 4 Take X = Rd, E = C∞, F = Cp (p ∈ N = N∪{+∞}), f ∈ C∞ (Ω). Set u =[(
and v =
ε−1 |ln ε| f
in A (Ω) = G (Ω). Then, for all x ∈ R,
N(a,Cp),x (u) = [1,+∞) , N(a,Cp),x (v) = (1,+∞) , R(a,Cp),x (u) = R(a,Cp),x (v) = 1.
Remark 5 We have: Σ(a,F),x(u) = ∅ iff N(a,F),x(u) = R+ and, according to Theorem 7, iff
0 ∈ N(a,F),x(u), that is, there exist (V, f) ∈ Vx×F(V ) such that lim
F(V ) (aλ(0)uλ |V ) = f . As
aλ(0) ≡ 1, this last assertion is equivalent to x ∈ O
A (u). Thus Σ(a,F),x(u) = ∅ iff x /∈ S
A (u).
This remark implies directly the:
Proposition 8 The projection of the (a,F)-singular spectrum of u on Ω is the F-singular
support of u.
3.2 Example: The Colombeau case
In this subsection we investigate the relationship between the (a,F)-singular spectrum and the
sharp topology for X = Rd, E = C∞, F = Cp (p ∈ N), A = G, aε (r) = ε
r. First, let us remark
that, for u = [uε] ∈ G (Ω), x ∈ Ω (Ω ∈ O
), N(a,Cp),x (u) is never empty.
Indeed, consider V ∈ Vx with V ⋐ Ω. There exists m > 0 such that pp,V (uε) = o (ε
−m) as
ε → 0. Thus, p
(uε) = o (ε
−m) for all k ≤ p and lim
Cp(V ) (ε
muε |V ) = 0. Thus [m,+∞) ⊂
N(a,Cp),x (u) .
Let us now recall the construction of the sharp topology on G (Ω) . For u = [(uε)ε] ∈ G (Ω),
K ⋐ Ω, l ∈ N, set
vK,l(u) = inf
r ∈ R
∣∣ pK,l (uε) = o(ε−r) as ε→ 0
The real number vK,l(u) is well defined, i.e. does not depend on the representative of u, and is
called the (K, l)-valuation of u. It has the usual properties:
(i) ∀λ ∈ C\{0}, ∀u ∈ G (Ω) , vK,l(λu) = vK,l(u) ;
(ii) ∀u, v ∈ G (Ω) , vK,l(u+ v) ≤ sup(vK,l(u), vK,l(v)).
The family (vK,l) permits to define the (K, l)-pseudodistances dK,l on G (Ω) by
∀ (u, v) ∈ G (Ω)
, dK,l (u, v) = exp (vK,l(u− v)) ,
which turns out to be ultrametric:
∀ (u, v, w) ∈ G (Ω)
, dK,l (u, v) ≤ sup(dK,l(u,w), dK,l(w, v)).
The topology defined by the family (dK,l)K,l is called the sharp topology on G (Ω).
As we are interested here in valuations greater or equal to 0, we set, for u ∈ G (Ω),
νK,l(u) = sup (vK,l(u), 0) .
We can define, for x ∈ Ω, the l-valuation of u at x by
νx,l(u) = inf
(u) |V ∈ V (x) , V relatively compact
and set, for any p ∈ N,
νpx(u) = sup
0≤l≤p
νx,l(u).
Proposition 9 For all p ∈ N, [uε] ∈ G (Ω) and x ∈ Ω, we have
νpx(u) = R(a,Cp),x (u) = infN(a,Cp),x (u) .
Proof. Take r > ν
x(u). Then, for any l with 0 ≤ l ≤ p, one has r > νx,l(u) and there exists
V ∈ V (x) , V relatively compact, such that v
(u) < r. Thus, p
(uε) = o(ε
−r), as ε → 0,
and lim
Cp(V ) (ε
ruε |V ) = 0, which implies that r > R(a,Cp),x (u) and ν
x(u) ≥ R(a,Cp),x (u).
Conversely, if r > R(a,Cp),x (u), there exists V ∈ V (x) such that lim
Cp(V ) (ε
ruε |V ) = 0. For
any relatively compact neighborhood W of x included in V , we get p
(uε) = o(ε
−r) and
r > v
(u) > νx,l(u). Thus, r ≥ ν
x(u) and ν
x(u) ≤ R(a,Cp),x (u).
3.3 Some properties of the (a,F)-singular parametric spectrum
Notation 2 For u = [uλ] ∈ A (Ω), lim
F(V ) (aλ(r)uλ |V ) ∈ F (V ) means that there exists
f ∈ F (V ) such that lim
F(V ) (aλ(r)uλ |V ) = f .
3.3.1 Linear properties
Proposition 10 For any u, v ∈ A(Ω), we have
(a,F)
A (u+ v) ⊂ S
(a,F)
A (u) ∪ S
(a,F)
A (v) .
Proof. Let r be in Nx(u) ∩Nx(v). Then there exist V ∈ Vx and W ∈ Vx such that
F(V ) (aλ(r)uλ |V ) ∈ F (V ) and lim
F(W ) (aλ(r) vλ |W ) ∈ F (W ) .
Thus lim
F(V ∩W ) (aλ(r) (uλ + vλ) |V ∩W ) ∈ F (V ∩W ) and r ∈ Nx(u+ v). Consequently,
Nx(u) ∩Nx(v) ⊂ Nx(u+ v).
We obtain the result by taking the complementary sets in R+.
Corollary 11 For any u, u0, u1 in A(Ω) with
(i) u = u0 + u1 (ii) S
(a,F)
A (u0) = ∅,
we have
(a,F)
A (u) = S
(a,F)
A (u1) .
Proof. Proposition 10 and condition (ii) give S
(a,F)
A (u) ⊂ S
(a,F)
A (u1). As (i) implies
u0 = u− u1, we obtain the converse inclusion, and thus the equality.
3.3.2 Differential properties
We suppose that F is a sheaf of topological differential vector spaces (resp. algebras), with
continuous differentiation, admitting E as a subsheaf of topological differential algebras. Then
the sheaf A is also a sheaf of differential algebras with, for any α ∈ Nd and u ∈ A (Ω),
∂αu = [∂αuλ] , where (uλ)λ is any representative of u.
The independence of ∂αu on the choice of representative follows directly from the definition of
J(IA,E,P).)
Proposition 12 Let u be in A(Ω). For all ∂α, α ∈ Nd, we have
(a,F)
αu) ⊂ S
(a,F)
A (u) .
Proof. Take u ∈ A(Ω), α ∈ Nd, x ∈ Ω, r ∈ Nx(u). There exists V ∈ Vx, f ∈ F (V ) such
F(V ) (aλ(r)uλ |V ) = f.
The continuity of ∂α implies that
F(V ) (aλ(r)∂
α uλ |V ) = ∂
Thus Nx(u) ⊂ Nx(∂
αu). The result is proved.
In the following two results we require that F is a sheaf of topological modules over E , in
addition. The proofs are straightforward.
Proposition 13 Let g be in E(Ω) and u in A(Ω). We have
(a,F)
A (gu) ⊂ S
(a,F)
A (u) .
Propositions 10, 12 and 13 finally imply:
Corollary 14 Let P (∂) =
|α|≤m
α be a differential polynomial with coefficients in E(Ω). For
any u ∈ A(Ω), we have
(a,F)
A (P (∂)u) ⊂ S
(a,F)
A (u) .
3.3.3 Nonlinear properties
Theorem 15 For given u and v ∈ A(Ω), let Di (i = 1, 2, 3) be the following disjoint sets:
D1 = S
A (u)�(S
A (u) ∩ S
A (v)) ; D2 = S
A (v)�(S
A (u) ∩ S
A (v)) ; D3 = S
A (u) ∩ S
A (v).
Then the (a,F)-singular asymptotic spectrum of uv verifies
(a,F)
A (uv) ⊂ {(x,Σx(u)), x ∈ D1} ∪ {(x,Σx(v)), x ∈ D2} ∪ {(x,Ex(u, v)), x ∈ D3}
where (for any x ∈ D3)
Ex(u, v) =
[0, supΣx(u) + supΣx(v)] if Σx(u) 6= R+ and Σx(v) 6= R+
R+ if Σx(u) = R+ or Σx(v) = R+
Proof. Suppose that x belongs to D1. Then x is not in S
A (v) and we have
Σx(v) = ∅, Nx(v) = R+.
If Nx(u) is not empty, let r be in Nx(u). As Nx(v) = R+, we have r ∈ Nx(v). Thus there exists
V ∈ Vx (resp. W ∈ Vx) such that [aλ(r)uλ |V ] ∈ FA(V ) (resp. [aλ(r)vλ |W ] ∈ FA(W )). As F
is a sheaf of topological algebras we have
[aλ(r) (uλvλ) |V ∩W ] ∈ FA(V ∩W ).
Thus, r belongs to Nx(uv). Therefore, we have proved that Σx(uv) ⊂ Σx(u). If Nx(u) is empty,
we have Σx(u) = R+ and the above inclusion is obviously fulfilled. For x in D2, the same proof
gives Σx(uv) ⊂ Σx(v).
Consider x in D3. Then, Σx(u) and Σx(v) are not empty. We suppose first that both of them
are not equal to R+. Set R = supΣx(u) and S = supΣx(v). If r > R, there exists r
′ ∈ Nx(u)
such that R < r′ < r and then, from the part (i) of Theorem 7, there exists V ∈ Vx such that
F(V ) (aλ(r)uλ |V ) = 0.
Similarly, if s > S, there exists W ∈ Vx such that
F(W ) (aλ(s) vλ |W ) = 0.
Then lim
F(V ∩W ) (aλ(r)aλ(s) (uλvλ) |V ∩W ) = 0. By expressing this limit in terms of semi-
norms, as in the proof of Theorem 7 and by using the inequality aλ(r+ s) ≤ aλ(r)aλ(s), we get
that lim
F(V ∩W ) (aλ(r + s) (uλvλ) |V ∩W ) = 0. Thus
[r + s,∞[ ⊂ Nx(uv) or [0, r + s[ ⊃ Σx(uv)
for any r > R and s > S. Thus
Σx(uv) ⊂ [0, R+ S] = [0, supΣx(u) + supΣx(v)].
If Σx(u) or Σx(v) is equal to R+, the obvious inclusion Σx(uv) ⊂ R+ gives the last result.
Corollary 16 For given u ∈ A(Ω) and p ∈ N∗, we have
(a,F)
(x,Hp,x(u)), x ∈ S
A (u)
where Hp,x(u) =
[0, p supΣx(u)] if Σx(u) 6= R+
R+ if Σx(u) = R+
Proof. When Σx(u) = R+, the result is obvious. Suppose now Σx(u) 6= R+. We shall prove
the result by induction. If p = 1, the result is a simple consequence of the definitions. Suppose
that the result holds for some p ≥ 1. Set v = up in the previous theorem. We have
D1 = S
A (u)�S
p) ; D2 = ∅ ; D3 = S
(a,F)
(x,Σx(u)), x ∈ S
A (u)�S
(x, [0, (p + 1) supΣx(u)]), x ∈ S
by using the induction hypothesis. It follows a fortiori that
(a,F)
(x, [0, (p + 1) supΣx(u)]), x ∈ S
A (u)
4 Applications to partial differential equations
In this section we shall compute various (a,F)-singular spectra of solutions to linear and nonlin-
ear partial differential equations. Throughout we shall suppose that Λ =]0, 1], X = Rd, E = C∞,
F = Cp (1 ≤ p ≤ ∞) or F = D′, aε(r) = ε
r. The results will hold for any (C, E ,P)-algebra
A = H(A,E,P)/J(IA,E,P)
such that (aε(r))ε ∈ A+ for all r ∈ R+ and property (9) holds.
Example 5 The (a,Cp)-singular spectrum of powers of the delta function. Given a mollifier
of the form
ϕε (x) =
, x ∈ Rd where ϕ ∈ D(Rd), ϕ ≥ 0 and
ϕ (x) dx = 1,
its class in A(Rd) defines the delta function δ(x) as an element of A(Rd). Its powers are given
by (m ∈ N)
Clearly, the C0-singular spectrum is given by
(a,C0)
0, [0,md]
Differentiating ϕm(x) and observing that for each derivative there is a point x at which it does
not vanish we see that
(a,Ck)
0, [0,md + k]
Example 6 The (a,D′)-singular spectrum of powers of the delta function. Given a test function
ψ ∈ D(Rd), we have ∫
ϕmε (x)ψ(x) dx =
εmd−d
ϕm(x)ψ(εx) dx,
(a,D′)
m) = ∅ for m = 1, S
(a,D′)
0, [0,md − d[
for m > 1.
4.1 The singular spectrum of solutions to linear hyperbolic equations
Consider the Cauchy problem for the d-dimensional linear wave equation
∂2t uε(x, t)−∆uε(x, t) = 0, x ∈ R
d, t ∈ R
uε(x, 0) = u0ε(x), ∂tuε(x, 0) = u1ε(x), x ∈ R
where u0ε, u1ε ∈ C
∞(Rd) represent elements u0, u1 of an algebra A(R
d) as outlined at the
beginning of this section. Under suitable assumptions on the ring A, the corresponding net of
classical smooth solutions represents a unique solution u in the algebra A(Rd+1); for example,
this holds in the Colombeau case [16]. Let t → E(t) ∈ C∞(R : E ′(Rd)) be the fundamental
solution of the Cauchy problem. Then
uε(·, t) =
E(t) ∗ εru0ε + E(t) ∗ ε
ru1ε.
If for some r ≥ 0 and u0 ∈ D
′(Rd),
εru0ε(x)ψ(x) dx → 〈u0, ψ〉
for all ψ ∈ D(Rd), then
εruε(x)ψ(x) dx = 〈E(t) ∗ ε
ru0ε, ψ〉 = 〈ε
ru0ε, Ě(t) ∗ ψ〉 → 〈u0, Ě(t) ∗ ψ〉
for all ψ ∈ D(Rd) and t ∈ R as well. We arrive at the following assertion.
Proposition 17 Assume that S
(a,D′)
A (u0) and S
(a,D′)
A (u1) are contained in R
d×I, where I = ∅,
I = [0, r[ or I = [0, r] for some r, 0 ≤ r ≤ ∞. Let u ∈ A(Rd+1) be the solution to the linear
wave equation (10). Then S
(a,D′)
A (u(·, t)) ⊂ R
d × I for all t ∈ R.
This upper bound may or may not be reached, depending on the effects of finite propagation
speed or the Huyghens principle in odd space dimension d ≥ 3. We just illustrate some of the
possible effects for the one-dimensional wave equation with powers of delta functions as initial
data. Thus we consider the problem
∂2t uε(x, t)− ∂
xuε(x, t) = 0, x ∈ R, t ∈ R
uε(x, 0) = c0ϕ
ε (x), ∂tuε(x, 0) = c1ϕ
ε (x), x ∈ R,
where ϕ is a mollifier as in Example 5 and c0, c1 ∈ R. The solution to (11) is given by
uε(x, t) =
ϕmε (x− t) + ϕ
ε (x+ t)
∫ x+t
ϕnε (y) dy.
We observe that uε(x, t) = 0 for sufficiently small ε when |x| > |t|, that is, outside the light
cone, and uε(x, t) = sign(t)
εn−1‖ϕn‖L1(R) for sufficiently small ε when |x| < |t|.
Example 7 If in equation (11) c0 6= 0, c1 = 0 then
(a,D′)
A (u) = {(x, t, r) : |x| = |t|, 0 ≤ r < m− 1}
with the provision that S
(a,D′)
A (u) = ∅ when m = 1. If in equation (11) c0 = 0, c1 6= 0 then
(a,D′)
A (u) = {(x, t, r) : |x| ≤ |t|, 0 ≤ r < n− 1}.
When both c0 and c1 are nonzero the singular spectrum is obtained as the union of the two
spectra. For the C0-singular spectrum the following results hold: If in equation (11) c0 6= 0,
c1 = 0 then
(a,C0)
A (u) = {(x, t, r) : |x| = |t|, 0 ≤ r ≤ m}.
If c0 = 0, c1 6= 0 then
(a,C0))
A (u) = {(x, t, r) : |x| < |t|, 0 ≤ r < n− 1} ∪ {(x, t, r) : |x| = |t|, 0 ≤ r ≤ n− 1}.
4.2 The singular spectrum of solutions to semilinear hyperbolic equations
In this subsection we study the paradigmatic case of a semilinear transport equation
∂tuε(x, t) + λ(x, t)∂xuε(x, t) = F (uε(x, t)), x ∈ R, t ∈ R
uε(x, 0) = u0ε(x), x ∈ R
where λ and F are smooth functions of their arguments. In this situation, the singular spectrum
of the initial data may be decreased or increased, depending on the function F . We observe
that by a change of coordinates we may assume without loss of generality that λ ≡ 0. In fact,
denote by s → γ(x, t, s) the characteristic curve of (12) passing through the point x at time
s = t, that is the solution to
γ(x, t, s) = λ(γ(x, t, s), s), γ(x, t, t) = x.
The function v(y, s) = u(γ(y, 0, s), s) is a solution of the initial value problem
∂sv(y, s) = F (v(y, s), v(y, 0) = u0(y),
at least as long as the characteristic curves exist.
Example 8 (The dissipative case) The equation
∂tuε(x, t) = −u
ε(x, t), x ∈ R, t > 0
uε(x, 0) = u0ε(x), x ∈ R
has the solution
uε(x, t) =
u0ε(x)√
2tu20ε(x) + 1
2t+ 1/u20ε(x)
When the initial data are given by a power of the delta function, u0ε(x) = ϕ
ε (x), the solution
formula shows that uε(x, t) is a bounded function (uniformly in ε) and supported on the line
{x = 0}. Thus uε(x, t) converges to zero in D
′(R×]0,∞[), and so
(a,D′)
A (u0) =
0, [0,m − 1[
(a,D′)
A (u) = ∅.
Example 9 The equation
∂tuε(x, t) =
1 + u2ε(x, t), x ∈ R, t > 0
uε(x, 0) = u0ε(x), x ∈ R
has the solution
uε(x, t) = u0ε(x) cosh t+
1 + u20ε(x) sinh t.
We first take a delta function as initial value, that is, u0ε(x) = ϕε(x). Then
uε(x, t)ψ(x, t) dxdt =
ϕ(x) cosh t+
ε2 + ϕ2(x) sinh t
ψ(εx, t) dxdt
ϕ(x) cosh t+ |ϕ(x)| sinh t
ψ(0, t) dxdt
for ψ ∈ D(R2). Thus in this case
(a,D′)
A (u0) = S
(a,D′)
A (u) = ∅.
On the other hand, taking the derivative of a delta function as initial value, u0ε(x) = ϕ
ε(x), a
similar calculation shows that
uε(x, t)ψ(x, t) dxdt =
ϕ(x) cosh t+
ε4 + (ϕ′)2(x) sinh t
ψ(εx, t) dxdt
and so
(a,D′)
A (u0) = ∅, S
(a,D′)
A (u) = {(0, t, r) : t > 0, 0 ≤ r < 1}.
The next example shows that it is quite possible for the singular spectrum to increase with
time.
Example 10 The equation
∂tuε(x, t) =
uε(x, t) + 1)
uε(x, t) + 1
, x ∈ R, t > 0
uε(x, 0) = u0ε(x), x ∈ R
has the solution
uε(x, t) =
u0ε(x) + 1
provided u0ε > −1 in which case the function on the right hand side of the differential equation
is smooth in the relevant region. To demonstrate the effect, we take a power of the delta function
as initial value, that is u0ε(x) = ϕ
ε (x). Then
(a,D′)
A (u0) = {(0, r) : 0 ≤ r < m− 1}, S
(a,D′)
A (u) = {(0, t, r) : t > 0, 0 ≤ r < me
t − 1}.
In situations where blow-up in finite time occurs, microlocal asymptotic methods allow to
extract information beyond the point of blow-up. This can be done by regularizing the initial
data and truncating the nonlinear term. We demonstrate this in a simple situation.
Example 11 Formally, we wish to treat the initial value problem
∂tu(x, t) = u
2(x, t), x ∈ R, t > 0
u(x, 0) = H(x), x ∈ R
where H denotes the Heaviside function. Clearly, the local solution u(x, t) = H(x)/(1− t) blows
up at time t = 1 when x > 0. Choose χε ∈ C
∞ (R) with
0 ≤ χε(z) ≤ 1 ; χε(z) = 1 if |z| ≤ ε
−s , χε(z) = 0 if |z| ≥ 1 + ε
−s , s > 0.
Further, let Hε(x) = H ∗ ϕε(x) where ϕε is a mollifier as in Example 5. We consider the
regularized problem
∂tuε(x, t) = χε
uε(x, t)
u2ε(x, t), x ∈ R, t > 0
uε(x, 0) = Hε(x), x ∈ R.
When x < 0 and ε is sufficiently small, uε(x, t) = 0 for all t ≥ 0. For x > 0, uε(x, t) = 1/(1− t)
as long as t ≤ 1− εs. The cut-off function is chosen in such a way that |χε(z)z
2| ≤ (1 + ε−s)2
for all z ∈ R. Therefore,
∂tuε ≤ (1 + ε
−s)2 always and ∂tuε = 0 when |uε| ≥ 1 + ε
Continuing the regularized solution beyond time t = 1 − εs, we infer by combining the two
inequalities that ε−s ≤ uε(x, t) ≤ 1 + ε
−s for t ≥ 1 − εs when x > 0 and ε is sufficiently
small. Finally, as long as t < 1, the regularized solution remains bounded with respect to ε near
(0, t) for ε small enough; after t = 1, the asymptotic growth of order ε−s spills over into any
neighborhood of every point (x, t) for x ≥ 0.
Collecting all previous estimates, we obtain the following C0-singular support and
singular spectrum (for aε(r) = ε
r) of u = [uε]:
A (u) = S1(u) ∪ S2(u) with S1(u) = {(0, t) : 0 ≤ t < 1} ; S2(u) = {(x, t) : x ≥ 0, t ≥ 1},
(a,C0)
A (u) = (S1(u)× {0}) ∪ (S2(u)× [0, s]) .
The C0-singularities (resp.
-singularities) of u are described by means of two sets: S1(u)
and S2(u) (resp. S1(u) × {0} and S2(u) × [0, s]). The set S1(u) (resp. S1(u) × {0}) is related
to the data C0 (resp.
)-singularity. The set S2(u) (resp. S2(u) × [0, s]) is related to the
singularity due to the nonlinearity of the equation giving the blow-up at t = 1. The blow-up locus
is the edge {x ≥ 0, t = 1} of S2(u) and the strength of the blow-up is measured by the length
s of the fiber [0, s] above each point of the blow-up locus. This length is closely related to the
diameter of the support of the regularizing function χε and depends essentially on the nature
of the blow-up: Changing simultaneously the scales of the regularization and of the cut-off (i.e.
replacing ε by some function h(ε) → 0 in the definition of ϕε and χε) does not change the fiber
and characterizes a sort of moderateness of the strength of the blow-up.
4.3 The strength of a singularity and the sum law
When studying the propagation and interaction of singularities in semilinear hyperbolic systems,
Rauch and Reed [18] defined the strength of a singularity of a piecewise smooth function. We
recall this notion in the one-dimensional case. Assume that the function f : R → R is smooth
on ] − ∞, x0] and on [x0,∞[ for some point x0 ∈ R. The strength of the singularity of f
at x0 is the order of the highest derivative which is still continuous across x0. For example,
if f is continuous with a jump in the first derivative at x0, the order is 0; if f has a jump
at x0, the order is −1. Travers [21] later generalized this notion to include delta functions.
Slightly deviating from her definition, but in line with the one of [18], we define the strength of
singularity of the k-th derivative of a delta function at x0, ∂
xδ(x− x0), by −k − 2.
The significance of these definitions is seen in the description of what Rauch and Reed
termed anomalous singularities in semilinear hyperbolic systems. We demonstrate the effect in
a paradigmatic example, also due to [18], the (3× 3)-system
(∂t + ∂x)u(x, t) = 0, u(x, 0) = u0(x)
(∂t − ∂x)v(x, t) = 0, v(x, 0) = v0(x)
∂tw(x, t) = u(x, t)v(x, t), w(x, 0) = 0
Assume that u0 has a singularity of strength n1 ≥ −1 at x1 = −1 and v0 has a singularity
of strength n2 ≥ −1 at x2 = +1. The characteristic curves emanating from x1 and x2 are
straight lines intersecting at the point x = 0, t = 1. Rauch and Reed showed that, in general,
the third component w will have a singularity of strength n3 = n1 + n2 + 2 along the half-ray
{(0, t) : t ≥ 1}. This half-ray does not connect backwards to a singularity in the initial data
for w, hence the term anomalous singularity. The formula n3 = n1 + n2 + 2 is called the sum
law. Travers extended this result to the case where u0 and v0 were given as derivatives of
delta functions at x1 and x2. We are going to further generalize this result to powers of delta
functions, after establishing the relation between the strength of a singularity of a function f
at x0 and the singular spectrum of f ∗ ϕε.
We consider a function f : R → R which is smooth on ]−∞, x0] and on [x0,∞[ for some point
x0 ∈ R; actually only the local behavior near x0 is relevant. We fix a mollifier ϕε(x) =
as in Example 5 and denote the corresponding embedding of D′(R) into the (C, E ,P)-algebra
A(R) by ι. In particular, ι(f) = [f ∗ ϕε].
If f is continuous at x0, then limε→0 f ∗ ϕε = f in C
0. If f has a jump x0, this limit does
not exist in C0, but limε→0 ε
rf ∗ ϕε = 0 in C
0 for every r > 0. We have the following result.
Proposition 18 Let x0 ∈ R. If f : R → R is a smooth function on ]−∞, x0] and on [x0,∞[
or f(x) = ∂kxδ(x − x0) for some k ∈ N, then the strength of the singularity of f at x0 is −n if
and only if
Σ(a,C1),x0
= [0, n].
Here n ∈ N and aε(r) = ε
Proof. When n = 0, the function f is continuous and its derivative has a jump at x0.
From what was said before Proposition 18 it follows that Σ(a,C1),x0
= {0}. When n = 1,
the function f has a jump itself at x0 and its distributional derivative contains a delta function
part. Thus limε→0 ε
rf ∗ϕε = 0 in C
0 for every r > 0 and limε→0 ε
r∂xf ∗ϕε = 0 in C
0 for every
r > 1, and neither of the two limits exists for smaller r. Therefore, Σ(a,C1),x0
= [0, 1].
When n ≥ 2, f(x) = ∂n−2x δ(x− x0) and the assertion is straightforward.
We shall now return to the model equation (13) and demonstrate that the sum law remains
valid when the initial data are powers of delta functions. We work in suitable (C, E ,P)-algebras
A(R) and A(R2) in which the initial value problem (13) can be uniquely solved (see the discus-
sion at the beginning of Subsection 4.1). We still consider the scale aε(r) = ε
Proposition 19 Let u0(x) = δ
m(x+1), v0(x) = δ
n(x−1) for some m,n ∈ N∗. Let w ∈ A(R2)
be the third component of the solution to problem (13). Then w(x, t) vanishes at all points (x, t)
with x 6= 0 as well as (0, t) with t < 1, and
Σ(a,C1),(0,t)
⊂ [0,m+ n]
for t ≥ 1.
Proof. A representative of w is given by
wε(x, t) =
ϕmε (x+ 1− s)ϕ
ε (x− 1 + s) ds.
The fact that the mollifier ϕ has compact support entails that wε(x, t) vanishes for sufficiently
small ε whenever x 6= 0 or t < 1. We have
wε(x, t) =
(x+ 1− s
(x− 1 + s
∂twε(x, t) =
(x+ 1− t
(x− 1 + t
∂xwε(x, t) =
εm+n+1
(x+ 1− s
(x+ 1− s
(x− 1 + s
εm+n+1
(x+ 1− s
(x− 1 + s
(x− 1− s
If the support of ϕ is contained in an interval [−κ, κ], say, then the t-integrations extend at
most from x+ 1− κε to x+ 1 + κε at fixed x. Therefore, all terms converge to zero uniformly
on R2 when multiplied by εr with r > m+ n. This proves the assertion.
Using the correspondence between the singular spectrum and the strength of a singularity
formulated in Proposition 18, as well as Example 5, we may say that the strength of the
singularity of δm(x + 1) at x0 = −1 is n1 = −m − 1, while the strength of the singularity of
δn(x−1) at x0 = +1 is n2 = −n−1. The strength of the singularity of the solution w at points
(0, t) with t ≥ 1 is −m− n = n1 + n2 + 2 and is seen to satisfy the sum law.
4.4 Regular Colombeau generalized functions
The subsheaf G∞ of regular Colombeau functions of the sheaf G is defined as follows [16]:
Given an open subset Ω of Rd, the algebra G∞(Ω) comprises those elements u of G(Ω) whose
representatives (uε)ε satisfy the condition
(14) ∀K ⋐ Ω ∃m ∈ N ∀l ∈ N : pK,l(uε) = o(ε
−m) as ε→ 0.
The decisive property is that the bound of order ε−m is uniform with respect to the order
of derivation on compact sets. The algebra G∞(Ω) satisfies G∞(Ω) ∩ D′(Ω) = C∞(Ω) and
forms the basis for the investigation of hypoellipticity of linear partial differential operators
in the Colombeau framework. We are going to characterize the G∞-property in terms of the
C∞-singular spectrum. The scale a is still given by aε(r) = ε
Proposition 20 Let u ∈ G(Ω). Then u belongs to G∞(Ω) if and only if
Σ(a,C∞),x
6= R+
for all x ∈ Ω.
Proof. If u ∈ G∞(Ω), x ∈ Ω and Vx is a relatively compact open neighborhood of x, property
(14) says that there is m ∈ N such that limε→0 ε
muε = 0 in C
∞(Vx). Thus Σ(a,C∞),x
6= R+.
Conversely, if Σ(a,C∞),x
6= R+ we can find an open neighborhood Vx of x and m(x) ∈ N
such that limε→0 ε
ruε = 0 in C
∞(Vx) for all r ≥ m. Any compact set K can be covered by
finitely many such neighborhoods. Letting m be the maximum of the numbers m(x) involved,
we obtain property (14).
In relation with regularity theory of solutions to nonlinear partial differential equations, a
further subalgebra of G(Ω) has been introduced in [17] – the algebra of Colombeau functions of
total slow scale type. It consists of those elements u of G(Ω) whose representatives (uε)ε satisfy
the condition
(15) ∀K ⋐ Ω ∀r > 0 ∀l ∈ N : pK,l(uε) = o(ε
−r) as ε→ 0.
The term slow scale refers to the fact that the growth is slower than any negative power of ε
as ε→ 0. This property can again be characterized by means of the singular spectrum.
Proposition 21 An element u ∈ G(Ω) is of total slow scale type if and only if
Σ(a,C∞),x
⊂ {0}
for all x ∈ Ω.
Proof. If u is of total slow scale type, x ∈ Ω and Vx is a relatively compact open neigh-
borhood of x, property (15) implies that limε→0 ε
suε = 0 in C
∞(Vx) for every s > 0. Thus
Σ(a,C∞),x
⊂ {0}. To prove the converse, we take a compact subset K and r > 0 and cover
K by finitely many neighborhoods Vx of points x ∈ K such that limε→0 ε
ruε = 0 in C
∞(Vx).
Then property (15) follows.
References
[1] Aragona, J., Biagioni, H.: Intrinsic definition of the Colombeau algebra of generalized
functions. Anal. Math. 17, 75–132 (1991)
[2] Colombeau, J.F.: Multiplication of Distributions: a tool in mathematics, numerical engi-
neering and theoretical physics. Lecture Notes in Mathematics, vol. 1532. Springer-Verlag,
Berlin (1992)
[3] Delcroix, A: Remarks on the embedding of spaces of distributions into spaces of Colombeau
generalized functions. Novi Sad J. Math. 35(2), 27–40 (2005)
[4] Delcroix, A.: Regular rapidly decreasing nonlinear generalized functions. Application to
microlocal regularity. J. Math. Anal. Appl. 327, 564–584 (2007)
[5] Garetto, C., Gramchev, T., Oberguggenberger, M.: Pseudo-Differential operators and reg-
ularity theory. Electron J. Diff. Eqns. 2005(116), 1–43 (2005)
[6] Garetto, C., Hörmann, G.: Microlocal analysis of generalized functions: pseudodifferential
techniques and propagation of singularities. Proc. Edinb. Math. Soc. 48, 603–629 (2005)
[7] Grosser, M., Kunzinger, M., Oberguggenberger, M., Steinbauer, R.: Geometric Theory of
Generalized Functions with Applications to General Relativity. Kluwer Academic Publ.,
Dordrecht (2001)
[8] Hörmander, L.: The Analysis of Linear Partial Differential Operators, I: Distribution
Theory and Fourier Analysis. Grundlehren der mathematischen Wissenschaften, vol. 256,
2nd edition. Springer-Verlag, Berlin (1990)
[9] Hörmann, G., De Hoop, M.V.: Microlocal analysis and global solutions for some hyperbolic
equations with discontinuous coefficients. Acta Appl. Math. 67, 173–224 (2001)
[10] Hörmann, G., Kunzinger, M.: Microlocal properties of basic operations in Colombeau
algebras. J. Math. Anal. Appl. 261, 254–270 (2001)
[11] Hörmann, G., Oberguggenberger, M., Pilipović, S.: Microlocal hypoellipticity of linear
differential operators with generalized functions as coefficients. Trans. Amer. Math. Soc.
358, 3363–3383 (2006)
[12] Marti, J.A.: (C, E ,P)-Sheaf structure and applications. In: Grosser, M., Hörmann, G.,
Kunzinger, M., Oberguggenberger, M. (Eds.) Nonlinear Theory of Generalized Functions.
Chapman & Hall/CRC Research Notes in Mathematics, vol. 401, pp.175–186. Boca Raton
(1999)
[13] Marti, J.A.: Nonlinear algebraic analysis of delta shock wave to Burgers’ equation. Pacific
J. Math. 210(1), 165–187 (2003)
[14] Marti, J.A.: GL-microanalysis of generalized functions. Integral Transf. Spec. Funct. 2–3,
119–125 (2006)
[15] Nedeljkov, M., Pilipović, S., Scarpalézos, D.: The linear theory of Colombeau general-
ized functions. Pitman Research Notes in Mathematics, vol. 385. Longman Scientific &
Technical, Harlow (1998)
[16] Oberguggenberger, M.: Multiplication of Distributions and Applications to Partial Differ-
ential Equations. Pitman Research Notes in Mathematics, vol. 259. Longman Scientific &
Technical, Harlow (1992)
[17] Oberguggenberger, M.: Generalized solutions to nonlinear wave equations. Matemática
Contemporânea 27, 169–187 (2004)
[18] Rauch, J., Reed, M.: Jump discontinuities of semilinear, strictly hyperbolic equations in
two variables: Creation and propagation. Comm. Math. Phys. 81, 203-227 (1981)
[19] Scarpalézos, D.: Colombeau’s generalized functions: topological structures; microlocal
properties. A simplified point of view. Bull. Cl. Sci. Math. Nat. Sci. Math. 25, 89–114
(2000)
[20] Schwartz, L.: Théorie des Distributions. Hermann, Paris (1966)
[21] Travers, K.: Semilinear hyperbolic systems in one space dimension with strongly singular
initial data. Electron. J. Diff. Eqns. 1997(14), 1–11 (1997)
	Introduction
	Preliminary definitions and local parametric analysis
	The presheaves of (C,E,P)-algebras: the algebraic structure 
	Relationship with distribution theory and Colombeau algebras
	An association process
	The F-singular support of a generalized function
	The concept of (a,F)-microlocal analysis
	The (a,F)-singular parametric spectrum
	Example: The Colombeau case
	Some properties of the (a,F)-singular parametric spectrum
	Linear properties
	Differential properties
	Nonlinear properties
	Applications to partial differential equations
	The singular spectrum of solutions to linear hyperbolic equations
	The singular spectrum of solutions to semilinear hyperbolic equations
	The strength of a singularity and the sum law
	Regular Colombeau generalized functions
ABSTRACT
  We introduce a new type of local and microlocal asymptotic analysis in
algebras of generalized functions, based on the presheaf properties of those
algebras and on the properties of their elements with respect to a regularizing
parameter. Contrary to the more classical frequential analysis based on the
Fourier transform, we can describe a singular asymptotic spectrum which has
good properties with respect to nonlinear operations. In this spirit we give
several examples of propagation of singularities through nonlinear operators.

<|endoftext|><|startoftext|>
Introduction 
In classical spin systems, disorder destroys magnetic order by cutting off the correlation 
between neighboring spins.  However, in quantum spin systems, where the magnetic order is 
instabilized by the quantum spin fluctuation, disorder often brings the non-trivial effect to 
their ground state.  A low number of holes drastically destruct the antiferromagnetic order in 
parent phase of high-TC cuprates through frustration effect.1)  In spin liquid systems 
represented by CuGeO3, disorder induces an unpaired spin among singlet dimer and brings a 
static magnetic order to the disordered state.2) 
The title compounds of (CH3)2CHNH3CuCl3 and (CH3)2CHNH3CuBr3 abbreviated as 
IPA-CuCl3 and IPA-CuBr3 are isostructural spin gap systems with a non-magnetic ground 
state.  Both the mechanism and the magnitude of the gap are quite different in the two 
systems. In the Cl-system, which was discovered to have a ladder-like magnetic structure 
based on an inelastic neutron experiment5), the neighboring two S=1/2 spins on the rung 
interact ferromagnetically to form effective S=1 spins at low temperatures.  The interaction 
among these integer spins is weak and antiferromagnetic, so that the ground state is gapped 
one as Haldane conjectured6).  In a Br-system, the interaction between the two neighboring 
spins is antiferromagnetic, so that they form a singlet dimer state at low temperatures.  The 
spin excitation gaps of these two systems have been reported to be 18 and 98 K based on 
magnetization experiments7-9).   The former was re-examined in the neutron experiment and 
reported to be 13.9 K5).  The aim of this paper is a microscopic investigation on the ground 
state of the solid solution of the two spin gap systems. 
Manaka et al. have been working intensively on the macroscopic measurements in 
IPA-Cu(ClxBr1-x)3, and they report that the system becomes gapless only within the limited 
region x=0.44-0.87, where a magnetic order occurs at low temperatures9,10).  However, a 
microscopic investigation on the spin state has not been conducted except for our preliminary 
result by µSR11).  In this paper, we report the existence of an antiferromagnetic phase 
transition in IPA-Cu(ClxBr1−x)3 (x=0.85) through the use of 1H-NMR spectra and the 
spin-lattice relaxation rate. 
2. Experimental 
Evaporation method was utilized to grow single crystals of IPA-Cu(ClxBr1−x)3 (x=1 and 
0.85).  An isopropanol solution of isopropylamine hydrochloride, copper(II) chloride 
dihydrate, isopropylamine hydrobromide, copper (II) bromide was placed in a bowl, which 
was maintained at 30(±0.1)°C, and in an atmosphere of flowing nitrogen gas during an entire 
period of crystal growth, approximately two months.  A typical size of the obtained crystals 
was around 3×5×10mm with a rectangular shape as was reported in a previous paper 9).  The 
content of Cl, x=0.85(5) was determined using the ICP method for three tiny fragments 
chipped off from different points of the crystal.  Macroscopic quantities of obtained crystals, 
the specific heat and the susceptibility were measured by PPMS and MPMS manufactured by 
Quantum Design Co Ltd.  The uniform susceptibility of x=0.85 showed a kink at around 
12K, below which it steeply decreased for H⊥C and slightly increases for H⊥A.  The 
specific heat also showed a small but distinct cusp at the same temperature.  The sample x=1 
showed no anomalies in neither of the two quantities.  These results are consistent with 
reports by Manaka 9,10). 
1H-NMR measurements were performed using a 4K cryogen-free refrigerator set in a 6T 
cryogen-free-superconducting magnet.  The spectra were measured by recording a spin-echo 
amplitude simultaneously ramping up or down the magnetic field.  The spin-lattice 
relaxation rate T1−1 was measured by the saturation-recovery method with a pulse train.  
Relaxation curves were traced until the difference of the nuclear magnetization from its 
saturation value becomes one percent. 
3.  Results and Discussion 
Figure 1 shows the field-swept spectra of the two samples x=0.85 and 1.0 at various 
temperatures under the field around 3T.  At high temperatures both the samples show a sharp 
paramagnetic resonance line.  The width around 18K is approximately 200 and 80 Oe for 
x=0.85 and 1.0 respectively.  As the temperature decreases, the x=1.0 sample maintains a 
sharp width, and only the x=0.85 sample shows a drastic splitting in the peak below TN=13.5 
K.  This temperature coincides with that for an anomaly in 0χ  and the specific heat.  Five 
distinct peaks overlap each other at the lowest temperature of 4 K.  
These multiple peaks in spectra correspond to the inequivalent proton sites exposed to a 
hyperfine field produced by the antiferromagnetically-ordered spins.  Ten inequivalent 
proton sites are in the unit cell, and these proton sites have different distances from the nearest 
Cu site, varying from 3.3 to 6.8 Å.  They should feel different hyperfine fields from the 
ordered moment and should contribute to each split peak.  Actually, observed peak 
separations around 300 Oe are comparable with the estimation by the classical dipole-dipole 
interaction between electronic and nuclear spins.  Though we are still in the process of 
conducting a detailed site assignment, which requires measurements of spectra involving 
various directions of the field, we conclude that the spin structure is simple antiferromagnetic 
rather than incommensurate or glass-like.   
Recently, Nakamura numerically investigated the disordered bond-alternating spin chain 
to report that an antiferromagnetic state is more stable than a glass-like state expected in a 
classical point of view4).  He pointed out that the emergence of Néel state is totally to due the 
quantum effect.   In our previous µSR report11), we have pointed out that in the sample with 
x=0.95 which belongs to the gapped region, the antiferromagnetic spin fluctuation is 
anomalously enhanced at low temperatures.  This fact suggests that the magnetic instability 
is inherently present in the gapped sample. 
Figure 2 shows the dependence on the temperature of a peak separation H∆ , which is 
defined as the distance between the positions at 20% of maxima of the left most and right 
most peaks, and is considered to be a good measure for the magnitude of the staggered 
moment and hence of the order parameter.  The dependence of H∆  on temperature is aptly 
described by the scaling function ( )βN1 TT− , where β  is a critical exponent estimated 
from data fitting to be 0.33, the value of which is close to the value of 0.327 which is 
expected 3D-Ising model16), and is consistent with our report by µSR11).  The reason for the 
appearance of dimensionality of the universality class is 3D, simply because the phase 
transition in 1D-spin systems must be set off by the weak interaction paths that runs three 
dimensionally in the system. 
In previous paper11), we have explained the appearance of the Ising-type universality 
class in terms of D*, an effective single-ion anisotropy that appears at low temperatures 
owing to the formation of effective S=1 spins in ferromagnetic dimers.  However, an 
existence of effective S=1 spins12-14) are proved only in the pure system of x=1.0 but the 
disordered system x=0.85 containing both ferromagnetic and antiferromagnetic dimers.  It is, 
therefore, not self-evident whether or not the disordered system has a universality class 
identical to the pure system.  According to Harris criterion15), the disorder does not affect the 
universality class only for those systems with the negative critical exponent of the specific 
heat α, which does not hold in the present case. 
Therefore, the universality class of x=0.85 sample should be determined carefully 
through experiments.  Generally, in the presence of finite magnetic anisotropy D, the 
universality class of the spin system with Heisenberg interaction J shows a crossover from the 
isotropic Heisenberg class to Ising-like one as temperature getting closer to TN, at the 
temperature defined as 1~φ−t
, where t  is the reduced temperature, and 1<φ , the 
crossover index16).  The critical exponent is expected to show simultaneously the gradual 
change from 0.367 of 3D-Heisenberg to 0.327 of 3D-Ising.  However, as is clear in Fig. 2, 
the value of β, the gradient of the fitting line, shows a further decrease from 0.33 as the fitting 
temperature region is expanded.   We conclude that our previous interpretation of the 
observed β to be the Ising model arisen from D* is possibly be misleading.  The reason of 
appearance of β=0.33 experimentally confirmed by both µSR and NMR is still open question 
at this stage, and must be resolved in the future both theoretically and experimentally. 
Figure 3 shows the dependence of the spin-lattice relaxation rate 11
−T  on temperature.  
In a ordered state, 11
−T  was measured on a left most peak.  There was no significant 
difference in 11
−T  on any peaks except for the center peak which bears rather a lower 
relaxation rate.  While 11
−T  of the gapped sample (x=1.0) decreases exponentially as the 
temperature decreases, reflecting the gapped ground state, the gapless sample (x=0.85) shows 
diverging behavior around NT .  This clearly demonstrates the fact that the observed phase 
transition is a second order one.  In the vicinity of the second order phase transition, the 
fluctuation in the magnetic field shows a critical slow down and enhances NMR- 11
−T , a 
measure for the Fourier component of the Larmor frequency in the fluctuation.  The 
dominant q-component of the fluctuation is considered to be located at far apart from zero, 
that is, possibly, around π=q .  The reason for this is the uniform susceptibility that probes 
the fluctuation around q=0 shows no diverging behavior around NT . 
The scaling plot of 11
−T  in the temperature region both above and below NT  is shown 
in Fig. 4, where the dynamical critical exponent was obtained by fitting ( )nTT N1− to the 
observed temperature dependence as n=1.0(4) for NTT <  and 0.5(4) for NTT > .  The 
latter value above TN coincides with that for the classical 3D localized spin system.  The 
universality class belongs to the three dimensional one, for which the same argument as that 
on β stated above holds.   
In conclusion, we investigated 1H-NMR on a bond-disordered spin-gap system 
IPA-Cu(Cl0.85Br0.15)3.  The existence of an antiferromagnetic long-range order was clearly 
demonstrated by peak splitting in the spectra and by the divergence of 11
−T .  The critical 
exponent was obtained from the temperature dependence of hyperfine field to be β=0.33, 
which is close to the value expected for the 3D-Ising model. 
Acknowledgment 
We thank Dr. H. Manaka and Prof. T. Ohtsuki for their valuable advice, and Dr. K. Noda 
at Kuwahara Lab., Sophia University for his assistance with specific heat measurements using 
PPMS.  We also thank Prof. T. Nojima at the Center of Low Temperature Science of Tohoku 
Univ. for his assistance with magnetization measurements using MPMS.   This work was 
supported by the Kurata Memorial Hitachi Science and Technology Foundation, Saneyoshi 
Scholarship Foundation and by a Grant-in-Aid for Scientific Research on priority Areas “High 
Field Spin Science in 100T” (No.451) from the Ministry of Education, Culture, Sports, 
Science and Technology (MEXT). 
1) Y. Kitaoka, S. Hiramatsu, K. Ishida, T. Kohara, K. Asayama: J. Phys. Soc. Jpn. 56, 3024 
(1987) . 
2) M. Hase, I. Terasaki, Y. Sasago, K. Uchinokura and H. Obara: Phys. Rev. Lett. 71, 4059 
(1993) . 
3) M. P. A. Fisher, P. B. Weichman, G. Grinstein, and D. S. Fisher: Phys. Rev. 40, 546 (1989). 
4) T. Nakamura: Phys. Rev. B71, 144401 (2005). 
 -10-
5) T. Masuda, A. Zheludev, H. Manaka, L.-P. Regnault, J.-H. Chung and Y. Qiu: Phys. Rev. 
Lett. 96, 047210 (2006).  
6) F. D. Haldane: Phys. Rev. Lett. 50, 1153 (1983). 
7) H. Manaka, I. Yamada, and K. Yamaguchi: J. Phys. Soc. Jpn. 66, 564 (1997). 
8) H. Manaka and I. Yamada: J. Phys. Soc. Jpn. 66, 1908 (1997). 
9) H. Manaka, I. Yamada and H. Aruga Katori: Phys. Rev. B63, 104408 (2001); In early 
papers of Manaka, three surfaces of the crystal were referred as A, B and C-plane in 
ascending order of area.  Recent neutron experiment by Masuda has revealed that b*-axis is 
perpendicular to C-plane, and c-axis, A-plane. 
10) H. Manaka, I. Yamada, H. Mitamura and T. Goto: Phys. Rev. B66, 064402 (2002). 
11) T. Saito, A. Oosawa, T. Goto, T. Suzuki and I. Watanabe: Phys. Rev. B74, 134423.  
12) H. Manaka, I. Yamada, M. Hagiwara and M. Tokunaga: Phys. Rev. B63, 144428 (2001). 
13) H. Manaka,I. Yamada: Physica B284-288, 1607 (2000). 
14) H. Manaka,I. Yamada: Phys. Rev. B62, 14279 (2000). 
15) A. B. Harris: J. Phys. C 7, 1671 (1974). 
16] J. Cardy: “Scaling and Renomalization in Statistical Physics” (Cambridge U. P. 1996). 
 -11-
−0.05 0 0.05
H−1γ/ν0 (T)
18.0K x=0.85(gapless)
x=1.0 (gapped) 15K
Fig. 1 1H-NMR spectra of x=0.85 and 1.0 at various temperatures.  The gyromagnetic ratio 
of 1H, 42.5774MHz/T is denoted as 1γ. 
0 5 10 15
Temperature (K)
TN=13.5K
0.01 0.05 0.1 0.5 1
H⊥C x=0.85 (gapless)
1−T(K)/TN
 β=0.33
∆H~ (1−|T |/TN)
 -12-
Fig. 2 Dependence of the splitting width ∆H on the temperature, definition of which is shown 
in Fig. 1.  The curve shows the scaling function ( )βN1 TT− , where =NT 13.5 K and 
33.0=β  are determined by data fitting. 
0 10 20 300
Temperature (K)
  x=0.85   
(gapless)
  x=1
(gapped)
0 1.0 2.0
0 10 20
Delay (msec)
↑ 3.7K
↓ 14K
Fig. 3  Dependence of the spin-lattice relaxation rate 11
−T  on temperature of gapped (x=1.0) 
and gapless (x=0.85) samples with the field direction H⊥C and H⊥B respectively.    The 
inset shows typical relaxation curves for the x=0.85 sample. 
 -13-
0.05 0.1 0.5 1
1−|T |/TN
x=0.85 (gapless)
T <TN    1.0  
T >TN    0.5
−1~ (1−|T |/TN)
Fig. 4 Scaling plot of 11
−T  against the reduced temperature N1 TT− .  The dynamical 
critical exponent was obtained from data fitting (dashed lines).
ABSTRACT
  The antiferromagnetic ordering in the solid-solution of the two spin-gap
systems (CH3)2CHNH3CuCl3 and (CH3)2CHNH3CuBr3 has been investigated by 1H-NMR.
The sample with the Cl-content ratio x=0.85 showed a clear splitting in spectra
below TN=13.5 K, where the spin-lattice relaxation rate T1-1 showed a diverging
behavior. The critical exponent of the temperature dependence of the hyperfine
field is found to be 0.33.

<|endoftext|><|startoftext|>
Introduction
String Theory, as the leading candidate for a unified theory of Particle Physics and
Gravity, should be able to describe all observed particle phenomena. One the most
valuable experimental pieces of information obtained in the last decade concerns neu-
trino masses. Indeed the evidence from solar, atmospheric, reactor and accelerator
experiments indicates that neutrinos are massive. The simplest explanation of the
smallness of neutrino masses is the see-saw mechanism [1]. The SM gauge symmetry
allows for two types of operators bilinear on the neutrinos (with dimension ≤ 4) :
Lν = MabνaRνbR + habνaRH̄Lb (1.1)
where νR is the right-handed neutrino, L is the left-handed lepton doublet and H̄ is
the Higgs field. In supersymmetric theories, this term arise from a superpotential with
the above structure, upon replacing fields by chiral superfields. If Mab is large, the
lightest neutrino eigenvalues have masses
Mν = < H̄ >
2 hTM−1h (1.2)
For M ∼ 1010 − 1013 GeV and Dirac neutrino masses of order charged lepton masses,
the eigenvalues are consistent with experimental results.
What is the structure of neutrinos and their masses in string theory? In specific
compactifications giving rise to the MSSM spectra singlet fields corresponding to right-
handed neutrinos νR generically appear. Dirac neutrino masses are also generically
present but the required Majorana νR masses are absent. This is because most MSSM-
like models constructed to date have extra U(1) symmetries, under which the right-
handed neutrinos are charged, which hence forbid such masses. In many models, such
symmetries are associated to a U(1)B−L gauge boson beyond the SM. In order to
argue for the existence of νR masses, string model builders have searched for non-
renormalizable couplings of the type (νRνRN̄RN̄R) with extra singlets NR. Once the
latter fields get a vev, U(1)B−L is broken and a Majorana mass appears for the νR.
Although indeed such couplings (or similar ones with higher dimensions) exist in some
semi-realistic compactifications, such a solution to the neutrino mass problem in string
theory has two problems: 1) The typical νR masses so generated tend to be too small
due to the higher dimension of the involved operators and 2) The vevs for the NR fields
breaks spontaneously R-parity so that dimension 4 operators potentially giving rise to
fast proton decay are generated. This is in a nutshell the neutrino problem in string
compactifications (see [2] for a recent discussion in heterotic setups).
In [3] (see also [4]) two of the present authors pointed out that there is a built-
in mechanism in string theory which may naturally give rise to Majorana masses for
right-handed neutrinos. It was pointed out that string theory instantons may generate
such masses through operators of the general form
Mstring e
−U νRνR . (1.3)
Here U is a linear combination of closed string moduli whose imaginary part gets
shifted under a U(1)B−L gauge transformation in such a way that the operator is fully
gauge invariant. The exponential factor comes from the semi-classical contribution
of a certain class of string instantons. This a pure stringy effect distinct from the
familiar gauge instanton effects which give rise to couplings violating anomalous global
symmetries like (B+L) in the SM. Here also (B−L) (which is anomaly-free) is violated.
This operator is generated due to existence of instanton fermionic zero modes which
are charged under (B − L) and couple to the νR chiral superfield. Although the effect
can take place in different constructions, the most intuitive description may be obtained
for the case of Type IIA CY orientifold compactifications with background D6-branes
wrapping 3-cycles in the CY. In the simplest configurations one has four SM stacks
of D6-branes labeled a,b, c,d which correspond to U(3), SU(2) (or U(2)), U(1)R and
U(1)L gauge interactions respectively, which contain the SM group. One can construct
compactifications with the MSSM particle spectrum in which quarks and leptons lie
at the intersections of those SM D6-branes. Then the relevant instantons correspond
to euclidean D2-branes wrapping 3-cycles in the CY (satisfying specific properties so
as to lead to the appropriate superspace interaction). The D2-D6 intersections lie the
additional fermionic zero modes which are charged under (B−L). For instantons with
the appropriate number of intersections with the appropriate D6-branes, and with open
string disk couplings among the zero modes and the νR chiral multiplet (see fig.(3.1)),
the operator in (1.3) is generated.
The fact that the complex modulus U transforms under U(1)B−L gauge transfor-
mations indicates that the U(1)B−L gauge boson gets a mass from a Stückelberg term.
So a crucial ingredient in the mechanism to generate non-perturbative masses for the
νR’s is that there should be massless U(1)B−L gauge boson which become massive by a
Stückelberg term. It turns out that not many semi-realistic models with U(1)B−L mass
from Stückelberg couplings have been constructed up to date. In the literature there
are two main classes of RR tadpole free models with massive B-L. The first class are
non-susy type IIa toroidal orientifold models first constructed in [5]. The second class
are the type II Gepner orientifold models constructed by one of the present authors and
collaborators [6, 7]. The former were already considered in [3]. In the present paper we
will concentrate on the RCFT Gepner model constructions, which lead to a large class
of MSSM like models, more representative of the general Calabi-Yau compactifications
(for a recent discussion of instanton-induced neutrino masses in a model with no RR
tadpole cancellation, see [8]).
The class of constructions in [6, 7] start with any of the 168 Type II compacti-
fications obtained by tensoring N = 2 SCFT minimal models. In addition one can
choose a number of modular invariant partition functions (MIPF), leading to a total of
5403. Then different consistent orientifold projections are performed on the different
models. This yields a total of 49304 Type II orientifolds. The open string sector of
the theory is defined in terms of the boundary states of the theory. Intuitively, they
play the same role as D-branes wrapping cycles in the geometrical settings. Thus one
associates boundary states a,b, c,d to the gauge groups giving rise to the SM. Differ-
ent choices for the SM boundary states lead to different spectra. In the present paper
we will make use of the data in [6] which contains 211634 different MSSM-like spectra
(including also different hidden sectors). Although this number is huge, most of these
models are really extensions of the MSSM, since they have either an extra U(1)B−L or
SU(2)R × U(1)B−L group factor beyond the SM group. As we said, we are actually
only interested in models in which the U(1)B−L gets a Stückelberg mass. Then we
find that the number of MSSM-like models with these characteristics is dramatically
reduced: only 0.18 percent of the models (391) have a massive U(1)B−L.
As we said, in the geometrical setting of IIA orientifolds with intersecting D6-branes
[9, 10] (see [11] for reviews and [12, 13] for the IIB counterparts), instantons are associ-
ated to D2-branes wrapping 3-cycles, like the background D6-branes do. Analogously,
in the RCFT setting the same class of boundary states appearing in the SM construc-
tions are the ones corresponding to instantons. The zero modes on the instanton is
computable from the overlaps of instanton brane boundary states (zero modes un-
charged under the 4d gauge group) or of instanton and 4d spacefilling brane boundary
states (zero modes charged under the corresponding gauge factor). We find that the
criteria for a non-perturbative superpotential to be generated [14] are only fulfilled if
the Chan-Paton (CP) symmetry of the instantons is O(1). For instantons with CP
symmetry 1 Sp(2) or U(1) we find that there are a few extra uncharged fermionic
zero modes which would preclude the formation of the searched superpotentials. On
the other hand we argue that the addition of fluxes and/or possible non-perturbative
1We adopt the convention that the fundamental representation of Sp(m) is m-dimensional.
extensions of the orientifold compactifications would allow also instantons with Sp(2)
and U(1) symmetries to generate such superpotentials. We thus include all O(1), Sp(2)
and U(1) instantons 2 in our systematic search. The computation of charged and un-
charged fermion zero modes may be easily implemented as a routine in a systematic
computer search for instanton zero modes in Gepner MSSM-like orientifolds. Results
of such a systematic computer search are presented in this article.
We find that out of the 391 models with massive U(1)B−L, there are very few ad-
mitting instantons with the required minimal O(1) CP symmetry, and in fact none
of them without additional vector-like zero modes. On the other hand we do find
32 models admitting Sp(2) symmetric instantons with just the required charged zero
mode content (and the minimal set of non-chiral fermion zero modes). In fact they
are all variations of the same orientifold Gepner model based on the tensor product
(k1, k2, k3, k4) = (2, 4, 22, 22). These models all in fact correspond to the same MIPF
and orientifold projection, they only differ on which particular boundary states corre-
sponding to the four a,b, c,d SM ‘stacks’. All models have the same chiral content,
exactly that of the MSSM , with extra vectorlike chiral fields which may in principle
become massive in different points of the CY moduli space. They have no hidden
sector, i.e., the gauge group is just that of the SM. For each of the models there are 8
instantons with Sp(2) CP symmetry with just the correct charged zero mode structure
allowing for the superpotential coupling (1.3) giving rise to νR Majorana masses. As
we said, they have extra uncharged fermion zero modes beyond the two required to
generate a superpotentials. However one would expect that these unwanted zero modes
might be lifted in more generic situations in which e.g. NS/RR fluxes are added.
We thus see that, starting with a ’large’ landscape of 211634 MSSM-like models,
and searching for instantons inducing neutrino masses, we find there are none admitting
the O(1) instantons with exactly the required zero mode structure, and only few (32)
examples with Sp(2) instantons with next-to-minimal uncharged zero mode structure
(and exactly the correct charged zero modes). Let us emphasize though that it is
the existence of massive U(1)B−L models which is rare. Starting with the subset of
models with a massive U(1)B−L, finding models with correct instanton charged zero
modes within that class is relatively frequent, 10 percent of the cases. Furthermore, we
will see that there are further models beyond those 32 which contain extra non-chiral
instanton zero modes and which may also be viable if these modes get massive by some
2We refer to the different kinds of instanton by their Chan-Paton symmetry on their volume. Since
we are not interested in gauge theory instantons, this notation should not be confusing.
effect (like e.g. the presence of RR/NS fluxes).
Instantons may generate some other interesting superpotential couplings in addition
to νR masses, some possibly beneficial and others potentially dangerous. In particular
we find that in the models which contain Sp(2) instantons which might induce νR
masses, there are also other instantons which would give rise directly to the Weinberg
operator [15]
(LHLH) (1.4)
Once the Higgs field gets a vev, this gives rise directly to left-handed neutrino masses.
Thus we find that in that class of models both the see-saw mechanism (which also gives
rise to a contribution to the Weinberg operator) and an explicit Weinberg operator
might contribute to the physical masses of neutrinos. Which effect dominates will
depend on the relative size of the corresponding instanton actions as well as on the size
of the string scale. Among potentially dangerous operators which might be generated
stand the R-parity violating operators of dimension < 5, which might give rise e.g. to
fast proton decay. We make an study of the possible generation of those, and find that
in all models in which νR masses might be generated R-parity is exactly conserved.
This is a very encouraging result.
A natural question to ask is whether one can say something about the structure
of masses and mixings for neutrinos. As argued in [3] generically large mixing angles
are expected, however to be more quantitative we also need to know the structure
of Yukawa couplings for leptons. In principle those may be computed in CFT but
in practice this type of computation has not yet been developed for CFT orientifolds.
Nevertheless we show that, in the case of instantons with Sp(2) CP symmetry, a certain
factorization of the flavor structure takes place, which could naturally give rise to a
hierarchical structure of eigenvalues for neutrino masses.
The structure of this article is as follows. In the next section we present a discussion
of instanton induced superpotentials in Type II orientifolds. This discussion will apply
both to Type IIA and Type IIB CY orientifolds as well as to more abstract CFT
orientifolds. We discuss the structure of both uncharged and charged instanton zero
modes. In particular we show that only instantons with O(1) CP symmetry have the
appropriate uncharged zero mode content to induce a superpotential contribution. We
also discuss how Sp(2) and U(1) might still generate superpotential contributions if
extra ingredients are added to the general setting. In section 3 we apply that discussion
to the specific case of the generation of νR Majorana masses, showing what is the
required zero mode structure in this case. We show how the flavor structure of the
Majorana mass term factorizes in the case of instantons with Sp(2) CP symmetry,
leading potentially to a hierarchical structure of eigenvalues. We further discuss the
generation of other B/L-violating operators including the generation of the Weinberg
operator as well as R-parity violating couplings. In section 4 we review the RCFT
Type II orientifold constructions in [6, 7]. A general discussion of zero fermion modes
for instantons in RCFT orientifolds is presented in section 5.
The results of our general search for instantons generating νR masses are presented
in section 6. We provide a list of all Gepner orientifolds which admit instanton con-
figurations potentially giving rise to νR Majorana masses. We describe the structure
of the models with Sp(2) instantons having the required charged zero modes for that
superpotential to be generated. We also describe the boundary states of the corre-
sponding instantons and provide the massless spectrum of the relevant MSSM-like
models. Other orientifolds with zero mode structure close to the minimal one are also
briefly discussed. A brief discussion about the possible generation of R-parity violating
superpotentials is included. We leave section 7 for some final comments. Some nota-
tion on the CFT orientifold constructions, and a discussion of the CFT symmetries in
the Sp(2) examples are provided in two appendices.
As this paper was ready for submission, we noticed [16, 17], whose discussion of
instanton zero modes partially overlaps with our analysis in Section 2.2.
2 Instanton induced superpotentials in Type II ori-
entifolds
In this Section we review the generation of superpotentials involving 4d charged fields
via D-brane instantons in type II compactifications. The discussion applies both to type
IIA and IIB models, and to geometrical compactification as well as to more abstract
internal CFT’s. For recent discussions on D-brane instantons, see [4, 3, 25, 8] .
Before starting, a notational remark in in order. Our notation is adapted to working
in the covering theory, namely the type II compactification, and orientifolding in a
further step. Thus we describe the brane configurations as a system of branes (described
by boundary states for abstract CFT’s), labeled k, and their orientifold images labeled
k′. Similarly, we denote M the brane / boundary state corresponding to the instanton
brane, andM ′ its orientifold image. If a brane is mapped to itself under the orientifold
action, we call it a ‘real’ brane / boundary, and ‘complex’ otherwise.
2.1 D-brane instantons, gauge invariance and effective opera-
A basic feature of type II orientifold compactifications with D-branes is the generic
presence of Stückelberg couplings between the U(1) gauge fields on the D-branes, and
certain 4d RR closed string 2-forms. These couplings are required by the Green-
Schwarz mechanism when the U(1)’s have non-zero triangle contributions to mixed
anomalies [18, 19], but can also exist for certain non-anomalous U(1)’s [5, 20]. These
couplings make the U(1) gauge bosons massive, but the symmetry remains as a global
symmetry exact in perturbation theory. Since the closed string moduli involved are
scalars (0-forms) in the RR sector, the natural candidate non-perturbative effects to
violate these U(1) symmetries are instantons arising from euclidean D-branes coupling
to these fields.
In computing the spacetime effective interaction mediated by the instanton, one
needs to integrate over the instanton zero modes. In the generic case (and in particular
for the case of our interest) there are no bosonic zero modes beyond the universal ones
(namely, the four translational bosonic zero modes associated to the position of the
instanton). On the other hand, the structure of fermion zero modes will be crucial.
Since we are interested in models with non-trivial 4d gauge group, arising from a set
of background 4d spacetime filling branes, we consider separately fermion zero modes
which are uncharged under the 4d gauge group and those which are charged. In this
paper we restrict our discussion to 4d N = 1 supersymmetric models, and this will
simplify the analysis of zero modes.
Fermion zero modes which are uncharged under the 4d gauge group determine
the kind of 4d superspace interaction which is generated by the instanton. We are
interested in generating superpotential interactions, which, as is familiar, requires the
instanton to have two fermion zero modes to saturate the d2θ superspace integration.
For this, a necessary (but not sufficient!) condition is that the D-branes are half-BPS,
so these fermion zero modes are the Goldstinos of the two broken supersymmetries.
In the string description, uncharged zero modes arise from open strings in the MM
sector (in our notation, the one leading to adjoint representations), which in particular
contain these Goldstinos, and the MM ′ sector (in our notation, the one leading to
two-index symmetric or antisymmetric tensors). Note that both are the same for ‘real’
branes. Hence we are primarily interested in D-branes whose MM sector contains
just two fermion zero modes, and whose MM ′ sector (for ‘complex’ branes) does not
contain additional fermion zero modes.
In analogy with the familiar case of gauge theory instantons [21], charged fermion
zero modes determine the violation of perturbative global symmetries by the instanton-
induced interaction. Namely, in order to saturate the integration over the charged
fermions zero modes, the spacetime interaction must contain insertions of fields charged
under the 4d gauge symmetry, and in particular under the global U(1) factors, which
are thus violated by the D-brane instanton. Notice that this holds irrespectively of the
number of uncharged fermion zero modes, namely of the kind of superspace interaction
induced by the instanton. Restricting to superpotential interactions, the resulting
operator in the 4d effective action has roughly the form
Winst = e
−U Φ1 . . .Φn (2.1)
Here the fields Φ1, . . . ,Φn are 4d N = 1 chiral multiplets charged under the 4d gauge
group, and in particular also under the U(1) symmetries. Note also that the instanton
amplitude contains a prefactor (which in general depends on closed and open string
moduli) arising from the Gaussian path integral over (massive) fluctuations of the
instanton (hence described by an open string annulus partition function, see [22, 23]
for related work), which we can ignore for our purposes in this paper.
For D-brane instantons, U is the closed string modulus to which the euclidean D-
brane couples. In the D-brane picture, instanton fermion zero modes charged under
the gauge factor carried by the kth stack of 4d space-filling branes (and its image
k′) arise from open strings in the Mk and Mk′ sectors, transforming as usual in the
( M , k) and ( M , k) representations, respectively (with both related in the case of
‘real’ branes). The (net) number of instanton fermion zero modes with such charges is
given by certain multiplicities 3 IMk, IMk′.
A D-brane instanton, irrespectively of the superspace structure of the 4d interac-
tions it may generate, violates U(1)k charge conservation by an amount IMk − IMk′ for
‘complex’ branes and IMk for ‘real’ branes. In particular, this is the total charge of the
field theory operator Φ1 . . .Φn in (2.1). From the Stückelberg couplings, it is possible
to derive [3] (see [23, 24, 25] for related work, also [26])
3In geometric type IIA compactifications with 4d spacetime-filling branes and instanton branes
given by D6- and D2-branes wrapped on Special Lagrangian 3-cycles, IMk corresponds to the inter-
section number between the 3-cycles corresponding to the kthD6- and the D2-brane M (and similarly
for IMk′ ). In geometric type IIB compactifications, it corresponds to the index of a suitable Dirac
operator. In general (even for abstract CFT’s) it can be defined as the Witten index for the 2d theory
on the open string with the boundary conditions corresponding to the two relevant branes. We will
often abuse language and refer to this quantity as intersection number, even in Section 6 where we
work in the non-geometric regime of type IIB compactifications.
that for ‘complex’ instantons, gauge transformations of the U(1)k vector multiplets
Vk → Vk + Λk, transform U as
U → U +
Nk(IMk − IMk′)Λk (2.2)
For ‘real’ brane instantons, which were not considered in [3], the shift is given by 4
U → U +
NkIMkΛk (2.3)
(this new possibility will be an important point in our instanton scan in Section 6).
The complete interaction (2.1) is invariant under the U(1) gauge symmetries. How-
ever, from the viewpoint of the 4d low-energy effective field theory viewpoint, it leads to
non-perturbative violations of the perturbative U(1) global symmetries, by the amount
mentioned above.
In the string theory construction there is a simple microscopic explanation for the
appearance of the insertions of the 4d charged fields (related to the mechanism in
[27]). The instanton brane action in general contains cubic terms αΦ γ, involving two
instanton fermions zero modes α in the ( M , k) and γ in the ( p, M) coupling to the 4d
spacetime field Φ in the ( k, p) of the 4d gauge group
5. Performing the Gaussian path
integral over the instanton fermion zero modes leads to an insertion of Φ in the effective
spacetime interaction. In general, and for a ‘complex’ instanton, there are several
fermion zero modes αi, γi in the fundamental (resp. antifundamental) of the instanton
gauge group, coupling to a 4d spacetime chiral operators Oij (which could possibly
be elementary charged fields, or composite chiral operators). Gaussian integration
over the fermion zero modes leads to an insertion of the form detO (for ‘real’ brane
instantons, detO should be interpreted as a Pfaffian). It is straightforward to derive
our above statement on the net charge violation from this microscopic mechanism.
Note that the above discussions show that instantons in different topological sec-
tors (namely with different RR charges, and different intersection numbers with the
4d spacefilling branes) contribute to different 4d spacetime operators. In particular,
multiwrapped instantons, if they exist as BPS objects, contribute to operators different
4Equivalently, one may use (2.2), but must include an additional factor of 1/2 from the reduction
of the volume for a real brane (which is invariant under the orientifold action).
5 Although there is no chirality in 0+0 dimension, the fermion zero modes α and γ are distinguished
by their chirality with respect to the SO(4) global symmetry of the system (which corresponds to
rotations in the 4d spacetime dimensions, longitudinal to the space-filling branes, and transverse to
the instanton brane). Supersymmetry of the instantons constrains the couplings on the instanton
action (such as the cubic couplings above) to have a holomorphic structure.
from the singly wrapped instanton. This implies that the instanton expansion for a
fixed operator is very convergent, and could even be finite.
Another important implication of the above discussion is that, in order to generate
a specific operator via an instanton process, a necessary condition is that the instanton
has an appropriate number and structure of charged zero modes. However, this is not
sufficient. Insertions of 4d fields appear only if the fields couple to the instanton fermion
zero modes via terms at most quadratic in the zero modes. In equivalent terms, only
zero modes appearing in the Gaussian part of the instanton action can be saturated
by insertions of 4d fields (those to which they couple). The requirement that the zero
modes have appropriate couplings to the 4d fields may be non-trivial to verify in certain
constructions. This is the case for the Gepner model orientifolds in coming sections,
whose couplings are computable in principle, but unknown in practice. In such cases
we will assume that any coupling which is not obviously forbidden by symmetries will
be non-vanishing. Unfortunately there are no arguments to estimate the actual values
of such non-vanishing couplings, hence we can argue about the existence of certain
instanton induced operators, but not about the coefficients of such terms.
2.2 Zero mode structure for D-brane instantons
In this section we describe more concretely different kinds of instantons and the struc-
ture of interesting and unwanted zero modes. Our discussion will be valid for general
D-brane models in perturbative type II orientifolds without closed string fluxes, al-
though we also make some comments on more general F-theory vacua and the effects
of fluxes. A more specific discussion is presented in Section 5.
2.2.1 Uncharged zero modes
We start discussing zero modes uncharged under the 4d gauge group. These are crucial
in determining the kind of superspace interaction induced by the instanton on the 4d
theory. In particular, we are interested in instantons contributing to the 4d superpo-
tential, namely those which contain just two fermion zero modes in this sector. We are
also interested in instantons which may contain additional fermion zero modes, and the
possible mechanisms that can be used to lift them. Let us discuss ‘real’ and ‘complex’
brane instantons in turn.
• Real brane instantons
Real brane instantons correspond to branes which are mapped to themselves by the
orientifold action, hence M = M ′. Uncharged zero modes arise from the MM open
string sector. As discussed in Section 5, there is a universal sector of zero modes, in the
sense that it is present in any BPS D-brane instanton, which we now describe. Before
the orientifold projection, we have a gauge group U(n) on the volume of n coincident
instantons. Notice that, although there are no gauge bosons in 0 + 0 dimensions, the
gauge group is still well-defined, since it acts on charged states (open string ending on
the instanton brane). There are four real bosonic zero modes and four fermion zero
modes in the adjoint representation. For the minimal U(1) case, the four bosons are the
translational Goldstones. The four fermions arise as follows. This sector is insensitive
to the extra 4d spacefilling branes, and feels an accidental 4d N = 2 supersymmetry.
The BPS D-brane instanton breaks half of this, and leads to four Goldstinos, which
are the described fermions 6.
The orientifold projection acts on this universal sector as follows (see Section 5 for
further discussion). The gauge group is projected down to orthogonal or symplectic.
Hence instanton branes with symplectic gauge group must have even multiplicity (a
related argument, in terms of the orientifold action on Chan-Paton indices, is given in
Section 5). For instantons with O(n) gauge symmetry, the orientifold projects the four
bosonic zero modes and two fermion zero modes (related by the two supercharges of 4d
N = 1 supersymmetry broken by the instanton) to the two-index symmetric represen-
tation, and the other two fermion zero modes (related by the other two supercharges of
the accidental 4d N = 2 in this sector) to the antisymmetric representation. Hence for
O(1) instantons (namely instantons with O(1) gauge group on their volume), we have
just two fermion zero modes, which are the Goldstinos of 4d N = 1 supersymmetry,
and the instanton can in principle contribute to the superpotential (if no additional
zero modes arise from other non-universal sectors). For instantons with Sp(n) gauge
symmetry, the orientifold projects the four bosonic zero modes and two fermion zero
modes to the two-index antisymmetric representation, and the other two fermion zero
modes to the symmetric representation. Hence, even for the minimal case of Sp(2)
instantons, we have just two fermion zero modes in the triplet representation, in ad-
dition to the two 4d N = 1 Goldstinos. Hence Sp(2) instantons cannot contribute to
the superpotential in the absence of additional effects which lift these zero modes (see
later) 7.
6We thank F. Marchesano for discussions on this point.
7 For D-brane instantons corresponding to gauge instantons, the additional fermion zero modes
in the universal sector couple to the boson and fermion zero modes from open strings stretched
between the instanton and the 4d spacefilling brane. They act as Lagrange multipliers which impose
In addition to this universal sector, there exist in general additional modes, whose
presence and number depends on the detailed structure of the branes. Namely, on
the geometry of the brane in the 6d compact space in geometric compactifications, or
on the boundary state of the internal CFT in more abstract setups. They lead to a
number of boson and fermion zero modes in the symmetric or antisymmetric represen-
tation. The computation of these multiplicities in terms of the precise properties of
the instanton branes is postponed to Section 5. In order to generate a superpotential,
one must require these modes to be absent, except for the case of antisymmetrics of
O(1) instantons, which are actually trivial.
An important point is that extra fermion zero modes (including the extra triplet
fermion zero modes in the universal sector of Sp(2) instantons, and any two-index
tensor fermion zero mode in the non-universal sectors) are in principle not protected
against acquiring non-zero masses once the model is slightly modified. In other words,
such fermions are non-chiral, in terms of the SO(4) chirality in footnote 5. One
such modification is motion in the closed string moduli space, which can lift the non-
universal modes if there are non-trivial couplings between them and closed string mod-
uli (unfortunately, such correlators are difficult to compute, even in cases where the
CFT is exactly solvable, like the Gepner models). Note that extra zero modes in the
universal sector of Sp(2) instantons cannot be lifted by this effect, since it does not
break the accidental 4d N = 2 in this sector. A second possible modification which
in general can lift extra zero modes is the addition of fluxes, generalizing the results
for D3-brane instantons in geometric compactifications [29] (for non-geometric CFT
compactifications, we also expect a similar effect, once fluxes are introduced following
[30]). Note that fluxes can lift extra zero modes in the universal sector as well, since
they can break the accidental 4d N = 2 susy in this sector. A last mechanism arising in
more general F-theory compactifications and discussed below for complex instantons,
is valid for real instanton branes as well.
The bottom line is that in the absence of such extra effects, only O(1) instantons can
contribute to superpotential terms. However, in modifications of the model such extra
effects can easily lift the extra fermion zero modes. Hence, this kind of extra vector-like
zero modes will not be considered catastrophic, and real instantons (including the O(1)
and Sp(2) cases) with such zero modes are considered in our scan in Section 6.
the fermionic constraints in the ADHM construction [28], and may not spoil the generation of a
superpotential.
• Complex brane instantons
Zero modes uncharged under the 4d gauge group can arise from the MM and
MM ′ open string sectors. Notice that the orientifold action maps the MM sector
to the M ′M ′, hence we simply discuss the former and impose no projection. The
discussion of the MM sector is as for real brane instantons before the orientifold
projection. The universal sector leads to a U(n) gauge symmetry, and four bosonic and
four fermionic zero modes in the adjoint representation. The bosons are translational
Goldstones, while the fermions are Goldstinos of the accidental 4d N = 2 present in
this sector. Hence, even in the minimal case of U(1) brane instantons there are two
extra fermion zero modes, beyond the two fermion zero modes corresponding to the
4d N = 1 Goldstinos. Hence U(1) instanton (except for those corresponding to gauge
instantons, see footnote 7) cannot contribute to superpotential terms in the absence of
additional effects, like closed string fluxes . However, keeping in mind the possibility
of additional effects lifting them in modifications of the model, we include them in the
discussion. Also, in what follows we will use the U(n) notation for the different fields
to keep track of the Chan-Paton index structure.
The above statement would seem in contradiction8 with computations of non-
perturbative superpotentials [14] induced by M5-branes instantons in M-theory com-
pactifications on Calabi-Yau fourfolds, which are dual to D3-brane instantons (with
world-volume U(1) gauge group) on type IIB compactifications. The resolution is that
the M5-branes that contribute are intersected by different (p, q) degenerations of the
elliptic fiber. This implies that U(1) D3-brane instanton only contribute if they are
intersected by mutually non-local (p, q) 7-branes. The two extra fermion zero modes
exist locally on the D3-brane volume, but cannot be defined globally due to the 7-brane
monodromies. Hence such effect can take place only on non-perturbative type IIB com-
pactifications including (p, q) 7-branes. Note that in perturbative compactifications,
namely IIB orientifolds, the (p, q) 7-branes are hidden inside orientifold planes [31] with
their monodromy encoding the orientifold projection; hence the only branes that can
contribute to the superpotential are real branes, for which the projection/monodromy
removes the extra fermion zero modes, as discussed above.
In addition to this universal sector, theMM sector may contain a non-universal set
of fermions and bosons, in the adjoint representation (hence uncharged under U(1)).
They depend on the specific properties of the brane instanton, and will be discussed
in Section 5. These additional zero modes should be absent in order for the instanton
8We thank S. Kachru for discussions on the ideas in this paragraph.
to induce a non-trivial superpotential. Notice however that these zero modes are un-
charged under any gauge symmetry, and hence vector-like. Thus, there could be lifted
in modifications of the model, as discussed for real instantons.
The MM ′ sector is mapped to itself under the orientifold action. Hence it leads to
a number of bosons and fermions in the two index symmetric or antisymmetric repre-
sentations. Notice that the two-index antisymmetric representation is trivial for U(1),
so these modes are actually not present. On the other hand, fermion zero modes in the
two-index symmetric representation are chiral and charged under the brane instanton
gauge symmetry. Hence they cannot be lifted by any of the familiar mechanisms, and
thus spoil the appearance of a non-perturbative superpotential, even if the model is
modified. Such fermion zero modes are considered catastrophic and we will look for
models avoiding them in our scan in Section 6.
2.2.2 Charged fermion zero modes
• Real brane instantons
Instanton zero modes charged under the 4d gauge group arise fromMk open string
sectors (and their image Mk′). In the generic case, there are no scalar zero modes in
these sectors. This is because in mixed Mk open string sectors the 4d spacetime part
leads to DN boundary conditions, which already saturate the vacuum energy in the NS
sector. Only in the special case where the internal structure of the spacetime filling
brane k and the instanton brane are the same, there may be NS ground states of the
internal CFT which do not contribute extra vacuum energy, hence leading to massless
scalars. However, this corresponds to brane instantons which can be interpreted as
instantons of the 4d gauge theory on the 4d space-filling branes. The instantons we
are interested in for the generation of neutrino Majorana mass terms are not of this
kind [3] (see e.g. [32, 33, 28, 25, 34] for discussions on gauge theory instantons from
D-brane instantons).
Hence we focus on charged fermion zero modes, which are generically present in
any mixed Mk sector. Let us define LMk, LMk′ the (positive by definition) number
of left-handed chiral fermion zero modes in the representations ( M , k), ( M , k),
respectively. The net number of chiral fermion zero modes in the ( M , k) is given
by IMk = LMk′ − LMk. This controls the violation of the U(1)a global charge by the
instanton. Namely, such fermion zero modes in the Mk, Mp sectors lead (if suitable
couplings are present) to the insertion of 4d charged fields Φkp and/or Φkp′.
In addition, there are PMk = min(LMk′ , LMk) vector-like pairs of fermion zero
modes. Since they are vector-like, they may be lifted by slight modifications of the
model, like moving in the closed string moduli space, or by introducing additional
ingredients, like fluxes. In addition, they may be lifted by moving in the open string
moduli space of the 4d spacefilling branes, as follows. The zero modes may lead to
insertions of 4d fields Φkk, if the kk sector contains such 4d chiral multiplets (or to
insertions of composite 4d operators in the adjoint of the kth 4d gauge factor), and if
they couple to the zero modes. Although this may not be generically not the case,
many of our models in coming section contain such adjoint fields. Hence, a non-
trivial vev for the latter can lift these extra vector-like zero modes, hence leading to
instanton generating the superpotential of interest. Given these diverse mechanisms to
lift these zero modes, their presence of such zero modes is thus unwanted, but again
not necessarily catastrophic.
One last comment, related to the concrete kind of instanton search we will be
interested in. Namely, we will be searching for instantons leading to a specific operator,
carrying non-trivial charges under a specific set of 4d gauge factors. Postponing the
detailed discussion to Sections 3.1, 4.2 , let us denote a, b, c, d the set of branes
leading to a field theory sector, denoted ‘observable’ (and which reproduces the SM
in our examples). We will require the instanton to have a prescribed number of chiral
fermion zero modes charged under these branes, namely we require the intersection
numbers of the instanton with these branes IMa, . . . , IMd to have specific values (as
mentioned above, in the most restrictive scan we forbid vector-like pairs of zero modes
under these branes). In addition, the model in general contains an additional sector of
branes, denoted ‘hidden’ (since there is zero net number of chiral multiplets charged
under both sectors) and labeled hi, required to fulfill the RR tadpole cancellation
conditions. In general there may be instanton fermion zero modes from e.g. the
Mh1, h2M sectors, which would contribute to insertions of the 4d fields in the h1h2
sector if there are suitable cubic couplings. These extra insertions could be avoided
if such 4d fields in the hidden sector acquire vevs (note that vevs for the (vector-like)
fields charged under the visible and hidden sectors would typically break hypercharge,
and should be avoided), and hence lift the zero modes. Equivalently, from the 4d
perspective, the unwanted extra h1h2 field insertion is replaced by its vev. However,
this renders the discussion very model dependent. Moreover, the possibility of hidden
brane recombination was not included in the search for SM-like models in [6, 7] (namely,
the possibility of allowing for chiral fields charged under the observable and hidden
gauge groups, which may become non-chiral and possibly massive upon hidden brane
recombination). Hence we will consider these chiral fermion zero modes as unwanted
(as usual, non-chiral modes are unwanted but not catastrophic, hence they are allowed
for in a more relaxed search).
• Complex brane instantons
The discussion of ‘complex’ brane instantons is somewhat analogous to the previous
one, with the only complication that the braneM and its imageM ′ lead to independent
modes, leading to a more involved pattern of fermion zero modes. Instanton zero modes
charged under the 4d gauge group arise from the Mk,Mk′ and related sectors. As for
‘real’ brane instantons, there are generically no scalars in these sectors (and certainly
not in our case of interest). Hence we focus on charged fermion zero modes, which are
generically present in any mixed sector.
In contrast with ‘real’ brane instantons, a net combination of fermion zero modes
in the ( M , k) + ( M , k) is not vector-like, but chiral under the instanton gauge
symmetry. Such a pair cannot therefore be lifted even by modifications of the theory.
In general, if the instanton has a mismatch in the total numbers nα, nγ of fermion zero
modes αi in the M and γj in the M , the instanton amplitude automatically vanishes.
Namely, the matrix of operators Oij coupling to the zero modes necessarily has rank
at most min(nα, nγ). That is , if nα > nγ there are linear combinations of the αi which
do not couple, and cannot lead to insertions. Moreover, they are not liftable by the
familiar mechanisms 9, thus in our instanton search in Section 6 such excess zero modes
are forbidden even in relaxed scans.
Let us thus discuss a sector of fermion zero modes with equal number nα = nγ .
Considering a given 4d space-filling brane k, let us denote LMk, LM ′k′, LMk′, LM ′k the
(positive by definition) number of left-handed chiral fermion zero modes in the repre-
sentations ( M , k), ( M , k), ( M , k), and ( M , k) respectively. The net number of
chiral fermion zero modes in the ( M , k) and ( M , k) is given by IMk = LMk′ −LM ′k′
and IMk′ = LMk−LM ′k, respectively. This net number of fermions zero modes controls
the violation of the U(1)a global charge by the instanton. Namely, such fermion zero
modes in the Mk, Mp, Mk′, Mp′ sectors lead (if suitable couplings are present) to the
insertion of 4d charged fields Φkp and/or Φkp′.
9Note that such a mismatch is always correlated with the existence of extra chiral zero modes in
the MM ′ sectors, discussed above. Denoting ~Qa, ~Qorient the vector of RR charges of the 4d space-
filling branes and orientifold planes, they satisfy the RR tadpole conditions
Na ~Qa +
Na ~Qa′ +
~Qorient. = 0. By taking the ‘intersection’ bilinear with the RR charges ~QM of the brane instanton, we
have IMa+IMa′+IM,orient = 0. This implies that the number of fundamentals minus anti-fundamentals
of the instanton gauge group is related to the number of two-index tensors.
The remaining fields in this sector are vector-like pairs, in the ( M , k) + ( M , k)
or the ( M , k) + ( M , k), which in principle lead to the vanishing of the instanton
amplitude, but which can be lifted by additional effects (motion in closed or open string
moduli space, or addition of fluxes), in a way consistent with the gauge symmetries in
4d spacetime and on the instanton.
Just like for ‘real’ brane instantons, we conclude by commenting on our concrete
instanton search in models with a set of visible branes a, b, c, d and a set of hidden
branes hi. The requirement that the instanton leads to an operator with specific
charges under the visible branes fixes the values of the quantities IMa − IMa′, etc. As
we described for real branes, one may still have fermion zero modes charged under the
hidden sector branes, but they lead to additional insertions, hence we rather focus on
instantons with IMhi − IMh′j = 0. The two kinds of conditions, on intersection numbers
with visible and hidden branes, still leave the possibility of combinations of fermion
zero modes of the kind ( M , k) + ( M , k), which do not contribute to IMk, or of the
kind ( M , k) + ( M , k), which does not contribute to IMk′. Such combinations are
automatically vector-like, and thus may be lifted in modifications of the theory. But the
condition also allow combinations like ( M , k)+( M , k), which exploit a cancellation
between IMk and IMk′ (as also does ( M , k) + ( M , k)). Such combinations are
chiral by themselves, and in general imply a mismatch of modes in the M and the
M . The total mismatch can be arranged to vanish using combinations of the kind
( M , k) + ( M , k) and ( M , p) + ( M , p) for different branes. However, the only
way to lift these pairs is by breaking the gauge symmetry on the 4d space-filling branes
k and p. This can be done without damage to the visible sector if these are hidden
branes, but this corresponds to the recombination of hidden branes that, as mentioned
already, we are not going to consider. Hence only vector-like pairs with respect to
each brane are considered to be liftable in simple modifications of the theory. In our
instanton search, these are the only additional fermion zero modes which we allow in
relaxed scans (but they are clearly not allowed for in restricted scans)
3 Instanton induced Majorana neutrino masses
In this Section we discuss the possible physical effects of D-brane instantons in string
models with SM-like spectrum. In particular we describe the conditions to generate
right-handed neutrino Majorana masses. We also comment on other possible B and/or
L violating operators that can be generated by instantons. In this section we will again
Sp(2),U(2)
U  , D 
gluon
d− Leptonic
a− Baryonic
b− Left c− Right
Figure 1: Quarks and leptons at intersecting branes
use the geometrical language of IIA intersecting D-branes but it should be clear that
our discussion equally applies to general CFT orientifolds like the ones presented in
the next section.
3.1 The MSSM on the branes
Let us now specify the discussion in the previous section to the case of the generation
of a right-handed neutrino mass term. In order to do that we need some realistic
orientifold construction with the gauge group and fermion spectrum of the Standard
Model (SM). In the context of Type II orientifolds perhaps the most economical brane
configuration leading to a SM spectrum is the one first considered in [5]. This consists
of four stacks, labelled a,b, c,d. The gauge factor on branes a is U(3), and contains
QCD and baryon number. The d factor is U(1)d, and corresponds to lepton number.
Stack b contains SU(2)Weak either embedded in U(2) or Sp(2). Finally brane c can
either provide a U(1) or an O(2) factor. In the brane intersection language, the chiral
fermions of the SM live at the intersections of these branes, as depicted in Fig. 3.1.
The U(1)Y factor of the standard model is embedded in the Chan-Paton factors of
branes a,c and d as
(QB−L −QR) (3.1)
where Qx denotes the generator of the U(1) of brane stack x (in case the Chan-Paton
factor of brane c is O(2) one should use the properly normalized O(2) generator).
Note that in this convention the Qd generator appears with sign opposite to other
conventions in the literature, e.g. in [3]. In addition to Y these models have two
additional U(1) gauge symmetries:
Qanom = 3Qa +Qd = 9QB +QL
Y ′ =
Qa +Qc −Qd = QB−L +QR (3.2)
The first is anomalous whereas the second, which we will call B − L (with a slight
abuse of language, since it is in fact a linear combination of B − L and hypercharge),
is anomaly free. In models in which the electroweak gauge group is embedded in U(2),
rather than in Sp(2), there is a second anomalous U(1)b. The charges of the SM
particles under these U(1) symmetries are given in table 1.
Intersection D = 4 fields/ zero modes Qa Qc Qd Y QM
(ab),(ab’) QL 3(3, 2) 1 0 0 1/6 0
(ca) UR 3(3̄, 1) -1 1 0 -2/3 0
(c’a) DR 3(3̄, 1) -1 -1 0 1/3 0
(db),(db’) L 3(1, 2) 0 0 1 -1/2 0
(c’d) ER 3(1, 1) 0 -1 -1 1 0
(cd) νR 3(1, 1) 0 1 -1 0 0
(Mc) αi 2(0, 0) 0 -1 0 1/2 1
(dM) γi 2(0, 0) 0 0 1 -1/2 -1
Table 1: Standard model spectrum and U(1) charges of particles and zero modes. QM stands
for the world-volume gauge symmetry in the case of U(1) complex instantons.
The U(1)k gauge symmetries have couplings with the RR 2-forms Br of the model,
as follows
SBF =
Nk(pkr − pk′r)
Br ∧ Fk (3.3)
where pkr, pk′,r are given by the RR charges of the D-branes. These imply that under
a U(1)k gauge transformation Ak → Ak + dΛk the scalar ar dual to Br transforms as
ar → ar +
Nk (pkr − pk′r) Λk (3.4)
This has two effects: 1) The linear combination of axion fields
r(pkr − pk′,r)ar is
eaten up by the U(1)k massless gauge boson, making it massive. 2) For anomalous
U(1)k, the anomalies cancel through a 4d version of the Green-Schwarz mechanism.
This works due to the existence of appropriate ar F ∧F couplings, involving the gauge
fields in the theory.
It is obvious that all anomalous U(1)’s become massive by this mechanism. However
it is important to realize [5] that gauge bosons of anomaly-free symmetries like U(1)B−L
may also become massive by combining with a linear combination of axions. This is
interesting since it provides a mechanism to reduce the gauge symmetry of the model
without needing explicit extra Higgsing. In the models in which U(1)B−L becomes
massive in this way, the gauge group left over is purely that of the SM. Moreover, we
will see that having (B-L) massive by this Stückelberg mechanism is crucial to allow
the generation of instanton-induced Majorana neutrino masses.
Note that the B∧F couplings may also be potentially dangerous, since in principle
they could also exist for hypercharge, removing U(1)Y from the low-energy spectrum.
As we will see in our RCFT examples later on, having massless U(1)Y but massive
U(1)B−L turns out to be a strong constraint in model building.
3.2 Majorana mass term generation
As discussed in the previous section, string instantons can give rise to non-perturbative
superpotentials breaking explicitly the perturbative global U(1) symmetries left-over
from U(1) gauge bosons made massive through the Stückelberg mechanism. The kind
of operator we are interested in has the form
W ≃ e−Sins νRνR (3.5)
where νR is the right-handed neutrino superfield
10. Here Sins transforms under both
U(1)B−L and U(1)R in such a way that the overall operator is gauge invariant. This
operator may be created if the mixed open string sectors lead to fermionic zero modes
αi, γi , i = 1, 2, appropriately charged under the 4d gauge factors. As we discussed in
the previous section, to generate a superpotential one needs instanton with O(1) Chan-
Paton symmetry, in order to lead to two uncharged fermion zero modes to saturate
the d2θ 4d superspace integration. On the other hand, as we argued, instantons with
Sp(2) or U(1) CP symmetries may also induce the required superpotentials if there
is some additional dynamics getting rid of the extra uncharged zero modes which in
principle appear in instantons with these symmetries. We thus consider all O(1), Sp(2)
and U(1) instantons in our discussion.
In order to to get a νR bilinear, the intersection numbers of instanton M and d, c
branes are as follows
Sp(2) case : IMc = 1 ; IMd = −1 (3.6)
(since there is an extra multiplicity from the two branes required to produce Sp(2))
O(1) case : IMc = 2 ; IMd = −2 (3.7)
10 Actually we denote by νR the left-handed ν
L field following the usual (a bit confusing) convention.
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
���������������
Figure 2: Disk amplitude coupling two charged zero modes to νR in the geometrical Type
IIA intersecting brane approach.
U(1) case : IMc = 2 ; IMd = −2 or IMd′ = 2 ; IMc′ = −2 (3.8)
Furthermore there must be cubic couplings involving the right-handed neutrino super-
field νa in the ath family and the fermionic zero modes αi, γj
Lcubic ∝ dija (αi νaγj) , a = 1, 2, 3 (3.9)
In type IIA geometric compactifications, this coupling arises from open string disk
instantons, see Fig. 3.1. In general type IIA models (resp. IIB models), the coefficients
dija depend on the Kähler (resp. complex structure) moduli, and possibly on open string
moduli. In simple CFT models (like e.g. in toroidal cases) these quantities may be in
principle explicitly computed.
These trilinear couplings appear in the instanton action and after integration of the
fermionic zero modes αi, γi one gets a superpotential coupling proportional to
d2α d2γ e−d
a (αiν
aγj) =
d2θ νaνb ( ǫijǫkld
b ) (3.10)
yielding a right-handed neutrino mass term. This term is multiplied by the exponential
of the instanton euclidean action so that the final result for the right-handed neutrino
mass (up to a 1-loop prefactor) has the form
MRab = Ms( ǫijǫkld
b ) exp(−
qM,rar ) (3.11)
For geometric compactifications VΠM is roughly related to the wrapped volume. We
keep the same notation to emphasize that the effect is non-perturbative in gs. In
supersymmetric models the term in the exponential is the linear combination U of
complex structure moduli to which the instanton D-brane couples, as described in the
previous section. As explained, the gauge U(1)c, U(1)d transformation of the bilinear
piece and the e−SD2 factor nicely cancel. Note that from the viewpoint of the 4d SM
effective field theory, the instanton has generated a Majorana neutrino mass violating
B−L. Notice also that since this symmetry is non-anomalous, its violation cannot be
associated to a gauge instanton, hence this is a pure string theory instanton effect.
3.3 Flavor and the special case of Sp(2) instantons
In order to extract more specific results for the flavor structure of the obtained Majo-
rana mass operator, one needs to know more details about the quantities dija coming
from the disk correlators. However in the particular case of Sp(2) instantons, the la-
bels i, j are Sp(2) doublet indices, and the symmetry requires dija = daǫ
ij. The mass
matrix for the three neutrinos is given by MRab =2Msdadb exp (−U), so that the flavour
dependence on a, b = 1, 2, 3 factorizes. More generally, as we will see in our RCFT
search in Section 6, there are typically several different instantons contributing to the
amplitude, so that we actually have a result for the mass
MRab = 2Ms
d(r)a d
−Ur (3.12)
where the sum goes over the different contributing instantons. One thus has a structure
of the form
e−Urdiag (d
1 , d
2 , d
3 ) ·
1 1 1
1 1 1
1 1 1
· diag (d(r)1 , d
2 , d
3 ) . (3.13)
This structure is very interesting. Indeed, each instanton makes one particular (instanton-
dependent) linear combination of the neutrinos massive, leaving two linear combina-
tions massless. Hence, for three or more instantons, one generically has a matrix with
three non-zero eigenvalues. It is easy to imagine a hierarchical structure among the
three eigenvalues if e.g. the exponential suppression factors exp(−ReUr) are different
for each instanton.
3.4 Other B− and L−violating operators
Our main focus in this paper is on the generation of right-handed neutrino Majorana
masses. However instantons may induce other L- and B-violating operators which we
briefly summarize in this subsection.
3.4.1 The Weinberg operator
A right-handed neutrino Majorana mass term is not the only possible operator violating
lepton number. Instanton effects may also give rise to dimension 5 operators not
involving νR. Specifically, the Weinberg operator
(LHLH) . (3.14)
might be generated. Once Higgs fields get a vev v this operator gives rise directly to
Majorana masses for the left-handed neutrinos of order ≃ v2/M . Indeed, it is easy to
check that in this case the required instanton M must verify
Sp(2) case : IMc = −1 ; IMd = 1 (3.15)
O(1) case : IMc = −2 ; IMd = 2 (3.16)
U(1) case : IMc = −2 ; IMd = 2 or IMc′ = 2 ; IMd′ = −2 (3.17)
(here we are assuming SU(2)weak to be embedded in an Sp(2)). Note that these
intersection numbers are different to those giving rise to νR mass terms. In particular
they lead to a transformation under B − L opposite to that of νR mass operators 11.
In the present case there are altogether four fermionic zero modes αi,γi corresponding
to the intersections of the instanton M with the branes c, d. These zero modes can
have couplings involving the left-handed leptons L and the u-type Higgs multiplet H
Ldisk ∝ cija (αi(LaH)γj) . (3.18)
Upon integration over the fermionic zero modes one recovers the Weinberg operator.
In the present case the scale M of the Weinberg operator will be the string scale Ms
and the coupling λ ≃ exp(−Sins). Again, in the particular case of Sp(2) instantons the
situation simplifies (cija = c
aǫij) and one gets left-handed neutrino Majorana masses
MLab =
< H >2
2c(r)a c
−Sr (3.19)
where r runs over the different contributing instantons and Sr is their corresponding
action. The flavour structure of this left-handed neutrino mass matrix is the same as
in eq.(3.13) and again may potentially lead to a hierarchical structure of left-handed
neutrino masses, as is experimentally observed.
11Instantons with these intersection numbers will be denoted with a plus sign in the instanton search
later on
In a given model both this kind of instanton and the one giving rise to right-
handed neutrino masses (which is different) may be present. This contribution to the
left-handed neutrino Majorana mass is in principle sub-leading compared to the see-saw
contribution
MLab(see-saw) =
< H >2
d(r)a d
−Sr)−1 hD (3.20)
where is the ordinary Yukawa coupling constant habD (ν
b). In principle the former
is doubly suppressed both by 1/Ms and the exponential factor. On the other hand
if the exponential suppression is not too large this mechanism involving directly the
Weinberg operator may be the most relevant source of neutrino masses. This is because
the see-saw contribution coming from νR exchange is proportional to the square of the
ordinary Yukawa couplings habD which could be small. One could even think of having
just the Weinberg operator as the unique source of the observed left-handed neutrino
masses. Note however that in string vacua like this, in which the νR’s are present and
massless at the perturbative level, having just the Weinberg operator would not be
phenomenologically correct, and instantons of the first class are still needed so that the
νR’s get a sufficiently large mass.
3.4.2 R-parity violating operators
In the case ofN = 1 SUSYmodels like the MSSM there might be operators of dimension
3 and 4 violating lepton and/or baryon number. These are the superpotential couplings
WRp = µ
aH + λabcQ
aDbLc + λ′abcU
aDbDc + λ′′abcL
aLbEc (3.21)
in standard notation. Unlike the neutrino operators mentioned above, these operators
violate B −L in one unit (rather than 2). It is well known that the standard R-parity
of the MSSM may be identified with a Z2 subgroup of U(1)B−L, so these terms are odd
under R-parity. The simultaneous presence of all these couplings is phenomenologically
unacceptable. Indeed, the third coupling violates baryon number, and the other three
violate lepton number. Together they lead to proton decay at an unacceptably large
rate. On the other hand couplings violating either B or L are phenomenologically
allowed.
It is an interesting question whether any of these operators may be induced by
string instanton effects. A first point to note is that instantons with Sp(2) Chan-Paton
symmetry can never generate operators of this type. The reason is that all charged
zero modes will necessarily come in Sp(2) doublets and hence the charged operators
induced will always involve an even number of charged D = 4 fields and R-parity is
automatically preserved. On the other hand O(1) and U(1) instantons may generate
R-parity violating operators. In particular, the LH bilinear is essentially the square
root of the Weinberg operator, and may be induced if a U(1) or O(1) instanton M
exists with
IMc = −1 ; IMd = 1 or IMc′ = 1 ; IMd′ = −1 . (3.22)
(in the O(1) case the second option is not independent from the first). Again, if
the appropriate disk couplings are non-vanishing a term with µaL ∼ Ms exp(−Sins) is
generated. The rest of the operators in WRp may also be generated. Possible instanton
zero modes which may induce them are shown in table 2. For example, the QDL
operator may be induced if a U(1) instanton M with intersection numbers
IMb = −1 ; IMc′ = 1 ; IMd = 1 (3.23)
is present and in addition couplings
Ldisk ∝ cab (α(UaQbj)γj) + c′a(βLajγj) (3.24)
exist. Here α, β, γ are zero modes corresponding to (Mc′), (Md) and (bM) intersections
and a, b(j) are flavor(SU(2)L) indices. Analogous trilinear or quartic disk amplitudes
involving two charged zero modes should exist to generate the rest of the R-parity
violating amplitudes in table 2.
D = 4 Operator IMa IMa′ IMb IMc IMc′ IMd IMd′
νRνR 0 0 0 2 0 -2 0
LH̄LH̄ 0 0 0 -2 0 2 0
LH̄ 0 0 0 -1 0 1 0
QDL 0 0 -1 0 1 1 0
UDD -1 0 0 1 2 0 0
LLE 0 0 -1 0 1 1 0
QQQL 1 0 -2 0 0 1 0
UUDE -1 0 0 2 2 -1 0
Table 2: Zero modes required to generate Lepton/Baryon-number violating superpoten-
tial operators. Sp(2) instantons cannot give rise to R-parity violating operators whereas
O(1),U(1) instantons may in principle contribute to all of them. In the case of U(1) instantons
there are additional zero mode possibilities which are obtained by exchanging IMx ↔ −IMx′ .
3.4.3 Dimension 5 proton decay operators
There are also superpotential dimension-5 operators violating B and L which may be
constructed from the MSSM matter superfields. Indeed the dimension 5 operators
)QQQL ; (
)UUDE (3.25)
are in fact the leading source of proton decay in SUSY GUT models with R-parity.
Unlike the other operators considered here these ones preserve B−L (hence R-parity)
but not B +L. These operators do not contribute directly to a proton decay but need
to be ’dressed’ by a one loop exchange of some fermionic SUSY particle. This makes
that, even although they are suppressed only by one power of the relevant fundamental
scale, the loop factor and the corresponding couplings make the overall rate in SUSY-
GUTS (barely) consistent with present experimental bounds for M of order the GUT
scale or larger.
These dimension 5 operators may also be induced in D-brane models of the class
here considered by the presence of instantons with appropriate intersection numbers.
For instance, the first operator may be induced through O(1) or U(1) instantons M
IMb = IMb′ = −2 ; IMa = 1 ; IMd = 1 (3.26)
Again Sp(2) instantons cannot induced this operator, since the the Ma intersection
would yield 6 (rather than 3) colored fermionic zero modes. The proton decay rate
obtained from these operators depend on the ratio exp(−Sins) × 1/Ms. For Ms of
order 1016 GeV, the rate is consistent with present bounds if exp(−Sins) provides a
suppression of a few orders of magnitude. On the other hand, models with a low string
scale may be in danger unless the exponential suppression is sufficiently large (or such
particular instantons are absent).
As a general conclusion, these phenomenological aspects of instanton induced oper-
ators very much depend on the action of the instanton, e.g. the volume of the wrapped
D2-instanton in the intersecting D-brane constructions. In any event it is clear that
the instantons here considered may indeed induce proton decay at a model-dependent
rate. However in certain models R-parity will be preserved and prevent too rapid pro-
ton decay. Indeed, this is what we find in our instanton search in Gepner orientifolds.
As we said Sp(2) instantons automatically preserve R-parity. More generally, models
that violate R-parity are rare, and the corresponding instantons actually generate very
high dimensional operators, so R-parity breaking effects seems quite suppressed. In
fact in our search within MSSM-like models in Gepner model orientifolds we do not
find instantons with just the correct charged zero modes to generate the low dimen-
sional couplings discussed above. So, at least within our class of RCFT constructions,
R-parity preservation is quite a common feature.
4 CFT orientifolds
In this section we describe the 4d string models we consider, namely orientifolds of
type IIB Gepner model compactifications. This is a very large class, on which one can
carry out large scans for certain desired properties. And moreover at present the only
known class of (SUSY) models with massive B − L.
4.1 Construction of the models
In general, RCFT orientifolds are orientifold projections of closed string theories con-
structed using rational conformal field theory. Although this includes in principle
rational tori and orbifolds, the real interest lies in cases where the two-dimensional
CFT is interacting, because such theories are hard to access by other methods. A
disadvantage of the use of RCFT is that this method is algebraic, and not geometric in
nature, so that one cannot easily explore small deformations of a certain string theory.
It is best thought of as a rational scan of moduli spaces.
The most easily accessible examples are the orientifolds of tensor products of mini-
mal N = 2 conformal field theories (“Gepner models”) forming a type IIB closed string
theory. During the last decade, examples in this class have been studied by many au-
thors (see [35][36][37][38][39][40][41][42]), and searched systematically in [6] and [7].
Although the Gepner models form only a small subset of RCFT’s, they already offer
a large number of possibilities. The total number of tensor products with the required
central charge c = 9 is 168. On top of this, one can choose a large number of distinct
modular invariant partition functions on the torus. The orientifold formalism is not
available for all of them, but it has been completely worked out [43] for all simple cur-
rent invariants (based on the charge conjugation invariant). This yields a total of 5403
distinct MIPFs. On top of this, we may choose various orientifold projections. Here
the only known possibilities are a class of simple-current based choices [44][45][46][47].
This then yields a total of 49304 orientifolds.
For each orientifold choice, the full open string partition function is
a,b,i
NaNbA
abχi(
aχ̂i(
 (4.1)
Here Aiab are the annulus coefficients, M
a the Moebius coefficients, Na the Chan-Paton
multiplicities and χ(τ) are the closed string characters, and χ̂i(τ) = T
−1/2χi(τ). The
set of integers i is simply the set of primary fields of the closed string CFT, and depends
only on the tensor product. The integers a, b are the boundary labels; this set depends
on the MIPF. Our notation and labelling conventions for these CFT quantities are
explained in Appendix A. The integers Aiab and M
a depend in addition also on the
orientifold choice; in the case of Aiab the latter dependence is very simple: all distinct
annuli can be written as A
cb, where Ω is the orientifold choice (which we
usually do not specify explicitly) and CΩcb is the boundary conjugation matrix, which
acts as an involution on the set of boundaries.
Suppressing some details (which can be found in [43]) we may write these integers
m,J,K
SimRa,(m,J)g
JK Rb,(m,K)
(4.2)
MΩ,ia =
m,J,K
P imRa,(m,J)g
(m,K)
(4.3)
Here m is the label of an Ishibashi-state (the set of states that propagates in the
transverse (or closed string) channel of the the annulus or Moebius diagrams). It is
a subset of the set of closed string labels i, but in general there are degeneracies,
so that more than one distinct Ishibashi state belongs to a given closed string label.
These degeneracies are distinguished by the labels J,K (see Appendix A). The complex
numbers R and U are respectively the boundary and crosscap coefficients. Note that
the latter depend on the orientifold choice, but the former do not. The only dependence
of the annulus coefficients on the orientifold choice is through the Ishibashi metric gΩJK ,
which is a matrix on each Ishibashi degeneracy space, and which can be a sign if there
are no degeneracies. Finally, the matrix P is given by P =
TST 2S
T , where S and
T are the generators of the modular group of the torus. Similar expressions exist for
the Klein bottle multiplicities defining the unoriented closed sector, but they will not
be needed in this paper.
The boundary labels a, b, . . . refer to all boundaries that respect the bulk symmetries
of the CFT. This includes the individual N = 2 chiral algebras of the factors in the ten-
sor product, the alignment currents12 that ensure the proper definition of world-sheet
supersymmetry and the space-time supersymmetry generator that imposes a general-
ized GSO-projection on the spectrum. The latter implies that all characters χi respect
(at least) N = 1 space-time supersymmetry. By construction, the boundary states
are then supersymmetric as well. Both conditions (boundary and bulk space-time su-
persymmetry) can in principle be relaxed within the formalism, but this leads to a
much larger set of bulk and boundary states. The precise labelling of the boundaries
is explained in Appendix A and involves a subset of the closed string labels i and a
degeneracy label, distinct from the one used for the Ishibashi states. The set of bound-
ary labels is complete in the sense of [45]. This means that no additional boundary
states exist that respect all the aforementioned symmetries. It also means that the
matrices R are square matrices (although their rows and columns are defined in terms
of different index sets). It is in principle possible to write down additional boundary
states that break some of the world-sheet symmetries. This is an important possibility
to keep in mind, but we will not consider it here.
The massless spectrum is obtained by restricting the characters χi to massless
states. Since the characters are supersymmetric those massless states are either vector
multiplets or chiral multiplets. The latter can be restricted to one chirality (e.g left-
handed); the other choice merely produces the CPT conjugates. Boundaries are called
real if a = a′, where the conjugate boundary a′ is defined by CΩa,a′ = 1, and complex
otherwise. The Chan-Paton multiplicities Na give rise to gauge groups U(Na) for
complex boundaries and SO(Na) or Sp(Na) for real ones. In the latter case Na must
be even. To count bi-fundamentals we define
Lab ≡
Aiabχi(
)massless,L . (4.4)
Note that because of the factor 1
in (4.1) and the fact that Lab is symmetric, the value
of Lab is indeed precisely the number of bi-fundamentals in the representation (Na, Nb).
12These are spin-3 currents consisting of products of the world-sheet supercurrents of the factors in
the tensor product, including the NSR space-time factor.
It is convenient to introduce the intersection matrix13
Iab ≡ Lab′ − La′b , (4.5)
which is manifestly antisymmetric in a and b. Note that for a pair of complex bound-
aries a, b with conjugates a′, b′ one can define four quantities that are relevant for the
massless spectrum, two of which are chiral, namely Iab and Iab′ .
It is often convenient to associate a geometric picture to these integers. Thus we
will often refer to the boundary labels and their multiplicities as “stacks of branes”, and
view the integers Iab as brane intersection numbers. This is only done for convenience
and does not imply a concrete brane realization; indeed, it does not make sense to
say that a given boundary label corresponds to a Dp-brane for some give p. Such an
interpretation might be valid in a large radius limit, assuming such a limit exists.
In general, for a choice of Chan-Paton multiplicities Na there will be tadpoles in
the one-point closed string amplitudes on the disk and the crosscap. These have to be
cancelled in order to make the theory consistent (since we work with supersymmetric
strings we do not have the option of cancelling RR and NS-NS tadpoles separately).
This leads to a condition on the Chan-Paton multiplicities:
NaRa,(m,J) = 4ηmUm,J (4.6)
where η0 = 1 and all other η’s are −1; there is such a condition for any Ishibashi label
(m, J) that leads to a massless scalar in the transverse channel. The one for m = 0
(which is non-degenerate) is the dilaton tadpole condition. It has the special feature
that all coefficients Ra0 are real and positive. The crosscap coefficient U0 is also real
and can be chosen positive (in the CFT both signs are acceptable). If U0 6= 0 (4.6)
limits the Chan-Paton multiplicities; if U0 = 0 the only solution is Na = 0 for all a,
which rules out any realization of the Standard Model. This reduces the number of
usable orientifolds to 33012.
Tadpole cancellation condition implies cancellation of RR-charges coupling to long-
range fields, and absence of local anomalies. There is a second condition that has
13Note that Lab is a symmetric matrix giving the number of chiral multiplets in the ( a, b) bi-
fundamental. This is a natural quantity in unoriented CFT’s, where a symmetric definition for the
annulus amplitude exists. In oriented CFT the annulus is, in general, not symmetric, but on the
other hand it is possible to choose the branes in such a way that only ( , ) bi-fundamentals appear.
This has become the customary way of counting states in the intersecting brane literature, even for
orientifold models. The quantity Iab is defined in such a way that it is anti-symmetric in a and b.
This is why boundary conjugations appear in the right hand side. This has the additional advantage
of making I a more familiar quantity for readers used to the standard intersection brane conventions.
to be taken into account, which has to do with Z2 charges that do not couple to
long-range fields, usually referred to as “K-theory charges” in geometric constructions.
Uncancelled K-theory charges may lead to global anomalies in symplectic factors of the
gauge group. But even if this symptom is absent, the disease may still exist. A much
more general way to probe for uncancelled K-theory charges is to require the absence
of global anomalies not only in the Chan-Paton gauge group but also on all symplectic
brane-anti-brane pairs that can be added to it as “probe-branes” [48]. Presently this
is the most general constraint that be imposed in these models, but it is not known
if additional ones are required. This probe brane constraint leads to a large number
of mod-2 constraint and is potentially very restrictive, but almost harmless in practice
[49]. It is satisfied by all models we consider in the present paper.
4.2 Search for SM-like models
The complete set of solutions to these conditions is finite but huge, but the vast major-
ity is of no phenomenological interest. In the last few years systematic searches have
been carried out for models that contain the Standard Model. The models that were
considered have the property that the set of Chan-Paton labels can be split into two
subsets, the observable and the hidden sector. The former has been limited, for prac-
tical reasons, to at most four complex brane stacks, required to contain the Standard
Model gauge group and the right intersections to yield three families of quarks and
leptons, plus (in general) some non-chiral (vector-like) additional matter. The hidden
sector is only constrained by the requirement that there be no net number of chiral
multiplets charged under both the observable and hidden sector, and by practical com-
putational limitations. The main purpose of the hidden sector in these models is to
provide variables that can be used to satisfy the tadpole and global anomaly condi-
tions, since the multiplicities in the observable sector are already fixed. In some cases
the observable sector already satisfies the constraints by itself, and there is no hidden
sector.
The observable sector can be realized in many different ways if one only imposes
the constraint that the standard model should be contained in it. These possibilities
were recently explored in [7]. We will focus on the realization described in Section 3.1,
first considered in [5]. There are four stacks, namely a (containing QCD and baryon
number as U(3)), b (containing electroweak SU(2) embedded as U(2) or Sp(2)), c
(providing a U(1) or an O(2) factor14, and d (providing another U(1) factor).
14In [6] also Sp(2) was considered, but this requires an additional Higgs mechanism.
The standard model hypercharge generator is , defined in (3.1):
Qd (4.7)
where Qx denotes the generator of the U(1) of brane stack x; in case the Chan-Paton
factor of brane c is O(2) one should use the properly normalized O(2) generator. In
addition to Y these models have two or three additional U(1) gauge symmetries (the
latter case if electroweak SU(2) arises from U(2)). These (except the combination
B − L) are anomalous, with anomaly cancelled by the Green-Schwarz mechanism,
implying the existence of a B ∧ F coupling making them massive. In fact, as already
mentioned, such Stückelberg couplings may be present for non-anomalous U(1)’s as
well. We are interested in models where the hypercharge gauge boson does not have
such couplings (otherwise the model would be phenomenologically unacceptable), but
where the B−L gauge boson is massive by such couplings (both in order that the gauge
group reduces to the SM one, and that neutrino Majorana masses may be induced by
string instantons, as discussed in previous sections).
The combined requirements of having a massive B − L and a massless Y turn out
to be difficult to satisfy. In fact, if the group on brane c is O(2) they are impossible
to satisfy simultaneously, because the O(2) component of the vector boson does not
couple to any axions, and hence the B−L and Y bosons have the same mass. But even
in models with a U(1) group on brane c it happens rather rarely that both constraints
are satisfied simultaneously, at least in the searches that have been done so far.
We will make use here of the data presented in [6, 7], which are available in slightly
improved form on the website www.nikhef.nl/∼t58/filtersols.php. This database con-
sist of 211634 distinct spectra. Here “distinct” means that they are physically different
for a given MIPF15 if the hidden sector is ignored. Hence the differences can be the
number of vector-like states of various kinds or the dilaton couplings of branes a, b, c,
d. Geometrically, these spectra may originate from the same moduli space, but then in
any case from different points on this moduli space. The improvements in comparison
with the data presented in [6] consist of taking into account the full global anomaly
conditions from probe branes. In some cases this required nothing more than checking
these conditions for an existing solution of the tadpole conditions, but in other cases a
new solution had to be found. As a result, a few models disappeared from the original
database, but due to improved algorithms a few new ones could be added. The net re-
15Rare cases of identical spectra and couplings originating from different MIPFs are treated as
distinct.
http://www.nikhef.nl/~t58/filtersols.php
sult is some small but inconsequential changes in the total number of models of various
kinds. The numbers we will mention below are based on the improved database.
The total number of models in that database with a Chan-Paton group U(3) ×
Sp(2)× U(1)× U(1) is 10587. Of these, 391 (about 4%) have a massive B − L vector
boson. For U(3)×U(2)×U(1)×U(1) these numbers are, respectively, 51 and 0. Hence
no examples of the latter type were found, although they were found with 1,2 and 4
families (in a limited search), in a few percent of the total number of models. It seems
therefore reasonable to expect that U(3) × U(2) × U(1) × U(1) with massive B − L
do exist, and that their absence is just a matter of statistics. Just for comparison, the
total number of U(3)× Sp(2)× O(2)× U(1) models is 56627.
5 Fermion zero modes for instantons on RCFT’s
In this section we discuss D-brane instantons for general compactifications, including
abstract CFT ones. We also provide the spectrum of zero modes on an instanton brane,
using the information about their internal structure i.e. in the compactified dimension
in geometric models, or of the internal CFT in more abstract setups like in previous
section. We will be interested in the latter case.
A first question that should be addressed is what this internal structure is. For
instance, in type IIA geometric compactifications, it corresponds to a supersymmetric
(i.e. special lagrangian) 3-cycle. Notice that these are the same kind of 3-cycles already
used to wrap the D6-branes that give rise to the 4d gauge symmetry of such models.
For general CFT’s, D-branes are described as boundary states. To describe instantons,
one can simply use the same boundary state of the internal CFT to describe the 4d
space-filling branes present in the model and the instanton branes. The only difference
is that boundaries satisfy Neumann conditions in the 4d space-filling case, and Dirichlet
in the instanton case. This exploits the fact that whenever a boundary state of the
internal CFT, and with Neumann boundary conditions in the 4d space is an acceptable
state of the full CFT, the same boundary state of the internal CFT, combined with
Dirichlet boundary conditions in the 4d space also gives an acceptable state of the full
CFT. For geometric compactifications this is related to Bott periodicity of the K-theory
classes associated to the D-brane charges, but it is possible to show it in general.
Since instanton D-branes can thus be naturally associated to the boundary states
of 4d space-filling branes, it is convenient to express the spectrum of zero modes of the
former in terms of the massless states of the latter. This is particularly useful, since
the computation of the spectra on 4d space-filling branes for Gepner model orientifolds
has already been described (although the arguments below are valid also for geometric
compactifications). Hence, let us denote by M a 4d space-filling brane associated with
the same boundary state of the internal CFT as the instanton brane M of interest.
Note that the 4d space-filling brane M is an auxiliary tool, and need not be (and, for
our instantons of interest, will not be) one of the 4d space-filling branes present in the
model.
‘Real’ brane instantons
Let us first consider the case of ‘real’ brane instantons. Consider a set ofm 4d space-
filling branes M, and focus first on the massless spectrum in the MM sector. Before
the orientifold projection, it leads to a universal 4d N = 1 U(m) vector multiplet, and
a number LMM of adjoint chiral multiplets. The orientifold operation maps this sector
to itself, acting on the Chan-Paton with a matrix γΩ,M. This matrix satisfies
γTΩ,Mγ
Ω,M = ±1m (5.1)
The two possibilities can be chosen to correspond to γΩ,M = 1m or γΩ,M = ǫm, with
−1r 0
, and m = 2r hence necessarily even in the latter case. They corre-
spond to the SO and Sp projections, respectively.
The orientifold projection on the N = 1 vector multiplet Chan-Paton matrices is
given by
λ = −γΩ,M λT γ−1Ω,M (5.2)
and leads to SO(m) or Sp(m) vector multiplets for the SO or Sp projection (hence
the name). Concerning the N = 1 chiral multiplets, they fall in two classes of p−, p+
(with p− + p+ = LMM) which suffer the projections
λ = ±γΩ,M λT γ−1Ω,M (5.3)
For the SO projection, this leads to p+, p− chiral multiplets in the , representation.
For the Sp projection, there are p+, p− chiral multiplets in the , representation.
The sectors Ma (where a is a 4d space-filling branes present in the model) are
mapped to sectors Ma′, so it is enough to focus on the former. After the orientifold
projection one gets LMa, LMa′ chiral multiplets in the ( M, a), ( M, a).
Let us now obtain the zero modes for a set of m instanton branes M in terms
of the above spectrum. The MM sector is closely related to the MM sector, by
changing the NN boundary conditions in 4d spacetime to DD boundary conditions
(which can be done in a covariant formalism, but not in the light-cone gauge). Before
the orientifold projection, one obtains the same set of states (since moddings for NN
and DD boundary conditions are identical, both in the NS and R sector), but with
different world-volume interpretation. Also, the change in boundary conditions implies
that some polarization states which are unphysical for the 4d spacefilling brane are
physical in the instanton brane. Hence, the U(m) gauge bosons on the 4d space-filling
brane M correspond to four adjoint real scalars in the instanton brane M . Similarly,
the 4d spinors in M, correspond to four fermion zero modes on M , transforming as
two spinors of opposite chiralities θα, θ̃α̇ of the SO(4) rotation group in transverse
space. The orientifold projection maps the MM sector to itself, acting on Chan-Paton
indices with a matrix γΩ,M . In close analogy with the argument in [50] for the familiar
D5-D9-brane system in type I (see [51, 52] for related derivations), one can show that
the condition (5.1) flips sign upon changing four NN boundary conditions to DD, hence
γTΩ,Mγ
Ω,M = ∓1m (5.4)
Namely, the instanton brane has Sp(m) gauge group when the 4d space-filling brane
(with same internal boundary state) has gauge group O(m), and vice-versa. We still
refer to these projections as SO and Sp, hoping no confusion arises. Note that, as
mentioned in Section 2.2, although there are no gauge bosons in 0+ 0 dimensions, the
gauge group is present on the instantons in that it acts on open string endpoints.
Let us consider the effect of the orientifold projection on the MM states, as com-
pared with the effect onMM states. Again, following arguments familiar in the D5-D9
brane system in type I, one can show that the signs in conditions like (5.2), (5.3) re-
main unchanged upon changing four NN dimensions to DD, except for bosonic modes
polarized along the directions longitudinal to these four dimensions (and for fermions
related to them by the unbroken susy of the total system). To be concrete, consider-
ing the four MM adjoint bosons, and two MM adjoint fermions θα associated to the
universal MM vector multiplets, they suffer the projection
λ = +γΩ,M λ
T γ−1M (5.5)
Hence they transform in the of Sp(m) for the SO projection, and in the of
SO(m) for the Sp projection. On the other hand, for the two fermion zero modes θ̃α̇,
the projection is
λ = −γΩ,M λT γ−1M (5.6)
and leads to two fermion zero modes in the of Sp(m) for the SO projection, and in
the of SO(m) for the Sp projection.
This implies that in order to obtain two fermion zero modes from this univer-
sal multiplet, in order to generate a superpotential, one should consider instantons
with orthogonal gauge group and multiplicity one (O(1) instantons). For instantons
with symplectic gauge group and multiplicity two (Sp(2) instantons), there are two
additional fermion zero modes in the triplet representation. As mentioned, we will
continue to consider such instantons in our relaxed scan. Multiple instantons, i.e.
boundary states with higher multiplicity, lead to a larger amount of additional fermion
zero modes (due to the larger gauge representations for the fermions), and do not
contribute to superpotentials; we will not consider such cases even in relaxed scans,
since they also very often lead to too many charged fermion zero modes and cannot
contribute to the operators of interest (except possibly for O(2) and U(2) instantons
with low intersections, which are kept in our scan as a curiosity).
Similarly, for the p± sets of MM scalars and fermions associated to the MM 4d
chiral multiplets, the projection is
λ = ±γΩ,M λT γ−1Ω,M (5.7)
with the same sign choice as in (5.3). The different structure of γΩ implies that, for
the SO projection we get p+, p− sets of scalars and fermions in the , , while for
the Sp projection there are p+, p− sets of scalars and fermions in the , .
This concludes the discussion of the MM sector. Let us not consider the Ma sec-
tors, from the information from the Ma sectors. Notice that this implies changing
four NN boundary conditions to DN, which have different moddings. Hence the states
are different in both situations, but the information on the multiplicities is preserved.
Specifically, in the NS sector the DN boundary condition introduce an additional vac-
uum energy which generically makes all states massive. Hence there are no massless
scalar zero modes in generic Ma sectors. In the R sector, the change in the mod-
dings reduces the dimension of the massless ground state, leading to a single (chiral)
fermionic degree of freedom. Since the orientifold action maps the Ma sector to Ma′
sectors, there are no subtleties in the orientifold projection. The end result is LMa,
LMa′ fermion zero modes in the ( M, a), ( M, a). The net number of chiral fermion
zero modes in the ( M, a) is given by IMa = LMa′ − LMa, i.e. the net number of
chiral multiplets in the related Ma sector.
The results for orientifold projections for real branes are shown in table 3.
Proj. Multiplet in M M (before orient.) M (after orient.) M (after orient.)
SO N = 1 vect. mult. U(m) O(m) Sp(m)
2 f + 2 f + 4 b
N = 1 ch. mult. (p+ + p−) Ad p+ + p− 2p+ ( f + b ) +
2p− ( f + b )
Sp N = 1 vect. mult. U(m) Sp(m) O(m)
2 f + 2 f + 4 b
N = 1 ch. mult. (p+ + p−) Ad p+ + p− 2p+ ( f + b ) +
2p− ( f + b )
Any N = 1 ch. mult. LMa′( M, a)+ LMa′( M, a)+ LMa′( M , a) f
LMa( M, a) LMa( M, a) LMa( M , a) f
net IMa( M, a) net IMa( M , a) f
Table 3: Orientifold projection for real branes: Massless modes of the 4d space-filling branes
M (before and after the orientifold projection) and zero modes on the instanton branes M
(denoted with sub-indices b, f for bosonic and fermionic modes)
Complex brane instantons
We now consider the case of complex brane instantons. The arguments are very
similar, hence the discussion is more sketchy. Consider m 4d spacefilling branes M,
associated to the internal boundary state of the instanton brane M of interest. The
MM leads to a 4d N = 1 U(m) vector multiplet and a number LMM′ of adjoint chiral
multiplets. The orientifold action maps it to the M′M′ sector, hence we may keep just
the former and impose no projection. The MM′ sector is mapped to itself under the
orientifold projection. Denoting by γΩ,M the action on Chan-Paton indices, the MM′
modes split into sets L±MM , L
M ′M ′, which suffer a projection
λ = ±γΩ,M λT γ−1Ω,M (5.8)
leading, for γΩ,M = 1m, to L
chiral multiplets in the , , and L+
chiral multiplets in the , . The net number of chiral multiplets in the ,
is I+
M,M − L+M′M′, I−MM′ = L−M,M − L−M′M′ . And oppositely for γΩ,M = ǫm.
Finally, the Ma, Ma′ and related sectors lead, after the orientifold projection, to
LMa′ , LMa, LM′a′ , LM′a chiral multiplets in the ( M, a), ( M, a), ( M, a), ( M, a).
In order to simplify notation, we replaceM →M in these expressions in our discussions
of instanton zero modes.
Let us now consider m brane instantons M and compute their zero mode spectrum
in terms of the above. In the MM (and its image M ′M ′) sector there are four scalar
modes and four fermions in the adjoint of the U(m) gauge symmetry group; these are
related to the 4d vector multiplet in the MM sector. In addition, there are LMM ′
sets of scalars and fermions in the adjoint, related to the LMM′ non-universal chiral
multiplets in the MM sector. The MM ′ sector is mapped to itself, and one has to
impose the orientifold projection (recalling that the matrix γΩ,M differs from γΩ,M).
For γΩ,M = 1, hence γΩ,M = ǫ, we obtain L
MM , L
MM chiral multiplets in the , ,
and L+M ′M ′ , L
M ′M ′ chiral multiplets in the , . The net number of chiral multiplets
in the , is I+MM ′ = L
MM − L+M ′M ′, I−MM ′ = L−MM − L−M ′M ′ . And oppositely for
γΩ,M = ǫ hence γΩ,M = 1.
In theMa,Ma′ and related sectors, there are generically no bosonic zero modes, and
there are LMa, LM′a′ , LMa′ , LM′a chiral fermion zero modes in the ( M , a), ( M , a),
( M , a), and ( M , a) respectively. The net number of chiral fermion zero modes in
the ( M , a) and ( M , a) is given by IMa = LMa′ −LM′a′ and IMa′ = LMa−LM′a. In
order to simplify notation, we replace M →M in these expressions in our discussions
of instanton zero modes.
The results for orientifold projections for real branes are shown in table 4.
6 Search for M instantons
In this section we perform a search of models which admit an instanton inducing a
right-handed neutrino Majorana mass operator. Namely, for each model with the
chiral content of the SM in the classification described in Section 4.2, we first scan over
boundary states, searching for all instantons with the required uncharged and charged
fermion zero mode structure to yield neutrino masses. We then relax our criteria a bit
and allow for instantons with correct charged zero mode structure but having extra
non-chiral zero modes (both charged and uncharged). The idea is that these non-chiral
zero modes could be lifted by diverse effects, as discussed.
It is important to recall that the cubic couplings between instanton zero modes and
4d chiral multiplets are difficult to compute in Gepner model orientifolds. Hence, we
will simply assume that such couplings are non-zero if there is no symmetry forbidding
them.
Proj. Multiplet in M M (before orient.) M (after orient.) M (after orient.)
Any N = 1 vect. mult. U(m)× U(m)′ U(m) U(m)
4Ad f + 4Ad b
N = 1 ch. mult. padj Ad + padjAd
′ padjAd 2padj ( Ad f + Ad b )
SO N = 1 ch.mult. LMM( M, M′) L
2L+MM b,f + 2L
MM b,f
LM′M′( M, M′) L
M′M′ M
M′M′ M
2L+M ′M ′ b,f + 2L
M ′M ′ b,f
Sp N = 1 ch.mult. LMM( M, M′) L
2L+MM b,f + 2L
MM b,f
LM′M′( M, M′) L
M′M′ M
M′M′ M
L+M ′M ′ b,f + L
M ′M ′ b,f
Any N = 1 ch. mult. LMa′( M, a)+ LMa′( M, a) LMa′( M , a) f
. . . LMa( M, a) LMa( M , a) f
. . . LM′a′( M, a) LM ′a′( M , a) f
. . . LM′a( M, a) LM ′a( M , a) f
net IMa( M, a) net IMa( M , a) f
net IMa′( M, a) net IMa′( M , a) f
Table 4: Orientifold projection for complex branes: Massless modes of the 4d space-filling
branesM (before and after the orientifold projection) and zero modes on the instanton branes
M (denoted with sub-indices b, f for bosonic and fermionic modes)
6.1 The instanton scan
Our detailed strategy will become clear along the description of the results. Given a
set of a,b,c,d standard model branes, we must look for additional boundary states M
that satisfy the requirements of a (B−L)-violating instanton. From the internal CFT
point of view this is just another boundary state, differing from 4d spacefilling branes
only in the fully localized 4d spacetime structure. The minimal requirement for such
a boundary state is B − L violation, which means explicitly
IMa − IMa′ − IMd + IMd′ 6= 0 (6.1)
It is easy to see that the existence of such an instanton implies (and hence requires) the
existence of a Stückelberg coupling making B−L massive. To see this, consider adding
to the Standard Model configuration a 4d spacefilling brane M (in fact used in Section
5) associated to the boundary stateM (RR tadpoles can be avoided by simultaneously
including M antibranes, which will not change the argument). The new sector in
the chiral spectrum charged under the branes M can be obtained by reversing the
argument in Section 5, and is controlled by the intersection numbers of M . From
the above condition it follows that the complete system has mixed U(1)B−L × (GM)2
anomalies, where GM is the Chan-Paton-factor of brane M. These anomalies are
cancelled by a Green-Schwarz mechanism involving a (B −L)-axion bilinear coupling,
which ends up giving a mass to B−L via the Stückelberg mechanism. This coupling is
in fact not sensitive to the presence of the brane M, hence it must have been present
already in the initial model (without M).
Hence the existence of a boundary labelM that satisfies (6.1) implies that B−L is
massive. Unfortunately the converse is not true: even if B−L has a Stückelberg mass,
this still does not imply the existence of suitable instantons satisfying (6.1)16 Indeed,
in several models we found not a single boundary state satisfying (6.1).
Note that, since hypercharge must be massless, one can use the reverse argument
and obtain that
IMa − IMa′ − IMc + IMc′ − IMd + IMd′ = 0 (6.2)
in all models. We verified this for all models we considered as a check on the compu-
tations.
As already discussed in Section 4.2, in the search for SM constructions in Gepner
orientifold, there are 391 models with massless hypercharge and massive B−L. In these
models we found a total of 29680 instantons with B−L violation, i.e. with intersection
numbers satisfying (6.1). Of course, in order to serve our purpose of generating a
Majorana mass superpotential, the instantons have to satisfy some more conditions.
Let us consider them in order of importance, and start with the conditions on the net
number of chiral fermion zero modes charged under the 4d observable sector. Clearly
we need IMa = IMa′ and IMb = IMb′ . The latter condition is automatically satisfied in
this case, because the b-brane is real in all 391 models. The chiral conditions on the
zero modes charged under the branes c and d are as in [3]17 and are given in equations
(3.6), (3.7) (3.8) of the present paper. These are the instantons of most interest, and
on which we mainly focus. However, as discussed in Section 3.4, other important B-
and/or L- violating operators (such as the Weinberg operator or the LH operator) can
16From intuition in geometric compactifications, one expects that there may always exist a D-brane
with the appropriate topological pairings, but there is no guarantee that there is a supersymmetric
representative in that topological sector, and even less that it would have no additional fermion zero
modes. Note also that even if such D-brane instantons exists, there is no guarantee that it will fall in
the scan over RCFT boundary states.
17Note that there is a sign change in the contribution of the U(1)d generator to Y in comparison
to [3]
be generated by instantons with similar intersection numbers, up to a factor of 2 and
a sign, see table 2. For this reason we also allow at this stage any instanton which has
the correct number of charged zero modes to generate them. Imposing these conditions
reduces the number of candidate instantons potentially contributing to neutrino masses
in any of the models to 1315.
All instantons satisfying these requirements are summarized in the table 5. In
columns 1,2 and 3 we list the tensor combination, MIPF and orientifold choice for
which the model occurred. The latter two numbers codify simple current data that
describe respectively a MIPF and an orientifold. MIPFs are in general defined by
means of a subgroup H of the simple current group G, plus a certain matrix X of
rational numbers [55]. Orientifolds are defined by a simple current and a set of signs
[43]. In previous work [6] we have enumerated these quantities (up to permutation
symmetries) and assigned integer labels to them for future reference. We only refer
to these numbers here, but further details are available upon request. Usually for
each MIPF and orientifold which contains the standard model there are several choices
a,b,c,d for which it is obtained. For a given choice of tensor combination, MIPF and
orientifold and SM branes there may be several instantons. For clarity we put all such
instantons together in the information in table 5. In column 4 we indicate which type
of instanton branes were found. Five types are distinguished: O1, O2, S2, U1 and
U2, corresponding to O(1), O(2), Sp(2), U(1) and U(2) Chan-Paton symmetry on the
instanton volume. The number indicates the instanton brane multiplicity that gives
the correct number of instanton charged zero modes from the a, b, c, d branes, to lead
to right-handed neutrino Majorana masses. The number of zero modes is in general
the product of the instanton brane multiplicity and ‘intersection number’ with the
corresponding 4d spacefilling brane. As discussed in Section 5, for symplectic branes
the smallest possible brane multiplicity is 2. As we discussed there, only O1 instantons
may have the required universal minimal set of two zero modes in the uncharged sector.
Still we look for all O(1), Sp(2) and U(1) instantons which may yield a superpotential
if the extra uncharged fermion zero modes. In this vein we also include a search for
O2 and U2 instantons. Note also that such O2 or U2 instantons imply the existence
of other instantons involving the same boundary state, but with multiplicity 1, which
may lead to the R-parity violating operator LH . We will discuss the generation of
R-parity violating operators at the end of this section. The third character (+ or −)
in the instanton in table 5 is the sign of IMc′ − IMc. For the instantons giving rise to
right-handed neutrino Majorana masses this sign should be negative, whereas it should
be positive for instantons giving rise to the Weinberg operator (or the LH operator),
see table 2.
The 1315 instantons are divided in the following way over the different types: 3 of
types O1+ and O1−, 46 of type U1+, 24 of type U1−, 550 S2+, 627 S2−, 27 of types
U2+ and U2− and four of types O2+ and O2−. Notice that the vast majority (97.5%)
of the instanton solutions are of type S2+ and S2−. This is encouraging given the
nice properties of such instantons, concerning e.g. R-parity conservation. Note also
that in almost all cases both S− and S+ are simultaneously present,18 so both sources
of physical neutrino Majorana masses (from the see-saw mechanism or the Weinberg
operator) are present. The other instanton classes possibly generating right-handed
neutrino masses are O1− and U1−, which are much less abundant. There is just one
orientifold with O1− instantons, for which one can obtain cancellation of RR tadpoles,
see below. On the other hand we have found no orientifold with U1− instantons and
cancellation of tadpoles, see below.
Table 5: Summary of instanton branes.
Tensor MIPF Orientifold Instanton Solution
(1,16,16,16) 12 0 S2+, S2− Yes
(2,4,12,82) 19 0 S2−! ?
(2,4,12,82) 19 0 U2+!, U2−! No
(2,4,12,82) 19 0 U1+, U1− No
(2,4,14,46) 10 0
(2,4,14,46) 16 0
(2,4,16,34) 15 0
(2,4,16,34) 15 1
(2,4,16,34) 16 0 S2+, S2− Yes
(2,4,16,34) 16 1
(2,4,16,34) 18 0 S2− Yes
(2,4,16,34) 18 0 U1+, U1−, U2+, U2− No
(2,4,16,34) 49 0 U2+, S2−!, U1+ Yes
Continued on next page
18In some models contributing many instantons there is an exact symmetry between S− and S+.
This explains the approximate symmetry in the full set. In some cases this symmetry can be understood
in terms of flipping the degeneracy labels of boundary states. We regard it as accidental, since it is
not found in all models.
Table 5 – continued from previous page
Tensor MIPF Orientifold Instanton Solution
(2,4,16,34) 49 0 U1− No
(2,4,18,28) 17 0
(2,4,22,22) 13 3 S2+!, S2−! Yes!
(2,4,22,22) 13 2 S2+!, S2−! Yes
(2,4,22,22) 13 1 S2+, S2− No
(2,4,22,22) 13 0 S2+, S2− Yes
(2,4,22,22) 31 1 U1+, U1− No
(2,4,22,22) 20 0
(2,4,22,22) 46 0
(2,4,22,22) 49 1 O2+, O2−, O1+, O1− Yes
(2,6,14,14) 1 1 U1+ No
(2,6,14,14) 22 2
(2,6,14,14) 60 2
(2,6,14,14) 64 0
(2,6,14,14) 65 0
(2,6,10,22) 22 2
(2,6,8,38) 16 0
(2,8,8,18) 14 2 S2+!, S2−! Yes
(2,8,8,18) 14 0 S2+!, S2−! No
(2,10,10,10) 52 0 U1+, U1− No
(4,6,6,10) 41 0
(4,4,6,22) 43 0
(6,6,6,6) 18 0
Most models have a hidden sector containing extra boundary states beyond the
SM ones. In the same spirit of imposing chiral conditions first, we should require
that IMh = IMh′, where h is a hidden sector brane. This is to guarantee that the
generated superpotential does not violate some hidden sector gauge symmetry which
would require the presence of hidden sector fields along with the νR bilinear. The latter
condition is not imposed on the previously known hidden sector (i.e. the one in [6, 7]),
but instead a new search for tadpole solutions was performed, for each M , restricting
the candidate hidden sector branes to those satisfying IMh = IMh′ (as discussed in
Section 5). This is because in general the known hidden sector in [6, 7] is just a sample
out of a huge number of possibilities.
In column 5 we indicate for which instantons it was possible to satisfy the tadpole
conditions with this additional constraint. With regard to observable-hidden matter we
use the same condition as in [6], namely that it is allowed only if it is vector-like. Such
a solution could be found for 879 of the 1315 instantons, with ten cases inconclusive
(i.e it was computationally too difficult to decide if a solution does or does not exist).
The latter are indicated with a question mark in column 5 (for most of the undecidable
cases there is a tadpole solution for a different instanton with the same characteristics;
for that reason just one question mark appears).
Independently of the RR tadpole condition (since there may be alternative sources
for its cancellation, or hidden sectors which fall beyond the reach of RCFT), we can
also consider the further constraint that the number of charged fermion zero modes is
exactly right, not just in the chiral sense. This means IMa = IMa′ = IMb = IMb′ = 0,
IMc = 2, IMc′ = 0 and IMd = −2, IMd′ = 0 or vice-versa. Furthermore we require that
there are no adjoint or rank-2 tensor zero-modes (note that the latter could be chiral
if the instanton brane is complex, and indeed they are in some of the 1315 cases).
This reduces the 1315 instantons to 263. In column 4 we indicate those cases with
an exclamation mark. It is noteworthy that the success rate for solving the tadpole
conditions is highest for these instantons: 254 of the 263 allow a solution (with 3
undecided). If an exclamation mark appears in column 4, this only indicates that
some of the instantons are free of the aforementioned zero modes, not that all of them
are. But in all cases, if there are tadpole solutions, they exist in particular for the
configurations with an exclamation mark. Finally we may impose the condition that
IMh and IMh′ are separately zero. This is indicated with and exclamation mark in
column 5. This turns out to be very restrictive. The only cases where this happens
have no hidden sector at all.
It is worth remarking that the only instantons having exactly the correct set of
charged zero modes and cancelling tadpoles are of S2± type. Also those instantons are
the only cases marked with an exclamation mark in column 4 and 5. These examples,
which will be discussed below in some detail, also have just the minimal set of fermion
zero modes, except for the universal sector (which for Sp(2) instantons contains two
extra triplets).
The main conclusion about this scan is that we did not find any instantons with
exactly the zero mode fermions to generate the neutrino mass superpotential. However
we have found a number of examples which come very close to that, with exactly the
required charged zero modes and a very reduced set of extra uncharged zero modes
from the universal sector. These extra zero modes are non-chiral and hence one expects
that e.g. RR/NS fluxes or other effects may easily lift them, as we discussed in section
2. Concerning O(1) instantons, which have just the two required fermion zero modes
in the universal sector, we have found one example, with the appropriate net structure
of charged zero modes. However, it has plenty of other extra zero modes. We discuss
examples of O(1) and Sp(2) instantons in the following subsections.
6.2 An O1 example
Let us first discuss the case of O(1) instantons. In principle they would be the more
attractive since they have no undesirable universal zero modes at all. Unfortunately
this type of instanton is rare within the set we scanned, and we found just one example
with a solution to the tadpole equations without any unwanted chiral zero-modes. The
instanton however has a very large number of uncharged and charged vector-like zero
modes.
The standard model brane configuration occurs for tensor product (2, 4, 22, 22),
MIPF 49, orientifold 1, boundaries (a,b,c,d) = (487, 1365, 576, 486). As usual we only
provide this information in order to locate this model in the database. Further details
are available on request.
The bi-fundamental fermion spectrum of this model in the (a,b,c,d) sector is fairly
close to the MSSM: there is an extra up-quark mirror pair, two mirror pairs of lepto-
quarks with down quark charges and one with up-quark charges, plus two extra right-
handed neutrinos (i.e. a total of five right-handed neutrinos). There are three MSSM
Higgs pairs. The tensor spectrum is far less appealing, in particular for brane c: this
has 25 adjoints and 7 vector-like pairs of anti-symmetric tensors.
As we said, there is just one instanton brane of type O1−. It has exactly the
right number of zero-modes with brane d, but five superfluous pairs of vector-like zero-
modes with brane c, plus one vector-like pair with brane a. In addition there are four
symmetric tensor zero-modes on the instanton brane (which of course are vector-like,
since it is a real brane): the parameter p+ in table 3 is equal to 2.
The tadpole solution that is (chirally speaking) compatible with this instanton has
a large hidden sector: O(1)×O(2)4×O(3)×U(1)2×Sp(2)2×U(3) (there are other pos-
sibilities, but no simple ones). This hidden sector introduces more undesirable features:
vector-like observable/hidden matter, vector-like instanton/hidden sector modes, plus
chiral and non-chiral matter within the hidden sector. Finally the coupling ratios are
as follows: α3/α2 = .54, sin
2θw = .094, and the instanton coupling is 3.4 times weaker
than the QCD coupling (α3/αInstanton = 3.4).
Despite these unappealing features this model does demonstrate the existence of
this kind of solution.
6.3 The S2 models
As we have mentioned, these are the examples which come closer to the minimal set of
fermion zero modes. As we see in Table 5, all such instantons satisfying the criteria on
the zero mode structure (except for the extra universal zero modes) appear for models
based on the same CFT orientifold. It is the one obtained from the (2, 4, 22, 22) Gepner
model with MIPF 13 and orientifold 3 in the table. The model is obtained as follows.
6.3.1 The closed string sector
We start with the tensor product (2,4,22,22). This yields a CFT with 12060 primary
fields, 48 of which are simple currents, forming a discrete group G = Z12×Z2×Z2. After
taking into account the permutation symmetry of the last two factors, we find that this
tensor product has 54 symmetric MIPFs, and we choose one of them to build the model
of interest. For convenience we specify all quantities in terms of a standard minimal
model notation, but also in terms of the labelling of the computer program “kac” that
generates the spectrum. This particular MIPF is nr. 13. To build it we choose a
subgroup of G, which is isomorphic to H = Z12 × Z2. The generator of the Z12 factor
is primary field nr. 1, (0, 0, 0, {24,−24, 0}, {24, 20, 0}), and the Z2 factor is generated
by primary field nr. 24, (0, 0, 0, 0, {24, 20, 2}). The representations are specified on a
basis (NSR, k = 2, k = 4, k = 22, k = 22), i.e. the boundary conditions of the NSR-
fermions and the four minimal models in the tensor product. Here 0 indicates the CFT
vacuum, and for all other states we use the familiar (l, q, s) notation for the N = 2
minimal models. The first generator has conformal weight h = 11
and has ground state
dimension 1. The second has weight h = 11
and has ground state dimension 2: the
ground state contains both (0, 0, 0, 0, {24, 20, 2}) and (0, 0, 0, {24, 20, 2}, 0). The matrix
X defining the MIPF according to the prescription given in [53][54][55] is
(6.3)
This simple current modification is applied to the charge conjugation invariant of the
tensor product. This defines a MIPF that corresponds to an automorphism of the
fusion rules, and that pairs all the primaries in the CFT off-diagonally. The number
of Ishibashi states, and hence the number of boundary states is 1080. The MIPF is
invariant under exchange of the two k = 22 factors: this maps current 24 to itself, and
current 1 to current 11, which is also in H. Hence this symmetry of the tensor product
maps H into itself, and it also preserves the matrix X .
To define an orientifold, we must specify a “Klein bottle current” plus two signs
defined on the basis of the simple current group. For the current K we use the
generator of the second Z2 in G, primary field nr. 12. This is the representation
(0, 0, {4,−4, 0}, {(24, 16, 2)}, {(24,−12, 2)}) which is degenerate with nine other states,
all of dimension 1 and conformal weight 7. The crosscap signs are chosen, on the afore-
mentioned basis of H as (+,−). This results in a crosscap coefficient of 0.0464731, and
it is orientifold nr. 3 of a total of 8. The orientifold is also invariant under permutation
of the identical factors.
The closed string spectrum contains 14 vector multiplets and 60 chiral multiplets.
6.3.2 The standard model branes
To build a standard model configuration we have to specify the boundary state labels.
It turns out that we have four choices for label a and b, one for c and two for d.
This leads to a total of 32 possibilities. Among these 32 there are 22 have distinct
spectra (distinguished by the number of vector-like states), but for all 32 choices one
obtains the same set of dilaton couplings. It seems plausible that these choices simply
correspond to putting the a, b and d branes in slightly different positions, so that we
move the configuration in brane moduli space. The choices are as follows (these are
boundary labels assigned by the computer program, and can be decomposed in terms
of minimal model representations; this will be explained in table 6 below)
a : 10, 22, 130, 142
b : 210, 282, 290, 291
c : 629
d : 712, 797
There are additional possibilities, but they do not give rise to additional distinct spec-
Table 6: Branes appearing in standard model configurations
Label Orbit/Deg. Reps Weight Dimension
10 240 (0, 0, 0, 0, {10, 0, 0}) 5/4 1
130 2760 (0, 0, 0, {10, 0, 0}, 0) 5/4 1
22 [528,0] (0, 0, 0, {1,−1, 0}, {11, 1, 0}) 3/2 1
(0, 0, 0, {1, 1, 0}, {11,−1, 0}) 3/2 1
142 [3048,0] (0, 0, 0, {11,−1, 0}, {1, 1, 0}) 3/2 1
(0, 0, 0, {11, 1, 0}, {1,−1, 0}) 3/2 1
210 4248 (0, 0, {3, 3, 0}, {3,−3, 0}, {9,−9, 0}) 1/2 1
282 5760 (0, 0, {3, 3, 0}{9,−9, 0}{3,−3, 0}) 1/2 1
290 [5952,0] (0, 0, {1, 1, 0}{9, 7, 0}{11,−11, 0}) 5/6 1
291 [5952,24] (0, 0, {1, 1, 0}{9, 7, 0}{11,−11, 0}) 5/6 1
629 [9348,30] (0, (1,−1, 0), 0, {9, 9, 0}{5,−3, 0} 7/12 1
712 [9852,0] (0, {1, 1, 0}{3,−3, 0}{1, 1, 0}{5, 5, 0}) 1/2 2
(0, {1, 1, 0}{1,−1, 0}{1, 1, 0}{5,−3, 0}) 1/2 2
797 [10356,30] (0, {1, 1, 0}{3,−3, 0}{5, 5, 0}{1, 1, 0}) 1/2 2
(0, {1, 1, 0}{1,−1, 0}{5,−3, 0}{1, 1, 0}) 1/2 2
The second column gives the boundary labels in terms of a primary field label and
a degeneracy label (boundaries not indicated by square brackets are not degenerate).
The labels appearing in columns 1 and 2 are assigned by the computer program, and
are listed here only for the purpose of reproducing the results using that program.
In column 2, the boundary labels are expressed in terms of primary field labels, as
in formula (A.4). If a single number appears, this is a representative of an H-orbit
corresponding to the boundary. If square brackets are used, this means that the H-
orbit has fixed points, and that it corresponds to more than one boundary label. The
second entry in the square brackets is the degeneracy label, and refers to a character of
the “Central Stabilizer” defined in [43]; the details of the definition and the labelling
will not be important here. In this case the first entry within the square brackets refers
to an orbit representative.
These orbit representatives can also be expressed in a standard form for minimal
model tensor products. This is done in column 3. This is basically the same expansion
shown in (A.4), except that the degeneracy label ΨI turns out to be trivial in all cases,
both for the standard model and for the instanton branes shown below (although the
theory does contain primaries with non-trivial Ψ’s). In columns 4 and 5 we specify the
weight and ground state dimension of the corresponding highest weight representation.
These data are not directly relevant for the boundary state, but helps in identifying it.
Since boundaries are specified by orbit representatives, it is not straightforward to
compare them, since the standard choice (the one listed in column 2) is arbitrary. For
this reason we have used another representative in columns 3, 4 and 5, selected by an
objective criterion: we choose the one of minimal dimension and minimal conformal
weight (in that order). If there is more than one representative satisfying these criteria
we list all.
6.3.3 The open string spectrum
In Table 7 we summarize the spectra of the 32 models. The first four columns list the
a,b,c,d brane labels. The last eight columns specify the total number of multiplets of
types Q (quark doublet), U (up quark singlet), D (down quark singlet), L (lepton dou-
blet), E (charged lepton singlet), N (neutrino singlet), Y (lepto-quark) and H (Higgs).
The numbers given are for the total number of lefthanded fermions in the represen-
tation, plus their complex conjugates. So for example a 7 in column “Q” means that
there are 5 quark doublets in the usual representation (3, 2, 1
), plus two in the complex
conjugate representation (3∗, 2,−1
This yields the required three families of quark doublets, plus two mirror pairs.
Hence the smallest number that can occur in the six columns QUDLEN is three, if
there are no mirrors (note that cubic anomaly cancellation requires three right-handed
neutrinos in this class of models). The lepto-quarks Y are all in the same representation
as the down-quarks (D), or the conjugate thereof, and they occur only as vector-like
mirror pairs. They differ from D-type mirror quarks because they carry lepton number,
because they come from open strings ending on the d-brane instead of the c-brane.
In general, there can also exist U-type lepto-quarks, but in these models they do not
occur. Finally the numbers 10, 18 and 26 in column ’H’ mean that there are 5, 9
or 13 MSSM Higgs pairs H + H̄. It is worth noticing that right-handed quarks U,D
and neutrinos N = νR do not have vectorlike copies. On the other hand right-handed
leptons E always have one and the left-handed fields Q,L may have up to 3 vector-like
copies.
Table 7: Spectrum all 32 configurations.
U(3) Sp(2) U(1) U(1) Q U D L E N Y H
10 210 629 712 7 3 3 9 5 3 6 10
22 210 629 712 7 3 3 9 5 3 6 10
130 210 629 712 3 3 3 9 5 3 2 10
142 210 629 712 3 3 3 9 5 3 2 10
10 282 629 712 3 3 3 5 5 3 6 26
22 282 629 712 3 3 3 5 5 3 6 26
130 282 629 712 7 3 3 5 5 3 2 26
142 282 629 712 7 3 3 5 5 3 2 26
10 290 629 712 3 3 3 3 5 3 6 18
22 290 629 712 3 3 3 3 5 3 6 18
130 290 629 712 3 3 3 3 5 3 2 18
142 290 629 712 3 3 3 3 5 3 2 18
10 291 629 712 3 3 3 3 5 3 6 18
22 291 629 712 5 3 3 3 5 3 6 18
130 291 629 712 3 3 3 3 5 3 2 18
142 291 629 712 3 3 3 3 5 3 2 18
10 210 629 797 7 3 3 5 5 3 2 10
22 210 629 797 7 3 3 5 5 3 2 10
130 210 629 797 3 3 3 5 5 3 6 10
142 210 629 797 3 3 3 5 5 3 6 10
10 282 629 797 3 3 3 9 5 3 2 26
22 282 629 797 3 3 3 9 5 3 2 26
130 282 629 797 7 3 3 9 5 3 6 26
142 282 629 797 7 3 3 9 5 3 6 26
10 290 629 797 3 3 3 3 5 3 2 18
22 290 629 797 3 3 3 3 5 3 2 18
130 290 629 797 3 3 3 3 5 3 6 18
142 290 629 797 3 3 3 3 5 3 6 18
10 291 629 797 3 3 3 3 5 3 2 18
22 291 629 797 5 3 3 3 5 3 2 18
130 291 629 797 3 3 3 3 5 3 6 18
Continued on next page
Table 7 – continued from previous page
U(3) Sp(2) U(1) U(1) Q U D L E N Y H
142 291 629 797 3 3 3 3 5 3 6 18
In the following table we list the multiplicities Laa and Laa′ of the branes that occur
in these models, leading to vector-like sets of adjoints and rank-2 tensors. Since brane
b is symplectic, the number of adjoints is equal to the number of symmetric tensors.
Table 8: 4d matter from the aa and aa′ sectors.
Boundary Adjoints Anti-symm. Symm.
a(10) 2 2 6
a(22) 2 2 2
a(130) 2 2 6
a(142) 2 2 2
b(210) - 14 10
b(282) - 14 10
b(290) - 14 6
b(291) - 14 6
c(629) 9 - 14
d(712) 3 - 6
d(797) 3 - 6
It should be emphasized that CFT constructions generically correspond to par-
ticular points in moduli space of CY orientifolds. Due to this, they usually have an
‘enhanced’ massless particle content with extra vector-like matter and closed string
gauge interactions. Thus one would expect that many of the massless vector-like chiral
fields present in this class of models could gain masses while moving to a nearby point
in moduli space.
6.3.4 The instantons
Each of these 32 Standard Model compactifications admits 8 instantons. The instanton
labels are identical for all the 32 models. They are listed in Table 9. The first five
columns use the same notation as for the standard model boundary labels. In column
6 we list the numerical value of the dilaton coupling to the instanton brane. This
quantity is proportional to 1
. It is instructive to compare these couplings to the gauge
couplings, in order to gain intuition on the suppression factor for our instantons. In
these models the U(3) dilaton couplings are 0.00622, so that the instantons are more
strongly coupled than QCD19 On the other hand in this particular model the ratio
α3/α2 at the string scale is 3.23 (the value of sin
2θw at the string scale is 0.527). All
of these couplings are subject to renormalization group running, and there are plenty
of vector-like states to contribute to this, if one assumes that they acquire masses at
a sufficiently low scale. One should perform a detailed renormalization group analysis
to check whether one may obtain consistency with the gauge couplings measured at
low-energies. Let us emphasize however that one expects that moving in moduli space
many of these vector-like states will gain masses and also the values of the different
gauge couplings will also generically vary.
Since the value of the Type II dilaton is a free parameter at this level, one can get the
appropriate (intermediate) mass scale for the right-handed neutrino Majorana masses
by choosing an appropriate value for the dilaton. In this context, it is satisfactory to
verify that the instanton couplings are unrelated to the gauge couplings, as expected
since they do not correspond to gauge instantons [3], and are in fact less suppressed
than the latter.
Note that the 8 instantons fall into two distinct classes (evidently not related by
any discrete symmetry, since the conformal weight on the boundary orbit is distinct,
and the coupling is different as well). Within each class, the orbits of the four instan-
ton boundaries appear to be related by the Z2 symmetries of interchange of the last
two tensor factors, and simultaneous inversion of the charge q of the minimal model.
However, one has to be very careful in reading off symmetries directly from the labels
in columns 3 of Tables (6) and (9) for a number of reasons. First of all the entries in
column 3 are representatives of boundary orbits, and these representatives themselves
are merely representatives of extension orbits. Secondly the action of any discrete
19 Note that the Type II dilaton in this compactifications is an arbitrary parameter which can
always be chosen so that we consistently work at weak coupling. It is the relative value of gauge
couplings which we are comparing here.
Table 9: Instantons for all 32 configurations
Lbl. Orbit/Deg. Reps Weight Dim. coupling
414 [8064,0] (0, {1, 1, 0}, 0, {22,−22, 0}, {20, 16, 0}) 5/2 1 0.0016993
417 [8076,30] (0, {1,−1, 0}, 0, {22, 22, 0}, {20,−16, 0}) 5/2 1 0.0016993
456 [8316,0] (0, {1, 1, 0}, 0, {20, 16, 0}, {22,−22, 0}) 5/2 1 0.0016993
459 [8328,30] (0, {1,−1, 0}, 0, {20,−16, 0}, {22, 22, 0}) 5/2 1 0.0016993
418 [8088,0] (0, {1, 1, 0}, 0, {22,−22, 0}, {18, 16, 0}) 5/3 1 0.0027033
420 [8100,0] (0, {1,−1, 0}, 0, {22, 22, 0}, {18,−16, 0}) 5/3 1 0.0027033
502 [8592,0] (0, {1, 1, 0}, 0, {18, 16, 0}, {22,−22, 0}) 5/3 1 0.0027033
505 [8604,30] (0, {1,−1, 0}, 0, {18,−16, 0}, {22, 22, 0}) 5/3 1 0.0027033
symmetry on the degeneracy labels can be non-trivial. In appendix B we discuss these
symmetries in more detail.
6.4 Other examples
The Sp(2) instanton examples just discussed are the ones which get closer to the
required minimal set of fermion zero modes. Under slightly weaker conditions, we
find many more solutions. In all these cases some additional mechanism beyond exact
RCFT will be needed to lift some undesirable zero modes.
The simplest such case is the following. The tensor product is (2, 8, 8, 18), MIPF
nr. 14, orientifold 2 (the precise spectra may be found using this information in the
database www.nikhef.nl/∼t58/filtersols.php). There are three distinct brane configu-
rations for which almost perfect instantons exist, namely (a,b, c,d) = (64, 562, 389, 67)
and (64, 577, 389, 67) and (65, 560, 189, 66). Each has six instantons, three of type S2+
and three of type S2−. As in the foregoing example, the six instantons are identical
for the three standard model configurations. In this example, they have three differ-
ent dilaton coupling strengths: .00254, .00665 and .0108 (each value occurs once for
S2+ and once for S2−). By comparison, the U(3)-brane dilaton coupling strength is
0.0119338, so that the instanton brane coupling is quite a bit stronger than the QCD
coupling. This is again an interesting point if we want that νR masses are not too much
http://www.nikhef.nl/~t58/filtersols.php
suppressed. Furthermore in this example there are three distinct instanton couplings,
so that one may expect three non-zero eigenvalues (with a hierarchy) in the mass ma-
trix. As in the previous examples there is not gauge coupling unification, one rather
has α3/α2 = .4813 and sin
2(θw) = .183 at the string scale. Again a full renormalization
group analysis should be performed in order to check consistency with the measured
low-energy gauge coupling values.
These models all have a hidden sector consisting of a single Sp(2) factor. They
have respectively 3, 1 and 3 susy Higgs pairs, and a spectrum of bi-fundamentals that
is closer to that of the standard model than the previously discussed Sp(2) examples:
quarks and leptons do not have vector-like copies (there are only some vector-like
leptoquarks), and even one of the three models have the minimal set of Higgs fields
of the MSSM. The rest of the spectrum is purely vector-like, and contains a number
of rank-2 tensors, including eight or six adjoints of U(3). Furthermore there is vector-
like observable-hidden matter. The only undesirable instanton zero-mode is a single
bi-fundamental between the hidden sector Sp(2) brane and the instanton brane. Still,
these SM brane configurations without the hidden sector, provide interesting and very
simple local models of D-brane sectors admitting instantons generating neutrino masses
(with the additional ingredients required to eliminate the extra universal triplets of
fermion zero modes).
6.5 R-parity violation
We now turn to the generation of other possible superpotentials violating B − L. An
instanton violates R-parity if the amount of B − L violation,
IMa − IMa′ − IMd + IMd′ (6.4)
is odd. Examples of instantons with that property were found in the following ten-
sor product/MIPF/orientifold combinations: [(1, 16, 16, 16), 12, 0], [(2, 4, 16, 34), 49, 0],
[(2, 4, 12, 82), 19, 0] [(2, 4, 22, 22), 49, 0] and [(2, 4, 16, 34), 18, 0]. Note that all cases for
which O2 or U2 instantons were found necessarily have R-parity violating instantons as
well: the corresponding O1 and U1 instantons have IMd or IMd′ equal to ±1, whereas
the intersection with the a is non-chiral. In principle, there are many more ways
to obtain R-parity violating instantons (either due to non-vanishing contributions to
IMa − IMa′ or higher values of IMd − IMd′), and indeed, many such instantons turn
out to exist. But the number of tensor product/MIPF/orientifold combinations where
they occur hardly increases: only in the case [(1, 16, 16, 16), 12, 0] we found R-parity
violating instantons, but no U1 or O1 instantons. This suggests that in the other cases
R-parity is a true symmetry of the model. Unfortunately we have no way of rigorously
ruling out any other non-perturbative effects, but at least the set we can examine re-
spects R-parity. This includes in particular the models without hidden sector (found
for [(2, 4, 22, 22), 13, 3] ) discussed above.
The following table list the total number of instantons with the chiral intersections
listed in table 2. The total number of instantons (boundaries violating the sum rule,
as defined in (6.1)) is 29680, for all standard model configurations combined. The
last four columns indicate how many unitary instantons satisfy the sum rule exactly
as listed in table (6.1), how many satisfy it with IMx ↔ −IMx′ (the column U’), and
how many O-type and S-type instantons there are. Here ‘S’ refers to boundaries with
a symplectic Chan-Paton group if the boundary is used as an instanton brane. All
intersection numbers for type S have been multiplied by 2 before comparing with table
2. For real branes, the relevant quantities used in the comparison are IMa − IMa′,
IMc−IMc′ and IMd−IMd′ , while IMb = 0. There are fewer unitary instantons possibly
generating Majorana masses then the numbers mentioned above because the conditions
we use here are stricter: we require here that IMx and IMx′ match exactly, not just their
difference. Note however that this still allows additional vector-like zero-modes. If we
only wish to consider cases without any spurious zero-modes, we may limit ourselves
to the O-type instantons in the last column. There are very few to inspect, and all of
them turn out to have a few non-universal zero modes.
D = 4 Operator U U’ S O
νRνR 1 2 627 3
LH̄LH̄ 0 5 550 3
LH̄ 3 3 0 4
QDL 8 4 0 4
UDD 0 0 0 4
LLE 8 4 0 4
QQQL 0 4 0 3892
UUDE 4 0 0 3880
Table 10: Number of instantons in our search which may induce neutrino masses (first 2
rows), R-parity violation (next 4 rows) or proton decay operators (last 2 rows).
The last two cases are B − L preserving dimension five operators, and obviously
do not come from the set of 29680 B −L violating instantons. They were searched for
separately, but the search was limited to the same 391 models we used in the rest of the
paper. Obviously, one could equally well look for such instantons in the full database,
since their existence does not require a massive B − L.
It is interesting to note that in the classes of MSSM-like models discussed earlier in
this section with the closest to minimal zero mode structure, there are no instantons
al all generating either R-parity violating or the B − L dim=5 operators in the table.
This makes them particularly attractive.
Note that all numbers in table 10 refer to the occurrence of instantons in the set
of 391 tadpole-free models with massive B-L, but without checking the presence of
zero-modes between the hidden sector and the instanton. It makes little sense to use
the hidden sector in the database for such a check, since this is just one sample from a
(usually) large number of possibilities. A meaningful question would be: can one find a
hidden sector that has no zero-modes with the instanton. We have done such a search
for the B − L violating instantons (see the exclamation marks in the last column of
table (5)), but not for the B − L preserving instantons.
7 Conclusions and outlook
In this paper we have presented a systematic search for MSSM-like Type II Gepner
orientifold models allowing for boundary states associated to instantons giving rise to
neutrino Majorana masses. This search is very well motivated since neutrino masses
are not easily accommodated in the semi-realistic compactifications constructed up to
now. String instanton induced Majorana masses provides a novel and promising way
to understand the origin of neutrino masses in the string theory context.
The string instantons under discussion are not gauge instantons. Thus, for example,
they not only break B + L symmetry (like ’t Hooft instantons do) but also B − L,
allowing for Majorana neutrino mass generation. The obtained mass terms are of
order Ms exp(−V/g2) but this suppression is unrelated to the exponential suppression
of e.g. electroweak instantons and may be mild. In fact we find in our most interesting
examples that the instanton action is typically substantially smaller than that of QCD
or electroweak instantons, and hence these effects are much less suppressed than those
coming from gauge theory instantons.
To perform our instanton search we have analyzed the structure of the zero modes
that these instantons must have in order to induce the required superpotential. This
analysis goes beyond the particular context of Gepner orientifolds and has general
validity for Type II CY orientifolds. We have found that instantons with O(1) CP
symmetry have the required universal sector of just two fermionic zero modes for the
superpotential to be generated. Instantons with Sp(2) and U(1) CP symmetries have
extra unwanted universal fermionic zero modes, which however may be lifted in a va-
riety of ways in more general setups, as we discuss in the text. In fact we find in
our search that around 98 % of the instantons with the correct structure of charged
zero modes have Sp(2) CP symmetry. Indeed, from a number of viewpoints the Sp(2)
instantons are specially interesting. The instantons we find with the simplest structure
of fermionic zero modes are Sp(2) instantons which are also the ones which are present
more frequently in the MSSM-like class of Gepner constructions considered. They have
also some interesting features from the phenomenological point of view. Indeed, due
to the non-Abelian structure of the CP symmetry, the structure in flavor space of the
neutrino Majorana masses factorizes. This makes that, irrespective of what particu-
lar compactification is considered, Sp(2) instantons may easily lead to a hierarchical
structure of neutrino masses. It would be important to further study the possible
phenomenological applications of the present neutrino mass generating mechanism.
String instanton effects can also give rise to other B- or L-violating operators. Of
particular interest is the dimension 5 Weinberg operator giving direct Majorana masses
to the left-handed neutrinos. We find that in the most interesting cases, different
instantons giving rise to the Weinberg operator and to νR Majorana masses are both
simultaneously present. Which effect is the dominant one in the generation of the
physical light neutrino masses depends on the values of the instanton actions and
amplitudes as well as on the value of the string scale. Instantons may also generate
dim< 5 operators violating R-parity. We find however that instantons inducing such
operators are extremely rare, and in fact are completely absent in the Gepner models
with the simplest Sp(2) instantons inducing neutrino masses.
There are many avenues yet to be explored. It would be important to understand
better the possible sources (moving in moduli space, addition of RR/NS backgrounds
etc.) of uplifting for the extra uncharged fermionic zero modes in the most favoured
Sp(2) instantons. A second important question is that we have concentrated on check-
ing the existence of instanton zero modes appropriate to generate neutrino masses; one
should further check that the required couplings among the fermionic zero modes and
the relevant 4d superfields (i.e. νR or LH̄) are indeed present in each particular case.
This is in principle possible in models with a known CFT description but could be
difficult in practice for the Gepner models here described.
Instantons can also generate other superpotentials with interesting physical appli-
cations. One important example is the generation of a Higgs bilinear (i.e. a µ-term)
in MSSM-like models [4, 3]. Thus, e.g., one could perform a systematic search for
instantons (boundary states) generating a µ-term in the class of CFT Gepner orien-
tifolds considered in the present article. Other possible application is the search for
instantons inducing superpotential couplings involving only closed string moduli. The
latter may be useful for the moduli-fixing problem, or for non-perturbative corrections
to perturbatively allowed couplings [56].
Finally, it would be important to search for analogous instanton effects inducing
neutrino masses in other string constructions (heterotic, M-theory etc.). A necessary
condition is that the anomaly free U(1)B−L gauge boson should become massive due
to a Stückelberg term.
The importance of neutrino masses in physics beyond the Standard Model is un-
questionable. We have shown that string theory instantons provide an elegant and
simple mechanism to implement them in semi-realistic MSSM-like string vacua, and a
powerful constraint in model building. In our opinion, the conditions of the existence
of appropriate instantons to generate neutrino masses should be an important guide in
a search for a string description of the Standard Model.
Acknowledgements
We thank M. Bertolini, R. Blumenhagen, S. Franco, M. Frau, S. Kachru, E. Kiritsis,
A. Lerda, D. Lüst, F. Marchesano, T. Weigand for useful discussions. A.M.U. thanks
M. González for encouragement and support. The research of A.N. Schellekens was
funded in part by program FP 57 of the Foundation for Fundamental Research of
Matter (FOM), and Research Project FPA2005-05046 of de Ministerio de Educacion y
Ciencia, Spain. The research by L.E. Ibáñez and A.M. Uranga has been supported by
the European Commission under RTN European Programs MRTN-CT-2004-503369,
MRTN-CT-2004-005105, by the CICYT (Spain), and the Comunidad de Madrid under
project HEPHACOS P-ESP-00346.
Appendix
A CFT Notation
Here we summarize the labelling conventions for various CFT quantities. Further
details and explanations can be found in [43].
It is important to keep in mind that there are four steps in the construction, each
involving choices of some quantities. The steps are
• A CFT tensor product
• An extension of the chiral algebra of this tensor product
• The choice of a MIPF
• The choice of an orientifold
The second and third step are easily confused. A MIPF can itself be of extension
type (although it can also be of automorphism or mixed type), meaning that it implies
an extension of the chiral algebra. The crucial difference between step two and three in
that case is that in step 2 all fields that are non-local with respect to the extension are
projected out, and the symmetry of the extension is imposed on all states of the CFT,
i.e in particular on all boundary states. The extension in step three acts as a bulk
invariant, but the boundary states are not required to respect the symmetry implied
by the extension.
Primary fields of N = 2 minimal models are labeled in the usual way by three
integers (l, q, s). In addition to these minimal models, one building block of our CFT’s
is of course a set of NSR fermions in four dimensions. They can be represented by the
four conjugacy classes (0), (v), (s), (c) analogous to those of a root lattice of type D.
Primary fields in a tensor product of M factors are therefore labelled as
I = ((x), (l1, q1, s1), . . . , (lM , qM , sM)) (A.1)
where x = 0, v, s or c.
This tensor product is extended by the alignment currents and the spin-1 field
corresponding to the space-time supersymmetry generator. This organizes the tensor
product fields into orbits, which can be labelled by one of the elements of the orbit.
We always choose the field of minimal conformal weight (or one of them, in case there
are more) as the orbit representative labelling the orbit. The supersymmetry generator
may have fixed points, leading to orbits appearing more than once as primary fields
of the extended theory. In those cases we need an additional degeneracy label to
distinguish them. It is convenient to choose for this label a character of the discrete
group that is causing the degeneracy, the “untwisted stabilizer”, which depends on I.
Denoting this character as ΨI we get then the following set of labels for the primaries
of the extended CFT
i = [I,ΨI ] (A.2)
where I has the form (A.1). If there are no degeneracies we will leave out the square
brackets and the ΨI .
In boundary CFT’s two new labels appear: the labels of Ishibashi-states that propa-
gate in the transverse channel of the annulus, and the boundary labels. In the simplest,
“Cardy” case both sets of labels are in one-to-one correspondence with the extended
CFT labels i. But if we consider non-trivial MIPFs Zij both sets of labels are different.
The Ishibashi states are in one-to-one correspondence with the fields i with Ziic 6= 0.
Degeneracies can occur here if Ziic 6> 0. This requires the introduction of a degeneracy
label. Such degeneracies may occur if the stabilizer of i (the set of simple currents that
fix i) is non-trivial. It is convenient to use elements J of the stabilizer as degeneracy
labels, so that the Ishibashi labels get the following form
m = (i, J) , (A.3)
where i is an extended CFT label, as defined above (to be precise, in some cases a
non-trivial degeneracy label is introduced even if Ziic = 1. The details will not matter
here).
Boundary states correspond to orbits of the simple current group H that defines a
MIPF. To label such orbits we choose a representative. There is no obvious canonical
representative (one could use one of minimal conformal weight, but the conformal
weight of orbit members of a boundary state does not play any rôle in the formalism,
unlike the conformal weight of a primary). So in this case we just make an arbitrary
choice. Once again there can be degeneracies. In this case they are due to a subgroup
of the stabilizer called the “Central Stabilizer”. It is convenient to label the boundary
states by an orbit representative i and a character ψi of the central stabilizer. If we
expand the boundary state label in all of its components we get
a = [i, ψi] = [[I,ΨI ], ψi] = [((x), (l1, q1, s1), . . . , (lM , qM , sM)),ΨI ], ψi] . (A.4)
Note that i is just a representative of a boundary orbit, and that I is just a represen-
tative of an orbit of the extension of the CFT.
B Instanton boundary symmetries
In the hidden-sector free example discussed in some detail in section 6 we have en-
countered Sp(2)-type instantons, the most common kind in our scan. This particular
model is the one that comes closest to the required zero mode count, although the only
superfluous zero modes are rather awkward. Let us assume that the effect of these su-
perfluous universal zero-modes instantons can be avoided. Then there is still another
problem we have to face, namely that the two zero modes αi and γi are related by
an Sp(2) transformation of the label i. Then we we need at least three independent
instantons (with unrelated couplings) to generate three non-zero neutrino masses, as
discussed in Section 3.3. Since the technology to compute the couplings is not yet
available, we cannot be completely sure that the relevant couplings are distinct, or
indeed that they are non-vanishing, but at least we can inspect if there are obvious
symmetries relating them.
The unextended tensor product (2, 4, 22, 22) has 64 discrete symmetries: five sep-
arate charge conjugations of the factors (including the NSR space-time factor) and
the interchange of the two identical k = 22 minimal models. To get space-time su-
persymmetry this tensor product is extended with the product of the simple current
Ramond ground states of each factor. These Ramond ground states are not invariant
under charge conjugation. Therefore this extension breaks the discrete symmetries to
Z2 × Z2, the combined charge conjugation of all five factors and the permutation of
the two identical factors. The combined charge conjugation also acts non-trivially on
the Ramond ground states in each factor, but the result is the charge conjugate of the
space-time supersymmetry generator, which is in the chiral algebra. The combined
conjugation is in fact the charge conjugation symmetry of the extended CFT. It turns
out that only a Z2 subgroup of Z2 × Z2 acts non-trivially on the simple currents of
the extended CFT: the permutation of the k = 22 factors acts in the same way as
charge conjugation. The action of these symmetries on the complete set of primary
fields is more complicated. It is easy to see that the permutation acts differently than
charge conjugation. In general the primary fields of the extended CFT are labelled
as i = [((x), (i1), . . . , (i4)),Ψ] (see Appendix A). The action of the permutation is to
interchange i3 and i4, but in cases with a non-trivial degeneracy it is not a priori
clear which of the degenerate states is the image of the map. This can be resolved by
examining the fusion rules, which should be invariant under the permutation:
Nijk = Nπ(i)π(j)π(k) , (B.1)
where N is a fusion coefficient and π a permutation or other automorphism. In general
there may be more than one way to resolve these ambiguities, resulting in additional
automorphism of the CFT. The standard example of this situation is the extension of
the affine algebra A1 level 4 by the simple current. The resulting CFT has an outer
automorphism, non-trivial charge conjugation, that has no counterpart in A1 level 4.
As mentioned above, the Z2 permutation symmetry is respected by the MIPF and
the orientifold, and since charge conjugation acts in the same way on the simple currents
as the permutation, charge conjugation is respected as well.
In this way we end up with (at least) a surviving Z2×Z2 discrete symmetry acting
on the boundary labels, or a larger discrete symmetry if that symmetry is extended
by the action on the degeneracy labels of the extension. The foregoing story repeats
itself for the action on the boundary labels. The boundary labels are given in terms
of the CFT labels plus a second degeneracy label, the one indicated by the second
entry in the square brackets in column 2 of tables (6) and (9). Once again one has to
determine not only how a symmetry acts on the first entry (this is just the action of
the symmetry in the extended CFT, respecting its fusion rules), but also how it acts
on the degeneracy labels. In this case the precise action can be determined from the
invariance of the annulus coefficients
Aiab = A
π̂(a)π̂(b) , (B.2)
where π is the action on the primaries of the extended CFT (as determined above) and
π̂ is the action on the boundary labels induced by π.
Since the orientifold choice is non-trivial, boundary charge conjugation does not
coincide with CFT charge conjugation. Indeed, the eight instanton boundary states
are invariant under boundary charge conjugation (which they must be in order to
produce a “real” Sp(2)-type instanton). However, just as permutations, CFT charge
conjugation may induce a non-trivial discrete symmetry on the boundary states.
In addition to these “outer automorphisms” there is the notion of boundary simple
currents, introduced in the appendix of [6]. These may be thought of as remnants of
the original simple currents, and imply relations between annulus amplitudes of the
Aiab′ = A
Ja(Jb)′ (B.3)
All of the aforementioned symmetries might relate instanton couplings, and hence
threaten their numerical independence. However, in order to do that they have to be
symmetries of the full standard model/instanton configuration, not just relate some of
the eight instantons to each other. It is easy to see that the permutation of the k = 22
factors changes the standard model brane configuration. Consider brane c: it turns out
that under permutation boundary state 629 it is mapped to boundary state 544 or 545
(depending on the action on the degeneracy label), which in any case is distinct. Hence
even if the instanton boundaries 414 and 456 resp. 418 and 502 are mapped to each
other by boundary permutation, at the same time the standard model configuration is
mapped to a distinct one.
This means that we may expect at least four distinct couplings, which should be
sufficient. It is of course possible to work out the discrete symmetries exactly, but in
view of this argument this would not yield any additional insight.
We do know the exact boundary orbits. The orbit of instanton label 414 is
(414, 415, 416, 417), so that instantons 414 and 417 are related. But the orbit of brane
c under the same action is (629, 628, 626, 627). Hence the action that relates 414 and
417 maps 629 to 627. In fact all four standard model boundaries a,b,c,d are mapped
to different ones. This implies that instantons 414 and 417 may produce different
couplings as well, so that all eight instantons may contribute in a different way.
These distinctions concern the disk correlators d(r)a in (3.12). The factors exp(−Re Ur)
will be related by discrete symmetries, and it seems reasonable to expect them to be
identical for instantons 414, 417, 456 and 459, which is indeed correct. However there
is no reason to expect the other four instantons to have the same suppression factor,
and indeed they do not.
Note that these symmetries imply the existence of a much larger set of standard
model configurations than the 32 discussed here. However, as mentioned before, the
32 models considered here display all possible distinct spectra.
References
[1] P. Minkowski, Phys.Lett.B67,421(1977); M. GellMann, P. Ramond and R. Slan-
sky, in Supergravity, p.315, ed. P. Nieuwenhuizen and D. Friedman, (North Hol-
land, Amsterdam, 1979); T. Yanagida, in Proc. of the Workshop on Unified The-
ories and the Baryon Number of the Universe, ed. O. Sawada and A. Sugamoto,
(KEK, Japan, 1979); R.N. Mohapatra and G. Senjanovic, Phys.Rev.Lett.44, 912
(1980).
[2] J. Giedt, G. L. Kane, P. Langacker and B. D. Nelson, Phys. Rev. D 71 (2005)
115013 [arXiv:hep-th/0502032].
[3] L. E. Ibanez and A. M. Uranga, “Neutrino Majorana masses from string theory
instanton effects,” arXiv:hep-th/0609213.
[4] R. Blumenhagen, M. Cvetic and T. Weigand, “Spacetime instanton cor-
rections in 4D string vacua - the seesaw mechanism for D-brane models,”
arXiv:hep-th/0609191.
[5] L. E. Ibáñez, F. Marchesano and R. Rabadan, JHEP 0111 (2001) 002
[arXiv:hep-th/0105155].
[6] T. P. T. Dijkstra, L. R. Huiszoon and A. N. Schellekens, Nucl. Phys. B 710
(2005) 3, hep-th/0411129]
[7] P. Anastasopoulos, T. P. T. Dijkstra, E. Kiritsis and A. N. Schellekens, “Orien-
tifolds, hypercharge embeddings and the standard model,”, hep-th/0605226, to
appear in Nucl. Phys. B.
[8] M. Cvetic, R. Richter, T. Weigand, “Computation of D-brane instanton
induced superpotential couplings: Majorana masses from string theory”,
arXiv:th/0703028.
[9] R. Blumenhagen, L. Goerlich, B. Kors and D. Lust, JHEP 0010 (2000) 006
[arXiv:hep-th/0007024].
[10] G. Aldazabal, S. Franco, L. E. Ibáñez, R. Rabadan and A. M. Uranga, J.
Math. Phys. 42 (2001) 3103 [arXiv:hep-th/0011073]; JHEP 0102 (2001) 047
[arXiv:hep-ph/0011132].
[11] A. M. Uranga, Class. Quant. Grav. 20 (2003) S373 [arXiv:hep-th/0301032];
R. Blumenhagen, M. Cvetic, P. Langacker and G. Shiu, Ann. Rev. Nucl. Part. Sci.
55 (2005) 71 [arXiv:hep-th/0502005]
[12] C. Bachas, [arXiv:hep-th/9503030].
http://arxiv.org/abs/hep-th/0502032
http://arxiv.org/abs/hep-th/0609213
http://arxiv.org/abs/hep-th/0609191
http://arxiv.org/abs/hep-th/0105155
http://arxiv.org/abs/hep-th/0411129
http://arxiv.org/abs/hep-th/0605226
http://arxiv.org/abs/hep-th/0007024
http://arxiv.org/abs/hep-th/0011073
http://arxiv.org/abs/hep-ph/0011132
http://arxiv.org/abs/hep-th/0301032
http://arxiv.org/abs/hep-th/0502005
http://arxiv.org/abs/hep-th/9503030
[13] C. Angelantonj, I. Antoniadis, E. Dudas and A. Sagnotti, Phys. Lett. B 489 (2000)
223 [arXiv:hep-th/0007090].
[14] E. Witten, Nucl. Phys. B 474 (1996) 343 [arXiv:hep-th/9604030].
[15] S. Weinberg, “Varieties Of Baryon And Lepton Nonconservation,” Phys. Rev. D
22, 1694 (1980).
[16] R. Argurio, M. Bertolini, G. Ferretti, A. Lerda, C. Petersson, arXiv:hep-
th/0704.0262
[17] M.Bianchi, F. Fucito, J.F. Morales, arXiv:hep-th/07040784
[18] A. Sagnotti, Phys. Lett. B 294 (1992) 196 [arXiv:hep-th/9210127].
[19] L. E. Ibanez, R. Rabadan and A. M. Uranga, Nucl. Phys. B 542 (1999) 112
[arXiv:hep-th/9808139].
[20] I. Antoniadis, E. Kiritsis and J. Rizos, Nucl. Phys. B 637 (2002) 92
[arXiv:hep-th/0204153].
[21] G. ’t Hooft, Phys. Rev. D 14 (1976) 3432 [Erratum-ibid. D 18 (1978) 2199].
[22] D. Baumann, A. Dymarsky, I. R. Klebanov, J. Maldacena, L. McAllister and
A. Murugan, JHEP 0611 (2006) 031 [arXiv:hep-th/0607050].
[23] R. Blumenhagen, M. Cvetic and T. Weigand, [arXiv:hep-th/0609191].
[24] M. Haack, D. Krefl, D. Lust, A. Van Proeyen and M. Zagermann, JHEP 0701
(2007) 078 [arXiv:hep-th/0609211].
[25] B. Florea, S. Kachru, J. McGreevy and N. Saulina, arXiv:hep-th/0610003.
[26] M. Bianchi, E. Kiritsis, “Non-perturbative and flux superpotentials for type I
strings on the Z(3) orbifold”, arXiv:hep-th/0702015.
[27] O. J. Ganor, Nucl. Phys. B 499 (1997) 55 [arXiv:hep-th/9612077].
[28] M. Billo, M. Frau, I. Pesando, F. Fucito, A. Lerda, A. Liccardo, “Classical gauge
instantons from open strings”, JHEP 0302 (2003) 045. arXiv:hep-th/0211250.
http://arxiv.org/abs/hep-th/0007090
http://arxiv.org/abs/hep-th/9604030
http://arxiv.org/abs/hep-th/0704078
http://arxiv.org/abs/hep-th/9210127
http://arxiv.org/abs/hep-th/9808139
http://arxiv.org/abs/hep-th/0204153
http://arxiv.org/abs/hep-th/0607050
http://arxiv.org/abs/hep-th/0609191
http://arxiv.org/abs/hep-th/0609211
http://arxiv.org/abs/hep-th/0610003
http://arxiv.org/abs/hep-th/0702015
http://arxiv.org/abs/hep-th/9612077
http://arxiv.org/abs/hep-th/0211250
[29] L. Gorlich, S. Kachru, P. K. Tripathy and S. P. Trivedi, JHEP 0412 (2004) 074,
[arXiv:hep-th/0407130];
P. G. Camara, L. E. Ibanez and A. M. Uranga, Nucl. Phys. B
708 (2005) 268, [arXiv:hep-th/0408036]; J. Park, arXiv:hep-th/0507091;
R. Kallosh, A. K. Kashani-Poor and A. Tomasiello, JHEP 0506 (2005)
069, [arXiv:hep-th/0503138]. E. Bergshoeff, R. Kallosh, A. K. Kashani-Poor,
D. Sorokin and A. Tomasiello, JHEP 0510 (2005) 102, [arXiv:hep-th/0507069].
[30] K. Becker, M. Becker, C. Vafa, J. Walcher, “Moduli stabilization in non-geometric
backgrounds”, arXiv:hep-th/0611001.
[31] A. Sen, “F-theory and orientifolds”, Nucl. Phys. B475 (1996) 562,
arXiv:hep-th/9605150.
[32] M. Bershadsky, A. Johansen, T. Pantev, V. Sadov and C. Vafa, Nucl. Phys. B
505 (1997) 153 [arXiv:hep-th/9612052].
[33] B. S. Acharya, arXiv:hep-th/0011089.
[34] N. Akerblom, R. Blumenhagen, D. Lust, E. Plauschinn and M. Schmidt-
Sommerfeld, arXiv:hep-th/0612132.
[35] C. Angelantonj, M. Bianchi, G. Pradisi, A. Sagnotti and Y. S. Stanev, “Comments
on Gepner models and type I vacua in string theory,” Phys. Lett. B 387 (1996)
743 [ArXiv:hep-th/9607229].
[36] R. Blumenhagen and A. Wisskirchen, “Spectra of 4D, N = 1 type I string vacua on
non-toroidal CY threefolds,” Phys. Lett. B 438, 52 (1998) [ArXiv:hep-th/9806131].
[37] G. Aldazabal, E. C. Andres, M. Leston and C. Nunez, “Type IIB orientifolds on
Gepner points,” JHEP 0309, 067 (2003) [ArXiv:hep-th/0307183].
[38] I. Brunner, K. Hori, K. Hosomichi and J. Walcher, “Orientifolds of Gepner mod-
els,” [ArXiv:hep-th/0401137].
[39] R. Blumenhagen and T. Weigand, “Chiral supersymmetric Gepner model orien-
tifolds,” JHEP 0402 (2004) 041 [ArXiv:hep-th/0401148].
[40] T. P. T. Dijkstra, L. R. Huiszoon and A. N. Schellekens, Phys. Lett. B 609 (2005)
408 [arXiv:hep-th/0403196].
http://arxiv.org/abs/hep-th/0407130
http://arxiv.org/abs/hep-th/0408036
http://arxiv.org/abs/hep-th/0507091
http://arxiv.org/abs/hep-th/0503138
http://arxiv.org/abs/hep-th/0507069
http://arxiv.org/abs/hep-th/0611001
http://arxiv.org/abs/hep-th/9605150
http://arxiv.org/abs/hep-th/9612052
http://arxiv.org/abs/hep-th/0011089
http://arxiv.org/abs/hep-th/0612132
http://arxiv.org/abs/hep-th/9607229
http://arxiv.org/abs/hep-th/9806131
http://arxiv.org/abs/hep-th/0307183
http://arxiv.org/abs/hep-th/0401137
http://arxiv.org/abs/hep-th/0401148
http://arxiv.org/abs/hep-th/0403196
[41] G. Aldazabal, E. C. Andres and J. E. Juknevich, “Particle models from orientifolds
at Gepner-orbifold points,” JHEP 0405, 054 (2004) [ArXiv:hep-th/0403262].
[42] G. Aldazabal, E. Andres and J. E. Juknevich, JHEP 0607 (2006) 039
[arXiv:hep-th/0603217].
[43] J. Fuchs, L. R. Huiszoon, A. N. Schellekens, C. Schweigert and J. Walcher, “Bound-
aries, crosscaps and simple currents,” Phys. Lett. B 495 (2000) 427
[ArXiv:hep-th/0007174].
[44] G. Pradisi, A. Sagnotti, and Ya.S. Stanev, The open descendants of non-diagonal
SU(2) Wess-Zumino-Witten models, Phys. Lett. B 356 (1995) 230
[45] G. Pradisi, A. Sagnotti, and Ya.S. Stanev, Completeness conditions for boundary
operators in 2D Conformal Field Theory, Phys. Lett. B 381 (1996) 97
[46] L. R. Huiszoon, A. N. Schellekens and N. Sousa, Phys. Lett. B 470, 95 (1999)
[arXiv:hep-th/9909114].
[47] L. R. Huiszoon and A. N. Schellekens, Nucl. Phys. B 584, 705 (2000)
[arXiv:hep-th/0004100].
[48] A. M. Uranga, “D-brane probes, RR tadpole cancellation and K-theory charge,”
Nucl. Phys. B 598 (2001) 225 [ArXiv:hep-th/0011048].
[49] B. Gato-Rivera and A. N. Schellekens, “Remarks on global anomalies in RCFT
orientifolds,” Phys. Lett. B 632 (2006) 728 [ArXiv:hep-th/0510074].
[50] E. Gimon, J. Polchinski, “Consistency conditions for orientifolds and D-
manifolds”, Phys. Rev. D54 (1996) 1667, arXiv:hep-th/9601038.
[51] G. Pradisi and A. Sagnotti, Phys. Lett. B 216 (1989) 59.
[52] E. Witten, Nucl. Phys. B 460 (1996) 541 [arXiv:hep-th/9511030].
[53] B. Gato-Rivera and A. N. Schellekens, Commun. Math. Phys. 145, 85 (1992).
[54] B. Gato-Rivera and A. N. Schellekens, Nucl. Phys. B 353, 519 (1991).
[55] M. Kreuzer and A. N. Schellekens, Nucl. Phys. B 411, 97 (1994)
[arXiv:hep-th/9306145].
[56] S.A. Abel, M.D. Goodsell, arXiv:hep-th/0612110.
http://arxiv.org/abs/hep-th/0403262
http://arxiv.org/abs/hep-th/0603217
http://arxiv.org/abs/hep-th/0007174
http://arxiv.org/abs/hep-th/9909114
http://arxiv.org/abs/hep-th/0004100
http://arxiv.org/abs/hep-th/0011048
http://arxiv.org/abs/hep-th/0510074
http://arxiv.org/abs/hep-th/9601038
http://arxiv.org/abs/hep-th/9511030
http://arxiv.org/abs/hep-th/9306145
http://arxiv.org/abs/hep-th/0612110
	Introduction
	Instanton induced superpotentials in Type II orientifolds
	D-brane instantons, gauge invariance and effective operators
	Zero mode structure for D-brane instantons
	Uncharged zero modes
	Charged fermion zero modes
	Instanton induced Majorana neutrino masses
	The MSSM on the branes
	Majorana mass term generation
	Flavor and the special case of Sp(2) instantons
	Other B- and L-violating operators
	The Weinberg operator 
	R-parity violating operators 
	Dimension 5 proton decay operators 
	CFT orientifolds
	Construction of the models
	Search for SM-like models
	Fermion zero modes for instantons on RCFT's
	Search for M instantons
	The instanton scan
	An O1 example
	The S2 models
	The closed string sector
	The standard model branes
	The open string spectrum
	The instantons
	Other examples
	R-parity violation
	Conclusions and outlook
	Appendix
	CFT Notation
	Instanton boundary symmetries
ABSTRACT
  Recently it has been shown that string instanton effects may give rise to
neutrino Majorana masses in certain classes of semi-realistic string
compactifications. In this paper we make a systematic search for supersymmetric
MSSM-like Type II Gepner orientifold constructions admitting boundary states
associated with instantons giving rise to neutrino Majorana masses and other L-
and/or B-violating operators. We analyze the zero mode structure of D-brane
instantons on general type II orientifold compactifications, and show that only
instantons with O(1) symmetry can have just the two zero modes required to
contribute to the 4d superpotential. We however discuss how the addition of
fluxes and/or possible non-perturbative extensions of the orientifold
compactifications would allow also instantons with $Sp(2)$ and U(1) symmetries
to generate such superpotentials. In the context of Gepner orientifolds with
MSSM-like spectra, we find no models with O(1) instantons with just the
required zero modes to generate a neutrino mass superpotential. On the other
hand we find a number of models in one particular orientifold of the Gepner
model $(2,4,22,22)$ with $Sp(2)$ instantons with a few extra uncharged
non-chiral zero modes which could be easily lifted by the mentioned effects. A
few more orientifold examples are also found under less stringent constraints
on the zero modes. This class of $Sp(2)$ instantons have the interesting
property that R-parity conservation is automatic and the flavour structure of
the neutrino Majorana mass matrices has a simple factorized form.

<|endoftext|><|startoftext|>
Asymmetry of in-medium ρ-mesons
as a signature of Cherenkov effects
I.M. Dremin∗, V.A. Nechitailo†
P.N.Lebedev Physics Institute RAS, 119991 Moscow, Russia
December 3, 2018
Abstract
Cherenkov gluons may be responsible for the asymmetry of dilep-
ton mass spectra near ρ-meson observed in experiment. They can be
produced only in the low-mass wing of the resonance. Therefore the
dilepton mass spectra are flattened there and their peak is slightly
shifted to lower masses compared with the in-vacuum ρ-meson mass.
This feature must be common for all resonances.
PACS: 12.38Bx, 13.87.-a
There exist numerous experimental data [1, 2, 3, 4, 5, 6, 7, 8, 9] about the
in-medium modification of widths and positions of prominent vector-meson
resonances. They are mostly obtained from the shapes of dilepton mass and
transverse momentum spectra in nucleus-nucleus collisions. Such in-medium
effects were tied theoretically to chiral symmetry restoration a long time ago
[10].
The dilepton mass spectra decrease approximately exponentially with in-
crease of masses albeit with substantial declines from the average approx-
imation of the general trend by the exponent in the low-mass region. A
significant excess of low-mass dilepton pairs yield over expectations from
hadronic decays is observed in experiment. The shape of the excess mass
spectra shown in [1, 2, 3] is dominated by ρ-mesons. Their ratio to other
vector meson resonances can be estimated as ρ : ω : φ=10:1:2.
∗email: dremin@lpi.ru
†email: nechit@lpi.ru
http://arxiv.org/abs/0704.1081v2
Several approaches have been advocated for explanation of the excess.
Strong dependence of the parameters of the effective Lagrangian on the tem-
perature and the chemical potential was assumed in [11, 12]. The hydro-
dynamical evolution was incorporated in [13] to describe the spectra. The
QCD sum rules and dispersion relations have been used [14, 15] to show that
condensates decrease in the medium leads to both broadening and slight
downward mass shift of resonances. The similar conclusions have been ob-
tained from more traditional attempts using either the empirical scattering
amplitudes with parton-hadron duality [16, 17] or the hadronic many-body
theory [18, 19, 20].
In the latest approach, which pretends on the best description of experi-
mental plots, the in-medium V-meson spectral functions are evaluated. The
excess of dilepton pairs below ρ-mass is ascribed to anti-/baryonic effects.
This conclusion is the alternative to more common ideas about the chiral
restoration at high energies. It asks for some empirical constraints to fit the
observed excess.
In this paper we propose another possible source of low-mass lepton pairs.
Namely, the emission of Cherenkov gluons may provide a substantial contri-
bution to the low mass region.
Considered first for processes at very high energies [21], the idea about
Cherenkov gluons was extended to resonance production [22, 23]. For Che-
renkov effects to be pronounced in ordinary or nuclear matter, the (either
electromagnetic or nuclear) index of refraction of the medium n should be
larger than 1. Qualitatively, the observed low mass excess of lepton pairs is
easy to ascribe to the gluonic Cherenkov effect if one reminds that the index
of refraction of any medium exceeds 1 within the lower wing of any resonance
(the ρ-meson, in particular).
This feature is well known in electrodynamics (see, e.g., Fig. 31-5 in [24])
where the atoms behaving as oscillators emit as Breit-Wigner resonances
when get excited. This results in the indices of refraction larger than 1
within their low-energy wings. In QCD, one can imagine that the nuclear
index of refraction for gluons in the hadronic medium behaves in a similar
way in the resonance regions. This statement is more general and can be
valid also at other energies if the relation (see, e.g., [25]) between the index
of refraction and the forward scattering amplitude F (E, 0o) is fulfilled not
only for photons but for gluons as well:
∆n = Ren− 1 ∝ ReF (E, 0o)/E. (1)
Here E is the photon (gluon) energy. In classical electrodynamics, it is the
dipole excitation of atoms in the medium by light which results in the Breit-
Wigner shape of the amplitude F (E, 0o). In hadronic medium, there should
be some modes (quarks, gluons or their preconfined bound states, conden-
sates, blobs of hot matter...?) which can get excited by the impinging parton,
radiate coherently if n > 1 and hadronize at the final stage as hadronic res-
onances [22, 23]. The hadronic Cherenkov effect can provide insight into
the substructure of the medium formed in nucleus-nucleus collisions. The
resonance amplitudes are chosen for F (E, 0o) at comparatively low energies.
The scenario, we have in mind, is as follows. The initial parton, belonging
to a colliding nucleus, emits a gluon which traverses the nuclear medium. On
its way, it collides with some internal modes. Therefore it affects the medium
as an ”effective” wave which accounts also for the waves emitted by other
scattering centers (see, e.g., [25]). Beside incoherent scattering, there are
processes which can be described as the refraction of the initial wave along
the path of the coherent wave. The Cherenkov effect is the induced coherent
radiation by a set of scattering centers placed on the way of propagation of
a gluon. That is why the forward scattering amplitude plays such a crucial
role in formation of the index of refraction. At low energies its excess over 1
is related to the resonance peaks as dictated by the Breit-Wigner shapes of
the amplitudes. In experiment, usual resonances are formed during the color
neutralization process. However, only those gluons whose energies are within
the left-wing resonance region of n > 1 give rise also to collective Cherenkov
effect proportional to ∆n.
Thus, apart from the ordinary Breit-Wigner shape of the cross section
for resonance production, the dilepton mass spectrum would acquire the
additional term proportional to ∆n at masses below the resonance peak.
Therefore its excess near the ρ-meson can be described by the following
formula1
(m2ρ −M
2)2 +M2Γ2
1 + w
m2ρ −M
θ(mρ −M)
HereM is the total c.m.s. energy of two colliding objects (the dilepton mass),
mρ=775 MeV is the in-vacuum ρ-meson mass. The first term corresponds
to the Breit-Wigner cross section. According to the optical theorem it is
proportional to the imaginary part of the forward scattering amplitude. The
second term is proportional to ∆n where it is taken into account that the
ratio of real to imaginary parts of Breit-Wigner amplitudes is
ReF (M, 0o)
ImF (M, 0o)
m2ρ −M
. (3)
1We consider only ρ-mesons here. To include other mesons, one should evaluate the
corresponding sum of similar expressions.
This term vanishes for M > mρ because only positive ∆n lead to the Che-
renkov effect. Namely it describes the distribution of masses of Cherenkov
states. In these formulas, one should take into account the in-medium mod-
ification of the height of the peak and its width. In principle, one could
consider mρ as a free in-medium parameter as well. We rely on experimental
findings that its shift in the medium is small. All this asks for some dynamics
to be known. In our approach, it is not determined. Therefore, first of all,
we just fit the parameters A and Γ by describing the shape of the mass spec-
trum at 0.75 < M < 0.9 GeV measured in [3] and shown in Fig. 1. In this
way we avoid any strong influence of the φ-meson. Let us note that w is not
used in this procedure. The values A=104 GeV3 and Γ = 0.354 GeV were
obtained. The width of the in-medium peak is larger than the in-vacuum
ρ-meson width equal to 150 MeV.
Thus the low mass spectrum at M < mρ depends only on a single pa-
rameter w which is determined by the relative role of Cherenkov effects and
ordinary mechanism of resonance production. It is clearly seen from Eq. (2)
that the role of the second term in the brackets increases for smaller masses
M . The excess spectrum in the mass region from 0.4 GeV to 0.75 GeV has
been fitted by w = 0.19. The slight downward shift about 40 MeV of the
peak of the distribution compared with mρ may be estimated from Eq. (2) at
these values of the parameters. This agrees with the above statement about
small shift compared to mρ. The total mass spectrum (the dashed line) and
its widened Breit-Wigner component (the solid line) according to Eq. (2)
with the chosen parameters are shown in Fig. 1. The overall description of
experimental points seems quite satisfactory. The contribution of Cherenkov
gluons (the excess of the dashed line over the solid one) constitutes the no-
ticeable part at low masses. The formula (2) must be valid in the vicinity of
the resonance peak. Thus we use it for masses larger than 0.4 GeV only.
The experimental data plotted in Fig. 1 have not been corrected for the
acceptance of the experiment, which strongly depends on mass and transverse
momentum of the muon pairs. However, due to an approximate cancellation
between the variations of the thermal radiation mediated by the rho and
those of the acceptance, the data as shown can roughly be interpreted as
spectral function of the rho, averaged over momenta and the complete space-
time evolution of nuclear collision [3]. To use these data without further
corrections is therefore justified as long as the pT spectra of the radiation and
those of the Cherenkov process are not dramatically different. From general
principles one would expect slightly lower pT for low-mass dilepton pairs from
coherent Cherenkov processes than for incoherent scattering. Qualitatively,
this conclusion is supported by experiment [3]. The Cherenkov dominance
region of masses from 400 MeV to 600 MeV below the ρ-resonance has softer
Figure 1: Excess dilepton mass spectrum in semi-central In(158 AGeV)-In
of NA60 (dots) compared to the in-medium ρ-meson peak with additional
Cherenkov effect (dashed line).
pT -distribution compared to the resonance region from 600 MeV to 900 MeV
filled in by usual incoherent scattering. More accurate statements can be
obtained after the microscopic theory of Cherenkov gluons developed.
We should mention that the expression (2) may be applied for ∆n ≪ 1.
The RHIC experiments revealed rather large ∆n ≈ 2. If the same values are
typical at lower energies of SPS then the more general formulas (see [23])
should be used. The qualitative conclusions stay valid.
Whether the in-medium Cherenkov gluonic effect is as strong as shown
in Fig. 1 can be verified by measuring the angular distribution of the lepton
pairs with different masses. The trigger-jet experiments similar to that at
RHIC are necessary to check this prediction. One should measure the angles
between the companion jet axis and the total momentum of the lepton pair.
The Cherenkov pairs with masses between 0.4 GeV and 0.7 GeV should tend
to fill in the rings around the jet axis. The angular radius θ of the ring is
determined by the usual condition
cos θ =
as discussed in more detail in [22].
Another way to demonstrate it is to measure the average mass of lepton
pairs as a function of their polar emission angle (pseudorapidity) with the
companion jet direction chosen as z-axis. Some excess of low-mass pairs
may be observed at the angle (4). Baryon-antibaryon effects can not possess
signatures similar to these ones.
In practice, these procedures can be quite complicated at comparatively
low energies if the momenta of decay products are comparable to the trans-
verse momentum of the resonance. It can be a hard task to pair leptons in
reliable combinations. The Monte Carlo models could be of some help.
In non-trigger experiments like that of NA60 there is another obstacle.
Everything is averaged over directions of initial partons. Different partons
are moving in different directions. The angle θ, measured from the direction
of their initial momenta, is the same but the total angles are different, corre-
spondingly. The averaging procedure would shift the maxima and give rise to
more smooth distribution. Nevertheless, some indications on the substruc-
ture with maxima at definite angles have been found at the same energies by
CERES collaboration [27]. It is not clear yet if it can be ascribed to Che-
renkov gluons. To recover a definite maximum, it would be better to detect
a single parton jet, i.e. to have a trigger.
The prediction of asymmetrical in-medium widening of any resonance at
its low-mass side due to Cherenkov gluons is universal. This universality is
definitely supported by experiment. Very clear signals of the excess on the
low-mass sides of ρ, ω and φ mesons have been seen in [5, 6]. This effect for
ω-meson is also studied in [8]. Slight asymmetry of φ-meson near 0.9 - 1 GeV
is noticeable in the Fig. 1 shown above but the error bars are large there.
We did not try to fit it just to deal with as small number of parameters as
possible. There are some indications at RHIC (see Fig. 6 in [7]) on this effect
for J/ψ-meson.
At much higher energies one can expect better alignement of the mo-
menta of initial partons. This would favour the direct observation of emitted
by them rings in non-trigger experiments. The first cosmic ray event [26]
with ring structure gives some hope that at LHC energies the initial partons
are really more aligned and this effect can be found. The possible addi-
tional signature at high energies could be the enlarged transverse momenta
of particles within the ring.
To conclude, the new mechanism is proposed for explanation of the low-
mass excess of dilepton pairs observed in experiment. It is the Cherenkov
gluon radiation which adds to the ordinary processes at the left wing of any
resonance.
Acknowledgments
We thank S. Damjanovic for providing us with experimental data. I.D.
is grateful to S. Damjanovic and H. Specht for very illuminating and fruitful
discussions, in particular, on the role of the experimental acceptance.
This work has been supported in part by the RFBR grants 06-02-16864,
06-02-17051.
References
[1] G. Agakichiev et al. (CERES), Phys. Rev. Lett. 75, 1272 (1995); Phys.
Lett. B422, 405 (1998); Eur. Phys. J. C41, 475 (2005).
[2] D. Adamova et al. (CERES), Phys. Rev Lett. 91, 042301 (2003); 96,
152301 (2006); nucl-ex/0611022.
[3] R. Arnaldi et al. (NA60), Phys. Rev. Lett. 96, 162302 (2006); S. Dam-
janovic et al. (NA60), Eur. Phys. J. C49, 235 (2007), nucl-ex/0609026
and Nucl. Phys. A783, 327 (2007), nucl-ex/0701015.
[4] D. Trnka et al., Phys. Rev. Lett. 94, 192303 (2005).
[5] M. Naruki et al. (KEK), Phys. Rev. Lett. 96, 092301 (2006),
nucl-ex/0504016.
[6] R. Muto et al. (KEK), Phys. Rev. Lett. 98, 042501 (2007),
nucl-ex/0511019.
[7] A. Kozlov (PHENIX), nucl-ex/0611025.
[8] M. Kotulla (CBELSA/TAPS), nucl-ex/0609012.
[9] I. Tserruya, Nucl. Phys. A774, 415 (2006).
[10] R. Pisarski, Phys. Lett. B110, 155 (1982).
[11] M. Harada and K. Yamawaki, Phys. Rep. 381, 1 (2003).
http://arxiv.org/abs/nucl-ex/0611022
http://arxiv.org/abs/nucl-ex/0609026
http://arxiv.org/abs/nucl-ex/0701015
http://arxiv.org/abs/nucl-ex/0504016
http://arxiv.org/abs/nucl-ex/0511019
http://arxiv.org/abs/nucl-ex/0611025
http://arxiv.org/abs/nucl-ex/0609012
[12] G.E. Brown and M. Rho, Phys. Rev. Lett. 66, 2720 (1991); Phys. Rep.
269, 333 (1996); Phys. Rep. 363, 85 (2002).
[13] K. Dusling, D. Teaney and I. Zahed, nucl-th/0604071.
[14] S. Leupold, W. Peters and U. Mosel, Nucl. Phys. A628, 311 (1998).
[15] J. Ruppert, T. Renk and B. Muller, Phys. Rev. C73, 034907 (2006);
hep-ph/0612113.
[16] V.L. Eletsky, M. Belkacem, P.J. Ellis and J.I. Kapusta, Phys. Rev. C64,
035202.
[17] A.T. Martell and P.J. Ellis, Phys. Rev. C69, 065206 (2004).
[18] R. Rapp and J. Wambach, Eur. Phys. J. A6, 415 (1999); Adv. Nucl.
Phys. 25, 1 (2000).
[19] H. van Hees and R. Rapp, Phys. Rev. Lett. 97, 102301 (2006).
[20] R. Rapp, nucl-th/0701082.
[21] I.M. Dremin, JETP Lett. 30, 140 (1979); Sov. J. Nucl. Phys. 33, 726
(1981).
[22] I.M. Dremin, Nucl. Phys. A767, 233 (2006); J. Phys. G35, 1 (2007);
Int. J. Mod. Phys. E18, 1 (2007).
[23] I.M. Dremin, Nucl. Phys. A785, 369 (2007).
[24] R.P. Feynman, R.B. Leighton and M. Sands, The Feynman Lectures in
Physics (Addison-Wesley PC Inc., 1963) vol. 1, ch. 31.
[25] M. Goldberger and K. Watson, Collision Theory (John Wiley and Sons
Inc., 1964) Ch. 11, sect. 3, sect. 4.
[26] A.V. Apanasenko et al., JETP Lett. 30, 145 (1979).
[27] S. Kniege and M. Ploskon, nucl-ex/0703008.
http://arxiv.org/abs/nucl-th/0604071
http://arxiv.org/abs/hep-ph/0612113
http://arxiv.org/abs/nucl-th/0701082
http://arxiv.org/abs/nucl-ex/0703008
ABSTRACT
  Cherenkov gluons may be responsible for the asymmetry of dilepton mass
spectra near rho-meson observed in experiment. They can be produced only in the
low-mass wing of the resonance. Therefore the dilepton mass spectra are
flattened there and their peak is slightly shifted to lower masses compared
with the in-vacuum rho-meson mass. This feature must be common for all
resonances.

<|endoftext|><|startoftext|>
Superconductivity and magnetic order in CeRhIn5; spectra of coexistence.
J.V. Alvarez and Felix Yndurain
Departamento de F́ısica de la Materia Condensada,
Universidad Autónoma de Madrid, 28049 Madrid, Spain
(Dated: November 14, 2018)
We discuss the fixed-point Hamiltonian and the spectrum of excitations of a quasi-bidimensional
electronic system supporting simultaneously antiferromamagnetic ordering and superconductivity.
The coexistence of these two order parameters in a single phase is possible because the magnetic
order is linked to the formation of a metallic spin density wave, and its order parameter is not
associated to a spectral gap but to an energy shift of the paramagnetic bands. This peculiarity
entails several distinct features in the phase diagram and the spectral properties of the model, which
may have been observed in CeRhIn5. Apart from the coexistence, we find an abrupt suppression
of the spin density wave when the superconducting and magnetic ordering temperatures are equal.
The divergence of the cyclotron mass extracted from de Haas-van Alphen experiments is also
analyzed in the same framework.
PACS numbers: 71.10.Pm,74.50.+r,71.20.Tx
The interplay between magnetism and superconduc-
tivity is a recurrent area of research in condensed matter
physics. This interest has being activated in the last
years due to the experimental findings of their coexis-
tence in materials based on Ce, particularly in the 1-1-
5 CeMIn5 family [1] [2]. A prominent member of this
family is CeRhIn5, which grows in tetragonal form, al-
ternating CeIn3 and RhIn2 planes along the c crystallo-
graphic axis. The structural anisotropy, induces quasi-
bidimensionality in the electronic bonding and the Fermi
surface, as evidenced in a series of de Haas-van Alphen
measurements and band structure calculations [3][4][5].
At ambient pressure CeRhIn5 becomes an antiferromag-
net (AFM) below TN ∼ 3.8K [1], with a small staggered
magnetization aligned in the ab plane. Within the stan-
dard Doniach’s Kondo lattice paradigm, applying pres-
sure in a weak AFM heavy-fermion system opens a route
to very interesting effects. As the pressure increases this
theoretical scenario predicts: i) A reduction of TN due
to Kondo compensation. ii) The eventual suppression of
the AFM order in a quantum critical point (QCP), which
alike other heavy-fermion compounds, would be respon-
sible of the anomalies in the metallic phase. iii) The set-
ting of unconventional superconductivity. However, all
calorimetric [6], NQR [7], transport [8] and susceptibil-
ity [9] measurements provide a consistent picture for the
pressure-temperature phase diagram (presented schemat-
ically in Fig. 1) in conflict with the aforementioned the-
oretical scenario. Surprisingly, TN first increases with
pressure and it only starts to decrease for pressures higher
than 0.7GPa. Superconductivity shows up before TN has
gone down to zero, i.e. AFM and SC coexist. Finally,
AFM disappears abruptly at Pc = 1.9GPa exactly when
TN = TSC , in a first order transition and before a QCP
could have taken place. Despite the lack of a QCP in the
pressure-temperature phase diagram, the metallic phase
still might be understood in the framework of a quan-
0 0.5 1 1.5 2 2.5 3
P(GPa)
T_N_Park
T_SC_Park
T_SC_Chen
T_N_Chen
T_N_Flouquet
T_SC_Flouquet
AFM+SC
FIG. 1: Schematic experimental pressure-temperature phase
diagram of CeRhIn5 at zero-magnetic field adapted from ref-
erences [6], [9] and [11].
tum criticality if, changing another experimental knob,
one could find a QCP nearby in the phase diagram. The
natural choice is using a magnetic field to quench the
superconductivity, and in that way, continue the point
Pc(H = 0) into a line of first order transitions down to
zero temperature, ending with a QCP at H∗c2. A new
surprise appeared on this type of experiments [6, 10].
For pressures higher than Pc, the AFM reenters applying
an in-plane magnetic field Hm < Hc2. Besides, trans-
port measurements suggest [1, 8] that magnetic order in
CeRhIn5 may not be associated to a gap in the single-
particle spectrum. Actually, the resistivity as a function
of the temperature does not show the conventional mini-
mum characteristic of a metal-insulator transition at any
temperature. The small anomaly observed in the resistiv-
ity close to TN seems related to a change on the scattering
mechanism when the AFM sets in.
http://arxiv.org/abs/0704.1082v1
In vivid contrast with quantum critical and quasi-one-
dimensional systems, the understanding of the individual
AFM or SC states rely on simple but accurate mean-field
theories. In this Letter, we propose that a basic compre-
hension of the microscopic coexistence of AFM and SC
and the first-order transition between these two phases
can also be achieved within a mean-field scenario. We dis-
cuss this phenomenon at the microscopic level in terms
of a quasi-bidimensional model of interacting electrons
proposed by one of us [12, 13, 14]. We will enumerate
some implications of that model, discussing to which ex-
tent can be related to the phenomenology observed in
CeRhIn5.
The model Hamiltonian contains the terms that nat-
urally establish superconductivity and antiferromag-
netism.
H = Hk +HV +HU (1)
ε(k)c
kσckσ (2)
(k+q)↑
(−k−q)↓
c−k↓ck↑ (3)
k′−σckσck′−σ (4)
Where U is a Hubbard on-site repulsion, and Vkq the
effective interaction in the Cooper channel, which we will
take to be attractive. Only s-wave pairing will be con-
sidered throughout this work. Superconductivity with
an order parameter having a symmetry related to Vkq
is favored by HV but the presence of a Hubbard repul-
sion establish a competition, which in terms of the gap
is given by ∆ = ∆2 −∆1, such that:
∆2 = V
−k−σ〉 (5)
∆1 = U
−k−σ〉 (6)
where the prime restricts the summation to states with
an energy (measured from the FL) smaller than a cut-off
energy Ec. For those states we assume a very weak mo-
mentum dependence of Vkq . In the absence of magnetic
order and setting a constant density of states at the FL
(i.e. far from a logarithmic divergence) , the competition
between terms favoring and disfavoring the SC is clearly
shown in the McMillan-like formula for the critical tem-
perature.
TSC = 1.13Ec exp
(V − U∗)D(EF )
where U∗ = U
1+UD(EF ) ln(W/Ec)
FIG. 2: Calculated electronic structure of paramagnetic
CeRhIn5. (a) Density of states and (b) band structure. The
arrow indicates a saddle point singularity at the X point of
the Brillouin zone. The inset shows the variation of the sad-
dle point energy along the z R-X-R direction (it crosses the
Fermi Level at approximately the (π/a, 0, 0.6π/c) point).
An essential ingredient of the model is the quasipar-
ticle dispersion relation ε(k) in the paramagnetic phase
taken to set the Fermi level (FL) very close to the sad-
dle point of a 2D system of itinerant electrons. Based
on first principles calculations, Hall et al.[3] find at the
FL a sharp peak in the Density of States characteris-
tic of a 2D van Hove logarithmic singularity. We have
also performed a full first principles calculation of the
electronic structure of tetragonal paramagnetic CeRhIn5
using the VASP code [15]. The generalized gradient ap-
proximation of Perdew et al.[16] for the exchange and
correlation was adopted. The results for the experimen-
tal lattice parameters [17] are reported in Figure 2. In
Figure 2(a) we observe, like Hall et al.[3], a sharp peak
in the density of states near the Fermi level. In addi-
tion, in Figure 2(b) we display the band structure at the
vicinity of the Fermi level. We find a saddle point very
close to the Fermi level whose dispersion along the z-
direction (R-X-R in the standard notation) is about 0.3
eV indicating the two-dimensional character of the sad-
dle point singularity. Under these conditions we have,
besides the Van Hove singularity in the density of states
that favors superconductivity and magnetic order, an im-
portant kinematic restriction, namely ε(k) = ε(k + Q),
where Q = (π/a, π/a) is the vector connecting two equiv-
alent saddle points within the Brillouin zone. The above
restriction determines the gapless nature of the SDW like
in the CDW case proposed by Rice and Scott [18].
The SDW order parameter is given by,
γσ = U
kσck+Q−σ〉 (8)
with γσ = −γ−σ = γ. The resulting quadratic Hamil-
tonian is solved by a Bogoliubov transformation and the
SC and SDW order parameters obtained selfconsistently.
The Hamiltionian eigenvalues are obtained by solving:
εk − Ek −γ↑ −∆ 0
−γ↑ ε(k+Q) − Ek 0 −∆
−∆ 0 ε−k − Ek γ↓
0 −∆ γ↓ ε−(k+Q) − Ek
The four solutions are E1 = −E−, E2 = −E+, E3 =
E−, E4 = E+ where E±(k) =
ε(k)2 +∆2 ± γ. No-
tice that for a 1D or 2D nested system ε(k) = −ε(k+Q)
and therefore E±(k) = ±
ε(k)2 +∆2 + γ2.
The system is solved self-consistently for an effective
half bandwidth at ambient pressure of W0 and the phase
diagram for V = 4W0, Ec = 0.7W0 , n = 0.92 electrons
and U = 2.25W0 , U = 2.50W0 and U = 2.75W0, is
presented in Fig. 3. To simulate the effect of the pres-
sure we have considered a linear variation of the band-
width with the pressure and a bandwidth independent
electron-electron interaction U. These assumptions have
being found to be reasonable with a first principles calcu-
lation using the SIESTA code [19] taking Ni as a bench-
mark. Also, for simplicity, a constant V interaction and
hole concentration independent of pressure are assumed.
The results of our model agree qualitatively with the ex-
perimental findings depicted in Fig. 1and Fig. 3(d). We
indeed find a competition between SDW and SC but, as
seen experimentally, they can coexist in a non negligi-
ble region of the phase diagram. In addition, the SDW
disappears abruptly when the two critical temperatures
became equal, i.e., the SDW transition temperature can
not be lower than the superconducting one. This numeri-
cal result is similar to the analytical finding by Bilbro and
McMillan [20] concerning superconductivity and marten-
sitic transformation in A15 compounds.
The proximity of the FL to a saddle point is an ingredi-
ent of model (1) necessary for the formation of the metal-
lic SDW. We ask ourselves whether in CeRhIn5 there is
experimental evidence for such a proximity. A recent de
Haas-van Alphen study [4] shows a divergence in the cy-
clotron mass at a pressure Pc2 ∼ 2.4GPa, accompanied
with a change in the quasi-2D Fermi surface. Those ex-
periments were performed for values of magnetic fields
and pressures in which the system is AFM. Following
Park et al. Pc2 is very close to the pressure at which TN
would extrapolate to zero in absence of SC (see upper
inset in Fig. 4 ). The cyclotron mass in a 2D elec-
tronic system mc = h̄
2/2π(∂A(E)/∂E) where A is the
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
0 5 10 15 20 25
(a) (b)
(c) (d)
FIG. 3: Phase diagram (temperature versus pressure) ob-
tained using the model Hamiltonian (1) and a two dimensional
band structure with the FL close to a saddle point. The vari-
ables are in units of the half bandwidth W0. TN and TSC
stand for the SDW and SC critical temperatures respectively.
Panels (a), (b) and (c) are the calculated results for the pa-
rameters given in the text and U = 2.25W0 , U = 2.50W0 and
U = 2.75W0 respectively. The inset in panel (c) indicates the
metallic DOS in the three (SDW, SDW+SC and SC) different
regimes. The shaded area indicates occupied levels. Panel (d)
represents the experimental results of ref. [9]
area enclosed in the isoenergetic contour line E(k) = E.
Close to the 2D Van Hove singularity one expects the
cyclotron mass to diverge logarithmically. Actually, the
precise functional form has been computed in Ref. [21]
and found to be
mc = mc0
C +D ln
|EF − Evhs|
In a tetragonal crystal structure C and D are numbers
nearly independent of the pressure and mc0 is the cy-
clotron mass at the bottom of the band. The divergence
is driven by the denominator in the argument of the log-
arithm which in our model is: EF (γ) − Evhs ∼ γ(P ) ∼
TN(P ). In other words, in model (1), mc is enhanced
as the AFM vanishes because the FL in the SDW ap-
proaches the saddle point at X (see the inset in Fig. 4).
To elucidate if the experimental results are compatible
with this argument and with expression (9), we have ex-
tracted the pressure dependence of TN , fitting the ex-
perimental data in Ref ([6]) (see upper inset in Fig. 4
to a cubic polynomial law in (Pc2 − P ) in the range of
pressures between P = 0.65 and P = 2.4 GPa. We do
not attempt to justify physically this fit here, since our
only goal is extracting an analytic expression for TN (P )
for the range of pressures of interest, to insert it in (9).
The results are presented in the main panel of Fig. 4.
Our model reproduces reasonably well these experimen-
tal findings.
Within this model, we expect an anomaly in the spe-
cific heat at TN , which is not associated to a SDW spec-
0 0.5 1 1.5 2 2.5 3
P(Gpa)
0 0.5 1 1.5 2 2.5 3
P(Gpa)
FIG. 4: Cyclotron mass (mc) as a function of pressure. Tri-
angles are the experimental values from Ref. [4] showing the
mc divergence close to P ∼ 2.4 GPa, where the magnetic
order disappears. The solid line is a fit to the theoretical
model. Close to a Van Hove singularity the cyclotron mass
diverges logarithmically as the difference between EF and the
Evhs vanishes (see lower inset). According to the model this
energy difference is proportional to TN and its pressure de-
pendence can be extracted, for instance, from Ref. [6](see
upper inset).
tral gap but to the entropy released when the magnetic
order disappears. The electronic part of the specific heat
Ce−V = −2β
Ei(k)
∂Ei(k)
× (10)
Ei(k) +
ε(k)2 +∆2
where i=3,4. The second term inside the parenthesis
gives the SC anomaly at TC and the third term gives
the antiferromagnetic anomaly at TN . The SC anomaly
is much weaker in the coexisting phase, because the FL
lies in a depression of the density of states (see central
graph on the panel (c) of Fig. 3) created by the under-
lying SDW, while in the purely SC phase the FL is very
close to a divergence in the density of states (right graph).
Remarkably, this enhancement has been also observed in
calorimetric measurements on CeRhIn5 [6]
To summarize: beyond the detailed boundary shape
in the phase diagram of CeRhIn5 we have identified two
unequivocal, clear-cut features of the phenomenology of
CeRhIn5 in the discussed model, namely; coexistence and
abrupt disappearance of AFM when TN = TSC . The es-
sential ingredient in the model is the metallic SDW, fa-
vored for the proximity of Fermi level to a Van Hove
logarithmic singularity in the density of states. The
gapless nature of the SDW implies the lack of a metal-
insulator transition at, or close to, TN as shown by resis-
tivity measurements. The kinematic conditions needed
for the metallic SDW to be formed seem to be present in
CeRhIn5 as shown by the logarithmic divergence of the
cyclotron mass.
We appreciate very much discussions with G. Gomez-
Santos, H. Suderow and S. Vieira. Financial support of
the Spanish Ministry of Science ( Ramon y Cajal contract
and Grant BFM2003-03372) is acknowledged.
[1] H. Hegger, C. Petrovic, E.G. Moshopoulou, M.F. Hund-
ley, J.L. Sarrao, Z. Fisk and J.D. Thompson, Phys. Rev.
Lett. 84, 4986, (2000).
[2] M. Nicklas, V.A. Sidorov, H. A. Borges, P.G. Pagliuso,
J.L. Sarrao and J.D. Thompson, Phys. Rev. B 70,
020505(R) (2004).
[3] D. Hall, E. Palm, T. Murphy, S.W. Tozer, C. Petrovic,
E. Miller-Ricci, L. Peabody, CharisQayHuei Li, U. Alver,
R. G. Goodrich, J. L. Sarrao, P.G. Pagliuso, J. M. Wills,
Z. Fisk. Phys. Rev. B 64, 064506 (2001).
[4] H. Shishido, R. Settai, H. Harima, Y.Ōnuki, J. Phys. Soc.
Jpn. 74, 1103 (2005).
[5] S. Fujimori et al. Phys. Rev. B 67, 144507 (2003).
[6] G. Knebel, D. Aoki, D. Braithwaite, B. Salce and J.
Flouquet,cond-mat 0512078.
[7] T. Mito, S. Kawasaki, Y. Kawasaki, G.-q. Zheng, Y. Ki-
taoka, D. Aoki, Y. Haga and Y.Ōnuki, Phys. Rev. Lett.
90, 077004, (2003).
[8] A. Llobet, J.S. Gardner, E.G. Moshopoulou, J.M.
Mignot, M. Nicklas, W. Bao, N.O. Moreno, P.G.
Pagliuso, I.N. Goncharenko, J.L. Sarrao, J.D. Thomp-
son, Phys. Rev. B 69, 024403 (2004).
[9] G. F. Chen, K. Matsubayashi, S. Ban, K. Deguchi, N.K.
Sato, Phys. Rev. Lett. 97, 017005, (2006).
[10] T. Park, F. Ronning, H.Q. Yang, M.B. Salamon, R.
Movshovich, J.L.Sarrao, J.D.Thompsom, Nature 440,
65, (2006).
[11] T.Park, J.L. Sarrao and J.D. Thompson, J. of Magn. and
Magn. Mater. (in press).
[12] F. Yndurain, Solid St. Commun. 81, 939 (1991).
[13] F. Yndurain, Phys. Rev. B. 51, 8494 (1995).
[14] F. Yndurain, in New trends in Magnetism, Magnetic Ma-
terials and their Applications. Plenum Press, New York,
(1994).
[15] G. Kresse and J. Furthmuller, Phys. Rev. B 54, 11169
(1996).
[16] J.P. Perdew, K. Burke and M. Ernzerhof, Phys. Rev.
Lett. 77, 3865 (1996).
[17] E. G. Moshopoulou, Z. Fisk, J. L. Sarrao and J.D.
Thompson, J. Solid Stat. Chem. , 158 25 (2001).
[18] T.M. Rice and G. K. Scott, Phys. Rev. Lett. 35, 120
(1975).
[19] J.M. Soler, E. Artacho, J.D.Gale, A.Garcia, J.Junquera,
P.Ordejon and D.Sanchez-Portal, J. Phys. Condens. Mat-
ter 14, 2745 (2002).
[20] G. Bilbro and W.L. McMillan, Phys. Rev. B 14, 1887
(1976).
[21] M.A. Itskovsky and T. Maniv, Phys. Rev. B 72, 075124
(2005).
http://arxiv.org/abs/cond-mat/0512078
ABSTRACT
  We discuss the fixed-point Hamiltonian and the spectrum of excitations of a
quasi-bidimensional electronic system supporting simultaneously
antiferromamagnetic ordering and superconductivity. The coexistence of these
two order parameters in a single phase is possible because the magnetic order
is linked to the formation of a spin density wave, and its order parameter is
not associated to a spectral gap but to an energy shift of the paramagnetic
bands. This peculiarity entails several distinct features in the phase diagram
and the spectral properties of the model, which may have been observed in
CeRhIn$_5$. Apart from the coexistence, we find an abrupt suppression of the
spin density wave when the superconducting and magnetic ordering temperatures
are equal. The divergence of the cyclotron mass extracted from de Haas-van
Alphen experiments is also analyzed in the same framework.

<|endoftext|><|startoftext|>
Introduction
The emergence of the dark energy as one of the basic ingredients of the current
standard cosmological scenario, and the absence of an even vague understanding of
its possible origin, opens a window to the analysis of all possible mechanisms that
generate background energy (see e.g. [1] for a review of recent proposals). The main
problem is that the apparent value of the dark energy is very tiny compared with any
physical energy scale. A second problem is that in a generic quantum field theory
there is generation of vacuum energy and any renormalization prescription requires a
fine tuning, which is not very convincing without the quantisation of the gravitational
interaction.
The guess that dark energy might change with the evolution of the Universe can
be understood even if dark energy is just vacuum energy. The finite corrections due to
finite size of the causal Hubble domain decrease as the Universe continues to expand.
The aim of this paper is to analyse the variation of these finite size corrections under
of renormalization group on the space of boundary conditions for scalar field theories in
flat space, although the results are generalisable for more general backgrounds.
The dependence of the vacuum energy on the boundary conditions [2] is well known
since the discovery of Casimir effect [3] (see [4, 5] and [6] and references therein for
recent revisions). However, boundaries might also be considered as a source of new,
although peculiar, interactions and therefore can undergo renormalization [7, 8]. The
renormalization of boundary conditions modifies the critical behaviour of the theory
[9, 10, 11]. In systems with boundaries or defects, the boundary RG flow induces a
dynamical behaviour on the boundaries. The dynamics of D-branes in string theory
emerges in this way [12].
The renormalization group flow is analysed from a global viewpoint in the most
general framework for boundary conditions of scalar field theories introduced in Ref.
[13]. In particular, we consider the possible existence of topological transitions [14]
induced by the renormalization of boundary conditions or cyclic orbits in the boundary
RG flow [15]. The dependence of the finite size corrections to the vacuum energy and
vacuum entanglement entropy [16, 17] under the boundary RG flow is analysed from a
very general perspective.
2. Boundary conditions in Field Theory
The action which governs the dynamics of scalar field theory in a bounded domain
Ω of flat space consists of two different of terms, S(φ) = SB(φ) + Sb(φ). The first one
SB(φ) =
g dDx
|φ̇|2 − |∇φ|2 − V (|φ|2)
is defined in terms of the values of the fields in the bulk. The second term
Sb(ϕ) =
dD−1x
|ϕ̇|2+
ϕ∗∂nϕ+
∗)ϕ−|∇ϕ|2
Vacuum Energy and Renormalization on the Edge 3
depends only on the values of the fields at the boundary ∂Ω † . g
denotes the metric
induced on the boundary by the bulk flat metric, and ∂n is the normal derivative at the
boundary
ϕ = φ|
∂nϕ = ∂nφ|∂Ω. (3)
The presence of the boundary term Sb allows the generation of local classical equations
of motion without requiring any specific type of boundary conditions [19, 20]. Indeed,
the gradient term
V = 1
|∇φ|2 (4)
can be rewritten as
φ†∆φ+
φ† ∂nφ (5)
where ∆ is the Laplace-Beltrami operator ∆ = − ∂µ∂µ . In the quantum theory the
Laplace-Beltrami operator must have a real spectrum in order to have a selfadjoint
Hamiltonian
H = 1
∆+m2. (6)
for the free field theory (The inclusion of interactions does not changes the picture [21]).
This means that the classical fields must satisfy boundary conditions which make the
operator ∆ selfadjoint. The complete set of boundary conditions which satisfy this
requirement [13] are in one-to-one correspondence with the group of unitary operators
of the boundary Hilbert space L2(∂Ω, C ). For any unitary operator U ∈ L2(∂Ω, C ), the
fields satisfying the boundary condition
ϕ− i ∂nϕ = U (ϕ+ i ∂nϕ) (7)
define a domain where ∆ is a selfadjoint operator.
In the case of open strings, the corresponding conformal 1+1 dimensional scalar
field theories is defined on the space interval Ω = [0, 1] ⊂ IR and there is a large variety
of admissible boundary conditions described by the unitary group M = U(2). The
unitary matrices
define Dirichlet, Neumann and periodic boundary conditions, which in string theory
correspond to a string attached to a D-brane background, free open and closed string
theories, respectively.
For higher N-dimensional target spaces, or N-component strings, the set of
boundary conditions becomes M = U(2N) which includes matrices which interpolate
between one single closed string or N disconnected strings [13]. The topology change
is described in this picture by a simple change of boundary conditions in L2(∂Ω, C N)
[14].
† We will assume that the boundary is regular and smooth. See e.g. [18] for the peculiarities associated
to the presence of irregular boundaries
Vacuum Energy and Renormalization on the Edge 4
If the spectrum of eigenvalues of the unitary operator U does not include the value
±1 (i.e. ±1 /∈ SpU) the boundary condition (7) can be rewritten as
∂nϕ = −i
ϕ (9)
which means that only the boundary values of the fields at the boundary can have an
arbitrary value ϕ whereas its normal derivative is determined by U and ϕ.
The corresponding operator mappings from unitary into selfadjoint operators
A± = −i
are the celebrated Cayley transforms. The inverse Cayley transform
I∓ iA±
I± iA±
recovers the unitary operator U from their selfadjoint Cayley transforms A±.
The condition of ∆ being selfadjoint is necessary but not sufficient to guarantee
the unitarity of the corresponding quantum field theory. Indeed, in the case of free
field theory the Hamiltonian (6) must be selfadjoint. This requires that the spectrum
of ∆ +m2 must be not only real but also positive which restricts the set of admissible
boundary conditions to a subset M of L2(∂Ω, C ).
Because of the existence of the boundary term in (5) the Hamiltonian H (6) is not
selfadjoint if the spectrum of the unitary operator U intersects the following domain of
phase factors
S1m = {e2αi;−π < α ≤ π, 0 < α <
− arctan m2, or
< −α < π − arctan m2 }.
In any other case, −m2 is a lower bound for the spectrum of the operator ∆ and H is
selfadjoint. One possible source of unitarity loss is the existence of edge estates with
large negative eigenvalues of operator ∆.
The consistency of the quantum field theory imposes, thus, a very stringent
condition on the type of acceptable boundary conditions, even in the case of massive
theories in order to prevent this type of pathological behaviour of vacuum energy.
For real scalar fields there is a further condition. U has to satisfy a CP symmetry
preserving condition
U † = U∗, U = UT . (12)
The usual Neumann and Dirichlet boundary conditions U = ±I satisfy this condition.
In general, for
BT A2
the condition requires that
A1 = A
1 , A2 = A
2 , A1B
∗ +BA
2 = 0 (14)
BB† + A1A
1 = I, A2A
TB∗ = I (15)
Vacuum Energy and Renormalization on the Edge 5
In particular, the quasi-periodic condition ϕ(L) = M−1ϕ(0), ∂nϕ(L) = M∂nϕ(0)
is also compatible if M = M t = M∗.
In the case of one single real massless scalar the set of compatible boundary
conditions has two connected components: M0 given by the operators of the form
Uβ = cos β I+ i sin β σy,
and M1 given by
Uα = cosα σz + sinα σx . (16)
M0 includes Neumann (β = 0) and Dirichlet (β = π) conditions; and M1 contains the
quasi-periodic boundary conditions
ϕ(L) = tan
ϕ(0); ∂nϕ(L) =
∂nϕ(0) (17)
which include periodic (α = π
) and antiperiodic (α = −π
) boundary conditions.
3. Boundary Conditions and Renormalization Group
Since boundary conditions appear more naturally in the Schrödinger picture of field
theory and the theory is plagued of ultraviolet singularities some doubts were raised
about their relevance for the quantum field theory. The pioneer work of Symanzik [21]
confirmed the consistence of the standard picture even in presence of bulk renormalizable
interactions (see [22] for an explanation of a recent controversy [23]).
Moreover, there is a renormalization of the very boundary conditions because the
boundary terms are the source of new interactions.
The renormalization group can be defined in the continuum approach by
= Λ[φ][φ(x)− ξΛ(x)] (18)
by means of a fluctuating field ξΛ with short range fluctuations of order
. This implies
that the boundary condition
∂nϕ = Aϕ (19)
is renormalised to
∂nϕΛ = AΛϕΛ, (20)
since
= Λ[φ]+1[∂nφ(xb)− ∂nξΛ(xb)] = AΛ[φ]+1φ(xb) = AΛφΛ
with AΛ = ΛA. For more general boundary conditions the continuum renormalization
group is given by
Λ∂ΛUΛ =
Λ − UΛ
t ∂tUt =
t − Ut
Vacuum Energy and Renormalization on the Edge 6
for Λ = Λ0 e
t. Fixed points correspond, therefore, to self-adjoint boundary conditions
U † = U . In particular, Dirichlet and Neumann (U = ∓ I) are renormalization group
fixed points.
For mixed boundary conditions the RG flows from Dirichlet (UV) toward Neumann
(IR) conditions.
U = e2i arctan e
I. (24)
Critical exponents can be identified with the eigenvalues of the matrix Uc at the
fixed points. Since Uc is also hermitian all critical exponents are either 1 or −1 and
there is no room for cyclic orbits. It is well known, however, that some quantum systems
with singular boundaries and singular interactions [15, 24] exhibit cyclic renormalization
group flows. Moreover, some topological field theories (e.g. Russian doll models) present
a similar behaviour [25]. In scalar field theories, this phenomenon simply does not occur
for regular boundaries. For the same reasons topological transitions do not occur for
finite scale transformations since the flip of eigenvalues from −1 to +1 requires a change
in the parameter t of the flow from −∞ to ∞ as in (24)).
4. Conformal Invariance and boundary conditions
In 1+1 dimensions the theory of massless scalar fields is formally conformal
invariant. However, boundary conditions might break this symmetry [9, 10, 11].
Conformal invariance is only preserved if the boundary conditions are stable under
the boundary renormalization group flow. The fixed points can easily be identified. For
a complex scalar field, besides the above mentioned fixed points, which correspond
to Dirichlet, Neumann and pseudo-periodic boundary conditions and obviously are
conformal invariant, there are fixed points corresponding to quasi-periodic boundary
conditions (17). They also preserve the conformal symmetry.
In 1+1 dimensions this exhausts the whole set of conformal invariant boundary
conditions. Any other boundary condition flows toward one of these fixed points.
The most stable fixed point corresponds to Neumann conditions because all its critical
exponents are +1. The most unstable is that of Dirichlet conditions since all critical
exponents −1. This is compatible with the fact that the neighbourhood of Dirichlet
boundary conditions is plagued of singularities
Periodic, quasi-periodic and pseudo-periodic fixed points present relevant and
irrelevant perturbations with critical exponents ±1, respectively. Negative values label
the possible instabilities. Implications of these results for string theory are well known.
Periodic boundary conditions, appear as attractors of systems with quasi-periodic and
pseudo-periodic conditions which stresses the stability of closed string theory vacuum.
For open strings the (stable) attractor points are standard free strings (Neumann). Any
other boundary condition flow toward one of those fixed points.
Notice that the absence of topological transitions in the boundary renormalization
group flow is a consequence of the fact that all relevant perturbations are always
associated with -1 critical exponents.
Vacuum Energy and Renormalization on the Edge 7
In higher dimensions (D > 1) conformal invariance requires, even in the massless
case m = 0, that Neumann boundary conditions have to be modified in order to preserve
conformal invariance with a term
∂nϕ =
D − 1
K ϕ, (25)
proportional to the extrinsic curvature K of the boundary.
In the case of singular boundaries some more interesting boundary renormalization
group flows arise (see e.g. [18] for a review): fixed points and cyclic orbits of the
boundary renormalization group flow can appear [15, 25, 24] and conformal invariance
can be partially broken to a discrete subgroup Z [24].
5. Vacuum energy and boundary conditions
The infrared properties of quantum field theory are very sensitive to boundary
conditions [26]. In particular, physical properties of the quantum vacuum state like
the vacuum energy may exhibit a very strong dependence on the type of boundary
conditions. This can be explicitly shown in the simple case of a massless field defined
on a finite one-dimensional interval [0, L].
For pseudoperiodic boundary conditions defined by the unitary operator
Uθ = cos θ σx − sin θ σy : ϕ(L) = eiθϕ(0) (26)
the Casimir vacuum energy (see e.g. Ref. [5] and references therein) is given by
+n− 1
 (27)
The vacuum energy dependence on θ is in this case relatively smooth. The only
cuspidal point at θ = 0 corresponds to periodic boundary conditions. A completely
regular behaviour is obtained for Robin boundary conditions
U = e2αiI : ∂nϕ(0) = tanαϕ(0), ∂nϕ(L) = tanαϕ(L) (28)
which smoothly interpolate between Dirichlet (α = π
) and Neumann (α = π) conditions
when α is restricted to the interval α ∈ [π
, π] [27, 28, 29] .
Finally, the Casimir energy for quasi-periodic boundary conditions [30]
is also dependent on the choice of the parameter α. Two particularly interesting
cases are given by α = 0, UZ = σz; ϕ(L) = 0, ∂nϕ(0) = 0 and α = π,
U ′Z = σz; ϕ(0) = 0, ∂nϕ(L) = 0 which correspond to a Zaremba (mixed) boundary
conditions: one boundary is Dirichlet and the other Neumann.
Vacuum Energy and Renormalization on the Edge 8
6. Vacuum entanglement entropy
The dependence of vacuum energy on boundary conditions seems to suggest that
many other observables may suffer the same effect. In particular, one may wonder
whether or not the entropy of the system is dependent on the type of conditions that
constrain the values of the fields at the boundary. The entropy of the field theory at
finite temperature scales with the volume of physical space. Only in quantum gravity
or string theory the entropy can scale with the area of black hole horizon. However, in
field theory it is possible
Figure 1. Information loss by integration over the fluctuations of the fields inside the domain ω
to generate a mixed state from the pure vacuum state Ψ0 by integrating out the
fluctuating modes in a bounded domain ω of the physical space Ω (see Figure 1)
Ψ∗0Ψ0. (30)
The entropy of this state Sω = −Tr ρω log ρω, although ultravioletly divergent, provides
a measure of the degree of entanglement of the vacuum state. In the case of a free
massless real scalar field theory in one-dimensional spaces (D = 1) this entropy scales
logarithmically with the size lω of ω and the ultraviolet cut-off ǫ introduced to split
apart the domain ω and its complement Ω\ω
, (31)
and in D = 2 dimensions it scales linearly with the perimeter Rω of ω
Sω = c2
− γ (32)
and in D > 2 dimensions as the volume of the boundary of ω
Sω = cDVωǫ
1−D. (33)
In particular in three-dimensional spaces it scales with the area of the boundary of ω
like in the presence of a blackhole [16, 17]. Although the coefficients of the leading terms
c2, cD in (32) and (33) have been explicitly computed, they are not universal because
they obviously depend on the choice of the UV cutoff ǫ. On the contrary, the coefficient
Vacuum Energy and Renormalization on the Edge 9
c1 = 1/3 of the logarithmic term in (31) is universal and does coincide with one third
of the central charge of the corresponding conformal field theory. Similarly, the finite γ
term in (32) is also universal in D = 2 dimension and is related to a degree of topological
entanglement [31].
It is remarkable that in D = 1 the coefficient c1 = 1/3 is also independent
of the choice of boundary condition in Ω. This in contrast with what happens for
the finite size corrections to vacuum energy. The coefficient of the 1/L term is also
proportional to the central charge but in that case the corresponding factor is very
sensitive to the type of boundary conditions imposed at the boundary of Ω. The above
results indicate that whereas the Casimir energy is closely related with the infrared
properties of the conformal theory which are sensitive to the boundary conditions, the
entanglement entropy is rather associated to the behaviour at the interface between ω
and its complement Ω\ω which do not depend on the choice of boundary conditions at
the edge of the physical space.
7. Conclusions
The description of regular boundary conditions in terms of unitary matrices provides
a very useful framework for the description of the boundary renormalization group flow
and the breaking of conformal invariance due to boundary effects. Neumann conditions
turn out to be the only boundary conditions which are absolutely stable under RG
flow. All other boundary conditions may have some relevant perturbations which are
the source of RG instabilities. However, the global structure of the flow does not permit
topological transitions.
The finite size corrections to vacuum energy are very sensitive to the choice
of boundary conditions which discriminate between the different fixed points of the
renormalization group flow. On the contrary, the leading contribution to entanglement
entropy of the vacuum is insensitive, for one-dimensional massless scalar field theories,
to the change of boundary conditions. In D=2 dimensions the same property holds
for the finite correction to the entanglement entropy of massless scalar theories. This
fact, is very relevant for the implementation of quantum codes with topological stability
[31]. However, these properties do not hold for the leading terms contributing to the
entanglement entropy.
Acknowledgements
We thank E. Elizalde, J.G. Esteve, S. Odintsov and G. Sierra for interesting
discussions on closely related subjects. This work is partially supported by CICYT
(grant FPA2004-02948) and DGIID-DGA (grant2006-E24/2).
Vacuum Energy and Renormalization on the Edge 10
References
[1] T. Padmanabhan, Phys. Rept. 380 (2003) 235
[2] D. Deutsch, P. Candelas, Phys. Rev. D20 (1979) 3063
[3] H. B. G. Casimir, Proc. K. ned. Akad. Wet. 51(1948) 793
[4] K. A. Milton, The Casimir Effect: Physical Manifestations of the Zero Point Energy, World Sci.,
Singapore (2001)
[5] M. Bordag, U. Mohideen, V. M. Mostepanenko, Phys. Rep. 353 (2002) 1
[6] M. Asorey, D. Garćıa-Alvarez and J.M. Muñoz-Castañeda, J. Phys. A 39 (2006) 6127
[7] I. G. Moss, Class. Quant. Grav. 6 (1989) 759
[8] S. D. Odintsov, Class. Quant. Grav., 7(1990)445
[9] I. Affleck, Nucl. Phys. (Proc. Suppl.) B58 (1997) 35
[10] J. B. Zuber and V. B. Petkova, Arxive preprint [hep-th/0103007]
[11] J. Cardy, In Encyclopedia of Mathematical Physics, Eds. J.-P. Françoise, G. L. Naber and T. S.
Tsun, Academic Press (2006)
[12] J. Polchinski, Phys. Rev. Lett. 75 (1995) 4724
[13] M. Asorey, A. Ibort and G. Marmo, Int. J. Mod. Phys. A 20 (2005) 1001
[14] A. P. Balachandran, G. Bimonte, G. Marmo and A. Simoni, Nucl. Phys. B 446 (1995) 299
[15] S. D. Glazek and K. G. Wilson, Phys. Rev. Lett. 89 (2002) 230401
[16] L. Bombelli, R.K. Koul, J. Lee and R. Sorkin, Phys. Rev. D 34 (1986) 373
[17] M. Srednicki, Phys. Rev. Lett. 71 (1993) 666
[18] I. Tsutsui and T. Fülöp, Arxive preprint [quant-ph/0312028]
[19] M. Bordag, H. Falomir, E. M. Santangelo and D. V. Vassilevich, Phys.Rev. D65 (2002) 064032
[20] A. A. Saharian, Phys. Rev. D69 (2004) 085005
[21] K. Symanzik, Nucl. Phys. B190 (1981) 1
[22] E. Elizalde, J. Phys. A 36 (2003) L567
[23] N.Graham, R.L.Jaffe, V.Khemani, M.Quandt, O. Schroeder and H.Weigel, Nucl.Phys. B677
(2004) 379-404
[24] M. Asorey, J.G. Esteve and G. Sierra, In preparation (2006)
[25] A. LeClair, J. M. Roman and G. Sierra, Phys. Rev. B69 (2004) 20505
[26] M.Asorey, J. Geom. Phys. 11(1993)94
[27] A. Romeo and A. A. Saharian, J. Phys. A35 (2002) 1297
[28] L. C. Albuquerque, R. M. Cavalcanti, J.Phys. A37 (2004) 7039
[29] B. Mintz, C. Farina, P. A. Maia Neto and R. B. Rodrigues, J. Phys. A39 (2006) 11325
[30] M. Asorey, unpublished (1994)
[31] A. Kitaev and J. Preskill, Phys. Rev. Lett. 96 (2006) 110404
ABSTRACT
  The vacuum dependence on boundary conditions in quantum field theories is
analysed from a very general viewpoint. From this perspective the
renormalization prescriptions not only imply the renormalization of the
couplings of the theory in the bulk but also the appearance of a flow in the
space of boundary conditions. For regular boundaries this flow has a large
variety of fixed points and no cyclic orbit. The family of fixed points
includes Neumann and Dirichlet boundary conditions. In one-dimensional field
theories pseudoperiodic and quasiperiodic boundary conditions are also RG fixed
points. Under these conditions massless bosonic free field theories are
conformally invariant. Among all fixed points only Neumann boundary conditions
are infrared stable fixed points. All other conformal invariant boundary
conditions become unstable under some relevant perturbations. In finite volumes
we analyse the dependence of the vacuum energy along the trajectories of the
renormalization group flow providing an interesting framework for dark energy
evolution. On the contrary, the renormalization group flow on the boundary does
not affect the leading behaviour of the entanglement entropy of the vacuum in
one-dimensional conformally invariant bosonic theories.

<|endoftext|><|startoftext|>
Introduction
The use of meta-stable vacua in supersymmetric model building has attracted much
attention lately, especially after the discovery [1] that generic supersymmetric field theories
in four dimensions such as the supersymmetric QCD with massive flavors have meta-
stable vacua with broken supersymmetry. In [2], realistic models of direct mediation were
constructed using superpotentials without U(1)R symmetry. Though explicit breaking of
the U(1)R symmetry generates meta-stable vacua, there is a range of parameters where one
can make them sufficiently long lived, while satisfying the phenomenological constraints
on the masses of the gauginos, the gravitino, and the scalars without artificially elaborate
constructions. The models can also avoid producing Landau poles in standard model gauge
interactions below the unification scale. Recently beautiful realizations of these models in
string theory, including a natural mechanism to generate small parameters of these models,
were found in [3].
Gauge mediation models were also constructed using meta-stable vacua with similar
phenomenological benefits [4,5]. Related ideas have been explored in [6-10]. Accepting
the possibility that our universe may be in a meta-stable state allows us to circumvent
the theoretical constraints due to the Nelson-Seiberg theorem on R-symmetry [11] and the
Witten index [12] and gives us greater flexibility in model building, as emphasized in [13]
among other recent papers.
Among the models constructed recently based on meta-stable vacua, the ones dis-
cussed in [4] are particularly simple. In this paper, we will show that they have ultra-violet
completions in supersymmetric quiver gauge theories which can be realized in string com-
pactifications. Moreover, our construction can be naturally generalized to a large class of
quiver gauge theories, providing a basis for the speculation in [4] that “gauge mediation
may be a rather generic phenomenon in the landscape of possible supersymmetric theo-
ries.” In this paper, we will demonstrate the idea by explicitly working out one example:
a model based on type IIB superstrings compactified on the A4-fibered geometry [14]. We
will also give an outline of generalizations of this construction to a large class of quiver
gauge theories. Detailed analysis of meta-stable vacua in these models will be given in a
separate paper [15].
2. The Model
The model we will consider in this paper is realized in string theory compactified on
the local Calabi-Yau manifold described by the equation,
x2 + y2 +
(z + ti(w)) = 0,
ti(w) = 0,
ti(w)− ti+1(w) = µi(w − xi), (x, y, z, w) ∈C
(2.1)
Since ti’s are functions of w, this gives the A4 singularity fibered over w ∈C. In particular,
there exist four two-cycles S2 on which D branes can be wrapped. The low energy limit
of D5 branes wrapping the two-cycles S2 and extending along the four uncompactified
dimensions is the A4 quiver gauge theory with the gauge group U(N1)×U(N2)×U(N3)×
U(N4) with the adjoint chiral multiplets Xi=1,2,3,4 for the four gauge group factors and
the bi-fundamental chiral multiplets (Q12, Q21), (Q23, Q32), and (Q34, Q43). This quiver
gauge theory can also be realized on intersecting brane configuration with NS5 and D4
branes, as expected from the T-duality between the An singularity and NS5 branes [16].
N 1 N 2 N 3 N 4
Q 12,21 Q 23,32 Q 34,43
X1 X2 X3 X 4
Fig. 1: A4 quiver diagram
From the Calabi-Yau singularity (2.1), one can read off the superpotential of the quiver
theory as [17,18]
WA4 =
tr (Qi+1,iXiQi,i+1 −Qi,i+1Xi+1Qi+1,i) +
(Xi − xi)
2. (2.2)
Note that the dimensionful parameters µi and xi are the moduli of the Calabi-Yau manifold
given by (2.1), namely they are closed string moduli. The dynamical scales Λi=1,···,4 of
the four gauge group factors are also closed string moduli, related to the sizes of the S2’s.
These closed string moduli are frozen and can be regarded as parameters of the low energy
theory. Let us suppose that µi are sufficiently larger than Λi so that we can integrate out
all the adjoints Xi to obtain the effective superpotential
Weff =
mitrQi i+1Qi+1 i −
tr (Qi i+1Qi+1 i)
trQ21Q12Q23Q32 +
trQ32Q23Q34Q43,
(2.3)
where
mi = ci − ci+1, µ̃i =
2µiµi+1
µi + µi+1
(i = 1, 2, 3).
This quiver gauge theory can be used as a gauge mediation model as follows. We
identify the bi-fundamentals (Q34, Q43) as messenger fields. One way to incorporate the
standard model sector would be to identify a subgroup of the U(N4) gauge group with the
standard model gauge group or a GUT gauge group. Alternatively, we can replace the 4th
node of the quiver diagram of Fig. 1 carrying the U(N4) gauge group with a string theory
construction of the standard model. For example, if the standard model is realized on
intersecting branes, messengers can be open strings connecting the 3rd node carrying the
U(N3) gauge group to the standard model branes.
1 In the following, we will denote the
bi-fundamental fields (Q34, Q43) as (f, f̃) to distinguish them from the rest of the quiver
gauge theory and to emphasize their role as the messengers.
The rest of the quiver gauge theory is treated as a hidden sector, where supersymmetry
is broken dynamically. To use the result of [1], let us assume that the ranks of the gauge
group factors satisfy
N2 + 1 ≤ N1 +N3 <
N2 (2.4)
and that
Λ1,Λ3,Λ4 ≪ Λ2 ≪ µi.
In this case, one can identify the gauge group SU(Nc) of the model of [1] with SU(N2) ⊂
U(N2) of the quiver theory. Since the metastable vacuum can be found near the origin of
the meson fields M11 ∼ Q12Q21, M33 ∼ Q32Q23, the terms tr (Q12Q21)
, tr (Q32Q23)
tr (Q12Q21Q32Q23) in the superpotential (2.3) are irrelevant in our discussion below, if the
masses µi of the adjoints satisfy the following bounds [4,5],
µ̃1,2
≤ min
m1,2Λ2,
. (2.5)
In this range of the parameters, the hidden sector and its interaction with the mes-
senger sector is described by the superpotential,
W = mtrQ12Q21 +mtrQ32Q23 +
trQ32Q23f f̃ +m3tr f f̃ −
. (2.6)
1 In this case, we can still use the effective potential (2.3) to describe the interaction of the
messengers and the hidden sector, but we should set 1/µ̃4 = 1/(2µ3) since we do not have the
adjoint field X4.
Here, we set the mass parameters m1 = m2 = m, for simplicity. Consider the case when
N1 = N2 = 3 and N3 = 1 so that the Landau pole problem can easily be avoided. The
resulting model is a variant of the models proposed in [4]. The model [4] has the global
symmetry U(4) × U(1)mess, where U(4) is the flavor symmetry of the ISS model and
U(1)mess acts on the messengers (f, f̃). The meta-stable vacuum spontaneously breaks the
U(4) symmetry, giving rise to Nambu-Goldstone bosons, when m1 ≃ m2. In our model,
the would-be Nambu-Goldstone bosons are eaten by the gauge symmetry. This difference
is not important in the low energy analysis of supersymmetry breaking effects.
Let us discuss phenomenological constraints on the parameters in (2.6). We will focus
on the following part of the superpotential (2.6),
Wmess =
M33f f̃ +m3 f f̃ , (2.7)
where M33 = Q32Q23/Λ2 is neutral under the U(N3) = U(1) gauge group. We have
dropped the irrelevant quartic term (f f̃)2 because the messengers (f, f̃) are weakly inter-
acting at energies above the electroweak scale, if the mass parameter µ̃3 is large enough.
The F -component of the meson superfield M33 develops the vacuum expectation value
and breaks supersymmetry [1]. The supersymmetric mass and the soft supersymmetry
breaking mass of the messenger fields (f, f̃) are then given by
Wmess ≃
m3 + θ
f f̃ . (2.8)
Following the analysis in [4,5], we find that all the phenomenological requirements for the
messenger sector can be satisfied, for example, in the following range of parameters,
Λ2 ≃ 10
11GeV, m ≃ 108GeV, m3 ≃ 10
7GeV,
µ1 ≥ µ2 ≥ 10
13GeV, µ3 ≃ 10
18GeV.
(2.9)
3. Generalization
We found that both the messenger sector and the hidden sector of the models proposed
in [4] can be realized in the A4 quiver gauge theory. This construction naturally suggests
the following generalization. Consider a quiver diagram which can be separated into two
disjoint diagrams Γ1 and Γ2 by cutting at one node, which we denote by a. If the scale
Λa associated to the gauge group on the a-node is sufficiently low, and if superpotential
interactions between them are small, we have effectively two separate quiver gauge theories
for phenomena much above the scale Λa, one associated to Γ1 and another associated to
Γ2, which are weakly interacting with each other through the a-node. If supersymmetry is
broken in the sector Γ1, it can be communicated to the sector Γ2 by the gauge mediation
mechanism. The beauty of the quiver gauge theory construction is that, because of the
presence of bi-fundamental and adjoint fields on links and nodes, an effective superpotential
of the form (2.8) is naturally generated when supersymmetry is broken in a part of the
diagram connected to the a-node.
It follows trivially that any quiver theory that is vector like with adjustable mass terms
has meta-stable supersymmetry breaking vacua in some range of its parameter. All one has
to do is to identify a part of the diagram where supersymmetry can be broken using a known
mechanism, for example as in [1] or its variant [19], and to have its effect communicated to
the rest of the diagram by messengers. One can also consider the scenario where the quiver
theory associated to a sub-diagram Γ2 has a supersymmetric vacuum with dynamically
generated small scales, which can be used to set parameters of the theory associated to
another sub-diagram Γ1, where supersymmetry is broken. The supersymmetry breaking
effect can then be communicated back to the sub-diagram Γ2. This would give a string
theory realization of the idea of [20]. These and other mechanisms of supersymmetry
breaking will be explored further in [15].
These supersymmetry breaking quiver gauge theories can be coupled to the messenger
sector. In fact, as in the case of the A4 model discussed in the previous section, the mes-
senger sector itself can be included in quiver theories. If the messenger sector is attached
at the end of the quiver diagram, the effective low energy superpotential always takes the
form (2.3). Thus, one can see that the models in [4] and their generalizations are robust
and naturally appear in this large class of string compactifications.
Acknowledgments
We thank D. Berenstein, M. Dine, R. Kitano, J. Marsano, C. S. Park, N. Seiberg,
M. Shigemori, and T. Watari for discussions. H.O. thanks the hospitality of the high
energy theory group at the University of Tokyo at Hongo.
H.O. and Y.O. are supported in part by the DOE grant DE-FG03-92-ER40701. The
research of H.O. is also supported in part by the NSF grant OISE-0403366 and by the 21st
Century COE Program at the University of Tokyo. Y.O. is also supported in part by the
JSPS Fellowship for Research Abroad. The research of T.K. was supported in part by the
Grants-in-Aid (#16740133) and (#16081206) from the Ministry of Education, Culture,
Sports, Science, and Technology of Japan.
References
[1] K. Intriligator, N. Seiberg and D. Shih, “Dynamical SUSY breaking in meta-stable
vacua,” JHEP 0604, 021 (2006), arXiv:hep-th/0602239.
[2] R. Kitano, H. Ooguri and Y. Ookouchi, “Direct mediation of meta-stable supersym-
metry breaking,” Phys. Rev. D 75, 045022 (2007), arXiv:hep-ph/0612139.
[3] R. Argurio, M. Bertolini, S. Franco and S. Kachru, “Metastable vacua and D-branes
at the conifold,” arXiv:hep-th/0703236.
[4] H. Murayama and Y. Nomura, “Gauge mediation simplified,” arXiv:hep-ph/0612186.
[5] H. Murayama and Y. Nomura, “Simple scheme for gauge mediation,” arXiv:hep-
ph/0701231.
[6] A. Amariti, L. Girardello and A. Mariotti, “On meta-stable SQCD with adjoint matter
and gauge mediation,” arXiv:hep-th/0701121.
[7] M. Dine and J. Mason, “Gauge mediation in metastable vacua,” arXiv:hep-ph/0611312.
[8] C. Csaki, Y. Shirman and J. Terning, “A simple model of low-scale direct gauge
mediation,” arXiv:hep-ph/0612241.
[9] O. Aharony and N. Seiberg, “Naturalized and simplified gauge mediation,” JHEP
0702, 054 (2007), arXiv:hep-ph/0612308.
[10] D. Shih, “Spontaneous R-symmetry breaking in O’Raifeartaigh models,” arXiv:hep-
th/0703196.
[11] A. E. Nelson and N. Seiberg, “R symmetry breaking versus supersymmetry breaking,”
Nucl. Phys. B 416, 46 (1994), arXiv:hep-ph/9309299.
[12] E. Witten, “Dynamical Breaking Of Supersymmetry,” Nucl. Phys. B 188, 513 (1981).
[13] K. Intriligator, N. Seiberg and D. Shih, “Supersymmetry breaking, R-symmetry break-
ing and metastable vacua,” arXiv:hep-th/0703281.
[14] F. Cachazo, S. Katz and C. Vafa, “Geometric transitions and N = 1 quiver theories,”
arXiv:hep-th/0108120.
[15] T. Kawano, H. Ooguri, Y. Ookouchi and C.S. Park, in preparation
[16] H. Ooguri and C. Vafa, “Two-dimensional black hole and singularities of CY mani-
folds,” Nucl. Phys. B 463, 55 (1996), arXiv:hep-th/9511164.
[17] C. Vafa, “Superstrings and topological strings at large N ,” J. Math. Phys. 42, 2798
(2001), arXiv:hep-th/0008142.
[18] F. Cachazo, B. Fiol, K. A. Intriligator, S. Katz and C. Vafa, “A geometric unification
of dualities,” Nucl. Phys. B 628, 3 (2002), arXiv:hep-th/0110028.
[19] H. Ooguri and Y. Ookouchi, ‘Landscape of supersymmetry breaking vacua in geomet-
rically realized gauge theories,” Nucl. Phys. B 755, 239 (2006), arXiv:hep-th/0606061.
[20] M. Dine, J. L. Feng and E. Silverstein, “Retrofitting O’Raifeartaigh models with
dynamical scales,” Phys. Rev. D 74, 095012 (2006), arXiv:hep-th/0608159.
http://arxiv.org/abs/hep-th/0602239
http://arxiv.org/abs/hep-ph/0612139
http://arxiv.org/abs/hep-th/0703236
http://arxiv.org/abs/hep-ph/0612186
http://arxiv.org/abs/hep-ph/0701231
http://arxiv.org/abs/hep-ph/0701231
http://arxiv.org/abs/hep-th/0701121
http://arxiv.org/abs/hep-ph/0611312
http://arxiv.org/abs/hep-ph/0612241
http://arxiv.org/abs/hep-ph/0612308
http://arxiv.org/abs/hep-th/0703196
http://arxiv.org/abs/hep-th/0703196
http://arxiv.org/abs/hep-ph/9309299
http://arxiv.org/abs/hep-th/0703281
http://arxiv.org/abs/hep-th/0108120
http://arxiv.org/abs/hep-th/9511164
http://arxiv.org/abs/hep-th/0008142
http://arxiv.org/abs/hep-th/0110028
http://arxiv.org/abs/hep-th/0606061
http://arxiv.org/abs/hep-th/0608159
ABSTRACT
  We show that a large class of phenomenologically viable models for gauge
mediation of supersymmetry breaking based on meta-stable vacua can be realized
in local Calabi-Yau compactifications of string theory.

<|endoftext|><|startoftext|>
EPR, Bell, Schrodinger’s cat, and the Monty Hall Paradox
Doron Cohen
Department of Physics, Ben-Gurion University, Beer-Sheva 84105, Israel
The purpose of this manuscript is to provide a short pedagogical explanation why “quantum
collapse” is not a metaphysical event, by pointing out the analogy with a “classical collapse” which
is associated with the Monty Hall Paradox.
This manuscript constitutes a short self-contained version of some selected sections taken from “lecture notes in
Quantum Mechanics” [quant-ph/0605180](∼250p). In particular section 47.4 regarding the notion of collapse has
attracted some attention. In this section I suggest to use the “Monty Hall Paradox” as a pedagogical introduction to
the discussion of “quantum collapse”. From my experience it is the most effective way to convince students and other
non-experts that “quantum collapse” is not a metaphysical event.
[8] Quantum States
====== [8.1] Is the world classical? (EPR, Bell)
We would like to examine whether the world we live in is “classical” or not. The notion of classical world includes
mainly two ingredients: (i) realism (ii) determinism. By realism we means that any quantity that can be measured
is well defined even if we do not measure it in practice. By determinism we mean that the result of a measurement
is determined in a definite way by the state of the system and by the measurement setup. We shall see later that
quantum mechanics is not classical in both respects: In the case of spin 1/2 we cannot associate a definite value of
σ̂y for a spin which has been polarized in the σ̂x direction. Moreover, if we measure the σ̂y of a σ̂x polarized spin, we
get with equal probability ±1 as the result.
In this section we would like to assume that our world is ”classical”. Also we would like to assume that interactions
cannot travel faster than light. In some textbooks the latter is called ”locality of the interactions” or ”causality”. It
has been found by Bell that the two assumptions lead to an inequality that can be tested experimentally. It turns
out from actual experiments that Bell’s inequality are violated. This means that our world is either non-classical or
else we have to assume that interactions can travel faster than light.
If the world is classical it follows that for any set of initial conditions a given measurement would yield a definite
result. Whether or not we know how to predict or calculate the outcome of a possible measurement is not assumed.
To be specific let us consider a particle of zero spin, which disintegrates into two particles going in opposite directions,
each with spin 1/2. Let us assume that each spin is described by a set of state variables.
state of particle A = xA1 , x
2 , ... (1)
state of particle B = xB1 , x
2 , ...
The number of state variables might be very big, but it is assumed to be a finite set. Possibly we are not aware or
not able to measure some of these “hidden” variables.
Since we possibly do not have total control over the disintegration, the emerging state of the two particles is described
by a joint probability function ρ
xA1 , ..., x
1 , ...
. We assume that the particles do not affect each other after the
disintegration (“causality” assumption). We measure the spin of each of the particles using a Stern-Gerlach apparatus.
The measurement can yield either 1 or −1. For the first particle the measurement outcome will be denoted as a,
and for the second particle it will be denoted as b. It is assumed that the outcomes a and b are determined in a
deterministic fashion. Namely, given the state variables of the particle and the orientation θ of the apparatus we have
a = a(θA) = f(θA, x
1 , x
2 , ...) = ±1 (2)
b = b(θB) = f(θB, x
1 , x
2 , ...) = ±1
http://arxiv.org/abs/0704.1087v1
where the function f() is possibly very complicated. If we put the Stern-Gerlach machine in a different orientation
then we will get different results:
a′ = a(θ′A) = f
θ′A, x
1 , x
2 , ...
= ±1 (3)
b′ = b(θ′B) = f
θ′B , x
1 , x
2 , ...
We have following innocent identity:
ab+ ab′ + a′b− a′b′ = ±2 (4)
The proof is as follows: if b = b′ the sum is ±2a, while if b = −b′ the sum is ±2a′. Though this identity looks innocent,
it is completely non trivial. It assumes both ”reality” and ”causality” This becomes more manifest if we write this
identity as
a(θA)b(θB) + a(θA)b(θ
B) + a(θ
A)b(θB)− a(θ′A)b(θ′B) = ±2 (5)
The realism is reflected by the assumption that both a(θA) and a(θ
A) have definite values, though it is clear that in
practice we can measure either a(θA) or a(θ
A), but not both. The causality is reflected by assuming that a depends
on θA but not on the distant setup parameter θB.
Let us assume that we have conducted this experiment many times. Since we have a joint probability distribution ρ,
we can calculate average values, for instance:
〈ab〉 =
xA1 , ..., x
1 , ...
θA, x
1 , ...
θB , x
1 , ...
Thus we get that the following inequality should hold:
|〈ab〉+ 〈ab′〉+ 〈a′b〉 − 〈a′b′〉| ≤ 2 (7)
This is called Bell’s inequality. Let us see whether it is consistent with quantum mechanics. We assume that all the
pairs are generated in a singlet (zero angular momentum) state. It is not difficult to calculate the expectation values.
The result is
〈ab〉 = − cos(θA − θB) ≡ C(θA − θB) (8)
we have for example
C(0o) = −1, C(45o) = − 1√
, C(90o) = 0, C(180o) = +1. (9)
If the world were classical the Bell’s inequality would imply
|C(θA − θB) + C(θA − θ′B) + C(θ′A − θB) + C(θ′A − θ′B)| ≤ 2 (10)
Let us take θA = 0
o and θB = 45
o and θ′A = 90
o and θ′B = −45o. Assuming that quantum mechanics holds we get
2 > 2 (11)
It turns out, on the basis of celebrated experiments that Nature has chosen to violate Bell’s inequality. Furthermore
it seems that the results of the experiments are consistent with the predictions of quantum mechanics. Assuming that
we do not want to admit that interactions can travel faster than light it follows that our world is not classical.
====== [8.2] The four Postulates of Quantum Mechanics
The 18th century version classical mechanics can be derived from three postulates: The three laws of Newton. The
better formulated 19th century version of classical mechanics can be derived from three postulates: (1) The state
of classical particles is determined by the specification of their positions and its velocities; (2) The trajectories are
determined by a minimum action principle. (3) The form of the Lagrangian of the theory is determined by symmetry
considerations, namely Galilei invariance in the non-relativistic case. See the Classical Mechanics book of Landau
and Lifshitz for details.
Quantum mechanically requires four postulates: Two postulates define the notion of quantum state, while the other
two postulates, in analogy with classical mechanics, are about the laws that govern the evolution of quantum me-
chanical systems. [The rest of this section can be found in the lecture notes].
====== [8.3] What is a Pure State
”Pure states” are states that have been filtered. The filtering is called ”preparation”. For example: we take a beam
of electrons. Without ”filtering” the beam is not polarized. If we measure the spin we will find (in any orientation
of the measurement apparatus) that the polarization is zero. On the other hand, if we ”filter” the beam (e.g. in the
left direction) then there is a direction for which we will get a definite result (in the above example, in the right/left
direction). In that case we say that there is full polarization - a pure state. The ”uncertainty principle” tells us that
if in a specific measurement we get a definite result (in the above example, in the right/left direction), then there
are different measurements (in the above example, in the up/down direction) for which the result is uncertain. The
uncertainty principle is implied by the first postulate.
====== [8.4] What is a Measurement
In contrast with classical mechanics, in quantum mechanics measurement only has meaning in a statistical sense.
We measure ”states” in the following way: we prepare a collection of systems that were all prepared in the same
way. We make the measurement on all the ”copies”. The outcome of the measurement is an event x̂ = x that can be
characterized by a distribution function. The single event has no statistical meaning. For example, if we measured
the spin of a single electron and get σ̂z = 1, it does not mean that the state is polarized ”up”. In order to know if
the electron is polarized we must measure a large number of electrons that were prepared in an identical way. If only
50% of the events give σ̂z = 1 we should conclude that there is no definite polarization in the direction we measured!
====== [8.5] Random Variables
A random variable is an object that can have any numerical value. In other words x̂ = x is an event. Let’s assume,
for example, that we have a particle that can be in one of five sites: x = 1, 2, 3, 4, 5. An experimentalist could measure
Prob(x̂ = 3) or Prob(p̂ = 3(2π/5)). Another example is a measurement of the probability Prob(σ̂z = 1) that the
particle will have spin up.
The collection of values of x is called the spectrum of values of the random variable. We make the distinction between
random variables with a discrete spectrum, and random variables with a continuous spectrum. [The rest of this
section can be found in the lecture notes].
====== [8.6] Quantum Versus Statistical Mechanics
Quantum mechanics stands opposite classical statistical mechanics. A particle is described in classical statistical
mechanics by a probability function (for presentation purpose we treat x̂ and p̂ as having discrete spectrum):
ρ(x, p) = Prob{x̂ = x, p̂ = p} (12)
The expectation value of a random variable Â = A(x̂, p̂) is calculated using the definition:
〈Â〉 =
ρ(x, p)A(x, p) ≡ trace(ρA) (13)
In particular we can write: ρ(x, p) = 〈 δ(p̂− p) δ(x̂ − x) 〉 (14)
From the definition of the expectation value follows the linear relation 〈αÂ+ βB̂〉 = α〈Â〉+ β〈B̂〉 for any pair of
observables. This linear relation is a trivial result of classical probability theory. It assumes that the joint probability
function Eq.(12) can be defined. But in quantum mechanics we cannot define a “quantum state” using a joint
probability function, as implied by the observation that our world is not “classical”. For this reason, we have to use a
more sophisticated approach. Loosely speaking one may say that Quantum Mechanics takes Eq.(14) as the definition
of ρ, and use the linear relation of the expectation values as a second postulate in order to deduce Eq.(13).
[The rest of this section an be found in the lecture notes].
[47] Theory of Quantum Measurements
====== [47.4] Measurements, the notion of collapse
In elementary textbooks the quantum measurement process is described as inducing “collapse” of the wavefunction.
Assume that the system is prepared in state ρinitial = |ψ〉〈ψ| and that one measures P̂ = |ϕ〉〈ϕ|. If the result of the
measurement is P̂ = 1 then it is said that the system has collapsed into the state ρfinal = |ϕ〉〈ϕ|. The probability for
this “collapse” is given by the projection formula Prob(ϕ|ψ) = |〈ϕ|ψ〉|2.
If one regard ρ(x, x′) or ψ(x) as representing physical reality, rather than a probability matrix or a probability
amplitude, then one immediately gets into puzzles. Recalling the EPR experiment this world imply that once the
state of one spin is measured at Earth, then immediately the state of the other spin (at the Moon) would change from
unpolarized to polarized. This would suggest that some spooky type of “interaction” over distance has occurred.
In fact we shall see that the quantum theory of measurement does not involve any assumption of spooky “collapse”
mechanism. Once we recall that the notion of quantum state has a statistical interpretation the mystery fades away.
In fact we explain (see below) that there is “collapse” also in classical physics! To avoid potential miss-understanding
it should be clear that I do not claim that the classical “collapse” which is described below is an explanation of the
the quantum collapse. The explanation of quantum collapse using a quantum measurement (probabilistic) point of
view will be presented in a later section. The only claim of this section is that in probability theory a correlation is
frequently mistaken to be a causal relation: “smokers are less likely to have Alzheimer” not because cigarettes help
to their health, but simply because their life span is smaller. Similarly quantum collapse is frequently mistaken to be
a spooky interaction between well separated systems.
Consider the thought experiment which is known as the “Monty Hall Paradox”. There is a car behind one of three
doors. The car is like a classical ”particle”, and each door is like a ”site”. The initial classical state is such that the car
has equal probability to be behind any of the three doors. You are asked to make a guess. Let us say that you peak
door #1. Now the organizer opens door #2 and you see that there is no car behind it. This is like a measurement.
Now the organizer allows you to change your mind. The naive reasoning is that now the car has equal probability to
be behind either of the two remaining doors. So you may claim that it does not matter. But it turns out that this
simple answer is very very wrong! The car is no longer in a state of equal probabilities: Now the probability to find it
behind door #3 has increased. A standard calculation reveals that the probability to find it behind door #3 is twice
large compared with the probability to find it behind door #2. So we have here an example for a classical collapse.
If the reader is not familiar with this well known ”paradox”, the following may help to understand why we have this
collapse (I thank my colleague Eitan Bachmat for providing this explanation). Imagine that there are billion doors.
You peak door #1. The organizer opens all the other doors except door #234123. So now you know that the car is
either behind door #1 or behind door #234123. You want the car. What are you going to do? It is quite obvious that
the car is almost definitely behind door #234123. It is also clear the that the collapse of the car into site #234123
does not imply any physical change in the position of the car.
====== [47.5] Quantum measurements, Schroedinger’s cat
What do we mean by quantum measurement? In order to clarify this notion let us consider a system and a detector
which are prepared independently as
ψa|a〉
⊗ |q = 0〉 (15)
As a result of an interaction we assume that the detector correlates with the system as follows:
ÛmeasurementΨ =
ψa|a〉 ⊗ |q = a〉 (16)
We call such type of unitary evolution ”ideal measurement”. If the system is in a definite a state, then it is not affected
by the detector. Rather, we gain information on the state of the system. One can think of q as representing a memory
device in which the information is stored. This memory device can be of course the brain of a human observer. Form
the point of view of the observer, the result at the end of the measurement process is to have a definite a. This is
interpreted as a ”collapse” of the state of the system. Some people wrongly think that ”collapse” is something that
goes beyond unitary evolution. But in fact this term just makes over dramatization of the above unitary process.
The concept of measurement in quantum mechanics involves psychological difficulties which are best illustrated by
considering the ”Schroedinger’s cat” experiment. This thought experiment involves a radioactive nucleus, a cat, and
a human being. The half life time of the nucleus is an hour. If the radioactive nucleus decays it triggers a poison
which kills the cat. The radioactive nucleus and the cat are inside an isolated box. At some stage the human observer
may open the box to see what happens with the cat... Let us translate the story into a mathematical language. A
time t = 0 the state of the universe (nucleus⊗cat⊗observer) is
Ψ = | ↑= radioactive〉 ⊗ |q = 1 = alive〉 ⊗ |Q = 0 = ignorant〉 (17)
where q is the state of the cat, and Q is the state of the memory bit inside the human observer. If we wait a very
long time the nucleus would definitely decay, and as a result we will have a definitely dead cat:
UwaitingΨ = | ↓= decayed〉 ⊗ |q = −1 = dead〉 ⊗ |Q = 0 = ignorant〉 (18)
If the observer opens the box he/she would see a dead cat:
UseeingUwaitingΨ = | ↑= decayed〉 ⊗ |q = −1 = dead〉 ⊗ |Q = −1 = shocked〉 (19)
But if we wait only one hour then
UwaitingΨ =
| ↑〉 ⊗ |q = +1〉+ | ↓〉 ⊗ |q = −1〉
⊗ |Q = 0 = ignorant〉 (20)
which means that from the point of view of the observer the system (nucleus+cat) is in a superposition. The cat at
this stage is neither definitely alive nor definitely dead. But now the observer open the box and we have:
UseeingUwaitingΨ =
| ↑〉 ⊗ |q = +1〉 ⊗ |Q = +1 = happy〉 + | ↓〉 ⊗ |q = −1〉 ⊗ |Q = −1 = shocked〉
We see that now, form the point of view of the observer, the cat is in a definite(!) state. This is regarded by the
observer as “collapse” of the superposition. We have of course two possibilities: one possibility is that the observer sees
a definitely dead cat, while the other possibility is that the observer sees a definitely alive cat. The two possibilities
”exist” in parallel, which leads to the ”many worlds” interpretation. Equivalently one may say that only one of the
two possible scenarios is realized from the point of view of the observer, which leads to the ”relative state” concept
of Everett. Whatever terminology we use, ”collapse” or ”many worlds” or ”relative state”, the bottom line is that we
have here merely a unitary evolution.
====== [47.6] Measurements, formal treatment
In this section we describe mathematically how an ideal measurement affects the state of the system. First of all let
us write how the U of a measurement process looks like. The formal expression is
Ûmeasurement =
P̂ (a) ⊗ D̂(a) (22)
where P̂ (a) = |a〉〈a| is the projection operator on the state |a〉, and D̂(a) is a translation operator. Assuming that the
measurement device is prepared in a state of ignorance |q = 0〉, the effect of D̂(a) is to get |q = a〉. Hence
ÛΨ =
P̂ (a) ⊗ D̂(a)
ψa′ |a′〉 ⊗ |q = 0〉
ψa|a〉 ⊗ D̂(a)|q = 0〉 =
ψa|a〉 ⊗ |q = a〉 (23)
A more appropriate way to describe the state of the system is using the probability matrix. Let us describe the above
measurement process using this language. After ”reset” the state of the measurement apparatus is σ(0) = |q=0〉〈q=0|.
The system is initially in an arbitrary state ρ. The measurement process correlates that state of the measurement
apparatus with the state of the system as follows:
Ûρ⊗ σ(0)Û † =
P̂ (a)ρP̂ (b) ⊗ [D̂(a)]σ(0)[D̂(b)]† =
P̂ (a)ρP̂ (b) ⊗ |q=a〉〈q=b| (24)
Tracing out the measurement apparatus we get
ρsystem =
P̂ (a)ρpreparationP̂ (a) =
(a) (25)
Where pa is the trace of the projected probability matrix P̂
(a)ρP̂ (a), while ρ(a) is its normalized version. We see that
the effect of the measurement is to turn the superposition into a mixture of a states, unlike unitary evolution for
which ρsystem = Usystem ρ
preparation U †
system
. So indeed a measurement process looks like a non-unitary process: it turns a
pure superposition into a mixture. A simple example is in order. Let us assume that the system is a spin 1/2 particle.
The spin is prepared in a pure polarization state ρ =| ψ〉〈ψ | which is represented by the matrix
ρab = ψaψ
| ψ1 |2 ψ1ψ∗2
1 | ψ2 |2
where 1 and 2 are (say) the ”up” and ”down” states. Using a Stern-Gerlech apparatus we can measure the polarization
of the spin in the up/down direction. This means that the measurement apparatus projects the state of the spin using
P (1) =
and P (2) =
leading after the measurement to the state
ρsystem = P (1)ρpreparationP (1) + P (2)ρpreparationP (2) =
| ψ1 |2 0
0 | ψ2 |2
Thus the measurement process has eliminated the off-diagonal terms in ρ and hence turned a pure state into a mixture.
It is important to remember that this non-unitary non-coherent evolution arise because we look only on the state of
the system. On a universal scale the evolution is in fact unitary.
	8  Quantum States
	47  Theory of Quantum Measurements
ABSTRACT
  The purpose of this manuscript is to provide a short pedagogical explanation
why "quantum collapse" is not a metaphysical event, by pointing out the analogy
with a "classical collapse" which is associated with the Monty Hall Paradox.

<|endoftext|><|startoftext|>
Introduction
Statistical systems are characterized by statistical ensembles. It is crucially important that
the given statistical system be correctly represented by a statistical ensemble. In other words,
the chosen statistical ensemble must be representative for the considered statistical system.
This necessitates a thorough definition of what, actually, a statistical system is, requiring an
accurate enumeration of all its basic features. The usage of a nonrepresentative ensemble,
incorrectly representing the considered statistical system, may lead, and often does lead, to
inconsistencies in the theoretical description of the system.
The necessity of defining a statistical ensemble that would correctly represent the given
statistical system was, first, emphasized already by Gibbs [1], who stressed that all additional
conditions and constraints, imposed on the system, must be taken into account. The prob-
lem of a proper representation of equilibrium statistical systems by equilibrium statistical
ensembles was discussed by ter Haar [2,3] and also analized in the review article [4].
The aim of the present paper is to formalize the notion of representative statistical en-
sembles by giving precise mathematical definitions and to generalize this notion for arbitrary
systems, whether equilibrium or nonequilibrium. The application of the notion is illustrated
by systems with Bose-Einstein condensate, when the global gauge symmetry is broken. It is
shown that employing a representative ensemble for a Bose-condensed system results in the
theory enjoying conservation laws and having no gap in the spectrum of collective excitations.
Throughout the paper, the system of units will be used with the Planck and Boltzmann
constants set to unity, h̄ = 1, kB = 1.
2 Representative Ensembles
Let us, first, recall several general preliminary definitions that are necessary for precisely
defining the basic notion of a representative statistical ensemble.
Physical system is a collection of objects characterized by their typical features distin-
guishing this collection from other systems.
For example, a collection of particles can be characterized by their Hamiltonian, that is,
by their energy operator.
Statistical system is a many-body physical system, whose typical features are compli-
mented by all additional constraints and conditions which are necessary for uniquely de-
scribing the statistical properties of the system.
Statistical systems are characterized by statistical ensembles.
Statistical ensemble is a pair {F , ρ̂(t)} composed by the space of microstates F and a
statistical operator ρ̂(t) on that space.
The space of microstates can be the Fock space or its appropriate subspace. The statis-
tical operator ρ̂(t), generally, is a function of time t. To give ρ̂(t) implies to define its form
ρ̂(0) at the initial time t = 0 and to specify the evolution operator Û(t) such that
ρ̂(t) = Û(t)ρ̂(0)Û+(t) . (1)
Therefore, a statistical ensemble can be defined as a triplet
{F , ρ̂(0), Û(t)} ←→ {F , ρ̂(t)} .
The knowledge of a statistical ensemble allows one to find statistical averages.
Statistical average for an operator Â(t) on F is
< Â(t) > = TrF ρ̂(t)Â(0) = TrF ρ̂(0)Â(t) . (2)
Here the Heisenberg representation of the operator Â(t) is assumed, for which
Â(t) = Û+(t)Â(0)Û(t) . (3)
Representative ensemble is a statistical ensemble equipped with all additional constraints
and conditions that are necessary for a unique representation of the given statistical system.
Additional constraints and conditions for statistical systems are usually formulated as
conditions on statistical averages for some specified condition operators Ĉi(t), where i =
1, 2, . . .. These operators do not need to be necessarily the integrals of motion, but they are
supposed to be Hermitian.
Statistical condition is a prescribed equality for the statistical average of a condition
operator,
Ci(t) = < Ĉi(t) > = Trρ̂(0)Ĉi(t) . (4)
Here and in what follows, the trace operation is assumed to be over the appropriate space
of microstates F .
Let us consider, first, an equilibrium statistical system, for which the statistical operator
does not depend on time,
ρ̂(t) = ρ̂(0) ≡ ρ̂ . (5)
The explicit form of the statistical operator follows from the principle of minimal information
[5]. The latter presumes the conditional maximization of the Gibbs entropy
S = −Trρ̂ ln ρ̂ (6)
under the statistical conditions (4), among which one usually distinguishes the definition of
the internal energy
E = Trρ̂Ĥ (7)
and the normalization condition
Trρ̂ = 1 . (8)
The information functional is
I[ρ̂] = −S + λ0 (Trρ̂− 1) + β
Trρ̂Ĥ −E
Trρ̂Ĉi − Ci
, (9)
where λ0 ≡ lnZ − 1 is the Lagrange multiplier preserving the normalization condition (8),
β is the inverse temperature, which is the Lagrange multiplier for condition (7), and βνi are
the Lagrange multipliers related to statistical conditions (4).
The minimization of the information functional (9) yields the statistical operator
e−βH , (10)
corresponding to the grand canonical ensemble with the grand Hamiltonian
H ≡ Ĥ +
νiĈi . (11)
The most customary expression for the grand Hamiltonian (11) is
H = Ĥ − µN̂ ,
where µ is the chemical potential and N̂ is the number-of-particle operator. However, the
general form of the grand Hamiltonian is given by Eq. (11), in which any condition operators
can be involved. Thus, an equilibrium representative ensemble is described by the statistical
operator (10) with the grand Hamiltonian (11). The evolution operator for an equilibrium
system is
Û(t) = e−iHt , (12)
which commutes with the statistical operator (10), because of which
ρ̂(t) = [H, ρ̂(t)] = 0 .
The general way of obtaining the evolution equations for arbitrary nonequilibrium sys-
tems is through the extremization of action functionals [6]. In our case, this extremization
has to be accomplished under the prescribed statistical conditions (4).
Let the system Hamiltonian be a functional of the field operators ψ(x, t) and ψ†(x, t),
that is, Ĥ = Ĥ [ψ], where ψ = ψ(x, t). The system Lagrangian is
L̂[ψ] ≡
ψ†(x, t) i
ψ(x, t) dx− Ĥ[ψ] . (13)
The action functional, or effective action, under the prescribed statistical conditions (4),
takes the form
A[ψ] ≡
L̂[ψ]−
νiĈi(t)
dt , (14)
where νi are the Lagrange multipliers guaranteeing the validity of the given statistical con-
ditions. The action functional is defined so that to be a self-adjoint operator,
A+[ψ] = A[ψ] . (15)
Similarly to Eq. (11), the grand Hamiltonian in the Heisenberg representation is
H [ψ] = Ĥ [ψ] +
νiĈi(t) . (16)
Then the effective action (14) can be rewritten as
A[ψ] =
ψ†(x, t) i
ψ(x, t) dx−H [ψ]
dt . (17)
The extremization of the action functional, requiring that
δA[ψ] = 0 ,
δA[ψ] =
δA[ψ]
δψ(x, t)
δψ(x, t) +
δA[ψ]
δψ†(x, t)
δψ†(x, t) ,
yields the evolution equations
δA[ψ]
δψ†(x, t)
= 0 ,
δA[ψ]
δψ(x, t)
= 0 . (18)
These equations are the Hermitian conjugated forms of each other.
From Eqs. (17) and (18), it is evident that the evolution equations for the field operators
can be represented as
ψ(x, t) =
δH [ψ]
δψ†(x, t)
and its Hermitian conjugated. This should be equivalent to the Heisenberg equation of
motion
ψ(x, t) = [ψ(x, t), H [ψ]] ,
that is, to the Heisenberg representation for the field operator
ψ(x, t) = Û+(t)ψ(x, 0)Û(t) .
Hence, the evolution operator satisfies the Schrödinger equation
Û(t) == H [ψ(x, 0)] Û(t) . (20)
In this way, a nonequilibrium representative ensemble is the set of the given space of
microstates F , initial statistical operator ρ̂(0), and of the evolution operator Û(t) defined
by Eq. (20). An equilibrium representative ensemble is, of course, just a particular case of
the general nonequilibrium ensemble.
3 Bose-Condensed Systems
To illustrate the explicit construction of a representative ensemble, let us consider a system
with Bose-Einstein condensate. Such systems possess a variety of interesting properties, as
can be inferred from review works [7–10]. Moreover, theoretical description of these systems
is known to confront the notorious difficulty of defining a self-consistent approach. The
theory of Bose-condensed systems is based on the Bogolubov idea [11-14] of breaking the
global gauge symmetry by means of the famous Bogolubov shift for field operators. The
condensate wave function, introduced in the course of this shift, has to satisfy the minimum
of the related thermodynamic potential, which is the stability condition necessary for mak-
ing the system stable and the theory conserving and self-consistent. At the same time, the
spectrum of collective excitations, according to the Hugenholtz-Pines theorem [15], has to
be gapless. The notorious problem is the appearance of the contradiction between the above
two requirements, when the theory is either nonconserving or gapful. This contradiction
does not arise only in the lowest orders with respect to particle interactions, when one uses
the Bogolubov approximation at low temperatures [11,12] or the quasiclassical approxima-
tion at high temperatures [16]. However, this contradiction immediately arises as soon as
the interaction strength is not asymptotically weak and one has to invoke a more elaborate
approximation. This problem of conserving versus gapless approximations was first empha-
sized by Hohenberg and Martin [17] and recently covered comprehensively by Andersen [9].
The problem is caused by the usage of nonrepresentative ensembles, which renders the sys-
tem unstable [18]. Here we show that employing a representative ensemble never yields the
above contradiction, always resulting in a self-consistent theory, being both conserving and
gapless.
We consider a system with the Hamiltonian
ψ†(r)
ψ(r) dr+
ψ†(r)ψ†(r′)Φ(r− r′)ψ(r′)ψ(r) drdr′ , (21)
in which the field operators ψ(r) = ψ(r, t) satisfy the Bose commutation relations, U =
U(r, t) is an external field, and Φ(r) = Φ(−r) is an interaction potential. For describing a
Bose-condensed system with broken global gauge symmetry, the Bogolubov shift [13,14] has
to be done through the replacement
ψ(r, t) −→ ψ̂(r, t) ≡ η(r, t) + ψ1(r, t) , (22)
where η(r, t) is the condensate wave function and ψ1(r, t) is the field operator of noncon-
densed particles. The latter field variables are assumed to be orthogonal to each other,
η∗(r, t)ψ1(r, t) dr = 0 . (23)
It is necessary to emphasize that the Bogolubov shift (22) realizes unitary nonequivalent
operator representations [18,19]. Accomplishing the Bogolubov shift (22) in Hamiltonian
(21), as well as in all operators of observables, we get the algebra of observables defined on
the Fock space F(ψ1) generated by the field operators ψ†1(r) (see details in Refs. [5,18,19]).
The condensate function is normalized to the number of condensed particles
|η(r, t)|2dr . (24)
The Bogolubov shift (22) is only rational when the number of condensed particles (24) is
macroscopic, which means that the limit
is not zero, where N is the total number of particles. The latter is given by the average
N = < N̂ > (25)
for the number-of-particle operator
ψ̂†(r)ψ̂(r) dr , (26)
in which the Bogolubov shift (22) is again assumed. The statistical averaging in Eq. (25)
and everywhere below is over the Fock space F(ψ1).
Substituting the Bogolubov shift (22) into Hamiltonian (21) gives in the latter the terms
linear in ψ1, because of which the average < ψ1 > can be nonzero. This, however, would
result in the nonconservation of quantum numbers, e.g., of spin or momentum. Therefore,
one has to impose the constraint for the conservation of quantum numbers,
< ψ1(r, t) > = 0 . (27)
Defining the self-adjoint condition operator
Λ̂(t) ≡
λ(r, t)ψ
1(r, t) + λ
∗(r, t)ψ1(r, t)
dr , (28)
in which λ(r, t) is a complex function, we may represent constraint (27) as the quantum
conservation condition
< Λ̂(t) > = 0 . (29)
In this way, there are three statistical conditions. The first condition is the normalization
(24) for the number of condensed particles. Condition (24) can be represented in the standard
form (4) by defining the operator
N̂0 ≡ 1̂
|η(r, t)|2dr , (30)
in which 1̂ is the unity operator in the Fock space F(ψ1). Then Eq. (24) reduces to the
statistical condition
N0 = < N̂0 > . (31)
The second condition is the normalization (25) for the total number of particles. Equivalently,
instead of normalization (25), we may consider the normalization condition
N1 = < N̂1 > , N̂1 ≡
1(r)ψ1(r) dr (32)
for the number of uncondensed particles N1 = N − N0. And the third condition is the
conservation condition (29). Respectively, the effective action, which is now a functional of
the two field variables, η(r, t) and ψ1(r, t), with taking account of the statistical conditions
(29), (31), and (32), becomes
A[η, ψ1] =
L̂+ µ0N̂0 + µ1N̂1 + Λ̂
dt . (33)
Here L̂ = L̂[ψ̂] is the Lagrangian (13) under the Bogolubov shift (22) and Λ̂ = Λ̂(t) from
Eq. (28). The quantities µ0, µ1, and λ(r, t) are the Lagrange multipliers guaranteeing the
validity of the corresponding statistical conditions. Introducing the grand Hamiltonian
H [η, ψ1] ≡ Ĥ − µ0N̂0 − µ1N̂1 − Λ̂ , (34)
in which Ĥ = Ĥ[ψ̂], with shift (22), and the effective Lagrangian
L[η, ψ1] ≡
η∗(r, t) i
η(r, t) + ψ
1(r, t) i
ψ1(r, t)
dr−H [η, ψ1] , (35)
for the action functional (33), we get
A[η, ψ1] =
L[η, ψ1] dt . (36)
The evolution equations follow from the extremization of the action functional (36), that
is, from the variations
δA[η, ψ1]
δη∗(r, t)
= 0 (37)
δA[η, ψ1]
1(r, t)
= 0 . (38)
These equations, as is clear from Eqs. (35) and (36), are equivalent to the equations of
motion
η(r, t) =
δH [η, ψ1]
δη∗(r, t)
ψ1(r, t) =
δH [η, ψ1]
1(r, t)
. (40)
One has to substitute here Hamiltonian (34) under the Bogolubov shift (22). Accomplishing
the variation in Eq. (39), we get
η(r, t) =
+ U − µ0
η(r)+
Φ(r− r′)
|η(r′)|2η(r) + X̂(r, r′)
dr′ , (41)
where the time dependence in the right-hand side, for short, is not explicitly shown, U =
U(r, t), and the notation for the correlation operator
X̂(r, r′) ≡ ψ†1(r′)ψ1(r′)η(r) + ψ
′)η(r′)ψ1(r)+
+η∗(r′)ψ1(r
′)ψ1(r) + ψ
′)ψ1(r
′)ψ1(r) (42)
is used. The variation in Eq. (40) gives
ψ1(r, t) =
+ U − µ1
ψ1(r)+
Φ(r− r′)
|η(r′)|2ψ1(r) + η∗(r′)η(r)ψ1(r′) + η(r′)η(r)ψ†1(r′) + X̂(r, r′)
dr′ . (43)
To get an equation for the condensate wave function, we have to take the statistical
average of Eq. (41). For this purpose, we introduce the normal density matrix
ρ1(r, r
′) ≡ < ψ†1(r′)ψ1(r) > , (44)
the anomalous density matrix
σ1(r, r
′) ≡ < ψ1(r′)ψ1(r) > , (45)
and their diagonal elements, giving the density of noncondensed particles
ρ1(r) ≡ ρ1(r, r) = < ψ†1(r)ψ1(r) > (46)
and the anomalous average
σ1(r) ≡ σ1(r, r) = < ψ1(r)ψ1(r) > . (47)
The quantity |σ1(r)| can be interpreted as the density of paired particles [19]. The total
density of particles
ρ(r) = ρ0(r) + ρ1(r) (48)
consists of the condensate density
ρ0(r) ≡ |η(r)|2 (49)
and the density of noncondensed particles (46). Averaging Eq. (41), we find the equation
for the condensate wave function
η(r, t) =
+ U − µ0
η(r)+
Φ(r− r′)
ρ(r′)η(r) + ρ1(r, r
′)η(r′) + σ1(r, r
′)η∗(r′)+ < ψ
′)ψ1(r
′)ψ1(r) >
dr′ . (50)
Equations (43) and (50) are the basic equations of motion for the field variables η(r, t) and
ψ1(r, t). These equations, according to Eqs. (39) and (40), are generated by the variation
of the grand Hamiltonian (34). The latter, in agreement with Eq. (20), also defines the
evolution operator Û(t), which satisfies the Schrödinger equation
Û(t) = H [η(r, 0), ψ1(r, 0)]Û(t) .
Thus, the representative ensemble for a Bose-condensed system is the triplet
{F(ψ1), ρ̂(0), Û(t)} .
It is important to stress that the so defined representative ensemble possesses a principal
feature making it different from the standardly used ensemble having the sole Lagrange
multiplier µ0 ≡ µ1. But then the normalization condition (24) cannot be guaranteed. Then
the evolution equation for the condensate wave function is not a result of a variational
procedure. For an equilibrium system, this means that the number of condensed particles
N0 does not provide the minimum of a thermodynamic potential, which implies the system
instability. All notorious inconsistencies in theory, manifesting themselves in the lack od
conservation laws or in the appearance of an unphysical gap in the spectrum, are caused by
the usage of nonrepresentative ensembles.
4 Green Functions
The equations of motion (43) and (50) allow us to derive the evolutional equations for the
Green functions. To this end, we shall use the compact notation denoting the set {rj, tj} by
the sole letter j, so that the dependence of functions on the spatial and temporal variables
looks like
f(12 . . . n) ≡ f(r1, t1, r2, t2, . . . , rn, tn) .
The product of the differentials drjdtj will be denoted as d(j), so that
d(12 . . . n) ≡
drj dtj .
We shall employ the Dirac delta function
δ(12) ≡ δ(r1 − r2) δ(t1 − t2) .
For the interaction potential, we shall use the retarded form
Φ(12) ≡ Φ(r1 − r2)δ(t1 − t2 + 0) . (51)
The matrix Green function G(12) = [Gαβ(12)] is a 2 × 2 matrix, with α, β = 1, 2, and
with the following elements:
G11(12) ≡ −i < T̂ψ1(1)ψ†1(2) > , G12(12) ≡ −i < T̂ψ1(1)ψ1(2) > ,
G21(12) ≡ −i < T̂ψ†1(1)ψ
1(2) > , G22(12) ≡ −i < T̂ψ
1(1)ψ1(2) > , (52)
where T̂ is the time-ordering operator.
Let us introduce the operator
K̂j ≡ −
+ U(j)− µ1 , (53)
the condensate effective potential
V (12) ≡ δ(12)
Φ(13)|η(3)|2d(3) + Φ(12)η(1)η∗(2) , (54)
and let us rewrite the correlation operator (42) in the form
X̂(12) = ψ
1(2)ψ1(2)η(1) + ψ
1(2)η(2)ψ1(1) + η
∗(2)ψ1(2)ψ1(1) + ψ
1(2)ψ1(2)ψ1(1) . (55)
We also define the matrix correlation function X(123) = [Xαβ(123)] with the elements:
X11(123) ≡ − < T̂ X̂(12)ψ†1(3) > , X12(123) ≡ − < T̂ X̂(12)ψ1(3) > ,
X21(123) ≡ − < T̂X̂+(12)ψ†1(3) > , X22(123) ≡ − < T̂X̂+(12)ψ1(3) > . (56)
From the equations of motion (43) and (50), we find the equations
− K̂1
G11(12)−
V (13)G11(32) d(3)−
Φ(13) [η(1)η(3)G21(32) + iX11(132)] d(3) = δ(12) ,
− K̂1
G12(12)−
V (13)G12(32) d(3)−
Φ(13) [η(1)η(3)G22(32) + iX12(132)] d(3) = 0 ,
− K̂1
G21(12)−
V ∗(13)G21(32) d(3)−
Φ(13) [η∗(1)η∗(3)G11(32) + iX21(132)] d(3) = 0 ,
− K̂1
G22(12)−
V ∗(13)G22(32) d(3)−
Φ(13) [η∗(1)η∗(3)G12(32) + iX22(132)] d(3) = δ(12) . (57)
The self-energy Σ(12) = [Σαβ(12)] is a matrix whose elements are defined by the relations
[Σ11(13)G11(32) + Σ12(13)G21(32)] d(3) =
V (13)G11(32) d(3) +
Φ(13) [η(1)η(3)G21(32) + iX11(132)] d(3) ,
[Σ11(13)G12(32) + Σ12(13)G22(32)] d(3) =
V (13)G12(32) d(3) +
Φ(13) [η(1)η(3)G22(32) + iX12(132)] d(3) ,
[Σ21(13)G11(32) + Σ22(13)G21(32)] d(3) =
V ∗(13)G21(32) d(3) +
Φ(13) [η∗(1)η∗(3)G11(32) + iX21(132)] d(3) ,
[Σ21(13)G12(32) + Σ22(13)G22(32)] d(3) =
V ∗(13)G22(32) d(3) +
Φ(13) [η∗(1)η∗(3)G12(32) + iX22(132)] d(3) . (58)
Let us introduce the matrix condensate propagator C(12) = [Cαβ(12)], with the elements
C11(12) ≡ −iη(1)η∗(2) , C12(12) ≡ −iη(1)η(2) ,
C21(12) ≡ −iη∗(1)η∗(2) , C22(12) ≡ −iη∗(1)η(2) , (59)
The latter have the properties
C11(21) = C22(12) , C11(11) = C22(11) , C12(21) = C12(12) ,
C21(21) = C21(12) , C
11(12) = −C22(12) , C∗12(12) = −C21(12) . (60)
The binary Green function is a matrix B(123) = [Bαβ(123)],
B(123) = C(12)G(23)− iρ0(2)G(13) +X(123) , (61)
whose elements are
B11(123) ≡ C11(12)G11(23) + C11(22)G11(13) + C12(12)G21(23) +X11(123) ,
B12(123) ≡ C11(12)G12(23) + C11(22)G12(13) + C12(12)G22(23) +X12(123) ,
B21(123) ≡ C22(12)G21(23) + C22(22)G21(13) + C21(12)G11(23) +X21(123) ,
B22(123) ≡ C22(12)G22(23) + C22(22)G22(13) + C21(12)G12(23) +X22(123) , (62)
and where ρ0(1) ≡ |η(1)|2.
With Eqs. (59) and (61), relations (58), defining the self-energy, can be rewritten in the
matrix form
Σ(13)G(32) d(3) = i
Φ(13)B(132) d(3) . (63)
Then the equations of motion (57) acquire the matrix representation
τ̂3 i
− K̂1
G(12)−
Σ(13)G(32) d(3) = δ(12) , (64)
in which the delta function δ(12) in the right-hand side is assumed to be factored with the
unity matrix 1̂ = [δαβ ] and
τ̂3 ≡
is a Pauli matrix.
Introducing the inverse propagator
G−1(12) ≡
τ̂3 i
− K̂1
δ(12)− Σ(12) (65)
allows us to transform Eq. (64) into
G−1(13)G(32) d(3) = δ(12) . (66)
An equivalent representation, following from Eq. (66), is
G(13)G−1(32) d(3) = δ(12) . (67)
For the self-energy, using Eq. (63), we have
Σ(12) = i
Φ(13)B(134)G−1(42) d(34) . (68)
The equations for the Green functions are to be complimented by the equation for the
condensate wave function (50), which, introducing one more anomalous average
ξ(12) ≡ < ψ†1(2)ψ1(2)ψ(1) > , (69)
can be represented as
η(1) =
+ U(1)− µ0
η(1)+
Φ(12) [ρ(2)η(1) + ρ1(12)η(2) + σ1(12)η
∗(2) + ξ(12)] d(2) . (70)
It is the equations for the Green functions and the equation for the condensate func-
tion, which become mutually incompatible in the standard approach, while employing the
representative ensemble renders the theory self-consistent in any approximation.
5 Theory Self-Consistency
One usually confronts inconsistencies in theory considering a uniform equilibrium Bose-
condensed system. Then, in any given approximation, one gets either a nonconserving
theory, that is, an unstable system, or one finds an unphysical gap in the spectrum, which,
actually, again corresponds to an unstable system [9,18]. To analyze this problem, we pass
now to the case of an equilibrium uniform system, when U = 0.
Then we use the Fourier transform for the Green function
G(12) =
G(k, ω)ei(k·r12−ωt12)
(2π)4
in which
r12 ≡ r1 − r2 , t12 ≡ t1 − t2 .
By their definition in Eq. (52), the Green function elements possess the properties
G11(21) = G22(12) , G12(21) = G12(12) , G21(21) = G21(12) . (71)
Therefore the corresponding Fourier transforms satisfy the relations
G11(−k,−ω) = G22(k, ω) , G12(−k,−ω) = G12(k, ω) ,
G21(−k,−ω) = G21(k, ω) . (72)
Assuming that the system is isotropic, one has
Gαβ(−k, ω) = Gαβ(k, ω) (73)
for all α, β. Combining Eqs. (72) and (73), we find
G11(k,−ω) = G22(k, ω) , G12(k,−ω) = G12(k, ω) ,
G21(k,−ω) = G21(k, ω) . (74)
Also, for a uniform equilibrium system, one has [14] the equality
G21(k, ω) = G12(k, ω) . (75)
Fourier-transforming the self-energy
Σ(12) =
Σ(k, ω)ei(k·r12−ωt12)
(2π)4
and, similarly, the inverse propagator (65), we have for the latter
G−1(k, ω) = τ̂3ω −
+ µ− Σ(k, ω) . (76)
Then Eq. (67) reduces to
G−1(k, ω)G(k, ω) = 1̂ , (77)
where 1̂ = [δαβ ].
From Eqs. (76) and (77), it follows that G−1(k, ω), hense, also Σ(k, ω), have the same
symmetry properties as G(k, ω). In particular,
Σαβ(−k, ω) = Σαβ(k, ω) , Σ12(k,−ω) = Σ12(k, ω) , Σ21(k,−ω) = Σ21(k, ω) ,
Σ21(k, ω) = Σ12(k, ω) , Σ11(k,−ω) = Σ22(k, ω) . (78)
The matrix equation (77), explicitly, is the system of equations
ω − k
+ µ1 − Σ11
G11 − Σ12G21 = 1 ,
+ µ1 − Σ11
G12 − Σ12G22 = 0 ,
−ω − k
+ µ1 − Σ22
G21 − Σ21G11 = 0 ,
−ω − k
+ µ1 − Σ22
G22 − Σ21G12 = 1 , (79)
where, for short, Gαβ = Gαβ(k, ω) and Σαβ = Σαβ(k, ω). The solutions to these equations
G11(k, ω) =
ω + k2/2m+ Σ11(k, ω)− µ1
D(k, ω)
G12(k, ω) = −
Σ12(k, ω)
D(k, ω)
, (80)
with the denominator
D(k, ω) ≡
ω − k
− Σ11(k, ω) + µ1
+ Σ22(k, ω)− µ1
+ Σ212(k, ω) . (81)
The solutions for Σ21(k, ω) and G22(k, ω) are defined by the symmetry properties (72) to
(75).
The excitation spectrum is given by the poles of the Green functions, that is, by the zero
of denominator (81),
D(k, εk) = 0 . (82)
Equation (82) can be represented as
[Σ11(k, εk)− Σ22(k, εk)]±
ω2k − Σ212(k, εk) , (83)
with the notation
[Σ11(k, εk) + Σ22(k, εk)]− µ1 . (84)
Denominator (81) enjoys the property
D(k,−ω) = D(k, ω) .
Consequently, if εk is a solution of Eq. (82), then −εk is also its solution, which is in
agreement with the form of Eq. (83).
For an equilibrium uniform system, the Bogolubov shift (22) is equivalent to the sepa-
ration of the zero-momentum term in the expansion of the field operator over plane waves.
The shift itself has meaning only under the normalization condition (24), in which N0 ∼ N ,
that is, the zero-momentum state is macroscopically occupied. The latter becomes possible
when the single particle spectrum touches zero. Therefore, the necessary condition for the
existence of Bose-Einstein condensate is
εk = 0 . (85)
This is to be complimented by the stability condition
Re εk ≥ 0 , Im εk ≤ 0 . (86)
This condition should be kept in mind when choosing the sign plus in front of the square
root in spectrum (83).
Taking limit (85) for spectrum (83), we notice that, according to properties (78),
Σ11(k, 0) = Σ22(k, 0) . (87)
By using perturbation theory for a stable system, one can show [15] that in all orders of the
theory
Σαβ(0, 0) ≥ 0 . (88)
Then the necessary condition (85) yields the expression for the chemical potential
µ1 = Σ11(0, 0)− Σ12(0, 0) , (89)
which is the Hugenholtz-Pines relation [15].
On the other hand, we have Eq. (70) for the condensate wave function. For an equilibrium
uniform system, with no external potential U , all densities do not depend on the spatial and
temporal variables,
ρ0(r) = ρ0 , ρ1(r) = ρ1 , σ1(r) = σ1 , ρ(r) = ρ . (90)
The condensate wave function reduces to the constant
η(r, t) = η =
ρ0 . (91)
Then we substitute into Eq. (70) the Fourier transforms for the interaction potential
Φ(r) =
ik·r dk
(2π)3
for the normal density matrix (44),
ρ1(r1, r2) =
ik·r12
(2π)3
and for the anomalous density matrix (45),
σ1(r1, r2) =
ik·r12
(2π)3
Similarly, the Fourier transform for the anomalous average (69) is
ξ1(r1, r2) =
ik·r12
(2π)3
As a result, Eq. (70) gives
µ0 = ρΦ0 +
nk + σk +
(2π)3
. (92)
Generally, expressions (92) and (89) do not coincide with each other, their difference
being
µ0 − µ1 = ρΦ0 +
nk + σk +
(2π)3
−Σ11(0, 0) + Σ12(0, 0) . (93)
This is the general expression for the difference between the Lagrange multipliers µ0 and µ1
for an arbitrary equilibrium uniform Bose-condensed system.
Usually, one does not distinguish between the Lagrange multipliers µ0 and µ1, which
implies setting µ0 − µ1 → 0. However, as is evident from Eq. (93), there is no any reason
for requiring that this quantity be zero. As an illustration, we may resort to the Hartree-
Fock-Bogolubov approximation, in which ξk = 0 and
Σ11(0, 0) = (ρ+ ρ0)Φ0 +
(2π)3
, Σ12(0, 0) = ρ0Φ0 +
(2π)3
Relation (89) then yields
µ1 = ρΦ0 +
(nk − σk)Φk
(2π)3
. (94)
The difference of the chemical potentials (93) becomes
µ0 − µ1 = 2
(2π)3
, (95)
which, certainly, is nonzero [20].
In this way, the introduction of the additional Lagrange multiplier makes the theory
completely self-consistent. All inconsistencies that often arise in other works, such as the
appearance of a gap in the spectrum, system instability or a distortion of the phase transition
order, are caused by neglecting the difference between the multiplier µ1 and the multiplier
µ0. It is worth emphasizing that the introduction of the Lagrange multiplier µ0 for preserving
the normalization condition (24), from the mathematical point of view is strictly necessary.
In other case, the employed ensemble would not be representative, hence, could not correctly
describe the Bose-condensed system with broken gauge symmetry.
Acknowledgement
I am grateful for many useful discussions to M. Girardeau, R. Graham, H. Kleinert, and
E.P. Yukalova.
References
[1] J.W. Gibbs, Collected Works (Longmans, New York, 1931), Vol. 2.
[2] D. ter Haar, Elements of Statistical Mechanics (Reinhart, New York, 1954).
[3] D. ter Haar, Rep. Prog. Phys. 24, 304 (1961).
[4] V.I. Yukalov, Phys. Rep. 208, 395 (1991).
[5] V.I. Yukalov, Statistical Green’s Functions (Queen’s University, Kingston, 1998).
[6] H. Kleinert, Path Integrals (World Scientific, Singapore, 2004).
[7] P.W. Courteille, V.S. Bagnato and V.I. Yukalov, Laser Phys. 11, 659 (2001).
[8] L. Pitaevskii and S. Stringari, Bose-Einstein Condensation (Clarendon, Oxford, 2003).
[9] J.O. Andersen, Rev. Mod. Phys. 76, 599 (2004).
[10] K. Bongs and K. Sengstock, Rep. Prog. Phys. 67, 907 (2004).
[11] N.N. Bogolubov, J. Phys. (Moscow) 11, 23 (1947).
[12] N.N. Bogolubov, Moscow Univ. Phys. Bull. 7, 43 (1947).
[13] N.N. Bogolubov, Lectures on Quantum Statistics (Gordon and Breach, New York, 1967),
Vol. 1.
[14] N.N. Bogolubov, Lectures on Quantum Statistics (Gordon and Breach, New York, 1970),
Vol. 2.
[15] N.M. Hugenholtz and D. Pines, Phys. Rev. 116, 489 (1959).
[16] N. Prokofiev, O. Ruebenacker and B. Svistunov, Phys. Rev. A 69, 053625 (2004).
[17] P.C. Hohenberg and P.C. Martin, Ann. Phys. 34, 291 (1965).
[18] V.I. Yukalov, Phys. Rev. E 72, 066119 (2005).
[19] V.I. Yukalov, Laser Phys. 16, 511 (2006).
[20] V.I. Yukalov and E.P. Yukalova, Laser Phys. Lett. 2, 506 (2005).
ABSTRACT
  The notion of representative statistical ensembles, correctly representing
statistical systems, is strictly formulated. This notion allows for a proper
description of statistical systems, avoiding inconsistencies in theory. As an
illustration, a Bose-condensed system is considered. It is shown that a
self-consistent treatment of the latter, using a representative ensemble,
always yields a conserving and gapless theory.

<|endoftext|><|startoftext|>
Introduction
	Bouncing solution in the presence of Quintom matter
	A phenomenological Quintom model
	Two-field Quintom model
	A single scalar with high-derivative terms
	Conclusion and discussions
	Acknowledgments
	References
ABSTRACT
  The bouncing universe provides a possible solution to the Big Bang
singularity problem. In this paper we study the bouncing solution in the
universe dominated by the Quintom matter with an equation of state (EoS)
crossing the cosmological constant boundary. We will show explicitly the
analytical and numerical bouncing solutions in three types of models for the
Quintom matter with an phenomenological EoS, the two scalar fields and a scalar
field with a modified Born-Infeld action.

<|endoftext|><|startoftext|>
I. Mullayeva
On the weight structure of cyclic codes over GF (q) , q > 2 .
Abstract
The interrelation between the cyclic structure of an ideal, i.e.,
a cyclic code over Galois field GF (q) , q > 2 , and its classes of
proportional elements is considered. This relation is used in order
to define the code’s weight structure. The equidistance conditions
of irreducible nonprimitive codes over GF(q) are given. Besides
that, the minimum distance for some class of nonprimitive cyclic
codes is found.
The relation of proportionality for elements of algebra An , consisting
of polynomials in x over Galois field GF (q) , modulo polynomial xn − 1 ,
is the equivalence relation [1]. Therefore An falls into several disjoint sub-
sets and every such subset contains all elements which are proportional
to each other. These subsets will be called the classes of proportionality.
Let z(x) 6= 0 be some vector of An . If α1 = 1, α2, . . . , αq−1 are all dif-
ferent elements of the multiplicative group GF (q)∗ of the field GF (q) ,
then the following q − 1 vectors
α1z(x), α2z(x), . . . , αq−1z(x) (1)
are some different and proportional to each other elements of An . The set
of vectors (1) is closed under the multiplication by the elements of the
group GF (q)∗ . Hence the set (1) represents some class of proportional
elements, which will be denoted by Pz(x) . Because of an arbitrary choice
for z(x) , every nonzero class consists of q− 1 elements of the form (1) .
Consequently An contains (q
n − 1)/(q − 1) different nonzero classes.
Evidently all elements of one class have the same period [2, 3] or the
same order [8]. Clearly, the supporting sets [2] of vectors, entering into
the same class of proportionality, are similar too. Hence, the Hamming
weight is also the same for all vectors of one class. Thus, we can say that
any proportionality class Pz(x) , Pz(x) ⊂ An , has its order, its supporting
set and its Hamming weight. Obviously, any proportionality class of An
is characterizied by its unique monic polynomial.
Now consider an ideal J , J ⊂ An , i.e., some cyclic code over GF (q) ,
having the following generator g(x) = (xn − 1)/h(x) [7], where h(x) is
some parity-check polynomial of degree m , having some order n , n =
ord(h(x)) [8]. Below we suppose that q > 2 and gcd(n, q) = 1 .
It’s also known [3, 4, 10] that any ideal is partitioned into several
disjoint subsets, that is cycles, under the multiplication of ideal’s vectors
http://arxiv.org/abs/0704.1091v1
by x . On the other hand, some ideal J , as a subspace of An , consists of
(qm−1)/(q−1) nonzero proportionality classes. Obviously, the existence
of two different partitions into some disjoint subsets of any ideal assumes
a certain dependence between proportionality classes and cycles of ideal.
Further, any ideal J ⊂ An is the direct sum of minimal ideals [2, 8]
Ji, (2)
where Ji is some minimal ideal, having an irreducible parity-check poly-
nomial hi(x) of degree mi and of order ni , ni = ord(hi(x)) , 1 ≤ i ≤ t .
This implies that the following polynomial
h(x) =
hi(x), (3)
is the parity-check polynomial of J and n , n = lcm(n1, n2, . . . , nt) ,
is the order of h(x) [8]. It should be stressed that under the condition
gcd(n, q) = 1 the polynomial (3) has no repeated factors [8].
Remark 1 . It is worth mentioning that the number n can be some
number of either the primitive form n = qm − 1 or of the nonprimitive
form n 6= qm−1 . In the first case, we have some cyclic primitive code and
the second case corresponds to a certain cyclic nonprimitive code [11].
Let us stress that n 6= qm − 1 if and only if ni 6= q
mi − 1 , 1 ≤ i ≤ t .
But if there is at least one primitive polynomial among the polynomials
hi(x) , 1 ≤ i ≤ t , then n = q
m − 1 .
Furthermore, applying the theory of linear recurring sequences [8, 12]
to elements of an ideal, we obtain that every element of some ideal J ⊂
An is characterized by its unique minimal polynomial. Denote by C the
set of all elements of J , having the same minimal polynomial c(x) . The
set C is either some minimal ideal Ji , 1 ≤ i ≤ t , or a certain subset of
all such elements of J , whose characteristic polynomial of the smallest
degree coincides with c(x) . In the general case, the polynomial c(x) is
equal to the product of some k , 1 ≤ k ≤ t , polynomials from t different
prime divisors of h(x) . Thus,
c(x) =
hij (x), 1 ≤ k ≤ t. (4)
This means that any element of C has the same period or the same order
nc = ord(c(x)) , nc ≤ n , nc|n .
Lemma 1 . For the set C , C ⊂ J , having some minimal polynomi-
al c(x) in terms of (4) , the following equality takes place
nc · sc = Rc · (q − 1), (5)
where sc is the number of all cycles, and Rc is the number of all pro-
portionality classes of C .
Proof. The set C is closed under two different operations. The first
operation is the cyclic shift of vector and the second one is the multipli-
cation of vectors by elements of group GF (q)∗ . Hence equality (5) can
be obtained by the counting of the number of all elements, belonging to
C , via the two different ways. The lemma is proved.
Theorem 1 . Any cycle {z(x)} of ideal (2) , having some period nz ,
nz|n , consists of dz subsets. The first element of each such subset is
proportional to z(x) . Every such subset contains rz nonproportional to
each other vectors, i.e., nz = rz · dz , dz|(q − 1) . And the number rz is
the index of the subgroup, belonging to GF (q)∗ , of order dz in the group
of the roots of unity, having the least possible order.
Proof. Let rz , 1 ≤ rz ≤ nz , is the smallest natural number such
that the following equality holds
xrz · z(x) = α · z(x)mod(xnz − 1), (6)
where α is some element of GF (q)∗ . Then the following rz vectors of
cycle {z(x)}
z(x), x · z(x), ..., xr
−1 · z(x) (7)
are some non–proportional to each others vectors because, assuming the
inverse, we should be able to decrease the number rz , but it is impossible.
Hence the set of elements (7) belongs to the following rz classes of
proportionality
Pz(x), Pxz(x), . . . , ..., Pxrz−1z(x). (8)
Since xnzz(x) = z(x) in the ring An and also, considering (6) , we
see that Pz(x) = Pxrz z(x) = Pxnz z(x) . This means that the cycle {z(x)}
belongs to the classes (8) and every such class contains dz vectors,
dz = nz/rz , 1 < dz ≤ q − 1 . In terms of equality (6) the following
different vectors z(x), xrzz(x), x2rzz(x), . . . , x(dz−1)rzz(x) of class Pz(x)
can be represented as α0z(x), αz(x), . . . , αdz−1z(x) . This implies that
α0 = 1, α, α2, . . . , αdz−1 are the different elements of group GF (q)∗ .
Since xnzz(x) = xrzdzz(x) = αdzz(x) = z(x) , we see that αdz = 1 .
This yields that dz is the order of element α in GF (q)
∗ . Consequently,
dz|q − 1 , 1 ≤ dz ≤ q − 1 .
Finally, under the condition gcd(n, q) = 1 the polynomial xn − 1
has n different roots in GF (qh) field, where h is the multiplicative or-
der of q modulo n [8]. Denote by E(n) the multiplicative group of n - th
roots of unity over GF (q) . Let ξ ∈ E(n) be some n - th primitive root of
unity. Then the following set of elements ξ0 = 1, ξ, ξ2, . . . , ξnz−1, . . . , ξn−1
represents the group E(n) . Since nz|n , we have E(nz) ⊂ E(n) , where
E(nz) is the multiplicative group of nz -th roots of unity. Moreover,
taking into account the isomorphism of the groups, having the same or-
der [9], we can state that the subgroup {α} , {α} ⊂ GF (q)∗ , of order dz
belongs to E(nz) because dz|nz .
As mentioned above, nz is the period of z(x) , so that nz is the small-
est divisor of n such that the following congruence xrz ≡ αmod(xnz −1)
takes place. Hence E(nz) is the smallest group of n - th roots of uni-
ty, which contains {α} . Since α = ξrz , we see that the following el-
ements ξrzdz = 1, ξrz , ξ2rz , . . . , ξ(dz−1)rz represent the subgroup {α} in
the group E(nz) . Besides that, the decomposition of E(nz) relative to
the subgroup {α} consists of rz different cosets. Thus, the number rz
is the index of subgroup {α} in the group of roots of unity, having the
smallest possible order. The theorem is proved.
Remark 2 . Notice that when a parity-check polynomial of code is
some primitive polynomial of degree m and of order n = qm−1 , m > 1 ,
then r = R = (qm − 1)/(q − 1) and d = q − 1 . (see [2], [7]).
Corrollary 1 . The period nz , nz = rz·dz ,of element z(x) , z(x) ∈ J ,
equals dz , dz|(q − 1) , 1 ≤ dz ≤ (q − 1) ,if and only if the cycle {z(x)}
is contained in one class of proportionality.
Corollary 2 . All code words of any cyclic code, havinq some length
n over GF (q) , fall into some equal-weight subsets and every such subset
includes all proportional to each other cycles.
Besides that, consider some minimal ideal J , J ⊂ An , having an
irreducible nonprimitive parity-check polynomial h(x) of degree m and
of order n , n 6= qm−1 , i.e., some irreducible code of nonprimitive length.
Remark 3 . The degree m of the polynomial h(x) coincides with
the multiplicative order h of the number q modulo n [8]. Also, the order
n of the polynomial h(x) is some divisor of qh−1 . This means that the
order n can change in the following limits 1 < n < qh−1 , n 6= q − 1 . If
1 < n < q − 1 , i.e., n 6= q − 1 , then some minimal ideal J , J ⊂ An , of
dimention one contains only one nonzero class of proportyonality. Con-
sequently, n = d , 1 < d < q − 1 , d|q − 1 .
Theorem 2 . Any cycle of minimal ideal J , J ⊂ An , having some
parity-check polynomial h(x) of degree m , m > 1 , and of order n ,
n 6= qm − 1 , is contained in r proportionality classes, 1 ≤ r ≤ R , r|R ,
R = (qm − 1)/(q − 1) . Every such class consists of d , 1 ≤ d ≤ (q − 1) ,
different vectors of cycle, and
d = gcd(q − 1, n), r = n/d. (9)
Proof. All elements of minimal ideal J , J ⊂ An , have the same order
n , n = ord(h(x)) . Applying the theorem 1 to some element f(x) ,
f(x) ∈ J , we have n = rf ·df , df |(q−1) , 1 ≤ rf ≤ n , 1 ≤ df ≤ (q−1) ,
and also the following equality
xrf · f(x) = γ · f(x), (10)
where γ is some element of GF (q)∗ . Evidently, if either rf = 1 , i.e.,
df = n , or rf = n and df = 1 , then equalities (9) take place. Hence,
below we suppose that 1 < rf < n , 1 < df < q − 1 , and therefore
q − 1 < n < qm − 1 .
Taking into account (9) , we have df ≤ d . Let us show that the strong
unequality df < d is impossible. Indeed, if df < d , then the subgroup
{γ} , where γ is the element from equality (10) , belongs to some group
of n -th roots of unity, having the order d , because df |d . Since d|(q−1) ,
we see that the subgroup {γ} belongs to GF (q)∗ . This means that the
cycle {f(x)} is contained in one class of proportionality, i. e., rf = 1 .
But this fact contradicts to the condition rf > 1 . This implies that the
strong unequality df < d is impossible. Hence df = d . Because of an
arbitrary choice of f(x) we can conclude that equalities (9) take place
for any element of J . The theorem is proved [6].
Remark 4 . It is necessary to note that the theorem 2 is valid only
for irreducible codes of non–primitive length except Reed-Solomon codes
of length n = q − 1 as it was shown above. In the case of irreducible
codes of primitive length n , n = qm− 1 , m > 1 , the theorem 2 will be
valid if and only if gcd(m, q − 1) = 1 . Indeed, when the last condition
takes place, then gcd((qm − 1)/(q − 1), q − 1) = 1 [7]. Thus, d = q −
1 = gcd(qm − 1, q − 1) that is d = gcd(n, q − 1) . It follows that the
theorem 2 holds.
Remark 5 . Notice that under the condition gcd(m, q − 1) = 1 the
number r from (9) has no divisors of (q−1) except 1 , so gcd(r, q − 1) = 1 .
This means that gcd(r, d) = 1 . Hence, considering the fact that r|R and
also, taking into account (9) and the following equality s = R(q−1)/n ,
we have r = gcd(R, n) . Besides that, if either one from the two numbers
n and q− 1 does not contain multiple prime divisors or the same prime
divisors of these numbers have the same degrees under the decomposi-
tion of both n and q − 1 , then the following equalities r = gcd(R, n) ,
gcd(r, d) = 1 also take place.
Corollary 3 . Both the number r and the number d are the same
numbers of all irreducible divisors of polynomial xn − 1 over GF (q) ,
having the same order.
Corollary 4 . The number R , R = (qm−1)/(q−1) , of proportional-
ity classes, of some irreducible code K ,having some length n , n 6= (qm−
1) , n = d · r , over GF (q) field, consists of some v different subsets.
And every such subset contains r equal-weight proportionality classes,
i.e., R = v · r . Besides that, every subset includes b equal-weight cycles,
b = (q − 1)/d , 1 < b ≤ (q − 1) . So that the number of all cycles for K
equals s = v · b and gcd(r, b) = 1 .
Proof. According to the theorem 2 any cycle of code K is contained
in r , r|R , proportionality classes. Therefore the number v = R/r gives
us the common quantity of different subsets of J , each of which consists
of r classes, i.e., R = v · r . The number b , b = (q−1)/d , is the number
of all different equal-weight cycles, contained in every such subset, which
consists of some r classes. Hence the number of all cycles for K is equal
to s = v · b . Since d = gcd(q− 1, n) we see that gcd(r, b) = 1 . Actually,
assuming the inverse, we would have been able to decrease the number
d , but it’s impossible. The corollary is proved.
Corollary 5 . The irreducible nonprimitive code K is some equidis-
tant code if s = b .Besides that, the last equation is equivalent to the
following ones: r = R or gcd(s, R) = 1 .
Remark 6 . Note that the condition s = b was obtained in [14, 15],
but only for some subclass of irreducible nonprimitive codes and under
the following additional restriction gcd(b,m) = 1 .
Corollary 6 . The weight of any element, belonging to some irre-
ducible nonprimitive code K of length n over GF (q) , is multiple of the
number d , d = gcd(q − 1, n) .
Proof. The weight of any element z(x) , z(x) ∈ K , of order n ,
n = ord(h(x)) , is equal to the number of such j , 0 ≤ j ≤ n − 1 , for
which the polynomial xj · z(x) has the following degree n−1 . According
to the theorem 2 , the number of such polynomials for the cycle z(x) ,
having degree n − 1 , is equal to wr · d , where wr is the number of
polynomials, having degree n − 1 , among the first r cyclic shifts of
z(x) , and d = gcd(q − 1, n) . The corollary is proved.
In addition, consider some ideal J , J ⊂ An , of the form (2) , having
the parity-check polynomial h(x) in terms of (3) .
Theorem 3 .If the following condition gcd(h, q− 1) = 1 , where h is
the multiplicative order of number q mod n , takes place, then any cycle
of set C , C ⊂ J , having some minimal polynomial of the form (4) ,
is contained in rc proportionality classes, rc|Rc , and every such class
includes dc , dc|q − 1 , elements of cycle, that is nc = rc · dc , nc =
ord(c(x)) , where Rc is the number of all proportionality classes of set
C , and
rc = gcd(Rc, nc), dc = gcd(q − 1, nc). (11)
Proof. It is sufficient to consider the case k = 2 because the general case
can be obtained by the induction. Thus assume that c(x) = h1(x)·h2(x) ,
where hi(x) is of degree mi and of order ni , ni = q
mi − 1 , 1 ≤ i ≤ 2 ,
is a certain prime miltiplier of c(x) . It is known [8], that the number
mi , 1 ≤ i ≤ 2 , equals either h or some divisor of this number. Hence,
taking into account the theorem 2 , and also remarks 4 and 5 , we have
ni = ri ·di , where ri = gcd(Ri, ni) , di = gcd(q−1, ni) , gcd(ri, q−1) = 1 ,
1 ≤ ri ≤ Ri , 1 ≤ di ≤ q − 1 , and Ri = (q
mi − 1)/(q − 1) , where Ri is
the number of all proportionality classes of minimal ideal Ji , 1 ≤ i ≤ 2 .
Therefore the order nc , nc = lcm(n1, n2) = n1 · n2/gcd(n1, n2) of the
polynomial c(x) can be rewritten as
nc = r1 d1 · r2 d2/gcd(r1d1 · r2d2). (12)
Since gcd(ri, q − 1) = 1 , we have gcd(ri, d1 · d2) = 1 , 1 ≤ i ≤ 2 . Thus
gcd(r1r2, d1d2) = 1 . Hence gcd(gcd(r1, r2), gcd(d1, d2)) = 1 so that (12)
may be represented in the following form
nc = (r1 r2/gcd(r1, r2)) · (d1 d2/gcd(d1, d2)). (13)
Thus, nc = lcm(r1, r2) · lcm(d1, d2) . Now by rc and dc we denote
lcm(r1, r2) and lcm(d1, d2) , respectively. Thus nc = rc·dc and gcd(rc, dc) =
1 . Since dc|q− 1 and gcd(rc, q− 1) = 1 , we obtain dc = gcd(q− 1, nc) .
Considering (5) , it follows that nc|(Rc · (q − 1)) . Hence we have rc =
gcd(Rc, nc) . Consequently the order nc of any element of set C is equal
to the product of two relatively prime numbers, i.e., nc = rc · dc , where
rc = lcm(r1, r2) = gcd(Rc, nc) and dc = lcm(d1, d2) = gcd(q − 1, nc) .
Furthermore, applying the theorem 1 to some element a(x) ∈ C of
period nc = ra · da , we have
xra · a(x) = θa(x), (14)
where θ ∈ GF (q)∗ is some element of order da , and ra is the smallest
natural number such that equality (14) takes place. Notice that the
subgroup {θ} has the order da , da < dc . If dc = 1 , then da = 1 and
nc = ra = rc = gcd(Rc, nc) , so that equalities (11) hold. For this reason
below we suppose that dc > 1 . If under this condition the number rc
is equal to one, then nc = dc= gcd(q − 1, nc) and the theorem is valid.
Therefore below we suppose that both rc > 1 and dc > 1 .
Evidently, da ≤ dc . Now let us show that the inequality da < dc is
not possible. Indeed, if da < dc , then we come to the following conclusion.
The subgroup {θ} , where θ is the element from equality (14) , belongs
to some subgroup of GF (q)∗ , having the order dc , because da|dc . Since
dc is some divisor of nc , then, considering the uniqueness of subgroups,
having the same order, the subgroup {θ} belongs to some group of dc -
th roots of unity. This implies that the smallest group of roots of unit,
containing {θ} , has an order, which either less or equals dc . Thus, both
the period of a(x) and the order of c(x) must be either less or equal to
dc . This yields that the order of c(x) must be some divisor of q−1 . But
this fact contradicts the condition rc > 1 . Hence the inequality da < dc
is impossible so that da = dc and ra = rc . Because of an arbitrary choice
of a(x) equalities (11) take place for any element of set C . The theorem
is proved.
Corollary 7 . The order nc of reducible factor c(x) of polynomial
xn − 1 , having some degree m over GF (q) , is some divisor of num-
ber qm − 1 , if gcd(h, q − 1) = 1 , where h is the multiplicative order of
number q mod n .
Remark 7 . In terms of condition gcd(h, q − 1) = 1 , where h is the
multiplicative order of number q mod n , the theorem 3 is valid for
cyclic codes of both the primitive and the nonprimitive length. Also,
taking into consideration the remark 5 , the order of any reducible factor
of the polynomial (xn − 1) over GF (q) of degree m , is some divisor of
the number (qm − 1) , if gcd (h, q − 1) = 1 .
Theorem 4 . Any cycle of set C , C ⊂ J , having some minimal
polynomial c(x) of the type (4) and of order nc = lcm(n1, n2, . . . , nk) ,
where ni 6= q
mi − 1 , 1 ≤ i ≤ k , is contained in rc , rc|Rc , 1 ≤ rc ≤ Rc ,
classes of proportionality and every such class includes dc , 1 ≤ dc ≤
q − 1 , elements of cycle, where Rc is the number of all proportionality
classes of C , and
dc = gcd(q − 1, nc), rc = nc/dc. (15)
Proof. It is sufficient to assume that k = 2 because the general case can
be obtained by the induction. This implies that c(x) = h1(x)h2(x) , where
hi(x) is some prime divisor of equality (3) , having some degree mi , and
of order ni , ni 6= q
mi − 1 , 1 ≤ i ≤ 2 . This yields that the theorem 2
holds for the minimal ideal Ji , 1 ≤ i ≤ 2 .
Evidently, if one of the two numbers, i.e., either dc or rc is equal
to one, then equalities (15) hold. For this reason below we assume that
both rc > 1 and dc > 1 .
Let z(x) be some vector of set C . According to the theorem 1 some
cycle {z(x)} ⊂ C is contained in rz classes and every such class includes
dz different elements of this cycle. Thus nc = rz dz , 1 ≤ dz ≤ q − 1 ,
dz|q−1 , 1 ≤ rz ≤ nc . And in addition, the following equality takes place
xrz · z(x) = β z(x), (16)
where β is some element of GF (q)∗ and the subgroup {β} , {β} ⊂
GF (q)∗ , has the order dz . Evidently, dz ≤ dc . Let us show that the
number dz can not be smaller than dc . Assume the inverse, i.e. let dz
be less than dc . Since at least one of the two numbers ether r1 or r2 is
not equal to one, we see that at least one of the numbers ni , 1 ≤ i ≤ 2 ,
is more than q−1 , as was established in the theorem 2. This means that
nc = lcm(n1, n2) > (q − 1). (17)
Further, since dz|dc , we see that the subgroup {β} of order dz belongs to
some subgroup of GF (q)∗ , having the order dc , where β is the element
from equality (16) . Due to the uniqueness of groups, having the same
order, the subgroup {β} belongs to the group of dc -th roots of unity
because dc|nc . It follows that the smallest group of the roots of unity,
which contains the subgroup {β} , has the order less or equals to dc .
This implies that both the period of z(x) and the order of c(x) is some
divisor of q − 1 . But this fact contradicts to (17) . It follows that our
assumption is not true, so that dz = dc , and rz = rc .
Besides that, since the number dc is the same number for every cycle
of set C , we see that every subset, consisting of rc proportionality class-
es, contains the same number of cycles, which is equal to bc = q − 1/dc .
Moreover, since dc = gcd(q − 1, nc) , we obtain gcd(bc, rc) = 1 . Hence,
taking into account the following equality sc = Rc(q − 1)/nc , which fol-
lows from (5) , we see that rc is some divisor of Rc and sc = vcbc , where
vc is equal to Rc/rc . The theorem is proved.
Corollary 7 . (Equidistance signs of the subset C , C ⊂ J , having
some minimal polynomial of the form (4) .)
All vectors of the subset C , C ⊂ J , having some minimal polynomial
c(x) of order nc = lcm(n1, n2, ..., nk) , ni 6= q
mi − 1 , 1 ≤ i ≤ k , k > 1 ,
have the same weight if at least one of the following conditions holds: 1.
sc = bc , 1 < bc ≤ q − 1 , 2. rc = Rc , 3. gcd(sc, Rc) = 1 .
Corollary 8 . The order of the reducible factor c(x) of the polynomial
xn−1 over GF (q) , having some degree m , is some divisor of the number
qm− 1 , if the decomposition of the polynomial c(x) into prime multiples
does not contain any primitive polynomial.
Remark 8 . Note that corollaries (6) and (8) show us in what cases
corollary 3.4 from [8] takes place for some reducible polynomial of the
degree m over GF (q) .
Also, consider some cyclic code, having the following parity-check
polynomial
h(x) =
hi(x), (18)
where hi(x) is an irreducible polynomial over GF (q) , q > 2 , of degree
mi and of order ni = (q
mi − 1)/bi , bi|q − 1 , 1 ≤ bi ≤ q − 1 , 1 ≤ i ≤ t ,
gcd(mi, mj) = 1 and gcd(bi, bj) = 1 , provided i 6= j , 1 ≤ i, j ≤ t , so
that the order n of h(x) equals n = lcm(n1, n2, ..., nt) .
It is worth mentioning that in [14] and [15] the following cases of
polynomial (18) have been considered. Namely t = 1 and t = 2 , and
besides that, with some additional restrictions, which can be omitted.
Also, some particular case of polynomial (18) , that is provided bi = 1
for all i , 1 ≤ i ≤ t , was obtained in [16]. But there are some unnecessary
restrictions in this paper too. Also, there are some mistakes in that paper.
Namely, the order of the product for some two polynomials from (18)
was defined incorrectly in [16].
Finally, using the results obtained above, we have found the minimal
distance of code, having the parity-check polynomial (18) , (see [17]).
Denote by M the set of degrees for the polynomials hi(x) , 1 ≤ i ≤ t ,
from the eq. (18) that is M = (m1, m2, ..., mt) . Let the number m
denote the degree of polynomial h(x) , i. e., m = m1+m2+ ...+mt . By
Mj,k , 1 ≤ j ≤ C
t , 1 ≤ k ≤ t − 1 , denote j
,s k -subset of M , where
Ckt is the binomial coefficient.
Thus Mj,k = (mj1, mj2 , ..., mjk) , 1 ≤ j ≤ C
t , 1 ≤ k ≤ t − 1 . At
last, denote by mj,k the sum of degrees, from the subset Mj,k , so that
mj,k = mj1 +mj2 + ... +mjk , 1 ≤ j ≤ C
t , 1 ≤ k ≤ t− 1 . It is obvious
that 1 ≤ mjk < m . Let us remark that provided k = 1 the number
mj,k = mj,1 = mj and 1 ≤ j ≤ t because C
t = t .
Theorem 5 .The minimal distance of code, having the parity-check
polynomial (18) , has the following form
dmin = q
m−1 −
qmj,1−1 −−
qmj,2−1 − ...−−
qmj,t−1−1, t ≥ 2,
dmin = q
m−1(q − 1)/b, t = 1, 1 ≤ b ≤ q − 1,
dmin = q
m−1(q − 1), t = 1, b = 1.
In conclusion I should like to express my sincere gratitude to L.A.Bassaligo,
M.I. Boguslavskii and E.T. Akhmedov for helping me to correct some
mistakes in the original version of my paper.
References
[1] Van der Waerden B.L. Algebra, vol. 1 , 1976 .
[2] Macwilliams F. J., Sloane N. J. A. The theory of error-correcting
codes, North-Holland, Amsterdam, Sixth Printing, 1988 .
[3] Nili H. Matrixschaltungen zur codierung und decodierung von Grup-
pen codes, Archiv der elektrishen ubertragung, Band 18 , Het. 9 ,
1964 , S. 555− 564 .
[4] Macwilliams F.J. The structure and properties of binary cyclic al-
phabets, Bell System Tech. J., 44 , 303− 332 , 1965 .
[5] Mullayeva Iren I. Dependence between cyclic structure of ide-
al J , J ⊂ An and proportionality classes of algebra An =
GF (q)[x]/(Xn−1) , AZNIINTI, Deposited scient. works, N 1(5) , p.
15 , Baku, Azerb., 1994 .
[6] Mullayeva I.I. Interdependence between cyclic and weight structure
of codes over GF(q) and its classes of proportyonality. RAS, Prob-
lems of Information Transmition, v. 41 , No. 2 , 2005 .
[7] Peterson W. W., Weldon E.J. Error - correcting codes, 2 nd, ed.,
M.I.T. Press, Cambridge, Mass., 1972 .
[8] Lidl R., Niderreiter H. Finite fields, Addison-Wesley Publ. Com.,
Massachusetts, 1983 .
[9] Kurosch. Higher algebra, 1962 .
[10] Elspas B. The theory of autonomous linear sequential networks, IRE
Transactions on circuit theory, CT- 6 , N 1 , March ( 1959 ), 45−60 .
[11] Richard E. Blahut. Theory and practice of error control codes,
Addison-Wesley Publ. Com., Massachusetts, 1984 .
[12] Zierler N. Linear recurring sequences, J. Soc. Ind. Appl. Math., 7
( 1959 ), N 1 , 45− 60 .
[13] Oganesyan S., Tairyan V., Yagdzyan V. Decomposition of cyclic
codes into equel-weight classes, Probl. of Control and Inform. Th.,
3 ( 1974 ), N 2 , 117− 125 .
[14] Clark W. E. Equidistant cyclic codes over GF(q), Discrete Math.,
17 ( 1977 ), N 2 , 139− 141 .
[15] Oganesyan S.,Yagdzyan V., Tairyan V. On a class of optimal cyclic
codes, Proc. 2 nd Int. Symp. on Inform. Theory, Tsahkadzor, Arme-
nia, USSR, 1971 , Budapest: Acad. Kiado, 1973 , 219− 224 .
[16] Tenengolts G. M. A new class of cyclic error-corecting codes, Proc.
3 nd Confr. on Coding and Transm. Inform., Soviet Un., Ugegorod,
1967 , 18− 28 .
[17] Mullayeva Iren I. Equidistance criterion of cyclic subset for ideal J ,
of algebra An = GF (q)[x]/(x
n−1) , consisting of polynomials mod-
ulo xn − 1 over Galois field GF (q) , q > 2 , AZNIINTI, Deposited
scient. works, N2(6) , p. 16 , Baku, Azerbaijan, 1994 .
ABSTRACT
  The interrelation between the cyclic structure of an ideal, i.e., a cyclic
code over Galois field $GF(q)$, $q>2$, and its classes of proportional elements
is considered. This relation is used in order to define the code's weight
structure. The equidistance conditions of irreducible nonprimitive codes over
GF(q) are given. Besides that, the minimum distance for some class of
nonprimitive cyclic codes is found.

<|endoftext|><|startoftext|>
Simplifying additivity problems using direct sum constructions
Motohisa Fukuda1, Michael M. Wolf2
1Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge
2 Max-Planck-Institute for Quantum Optics, Hans-Kopfermann-Str. 1, D-85748 Garching, Germany.
(Dated: August 20, 2021)
We study the additivity problems for the classical capacity of quantum channels, the minimal output entropy
and its convex closure. We show for each of them that additivity for arbitrary pairs of channels holds iff it holds
for arbitrary equal pairs, which in turn can be taken to be unital. In a similar sense, weak additivity is shown to
imply strong additivity for any convex entanglement monotone. The implications are obtained by considering
direct sums of channels (or states) for which we show how to obtain several information theoretic quantities
from their values on the summands. This provides a simple and general tool for lifting additivity results.
I. INTRODUCTION
A central question in classical and quantum information
theory is, how much information can be transmitted through
a given noisy channel. For classical channels the maximal
asymptotically achievable rate—the capacity—was derived in
the seminal work of Shannon [1]. For quantum channels,
however, the matter is complicated by the existence of entan-
glement and the possibility of exploiting it in the encoding to
protect information against decoherence. If one excludes this
possibility, a capacity formula for the transmission of clas-
sical information through quantum channels was proven by
Holevo [2] and Schumacher and Westmoreland [3] (HSW).
Since then, considerable effort was devoted to the question
whether (or in which cases) entangled inputs can lead to rates
beyond the HSW capacity. This issue—the additivity problem
for the HSW capacity—is still undecided, although for sev-
eral classes of channels additivity has been shown to be true,
i.e., entanglement does not seem to help in any case (see, e.g.,
[4, 5, 6, 7] and references therein). Instead, other additiv-
ity problems appeared which are similar in spirit but concern
very different quantities like the minimal output entropy and
the entanglement of formation, an entanglement measure for
bipartite states for which in addition strong super-additivity
has been conjectured.
A major conceptional insight was then gained in [8, 9, 10,
11] where it was shown that all these additivity problems are
globally equivalent in the sense that if additivity holds for one
of these quantities in general, then it does so for all of them.
Here ‘in general’ means that it has to be true for arbitrary pairs
of channels (or states), a condition we will call strong additiv-
In this work we present a further conceptional simplifica-
tion of these and related additivity problems. We show that
strong additivity is implied by weak additivity, meaning ad-
ditivity for arbitrary pairs of equal channels or states. More-
over, based on [12] we argue that it suffices to consider pairs
of identical unital channels only. This observation may be
a small step on a notorious path but it might guide future
research as it for instance underlines recent attempts to un-
derstand the asymptotic structure of tensor powers of unital
channels [13]. Moreover, one may think of other additivity
questions than the ones stated above for which our techniques
could be of use. In particular, we think of regularized quanti-
ties (like quantum capacities (cf. [14, 15]) or certain entangle-
ment measures) for which weak additivity holds by definition.
Our main tool is the use of direct sums of channels or
states. For the latter case similar constructions appeared in
[16, 17, 18]. We begin with a discussion of direct sum chan-
nels. This will contain more than what is needed for the sub-
sequent additivity results as we think that these tools might be
of independent interest.
II. DIRECT SUMS OF QUANTUM CHANNELS
We consider direct sums of channels, i.e., completely posi-
tive and trace preserving maps of the form T = ⊕iTi, where
each Ti is a channel in its own right. Our aim in this section is
to express information theoretic functionals of T in terms of
their values for the Ti’s. The definition of the quantities ap-
pearing in the following proposition will be given in the proof.
Proposition 1 (Direct sums) Consider a direct sum T =
⊕ni=1Ti, n ∈ N of arbitrary finite dimensional channels. Then
1. Minimal output α-Renyi entropy (α ≥ 1):
Smin,α
= min
Smin,α(Ti)
, (1)
2. Coherent information:
= max
J(Ti)
, (2)
3. Mutual information:
= max
S({λi}) +
λiI(Ti), (3)
= log
2I(Ti), (4)
where {λi} is a probability distribution and S({λi}) its
entropy.
4. HSW capacity:
= max
S({λi}) +
λiχ(Ti) (5)
= log
2χ(Ti). (6)
http://arxiv.org/abs/0704.1092v2
Remark: Let us briefly comment on the interpretation of
the above formulas. Concerning the HSW capacity, classical
information can either be sent through the channelsTi or it can
be encoded in the choice of blocks i = 1, . . . , n. Eq.(5) shows
exactly the competition between these two ways of communi-
cating classical information. For the quantum mutual infor-
mation, which gives the entanglement assisted capacity [19],
we obtain the same interpretation (note that the ’2’ comes
from the fact that we take log in base 2). The coherent infor-
mation is related (via regularization) to the quantum capacity
[20]. In this case encoding information in the choice of blocks
is not possible—this would be purely classical as all the co-
herences get lost. Similarly, for the minimal output entropies
the minimum is obtained by putting all the weight into the
least noisy channel.
Proof. 1. The α-Renyi entropy is defined as
Sα(ρ) =
log tr[ρα] =
log ‖ρ‖α, (7)
for 0 ≤ α ≤ ∞. Here ‖ · ‖p is the Schatten p-norm.
When α = 1 the functional is defined by its limit which is
the minimal output entropy Smin(T ) = infρ S(T (ρ)) with
S(ρ) = −trρ log ρ the von Neumann entropy. Let us consider
this case first.
As the direct sum ⊕iTi erases the off-diagonal blocks so
that all possible outputs can be obtained upon block-diagonal
inputs we can restrict to ρ = ⊕iρ̃i. Here ρ̃i is not necessarily
normalized so that the weights tr[ρ̃i] =: λi form a probability
distribution. Writing ρi := ρ̃i/λi and using the concavity of
von Neumann entropy we get
⊕i Ti(λiρi)
λiS(Ti(ρi)). (8)
This leads to Eq.(1) when α = 1. For α > 1 the minimization
of Sα(T (ρ)) amounts to a maximization of ‖T (ρ)‖α and the
result follows from convexity of ‖ · ‖α in a similar way.
2. The coherent information is defined as
J(T ) = sup
T (ρ)
T ⊗ id(Ψ)
, (9)
where Ψ is a purification of ρ such that ρ = trBΨ. Since
T and T ⊗ id erase the off-diagonal blocks we can replace ρ
and Ψ by their diagonal blocks: ⊕iλiρi and ⊕iλiΨi. Here,
Ψi is an extension of ρi. Since the conditional entropy
S(ρAB)−S(ρA) is concave in ρAB [21] considering a convex
decomposition of each Ψi into pure states shows Eq.(2) in a
similar way as above.
3. The mutual information defined as
I(T ) = sup
S(ρ) + S
T (ρ)
T ⊗ id(Ψ)
is concave in ρ so the maximum will be achieved by a block
diagonal ρ = ⊕iλiρi for T = ⊕iTi. To see this, let V =
⊕j exp{2πij/n}Ij and average V kρV ∗k over k = 1, . . . , n.
Take a purification Ψ of ρ = ⊕iλiρi, and then replace Ψ by
its diagonal blocks: ⊕iλiΨi as before. However, each Ψi
is a purification of ρi in this case. Indeed, suppose ρi =
pij |ij〉〈ij|, where {|ij〉}j is an orthonormal basis in the
ith subspace. Then, Ψ is
λiλkpijpkl|ij〉〈kl| ⊗ |ij〉〈kl|,
and its ith diagonal block λiΨi is
pijpil|ij〉〈il| ⊗ |ij〉〈il| = λi|Ψi〉〈Ψi|.
Here, |Ψi〉 =
pij |ij〉 ⊗ |ij〉. Exploiting this together
with S(λρ) = λ(S(ρ) − logλ) then gives Eq.(3). Eq.(4) fol-
lows then from determining the optimal λi via Lagrange mul-
tipliers in the following way. The maximization problem of
S({λi})+
λici for a probability distribution {λi} amounts
then to maximizing
S({λi}) +
λici + Λ
λi − 1
, (11)
where Λ is the Lagrange multiplier. Taking partial derivatives
we obtain for extremal {λ̃i}:
− log λ̃i −
+ ci + Λ = 0 ∀i (12)
λ̃i − 1 = 0. (13)
Hence (12) shows ci − log λ̃i is a constant, say, C for ∀i, and
by (13) we get C = log
ci . Therefore
S({λ̃i}) +
λ̃ici =
λ̃iC = log
2ci . (14)
As this is lower bounded by mini[I(Ti)] it must be the maxi-
4. The HSW capacity χ is given by
χ(T ) = sup
S(T (ρ))−HT (ρ), (15)
where HT (ρ) = inf
pkρk=ρ
pkS(T (ρk)). (16)
Here, HT (ρ) is the convex closure of the output entropy; {pk}
is a probability distribution and ρk are density matrices. The
r.h.s. of Eq.(15) for a fixed average input state ρ is a constraint
HSW capacity which we will denote by χ(T, ρ). Since the
inputs can again be assumed to be block-diagonal we have:
χ (⊕iTi,⊕iλiρi) = S(⊕iλiTi(ρi))−
λiHTi(ρi)
= S({λi}) +
λiχ(Ti, ρi). (17)
The first equality is explained by the fact that since the von
Neumann entropy is concave there is an optimal decompo-
sition of ⊕iλiρi for which each state has its support in one
of the diagonal blocks. The second equality comes from
S(λiρi) = λi(S(ρi) − logλi). Taking the supremum over
all states {ρi} then leads to Eq.(5). Again, Eq.(6) is obtained
by using Lagrange multipliers as above. In fact, for unital
channels (6) has been obtained in [30].
III. SIMPLIFYING ADDITIVITY PROBLEMS
Let us now turn to the additivity conjectures and exploit
Prop.1 in order to show that in several cases weak additivity
(for equal channels or states) implies strong additivity (i.e., for
different ones).
Proposition 2 (Reduction for channels) The following (in-)
equalities hold for arbitrary pairs of different channels T1 and
T2 iff they hold for arbitrary equal pairs T1 = T2.
1. Smin,α(T1 ⊗ T2) = Smin,α(T1) + Smin,α(T2), for any
α ≥ 1.
2. χ(T1 ⊗ T2) = χ(T1) + χ(T2).
3. HT1⊗T2(ρ) ≥ HT1(ρ1) +HT2(ρ2) for all states ρ with
respective subsystems ρ1, ρ2.
4. HT1⊗T2(ρ1⊗ρ2) = HT1(ρ1)+HT2(ρ2) for all product
states ρ1 ⊗ ρ2.
Remark: The conjectured equality in 1. is the additiv-
ity of the minimal output entropy when α = 1 [22], and
it becomes the multiplicativity of maximal output p-norms
for p = α > 1. This was conjectured to be true for all
α ∈ [1,+∞] before a counterexample was found [23] ruling
out all values α > 4.79. The equation in 2. is the conjec-
tured additivity of the HSW capacity, which gives the clas-
sical capacity as long as entangled states are not allowed to
be used in the encoding [2, 3]. The additivity would show
that the HSW capacity itself is the unconstrained classical ca-
pacity of quantum channels. The conjectures 3. and 4. are
called strong superadditivity and additivity of the convex clo-
sure of the output entropy. When T1, T2 are partial traces they
become strong superadditivity and additivity of entanglement
of formation, respectively, which we discuss in greater detail
below.
We note that Prop.2 remains valid in the case where ‘arbi-
trary channels’ refers to a restricted set of channels which is
closed under direct sums and tensor products.
Proof. 1. Let σ1 and σ2 be optimal output states for T1 and
T2 respectively. Then, form the following two channels:
T ′1(ρ) = T1(ρ)⊗ σ2, T ′2(ρ) = σ1 ⊗ T2(ρ). (18)
It is not difficult to see that T1 ⊗ T2 and T ′1 ⊗ T ′2 share the
additivity property. Hence we can assume that T1 and T2 have
the same optimal output: Smin,α(T1) = Smin,α(T2). If we
apply first weak additivity and then Prop. 1.1. we obtain:
Smin,α(((T1 ⊕ T2)⊗ (T1 ⊕ T2))) (19)
= 2Smin,α(T1 ⊕ T2) = Smin,α(T1) + Smin,α(T2). (20)
On the other hand, if we first apply Prop. 1.1. and then weak
additivity, we obtain that (19) and thus (20) is upper bounded
by Smin,α(T1 ⊗ T2). The converse inequality is trivial.
2. Consider
T1 ⊕ T2
T1 ⊕ T2
= 2 log
2χ(T1) + 2χ(T2)
= log
22χ(T1) + 22χ(T2) + 2χ(T1)+χ(T2)+1
This follows from first applying weak additivity and then the
proposition 1.4. On the other hand, applying them in reverse
order we have
T1 ⊕ T2
= log
2χ(T1⊗T1) + 2χ(T2⊗T2) + 2 · 2χ(T1⊗T2)
= log
22χ(T1) + 22χ(T2) + 2χ(T1⊗T2)+1
Together they prove the claimed equality.
For 3. we obtain by weak superadditivity,
HT1⊗T2(ρ) = H(T1⊕T2)⊗(T1⊕T2)(0⊕ ρ⊕ 0⊕ 0)
≥ HT1⊕T2(ρ1 ⊕ 0) +HT1⊕T2(0⊕ ρ2)
= HT1(ρ1) +HT2(ρ2). (21)
Here, ρ1, ρ2 are reduced states of ρ. This proves 3. and the
statement 4. follows in a similar way when replacing ρ by a
product state.
Proposition 3 (Unital channels) Proving one of the conjec-
tures in proposition 2 for all pairs of identical unital channels
would show the conjecture is true for arbitrary channels.
Proof. In [12] a unital channel T̃ is constructed for a given
channel T so that these two channels T̃ and T share the fol-
lowing additivity properties: additivity of minimal output α-
Renyi entropy, and strong superadditivity and additivity of the
convex closure of the output entropy. Hence these conjectures
can be restricted to products T̃1 ⊗ T̃2 for all channels T1, T2.
As for the HSW, we have the same reduction but for a differ-
ent reason (See the remark below). Finally, for the above two
unital channels T̃1, T̃2 we can construct the direct sum T̃1⊕ T̃2
which is again a unital channel. Then the result follows from
the proof of proposition 2.
Remark: We explain local relation between minimal out-
put entropy and the HSW capacity, which was implicitly writ-
ten but not clear in [12]. Since the unital extension T̃ sort of
mixes up outputs of T we have the following formula.
χ(T̃1 ⊗ T̃2) = log d1d2 − Smin(T̃1 ⊗ T̃2), (22)
where d1, d2 are the dimensions of the output spaces of T̃1
and T̃2 respectively. Hence the additivity of HSW capacity
is equivalent to the additivity of the minimal output entropy
for products of those extensions T̃1 ⊗ T̃2 by Eq.(22). Hence
the additivity conjecture of the HSW capacity can also be re-
stricted to products T̃1 ⊗ T̃2 for all channels T1, T2 by using
global equivalence [8, 9, 10, 11].
Finally, we will discuss additivity issues for entanglement
measures. The one already mentioned is the entanglement of
formation which was introduced in [24]. Since then the fol-
lowing conjectures have been considered:
EF (ρ) ≥ EF (ρ1) + EF (ρ2) (23)
EF (ρ1 ⊗ ρ2) = EF (ρ1) + EF (ρ2). (24)
In fact, both are again globally equivalent to the additivity of
the HSW capacity and the minimal output entropy. Moreover,
additivity would imply that EF equals an important opera-
tionally defined entanglement measure, the entanglement cost
Ec, since Ec(ρ) = limn→∞ EF (ρ
⊗n) [25].
The entanglement of formationEF (ρ) is the convex closure
of output entropy HT (ρ) when T is a partial trace.
Following a similar strategy as above we will now show
that strong additivity in the sense of Eq.(24) is again implied
by weak additivity (i.e., Eq.(24) with ρ1 = ρ2). In fact, this
will not only hold for EF but for any convex entanglement
monotone [24, 26, 27]. The main reason behind is that every
such functional satisfies [28]:
f(⊕iλiρi) =
λif(ρi), (25)
where {λi} is a probability distribution and ρi are states as
before.
Proposition 4 (Convex entanglement monotones) [29]
Suppose f is a convex entanglement monotone which is
weakly additive, i.e., f(ρ1 ⊗ ρ2) = f(ρ1) + f(ρ2) for all
ρ1 = ρ2. Then f is strongly additive in the sense that this
holds also for all ρ1 6= ρ2.
Proof. Let ρ = 1
(ρ1 ⊕ ρ2). Then
f(ρ⊗ ρ) = 2f(ρ) = f(ρ1) + f(ρ2). (26)
Here, we applied the weak additivity and then (25). Applying
them in reverse order we get
f(ρ⊗ ρ) =
i,j=1
f(ρi ⊗ ρj)
(f(ρ1) + f(ρ2) + f(ρ1 ⊗ ρ2)). (27)
Using similar ideas, it has recently been shown that for reg-
ularized entanglement measures like Ec or the asymptotic rel-
ative entropy of entanglement, monotonicity (i.e., essentially
Eq.(25)) and strong additivity are equivalent [18].
Acknowledgement M.F. would like to thank his supervisor
Y.M.Suhov for constant encouragement and numerous discus-
sions. M.W. thanks K.G. Vollbrecht for discussions and J.I.
Cirac for support. Both authors thank M. B. Ruskai for bring-
ing [30] to their attention.
[1] C. E. Shannon, “ A Mathematical Theory of Communication”,
Bell System Technical Journal, 27 379E23 and 623E56 (1948).
[2] A. S. Holevo “The capacity of the quantum channel with gen-
eral signal states”, IEEE Trans. Info. Theory, 44, 269–273,
(1998).
[3] B. Schumacher and M. D. Westmoreland, “Sending classical in-
formation via noisy quantum channels”, Phys. Rev. A, 56, 131–
138, (1997).
[4] M.M. Wolf, J. Eisert, “Classical information capacity of a
class of quantum channels ”, New J. Phys. 7, 93 (2005);
quant-ph/0412133
[5] C. King, “The capacity of the quantum depolarizing channel ”,
IEEE Trans. Inf. Theo. 49, 221 (2003); quant-ph/0204172
[6] C. King, “Additivity for unital qubit channels”, J. Math. Phys.
43, 4641–4653, (2002).
[7] P.W. Shor, “Additivity of the Classical Capacity of
Entanglement-Breaking Quantum Channels”, J. Math. Phys.
43, 4334 (2002); quant-ph/0201149
[8] P.W. Shor, ”Equivalence of Additivity Questions in Quantum
Information Theory”, Comm. Math. Phys., 246, Issue 3, 453–
472 (2004); quant-ph/0305035.
[9] A.A. Pomeransky, ”Strong superadditivity of the entanglement
of formation follows from its additivity”, Phys. Rev. A 68,
032317 (2003); quant-ph/0305056.
[10] K.M.R. Audenaert and S.L. Braunstein, ”On Strong Subaddi-
tivity of the Entanglement of Formation”, Comm. Math. Phys.
246 No 3, 443–452, (2004); quant-ph/0303045.
[11] K. Matsumoto, T. Shimono, and A. Winter, “Remarks on ad-
ditivity of the Holevo channel capacity and of the entangle-
ment of formation ”, Commun. Math. Phys. 246, 437 (2004);
quant-ph/0206148.
[12] M. Fukuda, “Simplification of additivity conjecture in quantum
information theory”, quant-ph/0608010.
[13] J.A. Smolin, F. Verstraete, A. Winter, “Entanglement of as-
sistance and multipartite state distillation”, Phys. Rev. A 72,
052317 (2005); quant-ph/0505038.
[14] M. Wolf, D. Perez-Garcia, “Quantum Capacities of Channels
with small Environment ”, Phys. Rev. A 75, 012303 (2007);
quant-ph/0607070.
[15] G. Smith, J.A. Smolin, A. Winter, “The quantum capacity with
symmetric side channels ”, quant-ph/0607039.
[16] P.W. Shor, J.A. Smolin, B.M. Terhal, “Nonadditivity of Bi-
partite Distillable Entanglement follows from Conjecture on
Bound Entangled Werner States ”, Phys. Rev. Lett. 86, 2681
(2001); quant-ph/0010054.
[17] K.G.H. Vollbrecht, R.F. Werner, M.M. Wolf, “On the irre-
versibility of entanglement distillation ”, Phys. Rev. A 69,
062304 (2004); quant-ph/0301072.
[18] F.G.S.L. Brandao, M. Horodecki, M.B. Plenio, S. Virmani,
“Remarks on the equivalence of full additivity and monotonic-
ity for the entanglement cost”, quant-ph/0702136.
[19] C. H. Bennett, P.W. Shor, J. A. Smolin, and A. V.
Thapliyal,“Entanglement-assisted capacity of a quantum chan-
nel and the reverse Shannon theorem”, IEEE Trans. Inf. Theory
48, 2637 (2002); quant-ph/0106052.
[20] P.W. Shor, “The quantum channel capacity and coherent infor-
mation”, lecture notes, MSRI Workshop on Quantum Compu-
tation (2002); I. Devetak, IEEE Trans. Inf. Th. 51, 44 (2005);
S. Lloyd, Phys. Rev. A 55, 1613 (1997).
[21] M.B. Ruskai, “Inequalities for Quantum Entropy: A Review
with Conditions for Equality ”, J. Math. Phys. 43, 4358 (2002);
erratum 46, 019901 (2005); quant-ph/0205064.
[22] C. King and M. B. Ruskai, “Minimal entropy of states emerging
from noisy quantum channels”, IEEE Trans. Info. Theory, 47,
192-209 (2001); quant-ph/9911079.
[23] R.F. Werner and A.S. Holevo, “Counterexample to an ad-
http://arxiv.org/abs/quant-ph/0412133
http://arxiv.org/abs/quant-ph/0204172
http://arxiv.org/abs/quant-ph/0201149
http://arxiv.org/abs/quant-ph/0305035
http://arxiv.org/abs/quant-ph/0305056
http://arxiv.org/abs/quant-ph/0303045
http://arxiv.org/abs/quant-ph/0206148
http://arxiv.org/abs/quant-ph/0608010
http://arxiv.org/abs/quant-ph/0505038
http://arxiv.org/abs/quant-ph/0607070
http://arxiv.org/abs/quant-ph/0607039
http://arxiv.org/abs/quant-ph/0010054
http://arxiv.org/abs/quant-ph/0301072
http://arxiv.org/abs/quant-ph/0702136
http://arxiv.org/abs/quant-ph/0106052
http://arxiv.org/abs/quant-ph/0205064
http://arxiv.org/abs/quant-ph/9911079
ditivity conjecture for output purity of quantum channels”,
quant-ph/0203003.
[24] C. H. Bennet, D. P. DiVincenzo, J. A. Smolin, W. K. Woot-
ters, “Mixed-state entanglement and quantum error correction”,
Phys. Rev. A, 54, 3824–3851, (1996); quant-ph/9604024.
[25] P.M. Hayden, M. Horodecki, B.M. Terhal, “The asymptotic en-
tanglement cost of preparing a quantum state ”, J. Phys. A:
Math. Gen. 34, 6891 (2001); quant-ph/0008134.
[26] G. Vidal, J. Mod. Opt., 47, 355 (2000); quant-ph/9807077.
[27] V. Vedral and M. B. Plenio, Phys. Rev. A, 57, 1619 (1998);
quant-ph/9707035.
[28] M. Horodecki, “Simplifying monotonicity conditions for en-
tanglement measures”, Open Syst. Inf. Dyn., 12, 231 (2005),
quant-ph/0412210.
[29] This result is implicit in the recent work [18]. However, it has
already been proven and been communicated by one of us at the
ERATO conference, Kyoto 2003 (unpublished).
[30] E. Stormer, “A reduction theorem for capacity of positive
maps”, quant-ph/0510040.
http://arxiv.org/abs/quant-ph/0203003
http://arxiv.org/abs/quant-ph/9604024
http://arxiv.org/abs/quant-ph/0008134
http://arxiv.org/abs/quant-ph/9807077
http://arxiv.org/abs/quant-ph/9707035
http://arxiv.org/abs/quant-ph/0412210
http://arxiv.org/abs/quant-ph/0510040
ABSTRACT
  We study the additivity problems for the classical capacity of quantum
channels, the minimal output entropy and its convex closure. We show for each
of them that additivity for arbitrary pairs of channels holds iff it holds for
arbitrary equal pairs, which in turn can be taken to be unital. In a similar
sense, weak additivity is shown to imply strong additivity for any convex
entanglement monotone. The implications are obtained by considering direct sums
of channels (or states) for which we show how to obtain several information
theoretic quantities from their values on the summands. This provides a simple
and general tool for lifting additivity results.

<|endoftext|><|startoftext|>
Modulational instability in nonlocal Kerr-type media with random parameters
E.V. Doktorov∗ and M.A. Molchan†
B.I. Stepanov Institute of Physics, 68 F. Skaryna Ave., 220072 Minsk, Belarus
Modulational instability of continuous waves in nonlocal focusing and defocusing Kerr media with
stochastically varying diffraction (dispersion) and nonlinearity coefficients is studied both analyt-
ically and numerically. It is shown that nonlocality with the sign-definite Fourier images of the
medium response functions suppresses considerably the growth rate peak and bandwidth of insta-
bility caused by stochasticity. Contrary, nonlocality can enhance modulational instability growth
for a response function with negative-sign bands.
PACS numbers: 42.25.Dd, 42.70.-a, 42.65.Jx
I. INTRODUCTION
Modulational instability (MI) in nonlinear media is a
destabilization mechanism which produces a self-induced
breakup of an initially continuous wave into localized
(solitary wave) structures. This phenomenon was pre-
dicted in plasma [1, 2], nonlinear optics [3, 4], fluids [5]
and atomic Bose-Einstein condensates [6, 7, 8]. MI of
continuous waves can be used to generate ultra-high
repetition-rate trains of soliton-like pulses [9, 10, 11]. It is
common knowledge that MI is absent in the defocusing
Kerr medium and presents as the long-wave instability
with a finite bandwidth in the focusing Kerr medium [12].
The above results were obtained for media with de-
terministic parameters. Contrary, in realistic media the
characteristic parameters are not constants, as a rule,
but fluctuate randomly around their mean values. It was
shown in the setting of nonlinear optics that stochastic
inhomogeneities in a Kerr-type medium extend the do-
main of MI of continuous waves, as compared with deter-
ministic systems, over the whole spectrum of modulation
wavenumbers, even for the defocusing regime [13, 14, 15].
A comprehensive review of MI of electromagnetic waves
in inhomogeneous and in discrete media is given in
Ref. [16].
Another important aspect of a class of realistic nonlin-
ear media is concerned with their nonlocality. Nonlocal-
ity is typically a result of underlying transport processes
such as heat conduction in thermal nonlinear media [17],
diffusion of atoms in a gas [18], long-range electrostatic
interaction in liquid crystals [19], charge carrier transfer
in photorefractive crystals [20, 21], and many-body in-
teraction in Bose-Einstein condensates [22]. Nonlocality
can prevent the collapse of self-focused beams [23, 24] and
dramatically alter interaction between dark solitons [25].
MI in deterministic nonlocal Kerr-type media was stud-
ied in Refs. [26, 27], and it was shown that nonlocality
does not produce MI in the defocusing case for small and
moderate values of the product “modulation amplitude
∗Electronic address: doktorov@dragon.bas-net.by
†Electronic address: m.moltschan@dragon.bas-net.by
× nonlocality parameter”.
In the present paper we unite the two above lines of
study of nonlinear media and analyze MI in nonlocal me-
dia with stochastic parameters. Since nonlocality spreads
out localized excitations, it is reasonable to expect a par-
tial suppression of the stochasticity-induced MI gain. In-
deed, we demonstrate that the aforementioned situation
with MI in local stochastic media with the sign-definite
Fourier images of the response functions changes drasti-
cally, if nonlocality is taken into account. Namely, both
the growth rate peaks and bandwidths of instability are
considerably decreased. On the other hand, there can be
an “anomalous” behavior of nonlocality when the Fourier
image of the response function of a nonlocal medium al-
lows for sign-negative bands. In this case the MI gain of
a nonlocal medium can exceed that of a local stochastic
medium for some values of the modulation wavenumber.
We adopt the nonlocal nonlinear Schrödinger equation
with random coefficients as a model to reveal peculiar-
ities of MI of continuous waves. The results obtained
are illustrated by the white noise model for parameter
fluctuations and by response functions of several types.
II. MODEL
The propagation of an optical beam along the z axis in
a nonlocal medium with random parameters is governed
by the nonlinear Schrödinger equation
iuz +
d(z)uxx + g(z)u
dx′R(x− x′)|u|2(x′, z) = 0.
(2.1)
Here x is the transverse coordinate, u(x, z) is the complex
envelope amplitude and we use the standard dimension-
less variables. The group velocity dispersion (or diffrac-
tion) coefficient d(z) and nonlinearity coefficient g(z) are
considered as stochastic functions which fluctuate around
their mean values d0 (d0 > 0) and g0 (g0 ≷ 0):
d(z) = d0(1 +md(z)), g(z) = g0(1 +mg(z)). (2.2)
Here md and mg are independent zero-mean random pro-
cesses of the Gaussian white-noise type,
〈md〉 = 〈mg〉 = 0, 〈md(z)md(z′)〉 = 2σ2dδ(z − z′),
http://arxiv.org/abs/0704.1093v1
mailto:doktorov@dragon.bas-net.by
mailto:m.moltschan@dragon.bas-net.by
〈mg(z)mg(z′)〉 = 2σ2gδ(z − z′)〉,
and the angle brackets stand for the expectation with
respect to the distribution of the processes md(z) and
mg(z). The integral in equation (2.1) represents the field-
intensity dependent change of the refractive index char-
acterized by the normalized symmetric response function
R(x),
dxR(x) = 1. The delta-function response
function R(x) = δ(x) corresponds to the local limit of
the model. We will discriminate between the focusing
(g0 > 0) and defocusing (g0 < 0) media.
Eq. (2.1) possesses the homogeneous plane wave solu-
u0 = A exp
dz′g(z′)
, (2.3)
where A is a real amplitude. Now we perform the linear
stability analysis of the solution (2.3). Assume that
u(x, z) = (A+ v(x, z)) exp
dz′g(z′)
(2.4)
is a perturbed solution of Eq. (2.1) with v(x, z) being a
small complex modulation. Substituting Eq. (2.4) into
Eq. (2.1) and linearizing about the plane wave (2.3), we
get a linear equation for v(x, z):
ivz +
d(z)vxx + 2g(z)A
dx′R(x− x′)Re v(x′, z) = 0.
(2.5)
After decomposing v into real and imaginary parts, v =
r(x, z) + is(x, z), and performing the Fourier transforms
ρ(k, z) =
dx r(x, z)eikx,
σ(k, z) =
dx s(x, z)eikx,
R̂(k) =
dxR(x)eikx,
Eq. (2.5) is converted to a system of linear equations for
ρ and σ:
d(z)k2
d(z)k2 + 2g(z)A2R̂ 0
(2.6)
If we were deal with the deterministic system with the
parameters d0 and g0, Eq. (2.6) would be the main object
to study MI [26]. However, MI induced by the random
fluctuations is not captured by the analysis of the first
moments 〈ρ〉 and 〈σ〉 [15], and it is necessary to compute
the modulational intensity growth given by the higher-
order moments.
III. THE SECOND-ORDER MOMENT MI GAIN
We consider the second moments 〈ρ2〉, 〈ρσ〉 and 〈σ2〉
as constituents of the column vector
X(2) =
〈ρ2〉, 〈ρσ〉, 〈σ2〉
. (3.1)
The moment 〈ρσ〉 is added to close the equations for
the second-order moments. Then we should calculate z-
evolution of the vector X(2). Its first component gives
〈ρ2〉 = 2〈ρzρ〉 = d0k2〈ρσ〉 + d0k2〈md(z)ρσ〉,
in accordance with Eqs. (2.2) and (2.6). For decoupling
of the mean 〈md(z)ρσ〉 we apply the Furutsu-Novikov
formula [28, 29]
〈md(z)ρσ〉 =
dyσ2dB(z − y)
. (3.2)
Here B(z − y) = δ(z − y) for the white-noise Gaussian
random process, while the functional derivative (δ/δmd)
is calculated from Eq. (2.6). Indeed, writing ρ(z) as the
integral ρ(z) = (1/2)k2
dyd(y)σ(y) (and the similar
integral for σ(z)) and accounting for the explicit repre-
sentation (2.2) of d(z) in terms of md gives
δ(ρσ)
σ + ρ
2(σ2 − ρ2).
Therefore, 〈md(z)ρσ〉 = (1/2)σ2dd0k2(〈σ2〉 − 〈ρ2〉) and
finally
〈ρ2〉 = d0k2〈ρσ〉+
4(〈σ2〉 − 〈ρ2〉).
Just in the same way we can calculate z-derivatives of
the other components of the vector X(2). As a result, we
obtain the evolution equation (d/dz)X(2) = M (2)X(2)
with the 3× 3 matrix M (2) of the form
M (2) =
4 d0k
2R̂ − 1
2 −σ2dd20k4
16σ2gg
4R̂2 + σ2dd
2R̂− 1
. (3.3)
Eigenvalues of M (2) with positive real parts lead to in-
stabilities, and the largest positive value determines the
MI gain G2(k). The eigenvalues λj are easily found from
Eq. (3.3) but they are too cumbersome to be reproduced
here explicitly. Below we separately analyze the cases of
the defocusing (g0 < 0) and focusing (g0 > 0) nonlin-
earities. Following [26], we will use for illustration the
Gaussian response function
RG(x) =
, R̂G(k) = exp
(3.4)
and the exponential one
Re(x) =
, R̂e(k) =
1 + a2k2
(3.5)
as examples of the response functions with the sign-
definite Fourier images, as well as the rectangular re-
sponse function
Rr(x) =
for |x| ≤ a,
0 for |x| > a,
R̂r(k) =
sin(ak)
(3.6)
whose Fourier transform has negative-sign bands. Here a
is the nonlocality parameter, a → 0 means R(x) → δ(x)
and R̂(k) → 1.
A. Defocusing nonlinearity
For the defocusing nonlinearity g0 < 0 we obtain one
real eigenvalue λ1 and two complex conjugate ones λ2
and λ3. Numerical analysis shows that λ1 is positive for
all k2, while λ2 and λ3 have negative real parts for the
Gaussian and exponential response functions. Let us re-
mind that there is no MI for g0 < 0 for local deterministic
Kerr media, while randomness of the coefficients d(z) and
g(z) completely destroys stability of the continuous wave
solution. This situation considerably changes for nonlo-
cal media. Indeed, Fig. 1 clearly shows that nonlocality
with the sign-definite response functions suppresses both
the growth rate peak of G2(k) ≡ λ1 and MI bandwidth,
the latter being practically finite. When the nonlocality
parameter a grows, the suppression effect becomes more
pronounced. Somewhat different situation takes place
0 1 2 3 4
0 1 2 3 4
FIG. 1: Defocusing media. Plots of the MI gain G2(k) for
local stochastic medium (solid line), nonlocal stochastic media
with the Gaussian (dash-dotted line) and exponential (dashed
line) response functions. Here d0 = 2, |g0|A2 = 1, σ2d = σ2g =
0.1. Upper panel: a = 1; lower panel: a =
for the rectangular response function (3.6). For suffi-
ciently high nonlocality, MI gain maximum for a given
wavenumber k can exceed the corresponding value of G2
for a local random medium (Fig. 2). Besides, the MI
bandwidth becomes strictly finite in this limit.
B. Focusing nonlinearity
In the case of the focusing nonlinearity (g0 > 0) a local
deterministic medium produces the long-wave instability
with a finite bandwidth. Stochasticity of medium param-
eters extends the bandwidth to the whole spectrum of
modulation wavenumbers. Calculation of eigenvalues of
the matrix M2 (3.3) with g0 > 0 demonstrates that non-
locality suppresses the MI gain and bandwidth for media
with both sigh-definite (Fig. 3) and sign-indefinite (Fig.
0 1 2 3 4
FIG. 2: Defocusing media with the rectangular response func-
tion. Plots of the MI gain G2(k) for local stochastic medium
(solid line), nonlocal stochastic media with a = 2 (dashed
line), a = 6 (dash-dotted line), and a = 10 (dotted line).
Other parameters are the same as in Fig. 1.
0 1 2 3 4
0 1 2 3 4
FIG. 3: Focusing media. Plots of the MI gain G2(k) for a lo-
cal deterministic medium (solid line), local stochastic medium
(dotted line), nonlocal stochastic media with the Gaussian
(dash-dotted line) and exponential (dashed line) response
functions. Here d0 = 2, g0A
2 = 1, σ2d = 0.1, σ
g = 0.2.
Upper panel: a = 1; lower panel: a = 2.5.
4) response functions. Notice that stronger nonlocality
is needed for focusing media to achieve a reduction of
the MI gain, as compared with defocusing ones. Besides,
maximum positions of the MI gains shift toward smaller
wavenumbers k under nonlocality growth, producing fi-
nite bandwidth.
0 0.5 1 1.5 2 2.5 3
FIG. 4: Focusing media. Plots of the MI gain G2(k) for a local
deterministic medium (solid line), local stochastic medium
(dotted line), nonlocal stochastic media with the rectangular
response function: a = 2 (dash-dotted line), a = 8 (dashed
line). Other parameters are the same as in Fig. 3.
IV. HIGHER-ORDER MOMENTS
The second-order moments (3.1) do not provide an
analysis of the MI gain in stochastic media with sufficient
detail. In particular, it is important to see fluctuations
of the exponential growth of the modulation amplitude.
More deep insight into the problem demands to account
for higher-order moments
X(2n) =
< ρ(2n−j)σj >
, j = 0, . . . , 2n. (4.1)
In this section we study the interplay of nonlocality and
exponential growth of the higher momentsX(2n) in virtue
of stochasticity. As before, applying the Furutsu-Novikov
formula (3.2), we obtain a matrix M (2n) in the form
M (2n) = d0k
2A(2n) +
2R̂ (4.2)
B(2n) + d20k
4σ2dC
(2n) + 16g20A
4R̂2σ2gD
(2n).
Non-zero entries of the matrices A(2n), B(2n), C(2n) and
D(2n) are written as
j,j+1 = n−
j,j−1 = j; C
jj = −
(n+2nj−j2),
j,j+2 =
j + 1
, (4.3)
j,j−2 = D
j,j−2 =
j(j − 1), j = 0, . . . , 2n.
Then the maximal real part of roots of the characteristic
polynomial det |M (2n) − λI| will give nG2n(k). Since all
the matrix elements of M (2n) are real and the character-
istic polynomial is of the odd degree, at least one of the
eigenvalues of M (2n) is real and the others are mutually
complex conjugate. In what follows we will consider the
4-th and 6-th moments.
A. Defocusing nonlinearity
In Fig. 5 we show the results of calculating MI gains
G2, G4 and G6 for both the exponential and Gaussian re-
sponse functions and compare them with the same curves
for local stochastic media obtained in [15]. It is seen
that nonlocality suppresses the higher-order moments as
well. Notice that in defocusing media positions of MI
gain maxima for moments of different orders coincide [15]
(they are deterministic rather than random). Nonlocality
does not disturb this property. Fig. 6 demonstrate simi-
lar curves for the rectangular response function for differ-
ent values of the nonlocality parameter a. It is seen that
for sufficiently high a the medium demonstrates prac-
tically coinciding distributions of higher-moment growth
rates, their maxima being shifted to shorter wavelengths.
Evidently, higher-order moments for the rectangular re-
sponse function manifest the same “anomalous” enhance-
ment of the growth rate in a narrow region of modulation
wavenumbers, as compared with the local stochastic case.
0 1 2 3 4
0 1 2 3 4
FIG. 5: Defocusing media. Plots of the MI gains G6 (solid
line), G4 (dashed line), and G2 (dash-dotted line) for a lo-
cal stochastic medium (upper three curves) and for nonlocal
stochastic media (lower three curves). Here d0 = 2, a
2 = 1,
|g0|A2 = 1, σ2d = σ2g = 0.1. Upper panel: exponential re-
sponse function; lower panel: Gaussian response function.
B. Focusing nonlinearity
For the focusing media the higher-order MI gains
demonstrate much the same behavior as for the defo-
cusing ones for both sign-definite and sign-indefinite re-
0 1 2 3 4 5
0 1 2 3
0 0.5 1 1.5
FIG. 6: Defocusing nonlocal stochastic media with the rect-
angular response function. Plots of the MI gains G6 (solid
line), G4 (dashed line), and G2 (dash-dotted line) for d0 = 2,
|g0|A2 = 1, σ2d = σ2g = 0.1. Upper panel: a = 2; middle panel:
a = 6; lower panel: a = 8.
sponse functions. Fig. 7 shows the MI gains for local
and nonlocal stochastic media for the Gaussian response
function. Curves for the exponential and rectangular re-
sponse functions are qualitatively the same. With in-
creasing the nonlocality parameter a, curves for MI gains
of different orders become closer one the other, so high
nonlocality smoothes fluctuations of the modulation am-
plitude growth.
0 0.5 1 1.5 2 2.5 3
FIG. 7: Focusing media. Plots of the MI gains G6 (solid
line), G4 (dashed line), and G2 (dash-dotted line) for a lo-
cal stochastic medium (upper three curves) and for nonlo-
cal stochastic medium with the Gaussian response function
(lower three curves). Here d0 = 2, a = 2, g0A
2 = 1,
d = σ
g = 0.1.
V. CONCLUSION
Within the limits of the linear stability analysis, we
have investigated the MI of a homogeneous wave in a
nonlocal nonlinear Kerr-type medium with random pa-
rameters. For the case of the white-noise model of pa-
rameter fluctuations, we derived the equations which gov-
ern the dependence of the MI gain on the modulation
wavenumber. As was expected from physical motiva-
tions, nonlocality causes considerable suppression of the
stochasticity-induced MI growth rate for media with the
sign-definite Fourier images of the response functions. At
the same time, nonlocal media with the sign-indefinite
Fourier images of the response functions can display a
somewhat different behavior leading to an increase, as
compared with local media, of the MI gain for some do-
mains of modulation wavenumbers.
Acknowledgments
The authors are very grateful to F. Abdullaev and J.
Garnier for constructive comments.
[1] G.A. Askar’yan, Zh. Eksp. Teor. Fiz. 42, 1576 (1962)
[Sov. Phys. JETP 15, 1088 (1962)].
[2] T. Taniuti and H. Washini, Phys. Rev. Lett. 21, 209
(1968).
[3] V.I. Bespalov and V.I. Talanov, Pis’ma Zh. Eksp. Teor.
Fiz. 3, 471 (1966) [JETP Lett. 3, 307 (1966)].
[4] L.A. Ostrovskii, Zh. Eksp. Teor. Fiz. 51, 1189 (1966)
[Sov. Phys. JETP 24, 797 (1967)].
[5] T.B. Benjamin and J.E. Feir, J. Fluid Mech. 27, 417
(1967).
[6] B. Wu and Q. Niu, Phys. Rev. A 64, 061603 (2001).
[7] V.V. Konotop and M. Salerno, Phys. Rev. A 65,
021602(R) (2001).
[8] L. Salasnich, A. Parola, and L. Reatto, Phys. Rev. Lett.
91, 080405 (2003).
[9] K. Tai, A. Hasegawa, and A. Tomita, Phys. Rev. Lett.
56, 135 (1986).
[10] G. Millot, E. Seve, S. Wabnitz, and M. Haelterman, J.
Opt. Soc. Am. B 15, 1266 (1998).
[11] K.E. Steeker, G.B. Partridge, A.G. Truscott, and
R.J. Hulet, Nature 417, 150 (2002).
[12] Yu.S. Kivshar and G.P. Agraval, Optical Solitons: From
Fibers to Photonic Crystals, Academic Press, San Diego,
2003.
[13] F.Kh. Abdullaev, S.A. Darmanyan, S. Bishoff, and
H.P. Sørensen, J. Opt. Soc. Am. B 14, 27 (1997).
[14] M. Karlsson., J. Opt. Soc. Am. B 15, 2269 (1998).
[15] J. Garnier and F.Kh. Abdullaev, Physica D 145, 65
(2000).
[16] F.Kh. Abdullaev, S.A. Darmanyan, and J. Garnier,
Progr. Opt. 44, 303 (2002).
[17] A. Dreischuh, G.G. Paulus, F. Zacher, F. Grasbon, and
H. Walther, Phys. Rev. E 60, 6111 (1999).
[18] D. Suter and T. Blasberg, Phys. Rev. A 48, 4583 (1993).
[19] C. Conti, M. Peccianti, and G. Assanto, Phys. Rev. Lett.
91, 073901 (1993); 92,113902 (2004).
[20] M. Segev, B. Crosignani, A. Yariv, and B. Fischer, Phys.
Rev. Lett. 68, 923 (1992).
[21] G.C. Duree, J.L. Shultz, G.J. Salamo, M. Segev,
A. Yariv, B. Crosignani, P. Di Porto, E.J. Sharp, and
R.R. Neurgaonkar, Phys. Rev. Lett. 71, 533 (1993).
[22] V.M. Perez-Garcia, V.V. Konotop, and J.J. Garcia-
Ripoll, Phys. Rev. E 62, 4300 (2000).
[23] S.K. Turitsyn, Teor. Mat. Fiz. 64, 226 (1985) [Theor.
Math. Phys. 64 797 (1985)].
[24] O. Bang, W. Krolikowski, J. Wyller, and J.J. Rasmussen,
Phys. Rev. E 66, 046619 (2002).
[25] A. Dreischuh, D. Neshev, D.E. Peterson, O. Bang, and
W. Krolikowski, Phys. Rev. Lett. 96, 043901 (2006).
[26] W. Krolikowski, O. Bang, J.J. Rasmussen, and J. Wyller,
Phys. Rev. E 64, 016612 (2001).
[27] W. Krolikowski, O. Bang, N.J. Nikolov, D.N. Neshev,
J.J. Rasmussen, J. Wyller, and D. Edmundson, J. Opt.
B: Quantum Semiclass. Opt. 6, S288 (2004).
[28] K. Furutsu, J. Res. NBS, D-67, 303 (1983).
[29] E.A. Novikov, Sov. Phys. JETP 20, 1290 (1964).
ABSTRACT
  Modulational instability of continuous waves in nonlocal focusing and
defocusing Kerr media with stochastically varying diffraction (dispersion) and
nonlinearity coefficients is studied both analytically and numerically. It is
shown that nonlocality with the sign-definite Fourier images of the medium
response functions suppresses considerably the growth rate peak and bandwidth
of instability caused by stochasticity. Contrary, nonlocality can enhance
modulational instability growth for a response function with negative-sign
bands.

<|endoftext|><|startoftext|>
Draft version October 28, 2018
Preprint typeset using LATEX style emulateapj v. 04/21/05
SUBARU HDS OBSERVATIONS OF A BALMER-DOMINATED SHOCK IN TYCHO’S SUPERNOVA
REMNANT ∗
Jae-Joon Lee,
Bon-Chul Koo,
John Raymond,
Parviz Ghavamian,
Tae-Soo Pyo,
Akito Tajitsu,
Masahiko Hayashi
Draft version October 28, 2018
ABSTRACT
We present an Hα spectral observation of a Balmer-dominated shock on the eastern side of Tycho’s
supernova remnant using SUBARU Telescope. Utilizing the High Dispersion Spectrograph (HDS), we
measure the spatial variation of the line profile between preshock and postshock gas. Our observation
clearly shows a broadening and centroid shift of the narrow-component postshock Hα line relative to
the Hα emission from the preshock gas. The observation supports the existence of a thin precursor
where gas is heated and accelerated ahead of the shock. Furthermore, the spatial profile of the emission
ahead of the Balmer filament shows a gradual gradient in the Hα intensity and line width ahead of
the shock. We propose that this region (∼ 1016 cm) is likely to be the spatially resolved precursor.
The line width increases from ∼ 30 km s−1 up to ∼ 45 km s−1 and its central velocity shows a redshift
of ∼ 5 km s−1 across the shock front. The characteristics of the precursor are consistent with a cosmic
ray precursor, although a possibility of a fast neutral precursor is not ruled out.
Subject headings: ISM:supernova remnants – ISM: individual (Tycho, G120.1+1.4) – Shock Waves –
line: profiles
1. INTRODUCTION
Balmer-dominated filaments are the signature of
non-radiative shocks propagating into partially neu-
tral medium (Chevalier & Raymond 1978). The Hα
line profile is composed of two distinctive components
(narrow and broad) representing the velocity distri-
bution of preshock and postshock gases, respectively
(Chevalier et al. 1980). High resolution spectroscopic ob-
servations of several of these shocks have revealed that
the width of the narrow component is unusually large
(30 ∼ 50 km s−1) for ambient neutral hydrogen, and it
was proposed that the gas was heated in a precursor
thin enough (. 1017 cm) to avoid complete ionization
of hydrogen (see Ghavamian et al. 2001; Sollerman et al.
2003, and references therein).6 Two likely candidates are
cosmic-ray (CR) and fast neutral precursors (Smith et al.
1994; Hester et al. 1994; Ghavamian et al. 2001). Both
scenarios predict significant Doppler shifts of preshock
gas. No clear indication of such a shift of the Hα nar-
row component is reported (but see Lee et al. 2004). A
careful comparison of postshock and preshock line pro-
files with high spectral resolution is crucial for confirming
the existence of such a precursor.
The blast wave of Tycho’s SNR, the historical
∗BASED ON DATA COLLECTED AT SUBARU TELESCOPE,
WHICH IS OPERATED BY THE NATIONAL ASTRONOMI-
CAL OBSERVATORY OF JAPAN
1 Astronomy Program, Department of Physics and Astronomy,
Seoul National University, Seoul 151-742, Korea
2 jjlee@astro.snu.ac.kr
3 Harvard-Smithsonian Center for Astrophysics, 60 Garden
Street, Cambridge, MA 02138
4 Department of Physics and Astronomy, Johns Hopkins Univer-
sity, 3400 North Charles Street, Baltimore, MD 21218
5 Subaru Telescope, National Astronomical Observatory of
Japan, 650 North A’ohōkū Place, Hilo, HI 96720
6 We refer to this precursor explicitly as a “thin precursor”, to
avoid potential confusion with a photoionization precursor which
will be shortly introduced.
remnant of the 1572 supernova, has been known
to exhibit Balmer-dominated emission (van den Bergh
1971; Kamper & van den Bergh 1978). The re-
gion of the brightest Hα emission (Knot g from
Kamper & van den Bergh 1978) is located along the
north-eastern edge of the remnant. Faint, diffuse optical
emission extends ∼1 pc ahead of the Balmer-dominated
filaments (Ghavamian et al. 2000, G00 hereafter). This
feature has been identified as a photoionization precur-
sor (PIP) produced by photoionization of the preshock
gas by He II λ304Å emission from behind the blast
wave (G00). The estimated temperature of the PIP is
∼ 1.2 × 104 K, which is not high enough to explain the
observed width of the Balmer narrow component.
The existence of Hα emission from the PIP makes
Tycho an unique target for the study of nature of the
thin precursor, as we can investigate the change of the
line profile across this precursor by comparing the line
emission from the PIP (preshock) and that of Knot g
(postshock). In this Letter, we present high resolution
(Echelle) long slit Hα spectra of Tycho Knot g and its
PIP using the SUBARU High Dispersion Spectrograph
(Noguchi et al. 2002). Our observations reveal the line
broadening and the Doppler shift of the the narrow com-
ponent, providing strong evidence for the existence of a
cosmic ray or fast neutral precursor.
2. OBSERVATIONS AND RESULTS
The spectrosopic observation was carried out on 2004
October 1. Longslit Echelle spectroscopy of the Hα line
was performed using order-blocking filters (HDS stan-
dard setup “stdHa”). This gives a spectral coverage
of 6540 Å∼ 6690 Å over the 60′′ of slit length. The
slit was centered at α(2000), δ(2000) = (00h 25m 56.s5,
64◦ 09′ 28′′) with position-angle of 76◦ (measured E of
N), covering both Knot g and its PIP simultaneously
(Fig. 1(a)). A total of 3×30 minutes of source expo-
sure was obtained, and the same amount of exposure for
http://arxiv.org/abs/0704.1094v1
2 Lee et al.
nearby sky. The spectrum was binned by 2 along the
slit direction and 4 along the dispersion direction before
the readout. The pixel scale after the binning is 0.27′′
pixel−1 (9.3 × 1015 cm pixel−1 at a distance of 2.3 kpc
(Kamper & van den Bergh 1978)) and 0.08 Å respec-
tively. The slit width was 2′′, which gives velocity resolu-
tion of 17 km s−1. The seeing was 0.′′5±0.′′1. The process-
ing of the SUBARU data included a typical CCD prepro-
cessing (including overscan correction and flat fielding)
and two-dimensional spectral extraction. A wavelength
calibration solution is obtained from the spectrum of a
Th-Ar lamp. The source spectrum was sky-subtracted,
and normalized using the spectra of standard stars. The
uncertainty in the wavelength calibration is estimated to
be around 0.2 km s−1 at the wavelength of Hα.
In Fig. 1(b), we present the fully processed two dimen-
sional spectrum of Hα line. The bright patch at the bot-
tom with three distinct local emission peaks corresponds
to Knot g, and the faint emission extending to the top
of the image corresponds to the PIP. The average Hα
spectrum of Knot g and that of PIP are shown together
in Fig. 2. The velocity width of the Knot g narrow com-
ponent line is clearly larger than that of the PIP Hα line.
In addition, the velocity centroid of the Knot g narrow
component is slightly redshifted (5.5± 0.6 km s−1) rela-
tive to that of the PIP Hα line. And as clearly seen in
Fig. 2(b), the Knot g spectrum shows a very broad (∼
1,000 km s−1), faint Hα line. The small peak near 900
km s−1 is the [N II] λ6583.4 Å line from the PIP.
The spectrum of the PIP is well fitted by a single Gaus-
sian, and yields a FWHM of 33.8± 0.8 km s−1 and cen-
troid velocity of −35.8 ± 0.6 km s−1 (in LSR frame).7
The measured FWHM of [N II] λ6583.4 Å in the PIP
is ∼ 23 km s−1. If the broadening were purely thermal,
the widths of the lines whould be inversely proportional
to the square root of their atomic masses. The expected
width of the [N II] line in this case is 1/
14 = 0.267
of Hα. As this is significantly narrower than what is
observed, a significant amount of nonthermal broaden-
ing is suggested. If we simply assume that the observed
line widths are a convolution of thermal and nonther-
mal broadenings, the estimated thermal temperature is
∼ 13, 000 K, consistent with 12, 000 K of G00. It is also
possible that the large line width is due to a residual of
Galactic Hα emission.
An adequate fit to the Knot g spectrum requires three
Gaussian components. They have velocity widths of
45.3 ± 0.9 km s−1 (narrow), 108 ± 4 km s−1 (intermedi-
ate) and 931± 55 km s−1 (broad), with central velocities
of −30.3± 0.2, −25.8± 0.8 and 29± 18 km s−1, respec-
tively. This result confirms that the Hα narrow com-
ponent line of Knot g is redshifted and broadened rela-
tive to that of the PIP. The broad component (FWHM∼
1, 000 km s−1) should correspond to the previously re-
ported ∼ 2, 000 km s−1 component (Ghavamian et al.
2001). The relatively narrower width of our observation
is probably due to the insensitivity of our spectroscopic
configuration to the very broad line, although the pos-
sibility of temporal variation (e.g., by crossing a density
7 throughout this paper, we give line widths corrected for instru-
mental broadening and velocity in LSR frame.
jump) does exist. The abrupt density increase might be
possible if the shock is propagating into the edge out-
skirts of the dense cloud (Lee et al. 2004). The existence
of the intermediate width component has already been
reported by G00. It may be produced when slow pro-
tons picked up by the postshock magnetic field undergo
a secondary charge exchange. Alternatively, it might be
an artifact of the assumption of Gaussian distributions,
which would not necessarily be appropriate if the mo-
tions are non-thermal.
The characteristics of the narrow component Hα line
of Knot g are consistent with G00, except that the ve-
locity centroid of the line in our data is significantly dif-
ferent from theirs (vLSR = −30.3± 0.2 from our data vs.
−53.9 ± 1.3 km s−1 from G00). We have carefully as-
sessed our wavelength calibration, but couldn’t find any
significant source of error. The wavelengths of night sky
lines observed near Hα matched well with those of VLT
UVES night-sky emission line catalog (Hanuschik 2003)
within error of 0.02 Å (∼ 1 km s−1). The Hα spectrum of
blank sky is also consistent with nearby WHAM spectra
(Haffner et al. 2003). Furthermore, the central velocity
of [N II] λ6583.4 Å emission line from the PIP is con-
sistent with that of Hα. Although we believe that our
wavelength calibration solution is quite secure, the dis-
crepancy with the previous observation should be con-
firmed by an independent observation. For the rest of
this paper, we mainly concentrate on the line width and
the relative variation of central velocity.
3. LOCATION OF THE SHOCK FRONT AND DISCOVERY
OF A THIN (1016 CM) PRECURSOR
Fig. 3 shows the spatial variation of the Hα line profile
along the slit. In the top panel, we separately plot the
integrated intensities of the representative Hα of narrow
and broad components, together with their sum (refer
to the figure caption for the velocity range of each com-
ponent). In the middle and bottom panels, we plot the
central velocities and FWHMs at each slit position from
the single Gaussian fit to the full profile. In all three
panels, the x-axis represents the pixel offset from the
southwest end of the slit. Although three Gaussian com-
ponents are actually required to adequately fit the Hα
profile of Knot g, the non-uniqueness of the multiple-
component fitting can complicate the interpretation of
the different emission line components. Therefore, we
plot the fit result from a single Gaussian and describe
our interpretation of this fit in the region of Knot g as
following: First, the central velocities closely match those
of the narrow component. Fitting with multiple compo-
nents leaves the fitted line centroids unchanged to within
the errors. Second, the FWHMs basically trace the spa-
tial variation of the narrow component width, but the
presence of the intermediate and the broad components
contribute significantly to the width. When fit with three
Gaussian components, the large widths of the Hα nar-
row component in Knot g from the single profile models
(60 ∼ 80 km s−1) are reduced to 40 ∼ 50 km s−1.
Inspection of Fig. 3 reveals that there exists only one
location (pixel offset 27, marked as a dotted vertical line)
where the intensity and width of the Hα line exhibits an
abrupt jump. The jump is most noticeable for the broad
component, where the flux is virtually zero toward the di-
rection of ambient medium (i.e., toward positive pixel off-
SUBARU HDS Observation of Tycho 3
6559 6560 6561 6562 6563 6564
A(htgnelevaW
(a) (b)
Fig. 1.— (a) Hα image of Tycho Knot g with position of the
2′′ ×60′′ slit overlaid. (b) The fully-processed Hα 2-d spectrum
(only the western part of the slit is shown). The bright emission
knot at the bottom is Knot g.
sets). The central velocity of the narrow component also
shows rapid change around this location. The FWHM
also steeply increases behind this point, but this might
be largely due to the sudden appearance of the broad
component. The fact that the intensity of the broad
component, which is associated with the postshock gas,
shows a significant jump at this location suggests that it
corresponds to the location of the shock front.
The Hα intensity in the PIP region is nearly constant
along the slit, but shows an rapid rise just before the
shock front, i.e., from pixel offset 32 to 27 in Fig. 3. The
intensity of the Hα line in the PIP region is about 10%
of the observed peak value in Knot g, and it reaches
about half the peak very near the shock. The line width
also increases from 30 to 45 km s
within this region.
We propose that the steep increase of the Hα intensity
accompanied with line broadening corresponds to a thin
precursor where gas is heated and accelerated. Unlike
the line broadening, the observed velocity centroids shift
slightly behind the shock instead of the precursor region.
This does not necessarily indicate that the broadening
and the Doppler shift takes at different region as this
is likely due to geometrical projection effects. The Hα
intensity profile gives an e-folding thickness of 1.4 ± 0.4
pixel (after accounting for seeing), which corresponds to
(1.4± 0.4)× 1016 cm at a distance of 2.3 kpc.
The observed emission of Knot g shows a few local
peaks indicating possible substructure, e.g., a collection
of shock tangencies seen in projection along the line of
sight. This leads to a possibility that the ‘precursor’
is simply the results of geometric projection of fainter
Balmer-dominated filament lying just ahead of Knot g.
If we assume that this filament has a line profile simi-
lar to that of Knot g, i.e., have a same broad-to-narrow
ratio, then a detectable amount of broad component is
expected. The flux profile of broad component plotted
in Fig. 3 would show similar gradual increase in the pre-
cursor region, which is not seen. Also, no hint of a broad
component is found in a summed spectra of precursor
region, which is expected to show up. Examining the
archival Chandra observation of Tycho’s SNR (the anal-
ysis is presented by Warren et al. 2005), we did not
find evidence of X-ray emission extending ahead of Knot
g. Therefore, there is no strong supporting evidence in
favor of a projection, although we cannot rule out the
possibility.
4. NATURE OF THE THIN PRECURSOR
As the shock is nearly tangential to our line of sight,
the actual amount of bulk gas acceleration could be much
-200 -100 0 100
Knot g : observed
Knot g : fit (sum)
Knot g : fit (individual)
PIP : observed
-2000 -1500 -1000 -500 0 500 1000
Velocity (km/s)
[NII]
Knot g : observed
Knot g : fit (broad)
PIP : observed
Fig. 2.— (a) Hα spectrum of Knot g and the PIP. The spectrum
of Knot g is fitted with three Gaussian components, and two nar-
rowest components are displayed (thin dashed) together with their
sum (thin solid). (b) Same as (a) except the full observed velocity
range is presented and the spectra are binned. The Knot g broad
component from the above fit is displayed as a thin solid line.
larger than the observed Doppler shift. The shock an-
gle can be inferred from the radial velocity shift of the
broad component of Knot g relative to the narrow com-
ponent (Chevalier et al. 1980), which is measured to be
∼ 60 km s−1. As our observation could be insensitive to
this broad component, using this value is rather conser-
vative. On the other hand, Ghavamian et al. (2001) re-
ported a redshift of ∼ 130 km s−1, but their field of view
is different from ours. We take these two values as limits
and give shock angle of 2◦−5◦ assuming a shock velocity
of 2,000 km s
. We consider 5 km s
to be the repre-
sentative redshift of the narrow component compared to
unperturbed medium, which gives actual acceleration of
60 ∼ 130 km s−1.
The line width of the narrow component at the shock
front (∼ 45 km s−1) corresponds to a neutral hydrogen
temperature of 40, 000 K, if the line broadening is purely
thermal, or lower, if there is a significant non-thermal
broadening such as a wave motion. Neutral hydrogen
atoms and protons may have similar velocity distribu-
tions due to their charge exchange interactions. But that
of electrons can be different, which would depend on the
heating mechanism in the precursor. In the following,
we estimate the electron temperature in the thin precur-
sor (TP) from the observed intensity increase of factor
5. Since the observed spectrum is an integrated emis-
sion along the line of sight where a significant contribu-
tion from PIP region is expected, the actual emissivity
increase within the precursor should be much greater.
We assume that regions of the PIP and the TP are
represented by two concentric shells with thicknesses of
δPIP ≃ 1016 cm and δTP ≃ 1 pc (G00), respectively, and
that both have an inner diameter of the Tycho (5.4 pc).
Then the ratio of path length through each shell along
the tangential direction of the shock front is ≃ 18. This
implies that the emissivity increase in the TP could be
as large as a factor of 90. This value should be regarded
4 Lee et al.
total
narrow
broad
0 10 20 30 40 50 60
pixel offset
Fig. 3.— (top) Integrated fluxes along the slit position of
Hα spectrum for a given velocity range. The thin solid line
shows emission in the range −60 < vLSR < 10 km s
−1, rep-
resentative of Hα narrow component, while the thick solid line
shows emission in the range −400 < vLSR < −120 km s
−1 and
+100 < vLSR < +350 km s
−1, for the broad component. The
thick dashed line shows the summed intensity for both components.
(−400 < vLSR < +350 km s
−1). Middle and bottom panels: cen-
tral velocities and FWHMs from fits with a single Gaussian. The
proposed location of shock front is marked as a vertical dotted line.
as an upper limit as it is likely that the filament is nearly
flat and tangent to the line of sight. The collisional ex-
citation rate of Hα at Te ∼ 40, 000K is 2, 000 times
greater than that of 12, 000K which is the temperature
of the PIP region (G00). This value greatly exceeds the
emissivity increase of 90, and implies a few possibilities.
The electron temperature can be less than 40, 000K, ei-
ther if observed Hα line width has significant nonthermal
broadening, or if Te is intrinsically less than Tp. On the
other hand, the estimated emissivity increase might be
explained if the emissivity increase due to high Te is sup-
pressed by ionization of neutrals. Since the existence of
Blamer-dominated filaments requires a significant frac-
tion of neutral hydrogen atoms to survive within precur-
sor, this possibility is less favored.
The two likely candidates for this precursor are fast
neutral and CR precursors. The momentum and en-
ergy carried by upstreaming fast neutrals can be large
enough to explain the observed heating and accelera-
tion in the precursor, but the available model calcula-
tions do not predict significant heating by these neutrals
(Lim & Raga 1996; Korreck 2005). Furthermore, It is
difficult to reproduce the small range of FWHMs ob-
served for narrow component Hα lines from shocks of dif-
ferent velocities (Smith et al. 1994; Hester et al. 1994).
The observed characteristics of the precursor are consis-
tent with a CR precursor. CR acceleration in the shock
does require a precursor (e.g., Blandford & Eichler 1987)
and there has been growing evidence of CR acceleration
in young SNRs including Tycho itself (e.g., Warren et al.
2005). Significant heating and acceleration are expected
to happen in the CR precursor (Blandford & Eichler
1987). Interaction of CR particles in the upstream gen-
erates Alfv́en waves and significant amplification of mag-
netic field has been suggested (Bell & Lucek 2001). Al-
though a measurement of the preshock magnetic field
is hardly available, a high value (40µG) of preshock
magnetic field is suggested for Tycho (Völk et al. 2002),
which may explain the line width of the narrow compo-
nent.
To conclude, our SUBARU observation clearly has
shown the line broadening and the Doppler shift between
the preshock gas and postshock gas. This strongly sug-
gests the existence of the thin precursor. Furthermore,
the precursor itself is likely to be resolved. Given the
lack of observational constraints on CR ion acceleration
in SNR shocks, the tentatively measured width (∼ 1016
cm) of the thin precursor whose primary candidate is the
CR precursor is promising. Follow-up observation, such
as a high resolution imaging with HST, would clearly
resolve the structures of the precursor.
We thank to the anonymous referee for valuable com-
ments. This work was supported by the Korea Research
Foundation (grant No. R14-2002-058-01003-0). JJL has
been supported in part by the BK 21 program.
REFERENCES
Bell, A. R. & Lucek, S. G. 2001, MNRAS, 321, 433
Blandford, R. & Eichler, D. 1987, Phys. Rep., 154, 1
Chevalier, R. A., Kirshner, R. P., & Raymond, J. C. 1980, ApJ,
235, 186
Chevalier, R. A. & Raymond, J. C. 1978, ApJ, 225, L27
Ghavamian, P., Raymond, J., Hartigan, P., & Blair, W. P. 2000,
ApJ, 535, 266
Ghavamian, P., Raymond, J., Smith, R. C., & Hartigan, P. 2001,
ApJ, 547, 995
Haffner, L. M., Reynolds, R. J., Tufte, S. L., Madsen, G. J.,
Jaehnig, K. P., & Percival, J. W. 2003, ApJS, 149, 405
Hanuschik, R. W. 2003, A&A, 407, 1157
Hester, J. J., Raymond, J. C., & Blair, W. P. 1994, ApJ, 420, 721
Kamper, K. W. & van den Bergh, S. 1978, ApJ, 224, 851
Korreck, K. 2005, Ph.D. Thesis
Lee, J., Koo, B., & Tatematsu, K. 2004, ApJ, 605, L113
Lim, A. J. & Raga, A. C. 1996, MNRAS, 280, 103
Noguchi, K., Aoki, W., Kawanomoto, S., Ando, H., Honda, S.,
Izumiura, H., Kambe, E., Okita, K., Sadakane, K., Sato, B.,
Tajitsu, A., Takada-Hidai, T., Tanaka, W., Watanabe, E., &
Yoshida, M. 2002, PASJ, 54, 855
Smith, R. C., Raymond, J. C., & Laming, J. M. 1994, ApJ, 420,
Sollerman, J., Ghavamian, P., Lundqvist, P., & Smith, R. C. 2003,
A&A, 407, 249
Völk, H. J., Berezhko, E. G., Ksenofontov, L. T., & Rowell, G. P.
2002, A&A, 396, 649
van den Bergh, S. 1971, ApJ, 168, 37
Warren, J. S., Hughes, J. P., Badenes, C., Ghavamian, P., McKee,
C. F., Moffett, D., Plucinsky, P. P., Rakowski, C., Reynoso, E.,
& Slane, P. 2005, ApJ, 634, 376
ABSTRACT
  We present an Ha spectral observation of a Balmer-dominated shock on the
eastern side of Tycho's supernova remnant using the Subaru Telescope. Utilizing
the High Dispersion Spectrograph (HDS), we measure the spatial variation of the
line profile between preshock and postshock gas. Our observation clearly shows
a broadening and centroid shift of the narrow-component postshock Ha line
relative to the Ha emission from the preshock gas. The observation supports the
existence of a thin precursor where gas is heated and accelerated ahead of the
shock. Furthermore, the spatial profile of the emission ahead of the Balmer
filament shows a gradual gradient in the Ha intensity and line width ahead of
the shock. We propose that this region (~10^16 cm) is likely to be the
spatially resolved precursor. The line width increases from ~30 up to ~45 km/s,
and its central velocity shows a redshift of ~5 km/s across the shock front.
The characteristics of the precursor are consistent with a cosmic-ray
precursor, although the possibility of a fast neutral precursor is not ruled
out.

<|endoftext|><|startoftext|>
Introduction
Let V be a finite-dimensional complex linear space and G ⊂ GL(V ) be a compact
subgroup of GL(V ). We consider the problem of description of polynomially convex
hulls for orbits Ov = Gv, v ∈ V . The polynomially convex hull (or polynomial hull)
Q̂ of a compact set Q ⊂ V is defined as
Q̂ = {z ∈ V : |p(z)| ≤ sup
|p(ζ)| for all p ∈ P(V )},(0.1)
where P(V ) is the algebra of all holomorphic polynomials on V . It is usually
difficult to find Q̂. For Q = Gv, the answer is known if G is an isotropy group of a
bounded symmetric domain in Cn. Paper [9] contains a description of G-invariant
polynomially convex compact sets, including hulls of orbits (Q ⊂ V is polynomially
convex if Q̂ = Q); it continues paper [10] and uses results of [8]. On the other
hand, it is known that an orbit of a compact linear group is polynomially convex
if and only if the complex orbit GCv is closed and Gv is its real form ([7]). The
cases G = U(2), SU(2) were considered in [1], [4]. The problem of determination
of polynomial hulls of orbits admits the following natural generalization: given a
homogeneous space M of a compact group G, describe maximal ideal spaces MA of
G-invariant closed subalgebras A of C(M), where C(M) is the Banach algebra of all
1991 Mathematics Subject Classification. Primary 32E20; Secondary 32M15, 32M05.
Key words and phrases. Polynomial hulls, bounded symmetric domains.
The author was partially supported by RFBR Grants 06-08-01403 and 06-07-89051.
http://arxiv.org/abs/0704.1095v1
2 V.M. GICHEV
continuous complex-valued functions on M with the sup-norm. If A is generated by
a finite-dimensional invariant subspace, then MA can be realized as the polynomial
hull of an orbit. Paper [6] contains a description of MA for bi-invariant algebras
on compact groups and partial results on spherical homogeneous spaces. Maximal
ideal spaces for U(n)-invariant algebras on spheres in C
n are described in [11].
In this paper we consider orbits Gv of groups G = FT , where F ⊆ G is a finite
subgroup and T is a torus, such that GCv = TCv. Let t ⊆ gl(V ) be the Lie algebra
of T and set tR = it, TR = exp(tR). Suppose that v ∈ V has a trivial stable
subgroup in T and let X ⊂ TRv be finite. The hull of Y = TX admits a simple
description. If X = {v}, then Ŷ = T̂ v is the closure of T exp(CT )v, where CT is
a cone in tR. If TC is closed, then Ŷ = T exp(QX)v, where QX ⊆ tR is a convex
polytope (the convex hull of the inverse image of X for the mapping ξ → exp(ξ)v,
ξ ∈ tR). Any segment in QX corresponds to an analytic strip or an annulus in
Ŷ . In general, Ŷ is the union of T̂ u, where u runs over exp(QX)v. Also, Ŷ is
distinguished in closTCv by a finite family of monomial inequalities of the type
|z1|s1 . . . |zn|sn ≤ c,(0.2)
where c ≥ 0 and s = (s1, . . . , sn) ∈ Rn depend on v and X . Vectors s correspond
to normals of faces of CT +QX .
Thus, the problem of determination of Ĝv is not difficult if Gv ⊂ TCv. The
latter is equivalent to the assumption that the complex orbit GCv is connected. In
Example 3.4, we give a construction for orbits which satisfy this condition; here is
a sketch. The group G = FT acts on the space V = C(K), where K is a finite
F -invariant subset of t∗: F acts naturally on C(K), t = t∗∗ is naturally embedded
into C(K), and T = exp(t) acts on C(K) by multiplication. If v ∈ C(K) is an
F -invariant function, then Gv ⊂ TCv. According to Theorem 3.5, each connected
complex orbit can be realized in this way. Further, we describe pairs (V,G) such
Gv ⊂ TCv for a generic v ∈ V.(0.3)
By Theorem 4.3, under the additional assumption that the complex linear span of
TCv coincides with V , this happens if and only if the group GCZ, where Z is the
centralizer of G in GL(V ), has an open orbit in V . There are two extreme cases:
(A) Z ⊆ GC; (B) G has a finite center. An example for (A) is the group G = SnTn
acting in Cn, where Tn is the torus of all diagonal matrices in U(n) and Sn is the
group of all permutations of coordinates. Replacing Tn with SU(n)∩Tn, we get an
example for (B). Example 4.4 contains a construction for pairs (V,G) that satisfy
(0.3). Theorem 4.5 states that the construction is a universal one. In Theorem 4.10,
we determine pairs which satisfy (0.3) and the following condition:
vectors s in (0.2) are independent of v.(0.4)
The paper also contains a description of hulls Ĝv for G = Aut0(D), where D is a
bounded symmetric domain in the canonical realization and Aut0(D) is the stable
subgroup of zero, which coincides with the group of all linear automorphisms of
D. These hulls have already been described: the final step was done in paper [9],
which essentially used [10], partial results appear in [14] and [8]. Most of them
use the technique of Jordan triples and Jordan algebras. We use Lie theory, in
particular, an explicit construction of paper [15] for a maximal abelian subspace a.
A compact group acting in a Euclidean space is called polar if there exists a subspace
ORBITS OF TORI EXTENDED BY FINITE GROUPS AND THEIR POLYNOMIAL HULLS: THE CASE OF CONNECTED COMPLEX ORBITS3
(a Cartan subspace) such that each orbit meets it orthogonally. The group G is
polar in the ambient linear space d, and a is the Cartan subspace for G. Real polar
representations are classified in paper [3]; they are orbit equivalent (i.e., have the
same orbits) to isotropy representations of Riemannian symmetric spaces. If D is
a polydisc Dn ⊂ Cn, where D is the unit disc in C, then G = SnTn; the polynomial
hulls Ĝv are determined by the inequalities
µk(z) ≤ µk(v),(0.5)
where k = 1, . . . , n and µk are defined by
µk(z) = max{|zσ(1) . . . zσ(k)| : σ ∈ Sn}.(0.6)
The general case can be reduced to this one in the following way. Any bounded
symmetric domain D ⊂ d of rank n admits an equivariant embedding of Cn to
d, which induces an embedding of Dn to D, such that Rn ⊂ Cn is the maximal
abelian subspace a, and, for any v ∈ a , the hull of Aut0(D)v is the orbit of the hull
of Aut0(D
n)v. Each µk(z) has a unique continuation to a K-invariant function on
d. The extended functions determine hulls by the same inequalities. Moreover, they
are plurisubharmonic and can be treated as products of singular values of z ∈ d or
as norms of exterior powers of adjoint operators in suitable spaces. The subsystem
of long roots of the restricted root system (i.e., the root system for a) has type nA1;
this defines the above embedding Cn → d. Furthermore, this makes it possible to
determine hulls in terms of the adjoint representation (Theorem 5.7). Thus, there
is no need to consider different types of domains separately.
The reduction to the case of a torus extended by a finite group, which is de-
scribed above, is contained in Section 5 (in papers [9], [14], the problem is also
reduced to this case by another method). It does not use essentially the results
of the previous sections (only Proposition 3.2, in proof of Theorem 5.7). These
extensions satisfy conditions (0.3) and (0.4); in addition, they possess the property
that the complexified groups have open orbits. According to Theorem 4.10, any
group with these properties is the product of groups SnT
n acting in Cn; it admits
a natural realization as a group of automorphisms of a bounded symmetric domain
(Corollary 5.3).
The following simple examples illustrate the case Gv 6⊆ TCv and show that
condition (0.3) is essential. Let G = SnT
n, and let ǫ1, . . . , ǫn be the standard base
in Cn. Then Ĝǫ1 is the closure the union of discs Dǫk, k = 1, . . . , n. Set H = SnT,
where T acts by z → eitz, t ∈ R, z ∈ Cn. Then Ĥǫ1 = Ĝǫ1. For v = ǫ1 + ǫ2, Ĝv is
the closure of the union of
bidiscs but Tn contains no proper torus T such that
Ĝv = Ĥv for H = SnT . However, for any subgroup F ⊆ Sn which acts transitively
on 2-sets and H = FTn we have Ĝv = Ĥv.
1. Preliminaries
We keep the notation of Introduction, in particular, (0.1) and (0.6). Linear
spaces are supposed to be finite dimensional and complex unless the contrary is
explicitly stated. ”Generic” means ”in some open dense subset”. Throughout the
paper, we use the following notation:
D and T are the open unit disc and the unit circle in C, respectively;
V denotes a complex linear space (except for Section 5);
4 V.M. GICHEV
if V is equipped with a linear base identifying it with Cn, then Tn is the
group of all diagonal unitary transformations;
Zn2 consists of all transformations in T
n with eigenvalues ±1;
ǫ1, . . . , ǫn is the standard base in C
n and Rn;
Rn+ is the set of vectors in R
n with positive entries;
SK denotes the group of all permutations of a finite setK; ifK = {1, . . . , n},
then SK = Sn;
C(K) is the algebra of all complex-valued functions on K;
1 is the identity of C(K);
G ⊂ GL(V ) is a compact group whose identity component is a torus T
(except for Section 5);
t ⊂ gl(V ) is the Lie algebra of T , tR = it, tC = t+ tR;
TR = exp(tR), TC = exp(tC);
C∗ = TC = C \ {0};
Ť = Hom(T,T) is the dual group to T ;
Aut(D) is the group of all holomorphic automorphisms of a domain D ⊂ V ,
Aut0(D) = Aut(D) ∩GL(V );
coneX denotes the least convex cone which contains the set X ;
convX is the convex hull of X ;
closX is the closure of X ;
spanFX is the linear span of X over the field F = C,R,Q.
Clearly, exp is bijective on tR and TR ∼= TC/T . The differentiating at the identity e
defines an embedding of Ť into the dual space t∗: χ → −ideχ, where χ ∈ Ť . This
is a lattice in the vector group t∗, moreover, T ∼= t/L, where L is the dual lattice
to Ť in t. For χ ∈ Ť , let
Vχ = {v ∈ V : gv = χ(g)v for all g ∈ T }
be the corresponding isotypical component of V . Then
⊕Vχ.(1.1)
We assume that V is equipped with a G-invariant inner product 〈 , 〉. Then de-
composition (1.1) is orthogonal. Let spec(v) denote the spectrum of v ∈ V (the set
of χ ∈ Ť such that the χ-component of v is nonzero); for X ⊆ V ,
spec(X) = ∪x∈X spec(x).
We say that T has a simple spectrum if
dimVχ ≤ 1(1.2)
for all χ ∈ Ť . If (1.2) is true, then there exists a unique (up to scaling factors)
orthogonal base in V which agree with (1.1) and a unique maximal torus Tn in
GL(V ) which contains T . In what follows, we assume that (1.2) holds; we shall see
in the next section that such assumption is not restrictive. Thus, we may fix an
identification
V = Cn = C(K),(1.3)
where K = {1, . . . , n}. If F is a subgroup of SK , then C(K)F denotes the set
of all F -invariant functions on K; clearly, 1 ∈ C(K)F . Further, (C∗)n is the
multiplicative group of all invertible functions in C(K), Tn consists of functions
with values in T, and (Tn)C = (C∗)n. The Lie algebra of Tn is realized as iRn ⊂ Cn.
ORBITS OF TORI EXTENDED BY FINITE GROUPS AND THEIR POLYNOMIAL HULLS: THE CASE OF CONNECTED COMPLEX ORBITS5
The embedding T → Tn induces embeddings of the Lie algebra and the fundamental
group: t → iRn, π1(T ) → iZn ⊂ iRn, respectively. Let Γ be the image of π1(T ).
Then spanR Γ = t; moreover, t∩ iZn = Γ and t/Γ = T . The dual mapping Ťn → Ť ,
which is defined by the restriction of characters e−i〈x,y〉, where x ∈ iZn, to t, is
the orthogonal projection πt : iZ
n → t. Thus, Γ is a subgroup of finite index in
Ť = πtiZ
n. Vectors in spanQ Ť are called rational. The image of t in iR
n can be
distinguished by linear equations with integer coefficients. Hence, clos(TCv), for
a generic v ∈ V , is the set of all solutions to a finite number of equalities with
holomorphic monomials. Thus, Y ⊂ TC implies Ŷ ⊂ clos(TCv). Set
CT = t
R ∩ clos(−Rn+),(1.4)
The cone iCT is dual to cone(specV ) ⊆ t∗ ⊆ iRn. If −ξ ∈ clos(Rn+), then ι =
limt→+∞ exp(tξ) is an idempotent in C(K) such that the multiplication by the
complementary idempotent 1− ι is a projection onto spanC(spec(ξ)). Set
IT = {limt→+∞ exp(tξ) : ξ ∈ CT }.(1.5)
Clearly, IT is finite and contains 1.
Lemma 1.1. The closure of exp(CT ) is equal to IT exp(CT ).
Proof. Due to the evident inclusion clos(exp(CT )) ⊇ IT exp(CT ), it is sufficient to
prove that the set ST = IT exp(CT ) is closed. Clearly, ST is an abelian semigroup.
The cone CT is polyhedral; hence, it is finitely generated:
CT = cone{ξ1, . . . , ξm},
where R+ξk are the extreme rays of CT , k = 1, . . . ,m. Obviously, IT is a finite
semigroup, which is generated by the idempotents limt→+∞ exp(tξk). Thus, the cor-
respondence (e−t1 , . . . , e−tm) → exp(t1ξ1 + . . . , tmξm) defines a mapping of (0, 1]m
onto exp(CT ), which continuously extends to [0, 1]
m. It follows that its image is
closed and coincides with ST . �
Note that there is a natural one-to-one correspondence between IT and the set
of faces of CT .
2. Hulls of finite unions of T -orbits in a TC-orbit
Let v ∈ TC. If v =
χ∈Ť vχ, where vχ ∈ Vχ, g ∈ TC, and u = gv, then
χ∈Ť χ(g)vχ. Since χ(g) 6= 0 for all g ∈ G and χ ∈ Ť , we get
u ∈ TCv =⇒ spec(u) = spec(v);(2.1)
dim (Vχ ∩ spanC Tv) ≤ 1 for all v ∈ V and χ ∈ Ť .(2.2)
Thus, the assumption that T has a simple spectrum in V is not restrictive in
the problem of description of polynomial hulls of orbits Gv such that Gv ⊂ TCv.
Clearly, Dn = T̂n in L(V ). For each x ∈ CT and any polynomial p on L(V ), the
holomorphic function f(ζ) = p(exp(ζx)) is bounded in the halfplane Π : Re ζ ≥ 0.
Hence, exp(Π) is contained in T̂ . On the other hand, if z ∈ Dn ∩ TC, then z =
t exp(x) for some t ∈ T and x ∈ CT (the polar decomposition). By Lemma 1.1,
T̂ = clos(Dn ∩ TC) = T clos(exp(CT )) = TIT exp(CT ).
6 V.M. GICHEV
If v ∈ (C∗)n, then (C∗)nv = (C∗)n, and the mapping z → zv is a linear nondegen-
erate transformation of Cn. Therefore,
v ∈ (C∗)n =⇒ T̂ v = T̂ v = TIT exp(CT )v.(2.3)
For an arbitrary v ∈ V = Cn, set
CvT = {ξ ∈ tR : ξk ≤ 0 if vk 6= 0, k = 1, . . . , n}.
Applying (2.3) to spanC(spec(v)) = C
nv, we get
T̂ v = T clos(exp(CvT )v.(2.4)
Clearly, CvT depends only on spec(v). For s ∈ Rn and z ∈ (C∗)n, set
νs(z) =
|zk|sk .
If sk ≥ 0, then the k-th factor in (C∗)n can be replaced with C (i.e., νs continuously
extends to this product).
It is well known that for any holomorphically convex T -invariant set U ⊆ TC,
the set log(U ∩ TR) ⊆ tR is convex. In particular, this is true for sets of g ∈ TC
such that gv ∈ T̂X, where X ⊂ TCv, v ∈ V . Nevertheless, it is convenient to have
an explicit construction of an analytic strip (or an annulus, if it is periodic) in a
TC-orbit, which corresponds to a segment that joins two points in tR; it is contained
in the following lemma. Set
S = {z ∈ C : 0 ≤ Im z ≤ 1}.
Lemma 2.1. Let v ∈ Cn and u ∈ TRv. Then, there exists ξ ∈ tR such that
λ(z) = exp(zξ)v(2.5)
is a holomorphic mapping λ : S → TCv which satisfies conditions
λ(∂S) ⊆ Tv ∪ Tu,
λ(0) = v, λ(1) = u.
If the stable subgroup of v in TR is trivial, then ξ is unique.
Proof. These properties hold for ξ ∈ tR such that exp(ξ)v = u; such a ξ exists, since
exp is a bijection tR → TR. The last assertion is clear. �
If ξ ∈ CT , then (2.5) defines an analytic halfplane in T̂ v; for Γ-rational ξ, λ is
periodic and defines an analytic disc in T̂ v. Together with Lemma 2.1 this gives a
characterization of hulls for finite unions of T -orbits in TC. Suppose that X ⊂ TRv
is finite and the stable subgroup of v in T is trivial. Then, the inverse to the
mapping x → exp(x)v is well defined. Let us denote it by logv and set
QX = conv(logv X),(2.6)
PX = QX + CT .(2.7)
The set PX is a convex polyhedron, which is unbounded if CT 6= 0. Hence, there
exists a finite set NX ⊂ Rn and, for each s ∈ NX , real numbers cs such that
PX = {x ∈ tR : 〈x, s〉 ≤ cs for all s ∈ NX}.(2.8)
The set NX consists of vectors orthogonal to faces of PX , whose projections into
spanR PX look outside of it; clearly, it is not unique in general.
ORBITS OF TORI EXTENDED BY FINITE GROUPS AND THEIR POLYNOMIAL HULLS: THE CASE OF CONNECTED COMPLEX ORBITS7
Proposition 2.2. Let v ∈ (C∗)n. Suppose that Y ⊂ TCv is a finite union of
T -orbits (including Tv), and set X = TRv ∩ Y . Then X is finite and
Ŷ = clos (T exp(PX)v)(2.9)
= clos {z ∈ (C∗)n : νs(z) ≤ ecsνs(v), s ∈ NX}(2.10)
u∈exp(QX )v
T̂ u(2.11)
= T exp(PX)IT v,(2.12)
where QX , PX ,NX are as above and IT is defined in (1.5).
Proof. Due to the polar decomposition, the set Tu∩TRv, for each u ∈ TCv, is non-
void and consists of a single point. Hence, X is finite and Y = TX . The inclusion
(expQX)v ⊆ Ŷ follows from Lemma 2.1 and Phragmén–Lindelöf Principle. The in-
clusion exp(CT )u ⊆ T̂ u is true for any u ∈ Cn. Since it holds for all u ∈ T exp(QX),
the left-hand side of (2.9) includes the right-hand side. If z = exp(ξ)v, where ξ ∈ tC,
then zk = e
ξkvk, k = 1, . . . , n; due to (2.8), this implies that the right-hand side
of (2.9) coincides with (2.10). According to (2.3), the right-hand side of (2.9) and
(2.11) intersect TCv by the set
T exp(PX)v = exp(QX)T exp(CT )v;
clearly, it is dense in (2.11). Since QX is compact, the set (2.11) is closed. The
compactness of QX , the above equality, and Lemma 1.1 imply that (2.12) is closed;
hence, it is the same as the right-hand side of (2.9).
Each of the sets (2.9)–(2.12) includes Y . Thus, it remains to prove that (2.12)
is polynomially convex. If x ∈ tR \ PX , then there exists s ∈ Rn such that
sup{〈y, s〉 : y ∈ PX} < 〈x, s〉 .(2.13)
Since QX is compact, the linear functional on t in the right-hand side of (2.13) must
be nonnegative on CT . According to (1.4), we may assume that s ∈ closRn+. It
follows that (2.13) holds in a neighborhood of s in closRn+. Thus, s can be assumed
rational (hence, integer) with strictly positive entries. Then, p(z) = zs11 . . . z
a holomorphic polynomial such that |p | separates exp(x)v and T clos(exp(PX)v).
Therefore,
Ŷ ∩ TCv = T exp(PX)v.
For any ι ∈ IT , the projection z → ιz commutes with T . This makes it possible to
apply the above arguments to the vector ιv, the set ιX , and to the restriction of T
to ιCn. Consequently,
ι̂Y ∩ TCιv = T exp(PιX)ιv = ιT exp(PX)v(2.14)
(clearly, ι exp(PX)v = exp(PιX)ιv). By (1.5), ιY ⊆ Ŷ , hence, ι̂Y ⊆ Ŷ ; on the other
hand, ιŶ ⊆ ι̂Y since p◦ι is a polynomial on Cn for any polynomial p on ιCn. Thus,
ιŶ = ι̂Y = Ŷ ∩ ιCn. Together with (2.14), this implies the polynomial convexity of
(2.12). �
If T = Tn, then Proposition 2.2 follows from the well-known characterization of
polynomially convex Reinhardt domains.
Corollary 2.3. For any v ∈ (C∗)n, the orbit TCv is closed in Cn if and only if Tv
is polynomially convex, and this is equivalent to CT = 0. Then, Ŷ = T exp(QX)v
for all Y,X as above.
8 V.M. GICHEV
Proof. The orbit TCv is closed if and only if the convex hull of spec(v) = spec(Cn)
contains 0 in its relative interior (see, for example, [13, Proposition 6.15]). Since
T ⊆ GL(n,C), the set spec(Cn) is generating in t∗. Hence, TCv is closed if and
only if CT = 0; by (2.4), this is equivalent to T̂ v = Tv. Then, Ŷ = T exp(QX)v by
(2.9) and (2.1). �
There is a version of the first assertion for an arbitrary compact linear group G:
a GC-orbit is closed if and only if it contains a polynomially convex G-orbit ([7,
Theorem 1 and Theorem 5]). For a torus T , all T -orbits in TCv are simultaneously
polynomially convex or non-convex, but this is not true if G is not abelian.
3. Finite extensions of T that keep a TC-orbit
In this section, we consider the case where the set X defined in the previous
section is an orbit of a finite group F which normalizes T and keeps the TC-orbit.
We assume that T ⊆ G, T is a torus, G is a subgroup of GL(V ), F is a finite
subgroup of G, and
G = FT = TF, F ∼= G/T,(3.1)
Gv ⊆ TCv,(3.2)
v ∈ (C∗)n ⊂ Cn = V.(3.3)
By (3.1), T is normal in G. Clearly, (3.2) is equivalent to Fv ⊆ TCv and to the
connectedness of GC. Here is an illustrating example.
Example 3.1. Let G = Aut0(D
2) be the group of linear automorphisms of the
bidisc D2 ⊂ C2. Clearly, G = FT , where F = S2 is generated by the transposition
τ of the coordinates, T = T2, TC = (C∗)2, and TCv = (C∗)2 for any v that lies
outside the coordinate lines. Thus, (3.2) holds for all v ∈ (C∗)2 (however, (3.2) fails
for any v 6= 0 in C2 \ (C∗)2). The hull Ĝv can be distinguished by the inequalities
max{|z1|, |z2|} ≤ max{|v1|, |v2|},(3.4)
|z1z2| ≤ |v1v2|.(3.5)
Clearly, (3.4) and (3.5) define a polynomially convex set. Let z1, z2 > 0 (a generic
T -orbit evidently contains such a point z). Then, z and τz can be joined by an
analytic strip with the boundary in Tz ∪ Tτz:
λz(s) = (z
2 ), s ∈ S.
Set q = ln z1
and let z1 > z2. Then, the strip can be written in the form
λz(s) = (e
−sz1, e
sz2), 0 ≤ Re s ≤ q.
It is periodic with the period 2πi and defines a τ -invariant annulus in Ĝv with τ -
fixed points (
z1z2,
z1z2) and (−
z1z2,−
z1z2). As z2 → 0, the annulus tends
to a couple of discs: (e−sz1, 0) and (0, e
−sz1), where Re s > 0, 0 ≤ Im s ≤ 2π (the
circle Re s = q
, 0 ≤ Im s ≤ 2π collapses to zero). Let z ∈ Ĝv ∩ R2. Then Ĝv
contains a bidisc D2z. It intersects R2 by a rectangle, which is symmetric with
respect to the coordinate axes. If z lies on an axis, then the rectangle degenerates
into a segment. Let v1 > v2 > 0. The union of these rectangles with vertices
in the set Q of real points of the annulus, which joins v and τv, is a curvilinear
octagon. It degenerates into a pair of segments if v2 = 0 and into a square if v1 = v2
(see [10, Fig. 2] for the 3-dimensional case). In the logarithmic coordinates in the
ORBITS OF TORI EXTENDED BY FINITE GROUPS AND THEIR POLYNOMIAL HULLS: THE CASE OF CONNECTED COMPLEX ORBITS9
first quadrant, Q is a segment. Also, note that all nontrivial TC-orbits are not
closed. �
In [2], Björk found a typical situation where analytic annuli appear in the maxi-
mal ideal space MA of a commutative Banach algebra A which admits a nontrivial
action of T by automorphisms: this happens if T -invariant functions on MA do
not separate distinct T-orbits. In [7], it was noted that analytic strips and/or an-
nuli appear in Ĝv if the stable subgroup of v in GC does not coincide with the
complexification of the stable subgroup of v in G.
Proposition 3.2. The hulls Ĝv for orbits of G = Aut0(D
n) = SnT
n are distin-
guished by inequalities (0.5), where µk are defined by (0.6).
Proof. The approximation by decreasing sequences of hulls makes it possible to
reduce the proposition to the case of a generic v in (0.5). Then, applying to v =
(v1, . . . , vn) a suitable transformation in T
n, we may assume that
v1 > v2 > · · · > vn > 0.(3.6)
Moreover, we may use Proposition 2.2 with X = Snv, CT = − closRn+ (we keep
the notation of Proposition 2.2). Since X , QX , CT , PX , and µk are Sn-invariant,
Sn is transitive on X , by (0.6), (0.5), and (2.10), it is sufficient to prove that the
vectors ξk =
r=1 ǫr, k = 1, . . . , n, correspond to the faces of PX that meet at v,
are orthogonal to them, and look outside of PX .
Set η1 = ǫ2 − ǫ1, . . . , ηn−1 = ǫn − ǫn−1, ηn = −ǫn. Then {−ηk}nk=1 is a base in
Rn, which is dual to the base {ξk}nk=1. We claim that the cone of the polyhedron
PX at the vertex v is generated by {ηk}nk=1. This implies the assertion above
(note that both cones are simplicial). If τ ∈ Sn is a transposition (k, j), then
v − τv = (vk − vj)(ǫk − ǫj). If σ, κ ∈ Sn then v − σκv = (v − κv) + (κv − σκv).
Furthermore, Sn is generated by transpositions (k, k + 1), and vk − vk+1 > 0 by
(3.6), where k = 1, . . . , n− 1. Therefore, vectors η1, . . . , ηn−1 generate the cone of
QX at v. Since −ǫk =
r=0 ηn−r and CT is generated by −ǫk, k = 1, . . . , n, this
proves the proposition. �
Property (3.2) implies spanC Gv = spanC T
Cv. Hence, we may assume that (1.2)
is valid. Then, a generic ξ ∈ t has a simple spectrum. Any f ∈ F permutes
eigenvalues and eigenspaces. Thus, assuming (1.2) and identifying V with C(K)
in accordance with (1.3), we get that each element of F is a composition of a
permutation of K and a multiplication by a function on K. Further, (3.3) implies
that the stable subgroup of v in TC is trivial. Hence,
TRv ∼= TC/T ∼= tR,
where the identification of TRv and tR is realized by ξ → exp(ξ)v, ξ ∈ tR.
Lemma 3.3. Let G ⊂ GL(V ), a subgroup F ⊆ G, and v ∈ V satisfy (3.1)–
(3.3). Then TCv contains a G-invariant T -orbit. Moreover, there exists a mapping
f → tf , F → T , such that F̃ = {tff : f ∈ F} is a subgroup of G which has a fixed
point in TC and satisfies (3.1)–(3.3).
Proof. The group F naturally acts on TRv ∼= TC/T ∼= tR. Any g ∈ F is a composi-
tion of σ ∈ SK and a multiplication by a function in C(K). Since t acts on C(K)
by multiplication on linear functions and σ induces a linear transformation in tC,
the induced action of F on tR is affine. Since F is finite, it has a fixed point in tR.
10 V.M. GICHEV
Hence, TCv contains a G-invariant T -orbit. Let us fix a point u in it and define
tf by tffu = u; the choice is unique due to (3.3). Taken together with (3.1), this
implies that F̃ is a group, which obviously satisfies the lemma. �
According to Lemma 3.3, we may assume without loss of generality that
fv = v for all f ∈ F.(3.7)
In the following example we give a construction (associated with a given finite group
F ) for orbits with property (3.2).
Example 3.4. Let t be a real linear space, t∗ be the dual space to t, L be a lattice
in t, and L∗ ⊂ t∗ be the dual lattice to L. Set
λx(y) = y(x), where x ∈ t, y ∈ t∗.
Let K be a finite subset of L∗ that generates L∗ as a subgroup of the vector group
t∗. Then
t∗ = spanR K,(3.8)
L = {x ∈ t : λx(K) ⊂ Z}.(3.9)
Further, let F be a finite subgroup of GL(t) which keeps K. Set V = C(K). The
mapping
λ : x → iλx
(3.10)
is an embedding t → V , which has a natural extension to tC. Set
exp(x) = e2πiλx .(3.11)
Clearly, L = ker exp. Hence, exp defines an embedding of T = t/L and TC into the
group (C∗)n:
TC = exp(tC) ⊆ (C∗)n.
The group TC acts on C(K) by multiplication. The inclusion v ∈ C(K)F is the
same as (3.7); it implies (3.2). Furthermore, if v ∈ (C∗)n, then
spanC Tv = V.(3.12)
Indeed, the space spanC T is a subalgebra of C(K), which separates points of the
finite set K. Hence, it coincides with C(K). �
Theorem 3.5. Let a group G ⊂ GL(V ), a finite subgroup F ⊆ G, a torus T , and
a vector v ∈ V satisfy (3.1)–(3.3), (3.7), and (3.12). Then V,G, F, T, v can be
realized as in Example 3.4, where
v ∈ (C∗)n ∩ C(K)F .(3.13)
Conversely, if V,G, F, T, v are as in Example 3.4 and v satisfies (3.13), then (3.1)–
(3.3), (3.7), and (3.12) are true.
Proof. The group F acts in t and t∗ by the adjoint action. Let K ⊂ t∗ be the
collection of all weights for the representation of T in V ; clearly, K is F -invariant.
It follows from (3.12) and (3.2) that the weights are multiplicity free. This defines
an equivariant linear isomorphism between V and C(K), where the group T acts by
multiplication. Thus, λ and exp are well defined by (3.10) and (3.11). According
to (3.7) and (3.3), (3.13) is true; (3.8) holds since T ⊂ GL(V ) is compact and acts
effectively on V (note that the annihilator of spanR K in t acts trivially due to
ORBITS OF TORI EXTENDED BY FINITE GROUPS AND THEIR POLYNOMIAL HULLS: THE CASE OF CONNECTED COMPLEX ORBITS11
(3.11) and (3.13)). Let us define L by (3.9). Then L = ker exp by (3.11). Hence, L
is a lattice in t and the group L∗ generated by K is the dual lattice in t∗.
The converse was proved in Example 3.4. �
4. Finite extensions of T which keep generic TC-orbits
In what follows, we use the setting of Example 3.4. Let Z denote the centralizer
of G in GL(V ). We assume that (C∗)n acts in V = C(K) by multiplication.
Lemma 4.1. Z = C(K)F ∩ (C∗)n.
Proof. Since λ(t) separates points of K, Z ⊆ (C∗)n. The multiplication by u ∈
C(K) commutes with F if and only if u is F -invariant. �
In general, condition (3.2) does not hold for a generic vector v. Hence, there is
a natural problem: describe V and G such that generic orbits satisfy (3.2). The
following proposition contains a simple criterion.
Proposition 4.2. Let V,G be as in Example 3.4. Then G satisfies (3.2) for a
generic v ∈ V if and only if
C(K) = λ(tC) + C(K)F .(4.1)
In this case, each TC-orbit in (C∗)n intersects C(K)F .
Proof. It follows from Lemma 4.1 that the right-hand side of (4.1) is the tangent
space at 1 to the set TC Z. Clearly, G̃C = ZGC is a group, TCZ is the identity
component of G̃C, and the right-hand side of (4.1) is the tangent space to G̃C1.
Hence, (4.1) holds if and only if G̃C1 is open. Moreover, this is equivalent to the
equality TCZ = exp(λ(tC) + C(K)F ) = (C∗)n. Therefore, each TC-orbit in (C∗)n
intersects C(K)F , i.e., contains an F -fixed point. Thus, (4.1) implies (3.2) for
v ∈ (C∗)n.
Let (3.2) hold and letW be an F -invariant neighborhood of 1. IfW is sufficiently
small, then the condition log 1 = 0 defines a branch of log in W . We may assume
that logW is convex and symmetric. This makes it possible to define roots in W :
r = exp
. For v ∈ W 12 and f ∈ F , set gf =
, where r = cardF ,
and g =
f∈F gf . Then gv is F -fixed. If (3.2) holds for v, then gf ∈ TC for all
f ∈ F ; hence, gv ∈ TCv. Consequently, for all v ∈ W , TCv intersects C(K)F . Since
Z keeps this property of orbits, it follows that TCZ has a nonempty interior. This
implies (4.1). �
Theorem 4.3. Let G ⊂ GL(V ) be a semidirect product of a torus T and a finite
subgroup F , and let Z be the centralizer of G in GL(V ). Suppose that spanC Tv = V
for some v ∈ V . Then the following conditions are equivalent:
(i) Gv ⊂ TCv for a generic v ∈ V ;
(ii) GCZv is open in V for a generic v ∈ V .
Proof. By Theorem 3.5, we may use the construction of Example 3.4. According to
Lemma 4.1, (ii) is equivalent to (4.1), and the assertion follows from Proposition 4.2.
12 V.M. GICHEV
We shall give a constructive description of these spaces and groups. Set
C0(K) =
u ∈ C(K) :
u(q) = 0
Sometimes, we identify points in K with their characteristic functions.
Example 4.4. Let V = Cn = C(K), where K = {1, . . . , n}, let F be a subgroup
of Sn, and
K = K1 ∪ · · · ∪Kp(4.2)
be the partition of K into F -orbits. For k ∈ {1, . . . , p}, set Vk = C(Kk). Then
V = V1 ⊕ · · · ⊕ Vp. Set
t0k = C0(Kk) ∩ iRn,
= exp(t0
) ⊂ C(Kk),
where exp is defined by (3.11). Set t0 = t01 ⊕ · · · ⊕ t0p,
T 0 = exp(t0) = T 01 × · · · × T 0p .
Let T be an F -invariant torus such that
T 0 ⊆ T ⊆ Tn(4.3)
and set G = FT . Then generic GC-orbits satisfy (3.2). The group G is irreducible
if and only if F is transitive on K; in general, F -orbits in K define G-irreducible
components of V . There are two extreme cases in (4.3).
(A) If T = Tn, then there is one open orbit (C∗)n of the groupGC = FTC, which
evidently satisfies (3.2). If F is nontrivial, then there exist degenerate orbits
that do not satisfy (3.2); moreover, if F is transitive onK, then all non-open
GC-orbits, except for zero, are nontrivial finite unions of TC-orbits.
(B) If T = T 0, then generic orbits are closed. They have codimension p and
are distinguished by equations
zr = ck,
where ck ∈ C∗, k = 1, . . . , p.
Note that (A) and (B) are invariant under the Cartesian product (the group F
need not be the product of groups Fk of irreducible components but must have the
same orbits in K as F1 × · · · × Fp). In terms of Example 3.4: in (A), t = Rn, the
mapping λ : tC → C(K) is surjective, K = {ǫ1, . . . , ǫn}; in (B), t = iRn ∩ C0(K),
λ(tC) = C0(K), and the set K is the projection of {ǫ1, . . . , ǫn} into t∗ = t. In both
cases, K is the set of all vertices of a regular simplex. �
Theorem 4.5. Let V,G be as in Theorem 4.3 and let (i) hold. Then V,G can be
realized as in Example 4.4. Furthermore,
(1) V,G are of type (A) if and only if GC has an open orbit,
(2) (B) is equivalent to the assumption that the center of G is finite,
(3) if G is irreducible, then either (A) or (B) holds.
Let C(K)F+ be the cone of all nonnegative functions in C(K)
Lemma 4.6. Let G and V be as in Example 3.4. Then, the orbit GCv is closed
for a generic v ∈ V if and only if
R ∩ C(K)F+ = 0.(4.4)
ORBITS OF TORI EXTENDED BY FINITE GROUPS AND THEIR POLYNOMIAL HULLS: THE CASE OF CONNECTED COMPLEX ORBITS13
Proof. Clearly, GCv is closed if and only if TCv is closed. Let v ∈ (C∗)n. By
Proposition 2.2 and Corollary 2.3, TCv is not closed if and only if CT 6= 0. Since
CT is F -invariant by (1.4), it contains
f∈F fu for each u ∈ CT . Thus, CT = 0 is
equivalent to (4.4). �
Proof of Theorem 4.5. Suppose that G is irreducible or, equivalently, F is transi-
tive. Then Z = C∗ 1 according to Lemma 4.1. If 1 ∈ λ(tC), then TC ⊇ Z and
TCv is open for a generic v ∈ V by Theorem 4.3. If 1 /∈ λ(tC), then (4.4) is true;
by Lemma 4.6, TCv is closed for a generic v ∈ V . By Proposition 4.2, a generic
TC-orbit intersects C∗1. Consequently, we have
codimGCv = 1.(4.5)
Let 1 ∈ TC ∩ Z. The orthogonal projection of 1 into the tangent space T1TC1 is
F -fixed. Hence, it is proportional to 1; since 1 /∈ λ(tC), this implies 1 ⊥ λ(tC)1.
Therefore, T1T
C1 coincides with the tangent space to the hypersurface z1 . . . zn = 1
at 1; since the monomial on the left is an eigenfunction of TC, this group keeps it.
Due to (4.5), TC1 coincides with this hypersurface. Then, T = Tn ∩ SU(n), and
any TC-orbit that intersects Z is a hypersurface z1 . . . zn = c, for some c ∈ C∗. This
implies tC = C0(K) and T = T
Thus, the theorem is proved for all irreducible G. The projection onto each
irreducible component keeps the property (3.2) for generic orbits since it commutes
with G. Hence, (i) holds for all irreducible components. They correspond to F -
orbits Kk in the partition (4.2). Let t
, k = 1, . . . , p , be defined as in Example 4.4.
According to the arguments above, λ (t|Kk) ⊇ λ(t0k) for all k. If x ∈ t, then the
averaging
Ax = 1
f∈F fx, r = cardF,
distinguishes the F -fixed component of x (i.e., Ax ∈ C(K)F ∩ t and x− Ax ∈ t0);
since t is F -invariant, it contains both components. By Lemma 4.1, if G has a finite
center, then λ (t|Kk) = λ(t0k) for all k. It follows that
t ⊆ t0 = t01 ⊕ · · · ⊕ t0p.
On the other hand, (ii) and Lemma 4.1 imply codim t ≤ dimC(K)F = p. Hence,
the inclusion above is in fact the equality. Thus, we get (B) assuming that G has
a finite center. The converse is true since t0 does not contain a nontrivial F -fixed
element. The same arguments show that any F -invariant torus T includes T 0 if (i)
is true. This proves that V,G admit the realization of Example 4.4; (1) and (2) are
clear. �
Corollary 4.7. Let G be as in Theorems 4.5 and 4.3. Then G contains a closed
subgroup G0 such that
(1) each connected component of G contains a connected component of G0,
(2) G0 has a finite center,
(3) generic orbits of (G0)C are closed,
(4) Gv ∩ TRv = G0v ∩ (T 0)Rv for a generic v ∈ V .
Proof. By Theorem 4.5 and (4.3), G ⊇ T 0, where T 0 is as in (B). Clearly, F
normalizes T 0. Hence, G0 = FT 0 is a group, which satisfies the corollary. �
14 V.M. GICHEV
Proposition 2.2 makes it possible to find Ĝv for G as above. If T = Tn, then
T ⊃ Zn2 and generic T -orbits intersect Rn+; hence, we may assume v ∈ Rn+. Then
T̂ v ∩ Rn is a parallelepiped Πv = conv{(±v1, . . . ,±vn)}. Clearly, Πv = Zn2Π+v ,
where Π+v = Πv ∩ closRn+. Since Rn+ = TRv, we may use Proposition 2.2 with
X = Fv, CT = − closRn+, PX = conv(Fv) − Rn+:
Ĝv = ∪u∈exp(Qv)Dnu = T ∪u∈exp(Qv) Πu = T ∪u∈exp(Qv) Π+u ,
where Qv = convFv. For the description in the form (2.10), one has to know
normal vectors to faces of convFv. Since F may be an arbitrary subgroup of
Sn, they need not be proportional to rational vectors (for example, this is true
for the cyclic subgroup of order 3 in S3). We shall describe the situation where
they are locally independent of v; since they depend on v continuously, this is
equivalent to the condition that they are rational. Note that the vector which joins
two points in tR as in Lemma 2.1 is rational if and only if the strip reduces to an
annulus. In Example 4.4, F need not be the product of groups corresponding to
the irreducible components; we shall see that F possesses this property in the case
under consideration.
Let U be a real vector space and F ⊂ GL(U) be a finite group. Set
Cu = cone(u− Fu);
this is the cone at the vertex u of the polytope conv(Fu) (which may be degenerate).
We say that Cu is locally independent of u if, for a generic u ∈ U , Cu = Cw for all
w that are sufficiently close to u.
Lemma 4.8. Let U be a real vector space and F be a finite subgroup of GL(U).
Suppose that Cu is locally independent of u. Then F is generated by reflections in
hyperplanes in U .
Proof. We may assume without loss of generality that U is equipped with an inner
product and that F ⊆ O(U). Let R+(u − fu), f ∈ F , be an extreme ray of Cu.
The equality Cu = Cw for w in a neighborhood of u implies that this ray does not
change near u. Hence, dim(1 − f)U = 1. Since f is orthogonal and nontrivial, it
is a refection in a hyperplane. The stable subgroup of a generic u ∈ U is trivial
(hence, F acts freely on a generic orbit) and each vertex of conv(Fu) can be joined
with u by a chain of edges. Applying the above arguments repeatedly to u, fu, etc.,
we get that F is generated by reflections in hyperplanes. �
For any g ∈ Zn2Sn and k = 1, . . . , n, gǫk = ± ǫσ(k) for some σ ∈ Sn. The mapping
f → σ is a natural homomorphism Zn2Sn → Sn, which we denote by φ.
Lemma 4.9. Let F be a transitive subgroup of Sn acting in R
n by permutations
of coordinates and let a group H ⊆ Zn2Sn be generated by reflections in hyperplanes
in Rn. If φ(H) = F , then F = Sn.
Proof. Let ρ be a reflection in a hyperplane in Rn. If ρ ∈ Zn2Sn = BCn, then it
is conjugate to a reflection in a wall of the Weyl chamber that is distinguished by
the inequalities x1 > · · · > xn > 0. Hence, φ(ρ) is a transposition if it is nontrivial.
Since F = φ(H), F is generated by transpositions. It remains to note that any
subgroup of Sn, which is generated by transpositions, coincides with Sn if it is
transitive on {1, . . . , n} (consider the graph with the vertices {1, . . . , n} and edges
corresponding to transpositions and note that inclusions (k, l) ∈ F , (l,m) ∈ F
imply (k,m) ∈ F ; this makes it possible to use the induction). �
ORBITS OF TORI EXTENDED BY FINITE GROUPS AND THEIR POLYNOMIAL HULLS: THE CASE OF CONNECTED COMPLEX ORBITS15
We say that a pair (V,G) is standard if it is isomorphic to (A) or (B) in Ex-
ample 4.4 with F = SK . The product of pairs (Vk, Gk), k = 1, . . . ,m, is the pair
k=1 Vk,
k=1 Gk).
Theorem 4.10. Let G = FT be a compact subgroup of GL(n,C), where T ⊆ Tn
is a torus and F is a subgroup of Sn. Suppose that Gv ⊂ TCv for a generic v ∈ V
(1) either T = Tn or the center of G is finite,
(2) for a generic v ∈ Cn, Ĝv can be distinguished in closTCv by a family of
inequalities
|z1|s1 . . . |zn|sn ≤ ρs(v),
where ρs(v) ≥ 0 and vector s = (s1, . . . , sn) runs over a certain finite subset
of Rn which is independent of v.
Then (V,G) is isomorphic to the product of standard pairs. Moreover, if G is
irreducible, then (V,G) is standard.
Proof. Let G be irreducible. Then F is transitive and (V,G) are as in (A) or as
in (B) by Theorem 4.5. Suppose that (B) is the case. It follows from (2) and
Proposition 2.2 that the polytope QX ⊂ tR, where X = Gv ∩ TRv, for a generic
v, satisfies the assumption of Lemma 4.8. Therefore, F is generated by reflections
(we may assume that F ⊂ O(tR)). They extend to reflections in hyperplanes in
tR+R1 = Rn if we assume that they fix 1. Then, Lemma 4.9 implies F = Sn. The
case (A) can be reduced to (B): it is sufficient to replace Tn with T = SU(n) ∩ Tn
since F evidently keeps T and to note that (2) remains true due to Proposition 2.2.
Thus, (V,G) is standard.
Let the center of G be finite. According to Theorem 4.5, T may be identified
with the group T 0 in Example 4.4. In particular, GCv is closed for a generic v
and CT = 0 due to Proposition 2.2. By Proposition 4.2, generic orbits contain
F -fixed points. Applying the arguments above (which did not use the assumption
that G is irreducible), we get that the cones at the vertices of the convex polytope
QX , X = Gv ∩ TR ⊂ tR, are locally independent of v. Clearly, the same is true
for its projection into each space t0k corresponding to an irreducible component
Vk of V = C
n. This implies that all irreducible components are standard. Thus,
Fk = S(Kk), where k = 1, . . . , p and K = K1 ∪ · · · ∪Kp is the partition of K into
F -orbits. Due to Theorem 4.5, it is sufficient to prove that
F = F1 × · · · × Fp.(4.6)
By Lemma 4.8, F |t0 is generated by reflections in hyperplanes in t0; the condition
that they keep real F -invariant functions on K uniquely defines their extension to
Rn. Hence, F is generated by reflections in Rn. A permutation which induces a
reflection in a hyperplane in Rn is a transposition of a pair of coordinates; this pair
is necessarily contained in only one of the sets Kk, k = 1, . . . , p. This proves (4.6).
If T = Tn, then T is a product of tori in irreducible components. Thus, the case
T = Tn follows from the above case, since the assumptions of the theorem hold
true for the group T 0 if they hold for T in (4.3) in Example 4.4. �
5. Hulls of isotropy orbits of bounded symmetric domains
We start with a preliminary material on hermitian symmetric spaces following
[15] but adapting the exposition to our purpose in order to be as self contained as
16 V.M. GICHEV
possible. For a subset X of a Lie algebra g, z(X) = {z ∈ g : [z,X ] = 0} is the
centralizer of X . Let G be a simple real noncompact Lie group with a finite center,
K be its maximal compact subgroup, and g, k be their Lie algebras, respectively.
If the center z = z(k) of k is nontrivial, then g is called hermitian. Then k = z(z)
and dim z = 1 (note that K is irreducible in g/k). Let c be a Cartan subalgebra
of k. Then c is also a Cartan subalgebra of g and z ⊆ c. There exists k ∈ z such
that ad(k) has eigenvalues 0,±i (it is unique up to a sign; ker ad(k) = k). Then
κ = eπ ad(k) is the Cartan involution which defines the Cartan decomposition
g = k⊕ d,(5.1)
where k, d are eigenspaces for 1,−1, respectively. Furthermore, j = ad(k) is a
complex structure in d. This defines the structure of a hermitian symmetric space of
noncompact type in D = G/K. These spaces can be realized as bounded symmetric
domains in Cn with K = Aut0(D). Any irreducible bounded symmetric domain
admits such a realization. Let ∆ ⊆ ic∗ be the root system of gC. Each α ∈ ∆
corresponds to an sl2-triple hα, eα, fα such that ihα ∈ c. Thus, α(hα) = 2, [eα, fα] =
hα, and
[h, eα] = α(h)eα, [h, fα] = −α(h)fα(5.2)
for all h ∈ cC. We identify cC and (c∗)C equipping g with an Ad(K)-invariant
sesquilinear inner product and normalize it by the condition
max{|α| : α ∈ ∆} =
2.(5.3)
Then short roots must have length 1 (note that G2 has no real hermitian form).
The set ∆∨ = {hα : α ∈ ∆} is the dual root system. The above normalization
implies hα = α for long roots and hα = 2α for short ones. Since ad(h), h ∈ c, has
eigenvalues 0 and α(h), where α ∈ ∆, we get α(ik) = 0,±1, i.e., ik is a microweight
(of ∆∨). For s = 0,±1, set
∆s = {α ∈ ∆ : α(ik) = s}.(5.4)
Since k⊕ id is a compact real form of gC and spanR{ihα, eα − fα, i(eα + fα)} is the
su(2)-subalgebra corresponding to a root α ∈ ∆, we have
d = spanR{eα + fα, i(eα − fα) : α ∈ ∆1}.(5.5)
Set sα = spanR{ihα, eα + fα, i(eα − fα)}. Then sα is an sl(2,R)-subalgebra of gC
α ∈ ∆±1 ⇐⇒ sα ⊆ g.(5.6)
Let E be a maximal subset of pairwise orthogonal long roots in ∆1. Set
α∈E hα, e =
α∈E eα, f =
α∈E fα;
α∈E ⊕ sα.
Let α, β ∈ E, α 6= β. Since α, β are long and orthogonal, ±α± β /∈ ∆. Hence,
α, β ∈ E, α 6= β =⇒ [sα, sβ] = 0.(5.7)
It follows that h, e, f is an sl2-triple and s is a subalgebra of g. Set
θ = e
π ad(e−f),(5.8)
a = spanR θE.(5.9)
ORBITS OF TORI EXTENDED BY FINITE GROUPS AND THEIR POLYNOMIAL HULLS: THE CASE OF CONNECTED COMPLEX ORBITS17
Here is the standard realization of root systems Bn and Cn:
Bn = {±ǫk ± ǫl, ±ǫm : k, l,m = 1, . . . , n, k < l};
Cn = {±ǫk ± ǫl, ±2ǫm : k, l,m = 1, . . . , n, k < l}.
Then Cn = B
n , but Cn does not satisfy (5.3). These systems have microweights;
up to the action of the Weyl group, they are:
Bn : ǫ1;
(ǫ1 + · · ·+ ǫn).
There are no other irreducible root systems which have microweights and contain
roots of different lengths. Also, Bn and Cn have the sameWeyl group BCn = Z
Lemma 5.1. The space a is a maximal abelian subspace of d.
Proof. A straightforward calculation with 2-matrices shows that θh = e + f. By
(5.7),
θhα = eα + fα for all α ∈ E.(5.10)
It follows from (5.5) that a ⊆ d. Moreover, a is abelian due to (5.7). Set Ξ = ∆∩E⊥.
We claim that
Ξ ⊆ ∆0.(5.11)
Indeed, a root in ∆1 ∩ Ξ must be short. This may happen only in Bn or Cn, since
G2 and F4 have no microweights and other irreducible root systems have no roots
of different lengths. In Bn, k is a short root and all other short roots are orthogonal
to k. Hence, they do not belong to ∆1. In Cn, E = {2ǫ1, . . . 2ǫn}; then Ξ = ∅.
Since ∆−1 = −∆1, this proves (5.11).
Set b = E⊥ ∩ c and m = spanC{eα, fα : α ∈ Ξ}. It follows from (5.11) that
m ⊆ kC. Clearly, z(E) = cC ⊕ m. The space m is θ-invariant, because θ fixes roots
in Ξ. Due to (5.9), we get
z(a) = θz(E) = bC ⊕ aC ⊕m
Since bC ⊕m ⊆ kC, this implies z(a) ∩ d = a. �
The projection of θ∆ into a is the restricted root system ∆a (it is also the set of
roots for ad(a) in g). The group
W = {Ad(g) : g ∈ K, Ad(g)a = a}|a,
acting in a, is the Weyl group of a.
In what follows, we denote by v the complexification of a with respect to the
complex structure j (thus, v ⊂ d). The set θE is a base in v; enumerating it, we
identify v with Cn. Set t = spanR iE, T = exp t, H = WT . The torus T = T
n is a
maximal compact subgroup in the group exp s ⊆ G.
Proposition 5.2. The following assertions hold:
(1) ∆a is a root system of type BCn or Cn;
(2) the pair (v, H) is standard with T = Tn.
Proof. (1). Let ∆a \ θE contain a long root α. Then α = 12 (α1 + α2 + α3 +
α4) for some α1, . . . , α4 ∈ θE, since |α|2 = 2 and 〈α, β〉 = 0,±1 for all β ∈ E
due to the normalization (5.3) (note that α, β generate A2 if 〈α, β〉 6= 0). Roots
α, α1, . . . , α4 generate D4, since only A4 and D4 among irreducible systems of rank
18 V.M. GICHEV
4 consist of roots of equal length, but A4 does not contain an orthogonal base. Since
〈ik, β〉 = 1 for all β ∈ E, the projection of iθk into spanR D4 is a microweight ω
such that 〈ω, αk〉 = 1, k = 1, . . . , 4, but D4 has no microweight with this property
(in the realization above, D4 = B4 ∩ C4 and the microweights are either ±ǫk or
(±ǫ1± ǫ2± ǫ3± ǫ4)). Thus, E ∪ (−E) is the set of all long roots in ∆a. According
to the classification of irreducible root systems, only Cn and BCn = Bn ∪ Cn has
the property that linearly independent long roots are mutually orthogonal.
(2). The maximal compact subgroup of the group corresponding to s is Tn.
Hence, T = Tn ⊃ Zn2 . Systems Cn and BCn have the same Weyl group W = BCn.
Therefore, H = WT = SnT
Let D be a bounded symmetric domain in a complex linear space d (may be,
reducible) and v ⊆ d be the complex linear span of a maximal abelian subspace in
d (thus, we identify d with the corresponding space in the Cartan decomposition
(5.1), which is induced by the Cartan involutions in irreducible components). Let
Aut00(v, D) denote the subgroup of all linear transformations in Aut(D) which keep
v and each irreducible component of D.
Corollary 5.3. Let F be a subgroup of Sn, G = FT
n ⊂ GL(n,C). Then G
satisfies condition (2) of Theorem 4.10 if and only if (V,G) is isomorphic to a pair
(v,Aut00(v, D)) for a bounded symmetric domain D.
Proof. All pairs (Cn, SnT
n) appear as (v,Aut00(v, D)) for matrix balls D. It re-
mains to combine Theorem 4.10 and Proposition 5.2. �
It is possible now to describe hulls of K-orbits in d (with respect to the complex
structure j) it terms of Proposition 3.2. The key point is that K is polar in d: each
K-orbit meets a orthogonally (i.e., a is a Cartan subspace). This is true, since all
maximal abelian subspaces are conjugate in d by K, ad(a) is symmetric if a ∈ d
and, for a generic a ∈ a, ker ad(a) = a; hence,
[a, g] = a⊥.(5.12)
We may include the linear base in v into a base in d as the first n vectors of the
latter. Then z1, . . . , zn are coordinates in v and linear functions in d. The functions
µk in (0.6) admit a K-invariant extension to d:
µk(z) = sup{|(gz)1 . . . (gz)k| : g ∈ K},(5.13)
where k = 1, . . . , n. The following lemma shows that (5.13) is an extension indeed.
Lemma 5.4. For z ∈ v, (0.6) and (5.13) coincide.
Proof. It follows from (5.12) that any critical point of the linear function Re z1 on
the orbit Kz belongs to a. If the lemma is not true, then there exist z ∈ v and
k ∈ {1, . . . , n} such that |(gz)k| > |zk|. Transformations in Sn and T reduce the
problem to the case z1 > · · · > zn > 0 and k = 1, but then the assumption implies
that Re z1 attains its maximal value on Kz outside of a. �
Proposition 5.5. For any v ∈ d, K̂v = {z ∈ d : µk(z) ≤ µk(v), k = 1, . . . , n}.
Proof. Due to (5.13), each µk is a supremum of absolute values of holomorphic
polynomials. Hence, the right-hand side is polynomially convex. Thus, it includes
K̂v. The inverse inclusion holds, since each K-orbit intersects v by an H-orbit
and hulls of H-orbits are distinguished in v by the same inequalities according to
Proposition 3.2 and Lemma 5.4. �
ORBITS OF TORI EXTENDED BY FINITE GROUPS AND THEIR POLYNOMIAL HULLS: THE CASE OF CONNECTED COMPLEX ORBITS19
The functions µk can be written in more invariant terms. To do it, note that the
Weyl group of ∆a has the form Z
2Sn in the base θE by (5.9); thus, zk = αk(z),
k = 1, . . . , n, where αk ∈ θE and z ∈ a. Therefore, zk are eigenvalues of ad(z)
in the subspace generated by the corresponding root vectors. The problem is to
distinguish this subspace (in fact, we use a slightly different version). After that,
functions µk can be defined as norms of some operators according to the following
lemma (this observation was used in [12] in another context).
Lemma 5.6. Let V be a Euclidean space and A be a symmetric nonnegative oper-
ator in V with eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λm ≥ 0, where m = dim V . Let A∧k be
its natural extension to the k-th exterior power V ∧k =
V . Then
‖A∧k‖V ∧k = λ1 . . . λk,
where ‖ ‖k is the operator norm with respect to the inner product in V ∧k.
Proof. The norm of a nonnegative symmetric operator is equal to its maximal
eigenvalue. �
Let v ∈ g be semisimple and π(v) denote the projection onto ker ad(v) along
other eigenspaces of ad(v) (note that π(v) is a function of ad(v), since it is the
residue at zero of the resolvent of ad(v)). Set
a(v) = ad([v, [v, k]])π(v) ad(k),
pk(v) = ‖a(v)∧k‖g∧k , k = 1, . . . , n.
The space d is a(v)-invariant and ker a(v) ⊇ k. We assume that g is equipped
with some K-invariant inner product, which extends the inner product in d. It
follows from the calculation below that a(v) is symmetric and has range ad(k)a.
Let n = dim a be the rank of the symmetric space D. It is equal to the codimension
of a generic K-orbit in d.
Theorem 5.7. For any v ∈ d,
K̂v = {z ∈ d : pk(z) ≤ pk(v), k = 1, . . . , n}.
Proof. It is sufficient to prove the assertion for a generic v ∈ d. Clearly, pk are
K-invariant. Hence, we may assume v ∈ a. Then, by (5.9) and (5.10),
vα(eα + fα),
where vα ∈ R. According to (5.2) and (5.4), [k, v] =
α∈E ivα(eα − fα). Thus,
[v, [v, k]] =
2iv2αhα
due to (5.7). Also, (5.7) implies that ad(ihα) keeps v and has eigenvalues 0,±2i
in it for each α ∈ E. Therefore, v is ad([v, [v, k]])-invariant and its eigenvalues are
±4v2αi, α ∈ E. Since ad(k)g = d and π(v)d = a for a generic v ∈ a, the space v is
a(v)-invariant; moreover, a(v)g = ad(k)a ⊆ v. Thus, a(v) has eigenvalues 0,±4v2α
in g. According to Lemma 5.6 and (0.6),
pk(v) = 4
kµ2k(v)(5.14)
for v ∈ v and k = 1, . . . , n. Since pk and µk are K-invariant, (5.14) holds for all
v ∈ d. The theorem follows from Proposition 5.5. �
20 V.M. GICHEV
Corollary 5.8. Functions pk, k = 1, . . . , n, are plurisubharmonic in d with respect
to the complex structure j = ad(k).
Proof. By (5.14) and (5.13),
pk(z) = 4
k sup{|(gz)21 . . . (gz)2k| : g ∈ K}.
The right-hand side is plurisubharmonic, since the functions z2
are j-holomorphic
and j is K-invariant. �
One can get the same functions pk by replacing g with d, endowed with the
complex structure j, and a(v) with ad([v, jv])(π(v) + π(jv)).
Acknowledgement. I thank Anton Pankratiev for helpful comments.
References
[1] J.T. Anderson, Polnomial hulls of sets invariant under an action of the special unitary group,
Can. J. Math., v. XL, (1988) no. 5, 1256-1271.
[2] J.-E. Björk, Compact groups operating on Banach algebras, Math. Ann., v. 205 (1973), no. 4,
281–297.
[3] J. Dadok, Polar coordinates induced by actions of compact Lie groups, Transactions Amer.
Math. Soc., 288 (1985), 125-137.
[4] A. Debiard, B. Gaveau, Equations de CauchyRiemann sur SU(2) et leurs enveloppes dholo-
morphie, Can. J. Math., v. 38 (1986), 1009-1024.
[5] T. Gamelin, Uniform Algebras, Prentice-Hall, Englewood Cliffs, N. J., 1969.
[6] V.M. Gichev, Maximal ideal spaces of invariant function algebras on compact groups, preprint,
46 pp., available at http://arXiv.org/abs/math/0603449
[7] V.M. Gichev, I.A. Latypov, Polynomially Convex Orbits of Compact Lie Groups, Transfor-
mation Groups, v. 6 (2001) no. 4, 321-331.
[8] J. Faraut, L. Bouattour, Enveloppes polynomiales densembles compacts invariants, Math.
Nachrichten 266 (2004), 20-26.
[9] W. Kaup, Bounded symmetric domains and polynomial convexity, Manuscr. Math. 114 (2004),
No.3, 391-398.
[10] W. Kaup, D. Zaitsev, On the CR-structure of compact group orbits associated with bounded
symmetric domains, Invent. Math., 153 (2003), 45-104.
[11] J. Kane, Maximal ideal spaces of U-algebras, Illinois J. Math., v.27 (1983), No 1, 1-13.
[12] B. Kostant, On convexity, the Weyl group and the Iwasawa decimposition, Ann. Sci. Ecole.
Norm. Super., 6 (1973), 413–455.
[13] V.L. Popov, E.B. Vinberg, Invariant theory, Itogi Nauki i Tekhniki, Sovr. Probl. Mat. Fund.
Napravl., v. 55, VINITI, Moscow 1989, pp. 137–309 (in Russian); English transl.: Algebraic
Geometry IV, Encyclopaedia of Math. Sciences, v. 55, Springer-Verlag, Berlin 1994, pp. 123-
[14] C. Sacre, Enveloppes polynomiales de compacts, Bull. Sci. math. 116 (1992), 129-144.
[15] J.A. Wolf, Fine Structure of Hermitian Symmetric Spaces, Symmetric spaces (Short Courses,
Washington Univ., St. Louis, Mo., 1969-1970), Pure Appl. Math., Vol. 8. New York: Dekker
1972, pp. 271-357.
Omsk Branch of Sobolev Institute of Mathematics, Pevtsova 13, 644099, Omsk, Rus-
E-mail address: gichev@ofim.oscsbras.ru
http://arXiv.org/abs/math/0603449
	Introduction
	1. Preliminaries
	2. Hulls of finite unions of T-orbits in a TC-orbit
	3. Finite extensions of T that keep a TC-orbit
	4. Finite extensions of T which keep generic TC-orbits
	5. Hulls of isotropy orbits of bounded symmetric domains
	Acknowledgement
	References
ABSTRACT
  Let $V$ be a complex linear space, $G\subset\GL(V)$ be a compact group. We
consider the problem of description of polynomial hulls $\wh{Gv}$ for orbits
$Gv$, $v\in V$, assuming that the identity component of $G$ is a torus $T$. The
paper contains a universal construction for orbits which satisfy the inclusion
$Gv\subset T^\bbC v$ and a characterization of pairs $(G,V)$ such that it is
true for a generic $v\in V$. The hull of a finite union of $T$-orbits in
$T^\bbC v$ can be distinguished in $\clos T^\bbC v$ by a finite collection of
inequalities of the type $\abs{z_1}^{s_1}...\abs{z_n}^{s_n}\leq c$. In
particular, this is true for $Gv$. If powers in the monomials are independent
of $v$, $Gv\subset T^\bbC v$ for a generic $v$, and either the center of $G$ is
finite or $T^\bbC$ has an open orbit, then the space $V$ and the group $G$ are
products of standard ones; the latter means that $G=S_nT$, where $S_n$ is the
group of all permutations of coordinates and $T$ is either $\bbT^n$ or
$\SU(n)\cap\bbT^n$, where $\bbT^n$ is the torus of all diagonal matrices in
$\rU(n)$. The paper also contains a description of polynomial hulls for orbits
of isotropy groups of bounded symmetric domains. This result is already known,
but we formulate it in a different form and supply with a shorter proof.

<|endoftext|><|startoftext|>
YITP-07-16, OCU-PHYS-264, AP-GR-40
Formation of a sonic horizon in isotropically expanding Bose-Einstein condensates
Yasunari Kurita
Osaka City University Advanced Mathematical Institute, Osaka 558-8585, Japan
Takao Morinari
Yukawa Institute for Theoretical Physics, Kyoto University, Kyoto 606-8502, Japan
We propose a simple experiment to create a sonic horizon in isotropically trapped cold atoms
within currently available experimental techniques. Numerical simulation of the Gross-Pitaevskii
equation shows that the sonic horizon should appear by making the condensate expand. The ex-
pansion is triggered by changing the interaction which can be controlled by the Feshbach resonance
in real experiments. The sonic horizon is shown to be quasi-static for sufficiently strong interaction
or large number of atoms. The characteristic temperature that is associated with particle emission
from the horizon, which corresponds to the Hawking temperature in an ideal situation, is estimated
to be a few nK.
PACS numbers: 03.75.Kk, 03.75.Hh, 04.62.+v, 05.30.Jp
I. INTRODUCTION
For the exploration of cosmology and gravitational
physics, it is necessary to have a deep understanding of
quantum filed theory in curved spacetime: It is widely
believed that everything except for the spacetime it-
self should originate from quantum fluctuations in the
early Universe. Quantum effects on curved spacetime,
such as the Hawking radiation, give us theoretical sup-
port for black hole thermodynamics. However, it is ex-
tremely hard to verify such quantum effects experimen-
tally. For instance, the Hawking radiation is thermal
radiation emitted from a dynamically formed stationary
black hole [1]. However, the characteristic temperature
of the thermal radiation, the Hawking temperature, is on
the order of several tens of nanokelvins at most, which is
much lower than the cosmic microwave background radi-
ation temperature. So detecting thermal radiation from
a real black hole is almost impossible.
One way to circumvent this difficulty is to make use of
artificial black holes [2][3]. Unruh showed in his seminal
paper[4] that excitations in a supersonic flow corresponds
to a scalar field equation on a curved spacetime includ-
ing a horizon. Since the phenomenon of the Hawking
radiation can be separated from gravitational physics, it
is possible to detect the corresponding phenomenon in a
fluid system with sonic horizon [4]. The basic idea is to
identify fluid flow with curved spacetime and excitation
modes with fields on the curved spacetime. A black hole
event horizon corresponds to a sonic horizon in a fluid.
For the purpose of investigating the quantum effects, a
quantum fluid should be considered. As such a quantum
fluid, Bose-Einstein condensates (BEC) in trapped cold
atoms [5, 6] are one of the most suitable systems [7]-[9].
A crucial advantage is that one can control scattering
length between atoms by making use of the Feshbach
resonance[10]. In fact, that experimental technique was
used in observing jets and bursts in a collapsing conden-
sate, which is called “Bose-Novae” [11]. An remarkable
explanation of burst and jet phenomena in Bose-Novae
was proposed in [12][13], based on quantum field theory
of particle creation and structure formation in cosmolog-
ical spacetime.
In order to verify the Hawking effect in fluid analogy, it
is necessary to create a stationary sonic horizon because
it is a phenomenon on a dynamically formed stationary
black hole. Although several possibilities have been dis-
cussed so far [7, 8], it seems difficult to realize exactly sta-
tionary sonic horizon in cold atoms. However, if one can
make a quasi-static horizon for high frequency modes,
particle emission from the horizon is also expected. In
this paper, we numerically demonstrate that a quasi-
static horizon is realized without introducing new exper-
imental techniques beyond currently available ones. We
consider an expanding BEC driven by a sudden change of
the interaction. Numerically solving the Gross-Pitaevskii
(GP) equation, we show that a quasi-static horizon ap-
pear.
We note that there have been made great efforts to cre-
ate cosmological geometry using expanding BEC[14]-[21].
In these papers, the analogue models with specific cos-
mological metrics such as Friedmann-Robertson-Walker
(FRW) metric or de Sitter metric were discussed and the
effects of particle creation in these cosmological space-
times were investigated. However, in this paper, we do
not intend to obtain any cosmological analogue model
with well-known analytic metric. But we try to obtain
dynamically formed quasi-static sonic horizon. Further-
more, the sonic horizon should be formed in hydrody-
namic regime of the condensate because the spacetime
analogy is only valid in such regime. The appearance
of a horizon due to expansion of a condensate was no-
ticed in the previous works, for example in [15], and its
formation itself is not surprising. But it is non-trivial
whether the condensate flow at the horizon is in the hy-
drodynamic regime, or not. In this paper, we show that
the quasi-static sonic horizon will appear in the hydrody-
namic regime of the condensate by changing the atomic
interaction instantaneously.
http://arxiv.org/abs/0704.1096v3
II. ANALOGUE SPACETIME IN BEC
In the coherent state path integral formulation, the
action of bosons is given by
i~φ̄∂tφ−
∇φ̄ · ∇φ − Vextφ̄φ
U0(φ̄φ)
, (1)
where Vext is the confining potential and U0 = 4π~
with a the s-wave scattering length. For φ and Vext,
spatial and time dependences are implicit. The saddle
point equation for this action leads to the GP equation:
i~∂tΨ =
∇2 + Vext + U0|Ψ|2
Ψ. (2)
This GP equation governs the dynamics of the conden-
sate whose order parameter is given by Ψ.
Now we consider hydrodynamical approximation. We
denote the bosonic field φ as φ =
ρ0 + ρe
i(ϕ0+ϕ), where√
ρ0 and ϕ0 are the amplitude and the phase of Ψ, re-
spectively. (Namely, Ψ =
ρ0 exp(iϕ0).) The fields ρ
and ϕ describe the non-condensate part of the bosonic
field. If the density gradient is sufficiently smooth over
the scale determined by the local healing length ξ(r, t) ≡
~/(2mρ0U0)
1/2, or, in other words if the conditions,
|ξ∇ρ0/ρ0|2 ≪ 1 and |ξ∇ρ/ρ|2 ≪ 1, (3)
are satisfied, hydrodynamical approximation is justified.
(The condition (3) shall be examined later.) Under the
above condition, the equation for ρ is
ρ = −~(ϕ̇+ v0 · ∇ϕ)/U0, (4)
where v0 =
∇ϕ0 is the background fluid velocity, and
the effective action for ϕ is
Seff =
(ϕ̇+ v0 · ∇ϕ)2 −
(∇ϕ)2
Taking variation with respect to ϕ, we find that the field
equation for ϕ has the form of a propagating wave equa-
tion. Also, the equation for the field ϕ can be expressed
as ∂µ(
−ggµν∂ν)ϕ = 0, where gµν is the inverse of the
following matrix:
gµν ∝
−(c2s − v02) −v0
−v0 1
, (6)
with cs =
ρ0U0/m and g = detgµν . Thus, the equa-
tion is equivalent to an equation for a massless field on
a curved spacetime determined by the metric (6) with cs
the speed of ”light.” Note that in order to interpret the
quantity cs as a velocity, U0 must be positive because, for
negative U0, cs becomes pure imaginary. Hereafter we
consider positive U0, which leads to an effective space-
time with Lorentzian signature.
For the excitation modes of ϕ whose frequencies, say
ω, are much higher than the frequency ωBEC, which is
associated with the condensate motion, the condensate
will be quasi-static. (For moderate changes of the inter-
action, ωBEC turns out to be the trapping harmonic po-
tential frequency ωho, as shall be discussed below .) The
analogy between fields on the curved spacetime and ex-
citation modes on the fluid flow is meaningful only when
the conditions (3) are satisfied. The latter condition in
Eq.(3) turns out to be ω2 ≪ (cs/ξ)2, by using Eq.(4).
Thus, the frequency ω has an upper limit. The former
condition in Eq.(3) is satisfied in the regions far from
the edge of the condensate. (In contrast, if one is very
close to the edge, zero-point oscillations become domi-
nant, and so the former condition in (3) is not satisfied.)
If there exists intermediate region for ω of
ωBEC ≪ ω ≪ cs/ξ, (7)
then the hydrodynamical approximation is justified and
the condensate is quasi-static for excitation modes. Note
that those modes are associated with particle emission
from the horizon if the hydrodynamical flow has a dy-
namically formed sonic horizon. The necessary condition
for the existence of the intermediate region (7) is
ξωBEC
≫ 1. (8)
In the following, we mainly consider condensate satisfy-
ing the above condition.
III. FORMATION OF SONIC HORIZON
Now we investigate sonic horizon formation in an ex-
panding BEC trapped in isotropic harmonic potential,
Vext = mω
2/2, where r is the radial coordinate. Ini-
tially, we set the condensate in a ground state with an
initial atomic interaction ai. At t = 0, the atomic inter-
action is changed suddenly from ai to af (> ai), which
makes the condensate expand. Then, formation of sonic
horizon can be expected. The reason is as follows: The
sound velocity is proportional to square root of the con-
densate density and a decreasing function of r. In con-
trast, the fluid velocity is an increasing function of r
and the condensate expands fast around its edge whereas
v0(r = 0) = 0 due to the boundary condition. Therefore,
at an intermediate radius, v0 exceeds cs and the fluid flow
is transonic. It has a surface satisfying cs = |v0| which
is called a sonic horizon. We should note that the sonic
horizon corresponds to a horizon in the analogue space-
time defined by the metric (6). We also note that the
subsonic region is around the center of the condensate
and inside of the sonic horizon.
In general, if a fluid has a static sonic horizon and a
proper quantum state for an excitation field is realized,
then it is theoretically predicted that the horizon will
emit thermal radiation of the quantum field. As will
 0  5  10  15  20  25  30
af/ai
ai=50a0
ai=200a0
ai=800a0
FIG. 1: Times tc in each simulation are shown in units of
ω−1ho . The horizontal axis is the ratio of af to ai.
be discussed in Appendix A, if the sonic horizon in the
expanding condensate is quasi-static for the field ϕ, the
horizon will emit thermal radiation into the center of the
condensate. The temperature characterizing the thermal
emission (Hawking temperature) is given by the following
formula:
Tpc =
∂r(v0 − cs)|rH , (9)
where rH is the horizon radius and kB is the Boltzmann’s
constant[4][24]. From the above expression, it is found
that the Hawking temperature is determined by gradi-
ent of fluid and sound velocity at the horizon. Thus,
it is important to investigate the velocity gradients at
the horizon. In the derivation of the formula (9), it is as-
sumed that the dynamically formed horizon is static, but
in actual experiments, this assumption is not satisfied ex-
actly. Therefore, the spectrum of the particle is not fully
given by the single Planck’s distribution function, but
rather given by a superposition of the Planck’s distribu-
tion functions with slightly different temperatures. Even
if this is the case, the energy scale of the particle creation
emitted from the dynamically formed horizon is on the
order of Tpc.
We have simulated the expansion of the condensate by
solving numerically (using the Crank-Nicolson scheme)
the time-dependent GP equation. The initial ground-
state wave function is obtained by solving the GP equa-
tion using the steepest descent method for an initial s-
wave scattering length ai and the number of atoms N .
We have computed cs and the radial velocity of the con-
densate via
(Ψ∗Ψ)U0/m, (10)
v0 = ~ [Ψ
∗∂rΨ− (∂rΨ∗)Ψ] /(2mi|Ψ|2), (11)
and searched for parameter sets leading to |v0| > cs.
In the following, we assume that the condensate con-
sists of N = 105 Rb atoms. (The values of atomic inter-
action given below are those in the case of N = 105. If
 0  2  4  6  8  10  12  14
FIG. 2: Sound velocity cs(solid line) and the fluid velocity
v0(dashed line) versus r at t̃ = 0.4 in the case of ai = 200a0
and af = 5ai are shown in units of (~ωho/m)
1/2. The
healing length ξ(dotted line) is shown as well in units of
aho = (~/mωho)
N = 105/n with an integer n, then ai and af should
be multiplied by n.) The initial atomic interaction
is assumed to be ai = 50a0, 200a0 and 800a0 where
a0 = 0.53 × 10−10m is the Bohr radius. The follow-
ing change of the atomic interaction has been simulated:
af/ai = 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30.
Just after t = 0, the condensate begins to expand in
the trapping potential and the expansion is accelerated
for a while. At some time, say t = tc, the expansion
turns to be decelerated. Figure 1 shows tc as a function
of af/ai. It is seen that tc does not depend on the initial
strength of the interaction. For t̃ := ωhot < π/2, the
condensate continues to expand, and at t̃ ≃ π/2, the
condensate starts to collapse. Therefore, we turn off the
trapping potential at t̃ = π/4 and make the condensate
expand freely in order to keep the horizon for a while.
As far as we have investigated, sonic horizon always
appears in the sence of the surface where v0 exceeds cs.
As an example of a sonic horizon, we show Fig. 2 which
is the snapshot at t̃ = 0.4 in the case of ai = 200a0 and
af = 5ai. We see that, around r = 7aho, the fluid velocity
exceeds the sound velocity, and the sonic horizon exists
there. In this case, we find that, at t̃ ≡ ωhot = 0.11,
the horizon appears. Fig. 3 shows the time dependence
of the radius of the horizon, say rH , and the velocity
gradient at the horizon ∂r(v0 − cs)|rH .
If we keep the trapping potential for a long time, an
oscillating behavior of the condensate is observed. The
period of the oscillation is about π in units of ω−1ho and
we find ωBEC ≃ ωho, within a moderate change of the
interaction. This oscillation is just like an oscillation of a
droplet confined in a harmonic potential. Therefore, the
condition (8) can be rewritten as
≫ 1. (12)
FIG. 3: Time dependence of ∂r(v0−cs)rH (solid line) in units
of ωho and the position of the sonic horizon rH (dashed line)
in units of aho = (~/mωho)
in the case of ai = 200a0 and
af = 5ai.
 0  5  10  15  20  25  30
af/ai
ai=50a0
ai=200a0
ai=800a0
FIG. 4: cs/ξωho as a function of af/ai is shown for each
initial scattering length.
Now, we are interested in sonic horizon where (12) is sat-
isfied. The intermediate region (7) exists if, for example,
the following inequality is satisfied:
≥ 22.5. (13)
For this choice of the lower bound, there exists a re-
gion of ω satisfying both conditions of ω ≥ 10ωho and
(cs/ωξ)
2 ≥ 5. The condition (13) ensures hydrodynamic
flow and quasi-static nature of the condensate. Fig. 4
shows cs/ξωho as a function of af/ai. We define horizon
life time as the time interval during which the condition
(13) continues to be satisfied at the horizon. The horizon
life time is shown in Fig. 5. As far as we have investi-
gated, the condensate flow satisfying (13) at sonic horizon
appears only when af ≥ 5ai for ai = 50a0, af ≥ 4ai for
ai = 200a0 and af ≥ 3ai for ai = 800a0.
The Hawking temperature at t = tc and t̃ = 0.79 (just
 0  5  10  15  20  25  30
af/ai
ai=50a0
ai=200a0
ai=800a0
FIG. 5: Horizon life time as a function of af/ai is shown in
units of ω−1ho .
 0  5  10  15  20  25  30
af/ai
ai=50a0
ai=200a0
ai=800a0
FIG. 6: Hawking temperature at t = tc in units of nK. In
the evaluation, we assume ωho = 1400 Hz.
after turning off the trapping potential) are shown in Fig.
6 and Fig. 7, respectively. In the evaluation, we assume
the frequency ωho = 1400 Hz. The Hawking temperature
at t = tc depends on the ratio af/ai almost linearly. In
contrast, for af/ai ≥ 9, the Hawking temperature at
t̃ = 0.79 does not depend on the ratio so much. From the
simulations, the temperature is expected to be a few nK.
For this spherically symmetric trap, one may concern
the three-body recombination loss of condensed atoms.
Now, we check the effect of three-body losses for the given
peak density. This effects may be taken into account by
incorporating the imaginary term describing the inelastic
process in the GP equation [22]
i~∂tΨ =
∇2 + Vext + U0|Ψ|2
Ψ− i~
K3|Ψ|4Ψ,
where K3 denotes three-body recombination loss-rate co-
efficient. Then, the three-body loss is proportional to the
 0  5  10  15  20  25  30
af/ai
ai=50a0
ai=200a0
ai=800a0
FIG. 7: Hawking temperature at t̃ = 0.79 in units of nK,
when just after the trapping potential is turned off. In the
evaluation, we assume ωho = 1400 Hz.
cube of the atomic density
|Ψ|2d3r = −K3
|Ψ|6d3r,
which implies that the three-body loss rate is given by
R3 ≡ K3
|Ψ|6d3r/
|Ψ|2d3r. For the value of K3, we
assume K3 = 2 × 10−28cm6/s, according to [23]. Of
course, high atomic density causes many inelastic pro-
cesses and gives high atomic loss rate. In our numerical
simulation, the upper limit of the loss rate can be esti-
mated by use of the peak density asR3 ≤ 3×10s−1, where
the total atomic number was set to be N =
|Ψ|2d3r =
105. Then, the three-body loss can be ignored because
we consider the time scale of ≤ 10 ms.
In the above evaluation for Hawking temperature, hori-
zon lifetime and R3, we have assumed that the trapping
frequency is ωho = 1400 Hz. Note that ωho is the en-
ergy scale of the system. Therefore, a large value of
ωho is plausible to increase the characteristic tempera-
ture for the particle emission, though the time evolution
process becomes rapid for large ωho. If lower frequency
is assumed, lower temperature, longer horizon lifetime
and fewer three-body loss rate would be expected. As
an example, Fig. 8 shows ωho-dependence of the Hawk-
ing temperature and the horizon lifetime in the case of
ai = 200a0 and af = 10ai.
IV. BOGOLIUBOV SPECTRUM
In the above numerical simulations, we assume there
is no dynamical instability. Now, we check whether there
is dynamical instability or not, within Gaussian approx-
imation. For that purpose, we study the Bogoliubov-de
Gennes equations: the second quantized field equations
 0  200  400  600  800  1000  1200  1400
horizon life time
FIG. 8: ωho-dependence of TH and horizon lifetime in the
case of ai = 200a0 and af = 10ai. The Hawking temperature
is shown in units of nK and horizon lifetime is in units of ω−1ho .
for the excitation fields δφ and δφ are given by
i~∂tδφ =
∇2 + Vext + 2U0|Ψ|2
δφ+ U0Ψ
−i~∂tδφ =
∇2 + Vext + 2U0|Ψ|2
δφ+ U0
The excitation spectrum is computed by performing the
Bogoliubov transformation:
uα (r) bαe
−iEαt/~ − vα (r) b†αeiEαt/~
,(14)
u∗α (r) b
iEαt/~ − v∗α (r) bαe−iEαt/~
.(15)
The energy spectrum Eα is calculated by diagonalizing
the skew symmetric matrix, which is carried out by using
a routine in LAPACK. For the parameter values taken
above, we find that all eigenvalues do not have the imag-
inary parts within numerical errors. Therefore, within
Gaussian approximation, there is no dynamical instabil-
ity. In addition, we find that there is no level crossing.
V. SUMMARY
To summarize, we have proposed an experiment to
create a quasi-static sonic horizon using an expanding
BEC. It has been shown that the dynamically formed
quasi-static sonic horizon is in hydrodynamic regime as
it should be to discuss analogy with curved spacetime in
BEC. Under suitable choices of the interaction parameter
and the confining potential, the characteristic tempera-
ture of the particle emission is expected to be a few nK
for sufficiently strong confining potential. Large num-
ber of atoms or strong atomic interaction improves the
quasi-static nature of the horizon.
Of course, other effect such as cosmological particle
creation can occur in this expanding BEC setup, as dis-
cussed in [14]-[21]. In this paper, we have focused on
how to make dynamically formed quasi-static sonic hori-
zon in the hydrodynamic regime of the condensate flow.
In order to investigate cosmological particle creation ef-
fect and other excitations arising from depletion, we need
a different numerical simulation scheme. The result will
be reported in a future publication.
Furthermore, it is interesting to investigate numeri-
cally the behavior of negative frequency modes with pos-
itive norm which seem to be related to Hawking effect as
was discussed in [26][27]. This point shall be investigated
in a future publication.
Acknowledgments
Y.K. thanks Hideki Ishihara, Ken-ichi Nakao, and
Makoto Tsubota for useful discussions. The authors
thank Michikazu Kobayashi and Takashi Uneyama for
useful comments on numerical simulations. Y.K. was
partially supported by the Yukawa memorial founda-
tion. This work was also supported by the 21st Century
COE ”Center for Diversity and Universality in Physics”
and ”Constitution of wide-angle mathematical basis fo-
cused on knots” from the Ministry of Education, Culture,
Sports, Science and Technology (MEXT) of Japan. The
numerical calculations were carried out on Altix3700 BX2
at YITP in Kyoto University.
APPENDIX A: PARTICLE CREATION
PHENOMENON
Here we focus on spherically symmetric quantum fluc-
tuations by symmetry. At t ≤ 0, the fluid velocity v0 = 0,
and the metric of the initial static effective spacetime is
ds2 ∝ −c2sdt2 + dr2 + r2dΩ2S2 , (A1)
where dΩ2S2 is the element of solid angle on the unit
sphere S2. After the increase of the interaction, the ef-
fective spacetime evolves dynamically as the BEC starts
to expand. Then, the sonic horizon is formed as was
shown by the above numerical simulation. If the effec-
tive spacetime is static, we can introduce a following time
coordinate: τ = t +
v0dr/(c
s − v20), and the effective
spacetime metric becomes
ds2 ∝ −(c2s − v20)dτ2 +
c2sdr
c2s − v20
+ r2dΩ2S2 . (A2)
From this expression, it is found that the horizon is lo-
cated at the surface where the condition cs = |v0| is sat-
isfied. A new coordinate v is introduced as v ≡ τ + r∗
where r∗ ≡
csdr/(c
s − v20), which is a coordinate char-
acterizing ingoing light-like (null) rays in the effective
spacetime.
We assume here that the initial state of the quantum
field ϕ is the vacuum state for the static observer in the
initial effective spacetime. Under the time evolution of
the effective spacetime caused by the expansion of the
condensate, the creation and annihilation operators for
the field ϕ also evolve, and particle creation occurs.
Now we consider an observer who moves along his
or her outgoing geodesic with proper time λ, crossing
the horizon at λ = 0. Hereafter, we term the observer
geodesic observer. If we assume that the horizon is lo-
cated at r = rH , the proper time λ is related to the coor-
dinate v there via λ ≈ −λ0e−
v, where cH ≡ cs(rH),
α ≡ 2cH∂r(v0 − cs)|r=rH and λ0 is a constant. The ingo-
ing mode functions ϕω = e
−iωv have λ-dependence near
the horizon as
ϕω ≈ exp
ln(−λ)
. (A3)
Initially, the state is the vacuum for the static observer
and therefore the geodesic observer would see no exci-
tation at short distance, because there will be no much
higher positive frequency excitations than those deter-
mined by the time scale of the dynamical expansion of
the BEC. If we ignore the short distance cut-off deter-
mined by the healing length, or equivalently, if the lat-
ter condition in Eq.(3) is ignored, this λ-dependence of
the ingoing mode functions implies that the particle cre-
ation from the horizon into the inside of the condensate
has thermal spectrum with the temperature given by (9).
Furthermore, even if the short distance cut-off is taken
into account, it is known that the result does not change
in principle [25].
Therefore, the particle emission from horizon will oc-
cur in the case of expanding condensate, even where the
subsonic region is inside of the horizon.
[1] S.W. Hawking, Nature 248, 30 (1974); S.W. Hawking,
Commun. Math. Phys. 43, 199 (1975).
[2] Artificial Black Holes, edited by M. Novello, M. Visser,
and G. Volovik (World Scientific, 2002).
[3] C. Barcelo, S. Liberati and M. Visser, Living Rev. Rel.
8, 12 (2005)
[4] W. G. Unruh, Phys. Rev. Lett. 46, 1351 (1981).
[5] M. H. Anderson, J. R. Ensher, M. R. Matthews,
C. E. Wieman, and E.A. Cornell, Science 269, 198
(1995).
[6] K. B. Davis, M.-O. Mewes, M. R. Andrews, N. J. van
Druten, D. S. Durfee, D. M. Kurn, and W. Ketterle,
Phys. Rev. Lett. 75, 3969 (1995).
[7] L. J. Garay, J. R. Anglin, J. I. Cirac and P. Zoller, Phys.
Rev. Lett. 85, 4643 (2000)
[8] L. J. Garay, J. R. Anglin, J. I. Cirac and P. Zoller, Phys.
Rev. A 63, 023611 (2001)
[9] C. Barcelo, S. Liberati and M. Visser, Class. Quantum
Grav. 18, 1137 (2001).
[10] S. Inouye, M. R. Andrews, J. Stenger, H.-J. Miesner,
D. M. Stamper-Kurn, and W. Ketterle, Nature 392, 151
(1998); P. Courteille, R.S. Freeland, D.J. Heinzen, F.A.
van Abeelen, and B.J. Verhaar, Phys. Rev. Lett. 81, 69
(1998); J.L. Roberts, N.R. Claussen, J.P. Burke, C.H.
Greene, E.A. Cornell, and C. E. Wieman, Phys. Rev.
Lett. 81, 5109 (1998).
[11] E. A. Donley, N. R. Claussen, S. L. Cornish,
J. L. Roberts, E. A. Cornell, and C. E. Wieman, Nature
412, 295 (2001).
[12] E. A. Calzetta and B. L. Hu, Phys. Rev. A 68, 043625
(2003)
[13] E. A. Calzetta and B. L. Hu, cond-mat/0208569
[14] C. Barcelo, S. Liberati and M. Visser, Int.J.Mod. Phys.
D 12 1641 (2003).
[15] C. Barcelo, S. Liberati and M. Visser, Phys. Rev. A 68
053613 (2003).
[16] P. O. Fedichev and U. R. Fischer, Phys. Rev. Lett. 91,
240407 (2003).
[17] P. O. Fedichev and U. R. Fischer, Phys. Rev. A. 69,
033602 (2004).
[18] P. O. Fedichev and U. R. Fischer, Phys. Rev. D 69,
064021 (2004).
[19] U. R. Fischer and R. Schützhold, Phys. Rev. A 70,
063615 (2004).
[20] J. E. Lidsey, Class. Quantum Grav. 21, 777 (2004).
[21] S. E. C. Weinfurtner, gr-qc/0404063.
[22] Yu. Kagan, A.E. Muryshev, and G.V. Shlyapnikov, Phys.
Rev. Lett. 81 933 (1998).
[23] H. Saito and M. Ueda, Phys. Rev. A 65, 033624 (2002).
[24] For derivation details of the particle creation, see, for
example, T. Jacobson, gr-qc/0308048.
[25] S. Corley and T. Jacobson, Phys. Rev. D 54, 1568 (1996).
[26] U. Leonhardt, T. Kiss and P. Öhberg, J. Opt. B 5 S 42
(2003).
[27] U. Leonhardt, T. Kiss and P. Öhberg, Phys. Rev. A 67,
033602 (2003)
http://arxiv.org/abs/cond-mat/0208569
http://arxiv.org/abs/gr-qc/0404063
http://arxiv.org/abs/gr-qc/0308048
ABSTRACT
  We propose a simple experiment to create a sonic horizon in isotropically
trapped cold atoms within currently available experimental techniques.
Numerical simulation of the Gross-Pitaevskii equation shows that the sonic
horizon should appear by making the condensate expand. The expansion is
triggered by changing the interaction which can be controlled by the Feshbach
resonance in real experiments. The sonic horizon is shown to be quasi-static
for sufficiently strong interaction or large number of atoms. The
characteristic temperature that is associated with particle emission from the
horizon, which corresponds to the Hawking temperature in an ideal situation, is
estimated to be a few nK.

<|endoftext|><|startoftext|>
Zero-temperature resistive transition in Josephson-junction arrays at irrational
frustration
Enzo Granato
Laboratório Associado de Sensores e Materiais,
Instituto Nacional de Pesquisas Espaciais,
12245-970 São José dos Campos, SP Brazil
We use a driven Monte Carlo dynamics in the phase representation to determine the linear re-
sistivity and current-voltage scaling of a two-dimensional Josephson-junction array at an irrational
flux quantum per plaquette. The results are consistent with a phase-coherence transition scenario
where the critical temperature vanishes. The linear resistivity is nonzero at any finite tempera-
tures but nonlinear behavior sets in at a temperature-dependent crossover current determined by
the thermal critical exponent. From a dynamic scaling analysis we determine this critical exponent
and the thermally activated behavior of the linear resistivity. The results are in agreement with
earlier calculations using the resistively shunted-junction model for the dynamics of the array. The
linear resistivity behavior is consistent with some experimental results on arrays of superconducting
grains but not on wire networks, which we argue have been obtained in a current regime above the
crossover current.
PACS numbers: 74.81.Fa, 74.25.Qt, 75.10.Nr
Most theoretical investigations of the vortex-glass
phase in superconductors have considered model systems
where there is a combined effect of quenched disorder and
frustration1. However, in artificial Josephson-junction
arrays, frustration without disorder can in principle be
introduced by applying an external magnetic field on
a perfect periodic array of weakly coupled supercon-
ducting grains2,3,4 and similarly on superconducting wire
networks5,6. The frustration parameter f , the number of
flux quantum per plaquette, is given by f = φ/φo, the
ratio of the magnetic flux through a plaquette φ to the su-
perconducting flux quantum φo = hc/2e. It can be tuned
by varying the strength of the external field. Frustration
effects can be viewed as resulting from a competition be-
tween the underlying periodic pinning potential of the
array and the natural periodicity of the vortex lattice7.
At a rational value of f , the ground state is a commensu-
rate pinned vortex lattice leading to discrete symmetries
in addition to the continuous U(1) symmetry of the su-
perconducting order parameter. The resistive transition
is only reasonably well understood for simple rational
values of f .
At irrational values of f , the resistive behavior is much
less understood since the vortex lattice is now incom-
mensurate with the periodic array. In early Monte Carlo
(MC) simulations8 the ground state was found to con-
sist of a disordered vortex pattern lacking long range or-
der which could be regarded as a some sort of vortex-
glass state without quenched disorder. Glassy-like be-
havior was indeed observed in these simulations suggest-
ing a possible superconducting (vortex-glass) transition
at finite temperatures. However, some arguments also
suggested that the critical temperature should vanish7,9.
Simulations of the current-voltage scaling using the resis-
tively shunted-junction model for the dynamics of the ar-
ray found that the behavior was consistent with an equi-
librium resistive transition where the critical temperature
vanishes10, similar to the resistive transition described by
the the gauge-glass model in two dimensions1,11, but with
different values for the correlation-length critical expo-
nent ν. The linear resistivity is nonzero at any finite tem-
peratures but nonlinear behavior sets in at a crossover
current with a temperature dependence determined by
the exponent ν. This zero-temperature transition leads
to slow relaxation dynamics where the correlation length
diverges as a power law and the relaxation time diverges
exponentially as the temperature vanishes.
Simulations of the relaxation dynamics12 found a be-
havior analogous to relaxation in supercooled liquids with
a characteristic dynamic crossover temperature rather
than an equilibrium transition temperature, which is not
inconsistent with the zero-temperature transition sce-
nario. On the other hand, a systematic study by MC
simulations13 of a sequence of rational values of f con-
verging to the irrational frustration, using the vortex rep-
resentation, found two phase transitions at finite temper-
atures, a vortex-order transition weakly dependent on
f and a vortex pinning transition at much lower tem-
peratures varying with f , which should correspond to
the resistive transition. These results are in qualitative
agreement with MC simulations using the phase repre-
sentation of the same model14 but different ground states
were found.
More recently, MC simulations for the the specific
heat and relaxation dynamics found an intrinsic finite-
size effect15. The corresponding scaling analysis sug-
gested a zero-temperature transition with a critical ex-
ponent ν consistent with the value obtained initially
from current-voltage scaling10. However, a study of
the low-temperature configurations for frustrations close
the irrational value by MC simulations in the vortex
representation16, find two phase transitions consistent
with earlier work13.
On the experimental side, some results on arrays of su-
http://arxiv.org/abs/0704.1097v1
perconducting grains at irrational frustration2,3 are con-
sistent with the scenario of the zero-temperature resis-
tive transition but on wire networks5,6, resistivity scaling
showed evidence of a transition at finite temperature. Re-
cently, resistivity scaling suggesting a finite temperature
transition was also observed in arrays of superconducting
grains4.
In view of these conflicting results, it seems useful to
further investigate the current-voltage scaling for the ar-
ray at irrational frustration by studying both the non-
linear and linear resistivity with an improved method17
taking into account the long relaxation times. In fact, as
found recently, current-voltage scaling turned out to be
quite reliable in determining the phase-coherence transi-
tion even for a model with quenched disorder, such as the
three-dimensional XY-spin glass model17,18. The main
question is therefore, if the array at irrational frustration
displays an equilibrium phase-coherence transition at a
nonzero critical temperature into a state with vanishing
linear resistivity or its critical temperature vanishes and
the linear resistivity is finite at nonzero temperatures.
In this work, we investigate the resistivity scaling of
Josephson-junction arrays at a irrational frustration f =
5)/2, a golden irrational, using a driven MC dy-
namics in the phase representation introduced recently17.
The results are consistent with a phase-coherence tran-
sition scenario where the critical temperature vanishes,
Tc = 0. The linear resistivity is finite at nonzero temper-
atures but nonlinear behavior sets in at a temperature-
dependent crossover current determined by the thermal
critical exponent ν. The results agree with earlier simula-
tions using the resistively shunted-junction model for the
dynamics of the array10. However, with the present MC
method we are able to reach much lower temperatures
and current densities, improving the analysis of resistiv-
ity scaling and the estimate of the critical exponent ν. We
also argue that the finite-temperature transition found
in resistivity measurements on wire networks5,6have been
obtained in a current regime above the crossover current.
We consider a two-dimensional Josephson-junction
square array described by the Hamiltonian
H = −Jo
cos(θi − θj −Aij)− J
(θi − θi+x) (1)
The first term gives the Josephson-coupling energy be-
tween nearest neighbor grains where line integral of the
vector potential Aij is constrained to
ij Aij = 2πf
around each plaquette. The second term represents the
effects of an external driving current density J applied
in the x direction. When J 6= 0, the total energy is un-
bounded and the system is out of equilibrium. The lower-
energy minima occur at phase differences θi−θi+x which
increases with time t, leading to a net phase slippage
rate proportional to < d(θi − θi+x)/dt >, corresponding
to the voltage Vi,i+x. We take the frustration parameter
f equals an irrational number, f = (3−
5)/2, related to
the Golden Ratio Φ = (1+
5)/2 as f = 1− 1/Φ. In the
numerical simulations we use periodic (fluctuating twist)
boundary conditions on lattices of linear sizes L and cor-
responding rational approximations Φ = Fn+1/Fn, where
Fn are Fibonacci numbers (13, 21, 34, 55), with L = Fn.
To study the current-voltage scaling, we use a driven
MC dynamics method17. The time dependence is ob-
tained by identifying the MC time as the real time t and
we set the unit of time dt = 1, corresponding to a com-
plete MC pass through the lattice. Periodic (fluctuat-
ing twist) boundary conditions are used19. This bound-
ary condition adds new dynamical variables, uα (α = x
and y), corresponding to a uniform phase twist between
nearest-neighbor sites along the principal axis directions
x̂ and ŷ. A MC step consists of an attempt to change the
local phase θi and the phase twists uα by fixed amounts,
using the Metropolis algorithm. If the change in en-
ergy is ∆H , the trial move is accepted with probability
min{1, exp(−∆H/kT )}. The external current density J
in Eq. 1 biases these changes, leading to a net voltage
(phase slippage rate) across the system given by
(θ1,j − θL+1,j − uxL), (2)
in arbitrary units. The main advantage of this MC
method compared with the Langevin dynamics used
earlier10 is that in principle much longer time scales can
be accessed which allows one to obtain reliable data at
much lower temperatures and current densities. We have
determined the electric field E = V/L and nonlinear re-
sistivity ρ = E/J as a function of the driving current
density J for different temperatures T and different sys-
tem sizes. We used typically 2×105 MC steps to reach the
nonequilibrium steady state at finite current and equal
time steps to perform time averages with and additional
average over 4− 6 independent runs.
We have also determined the linear resistivity, ρL =
limJ−>0E/J , from equilibrium MC simulations. As any
transport coefficient, this quantity can be obtained from
equilibrium fluctuations and therefore can be calculated
in absence of an imposing driving current (J = 0). From
Kubo formula, the linear resistivity (resistance in two
dimensions) is given in terms of the equilibrium voltage
autocorrelation as
dt〈V (t)V (0)〉. (3)
Since the total voltage V is related to the phase difference
across the system ∆θ(t) by V = d∆θ(t)/dt, we find more
convenient to determine ρL from the long-time equilib-
rium fluctuations11 of ∆θ(t) as
〈(∆θ(t) −∆θ(0))2〉, (4)
which is valid for sufficiently long times t. To insure that
only equilibrium fluctuations are considered, the calcu-
lations were performed in two steps. First, simulations
using the exchange MC method (parallel tempering)20
 T=0.3
 T=0.275
 T=0.25
 T=0.225
 T=0.20
 T=0.175
 T=0.15
FIG. 1: Nonlinear resistivity E/J at different temperatures
T for system size L = 55.
were used to obtain equilibrium configurations of the sys-
tems at different temperatures21. This method is known
to reduce significantly the critical slowing down near the
transition allowing fully equilibration in finite small sys-
tem sizes. These configurations were then used as ini-
tial states for the driven MC dynamics process described
above, with J = 0, in order to obtain the ρL. The ini-
tial states are similar to the low-temperature states ob-
tained previously13,16 including thermal excitations. In
the parallel-tempering method20, many replicas of the
system with different temperatures are simulated simul-
taneously and the corresponding configurations are al-
lowed to be exchanged with a probability satisfying de-
tailed balance. The equilibration time can be measured
as the average number of MC steps required for each
replica to travel over the whole temperature range. We
used typically 4× 106 (parallel tempering) MC steps for
equilibration which is much larger than the estimated
equilibration time in systems with up to 100 replicas.
Subsequent MC simulations for the linear resistivity ob-
tained from Eq. 4 were performed using 2 × 103 time
averages for 2× 105 MC steps which is much larger than
the equilibrium relaxation time.
Fig. 1a shows the nonlinear resistivity E/J as a func-
tion of temperature for the largest system size. At small
current densities J , the nonlinear resistivity E/J tends
to a constant value, corresponding to the linear resis-
tivity ρL, which decreases rapidly with decreasing tem-
perature. For increasing J , the resistivity cross over to
a nonlinear behavior at a characteristic current density
Jnl, which also decreases with decreasing temperature.
To verify that the nonzero values approached at low cur-
rents in Fig. 1 correspond indeed to the linear resistivity
ρL, we show in Fig. 2 the temperature dependence of
ρL obtained without current bias from Eq.(4) for dif-
ferent system sizes. ρL decreases with system size but
approaches nonzero values for the largest system size.
These values are in agreement with the corresponding
values at the lowest current in Fig. 1. Since the be-
3 4 5 6 7
 L=13
 L=21
 L=34
 L=55
FIG. 2: Temperature dependence of the linear resistivity for
different system sizes.
havior of the ρL for the largest system size on the log-
linear plot in Fig. 2 is a straight line, it indicates an
activated Arrhenius behavior, where the linear resistiv-
ity decreases exponentially with the inverse of tempera-
ture with a temperature-independent energy barrier, es-
timated as Eb ∼ 1.07. Such activated behavior suggests
that the linear resistivity can be very small at low tem-
peratures but nevertheless remains finite for all tempera-
tures T > 0 and therefore there is no resistive transition
at finite temperatures. However, as will be described
below, the system behaves as if a resistive transition
occurs at zero temperature, corresponding to a phase-
coherence transition where the critical temperature van-
ishes, Tc = 0.
The behavior of the linear resistivity can be related to
the equilibrium relaxation time for phase fluctuations.
Since the voltage is the rate of change of the phase,
a nonzero ρL requires measurements over a time scale
τ ∝ 1/ρL, corresponding to the relaxation time for phase
fluctuations. Thus, we expect that τ should also have
an activated behavior, increasing exponentially with the
inverse of temperature. To verify this behavior, we have
in addition calculated the relaxation time τ for different
temperatures from the autocorrelation function of phase
fluctuations C(t) as
C(0)2
dtC(t) (5)
using MC simulations with J = 0. The starting con-
figurations were taken from equilibrium configurations
obtained21 with the parallel tempering MC method20.
The results shown on the log-linear plot in Fig. 3 are in-
deed consistent with an activated behavior of τ with an
energy barrier Eb = 1.18 in reasonable agreement with
the value obtained for the linear resistivity in Fig. 2.
The behavior in Figs. 1, 2 and 3 has the main features
associated with a phase transition that only occurs at
zero temperature, Tc = 0, similar to the two-dimensional
gauge glass model of disordered superconductors1,11. In
this case the correlation length ξ is finite for T > 0 but it
increases with decreasing temperature as ξ ∝ T−ν , with
ν a critical exponent. The divergent correlation length
near the transition determines both the linear an nonlin-
ear resistivity behavior leading to current-voltage scaling
sufficiently close to the critical temperature and suffi-
ciently small driving current. To understand in detail
the behavior of the linear ρL and nonlinear resistivity ρ
we need a scaling theory for the resistive behavior. If the
data satisfy such scaling behavior for different driving
currents and temperatures, the critical temperature and
critical exponents of the underlying equilibrium transi-
tion at J = 0 can then be determined from the best data
collapse. A detailed scaling theory has been described
in the context of the current-voltage characteristics of
vortex-glass models1 but the arguments should also ap-
ply to the present case. The basic assumption is the
existence of a second order phase transition. Measurable
quantities should then scale with the diverging correla-
tion length ξ ∝ |T −Tc|−ν and relaxation time τ near the
critical point. The nonlinear resistivity E/J should then
satisfy the scaling form1
τ = g±(
), (6)
in two-dimensions, where g±(x) is a scaling function. The
+ and − signs correspond to T > Tc and T < Tc, respec-
tively. If Tc 6= 0, then to satisfy such scaling form, the
nonlinear resistivity curves on the log-log plot in Fig. 1
should have a positive curvature at small J , with E/J de-
creasing with decreasing J to a temperature dependent
value for T > Tc while for T < Tc, the curvature should
be negative, with E/J vanishing in the limit J → 0. The
data in Fig. 1 do not show a change in curvature even for
the lowest temperature, already suggesting the possibil-
ity of a resistive transition at much lower temperatures
or at Tc = 0. However, a full scaling analysis of the data
is required to show that a transition indeed occur with
Tc = 0. If Tc = 0, then the correlation length ξ ∝ T−ν
and the linear resistivity ρL are both finite at T > 0.
One can then consider the behavior of the dimensionless
ratio E/JρL which should satisfy the scaling form
T 1+ν
) (7)
where g is a scaling function with g(0) = 1. A crossover
from linear behavior, when g(x) ∼ 1, to nonlinear behav-
ior, when g(x) >> 1, occurs when x ∼ 1 which leads to a
characteristic current density at which nonlinear behav-
ior sets in decreasing with temperatures as a power law,
Jnl ∝ T/ξ ∝ T 1+ν . The scaling form in Eq. (7)contains a
single critical exponent ν and does not depend on the par-
ticular form assumed for the divergence of the relaxation
time τ . However, for sufficiently low temperatures, the
relaxation process is expected to be thermally activated1
with τ ∝ exp(Eb/kT ). This corresponds formally to a
dynamic exponent z → ∞, if power-law behavior is as-
sumed for the relaxation time τ ∝ ξz . From the scal-
ing form of Eq.(6), the linear resistivity should scale as
3 4 5 6 7
              τ �  e
              E
=1.18(2)
FIG. 3: Temperature dependence of the relaxation time τ of
phase fluctuations for system size L = 55.
 ν = 1.4
 T=0.30
 T=0.275
 T=0.25
 T=0.225
 T=0.20
 T=0.175
 T=0.15
J / T 
 1 + ν
FIG. 4: Scaling plot of the nonlinear resistivity in Fig. 1 for
ν = 1.4.
ρL ∝ 1/τ and therefore it is also expected to have an
activated behavior, τ ∝ exp(−Eb/kT ). In general, the
energy barrier Eb also scales with the correlation length
as Eb ∝ ξψ , which leads to a temperature-dependent bar-
rier Eb ∝ T−ψν . A pure Arrhenius behavior corresponds
to ψ = 0. The behavior of the nonlinear and linear resis-
tivity in Figs 1, 2 and the relaxation time in Fig. 3 are
quite consistent with these predictions from the scaling
theory of a zero-temperature transition.
If there is a zero-temperature transition, as suggested
by the behaviors in Figs. 1, 2 and 3, then the data for
the nonlinear resistivity should satisfy the scaling form
of Eq.(7), if finite-size effects are negligible, and the best
data collapse provides an estimate of the critical expo-
nent ν. We expect that finite-size effects are negligible
for the largest system size L = 55 in Fig. 1 since at
this length scale the behavior of the linear resistivity is
roughly independent of the size as can be seen from Fig.
2. Fig. 4 shows that indeed the data for the largest
system size satisfy this scaling form with ν ∼ 1.4± 0.2.
The nonlinear resistivity should also satisfy the ex-
pected finite-size behavior in smaller system sizes when
the correlation length ξ approaches the system size L.
According to finite-size scaling, the scaling function in
Eq. (7), should also depend on the dimensionless ratio
L/ξ and so to account for finite-size effects the nonlinear
resistivity should satisfy the scaling form
= ḡ(
T 1+ν
, L1/νT ). (8)
The scaling analysis of the whole nonlinear resistivity
data is rather complicated in this case since the scal-
ing function depends on two variables. To simplify
the analysis22 we first estimate the temperature and
finite-size behavior of the crossover current density Jnl
where nonlinear behavior sets in as the value of J where
E/JρL = C, a constant. Then, from Eq. (8), the finite-
size behavior of Jnl can be expressed in the scaling form
(1+ν)/ν = ¯̄g(L1/νT ). (9)
The best data collapse according to the scaling in Eq. (9)
provides an alternative estimate of the critical exponent
ν. Fig. 5 shows that indeed the values of Jnl for different
system sizes and temperatures satisfy this scaling form
with ν ∼ 1.4, in agreement with the estimate obtained
for the largest system in Fig. 4 size using Eq. (7).
In addition to the standard finite-size effects, which oc-
cur when the correlation length is comparable to the sys-
tem size, already taken into account in the scaling form
of Eq. (8), there are also intrinsic finite-size effects15
resulting from the rational approximations used for the
irrational value of f . Since we use rational approxima-
tions Φ = Fn+1/Fn, where Fn are Fibonacci numbers
(13, 21, 34, 55), with the system size set to L = Fn, this
amounts essentially to have different values of the frus-
tration, fL = 1 − 1/Φ, for different system sizes which
will only converge to the correct value f = (3 −
in the infinite-size limit. We have assumed that such ef-
fects are negligible in the above scaling analysis but they
should affect our estimate of the critical exponent ν. In
principle, this intrinsic effect could be taken into account
within the zero-temperature transition scenario by allow-
ing for a size-dependent critical temperature Tc(L) in the
scaling analysis15. Alternatively, we could regard it as a
crossover from the critical behavior at the true irrational
frustration (infinite-size limit) to a phase with an addi-
tional small frustration δf = fL − f which should act as
a relevant perturbation. In this case, the scaling func-
tion in Eq. (7) should also depend on the dimensionless
ratio ξ2δf and again a scaling analysis with more than
one variable is required. However, our present numeri-
cal data is not sufficiently accurate to separate this effect
from standard finite-size effects.
The present results for the linear and nonlinear resis-
tivity of the array at irrational frustration obtained by
the driven MC dynamics agree with earlier simulations of
the current-voltage scaling using the resistively shunted-
junction model for the dynamics of the array10, where a
zero-temperature resistive transition was suggested and
1 2 3 4 5 6
ν = 1.4
 L=13
 L=21
 L=34
 L=55J n
1 / ν
FIG. 5: Finite-size scaling plot of the crossover current den-
sity Jnl with ν = 1.4, for different system sizes L.
the critical exponent was estimated as ν = 0.9(2). Al-
though the later model is expected to be a more realistic
description for the dynamics of the array, the value of the
static critical exponent ν should be the same for both
models. In general, the dynamic exponent z may de-
pend on the particular dynamics but since the relaxation
time τ is found to diverge exponentially for decreasing
temperature it corresponds to z → ∞ for both dynam-
ics. The present estimate of ν = 1.4(2), however, should
be more reliable since it considers much lower tempera-
tures and current densities and larger system size. In-
terestingly, similar behavior for the resistive transition
has been found both numerically and experimentally for
two-dimensional disordered superconductors in a mag-
netic field described as a gauge-glass model1,11 but with
different value for critical exponent ν ∼ 2. It should
be noted however that the actual ground state at irra-
tional frustration (without disorder) can be quite differ-
ent, as the self similar structure which has already been
proposed5,23. As would be expected, the different nature
of ground state leads to the different values of the critical
exponent ν.
Although the above scaling analysis is consistent with a
zero temperature transition, on pure numerical grounds
the data in Figs 1 and 2 can not complete ruled out
a vortex-order or a phase-coherence transition at tem-
peratures much lower than T = 0.15. In fact, phase-
coherence transitions were found in MC simulations using
the Coulomb-gas presentation13 at temperatures as low
as T ∼ 0.03 for the sequence of rational approximations
fL of the irrational f but since they show considerable
variation with fL it is not clear if it will remain nonzero
in the large size limit. However, the lowest temperature
in Figs 1 and 2 is already much smaller than the apparent
freezing temperature Tf ∼ 0.25 observed in earlier MC
simulations8. Below Tf , a nonzero Edwards-Anderson
order parameter q(t) =< ~Si >
2, was observed, where
~S = (cos θ, sin θ) and the average was taken over the sim-
ulation times t. Although this could suggest a diverging
relaxation time τ ∝
q(t)dt near a finite temperature
Tc ∼ Tf , such long relaxation time can also result from a
zero-temperature transition (Tc = 0) as suggested by the
above scaling analysis since in this case τ diverges expo-
nentially with decreasing temperature, τ ∝ exp(Eb/kT ),
as shown in Fig. 3. For low enough temperatures, τ will
eventually be larger than any simulation or experimental
measuring time scale and an apparent (time dependent)
freezing transition could occur depending on the partic-
ular dynamics and system size.
Some experimental results on arrays of superconduct-
ing grains at irrational frustration2,3 are consistent with
the scenario of a zero-temperature resistive transition
since even at the lowest temperatures a zero-resistance
state was not observed in these experiments. On the
other hand, current-voltage scaling analysis of experi-
mental data on wire networks5,6 was found to be con-
sistent with a resistive transition at finite temperature.
We note, however, that although the equilibrium behav-
ior of wire networks can be described by the same model
of Eq. 1, the nonlinear dynamical behavior may be quite
different since the nodes of the network are connected
by continuous superconducting wires, instead of weak
links, leading to additional larger energy barriers for vor-
tex motion, not included in the model, and consequently
larger phase-coherence length ξ and relaxation time τ
when compared with weak links24. In this case, the
characteristic crossover current to the linear resistivity
regime at low temperatures due to thermal fluctuations,
Jnl ∝ kT/ξ, expected in the zero-temperature transition
scenario, may only occur at current scales too small to
be detected experimentally. Thus the resistive behavior
is observed in a current regime at higher currents where
it follows the mean-field theory result25 where a vortex-
glass transition is possible at finite temperatures. How-
ever, the zero-temperature resistive could in principle be
observed in specially prepared wire networks in the weak
coupling regime where the additional energy barrier for
vortex motion can be minimized26. Other effects, such
as weak disorder, which is inevitably present in both ex-
perimental systems, should also be considered. It could
provide a possible explanation for the finite-temperature
resistive transition observed recently in arrays of super-
conducting grains4.
In conclusion, we have investigated the resistivity scal-
ing of Josephson-junction arrays at a irrational frustra-
tion using a driven MC dynamics17. The results are con-
sistent with a phase-coherence transition scenario where
the critical temperature vanishes, Tc = 0. The linear
resistivity is finite at nonzero temperatures but nonlin-
ear behavior sets in at a crossover current determined
by the thermal critical exponent ν. The results agree
with earlier simulations using the resistively shunted-
junction model for the dynamics of the array10 and more
recent MC simulations taking into account the intrinsic
finite-size effect15. Although we have only studied the
array at a particular value of irrational frustration, the
golden mean, we believe that the conclusion of a zero-
temperature phase-coherence transition should be valid
for all irrationals but possibly with different values of
the thermal critical exponent ν. The main advantage of
studying the golden mean value is that it is considered
the farthest from the low-order rationals and so intrinsic
finite-effects should be smaller. However, other irrational
frustrations have also been studied numerically15,23 and
experimentally5. The resistive behavior probes mainly
the phase-coherence of the system and since we find that
phase coherence is only attained at zero temperature, we
can not address directly the question of the existence of
a vortex-order transition at finite temperatures. In fact,
vortex order does not require long-range phase coherence.
Therefore, a vortex-order transition at zero temperature
or at finite temperature is consistent with the present
work. However, in view of the results for the supercooled
relaxation12 suggesting an analogy to structural glasses
such transition may be expected at finite temperature
and in fact is consistent with MC simulations indicating
a first-order vortex transition13,14,16. Thus, the inter-
esting possibility arises where the array undergoes two
transitions for decreasing temperature, a finite-resistance
vortex-order transition at finite temperature and a su-
perconducting transition only at zero temperature. This
phase transition scenario and the predicted behavior of
the linear and nonlinear resistivity provides an interesting
experimental signature for a Josephson-junction array at
irrational frustration.
This work was supported by FAPESP (grant 03/00541-
0) and computer facilities from CENAPAD-SP.
1 R.A. Hyman, M. Wallin, M.P.A. Fisher, S.M. Girvin, and
A.P. Young, Phys. Rev. B 51, 15304 (1995); D.S. Fisher,
M.P.A. Fisher, and D.A. Huse, Phys. Rev. B 43 130 (1991).
2 J.P. Carini, Phys. Rev. B 38, 63 (1988).
3 H. S. J. Zant, H. A. Rijken, and J. E. Mooij, J. Low Temp.
Phys. 82, 67 (1991).
4 I.C Baek, Y.J.Y Yu, and M.Y Choi, Phys. Rev. B 69,
172501 (2004).
5 F.Yu, N.E. Israeloff, A.M. Goldman, and R. Bojko, Phys.
Rev. Lett. 68, 2535 (1992).
6 X.S. Ling, H.J. Lezec, M.J. Higgins, J.S. Tsai, J. Fujita, H.
Numata, Y. Nakamara, Y. Ochiai, C. Tang, P.M. Chaikin,
and S. Bhattacharya, Phys. Rev. Lett. 76, 2989 (1996).
7 S. Teitel and C. Jayaprakash, Phys. Rev. Lett. 51, 1999
(1983).
8 T. Halsey, Phys. Rev. Lett. 55, 1018 (1985).
9 M.Y. Choi and D. Stroud, Phys. Rev. B 35, 7109 (1987).
10 E. Granato, Phys. Rev. B 54, 9655 (1996).
11 E. Granato, Phys. Rev. B 58, 11161 (1998).
12 B. Kim and S.J. Lee, Phys. Rev. Lett. 78, 3709 (1997).
13 P. Gupta, S. Teitel, M.J.P. Gingras, Phys. Rev. Lett. 80,
105 (1998).
14 C. Denniston and C. Tang, Phys. Rev. B 60, 3163 (1999).
15 S.Y. Park, M.Y. Choi, B.J. Kim, G.S. Jeon, and J.S.
Chung, Phys. Rev. Lett. 85, 3484 (2000).
16 S.J. Lee, J.R. Lee, and B. Kim, Phys. Rev. Lett. 88, 025701
(2002).
17 E. Granato, Phys. Rev. B 69, 144203 (2004).
18 L.W. Lee and A.P. Young, Phys. Rev. Lett. 22, 227203
(2003).
19 W.M. Saslow, M. Gabay, and W.-M. Zhang, Phys. Rev.
Lett. 68, 3627 (1992).
20 K. Hukushima and K. Nemoto, J. Phys. Soc. Jpn. 65, 1604
(1996); E. Marinari and G. Parisi, Europhys. Lett. 19, 451
(1992).
21 E. Granato, unpublished.
22 C. Wengel and A.P. Young, Phys. Rev. B 56, 5918 (1997)
23 M.R. Kolahchi, Phys. Rev. B 59, 9569 (1999).
24 H. S. J. Zant, M.N. Webster, J. Romijn, and J.E. Mooij,
Phys. Rev. B 42, 2647 (1990).
25 G. Parisi, J. Phys. A 27, 7555 (1994).
26 M. Giroud, O. Buisson, Y.Y. Wang, and B. Pannetier, J.
Low Temp. Phys. 87, 683 (1992).
ABSTRACT
  We use a driven Monte Carlo dynamics in the phase representation to determine
the linear resistivity and current-voltage scaling of a two-dimensional
Josephson-junction array at an irrational flux quantum per plaquette. The
results are consistent with a phase-coherence transition scenario where the
critical temperature vanishes. The linear resistivity is nonzero at any finite
temperatures but nonlinear behavior sets in at a temperature-dependent
crossover current determined by the thermal critical exponent. From a dynamic
scaling analysis we determine this critical exponent and the thermally
activated behavior of the linear resistivity. The results are in agreement with
earlier calculations using the resistively shunted-junction model for the
dynamics of the array. The linear resistivity behavior is consistent with some
experimental results on arrays of superconducting grains but not on wire
networks, which we argue have been obtained in a current regime above the
crossover current.

<|endoftext|><|startoftext|>
Introduction
Low and high mass stars (M∗>8 M⊙) are formed in different
regimes. While low mass stars can be formed isolated or in loose
associations, high mass stars are always found in tight clusters.
Intermediate-mass young stellar objects (IMs) (protostars and
Herbig Ae/Be [HAEBE] stars with M∗ ∼ 2 - 8 M⊙) constitute the
link between low- and high-mass stars. In particular the transi-
tion between the low density groups around T Tauri stars and the
dense clusters around massive stars occurs in these objects. Testi
et al. (1998,1999) studied the clustering around HAEBE stars
using optical and near-infrared (NIR) images and concluded that
transition occurs smoothly from Ae to Be stars. Thus, these stars
are key objects to study the onset of clustering.
Thus far, clustering has only been studied at infrared and op-
tical wavelengths because of the limited spatial resolution and
sensitivity of the mm telescopes. Thus, the earliest stages of
the cluster formation were hidden to the observers. The sub-
arcsecond angular resolution provided by the new A configu-
ration of the PdBI allows, for the first time, to study clustering
at mm wavelengths with a similar sensitivity and spatial resolu-
tion to the NIR studies. In this Letter, we present interferomet-
ric continuum observations of the IM protostars Serpens-FIRS 1
(precursor of a Ae star) and CB 3 (precursor of a Be star) aimed
Send offprint requests to: A. Fuente
to study the clustering phenomena in the early Class 0 phase. We
also use the data at highest spatial resolution towards IC 1396 N
reported in this special issue by Neri et al. (Paper II, hereafter).
1.1. Serpens-FIRS 1
Serpens-FIRS 1 is a 46 L⊙ Class 0 source located in a very active
star forming region. Previous mid-IR and NIR studies show that
the population of YSOs is strongly clustered, with the Class I
sources more clustered than the Class II ones (Kaas et al. 2004).
The sub-clusters of Class I sources are located in a NW-SE
oriented ridge following the distribution of dense cores in the
molecular cloud with a subclustering spatial scale of 0.12 pc (see
Fig. 1). The Class II stars are located surrounding the molecular
cores with a subclustering spatial scale of 0.25 pc. Adopting a
distance of 310 pc, the YSOs density in the sub-clusters ranges
from 360–780 pc−2. Several high angular resolution mm studies
have been made in the Serpens molecular cloud (Testi & Sargent
1998, William & Myers 1999, Hogerheijde et al. 1999, Testi et
al. 2000). We have imaged at higher spatial resolution a region
of 0.04 pc around the intense mm-source FIRS 1.
http://arxiv.org/abs/0704.1098v1
2 A. Fuente et al.: Protostellar clusters in intermediate mass (IM) star forming regions
Table 1. Millimeter flux densities, sizes, spectral indexes and masses
Position Peak Gaussian width1 Int. Intensity Mass2 Size3 α4 Sensitivity5 Sampled area6
(mJy/beam) (”) (mJy) (M⊙) (AU) (M⊙) r(pc)
Serpens-FIRS 1
1.3mm 18:29:49.80 01:15:20.41 273(1) 0.50”×0.63” 357 0.1 65 1.57 0.01 0.02
3.3mm 18:29:49.80 01:15:20.41 63(0.5) 0.80”×1.73” 71 0.04
CB 3-1
1.3mm 00:28:42.60 56:42:01.11 20(1) 0.36”×0.48” 34 0.62 600 2.52 0.04 0.16
3.3mm 00:28:42.60 56:42:01.11 2.0(0.5) 0.88”×1.27” 2.9 0.32
CB 3-2
1.3mm 00:28:42.20 56:42:05.11 10(1) 0.31”×0.43” 13 0.24 330 1.87 0.04 0.16
3.3mm 00:28:42.20 56:42:05.11 2.1(0.5) 0.80”×1.00” 2.1 0.32
1 Half-power width of the fitted 2-D elliptical Gaussian
2 Mass estimated using the 1.3mm fluxes and assuming Td=100 K and κ1.3mm=0.01 g
−1 cm2
3 Deconvolved source size at 1.3mm
4 1.3mm/3.3mm spectral index
5 5×rms mass sensitivity derived from the 1.3mm image assuming Td=100 K and κ1.3mm=0.01 g
−1 cm2
6 Radius (HPBW/2) of the PdBI primary beam at the source distance
Fig. 1. Dust continuum mosaic (contours and grey scale) of the
Serpens main core as observed with the IRAM 30m telescope.
The location of the Class II (blue filled squares), flat (red crosses)
and Class I sources (red empty circles) is indicated (adapted
from Kaas et al. 2004). In the inset, we show the 3mm and
1.3mm (small inset) continuum images observed with the PdBI.
Note that only one compact core is detected in this region down
to a spatial scale of less than 100 AU. The dashed circle marks a
region of 0.2 pc radius around FIRS 1.
1.2. IC 1396 N
IC 1396 N is a ∼300 L⊙ source located at a distance of 750 pc
(Codella et al. 2001). A total population of ∼30 YSOs has been
found in this region (Getman et al. 2007, Nisini et al. 2001).
These YSOs present an elongated spatial distribution with an age
gradient towards the center of the Class I/0 system. The Class III
sources are located in the outer rim of the globule, the Class II
sources are congregated in the bright ionized rim and the Class
I/0 objects are located towards the dense molecular clump (see
Fig. 2). The average density of YSOs in the globule is ∼200 pc−2.
We have mapped a region of 0.1 pc around the Class 0/I system.
1.3. CB 3
CB 3 is a large globule (930 L⊙) located at 2.5 Kpc from the
Sun (Codella & Bachiller 1999). A strong submillimeter source
is observed in the central core (see Fig. 3 and Huard et al. 2000).
Deep NIR images of the region show ∼40 NIR sources, from
which at least 22 are very red, indicative of pre-main sequence
stars (Launhardt et al. 1998). Up to our knowledge, there are
no mid-IR and/or X-ray studies in this region. Then, the census
of YSOs is not complete in this IM source. We have mapped a
region of 0.32 pc around the submillimeter source.
2. Observations
The observations were made on January and February, 2006.
The spectral correlator was adjusted to cover the entire RF pass-
bands (580 MHz) for highest continuum sensitivity. The overall
flux scales for each epoch and for each frequency band were
set on 3C454.3 and MWC349 (for CB 3), and 1749+096 (for
Serpens–FIRS 1). The resulting continuum point source sensi-
tivities (5×rms) were estimated to 2.00 mJy at 237.571 GHz and
0.5 mJy at 90.250 GHz for CB 3 and 40.00 mJy at 237.571 GHz
and 7.0 mJy at 90.250 GHz for Serpens–FIRS 1. The corre-
sponding synthesized beams adopting uniform weighting were
0.4′′ × 0.3′′ at 237.571 GHz and 1.0′′ × 0.8′′ at 90.250 GHz
for CB 3 and 0.6′′ × 0.4′′ at 237.571 GHz and 1.7′′ × 0.7′′ at
90.250 GHz for Serpens–FIRS 1. (See Paper II for IC 1396 N.)
3. Results
In Table 1 we present the coordinates, sizes and mm fluxes of the
compact cores detected in Serpens–FIRS 1 and CB 3. The results
towards IC 1396 N are presented in Paper II. Only 1 mm-source
is detected in Serpens–FIRS 1 down to a separation of less than
100 AU. The other targets turned out to be multiple sources. We
have detected 2 mm-sources towards CB 3 and 4 mm-sources
towards IC 1396 N.
A. Fuente et al.: Protostellar clusters in intermediate mass (IM) star forming regions 3
Fig. 2. On the left, we show the 5′×5′ Spitzer IRAC 3.6 µm image towards IC 1396 N (adapted from Getman et al. 2007). The
location of the globule is marked by the green contour and the Class III (yellow triangles), Class II (red circles) and blue squares
(Class 0/I) sources are indicated. On the right, we show the 3mm (up) and 1.3mm (down) continuum images observed with the
PdBI. In the 3mm image we also indicate the Class III (black triangles), Class II (red circles) and Class 0/I (filled blue squares)
sources.
The 4 compact sources towards IC 1396 N are grouped in
2 sub-clusters separated by 0.05 pc which are spatially coinci-
dent with the sources named BIMA 2 and BIMA 3 by Beltrán
et al. (2002). The projected distance between these sub-clusters
is similar to that found by Hunter et al. (2007) between the mm
sub-clusters in the massive star forming region NGC 6336 I. This
distance is also similar to the distance between the stars form-
ing the Trapezium in Orion (from 5000 to 10000 AU). Thus it
is a typical distance between the IM and massive stars in the
same cloud. Our high angular resolution observations reveal that
BIMA 2 is itself composed of 3 compact cores embedded in a
more extended component (see Fig. 2). These 3 compact cores
are new mm detections and constitute the first sub-cluster of
Class 0 IM sources detected thus far.
In CB 3 we have detected 2 mm-sources separated by 0.06 pc
(see Table 1 and Fig. 3). These compact cores are new detec-
tions and the separation between them is similar to that between
BIMA 2 and BIMA 3 in IC 1396 N. In fact, the structure of the
globule CB 3 resembles much that of IC 1396 N but the angu-
lar resolution of our observations prevent us from resolving any
possible sub-cluster of compact cores in this more distant source.
Note that the masses of CB 3-1 and CB 3-2 are similar to that of
the sub-cluster BIMA 2 (Paper II).
The number of detections is limited by the sensitivity of
our observations. In Table 1 we show the point source mass
sensitivity assuming a dust temperature of 100 K (typical for
hot cores and circumstellar disks around luminous Be stars)
and κ1.3mm=0.01 g
−1 cm2 for each target. It is possible that we
miss a population of weak Class 0/I sources in CB 3 where
the mass sensitivity is poor (0.04 M⊙). However, the sensitiv-
ity in Serpens–FIRS 1 (0.01 M⊙) and IC1396 N (0.007 M⊙) is
good enough to detect disks around early Be stars that usually
have masses of ∼0.01 M⊙ (see e.g Fuente et al. 2003, 2006).
We should have also detected massive disks (∼0.1 M⊙) around
Herbig Ae and T Tauri stars although the dust temperature is
lower, Td=15–56 K (Natta et al. 2000). But there is still the
possibility of the existence of HAEBE or T Tauri stars with
weak circumstellar disks that are not detected in our mm im-
ages. Another possibility is that we are missing a population
of hot corinos (we refer as “hot corino” to the warm material
(∼100 K) around a low mass Class 0 protostar regardless of its
chemical composition) with masses below the values reported in
Table 1. Our sensitivity is good enough to detect a hot corino
similar to IRAS 16293–2422 A and B (L∼10 L⊙) at the dis-
tance of our sources (see Bottinelli et al. 2004). Thus the possi-
ble “missed” hot corinos should correspond to lower luminosity
protostars. Finally, we can be missing a population of dense and
cold cores. Assuming a dust temperature of 10 K, these compact
cold cores should have masses of less than 0.17, 0.12 and 0.7
M⊙ in Serpens–FIRS 1, IC 1396 N and CB 3 respectively. These
masses are not large enough to form new IM stars.
4 A. Fuente et al.: Protostellar clusters in intermediate mass (IM) star forming regions
4. Discussion
Testi et al. (1999) studied the clustering around a large sample of
HAEBE stars. In order to quantify the concept, they introduced
the parameter Nk, defined as the number of stars in a radius of
0.2 pc, the typical cluster radius. They showed that rich clus-
ters are only found around the most massive stars, although the
parameter Nk is highly variable. Some Be stars are born quite
isolated, while others have Nk >70. For our sources this number
is 22 (Launhardt et al. 1998, but the census is not complete), 29
(from Fig. 1) and 28 (Getman et al 2007; Nisini et al. 2001) in
CB3, Serpens and IC 1396 N respectively, where all previously
known YSOs (Class 0, I, II and III) in the regions are considered.
Our maps show 2 sources in CB 3 on a 0.3 pc scale, 1 source
in Serpens-FIRS 1 on a 0.04 pc scale, and 4 sources in IC 1396 N
on a 0.1 pc scale. Defining Nmm as the number of mm sources in
a radius of 0.2 pc, we can estimate Nmm from our observations
and provide a revised value for the total number of YSOs at this
scale. In Serpens our interferometric observations do not add any
new mm source to previous data. We have observed the most in-
tense mm clump in Fig, 1, the most likely to be a multiple source,
and only found 1 compact source. Based on the 30m map shown
in Fig. 1 and assuming that all the clumps host only one source
we estimate Nmm∼7 from a total of 29 YSOs. In CB 3, our data
add 2 new mm sources (Nmm=2) to the previous census of YSOs
based on NIR studies. In IC 1396 N, we estimate Nmm=4–16.
The upper limit has been calculated assuming a constant den-
sity of mm sources in the region. Usually, the Class 0/I stars
are not uniformly distributed in the clouds, but grouped in sub-
clusters that are coincident with the peak of dense cores. Thus
the value of Nmm is very likely close to 4 and we assumed this
number hereafter. Since BIMA 2 and BIMA 3 were previously
detected in the X-rays surveys by Getman et al. (2007), we only
add two new sources (due to the multiplicity of BIMA 2) to the
total number of YSOs in this region.
Summarizing, the total number of YSOs is now 29, 24 and
30 for Serpens–FIRS 1, CB 3 and IC 1396 N respectively. While
Serpens–FIRS 1 is an extraordinarily rich cluster compared with
the clusters around Ae stars reported by Testi et al. (1999), CB 3
and IC 1396 N do not seem to become one of the crowded
clusters (Nk∼70) detected by these authors around Be stars.
However, this conclusion might not be true. The interferometer
is only sensitive to dense and compact cores and provides a bi-
ased vision of the star forming regions. In fact our interferomet-
ric observations accounts for less than 1% of the total interstellar
mass in the studied globules, i.e., ∼ 10, 58 and 64 M⊙ are missed
in Serpens–FIRS 1, CB 3 and IC 1396 N respectively (Alonso-
Albi et al. 2007). One possibility is that this mass is in the form
of many weak hot corinos which could eventually become low
mass stars. The fate of these hot corinos is, however, linked to
the evolution of the IM protostar that is progressively dispers-
ing and warming the surrounding material (Fuente et al. 1998).
Another possibility is that the ”missed” mass is in the form of an
extended and massive envelope. This envelope (if not totally dis-
persed by the IM star) could produce new stars in a forthcoming
star formation event.
5. Summary
We have searched for clustering at mm wavelengths in 3 IM star
forming regions. We have detected 1, 2 and 4 compact cores in
Serpens–FIRS 1, CB 3 and IC 1396 N respectively. The compact
cores are not distributed uniformly but grouped in sub-clusters
separated by ∼0.05 pc. Such a separation is a typical distance
Fig. 3. Dust continuum emission at 850 µm as observed with
SCUBA towards CB 3. In the inset, we show the 3mm contin-
uum image observed with PdBI. Note that two compact cores
are detected towards the single-dish peak.
for both IM and massive stars within the same cloud. We have
used our mm observations to complete the census of YSOs in
these regions and compare them with the clusters found by Testi
et al. (1999) in the more evolved HAEBE stars. Serpens–FIRS 1
seems to belong to an extraordinarily rich cluster. The density
of YSOs in the high luminosity sources IC 1396 N and CB 3 is
consistent with the density found in the clusters around Be stars
although our sources are not found between the most crowded
regions. The large amount of interstellar gas and dust in the stud-
ied regions suggest that new star formation events are still pos-
sible.
Acknowledgements. We are grateful to Phil Myers for his careful reading of
the manuscript. A.F. is grateful for support from the Spanish MEC and FEDER
funds under grant ESP 2003-04957, and from SEPCT/MEC under grant AYA
2003-07584.
References
Alonso-Albi, T. et al. 2007, in preparation
Beltrán, M. T., Girart, J. M., Estalella, R., Ho, P. T. P., & Palau, A. 2002, ApJ,
573, 246
Bottinelli, S., Ceccarelli, C., Neri, R. et al. 2004, ApJ, 617, L69
Codella, C., & Bachiller, R. 1999, A&A, 350, 659
Codella, C., Bachiller, R., Nisini, B., Saraceno, P., & Testi, L. 2001, A&A, 376,
Davis, C. J., Matthews, H. E., Ray, T. P., Dent, W. R. F., & Richer, J. S. 1999,
MNRAS, 309, 141
Fuente, A., Martı́n-Pintado, J., Bachiller, R., Neri, R., & Palla, F. 1998, A&A,
334, 253
Fuente, A., Rodrı́guez-Franco, A., Testi, L., Natta, A., Bachiller, R., & Neri, R.
2003, ApJ, 598, L39
Fuente, A., Alonso-Albi, T., Bachiller, R., Natta, A., Testi, L., Neri, R., &
Planesas, P. 2006, ApJ, 649, L119
Getman, K.V., Feigelson, E.D., Garmire, G., Broos, P., Wang, J., 2007, ApJ 654,
Hogerheijde, M. R., van Dishoeck, E. F., Salverda, J. M., & Blake, G. A. 1999,
ApJ, 513, 350
Huard, T.L., Weintraub, D.A., & Sandell, G., 2000, A&A 362, 635
Hunter, T. R., Brogan, C. L., Megeath, S. T., Menten, K. M., Beuther, H., &
Thorwirth, S. 2006, ApJ, 649, 888
Kaas, A.A., Olofsson, G., Bontemps, S., et al., 2004, A&A 421, 623
A. Fuente et al.: Protostellar clusters in intermediate mass (IM) star forming regions 5
Launhardt, R., Henning, T., Klein, R., 1998, in: Yun J.L., Liseau R. (eds), ASP
Conf. Ser. 132, Star Formation with the Infrared Space Observatory, San
Francisco: ASP, p. 119
Natta, A., Grinin, V., & Mannings, V. 2000, Protostars and Planets IV, 559
Neri, R., Fuente, A., Ceccarelli, et al., 2007, this issue.
Nisini, B., et al. 2001, A&A, 376, 553
Testi, L., & Sargent, A. I. 1998, ApJ, 508, L91
Testi, L., Palla, F., & Natta, A. 1998, A&AS, 133, 81
Testi, L., Palla, F., & Natta, A., 1999, A&A 342, 515
Testi, L., Sargent, A. I., Olmi, L., & Onello, J. S. 2000, ApJ, 540, L53
Williams, J. P., & Myers, P. C. 1999, ApJ, 518, L37
	Introduction
	Serpens-FIRS 1
	IC 1396 N
	CB 3
	Observations
	Results
	Discussion
	Summary
ABSTRACT
  The transition between the low density groups of T Tauri stars and the high
density clusters around massive stars occurs in the intermediate-mass (IM)
range (M$_*$$\sim$2--8 M$_\odot$). High spatial resolution studies of IM young
stellar objects (YSO) can provide important clues to understand the clustering
in massive star forming regions.
  Aims: Our aim is to search for clustering in IM Class 0 protostars. The high
spatial resolution and sensitivity provided by the new A configuration of the
Plateau de Bure Interferometer (PdBI) allow us to study the clustering in these
nearby objects.
  Methods: We have imaged three IM Class 0 protostars (Serpens-FIRS 1, IC 1396
N, CB 3) in the continuum at 3.3 and 1.3mm using the PdBI. The sources have
been selected with different luminosity to investigate the dependence of the
clustering process on the luminosity of the source.
  Results: Only one millimeter (mm) source is detected towards the low
luminosity source Serpens--FIRS 1. Towards CB 3 and IC1396 N, we detect two
compact sources separated by $\sim$0.05 pc. The 1.3mm image of IC 1396 N, which
provides the highest spatial resolution, reveal that one of these cores is
splitted in, at least, three individual sources.

<|endoftext|><|startoftext|>
Introduction
1979 Epps reported results showing that stock return correlations decrease
as the sampling frequency of data increases [1]. Since his discovery the
phenomenon has been detected in several studies of different stock markets
[2, 3, 4] and foreign exchange markets [5, 6].
∗E-mail: bence@maxwell.phy.bme.hu
http://arXiv.org/abs/0704.1099v2
Cross-correlations between the individual assets are the main factors in
classical portfolio management thus it is important to understand and give
their accurate description on different time scales. This is especially so, since
today the time scale in adjusting portfolios to events occuring may be in the
order of minutes.
Considerable effort has been devoted to uncover the phenomenon found
by Epps [7, 8, 9, 10, 11, 12]. However most of the works aim to construct a
better statistical measure for co-movements in prices in order to exclude bias
of the estimator by microstructure effects [13, 14, 15, 16, 17, 18, 19, 20], only
a few searching for the description of the microstructure dynamics.
Up to now two main factors causing the effect have been revealed: The
first one is a possible lead-lag effect between stock returns [21, 22, 23] which
can appear mainly between stocks of very different capitalisation and/or for
some functional dependencies between them. In this case the maximum of
the time-dependent cross-correlation function can be found at non zero time
lag, resulting in increasing cross-correlations as the sampling time scale gets
into the same order of magnitude as the characteristic lag. This factor can be
easily understood, morever, in a recent study [23] we showed that through
the years this effect becomes less important as the characteristic time lag
shrinks, signalising an increasing efficiency of stock markets. As the Epps
effect can also be found for the case when no lead-lag effect is present, in the
following we will focus only on other possible factors.
The second, more important factor is the asynchronicity of ticks in case
of different stocks [7, 8, 21, 24]. Empirical results [7] showed that taking into
account only the synchronous ticks reduces to a great degree the Epps effect,
i.e. measured correlations on short sampling time scale increase. Naturally
one would expect that for a given sampling frequency growing activity de-
creases the asynchronicity, leading to a weaker Epps effect. Indeed Monte
Carlo experiments showed an inverse relation between trading activity and
the correlation drop [7].
However, the analysis of empirical data showed [25] that the explanation
of the effect solely by asynchronicity is not satisfactory. After eliminating the
effect of changing asymptotic cross-correlations through the years (scaling
with the asymptotic value), the curves of cross-correlation as a function of
sampling time scale tend to collapse to one curve and surprisingly we do
not find a measurable reduction of the characteristic time of the Epps effect,
while the trading frequency grew by a factor of ∼ 5−10 in the period. These
results will be discussed further in details in Section 2.2.
The characteristic time of market phenomena can usually be split up into
three kinds of market time scales: the frequency of trading on the market
(which we will denote as activity), market periodicities and the reaction time
of traders to news, events. In Ref. [25] we showed that the characteristic time
of the Epps effect does not scale with changing market activity (this we will
discuss in Section 2.2), which points out that the characteristic time of the
Epps effect can not be determined solely by the market activity causing asyn-
chronicity. Market periodicities in high frequency data are the different types
of patterns, which can be found in intraday data, as well as on broader time
scales (see e.g. Refs. [26] and [27]). Market periodicities and intraday struc-
ture do not have a role in our results since we are averaging them out. Hence
we believe that the characteristic time of the Epps effect is the outcome of a
human time scale present on the market: The time that market participants
need to react to certain pieces of news. There are several studies in the litera-
ture about reaction time. The issue is connected both to behavioural finance
questions and to market efficiency. 1970, Fama defined an efficient market as
one in which prices fully reflect all available information [28]. This response
to information in practice can not happen instantaneously. There are several
results reporting that prices incorporate news within five to fifteen minutes
after news announcements [29, 30, 31, 32, 33]. More recent studies showed
similar results on the time that traders needed to react to news [34, 35, 36].
Supposing that the Epps effect is possibly the outcome of a human time
scale present on the market motivated us to separate the terms in the cross-
correlation function, in order to study their behaviour one by one. In this
paper we suggest an analytic decomposition of the cross-correlation function
of asynchronous events using time lagged correlations. As a second step we
demonstrate the efficiency of the formalism on the example of generated data.
Finally we describe and fit the empirically observed dependence of the cross-
correlations. We find that the origin of the independence of the characteristic
time of the Epps effect on the trading frequency is the presence of a human
time scale in the time lagged autocorrelation functions. Ref. [12] already
called the attention to the importance of lagged cross-influences of stock
returns in explaining the Epps-effect. Using a somewhat different formalism,
here we investigate thoroughly this relationship.
The paper is built up as follows: in Section 2 we present the data used
and discuss the problems of the former descriptions. Section 3 the decompo-
sition of the cross-correlation coefficient, Section 4 shows a simulation model
demonstrating the idea. In Section 5 we present the assumptions concerning
real data and show fits and statistics for the Epps curves. We finish the paper
with a discussion.
2 Empirical analysis
2.1 Data and methodology
In our analysis we used the Trade and Quote (TAQ) Database of the New
York Stock Exchange (NYSE) for the period of 4.1.1993 to 31.12.2003, con-
taining tick-by-tick data. The data used was adjusted for dividends and
splits.
We computed the logarithmic returns of stock prices:
rA∆t(t) = ln
pA(t)
pA(t −∆t)
, (1)
where pA(t) stands for the price of stock A at time t. The prices were deter-
mined using previous tick estimator on the high frequency data, i.e. prices
are defined constant between two consecutive trades. The time dependent
cross-correlation function C
∆t (τ) of stocks A and B is defined by
∆t (τ) =
rA∆t(t)r
∆t(t + τ)
rA∆t(t)
rB∆t(t + τ)
. (2)
The notion 〈· · · 〉 stands for the time average over the considered period:
〈r∆t(t)〉=
T −∆t
r∆t(i), (3)
where time is measured in seconds and T is the time span of the data.
The standard deviation σ of the returns reads as:
〈r∆t(t)2〉−〈r∆t(t)〉
, (4)
both for A and B in (2). We computed correlations for each day separately
and averaged over the set of days, this way avoiding large overnight returns
and trades out of the market opening hours. For pairs of stocks with a
lead–lag effect the function C
∆t has a peak at non-zero τ. The equal-time
cross-correlation coefficient is naturally: ρA/B∆t ≡C
∆t (τ = 0). In our notations
the Epps effect means the decrease of ρ∆t as ∆t decreases (see Figure 1). Since
the prices are defined as being constant between two consecutive trades, the
∆t time scale of the sampling can be chosen arbitrarily.
As stated above, we do not want to discuss the Epps effect originated
from the lead-lag effect in the correlations. Thus we consider only pairs of
0 1000 2000 3000 4000 5000 6000 7000 8000 9000
∆t [sec]
Figure 1: The cross-correlation coefficient as a function of sampling time
scale for the period 1993–2003 for the Coca-Cola Pepsi pair. Several hours
are needed for the correlation to reach its asymptotic value.
stocks where the latter effect is negligible, i.e., for which the price changes
are highly correlated with the peak position of C
∆t of Equation (2) being at
τ ≈ 0. The results shown in this paper can be generalised for all stock pairs
(though in case of an empirical study one can never fit all data). To illus-
trate our results we will present results for some stock pairs and in Section
5 we will show statisitcs for a broader set of data. The stocks mentioned
in the paper are the following: Avon Products, Inc. (AVP), Caterpillar Inc.
(CAT), Colgate-Palmolive Company (CL), E.I. du Pont de Nemours & Com-
pany (DD), Deere & Company (DE), The Walt Disney Company (DIS), The
Dow Chemical Company (DOW), General Electric Co. (GE), International
Business Machines Corp. (IBM), Johnson & Johnson (JNJ), The Coca-Cola
Company (KO), 3M Company (MMM), Motorola Inc. (MOT), Merck &
Co., Inc. (MRK), PepsiCo, Inc. (PEP), Pfizer Inc. (PFE), The Procter &
Gamble Company (PG), Sprint Nextel Corp. (S), Vodafone Group (VOD),
Wal-Mart Stores Inc. (WMT).
2.2 Time evolution of the characteristic time
Previous studies claimed the asynchronicity of ticks for different stocks as the
main cause of the Epps effect [7, 8]. It is natural to assume that, for a given
sampling frequency, increasing trading activity should enhance synchronicity,
leading to a weaker Epps effect.
To study the trading frequency dependence of the cross-correlation drop,
we computed the Epps curve separately for different years. In Figure 2 the
cross-correlation coefficients can be seen as a function of the sampling time
scale for the years 1993, 1997, 2000 and 2003 for three example stock pairs.
0 2000 4000 6000 8000
∆t [sec]
0 2000 4000 6000 8000
∆t [sec]
0 2000 4000 6000 8000
∆t [sec]
Figure 2: The Epps curves for the CAT/DE (top), KO/PEP (middle) and
MRK/JNJ (bottom) pairs for the years 1993, 1997, 2000 and 2003. The
asymptotic value of the cross-correlations varies in time.
It is known that cross-correlation coefficients are not constant through
the years. The asymptotic values of cross-correlations (long sampling time
scale) depend on the economical situation, the state of the economic sectors
that the pairs of stocks belong to, and several other factors. We need to
take this into account and try to extract the effect of changing asymptotic
cross-correlations from the Epps phenomenon. In order to get comparable
curves, we scaled the cross-correlation curves with their asymptotic value:
The latter was defined as the mean of the cross-correlation coefficients for
the sampling time scales ∆t = 6000 seconds through ∆t = 9000 seconds, and
the cross-correlations were divided by this value. Figure 3 shows the scaled
curves for the same years and pairs as Figure 2.
The frequency of trades changed considerably in the last two decades:
Trading activity has grown almost monotonically, as it can be seen in Figure
4. This would infer the diminution of the Epps effect and a much weaker
decrease of the correlations as sampling frequency is increased. However,
after scaling with the asymptotic cross-correlation value, the curves give a
reasonable data collapse and no systematic trend can be seen. Surprisingly,
as it can be seen in Table 1, a rise of the trading frequency by a factor of
∼ 5−10 does not lead to a measurable reduction of the characteristic time
of the Epps effect (where we define the characteristic time the time scale for
which the cross-correlation reaches the 1− e−1 rate of its asymptotic value).
These results show that explaining the Epps effect merely as a result of
the asynchronicity of ticks is not satisfactory. It is also important to mention
that not even the changing tick sizes for the stocks (likely to change the
arrival rate of price changes) alter the characteristic time of the effect.
Table 1: The characteristic time of the Epps effect for the years 1993, 1997,
2000 and 2003 measured in seconds for the stocks pairs: CAT/DE, KO/PEP
and MRK/JNJ (characteristic time was defined as the time scale for which
the cross-correlation value reaches the 1− e−1 rate of its asymptotic value).
No clear trends can be seen in the charactersitic time while the activity is
growing rapidly.
CAT/DE KO/PEP MRK/JNJ
1993 940 920 800
1997 620 760 420
2000 1320 1040 880
2003 700 800 1060
0 2000 4000 6000 8000
∆t [sec]
0 2000 4000 6000 8000
∆t [sec]
0 2000 4000 6000 8000
∆t [sec]
Figure 3: The Epps curves scaled with the asymptotic cross-correlation values
for the CAT/DE (top), KO/PEP (middle) and MRK/JNJ (bottom) pairs
for the years 1993, 1997, 2000 and 2003. The scaled curves give a reasonable
data collapse in spite of the considerably changing trading frequency, showing
that the characteristic time of the Epps effect does not change with growing
activity.
1994 1996 1998 2000 2002
yearsa
Figure 4: The average intertrade time for the years 1993 to 2003 for some
example stocks (CAT, DE, KO, PEP, MRK, JNJ). The activity was growing
almost monotonically.
3 Decomposition of the cross-correlations
In this section we show calculations for the relation between the value of
cross-correlations on different time scales. We connect the cross-correlation
on a certain time scale (∆t) to lagged autocorrelations and cross-correlations
on smaller time scales (∆t0).
Returns in a certain time window ∆t are mere sums of returns in smaller,
non-overlapping windows ∆t0, where ∆t is a multiple of ∆t0:
r∆t(t) =
∆t/∆t0
r∆t0(t −∆t + s∆t0). (5)
Using this relationship the time average of the product of returns on the
large time scale (∆t) can be written in terms of the averages on the short
time scale (∆t0) in the following way:
rA∆t(t)r
∆t(t)
T −∆t
rA∆t(i)r
∆t(i) =
T −∆t
∆t/∆t0
rA∆t0(i−∆t + s∆t0)
∆t/∆t0
rB∆t0(i−∆t +q∆t0)
∆t/∆t0
∆t/∆t0
rA∆t0(i−∆t + s∆t0)r
∆t0(i−∆t +q∆t0)
. (6)
We can see that on the right side of Equation (6) the lagged time average
of return products appear on the short time scale, ∆t0, i.e., the non-trivial
part of the lagged cross-correlations. Naturally in the case of A = B, we get
the relation for
r∆t(t)
In order to apply Equation (6), we need to have information about the
lagged autocorrelation and cross-correlation functions. Writing out the sum
in Eq. (6) we get:
rA∆t(t)r
∆t(t)
x=− ∆t∆t0
rA∆t0(t)r
∆t0(t + x∆t0)
, (7)
and similarly
rA∆t(t)
x=− ∆t∆t0
rA∆t0(t)r
∆t0(t + x∆t0)
rB∆t(t)
x=− ∆t∆t0
rB∆t0(t)r
∆t0(t + x∆t0)
. (8)
Since the mean of returns is 1-2 orders of magnitude smaller than the
second moments in the correlation function, we can omit the expressions
rA∆t(t)
rB∆t(t + τ)
and 〈r∆t(t)〉
in Equation (2). As these terms are of second
order, this can even be done in case of slight price trends. Hence Equation
(2) becomes:
ρA/B∆t =
rA∆t(t)r
∆t(t)
rA∆t(t)
rB∆t(t)
. (9)
For simplicity, we introduce decay functions to describe lagged correla-
tions:
(x∆t0) =
rA∆t0(t)r
∆t0(t + x∆t0)
rA∆t0(t)r
〉 , (10)
defined for both positive and negative x values, and similarly f
(x∆t0) and
(x∆t0). Thus the correlation can be written in the following form:
ρA/B∆t =
x=− ∆t∆t0
(x∆t0)
rA∆t0(t)r
∆t0(t)
x=− ∆t∆t0
(x∆t0)
rA∆t0(t)
)−1/2
x=− ∆t∆t0
(x∆t0)
rB∆t0(t)
)−1/2
. (11)
Hence
ρA/B∆t =
x=− ∆t∆t0
(x∆t0)
x=− ∆t∆t0
(x∆t0)
)−1/2
x=− ∆t∆t0
(x∆t0)
)−1/2
ρA/B∆t0 . (12)
This way we obtained an expression of the cross-correlation coefficient
for any sampling time scale, ∆t, by knowing the coefficient on a shorter
sampling time scale, ∆t0, and the decay of lagged autocorrelations and cross-
correlations on the same shorter sampling time scale (given that ∆t is multiple
of ∆t0). Our method is to measure the correlations and fit their decay func-
tions on a certain short time scale and compute the Epps curve using the
above formula.
4 Model calculations
In this section we demonstrate the decomposition process on computer gen-
erated time series which should mimic two correlated return series. We will
demonstrate the Epps effect and see how the decomposition works for these
controlled cases. Our aim is to show that in case of generated “price” series,
the decomposition process leads to a very good description of the time scale
dependence of the cross-correlation coefficients. More discussion and details
on the analytic treatment of the model can be found in Ref. [38].
To mimic some properties of financial data, we simulate two correlated
but asynchronous price time series. As a first step we generate a core random
walk with unit steps up or down in each second with equal possibility (W (t)).
Second we sample the random walk, W (t), twice independently with waiting
times drawn from an exponential distribution. This way we obtain two time
series (pA(t) and pB(t)), which are correlated since they are sampled from the
same core random walk, but the steps in the two walks are asynchronous.
The core random walk is:
W (t) = W (t −1)+ ε(t),
where ε(t) is ±1 with equal probability (and W (0) is set high in order to
avoid negative values). The two price time series are determined by
pA(ti) =
W (ti) if ti = ∑ik=1 Xk
p(ti −1) otherwise
pB(ti) =
W (ti) if ti = ∑ik=1Yk
p(ti −1) otherwise
. (14)
from the core random walk, where, i = 1,2, · · · , and Xk and Yk are drawn from
an exponential distribution:
P(y) =
λe−λy if y ≥ 0
0 y < 0
with parameter λ = 1/60. A snapshot as an example of the generated time
series with exponentially distributed waiting times can be seen on Figure 5.
As a next step we create the logarithmic return time series (rA∆t(t) and
rB∆t(t)) of p
A(t) and pB(t) as defined by Equation (1). In case of the random
walk model of price changes we know that
rA∆t(t)
rB∆t(t)
= 0 without
having to make any assumptions. Of course having a random walk model,
the autocorrelation function of the steps is zero for all non-zero time lags:
(x∆t0) = f
(x∆t0) = δx,0, (16)
0 100 200 300 400
10000
10010
10020
pA(t)
pB(t)
Figure 5: A snapshot of the model with exponentially distributed waiting
times. The original random walk is shown with lines (black), the two sampled
series with dots and lines (red) and triangles and lines (blue).
rA∆t(t)
rA∆t0(t)
rB∆t(t)
rB∆t0(t)
. (17)
Hence the cross-correlation can be written in the following form:
ρA/B∆t =
x=− ∆t∆t0
(x∆t0)
rA∆t0(t)r
∆t0(t)
rA∆t0(t)
rB∆t0(t)
)−1/2
x=− ∆t∆t0
(x∆t0)
ρA/B∆t0 . (18)
In the model case we set the smallest time scale ∆t0 = 1 time step. It
can be shown [38] that in the case of λ ≪ 1 (small density of ticks) the exact
analytical expression for the cross-correlations is identical to (18) with an
exponential decay function:
(x∆t0) = e−λ∆t0|x|, (19)
where λ is the parameter of the original exponential distribution used for
sampling. Further results and exact computations of the cross-correlations
for the model can be found in Ref. [38].
Figure 6 shows the computed cross-correlations of the generated time
series on several sampling time scales and the computed cross-correlations
using Equation (18) and the exponential decay function (19). The two curves
are in very good agreement showing that the decomposition procedure is able
to well capture the Epps effect for generated time series.
0 5000
measured correlations
analytically computed correlations
∆t [simulation steps]
Figure 6: The measured and the computed cross-correlation coefficients using
exponential decay function as a function of sampling time scale for the simu-
lated time series with exponentially distributed waiting times. The analytic
fit is in very good agreement with the Epps curve.
5 Application of the theory to the data
In this section we discuss the properties of the decay functions in case of real
world data, and inserting them into Equation (12) we derive analytical fits
for the measured Epps curves.
5.1 Decay functions
As discussed, we measure the equal-time cross-correlations and the decay of
cross and autocorrelations on a certain short sampling time scale and from
these we obtain the value of equal-time cross-correlations on larger sampling
time scales. To do this, in case of the toy model, we had the possibility
of using the smallest time scale available in the generated data as ∆t0, i.e.,
the resolution being one simulation step. When studying data from real
world markets, one has to make restrictions. As being the highest resolution
commonly used in financial analysis, it would be plausible to choose windows
of one second as ∆t0. However on this time scale one is only able to measure
noise, no valid correlations and decay functions can be found. Thus we had to
use less dense data for the smallest sampling time scale: in the results shown
below we set ∆t0 = 120 seconds. Using this resolution we get an acceptable
signal-to-noise ratio and we hope not to lose too much information compared
to higher frequencies.
To avoid new parameters in the model we use the raw decay functions in
the formula (12), without fitting them. Since it is an empirical approach to
determine the decay functions for real data, we have to distinguish the signal
from the noise in the decay functions. Concerning the sensitivity from the
input (decay function) we observed that the results are quite robust against
little changes in the input functions, however the noise in the tail can cause
significant deviations. According to this we take into account the decay
functions for correlations only for short time lags. For the decay of the cross-
correlations we take into account the function only up to the time lag where
the decaying signal reaches zero for the first time, for larger lags we assume
it to be zero. For the decay of autocorrelations we take into the account the
function only up to the time lag where after the negative overshoot of the
beginning it decays to zero from below for the first time, for larger lags we
assume it to be zero.
Figure 7 shows an example of the decay functions in case of the stock
pair KO/PEP (for other pairs the decay functions are very much similar).
The plot shows the decay functions up to the time lags of 1000 seconds, with
a vertical line showing how long we take the empirical decays into account.
We can see that the time lag for which the decay functions disappear is in
the order of a few minutes. In fact in case of all stock pairs studied we found
the decay disappearing after 5–15 minutes.
5.2 Fits
Inserting the empirical decay of lagged autocorrelations and cross-correlations
on the short time scale into the formula of Equation (12), we compare the
computed and the measured Epps curves. Figures 8, 9 and 10 show these
plots for a few example stock pairs.
One can see, that the fits are able to describe the change of cross-correlation
0 200 400 600 800 1000
x∆t0 [sec]
0 200 400 600 800 1000
x∆t0 [sec]
-1000 -750 -500 -250 0 250 500 750 1000
x∆t0 [sec]
Figure 7: Top: The decay of lagged autocorrelations for KO. Middle: The
decay of lagged autocorrelations for PEP. Bottom: The decay of lagged cross-
correlations for KO/PEP pair. A vertical lines show the threshold up to which
we take the decays into account, for larger lags we assume them to be zero.
Sampling time scale is ∆t0 = 120 seconds on all three plots.
with increasing sampling time scale. Note, that as it has been shown in
0 2000 4000 6000 8000
measured correlations
analytically computed correlations
∆t [sec]
0 2000 4000 6000 8000
measured correlations
analytically computed correlations
∆t [sec]
Figure 8: The measured and the analytically computed cross-correlation co-
efficients as a function of sampling time scale for the pairs CAT/DE and
KO/PEP. Note that using only the autocorrelations and cross-correlations
measured on the smallest time scale (∆t0 = 120 seconds) we are able to give
reasonable fits to the cross-correlations on all time scales. Details on the
goodness parameter of the measured and computed correlations can be found
in Table 2.
0 2000 4000 6000 8000
measured correlations
analytically computed correlations
∆t [sec]
0 2000 4000 6000 8000
measured correlations
analytically computed correlations
∆t [sec]
Figure 9: The measured and the analytically computed cross-correlation co-
efficients as a function of sampling time scale for the pairs WMT/S and
GE/MOT. Details on the goodness parameter of the measured and com-
puted correlations can be found in Table 2.
Section 3, in the analytical formula only the autocorrelations and cross-
correlations on the smallest time scale (∆t0) and the decay functions are
0 2000 4000 6000 8000
measured correlations
analytically computed correlations
∆t [sec]
0 2000 4000 6000 8000
measured correlations
analytically computed correlations
∆t [sec]
Figure 10: The measured and the analytically computed cross-correlation
coefficients as a function of sampling time scale for the pairs MRK/JNJ and
GE /IBM. Details on the goodness parameter of the measured and computed
correlations can be found in Table 2.
taken into account as input to compute the cross-correlations on all other
time scales, no additional parameters are used.
To show a broader set of results, we introduce a goodness parameter for
the agreement between the measured and the analytically determined Epps
curves. We define the goodness parameter as the absolute error between the
measured and the analytically computed points:
g(∆t) = 100
|ρmeasured∆t −ρ
computed
ρmeasured∆t
. (20)
Table 2 shows the maximum, the mean and the median of the goodness
parameters for a broader set of stocks. The results show that the absolute
mean error is very low, with a maximum around 7 percents and both a
mean and a median around 2 percents. It is important to mention, that
the maximal error is usually found for high frequency scales, for longer time
scales and especially for the asymptotic correlation value the aggrement is
very good.
These results show that the growing cross-correlations with decreasing
sampling frequency are due to finite time decay of the lagged autocorrelations
and cross-correlations in the high frequency sampled data.
The finite decay of the cross-correlations on the short time scale (∆t0)
is not caused by difference in the capitalisation of the two stocks or func-
tional dependencies between them. Instead, it is an artifact of the market
Table 2: The maximum, the mean and the median of the goodness parameters
a broader set of stocks. The results show that the absolute mean error is low.
Note that the maximal absolute error in general occurs for high frequency
scales.
stock pair max [%] mean [%] median [%]
CAT/DE 4.81 1.26 0.94
KO/PEP 10.67 2.46 1.23
WMT/S 7.66 2.32 1.74
GE/MOT 6.43 3.29 3.44
MRK/JNJ 5.26 1.51 0.95
GE/IBM 3.90 1.57 1.12
PG/CL 6.05 1.81 1.24
MRK/PFE 4.76 1.42 1.32
AVP/CL 10.37 7.75 9.53
DD/DOW 8.84 2.05 1.49
DD/MMM 5.76 2.17 1.93
MOT/VOD 9.73 2.57 1.82
DIS/GE 5.78 1.54 0.97
average 6.92 2.4 2.1
microstructure. Reaction to a certain piece of news is usually spread out on
an interval of a few minutes for the stocks [29, 30, 31, 32, 33, 34, 35, 36, 39, 40]
due to human trading nature, thus not scaling with activity, with ticks being
distributed more or less randomly. This means that correlated returns are
spread out for this interval (asynchronously), causing non zero lagged cross-
correlations on the short time scale and thus the Epps effect. This way, as
stated by Ref. [7], the asynchronicity is indeed important in describing the
Epps effect but only in promoting the lagged correlations. (Even in case of
completely synchronous, but randomly spread ticks we could have the finite
decay of lagged correlations on short time scale, and hence the Epps effect.)
6 Discussion
In our study we examined the causes of the Epps effect, the dependence
of stock return cross-correlations on sampling time scale. We showed that
explaining the effect solely through asynchronicity of price ticks is not sat-
isfactory. When scaling the Epps curves with their asymptotic value for
different years, we get a reasonable data collapse and a growing activity of
the order ∼ 5−10 does not affect the characteristic time of the Epps effect.
The main point of our calculations is that we connected the cross-correlations
on longer time scales to the lagged autocorrelations and cross-correlations on
any shorter time scale. We demonstrated the idea of these calculations on a
random walk asynchronous model of prices, getting a very good agreement
with the cross-correlation curves.
Assuming the time average of stock returns to be zero we were able to
decompose the expression for the cross-correlation coefficient deriving an an-
alytical formula of the cross-correlations on any time scale, given the decay
of the autocorrelations and cross-correlations on a certain short time scale.
With this analytical formula we were able to give fits to the Epps curves
of real stock pairs getting acceptable results. The fits show that the Epps
effect is caused by the finite time decay of the lagged correlations in the high
frequency sampled data. The reason for the characteristic time not chang-
ing with growing activity is a human time scale present in the phenomenon,
which does not scale with the changing inter-tick time. The finite decay of
lagged correlations on the short time scale is due to market microstructure
properties: different actors on the market have different time horizons of in-
terest resulting in the reactions to certain pieces of news being spread out
for a time interval of a few minutes. The correlated returns ranging over
this interval cause the finite time decay of lagged correlations on the short
time scale resulting in the Epps effect. Our results do not contradict to the
earlier observations on the importance of asynchronicity in the Epps-effect,
however, its role has been put into a new perspective.
Acknowledgments
Support by OTKA T049238 is acknowledged.
References
[1] T.W. Epps, Journal of the American Statistical Association 74, 291-298
(1979)
[2] G. Bonanno, F. Lillo, R.N. Mantegna, Quantitative Finance 1, 1-9
(2001)
[3] A. Zebedee, A closer look at co-movements among stock returns, San
Diego State University, working paper (2001)
[4] M. Tumminello, T. Di Matteo, T. Aste, R.N. Mantegna, Eur. Phys. J.
B. 55, 209-217 (2007)
[5] M. Lundin, M. Dacorogna, U. A. Müller, Correlation of high-frequency
financial time series. In P. Lequeux (Ed.), Financial Markets Tick by
Tick. Wiley & Sons.
[6] J. Muthuswamy, S. Sarkar, A. Low, E. Terry, Journal of Futures Markets
21(2), 127-144 (2001)
[7] R. Renò, International Journal of Theoretical and Applied Finance 6(1),
87-102 (2003)
[8] O. V. Precup, G. Iori, Physica A 344, 252-256 (2004)
[9] O. V. Precup, G. Iori, European Journal of Finance (2006)
[10] J. Kwapień, S. Drożdż, J. Speth, Physica A 337, 231-242 (2004)
[11] L. Zhang, Estimating Covariation: Epps Effect, Microstructure Noise
working paper (2006)
[12] M. Potters, J.-P. Bouchaud, L. Laloux, Acta Physica Polonica B, 36, 9,
(2005)
[13] N. Shephard, O.E. Barndorff-Nielsen, University of
Oxford, Department of Economics, Economics Se-
ries Working Papers No. 240. (2005) Available at:
http://www.economics.ox.ac.uk/Research/wp/pdf/paper240.pdf
[14] F. Corsi, F. Audrino, University of St. Gallen, Department of Eco-
nomics, Discussion Paper No. 2007-02 (2007) Available at SSRN:
http://ssrn.com/abstract=957997
[15] T. Hayashi, N. Yoshida, Bernoulli 11(2), 359-379 (2005)
[16] M. Martens, Working Paper, University of Rotterdam (2003)
[17] T.G. Andersen, T. Bollerslev, P.F. Christoffersen, F.X. Diebold, in G.
Elliot, C.W.J. Granger, and Allan Timmermann (eds.), Handbook of
Economic Forecasting, Amsterdam: North-Holland, 778-878 (2006)
[18] T. Andersen, T. Bollerslev, International Economic Review, 39, 885-905
(1998)
[19] F. de Jong, T. Nijman, Journal of Empirical Finance 4, 259-277 (1997)
[20] C. Ball, W. Torous, Journal of Empirical Finance 7, 373-388 (2000)
http://www.economics.ox.ac.uk/Research/wp/pdf/paper240.pdf
http://ssrn.com/abstract=957997
[21] A. Lo, A. C. MacKinlay, Rev. Finance Stud 3, 175-205 (1990)
[22] L. Kullmann, J. Kertész, K. Kaski, Phys. Rev. E 66, 026125 (2002)
[23] B. Tóth, J. Kertész, Physica A 360 505-515 (2006)
[24] A. Lo, A. C. MacKinlay, Journal of Econometrics 45, 181-211 (1990)
[25] B. Tóth, J. Kertész, Physica A 383, 54-58 (2007)
[26] R.A. Wood, T.H. McInish, J.K. Ord, Journal of Finance 40(3), 723-739
(1985)
[27] K. Chan, Y.P. Chung, H. Johnson, The Journal of Financial and Quan-
titative Analysis, 30, 3, 329-346 (1995)
[28] E. Fama, Journal of Finance 25, 383-417 (1970)
[29] L. Dann, D. Mayers, R. Raab, Journal of Financial Economics 4, 3-22
(1977)
[30] J. Patell, M. Wolfson, Journal of Financial Economics 13, 223-252 (1984)
[31] R. Jennings, L. Starks, Journal of Accounting Research 23, 336-350
(1985)
[32] M. Barclay, R. Litzenberger, Journal of Financial Economics 21, 71-99
(1988)
[33] S. Kim, J. Lin, M. Slovin, Journal of Financial and Quantitative Analysis
32, 507-524 (1997)
[34] J. Busse, C. Green, Journal of Financial Economics 65, 415-437 (2002)
[35] T. Chordia, R. Roll, A. Subrahmanyam, Journal of Financial Economics
76, 271-292 (2005)
[36] T. Chordia, R. Roll, A. Subrahmanyam, Journal of Financial Economics
87, 249-268 (2008)
[37] P.Ch. Ivanov, A. Yuen, B. Podobnik, Y. Lee, Phys. Rev. E 69, 056107
(2004)
[38] Bence Tóth, János Kertész, Bálint Tóth, Proceedings of SPIE Vol. 6601
(Fluctuations and Noise 2007)
[39] M.M. Dacorogna, R. Gençay, U.A. Müller, R.B. Olsen, O.V. Pictet, An
Introduction to High-Frequency Finance, Academic Press, 2001
[40] A. Almeida, C. Goodhart, R. Payne, The Journal of Financial and Quan-
titative Analysis, 33, 383-408 (1998)
	Introduction
	Empirical analysis
	Data and methodology
	Time evolution of the characteristic time
	Decomposition of the cross-correlations
	Model calculations
	Application of the theory to the data
	Decay functions
	Fits
	Discussion
ABSTRACT
  We analyse the dependence of stock return cross-correlations on the sampling
frequency of the data known as the Epps effect: For high resolution data the
cross-correlations are significantly smaller than their asymptotic value as
observed on daily data. The former description implies that changing trading
frequency should alter the characteristic time of the phenomenon. This is not
true for the empirical data: The Epps curves do not scale with market activity.
The latter result indicates that the time scale of the phenomenon is connected
to the reaction time of market participants (this we denote as human time
scale), independent of market activity. In this paper we give a new description
of the Epps effect through the decomposition of cross-correlations. After
testing our method on a model of generated random walk price changes we justify
our analytical results by fitting the Epps curves of real world data.

<|endoftext|><|startoftext|>
Introduction and background
	1.1. Young-Jucys-Murphy elements and the Main Theorem
	1.2. Background
	1.3. Outline
	2. The Join-cut Equation
	3. A proof of Theorem ??
	3.1. An explicit solution to the Join-cut Equation
	3.2. An expression for the coefficients of 
	4. Further questions
	Acknowledgements
	References
ABSTRACT
  Although powers of the Young-Jucys-Murphya elements X_i = (1 i) + ... +(i-1
i), i = 1, ..., n, in the symmetric group S_n acting on {1, ...,n} do not lie
in the centre of the group algebra of S_n, we show that transitive powers,
namely the sum of the contributions from elements that act transitively on {1,
>...,n}, are central. We determine the coefficients, which we call star
factorization numbers, that occur in the resolution of transitive powers with
respect to the class basis of the centre of S_n, and show that they have a
polynomiality property. These centrality and polynomiality properties have
seemingly unrelated consequences. First, they answer a question raised by Pak
about reduced decompositions; second, they explain and extend the beautiful
symmetry result discovered by Irving and Rattan; and thirdly, we relate the
polynomiality to an existing polynomiality result for a class of double Hurwitz
numbers associated with branched covers of the sphere, which therefore suggests
that there may be an ELSV-type formula associated with the star factorization
numbers.

<|endoftext|><|startoftext|>
Introduction
A central result of Claude Chevalley [3] decomposes the ring of polyno-
mials in n variables (as graded representation of the symmetric group Sn)
as the tensor product of the symmetric polynomials times the coinvariants
of Sn (i.e., polynomials modulo symmetric polynomials with no constant
term).
The coinvariants of the symmetric group can also be defined as its har-
monics (the polynomials annhilated by all symmetric polynomial differen-
tial operators with no constant term). They admit as a basis the famous
Schubert polynomials of Schubert calculus, that play an important role in
algebraic combinatorics, see for instance [6].
The space of invariant polynomials in noncommutative variables was in-
troduced in 1936 by Wolf [16] where she found a noncommutative version of
the fundamental theorem of symmetric functions. This space has been stud-
ied from a modern perspective in [13, 1, 2]. On the other hand, two sets of
noncommutative harmonics for the symmetric group were introduced in [1]
that translated into two noncommutative analogues of Chevalley decompo-
sition for the ring of polynomials in noncommuting variables. The question
of decomposing as Sn–modules both kinds of noncommutative harmonics
was left open. This is the starting point in our investigations.
We begin the present work with the computation of the graded Frobenius
characteristic of noncommutative harmonics. We then use these calculations
Expanded version of a paper to appear in Journal of Combinatorial theory, series A,
http://www.elsevier.com/locate/jcta.
Emmanuel Briand is supported by a contract Juan de la Cierva, MEC. Mercedes Rosas
is supported by a contract Ramón y Cajal, MEC. Mike Zabrocki is supported by NSERC.
http://arxiv.org/abs/0704.1101v2
2 EMMANUEL BRIAND, MERCEDES ROSAS, AND MIKE ZABROCKI
to derive the Frobenius series for the enveloping algebra of the derived free
Lie algebra in n variables, A′n. This last computation is achieved by using
the existence of an isomorphism of GLn(Q)–modules between the space of
polynomials in noncommutative variables, and the tensor product of the
space of commuting polynomials with A′n. Such an isomorphism is presented
explicitly in the last section.
We conclude this introduction with some basic definitions and results
that we will be using in the following sections. Let Sn denote the sym-
metric group in n letters. Denote by Q[Xn] = Q[x1, x2, . . . , xn] the space
of polynomials in n commuting variables and by Q〈Xn〉 = Q〈x1, x2, . . . , xn〉
the space of polynomials in n noncommutative variables.
The space of symmetric polynomials in n variables will be denoted by
Symn and the space of noncommutative polynomials which are invariant
under the canonical action of the symmetric group Sn will be denoted by
NCSymn.
Given any polynomial f(Xn) ∈ Q[Xn], the notation f(∂Xn) represents
the polynomial turned into an operator with each of the variables replaced
by its corresponding derivative operator. Analogous notation will also hold
for f(Xn) ∈ Q〈Xn〉 except that there are two types of differential operators
acting on words in noncommutative variables. The first is the Hausdorff
derivative, ∂x, whose action on a word w is defined to be the sum of the
subwords of w with an occurrence of the letter x deleted. The second deriv-
ative is the twisted derivative, dx, which is defined on w to be w
′ if w = xw′,
and 0 otherwise. Both derivations are extended to polynomials by linearity.
It is interesting to remark (as does Lenormand in [8], section Séries comme
opérateurs) that these two operations are dual to the shuffle and concate-
nation products respectively, with respect to a scalar product where the
noncommutative monomials are self dual. That is,
〈∂xf, g〉 = 〈f, x⊔⊔g〉, and
〈dxf, g〉 = 〈f, xg〉.
Following [1], we introduce the following two sets of noncommutative ana-
logues of the harmonic polynomials. The canonical action of the symmetric
group endow them with the structure of Sn–modules.
MHarn = {f ∈ Q〈Xn〉 : p(∂Xn)f(Xn) = 0 for all p ∈ Mn}
NCHarn = {f ∈ Q〈Xn〉 : p(dXn)f(Xn) = 0 for all p ∈ Mn}
where Mn = {p ∈ NCSymn with p(0) = 0}.
We are now ready to state the two decompositions of Q〈Xn〉 as the ten-
sor product (over Q) of its invariants times its coinvariants that we have
described.
Proposition 1 ([1], Theorems 6.8 and 8.8). As graded Sn–modules,
Q〈Xn〉 ≃ MHarn ⊗ Symn,
Q〈Xn〉 ≃ NCHarn ⊗NCSymn.
NONCOMMUTATIVE HARMONICS 3
2. The Frobenius characteristic of noncommutative harmonics
In this section we compute the Frobenius characteristic of both kinds
of noncommutative harmonics. This section is based of the observation
that the graded Frobenius series for each of the Sn–modules appearing in
Proposition 1 is either known or can be deduced from the existence of the
isomorphisms described there.
The expressions for Frobenius images and characters will require a little
use of symmetric function notation and identities. We will follow Macdonald
[9] for the notation of the sλ Schur, hλ homogeneous, eλ elementary and pλ
power sums bases for the ring of symmetric functions Sym, that we identify
with Q[p1, p2, p3, . . .]. For convenience we will make use of some plethystic
notation.
For a symmetric function f , f [X] represents the symmetric function eval-
uated at an unspecified (possibly infinite) alphabet X. Then, f [X(1 − q)]
is the image of f under the algebra automorphism sending the power sum
symmetric function pk to (1 − q
k)pk[X]. Similarly, f
is the image
of the symmetric function f under the inverse automorphism (sending the
power sum pk to pk/(1− q
In our calculations, we use the Kronecker product ⊙ of symmetric func-
tions. This operation on symmetric functions corresponds, under the Frobe-
nius map, to the inner tensor product of representations of the symmetric
group (tensor product of representations with the diagonal action on the ten-
sors). It can also be defined directly on symmetric functions by the equation
pλ⊙ pµ = δλ,µ
i ni(λ)!i
ni(λ)
pλ where ni(λ) is the number of parts of size
i in λ, and then extended by bilinearity.
We introduce the notations
(q; q)k = (1− q)(1− q
2) · · · (1− qk),
{q; q}k = (1− q)(1− 2q) · · · (1− k q).
Then qd/{q; q}d is the generating function for the set partitions with length
d and qd/(q; q)d is the generating function for partitions with length d, [15].
Finally, since Symn and NCSymn are made of graded copies of the trivial
Sn-module we conclude that
FrobSn(NCSymn) = hn [X]
{q, q}d
FrobSn(Symn) = hn [X]
(q; q)n
= hn [X]
(q; q)d
In the following lemma we compute the graded Frobenius characteristic
for the module Q〈Xn〉.
4 EMMANUEL BRIAND, MERCEDES ROSAS, AND MIKE ZABROCKI
Lemma 2 (The Frobenius characteristic of Q〈Xn〉).
FrobSn(Q〈Xn〉) =
{q, q}d
h(n−d,1d)[X].
Proof. For each monomial xi1 · · · xir , we define its type ∇(xi1 · · · xir) to be
the set partition of [r] = {1, 2, . . . , r} such that a and b are in the same part
of the set partition if and only if ia = ib in the monomial. For a set partition
A with at most n parts, we will let MA equal the Sn submodule of Q〈Xn〉
spanned by all monomials of type A. As Sn–module,
Q〈Xn〉 ≃
A : ℓ(A)=d
where the second direct sum is taken over all set partitions A with d parts.
Fix a set partition A, and let d be the number of parts of A, and x~i =
xi1xi2 ...xir be the smallest monomial in lex order in M
A. It involves only
the variables x1, x2, . . . , xd. The representation M
A is the representation of
Sn induced by the action of the subgroup Sd×S
1 ≃ Sd on the subspace
Q[Sd] · x~i. The representation Q[Sd] · x~i of Sd is isomorphic to the regular
representation. We use the rule for a representation R of Sd induced to Sn,
FrobSn(R ↑
) = hn−d[X]FrobSd(R),
and conclude that the Frobenius characteristic of MA is h(n−d,1d)[X]. Hence
the graded Frobenius characteristic of Q〈Xn〉 is
FrobSn(Q〈Xn〉) =
A:ℓ(A)=d
q|A|h(n−d,1d)[X] =
{q, q}d
h(n−d,1d)[X].
We are now able to compute the Frobenius characteristic for MHarn and
NCHarn.
Theorem 3 (The Frobenius characteristic of the noncommutative harmon-
ics).
FrobSn(MHarn) = (q; q)n
{q, q}d
h(n−d,1d)[X]
FrobSn(NCHarn) =
{q, q}d
{q, q}d
h(n−d,1d)[X].
Proof. This follows since FrobSn(MHarn ⊗ Symn) = FrobSn(MHarn) ⊙
FrobSn(Symn). Since hn[X] is the unity for the Kronecker product on
symmetric functions of degree n, and since FrobSn(Symn) = hn[X]/(q; q)n,
we conclude that FrobSn(MHarn)/(q; q)n = FrobSn(Q〈Xn〉). We can now
solve for FrobSn(MHarn).
NONCOMMUTATIVE HARMONICS 5
A similar argument demonstrates the formula for FrobSn(NCHarn). We
have from Proposition 1 and Lemma 2,
{q, q}d
h(n−d,1d)[X] = FrobSn(Q〈Xn〉)
= FrobSn(NCHarn)⊙FrobSn(NCSymn)
{q, q}d
hn[X] ⊙FrobSn(NCHarn)
{q, q}d
FrobSn(NCHarn).
From this equation we can solve for FrobSn(NCHarn). �
As a corollary, we obtain the generating functions for the graded dimen-
sions of these spaces.
Corollary 4 (The Hilbert series of the noncommutative harmonics).
dimq(MHarn) =
(q; q)n
1− nq
dimq(NCHarn) =
(1− nq)
{q,q}d
Proof. After Theorem 3,
FrobSn(MHarn) = (q; q)n FrobSn(Q〈Xn〉)
FrobSn(NCHarn) =
{q, q}d
FrobSn(Q〈Xn〉)
This implies
dimq(MHarn) = (q; q)n dimq(Q〈Xn〉)
dimq(NCHarn) =
{q, q}d
dimq(Q〈Xn〉)
since the Hilbert series of a graded Sn–module is obtained by coefficient
extraction from the graded Frobenius characteristic (of the coefficient of
p(1n)[X]/n! in the expansion in power sum symmetric functions). Last, the
Hilbert series of Q〈Xn〉 is
The graded dimensions of MHarn for 2 ≤ n ≤ 5 are listed in [14] as
sequences A122391 through A122394. The sequences of graded dimensions
of NCHarn for 3 ≤ n ≤ 8 are listed in [14] as sequences A122367 through
A122372.
6 EMMANUEL BRIAND, MERCEDES ROSAS, AND MIKE ZABROCKI
3. Non–commutative harmonics and the enveloping algebra of
the derived free Lie algebra
Let Ln be the canonical realization of the free Lie algebra inside the ring
of polynomials in noncommuting variables Q〈Xn〉. More precisely, Ln is
the linear span of the minimal set of polynomials in Q〈Xn〉 that includes
Q and the variables Xn, and is closed under the bracket operation [x, y] =
xy − yx. Let L′n = [Ln,Ln] be the derived free Lie algebra. Remark that
Ln = L
n ⊕ QXn, where QXn denotes the space of linear polynomials. The
enveloping algebra A′n of Ln can be realized as a subalgebra of Q〈Xn〉 as
follows (see [12] 1.6.5):
A′n =
ker ∂x.
More explicitly, A′n is the subalgebra of Q〈Xn〉 generated by all the brackets
under concatenation.
In [1] it was established that there is an isomorphism of vector spaces
between MHarn and A
n ⊗ Hn. In this section we will show the following
result.
Theorem 5. As Sn–modules,
MHarn ≃ A
n ⊗Hn.
The proposition will be established by comparing the Frobenius image of
MHarn (known from Theorem 3) to FrobSn(A
n ⊗ Hn), which is equal to
FrobSn(A
n)⊙FrobSn(Hn). We will determine FrobSn(A
n) in Theorem 8
below. An intermediate step will make use the following Theorem due to V.
Drensky.
Proposition 6 (Drensky, [5] Theorem 2.6). As GLn(Q)–modules (and con-
sequently as Sn–modules),
Q〈Xn〉 ≃ Q[Xn]⊗A
Drensky proved Proposition 6 by exhibiting an explicit isomorphism be-
tween these two representations. We will look at it in the next section. For
now, we will provide a non–constructive proof of the theorem. Before, we
need to introduce some notation.
It is known that Q〈Xn〉 is the universal enveloping algebra (u.e.a) of the
free Lie algebra, Ln. Using the Poincaré-Birkhoff-Witt theorem, a linear
basis for Q〈Xn〉 is given by decreasing products of elements of Ln. Since
we can choose an ordering of the elements of Ln so that the space of linear
polynomials is smallest and decreasing products of linear polynomials are
isomorphic to Q[Xn] (as a vector space), we note that as vector spaces
Q〈Xn〉 = u.e.a.(Ln) = u.e.a(QXn ⊕ L
n) ≃ Q[Xn]⊗A
To distinguish between the commutative elements of Q[Xn] and the non-
commutative words of Q〈Xn〉, we will place a dot over the variables (as in
ẋi) to indicate the commutative variables.
NONCOMMUTATIVE HARMONICS 7
Let [n] = {1, 2, . . . , n} and let [n]r denote the words of length r in the
alphabet of the numbers 1, 2, . . . , n. A word w ∈ [n]r is called a Lyndon word
if w < wkwk+1 · · ·wr for all 2 ≤ k ≤ r where< represents lexicographic order
on words.
Every word w ∈ [n]r is equal to a unique product w = ℓ1ℓ2 · · · ℓk such
that ℓ1 ≥ ℓ2 ≥ · · · ≥ ℓk and each ℓi is Lyndon (e.g. Corollary 4.4 of [12]).
Let ℓ be a Lyndon word of length greater than 1. We say that ℓ = uv
is the standard factorization of ℓ if v is the smallest nontrivial suffix in
lexicographic order. It follows that u and v are Lyndon words and u < v.
For a Lyndon word ℓ, if ℓ is a single letter a then define Pa = xa ∈ Q〈Xn〉.
If ℓ = uv is the standard factorization of ℓ, then Pℓ = [Pu, Pv ]. For any
w ∈ [n]r with Lyndon decomposition w = ℓ1ℓ2 · · · ℓk, define
Pw = Pℓ1Pℓ2 · · ·Pℓk .
The set {Pw}w∈[n]r forms a basis for the noncommutative polynomials of
degree r ([12], Theorem 5.1). The elements Pw with Lyndon decomposition
w = ℓ1ℓ2 · · · ℓk such that each Lyndon factor has degree at least 2 are a basis
of A′n.
Proof. To prove that Q〈Xn〉 and Q[Xn] ⊗ A
n are isomorphic as GLn(Q)–
modules, we use the fact that two polynomial GLn(Q)–modules with the
same character are isomorphic (see for instance the notes by Kraft and
Procesi, [7]). The character of a GLn(Q)–module is the trace of the action
of the diagonal matrix diag(a1, a2, . . . , an).
A basis for Q[Xn] ⊗A
n are the elements ẋ
α ⊗ Pℓ1 · · ·Pℓk with ℓ1 ≥ ℓ2 ≥
· · · ≥ ℓk and |ℓi| ≥ 2. The action of the diagonal matrix diag(a1, a2, . . . , an)
on this basis element is the same as the action on the noncommutative
polynomial xα11 x
2 · · · x
n Pℓ1Pℓ2 · · ·Pℓk (in both cases: multiplication by
α1+m1
α2+m2
2 · · · a
αn+mn
n where mi is the number of occurrences of i in the
word ℓ1ℓ2 · · · ℓk). By the Poincaré-Birkhoff-Witt theorem, these polynomials
form a basis for Q 〈Xn〉, hence the trace of the action of diag(a1, a2, . . . , an)
acting on Q 〈Xn〉 and Q[Xn]⊗A
n are equal. Since their characters are equal,
we conclude that they are isomorphic as GLn(Q) modules. �
The GLn(Q)–character of Q[Xn] is
, and the GLn(Q)–character
of Q〈Xn〉 is
1−(a1+a2+···+an)
. Therefore, the existence of a GLn(Q)-module
isomorphism between Q〈Xn〉 and Q[Xn]⊗A
n implies the following result.
Corollary 7 (The GLn(Q)–character of A
charGLn(Q)(A
n)(a1, a2, · · · , an) =
(1− a1) · · · (1− an)
1− (a1 + a2 + · · · + an)
(−1)ie(i,1k−i)(a1, a2, . . . , an).
8 EMMANUEL BRIAND, MERCEDES ROSAS, AND MIKE ZABROCKI
Moreover this last sum is equal to
sshape(T )(a1, a2, . . . , an)
where the sum is over all standard tableaux T such that the smallest integer
which does not appear in the first column of T is odd.
By Schur-Weyl duality, the above formula also describes the decompo-
sition of the subspace of multilinear polynomials (i.e. with distinct occur-
rences of the variables) of A′n. That is, if n is the number of variables, the the
multilinear polynomials of degree n will be an Sn-module with Frobenius
image equal to
i=2(−1)
ie(i,1n−i)[X]. This decomposition was considered
in the papers [4], [10], [11] where an expression was given degree by degree
up to n = 7. The expansion of this formula in the Schur basis provided in
the Theorem agrees with the computations in those papers.
We can derive a formula for the Frobenius characteristic of A′ by using a
similar technique.
Theorem 8 (The Frobenius characteristic of A′n).
FrobSn(A
{q; q}d
h(n−d,1d)[X(1− q)].
Proof. For any symmetric function f [X] of degree n, we have that
f [X]⊙ hn
In particular, since FrobSn(Q[Xn]) = hn
, we conclude that
FrobSn(Q〈Xn〉) = FrobSn(A
n ⊗Q[Xn])
= FrobSn(A
n)⊙ hn
= FrobSn(A
This implies that if we make the plethystic substitution X→X(1 − q) into
both sides of this equation and using Lemma 2 we arrive at the stated
formula. �
We can now prove Theorem 5.
Proof. From Theorem 3 we know the Frobenius image of MHarn, we com-
pare this to
FrobSn(A
n ⊗Hn) = FrobSn(A
n)⊙FrobSn(Hn),
{q; q}d
h(n−d,1d)[X(1 − q)]⊙ hn
(q; q)n
= (q; q)n
{q; q}d
h(n−d,1d)[X]
= FrobSn(MHarn).
NONCOMMUTATIVE HARMONICS 9
Since the two Sn–modules have the same Frobenius image, we conclude that
they must be isomorphic. �
4. An explicit isomorphism between Q〈Xn〉 and Q[Xn]⊗A
Let V be a finite–dimensional vector space over Q. Let S(V ) and T (V ) be
its symmetric algebra and tensor algebra respectively. There exists a unique
embedding ϕ of GL(V )–modules of S(V ) into T (V ) such that
ϕ(v1v2 · · · vr) =
vσ(1) ⊗ vσ(2) ⊗ · · · ⊗ vσ(r)
for all r ≥ 0, v1, v2, . . . , vr ∈ V.
Its image is the subspace of the symmetric tensors. In the case V =
i=1Qxi, we have S(V ) = Q[Xn] and T (V ) = Q〈Xn〉. Then the embed-
ding ϕ and the inclusion A′n ⊂ Q〈Xn〉 induce a map of GLn(Q)–modules
Φ : Q[Xn] ⊗ A
n −→ Q〈Xn〉 characterized by Φ(f ⊗ a) = ϕ(f)a for all
f ∈ Q[Xn] and all a ∈ A
n. Then,
Proposition 9 (Drensky, [5] Theorem 2.6). The map Φ is a GLn(Q) equi-
variant isomorphism from Q[Xn]⊗A
n to Q〈Xn〉.
Indeed, Drensky showed that given an arbitrary homogeneous basis of G
of A′n, the elements Φ(m⊗ g) for m monomial and g ∈ G, are a basis of A
([5] Lemma 2.4). We refine Drensky’s proof by considering for G the bracket
basis {Pw}w∈[n]r of A
n (introduced before the proof of Proposition 6) and
the shuffle basis (see below) to realize Q[Xn] in Q〈Xn〉. We show that the
elements Φ(m⊗g) form a basis Q〈Xn〉 (the hybrid basis) that is triangularly
related and expands positively in the bracket basis of Q〈Xn〉 (Theorem 10
below).
We follow the book of Reutenauer [12] for the classical definitions and
results used in this section. The bracket basis Pw has been introduced in
the previous section (before the proof of Proposition 6). Before presenting
the hybrid basis we introduce another classical basis of Q〈Xn〉: the shuffle
basis.
The shuffle basis of Q〈Xn〉. Consider two monomials, xi1xi2 · · · xir and
xj1xj2 · · · xjr′ in Q〈Xn〉. For a subset
S = {s1, s2, . . . , sr} ⊆ [r + r
and the complement subset T = {t1, t2, . . . , tr′} = [r + r
′]\S, we let
xi1xi2 · · · xir⊔⊔Sxj1xj2 · · · xjr′ := w
be the unique monomial in Q〈Xn〉 of length r+r
′ such that ws1ws2 · · ·wsr =
xi1xi2 · · · xir and wt1wt2 · · ·wtr′ = xj1xj2 · · · xjr′ .
10 EMMANUEL BRIAND, MERCEDES ROSAS, AND MIKE ZABROCKI
The shuffle of any two monomials is defined as
u⊔⊔v =
S⊆[|u|+|v|]
|S|=|u|
u⊔⊔Sv.
This shuffle of monomials is then extended to a bilinear operation on any
two elements of Q〈Xn〉 The shuffle product is a commutative and associative
operation on Q〈Xn〉.
Let w be a word in [n]r and let w = ℓ
2 · · · ℓ
be the factorization of w
into decreasing products of Lyndon words ℓ1 > ℓ2 > · · · > ℓk. For a Lyndon
word ℓ = i1i2 · · · ir, let Sℓ be the corresponding monomial in Q〈Xn〉, that is
Sℓ = xi1xi2 · · · xir . If w is not a single Lyndon word then define
i1!i2! · · · ik!
⊔⊔ · · · ⊔⊔S
The set {Sw}w∈[n]r forms a basis for the noncommutative polynomials of
degree r ([12], Corollary 5.5).
It is interesting to note that the bracket basis Pw and the shuffle basis
Sw are dual with respect to the scalar product where the noncommutative
monomials are self-dual.
The hybrid basis of Q〈Xn〉. We are now ready to introduce the hybrid
basis.
Given a word w ∈ [n]r with a factorization into decreasing products of
Lyndon words w = ℓ
2 · · · ℓ
, then let ℓj1 , ℓj2 , · · · , ℓjr be the Lyndon words
of length 1 in this decomposition and set
M(w) = x
⊔⊔ij1
⊔⊔ij2
⊔⊔ · · · ⊔⊔x
⊔⊔ijr
= ij1 !ij2 ! · · · ijr !S
Observe that M(w) is the image under the embedding ϕ of the monomial
X(w) = ẋ
· · · ẋ
. For all of the remaining Lyndon words ℓa1 , ℓa2 ,
. . . , ℓak−r with length greater than 1 we define the Lie portion of the word
to be L(w) = P
iak−r
. We will define the hybrid elements to be
Hw := M(w)L(w) = Φ(X(w) ⊗ L(w)).
The result of this section is:
Theorem 10. The noncommutative polynomials Hw are triangularly related
to and expand positively in the Pu basis. Precisely, for w of length r,
Hw = r!Pw + terms cuPu with u lexicographically smaller than w.
As a consequence, the set {Hw}w∈[n]r is a basis for the noncommutative
polynomials of Q〈Xn〉 of degree r.
We require a few facts about Lyndon words and the lexicographic ordering
which can be found in [12].
(1) If u and v are Lyndon words and u < v then uv is a Lyndon word.
([12], (5.1.2))
NONCOMMUTATIVE HARMONICS 11
(2) If u < v and u is not a prefix of v, then ux < vy for all words x, y.
([12], Lemma 5.2.(i))
(3) If w = ℓ1ℓ2 · · · ℓk with ℓ1 ≥ ℓ2 ≥ · · · ≥ ℓk then ℓk is the smallest
(with respect to the > order) nontrivial suffix of w. ([12], Lemma
7.14)
(4) If ℓ′ < ℓ, are both Lyndon words, then ℓ′ℓ < ℓℓ′ (follows from (1)). As
a consequence, for ℓ1 ≥ ℓ2 ≥ · · · ≥ ℓk, ℓ1ℓ2 · · · ℓk ≥ ℓσ(1)ℓσ(2) · · · ℓσ(k)
for any permutation σ ∈ Sk with equality if and only ℓi = ℓσ(i) for
all 1 ≤ i ≤ k.
Proof. To see that (4) holds consider a weakly decreasing product of Lyndon
words ℓ1ℓ2 · · · ℓk. If id → σ
(1) → σ(2) → · · · → σ is a chain in the weak right
order then we have just shown that
ℓσ(i)(1)ℓσ(i)(1) · · · ℓσ(i)(k) ≥ ℓσ(i+1)(1)ℓσ(i+1)(2) · · · ℓσ(i+1)(k)
with equality if and only if the two Lyndon factors which were transposed are
equal. Therefore there exists a chain of words one greater than or equal to
the next with ℓ1ℓ2 · · · ℓk on one end and ℓσ(1)ℓσ(2) · · · ℓσ(k) on the other. �
Theorem 10 will be established after the following lemma.
Lemma 11. Let w be a word and ℓ1ℓ2 · · · ℓr the decomposition of w into
a decreasing product of Lyndon words. Let ℓ be a Lyndon word such that
ℓ = af1f2 · · · fk with a one of the variables, each fi a Lyndon word and
fi ≥ fi+1 and fk ≥ ℓ1. Let u = ℓ1 · · · ℓdℓℓd+1 · · · ℓr where ℓd > ℓ ≥ ℓd+1 or
d = 0 and ℓ ≥ ℓ1. Then
PℓPw = Pu
+ terms cvPv where v is lexicographically smaller than u and cv ≥ 0.
Proof. Assume that r = 1, and we have that either ℓ ≥ ℓ1 and PℓPℓ1 = Pℓℓ1
and we are done, or ℓ < ℓ1 and
PℓPℓ1 = Pℓ1Pℓ + [Pℓ, Pℓ1 ].
In this case Pℓ1Pℓ = Pℓ1ℓ. By (1) we know that ℓℓ1 is Lyndon. Moreover,
ℓℓ1 is its standard factorization (this follows from (3), since the nontrivial
suffixes of ℓℓ1 are all suffixes of f1f2 · · · fkℓ1, which is a nonincreasing product
of Lyndon words). Therefore Pℓℓ1 = [Pℓ, Pℓ1 ] and PℓPℓ1 = Pℓ1ℓ + Pℓℓ1 . By
(4), ℓℓ1 < ℓ1ℓ so the triangularity relation holds.
Now for an arbitrary r > 1 we have the same two cases. Either ℓ ≥ ℓ1
and PℓPℓ1Pℓ2 · · ·Pℓr = Pℓw, or ℓ < ℓ1 and
PℓPℓ1Pℓ2 · · ·Pℓr = Pℓ1PℓPℓ2 · · ·Pℓr + [Pℓ, Pℓ1 ]Pℓ2 · · ·Pℓr .
Our induction hypothesis holds for PℓPℓ2 · · ·Pℓr since fk ≥ ℓ1 ≥ ℓ2, hence
PℓPℓ2 · · ·Pℓr = Pu′ +
v′<u′ c
Pv′ where u
′ = ℓ2 · · · ℓdℓℓd+1 · · · ℓr. Moreover,
Pℓ1Pu′ = Pℓ1u′ = Pu since ℓ1 ≥ ℓ2 and Pℓ1Pv′ = Pℓ1v′ since ℓ1 ≥ any Lyndon
prefix of v′.
12 EMMANUEL BRIAND, MERCEDES ROSAS, AND MIKE ZABROCKI
Since [Pℓ, Pℓ1 ] = Pℓℓ1 by (3), and ℓ1 ≥ ℓ2, we have by the induction
hypothesis that Pℓℓ1Pℓ2 · · ·Pℓr = Pu′′ +
v′′<u′′ c
Pv′′ where
u′′ = ℓ2 · · · ℓd′ℓℓ1ℓd′+1 · · · ℓr
with ℓd′ > ℓℓ1 ≥ ℓd′+1. In order to justify the induction step we also need
to have that u′′ < u. This follows from (4) since u′′ is a permutation of the
factors of u and ℓ1 > ℓ and ℓ lies to the left of ℓ1 in u
We are now in a position to prove Theorem 10.
Proof. Hw is defined as the product M(w)L(w) where M(w) is a a shuffle
of monomials. It expands as M(w) =
c̃bxb with
c̃b = r! and where
each monomial in M(w) has the same number of x1s, x2s, etc. We fix one
such monomial xb that as follows by indexing its letters backwards: xb =
xikxik−1 · · · xi1 . We define inductively words w[k], . . . , w[1], w[0] as follows:
w[k] := w and w[j − 1] is the word obtained from w[j] by removing one of
its Lyndon factors of length 1 equal to xij . Remark that L(w[j]) = L(w)
for all j. Then we establish by induction on j that
xijxij−1 · · · xi1L(w) = Pw[j]
+ terms cvPv with v lexicographically smaller than w[j]
by applying Lemma 11 with xij for ℓ and w[j] for w. �
References
[1] N. Bergeron, C. Reutenauer, M. Rosas, M. Zabrocki, Invariants and Coinvariants of
the Symmetric Group in Noncommuting Variables, to appear in Canad. J. Math.
[2] N. Bergeron and M. Zabrocki, The Hopf algebra of symmetric functions in non-
commutative variables is free and cofree. Preprint, arXiv: math.CO/0509265.
[3] C. Chevalley, Invariants of finite groups generated by reflections, Amer. J. Math. 77
(1955), 778–782.
[4] V. Drensky, Lattices of varieties of associative algebras. Serdica 8 (1982), 20–31.
[5] V. Drensky, Codimensions of T -ideals and Hilbert series of relatively free algebras,
J. Algebra 91:1 (1984), 1–17.
[6] W. Fulton, Young tableaux. London Mathematical Society Student Texts, 35. Cam-
bridge University Press, Cambridge, 1997.
[7] H. P. Kraft and C. Procesi, Classical Invariant Theory: A primer, lecture notes,
http://www.math.unibas.ch/~kraft/Papers/KP-Primer.pdf.
[8] C. Lenormand, Opérateurs sur les polynômes non–commutatifs: définitions et no-
tations, Séminaire Schützenberger–Lentin–Nivat (Problèmes mathématiques de la
théorie des automates), 1969–1970, exposé num. 3, 9 p.
[9] I. G. Macdonald, Symmetric functions and Hall polynomials, 2nd edition, The
Clarendon Press, Oxford University Press, 1995.
[10] A. P. Popov, Identities of the Tensor Square of the Grassmann Algebra, Algebra
and Logic, 21 (1982) 4, pp. 296-316.
[11] A. P. Popov, Module Structure of Space of Proper Polynomials of Degree Seven, C.
R. Acad. Bulgare, 38:3 (1985), 295–298.
[12] C. Reutenauer, Free Lie algebras. London Mathematical Society Monographs. New
Series, 7. Oxford Science Publications. The Clarendon Press, Oxford University
Press, New York, 1993.
http://arxiv.org/abs/math/0509265
http://www.math.unibas.ch/~kraft/Papers/KP-Primer.pdf
NONCOMMUTATIVE HARMONICS 13
[13] M. Rosas, B. Sagan, Symmetric Functions in Noncommuting Variables. Transactions
of the American Mathematical Society, 358:1 (2006), 215–232.
[14] N. J. A. Sloane, The Online Encyclopedia of Integer Sequences, published electron-
ically at http://www.research.att.com/~njas/sequences/.
[15] H. Wilf, Generatingfunctionology. Second edition. Academic Press, Inc., Boston,
MA, 1994.
[16] M. C. Wolf, Symmetric functions of noncommutative elements, Duke Math. J. 2
(1936), 626–637.
Emmanuel Briand and Mercedes Rosas, Universidad de Sevilla, Sevilla,
Spain
Mike Zabrocki, York University, Toronto, Canada
http://www.research.att.com/~njas/sequences/
	1. Introduction
	2.  The Frobenius characteristic of noncommutative harmonics
	3. Non–commutative harmonics and the enveloping algebra of the derived free Lie algebra
	4. An explicit isomorphism between Q"426830A Xn "526930B  and  Q[Xn] An'.
	The shuffle basis of Q"426830A Xn "526930B .
	The hybrid basis of Q"426830A Xn "526930B .
	References
ABSTRACT
  Using a noncommutative analog of Chevalley's decomposition of polynomials
into symmetric polynomials times coinvariants due to Bergeron, Reutenauer,
Rosas, and Zabrocki we compute the graded Frobenius series for their two sets
of noncommutative harmonics with respect to the left action of the symmetric
group (acting on variables). We use these results to derive the Frobenius
series for the enveloping algebra of the derived free Lie algebra in n
variables.

<|endoftext|>